27 links
tagged with duckdb
Links
The Airport extension enhances DuckDB by adding Arrow Flight support, allowing for efficient querying and data management via Arrow Flight servers. Instructions for cloning the repository, building the extension, and running tests are provided, along with notes on dependencies and setup requirements.
A lightweight RESTful geospatial feature server for DuckDB has been developed using Go, supporting the OGC API - Features standard and enabling CRUD operations on spatial data. It leverages DuckDB's spatial extension for fast analytical queries and efficient processing, with features like CORS support, GZIP encoding, and a user-friendly HTML interface. The server can be configured via environment variables and supports both HTTP and HTTPS protocols for secure access.
The article discusses the features and capabilities of DuckDB, a high-performance analytical database management system designed for data analytics. It highlights its integration with various data sources and its usability in data science workflows, emphasizing its efficiency and ease of use.
The article discusses advanced sorting techniques in DuckDB that enhance the performance of selective queries. It highlights the importance of efficient data retrieval and presents methods to optimize sorting for improved query execution speed. The innovations presented aim to benefit users dealing with large datasets and complex queries.
DuckDB 0.14.0 has been released, featuring significant enhancements and new functionalities aimed at improving performance and usability. Key updates include support for new data types, optimizations for query execution, and better integration with various programming environments. This release continues DuckDB's commitment to providing a powerful analytical database for data science and analytics tasks.
The article compares DuckDB and Polars and argues that the choice between them depends on the context and requirements of the task at hand. It describes DuckDB as an analytical database focused on SQL queries, and Polars as a fast data manipulation library for data processing, akin to Pandas. The author concludes that there is no definitive "better" option and that the choice should be driven by the problem being solved.
The article discusses spatial joins in DuckDB, highlighting their significance in efficiently combining datasets based on geographic relationships. It provides insights into various types of spatial joins and their implementation, showcasing the capabilities of DuckDB in handling spatial data analysis.
Open lakehouses are reshaping the data engineering landscape, presenting both opportunities and challenges for Databricks as competitors like DuckDB and Ray emerge. These tools offer simpler and more cost-effective alternatives for data processing and analytics, which creates potential integration complexity and pressure on Databricks to adapt or risk losing its competitive edge. The future success of Databricks may hinge on its ability to manage this evolving ecosystem.
Sirius is a GPU-native SQL engine that integrates with existing databases like DuckDB using the Substrait query format, achieving approximately 10x speedup over CPU query engines for TPC-H workloads. It is designed for interactive analytics and supports various AWS EC2 instances, with detailed setup instructions for installation and performance testing. Sirius is currently in active development, with plans for additional features and support for more database systems.
DuckDB is gaining recognition as one of the most transformative pieces of geospatial software to emerge in the past decade, offering powerful capabilities for data analysis and manipulation. Its geospatial features make data processing significantly more efficient, which makes it a valuable tool for developers and analysts in various fields. The article highlights its impact on the geospatial landscape and its potential for future advancements.
The article discusses streaming patterns in DuckDB, highlighting its capabilities for handling large-scale data processing efficiently. It presents various approaches and techniques for optimizing data streaming and querying, emphasizing the importance of performance and scalability in modern data applications.
The article discusses the integration of DuckDB and PyIceberg within a serverless architecture, highlighting how these technologies can streamline data processing in a Lambda environment. It provides insights into the advantages of using DuckDB for analytics and the role of PyIceberg in managing data lakes efficiently. Additionally, it addresses performance considerations and implementation strategies for effective data management.
Foursquare has launched SQLRooms, an open-source framework for building single-node data applications using DuckDB, enabling enterprise-grade analytics to run directly in browsers without backend infrastructure. This innovative framework leverages recent advancements in single-node computing, browser capabilities, and local AI deployment, allowing users to process large datasets efficiently while maintaining data privacy and minimizing cloud costs. SQLRooms includes essential tools for state management, data visualization, and an AI-powered analytics assistant, transforming laptops and browsers into self-sufficient data processing environments.
The stochastic extension for DuckDB enhances SQL capabilities by adding a range of statistical distribution functions for advanced statistical analysis, probability calculations, and random sampling. Users can install the extension to compute various statistical properties, generate random samples, and perform complex analyses directly within their SQL queries. The extension supports numerous continuous and discrete distributions, making it a valuable tool for data scientists and statisticians.
DuckDB GSheets is an experimental extension that allows users to read and write Google Sheets using SQL commands. It supports authentication through various methods, including access tokens and private keys, enabling seamless integration between DuckDB and Google Sheets. The extension is community-maintained and comes with specific usage guidelines and limitations.
The Tera extension for DuckDB enables powerful template rendering directly within SQL queries, facilitating the generation of dynamic reports, configuration files, HTML, and more. It utilizes the Tera templating engine to allow users to create personalized content and perform data transformations seamlessly from their database environment.
SQLFlow is a high-performance stream processing engine that allows users to build data pipelines using SQL, integrating with various input sources like Kafka and WebSockets, and outputting to systems such as PostgreSQL and cloud storage. It leverages DuckDB and Apache Arrow for efficient processing, offering features like data aggregation, enrichment, and support for various serialization formats. The article provides a quickstart guide, setup instructions, and performance benchmarks for SQLFlow.
The article provides a comprehensive guide to getting started with Spark and DuckDB within the DuckLake environment, detailing setup and configuration steps. It emphasizes the integration of powerful data analysis tools for efficient data processing and management.
The article discusses how to optimize the FDA's drug event dataset, which is stored as large, nested JSON files, by normalizing repeated fields, particularly pharm_class_epc. By extracting these values into a separate lookup table and using integer IDs, the author significantly improved query performance and reduced memory usage in DuckDB, transforming slow, resource-intensive queries into fast, efficient ones.
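The normalization step described in that summary can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's code: only the field name pharm_class_epc comes from the summary, while the record layout, the `report_id` key, the sample values, and the `normalize` helper are hypothetical.

```python
# Hypothetical miniature of the nested drug-event records. Only the field
# name pharm_class_epc comes from the summary; the values are made up.
records = [
    {"report_id": 1, "pharm_class_epc": ["Class A", "Class B"]},
    {"report_id": 2, "pharm_class_epc": ["Class A"]},
    {"report_id": 3, "pharm_class_epc": ["Class B", "Class C"]},
]


def normalize(records, field):
    """Split a repeated string field into a lookup table and a link table."""
    lookup = {}  # string value -> small integer ID
    links = []   # (report_id, value_id) pairs that replace the repeated strings
    for rec in records:
        for value in rec.get(field, []):
            # Assign the next ID the first time a value is seen, reuse it after.
            value_id = lookup.setdefault(value, len(lookup) + 1)
            links.append((rec["report_id"], value_id))
    return lookup, links


lookup, links = normalize(records, "pharm_class_epc")
print(lookup)
print(links)
```

Each distinct string is stored once in the lookup table, and the link table carries only integers, which is the property the summary credits for the lower memory use and faster queries.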
The article discusses stream windowing functions in DuckDB, explaining how they can be utilized for analyzing time-series data with various windowing strategies. It emphasizes the importance of efficient data handling and processing in real-time analytics and provides examples of applying these functions for better data insights.
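One common windowing strategy for time-series data is the tumbling window: fixed-size, non-overlapping buckets. The sketch below shows the idea in plain Python rather than DuckDB SQL, and it may not match the strategies the linked article actually covers. The function name, the event layout, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict


def tumbling_window_sums(events, width):
    """Sum values per fixed-size, non-overlapping window.

    events: iterable of (timestamp_seconds, value) pairs (hypothetical layout)
    width:  window width in seconds
    """
    sums = defaultdict(float)
    for ts, value in events:
        # Every timestamp belongs to exactly one window, keyed by its start.
        window_start = (ts // width) * width
        sums[window_start] += value
    return dict(sorted(sums.items()))


# Made-up sample events, bucketed into 60-second windows.
events = [(0, 1.0), (30, 2.0), (65, 4.0), (119, 1.0), (120, 5.0)]
result = tumbling_window_sums(events, 60)
print(result)
```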
The article explores a creative use of DuckDB's WebAssembly (WASM) capabilities to render the classic video game Doom using SQL queries. It showcases how SQL, typically used for data manipulation, can be leveraged in unconventional ways to create interactive experiences like gaming. The approach highlights the flexibility and power of modern database technologies in innovative applications.
The podcast discusses DuckDB, an emerging database technology that offers powerful analytics capabilities and flexibility. It highlights its growing ecosystem, including integrations and community contributions, positioning DuckDB as a competitive option in the data management landscape.
Multiple DuckDB-related npm packages, including duckdb and its associated modules, were compromised with malicious code aimed at draining crypto wallets. The attack mirrors earlier phishing incidents in the npm ecosystem. In response, the vendor marked the latest release as deprecated and issued an advisory on GitHub.
GTFS is a standardized format for public transportation data that enables interoperability across various transit applications. This article explains how to create a DuckDB database to analyze GTFS Schedule datasets, detailing the necessary steps for loading and querying the data from example datasets.
The article discusses the transition from using DuckDB, a powerful analytical database, to Duckhouse, a new framework designed to enhance data analysis capabilities. It highlights the features and improvements that Duckhouse offers, aiming to streamline data processing and analytics workflows. The author emphasizes the importance of this evolution for data professionals seeking more efficient tools.
The article discusses methods for converting CSV and TXT files to Excel format using various tools like Pandas, DuckDB, and Polars. It emphasizes the need for efficient and concise code solutions for this common task, highlighting the simplicity of some one-liner approaches. The author expresses a preference for minimal coding effort while achieving the desired outcome.
The article provides a comprehensive tutorial on implementing a semantic layer using DuckDB, which allows users to effectively manage and query their data. It covers key concepts, practical steps, and examples to help users understand the integration of a semantic layer with DuckDB. Additionally, it emphasizes the benefits of using a semantic layer for data accessibility and analysis.
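The core idea of a semantic layer can be sketched as a small mapping from business names to SQL expressions, plus a function that compiles a request into a query string. This is a minimal sketch and not the tutorial's implementation: the model, the table and column names, and the `compile_query` helper are all hypothetical, and the resulting SQL is only built as a string here, not executed.

```python
# Hypothetical semantic model: every table, column, and metric name below is
# illustrative and does not come from the linked tutorial.
SEMANTIC_MODEL = {
    "table": "orders",
    "dimensions": {
        "country": "country",
        "order_month": "date_trunc('month', ordered_at)",
    },
    "metrics": {
        "revenue": "sum(amount)",
        "order_count": "count(*)",
    },
}


def compile_query(model, metrics, dimensions):
    """Turn requested metric and dimension names into one SQL string."""
    dim_sql = [f"{model['dimensions'][d]} AS {d}" for d in dimensions]
    met_sql = [f"{model['metrics'][m]} AS {m}" for m in metrics]
    sql = f"SELECT {', '.join(dim_sql + met_sql)} FROM {model['table']}"
    if dimensions:
        # Group by the ordinal positions of the dimension columns.
        positions = ", ".join(str(i + 1) for i in range(len(dimensions)))
        sql += f" GROUP BY {positions}"
    return sql


sql = compile_query(SEMANTIC_MODEL, ["revenue"], ["country"])
print(sql)
```

Users ask for "revenue by country" by name, and the definition of revenue lives in one place instead of being repeated in every query.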