Links
DuckDB now allows users to interact with Iceberg REST Catalogs directly from their browser without any setup. This functionality leverages a WebAssembly version of DuckDB, enabling serverless querying of Iceberg data. Users can run SQL commands and access their data securely through familiar interfaces.
The author shares their shift from using Excel and Google Sheets to DuckDB for handling CSV files. They highlight the simplicity of using SQL for tasks like extracting unique user IDs and exporting data, while also noting the convenience of directly querying various data sources.
QuackStore is an extension that speeds up data queries by caching remote files locally. It stores frequently accessed portions of files, reducing load times for repeated queries and improving efficiency. The extension is ideal for scenarios with repeated access to large or remote datasets.
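The caching idea in the summary above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not QuackStore's actual implementation: the `BlockCache` class, the fixed block size, and the `fetch` callback are all hypothetical names introduced here, and the "remote file" is just an in-memory byte string.

```python
class BlockCache:
    """Cache fixed-size blocks of a remote file so repeated reads stay local."""

    def __init__(self, fetch_block, block_size=4):
        self.fetch_block = fetch_block  # hypothetical callable: block index -> bytes
        self.block_size = block_size
        self.blocks = {}                # block index -> cached bytes
        self.misses = 0                 # number of blocks fetched remotely

    def read(self, offset, length):
        """Return `length` bytes starting at `offset`, fetching only missing blocks."""
        if length <= 0:
            return b""
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        data = b""
        for index in range(first, last + 1):
            if index not in self.blocks:
                # Cache miss: fetch this block once and keep it.
                self.misses += 1
                self.blocks[index] = self.fetch_block(index)
            data += self.blocks[index]
        start = offset - first * self.block_size
        return data[start:start + length]


# Stand-in for a remote file; a real fetch would issue a ranged request.
remote = b"abcdefghijklmnop"


def fetch(index):
    return remote[index * 4:(index + 1) * 4]


cache = BlockCache(fetch, block_size=4)
first_read = cache.read(2, 6)   # touches blocks 0 and 1, both fetched
second_read = cache.read(4, 4)  # block 1 is already cached, no new fetch
```

Only the blocks a read touches are fetched, so a second query over an overlapping range is served from the local copy.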
DuckDB v1.4 introduces support for data-at-rest encryption using AES-GCM and AES-CTR ciphers. The article details how to implement encryption, manage keys, and the structure of encrypted data within DuckDB. It also highlights performance considerations and current limitations in compliance with NIST standards.
This article explores the performance of single-node data processing frameworks like DuckDB, Polars, and Daft against Spark using a 650GB dataset stored in Delta Lake on S3. It highlights the concept of "cluster fatigue" and demonstrates that these single-node tools can handle large datasets efficiently without the overhead of distributed computing.
DuckDB 1.4.3 brings bug fixes and performance improvements, and adds native extensions and Python support for Windows Arm64. Key updates include corrections to query results and enhancements for Azure Blob Storage writing. This version allows for better memory management and introduces a native ODBC driver.
As of version 1.4.2, the DuckDB-Iceberg extension supports insert, update, and delete operations on Iceberg v2 tables. Users can interact with Iceberg REST Catalogs and manage table properties while using SQL syntax for data manipulation. However, there are limitations regarding updates on partitioned tables, and copy-on-write is not supported.
This article explains how to deploy DuckDB as a WebAssembly module within Cloudflare Workers, enabling SQL queries without a traditional database server. It details the limitations of Cloudflare Workers, the use of Emscripten's Asyncify to handle asynchronous calls, and provides setup and coding instructions for creating a SQL query API.
This article details the creation of a personal knowledge assistant using Obsidian, DuckDB, and MotherDuck. It explains how to leverage local notes and a Retrieval-Augmented Generation system to uncover hidden connections and related content, enhancing brainstorming and research.
This article explains the optimization rules in DuckDB, focusing on how its advanced optimizer enhances query performance. It details the optimizer's structure, core functions, and how to implement custom optimization rules. A brief overview of 26 built-in optimization rules is also provided.
This article critiques SQL's complexities and inefficiencies while highlighting alternatives like DuckDB. It discusses common frustrations with SQL syntax and suggests ways to enhance usability, including more intuitive commands and error handling.
pg_lake allows Postgres to manage Iceberg tables and interact with data stored in object storage like S3. It supports transactions, various data formats, and utilizes DuckDB for efficient query execution. Users can create, modify, and query data seamlessly within Postgres.
The author shares their shift from using Excel and Google Sheets to DuckDB and SQL for handling CSV files, highlighting the efficiency of querying data directly. They discuss the benefits of using SQL for data manipulation and invite readers to share their own CSV handling tips.
The article argues that DuckDB handles large datasets better than Polars, focusing on 1TB of data. According to the author, DuckDB manages memory and execution effectively thanks to a robust design, while Polars struggles at that scale and runs into out-of-memory errors.
The Airport extension enhances DuckDB by adding Arrow Flight support, allowing for efficient querying and data management via Arrow Flight servers. Instructions for cloning the repository, building the extension, and running tests are provided, along with notes on dependencies and setup requirements.
A lightweight RESTful geospatial feature server for DuckDB has been developed using Go, supporting the OGC API - Features standard and enabling CRUD operations on spatial data. It leverages DuckDB's spatial extension for fast analytical queries and efficient processing, with features like CORS support, GZIP encoding, and a user-friendly HTML interface. The server can be configured via environment variables and supports both HTTP and HTTPS protocols for secure access.
The article discusses the features and capabilities of DuckDB, a high-performance analytical database management system designed for data analytics. It highlights its integration with various data sources and its usability in data science workflows, emphasizing its efficiency and ease of use.
The article discusses advanced sorting techniques in DuckDB that enhance the performance of selective queries. It highlights the importance of efficient data retrieval and presents methods to optimize sorting for improved query execution speed. The innovations presented aim to benefit users dealing with large datasets and complex queries.
The article discusses spatial joins in DuckDB, highlighting their significance in efficiently combining datasets based on geographic relationships. It provides insights into various types of spatial joins and their implementation, showcasing the capabilities of DuckDB in handling spatial data analysis.
DuckDB 1.4.0 has been released, featuring significant enhancements and new functionalities aimed at improving performance and usability. Key updates include support for new data types, optimizations for query execution, and better integration with various programming environments. This release continues DuckDB's commitment to providing a powerful analytical database for data science and analytics tasks.
The article discusses the comparison between DuckDB and Polars, emphasizing that choosing between them depends on the specific context and requirements of the task at hand. It highlights DuckDB as an analytical database focused on SQL queries, while Polars is presented as a fast data manipulation library designed for data processing, akin to Pandas. Ultimately, the author argues that there is no definitive "better" option, and the choice should be driven by the problem being solved.
Open lakehouses are reshaping the data engineering landscape, presenting both opportunities and challenges for Databricks as competitors like DuckDB and Ray emerge. These tools offer simplified and cost-effective alternatives for data processing and analytics, leading to potential integration complexities and the need for Databricks to adapt or risk losing its competitive edge. The future success of Databricks may hinge on its ability to manage this evolving ecosystem.
Sirius is a GPU-native SQL engine that integrates with existing databases like DuckDB using the Substrait query format, achieving approximately 10x speedup over CPU query engines for TPC-H workloads. It is designed for interactive analytics and supports various AWS EC2 instances, with detailed setup instructions for installation and performance testing. Sirius is currently in active development, with plans for additional features and support for more database systems.
DuckDB is gaining recognition as one of the most transformative pieces of geospatial software to emerge in the past decade, offering powerful capabilities for data analysis and manipulation. Its geospatial features significantly improve data processing efficiency, making it a valuable tool for developers and analysts in various fields. The article highlights its impact on the geospatial landscape and its potential for future advancements.
The article discusses streaming patterns in DuckDB, highlighting its capabilities for handling large-scale data processing efficiently. It presents various approaches and techniques for optimizing data streaming and querying, emphasizing the importance of performance and scalability in modern data applications.
The article discusses the integration of DuckDB and PyIceberg within a serverless architecture, highlighting how these technologies can streamline data processing in a Lambda environment. It provides insights into the advantages of using DuckDB for analytics and the role of PyIceberg in managing data lakes efficiently. Additionally, it addresses performance considerations and implementation strategies for effective data management.
The stochastic extension for DuckDB enhances SQL capabilities by adding a range of statistical distribution functions for advanced statistical analysis, probability calculations, and random sampling. Users can install the extension to compute various statistical properties, generate random samples, and perform complex analyses directly within their SQL queries. The extension supports numerous continuous and discrete distributions, making it a valuable tool for data scientists and statisticians.
Foursquare has launched SQLRooms, an open-source framework for building single-node data applications using DuckDB, enabling enterprise-grade analytics to run directly in browsers without backend infrastructure. This innovative framework leverages recent advancements in single-node computing, browser capabilities, and local AI deployment, allowing users to process large datasets efficiently while maintaining data privacy and minimizing cloud costs. SQLRooms includes essential tools for state management, data visualization, and an AI-powered analytics assistant, transforming laptops and browsers into self-sufficient data processing environments.
DuckDB GSheets is an experimental extension that allows users to read and write Google Sheets using SQL commands. It supports authentication through various methods, including access tokens and private keys, enabling seamless integration between DuckDB and Google Sheets. The extension is community-maintained and comes with specific usage guidelines and limitations.
SQLFlow is a high-performance stream processing engine that allows users to build data pipelines using SQL, integrating with various input sources like Kafka and WebSockets, and outputting to systems such as PostgreSQL and cloud storage. It leverages DuckDB and Apache Arrow for efficient processing, offering features like data aggregation, enrichment, and support for various serialization formats. The article provides a quickstart guide, setup instructions, and performance benchmarks for SQLFlow.
The Tera extension for DuckDB enables powerful template rendering directly within SQL queries, facilitating the generation of dynamic reports, configuration files, HTML, and more. It utilizes the Tera templating engine to allow users to create personalized content and perform data transformations seamlessly from their database environment.
The article provides a comprehensive guide to getting started with Spark and DuckDB within the DuckLake environment, detailing setup and configuration steps. It emphasizes the integration of powerful data analysis tools for efficient data processing and management.
The article discusses how to optimize the FDA's drug event dataset, which is stored as large, nested JSON files, by normalizing repeated fields, particularly pharm_class_epc. By extracting these values into a separate lookup table and using integer IDs, the author significantly improved query performance and reduced memory usage in DuckDB, transforming slow, resource-intensive queries into fast, efficient ones.
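The normalization described above can be sketched in plain Python. This is a minimal illustration, not the author's code and not DuckDB SQL: the helper name `build_lookup` and the sample records are hypothetical, and only the field name `pharm_class_epc` comes from the summary.

```python
def build_lookup(records, field):
    """Replace repeated string values in `field` with integer IDs.

    Returns (lookup, normalized): `lookup` maps each distinct string to an ID,
    and `normalized` holds copies of the records with `<field>_ids` in place
    of the original `field`.
    """
    lookup = {}
    normalized = []
    for record in records:
        ids = []
        for value in record.get(field, []):
            # Assign the next free ID the first time a value is seen.
            if value not in lookup:
                lookup[value] = len(lookup)
            ids.append(lookup[value])
        row = {key: val for key, val in record.items() if key != field}
        row[field + "_ids"] = ids
        normalized.append(row)
    return lookup, normalized


# Hypothetical sample rows; the real records are far larger and more nested.
events = [
    {"event_id": 1, "pharm_class_epc": ["Class A", "Class B"]},
    {"event_id": 2, "pharm_class_epc": ["Class B"]},
    {"event_id": 3, "pharm_class_epc": []},
]

lookup, normalized = build_lookup(events, "pharm_class_epc")
```

Each distinct string is stored once in the lookup table, and the main rows carry small integers instead, which is the property the summary credits for the lower memory use and faster queries.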
The article discusses stream windowing functions in DuckDB, explaining how they can be utilized for analyzing time-series data with various windowing strategies. It emphasizes the importance of efficient data handling and processing in real-time analytics and provides examples of applying these functions for better data insights.
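One common windowing strategy for time-series data is the tumbling window, which groups events into fixed-size, non-overlapping intervals. The sketch below shows the idea in plain Python rather than DuckDB syntax; whether the article covers this particular strategy is an assumption, and the function name and sample readings are hypothetical.

```python
from collections import defaultdict


def tumbling_window_sums(events, width):
    """Sum values per fixed-size, non-overlapping window.

    `events` is an iterable of (timestamp_seconds, value) pairs. Each event
    falls into the window that starts at timestamp - (timestamp % width).
    """
    sums = defaultdict(float)
    for timestamp, value in events:
        window_start = timestamp - (timestamp % width)
        sums[window_start] += value
    # Return windows in chronological order.
    return dict(sorted(sums.items()))


# Hypothetical readings: (seconds since start, measured value).
readings = [(0, 1.0), (30, 2.0), (61, 4.0), (125, 8.0)]
per_minute = tumbling_window_sums(readings, 60)
```

With a 60-second width, the first two readings share a window and the other two each land in their own.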
The article explores a creative use of DuckDB's WebAssembly (WASM) capabilities to render the classic video game Doom using SQL queries. It showcases how SQL, typically used for data manipulation, can be leveraged in unconventional ways to create interactive experiences like gaming. The approach highlights the flexibility and power of modern database technologies in innovative applications.
The podcast discusses DuckDB, an emerging database technology that offers powerful analytics capabilities and flexibility. It highlights its growing ecosystem, including integrations and community contributions, positioning DuckDB as a competitive option in the data management landscape.
Multiple DuckDB-related npm packages, including duckdb and its associated modules, were compromised with malicious code aimed at draining crypto wallets. The attack mirrors earlier phishing incidents in the npm ecosystem. The vendor marked the latest release as deprecated and issued an advisory on GitHub.
GTFS is a standardized format for public transportation data that enables interoperability across various transit applications. This article explains how to create a DuckDB database to analyze GTFS Schedule datasets, detailing the necessary steps for loading and querying the data from example datasets.
The article discusses the transition from using DuckDB, a powerful analytical database, to Duckhouse, a new framework designed to enhance data analysis capabilities. It highlights the features and improvements that Duckhouse offers, aiming to streamline data processing and analytics workflows. The author emphasizes the importance of this evolution for data professionals seeking more efficient tools.
The article discusses methods for converting CSV and TXT files to Excel format using various tools like Pandas, DuckDB, and Polars. It emphasizes the need for efficient and concise code solutions for this common task, highlighting the simplicity of some one-liner approaches. The author expresses a preference for minimal coding effort while achieving the desired outcome.
The article provides a comprehensive tutorial on implementing a semantic layer using DuckDB, which allows users to effectively manage and query their data. It covers key concepts, practical steps, and examples to help users understand the integration of a semantic layer with DuckDB. Additionally, it emphasizes the benefits of using a semantic layer for data accessibility and analysis.