Links
This article explores a new indexing technique for data lakehouses called OTree, developed by Qbeast. It challenges traditional methods by using adaptive hypercubes to optimize data layout, improving query performance while addressing issues like partition granularity and imbalanced data distribution.
This article explains the impact of excessive indexes on Postgres performance, detailing how they slow down writes and reads, waste disk space, and increase maintenance overhead. It emphasizes the importance of regularly dropping unused and redundant indexes to optimize database efficiency.
This article explores creative database optimization techniques in PostgreSQL, focusing on scenarios that bypass full table scans and reduce index size. It emphasizes using check constraints and function-based indexing to improve query performance without unnecessary overhead.
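The function-based-index side of this can be sketched in a few lines; here SQLite stands in for PostgreSQL (the partial- and expression-index behaviour shown carries over, while the check-constraint planner tricks are Postgres-specific). The schema and index names are hypothetical.

```python
import sqlite3

# SQLite as a stand-in for PostgreSQL; same ideas, similar syntax.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, active INTEGER);
    -- Partial index: only active rows are indexed, keeping the index small
    -- when most rows are inactive.
    CREATE INDEX idx_active_email ON users (email) WHERE active = 1;
    -- Expression (function-based) index: supports case-insensitive lookups
    -- without scanning and lowercasing every row.
    CREATE INDEX idx_lower_email ON users (lower(email));
""")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE lower(email) = 'a@b.c'"
).fetchall()
print(plan)  # the plan should reference idx_lower_email rather than a full scan
```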
The article critiques the widespread praise for pgvector, highlighting its limitations when used in production. It discusses indexing issues, real-time search challenges, and the complexities of maintaining metadata consistency under heavy load.
This article explains how PostgreSQL indexes work and their impact on query performance. It covers the types of indexes available, how data is stored, and the trade-offs in using indexes, including costs related to disk space, write operations, and memory usage.
This article explains how Cursor speeds up the indexing of large codebases by reusing existing indexes from teammates, reducing time-to-first-query significantly. It details the use of Merkle trees and similarity hashes to ensure secure and efficient data handling during the process.
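The Merkle-tree part of this is easy to sketch (a generic illustration of the technique, not Cursor's actual implementation): each file hashes to a leaf, parents hash their children, so equal roots guarantee identical content and a differing root pinpoints that re-indexing is needed at all.

```python
import hashlib

def merkle_root(leaves):
    """Fold a list of leaf strings up to a single root hash."""
    level = [hashlib.sha256(l.encode()).hexdigest() for l in leaves]
    while len(level) > 1:
        if len(level) % 2:                    # duplicate last node on odd levels
            level.append(level[-1])
        level = [hashlib.sha256((a + b).encode()).hexdigest()
                 for a, b in zip(level[::2], level[1::2])]
    return level[0]

team_index = merkle_root(["a.py:v1", "b.py:v1"])
local      = merkle_root(["a.py:v1", "b.py:v1"])
edited     = merkle_root(["a.py:v2", "b.py:v1"])
print(team_index == local)   # True  -> a teammate's index can be reused wholesale
print(team_index == edited)  # False -> something changed; descend to find what
```

In a real tree over directories, comparing child hashes level by level narrows the mismatch down to the specific files that need re-indexing.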
This article explains how Floe improves the performance of geo joins by using H3 indexes. Traditional spatial joins can be slow due to their quadratic complexity, but with H3, the process becomes a fast equi-join through a filtering step that reduces the number of candidates. The result is a significant speedup in geospatial queries.
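The cell-index idea can be sketched with a plain square grid standing in for H3's hexagonal cells (a simplification; the real scheme uses H3 cell IDs at a chosen resolution). Mapping every point to a discrete cell ID turns an O(n·m) pairwise check into a hash-based equi-join, followed by an exact refinement on the few surviving candidates.

```python
from collections import defaultdict

CELL = 1.0  # cell size in degrees (hypothetical resolution)

def cell_id(lat, lon):
    return (int(lat // CELL), int(lon // CELL))

def geo_join(left, right, max_deg=0.1):
    """Proximity-join two point lists via a coarse cell filter."""
    buckets = defaultdict(list)
    for p in right:
        buckets[cell_id(*p)].append(p)
    out = []
    for p in left:
        ci, cj = cell_id(*p)
        # probe the point's cell and its 8 neighbours, since points near
        # a cell border may match across it
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                for q in buckets.get((ci + di, cj + dj), ()):
                    # exact check runs only on the reduced candidate set
                    if abs(p[0] - q[0]) <= max_deg and abs(p[1] - q[1]) <= max_deg:
                        out.append((p, q))
    return out

pairs = geo_join([(10.05, 20.05)], [(10.08, 20.02), (50.0, 50.0)])
print(pairs)  # only the nearby right-hand point survives the filter
```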
This article explains the new skip scan feature in PostgreSQL 18, which improves query performance by allowing the database to bypass unnecessary index entries. It details the setup process, how btree indexes work, and provides examples showing significant performance gains.
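The idea behind a skip scan can be modelled in a few lines (a simplified sketch, not Postgres internals): an index on (region, price) is a sorted list of tuples, and a query filtering only on price can't seek on the leading column, so a plain scan touches every entry. A skip scan instead jumps to each distinct leading value and range-probes within it.

```python
from bisect import bisect_left, bisect_right

# Hypothetical composite index on (region, price).
index = sorted([("eu", p) for p in range(1000)] +
               [("us", p) for p in range(1000)])

def skip_scan(idx, lo, hi):
    """Return entries with lo <= price <= hi, skipping over leading values."""
    out, probes, i = [], 0, 0
    while i < len(idx):
        region = idx[i][0]
        # binary-search the price range inside this region's run
        start = bisect_left(idx, (region, lo)); probes += 1
        end = bisect_right(idx, (region, hi))
        out.extend(idx[start:end])
        # skip directly past the rest of this region's entries
        i = bisect_right(idx, (region, float("inf")))
    return out, probes

rows, probes = skip_scan(index, 10, 12)
print(len(rows), probes)  # 6 matching entries found with only 2 range probes
```

With 2,000 index entries, the scan does one probe per distinct region instead of walking every entry, which is where the reported gains come from when the leading column is low-cardinality.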
Allocating too much memory to Postgres can actually slow down performance, especially during index builds. The author explains how exceeding certain memory thresholds can lead to inefficient data processing and increased write operations, which negatively impact speed. It's better to use modest memory settings and adjust only based on proven benefits.
Aiven has released PostgreSQL 18, which features significant performance improvements and new functionalities like asynchronous I/O, enhanced JOIN and GROUP BY operations, and parallel GIN index creation. This version allows more flexibility in schema evolution and smarter indexing with skip scans. Users can try PostgreSQL 18 with a free trial at Aiven.
Discord outlines its innovative approach to indexing trillions of messages, focusing on the architecture that enables efficient retrieval and storage. The platform leverages advanced technologies to ensure users can access relevant content quickly while maintaining high performance and scalability.
PostgreSQL 18 introduces significant improvements to the btree_gist extension, primarily through the implementation of sortsupport, which enhances index building efficiency. These updates enable better performance for use cases such as nearest-neighbour search and exclusion constraints, offering notable gains in query throughput compared to previous versions.
The article explores the use of custom ICU collations with PostgreSQL's citext data type, highlighting performance comparisons between equality, range, and pattern matching operations. It concludes that while custom collations are superior for equality and range queries, citext is more practical for pattern matching until better index support for nondeterministic collations is achieved.
The Marginalia Search index has undergone significant redesign to enhance performance through new data structures optimized for modern hardware, increasing the index size from 350 million to 800 million documents. The article discusses the challenges faced in query performance and the implications of NVMe SSD characteristics, as well as the transition from B-trees to deterministic block-based skip lists for improved efficiency in document retrieval.
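A deterministic block-based skip list can be sketched like this (an illustrative structure, not Marginalia's actual code): sorted document IDs are split into fixed-size blocks sized to the drive's read granularity, and an upper level stores each block's first key, so a lookup binary-searches the small in-memory level and then reads exactly one block.

```python
from bisect import bisect_right

BLOCK = 4  # entries per block (stands in for one NVMe-friendly read unit)

def build(doc_ids):
    blocks = [doc_ids[i:i + BLOCK] for i in range(0, len(doc_ids), BLOCK)]
    skips = [b[0] for b in blocks]     # first key of each block, kept in memory
    return skips, blocks

def contains(skips, blocks, key):
    i = bisect_right(skips, key) - 1   # which block could hold the key?
    return i >= 0 and key in blocks[i] # one "disk read" of that block

skips, blocks = build(sorted([3, 7, 8, 15, 21, 22, 30, 41, 57]))
print(contains(skips, blocks, 21), contains(skips, blocks, 20))  # True False
```

Unlike a probabilistic skip list, the block boundaries are fixed and predictable, which suits the large sequential reads that NVMe SSDs favour.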
The article discusses the advantages of indexing JSONB data types in PostgreSQL, emphasizing improved query performance and efficient data retrieval. It provides practical examples and techniques for creating indexes, as well as considerations for maintaining performance in applications that utilize JSONB fields.
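The core move is indexable in miniature; here SQLite's JSON functions stand in for Postgres JSONB (in Postgres the analogues are a btree index on `(data->>'tag')` or a GIN index on the whole column). The table and field names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, data TEXT);
    -- Expression index on a JSON field: lookups on it can use the index
    -- instead of parsing every document.
    CREATE INDEX idx_events_tag ON events (json_extract(data, '$.tag'));
""")
conn.executemany("INSERT INTO events (data) VALUES (?)",
                 [('{"tag": "signup", "n": 1}',),
                  ('{"tag": "login", "n": 2}',)])

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM events "
    "WHERE json_extract(data, '$.tag') = 'login'").fetchall()
print(plan)  # the plan should reference idx_events_tag
rows = conn.execute("SELECT id FROM events "
                    "WHERE json_extract(data, '$.tag') = 'login'").fetchall()
print(rows)
```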
The article discusses techniques Cursor uses for efficiently indexing codebases, which can significantly enhance navigation and searching capabilities. It emphasizes the importance of structured indexing to improve the speed and accuracy of code retrieval, making it easier for developers to work with large codebases.
ClickHouse introduces its capabilities in full-text search, highlighting the efficiency and performance improvements it offers over traditional search solutions. The article discusses various features, including indexing and query optimization, that enhance the user experience for searching large datasets. Additionally, it covers practical use cases and implementation strategies for developers.
PostgreSQL's Index Only Scan improves query performance by answering queries entirely from the index, avoiding fetches from the table heap. It requires suitable index types and query shapes to kick in, and a covering index, which stores additional columns directly in the index, extends the cases where it applies. Understanding these features is crucial for backend developers working with PostgreSQL databases.
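A minimal demonstration of a covering index, with SQLite standing in for Postgres (the Postgres equivalent is `CREATE INDEX ... INCLUDE (...)`): because both the filter column and the selected column live in the index, the query is answered from the index alone, with no table access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer INTEGER, total REAL);
    -- Both customer (the filter) and total (the output) are in the index,
    -- so the table itself never needs to be touched.
    CREATE INDEX idx_cust_total ON orders (customer, total);
""")
conn.executemany("INSERT INTO orders (customer, total) VALUES (?, ?)",
                 [(1, 9.5), (1, 20.0), (2, 7.25)])

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT total FROM orders WHERE customer = 1"
).fetchall()
print(plan)  # detail reads "... USING COVERING INDEX idx_cust_total ..."
```

In Postgres the same effect additionally depends on the visibility map being reasonably current, which is why freshly vacuumed tables show the biggest wins.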
User-defined indexes can be embedded within Apache Parquet files, enhancing query performance without compatibility issues. By utilizing existing footer metadata and offset addressing, developers can create custom indexes, such as distinct value indexes, to improve data pruning efficiency, particularly for columns with limited distinct values. The article provides a practical example of implementing such an index using Apache DataFusion.
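The pruning logic such an index enables can be modelled in plain Python (a sketch only; the article itself serializes the index into the Parquet footer and reads it through Apache DataFusion). For a low-cardinality column, storing the set of distinct values per row group lets a reader skip whole row groups that cannot contain the queried value.

```python
# Hypothetical row groups, each carrying a distinct-value index for
# the low-cardinality "country" column.
row_groups = [
    {"rows": [("us", 1), ("us", 2)], "distinct_country": {"us"}},
    {"rows": [("de", 3), ("fr", 4)], "distinct_country": {"de", "fr"}},
    {"rows": [("us", 5), ("de", 6)], "distinct_country": {"us", "de"}},
]

def scan(groups, country):
    """Read only the row groups whose distinct-value index may match."""
    scanned, hits = 0, []
    for g in groups:
        if country not in g["distinct_country"]:
            continue                  # pruned without touching the row data
        scanned += 1
        hits += [r for r in g["rows"] if r[0] == country]
    return scanned, hits

scanned, hits = scan(row_groups, "fr")
print(scanned, hits)  # only 1 of 3 row groups is actually read
```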
The article explores the differences in indexing between traditional relational databases and open table formats like Apache Iceberg and Delta Lake, emphasizing the challenges and limitations of adding secondary indexes to optimize query performance in analytical workloads. It highlights the importance of data organization and auxiliary structures in determining read efficiency, rather than relying solely on traditional indexing methods.
Data types significantly influence the performance and efficiency of indexing in PostgreSQL. The article explores how different data types, such as integers, floating points, and text, affect the time required to create indexes, emphasizing the importance of choosing the right data type for optimal performance.
Understanding when to rebuild PostgreSQL indexes is crucial for maintaining database performance. The decision depends on index type, bloat levels, and performance metrics, with recommendations to use the `pgstattuple` extension to assess index health before initiating a rebuild. Regular automatic rebuilds are generally unnecessary and can waste resources.