Quit Emailing Yourself

# indexing → parquet

2 links tagged with all of: indexing + parquet


Links

Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet

External indexes, metadata stores, catalogs, and caches can significantly speed up queries on Apache Parquet because they let a query engine locate the data it needs without extensive reparsing of the files themselves. The post explains how to build these components with Apache DataFusion so that custom data platforms can be optimized for specific use cases. It also highlights how Parquet's hierarchical data organization makes it compatible with a range of indexing strategies.

Saved by tldr-importer · Last saved October 29, 2025 · 6 min read

Tags: parquet · indexing · data-catalogs · query-optimization · metadata
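The pruning idea behind an external index can be sketched in a few lines of Python. This is a minimal illustration, assuming per-file min/max statistics for a single integer column have already been collected into a side structure; `FileStats`, `ExternalMinMaxIndex` and the file names are hypothetical and do not come from the post or from DataFusion's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Min/max of one column in one Parquet file (hypothetical external index entry)."""
    path: str
    min_value: int
    max_value: int


class ExternalMinMaxIndex:
    """Toy external index: column statistics stored outside the Parquet files,
    so a planner can skip files without opening their footers."""

    def __init__(self) -> None:
        self._entries: list[FileStats] = []

    def add(self, stats: FileStats) -> None:
        self._entries.append(stats)

    def candidate_files(self, lo: int, hi: int) -> list[str]:
        """Return files whose [min, max] range overlaps the predicate lo <= value <= hi."""
        return [
            e.path
            for e in self._entries
            if e.max_value >= lo and e.min_value <= hi
        ]


index = ExternalMinMaxIndex()
index.add(FileStats("part-0.parquet", 0, 99))
index.add(FileStats("part-1.parquet", 100, 199))
index.add(FileStats("part-2.parquet", 200, 299))

# Predicate: value BETWEEN 150 AND 210, so part-0.parquet can be skipped entirely.
print(index.candidate_files(150, 210))  # ['part-1.parquet', 'part-2.parquet']
```

Because the statistics live outside the files, `part-0.parquet` is discarded without any I/O against it; a real platform would persist such an index in a metadata store or catalog rather than in process memory.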

Embedding User-Defined Indexes in Apache Parquet Files

User-defined indexes can be embedded within Apache Parquet files to speed up queries without breaking compatibility: readers that do not know about the index simply ignore it. By storing the index bytes inside the file and recording their offset in the existing footer metadata, developers can build custom indexes, such as a distinct value index, that improve data pruning, particularly for columns with few distinct values. The article walks through a practical implementation of such an index using Apache DataFusion.

Saved by tldr-importer · Last saved October 29, 2025 · 6 min read

Tags: parquet · indexing · datafusion · performance · analytics
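The embedding technique (index bytes placed inside the file, their location recorded in footer key-value metadata) can be illustrated with a toy file layout in pure Python. This is a simplified sketch, not real Parquet and not the article's DataFusion code: it keeps Parquet's `PAR1` magic bytes and the 4-byte little-endian footer length before the trailing magic, but assumes a JSON footer instead of Parquet's Thrift-encoded metadata. The metadata keys `distinct_index_offset` / `distinct_index_length` and all function names are hypothetical.

```python
import json
import struct

MAGIC = b"PAR1"


def write_toy_file(rows: list[str], column: str) -> bytes:
    """Write a toy Parquet-like file: magic, data, embedded index, footer, footer length, magic.
    The footer is JSON here (real Parquet uses Thrift); key names are hypothetical."""
    buf = bytearray(MAGIC)
    data = "\n".join(rows).encode()
    buf += data
    # Embed the user-defined distinct-value index after the data region.
    index_bytes = json.dumps(sorted(set(rows))).encode()
    index_offset = len(buf)
    buf += index_bytes
    footer = {
        "column": column,
        "data_offset": len(MAGIC),
        "data_length": len(data),
        # Key-value metadata values are strings, as in Parquet's footer.
        "key_value_metadata": {
            "distinct_index_offset": str(index_offset),
            "distinct_index_length": str(len(index_bytes)),
        },
    }
    footer_bytes = json.dumps(footer).encode()
    buf += footer_bytes
    buf += struct.pack("<I", len(footer_bytes))
    buf += MAGIC
    return bytes(buf)


def read_footer(blob: bytes) -> dict:
    # The footer length is stored in the 4 bytes just before the trailing magic.
    (footer_len,) = struct.unpack("<I", blob[-8:-4])
    return json.loads(blob[-8 - footer_len:-8])


def may_contain(blob: bytes, value: str) -> bool:
    """Return False only when the embedded index proves the value is absent."""
    kv = read_footer(blob)["key_value_metadata"]
    if "distinct_index_offset" not in kv:
        return True  # no index present: cannot prune, must scan
    off = int(kv["distinct_index_offset"])
    length = int(kv["distinct_index_length"])
    distinct = set(json.loads(blob[off:off + length]))
    return value in distinct


blob = write_toy_file(["us-east", "us-west", "us-east"], "region")
print(may_contain(blob, "eu-central"))  # False: the file can be skipped
print(may_contain(blob, "us-west"))     # True: the file must be read
```

A reader unaware of the extra keys would parse the footer and data exactly as before and never touch the index bytes, which is what keeps this approach compatible.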