Quit Emailing Yourself

3 links tagged with all of: metadata + query-optimization

Links

Deep Dive Into Hudi's Indexing Subsystem (Part 2 of 2) | Apache Hudi

This article explains Hudi's advanced indexing features, focusing on the record index and secondary indexes for efficient query processing. It also covers expression indexes, which serve queries that filter on transformed column values, and the async indexing process, which builds indexes in the background without blocking ongoing writes.

Saved by tldr-importer · Last saved February 14, 2026 · 6 min read

Tags: indexing, hudi, query-optimization, async-indexing, metadata

Making Sense of Apache Iceberg Statistics

Apache Iceberg's statistics are central to query performance because they enable data skipping and efficient query planning. The article describes the different types of statistics, including data-level and metadata-level stats, what each one does, and how to configure them for large-scale analytics environments. Understanding these statistics helps users tune their systems as workloads evolve.

Saved by tldr-importer · Last saved October 29, 2025 · 4 min read

Tags: apache-iceberg, statistics, data-analytics, query-optimization, metadata

Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet

External indexes, metadata stores, catalogs, and caches can significantly improve query performance on Apache Parquet by allowing efficient data retrieval without extensive reparsing. The post discusses how to implement these components with Apache DataFusion when building custom data platforms for specific use cases. It also highlights the advantages of Parquet's hierarchical data organization and its compatibility with a range of indexing strategies.

Saved by tldr-importer · Last saved October 29, 2025 · 6 min read

Tags: parquet, indexing, data-catalogs, query-optimization, metadata