3 links tagged with all of: metadata + query-optimization
Links
This article explains Hudi's advanced indexing features, focusing on record and secondary indexes for efficient query processing. It also covers expression indexes, which serve queries that filter on transformed column values, and asynchronous indexing, which builds indexes in the background without blocking ongoing writes.
Apache Iceberg's statistics are central to query performance: they enable data skipping and more efficient query planning. The article details the different types of statistics, including data-level and metadata-level stats, what each is used for, and how to configure them for large-scale analytics environments. Understanding these statistics helps users tune their systems as workloads evolve.
External indexes, metadata stores, catalogs, and caches can significantly enhance query performance on Apache Parquet by allowing efficient data retrieval without the need for extensive reparsing. The blog discusses how to implement these components using Apache DataFusion to optimize custom data platforms for specific use cases. It also highlights the advantages of Parquet's hierarchical data organization and its compatibility with various indexing strategies.
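The data-skipping idea that runs through the summaries above can be illustrated with a minimal Python sketch. It is not the API of Hudi, Iceberg, or DataFusion: the `FileStats` class, the `prune_files` helper, and the file names are hypothetical, and the only assumption is that a minimum and maximum value per file are stored somewhere outside the data files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file column statistics kept outside the data file."""
    path: str
    min_value: int
    max_value: int


def prune_files(stats, lower, upper):
    """Return paths of files whose [min, max] range overlaps [lower, upper].

    Files whose range lies entirely outside the query range cannot contain
    matching rows, so they are skipped without being opened.
    """
    return [
        s.path
        for s in stats
        if s.max_value >= lower and s.min_value <= upper
    ]


# Illustrative statistics for three files (made-up values).
stats = [
    FileStats("part-0.parquet", 0, 99),
    FileStats("part-1.parquet", 100, 199),
    FileStats("part-2.parquet", 200, 299),
]

# Query: rows whose value is between 150 and 210 inclusive.
candidates = prune_files(stats, 150, 210)
print(candidates)
```

Only the files returned by `prune_files` would need to be read. Real systems apply the same overlap test at finer granularity as well, for example per row group or per page.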