Links
parqeye is a command-line tool for inspecting Parquet files. It shows a file's contents, schema, and metadata directly in the terminal, with a tab-based interface for switching between views. You can visualize data, explore schemas, and read row group statistics.
External indexes, metadata stores, catalogs, and caches can significantly improve query performance on Apache Parquet by letting the engine retrieve data without extensive reparsing. The blog post discusses how to implement these components with Apache DataFusion to tune a custom data platform for specific use cases. It also highlights the advantages of Parquet's hierarchical data organization and its compatibility with various indexing strategies.
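As a rough illustration of the external-index idea, the sketch below keeps per-row-group min/max statistics outside the files and uses them to decide which row groups a range query needs to read. It is plain Python, not DataFusion's API; `RowGroupStats`, `prune`, the file names, and the values are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowGroupStats:
    """One entry of a hypothetical external index: min/max of one column."""
    file: str
    row_group: int
    min_value: int
    max_value: int


def prune(index, lo, hi):
    """Return (file, row_group) pairs whose [min, max] range overlaps [lo, hi]."""
    return [
        (s.file, s.row_group)
        for s in index
        if s.max_value >= lo and s.min_value <= hi
    ]


# Made-up statistics; a real index would be built from the files' metadata.
index = [
    RowGroupStats("a.parquet", 0, 0, 99),
    RowGroupStats("a.parquet", 1, 100, 199),
    RowGroupStats("b.parquet", 0, 150, 300),
]

# Only row groups that might contain values in [120, 160] remain.
candidates = prune(index, 120, 160)
print(candidates)
```

Row groups whose range cannot overlap the predicate are skipped without opening the file, which is the kind of saving the summary refers to.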