Saved February 14, 2026
Do you care about this?
Yelp outlines its approach to processing Amazon S3 server-access logs at scale, addressing challenges like high log volume and storage costs. They now compress logs into Parquet files, greatly reducing storage needs and improving query performance for analytics tasks. This system supports various operational use cases, from debugging to cost analysis.
If you do, here's more
Yelp developed a scalable and cost-effective system for processing Amazon S3 server access logs (SAL), tackling issues related to high log volume, storage costs, and query performance. Its S3 usage generates terabytes of access logs daily, which the system converts into compact Parquet-formatted files. Periodic compaction reduces storage use by about 85% and the number of log objects by more than 99.99%. The resulting tables support efficient analytics, enabling fast lookups for tasks like permission debugging and cost attribution.
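To make the reported reductions concrete, here is a small arithmetic sketch. The reduction ratios (about 85% of storage, more than 99.99% of objects) come from the summary above; the input figures of 10 TB and 100 million raw log objects per day are hypothetical values chosen only for illustration, not numbers reported by Yelp.

```python
def after_compaction(raw_bytes, raw_objects,
                     size_reduction=0.85, object_reduction=0.9999):
    """Estimate what remains after compaction, given reduction ratios."""
    remaining_bytes = raw_bytes * (1 - size_reduction)
    remaining_objects = raw_objects * (1 - object_reduction)
    return remaining_bytes, remaining_objects


TB = 10**12

# Hypothetical daily input: 10 TB of raw logs spread over 100 million objects.
remaining_bytes, remaining_objects = after_compaction(10 * TB, 100_000_000)

print(f"{remaining_bytes / TB:.1f} TB remaining")
print(f"about {round(remaining_objects):,} objects remaining")
```

Under these assumed inputs, roughly 1.5 TB and at most about 10,000 objects would remain. The object count shrinks far more than the byte count because compaction merges many small log objects into a few large Parquet files.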
The architecture utilizes AWS Glue Data Catalog to manage schemas across multiple accounts, along with a combination of scheduled batch jobs, Lambda functions, and partition-projection tables for automated log ingestion. It's designed to handle delayed or duplicate log deliveries, implementing idempotent inserts and tagging old log objects for lifecycle expiration. This setup supports various operational needs, such as querying access patterns for debugging and aggregating API usage for cost analysis.
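The idea of an idempotent insert that tolerates delayed or duplicate deliveries can be sketched in a few lines of plain Python. The summary does not say how Yelp implements it, so everything here is an assumption: the in-memory `table`, the `partition_key` and `merge_batch` helpers, the record field names, and the choice of the S3 request ID as the deduplication key are all hypothetical.

```python
from collections import defaultdict


def partition_key(record):
    # Hypothetical layout: one partition per bucket and calendar day.
    return (record["bucket"], record["timestamp"][:10])


def merge_batch(table, batch):
    """Merge log records into table (partition -> {request_id: record}).

    A record whose request ID already exists in its partition is skipped,
    so replaying the same batch leaves the table unchanged.
    """
    for record in batch:
        table[partition_key(record)].setdefault(record["request_id"], record)
    return table


table = defaultdict(dict)

batch = [
    {"bucket": "example-bucket", "timestamp": "2025-01-01T12:00:00Z",
     "request_id": "A1", "operation": "REST.GET.OBJECT"},
    {"bucket": "example-bucket", "timestamp": "2025-01-01T12:00:05Z",
     "request_id": "B2", "operation": "REST.PUT.OBJECT"},
]

merge_batch(table, batch)
merge_batch(table, batch)  # duplicate delivery: no new rows

# A late-arriving record still lands in the partition for its own day.
late = [{"bucket": "example-bucket", "timestamp": "2025-01-01T23:59:59Z",
         "request_id": "C3", "operation": "REST.GET.OBJECT"}]
merge_batch(table, late)

day = table[("example-bucket", "2025-01-01")]
print(sorted(day))
```

Keying on a stable per-request identifier means the job can be rerun safely after a failure or a redelivery. In a real pipeline the same effect would more likely come from a deduplicating query or a partition overwrite than from an in-memory dictionary.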
Yelp's work demonstrates that object-level logging on S3 can be both efficient and manageable, offering a reference model for others facing similar challenges. Other companies like Upsolver provide tools that align with Yelp's strategy by simplifying the ingestion and transformation of access logs. AWS has also shared its architecture for processing SAL, which mirrors Yelp's design. For real-time needs, AWS suggests a method for ingesting logs into OpenSearch, providing immediate insights at the expense of some long-term storage efficiency.