Saved February 14, 2026
This article outlines the evolution of a data pipeline from JSON to AVRO for Change Data Capture (CDC). As the business expanded, the limitations of JSON became apparent, leading to the adoption of AVRO, which improved performance, reduced storage costs, and streamlined schema evolution.
Two years ago, Fresha launched a CDC pipeline using Postgres, Debezium, Kafka, and Snowpipe Streaming to Snowflake. Initially, it operated smoothly with data in JSON format, allowing near real-time data ingestion. However, as the business expanded, the limitations of JSON became apparent. Adding new columns in Postgres required manual updates to the Snowflake models, causing operational headaches. The performance of queries on JSON data lagged behind that of relational tables, resulting in longer compilation times and increased data scans.
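To illustrate the maintenance burden, here is a minimal Python sketch of a Debezium-style CDC event in JSON; the table and column names are hypothetical. Because each change arrives as an untyped document, downstream models that extract fields by name must be edited by hand whenever a new Postgres column appears.

```python
import json

# A simplified Debezium-style change event (hypothetical columns).
# Debezium wraps each row change in an envelope with "before"/"after" images.
event = json.loads("""
{
  "op": "u",
  "before": {"id": 42, "email": "old@example.com"},
  "after":  {"id": 42, "email": "new@example.com", "loyalty_tier": "gold"}
}
""")

# Models that pick fields explicitly must be updated whenever a column
# such as "loyalty_tier" is added upstream -- nothing enforces the schema.
known_columns = ["id", "email"]
row = {col: event["after"].get(col) for col in known_columns}

# New columns are silently dropped until someone notices and edits the model.
unmapped = set(event["after"]) - set(known_columns)
print(row)       # {'id': 42, 'email': 'new@example.com'}
print(unmapped)  # {'loyalty_tier'}
```

With AVRO plus a schema registry, that drift is caught (or evolved) at the serialization layer instead of surfacing as a silently incomplete model.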
When Snowflake introduced AVRO support with schema evolution, Fresha seized the opportunity to improve the pipeline. They added a Schema Registry and configured the Debezium connectors for AVRO serialization. The shift cut the data footprint sharply: daily volume dropped from 9 GB of JSON to under 1 GB of AVRO, with correspondingly faster network transfers and lower latency. In Snowflake, the ingestion tables became columnar and typed directly from the Schema Registry, eliminating the need for complex operations on JSON and minimizing maintenance.
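A rough sketch of why a schema-based binary format shrinks the payload: field names, quoting, and text-encoded numbers disappear from every record, since the reader already knows the field order and types. The snippet below uses Python's `struct` as a simplified stand-in for AVRO encoding (real AVRO uses variable-length zig-zag integers and a Schema Registry, which this does not model); the record fields are hypothetical.

```python
import json
import struct

record = {"id": 123456, "amount_cents": 4999, "status": "paid"}

# JSON repeats every field name, plus quotes and punctuation, per record.
json_bytes = json.dumps(record).encode()

# With a shared schema only the values travel: two 64-bit ints, then a
# length-prefixed string. (Stand-in only; actual AVRO encoding differs.)
status = record["status"].encode()
binary_bytes = struct.pack(f"<qqB{len(status)}s",
                           record["id"], record["amount_cents"],
                           len(status), status)

print(len(json_bytes), len(binary_bytes))  # 54 vs 21 bytes
```

Multiplied across billions of CDC events per day, this per-record overhead is what the 9 GB-to-under-1 GB drop reflects.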
For the same 6.3 billion rows, the old JSON format consumed 546.9 GB of storage, while the new AVRO format required only 394.3 GB, roughly a 28% reduction. This transformation not only streamlined the data pipeline but also improved query efficiency and automated schema evolution, allowing Fresha to focus on analytics without the burden of constant model updates.
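The savings work out as follows, using the figures quoted above and taking the daily AVRO volume as 1 GB (the article says "under 1 GB", so the real transfer saving is at least this):

```python
# Storage at rest for the same 6.3 billion rows.
json_gb, avro_gb = 546.9, 394.3
at_rest_saving = (json_gb - avro_gb) / json_gb
print(f"storage saved at rest: {at_rest_saving:.1%}")   # 27.9%

# Daily pipeline volume (AVRO taken as 1 GB; article says "under 1 GB").
daily_json_gb, daily_avro_gb = 9.0, 1.0
transfer_saving = (daily_json_gb - daily_avro_gb) / daily_json_gb
print(f"daily transfer saved:  {transfer_saving:.1%}")  # 88.9%
```

The gap between the two numbers is expected: Snowflake compresses both formats at rest, so the on-disk delta is smaller than the raw serialization delta on the wire.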