Colocating Input Partitions with Kafka Streams When Consuming Multiple Topics: Sub-Topology Matters! | by Vishal Sharma | Expedia Group Technology | Medium

3 min read | Saved February 14, 2026

kafka, kafka-streams, caching, architecture, performance

Do you care about this?

This article describes how Expedia Group made a Kafka Streams application process identical keys from two topics on the same instance. The two topics were handled by separate sub-topologies, so matching partitions could be assigned to different instances. Connecting both processing branches to a shared state store merged them into one sub-topology, which restored cache locality and cut redundant API calls.

If you do, here's more

Expedia Group's application consumed events from two Kafka topics that had the same number of partitions and carried similarly keyed records. To avoid redundant API calls when transforming keys, each instance kept a local in-memory cache. The team expected Kafka Streams to assign the same partition index of both topics to the same instance, so that a key cached while processing one topic could be reused for the other. In production, however, identical keys were processed by different instances, which defeated the cache and caused unnecessary API calls. The cause lies in how Kafka Streams creates tasks: the two processing branches were not connected, so they formed separate sub-topologies, each with its own task per partition, and tasks from different sub-topologies can be assigned to different instances.
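The mismatch can be illustrated with a toy model in plain Python (not Kafka Streams code). It assumes two unconnected branches, so each topic belongs to its own sub-topology and gets one task per partition, with task ids in the `<sub-topology>_<partition>` form that Kafka Streams uses. The topic names, instance names, partition count, and the round-robin assignor are all hypothetical stand-ins; the real assignor is more sophisticated, but it likewise makes no promise to co-locate tasks from different sub-topologies.

```python
from itertools import cycle

# Hypothetical names and sizes: the article does not name its topics or instances.
PARTITIONS = 4
SUB_TOPOLOGIES = {0: "topic-a", 1: "topic-b"}  # sub-topology id -> its source topic
INSTANCES = ["instance-0", "instance-1", "instance-2"]


def build_tasks(sub_topologies, partitions):
    """Create one task per (sub-topology, partition), e.g. '0_2' or '1_0'."""
    return [
        (f"{sub_id}_{p}", topic, p)
        for sub_id, topic in sorted(sub_topologies.items())
        for p in range(partitions)
    ]


def assign_round_robin(tasks, instances):
    """Stand-in assignor (an assumption): deal tasks out to instances in turn."""
    owner = {}
    for (_task_id, topic, partition), instance in zip(tasks, cycle(instances)):
        owner[(topic, partition)] = instance
    return owner


owner = assign_round_robin(build_tasks(SUB_TOPOLOGIES, PARTITIONS), INSTANCES)
for p in range(PARTITIONS):
    a, b = owner[("topic-a", p)], owner[("topic-b", p)]
    print(f"partition {p}: topic-a -> {a}, topic-b -> {b}, colocated={a == b}")
```

In this model no partition index ends up with both topics on the same instance, so a cache filled while processing `topic-a` is never hit by the matching records from `topic-b`.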

To address this issue, the team replaced the local cache with a Kafka Streams state store, which is local to the task that owns it. Connecting both processing branches to this shared state store merged the two sub-topologies into a single sub-topology. Kafka Streams then assigned the matching partitions of both topics to the same task, so identical keys were consistently processed on the same instance. The result was fewer external API calls and improved system performance.
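Why a shared store changes the grouping can be sketched as a connected-components problem. The toy model below, again plain Python rather than Kafka Streams code, treats sources, processors, and stores as graph nodes and groups them with a small union-find; each resulting group stands for one sub-topology. All node names (`topic-a`, `transform-a`, `key-cache`, and so on) are hypothetical.

```python
def sub_topologies(edges):
    """Group nodes into connected components; each component models one sub-topology."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    for a, b in edges:
        parent[find(a)] = find(b)  # union the two components

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=sorted)


# Two independent branches: each source feeds its own processor.
BRANCHES = [
    ("source:topic-a", "processor:transform-a"),
    ("source:topic-b", "processor:transform-b"),
]
# Both processors are connected to the same (hypothetical) state store.
SHARED_STORE = [
    ("processor:transform-a", "store:key-cache"),
    ("processor:transform-b", "store:key-cache"),
]

before = sub_topologies(BRANCHES)
after = sub_topologies(BRANCHES + SHARED_STORE)
print(len(before), "sub-topologies without the shared store")
print(len(after), "sub-topology with the shared store")
```

Once both sources sit in one group, a single task per partition index reads that partition from both topics, which is the co-location the team wanted. In a real application the store would typically be registered with `StreamsBuilder.addStateStore` and named when attaching each processor, and the output of `Topology.describe()` can be used to check how many sub-topologies were actually created.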

The experience shows that topology design determines partition assignment in Kafka Streams. When an application consumes multiple topics without an explicit join, matching partitions are co-located only if the processing branches belong to the same sub-topology. A shared state store is therefore useful not only for sharing data but also for controlling where records are processed.
