Quit Emailing Yourself

6x Faster ML Inference: Why Online >> Batch

The article describes how a team converted a batch machine learning inference system into a real-time (online) one to keep up with explosive user growth, cutting latency by 5.8x while keeping reliability above 99.9%. Key optimizations included moving data access to Redis, compiling models into native C binaries, and switching to gRPC for more efficient data transmission. Together, these changes let the system serve millions of predictions at low latency and capture significant revenue that would otherwise have been lost.
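To make the batch-versus-online contrast concrete, here is a minimal Python sketch, not the article's implementation. It assumes a plain dict stands in for the Redis feature store and a small linear model stands in for the compiled model; `LinearModel`, `batch_score`, and `online_score` are hypothetical names. It illustrates one plausible way batch scoring falls behind under rapid user growth: users who appear after the batch run have no precomputed prediction, while the online path scores them at request time.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical stand-in for a Redis feature store: user_id -> feature vector.
FeatureStore = Dict[str, List[float]]


@dataclass
class LinearModel:
    """Hypothetical stand-in for a compiled model: a plain dot product plus bias."""
    weights: List[float]
    bias: float

    def predict(self, features: List[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


def batch_score(store: FeatureStore, model: LinearModel) -> Dict[str, float]:
    """Batch style: precompute a score for every user known at job time."""
    return {user: model.predict(feats) for user, feats in store.items()}


def online_score(user: str, store: FeatureStore, model: LinearModel,
                 default: List[float]) -> float:
    """Online style: one key lookup at request time, then score immediately.
    Users with no stored features fall back to a default vector."""
    return model.predict(store.get(user, default))


store: FeatureStore = {"alice": [1.0, 2.0], "bob": [0.5, 0.0]}
model = LinearModel(weights=[2.0, 1.0], bias=0.5)
precomputed = batch_score(store, model)   # {"alice": 4.5, "bob": 1.5}

store["carol"] = [2.0, 1.0]               # a user who signs up after the batch run
print("carol" in precomputed)             # False: no precomputed score for her
print(online_score("carol", store, model, default=[0.0, 0.0]))  # 5.5
```

The trade-off is that the online path pays for a lookup and a model evaluation on every request, which is presumably why the optimizations listed above target data access, model execution, and transport.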

Saved by tldr-importer · Last saved October 29, 2025 · 6 min read

Tags: machine-learning, performance, latency, optimization, real-time

Showing Delivery SLA at 10x Scale with 1/10th Latency

Flipkart's Promise team optimized how delivery dates are calculated for its Search and Browse (S&B) page, bringing latency down to 100 ms for 100 items while scaling to 10x the current queries per second (QPS). The solution caches source and vendor capacities and decouples their storage, improving both the accuracy and the efficiency of real-time delivery date calculation. The result is a better user experience that does not compromise on performance metrics during periods of high demand.
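The caching idea can be illustrated with a minimal Python sketch that assumes details the summary does not give. `TTLCache`, `Item`, and `promise_dates` are hypothetical names; the capacities are plain integers in dicts standing in for the real backing stores; and the rule "ship today if both source and vendor have capacity, otherwise slip one day" is an invented simplification, not Flipkart's logic. Keeping source and vendor capacities in two separate caches, each with its own TTL, is one possible reading of "decoupling their storage": each can be refreshed on its own schedule.

```python
import time
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """Read-through cache: entries expire after ttl_seconds and are reloaded via loader."""

    def __init__(self, loader: Callable[[str], int], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[int, float]] = {}  # key -> (value, loaded_at)
        self.loads = 0  # number of reads that reached the backing store

    def get(self, key: str) -> int:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]  # fresh cache hit
        value = self._loader(key)  # miss or expired: read the backing store
        self.loads += 1
        self._entries[key] = (value, now)
        return value


@dataclass(frozen=True)
class Item:
    source_id: str     # hypothetical: location the item ships from
    vendor_id: str     # hypothetical: logistics vendor that carries it
    transit_days: int


def promise_dates(items: List[Item], source_caps: TTLCache,
                  vendor_caps: TTLCache, today: date) -> List[date]:
    """Hypothetical rule: ship today if both source and vendor have capacity left,
    otherwise slip one day; then add transit time."""
    dates = []
    for item in items:
        has_capacity = (source_caps.get(item.source_id) > 0
                        and vendor_caps.get(item.vendor_id) > 0)
        delay = 0 if has_capacity else 1
        dates.append(today + timedelta(days=delay + item.transit_days))
    return dates


# Source and vendor capacities live in separate stores and caches,
# so each can be refreshed on its own TTL.
source_db = {"S1": 5, "S2": 0}
vendor_db = {"V1": 3}
source_caps = TTLCache(source_db.__getitem__, ttl_seconds=60)
vendor_caps = TTLCache(vendor_db.__getitem__, ttl_seconds=10)

items = [Item("S1", "V1", 2)] * 50 + [Item("S2", "V1", 2)] * 50
dates = promise_dates(items, source_caps, vendor_caps, today=date(2025, 1, 1))
print(dates[0], dates[-1])                     # 2025-01-03 2025-01-04
print(source_caps.loads + vendor_caps.loads)   # 3 backing-store reads for 100 items
```

In this run, 100 items cost only three backing-store reads because repeated sources and vendors are served from cache. The trade-off is staleness: a capacity change can take up to one TTL to show up in the promised dates.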

Saved by tldr-importer · Last saved October 29, 2025 · 7 min read

Tags: delivery, optimization, e-commerce, latency, supply-chain
