2 links tagged with all of: monitoring + distributed-systems
Links
Netflix engineers presented a centralized platform for managing data deletion across various storage systems while ensuring durability, availability, and correctness. The platform has successfully deleted 76.8 billion rows without data loss, addressing challenges like data resurrection and resource spikes during deletion. Key recommendations emphasize the importance of rigorous validation and centralized monitoring.
Sharing a single Redis cache cluster across multiple services can cause significant problems: key eviction affects every service, and monitoring and debugging become harder. The approach may seem simpler at first, but it can create confusion and performance problems as the system scales. A shared cache is acceptable in some cases, but separate clusters are often better for reliability and clarity.