4 min read
|
Saved February 14, 2026
Do you care about this?
This article follows a new employee tasked with creating a Service Level Objective (SLO) for a service called Foo. The narrative highlights various issues that arise during the implementation, including misunderstandings about error metrics and the importance of accurate monitoring. Ultimately, the employee faces criticism for failing to meet expectations in a timely manner.
If you do, here's more
A new employee at Corp. Générique, referred to as Newguy, is tasked with implementing an availability Service Level Objective (SLO) for the Foo service. The target is four-nines (99.99%) availability, and the SLO is due for presentation at a Friday demo. After the presentation, it emerges that the SLO counts 4xx response codes as bad events, so the error budget appears exhausted even though no real incident occurred. The feedback points to a misunderstanding of how to interpret these codes: they often indicate user errors rather than service failures.
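The effect of the 4xx classification on the error budget can be illustrated with a minimal sketch. The request counts below are hypothetical, and treating 4xx responses as good events (rather than excluding them from the total) is an assumption made for illustration; the article does not say which approach the team preferred.

```python
SLO_TARGET = 0.9999  # four nines, the target named in the article


def availability(status_counts, count_4xx_as_bad):
    """Return good/total for a mapping of HTTP status code -> request count."""
    total = sum(status_counts.values())
    if total == 0:
        return None  # no traffic: availability is undefined, not 100%
    bad = sum(
        n
        for code, n in status_counts.items()
        if code >= 500 or (count_4xx_as_bad and 400 <= code < 500)
    )
    return (total - bad) / total


def budget_consumed(avail, target=SLO_TARGET):
    """Fraction of the error budget spent (1.0 means fully exhausted)."""
    return (1 - avail) / (1 - target)


# Hypothetical traffic: plenty of client errors, but not a single 5xx.
counts = {200: 99_000, 400: 100, 404: 900}

strict = availability(counts, count_4xx_as_bad=True)    # 4xx counted as bad
lenient = availability(counts, count_4xx_as_bad=False)  # only 5xx counted as bad
```

With these numbers, the strict variant reports 99% availability and spends the four-nines budget roughly a hundred times over, while the lenient variant reports no budget spent at all.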
As the conversation unfolds, it becomes clear that Newguy's SLO also failed to catch a significant four-hour outage, because the Foo service was not emitting any metrics during that time. The proposed solution is to use metrics from the load balancer instead. Another issue then arises: the Bar team reports problems with object creation in the web UI, and those failures also return 400 errors. Newguy is instructed to adjust the SLO to include these errors, since they stem from the UI, which is part of Foo's responsibility.
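Why a silent service hides an outage can be shown with a small sketch. The per-minute windows and their field names are hypothetical; the only thing taken from the article is the idea that the service emits nothing while it is down, whereas the load balancer still sees every request.

```python
def availability(windows):
    """Return good/total across a list of per-minute request windows."""
    total = sum(w["total"] for w in windows)
    good = sum(w["good"] for w in windows)
    return good / total if total else None


# Hypothetical load balancer view: 10 minutes of traffic, 100 requests each,
# with every request failing during minutes 3 to 6.
lb_windows = [
    {"minute": m, "total": 100, "good": 0 if 3 <= m < 7 else 100}
    for m in range(10)
]

# Hypothetical service view: while the service is down it emits no metrics,
# so the failing minutes are simply missing from its data.
service_windows = [w for w in lb_windows if w["good"] > 0]

service_view = availability(service_windows)  # outage is invisible
lb_view = availability(lb_windows)            # outage is counted
```

Measured from the service's own metrics, the outage disappears and availability looks perfect. Measured at the load balancer, the same period shows 60% availability.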
Despite attempts to rectify the SLO, the leadership team identifies further problems with its metrics, including a spike to 102% availability. This figure raises concerns about the accuracy and reliability of the data being reported. After two weeks of work on what was initially a simple two-point task, the ongoing issues lead to the conclusion that there is a skills mismatch for Newguy's role. HR gets involved and suggests that Newguy may not be suited for the position, given the numerous setbacks and the team's frustration with the unresolved problems.
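The article summary does not explain where the 102% figure came from. One way such a number can arise, assumed here purely for illustration, is when the good-event count and the total-event count come from different sources or time windows that do not line up. A minimal sanity check, with hypothetical counts, might look like this:

```python
def checked_availability(good, total):
    """Return good/total, rejecting inputs that cannot form a valid ratio."""
    if total <= 0:
        raise ValueError("no traffic recorded; availability is undefined")
    if good < 0 or good > total:
        raise ValueError(
            f"good={good} is outside 0..total={total}; "
            "the two counts are probably not measuring the same requests"
        )
    return good / total


# Hypothetical mismatch: 'good' is read from one source and 'total' from
# another, and the two disagree about how many requests there were.
good_from_source_a = 1020
total_from_source_b = 1000

naive = good_from_source_a / total_from_source_b  # 1.02, i.e. "102%"

try:
    checked_availability(good_from_source_a, total_from_source_b)
    rejected = False
except ValueError:
    rejected = True
```

Failing loudly on an impossible ratio is a design choice: it surfaces the data problem before the number reaches a dashboard, rather than clamping it to 100% and hiding the mismatch.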