Saved February 14, 2026
Do you care about this?
This article discusses how attrition among engineers, particularly in the context of AWS outages, is rarely acknowledged in public incident reports. Internal write-ups may reference attrition, but they usually concentrate on technical causes and leave out the broader organizational factors that contribute to incidents. The author argues that attrition is a significant risk factor for system reliability and should be treated like any other systemic risk.
If you do, here's more
The recent AWS outage sparked discussion about whether the loss of experienced engineers contributed to it. Corey Quinn, a well-known cloud economist, raised the point in his commentary, linking it to Amazon's announcement that it would lay off around 14,000 employees, including AWS staff. James Gosling, the creator of Java and a former AWS employee, echoed these concerns. The author does not assess the accuracy of these claims; instead, he explores how attrition is treated in incident reports, both public and internal.
Public incident reports typically avoid mentioning attrition because their primary goal is to reassure stakeholders that the technical issue is being resolved. Acknowledging attrition complicates that narrative, especially when it raises questions about the risks of layoffs. Because public documents never bring attrition up, companies remain free to deny that it increases incident risk. Internal reports, in contrast, can reference attrition, but they usually focus narrowly on technical failures. The author recalls internal write-ups from his career that did acknowledge a loss of expertise, particularly in cases where the original team no longer existed and others were left to manage critical services without sufficient knowledge.
Even though internal reports are free to address attrition, they rarely do. The author argues that tools like the Five Whys method are meant to uncover systemic issues, yet they often overlook organizational factors such as attrition. He likens attrition's role in incidents to that of smoking in lung cancer or climate change in severe weather: each increases risk but can't be pinned down as the sole cause of a specific event. The author asserts that organizational factors contribute to every major incident, and that their absence from reports reflects the limits of the questions being asked rather than the complexity of the incidents themselves.