7 min read | Saved February 14, 2026
Do you care about this?
This article summarizes LinkedIn's approach to improving HDFS block placement across its massive Hadoop clusters. It explains how adapting the block placement policy streamlined maintenance operations, cut unnecessary data re-replication, and preserved high data availability. The changes were driven by the difficulty of efficiently managing roughly 5 exabytes of data.
If you do, here's more
LinkedIn manages some of the largest Apache Hadoop clusters globally, holding around 5 exabytes of data and about 10 billion objects. These clusters are vital for processing vast amounts of offline data. To maintain 99.99% data availability, LinkedIn must frequently update software, firmware, and hardware, but coordinating that maintenance becomes increasingly complex as the clusters grow. The challenge lies in upgrading thousands of datanodes without causing significant disruption or violating compliance deadlines for security patches.
Initially, LinkedIn relied on HDFS's default block placement policy (BPP), which became inefficient as cluster size grew: taking nodes out for maintenance forced extensive block re-replication, straining the network and slowing operations. The two traditional approaches, decommissioning datanodes or putting them into maintenance mode, proved inadequate because of the load they place on the namenode and the network. Decommissioning could take hours, and maintenance mode, even with a minimum live-replica configuration, still caused slowdowns.
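The two traditional approaches correspond to standard HDFS settings. As a rough illustration (the property names below are real Hadoop settings, but the paths and values are illustrative, not LinkedIn's actual configuration), an hdfs-site.xml might contain:

```xml
<configuration>
  <!-- Decommissioning: datanodes listed in this exclude file are drained,
       which re-replicates every block they host before they go offline. -->
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/etc/hadoop/conf/dfs.exclude</value>
  </property>
  <!-- Maintenance mode: blocks stay readable as long as at least this many
       live replicas remain, but the reduced replica count can still slow I/O. -->
  <property>
    <name>dfs.namenode.maintenance.replication.min</name>
    <value>1</value>
  </property>
</configuration>
```

After editing the exclude file, the namenode is told to re-read it with `hdfs dfsadmin -refreshNodes`; it is this drain-and-re-replicate cycle that ties up the namenode and the network for hours on large clusters.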
To tackle these issues, LinkedIn adopted a new strategy built on upgrade domains, which partition datanodes into logical groups. Because the placement policy keeps each replica of a block in a different upgrade domain, an entire domain can be taken offline for maintenance without triggering any block re-replication. This makes far more efficient use of network resources and significantly speeds up maintenance. Each cluster now has 20 upgrade domains, and replicas remain spread across different racks as well, reducing the risk of data loss while streamlining operations.
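The placement rule can be sketched in a few lines. This is a minimal simulation, not LinkedIn's implementation (Hadoop ships the real logic in `BlockPlacementPolicyWithUpgradeDomain`); the `DataNode`, `assign_upgrade_domain`, and `place_replicas` names and the 6-rack/60-node cluster are hypothetical. The sketch greedily picks replica targets so that no two replicas share a rack or an upgrade domain, which is what lets a whole domain go down for maintenance at once.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class DataNode:
    name: str
    rack: str
    upgrade_domain: int

def assign_upgrade_domain(node_index: int, num_domains: int = 20) -> int:
    # Round-robin assignment; the article notes 20 upgrade domains per cluster.
    return node_index % num_domains

def place_replicas(nodes, replication=3, seed=None):
    """Pick `replication` datanodes such that no two chosen nodes share
    a rack or an upgrade domain (the upgrade-domain placement goal)."""
    rng = random.Random(seed)
    chosen, used_racks, used_domains = [], set(), set()
    for node in rng.sample(nodes, len(nodes)):  # shuffled candidate order
        if node.rack in used_racks or node.upgrade_domain in used_domains:
            continue  # would violate the rack or upgrade-domain constraint
        chosen.append(node)
        used_racks.add(node.rack)
        used_domains.add(node.upgrade_domain)
        if len(chosen) == replication:
            return chosen
    raise RuntimeError("not enough nodes to satisfy placement constraints")

# Hypothetical cluster: 60 datanodes across 6 racks and 20 upgrade domains.
nodes = [DataNode(f"dn{i}", f"rack{i % 6}", assign_upgrade_domain(i))
         for i in range(60)]
replicas = place_replicas(nodes, seed=42)
```

With every block placed this way, upgrading all datanodes in one domain removes at most one replica of any block, so the remaining replicas keep the data available and no re-replication traffic is needed.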