Enabling rack awareness for Hadoop cluster

In this article, we will learn how to enable rack awareness in Hadoop clusters. Assume that the cluster has a large number of nodes and that the nodes are placed in more than one rack. If we enable rack awareness, HDFS will not store all replicas of a block in one rack, so at least one replica of the block remains available for data processing if a rack fails.

The goal of rack awareness is to improve data availability and reduce network bandwidth usage.

1) Enabling rack awareness without Apache Ambari

In older versions of HDP, we used to enable rack awareness manually. Recent versions of Apache Ambari support rack awareness in the GUI.

Check the link on how to enable rack awareness manually. You will rarely need it, as most recent versions of Apache Ambari support this in the GUI.
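For readers who are curious what the manual approach looks like, here is a minimal sketch of a topology script. Hadoop can call an external script (configured through the net.topology.script.file.name property in core-site.xml), passing one or more hostnames or IP addresses as arguments and reading back one rack path per argument. The host-to-rack mapping below is hypothetical and only mirrors the example cluster in this article; a real script would use your own inventory.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop topology script (the mapping is a hypothetical example)."""
import sys

# Hypothetical mapping of hostnames to rack paths; replace with your own hosts.
RACK_BY_HOST = {
    "datanode3": "/rack-1",
}

# Rack used for any host that is not listed in the mapping.
DEFAULT_RACK = "/default-rack"


def resolve_rack(host):
    """Return the rack path for a hostname or IP, falling back to the default."""
    return RACK_BY_HOST.get(host, DEFAULT_RACK)


def main(argv):
    # Hadoop passes one or more hosts and expects one rack path per host.
    print(" ".join(resolve_rack(host) for host in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

The script must be executable by the user running the NameNode, and the NameNode usually needs a restart to pick up a new script.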

2) Enabling rack awareness using Apache Ambari

Now we will see how to enable rack awareness using Apache Ambari. We have a five-node cluster, and by default all nodes are in default-rack.

Now we will modify the rack for datanode3.

Go to Hosts in Ambari --> click the host whose rack you want to modify --> go to Host Actions --> click Set Rack

Change the rack name to rack-1 and click OK.

Go back to the Hosts page in Ambari to see that the rack name for datanode3 has changed.

You can see that the nodes are now placed in two different racks: default-rack and rack-1.
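If you have many hosts to move, clicking through the GUI gets tedious. To my understanding, the Ambari REST API exposes the same setting as the rack_info field of a host resource, so the change can be scripted. The sketch below only builds the request and does not send it. The Ambari URL, cluster name, and credentials are placeholders, and you should verify the endpoint and field name against the API documentation for your Ambari version before relying on it.

```python
import base64
import json
import urllib.request


def build_set_rack_request(ambari_url, cluster, host, rack, user, password):
    """Build (but do not send) a PUT request that sets a host's rack in Ambari.

    Assumption: the host resource accepts {"Hosts": {"rack_info": ...}}.
    """
    url = f"{ambari_url}/api/v1/clusters/{cluster}/hosts/{host}"
    body = json.dumps({"Hosts": {"rack_info": rack}}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")

    request = urllib.request.Request(url, data=body, method="PUT")
    request.add_header("Authorization", f"Basic {token}")
    # Ambari rejects modifying requests that lack this header.
    request.add_header("X-Requested-By", "ambari")
    return request


# Placeholder values for illustration only.
req = build_set_rack_request(
    "http://ambari.example.com:8080", "mycluster", "datanode3",
    "/rack-1", "admin", "admin",
)
# To actually apply the change: urllib.request.urlopen(req)
```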

3) Confirm rack awareness is enabled

We can confirm this with the hdfs fsck command and also with the hdfs dfsadmin -report command.

The picture below is the output of the command hdfs fsck /, and it shows that the number of racks is 2.
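If you want to check the rack count from a script rather than by eye, a small parser over the fsck summary is enough. This is a minimal sketch that assumes the summary contains a line of the form "Number of racks: N". The sample text below is illustrative and not captured from a real cluster.

```python
import re
import subprocess


def parse_rack_count(fsck_output):
    """Return the rack count reported in fsck output, or None if not found."""
    match = re.search(r"Number of racks:\s*(\d+)", fsck_output)
    return int(match.group(1)) if match else None


def rack_count_from_cluster():
    """Run 'hdfs fsck /' and parse its output (requires an HDFS client)."""
    result = subprocess.run(
        ["hdfs", "fsck", "/"], capture_output=True, text=True, check=False
    )
    return parse_rack_count(result.stdout)


# Illustrative excerpt of an fsck summary, not real cluster output.
sample = """
 Number of data-nodes:  5
 Number of racks:       2
"""
rack_count = parse_rack_count(sample)
```

A result greater than 1 indicates that HDFS sees more than one rack.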

Let me know if you have any questions about this article.