Decommissioning datanodes in Hadoop cluster

In this article, we will learn how to decommission data nodes in a Hadoop cluster.
The decommissioning process ensures that the blocks stored on the data node are copied to other nodes, so that the existing replication factor is preserved.

1) Check NameNode UI for available data nodes and their status.

The picture below shows that we have three data nodes in the cluster. All of them have the admin state In Service, which means all of them are in a working state.

We will try to decommission the data node on the master2 host.

2) dfs.hosts.exclude property

Ensure that the dfs.hosts.exclude property is present in hdfs-site.xml. If it is not there, please add it. This property is required to decommission a data node.

The picture below shows the property in Apache Ambari. You can also check the same property in Cloudera Manager if you are using CDH.
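If you prefer the command line to the UI, the value of the property can also be read with hdfs getconf. The snippet below is a minimal sketch, not part of the original procedure: the function name exclude_file_path is made up for illustration, and it assumes the hdfs client on the host reads the same hdfs-site.xml as the name node.

```shell
# Print the exclude file path that the local Hadoop configuration resolves
# for dfs.hosts.exclude. Returns non-zero if the lookup fails or the value
# is empty. (exclude_file_path is a hypothetical helper name.)
exclude_file_path() {
    path="$(hdfs getconf -confKey dfs.hosts.exclude 2>/dev/null)" || return 1
    [ -n "$path" ] || return 1
    printf '%s\n' "$path"
}

# Example usage:
#   exclude_file_path || echo "dfs.hosts.exclude is not set"
```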

3) Update dfs.exclude file

Add the hostname of the data node you want to decommission to the dfs.exclude file. In this example we want to decommission the data node on the host master2. The path of this file is the value of the dfs.hosts.exclude property.

Perform this step on the host where the active name node is running.
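The edit can be done with any text editor. As a hedged sketch, the helper below appends a hostname to the exclude file only when it is not already listed, so running it twice does no harm. The function name add_exclude_host and the path in the usage comment are assumptions for illustration; use the path configured in dfs.hosts.exclude on your cluster.

```shell
# Append a hostname to the exclude file unless it is already listed.
# Usage: add_exclude_host <exclude-file> <hostname>
# (add_exclude_host is a hypothetical helper name.)
add_exclude_host() {
    exclude_file="$1"
    host="$2"
    # Create the file if it does not exist yet.
    [ -f "$exclude_file" ] || : > "$exclude_file"
    # -x matches whole lines only, -F treats the hostname as a fixed string.
    if ! grep -qxF "$host" "$exclude_file"; then
        # If the last line has no trailing newline, add one first so the
        # new hostname does not get glued onto the previous entry.
        if [ -s "$exclude_file" ] && [ -n "$(tail -c 1 "$exclude_file")" ]; then
            printf '\n' >> "$exclude_file"
        fi
        printf '%s\n' "$host" >> "$exclude_file"
    fi
}

# Example usage (the path is a placeholder):
#   add_exclude_host /etc/hadoop/conf/dfs.exclude master2
```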

4) Run refreshNodes command

Run the refreshNodes command on the active name node to start decommissioning the data node.

hdfs dfsadmin -refreshNodes

5) Check decommissioning status.

Check the name node UI to see the master2 host listed under the Decommissioning category.

6) Check Decommissioned status

Check the name node UI to see that one node is marked as Decommissioned.
The picture below shows that the data node on master2 is decommissioned.
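The status from steps 5 and 6 can also be read from the command line with hdfs dfsadmin -report. The sketch below is only an illustration: the function name decommission_status is made up, and it assumes the report prints a "Hostname:" line followed by a "Decommission Status :" line for each data node. Check the output on your Hadoop version before relying on it.

```shell
# Print the decommission status that "hdfs dfsadmin -report" shows for
# one host, e.g. "Normal", "Decommission in progress" or "Decommissioned".
# Prints nothing if the host is not found in the report.
# (decommission_status is a hypothetical helper name; the parsed line
# layout is an assumption about the report format.)
decommission_status() {
    hdfs dfsadmin -report | awk -v host="$1" '
        # Remember which data node the following lines belong to.
        $1 == "Hostname:" { current = $2 }
        # Print only the status text for the requested host, then stop.
        /^Decommission Status/ && current == host {
            sub(/^Decommission Status *: */, "")
            print
            exit
        }'
}

# Example usage:
#   decommission_status master2
```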

Troubleshooting:

1) Check name node logs

If you are unable to decommission the data node, check the active name node logs. These logs will contain any errors related to decommissioning.

2) Unable to decommission the data node

In small clusters, decommissioning may not complete if the number of data nodes that would remain is less than the replication factor.

For example:

If you have 3 data nodes and the replication factor is 3, you cannot decommission one data node, because the cluster cannot achieve a replication factor of 3 with only 2 data nodes left.

Reduce the replication factor to 2 and then perform the decommissioning.
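The rule above is simple arithmetic: the data nodes left after decommissioning must be at least as many as the replication factor. The sketch below expresses that check as a small shell function. The name can_decommission is made up for illustration, and the setrep line in the comments is just one possible way to lower the replication factor of existing files, not necessarily the right choice for every cluster.

```shell
# Succeed (exit status 0) if enough data nodes remain after decommissioning.
# Usage: can_decommission <live-data-nodes> <replication-factor> [nodes-to-remove]
# (can_decommission is a hypothetical helper name.)
can_decommission() {
    live_nodes="$1"
    replication="$2"
    to_remove="${3:-1}"   # default: decommission a single data node
    [ $((live_nodes - to_remove)) -ge "$replication" ]
}

# Example usage:
#   can_decommission 3 3 || echo "reduce the replication factor first"
#
# One way to lower the replication factor of existing files to 2 and
# wait until it takes effect:
#   hdfs dfs -setrep -w 2 /
```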

Let me know if you have any questions.