
Fixing HDFS issues

The fsck command scans files and directories in HDFS for errors and abnormal conditions. The administrator should run it periodically; the name node also runs its own checks and fixes most of the issues automatically.

Below is the command syntax; it needs to be run as the hdfs user.

hdfs fsck <path>

We can specify the root (/) directory to check the complete HDFS for errors, or specify a particular directory to check only for errors within it.
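For example, to check only a subdirectory such as the Oozie share library that appears in the samples below:

hdfs fsck /user/oozie/share/lib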

The fsck report contains:

  1. Under-replicated, over-replicated, mis-replicated, and corrupt blocks.
  2. The total number of files and directories in HDFS.
  3. The default replication factor and the actual average block replication.
  4. The number of data nodes and the number of racks.
  5. The overall file system status: healthy or corrupt.

The final fsck status needs to be healthy. If it is corrupt, it has to be fixed by the administrator, although the name node will fix most issues automatically over a period of time.

Below is a sample fsck output.


hdfs fsck /

Total size:    466471737404 B (Total open files size: 27 B)

 Total dirs:    917
 Total files:   2042
 Total symlinks:                0 (Files currently being written: 3)
 Total blocks (validated):      4790 (avg. block size 97384496 B) (Total open file blocks (not validated): 3)
  ********************************
  CORRUPT FILES:        9
  MISSING BLOCKS:       9
  MISSING SIZE:         315800 B
  CORRUPT BLOCKS:       9
  ********************************
 Minimally replicated blocks:   4781 (99.81211 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       4274 (89.227554 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.0885177
 Corrupt blocks:                9
 Missing replicas:              4280 (29.944729 %)
 Number of data-nodes:          3
 Number of racks:               1
FSCK ended at Sun Mar 20 12:52:45 EDT 2016 in 244 milliseconds


The filesystem under path '/' is CORRUPT






Under-replicated and over-replicated blocks


The dfs.replication property in hdfs-site.xml specifies the required number of replicas for each block on the cluster. If a block has fewer replicas than this, it is called under-replicated, which can happen when data nodes go down. If a block has more replicas than this, it is called over-replicated, which can happen when crashed data nodes come back to normal.

Under- and over-replicated blocks can be addressed with the setrep command, or the name node will fix them on its own after some time. The -w flag below makes the command wait until replication completes.

hdfs dfs -setrep -w 3 /path


  1. If a file has 2 replicas but 3 are required, set the replication factor to 3. Likewise, if it has 4 replicas but 3 are required, setting the replication factor to 3 removes the extra replica.
  2. Run the balancer; sometimes it also fixes the issue.
  3. Copy the under/over-replicated file to a different location, remove the original, and rename the copy back to the original name. Be careful with this trick: if you remove the file while jobs are still using it, those jobs might fail.

After the replication factor is set, use the hdfs dfs -ls command on the file; its output also displays the replication factor, as in the example below.
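For example, using the _partition.lst file that appears in the report further below (the -ls output here is illustrative; the number in the second column is the file's replication factor):

hdfs dfs -setrep -w 3 /data/output/_partition.lst
hdfs dfs -ls /data/output/_partition.lst
-rw-r--r--   3 hdfs hdfs        297 2016-03-20 12:52 /data/output/_partition.lst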

Corrupted blocks

We should delete the corrupted files and set an appropriate replication factor on the remaining files afterwards.
We need to use the hdfs fsck / -delete command to delete corrupted files.

We can list corrupted blocks using the hdfs fsck / -list-corruptfileblocks command, as in the sequence below.
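A typical sequence, using the /data/output path from the sample report above (a sketch; confirm the corrupt files are expendable or recoverable from their source before deleting):

hdfs fsck / -list-corruptfileblocks     # identify the corrupt files
hdfs fsck /data/output -delete          # delete the corrupt files under the affected path
hdfs fsck /data/output                  # re-check; the status should now be healthy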

Missing blocks


Find out which node holds the missing blocks and check whether its data node process is running; if possible, try restarting the data node. We can check data node status from the active name node UI, or run the jps command on each data node to see whether the DataNode process is running.
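A minimal check, assuming a Hadoop 2 cluster where the sbin scripts are on the hdfs user's PATH (hadoop-daemon.sh is the usual Hadoop 2 control script; adjust for your installation):

hdfs dfsadmin -report | grep -i 'live\|dead'    # live/dead data node counts from the name node's view
jps                                             # run on the suspect node; a DataNode process should be listed
hadoop-daemon.sh start datanode                 # restart the data node process if it is not running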

The administrator has to run the fsck command regularly to check the Hadoop file system for errors and take the necessary actions to avoid data loss.

The fsck command has several options. Some of them are:

-files

     It displays the files under the given path.

hdfs@cluster10-1:~> hdfs fsck / -files
/user/oozie/share/lib/sqoop/commons-io-2.1.jar 163151 bytes, 1 block(s):  OK

-blocks

       It displays block information for each file.

hdfs@cluster10-1:~> hdfs fsck / -files -blocks
/user/oozie/share/lib/sqoop/oozie-sharelib-sqoop-4.0.0.2.1.2.0-402.jar 7890 bytes, 1 block(s):  OK
0. BP-18950707-10.20.0.1-1404875454485:blk_1073742090_1266 len=7890 repl=3

-locations

               It displays the data node addresses where each block's replicas are stored.

hdfs@cluster10-1:~> hdfs fsck / -files -blocks -locations
/user/oozie/share/lib/sqoop/sqoop-1.4.4.2.1.2.0-402.jar 819248 bytes, 1 block(s):  OK
0. BP-18950707-10.20.0.1-1404875454485:blk_1073742091_1267 len=819248 repl=3 [10.20.0.1:50010, 10.20.0.1:50010, 10.20.0.1:50010]

-delete

               It deletes corrupted files. We need to run it when we find corrupted blocks in the cluster.


-openforwrite

                      It displays files currently opened for writing.
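For example, to list the files open for writing anywhere in the file system:

hdfs fsck / -openforwrite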

-list-corruptfileblocks

                                  It displays only the corrupted blocks under the given path.

hdfs fsck / -list-corruptfileblocks
Connecting to namenode via http://cluster10-1:50070
The filesystem under path '/' has 0 CORRUPT files


Checking specific information


If we want to see only a specific type of entry in the fsck report, we can pipe the report through grep.
For example, to see only under-replicated blocks, grep like below.

hdfs fsck / -files -blocks -locations |grep -i "Under replicated"

/data/output/_partition.lst 297 bytes, 1 block(s):  Under replicated BP-18950707-10.20.0.1-1404875454485:blk_1073778630_38021. Target Replicas is 10 but found 4 replica(s).

We can replace "Under replicated" with "corrupt" in the grep pattern to see corrupt files.

hdfs@cluster10-1:~> hdfs fsck / -files -blocks -locations|grep -i corrupt
Connecting to namenode via http://cluster10-2:50070
/apps/hbase/data/corrupt <dir>
/data/output/part-r-00004: CORRUPT blockpool BP-18950707-10.21.0.1-1404875454485 block blk_1073778646
/data/output/part-r-00008: CORRUPT blockpool BP-18950707-10.21.0.1-1404875454485 block blk_1073778648
/data/output/part-r-00009: CORRUPT blockpool BP-18950707-10.21.0.1-1404875454485 block blk_1073778649
/data/output/part-r-00010: CORRUPT blockpool BP-18950707-10.21.0.1-1404875454485 block blk_1073778650
/data/output/part-r-00016: CORRUPT blockpool BP-18950707-10.21.0.1-1404875454485 block blk_1073778654
/data/output/part-r-00019: CORRUPT blockpool BP-18950707-10.21.0.1-1404875454485 block blk_1073778659
/data/output/part-r-00020: CORRUPT blockpool BP-18950707-10.21.0.1-1404875454485 block blk_1073778660
/data/output/part-r-00021: CORRUPT blockpool BP-18950707-10.21.0.1-1404875454485 block blk_1073778661
/data/output/part-r-00026: CORRUPT blockpool BP-18950707-10.21.0.1-1404875454485 block blk_1073778663

Status: CORRUPT
  CORRUPT FILES:        9
  CORRUPT BLOCKS:       9
 Corrupt blocks:                9

The filesystem under path '/' is CORRUPT


The above command displays the complete line of information. If we want only the file path, we can use awk to extract the first field.

hdfs fsck / -files -blocks -locations |grep -i "Under replicated"|awk -F " " '{print $1}'
Connecting to namenode via http://cluster10-1:50070
/data/output/_partition.lst

As we have discussed, we can set the replication factor using the setrep command to fix under-replicated blocks. When there are many under-replicated blocks, it is tedious to run setrep on every file by hand.
To avoid that, write all the under-replicated file paths to a file and use a shell script that sets the replication factor for each of them, as shown below.

hdfs fsck / -files -blocks -locations |grep -i "Under replicated"|awk -F " " '{print $1}' >>underreplicatedfiles
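A minimal loop over that file (a sketch, assuming the required replication factor is 3 and the list was written to underreplicatedfiles in the current directory):

while read -r problemfile; do
  echo "Setting replication for $problemfile"
  hdfs dfs -setrep 3 "$problemfile"
done < underreplicatedfiles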



Happy Hadooping.







