Exploring snapshots in HDFS

An HDFS snapshot is a read-only, point-in-time copy of an existing directory. Snapshots are useful for restoring data that has been corrupted or deleted. In this article we will learn how to manage HDFS snapshots.

Practice the commands below to get a practical understanding of HDFS snapshots.

1) Create a local file with sample numbers.
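
One way to create such a file is sketched below. It assumes the seq utility is available. The file name numbers matches the one used in the following steps; the range 1 to 100 is an arbitrary choice for illustration.

```shell
# Write the integers 1 through 100, one per line, to a local file named "numbers".
seq 1 100 > numbers

# Show the line count to confirm the file was written.
wc -l numbers
```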

2) Create a folder on HDFS and upload the local file to it

The following commands create a folder called numbers under the HDFS directory /user/hdfs and upload the local file called numbers to it.

hdfs dfs -mkdir /user/hdfs/numbers
hdfs dfs -ls /user/hdfs/numbers
hdfs dfs -put numbers /user/hdfs/numbers

3) Try to create a snapshot on an HDFS directory

Snapshots cannot be created on a directory until snapshots have been enabled on it. If they have not been enabled, the command below fails with an error saying that the directory is not a snapshottable directory.

hdfs dfs -createSnapshot /user/hdfs/numbers

4) Allow snapshots and create snapshots

The allowSnapshot command enables snapshots on an HDFS directory. It is a dfsadmin command, so it must be run by an HDFS superuser.

The following commands first enable snapshots on /user/hdfs/numbers and then create a snapshot of it.

hdfs dfsadmin -allowSnapshot /user/hdfs/numbers
hdfs dfs -createSnapshot /user/hdfs/numbers

5) List snapshots using the ls command

We can list the snapshots of a directory with the ls command. Snapshots are stored in the .snapshot directory under the snapshottable directory.

hdfs dfs -ls /user/hdfs/numbers/.snapshot
hdfs dfs -ls /user/hdfs/numbers/.snapshot/s20170902-133455.787

The HDFS directory /user/hdfs/numbers contains a file called numbers, and the same file is preserved in the snapshot directory /user/hdfs/numbers/.snapshot/s20170902-133455.787.

If the numbers file in /user/hdfs/numbers is corrupted, we can restore it from the /user/hdfs/numbers/.snapshot/s20170902-133455.787 directory.

6) List snapshottable directories in the entire HDFS

The lsSnapshottableDir command lists all HDFS directories that have snapshots enabled.

hdfs lsSnapshottableDir
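
The listing can also be checked from a script. The sketch below assumes that each line of lsSnapshottableDir output ends with the directory path as its last whitespace-separated field and that paths contain no spaces. The function name is_snapshottable is a hypothetical helper, not part of Hadoop.

```shell
# Hypothetical helper: reads a snapshottable-directory listing on stdin and
# succeeds (exit status 0) when the directory given as $1 is the last field
# of some line. Assumes paths contain no whitespace.
is_snapshottable() {
    awk -v dir="$1" '$NF == dir { found = 1 } END { exit !found }'
}

# Run the check only where the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
    if hdfs lsSnapshottableDir | is_snapshottable /user/hdfs/numbers; then
        echo "snapshots are enabled on /user/hdfs/numbers"
    else
        echo "snapshots are not enabled on /user/hdfs/numbers"
    fi
fi
```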

8) Create a snapshot with a specific name

By default, a snapshot is created with a timestamp as its name. We can also choose the name ourselves when creating the snapshot.

The command below creates a snapshot called secondSS on HDFS directory /user/hdfs/numbers.

hdfs dfs -createSnapshot /user/hdfs/numbers secondSS
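
A custom name can also be generated at run time, for example to take one snapshot per day. The sketch below is only an illustration: the daily- prefix and the snapshot_name function are assumptions, not a Hadoop convention.

```shell
# Hypothetical helper: prints a snapshot name such as daily-20170902,
# built from today's date.
snapshot_name() {
    printf 'daily-%s\n' "$(date +%Y%m%d)"
}

# Create the dated snapshot only where the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -createSnapshot /user/hdfs/numbers "$(snapshot_name)"
fi
```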

9) Delete a file from the HDFS folder

The command below deletes the file numbers from the directory /user/hdfs/numbers so that we can practice restoring it.

hdfs dfs -rm /user/hdfs/numbers/numbers

10) Restore the file from a snapshot

Files are restored from a snapshot by copying them back with the HDFS cp command.

hdfs dfs -cp /user/hdfs/numbers/.snapshot/secondSS/numbers /user/hdfs/numbers
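
After copying, it can be worth confirming that the restored file matches the snapshot copy. The sketch below compares checksums of the two files as read through hdfs dfs -cat. The function name matches_snapshot and its argument order are assumptions made for this example.

```shell
# Hypothetical helper: succeeds when the live file and its snapshot copy
# have identical contents.
# Arguments: $1 = snapshottable directory, $2 = snapshot name, $3 = file name.
matches_snapshot() {
    live=$(hdfs dfs -cat "$1/$3" | cksum)
    saved=$(hdfs dfs -cat "$1/.snapshot/$2/$3" | cksum)
    [ "$live" = "$saved" ]
}

# Run the comparison only where the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
    if matches_snapshot /user/hdfs/numbers secondSS numbers; then
        echo "restored file matches the snapshot"
    else
        echo "restored file differs from the snapshot"
    fi
fi
```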

11) Try to disable snapshots

All snapshots of an HDFS directory must be deleted before snapshots can be disabled on it. While snapshots still exist, the command below fails.

hdfs dfsadmin -disallowSnapshot /user/hdfs/numbers

12) Delete snapshots and disallow snapshot

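The commands below delete every snapshot of the directory, both the timestamped one and secondSS, and then disable snapshots.

hdfs dfs -deleteSnapshot /user/hdfs/numbers s20170902-133455.787
hdfs dfs -deleteSnapshot /user/hdfs/numbers secondSS
hdfs dfsadmin -disallowSnapshot /user/hdfs/numbers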
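
When a directory has many snapshots, deleting them one by one becomes tedious. The sketch below loops over the .snapshot listing instead. It assumes the usual hdfs dfs -ls output, where directory lines start with d and end with the full path, and it assumes snapshot names contain no spaces. Both function names are hypothetical helpers.

```shell
# Hypothetical helper: reads `hdfs dfs -ls <dir>/.snapshot` output on stdin
# and prints one snapshot name per line (the last path component of each
# directory entry). Header lines such as "Found 2 items" are skipped.
snapshot_names() {
    awk '/^d/ { n = split($NF, parts, "/"); print parts[n] }'
}

# Hypothetical helper: deletes every snapshot of the directory given as $1.
delete_all_snapshots() {
    hdfs dfs -ls "$1/.snapshot" | snapshot_names | while read -r name; do
        hdfs dfs -deleteSnapshot "$1" "$name"
    done
}
```

Calling delete_all_snapshots /user/hdfs/numbers and then hdfs dfsadmin -disallowSnapshot /user/hdfs/numbers would have the same effect as the commands in this step.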

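13) Rename a snapshot

The renameSnapshot command changes the name of an existing snapshot. Because secondSS was deleted in step 12, run this command before that step, or create the snapshot again first.

hdfs dfs -renameSnapshot /user/hdfs/numbers secondSS thirdSS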

I hope this article helped you learn how HDFS snapshots work.

Happy Hadooping.