HDFS REST API example with Knox gateway and without Knox gateway

In this article, we will learn how to use the HDFS REST API both with and without the Knox gateway.

Apache Knox is a security gateway that provides a common REST API in front of the REST APIs of the Hadoop ecosystem tools. It hides the REST API details of several technologies such as Hadoop, Hive, HBase and Oozie.

1) Check the folder status in HDFS using the HDFS REST API.

In this step we will learn how to use the HDFS REST API. The command below checks the status of the HDFS directory /user/hdfs/restapitest using the HDFS API.

 curl  "http://master2:50070/webhdfs/v1/user/hdfs/restapitest?"

master2       : hostname of the active NameNode.

50070         : HTTP port number of the active NameNode.

webhdfs       : the HDFS REST API is called WebHDFS; this part of the URL is fixed.

v1            : version number of WebHDFS; it is also fixed.

GETFILESTATUS : the operation (op query parameter); returns file or folder information from HDFS.

user.name     : query parameter that takes the user name on whose behalf you are submitting the HDFS REST API command.
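To see how these pieces fit together, here is a minimal sketch that assembles a WebHDFS URL from its parts. The helper name webhdfs_url is hypothetical, and the user name hdfs is an assumption; the host, port and path are the ones used in this article.

```shell
# Build a WebHDFS URL from its parts (hypothetical helper, not part of Hadoop).
# Arguments: NameNode host, HTTP port, HDFS path, operation, user name.
webhdfs_url() {
  printf 'http://%s:%s/webhdfs/v1%s?op=%s&user.name=%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Values from the article; the user name "hdfs" is an assumption.
url=$(webhdfs_url master2 50070 /user/hdfs/restapitest GETFILESTATUS hdfs)
echo "$url"

# To actually send the request, pass the URL to curl:
# curl -s "$url"
```

Keeping the URL in one helper makes it easy to swap the operation or the path without retyping the fixed webhdfs/v1 part.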

Problems :

  • Hadoop host names and port numbers are exposed to the external world, which makes the HDFS cluster easier to attack.

2) Check the same folder status using the Knox REST API.

In this step we will check the status of the same HDFS directory /user/hdfs/restapitest using the Knox REST API.

The Apache Knox URL does not contain any details about the NameNode hostname or port number. It contains only the word webhdfs, the user name, the directory path and the operation we are performing, as shown below.

curl -u admin:admin-password -i -v -k "https://datanode1:8442/gateway/default/webhdfs/v1/user/hdfs/restapitest?"

admin:admin-password : default username and password for the default topology in Knox.
default              : topology name; this is the default topology.
8442                 : Knox gateway port number, defined in the gateway.port property.
datanode1            : hostname where the Knox gateway is installed.
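The same idea works for the gateway URL. The sketch below builds a Knox WebHDFS URL from the gateway host, port and topology name. The helper name knox_webhdfs_url is hypothetical, and it assumes the user is identified by the credentials passed to curl with -u, so only the op parameter is added to the query string.

```shell
# Build a Knox gateway URL for WebHDFS (hypothetical helper, not part of Knox).
# Arguments: gateway host, gateway port, topology name, HDFS path, operation.
knox_webhdfs_url() {
  printf 'https://%s:%s/gateway/%s/webhdfs/v1%s?op=%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Values from the article: gateway on datanode1, port 8442, default topology.
url=$(knox_webhdfs_url datanode1 8442 default /user/hdfs/restapitest GETFILESTATUS)
echo "$url"

# To send the request through the gateway (-k skips certificate verification):
# curl -u admin:admin-password -i -k "$url"
```

Notice that nothing in this URL names the NameNode; only the gateway host and the topology are visible to the caller.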

Apache Knox finds the active NameNode host and port through its topology. Apache Knox comes with a topology called default, and its information is stored in the /etc/knox/conf/topologies/default.xml file.

The picture shows the WebHDFS URLs stored in the default topology.
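To check which WebHDFS URL a topology points to, a small sketch like the one below can pull it out of the topology file. It assumes the usual topology layout, where a service entry has a role element with the value WEBHDFS followed by a url element on its own line. The function name webhdfs_urls is hypothetical.

```shell
# Print the url of every WEBHDFS service in a Knox topology file.
# Hypothetical helper; assumes <role> and <url> are on separate lines.
webhdfs_urls() {
  awk '
    /<role>WEBHDFS<\/role>/ { in_svc = 1; next }   # entered the WEBHDFS service
    in_svc && /<url>/ {
      gsub(/.*<url>|<\/url>.*/, "")                # keep only the text inside <url>
      print
    }
    /<\/service>/ { in_svc = 0 }                   # left the service block
  ' "$1"
}

# Path from the article; the file is only read if it exists on this host.
topology=/etc/knox/conf/topologies/default.xml
if [ -f "$topology" ]; then
  webhdfs_urls "$topology"
fi
```

On a cluster like the one in this article, we would expect the output to be the NameNode address that Knox forwards requests to.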

Advantages :

  • Hadoop service host names and port numbers are not exposed to the external world, so the probability of external attacks is much lower.

I hope it is now clear how Knox protects our Hadoop ecosystem using the REST API.