
HDP (Hortonworks Data Platform) installation using Apache Ambari


In this article, we will learn how to install HDP (Hortonworks Data Platform) using Apache Ambari. Before the HDP installation, we need to complete the Ambari server installation.

1) Monitor the Ambari-server log

It is recommended to keep a live tail of the ambari-server log so you can see detailed errors if you hit any issues during the HDP installation. Open the ambari-server log file and monitor it throughout the install.

The picture below shows live ambari-server log.
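A minimal way to follow the log, assuming the default log location used by the Ambari server on Linux:

 tail -f /var/log/ambari-server/ambari-server.log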




2) Install Wizard

Log in to the Ambari GUI using the credentials (default: admin/admin) and click the Launch Install Wizard button to start the HDP installation.





3) HDP version

Select the HDP version you want to install. Ambari also shows the components included in each HDP version.

At the time of writing, HDP 2.6 is the latest version; we are installing HDP 2.5 here.



Click Next once the HDP version is selected.


4) Name the cluster

Enter the name you want to use for your cluster. I have given my cluster the name myhdp25.



5) Install options

In this step, we register the nodes to be added to the cluster by entering the hostnames, one per line. We can also use patterns; click on Pattern Expressions to learn more.
We are adding a single node named secured01 to the cluster.

The /etc/hosts file on every node should contain entries for all the hostnames you add here, as shown in the example below.
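A sketch of an /etc/hosts entry; the IP address and domain here are hypothetical, and the name must match what hostname -f prints on that node:

 # hypothetical IP and domain; use the node's real address and FQDN
 192.168.1.113   secured01.example.com   secured01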


We can register nodes in two ways.

Automatic registration:

Ambari automatically installs ambari-agent on all nodes and registers them with the ambari-server. We need to provide the SSH private key of the Ambari server host.


Check the private key file path below.




Manual registration:

With manual registration, we are responsible for installing and configuring ambari-agent on each node so that it reports to the ambari-server, as sketched below.
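A minimal sketch of manual registration on a node, assuming a CentOS/RHEL system with the Ambari repo already configured; the Ambari server hostname used in the config is an assumption:

 # install the agent from the Ambari repo
 yum install -y ambari-agent
 # in /etc/ambari-agent/conf/ambari-agent.ini, set hostname under the [server] section
 # to the Ambari server's FQDN (e.g. ambari-server.example.com, hypothetical)
 vi /etc/ambari-agent/conf/ambari-agent.ini
 # start the agent so it registers with the server
 ambari-agent start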

We are going with automatic registration. Click Next after entering the hostnames and the private key.



All hostnames entered should be fully qualified domain names (FQDNs). In other words, each hostname should match the output of the hostname -f command on that node.

Ambari will show a warning asking you to confirm that the hostnames are FQDNs; if you are confident they are, click OK.







6) Confirm hosts

In this step, Ambari checks all nodes to see whether they meet the prerequisites for the Hadoop installation, and it also installs and configures ambari-agent on each node.



Ambari will display warnings at the bottom if any of the nodes do not meet the prerequisites. Click on the warnings link to see the details.

The picture below shows that the ntpd process is not started on a node.




Go to the node and start the ntpd process. If ntp is not installed, install it first (yum install ntp) and then start it, as shown below.
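A minimal sketch for CentOS/RHEL 6, where the service and chkconfig commands apply:

 yum install -y ntp        # install ntp if it is missing
 service ntpd start        # start the ntpd daemon
 chkconfig ntpd on         # make ntpd start on boot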



After fixing the warnings, click Rerun Checks to confirm the warnings are gone.

The picture below shows no warnings now.



7) Choose services 

In this step, we choose the services required for our cluster. We are selecting HDFS, MapReduce2 and YARN.





















Ambari will also add any services that are mandatory dependencies.





8) Assign Masters.

In this step, we choose the nodes for the master components. For example, we choose a node/host on which to install the NameNode.




9) Assign slaves and clients

In this step, we simply select the slave components and clients to be installed.



10) Customize Services

In this step, we can modify the configurations of the services. Ambari shows red-colored numbers wherever input is required from us.

For example, Ambari Metrics and SmartSense require a password for the admin user. Enter the admin password for both and click Next.




Ambari will also flag recommended configuration values as warnings. You can ignore them and click Proceed Anyway, or hit Cancel to fix them.




11) Review 

Ambari gives us a last opportunity to review our HDP installation. At this step, we can still go back and modify things.
In this step, Ambari shows installation details such as hosts, repositories and services.

Click Deploy to continue the installation.



12) Install, Start and Test

This step performs three operations:

Installation of the services
Starting the services
Performing service checks to confirm the services are in working state




This step can take some time if the services are being downloaded from the public repositories.

13) Summary

Ambari provides a final summary of our HDP installation. It also lists any warnings and errors.

The picture below shows one warning and one error.


Click Complete to go to your new HDP cluster.


14) New HDP cluster

The picture below shows the new HDP cluster installed with the HDFS, MapReduce2, YARN, ZooKeeper, Ambari Metrics and SmartSense services.


This is how we install HDP (Hortonworks Data Platform) using Apache Ambari on a single node. The same process applies to a multi-node cluster; only a few steps, such as Install Options and Assign Masters, will differ.

Let me know if you have any questions.

Add new service to Hadoop cluster using Apache Ambari

In this article, we will learn how to install Apache Knox and how to configure it.

Follow the steps below after logging into Ambari to install Apache Knox.


1) Click on Add Service


In the Services area on the left side, click on Actions and then click the Add Service option to add a new service to the Hadoop cluster.

The picture below shows the Add Service option.




2) Choose service

Once you click Add Service, a new popup opens showing the list of services that can be installed using Apache Ambari. Select Knox and hit the Next button.


The picture below shows the Knox service.



3) Assign Masters

Apache Knox has a master daemon called the Knox Gateway. In this step, we need to choose a host on which to install the Knox Gateway. After selecting the host, click Next.

The picture below shows that the secured01 host is selected to install the Knox Gateway.




4) Assign Slaves and clients

Apache Knox does not have any slave daemons or clients, so Ambari automatically skips this step.

5) Customize the services

This is one of the important steps in adding a new service to the cluster. In this step, we can modify the default property values.

We need to enter a master secret to continue the installation of Knox, and confirm it by re-entering it. Click Next after entering the master secret.




6) Configure the identities

This is also one of the important steps. In this step, we configure the Kerberos principals and keytabs. By default, Ambari suggests principals and keytabs, and we can go with them.

If you are not modifying the default configs, just hit Next.

The picture below shows the Configure Identities step.



Once you click Next, Ambari will ask for the Kerberos admin principal name and password.

The picture below shows the popup to enter the Kerberos admin credentials.



The admin ACLs are defined in the file /var/kerberos/krb5kdc/kadm5.acl.
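For reference, a typical kadm5.acl entry looks like the sketch below; the realm name EXAMPLE.COM is an assumption, and the line grants full privileges to all */admin principals in that realm:

 # principal pattern          permissions
 */admin@EXAMPLE.COM          *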


If Kerberos is not enabled on the cluster, Ambari skips this step automatically.


7) Review

In this step, the Add Service wizard shows which service it is going to install and on which node it will be installed.

The picture below shows that Ambari is going to install Knox on the secured01 host.





Click Deploy to start the installation of the new service.



8) Install, Start and Test

In this step, Ambari performs three things.

  • Installation of the selected services.

  • Starting the newly installed services.

  • Testing the newly installed services by running service checks.

The picture below shows the Add Service wizard installing Knox.





9) Summary

In this step, Ambari shows the complete installation report. Ambari also indicates, with a restart symbol, the services that need to be restarted after a new service is installed.


The picture below shows that the installation has one warning; we can click on it to see what the warning is.



Click Complete to finish the Knox installation.

10) Check the new service

In this final step, Ambari automatically reloads the home page and shows the newly installed services.

Check the new Knox service in the Services area of the Ambari home page.


We can run a service check to verify that Knox is in working state.

By following the same steps as above, we can install any service in the Hadoop cluster.

Let me know in the comments if you have any questions.

Ambari server installation including Java and Postgres

In this article, we will learn how to install the Ambari server. The Ambari server depends on Java and an RDBMS. By default, the Ambari server installs PostgreSQL to store its data.
We will also see how to install Java and Postgres and how to configure Postgres for Ambari.

1) Create Ambari repo

Create an Ambari repository by downloading the ambari.repo file.

Command:

wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.6.0.0/ambari.repo -O /etc/yum.repos.d/ambari.repo

This command works on CentOS/RHEL/Oracle Linux.

The picture below shows how to create the Ambari repo on CentOS/RHEL/Oracle Linux.




2) Install the ambari-server package and the Postgres package

After the first step, we will see a new repo, ambari-2.6.0.0, created.


We will install the ambari-server package using the yum install command. This command also resolves the Postgres dependency and installs it, as shown below.
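A minimal sketch of the install command (the -y flag simply auto-confirms the prompts):

 yum install -y ambari-server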

The picture below shows ambari-server installed along with its dependency, Postgres.





3) Ambari-server setup

Once ambari-server is installed, typing ambari-server shows the command's options, which confirms that the Ambari server was installed successfully.









After installing ambari-server with yum, we need to run the Ambari server setup to configure Java and Postgres.

We use the ambari-server setup command for this; the command also configures prerequisites for Ambari and Hadoop, as shown below.
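A sketch of running the setup interactively; the prompts it walks through (SELinux, user account, JDK, database) are covered in the rest of this section:

 ambari-server setup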

Disable SELinux

SELinux needs to be disabled for the Hadoop installation. The Ambari server checks whether it is disabled; if not, the setup command will disable SELinux after asking for the user's confirmation.
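If you prefer to handle SELinux manually before running setup, a minimal sketch looks like this:

 setenforce 0                                                    # switch SELinux to permissive mode for the current boot
 sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config    # keep it disabled across reboots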




Customize user account

By default, ambari-server runs as the root user. If we want, we can change this to another user, but it is recommended to use the root user here.


JDK version

Choose the JDK version to be installed. The Ambari setup shows all the available JDK versions from Oracle and also gives a custom JDK option to install another JDK such as OpenJDK. Apart from the JDK, the Ambari server and Hadoop also require JCE; the ambari-server setup command installs it as well.

It is better to choose the latest JDK version from Oracle.




Accept the Oracle binary code license agreement.

Before we install the Oracle JDK, we need to accept the Oracle binary code license agreement.



Advanced DB configs

We can configure Postgres as we like, including the users and the schema. It is fine to go with the default database configuration, so we can simply answer no (n) at this step.




Ambari server setup complete.

After all these steps, we can see that ambari-server is successfully set up.

The picture below shows that the ambari-server setup completed successfully.




4) Start the ambari-server

Now we can start the Ambari server using the ambari-server start command.

The picture below shows that ambari-server started successfully and is listening on port 8080, the default port for the Ambari server.
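A quick sketch of starting the server and checking that it is running:

 ambari-server start      # start the server
 ambari-server status     # confirm it is running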




5) Ambari server login screen from the browser

We can use the IP address of the host where the Ambari server is installed, along with the port number, to access the Ambari GUI from a browser (for example, http://&lt;ambari-server-host&gt;:8080).

The picture below shows the Ambari GUI in a browser.



Default login credentials for Ambari are admin/admin.


6) Install HDP by clicking Launch Install Wizard

We can log into the Ambari GUI using the default credentials admin/admin.

Once we log into Ambari, we are ready to create an HDP (Hortonworks Data Platform) cluster using Ambari.

We can click on Launch Install Wizard to create the HDP cluster.

The picture below shows the Launch Install Wizard button.



In the next article, we will see how to install HDP (Hortonworks Data Platform) using Ambari. Let me know in the comments if you have any questions about the Ambari server installation.


Transferring data between Hadoop clusters using the distcp command

In this article, we will learn how to transfer data between two Hadoop clusters. The hadoop distcp command is used to transfer data between clusters.

One of the main use cases of the distcp command is to sync data between a production cluster and a backup/DR cluster. We will learn distcp through some examples.

1) Create a test file on the source cluster

Connect to the source cluster and create a file called numbers under the /user/hdfs/distcptest directory in HDFS.

The picture below shows how to create a local file named numbers and how to upload it to the HDFS directory /user/hdfs/distcptest.
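A minimal sketch of those commands; the file contents are arbitrary (here, just the numbers 1 to 10):

 seq 1 10 > numbers
 hdfs dfs -mkdir -p /user/hdfs/distcptest
 hdfs dfs -put numbers /user/hdfs/distcptest/
 hdfs dfs -ls /user/hdfs/distcptest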





2) distcp syntax

The hadoop distcp command takes one or more source paths and a destination path as its arguments.

Source path syntax:

 hdfs://[active-namenode-hostname]:[name-node-port-number]/path/to/hdfs/file


active-namenode-hostname:

We have to specify the active NameNode hostname or IP address. If HA is not enabled, there is no active/standby distinction, so we simply specify the NameNode hostname or IP address directly.


Port number: we need to specify the RPC port number of the NameNode. By default it is 8020.

In the same way, we need to specify the active NameNode hostname or IP address and the RPC port number of the destination cluster.


We use the source and destination paths below.


Source path : hdfs://192.168.1.113:8020/user/hdfs/distcptest/numbers

Destination path : hdfs://192.168.1.115:8020/user/hdfs/target


3) Create the target directory in the destination cluster

Create a directory called /user/hdfs/target in the destination cluster and run the ls command.

The picture below shows creating the /user/hdfs/target folder in the destination cluster.
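On the destination cluster, a minimal sketch of those commands:

 hdfs dfs -mkdir -p /user/hdfs/target
 hdfs dfs -ls /user/hdfs/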




4) Run the distcp command

The pictures below show how to run the distcp command and confirm that the destination cluster has received the file.

Command run:

 hadoop distcp hdfs://192.168.1.113:8020/user/hdfs/distcptest/numbers hdfs://192.168.1.115:8020/user/hdfs/target






5) Use the nameservice ID

Hardcoding the NameNode IP address and port number is a bad idea, because if the active NameNode at that address goes down, the hadoop distcp command fails.

We need to use the nameservice ID of the cluster instead.

The picture below shows how to use the nameservice ID in the distcp command; the active NameNode IP addresses and port numbers are replaced with just the nameservice IDs.

The picture also shows how to get the nameservice ID of a cluster.
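A sketch of looking up the nameservice ID and using it; the IDs sourcecluster and targetcluster are assumptions, so use the values your clusters actually define in hdfs-site.xml:

 hdfs getconf -confKey dfs.nameservices
 hadoop distcp hdfs://sourcecluster/user/hdfs/distcptest/numbers hdfs://targetcluster/user/hdfs/target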




6) Update or overwrite

What do we do if the destination cluster's target directory already has the same files?


We have two options in that scenario: update or overwrite.

We can update the destination cluster's directory with only the new files from the source cluster using the update option.

Or we can simply overwrite the destination cluster's files with the source cluster's files using the overwrite option. Both are sketched below.
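A minimal sketch of the two variants, reusing the source and destination clusters from the earlier examples and the /user/hdfs/distcptest directory as the source:

 hadoop distcp -update hdfs://192.168.1.113:8020/user/hdfs/distcptest hdfs://192.168.1.115:8020/user/hdfs/target
 hadoop distcp -overwrite hdfs://192.168.1.113:8020/user/hdfs/distcptest hdfs://192.168.1.115:8020/user/hdfs/target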


7) Update example

The following example shows how to update the destination cluster's directory with the new files in the source cluster's directory.

The source cluster's directory has a new file called numbersNew, so only the numbersNew file will be copied to the destination cluster's directory /user/hdfs/target.


The destination cluster's target directory now has the file numbersNew with a new timestamp.





8) Overwrite Example 

The following pictures show how to use the overwrite option in the distcp command.






9) Multiple source files

If we need to copy multiple files from the source cluster to the destination cluster, we specify all the source paths first and the target path last in the distcp command.

The picture below shows how to transfer multiple files from the source cluster to the destination cluster.

Source files are highlighted in the picture.
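A sketch of such a command, reusing the numbers and numbersNew files from the earlier steps:

 hadoop distcp hdfs://192.168.1.113:8020/user/hdfs/distcptest/numbers hdfs://192.168.1.113:8020/user/hdfs/distcptest/numbersNew hdfs://192.168.1.115:8020/user/hdfs/target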





10) Multiple source files with the -f option


If we have a large number of source files, we can list all of them in a file and pass that file to the distcp command.

The distcp command provides the -f option to read the list of source paths from a file stored in HDFS.

The following picture shows how to use the -f option.
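A sketch under the assumption that a file named srclist, containing one fully qualified source path per line, is uploaded to HDFS on the source cluster (the file name is hypothetical):

 hdfs dfs -put srclist /user/hdfs/srclist
 hadoop distcp -f hdfs://192.168.1.113:8020/user/hdfs/srclist hdfs://192.168.1.115:8020/user/hdfs/target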



Let me know if you have any questions about the distcp command.