
Connecting to a Hive database with and without dynamic service discovery

We will learn how to use dynamic service discovery in Hive and what advantages it offers.

First we will look at the problems we face if we do not use dynamic service discovery in Hive.


1) 

Hive provides two command-line prompts for running queries: the hive prompt and the beeline prompt. The hive prompt is deprecated, so we should use the beeline prompt.

The picture below shows how to start the beeline prompt.



2) 

Before running Hive queries, we need to establish a connection to the Hive database. We can connect either with or without dynamic service discovery.

We will first see how to connect to the database without dynamic service discovery.


We need to know the following things to establish a connection to the Hive database.

HiveServer2 host name : The host name or IP address where HiveServer2 is running. Several nodes may be running HiveServer2 instances; pick one host name or IP address.

Database name : The Hive database you want to connect to. We will connect to the default database.

Port number : The HiveServer2 port number. The default value is 10000.

User name : The user name for the database. We are using hive here.

Password : The password for the database. We are using hive here.


The picture below shows how to connect to HiveServer2 without dynamic service discovery.

Connection string used : jdbc:hive2://master1:10000/default

The prefix jdbc:hive2:// is the same for every connection string.

User name and password used : hive and hive
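
The same connection can be scripted. The sketch below, in Python, assembles the direct JDBC URL from the pieces listed above and builds a beeline command line using its -u (URL), -n (user name) and -p (password) options. The helper names direct_hive_url and beeline_command are made up for this illustration, and the command is only built and printed, not executed.

```python
def direct_hive_url(host, port=10000, database="default"):
    """Build a JDBC URL that points at one specific HiveServer2 instance."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(url, user, password):
    """Return the beeline argument list: -u is the URL, -n the user, -p the password."""
    return ["beeline", "-u", url, "-n", user, "-p", password]


# Values taken from the example in this post.
url = direct_hive_url("master1")
command = beeline_command(url, "hive", "hive")

print(url)
print(" ".join(command))
```

The host name is baked into the URL, which is exactly the limitation discussed in the next step.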



3) 

A cluster usually runs multiple HiveServer2 instances. Assume that we have connected to the database through the HiveServer2 instance running on the master1 host. If HiveServer2 on master1 goes down, our Hive queries will fail.

The other problem is that by hard-coding one host we also put all of our load on a single HiveServer2 instance.


In the picture below, the first query was successful but the second query failed because HiveServer2 went down on master1.





4) 

Hive provides a feature called dynamic service discovery to address the above problems.

With dynamic service discovery, rather than using a HiveServer2 host name directly, we connect to the Hive database through ZooKeeper.

Each running HiveServer2 instance registers itself in ZooKeeper, and the client picks one of the registered instances when it connects, so new connections are not sent to an instance that is down.
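
The idea can be illustrated with a toy model. This Python sketch does not talk to ZooKeeper at all; it uses a plain dictionary (a stand-in invented for this example) to represent the instances registered under the namespace, and the instance addresses are assumptions. It shows that a client resolving through the registry skips an instance that has gone down, while a hard-coded host does not.

```python
import random

# Hypothetical stand-in for ZooKeeper: instance address -> is it registered (alive)?
registry = {"master1:10000": True, "master2:10000": True}


def resolve(registry, rng=random):
    """Pick one live instance, as a discovery-aware client would on connect."""
    live = sorted(addr for addr, alive in registry.items() if alive)
    if not live:
        raise RuntimeError("no HiveServer2 instance is registered")
    return rng.choice(live)


# In this model, HiveServer2 on master1 goes down and drops out of the registry.
registry["master1:10000"] = False

hard_coded = "master1:10000"
discovered = resolve(registry)

print("hard-coded target alive:", registry[hard_coded])
print("discovered target:", discovered)
```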

We need the following things to use dynamic service discovery.

The host names and port numbers where ZooKeeper is running. This set of servers is also called the ZooKeeper ensemble.
We can easily get this value from the hive.zookeeper.quorum property in Hive.

ZooKeeper's default port number is 2181. You can also get the ZooKeeper host names from the ZooKeeper configuration files.


Specify the service discovery mode using serviceDiscoveryMode=zooKeeper .


Specify the ZooKeeper namespace as hiveserver2. This is the value of the hive.server2.zookeeper.namespace property in Hive.


We are using the following connection string.

jdbc:hive2://datanode1:2181,master1:2181,master2:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
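
As a minimal sketch, the connection string can be assembled from the two property values mentioned above. The Python function discovery_hive_url is a hypothetical helper written for this post, and it assumes the quorum value lists host:port pairs separated by commas.

```python
def discovery_hive_url(quorum, namespace="hiveserver2"):
    """Build a JDBC URL that resolves HiveServer2 through ZooKeeper.

    quorum    -- comma-separated host:port list (value of hive.zookeeper.quorum)
    namespace -- value of hive.server2.zookeeper.namespace
    """
    # Tolerate stray spaces around the commas.
    hosts = [h.strip() for h in quorum.split(",") if h.strip()]
    return (
        "jdbc:hive2://" + ",".join(hosts) + "/"
        + ";serviceDiscoveryMode=zooKeeper"
        + ";zooKeeperNamespace=" + namespace
    )


url = discovery_hive_url("datanode1:2181, master1:2181, master2:2181")
print(url)
```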


The picture below shows how to connect to the Hive database using dynamic service discovery.



5)

Many people make typos in the dynamic service discovery connection string. The easiest way to get a correct connection string is to copy it from the Ambari GUI.

In Ambari, click Hive, then click Summary, then click the left arrow icon to copy the connection string.




Once copied from Ambari, the connection string can be pasted directly into the terminal to avoid typos.
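
If the string has to be typed by hand, a quick sanity check can catch the most common slips. The Python sketch below is a hypothetical checker, not part of Hive or Ambari. It only verifies the prefix, the host:port entries, and the two settings used in this post, and it insists on the exact spelling shown here, which may be stricter than the driver itself.

```python
def check_discovery_url(url):
    """Return a list of problems found in a ZooKeeper discovery URL (empty if none)."""
    problems = []
    prefix = "jdbc:hive2://"
    if not url.startswith(prefix):
        return ["URL must start with " + prefix]
    hosts_part, _, rest = url[len(prefix):].partition("/")
    for host in hosts_part.split(","):
        name, _, port = host.partition(":")
        if not name or not port.isdigit():
            problems.append("bad host:port entry: " + repr(host))
    # Settings follow the slash as ';'-separated key=value pairs.
    settings = dict(item.split("=", 1) for item in rest.split(";") if "=" in item)
    if settings.get("serviceDiscoveryMode") != "zooKeeper":
        problems.append("serviceDiscoveryMode=zooKeeper is missing or misspelled")
    if not settings.get("zooKeeperNamespace"):
        problems.append("zooKeeperNamespace is missing")
    return problems


good = ("jdbc:hive2://datanode1:2181,master1:2181,master2:2181/"
        ";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2")
# Two deliberate typos: a missing port and a misspelled setting name.
bad = ("jdbc:hive2://datanode1:2181,master1/"
       ";serviceDiscoverMode=zooKeeper;zooKeeperNamespace=hiveserver2")

print(check_discovery_url(good))
print(check_discovery_url(bad))
```

A check like this cannot tell whether the namespace value itself is right, so copying the string from Ambari remains the safer route.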

6)

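Benefits of dynamic service discovery.

High availability  :

New connections are always directed to a HiveServer2 instance that is up, so the failure of one instance does not make Hive unavailable. A session that was open on the failed instance has to reconnect, and it then lands on a live instance.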


Load balancing  :

For every new connection, the client picks one of the registered HiveServer2 instances (the Hive JDBC driver picks at random rather than in strict round-robin order), so over time the load is spread roughly evenly across all instances.
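
A toy simulation makes the load-spreading effect visible. It assumes that each new connection picks one registered instance at random, which is my understanding of how the Hive JDBC driver behaves; the instance list is hypothetical and nothing here contacts a real cluster.

```python
import random
from collections import Counter

# Hypothetical list of live HiveServer2 instances registered in ZooKeeper.
instances = ["master1:10000", "master2:10000", "datanode1:10000"]

rng = random.Random(42)  # fixed seed so the run is repeatable

# Simulate 3000 new connections, each picking one instance at random.
connections = Counter(rng.choice(instances) for _ in range(3000))

for instance in instances:
    print(instance, connections[instance])
```

Each instance ends up with roughly a third of the connections, which is the balancing effect described above.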

Dynamic service discovery is recommended in production so that the failure or overload of a single HiveServer2 instance does not interrupt users.

Let me know if you have any questions.
