Search This Blog

Step by step guide for joins in Apache Hive

This article discusses how to use different joins in Hive in a step by step manner.

1) Create a department table with below data.

Check How to create a table in Apache Hive here.


10,IT
15,Administration
20,Engineering
35,Sales
40,Others
45,HR

create external table department(deptno int,deptname string) 
row format delimited fields terminated by ','
location '/user/hdfs/department';



2)
Describe employee table and describe department table.

describe employee;

describe department;





3)

Check data from both the tables. Both tables have deptno as a common column.

select * from employee;

select * from department;



4)

Retrieve data from both the tables where deptno data matching in both tables;

For example : deptno 10 in employee table is matched with deptno 10 in department table, but not deptno 25.

select employee.name  , employee.salary  , employee.deptno , employee.doj,department.deptno , department.deptname  
from employee join department 
on employee.deptno=department.deptno;





The fuctionality above is called inner join. We can even specify inner join like below.

select employee.name  , employee.salary  , employee.deptno , employee.doj,department.deptno , department.deptname  
from employee inner join department 
on employee.deptno=department.deptno;

We can even specify alias for table names  like below.

select e.name  , e.salary  , e.deptno , e.doj,d.deptno , d.deptname  
from employee e inner join department d
on e.deptno=d.deptno;



5)

If we want to see even unmatched rows from employee (left side ) table , Then we can use left outer join like below.

select e.name  , e.salary  , e.deptno , e.doj,d.deptno , d.deptname  
from employee e left outer join department d
on e.deptno=d.deptno;


The output above displays NULL for deptno 25 row as it does not have any data department table.

6)
If we want to see even unmatched rows from department (right side) table , Then we can use right outer join like below.

select e.name  , e.salary  , e.deptno , e.doj,d.deptno , d.deptname  
from employee e right outer join department d
on e.deptno=d.deptno;



The output above displays NULL for deptno 45 row as it does not have any data in employee table.

7)

If we want to see unmatched rows from both tables , we need to use full outer join like below.

select e.name  , e.salary  , e.deptno , e.doj,d.deptno , d.deptname  
from employee e full outer join department d
on e.deptno=d.deptno;


Full outer join opitput contains NULL rows from both left outer and right outer joins.

We have learnt how to use below joins :

Inner joins
Left outer join
Right outer join
Full outer join