Search This Blog

Usage of subqueries in Apache Hive queries

In this article, we will learn how and where to use subqueries in Apache Hive. Sub-queries in Hive queries are allowed in from clause ad where clause.

The following steps explain you in detail about subqueries in Hive.

1) Create a department table as explained in this article.

2)  In From clause.

The following query uses subquery in from clause.

select name,newsalary from (select name,salary*1.1 as newsalary from employee) emp_new;

Subquery columns can be accessed in main query. In the above query, We have accessed newsalary column in main query from subquery.

Union and Union all are also allowed in the subquery of from clause.

select name,newsalary from (
select name,salary*1.1 as newsalary from employee
select name,salary from employee
) emp_new;

2)  In where clause

Subquery  in the where clause is used to define list of elements.

The following query defines list of deptno elements from department table. and the main query retrieves names from employee table whose department number available in department table.

select name from employee where deptno in (select deptno from department) ;

We can use both in or not in in where clause.

In or not in subqueries should have only one column in subquery.

If subquery has morethan one column, Hive throws an error SubQuery can contain only 1 item in Select List.

Sub queries need to be only right side , Left side sub-queries not allowed.

The folowing query is not valid as subquery defined left side.

select name from employee where  (select deptno from department)d in (10,20);

We can use exists or not exists in where clause to compare main query data with subquery data.

The query below compares deptno in employee table to deptno in department table and retrieves matched rows.

select name from employee e where exists (select name from department d where e.deptno=d.deptno) ;

Hope it is clear now for you, Please comment your questions if you get any.

Happy Hadooping !!!!!!