Search This Blog

Step by step guide for working on partitioned table in Apache Hive

We have seen how to create a table in Hive, Now we will see how to create a partitioned table and how to perform operations on it.

1) Partioned table requires extra option called partioned by in create table syntax.
The command below creates a partitioned table called employee_partitions.

create table employee_partitions(name string,salary int,deptno int,DOJ date) partitioned by(joinyear string) row format delimited fields terminated by ',' ;



2)

Run show partitions command to see if any partitions available in the table. As above create command does not have location option included,It will not show any partitions.

show partitions employee_partitions.




3)
Run a select query on normal table created in how to create Hive table article and insert the output into  employee_partitions table.

The query below retrieves 2014 employees from employee table and inserts into employee_partitions table partition called 2014.

 Insert into employee_partitions partition (joinyear=2014) select name,salary,deptno,doj from emp where year(doj)=2014;





4)
Now run the show partitions command and you will see one partition displayed.



5)

Check Hive table's HDFS location using describe formatted command. Location row displays hdfs path for table.

describe formatted employee_partitions;

6)

Check how partitioned data is stored in HDFS. You will find partitioned data is stored under subdirectory of table's HDFS directory.

hdfs dfs -ls hdfs://apslabc1/user/hdfs/data/employee_partitions





7) In step 3, We have loaded one partition data into employee_part table. If we want to load multiple partitions data one time, We have to use the query below.  This is also called dynamic loading of partitions;

set hive.exec.dynamic.partition.mode=nonstrict;
 Insert into employee_partitions partition (joinyear) select name,salary,deptno,doj,year(doj) from employee;



8)

Now check all partitions.

show partitions employee_partitions;




9)

If we want to specify location at the time of creating partitioned table, Location should have data as per partitions.

The output below has partition folders and can be used in location option .



10)

We can load data into partitioned table from local file system using load data statement like below.


Hope above queries help you, Happy hadooping!!!