
Loading compressed data into a Hive table

In this article, we will learn how to load compressed data (gzip and bzip2 formats) into a Hive table.

1) Create a file called employee_gz on the local file system and convert it into a gz format file using the gzip command.

Sample employee data:


Balu,300000,10,2014-02-01
Radha,350000,15,2014-02-05
Nitya,325000,15,2015-02-06
Bubly,350000,25,2015-05-01
Pandu,300000,35,2014-06-01
Nirupam,350000,40,2016-01-01
Sai,400000,25,2015-05-02
Bala,400000,20,2016-10-10

Example :
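The sketch below is one way to carry out step 1 from a shell. It assumes it is run in the directory where the file should live (the article later uses /home/hdfs) and that the gzip command is installed.

```shell
# Write the sample rows to a plain-text file named employee_gz.
cat > employee_gz <<'EOF'
Balu,300000,10,2014-02-01
Radha,350000,15,2014-02-05
Nitya,325000,15,2015-02-06
Bubly,350000,25,2015-05-01
Pandu,300000,35,2014-06-01
Nirupam,350000,40,2016-01-01
Sai,400000,25,2015-05-02
Bala,400000,20,2016-10-10
EOF

# Compress the file; gzip replaces employee_gz with employee_gz.gz.
# -f overwrites an existing employee_gz.gz from an earlier run.
gzip -f employee_gz

# Sanity check: decompress to stdout and count the rows.
gzip -dc employee_gz.gz | wc -l
```

The resulting employee_gz.gz is the file that gets loaded into Hive in step 3.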



2) Create a Hive table called employee_gz without specifying a location.

The statement below creates the Hive table.

create table employee_gz(name string,salary int,deptno int,DOJ date)
row format delimited fields terminated by ',';





3) Load data from the compressed local file into the Hive table employee_gz.

The statement below loads the gzip-compressed data in /home/hdfs/employee_gz.gz into the Hive table employee_gz.

load data local inpath '/home/hdfs/employee_gz.gz' into table employee_gz;

Hive loads the compressed file as it is and recognizes the compression from the .gz file extension. We do not need to specify that the data is in gzip format.

Hive also uncompresses the data automatically when we run a select query.

If we remove the local keyword from the query, the data will be loaded into the Hive table from an HDFS location.

4) Check whether the Hive table's data is stored in GZ format in HDFS.
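One way to do this check, as a sketch, is to list the table's directory in HDFS and look for the .gz file. The path /user/hive/warehouse is Hive's default warehouse directory and is an assumption here, as is the table being in the default database. WAREHOUSE_DIR and TABLE_DIR are illustrative variable names.

```shell
# Assumption: the table lives under Hive's default warehouse directory.
# Override WAREHOUSE_DIR if hive.metastore.warehouse.dir is set differently.
WAREHOUSE_DIR="${WAREHOUSE_DIR:-/user/hive/warehouse}"
TABLE_DIR="$WAREHOUSE_DIR/employee_gz"

if command -v hdfs >/dev/null 2>&1; then
    # List the files backing the table; the loaded file should keep its .gz name.
    hdfs dfs -ls "$TABLE_DIR"
else
    echo "hdfs command not found; run this on a machine with a Hadoop client"
fi
```

If the listing shows employee_gz.gz, the data is still stored in compressed form.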

Now we will see how to load bzip2 format data into a Hive table.

5) Create a local file called employee_bz2 and compress it into bzip2 format.
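A minimal sketch of this step, assuming the bzip2 command is installed and the same sample rows are reused. Run it in the directory where the file should live (the article uses /home/hdfs).

```shell
# Write the sample rows to a plain-text file named employee_bz2.
cat > employee_bz2 <<'EOF'
Balu,300000,10,2014-02-01
Radha,350000,15,2014-02-05
Nitya,325000,15,2015-02-06
Bubly,350000,25,2015-05-01
Pandu,300000,35,2014-06-01
Nirupam,350000,40,2016-01-01
Sai,400000,25,2015-05-02
Bala,400000,20,2016-10-10
EOF

# Compress the file; bzip2 replaces employee_bz2 with employee_bz2.bz2.
# -f overwrites an existing employee_bz2.bz2 from an earlier run.
bzip2 -f employee_bz2

# Sanity check: decompress to stdout and count the rows.
bzip2 -dc employee_bz2.bz2 | wc -l
```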



6) Create a new table called employee_bz2.

The statement below creates a Hive table called employee_bz2.

create table employee_bz2(name string,salary int,deptno int,DOJ date)
row format delimited fields terminated by ',';



7) Load the bzip2 format data into the Hive table.

load data local inpath '/home/hdfs/employee_bz2.bz2' into table employee_bz2;


We can load gzip and bzip2 format data into a Hive table just like normal text files. We do not need to specify any format in the query.
