User Defined Functions in Hive

Hive has three types of functions. The first is the single row function, which operates on one row at a time.
The second is the multi row function, which operates on multiple rows at a time. The third is the table generating function, which generates multiple rows out of a single row.
Hive has a good number of built-in functions in each of these categories. You can list all of them using

show functions;

If you want to understand one particular function, such as concat, you can use

describe function concat;

It displays a short help page for the concat function.

However, sometimes you may need to write your own function when no built-in function fits your need.

These custom functions can be of three types:

1. Single row function (UDF = User Defined Function)
2. Multi row function (UDAF = User Defined Aggregate Function)
3. Table generating function (UDTF = User Defined Table generating Function)
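
To make the difference between the three shapes concrete, here is a minimal plain-Java sketch. It does not use any Hive API: the class FunctionShapes and its methods are hypothetical stand-ins that only mirror how many values go in and how many come out.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-ins (not Hive APIs) for the three function shapes.
class FunctionShapes {

    // Single row shape (UDF-like): one value in, one value out.
    static String upper(String value) {
        return value.toUpperCase();
    }

    // Multi row shape (UDAF-like): many values in, one value out.
    static long total(List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    // Table generating shape (UDTF-like): one value in, many rows out.
    static List<String> explode(String csv) {
        return Arrays.asList(csv.split(","));
    }

    public static void main(String[] args) {
        System.out.println(upper("hive"));
        System.out.println(total(Arrays.asList(1L, 2L, 3L)));
        System.out.println(explode("a,b,c"));
    }
}
```

A real UDAF or UDTF in Hive involves more machinery than this; the sketch only shows the input and output shape of each category.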

In this article, we learn how to develop a UDF in Hive.

Assume we have a table emp with data like below.

eno,ename,sal,dno

10,Balu,100000,15
20,Bala,200000,25
30,Sai,200000,35
40,Nirupam,300000,15

In this example we develop a custom function which prepends "Hi " to the employee name.

The steps are below.


1. Write a UDF by extending the UDF class, using Eclipse

To develop a UDF, we extend the UDF class from hive-exec.jar and define an evaluate method in it. The UDF base class does not declare evaluate itself; Hive finds the method by name at query time.

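import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class HiPrepender extends UDF {

    // Hive calls evaluate once per row with the column value.
    public Text evaluate(Text column) {
        if (column != null && column.getLength() > 0) {
            return new Text("Hi " + column.toString());
        }
        // NULL or empty input yields NULL.
        return null;
    }

}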
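
If you want to check the null and empty-string handling before packaging the jar, the same logic can be tried in plain Java. This is only a sketch: HiPrependerLogic and prependHi are hypothetical names, and String stands in for Hadoop's Text so that the code runs without hive-exec on the classpath.

```java
// Plain-Java stand-in for the evaluate logic; String replaces Hadoop's Text
// so this can run without the Hive and Hadoop jars.
class HiPrependerLogic {

    static String prependHi(String column) {
        if (column != null && column.length() > 0) {
            return "Hi " + column;
        }
        // Same rule as the UDF: null or empty input gives null.
        return null;
    }

    public static void main(String[] args) {
        System.out.println(prependHi("Balu")); // prints "Hi Balu"
        System.out.println(prependHi(""));     // prints "null"
        System.out.println(prependHi(null));   // prints "null"
    }
}
```

Note that an empty name comes back as null rather than as "Hi ", so in Hive such rows would show NULL in the result.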

For this you need three jar files on the classpath (on Hadoop 2 and later, the Text class lives in hadoop-common*.jar rather than hadoop-core*.jar):

hadoop-core*.jar
hive-exec*.jar
apache-commons*.jar



2. Create a jar file for the above program

In Eclipse:

File -> Export -> JAR file -> specify the file path for the jar -> Next -> do not select a main class -> Finish

Assume you created a jar file named hiprepender.jar.

3. Transfer the jar file to the Unix box using FileZilla or WinSCP, if you are not already on it

If you developed the UDF on another operating system such as Windows, you have to transfer the jar to the machine from which you run Hive queries.

Assume you have transferred your jar file to the /root directory.


4. From the Hive prompt, add the jar file to your classpath

hive> add jar /root/hiprepender.jar;

5. Create a temporary function

create temporary function prependhi as 'HiPrepender';

Here HiPrepender is the name of the class we wrote in the first step. If your class is inside a package, use the fully qualified class name instead.

6. Use the custom function

select prependhi(ename) from emp;

You will get output like below:

Hi Balu
Hi Bala
Hi Sai
Hi Nirupam
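
As a rough picture of what the query does, the sketch below applies the function once per row to the ename column of the sample data. It is plain Java and not how Hive actually executes the query: SelectSimulation, prependHi and selectPrependHi are hypothetical names, and the rows are simply the comma-separated lines of the emp table shown earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of running the function over the emp rows.
class SelectSimulation {

    // Stand-in for the UDF's evaluate method, using String instead of Text.
    static String prependHi(String ename) {
        return (ename != null && !ename.isEmpty()) ? "Hi " + ename : null;
    }

    // Applies the function once per row to the ename column (index 1).
    static List<String> selectPrependHi(List<String> rows) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            String[] cols = row.split(",");
            out.add(prependHi(cols[1]));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> emp = Arrays.asList(
                "10,Balu,100000,15",
                "20,Bala,200000,25",
                "30,Sai,200000,35",
                "40,Nirupam,300000,15");
        for (String line : selectPrependHi(emp)) {
            System.out.println(line);
        }
    }
}
```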

In upcoming articles we will cover UDAF and UDTF.
