Hadoop Lessons: Reusable scripts in Hive

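If I want to see the top ten rows of a table (users) in Hive, I write a query like this (note that without an ORDER BY clause, LIMIT returns an arbitrary ten rows rather than a particular "top" ten):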

select * from users limit 10;

I save it to a file on Unix, say topn.q, and run the query like this:

hive -f topn.q

The problem with the above script:

1. The table name and the number of rows are hard-coded, so if we have the same requirement for a different table or a different number, we have to write a new script or modify the existing one.

That is why we should write reusable scripts. In Hive we can achieve this with hiveconf, which substitutes variables into a Hive script at runtime.
Let us see how to avoid the hard coding in the above script by using hiveconf.

Change the above script to the following:

select * from ${hiveconf:tablename} limit ${hiveconf:number};

Save the above script in a file, for example dynatopN.q.

Now we can pass the table name and number when running the query:

hive -hiveconf tablename=users -hiveconf number=10 -f dynatopN.q

To query a different table or fetch a different number of rows, we only change the values on the command line; the script itself stays the same:

hive -hiveconf tablename=movies -hiveconf number=20 -f dynatopN.q
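If the same query is run often, the -hiveconf options can themselves be wrapped in a small shell function, so that only the table name and the row count need to be typed. This is a minimal sketch, assuming bash, the hive CLI on the PATH and dynatopN.q in the current directory; the function name run_topn and the HIVE_BIN override are hypothetical additions, not part of Hive.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the dynatopN.q script.
# HIVE_BIN defaults to "hive"; set HIVE_BIN=echo to print the command
# that would be run instead of starting Hive.

run_topn() {
  # Expect exactly two arguments: table name and number of rows.
  if [ "$#" -ne 2 ]; then
    echo "usage: run_topn <tablename> <number>" >&2
    return 1
  fi
  local table="$1"
  local rows="$2"

  # Reject an empty or non-numeric row count before it reaches LIMIT.
  case "$rows" in
    ''|*[!0-9]*)
      echo "run_topn: number must be a non-negative integer, got '$rows'" >&2
      return 1
      ;;
  esac

  # Same command as in the examples above, with the values filled in.
  "${HIVE_BIN:-hive}" -hiveconf tablename="$table" \
                      -hiveconf number="$rows" \
                      -f dynatopN.q
}
```

For example, run_topn movies 20 runs the same command as the movies example above. Checking the row count in the wrapper is a design choice: a typo such as "ten" fails immediately in the shell instead of producing an error from Hive.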

Production scripts should be touched as rarely as possible, so it is better to use hiveconf in production scripts as well.

To achieve the same in Pig, we use the -param option when running the script and refer to each parameter with the $ symbol inside the script.
As the number of parameters passed at runtime grows, a long list of -hiveconf options becomes hard to maintain in Hive. Pig provides one more option, -param_file, which takes the name of a file where all parameter names and values are kept.
So in this respect Pig is more flexible than Hive.
This parameterized approach is also recommended for production scripts: once a query has run successfully, we should avoid touching it as much as possible.
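The Pig side can be sketched in the same way. To keep the examples in shell, the sketch below writes a hypothetical Pig script (dynatopN.pig) and parameter file (topn.params) with heredocs, then runs Pig either with -param or with -param_file. The input paths, the comma delimiter and all file and function names are assumptions for illustration; it also assumes the pig command is on the PATH, with PIG_BIN as an optional override.

```shell
#!/usr/bin/env bash
# Hypothetical Pig counterpart of dynatopN.q. The input path, the comma
# delimiter and all file/function names below are assumptions.

write_pig_topn_files() {
  # Pig script: $input and $number are filled in by Pig's parameter
  # substitution before the script runs. The heredoc delimiter is quoted
  # so the shell leaves the $ signs untouched.
  cat > dynatopN.pig <<'EOF'
records = LOAD '$input' USING PigStorage(',');
top_rows = LIMIT records $number;
DUMP top_rows;
EOF

  # Parameter file: one name=value pair per line.
  cat > topn.params <<'EOF'
input=/data/users.csv
number=10
EOF
}

run_pig_topn_inline() {
  # Pass each parameter on the command line with -param.
  "${PIG_BIN:-pig}" -param input="$1" -param number="$2" dynatopN.pig
}

run_pig_topn_file() {
  # Read all parameters from the parameter file with -param_file.
  "${PIG_BIN:-pig}" -param_file topn.params dynatopN.pig
}
```

After write_pig_topn_files, run_pig_topn_inline /data/movies.csv 20 and run_pig_topn_file run the same script with different values. With the parameter file, adding a parameter means adding one line to topn.params rather than another option to every command line.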
