In this post I am going to discuss how to write word count program in Hive.
Assume we have data in our table like below
This is a Hadoop Post
and Hadoop is a big data technology
and we want to generate word count like below
a 2
and 1
Big 1
data 1
Hadoop 2
is 2
Post 1
technology 1
This 1
Now we will learn how to write program for the same.
1.Convert sentence into words
the data we have is in sentences,first we have to convert that it into words applying space as delimiter.we have to use split function of hive.
split (sentence ,' ')
2.Convert column into rows
Now we have array of strings like this
[This,is,a,hadoop,Post]
but we have to convert it into multiple rows like below
This
is
a
hadoop
Post
I mean we have to convert every line of data into multiple rows ,for this we have function called explode in hive and this is also called table generating function.
SELECT explode(split(sentence, ' ')) AS word FROM texttable
and create above output as intermediate table.
(SELECT explode(split(sentence, ' ')) AS word FROM texttable)tempTable
after second step you should get output like below
a
a
and
Big
data
Hadoop
Hadoop
is
is
Post
technology
This
3.Apply group by
after second step , it is straight forward ,we have to apply group by to count word occurrences.
select word,count(1) as count from
(SELECT explode(split(sentence, ' ')) AS word FROM texttable)tempTable
group by word
thank you sir..till now never think of word count using hive
ReplyDeleteAivivu chuyên vé máy bay, tham khảo
ReplyDeletegia ve may bay vietjet tu han quoc ve viet nam
đặt vé máy bay hải phòng sài gòn
vé máy bay sài gòn hà nội hôm nay
vé máy bay hải phòng nha trang vietjet
săn vé máy bay giá rẻ đi Mỹ
This is very informative and nice post. House Painter Schaumburg, Il
ReplyDelete