Hadoop ecosystem books to read

Even though Hadoop is a 10-year-old technology, you will still find few resources for learning it. There are different reasons for that. One of them is that Hadoop is a rapidly changing technology, and many people may not have tried all of its features. Sometimes it is also not ready for enterprise use cases.

In this article I would like to put a list of Hadoop ecosystem books in one place.

Hadoop: The Definitive Guide :

The first and most important book everybody should read on Hadoop is Hadoop: The Definitive Guide. It covers not only HDFS and MapReduce but also MapReduce abstractions like Cascading, Hive and Pig. Apart from that, it covers both the development and the administration aspects of the technology. The latest edition also covers the latest features, and the book covers the certification syllabus for both Hortonworks and Cloudera. The book is well organised and has quality content and examples.
However, if you want a complete understanding of a specific tool, you may have to check another book on that tool. This book covers all features of HDFS and MapReduce but only the core features of the other tools in the ecosystem. Spark is also covered in the latest edition.

Hadoop in Practice :

This book covers HDFS and MapReduce topics in depth, with very good coding examples.
Apart from HDFS and MapReduce, it also covers SQL tools like Hive, Impala and Spark SQL.

Hadoop Operations

This is a very good book on administration and operations. It covers installation and configuration of the Hadoop daemons. Operating system and network details are also covered as part of the cluster planning topics. It is a small book, but it has quality content on administration.
For administration, you may want to refer to this book along with Hadoop: The Definitive Guide.

Ecosystem tools:

This is a very good book on Apache Hive. It covers almost all topics of Hive. The best part is that it explains even the most difficult features of Hive in an understandable way. If you want to master UDFs and UDAFs, you can depend on it.

This is a small book that covers Apache Pig. The author has very good experience with Apache Pig, but the editorial work is not done properly, so there is scope for improvement in this book.

Cascading is a very useful tool in the Hadoop ecosystem. It has very good documentation on its home page. This book is very useful for learning more about the practical applications and the different analytical capabilities of the Cascading framework.

This book covers well why and where an RDBMS is not relevant in big data applications and how HBase addresses the problems of big data. It is useful for both development and administration of HBase.

Below are other books available on utility tools like Sqoop, Oozie and Flume. I have not read these books, and there are no other options for these tools as of now.

Apache Sqoop Cookbook :

Apache Oozie : The Workflow Scheduler for Hadoop

Using Flume : Flexible, Scalable and Reliable Data Streaming