
Hadoop Ecosystem Research Papers


As you know, the Hadoop ecosystem has many tools, and almost every one of them started out as a research paper. Several of the foundational papers were written by Google employees. In this article I would like to collect most of these papers in one place.



As you already know, Hadoop has two core modules, HDFS and MapReduce. These two are open source implementations of Google's GFS and MapReduce.


Below are the corresponding papers.


1. GFS (The Google File System).


2. MapReduce: Simplified Data Processing on Large Clusters.
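If you are new to the programming model that the MapReduce paper describes, here is a minimal sketch of the map, shuffle, and reduce steps in plain Python, using the classic word count example. This is only an illustration on a single machine, not Hadoop's API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence (hypothetical single-process driver)."""
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["hadoop stores data", "hadoop processes data"]))
```

In a real cluster the map and reduce calls run on many machines in parallel and the shuffle moves data over the network; the sketch only shows how the data flows between the phases.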

Apache Hive is a data warehouse built on top of Hadoop. It is an implementation of the paper "Hive: A Petabyte Scale Data Warehouse Using Hadoop".

Apache Pig is a platform for analyzing large data sets using the data flow language Pig Latin. It is an implementation of the paper "Pig Latin: A Not-So-Foreign Language for Data Processing".

Apache HBase is an open source implementation of Google's paper "Bigtable: A Distributed Storage System for Structured Data".

Apache Spark is an implementation of the paper "Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing".

Apache Tez is an implementation of the paper "Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications".

Apache Crunch is an implementation of Google's FlumeJava.

Apache ZooKeeper is an implementation of the paper "ZooKeeper: Wait-free Coordination for Internet-scale Systems".

YARN is described in the paper "Apache Hadoop YARN: Yet Another Resource Negotiator".

Apache Storm is an implementation of the paper "Storm @Twitter".


I hope these papers are useful to you.