Anatomy of a MapReduce Job Run
The Hadoop framework comprises two main components:
- Hadoop Distributed File System (HDFS) for data storage (see the short read example after this list)
- MapReduce for data processing
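As a small illustration of the storage side, the sketch below reads a file from HDFS through Hadoop's FileSystem API. It is a minimal sketch, not part of the discussion above: the class name HdfsRead and the path /user/demo/input.txt are placeholders, and it assumes the cluster configuration files (core-site.xml, hdfs-site.xml) are on the classpath.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();       // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);           // handle to the configured file system (HDFS)
    Path file = new Path("/user/demo/input.txt");   // placeholder HDFS path

    // Stream the file line by line; fs.open() returns an InputStream-compatible handle.
    try (BufferedReader reader =
             new BufferedReader(new InputStreamReader(fs.open(file)))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}
```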
A typical Hadoop MapReduce job is divided into a set of Map and Reduce tasks that execute on a Hadoop cluster.
The execution flow occurs as follows:
- The input data is divided into splits, each a small subset of the input.
- A Map task processes each split.
- The intermediate output of the Map tasks is then transferred to the Reduce tasks through an intermediate step called the 'shuffle'.
- The Reduce task(s) process this intermediate data to produce the final output of the MapReduce job (a minimal word-count job is sketched after this list).
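To make the flow concrete, the sketch below shows the classic word-count job using the standard org.apache.hadoop.mapreduce API. It is a minimal example under simple assumptions: the class names (WordCount, TokenizerMapper, IntSumReducer) and the input/output paths passed on the command line are illustrative, and the default TextInputFormat is assumed, so each Map call receives one line of text.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map task: runs once per record of an input split, emitting (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce task: receives all values for a key after the shuffle and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory on HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory on HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Each Map task emits (word, 1) pairs from its split, the shuffle groups all values for the same word together, and each Reduce call sums them to produce the final counts.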