
Anatomy of a MapReduce Job Run

The Hadoop framework comprises two main components:

  • Hadoop Distributed File System (HDFS) for data storage
  • MapReduce for data processing

A typical Hadoop MapReduce job is divided into a set of Map and Reduce tasks that execute on a Hadoop cluster. 

The execution flow occurs as follows:

  • The input data is divided into smaller chunks called input splits.
  • Map tasks process these splits, typically one Map task per split.
  • The intermediate output of the Map tasks is then passed to the Reduce tasks after an intermediate step called the ‘shuffle’, which sorts and groups the data by key.
  • The Reduce task(s) work on this grouped intermediate data to produce the final result of the MapReduce job, as illustrated in the sketch below.
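To make the flow concrete, here is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API (org.apache.hadoop.mapreduce). The class names (WordCount, TokenizerMapper, IntSumReducer) and the command-line input/output paths are illustrative choices, not part of the framework: the Mapper emits (word, 1) pairs for each input split, the shuffle groups those pairs by word, and the Reducer sums the counts for each word.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map task: runs once per input split and emits (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);   // intermediate (key, value) output
      }
    }
  }

  // Reduce task: receives all values for a key after the shuffle and sums them.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);   // final output written to HDFS
    }
  }

  // Driver: configures the job and submits it to the cluster.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // optional local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

To run it on a cluster, the class is packaged into a JAR and submitted with a command along the lines of "hadoop jar wordcount.jar WordCount /input/path /output/path" (the paths here are placeholders). The output directory must not exist beforehand; the Reduce tasks write the final results there as part files.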
