
Map Reduce Framework and Basics

  • MapReduce is a software framework for processing data sets in a distributed fashion over several machines.
  • Prior to Hadoop 2.0, MapReduce was the only way to process data in Hadoop.
  • A MapReduce job usually splits the input data set into independent chunks, which are processed by the map tasks in a completely parallel manner.
  • The core idea behind MapReduce is mapping your data set into a collection of <key, value> pairs, and then reducing over all pairs that share the same key (see the word-count sketch after this list).
  • The framework sorts the outputs of the maps, which are then used as input to the reduce tasks.
  • Both the input and the output of the job are stored in a file system.
  • The framework takes care of scheduling tasks, monitoring them, and re-executing any failed tasks.
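
To make the map and reduce phases concrete, here is a minimal word-count sketch against the Hadoop MapReduce Java API (org.apache.hadoop.mapreduce). The class names (WordCount, TokenizerMapper, SumReducer) and the word-count task itself are illustrative choices, not something prescribed by the framework:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // The map task turns each input line into <word, 1> pairs.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(line.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);   // emit <key, value> = <word, 1>
            }
        }
    }

    // The reduce task receives all values for one key (sorted and grouped
    // by the framework) and sums them into a single count.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable total = new IntWritable();

        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : counts) {
                sum += count.get();
            }
            total.set(sum);
            context.write(word, total);     // emit <word, total count>
        }
    }
}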

The overall concept is simple:

1. Almost all data can be mapped into <key, value> pairs somehow, and

2. Your keys and values may be of any type: strings, integers, dummy types and, of course, pairs themselves.
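
A job built from these pieces is configured and submitted through a driver, after which the framework handles input splitting, scheduling, sorting, and failure recovery. The driver below is a minimal sketch assuming the WordCount classes shown earlier; the class name WordCountDriver and the command-line input/output paths are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setReducerClass(WordCount.SumReducer.class);

        // Key and value types may be any Writable type: Text, IntWritable,
        // LongWritable, or a custom pair type of your own.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Both the input and the output of the job live in a file system
        // (typically HDFS); args[0] and args[1] are placeholder paths.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Such a job would typically be launched with a command along the lines of "hadoop jar wordcount.jar WordCountDriver /input /output", where the jar name and paths are again placeholders.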
