Hadoop Streaming

Hadoop Streaming is a feature that ships with the Hadoop distribution and lets developers write Map-Reduce programs in languages other than Java, such as Ruby, Perl, Python, and C++.

We can use any language that can read from standard input (STDIN) and write to standard output (STDOUT).
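To make the STDIN/STDOUT contract concrete, here is a minimal sketch of a word-count mapper in Python. The function name map_lines and the word-count task are assumptions chosen for illustration; they are not part of Hadoop itself. The only thing the script relies on is reading lines from STDIN and printing tab-separated key-value records to STDOUT.

```python
#!/usr/bin/env python3
"""Word-count mapper sketch for Hadoop Streaming (illustrative only)."""
import sys


def map_lines(lines):
    """Yield one 'word<TAB>1' record for every word in the input lines.

    map_lines is a hypothetical helper name, not a Hadoop API.
    """
    for line in lines:
        for word in line.split():
            # Key and value are separated by a tab character.
            yield f"{word.lower()}\t1"


if __name__ == "__main__":
    # The framework pipes input lines to this process on STDIN and
    # collects whatever the process prints on STDOUT.
    for record in map_lines(sys.stdin):
        print(record)
```

A script like this would typically be handed to the streaming jar through its -mapper option; the location of the jar depends on the installation.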

Although the Hadoop framework itself is written in Java, programs for Hadoop do not have to be written in Java.

In the diagram, we have an Input Reader, which reads the input data and produces a list of key-value pairs. The data can come in many forms: .csv files, delimited text, database tables, image data (.jpg, .png), audio data, etc.
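As a rough illustration of what "producing key-value pairs" means, the sketch below turns a small piece of CSV text into (key, value) pairs. The function read_records, the column names, and the sample data are all made up for this example; Hadoop's real input readers are configured in the job and are not written this way.

```python
import csv
import io


def read_records(csv_text, key_column):
    """Turn CSV text into (key, value) pairs, one pair per data row.

    Hypothetical helper: it only mimics the idea of an input reader.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        key = row[key_column]
        # The value is every other column of the row.
        value = {name: field for name, field in row.items() if name != key_column}
        yield key, value


# Made-up sample data for the sketch.
sample = "city,temp\nPune,31\nDelhi,35\n"
pairs = list(read_records(sample, "city"))
for key, value in pairs:
    print(key, value)
```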

This list of key-value pairs is fed to the Map phase, where the Mapper processes each key-value pair (for image input, for example, the pair for each pixel) and generates intermediate key-value pairs.

After shuffling and sorting, the intermediate key-value pairs are fed to the Reducer, and the final output produced by the Reducer is written to HDFS. This is how a simple Map-Reduce job works.
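The flow described above (map, shuffle and sort, reduce) can be imitated in memory with a few lines of Python. This is only a sketch under simplifying assumptions: everything runs in one process, the names mapper, reducer, and run_job are hypothetical, and the result is returned as a list instead of being written to HDFS.

```python
from itertools import groupby


def mapper(line):
    """Map phase: emit an intermediate (word, 1) pair for every word."""
    for word in line.split():
        yield word.lower(), 1


def reducer(key, values):
    """Reduce phase: add up all the counts that share one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle/sort, and reduce over a list of input lines."""
    # Map: collect the intermediate key-value pairs from every line.
    intermediate = [pair for line in lines for pair in mapper(line)]
    # Shuffle and sort: order by key so that equal keys sit next to each other.
    intermediate.sort(key=lambda pair: pair[0])
    # Reduce: call the reducer once per distinct key.
    results = []
    for key, group in groupby(intermediate, key=lambda pair: pair[0]):
        results.append(reducer(key, (value for _, value in group)))
    return results


output = run_job(["the quick fox", "the lazy dog", "The end"])
for word, count in output:
    print(f"{word}\t{count}")
```

Sorting before grouping matters here: groupby only merges neighbouring items with the same key, which mirrors why the reducer receives its input after the shuffle and sort step.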
