HDFS Interfaces

Rahul Saini May 12, 2022

HDFS Interfaces :

HDFS interfaces offer the following features :

Create new file
Upload files/folder
Set Permission
Copy
Move
Rename
Delete
Drag and Drop
HDFS File viewer
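
Most of these features also have a command-line counterpart in the `hdfs dfs` shell. The sketch below only builds the argument lists and does not run them. The function name `hdfs_command` and the feature keys are hypothetical and chosen for illustration; drag and drop is a graphical feature with no shell equivalent, so it is left out.

```python
import shlex

# Hypothetical mapping from the features listed above to `hdfs dfs` flags.
_FLAGS = {
    "create": ["-touchz"],    # create an empty file
    "upload": ["-put"],       # copy a local file or folder into HDFS
    "copy": ["-cp"],
    "move": ["-mv"],
    "rename": ["-mv"],        # a rename is a move to a new name
    "delete": ["-rm", "-r"],  # -r also removes directories
    "view": ["-cat"],         # print file contents
}

def hdfs_command(feature, *paths, mode=None):
    """Build (but do not execute) the argv for one `hdfs dfs` call."""
    if feature == "set_permission":
        if mode is None:
            raise ValueError("set_permission needs a mode such as '644'")
        return ["hdfs", "dfs", "-chmod", mode, *paths]
    return ["hdfs", "dfs", *_FLAGS[feature], *paths]

# Example paths are placeholders.
print(shlex.join(hdfs_command("upload", "report.csv", "/user/demo/")))
print(shlex.join(hdfs_command("set_permission", "/user/demo/report.csv", mode="644")))
```

Passing the resulting list to `subprocess.run` would execute the command on a machine where the Hadoop client is installed.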

Data Flow :

MapReduce is used to process very large amounts of data.
To handle the incoming data in a parallel and distributed fashion, the data flows through the following phases :

Input Reader :
The input reader reads the incoming data and splits it into blocks of an appropriate size (typically 64 MB to 128 MB).
Once the input reader has read the data, it generates the corresponding key-value pairs.
The input files reside in HDFS.
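
As a rough illustration of what the input reader does, the sketch below computes split boundaries and turns text into key-value pairs. It runs on in-memory bytes rather than HDFS, and the names `split_ranges` and `read_records` are hypothetical. Using the byte offset as the key and the line as the value is an assumption that mirrors how line-oriented text input is commonly handled. A real reader must also deal with lines that cross a split boundary, which this sketch ignores.

```python
def split_ranges(total_size, split_size):
    """Return (start, length) for each consecutive split of the input."""
    return [
        (start, min(split_size, total_size - start))
        for start in range(0, total_size, split_size)
    ]

def read_records(data):
    """Yield (byte offset, line text) pairs for line-oriented input."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)

# A 300 MB file with 128 MB splits yields two full splits and one of 44 MB.
print(split_ranges(300, 128))

for key, value in read_records(b"deer bear river\ncar car river\n"):
    print(key, value)
```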
Map Function :
The map function processes the incoming key-value pairs and generates the corresponding output key-value pairs.
The map input and output types may differ from each other.
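
A word-count mapper is a common way to picture this phase. The sketch below is plain Python rather than the Hadoop API, and `map_fn` is a hypothetical name. Note that the input pair is (int offset, str line) while the output pairs are (str word, int count), so the input and output types differ.

```python
def map_fn(offset, line):
    """Emit (word, 1) for every word in one input line.

    The offset key is ignored here; only the line value matters
    for counting words.
    """
    for word in line.split():
        yield word.lower(), 1

for pair in map_fn(0, "Deer bear deer"):
    print(pair)
```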
Partition Function :
The partition function assigns the output of each Map function to the appropriate reducer.
The function receives the key and value, along with the number of reducers.
It returns the index of the reducer that should handle the pair.
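
A hash-based partitioner is one simple way to do this. In the sketch below, `partition` is a hypothetical name, and the use of CRC32 as the hash is an assumption made so that the result is stable across Python runs (the built-in `hash` of a string is randomized per process). What matters is that the same key always maps to the same reducer index.

```python
import zlib

def partition(key, value, num_reducers):
    """Return a reducer index in the range [0, num_reducers).

    Only the key is hashed, so all pairs that share a key are sent
    to the same reducer regardless of their values.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers

for word in ["deer", "bear", "river", "deer"]:
    print(word, "->", partition(word, 1, 3))
```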
Shuffling and Sorting :
The data is shuffled between nodes so that it moves from the map nodes to the nodes where the reduce function will process it.
The input data for the Reduce function is sorted by key.
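
The effect of shuffling and sorting can be sketched on a single machine: outputs from several mappers are merged, sorted by key, and grouped so that each key appears once with all of its values. The name `shuffle_and_sort` is hypothetical, and the network transfer between nodes is not modelled.

```python
from itertools import groupby
from operator import itemgetter

def shuffle_and_sort(map_outputs):
    """Merge per-mapper outputs into a key-sorted list of (key, values)."""
    merged = [pair for output in map_outputs for pair in output]
    merged.sort(key=itemgetter(0))  # sort by key; the sort is stable
    return [
        (key, [value for _, value in group])
        for key, group in groupby(merged, key=itemgetter(0))
    ]

mapper_a = [("deer", 1), ("bear", 1)]
mapper_b = [("car", 1), ("deer", 1)]
print(shuffle_and_sort([mapper_a, mapper_b]))
```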
Reduce Function :
The Reduce function is called once for each unique key.
These keys are already arranged in sorted order.
The Reduce function iterates over the values associated with each key and generates the corresponding output.
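
Continuing the word-count illustration, a reducer can sum the values for each key. The names `reduce_fn` and `run_reduce` are hypothetical, and the grouped input is assumed to be already sorted by key.

```python
def reduce_fn(key, values):
    """Iterate over all values for one key and emit a single total."""
    total = 0
    for value in values:
        total += value
    yield key, total

def run_reduce(grouped):
    """Call reduce_fn once per unique key, in the order given."""
    results = []
    for key, values in grouped:
        results.extend(reduce_fn(key, iter(values)))
    return results

grouped = [("bear", [1]), ("car", [1, 1, 1]), ("deer", [1, 1])]
print(run_reduce(grouped))
```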
Output Writer :
Once the data has passed through all the above phases, the Output writer executes.
The role of the Output writer is to write the Reduce output to stable storage.
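
A minimal sketch of an output writer is shown below. It writes to a local temporary directory as a stand-in for stable storage, and `write_output` is a hypothetical name. The tab-separated layout and the `part-r-00000` style file name are assumptions modelled on the output files Hadoop jobs typically produce.

```python
import os
import tempfile

def write_output(pairs, directory, reducer_index=0):
    """Write one reducer's (key, value) pairs as tab-separated lines."""
    path = os.path.join(directory, f"part-r-{reducer_index:05d}")
    with open(path, "w", encoding="utf-8") as handle:
        for key, value in pairs:
            handle.write(f"{key}\t{value}\n")
    return path

with tempfile.TemporaryDirectory() as out_dir:
    result_path = write_output([("bear", 1), ("deer", 2)], out_dir)
    with open(result_path, encoding="utf-8") as handle:
        print(handle.read(), end="")
```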
