HDFS Interfaces :
Features of HDFS interfaces are :
- Create new file
- Upload files/folder
- Set Permission
- Copy
- Move
- Rename
- Delete
- Drag and Drop
- HDFS File viewer
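Most of the file operations listed above also have counterparts in the `hdfs dfs` command-line shell (drag and drop is specific to graphical interfaces). As a rough illustration, the sketch below only builds the argument lists and does not run them; the helper name `hdfs_command` and the example paths are hypothetical.

```python
# Maps some of the interface features above to `hdfs dfs` argument lists.
# Nothing is executed here; on a machine with a Hadoop client installed,
# a list could be passed to subprocess.run to actually run it.

def hdfs_command(operation, *paths, mode=None):
    """Return the argv list for one file operation (hypothetical helper)."""
    flags = {
        "create": ["-touchz"],    # create an empty file
        "upload": ["-put"],       # copy local files/folders into HDFS
        "copy": ["-cp"],
        "move": ["-mv"],
        "rename": ["-mv"],        # a rename is a move within HDFS
        "delete": ["-rm", "-r"],  # -r also removes directories
        "view": ["-cat"],         # print file contents
    }
    if operation == "chmod":
        if mode is None:
            raise ValueError("chmod needs a mode such as '644'")
        return ["hdfs", "dfs", "-chmod", mode, *paths]
    return ["hdfs", "dfs", *flags[operation], *paths]


print(hdfs_command("upload", "report.csv", "/user/demo/report.csv"))
print(hdfs_command("chmod", "/user/demo/report.csv", mode="644"))
```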
Data Flow :
- MapReduce is used to process huge amounts of data.
- To handle the incoming data in a parallel and distributed fashion, the data flows through the following phases :
- Input Reader :
- The input reader reads the incoming data and splits it into data blocks of an appropriate size (typically 64 MB to 128 MB).
- Once the input reader has read the data, it generates the corresponding key-value pairs.
- The input files reside in HDFS.
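The input reader phase can be sketched in plain Python. This is a simplified illustration, not Hadoop's actual reader: the function names `make_splits` and `read_split` are hypothetical, an in-memory byte string stands in for a file in HDFS, and a 16-byte split size stands in for the 64 MB to 128 MB blocks. The key is the byte offset of each line and the value is the line itself, which is how Hadoop's text input format presents records.

```python
def make_splits(total_size, split_size):
    """Return (start, length) pairs that cover the whole input."""
    return [(start, min(split_size, total_size - start))
            for start in range(0, total_size, split_size)]


def read_split(data, start, length):
    """Yield (byte_offset, line) key-value pairs for one split."""
    end = start + length
    pos = start
    if start != 0:
        # A line that begins at or before this split's start belongs to an
        # earlier split's reader, so skip ahead to the next line.
        newline = data.find(b"\n", start)
        pos = len(data) if newline == -1 else newline + 1
    # Read whole lines, running past the split's end to finish the last one.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        yield pos, data[pos:line_end].decode("utf-8")
        pos = line_end + 1


data = b"apple banana\ncherry\napple cherry\n"
for start, length in make_splits(len(data), split_size=16):
    print(start, list(read_split(data, start, length)))
```

Each line is emitted exactly once even when a split boundary falls in the middle of it, because a reader finishes the last line it started and the next reader skips that partial line.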
- Map Function :
- The map function processes the incoming key-value pairs and generates the corresponding output key-value pairs.
- The input and output types of the map function may differ from each other.
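As a small illustration of a map function, here is a word-count mapper in Python. Word count is an assumed example (the text does not name a specific job), and the name `map_function` is hypothetical. Note that the input pair is (int offset, str line) while the output pairs are (str word, int count), so the input and output types differ.

```python
def map_function(offset, line):
    """Turn one (offset, line) input pair into (word, 1) output pairs."""
    for word in line.lower().split():
        yield word, 1


# Sample input pairs, as an input reader might produce them.
records = [(0, "apple banana"), (13, "cherry"), (20, "apple cherry")]

mapped = [pair
          for offset, line in records
          for pair in map_function(offset, line)]
print(mapped)
# [('apple', 1), ('banana', 1), ('cherry', 1), ('apple', 1), ('cherry', 1)]
```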
- Partition Function :
- The partition function assigns the output of each Map function to the appropriate reducer.
- The key and value are passed to this function, along with the number of reducers.
- It returns the index of the reducer that should receive the pair.
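A minimal sketch of a hash-based partition function in Python follows. Hadoop's default partitioner takes the key's hash code modulo the number of reducers; here `zlib.crc32` stands in for that hash code, because Python's built-in `hash` of a string changes between runs. The name `partition` and the sample keys are assumptions for illustration.

```python
import zlib


def partition(key, value, num_reducers):
    """Return the index (0 .. num_reducers - 1) of the reducer for this pair.

    The value is accepted but ignored, so every pair with the same key
    is sent to the same reducer.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


for key in ["apple", "banana", "cherry", "apple"]:
    print(key, "->", "reducer", partition(key, 1, num_reducers=3))
```

Because the index depends only on the key, both occurrences of `apple` land on the same reducer, which is what allows the reducer to see all values for a key together.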
- Shuffling and Sorting :
- The data is shuffled between nodes so that it moves from the map nodes to the nodes where the Reduce function will process it.
- The input data for the Reduce function is sorted by key.
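The shuffle and sort phase can be sketched on a single machine as follows. This is only an illustration under stated assumptions: lists stand in for data moved between nodes, the name `shuffle_and_sort` is hypothetical, and `zlib.crc32` is used as a stand-in partition hash.

```python
import zlib
from collections import defaultdict


def shuffle_and_sort(map_outputs, num_reducers):
    """Route map output to reducers, then sort each reducer's input by key.

    map_outputs: one list of (key, value) pairs per map task.
    Returns one list per reducer of (key, [values]) pairs in key order.
    """
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:
        for key, value in pairs:
            index = zlib.crc32(key.encode("utf-8")) % num_reducers
            buckets[index][key].append(value)  # shuffle: group by key
    # sort: each reducer receives its keys in sorted order
    return [sorted(bucket.items()) for bucket in buckets]


map_outputs = [
    [("banana", 1), ("apple", 1)],   # output of map task 0
    [("cherry", 1), ("apple", 1)],   # output of map task 1
]
for index, reducer_input in enumerate(shuffle_and_sort(map_outputs, 2)):
    print("reducer", index, reducer_input)
```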
- Reduce Function :
- The Reduce function is called once for each unique key.
- These keys are already arranged in sorted order.
- The Reduce function iterates over the values associated with each key and generates the corresponding output.
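Continuing the assumed word-count example, a reduce function might look like the sketch below. The name `reduce_function` and the sample grouped input are hypothetical; the input is already grouped and sorted by key, as the shuffle and sort phase would deliver it.

```python
def reduce_function(key, values):
    """Called once per unique key; iterates over that key's values."""
    total = 0
    for value in values:
        total += value
    yield key, total


# Input for one reducer: keys in sorted order, each with its values.
grouped = [("apple", [1, 1]), ("banana", [1]), ("cherry", [1, 1])]

results = [output
           for key, values in grouped
           for output in reduce_function(key, values)]
print(results)
# [('apple', 2), ('banana', 1), ('cherry', 2)]
```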
- Output Writer :
- Once the data has flowed through all the above phases, the Output writer executes.
- The role of the Output writer is to write the Reduce output to stable storage.
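A minimal sketch of an output writer is shown below. It assumes a local temporary directory in place of HDFS and writes one tab-separated key-value pair per line; the function name `write_output` is hypothetical, and the `part-r-00000` file name imitates the naming Hadoop uses for reducer output files.

```python
import os
import tempfile


def write_output(results, output_dir, reducer_index):
    """Write one reducer's (key, value) results to its own output file."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, f"part-r-{reducer_index:05d}")
    with open(path, "w", encoding="utf-8") as handle:
        for key, value in results:
            handle.write(f"{key}\t{value}\n")
    return path


output_dir = tempfile.mkdtemp()  # stand-in for an HDFS output directory
path = write_output([("apple", 2), ("banana", 1)], output_dir, reducer_index=0)
with open(path, encoding="utf-8") as handle:
    print(handle.read())
```

Each reducer writes its own file, so a job with several reducers produces several part files in the same output directory.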