I/O Compression:
- In the Hadoop framework, where large data sets are stored and processed, you need storage for large files.
- These files are divided into blocks, and those blocks are stored on different nodes across the cluster, so a lot of I/O and network data transfer is involved.
- To reduce the storage requirements and the time spent on network transfer, you can use data compression in the Hadoop framework.
- Using data compression in Hadoop, you can compress files at various steps; at each of these steps it helps reduce the amount of storage used and the quantity of data transferred.
- You can compress the input file itself.
- That helps you reduce the storage space it takes up in HDFS (see the first sketch after this list).
- You can also configure a MapReduce job so that its output is compressed.
- That helps reduce storage space if you are archiving the output or sending it to some other application for further processing (see the second sketch after this list).
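To make the input-compression step concrete, here is a minimal sketch that compresses a local file with Hadoop's bundled GzipCodec while writing it into HDFS. The paths /tmp/input.txt and /user/demo/input.txt.gz and the class name CompressToHdfs are placeholders invented for illustration; the codec and stream calls (ReflectionUtils.newInstance, CompressionCodec.createOutputStream, IOUtils.copyBytes) are the standard Hadoop APIs.

```java
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class CompressToHdfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem hdfs = FileSystem.get(conf);
    LocalFileSystem local = FileSystem.getLocal(conf);

    // GzipCodec ships with Hadoop; other codecs plug in the same way.
    CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);

    // Placeholder paths for illustration only.
    Path localIn = new Path("/tmp/input.txt");
    Path hdfsOut = new Path("/user/demo/input.txt.gz");

    // Wrap the HDFS output stream in the codec's compressing stream,
    // then copy the local file's bytes through it.
    try (InputStream in = local.open(localIn);
         OutputStream out = codec.createOutputStream(hdfs.create(hdfsOut))) {
      IOUtils.copyBytes(in, out, 4096);
    }
  }
}
```

One design note: gzip-compressed files are not splittable, so for very large input files a splittable codec (such as bzip2) or a container format is often a better fit.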
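And here is a minimal sketch of enabling compression in a MapReduce job driver, assuming the newer org.apache.hadoop.mapreduce API. It turns on compression for both the intermediate map output (to cut shuffle traffic) and the final job output. The class name CompressedOutputDriver and the job name "compressed-output-demo" are placeholders; the property keys and FileOutputFormat calls are the stock Hadoop ones.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CompressedOutputDriver {
  public static Job buildJob() throws Exception {
    Configuration conf = new Configuration();

    // Compress the intermediate map output to reduce the data shuffled to reducers.
    conf.setBoolean("mapreduce.map.output.compress", true);
    conf.setClass("mapreduce.map.output.compress.codec",
        GzipCodec.class, CompressionCodec.class);

    // Placeholder job name for illustration.
    Job job = Job.getInstance(conf, "compressed-output-demo");

    // Compress the final output files the job writes to HDFS.
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

    // Set mapper, reducer, and input/output paths as usual before submitting.
    return job;
  }
}
```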