Data Replication:
- Replication ensures the availability of the data.
- Replication means making copies of something; the number of copies kept of a given item is called its Replication Factor.
- HDFS stores data as a set of blocks, and Hadoop is configured to keep copies of each of those file blocks.
- By default, the Replication Factor for Hadoop is set to 3, and this value can be changed.
- We need this replication for our file blocks because Hadoop runs on commodity hardware (inexpensive system hardware), which can crash at any time.
- We are not using a supercomputer for our Hadoop setup.
- That is why HDFS needs a feature that keeps backup copies of the file blocks; the resulting ability to survive hardware failures is known as fault tolerance.
- For large organizations, the data matters far more than the storage it occupies, so the extra storage consumed by the replicas is treated as an acceptable cost.
- You can configure the Replication Factor in your hdfs-site.xml file through the dfs.replication property.
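
To make the storage cost of replication concrete, here is a minimal sketch in Python. The helper name `replicated_storage` is hypothetical, and the 128 MB block size is an assumption (it is the usual default in recent Hadoop versions, but your cluster may differ). It only does the arithmetic described above and does not talk to a real cluster.

```python
import math

def replicated_storage(file_size_mb, block_size_mb=128, replication_factor=3):
    """Return (logical blocks, total block replicas, raw storage in MB).

    block_size_mb=128 is an assumed default; replication_factor=3 matches
    the default Replication Factor mentioned in the text.
    """
    # A file is split into fixed-size blocks; the last block may be partial.
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored replication_factor times across the cluster.
    total_replicas = num_blocks * replication_factor
    # A partial last block only occupies its actual size, so raw usage
    # is simply the file size multiplied by the replication factor.
    raw_storage_mb = file_size_mb * replication_factor
    return num_blocks, total_replicas, raw_storage_mb

blocks, replicas, raw_mb = replicated_storage(400)
print(blocks, replicas, raw_mb)
```

With these assumed values, a 400 MB file is split into 4 blocks, stored as 12 block replicas, and takes 1200 MB of raw disk space. That tripled footprint is the extra storage the list above refers to.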