Header Ads Widget

Data Replication

Data Replication :

  • Replication ensures the availability of the data. 
  • Replication is - making a copy of something and the

number of times you make a copy of that particular thing can be expressed as its Replication Factor. 

  • As HDFS stores the data in the form of various blocks at the same time Hadoop is also configured to make a copy of those file blocks. 
  • By default, the Replication Factor for Hadoop is set to 3 which can be configured.
  • We need this replication for our file blocks because for running Hadoop we are using commodity hardware (inexpensive system hardware) which can be crashed at any time.
  • We are not using a supercomputer for our Hadoop setup. 
  • That is why we need such a feature in HDFS that can make copies of that file blocks for backup purposes, this is known as fault tolerance.
  • For the big brand organization, the data is very much important than the storage, so nobody cares about this extra storage.
  • You can configure the Replication factor in your hdfs-site.xml file.


Post a Comment

0 Comments