Design of HDFS

  • HDFS is a filesystem designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.
  • There are Hadoop clusters running today that store petabytes of data.
  • HDFS is built around the idea that the most efficient data processing pattern is a write-once, read-many-times pattern. 
  • A dataset is typically generated or copied from a source, then various analyses are performed on that dataset over time.

  • It’s designed to run on clusters of commodity hardware (commonly available hardware from multiple vendors), for which the chance of node failure across the cluster is high, at least for large clusters. 
  • HDFS is designed to carry on working without a noticeable interruption to the user in the face of such failures.
  • Since the namenode holds filesystem metadata in memory, the limit to the number of files in a filesystem is governed by the amount of memory on the namenode.
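The namenode memory limit can be made concrete with a back-of-the-envelope calculation. A common rule of thumb (an approximation, not an exact figure) is that each file, directory, and block consumes on the order of 150 bytes of namenode memory. A minimal sketch, with the 150-byte figure and the example workload as assumptions:

```python
# Rough estimate of namenode heap consumed by filesystem metadata.
# Assumption (rule of thumb, not an exact figure): each file, directory,
# and block costs on the order of 150 bytes of namenode memory.
BYTES_PER_OBJECT = 150

def namenode_memory_bytes(num_files, blocks_per_file=1, num_dirs=0):
    """Approximate namenode heap used by metadata, in bytes."""
    objects = num_files + num_files * blocks_per_file + num_dirs
    return objects * BYTES_PER_OBJECT

# One million single-block files -> two million objects -> roughly 300 MB.
print(namenode_memory_bytes(1_000_000) / 1024**2)  # ≈ 286 MiB
```

This is also why HDFS favors a modest number of large files over many small ones: a million small files cost the same metadata as a million large ones, while storing far less data.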
  • Files in HDFS may be written to by a single writer at a time. 
  • Writes are always made at the end of the file.
  • There is no support for multiple writers, or for modifications at arbitrary offsets in the file.
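The write-once, append-only model is visible directly in the HDFS shell: you can copy a file in and append records to its end, but there is no command for modifying bytes at an arbitrary offset. A hedged sketch (the paths and filenames are illustrative, and the commands require a running cluster):

```shell
# Copy a local file into HDFS (write-once: fails if the target already exists).
hdfs dfs -put events.log /data/events.log

# Append more records to the end of the existing file.
hdfs dfs -appendToFile more-events.log /data/events.log

# There is no command for in-place edits at arbitrary offsets;
# to change earlier content you must rewrite the whole file.
```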
