HDFS Concepts :
- Blocks :
- HDFS has the concept of a block, but it is a much larger unit than a disk block: 64 MB by default.
- Files in HDFS are broken into block-sized chunks, which are stored as independent units.
- Having a block abstraction for a distributed filesystem brings several benefits :
- A file can be larger than any single disk in the network. Nothing requires the blocks from a file to be stored on the same disk, so they can take advantage of any of the disks in the cluster.
- Making the unit of abstraction a block rather than a file simplifies the storage subsystem. Storage management becomes simpler (since blocks are a fixed size, it is easy to calculate how many can be stored on a given disk), and metadata concerns are eliminated: blocks are just chunks of data to be stored, so file metadata such as permissions can be handled separately by another system.
- Blocks fit well with replication for providing fault tolerance and availability. To insure against corrupted blocks and disk and machine failure, each block is replicated to a small number of physically separate machines (typically three).
- HDFS blocks are large compared to disk blocks in order to minimize the cost of seeks: if the block is large enough, the time to transfer the data from the disk can be significantly longer than the time to seek to the start of the block.
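The chunking described above can be sketched with a little arithmetic. This is a minimal illustration (not Hadoop code) of how a file's size maps onto block-sized chunks, assuming the default 64 MB block size; note the last chunk may be smaller than a full block.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # HDFS default block size: 64 MB

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the sizes of the block-sized chunks a file is broken into.

    Every chunk is block_size bytes except possibly the last one,
    which holds whatever remains.
    """
    if file_size == 0:
        return []
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])

# A 200 MB file breaks into three full 64 MB blocks plus one 8 MB block.
chunks = split_into_blocks(200 * 1024 * 1024)
```

Because blocks are fixed-size, the same arithmetic tells a storage node exactly how many blocks fit on a disk of a given capacity, which is the simplification the notes above refer to.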
- Namenodes and Datanodes :
- An HDFS cluster has two types of nodes operating in a master-worker pattern:
- A namenode (the master) and
- A number of datanodes (workers).
- The namenode manages the filesystem namespace.
- It maintains the filesystem tree and the metadata for all the files and directories in the tree.
- This information is stored persistently on the local disk in the form of two files:
- The namespace image
- The edit log.
- The namenode also knows the datanodes on which all the blocks for a given file are located; it does not, however, store block locations persistently, since this information is reconstructed from the datanodes when the system starts.