HDFS Concepts :
- Blocks :
- HDFS has the concept of a block, but it is a much larger unit than a disk block: 64 MB by default.
- Files in HDFS are broken into block-sized chunks, which are stored as independent units.
- Having a block abstraction for a distributed filesystem brings several benefits :
- A file can be larger than any single disk in the network. Nothing requires the blocks from a file to be stored on the same disk, so they can take advantage of any of the disks in the cluster.
- Making the unit of abstraction a block rather than a file simplifies the storage subsystem. Storage management becomes simpler (since blocks are a fixed size, it is easy to calculate how many can be stored on a given disk), and metadata concerns are eliminated: blocks are just chunks of data to be stored, so file metadata such as permissions can be handled separately by another system.
- Blocks fit well with replication for providing fault tolerance and availability. To insure against corrupted blocks and disk and machine failure, each block is replicated to a small number of physically separate machines (typically three).
- HDFS blocks are large compared to disk blocks in order to minimize the cost of seeks: if the block is large enough, the time to transfer the data from the disk can be significantly longer than the time to seek to the start of the block.
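The chunking described above can be sketched with a little arithmetic. This is a minimal illustration (not Hadoop code) of how a file's size maps onto block-sized chunks, assuming the default 64 MB block size; note the last chunk may be smaller than a full block.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # HDFS default block size: 64 MB

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the sizes of the block-sized chunks a file is broken into.

    Every chunk is block_size bytes except possibly the last one,
    which holds whatever remains.
    """
    if file_size == 0:
        return []
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])

# A 200 MB file breaks into three full 64 MB blocks plus one 8 MB block.
chunks = split_into_blocks(200 * 1024 * 1024)
```

Because blocks are fixed-size, the same arithmetic tells a storage node exactly how many blocks fit on a disk of a given capacity, which is the simplification the notes above refer to.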
- Namenodes and Datanodes :
- An HDFS cluster has two types of nodes operating in a master-worker pattern:
- A namenode (the master) and
- A number of datanodes (workers).
- The namenode manages the filesystem namespace.
- It maintains the filesystem tree and the metadata for all the files and directories in the tree.
- This information is stored persistently on the local disk in the form of two files:
- The namespace image
- The edit log.
- The namenode also knows the datanodes on which all the blocks for a given file are located; it does not, however, store block locations persistently, since this information is reconstructed from the datanodes when the system starts.