Benefits and Challenges:
Benefits of HDFS:
- HDFS can store a large amount of information.
- HDFS has a simple and robust coherency model.
- HDFS is scalable and provides fast, high-throughput access to the required data.
- HDFS can also serve a substantial number of clients, with capacity growing as more machines are added to the cluster.
- HDFS provides streaming read access.
- HDFS follows a write-once, read-many pattern: data is written to HDFS once and can then be read many times.
- Recovery techniques are applied quickly after a failure.
- HDFS is portable across heterogeneous commodity hardware and operating systems.
- High economy, by distributing data and processing across clusters of commodity personal computers.
- High efficiency, by distributing data and logic across parallel nodes so that data is processed where it is located.
- High reliability, by automatically maintaining multiple copies of data and automatically redeploying processing logic in the event of failures.
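The reliability point above (multiple copies of data, restored automatically after a failure) can be illustrated with a small toy model. This is only a sketch in plain Python, not the HDFS implementation or its API: the replication factor of 3, the node names, and the functions `place_blocks` and `handle_node_failure` are all assumptions made for illustration, and placement here is random rather than rack-aware.

```python
import random

REPLICATION = 3  # assumed replication factor for this sketch


def place_blocks(block_ids, nodes, replication=REPLICATION, seed=0):
    """Assign each block to `replication` distinct nodes (toy random placement)."""
    rng = random.Random(seed)
    return {b: set(rng.sample(nodes, replication)) for b in block_ids}


def handle_node_failure(placement, failed, live_nodes,
                        replication=REPLICATION, seed=1):
    """Drop the failed node, then copy under-replicated blocks to other live nodes."""
    rng = random.Random(seed)
    for holders in placement.values():
        holders.discard(failed)
        # Live nodes that do not already hold a copy of this block.
        candidates = [n for n in live_nodes if n not in holders]
        while len(holders) < replication and candidates:
            target = rng.choice(candidates)
            candidates.remove(target)
            holders.add(target)
    return placement


# Hypothetical five-node cluster storing three blocks.
nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(["blk_1", "blk_2", "blk_3"], nodes)

# Simulate the loss of node2 and restore the replica count.
live = [n for n in nodes if n != "node2"]
placement = handle_node_failure(placement, "node2", live)

for block, holders in sorted(placement.items()):
    print(block, sorted(holders))
```

After the simulated failure, every block is again held by three distinct live nodes, which is the property the bullet describes: no single machine failure makes the data unavailable.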
Challenges for HDFS:
- Data stored on a single machine has no reliability if that machine goes down.
- An enormous number of clients must be handled if all the clients need the data stored on a single machine.
- Clients need to copy the data to their local machines before they can operate on it.
- Applications that require low-latency access to data, in the tens of milliseconds range, will not work well with HDFS.
- Since the namenode holds filesystem metadata in memory, the limit to the number of files in a filesystem is governed by the amount of memory on the namenode.
- Files in HDFS may be written to by only a single writer at a time. Writes are always made at the end of the file.
- There is no support for multiple writers, or for modifications at arbitrary offsets in the file.
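The namenode memory limit mentioned above can be made concrete with a rough calculation. The sketch below assumes the commonly quoted rule of thumb that each file, directory, and block costs about 150 bytes of namenode memory; that figure is an approximation, not a measured value for any particular cluster, and the function name `estimate_namenode_memory` is hypothetical.

```python
BYTES_PER_OBJECT = 150  # assumed rule of thumb per file, directory, or block


def estimate_namenode_memory(num_files, blocks_per_file=1, num_dirs=0,
                             bytes_per_object=BYTES_PER_OBJECT):
    """Rough namenode memory needed for filesystem metadata, in bytes."""
    # One metadata object per file, per block, and per directory.
    objects = num_files + num_files * blocks_per_file + num_dirs
    return objects * bytes_per_object


# One million files, each occupying a single block.
one_million = estimate_namenode_memory(1_000_000)
print(f"{one_million / 1e6:.0f} MB")  # 300 MB
```

Under this assumption, a million single-block files need roughly 300 MB of namenode memory, so the same volume of data stored as many small files costs far more metadata than it would as a few large files.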