MRv2
MRv2 (also known as YARN, "Yet Another Resource Negotiator") has one ResourceManager per cluster, and each data node runs a NodeManager. For each job, one worker node hosts the ApplicationMaster, which monitors that job's resources and tasks. The MapReduce framework in Hadoop 1.x is known as MRv1. The MRv1 framework handles client communication, job execution and management, resource scheduling, and resource management. The Hadoop daemons associated with MRv1 are the JobTracker and the TaskTracker, as shown in the following figure:
YARN
YARN stands for "Yet Another Resource Negotiator". It was introduced in Hadoop 2.0 to remove the JobTracker bottleneck that was present in Hadoop 1.0. YARN was described as a "Redesigned Resource Manager" when it launched, but it has since come to be regarded as a large-scale distributed operating system for Big Data processing. YARN also allows different data processing engines, such as graph processing, interactive processing, stream processing, and batch processing, to run against data stored in HDFS (Hadoop Distributed File System), which makes the system much more efficient. Through its components, it dynamically allocates resources and schedules application processing. For large-volume data processing, the available resources must be managed properly so that every application can make use of them.
Running MRv1 in YARN
YARN uses the ResourceManager web interface for monitoring applications running on a YARN cluster. The ResourceManager UI shows the basic cluster metrics, list of applications, and nodes associated with the cluster. In this section, we'll discuss the monitoring of MRv1 applications over YARN.
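The same information that the ResourceManager UI displays is also exposed through the ResourceManager REST API, which can be handy for scripted monitoring. The following is a minimal sketch, not a definitive implementation. It assumes the web service listens on port 8088 (the usual default) and uses a hypothetical host name; `fetch_apps` and `summarize_apps` are illustrative helper names, not part of Hadoop.

```python
import json
from urllib.request import urlopen

# Assumption: hypothetical host; 8088 is the usual default web port.
RM_URL = "http://resourcemanager.example.com:8088"


def fetch_apps(rm_url=RM_URL, app_type="MAPREDUCE"):
    """Fetch applications of one type from the ResourceManager REST API."""
    url = f"{rm_url}/ws/v1/cluster/apps?applicationTypes={app_type}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_apps(payload):
    """Count applications per state in a /ws/v1/cluster/apps response."""
    # Treat a missing or null "apps" entry as an empty list.
    apps = (payload.get("apps") or {}).get("app", [])
    counts = {}
    for app in apps:
        counts[app["state"]] = counts.get(app["state"], 0) + 1
    return counts
```

Against a live cluster, `summarize_apps(fetch_apps())` would return a dictionary mapping each application state (for example `RUNNING` or `FINISHED`) to the number of MapReduce applications in that state.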
The ResourceManager is the core component of YARN. By analogy, it takes the place of the JobTracker in MRv1. Hadoop YARN is designed to provide a generic and flexible framework for administering the computing resources in a Hadoop cluster.
To that end, the YARN ResourceManager service (RM) is the central controlling authority for resource management and makes the allocation decisions. The ResourceManager has two main components: the Scheduler and the ApplicationsManager.
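The division of labour between the two components can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Hadoop's actual implementation or API. The class names mirror the component names, but the first-fit allocation policy, the memory-only resource model, and the `ACCEPTED`/`PENDING` labels are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_mb: int


@dataclass
class Scheduler:
    """Toy scheduler: hands out memory from the first node that fits."""
    nodes: list

    def allocate(self, mb):
        for node in self.nodes:
            if node.free_mb >= mb:
                node.free_mb -= mb
                return (node.name, mb)  # stands in for a container
        return None  # no node has enough free memory right now


@dataclass
class ApplicationsManager:
    """Toy applications manager: accepts submissions and asks the
    scheduler for the first container, which would host the
    ApplicationMaster for that job."""
    scheduler: Scheduler
    apps: dict = field(default_factory=dict)

    def submit(self, app_id, am_mb=1024):
        container = self.scheduler.allocate(am_mb)
        # Hypothetical status labels, used only in this sketch.
        self.apps[app_id] = "ACCEPTED" if container else "PENDING"
        return container


scheduler = Scheduler([Node("node1", 2048), Node("node2", 1024)])
manager = ApplicationsManager(scheduler)
first = manager.submit("job-1")               # fits on node1
second = manager.submit("job-2", am_mb=4096)  # larger than any node
```

Here the scheduler only decides where resources are granted, while the applications manager tracks submissions. That separation is the point of the sketch; a real cluster also accounts for CPU, queues, and ApplicationMaster restarts.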