Developing a MapReduce Application
Writing a program in MapReduce follows a certain pattern.
You start by writing your map and reduce functions, ideally with unit tests to make sure they do what you expect.
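For the map and reduce functions themselves, a test harness such as Apache MRUnit lets you feed known inputs to a mapper or reducer in-process and assert on the outputs, without starting a cluster. The sketch below is illustrative only: the WordCountMapper class and its word-count semantics are hypothetical examples, not something defined in this text.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class WordCountMapperTest {

  // Hypothetical mapper for illustration: emits (word, 1) for each token.
  public static class WordCountMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        word.set(token);
        context.write(word, ONE);
      }
    }
  }

  @Test
  public void emitsOneCountPerWord() throws IOException {
    // MapDriver runs the mapper in-process and verifies the expected
    // output records, in order.
    new MapDriver<LongWritable, Text, Text, IntWritable>()
        .withMapper(new WordCountMapper())
        .withInput(new LongWritable(0), new Text("map reduce map"))
        .withOutput(new Text("map"), new IntWritable(1))
        .withOutput(new Text("reduce"), new IntWritable(1))
        .withOutput(new Text("map"), new IntWritable(1))
        .runTest();
  }
}
```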
Then you write a driver program to run the job, which you can run from within your IDE on a small subset of the data to check that it works.
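A minimal driver sketch along those lines, assuming the hypothetical WordCountMapper from the test above and a matching hypothetical WordCountReducer (defined inline here):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {

  // Hypothetical reducer for illustration: sums the counts for each word.
  public static class WordCountReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable value : values) {
        sum += value.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  @Override
  public int run(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.printf("Usage: %s <input path> <output path>%n",
          getClass().getSimpleName());
      return -1;
    }
    Job job = Job.getInstance(getConf(), "word count");
    job.setJarByClass(getClass());
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // The hypothetical mapper from the test sketch above.
    job.setMapperClass(WordCountMapperTest.WordCountMapper.class);
    job.setReducerClass(WordCountReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner parses the standard Hadoop options (-conf, -D, etc.)
    // before handing the remaining arguments to run().
    System.exit(ToolRunner.run(new WordCountDriver(), args));
  }
}
```

Because the driver goes through ToolRunner, the same program can be pointed at a local run, a pseudo-distributed setup, or a full cluster purely through configuration, which is what makes the small-dataset IDE run and the cluster run the same code.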
If it fails, you can use your IDE’s debugger to find the source of the problem.
When the program runs as expected against the small dataset, you are ready to unleash it on a cluster.
Running against the full dataset is likely to expose some more issues, which you can fix by expanding your tests and altering your mapper or reducer to handle the new cases.
After the program is working, you may wish to do some tuning:
- First, by running through some standard checks for making MapReduce programs faster.
- Second, by doing task profiling.
Profiling distributed programs is not easy, but Hadoop has hooks to aid in the process.
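As an illustration of those hooks, Hadoop 2 exposes task profiling through configuration properties (earlier releases used the older mapred.task.profile family); enabling them makes the framework run a handful of tasks under a JVM profiling agent such as HPROF and collect the output. A hedged sketch, using the Hadoop 2 property names and illustrative values:

```java
// Inside a driver's run() method, before creating the Job:
// enable JVM profiling for a handful of tasks. The agent parameters
// shown here are one reasonable HPROF setting, not the only option.
Configuration conf = getConf();
conf.setBoolean("mapreduce.task.profile", true);
conf.set("mapreduce.task.profile.params",
    "-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s");
conf.set("mapreduce.task.profile.maps", "0-2");    // profile only map tasks 0, 1, 2
conf.set("mapreduce.task.profile.reduces", "0-2"); // and reduce tasks 0, 1, 2
```

Since a ToolRunner-based driver accepts -D options, the same properties can also be set on the command line for a single run, without recompiling.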
Before we start writing a MapReduce program, we need to set up and configure the development environment.
Components in Hadoop are configured using Hadoop’s own configuration API.
An instance of the Configuration class represents a collection of configuration properties and their values.
Each property is named by a String, and the value may be one of several types, including Java primitives such as boolean, int, long, and float; other useful types such as String, Class, and java.io.File; and collections of Strings.
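A minimal sketch of the API, assuming a hypothetical resource file configuration-1.xml on the classpath that defines, say, a String property color and an int property size:

```java
import org.apache.hadoop.conf.Configuration;

public class ConfigurationExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Hypothetical resource; Configuration reads properties from XML files.
    conf.addResource("configuration-1.xml");

    String color = conf.get("color");                        // String-valued property
    int size = conf.getInt("size", 0);                       // int, with a default of 0
    boolean overwrite = conf.getBoolean("overwrite", false); // unset, so default is used
    String[] owners = conf.getStrings("owners");             // comma-separated Strings

    System.out.printf("color=%s size=%d overwrite=%b%n", color, size, overwrite);
  }
}
```

The typed getters (getInt, getBoolean, and so on) take a default value, which is returned when the property is not defined in any of the loaded resources.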