Test Hadoop
We'll look at the best ways to test your MapReduce code, along with some design aspects to consider when writing MapReduce jobs that will make your testing efforts easier.
[Java] Logging Configuration
You'll need a log4j.properties file on the classpath that is configured to write to standard out:
log4j.rootLogger=debug,stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d [%t] %p %C.%M - %m%n
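With the console appender in place, any log4j output from your job shows up directly in the console while tests run. As a minimal illustration (this LoggingMapper class and its messages are hypothetical, not part of the original code), logging from inside a mapper looks like this:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.log4j.Logger;

// Hypothetical mapper showing log4j usage; its output reaches the console
// through the stdout appender configured above.
public class LoggingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final Logger LOG = Logger.getLogger(LoggingMapper.class);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        LOG.debug("Processing record at offset " + key + ": " + value);
        // ... emit output records here ...
    }
}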
[Java] Test Case Using MRUnit
Javadocs: http://mrunit.apache.org/documentation/javadocs/1.0.0/index.html
Download Source Code
Library Requirements:
- commons-logging-1.1.1.jar
- hamcrest-core-1.1.jar
- junit-4.10.jar
- mockito-all-1.8.5.jar
- mrunit-1.0.0-hadoop1.jar
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.apache.hadoop.mrunit.types.Pair;
import org.junit.Before;
import org.junit.Test;

public class WordCountTest {

    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
    private ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
    private MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;

    @Before
    public void setUp() throws Exception {
        WordCountMapper mapper = new WordCountMapper();
        WordCountReducer reducer = new WordCountReducer();
        mapDriver = MapDriver.newMapDriver(mapper);
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
        mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
    }

    // The mapper alone should emit (word, 1) for each comma-separated token,
    // in input order.
    @Test
    public void testMapper() throws IOException {
        mapDriver.withInput(new LongWritable(), new Text("one,two,two,three,four,one,one,five"));
        List<Pair<Text, IntWritable>> results = new ArrayList<Pair<Text, IntWritable>>();
        results.add(new Pair<Text, IntWritable>(new Text("one"), new IntWritable(1)));
        results.add(new Pair<Text, IntWritable>(new Text("two"), new IntWritable(1)));
        results.add(new Pair<Text, IntWritable>(new Text("two"), new IntWritable(1)));
        results.add(new Pair<Text, IntWritable>(new Text("three"), new IntWritable(1)));
        results.add(new Pair<Text, IntWritable>(new Text("four"), new IntWritable(1)));
        results.add(new Pair<Text, IntWritable>(new Text("one"), new IntWritable(1)));
        results.add(new Pair<Text, IntWritable>(new Text("one"), new IntWritable(1)));
        results.add(new Pair<Text, IntWritable>(new Text("five"), new IntWritable(1)));
        mapDriver.withAllOutput(results);
        mapDriver.runTest();
    }

    // The reducer alone should sum the counts for a single key.
    @Test
    public void testReducer() throws IOException {
        List<IntWritable> values = new ArrayList<IntWritable>();
        values.add(new IntWritable(1));
        values.add(new IntWritable(2));
        reduceDriver.withInput(new Text("one"), values);
        reduceDriver.withOutput(new Text("one"), new IntWritable(3));
        reduceDriver.runTest();
    }

    // The full pipeline sorts keys during the shuffle, so the expected output
    // is in key order with counts aggregated.
    @Test
    public void testMapReduce() throws IOException {
        mapReduceDriver.withInput(new LongWritable(), new Text("one,two,two,three,four,one,one,five"));
        List<Pair<Text, IntWritable>> results = new ArrayList<Pair<Text, IntWritable>>();
        results.add(new Pair<Text, IntWritable>(new Text("five"), new IntWritable(1)));
        results.add(new Pair<Text, IntWritable>(new Text("four"), new IntWritable(1)));
        results.add(new Pair<Text, IntWritable>(new Text("one"), new IntWritable(3)));
        results.add(new Pair<Text, IntWritable>(new Text("three"), new IntWritable(1)));
        results.add(new Pair<Text, IntWritable>(new Text("two"), new IntWritable(2)));
        mapReduceDriver.withAllOutput(results);
        mapReduceDriver.runTest();
    }
}
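The WordCountMapper and WordCountReducer under test aren't shown here. Judging by the test expectations (input lines are split on commas, and per-word counts are summed), they would look roughly like the sketch below; the actual classes in the downloadable source may differ in detail:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// WordCountMapper.java (own source file): emits (word, 1) for every
// comma-separated token in the input line.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split(",")) {
            word.set(token);
            context.write(word, ONE);
        }
    }
}

// WordCountReducer.java (own source file): sums the counts emitted for each word.
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}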
[Java] Local Job Runner - Local Application Testing With Eclipse
Using Eclipse for Hadoop development provides the capability to run a complete MapReduce application locally, in "single instance" mode.
The Hadoop distribution (Hadoop-core) comes with the local job runner that lets you run Hadoop on a local machine, in a single JVM.
In this case, you can set breakpoints inside the map or reduce methods, using the Eclipse debugger, and "step through" the code to examine programming errors.
Download Source Code
a. Libraries Required
- hadoop-core-1.2.1.jar
- commons-logging-1.1.3.jar
- commons-configuration-1.10.jar
- commons-httpclient-3.1.jar
- commons-lang-2.6.jar
- jackson-core-asl-1.9.11.jar
- jackson-mapper-asl-1.9.11.jar
- junit-4.10.jar
b. Libraries Useful
- commons-cli-1.2.jar : required for the Tool interface
- commons-io-2.4.jar : read job output with IOUtils.readLines
- log4j-1.2.17.jar : use log4j logging
c. Create input/output folders
- data/input
- data/output
d. Local Job Runner
- In the test folder
- Right-click the test class WordCountTest
- Choose Run As (or Debug As) => JUnit Test
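Putting the steps together, a JUnit test that drives the job through the local job runner might look like the sketch below. The configuration keys are the Hadoop 1.x names (mapred.job.tracker and fs.default.name); the job setup and data paths follow the folders created above, and the mapper/reducer classes are the ones sketched earlier, so treat the details as assumptions rather than the canonical source:

import static org.junit.Assert.assertTrue;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.junit.Test;

public class WordCountTest {

    @Test
    public void testJobRunsLocally() throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapred.job.tracker", "local"); // single-JVM local job runner
        conf.set("fs.default.name", "file:///"); // local file system instead of HDFS

        Path input = new Path("data/input");
        Path output = new Path("data/output");

        // Remove output from a previous run; Hadoop fails if the directory exists.
        FileSystem fs = FileSystem.getLocal(conf);
        fs.delete(output, true);

        Job job = new Job(conf, "wordcount-local");
        job.setJarByClass(WordCountMapper.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, input);
        FileOutputFormat.setOutputPath(job, output);

        // Breakpoints in map()/reduce() are hit when launched via Debug As => JUnit Test.
        assertTrue(job.waitForCompletion(true));
    }
}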
[Java] Hadoop Unit Testing with MiniMRCluster and MiniDFSCluster
Hadoop provides two classes for this: MiniMRCluster and MiniDFSCluster. These offer full-blown in-memory MapReduce and HDFS clusters and can launch multiple MapReduce and HDFS nodes. Both classes are bundled with the Hadoop test JAR and are used heavily within Hadoop's own unit tests.
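A minimal test-fixture sketch using these classes against the Hadoop 1.x test JAR is shown below. The constructor arguments (node counts, directory count) are illustrative choices, not required values:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.mapred.MiniMRCluster;
import org.junit.After;
import org.junit.Before;

public class MiniClusterTest {

    private MiniDFSCluster dfsCluster;
    private MiniMRCluster mrCluster;
    private FileSystem fs;

    @Before
    public void setUp() throws Exception {
        Configuration conf = new Configuration();
        // In-memory HDFS with two data nodes, formatted on startup.
        dfsCluster = new MiniDFSCluster(conf, 2, true, null);
        fs = dfsCluster.getFileSystem();
        // In-memory MapReduce cluster with two task trackers, backed by the mini HDFS.
        mrCluster = new MiniMRCluster(2, fs.getUri().toString(), 1);
    }

    @After
    public void tearDown() throws Exception {
        if (mrCluster != null) mrCluster.shutdown();
        if (dfsCluster != null) dfsCluster.shutdown();
    }

    // Tests would submit jobs configured from the mini cluster, e.g.:
    // JobConf jobConf = mrCluster.createJobConf();
}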