Test Hadoop

We'll look at the best methods for testing your MapReduce code, along with some design aspects to consider when writing MapReduce jobs that make testing easier.

  1. [Java] Logging Configuration

    You'll need a log4j.properties file on the classpath that's configured to write to standard out:

    log4j.rootLogger=debug,stdout
    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=%d [%t] %p %C.%M - %m%n
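
    With this configuration in place, anything logged through log4j (or through commons-logging, which delegates to it) shows up on the console during a test run. A minimal sketch of obtaining and using a logger (the class name here is just for illustration):

    import org.apache.log4j.Logger;

    public class LoggingExample {
      // log4j logger; with the properties above, output goes to stdout
      private static final Logger LOG = Logger.getLogger(LoggingExample.class);

      public static void main(String[] args) {
        LOG.debug("debug message"); // printed, since the root logger is at debug level
        LOG.info("info message");
      }
    }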
    
  2. [Java] Test Case Using MRUnit

    Javadocs: http://mrunit.apache.org/documentation/javadocs/1.0.0/index.html

    Download Source Code

    Library Requirements:

    • commons-logging-1.1.1.jar
    • hamcrest-core-1.1.jar
    • junit-4.10.jar
    • mockito-all-1.8.5.jar
    • mrunit-1.0.0-hadoop1.jar

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
    import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
    import org.apache.hadoop.mrunit.types.Pair;
    import org.junit.Before;
    import org.junit.Test;

    public class WordCountTest {
      private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;
      private ReduceDriver<Text, IntWritable, Text, IntWritable> reduceDriver;
      private MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable> mapReduceDriver;
      @Before
      public void setUp() throws Exception {
        WordCountMapper mapper = new WordCountMapper();
        WordCountReducer reducer = new WordCountReducer();
        mapDriver = MapDriver.newMapDriver(mapper);
        reduceDriver = ReduceDriver.newReduceDriver(reducer);
        mapReduceDriver = MapReduceDriver.newMapReduceDriver(mapper, reducer);
      }
      @Test
      public void testMapper() throws IOException {
        mapDriver.withInput(new LongWritable(), new Text("one,two,two,three,four,one,one,five"));
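        // runTest() verifies the output in order, so list the pairs exactly as the mapper emits them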
        List<Pair<Text, IntWritable>> results = new ArrayList<Pair<Text, IntWritable>>();
        results.add(new Pair<Text, IntWritable>(new Text("one"), new IntWritable(1)));
        results.add(new Pair<Text, IntWritable>(new Text("two"), new IntWritable(1)));
        results.add(new Pair<Text, IntWritable>(new Text("two"), new IntWritable(1)));
        results.add(new Pair<Text, IntWritable>(new Text("three"), new IntWritable(1)));
        results.add(new Pair<Text, IntWritable>(new Text("four"), new IntWritable(1)));
        results.add(new Pair<Text, IntWritable>(new Text("one"), new IntWritable(1)));
        results.add(new Pair<Text, IntWritable>(new Text("one"), new IntWritable(1)));
        results.add(new Pair<Text, IntWritable>(new Text("five"), new IntWritable(1)));
        mapDriver.withAllOutput(results);
        mapDriver.runTest();
      }
      @Test
      public void testReducer() throws IOException {
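        // the reducer should sum the counts for a key: 1 + 2 = 3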
        List<IntWritable> values = new ArrayList<IntWritable>();
        values.add(new IntWritable(1));
        values.add(new IntWritable(2));
        reduceDriver.withInput(new Text("one"), values);
        reduceDriver.withOutput(new Text("one"), new IntWritable(3));
        reduceDriver.runTest();
      }
      @Test
      public void testMapReduce() throws IOException {
        mapReduceDriver.withInput(new LongWritable(), new Text("one,two,two,three,four,one,one,five"));
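        // the full pipeline sorts keys during the shuffle, hence the alphabetical order below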
        List<Pair<Text, IntWritable>> results = new ArrayList<Pair<Text, IntWritable>>();
        results.add(new Pair<Text, IntWritable>(new Text("five"), new IntWritable(1)));
        results.add(new Pair<Text, IntWritable>(new Text("four"), new IntWritable(1)));
        results.add(new Pair<Text, IntWritable>(new Text("one"), new IntWritable(3)));
        results.add(new Pair<Text, IntWritable>(new Text("three"), new IntWritable(1)));
        results.add(new Pair<Text, IntWritable>(new Text("two"), new IntWritable(2)));
        mapReduceDriver.withAllOutput(results);
        mapReduceDriver.runTest();
      }
    }
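
    The tests above assume a WordCountMapper that splits each input line on commas and emits each token with a count of 1, and a WordCountReducer that sums the counts per word. The actual classes are in the source download; a minimal sketch of what the tests expect:

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // WordCountMapper.java
    public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // emit (token, 1) for every comma-separated token on the line
        for (String token : value.toString().split(",")) {
          word.set(token);
          context.write(word, ONE);
        }
      }
    }

    // WordCountReducer.java (same imports, plus org.apache.hadoop.mapreduce.Reducer)
    public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        // sum the per-word counts emitted by the mapper
        int sum = 0;
        for (IntWritable value : values) {
          sum += value.get();
        }
        context.write(key, new IntWritable(sum));
      }
    }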
    
  3. [Java] Local Job Runner - Local Application Testing With Eclipse

    Using Eclipse for Hadoop development provides the capability to run the complete MapReduce application locally, in a "single instance" mode.

    The Hadoop distribution (hadoop-core) comes with a local job runner that lets you run Hadoop on a local machine, in a single JVM.

    In this case, you can set breakpoints inside the map or reduce methods, use the Eclipse debugger, and step through the code to examine programming errors.

    Download Source Code

    a. Libraries Required

    • hadoop-core-1.2.1.jar
    • commons-logging-1.1.3.jar
    • commons-configuration-1.10.jar
    • commons-httpclient-3.1.jar
    • commons-lang-2.6.jar
    • jackson-core-asl-1.9.11.jar
    • jackson-mapper-asl-1.9.11.jar
    • junit-4.10.jar

    b. Libraries Useful

    • commons-cli-1.2.jar : required for Tool
    • commons-io-2.4.jar : to read job output with IOUtils.readLines
    • log4j-1.2.17.jar : for log4j logging

    c. Create input/output folder

    • data/input
    • data/output

    d. Local Job Runner

    • In the test folder
    • Right-click the test class WordCountTest
    • Choose Run As (or Debug As) => JUnit Test; a sketch of such a test follows below
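
    A sketch of what such a test can look like, assuming the WordCountMapper and WordCountReducer classes from the MRUnit section and the data/input and data/output folders created above (the explicit "local" settings below are in fact Hadoop's defaults, but make the intent clear):

    import static org.junit.Assert.assertTrue;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.junit.Test;

    public class WordCountTest {
      @Test
      public void testJob() throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapred.job.tracker", "local"); // run the job in a single JVM
        conf.set("fs.default.name", "file:///"); // work against the local filesystem

        Path input = new Path("data/input");
        Path output = new Path("data/output");

        // the job fails if the output folder already exists
        FileSystem.getLocal(conf).delete(output, true);

        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCountTest.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, input);
        FileOutputFormat.setOutputPath(job, output);

        assertTrue("job failed", job.waitForCompletion(true));
        // inspect data/output/part-r-00000 here, e.g. with IOUtils.readLines
      }
    }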

  4. [Java] Hadoop Unit Testing with MiniMRCluster and MiniDFSCluster

    MiniMRCluster and MiniDFSCluster offer full-blown in-memory MapReduce and HDFS clusters, and can launch multiple MapReduce and HDFS nodes. Both classes are bundled with the Hadoop test JAR and are used heavily within Hadoop's own unit tests.
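
    A minimal sketch of a test against these classes, assuming the Hadoop test JAR matching hadoop-core-1.2.1.jar (hadoop-test-1.2.1.jar) is on the classpath, along with the mapper and reducer from the MRUnit section; the constructor signatures are those of the Hadoop 1.x test API:

    import static org.junit.Assert.assertTrue;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.MiniDFSCluster;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MiniMRCluster;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;

    public class MiniClusterTest {
      private MiniDFSCluster dfsCluster;
      private MiniMRCluster mrCluster;

      @Before
      public void setUp() throws Exception {
        Configuration conf = new Configuration();
        // one in-memory namenode + one datanode, freshly formatted
        dfsCluster = new MiniDFSCluster(conf, 1, true, null);
        // one task tracker, pointed at the mini HDFS
        mrCluster = new MiniMRCluster(1, dfsCluster.getFileSystem().getUri().toString(), 1);
      }

      @After
      public void tearDown() {
        if (mrCluster != null) mrCluster.shutdown();
        if (dfsCluster != null) dfsCluster.shutdown();
      }

      @Test
      public void testWordCount() throws Exception {
        // write a small input file into the in-memory HDFS
        FileSystem fs = dfsCluster.getFileSystem();
        FSDataOutputStream out = fs.create(new Path("/input/words.txt"));
        out.writeBytes("one,two,two,three,four,one,one,five\n");
        out.close();

        // submit the job with the mini cluster's configuration
        Job job = new Job(mrCluster.createJobConf(), "word count");
        job.setJarByClass(MiniClusterTest.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/input"));
        FileOutputFormat.setOutputPath(job, new Path("/output"));

        assertTrue("job failed", job.waitForCompletion(true));
      }
    }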

  5. Useful Resources:

    Hadoop Testing with Minicluster

    Hadoop Unit Testing with Minimrcluster
