Testing Hadoop MapReduce with MRUnit

Hello, this is Ochiai.

In this post I'll show how to test Hadoop MapReduce code with MRUnit.

What is MRUnit?

MRUnit is a library for testing Hadoop MapReduce code.
With it, you can write JUnit tests for Hadoop MapReduce.
You don't have to build a Context yourself, so checking the input and output
of a Mapper or Reducer is easy.

Setting up the development environment

I used CDH3u1, Cloudera's Hadoop distribution, which was the latest release at the time of writing.
Download the tarball (hadoop-0.20.2-cdh3u1.tar.gz) from the following site and extract it.
https://ccp.cloudera.com/display/SUPPORT/Downloads

Once it is extracted, add the jars under lib, along with hadoop-core-0.20.2-cdh3u1.jar, to the Eclipse build path.

To use MRUnit, also add hadoop-mrunit-0.20.2-cdh3u1.jar,
found under contrib\mrunit, to the build path.

Preparing the Mapper and Reducer to test

For now, I'll use WordCount.java, which is included in the tarball downloaded above.

hadoop-0.20.2-cdh3u1\src\examples\org\apache\hadoop\examples\WordCount.java

Only the package name has been changed.

To make things easier to check, I used Eclipse's Refactor feature
to extract the Mapper and the Reducer into separate files.

Testing the Mapper

TokenizerMapper splits the input string on whitespace and
emits a <word, 1> pair as the key and value for each token.

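import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits <word, 1> for every whitespace-separated token in the input line.
public class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}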
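
To get a feel for which pairs this Mapper emits before writing the test, here is a minimal sketch in plain Java with no Hadoop dependency. The class name TokenizeSketch and the method mapOutputs are hypothetical names made up for this illustration. It only mimics the tokenizing loop and is not the Mapper itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Illustration only: TokenizeSketch and mapOutputs are hypothetical names.
class TokenizeSketch {

    // Returns the "(word, 1)" pairs that the tokenizing loop would produce, in order.
    static List<String> mapOutputs(String line) {
        List<String> pairs = new ArrayList<String>();
        // With no delimiter argument, StringTokenizer splits on whitespace only,
        // so punctuation stays attached to the word ("know." keeps its period).
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add("(" + itr.nextToken() + ", 1)");
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String pair : mapOutputs("We must know. We will know.")) {
            System.out.println(pair);
        }
    }
}
```

Note that duplicates are not merged at this stage: "We" and "know." each appear twice, in input order. Summing them up is the Reducer's job.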

Let's write a test for this Mapper.

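import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

// JUnit 4 test: setUp() runs before each @Test method.
public class TokenizerMapperTest {
    private Mapper<Object, Text, Text, IntWritable>    mapper;

    private MapDriver<Object, Text, Text, IntWritable> driver;

    @Before
    public void setUp() {
        mapper = new TokenizerMapper();
        driver = new MapDriver<Object, Text, Text, IntWritable>(
                mapper);
    }

    @Test
    public void testTokenizerMapper() {
        // One input record; the key is not used by TokenizerMapper.
        driver.withInput(
                new LongWritable(1),
                new Text("We must know. We will know."));
        // Expected outputs, in the order the Mapper emits them.
        driver.withOutput(new Text("We"), new IntWritable(1));
        driver.withOutput(new Text("must"), new IntWritable(1));
        driver.withOutput(new Text("know."), new IntWritable(1));
        driver.withOutput(new Text("We"), new IntWritable(1));
        driver.withOutput(new Text("will"), new IntWritable(1));
        driver.withOutput(new Text("know."), new IntWritable(1));
        driver.runTest();
    }
}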

When you run TokenizerMapperTest, it passes.

As an experiment, let's deliberately get one expected output wrong:

    public void testTokenizerMapper() {
        driver.withInput(
                new LongWritable(1),
                new Text("We must know. We will know."));
        driver.withOutput(new Text("I"), new IntWritable(1));
        driver.withOutput(new Text("must"), new IntWritable(1));
        driver.withOutput(new Text("know."), new IntWritable(1));
        driver.withOutput(new Text("We"), new IntWritable(1));
        driver.withOutput(new Text("will"), new IntWritable(1));
        driver.withOutput(new Text("know."), new IntWritable(1));
        driver.runTest();
    }

This time the test fails, and errors are written to the console.

The error messages are as follows.

11/07/24 02:41:31 ERROR mrunit.TestDriver: Matched expected output (We, 1) but at incorrect position 0 (expected position 3)
11/07/24 02:41:31 ERROR mrunit.TestDriver: Missing expected output (I, 1) at position 0

You can see exactly where the mistake is.
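
To make the two log lines easier to read, here is a rough sketch of one way a position-by-position comparison could arrive at them. This is my own reading of the log, not MRUnit's actual implementation. The class PositionalDiffSketch, the method diff, and the "Received unexpected output" wording are all hypothetical. Pairs are represented as plain strings such as "(We, 1)".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a hypothetical comparison, NOT MRUnit's real code.
class PositionalDiffSketch {

    static List<String> diff(List<String> expected, List<String> actual) {
        List<String> errors = new ArrayList<String>();
        // Pass 1: check every actual output against the expected list.
        for (int i = 0; i < actual.size(); i++) {
            String got = actual.get(i);
            if (i < expected.size() && expected.get(i).equals(got)) {
                continue; // right pair at the right position
            }
            int where = expected.indexOf(got);
            if (where >= 0) {
                errors.add("Matched expected output " + got
                        + " but at incorrect position " + i
                        + " (expected position " + where + ")");
            } else {
                // Hypothetical wording; this case does not occur in the log above.
                errors.add("Received unexpected output " + got + " at position " + i);
            }
        }
        // Pass 2: report expected pairs that never showed up at all.
        for (int j = 0; j < expected.size(); j++) {
            if (!actual.contains(expected.get(j))) {
                errors.add("Missing expected output " + expected.get(j) + " at position " + j);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList(
                "(I, 1)", "(must, 1)", "(know., 1)", "(We, 1)", "(will, 1)", "(know., 1)");
        List<String> actual = Arrays.asList(
                "(We, 1)", "(must, 1)", "(know., 1)", "(We, 1)", "(will, 1)", "(know., 1)");
        for (String error : diff(expected, actual)) {
            System.out.println(error);
        }
    }
}
```

Read this way, the first message says that the Mapper's first output (We, 1) was found in the expected list, but only at position 3, and the second says that the expected (I, 1) at position 0 was never produced.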

Testing the Reducer

Testing the Reducer works almost the same way.

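import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums all values received for a key and emits <key, sum>.
public class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}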

The test code for this Reducer is below.

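import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Before;
import org.junit.Test;

// JUnit 4 test: setUp() runs before each @Test method.
public class IntSumReducerTest {
    private Reducer<Text, IntWritable, Text, IntWritable>      reducer;

    private ReduceDriver<Text, IntWritable, Text, IntWritable> driver;

    @Before
    public void setUp() {
        reducer = new IntSumReducer();
        driver = new ReduceDriver<Text, IntWritable, Text, IntWritable>(
                reducer);
    }

    @Test
    public void testIntSumReducer() {
        // All values for the key "We", as the Reducer would receive them.
        List<IntWritable> values = new ArrayList<IntWritable>();
        values.add(new IntWritable(1));
        values.add(new IntWritable(1));
        driver.withInput(new Text("We"), values);
        // 1 + 1 = 2
        driver.withOutput(new Text("We"), new IntWritable(2));
        driver.runTest();
    }
}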

About the only difference from the Mapper test is that
you pass multiple values as a List.

When run, this test passes as well.

Testing the Mapper and Reducer together

You can also test the whole flow, from the Mapper through to the Reducer.

In that case, use MapReduceDriver, as shown below.

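import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.junit.Before;
import org.junit.Test;

// JUnit 4 test: setUp() runs before each @Test method.
public class TokenizerMapperIntSumReducerTest {
    private Mapper<Object, Text, Text, IntWritable> mapper;
    private Reducer<Text, IntWritable, Text, IntWritable> reducer;

    private MapReduceDriver<Object, Text, Text, IntWritable, Text, IntWritable> driver;

    @Before
    public void setUp() {
        mapper = new TokenizerMapper();
        reducer = new IntSumReducer();
        driver = new MapReduceDriver<Object, Text,
            Text, IntWritable, Text, IntWritable>(mapper, reducer);
    }

    @Test
    public void testMapperAndReducer() {
        driver.withInput(new LongWritable(1), new Text(
                "We must know. We will know."));
        // Expected final outputs: one <word, count> pair per distinct word.
        driver.withOutput(new Text("We"), new IntWritable(2));
        driver.withOutput(new Text("know."), new IntWritable(2));
        driver.withOutput(new Text("must"), new IntWritable(1));
        driver.withOutput(new Text("will"), new IntWritable(1));
        driver.runTest();
    }
}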
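
The expected outputs in this test are listed in key order (We, know., must, will) rather than in input order. The sketch below simulates the map, shuffle, and reduce steps in plain Java to show where that order comes from. The class PipelineSketch and the method wordCount are hypothetical names, there is no Hadoop dependency, and it assumes that keys are sorted before they reach the Reducer. Java's String ordering stands in for the ordering of Text keys, which should agree for ASCII words like these.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Illustration only: PipelineSketch and wordCount are hypothetical names.
class PipelineSketch {

    static Map<String, Integer> wordCount(String line) {
        // "Map" and "shuffle": emit a 1 per token and group the 1s by word.
        // TreeMap keeps the keys sorted, standing in for the shuffle sort.
        Map<String, List<Integer>> grouped = new TreeMap<String, List<Integer>>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            String word = itr.nextToken();
            List<Integer> ones = grouped.get(word);
            if (ones == null) {
                ones = new ArrayList<Integer>();
                grouped.put(word, ones);
            }
            ones.add(1);
        }
        // "Reduce": sum the grouped values for each key.
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            counts.put(entry.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {We=2, know.=2, must=1, will=1}
        System.out.println(wordCount("We must know. We will know."));
    }
}
```

Uppercase "W" sorts before the lowercase letters, which is why "We" comes first and "will" last.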