HBASE-4788. Adding MR to file example.

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1202031 13f79535-47bb-0310-9956-ffa450edef68
Doug Meil 2011-11-15 04:10:38 +00:00
parent 548a74e9b4
commit eaada4b531
2 changed files with 53 additions and 3 deletions

@@ -1017,7 +1017,7 @@ TableMapReduceUtil.initTableMapperJob(
job);
TableMapReduceUtil.initTableReducerJob(
targetTable, // output table
- MyReducer.class, // reducer class
+ MyTableReducer.class, // reducer class
job);
job.setNumReduceTasks(1); // at least one, adjust as required
@@ -1044,7 +1044,7 @@ public static class MyMapper extends TableMapper<Text, IntWritable> {
</programlisting>
In the reducer, the "ones" are counted (just as in any other MR example that does this), and then a <classname>Put</classname> is emitted.
<programlisting>
- public static class MyReducer extends TableReducer&lt;Text, IntWritable, ImmutableBytesWritable&gt; {
+ public static class MyTableReducer extends TableReducer&lt;Text, IntWritable, ImmutableBytesWritable&gt; {
public void reduce(Text key, Iterable&lt;IntWritable&gt; values, Context context) throws IOException, InterruptedException {
int i = 0;
@@ -1058,10 +1058,55 @@ public static class MyReducer extends TableReducer&lt;Text, IntWritable, Immutab
}
}
</programlisting>
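The hunk above elides the body of the reduce method. For reference, a minimal sketch of the complete table-sink reducer (the column family and qualifier names "cf" and "count" are illustrative assumptions, not taken from this commit):
<programlisting>
public static class MyTableReducer extends TableReducer&lt;Text, IntWritable, ImmutableBytesWritable&gt; {

  public void reduce(Text key, Iterable&lt;IntWritable&gt; values, Context context)
      throws IOException, InterruptedException {
    int i = 0;
    for (IntWritable val : values) {
      i += val.get();
    }
    // Emit one Put per key: the row key is the mapper's output key,
    // the value is the summed count for that key.
    // "cf" and "count" are placeholder names; use the target table's schema.
    Put put = new Put(Bytes.toBytes(key.toString()));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(i));
    context.write(null, put);
  }
}
</programlisting>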
</para>
</section>
<section xml:id="mapreduce.example.summary.file">
<title>HBase MapReduce Summary to File Example</title>
<para>This very similar to the summary example above, with exception that this is using HBase as a MapReduce source
but HDFS as the sink. The differences are in the job setup and in the reducer. The mapper remains the same.
</para>
<programlisting>
Configuration config = HBaseConfiguration.create();
Job job = new Job(config,"ExampleSummaryToFile");
job.setJarByClass(MySummaryFileJob.class); // class that contains mapper and reducer
Scan scan = new Scan();
scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false); // don't set to true for MR jobs
// set other scan attrs
TableMapReduceUtil.initTableMapperJob(
sourceTable, // input table
scan, // Scan instance to control CF and attribute selection
MyMapper.class, // mapper class
Text.class, // mapper output key
IntWritable.class, // mapper output value
job);
job.setReducerClass(MyReducer.class); // reducer class
job.setNumReduceTasks(1); // at least one, adjust as required
FileOutputFormat.setOutputPath(job, new Path("/tmp/mr/mySummaryFile")); // adjust directories as required
boolean b = job.waitForCompletion(true);
if (!b) {
throw new IOException("error with job!");
}
</programlisting>
As stated above, the previous Mapper can run unchanged with this example.
As for the Reducer, it is a "generic" Reducer instead of extending TableReducer and emitting Puts.
<programlisting>
public static class MyReducer extends Reducer&lt;Text, IntWritable, Text, IntWritable&gt; {
public void reduce(Text key, Iterable&lt;IntWritable&gt; values, Context context) throws IOException, InterruptedException {
int i = 0;
for (IntWritable val : values) {
i += val.get();
}
context.write(key, new IntWritable(i));
}
}
</programlisting>
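Because the job above configures a single reduce task, the summary lands in one part file under the output directory. It can be inspected with the HDFS shell (the <filename>part-r-00000</filename> name follows the standard FileOutputFormat naming convention and is assumed here, not set by the job):
<programlisting>
hadoop fs -cat /tmp/mr/mySummaryFile/part-r-00000
</programlisting>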
</section>
</section> <!-- mr examples -->
<section xml:id="mapreduce.htable.access">
<title>Accessing Other HBase Tables in a MapReduce Job</title>
<para>Although the framework currently allows one HBase table as input to a

@@ -535,6 +535,8 @@ hadoop 17789 155 35.2 9067824 8604364 ? S&lt;l Mar04 9855:48 /usr/java/j
<para><programlisting>hadoop fs -dus /hbase/</programlisting> ...returns the summarized disk utilization for all HBase objects. </para>
<para><programlisting>hadoop fs -dus /hbase/myTable</programlisting> ...returns the summarized disk utilization for the HBase table 'myTable'. </para>
<para><programlisting>hadoop fs -du /hbase/myTable</programlisting> ...returns a list of the regions under the HBase table 'myTable' and their disk utilization. </para>
<para>For more information on HDFS shell commands, see the <link xlink:href="http://hadoop.apache.org/common/docs/current/file_system_shell.html">HDFS FileSystem Shell documentation</link>.
</para>
</section>
<section xml:id="trouble.namenode.hbase.objects">
<title>Browsing HDFS for HBase Objects</title>
@@ -557,6 +559,9 @@ hadoop 17789 155 35.2 9067824 8604364 ? S&lt;l Mar04 9855:48 /usr/java/j
<filename>/&lt;RegionServer&gt;</filename> (RegionServers)
<filename>/&lt;HLog&gt;</filename> (WAL HLog files for the RegionServer)
</programlisting>
</para>
<para>See the <link xlink:href="http://hadoop.apache.org/common/docs/current/hdfs_user_guide.html">HDFS User Guide</link> for other non-shell diagnostic
utilities like <code>fsck</code>.
</para>
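<para>For example, to report on the files and overall block health under the HBase root directory (a sketch; adjust <filename>/hbase</filename> to the cluster's configured HBase root):
<programlisting>
hadoop fsck /hbase -files
</programlisting>
</para>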
<section xml:id="trouble.namenode.uncompaction">
<title>Use Cases</title>