HBASE-4788. Adding MR to file example.

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1202031 13f79535-47bb-0310-9956-ffa450edef68
Doug Meil 2011-11-15 04:10:38 +00:00
parent 548a74e9b4
commit eaada4b531
2 changed files with 53 additions and 3 deletions

@@ -1017,7 +1017,7 @@ TableMapReduceUtil.initTableMapperJob(
job);
TableMapReduceUtil.initTableReducerJob(
targetTable, // output table
- MyReducer.class, // reducer class
+ MyTableReducer.class, // reducer class
job);
job.setNumReduceTasks(1); // at least one, adjust as required
@@ -1044,7 +1044,7 @@ public static class MyMapper extends TableMapper<Text, IntWritable> {
</programlisting>
In the reducer, the "ones" are counted (just as in any other MR example that does this), and then a <classname>Put</classname> is emitted.
<programlisting>
- public static class MyReducer extends TableReducer&lt;Text, IntWritable, ImmutableBytesWritable&gt; {
+ public static class MyTableReducer extends TableReducer&lt;Text, IntWritable, ImmutableBytesWritable&gt; {
public void reduce(Text key, Iterable&lt;IntWritable&gt; values, Context context) throws IOException, InterruptedException {
int i = 0;
@@ -1058,10 +1058,55 @@ public static class MyReducer extends TableReducer&lt;Text, IntWritable, Immutab
}
}
</programlisting>
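The hunk above elides the body of the reduce method. For reference, a minimal sketch of the complete table-sink reducer (the column family and qualifier names "cf" and "count" are illustrative assumptions, not taken from this commit):
<programlisting>
public static class MyTableReducer extends TableReducer&lt;Text, IntWritable, ImmutableBytesWritable&gt; {

  public void reduce(Text key, Iterable&lt;IntWritable&gt; values, Context context)
      throws IOException, InterruptedException {
    int i = 0;
    for (IntWritable val : values) {
      i += val.get();
    }
    // Emit one Put per key: the row key is the mapper's output key,
    // the value is the summed count for that key.
    // "cf" and "count" are placeholder names; use the target table's schema.
    Put put = new Put(Bytes.toBytes(key.toString()));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(i));
    context.write(null, put);
  }
}
</programlisting>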
</para>
</section>
<section xml:id="mapreduce.example.summary.file">
<title>HBase MapReduce Summary to File Example</title>
<para>This very similar to the summary example above, with exception that this is using HBase as a MapReduce source
but HDFS as the sink. The differences are in the job setup and in the reducer. The mapper remains the same.
</para>
<programlisting>
Configuration config = HBaseConfiguration.create();
Job job = new Job(config,"ExampleSummaryToFile");
job.setJarByClass(MySummaryFileJob.class); // class that contains mapper and reducer
Scan scan = new Scan();
scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false); // don't set to true for MR jobs
// set other scan attrs
TableMapReduceUtil.initTableMapperJob(
sourceTable, // input table
scan, // Scan instance to control CF and attribute selection
MyMapper.class, // mapper class
Text.class, // mapper output key
IntWritable.class, // mapper output value
job);
job.setReducerClass(MyReducer.class); // reducer class
job.setNumReduceTasks(1); // at least one, adjust as required
FileOutputFormat.setOutputPath(job, new Path("/tmp/mr/mySummaryFile")); // adjust directories as required
boolean b = job.waitForCompletion(true);
if (!b) {
throw new IOException("error with job!");
}
</programlisting>
As stated above, the previous Mapper can run unchanged with this example.
As for the Reducer, it is a "generic" Reducer instead of extending TableReducer and emitting Puts.
<programlisting>
public static class MyReducer extends Reducer&lt;Text, IntWritable, Text, IntWritable&gt; {
public void reduce(Text key, Iterable&lt;IntWritable&gt; values, Context context) throws IOException, InterruptedException {
int i = 0;
for (IntWritable val : values) {
i += val.get();
}
context.write(key, new IntWritable(i));
}
}
</programlisting>
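Because the job above configures a single reduce task, the summary lands in one part file under the output directory. It can be inspected with the HDFS shell (the <filename>part-r-00000</filename> name follows the standard FileOutputFormat naming convention and is assumed here, not set by the job):
<programlisting>
hadoop fs -cat /tmp/mr/mySummaryFile/part-r-00000
</programlisting>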
</section>
</section> <!-- mr examples -->
<section xml:id="mapreduce.htable.access">
<title>Accessing Other HBase Tables in a MapReduce Job</title>
<para>Although the framework currently allows one HBase table as input to a

@@ -535,6 +535,8 @@ hadoop 17789 155 35.2 9067824 8604364 ? S&lt;l Mar04 9855:48 /usr/java/j
<para><programlisting>hadoop fs -dus /hbase/</programlisting> ...returns the summarized disk utilization for all HBase objects. </para>
<para><programlisting>hadoop fs -dus /hbase/myTable</programlisting> ...returns the summarized disk utilization for the HBase table 'myTable'. </para>
<para><programlisting>hadoop fs -du /hbase/myTable</programlisting> ...returns a list of the regions under the HBase table 'myTable' and their disk utilization. </para>
<para>For more information on HDFS shell commands, see the <link xlink:href="http://hadoop.apache.org/common/docs/current/file_system_shell.html">HDFS FileSystem Shell documentation</link>.
</para>
</section>
<section xml:id="trouble.namenode.hbase.objects">
<title>Browsing HDFS for HBase Objects</title>
@@ -557,6 +559,9 @@ hadoop 17789 155 35.2 9067824 8604364 ? S&lt;l Mar04 9855:48 /usr/java/j
<filename>/&lt;RegionServer&gt;</filename> (RegionServers)
<filename>/&lt;HLog&gt;</filename> (WAL HLog files for the RegionServer)
</programlisting>
</para>
<para>See the <link xlink:href="http://hadoop.apache.org/common/docs/current/hdfs_user_guide.html">HDFS User Guide</link> for other non-shell diagnostic
utilities like <code>fsck</code>.
</para>
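<para>For example, to report on the files and overall block health under the HBase root directory (a sketch; adjust <filename>/hbase</filename> to the cluster's configured HBase root):
<programlisting>
hadoop fsck /hbase -files
</programlisting>
</para>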
<section xml:id="trouble.namenode.uncompaction">
<title>Use Cases</title>