hbase-4906. book.xml. added MR summary to RDBMS section in MR chapter.

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1208654 13f79535-47bb-0310-9956-ffa450edef68
Doug Meil 2011-11-30 18:30:10 +00:00
parent 338e4d602c
commit c4078275ea
1 changed files with 36 additions and 2 deletions


@ -1004,7 +1004,7 @@ public static class MyMapper extends TableMapper<ImmutableBytesWritable, Put&
</para>
</section>
<section xml:id="mapreduce.example.summary">
<title>HBase MapReduce Summary Example</title>
<title>HBase MapReduce Summary to HBase Example</title>
<para>The following example uses HBase as a MapReduce source and sink with a summarization step. This example will
count the number of distinct instances of a value in a table and write those summarized counts in another table.
<programlisting>
@ -1116,7 +1116,7 @@ if (!b) {
</programlisting>
</section>
<section xml:id="mapreduce.example.summary.noreducer">
<title>HBase MapReduce Summary Without Reducer</title>
<title>HBase MapReduce Summary to HBase Without Reducer</title>
<para>It is also possible to perform summaries without a reducer - if you use HBase as the reducer.
</para>
<para>An HBase target table would need to exist for the job summary. The HTable method <code>incrementColumnValue</code>
@ -1128,6 +1128,40 @@ if (!b) {
<para>In the end, the summary results are in HBase.
</para>
</section>
<section xml:id="mapreduce.example.summary.rdbms">
<title>HBase MapReduce Summary to RDBMS</title>
<para>Sometimes it is more appropriate to write summaries to an RDBMS. In these cases, a custom reducer
can write the summaries directly to the RDBMS. The <code>setup</code> method
can connect to the RDBMS (the connection information can be passed via custom parameters in the context) and the
<code>cleanup</code> method can close the connection.
</para>
<para>It is critical to understand that the number of reducers for the job affects the summarization implementation, and
you'll have to design this into your reducer. Specifically, you must decide whether it is designed to run as a singleton (one reducer)
or as multiple reducers. Neither is right or wrong; it depends on your use case.
</para>
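<para>To make the reducer-count consideration concrete, the following is a minimal sketch in plain Java that does not use
the Hadoop or JDBC APIs. It simulates how keys are spread across reducers and what each reducer would be able to total.
The class <code>ReducerPartitionSketch</code> and the methods <code>partitionFor</code> and <code>totalsPerReducer</code>
are hypothetical names used only for illustration, and the partitioning function is an assumed stand-in for hash-based
partitioning, not the exact function Hadoop uses.
</para>
<programlisting>
```java
import java.util.Arrays;

public class ReducerPartitionSketch {

    // Hypothetical stand-in for hash-based partitioning (an assumption, not Hadoop's code):
    // every occurrence of a key is routed to exactly one reducer.
    static int partitionFor(String key, int numReducers) {
        return Math.floorMod(key.hashCode(), numReducers);
    }

    // Simulates what each reducer could total on its own:
    // totals[r] is the sum of the counts whose keys were routed to reducer r.
    static int[] totalsPerReducer(String[] keys, int[] counts, int numReducers) {
        int[] totals = new int[numReducers];
        int i = 0;
        for (String key : keys) {
            totals[partitionFor(key, numReducers)] += counts[i];
            i++;
        }
        return totals;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "c", "a"};
        int[] counts = {1, 2, 3, 4};

        // One reducer: a single task sees every key, so it alone can produce a grand total.
        System.out.println(Arrays.toString(totalsPerReducer(keys, counts, 1)));

        // Three reducers: each task sees only its own keys, so each holds a partial total.
        System.out.println(Arrays.toString(totalsPerReducer(keys, counts, 3)));
    }
}
```
</programlisting>
<para>In this sketch, per-key summaries are safe with any number of reducers because a key never spans two reducers.
A summary that spans keys, such as a grand total, is only complete in the single-reducer case. With multiple reducers
the partial results would have to be combined on the RDBMS side, for example with an additive update.
</para>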
<programlisting>
public static class MyRdbmsReducer extends Reducer&lt;Text, IntWritable, Text, IntWritable&gt; {

    private Connection c = null;

    public void setup(Context context) {
        // create DB connection...
    }

    public void reduce(Text key, Iterable&lt;IntWritable&gt; values, Context context) throws IOException, InterruptedException {
        // do summarization
        // in this example the keys are Text, but this is just an example
    }

    public void cleanup(Context context) {
        // close db connection
    }

}
</programlisting>
<para>In the end, the summary results are written to the RDBMS.
</para>
</section>
</section> <!-- mr examples -->
<section xml:id="mapreduce.htable.access">