HBASE-4172 laundry list of changes (book, configuration, developer, performance)

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1154555 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Doug Meil 2011-08-06 17:21:20 +00:00
parent 2961f7dcd4
commit dc8bf13685
4 changed files with 52 additions and 14 deletions


@ -260,6 +260,15 @@ admin.enableTable(table);
search the mailing list for conversations on this topic. All rows in HBase conform to the <xref linkend="datamodel">datamodel</xref>, and
that includes versioning. Take that into consideration when making your design, as well as block size for the ColumnFamily.
</para>
<section xml:id="counters">
<title>Counters</title>
<para>
One supported datatype that deserves special mention is the "counter" (i.e., the ability to do atomic increments of numbers). See
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#increment%28org.apache.hadoop.hbase.client.Increment%29">Increment</link> in HTable.
</para>
<para>Synchronization on counters is done on the RegionServer, not in the client.
</para>
</section>
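As a hedged sketch of the API described above (the table, row, and column names are made up for illustration, and a running cluster with the client classes on the classpath is assumed), a counter can be bumped either through the single-column convenience method or through an <code>Increment</code> object:

```java
// Illustrative only: "myTable", "cf", "hits", and "misses" are hypothetical names.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.util.Bytes;

public class CounterExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "myTable");

    // Convenience method: atomically add 1 to a single cell and
    // return the post-increment value.
    long hits = table.incrementColumnValue(
        Bytes.toBytes("myRow"), Bytes.toBytes("cf"), Bytes.toBytes("hits"), 1);

    // Increment object: bump several columns of one row in a single atomic call.
    Increment incr = new Increment(Bytes.toBytes("myRow"));
    incr.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("hits"), 1);
    incr.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("misses"), 1);
    table.increment(incr);

    table.close();
  }
}
```

Because the increment is applied on the RegionServer, concurrent clients can bump the same cell without read-modify-write races.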
</section>
<section xml:id="cf.in.memory">
<title>
@ -811,7 +820,7 @@ admin.enableTable(table);
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link>
is responsible for finding RegionServers that are serving the
particular row range of interest. It does this by querying
the <code>.META.</code> and <code>-ROOT-</code> catalog tables
(TODO: Explain). After locating the required
region(s), the client <emphasis>directly</emphasis> contacts
the RegionServer serving that region (i.e., it does not go
@ -842,6 +851,11 @@ HTable table2 = new HTable(conf2, "myTable");</programlisting>
For more information about how connections are handled in the HBase client,
see <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnectionManager.html">HConnectionManager</link>.
</para>
<section xml:id="client.connection.pooling"><title>Connection Pooling</title>
<para>For applications that require high-end multithreaded access (e.g., web servers or application servers that may serve many application threads
in a single JVM), see <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTablePool.html">HTablePool</link>.
</para>
</section>
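A minimal sketch of pooled access, assuming the 0.90-era client API (the table name and pool size are made up; a running cluster is assumed). The pool is created once and shared; each request checks a table out and returns it when done:

```java
// Illustrative only: "myTable" and the pool size of 10 are hypothetical choices.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.util.Bytes;

public class PoolExample {
  private static final Configuration conf = HBaseConfiguration.create();
  // At most 10 pooled HTable references per table.
  private static final HTablePool pool = new HTablePool(conf, 10);

  public static void handleRequest(String row) throws Exception {
    HTableInterface table = pool.getTable("myTable");   // check out
    try {
      table.get(new Get(Bytes.toBytes(row)));
    } finally {
      pool.putTable(table);                             // return to the pool
    }
  }
}
```

HTable instances are not safe for concurrent use, so pooling (rather than sharing one instance) is what makes this pattern correct under many threads.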
</section>
<section xml:id="client.writebuffer"><title>WriteBuffer and Batch Methods</title>
<para>If <xref linkend="perf.hbase.client.autoflush" /> is turned off on
@ -1055,7 +1069,7 @@ HTable table2 = new HTable(conf2, "myTable");</programlisting>
For a description of how a minor compaction picks files to compact, see the <link xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/Store.html#836">ascii diagram in the Store source code.</link>
</para>
<para>After a major compaction runs there will be a single storefile per store, and this will usually improve performance. Caution: major compactions rewrite all of the store's data, and on a loaded system this may not be tenable;
major compactions will usually have to be done manually on large systems. See <xref linkend="managed.compactions" />.
</para>
</section>


@ -1076,7 +1076,18 @@ script to perform a network IO safe rolling split
of all regions.
</para>
</section>
<section xml:id="managed.compactions"><title>Managed Compactions</title>
<para>A common administrative technique is to manage major compactions manually, rather than letting
HBase do it. By default, <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname> is one day, and major compactions
may kick in when you least desire it - especially on a busy system. To "turn off" automatic major compactions, set
the value to <varname>Long.MAX_VALUE</varname>.
</para>
</para>
<para>It is important to stress that major compactions are absolutely necessary for StoreFile cleanup; the only variable is when
they occur. They can be administered through the HBase shell, or via
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29">HBaseAdmin</link>.
</para>
</section>
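As a sketch, the configuration property behind <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname> is <varname>hbase.hregion.majorcompaction</varname> (a period in milliseconds), so disabling time-based major compactions looks roughly like this in <filename>hbase-site.xml</filename>:

```xml
<!-- Effectively turn off automatic time-based major compactions
     by setting the period to Long.MAX_VALUE milliseconds. -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>9223372036854775807</value>
</property>
```

Major compactions then remain the operator's job, e.g. `major_compact 'myTable'` from the HBase shell (table name hypothetical) during a low-traffic window.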
</section>
</section>


@ -35,6 +35,7 @@ git clone git://git.apache.org/hbase.git
<para>See <link xlink:href="https://issues.apache.org/jira/browse/HBASE-3678">HBASE-3678 Add Eclipse-based Apache Formatter to HBase Wiki</link>
for an Eclipse formatter to help ensure your code conforms to HBase's coding conventions.
The issue includes instructions for loading the attached formatter.</para>
<para>Also, no @author tags - that's a rule. Quality Javadoc comments are appreciated. And include the Apache license.</para>
</section>
<section xml:id="eclipse.svn">
<title>Subversive Plugin</title>
@ -129,13 +130,15 @@ mvn test -Dtest=TestXYZ
<section xml:id="getting.involved">
<title>Getting Involved</title>
<para>HBase gets better only when people contribute!
</para>
<section xml:id="mailing.list">
<title>Mailing Lists</title>
<para>Sign up for the dev-list and the user-list. See the
<link xlink:href="http://hbase.apache.org/mail-lists.html">mailing lists</link> page.
Posing questions - and helping to answer other people's questions - is encouraged!
There are varying levels of experience on both lists, so patience and politeness are encouraged (and please
stay on topic).
</para>
</section>
<section xml:id="jira">
@ -144,14 +147,19 @@ mvn test -Dtest=TestXYZ
Whether it's a new feature request, an enhancement, or a bug, file a ticket.
</para>
</section>
<section xml:id="codelines"><title>Codelines</title>
<para>Most development is done on TRUNK. However, there are branches for minor releases (e.g., 0.90.1, 0.90.2, and 0.90.3 are on the 0.90 branch).</para>
<para>If you have any questions on this, just send an email to the dev dist-list.</para>
</section>
<section xml:id="submitting.patches">
<title>Submitting Patches</title>
<section xml:id="submitting.patches.create">
<title>Create Patch</title>
<para>Patch files can be easily generated from Eclipse, for example by selecting "Team -&gt; Create Patch".
</para>
<para>Please submit one patch-file per Jira. For example, if multiple files are changed, make sure the
selected resource when generating the patch is a directory. Patch files can reflect changes in multiple files.</para>
<para>Make sure you review <xref linkend="eclipse.code.formatting"/> for code style. </para>
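Patches can also be generated from the command line; the session below is a self-contained sketch using a throwaway repository (the issue number HBASE-1234 and file names are made up for illustration - from an svn working copy the equivalent is simply `svn diff > HBASE-1234.patch`):

```shell
# Demonstrate patch generation with a disposable git repository.
tmp=$(mktemp -d) && cd "$tmp"
git init -q .
echo "original" > Foo.java
git add Foo.java
git -c user.name=demo -c user.email=demo@example.org commit -qm "base"
echo "patched" > Foo.java
# One patch file covering every changed file in the tree:
git diff --no-prefix > HBASE-1234.patch
wc -l HBASE-1234.patch
```

The resulting file attaches to the Jira ticket like any Eclipse-generated patch.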
</section>
<section xml:id="submitting.patches.naming">
<title>Patch File Naming</title>
@ -162,12 +170,14 @@ mvn test -Dtest=TestXYZ
<section xml:id="submitting.patches.tests">
<title>Unit Tests</title>
<para>Yes, please. Please try to include unit tests with every code patch (and especially new classes and large changes).</para>
<para>Also, please make sure unit tests pass locally before submitting the patch.</para>
</section>
<section xml:id="submitting.patches.jira">
<title>Attach Patch to Jira</title>
<para>The patch should be attached to the associated Jira ticket via "More Actions -&gt; Attach Files". Make sure you select the
ASF license inclusion option; otherwise the patch can't be considered for inclusion.
</para>
<para>Once attached to the ticket, click "Submit Patch" and
the status of the ticket will change. Committers will review submitted patches for inclusion into the codebase. Please
understand that not every patch may get committed, and that feedback will likely be provided on the patch. Fear not, though,
because the HBase community is helpful!
@ -177,7 +187,7 @@ mvn test -Dtest=TestXYZ
<section xml:id="committing.patches">
<title>Committing Patches</title>
<para>
See <link xlink:href="http://wiki.apache.org/hadoop/Hbase/HowToCommit">How To Commit</link> in the HBase wiki.
</para>
</section>


@ -232,10 +232,13 @@ Deferred log flush can be configured on tables via <link
</section>
<section xml:id="perf.hbase.write.mr.reducer">
<title>MapReduce: Skip The Reducer</title>
<para>When writing a lot of data to an HBase table from a MR job (e.g., with <link
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html">TableOutputFormat</link>), and specifically where Puts are being emitted
from the Mapper, skip the Reducer step. When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other
Reducers that will most likely be off-node. It's far more efficient to just write directly to HBase.
</para>
<para>For summary jobs where HBase is used as a source and a sink, writes will come from the Reducer step (e.g., summarize values and then write out the result).
This is a different processing problem than the above case.
</para>
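A hedged sketch of the map-only pattern (the class, table, column, and path names are hypothetical, and a running cluster with the MapReduce client classes is assumed): the mapper emits Puts, no reducer class is registered, and the reduce-task count is forced to zero.

```java
// Illustrative only: "myTable", "cf"/"raw", LoadJob, and args[0] are made-up names.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class LoadJob {
  static class LoadMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text line, Context ctx)
        throws IOException, InterruptedException {
      // Emit one Put per input line; goes straight to the RegionServers.
      Put put = new Put(Bytes.toBytes(line.toString()));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("raw"), Bytes.toBytes(line.toString()));
      ctx.write(new ImmutableBytesWritable(put.getRow()), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "load");
    job.setJarByClass(LoadJob.class);
    job.setMapperClass(LoadMapper.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    // Wires up TableOutputFormat for the target table; null means no reducer.
    TableMapReduceUtil.initTableReducerJob("myTable", null, job);
    job.setNumReduceTasks(0);   // map-only: no sort/shuffle/spool step
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

With zero reduce tasks, mapper output bypasses the shuffle entirely, which is the efficiency win the paragraph above describes.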
</section>