HBASE-4172 laundry list of changes (book, configuration, developer, performance)
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1154555 13f79535-47bb-0310-9956-ffa450edef68
parent 2961f7dcd4
commit dc8bf13685
@@ -260,6 +260,15 @@ admin.enableTable(table);
search the mailing list for conversations on this topic. All rows in HBase conform to the <xref linkend="datamodel">datamodel</xref>, and
that includes versioning. Take that into consideration when making your design, as well as block size for the ColumnFamily.
</para>
<section xml:id="counters">
<title>Counters</title>
<para>
One supported datatype that deserves special mention is "counters" (i.e., the ability to do atomic increments of numbers). See
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#increment%28org.apache.hadoop.hbase.client.Increment%29">Increment</link> in HTable.
</para>
<para>Synchronization on counters is done on the RegionServer, not in the client.
</para>
</section>
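As a sketch of the counter API mentioned above, assuming a 0.90-era HBase client on the classpath and a running cluster; the table, family, and qualifier names are hypothetical:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.util.Bytes;

public class CounterSketch {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "myTable");
    // Convenience method for a single cell: atomically adds 1 on the
    // RegionServer and returns the post-increment value.
    long hits = table.incrementColumnValue(
        Bytes.toBytes("row1"), Bytes.toBytes("cf"), Bytes.toBytes("hits"), 1L);
    // Increment object: bump several counters in the same row in one RPC.
    Increment inc = new Increment(Bytes.toBytes("row1"));
    inc.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("hits"), 1L);
    inc.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("misses"), 2L);
    table.increment(inc);
    table.close();
  }
}
```

Because the increment happens server-side, concurrent clients never need to read-modify-write the cell themselves.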
</section>
<section xml:id="cf.in.memory">
<title>
@@ -811,7 +820,7 @@ admin.enableTable(table);
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link>
is responsible for finding RegionServers that are serving the
particular row range of interest. It does this by querying
-the <code>.META.</code> and <code>-ROOT</code> catalog tables
+the <code>.META.</code> and <code>-ROOT-</code> catalog tables
(TODO: Explain). After locating the required
region(s), the client <emphasis>directly</emphasis> contacts
the RegionServer serving that region (i.e., it does not go
@@ -842,6 +851,11 @@ HTable table2 = new HTable(conf2, "myTable");</programlisting>
For more information about how connections are handled in the HBase client,
see <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnectionManager.html">HConnectionManager</link>.
</para>
<section xml:id="client.connection.pooling"><title>Connection Pooling</title>
<para>For applications which require high-end multithreaded access (e.g., web servers or application servers that may serve many application threads
in a single JVM), see <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTablePool.html">HTablePool</link>.
</para>
</section>
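A minimal sketch of HTablePool usage under the 0.90-era client API; the table name, pool size, and row key are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.util.Bytes;

public class PoolSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Cache up to 10 HTable instances per table name. The pool itself is
    // safe to share across application threads, unlike a bare HTable.
    HTablePool pool = new HTablePool(conf, 10);
    HTableInterface table = pool.getTable("myTable");
    try {
      table.get(new Get(Bytes.toBytes("row1")));
    } finally {
      pool.putTable(table); // return the instance to the pool rather than closing it
    }
  }
}
```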
</section>
<section xml:id="client.writebuffer"><title>WriteBuffer and Batch Methods</title>
<para>If <xref linkend="perf.hbase.client.autoflush" /> is turned off on
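The autoflush/write-buffer behavior this section refers to can be sketched as follows; the table name, buffer size, and column names are illustrative, assuming the 0.90-era HTable API:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedWriteSketch {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "myTable");
    table.setAutoFlush(false);                  // buffer Puts client-side
    table.setWriteBufferSize(2 * 1024 * 1024);  // flush when ~2MB has accumulated
    for (int i = 0; i < 10000; i++) {
      Put put = new Put(Bytes.toBytes("row" + i));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v" + i));
      table.put(put);                           // queued locally, not sent per-call
    }
    table.flushCommits();                       // push any remaining buffered Puts
    table.close();
  }
}
```

Buffering amortizes the per-RPC cost across many Puts; the trade-off is that buffered edits are lost if the client dies before the flush.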
@@ -1055,7 +1069,7 @@ HTable table2 = new HTable(conf2, "myTable");</programlisting>
For a description of how a minor compaction picks files to compact, see the <link xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/Store.html#836">ascii diagram in the Store source code.</link>
</para>
<para>After a major compaction runs there will be a single storefile per store, and this usually helps performance. Caution: major compactions rewrite all of the store's data and on a loaded system, this may not be tenable;
-major compactions will usually have to be <xref linkend="disable.splitting" /> on large systems.
+major compactions will usually have to be done manually on large systems. See <xref linkend="managed.compactions" />.
</para>
</section>
@@ -1074,6 +1074,17 @@ interval by configuring <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname>. I
grows too large, use the (post-0.90.0 HBase) <classname>org.apache.hadoop.hbase.util.RegionSplitter</classname>
script to perform a network IO safe rolling split
of all regions.
</para>
</section>
<section xml:id="managed.compactions"><title>Managed Compactions</title>
<para>A common administrative technique is to manage major compactions manually, rather than letting
HBase do it. By default, <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname> is one day and major compactions
may kick in when you least desire it - especially on a busy system. To "turn off" automatic major compactions set
the value to <varname>Long.MAX_VALUE</varname>.
</para>
<para>It is important to stress that major compactions are absolutely necessary for StoreFile cleanup; the only variable is when
they occur. They can be administered through the HBase shell, or via
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29">HBaseAdmin</link>.
</para>
</section>
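As a sketch of the technique (the table name is illustrative; the configuration key shown is the one backed by <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname>, and is normally set in hbase-site.xml rather than in code):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ManagedCompactionSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // "Turn off" time-based automatic major compactions.
    conf.setLong("hbase.hregion.majorcompaction", Long.MAX_VALUE);
    // Trigger one manually, e.g. from a nightly cron-driven utility.
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.majorCompact("myTable"); // asynchronous: the request is queued on the RegionServers
  }
}
```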
@@ -35,6 +35,7 @@ git clone git://git.apache.org/hbase.git
<para>See <link xlink:href="https://issues.apache.org/jira/browse/HBASE-3678">HBASE-3678 Add Eclipse-based Apache Formatter to HBase Wiki</link>
for an Eclipse formatter to help ensure your code conforms to HBase's coding conventions.
The issue includes instructions for loading the attached formatter.</para>
<para>Also, no @author tags - that's a rule. Quality Javadoc comments are appreciated. And include the Apache license.</para>
</section>
<section xml:id="eclipse.svn">
<title>Subversive Plugin</title>
@@ -129,13 +130,15 @@ mvn test -Dtest=TestXYZ
<section xml:id="getting.involved">
<title>Getting Involved</title>
-<para>HBase gets better only when people contribute! The following are highlights from the HBase wiki on
-<link xlink:href="http://wiki.apache.org/hadoop/Hbase/HowToContribute">How To Contribute</link>.
+<para>HBase gets better only when people contribute!
</para>
<section xml:id="mailing.list">
<title>Mailing Lists</title>
-<para>Sign up for the dev-list, and the user-list too for greater coverage. See the
+<para>Sign up for the dev-list and the user-list. See the
<link xlink:href="http://hbase.apache.org/mail-lists.html">mailing lists</link> page.
Posing questions - and helping to answer other people's questions - is encouraged!
There are varying levels of experience on both lists so patience and politeness are encouraged (and please
stay on topic.)
</para>
</section>
<section xml:id="jira">
@@ -144,14 +147,19 @@ mvn test -Dtest=TestXYZ
If it's either a new feature request, enhancement, or a bug, file a ticket.
</para>
</section>
<section xml:id="codelines"><title>Codelines</title>
<para>Most development is done on TRUNK. However, there are branches for minor releases (e.g., 0.90.1, 0.90.2, and 0.90.3 are on the 0.90 branch).</para>
<para>If you have any questions on this just send an email to the dev dist-list.</para>
</section>
<section xml:id="submitting.patches">
<title>Submitting Patches</title>
<section xml:id="submitting.patches.create">
<title>Create Patch</title>
-<para>Patch files can be easily generated from Eclipse, for example by selecting Team -> Create Patch.
+<para>Patch files can be easily generated from Eclipse, for example by selecting "Team -> Create Patch".
</para>
<para>Please submit one patch-file per Jira. For example, if multiple files are changed make sure the
selected resource when generating the patch is a directory. Patch files can reflect changes in multiple files.</para>
<para>Make sure you review <xref linkend="eclipse.code.formatting"/> for code style.</para>
</section>
<section xml:id="submitting.patches.naming">
<title>Patch File Naming</title>
@@ -162,12 +170,14 @@ mvn test -Dtest=TestXYZ
<section xml:id="submitting.patches.tests">
<title>Unit Tests</title>
<para>Yes, please. Please try to include unit tests with every code patch (and especially new classes and large changes).</para>
<para>Also, please make sure unit tests pass locally before submitting the patch.</para>
</section>
<section xml:id="submitting.patches.jira">
<title>Attach Patch to Jira</title>
-<para>The patch should be attached to the associated Jira ticket.
+<para>The patch should be attached to the associated Jira ticket via "More Actions -> Attach Files". Make sure you click the
+ASF license inclusion, otherwise the patch can't be considered for inclusion.
</para>
-<para>Once attached to the ticket, click "submit patch" and
+<para>Once attached to the ticket, click "Submit Patch" and
the status of the ticket will change. Committers will review submitted patches for inclusion into the codebase. Please
understand that not every patch may get committed, and that feedback will likely be provided on the patch. Fear not, though,
because the HBase community is helpful!
@@ -177,7 +187,7 @@ mvn test -Dtest=TestXYZ
<section xml:id="committing.patches">
<title>Committing Patches</title>
<para>
-See <link xlink:href="http://wiki.apache.org/hadoop/Hbase/HowToCommit">How To Commit</link> in the HBase wiki.
+Committers do this. See <link xlink:href="http://wiki.apache.org/hadoop/Hbase/HowToCommit">How To Commit</link> in the HBase wiki.
</para>
</section>
@@ -232,10 +232,13 @@ Deferred log flush can be configured on tables via <link
</section>
<section xml:id="perf.hbase.write.mr.reducer">
<title>MapReduce: Skip The Reducer</title>
-<para>When writing a lot of data to an HBase table in a in a Mapper (e.g., with <link
-xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html">TableOutputFormat</link>),
-skip the Reducer step whenever possible. When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then shuffled to other
-Reducers that will most likely be off-node.
+<para>When writing a lot of data to an HBase table from a MR job (e.g., with <link
+xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html">TableOutputFormat</link>), and specifically where Puts are being emitted
+from the Mapper, skip the Reducer step. When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other
+Reducers that will most likely be off-node. It's far more efficient to just write directly to HBase.
</para>
<para>For summary jobs where HBase is used as a source and a sink, writes will come from the Reducer step (e.g., summarize values then write out the result).
This is a different processing problem than the above case.
</para>
</section>
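A map-only job of the kind described above can be sketched like this; the class, table, and column names are hypothetical, assuming the 0.90-era org.apache.hadoop.hbase.mapreduce API:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class MapOnlyHBaseWrite {
  static class PutMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text line, Context ctx)
        throws IOException, InterruptedException {
      Put put = new Put(Bytes.toBytes(line.toString()));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
      ctx.write(new ImmutableBytesWritable(put.getRow()), put); // emitted straight to HBase
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.set(TableOutputFormat.OUTPUT_TABLE, "myTable");
    Job job = new Job(conf, "map-only-hbase-write");
    job.setJarByClass(MapOnlyHBaseWrite.class);
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    job.setMapperClass(PutMapper.class);
    job.setOutputFormatClass(TableOutputFormat.class);
    job.setNumReduceTasks(0); // no Reducer: the Puts skip the sort/shuffle entirely
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

With zero reduce tasks the map output bypasses spooling and shuffling altogether, which is exactly the saving the section describes.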