HBASE-4165 reorganized performance chapter
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1153898 13f79535-47bb-0310-9956-ffa450edef68
parent c0acc54f57
commit 2af2f5dbd9
|
@ -134,13 +134,8 @@
|
|||
<para>See <xref linkend="number.of.cfs" />.</para>
|
||||
</section>
|
||||
|
||||
<section xml:id="perf.one.region">
|
||||
<title>Data Clumping</title>
|
||||
|
||||
<para>If all your data is being written to one region, then re-read the
|
||||
section on processing <link linkend="timeseries">timeseries</link>
|
||||
data.</para>
|
||||
</section>
|
||||
<section xml:id="perf.writing">
|
||||
<title>Writing to HBase</title>
|
||||
|
||||
<section xml:id="perf.batch.loading">
|
||||
<title>Batch Loading</title>
|
||||
|
@ -148,6 +143,7 @@
|
|||
<link xlink:href="http://hbase.apache.org/bulk-loads.html">Bulk Loads</link>.
|
||||
Otherwise, pay attention to the below.
|
||||
</para>
|
||||
</section> <!-- batch loading -->
|
||||
|
||||
<section xml:id="precreate.regions">
|
||||
<title>
|
||||
|
@ -199,13 +195,9 @@ Deferred log flush can be configured on tables via <link
|
|||
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>. The default value of <varname>hbase.regionserver.optionallogflushinterval</varname> is 1000ms.
|
||||
</para>
|
||||
</section>
|
||||
</section> <!-- batch loading -->
|
||||
|
||||
<section>
|
||||
<title>HBase Client</title>
|
||||
|
||||
<section xml:id="perf.hbase.client.autoflush">
|
||||
<title>AutoFlush</title>
|
||||
<title>HBase Client: AutoFlush</title>
|
||||
|
||||
<para>When performing a lot of Puts, make sure that setAutoFlush is set
|
||||
to false on your <link
|
||||
|
@ -218,6 +210,46 @@ Deferred log flush can be configured on tables via <link
|
|||
Calling <methodname>close</methodname> on the <classname>HTable</classname>
|
||||
instance will invoke <methodname>flushCommits</methodname>.</para>
|
||||
</section>
|
||||
<section xml:id="perf.hbase.client.putwal">
|
||||
<title>HBase Client: Turn off WAL on Puts</title>
|
||||
<para>A frequently discussed option for increasing throughput on <classname>Put</classname>s is to call <code>writeToWAL(false)</code>. Turning this off means
|
||||
that the RegionServer will <emphasis>not</emphasis> write the <classname>Put</classname> to the Write Ahead Log,
|
||||
only into the memstore, HOWEVER the consequence is that if there
|
||||
is a RegionServer failure <emphasis>there will be data loss</emphasis>.
|
||||
If <code>writeToWAL(false)</code> is used, do so with extreme caution. You may find in actuality that
|
||||
it makes little difference if your load is well distributed across the cluster.
|
||||
</para>
|
||||
<para>In general, it is best to use WAL for Puts, and where loading throughput
|
||||
is a concern to use <link linkend="perf.batch.loading">bulk loading</link> techniques instead.
|
||||
</para>
|
||||
</section>
|
||||
<section xml:id="perf.hbase.client.regiongroup">
|
||||
<title>HBase Client: Group Puts by RegionServer</title>
|
||||
<para>In addition to using the writeBuffer, grouping <classname>Put</classname>s by RegionServer can reduce the number of client RPC calls per writeBuffer flush.
|
||||
There is a utility <classname>HTableUtil</classname> currently on TRUNK that does this, but you can either copy that or implement your own version for
|
||||
those still on 0.90.x or earlier.
|
||||
</para>
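The grouping idea described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the HTableUtil implementation: the class name PutGroupingSketch, the method groupByServer, and the server names rs1 and rs2 are hypothetical, row keys are modelled as Strings rather than byte arrays, and a TreeMap of region start key to server name stands in for the client's region location lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: groups row keys by the server hosting their region,
// so that each flush can send one batch per server instead of one per row.
class PutGroupingSketch {

    /**
     * @param regionStarts region start key -> server name (assumed to be known
     *                     to the caller; "" is the start key of the first region)
     * @param rowKeys      row keys of the pending writes
     * @return server name -> row keys destined for that server, in arrival order
     */
    static Map<String, List<String>> groupByServer(TreeMap<String, String> regionStarts,
                                                   List<String> rowKeys) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String row : rowKeys) {
            // A row belongs to the region with the greatest start key <= row.
            // String ordering is a simplification of HBase's byte ordering.
            Map.Entry<String, String> region = regionStarts.floorEntry(row);
            if (region == null) {
                throw new IllegalArgumentException("No region covers row: " + row);
            }
            grouped.computeIfAbsent(region.getValue(), s -> new ArrayList<>()).add(row);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Three regions spread over two (hypothetical) servers.
        TreeMap<String, String> regionStarts = new TreeMap<>();
        regionStarts.put("", "rs1");
        regionStarts.put("g", "rs2");
        regionStarts.put("p", "rs1");

        List<String> rows = Arrays.asList("apple", "kiwi", "zebra", "banana", "mango");
        Map<String, List<String>> grouped = groupByServer(regionStarts, rows);

        // Five writes collapse into one batch per server.
        grouped.forEach((server, batch) -> System.out.println(server + " <- " + batch));
    }
}
```

In this toy layout the five rows fall into two batches, one for rs1 and one for rs2, which is the reduction in client RPC calls per flush that the paragraph above refers to. A real client would look up region locations from the cluster and would batch Put objects rather than bare row keys.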
|
||||
</section>
|
||||
<section xml:id="perf.hbase.write.mr.reducer">
|
||||
<title>MapReduce: Skip The Reducer</title>
|
||||
<para>When writing a lot of data to an HBase table in a Mapper (e.g., with <link
|
||||
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html">TableOutputFormat</link>),
|
||||
skip the Reducer step whenever possible. When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then shuffled to other
|
||||
Reducers that will most likely be off-node.
|
||||
</para>
|
||||
</section>
|
||||
|
||||
<section xml:id="perf.one.region">
|
||||
<title>Anti-Pattern: One Hot Region</title>
|
||||
<para>If all your data is being written to one region at a time, then re-read the
|
||||
section on processing <link linkend="timeseries">timeseries</link> data.</para>
|
||||
<para>Also, see <xref linkend="precreate.regions"/>, as well as <xref linkend="perf.configurations"/> </para>
|
||||
</section>
|
||||
|
||||
</section> <!-- writing -->
|
||||
|
||||
<section xml:id="perf.reading">
|
||||
<title>Reading from HBase</title>
|
||||
|
||||
<section xml:id="perf.hbase.client.caching">
|
||||
<title>Scan Caching</title>
|
||||
|
@ -286,18 +318,12 @@ htable.close();</programlisting></para>
|
|||
and minimal network traffic to the client for a single row.
|
||||
</para>
|
||||
</section>
|
||||
<section xml:id="perf.hbase.client.putwal">
|
||||
<title>Turn off WAL on Puts</title>
|
||||
<para>A frequently discussed option for increasing throughput on <classname>Put</classname>s is to call <code>writeToWAL(false)</code>. Turning this off means
|
||||
that the RegionServer will <emphasis>not</emphasis> write the <classname>Put</classname> to the Write Ahead Log,
|
||||
only into the memstore, HOWEVER the consequence is that if there
|
||||
is a RegionServer failure <emphasis>there will be data loss</emphasis>.
|
||||
If <code>writeToWAL(false)</code> is used, do so with extreme caution. You may find in actuality that
|
||||
it makes little difference if your load is well distributed across the cluster.
|
||||
</para>
|
||||
<para>In general, it is best to use WAL for Puts, and where loading throughput
|
||||
is a concern to use <link linkend="perf.batch.loading">bulk loading</link> techniques instead.
|
||||
</para>
|
||||
</section>
|
||||
<section xml:id="perf.hbase.read.dist">
|
||||
<title>Concurrency: Monitor Data Spread</title>
|
||||
<para>When performing a high number of concurrent reads, monitor the data spread of the target tables. If the target table(s) are in
|
||||
too few regions then the reads will fall on only a few nodes. </para>
|
||||
<para>See <xref linkend="precreate.regions"/>, as well as <xref linkend="perf.configurations"/> </para>
|
||||
</section>
|
||||
|
||||
</section> <!-- reading -->
|
||||
</chapter>
|
||||
|
|