Add note that can OOME if many regions and MSLAB on

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1406306 13f79535-47bb-0310-9956-ffa450edef68
Michael Stack 2012-11-06 20:20:16 +00:00
parent 9b3e63dcb7
commit 1ed61755ad
1 changed file with 66 additions and 59 deletions


@ -47,7 +47,7 @@
<title>Network</title>
<para>
Perhaps the most important factor in avoiding network issues degrading Hadoop and HBase performance is the switching hardware
that is used; decisions made early in the scope of the project can cause major problems when you double or triple the size of your cluster (or more).
</para>
<para>
Important items to consider:
@ -59,15 +59,15 @@
</para>
<section xml:id="perf.network.1switch">
<title>Single Switch</title>
<para>The single most important factor in this configuration is that the switching capacity of the hardware is capable of
handling the traffic which can be generated by all systems connected to the switch. Some lower-priced commodity hardware
can have a slower switching capacity than could be utilized by a full switch.
</para>
</section>
<section xml:id="perf.network.2switch">
<title>Multiple Switches</title>
<para>Multiple switches are a potential pitfall in the architecture. The most common configuration of lower-priced hardware is a
simple 1Gbps uplink from one switch to another. This often-overlooked pinch point can easily become a bottleneck for cluster communication.
Especially with MapReduce jobs that are both reading and writing a lot of data, the communication across this uplink could be saturated.
</para>
<para>Mitigation of this issue is fairly simple and can be accomplished in multiple ways:
@ -85,10 +85,10 @@
<listitem>Poor switch capacity performance</listitem>
<listitem>Insufficient uplink to another rack</listitem>
</itemizedlist>
If the switches in your rack have appropriate switching capacity to handle all the hosts at full speed, the next most likely issue will be caused by homing
more of your cluster across racks. The easiest way to avoid issues when spanning multiple racks is to use port trunking to create a bonded uplink to other racks.
The downside of this method, however, is the overhead of ports that could otherwise be used for hosts. For example, creating an 8Gbps port channel from rack
A to rack B using 8 of your 24 ports to communicate between racks gives you a poor ROI; using too few ports, however, can mean you're not getting the most out of your cluster.
</para>
<para>Using 10GbE links between racks will greatly increase performance, and, assuming your switches support a 10GbE uplink or allow for an expansion card, will let you
save your ports for machines as opposed to uplinks.
@ -128,7 +128,14 @@
slides for background and detail<footnote><para>The latest JVMs do better
with regard to fragmentation, so make sure you are running a recent release.
Read down in the message,
<link xlink:href="http://osdir.com/ml/hotspot-gc-use/2011-11/msg00002.html">Identifying concurrent mode failures caused by fragmentation</link>.</para></footnote>.
Be aware that when enabled, each MemStore instance will occupy at least
one MSLAB chunk of memory. If you have thousands of regions, or lots
of regions each with many column families, this MSLAB allocation
may be responsible for a good portion of your heap and in
an extreme case cause you to OOME. In that case, disable MSLAB,
lower the amount of memory it uses, or float fewer regions per server.
</para>
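<para>As a rough illustration (the property names below are assumed from the default configuration set; verify them against your release), the MSLAB switches can be set in hbase-site.xml or programmatically:
<programlisting>
Configuration conf = HBaseConfiguration.create();
// Turn MSLAB off entirely...
conf.setBoolean("hbase.hregion.memstore.mslab.enabled", false);
// ...or keep it on but shrink the per-MemStore chunk (2MB by default).
conf.setInt("hbase.hregion.memstore.mslab.chunksize", 1024 * 1024);
</programlisting>
</para>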
<para>For more information about GC logs, see <xref linkend="trouble.log.gc" />.
</para>
</section>
@ -159,44 +166,44 @@
<section xml:id="perf.handlers"> <section xml:id="perf.handlers">
<title><varname>hbase.regionserver.handler.count</varname></title> <title><varname>hbase.regionserver.handler.count</varname></title>
<para>See <xref linkend="hbase.regionserver.handler.count"/>. <para>See <xref linkend="hbase.regionserver.handler.count"/>.
</para> </para>
</section> </section>
<section xml:id="perf.hfile.block.cache.size"> <section xml:id="perf.hfile.block.cache.size">
<title><varname>hfile.block.cache.size</varname></title> <title><varname>hfile.block.cache.size</varname></title>
<para>See <xref linkend="hfile.block.cache.size"/>. <para>See <xref linkend="hfile.block.cache.size"/>.
A memory setting for the RegionServer process. A memory setting for the RegionServer process.
</para> </para>
</section> </section>
<section xml:id="perf.rs.memstore.upperlimit"> <section xml:id="perf.rs.memstore.upperlimit">
<title><varname>hbase.regionserver.global.memstore.upperLimit</varname></title> <title><varname>hbase.regionserver.global.memstore.upperLimit</varname></title>
<para>See <xref linkend="hbase.regionserver.global.memstore.upperLimit"/>. <para>See <xref linkend="hbase.regionserver.global.memstore.upperLimit"/>.
This memory setting is often adjusted for the RegionServer process depending on needs. This memory setting is often adjusted for the RegionServer process depending on needs.
</para> </para>
</section> </section>
<section xml:id="perf.rs.memstore.lowerlimit"> <section xml:id="perf.rs.memstore.lowerlimit">
<title><varname>hbase.regionserver.global.memstore.lowerLimit</varname></title> <title><varname>hbase.regionserver.global.memstore.lowerLimit</varname></title>
<para>See <xref linkend="hbase.regionserver.global.memstore.lowerLimit"/>. <para>See <xref linkend="hbase.regionserver.global.memstore.lowerLimit"/>.
This memory setting is often adjusted for the RegionServer process depending on needs. This memory setting is often adjusted for the RegionServer process depending on needs.
</para> </para>
</section> </section>
<section xml:id="perf.hstore.blockingstorefiles"> <section xml:id="perf.hstore.blockingstorefiles">
<title><varname>hbase.hstore.blockingStoreFiles</varname></title> <title><varname>hbase.hstore.blockingStoreFiles</varname></title>
<para>See <xref linkend="hbase.hstore.blockingStoreFiles"/>. <para>See <xref linkend="hbase.hstore.blockingStoreFiles"/>.
If there is blocking in the RegionServer logs, increasing this can help. If there is blocking in the RegionServer logs, increasing this can help.
</para> </para>
</section> </section>
<section xml:id="perf.hregion.memstore.block.multiplier"> <section xml:id="perf.hregion.memstore.block.multiplier">
<title><varname>hbase.hregion.memstore.block.multiplier</varname></title> <title><varname>hbase.hregion.memstore.block.multiplier</varname></title>
<para>See <xref linkend="hbase.hregion.memstore.block.multiplier"/>. <para>See <xref linkend="hbase.hregion.memstore.block.multiplier"/>.
If there is enough RAM, increasing this can help. If there is enough RAM, increasing this can help.
</para> </para>
</section> </section>
<section xml:id="hbase.regionserver.checksum.verify"> <section xml:id="hbase.regionserver.checksum.verify">
<title><varname>hbase.regionserver.checksum.verify</varname></title> <title><varname>hbase.regionserver.checksum.verify</varname></title>
<para>Have HBase write the checksum into the datablock and save <para>Have HBase write the checksum into the datablock and save
having to do the checksum seek whenever you read. See the having to do the checksum seek whenever you read. See the
release note on <link xlink:href="https://issues.apache.org/jira/browse/HBASE-5074">HBASE-5074 support checksums in HBase block cache</link>. release note on <link xlink:href="https://issues.apache.org/jira/browse/HBASE-5074">HBASE-5074 support checksums in HBase block cache</link>.
</para> </para>
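<para>A minimal sketch of turning it on (assumes a release carrying HBASE-5074; the property name mirrors this section's id):
<programlisting>
Configuration conf = HBaseConfiguration.create();
// Use HBase-level checksums in place of the separate HDFS checksum read.
conf.setBoolean("hbase.regionserver.checksum.verify", true);
</programlisting>
</para>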
</section>
@ -218,7 +225,7 @@ more discussion around short circuit reads.
</para>
<para>To enable "short circuit" reads, you must set two configurations.
First, the hdfs-site.xml needs to be amended. Set
the property <varname>dfs.block.local-path-access.user</varname>
to be the <emphasis>only</emphasis> user that can use the shortcut.
This has to be the user that started HBase. Then in hbase-site.xml,
set <varname>dfs.client.read.shortcircuit</varname> to be <varname>true</varname>
@ -241,19 +248,19 @@ the data will still be read.
</section>
<section xml:id="perf.schema">
<title>Schema Design</title>
<section xml:id="perf.number.of.cfs">
<title>Number of Column Families</title>
<para>See <xref linkend="number.of.cfs" />.</para>
</section>
<section xml:id="perf.schema.keys">
<title>Key and Attribute Lengths</title>
<para>See <xref linkend="keysize" />. See also <xref linkend="perf.compression.however" /> for
compression caveats.</para>
</section>
<section xml:id="schema.regionsize"><title>Table RegionSize</title>
<para>The regionsize can be set on a per-table basis via <code>setFileSize</code> on
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link> in the
event where certain tables require different regionsizes than the configured default regionsize.
</para>
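<para>For example (a sketch against the 0.94-era API, where the setter appears as <code>setMaxFileSize</code>):
<programlisting>
HTableDescriptor desc = new HTableDescriptor("mytable");
// Split regions of this table at roughly 10GB instead of the cluster-wide default.
desc.setMaxFileSize(10L * 1024 * 1024 * 1024);
</programlisting>
</para>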
<para>See <xref linkend="perf.number.of.regions"/> for more information. <para>See <xref linkend="perf.number.of.regions"/> for more information.
@ -269,23 +276,23 @@ the data will still be read.
on each insert. If <varname>ROWCOL</varname>, the hash of the row +
column family + column family qualifier will be added to the bloom on
each key insert.</para>
<para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> and
<xref linkend="blooms"/> for more information, or this answer on Quora,
<link xlink:href="http://www.quora.com/How-are-bloom-filters-used-in-HBase">How are bloom filters used in HBase?</link>.
</para>
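<para>A sketch of enabling a row-level bloom on a column family (0.94-era API assumed):
<programlisting>
HColumnDescriptor cf = new HColumnDescriptor("cf");
// ROW blooms on the row key only; ROWCOL also folds in the column qualifier.
cf.setBloomFilterType(StoreFile.BloomType.ROW);
</programlisting>
</para>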
</section>
<section xml:id="schema.cf.blocksize"><title>ColumnFamily BlockSize</title> <section xml:id="schema.cf.blocksize"><title>ColumnFamily BlockSize</title>
<para>The blocksize can be configured for each ColumnFamily in a table, and this defaults to 64k. Larger cell values require larger blocksizes. <para>The blocksize can be configured for each ColumnFamily in a table, and this defaults to 64k. Larger cell values require larger blocksizes.
There is an inverse relationship between blocksize and the resulting StoreFile indexes (i.e., if the blocksize is doubled then the resulting There is an inverse relationship between blocksize and the resulting StoreFile indexes (i.e., if the blocksize is doubled then the resulting
indexes should be roughly halved). indexes should be roughly halved).
</para> </para>
<para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>
and <xref linkend="store"/>for more information. and <xref linkend="store"/>for more information.
</para> </para>
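<para>For example (a sketch against the 0.94-era <classname>HColumnDescriptor</classname> API):
<programlisting>
HColumnDescriptor cf = new HColumnDescriptor("cf");
// Larger cells: use a bigger block. Small cells with random-read-heavy access: keep it small.
cf.setBlocksize(128 * 1024);   // 128k instead of the 64k default
</programlisting>
</para>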
</section>
<section xml:id="cf.in.memory"> <section xml:id="cf.in.memory">
<title>In-Memory ColumnFamilies</title> <title>In-Memory ColumnFamilies</title>
<para>ColumnFamilies can optionally be defined as in-memory. Data is still persisted to disk, just like any other ColumnFamily. <para>ColumnFamilies can optionally be defined as in-memory. Data is still persisted to disk, just like any other ColumnFamily.
In-memory blocks have the highest priority in the <xref linkend="block.cache" />, but it is not a guarantee that the entire table In-memory blocks have the highest priority in the <xref linkend="block.cache" />, but it is not a guarantee that the entire table
will be in memory. will be in memory.
</para> </para>
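<para>A sketch of marking a family in-memory (0.94-era API assumed):
<programlisting>
HColumnDescriptor cf = new HColumnDescriptor("cf");
// Blocks from this family are kept in the in-memory priority bucket of the block cache.
cf.setInMemory(true);
</programlisting>
</para>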
@ -297,17 +304,17 @@ the data will still be read.
<para>Production systems should use compression with their ColumnFamily definitions. See <xref linkend="compression" /> for more information.
</para>
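<para>A sketch of what that looks like on a column family (the codec is a placeholder; pick one that is actually installed on your cluster):
<programlisting>
HColumnDescriptor cf = new HColumnDescriptor("cf");
// Any installed codec works here; SNAPPY and LZO are common choices.
cf.setCompressionType(Compression.Algorithm.SNAPPY);
</programlisting>
</para>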
<section xml:id="perf.compression.however"><title>However...</title> <section xml:id="perf.compression.however"><title>However...</title>
<para>Compression deflates data <emphasis>on disk</emphasis>. When it's in-memory (e.g., in the <para>Compression deflates data <emphasis>on disk</emphasis>. When it's in-memory (e.g., in the
MemStore) or on the wire (e.g., transferring between RegionServer and Client) it's inflated. MemStore) or on the wire (e.g., transferring between RegionServer and Client) it's inflated.
So while using ColumnFamily compression is a best practice, but it's not going to completely eliminate So while using ColumnFamily compression is a best practice, but it's not going to completely eliminate
the impact of over-sized Keys, over-sized ColumnFamily names, or over-sized Column names. the impact of over-sized Keys, over-sized ColumnFamily names, or over-sized Column names.
</para> </para>
<para>See <xref linkend="keysize" /> on for schema design tips, and <xref linkend="keyvalue"/> for more information on HBase stores data internally. <para>See <xref linkend="keysize" /> on for schema design tips, and <xref linkend="keyvalue"/> for more information on HBase stores data internally.
</para> </para>
</section>
</section>
</section> <!-- perf schema -->
<section xml:id="perf.writing">
<title>Writing to HBase</title>
@ -335,7 +342,7 @@ throws IOException {
} catch (TableExistsException e) {
logger.info("table " + table.getNameAsString() + " already exists");
// the table already exists...
return false;
}
}
@ -360,7 +367,7 @@ public static byte[][] getHexSplits(String startKey, String endKey, int numRegio
Table Creation: Deferred Log Flush
</title>
<para>
The default behavior for Puts using the Write Ahead Log (WAL) is that <classname>HLog</classname> edits will be written immediately. If deferred log flush is used,
WAL edits are kept in memory until the flush period. The benefit is aggregated and asynchronous <classname>HLog</classname> writes, but the potential downside is that if
the RegionServer goes down, the yet-to-be-flushed edits are lost. This is safer, however, than not using WAL at all with Puts.
</para>
@ -368,7 +375,7 @@ WAL edits are kept in memory until the flush period. The benefit is aggregated
Deferred log flush can be configured on tables via <link
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>. The default value of <varname>hbase.regionserver.optionallogflushinterval</varname> is 1000ms.
</para>
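<para>A sketch of enabling it on a table descriptor (0.94-era API assumed):
<programlisting>
HTableDescriptor desc = new HTableDescriptor("mytable");
// Trade a small window of possible edit loss for batched, asynchronous HLog writes.
desc.setDeferredLogFlush(true);
</programlisting>
</para>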
</section>
<section xml:id="perf.hbase.client.autoflush">
<title>HBase Client: AutoFlush</title>
@ -394,25 +401,25 @@ Deferred log flush can be configured on tables via <link
it makes little difference if your load is well distributed across the cluster.
</para>
<para>In general, it is best to use WAL for Puts, and where loading throughput
is a concern to use <link linkend="perf.batch.loading">bulk loading</link> techniques instead.
</para>
</section>
<section xml:id="perf.hbase.client.regiongroup">
<title>HBase Client: Group Puts by RegionServer</title>
<para>In addition to using the writeBuffer, grouping <classname>Put</classname>s by RegionServer can reduce the number of client RPC calls per writeBuffer flush.
There is a utility <classname>HTableUtil</classname> currently on TRUNK that does this; those still on 0.90.x or earlier can either copy it or implement their own version.
</para>
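<para>If you roll your own, the shape of it is roughly as follows (an illustrative sketch only, not the <classname>HTableUtil</classname> code; it assumes an existing <classname>HTable</classname> named htable and a List of Puts named puts):
<programlisting>
// Bucket the Puts by the hostname of the RegionServer currently hosting each row,
// then flush one bucket at a time so each flush talks to a single server.
Map&lt;String, List&lt;Put&gt;&gt; buckets = new HashMap&lt;String, List&lt;Put&gt;&gt;();
for (Put put : puts) {
  String host = htable.getRegionLocation(put.getRow()).getHostname();
  List&lt;Put&gt; bucket = buckets.get(host);
  if (bucket == null) {
    bucket = new ArrayList&lt;Put&gt;();
    buckets.put(host, bucket);
  }
  bucket.add(put);
}
for (List&lt;Put&gt; bucket : buckets.values()) {
  htable.put(bucket);   // one batched call per RegionServer bucket
}
</programlisting>
</para>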
</section>
<section xml:id="perf.hbase.write.mr.reducer">
<title>MapReduce: Skip The Reducer</title>
<para>When writing a lot of data to an HBase table from a MR job (e.g., with <link
xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html">TableOutputFormat</link>), and specifically where Puts are being emitted
from the Mapper, skip the Reducer step. When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other
Reducers that will most likely be off-node. It's far more efficient to just write directly to HBase.
</para>
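<para>A sketch of the job wiring in that case (the table name is a placeholder; assumes the standard <classname>TableMapReduceUtil</classname> helper):
<programlisting>
Job job = new Job(conf, "mapper-only puts");
TableMapReduceUtil.initTableReducerJob("targettable", null, job);  // null reducer: TableOutputFormat only
job.setNumReduceTasks(0);   // map-only; Puts go straight from the Mapper to HBase
</programlisting>
</para>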
<para>For summary jobs where HBase is used as a source and a sink, writes will come from the Reducer step (e.g., summarize values then write out the result).
This is a different processing problem than the above case.
</para>
</section>
@ -421,16 +428,16 @@ Deferred log flush can be configured on tables via <link
<para>If all your data is being written to one region at a time, then re-read the
section on processing <link linkend="timeseries">timeseries</link> data.</para>
<para>Also, if you are pre-splitting regions and all your data is <emphasis>still</emphasis> winding up in a single region even though
your keys aren't monotonically increasing, confirm that your keyspace actually works with the split strategy. There are a
variety of reasons that regions may appear "well split" but won't work with your data. As
the HBase client communicates directly with the RegionServers, the region a given row key maps to can be obtained via
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29">HTable.getRegionLocation</link>.
</para>
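<para>For example, to see where a representative key actually lands (a small diagnostic sketch; the table and row key are placeholders):
<programlisting>
HTable htable = new HTable(conf, "mytable");
HRegionLocation loc = htable.getRegionLocation(Bytes.toBytes("sample-row-key"));
// Print the start key of the hosting region and the server it lives on.
System.out.println(Bytes.toStringBinary(loc.getRegionInfo().getStartKey()) + " on " + loc.getHostname());
</programlisting>
</para>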
<para>See <xref linkend="precreate.regions"/>, as well as <xref linkend="perf.configurations"/> </para> <para>See <xref linkend="precreate.regions"/>, as well as <xref linkend="perf.configurations"/> </para>
</section> </section>
</section> <!-- writing --> </section> <!-- writing -->
<section xml:id="perf.reading"> <section xml:id="perf.reading">
<title>Reading from HBase</title> <title>Reading from HBase</title>
@ -452,7 +459,7 @@ Deferred log flush can be configured on tables via <link
<para>Scan settings in MapReduce jobs deserve special attention. Timeouts can result (e.g., UnknownScannerException)
in Map tasks if it takes too long to process a batch of records before the client goes back to the RegionServer for the
next set of data. This problem can occur because there is non-trivial processing occurring per row. If you process
rows quickly, set caching higher. If you process rows more slowly (e.g., lots of transformations per row, writes),
then set caching lower.
</para>
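<para>For example (the values are illustrative; tune them to your per-row cost):
<programlisting>
Scan scan = new Scan();
// Cheap per-row processing: fetch bigger batches per RPC.
scan.setCaching(500);
// Expensive per-row processing (heavy transformations, writes): keep this low, e.g. scan.setCaching(10).
</programlisting>
</para>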
<para>Timeouts can also happen in a non-MapReduce use case (i.e., single threaded HBase client doing a Scan), but the
@ -472,8 +479,8 @@ Deferred log flush can be configured on tables via <link
</section>
<section xml:id="perf.hbase.mr.input">
<title>MapReduce - Input Splits</title>
<para>For MapReduce jobs that use HBase tables as a source, if there is a pattern where the "slow" map tasks seem to
have the same Input Split (i.e., the RegionServer serving the data), see the
Troubleshooting Case Study in <xref linkend="casestudies.slownode"/>.
</para>
</section>
@ -522,9 +529,9 @@ htable.close();</programlisting></para>
</section>
<section xml:id="perf.hbase.read.dist">
<title>Concurrency: Monitor Data Spread</title>
<para>When performing a high number of concurrent reads, monitor the data spread of the target tables. If the target table(s) have
too few regions, the reads could likely be served from too few nodes.</para>
<para>See <xref linkend="precreate.regions"/>, as well as <xref linkend="perf.configurations"/> </para>
</section>
<section xml:id="blooms">
<title>Bloom Filters</title>
@ -554,7 +561,7 @@ htable.close();</programlisting></para>
</footnote></para>
<para>See also <xref linkend="schema.bloom" />.
</para>
<section xml:id="bloom_footprint">
<title>Bloom StoreFile footprint</title>
@ -584,7 +591,7 @@ htable.close();</programlisting></para>
data. Obtained on-demand. Stored in the LRU cache, if it is enabled
(It is enabled by default).</para>
</section>
</section>
<section xml:id="config.bloom">
<title>Bloom Filter Configuration</title>
<section>
@ -615,10 +622,10 @@ htable.close();</programlisting></para>
in HBase</link> for more on what this option means.</para>
</section>
</section>
</section> <!-- bloom -->
</section> <!-- reading -->
<section xml:id="perf.deleting">
<title>Deleting from HBase</title>
<section xml:id="perf.deleting.queue">
@ -647,20 +654,20 @@ htable.close();</programlisting></para>
<section xml:id="perf.hdfs.curr"><title>Current Issues With Low-Latency Reads</title> <section xml:id="perf.hdfs.curr"><title>Current Issues With Low-Latency Reads</title>
<para>The original use-case for HDFS was batch processing. As such, there low-latency reads were historically not a priority. <para>The original use-case for HDFS was batch processing. As such, there low-latency reads were historically not a priority.
With the increased adoption of Apache HBase this is changing, and several improvements are already in development. With the increased adoption of Apache HBase this is changing, and several improvements are already in development.
See the See the
<link xlink:href="https://issues.apache.org/jira/browse/HDFS-1599">Umbrella Jira Ticket for HDFS Improvements for HBase</link>. <link xlink:href="https://issues.apache.org/jira/browse/HDFS-1599">Umbrella Jira Ticket for HDFS Improvements for HBase</link>.
</para> </para>
</section> </section>
<section xml:id="perf.hdfs.comp"><title>Performance Comparisons of HBase vs. HDFS</title> <section xml:id="perf.hdfs.comp"><title>Performance Comparisons of HBase vs. HDFS</title>
<para>A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as <para>A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as
a MapReduce source or sink). The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues, a MapReduce source or sink). The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues,
returning the most current row or specified timestamps, etc.), and as such HBase is 4-5 times slower than HDFS in this returning the most current row or specified timestamps, etc.), and as such HBase is 4-5 times slower than HDFS in this
processing context. Not that there isn't room for improvement (and this gap will, over time, be reduced), but HDFS processing context. Not that there isn't room for improvement (and this gap will, over time, be reduced), but HDFS
will always be faster in this use-case. will always be faster in this use-case.
</para> </para>
</section> </section>
</section> </section>
<section xml:id="perf.ec2"><title>Amazon EC2</title> <section xml:id="perf.ec2"><title>Amazon EC2</title>
<para>Performance questions are common on Amazon EC2 environments because it is a shared environment. You will <para>Performance questions are common on Amazon EC2 environments because it is a shared environment. You will
not see the same throughput as a dedicated server. In terms of running tests on EC2, run them several times for the same not see the same throughput as a dedicated server. In terms of running tests on EC2, run them several times for the same
@ -670,7 +677,7 @@ htable.close();</programlisting></para>
because EC2 issues are practically a separate class of performance issues.
</para>
</section>
<section xml:id="perf.casestudy"><title>Case Studies</title>
<para>For Performance and Troubleshooting Case Studies, see <xref linkend="casestudies"/>.
</para>