Note that bulk loading skirts replication

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1545777 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Michael Stack 2013-11-26 18:47:54 +00:00
parent 997d1bb727
commit a4b414b133
1 changed file with 10 additions and 1 deletion


@@ -2139,7 +2139,7 @@ Initial stripe count to create. You can use it as follows:
<listitem>
for relatively uniform row keys, if you know the approximate target number of stripes from the above, you can avoid some splitting overhead by starting with several stripes (2, 5, 10, ...). Note that if the early data is not representative of the overall row key distribution, this will not be as efficient.
</listitem><listitem>
for existing tables with lots of data, you can use this to pre-split stripes.
</listitem><listitem>
for keys such as hash-prefixed sequential keys, with more than one hash prefix per region, you know in advance that some pre-splitting makes sense.
</listitem>
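As a sketch of how an initial stripe count might be set, assuming the per-table CONFIGURATION syntax of the HBase shell (the table name and stripe count here are hypothetical, and the property name is the stripe-compaction setting described in this section):

```shell
# Hypothetical example: start an existing table with 10 stripes
# via a per-table configuration override in the HBase shell.
alter 'mytable', CONFIGURATION => {
  'hbase.store.stripe.initialStripeCount' => '10'
}
```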
@@ -2189,6 +2189,11 @@ All the settings that apply to normal compactions (file size limits, etc.) apply
simply using the HBase API.
</para>
</section>
<section xml:id="arch.bulk.load.limitations"><title>Bulk Load Limitations</title>
<para>As bulk loading bypasses the write path, the WAL does not get written to as part of the process.
Replication works by reading the WAL files, so it will not see the bulk-loaded data; the same goes for edits that use Put.setWriteToWAL(false).
One way to handle that is to ship the raw files or the HFiles to the other cluster and do the other processing there.</para>
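Such shipping might be sketched as follows, assuming the HFiles were written to an HDFS output directory by a bulk-load job (the paths, cluster address, and table name here are all hypothetical; LoadIncrementalHFiles is the class behind the completebulkload tool):

```shell
# Hypothetical sketch: copy the bulk-load output to the peer cluster,
# then complete the bulk load there instead of relying on replication.
hadoop distcp /user/me/bulkoutput hdfs://peer-cluster:8020/user/me/bulkoutput

# On the peer cluster:
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/me/bulkoutput mytable
```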
</section>
<section xml:id="arch.bulk.load.arch"><title>Bulk Load Architecture</title>
<para>
The HBase bulk load process consists of two main steps.
@@ -2270,6 +2275,10 @@ All the settings that apply to normal compactions (file size limits, etc.) apply
<section xml:id="arch.bulk.load.also"><title>See Also</title>
<para>For more information about the referenced utilities, see <xref linkend="importtsv"/> and <xref linkend="completebulkload"/>.
</para>
<para>
See <link xlink:href="http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/">How-to: Use HBase Bulk Loading, and Why</link>
for a recent blog post on the current state of bulk loading.
</para>
</section>
<section xml:id="arch.bulk.load.adv"><title>Advanced Usage</title>
<para>