diff --git a/src/main/docbkx/book.xml b/src/main/docbkx/book.xml
index 71dc93f1769..3ebf249ac6f 100644
--- a/src/main/docbkx/book.xml
+++ b/src/main/docbkx/book.xml
@@ -2139,7 +2139,7 @@ Initial stripe count to create.
 You can use it as follows: for relatively uniform row keys, if you know the
 approximate target number of stripes from the above, you can avoid some splitting
 overhead by starting with several stripes (2, 5, 10...). Note that if the early
 data is not representative of the overall row key distribution, this will not be
 as efficient.
-for existing tables with lots of data, you can use this to pre-split stripes.
+for existing tables with lots of data, you can use this to pre-split stripes.
 For example, with hash-prefixed sequential keys and more than one hash prefix
 per region, you know that some pre-splitting makes sense.
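As an aside, and not part of the patch itself: the stripe settings discussed here can
be set for a table through the HBase API, as the next hunk notes. A minimal sketch,
assuming the stripe compaction property names from this section of the book
(hbase.hstore.engine.class, hbase.store.stripe.initialStripeCount); the table name
and stripe count are placeholders, and the exact calls should be verified against
your HBase version:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class StripePreSplitExample {
      public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        TableName name = TableName.valueOf("mytable"); // placeholder table
        HTableDescriptor desc = admin.getTableDescriptor(name);
        // Switch this table's stores to the stripe compaction engine.
        desc.setConfiguration("hbase.hstore.engine.class",
            "org.apache.hadoop.hbase.regionserver.StripeStoreEngine");
        // Pre-split into several stripes, e.g. for an existing table with
        // hash-prefixed sequential keys.
        desc.setConfiguration("hbase.store.stripe.initialStripeCount", "10");
        // Depending on the version, the table may need to be disabled first.
        admin.modifyTable(name, desc);
        admin.close();
      }
    }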
@@ -2189,6 +2189,11 @@ All the settings that apply to normal compactions (file size limits, etc.) apply
 simply using the HBase API.
+
+Bulk Load Limitations
+As bulk loading bypasses the write path, the WAL doesn’t get written to as part of
+the process. Replication works by reading the WAL files, so it won’t see the bulk
+loaded data – and the same goes for edits that use Put.setWriteToWAL(false), which
+skip the WAL in the same way. One way to handle that is to ship the raw files or
+the HFiles to the other cluster and do the processing there.
+
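To make the limitation concrete, here is an illustrative sketch (not part of the
patch) of an edit that skips the WAL using the classic pre-1.0 client API; the
table, row, family, and qualifier names are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SkipWalPutExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // placeholder table
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("qual"), Bytes.toBytes("value"));
        // No WAL entry is written for this edit, so replication, which reads
        // the WAL, will never ship it to the peer cluster.
        put.setWriteToWAL(false);
        table.put(put);
        table.close();
      }
    }

In later client versions the equivalent call is Put.setDurability(Durability.SKIP_WAL).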
 Bulk Load Architecture
 The HBase bulk load process consists of two main steps.
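As a companion to the architecture description, a sketch (illustrative, not from the
patch) of completing a bulk load programmatically with LoadIncrementalHFiles, the
class behind the completebulkload tool referenced below; the HFile output path and
table name are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

    public class CompleteBulkLoadExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        HTable table = new HTable(conf, "mytable"); // placeholder table
        // Move the HFiles written by the preparation step (e.g. a MapReduce
        // job using HFileOutputFormat) into the table's region directories.
        loader.doBulkLoad(new Path("/tmp/bulkload-output"), table); // placeholder path
        table.close();
      }
    }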
@@ -2270,6 +2275,10 @@
 See Also
 For more information about the referenced utilities, see importtsv and completebulkload.
+
+See How-to: Use HBase Bulk Loading, and Why
+for a recent blog post on the current state of bulk loading.
+
 Advanced Usage