HBASE-24144 Update docs from master (addendum)

Bring back documentation from master branch (37b863bd0b), using

```
$ git checkout master -- src/main/asciidoc/
$ git checkout master -- src/site/asciidoc/
```

And then:
 * remove changes from HBASE-23890
   This reverts commit 420e38083f.
 * delete docs re: sync_replication
 * delete docs re: backup
Author: Nick Dimiduk, 2020-06-29 08:21:21 -07:00 (committed by Nick Dimiduk)
Parent: 3effd28a75
Commit: 5b8afaeacd
3 changed files with 110 additions and 94 deletions

@@ -312,6 +312,7 @@ link:https://hadoop.apache.org/cve_list.html[CVEs] so we drop the support in new
|Hadoop-3.0.3+ | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:check-circle[role="green"] | icon:times-circle[role="red"] | icon:times-circle[role="red"]
|Hadoop-3.1.0 | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"]
|Hadoop-3.1.1+ | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:check-circle[role="green"] | icon:check-circle[role="green"] | icon:check-circle[role="green"]
|Hadoop-3.2.x | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:check-circle[role="green"] | icon:check-circle[role="green"]
|===

.Hadoop Pre-2.6.1 and JDK 1.8 Kerberos

@@ -30,76 +30,81 @@
[[regionserver.offheap.overview]]
== Overview

To help reduce P99/P999 RPC latencies, HBase 2.x has made the read and write path use a pool of offheap buffers. Cells are
allocated in offheap memory outside of the purview of the JVM garbage collector, with an attendant reduction in GC pressure.
In the write path, the request packet received from the client is read in on a pre-allocated offheap buffer and retained
offheap until those cells are successfully persisted to the WAL and Memstore. The memory data structure in Memstore does
not directly store the cell memory, but references the cells encoded in the offheap buffers. Similarly for the read path:
we'll try to read the block cache first and, on a cache miss, go to the HFile and read the respective block. The
workflow from reading blocks to sending cells to the client does its best to avoid on-heap memory allocations, reducing the
amount of work the GC has to do.

image::offheap-overview.png[]

For redress of the single mention of onheap in the read section of the diagram above, see <<regionserver.read.hdfs.block.offheap>>.
[[regionserver.offheap.readpath]]
== Offheap read-path

In HBase-2.0.0, link:https://issues.apache.org/jira/browse/HBASE-11425[HBASE-11425] changed the HBase read path so it
could hold the read-data off-heap, avoiding copying of cached data (BlockCache) on to the java heap (for uncached data,
see the note under the diagram in the section above). This reduces GC pauses given there is less garbage made and so less
to clear. The off-heap read path can have performance that is similar to or better than that of the on-heap LRU cache.
This feature is available since HBase 2.0.0. Refer to the blogs below for more details and test results on the off-heaped read path:
link:https://blogs.apache.org/hbase/entry/offheaping_the_read_path_in[Offheaping the Read Path in Apache HBase: Part 1 of 2]
and link:https://blogs.apache.org/hbase/entry/offheap-read-path-in-production[Offheap Read-Path in Production - The Alibaba story].

For an end-to-end off-heaped read-path, all you have to do is enable an off-heap backed <<offheap.blockcache>> (BC).
To do this, configure _hbase.bucketcache.ioengine_ to be _offheap_ in _hbase-site.xml_ (See <<bc.deploy.modes>> to learn
more about _hbase.bucketcache.ioengine_ options). Also specify the total capacity of the BC using `hbase.bucketcache.size`.
Please remember to adjust the value of 'HBASE_OFFHEAPSIZE' in _hbase-env.sh_ (See <<bc.example>> for help sizing and an example
enabling). This configuration specifies the maximum possible off-heap memory allocation for the RegionServer java
process. It should be bigger than the off-heap BC size to accommodate usage by other features making use of off-heap memory,
such as the Server RPC buffer pool and short-circuit reads (See discussion in <<bc.example>>).
Please keep in mind that there is no default for `hbase.bucketcache.ioengine`, which means the `BlockCache` is OFF by default
(See <<direct.memory>>).

This is all you need to do to enable the off-heap read path. Most buffers in HBase are already off-heap. With the BC off-heap,
the read pipeline will copy data between HDFS and the server socket (caveat <<hbase.ipc.server.reservoir.initial.max>>),
sending results back to the client.
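
As a minimal illustrative sketch only, the _hbase-site.xml_ entries would look something like the below; the 4096 MB
capacity is a hypothetical value that must be sized for your own hardware, and _HBASE_OFFHEAPSIZE_ in _hbase-env.sh_
must be raised to cover it plus the other off-heap consumers discussed above.

[source,xml]
----
<!-- Illustrative values only; size the cache for your own hardware. -->
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>offheap</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <!-- BucketCache capacity in MB -->
  <value>4096</value>
</property>
----
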
[[regionserver.offheap.rpc.bb.tuning]]
===== Tuning the RPC buffer pool

It is possible to tune the ByteBuffer pool on the RPC server side used to accumulate the cell bytes and create result
cell blocks to send back to the client side. Use `hbase.ipc.server.reservoir.enabled` to turn this pool ON or OFF. By
default this pool is ON and available. HBase will create off-heap ByteBuffers and pool them by default. Please
make sure not to turn this OFF if you want end-to-end off-heaping in the read path.
If this pool is turned off, the server will create temp buffers onheap to accumulate the cell bytes and
make a result cell block. This can impact the GC on a highly read loaded server.

NOTE: the config keys which start with the prefix `hbase.ipc.server.reservoir` are deprecated in hbase-3.x (the
internal pool implementation changed). If you are still on hbase-2.2.x or older, then just use the old config
keys. Otherwise, if on hbase-3.x or hbase-2.3.x+, please use the new config keys
(See <<regionserver.read.hdfs.block.offheap,deprecated and new configs in HBase3.x>>).

The next thing to tune is the ByteBuffer pool on the RPC server side. The user can tune this pool with respect to how
many buffers are in the pool and what the size of each ByteBuffer should be. Use the config
`hbase.ipc.server.reservoir.initial.buffer.size` to tune each of the buffer sizes. The default is 64KB for hbase-2.2.x
and earlier, changed to 65KB by default for hbase-2.3.x+
(see link:https://issues.apache.org/jira/browse/HBASE-22532[HBASE-22532]).

When the result size is larger than one 64KB (default) ByteBuffer, the server will try to grab more than one
ByteBuffer and make a result cell block out of a collection of fixed-sized ByteBuffers. When the pool is running
out of buffers, the server will skip the pool and create temporary on-heap buffers.

The maximum number of ByteBuffers in the pool can be tuned using the config `hbase.ipc.server.reservoir.initial.max`.
Its default is a factor of the region server handler count (See the config `hbase.regionserver.handler.count`). The
math is such that by default we consider 2 MB as the result cell block size per read result and each handler will be
handling a read. For 2 MB, we need 32 buffers, each of size 64 KB (See the default buffer size in the pool). So per handler,
32 ByteBuffers (BB). We allocate twice this size as the max BB count such that one handler can be creating the response
and handing it to the RPC Responder thread and then handling a new request, creating a new response cell block (using
pooled buffers). Even if the responder could not send back the first TCP reply immediately, our count should allow that
we still have enough buffers in our pool without having to make temporary buffers on the heap. Again, for smaller
sized random row reads, tune this max count down. These are lazily created buffers and the count is the max count to be pooled.
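
As a sketch only (hbase-2.2.x key names, hypothetical values for a workload with many concurrent large results; the
computed defaults described above are usually a fine starting point), the pool could be tuned in _hbase-site.xml_ like so:

[source,xml]
----
<!-- Hypothetical tuning values; defaults are described in the text above. -->
<property>
  <name>hbase.ipc.server.reservoir.initial.buffer.size</name>
  <!-- 64KB per pooled ByteBuffer -->
  <value>65536</value>
</property>
<property>
  <name>hbase.ipc.server.reservoir.initial.max</name>
  <!-- illustrative: roughly double the computed default for 30 handlers -->
  <value>3840</value>
</property>
----
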
If you still see GC issues even after making the end-to-end read path off-heap, look for issues in the appropriate buffer
pool. Check for the below RegionServer log line at INFO level in HBase2.x:

[source]
----
@@ -113,105 +118,114 @@ Or the following log message in HBase3.x:
Pool already reached its max capacity : XXX and no free buffers now. Consider increasing the value for 'hbase.server.allocator.max.buffer.count' ?
----

[[hbase.offheapsize]]
The setting for _HBASE_OFFHEAPSIZE_ in _hbase-env.sh_ should consider this off-heap buffer pool on the server side also.
We need to configure this max off-heap size for the RegionServer as a bit higher than the sum of this max pool size and
the off-heap cache size. The TCP layer will also need to create direct bytebuffers for TCP communication. Also, the DFS
client will need some off-heap to do its workings, especially if short-circuit reads are configured. Allocating an extra
1 - 2 GB for the max direct memory size has worked in tests.
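
For example (a rough, hypothetical budget only): with a 4 GB off-heap BucketCache, the default RPC buffer pool of
1890 buffers of 65KB each (about 120 MB), and 1 - 2 GB of headroom for DFS-client and TCP direct buffers, an
_HBASE_OFFHEAPSIZE_ of around 6 - 7 GB would be a reasonable starting point.
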
If you are using coprocessors and refer to the Cells in the read results, DO NOT store references to these Cells outside
the scope of the CP hook methods. Sometimes the CPs want to store info about the cell (like its row key) for consideration
in the next CP hook call, etc. For such cases, please clone the required fields of the Cell as per the use case
[ See CellUtil#cloneXXX(Cell) APIs ].
[[regionserver.read.hdfs.block.offheap]]
== Read block from HDFS to offheap directly

In HBase-2.x, the RegionServer will read blocks from HDFS to a temporary onheap ByteBuffer and then flush to
the BucketCache. Even if the BucketCache is offheap, we will first pull the HDFS read onheap before writing
it out to the offheap BucketCache. We can observe much GC pressure when the cache hit ratio is low (e.g. a cacheHitRatio of ~60%).
link:https://issues.apache.org/jira/browse/HBASE-21879[HBASE-21879] addresses this issue (Requires hbase-2.3.x/hbase-3.x).
It depends on a supporting HDFS being in place (hadoop-2.10.x or hadoop-3.3.x) and it may require patching
HBase itself (as of this writing); see
link:https://issues.apache.org/jira/browse/HBASE-21879[HBASE-21879 Read HFile's block to ByteBuffer directly instead of to byte for reducing young gc purpose].
Appropriately set up, reads from HDFS can be into offheap buffers passed offheap to the offheap BlockCache to cache.

For more details about the design and performance improvement, please see the
link:https://docs.google.com/document/d/1xSy9axGxafoH-Qc17zbD2Bd--rWjjI00xTWQZ8ZwI_E[Design Doc - Read HFile's block to Offheap].
Here we will share some best practices about performance tuning, but first we introduce the new (hbase-3.x/hbase-2.3.x) configuration names
that go with the new internal pool implementation (`ByteBuffAllocator` vs the old `ByteBufferPool`), some of which mimic the now deprecated
hbase-2.2.x configurations discussed above in <<regionserver.offheap.rpc.bb.tuning>>. Much of the advice here overlaps that given above
in <<regionserver.offheap.rpc.bb.tuning>> since the implementations have similar configurations (an example _hbase-site.xml_ sketch follows the list).

1. `hbase.server.allocator.pool.enabled` is for whether the RegionServer will use the pooled offheap ByteBuffer allocator. The default
value is true. In hbase-2.x, the deprecated `hbase.ipc.server.reservoir.enabled` did similar and is mapped to this config
until support for the old configuration is removed. This new name is used in hbase-3.x and hbase-2.3.x+.
2. `hbase.server.allocator.minimal.allocate.size` is the threshold at which we start allocating from the pool. Below it, the
request will be allocated from onheap directly, because it would be wasteful allocating small stuff from our pool of fixed-size
ByteBuffers. The default minimum is `hbase.server.allocator.buffer.size/6`.
3. `hbase.server.allocator.max.buffer.count`: The `ByteBuffAllocator`, the new pool/reservoir implementation, has fixed-size
ByteBuffers. This config is for how many buffers to pool. Its default value is 2MB * 2 * hbase.regionserver.handler.count / 65KB
(similar to the discussion above in <<regionserver.offheap.rpc.bb.tuning>>). If the default `hbase.regionserver.handler.count` of 30 is in effect, the default will be 1890.
4. `hbase.server.allocator.buffer.size`: The byte size of each ByteBuffer. The default value is 66560 (65KB); we choose 65KB instead of 64KB
because of link:https://issues.apache.org/jira/browse/HBASE-22532[HBASE-22532].
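
As an illustrative sketch using the new (hbase-2.3.x+/hbase-3.x) key names, _hbase-site.xml_ might carry something like
the below; the values simply restate the defaults except for the max buffer count, which is doubled as a hypothetical
tuning for a read-heavy workload.

[source,xml]
----
<!-- Hypothetical values; the defaults described above are usually a fine starting point. -->
<property>
  <name>hbase.server.allocator.pool.enabled</name>
  <!-- the default; keep ON for end-to-end offheap reads -->
  <value>true</value>
</property>
<property>
  <name>hbase.server.allocator.buffer.size</name>
  <!-- 65KB, the default -->
  <value>66560</value>
</property>
<property>
  <name>hbase.server.allocator.max.buffer.count</name>
  <!-- illustrative: double the 1890 computed default for 30 handlers -->
  <value>3780</value>
</property>
----
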
The three config keys `hbase.ipc.server.reservoir.enabled`, `hbase.ipc.server.reservoir.initial.buffer.size` and `hbase.ipc.server.reservoir.initial.max`, introduced in hbase-2.x,
have been renamed and deprecated in hbase-3.x/hbase-2.3.x. Please use the new config keys instead:
`hbase.server.allocator.pool.enabled`, `hbase.server.allocator.buffer.size` and `hbase.server.allocator.max.buffer.count`.

If you still use the three deprecated config keys in hbase-3.x, you will get a WARN log message like:

[source]
----
The config keys hbase.ipc.server.reservoir.initial.buffer.size and hbase.ipc.server.reservoir.initial.max are deprecated now, instead please use hbase.server.allocator.buffer.size and hbase.server.allocator.max.buffer.count. In future release we will remove the two deprecated configs.
----
Next, we have some suggestions regarding performance.

.Please make sure that there are enough pooled DirectByteBuffers in your ByteBuffAllocator.
The ByteBuffAllocator will allocate ByteBuffers from the DirectByteBuffer pool first. If
there's no available ByteBuffer in the pool, then we will allocate the ByteBuffers from onheap.

By default, we will pre-allocate 4MB for each RPC handler (the handler count is determined by the config
`hbase.regionserver.handler.count`, which has a default value of 30). That's to say, if your `hbase.server.allocator.buffer.size`
is 65KB, then your pool will have 2MB * 2 / 65KB * 30 = 1890 DirectByteBuffers. If you have a large scan and a big cache,
you may have an RPC response whose byte size is greater than 2MB (another 2MB for receiving the RPC request), and then it will
be better to increase the `hbase.server.allocator.max.buffer.count`.

The RegionServer web UI has statistics on ByteBuffAllocator:

image::bytebuff-allocator-stats.png[]

If the following condition is met, you may need to increase your max buffer count:

heapAllocationRatio >= hbase.server.allocator.minimal.allocate.size / hbase.server.allocator.buffer.size * 100%
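
For example, with the default `hbase.server.allocator.buffer.size` of 66560 bytes, `hbase.server.allocator.minimal.allocate.size`
defaults to 66560 / 6, or roughly 11093 bytes, so the threshold above works out to about 11093 / 66560 * 100% = ~17%. A
heapAllocationRatio persistently at or above that level suggests allocations are falling back to onheap and the pool is
likely undersized.
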
.Please make sure the buffer size is greater than your block size.
We have a default block size of 64KB, so almost all of the data blocks will be 64KB + a small delta, where the delta is
very small, depending on the size of the last Cell. If we set `hbase.server.allocator.buffer.size`=64KB,
then each block will be allocated as two ByteBuffers: one 64KB DirectByteBuffer and one HeapByteBuffer for the delta bytes.
Ideally, we should let the data block be allocated as one ByteBuffer; it has a simpler data structure, faster access speed,
and less heap usage. Also, if a block is a composite of multiple ByteBuffers, validating the checksum
requires a temporary heap copy (see link:https://issues.apache.org/jira/browse/HBASE-21917[HBASE-21917]),
whereas if it's a single ByteBuffer we can speed up the checksum by calling the hadoop native checksum lib; it's much faster.

Please also see: link:https://issues.apache.org/jira/browse/HBASE-22483[HBASE-22483]

Don't forget to up your _HBASE_OFFHEAPSIZE_ accordingly. See <<hbase.offheapsize>>.
[[regionserver.offheap.writepath]]
== Offheap write-path

In hbase-2.x, link:https://issues.apache.org/jira/browse/HBASE-15179[HBASE-15179] made the HBase write path work off-heap. By default, the MemStores in
HBase have always used MemStore Local Allocation Buffers (MSLABs) to avoid memory fragmentation; an MSLAB creates bigger fixed-sized chunks and then the
MemStore Cell data gets copied into these MSLAB chunks. These chunks can be pooled also, and from hbase-2.x on, the MSLAB pool is by default ON.
Write off-heaping makes use of the MSLAB pool. It creates MSLAB chunks as Direct ByteBuffers and pools them.

`hbase.regionserver.offheap.global.memstore.size` is the configuration key which controls the amount of off-heap data. Its value is the number of megabytes
of off-heap memory that should be used by MSLAB (e.g. `25` would result in 25MB of off-heap). Be sure to increase _HBASE_OFFHEAPSIZE_ which will set the JVM's
MaxDirectMemorySize property (see <<hbase.offheapsize>> for more on _HBASE_OFFHEAPSIZE_). The default value of
`hbase.regionserver.offheap.global.memstore.size` is 0, which means MSLAB uses onheap, not offheap, chunks by default.

`hbase.hregion.memstore.mslab.chunksize` controls the size of each off-heap chunk. The default is `2097152` (2MB).
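
A minimal illustrative sketch of enabling off-heap MSLAB chunks in _hbase-site.xml_ follows; the 4096 MB value is
hypothetical and must be budgeted within _HBASE_OFFHEAPSIZE_ alongside the BucketCache and the RPC buffer pool.

[source,xml]
----
<!-- Illustrative value only; budget this within HBASE_OFFHEAPSIZE. -->
<property>
  <name>hbase.regionserver.offheap.global.memstore.size</name>
  <!-- MB of off-heap memory to be used by MSLAB chunks -->
  <value>4096</value>
</property>
----
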
When a Cell is added to a MemStore, the bytes for that Cell are copied into these off-heap buffers (if `hbase.regionserver.offheap.global.memstore.size` is non-zero)
and a Cell POJO will refer to this memory area. This can greatly reduce the on-heap occupancy of the MemStores and reduce the total heap utilization for RegionServers
in a write-heavy workload. On-heap and off-heap memory utilization are tracked at multiple levels to implement low-level and high-level memory management.
The decision to flush a MemStore considers both the on-heap and off-heap usage of that MemStore. At the Region level, we sum the on-heap and off-heap usages and
compare them against the region flush size (128MB, by default). Globally, on-heap size occupancy of all memstores is tracked as well as off-heap size. When any of
these sizes breaches the lower mark (`hbase.regionserver.global.memstore.size.lower.limit`) or the maximum size (`hbase.regionserver.global.memstore.size`), all
regions are selected for forced flushes.

@@ -69,6 +69,7 @@ In addition to the usual API versioning considerations HBase has other compatibi
* Allow changing or removing existing client APIs.
* An API needs to be deprecated for a whole major version before we will change/remove it.
** An example: An API was deprecated in 2.0.1 and will be marked for deletion in 4.0.0. On the other hand, an API deprecated in 2.0.0 can be removed in 3.0.0.
** Occasionally mistakes are made and internal classes are marked with a higher access level than they should be. In these rare circumstances, we will accelerate the deprecation schedule to the next major version (i.e., deprecated in 2.2.x, marked `IA.Private` in 3.0.0). Such changes are communicated and explained via a release note in Jira.
* APIs available in a patch version will be available in all later patch versions. However, new APIs may be added which will not be available in earlier patch versions.
* New APIs introduced in a patch version will only be added in a source compatible way footnote:[See 'Source Compatibility' https://blogs.oracle.com/darcy/entry/kinds_of_compatibility]: i.e. code that implements public APIs will continue to compile.
** Example: A user using a newly deprecated API does not need to modify application code with HBase API calls until the next major version.