From 891569bbd92f95797033c0706c9de9617228415b Mon Sep 17 00:00:00 2001
From: stack
Date: Tue, 22 Mar 2016 13:01:13 -0700
Subject: [PATCH] Add javadoc on how BBIOEngine works (by Anoop Sam John)

---
 .../io/hfile/bucket/ByteBufferIOEngine.java | 33 ++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/ByteBufferIOEngine.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/ByteBufferIOEngine.java
index 45ed1aeae3e..63de32c89b9 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/ByteBufferIOEngine.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/bucket/ByteBufferIOEngine.java
@@ -31,7 +31,38 @@ import org.apache.hadoop.hbase.util.ByteBufferArray;
 /**
  * IO engine that stores data in memory using an array of ByteBuffers
- * {@link ByteBufferArray}
+ * {@link ByteBufferArray}.
+ *
+ *

How it Works

+ * First, see {@link ByteBufferArray} and how it gives a view across the multiple ByteBuffers it
+ * manages internally. This class physically creates the BBs and performs the writes to and
+ * reads from them. We create N BBs based on the total BC capacity specified when the
+ * ByteBufferArray is created. For example, with 10 GB of off-heap BucketCache, we create 2560
+ * BBs of 4 MB each inside our ByteBufferArray.
+ *
+ *

Now the way BucketCache works is that the entire 10 GB is split into buckets of different sizes:
+ * by default from 5 KB to 513 KB. Within each bucket of a particular size there is usually
+ * more than one bucket 'block'. The way this is calculated in BucketCache is that the total
+ * BucketCache size is divided by (4 (currently hard-coded) * the max size option). So using
+ * defaults, each bucket is 4 * 513 KB (the biggest default value) = 2052 KB. A bucket of 2052 KB
+ * at offset zero serves out bucket 'blocks' of 5 KB, the next bucket serves the next size up,
+ * and so on up to the (default) maximum of 513 KB.
+ *
+ *

When we write blocks to the BucketCache, we see which bucket size group each block best fits.
+ * So a 4 KB block goes to the 5 KB size group. Each block write lands within its
+ * appropriate bucket. Though the block is 4 KB in size, it occupies one of the 5 KB
+ * bucket 'blocks' (even if the actual size of the block is less). Bucket 'blocks' do not
+ * span buckets.
+ *
+ *

However, the physical memory under the bucket 'blocks' can be split across the
+ * underlying backing BBs from ByteBufferArray, since the whole space is divided into 4 MB BBs.
+ *
+ *

Each Bucket knows its offset in the entire space of BC, and when a block is written that offset
+ * arrives at ByteBufferArray, which figures out which BB to write to. It may happen that the
+ * block to be written does not fit entirely in a particular backing BB, so the remainder goes to
+ * another BB. See {@link ByteBufferArray#putMultiple(long, int, byte[])}.
+
+That said, when we read a block, its bytes may be physically placed in 2 adjacent BBs. In that case too, we avoid any copy by having the MBB...
 */
 @InterfaceAudience.Private
 public class ByteBufferIOEngine implements IOEngine {
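
The arithmetic in the Javadoc above can be checked with a small standalone sketch. It does not use any HBase class: `BucketCacheLayoutSketch` and all of its methods are hypothetical names, the 4 MB buffer size, the factor of 4, and the 5 KB and 513 KB endpoints are taken from the comment text, and the intermediate size classes are illustrative assumptions only.

```java
/** Sketch only: mirrors the numbers quoted in the Javadoc, not HBase's real classes. */
final class BucketCacheLayoutSketch {
    static final long KB = 1024L;
    // Size of each backing ByteBuffer, as stated in the Javadoc (4 MB).
    static final long BUFFER_SIZE = 4L * 1024 * KB;
    // The "hard-coded" factor of 4 from the Javadoc.
    static final int BLOCKS_IN_LARGEST_BUCKET = 4;
    // Only 5 and 513 come from the text; the values in between are assumed for illustration.
    static final int[] SIZE_CLASSES_KB = {5, 9, 17, 33, 65, 129, 257, 513};

    /** Number of backing buffers needed to cover the given cache capacity. */
    static long bufferCount(long capacityBytes) {
        return (capacityBytes + BUFFER_SIZE - 1) / BUFFER_SIZE;
    }

    /** Capacity of one bucket in KB: 4 * the largest size class. */
    static long bucketCapacityKb() {
        return (long) BLOCKS_IN_LARGEST_BUCKET * SIZE_CLASSES_KB[SIZE_CLASSES_KB.length - 1];
    }

    /** Smallest size class (in KB) that can hold a block of the given size. */
    static int sizeClassKb(int blockKb) {
        for (int sizeClass : SIZE_CLASSES_KB) {
            if (blockKb <= sizeClass) {
                return sizeClass;
            }
        }
        throw new IllegalArgumentException("Block too large for any size class: " + blockKb);
    }

    /** How many bucket 'blocks' of one size class fit in a single bucket. */
    static long blocksPerBucket(int sizeClassKb) {
        return bucketCapacityKb() / sizeClassKb;
    }

    /** Index of the backing buffer that contains the given cache-wide offset. */
    static long bufferIndex(long offset) {
        return offset / BUFFER_SIZE;
    }

    /** True when a write of the given length starts in one buffer and ends in the next. */
    static boolean spansTwoBuffers(long offset, long length) {
        return bufferIndex(offset) != bufferIndex(offset + length - 1);
    }

    public static void main(String[] args) {
        System.out.println("buffers for 10 GB: " + bufferCount(10L * 1024 * 1024 * KB));
        System.out.println("bucket capacity KB: " + bucketCapacityKb());
        System.out.println("size class for a 4 KB block: " + sizeClassKb(4) + " KB");
        System.out.println("5 KB blocks per bucket: " + blocksPerBucket(5));
        // A 9 KB slot starting at 4095 KB crosses the 4 MB (4096 KB) buffer boundary.
        System.out.println("spans two buffers: " + spansTwoBuffers(4095 * KB, 9 * KB));
    }
}
```

With these assumptions, a 10 GB cache maps to 2560 buffers and each bucket is 2052 KB, matching the figures in the comment. Since 2052 KB does not divide 4 MB evenly, the second bucket (2052 KB to 4104 KB) straddles the first buffer boundary, which is why a single bucket 'block' can end up in two adjacent BBs.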