Summary:
DataBlockEncodingTool currently does not report baseline compression
efficiency, i.e. a Hadoop compression codec applied to unencoded data. For
example, if we are using LZO to compress blocks, we would like the report to
contain the following columns (possibly as percentages of the raw data size):
Baseline K+V in block cache | Baseline K+V on disk (LZO-compressed) | K+V
data-block-encoded in block cache | K+V data-block-encoded + LZO-compressed on
disk
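Below is a minimal standalone sketch of how such a report row could be derived
from measured sizes, each column expressed as a percentage of the raw K+V size.
All names and numbers here are illustrative, not the tool's actual code.
  // Sketch: report each size variant as a percentage of the raw key+value size,
  // mirroring the four columns described above.
  public class EncodingReportSketch {
      static String pct(long size, long rawSize) {
          return String.format("%.2f%%", 100.0 * size / rawSize);
      }

      public static void main(String[] args) {
          long rawSize = 1_000_000L;          // baseline K+V bytes (block cache form)
          long rawCompressed = 420_000L;      // baseline K+V after LZO (on-disk form)
          long encoded = 610_000L;            // data-block-encoded K+V (block cache form)
          long encodedCompressed = 350_000L;  // encoded + LZO (on-disk form)

          System.out.println("Baseline in cache: " + pct(rawSize, rawSize));
          System.out.println("Baseline on disk:  " + pct(rawCompressed, rawSize));
          System.out.println("Encoded in cache:  " + pct(encoded, rawSize));
          System.out.println("Encoded on disk:   " + pct(encodedCompressed, rawSize));
      }
  }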
Background: we never store compressed blocks in cache, but we always store
encoded data blocks in cache if data block encoding is enabled for the column
family.
This patch also contains multiple bug fixes and improvements to
DataBlockEncodingTool, covering the presentation format, memory requirements
(reduced 3x), and the handling of compression.
Test Plan:
* Run unit tests.
* Run DataBlockEncodingTool on a variety of real-world HFiles.
Reviewers: JIRA, dhruba, tedyu, stack, heyongqiang
Reviewed By: tedyu
Differential Revision: https://reviews.facebook.net/D2409
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1304626 13f79535-47bb-0310-9956-ffa450edef68
Author: Yongqiang He
Summary:
https://issues.apache.org/jira/browse/HBASE-5521
As part of working on HBASE-5313, we want to add a new columnar encoder/decoder.
It makes sense to make compression part of the encoder/decoder:
1) a scanner over a columnar-encoded block can lazily decompress only the
specific part of a key-value object it needs;
2) it avoids an extra byte copy from the encoder to the HFileBlock writer.
If no encoder is specified for a writer, HFileBlock.Writer will use a default
compression context that does something very similar to today's code.
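The sketch below illustrates this shape of API. The interface and class names
are hypothetical, not the actual HBase types; it only shows an encoder that
owns its compression context, plus a compression-only default used when no
encoder is configured.
  import java.io.ByteArrayOutputStream;
  import java.io.IOException;
  import java.io.OutputStream;
  import java.util.zip.GZIPOutputStream;

  interface BlockEncoder {
      /** Encodes (and compresses) the block payload straight into the target stream. */
      void encodeAndCompress(byte[] keyValues, OutputStream out) throws IOException;
  }

  /** Fallback used when no encoder is configured: compression only, mirroring today's path. */
  class DefaultCompressionContext implements BlockEncoder {
      @Override
      public void encodeAndCompress(byte[] keyValues, OutputStream out) throws IOException {
          try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
              gz.write(keyValues);  // no re-encoding, just stream compression
          }
      }
  }

  class BlockWriterSketch {
      public static void main(String[] args) throws IOException {
          BlockEncoder encoder = new DefaultCompressionContext();
          ByteArrayOutputStream onDisk = new ByteArrayOutputStream();
          encoder.encodeAndCompress("row1/cf:q/1=a,row2/cf:q/1=b".getBytes(), onDisk);
          System.out.println("on-disk block bytes: " + onDisk.size());
      }
  }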
Test Plan: existing unit tests, verified by mbautin and tedyu. No new tests are
added here since this code is just preparation for the columnar encoder; test
cases will be added later in that diff.
Reviewers: dhruba, tedyu, sc, mbautin
Reviewed By: mbautin
Differential Revision: https://reviews.facebook.net/D2097
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1302602 13f79535-47bb-0310-9956-ffa450edef68
Author: Zhiqiu Kong
Summary:
Added two separate size metrics for get() and next(). This is done by
refactoring the internal next() API. To be more specific, only Get.get()
and ResultScanner.next() pass the metric name ("getsize" and
"nextsize", respectively) to
HRegion::RegionScanner::next(List<KeyValue>, String).
This eventually reaches StoreScanner::next(List<KeyValue>, int, String),
where the metrics are counted.
And their call paths are:
1) Get
HTable::get(final Get get)
=> HRegionServer::get(byte [] regionName, Get get)
=> HRegion::get(final Get get, final Integer lockid)
=> HRegion::get(final Get get) [pass METRIC_GETSIZE to the callee]
=> HRegion::RegionScanner::next(List<KeyValue> outResults, String metric)
=> HRegion::RegionScanner::next(List<KeyValue> outResults, int limit, String metric)
=> HRegion::RegionScanner::nextInternal(int limit, String metric)
=> KeyValueHeap::next(List<KeyValue> result, int limit, String metric)
=> StoreScanner::next(List<KeyValue> outResult, int limit, String metric)
2) Next
HTable::ClientScanner::next()
=> ScannerCallable::call()
=> HRegionServer::next(long scannerId)
=> HRegionServer::next(final long scannerId, int nbRows) [pass METRIC_NEXTSIZE to the callee]
=> HRegion::RegionScanner::next(List<KeyValue> outResults, String metric)
=> HRegion::RegionScanner::next(List<KeyValue> outResults, int limit, String metric)
=> HRegion::RegionScanner::nextInternal(int limit, String metric)
=> KeyValueHeap::next(List<KeyValue> result, int limit, String metric)
=> StoreScanner::next(List<KeyValue> outResult, int limit, String metric)
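As a rough illustration of this pattern (standalone toy classes, not the actual
HBase scanners), the metric name is threaded through next() and the size of the
returned KeyValues is accumulated under a per-column-family key:
  import java.util.List;
  import java.util.concurrent.ConcurrentHashMap;
  import java.util.concurrent.atomic.AtomicLong;

  class ScanMetricsSketch {
      static final ConcurrentHashMap<String, AtomicLong> METRICS = new ConcurrentHashMap<>();

      static void incrementBytes(String family, String metric, long bytes) {
          METRICS.computeIfAbsent(family + "." + metric, k -> new AtomicLong())
                 .addAndGet(bytes);
      }

      /** Stand-in for StoreScanner.next(List, int, String): counts returned bytes. */
      static void next(List<byte[]> outResult, int limit, String metric, String family) {
          byte[] kv = "rowkey/cf:qual/ts=value".getBytes();  // pretend this came off the heap
          outResult.add(kv);
          if (metric != null) {
              incrementBytes(family, metric, kv.length);
          }
      }

      public static void main(String[] args) {
          next(new java.util.ArrayList<>(), 1, "getsize", "cf");   // Get path
          next(new java.util.ArrayList<>(), 1, "nextsize", "cf");  // Scanner path
          METRICS.forEach((k, v) -> System.out.println(k + " = " + v.get()));
      }
  }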
Test Plan:
1. Passed unit tests.
2. Created a test case, TestRegionServerMetrics::testGetNextSize, to
guarantee that:
* Get/Next contribute to the getsize/nextsize metrics
* both getsize/nextsize are tracked per column family
* flush/compaction won't affect these two metrics
Reviewed By: mbautin
Reviewers: Kannan, mbautin, Liyin, JIRA
CC: Kannan, mbautin, Liyin, zhiqiu
Differential Revision: https://reviews.facebook.net/D1617
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1299147 13f79535-47bb-0310-9956-ffa450edef68
Author: Zhiqiu Kong
Summary:
The original 89-fb diff is: https://reviews.facebook.net/D1263
The slow operation log does not provide enough information when a filter is
present. The following was done to add the filter info:
1) Added a toString() method to the filters inheriting from FilterBase; this
affects 22 filters and their subclasses. The added info includes the
filter's name and its members. For example, for TimestampsFilter we
output its class name as well as the defined timestamps.
2) Added a 'filter' field to Get::toMap() and Scan::toMap() to enable
logging of the filter info.
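A hedged sketch of item 1 is below: a standalone class (not the committed
filter code) whose toString() reports the class name plus a bounded view of its
members, mirroring the TimestampsFilter output shown in the test plan. The cap
constant and field names are illustrative.
  import java.util.Arrays;
  import java.util.List;
  import java.util.TreeSet;
  import java.util.stream.Collectors;

  class TimestampsFilterSketch {
      private static final int MAX_LOG_TIMESTAMPS = 5;  // cap how many values we print
      private final TreeSet<Long> timestamps;

      TimestampsFilterSketch(List<Long> timestamps) {
          this.timestamps = new TreeSet<>(timestamps);
      }

      @Override
      public String toString() {
          int shown = Math.min(MAX_LOG_TIMESTAMPS, timestamps.size());
          String values = timestamps.stream().limit(shown)
                  .map(Object::toString).collect(Collectors.joining(", "));
          // e.g. "TimestampsFilterSketch (5/6): [2, 3, 5, 7, 11]"
          return String.format("%s (%d/%d): [%s]",
                  getClass().getSimpleName(), shown, timestamps.size(), values);
      }

      public static void main(String[] args) {
          System.out.println(new TimestampsFilterSketch(Arrays.asList(2L, 3L, 5L, 7L, 11L, 13L)));
      }
  }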
Task ID: #750975
Test Plan:
1. Ran and passed unit tests to make sure the change does not break anything.
2. Ran Kannan's script to trigger the slow operation logging and checked
each filter to make sure the filter info was logged. In more detail, the
output logs are as follows (only the 'filter' field is shown here for ease
of reading):
"filter":"TimestampsFilter (3/3): [2, 3, 5]"
"filter":"TimestampsFilter (5/6): [2, 3, 5, 7, 11]"
"filter":"ColumnPrefixFilter col2"
"filter":"ColumnRangeFilter [col2a, col2b]"
"filter":"ColumnCountGetFilter 8"
"filter":"ColumnPaginationFilter (4, 4)"
"filter":"InclusiveStopFilter row"
"filter":"PrefixFilter row"
"filter":"PageFilter 1"
"filter":"SkipFilter TimestampsFilter (1/1): [1000]"
"filter":"WhileMatchFilter TimestampsFilter (3/3): [2, 3, 5]"
"filter":"KeyOnlyFilter"
"filter":"FirstKeyOnlyFilter"
"filter":"MultipleColumnPrefixFilter (3/3): [a, b, c]"
"filter":"DependentColumnFilter (family, qualifier, true, LESS, value)"
"filter":"FamilyFilter (LESS, value)"
"filter":"QualifierFilter (LESS, value)"
"filter":"RowFilter (LESS, value)"
"filter":"ValueFilter (LESS, value)"
"filter":"KeyOnlyFilter"
"filter":"FirstKeyOnlyFilter"
"filter":"SingleColumnValueFilter (family, qualifier, EQUAL, value)"
"filter":"SingleColumnValueExcludeFilter (family, qualifier, EQUAL,
value)"
"filter":"FilterList AND (2/2): [KeyOnlyFilter, FirstKeyOnlyFilter]"
Please check ~zhiqiu/Codes/scripts/testFilter.rb for the testing script.
3. Added unit test cases to TestOperation to verify that the filters'
toString() methods work correctly.
Reviewed By: mbautin
Reviewers: Kannan, madhuvaidya, mbautin, JIRA
CC: Kannan, madhuvaidya, mbautin, zhiqiu, stack
Differential Revision: https://reviews.facebook.net/D1539
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1299019 13f79535-47bb-0310-9956-ffa450edef68
Author: Dhruba
Summary:
HFile is enhanced to store a checksum for each block. HDFS checksum verification
is avoided while reading data into the block cache. On a checksum verification
failure, we retry the file system read request with HDFS checksums switched on
(thanks Todd).
I have a benchmark showing that this reduces disk iops by about 40%. In this
experiment, all of the memory on the regionserver is allocated to the
regionserver's JVM and the OS buffer cache is negligible. I also measured
negligible (<5%) additional CPU usage when using HBase-level checksums.
The salient points of this patch:
1. Each HFile's trailer used to have a 4-byte version number. I enhanced this so
that these 4 bytes can be interpreted as a (major version, minor version) pair.
Pre-existing HFiles have a minor version of 0. The new HFile format has a minor
version of 1 (thanks Mikhail). The HFile major version remains unchanged at 2.
The reason I did not introduce a new major version is that the code changes
needed to store/read checksums do not differ much from the existing v2
writers/readers.
2. Introduced an HFileSystem object, which encapsulates the FileSystem objects
needed to access data from hfiles and hlogs. HDFS FileSystem objects already had
the ability to switch off checksum verification for reads.
3. The majority of the code changes are located in the hbase.io.hfile package.
The retry of a read on an initial checksum failure occurs inside the
hbase.io.hfile package itself. The code changes to the hbase.regionserver
package are minor.
4. The format of an HFile block is the header, followed by the data, followed by
the checksum(s). Each 16 KB (configurable) chunk of data has a 4-byte checksum.
The HFile block header has two additional fields: a 4-byte value storing
bytesPerChecksum and a 4-byte value storing the size of the user data (excluding
the checksum data). This is explained in detail in the associated javadocs; see
also the arithmetic sketch after this list.
5. I added a test for backward compatibility. I will be writing more unit tests
that aggressively trigger checksum verification failures. I have left a few
redundant log messages in the code (just for easier debugging) and will remove
them in a later stage of this patch. I will also be adding metrics on the number
of checksum verification failures/successes in a later version of this diff.
6. By default, HBase-level checksums are switched on and HDFS-level checksums
are switched off for HFile reads. There are no changes to the HLog code path here.
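The arithmetic sketch referenced in item 4 is below. It only illustrates the
layout math (one 4-byte checksum per bytesPerChecksum bytes of data, with a
partial trailing chunk still getting its own checksum); the sizes used are
illustrative.
  // Sketch: on-disk block = header + data + (numChunks * 4) checksum bytes.
  public class HFileChecksumLayoutSketch {
      static final int CHECKSUM_SIZE = 4;   // bytes per checksum value

      static int numChecksumChunks(int dataBytes, int bytesPerChecksum) {
          // ceiling division: a partial trailing chunk still gets its own checksum
          return (dataBytes + bytesPerChecksum - 1) / bytesPerChecksum;
      }

      public static void main(String[] args) {
          int bytesPerChecksum = 16 * 1024;  // the configurable 16 KB default
          int dataBytes = 65_000;            // e.g. a ~64 KB data block
          int chunks = numChecksumChunks(dataBytes, bytesPerChecksum);
          System.out.println("checksum chunks: " + chunks);                  // 4
          System.out.println("checksum bytes:  " + chunks * CHECKSUM_SIZE);  // 16
      }
  }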
Test Plan: The default setting is to switch on HBase checksums for HFile reads,
so all existing tests already exercise the new code. I will be writing more unit
tests that trigger checksum verification failures.
Reviewers: mbautin
Reviewed By: mbautin
CC: JIRA, tedyu, mbautin, dhruba, todd, stack
Differential Revision: https://reviews.facebook.net/D1521
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1298641 13f79535-47bb-0310-9956-ffa450edef68
Summary: Cleaning up the factory method explosion in HFile writer and StoreFile.
Now, adding a new parameter to HFile/StoreFile writer initialization will not
require modifying factory method invocations all over the codebase.
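As a hedged illustration of the builder style (class and method names here are
hypothetical, not the actual HFile/StoreFile API), adding a new option later
only touches the builder rather than every factory-method call site:
  import java.io.File;

  class StoreFileWriterBuilderSketch {
      private File path;
      private int blockSize = 64 * 1024;
      private String compression = "NONE";

      StoreFileWriterBuilderSketch withPath(File path) { this.path = path; return this; }
      StoreFileWriterBuilderSketch withBlockSize(int blockSize) { this.blockSize = blockSize; return this; }
      StoreFileWriterBuilderSketch withCompression(String compression) { this.compression = compression; return this; }

      String build() {
          // A real builder would construct the writer; here we just describe it.
          return "writer(path=" + path + ", blockSize=" + blockSize + ", compression=" + compression + ")";
      }

      public static void main(String[] args) {
          System.out.println(new StoreFileWriterBuilderSketch()
                  .withPath(new File("/tmp/storefile"))
                  .withBlockSize(8 * 1024)
                  .withCompression("GZ")
                  .build());
      }
  }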
Test Plan:
Run unit tests
Deploy to dev cluster and run a load test
Reviewers: JIRA, stack, tedyu, Kannan, Karthik, Liyin
Reviewed By: stack
Differential Revision: https://reviews.facebook.net/D1893
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1293095 13f79535-47bb-0310-9956-ffa450edef68
Summary:
DataBlockEncodingTool was fixed as part of porting data block encoding
(HBASE-4218) to 89-fb
(https://reviews.facebook.net/rHBASEEIGHTNINEFBBRANCH1245291,
https://reviews.facebook.net/D1659). The bug being fixed here appeared when
using GZ as the baseline compression codec without native Hadoop libraries
loaded, in which case the compressor instance would be null.
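The defensive pattern is sketched below, assuming Hadoop's compression classes
are on the classpath; it illustrates the null-compressor fallback only and is
not the tool's actual fix.
  import java.io.IOException;
  import java.io.OutputStream;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.io.compress.CompressionCodec;
  import org.apache.hadoop.io.compress.CompressionOutputStream;
  import org.apache.hadoop.io.compress.Compressor;
  import org.apache.hadoop.io.compress.GzipCodec;
  import org.apache.hadoop.util.ReflectionUtils;

  public class BaselineCompressionSketch {
      static CompressionOutputStream openCompressedStream(OutputStream raw) throws IOException {
          CompressionCodec codec =
                  ReflectionUtils.newInstance(GzipCodec.class, new Configuration());
          Compressor compressor = codec.createCompressor();
          if (compressor == null) {
              // No native codec loaded: let the codec manage compression internally.
              return codec.createOutputStream(raw);
          }
          return codec.createOutputStream(raw, compressor);
      }
  }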
Test Plan:
Run DataBlockEncodingTool with GZ (no native codecs) and LZO (with native
codecs) as the baseline (Hadoop-level) compression codecs
Reviewers: JIRA, Kannan, mcorgan, lhofhansl, todd, stack, tedyu
Reviewed By: tedyu
Differential Revision: https://reviews.facebook.net/D1917
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1293057 13f79535-47bb-0310-9956-ffa450edef68
Summary: Deprecate three ugly HColumnDescriptor constructors and use a "builder"
pattern instead, e.g. new
HColumnDescriptor(cfName).setMaxVersions(5).setBlockSize(8192), etc. Setters
have now become "builder" methods.
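A minimal usage sketch, assuming the HBase client classes are on the classpath;
the chained setters follow the builder style described above (the particular
setters shown are just examples):
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;

  public class BuilderStyleDescriptorSketch {
      public static void main(String[] args) {
          HTableDescriptor table = new HTableDescriptor("t1");
          // Each setter returns the descriptor, so the family is configured inline.
          table.addFamily(new HColumnDescriptor("cf")
                  .setMaxVersions(5)
                  .setInMemory(true));
          System.out.println(table);
      }
  }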
Test Plan: Run unit tests
Reviewers: JIRA, todd, stack, tedyu, Kannan, Karthik, Liyin
Reviewed By: tedyu
Differential Revision: https://reviews.facebook.net/D1851
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1293026 13f79535-47bb-0310-9956-ffa450edef68
Summary:
Change the jitter in major compactions to be deterministic, so that reboots
don't cause a time-based compaction storm. This implementation seeds a random
number generator with HDFS data so the jitter persists across restarts.
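A hedged sketch of the idea (not the committed implementation): derive the
jitter from a seed that is stable across restarts, for example a hash of the
region's store file paths, so a reboot recomputes the same offset instead of
re-randomizing all compaction times at once.
  import java.util.Arrays;
  import java.util.List;
  import java.util.Random;

  public class DeterministicJitterSketch {
      static long jitteredPeriodMillis(long basePeriodMillis, double jitterFraction,
                                       List<String> storeFilePaths) {
          long seed = Arrays.hashCode(storeFilePaths.toArray());  // stable for the same files
          double offset = (new Random(seed).nextDouble() - 0.5) * 2.0;  // in [-1, 1)
          return basePeriodMillis + (long) (offset * jitterFraction * basePeriodMillis);
      }

      public static void main(String[] args) {
          List<String> files = Arrays.asList("/hbase/t1/r1/cf/f1", "/hbase/t1/r1/cf/f2");
          // Same inputs before and after a reboot => same jittered period.
          System.out.println(jitteredPeriodMillis(7L * 24 * 3600 * 1000, 0.2, files));
      }
  }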
Test Plan: - mvn test -Dtest=TestCompaction,TestCompactSelection,TestHeapSize
Reviewers: JIRA, Kannan, aaiyer, stack
Reviewed By: Kannan
Differential Revision: https://reviews.facebook.net/D1785
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1292495 13f79535-47bb-0310-9956-ffa450edef68
Summary: We need to reuse compression streams in HFileBlock.Writer instead of
allocating them every time. The motivation is that when using Java's built-in
GZIP implementation, we allocate a new GZIPOutputStream object and an
associated native data structure each time. This is one suspected cause of recent
TestHFileBlock failures on Hadoop QA:
https://builds.apache.org/job/HBase-TRUNK/2658/testReport/org.apache.hadoop.hbase.io.hfile/TestHFileBlock/testPreviousOffset_1_/.
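The reuse idea is sketched below with plain java.util.zip (not the HFileBlock
code): allocate one Deflater and reset() it per block, instead of constructing a
new compressed stream, and its native state, for every block written.
  import java.io.ByteArrayOutputStream;
  import java.io.IOException;
  import java.util.zip.Deflater;
  import java.util.zip.DeflaterOutputStream;

  public class ReusableCompressorSketch {
      private final Deflater deflater = new Deflater();  // one native context, reused

      byte[] compressBlock(byte[] block) throws IOException {
          deflater.reset();  // reuse instead of reallocating per block
          ByteArrayOutputStream buf = new ByteArrayOutputStream();
          try (DeflaterOutputStream out = new DeflaterOutputStream(buf, deflater)) {
              out.write(block);
          }
          return buf.toByteArray();
      }

      public static void main(String[] args) throws IOException {
          ReusableCompressorSketch writer = new ReusableCompressorSketch();
          for (int i = 0; i < 3; i++) {
              System.out.println(writer.compressBlock(new byte[8192]).length);
          }
      }
  }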
Test Plan:
* Run unit tests
* Create a GZIP-compressed CF with new code, load some data, shut down HBase,
deploy old code, restart HBase, and scan the table
Reviewers: tedyu, Liyin, dhruba, JIRA, lhofhansl
Reviewed By: lhofhansl
CC: tedyu, lhofhansl, mbautin
Differential Revision: https://reviews.facebook.net/D1719
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1243667 13f79535-47bb-0310-9956-ffa450edef68
Summary: This is a unit test that should have been part of HBASE-4683 but was
not committed. The original test was reviewed as part of
https://reviews.facebook.net/D807. Submitting the unit test as a separate JIRA
and patch, and extending the scope of the test to also cover the case where the
block cache is enabled for the column family.
Test Plan: Run unit tests
Reviewers: JIRA, jdcryans, lhofhansl, Liyin
Reviewed By: jdcryans
CC: jdcryans
Differential Revision: https://reviews.facebook.net/D1695
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1242894 13f79535-47bb-0310-9956-ffa450edef68
Summary: This is a followup for D1017 to make it similar to D909 (89-fb). The
fix for 89-fb used the TTL-based scanner filtering logic on both normal scanners
and compactions, while the trunk fix D1017 did not. This is just the delta
between the two diffs, bringing the filtering of expired store files on
compaction to trunk.
Test Plan: Unit tests
Reviewers: Liyin, JIRA, lhofhansl, Kannan
Reviewed By: Liyin
CC: Liyin, tedyu, Kannan, mbautin, lhofhansl
Differential Revision: https://reviews.facebook.net/D1473
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1236483 13f79535-47bb-0310-9956-ffa450edef68
Summary:
Adding a framework that allows keys in an HFile data block to be "encoded". We
support two modes of encoding: (1) both on disk and in cache, and (2) in cache
only. This is distinct from the compression already done in HBase, e.g. GZ or
LZO. When data block encoding is enabled, we store blocks in the cache in an
uncompressed but encoded form. This allows more blocks to fit in the cache and
reduces the number of disk reads.
The most common example of data block encoding is delta encoding, where we take
advantage of the fact that HFile keys are sorted and share a lot of common
prefixes, and only store the delta between each pair of consecutive keys.
Initial encoding algorithms implemented are DIFF, FAST_DIFF, and PREFIX.
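A toy illustration of the prefix idea (not the actual DIFF/FAST_DIFF/PREFIX
encoders): for sorted keys, store only the length of the prefix shared with the
previous key plus the differing suffix.
  import java.util.Arrays;
  import java.util.List;

  public class PrefixEncodingSketch {
      static int commonPrefix(String a, String b) {
          int i = 0;
          while (i < a.length() && i < b.length() && a.charAt(i) == b.charAt(i)) i++;
          return i;
      }

      public static void main(String[] args) {
          List<String> keys = Arrays.asList(
                  "row0001/cf:qual/ts1", "row0001/cf:qual/ts2", "row0002/cf:qual/ts1");
          String prev = "";
          for (String key : keys) {
              int shared = commonPrefix(prev, key);
              // Encoded form: (shared-prefix length, suffix); the full key is implicit.
              System.out.println("(" + shared + ", \"" + key.substring(shared) + "\")");
              prev = key;
          }
      }
  }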
This is based on the delta encoding patch developed by Jacek Midgal during his
2011 summer internship at Facebook. The original patch is available here:
https://reviews.apache.org/r/2308/diff/.
Test Plan: Unit tests. Distributed load test on a five-node cluster.
Reviewers: JIRA, tedyu, stack, nspiegelberg, Kannan
Reviewed By: Kannan
CC: tedyu, todd, mbautin, stack, Kannan, mcorgan, gqchen
Differential Revision: https://reviews.facebook.net/D447
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1236031 13f79535-47bb-0310-9956-ffa450edef68