Summary:
DataBlockEncodingTool currently does not report baseline compression
efficiency, i.e. a Hadoop compression codec applied to unencoded data. For
example, if we are using LZO to compress blocks, we would like the report to
include the following columns (possibly as percentages of raw data size):
* Baseline K+V in block cache
* Baseline K+V on disk (LZO compressed)
* K+V DataBlockEncoded in block cache
* K+V DataBlockEncoded + LZO compressed (on disk)
Background: we never store compressed blocks in cache, but we always store
encoded data blocks in cache if data block encoding is enabled for the column
family.
This patch also includes several bug fixes and improvements to
DataBlockEncodingTool: a better presentation format, lower memory requirements
(reduced 3x), and corrected handling of compression.
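For illustration only, the four report columns can be sketched as below. This
is not the DataBlockEncodingTool code: the class and method names are
hypothetical, java.util.zip.Deflater stands in for LZO (which is not in the
JDK), and a toy key-prefix encoding stands in for a real data block encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Hypothetical sketch; none of these names come from HBase.
class BaselineReportSketch {

    /** Returns the DEFLATE-compressed size of the input (stand-in for LZO). */
    static int compressedSize(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        byte[] buf = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf);
        }
        deflater.end();
        return total;
    }

    /** Expresses a size as a percentage of the raw (unencoded, uncompressed) size. */
    static double percentOfRaw(long size, long rawSize) {
        return 100.0 * size / rawSize;
    }

    /** Unencoded layout: one length byte followed by the key bytes (keys < 256 bytes). */
    static byte[] rawBytes(List<String> keys) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String key : keys) {
            byte[] cur = key.getBytes(StandardCharsets.UTF_8);
            out.write(cur.length);
            out.write(cur, 0, cur.length);
        }
        return out.toByteArray();
    }

    /** Toy encoding: shared-prefix length, suffix length, then the suffix bytes. */
    static byte[] toyPrefixEncode(List<String> sortedKeys) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] prev = new byte[0];
        for (String key : sortedKeys) {
            byte[] cur = key.getBytes(StandardCharsets.UTF_8);
            int max = Math.min(Math.min(prev.length, cur.length), 255);
            int common = 0;
            while (common < max && prev[common] == cur[common]) {
                common++;
            }
            out.write(common);
            out.write(cur.length - common);
            out.write(cur, common, cur.length - common);
            prev = cur;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add(String.format("row-%06d/cf:qualifier", i));
        }
        byte[] raw = rawBytes(keys);
        byte[] encoded = toyPrefixEncode(keys);
        long rawSize = raw.length;
        System.out.printf("Baseline in cache:            %6.1f%%%n",
            percentOfRaw(rawSize, rawSize));
        System.out.printf("Baseline on disk (compressed): %6.1f%%%n",
            percentOfRaw(compressedSize(raw), rawSize));
        System.out.printf("Encoded in cache:             %6.1f%%%n",
            percentOfRaw(encoded.length, rawSize));
        System.out.printf("Encoded + compressed on disk: %6.1f%%%n",
            percentOfRaw(compressedSize(encoded), rawSize));
    }
}
```

The two "in cache" columns use uncompressed sizes, since compressed blocks are
never stored in the cache.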
Test Plan:
* Run unit tests.
* Run DataBlockEncodingTool on a variety of real-world HFiles.
Reviewers: JIRA, dhruba, tedyu, stack, heyongqiang
Reviewed By: tedyu
Differential Revision: https://reviews.facebook.net/D2409
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1304626 13f79535-47bb-0310-9956-ffa450edef68
Author: Yongqiang He
Summary:
https://issues.apache.org/jira/browse/HBASE-5521
As part of working on HBASE-5313, we want to add a new columnar encoder/decoder.
It makes sense to move compression to be part of encoder/decoder:
1) a scanner for a columnar encoded block can lazily decompress only a
specific part of a key value object
2) it avoids an extra byte copy from the encoder to the hblock-writer.
If there is no encoder specified for a writer, the HBlock.Writer will use a
default compression-context to do something very similar to today's code.
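A minimal sketch of the fallback described above, under stated assumptions:
the names EncodingContext, DefaultCompressionContext and BlockWriter are
hypothetical and are not the classes in this diff, and DEFLATE is used only
because it ships with the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Hypothetical sketch; the real HBase interfaces may differ.
class CompressionContextSketch {

    /** An encoder that owns both the encoding and the compression step. */
    interface EncodingContext {
        byte[] encodeAndCompress(byte[] rawBlock);
    }

    /** Compresses the input with DEFLATE and returns the compressed bytes. */
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Default context: no encoding, compression only. */
    static final class DefaultCompressionContext implements EncodingContext {
        @Override
        public byte[] encodeAndCompress(byte[] rawBlock) {
            return deflate(rawBlock);
        }
    }

    /** Writer that falls back to the default context when no encoder is given. */
    static final class BlockWriter {
        private final EncodingContext context;

        BlockWriter(EncodingContext encoderOrNull) {
            this.context = encoderOrNull != null
                ? encoderOrNull
                : new DefaultCompressionContext();
        }

        byte[] finishBlock(byte[] rawBlock) {
            // The encoder hands back on-disk bytes directly, so the writer
            // does not copy encoded bytes into a separate compression step.
            return context.encodeAndCompress(rawBlock);
        }
    }

    public static void main(String[] args) {
        byte[] block = new byte[8192];
        BlockWriter writer = new BlockWriter(null);
        System.out.println("on-disk bytes: " + writer.finishBlock(block).length);
    }
}
```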
Test Plan: Existing unit tests, verified by mbautin and tedyu. No new tests
are added here, since this code only prepares for the columnar encoder; test
cases will be added later in that diff.
Reviewers: dhruba, tedyu, sc, mbautin
Reviewed By: mbautin
Differential Revision: https://reviews.facebook.net/D2097
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1302602 13f79535-47bb-0310-9956-ffa450edef68
Summary: Adding Java lint engine to HBase's .arcconfig. The lint engine itself
is part of the arc-jira repository that lives at
https://github.com/facebook/arc-jira/. There are some changes to be made there
to prevent lint from reporting errors for unmodified lines.
Test Plan: arc lint
Reviewers: nspiegelberg, Kannan, Karthik, JIRA, Liyin
Reviewed By: Liyin
Differential Revision: https://reviews.facebook.net/D2289
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1301751 13f79535-47bb-0310-9956-ffa450edef68
Author: Zhiqiu Kong
Summary:
Added two separate metrics, one for get() and one for next(). This is done by
refactoring the internal next() API. More specifically, only Get.get()
and ResultScanner.next() pass the metric name ("getsize" and
"nextsize" respectively) to
HRegion::RegionScanner::next(List<KeyValue>, String)
The call eventually reaches StoreScanner::next(List<KeyValue>,
int, String), where the metrics are counted.
The call paths are:
1) Get
HTable::get(final Get get)
  => HRegionServer::get(byte [] regionName, Get get)
  => HRegion::get(final Get get, final Integer lockid)
  => HRegion::get(final Get get) [pass METRIC_GETSIZE to the callee]
  => HRegion::RegionScanner::next(List<KeyValue> outResults, String metric)
  => HRegion::RegionScanner::next(List<KeyValue> outResults, int limit, String metric)
  => HRegion::RegionScanner::nextInternal(int limit, String metric)
  => KeyValueHeap::next(List<KeyValue> result, int limit, String metric)
  => StoreScanner::next(List<KeyValue> outResult, int limit, String metric)
2) Next
HTable::ClientScanner::next()
  => ScannerCallable::call()
  => HRegionServer::next(long scannerId)
  => HRegionServer::next(final long scannerId, int nbRows) [pass METRIC_NEXTSIZE to the callee]
  => HRegion::RegionScanner::next(List<KeyValue> outResults, String metric)
  => HRegion::RegionScanner::next(List<KeyValue> outResults, int limit, String metric)
  => HRegion::RegionScanner::nextInternal(int limit, String metric)
  => KeyValueHeap::next(List<KeyValue> result, int limit, String metric)
  => StoreScanner::next(List<KeyValue> outResult, int limit, String metric)
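The idea of threading a metric name down to the innermost scanner can be
sketched as follows. This is a simplified illustration, not the code in this
diff: the class, the "cf.<family>.<metric>" key format, and the convention that
internal callers such as flush/compaction pass a null metric are all
assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; byte[] stands in for KeyValue.
class SizeMetricsSketch {
    static final String METRIC_GETSIZE = "getsize";
    static final String METRIC_NEXTSIZE = "nextsize";

    private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

    /** Assumed key format: one counter per column family and metric name. */
    private static String key(String family, String metric) {
        return "cf." + family + "." + metric;
    }

    /** Current value of a per-column-family metric, or 0 if never updated. */
    long value(String family, String metric) {
        AtomicLong counter = counters.get(key(family, metric));
        return counter == null ? 0 : counter.get();
    }

    /**
     * Stand-in for the innermost next(): copies up to 'limit' cells
     * (no limit if limit <= 0) and counts their bytes under 'metric'.
     * A null metric means the caller does not want the read counted.
     */
    boolean storeScannerNext(String family, Iterator<byte[]> cells,
                             List<byte[]> outResult, int limit, String metric) {
        long bytes = 0;
        int count = 0;
        while (cells.hasNext() && (limit <= 0 || count < limit)) {
            byte[] cell = cells.next();
            outResult.add(cell);
            bytes += cell.length;
            count++;
        }
        if (metric != null) {
            counters.computeIfAbsent(key(family, metric), k -> new AtomicLong())
                    .addAndGet(bytes);
        }
        return cells.hasNext();
    }

    public static void main(String[] args) {
        SizeMetricsSketch metrics = new SizeMetricsSketch();
        List<byte[]> cells = Arrays.asList(new byte[10], new byte[20], new byte[30]);
        // A client get reads everything and is counted under "getsize".
        metrics.storeScannerNext("cf1", cells.iterator(), new ArrayList<>(), -1, METRIC_GETSIZE);
        // An internal read (e.g. compaction) passes null and is not counted.
        metrics.storeScannerNext("cf1", cells.iterator(), new ArrayList<>(), -1, null);
        System.out.println("cf1 getsize = " + metrics.value("cf1", METRIC_GETSIZE));
    }
}
```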
Test Plan:
1. Passed unit tests.
2. Created a test case TestRegionServerMetrics::testGetNextSize to
verify that:
* Get/Next contribute to the getsize/nextsize metrics
* Both getsize and nextsize are tracked per column family
* Flush/compaction does not affect these two metrics
Reviewed By: mbautin
Reviewers: Kannan, mbautin, Liyin, JIRA
CC: Kannan, mbautin, Liyin, zhiqiu
Differential Revision: https://reviews.facebook.net/D1617
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1299147 13f79535-47bb-0310-9956-ffa450edef68