implementation (Jacek Midgal, Mikhail Bautin)

Summary:
Adding a framework that allows us to "encode" keys in an HFile data block. We support two modes of encoding: (1) both on disk and in cache, and (2) in cache only. This is distinct from the compression that is already being done in HBase, e.g. GZ or LZO. When data block encoding is enabled, we store blocks in cache in an uncompressed but encoded form. This allows us to fit more blocks in cache and reduce the number of disk reads.

The most common example of data block encoding is delta encoding: because HFile keys are sorted and share many common prefixes, we store only the delta between each pair of consecutive keys. The initial encoding algorithms implemented are DIFF, FAST_DIFF, and PREFIX.

This is based on the delta encoding patch developed by Jacek Midgal during his 2011 summer internship at Facebook. The original patch is available here: https://reviews.apache.org/r/2308/diff/.

Test Plan: Unit tests. Distributed load test on a five-node cluster.

Reviewers: JIRA, tedyu, stack, nspiegelberg, Kannan
Reviewed By: Kannan
CC: tedyu, todd, mbautin, stack, Kannan, mcorgan, gqchen
Differential Revision: https://reviews.facebook.net/D447

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1236031 13f79535-47bb-0310-9956-ffa450edef68
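The delta encoding idea described in the summary can be illustrated with a minimal sketch. This is not the HBase encoder API or the on-disk format of DIFF, FAST_DIFF, or PREFIX; the class PrefixDeltaSketch, its Entry type, and the sample keys are hypothetical names chosen for illustration. It assumes the keys arrive sorted, and stores for each key only the length of the prefix shared with the previous key plus the remaining suffix bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of prefix (delta) encoding; not the HBase API. */
final class PrefixDeltaSketch {

    /** One encoded key: shared-prefix length with the previous key, plus the differing suffix. */
    static final class Entry {
        final int commonPrefixLength;
        final byte[] suffix;

        Entry(int commonPrefixLength, byte[] suffix) {
            this.commonPrefixLength = commonPrefixLength;
            this.suffix = suffix;
        }
    }

    /** Encodes keys (assumed sorted) as deltas against the preceding key. */
    static List<Entry> encode(List<byte[]> sortedKeys) {
        List<Entry> encoded = new ArrayList<>();
        byte[] previous = new byte[0];
        for (byte[] key : sortedKeys) {
            int common = 0;
            int limit = Math.min(previous.length, key.length);
            while (common < limit && previous[common] == key[common]) {
                common++;
            }
            encoded.add(new Entry(common, Arrays.copyOfRange(key, common, key.length)));
            previous = key;
        }
        return encoded;
    }

    /** Rebuilds the full keys by replaying each delta on top of the previous key. */
    static List<byte[]> decode(List<Entry> entries) {
        List<byte[]> keys = new ArrayList<>();
        byte[] previous = new byte[0];
        for (Entry entry : entries) {
            byte[] key = new byte[entry.commonPrefixLength + entry.suffix.length];
            System.arraycopy(previous, 0, key, 0, entry.commonPrefixLength);
            System.arraycopy(entry.suffix, 0, key, entry.commonPrefixLength, entry.suffix.length);
            keys.add(key);
            previous = key;
        }
        return keys;
    }

    public static void main(String[] args) {
        // Made-up keys that share long prefixes, as sorted HFile keys typically do.
        List<byte[]> keys = new ArrayList<>();
        for (String s : new String[] {"row0001/cf:a", "row0001/cf:b", "row0002/cf:a"}) {
            keys.add(s.getBytes(StandardCharsets.UTF_8));
        }
        for (Entry entry : encode(keys)) {
            System.out.println(entry.commonPrefixLength + " shared bytes, suffix \""
                + new String(entry.suffix, StandardCharsets.UTF_8) + "\"");
        }
    }
}
```

Because each key depends on the one before it, decoding has to walk the block sequentially from its first key. That is the trade-off this kind of encoding makes in exchange for fitting more blocks in cache.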
Apache HBase [1] is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. [2] Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop [3].

To get started using HBase, the full documentation for this release can be found under the docs/ directory that accompanies this README. Using a browser, open docs/index.html to view the project home page (or browse to [1]). The hbase 'book' at docs/book.html has a 'quick start' section and is where you should begin your exploration of the hbase project.

The latest HBase can be downloaded from an Apache Mirror [4].

The source code can be found at [5].

The HBase issue tracker is at [6].

Apache HBase is made available under the Apache License, version 2.0 [7].

The HBase mailing lists and archives are listed here [8].

1. http://hbase.apache.org
2. http://labs.google.com/papers/bigtable.html
3. http://hadoop.apache.org
4. http://www.apache.org/dyn/closer.cgi/hbase/
5. http://hbase.apache.org/docs/current/source-repository.html
6. http://hbase.apache.org/docs/current/issue-tracking.html
7. http://hbase.apache.org/docs/current/license.html
8. http://hbase.apache.org/docs/current/mail-lists.html
Languages: Java 96.1%, Ruby 1.7%, Perl 0.8%, Shell 0.7%, Python 0.3%, Other 0.1%