mbautin ec1c804fc1 [jira] [HBASE-4218] HFile data block encoding framework and delta encoding
implementation (Jacek Midgal, Mikhail Bautin)

Summary:

Adding a framework that allows to "encode" keys in an HFile data block. We
support two modes of encoding: (1) both on disk and in cache, and (2) in cache
only. This is distinct from compression that is already being done in HBase,
e.g. GZ or LZO. When data block encoding is enabled, we store blocks in cache
in an uncompressed but encoded form. This allows to fit more blocks in cache
and reduce the number of disk reads.

The most common example of data block encoding is delta encoding, where we take
advantage of the fact that HFile keys are sorted and share a lot of common
prefixes, and only store the delta between each pair of consecutive keys.
Initial encoding algorithms implemented are DIFF, FAST_DIFF, and PREFIX.

This is based on the delta encoding patch developed by Jacek Midgal during his
2011 summer internship at Facebook. The original patch is available here:
https://reviews.apache.org/r/2308/diff/.

Test Plan: Unit tests. Distributed load test on a five-node cluster.

Reviewers: JIRA, tedyu, stack, nspiegelberg, Kannan

Reviewed By: Kannan

CC: tedyu, todd, mbautin, stack, Kannan, mcorgan, gqchen

Differential Revision: https://reviews.facebook.net/D447



git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1236031 13f79535-47bb-0310-9956-ffa450edef68
2012-01-26 02:58:57 +00:00

Apache HBase [1] is an open-source, distributed, versioned, column-oriented
store modeled after Google' Bigtable: A Distributed Storage System for
Structured Data by Chang et al.[2]  Just as Bigtable leverages the distributed
data storage provided by the Google File System, HBase provides Bigtable-like
capabilities on top of Apache Hadoop [3].

To get started using HBase, the full documentation for this release can be
found under the doc/ directory that accompanies this README.  Using a browser,
open the docs/index.html to view the project home page (or browse to [1]).
The hbase 'book' at docs/book.html has a 'quick start' section and is where you
should being your exploration of the hbase project.

The latest HBase can be downloaded from an Apache Mirror [4].

The source code can be found at [5]

The HBase issue tracker is at [6]

Apache HBase is made available under the Apache License, version 2.0 [7]

The HBase mailing lists and archives are listed here [8].

1. http://hbase.apache.org
2. http://labs.google.com/papers/bigtable.html
3. http://hadoop.apache.org
4. http://www.apache.org/dyn/closer.cgi/hbase/
5. http://hbase.apache.org/docs/current/source-repository.html
6. http://hbase.apache.org/docs/current/issue-tracking.html
7. http://hbase.apache.org/docs/current/license.html
8. http://hbase.apache.org/docs/current/mail-lists.html
Description
No description provided
Readme 550 MiB
Languages
Java 96.1%
Ruby 1.7%
Perl 0.8%
Shell 0.7%
Python 0.3%
Other 0.1%