961455cd9d
Author: Dhruba

Summary: HFile is enhanced to store a checksum for each block. HDFS checksum verification is avoided while reading data into the block cache. On a checksum verification failure, we retry the file system read request with hdfs checksums switched on (thanks Todd).

I have a benchmark that shows that it reduces iops on the disk by about 40%. In this experiment, the entire memory on the regionserver is allocated to the regionserver's jvm and the OS buffer cache size is negligible. I also measured negligible (<5%) additional cpu usage while using hbase-level checksums.

The salient points of this patch:

1. Each hfile's trailer used to have a 4 byte version number. I enhanced this so that these 4 bytes can be interpreted as a (major version number, minor version number) pair. Pre-existing hfiles have a minor version of 0. The new hfile format has a minor version of 1 (thanks Mikhail). The hfile major version remains unchanged at 2. The reason I did not introduce a new major version number is that the code changes needed to store/read checksums do not differ much from existing V2 writers/readers.
2. Introduced a HFileSystem object which encapsulates the FileSystem objects needed to access data from hfiles and hlogs. HDFS FileSystem objects already had the ability to switch off checksum verification for reads.
3. The majority of the code changes are located in the hbase.io.hfile package. The retry of a read on an initial checksum failure occurs inside the hbase.io.hfile package itself. The code changes to the hbase.regionserver package are minor.
4. The format of a hfileblock is the header followed by the data followed by the checksum(s). Each 16 K (configurable) chunk of data has a 4 byte checksum. The hfileblock header has two additional fields: a 4 byte value to store the bytesPerChecksum and a 4 byte value to store the size of the user data (excluding the checksum data). This is well explained in the associated javadocs.
5. I added a test to test backward compatibility. I will be writing more unit tests that trigger checksum verification failures aggressively. I have left a few redundant log messages in the code (just for easier debugging) and will remove them in a later stage of this patch. I will also be adding metrics on the number of checksum verification failures/successes in a later version of this diff.
6. By default, hbase-level checksums are switched on and hdfs-level checksums are switched off for hfile-reads. No changes to the Hlog code path here.

Test Plan: The default setting is to switch on hbase checksums for hfile-reads, thus all existing tests actually validate the new code pieces. I will be writing more unit tests for triggering checksum verification failures.

Reviewers: mbautin

Reviewed By: mbautin

CC: JIRA, tedyu, mbautin, dhruba, todd, stack

Differential Revision: https://reviews.facebook.net/D1521

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1298641 13f79535-47bb-0310-9956-ffa450edef68
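The block layout from point 4 and the retry behaviour from the summary can be illustrated with a small, self-contained sketch. This is not the code from the patch: the class and method names are hypothetical, the block header is omitted, CRC32 is assumed as the checksum algorithm (the commit message does not name one), and the two `Supplier` arguments merely stand in for "read with hdfs checksums off" and "read with hdfs checksums on".

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.function.Supplier;
import java.util.zip.CRC32;

// Hypothetical illustration only; names do not come from the HBase source.
public class BlockChecksumSketch {
    static final int CHECKSUM_SIZE = 4; // one 4 byte checksum per chunk

    /** Bytes of checksum data needed for dataSize bytes of user data. */
    static int checksumBytes(int dataSize, int bytesPerChecksum) {
        int chunks = (dataSize + bytesPerChecksum - 1) / bytesPerChecksum; // round up
        return chunks * CHECKSUM_SIZE;
    }

    /** CRC32 of one chunk, truncated to 4 bytes (assumed algorithm). */
    static int chunkChecksum(byte[] buf, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(buf, off, len);
        return (int) crc.getValue();
    }

    /** Returns the data followed by one checksum per chunk (header omitted). */
    static byte[] appendChecksums(byte[] data, int bytesPerChecksum) {
        ByteBuffer out = ByteBuffer.allocate(
                data.length + checksumBytes(data.length, bytesPerChecksum));
        out.put(data);
        for (int off = 0; off < data.length; off += bytesPerChecksum) {
            int len = Math.min(bytesPerChecksum, data.length - off);
            out.putInt(chunkChecksum(data, off, len));
        }
        return out.array();
    }

    /** Recomputes every chunk checksum and compares it with the stored one. */
    static boolean verify(byte[] block, int dataSize, int bytesPerChecksum) {
        ByteBuffer sums = ByteBuffer.wrap(block, dataSize, block.length - dataSize);
        for (int off = 0; off < dataSize; off += bytesPerChecksum) {
            int len = Math.min(bytesPerChecksum, dataSize - off);
            if (sums.remaining() < CHECKSUM_SIZE
                    || sums.getInt() != chunkChecksum(block, off, len)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Tries the fast read first (hdfs checksums off, block-level check here).
     * On a mismatch, falls back to the second read, which is assumed to have
     * been verified by the file system itself.
     */
    static byte[] readWithFallback(Supplier<byte[]> fastRead,
                                   Supplier<byte[]> hdfsVerifiedRead,
                                   int dataSize, int bytesPerChecksum) {
        byte[] block = fastRead.get();
        if (!verify(block, dataSize, bytesPerChecksum)) {
            block = hdfsVerifiedRead.get();
        }
        return Arrays.copyOf(block, dataSize); // strip the trailing checksums
    }
}
```

With the 16 K default, a block holding 40000 bytes of user data is split into three chunks and therefore carries 12 bytes of checksum data after the user data.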
- bin
- conf
- dev-support
- security/src
- src
- .arcconfig
- .gitignore
- CHANGES.txt
- LICENSE.txt
- NOTICE.txt
- README.txt
- pom.xml

README.txt
Apache HBase [1] is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. [2] Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Apache Hadoop [3].

To get started using HBase, the full documentation for this release can be found under the docs/ directory that accompanies this README. Using a browser, open docs/index.html to view the project home page (or browse to [1]). The hbase 'book' at docs/book.html has a 'quick start' section and is where you should begin your exploration of the hbase project.

The latest HBase can be downloaded from an Apache Mirror [4].

The source code can be found at [5].

The HBase issue tracker is at [6].

Apache HBase is made available under the Apache License, version 2.0 [7].

The HBase mailing lists and archives are listed here [8].

1. http://hbase.apache.org
2. http://labs.google.com/papers/bigtable.html
3. http://hadoop.apache.org
4. http://www.apache.org/dyn/closer.cgi/hbase/
5. http://hbase.apache.org/docs/current/source-repository.html
6. http://hbase.apache.org/docs/current/issue-tracking.html
7. http://hbase.apache.org/docs/current/license.html
8. http://hbase.apache.org/docs/current/mail-lists.html