HBASE-2294 Enumerate ACID properties of HBase in a well defined spec
git-svn-id: https://svn.apache.org/repos/asf/hadoop/hbase/trunk@936112 13f79535-47bb-0310-9956-ffa450edef68
parent 3cb1168601
commit c39d8f341a
@@ -18,6 +18,8 @@ Release 0.21.0 - Unreleased
HBASE-2378 Bulk insert with multiple reducers broken due to improper
           ImmutableBytesWritable comparator (Todd Lipcon via Stack)
HBASE-2392 Upgrade to ZooKeeper 3.3.0
HBASE-2294 Enumerate ACID properties of HBase in a well defined spec
           (Todd Lipcon via Stack)

BUG FIXES
HBASE-1791 Timeout in IndexRecordWriter (Bradford Stephens via Andrew
@@ -0,0 +1,227 @@
<?xml version="1.0"?>
<!--
Copyright 2002-2008 The Apache Software Foundation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN"
"http://forrest.apache.org/dtd/document-v20.dtd">

<document>

<header>
<title>
HBase ACID Properties
</title>
</header>

<body>
<section>
<title>About this Document</title>
<p>HBase is not an ACID-compliant database. However, it does guarantee certain specific
properties.</p>
<p>This specification enumerates the ACID properties of HBase.</p>
</section>
<section>
<title>Definitions</title>
<p>For the sake of common vocabulary, we define the following terms:</p>
<dl>
<dt>Atomicity</dt>
<dd>an operation is atomic if it either completes entirely or not at all</dd>

<dt>Consistency</dt>
<dd>
all actions cause the table to transition from one valid state directly to another
(e.g. a row will not disappear during an update)
</dd>

<dt>Isolation</dt>
<dd>
an operation is isolated if it appears to complete independently of any other concurrent transaction
</dd>

<dt>Durability</dt>
<dd>any update that reports "successful" to the client will not be lost</dd>

<dt>Visibility</dt>
<dd>an update is considered visible if any subsequent read will see the update as having been committed</dd>
</dl>
<p>
The terms <em>must</em> and <em>may</em> are used as specified by RFC 2119.
In short, the word "must" implies that, if some case exists where the statement
is not true, it is a bug. The word "may" implies that, even if the guarantee
is provided in a current release, users should not rely on it.
</p>
</section>
<section>
<title>APIs to consider</title>
<ul>
<li>Read APIs
<ul>
<li>get</li>
<li>scan</li>
</ul>
</li>
<li>Write APIs
<ul>
<li>put</li>
<li>batch put</li>
<li>delete</li>
</ul>
</li>
<li>Combination (read-modify-write) APIs
<ul>
<li>incrementColumnValue</li>
<li>checkAndPut</li>
</ul>
</li>
</ul>
</section>

<section>
<title>Guarantees Provided</title>

<section>
<title>Atomicity</title>

<ol>
<li>All mutations are atomic within a row. Any put will either wholly succeed or wholly fail.
<ol>
<li>An operation that returns a "success" code has completely succeeded.</li>
<li>An operation that returns a "failure" code has completely failed.</li>
<li>An operation that times out may have succeeded and may have failed. However,
it will not have partially succeeded or failed.</li>
</ol>
</li>
<li>This is true even if the mutation crosses multiple column families within a row.</li>
<li>APIs that mutate several rows will <em>not</em> be atomic across the multiple rows.
For example, a multiput that operates on rows 'a', 'b', and 'c' may return having
mutated some but not all of the rows. In such cases, these APIs will return a list
of status codes, each of which may indicate success, failure, or a timeout as described above.</li>
<li>The checkAndPut API executes atomically, like the typical compareAndSet (CAS) operation
found in many hardware architectures.</li>
<li>Mutations to a single row are applied in a well-defined order, with no
interleaving. For example, if one writer issues the mutation "a=1,b=1,c=1" and
another writer issues the mutation "a=2,b=2,c=2", the row must either
be "a=1,b=1,c=1" or "a=2,b=2,c=2" and must <em>not</em> be something
like "a=1,b=2,c=1".
<ol>
<li>Please note that this is not true <em>across rows</em> for multirow batch mutations.</li>
</ol>
</li>
</ol>
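<p>The per-row atomicity and checkAndPut guarantees above can be pictured with a small in-memory model. This is a hypothetical sketch, not HBase's implementation. The class <code>RowModel</code> and its methods are invented for illustration. Plain strings stand in for the byte arrays HBase works with, and a single Java monitor per row stands in for the region server's row locking. The model shows a multi-column put that cannot interleave with another put, and a compare-and-set that applies a put only if the checked cell still holds the expected value.</p>
<source><![CDATA[
```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical model of a single HBase row (column -> value).
// This is NOT HBase code: one monitor per row stands in for the
// region server's row-level locking.
final class RowModel {
    private final Map<String, String> cells = new HashMap<>();

    // Applies every column of one mutation while holding the row lock,
    // so two concurrent puts can never interleave column by column.
    synchronized void put(Map<String, String> mutation) {
        cells.putAll(mutation);
    }

    // Compare-and-set in the spirit of checkAndPut: apply `mutation` only
    // if `column` currently holds `expected` (null means "absent"). The
    // check and the write run under the same lock, as one atomic step.
    synchronized boolean checkAndPut(String column, String expected,
                                     Map<String, String> mutation) {
        if (!Objects.equals(cells.get(column), expected)) {
            return false;
        }
        cells.putAll(mutation);
        return true;
    }

    // Returns a copy taken under the lock, so a reader never sees a
    // half-applied mutation.
    synchronized Map<String, String> get() {
        return new HashMap<>(cells);
    }
}
```
]]></source>
<p>In this model each row has its own lock, so a multi-row batch would have to lock and apply rows one at a time. Nothing ties those rows together, which mirrors the lack of cross-row atomicity described above.</p>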
</section>
<section>
<title>Consistency and Isolation</title>
<ol>
<li>All rows returned via any access API will consist of a complete row that existed at
some point in the table's history.</li>
<li>This is true across column families, i.e. a get of a full row that occurs concurrently
with some mutations 1, 2, 3, 4, 5 will return a complete row that existed at some point in time
between mutation <em>i</em> and <em>i</em>+1 for some <em>i</em> between 1 and 5.</li>
<li>The state of a row will only move forward through the history of edits to it.</li>
</ol>
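<p>Copy-on-write is one way to obtain complete-row reads across column families. In this approach, every mutation builds a new immutable version of the whole row and publishes it with a single reference swap. The sketch below assumes that approach. The classes <code>RowVersion</code> and <code>SnapshotRow</code> are hypothetical and do not describe how HBase itself implements these guarantees. The increasing <code>seq</code> field illustrates that a row only moves forward through its history.</p>
<source><![CDATA[
```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Immutable version of a whole row: family -> (qualifier -> value),
// tagged with a sequence number that grows with every mutation.
final class RowVersion {
    final long seq;
    final Map<String, Map<String, String>> families;

    RowVersion(long seq, Map<String, Map<String, String>> families) {
        this.seq = seq;
        this.families = families;
    }
}

// Hypothetical copy-on-write row (not HBase's implementation). A reader
// fetches one complete RowVersion, so a full-row get spanning several
// column families can never mix two versions.
final class SnapshotRow {
    private final AtomicReference<RowVersion> current =
        new AtomicReference<>(new RowVersion(0, Collections.emptyMap()));

    // Builds the next version off to the side, then publishes it with a
    // single reference write. `synchronized` serializes writers so each
    // new version is derived from the latest one.
    synchronized void mutate(Map<String, Map<String, String>> edits) {
        RowVersion old = current.get();
        Map<String, Map<String, String>> next = new HashMap<>();
        old.families.forEach((fam, cols) -> next.put(fam, new HashMap<>(cols)));
        edits.forEach((fam, cols) ->
            next.computeIfAbsent(fam, f -> new HashMap<>()).putAll(cols));
        Map<String, Map<String, String>> frozen = new HashMap<>();
        next.forEach((fam, cols) -> frozen.put(fam, Collections.unmodifiableMap(cols)));
        current.set(new RowVersion(old.seq + 1, Collections.unmodifiableMap(frozen)));
    }

    // A full-row get is one atomic read of the reference.
    RowVersion get() {
        return current.get();
    }
}
```
]]></source>
<p>A reader that holds an older <code>RowVersion</code> keeps seeing that complete version even after later mutations, because published versions are never modified.</p>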

<section><title>Consistency of Scans</title>
<p>
A scan is <strong>not</strong> a consistent view of a table. Scans do
<strong>not</strong> exhibit <em>snapshot isolation</em>.
</p>
<p>
Rather, scans have the following properties:
</p>

<ol>
<li>
Any row returned by the scan will be a consistent view (i.e. that version
of the complete row existed at some point in time).
</li>
<li>
A scan will always reflect a view of the data <em>at least as new as</em>
the beginning of the scan. This satisfies the visibility guarantees
enumerated below.
<ol>
<li>For example, if client A writes data X and then communicates via a side
channel to client B, any scans started by client B will contain data at least
as new as X.</li>
<li>A scan <em>must</em> reflect all mutations committed prior to the construction
of the scanner, and <em>may</em> reflect some mutations committed subsequent to the
construction of the scanner.</li>
<li>Scans must include <em>all</em> data written prior to the scan (except in
the case where data is subsequently mutated, in which case it <em>may</em> reflect
the mutation).</li>
</ol>
</li>
</ol>
<p>
Those familiar with relational databases will recognize this isolation level as "read committed".
</p>
<p>
Please note that the guarantees listed above regarding scanner consistency
refer to "transaction commit time", not the "timestamp"
field of each cell. That is to say, a scanner started at time <em>t</em> may see edits
with a timestamp value greater than <em>t</em>, if those edits were committed with a
"forward dated" timestamp before the scanner was constructed.
</p>
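<p>The scan properties above can be approximated in plain Java, because the iterators of <code>java.util.concurrent.ConcurrentSkipListMap</code> are documented as weakly consistent. The sketch below is a hypothetical model, not HBase's scanner. <code>TableModel</code> and its <code>scan</code> method are invented for illustration. Each row is replaced wholesale, so every returned row is complete, while writes made during the scan may or may not show up.</p>
<source><![CDATA[
```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical table model (not HBase code): row key -> immutable row,
// kept sorted by key like an HBase table. Each put swaps in a whole row,
// so any row a reader obtains is complete.
final class TableModel {
    private final ConcurrentSkipListMap<String, Map<String, String>> rows =
        new ConcurrentSkipListMap<>();

    void put(String rowKey, Map<String, String> row) {
        rows.put(rowKey, Collections.unmodifiableMap(new HashMap<>(row)));
    }

    // Scans all rows in key order. The map's iterator is weakly
    // consistent. It visits every row present when the scan began (absent
    // deletes) and may or may not reflect writes made during the scan,
    // which resembles read committed rather than snapshot isolation.
    // `afterEachRow` lets a caller interleave writes with the scan.
    Map<String, Map<String, String>> scan(Runnable afterEachRow) {
        Map<String, Map<String, String>> seen = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> e : rows.entrySet()) {
            seen.put(e.getKey(), e.getValue());
            afterEachRow.run();
        }
        return seen;
    }
}
```
]]></source>
<p>In this model, a row updated after the scan starts but before the scanner reaches it may be returned in its new form, and a row inserted mid-scan may or may not appear. Snapshot isolation would rule out the first outcome.</p>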
</section>
</section>
<section>
<title>Visibility</title>
<ol>
<li>When a client receives a "success" response for any mutation, that
mutation is immediately visible to both that client and any client with whom it
later communicates through side channels.</li>
<li>A row must never exhibit so-called "time-travel" properties. That
is to say, if a series of mutations moves a row sequentially through a series of
states, any sequence of concurrent reads will return a subsequence of those states.
<ol>
<li>For example, if a row's cells are mutated using the "incrementColumnValue"
API, a client must never see the value of any cell decrease.</li>
<li>This is true regardless of which read API is used to read back the mutation.</li>
</ol>
</li>
<li>Any version of a cell that has been returned to a read operation is guaranteed to
be durably stored.</li>
</ol>
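<p>The "no time travel" rule for incrementColumnValue can be illustrated with a counter whose values form a single ordered history. This is a hypothetical sketch. <code>CounterCell</code> and <code>MonotonicReadCheck</code> are invented names, and an <code>AtomicLong</code> stands in for whatever mechanism HBase uses to order edits. Like the example above, it assumes non-negative increments.</p>
<source><![CDATA[
```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counter cell updated by an incrementColumnValue-style call
// (not HBase code). AtomicLong gives the cell one totally ordered history
// of values, so successive reads can only move forward through it.
final class CounterCell {
    private final AtomicLong value = new AtomicLong();

    // Adds `amount` and returns the new value. This sketch only accepts
    // non-negative amounts, matching the increment-only example above.
    long increment(long amount) {
        if (amount < 0) {
            throw new IllegalArgumentException("amount must be >= 0");
        }
        return value.addAndGet(amount);
    }

    long read() {
        return value.get();
    }
}

final class MonotonicReadCheck {
    // Runs one writer and one concurrent reader; returns true if the reader
    // never observed the counter decrease (no "time travel").
    static boolean readerNeverSawDecrease(int increments) {
        CounterCell cell = new CounterCell();
        boolean[] sawDecrease = {false};
        Thread writer = new Thread(() -> {
            for (int i = 0; i < increments; i++) {
                cell.increment(1);
            }
        });
        Thread reader = new Thread(() -> {
            long last = cell.read();
            while (last < increments) {
                long now = cell.read();
                if (now < last) {
                    sawDecrease[0] = true;
                }
                last = now;
            }
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return !sawDecrease[0];
    }
}
```
]]></source>
<p>The reader thread records whether any read returned a smaller value than the read before it. The <code>join()</code> calls ensure its result is visible to the caller.</p>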

</section>
<section>
<title>Durability</title>
<ol>
<li>All visible data is also durable data. That is to say, a read will never return
data that has not been made durable on disk [1].</li>
<li>Any operation that returns a "success" code (e.g. does not throw an exception)
will be made durable.</li>
<li>Any operation that returns a "failure" code will not be made durable
(subject to the Atomicity guarantees above).</li>
<li>No reasonable failure scenario will affect any of the guarantees of this document.</li>
</ol>
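<p>The rule that visible data is also durable follows from the order of the write path. The edit is logged and flushed first, made readable second, and reported as a success last. The sketch below models that ordering with the hypothetical classes <code>SketchLog</code> and <code>SketchRegion</code>. It is not HBase's write path, and its <code>hflush()</code> only marks edits as surviving a simulated crash rather than calling any HDFS API.</p>
<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical write-ahead log (not HBase code). hflush() marks every
// edit appended so far as durable; per the footnote, HBase's hflush()
// means the edits reached the OS cache of every log replica.
final class SketchLog {
    private final List<String> appended = new ArrayList<>();
    private int durableCount = 0;

    synchronized void append(String edit) {
        appended.add(edit);
    }

    synchronized void hflush() {
        durableCount = appended.size();
    }

    // Simulates recovery after a crash: only flushed edits survive.
    synchronized List<String> recover() {
        return new ArrayList<>(appended.subList(0, durableCount));
    }
}

// Hypothetical region write path: (1) append to the log, (2) hflush,
// (3) apply to the readable in-memory store, (4) return normally, which
// the client treats as "success". Step 2 precedes step 3, so every value
// a read can return has already been made durable.
final class SketchRegion {
    private final SketchLog log = new SketchLog();
    private final Map<String, String> memstore = new HashMap<>();

    synchronized void put(String key, String value) {
        log.append(key + "=" + value);
        log.hflush();
        memstore.put(key, value);
    }

    synchronized String get(String key) {
        return memstore.get(key);
    }

    SketchLog log() {
        return log;
    }
}
```
]]></source>
<p>In a variant where <code>append</code> or <code>hflush</code> could throw, for example one backed by a real file, the in-memory store would be left untouched and the client would see a failure. That matches the guarantee that failed operations are not made durable.</p>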
</section>
<section>
<title>Tunability</title>
<p>All of the above guarantees must be possible within HBase. For users who would like to trade
off some guarantees for performance, HBase may offer several tuning options. For example:</p>
<ul>
<li>Visibility may be tuned on a per-read basis to allow stale reads or time travel.</li>
<li>Durability may be tuned to only flush data to disk on a periodic basis.</li>
</ul>
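<p>The durability trade-off can be sketched with a log that acknowledges writes before flushing them. <code>TunableLog</code> and its <code>syncEveryEdit</code> flag are hypothetical and are not HBase configuration options. The sketch only shows why periodic flushing weakens the durability guarantees listed above.</p>
<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical log with a tunable flush policy (not an HBase API).
// syncEveryEdit = true: each write is flushed before it is acknowledged,
// preserving the durability guarantees above.
// syncEveryEdit = false: writes are acknowledged at once and flushed only
// when flushPending() runs (e.g. from a periodic timer), so a crash can
// lose edits the client already saw succeed.
final class TunableLog {
    private final boolean syncEveryEdit;
    private final List<String> appended = new ArrayList<>();
    private int durableCount = 0;

    TunableLog(boolean syncEveryEdit) {
        this.syncEveryEdit = syncEveryEdit;
    }

    // Returning normally stands for acknowledging the write to the client.
    synchronized void write(String edit) {
        appended.add(edit);
        if (syncEveryEdit) {
            durableCount = appended.size();
        }
    }

    // Flushes everything written so far; in deferred mode this would be
    // invoked periodically.
    synchronized void flushPending() {
        durableCount = appended.size();
    }

    // The edits that would survive a crash at this moment.
    synchronized List<String> survivingAfterCrash() {
        return new ArrayList<>(appended.subList(0, durableCount));
    }
}
```
]]></source>
<p>When <code>syncEveryEdit</code> is false, an edit already acknowledged to the client is lost if a crash happens before the next <code>flushPending()</code> call.</p>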
</section>
</section>
<section>
<title>Footnotes</title>
<p>[1] In the context of HBase, "durable on disk" implies an hflush() call on the transaction
log. This does not actually imply an fsync() to magnetic media, but rather just that the data has been
written to the OS cache on all replicas of the log. In the case of a full datacenter power loss, it is
possible that the edits are not truly durable.</p>
</section>

</body>
</document>
@@ -36,6 +36,7 @@ See http://forrest.apache.org/docs/linking.html for more info.
<started label="Getting Started" href="ext:api/started" />
<api label="API Docs" href="ext:api/index" />
<api label="HBase Metrics" href="metrics.html" />
<api label="HBase Semantics" href="acid-semantics.html" />
<api label="HBase Default Configuration" href="hbase-conf.html" />
<api label="HBase on Windows" href="cygwin.html" />
<wiki label="Wiki" href="ext:wiki" />