HBASE-6286 Move site back up out of hbase-assembly; bad idea; ADDENDUM -- MANUAL I COPIED WAS STALE

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1465460 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Michael Stack 2013-04-07 20:58:13 +00:00
parent db02f627b7
commit c43c3b8cd1
6 changed files with 112 additions and 614 deletions

View File

@ -581,445 +581,8 @@ htable.put(put);
</section>
</chapter> <!-- data model -->
<chapter xml:id="schema">
<title>HBase and Schema Design</title>
<para>A good general introduction on the strength and weaknesses modelling on
the various non-rdbms datastores is Ian Varley's Master thesis,
<link xlink:href="http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf">No Relation: The Mixed Blessings of Non-Relational Databases</link>.
Recommended. Also, read <xref linkend="keyvalue"/> for how HBase stores data internally.
</para>
<section xml:id="schema.creation">
<title>
Schema Creation
</title>
<para>HBase schemas can be created or updated with <xref linkend="shell" />
or by using <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html">HBaseAdmin</link> in the Java API.
</para>
<para>Tables must be disabled when making ColumnFamily modifications, for example..
<programlisting>
Configuration config = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
String table = "myTable";
admin.disableTable(table);
HColumnDescriptor cf1 = ...;
admin.addColumn(table, cf1); // adding new ColumnFamily
HColumnDescriptor cf2 = ...;
admin.modifyColumn(table, cf2); // modifying existing ColumnFamily
admin.enableTable(table);
</programlisting>
</para>See <xref linkend="client_dependencies"/> for more information about configuring client connections.
<para>Note: online schema changes are supported in the 0.92.x codebase, but the 0.90.x codebase requires the table
to be disabled.
</para>
<section xml:id="schema.updates"><title>Schema Updates</title>
<para>When changes are made to either Tables or ColumnFamilies (e.g., region size, block size), these changes
take effect the next time there is a major compaction and the StoreFiles get re-written.
</para>
<para>See <xref linkend="store"/> for more information on StoreFiles.
</para>
</section>
</section>
<section xml:id="number.of.cfs">
<title>
On the number of column families
</title>
<para>
HBase currently does not do well with anything above two or three column families so keep the number
of column families in your schema low. Currently, flushing and compactions are done on a per Region basis so
if one column family is carrying the bulk of the data bringing on flushes, the adjacent families
will also be flushed though the amount of data they carry is small. When many column families the
flushing and compaction interaction can make for a bunch of needless i/o loading (To be addressed by
changing flushing and compaction to work on a per column family basis). For more information
on compactions, see <xref linkend="compaction"/>.
</para>
<para>Try to make do with one column family if you can in your schemas. Only introduce a
second and third column family in the case where data access is usually column scoped;
i.e. you query one column family or the other but usually not both at the one time.
</para>
<section xml:id="number.of.cfs.card"><title>Cardinality of ColumnFamilies</title>
<para>Where multiple ColumnFamilies exist in a single table, be aware of the cardinality (i.e., number of rows).
If ColumnFamilyA has 1 million rows and ColumnFamilyB has 1 billion rows, ColumnFamilyA's data will likely be spread
across many, many regions (and RegionServers). This makes mass scans for ColumnFamilyA less efficient.
</para>
</section>
</section>
<section xml:id="rowkey.design"><title>Rowkey Design</title>
<section xml:id="timeseries">
<title>
Monotonically Increasing Row Keys/Timeseries Data
</title>
<para>
In the HBase chapter of Tom White's book <link xlink:url="http://oreilly.com/catalog/9780596521981">Hadoop: The Definitive Guide</link> (O'Reilly) there is a an optimization note on watching out for a phenomenon where an import process walks in lock-step with all clients in concert pounding one of the table's regions (and thus, a single node), then moving onto the next region, etc. With monotonically increasing row-keys (i.e., using a timestamp), this will happen. See this comic by IKai Lan on why monotonically increasing row keys are problematic in BigTable-like datastores:
<link xlink:href="http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/">monotonically increasing values are bad</link>. The pile-up on a single region brought on
by monotonically increasing keys can be mitigated by randomizing the input records to not be in sorted order, but in general it's best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row-key.
</para>
<para>If you do need to upload time series data into HBase, you should
study <link xlink:href="http://opentsdb.net/">OpenTSDB</link> as a
successful example. It has a page describing the <link xlink:href=" http://opentsdb.net/schema.html">schema</link> it uses in
HBase. The key format in OpenTSDB is effectively [metric_type][event_timestamp], which would appear at first glance to contradict the previous advice about not using a timestamp as the key. However, the difference is that the timestamp is not in the <emphasis>lead</emphasis> position of the key, and the design assumption is that there are dozens or hundreds (or more) of different metric types. Thus, even with a continual stream of input data with a mix of metric types, the Puts are distributed across various points of regions in the table.
</para>
</section>
<section xml:id="keysize">
<title>Try to minimize row and column sizes</title>
<subtitle>Or why are my StoreFile indices large?</subtitle>
<para>In HBase, values are always freighted with their coordinates; as a
cell value passes through the system, it'll be accompanied by its
row, column name, and timestamp - always. If your rows and column names
are large, especially compared to the size of the cell value, then
you may run up against some interesting scenarios. One such is
the case described by Marc Limotte at the tail of
<link xlink:url="https://issues.apache.org/jira/browse/HBASE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&amp;focusedCommentId=13005272#comment-13005272">HBASE-3551</link>
(recommended!).
Therein, the indices that are kept on HBase storefiles (<xref linkend="hfile" />)
to facilitate random access may end up occupyng large chunks of the HBase
allotted RAM because the cell value coordinates are large.
Mark in the above cited comment suggests upping the block size so
entries in the store file index happen at a larger interval or
modify the table schema so it makes for smaller rows and column
names.
Compression will also make for larger indices. See
the thread <link xlink:href="http://search-hadoop.com/m/hemBv1LiN4Q1/a+question+storefileIndexSize&amp;subj=a+question+storefileIndexSize">a question storefileIndexSize</link>
up on the user mailing list.
</para>
<para>Most of the time small inefficiencies don't matter all that much. Unfortunately,
this is a case where they do. Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys they could be repeated
several billion times in your data. </para>
<para>See <xref linkend="keyvalue"/> for more information on HBase stores data internally to see why this is important.</para>
<section xml:id="keysize.cf"><title>Column Families</title>
<para>Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default).
</para>
<para>See <xref linkend="keyvalue"/> for more information on HBase stores data internally to see why this is important.</para>
</section>
<section xml:id="keysize.atttributes"><title>Attributes</title>
<para>Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer shorter attribute names (e.g., "via")
to store in HBase.
</para>
<para>See <xref linkend="keyvalue"/> for more information on HBase stores data internally to see why this is important.</para>
</section>
<section xml:id="keysize.row"><title>Rowkey Length</title>
<para>Keep them as short as is reasonable such that they can still be useful for required data access (e.g., Get vs. Scan).
A short key that is useless for data access is not better than a longer key with better get/scan properties. Expect tradeoffs
when designing rowkeys.
</para>
</section>
<section xml:id="keysize.patterns"><title>Byte Patterns</title>
<para>A long is 8 bytes. You can store an unsigned number up to 18,446,744,073,709,551,615 in those eight bytes.
If you stored this number as a String -- presuming a byte per character -- you need nearly 3x the bytes.
</para>
<para>Not convinced? Below is some sample code that you can run on your own.
<programlisting>
// long
//
long l = 1234567890L;
byte[] lb = Bytes.toBytes(l);
System.out.println("long bytes length: " + lb.length); // returns 8
String s = "" + l;
byte[] sb = Bytes.toBytes(s);
System.out.println("long as string length: " + sb.length); // returns 10
// hash
//
MessageDigest md = MessageDigest.getInstance("MD5");
byte[] digest = md.digest(Bytes.toBytes(s));
System.out.println("md5 digest bytes length: " + digest.length); // returns 16
String sDigest = new String(digest);
byte[] sbDigest = Bytes.toBytes(sDigest);
System.out.println("md5 digest as string length: " + sbDigest.length); // returns 26
</programlisting>
</para>
</section>
</section>
<section xml:id="reverse.timestamp"><title>Reverse Timestamps</title>
<para>A common problem in database processing is quickly finding the most recent version of a value. A technique using reverse timestamps
as a part of the key can help greatly with a special case of this problem. Also found in the HBase chapter of Tom White's book Hadoop: The Definitive Guide (O'Reilly),
the technique involves appending (<code>Long.MAX_VALUE - timestamp</code>) to the end of any key, e.g., [key][reverse_timestamp].
</para>
<para>The most recent value for [key] in a table can be found by performing a Scan for [key] and obtaining the first record. Since HBase keys
are in sorted order, this key sorts before any older row-keys for [key] and thus is first.
</para>
<para>This technique would be used instead of using <xref linkend="schema.versions">HBase Versioning</xref> where the intent is to hold onto all versions
"forever" (or a very long time) and at the same time quickly obtain access to any other version by using the same Scan technique.
</para>
</section>
<section xml:id="rowkey.scope">
<title>Rowkeys and ColumnFamilies</title>
<para>Rowkeys are scoped to ColumnFamilies. Thus, the same rowkey could exist in each ColumnFamily that exists in a table without collision.
</para>
</section>
<section xml:id="changing.rowkeys"><title>Immutability of Rowkeys</title>
<para>Rowkeys cannot be changed. The only way they can be "changed" in a table is if the row is deleted and then re-inserted.
This is a fairly common question on the HBase dist-list so it pays to get the rowkeys right the first time (and/or before you've
inserted a lot of data).
</para>
</section>
<section xml:id="rowkey.regionsplits"><title>Relationship Between RowKeys and Region Splits</title>
<para>If you pre-split your table, it is <emphasis>critical</emphasis> to understand how your rowkey will be distributed across
the region boundaries. As an example of why this is important, consider the example of using displayable hex characters as the
lead position of the key (e.g., ""0000000000000000" to "ffffffffffffffff"). Running those key ranges through <code>Bytes.split</code>
(which is the split strategy used when creating regions in <code>HBaseAdmin.createTable(byte[] startKey, byte[] endKey, numRegions)</code>
for 10 regions will generate the following splits...
</para>
<para>
<programlisting>
48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 48 // 0
54 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 -10 // 6
61 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -67 -68 // =
68 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -124 -126 // D
75 75 75 75 75 75 75 75 75 75 75 75 75 75 75 72 // K
82 18 18 18 18 18 18 18 18 18 18 18 18 18 18 14 // R
88 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -40 -44 // X
95 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -97 -102 // _
102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 102 // f
</programlisting>
... (note: the lead byte is listed to the right as a comment.) Given that the first split is a '0' and the last split is an 'f',
everything is great, right? Not so fast.
</para>
<para>The problem is that all the data is going to pile up in the first 2 regions and the last region thus creating a "lumpy" (and
possibly "hot") region problem. To understand why, refer to an <link xlink:href="http://www.asciitable.com">ASCII Table</link>.
'0' is byte 48, and 'f' is byte 102, but there is a huge gap in byte values (bytes 58 to 96) that will <emphasis>never appear in this
keyspace</emphasis> because the only values are [0-9] and [a-f]. Thus, the middle regions regions will
never be used. To make pre-spliting work with this example keyspace, a custom definition of splits (i.e., and not relying on the
built-in split method) is required.
</para>
<para>Lesson #1: Pre-splitting tables is generally a best practice, but you need to pre-split them in such a way that all the
regions are accessible in the keyspace. While this example demonstrated the problem with a hex-key keyspace, the same problem can happen
with <emphasis>any</emphasis> keyspace. Know your data.
</para>
<para>Lesson #2: While generally not advisable, using hex-keys (and more generally, displayable data) can still work with pre-split
tables as long as all the created regions are accessible in the keyspace.
</para>
<para>To conclude this example, the following is an example of how appropriate splits can be pre-created for hex-keys:.
</para>
<programlisting>public static boolean createTable(HBaseAdmin admin, HTableDescriptor table, byte[][] splits)
throws IOException {
try {
admin.createTable( table, splits );
return true;
} catch (TableExistsException e) {
logger.info("table " + table.getNameAsString() + " already exists");
// the table already exists...
return false;
}
}
public static byte[][] getHexSplits(String startKey, String endKey, int numRegions) {
byte[][] splits = new byte[numRegions-1][];
BigInteger lowestKey = new BigInteger(startKey, 16);
BigInteger highestKey = new BigInteger(endKey, 16);
BigInteger range = highestKey.subtract(lowestKey);
BigInteger regionIncrement = range.divide(BigInteger.valueOf(numRegions));
lowestKey = lowestKey.add(regionIncrement);
for(int i=0; i &lt; numRegions-1;i++) {
BigInteger key = lowestKey.add(regionIncrement.multiply(BigInteger.valueOf(i)));
byte[] b = String.format("%016x", key).getBytes();
splits[i] = b;
}
return splits;
}</programlisting>
</section>
</section> <!-- rowkey design -->
<section xml:id="schema.versions">
<title>
Number of Versions
</title>
<section xml:id="schema.versions.max"><title>Maximum Number of Versions</title>
<para>The maximum number of row versions to store is configured per column
family via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>.
The default for max versions is 3.
This is an important parameter because as described in <xref linkend="datamodel" />
section HBase does <emphasis>not</emphasis> overwrite row values, but rather
stores different values per row by time (and qualifier). Excess versions are removed during major
compactions. The number of max versions may need to be increased or decreased depending on application needs.
</para>
<para>It is not recommended setting the number of max versions to an exceedingly high level (e.g., hundreds or more) unless those old values are
very dear to you because this will greatly increase StoreFile size.
</para>
</section>
<section xml:id="schema.minversions">
<title>
Minimum Number of Versions
</title>
<para>Like maximum number of row versions, the minimum number of row versions to keep is configured per column
family via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link>.
The default for min versions is 0, which means the feature is disabled.
The minimum number of row versions parameter is used together with the time-to-live parameter and can be combined with the
number of row versions parameter to allow configurations such as
"keep the last T minutes worth of data, at most N versions, <emphasis>but keep at least M versions around</emphasis>"
(where M is the value for minimum number of row versions, M&lt;N).
This parameter should only be set when time-to-live is enabled for a column family and must be less than the
number of row versions.
</para>
</section>
</section>
<section xml:id="supported.datatypes">
<title>
Supported Datatypes
</title>
<para>HBase supports a "bytes-in/bytes-out" interface via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html">Put</link> and
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html">Result</link>, so anything that can be
converted to an array of bytes can be stored as a value. Input could be strings, numbers, complex objects, or even images as long as they can rendered as bytes.
</para>
<para>There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase would probably be too much to ask);
search the mailling list for conversations on this topic. All rows in HBase conform to the <xref linkend="datamodel">datamodel</xref>, and
that includes versioning. Take that into consideration when making your design, as well as block size for the ColumnFamily.
</para>
<section xml:id="counters">
<title>Counters</title>
<para>
One supported datatype that deserves special mention are "counters" (i.e., the ability to do atomic increments of numbers). See
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#increment%28org.apache.hadoop.hbase.client.Increment%29">Increment</link> in HTable.
</para>
<para>Synchronization on counters are done on the RegionServer, not in the client.
</para>
</section>
</section>
<section xml:id="schema.joins"><title>Joins</title>
<para>If you have multiple tables, don't forget to factor in the potential for <xref linkend="joins"/> into the schema design.
</para>
</section>
<section xml:id="ttl">
<title>Time To Live (TTL)</title>
<para>ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached.
This applies to <emphasis>all</emphasis> versions of a row - even the current one. The TTL time encoded in the HBase for the row is specified in UTC.
</para>
<para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
</para>
</section>
<section xml:id="cf.keep.deleted">
<title>
Keeping Deleted Cells
</title>
<para>ColumnFamilies can optionally keep deleted cells. That means deleted cells can still be retrieved with
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html">Get</link> or
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html">Scan</link> operations,
as long these operations have a time range specified that ends before the timestamp of any delete that would affect the cells.
This allows for point in time queries even in the presence of deletes.
</para>
<para>
Deleted cells are still subject to TTL and there will never be more than "maximum number of versions" deleted cells.
A new "raw" scan options returns all deleted rows and the delete markers.
</para>
<para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
</para>
</section>
<section xml:id="secondary.indexes">
<title>
Secondary Indexes and Alternate Query Paths
</title>
<para>This section could also be titled "what if my table rowkey looks like <emphasis>this</emphasis> but I also want to query my table like <emphasis>that</emphasis>."
A common example on the dist-list is where a row-key is of the format "user-timestamp" but there are reporting requirements on activity across users for certain
time ranges. Thus, selecting by user is easy because it is in the lead position of the key, but time is not.
</para>
<para>There is no single answer on the best way to handle this because it depends on...
<itemizedlist>
<listitem>Number of users</listitem>
<listitem>Data size and data arrival rate</listitem>
<listitem>Flexibility of reporting requirements (e.g., completely ad-hoc date selection vs. pre-configured ranges) </listitem>
<listitem>Desired execution speed of query (e.g., 90 seconds may be reasonable to some for an ad-hoc report, whereas it may be too long for others) </listitem>
</itemizedlist>
... and solutions are also influenced by the size of the cluster and how much processing power you have to throw at the solution.
Common techniques are in sub-sections below. This is a comprehensive, but not exhaustive, list of approaches.
</para>
<para>It should not be a surprise that secondary indexes require additional cluster space and processing.
This is precisely what happens in an RDBMS because the act of creating an alternate index requires both space and processing cycles to update. RBDMS products
are more advanced in this regard to handle alternative index management out of the box. However, HBase scales better at larger data volumes, so this is a feature trade-off.
</para>
<para>Pay attention to <xref linkend="performance"/> when implementing any of these approaches.</para>
<para>Additionally, see the David Butler response in this dist-list thread <link xlink:href="http://search-hadoop.com/m/nvbiBp2TDP/Stargate%252Bhbase&amp;subj=Stargate+hbase">HBase, mail # user - Stargate+hbase</link>
</para>
<section xml:id="secondary.indexes.filter">
<title>
Filter Query
</title>
<para>Depending on the case, it may be appropriate to use <xref linkend="client.filter"/>. In this case, no secondary index is created.
However, don't try a full-scan on a large table like this from an application (i.e., single-threaded client).
</para>
</section>
<section xml:id="secondary.indexes.periodic">
<title>
Periodic-Update Secondary Index
</title>
<para>A secondary index could be created in an other table which is periodically updated via a MapReduce job. The job could be executed intra-day, but depending on
load-strategy it could still potentially be out of sync with the main data table.</para>
<para>See <xref linkend="mapreduce.example.readwrite"/> for more information.</para>
</section>
<section xml:id="secondary.indexes.dualwrite">
<title>
Dual-Write Secondary Index
</title>
<para>Another strategy is to build the secondary index while publishing data to the cluster (e.g., write to data table, write to index table).
If this is approach is taken after a data table already exists, then bootstrapping will be needed for the secondary index with a MapReduce job (see <xref linkend="secondary.indexes.periodic"/>).</para>
</section>
<section xml:id="secondary.indexes.summary">
<title>
Summary Tables
</title>
<para>Where time-ranges are very wide (e.g., year-long report) and where the data is voluminous, summary tables are a common approach.
These would be generated with MapReduce jobs into another table.</para>
<para>See <xref linkend="mapreduce.example.summary"/> for more information.</para>
</section>
<section xml:id="secondary.indexes.coproc">
<title>
Coprocessor Secondary Index
</title>
<para>Coprocessors act like RDBMS triggers. These were added in 0.92. For more information, see <xref linkend="coprocessors"/>
</para>
</section>
</section>
<section xml:id="schema.smackdown"><title>Schema Design Smackdown</title>
<para>This section will describe common schema design questions that appear on the dist-list. These are
general guidelines and not laws - each application must consider its own needs.
</para>
<section xml:id="schema.smackdown.rowsversions"><title>Rows vs. Versions</title>
<para>A common question is whether one should prefer rows or HBase's built-in-versioning. The context is typically where there are
"a lot" of versions of a row to be retained (e.g., where it is significantly above the HBase default of 3 max versions). The
rows-approach would require storing a timstamp in some portion of the rowkey so that they would not overwite with each successive update.
</para>
<para>Preference: Rows (generally speaking).
</para>
</section>
<section xml:id="schema.smackdown.rowscols"><title>Rows vs. Columns</title>
<para>Another common question is whether one should prefer rows or columns. The context is typically in extreme cases of wide
tables, such as having 1 row with 1 million attributes, or 1 million rows with 1 columns apiece.
</para>
<para>Preference: Rows (generally speaking). To be clear, this guideline is in the context is in extremely wide cases, not in the
standard use-case where one needs to store a few dozen or hundred columns. But there is also a middle path between these two
options, and that is "Rows as Columns."
</para>
</section>
<section xml:id="schema.smackdown.rowsascols"><title>Rows as Columns</title>
<para>The middle path between Rows vs. Columns is packing data that would be a separate row into columns, for certain rows.
OpenTSDB is the best example of this case where a single row represents a defined time-range, and then discrete events are treated as
columns. This approach is often more complex, and may require the additional complexity of re-writing your data, but has the
advantage of being I/O efficient. For an overview of this approach, see
<link xlink:href="http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-lessons-learned-from-opentsdb.html">Lessons Learned from OpenTSDB</link>
from HBaseCon2012.
</para>
</section>
</section>
<section xml:id="schema.ops"><title>Operational and Performance Configuration Options</title>
<para>See the Performance section <xref linkend="perf.schema"/> for more information operational and performance
schema design options, such as Bloom Filters, Table-configured regionsizes, compression, and blocksizes.
</para>
</section>
<section xml:id="constraints"><title>Constraints</title>
<para>HBase currently supports 'constraints' in traditional (SQL) database parlance. The advised usage for Constraints is in enforcing business rules for attributes in the table (eg. make sure values are in the range 1-10).
Constraints could also be used to enforce referential integrity, but this is strongly discouraged as it will dramatically decrease the write throughput of the tables where integrity checking is enabled.
Extensive documentation on using Constraints can be found at: <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/constraint">Constraint</link> since version 0.94.
</para>
</section>
</chapter> <!-- schema design -->
<!-- schema design -->
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="schema_design.xml" />
<chapter xml:id="mapreduce">
<title>HBase and MapReduce</title>
@ -2175,14 +1738,14 @@ myHtd.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName(
</para>
</section>
</section>
<section>
<title>Online Region Merges</title>
<para>Both Master and Regionserver participate in the event of online region merges.
Client sends merge RPC to master, then master moves the regions together to the
same regionserver where the more heavily loaded region resided, finally master
send merge request to this regionserver and regionserver run the region merges.
send merge request to this regionserver and regionserver run the region merges.
Similar with process of region splits, region merges run as a local transaction
on the regionserver, offlines the regions and then merges two regions on the file
system, atomically delete merging regions from META and add merged region to the META,
@ -2193,7 +1756,7 @@ myHtd.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName(
hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME', true
</programlisting>
It's an asynchronous operation and call returns immediately without waiting merge completed.
Passing 'true' as the optional third parameter will force a merge ('force' merges regardless
Passing 'true' as the optional third parameter will force a merge ('force' merges regardless
else merge will fail unless passed adjacent regions. 'force' is for expert use only)
</para>
</section>
@ -3121,19 +2684,22 @@ hbase> describe 't1'</programlisting>
<para>
You will find the snappy library file under the .libs directory from your Snappy build (For example
/home/hbase/snappy-1.0.5/.libs/). The file is called libsnappy.so.1.x.x where 1.x.x is the version of the snappy
code you are building. You can either copy this file into your hbase directory under libsnappy.so name, or simply
create a symbolic link to it.
code you are building. You can either copy this file into your hbase lib directory -- under lib/native/PLATFORM --
naming the file as libsnappy.so,
or simply create a symbolic link to it (See ./bin/hbase for how it does library path for native libs).
</para>
<para>
The second file you need is the hadoop native library. You will find this file in your hadoop installation directory
under lib/native/Linux-amd64-64/ or lib/native/Linux-i386-32/. The file you are looking for is libhadoop.so.1.x.x.
Again, you can simply copy this file or link to it, under the name libhadoop.so.
Again, you can simply copy this file or link to it from under hbase in lib/native/PLATFORM (e.g. Linux-amd64-64, etc.),
using the name libhadoop.so.
</para>
<para>
At the end of the installation, you should have both libsnappy.so and libhadoop.so links or files present into
lib/native/Linux-amd64-64 or into lib/native/Linux-i386-32
lib/native/Linux-amd64-64 or into lib/native/Linux-i386-32 (where the last part of the directory path is the
PLATFORM you built and rare running the native lib on)
</para>
<para>To point hbase at snappy support, in hbase-env.sh set
<programlisting>export HBASE_LIBRARY_PATH=/pathtoyourhadoop/lib/native/Linux-amd64-64</programlisting>

View File

@ -37,141 +37,8 @@
<section xml:id="casestudies.schema">
<title>Schema Design</title>
<section xml:id="casestudies.schema.listdata">
<title>List Data</title>
<para>The following is an exchange from the user dist-list regarding a fairly common question:
how to handle per-user list data in Apache HBase.
</para>
<para>*** QUESTION ***</para>
<para>
We're looking at how to store a large amount of (per-user) list data in
HBase, and we were trying to figure out what kind of access pattern made
the most sense. One option is store the majority of the data in a key, so
we could have something like:
</para>
<programlisting>
&lt;FixedWidthUserName&gt;&lt;FixedWidthValueId1&gt;:"" (no value)
&lt;FixedWidthUserName&gt;&lt;FixedWidthValueId2&gt;:"" (no value)
&lt;FixedWidthUserName&gt;&lt;FixedWidthValueId3&gt;:"" (no value)
</programlisting>
The other option we had was to do this entirely using:
<programlisting>
&lt;FixedWidthUserName&gt;&lt;FixedWidthPageNum0&gt;:&lt;FixedWidthLength&gt;&lt;FixedIdNextPageNum&gt;&lt;ValueId1&gt;&lt;ValueId2&gt;&lt;ValueId3&gt;...
&lt;FixedWidthUserName&gt;&lt;FixedWidthPageNum1&gt;:&lt;FixedWidthLength&gt;&lt;FixedIdNextPageNum&gt;&lt;ValueId1&gt;&lt;ValueId2&gt;&lt;ValueId3&gt;...
</programlisting>
<para>
where each row would contain multiple values.
So in one case reading the first thirty values would be:
</para>
<programlisting>
scan { STARTROW =&gt; 'FixedWidthUsername' LIMIT =&gt; 30}
</programlisting>
And in the second case it would be
<programlisting>
get 'FixedWidthUserName\x00\x00\x00\x00'
</programlisting>
<para>
The general usage pattern would be to read only the first 30 values of
these lists, with infrequent access reading deeper into the lists. Some
users would have &lt;= 30 total values in these lists, and some users would
have millions (i.e. power-law distribution)
</para>
<para>
The single-value format seems like it would take up more space on HBase,
but would offer some improved retrieval / pagination flexibility. Would
there be any significant performance advantages to be able to paginate via
gets vs paginating with scans?
</para>
<para>
My initial understanding was that doing a scan should be faster if our
paging size is unknown (and caching is set appropriately), but that gets
should be faster if we'll always need the same page size. I've ended up
hearing different people tell me opposite things about performance. I
assume the page sizes would be relatively consistent, so for most use cases
we could guarantee that we only wanted one page of data in the
fixed-page-length case. I would also assume that we would have infrequent
updates, but may have inserts into the middle of these lists (meaning we'd
need to update all subsequent rows).
</para>
<para>
Thanks for help / suggestions / follow-up questions.
</para>
<para>*** ANSWER ***</para>
<para>
If I understand you correctly, you're ultimately trying to store
triples in the form "user, valueid, value", right? E.g., something
like:
</para>
<programlisting>
"user123, firstname, Paul",
"user234, lastname, Smith"
</programlisting>
<para>
(But the usernames are fixed width, and the valueids are fixed width).
</para>
<para>
And, your access pattern is along the lines of: "for user X, list the
next 30 values, starting with valueid Y". Is that right? And these
values should be returned sorted by valueid?
</para>
<para>
The tl;dr version is that you should probably go with one row per
user+value, and not build a complicated intra-row pagination scheme on
your own unless you're really sure it is needed.
</para>
<para>
Your two options mirror a common question people have when designing
HBase schemas: should I go "tall" or "wide"? Your first schema is
"tall": each row represents one value for one user, and so there are
many rows in the table for each user; the row key is user + valueid,
and there would be (presumably) a single column qualifier that means
"the value". This is great if you want to scan over rows in sorted
order by row key (thus my question above, about whether these ids are
sorted correctly). You can start a scan at any user+valueid, read the
next 30, and be done. What you're giving up is the ability to have
transactional guarantees around all the rows for one user, but it
doesn't sound like you need that. Doing it this way is generally
recommended (see
here <link xlink:href="http://hbase.apache.org/book.html#schema.smackdown">http://hbase.apache.org/book.html#schema.smackdown</link>).
</para>
<para>
Your second option is "wide": you store a bunch of values in one row,
using different qualifiers (where the qualifier is the valueid). The
simple way to do that would be to just store ALL values for one user
in a single row. I'm guessing you jumped to the "paginated" version
because you're assuming that storing millions of columns in a single
row would be bad for performance, which may or may not be true; as
long as you're not trying to do too much in a single request, or do
things like scanning over and returning all of the cells in the row,
it shouldn't be fundamentally worse. The client has methods that allow
you to get specific slices of columns.
</para>
<para>
Note that neither case fundamentally uses more disk space than the
other; you're just "shifting" part of the identifying information for
a value either to the left (into the row key, in option one) or to the
right (into the column qualifiers in option 2). Under the covers,
every key/value still stores the whole row key, and column family
name. (If this is a bit confusing, take an hour and watch Lars
George's excellent video about understanding HBase schema design:
<link xlink:href="http://www.youtube.com/watch?v=_HLoH_PgrLk)">http://www.youtube.com/watch?v=_HLoH_PgrLk)</link>.
</para>
<para>
A manually paginated version has lots more complexities, as you note,
like having to keep track of how many things are in each page,
re-shuffling if new values are inserted, etc. That seems significantly
more complex. It might have some slight speed advantages (or
disadvantages!) at extremely high throughput, and the only way to
really know that would be to try it out. If you don't have time to
build it both ways and compare, my advice would be to start with the
simplest option (one row per user+value). Start simple and iterate! :)
</para>
</section> <!-- listdata -->
<para>See the schema design case studies here: <xref linkend="schema.casestudies"/>
</para>
</section> <!-- schema design -->

View File

@ -36,7 +36,7 @@
keep elsewhere -- it should be public so it can be observed -- and you can update dev mailing list on progress. When the feature is ready for commit,
3 +1s from committers will get your feature merged<footnote><para>See <link xlink:href="http://search-hadoop.com/m/asM982C5FkS1">HBase, mail # dev - Thoughts about large feature dev branches</link></para></footnote>
</para>
</section>
</section>
<section xml:id="patchplusonepolicy">
<title>Patch +1 Policy</title>
<para>
@ -47,7 +47,7 @@ first to see if it works before we cast it in stone.
<para>
Apache HBase is made of
<link xlink:href="https://issues.apache.org/jira/browse/HBASE#selectedTab=com.atlassian.jira.plugin.system.project%3Acomponents-panel">components</link>.
Components have one or more <xref linkend="OWNER" />s. See the 'Description' field on the
Components have one or more <xref linkend="OWNER" />s. See the 'Description' field on the
<link xlink:href="https://issues.apache.org/jira/browse/HBASE#selectedTab=com.atlassian.jira.plugin.system.project%3Acomponents-panel">components</link>
JIRA page for who the current owners are by component.
</para>
@ -67,8 +67,31 @@ first pass).
Any -1 on a patch by anyone vetos a patch; it cannot be committed
until the justification for the -1 is addressed.
</para>
</section>
</section>
</section>
<section xml:id="hbase.fix.version.in.JIRA">
<title>How to set fix version in JIRA on issue resolve</title>
<para>Here is how <link xlink:href="http://search-hadoop.com/m/azemIi5RCJ1">we agreed</link> to set versions in JIRA when we
resolve an issue. If trunk is going to be 0.98.0 then:
<itemizedlist>
<listitem><para>
Commit only to trunk: Mark with 0.98
</para></listitem>
<listitem><para>
Commit to 0.95 and trunk : Mark with 0.98, and 0.95.x
</para></listitem>
<listitem><para>
Commit to 0.94.x and 0.95, and trunk: Mark with 0.98, 0.95.x, and 0.94.x
</para></listitem>
<listitem><para>
Commit to 89-fb: Mark with 89-fb.
</para></listitem>
<listitem><para>
Commit site fixes: no version
</para></listitem>
</itemizedlist>
</para>
</section>
</section>
<section xml:id="community.roles">
<title>Community Roles</title>
<section xml:id="OWNER">
@ -104,6 +127,6 @@ goals or the design toward which they are driving their component
If you would like to be volunteer as a component owner, just write the
dev list and we'll sign you up. Owners do not need to be committers.
</para>
</section>
</section>
</section>
</section>
</chapter>

View File

@ -669,7 +669,7 @@ stopping hbase...............</programlisting> Shutdown can take a moment to
The generated file is a docbook section with a glossary
in it-->
<!--presumes the pre-site target has put the hbase-default.xml at this location-->
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="../../../target/docbkx/hbase-default.xml" />
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="../../target/docbkx/hbase-default.xml" />
</section>
<section xml:id="hbase.env.sh">

View File

@ -325,16 +325,8 @@ What is the new development version for "HBase"? (org.apache.hbase:hbase) 0.92.3
<title>Updating hbase.apache.org</title>
<section xml:id="hbase.org.site.contributing">
<title>Contributing to hbase.apache.org</title>
<para>The Apache HBase apache web site (including this reference guide) is maintained as part of the main
Apache HBase source tree, under <filename>/src/main/docbkx</filename> and <filename>/src/main/site</filename>
<footnote><para>Before 0.95.0, site and reference guide were at src/docbkx and src/site respectively</para></footnote>.
The former -- docbkx -- is this reference guide as a bunch of xml marked up using <link xlink:href="http://docbook.org">docbook</link>;
the latter is the hbase site (the navbars, the header, the layout, etc.),
and some of the documentation, legacy pages mostly that are in the process of being merged into the docbkx tree that is
converted to html by a maven plugin by the site build.</para>
<para>To contribute to the reference guide, edit these files under site or docbkx and submit them as a patch
(see <xref linkend="submitting.patches"/>). Your Jira should contain a summary of the changes in each
section (see <link xlink:href="https://issues.apache.org/jira/browse/HBASE-6081">HBASE-6081</link> for an example).</para>
<para>The Apache HBase apache web site (including this reference guide) is maintained as part of the main Apache HBase source tree, under <filename>/src/docbkx</filename> and <filename>/src/site</filename>. The former is this reference guide; the latter, in most cases, are legacy pages that are in the process of being merged into the docbkx tree.</para>
<para>To contribute to the reference guide, edit these files and submit them as a patch (see <xref linkend="submitting.patches"/>). Your Jira should contain a summary of the changes in each section (see <link xlink:href="https://issues.apache.org/jira/browse/HBASE-6081">HBASE-6081</link> for an example).</para>
<para>To generate the site locally while you're working on it, run:
<programlisting>mvn site</programlisting>
Then you can load up the generated HTML files in your browser (file are under <filename>/target/site</filename>).</para>
@ -342,22 +334,20 @@ What is the new development version for "HBase"? (org.apache.hbase:hbase) 0.92.3
<section xml:id="hbase.org.site.publishing">
<title>Publishing hbase.apache.org</title>
<para>As of <link xlink:href="https://issues.apache.org/jira/browse/INFRA-5680">INFRA-5680 Migrate apache hbase website</link>,
to publish the website, build it, and then deploy it over a checkout of <filename>https://svn.apache.org/repos/asf/hbase/hbase.apache.org/trunk</filename>.
Finally, check it in. For example, if trunk is checked out out at <filename>/Users/stack/checkouts/trunk</filename>
and the hbase website, hbase.apache.org, is checked out at <filename>/Users/stack/checkouts/hbase.apache.org/trunk</filename>, to update
to publish the website, build it, and then deploy it over a checkout of <filename>https://svn.apache.org/repos/asf/hbase/hbase.apache.org/trunk</filename>,
and then check it in. For example, if trunk is checked out out at <filename>/Users/stack/checkouts/trunk</filename>
and hbase.apache.org is checked out at <filename>/Users/stack/checkouts/hbase.apache.org/trunk</filename>, to update
the site, do the following:
<programlisting>
# Build the site and deploy it to the checked out directory
# Getting the javadoc into site is a little tricky. You have to build it before you invoke 'site'.
$ MAVEN_OPTS=" -Xmx3g" mvn clean install -DskipTests javadoc:aggregate site site:stage -DstagingDirectory=/Users/stack/checkouts/hbase.apache.org/trunk
</programlisting>
Now check the deployed site by viewing in a brower, browse to file:////Users/stack/checkouts/hbase.apache.org/trunk/index.html and check all is good.
If all checks out, commit it and your new build will show up immediately at http://hbase.apache.org
<programlisting>
# Getting the javadoc into site is a little tricky. You have to build it independent, then
# 'aggregate' it at top-level so the pre-site site lifecycle step can find it; that is
# what the javadoc:javadoc and javadoc:aggregate is about.
$ MAVEN_OPTS=" -Xmx3g" mvn clean -DskipTests javadoc:aggregate site site:stage -DstagingDirectory=/Users/stack/checkouts/hbase.apache.org/trunk
# Check the deployed site by viewing in a brower.
# If all is good, commit it and it will show up at http://hbase.apache.org
#
$ cd /Users/stack/checkouts/hbase.apache.org/trunk
$ svn status
# Do an svn add of any new content...
$ svn add ....
$ svn commit -m 'Committing latest version of website...'
</programlisting>
</para>

View File

@ -28,11 +28,58 @@
-->
<title>Upgrading</title>
<para>You cannot skip major verisons upgrading. If you are upgrading from
version 0.20.x to 0.92.x, you must first go from 0.20.x to 0.90.x and then go
from 0.90.x to 0.92.x.</para>
version 0.90.x to 0.94.x, you must first go from 0.90.x to 0.92.x and then go
from 0.92.x to 0.94.x.</para>
<para>
Review <xref linkend="configuration" />, in particular the section on Hadoop version.
</para>
<section xml:id="hbase.versioning">
<title>HBase version numbers</title>
<para>HBase has not walked a straight line where version numbers are concerned.
Since we came up out of hadoop itself, we originally tracked hadoop versioning.
Later we left hadoop versioning behind because we were moving at a different rate
to that of our parent. If you are into the arcane, checkout our old wiki page
on <link xlink:href="http://wiki.apache.org/hadoop/Hbase/HBaseVersions">HBase Versioning</link>
which tries to connect the HBase version dots.</para>
<section xml:id="hbase.development.series"><title>Odd/Even Versioning or "Development"" Series Releases</title>
<para>Ahead of big releases, we have been putting up preview versions to start the
feedback cycle turning-over earlier. These "Development" Series releases,
always odd-numbered, come with no guarantees, not even regards being able
to upgrade between two sequential releases (we reserve the right to break compatibility across
"Development" Series releases). Needless to say, these releases are not for
production deploys. They are a preview of what is coming in the hope that
interested parties will take the release for a test drive and flag us early if we
there are issues we've missed ahead of our rolling a production-worthy release.
</para>
<para>Our first "Development" Series was the 0.89 set that came out ahead of
HBase 0.90.0. HBase 0.95 is another "Development" Series that portends
HBase 0.96.0.
</para>
</section>
<section xml:id="hbase.binary.compatibility">
<title>Binary Compatibility</title>
<para>When we say two HBase versions are compatible, we mean that the versions
are wire and binary compatible. Compatible HBase versions means that
clients can talk to compatible but differently versioned servers.
It means too that you can just swap out the jars of one version and replace
them with the jars of another, compatible version and all will just work.
Unless otherwise specified, HBase point versions are binary compatible.
You can safely do rolling upgrades between binary compatible versions; i.e.
across point versions: e.g. from 0.94.5 to 0.94.6<footnote><para>See
<link xlink:href="http://search-hadoop.com/m/bOOvwHGW981/Does+compatibility+between+versions+also+mean+binary+compatibility%253F&amp;subj=Re+Does+compatibility+between+versions+also+mean+binary+compatibility+">Does compatibility between versions also mean binary compatibility?</link>
discussion on the hbaes dev mailing list.
</para></footnote>.
</para>
</section>
<section xml:id="hbase.rolling.restart">
<title>Rolling Upgrade between versions/Binary compatibility</title>
<para>Unless otherwise specified, HBase point versions are binary compatible.
you can do a rolling upgrade between hbase point versions;
for example, you can go to 0.94.6 from 0.94.5 by doing a rolling upgrade across the cluster
replacing the 0.94.5 binary with a 0.94.6 binary.
</para>
</section>
</section>
<section xml:id="upgrade0.96">
<title>Upgrading from 0.94.x to 0.96.x</title>
<subtitle>The Singularity</subtitle>
@ -47,7 +94,12 @@
</section>
<section xml:id="upgrade0.94">
<title>Upgrading from 0.92.x to 0.94.x</title>
<para>0.92 and 0.94 are interface compatible. You can do a rolling upgrade between these versions.
<para>We used to think that 0.92 and 0.94 were interface compatible and that you can do a
rolling upgrade between these versions but then we figured that
<link xlink:href="https://issues.apache.org/jira/browse/HBASE-5357">HBASE-5357 Use builder pattern in HColumnDescriptor</link>
changed method signatures so rather than return void they instead return HColumnDescriptor. This
will throw <programlisting>java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HColumnDescriptor.setMaxVersions(I)V</programlisting>
.... so 0.92 and 0.94 are NOT compatible. You cannot do a rolling upgrade between them.
</para>
</section>
<section xml:id="upgrade0.92">