HBASE-4566 book.xml,ops_mgt.xml - KeyValue documentation
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1181091 13f79535-47bb-0310-9956-ffa450edef68
parent
f4d8833824
commit
5364f98a32
|
@ -312,7 +312,7 @@ public static class MyReducer extends TableReducer<Text, IntWritable, Immutab
|
||||||
<para>A good general introduction on the strengths and weaknesses of modelling on
|
<para>A good general introduction on the strengths and weaknesses of modelling on
|
||||||
the various non-RDBMS datastores is Ian Varley's Master's thesis,
|
the various non-RDBMS datastores is Ian Varley's Master's thesis,
|
||||||
<link xlink:href="http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf">No Relation: The Mixed Blessings of Non-Relational Databases</link>.
|
<link xlink:href="http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf">No Relation: The Mixed Blessings of Non-Relational Databases</link>.
|
||||||
Recommended.
|
Recommended. Also, read <xref linkend="keyvalue"/> for how HBase stores data internally.
|
||||||
</para>
|
</para>
|
||||||
<section xml:id="schema.creation">
|
<section xml:id="schema.creation">
|
||||||
<title>
|
<title>
|
||||||
|
@ -400,7 +400,7 @@ admin.enableTable(table);
|
||||||
</para>
|
</para>
|
||||||
<para>Most of the time small inefficiencies don't matter all that much. Unfortunately,
|
<para>Most of the time small inefficiencies don't matter all that much. Unfortunately,
|
||||||
this is a case where they do. Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys they could be repeated
|
this is a case where they do. Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys they could be repeated
|
||||||
several billion times in your data</para>
|
several billion times in your data. See <xref linkend="keyvalue"/> for more information on how HBase stores data internally.</para>
|
||||||
<section xml:id="keysize.cf"><title>Column Families</title>
|
<section xml:id="keysize.cf"><title>Column Families</title>
|
||||||
<para>Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default).
|
<para>Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default).
|
||||||
</para>
|
</para>
|
||||||
|
@ -1615,6 +1615,8 @@ scan.setFilter(filter);
|
||||||
Schubert Zhang's blog post on <link xlink:ref="http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html">HFile: A Block-Indexed File Format to Store Sorted Key-Value Pairs</link> makes for a thorough introduction to HBase's hfile. Matteo Bertozzi has also put up a
|
Schubert Zhang's blog post on <link xlink:ref="http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html">HFile: A Block-Indexed File Format to Store Sorted Key-Value Pairs</link> makes for a thorough introduction to HBase's hfile. Matteo Bertozzi has also put up a
|
||||||
helpful description, <link xlink:href="http://th30z.blogspot.com/2011/02/hbase-io-hfile.html?spref=tw">HBase I/O: HFile</link>.
|
helpful description, <link xlink:href="http://th30z.blogspot.com/2011/02/hbase-io-hfile.html?spref=tw">HBase I/O: HFile</link>.
|
||||||
</para>
|
</para>
|
||||||
|
<para>For more information, see the <link xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/HFile.html">HFile source code</link>.
|
||||||
|
</para>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section xml:id="hfile_tool">
|
<section xml:id="hfile_tool">
|
||||||
|
@ -1631,6 +1633,40 @@ scan.setFilter(filter);
|
||||||
tool.</para>
|
tool.</para>
|
||||||
</section>
|
</section>
|
||||||
</section>
|
</section>
|
||||||
|
<section xml:id="hfile.blocks">
|
||||||
|
<title>Blocks</title>
|
||||||
|
<para>StoreFiles are composed of blocks. The blocksize is configured on a per-ColumnFamily basis.
|
||||||
|
</para>
|
||||||
|
<para>For more information, see the <link xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/HFileBlock.html">HFileBlock source code</link>.
|
||||||
|
</para>
|
||||||
|
</section>
|
||||||
|
<section xml:id="keyvalue">
|
||||||
|
<title>KeyValue</title>
|
||||||
|
<para>The KeyValue class is the heart of data storage in HBase. KeyValue wraps a byte array and takes an offset and a length into that array
|
||||||
|
that specify where to start interpreting the content as a KeyValue.
|
||||||
|
</para>
|
||||||
|
<para>The KeyValue format inside a byte array is:
|
||||||
|
<itemizedlist>
|
||||||
|
<listitem>keylength</listitem>
|
||||||
|
<listitem>valuelength</listitem>
|
||||||
|
<listitem>key</listitem>
|
||||||
|
<listitem>value</listitem>
|
||||||
|
</itemizedlist>
|
||||||
|
</para>
|
||||||
|
<para>The Key is further decomposed as:
|
||||||
|
<itemizedlist>
|
||||||
|
<listitem>rowlength</listitem>
|
||||||
|
<listitem>row (i.e., the rowkey)</listitem>
|
||||||
|
<listitem>columnfamilylength</listitem>
|
||||||
|
<listitem>columnfamily</listitem>
|
||||||
|
<listitem>columnqualifier</listitem>
|
||||||
|
<listitem>timestamp</listitem>
|
||||||
|
<listitem>keytype (e.g., Put, Delete)</listitem>
|
||||||
|
</itemizedlist>
|
||||||
|
</para>
|
||||||
|
<para>For more information, see the <link xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/KeyValue.html">KeyValue source code</link>.
|
||||||
|
</para>
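<para>As an illustration of the layout listed above, the following is a minimal sketch that packs the fields into a byte array in the same order. It is not the HBase implementation. The class name <code>KeyValueLayoutSketch</code> is hypothetical, and the field widths (4-byte key and value lengths, 2-byte row length, 1-byte ColumnFamily length, 8-byte timestamp, 1-byte key type) and the Put type code of 4 are assumptions taken from the KeyValue source rather than from this section, so check them against the linked source before relying on them.</para>

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the KeyValue byte layout; not the HBase class itself.
final class KeyValueLayoutSketch {
    // Assumed field widths in bytes (see the KeyValue source code to confirm).
    static final int KEY_LENGTH_SIZE = 4;
    static final int VALUE_LENGTH_SIZE = 4;
    static final int ROW_LENGTH_SIZE = 2;
    static final int FAMILY_LENGTH_SIZE = 1;
    static final int TIMESTAMP_SIZE = 8;
    static final int KEY_TYPE_SIZE = 1;
    static final byte TYPE_PUT = 4; // assumed type code for Put

    // Length of the key portion: rowlength, row, columnfamilylength,
    // columnfamily, columnqualifier, timestamp, keytype.
    static int keyLength(int rowLen, int familyLen, int qualifierLen) {
        return ROW_LENGTH_SIZE + rowLen + FAMILY_LENGTH_SIZE + familyLen
                + qualifierLen + TIMESTAMP_SIZE + KEY_TYPE_SIZE;
    }

    // Packs the fields in the documented order: keylength, valuelength, key, value.
    static byte[] encode(byte[] row, byte[] family, byte[] qualifier,
                         long timestamp, byte keyType, byte[] value) {
        int keyLen = keyLength(row.length, family.length, qualifier.length);
        ByteBuffer buf = ByteBuffer.allocate(
                KEY_LENGTH_SIZE + VALUE_LENGTH_SIZE + keyLen + value.length);
        buf.putInt(keyLen);               // keylength
        buf.putInt(value.length);         // valuelength
        buf.putShort((short) row.length); // rowlength
        buf.put(row);                     // row (the rowkey)
        buf.put((byte) family.length);    // columnfamilylength
        buf.put(family);                  // columnfamily
        buf.put(qualifier);               // columnqualifier
        buf.putLong(timestamp);           // timestamp
        buf.put(keyType);                 // keytype
        buf.put(value);                   // value
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] kv = encode(
                "row1".getBytes(StandardCharsets.UTF_8),
                "d".getBytes(StandardCharsets.UTF_8),
                "q".getBytes(StandardCharsets.UTF_8),
                1L, TYPE_PUT,
                "v".getBytes(StandardCharsets.UTF_8));
        System.out.println("serialized length = " + kv.length);
    }
}
```

<para>Under these assumptions, every cell carries 20 bytes of fixed overhead in addition to the rowkey, ColumnFamily name, column qualifier, and value, which is why short names matter when the pattern repeats billions of times.</para>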
|
||||||
|
</section>
|
||||||
<section xml:id="compaction">
|
<section xml:id="compaction">
|
||||||
<title>Compaction</title>
|
<title>Compaction</title>
|
||||||
<para>There are two types of compactions: minor and major. Minor compactions will usually pick up a couple of the smaller adjacent
|
<para>There are two types of compactions: minor and major. Minor compactions will usually pick up a couple of the smaller adjacent
|
||||||
|
|
|
@ -301,6 +301,32 @@ false
|
||||||
<para>Since the cluster is up, there is a risk that edits could be missed in the export process.
|
<para>Since the cluster is up, there is a risk that edits could be missed in the export process.
|
||||||
</para>
|
</para>
|
||||||
</section>
|
</section>
|
||||||
|
</section> <!-- backup -->
|
||||||
|
<section xml:id="ops.capacity"><title>Capacity Planning</title>
|
||||||
|
<section xml:id="ops.capacity.storage"><title>Storage</title>
|
||||||
|
<para>A common question for HBase administrators is estimating how much storage will be required for an HBase cluster.
|
||||||
|
There are several aspects to consider, the most important of which is what data is loaded into the cluster. Start
|
||||||
|
with a solid understanding of how HBase handles data internally (KeyValue).
|
||||||
|
</para>
|
||||||
|
<section xml:id="ops.capacity.storage.kv"><title>KeyValue</title>
|
||||||
|
<para>HBase storage will be dominated by KeyValues. See <xref linkend="keyvalue" /> and <xref linkend="keysize" /> for
|
||||||
|
how HBase stores data internally.
|
||||||
|
</para>
|
||||||
|
<para>It is critical to understand that there is a KeyValue instance for every attribute stored in a row, and the
|
||||||
|
rowkey-length, ColumnFamily name-length and attribute lengths will drive the size of the database more than any other
|
||||||
|
factor.
|
||||||
|
</para>
|
||||||
|
</section>
|
||||||
|
<section xml:id="ops.capacity.storage.sf"><title>StoreFiles and Blocks</title>
|
||||||
|
<para>KeyValue instances are aggregated into blocks, and the blocksize is configurable on a per-ColumnFamily basis.
|
||||||
|
Blocks are aggregated into StoreFiles. See <xref linkend="regions.arch" />.
|
||||||
|
</para>
|
||||||
|
</section>
|
||||||
|
<section xml:id="ops.capacity.storage.hdfs"><title>HDFS Block Replication</title>
|
||||||
|
<para>Because HBase runs on top of HDFS, factor HDFS block replication into storage calculations.
|
||||||
|
</para>
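<para>The storage considerations in this section can be combined into a rough back-of-the-envelope estimate. The sketch below is only illustrative: the class and method names are hypothetical, the 20 bytes of fixed per-KeyValue overhead is an assumption derived from the KeyValue source, and the estimate ignores compression, block indexes, the write-ahead log, and files awaiting compaction. The replication factor is passed in explicitly; 3 is the usual HDFS default, but confirm the value configured for the cluster.</para>

```java
// Hypothetical capacity estimate; a sketch under stated assumptions, not an HBase API.
final class StorageEstimateSketch {
    // Assumed fixed bytes per KeyValue: keylength(4) + valuelength(4)
    // + rowlength(2) + columnfamilylength(1) + timestamp(8) + keytype(1).
    static final long FIXED_OVERHEAD_PER_KEYVALUE = 20L;

    // Size of one stored attribute (one KeyValue), before HDFS replication.
    static long bytesPerKeyValue(int rowKeyLen, int familyLen,
                                 int qualifierLen, int valueLen) {
        return FIXED_OVERHEAD_PER_KEYVALUE + rowKeyLen + familyLen
                + qualifierLen + valueLen;
    }

    // There is one KeyValue per attribute per row, and HDFS stores
    // each block `replication` times.
    static long estimateBytes(long rows, int attributesPerRow, int rowKeyLen,
                              int familyLen, int qualifierLen, int valueLen,
                              int replication) {
        long keyValues = rows * attributesPerRow;
        long raw = keyValues * bytesPerKeyValue(
                rowKeyLen, familyLen, qualifierLen, valueLen);
        return raw * replication;
    }

    public static void main(String[] args) {
        // Example inputs are made up for illustration.
        long bytes = estimateBytes(1_000_000L, 10, 16, 1, 8, 100, 3);
        System.out.println("estimated bytes on HDFS = " + bytes);
    }
}
```

<para>With these made-up inputs, shortening the column qualifier from 8 bytes to 1 byte removes 7 bytes from each of the 10 million KeyValues before replication is applied, which illustrates why key and name lengths dominate the estimate.</para>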
|
||||||
|
</section>
|
||||||
|
</section>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
</chapter>
|
</chapter>
|
||||||
|
|