hbase-5404. book.xml, performance.xml - more info on compression and schema design

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1244649 13f79535-47bb-0310-9956-ffa450edef68
Doug Meil 2012-02-15 19:08:05 +00:00
parent 71682997f3
commit 421c120f4a
2 changed files with 16 additions and 3 deletions

@@ -648,15 +648,17 @@ admin.enableTable(table);
<para>Most of the time small inefficiencies don't matter all that much. Unfortunately,
this is a case where they do. Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys, they could be repeated
several billion times in your data. </para>
<para>See <xref linkend="keyvalue"/> for more information on how HBase stores data internally.</para>
<para>See <xref linkend="keyvalue"/> for more information on how HBase stores data internally, which explains why this is important.</para>
<section xml:id="keysize.cf"><title>Column Families</title>
<para>Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default).
</para>
<para>See <xref linkend="keyvalue"/> for more information on how HBase stores data internally, which explains why this is important.</para>
</section>
<section xml:id="keysize.atttributes"><title>Attributes</title>
<para>Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer shorter attribute names (e.g., "via")
to store in HBase.
</para>
<para>See <xref linkend="keyvalue"/> for more information on how HBase stores data internally, which explains why this is important.</para>
</section>
<section xml:id="keysize.row"><title>Rowkey Length</title>
<para>Keep them as short as is reasonable such that they can still be useful for required data access (e.g., Get vs. Scan).
@@ -692,6 +694,7 @@ System.out.println("md5 digest as string length: " + sbDigest.length); // ret
</programlisting>
</para>
</section>
</section>
<section xml:id="reverse.timestamp"><title>Reverse Timestamps</title>
<para>A common problem in database processing is quickly finding the most recent version of a value. A technique using reverse timestamps
@@ -888,7 +891,7 @@ System.out.println("md5 digest as string length: " + sbDigest.length); // ret
</section>
<section xml:id="schema.ops"><title>Operational and Performance Configuration Options</title>
<para>See the Performance section <xref linkend="perf.schema"/> for more information on operational and performance
schema design options, such as Bloom Filters, Table-configured regionsizes, and blocksizes.
schema design options, such as Bloom Filters, Table-configured regionsizes, compression, and blocksizes.
</para>
</section>

@@ -198,7 +198,8 @@
</section>
<section xml:id="perf.schema.keys">
<title>Key and Attribute Lengths</title>
<para>See <xref linkend="keysize" />.</para>
<para>See <xref linkend="keysize" />. See also <xref linkend="perf.compression.however" /> for
compression caveats.</para>
</section>
<section xml:id="schema.regionsize"><title>Table RegionSize</title>
<para>The regionsize can be set on a per-table basis via <code>setFileSize</code> on
@@ -244,6 +245,15 @@
<title>Compression</title>
<para>Production systems should use compression with their ColumnFamily definitions. See <xref linkend="compression" /> for more information.
</para>
<section xml:id="perf.compression.however"><title>However...</title>
<para>Compression deflates data <emphasis>on disk</emphasis>. When it's in-memory (e.g., in the
MemStore) or on the wire (e.g., transferring between RegionServer and Client) it's inflated.
So while using ColumnFamily compression is a best practice, it's not going to completely eliminate
the impact of over-sized Keys, over-sized ColumnFamily names, or over-sized Column names.
</para>
<para>See <xref linkend="keysize" /> for schema design tips, and <xref linkend="keyvalue"/> for more information on how HBase stores data internally.
</para>
</section>
</section>
</section> <!-- perf schema -->
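
The key-size advice in these hunks can be made concrete with a little arithmetic. The following is a minimal sketch, not HBase code: it assumes the KeyValue layout that the book's keyvalue section describes (fixed-width length fields, a timestamp, and a key type stored around the row, family, and qualifier bytes of every cell). The class name `KeyValueSizeSketch`, the sample rowkey, and the 8-byte value are hypothetical and chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;

public class KeyValueSizeSketch {
    // Assumed fixed bytes per cell: key length (4) + value length (4)
    // + row length (2) + family length (1) + timestamp (8) + key type (1).
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    // Number of bytes a string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Approximate uncompressed size of one cell, in bytes.
    static long cellSize(String row, String family, String qualifier, int valueLength) {
        return FIXED_OVERHEAD + utf8Length(row) + utf8Length(family)
                + utf8Length(qualifier) + valueLength;
    }

    public static void main(String[] args) {
        String row = "user12345678";   // hypothetical 12-byte rowkey
        int valueLength = 8;           // hypothetical 8-byte value
        long cells = 1_000_000_000L;   // one billion cells

        long verbose = cellSize(row, "data", "myVeryImportantAttribute", valueLength);
        long terse = cellSize(row, "d", "via", valueLength);

        System.out.println("verbose names: " + verbose + " bytes per cell");
        System.out.println("short names:   " + terse + " bytes per cell");
        System.out.println("extra bytes across " + cells + " cells: "
                + (verbose - terse) * cells);
    }
}
```

With these assumed inputs the sketch reports 68 bytes per cell for the verbose names and 44 bytes for the short ones, a difference of 24 bytes that is repeated for every cell. These are uncompressed sizes, which is the form the data takes in the MemStore and on the wire, so they apply even when ColumnFamily compression is enabled.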