hbase-5404. book.xml, performance.xml - more info on compression and schema design
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1244649 13f79535-47bb-0310-9956-ffa450edef68
parent 71682997f3
commit 421c120f4a
@@ -648,15 +648,17 @@ admin.enableTable(table);
 <para>Most of the time small inefficiencies don't matter all that much. Unfortunately,
 this is a case where they do. Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys they could be repeated
 several billion times in your data. </para>
-<para>See <xref linkend="keyvalue"/> for more information on HBase stores data internally.</para>
+<para>See <xref linkend="keyvalue"/> for more information on HBase stores data internally to see why this is important.</para>
 <section xml:id="keysize.cf"><title>Column Families</title>
 <para>Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default).
 </para>
+<para>See <xref linkend="keyvalue"/> for more information on HBase stores data internally to see why this is important.</para>
 </section>
 <section xml:id="keysize.atttributes"><title>Attributes</title>
 <para>Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer shorter attribute names (e.g., "via")
 to store in HBase.
 </para>
+<para>See <xref linkend="keyvalue"/> for more information on HBase stores data internally to see why this is important.</para>
 </section>
 <section xml:id="keysize.row"><title>Rowkey Length</title>
 <para>Keep them as short as is reasonable such that they can still be useful for required data access (e.g., Get vs. Scan).
@@ -692,6 +694,7 @@ System.out.println("md5 digest as string length: " + sbDigest.length); // ret
 </programlisting>
 </para>
 </section>
+
 </section>
 <section xml:id="reverse.timestamp"><title>Reverse Timestamps</title>
 <para>A common problem in database processing is quickly finding the most recent version of a value. A technique using reverse timestamps
@@ -888,7 +891,7 @@ System.out.println("md5 digest as string length: " + sbDigest.length); // ret
 </section>
 <section xml:id="schema.ops"><title>Operational and Performance Configuration Options</title>
 <para>See the Performance section <xref linkend="perf.schema"/> for more information operational and performance
-schema design options, such as Bloom Filters, Table-configured regionsizes, and blocksizes.
+schema design options, such as Bloom Filters, Table-configured regionsizes, compression, and blocksizes.
 </para>
 </section>
 
@@ -198,7 +198,8 @@
 </section>
 <section xml:id="perf.schema.keys">
 <title>Key and Attribute Lengths</title>
-<para>See <xref linkend="keysize" />.</para>
+<para>See <xref linkend="keysize" />. See also <xref linkend="perf.compression.however" /> for
+compression caveats.</para>
 </section>
 <section xml:id="schema.regionsize"><title>Table RegionSize</title>
 <para>The regionsize can be set on a per-table basis via <code>setFileSize</code> on
@@ -244,6 +245,15 @@
 <title>Compression</title>
 <para>Production systems should use compression with their ColumnFamily definitions. See <xref linkend="compression" /> for more information.
 </para>
+<section xml:id="perf.compression.however"><title>However...</title>
+<para>Compression deflates data <emphasis>on disk</emphasis>. When it's in-memory (e.g., in the
+MemStore) or on the wire (e.g., transferring between RegionServer and Client) it's inflated.
+So while using ColumnFamily compression is a best practice, but it's not going to completely eliminate
+the impact of over-sized Keys, over-sized ColumnFamily names, or over-sized Column names.
+</para>
+<para>See <xref linkend="keysize" /> on for schema design tips, and <xref linkend="keyvalue"/> for more information on HBase stores data internally.
+</para>
+</section>
 </section>
 </section> <!-- perf schema -->
 
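Note on the change above: the new "However..." paragraph argues that rowkey, ColumnFamily, and attribute names are stored with every cell, and that compression only shrinks them on disk. The following is a minimal sketch of that arithmetic, not HBase API code. It assumes the classic KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type), and it uses java.util.zip.Deflater purely as a stand-in for whichever codec a ColumnFamily would be configured with. The class name, method names, and sample rowkey are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/** Back-of-the-envelope sizing for HBase cells; illustrative only, not HBase API code. */
final class KeySizeSketch {

    // Assumed fixed fields of the classic KeyValue layout:
    // key length (4) + value length (4) + row length (2)
    // + family length (1) + timestamp (8) + key type (1).
    static final int FIXED_OVERHEAD_BYTES = 4 + 4 + 2 + 1 + 8 + 1;

    /** Uncompressed size of one cell, as it would sit in memory or travel on the wire. */
    static long cellSize(String row, String family, String qualifier, int valueLength) {
        return FIXED_OVERHEAD_BYTES
                + utf8Length(row)
                + utf8Length(family)
                + utf8Length(qualifier)
                + valueLength;
    }

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Deflates the input and returns the compressed length in bytes (stand-in codec). */
    static int deflatedLength(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        String row = "user0000000001"; // hypothetical rowkey
        long verbose = cellSize(row, "data", "myVeryImportantAttribute", 8);
        long terse = cellSize(row, "d", "via", 8);
        System.out.println("verbose cell bytes: " + verbose);
        System.out.println("terse cell bytes:   " + terse);

        // Repeat the verbose family:qualifier pair to mimic many cells in one block.
        StringBuilder block = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            block.append("data:myVeryImportantAttribute");
        }
        byte[] raw = block.toString().getBytes(StandardCharsets.UTF_8);
        System.out.println("in-memory / on-wire bytes: " + raw.length);
        System.out.println("deflated (on-disk) bytes:  " + deflatedLength(raw));
    }
}
```

The per-cell difference looks small, but it is paid once per stored value, and the deflated figure applies only to the on-disk form. The MemStore and the client RPC path would still carry the full-length names.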