HBASE-4208 updates to book.xml, performance.xml

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1158427 13f79535-47bb-0310-9956-ffa450edef68
Doug Meil 2011-08-16 19:32:13 +00:00
parent 9686b3d4ba
commit f43cb3245c
2 changed files with 53 additions and 8 deletions

book.xml

@@ -108,10 +108,10 @@ TableMapReduceUtil.initTableMapperJob(
job // job instance
);</programlisting>
...and the mapper instance would extend <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapper.html">TableMapper</link>...
<programlisting>public class MyMapper extends TableMapper&lt;Text, LongWritable&gt; {
public void map(ImmutableBytesWritable row, Result value, Context context)
throws InterruptedException, IOException {
// process data for the row from the Result instance.</programlisting>
<programlisting>
public class MyMapper extends TableMapper&lt;Text, LongWritable&gt; {
public void map(ImmutableBytesWritable row, Result value, Context context) throws InterruptedException, IOException {
// process data for the row from the Result instance.</programlisting>
</para>
</section>
<section xml:id="mapreduce.htable.access">
@@ -211,7 +211,7 @@ admin.enableTable(table);
</section>
<section xml:id="keysize">
<title>Try to minimize row and column sizes</title>
<subtitle>Or why are my storefile indices large?</subtitle>
<subtitle>Or why are my StoreFile indices large?</subtitle>
<para>In HBase, values are always freighted with their coordinates; as a
cell value passes through the system, it'll be accompanied by its
row, column name, and timestamp - always. If your rows and column names
@@ -230,9 +230,25 @@ admin.enableTable(table);
Compression will also make for larger indices. See
the thread <link xlink:href="http://search-hadoop.com/m/hemBv1LiN4Q1/a+question+storefileIndexSize&amp;subj=a+question+storefileIndexSize">a question storefileIndexSize</link>
up on the user mailing list.
</para>
<para>In summary, although verbose attribute names (e.g., "myImportantAttribute") are easier to read, you pay for the clarity in storage and increased I/O - use shorter attribute names and constants.
Also, try to keep the row-keys as small as possible too.</para>
<para>Most of the time, small inefficiencies don't matter all that much. Unfortunately,
this is a case where they do. Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys, they could be repeated
several billion times in your data.</para>
<section xml:id="keysize.cf"><title>Column Families</title>
<para>Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default).
</para>
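<para>For example, a minimal sketch (the table name <code>"myTable"</code> is hypothetical) that declares a one-character family at table-creation time:
</para>
<programlisting>HBaseAdmin admin = new HBaseAdmin(conf);  // conf from HBaseConfiguration.create()
HTableDescriptor desc = new HTableDescriptor("myTable");
desc.addFamily(new HColumnDescriptor("d"));  // "d" rather than "data"
admin.createTable(desc);</programlisting>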
</section>
<section xml:id="keysize.atttributes"><title>Attributes</title>
<para>Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer shorter attribute names (e.g., "via")
to store in HBase.
</para>
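<para>One way to keep both, sketched below (the class and attribute names are hypothetical): store the short names in HBase and keep the readable names in code as constants.
</para>
<programlisting>public class MySchema {
  public static final byte[] CF  = Bytes.toBytes("d");    // ColumnFamily
  public static final byte[] VIA = Bytes.toBytes("via");  // "myVeryImportantAttribute"
}
// ...
put.add(MySchema.CF, MySchema.VIA, value);</programlisting>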
</section>
<section xml:id="keysize.row"><title>Row Key</title>
<para>Keep them as short as is reasonable such that they can still be useful for required data access (e.g., Get vs. Scan).
A short key that is useless for data access is not better than a longer key with better get/scan properties. Expect tradeoffs
when designing rowkeys.
</para>
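<para>As an illustration (the fields are hypothetical), a fixed-width binary key is much smaller than its human-readable equivalent while still sorting by user and then by time for non-negative values:
</para>
<programlisting>// readable but long, e.g. "user1234-1313520000000" (22 bytes):
byte[] verbose = Bytes.toBytes("user" + userId + "-" + timestamp);
// compact: two 8-byte longs = 16 bytes
byte[] compact = Bytes.add(Bytes.toBytes(userId), Bytes.toBytes(timestamp));</programlisting>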
</section>
</section>
<section xml:id="schema.versions">
<title>
@@ -289,6 +305,14 @@ admin.enableTable(table);
<para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
</para>
</section>
<section xml:id="ttl">
<title>Time To Live (TTL)</title>
<para>ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached.
This applies to <emphasis>all</emphasis> versions of a row - even the current one. The TTL time encoded in HBase for the row is specified in UTC.
</para>
<para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
</para>
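<para>A minimal sketch, assuming the one-character family <code>"d"</code> from above: expire cells in this family one week after they are written.
</para>
<programlisting>HColumnDescriptor family = new HColumnDescriptor("d");
family.setTimeToLive(7 * 24 * 60 * 60);  // TTL is specified in seconds</programlisting>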
</section>
<section xml:id="secondary.indexes">
<title>
Secondary Indexes and Alternate Query Paths

performance.xml

@@ -336,4 +336,25 @@ htable.close();</programlisting></para>
</section>
</section> <!-- reading -->
<section xml:id="perf.deleting">
<title>Deleting from HBase</title>
<section xml:id="perf.deleting.queue">
<title>Using HBase Tables as Queues</title>
<para>HBase tables are sometimes used as queues. In that case, special care must be taken to regularly perform major compactions on tables used in
this manner. As is documented in <xref linkend="datamodel" />, marking rows as deleted writes tombstone markers into additional StoreFiles, which then need to be processed
on reads. Tombstones are only cleaned up by major compactions.
</para>
</para>
<para>See also <xref linkend="compaction" /> and <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29">HBaseAdmin.majorCompact</link>.
</para>
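<para>A sketch of that call (the table name is hypothetical): after draining a queue-style table, request a major compaction so the tombstones are removed.
</para>
<programlisting>HBaseAdmin admin = new HBaseAdmin(conf);
admin.majorCompact("myQueueTable");  // asynchronous; the RegionServers do the work</programlisting>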
</section>
<section xml:id="perf.deleting.rpc">
<title>Delete RPC Behavior</title>
<para>Be aware that <code>htable.delete(Delete)</code> doesn't use the writeBuffer. It will execute a RegionServer RPC with each invocation.
For a large number of deletes, consider <code>htable.delete(List)</code>.
</para>
<para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29">HTable.delete</link>.
</para>
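<para>A minimal sketch of the batched form (the row keys are hypothetical):
</para>
<programlisting>List&lt;Delete&gt; deletes = new ArrayList&lt;Delete&gt;();
for (byte[] row : rowsToDelete) {
  deletes.add(new Delete(row));
}
htable.delete(deletes);  // batched, rather than one RPC per Delete</programlisting>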
</section>
</section> <!-- deleting -->
</chapter>