HBASE-4208 updates to book.xml, performance.xml
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1158427 13f79535-47bb-0310-9956-ffa450edef68
parent 9686b3d4ba
commit f43cb3245c
@@ -108,10 +108,10 @@ TableMapReduceUtil.initTableMapperJob(
 job // job instance
 );</programlisting>
 ...and the mapper instance would extend <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapper.html">TableMapper</link>...
-<programlisting>public class MyMapper extends TableMapper<Text, LongWritable> {
-public void map(ImmutableBytesWritable row, Result value, Context context)
-throws InterruptedException, IOException {
+<programlisting>
+public class MyMapper extends TableMapper<Text, LongWritable> {
+public void map(ImmutableBytesWritable row, Result value, Context context) throws InterruptedException, IOException {
 // process data for the row from the Result instance.</programlisting>
 </para>
 </section>
 <section xml:id="mapreduce.htable.access">
@@ -211,7 +211,7 @@ admin.enableTable(table);
 </section>
 <section xml:id="keysize">
 <title>Try to minimize row and column sizes</title>
-<subtitle>Or why are my storefile indices large?</subtitle>
+<subtitle>Or why are my StoreFile indices large?</subtitle>
 <para>In HBase, values are always freighted with their coordinates; as a
 cell value passes through the system, it'll be accompanied by its
 row, column name, and timestamp - always. If your rows and column names
@@ -230,9 +230,25 @@ admin.enableTable(table);
 Compression will also make for larger indices. See
 the thread <link xlink:href="http://search-hadoop.com/m/hemBv1LiN4Q1/a+question+storefileIndexSize&subj=a+question+storefileIndexSize">a question storefileIndexSize</link>
 up on the user mailing list.
-`</para>
-<para>In summary, although verbose attribute names (e.g., "myImportantAttribute") are easier to read, you pay for the clarity in storage and increased I/O - use shorter attribute names and constants.
-Also, try to keep the row-keys as small as possible too.</para>
+</para>
+<para>Most frequently, small inefficiencies don't matter all that much. Unfortunately,
+this is a case where they do. Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys, they could be repeated
+several billion times in your data.</para>
+<section xml:id="keysize.cf"><title>Column Families</title>
+<para>Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default).
+</para>
+</section>
+<section xml:id="keysize.atttributes"><title>Attributes</title>
+<para>Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer shorter attribute names (e.g., "via")
+to store in HBase.
+</para>
+</section>
+<section xml:id="keysize.row"><title>Row Key</title>
+<para>Keep them as short as is reasonable such that they can still be useful for required data access (e.g., Get vs. Scan).
+A short key that is useless for data access is not better than a longer key with better get/scan properties. Expect tradeoffs
+when designing rowkeys.
+</para>
+</section>
 </section>
 <section xml:id="schema.versions">
 <title>
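The savings described in the keysize sections above can be made concrete with a back-of-the-envelope estimate. The sketch below assumes the approximate KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type); the layout constants, the class name, and the example sizes are illustrative assumptions, not part of the patch.

```java
public class CellSizeEstimate {
    // Approximate per-cell bytes, assuming the KeyValue layout:
    // 4B key length + 4B value length + 2B row length + row
    // + 1B family length + family + qualifier + 8B timestamp + 1B type + value.
    static long cellBytes(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return 4 + 4 + 2 + rowLen + 1 + familyLen + qualifierLen + 8 + 1 + valueLen;
    }

    public static void main(String[] args) {
        // 16-byte row key and 8-byte value in both cases; only the
        // family and qualifier names differ.
        long verbose = cellBytes(16, "myColumnFamily".length(),
                                 "myVeryImportantAttribute".length(), 8);
        long terse = cellBytes(16, "d".length(), "via".length(), 8);
        // Every cell carries its coordinates, so the difference is paid
        // once per cell - several billion times over in a large table.
        System.out.println("verbose=" + verbose + " terse=" + terse
                + " savedPerCell=" + (verbose - terse));
    }
}
```

At one cell the 34-byte difference is noise; at a billion cells it is roughly 34 GB of raw key data before compression, which is exactly why the indices grow.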
@@ -289,6 +305,14 @@ admin.enableTable(table);
 <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
 </para>
 </section>
+<section xml:id="ttl">
+<title>Time To Live (TTL)</title>
+<para>ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached.
+This applies to <emphasis>all</emphasis> versions of a row - even the current one. The TTL time encoded in HBase for the row is specified in UTC.
+</para>
+<para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> for more information.
+</para>
+</section>
 <section xml:id="secondary.indexes">
 <title>
 Secondary Indexes and Alternate Query Paths
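To make the TTL semantics above concrete, here is a minimal sketch of the expiry arithmetic: TTLs are configured in seconds, cell timestamps are UTC epoch milliseconds, and a cell is considered expired once its age reaches the family's TTL. The method name and the exact `>=` comparison are assumptions for illustration, not taken from HBase internals.

```java
public class TtlCheck {
    // Sketch of a TTL expiry test. ttlSeconds mirrors the per-ColumnFamily
    // TTL setting; timestamps are UTC milliseconds, so no timezone math
    // is involved - which is why the stored TTL time is "specified in UTC".
    static boolean isExpired(long cellTsMillis, int ttlSeconds, long nowMillis) {
        return nowMillis - cellTsMillis >= ttlSeconds * 1000L;
    }

    public static void main(String[] args) {
        long writtenAt = 1_700_000_000_000L; // UTC epoch millis (example value)
        int ttl = 86_400;                    // one day, in seconds
        System.out.println(isExpired(writtenAt, ttl, writtenAt + 86_399_000L)); // still live
        System.out.println(isExpired(writtenAt, ttl, writtenAt + 86_400_000L)); // expired
    }
}
```

Note the check depends only on the cell timestamp, not on which version a cell is, matching the statement that TTL applies to all versions of a row including the current one.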
@@ -336,4 +336,25 @@ htable.close();</programlisting></para>
 </section>
 
 </section> <!-- reading -->
+
+<section xml:id="perf.deleting">
+<title>Deleting from HBase</title>
+<section xml:id="perf.deleting.queue">
+<title>Using HBase Tables as Queues</title>
+<para>HBase tables are sometimes used as queues. In this case, special care must be taken to regularly perform major compactions on tables used in
+this manner. As is documented in <xref linkend="datamodel" />, marking rows as deleted creates additional StoreFiles which then need to be processed
+on reads. Tombstones only get cleaned up with major compactions.
+</para>
+<para>See also <xref linkend="compaction" /> and <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29">HBaseAdmin.majorCompact</link>.
+</para>
+</section>
+<section xml:id="perf.deleting.rpc">
+<title>Delete RPC Behavior</title>
+<para>Be aware that <code>htable.delete(Delete)</code> doesn't use the writeBuffer. It will execute a RegionServer RPC with each invocation.
+For a large number of deletes, consider <code>htable.delete(List)</code>.
+</para>
+<para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29">HTable.delete</link>.
+</para>
+</section>
+</section> <!-- deleting -->
 </chapter>
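The RPC behavior described in the perf.deleting.rpc section can be illustrated with a toy client. Everything here (`DeleteRpcSketch`, the counter, the two overloads) is a hypothetical stand-in for the real HTable API, and the real client groups a batched delete by region rather than issuing strictly one request, so the batch costs "one round trip" only in this simplified model.

```java
import java.util.ArrayList;
import java.util.List;

public class DeleteRpcSketch {
    // Hypothetical stand-in for the RegionServer round trip; just counts calls.
    static int rpcCount = 0;

    // Mirrors htable.delete(Delete): one RPC per invocation, no writeBuffer.
    static void delete(String row) {
        rpcCount++;
    }

    // Mirrors htable.delete(List): the whole batch travels together.
    // (Simplification: the real client issues one request per region
    // holding affected rows, not one overall.)
    static void delete(List<String> rows) {
        if (!rows.isEmpty()) {
            rpcCount++;
        }
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            rows.add("row-" + i);
        }
        for (String row : rows) {
            delete(row);              // 1,000 round trips, one per Delete
        }
        int oneByOne = rpcCount;
        rpcCount = 0;
        delete(rows);                 // a single (simplified) round trip
        System.out.println(oneByOne + " RPCs vs " + rpcCount);
    }
}
```

The per-invocation cost is why a delete-heavy "queue" workload generates both heavy RPC traffic and the tombstone buildup that only a major compaction clears.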