HBASE-11932 Docbook html-single build improvements

commit 2816487247
parent eec15bd172

pom.xml
@@ -807,6 +807,7 @@
 <phase>pre-site</phase>
 <configuration>
   <targetDirectory>${basedir}/target/docbkx/</targetDirectory>
+  <includes>book.xml</includes>
   <preProcess>
     <copy todir="target/docbkx/images">
       <fileset dir="src/main/site/resources/images/"/>
@@ -31,18 +31,22 @@
 <para>As we will be discussing changes to the HFile format, it is useful to give a short overview of the original (HFile version 1) format.</para>
 <section xml:id="hfilev1.overview">
   <title>Overview of Version 1</title>
-  <para>An HFile in version 1 format is structured as follows:
-    <inlinemediaobject>
-      <imageobject>
-        <imagedata align="center" valign="middle" fileref="hfile.png" />
-      </imageobject>
-      <textobject>
-        <phrase>HFile Version 1</phrase>
-      </textobject>
-    </inlinemediaobject>
-    <footnote><para>Image courtesy of Lars George, <link xlink:href="http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html">hbase-architecture-101-storage.html</link>.</para></footnote>
-  </para>
+  <para>An HFile in version 1 format is structured as follows:</para>
+  <figure>
+    <title>HFile V1 Format</title>
+    <mediaobject>
+      <imageobject>
+        <imagedata align="center" valign="middle" fileref="hfile.png"/>
+      </imageobject>
+      <textobject>
+        <phrase>HFile Version 1</phrase>
+      </textobject>
+      <caption><para>Image courtesy of Lars George, <link
+        xlink:href="http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html"
+        >hbase-architecture-101-storage.html</link>.</para></caption>
+    </mediaobject>
+  </figure>

 </section>
 <section><title> Block index format in version 1 </title>
   <para>The block index in version 1 is very straightforward. For each entry, it contains: </para>
@@ -639,24 +639,19 @@ try {
   store file, the most recent values are found first.</para>

 <para>There is a lot of confusion over the semantics of <literal>cell</literal> versions, in
-  HBase. In particular, a couple questions that often come up are:</para>
+  HBase. In particular:</para>
 <itemizedlist>
   <listitem>
-    <para>If multiple writes to a cell have the same version, are all versions maintained or
-      just the last?<footnote>
-        <para>Currently, only the last written is fetchable.</para>
-      </footnote></para>
+    <para>If multiple writes to a cell have the same version, only the last written is
+      fetchable.</para>
   </listitem>

   <listitem>
-    <para>Is it OK to write cells in a non-increasing version order?<footnote>
-      <para>Yes</para>
-    </footnote></para>
+    <para>It is OK to write cells in a non-increasing version order.</para>
   </listitem>
 </itemizedlist>

-<para>Below we describe how the version dimension in HBase currently works<footnote>
-  <para>See <link
+<para>Below we describe how the version dimension in HBase currently works. See <link
   xlink:href="https://issues.apache.org/jira/browse/HBASE-2406">HBASE-2406</link> for
   discussion of HBase versions. <link
   xlink:href="http://outerthought.org/blog/417-ot.html">Bending time in HBase</link>
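Aside (not part of the commit): the two points above can be demonstrated with a short client sketch. It assumes the 0.94/0.98-era HTable/Put/Get API and made-up table, row and column names.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SameVersionWrites {
      public static void main(String[] args) throws Exception {
        HTable htable = new HTable(HBaseConfiguration.create(), "t1"); // hypothetical table
        byte[] row = Bytes.toBytes("r1");
        byte[] cf = Bytes.toBytes("cf");
        byte[] q = Bytes.toBytes("q");
        long ts = 100L;

        Put first = new Put(row);
        first.add(cf, q, ts, Bytes.toBytes("first"));    // write at version 100
        htable.put(first);
        Put second = new Put(row);
        second.add(cf, q, ts, Bytes.toBytes("second"));  // second write at the same version
        htable.put(second);

        Get get = new Get(row);
        get.setMaxVersions();                            // request all versions
        Result result = htable.get(get);
        // Per the text, only the last value written at this version is fetchable.
        System.out.println(Bytes.toString(result.getValue(cf, q)));
        htable.close();
      }
    }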
@@ -665,7 +660,6 @@ try {
   <emphasis>Overwriting values at existing timestamps</emphasis> mentioned in the
   article no longer holds in HBase. This section is basically a synopsis of this article
   by Bruno Dumon.</para>
-</footnote>.</para>

 <section
   xml:id="versions.ops">
@@ -783,11 +777,10 @@ htable.put(put);
   xml:id="version.delete">
   <title>Delete</title>

-  <para>There are three different types of internal delete markers <footnote>
-    <para>See Lars Hofhansl's blog for discussion of his attempt adding another, <link
+  <para>There are three different types of internal delete markers. See Lars Hofhansl's blog
+    for discussion of his attempt adding another, <link
     xlink:href="http://hadoop-hbase.blogspot.com/2012/01/scanning-in-hbase.html">Scanning
-    in HBase: Prefix Delete Marker</link></para>
-  </footnote>: </para>
+    in HBase: Prefix Delete Marker</link>. </para>
   <itemizedlist>
     <listitem>
       <para>Delete: for a specific version of a column.</para>
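Aside (not part of the commit): the three marker types correspond to three Delete calls in the 0.94/0.98-era client API; a rough sketch with invented names.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DeleteMarkerTypes {
      public static void main(String[] args) throws Exception {
        HTable htable = new HTable(HBaseConfiguration.create(), "t1"); // hypothetical table
        byte[] row = Bytes.toBytes("r1");
        byte[] cf = Bytes.toBytes("cf");
        byte[] q = Bytes.toBytes("q");

        Delete versionDelete = new Delete(row);
        versionDelete.deleteColumn(cf, q, 100L);   // Delete: one specific version of a column
        Delete columnDelete = new Delete(row);
        columnDelete.deleteColumns(cf, q);         // Delete column: all versions of a column
        Delete familyDelete = new Delete(row);
        familyDelete.deleteFamily(cf);             // Delete family: all columns of a family

        htable.delete(versionDelete);              // each delete writes the matching marker
        htable.delete(columnDelete);
        htable.delete(familyDelete);
        htable.close();
      }
    }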
@@ -808,11 +801,10 @@ htable.put(put);
   modifies data in place, so for example a delete will not immediately delete (or mark as
   deleted) the entries in the storage file that correspond to the delete condition.
   Rather, a so-called <emphasis>tombstone</emphasis> is written, which will mask the
-  deleted values<footnote>
-    <para>When HBase does a major compaction, the tombstones are processed to actually
-      remove the dead values, together with the tombstones themselves.</para>
-  </footnote>. If the version you specified when deleting a row is larger than the version
-  of any value in the row, then you can consider the complete row to be deleted.</para>
+  deleted values. When HBase does a major compaction, the tombstones are processed to
+  actually remove the dead values, together with the tombstones themselves. If the version
+  you specified when deleting a row is larger than the version of any value in the row,
+  then you can consider the complete row to be deleted.</para>
 <para>For an informative discussion on how deletes and versioning interact, see the thread <link
   xlink:href="http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/28421">Put w/
   timestamp -> Deleteall -> Put w/ timestamp fails</link> up on the user mailing
@@ -846,10 +838,8 @@ htable.put(put);
   <title>Deletes mask Puts</title>

   <para>Deletes mask puts, even puts that happened after the delete
-    was entered<footnote>
-      <para><link
-        xlink:href="https://issues.apache.org/jira/browse/HBASE-2256">HBASE-2256</link></para>
-    </footnote>. Remember that a delete writes a tombstone, which only
+    was entered. See <link xlink:href="https://issues.apache.org/jira/browse/HBASE-2256"
+    >HBASE-2256</link>. Remember that a delete writes a tombstone, which only
     disappears after then next major compaction has run. Suppose you do
     a delete of everything <= T. After this you do a new put with a
     timestamp <= T. This put, even if it happened after the delete,
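Aside (not part of the commit): a sketch of the scenario described above, again with the era's API and invented names; the get returns nothing until a major compaction removes the tombstone.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DeletesMaskPuts {
      public static void main(String[] args) throws Exception {
        HTable htable = new HTable(HBaseConfiguration.create(), "t1"); // hypothetical table
        byte[] row = Bytes.toBytes("r1");
        byte[] cf = Bytes.toBytes("cf");
        byte[] q = Bytes.toBytes("q");
        long T = 1000L;

        Delete delete = new Delete(row);
        delete.deleteColumns(cf, q, T);            // delete everything <= T (writes a tombstone)
        htable.delete(delete);

        Put put = new Put(row);
        put.add(cf, q, T - 1, Bytes.toBytes("v")); // issued after the delete, but timestamp <= T
        htable.put(put);

        byte[] value = htable.get(new Get(row)).getValue(cf, q);
        System.out.println(value == null ? "masked by tombstone" : Bytes.toString(value));
        htable.close();
      }
    }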
@@ -868,14 +858,12 @@ htable.put(put);
   <title>Major compactions change query results</title>

   <para><quote>...create three cell versions at t1, t2 and t3, with a maximum-versions
     setting of 2. So when getting all versions, only the values at t2 and t3 will be
     returned. But if you delete the version at t2 or t3, the one at t1 will appear again.
-    Obviously, once a major compaction has run, such behavior will not be the case anymore...<footnote>
-      <para>See <emphasis>Garbage Collection</emphasis> in <link
+    Obviously, once a major compaction has run, such behavior will not be the case
+    anymore...</quote> (See <emphasis>Garbage Collection</emphasis> in <link
     xlink:href="http://outerthought.org/blog/417-ot.html">Bending time in
-    HBase</link>
-    </para>
-  </footnote></quote></para>
+    HBase</link>.)</para>
 </section>
 </section>
 </section>
@@ -2020,14 +2008,13 @@ rs.close();
 </section> <!-- client.filter -->

 <section xml:id="master"><title>Master</title>
-  <para><code>HMaster</code> is the implementation of the Master Server. The Master server
-    is responsible for monitoring all RegionServer instances in the cluster, and is
-    the interface for all metadata changes. In a distributed cluster, the Master typically runs on the <xref linkend="arch.hdfs.nn" /><footnote>
-      <para>J Mohamed Zahoor goes into some more detail on the Master Architecture in this blog posting, <link
-        xlink:href="http://blog.zahoor.in/2012/08/hbase-hmaster-architecture/">HBase HMaster Architecture
-      </link>.</para>
-    </footnote>
-  </para>
+  <para><code>HMaster</code> is the implementation of the Master Server. The Master server is
+    responsible for monitoring all RegionServer instances in the cluster, and is the interface
+    for all metadata changes. In a distributed cluster, the Master typically runs on the <xref
+    linkend="arch.hdfs.nn"/>. J Mohamed Zahoor goes into some more detail on the Master
+    Architecture in this blog posting, <link
+    xlink:href="http://blog.zahoor.in/2012/08/hbase-hmaster-architecture/">HBase HMaster
+    Architecture </link>.</para>
   <section xml:id="master.startup"><title>Startup Behavior</title>
     <para>If run in a multi-Master environment, all Masters compete to run the cluster. If the active
       Master loses its lease in ZooKeeper (or the Master shuts down), then then the remaining Masters jostle to
@@ -2469,17 +2456,16 @@ rs.close();
   physical RAM, and is likely to be less than the total available RAM due to other
   memory requirements and system constraints.
 </para>
-<para>You can see how much memory -- onheap and offheap/direct -- a RegionServer is configured to use
-  and how much it is using at any one time by looking at the
-  <emphasis>Server Metrics: Memory</emphasis> tab in the UI.
-  It can also be gotten via JMX. In particular the direct
-  memory currently used by the server can be found on the
-  <varname>java.nio.type=BufferPool,name=direct</varname>
-  bean.
-  <footnote><para>Terracotta has a <link xlink:href="http://terracotta.org/documentation/4.0/bigmemorygo/configuration/storage-options">good write up</link> on using offheap memory in java.
-    It is for their product BigMemory but alot of the issues noted apply
-    in general to any attempt at going offheap. Check it out.</para></footnote>
-</para>
+<para>You can see how much memory -- onheap and offheap/direct -- a RegionServer is
+  configured to use and how much it is using at any one time by looking at the
+  <emphasis>Server Metrics: Memory</emphasis> tab in the UI. It can also be gotten
+  via JMX. In particular the direct memory currently used by the server can be found
+  on the <varname>java.nio.type=BufferPool,name=direct</varname> bean. Terracotta has
+  a <link
+  xlink:href="http://terracotta.org/documentation/4.0/bigmemorygo/configuration/storage-options"
+  >good write up</link> on using offheap memory in java. It is for their product
+  BigMemory but alot of the issues noted apply in general to any attempt at going
+  offheap. Check it out.</para>
 </note>
 <note xml:id="hbase.bucketcache.percentage.in.combinedcache"><title>hbase.bucketcache.percentage.in.combinedcache</title>
   <para>This is a pre-HBase 1.0 configuration removed because it
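Aside (not part of the commit): besides the UI and JMX, the same direct-memory numbers are exposed in-process through the standard platform MXBeans; a minimal sketch.

    import java.lang.management.BufferPoolMXBean;
    import java.lang.management.ManagementFactory;
    import java.util.List;

    public class DirectMemoryCheck {
      public static void main(String[] args) {
        // The "direct" pool backs the java.nio BufferPool,name=direct bean named in the text.
        List<BufferPoolMXBean> pools =
            ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
          if ("direct".equals(pool.getName())) {
            System.out.println("direct buffers=" + pool.getCount()
                + " used=" + pool.getMemoryUsed()
                + "B capacity=" + pool.getTotalCapacity() + "B");
          }
        }
      }
    }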
@@ -2613,12 +2599,10 @@ rs.close();
 </itemizedlist>
 <para>If the <varname>hbase.hlog.split.skip.errors</varname> optionset to
   <literal>false</literal>, the default, the exception will be propagated and the
-  split will be logged as failed.<footnote>
-    <para>See <link
+  split will be logged as failed. See <link
   xlink:href="https://issues.apache.org/jira/browse/HBASE-2958">HBASE-2958 When
   hbase.hlog.split.skip.errors is set to false, we fail the split but thats
   it</link>. We need to do more than just fail split if this flag is set.</para>
-  </footnote></para>

 <section>
   <title>How EOFExceptions are treated when splitting a crashed RegionServers'
@@ -2628,11 +2612,9 @@ rs.close();
   <varname>hbase.hlog.split.skip.errors</varname> is set to
   <literal>false</literal>. An EOFException while reading the last log in the set of
   files to split is likely, because the RegionServer is likely to be in the process of
-  writing a record at the time of a crash. <footnote>
-    <para>For background, see <link
+  writing a record at the time of a crash. For background, see <link
   xlink:href="https://issues.apache.org/jira/browse/HBASE-2643">HBASE-2643
   Figure how to deal with eof splitting logs</link></para>
-  </footnote></para>
 </section>
 </section>

@@ -3042,9 +3024,9 @@ ctime = Sat Jun 23 11:13:40 PDT 2012
 </listitem>
 <listitem><para>Third replica is written on the same rack as the second, but on a different node chosen randomly</para>
 </listitem>
-<listitem><para>Subsequent replicas are written on random nodes on the cluster
-  <footnote><para>See <emphasis>Replica Placement: The First Baby Steps</emphasis> on this page: <link xlink:href="http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html">HDFS Architecture</link></para></footnote>
-  </para></listitem>
+<listitem><para>Subsequent replicas are written on random nodes on the cluster. See <emphasis>Replica Placement: The First Baby Steps</emphasis> on this page: <link
+  xlink:href="http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html"
+  >HDFS Architecture</link></para></listitem>
 </orderedlist>
 <para>
   Thus, HBase eventually achieves locality for a region after a flush or a compaction.
@@ -5166,8 +5148,7 @@ This option should not normally be used, and it is not in <code>-fixAll</code>.
 <imageobject>
   <imagedata fileref="data_block_no_encoding.png" width="800"/>
 </imageobject>
-<textobject><para></para>
-</textobject>
+<caption><para>A ColumnFamily with no encoding></para></caption>
 </mediaobject>
 </figure>
 <para>Here is the same data with prefix data encoding.</para>
@@ -5177,8 +5158,7 @@ This option should not normally be used, and it is not in <code>-fixAll</code>.
 <imageobject>
   <imagedata fileref="data_block_prefix_encoding.png" width="800"/>
 </imageobject>
-<textobject><para></para>
-</textobject>
+<caption><para>A ColumnFamily with prefix encoding</para></caption>
 </mediaobject>
 </figure>
 </listitem>
@@ -5202,8 +5182,7 @@ This option should not normally be used, and it is not in <code>-fixAll</code>.
 <imageobject>
   <imagedata fileref="data_block_diff_encoding.png" width="800"/>
 </imageobject>
-<textobject><para></para>
-</textobject>
+<caption><para>A ColumnFamily with diff encoding</para></caption>
 </mediaobject>
 </figure>
 </listitem>
@@ -40,12 +40,9 @@
   committer will add it for you. Thereafter you can file issues against your feature branch in
   Apache HBase JIRA. Your code you keep elsewhere -- it should be public so it can be observed
   -- and you can update dev mailing list on progress. When the feature is ready for commit, 3
-  +1s from committers will get your feature merged<footnote>
-    <para>See <link
+  +1s from committers will get your feature merged. See <link
   xlink:href="http://search-hadoop.com/m/asM982C5FkS1">HBase, mail # dev - Thoughts
   about large feature dev branches</link></para>
-  </footnote>
-  </para>
 </section>
 <section
   xml:id="patchplusonepolicy">
@@ -212,12 +212,10 @@
 <term>DNS</term>
 <listitem>
   <para>HBase uses the local hostname to self-report its IP address. Both forward and
-    reverse DNS resolving must work in versions of HBase previous to 0.92.0.<footnote>
-      <para>The <link
+    reverse DNS resolving must work in versions of HBase previous to 0.92.0. The <link
     xlink:href="https://github.com/sujee/hadoop-dns-checker">hadoop-dns-checker</link>
     tool can be used to verify DNS is working correctly on the cluster. The project
     README file provides detailed instructions on usage. </para>
-  </footnote></para>

   <para>If your server has multiple network interfaces, HBase defaults to using the
     interface that the primary hostname resolves to. To override this behavior, set the
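Aside (not part of the commit): a quick JVM-level sanity check of forward and reverse resolution for the local hostname; a rough sketch, not a replacement for the hadoop-dns-checker tool mentioned above.

    import java.net.InetAddress;

    public class DnsSanityCheck {
      public static void main(String[] args) throws Exception {
        InetAddress local = InetAddress.getLocalHost();
        String hostname = local.getHostName();
        String ip = local.getHostAddress();
        String forward = InetAddress.getByName(hostname).getHostAddress();  // hostname -> IP
        String reverse = InetAddress.getByName(ip).getCanonicalHostName();  // IP -> hostname
        System.out.println(hostname + " / " + ip);
        System.out.println("forward=" + forward + " reverse=" + reverse);
        System.out.println("consistent=" + (forward.equals(ip) && reverse.equalsIgnoreCase(hostname)));
      }
    }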
@@ -306,11 +304,10 @@
   running the HBase process is an operating system configuration, rather than an HBase
   configuration. It is also important to be sure that the settings are changed for the
   user that actually runs HBase. To see which user started HBase, and that user's ulimit
-  configuration, look at the first line of the HBase log for that instance.<footnote>
-    <para>A useful read setting config on you hadoop cluster is Aaron Kimballs' <link
-      xlink:href="http://www.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/">Configuration
-      Parameters: What can you just ignore?</link></para>
-  </footnote></para>
+  configuration, look at the first line of the HBase log for that instance. A useful read
+  setting config on you hadoop cluster is Aaron Kimballs' <link
+  xlink:href="http://www.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/"
+  >Configuration Parameters: What can you just ignore?</link></para>
 <formalpara xml:id="ulimit_ubuntu">
   <title><command>ulimit</command> Settings on Ubuntu</title>
   <para>To configure <command>ulimit</command> settings on Ubuntu, edit
@@ -414,12 +411,8 @@ hadoop - nproc 32000
 <entry>HBase-0.92.x</entry>
 <entry>HBase-0.94.x</entry>
 <entry>HBase-0.96.x</entry>
-<entry>HBase-0.98.x<footnote>
-  <para>Support for Hadoop 1.x is deprecated.</para>
-</footnote></entry>
-<entry>HBase-1.0.x<footnote>
-  <para>Hadoop 1.x is NOT supported</para>
-</footnote></entry>
+<entry><para>HBase-0.98.x (Support for Hadoop 1.x is deprecated.)</para></entry>
+<entry><para>HBase-1.0.x (Hadoop 1.x is NOT supported)</para></entry>
 </row>
 </thead>
 <tbody>
@@ -440,11 +433,9 @@ hadoop - nproc 32000
 <entry>X</entry>
 </row>
 <row>
-<entry>Hadoop-1.0.0-1.0.2<footnote>
-  <para>HBase requires hadoop 1.0.3 at a minimum; there is an issue where we cannot
-    find KerberosUtil compiling against earlier versions of Hadoop.</para>
-</footnote>
-</entry>
+<entry><para>Hadoop-1.0.0-1.0.2 (HBase requires hadoop 1.0.3 at a minimum; there is an
+  issue where we cannot find KerberosUtil compiling against earlier versions of
+  Hadoop.)</para></entry>
 <entry>X</entry>
 <entry>X</entry>
 <entry>X</entry>
@@ -494,10 +485,9 @@ hadoop - nproc 32000
 <row>
 <entry>Hadoop-2.2.0 </entry>
 <entry>X</entry>
-<entry>NT <footnote>
-  <para>To get 0.94.x to run on hadoop 2.2.0, you need to change the hadoop 2 and
-    protobuf versions in the <filename>pom.xml</filename>: Here is a diff with
-    pom.xml changes: </para>
+<entry><para>NT - To get 0.94.x to run on hadoop 2.2.0, you need to change the hadoop
+  2 and protobuf versions in the <filename>pom.xml</filename>: Here is a diff with
+  pom.xml changes: </para>
 <programlisting><![CDATA[$ svn diff pom.xml
 Index: pom.xml
 ===================================================================
@@ -540,8 +530,7 @@ Index: pom.xml
 </itemizedlist>
 <para> Building against the hadoop 2 profile by running something like the
   following command: </para>
-<screen language="bourne">$ mvn clean install assembly:single -Dhadoop.profile=2.0 -DskipTests</screen>
-</footnote></entry>
+<screen language="bourne">$ mvn clean install assembly:single -Dhadoop.profile=2.0 -DskipTests</screen></entry>
 <entry>S</entry>
 <entry>S</entry>
 <entry>NT</entry>
@@ -601,11 +590,9 @@ Index: pom.xml
 <para> As of Apache HBase 0.96.x, Apache Hadoop 1.0.x at least is required. Hadoop 2 is
   strongly encouraged (faster but also has fixes that help MTTR). We will no longer run
   properly on older Hadoops such as 0.20.205 or branch-0.20-append. Do not move to Apache
-  HBase 0.96.x if you cannot upgrade your Hadoop.<footnote>
-    <para>See <link
+  HBase 0.96.x if you cannot upgrade your Hadoop.. See <link
   xlink:href="http://search-hadoop.com/m/7vFVx4EsUb2">HBase, mail # dev - DISCUSS:
   Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?</link></para>
-  </footnote></para>
 </section>

 <section
@@ -615,13 +602,12 @@ Index: pom.xml
   <code>sync</code> implementation. DO NOT use Hadoop 0.20.2, Hadoop 0.20.203.0, and
   Hadoop 0.20.204.0 which DO NOT have this attribute. Currently only Hadoop versions
   0.20.205.x or any release in excess of this version -- this includes hadoop-1.0.0 -- have
-  a working, durable sync <footnote>
-    <para>The Cloudera blog post <link
-      xlink:href="http://www.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/">An
-      update on Apache Hadoop 1.0</link> by Charles Zedlweski has a nice exposition on how
-      all the Hadoop versions relate. Its worth checking out if you are having trouble
-      making sense of the Hadoop version morass. </para>
-  </footnote>. Sync has to be explicitly enabled by setting
+  a working, durable sync. The Cloudera blog post <link
+  xlink:href="http://www.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/">An
+  update on Apache Hadoop 1.0</link> by Charles Zedlweski has a nice exposition on how all
+  the Hadoop versions relate. Its worth checking out if you are having trouble making sense
+  of the Hadoop version morass. </para>
+  <para>Sync has to be explicitly enabled by setting
   <varname>dfs.support.append</varname> equal to true on both the client side -- in
   <filename>hbase-site.xml</filename> -- and on the serverside in
   <filename>hdfs-site.xml</filename> (The sync facility HBase needs is a subset of the
@@ -713,9 +699,7 @@ Index: pom.xml
 <para>Distributed mode can be subdivided into distributed but all daemons run on a single node
   -- a.k.a <emphasis>pseudo-distributed</emphasis>-- and
   <emphasis>fully-distributed</emphasis> where the daemons are spread across all nodes in
-  the cluster <footnote>
-    <para>The pseudo-distributed vs fully-distributed nomenclature comes from Hadoop.</para>
-  </footnote>.</para>
+  the cluster. The pseudo-distributed vs fully-distributed nomenclature comes from Hadoop.</para>

 <para>Pseudo-distributed mode can run against the local filesystem or it can run against an
   instance of the <emphasis>Hadoop Distributed File System</emphasis> (HDFS).
@@ -29,16 +29,14 @@
 */
 -->
 <title>Apache HBase Coprocessors</title>
-<para> HBase coprocessors are modeled after the coprocessors which are part of Google's BigTable<footnote>
-  <para><link
-    xlink:href="http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009" />, pages
-    66-67.</para>
-</footnote>. Coprocessors function in a similar way to Linux kernel modules. They provide a way
-  to run server-level code against locally-stored data. The functionality they provide is very
+<para> HBase coprocessors are modeled after the coprocessors which are part of Google's BigTable
+  (<link xlink:href="http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009"/>, pages
+  66-67.). Coprocessors function in a similar way to Linux kernel modules. They provide a way to
+  run server-level code against locally-stored data. The functionality they provide is very
   powerful, but also carries great risk and can have adverse effects on the system, at the level
   of the operating system. The information in this chapter is primarily sourced and heavily reused
   from Mingjie Lai's blog post at <link
-  xlink:href="https://blogs.apache.org/hbase/entry/coprocessor_introduction" />. </para>
+  xlink:href="https://blogs.apache.org/hbase/entry/coprocessor_introduction"/>. </para>

 <para> Coprocessors are not designed to be used by end users of HBase, but by HBase developers who
   need to add specialized functionality to HBase. One example of the use of coprocessors is
@@ -418,10 +416,10 @@ coprocessors=[AggregateImplementation]
 <imageobject>
   <imagedata fileref="coprocessor_stats.png" width="100%"/>
 </imageobject>
-<textobject>
+<caption>
   <para>The Coprocessor Metrics UI shows statistics about time spent executing a given
     coprocessor, including min, max, average, and 90th, 95th, and 99th percentile.</para>
-</textobject>
+</caption>
 </mediaobject>
 </figure>
 </section>
@@ -757,11 +757,9 @@ false
 <para>Without this facility, decommissioning mulitple nodes may be non-optimal because
   regions that are being drained from one region server may be moved to other regionservers
   that are also draining. Marking RegionServers to be in the draining state prevents this
-  from happening<footnote>
-    <para>See this <link
-      xlink:href="http://inchoate-clatter.blogspot.com/2012/03/hbase-ops-automation.html">blog
-      post</link> for more details.</para>
-  </footnote>. </para>
+  from happening. See this <link
+  xlink:href="http://inchoate-clatter.blogspot.com/2012/03/hbase-ops-automation.html">blog
+  post</link> for more details.</para>
 </section>

 <section
@@ -1206,9 +1204,9 @@ $ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --
 <imageobject>
   <imagedata fileref="bc_basic.png" width="100%"/>
 </imageobject>
-<textobject>
+<caption>
   <para>Shows the cache implementation</para>
-</textobject>
+</caption>
 </mediaobject>
 </figure>
 <figure>
@@ -1217,9 +1215,9 @@ $ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --
 <imageobject>
   <imagedata fileref="bc_config.png" width="100%"/>
 </imageobject>
-<textobject>
+<caption>
   <para>Shows all cache configuration options.</para>
-</textobject>
+</caption>
 </mediaobject>
 </figure>
 <figure>
@@ -1228,9 +1226,9 @@ $ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --
 <imageobject>
   <imagedata fileref="bc_stats.png" width="100%"/>
 </imageobject>
-<textobject>
+<caption>
   <para>Shows statistics about the performance of the cache.</para>
-</textobject>
+</caption>
 </mediaobject>
 </figure>
 <figure>
@@ -1242,9 +1240,9 @@ $ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --
 <imageobject>
   <imagedata fileref="bc_l2_buckets.png" width="100%"/>
 </imageobject>
-<textobject>
+<caption>
   <para>Shows information about the L1 and L2 caches.</para>
-</textobject>
+</caption>
 </mediaobject>
 </figure>
 <para>This is not an exhaustive list of all the screens and reports available. Have a look in
@@ -1305,10 +1303,10 @@ $ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --
 <imageobject>
   <imagedata fileref="replication_overview.png" />
 </imageobject>
-<textobject>
+<caption>
   <para>Illustration of the replication architecture in HBase, as described in the prior
     text.</para>
-</textobject>
+</caption>
 </mediaobject>
 </figure>

@@ -146,8 +146,7 @@
   xml:id="gcpause">
   <title>Long GC pauses</title>

-  <para
-    xml:id="mslab">In his presentation, <link
+  <para xml:id="mslab">In his presentation, <link
     xlink:href="http://www.slideshare.net/cloudera/hbase-hug-presentation">Avoiding Full GCs
     with MemStore-Local Allocation Buffers</link>, Todd Lipcon describes two cases of
     stop-the-world garbage collections common in HBase, especially during loading; CMS failure
@@ -158,16 +157,16 @@
   Todd added an experimental facility, <indexterm><primary>MSLAB</primary></indexterm>, that
   must be explicitly enabled in Apache HBase 0.90.x (Its defaulted to be on in Apache 0.92.x
   HBase). See <code>hbase.hregion.memstore.mslab.enabled</code> to true in your
-  <classname>Configuration</classname>. See the cited slides for background and detail<footnote>
-    <para>The latest jvms do better regards fragmentation so make sure you are running a
-      recent release. Read down in the message, <link
+  <classname>Configuration</classname>. See the cited slides for background and detail.
+  The latest jvms do better regards fragmentation so make sure you are running a recent
+  release. Read down in the message, <link
   xlink:href="http://osdir.com/ml/hotspot-gc-use/2011-11/msg00002.html">Identifying
-  concurrent mode failures caused by fragmentation</link>.</para>
-  </footnote>. Be aware that when enabled, each MemStore instance will occupy at least an
-  MSLAB instance of memory. If you have thousands of regions or lots of regions each with
-  many column families, this allocation of MSLAB may be responsible for a good portion of
-  your heap allocation and in an extreme case cause you to OOME. Disable MSLAB in this case,
-  or lower the amount of memory it uses or float less regions per server. </para>
+  concurrent mode failures caused by fragmentation</link>. Be aware that when enabled,
+  each MemStore instance will occupy at least an MSLAB instance of memory. If you have
+  thousands of regions or lots of regions each with many column families, this allocation of
+  MSLAB may be responsible for a good portion of your heap allocation and in an extreme case
+  cause you to OOME. Disable MSLAB in this case, or lower the amount of memory it uses or
+  float less regions per server. </para>
 <para>If you have a write-heavy workload, check out <link
   xlink:href="https://issues.apache.org/jira/browse/HBASE-8163">HBASE-8163
   MemStoreChunkPool: An improvement for JAVA GC when using MSLAB</link>. It describes
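Aside (not part of the commit): on 0.90.x, where MSLAB is off by default, the property named above would typically be set in hbase-site.xml; the same setting expressed in code, as a minimal sketch.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class EnableMslab {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Enable MemStore-Local Allocation Buffers (already the default from 0.92.x on).
        conf.setBoolean("hbase.hregion.memstore.mslab.enabled", true);
        System.out.println(conf.get("hbase.hregion.memstore.mslab.enabled"));
      }
    }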
@@ -916,24 +915,20 @@ htable.close();
   latencies.</para>
 <para><link
   xlink:href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filters</link> were developed
-  over in <link
-  xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200 Add
-  bloomfilters</link>.<footnote>
-    <para>For description of the development process -- why static blooms rather than dynamic
+  over in <link xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200 Add
+  bloomfilters</link>. For description of the development process -- why static blooms rather than dynamic
   -- and for an overview of the unique properties that pertain to blooms in HBase, as well
   as possible future directions, see the <emphasis>Development Process</emphasis> section
   of the document <link
   xlink:href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf">BloomFilters
   in HBase</link> attached to <link
-  xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200</link>.</para>
-  </footnote><footnote>
-    <para>The bloom filters described here are actually version two of blooms in HBase. In
+  xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200</link>. The bloom filters described here are actually version two of blooms in HBase. In
   versions up to 0.19.x, HBase had a dynamic bloom option based on work done by the <link
   xlink:href="http://www.one-lab.org">European Commission One-Lab Project 034819</link>.
   The core of the HBase bloom work was later pulled up into Hadoop to implement
   org.apache.hadoop.io.BloomMapFile. Version 1 of HBase blooms never worked that well.
   Version 2 is a rewrite from scratch though again it starts with the one-lab work.</para>
-  </footnote></para>
 <para>See also <xref
   linkend="schema.bloom" />. </para>

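Aside (not part of the commit): enabling a bloom filter on a column family is a one-line schema setting; a sketch with the 0.96-era admin API and invented table/family names.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.regionserver.BloomType;

    public class EnableBloomFilter {
      public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
        HTableDescriptor table = new HTableDescriptor(TableName.valueOf("t1")); // hypothetical
        HColumnDescriptor family = new HColumnDescriptor("cf");
        family.setBloomFilterType(BloomType.ROW);   // ROW blooms; ROWCOL is the alternative
        table.addFamily(family);
        admin.createTable(table);
        admin.close();
      }
    }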
@@ -1047,11 +1042,9 @@ htable.close();
   possible for the DFSClient to take a "short circuit" and read directly from the disk instead
   of going through the DataNode when the data is local. What this means for HBase is that the
   RegionServers can read directly off their machine's disks instead of having to open a socket
-  to talk to the DataNode, the former being generally much faster<footnote>
-    <para>See JD's <link
+  to talk to the DataNode, the former being generally much faster. See JD's <link
   xlink:href="http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf">Performance
-  Talk</link></para>
-  </footnote>. Also see <link
+  Talk</link>. Also see <link
   xlink:href="http://search-hadoop.com/m/zV6dKrLCVh1">HBase, mail # dev - read short
   circuit</link> thread for more discussion around short circuit reads. </para>
 <para>To enable "short circuit" reads, it will depend on your version of Hadoop. The original
@@ -1081,19 +1074,17 @@ htable.close();
   </description>
 </property>]]></programlisting>
 <para>Be careful about permissions for the directory that hosts the shared domain socket;
-  dfsclient will complain if open to other than the hbase user. <footnote>
+  dfsclient will complain if open to other than the hbase user. </para>
 <para>If you are running on an old Hadoop, one that is without <link
-  xlink:href="https://issues.apache.org/jira/browse/HDFS-347">HDFS-347</link> but that
-  has <link
-  xlink:href="https://issues.apache.org/jira/browse/HDFS-2246">HDFS-2246</link>, you
-  must set two configurations. First, the hdfs-site.xml needs to be amended. Set the
-  property <varname>dfs.block.local-path-access.user</varname> to be the
-  <emphasis>only</emphasis> user that can use the shortcut. This has to be the user that
-  started HBase. Then in hbase-site.xml, set
-  <varname>dfs.client.read.shortcircuit</varname> to be <varname>true</varname>
-  </para>
-</footnote>
+  xlink:href="https://issues.apache.org/jira/browse/HDFS-347">HDFS-347</link> but that has
+  <link xlink:href="https://issues.apache.org/jira/browse/HDFS-2246">HDFS-2246</link>, you
+  must set two configurations. First, the hdfs-site.xml needs to be amended. Set the property
+  <varname>dfs.block.local-path-access.user</varname> to be the <emphasis>only</emphasis>
+  user that can use the shortcut. This has to be the user that started HBase. Then in
+  hbase-site.xml, set <varname>dfs.client.read.shortcircuit</varname> to be
+  <varname>true</varname>
 </para>

 <para> Services -- at least the HBase RegionServers -- will need to be restarted in order to
   pick up the new configurations. </para>
 <note
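Aside (not part of the commit): the two properties named above normally live in hdfs-site.xml and hbase-site.xml; the equivalent programmatic form, as a sketch with an illustrative user name.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ShortCircuitReadConfig {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // hdfs-site.xml side: the only user allowed to take the shortcut (the HBase user).
        conf.set("dfs.block.local-path-access.user", "hbase");
        // hbase-site.xml side: turn short-circuit reads on.
        conf.setBoolean("dfs.client.read.shortcircuit", true);
        System.out.println(conf.getBoolean("dfs.client.read.shortcircuit", false));
      }
    }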
@@ -43,7 +43,7 @@
 <title>About This Guide</title>
 <para>This reference guide is a work in progress. The source for this guide can be found in
   the <filename>src/main/docbkx</filename> directory of the HBase source. This reference
-  guide is marked up using <link xlink:href="http://www.docbook.com/">DocBook</link> from
+  guide is marked up using <link xlink:href="http://www.docbook.org/">DocBook</link> from
   which the the finished guide is generated as part of the 'site' build target. Run
   <programlisting language="bourne">mvn site</programlisting> to generate this documentation. Amendments and
   improvements to the documentation are welcomed. Click <link
@@ -104,7 +104,8 @@

 <section>
   <title><preamble></title>
-  <para><programlisting><MAGIC 4 byte integer> <1 byte RPC Format Version> <1 byte auth type><footnote><para> We need the auth method spec. here so the connection header is encoded if auth enabled.</para></footnote></programlisting></para>
+  <programlisting><MAGIC 4 byte integer> <1 byte RPC Format Version> <1 byte auth type></programlisting>
+  <para> We need the auth method spec. here so the connection header is encoded if auth enabled.</para>
   <para>E.g.: HBas0x000x50 -- 4 bytes of MAGIC -- ‘HBas’ -- plus one-byte of
     version, 0 in this case, and one byte, 0x50 (SIMPLE). of an auth
     type.</para>
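Aside (not part of the commit): the six-byte preamble described here ('HBas', one version byte, one auth-type byte) can be written out directly; a sketch using the values from the text's example (version 0, 0x50 = SIMPLE).

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    public class RpcPreamble {
      static byte[] buildPreamble() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeBytes("HBas");   // 4 bytes of MAGIC
        out.writeByte(0);         // 1 byte RPC format version
        out.writeByte(0x50);      // 1 byte auth type; 0x50 is SIMPLE in the text's example
        out.flush();
        return bytes.toByteArray();
      }

      public static void main(String[] args) throws IOException {
        System.out.println("preamble length = " + buildPreamble().length); // 6
      }
    }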
@@ -32,11 +32,9 @@
 <section
   xml:id="hbase.secure.configuration">
   <title>Secure Client Access to Apache HBase</title>
-  <para>Newer releases of Apache HBase (>= 0.92) support optional SASL authentication of clients<footnote>
-    <para>See also Matteo Bertozzi's article on <link
+  <para>Newer releases of Apache HBase (>= 0.92) support optional SASL authentication of clients. See also Matteo Bertozzi's article on <link
   xlink:href="http://www.cloudera.com/blog/2012/09/understanding-user-authentication-and-authorization-in-apache-hbase/">Understanding
   User Authentication and Authorization in Apache HBase</link>.</para>
-  </footnote>.</para>
   <para>This describes how to set up Apache HBase and clients for connection to secure HBase
     resources.</para>

@@ -339,11 +337,9 @@ grant 'rest_server', 'RWCA'
 <section
   xml:id="hbase.secure.simpleconfiguration">
   <title>Simple User Access to Apache HBase</title>
-  <para>Newer releases of Apache HBase (>= 0.92) support optional SASL authentication of clients<footnote>
-    <para>See also Matteo Bertozzi's article on <link
+  <para>Newer releases of Apache HBase (>= 0.92) support optional SASL authentication of clients. See also Matteo Bertozzi's article on <link
   xlink:href="http://www.cloudera.com/blog/2012/09/understanding-user-authentication-and-authorization-in-apache-hbase/">Understanding
   User Authentication and Authorization in Apache HBase</link>.</para>
-  </footnote>.</para>
   <para>This describes how to set up Apache HBase and clients for simple user access to HBase
     resources.</para>

@@ -233,19 +233,16 @@ export SERVER_GC_OPTS="$SERVER_GC_OPTS -XX:NewSize=64m -XX:MaxNewSize=64m"
 <section
   xml:id="trouble.resources.lists">
   <title>Mailing Lists</title>
-  <para>Ask a question on the <link
-    xlink:href="http://hbase.apache.org/mail-lists.html">Apache HBase mailing lists</link>.
-    The 'dev' mailing list is aimed at the community of developers actually building Apache
-    HBase and for features currently under development, and 'user' is generally used for
-    questions on released versions of Apache HBase. Before going to the mailing list, make sure
-    your question has not already been answered by searching the mailing list archives first.
-    Use <xref
-    linkend="trouble.resources.searchhadoop" />. Take some time crafting your question<footnote>
-      <para>See <link
-        xlink:href="http://www.mikeash.com/getting_answers.html">Getting Answers</link></para>
-    </footnote>; a quality question that includes all context and exhibits evidence the author
-    has tried to find answers in the manual and out on lists is more likely to get a prompt
-    response. </para>
+  <para>Ask a question on the <link xlink:href="http://hbase.apache.org/mail-lists.html">Apache
+    HBase mailing lists</link>. The 'dev' mailing list is aimed at the community of developers
+    actually building Apache HBase and for features currently under development, and 'user' is
+    generally used for questions on released versions of Apache HBase. Before going to the
+    mailing list, make sure your question has not already been answered by searching the mailing
+    list archives first. Use <xref linkend="trouble.resources.searchhadoop"/>. Take some time
+    crafting your question. See <link xlink:href="http://www.mikeash.com/getting_answers.html"
+    >Getting Answers</link> for ideas on crafting good questions. A quality question that
+    includes all context and exhibits evidence the author has tried to find answers in the
+    manual and out on lists is more likely to get a prompt response. </para>
 </section>
 <section
   xml:id="trouble.resources.irc">
@@ -71,12 +71,10 @@
   out the jars of one version and replace them with the jars of another, compatible
   version and all will just work. Unless otherwise specified, HBase point versions are
   binary compatible. You can safely do rolling upgrades between binary compatible
-  versions; i.e. across point versions: e.g. from 0.94.5 to 0.94.6<footnote>
-    <para>See <link
+  versions; i.e. across point versions: e.g. from 0.94.5 to 0.94.6. See <link
   xlink:href="http://search-hadoop.com/m/bOOvwHGW981/Does+compatibility+between+versions+also+mean+binary+compatibility%253F&subj=Re+Does+compatibility+between+versions+also+mean+binary+compatibility+">Does
   compatibility between versions also mean binary compatibility?</link>
   discussion on the hbaes dev mailing list. </para>
-  </footnote>. </para>
 </section>
 <section
   xml:id="hbase.rolling.restart">
|
||||||
to change this (The 'normal'/default value is 64MB (67108864)). Run the script
|
to change this (The 'normal'/default value is 64MB (67108864)). Run the script
|
||||||
<filename>bin/set_meta_memstore_size.rb</filename>. This will make the necessary
|
<filename>bin/set_meta_memstore_size.rb</filename>. This will make the necessary
|
||||||
edit to your <varname>.META.</varname> schema. Failure to run this change will make for
|
edit to your <varname>.META.</varname> schema. Failure to run this change will make for
|
||||||
a slow cluster <footnote>
|
a slow cluster. See <link
|
||||||
<para> See <link
|
|
||||||
xlink:href="https://issues.apache.org/jira/browse/HBASE-3499">HBASE-3499
|
xlink:href="https://issues.apache.org/jira/browse/HBASE-3499">HBASE-3499
|
||||||
Users upgrading to 0.90.0 need to have their .META. table updated with the
|
Users upgrading to 0.90.0 need to have their .META. table updated with the
|
||||||
right MEMSTORE_SIZE</link>
|
right MEMSTORE_SIZE</link>
|
||||||
</para>
|
</para>
|
||||||
</footnote> . </para>
|
|
||||||
</section>
|
</section>
|
||||||
</chapter>
|
</chapter>
|
||||||
|
|
|
@@ -52,12 +52,10 @@
   <varname>hbase.zookeeper.property.clientPort</varname> property. For all default values used
   by HBase, including ZooKeeper configuration, see <xref
   linkend="hbase_default_configurations" />. Look for the
-  <varname>hbase.zookeeper.property</varname> prefix <footnote>
-    <para>For the full list of ZooKeeper configurations, see ZooKeeper's
+  <varname>hbase.zookeeper.property</varname> prefix. For the full list of ZooKeeper configurations, see ZooKeeper's
   <filename>zoo.cfg</filename>. HBase does not ship with a <filename>zoo.cfg</filename> so
   you will need to browse the <filename>conf</filename> directory in an appropriate ZooKeeper
   download.</para>
-  </footnote></para>

 <para>You must at least list the ensemble servers in <filename>hbase-site.xml</filename> using the
   <varname>hbase.zookeeper.quorum</varname> property. This property defaults to a single
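Aside (not part of the commit): a client can also supply the ensemble settings named above in code instead of hbase-site.xml; a minimal sketch with placeholder host names.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ZkClientConfig {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
        conf.setInt("hbase.zookeeper.property.clientPort", 2181); // standard ZooKeeper port
        System.out.println(conf.get("hbase.zookeeper.quorum"));
      }
    }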