HBASE-3563 [site] Add one-page-only version of hbase doc

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1074349 13f79535-47bb-0310-9956-ffa450edef68
Michael Stack 2011-02-24 23:12:21 +00:00
parent d3ec71e317
commit fb502bee15
3 changed files with 97 additions and 31 deletions


@ -78,6 +78,7 @@ Release 0.91.0 - Unreleased
coprocessors -- instead version each individually
HBASE-3520 Update our bundled hadoop from branch-0.20-append to latest
(rpc version 43)
HBASE-3563 [site] Add one-page-only version of hbase doc
NEW FEATURES

pom.xml

@ -287,10 +287,38 @@
<version>2.0.11</version>
<executions>
<execution>
<id>multipage</id>
<phase>pre-site</phase>
<configuration>
<xincludeSupported>true</xincludeSupported>
<navigShowtitles>true</navigShowtitles>
<chunkedOutput>true</chunkedOutput>
<useIdAsFilename>true</useIdAsFilename>
<sectionAutolabelMaxDepth>100</sectionAutolabelMaxDepth>
<sectionAutolabel>true</sectionAutolabel>
<sectionLabelIncludesComponentLabel>true</sectionLabelIncludesComponentLabel>
<targetDirectory>${basedir}/target/site/book/</targetDirectory>
<htmlStylesheet>../css/freebsd_docbook.css</htmlStylesheet>
</configuration>
<goals>
<goal>generate-html</goal>
</goals>
</execution>
<execution>
<id>onepage</id>
<phase>pre-site</phase>
<configuration>
<xincludeSupported>true</xincludeSupported>
<useIdAsFilename>true</useIdAsFilename>
<sectionAutolabelMaxDepth>100</sectionAutolabelMaxDepth>
<sectionAutolabel>true</sectionAutolabel>
<sectionLabelIncludesComponentLabel>true</sectionLabelIncludesComponentLabel>
<targetDirectory>${basedir}/target/site/</targetDirectory>
<htmlStylesheet>css/freebsd_docbook.css</htmlStylesheet>
</configuration>
<goals>
<goal>generate-html</goal>
</goals>
</execution>
</executions>
<dependencies>
@ -301,16 +329,6 @@
<scope>runtime</scope>
</dependency>
</dependencies>
<configuration>
<xincludeSupported>true</xincludeSupported>
<chunkedOutput>true</chunkedOutput>
<useIdAsFilename>true</useIdAsFilename>
<sectionAutolabelMaxDepth>100</sectionAutolabelMaxDepth>
<sectionAutolabel>true</sectionAutolabel>
<sectionLabelIncludesComponentLabel>true</sectionLabelIncludesComponentLabel>
<targetDirectory>${basedir}/target/site/</targetDirectory>
<htmlStylesheet>css/freebsd_docbook.css</htmlStylesheet>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>


@ -251,7 +251,7 @@ hbase(main):013:0&gt; drop 'test'
<para><programlisting>hbase(main):014:0&gt; exit</programlisting></para>
</section>
<section>
<section xml:id="stopping">
<title>Stopping HBase</title>
<para>Stop your hbase instance by running the stop script.</para>
@ -456,7 +456,7 @@ guide.
</section>
<section><title>HBase run modes: Standalone and Distributed</title>
<section xml:id="standalone_dist"><title>HBase run modes: Standalone and Distributed</title>
<para>HBase has two run modes: <link linkend="standalone">standalone</link>
and <link linkend="distributed">distributed</link>.
Out of the box, HBase runs in standalone mode. To set up a
@ -479,7 +479,7 @@ Set <varname>JAVA_HOME</varname> to point at the root of your
talk to HBase.
</para>
</section>
<section><title>Distributed</title>
<section xml:id="distributed"><title>Distributed</title>
<para>Distributed mode can be subdivided into distributed but with all daemons running on a
single node (a.k.a. <emphasis>pseudo-distributed</emphasis>) and
<emphasis>fully-distributed</emphasis>, where the daemons
@ -595,7 +595,7 @@ make the following configuration.</para>
&lt;/configuration&gt;
</programlisting>
<section><title><filename>regionservers</filename></title>
<section xml:id="regionserver"><title><filename>regionservers</filename></title>
<para>In addition, a fully-distributed mode requires that you
modify <filename>conf/regionservers</filename>.
The <filename><link linkend="regionservrers">regionservers</link></filename> file lists all hosts
@ -820,7 +820,7 @@ before stopping the Hadoop daemons.</para>
<section><title>Example Configurations</title>
<section xml:id="example_config"><title>Example Configurations</title>
<section><title>Basic Distributed HBase Install</title>
<para>Here is an example basic configuration for a distributed ten-node cluster.
The nodes are named <varname>example0</varname>, <varname>example1</varname>, etc., through
@ -1002,7 +1002,7 @@ to ensure well-formedness of your document after an edit session.
Use <command>rsync</command>.</para>
<section>
<section xml:id="hbase.site">
<title><filename>hbase-site.xml</filename> and <filename>hbase-default.xml</filename></title>
<para>Just as in Hadoop where you add site-specific HDFS configuration
to the <filename>hdfs-site.xml</filename> file,
@ -1032,7 +1032,7 @@ to ensure well-formedness of your document after an edit session.
href="../../target/site/hbase-default.xml" />
</section>
<section>
<section xml:id="hbase.env.sh">
<title><filename>hbase-env.sh</filename></title>
<para>Set HBase environment variables in this file.
Examples include options to pass to the JVM on start of
@ -1280,7 +1280,7 @@ of all regions.
<para>See <link linkend="shell_exercises">Shell Exercises</link>
for example basic shell operation.</para>
<section><title>Scripting</title>
<section xml:id="scripting"><title>Scripting</title>
<para>For examples of scripting HBase, look in the
HBase <filename>bin</filename> directory. Look at the files
that end in <filename>*.rb</filename>. To run one of these
@ -1349,14 +1349,31 @@ of all regions.
<chapter xml:id="schema">
<title>HBase and Schema Design</title>
<section xml:id="number.of.cfs">
<title>
On the number of column families
</title>
<para>
HBase currently does not do well with anything above two or three column families, so keep the number
of column families in your schema low. Currently, flushing and compaction are done on a per-Region basis, so
if one column family carries the bulk of the data and triggers flushes, the adjacent families
will also be flushed even though the amount of data they carry is small. Compaction is currently triggered
by the total number of files under a column family; it is not size-based. With many column families, the
interaction between flushing and compaction can cause a lot of needless I/O (to be addressed by
changing flushing and compaction to work on a per-column-family basis).
</para>
<para>Try to make do with one column family in your schemas if you can. Only introduce a
second or third column family when data access is usually column-scoped;
i.e., you query one column family or the other, but usually not both at the same time.
</para>
</section>
<section>
<title>
Monotonically Increasing Row Keys/Timeseries Data
</title>
<para>See this comic by Ikai Lan on why monotonically increasing row keys are
problematic in BigTable-like datastores:
<link xlink:href="http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/">monotonically increasing values are bad</link>
in BigTable-like stores.</para>
<link xlink:href="http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/">monotonically increasing values are bad</link>.</para>
<para>If you need to upload time series data into HBase, you should
study <link xlink:href="http://opentsdb.net/">OpenTSDB</link> as a
successful example. It has a page describing the schema it uses in
@ -1434,7 +1451,7 @@ of all regions.
<para></para>
</section>
<section>
<section xml:id="cells">
<title>Cells<indexterm><primary>Cells</primary></indexterm></title>
<para>A <emphasis>{row, column, version} </emphasis>tuple exactly
specifies a <literal>cell</literal> in HBase.
@ -1493,7 +1510,7 @@ of all regions.
basically a synopsis of this article by Bruno Dumon.</para>
</footnote>.</para>
<section>
<section xml:id="versions">
<title>Versions and HBase Operations</title>
<para>In this section we look at the behavior of the version dimension
@ -1635,15 +1652,15 @@ of all regions.
<chapter xml:id="architecture">
<title>Architecture</title>
<section>
<section xml:id="daemons">
<title>Daemons</title>
<section><title>Master</title>
<section xml:id="master"><title>Master</title>
</section>
<section><title>RegionServer</title>
<section xml:id="regionserver.arch"><title>RegionServer</title>
</section>
</section>
<section>
<section xml:id="regions.arch">
<title>Regions</title>
<para>This section is all about Regions.</para>
<note>
@ -1758,7 +1775,7 @@ of all regions.
<para>Each RegionServer adds updates to its Write-ahead Log (WAL)
first, and then to memory.</para>
<section>
<section xml:id="purpose.wal">
<title>What is the purpose of the HBase WAL</title>
<para>
@ -1812,6 +1829,36 @@ of all regions.
</section>
</chapter>
<chapter xml:id="performance">
<title>Performance Tuning</title>
<para>Start with the <link xlink:href="http://wiki.apache.org/hadoop/PerformanceTuning">wiki Performance Tuning</link> page.
It has a general discussion of the main factors involved: RAM, compression, JVM settings, etc.
Afterward, come back here for more pointers.
</para>
<section xml:id="jvm">
<title>Java</title>
<section xml:id="gc">
<title>The Garbage Collector and HBase</title>
<section xml:id="gcpause">
<title>Long GC pauses</title>
<para>
In his presentation,
<link xlink:href="http://www.slideshare.net/cloudera/hbase-hug-presentation">Avoiding Full GCs with MemStore-Local Allocation Buffers</link>,
Todd Lipcon describes two causes of stop-the-world garbage collections common in HBase, especially during loading:
CMS failure modes and old-generation heap fragmentation. To address the first,
start CMS collections earlier by adding <code>-XX:CMSInitiatingOccupancyFraction</code>
with a value below the default. Start at 60 or 70 percent (the lower you set
the threshold, the more often GC runs and the more CPU it uses). To address the second issue,
fragmentation, Todd added an experimental facility that must be
explicitly enabled in HBase 0.90.x (it is on by default in HBase 0.92.x). To enable it, set
<code>hbase.hregion.memstore.mslab.enabled</code> to true in your
<classname>Configuration</classname>. See the cited slides for background and
detail.
</para>
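<para>As a minimal sketch, assuming HBase 0.90.x and that you configure the RegionServers through
<filename>hbase-site.xml</filename>, enabling the MemStore-Local Allocation Buffer facility could look like
the following property, placed inside the file's existing <varname>configuration</varname> element.
Only the property name comes from the text above; the surrounding layout is the usual Hadoop-style
<varname>property</varname>/<varname>name</varname>/<varname>value</varname> form.</para>
<programlisting>
&lt;!-- Sketch: turn on MSLAB to reduce old-generation heap fragmentation --&gt;
&lt;property&gt;
  &lt;name&gt;hbase.hregion.memstore.mslab.enabled&lt;/name&gt;
  &lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
</programlisting>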
</section>
</section>
</section>
</chapter>
<chapter xml:id="blooms">
<title>Bloom Filters</title>
@ -1839,7 +1886,7 @@ of all regions.
work.</para>
</footnote></para>
<section>
<section xml:id="bloom.config">
<title>Configurations</title>
<para>Blooms are enabled by specifying options on a column family in the
@ -1975,7 +2022,7 @@ of all regions.
doing:<programlisting> $ ./<code>bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog --split hdfs://example.org:9000/hbase/.logs/example.org,60020,1283516293161/</code></programlisting></para>
</section>
</section>
<section><title>Compression Tool</title>
<section xml:id="compression.tool"><title>Compression Tool</title>
<para>See <link linkend="compression.tool" >Compression Tool</link>.</para>
</section>
</appendix>
@ -1984,7 +2031,7 @@ of all regions.
<title >Compression In HBase<indexterm><primary>Compression</primary></indexterm></title>
<section id="compression.test">
<section xml:id="compression.test">
<title>CompressionTest Tool</title>
<para>
HBase includes a tool to test that compression is set up properly.
@ -1993,7 +2040,7 @@ of all regions.
</para>
</section>
<section id="hbase.regionserver.codecs">
<section xml:id="hbase.regionserver.codecs">
<title>
<varname>
hbase.regionserver.codecs