HBASE-1932 Encourage use of 'lzo' compression... add the wiki page to getting started

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1029952 13f79535-47bb-0310-9956-ffa450edef68
Michael Stack 2010-11-02 04:57:25 +00:00
parent 1351769c16
commit c48a1061a7
2 changed files with 71 additions and 12 deletions

@ -640,6 +640,8 @@ Release 0.21.0 - Unreleased
HBASE-3179 Enable ReplicationLogsCleaner only if replication is,
and fix its test
HBASE-3185 User-triggered compactions are triggering splits!
HBASE-1932 Encourage use of 'lzo' compression... add the wiki page to
getting started
IMPROVEMENTS

@ -230,6 +230,7 @@ stopping hbase...............</programlisting></para>
different HBase run modes: standalone, what is described above in <link
linkend="quickstart">Quick Start,</link> pseudo-distributed where all
daemons run on a single server, and distributed.</para>
<para>Be sure to read the <link linkend="important_configurations">Important Configurations</link>.</para>
</section>
</chapter>
@ -242,7 +243,6 @@ stopping hbase...............</programlisting></para>
<title><filename>hbase-site.xml</filename> and <filename>hbase-default.xml</filename></title>
<para>What are these?
</para>
<para>
Not all configuration options make it out to
<filename>hbase-default.xml</filename>. Configuration
@ -250,37 +250,94 @@ stopping hbase...............</programlisting></para>
in code; the only way to turn up the configurations is
via a reading of the source code.
</para>
<!--The file hbase-default.xml is generated as part of
the build of the hbase site. See the hbase pom.xml.
The generated file is a docbook section with a glossary
in it-->
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
href="../../target/site/hbase-default.xml" />
</section>
</section>
<section>
<title><filename>hbase-env.sh</filename></title>
<para></para>
</section>
<section>
<title><filename>log4j.properties</filename></title>
<para></para>
</section>
<section>
<title>Noteworthy Configuration</title>
<para>Below we review a couple of the key configurations.
We'll list those you must change to suit your context
and others that you should review and consider moving on
from defaults after gauging your deploy's load and query profiles.
<section xml:id="important_configurations">
<title>The Important Configurations</title>
<para>Below we list the important configurations. We've divided this section into
required configurations and recommended configurations that are worth a look.
</para>
<section>
<section xml:id="required_configuration"><title>Required Configurations</title>
<para>Here are the configurations you must set to suit
your deploy.
</para>
<section xml:id="ulimit">
<title><varname>ulimit</varname></title>
<para>HBase is a database; it uses a lot of files at the same time.
The default ulimit -n of 1024 on *nix systems is insufficient.
Any significant amount of loading will lead you to
<link xlink:href="http://wiki.apache.org/hadoop/Hbase/FAQ#A6">FAQ: Why do I see "java.io.IOException...(Too many open files)" in my logs?</link>.
You will also notice errors like:
<programlisting>2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
</programlisting>
Do yourself a favor and change the upper bound on the number of file descriptors.
Set it to north of 10k. See the above referenced FAQ for how.</para>
<para>To be clear, upping the file descriptors for the user who is
running the HBase process is an operating system configuration, not an
HBase configuration.
</para>
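<para>As a rough sketch of the operating system change, on Linux systems that use PAM
the limit is commonly raised with a line in <filename>/etc/security/limits.conf</filename>.
The user name <varname>hbase</varname> and the value 32768 are assumptions for illustration;
substitute the user that runs HBase and a value that suits your deploy:
<programlisting>
# /etc/security/limits.conf
# assumed user "hbase"; "-" sets both the soft and hard limits
hbase  -  nofile  32768
</programlisting>
After logging in again as that user, <command>ulimit -n</command> should report the new value.
</para>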
</section>
<section xml:id="dfs.datanode.max.xcievers">
<title><varname>dfs.datanode.max.xcievers</varname></title>
<para>
Hadoop HDFS has an upper bound on the number of files that it will serve at any one time,
called <varname>xcievers</varname> (yes, this is misspelled). Again, before
doing any loading, make sure you have configured Hadoop's <filename>conf/hdfs-site.xml</filename>,
setting the <varname>xcievers</varname> value to at least the following:
<programlisting>
&lt;property&gt;
&lt;name&gt;dfs.datanode.max.xcievers&lt;/name&gt;
&lt;value&gt;2047&lt;/value&gt;
&lt;/property&gt;
</programlisting>
</para>
</section>
</section>
<section xml:id="recommended_configurations"><title>Recommended Configurations</title>
<section xml:id="lzo">
<title>LZO compression</title>
<para>You should consider enabling LZO compression. It's
near-frictionless and in almost all cases boosts performance.
To enable LZO, TODO...
</para>
<para>Unfortunately, HBase cannot ship with LZO because of
licensing issues: HBase is Apache-licensed, LZO is GPL.
Therefore LZO must be installed after HBase is installed.
See the <link xlink:href="http://wiki.apache.org/hadoop/UsingLzoCompression">Using LZO Compression</link>
wiki page for how to make LZO work with HBase.
</para>
<para>A common problem users run into when using LZO is that while the initial
setup of the cluster runs smoothly, a month goes by and some sysadmin
adds a machine to the cluster but forgets to do the LZO
fixup on the new machine. In versions since HBase 0.90.0, we should
fail in a way that makes it plain what the problem is, but maybe not.
Remember you read this paragraph<footnote><para>See
<link linkend="hbase.regionserver.codec">hbase.regionserver.codec</link>
for a feature to help protect against failed LZO install</para></footnote>.
</para>
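<para>As a minimal sketch, and assuming LZO has already been installed on every node
as described on the wiki page, compression is requested per column family from the
HBase shell. The table name <varname>mytable</varname> and the family name
<varname>cf</varname> below are hypothetical placeholders:
<programlisting>
hbase&gt; create 'mytable', {NAME =&gt; 'cf', COMPRESSION =&gt; 'LZO'}
</programlisting>
If a node lacks the LZO libraries, regions that use this family may fail to open on that node,
which is the failure mode the paragraph above warns about.
</para>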
</section>
</section>
</section>
</chapter>
@ -1201,7 +1258,7 @@ stopping hbase...............</programlisting></para>
xlink:href="http://en.wikipedia.org/wiki/Write-ahead_logging"> Write-Ahead
Log</link></subtitle>
<para>Each RegionServer adds updates to its <link linkend="???">WAL</link>
<para>Each RegionServer adds updates to its Write-ahead Log (WAL)
first, and then to memory.</para>
<section>