hbase-4939 book.xml (architecture/faq), troubleshooting.xml (created resources section)

git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1209688 13f79535-47bb-0310-9956-ffa450edef68
Doug Meil 2011-12-02 20:57:35 +00:00
parent b3d87416d2
commit 7df968e09a
2 changed files with 90 additions and 28 deletions

@ -1200,6 +1200,63 @@ if (!b) {
<chapter xml:id="architecture">
<title>Architecture</title>
<section xml:id="arch.overview">
<title>Overview</title>
<section xml:id="arch.overview.nosql">
<title>NoSQL?</title>
<para>HBase is a type of "NoSQL" database. "NoSQL" is a general term meaning that the database isn't an RDBMS that
supports SQL as its primary access language. There are many types of NoSQL databases: BerkeleyDB is an
example of a local NoSQL database, whereas HBase is very much a distributed database. Technically speaking,
HBase is really more a "Data Store" than a "Data Base" because it lacks many of the features you find in an RDBMS,
such as typed columns, secondary indexes, triggers, and advanced query languages.
</para>
<para>However, HBase has many features that support both linear and modular scaling. HBase clusters expand
by adding RegionServers that are hosted on commodity class servers. If a cluster expands from 10 to 20
RegionServers, for example, it doubles in both storage and processing capacity.
An RDBMS can scale well, but only up to a point (specifically, the size of a single database server), and for the best
performance it requires specialized hardware and storage devices. HBase features of note are:
<itemizedlist>
<listitem>Strongly consistent reads/writes: HBase is not an "eventually consistent" DataStore. This
makes it very suitable for tasks such as high-speed counter aggregation. </listitem>
<listitem>Automatic sharding: HBase tables are distributed on the cluster via regions, and regions are
automatically split and re-distributed as your data grows.</listitem>
<listitem>Automatic RegionServer failover</listitem>
<listitem>Hadoop/HDFS Integration: HBase supports HDFS out of the box as its distributed file system.</listitem>
<listitem>MapReduce: HBase supports massively parallelized processing via MapReduce, with HBase usable as both
source and sink.</listitem>
<listitem>Java Client API: HBase supports an easy-to-use Java API for programmatic access.</listitem>
<listitem>Thrift/REST API: HBase also supports Thrift and REST for non-Java front-ends.</listitem>
<listitem>Block Cache and Bloom Filters: HBase supports a Block Cache and Bloom Filters for high-volume query optimization.</listitem>
<listitem>Operational Management: HBase provides built-in web pages for operational insight as well as JMX metrics.</listitem>
</itemizedlist>
</para>
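<para>The automatic sharding item in the list above can be pictured with a toy model. The following is a minimal
sketch in Python, not HBase code: the ToyTable class, its max_rows_per_region threshold, and the
split-at-the-middle-key policy are hypothetical names and simplifications chosen for illustration. It shows only
the general idea that rows are kept sorted by key, each region owns a contiguous key range, and a region that
grows past a threshold is split in two.
</para>

```python
import bisect


class ToyTable:
    """Toy model of a table split into regions by sorted row-key ranges."""

    def __init__(self, max_rows_per_region=4):
        self.max_rows = max_rows_per_region
        # start_keys[i] is the inclusive start key of region i.
        # The empty string sorts before every key, so region 0 starts "at the beginning".
        self.start_keys = [""]
        self.regions = [[]]  # each region keeps its row keys in sorted order

    def region_for(self, row_key):
        # The owning region is the last one whose start key is not greater than row_key.
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key):
        i = self.region_for(row_key)
        rows = self.regions[i]
        pos = bisect.bisect_left(rows, row_key)
        if pos == len(rows) or rows[pos] != row_key:
            rows.insert(pos, row_key)
        if len(rows) > self.max_rows:
            self._split(i)

    def _split(self, i):
        rows = self.regions[i]
        mid = len(rows) // 2
        # The upper half becomes a new region that starts at the middle key.
        self.start_keys.insert(i + 1, rows[mid])
        self.regions[i:i + 1] = [rows[:mid], rows[mid:]]


table = ToyTable(max_rows_per_region=4)
for n in range(10):
    table.put("row%02d" % n)

print(table.start_keys)
print(table.region_for("row05"))
```

<para>In this toy model a lookup is a binary search over the region start keys, so finding the owning region stays
cheap as the number of regions grows. Real HBase decides when to split based on data size and configuration rather
than on a row count.
</para>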
</section>
<section xml:id="arch.overview.when">
<title>When Should I Use HBase?</title>
<para>First, make sure you have enough data. HBase isn't suitable for every problem. If you have
hundreds of millions or billions of rows, then HBase is a good candidate. If you only have a few
thousand or a few million rows, then a traditional RDBMS might be a better choice, because
all of your data might wind up on a single node (or two) and the rest of the cluster may
be sitting idle.
</para>
<para>Second, make sure you have enough hardware. Even HDFS doesn't do well with fewer than
5 DataNodes (due to things such as HDFS block replication, which has a default of 3), plus a NameNode.
</para>
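<para>The replication point can be made concrete with some rough arithmetic. The sketch below is an illustration
only: the function names and the 6 TB per-node disk size are assumptions, and it ignores everything except the
replication factor. It relies on the fact that HDFS stores each replica of a block on a different DataNode, so a
cluster needs at least as many live DataNodes as the replication factor to keep blocks fully replicated.
</para>

```python
def usable_capacity_tb(datanodes, disk_tb_per_node, replication=3):
    """Raw disk divided by the replication factor: every block is stored `replication` times."""
    return datanodes * disk_tb_per_node / replication


def failures_tolerated_at_full_replication(datanodes, replication=3):
    """How many DataNodes can be lost while enough nodes remain to hold all replicas of a block."""
    return max(datanodes - replication, 0)


for nodes in (3, 5, 10):
    print(nodes,
          usable_capacity_tb(nodes, disk_tb_per_node=6),
          failures_tolerated_at_full_replication(nodes))
```

<para>With 3 DataNodes and the default replication of 3, losing a single node leaves every block under-replicated
until the node returns, whereas 5 DataNodes leave some headroom.
</para>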
<para>HBase can run quite well stand-alone on a laptop, but this should be considered a development
configuration only.
</para>
</section>
<section xml:id="arch.overview.hbasehdfs">
<title>What Is The Difference Between HBase and Hadoop/HDFS?</title>
<para><link xlink:href="http://hadoop.apache.org/hdfs/">HDFS</link> is a distributed file system that is well suited for the storage of large files.
Its documentation states that it is not, however, a general purpose file system, and does not provide fast individual record lookups in files.
HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables.
This can sometimes be a point of conceptual confusion. HBase internally puts your data in indexed "StoreFiles" that exist
on HDFS for high-speed lookups. See the <xref linkend="datamodel" /> and the rest of this chapter for more information on how HBase achieves its goals.
</para>
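<para>To illustrate why an indexed, sorted file allows fast record lookups where a plain file does not, here is a
minimal sketch in Python. It is not the actual StoreFile format: the build_indexed_file and lookup functions, the
block size of 3, and the in-memory lists are all assumptions made for illustration. The idea shown is only that
records are sorted by key and grouped into blocks, and a small index of each block's first key lets a reader jump
to the single block that could contain a key instead of scanning the whole file.
</para>

```python
import bisect


def build_indexed_file(records, block_size=3):
    """Sort records by key, cut them into blocks, and index each block's first key."""
    ordered = sorted(records.items())
    blocks = [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]
    index = [block[0][0] for block in blocks]
    return index, blocks


def lookup(index, blocks, key):
    """Binary-search the index for the one candidate block, then scan only that block."""
    i = bisect.bisect_right(index, key) - 1
    if i == -1:
        return None  # key sorts before the first block (or the file is empty)
    for k, v in blocks[i]:
        if k == key:
            return v
    return None


records = {"row%03d" % n: "value-%d" % n for n in range(100)}
index, blocks = build_indexed_file(records)

print(len(blocks))
print(lookup(index, blocks, "row042"))
```

<para>A lookup here touches the index plus one block, regardless of how many records the file holds.
</para>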
</section>
</section>
<section xml:id="arch.catalog">
<title>Catalog Tables</title>
@ -2000,17 +2057,7 @@ hbase> describe 't1'</programlisting>
<qandaentry>
<question><para>When should I use HBase?</para></question>
<answer>
<para> <para>See the <xref linkend="arch.overview" /> in the Architecture chapter.
Anybody can download and give HBase a spin, even on a laptop. The scope of this answer is when
would it be best to use HBase in a <emphasis>real</emphasis> deployment.
</para>
<para>First, make sure you have enough hardware. Even HDFS doesn't do well with anything less than
5 DataNodes (due to things such as HDFS block replication which has a default of 3), plus a NameNode.
Second, make sure you have enough data. HBase isn't suitable for every problem. If you have
hundreds of millions or billions of rows, then HBase is a good candidate. If you only have a few
thousand/million rows, then using a traditional RDBMS might be a better choice due to the
fact that all of your data might wind up on a single node (or two) and the rest of the cluster may
be sitting idle.
</para>
</answer>
</qandaentry>
@ -2031,17 +2078,6 @@ hbase> describe 't1'</programlisting>
</para>
</answer>
</qandaentry>
<qandaentry xml:id="faq.hdfs.hbase">
<question><para>How does HBase work on top of HDFS?</para></question>
<answer>
<para>
<link xlink:href="http://hadoop.apache.org/hdfs/">HDFS</link> is a distributed file system that is well suited for the storage of large files. Its documentation
states that it is not, however, a general purpose file system, and does not provide fast individual record lookups in files.
HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. This can sometimes be a point of conceptual confusion.
See the <xref linkend="datamodel" /> and <xref linkend="architecture" /> sections for more information on how HBase achieves its goals.
</para>
</answer>
</qandaentry>
</qandadiv>
<qandadiv xml:id="faq.config"><title>Configuration</title>
<qandaentry xml:id="faq.config.started">
@ -2109,6 +2145,16 @@ hbase> describe 't1'</programlisting>
</answer>
</qandaentry>
</qandadiv>
<qandadiv xml:id="faq.mapreduce"><title>MapReduce</title>
<qandaentry xml:id="faq.mapreduce.use">
<question><para>How can I use MapReduce with HBase?</para></question>
<answer>
<para>
See <xref linkend="mapreduce" />
</para>
</answer>
</qandaentry>
</qandadiv>
<qandadiv><title>Performance and Troubleshooting</title>
<qandaentry>
<question><para>

@ -196,6 +196,28 @@ export HBASE_OPTS="-XX:NewSize=64m -XX:MaxNewSize=64m &lt;cms options from above
</para>
</section>
</section>
<section xml:id="trouble.resources">
<title>Resources</title>
<section xml:id="trouble.resources.lists">
<title>Dist-Lists</title>
<para>Sign up for the <link xlink:href="http://hbase.apache.org/mail-lists.html">HBase Dist-Lists</link> and post a question. 'Dev' is aimed at the
community of developers actually building HBase and at features currently under development, and 'User' is generally used for questions on released
versions of HBase.
</para>
</section>
<section xml:id="trouble.resources.searchhadoop">
<title>search-hadoop.com</title>
<para>
<link xlink:href="http://search-hadoop.com">search-hadoop.com</link> indexes all the mailing lists and is great for historical searches.
</para>
</section>
<section xml:id="trouble.resources.jira">
<title>JIRA</title>
<para>
<link xlink:href="https://issues.apache.org/jira/browse/HBASE">JIRA</link> is also really helpful when looking for Hadoop/HBase-specific issues.
</para>
</section>
</section>
<section xml:id="trouble.tools">
<title>Tools</title>
<section xml:id="trouble.tools.builtin">
@ -221,12 +243,6 @@ export HBASE_OPTS="-XX:NewSize=64m -XX:MaxNewSize=64m &lt;cms options from above
</section>
<section xml:id="trouble.tools.external">
<title>External Tools</title>
<section xml:id="trouble.tools.searchhadoop">
<title>search-hadoop.com</title>
<para>
<link xlink:href="http://search-hadoop.com">search-hadoop.com</link> indexes all the mailing lists and <link xlink:href="https://issues.apache.org/jira/browse/HBASE">JIRA</link>; it's really helpful when looking for Hadoop/HBase-specific issues.
</para>
</section>
<section xml:id="trouble.tools.tail">
<title>tail</title>
<para>