hbase-4939 book.xml (architecture/faq), troubleshooting.xml (created resources section)
git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1209688 13f79535-47bb-0310-9956-ffa450edef68
parent b3d87416d2
commit 7df968e09a

book.xml
@@ -1200,6 +1200,63 @@ if (!b) {
<chapter xml:id="architecture">
<title>Architecture</title>
<section xml:id="arch.overview">
<title>Overview</title>
<section xml:id="arch.overview.nosql">
<title>NoSQL?</title>
<para>HBase is a type of "NoSQL" database. "NoSQL" is a general term meaning that the database isn't an RDBMS which
supports SQL as its primary access language, but there are many types of NoSQL databases: BerkeleyDB is an
example of a local NoSQL database, whereas HBase is very much a distributed database. Technically speaking,
HBase is really more a "Data Store" than a "Data Base" because it lacks many of the features you find in an RDBMS,
such as typed columns, secondary indexes, triggers, and advanced query languages.
</para>
<para>However, HBase has many features that support both linear and modular scaling. HBase clusters expand
by adding RegionServers that are hosted on commodity-class servers. If a cluster expands from 10 to 20
RegionServers, for example, it doubles in both storage and processing capacity.
An RDBMS can scale well, but only up to a point - specifically, the size of a single database server - and the best
performance requires specialized hardware and storage devices. HBase features of note are:
<itemizedlist>
<listitem>Strongly consistent reads/writes: HBase is not an "eventually consistent" DataStore. This
makes it very suitable for tasks such as high-speed counter aggregation.</listitem>
<listitem>Automatic sharding: HBase tables are distributed on the cluster via regions, and regions are
automatically split and re-distributed as your data grows.</listitem>
<listitem>Automatic RegionServer failover</listitem>
<listitem>Hadoop/HDFS Integration: HBase supports HDFS out of the box as its distributed file system.</listitem>
<listitem>MapReduce: HBase supports massively parallelized processing via MapReduce, using HBase as both
source and sink.</listitem>
<listitem>Java Client API: HBase supports an easy-to-use Java API for programmatic access (see the sketch after this list).</listitem>
<listitem>Thrift/REST API: HBase also supports Thrift and REST for non-Java front-ends.</listitem>
<listitem>Block Cache and Bloom Filters: HBase supports a Block Cache and Bloom Filters for high-volume query optimization.</listitem>
<listitem>Operational Management: HBase provides built-in web pages for operational insight as well as JMX metrics.</listitem>
</itemizedlist>
</para>
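<para>To make the Java Client API and strong-consistency points above concrete, here is a minimal sketch; the table
name "mytable", the column family "cf", and the row and qualifier names are hypothetical placeholders, so adapt them
to your own schema.
</para>
<programlisting><![CDATA[
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientSketch {
  public static void main(String[] args) throws Exception {
    // Picks up cluster settings from hbase-site.xml on the classpath.
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // "mytable" is a hypothetical table name

    // Write a cell to the (hypothetical) column family "cf".
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("qual"), Bytes.toBytes("value"));
    table.put(put);

    // Read it back; the write above is immediately visible (strongly consistent).
    Get get = new Get(Bytes.toBytes("row1"));
    Result result = table.get(get);
    byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("qual"));

    // Atomic counter increment, the kind of operation behind high-speed counter aggregation.
    table.incrementColumnValue(Bytes.toBytes("row1"), Bytes.toBytes("cf"), Bytes.toBytes("counter"), 1L);

    table.close();
  }
}
]]></programlisting>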
</section>

<section xml:id="arch.overview.when">
<title>When Should I Use HBase?</title>
<para>First, make sure you have enough data. HBase isn't suitable for every problem. If you have
hundreds of millions or billions of rows, then HBase is a good candidate. If you only have a few
thousand/million rows, then using a traditional RDBMS might be a better choice because
all of your data might wind up on a single node (or two) and the rest of the cluster may
be sitting idle.
</para>
<para>Second, make sure you have enough hardware. Even HDFS doesn't do well with anything less than
5 DataNodes (due to things such as HDFS block replication, which has a default of 3), plus a NameNode.
</para>
<para>HBase can run quite well stand-alone on a laptop - but this should be considered a development
configuration only.
</para>
</section>
<section xml:id="arch.overview.hbasehdfs">
<title>What Is The Difference Between HBase and Hadoop/HDFS?</title>
<para><link xlink:href="http://hadoop.apache.org/hdfs/">HDFS</link> is a distributed file system that is well suited for the storage of large files.
Its documentation states that it is not, however, a general purpose file system, and does not provide fast individual record lookups in files.
HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables.
This can sometimes be a point of conceptual confusion. HBase internally puts your data in indexed "StoreFiles" that exist
on HDFS for high-speed lookups. See the <xref linkend="datamodel" /> and the rest of this chapter for more information on how HBase achieves its goals.
</para>
</section>
</section>

<section xml:id="arch.catalog">
<title>Catalog Tables</title>

@@ -2000,17 +2057,7 @@ hbase> describe 't1'</programlisting>
<qandaentry>
<question><para>When should I use HBase?</para></question>
<answer>
<para>
<para>See the <xref linkend="arch.overview" /> in the Architecture chapter.
Anybody can download and give HBase a spin, even on a laptop. The scope of this answer is when
would it be best to use HBase in a <emphasis>real</emphasis> deployment.
</para>
<para>First, make sure you have enough hardware. Even HDFS doesn't do well with anything less than
5 DataNodes (due to things such as HDFS block replication which has a default of 3), plus a NameNode.
Second, make sure you have enough data. HBase isn't suitable for every problem. If you have
hundreds of millions or billions of rows, then HBase is a good candidate. If you only have a few
thousand/million rows, then using a traditional RDBMS might be a better choice due to the
fact that all of your data might wind up on a single node (or two) and the rest of the cluster may
be sitting idle.
</para>
</answer>
</qandaentry>

@@ -2031,17 +2078,6 @@ hbase> describe 't1'</programlisting>
</para>
</answer>
</qandaentry>
<qandaentry xml:id="faq.hdfs.hbase">
<question><para>How does HBase work on top of HDFS?</para></question>
<answer>
<para>
<link xlink:href="http://hadoop.apache.org/hdfs/">HDFS</link> is a distributed file system that is well suited for the storage of large files. It's documentation
states that it is not, however, a general purpose file system, and does not provide fast individual record lookups in files.
HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. This can sometimes be a point of conceptual confusion.
See the <xref linkend="datamodel" /> and <xref linkend="architecture" /> sections for more information on how HBase achieves its goals.
</para>
</answer>
</qandaentry>
</qandadiv>
<qandadiv xml:id="faq.config"><title>Configuration</title>
<qandaentry xml:id="faq.config.started">

@@ -2109,6 +2145,16 @@ hbase> describe 't1'</programlisting>
</answer>
</qandaentry>
</qandadiv>
<qandadiv xml:id="faq.mapreduce"><title>MapReduce</title>
<qandaentry xml:id="faq.mapreduce.use">
<question><para>How can I use MapReduce with HBase?</para></question>
<answer>
<para>
See <xref linkend="mapreduce" />.
</para>
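<para>In short, an HBase table can serve as the source (and/or sink) of a MapReduce job via the classes in the
org.apache.hadoop.hbase.mapreduce package. The following is only a rough sketch: the table name "mytable" and the
class and job names are hypothetical placeholders, and HBase also ships a ready-made RowCounter tool that does this
for real.
</para>
<programlisting><![CDATA[
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class SketchRowCounter {

  // Mapper that sees each row of the scanned table once and bumps a counter.
  public static class RowCountMapper extends TableMapper<ImmutableBytesWritable, Result> {
    public enum Counters { ROWS }
    @Override
    public void map(ImmutableBytesWritable row, Result value, Context context) {
      context.getCounter(Counters.ROWS).increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "sketch-row-counter");   // hypothetical job name
    job.setJarByClass(SketchRowCounter.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // rows fetched per RPC; tune for the workload
    scan.setCacheBlocks(false);  // full-table scans should not churn the block cache

    // Wire the (hypothetical) table "mytable" up as the MapReduce source.
    TableMapReduceUtil.initTableMapperJob(
        "mytable", scan, RowCountMapper.class,
        ImmutableBytesWritable.class, Result.class, job);

    job.setOutputFormatClass(NullOutputFormat.class);  // counters only, no file output
    job.setNumReduceTasks(0);                          // map-only job
    job.waitForCompletion(true);
  }
}
]]></programlisting>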
</answer>
</qandaentry>
</qandadiv>
<qandadiv><title>Performance and Troubleshooting</title>
<qandaentry>
<question><para>

troubleshooting.xml
@@ -196,6 +196,28 @@ export HBASE_OPTS="-XX:NewSize=64m -XX:MaxNewSize=64m <cms options from above
</para>
</section>
</section>
<section xml:id="trouble.resources">
<title>Resources</title>
<section xml:id="trouble.resources.lists">
<title>Dist-Lists</title>
<para>Sign up for the <link xlink:href="http://hbase.apache.org/mail-lists.html">HBase Dist-Lists</link> and post a question. 'Dev' is aimed at the
community of developers actually building HBase and at discussion of features currently under development, and 'User' is generally used for questions about released
versions of HBase.
</para>
</section>
<section xml:id="trouble.resources.searchhadoop">
<title>search-hadoop.com</title>
<para>
<link xlink:href="http://search-hadoop.com">search-hadoop.com</link> indexes all the mailing lists and is great for historical searches.
</para>
</section>
<section xml:id="trouble.resources.jira">
<title>JIRA</title>
<para>
<link xlink:href="https://issues.apache.org/jira/browse/HBASE">JIRA</link> is also really helpful when looking for Hadoop/HBase-specific issues.
</para>
</section>
</section>
<section xml:id="trouble.tools">
<title>Tools</title>
<section xml:id="trouble.tools.builtin">

@@ -221,12 +243,6 @@ export HBASE_OPTS="-XX:NewSize=64m -XX:MaxNewSize=64m <cms options from above
</section>
<section xml:id="trouble.tools.external">
<title>External Tools</title>
<section xml:id="trouble.tools.searchhadoop">
<title>search-hadoop.com</title>
<para>
<link xlink:href="http://search-hadoop.com">search-hadoop.com</link> indexes all the mailing lists and <link xlink:href="https://issues.apache.org/jira/browse/HBASE">JIRA</link>, it’s really helpful when looking for Hadoop/HBase-specific issues.
</para>
</section>
<section xml:id="trouble.tools.tail">
<title>tail</title>
<para>