commit f980d27472
parent 4d07aa5f1c

    HBASE-11155 Fix Validation Errors in Ref Guide (Misty Stanley-Jones)

    git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1596110 13f79535-47bb-0310-9956-ffa450edef68

(The diff of one file is suppressed because it is too large.)
@@ -7,7 +7,7 @@
 xmlns:m="http://www.w3.org/1998/Math/MathML"
 xmlns:html="http://www.w3.org/1999/xhtml"
 xmlns:db="http://docbook.org/ns/docbook">
 <!--
 /**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements. See the NOTICE file
@@ -58,16 +58,16 @@
 <section><title>Hardware</title>
 <para>Datanodes:
 <itemizedlist>
-<listitem>Two 12-core processors</listitem>
-<listitem>Six Enerprise SATA disks</listitem>
-<listitem>24GB of RAM</listitem>
-<listitem>Two bonded gigabit NICs</listitem>
+<listitem><para>Two 12-core processors</para></listitem>
+<listitem><para>Six Enerprise SATA disks</para></listitem>
+<listitem><para>24GB of RAM</para></listitem>
+<listitem><para>Two bonded gigabit NICs</para></listitem>
 </itemizedlist>
 </para>
 <para>Network:
 <itemizedlist>
-<listitem>10 Gigabit top-of-rack switches</listitem>
-<listitem>20 Gigabit bonded interconnects between racks.</listitem>
+<listitem><para>10 Gigabit top-of-rack switches</para></listitem>
+<listitem><para>20 Gigabit bonded interconnects between racks.</para></listitem>
 </itemizedlist>
 </para>
 </section>
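The hunk above reflects a DocBook 5 validation rule: `<listitem>` must contain block-level content, so bare inline text gets wrapped in `<para>`. A minimal valid fragment of the corrected pattern (item text here is illustrative, not from the commit):

```xml
<itemizedlist>
  <listitem><para>First item, wrapped in a block element</para></listitem>
  <listitem><para>Second item</para></listitem>
</itemizedlist>
```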
@@ -106,7 +106,7 @@
 <para>Since neither the disks nor the processors were being utilized heavily, we moved on to the performance of the network interfaces. The datanode had two
 gigabit ethernet adapters, bonded to form an active-standby interface. <code>ifconfig(8)</code> showed some unusual anomalies, namely interface errors, overruns, framing errors.
 While not unheard of, these kinds of errors are exceedingly rare on modern hardware which is operating as it should:
 <programlisting>
 $ /sbin/ifconfig bond0
 bond0  Link encap:Ethernet  HWaddr 00:00:00:00:00:00
 inet addr:10.x.x.x  Bcast:10.x.x.255  Mask:255.255.255.0
@@ -120,7 +120,7 @@ RX bytes:2416328868676 (2.4 TB) TX bytes:3464991094001 (3.4 TB)
 <para>These errors immediately lead us to suspect that one or more of the ethernet interfaces might have negotiated the wrong line speed. This was confirmed both by running an ICMP ping
 from an external host and observing round-trip-time in excess of 700ms, and by running <code>ethtool(8)</code> on the members of the bond interface and discovering that the active interface
 was operating at 100Mbs/, full duplex.
 <programlisting>
 $ sudo ethtool eth0
 Settings for eth0:
 Supported ports: [ TP ]
@@ -188,4 +188,4 @@ Link detected: yes
 
 </section> <!-- performance/troubleshooting -->
 
 </chapter>
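The hunks above quote the guide's walkthrough of diagnosing a NIC that negotiated 100 Mb/s instead of gigabit. As an aside, that check is easy to script; the sketch below parses ethtool-style output for the `Speed:` field. The sample text and the 1000 Mb/s threshold are assumptions for illustration, not part of the commit:

```python
import re

def link_speed_mbps(ethtool_output):
    """Return the negotiated speed in Mb/s parsed from ethtool-style output, or None."""
    m = re.search(r"Speed:\s*(\d+)Mb/s", ethtool_output)
    return int(m.group(1)) if m else None

# Sample text shaped like the ethtool transcript quoted in the guide (abbreviated).
sample = """Settings for eth0:
\tSupported ports: [ TP ]
\tSpeed: 100Mb/s
\tDuplex: Full
\tLink detected: yes"""

speed = link_speed_mbps(sample)
if speed is not None and speed < 1000:
    print("warning: link negotiated at %d Mb/s, expected >= 1000" % speed)
```

Run against each member of a bond to catch the mis-negotiated interface the chapter describes.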
@@ -173,7 +173,7 @@ needed for servers to pick up changes (caveat dynamic config. to be described la
 <footnote>
 <para>A useful read setting config on you hadoop cluster is Aaron
 Kimballs' <link
-xlink:ref="http://www.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/">Configuration
+xlink:href="http://www.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/">Configuration
 Parameters: What can you just ignore?</link></para>
 </footnote></para>
 
@@ -527,8 +527,7 @@ homed on the node <varname>h-24-30.example.com</varname>.
 </para>
 <para>Now skip to <xref linkend="confirm" /> for how to start and verify your
 pseudo-distributed install. <footnote>
-<para>See <xref linkend="pseudo.extras">Pseudo-distributed
-mode extras</xref> for notes on how to start extra Masters and
+<para>See <xref linkend="pseudo.extras"/> for notes on how to start extra Masters and
 RegionServers when running pseudo-distributed.</para>
 </footnote></para>
 
@@ -695,11 +694,7 @@ homed on the node <varname>h-24-30.example.com</varname>.
 
 <programlisting>bin/start-hbase.sh</programlisting>
 
-Run the above from the
-
-<varname>HBASE_HOME</varname>
-
-directory.
+<para>Run the above from the <varname>HBASE_HOME</varname> directory.</para>
 
 <para>You should now have a running HBase instance. HBase logs can be
 found in the <filename>logs</filename> subdirectory. Check them out
@@ -771,7 +766,79 @@ stopping hbase...............</programlisting> Shutdown can take a moment to
 The generated file is a docbook section with a glossary
 in it-->
 <!--presumes the pre-site target has put the hbase-default.xml at this location-->
-<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="../../../target/docbkx/hbase-default.xml" />
+<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="../../../target/docbkx/hbase-default.xml">
+  <xi:fallback>
+    <section xml:id="hbase_default_configurations">
+      <title></title>
+      <para>
+        <emphasis>This file is fallback content</emphasis>. If you are seeing this, something is wrong with the build of the HBase documentation or you are doing pre-build verification.
+      </para>
+      <para>
+        The file hbase-default.xml is generated as part of
+        the build of the hbase site. See the hbase <filename>pom.xml</filename>.
+        The generated file is a docbook glossary.
+      </para>
+      <section>
+        <title>IDs that are auto-generated and cause validation errors if not present</title>
+        <para>
+          Each of these is a reference to a configuration file parameter which will cause an error if you are using the fallback content here. This is a dirty dirty hack.
+        </para>
+        <section xml:id="fail.fast.expired.active.master">
+          <title>fail.fast.expired.active.master</title>
+          <para />
+        </section>
+        <section xml:id="hbase.hregion.memstore.flush.size">
+          <title>"hbase.hregion.memstore.flush.size"</title>
+          <para />
+        </section>
+        <section xml:id="hbase.hstore.bytes.per.checksum">
+          <title>hbase.hstore.bytes.per.checksum</title>
+          <para />
+        </section>
+        <section xml:id="hbase.online.schema.update.enable">
+          <title>hbase.online.schema.update.enable</title>
+          <para />
+        </section>
+        <section xml:id="hbase.regionserver.global.memstore.size">
+          <title>hbase.regionserver.global.memstore.size</title>
+          <para />
+        </section>
+        <section xml:id="hbase.hregion.max.filesize">
+          <title>hbase.hregion.max.filesize</title>
+          <para />
+        </section>
+        <section xml:id="hbase.hstore.blockingStoreFiles">
+          <title>hbase.hstore.BlockingStoreFiles</title>
+          <para />
+        </section>
+        <section xml:id="hfile.block.cache.size">
+          <title>hfile.block.cache.size</title>
+          <para />
+        </section>
+        <section xml:id="copy.table">
+          <title>copy.table</title>
+          <para />
+        </section>
+        <section xml:id="hbase.hstore.checksum.algorithm">
+          <title>hbase.hstore.checksum.algorithm</title>
+          <para />
+        </section>
+        <section xml:id="hbase.zookeeper.useMulti">
+          <title>hbase.zookeeper.useMulti</title>
+          <para />
+        </section>
+        <section xml:id="hbase.hregion.memstore.block.multiplier">
+          <title>hbase.hregion.memstore.block.multiplier</title>
+          <para />
+        </section>
+        <section xml:id="hbase.regionserver.global.memstore.size.lower.limit">
+          <title>hbase.regionserver.global.memstore.size.lower.limit</title>
+          <para />
+        </section>
+      </section>
+    </section>
+  </xi:fallback>
+</xi:include>
 </section>
 
 <section xml:id="hbase.env.sh">
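The hunk above turns a self-closing `xi:include` into one carrying an `xi:fallback`, so validation can succeed even when the generated hbase-default.xml is absent. A stripped-down, standalone version of the pattern (the file name and titles here are illustrative, not from the commit):

```xml
<chapter xmlns="http://docbook.org/ns/docbook"
         xmlns:xi="http://www.w3.org/2001/XInclude" version="5.0">
  <title>Configuration</title>
  <xi:include href="generated/defaults.xml">
    <xi:fallback>
      <!-- Used only when generated/defaults.xml cannot be resolved -->
      <section xml:id="defaults.missing">
        <title>Generated defaults not available</title>
        <para>Run the site build to generate this content.</para>
      </section>
    </xi:fallback>
  </xi:include>
</chapter>
```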
@@ -118,8 +118,8 @@ git clone git://github.com/apache/hbase.git
 <title>Maven Classpath Variable</title>
 <para>The <varname>M2_REPO</varname> classpath variable needs to be set up for the project. This needs to be set to
 your local Maven repository, which is usually <filename>~/.m2/repository</filename></para>
-If this classpath variable is not configured, you will see compile errors in Eclipse like this...
-<programlisting>
+<para>If this classpath variable is not configured, you will see compile errors in Eclipse like this:
+</para> <programlisting>
 Description Resource Path Location Type
 The project cannot be built until build path errors are resolved hbase Unknown Java Problem
 Unbound classpath variable: 'M2_REPO/asm/asm/3.1/asm-3.1.jar' in project 'hbase' hbase Build path Build Path Problem
@@ -223,51 +223,52 @@ mvn compile -Dcompile-protobuf -Dprotoc.path=/opt/local/bin/protoc
 poms when you build. For now, just be aware of the difference between HBase 1.x
 builds and those of HBase 0.96-0.98. Below we will come back to this difference
 when we list out build instructions.</para>
-</section>
 
 <para xml:id="mvn.settings.file">Publishing to maven requires you sign the artifacts you want to upload. To have the
 build do this for you, you need to make sure you have a properly configured
 <filename>settings.xml</filename> in your local repository under <filename>.m2</filename>.
 Here is my <filename>~/.m2/settings.xml</filename>.
-<programlisting><settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
+<programlisting><![CDATA[<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
 http://maven.apache.org/xsd/settings-1.0.0.xsd">
 <servers>
 <!- To publish a snapshot of some part of Maven -->
 <server>
 <id>apache.snapshots.https</id>
 <username>YOUR_APACHE_ID
 </username>
 <password>YOUR_APACHE_PASSWORD
 </password>
 </server>
 <!-- To publish a website using Maven -->
 <!-- To stage a release of some part of Maven -->
 <server>
 <id>apache.releases.https</id>
 <username>YOUR_APACHE_ID
 </username>
 <password>YOUR_APACHE_PASSWORD
 </password>
 </server>
 </servers>
 <profiles>
 <profile>
 <id>apache-release</id>
 <properties>
 <gpg.keyname>YOUR_KEYNAME</gpg.keyname>
 <!--Keyname is something like this ... 00A5F21E... do gpg --list-keys to find it-->
 <gpg.passphrase>YOUR_KEY_PASSWORD
 </gpg.passphrase>
 </properties>
 </profile>
 </profiles>
-</settings>
+</settings>]]>
 </programlisting>
 </para>
 <para>You must use maven 3.0.x (Check by running <command>mvn -version</command>).
 </para>
+</section>
 <section xml:id="maven.release">
 <title>Making a Release Candidate</title>
 <para>I'll explain by running through the process. See later in this section for more detail on particular steps.
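The hunk above wraps the sample settings.xml in a CDATA section so the DocBook toolchain stops parsing the embedded angle brackets as markup. The shape of the fix, reduced to its essentials (element content is illustrative):

```xml
<programlisting><![CDATA[<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <!-- Everything inside CDATA reaches the output verbatim; the DocBook
       processor does not try to parse these tags as DocBook elements. -->
</settings>]]></programlisting>
```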
@@ -501,17 +502,17 @@ HBase have a character not usually seen in other projects.</para>
 dependency tree).</para>
 <section xml:id="hbase.moduletest.run">
 <title>Running Tests in other Modules</title>
-If the module you are developing in has no other dependencies on other HBase modules, then
-you can cd into that module and just run:
+<para>If the module you are developing in has no other dependencies on other HBase modules, then
+you can cd into that module and just run:</para>
 <programlisting>mvn test</programlisting>
-which will just run the tests IN THAT MODULE. If there are other dependencies on other modules,
+<para>which will just run the tests IN THAT MODULE. If there are other dependencies on other modules,
 then you will have run the command from the ROOT HBASE DIRECTORY. This will run the tests in the other
 modules, unless you specify to skip the tests in that module. For instance, to skip the tests in the hbase-server module,
-you would run:
+you would run:</para>
 <programlisting>mvn clean test -PskipServerTests</programlisting>
-from the top level directory to run all the tests in modules other than hbase-server. Note that you
+<para>from the top level directory to run all the tests in modules other than hbase-server. Note that you
 can specify to skip tests in multiple modules as well as just for a single module. For example, to skip
-the tests in <classname>hbase-server</classname> and <classname>hbase-common</classname>, you would run:
+the tests in <classname>hbase-server</classname> and <classname>hbase-common</classname>, you would run:</para>
 <programlisting>mvn clean test -PskipServerTests -PskipCommonTests</programlisting>
 <para>Also, keep in mind that if you are running tests in the <classname>hbase-server</classname> module you will need to
 apply the maven profiles discussed in <xref linkend="hbase.unittests.cmds"/> to get the tests to run properly.</para>
@@ -541,7 +542,7 @@ The first three categories, small, medium, and large are for tests run when
 you type <code>$ mvn test</code>; i.e. these three categorizations are for
 HBase unit tests. The integration category is for not for unit tests but for integration
 tests. These are run when you invoke <code>$ mvn verify</code>. Integration tests
-are described in <xref linkend="integration.tests">integration tests section</xref> and will not be discussed further
+are described in <xref linkend="integration.tests"/> and will not be discussed further
 in this section on HBase unit tests.</para>
 <para>
 Apache HBase uses a patched maven surefire plugin and maven profiles to implement
@@ -579,7 +580,7 @@ the developer machine as well.
 <section xml:id="hbase.unittests.integration">
 <title>Integration Tests<indexterm><primary>IntegrationTests</primary></indexterm></title>
 <para><emphasis>Integration</emphasis> tests are system level tests. See
-<xref linkend="integration.tests">integration tests section</xref> for more info.
+<xref linkend="integration.tests"/> for more info.
 </para>
 </section>
 </section>
@@ -704,17 +705,17 @@ should not impact these resources, it's worth checking these log lines
 <title>General rules</title>
 <itemizedlist>
 <listitem>
-As much as possible, tests should be written as category small tests.
+<para>As much as possible, tests should be written as category small tests.</para>
 </listitem>
 <listitem>
-All tests must be written to support parallel execution on the same machine, hence they should not use shared resources as fixed ports or fixed file names.
+<para>All tests must be written to support parallel execution on the same machine, hence they should not use shared resources as fixed ports or fixed file names.</para>
 </listitem>
 <listitem>
-Tests should not overlog. More than 100 lines/second makes the logs complex to read and use i/o that are hence not available for the other tests.
+<para>Tests should not overlog. More than 100 lines/second makes the logs complex to read and use i/o that are hence not available for the other tests.</para>
 </listitem>
 <listitem>
-Tests can be written with <classname>HBaseTestingUtility</classname>.
-This class offers helper functions to create a temp directory and do the cleanup, or to start a cluster.
+<para>Tests can be written with <classname>HBaseTestingUtility</classname>.
+This class offers helper functions to create a temp directory and do the cleanup, or to start a cluster.</para>
 </listitem>
 </itemizedlist>
 </section>
@@ -722,19 +723,19 @@ This class offers helper functions to create a temp directory and do the cleanup
 <title>Categories and execution time</title>
 <itemizedlist>
 <listitem>
-All tests must be categorized, if not they could be skipped.
+<para>All tests must be categorized, if not they could be skipped.</para>
 </listitem>
 <listitem>
-All tests should be written to be as fast as possible.
+<para>All tests should be written to be as fast as possible.</para>
 </listitem>
 <listitem>
-Small category tests should last less than 15 seconds, and must not have any side effect.
+<para>Small category tests should last less than 15 seconds, and must not have any side effect.</para>
 </listitem>
 <listitem>
-Medium category tests should last less than 50 seconds.
+<para>Medium category tests should last less than 50 seconds.</para>
 </listitem>
 <listitem>
-Large category tests should last less than 3 minutes. This should ensure a good parallelization for people using it, and ease the analysis when the test fails.
+<para>Large category tests should last less than 3 minutes. This should ensure a good parallelization for people using it, and ease the analysis when the test fails.</para>
 </listitem>
 </itemizedlist>
 </section>
@@ -862,17 +863,17 @@ are running other tests.
 </para>
 
 <para>
-ChaosMonkey defines Action's and Policy's. Actions are sequences of events. We have at least the following actions:
+ChaosMonkey defines Action's and Policy's. Actions are sequences of events. We have at least the following actions:</para>
 <itemizedlist>
-<listitem>Restart active master (sleep 5 sec)</listitem>
-<listitem>Restart random regionserver (sleep 5 sec)</listitem>
-<listitem>Restart random regionserver (sleep 60 sec)</listitem>
-<listitem>Restart META regionserver (sleep 5 sec)</listitem>
-<listitem>Restart ROOT regionserver (sleep 5 sec)</listitem>
-<listitem>Batch restart of 50% of regionservers (sleep 5 sec)</listitem>
-<listitem>Rolling restart of 100% of regionservers (sleep 5 sec)</listitem>
+<listitem><para>Restart active master (sleep 5 sec)</para></listitem>
+<listitem><para>Restart random regionserver (sleep 5 sec)</para></listitem>
+<listitem><para>Restart random regionserver (sleep 60 sec)</para></listitem>
+<listitem><para>Restart META regionserver (sleep 5 sec)</para></listitem>
+<listitem><para>Restart ROOT regionserver (sleep 5 sec)</para></listitem>
+<listitem><para>Batch restart of 50% of regionservers (sleep 5 sec)</para></listitem>
+<listitem><para>Rolling restart of 100% of regionservers (sleep 5 sec)</para></listitem>
 </itemizedlist>
+<para>
 Policies on the other hand are responsible for executing the actions based on a strategy.
 The default policy is to execute a random action every minute based on predefined action
 weights. ChaosMonkey executes predefined named policies until it is stopped. More than one
@@ -881,11 +882,12 @@ policy can be active at any time.
 
 <para>
 To run ChaosMonkey as a standalone tool deploy your HBase cluster as usual. ChaosMonkey uses the configuration
-from the bin/hbase script, thus no extra configuration needs to be done. You can invoke the ChaosMonkey by running:
+from the bin/hbase script, thus no extra configuration needs to be done. You can invoke the ChaosMonkey by running:</para>
 <programlisting>bin/hbase org.apache.hadoop.hbase.util.ChaosMonkey</programlisting>
+<para>
 This will output smt like:
-<programlisting>
+</para>
+<screen>
 12/11/19 23:21:57 INFO util.ChaosMonkey: Using ChaosMonkey Policy: class org.apache.hadoop.hbase.util.ChaosMonkey$PeriodicRandomActionPolicy, period:60000
 12/11/19 23:21:57 INFO util.ChaosMonkey: Sleeping for 26953 to add jitter
 12/11/19 23:22:24 INFO util.ChaosMonkey: Performing action: Restart active master
@@ -921,8 +923,8 @@ This will output smt like:
 12/11/19 23:24:26 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:starting regionserver, logging to /homes/enis/code/hbase-0.94/bin/../logs/hbase-enis-regionserver-rs3.example.com.out
 
 12/11/19 23:24:27 INFO util.ChaosMonkey: Started region server:rs3.example.com,60020,1353367027826. Reported num of rs:6
-</programlisting>
+</screen>
+<para>
 As you can see from the log, ChaosMonkey started the default PeriodicRandomActionPolicy, which is configured with all the available actions, and ran RestartActiveMaster and RestartRandomRs actions. ChaosMonkey tool, if run from command line, will keep on running until the process is killed.
 </para>
 </section>
@@ -958,7 +960,7 @@ mvn compile
 <para>
 The above will build against whatever explicit hadoop 1.x version we have in our <filename>pom.xml</filename> as our '1.0' version.
 Tests may not all pass so you may need to pass <code>-DskipTests</code> unless you are inclined to fix the failing tests.</para>
-<note id="maven.build.passing.default.profile">
+<note xml:id="maven.build.passing.default.profile">
 <title>'dependencyManagement.dependencies.dependency.artifactId' for org.apache.hbase:${compat.module}:test-jar with value '${compat.module}' does not match a valid id pattern</title>
 <para>You will see ERRORs like the above title if you pass the <emphasis>default</emphasis> profile; e.g. if
 you pass <property>hadoop.profile=1.1</property> when building 0.96 or
@@ -1001,12 +1003,12 @@ pecularity that is probably fixable but we've not spent the time trying to figur
 <section xml:id="jira.priorities"><title>Jira Priorities</title>
 <para>The following is a guideline on setting Jira issue priorities:
 <itemizedlist>
-<listitem>Blocker: Should only be used if the issue WILL cause data loss or cluster instability reliably.</listitem>
-<listitem>Critical: The issue described can cause data loss or cluster instability in some cases.</listitem>
-<listitem>Major: Important but not tragic issues, like updates to the client API that will add a lot of much-needed functionality or significant
-bugs that need to be fixed but that don't cause data loss.</listitem>
-<listitem>Minor: Useful enhancements and annoying but not damaging bugs.</listitem>
-<listitem>Trivial: Useful enhancements but generally cosmetic.</listitem>
+<listitem><para>Blocker: Should only be used if the issue WILL cause data loss or cluster instability reliably.</para></listitem>
+<listitem><para>Critical: The issue described can cause data loss or cluster instability in some cases.</para></listitem>
+<listitem><para>Major: Important but not tragic issues, like updates to the client API that will add a lot of much-needed functionality or significant
+bugs that need to be fixed but that don't cause data loss.</para></listitem>
+<listitem><para>Minor: Useful enhancements and annoying but not damaging bugs.</para></listitem>
+<listitem><para>Trivial: Useful enhancements but generally cosmetic.</para></listitem>
 </itemizedlist>
 </para>
 </section>
@@ -1161,10 +1163,9 @@ pecularity that is probably fixable but we've not spent the time trying to figur
 <para>Please submit one patch-file per Jira. For example, if multiple files are changed make sure the
 selected resource when generating the patch is a directory. Patch files can reflect changes in multiple files. </para>
 <para>
-Generating patches using git:<sbr/>
+Generating patches using git:</para>
-<programlisting>
-$ git diff --no-prefix > HBASE_XXXX.patch
-</programlisting>
+<screen>$ git diff --no-prefix > HBASE_XXXX.patch</screen>
+<para>
 Don't forget the 'no-prefix' option; and generate the diff from the root directory of project
 </para>
 <para>Make sure you review <xref linkend="eclipse.code.formatting"/> for code style. </para>
@@ -1283,11 +1284,10 @@ Bar bar = foo.getBar(); <--- imagine there's an extra space(s) after the
 </section>
 <section xml:id="common.patch.feedback.javadoc">
 <title>Javadoc</title>
-<para>This is also a very common feedback item. Don't forget Javadoc!
+<para>This is also a very common feedback item. Don't forget Javadoc!</para>
 <para>Javadoc warnings are checked during precommit. If the precommit tool gives you a '-1',
 please fix the javadoc issue. Your patch won't be committed if it adds such warnings.
 </para>
-</para>
 </section>
 <section xml:id="common.patch.feedback.findbugs">
 <title>Findbugs</title>
@@ -1345,25 +1345,25 @@ Bar bar = foo.getBar(); <--- imagine there's an extra space(s) after the
 </para>
 <itemizedlist>
 <listitem>
-Do not delete the old patch file
+<para>Do not delete the old patch file</para>
 </listitem>
 <listitem>
-version your new patch file using a simple scheme like this: <sbr/>
+<para>version your new patch file using a simple scheme like this:</para>
-HBASE-{jira number}-{version}.patch <sbr/>
+<screen>HBASE-{jira number}-{version}.patch</screen>
-e.g:
+<para>e.g:</para>
-HBASE_XXXX-v2.patch
+<screen>HBASE_XXXX-v2.patch</screen>
 </listitem>
 <listitem>
-'Cancel Patch' on JIRA.. bug status will change back to Open
+<para>'Cancel Patch' on JIRA.. bug status will change back to Open</para>
 </listitem>
 <listitem>
-Attach new patch file (e.g. HBASE_XXXX-v2.patch) using 'Files --> Attach'
+<para>Attach new patch file (e.g. HBASE_XXXX-v2.patch) using 'Files --> Attach'</para>
 </listitem>
 <listitem>
-Click on 'Submit Patch'. Now the bug status will say 'Patch Available'.
+<para>Click on 'Submit Patch'. Now the bug status will say 'Patch Available'.</para>
 </listitem>
 </itemizedlist>
-Committers will review the patch. Rinse and repeat as many times as needed :-)
+<para>Committers will review the patch. Rinse and repeat as many times as needed :-)</para>
 </section>

 <section>
@@ -1372,32 +1372,32 @@ Bar bar = foo.getBar(); <--- imagine there's an extra space(s) after the
 At times you may want to break a big change into multiple patches. Here is a sample work-flow using git
 <itemizedlist>
 <listitem>
-patch 1:
+<para>patch 1:</para>
 <itemizedlist>
 <listitem>
-$ git diff --no-prefix > HBASE_XXXX-1.patch
+<screen>$ git diff --no-prefix > HBASE_XXXX-1.patch</screen>
 </listitem>
 </itemizedlist>
 </listitem>
 <listitem>
-patch 2:
+<para>patch 2:</para>
 <itemizedlist>
 <listitem>
-create a new git branch <sbr/>
+<para>create a new git branch</para>
-$ git checkout -b my_branch
+<screen>$ git checkout -b my_branch</screen>
 </listitem>
 <listitem>
-save your work
+<para>save your work</para>
-$ git add file1 file2 <sbr/>
+<screen>$ git add file1 file2 </screen>
-$ git commit -am 'saved after HBASE_XXXX-1.patch' <sbr/>
+<screen>$ git commit -am 'saved after HBASE_XXXX-1.patch'</screen>
-now you have your own branch, that is different from remote master branch <sbr/>
+<para>now you have your own branch, that is different from remote master branch</para>
 </listitem>
 <listitem>
-make more changes...
+<para>make more changes...</para>
 </listitem>
 <listitem>
-create second patch <sbr/>
+<para>create second patch</para>
-$ git diff --no-prefix > HBASE_XXXX-2.patch
+<screen>$ git diff --no-prefix > HBASE_XXXX-2.patch</screen>
 </listitem>
 </itemizedlist>

@@ -27,8 +27,8 @@
 */
 -->
 <title>Apache HBase External APIs</title>
-This chapter will cover access to Apache HBase either through non-Java languages, or through custom protocols.
+<para> This chapter will cover access to Apache HBase either through non-Java languages, or through custom protocols.
+</para>
 <section xml:id="nonjava.jvm">
 <title>Non-Java Languages Talking to the JVM</title>
 <para>Currently the documentation on this topic in the
@@ -172,7 +172,7 @@
 <para>Example4:<code> =, 'substring:abc123' </code>will match everything that begins with the substring "abc123"</para>
 </section>

-<section xml:id="example PHP Client Program"><title>Example PHP Client Program that uses the Filter Language</title>
+<section xml:id="examplePHPClientProgram"><title>Example PHP Client Program that uses the Filter Language</title>
 <programlisting>
 <? $_SERVER['PHP_ROOT'] = realpath(dirname(__FILE__).'/..');
 require_once $_SERVER['PHP_ROOT'].'/flib/__flib.php';
@@ -205,7 +205,7 @@
 </para>

 <orderedlist>
-<para>
+<listitem>
 <itemizedlist>
 <listitem>
 <para><code>“(RowFilter (=, ‘binary:Row 1’) AND TimeStampsFilter (74689, 89734)) OR
@@ -216,7 +216,7 @@
 <para>1) The key-value pair must be in a column that is lexicographically >= abc and < xyz </para>
 </listitem>
 </itemizedlist>
-</para>
+</listitem>
 </orderedlist>

 <para>
@@ -228,7 +228,7 @@
 </para>
 </section>

-<section xml:id="Individual Filter Syntax"><title>Individual Filter Syntax</title>
+<section xml:id="IndividualFilterSyntax"><title>Individual Filter Syntax</title>
 <orderedlist>
 <listitem>
 <para><emphasis role="bold"><emphasis role="underline">KeyOnlyFilter</emphasis></emphasis></para>

@ -7,7 +7,7 @@
|
||||||
xmlns:m="http://www.w3.org/1998/Math/MathML"
|
xmlns:m="http://www.w3.org/1998/Math/MathML"
|
||||||
xmlns:html="http://www.w3.org/1999/xhtml"
|
xmlns:html="http://www.w3.org/1999/xhtml"
|
||||||
xmlns:db="http://docbook.org/ns/docbook">
|
xmlns:db="http://docbook.org/ns/docbook">
|
||||||
<!--
|
<!--
|
||||||
/**
|
/**
|
||||||
* Licensed to the Apache Software Foundation (ASF) under one
|
* Licensed to the Apache Software Foundation (ASF) under one
|
||||||
* or more contributor license agreements. See the NOTICE file
|
* or more contributor license agreements. See the NOTICE file
|
||||||
@@ -57,17 +57,16 @@
 <link xlink:href="https://issues.apache.org/jira/browse/HBASE-3696"/> and its associated issues for more details.</para></note>
 <note xml:id="loopback.ip.getting.started">
 <title>Loopback IP</title>
-<note>
 <para><emphasis>The below advice is for hbase-0.94.x and older versions only. We believe this is fixed in hbase-0.96.0 and beyond
 (let us know if we have it wrong).</emphasis> There should be no need of the below modification to <filename>/etc/hosts</filename> in
 later versions of HBase.</para>
-</note>
 <para>HBase expects the loopback IP address to be 127.0.0.1. Ubuntu and some other distributions,
 for example, will default to 127.0.1.1 and this will cause problems for you
 <footnote><para>See <link xlink:href="http://blog.devving.com/why-does-hbase-care-about-etchosts/">Why does HBase care about /etc/hosts?</link> for detail.</para></footnote>.
 </para>
 <para><filename>/etc/hosts</filename> should look something like this:
 <programlisting>
 127.0.0.1 localhost
 127.0.0.1 ubuntu.ubuntu-domain ubuntu
 </programlisting>
@@ -99,7 +98,7 @@ $ cd hbase-<?eval ${project.version}?>
 <varname>hbase.rootdir</varname>, the directory HBase writes data to,
 and <varname>hbase.zookeeper.property.dataDir</varname>, the directory
 ZooKeeper writes its data to:
 <programlisting><?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <configuration>
 <property>
File diff suppressed because it is too large
@@ -51,9 +51,9 @@
 <para>
 Important items to consider:
 <itemizedlist>
-<listitem>Switching capacity of the device</listitem>
+<listitem><para>Switching capacity of the device</para></listitem>
-<listitem>Number of systems connected</listitem>
+<listitem><para>Number of systems connected</para></listitem>
-<listitem>Uplink capacity</listitem>
+<listitem><para>Uplink capacity</para></listitem>
 </itemizedlist>
 </para>
 <section xml:id="perf.network.1switch">
@@ -71,9 +71,9 @@
 </para>
 <para>Mitigation of this issue is fairly simple and can be accomplished in multiple ways:
 <itemizedlist>
-<listitem>Use appropriate hardware for the scale of the cluster which you're attempting to build.</listitem>
+<listitem><para>Use appropriate hardware for the scale of the cluster which you're attempting to build.</para></listitem>
-<listitem>Use larger single switch configurations i.e. single 48 port as opposed to 2x 24 port</listitem>
+<listitem><para>Use larger single switch configurations i.e. single 48 port as opposed to 2x 24 port</para></listitem>
-<listitem>Configure port trunking for uplinks to utilize multiple interfaces to increase cross switch bandwidth.</listitem>
+<listitem><para>Configure port trunking for uplinks to utilize multiple interfaces to increase cross switch bandwidth.</para></listitem>
 </itemizedlist>
 </para>
 </section>
@@ -81,8 +81,8 @@
 <title>Multiple Racks</title>
 <para>Multiple rack configurations carry the same potential issues as multiple switches, and can suffer performance degradation from two main areas:
 <itemizedlist>
-<listitem>Poor switch capacity performance</listitem>
+<listitem><para>Poor switch capacity performance</para></listitem>
-<listitem>Insufficient uplink to another rack</listitem>
+<listitem><para>Insufficient uplink to another rack</para></listitem>
 </itemizedlist>
 If the switches in your rack have appropriate switching capacity to handle all the hosts at full speed, the next most likely issue will be caused by homing
 more of your cluster across racks. The easiest way to avoid issues when spanning multiple racks is to use port trunking to create a bonded uplink to other racks.

@@ -32,7 +32,7 @@
 the various non-rdbms datastores is Ian Varley's Master thesis,
 <link xlink:href="http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf">No Relation: The Mixed Blessings of Non-Relational Databases</link>.
 Recommended. Also, read <xref linkend="keyvalue"/> for how HBase stores data internally, and the section on
-<xref linkend="schema.casestudies">HBase Schema Design Case Studies</xref>.
+<xref linkend="schema.casestudies"/>.
 </para>
 <section xml:id="schema.creation">
 <title>
@@ -41,7 +41,7 @@
 <para>HBase schemas can be created or updated with <xref linkend="shell" />
 or by using <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html">HBaseAdmin</link> in the Java API.
 </para>
-<para>Tables must be disabled when making ColumnFamily modifications, for example..
+<para>Tables must be disabled when making ColumnFamily modifications, for example:</para>
 <programlisting>
 Configuration config = HBaseConfiguration.create();
 HBaseAdmin admin = new HBaseAdmin(config);
@@ -56,7 +56,7 @@ admin.modifyColumn(table, cf2); // modifying existing ColumnFamily

 admin.enableTable(table);
 </programlisting>
-</para>See <xref linkend="client_dependencies"/> for more information about configuring client connections.
+<para>See <xref linkend="client_dependencies"/> for more information about configuring client connections.</para>
 <para>Note: online schema changes are supported in the 0.92.x codebase, but the 0.90.x codebase requires the table
 to be disabled.
 </para>
@@ -98,7 +98,7 @@ admin.enableTable(table);
 Monotonically Increasing Row Keys/Timeseries Data
 </title>
 <para>
-In the HBase chapter of Tom White's book <link xlink:url="http://oreilly.com/catalog/9780596521981">Hadoop: The Definitive Guide</link> (O'Reilly) there is an optimization note on watching out for a phenomenon where an import process walks in lock-step with all clients in concert pounding one of the table's regions (and thus, a single node), then moving onto the next region, etc. With monotonically increasing row-keys (i.e., using a timestamp), this will happen. See this comic by IKai Lan on why monotonically increasing row keys are problematic in BigTable-like datastores:
+In the HBase chapter of Tom White's book <link xlink:href="http://oreilly.com/catalog/9780596521981">Hadoop: The Definitive Guide</link> (O'Reilly) there is an optimization note on watching out for a phenomenon where an import process walks in lock-step with all clients in concert pounding one of the table's regions (and thus, a single node), then moving onto the next region, etc. With monotonically increasing row-keys (i.e., using a timestamp), this will happen. See this comic by IKai Lan on why monotonically increasing row keys are problematic in BigTable-like datastores:
 <link xlink:href="http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/">monotonically increasing values are bad</link>. The pile-up on a single region brought on
 by monotonically increasing keys can be mitigated by randomizing the input records to not be in sorted order, but in general it's best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row-key.
 </para>
@@ -107,7 +107,7 @@ admin.enableTable(table);
 successful example. It has a page describing the <link xlink:href=" http://opentsdb.net/schema.html">schema</link> it uses in
 HBase. The key format in OpenTSDB is effectively [metric_type][event_timestamp], which would appear at first glance to contradict the previous advice about not using a timestamp as the key. However, the difference is that the timestamp is not in the <emphasis>lead</emphasis> position of the key, and the design assumption is that there are dozens or hundreds (or more) of different metric types. Thus, even with a continual stream of input data with a mix of metric types, the Puts are distributed across various points of regions in the table.
 </para>
-<para>See <xref linkend="schema.casestudies">HBase Schema Design Case Studies</xref> for some rowkey design examples.
+<para>See <xref linkend="schema.casestudies"/> for some rowkey design examples.
 </para>
 </section>
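The "randomizing the input records" mitigation above is often implemented as a salt prefix on the rowkey. The sketch below is our own plain-Java illustration of that idea; the bucket count and helper name are assumptions, not an HBase API:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SaltedKeys {
    // Assumed bucket count; usually chosen near the expected number of regions.
    static final int BUCKETS = 16;

    // Prefix a monotonically increasing key with a one-byte salt derived from
    // the key's hash, so consecutive keys spread across regions instead of
    // piling up on a single one.
    static byte[] salt(byte[] naturalKey) {
        byte bucket = (byte) Math.floorMod(Arrays.hashCode(naturalKey), BUCKETS);
        byte[] salted = new byte[naturalKey.length + 1];
        salted[0] = bucket;
        System.arraycopy(naturalKey, 0, salted, 1, naturalKey.length);
        return salted;
    }

    public static void main(String[] args) {
        byte[] a = salt("20110125-000001".getBytes(StandardCharsets.UTF_8));
        byte[] b = salt("20110125-000002".getBytes(StandardCharsets.UTF_8));
        System.out.println("salt buckets: " + a[0] + ", " + b[0]);
    }
}
```

The trade-off is that a scan over the natural key order must now fan out across all BUCKETS prefixes.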
 <section xml:id="keysize">
@@ -119,7 +119,7 @@ admin.enableTable(table);
 are large, especially compared to the size of the cell value, then
 you may run up against some interesting scenarios. One such is
 the case described by Marc Limotte at the tail of
-<link xlink:url="https://issues.apache.org/jira/browse/HBASE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005272#comment-13005272">HBASE-3551</link>
+<link xlink:href="https://issues.apache.org/jira/browse/HBASE-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005272#comment-13005272">HBASE-3551</link>
 (recommended!).
 Therein, the indices that are kept on HBase storefiles (<xref linkend="hfile" />)
 to facilitate random access may end up occupying large chunks of the HBase
@@ -213,7 +213,7 @@ COLUMN CELL
 <para>The most recent value for [key] in a table can be found by performing a Scan for [key] and obtaining the first record. Since HBase keys
 are in sorted order, this key sorts before any older row-keys for [key] and thus is first.
 </para>
-<para>This technique would be used instead of using <xref linkend="schema.versions">HBase Versioning</xref> where the intent is to hold onto all versions
+<para>This technique would be used instead of using <xref linkend="schema.versions"/> where the intent is to hold onto all versions
 "forever" (or a very long time) and at the same time quickly obtain access to any other version by using the same Scan technique.
 </para>
 </section>
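The sort order relied on above is commonly arranged by appending a reverse timestamp (Long.MAX_VALUE minus the event time) to [key], so the newest row sorts first. The helper below is our sketch of that convention, not code from this guide:

```java
public class ReverseTimestampKeys {
    // Append a big-endian reverse timestamp so the newest row for a key
    // sorts first and a Scan starting at [key] returns it immediately.
    static byte[] rowkey(byte[] key, long timestamp) {
        long reverse = Long.MAX_VALUE - timestamp;
        byte[] out = new byte[key.length + 8];
        System.arraycopy(key, 0, out, 0, key.length);
        for (int i = 0; i < 8; i++) {
            // emit the most significant byte first (big-endian)
            out[key.length + i] = (byte) (reverse >>> (56 - 8 * i));
        }
        return out;
    }
}
```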
@@ -335,11 +335,11 @@ public static byte[][] getHexSplits(String startKey, String endKey, int numRegio
 <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html">Result</link>, so anything that can be
 converted to an array of bytes can be stored as a value. Input could be strings, numbers, complex objects, or even images as long as they can be rendered as bytes.
 </para>
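Serialization of values is left entirely to the client; HBase only sees byte arrays. HBase ships a Bytes utility class for this, but the round trip can be shown with the JDK alone (the class below is our illustration, not the HBase API):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ValueSerialization {
    // A long serialized to 8 big-endian bytes, as a cell value would be.
    static byte[] toBytes(long v) {
        return ByteBuffer.allocate(8).putLong(v).array();
    }

    static long toLong(byte[] b) {
        return ByteBuffer.wrap(b).getLong();
    }

    // Strings are conventionally stored UTF-8 encoded.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static String toString(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }
}
```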
-<para>There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase
-would probably be too much to ask); search the mailing list for conversations on this topic.
-All rows in HBase conform to the <xref linkend="datamodel">datamodel</xref>, and that includes
-versioning. Take that into consideration when making your design, as well as block size for
-the ColumnFamily. </para>
+<para>There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase would probably be too much to ask);
+search the mailing list for conversations on this topic. All rows in HBase conform to the <xref linkend="datamodel"/>, and
+that includes versioning. Take that into consideration when making your design, as well as block size for the ColumnFamily.
+</para>
 <section xml:id="counters">
 <title>Counters</title>
 <para>
@@ -389,10 +389,10 @@ public static byte[][] getHexSplits(String startKey, String endKey, int numRegio
 </para>
 <para>There is no single answer on the best way to handle this because it depends on...
 <itemizedlist>
-<listitem>Number of users</listitem>
+<listitem><para>Number of users</para></listitem>
-<listitem>Data size and data arrival rate</listitem>
+<listitem><para>Data size and data arrival rate</para></listitem>
-<listitem>Flexibility of reporting requirements (e.g., completely ad-hoc date selection vs. pre-configured ranges) </listitem>
+<listitem><para>Flexibility of reporting requirements (e.g., completely ad-hoc date selection vs. pre-configured ranges) </para></listitem>
-<listitem>Desired execution speed of query (e.g., 90 seconds may be reasonable to some for an ad-hoc report, whereas it may be too long for others) </listitem>
+<listitem><para>Desired execution speed of query (e.g., 90 seconds may be reasonable to some for an ad-hoc report, whereas it may be too long for others) </para></listitem>
 </itemizedlist>
 ... and solutions are also influenced by the size of the cluster and how much processing power you have to throw at the solution.
 Common techniques are in sub-sections below. This is a comprehensive, but not exhaustive, list of approaches.
@@ -455,26 +455,26 @@ public static byte[][] getHexSplits(String startKey, String endKey, int numRegio
 can be approached. Note: this is just an illustration of potential approaches, not an exhaustive list.
 Know your data, and know your processing requirements.
 </para>
-<para>It is highly recommended that you read the rest of the <xref linkend="schema">Schema Design Chapter</xref> first, before reading
+<para>It is highly recommended that you read the rest of the <xref linkend="schema"/> first, before reading
 these case studies.
 </para>
 <para>The following case studies are described:
 <itemizedlist>
-<listitem>Log Data / Timeseries Data</listitem>
+<listitem><para>Log Data / Timeseries Data</para></listitem>
-<listitem>Log Data / Timeseries on Steroids</listitem>
+<listitem><para>Log Data / Timeseries on Steroids</para></listitem>
-<listitem>Customer/Order</listitem>
+<listitem><para>Customer/Order</para></listitem>
-<listitem>Tall/Wide/Middle Schema Design</listitem>
+<listitem><para>Tall/Wide/Middle Schema Design</para></listitem>
-<listitem>List Data</listitem>
+<listitem><para>List Data</para></listitem>
 </itemizedlist>
 </para>
 <section xml:id="schema.casestudies.log-timeseries">
 <title>Case Study - Log Data and Timeseries Data</title>
 <para>Assume that the following data elements are being collected.
 <itemizedlist>
-<listitem>Hostname</listitem>
+<listitem><para>Hostname</para></listitem>
-<listitem>Timestamp</listitem>
+<listitem><para>Timestamp</para></listitem>
-<listitem>Log event</listitem>
+<listitem><para>Log event</para></listitem>
-<listitem>Value/message</listitem>
+<listitem><para>Value/message</para></listitem>
 </itemizedlist>
 We can store them in an HBase table called LOG_DATA, but what will the rowkey be?
 From these attributes the rowkey will be some combination of hostname, timestamp, and log-event - but what specifically?
|
@ -531,9 +531,9 @@ long bucket = timestamp % numBuckets;
|
||||||
</para>
|
</para>
|
||||||
<para>Composite Rowkey With Hashes:
|
<para>Composite Rowkey With Hashes:
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>[MD5 hash of hostname] = 16 bytes</listitem>
|
<listitem><para>[MD5 hash of hostname] = 16 bytes</para></listitem>
|
||||||
<listitem>[MD5 hash of event-type] = 16 bytes</listitem>
|
<listitem><para>[MD5 hash of event-type] = 16 bytes</para></listitem>
|
||||||
<listitem>[timestamp] = 8 bytes</listitem>
|
<listitem><para>[timestamp] = 8 bytes</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
</para>
|
</para>
|
||||||
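The fixed widths above make such a key easy to compose and split. As a sketch of the idea (the class and method names are illustrative, not from the HBase code base), the 40-byte composite key can be assembled with standard JDK classes:

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;

// Illustrative sketch: builds the [MD5(hostname)][MD5(event-type)][timestamp]
// rowkey described above. Each MD5 digest is 16 bytes and the long timestamp
// is 8 bytes, so every key is exactly 40 bytes and each field sits at a
// fixed, known offset.
public class LogRowKey {
    public static byte[] build(String hostname, String eventType, long timestamp)
            throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        ByteBuffer key = ByteBuffer.allocate(16 + 16 + 8);
        key.put(md5.digest(hostname.getBytes("UTF-8")));   // digest() resets md5
        key.put(md5.digest(eventType.getBytes("UTF-8")));
        key.putLong(timestamp);
        return key.array();
    }
}
```

Because every field has a fixed width, a scan for one host's events can derive its start and stop rows from the first 16 bytes alone.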
<para>Composite Rowkey With Numeric Substitution:
|
<para>Composite Rowkey With Numeric Substitution:
|
||||||
|
@ -541,17 +541,17 @@ long bucket = timestamp % numBuckets;
|
||||||
<para>For this approach another lookup table would be needed in addition to LOG_DATA, called LOG_TYPES.
|
<para>For this approach another lookup table would be needed in addition to LOG_DATA, called LOG_TYPES.
|
||||||
The rowkey of LOG_TYPES would be:
|
The rowkey of LOG_TYPES would be:
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>[type] (e.g., byte indicating hostname vs. event-type)</listitem>
|
<listitem><para>[type] (e.g., byte indicating hostname vs. event-type)</para></listitem>
|
||||||
<listitem>[bytes] variable length bytes for raw hostname or event-type.</listitem>
|
<listitem><para>[bytes] variable length bytes for raw hostname or event-type.</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
A column for this rowkey could be a long with an assigned number, which could be obtained by using an
|
A column for this rowkey could be a long with an assigned number, which could be obtained by using an
|
||||||
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue%28byte[],%20byte[],%20byte[],%20long%29">HBase counter</link>.
|
<link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue%28byte[],%20byte[],%20byte[],%20long%29">HBase counter</link>.
|
||||||
</para>
|
</para>
|
||||||
<para>So the resulting composite rowkey would be:
|
<para>So the resulting composite rowkey would be:
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>[substituted long for hostname] = 8 bytes</listitem>
|
<listitem><para>[substituted long for hostname] = 8 bytes</para></listitem>
|
||||||
<listitem>[substituted long for event type] = 8 bytes</listitem>
|
<listitem><para>[substituted long for event type] = 8 bytes</para></listitem>
|
||||||
<listitem>[timestamp] = 8 bytes</listitem>
|
<listitem><para>[timestamp] = 8 bytes</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
In either the Hash or Numeric substitution approach, the raw values for hostname and event-type can be stored as columns.
|
In either the Hash or Numeric substitution approach, the raw values for hostname and event-type can be stored as columns.
|
||||||
</para>
|
</para>
|
||||||
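A hedged sketch of the substitution variant follows (the lookup against LOG_TYPES is assumed to have already happened; the ids are simply passed in). The key shrinks from 40 bytes to a fixed 24:

```java
import java.nio.ByteBuffer;

// Illustrative sketch of the numeric-substitution rowkey: two longs obtained
// from the LOG_TYPES lookup table, plus the timestamp. All three fields are
// 8 bytes, so the key is a fixed 24 bytes -- 16 bytes shorter per row than
// the MD5 variant.
public class SubstitutedLogRowKey {
    public static byte[] build(long hostnameId, long eventTypeId, long timestamp) {
        return ByteBuffer.allocate(24)
                .putLong(hostnameId)
                .putLong(eventTypeId)
                .putLong(timestamp)
                .array();
    }
}
```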
|
@ -586,19 +586,19 @@ long bucket = timestamp % numBuckets;
|
||||||
</para>
|
</para>
|
||||||
<para>The Customer record type would include all the things that you’d typically expect:
|
<para>The Customer record type would include all the things that you’d typically expect:
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>Customer number</listitem>
|
<listitem><para>Customer number</para></listitem>
|
||||||
<listitem>Customer name</listitem>
|
<listitem><para>Customer name</para></listitem>
|
||||||
<listitem>Address (e.g., city, state, zip)</listitem>
|
<listitem><para>Address (e.g., city, state, zip)</para></listitem>
|
||||||
<listitem>Phone numbers, etc.</listitem>
|
<listitem><para>Phone numbers, etc.</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
</para>
|
</para>
|
||||||
<para>The Order record type would include things like:
|
<para>The Order record type would include things like:
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>Customer number</listitem>
|
<listitem><para>Customer number</para></listitem>
|
||||||
<listitem>Order number</listitem>
|
<listitem><para>Order number</para></listitem>
|
||||||
<listitem>Sales date</listitem>
|
<listitem><para>Sales date</para></listitem>
|
||||||
<listitem>A series of nested objects for shipping locations and line-items (see <xref linkend="schema.casestudies.custorder.obj"/>
|
<listitem><para>A series of nested objects for shipping locations and line-items (see <xref linkend="schema.casestudies.custorder.obj"/>
|
||||||
for details)</listitem>
|
for details)</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
</para>
|
</para>
|
||||||
<para>Assuming that the combination of customer number and sales order uniquely identify an order, these two attributes will compose
|
<para>Assuming that the combination of customer number and sales order uniquely identify an order, these two attributes will compose
|
||||||
|
@ -614,14 +614,14 @@ reasonable spread in the keyspace, similar options appear:
|
||||||
</para>
|
</para>
|
||||||
<para>Composite Rowkey With Hashes:
|
<para>Composite Rowkey With Hashes:
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>[MD5 of customer number] = 16 bytes</listitem>
|
<listitem><para>[MD5 of customer number] = 16 bytes</para></listitem>
|
||||||
<listitem>[MD5 of order number] = 16 bytes</listitem>
|
<listitem><para>[MD5 of order number] = 16 bytes</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
</para>
|
</para>
|
||||||
<para>Composite Numeric/Hash Combo Rowkey:
|
<para>Composite Numeric/Hash Combo Rowkey:
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>[substituted long for customer number] = 8 bytes</listitem>
|
<listitem><para>[substituted long for customer number] = 8 bytes</para></listitem>
|
||||||
<listitem>[MD5 of order number] = 16 bytes</listitem>
|
<listitem><para>[MD5 of order number] = 16 bytes</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
</para>
|
</para>
|
||||||
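As an illustration of the Numeric/Hash combo option (names are hypothetical, and the customer-id substitution is assumed to have happened already), the fixed-width 8-byte prefix also gives you the scan bounds for "all orders of one customer" for free:

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;

// Illustrative sketch of the Composite Numeric/Hash Combo Rowkey:
// [substituted long for customer number] (8 bytes) + [MD5 of order number]
// (16 bytes). Because the customer id is a fixed-width prefix, start/stop
// rows for a per-customer scan fall out of the first 8 bytes.
public class OrderRowKey {
    public static byte[] build(long customerId, String orderNumber) throws Exception {
        ByteBuffer key = ByteBuffer.allocate(8 + 16);
        key.putLong(customerId);
        key.put(MessageDigest.getInstance("MD5").digest(orderNumber.getBytes("UTF-8")));
        return key.array();
    }

    // Inclusive start row covering every order of the given customer.
    public static byte[] startRow(long customerId) {
        return ByteBuffer.allocate(8).putLong(customerId).array();
    }

    // Exclusive stop row (ignores the customerId == Long.MAX_VALUE edge case).
    public static byte[] stopRow(long customerId) {
        return ByteBuffer.allocate(8).putLong(customerId + 1).array();
    }
}
```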
<section xml:id="schema.casestudies.custorder.tables">
|
<section xml:id="schema.casestudies.custorder.tables">
|
||||||
|
@ -631,15 +631,15 @@ reasonable spread in the keyspace, similar options appear:
|
||||||
</para>
|
</para>
|
||||||
<para>Customer Record Type Rowkey:
|
<para>Customer Record Type Rowkey:
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>[customer-id]</listitem>
|
<listitem><para>[customer-id]</para></listitem>
|
||||||
<listitem>[type] = type indicating ‘1’ for customer record type</listitem>
|
<listitem><para>[type] = type indicating ‘1’ for customer record type</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
</para>
|
</para>
|
||||||
<para>Order Record Type Rowkey:
|
<para>Order Record Type Rowkey:
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>[customer-id]</listitem>
|
<listitem><para>[customer-id]</para></listitem>
|
||||||
<listitem>[type] = type indicating ‘2’ for order record type</listitem>
|
<listitem><para>[type] = type indicating ‘2’ for order record type</para></listitem>
|
||||||
<listitem>[order]</listitem>
|
<listitem><para>[order]</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
</para>
|
</para>
|
||||||
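The two record-type rowkeys above can be sketched with one small helper (a local illustration, not HBase API; the type bytes '1' and '2' follow the convention stated in the lists). Because the type byte sorts '1' before '2', the customer record lands immediately before all of that customer's orders:

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch of the single-table CUSTOMER++ keys: customer id,
// then a record-type byte ('1' = customer, '2' = order), then -- for order
// records only -- the order id.
public class CustomerPlusPlusKey {
    public static final byte CUSTOMER = '1';
    public static final byte ORDER = '2';

    public static byte[] build(byte[] customerId, byte type, byte[] rest)
            throws Exception {
        ByteArrayOutputStream key = new ByteArrayOutputStream();
        key.write(customerId);
        key.write(type);
        if (rest != null) {
            key.write(rest);       // e.g., the order id for ORDER records
        }
        return key.toByteArray();
    }
}
```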
<para>The advantage of this particular CUSTOMER++ approach is that it organizes many different record-types by customer-id
|
<para>The advantage of this particular CUSTOMER++ approach is that it organizes many different record-types by customer-id
|
||||||
|
@ -665,23 +665,23 @@ reasonable spread in the keyspace, similar options appear:
|
||||||
</para>
|
</para>
|
||||||
<para>The SHIPPING_LOCATION's composite rowkey would be something like this:
|
<para>The SHIPPING_LOCATION's composite rowkey would be something like this:
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>[order-rowkey]</listitem>
|
<listitem><para>[order-rowkey]</para></listitem>
|
||||||
<listitem>[shipping location number] (e.g., 1st location, 2nd, etc.)</listitem>
|
<listitem><para>[shipping location number] (e.g., 1st location, 2nd, etc.)</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
</para>
|
</para>
|
||||||
<para>The LINE_ITEM table's composite rowkey would be something like this:
|
<para>The LINE_ITEM table's composite rowkey would be something like this:
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>[order-rowkey]</listitem>
|
<listitem><para>[order-rowkey]</para></listitem>
|
||||||
<listitem>[shipping location number] (e.g., 1st location, 2nd, etc.)</listitem>
|
<listitem><para>[shipping location number] (e.g., 1st location, 2nd, etc.)</para></listitem>
|
||||||
<listitem>[line item number] (e.g., 1st lineitem, 2nd, etc.)</listitem>
|
<listitem><para>[line item number] (e.g., 1st lineitem, 2nd, etc.)</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
</para>
|
</para>
|
||||||
<para>Such a normalized model is likely to be the approach with an RDBMS, but that's not your only option with HBase.
|
<para>Such a normalized model is likely to be the approach with an RDBMS, but that's not your only option with HBase.
|
||||||
The downside of such an approach is that to retrieve information about any Order, you will need:
|
The downside of such an approach is that to retrieve information about any Order, you will need:
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>Get on the ORDER table for the Order</listitem>
|
<listitem><para>Get on the ORDER table for the Order</para></listitem>
|
||||||
<listitem>Scan on the SHIPPING_LOCATION table for that order to get the ShippingLocation instances</listitem>
|
<listitem><para>Scan on the SHIPPING_LOCATION table for that order to get the ShippingLocation instances</para></listitem>
|
||||||
<listitem>Scan on the LINE_ITEM for each ShippingLocation</listitem>
|
<listitem><para>Scan on the LINE_ITEM for each ShippingLocation</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
... granted, this is what an RDBMS would do under the covers anyway, but since there are no joins in HBase
|
... granted, this is what an RDBMS would do under the covers anyway, but since there are no joins in HBase
|
||||||
you're just more aware of this fact.
|
you're just more aware of this fact.
|
||||||
|
@ -693,23 +693,23 @@ reasonable spread in the keyspace, similar options appear:
|
||||||
</para>
|
</para>
|
||||||
<para>The Order rowkey was described above: <xref linkend="schema.casestudies.custorder"/>
|
<para>The Order rowkey was described above: <xref linkend="schema.casestudies.custorder"/>
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>[order-rowkey]</listitem>
|
<listitem><para>[order-rowkey]</para></listitem>
|
||||||
<listitem>[ORDER record type]</listitem>
|
<listitem><para>[ORDER record type]</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
</para>
|
</para>
|
||||||
<para>The ShippingLocation composite rowkey would be something like this:
|
<para>The ShippingLocation composite rowkey would be something like this:
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>[order-rowkey]</listitem>
|
<listitem><para>[order-rowkey]</para></listitem>
|
||||||
<listitem>[SHIPPING record type]</listitem>
|
<listitem><para>[SHIPPING record type]</para></listitem>
|
||||||
<listitem>[shipping location number] (e.g., 1st location, 2nd, etc.)</listitem>
|
<listitem><para>[shipping location number] (e.g., 1st location, 2nd, etc.)</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
</para>
|
</para>
|
||||||
<para>The LineItem composite rowkey would be something like this:
|
<para>The LineItem composite rowkey would be something like this:
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>[order-rowkey]</listitem>
|
<listitem><para>[order-rowkey]</para></listitem>
|
||||||
<listitem>[LINE record type]</listitem>
|
<listitem><para>[LINE record type]</para></listitem>
|
||||||
<listitem>[shipping location number] (e.g., 1st location, 2nd, etc.)</listitem>
|
<listitem><para>[shipping location number] (e.g., 1st location, 2nd, etc.)</para></listitem>
|
||||||
<listitem>[line item number] (e.g., 1st lineitem, 2nd, etc.)</listitem>
|
<listitem><para>[line item number] (e.g., 1st lineitem, 2nd, etc.)</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
</para>
|
</para>
|
||||||
</section>
|
</section>
|
||||||
|
@ -720,21 +720,21 @@ reasonable spread in the keyspace, similar options appear:
|
||||||
</para>
|
</para>
|
||||||
<para>The LineItem composite rowkey would be something like this:
|
<para>The LineItem composite rowkey would be something like this:
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>[order-rowkey]</listitem>
|
<listitem><para>[order-rowkey]</para></listitem>
|
||||||
<listitem>[LINE record type]</listitem>
|
<listitem><para>[LINE record type]</para></listitem>
|
||||||
<listitem>[line item number] (e.g., 1st lineitem, 2nd, etc. - care must be taken that these are unique across the entire order)</listitem>
|
<listitem><para>[line item number] (e.g., 1st lineitem, 2nd, etc. - care must be taken that these are unique across the entire order)</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
</para>
|
</para>
|
||||||
<para>... and the LineItem columns would be something like this:
|
<para>... and the LineItem columns would be something like this:
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>itemNumber</listitem>
|
<listitem><para>itemNumber</para></listitem>
|
||||||
<listitem>quantity</listitem>
|
<listitem><para>quantity</para></listitem>
|
||||||
<listitem>price</listitem>
|
<listitem><para>price</para></listitem>
|
||||||
<listitem>shipToLine1 (denormalized from ShippingLocation)</listitem>
|
<listitem><para>shipToLine1 (denormalized from ShippingLocation)</para></listitem>
|
||||||
<listitem>shipToLine2 (denormalized from ShippingLocation)</listitem>
|
<listitem><para>shipToLine2 (denormalized from ShippingLocation)</para></listitem>
|
||||||
<listitem>shipToCity (denormalized from ShippingLocation)</listitem>
|
<listitem><para>shipToCity (denormalized from ShippingLocation)</para></listitem>
|
||||||
<listitem>shipToState (denormalized from ShippingLocation)</listitem>
|
<listitem><para>shipToState (denormalized from ShippingLocation)</para></listitem>
|
||||||
<listitem>shipToZip (denormalized from ShippingLocation)</listitem>
|
<listitem><para>shipToZip (denormalized from ShippingLocation)</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
</para>
|
</para>
|
||||||
<para>The pros of this approach include a less complex object hierarchy, but one of the cons is that updating gets more
|
<para>The pros of this approach include a less complex object hierarchy, but one of the cons is that updating gets more
|
||||||
|
@ -789,7 +789,7 @@ reasonable spread in the keyspace, similar options appear:
|
||||||
OpenTSDB is the best example of this case where a single row represents a defined time-range, and then discrete events are treated as
|
OpenTSDB is the best example of this case where a single row represents a defined time-range, and then discrete events are treated as
|
||||||
columns. This approach is often more complex, and may require the additional complexity of re-writing your data, but has the
|
columns. This approach is often more complex, and may require the additional complexity of re-writing your data, but has the
|
||||||
advantage of being I/O efficient. For an overview of this approach, see
|
advantage of being I/O efficient. For an overview of this approach, see
|
||||||
<xref linkend="schema.casestudies.log-timeseries.log-steroids"/>.
|
<xref linkend="schema.casestudies.log-steroids"/>.
|
||||||
</para>
|
</para>
|
||||||
</section>
|
</section>
|
||||||
</section>
|
</section>
|
||||||
|
@ -815,7 +815,7 @@ we could have something like:
|
||||||
<FixedWidthUserName><FixedWidthValueId3>:"" (no value)
|
<FixedWidthUserName><FixedWidthValueId3>:"" (no value)
|
||||||
</programlisting>
|
</programlisting>
|
||||||
|
|
||||||
The other option we had was to do this entirely using:
|
<para>The other option we had was to do this entirely using:</para>
|
||||||
<programlisting>
|
<programlisting>
|
||||||
<FixedWidthUserName><FixedWidthPageNum0>:<FixedWidthLength><FixedIdNextPageNum><ValueId1><ValueId2><ValueId3>...
|
<FixedWidthUserName><FixedWidthPageNum0>:<FixedWidthLength><FixedIdNextPageNum><ValueId1><ValueId2><ValueId3>...
|
||||||
<FixedWidthUserName><FixedWidthPageNum1>:<FixedWidthLength><FixedIdNextPageNum><ValueId1><ValueId2><ValueId3>...
|
<FixedWidthUserName><FixedWidthPageNum1>:<FixedWidthLength><FixedIdNextPageNum><ValueId1><ValueId2><ValueId3>...
|
||||||
|
@ -827,8 +827,8 @@ So in one case reading the first thirty values would be:
|
||||||
<programlisting>
|
<programlisting>
|
||||||
scan { STARTROW => 'FixedWidthUsername' LIMIT => 30}
|
scan { STARTROW => 'FixedWidthUsername' LIMIT => 30}
|
||||||
</programlisting>
|
</programlisting>
|
||||||
And in the second case it would be
|
<para>And in the second case it would be
|
||||||
<programlisting>
|
</para> <programlisting>
|
||||||
get 'FixedWidthUserName\x00\x00\x00\x00'
|
get 'FixedWidthUserName\x00\x00\x00\x00'
|
||||||
</programlisting>
|
</programlisting>
|
||||||
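Both layouts rely on the username occupying a fixed width so that keys align and the trailing page/value bytes start at a known offset. A minimal sketch of that padding, assuming the convention of right-padding with 0x00 bytes shown in the `get` example above:

```java
import java.util.Arrays;

// Illustrative sketch: right-pads a username to a fixed width with 0x00
// bytes so that all rowkeys for the list data have the same length and
// sort/compare correctly.
public class FixedWidthKey {
    public static byte[] pad(String username, int width) throws Exception {
        byte[] raw = username.getBytes("UTF-8");
        if (raw.length > width) {
            throw new IllegalArgumentException("username longer than " + width + " bytes");
        }
        return Arrays.copyOf(raw, width); // copyOf zero-fills the tail
    }
}
```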
<para>
|
<para>
|
||||||
|
|
|
@ -517,7 +517,7 @@
|
||||||
<para>
|
<para>
|
||||||
You must configure HBase for secure or simple user access operation. Refer to the
|
You must configure HBase for secure or simple user access operation. Refer to the
|
||||||
<link linkend='hbase.accesscontrol.configuration'>Secure Client Access to HBase</link> or
|
<link linkend='hbase.accesscontrol.configuration'>Secure Client Access to HBase</link> or
|
||||||
<link linkend='hbase.accesscontrol.simpleconfiguration'>Simple User Access to HBase</link>
|
<link linkend='hbase.secure.simpleconfiguration'>Simple User Access to HBase</link>
|
||||||
sections and complete all of the steps described
|
sections and complete all of the steps described
|
||||||
there.
|
there.
|
||||||
</para>
|
</para>
|
||||||
|
@ -779,14 +779,16 @@ Access control mechanisms are mature and fairly standardized in the relational d
|
||||||
<para>
|
<para>
|
||||||
The HBase shell has been extended to provide simple commands for editing and updating user permissions. The following commands have been added for access control list management:
|
The HBase shell has been extended to provide simple commands for editing and updating user permissions. The following commands have been added for access control list management:
|
||||||
</para>
|
</para>
|
||||||
Grant
|
|
||||||
<para>
|
|
||||||
|
<example>
|
||||||
|
<title>Grant</title>
|
||||||
<programlisting>
|
<programlisting>
|
||||||
grant <user|@group> <permissions> [ <table> [ <column family> [ <column qualifier> ] ] ]
|
grant <user|@group> <permissions> [ <table> [ <column family> [ <column qualifier> ] ] ]
|
||||||
</programlisting>
|
</programlisting></example>
|
||||||
</para>
|
|
||||||
<para>
|
<para>
|
||||||
<code class="code"><user|@group></code> is a user or group (groups start with the character '@'). Groups are created and manipulated via the Hadoop group mapping service.
|
<code><user|@group></code> is a user or group (groups start with the character '@'). Groups are created and manipulated via the Hadoop group mapping service.
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
<code><permissions></code> is zero or more letters from the set "RWCA": READ('R'), WRITE('W'), CREATE('C'), ADMIN('A').
|
<code><permissions></code> is zero or more letters from the set "RWCA": READ('R'), WRITE('W'), CREATE('C'), ADMIN('A').
|
||||||
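The mapping from the "RWCA" letters to the four actions can be sketched as follows. This is a local illustration of the lettering convention only, not HBase's own parser:

```java
import java.util.EnumSet;

// Local sketch (not HBase's implementation) of how a "RWCA" permission
// string maps to the four actions: READ('R'), WRITE('W'), CREATE('C'),
// ADMIN('A'). Zero letters yields an empty permission set.
public class ShellPermissions {
    public enum Action { READ, WRITE, CREATE, ADMIN }

    public static EnumSet<Action> parse(String letters) {
        EnumSet<Action> actions = EnumSet.noneOf(Action.class);
        for (char c : letters.toCharArray()) {
            switch (c) {
                case 'R': actions.add(Action.READ); break;
                case 'W': actions.add(Action.WRITE); break;
                case 'C': actions.add(Action.CREATE); break;
                case 'A': actions.add(Action.ADMIN); break;
                default: throw new IllegalArgumentException("unknown permission: " + c);
            }
        }
        return actions;
    }
}
```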
|
@ -794,32 +796,32 @@ The HBase shell has been extended to provide simple commands for editing and upd
|
||||||
<para>
|
<para>
|
||||||
Note: Grants and revocations of individual permissions on a resource are both accomplished using the <code>grant</code> command. A separate <code>revoke</code> command is also provided by the shell, but this is for fast revocation of all of a user's access rights to a given resource only.
|
Note: Grants and revocations of individual permissions on a resource are both accomplished using the <code>grant</code> command. A separate <code>revoke</code> command is also provided by the shell, but this is for fast revocation of all of a user's access rights to a given resource only.
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<example>
|
||||||
Revoke
|
<title>Revoke</title>
|
||||||
</para>
|
|
||||||
<para>
|
|
||||||
<programlisting>
|
<programlisting>
|
||||||
revoke <user|@group> [ <table> [ <column family> [ <column qualifier> ] ] ]
|
revoke <user|@group> [ <table> [ <column family> [ <column qualifier> ] ] ]
|
||||||
</programlisting>
|
</programlisting>
|
||||||
</para>
|
</example>
|
||||||
|
<example>
|
||||||
|
<title>Alter</title>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
Alter
|
The <code>alter</code> command has been extended to allow ownership assignment:</para>
|
||||||
</para>
|
|
||||||
<para>
|
|
||||||
The <code>alter</code> command has been extended to allow ownership assignment:
|
|
||||||
<programlisting>
|
<programlisting>
|
||||||
alter 'tablename', {OWNER => 'username|@group'}
|
alter 'tablename', {OWNER => 'username|@group'}
|
||||||
</programlisting>
|
</programlisting>
|
||||||
</para>
|
</example>
|
||||||
|
|
||||||
|
<example>
|
||||||
|
<title>User Permission</title>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
User Permission
|
The <code>user_permission</code> command shows all access permissions for the current user for a given table:</para>
|
||||||
</para>
|
|
||||||
<para>
|
|
||||||
The <code>user_permission</code> command shows all access permissions for the current user for a given table:
|
|
||||||
<programlisting>
|
<programlisting>
|
||||||
user_permission <table>
|
user_permission <table>
|
||||||
</programlisting>
|
</programlisting>
|
||||||
</para>
|
</example>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
</section> <!-- Access Control -->
|
</section> <!-- Access Control -->
|
||||||
|
@ -829,12 +831,12 @@ The HBase shell has been extended to provide simple commands for editing and upd
|
||||||
<para>
|
<para>
|
||||||
Bulk loading in secure mode is a bit more involved than normal setup, since the client has to transfer the ownership of the files generated from the mapreduce job to HBase. Secure bulk loading is implemented by a coprocessor, named <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.html">SecureBulkLoadEndpoint</link>. SecureBulkLoadEndpoint uses a staging directory <code>"hbase.bulkload.staging.dir"</code>, which defaults to <code>/tmp/hbase-staging/</code>. The algorithm is as follows.
|
Bulk loading in secure mode is a bit more involved than normal setup, since the client has to transfer the ownership of the files generated from the mapreduce job to HBase. Secure bulk loading is implemented by a coprocessor, named <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.html">SecureBulkLoadEndpoint</link>. SecureBulkLoadEndpoint uses a staging directory <code>"hbase.bulkload.staging.dir"</code>, which defaults to <code>/tmp/hbase-staging/</code>. The algorithm is as follows.
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>Create an hbase owned staging directory which is world traversable (<code>-rwx--x--x, 711</code>) <code>/tmp/hbase-staging</code>. </listitem>
|
<listitem><para>Create an hbase owned staging directory which is world traversable (<code>-rwx--x--x, 711</code>) <code>/tmp/hbase-staging</code>. </para></listitem>
|
||||||
<listitem>A user writes out data to their secure output directory: /user/foo/data </listitem>
|
<listitem><para>A user writes out data to their secure output directory: /user/foo/data </para></listitem>
|
||||||
<listitem>A call is made to hbase to create a secret staging directory
|
<listitem><para>A call is made to hbase to create a secret staging directory
|
||||||
which is globally readable/writable (<code>-rwxrwxrwx, 777</code>): /tmp/hbase-staging/averylongandrandomdirectoryname</listitem>
|
which is globally readable/writable (<code>-rwxrwxrwx, 777</code>): /tmp/hbase-staging/averylongandrandomdirectoryname</para></listitem>
|
||||||
<listitem>The user makes the data world readable and writable, then moves it
|
<listitem><para>The user makes the data world readable and writable, then moves it
|
||||||
into the random staging directory, then calls bulkLoadHFiles()</listitem>
|
into the random staging directory, then calls bulkLoadHFiles()</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
</para>
|
</para>
|
||||||
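The two permission modes the algorithm depends on can be illustrated locally with plain `java.nio.file` calls (this is a local sketch on a POSIX filesystem, not the HDFS API that SecureBulkLoadEndpoint actually uses): a 711 root that any user can traverse but not list, and a 777 per-load random subdirectory.

```java
import java.nio.file.*;
import java.nio.file.attribute.PosixFilePermissions;

// Local illustration of the staging layout: /tmp/hbase-staging is 711
// (world-traversable, not listable), and each bulk load gets a random
// subdirectory chmod'd to 777 so the loading user can move files in and
// HBase can take ownership of them.
public class StagingDirs {
    public static Path createStaging(Path base, String randomName) throws Exception {
        Files.createDirectories(base);
        // chmod 711 on the root, explicitly, so umask does not interfere
        Files.setPosixFilePermissions(base, PosixFilePermissions.fromString("rwx--x--x"));
        Path staging = Files.createDirectory(base.resolve(randomName));
        // chmod 777 on the per-load directory
        Files.setPosixFilePermissions(staging, PosixFilePermissions.fromString("rwxrwxrwx"));
        return staging;
    }
}
```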
<para>
|
<para>
|
||||||
|
@ -870,9 +872,9 @@ The HBase shell has been extended to provide simple commands for editing and upd
|
||||||
<para>
|
<para>
|
||||||
Visibility expressions like the above can be added when storing or mutating a cell using the API,
|
Visibility expressions like the above can be added when storing or mutating a cell using the API,
|
||||||
</para>
|
</para>
|
||||||
<para><code>Mutation#setCellVisibility(new CellVisibility(String labelExpression));</code></para>
|
<programlisting>Mutation#setCellVisibility(new CellVisibility(String labelExpression));</programlisting>
|
||||||
Where the labelExpression could be '( secret | topsecret ) & !probationary'
|
<para> Where the labelExpression could be '( secret | topsecret ) & !probationary'
|
||||||
|
</para>
|
||||||
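To make the expression semantics concrete, here is a toy evaluator for the grammar sketched above (`|` = or, `&` = and, `!` = not, with parentheses). This is purely illustrative; HBase evaluates visibility expressions server-side with its own code.

```java
import java.util.Set;

// Toy recursive-descent evaluator (not HBase's implementation) for
// visibility expressions such as "( secret | topsecret ) & !probationary".
// A cell is visible when the expression is true for the user's label set.
public class VisibilityExpr {
    private final String expr;
    private final Set<String> labels;
    private int pos;

    private VisibilityExpr(String expr, Set<String> labels) {
        this.expr = expr;
        this.labels = labels;
    }

    public static boolean visible(String expression, Set<String> userLabels) {
        VisibilityExpr p = new VisibilityExpr(expression, userLabels);
        boolean result = p.parseOr();
        if (p.peek() != '\0') throw new IllegalArgumentException("trailing input at " + p.pos);
        return result;
    }

    private boolean parseOr() {                 // or has the lowest precedence
        boolean left = parseAnd();
        while (peek() == '|') { pos++; left |= parseAnd(); }
        return left;
    }

    private boolean parseAnd() {
        boolean left = parseUnary();
        while (peek() == '&') { pos++; left &= parseUnary(); }
        return left;
    }

    private boolean parseUnary() {              // '!', parentheses, or a label
        char c = peek();
        if (c == '!') { pos++; return !parseUnary(); }
        if (c == '(') {
            pos++;
            boolean v = parseOr();
            if (peek() != ')') throw new IllegalArgumentException("expected )");
            pos++;
            return v;
        }
        int start = pos;
        while (pos < expr.length()
                && (Character.isLetterOrDigit(expr.charAt(pos)) || expr.charAt(pos) == '_')) {
            pos++;
        }
        if (start == pos) throw new IllegalArgumentException("label expected at " + start);
        return labels.contains(expr.substring(start, pos));
    }

    private char peek() {
        while (pos < expr.length() && expr.charAt(pos) == ' ') pos++;
        return pos < expr.length() ? expr.charAt(pos) : '\0';
    }
}
```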
<para>
|
<para>
|
||||||
We build the user's label set in the RPC context when a request is first received by the HBase RegionServer. How users are associated with labels is pluggable. The default plugin passes through labels specified in Authorizations added to the Get or Scan and checks those against the calling user's authenticated labels list. When a client passes labels for which the user is not authenticated, this default algorithm will drop them. One can pass a subset of user-authenticated labels via the Scan/Get authorizations.
|
We build the user's label set in the RPC context when a request is first received by the HBase RegionServer. How users are associated with labels is pluggable. The default plugin passes through labels specified in Authorizations added to the Get or Scan and checks those against the calling user's authenticated labels list. When a client passes labels for which the user is not authenticated, this default algorithm will drop them. One can pass a subset of user-authenticated labels via the Scan/Get authorizations.
|
||||||
</para>
|
</para>
|
||||||
|
@ -896,7 +898,7 @@ The HBase shell has been extended to provide simple commands for editing and upd
|
||||||
</para>
|
</para>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
<section xml:id="hbase.visibility.label.administration.add.label">
|
<section xml:id="hbase.visibility.label.administration.add.label2">
|
||||||
<title>User Label Association</title>
|
<title>User Label Association</title>
|
||||||
<para>A set of labels can be associated with a user by using the API</para>
|
<para>A set of labels can be associated with a user by using the API</para>
|
||||||
<para><code>VisibilityClient#setAuths(Configuration conf, final String[] auths, final String user)</code></para>
|
<para><code>VisibilityClient#setAuths(Configuration conf, final String[] auths, final String user)</code></para>
|
||||||
|
|
|
@ -235,7 +235,7 @@ export SERVER_GC_OPTS="$SERVER_GC_OPTS -XX:NewSize=64m -XX:MaxNewSize=64m"
|
||||||
is generally used for questions on released versions of Apache HBase. Before going to the mailing list, make sure your
|
is generally used for questions on released versions of Apache HBase. Before going to the mailing list, make sure your
|
||||||
question has not already been answered by searching the mailing list archives first. Use
|
question has not already been answered by searching the mailing list archives first. Use
|
||||||
<xref linkend="trouble.resources.searchhadoop" />.
|
<xref linkend="trouble.resources.searchhadoop" />.
|
||||||
Take some time crafting your question<footnote><para>See <link xlink="http://www.mikeash.com/getting_answers.html">Getting Answers</link></para></footnote>; a quality question that includes all context and
|
Take some time crafting your question<footnote><para>See <link xlink:href="http://www.mikeash.com/getting_answers.html">Getting Answers</link></para></footnote>; a quality question that includes all context and
|
||||||
exhibits evidence the author has tried to find answers in the manual and out on lists
|
exhibits evidence the author has tried to find answers in the manual and out on lists
|
||||||
is more likely to get a prompt response.
|
is more likely to get a prompt response.
|
||||||
</para>
|
</para>
|
||||||
|
@ -360,15 +360,15 @@ hadoop@sv4borg12:~$ jps
|
||||||
</programlisting>
|
</programlisting>
|
||||||
In order, we see a:
|
In order, we see a:
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>Hadoop TaskTracker, manages the local Childs</listitem>
|
<listitem><para>Hadoop TaskTracker, manages the local Childs</para></listitem>
|
||||||
<listitem>HBase RegionServer, serves regions</listitem>
|
<listitem><para>HBase RegionServer, serves regions</para></listitem>
|
||||||
<listitem>Child, its MapReduce task, cannot tell which type exactly</listitem>
|
<listitem><para>Child, its MapReduce task, cannot tell which type exactly</para></listitem>
|
||||||
<listitem>Hadoop TaskTracker, manages the local Childs</listitem>
|
<listitem><para>Hadoop TaskTracker, manages the local Childs</para></listitem>
|
||||||
<listitem>Hadoop DataNode, serves blocks</listitem>
|
<listitem><para>Hadoop DataNode, serves blocks</para></listitem>
|
||||||
<listitem>HQuorumPeer, a ZooKeeper ensemble member</listitem>
|
<listitem><para>HQuorumPeer, a ZooKeeper ensemble member</para></listitem>
|
||||||
<listitem>Jps, well… it’s the current process</listitem>
|
<listitem><para>Jps, well… it’s the current process</para></listitem>
|
||||||
<listitem>ThriftServer, a special one that will be running only if Thrift was started</listitem>
|
<listitem><para>ThriftServer, a special one that will be running only if Thrift was started</para></listitem>
|
||||||
<listitem>jmx, this is a local process that’s part of our monitoring platform (poorly named, maybe). You probably don’t have that.</listitem>
|
<listitem><para>jmx, this is a local process that’s part of our monitoring platform (poorly named, maybe). You probably don’t have that.</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
</para>
|
</para>
|
||||||
<para>
|
<para>
|
||||||
@@ -620,21 +620,19 @@ Harsh J investigated the issue as part of the mailing list thread
 </section>
 <section xml:id="trouble.client.oome.directmemory.leak">
 <title>Client running out of memory though heap size seems to be stable (but the off-heap/direct heap keeps growing)</title>
-<para> You are likely running into the issue that is described and worked through in the
-mail thread <link
-xhref="http://search-hadoop.com/m/ubhrX8KvcH/Suspected+memory+leak&subj=Re+Suspected+memory+leak"
->HBase, mail # user - Suspected memory leak</link> and continued over in <link
-xhref="http://search-hadoop.com/m/p2Agc1Zy7Va/MaxDirectMemorySize+Was%253A+Suspected+memory+leak&subj=Re+FeedbackRe+Suspected+memory+leak"
->HBase, mail # dev - FeedbackRe: Suspected memory leak</link>. A workaround is passing
-your client-side JVM a reasonable value for <code>-XX:MaxDirectMemorySize</code>. By
-default, the <varname>MaxDirectMemorySize</varname> is equal to your <code>-Xmx</code> max
-heapsize setting (if <code>-Xmx</code> is set). Try seting it to something smaller (for
-example, one user had success setting it to <code>1g</code> when they had a client-side heap
-of <code>12g</code>). If you set it too small, it will bring on <code>FullGCs</code> so keep
-it a bit hefty. You want to make this setting client-side only especially if you are running
-the new experimental server-side off-heap cache since this feature depends on being able to
-use big direct buffers (You may have to keep separate client-side and server-side config
-dirs). </para>
+<para>
+You are likely running into the issue that is described and worked through in
+the mail thread <link xlink:href="http://search-hadoop.com/m/ubhrX8KvcH/Suspected+memory+leak&subj=Re+Suspected+memory+leak">HBase, mail # user - Suspected memory leak</link>
+and continued over in <link xlink:href="http://search-hadoop.com/m/p2Agc1Zy7Va/MaxDirectMemorySize+Was%253A+Suspected+memory+leak&subj=Re+FeedbackRe+Suspected+memory+leak">HBase, mail # dev - FeedbackRe: Suspected memory leak</link>.
+A workaround is passing your client-side JVM a reasonable value for <code>-XX:MaxDirectMemorySize</code>. By default,
+the <varname>MaxDirectMemorySize</varname> is equal to your <code>-Xmx</code> max heapsize setting (if <code>-Xmx</code> is set).
+Try setting it to something smaller (for example, one user had success setting it to <code>1g</code> when
+they had a client-side heap of <code>12g</code>). If you set it too small, it will bring on <code>FullGCs</code> so keep
+it a bit hefty. You want to make this setting client-side only, especially if you are running the new experimental
+server-side off-heap cache, since this feature depends on being able to use big direct buffers (You may have to keep
+separate client-side and server-side config dirs).
+</para>
 </section>
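As a hedged sketch of the client-side workaround the hunk above describes (the use of `HBASE_OPTS` in the client's `hbase-env.sh` and the `1g` value are illustrative assumptions taken from the mail-thread anecdote, not part of this commit), the setting might look like:

```shell
# Client-side hbase-env.sh only: cap direct (off-heap) memory so leaked
# direct buffers cannot grow without bound. 1g is the example value from
# the mail thread; too small a value brings on FullGCs, so keep it hefty.
export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize=1g"
```

Keeping this in a client-only config dir avoids accidentally capping direct memory on servers running the off-heap cache.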
 <section xml:id="trouble.client.slowdown.admin">
 <title>Client Slowdown When Calling Admin Methods (flush, compact, etc.)</title>
@@ -753,7 +751,7 @@ Caused by: java.io.FileNotFoundException: File _partition.lst does not exist.
 <filename>/<HLog></filename> (WAL HLog files for the RegionServer)
 </programlisting>
 </para>
-<para>See the <link xlink:href="see http://hadoop.apache.org/common/docs/current/hdfs_user_guide.html">HDFS User Guide</link> for other non-shell diagnostic
+<para>See the <link xlink:href="http://hadoop.apache.org/common/docs/current/hdfs_user_guide.html">HDFS User Guide</link> for other non-shell diagnostic
 utilities like <code>fsck</code>.
 </para>
 <section xml:id="trouble.namenode.0size.hlogs">
@@ -926,25 +924,26 @@ ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session expi
 Since the RegionServer's local ZooKeeper client cannot send heartbeats, the session times out.
 By design, we shut down any node that isn't able to contact the ZooKeeper ensemble after getting a timeout so that it stops serving data that may already be assigned elsewhere.
 </para>
-<para>
 <itemizedlist>
-<listitem>Make sure you give plenty of RAM (in <filename>hbase-env.sh</filename>), the default of 1GB won't be able to sustain long running imports.</listitem>
+<listitem><para>Make sure you give plenty of RAM (in <filename>hbase-env.sh</filename>), the default of 1GB won't be able to sustain long running imports.</para></listitem>
-<listitem>Make sure you don't swap, the JVM never behaves well under swapping.</listitem>
+<listitem><para>Make sure you don't swap, the JVM never behaves well under swapping.</para></listitem>
-<listitem>Make sure you are not CPU starving the RegionServer thread. For example, if you are running a MapReduce job using 6 CPU-intensive tasks on a machine with 4 cores, you are probably starving the RegionServer enough to create longer garbage collection pauses.</listitem>
+<listitem><para>Make sure you are not CPU starving the RegionServer thread. For example, if you are running a MapReduce job using 6 CPU-intensive tasks on a machine with 4 cores, you are probably starving the RegionServer enough to create longer garbage collection pauses.</para></listitem>
-<listitem>Increase the ZooKeeper session timeout</listitem>
+<listitem><para>Increase the ZooKeeper session timeout</para></listitem>
 </itemizedlist>
-If you wish to increase the session timeout, add the following to your <filename>hbase-site.xml</filename> to increase the timeout from the default of 60 seconds to 120 seconds.
+<para>If you wish to increase the session timeout, add the following to your <filename>hbase-site.xml</filename> to increase the timeout from the default of 60 seconds to 120 seconds.
+</para>
 <programlisting>
-<property>
+<![CDATA[<property>
 <name>zookeeper.session.timeout</name>
 <value>1200000</value>
 </property>
 <property>
 <name>hbase.zookeeper.property.tickTime</name>
 <value>6000</value>
-</property>
+</property>]]>
 </programlisting>
-</para>
 <para>
 Be aware that setting a higher timeout means that the regions served by a failed RegionServer will take at least
 that amount of time to be transfered to another RegionServer. For a production system serving live requests, we would instead
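The pairing of `zookeeper.session.timeout` with `hbase.zookeeper.property.tickTime` in the listing above makes more sense given a rule from the ZooKeeper administrator documentation (an assumption here, not stated in this commit): the server clamps a client's requested session timeout to the range [2 × tickTime, 20 × tickTime]. A quick sketch of that arithmetic:

```shell
# ZooKeeper negotiates the session timeout by clamping the requested value
# to [2 * tickTime, 20 * tickTime] (per the ZooKeeper admin guide's
# minSessionTimeout/maxSessionTimeout defaults).
tick=6000         # hbase.zookeeper.property.tickTime from the listing
requested=1200000 # zookeeper.session.timeout from the listing
min=$((2 * tick))
max=$((20 * tick))
negotiated=$requested
if [ "$negotiated" -gt "$max" ]; then negotiated=$max; fi
if [ "$negotiated" -lt "$min" ]; then negotiated=$min; fi
echo "${negotiated} ms"   # 120000 ms -- the 120 seconds the prose mentions
```

So raising tickTime to 6000 is what actually lifts the effective cap to 120 seconds; the oversized 1200000 request is simply clamped down to it.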
@@ -954,8 +953,8 @@ ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session expi
 <para>
 If this is happening during an upload which only happens once (like initially loading all your data into HBase), consider bulk loading.
 </para>
-See <xref linkend="trouble.zookeeper.general"/> for other general information about ZooKeeper troubleshooting.
+<para>See <xref linkend="trouble.zookeeper.general"/> for other general information about ZooKeeper troubleshooting.
-</section>
+</para> </section>
 <section xml:id="trouble.rs.runtime.notservingregion">
 <title>NotServingRegionException</title>
 <para>This exception is "normal" when found in the RegionServer logs at DEBUG level. This exception is returned back to the client
@@ -970,7 +969,7 @@ ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session expi
 RegionServer is not using the name given it by the master; double entry in master listing of servers</link> for gorey details.
 </para>
 </section>
-<section xml:id="trouble.rs.runtime.codecmsgs">
+<section xml:id="brand.new.compressor">
 <title>Logs flooded with '2011-01-10 12:40:48,407 INFO org.apache.hadoop.io.compress.CodecPool: Got
 brand-new compressor' messages</title>
 <para>We are not using the native versions of compression
@@ -992,7 +991,7 @@ ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session expi
 </section>
 <section xml:id="trouble.rs.shutdown">
 <title>Shutdown Errors</title>
+<para />
 </section>

 </section>
@@ -1020,7 +1019,7 @@ ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session expi
 </section>
 <section xml:id="trouble.master.shutdown">
 <title>Shutdown Errors</title>
+<para/>
 </section>

 </section>
@@ -225,8 +225,8 @@
 Now start up hbase-0.96.0.
 </para>
 </section>
-<section xml:id="096.migration.troubleshooting"><title>Troubleshooting</title>
+<section xml:id="s096.migration.troubleshooting"><title>Troubleshooting</title>
-<section xml:id="096.migration.troubleshooting.old.client"><title>Old Client connecting to 0.96 cluster</title>
+<section xml:id="s096.migration.troubleshooting.old.client"><title>Old Client connecting to 0.96 cluster</title>
 <para>It will fail with an exception like the below. Upgrade.
 <programlisting>17:22:15 Exception in thread "main" java.lang.IllegalArgumentException: Not a host:port pair: PBUF
 17:22:15 *
@@ -266,20 +266,20 @@
 <para>
 If you've not patience, here are the important things to know upgrading.
 <orderedlist>
-<listitem>Once you upgrade, you can’t go back.
+<listitem><para>Once you upgrade, you can’t go back.</para>
 </listitem>
-<listitem>
+<listitem><para>
-MSLAB is on by default. Watch that heap usage if you have a lot of regions.
+MSLAB is on by default. Watch that heap usage if you have a lot of regions.</para>
 </listitem>
-<listitem>
+<listitem><para>
 Distributed splitting is on by defaul. It should make region server failover faster.
-</listitem>
+</para></listitem>
-<listitem>
+<listitem><para>
 There’s a separate tarball for security.
-</listitem>
+</para></listitem>
-<listitem>
+<listitem><para>
 If -XX:MaxDirectMemorySize is set in your hbase-env.sh, it’s going to enable the experimental off-heap cache (You may not want this).
-</listitem>
+</para></listitem>
 </orderedlist>
 </para>
 </note>
@@ -301,7 +301,7 @@ This means you cannot go back to 0.90.x once you’ve started HBase 0.92.0 over
 <para>In 0.92.0, the <link xlink:href="http://hbase.apache.org/book.html#hbase.hregion.memstore.mslab.enabled">hbase.hregion.memstore.mslab.enabled</link> flag is set to true
 (See <xref linkend="mslab" />). In 0.90.x it was <constant>false</constant>. When it is enabled, memstores will step allocate memory in MSLAB 2MB chunks even if the
 memstore has zero or just a few small elements. This is fine usually but if you had lots of regions per regionserver in a 0.90.x cluster (and MSLAB was off),
-you may find yourself OOME'ing on upgrade because the <mathphrase>thousands of regions * number of column families * 2MB MSLAB (at a minimum)</mathphrase>
+you may find yourself OOME'ing on upgrade because the <code>thousands of regions * number of column families * 2MB MSLAB (at a minimum)</code>
 puts your heap over the top. Set <varname>hbase.hregion.memstore.mslab.enabled</varname> to
 <constant>false</constant> or set the MSLAB size down from 2MB by setting <varname>hbase.hregion.memstore.mslab.chunksize</varname> to something less.
 </para>
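The `regions * column families * 2MB` formula in the hunk above is easy to sanity-check for your own cluster; a back-of-the-envelope sketch (the 1000-region / 3-family numbers are illustrative, not from this commit):

```shell
# Minimum MSLAB heap footprint = regions * column_families * 2MB chunk,
# since each memstore pre-allocates at least one chunk when MSLAB is on.
regions=1000
families=3
chunk=$((2 * 1024 * 1024))   # default hbase.hregion.memstore.mslab.chunksize
bytes=$((regions * families * chunk))
echo "$((bytes / 1024 / 1024)) MB"   # 6000 MB -- ~5.9GB of heap before any data
```

At those numbers nearly 6GB of heap is committed before a single cell is written, which is why the text suggests disabling MSLAB or shrinking the chunk size on upgrade.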
@@ -219,7 +219,7 @@ ${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
 standalone Zookeeper quorum) for ease of learning.
 </para>

-<section><title>Operating System Prerequisites</title></section>
+<section><title>Operating System Prerequisites</title>

 <para>
 You need to have a working Kerberos KDC setup. For
@@ -283,7 +283,7 @@ ${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper

 <para>We'll refer to this JAAS configuration file as
 <filename>$CLIENT_CONF</filename> below.</para>
+</section>
 <section>
 <title>HBase-managed Zookeeper Configuration</title>

@@ -312,10 +312,10 @@ ${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
 };
 </programlisting>

-where the <filename>$PATH_TO_HBASE_KEYTAB</filename> and
+<para>where the <filename>$PATH_TO_HBASE_KEYTAB</filename> and
 <filename>$PATH_TO_ZOOKEEPER_KEYTAB</filename> files are what
 you created above, and <code>$HOST</code> is the hostname for that
-node.
+node.</para>

 <para>The <code>Server</code> section will be used by
 the Zookeeper quorum server, while the
@@ -342,9 +342,9 @@ ${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
 export HBASE_REGIONSERVER_OPTS="-Djava.security.auth.login.config=$HBASE_SERVER_CONF"
 </programlisting>

-where <filename>$HBASE_SERVER_CONF</filename> and
+<para>where <filename>$HBASE_SERVER_CONF</filename> and
 <filename>$CLIENT_CONF</filename> are the full paths to the
-JAAS configuration files created above.
+JAAS configuration files created above.</para>

 <para>Modify your <filename>hbase-site.xml</filename> on each node
 that will run zookeeper, master or regionserver to contain:</para>
@@ -537,10 +537,10 @@ ${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
 <section>
 <title>Configuration from Scratch</title>

-This has been tested on the current standard Amazon
+<para>This has been tested on the current standard Amazon
 Linux AMI. First setup KDC and principals as
 described above. Next checkout code and run a sanity
-check.
+check.</para>

 <programlisting>
 git clone git://git.apache.org/hbase.git
@@ -548,9 +548,9 @@ ${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
 mvn clean test -Dtest=TestZooKeeperACL
 </programlisting>

-Then configure HBase as described above.
+<para>Then configure HBase as described above.
-Manually edit target/cached_classpath.txt (see below)..
+Manually edit target/cached_classpath.txt (see below):
+</para>
 <programlisting>
 bin/hbase zookeeper &
 bin/hbase master &
@@ -582,14 +582,16 @@ ${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
 programmatically</title>


-This would avoid the need for a separate Hadoop jar
+<para>This would avoid the need for a separate Hadoop jar
 that fixes <link xlink:href="https://issues.apache.org/jira/browse/HADOOP-7070">HADOOP-7070</link>.
+</para>
 </section>

 <section>
 <title>Elimination of
 <code>kerberos.removeHostFromPrincipal</code> and
 <code>kerberos.removeRealmFromPrincipal</code></title>
+<para />
 </section>

 </section>