Commit for HBASE-14825 -- corrections of typos, misspellings, and mangled links
Commit for HBASE-14825 -- corrections of typos, misspellings, and mangled links - addition commit for line lengths
parent 8b67df6948
commit 6a493ddff7
@@ -290,7 +290,7 @@ possible configurations would overwhelm and obscure the important.
 updates are blocked and flushes are forced. Defaults to 40% of heap (0.4).
 Updates are blocked and flushes are forced until size of all memstores
 in a region server hits hbase.regionserver.global.memstore.size.lower.limit.
-The default value in this configuration has been intentionally left emtpy in order to
+The default value in this configuration has been intentionally left empty in order to
 honor the old hbase.regionserver.global.memstore.upperLimit property if present.</description>
 </property>
 <property>
@@ -300,7 +300,7 @@ possible configurations would overwhelm and obscure the important.
 Defaults to 95% of hbase.regionserver.global.memstore.size (0.95).
 A 100% value for this value causes the minimum possible flushing to occur when updates are
 blocked due to memstore limiting.
-The default value in this configuration has been intentionally left emtpy in order to
+The default value in this configuration has been intentionally left empty in order to
 honor the old hbase.regionserver.global.memstore.lowerLimit property if present.</description>
 </property>
 <property>
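A rough sketch of how the two limits above interact, assuming an 8 GB RegionServer heap and the default fractions quoted in the descriptions (numbers are illustrative only):

[source,java]
----
// Illustrative arithmetic for the global memstore limits described above.
long heapBytes = 8L * 1024 * 1024 * 1024;
double upperFraction = 0.4;   // hbase.regionserver.global.memstore.size
double lowerFraction = 0.95;  // hbase.regionserver.global.memstore.size.lower.limit

// Updates are blocked and flushes are forced once all memstores reach this size.
long blockUpdatesAt = (long) (heapBytes * upperFraction);        // ~3.2 GB
// Forced flushing continues until total memstore usage drops below this size.
long flushDownTo    = (long) (blockUpdatesAt * lowerFraction);   // ~3.04 GB
----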
@@ -356,7 +356,8 @@ possible configurations would overwhelm and obscure the important.
 First, this value is used in the ZK client that HBase uses to connect to the ensemble.
 It is also used by HBase when it starts a ZK server and it is passed as the 'maxSessionTimeout'. See
 http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions.
-For example, if a HBase region server connects to a ZK ensemble that's also managed by HBase, then the
+For example, if an HBase region server connects to a ZK ensemble that's also managed
+by HBase, then the
 session timeout will be the one specified by this configuration. But, a region server that connects
 to an ensemble managed with a different configuration will be subjected that ensemble's maxSessionTimeout. So,
 even though HBase might propose using 90 seconds, the ensemble can have a max timeout lower than this and
@@ -368,7 +369,7 @@ possible configurations would overwhelm and obscure the important.
 <value>/hbase</value>
 <description>Root ZNode for HBase in ZooKeeper. All of HBase's ZooKeeper
 files that are configured with a relative path will go under this node.
-By default, all of HBase's ZooKeeper file path are configured with a
+By default, all of HBase's ZooKeeper file paths are configured with a
 relative path, so they will all go under this directory unless changed.</description>
 </property>
 <property>
@@ -598,8 +599,8 @@ possible configurations would overwhelm and obscure the important.
 <name>hbase.server.versionfile.writeattempts</name>
 <value>3</value>
 <description>
-How many time to retry attempting to write a version file
-before just aborting. Each attempt is seperated by the
+How many times to retry attempting to write a version file
+before just aborting. Each attempt is separated by the
 hbase.server.thread.wakefrequency milliseconds.</description>
 </property>
 <property>
@@ -739,7 +740,7 @@ possible configurations would overwhelm and obscure the important.
 <description>A StoreFile (or a selection of StoreFiles, when using ExploringCompactionPolicy)
 smaller than this size will always be eligible for minor compaction.
 HFiles this size or larger are evaluated by hbase.hstore.compaction.ratio to determine if
-they are eligible. Because this limit represents the "automatic include"limit for all
+they are eligible. Because this limit represents the "automatic include" limit for all
 StoreFiles smaller than this value, this value may need to be reduced in write-heavy
 environments where many StoreFiles in the 1-2 MB range are being flushed, because every
 StoreFile will be targeted for compaction and the resulting StoreFiles may still be under the
@@ -808,7 +809,7 @@ possible configurations would overwhelm and obscure the important.
 <value>2684354560</value>
 <description>There are two different thread pools for compactions, one for large compactions and
 the other for small compactions. This helps to keep compaction of lean tables (such as
-<systemitem>hbase:meta</systemitem>) fast. If a compaction is larger than this threshold, it
+hbase:meta) fast. If a compaction is larger than this threshold, it
 goes into the large compaction pool. In most cases, the default value is appropriate. Default:
 2 x hbase.hstore.compaction.max x hbase.hregion.memstore.flush.size (which defaults to 128MB).
 The value field assumes that the value of hbase.hregion.memstore.flush.size is unchanged from
@@ -1111,8 +1112,8 @@ possible configurations would overwhelm and obscure the important.
 <description>Set to true to skip the 'hbase.defaults.for.version' check.
 Setting this to true can be useful in contexts other than
 the other side of a maven generation; i.e. running in an
-ide. You'll want to set this boolean to true to avoid
-seeing the RuntimException complaint: "hbase-default.xml file
+IDE. You'll want to set this boolean to true to avoid
+seeing the RuntimeException complaint: "hbase-default.xml file
 seems to be for and old version of HBase (\${hbase.version}), this
 version is X.X.X-SNAPSHOT"</description>
 </property>
@@ -1209,7 +1210,7 @@ possible configurations would overwhelm and obscure the important.
 <property>
 <name>hbase.rootdir.perms</name>
 <value>700</value>
-<description>FS Permissions for the root directory in a secure(kerberos) setup.
+<description>FS Permissions for the root directory in a secure (kerberos) setup.
 When master starts, it creates the rootdir with this permissions or sets the permissions
 if it does not match.</description>
 </property>
@@ -1523,7 +1524,7 @@ possible configurations would overwhelm and obscure the important.
 <description>
 Whether asynchronous WAL replication to the secondary region replicas is enabled or not.
 If this is enabled, a replication peer named "region_replica_replication" will be created
-which will tail the logs and replicate the mutatations to region replicas for tables that
+which will tail the logs and replicate the mutations to region replicas for tables that
 have region replication > 1. If this is enabled once, disabling this replication also
 requires disabling the replication peer using shell or ReplicationAdmin java class.
 Replication to secondary region replicas works over standard inter-cluster replication.
@@ -136,7 +136,7 @@
 Setting this to true can be useful in contexts other than
 the other side of a maven generation; i.e. running in an
 ide. You'll want to set this boolean to true to avoid
-seeing the RuntimException complaint: "hbase-default.xml file
+seeing the RuntimeException complaint: "hbase-default.xml file
 seems to be for and old version of HBase (@@@VERSION@@@), this
 version is X.X.X-SNAPSHOT"
 </description>
@@ -144,7 +144,7 @@
 Setting this to true can be useful in contexts other than
 the other side of a maven generation; i.e. running in an
 ide. You'll want to set this boolean to true to avoid
-seeing the RuntimException complaint: "hbase-default.xml file
+seeing the RuntimeException complaint: "hbase-default.xml file
 seems to be for and old version of HBase (@@@VERSION@@@), this
 version is X.X.X-SNAPSHOT"
 </description>
@@ -140,7 +140,7 @@
 Setting this to true can be useful in contexts other than
 the other side of a maven generation; i.e. running in an
 ide. You'll want to set this boolean to true to avoid
-seeing the RuntimException complaint: "hbase-default.xml file
+seeing the RuntimeException complaint: "hbase-default.xml file
 seems to be for and old version of HBase (@@@VERSION@@@), this
 version is X.X.X-SNAPSHOT"
 </description>
@@ -144,7 +144,7 @@
 Setting this to true can be useful in contexts other than
 the other side of a maven generation; i.e. running in an
 ide. You'll want to set this boolean to true to avoid
-seeing the RuntimException complaint: "hbase-default.xml file
+seeing the RuntimeException complaint: "hbase-default.xml file
 seems to be for and old version of HBase (@@@VERSION@@@), this
 version is X.X.X-SNAPSHOT"
 </description>
@@ -144,7 +144,7 @@
 Setting this to true can be useful in contexts other than
 the other side of a maven generation; i.e. running in an
 ide. You'll want to set this boolean to true to avoid
-seeing the RuntimException complaint: "hbase-default.xml file
+seeing the RuntimeException complaint: "hbase-default.xml file
 seems to be for and old version of HBase (@@@VERSION@@@), this
 version is X.X.X-SNAPSHOT"
 </description>
@@ -125,7 +125,7 @@ This directory also stores images used in the HBase Reference Guide.
 
 The website's pages are written in an HTML-like XML dialect called xdoc, which
 has a reference guide at
-link:http://maven.apache.org/archives/maven-1.x/plugins/xdoc/reference/xdocs.html.
+http://maven.apache.org/archives/maven-1.x/plugins/xdoc/reference/xdocs.html.
 You can edit these files in a plain-text editor, an IDE, or an XML editor such
 as XML Mind XML Editor (XXE) or Oxygen XML Author.
 
@@ -159,7 +159,7 @@ artifacts to the 0.94/ directory of the `asf-site` branch.
 
 The HBase Reference Guide is written in Asciidoc and built using link:http://asciidoctor.org[AsciiDoctor].
 The following cheat sheet is included for your reference. More nuanced and comprehensive documentation
-is available at link:http://asciidoctor.org/docs/user-manual/.
+is available at http://asciidoctor.org/docs/user-manual/.
 
 .AsciiDoc Cheat Sheet
 [cols="1,1,a",options="header"]
@@ -186,7 +186,8 @@ is available at link:http://asciidoctor.org/docs/user-manual/.
 include\::path/to/app.rb[]
 ----
 ................
-| Include only part of a separate file | Similar to Javadoc | See link:http://asciidoctor.org/docs/user-manual/#by-tagged-regions
+| Include only part of a separate file | Similar to Javadoc
+| See http://asciidoctor.org/docs/user-manual/#by-tagged-regions
 | Filenames, directory names, new terms | italic | \_hbase-default.xml_
 | External naked URLs | A link with the URL as link text |
 ----
@@ -285,7 +286,11 @@ Title:: content
 Title::
 content
 ----
-| Sidebars, quotes, or other blocks of text | a block of text, formatted differently from the default | Delimited using different delimiters, see link:http://asciidoctor.org/docs/user-manual/#built-in-blocks-summary. Some of the examples above use delimiters like \...., ----,====.
+| Sidebars, quotes, or other blocks of text
+| a block of text, formatted differently from the default
+| Delimited using different delimiters,
+see http://asciidoctor.org/docs/user-manual/#built-in-blocks-summary.
+Some of the examples above use delimiters like \...., ----,====.
 ........
 [example]
 ====
@@ -252,7 +252,8 @@ However, the version is always stored as the last four-byte integer in the file.
 |===
 | Version 1 | Version 2
 | |File info offset (long)
-| Data index offset (long)| loadOnOpenOffset (long) /The offset of the sectionthat we need toload when opening the file./
+| Data index offset (long)
+| loadOnOpenOffset (long) /The offset of the section that we need to load when opening the file./
 | | Number of data index entries (int)
 | metaIndexOffset (long) /This field is not being used by the version 1 reader, so we removed it from version 2./ | uncompressedDataIndexSize (long) /The total uncompressed size of the whole data block index, including root-level, intermediate-level, and leaf-level blocks./
 | | Number of meta index entries (int)
@@ -260,7 +261,7 @@ However, the version is always stored as the last four-byte integer in the file.
 | numEntries (int) | numEntries (long)
 | Compression codec: 0 = LZO, 1 = GZ, 2 = NONE (int) | Compression codec: 0 = LZO, 1 = GZ, 2 = NONE (int)
 | | The number of levels in the data block index (int)
-| | firstDataBlockOffset (long) /The offset of the first first data block. Used when scanning./
+| | firstDataBlockOffset (long) /The offset of the first data block. Used when scanning./
 | | lastDataBlockEnd (long) /The offset of the first byte after the last key/value data block. We don't need to go beyond this offset when scanning./
 | Version: 1 (int) | Version: 2 (int)
 |===
@@ -41,7 +41,8 @@ Technically speaking, HBase is really more a "Data Store" than "Data Base" becau
 However, HBase has many features which supports both linear and modular scaling.
 HBase clusters expand by adding RegionServers that are hosted on commodity class servers.
 If a cluster expands from 10 to 20 RegionServers, for example, it doubles both in terms of storage and as well as processing capacity.
-RDBMS can scale well, but only up to a point - specifically, the size of a single database server - and for the best performance requires specialized hardware and storage devices.
+An RDBMS can scale well, but only up to a point - specifically, the size of a single database
+server - and for the best performance requires specialized hardware and storage devices.
 HBase features of note are:
 
 * Strongly consistent reads/writes: HBase is not an "eventually consistent" DataStore.
@@ -140,7 +141,7 @@ If a region has both an empty start and an empty end key, it is the only region
 
 In the (hopefully unlikely) event that programmatic processing of catalog metadata
 is required, see the
-+++<a href="http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/Writables.html#getHRegionInfo%28byte[]%29">Writables</a>+++
++++<a href="http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/Writables.html#getHRegionInfo%28byte%5B%5D%29">Writables</a>+++
 utility.
 
 [[arch.catalog.startup]]
@@ -172,7 +173,7 @@ The API changed in HBase 1.0. For connection configuration information, see <<cl
 
 ==== API as of HBase 1.0.0
 
-Its been cleaned up and users are returned Interfaces to work against rather than particular types.
+It's been cleaned up and users are returned Interfaces to work against rather than particular types.
 In HBase 1.0, obtain a `Connection` object from `ConnectionFactory` and thereafter, get from it instances of `Table`, `Admin`, and `RegionLocator` on an as-need basis.
 When done, close the obtained instances.
 Finally, be sure to cleanup your `Connection` instance before exiting.
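A minimal sketch of the 1.0-style client flow this hunk describes, assuming a table named "my_table" already exists:

[source,java]
----
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ConnectionSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // Obtain a Connection from ConnectionFactory; get Table (or Admin, RegionLocator)
    // instances from it as needed, and close them when done.
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("my_table"))) {
      Result result = table.get(new Get(Bytes.toBytes("row1")));
      System.out.println("cells: " + result.size());
    } // Table and Connection are cleaned up here.
  }
}
----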
@@ -295,7 +296,11 @@ scan.setFilter(list);
 [[client.filter.cv.scvf]]
 ==== SingleColumnValueFilter
 
-link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html[SingleColumnValueFilter] can be used to test column values for equivalence (`CompareOp.EQUAL`), inequality (`CompareOp.NOT_EQUAL`), or ranges (e.g., `CompareOp.GREATER`). The following is example of testing equivalence a column to a String value "my value"...
+A SingleColumnValueFilter (see:
+http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html)
+can be used to test column values for equivalence (`CompareOp.EQUAL`),
+inequality (`CompareOp.NOT_EQUAL`), or ranges (e.g., `CompareOp.GREATER`). The following is an
+example of testing equivalence of a column to a String value "my value"...
 
 [source,java]
 ----
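The guide's own example follows the `[source,java]` opening shown above; a minimal sketch of the usage this hunk describes, with an illustrative family `cf` and qualifier `qual`, might look like:

[source,java]
----
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Keep only rows whose cf:qual column equals the string "my value".
SingleColumnValueFilter filter = new SingleColumnValueFilter(
    Bytes.toBytes("cf"),        // column family
    Bytes.toBytes("qual"),      // column qualifier
    CompareOp.EQUAL,            // equivalence test
    Bytes.toBytes("my value")); // value to compare against
Scan scan = new Scan();
scan.setFilter(filter);
----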
@@ -694,7 +699,8 @@ Here are others that you may have to take into account:
 
 Catalog Tables::
 The `-ROOT-` (prior to HBase 0.96, see <<arch.catalog.root,arch.catalog.root>>) and `hbase:meta` tables are forced into the block cache and have the in-memory priority which means that they are harder to evict.
-The former never uses more than a few hundreds bytes while the latter can occupy a few MBs (depending on the number of regions).
+The former never uses more than a few hundred bytes while the latter can occupy a few MBs
+(depending on the number of regions).
 
 HFiles Indexes::
 An _HFile_ is the file format that HBase uses to store data in HDFS.
@@ -878,7 +884,10 @@ image::region_split_process.png[Region Split Process]
 . The Master learns about this znode, since it has a watcher for the parent `region-in-transition` znode.
 . The RegionServer creates a sub-directory named `.splits` under the parent’s `region` directory in HDFS.
 . The RegionServer closes the parent region and marks the region as offline in its local data structures. *THE SPLITTING REGION IS NOW OFFLINE.* At this point, client requests coming to the parent region will throw `NotServingRegionException`. The client will retry with some backoff. The closing region is flushed.
-. The RegionServer creates region directories under the `.splits` directory, for daughter regions A and B, and creates necessary data structures. Then it splits the store files, in the sense that it creates two link:http://www.google.com/url?q=http%3A%2F%2Fhbase.apache.org%2Fapidocs%2Forg%2Fapache%2Fhadoop%2Fhbase%2Fio%2FReference.html&sa=D&sntz=1&usg=AFQjCNEkCbADZ3CgKHTtGYI8bJVwp663CA[Reference] files per store file in the parent region. Those reference files will point to the parent regions'files.
+. The RegionServer creates region directories under the `.splits` directory, for daughter
+regions A and B, and creates necessary data structures. Then it splits the store files,
+in the sense that it creates two Reference files per store file in the parent region.
+Those reference files will point to the parent region's files.
 . The RegionServer creates the actual region directory in HDFS, and moves the reference files for each daughter.
 . The RegionServer sends a `Put` request to the `.META.` table, to set the parent as offline in the `.META.` table and add information about daughter regions. At this point, there won’t be individual entries in `.META.` for the daughters. Clients will see that the parent region is split if they scan `.META.`, but won’t know about the daughters until they appear in `.META.`. Also, if this `Put` to `.META`. succeeds, the parent will be effectively split. If the RegionServer fails before this RPC succeeds, Master and the next Region Server opening the region will clean dirty state about the region split. After the `.META.` update, though, the region split will be rolled-forward by Master.
 . The RegionServer opens daughters A and B in parallel.
@@ -1008,7 +1017,8 @@ If you set the `hbase.hlog.split.skip.errors` option to `true`, errors are treat
 * Processing of the WAL will continue
 
 If the `hbase.hlog.split.skip.errors` option is set to `false`, the default, the exception will be propagated and the split will be logged as failed.
-See link:https://issues.apache.org/jira/browse/HBASE-2958[HBASE-2958 When hbase.hlog.split.skip.errors is set to false, we fail the split but thats it].
+See link:https://issues.apache.org/jira/browse/HBASE-2958[HBASE-2958 When
+hbase.hlog.split.skip.errors is set to false, we fail the split but that's it].
 We need to do more than just fail split if this flag is set.
 
 ====== How EOFExceptions are treated when splitting a crashed RegionServer's WALs
@@ -1117,7 +1127,8 @@ Based on the state of the task whose data is changed, the split log manager does
 Each RegionServer runs a daemon thread called the _split log worker_, which does the work to split the logs.
 The daemon thread starts when the RegionServer starts, and registers itself to watch HBase znodes.
 If any splitlog znode children change, it notifies a sleeping worker thread to wake up and grab more tasks.
-If if a worker's current task's node data is changed, the worker checks to see if the task has been taken by another worker.
+If a worker's current task's node data is changed,
+the worker checks to see if the task has been taken by another worker.
 If so, the worker thread stops work on the current task.
 +
 The worker monitors the splitlog znode constantly.
@@ -1127,7 +1138,7 @@ At this point, the split log worker scans for another unclaimed task.
 +
 .How the Split Log Worker Approaches a Task
 * It queries the task state and only takes action if the task is in `TASK_UNASSIGNED `state.
-* If the task is is in `TASK_UNASSIGNED` state, the worker attempts to set the state to `TASK_OWNED` by itself.
+* If the task is in `TASK_UNASSIGNED` state, the worker attempts to set the state to `TASK_OWNED` by itself.
 If it fails to set the state, another worker will try to grab it.
 The split log manager will also ask all workers to rescan later if the task remains unassigned.
 * If the worker succeeds in taking ownership of the task, it tries to get the task state again to make sure it really gets it asynchronously.
@@ -1135,7 +1146,7 @@ At this point, the split log worker scans for another unclaimed task.
 ** Get the HBase root folder, create a temp folder under the root, and split the log file to the temp folder.
 ** If the split was successful, the task executor sets the task to state `TASK_DONE`.
 ** If the worker catches an unexpected IOException, the task is set to state `TASK_ERR`.
-** If the worker is shutting down, set the the task to state `TASK_RESIGNED`.
+** If the worker is shutting down, set the task to state `TASK_RESIGNED`.
 ** If the task is taken by another worker, just log it.
 
 
@@ -1326,7 +1337,7 @@ image::region_states.png[]
 . Before assigning a region, the master moves the region to `OFFLINE` state automatically if it is in `CLOSED` state.
 . When a RegionServer is about to split a region, it notifies the master.
 The master moves the region to be split from `OPEN` to `SPLITTING` state and add the two new regions to be created to the RegionServer.
-These two regions are in `SPLITING_NEW` state initially.
+These two regions are in `SPLITTING_NEW` state initially.
 . After notifying the master, the RegionServer starts to split the region.
 Once past the point of no return, the RegionServer notifies the master again so the master can update the `hbase:meta` table.
 However, the master does not update the region states until it is notified by the server that the split is done.
@@ -1404,8 +1415,8 @@ hbase> create 'test', {METHOD => 'table_att', CONFIG => {'SPLIT_POLICY' => 'org.
 ----
 
 The default split policy can be overwritten using a custom
-link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/RegionSplitPolicy.html
-[RegionSplitPolicy(HBase 0.94+)]. Typically a custom split policy should extend HBase's default split policy:
+link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/RegionSplitPolicy.html[RegionSplitPolicy(HBase 0.94+)].
+Typically a custom split policy should extend HBase's default split policy:
 link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.html[ConstantSizeRegionSplitPolicy].
 
 The policy can be set globally through the HBaseConfiguration used or on a per table basis:
@@ -1972,8 +1983,8 @@ Why?
 * 100 -> No, because sum(50, 23, 12, 12) * 1.0 = 97.
 * 50 -> No, because sum(23, 12, 12) * 1.0 = 47.
 * 23 -> Yes, because sum(12, 12) * 1.0 = 24.
-* 12 -> Yes, because the previous file has been included, and because this does not exceed the the max-file limit of 5
-* 12 -> Yes, because the previous file had been included, and because this does not exceed the the max-file limit of 5.
+* 12 -> Yes, because the previous file has been included, and because this does not exceed the max-file limit of 5
+* 12 -> Yes, because the previous file had been included, and because this does not exceed the max-file limit of 5.
 
 [[compaction.file.selection.example2]]
 ====== Minor Compaction File Selection - Example #2 (Not Enough Files ToCompact)
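The ratio test behind the example above can be restated as a small sketch (method and variable names are illustrative, not the actual compaction-policy code): a file is selected when its size is at most the sum of the sizes of the newer files times hbase.hstore.compaction.ratio.

[source,java]
----
// Files ordered oldest to newest; ratio is hbase.hstore.compaction.ratio (1.0 here).
static boolean eligibleForMinorCompaction(long[] sizes, int i, double ratio) {
  long sumOfNewer = 0;
  for (int j = i + 1; j < sizes.length; j++) {
    sumOfNewer += sizes[j];
  }
  return sizes[i] <= sumOfNewer * ratio;
}

// With sizes {100, 50, 23, 12, 12} and ratio 1.0:
//   index 0 (100) -> false, since sum(50, 23, 12, 12) = 97
//   index 2 (23)  -> true,  since sum(12, 12) = 24
----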
@@ -2234,7 +2245,7 @@ See link:http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and
 [[arch.bulk.load.adv]]
 === Advanced Usage
 
-Although the `importtsv` tool is useful in many cases, advanced users may want to generate data programatically, or import data from other formats.
+Although the `importtsv` tool is useful in many cases, advanced users may want to generate data programmatically, or import data from other formats.
 To get started doing so, dig into `ImportTsv.java` and check the JavaDoc for HFileOutputFormat.
 
 The import step of the bulk load can also be done programmatically.
@@ -2330,8 +2341,8 @@ In terms of semantics, TIMELINE consistency as implemented by HBase differs from
 .Timeline Consistency
 image::timeline_consistency.png[Timeline Consistency]
 
-To better understand the TIMELINE semantics, lets look at the above diagram.
-Lets say that there are two clients, and the first one writes x=1 at first, then x=2 and x=3 later.
+To better understand the TIMELINE semantics, let's look at the above diagram.
+Let's say that there are two clients, and the first one writes x=1 at first, then x=2 and x=3 later.
 As above, all writes are handled by the primary region replica.
 The writes are saved in the write ahead log (WAL), and replicated to the other replicas asynchronously.
 In the above diagram, notice that replica_id=1 received 2 updates, and its data shows that x=2, while the replica_id=2 only received a single update, and its data shows that x=1.
@@ -2367,7 +2378,7 @@ The regions opened in secondary mode will share the same data files with the pri
 This feature is delivered in two phases, Phase 1 and 2. The first phase is done in time for HBase-1.0.0 release. Meaning that using HBase-1.0.x, you can use all the features that are marked for Phase 1. Phase 2 is committed in HBase-1.1.0, meaning all HBase versions after 1.1.0 should contain Phase 2 items.
 
 === Propagating writes to region replicas
-As discussed above writes only go to the primary region replica. For propagating the writes from the primary region replica to the secondaries, there are two different mechanisms. For read-only tables, you do not need to use any of the following methods. Disabling and enabling the table should make the data available in all region replicas. For mutable tables, you have to use *only* one of the following mechanisms: storefile refresher, or async wal replication. The latter is recommeded.
+As discussed above writes only go to the primary region replica. For propagating the writes from the primary region replica to the secondaries, there are two different mechanisms. For read-only tables, you do not need to use any of the following methods. Disabling and enabling the table should make the data available in all region replicas. For mutable tables, you have to use *only* one of the following mechanisms: storefile refresher, or async wal replication. The latter is recommended.
 
 ==== StoreFile Refresher
 The first mechanism is store file refresher which is introduced in HBase-1.0+. Store file refresher is a thread per region server, which runs periodically, and does a refresh operation for the store files of the primary region for the secondary region replicas. If enabled, the refresher will ensure that the secondary region replicas see the new flushed, compacted or bulk loaded files from the primary region in a timely manner. However, this means that only flushed data can be read back from the secondary region replicas, and after the refresher is run, making the secondaries lag behind the primary for an a longer time.
@@ -2399,7 +2410,7 @@ Currently, Async WAL Replication is not done for the META table’s WAL. The met
 The secondary region replicas refer to the data files of the primary region replica, but they have their own memstores (in HBase-1.1+) and uses block cache as well. However, one distinction is that the secondary region replicas cannot flush the data when there is memory pressure for their memstores. They can only free up memstore memory when the primary region does a flush and this flush is replicated to the secondary. Since in a region server hosting primary replicas for some regions and secondaries for some others, the secondaries might cause extra flushes to the primary regions in the same host. In extreme situations, there can be no memory left for adding new writes coming from the primary via wal replication. For unblocking this situation (and since secondary cannot flush by itself), the secondary is allowed to do a “store file refresh” by doing a file system list operation to pick up new files from primary, and possibly dropping its memstore. This refresh will only be performed if the memstore size of the biggest secondary region replica is at least `hbase.region.replica.storefile.refresh.memstore.multiplier` (default 4) times bigger than the biggest memstore of a primary replica. One caveat is that if this is performed, the secondary can observe partial row updates across column families (since column families are flushed independently). The default should be good to not do this operation frequently. You can set this value to a large number to disable this feature if desired, but be warned that it might cause the replication to block forever.
 
 === Secondary replica failover
-When a secondary region replica first comes online, or fails over, it may have served some edits from it’s memstore. Since the recovery is handled differently for secondary replicas, the secondary has to ensure that it does not go back in time before it starts serving requests after assignment. For doing that, the secondary waits until it observes a full flush cycle (start flush, commit flush) or a “region open event” replicated from the primary. Until this happens, the secondary region replica will reject all read requests by throwing an IOException with message “The region's reads are disabled”. However, the other replicas will probably still be available to read, thus not causing any impact for the rpc with TIMELINE consistency. To facilitate faster recovery, the secondary region will trigger a flush request from the primary when it is opened. The configuration property `hbase.region.replica.wait.for.primary.flush` (enabled by default) can be used to disable this feature if needed.
+When a secondary region replica first comes online, or fails over, it may have served some edits from its memstore. Since the recovery is handled differently for secondary replicas, the secondary has to ensure that it does not go back in time before it starts serving requests after assignment. For doing that, the secondary waits until it observes a full flush cycle (start flush, commit flush) or a “region open event” replicated from the primary. Until this happens, the secondary region replica will reject all read requests by throwing an IOException with message “The region's reads are disabled”. However, the other replicas will probably still be available to read, thus not causing any impact for the rpc with TIMELINE consistency. To facilitate faster recovery, the secondary region will trigger a flush request from the primary when it is opened. The configuration property `hbase.region.replica.wait.for.primary.flush` (enabled by default) can be used to disable this feature if needed.
 
 
 
@@ -2435,7 +2446,7 @@ Instead you can change the number of region replicas per table to increase or de
 <name>hbase.region.replica.replication.enabled</name>
 <value>true</value>
 <description>
-Whether asynchronous WAL replication to the secondary region replicas is enabled or not. If this is enabled, a replication peer named "region_replica_replication" will be created which will tail the logs and replicate the mutatations to region replicas for tables that have region replication > 1. If this is enabled once, disabling this replication also requires disabling the replication peer using shell or ReplicationAdmin java class. Replication to secondary region replicas works over standard inter-cluster replication. So replication, if disabled explicitly, also has to be enabled by setting "hbase.replication"· to true for this feature to work.
+Whether asynchronous WAL replication to the secondary region replicas is enabled or not. If this is enabled, a replication peer named "region_replica_replication" will be created which will tail the logs and replicate the mutations to region replicas for tables that have region replication > 1. If this is enabled once, disabling this replication also requires disabling the replication peer using shell or ReplicationAdmin java class. Replication to secondary region replicas works over standard inter-cluster replication. So replication, if disabled explicitly, also has to be enabled by setting "hbase.replication"· to true for this feature to work.
 </description>
 </property>
 <property>
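For context, the tables this peer replicates to are those declared with region replication greater than 1; a rough sketch using the pre-2.0 admin API (table and family names are illustrative, and `connection` is assumed to be an open `Connection`):

[source,java]
----
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;

// Create a table whose regions each get one secondary replica, so the
// "region_replica_replication" peer has replicas to ship edits to.
HTableDescriptor htd = new HTableDescriptor(TableName.valueOf("replicated_table"));
htd.addFamily(new HColumnDescriptor("cf"));
htd.setRegionReplication(2); // one primary plus one secondary replica per region
try (Admin admin = connection.getAdmin()) {
  admin.createTable(htd);
}
----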
@@ -2603,7 +2614,7 @@ hbase> scan 't1', {CONSISTENCY => 'TIMELINE'}
 
 ==== Java
 
-You can set set the consistency for Gets and Scans and do requests as follows.
+You can set the consistency for Gets and Scans and do requests as follows.
 
 [source,java]
 ----
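A brief sketch of the Java usage referred to above (the guide's full example follows the `[source,java]` opening; row names are illustrative, and `table` is assumed to be an open `Table`):

[source,java]
----
import org.apache.hadoop.hbase.client.Consistency;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Allow this read to be served by a secondary region replica.
Get get = new Get(Bytes.toBytes("row1"));
get.setConsistency(Consistency.TIMELINE);
Result result = table.get(get);
// result.isStale() reports whether the answer came from a secondary replica.

// Scans can be marked the same way.
Scan scan = new Scan();
scan.setConsistency(Consistency.TIMELINE);
----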
@@ -55,7 +55,7 @@ These jobs were consistently found to be waiting on map and reduce tasks assigne
 
 .Datanodes:
 * Two 12-core processors
-* Six Enerprise SATA disks
+* Six Enterprise SATA disks
 * 24GB of RAM
 * Two bonded gigabit NICs
 
@@ -56,7 +56,7 @@ If owners are absent -- busy or otherwise -- two +1s by non-owners will suffice.
 
 Patches that span components need at least two +1s before they can be committed, preferably +1s by owners of components touched by the x-component patch (TODO: This needs tightening up but I think fine for first pass).
 
-Any -1 on a patch by anyone vetos a patch; it cannot be committed until the justification for the -1 is addressed.
+Any -1 on a patch by anyone vetoes a patch; it cannot be committed until the justification for the -1 is addressed.
 
 [[hbase.fix.version.in.jira]]
 .How to set fix version in JIRA on issue resolve
@@ -151,7 +151,7 @@ If you see the following in your HBase logs, you know that HBase was unable to l
 ----
 If the libraries loaded successfully, the WARN message does not show.
 
-Lets presume your Hadoop shipped with a native library that suits the platform you are running HBase on.
+Let's presume your Hadoop shipped with a native library that suits the platform you are running HBase on.
 To check if the Hadoop native library is available to HBase, run the following tool (available in Hadoop 2.1 and greater):
 [source]
 ----
@@ -170,7 +170,7 @@ Above shows that the native hadoop library is not available in HBase context.
 To fix the above, either copy the Hadoop native libraries local or symlink to them if the Hadoop and HBase stalls are adjacent in the filesystem.
 You could also point at their location by setting the `LD_LIBRARY_PATH` environment variable.
 
-Where the JVM looks to find native librarys is "system dependent" (See `java.lang.System#loadLibrary(name)`). On linux, by default, is going to look in _lib/native/PLATFORM_ where `PLATFORM` is the label for the platform your HBase is installed on.
+Where the JVM looks to find native libraries is "system dependent" (See `java.lang.System#loadLibrary(name)`). On linux, by default, is going to look in _lib/native/PLATFORM_ where `PLATFORM` is the label for the platform your HBase is installed on.
 On a local linux machine, it seems to be the concatenation of the java properties `os.name` and `os.arch` followed by whether 32 or 64 bit.
 HBase on startup prints out all of the java system properties so find the os.name and os.arch in the log.
 For example:
@ -162,7 +162,7 @@ For example, assuming that a schema had 3 ColumnFamilies per region with an aver
|
||||||
+
|
+
|
||||||
Another related setting is the number of processes a user is allowed to run at once. In Linux and Unix, the number of processes is set using the `ulimit -u` command. This should not be confused with the `nproc` command, which controls the number of CPUs available to a given user. Under load, a `ulimit -u` that is too low can cause OutOfMemoryError exceptions. See Jack Levin's major HDFS issues thread on the hbase-users mailing list, from 2011.
|
Another related setting is the number of processes a user is allowed to run at once. In Linux and Unix, the number of processes is set using the `ulimit -u` command. This should not be confused with the `nproc` command, which controls the number of CPUs available to a given user. Under load, a `ulimit -u` that is too low can cause OutOfMemoryError exceptions. See Jack Levin's major HDFS issues thread on the hbase-users mailing list, from 2011.
|
||||||
+
|
+
|
||||||
Configuring the maximum number of file descriptors and processes for the user who is running the HBase process is an operating system configuration, rather than an HBase configuration. It is also important to be sure that the settings are changed for the user that actually runs HBase. To see which user started HBase, and that user's ulimit configuration, look at the first line of the HBase log for that instance. A useful read setting config on you hadoop cluster is Aaron Kimballs' Configuration Parameters: What can you just ignore?
|
Configuring the maximum number of file descriptors and processes for the user who is running the HBase process is an operating system configuration, rather than an HBase configuration. It is also important to be sure that the settings are changed for the user that actually runs HBase. To see which user started HBase, and that user's ulimit configuration, look at the first line of the HBase log for that instance. A useful read on setting config on your Hadoop cluster is Aaron Kimball's Configuration Parameters: What can you just ignore?
|
||||||
+
|
+
|
||||||
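As a hedged, JVM-specific sketch (the `com.sun.management` bean is not available on every JVM), the descriptor limits that `ulimit -n` controls can be read back from inside a running HBase JVM, which is a handy way to confirm the limits actually took effect for the HBase user:

[source,java]
----
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdLimits {
  public static void main(String[] args) {
    OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
      com.sun.management.UnixOperatingSystemMXBean unix =
          (com.sun.management.UnixOperatingSystemMXBean) os;
      // These reflect the ulimit values the current process actually started with.
      System.out.println("max file descriptors:  " + unix.getMaxFileDescriptorCount());
      System.out.println("open file descriptors: " + unix.getOpenFileDescriptorCount());
    } else {
      System.out.println("Descriptor counts are not exposed on this JVM/OS.");
    }
  }
}
----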
.`ulimit` Settings on Ubuntu
|
.`ulimit` Settings on Ubuntu
|
||||||
====
|
====
|
||||||
|
@ -410,7 +410,7 @@ Zookeeper binds to a well known port so clients may talk to HBase.
|
||||||
|
|
||||||
=== Distributed
|
=== Distributed
|
||||||
|
|
||||||
Distributed mode can be subdivided into distributed but all daemons run on a single node -- a.k.a _pseudo-distributed_ -- and _fully-distributed_ where the daemons are spread across all nodes in the cluster.
|
Distributed mode can be subdivided into distributed but all daemons run on a single node -- a.k.a. _pseudo-distributed_ -- and _fully-distributed_ where the daemons are spread across all nodes in the cluster.
|
||||||
The _pseudo-distributed_ vs. _fully-distributed_ nomenclature comes from Hadoop.
|
The _pseudo-distributed_ vs. _fully-distributed_ nomenclature comes from Hadoop.
|
||||||
|
|
||||||
Pseudo-distributed mode can run against the local filesystem or it can run against an instance of the _Hadoop Distributed File System_ (HDFS). Fully-distributed mode can ONLY run on HDFS.
|
Pseudo-distributed mode can run against the local filesystem or it can run against an instance of the _Hadoop Distributed File System_ (HDFS). Fully-distributed mode can ONLY run on HDFS.
|
||||||
|
@ -540,7 +540,7 @@ HBase logs can be found in the _logs_ subdirectory.
|
||||||
Check them out especially if HBase had trouble starting.
|
Check them out especially if HBase had trouble starting.
|
||||||
|
|
||||||
HBase also puts up a UI listing vital attributes.
|
HBase also puts up a UI listing vital attributes.
|
||||||
By default it's deployed on the Master host at port 16010 (HBase RegionServers listen on port 16020 by default and put up an informational HTTP server at port 16030). If the Master is running on a host named `master.example.org` on the default port, point your browser at _http://master.example.org:16010_ to see the web interface.
|
By default it's deployed on the Master host at port 16010 (HBase RegionServers listen on port 16020 by default and put up an informational HTTP server at port 16030). If the Master is running on a host named `master.example.org` on the default port, point your browser at pass:[http://master.example.org:16010] to see the web interface.
|
||||||
|
|
||||||
Prior to HBase 0.98 the master UI was deployed on port 60010, and the HBase RegionServers UI on port 60030.
|
Prior to HBase 0.98 the master UI was deployed on port 60010, and the HBase RegionServers UI on port 60030.
|
||||||
|
|
||||||
|
@ -604,7 +604,7 @@ ZooKeeper is where all these values are kept.
|
||||||
Thus clients require the location of the ZooKeeper ensemble before they can do anything else.
|
Thus clients require the location of the ZooKeeper ensemble before they can do anything else.
|
||||||
Usually this the ensemble location is kept out in the _hbase-site.xml_ and is picked up by the client from the `CLASSPATH`.
|
Usually the ensemble location is kept out in the _hbase-site.xml_ and is picked up by the client from the `CLASSPATH`.
|
||||||
|
|
||||||
If you are configuring an IDE to run a HBase client, you should include the _conf/_ directory on your classpath so _hbase-site.xml_ settings can be found (or add _src/test/resources_ to pick up the hbase-site.xml used by tests).
|
If you are configuring an IDE to run an HBase client, you should include the _conf/_ directory on your classpath so _hbase-site.xml_ settings can be found (or add _src/test/resources_ to pick up the hbase-site.xml used by tests).
|
||||||
|
|
||||||
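A minimal sketch of why the _conf/_ directory matters: `HBaseConfiguration.create()` loads _hbase-default.xml_ and _hbase-site.xml_ from the `CLASSPATH`, so a client can print back which ZooKeeper ensemble it actually picked up (the property names are the standard ones; the values come from whatever site file is on your classpath):

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ShowClientConfig {
  public static void main(String[] args) {
    // Reads hbase-default.xml and hbase-site.xml from the CLASSPATH.
    Configuration conf = HBaseConfiguration.create();
    System.out.println("hbase.zookeeper.quorum = " + conf.get("hbase.zookeeper.quorum"));
    System.out.println("zookeeper.znode.parent = " + conf.get("zookeeper.znode.parent"));
  }
}
----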
Minimally, a client of HBase needs several libraries in its `CLASSPATH` when connecting to a cluster, including:
|
Minimally, a client of HBase needs several libraries in its `CLASSPATH` when connecting to a cluster, including:
|
||||||
[source]
|
[source]
|
||||||
|
@ -917,7 +917,7 @@ See <<master.processes.loadbalancer,master.processes.loadbalancer>> for more inf
|
||||||
==== Disabling Blockcache
|
==== Disabling Blockcache
|
||||||
|
|
||||||
Do not turn off block cache (You'd do it by setting `hbase.block.cache.size` to zero). Currently we do not do well if you do this because the RegionServer will spend all its time loading HFile indices over and over again.
|
Do not turn off block cache (You'd do it by setting `hbase.block.cache.size` to zero). Currently we do not do well if you do this because the RegionServer will spend all its time loading HFile indices over and over again.
|
||||||
If your working set it such that block cache does you no good, at least size the block cache such that HFile indices will stay up in the cache (you can get a rough idea on the size you need by surveying RegionServer UIs; you'll see index block size accounted near the top of the webpage).
|
If your working set is such that block cache does you no good, at least size the block cache such that HFile indices will stay up in the cache (you can get a rough idea on the size you need by surveying RegionServer UIs; you'll see index block size accounted near the top of the webpage).
|
||||||
|
|
||||||
[[nagles]]
|
[[nagles]]
|
||||||
==== link:http://en.wikipedia.org/wiki/Nagle's_algorithm[Nagle's] or the small package problem
|
==== link:http://en.wikipedia.org/wiki/Nagle's_algorithm[Nagle's] or the small package problem
|
||||||
|
@ -930,7 +930,7 @@ You might also see the graphs on the tail of link:https://issues.apache.org/jira
|
||||||
==== Better Mean Time to Recover (MTTR)
|
==== Better Mean Time to Recover (MTTR)
|
||||||
|
|
||||||
This section is about configurations that will make servers come back faster after a fail.
|
This section is about configurations that will make servers come back faster after a failure.
|
||||||
See the Deveraj Das an Nicolas Liochon blog post link:http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/[Introduction to HBase Mean Time to Recover (MTTR)] for a brief introduction.
|
See the Devaraj Das and Nicolas Liochon blog post link:http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/[Introduction to HBase Mean Time to Recover (MTTR)] for a brief introduction.
|
||||||
|
|
||||||
The issue link:https://issues.apache.org/jira/browse/HBASE-8389[HBASE-8354 forces Namenode into loop with lease recovery requests] is messy but has a bunch of good discussion toward the end on low timeouts and how to effect faster recovery including citation of fixes added to HDFS. Read the Varun Sharma comments.
|
The issue link:https://issues.apache.org/jira/browse/HBASE-8389[HBASE-8354 forces Namenode into loop with lease recovery requests] is messy but has a bunch of good discussion toward the end on low timeouts and how to effect faster recovery including citation of fixes added to HDFS. Read the Varun Sharma comments.
|
||||||
The below suggested configurations are Varun's suggestions distilled and tested.
|
The below suggested configurations are Varun's suggestions distilled and tested.
|
||||||
|
@ -1087,7 +1087,7 @@ NOTE: To enable the HBase JMX implementation on Master, you also need to add bel
|
||||||
[source,xml]
|
[source,xml]
|
||||||
----
|
----
|
||||||
<property>
|
<property>
|
||||||
<ame>hbase.coprocessor.master.classes</name>
|
<name>hbase.coprocessor.master.classes</name>
|
||||||
<value>org.apache.hadoop.hbase.JMXListener</value>
|
<value>org.apache.hadoop.hbase.JMXListener</value>
|
||||||
</property>
|
</property>
|
||||||
----
|
----
|
||||||
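Once the `JMXListener` coprocessor is loaded, any plain JMX client can attach over RMI. The sketch below uses only the standard `javax.management` API; the host and port are placeholders you would replace with your Master or RegionServer and whatever RMI registry port you configured for the listener:

[source,java]
----
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxProbe {
  public static void main(String[] args) throws Exception {
    // Placeholder host and port; substitute your server and its configured JMX/RMI port.
    JMXServiceURL url =
        new JMXServiceURL("service:jmx:rmi:///jndi/rmi://master.example.org:10101/jmxrmi");
    try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
      MBeanServerConnection mbsc = connector.getMBeanServerConnection();
      System.out.println("MBeans visible: " + mbsc.getMBeanCount());
    }
  }
}
----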
|
|
|
@ -101,7 +101,7 @@ link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/Cop
|
||||||
|
|
||||||
. Load the Coprocessor: Currently there are two ways to load the Coprocessor. +
|
. Load the Coprocessor: Currently there are two ways to load the Coprocessor. +
|
||||||
Static:: Loading from configuration
|
Static:: Loading from configuration
|
||||||
Dynammic:: Loading via 'hbase shell' or via Java code using HTableDescriptor class). +
|
Dynamic:: Loading via 'hbase shell' or via Java code using the HTableDescriptor class. +
|
||||||
For more details see <<cp_loading,Loading Coprocessors>>.
|
For more details see <<cp_loading,Loading Coprocessors>>.
|
||||||
|
|
||||||
. Finally your client-side code to call the Coprocessor. This is the easiest step, as HBase
|
. Finally your client-side code to call the Coprocessor. This is the easiest step, as HBase
|
||||||
|
@ -239,10 +239,10 @@ link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HTable.h
|
||||||
From version 0.96, implementing Endpoint Coprocessor is not straight forward. Now it is done with
|
From version 0.96, implementing an Endpoint Coprocessor is not straightforward. Now it is done with
|
||||||
the help of Google's Protocol Buffer. For more details on Protocol Buffer, please see
|
the help of Google's Protocol Buffer. For more details on Protocol Buffer, please see
|
||||||
link:https://developers.google.com/protocol-buffers/docs/proto[Protocol Buffer Guide].
|
link:https://developers.google.com/protocol-buffers/docs/proto[Protocol Buffer Guide].
|
||||||
Endpoints Coprocessor written in version 0.94 are not compatible with with version 0.96 or later
|
Endpoint Coprocessors written in version 0.94 are not compatible with version 0.96 or later
|
||||||
(for more details, see
|
(for more details, see
|
||||||
link:https://issues.apache.org/jira/browse/HBASE-5448[HBASE-5448]),
|
link:https://issues.apache.org/jira/browse/HBASE-5448[HBASE-5448]),
|
||||||
so if your are upgrading your HBase cluster from version 0.94 (or before) to 0.96 (or later) you
|
so if you are upgrading your HBase cluster from version 0.94 (or before) to 0.96 (or later) you
|
||||||
have to rewrite your Endpoint coprocessor.
|
have to rewrite your Endpoint coprocessor.
|
||||||
|
|
||||||
For example see <<cp_example,Examples>>
|
For example see <<cp_example,Examples>>
|
||||||
|
@ -252,7 +252,7 @@ For example see <<cp_example,Examples>>
|
||||||
== Loading Coprocessors
|
== Loading Coprocessors
|
||||||
|
|
||||||
_Loading of Coprocessor refers to the process of making your custom Coprocessor implementation
|
_Loading of Coprocessor refers to the process of making your custom Coprocessor implementation
|
||||||
available to the the HBase, so that when a requests comes in or an event takes place the desired
|
available to HBase, so that when a request comes in or an event takes place the desired
|
||||||
functionality implemented in your custom code gets executed. +
|
functionality implemented in your custom code gets executed. +
|
||||||
Coprocessor can be loaded broadly in two ways. One is static (loading through configuration files)
|
Coprocessors can be loaded broadly in two ways. One is static (loading through configuration files)
|
||||||
and the other one is dynamic loading (using hbase shell or java code).
|
and the other one is dynamic loading (using hbase shell or java code).
|
||||||
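As a hedged sketch of the dynamic path (the coprocessor class name and table are illustrative; `addCoprocessor()`/`setValue()` are the `HTableDescriptor` methods this chapter refers to), loading a coprocessor from Java code looks roughly like this:

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class LoadCoprocessorDynamically {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Admin admin = connection.getAdmin()) {
      TableName tableName = TableName.valueOf("users");   // example table
      admin.disableTable(tableName);
      HTableDescriptor htd = admin.getTableDescriptor(tableName);
      // The fully qualified class name is illustrative; the jar must already be
      // on the RegionServer classpath (or referenced by its HDFS path).
      htd.addCoprocessor("org.myname.hbase.coprocessor.SumEndPoint");
      admin.modifyTable(tableName, htd);
      admin.enableTable(tableName);
    }
  }
}
----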
|
@ -271,10 +271,10 @@ sub elements <name> and <value> respectively.
|
||||||
... 'hbase.coprocessor.region.classes' for RegionObservers and Endpoints.
|
... 'hbase.coprocessor.region.classes' for RegionObservers and Endpoints.
|
||||||
... 'hbase.coprocessor.wal.classes' for WALObservers.
|
... 'hbase.coprocessor.wal.classes' for WALObservers.
|
||||||
... 'hbase.coprocessor.master.classes' for MasterObservers.
|
... 'hbase.coprocessor.master.classes' for MasterObservers.
|
||||||
.. <value> must contain the fully qualified class name of your class implmenting the Coprocessor.
|
.. <value> must contain the fully qualified class name of your class implementing the Coprocessor.
|
||||||
+
|
+
|
||||||
For example to load a Coprocessor (implemented in class SumEndPoint.java) you have to create
|
For example to load a Coprocessor (implemented in class SumEndPoint.java) you have to create
|
||||||
following entry in RegionServer's 'hbase-site.xml' file (generally located under 'conf' directiory):
|
the following entry in RegionServer's 'hbase-site.xml' file (generally located under 'conf' directory):
|
||||||
+
|
+
|
||||||
[source,xml]
|
[source,xml]
|
||||||
----
|
----
|
||||||
|
@ -297,7 +297,7 @@ When calling out to registered observers, the framework executes their callbacks
|
||||||
sorted order of their priority. +
|
sorted order of their priority. +
|
||||||
Ties are broken arbitrarily.
|
Ties are broken arbitrarily.
|
||||||
|
|
||||||
. Put your code on classpth of HBase: There are various ways to do so, like adding jars on
|
. Put your code on classpath of HBase: There are various ways to do so, like adding jars on
|
||||||
classpath etc. One easy way to do this is to drop the jar (containing you code and all the
|
classpath etc. One easy way to do this is to drop the jar (containing your code and all the
|
||||||
dependencies) in 'lib' folder of the HBase installation.
|
dependencies) in 'lib' folder of the HBase installation.
|
||||||
|
|
||||||
|
@ -455,7 +455,7 @@ hbase(main):003:0> alter 'users', METHOD => 'table_att_unset',
|
||||||
hbase(main):004:0* NAME => 'coprocessor$1'
|
hbase(main):004:0* NAME => 'coprocessor$1'
|
||||||
----
|
----
|
||||||
|
|
||||||
. Using HtableDescriptor: Simply reload the table definition _without_ setting the value of
|
. Using HTableDescriptor: Simply reload the table definition _without_ setting the value of
|
||||||
Coprocessor either in setValue() or addCoprocessor() methods. This will remove the Coprocessor
|
Coprocessor either in setValue() or addCoprocessor() methods. This will remove the Coprocessor
|
||||||
attached to this table, if any. For example:
|
attached to this table, if any. For example:
|
||||||
+
|
+
|
||||||
|
@ -624,12 +624,12 @@ hadoop fs -copyFromLocal coprocessor.jar coprocessor.jar
|
||||||
[source,java]
|
[source,java]
|
||||||
----
|
----
|
||||||
Configuration conf = HBaseConfiguration.create();
|
Configuration conf = HBaseConfiguration.create();
|
||||||
// Use below code for HBase verion 1.x.x or above.
|
// Use below code for HBase version 1.x.x or above.
|
||||||
Connection connection = ConnectionFactory.createConnection(conf);
|
Connection connection = ConnectionFactory.createConnection(conf);
|
||||||
TableName tableName = TableName.valueOf("users");
|
TableName tableName = TableName.valueOf("users");
|
||||||
Table table = connection.getTable(tableName);
|
Table table = connection.getTable(tableName);
|
||||||
|
|
||||||
//Use below code HBase verion 0.98.xx or below.
|
//Use below code for HBase version 0.98.xx or below.
|
||||||
//HConnection connection = HConnectionManager.createConnection(conf);
|
//HConnection connection = HConnectionManager.createConnection(conf);
|
||||||
//HTableInterface table = connection.getTable("users");
|
//HTableInterface table = connection.getTable("users");
|
||||||
|
|
||||||
|
@ -789,12 +789,12 @@ following code as shown below:
|
||||||
----
|
----
|
||||||
|
|
||||||
Configuration conf = HBaseConfiguration.create();
|
Configuration conf = HBaseConfiguration.create();
|
||||||
// Use below code for HBase verion 1.x.x or above.
|
// Use below code for HBase version 1.x.x or above.
|
||||||
Connection connection = ConnectionFactory.createConnection(conf);
|
Connection connection = ConnectionFactory.createConnection(conf);
|
||||||
TableName tableName = TableName.valueOf("users");
|
TableName tableName = TableName.valueOf("users");
|
||||||
Table table = connection.getTable(tableName);
|
Table table = connection.getTable(tableName);
|
||||||
|
|
||||||
//Use below code HBase verion 0.98.xx or below.
|
//Use below code for HBase version 0.98.xx or below.
|
||||||
//HConnection connection = HConnectionManager.createConnection(conf);
|
//HConnection connection = HConnectionManager.createConnection(conf);
|
||||||
//HTableInterface table = connection.getTable("users");
|
//HTableInterface table = connection.getTable("users");
|
||||||
|
|
||||||
|
|
|
@ -171,7 +171,7 @@ For more information about the internals of how Apache HBase stores data, see <<
|
||||||
A namespace is a logical grouping of tables analogous to a database in relation database systems.
|
A namespace is a logical grouping of tables analogous to a database in relational database systems.
|
||||||
This abstraction lays the groundwork for upcoming multi-tenancy related features:
|
This abstraction lays the groundwork for upcoming multi-tenancy related features:
|
||||||
|
|
||||||
* Quota Management (link:https://issues.apache.org/jira/browse/HBASE-8410[HBASE-8410]) - Restrict the amount of resources (ie regions, tables) a namespace can consume.
|
* Quota Management (link:https://issues.apache.org/jira/browse/HBASE-8410[HBASE-8410]) - Restrict the amount of resources (i.e. regions, tables) a namespace can consume.
|
||||||
* Namespace Security Administration (link:https://issues.apache.org/jira/browse/HBASE-9206[HBASE-9206]) - Provide another level of security administration for tenants.
|
* Namespace Security Administration (link:https://issues.apache.org/jira/browse/HBASE-9206[HBASE-9206]) - Provide another level of security administration for tenants.
|
||||||
* Region server groups (link:https://issues.apache.org/jira/browse/HBASE-6721[HBASE-6721]) - A namespace/table can be pinned onto a subset of RegionServers thus guaranteeing a course level of isolation.
|
* Region server groups (link:https://issues.apache.org/jira/browse/HBASE-6721[HBASE-6721]) - A namespace/table can be pinned onto a subset of RegionServers thus guaranteeing a coarse level of isolation.
|
||||||
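For orientation only (the namespace name is made up), the `Admin` API already exposes the basic namespace operations these features build on:

[source,java]
----
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.NamespaceDescriptor;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class NamespaceExample {
  public static void main(String[] args) throws Exception {
    try (Connection connection =
             ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = connection.getAdmin()) {
      // Create a namespace, then list what exists; 'my_ns' is just an example name.
      admin.createNamespace(NamespaceDescriptor.create("my_ns").build());
      for (NamespaceDescriptor ns : admin.listNamespaceDescriptors()) {
        System.out.println(ns.getName());
      }
    }
  }
}
----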
|
|
||||||
|
@ -257,7 +257,7 @@ For example, the columns _courses:history_ and _courses:math_ are both members o
|
||||||
The colon character (`:`) delimits the column family from the column family qualifier.
|
The colon character (`:`) delimits the column family from the column family qualifier.
|
||||||
The column family prefix must be composed of _printable_ characters.
|
The column family prefix must be composed of _printable_ characters.
|
||||||
The qualifying tail, the column family _qualifier_, can be made of any arbitrary bytes.
|
The qualifying tail, the column family _qualifier_, can be made of any arbitrary bytes.
|
||||||
Column families must be declared up front at schema definition time whereas columns do not need to be defined at schema time but can be conjured on the fly while the table is up an running.
|
Column families must be declared up front at schema definition time whereas columns do not need to be defined at schema time but can be conjured on the fly while the table is up and running.
|
||||||
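A small sketch of that distinction (table, family, and qualifier names are examples): the column family must exist when the table is created, but a qualifier is simply bytes supplied at write time.

[source,java]
----
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class ColumnFamilyExample {
  public static void main(String[] args) throws Exception {
    try (Connection connection =
             ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = connection.getAdmin()) {
      TableName name = TableName.valueOf("courses_table");
      // The column family 'courses' must be declared at schema definition time.
      HTableDescriptor htd = new HTableDescriptor(name);
      htd.addFamily(new HColumnDescriptor("courses"));
      admin.createTable(htd);

      // Qualifiers ('history', 'math', ...) are conjured on the fly at write time.
      try (Table table = connection.getTable(name)) {
        Put put = new Put(Bytes.toBytes("row1"));
        put.addColumn(Bytes.toBytes("courses"), Bytes.toBytes("history"), Bytes.toBytes("A"));
        table.put(put);
      }
    }
  }
}
----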
|
|
||||||
Physically, all column family members are stored together on the filesystem.
|
Physically, all column family members are stored together on the filesystem.
|
||||||
Because tunings and storage specifications are done at the column family level, it is advised that all column family members have the same general access pattern and size characteristics.
|
Because tunings and storage specifications are done at the column family level, it is advised that all column family members have the same general access pattern and size characteristics.
|
||||||
|
@ -279,7 +279,7 @@ Gets are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hba
|
||||||
|
|
||||||
=== Put
|
=== Put
|
||||||
|
|
||||||
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] either adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put(org.apache.hadoop.hbase.client.Put)[Table.put] (writeBuffer) or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch(java.util.List, java.lang.Object[])[Table.batch] (non-writeBuffer).
|
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] either adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put(org.apache.hadoop.hbase.client.Put)[Table.put] (writeBuffer) or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch(java.util.List,%20java.lang.Object%5B%5D)[Table.batch] (non-writeBuffer).
|
||||||
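As a brief, illustrative sketch (table, family, and values are examples), the non-writeBuffer path mentioned above submits several mutations at once through `Table.batch`:

[source,java]
----
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class PutBatchExample {
  public static void main(String[] args) throws Exception {
    try (Connection connection =
             ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = connection.getTable(TableName.valueOf("myTable"))) {
      List<Row> actions = new ArrayList<>();
      for (int i = 0; i < 3; i++) {
        Put put = new Put(Bytes.toBytes("row" + i));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual"), Bytes.toBytes("value" + i));
        actions.add(put);
      }
      Object[] results = new Object[actions.size()];
      table.batch(actions, results);   // the non-writeBuffer path described above
    }
  }
}
----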
|
|
||||||
[[scan]]
|
[[scan]]
|
||||||
=== Scans
|
=== Scans
|
||||||
|
|
|
@ -90,7 +90,7 @@ We used to be on SVN.
|
||||||
We migrated.
|
We migrated.
|
||||||
See link:https://issues.apache.org/jira/browse/INFRA-7768[Migrate Apache HBase SVN Repos to Git].
|
See link:https://issues.apache.org/jira/browse/INFRA-7768[Migrate Apache HBase SVN Repos to Git].
|
||||||
See link:http://hbase.apache.org/source-repository.html[Source Code
|
See link:http://hbase.apache.org/source-repository.html[Source Code
|
||||||
Management] page for contributor and committer links or seach for HBase on the link:http://git.apache.org/[Apache Git] page.
|
Management] page for contributor and committer links or search for HBase on the link:http://git.apache.org/[Apache Git] page.
|
||||||
|
|
||||||
== IDEs
|
== IDEs
|
||||||
|
|
||||||
|
@ -133,7 +133,7 @@ If you cloned the project via git, download and install the Git plugin (EGit). A
|
||||||
==== HBase Project Setup in Eclipse using `m2eclipse`
|
==== HBase Project Setup in Eclipse using `m2eclipse`
|
||||||
|
|
||||||
The easiest way is to use the +m2eclipse+ plugin for Eclipse.
|
The easiest way is to use the +m2eclipse+ plugin for Eclipse.
|
||||||
Eclipse Indigo or newer includes +m2eclipse+, or you can download it from link:http://www.eclipse.org/m2e//. It provides Maven integration for Eclipse, and even lets you use the direct Maven commands from within Eclipse to compile and test your project.
|
Eclipse Indigo or newer includes +m2eclipse+, or you can download it from http://www.eclipse.org/m2e/. It provides Maven integration for Eclipse, and even lets you use the direct Maven commands from within Eclipse to compile and test your project.
|
||||||
|
|
||||||
To import the project, click and select the HBase root directory. `m2eclipse` locates all the hbase modules for you.
|
To import the project, click and select the HBase root directory. `m2eclipse` locates all the hbase modules for you.
|
||||||
|
|
||||||
|
@ -146,7 +146,7 @@ If you install +m2eclipse+ and import HBase in your workspace, do the following
|
||||||
----
|
----
|
||||||
Failed to execute goal
|
Failed to execute goal
|
||||||
org.apache.maven.plugins:maven-antrun-plugin:1.6:run (default) on project hbase:
|
org.apache.maven.plugins:maven-antrun-plugin:1.6:run (default) on project hbase:
|
||||||
'An Ant BuildException has occured: Replace: source file .../target/classes/hbase-default.xml
|
'An Ant BuildException has occurred: Replace: source file .../target/classes/hbase-default.xml
|
||||||
doesn't exist
|
doesn't exist
|
||||||
----
|
----
|
||||||
+
|
+
|
||||||
|
@ -213,7 +213,7 @@ For additional information on setting up Eclipse for HBase development on Window
|
||||||
|
|
||||||
=== IntelliJ IDEA
|
=== IntelliJ IDEA
|
||||||
|
|
||||||
You can set up IntelliJ IDEA for similar functinoality as Eclipse.
|
You can set up IntelliJ IDEA for similar functionality as Eclipse.
|
||||||
Follow these steps.
|
Follow these steps.
|
||||||
|
|
||||||
. Select
|
. Select
|
||||||
|
@ -227,7 +227,7 @@ Using the Eclipse Code Formatter plugin for IntelliJ IDEA, you can import the HB
|
||||||
|
|
||||||
=== Other IDEs
|
=== Other IDEs
|
||||||
|
|
||||||
It would be userful to mirror the <<eclipse,eclipse>> set-up instructions for other IDEs.
|
It would be useful to mirror the <<eclipse,eclipse>> set-up instructions for other IDEs.
|
||||||
If you would like to assist, please have a look at link:https://issues.apache.org/jira/browse/HBASE-11704[HBASE-11704].
|
If you would like to assist, please have a look at link:https://issues.apache.org/jira/browse/HBASE-11704[HBASE-11704].
|
||||||
|
|
||||||
[[build]]
|
[[build]]
|
||||||
|
@ -331,13 +331,13 @@ Tests may not all pass so you may need to pass `-DskipTests` unless you are incl
|
||||||
====
|
====
|
||||||
You will see ERRORs like the above title if you pass the _default_ profile; e.g.
|
You will see ERRORs like the above title if you pass the _default_ profile; e.g.
|
||||||
if you pass +hadoop.profile=1.1+ when building 0.96 or +hadoop.profile=2.0+ when building hadoop 0.98; just drop the hadoop.profile stipulation in this case to get your build to run again.
|
if you pass +hadoop.profile=1.1+ when building 0.96 or +hadoop.profile=2.0+ when building hadoop 0.98; just drop the hadoop.profile stipulation in this case to get your build to run again.
|
||||||
This seems to be a maven pecularity that is probably fixable but we've not spent the time trying to figure it.
|
This seems to be a maven peculiarity that is probably fixable but we've not spent the time trying to figure it.
|
||||||
====
|
====
|
||||||
|
|
||||||
Similarly, for 3.0, you would just replace the profile value.
|
Similarly, for 3.0, you would just replace the profile value.
|
||||||
Note that Hadoop-3.0.0-SNAPSHOT does not currently have a deployed maven artificat - you will need to build and install your own in your local maven repository if you want to run against this profile.
|
Note that Hadoop-3.0.0-SNAPSHOT does not currently have a deployed maven artifact - you will need to build and install your own in your local maven repository if you want to run against this profile.
|
||||||
|
|
||||||
In earilier versions of Apache HBase, you can build against older versions of Apache Hadoop, notably, Hadoop 0.22.x and 0.23.x.
|
In earlier versions of Apache HBase, you can build against older versions of Apache Hadoop, notably, Hadoop 0.22.x and 0.23.x.
|
||||||
If you are running, for example HBase-0.94 and wanted to build against Hadoop 0.23.x, you would run with:
|
If you are running, for example HBase-0.94 and wanted to build against Hadoop 0.23.x, you would run with:
|
||||||
|
|
||||||
[source,bourne]
|
[source,bourne]
|
||||||
|
@ -415,7 +415,7 @@ mvn -DskipTests package assembly:single deploy
|
||||||
==== Build Gotchas
|
==== Build Gotchas
|
||||||
|
|
||||||
If you see `Unable to find resource 'VM_global_library.vm'`, ignore it.
|
If you see `Unable to find resource 'VM_global_library.vm'`, ignore it.
|
||||||
Its not an error.
|
It's not an error.
|
||||||
It is link:http://jira.codehaus.org/browse/MSITE-286[officially
|
It is link:http://jira.codehaus.org/browse/MSITE-286[officially
|
||||||
ugly] though.
|
ugly] though.
|
||||||
|
|
||||||
|
@ -504,7 +504,7 @@ For building earlier versions, the process is different.
|
||||||
See this section under the respective release documentation folders.
|
See this section under the respective release documentation folders.
|
||||||
|
|
||||||
.Point Releases
|
.Point Releases
|
||||||
If you are making a point release (for example to quickly address a critical incompatability or security problem) off of a release branch instead of a development branch, the tagging instructions are slightly different.
|
If you are making a point release (for example to quickly address a critical incompatibility or security problem) off of a release branch instead of a development branch, the tagging instructions are slightly different.
|
||||||
I'll prefix those special steps with _Point Release Only_.
|
I'll prefix those special steps with _Point Release Only_.
|
||||||
|
|
||||||
.Before You Begin
|
.Before You Begin
|
||||||
|
@ -516,7 +516,7 @@ You should also have tried recent branch tips out on a cluster under load, perha
|
||||||
[NOTE]
|
[NOTE]
|
||||||
====
|
====
|
||||||
At this point you should tag the previous release branch (ex: 0.96.1) with the new point release tag (e.g.
|
At this point you should tag the previous release branch (ex: 0.96.1) with the new point release tag (e.g.
|
||||||
0.96.1.1 tag). Any commits with changes for the point release should be appled to the new tag.
|
0.96.1.1 tag). Any commits with changes for the point release should be applied to the new tag.
|
||||||
====
|
====
|
||||||
|
|
||||||
The Hadoop link:http://wiki.apache.org/hadoop/HowToRelease[How To
|
The Hadoop link:http://wiki.apache.org/hadoop/HowToRelease[How To
|
||||||
|
@ -584,8 +584,8 @@ $ mvn clean install -DskipTests assembly:single -Dassembly.file=hbase-assembly/s
|
||||||
Extract the tarball and make sure it looks good.
|
Extract the tarball and make sure it looks good.
|
||||||
A good test for the src tarball being 'complete' is to see if you can build new tarballs from this source bundle.
|
A good test for the src tarball being 'complete' is to see if you can build new tarballs from this source bundle.
|
||||||
If the source tarball is good, save it off to a _version directory_, a directory somewhere where you are collecting all of the tarballs you will publish as part of the release candidate.
|
If the source tarball is good, save it off to a _version directory_, a directory somewhere where you are collecting all of the tarballs you will publish as part of the release candidate.
|
||||||
For example if you were building a hbase-0.96.0 release candidate, you might call the directory _hbase-0.96.0RC0_.
|
For example if you were building an hbase-0.96.0 release candidate, you might call the directory _hbase-0.96.0RC0_.
|
||||||
Later you will publish this directory as our release candidate up on http://people.apache.org/~YOU.
|
Later you will publish this directory as our release candidate up on pass:[http://people.apache.org/~YOU].
|
||||||
|
|
||||||
. Build the binary tarball.
|
. Build the binary tarball.
|
||||||
+
|
+
|
||||||
|
@ -1146,7 +1146,7 @@ However, maven will do this for us; just use: +mvn
|
||||||
|
|
||||||
This is very similar to how you specify running a subset of unit tests (see above), but use the property `it.test` instead of `test`.
|
This is very similar to how you specify running a subset of unit tests (see above), but use the property `it.test` instead of `test`.
|
||||||
To just run `IntegrationTestClassXYZ.java`, use: +mvn
|
To just run `IntegrationTestClassXYZ.java`, use: +mvn
|
||||||
failsafe:integration-test -Dit.test=IntegrationTestClassXYZ+ The next thing you might want to do is run groups of integration tests, say all integration tests that are named IntegrationTestClassX*.java: +mvn failsafe:integration-test -Dit.test=*ClassX*+ This runs everything that is an integration test that matches *ClassX*. This means anything matching: "**/IntegrationTest*ClassX*". You can also run multiple groups of integration tests using comma-delimited lists (similar to unit tests). Using a list of matches still supports full regex matching for each of the groups.This would look something like: +mvn
|
failsafe:integration-test -Dit.test=IntegrationTestClassXYZ+ The next thing you might want to do is run groups of integration tests, say all integration tests that are named IntegrationTestClassX*.java: +mvn failsafe:integration-test -Dit.test=*ClassX*+ This runs everything that is an integration test that matches *ClassX*. This means anything matching: "**/IntegrationTest*ClassX*". You can also run multiple groups of integration tests using comma-delimited lists (similar to unit tests). Using a list of matches still supports full regex matching for each of the groups. This would look something like: +mvn
|
||||||
failsafe:integration-test -Dit.test=*ClassX*, *ClassY+
|
failsafe:integration-test -Dit.test=*ClassX*, *ClassY+
|
||||||
|
|
||||||
[[maven.build.commands.integration.tests.distributed]]
|
[[maven.build.commands.integration.tests.distributed]]
|
||||||
|
@ -1183,8 +1183,9 @@ For other deployment options, a ClusterManager can be implemented and plugged in
|
||||||
[[maven.build.commands.integration.tests.destructive]]
|
[[maven.build.commands.integration.tests.destructive]]
|
||||||
==== Destructive integration / system tests (ChaosMonkey)
|
==== Destructive integration / system tests (ChaosMonkey)
|
||||||
|
|
||||||
HBase 0.96 introduced a tool named `ChaosMonkey`, modeled after link:http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
|
HBase 0.96 introduced a tool named `ChaosMonkey`, modeled after
|
||||||
[same-named tool by Netflix's Chaos Monkey tool]. ChaosMonkey simulates real-world
|
link:http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html[Netflix's same-named Chaos Monkey tool].
|
||||||
|
ChaosMonkey simulates real-world
|
||||||
faults in a running cluster by killing or disconnecting random servers, or injecting
|
faults in a running cluster by killing or disconnecting random servers, or injecting
|
||||||
other failures into the environment. You can use ChaosMonkey as a stand-alone tool
|
other failures into the environment. You can use ChaosMonkey as a stand-alone tool
|
||||||
to run a policy while other tests are running. In some environments, ChaosMonkey is
|
to run a policy while other tests are running. In some environments, ChaosMonkey is
|
||||||
|
@ -1262,8 +1263,8 @@ HBase ships with several ChaosMonkey policies, available in the
|
||||||
[[chaos.monkey.properties]]
|
[[chaos.monkey.properties]]
|
||||||
==== Configuring Individual ChaosMonkey Actions
|
==== Configuring Individual ChaosMonkey Actions
|
||||||
|
|
||||||
Since HBase version 1.0.0 (link:https://issues.apache.org/jira/browse/HBASE-11348
|
Since HBase version 1.0.0 (link:https://issues.apache.org/jira/browse/HBASE-11348[HBASE-11348]),
|
||||||
[HBASE-11348]), ChaosMonkey integration tests can be configured per test run.
|
ChaosMonkey integration tests can be configured per test run.
|
||||||
Create a Java properties file in the HBase classpath and pass it to ChaosMonkey using
|
Create a Java properties file in the HBase classpath and pass it to ChaosMonkey using
|
||||||
the `-monkeyProps` configuration flag. Configurable properties, along with their default
|
the `-monkeyProps` configuration flag. Configurable properties, along with their default
|
||||||
values if applicable, are listed in the `org.apache.hadoop.hbase.chaos.factories.MonkeyConstants`
|
values if applicable, are listed in the `org.apache.hadoop.hbase.chaos.factories.MonkeyConstants`
|
||||||
|
@ -1604,7 +1605,7 @@ All are subject to challenge of course but until then, please hold to the rules
|
||||||
|
|
||||||
ZooKeeper state should transient (treat it like memory). If ZooKeeper state is deleted, hbase should be able to recover and essentially be in the same state.
|
ZooKeeper state should be transient (treat it like memory). If ZooKeeper state is deleted, HBase should be able to recover and essentially be in the same state.
|
||||||
|
|
||||||
* .ExceptionsThere are currently a few exceptions that we need to fix around whether a table is enabled or disabled.
|
* .Exceptions: There are currently a few exceptions that we need to fix around whether a table is enabled or disabled.
|
||||||
* Replication data is currently stored only in ZooKeeper.
|
* Replication data is currently stored only in ZooKeeper.
|
||||||
Deleting ZooKeeper data related to replication may cause replication to be disabled.
|
Deleting ZooKeeper data related to replication may cause replication to be disabled.
|
||||||
Do not delete the replication tree, _/hbase/replication/_.
|
Do not delete the replication tree, _/hbase/replication/_.
|
||||||
|
@ -1866,9 +1867,9 @@ If the push fails for any reason, fix the problem or ask for help.
|
||||||
Do not do a +git push --force+.
|
Do not do a +git push --force+.
|
||||||
+
|
+
|
||||||
Before you can commit a patch, you need to determine how the patch was created.
|
Before you can commit a patch, you need to determine how the patch was created.
|
||||||
The instructions and preferences around the way to create patches have changed, and there will be a transition periond.
|
The instructions and preferences around the way to create patches have changed, and there will be a transition period.
|
||||||
+
|
+
|
||||||
* .Determine How a Patch Was CreatedIf the first few lines of the patch look like the headers of an email, with a From, Date, and Subject, it was created using +git format-patch+.
|
* .Determine How a Patch Was Created: If the first few lines of the patch look like the headers of an email, with a From, Date, and Subject, it was created using +git format-patch+.
|
||||||
This is the preference, because you can reuse the submitter's commit message.
|
This is the preference, because you can reuse the submitter's commit message.
|
||||||
If the commit message is not appropriate, you can still use the commit, then run the command +git
|
If the commit message is not appropriate, you can still use the commit, then run the command +git
|
||||||
rebase -i origin/master+, and squash and reword as appropriate.
|
rebase -i origin/master+, and squash and reword as appropriate.
|
||||||
|
@ -1971,7 +1972,7 @@ When the amending author is different from the original committer, add notice of
|
||||||
from master to branch].
|
from master to branch].
|
||||||
|
|
||||||
[[committer.tests]]
|
[[committer.tests]]
|
||||||
====== Committers are responsible for making sure commits do not break thebuild or tests
|
====== Committers are responsible for making sure commits do not break the build or tests
|
||||||
|
|
||||||
If a committer commits a patch, it is their responsibility to make sure it passes the test suite.
|
If a committer commits a patch, it is their responsibility to make sure it passes the test suite.
|
||||||
It is helpful if contributors keep an eye out that their patch does not break the hbase build and/or tests, but ultimately, a contributor cannot be expected to be aware of all the particular vagaries and interconnections that occur in a project like HBase.
|
It is helpful if contributors keep an eye out that their patch does not break the hbase build and/or tests, but ultimately, a contributor cannot be expected to be aware of all the particular vagaries and interconnections that occur in a project like HBase.
|
||||||
|
|
|
@ -77,7 +77,7 @@ of the <<security>> chapter.
|
||||||
|
|
||||||
=== Using REST Endpoints
|
=== Using REST Endpoints
|
||||||
|
|
||||||
The following examples use the placeholder server `http://example.com:8000`, and
|
The following examples use the placeholder server pass:[http://example.com:8000], and
|
||||||
the following commands can all be run using `curl` or `wget` commands. You can request
|
the following commands can all be run using `curl` or `wget` commands. You can request
|
||||||
plain text (the default), XML , or JSON output by adding no header for plain text,
|
plain text (the default), XML, or JSON output by adding no header for plain text,
|
||||||
or the header "Accept: text/xml" for XML or "Accept: application/json" for JSON.
|
or the header "Accept: text/xml" for XML or "Accept: application/json" for JSON.
|
||||||
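As a hedged Java equivalent of those `curl` calls (the `/version` resource and the placeholder host mirror the examples in this section; adjust both for your deployment):

[source,java]
----
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RestGetExample {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://example.com:8000/version");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    conn.setRequestProperty("Accept", "application/json"); // or "text/xml", or omit for plain text
    try (BufferedReader in =
             new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);
      }
    } finally {
      conn.disconnect();
    }
  }
}
----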
|
|
|
@ -46,7 +46,7 @@ What is the history of HBase?::
|
||||||
|
|
||||||
=== Upgrading
|
=== Upgrading
|
||||||
How do I upgrade Maven-managed projects from HBase 0.94 to HBase 0.96+?::
|
How do I upgrade Maven-managed projects from HBase 0.94 to HBase 0.96+?::
|
||||||
In HBase 0.96, the project moved to a modular structure. Adjust your project's dependencies to rely upon the `hbase-client` module or another module as appropriate, rather than a single JAR. You can model your Maven depency after one of the following, depending on your targeted version of HBase. See Section 3.5, “Upgrading from 0.94.x to 0.96.x” or Section 3.3, “Upgrading from 0.96.x to 0.98.x” for more information.
|
In HBase 0.96, the project moved to a modular structure. Adjust your project's dependencies to rely upon the `hbase-client` module or another module as appropriate, rather than a single JAR. You can model your Maven dependency after one of the following, depending on your targeted version of HBase. See Section 3.5, “Upgrading from 0.94.x to 0.96.x” or Section 3.3, “Upgrading from 0.96.x to 0.98.x” for more information.
|
||||||
+
|
+
|
||||||
.Maven Dependency for HBase 0.98
|
.Maven Dependency for HBase 0.98
|
||||||
[source,xml]
|
[source,xml]
|
||||||
|
|
|
@ -497,7 +497,8 @@ ZooKeeper session timeout in milliseconds. It is used in two different ways.
|
||||||
First, this value is used in the ZK client that HBase uses to connect to the ensemble.
|
First, this value is used in the ZK client that HBase uses to connect to the ensemble.
|
||||||
It is also used by HBase when it starts a ZK server and it is passed as the 'maxSessionTimeout'. See
|
It is also used by HBase when it starts a ZK server and it is passed as the 'maxSessionTimeout'. See
|
||||||
http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions.
|
http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions.
|
||||||
For example, if a HBase region server connects to a ZK ensemble that's also managed by HBase, then the
|
For example, if an HBase region server connects to a ZK ensemble that's also managed
|
||||||
|
by HBase, then the
|
||||||
session timeout will be the one specified by this configuration. But, a region server that connects
|
session timeout will be the one specified by this configuration. But, a region server that connects
|
||||||
to an ensemble managed with a different configuration will be subjected that ensemble's maxSessionTimeout. So,
|
to an ensemble managed with a different configuration will be subjected that ensemble's maxSessionTimeout. So,
|
||||||
even though HBase might propose using 90 seconds, the ensemble can have a max timeout lower than this and
|
even though HBase might propose using 90 seconds, the ensemble can have a max timeout lower than this and
|
||||||
|
@ -844,7 +845,7 @@ Time to sleep in between searches for work (in milliseconds).
|
||||||
.Description
|
.Description
|
||||||
|
|
||||||
How many time to retry attempting to write a version file
|
How many times to retry attempting to write a version file
|
||||||
before just aborting. Each attempt is seperated by the
|
before just aborting. Each attempt is separated by the
|
||||||
hbase.server.thread.wakefrequency milliseconds.
|
hbase.server.thread.wakefrequency milliseconds.
|
||||||
+
|
+
|
||||||
.Default
|
.Default
|
||||||
|
@ -1578,7 +1579,7 @@ Set to true to skip the 'hbase.defaults.for.version' check.
|
||||||
Setting this to true can be useful in contexts other than
|
Setting this to true can be useful in contexts other than
|
||||||
the other side of a maven generation; i.e. running in an
|
the other side of a maven generation; i.e. running in an
|
||||||
ide. You'll want to set this boolean to true to avoid
|
IDE. You'll want to set this boolean to true to avoid
|
||||||
seeing the RuntimException complaint: "hbase-default.xml file
|
seeing the RuntimeException complaint: "hbase-default.xml file
|
||||||
seems to be for and old version of HBase (\${hbase.version}), this
|
seems to be for an old version of HBase (\${hbase.version}), this
|
||||||
version is X.X.X-SNAPSHOT"
|
version is X.X.X-SNAPSHOT"
|
||||||
+
|
+
|
||||||
|
@ -2139,7 +2140,7 @@ Fully qualified name of class implementing coordinated state manager.
|
||||||
|
|
||||||
Whether asynchronous WAL replication to the secondary region replicas is enabled or not.
|
Whether asynchronous WAL replication to the secondary region replicas is enabled or not.
|
||||||
If this is enabled, a replication peer named "region_replica_replication" will be created
|
If this is enabled, a replication peer named "region_replica_replication" will be created
|
||||||
which will tail the logs and replicate the mutatations to region replicas for tables that
|
which will tail the logs and replicate the mutations to region replicas for tables that
|
||||||
have region replication > 1. If this is enabled once, disabling this replication also
|
have region replication > 1. If this is enabled once, disabling this replication also
|
||||||
requires disabling the replication peer using shell or ReplicationAdmin java class.
|
requires disabling the replication peer using shell or ReplicationAdmin java class.
|
||||||
Replication to secondary region replicas works over standard inter-cluster replication.
|
Replication to secondary region replicas works over standard inter-cluster replication.
|
||||||
|
|
|
@ -115,7 +115,7 @@ suit your environment, and restart or rolling restart the RegionServer.
|
||||||
<value>1000</value>
|
<value>1000</value>
|
||||||
<description>
|
<description>
|
||||||
Number of opened file handlers to cache.
|
Number of opened file handlers to cache.
|
||||||
A larger value will benefit reads by provinding more file handlers per mob
|
A larger value will benefit reads by providing more file handlers per mob
|
||||||
file cache and would reduce frequent file opening and closing.
|
file cache and would reduce frequent file opening and closing.
|
||||||
However, if this is set too high, this could lead to a "too many opened file handers"
|
However, if this is set too high, this could lead to a "too many opened file handlers" error.
|
||||||
The default value is 1000.
|
The default value is 1000.
|
||||||
|
@ -167,7 +167,7 @@ These commands are also available via `Admin.compactMob` and
|
||||||
==== MOB Sweeper
|
==== MOB Sweeper
|
||||||
|
|
||||||
HBase MOB a MapReduce job called the Sweeper tool for
|
HBase MOB ships with a MapReduce job called the Sweeper tool for
|
||||||
optimization. The Sweeper tool oalesces small MOB files or MOB files with many
|
optimization. The Sweeper tool coalesces small MOB files or MOB files with many
|
||||||
deletions or updates. The Sweeper tool is not required if you use native MOB compaction, which
|
deletions or updates. The Sweeper tool is not required if you use native MOB compaction, which
|
||||||
does not rely on MapReduce.
|
does not rely on MapReduce.
|
||||||
|
|
||||||
|
|
|
@ -42,7 +42,7 @@ $ ./bin/hbase hbck
|
||||||
----
|
----
|
||||||
|
|
||||||
At the end of the commands output it prints OK or tells you the number of INCONSISTENCIES present.
|
At the end of the command's output it prints OK or tells you the number of INCONSISTENCIES present.
|
||||||
You may also want to run run hbck a few times because some inconsistencies can be transient (e.g.
|
You may also want to run hbck a few times because some inconsistencies can be transient (e.g.
|
||||||
cluster is starting up or a region is splitting). Operationally you may want to run hbck regularly and setup alert (e.g.
|
cluster is starting up or a region is splitting). Operationally you may want to run hbck regularly and set up an alert (e.g.
|
||||||
via nagios) if it repeatedly reports inconsistencies . A run of hbck will report a list of inconsistencies along with a brief description of the regions and tables affected.
|
via Nagios) if it repeatedly reports inconsistencies. A run of hbck will report a list of inconsistencies along with a brief description of the regions and tables affected.
|
||||||
The using the `-details` option will report more details including a representative listing of all the splits present in all the tables.
|
Using the `-details` option will report more details, including a representative listing of all the splits present in all the tables.
|
||||||
|
@ -177,7 +177,7 @@ $ ./bin/hbase hbck -fixMetaOnly -fixAssignments
|
||||||
==== Special cases: HBase version file is missing
|
==== Special cases: HBase version file is missing
|
||||||
|
|
||||||
HBase's data on the file system requires a version file in order to start.
|
HBase's data on the file system requires a version file in order to start.
|
||||||
If this flie is missing, you can use the `-fixVersionFile` option to fabricating a new HBase version file.
|
If this file is missing, you can use the `-fixVersionFile` option to fabricate a new HBase version file.
|
||||||
This assumes that the version of hbck you are running is the appropriate version for the HBase cluster.
|
This assumes that the version of hbck you are running is the appropriate version for the HBase cluster.
|
||||||
|
|
||||||
==== Special case: Root and META are corrupt.
|
==== Special case: Root and META are corrupt.
|
||||||
|
|
|
@ -65,7 +65,7 @@ The dependencies only need to be available on the local `CLASSPATH`.
|
||||||
The following example runs the bundled HBase link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter] MapReduce job against a table named `usertable`.
|
The following example runs the bundled HBase link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter] MapReduce job against a table named `usertable`.
|
||||||
If you have not set the environment variables expected in the command (the parts prefixed by a `$` sign and surrounded by curly braces), you can use the actual system paths instead.
|
If you have not set the environment variables expected in the command (the parts prefixed by a `$` sign and surrounded by curly braces), you can use the actual system paths instead.
|
||||||
Be sure to use the correct version of the HBase JAR for your system.
|
Be sure to use the correct version of the HBase JAR for your system.
|
||||||
The backticks (``` symbols) cause ths shell to execute the sub-commands, setting the output of `hbase classpath` (the command to dump HBase CLASSPATH) to `HADOOP_CLASSPATH`.
|
The backticks (``` symbols) cause the shell to execute the sub-commands, setting the output of `hbase classpath` (the command to dump HBase CLASSPATH) to `HADOOP_CLASSPATH`.
|
||||||
This example assumes you use a BASH-compatible shell.
|
This example assumes you use a BASH-compatible shell.
|
||||||
|
|
||||||
[source,bash]
|
[source,bash]
|
||||||
|
@ -279,7 +279,7 @@ That is where the logic for map-task assignment resides.
|
||||||
|
|
||||||
The following is an example of using HBase as a MapReduce source in read-only manner.
|
The following is an example of using HBase as a MapReduce source in a read-only manner.
|
||||||
Specifically, there is a Mapper instance but no Reducer, and nothing is being emitted from the Mapper.
|
Specifically, there is a Mapper instance but no Reducer, and nothing is being emitted from the Mapper.
|
||||||
There job would be defined as follows...
|
The job would be defined as follows...
|
||||||
|
|
||||||
[source,java]
|
[source,java]
|
||||||
----
|
----
|
||||||
|
@ -592,7 +592,7 @@ public class MyMapper extends TableMapper<Text, LongWritable> {
|
||||||
== Speculative Execution
|
== Speculative Execution
|
||||||
|
|
||||||
It is generally advisable to turn off speculative execution for MapReduce jobs that use HBase as a source.
|
It is generally advisable to turn off speculative execution for MapReduce jobs that use HBase as a source.
|
||||||
This can either be done on a per-Job basis through properties, on on the entire cluster.
|
This can either be done on a per-Job basis through properties, or on the entire cluster.
|
||||||
Especially for longer running jobs, speculative execution will create duplicate map-tasks which will double-write your data to HBase; this is probably not what you want.
|
Especially for longer running jobs, speculative execution will create duplicate map-tasks which will double-write your data to HBase; this is probably not what you want.
|
||||||
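A minimal per-job sketch (these are the standard Hadoop 2 property names; the cluster-wide alternative is setting the same keys in _mapred-site.xml_):

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.mapreduce.Job;

public class NoSpeculationJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "hbase-source-job");
    // Per-job: turn speculative execution off for map and reduce tasks.
    job.getConfiguration().setBoolean("mapreduce.map.speculative", false);
    job.getConfiguration().setBoolean("mapreduce.reduce.speculative", false);
    // ... configure mapper/reducer, input and output formats as usual ...
  }
}
----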
|
|
||||||
See <<spec.ex,spec.ex>> for more information.
|
See <<spec.ex,spec.ex>> for more information.
|
||||||
|
@ -613,7 +613,7 @@ The following example shows a Cascading `Flow` which "sinks" data into an HBase
|
||||||
// emits two fields: "offset" and "line"
|
// emits two fields: "offset" and "line"
|
||||||
Tap source = new Hfs( new TextLine(), inputFileLhs );
|
Tap source = new Hfs( new TextLine(), inputFileLhs );
|
||||||
|
|
||||||
// store data in a HBase cluster
|
// store data in an HBase cluster
|
||||||
// accepts fields "num", "lower", and "upper"
|
// accepts fields "num", "lower", and "upper"
|
||||||
// will automatically scope incoming fields to their proper familyname, "left" or "right"
|
// will automatically scope incoming fields to their proper familyname, "left" or "right"
|
||||||
Fields keyFields = new Fields( "num" );
|
Fields keyFields = new Fields( "num" );
|
||||||
|
|
|
@ -199,7 +199,7 @@ $ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -t 600000
|
||||||
|
|
||||||
By default, the canary tool only check the read operations, it's hard to find the problem in the
|
By default, the canary tool only checks the read operations, so it's hard to find problems in the
|
||||||
write path. To enable the write sniffing, you can run canary with the `-writeSniffing` option.
|
write path. To enable the write sniffing, you can run canary with the `-writeSniffing` option.
|
||||||
When the write sniffing is enabled, the canary tool will create a hbase table and make sure the
|
When the write sniffing is enabled, the canary tool will create an hbase table and make sure the
|
||||||
regions of the table distributed on all region servers. In each sniffing period, the canary will
|
regions of the table are distributed on all region servers. In each sniffing period, the canary will
|
||||||
try to put data to these regions to check the write availability of each region server.
|
try to put data to these regions to check the write availability of each region server.
|
||||||
----
|
----
|
||||||
|
@ -351,7 +351,7 @@ You can invoke it via the HBase cli with the 'wal' command.
|
||||||
[NOTE]
|
[NOTE]
|
||||||
====
|
====
|
||||||
Prior to version 2.0, the WAL Pretty Printer was called the `HLogPrettyPrinter`, after an internal name for HBase's write ahead log.
|
Prior to version 2.0, the WAL Pretty Printer was called the `HLogPrettyPrinter`, after an internal name for HBase's write ahead log.
|
||||||
In those versions, you can pring the contents of a WAL using the same configuration as above, but with the 'hlog' command.
|
In those versions, you can print the contents of a WAL using the same configuration as above, but with the 'hlog' command.
|
||||||
|
|
||||||
----
|
----
|
||||||
$ ./bin/hbase hlog hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012
|
$ ./bin/hbase hlog hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012
|
||||||
|
@ -523,7 +523,7 @@ row9 c1 c2
|
||||||
row10 c1 c2
|
row10 c1 c2
|
||||||
----
|
----
|
||||||
|
|
||||||
For ImportTsv to use this imput file, the command line needs to look like this:
|
For ImportTsv to use this input file, the command line needs to look like this:
|
||||||
|
|
||||||
----
|
----
|
||||||
|
|
||||||
|
@ -781,7 +781,7 @@ To decommission a loaded RegionServer, run the following: +$
|
||||||
====
|
====
|
||||||
The `HOSTNAME` passed to _graceful_stop.sh_ must match the hostname that hbase is using to identify RegionServers.
|
The `HOSTNAME` passed to _graceful_stop.sh_ must match the hostname that hbase is using to identify RegionServers.
|
||||||
Check the list of RegionServers in the master UI for how HBase is referring to servers.
|
Check the list of RegionServers in the master UI for how HBase is referring to servers.
|
||||||
Its usually hostname but can also be FQDN.
|
It's usually the hostname but can also be the FQDN.
|
||||||
Whatever HBase is using, this is what you should pass the _graceful_stop.sh_ decommission script.
|
Whatever HBase is using, this is what you should pass to the _graceful_stop.sh_ decommission script.
|
||||||
If you pass IPs, the script is not yet smart enough to make a hostname (or FQDN) of it and so it will fail when it checks if server is currently running; the graceful unloading of regions will not run.
|
If you pass IPs, the script is not yet smart enough to make a hostname (or FQDN) of it and so it will fail when it checks if server is currently running; the graceful unloading of regions will not run.
|
||||||
====
|
====
|
||||||
|
@ -821,12 +821,12 @@ Hence, it is better to manage the balancer apart from `graceful_stop` reenabling
|
||||||
[[draining.servers]]
|
[[draining.servers]]
|
||||||
==== Decommissioning several Regions Servers concurrently
|
==== Decommissioning several RegionServers concurrently
|
||||||
|
|
||||||
If you have a large cluster, you may want to decommission more than one machine at a time by gracefully stopping mutiple RegionServers concurrently.
|
If you have a large cluster, you may want to decommission more than one machine at a time by gracefully stopping multiple RegionServers concurrently.
|
||||||
To gracefully drain multiple regionservers at the same time, RegionServers can be put into a "draining" state.
|
To gracefully drain multiple regionservers at the same time, RegionServers can be put into a "draining" state.
|
||||||
This is done by marking a RegionServer as a draining node by creating an entry in ZooKeeper under the _hbase_root/draining_ znode.
|
This is done by marking a RegionServer as a draining node by creating an entry in ZooKeeper under the _hbase_root/draining_ znode.
|
||||||
This znode has format `name,port,startcode` just like the regionserver entries under _hbase_root/rs_ znode.
|
This znode has format `name,port,startcode` just like the regionserver entries under _hbase_root/rs_ znode.
|
||||||
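A minimal sketch of marking a RegionServer as draining from the HBase ZooKeeper CLI. The hostname, port, and startcode below are hypothetical; use the exact string that server shows under the _rs_ znode, and note that your `zookeeper.znode.parent` may not be `/hbase`.

----
$ ./bin/hbase zkcli
[zk: localhost:2181(CONNECTED) 0] create /hbase/draining/rs1.example.com,16020,1452825237000 ""
----

Deleting that znode again should take the server back out of the draining state.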
|
|
||||||
Without this facility, decommissioning mulitple nodes may be non-optimal because regions that are being drained from one region server may be moved to other regionservers that are also draining.
|
Without this facility, decommissioning multiple nodes may be non-optimal because regions that are being drained from one region server may be moved to other regionservers that are also draining.
|
||||||
Marking RegionServers to be in the draining state prevents this from happening.
|
Marking RegionServers to be in the draining state prevents this from happening.
|
||||||
See this link:http://inchoate-clatter.blogspot.com/2012/03/hbase-ops-automation.html[blog
|
See this link:http://inchoate-clatter.blogspot.com/2012/03/hbase-ops-automation.html[blog
|
||||||
post] for more details.
|
post] for more details.
|
||||||
|
@ -991,7 +991,7 @@ To configure metrics for a given region server, edit the _conf/hadoop-metrics2-h
|
||||||
Restart the region server for the changes to take effect.
|
Restart the region server for the changes to take effect.
|
||||||
|
|
||||||
To change the sampling rate for the default sink, edit the line beginning with `*.period`.
|
To change the sampling rate for the default sink, edit the line beginning with `*.period`.
|
||||||
To filter which metrics are emitted or to extend the metrics framework, see link:http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html
|
To filter which metrics are emitted or to extend the metrics framework, see http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html
|
||||||
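For instance, the relevant line in _hadoop-metrics2-hbase.properties_ looks roughly like this (a sketch; 10 seconds is only an illustrative value):

----
# sample all metrics sources every 10 seconds for the default sink
*.period=10
----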
|
|
||||||
.HBase Metrics and Ganglia
|
.HBase Metrics and Ganglia
|
||||||
[NOTE]
|
[NOTE]
|
||||||
|
@ -1014,15 +1014,15 @@ Rather than listing each metric which HBase emits by default, you can browse thr
|
||||||
Different metrics are exposed for the Master process and each region server process.
|
Different metrics are exposed for the Master process and each region server process.
|
||||||
|
|
||||||
.Procedure: Access a JSON Output of Available Metrics
|
.Procedure: Access a JSON Output of Available Metrics
|
||||||
. After starting HBase, access the region server's web UI, at `http://REGIONSERVER_HOSTNAME:60030` by default (or port 16030 in HBase 1.0+).
|
. After starting HBase, access the region server's web UI, at pass:[http://REGIONSERVER_HOSTNAME:60030] by default (or port 16030 in HBase 1.0+).
|
||||||
. Click the [label]#Metrics Dump# link near the top.
|
. Click the [label]#Metrics Dump# link near the top.
|
||||||
The metrics for the region server are presented as a dump of the JMX bean in JSON format.
|
The metrics for the region server are presented as a dump of the JMX bean in JSON format.
|
||||||
This will dump out all metrics names and their values.
|
This will dump out all metrics names and their values.
|
||||||
To include metrics descriptions in the listing -- this can be useful when you are exploring what is available -- add a query string of `?description=true` so your URL becomes `http://REGIONSERVER_HOSTNAME:60030/jmx?description=true`.
|
To include metrics descriptions in the listing -- this can be useful when you are exploring what is available -- add a query string of `?description=true` so your URL becomes pass:[http://REGIONSERVER_HOSTNAME:60030/jmx?description=true].
|
||||||
Not all beans and attributes have descriptions.
|
Not all beans and attributes have descriptions.
|
||||||
. To view metrics for the Master, connect to the Master's web UI instead (defaults to `http://localhost:60010` or port 16010 in HBase 1.0+) and click its [label]#Metrics
|
. To view metrics for the Master, connect to the Master's web UI instead (defaults to pass:[http://localhost:60010] or port 16010 in HBase 1.0+) and click its [label]#Metrics
|
||||||
Dump# link.
|
Dump# link.
|
||||||
To include metrics descriptions in the listing -- this can be useful when you are exploring what is available -- add a query string of `?description=true` so your URL becomes `http://REGIONSERVER_HOSTNAME:60010/jmx?description=true`.
|
To include metrics descriptions in the listing -- this can be useful when you are exploring what is available -- add a query string of `?description=true` so your URL becomes pass:[http://localhost:60010/jmx?description=true].
|
||||||
Not all beans and attributes have descriptions.
|
Not all beans and attributes have descriptions.
|
||||||
|
|
||||||
|
|
||||||
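The same JSON can also be fetched from the command line, which is convenient for scripting (a sketch, assuming the HBase 1.0+ default ports mentioned above):

----
$ curl 'http://REGIONSERVER_HOSTNAME:16030/jmx?description=true'
$ curl 'http://MASTER_HOSTNAME:16010/jmx?description=true'
----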
|
@ -1341,9 +1341,9 @@ disable_peer <ID>::
|
||||||
remove_peer <ID>::
|
remove_peer <ID>::
|
||||||
Disable and remove a replication relationship. HBase will no longer send edits to that peer cluster or keep track of WALs.
|
Disable and remove a replication relationship. HBase will no longer send edits to that peer cluster or keep track of WALs.
|
||||||
enable_table_replication <TABLE_NAME>::
|
enable_table_replication <TABLE_NAME>::
|
||||||
Enable the table replication switch for all it's column families. If the table is not found in the destination cluster then it will create one with the same name and column families.
|
Enable the table replication switch for all its column families. If the table is not found in the destination cluster then it will create one with the same name and column families.
|
||||||
disable_table_replication <TABLE_NAME>::
|
disable_table_replication <TABLE_NAME>::
|
||||||
Disable the table replication switch for all it's column families.
|
Disable the table replication switch for all its column families.
|
||||||
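A minimal HBase shell session using the table-level switches above (the table name is hypothetical):

----
hbase> enable_table_replication 'mytable'
hbase> disable_table_replication 'mytable'
----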
|
|
||||||
=== Verifying Replicated Data
|
=== Verifying Replicated Data
|
||||||
|
|
||||||
|
@ -1462,7 +1462,7 @@ Speed is also limited by total size of the list of edits to replicate per slave,
|
||||||
With this configuration, a master cluster region server with three slaves would use at most 192 MB to store data to replicate.
|
With this configuration, a master cluster region server with three slaves would use at most 192 MB to store data to replicate.
|
||||||
This does not account for the data which was filtered but not garbage collected.
|
This does not account for the data which was filtered but not garbage collected.
|
||||||
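A sketch of tuning that per-slave buffer in _hbase-site.xml_. The property name and default below are assumptions to verify against your version; a 64 MB buffer per slave is what gives the 192 MB figure above for three slaves.

[source,xml]
----
<property>
  <!-- maximum bytes buffered per slave before the batch of edits is shipped -->
  <name>replication.source.size.capacity</name>
  <value>67108864</value>
</property>
----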
|
|
||||||
Once the maximum size of edits has been buffered or the reader reaces the end of the WAL, the source thread stops reading and chooses at random a sink to replicate to (from the list that was generated by keeping only a subset of slave region servers). It directly issues a RPC to the chosen region server and waits for the method to return.
|
Once the maximum size of edits has been buffered or the reader reaches the end of the WAL, the source thread stops reading and chooses at random a sink to replicate to (from the list that was generated by keeping only a subset of slave region servers). It directly issues a RPC to the chosen region server and waits for the method to return.
|
||||||
If the RPC was successful, the source determines whether the current file has been emptied or it contains more data which needs to be read.
|
If the RPC was successful, the source determines whether the current file has been emptied or it contains more data which needs to be read.
|
||||||
If the file has been emptied, the source deletes the znode in the queue.
|
If the file has been emptied, the source deletes the znode in the queue.
|
||||||
Otherwise, it registers the new offset in the log's znode.
|
Otherwise, it registers the new offset in the log's znode.
|
||||||
|
@ -1778,7 +1778,7 @@ but still suboptimal compared to a mechanism which allows large requests to be s
|
||||||
into multiple smaller ones.
|
into multiple smaller ones.
|
||||||
|
|
||||||
HBASE-10993 introduces such a system for deprioritizing long-running scanners. There
|
HBASE-10993 introduces such a system for deprioritizing long-running scanners. There
|
||||||
are two types of queues,`fifo` and `deadline`.To configure the type of queue used,
|
are two types of queues, `fifo` and `deadline`. To configure the type of queue used,
|
||||||
configure the `hbase.ipc.server.callqueue.type` property in `hbase-site.xml`. There
|
configure the `hbase.ipc.server.callqueue.type` property in `hbase-site.xml`. There
|
||||||
is no way to estimate how long each request may take, so de-prioritization only affects
|
is no way to estimate how long each request may take, so de-prioritization only affects
|
||||||
scans, and is based on the number of “next” calls a scan request has made. An assumption
|
scans, and is based on the number of “next” calls a scan request has made. An assumption
|
||||||
|
@ -2049,7 +2049,7 @@ Aside from the disk space necessary to store the data, one RS may not be able to
|
||||||
[[ops.capacity.nodes.throughput]]
|
[[ops.capacity.nodes.throughput]]
|
||||||
==== Read/Write throughput
|
==== Read/Write throughput
|
||||||
|
|
||||||
Number of nodes can also be driven by required thoughput for reads and/or writes.
|
Number of nodes can also be driven by required throughput for reads and/or writes.
|
||||||
The throughput one can get per node depends a lot on data (esp.
|
The throughput one can get per node depends a lot on data (esp.
|
||||||
key/value sizes) and request patterns, as well as node and system configuration.
|
key/value sizes) and request patterns, as well as node and system configuration.
|
||||||
Planning should be done for peak load if it is likely that the load would be the main driver of the increase of the node count.
|
Planning should be done for peak load if it is likely that the load would be the main driver of the increase of the node count.
|
||||||
|
|
|
@ -88,7 +88,7 @@ Multiple rack configurations carry the same potential issues as multiple switche
|
||||||
* Poor switch capacity performance
|
* Poor switch capacity performance
|
||||||
* Insufficient uplink to another rack
|
* Insufficient uplink to another rack
|
||||||
|
|
||||||
If the the switches in your rack have appropriate switching capacity to handle all the hosts at full speed, the next most likely issue will be caused by homing more of your cluster across racks.
|
If the switches in your rack have appropriate switching capacity to handle all the hosts at full speed, the next most likely issue will be caused by homing more of your cluster across racks.
|
||||||
The easiest way to avoid issues when spanning multiple racks is to use port trunking to create a bonded uplink to other racks.
|
The easiest way to avoid issues when spanning multiple racks is to use port trunking to create a bonded uplink to other racks.
|
||||||
The downside of this method, however, is the overhead of ports that could potentially be used.
|
The downside of this method, however, is the overhead of ports that could potentially be used.
|
||||||
For example, creating an 8Gbps port channel from rack A to rack B, using 8 of your 24 ports to communicate between racks, gives you a poor ROI; using too few, however, can mean you're not getting the most out of your cluster.
|
For example, creating an 8Gbps port channel from rack A to rack B, using 8 of your 24 ports to communicate between racks, gives you a poor ROI; using too few, however, can mean you're not getting the most out of your cluster.
|
||||||
|
@ -102,7 +102,7 @@ Are all the network interfaces functioning correctly? Are you sure? See the Trou
|
||||||
|
|
||||||
[[perf.network.call_me_maybe]]
|
[[perf.network.call_me_maybe]]
|
||||||
=== Network Consistency and Partition Tolerance
|
=== Network Consistency and Partition Tolerance
|
||||||
The link:http://en.wikipedia.org/wiki/CAP_theorem[CAP Theorem] states that a distributed system can maintain two out of the following three charateristics:
|
The link:http://en.wikipedia.org/wiki/CAP_theorem[CAP Theorem] states that a distributed system can maintain two out of the following three characteristics:
|
||||||
- *C*onsistency -- all nodes see the same data.
|
- *C*onsistency -- all nodes see the same data.
|
||||||
- *A*vailability -- every request receives a response about whether it succeeded or failed.
|
- *A*vailability -- every request receives a response about whether it succeeded or failed.
|
||||||
- *P*artition tolerance -- the system continues to operate even if some of its components become unavailable to the others.
|
- *P*artition tolerance -- the system continues to operate even if some of its components become unavailable to the others.
|
||||||
|
@ -556,7 +556,7 @@ When writing a lot of data to an HBase table from a MR job (e.g., with link:http
|
||||||
When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other Reducers that will most likely be off-node.
|
When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other Reducers that will most likely be off-node.
|
||||||
It's far more efficient to just write directly to HBase.
|
It's far more efficient to just write directly to HBase.
|
||||||
|
|
||||||
For summary jobs where HBase is used as a source and a sink, then writes will be coming from the Reducer step (e.g., summarize values then write out result). This is a different processing problem than from the the above case.
|
For summary jobs where HBase is used as a source and a sink, then writes will be coming from the Reducer step (e.g., summarize values then write out result). This is a different processing problem than from the above case.
|
||||||
|
|
||||||
[[perf.one.region]]
|
[[perf.one.region]]
|
||||||
=== Anti-Pattern: One Hot Region
|
=== Anti-Pattern: One Hot Region
|
||||||
|
@ -565,7 +565,7 @@ If all your data is being written to one region at a time, then re-read the sect
|
||||||
|
|
||||||
Also, if you are pre-splitting regions and all your data is _still_ winding up in a single region even though your keys aren't monotonically increasing, confirm that your keyspace actually works with the split strategy.
|
Also, if you are pre-splitting regions and all your data is _still_ winding up in a single region even though your keys aren't monotonically increasing, confirm that your keyspace actually works with the split strategy.
|
||||||
There are a variety of reasons that regions may appear "well split" but won't work with your data.
|
There are a variety of reasons that regions may appear "well split" but won't work with your data.
|
||||||
As the HBase client communicates directly with the RegionServers, this can be obtained via link:hhttp://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#getRegionLocation(byte[])[Table.getRegionLocation].
|
As the HBase client communicates directly with the RegionServers, this can be obtained via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#getRegionLocation(byte%5B%5D)[Table.getRegionLocation].
|
||||||
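For example (a sketch using the method linked above; the `table` reference and row key are hypothetical), you can print which region a given key actually lands in:

[source,java]
----
HRegionLocation location = table.getRegionLocation(Bytes.toBytes("someRowKey"));
System.out.println("someRowKey -> " + location.getRegionInfo().getRegionNameAsString());
----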
|
|
||||||
See <<precreate.regions>>, as well as <<perf.configurations>>
|
See <<precreate.regions>>, as well as <<perf.configurations>>
|
||||||
|
|
||||||
|
@ -607,7 +607,7 @@ When columns are selected explicitly with `scan.addColumn`, HBase will schedule
|
||||||
When rows have few columns and each column has only a few versions this can be inefficient.
|
When rows have few columns and each column has only a few versions this can be inefficient.
|
||||||
A seek operation is generally slower if it does not seek at least past 5-10 columns/versions or 512-1024 bytes.
|
A seek operation is generally slower if it does not seek at least past 5-10 columns/versions or 512-1024 bytes.
|
||||||
|
|
||||||
In order to opportunistically look ahead a few columns/versions to see if the next column/version can be found that way before a seek operation is scheduled, a new attribute `Scan.HINT_LOOKAHEAD` can be set the on Scan object.
|
In order to opportunistically look ahead a few columns/versions to see if the next column/version can be found that way before a seek operation is scheduled, a new attribute `Scan.HINT_LOOKAHEAD` can be set on the Scan object.
|
||||||
The following code instructs the RegionServer to attempt two iterations of next before a seek is scheduled:
|
The following code instructs the RegionServer to attempt two iterations of next before a seek is scheduled:
|
||||||
|
|
||||||
[source,java]
|
[source,java]
|
||||||
|
@ -731,7 +731,7 @@ However, if hedged reads are enabled, the client waits some configurable amount
|
||||||
Whichever read returns first is used, and the other read request is discarded.
|
Whichever read returns first is used, and the other read request is discarded.
|
||||||
Hedged reads can be helpful for times where a rare slow read is caused by a transient error such as a failing disk or flaky network connection.
|
Hedged reads can be helpful for times where a rare slow read is caused by a transient error such as a failing disk or flaky network connection.
|
||||||
|
|
||||||
Because a HBase RegionServer is a HDFS client, you can enable hedged reads in HBase, by adding the following properties to the RegionServer's hbase-site.xml and tuning the values to suit your environment.
|
Because an HBase RegionServer is an HDFS client, you can enable hedged reads in HBase by adding the following properties to the RegionServer's hbase-site.xml and tuning the values to suit your environment.
|
||||||
|
|
||||||
.Configuration for Hedged Reads
|
.Configuration for Hedged Reads
|
||||||
* `dfs.client.hedged.read.threadpool.size` - the number of threads dedicated to servicing hedged reads.
|
* `dfs.client.hedged.read.threadpool.size` - the number of threads dedicated to servicing hedged reads.
|
||||||
|
@ -870,7 +870,7 @@ If you are running on EC2 and post performance questions on the dist-list, pleas
|
||||||
== Collocating HBase and MapReduce
|
== Collocating HBase and MapReduce
|
||||||
|
|
||||||
It is often recommended to have different clusters for HBase and MapReduce.
|
It is often recommended to have different clusters for HBase and MapReduce.
|
||||||
A better qualification of this is: don't collocate a HBase that serves live requests with a heavy MR workload.
|
A better qualification of this is: don't collocate an HBase that serves live requests with a heavy MR workload.
|
||||||
OLTP and OLAP-optimized systems have conflicting requirements and one will lose to the other, usually the former.
|
OLTP and OLAP-optimized systems have conflicting requirements and one will lose to the other, usually the former.
|
||||||
For example, short latency-sensitive disk reads will have to wait in line behind longer reads that are trying to squeeze out as much throughput as possible.
|
For example, short latency-sensitive disk reads will have to wait in line behind longer reads that are trying to squeeze out as much throughput as possible.
|
||||||
MR jobs that write to HBase will also generate flushes and compactions, which will in turn invalidate blocks in the <<block.cache>>.
|
MR jobs that write to HBase will also generate flushes and compactions, which will in turn invalidate blocks in the <<block.cache>>.
|
||||||
|
|
|
@ -106,7 +106,7 @@ After client sends preamble and connection header, server does NOT respond if su
|
||||||
No response means server is READY to accept requests and to give out response.
|
No response means server is READY to accept requests and to give out response.
|
||||||
If the version or authentication in the preamble is not agreeable or the server has trouble parsing the preamble, it will throw a org.apache.hadoop.hbase.ipc.FatalConnectionException explaining the error and will then disconnect.
|
If the version or authentication in the preamble is not agreeable or the server has trouble parsing the preamble, it will throw a org.apache.hadoop.hbase.ipc.FatalConnectionException explaining the error and will then disconnect.
|
||||||
If the client in the connection header -- i.e.
|
If the client in the connection header -- i.e.
|
||||||
the protobuf'd Message that comes after the connection preamble -- asks for for a Service the server does not support or a codec the server does not have, again we throw a FatalConnectionException with explanation.
|
the protobuf'd Message that comes after the connection preamble -- asks for a Service the server does not support or a codec the server does not have, again we throw a FatalConnectionException with explanation.
|
||||||
|
|
||||||
==== Request
|
==== Request
|
||||||
|
|
||||||
|
@ -118,7 +118,7 @@ The header includes the method name and optionally, metadata on the optional Cel
|
||||||
The parameter type suits the method being invoked: i.e.
|
The parameter type suits the method being invoked: i.e.
|
||||||
if we are doing a getRegionInfo request, the protobuf Message param will be an instance of GetRegionInfoRequest.
|
if we are doing a getRegionInfo request, the protobuf Message param will be an instance of GetRegionInfoRequest.
|
||||||
The response will be a GetRegionInfoResponse.
|
The response will be a GetRegionInfoResponse.
|
||||||
The CellBlock is optionally used ferrying the bulk of the RPC data: i.e Cells/KeyValues.
|
The CellBlock is optionally used to ferry the bulk of the RPC data: i.e. Cells/KeyValues.
|
||||||
|
|
||||||
===== Request Parts
|
===== Request Parts
|
||||||
|
|
||||||
|
@ -182,7 +182,7 @@ Codecs will live on the server for all time so old clients can connect.
|
||||||
|
|
||||||
.Constraints
|
.Constraints
|
||||||
In some part, current wire-format -- i.e.
|
In some part, current wire-format -- i.e.
|
||||||
all requests and responses preceeded by a length -- has been dictated by current server non-async architecture.
|
all requests and responses preceded by a length -- has been dictated by current server non-async architecture.
|
||||||
|
|
||||||
.One fat pb request or header+param
|
.One fat pb request or header+param
|
||||||
We went with pb header followed by pb param making a request and a pb header followed by pb response for now.
|
We went with pb header followed by pb param making a request and a pb header followed by pb response for now.
|
||||||
|
@ -214,9 +214,9 @@ If a server sees no codec, it will return all responses in pure protobuf.
|
||||||
Running pure protobuf all the time will be slower than running with cellblocks.
|
Running pure protobuf all the time will be slower than running with cellblocks.
|
||||||
|
|
||||||
.Compression
|
.Compression
|
||||||
Uses hadoops compression codecs.
|
Uses Hadoop's compression codecs.
|
||||||
To enable compressing of passed CellBlocks, set `hbase.client.rpc.compressor` to the name of the Compressor to use.
|
To enable compressing of passed CellBlocks, set `hbase.client.rpc.compressor` to the name of the Compressor to use.
|
||||||
Compressor must implement Hadoops' CompressionCodec Interface.
|
Compressor must implement Hadoop's CompressionCodec Interface.
|
||||||
After connection setup, all passed cellblocks will be sent compressed.
|
After connection setup, all passed cellblocks will be sent compressed.
|
||||||
The server will return cellblocks compressed using this same compressor as long as the compressor is on its CLASSPATH (else you will get `UnsupportedCompressionCodecException`).
|
The server will return cellblocks compressed using this same compressor as long as the compressor is on its CLASSPATH (else you will get `UnsupportedCompressionCodecException`).
|
||||||
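A sketch of enabling cellblock compression in the client's _hbase-site.xml_. GzipCodec is only one example; any Hadoop `CompressionCodec` present on both the client and server CLASSPATH should work.

[source,xml]
----
<property>
  <name>hbase.client.rpc.compressor</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
----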
|
|
||||||
|
|
|
@ -187,7 +187,7 @@ See this comic by IKai Lan on why monotonically increasing row keys are problema
|
||||||
The pile-up on a single region brought on by monotonically increasing keys can be mitigated by randomizing the input records to not be in sorted order, but in general it's best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row-key.
|
The pile-up on a single region brought on by monotonically increasing keys can be mitigated by randomizing the input records to not be in sorted order, but in general it's best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row-key.
|
||||||
|
|
||||||
If you do need to upload time series data into HBase, you should study link:http://opentsdb.net/[OpenTSDB] as a successful example.
|
If you do need to upload time series data into HBase, you should study link:http://opentsdb.net/[OpenTSDB] as a successful example.
|
||||||
It has a page describing the link: http://opentsdb.net/schema.html[schema] it uses in HBase.
|
It has a page describing the link:http://opentsdb.net/schema.html[schema] it uses in HBase.
|
||||||
The key format in OpenTSDB is effectively [metric_type][event_timestamp], which would appear at first glance to contradict the previous advice about not using a timestamp as the key.
|
The key format in OpenTSDB is effectively [metric_type][event_timestamp], which would appear at first glance to contradict the previous advice about not using a timestamp as the key.
|
||||||
However, the difference is that the timestamp is not in the _lead_ position of the key, and the design assumption is that there are dozens or hundreds (or more) of different metric types.
|
However, the difference is that the timestamp is not in the _lead_ position of the key, and the design assumption is that there are dozens or hundreds (or more) of different metric types.
|
||||||
Thus, even with a continual stream of input data with a mix of metric types, the Puts are distributed across various points of regions in the table.
|
Thus, even with a continual stream of input data with a mix of metric types, the Puts are distributed across various points of regions in the table.
|
||||||
|
@ -339,8 +339,8 @@ As an example of why this is important, consider the example of using displayabl
|
||||||
|
|
||||||
The problem is that all the data is going to pile up in the first 2 regions and the last region thus creating a "lumpy" (and possibly "hot") region problem.
|
The problem is that all the data is going to pile up in the first 2 regions and the last region thus creating a "lumpy" (and possibly "hot") region problem.
|
||||||
To understand why, refer to an link:http://www.asciitable.com[ASCII Table].
|
To understand why, refer to an link:http://www.asciitable.com[ASCII Table].
|
||||||
'0' is byte 48, and 'f' is byte 102, but there is a huge gap in byte values (bytes 58 to 96) that will _never appear in this keyspace_ because the only values are [0-9] and [a-f]. Thus, the middle regions regions will never be used.
|
'0' is byte 48, and 'f' is byte 102, but there is a huge gap in byte values (bytes 58 to 96) that will _never appear in this keyspace_ because the only values are [0-9] and [a-f]. Thus, the middle regions will never be used.
|
||||||
To make pre-spliting work with this example keyspace, a custom definition of splits (i.e., and not relying on the built-in split method) is required.
|
To make pre-splitting work with this example keyspace, a custom definition of splits (i.e., and not relying on the built-in split method) is required.
|
||||||
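A sketch of such a custom split definition using the Java API. The table and column family names are hypothetical, `admin` is assumed to be an existing `Admin` instance, and the classes used are `TableName`, `HTableDescriptor`, `HColumnDescriptor`, and `Bytes`. One split point is placed per hex character, so every region covers keys that can actually occur:

[source,java]
----
String splitChars = "123456789abcdef";        // 15 split points => 16 regions, one per hex prefix
byte[][] splits = new byte[splitChars.length()][];
for (int i = 0; i < splitChars.length(); i++) {
  splits[i] = Bytes.toBytes(splitChars.substring(i, i + 1));
}
HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("hexKeyedTable"));
desc.addFamily(new HColumnDescriptor("cf"));
admin.createTable(desc, splits);
----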
|
|
||||||
Lesson #1: Pre-splitting tables is generally a best practice, but you need to pre-split them in such a way that all the regions are accessible in the keyspace.
|
Lesson #1: Pre-splitting tables is generally a best practice, but you need to pre-split them in such a way that all the regions are accessible in the keyspace.
|
||||||
While this example demonstrated the problem with a hex-key keyspace, the same problem can happen with _any_ keyspace.
|
While this example demonstrated the problem with a hex-key keyspace, the same problem can happen with _any_ keyspace.
|
||||||
|
@ -406,7 +406,7 @@ The minimum number of row versions parameter is used together with the time-to-l
|
||||||
HBase supports a "bytes-in/bytes-out" interface via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] and link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html[Result], so anything that can be converted to an array of bytes can be stored as a value.
|
HBase supports a "bytes-in/bytes-out" interface via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] and link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html[Result], so anything that can be converted to an array of bytes can be stored as a value.
|
||||||
Input could be strings, numbers, complex objects, or even images as long as they can be rendered as bytes.
|
Input could be strings, numbers, complex objects, or even images as long as they can be rendered as bytes.
|
||||||
|
|
||||||
There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase would probably be too much to ask); search the mailling list for conversations on this topic.
|
There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase would probably be too much to ask); search the mailing list for conversations on this topic.
|
||||||
All rows in HBase conform to the <<datamodel>>, and that includes versioning.
|
All rows in HBase conform to the <<datamodel>>, and that includes versioning.
|
||||||
Take that into consideration when making your design, as well as block size for the ColumnFamily.
|
Take that into consideration when making your design, as well as block size for the ColumnFamily.
|
||||||
|
|
||||||
|
@ -514,7 +514,7 @@ ROW COLUMN+CELL
|
||||||
|
|
||||||
Notice how delete cells are let go.
|
Notice how delete cells are let go.
|
||||||
|
|
||||||
Now lets run the same test only with `KEEP_DELETED_CELLS` set on the table (you can do table or per-column-family):
|
Now let's run the same test only with `KEEP_DELETED_CELLS` set on the table (you can do table or per-column-family):
|
||||||
|
|
||||||
[source]
|
[source]
|
||||||
----
|
----
|
||||||
|
@ -605,7 +605,7 @@ However, don't try a full-scan on a large table like this from an application (i
|
||||||
[[secondary.indexes.periodic]]
|
[[secondary.indexes.periodic]]
|
||||||
=== Periodic-Update Secondary Index
|
=== Periodic-Update Secondary Index
|
||||||
|
|
||||||
A secondary index could be created in an other table which is periodically updated via a MapReduce job.
|
A secondary index could be created in another table which is periodically updated via a MapReduce job.
|
||||||
The job could be executed intra-day, but depending on load-strategy it could still potentially be out of sync with the main data table.
|
The job could be executed intra-day, but depending on load-strategy it could still potentially be out of sync with the main data table.
|
||||||
|
|
||||||
See <<mapreduce.example.readwrite,mapreduce.example.readwrite>> for more information.
|
See <<mapreduce.example.readwrite,mapreduce.example.readwrite>> for more information.
|
||||||
|
@ -753,7 +753,7 @@ In either the Hash or Numeric substitution approach, the raw values for hostname
|
||||||
|
|
||||||
This effectively is the OpenTSDB approach.
|
This effectively is the OpenTSDB approach.
|
||||||
What OpenTSDB does is re-write data and pack rows into columns for certain time-periods.
|
What OpenTSDB does is re-write data and pack rows into columns for certain time-periods.
|
||||||
For a detailed explanation, see: link:http://opentsdb.net/schema.html, and
|
For a detailed explanation, see: http://opentsdb.net/schema.html, and
|
||||||
+++<a href="http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-lessons-learned-from-opentsdb.html">Lessons Learned from OpenTSDB</a>+++
|
+++<a href="http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-lessons-learned-from-opentsdb.html">Lessons Learned from OpenTSDB</a>+++
|
||||||
from HBaseCon2012.
|
from HBaseCon2012.
|
||||||
|
|
||||||
|
@ -800,7 +800,7 @@ Assuming that the combination of customer number and sales order uniquely identi
|
||||||
[customer number][order number]
|
[customer number][order number]
|
||||||
----
|
----
|
||||||
|
|
||||||
for a ORDER table.
|
for an ORDER table.
|
||||||
However, there are more design decisions to make: are the _raw_ values the best choices for rowkeys?
|
However, there are more design decisions to make: are the _raw_ values the best choices for rowkeys?
|
||||||
|
|
||||||
The same design questions in the Log Data use-case confront us here.
|
The same design questions in the Log Data use-case confront us here.
|
||||||
|
@ -931,9 +931,9 @@ For example, the ORDER table's rowkey was described above: <<schema.casestudies.
|
||||||
|
|
||||||
There are many options here: JSON, XML, Java Serialization, Avro, Hadoop Writables, etc.
|
There are many options here: JSON, XML, Java Serialization, Avro, Hadoop Writables, etc.
|
||||||
All of them are variants of the same approach: encode the object graph to a byte-array.
|
All of them are variants of the same approach: encode the object graph to a byte-array.
|
||||||
Care should be taken with this approach to ensure backward compatibilty in case the object model changes such that older persisted structures can still be read back out of HBase.
|
Care should be taken with this approach to ensure backward compatibility in case the object model changes such that older persisted structures can still be read back out of HBase.
|
||||||
|
|
||||||
Pros are being able to manage complex object graphs with minimal I/O (e.g., a single HBase Get per Order in this example), but the cons include the aforementioned warning about backward compatiblity of serialization, language dependencies of serialization (e.g., Java Serialization only works with Java clients), the fact that you have to deserialize the entire object to get any piece of information inside the BLOB, and the difficulty in getting frameworks like Hive to work with custom objects like this.
|
Pros are being able to manage complex object graphs with minimal I/O (e.g., a single HBase Get per Order in this example), but the cons include the aforementioned warning about backward compatibility of serialization, language dependencies of serialization (e.g., Java Serialization only works with Java clients), the fact that you have to deserialize the entire object to get any piece of information inside the BLOB, and the difficulty in getting frameworks like Hive to work with custom objects like this.
|
||||||
|
|
||||||
[[schema.smackdown]]
|
[[schema.smackdown]]
|
||||||
=== Case Study - "Tall/Wide/Middle" Schema Design Smackdown
|
=== Case Study - "Tall/Wide/Middle" Schema Design Smackdown
|
||||||
|
@ -945,7 +945,7 @@ These are general guidelines and not laws - each application must consider its o
|
||||||
==== Rows vs. Versions
|
==== Rows vs. Versions
|
||||||
|
|
||||||
A common question is whether one should prefer rows or HBase's built-in-versioning.
|
A common question is whether one should prefer rows or HBase's built-in-versioning.
|
||||||
The context is typically where there are "a lot" of versions of a row to be retained (e.g., where it is significantly above the HBase default of 1 max versions). The rows-approach would require storing a timestamp in some portion of the rowkey so that they would not overwite with each successive update.
|
The context is typically where there are "a lot" of versions of a row to be retained (e.g., where it is significantly above the HBase default of 1 max versions). The rows-approach would require storing a timestamp in some portion of the rowkey so that they would not overwrite with each successive update.
|
||||||
|
|
||||||
Preference: Rows (generally speaking).
|
Preference: Rows (generally speaking).
|
||||||
|
|
||||||
|
@ -1044,14 +1044,14 @@ The tl;dr version is that you should probably go with one row per user+value, an
|
||||||
|
|
||||||
Your two options mirror a common question people have when designing HBase schemas: should I go "tall" or "wide"? Your first schema is "tall": each row represents one value for one user, and so there are many rows in the table for each user; the row key is user + valueid, and there would be (presumably) a single column qualifier that means "the value". This is great if you want to scan over rows in sorted order by row key (thus my question above, about whether these ids are sorted correctly). You can start a scan at any user+valueid, read the next 30, and be done.
|
Your two options mirror a common question people have when designing HBase schemas: should I go "tall" or "wide"? Your first schema is "tall": each row represents one value for one user, and so there are many rows in the table for each user; the row key is user + valueid, and there would be (presumably) a single column qualifier that means "the value". This is great if you want to scan over rows in sorted order by row key (thus my question above, about whether these ids are sorted correctly). You can start a scan at any user+valueid, read the next 30, and be done.
|
||||||
What you're giving up is the ability to have transactional guarantees around all the rows for one user, but it doesn't sound like you need that.
|
What you're giving up is the ability to have transactional guarantees around all the rows for one user, but it doesn't sound like you need that.
|
||||||
Doing it this way is generally recommended (see here link:http://hbase.apache.org/book.html#schema.smackdown).
|
Doing it this way is generally recommended (see here http://hbase.apache.org/book.html#schema.smackdown).
|
||||||
|
|
||||||
Your second option is "wide": you store a bunch of values in one row, using different qualifiers (where the qualifier is the valueid). The simple way to do that would be to just store ALL values for one user in a single row.
|
Your second option is "wide": you store a bunch of values in one row, using different qualifiers (where the qualifier is the valueid). The simple way to do that would be to just store ALL values for one user in a single row.
|
||||||
I'm guessing you jumped to the "paginated" version because you're assuming that storing millions of columns in a single row would be bad for performance, which may or may not be true; as long as you're not trying to do too much in a single request, or do things like scanning over and returning all of the cells in the row, it shouldn't be fundamentally worse.
|
I'm guessing you jumped to the "paginated" version because you're assuming that storing millions of columns in a single row would be bad for performance, which may or may not be true; as long as you're not trying to do too much in a single request, or do things like scanning over and returning all of the cells in the row, it shouldn't be fundamentally worse.
|
||||||
The client has methods that allow you to get specific slices of columns.
|
The client has methods that allow you to get specific slices of columns.
|
||||||
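For example (a sketch; the table reference, row key, and family name are hypothetical), a `ColumnPaginationFilter` reads one "page" of columns out of a very wide row:

[source,java]
----
Get get = new Get(Bytes.toBytes("user1234"));
get.addFamily(Bytes.toBytes("v"));
// return 30 columns, skipping the first 60, i.e. the third "page" of 30
get.setFilter(new ColumnPaginationFilter(30, 60));
Result result = table.get(get);
----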
|
|
||||||
Note that neither case fundamentally uses more disk space than the other; you're just "shifting" part of the identifying information for a value either to the left (into the row key, in option one) or to the right (into the column qualifiers in option 2). Under the covers, every key/value still stores the whole row key, and column family name.
|
Note that neither case fundamentally uses more disk space than the other; you're just "shifting" part of the identifying information for a value either to the left (into the row key, in option one) or to the right (into the column qualifiers in option 2). Under the covers, every key/value still stores the whole row key, and column family name.
|
||||||
(If this is a bit confusing, take an hour and watch Lars George's excellent video about understanding HBase schema design: link:http://www.youtube.com/watch?v=_HLoH_PgrLk).
|
(If this is a bit confusing, take an hour and watch Lars George's excellent video about understanding HBase schema design: http://www.youtube.com/watch?v=_HLoH_PgrLk).
|
||||||
|
|
||||||
A manually paginated version has lots more complexities, as you note, like having to keep track of how many things are in each page, re-shuffling if new values are inserted, etc.
|
A manually paginated version has lots more complexities, as you note, like having to keep track of how many things are in each page, re-shuffling if new values are inserted, etc.
|
||||||
That seems significantly more complex.
|
That seems significantly more complex.
|
||||||
|
|
|
@ -331,7 +331,7 @@ To enable REST gateway Kerberos authentication for client access, add the follow
|
||||||
Substitute the keytab for HTTP for _$KEYTAB_.
|
Substitute the keytab for HTTP for _$KEYTAB_.
|
||||||
|
|
||||||
HBase REST gateway supports different 'hbase.rest.authentication.type': simple, kerberos.
|
HBase REST gateway supports different 'hbase.rest.authentication.type': simple, kerberos.
|
||||||
You can also implement a custom authentication by implemening Hadoop AuthenticationHandler, then specify the full class name as 'hbase.rest.authentication.type' value.
|
You can also implement a custom authentication by implementing Hadoop AuthenticationHandler, then specify the full class name as 'hbase.rest.authentication.type' value.
|
||||||
For more information, refer to link:http://hadoop.apache.org/docs/stable/hadoop-auth/index.html[SPNEGO HTTP authentication].
|
For more information, refer to link:http://hadoop.apache.org/docs/stable/hadoop-auth/index.html[SPNEGO HTTP authentication].
|
||||||
|
|
||||||
[[security.rest.gateway]]
|
[[security.rest.gateway]]
|
||||||
|
@ -343,7 +343,7 @@ To the HBase server, all requests are from the REST gateway user.
|
||||||
The actual users are unknown.
|
The actual users are unknown.
|
||||||
You can turn on the impersonation support.
|
You can turn on the impersonation support.
|
||||||
With impersonation, the REST gateway user is a proxy user.
|
With impersonation, the REST gateway user is a proxy user.
|
||||||
The HBase server knows the acutal/real user of each request.
|
The HBase server knows the actual/real user of each request.
|
||||||
So it can apply proper authorizations.
|
So it can apply proper authorizations.
|
||||||
|
|
||||||
To turn on REST gateway impersonation, we need to configure HBase servers (masters and region servers) to allow proxy users; configure REST gateway to enable impersonation.
|
To turn on REST gateway impersonation, we need to configure HBase servers (masters and region servers) to allow proxy users; configure REST gateway to enable impersonation.
|
||||||
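A sketch of the server-side piece in _hbase-site.xml_. The gateway's Unix user is assumed to be `rest_server` here; substitute your own, and verify the property names against your Hadoop/HBase versions.

[source,xml]
----
<property>
  <name>hadoop.proxyuser.rest_server.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.rest_server.hosts</name>
  <value>*</value>
</property>
----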
|
@ -1117,7 +1117,7 @@ NOTE: Visibility labels are not currently applied for superusers.
|
||||||
| Interpretation
|
| Interpretation
|
||||||
|
|
||||||
| fulltime
|
| fulltime
|
||||||
| Allow accesss to users associated with the fulltime label.
|
| Allow access to users associated with the fulltime label.
|
||||||
|
|
||||||
| !public
|
| !public
|
||||||
| Allow access to users not associated with the public label.
|
| Allow access to users not associated with the public label.
|
||||||
|
|
|
@ -76,7 +76,7 @@ NOTE: Spawning HBase Shell commands in this way is slow, so keep that in mind wh
|
||||||
|
|
||||||
.Passing Commands to the HBase Shell
|
.Passing Commands to the HBase Shell
|
||||||
====
|
====
|
||||||
You can pass commands to the HBase Shell in non-interactive mode (see <<hbasee.shell.noninteractive,hbasee.shell.noninteractive>>) using the `echo` command and the `|` (pipe) operator.
|
You can pass commands to the HBase Shell in non-interactive mode (see <<hbase.shell.noninteractive,hbase.shell.noninteractive>>) using the `echo` command and the `|` (pipe) operator.
|
||||||
Be sure to escape characters in the HBase commands which would otherwise be interpreted by the shell.
|
Be sure to escape characters in the HBase commands which would otherwise be interpreted by the shell.
|
||||||
Some debug-level output has been truncated from the example below.
|
Some debug-level output has been truncated from the example below.
|
||||||
|
|
||||||
|
|
|
@ -36,9 +36,9 @@ more information on the Spark project and subprojects. This document will focus
|
||||||
on 4 main interaction points between Spark and HBase. Those interaction points are:
|
on 4 main interaction points between Spark and HBase. Those interaction points are:
|
||||||
|
|
||||||
Basic Spark::
|
Basic Spark::
|
||||||
The ability to have a HBase Connection at any point in your Spark DAG.
|
The ability to have an HBase Connection at any point in your Spark DAG.
|
||||||
Spark Streaming::
|
Spark Streaming::
|
||||||
The ability to have a HBase Connection at any point in your Spark Streaming
|
The ability to have an HBase Connection at any point in your Spark Streaming
|
||||||
application.
|
application.
|
||||||
Spark Bulk Load::
|
Spark Bulk Load::
|
||||||
The ability to write directly to HBase HFiles for bulk insertion into HBase
|
The ability to write directly to HBase HFiles for bulk insertion into HBase
|
||||||
|
@ -205,7 +205,7 @@ There are three inputs to the `hbaseBulkPut` function.
|
||||||
. The hbaseContext that carries the configuration broadcast information linking us
|
. The hbaseContext that carries the configuration broadcast information linking us
|
||||||
to the HBase Connections in the executors
|
to the HBase Connections in the executors
|
||||||
. The table name of the table we are putting data into
|
. The table name of the table we are putting data into
|
||||||
. A function that will convert a record in the DStream into a HBase Put object.
|
. A function that will convert a record in the DStream into an HBase Put object.
|
||||||
====
|
====
|
||||||
|
|
||||||
== Bulk Load
|
== Bulk Load
|
||||||
|
@ -350,7 +350,7 @@ FROM hbaseTmp
|
||||||
WHERE (KEY_FIELD = 'get1' or KEY_FIELD = 'get2' or KEY_FIELD = 'get3')
|
WHERE (KEY_FIELD = 'get1' or KEY_FIELD = 'get2' or KEY_FIELD = 'get3')
|
||||||
----
|
----
|
||||||
|
|
||||||
Now lets look at an example where we will end up doing two scans on HBase.
|
Now let's look at an example where we will end up doing two scans on HBase.
|
||||||
|
|
||||||
[source, sql]
|
[source, sql]
|
||||||
----
|
----
|
||||||
|
|
|
@ -89,11 +89,11 @@ Additionally, each DataNode server will also have a TaskTracker/NodeManager log
|
||||||
[[rpc.logging]]
|
[[rpc.logging]]
|
||||||
==== Enabling RPC-level logging
|
==== Enabling RPC-level logging
|
||||||
|
|
||||||
Enabling the RPC-level logging on a RegionServer can often given insight on timings at the server.
|
Enabling the RPC-level logging on a RegionServer can often give insight on timings at the server.
|
||||||
Once enabled, the amount of log spewed is voluminous.
|
Once enabled, the amount of log spewed is voluminous.
|
||||||
It is not recommended that you leave this logging on for more than short bursts of time.
|
It is not recommended that you leave this logging on for more than short bursts of time.
|
||||||
To enable RPC-level logging, browse to the RegionServer UI and click on _Log Level_.
|
To enable RPC-level logging, browse to the RegionServer UI and click on _Log Level_.
|
||||||
Set the log level to `DEBUG` for the package `org.apache.hadoop.ipc` (Thats right, for `hadoop.ipc`, NOT, `hbase.ipc`). Then tail the RegionServers log.
|
Set the log level to `DEBUG` for the package `org.apache.hadoop.ipc` (That's right, for `hadoop.ipc`, NOT `hbase.ipc`). Then tail the RegionServer's log.
|
||||||
Analyze.
|
Analyze.
|
||||||
|
|
||||||
To disable, set the logging level back to `INFO` level.
|
To disable, set the logging level back to `INFO` level.
|
||||||
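If you prefer configuration over the UI, a roughly equivalent _log4j.properties_ entry is (a sketch):

----
log4j.logger.org.apache.hadoop.ipc=DEBUG
----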
|
@ -185,7 +185,7 @@ The key points here is to keep all these pauses low.
|
||||||
CMS pauses are always low, but if your ParNew starts growing, you can see minor GC pauses approach 100ms, exceed 100ms and hit as high at 400ms.
|
CMS pauses are always low, but if your ParNew starts growing, you can see minor GC pauses approach 100ms, exceed 100ms and hit as high at 400ms.
|
||||||
|
|
||||||
This can be due to the size of the ParNew, which should be relatively small.
|
This can be due to the size of the ParNew, which should be relatively small.
|
||||||
If your ParNew is very large after running HBase for a while, in one example a ParNew was about 150MB, then you might have to constrain the size of ParNew (The larger it is, the longer the collections take but if its too small, objects are promoted to old gen too quickly). In the below we constrain new gen size to 64m.
|
If your ParNew is very large after running HBase for a while, in one example a ParNew was about 150MB, then you might have to constrain the size of ParNew (The larger it is, the longer the collections take but if it's too small, objects are promoted to old gen too quickly). In the below we constrain new gen size to 64m.
|
||||||
|
|
||||||
Add the below line in _hbase-env.sh_:
|
Add the below line in _hbase-env.sh_:
|
||||||
[source,bourne]
|
[source,bourne]
|
||||||
|
@ -443,7 +443,7 @@ java.lang.Thread.State: WAITING (on object monitor)
|
||||||
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:146)
|
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:146)
|
||||||
----
|
----
|
||||||
|
|
||||||
A handler thread that's waiting for stuff to do (like put, delete, scan, etc):
|
A handler thread that's waiting for stuff to do (like put, delete, scan, etc.):
|
||||||
|
|
||||||
[source]
|
[source]
|
||||||
----
|
----
|
||||||
|
@ -849,7 +849,7 @@ are snapshots and WALs.
|
||||||
|
|
||||||
Snapshots::
|
Snapshots::
|
||||||
When you create a snapshot, HBase retains everything it needs to recreate the table's
|
When you create a snapshot, HBase retains everything it needs to recreate the table's
|
||||||
state at that time of tne snapshot. This includes deleted cells or expired versions.
|
state at that time of the snapshot. This includes deleted cells or expired versions.
|
||||||
For this reason, your snapshot usage pattern should be well-planned, and you should
|
For this reason, your snapshot usage pattern should be well-planned, and you should
|
||||||
prune snapshots that you no longer need. Snapshots are stored in `/hbase/.snapshots`,
|
prune snapshots that you no longer need. Snapshots are stored in `/hbase/.snapshots`,
|
||||||
and archives needed to restore snapshots are stored in
|
and archives needed to restore snapshots are stored in
|
||||||
|
@ -1070,7 +1070,7 @@ However, if the NotServingRegionException is logged ERROR, then the client ran o
|
||||||
|
|
||||||
Fix your DNS.
|
Fix your DNS.
|
||||||
In versions of Apache HBase before 0.92.x, reverse DNS needs to give same answer as forward lookup.
|
In versions of Apache HBase before 0.92.x, reverse DNS needs to give same answer as forward lookup.
|
||||||
See link:https://issues.apache.org/jira/browse/HBASE-3431[HBASE 3431 RegionServer is not using the name given it by the master; double entry in master listing of servers] for gorey details.
|
See link:https://issues.apache.org/jira/browse/HBASE-3431[HBASE 3431 RegionServer is not using the name given it by the master; double entry in master listing of servers] for gory details.
|
||||||
|
|
||||||
[[brand.new.compressor]]
|
[[brand.new.compressor]]
|
||||||
==== Logs flooded with '2011-01-10 12:40:48,407 INFO org.apache.hadoop.io.compress.CodecPool: Gotbrand-new compressor' messages
|
==== Logs flooded with '2011-01-10 12:40:48,407 INFO org.apache.hadoop.io.compress.CodecPool: Gotbrand-new compressor' messages
|
||||||
|
|
|
@ -96,13 +96,13 @@ public class TestMyHbaseDAOData {
|
||||||
|
|
||||||
These tests ensure that your `createPut` method creates, populates, and returns a `Put` object with expected values.
|
These tests ensure that your `createPut` method creates, populates, and returns a `Put` object with expected values.
|
||||||
Of course, JUnit can do much more than this.
|
Of course, JUnit can do much more than this.
|
||||||
For an introduction to JUnit, see link:https://github.com/junit-team/junit/wiki/Getting-started.
|
For an introduction to JUnit, see https://github.com/junit-team/junit/wiki/Getting-started.
|
||||||
|
|
||||||
== Mockito
|
== Mockito
|
||||||
|
|
||||||
Mockito is a mocking framework.
|
Mockito is a mocking framework.
|
||||||
It goes further than JUnit by allowing you to test the interactions between objects without having to replicate the entire environment.
|
It goes further than JUnit by allowing you to test the interactions between objects without having to replicate the entire environment.
|
||||||
You can read more about Mockito at its project site, link:https://code.google.com/p/mockito/.
|
You can read more about Mockito at its project site, https://code.google.com/p/mockito/.
|
||||||
|
|
||||||
You can use Mockito to do unit testing on smaller units.
|
You can use Mockito to do unit testing on smaller units.
|
||||||
For instance, you can mock a `org.apache.hadoop.hbase.Server` instance or a `org.apache.hadoop.hbase.master.MasterServices` interface reference rather than a full-blown `org.apache.hadoop.hbase.master.HMaster`.
|
For instance, you can mock a `org.apache.hadoop.hbase.Server` instance or a `org.apache.hadoop.hbase.master.MasterServices` interface reference rather than a full-blown `org.apache.hadoop.hbase.master.HMaster`.
|
||||||
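A minimal sketch of such a mock; the stubbed behavior here is illustrative only:

[source,java]
----
Configuration conf = HBaseConfiguration.create();
Server server = Mockito.mock(Server.class);
// stub only what the code under test actually touches
Mockito.when(server.getConfiguration()).thenReturn(conf);
----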
|
@ -182,7 +182,7 @@ public class MyReducer extends TableReducer<Text, Text, ImmutableBytesWritable>
|
||||||
public static final byte[] CF = "CF".getBytes();
|
public static final byte[] CF = "CF".getBytes();
|
||||||
public static final byte[] QUALIFIER = "CQ-1".getBytes();
|
public static final byte[] QUALIFIER = "CQ-1".getBytes();
|
||||||
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
|
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
|
||||||
//bunch of processing to extract data to be inserted, in our case, lets say we are simply
|
//bunch of processing to extract data to be inserted, in our case, let's say we are simply
|
||||||
//appending all the records we receive from the mapper for this particular
|
//appending all the records we receive from the mapper for this particular
|
||||||
//key and insert one record into HBase
|
//key and insert one record into HBase
|
||||||
StringBuffer data = new StringBuffer();
|
StringBuffer data = new StringBuffer();
|
||||||
|
@ -259,7 +259,7 @@ Your MRUnit test verifies that the output is as expected, the Put that is insert
|
||||||
|
|
||||||
MRUnit includes a MapperDriver to test mapping jobs, and you can use MRUnit to test other operations, including reading from HBase, processing data, or writing to HDFS,
|
MRUnit includes a MapperDriver to test mapping jobs, and you can use MRUnit to test other operations, including reading from HBase, processing data, or writing to HDFS,
|
||||||
|
|
||||||
== Integration Testing with a HBase Mini-Cluster
|
== Integration Testing with an HBase Mini-Cluster
|
||||||
|
|
||||||
HBase ships with HBaseTestingUtility, which makes it easy to write integration tests using a [firstterm]_mini-cluster_.
|
HBase ships with HBaseTestingUtility, which makes it easy to write integration tests using a [firstterm]_mini-cluster_.
|
||||||
The first step is to add some dependencies to your Maven POM file.
|
The first step is to add some dependencies to your Maven POM file.
|
||||||
|
|
|

@@ -132,7 +132,7 @@ HBase Client API::

[[hbase.limitetprivate.api]]
HBase LimitedPrivate API::
-LimitedPrivate annotation comes with a set of target consumers for the interfaces. Those consumers are coprocessors, phoenix, replication endpoint implemnetations or similar. At this point, HBase only guarantees source and binary compatibility for these interfaces between patch versions.
+LimitedPrivate annotation comes with a set of target consumers for the interfaces. Those consumers are coprocessors, phoenix, replication endpoint implementations or similar. At this point, HBase only guarantees source and binary compatibility for these interfaces between patch versions.

[[hbase.private.api]]
HBase Private API::

@@ -158,7 +158,7 @@ When we say two HBase versions are compatible, we mean that the versions are wir

A rolling upgrade is the process by which you update the servers in your cluster a server at a time. You can rolling upgrade across HBase versions if they are binary or wire compatible. See <<hbase.rolling.restart>> for more on what this means. Coarsely, a rolling upgrade is a graceful stop each server, update the software, and then restart. You do this for each server in the cluster. Usually you upgrade the Master first and then the RegionServers. See <<rolling>> for tools that can help use the rolling upgrade process.

-For example, in the below, HBase was symlinked to the actual HBase install. On upgrade, before running a rolling restart over the cluser, we changed the symlink to point at the new HBase software version and then ran
+For example, in the below, HBase was symlinked to the actual HBase install. On upgrade, before running a rolling restart over the cluster, we changed the symlink to point at the new HBase software version and then ran

[source,bash]
----

@@ -200,7 +200,7 @@ ports.

[[upgrade1.0.hbase.bucketcache.percentage.in.combinedcache]]
.hbase.bucketcache.percentage.in.combinedcache configuration has been REMOVED
-You may have made use of this configuration if you are using BucketCache. If NOT using BucketCache, this change does not effect you. Its removal means that your L1 LruBlockCache is now sized using `hfile.block.cache.size` -- i.e. the way you would size the on-heap L1 LruBlockCache if you were NOT doing BucketCache -- and the BucketCache size is not whatever the setting for `hbase.bucketcache.size` is. You may need to adjust configs to get the LruBlockCache and BucketCache sizes set to what they were in 0.98.x and previous. If you did not set this config., its default value was 0.9. If you do nothing, your BucketCache will increase in size by 10%. Your L1 LruBlockCache will become `hfile.block.cache.size` times your java heap size (`hfile.block.cache.size` is a float between 0.0 and 1.0). To read more, see link:https://issues.apache.org/jira/browse/HBASE-11520[HBASE-11520 Simplify offheap cache config by removing the confusing "hbase.bucketcache.percentage.in.combinedcache"].
+You may have made use of this configuration if you are using BucketCache. If NOT using BucketCache, this change does not affect you. Its removal means that your L1 LruBlockCache is now sized using `hfile.block.cache.size` -- i.e. the way you would size the on-heap L1 LruBlockCache if you were NOT doing BucketCache -- and the BucketCache size is not whatever the setting for `hbase.bucketcache.size` is. You may need to adjust configs to get the LruBlockCache and BucketCache sizes set to what they were in 0.98.x and previous. If you did not set this config., its default value was 0.9. If you do nothing, your BucketCache will increase in size by 10%. Your L1 LruBlockCache will become `hfile.block.cache.size` times your java heap size (`hfile.block.cache.size` is a float between 0.0 and 1.0). To read more, see link:https://issues.apache.org/jira/browse/HBASE-11520[HBASE-11520 Simplify offheap cache config by removing the confusing "hbase.bucketcache.percentage.in.combinedcache"].
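As a rough worked example with made-up numbers (not part of the quoted change): with an 8 GB RegionServer heap and `hfile.block.cache.size` left at its shipped default of 0.4, the on-heap L1 LruBlockCache comes out to about 0.4 × 8 GB ≈ 3.2 GB, and the BucketCache size is then governed only by `hbase.bucketcache.size`.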

[[hbase-12068]]
.If you have your own customer filters.

@@ -392,7 +392,7 @@ The migration is a one-time event. However, every time your cluster starts, `MET

[[upgrade0.94]]
=== Upgrading from 0.92.x to 0.94.x
-We used to think that 0.92 and 0.94 were interface compatible and that you can do a rolling upgrade between these versions but then we figured that link:https://issues.apache.org/jira/browse/HBASE-5357[HBASE-5357 Use builder pattern in HColumnDescriptor] changed method signatures so rather than return `void` they instead return `HColumnDescriptor`. This will throw`java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HColumnDescriptor.setMaxVersions(I)V` so 0.92 and 0.94 are NOT compatible. You cannot do a rolling upgrade between them.
+We used to think that 0.92 and 0.94 were interface compatible and that you can do a rolling upgrade between these versions but then we figured that link:https://issues.apache.org/jira/browse/HBASE-5357[HBASE-5357 Use builder pattern in HColumnDescriptor] changed method signatures so rather than return `void` they instead return `HColumnDescriptor`. This will throw `java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HColumnDescriptor.setMaxVersions(I)V` so 0.92 and 0.94 are NOT compatible. You cannot do a rolling upgrade between them.
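To make the failure mode concrete, here is a small illustration of why the return-type change is binary incompatible (not part of the quoted change; the class and column family names are invented):

[source,java]
----
import org.apache.hadoop.hbase.HColumnDescriptor;

public class SetMaxVersionsCaller {
  public static void main(String[] args) {
    // Compiled against 0.92, where setMaxVersions(int) was declared void, this call
    // site references the method descriptor setMaxVersions(I)V.
    HColumnDescriptor hcd = new HColumnDescriptor("cf");
    hcd.setMaxVersions(3);
    // Run the same .class against 0.94 jars, where the method now returns
    // HColumnDescriptor: the (I)V descriptor no longer exists, so the JVM throws the
    // java.lang.NoSuchMethodError quoted above instead of linking the call.
  }
}
----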

[[upgrade0.92]]
=== Upgrading from 0.90.x to 0.92.x

@@ -97,7 +97,7 @@ In the example below we have ZooKeeper persist to _/user/local/zookeeper_.
</configuration>
----

-.What verion of ZooKeeper should I use?
+.What version of ZooKeeper should I use?
[CAUTION]
====
The newer version, the better.