HBASE-19068 Change all url of apache.org from HTTP to HTTPS in HBase book
Signed-off-by: Jan Hentschel <jan.hentschel@ultratendency.com>
parent 125f3eace9
commit 8e0571a3a4
@@ -35,9 +35,9 @@ including the documentation.

In HBase, documentation includes the following areas, and probably some others:

-* The link:http://hbase.apache.org/book.html[HBase Reference
+* The link:https://hbase.apache.org/book.html[HBase Reference
Guide] (this book)
-* The link:http://hbase.apache.org/[HBase website]
+* The link:https://hbase.apache.org/[HBase website]
* API documentation
* Command-line utility output and help text
* Web UI strings, explicit help text, context-sensitive strings, and others
@@ -126,7 +126,7 @@ This directory also stores images used in the HBase Reference Guide.

The website's pages are written in an HTML-like XML dialect called xdoc, which
has a reference guide at
-http://maven.apache.org/archives/maven-1.x/plugins/xdoc/reference/xdocs.html.
+https://maven.apache.org/archives/maven-1.x/plugins/xdoc/reference/xdocs.html.
You can edit these files in a plain-text editor, an IDE, or an XML editor such
as XML Mind XML Editor (XXE) or Oxygen XML Author.

@@ -101,7 +101,7 @@ The `hbase:meta` table structure is as follows:

.Values

-* `info:regioninfo` (serialized link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html[HRegionInfo] instance for this region)
+* `info:regioninfo` (serialized link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html[HRegionInfo] instance for this region)
* `info:server` (server:port of the RegionServer containing this region)
* `info:serverstartcode` (start-time of the RegionServer process containing this region)

@@ -119,7 +119,7 @@ If a region has both an empty start and an empty end key, it is the only region
====

In the (hopefully unlikely) event that programmatic processing of catalog metadata
-is required, see the link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/RegionInfo.html#parseFrom-byte:A-[RegionInfo.parseFrom] utility.
+is required, see the link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/RegionInfo.html#parseFrom-byte:A-[RegionInfo.parseFrom] utility.

[[arch.catalog.startup]]
=== Startup Sequencing
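To make the `RegionInfo.parseFrom` reference in the hunk above concrete, here is a minimal sketch of reading the serialized `info:regioninfo` cell back out of `hbase:meta`. It is an illustrative assumption, not part of the commit or the guide; the printed field is an arbitrary choice.

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionInfo;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MetaRegionInfoScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table meta = connection.getTable(TableName.valueOf("hbase:meta"));
         ResultScanner scanner = meta.getScanner(new Scan())) {
      for (Result row : scanner) {
        // info:regioninfo holds the serialized region descriptor described above.
        byte[] serialized = row.getValue(Bytes.toBytes("info"), Bytes.toBytes("regioninfo"));
        if (serialized != null) {
          RegionInfo regionInfo = RegionInfo.parseFrom(serialized);
          System.out.println(regionInfo.getRegionNameAsString());
        }
      }
    }
  }
}
----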
@@ -141,7 +141,7 @@ Should a region be reassigned either by the master load balancer or because a Re

See <<master.runtime>> for more information about the impact of the Master on HBase Client communication.

-Administrative functions are done via an instance of link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html[Admin]
+Administrative functions are done via an instance of link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html[Admin]

[[client.connections]]
=== Cluster Connections
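A minimal sketch of obtaining the `Admin` instance mentioned in the hunk above from a `Connection`; listing table names is only an illustrative choice of administrative call.

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class AdminExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Admin admin = connection.getAdmin()) {         // administrative functions live here
      for (TableName name : admin.listTableNames()) {   // e.g. list the tables in the cluster
        System.out.println(name.getNameAsString());
      }
    }
  }
}
----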
@@ -157,12 +157,12 @@ Finally, be sure to cleanup your `Connection` instance before exiting.
`Connections` are heavyweight objects but thread-safe so you can create one for your application and keep the instance around.
`Table`, `Admin` and `RegionLocator` instances are lightweight.
Create as you go and then let go as soon as you are done by closing them.
-See the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/package-summary.html[Client Package Javadoc Description] for example usage of the new HBase 1.0 API.
+See the link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/package-summary.html[Client Package Javadoc Description] for example usage of the new HBase 1.0 API.

==== API before HBase 1.0.0

-Instances of `HTable` are the way to interact with an HBase cluster earlier than 1.0.0. _link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table] instances are not thread-safe_. Only one thread can use an instance of Table at any given time.
+Instances of `HTable` are the way to interact with an HBase cluster earlier than 1.0.0. _link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table] instances are not thread-safe_. Only one thread can use an instance of Table at any given time.
-When creating Table instances, it is advisable to use the same link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HBaseConfiguration[HBaseConfiguration] instance.
+When creating Table instances, it is advisable to use the same link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HBaseConfiguration[HBaseConfiguration] instance.
This will ensure sharing of ZooKeeper and socket instances to the RegionServers which is usually what you want.
For example, this is preferred:

@@ -183,7 +183,7 @@ HBaseConfiguration conf2 = HBaseConfiguration.create();
HTable table2 = new HTable(conf2, "myTable");
----

-For more information about how connections are handled in the HBase client, see link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/ConnectionFactory.html[ConnectionFactory].
+For more information about how connections are handled in the HBase client, see link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/ConnectionFactory.html[ConnectionFactory].

[[client.connection.pooling]]
===== Connection Pooling
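A minimal sketch of the HBase 1.0+ lifecycle described in the hunks above: one shared, thread-safe `Connection` from `ConnectionFactory`, and short-lived `Table` instances closed as soon as the work is done. The table and row names are placeholders.

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ConnectionLifecycle {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // One heavyweight, thread-safe Connection for the whole application...
    try (Connection connection = ConnectionFactory.createConnection(conf)) {
      // ...and lightweight Table instances created and closed per unit of work.
      try (Table table = connection.getTable(TableName.valueOf("myTable"))) {
        Result result = table.get(new Get(Bytes.toBytes("row1")));
        System.out.println("row1 present: " + !result.isEmpty());
      }
    }
  }
}
----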
@@ -207,19 +207,19 @@ try (Connection connection = ConnectionFactory.createConnection(conf);
[WARNING]
====
Previous versions of this guide discussed `HTablePool`, which was deprecated in HBase 0.94, 0.95, and 0.96, and removed in 0.98.1, by link:https://issues.apache.org/jira/browse/HBASE-6580[HBASE-6580], or `HConnection`, which is deprecated in HBase 1.0 by `Connection`.
-Please use link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Connection.html[Connection] instead.
+Please use link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Connection.html[Connection] instead.
====

[[client.writebuffer]]
=== WriteBuffer and Batch Methods

-In HBase 1.0 and later, link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HTable.html[HTable] is deprecated in favor of link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table]. `Table` does not use autoflush. To do buffered writes, use the BufferedMutator class.
+In HBase 1.0 and later, link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HTable.html[HTable] is deprecated in favor of link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table]. `Table` does not use autoflush. To do buffered writes, use the BufferedMutator class.

-In HBase 2.0 and later, link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HTable.html[HTable] does not use BufferedMutator to execute the ``Put`` operation. Refer to link:https://issues.apache.org/jira/browse/HBASE-18500[HBASE-18500] for more information.
+In HBase 2.0 and later, link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HTable.html[HTable] does not use BufferedMutator to execute the ``Put`` operation. Refer to link:https://issues.apache.org/jira/browse/HBASE-18500[HBASE-18500] for more information.

For additional information on write durability, review the link:/acid-semantics.html[ACID semantics] page.

-For fine-grained control of batching of ``Put``s or ``Delete``s, see the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch-java.util.List-java.lang.Object:A-[batch] methods on Table.
+For fine-grained control of batching of ``Put``s or ``Delete``s, see the link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch-java.util.List-java.lang.Object:A-[batch] methods on Table.

[[async.client]]
=== Asynchronous Client ===
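The hunk above points to `BufferedMutator` for buffered writes without showing it, so here is a minimal sketch; the table, family, and qualifier names are placeholders.

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedWrites {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         BufferedMutator mutator =
             connection.getBufferedMutator(TableName.valueOf("myTable"))) {
      for (int i = 0; i < 1000; i++) {
        Put put = new Put(Bytes.toBytes("row-" + i));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
        mutator.mutate(put);  // buffered client-side, sent to the RegionServers in batches
      }
      mutator.flush();        // push anything still sitting in the buffer
    }
  }
}
----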
@@ -263,7 +263,7 @@ Information on non-Java clients and custom protocols is covered in <<external_ap
[[client.filter]]
== Client Request Filters

-link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] and link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] instances can be optionally configured with link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html[filters] which are applied on the RegionServer.
+link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] and link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] instances can be optionally configured with link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html[filters] which are applied on the RegionServer.

Filters can be confusing because there are many different types, and it is best to approach them by understanding the groups of Filter functionality.

@@ -275,7 +275,7 @@ Structural Filters contain other Filters.
[[client.filter.structural.fl]]
==== FilterList

-link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FilterList.html[FilterList] represents a list of Filters with a relationship of `FilterList.Operator.MUST_PASS_ALL` or `FilterList.Operator.MUST_PASS_ONE` between the Filters.
+link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FilterList.html[FilterList] represents a list of Filters with a relationship of `FilterList.Operator.MUST_PASS_ALL` or `FilterList.Operator.MUST_PASS_ONE` between the Filters.
The following example shows an 'or' between two Filters (checking for either 'my value' or 'my other value' on the same attribute).

[source,java]
@@ -305,7 +305,7 @@ scan.setFilter(list);
==== SingleColumnValueFilter

A SingleColumnValueFilter (see:
-http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html)
+https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html)
can be used to test column values for equivalence (`CompareOperaor.EQUAL`),
inequality (`CompareOperaor.NOT_EQUAL`), or ranges (e.g., `CompareOperaor.GREATER`). The following is an
example of testing equivalence of a column to a String value "my value"...
@@ -330,7 +330,7 @@ These Comparators are used in concert with other Filters, such as <<client.filte
[[client.filter.cvp.rcs]]
==== RegexStringComparator

-link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/RegexStringComparator.html[RegexStringComparator] supports regular expressions for value comparisons.
+link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/RegexStringComparator.html[RegexStringComparator] supports regular expressions for value comparisons.

[source,java]
----
@@ -349,7 +349,7 @@ See the Oracle JavaDoc for link:http://download.oracle.com/javase/6/docs/api/jav
[[client.filter.cvp.substringcomparator]]
==== SubstringComparator

-link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SubstringComparator.html[SubstringComparator] can be used to determine if a given substring exists in a value.
+link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SubstringComparator.html[SubstringComparator] can be used to determine if a given substring exists in a value.
The comparison is case-insensitive.

[source,java]
@@ -368,12 +368,12 @@ scan.setFilter(filter);
[[client.filter.cvp.bfp]]
==== BinaryPrefixComparator

-See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryPrefixComparator.html[BinaryPrefixComparator].
+See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryPrefixComparator.html[BinaryPrefixComparator].

[[client.filter.cvp.bc]]
==== BinaryComparator

-See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryComparator.html[BinaryComparator].
+See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryComparator.html[BinaryComparator].

[[client.filter.kvm]]
=== KeyValue Metadata
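The two comparator entries in the hunk above are link-only, so here is a minimal sketch of one of them, `BinaryPrefixComparator`, paired with a `ValueFilter`. It assumes the HBase 2.0-style `CompareOperator` constructor (older releases take a `CompareFilter.CompareOp` instead), and the prefix is a placeholder.

[source,java]
----
import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryPrefixComparator;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefixValueScan {
  // Builds a Scan that only returns cells whose value starts with the given prefix.
  static Scan prefixValueScan(byte[] prefix) {
    Scan scan = new Scan();
    scan.setFilter(new ValueFilter(CompareOperator.EQUAL, new BinaryPrefixComparator(prefix)));
    return scan;
  }

  public static void main(String[] args) {
    Scan scan = prefixValueScan(Bytes.toBytes("my prefix"));
    System.out.println(scan);
  }
}
----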
@@ -383,18 +383,18 @@ As HBase stores data internally as KeyValue pairs, KeyValue Metadata Filters eva
[[client.filter.kvm.ff]]
==== FamilyFilter

-link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FamilyFilter.html[FamilyFilter] can be used to filter on the ColumnFamily.
+link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FamilyFilter.html[FamilyFilter] can be used to filter on the ColumnFamily.
It is generally a better idea to select ColumnFamilies in the Scan than to do it with a Filter.

[[client.filter.kvm.qf]]
==== QualifierFilter

-link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/QualifierFilter.html[QualifierFilter] can be used to filter based on Column (aka Qualifier) name.
+link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/QualifierFilter.html[QualifierFilter] can be used to filter based on Column (aka Qualifier) name.

[[client.filter.kvm.cpf]]
==== ColumnPrefixFilter

-link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnPrefixFilter.html[ColumnPrefixFilter] can be used to filter based on the lead portion of Column (aka Qualifier) names.
+link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnPrefixFilter.html[ColumnPrefixFilter] can be used to filter based on the lead portion of Column (aka Qualifier) names.

A ColumnPrefixFilter seeks ahead to the first column matching the prefix in each row and for each involved column family.
It can be used to efficiently get a subset of the columns in very wide rows.
@@ -427,7 +427,7 @@ rs.close();
[[client.filter.kvm.mcpf]]
==== MultipleColumnPrefixFilter

-link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/MultipleColumnPrefixFilter.html[MultipleColumnPrefixFilter] behaves like ColumnPrefixFilter but allows specifying multiple prefixes.
+link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/MultipleColumnPrefixFilter.html[MultipleColumnPrefixFilter] behaves like ColumnPrefixFilter but allows specifying multiple prefixes.

Like ColumnPrefixFilter, MultipleColumnPrefixFilter efficiently seeks ahead to the first column matching the lowest prefix and also seeks past ranges of columns between prefixes.
It can be used to efficiently get discontinuous sets of columns from very wide rows.
@@ -457,7 +457,7 @@ rs.close();
[[client.filter.kvm.crf]]
==== ColumnRangeFilter

-A link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnRangeFilter.html[ColumnRangeFilter] allows efficient intra row scanning.
+A link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnRangeFilter.html[ColumnRangeFilter] allows efficient intra row scanning.

A ColumnRangeFilter can seek ahead to the first matching column for each involved column family.
It can be used to efficiently get a 'slice' of the columns of a very wide row.
@@ -498,7 +498,7 @@ Note: Introduced in HBase 0.92
[[client.filter.row.rf]]
==== RowFilter

-It is generally a better idea to use the startRow/stopRow methods on Scan for row selection, however link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/RowFilter.html[RowFilter] can also be used.
+It is generally a better idea to use the startRow/stopRow methods on Scan for row selection, however link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/RowFilter.html[RowFilter] can also be used.

[[client.filter.utility]]
=== Utility
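A minimal sketch contrasting the two row-selection approaches the RowFilter text above mentions: bounding the `Scan` itself versus attaching a `RowFilter`. It assumes the HBase 2.0-style `withStartRow`/`withStopRow` and `CompareOperator` APIs, and the row keys are placeholders.

[source,java]
----
import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class RowSelection {
  // Preferred: bound the scan by row range.
  static Scan byRange() {
    return new Scan()
        .withStartRow(Bytes.toBytes("row-0100"))
        .withStopRow(Bytes.toBytes("row-0200"));
  }

  // Also possible: a RowFilter, evaluated row by row on the RegionServer.
  static Scan byRowFilter() {
    Scan scan = new Scan();
    scan.setFilter(new RowFilter(CompareOperator.EQUAL,
        new BinaryComparator(Bytes.toBytes("row-0150"))));
    return scan;
  }
}
----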
@@ -507,7 +507,7 @@ It is generally a better idea to use the startRow/stopRow methods on Scan for ro
==== FirstKeyOnlyFilter

This is primarily used for rowcount jobs.
-See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html[FirstKeyOnlyFilter].
+See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html[FirstKeyOnlyFilter].

[[architecture.master]]
== Master
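A minimal sketch of the rowcount-style use of `FirstKeyOnlyFilter` described above: only the first cell of each row is returned, which keeps the scan cheap while still counting every row. The table name is a placeholder.

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;

public class RowCount {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Scan scan = new Scan();
    scan.setFilter(new FirstKeyOnlyFilter());  // one cell per row is enough to count it
    long rows = 0;
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("myTable"));
         ResultScanner scanner = table.getScanner(scan)) {
      for (Result ignored : scanner) {
        rows++;
      }
    }
    System.out.println("rows: " + rows);
  }
}
----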
@@ -634,7 +634,7 @@ However, latencies tend to be less erratic across time, because there is less ga
If the BucketCache is deployed in off-heap mode, this memory is not managed by the GC at all.
This is why you'd use BucketCache, so your latencies are less erratic and to mitigate GCs and heap fragmentation.
See Nick Dimiduk's link:http://www.n10k.com/blog/blockcache-101/[BlockCache 101] for comparisons running on-heap vs off-heap tests.
-Also see link:http://people.apache.org/~stack/bc/[Comparing BlockCache Deploys] which finds that if your dataset fits inside your LruBlockCache deploy, use it otherwise if you are experiencing cache churn (or you want your cache to exist beyond the vagaries of java GC), use BucketCache.
+Also see link:https://people.apache.org/~stack/bc/[Comparing BlockCache Deploys] which finds that if your dataset fits inside your LruBlockCache deploy, use it otherwise if you are experiencing cache churn (or you want your cache to exist beyond the vagaries of java GC), use BucketCache.

When you enable BucketCache, you are enabling a two tier caching system, an L1 cache which is implemented by an instance of LruBlockCache and an off-heap L2 cache which is implemented by BucketCache.
Management of these two tiers and the policy that dictates how blocks move between them is done by `CombinedBlockCache`.
@@ -645,7 +645,7 @@ See <<offheap.blockcache>> for more detail on going off-heap.
==== General Cache Configurations

Apart from the cache implementation itself, you can set some general configuration options to control how the cache performs.
-See http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CacheConfig.html.
+See https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CacheConfig.html.
After setting any of these options, restart or rolling restart your cluster for the configuration to take effect.
Check logs for errors or unexpected behavior.

@@ -755,7 +755,7 @@ Since link:https://issues.apache.org/jira/browse/HBASE-4683[HBASE-4683 Always ca
===== How to Enable BucketCache

The usual deploy of BucketCache is via a managing class that sets up two caching tiers: an L1 on-heap cache implemented by LruBlockCache and a second L2 cache implemented with BucketCache.
-The managing class is link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CombinedBlockCache.html[CombinedBlockCache] by default.
+The managing class is link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CombinedBlockCache.html[CombinedBlockCache] by default.
The previous link describes the caching 'policy' implemented by CombinedBlockCache.
In short, it works by keeping meta blocks -- INDEX and BLOOM in the L1, on-heap LruBlockCache tier -- and DATA blocks are kept in the L2, BucketCache tier.
It is possible to amend this behavior in HBase since version 1.0 and ask that a column family have both its meta and DATA blocks hosted on-heap in the L1 tier by setting `cacheDataInL1` via `(HColumnDescriptor.setCacheDataInL1(true)` or in the shell, creating or amending column families setting `CACHE_DATA_IN_L1` to true: e.g.
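A minimal sketch of the `HColumnDescriptor.setCacheDataInL1` call named in the hunk above, using the HBase 1.x-era descriptor API (the setter was removed in later releases); the table and family names are placeholders.

[source,java]
----
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;

public class CacheDataInL1Example {
  static void createTable(Admin admin) throws Exception {
    HColumnDescriptor family = new HColumnDescriptor("cf");
    family.setCacheDataInL1(true);  // keep this family's DATA blocks in the on-heap L1 tier
    HTableDescriptor table = new HTableDescriptor(TableName.valueOf("myTable"));
    table.addFamily(family);
    admin.createTable(table);
  }
}
----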
@@ -881,7 +881,7 @@ The compressed BlockCache is disabled by default. To enable it, set `hbase.block

As write requests are handled by the region server, they accumulate in an in-memory storage system called the _memstore_. Once the memstore fills, its content are written to disk as additional store files. This event is called a _memstore flush_. As store files accumulate, the RegionServer will <<compaction,compact>> them into fewer, larger files. After each flush or compaction finishes, the amount of data stored in the region has changed. The RegionServer consults the region split policy to determine if the region has grown too large or should be split for another policy-specific reason. A region split request is enqueued if the policy recommends it.

-Logically, the process of splitting a region is simple. We find a suitable point in the keyspace of the region where we should divide the region in half, then split the region's data into two new regions at that point. The details of the process however are not simple. When a split happens, the newly created _daughter regions_ do not rewrite all the data into new files immediately. Instead, they create small files similar to symbolic link files, named link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/Reference.html[Reference files], which point to either the top or bottom part of the parent store file according to the split point. The reference file is used just like a regular data file, but only half of the records are considered. The region can only be split if there are no more references to the immutable data files of the parent region. Those reference files are cleaned gradually by compactions, so that the region will stop referring to its parents files, and can be split further.
+Logically, the process of splitting a region is simple. We find a suitable point in the keyspace of the region where we should divide the region in half, then split the region's data into two new regions at that point. The details of the process however are not simple. When a split happens, the newly created _daughter regions_ do not rewrite all the data into new files immediately. Instead, they create small files similar to symbolic link files, named link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/Reference.html[Reference files], which point to either the top or bottom part of the parent store file according to the split point. The reference file is used just like a regular data file, but only half of the records are considered. The region can only be split if there are no more references to the immutable data files of the parent region. Those reference files are cleaned gradually by compactions, so that the region will stop referring to its parents files, and can be split further.

Although splitting the region is a local decision made by the RegionServer, the split process itself must coordinate with many actors. The RegionServer notifies the Master before and after the split, updates the `.META.` table so that clients can discover the new daughter regions, and rearranges the directory structure and data files in HDFS. Splitting is a multi-task process. To enable rollback in case of an error, the RegionServer keeps an in-memory journal about the execution state. The steps taken by the RegionServer to execute the split are illustrated in <<regionserver_split_process_image>>. Each step is labeled with its step number. Actions from RegionServers or Master are shown in red, while actions from the clients are show in green.

@@ -915,7 +915,7 @@ Under normal operations, the WAL is not needed because data changes move from th
However, if a RegionServer crashes or becomes unavailable before the MemStore is flushed, the WAL ensures that the changes to the data can be replayed.
If writing to the WAL fails, the entire operation to modify the data fails.

-HBase uses an implementation of the link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/wal/WAL.html[WAL] interface.
+HBase uses an implementation of the link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/wal/WAL.html[WAL] interface.
Usually, there is only one instance of a WAL per RegionServer.
The RegionServer records Puts and Deletes to it, before recording them to the <<store.memstore>> for the affected <<store>>.

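The WAL behavior described above is largely automatic; the one client-visible knob is the per-mutation `Durability` setting, sketched here. Skipping the WAL trades away exactly the recovery guarantee just described; the row, family, and qualifier names are placeholders.

[source,java]
----
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class DurabilityExample {
  static Put durablePut() {
    Put put = new Put(Bytes.toBytes("row1"));
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
    put.setDurability(Durability.SYNC_WAL);  // the edit is written to the WAL before it is acknowledged
    return put;
  }

  static Put riskyPut() {
    Put put = new Put(Bytes.toBytes("row2"));
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
    put.setDurability(Durability.SKIP_WAL);  // no WAL entry: lost if the RegionServer crashes before a flush
    return put;
  }
}
----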
@@ -1389,12 +1389,12 @@ The HDFS client does the following by default when choosing locations to write r
. Second replica is written to a random node on another rack
. Third replica is written on the same rack as the second, but on a different node chosen randomly
. Subsequent replicas are written on random nodes on the cluster.
-See _Replica Placement: The First Baby Steps_ on this page: link:http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html[HDFS Architecture]
+See _Replica Placement: The First Baby Steps_ on this page: link:https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html[HDFS Architecture]

Thus, HBase eventually achieves locality for a region after a flush or a compaction.
In a RegionServer failover situation a RegionServer may be assigned regions with non-local StoreFiles (because none of the replicas are local), however as new data is written in the region, or the table is compacted and StoreFiles are re-written, they will become "local" to the RegionServer.

-For more information, see _Replica Placement: The First Baby Steps_ on this page: link:http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html[HDFS Architecture] and also Lars George's blog on link:http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html[HBase and HDFS locality].
+For more information, see _Replica Placement: The First Baby Steps_ on this page: link:https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html[HDFS Architecture] and also Lars George's blog on link:http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html[HBase and HDFS locality].

[[arch.region.splits]]
=== Region Splits
@@ -1409,9 +1409,9 @@ See <<disable.splitting>> for how to manually manage splits (and for why you mig

==== Custom Split Policies
You can override the default split policy using a custom
-link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/RegionSplitPolicy.html[RegionSplitPolicy](HBase 0.94+).
+link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/RegionSplitPolicy.html[RegionSplitPolicy](HBase 0.94+).
Typically a custom split policy should extend HBase's default split policy:
-link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.html[IncreasingToUpperBoundRegionSplitPolicy].
+link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.html[IncreasingToUpperBoundRegionSplitPolicy].

The policy can set globally through the HBase configuration or on a per-table
basis.
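A minimal sketch of the per-table route mentioned above, setting the split policy on the table descriptor at creation time. It assumes the HBase 1.x-style `HTableDescriptor` API, and the table name and the chosen policy class are placeholders.

[source,java]
----
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy;

public class SplitPolicyExample {
  static void createTable(Admin admin) throws Exception {
    HTableDescriptor table = new HTableDescriptor(TableName.valueOf("myTable"));
    table.addFamily(new HColumnDescriptor("cf"));
    // Per-table split policy; the cluster-wide default stays whatever hbase-site.xml configures.
    table.setValue(HTableDescriptor.SPLIT_POLICY, ConstantSizeRegionSplitPolicy.class.getName());
    admin.createTable(table);
  }
}
----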
@@ -1485,13 +1485,13 @@ Using a Custom Algorithm::
As parameters, you give it the algorithm, desired number of regions, and column families.
It includes two split algorithms.
The first is the
-`link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/RegionSplitter.HexStringSplit.html[HexStringSplit]`
+`link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/RegionSplitter.HexStringSplit.html[HexStringSplit]`
algorithm, which assumes the row keys are hexadecimal strings.
The second,
-`link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/RegionSplitter.UniformSplit.html[UniformSplit]`,
+`link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/RegionSplitter.UniformSplit.html[UniformSplit]`,
assumes the row keys are random byte arrays.
You will probably need to develop your own
-`link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/RegionSplitter.SplitAlgorithm.html[SplitAlgorithm]`,
+`link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/RegionSplitter.SplitAlgorithm.html[SplitAlgorithm]`,
using the provided ones as models.

=== Online Region Merges
@@ -1567,7 +1567,7 @@ StoreFiles are where your data lives.

===== HFile Format

-The _HFile_ file format is based on the SSTable file described in the link:http://research.google.com/archive/bigtable.html[BigTable [2006]] paper and on Hadoop's link:http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/file/tfile/TFile.html[TFile] (The unit test suite and the compression harness were taken directly from TFile). Schubert Zhang's blog post on link:http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html[HFile: A Block-Indexed File Format to Store Sorted Key-Value Pairs] makes for a thorough introduction to HBase's HFile.
+The _HFile_ file format is based on the SSTable file described in the link:http://research.google.com/archive/bigtable.html[BigTable [2006]] paper and on Hadoop's link:https://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/file/tfile/TFile.html[TFile] (The unit test suite and the compression harness were taken directly from TFile). Schubert Zhang's blog post on link:http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html[HFile: A Block-Indexed File Format to Store Sorted Key-Value Pairs] makes for a thorough introduction to HBase's HFile.
Matteo Bertozzi has also put up a helpful description, link:http://th30z.blogspot.com/2011/02/hbase-io-hfile.html?spref=tw[HBase I/O: HFile].

For more information, see the HFile source code.
@@ -2393,7 +2393,7 @@ See the `LoadIncrementalHFiles` class for more information.

As HBase runs on HDFS (and each StoreFile is written as a file on HDFS), it is important to have an understanding of the HDFS Architecture especially in terms of how it stores files, handles failovers, and replicates blocks.

-See the Hadoop documentation on link:http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html[HDFS Architecture] for more information.
+See the Hadoop documentation on link:https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html[HDFS Architecture] for more information.

[[arch.hdfs.nn]]
=== NameNode
@@ -2797,7 +2797,7 @@ if (result.isStale()) {
=== Resources

. More information about the design and implementation can be found at the jira issue: link:https://issues.apache.org/jira/browse/HBASE-10070[HBASE-10070]
-. HBaseCon 2014 talk: link:http://hbase.apache.org/www.hbasecon.com/#2014-PresentationsRecordings[HBase Read High Availability Using Timeline-Consistent Region Replicas] also contains some details and link:http://www.slideshare.net/enissoz/hbase-high-availability-for-reads-with-time[slides].
+. HBaseCon 2014 talk: link:https://hbase.apache.org/www.hbasecon.com/#2014-PresentationsRecordings[HBase Read High Availability Using Timeline-Consistent Region Replicas] also contains some details and link:http://www.slideshare.net/enissoz/hbase-high-availability-for-reads-with-time[slides].

ifdef::backend-docbook[]
[index]
@@ -35,13 +35,13 @@ HBase is a project in the Apache Software Foundation and as such there are respo
[[asf.devprocess]]
=== ASF Development Process

-See the link:http://www.apache.org/dev/#committers[Apache Development Process page] for all sorts of information on how the ASF is structured (e.g., PMC, committers, contributors), to tips on contributing and getting involved, and how open-source works at ASF.
+See the link:https://www.apache.org/dev/#committers[Apache Development Process page] for all sorts of information on how the ASF is structured (e.g., PMC, committers, contributors), to tips on contributing and getting involved, and how open-source works at ASF.

[[asf.reporting]]
=== ASF Board Reporting

Once a quarter, each project in the ASF portfolio submits a report to the ASF board.
This is done by the HBase project lead and the committers.
-See link:http://www.apache.org/foundation/board/reporting[ASF board reporting] for more information.
+See link:https://www.apache.org/foundation/board/reporting[ASF board reporting] for more information.

:numbered:
@@ -43,7 +43,7 @@ _backup-masters_::

_hadoop-metrics2-hbase.properties_::
Used to connect HBase Hadoop's Metrics2 framework.
-See the link:http://wiki.apache.org/hadoop/HADOOP-6728-MetricsV2[Hadoop Wiki entry] for more information on Metrics2.
+See the link:https://wiki.apache.org/hadoop/HADOOP-6728-MetricsV2[Hadoop Wiki entry] for more information on Metrics2.
Contains only commented-out examples by default.

_hbase-env.cmd_ and _hbase-env.sh_::
@@ -124,7 +124,7 @@ NOTE: You must set `JAVA_HOME` on each node of your cluster. _hbase-env.sh_ prov
[[os]]
.Operating System Utilities
ssh::
-HBase uses the Secure Shell (ssh) command and utilities extensively to communicate between cluster nodes. Each server in the cluster must be running `ssh` so that the Hadoop and HBase daemons can be managed. You must be able to connect to all nodes via SSH, including the local node, from the Master as well as any backup Master, using a shared key rather than a password. You can see the basic methodology for such a set-up in Linux or Unix systems at "<<passwordless.ssh.quickstart>>". If your cluster nodes use OS X, see the section, link:http://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_%28Single-Node_Cluster%29[SSH: Setting up Remote Desktop and Enabling Self-Login] on the Hadoop wiki.
+HBase uses the Secure Shell (ssh) command and utilities extensively to communicate between cluster nodes. Each server in the cluster must be running `ssh` so that the Hadoop and HBase daemons can be managed. You must be able to connect to all nodes via SSH, including the local node, from the Master as well as any backup Master, using a shared key rather than a password. You can see the basic methodology for such a set-up in Linux or Unix systems at "<<passwordless.ssh.quickstart>>". If your cluster nodes use OS X, see the section, link:https://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_%28Single-Node_Cluster%29[SSH: Setting up Remote Desktop and Enabling Self-Login] on the Hadoop wiki.

DNS::
HBase uses the local hostname to self-report its IP address. Both forward and reverse DNS resolving must work in versions of HBase previous to 0.92.0. The link:https://github.com/sujee/hadoop-dns-checker[hadoop-dns-checker] tool can be used to verify DNS is working correctly on the cluster. The project `README` file provides detailed instructions on usage.
@@ -180,13 +180,13 @@ Windows::


[[hadoop]]
-=== link:http://hadoop.apache.org[Hadoop](((Hadoop)))
+=== link:https://hadoop.apache.org[Hadoop](((Hadoop)))

The following table summarizes the versions of Hadoop supported with each version of HBase.
Based on the version of HBase, you should select the most appropriate version of Hadoop.
You can use Apache Hadoop, or a vendor's distribution of Hadoop.
No distinction is made here.
-See link:http://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support[the Hadoop wiki] for information about vendors of Hadoop.
+See link:https://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support[the Hadoop wiki] for information about vendors of Hadoop.

.Hadoop 2.x is recommended.
[TIP]
@@ -357,7 +357,7 @@ Distributed mode can be subdivided into distributed but all daemons run on a sin
The _pseudo-distributed_ vs. _fully-distributed_ nomenclature comes from Hadoop.

Pseudo-distributed mode can run against the local filesystem or it can run against an instance of the _Hadoop Distributed File System_ (HDFS). Fully-distributed mode can ONLY run on HDFS.
-See the Hadoop link:http://hadoop.apache.org/docs/current/[documentation] for how to set up HDFS.
+See the Hadoop link:https://hadoop.apache.org/docs/current/[documentation] for how to set up HDFS.
A good walk-through for setting up HDFS on Hadoop 2 can be found at http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide.

[[pseudo]]
@@ -575,7 +575,7 @@ A basic example _hbase-site.xml_ for client only may look as follows:
[[java.client.config]]
==== Java client configuration

-The configuration used by a Java client is kept in an link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HBaseConfiguration[HBaseConfiguration] instance.
+The configuration used by a Java client is kept in an link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HBaseConfiguration[HBaseConfiguration] instance.

The factory method on HBaseConfiguration, `HBaseConfiguration.create();`, on invocation, will read in the content of the first _hbase-site.xml_ found on the client's `CLASSPATH`, if one is present (Invocation will also factor in any _hbase-default.xml_ found; an _hbase-default.xml_ ships inside the _hbase.X.X.X.jar_). It is also possible to specify configuration directly without having to read from a _hbase-site.xml_.
For example, to set the ZooKeeper ensemble for the cluster programmatically do as follows:
@@ -586,7 +586,7 @@ Configuration config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zookeeper locally
----

-If multiple ZooKeeper instances make up your ZooKeeper ensemble, they may be specified in a comma-separated list (just as in the _hbase-site.xml_ file). This populated `Configuration` instance can then be passed to an link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table], and so on.
+If multiple ZooKeeper instances make up your ZooKeeper ensemble, they may be specified in a comma-separated list (just as in the _hbase-site.xml_ file). This populated `Configuration` instance can then be passed to an link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table], and so on.

[[example_config]]
== Example Configurations
@@ -820,7 +820,7 @@ See the entry for `hbase.hregion.majorcompaction` in the <<compaction.parameters
====
Major compactions are absolutely necessary for StoreFile clean-up.
Do not disable them altogether.
-You can run major compactions manually via the HBase shell or via the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact-org.apache.hadoop.hbase.TableName-[Admin API].
+You can run major compactions manually via the HBase shell or via the link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact-org.apache.hadoop.hbase.TableName-[Admin API].
====

For more information about compactions and the compaction file selection process, see <<compaction,compaction>>
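A minimal sketch of triggering a major compaction through the Admin API linked above; the table name is a placeholder, and the call only requests the compaction rather than waiting for it to complete.

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class TriggerMajorCompaction {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Admin admin = connection.getAdmin()) {
      admin.majorCompact(TableName.valueOf("myTable"));  // asynchronous request to the cluster
    }
  }
}
----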
@@ -61,7 +61,7 @@ coprocessor can severely degrade cluster performance and stability.

In HBase, you fetch data using a `Get` or `Scan`, whereas in an RDBMS you use a SQL
query. In order to fetch only the relevant data, you filter it using a HBase
-link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html[Filter]
+link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html[Filter]
, whereas in an RDBMS you use a `WHERE` predicate.

After fetching the data, you perform computations on it. This paradigm works well
@@ -121,8 +121,8 @@ package.

Observer coprocessors are triggered either before or after a specific event occurs.
Observers that happen before an event use methods that start with a `pre` prefix,
-such as link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#prePut-org.apache.hadoop.hbase.coprocessor.ObserverContext-org.apache.hadoop.hbase.client.Put-org.apache.hadoop.hbase.wal.WALEdit-org.apache.hadoop.hbase.client.Durability-[`prePut`]. Observers that happen just after an event override methods that start
+such as link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#prePut-org.apache.hadoop.hbase.coprocessor.ObserverContext-org.apache.hadoop.hbase.client.Put-org.apache.hadoop.hbase.wal.WALEdit-org.apache.hadoop.hbase.client.Durability-[`prePut`]. Observers that happen just after an event override methods that start
-with a `post` prefix, such as link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#postPut-org.apache.hadoop.hbase.coprocessor.ObserverContext-org.apache.hadoop.hbase.client.Put-org.apache.hadoop.hbase.wal.WALEdit-org.apache.hadoop.hbase.client.Durability-[`postPut`].
+with a `post` prefix, such as link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#postPut-org.apache.hadoop.hbase.coprocessor.ObserverContext-org.apache.hadoop.hbase.client.Put-org.apache.hadoop.hbase.wal.WALEdit-org.apache.hadoop.hbase.client.Durability-[`postPut`].


==== Use Cases for Observer Coprocessors
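A minimal sketch of the `prePut` hook referenced in the hunk above, written against the HBase 1.x base class `BaseRegionObserver` (HBase 2.0 coprocessors instead implement `RegionCoprocessor` and override the same hook on `RegionObserver`); the logging is a placeholder action.

[source,java]
----
import java.io.IOException;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

public class LoggingObserver extends BaseRegionObserver {
  @Override
  public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx, Put put,
      WALEdit edit, Durability durability) throws IOException {
    // Runs on the RegionServer before the Put is applied; validation or a
    // rejection (by throwing an IOException) would go here.
    System.out.println("prePut for row " + java.util.Arrays.toString(put.getRow()));
  }
}
----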
@@ -139,7 +139,7 @@ Referential Integrity::

Secondary Indexes::
You can use a coprocessor to maintain secondary indexes. For more information, see
-link:http://wiki.apache.org/hadoop/Hbase/SecondaryIndexing[SecondaryIndexing].
+link:https://wiki.apache.org/hadoop/Hbase/SecondaryIndexing[SecondaryIndexing].


==== Types of Observer Coprocessor
@@ -163,7 +163,7 @@ MasterObserver::
WalObserver::
A WalObserver allows you to observe events related to writes to the Write-Ahead
Log (WAL). See
-link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/WALObserver.html[WALObserver].
+link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/WALObserver.html[WALObserver].

<<cp_example,Examples>> provides working examples of observer coprocessors.

@ -270,21 +270,21 @@ Cell content is uninterpreted bytes
|
||||||
== Data Model Operations
|
== Data Model Operations
|
||||||
|
|
||||||
The four primary data model operations are Get, Put, Scan, and Delete.
|
The four primary data model operations are Get, Put, Scan, and Delete.
|
||||||
Operations are applied via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table] instances.
|
Operations are applied via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table] instances.
|
||||||
|
|
||||||
=== Get
|
=== Get
|
||||||
|
|
||||||
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] returns attributes for a specified row.
|
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] returns attributes for a specified row.
|
||||||
Gets are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#get-org.apache.hadoop.hbase.client.Get-[Table.get]
|
Gets are executed via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#get-org.apache.hadoop.hbase.client.Get-[Table.get]
|
||||||
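A minimal, hypothetical sketch of a Get (the table, family, and qualifier names are placeholders):

[source,java]
----
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class GetExample {
  public static void main(String[] args) throws Exception {
    try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = connection.getTable(TableName.valueOf("myTable"))) {
      Get get = new Get(Bytes.toBytes("row1"));                   // row key
      get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual"));  // optional: restrict to one column
      Result result = table.get(get);
      byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
      System.out.println(value == null ? "(no value)" : Bytes.toStringBinary(value));
    }
  }
}
----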
|
|
||||||
=== Put
|
=== Put
|
||||||
|
|
||||||
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] either adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put-org.apache.hadoop.hbase.client.Put-[Table.put] (non-writeBuffer) or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch-java.util.List-java.lang.Object:A-[Table.batch] (non-writeBuffer).
|
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] either adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put-org.apache.hadoop.hbase.client.Put-[Table.put] (non-writeBuffer) or link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch-java.util.List-java.lang.Object:A-[Table.batch] (non-writeBuffer).
|
||||||
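A minimal, hypothetical sketch of a Put (the table, family, qualifier, and value are placeholders):

[source,java]
----
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutExample {
  public static void main(String[] args) throws Exception {
    try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = connection.getTable(TableName.valueOf("myTable"))) {
      Put put = new Put(Bytes.toBytes("row1"));                   // row key
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual"),   // column family and qualifier
                    Bytes.toBytes("some value"));                 // cell value
      table.put(put);                                             // write the row
    }
  }
}
----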
|
|
||||||
[[scan]]
|
[[scan]]
|
||||||
=== Scans
|
=== Scans
|
||||||
|
|
||||||
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] allows iteration over multiple rows for specified attributes.
|
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] allows iteration over multiple rows for specified attributes.
|
||||||
|
|
||||||
The following is an example of a Scan on a Table instance.
|
The following is an example of a Scan on a Table instance.
|
||||||
Assume that a table is populated with rows with keys "row1", "row2", "row3", and then another set of rows with the keys "abc1", "abc2", and "abc3". The following example shows how to set a Scan instance to return the rows beginning with "row".
|
Assume that a table is populated with rows with keys "row1", "row2", "row3", and then another set of rows with the keys "abc1", "abc2", and "abc3". The following example shows how to set a Scan instance to return the rows beginning with "row".
|
||||||
|
@ -311,12 +311,12 @@ try {
|
||||||
}
|
}
|
||||||
----
|
----
|
||||||
|
|
||||||
Note that generally the easiest way to specify a specific stop point for a scan is by using the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/InclusiveStopFilter.html[InclusiveStopFilter] class.
|
Note that generally the easiest way to specify a specific stop point for a scan is by using the link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/InclusiveStopFilter.html[InclusiveStopFilter] class.
|
||||||
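A minimal, hypothetical sketch of such a bounded scan (the table name, row keys, and the older `setStartRow` call are assumptions for illustration):

[source,java]
----
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.InclusiveStopFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class InclusiveStopExample {
  public static void main(String[] args) throws Exception {
    try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = connection.getTable(TableName.valueOf("myTable"))) {
      Scan scan = new Scan();
      scan.setStartRow(Bytes.toBytes("row1"));
      // Unlike a plain stop row, InclusiveStopFilter includes the stop row itself.
      scan.setFilter(new InclusiveStopFilter(Bytes.toBytes("row3")));
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result result : scanner) {
          System.out.println(Bytes.toStringBinary(result.getRow()));
        }
      }
    }
  }
}
----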
|
|
||||||
=== Delete
|
=== Delete
|
||||||
|
|
||||||
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html[Delete] removes a row from a table.
|
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html[Delete] removes a row from a table.
|
||||||
Deletes are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete-org.apache.hadoop.hbase.client.Delete-[Table.delete].
|
Deletes are executed via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete-org.apache.hadoop.hbase.client.Delete-[Table.delete].
|
||||||
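A minimal, hypothetical sketch (the table name and row key are placeholders):

[source,java]
----
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteExample {
  public static void main(String[] args) throws Exception {
    try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = connection.getTable(TableName.valueOf("myTable"))) {
      // Deletes the whole row; Delete also has addColumn/addFamily for narrower deletes.
      table.delete(new Delete(Bytes.toBytes("row1")));
    }
  }
}
----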
|
|
||||||
HBase does not modify data in place, and so deletes are handled by creating new markers called _tombstones_.
|
HBase does not modify data in place, and so deletes are handled by creating new markers called _tombstones_.
|
||||||
These tombstones, along with the dead values, are cleaned up on major compactions.
|
These tombstones, along with the dead values, are cleaned up on major compactions.
|
||||||
|
@ -355,7 +355,7 @@ Prior to HBase 0.96, the default number of versions kept was `3`, but in 0.96 an
|
||||||
.Modify the Maximum Number of Versions for a Column Family
|
.Modify the Maximum Number of Versions for a Column Family
|
||||||
====
|
====
|
||||||
This example uses HBase Shell to keep a maximum of 5 versions of all columns in column family `f1`.
|
This example uses HBase Shell to keep a maximum of 5 versions of all columns in column family `f1`.
|
||||||
You could also use link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
|
You could also use link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
|
||||||
|
|
||||||
----
|
----
|
||||||
hbase> alter 't1', NAME => 'f1', VERSIONS => 5
|
hbase> alter 't1', NAME => 'f1', VERSIONS => 5
|
||||||
|
@ -367,7 +367,7 @@ hbase> alter ‘t1′, NAME => ‘f1′, VERSIONS => 5
|
||||||
You can also specify the minimum number of versions to store per column family.
|
You can also specify the minimum number of versions to store per column family.
|
||||||
By default, this is set to 0, which means the feature is disabled.
|
By default, this is set to 0, which means the feature is disabled.
|
||||||
The following example sets the minimum number of versions on all columns in column family `f1` to `2`, via HBase Shell.
|
The following example sets the minimum number of versions on all columns in column family `f1` to `2`, via HBase Shell.
|
||||||
You could also use link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
|
You could also use link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
|
||||||
|
|
||||||
----
|
----
|
||||||
hbase> alter 't1', NAME => 'f1', MIN_VERSIONS => 2
|
hbase> alter 't1', NAME => 'f1', MIN_VERSIONS => 2
|
||||||
|
@ -385,12 +385,12 @@ In this section we look at the behavior of the version dimension for each of the
|
||||||
==== Get/Scan
|
==== Get/Scan
|
||||||
|
|
||||||
Gets are implemented on top of Scans.
|
Gets are implemented on top of Scans.
|
||||||
The below discussion of link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] applies equally to link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scans].
|
The below discussion of link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] applies equally to link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scans].
|
||||||
|
|
||||||
By default, i.e. if you specify no explicit version, when doing a `get`, the cell whose version has the largest value is returned (which may or may not be the latest one written, see later). The default behavior can be modified in the following ways:
|
By default, i.e. if you specify no explicit version, when doing a `get`, the cell whose version has the largest value is returned (which may or may not be the latest one written, see later). The default behavior can be modified in the following ways:
|
||||||
|
|
||||||
* to return more than one version, see link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setMaxVersions--[Get.setMaxVersions()]
|
* to return more than one version, see link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setMaxVersions--[Get.setMaxVersions()]
|
||||||
* to return versions other than the latest, see link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setTimeRange-long-long-[Get.setTimeRange()]
|
* to return versions other than the latest, see link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setTimeRange-long-long-[Get.setTimeRange()]
|
||||||
+
|
+
|
||||||
To retrieve the latest version that is less than or equal to a given value, thus giving the 'latest' state of the record at a certain point in time, just use a range from 0 to the desired version and set the max versions to 1.
|
To retrieve the latest version that is less than or equal to a given value, thus giving the 'latest' state of the record at a certain point in time, just use a range from 0 to the desired version and set the max versions to 1.
|
||||||
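As a hedged illustration of that last point (the table name, row key, and timestamp are made up; note that the upper bound passed to `setTimeRange` is exclusive):

[source,java]
----
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class GetAsOfExample {
  public static void main(String[] args) throws Exception {
    long asOfTimestamp = 1500000000000L;  // placeholder "as of" point in time
    try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = connection.getTable(TableName.valueOf("myTable"))) {
      Get get = new Get(Bytes.toBytes("row1"));
      // Range is [0, asOfTimestamp + 1), so a cell written exactly at asOfTimestamp is included.
      get.setTimeRange(0, asOfTimestamp + 1);
      get.setMaxVersions(1);  // only the newest version within that range
      Result result = table.get(get);
      System.out.println(result);
    }
  }
}
----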
|
|
||||||
|
|
|
@ -46,7 +46,7 @@ As Apache HBase is an Apache Software Foundation project, see <<asf,asf>>
|
||||||
=== Mailing Lists
|
=== Mailing Lists
|
||||||
|
|
||||||
Sign up for the dev-list and the user-list.
|
Sign up for the dev-list and the user-list.
|
||||||
See the link:http://hbase.apache.org/mail-lists.html[mailing lists] page.
|
See the link:https://hbase.apache.org/mail-lists.html[mailing lists] page.
|
||||||
Posing questions - and helping to answer other people's questions - is encouraged! There are varying levels of experience on both lists so patience and politeness are encouraged (and please stay on topic.)
|
Posing questions - and helping to answer other people's questions - is encouraged! There are varying levels of experience on both lists so patience and politeness are encouraged (and please stay on topic.)
|
||||||
|
|
||||||
[[slack]]
|
[[slack]]
|
||||||
|
@ -173,8 +173,8 @@ GIT is our repository of record for all but the Apache HBase website.
|
||||||
We used to be on SVN.
|
We used to be on SVN.
|
||||||
We migrated.
|
We migrated.
|
||||||
See link:https://issues.apache.org/jira/browse/INFRA-7768[Migrate Apache HBase SVN Repos to Git].
|
See link:https://issues.apache.org/jira/browse/INFRA-7768[Migrate Apache HBase SVN Repos to Git].
|
||||||
See link:http://hbase.apache.org/source-repository.html[Source Code
|
See link:https://hbase.apache.org/source-repository.html[Source Code
|
||||||
Management] page for contributor and committer links or search for HBase on the link:http://git.apache.org/[Apache Git] page.
|
Management] page for contributor and committer links or search for HBase on the link:https://git.apache.org/[Apache Git] page.
|
||||||
|
|
||||||
== IDEs
|
== IDEs
|
||||||
|
|
||||||
|
@ -583,7 +583,7 @@ the checking of the produced artifacts to ensure they are 'good' --
|
||||||
e.g. extracting the produced tarballs, verifying that they
|
e.g. extracting the produced tarballs, verifying that they
|
||||||
look right, then starting HBase and checking that everything is running
|
look right, then starting HBase and checking that everything is running
|
||||||
correctly -- or the signing and pushing of the tarballs to
|
correctly -- or the signing and pushing of the tarballs to
|
||||||
link:http://people.apache.org[people.apache.org].
|
link:https://people.apache.org[people.apache.org].
|
||||||
Take a look. Modify/improve as you see fit.
|
Take a look. Modify/improve as you see fit.
|
||||||
====
|
====
|
||||||
|
|
||||||
|
@ -763,7 +763,7 @@ To finish the release, take up the script from here on out.
|
||||||
+
|
+
|
||||||
The artifacts are in the maven repository in the staging area in the 'open' state.
|
The artifacts are in the maven repository in the staging area in the 'open' state.
|
||||||
While in this 'open' state you can check out what you've published to make sure all is good.
|
While in this 'open' state you can check out what you've published to make sure all is good.
|
||||||
To do this, log in to Apache's Nexus at link:http://repository.apache.org[repository.apache.org] using your Apache ID.
|
To do this, log in to Apache's Nexus at link:https://repository.apache.org[repository.apache.org] using your Apache ID.
|
||||||
Find your artifacts in the staging repository. Click on 'Staging Repositories' and look for a new one ending in "hbase" with a status of 'Open', and select it.
|
Find your artifacts in the staging repository. Click on 'Staging Repositories' and look for a new one ending in "hbase" with a status of 'Open', and select it.
|
||||||
Use the tree view to expand the list of repository contents and inspect if the artifacts you expect are present. Check the POMs.
|
Use the tree view to expand the list of repository contents and inspect if the artifacts you expect are present. Check the POMs.
|
||||||
As long as the staging repo is open you can re-upload if something is missing or built incorrectly.
|
As long as the staging repo is open you can re-upload if something is missing or built incorrectly.
|
||||||
|
@ -785,7 +785,7 @@ Be sure to edit the pom to point to the proper staging repository.
|
||||||
Make sure you are pulling from the remote repository when tests run and not from your local repository, either by passing the `-U` flag or by deleting your local repository contents, and check that maven is pulling from the remote staging repository.
|
Make sure you are pulling from the remote repository when tests run and not from your local repository, either by passing the `-U` flag or by deleting your local repository contents, and check that maven is pulling from the remote staging repository.
|
||||||
====
|
====
|
||||||
|
|
||||||
See link:http://www.apache.org/dev/publishing-maven-artifacts.html[Publishing Maven Artifacts] for some pointers on this maven staging process.
|
See link:https://www.apache.org/dev/publishing-maven-artifacts.html[Publishing Maven Artifacts] for some pointers on this maven staging process.
|
||||||
|
|
||||||
If the HBase version ends in `-SNAPSHOT`, the artifacts go elsewhere.
|
If the HBase version ends in `-SNAPSHOT`, the artifacts go elsewhere.
|
||||||
They are put into the Apache snapshots repository directly and are immediately available.
|
They are put into the Apache snapshots repository directly and are immediately available.
|
||||||
|
@ -869,7 +869,7 @@ This plugin is run when you specify the +site+ goal as in when you run +mvn site
|
||||||
See <<appendix_contributing_to_documentation,appendix contributing to documentation>> for more information on building the documentation.
|
See <<appendix_contributing_to_documentation,appendix contributing to documentation>> for more information on building the documentation.
|
||||||
|
|
||||||
[[hbase.org]]
|
[[hbase.org]]
|
||||||
== Updating link:http://hbase.apache.org[hbase.apache.org]
|
== Updating link:https://hbase.apache.org[hbase.apache.org]
|
||||||
|
|
||||||
[[hbase.org.site.contributing]]
|
[[hbase.org.site.contributing]]
|
||||||
=== Contributing to hbase.apache.org
|
=== Contributing to hbase.apache.org
|
||||||
|
@ -877,7 +877,7 @@ See <<appendix_contributing_to_documentation,appendix contributing to documentat
|
||||||
See <<appendix_contributing_to_documentation,appendix contributing to documentation>> for more information on contributing to the documentation or website.
|
See <<appendix_contributing_to_documentation,appendix contributing to documentation>> for more information on contributing to the documentation or website.
|
||||||
|
|
||||||
[[hbase.org.site.publishing]]
|
[[hbase.org.site.publishing]]
|
||||||
=== Publishing link:http://hbase.apache.org[hbase.apache.org]
|
=== Publishing link:https://hbase.apache.org[hbase.apache.org]
|
||||||
|
|
||||||
See <<website_publish>> for instructions on publishing the website and documentation.
|
See <<website_publish>> for instructions on publishing the website and documentation.
|
||||||
|
|
||||||
|
@ -1277,7 +1277,7 @@ $ mvn clean install test -Dtest=TestZooKeeper -PskipIntegrationTests
|
||||||
==== Running integration tests against mini cluster
|
==== Running integration tests against mini cluster
|
||||||
|
|
||||||
HBase 0.92 added a `verify` maven target.
|
HBase 0.92 added a `verify` maven target.
|
||||||
Invoking it, for example by doing `mvn verify`, will run all the phases up to and including the verify phase via the maven link:http://maven.apache.org/plugins/maven-failsafe-plugin/[failsafe
|
Invoking it, for example by doing `mvn verify`, will run all the phases up to and including the verify phase via the maven link:https://maven.apache.org/plugins/maven-failsafe-plugin/[failsafe
|
||||||
plugin], running all the above mentioned HBase unit tests as well as tests that are in the HBase integration test group.
|
plugin], running all the above mentioned HBase unit tests as well as tests that are in the HBase integration test group.
|
||||||
After you have completed +mvn install -DskipTests+, you can run just the integration tests by invoking:
|
After you have completed +mvn install -DskipTests+, you can run just the integration tests by invoking:
|
||||||
|
|
||||||
|
@ -1332,7 +1332,7 @@ Currently there is no support for running integration tests against a distribute
|
||||||
The tests interact with the distributed cluster by using the methods in the `DistributedHBaseCluster` (implementing `HBaseCluster`) class, which in turn uses a pluggable `ClusterManager`.
|
The tests interact with the distributed cluster by using the methods in the `DistributedHBaseCluster` (implementing `HBaseCluster`) class, which in turn uses a pluggable `ClusterManager`.
|
||||||
Concrete implementations provide actual functionality for carrying out deployment-specific and environment-dependent tasks (SSH, etc). The default `ClusterManager` is `HBaseClusterManager`, which uses SSH to remotely execute start/stop/kill/signal commands, and assumes some posix commands (ps, etc). It also assumes that the user running the test has enough "power" to start/stop servers on the remote machines.
|
Concrete implementations provide actual functionality for carrying out deployment-specific and environment-dependent tasks (SSH, etc). The default `ClusterManager` is `HBaseClusterManager`, which uses SSH to remotely execute start/stop/kill/signal commands, and assumes some posix commands (ps, etc). It also assumes that the user running the test has enough "power" to start/stop servers on the remote machines.
|
||||||
By default, it picks up `HBASE_SSH_OPTS`, `HBASE_HOME`, `HBASE_CONF_DIR` from the env, and uses `bin/hbase-daemon.sh` to carry out the actions.
|
By default, it picks up `HBASE_SSH_OPTS`, `HBASE_HOME`, `HBASE_CONF_DIR` from the env, and uses `bin/hbase-daemon.sh` to carry out the actions.
|
||||||
Currently tarball deployments, deployments which use _hbase-daemons.sh_, and link:http://incubator.apache.org/ambari/[Apache Ambari] deployments are supported.
|
Currently tarball deployments, deployments which use _hbase-daemons.sh_, and link:https://incubator.apache.org/ambari/[Apache Ambari] deployments are supported.
|
||||||
_/etc/init.d/_ scripts are not supported for now, but support can be added easily.
|
_/etc/init.d/_ scripts are not supported for now, but support can be added easily.
|
||||||
For other deployment options, a ClusterManager can be implemented and plugged in.
|
For other deployment options, a ClusterManager can be implemented and plugged in.
|
||||||
|
|
||||||
|
@ -1844,10 +1844,10 @@ The script checks the directory for sub-directory called _.git/_, before proceed
|
||||||
=== Submitting Patches
|
=== Submitting Patches
|
||||||
|
|
||||||
If you are new to submitting patches to open source or new to submitting patches to Apache, start by
|
If you are new to submitting patches to open source or new to submitting patches to Apache, start by
|
||||||
reading the link:http://commons.apache.org/patches.html[On Contributing Patches] page from
|
reading the link:https://commons.apache.org/patches.html[On Contributing Patches] page from
|
||||||
link:http://commons.apache.org/[Apache Commons Project].
|
link:https://commons.apache.org/[Apache Commons Project].
|
||||||
It provides a nice overview that applies equally to the Apache HBase Project.
|
It provides a nice overview that applies equally to the Apache HBase Project.
|
||||||
link:http://accumulo.apache.org/git.html[Accumulo doc on how to contribute and develop] is also
|
link:https://accumulo.apache.org/git.html[Accumulo doc on how to contribute and develop] is also
|
||||||
a good read to understand the development workflow.
|
a good read to understand the development workflow.
|
||||||
|
|
||||||
[[submitting.patches.create]]
|
[[submitting.patches.create]]
|
||||||
|
@ -1941,11 +1941,11 @@ Significant new features should provide an integration test in addition to unit
|
||||||
[[reviewboard]]
|
[[reviewboard]]
|
||||||
==== ReviewBoard
|
==== ReviewBoard
|
||||||
|
|
||||||
Patches larger than one screen, or patches that will be tricky to review, should go through link:http://reviews.apache.org[ReviewBoard].
|
Patches larger than one screen, or patches that will be tricky to review, should go through link:https://reviews.apache.org[ReviewBoard].
|
||||||
|
|
||||||
.Procedure: Use ReviewBoard
|
.Procedure: Use ReviewBoard
|
||||||
. Register for an account if you don't already have one.
|
. Register for an account if you don't already have one.
|
||||||
It does not use the credentials from link:http://issues.apache.org[issues.apache.org].
|
It does not use the credentials from link:https://issues.apache.org[issues.apache.org].
|
||||||
Log in.
|
Log in.
|
||||||
. Click [label]#New Review Request#.
|
. Click [label]#New Review Request#.
|
||||||
. Choose the `hbase-git` repository.
|
. Choose the `hbase-git` repository.
|
||||||
|
@ -1971,8 +1971,8 @@ For more information on how to use ReviewBoard, see link:http://www.reviewboard.
|
||||||
|
|
||||||
New committers are encouraged to first read Apache's generic committer documentation:
|
New committers are encouraged to first read Apache's generic committer documentation:
|
||||||
|
|
||||||
* link:http://www.apache.org/dev/new-committers-guide.html[Apache New Committer Guide]
|
* link:https://www.apache.org/dev/new-committers-guide.html[Apache New Committer Guide]
|
||||||
* link:http://www.apache.org/dev/committers.html[Apache Committer FAQ]
|
* link:https://www.apache.org/dev/committers.html[Apache Committer FAQ]
|
||||||
|
|
||||||
===== Review
|
===== Review
|
||||||
|
|
||||||
|
|
|
@ -29,7 +29,7 @@
|
||||||
|
|
||||||
This chapter will cover access to Apache HBase either through non-Java languages or
|
This chapter will cover access to Apache HBase either through non-Java languages or
|
||||||
through custom protocols. For information on using the native HBase APIs, refer to
|
through custom protocols. For information on using the native HBase APIs, refer to
|
||||||
link:http://hbase.apache.org/apidocs/index.html[User API Reference] and the
|
link:https://hbase.apache.org/apidocs/index.html[User API Reference] and the
|
||||||
<<hbase_apis,HBase APIs>> chapter.
|
<<hbase_apis,HBase APIs>> chapter.
|
||||||
|
|
||||||
== REST
|
== REST
|
||||||
|
@ -641,8 +641,8 @@ represent persistent data.
|
||||||
This code example has the following dependencies:
|
This code example has the following dependencies:
|
||||||
|
|
||||||
. HBase 0.90.x or newer
|
. HBase 0.90.x or newer
|
||||||
. commons-beanutils.jar (http://commons.apache.org/)
|
. commons-beanutils.jar (https://commons.apache.org/)
|
||||||
. commons-pool-1.5.5.jar (http://commons.apache.org/)
|
. commons-pool-1.5.5.jar (https://commons.apache.org/)
|
||||||
. transactional-tableindexed for HBase 0.90 (https://github.com/hbase-trx/hbase-transactional-tableindexed)
|
. transactional-tableindexed for HBase 0.90 (https://github.com/hbase-trx/hbase-transactional-tableindexed)
|
||||||
|
|
||||||
.Download `hbase-jdo`
|
.Download `hbase-jdo`
|
||||||
|
@ -802,7 +802,7 @@ with HBase.
|
||||||
----
|
----
|
||||||
resolvers += "Apache HBase" at "https://repository.apache.org/content/repositories/releases"
|
resolvers += "Apache HBase" at "https://repository.apache.org/content/repositories/releases"
|
||||||
|
|
||||||
resolvers += "Thrift" at "http://people.apache.org/~rawson/repo/"
|
resolvers += "Thrift" at "https://people.apache.org/~rawson/repo/"
|
||||||
|
|
||||||
libraryDependencies ++= Seq(
|
libraryDependencies ++= Seq(
|
||||||
"org.apache.hadoop" % "hadoop-core" % "0.20.2",
|
"org.apache.hadoop" % "hadoop-core" % "0.20.2",
|
||||||
|
|
|
@ -33,10 +33,10 @@ When should I use HBase?::
|
||||||
See <<arch.overview>> in the Architecture chapter.
|
See <<arch.overview>> in the Architecture chapter.
|
||||||
|
|
||||||
Are there other HBase FAQs?::
|
Are there other HBase FAQs?::
|
||||||
See the FAQ that is up on the wiki, link:http://wiki.apache.org/hadoop/Hbase/FAQ[HBase Wiki FAQ].
|
See the FAQ that is up on the wiki, link:https://wiki.apache.org/hadoop/Hbase/FAQ[HBase Wiki FAQ].
|
||||||
|
|
||||||
Does HBase support SQL?::
|
Does HBase support SQL?::
|
||||||
Not really. SQL-ish support for HBase via link:http://hive.apache.org/[Hive] is in development; however, Hive is based on MapReduce, which is not generally suitable for low-latency requests. See the <<datamodel>> section for examples on the HBase client.
|
Not really. SQL-ish support for HBase via link:https://hive.apache.org/[Hive] is in development; however, Hive is based on MapReduce, which is not generally suitable for low-latency requests. See the <<datamodel>> section for examples on the HBase client.
|
||||||
|
|
||||||
How can I find examples of NoSQL/HBase?::
|
How can I find examples of NoSQL/HBase?::
|
||||||
See the link to the BigTable paper in <<other.info>>, as well as the other papers.
|
See the link to the BigTable paper in <<other.info>>, as well as the other papers.
|
||||||
|
|
|
@ -70,7 +70,7 @@ See <<java,Java>> for information about supported JDK versions.
|
||||||
=== Get Started with HBase
|
=== Get Started with HBase
|
||||||
|
|
||||||
.Procedure: Download, Configure, and Start HBase in Standalone Mode
|
.Procedure: Download, Configure, and Start HBase in Standalone Mode
|
||||||
. Choose a download site from this list of link:http://www.apache.org/dyn/closer.cgi/hbase/[Apache Download Mirrors].
|
. Choose a download site from this list of link:https://www.apache.org/dyn/closer.cgi/hbase/[Apache Download Mirrors].
|
||||||
Click on the suggested top link.
|
Click on the suggested top link.
|
||||||
This will take you to a mirror of _HBase Releases_.
|
This will take you to a mirror of _HBase Releases_.
|
||||||
Click on the folder named _stable_ and then download the binary file that ends in _.tar.gz_ to your local filesystem.
|
Click on the folder named _stable_ and then download the binary file that ends in _.tar.gz_ to your local filesystem.
|
||||||
|
@ -307,7 +307,7 @@ You can skip the HDFS configuration to continue storing your data in the local f
|
||||||
This procedure assumes that you have configured Hadoop and HDFS on your local system and/or a remote
|
This procedure assumes that you have configured Hadoop and HDFS on your local system and/or a remote
|
||||||
system, and that they are running and available. It also assumes you are using Hadoop 2.
|
system, and that they are running and available. It also assumes you are using Hadoop 2.
|
||||||
The guide on
|
The guide on
|
||||||
link:http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html[Setting up a Single Node Cluster]
|
link:https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html[Setting up a Single Node Cluster]
|
||||||
in the Hadoop documentation is a good starting point.
|
in the Hadoop documentation is a good starting point.
|
||||||
====
|
====
|
||||||
|
|
||||||
|
|
|
@ -476,7 +476,7 @@ The host name or IP address of the name server (DNS)
|
||||||
ZooKeeper session timeout in milliseconds. It is used in two different ways.
|
ZooKeeper session timeout in milliseconds. It is used in two different ways.
|
||||||
First, this value is used in the ZK client that HBase uses to connect to the ensemble.
|
First, this value is used in the ZK client that HBase uses to connect to the ensemble.
|
||||||
It is also used by HBase when it starts a ZK server and it is passed as the 'maxSessionTimeout'. See
|
It is also used by HBase when it starts a ZK server and it is passed as the 'maxSessionTimeout'. See
|
||||||
http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions.
|
https://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions.
|
||||||
For example, if an HBase region server connects to a ZK ensemble that's also managed
|
For example, if an HBase region server connects to a ZK ensemble that's also managed
|
||||||
by HBase, then the
|
by HBase, then the
|
||||||
session timeout will be the one specified by this configuration. But, a region server that connects
|
session timeout will be the one specified by this configuration. But, a region server that connects
|
||||||
|
@ -540,7 +540,7 @@ The host name or IP address of the name server (DNS)
|
||||||
+
|
+
|
||||||
.Description
|
.Description
|
||||||
Port used by ZooKeeper peers to talk to each other.
|
Port used by ZooKeeper peers to talk to each other.
|
||||||
See http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperStarted.html#sc_RunningReplicatedZooKeeper
|
See https://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperStarted.html#sc_RunningReplicatedZooKeeper
|
||||||
for more information.
|
for more information.
|
||||||
+
|
+
|
||||||
.Default
|
.Default
|
||||||
|
@ -552,7 +552,7 @@ Port used by ZooKeeper peers to talk to each other.
|
||||||
+
|
+
|
||||||
.Description
|
.Description
|
||||||
Port used by ZooKeeper for leader election.
|
Port used by ZooKeeper for leader election.
|
||||||
See http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperStarted.html#sc_RunningReplicatedZooKeeper
|
See https://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperStarted.html#sc_RunningReplicatedZooKeeper
|
||||||
for more information.
|
for more information.
|
||||||
+
|
+
|
||||||
.Default
|
.Default
|
||||||
|
@ -1264,7 +1264,7 @@ A comma-separated list of sizes for buckets for the bucketcache
|
||||||
+
|
+
|
||||||
.Description
|
.Description
|
||||||
The HFile format version to use for new files.
|
The HFile format version to use for new files.
|
||||||
Version 3 adds support for tags in hfiles (See http://hbase.apache.org/book.html#hbase.tags).
|
Version 3 adds support for tags in hfiles (See https://hbase.apache.org/book.html#hbase.tags).
|
||||||
Distributed Log Replay requires that tags are enabled. Also see the configuration
|
Distributed Log Replay requires that tags are enabled. Also see the configuration
|
||||||
'hbase.replication.rpc.codec'.
|
'hbase.replication.rpc.codec'.
|
||||||
|
|
||||||
|
@ -1964,7 +1964,7 @@ If the DFSClient configuration
|
||||||
|
|
||||||
Class used to execute the regions balancing when the period occurs.
|
Class used to execute the regions balancing when the period occurs.
|
||||||
See the class comment for more on how it works
|
See the class comment for more on how it works
|
||||||
http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.html
|
https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.html
|
||||||
It replaces the DefaultLoadBalancer as the default (since renamed
|
It replaces the DefaultLoadBalancer as the default (since renamed
|
||||||
as the SimpleLoadBalancer).
|
as the SimpleLoadBalancer).
|
||||||
|
|
||||||
|
|
|
@ -28,7 +28,7 @@
|
||||||
:experimental:
|
:experimental:
|
||||||
|
|
||||||
This chapter provides information about performing operations using HBase native APIs.
|
This chapter provides information about performing operations using HBase native APIs.
|
||||||
This information is not exhaustive, and provides a quick reference in addition to the link:http://hbase.apache.org/apidocs/index.html[User API Reference].
|
This information is not exhaustive, and provides a quick reference in addition to the link:https://hbase.apache.org/apidocs/index.html[User API Reference].
|
||||||
The examples here are not comprehensive or complete, and should be used for purposes of illustration only.
|
The examples here are not comprehensive or complete, and should be used for purposes of illustration only.
|
||||||
|
|
||||||
Apache HBase also works with multiple external APIs.
|
Apache HBase also works with multiple external APIs.
|
||||||
|
|
|
@ -27,10 +27,10 @@
|
||||||
:icons: font
|
:icons: font
|
||||||
:experimental:
|
:experimental:
|
||||||
|
|
||||||
Apache MapReduce is a software framework used to analyze large amounts of data. It is provided by link:http://hadoop.apache.org/[Apache Hadoop].
|
Apache MapReduce is a software framework used to analyze large amounts of data. It is provided by link:https://hadoop.apache.org/[Apache Hadoop].
|
||||||
MapReduce itself is out of the scope of this document.
|
MapReduce itself is out of the scope of this document.
|
||||||
A good place to get started with MapReduce is http://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html.
|
A good place to get started with MapReduce is https://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html.
|
||||||
MapReduce version 2 (MR2) is now part of link:http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/[YARN].
|
MapReduce version 2 (MR2) is now part of link:https://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/[YARN].
|
||||||
|
|
||||||
This chapter discusses specific configuration steps you need to take to use MapReduce on data within HBase.
|
This chapter discusses specific configuration steps you need to take to use MapReduce on data within HBase.
|
||||||
In addition, it discusses other interactions and issues between HBase and MapReduce
|
In addition, it discusses other interactions and issues between HBase and MapReduce
|
||||||
|
@ -70,7 +70,7 @@ job runner letting hbase utility pick out from the full-on classpath what it nee
|
||||||
MapReduce job configuration (See the source at `TableMapReduceUtil#addDependencyJars(org.apache.hadoop.mapreduce.Job)` for how this is done).
|
MapReduce job configuration (See the source at `TableMapReduceUtil#addDependencyJars(org.apache.hadoop.mapreduce.Job)` for how this is done).
|
||||||
|
|
||||||
|
|
||||||
The following example runs the bundled HBase link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter] MapReduce job against a table named `usertable`.
|
The following example runs the bundled HBase link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter] MapReduce job against a table named `usertable`.
|
||||||
It sets into `HADOOP_CLASSPATH` the jars hbase needs to run in a MapReduce context (including configuration files such as hbase-site.xml).
|
It sets into `HADOOP_CLASSPATH` the jars hbase needs to run in a MapReduce context (including configuration files such as hbase-site.xml).
|
||||||
Be sure to use the correct version of the HBase JAR for your system; replace the VERSION string in the below command line w/ the version of
|
Be sure to use the correct version of the HBase JAR for your system; replace the VERSION string in the below command line w/ the version of
|
||||||
your local hbase install. The backticks (``` symbols) cause the shell to execute the sub-commands, setting the output of `hbase classpath` into `HADOOP_CLASSPATH`.
|
your local hbase install. The backticks (``` symbols) cause the shell to execute the sub-commands, setting the output of `hbase classpath` into `HADOOP_CLASSPATH`.
|
||||||
|
@ -259,10 +259,10 @@ $ ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-server-VERSION.jar rowcounte
|
||||||
|
|
||||||
== HBase as a MapReduce Job Data Source and Data Sink
|
== HBase as a MapReduce Job Data Source and Data Sink
|
||||||
|
|
||||||
HBase can be used as a data source, link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html[TableInputFormat], and data sink, link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html[TableOutputFormat] or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/MultiTableOutputFormat.html[MultiTableOutputFormat], for MapReduce jobs.
|
HBase can be used as a data source, link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html[TableInputFormat], and data sink, link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html[TableOutputFormat] or link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/MultiTableOutputFormat.html[MultiTableOutputFormat], for MapReduce jobs.
|
||||||
When writing MapReduce jobs that read or write HBase, it is advisable to subclass link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapper.html[TableMapper] and/or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableReducer.html[TableReducer].
|
When writing MapReduce jobs that read or write HBase, it is advisable to subclass link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapper.html[TableMapper] and/or link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableReducer.html[TableReducer].
|
||||||
See the do-nothing pass-through classes link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/IdentityTableMapper.html[IdentityTableMapper] and link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/IdentityTableReducer.html[IdentityTableReducer] for basic usage.
|
See the do-nothing pass-through classes link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/IdentityTableMapper.html[IdentityTableMapper] and link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/IdentityTableReducer.html[IdentityTableReducer] for basic usage.
|
||||||
For a more involved example, see link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter] or review the `org.apache.hadoop.hbase.mapreduce.TestTableMapReduce` unit test.
|
For a more involved example, see link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter] or review the `org.apache.hadoop.hbase.mapreduce.TestTableMapReduce` unit test.
|
||||||
|
|
||||||
If you run MapReduce jobs that use HBase as a source or sink, you need to specify the source and sink table and column names in your configuration.
|
If you run MapReduce jobs that use HBase as a source or sink, you need to specify the source and sink table and column names in your configuration.
|
||||||
|
|
||||||
|
@ -275,7 +275,7 @@ On insert, HBase 'sorts' so there is no point double-sorting (and shuffling data
|
||||||
If you do not need the Reduce, your map might emit counts of records processed for reporting at the end of the job, or set the number of Reduces to zero and use TableOutputFormat.
|
If you do not need the Reduce, your map might emit counts of records processed for reporting at the end of the job, or set the number of Reduces to zero and use TableOutputFormat.
|
||||||
If running the Reduce step makes sense in your case, you should typically use multiple reducers so that load is spread across the HBase cluster.
|
If running the Reduce step makes sense in your case, you should typically use multiple reducers so that load is spread across the HBase cluster.
|
||||||
|
|
||||||
A new HBase partitioner, the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/HRegionPartitioner.html[HRegionPartitioner], can run as many reducers as there are existing regions.
|
A new HBase partitioner, the link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/HRegionPartitioner.html[HRegionPartitioner], can run as many reducers as there are existing regions.
|
||||||
The HRegionPartitioner is suitable when your table is large and your upload will not greatly alter the number of existing regions upon completion.
|
The HRegionPartitioner is suitable when your table is large and your upload will not greatly alter the number of existing regions upon completion.
|
||||||
Otherwise use the default partitioner.
|
Otherwise use the default partitioner.
|
||||||
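As a hedged sketch of the reducer-side setup only (the table name, job name, and choice of `IdentityTableReducer` are placeholders; the four-argument `initTableReducerJob` overload accepts a partitioner class):

[source,java]
----
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.HRegionPartitioner;
import org.apache.hadoop.hbase.mapreduce.IdentityTableReducer;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

public class PartitionerSetup {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(HBaseConfiguration.create(), "ExampleWriteWithPartitioner");
    // HRegionPartitioner routes each reducer the keys belonging to one existing
    // region of the output table, so reducer count tracks region count.
    TableMapReduceUtil.initTableReducerJob(
        "targetTable",                // output table (placeholder name)
        IdentityTableReducer.class,   // reducer
        job,
        HRegionPartitioner.class);    // partitioner
  }
}
----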
|
|
||||||
|
@ -286,7 +286,7 @@ For more on how this mechanism works, see <<arch.bulk.load>>.
|
||||||
|
|
||||||
== RowCounter Example
|
== RowCounter Example
|
||||||
|
|
||||||
The included link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter] MapReduce job uses `TableInputFormat` and does a count of all rows in the specified table.
|
The included link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter] MapReduce job uses `TableInputFormat` and does a count of all rows in the specified table.
|
||||||
To run it, use the following command:
|
To run it, use the following command:
|
||||||
|
|
||||||
[source,bash]
|
[source,bash]
|
||||||
|
@ -306,13 +306,13 @@ If you have classpath errors, see <<hbase.mapreduce.classpath>>.
|
||||||
[[splitter.default]]
|
[[splitter.default]]
|
||||||
=== The Default HBase MapReduce Splitter
|
=== The Default HBase MapReduce Splitter
|
||||||
|
|
||||||
When link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html[TableInputFormat] is used to source an HBase table in a MapReduce job, its splitter will make a map task for each region of the table.
|
When link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html[TableInputFormat] is used to source an HBase table in a MapReduce job, its splitter will make a map task for each region of the table.
|
||||||
Thus, if there are 100 regions in the table, there will be 100 map-tasks for the job - regardless of how many column families are selected in the Scan.
|
Thus, if there are 100 regions in the table, there will be 100 map-tasks for the job - regardless of how many column families are selected in the Scan.
|
||||||
|
|
||||||
[[splitter.custom]]
|
[[splitter.custom]]
|
||||||
=== Custom Splitters
|
=== Custom Splitters
|
||||||
|
|
||||||
For those interested in implementing custom splitters, see the method `getSplits` in link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html[TableInputFormatBase].
|
For those interested in implementing custom splitters, see the method `getSplits` in link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html[TableInputFormatBase].
|
||||||
That is where the logic for map-task assignment resides.
|
That is where the logic for map-task assignment resides.
|
||||||
|
|
||||||
[[mapreduce.example]]
|
[[mapreduce.example]]
|
||||||
|
@ -352,7 +352,7 @@ if (!b) {
|
||||||
}
|
}
|
||||||
----
|
----
|
||||||
|
|
||||||
...and the mapper instance would extend link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapper.html[TableMapper]...
|
...and the mapper instance would extend link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapper.html[TableMapper]...
|
||||||
|
|
||||||
[source,java]
|
[source,java]
|
||||||
----
|
----
|
||||||
|
@ -400,7 +400,7 @@ if (!b) {
|
||||||
}
|
}
|
||||||
----
|
----
|
||||||
|
|
||||||
An explanation is required of what `TableMapReduceUtil` is doing, especially with the reducer. link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html[TableOutputFormat] is being used as the outputFormat class, and several parameters are being set on the config (e.g., `TableOutputFormat.OUTPUT_TABLE`), as well as setting the reducer output key to `ImmutableBytesWritable` and reducer value to `Writable`.
|
An explanation is required of what `TableMapReduceUtil` is doing, especially with the reducer. link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html[TableOutputFormat] is being used as the outputFormat class, and several parameters are being set on the config (e.g., `TableOutputFormat.OUTPUT_TABLE`), as well as setting the reducer output key to `ImmutableBytesWritable` and reducer value to `Writable`.
|
||||||
These could be set by the programmer on the job and conf, but `TableMapReduceUtil` tries to make things easier.
|
These could be set by the programmer on the job and conf, but `TableMapReduceUtil` tries to make things easier.
|
||||||
|
|
||||||
The following is the example mapper, which will create a `Put` matching the input `Result` and emit it.
|
The following is the example mapper, which will create a `Put` matching the input `Result` and emit it.
|
||||||
|
|
|
@ -664,7 +664,7 @@ To NOT run WALPlayer as a mapreduce job on your cluster, force it to run all in
|
||||||
[[rowcounter]]
|
[[rowcounter]]
|
||||||
=== RowCounter and CellCounter
|
=== RowCounter and CellCounter
|
||||||
|
|
||||||
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter] is a mapreduce job to count all the rows of a table.
|
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter] is a mapreduce job to count all the rows of a table.
|
||||||
This is a good utility to use as a sanity check to ensure that HBase can read all the blocks of a table if there are any concerns of metadata inconsistency.
|
This is a good utility to use as a sanity check to ensure that HBase can read all the blocks of a table if there are any concerns of metadata inconsistency.
|
||||||
It will run the mapreduce all in a single process but it will run faster if you have a MapReduce cluster in place for it to exploit. It is also possible to limit
|
It will run the mapreduce all in a single process but it will run faster if you have a MapReduce cluster in place for it to exploit. It is also possible to limit
|
||||||
the time range of data to be scanned by using the `--starttime=[starttime]` and `--endtime=[endtime]` flags.
|
the time range of data to be scanned by using the `--starttime=[starttime]` and `--endtime=[endtime]` flags.
|
||||||
|
@ -677,7 +677,7 @@ RowCounter only counts one version per cell.
|
||||||
|
|
||||||
Note: caching for the input Scan is configured via `hbase.client.scanner.caching` in the job configuration.
|
Note: caching for the input Scan is configured via `hbase.client.scanner.caching` in the job configuration.
|
||||||
|
|
||||||
HBase ships another diagnostic mapreduce job called link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/CellCounter.html[CellCounter].
|
HBase ships another diagnostic mapreduce job called link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/CellCounter.html[CellCounter].
|
||||||
Like RowCounter, it gathers more fine-grained statistics about your table.
|
Like RowCounter, it gathers more fine-grained statistics about your table.
|
||||||
The statistics gathered by CellCounter are more fine-grained and include:
|
The statistics gathered by CellCounter are more fine-grained and include:
|
||||||
|
|
||||||
|
@ -710,7 +710,7 @@ See link:https://issues.apache.org/jira/browse/HBASE-4391[HBASE-4391 Add ability
|
||||||
=== Offline Compaction Tool
|
=== Offline Compaction Tool
|
||||||
|
|
||||||
See the usage for the
|
See the usage for the
|
||||||
link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/CompactionTool.html[CompactionTool].
|
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/CompactionTool.html[CompactionTool].
|
||||||
Run it like:
|
Run it like:
|
||||||
|
|
||||||
[source, bash]
|
[source, bash]
|
||||||
|
@ -766,7 +766,7 @@ The LoadTestTool has received many updates in recent HBase releases, including s
|
||||||
[[ops.regionmgt.majorcompact]]
|
[[ops.regionmgt.majorcompact]]
|
||||||
=== Major Compaction
|
=== Major Compaction
|
||||||
|
|
||||||
Major compactions can be requested via the HBase shell or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact-org.apache.hadoop.hbase.TableName-[Admin.majorCompact].
|
Major compactions can be requested via the HBase shell or link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact-org.apache.hadoop.hbase.TableName-[Admin.majorCompact].
|
||||||
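For example, a minimal sketch using the Admin API (the table name is a placeholder):

[source,java]
----
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class MajorCompactExample {
  public static void main(String[] args) throws Exception {
    try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = connection.getAdmin()) {
      // Asynchronously requests a major compaction of every region of the table.
      admin.majorCompact(TableName.valueOf("myTable"));
    }
  }
}
----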
|
|
||||||
Note: major compactions do NOT do region merges.
|
Note: major compactions do NOT do region merges.
|
||||||
See <<compaction,compaction>> for more information about compactions.
|
See <<compaction,compaction>> for more information about compactions.
|
||||||
|
@ -912,7 +912,7 @@ But usually disks do the "John Wayne" -- i.e.
|
||||||
take a while to go down spewing errors in _dmesg_ -- or for some reason, run much slower than their companions.
|
take a while to go down spewing errors in _dmesg_ -- or for some reason, run much slower than their companions.
|
||||||
In this case you want to decommission the disk.
|
In this case you want to decommission the disk.
|
||||||
You have two options.
|
You have two options.
|
||||||
You can link:http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F[decommission
|
You can link:https://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F[decommission
|
||||||
the datanode] or, less disruptively (in that only the bad disk's data will be re-replicated), you can stop the datanode, unmount the bad volume (you can't umount a volume while the datanode is using it), and then restart the datanode (presuming you have set dfs.datanode.failed.volumes.tolerated > 0). The regionserver will throw some errors in its logs as it recalibrates where to get its data from -- it will likely roll its WAL log too -- but in general, aside from some latency spikes, it should keep on chugging.
|
the datanode] or, less disruptively (in that only the bad disk's data will be re-replicated), you can stop the datanode, unmount the bad volume (you can't umount a volume while the datanode is using it), and then restart the datanode (presuming you have set dfs.datanode.failed.volumes.tolerated > 0). The regionserver will throw some errors in its logs as it recalibrates where to get its data from -- it will likely roll its WAL log too -- but in general, aside from some latency spikes, it should keep on chugging.
|
||||||
|
|
||||||
.Short Circuit Reads
|
.Short Circuit Reads
|
||||||
|
@ -1065,7 +1065,7 @@ To configure metrics for a given region server, edit the _conf/hadoop-metrics2-h
|
||||||
Restart the region server for the changes to take effect.
|
Restart the region server for the changes to take effect.
|
||||||
|
|
||||||
To change the sampling rate for the default sink, edit the line beginning with `*.period`.
|
To change the sampling rate for the default sink, edit the line beginning with `*.period`.
|
||||||
To filter which metrics are emitted or to extend the metrics framework, see http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html
|
To filter which metrics are emitted or to extend the metrics framework, see https://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html
|
||||||
|
|
||||||
.HBase Metrics and Ganglia
|
.HBase Metrics and Ganglia
|
||||||
[NOTE]
|
[NOTE]
|
||||||
|
@ -1073,7 +1073,7 @@ To filter which metrics are emitted or to extend the metrics framework, see http
|
||||||
By default, HBase emits a large number of metrics per region server.
|
By default, HBase emits a large number of metrics per region server.
|
||||||
Ganglia may have difficulty processing all these metrics.
|
Ganglia may have difficulty processing all these metrics.
|
||||||
Consider increasing the capacity of the Ganglia server or reducing the number of metrics emitted by HBase.
|
Consider increasing the capacity of the Ganglia server or reducing the number of metrics emitted by HBase.
|
||||||
See link:http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html#filtering[Metrics Filtering].
|
See link:https://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html#filtering[Metrics Filtering].
|
||||||
====
|
====
|
||||||
|
|
||||||
=== Disabling Metrics
|
=== Disabling Metrics
|
||||||
|
@ -1458,7 +1458,7 @@ A single WAL edit goes through several steps in order to be replicated to a slav
|
||||||
. The edit is tagged with the master's UUID and added to a buffer.
|
. The edit is tagged with the master's UUID and added to a buffer.
|
||||||
When the buffer is filled, or the reader reaches the end of the file, the buffer is sent to a random region server on the slave cluster.
|
When the buffer is filled, or the reader reaches the end of the file, the buffer is sent to a random region server on the slave cluster.
|
||||||
. The region server reads the edits sequentially and separates them into buffers, one buffer per table.
|
. The region server reads the edits sequentially and separates them into buffers, one buffer per table.
|
||||||
After all edits are read, each buffer is flushed using link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table], HBase's normal client.
|
After all edits are read, each buffer is flushed using link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table], HBase's normal client.
|
||||||
The master's UUID and the UUIDs of slaves which have already consumed the data are preserved in the edits when they are applied, in order to prevent replication loops.
|
The master's UUID and the UUIDs of slaves which have already consumed the data are preserved in the edits when they are applied, in order to prevent replication loops.
|
||||||
. In the master, the offset for the WAL that is currently being replicated is registered in ZooKeeper.
|
. In the master, the offset for the WAL that is currently being replicated is registered in ZooKeeper.
|
||||||
|
|
||||||
|
@ -2093,7 +2093,7 @@ The act of copying these files creates new HDFS metadata, which is why a restore
|
||||||
=== Live Cluster Backup - Replication
|
=== Live Cluster Backup - Replication
|
||||||
|
|
||||||
This approach assumes that there is a second cluster.
|
This approach assumes that there is a second cluster.
|
||||||
See the HBase page on link:http://hbase.apache.org/book.html#_cluster_replication[replication] for more information.
|
See the HBase page on link:https://hbase.apache.org/book.html#_cluster_replication[replication] for more information.
|
||||||
|
|
||||||
[[ops.backup.live.copytable]]
|
[[ops.backup.live.copytable]]
|
||||||
=== Live Cluster Backup - CopyTable
|
=== Live Cluster Backup - CopyTable
|
||||||
|
@ -2302,7 +2302,7 @@ as in <<snapshots_s3>>.
|
||||||
- You must be using HBase 1.2 or higher with Hadoop 2.7.1 or
|
- You must be using HBase 1.2 or higher with Hadoop 2.7.1 or
|
||||||
higher. No version of HBase supports Hadoop 2.7.0.
|
higher. No version of HBase supports Hadoop 2.7.0.
|
||||||
- Your hosts must be configured to be aware of the Azure blob storage filesystem.
|
- Your hosts must be configured to be aware of the Azure blob storage filesystem.
|
||||||
See http://hadoop.apache.org/docs/r2.7.1/hadoop-azure/index.html.
|
See https://hadoop.apache.org/docs/r2.7.1/hadoop-azure/index.html.
|
||||||
|
|
||||||
After you meet the prerequisites, follow the instructions
|
After you meet the prerequisites, follow the instructions
|
||||||
in <<snapshots_s3>>, replacing the protocol specifier with `wasb://` or `wasbs://`.
|
in <<snapshots_s3>>, replacing the protocol specifier with `wasb://` or `wasbs://`.
|
||||||
|
@ -2365,7 +2365,7 @@ See <<gcpause,gcpause>>, <<trouble.log.gc,trouble.log.gc>> and elsewhere (TODO:
Generally, fewer regions make for a smoother running cluster (you can always manually split the big regions later (if necessary) to spread the data, or request load, over the cluster); 20-200 regions per RS is a reasonable range.
The number of regions cannot be configured directly (unless you go for fully <<disable.splitting,disable.splitting>>); adjust the region size to achieve the target region size given table size.

When configuring regions for multiple tables, note that most region settings can be set on a per-table basis via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html[HTableDescriptor], as well as shell commands.
When configuring regions for multiple tables, note that most region settings can be set on a per-table basis via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html[HTableDescriptor], as well as shell commands.
These settings will override the ones in `hbase-site.xml`.
That is useful if your tables have different workloads/use cases.

@ -320,7 +320,7 @@ See also <<perf.compression.however>> for compression caveats.
[[schema.regionsize]]
=== Table RegionSize

The regionsize can be set on a per-table basis via `setFileSize` on link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html[HTableDescriptor] in the event where certain tables require different regionsizes than the configured default regionsize.
The regionsize can be set on a per-table basis via `setFileSize` on link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html[HTableDescriptor] in the event where certain tables require different regionsizes than the configured default regionsize.

See <<ops.capacity.regions>> for more information.

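A minimal Java sketch of the per-table route, assuming the `HTableDescriptor#setMaxFileSize` setter (the method referred to above as `setFileSize`) and a hypothetical table named `big_table`:

[source,java]
----
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class TableRegionSizeExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      // Hypothetical table that should use larger regions than the cluster default.
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("big_table"));
      desc.addFamily(new HColumnDescriptor("cf"));
      desc.setMaxFileSize(20L * 1024 * 1024 * 1024);  // split regions at roughly 20 GB for this table
      admin.createTable(desc);
    }
  }
}
----
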
@ -372,7 +372,7 @@ Bloom filters are enabled on a Column Family.
You can do this by using the setBloomFilterType method of HColumnDescriptor or using the HBase API.
Valid values are `NONE`, `ROW` (default), or `ROWCOL`.
See <<bloom.filters.when>> for more information on `ROW` versus `ROWCOL`.
See also the API documentation for link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
See also the API documentation for link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].

The following example creates a table and enables a ROWCOL Bloom filter on the `colfam1` column family.

@ -431,7 +431,7 @@ The blocksize can be configured for each ColumnFamily in a table, and defaults t
Larger cell values require larger blocksizes.
There is an inverse relationship between blocksize and the resulting StoreFile indexes (i.e., if the blocksize is doubled then the resulting indexes should be roughly halved).

See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor] and <<store>> for more information.
See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor] and <<store>> for more information.

[[cf.in.memory]]
=== In-Memory ColumnFamilies

@ -440,7 +440,7 @@ ColumnFamilies can optionally be defined as in-memory.
Data is still persisted to disk, just like any other ColumnFamily.
In-memory blocks have the highest priority in the <<block.cache>>, but it is not a guarantee that the entire table will be in memory.

See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor] for more information.
See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor] for more information.

[[perf.compression]]
=== Compression

@ -549,7 +549,7 @@ If deferred log flush is used, WAL edits are kept in memory until the flush peri
The benefit is aggregated and asynchronous `WAL` writes, but the potential downside is that if the RegionServer goes down the yet-to-be-flushed edits are lost.
This is safer, however, than not using WAL at all with Puts.

Deferred log flush can be configured on tables via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html[HTableDescriptor].
Deferred log flush can be configured on tables via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html[HTableDescriptor].
The default value of `hbase.regionserver.optionallogflushinterval` is 1000ms.

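A hedged sketch of the per-table setting; in the client API this is typically expressed through `HTableDescriptor#setDurability` with `Durability.ASYNC_WAL` (older releases used `setDeferredLogFlush(true)` for the same effect), and the table name here is hypothetical:

[source,java]
----
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;

public class DeferredWalFlushExample {
  public static void main(String[] args) throws Exception {
    TableName name = TableName.valueOf("events");   // hypothetical table
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      HTableDescriptor desc = admin.getTableDescriptor(name);
      desc.setDurability(Durability.ASYNC_WAL);     // WAL edits for this table are synced asynchronously
      admin.modifyTable(name, desc);
    }
  }
}
----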

[[perf.hbase.client.putwal]]

@ -574,7 +574,7 @@ There is a utility `HTableUtil` currently on MASTER that does this, but you can
[[perf.hbase.write.mr.reducer]]
=== MapReduce: Skip The Reducer

When writing a lot of data to an HBase table from a MR job (e.g., with link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html[TableOutputFormat]), and specifically where Puts are being emitted from the Mapper, skip the Reducer step.
When writing a lot of data to an HBase table from a MR job (e.g., with link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html[TableOutputFormat]), and specifically where Puts are being emitted from the Mapper, skip the Reducer step.
When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other Reducers that will most likely be off-node.
It's far more efficient to just write directly to HBase.

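A sketch of the map-only wiring (the mapper, input format, and target table name here are illustrative, not from the guide):

[source,java]
----
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class MapOnlyHBaseWrite {

  // Illustrative mapper: parses "rowkey,value" text lines and emits Puts directly.
  static class PutMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split(",", 2);
      Put put = new Put(Bytes.toBytes(parts[0]));
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(parts[1]));
      ctx.write(new ImmutableBytesWritable(put.getRow()), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "map-only-hbase-write");
    job.setJarByClass(MapOnlyHBaseWrite.class);
    job.setMapperClass(PutMapper.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    // Wires up TableOutputFormat for the target table; passing null means no Reducer class is set.
    TableMapReduceUtil.initTableReducerJob("target_table", null, job);
    job.setNumReduceTasks(0);   // map-only: Puts go straight to the RegionServers
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
----
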
@ -600,7 +600,7 @@ For example, here is a good general thread on what to look at addressing read-ti
[[perf.hbase.client.caching]]
=== Scan Caching

If HBase is used as an input source for a MapReduce job, for example, make sure that the input link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] instance to the MapReduce job has `setCaching` set to something greater than the default (which is 1). Using the default value means that the map-task will make a call back to the region-server for every record processed.
If HBase is used as an input source for a MapReduce job, for example, make sure that the input link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] instance to the MapReduce job has `setCaching` set to something greater than the default (which is 1). Using the default value means that the map-task will make a call back to the region-server for every record processed.
Setting this value to 500, for example, will transfer 500 rows at a time to the client to be processed.
There is a cost/benefit to have the cache value be large because it costs more in memory for both client and RegionServer, so bigger isn't always better.

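A minimal client-side sketch with a hypothetical table name; the same `Scan` is what you would hand to `TableMapReduceUtil.initTableMapperJob` when HBase is the MapReduce source:

[source,java]
----
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class ScanCachingExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("mytable"))) {
      Scan scan = new Scan();
      scan.setCaching(500);   // 500 rows per RPC instead of the default of 1
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result r : scanner) {
          // process each Result here
        }
      }
    }
  }
}
----
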
@ -649,7 +649,7 @@ For MapReduce jobs that use HBase tables as a source, if there a pattern where t
=== Close ResultScanners

This isn't so much about improving performance but rather _avoiding_ performance problems.
If you forget to close link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/ResultScanner.html[ResultScanners] you can cause problems on the RegionServers.
If you forget to close link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/ResultScanner.html[ResultScanners] you can cause problems on the RegionServers.
Always have ResultScanner processing enclosed in try/catch blocks.

[source,java]

@ -669,7 +669,7 @@ table.close();
[[perf.hbase.client.blockcache]]
=== Block Cache

link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] instances can be set to use the block cache in the RegionServer via the `setCacheBlocks` method.
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] instances can be set to use the block cache in the RegionServer via the `setCacheBlocks` method.
For input Scans to MapReduce jobs, this should be `false`.
For frequently accessed rows, it is advisable to use the block cache.

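For instance, an input `Scan` for a MapReduce job would typically combine the caching setting above with `setCacheBlocks(false)` (a sketch, not a required configuration):

[source,java]
----
import org.apache.hadoop.hbase.client.Scan;

public class MapReduceInputScan {
  /** Builds a Scan suitable as MapReduce input: large caching, block cache off. */
  public static Scan newInputScan() {
    Scan scan = new Scan();
    scan.setCaching(500);        // see Scan Caching above
    scan.setCacheBlocks(false);  // blocks read by this scan are not added to the block cache
    return scan;
  }
}
----
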
@ -679,8 +679,8 @@ See <<offheap.blockcache>>
[[perf.hbase.client.rowkeyonly]]
=== Optimal Loading of Row Keys

When performing a table link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[scan] where only the row keys are needed (no families, qualifiers, values or timestamps), add a FilterList with a `MUST_PASS_ALL` operator to the scanner using `setFilter`.
When performing a table link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[scan] where only the row keys are needed (no families, qualifiers, values or timestamps), add a FilterList with a `MUST_PASS_ALL` operator to the scanner using `setFilter`.
The filter list should include both a link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html[FirstKeyOnlyFilter] and a link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/KeyOnlyFilter.html[KeyOnlyFilter].
The filter list should include both a link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html[FirstKeyOnlyFilter] and a link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/KeyOnlyFilter.html[KeyOnlyFilter].
Using this filter combination will result in a worst case scenario of a RegionServer reading a single value from disk and minimal network traffic to the client for a single row.

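A sketch of that filter combination (the table name is hypothetical):

[source,java]
----
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyOnlyScan {
  public static void main(String[] args) throws Exception {
    FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ALL);
    filters.addFilter(new FirstKeyOnlyFilter());  // only the first KeyValue of each row
    filters.addFilter(new KeyOnlyFilter());       // strip the value, keep the key
    Scan scan = new Scan();
    scan.setFilter(filters);
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("mytable"));
         ResultScanner scanner = table.getScanner(scan)) {
      for (Result r : scanner) {
        System.out.println(Bytes.toStringBinary(r.getRow()));
      }
    }
  }
}
----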

[[perf.hbase.read.dist]]

@ -816,7 +816,7 @@ In this case, special care must be taken to regularly perform major compactions
As is documented in <<datamodel>>, marking rows as deleted creates additional StoreFiles which then need to be processed on reads.
Tombstones only get cleaned up with major compactions.

See also <<compaction>> and link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact-org.apache.hadoop.hbase.TableName-[Admin.majorCompact].
See also <<compaction>> and link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact-org.apache.hadoop.hbase.TableName-[Admin.majorCompact].

[[perf.deleting.rpc]]
=== Delete RPC Behavior

@ -825,7 +825,7 @@ Be aware that `Table.delete(Delete)` doesn't use the writeBuffer.
It will execute a RegionServer RPC with each invocation.
For a large number of deletes, consider `Table.delete(List)`.

See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete-org.apache.hadoop.hbase.client.Delete-[hbase.client.Delete]
See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete-org.apache.hadoop.hbase.client.Delete-[hbase.client.Delete]

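A sketch of the batched form (table name and row keys are hypothetical):

[source,java]
----
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchedDeletes {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("mytable"))) {
      List<Delete> deletes = new ArrayList<>();
      for (String row : new String[] {"row-1", "row-2", "row-3"}) {
        deletes.add(new Delete(Bytes.toBytes(row)));
      }
      table.delete(deletes);  // one batched call rather than one RPC per Delete
    }
  }
}
----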

[[perf.hdfs]]
== HDFS

@ -27,11 +27,11 @@
:icons: font
:experimental:

This is the official reference guide for the link:http://hbase.apache.org/[HBase] version it ships with.
This is the official reference guide for the link:https://hbase.apache.org/[HBase] version it ships with.

Herein you will find either the definitive documentation on an HBase topic as of its
standing when the referenced HBase version shipped, or it will point to the location
in link:http://hbase.apache.org/apidocs/index.html[Javadoc] or
in link:https://hbase.apache.org/apidocs/index.html[Javadoc] or
link:https://issues.apache.org/jira/browse/HBASE[JIRA] where the pertinent information can be found.

.About This Guide

@ -28,7 +28,7 @@
:icons: font
:experimental:

In 0.95, all client/server communication is done with link:https://developers.google.com/protocol-buffers/[protobuf'ed] Messages rather than with link:http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/Writable.html[Hadoop
In 0.95, all client/server communication is done with link:https://developers.google.com/protocol-buffers/[protobuf'ed] Messages rather than with link:https://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/Writable.html[Hadoop
Writables].
Our RPC wire format therefore changes.
This document describes the client/server request/response protocol and our new RPC wire-format.

@ -47,7 +47,7 @@ See also Robert Yokota's link:https://blogs.apache.org/hbase/entry/hbase-applica
[[schema.creation]]
== Schema Creation

HBase schemas can be created or updated using the <<shell>> or by using link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html[Admin] in the Java API.
HBase schemas can be created or updated using the <<shell>> or by using link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html[Admin] in the Java API.

Tables must be disabled when making ColumnFamily modifications, for example:

@ -223,7 +223,7 @@ You could also optimize things so that certain pairs of keys were always in the
A third common trick for preventing hotspotting is to reverse a fixed-width or numeric row key so that the part that changes the most often (the least significant digit) is first.
This effectively randomizes row keys, but sacrifices row ordering properties.

See https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/11/10/discussion-on-designing-hbase-tables, and link:http://phoenix.apache.org/salted.html[article on Salted Tables] from the Phoenix project, and the discussion in the comments of link:https://issues.apache.org/jira/browse/HBASE-11682[HBASE-11682] for more information about avoiding hotspotting.
See https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/11/10/discussion-on-designing-hbase-tables, and link:https://phoenix.apache.org/salted.html[article on Salted Tables] from the Phoenix project, and the discussion in the comments of link:https://issues.apache.org/jira/browse/HBASE-11682[HBASE-11682] for more information about avoiding hotspotting.

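A tiny sketch of the key-reversal trick; the fixed width of 10 digits is an arbitrary assumption:

[source,java]
----
import org.apache.hadoop.hbase.util.Bytes;

public class ReversedKeys {
  /**
   * Reverses a fixed-width numeric id, e.g. 12345 -> "5432100000", so the
   * fastest-changing digit leads and writes spread across regions.
   */
  public static byte[] reversedRowKey(long id) {
    String fixedWidth = String.format("%010d", id);
    return Bytes.toBytes(new StringBuilder(fixedWidth).reverse().toString());
  }
}
----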

[[timeseries]]
=== Monotonically Increasing Row Keys/Timeseries Data

@ -433,7 +433,7 @@ public static byte[][] getHexSplits(String startKey, String endKey, int numRegio
[[schema.versions.max]]
=== Maximum Number of Versions

The maximum number of row versions to store is configured per column family via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
The maximum number of row versions to store is configured per column family via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
The default for max versions is 1.
This is an important parameter because as described in the <<datamodel>> section, HBase does _not_ overwrite row values, but rather stores different values per row by time (and qualifier). Excess versions are removed during major compactions.
The number of max versions may need to be increased or decreased depending on application needs.

@ -443,14 +443,14 @@ It is not recommended setting the number of max versions to an exceedingly high
[[schema.minversions]]
=== Minimum Number of Versions

Like maximum number of row versions, the minimum number of row versions to keep is configured per column family via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
Like maximum number of row versions, the minimum number of row versions to keep is configured per column family via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
The default for min versions is 0, which means the feature is disabled.
The minimum number of row versions parameter is used together with the time-to-live parameter and can be combined with the number of row versions parameter to allow configurations such as "keep the last T minutes worth of data, at most N versions, _but keep at least M versions around_" (where M is the value for minimum number of row versions, M<N). This parameter should only be set when time-to-live is enabled for a column family and must be less than the number of row versions.

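As a rough sketch combining max versions, min versions, and a time-to-live on one column family (the family name is hypothetical, and the constraints described above still apply):

[source,java]
----
import org.apache.hadoop.hbase.HColumnDescriptor;

public class VersionSettingsExample {
  /** Keep at most 5 versions, at least 2, and expire data older than one day. */
  public static HColumnDescriptor versionedFamily() {
    HColumnDescriptor cf = new HColumnDescriptor("cf");
    cf.setMaxVersions(5);
    cf.setMinVersions(2);            // only meaningful together with a TTL
    cf.setTimeToLive(24 * 60 * 60);  // seconds
    return cf;
  }
}
----
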
[[supported.datatypes]]
== Supported Datatypes

HBase supports a "bytes-in/bytes-out" interface via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] and link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html[Result], so anything that can be converted to an array of bytes can be stored as a value.
HBase supports a "bytes-in/bytes-out" interface via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] and link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html[Result], so anything that can be converted to an array of bytes can be stored as a value.
Input could be strings, numbers, complex objects, or even images as long as they can be rendered as bytes.

There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase would probably be too much to ask); search the mailing list for conversations on this topic.

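A small sketch of the round trip through the `Bytes` utility (table, family, and qualifier names are hypothetical):

[source,java]
----
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BytesInBytesOut {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("mytable"))) {
      // Write: everything is serialized to byte[] by the caller.
      Put put = new Put(Bytes.toBytes("row1"));
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("age"), Bytes.toBytes(42L));
      table.put(put);

      // Read: the caller decides how to interpret the bytes that come back.
      Result result = table.get(new Get(Bytes.toBytes("row1")));
      long age = Bytes.toLong(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("age")));
    }
  }
}
----
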
@ -459,7 +459,7 @@ Take that into consideration when making your design, as well as block size for
=== Counters

One supported datatype that deserves special mention is "counters" (i.e., the ability to do atomic increments of numbers). See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#increment%28org.apache.hadoop.hbase.client.Increment%29[Increment] in `Table`.
One supported datatype that deserves special mention is "counters" (i.e., the ability to do atomic increments of numbers). See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#increment%28org.apache.hadoop.hbase.client.Increment%29[Increment] in `Table`.

Synchronization on counters is done on the RegionServer, not in the client.

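A minimal sketch of both forms of the counter API (table, family, and qualifier names are hypothetical):

[source,java]
----
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CounterExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("counters"))) {
      // Single-column shortcut: atomically add 1 and get the new value back.
      long hits = table.incrementColumnValue(
          Bytes.toBytes("page#/index"), Bytes.toBytes("cf"), Bytes.toBytes("hits"), 1L);

      // Increment object: several columns of one row in a single atomic call.
      Increment inc = new Increment(Bytes.toBytes("page#/index"));
      inc.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("hits"), 1L);
      inc.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("bytes"), 512L);
      Result updated = table.increment(inc);
    }
  }
}
----
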
@ -479,7 +479,7 @@ Store files which contains only expired rows are deleted on minor compaction.
Setting `hbase.store.delete.expired.storefile` to `false` disables this feature.
Setting minimum number of versions to other than 0 also disables this.

See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor] for more information.
See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor] for more information.

Recent versions of HBase also support setting time to live on a per cell basis.
See link:https://issues.apache.org/jira/browse/HBASE-10560[HBASE-10560] for more information.

|
== Keeping Deleted Cells

By default, delete markers extend back to the beginning of time.
Therefore, link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] operations will not see a deleted cell (row or column), even when the Get or Scan operation indicates a time range before the delete marker was placed.
Therefore, link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] or link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] operations will not see a deleted cell (row or column), even when the Get or Scan operation indicates a time range before the delete marker was placed.

ColumnFamilies can optionally keep deleted cells.
In this case, deleted cells can still be retrieved, as long as these operations specify a time range that ends before the timestamp of any delete that would affect the cells.

|
also be used to enforce referential integrity, but this is strongly discouraged
as it will dramatically decrease the write throughput of the tables where integrity
checking is enabled. Extensive documentation on using Constraints can be found at
link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/constraint/Constraint.html[Constraint]
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/constraint/Constraint.html[Constraint]
since version 0.94.

[[schema.casestudies]]

|
Your two options mirror a common question people have when designing HBase schemas: should I go "tall" or "wide"? Your first schema is "tall": each row represents one value for one user, and so there are many rows in the table for each user; the row key is user + valueid, and there would be (presumably) a single column qualifier that means "the value". This is great if you want to scan over rows in sorted order by row key (thus my question above, about whether these ids are sorted correctly). You can start a scan at any user+valueid, read the next 30, and be done.
What you're giving up is the ability to have transactional guarantees around all the rows for one user, but it doesn't sound like you need that.
Doing it this way is generally recommended (see here http://hbase.apache.org/book.html#schema.smackdown).
Doing it this way is generally recommended (see here https://hbase.apache.org/book.html#schema.smackdown).

Your second option is "wide": you store a bunch of values in one row, using different qualifiers (where the qualifier is the valueid). The simple way to do that would be to just store ALL values for one user in a single row.
I'm guessing you jumped to the "paginated" version because you're assuming that storing millions of columns in a single row would be bad for performance, which may or may not be true; as long as you're not trying to do too much in a single request, or do things like scanning over and returning all of the cells in the row, it shouldn't be fundamentally worse.

@ -354,7 +354,7 @@ grant 'rest_server', 'RWCA'
For more information about ACLs, please see the <<hbase.accesscontrol.configuration>> section

HBase REST gateway supports link:http://hadoop.apache.org/docs/stable/hadoop-auth/index.html[SPNEGO HTTP authentication] for client access to the gateway.
HBase REST gateway supports link:https://hadoop.apache.org/docs/stable/hadoop-auth/index.html[SPNEGO HTTP authentication] for client access to the gateway.
To enable REST gateway Kerberos authentication for client access, add the following to the `hbase-site.xml` file for every REST gateway.

[source,xml]

@ -390,7 +390,7 @@ Substitute the keytab for HTTP for _$KEYTAB_.
HBase REST gateway supports different 'hbase.rest.authentication.type': simple, kerberos.
You can also implement a custom authentication by implementing Hadoop AuthenticationHandler, then specify the full class name as 'hbase.rest.authentication.type' value.
For more information, refer to link:http://hadoop.apache.org/docs/stable/hadoop-auth/index.html[SPNEGO HTTP authentication].
For more information, refer to link:https://hadoop.apache.org/docs/stable/hadoop-auth/index.html[SPNEGO HTTP authentication].

[[security.rest.gateway]]
=== REST Gateway Impersonation Configuration

@ -1390,11 +1390,11 @@ When you issue a Scan or Get, HBase uses your default set of authorizations to
filter out cells that you do not have access to. A superuser can set the default
set of authorizations for a given user by using the `set_auths` HBase Shell command
or the
link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/visibility/VisibilityClient.html#setAuths-org.apache.hadoop.hbase.client.Connection-java.lang.String:A-java.lang.String-[VisibilityClient.setAuths()] method.
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/visibility/VisibilityClient.html#setAuths-org.apache.hadoop.hbase.client.Connection-java.lang.String:A-java.lang.String-[VisibilityClient.setAuths()] method.

You can specify a different authorization during the Scan or Get, by passing the
AUTHORIZATIONS option in HBase Shell, or the
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setAuthorizations-org.apache.hadoop.hbase.security.visibility.Authorizations-[Scan.setAuthorizations()]
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setAuthorizations-org.apache.hadoop.hbase.security.visibility.Authorizations-[Scan.setAuthorizations()]
method if you use the API. This authorization will be combined with your default
set as an additional filter. It will further filter your results, rather than
giving you additional authorization.

|
Bulk loading in secure mode is a bit more involved than normal setup, since the client has to transfer the ownership of the files generated from the MapReduce job to HBase.
Secure bulk loading is implemented by a coprocessor, named
link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.html[SecureBulkLoadEndpoint],
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.html[SecureBulkLoadEndpoint],
which uses a staging directory configured by the configuration property `hbase.bulkload.staging.dir`, which defaults to
_/tmp/hbase-staging/_.

@ -27,7 +27,7 @@
:icons: font
:experimental:

link:http://spark.apache.org/[Apache Spark] is a software framework that is used
link:https://spark.apache.org/[Apache Spark] is a software framework that is used
to process data in memory in a distributed manner, and is replacing MapReduce in
many use cases.

@ -151,7 +151,7 @@ access to HBase
For examples of all these functionalities, see the HBase-Spark Module.

== Spark Streaming
http://spark.apache.org/streaming/[Spark Streaming] is a micro batching stream
https://spark.apache.org/streaming/[Spark Streaming] is a micro batching stream
processing framework built on top of Spark. HBase and Spark Streaming make great
companions in that HBase can help serve the following benefits alongside Spark
Streaming.

@ -33,10 +33,10 @@ The following projects offer some support for SQL over HBase.
[[phoenix]]
=== Apache Phoenix

link:http://phoenix.apache.org[Apache Phoenix]
link:https://phoenix.apache.org[Apache Phoenix]

=== Trafodion

link:http://trafodion.incubator.apache.org/[Trafodion: Transactional SQL-on-HBase]
link:https://trafodion.incubator.apache.org/[Trafodion: Transactional SQL-on-HBase]

:numbered:

@ -28,7 +28,7 @@
:experimental:

Apache link:http://thrift.apache.org/[Thrift] is a cross-platform, cross-language development framework.
Apache link:https://thrift.apache.org/[Thrift] is a cross-platform, cross-language development framework.
HBase includes a Thrift API and filter language.
The Thrift API relies on client and server processes.

@ -30,7 +30,7 @@
:icons: font
:experimental:

link:https://issues.apache.org/jira/browse/HBASE-6449[HBASE-6449] added support for tracing requests through HBase, using the open source tracing library, link:http://htrace.incubator.apache.org/[HTrace].
link:https://issues.apache.org/jira/browse/HBASE-6449[HBASE-6449] added support for tracing requests through HBase, using the open source tracing library, link:https://htrace.incubator.apache.org/[HTrace].
Setting up tracing is quite simple, however it currently requires some very minor changes to your client code (it would not be very difficult to remove this requirement).

[[tracing.spanreceivers]]

@ -67,7 +67,7 @@ The `LocalFileSpanReceiver` looks in _hbase-site.xml_ for a `hbase.local-fi
HTrace also provides `ZipkinSpanReceiver` which converts spans to link:http://github.com/twitter/zipkin[Zipkin] span format and sends them to the Zipkin server. In order to use this span receiver, you need to install the jar of htrace-zipkin to your HBase's classpath on all of the nodes in your cluster.

_htrace-zipkin_ is published to the link:http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.htrace%22%20AND%20a%3A%22htrace-zipkin%22[Maven central repository]. You could get the latest version from there or just build it locally (see the link:http://htrace.incubator.apache.org/[HTrace] homepage for information on how to do this) and then copy it out to all nodes.
_htrace-zipkin_ is published to the link:http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.htrace%22%20AND%20a%3A%22htrace-zipkin%22[Maven central repository]. You could get the latest version from there or just build it locally (see the link:https://htrace.incubator.apache.org/[HTrace] homepage for information on how to do this) and then copy it out to all nodes.

`ZipkinSpanReceiver` looks in _hbase-site.xml_ for properties called `hbase.htrace.zipkin.collector-hostname` and `hbase.htrace.zipkin.collector-port`, with values describing the Zipkin collector server to which span information is sent.

@ -225,7 +225,7 @@ Search here first when you have an issue as its more than likely someone has alr
[[trouble.resources.lists]]
=== Mailing Lists

Ask a question on the link:http://hbase.apache.org/mail-lists.html[Apache HBase mailing lists].
Ask a question on the link:https://hbase.apache.org/mail-lists.html[Apache HBase mailing lists].
The 'dev' mailing list is aimed at the community of developers actually building Apache HBase and for features currently under development, and 'user' is generally used for questions on released versions of Apache HBase.
Before going to the mailing list, make sure your question has not already been answered by searching the mailing list archives first.
Use <<trouble.resources.searchhadoop>>.

@ -596,7 +596,7 @@ See also Jesse Andersen's link:http://blog.cloudera.com/blog/2014/04/how-to-use-
In some situations clients that fetch data from a RegionServer get a LeaseException instead of the usual <<trouble.client.scantimeout>>.
Usually the source of the exception is `org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230)` (line number may vary). It tends to happen in the context of a slow/freezing `RegionServer#next` call.
It can be prevented by having `hbase.rpc.timeout` > `hbase.regionserver.lease.period`.
Harsh J investigated the issue as part of the mailing list thread link:http://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/%3CCAOcnVr3R-LqtKhFsk8Bhrm-YW2i9O6J6Fhjz2h7q6_sxvwd2yw%40mail.gmail.com%3E[HBase, mail # user - Lease does not exist exceptions]
Harsh J investigated the issue as part of the mailing list thread link:https://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/%3CCAOcnVr3R-LqtKhFsk8Bhrm-YW2i9O6J6Fhjz2h7q6_sxvwd2yw%40mail.gmail.com%3E[HBase, mail # user - Lease does not exist exceptions]

[[trouble.client.scarylogs]]
=== Shell or client application throws lots of scary exceptions during normal operation

@ -802,7 +802,7 @@ hadoop fs -du /hbase/myTable
----
...returns a list of the regions under the HBase table 'myTable' and their disk utilization.

For more information on HDFS shell commands, see the link:http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/FileSystemShell.html[HDFS FileSystem Shell documentation].
For more information on HDFS shell commands, see the link:https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/FileSystemShell.html[HDFS FileSystem Shell documentation].

[[trouble.namenode.hbase.objects]]
=== Browsing HDFS for HBase Objects

@ -1174,7 +1174,7 @@ If you have a DNS server, you can set `hbase.zookeeper.dns.interface` and `hbase
ZooKeeper is the cluster's "canary in the mineshaft". It'll be the first to notice issues if any so making sure it's happy is the short-cut to a humming cluster.

See the link:http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting[ZooKeeper Operating Environment Troubleshooting] page.
See the link:https://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting[ZooKeeper Operating Environment Troubleshooting] page.
It has suggestions and tools for checking disk and networking performance; i.e.
the operating environment your ZooKeeper and HBase are running in.

@ -1313,7 +1313,7 @@ These changes were backported to HBase 0.98.x and apply to all newer versions.
== HBase and HDFS

General configuration guidance for Apache HDFS is out of the scope of this guide.
Refer to the documentation available at http://hadoop.apache.org/ for extensive information about configuring HDFS.
Refer to the documentation available at https://hadoop.apache.org/ for extensive information about configuring HDFS.
This section deals with HDFS in terms of HBase.

In most cases, HBase stores its data in Apache HDFS.

@ -171,7 +171,7 @@ Similarly, you can now expand into other operations such as Get, Scan, or Delete
== MRUnit

link:http://mrunit.apache.org/[Apache MRUnit] is a library that allows you to unit-test MapReduce jobs.
link:https://mrunit.apache.org/[Apache MRUnit] is a library that allows you to unit-test MapReduce jobs.
You can use it to test HBase jobs in the same way as other MapReduce jobs.

Given a MapReduce job that writes to an HBase table called `MyTest`, which has one column family called `CF`, the reducer of such a job could look like the following:

@ -125,14 +125,14 @@ for warning about incompatible changes). All effort will be made to provide a de
[[hbase.client.api.surface]]
==== HBase API Surface

HBase has a lot of API points, but for the compatibility matrix above, we differentiate between Client API, Limited Private API, and Private API. HBase uses link:http://yetus.apache.org/documentation/0.5.0/interface-classification/[Apache Yetus Audience Annotations] to guide downstream expectations for stability.
HBase has a lot of API points, but for the compatibility matrix above, we differentiate between Client API, Limited Private API, and Private API. HBase uses link:https://yetus.apache.org/documentation/0.5.0/interface-classification/[Apache Yetus Audience Annotations] to guide downstream expectations for stability.

* InterfaceAudience (link:http://yetus.apache.org/documentation/0.5.0/audience-annotations-apidocs/org/apache/yetus/audience/InterfaceAudience.html[javadocs]): captures the intended audience, possible values include:
* InterfaceAudience (link:https://yetus.apache.org/documentation/0.5.0/audience-annotations-apidocs/org/apache/yetus/audience/InterfaceAudience.html[javadocs]): captures the intended audience, possible values include:
- Public: safe for end users and external projects
- LimitedPrivate: used for internals we expect to be pluggable, such as coprocessors
- Private: strictly for use within HBase itself
Classes which are defined as `IA.Private` may be used as parameters or return values for interfaces which are declared `IA.LimitedPrivate`. Treat the `IA.Private` object as opaque; do not try to access its methods or fields directly.
* InterfaceStability (link:http://yetus.apache.org/documentation/0.5.0/audience-annotations-apidocs/org/apache/yetus/audience/InterfaceStability.html[javadocs]): describes what types of interface changes are permitted. Possible values include:
* InterfaceStability (link:https://yetus.apache.org/documentation/0.5.0/audience-annotations-apidocs/org/apache/yetus/audience/InterfaceStability.html[javadocs]): describes what types of interface changes are permitted. Possible values include:
- Stable: the interface is fixed and is not expected to change
- Evolving: the interface may change in future minor versions
- Unstable: the interface may change at any time

@ -159,7 +159,7 @@ HBase Private API::
=== Pre 1.0 versions

.HBase Pre-1.0 versions are all EOM
NOTE: For new installations, do not deploy 0.94.y, 0.96.y, or 0.98.y. Deploy our stable version. See link:https://issues.apache.org/jira/browse/HBASE-11642[EOL 0.96], link:https://issues.apache.org/jira/browse/HBASE-16215[clean up of EOM releases], and link:http://www.apache.org/dist/hbase/[the header of our downloads].
NOTE: For new installations, do not deploy 0.94.y, 0.96.y, or 0.98.y. Deploy our stable version. See link:https://issues.apache.org/jira/browse/HBASE-11642[EOL 0.96], link:https://issues.apache.org/jira/browse/HBASE-16215[clean up of EOM releases], and link:https://www.apache.org/dist/hbase/[the header of our downloads].

Before the semantic versioning scheme pre-1.0, HBase tracked either Hadoop's versions (0.2x) or 0.9x versions. If you are into the arcane, checkout our old wiki page on link:https://web.archive.org/web/20150905071342/https://wiki.apache.org/hadoop/Hbase/HBaseVersions[HBase Versioning] which tries to connect the HBase version dots. Below sections cover ONLY the releases before 1.0.

@ -106,7 +106,7 @@ The newer version, the better. ZooKeeper 3.4.x is required as of HBase 1.0.0
.ZooKeeper Maintenance
[CAUTION]
====
Be sure to set up the data dir cleaner described under link:http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_maintenance[ZooKeeper
Be sure to set up the data dir cleaner described under link:https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_maintenance[ZooKeeper
Maintenance] else you could have 'interesting' problems a couple of months in; i.e.
zookeeper could start dropping sessions if it has to run through a directory of hundreds of thousands of logs which is wont to do around leader reelection time -- a process rare but run on occasion whether because a machine is dropped or happens to hiccup.
====

@ -135,9 +135,9 @@ ${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
Note that you can use HBase in this manner to spin up a ZooKeeper cluster, unrelated to HBase.
Just make sure to set `HBASE_MANAGES_ZK` to `false` if you want it to stay up across HBase restarts so that when HBase shuts down, it doesn't take ZooKeeper down with it.

For more information about running a distinct ZooKeeper cluster, see the ZooKeeper link:http://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html[Getting
For more information about running a distinct ZooKeeper cluster, see the ZooKeeper link:https://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html[Getting
Started Guide].
Additionally, see the link:http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7[ZooKeeper Wiki] or the link:http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_zkMulitServerSetup[ZooKeeper
Additionally, see the link:https://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7[ZooKeeper Wiki] or the link:https://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_zkMulitServerSetup[ZooKeeper
documentation] for more information on ZooKeeper sizing.

[[zk.sasl.auth]]

@ -42,7 +42,7 @@
// Logo for HTML -- doesn't render in PDF
++++
<div>
<a href="http://hbase.apache.org"><img src="images/hbase_logo_with_orca.png" alt="Apache HBase Logo" /></a>
<a href="https://hbase.apache.org"><img src="images/hbase_logo_with_orca.png" alt="Apache HBase Logo" /></a>
</div>
++++

@ -82,7 +82,7 @@ NOTE:This is not true _across rows_ for multirow batch mutations.
|
||||||
A scan is *not* a consistent view of a table. Scans do *not* exhibit _snapshot isolation_.
|
A scan is *not* a consistent view of a table. Scans do *not* exhibit _snapshot isolation_.
|
||||||
|
|
||||||
Rather, scans have the following properties:
|
Rather, scans have the following properties:
|
||||||
. Any row returned by the scan will be a consistent view (i.e. that version of the complete row existed at some point in time)footnoteref[consistency,A consistent view is not guaranteed for intra-row scanning -- i.e. fetching a portion of a row in one RPC then going back to fetch another portion of the row in a subsequent RPC. Intra-row scanning happens when you set a limit on how many values to return per Scan#next (See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setBatch(int)[Scan#setBatch(int)]).]
|
. Any row returned by the scan will be a consistent view (i.e. that version of the complete row existed at some point in time)footnoteref[consistency,A consistent view is not guaranteed for intra-row scanning -- i.e. fetching a portion of a row in one RPC then going back to fetch another portion of the row in a subsequent RPC. Intra-row scanning happens when you set a limit on how many values to return per Scan#next (See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setBatch(int)[Scan#setBatch(int)]).]
|
||||||
. A scan will always reflect a view of the data _at least as new as_ the beginning of the scan. This satisfies the visibility guarantees enumerated below.
|
. A scan will always reflect a view of the data _at least as new as_ the beginning of the scan. This satisfies the visibility guarantees enumerated below.
|
||||||
.. For example, if client A writes data X and then communicates via a side channel to client B, any scans started by client B will contain data at least as new as X.
|
.. For example, if client A writes data X and then communicates via a side channel to client B, any scans started by client B will contain data at least as new as X.
|
||||||
.. A scan _must_ reflect all mutations committed prior to the construction of the scanner, and _may_ reflect some mutations committed subsequent to the construction of the scanner.
|
.. A scan _must_ reflect all mutations committed prior to the construction of the scanner, and _may_ reflect some mutations committed subsequent to the construction of the scanner.
|
||||||
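To make the intra-row scanning mentioned in the footnote above concrete, here is a minimal Java sketch (the table name and the small batch size are hypothetical, purely for illustration) that caps how many cells come back per call to `next()`, so a wide row may arrive split across several `Result` objects:

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class BatchedScanSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("my_table"))) { // hypothetical table
      Scan scan = new Scan();
      // Return at most 5 cells per Scan#next call; a row with more cells than
      // this is returned across several Results (intra-row scanning), and no
      // consistent view of that row is guaranteed across those RPCs.
      scan.setBatch(5);
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result result : scanner) {
          System.out.println("Partial or complete row: " + result);
        }
      }
    }
  }
}
----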
|
|
|
@ -22,11 +22,11 @@ under the License.
|
||||||
|
|
||||||
== Introduction
|
== Introduction
|
||||||
|
|
||||||
link:http://hbase.apache.org[Apache HBase (TM)] is a distributed, column-oriented store, modeled after Google's link:http://research.google.com/archive/bigtable.html[BigTable]. Apache HBase is built on top of link:http://hadoop.apache.org[Hadoop] for its link:http://hadoop.apache.org/mapreduce[MapReduce] and link:http://hadoop.apache.org/hdfs[distributed file system] implementations. All these projects are open-source and part of the link:http://www.apache.org[Apache Software Foundation].
|
link:https://hbase.apache.org[Apache HBase (TM)] is a distributed, column-oriented store, modeled after Google's link:http://research.google.com/archive/bigtable.html[BigTable]. Apache HBase is built on top of link:https://hadoop.apache.org[Hadoop] for its link:https://hadoop.apache.org/mapreduce[MapReduce] and link:https://hadoop.apache.org/hdfs[distributed file system] implementations. All these projects are open-source and part of the link:https://www.apache.org[Apache Software Foundation].
|
||||||
|
|
||||||
== Purpose
|
== Purpose
|
||||||
|
|
||||||
This document explains the *intricacies* of running Apache HBase on Windows using Cygwin as an all-in-one single-node installation for testing and development. The HBase link:http://hbase.apache.org/apidocs/overview-summary.html#overview_description[Overview] and link:book.html#getting_started[QuickStart] guides, on the other hand, go a long way in explaining how to set up link:http://hadoop.apache.org/hbase[HBase] in more complex deployment scenarios.
|
This document explains the *intricacies* of running Apache HBase on Windows using Cygwin as an all-in-one single-node installation for testing and development. The HBase link:https://hbase.apache.org/apidocs/overview-summary.html#overview_description[Overview] and link:book.html#getting_started[QuickStart] guides, on the other hand, go a long way in explaining how to set up link:https://hadoop.apache.org/hbase[HBase] in more complex deployment scenarios.
|
||||||
|
|
||||||
== Installation
|
== Installation
|
||||||
|
|
||||||
|
@ -86,7 +86,7 @@ HBase (and Hadoop) rely on link:http://nl.wikipedia.org/wiki/Secure_Shell[*SSH*]
|
||||||
|
|
||||||
=== HBase
|
=== HBase
|
||||||
|
|
||||||
Download the *latest release* of Apache HBase from link:http://www.apache.org/dyn/closer.cgi/hbase/. As the Apache HBase distributable is just a zipped archive, installation is as simple as unpacking the archive so it ends up in its final *installation* directory. Note that HBase has to be installed within Cygwin; a good suggestion is to use `/usr/local/` (or `[*Root* directory]\usr\local` in Windows slang). You should end up with a `/usr/local/hbase-_versi` installation in Cygwin.
|
Download the *latest release* of Apache HBase from link:https://www.apache.org/dyn/closer.cgi/hbase/. As the Apache HBase distributable is just a zipped archive, installation is as simple as unpacking the archive so it ends up in its final *installation* directory. Note that HBase has to be installed within Cygwin; a good suggestion is to use `/usr/local/` (or `[*Root* directory]\usr\local` in Windows slang). You should end up with a `/usr/local/hbase-_versi` installation in Cygwin.
|
||||||
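As a shell sketch of that unpacking step (run inside a Cygwin terminal; the version number is a hypothetical placeholder, and the tarball is assumed to have already been downloaded from a mirror):

----
# inside a Cygwin shell; X.Y.Z stands in for the release you downloaded
cd /usr/local
tar -xzf /path/to/hbase-X.Y.Z-bin.tar.gz
ls /usr/local/hbase-X.Y.Z/
----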
|
|
||||||
This completes the installation. We now move on to configuration.
|
This completes the installation. We now move on to configuration.
|
||||||
|
|
||||||
|
|
|
@ -20,7 +20,7 @@ under the License.
|
||||||
= Apache HBase™ Home
|
= Apache HBase™ Home
|
||||||
|
|
||||||
.Welcome to Apache HBase(TM)
|
.Welcome to Apache HBase(TM)
|
||||||
link:http://www.apache.org/[Apache HBase(TM)] is the link:http://hadoop.apache.org[Hadoop] database, a distributed, scalable, big data store.
|
link:https://www.apache.org/[Apache HBase(TM)] is the link:https://hadoop.apache.org[Hadoop] database, a distributed, scalable, big data store.
|
||||||
|
|
||||||
.When Would I Use Apache HBase?
|
.When Would I Use Apache HBase?
|
||||||
Use Apache HBase when you need random, realtime read/write access to your Big Data. +
|
Use Apache HBase when you need random, realtime read/write access to your Big Data. +
|
||||||
|
|
|
@ -20,13 +20,13 @@ under the License.
|
||||||
= Apache HBase (TM) Metrics
|
= Apache HBase (TM) Metrics
|
||||||
|
|
||||||
== Introduction
|
== Introduction
|
||||||
Apache HBase (TM) emits Hadoop link:http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/metrics/package-summary.html[metrics].
|
Apache HBase (TM) emits Hadoop link:https://hadoop.apache.org/core/docs/stable/api/org/apache/hadoop/metrics/package-summary.html[metrics].
|
||||||
|
|
||||||
== Setup
|
== Setup
|
||||||
|
|
||||||
First read up on Hadoop link:http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/metrics/package-summary.html[metrics].
|
First read up on Hadoop link:https://hadoop.apache.org/core/docs/stable/api/org/apache/hadoop/metrics/package-summary.html[metrics].
|
||||||
|
|
||||||
If you are using Ganglia, the link:http://wiki.apache.org/hadoop/GangliaMetrics[GangliaMetrics] wiki page is a useful read.
|
If you are using Ganglia, the link:https://wiki.apache.org/hadoop/GangliaMetrics[GangliaMetrics] wiki page is a useful read.
|
||||||
|
|
||||||
To have HBase emit metrics, edit `$HBASE_HOME/conf/hadoop-metrics.properties` and enable metric 'contexts' per plugin. As of this writing, Hadoop supports *file* and *ganglia* plugins. Yes, the HBase metrics file is named hadoop-metrics rather than _hbase-metrics_ because, currently at least, the Hadoop metrics system has the properties filename hardcoded. Per metrics _context_, comment out the NullContext and enable one or more plugins instead.
|
To have HBase emit metrics, edit `$HBASE_HOME/conf/hadoop-metrics.properties` and enable metric 'contexts' per plugin. As of this writing, Hadoop supports *file* and *ganglia* plugins. Yes, the HBase metrics file is named hadoop-metrics rather than _hbase-metrics_ because, currently at least, the Hadoop metrics system has the properties filename hardcoded. Per metrics _context_, comment out the NullContext and enable one or more plugins instead.
|
||||||
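As a sketch of what that looks like for the ganglia plugin (the host, port, and period below are placeholders, and the context class names should be checked against the `hadoop-metrics.properties` shipped with your release):

----
# $HBASE_HOME/conf/hadoop-metrics.properties (excerpt, illustrative only)
# Default: the NullContext swallows hbase-context metrics
# hbase.class=org.apache.hadoop.metrics.spi.NullContext

# Instead, push hbase-context metrics to Ganglia every 10 seconds
hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext
hbase.period=10
hbase.servers=gmetad.example.com:8649
----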
|
|
||||||
|
|
|
@ -57,7 +57,7 @@ October 25th, 2012:: link:http://www.meetup.com/HBase-NYC/events/81728932/[Strat
|
||||||
|
|
||||||
September 11th, 2012:: link:http://www.meetup.com/hbaseusergroup/events/80621872/[Contributor's Pow-Wow at HortonWorks HQ.]
|
September 11th, 2012:: link:http://www.meetup.com/hbaseusergroup/events/80621872/[Contributor's Pow-Wow at HortonWorks HQ.]
|
||||||
|
|
||||||
August 8th, 2012:: link:http://www.apache.org/dyn/closer.cgi/hbase/[Apache HBase 0.94.1 is available for download]
|
August 8th, 2012:: link:https://www.apache.org/dyn/closer.cgi/hbase/[Apache HBase 0.94.1 is available for download]
|
||||||
|
|
||||||
June 15th, 2012:: link:http://www.meetup.com/hbaseusergroup/events/59829652/[Birds-of-a-feather] in San Jose, day after:: link:http://hadoopsummit.org[Hadoop Summit]
|
June 15th, 2012:: link:http://www.meetup.com/hbaseusergroup/events/59829652/[Birds-of-a-feather] in San Jose, day after:: link:http://hadoopsummit.org[Hadoop Summit]
|
||||||
|
|
||||||
|
@ -69,9 +69,9 @@ March 27th, 2012:: link:http://www.meetup.com/hbaseusergroup/events/56021562/[Me
|
||||||
|
|
||||||
January 19th, 2012:: link:http://www.meetup.com/hbaseusergroup/events/46702842/[Meetup @ EBay]
|
January 19th, 2012:: link:http://www.meetup.com/hbaseusergroup/events/46702842/[Meetup @ EBay]
|
||||||
|
|
||||||
January 23rd, 2012:: Apache HBase 0.92.0 released. link:http://www.apache.org/dyn/closer.cgi/hbase/[Download it!]
|
January 23rd, 2012:: Apache HBase 0.92.0 released. link:https://www.apache.org/dyn/closer.cgi/hbase/[Download it!]
|
||||||
|
|
||||||
December 23rd, 2011:: Apache HBase 0.90.5 released. link:http://www.apache.org/dyn/closer.cgi/hbase/[Download it!]
|
December 23rd, 2011:: Apache HBase 0.90.5 released. link:https://www.apache.org/dyn/closer.cgi/hbase/[Download it!]
|
||||||
|
|
||||||
November 29th, 2011:: link:http://www.meetup.com/hackathon/events/41025972/[Developer Pow-Wow in SF] at Salesforce HQ
|
November 29th, 2011:: link:http://www.meetup.com/hackathon/events/41025972/[Developer Pow-Wow in SF] at Salesforce HQ
|
||||||
|
|
||||||
|
@ -83,9 +83,9 @@ June 30th, 2011:: link:http://www.meetup.com/hbaseusergroup/events/20572251/[HBa
|
||||||
|
|
||||||
June 8th, 2011:: link:http://berlinbuzzwords.de/wiki/hbase-workshop-and-hackathon[HBase Hackathon] in Berlin to coincide with:: link:http://berlinbuzzwords.de/[Berlin Buzzwords]
|
June 8th, 2011:: link:http://berlinbuzzwords.de/wiki/hbase-workshop-and-hackathon[HBase Hackathon] in Berlin to coincide with:: link:http://berlinbuzzwords.de/[Berlin Buzzwords]
|
||||||
|
|
||||||
May 19th, 2011:: Apache HBase 0.90.3 released. link:http://www.apache.org/dyn/closer.cgi/hbase/[Download it!]
|
May 19th, 2011:: Apache HBase 0.90.3 released. link:https://www.apache.org/dyn/closer.cgi/hbase/[Download it!]
|
||||||
|
|
||||||
April 12th, 2011:: Apache HBase 0.90.2 released. link:http://www.apache.org/dyn/closer.cgi/hbase/[Download it!]
|
April 12th, 2011:: Apache HBase 0.90.2 released. link:https://www.apache.org/dyn/closer.cgi/hbase/[Download it!]
|
||||||
|
|
||||||
March 21st, 2011:: link:http://www.meetup.com/hackathon/events/16770852/[HBase 0.92 Hackathon at StumbleUpon, SF]
|
March 21st, 2011:: link:http://www.meetup.com/hackathon/events/16770852/[HBase 0.92 Hackathon at StumbleUpon, SF]
|
||||||
February 22nd, 2011:: link:http://www.meetup.com/hbaseusergroup/events/16492913/[HUG12: February HBase User Group at StumbleUpon SF]
|
February 22nd, 2011:: link:http://www.meetup.com/hbaseusergroup/events/16492913/[HUG12: February HBase User Group at StumbleUpon SF]
|
||||||
|
@ -105,7 +105,7 @@ March 10th, 2010:: link:http://www.meetup.com/hbaseusergroup/calendar/12689351/[
|
||||||
|
|
||||||
January 27th, 2010:: Sign up for the link:http://www.meetup.com/hbaseusergroup/calendar/12241393/[HBase User Group Meeting, HUG8], at StumbleUpon in SF
|
January 27th, 2010:: Sign up for the link:http://www.meetup.com/hbaseusergroup/calendar/12241393/[HBase User Group Meeting, HUG8], at StumbleUpon in SF
|
||||||
|
|
||||||
September 8th, 2009:: Apache HBase 0.20.0 is faster, stronger, slimmer, and sweeter tasting than any previous Apache HBase release. Get it off the link:http://www.apache.org/dyn/closer.cgi/hbase/[Releases] page.
|
September 8th, 2009:: Apache HBase 0.20.0 is faster, stronger, slimmer, and sweeter tasting than any previous Apache HBase release. Get it off the link:https://www.apache.org/dyn/closer.cgi/hbase/[Releases] page.
|
||||||
|
|
||||||
November 2-6th, 2009:: link:http://dev.us.apachecon.com/c/acus2009/[ApacheCon] in Oakland. The Apache Foundation will be celebrating its 10th anniversary in beautiful Oakland by the Bay. Lots of good talks and meetups including an HBase presentation by a couple of the lads.
|
November 2-6th, 2009:: link:http://dev.us.apachecon.com/c/acus2009/[ApacheCon] in Oakland. The Apache Foundation will be celebrating its 10th anniversary in beautiful Oakland by the Bay. Lots of good talks and meetups including an HBase presentation by a couple of the lads.
|
||||||
|
|
||||||
|
@ -113,7 +113,7 @@ October 2nd, 2009:: HBase at Hadoop World in NYC. A few of us will be talking on
|
||||||
|
|
||||||
August 7th-9th, 2009:: HUG7 and HBase Hackathon at StumbleUpon in SF: Sign up for the:: link:http://www.meetup.com/hbaseusergroup/calendar/10950511/[HBase User Group Meeting, HUG7] or for the link:http://www.meetup.com/hackathon/calendar/10951718/[Hackathon] or for both (all are welcome!).
|
August 7th-9th, 2009:: HUG7 and HBase Hackathon at StumbleUpon in SF: Sign up for the:: link:http://www.meetup.com/hbaseusergroup/calendar/10950511/[HBase User Group Meeting, HUG7] or for the link:http://www.meetup.com/hackathon/calendar/10951718/[Hackathon] or for both (all are welcome!).
|
||||||
|
|
||||||
June, 2009:: HBase at HadoopSummit2009 and at NOSQL: See the link:http://wiki.apache.org/hadoop/HBase/HBasePresentations[presentations]
|
June, 2009:: HBase at HadoopSummit2009 and at NOSQL: See the link:https://wiki.apache.org/hadoop/HBase/HBasePresentations[presentations]
|
||||||
|
|
||||||
March 3rd, 2009 :: HUG6 -- link:http://www.meetup.com/hbaseusergroup/calendar/9764004/[HBase User Group 6]
|
March 3rd, 2009 :: HUG6 -- link:http://www.meetup.com/hbaseusergroup/calendar/9764004/[HBase User Group 6]
|
||||||
|
|
||||||
|
|
|
@ -19,7 +19,7 @@ under the License.
|
||||||
|
|
||||||
= Apache HBase(TM) Sponsors
|
= Apache HBase(TM) Sponsors
|
||||||
|
|
||||||
First off, thanks to link:http://www.apache.org/foundation/thanks.html[all who sponsor] our parent, the Apache Software Foundation.
|
First off, thanks to link:https://www.apache.org/foundation/thanks.html[all who sponsor] our parent, the Apache Software Foundation.
|
||||||
|
|
||||||
The companies below have been gracious enough to provide their commercial tool offerings free of charge to the Apache HBase(TM) project.
|
The companies below have been gracious enough to provide their commercial tool offerings free of charge to the Apache HBase(TM) project.
|
||||||
|
|
||||||
|
@ -32,5 +32,5 @@ The below companies have been gracious enough to provide their commerical tool o
|
||||||
* Thank you to Boris at link:http://www.vectorportal.com/[Vector Portal] for granting us a license on the image on which our logo is based.
|
* Thank you to Boris at link:http://www.vectorportal.com/[Vector Portal] for granting us a license on the image on which our logo is based.
|
||||||
|
|
||||||
== Sponsoring the Apache Software Foundation
|
== Sponsoring the Apache Software Foundation
|
||||||
To contribute to the Apache Software Foundation, a good idea in our opinion, see the link:http://www.apache.org/foundation/sponsorship.html[ASF Sponsorship] page.
|
To contribute to the Apache Software Foundation, a good idea in our opinion, see the link:https://www.apache.org/foundation/sponsorship.html[ASF Sponsorship] page.
|
||||||
|
|
||||||
|
|
|
@ -29,11 +29,11 @@ under the License.
|
||||||
<body>
|
<body>
|
||||||
<section name="Introduction">
|
<section name="Introduction">
|
||||||
<p>
|
<p>
|
||||||
Apache HBase (TM) emits Hadoop <a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/metrics/package-summary.html">metrics</a>.
|
Apache HBase (TM) emits Hadoop <a href="http://hadoop.apache.org/core/docs/stable/api/org/apache/hadoop/metrics/package-summary.html">metrics</a>.
|
||||||
</p>
|
</p>
|
||||||
</section>
|
</section>
|
||||||
<section name="Setup">
|
<section name="Setup">
|
||||||
<p>First read up on Hadoop <a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/metrics/package-summary.html">metrics</a>.
|
<p>First read up on Hadoop <a href="http://hadoop.apache.org/core/docs/stable/api/org/apache/hadoop/metrics/package-summary.html">metrics</a>.
|
||||||
If you are using Ganglia, the <a href="http://wiki.apache.org/hadoop/GangliaMetrics">GangliaMetrics</a>
|
If you are using Ganglia, the <a href="http://wiki.apache.org/hadoop/GangliaMetrics">GangliaMetrics</a>
|
||||||
wiki page is a useful read.</p>
|
wiki page is a useful read.</p>
|
||||||
<p>To have HBase emit metrics, edit <code>$HBASE_HOME/conf/hadoop-metrics.properties</code>
|
<p>To have HBase emit metrics, edit <code>$HBASE_HOME/conf/hadoop-metrics.properties</code>
|
||||||
|
|