HBASE-19068 Change all url of apache.org from HTTP to HTTPS in HBase book

Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
Yung-An He 2017-12-18 19:22:08 +08:00 committed by Chia-Ping Tsai
parent 811b88a877
commit a9e0ae6dfc
27 changed files with 174 additions and 174 deletions
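The change is mechanical: every `http://` link whose host is under `apache.org` becomes `https://`, while links to other hosts are left alone. A minimal sketch of such a rewrite follows. The class name `ApacheUrlUpgrader` and the regular expression are illustrative assumptions, not the tooling actually used for this commit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: upgrades http:// links to https:// for apache.org hosts only.
final class ApacheUrlUpgrader {
    // Group 1 captures the host: apache.org itself or any subdomain of it.
    // The negative lookahead rejects hosts that merely start with "apache.org",
    // such as "apache.org.example.com".
    private static final Pattern APACHE_HTTP =
        Pattern.compile("http://((?:[A-Za-z0-9-]+\\.)*apache\\.org)(?![A-Za-z0-9.-])");

    static String upgrade(String line) {
        Matcher m = APACHE_HTTP.matcher(line);
        return m.replaceAll("https://$1");
    }

    public static void main(String[] args) {
        // An apache.org link is rewritten.
        System.out.println(upgrade("link:http://hbase.apache.org/book.html[HBase Reference"));
        // A link to another host is left unchanged.
        System.out.println(upgrade("link:http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html[HBase and HDFS locality]"));
    }
}
```

Restricting the match to `apache.org` hosts is what keeps third-party links, which may not serve HTTPS, untouched.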

@ -35,9 +35,9 @@ including the documentation.
In HBase, documentation includes the following areas, and probably some others:
* The link:http://hbase.apache.org/book.html[HBase Reference
* The link:https://hbase.apache.org/book.html[HBase Reference
Guide] (this book)
* The link:http://hbase.apache.org/[HBase website]
* The link:https://hbase.apache.org/[HBase website]
* API documentation
* Command-line utility output and help text
* Web UI strings, explicit help text, context-sensitive strings, and others
@ -126,7 +126,7 @@ This directory also stores images used in the HBase Reference Guide.
The website's pages are written in an HTML-like XML dialect called xdoc, which
has a reference guide at
http://maven.apache.org/archives/maven-1.x/plugins/xdoc/reference/xdocs.html.
https://maven.apache.org/archives/maven-1.x/plugins/xdoc/reference/xdocs.html.
You can edit these files in a plain-text editor, an IDE, or an XML editor such
as XML Mind XML Editor (XXE) or Oxygen XML Author.

@ -101,7 +101,7 @@ The `hbase:meta` table structure is as follows:
.Values
* `info:regioninfo` (serialized link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html[HRegionInfo] instance for this region)
* `info:regioninfo` (serialized link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html[HRegionInfo] instance for this region)
* `info:server` (server:port of the RegionServer containing this region)
* `info:serverstartcode` (start-time of the RegionServer process containing this region)
@ -119,7 +119,7 @@ If a region has both an empty start and an empty end key, it is the only region
====
In the (hopefully unlikely) event that programmatic processing of catalog metadata
is required, see the link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/RegionInfo.html#parseFrom-byte:A-[RegionInfo.parseFrom] utility.
is required, see the link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/RegionInfo.html#parseFrom-byte:A-[RegionInfo.parseFrom] utility.
[[arch.catalog.startup]]
=== Startup Sequencing
@ -141,7 +141,7 @@ Should a region be reassigned either by the master load balancer or because a Re
See <<master.runtime>> for more information about the impact of the Master on HBase Client communication.
Administrative functions are done via an instance of link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html[Admin]
Administrative functions are done via an instance of link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html[Admin]
[[client.connections]]
=== Cluster Connections
@ -157,12 +157,12 @@ Finally, be sure to cleanup your `Connection` instance before exiting.
`Connections` are heavyweight objects but thread-safe so you can create one for your application and keep the instance around.
`Table`, `Admin` and `RegionLocator` instances are lightweight.
Create as you go and then let go as soon as you are done by closing them.
See the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/package-summary.html[Client Package Javadoc Description] for example usage of the new HBase 1.0 API.
See the link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/package-summary.html[Client Package Javadoc Description] for example usage of the new HBase 1.0 API.
==== API before HBase 1.0.0
Instances of `HTable` are the way to interact with an HBase cluster earlier than 1.0.0. _link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table] instances are not thread-safe_. Only one thread can use an instance of Table at any given time.
When creating Table instances, it is advisable to use the same link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HBaseConfiguration[HBaseConfiguration] instance.
Instances of `HTable` are the way to interact with an HBase cluster earlier than 1.0.0. _link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table] instances are not thread-safe_. Only one thread can use an instance of Table at any given time.
When creating Table instances, it is advisable to use the same link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HBaseConfiguration[HBaseConfiguration] instance.
This will ensure sharing of ZooKeeper and socket instances to the RegionServers which is usually what you want.
For example, this is preferred:
@ -183,7 +183,7 @@ HBaseConfiguration conf2 = HBaseConfiguration.create();
HTable table2 = new HTable(conf2, "myTable");
----
For more information about how connections are handled in the HBase client, see link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/ConnectionFactory.html[ConnectionFactory].
For more information about how connections are handled in the HBase client, see link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/ConnectionFactory.html[ConnectionFactory].
[[client.connection.pooling]]
===== Connection Pooling
@ -207,19 +207,19 @@ try (Connection connection = ConnectionFactory.createConnection(conf);
[WARNING]
====
Previous versions of this guide discussed `HTablePool`, which was deprecated in HBase 0.94, 0.95, and 0.96, and removed in 0.98.1, by link:https://issues.apache.org/jira/browse/HBASE-6580[HBASE-6580], or `HConnection`, which is deprecated in HBase 1.0 by `Connection`.
Please use link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Connection.html[Connection] instead.
Please use link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Connection.html[Connection] instead.
====
[[client.writebuffer]]
=== WriteBuffer and Batch Methods
In HBase 1.0 and later, link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HTable.html[HTable] is deprecated in favor of link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table]. `Table` does not use autoflush. To do buffered writes, use the BufferedMutator class.
In HBase 1.0 and later, link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HTable.html[HTable] is deprecated in favor of link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table]. `Table` does not use autoflush. To do buffered writes, use the BufferedMutator class.
In HBase 2.0 and later, link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HTable.html[HTable] does not use BufferedMutator to execute the ``Put`` operation. Refer to link:https://issues.apache.org/jira/browse/HBASE-18500[HBASE-18500] for more information.
In HBase 2.0 and later, link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HTable.html[HTable] does not use BufferedMutator to execute the ``Put`` operation. Refer to link:https://issues.apache.org/jira/browse/HBASE-18500[HBASE-18500] for more information.
For additional information on write durability, review the link:/acid-semantics.html[ACID semantics] page.
For fine-grained control of batching of ``Put``s or ``Delete``s, see the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch-java.util.List-java.lang.Object:A-[batch] methods on Table.
For fine-grained control of batching of ``Put``s or ``Delete``s, see the link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch-java.util.List-java.lang.Object:A-[batch] methods on Table.
[[async.client]]
=== Asynchronous Client ===
@ -263,7 +263,7 @@ Information on non-Java clients and custom protocols is covered in <<external_ap
[[client.filter]]
== Client Request Filters
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] and link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] instances can be optionally configured with link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html[filters] which are applied on the RegionServer.
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] and link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] instances can be optionally configured with link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html[filters] which are applied on the RegionServer.
Filters can be confusing because there are many different types, and it is best to approach them by understanding the groups of Filter functionality.
@ -275,7 +275,7 @@ Structural Filters contain other Filters.
[[client.filter.structural.fl]]
==== FilterList
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FilterList.html[FilterList] represents a list of Filters with a relationship of `FilterList.Operator.MUST_PASS_ALL` or `FilterList.Operator.MUST_PASS_ONE` between the Filters.
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FilterList.html[FilterList] represents a list of Filters with a relationship of `FilterList.Operator.MUST_PASS_ALL` or `FilterList.Operator.MUST_PASS_ONE` between the Filters.
The following example shows an 'or' between two Filters (checking for either 'my value' or 'my other value' on the same attribute).
[source,java]
@ -305,7 +305,7 @@ scan.setFilter(list);
==== SingleColumnValueFilter
A SingleColumnValueFilter (see:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html)
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html)
can be used to test column values for equivalence (`CompareOperator.EQUAL`),
inequality (`CompareOperator.NOT_EQUAL`), or ranges (e.g., `CompareOperator.GREATER`). The following is an
example of testing equivalence of a column to a String value "my value"...
@ -330,7 +330,7 @@ These Comparators are used in concert with other Filters, such as <<client.filte
[[client.filter.cvp.rcs]]
==== RegexStringComparator
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/RegexStringComparator.html[RegexStringComparator] supports regular expressions for value comparisons.
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/RegexStringComparator.html[RegexStringComparator] supports regular expressions for value comparisons.
[source,java]
----
@ -349,7 +349,7 @@ See the Oracle JavaDoc for link:http://download.oracle.com/javase/6/docs/api/jav
[[client.filter.cvp.substringcomparator]]
==== SubstringComparator
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SubstringComparator.html[SubstringComparator] can be used to determine if a given substring exists in a value.
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SubstringComparator.html[SubstringComparator] can be used to determine if a given substring exists in a value.
The comparison is case-insensitive.
[source,java]
@ -368,12 +368,12 @@ scan.setFilter(filter);
[[client.filter.cvp.bfp]]
==== BinaryPrefixComparator
See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryPrefixComparator.html[BinaryPrefixComparator].
See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryPrefixComparator.html[BinaryPrefixComparator].
[[client.filter.cvp.bc]]
==== BinaryComparator
See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryComparator.html[BinaryComparator].
See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/BinaryComparator.html[BinaryComparator].
[[client.filter.kvm]]
=== KeyValue Metadata
@ -383,18 +383,18 @@ As HBase stores data internally as KeyValue pairs, KeyValue Metadata Filters eva
[[client.filter.kvm.ff]]
==== FamilyFilter
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FamilyFilter.html[FamilyFilter] can be used to filter on the ColumnFamily.
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FamilyFilter.html[FamilyFilter] can be used to filter on the ColumnFamily.
It is generally a better idea to select ColumnFamilies in the Scan than to do it with a Filter.
[[client.filter.kvm.qf]]
==== QualifierFilter
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/QualifierFilter.html[QualifierFilter] can be used to filter based on Column (aka Qualifier) name.
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/QualifierFilter.html[QualifierFilter] can be used to filter based on Column (aka Qualifier) name.
[[client.filter.kvm.cpf]]
==== ColumnPrefixFilter
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnPrefixFilter.html[ColumnPrefixFilter] can be used to filter based on the lead portion of Column (aka Qualifier) names.
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnPrefixFilter.html[ColumnPrefixFilter] can be used to filter based on the lead portion of Column (aka Qualifier) names.
A ColumnPrefixFilter seeks ahead to the first column matching the prefix in each row and for each involved column family.
It can be used to efficiently get a subset of the columns in very wide rows.
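The seek-ahead behavior can be pictured with a sorted map standing in for the columns of one row. This is a sketch of the idea only, not HBase's implementation: `PrefixSeekSketch` is a hypothetical name, and Java `String` ordering is assumed to stand in for HBase's byte ordering (the two agree for ASCII qualifiers).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model: a row's columns as a sorted map of qualifier -> value.
final class PrefixSeekSketch {
    static List<String> columnsWithPrefix(NavigableMap<String, String> sortedColumns, String prefix) {
        List<String> matches = new ArrayList<>();
        // Seek directly to the first qualifier >= prefix instead of scanning from the start.
        for (Map.Entry<String, String> e : sortedColumns.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // sorted order: once the prefix stops matching, no later column can match
            }
            matches.add(e.getKey());
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        row.put("a", "1");
        row.put("abc:1", "2");
        row.put("abc:2", "3");
        row.put("abd:1", "4");
        row.put("b", "5");
        System.out.println(columnsWithPrefix(row, "abc"));
    }
}
```

Because matching columns are contiguous in sorted order, the work is proportional to the number of matches rather than to the width of the row.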
@ -427,7 +427,7 @@ rs.close();
[[client.filter.kvm.mcpf]]
==== MultipleColumnPrefixFilter
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/MultipleColumnPrefixFilter.html[MultipleColumnPrefixFilter] behaves like ColumnPrefixFilter but allows specifying multiple prefixes.
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/MultipleColumnPrefixFilter.html[MultipleColumnPrefixFilter] behaves like ColumnPrefixFilter but allows specifying multiple prefixes.
Like ColumnPrefixFilter, MultipleColumnPrefixFilter efficiently seeks ahead to the first column matching the lowest prefix and also seeks past ranges of columns between prefixes.
It can be used to efficiently get discontinuous sets of columns from very wide rows.
@ -457,7 +457,7 @@ rs.close();
[[client.filter.kvm.crf]]
==== ColumnRangeFilter
A link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnRangeFilter.html[ColumnRangeFilter] allows efficient intra row scanning.
A link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/ColumnRangeFilter.html[ColumnRangeFilter] allows efficient intra row scanning.
A ColumnRangeFilter can seek ahead to the first matching column for each involved column family.
It can be used to efficiently get a 'slice' of the columns of a very wide row.
@ -498,7 +498,7 @@ Note: Introduced in HBase 0.92
[[client.filter.row.rf]]
==== RowFilter
It is generally a better idea to use the startRow/stopRow methods on Scan for row selection, however link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/RowFilter.html[RowFilter] can also be used.
It is generally a better idea to use the startRow/stopRow methods on Scan for row selection, however link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/RowFilter.html[RowFilter] can also be used.
[[client.filter.utility]]
=== Utility
@ -507,7 +507,7 @@ It is generally a better idea to use the startRow/stopRow methods on Scan for ro
==== FirstKeyOnlyFilter
This is primarily used for rowcount jobs.
See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html[FirstKeyOnlyFilter].
See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html[FirstKeyOnlyFilter].
[[architecture.master]]
== Master
@ -634,7 +634,7 @@ However, latencies tend to be less erratic across time, because there is less ga
If the BucketCache is deployed in off-heap mode, this memory is not managed by the GC at all.
This is why you would use BucketCache: to make your latencies less erratic and to mitigate GCs and heap fragmentation.
See Nick Dimiduk's link:http://www.n10k.com/blog/blockcache-101/[BlockCache 101] for comparisons running on-heap vs off-heap tests.
Also see link:http://people.apache.org/~stack/bc/[Comparing BlockCache Deploys], which finds that if your dataset fits inside your LruBlockCache deploy, you should use it; otherwise, if you are experiencing cache churn (or you want your cache to exist beyond the vagaries of Java GC), use BucketCache.
Also see link:https://people.apache.org/~stack/bc/[Comparing BlockCache Deploys], which finds that if your dataset fits inside your LruBlockCache deploy, you should use it; otherwise, if you are experiencing cache churn (or you want your cache to exist beyond the vagaries of Java GC), use BucketCache.
When you enable BucketCache, you are enabling a two tier caching system, an L1 cache which is implemented by an instance of LruBlockCache and an off-heap L2 cache which is implemented by BucketCache.
Management of these two tiers and the policy that dictates how blocks move between them is done by `CombinedBlockCache`.
@ -645,7 +645,7 @@ See <<offheap.blockcache>> for more detail on going off-heap.
==== General Cache Configurations
Apart from the cache implementation itself, you can set some general configuration options to control how the cache performs.
See http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CacheConfig.html.
See https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CacheConfig.html.
After setting any of these options, restart or rolling restart your cluster for the configuration to take effect.
Check logs for errors or unexpected behavior.
@ -755,7 +755,7 @@ Since link:https://issues.apache.org/jira/browse/HBASE-4683[HBASE-4683 Always ca
===== How to Enable BucketCache
The usual deploy of BucketCache is via a managing class that sets up two caching tiers: an L1 on-heap cache implemented by LruBlockCache and a second L2 cache implemented with BucketCache.
The managing class is link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CombinedBlockCache.html[CombinedBlockCache] by default.
The managing class is link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/hfile/CombinedBlockCache.html[CombinedBlockCache] by default.
The previous link describes the caching 'policy' implemented by CombinedBlockCache.
In short, it keeps meta blocks (INDEX and BLOOM) in the L1, on-heap LruBlockCache tier and DATA blocks in the L2, BucketCache tier.
Since HBase 1.0, you can change this behavior and ask that a column family have both its meta and DATA blocks hosted on-heap in the L1 tier by setting `cacheDataInL1` via `HColumnDescriptor.setCacheDataInL1(true)` or, in the shell, by creating or amending column families with `CACHE_DATA_IN_L1` set to true, e.g.
@ -881,7 +881,7 @@ The compressed BlockCache is disabled by default. To enable it, set `hbase.block
As write requests are handled by the region server, they accumulate in an in-memory storage system called the _memstore_. Once the memstore fills, its contents are written to disk as additional store files. This event is called a _memstore flush_. As store files accumulate, the RegionServer will <<compaction,compact>> them into fewer, larger files. After each flush or compaction finishes, the amount of data stored in the region has changed. The RegionServer consults the region split policy to determine if the region has grown too large or should be split for another policy-specific reason. A region split request is enqueued if the policy recommends it.
Logically, the process of splitting a region is simple. We find a suitable point in the keyspace of the region where we should divide the region in half, then split the region's data into two new regions at that point. The details of the process, however, are not simple. When a split happens, the newly created _daughter regions_ do not rewrite all the data into new files immediately. Instead, they create small files similar to symbolic link files, named link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/Reference.html[Reference files], which point to either the top or bottom part of the parent store file according to the split point. The reference file is used just like a regular data file, but only half of the records are considered. The region can only be split if there are no more references to the immutable data files of the parent region. Those reference files are cleaned gradually by compactions, so that the region will stop referring to its parent's files, and can be split further.
Logically, the process of splitting a region is simple. We find a suitable point in the keyspace of the region where we should divide the region in half, then split the region's data into two new regions at that point. The details of the process, however, are not simple. When a split happens, the newly created _daughter regions_ do not rewrite all the data into new files immediately. Instead, they create small files similar to symbolic link files, named link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/io/Reference.html[Reference files], which point to either the top or bottom part of the parent store file according to the split point. The reference file is used just like a regular data file, but only half of the records are considered. The region can only be split if there are no more references to the immutable data files of the parent region. Those reference files are cleaned gradually by compactions, so that the region will stop referring to its parent's files, and can be split further.
Although splitting the region is a local decision made by the RegionServer, the split process itself must coordinate with many actors. The RegionServer notifies the Master before and after the split, updates the `.META.` table so that clients can discover the new daughter regions, and rearranges the directory structure and data files in HDFS. Splitting is a multi-task process. To enable rollback in case of an error, the RegionServer keeps an in-memory journal about the execution state. The steps taken by the RegionServer to execute the split are illustrated in <<regionserver_split_process_image>>. Each step is labeled with its step number. Actions from RegionServers or Master are shown in red, while actions from the clients are shown in green.
@ -915,7 +915,7 @@ Under normal operations, the WAL is not needed because data changes move from th
However, if a RegionServer crashes or becomes unavailable before the MemStore is flushed, the WAL ensures that the changes to the data can be replayed.
If writing to the WAL fails, the entire operation to modify the data fails.
HBase uses an implementation of the link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/wal/WAL.html[WAL] interface.
HBase uses an implementation of the link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/wal/WAL.html[WAL] interface.
Usually, there is only one instance of a WAL per RegionServer.
The RegionServer records Puts and Deletes to it, before recording them to the <<store.memstore>> for the affected <<store>>.
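The ordering described above (log first, memory second, replay after a crash) can be sketched in a few lines. This is a toy model under stated assumptions: `WalSketch` and its methods are hypothetical names, an in-memory list stands in for the durable log, and none of this is the HBase `WAL` interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy write-ahead log: every edit is logged before it is applied to the in-memory store.
final class WalSketch {
    // Stand-in for the durable log; a real WAL lives on a filesystem such as HDFS.
    private final List<String[]> wal = new ArrayList<>();
    // Stand-in for the MemStore; lost when the process crashes.
    private TreeMap<String, String> memstore = new TreeMap<>();

    void put(String row, String value) {
        wal.add(new String[] {row, value}); // 1. record the edit in the log
        memstore.put(row, value);           // 2. only then apply it in memory
    }

    // Simulates a crash before any flush: in-memory state is gone, the log survives.
    void crash() {
        memstore = new TreeMap<>();
    }

    // Rebuilds in-memory state by replaying the logged edits in order.
    void replay() {
        for (String[] edit : wal) {
            memstore.put(edit[0], edit[1]);
        }
    }

    String get(String row) {
        return memstore.get(row);
    }

    public static void main(String[] args) {
        WalSketch store = new WalSketch();
        store.put("row1", "v1");
        store.crash();
        store.replay();
        System.out.println(store.get("row1"));
    }
}
```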
@ -1364,12 +1364,12 @@ The HDFS client does the following by default when choosing locations to write r
. Second replica is written to a random node on another rack
. Third replica is written on the same rack as the second, but on a different node chosen randomly
. Subsequent replicas are written on random nodes on the cluster.
See _Replica Placement: The First Baby Steps_ on this page: link:http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html[HDFS Architecture]
See _Replica Placement: The First Baby Steps_ on this page: link:https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html[HDFS Architecture]
Thus, HBase eventually achieves locality for a region after a flush or a compaction.
In a RegionServer failover situation a RegionServer may be assigned regions with non-local StoreFiles (because none of the replicas are local), however as new data is written in the region, or the table is compacted and StoreFiles are re-written, they will become "local" to the RegionServer.
For more information, see _Replica Placement: The First Baby Steps_ on this page: link:http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html[HDFS Architecture] and also Lars George's blog on link:http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html[HBase and HDFS locality].
For more information, see _Replica Placement: The First Baby Steps_ on this page: link:https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html[HDFS Architecture] and also Lars George's blog on link:http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html[HBase and HDFS locality].
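The default placement steps listed above can be sketched as follows, with the first replica placed on the writer's local node as in the HDFS default policy. This is an illustration only, not HDFS code: `ReplicaPlacementSketch`, its `place` method, and the node-to-rack map are hypothetical, and the sketch assumes the cluster has at least two racks and that the rack chosen for the second replica holds at least two nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical model of default replica placement; nodeToRack maps node name -> rack name.
final class ReplicaPlacementSketch {
    static List<String> place(String localNode, Map<String, String> nodeToRack,
                              int replication, Random rnd) {
        List<String> chosen = new ArrayList<>();
        chosen.add(localNode); // first replica: the writer's local node
        String localRack = nodeToRack.get(localNode);
        if (replication >= 2) {
            // second replica: a random node on another rack
            String second = pick(nodeToRack, chosen, rnd, rack -> !rack.equals(localRack));
            chosen.add(second);
            if (replication >= 3) {
                // third replica: same rack as the second, but a different node
                String secondRack = nodeToRack.get(second);
                chosen.add(pick(nodeToRack, chosen, rnd, rack -> rack.equals(secondRack)));
            }
        }
        // subsequent replicas: random nodes anywhere in the cluster
        while (chosen.size() < replication) {
            chosen.add(pick(nodeToRack, chosen, rnd, rack -> true));
        }
        return chosen;
    }

    // Picks a random node that is not already chosen and whose rack satisfies rackOk.
    private static String pick(Map<String, String> nodeToRack, List<String> exclude,
                               Random rnd, Predicate<String> rackOk) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
            if (!exclude.contains(e.getKey()) && rackOk.test(e.getValue())) {
                candidates.add(e.getKey());
            }
        }
        if (candidates.isEmpty()) {
            throw new IllegalStateException("no eligible node for this replica");
        }
        return candidates.get(rnd.nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        Map<String, String> racks = new HashMap<>();
        racks.put("n1", "rackA");
        racks.put("n2", "rackA");
        racks.put("n3", "rackB");
        racks.put("n4", "rackB");
        System.out.println(place("n1", racks, 3, new Random()));
    }
}
```

Since a RegionServer writes through an HDFS client on its own node, the first replica of each new StoreFile lands locally, which is why flushes and compactions restore locality.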
[[arch.region.splits]]
=== Region Splits
@ -1384,9 +1384,9 @@ See <<disable.splitting>> for how to manually manage splits (and for why you mig
==== Custom Split Policies
You can override the default split policy using a custom
link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/RegionSplitPolicy.html[RegionSplitPolicy](HBase 0.94+).
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/RegionSplitPolicy.html[RegionSplitPolicy](HBase 0.94+).
Typically a custom split policy should extend HBase's default split policy:
link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.html[IncreasingToUpperBoundRegionSplitPolicy].
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.html[IncreasingToUpperBoundRegionSplitPolicy].
The policy can be set globally through the HBase configuration or on a per-table
basis.
@ -1460,13 +1460,13 @@ Using a Custom Algorithm::
As parameters, you give it the algorithm, desired number of regions, and column families.
It includes two split algorithms.
The first is the
`link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/RegionSplitter.HexStringSplit.html[HexStringSplit]`
`link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/RegionSplitter.HexStringSplit.html[HexStringSplit]`
algorithm, which assumes the row keys are hexadecimal strings.
The second,
`link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/RegionSplitter.UniformSplit.html[UniformSplit]`,
`link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/RegionSplitter.UniformSplit.html[UniformSplit]`,
assumes the row keys are random byte arrays.
You will probably need to develop your own
`link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/RegionSplitter.SplitAlgorithm.html[SplitAlgorithm]`,
`link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/RegionSplitter.SplitAlgorithm.html[SplitAlgorithm]`,
using the provided ones as models.
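To make the idea of a hexadecimal split algorithm concrete, the sketch below computes evenly spaced split points over a fixed-width hex keyspace. It is an illustration under stated assumptions, not the `RegionSplitter` implementation: `HexSplitSketch`, `splitPoints`, and the fixed key width are hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: evenly spaced split points for row keys that are fixed-width hex strings.
final class HexSplitSketch {
    static List<String> splitPoints(int numRegions, int hexDigits) {
        if (numRegions < 1 || hexDigits < 1) {
            throw new IllegalArgumentException("numRegions and hexDigits must be positive");
        }
        // Size of the keyspace: 16^hexDigits possible keys.
        BigInteger range = BigInteger.ONE.shiftLeft(4 * hexDigits);
        List<String> points = new ArrayList<>();
        // n regions need n - 1 boundaries, at i/n of the keyspace for i = 1..n-1.
        for (int i = 1; i < numRegions; i++) {
            BigInteger p = range.multiply(BigInteger.valueOf(i))
                                .divide(BigInteger.valueOf(numRegions));
            String hex = p.toString(16);
            StringBuilder padded = new StringBuilder();
            for (int k = hex.length(); k < hexDigits; k++) {
                padded.append('0'); // left-pad so every key has the same width
            }
            points.add(padded.append(hex).toString());
        }
        return points;
    }

    public static void main(String[] args) {
        System.out.println(splitPoints(4, 8));
    }
}
```

Even spacing only balances load when the actual row keys are spread uniformly over the keyspace, which is the case a custom `SplitAlgorithm` is meant to handle when they are not.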
=== Online Region Merges
@ -1542,7 +1542,7 @@ StoreFiles are where your data lives.
===== HFile Format
The _HFile_ file format is based on the SSTable file described in the link:http://research.google.com/archive/bigtable.html[BigTable [2006]] paper and on Hadoop's link:http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/file/tfile/TFile.html[TFile] (The unit test suite and the compression harness were taken directly from TFile). Schubert Zhang's blog post on link:http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html[HFile: A Block-Indexed File Format to Store Sorted Key-Value Pairs] makes for a thorough introduction to HBase's HFile.
The _HFile_ file format is based on the SSTable file described in the link:http://research.google.com/archive/bigtable.html[BigTable [2006]] paper and on Hadoop's link:https://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/file/tfile/TFile.html[TFile] (The unit test suite and the compression harness were taken directly from TFile). Schubert Zhang's blog post on link:http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html[HFile: A Block-Indexed File Format to Store Sorted Key-Value Pairs] makes for a thorough introduction to HBase's HFile.
Matteo Bertozzi has also put up a helpful description, link:http://th30z.blogspot.com/2011/02/hbase-io-hfile.html?spref=tw[HBase I/O: HFile].
For more information, see the HFile source code.
@ -2368,7 +2368,7 @@ See the `LoadIncrementalHFiles` class for more information.
As HBase runs on HDFS (and each StoreFile is written as a file on HDFS), it is important to have an understanding of the HDFS Architecture especially in terms of how it stores files, handles failovers, and replicates blocks.
See the Hadoop documentation on link:http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html[HDFS Architecture] for more information.
See the Hadoop documentation on link:https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html[HDFS Architecture] for more information.
[[arch.hdfs.nn]]
=== NameNode
@ -2772,7 +2772,7 @@ if (result.isStale()) {
=== Resources
. More information about the design and implementation can be found at the jira issue: link:https://issues.apache.org/jira/browse/HBASE-10070[HBASE-10070]
. HBaseCon 2014 talk: link:http://hbase.apache.org/www.hbasecon.com/#2014-PresentationsRecordings[HBase Read High Availability Using Timeline-Consistent Region Replicas] also contains some details and link:http://www.slideshare.net/enissoz/hbase-high-availability-for-reads-with-time[slides].
. HBaseCon 2014 talk: link:https://hbase.apache.org/www.hbasecon.com/#2014-PresentationsRecordings[HBase Read High Availability Using Timeline-Consistent Region Replicas] also contains some details and link:http://www.slideshare.net/enissoz/hbase-high-availability-for-reads-with-time[slides].
ifdef::backend-docbook[]
[index]

@ -35,13 +35,13 @@ HBase is a project in the Apache Software Foundation and as such there are respo
[[asf.devprocess]]
=== ASF Development Process
See the link:http://www.apache.org/dev/#committers[Apache Development Process page] for all sorts of information on how the ASF is structured (e.g., PMC, committers, contributors), to tips on contributing and getting involved, and how open-source works at ASF.
See the link:https://www.apache.org/dev/#committers[Apache Development Process page] for all sorts of information on how the ASF is structured (e.g., PMC, committers, contributors), to tips on contributing and getting involved, and how open-source works at ASF.
[[asf.reporting]]
=== ASF Board Reporting
Once a quarter, each project in the ASF portfolio submits a report to the ASF board.
This is done by the HBase project lead and the committers.
See link:http://www.apache.org/foundation/board/reporting[ASF board reporting] for more information.
See link:https://www.apache.org/foundation/board/reporting[ASF board reporting] for more information.
:numbered:

@ -43,7 +43,7 @@ _backup-masters_::
_hadoop-metrics2-hbase.properties_::
Used to connect HBase to Hadoop's Metrics2 framework.
See the link:http://wiki.apache.org/hadoop/HADOOP-6728-MetricsV2[Hadoop Wiki entry] for more information on Metrics2.
See the link:https://wiki.apache.org/hadoop/HADOOP-6728-MetricsV2[Hadoop Wiki entry] for more information on Metrics2.
Contains only commented-out examples by default.
_hbase-env.cmd_ and _hbase-env.sh_::
@ -125,7 +125,7 @@ NOTE: You must set `JAVA_HOME` on each node of your cluster. _hbase-env.sh_ prov
[[os]]
.Operating System Utilities
ssh::
HBase uses the Secure Shell (ssh) command and utilities extensively to communicate between cluster nodes. Each server in the cluster must be running `ssh` so that the Hadoop and HBase daemons can be managed. You must be able to connect to all nodes via SSH, including the local node, from the Master as well as any backup Master, using a shared key rather than a password. You can see the basic methodology for such a set-up in Linux or Unix systems at "<<passwordless.ssh.quickstart>>". If your cluster nodes use OS X, see the section, link:http://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_%28Single-Node_Cluster%29[SSH: Setting up Remote Desktop and Enabling Self-Login] on the Hadoop wiki.
HBase uses the Secure Shell (ssh) command and utilities extensively to communicate between cluster nodes. Each server in the cluster must be running `ssh` so that the Hadoop and HBase daemons can be managed. You must be able to connect to all nodes via SSH, including the local node, from the Master as well as any backup Master, using a shared key rather than a password. You can see the basic methodology for such a set-up in Linux or Unix systems at "<<passwordless.ssh.quickstart>>". If your cluster nodes use OS X, see the section, link:https://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_%28Single-Node_Cluster%29[SSH: Setting up Remote Desktop and Enabling Self-Login] on the Hadoop wiki.
DNS::
HBase uses the local hostname to self-report its IP address. Both forward and reverse DNS resolving must work in versions of HBase previous to 0.92.0. The link:https://github.com/sujee/hadoop-dns-checker[hadoop-dns-checker] tool can be used to verify DNS is working correctly on the cluster. The project `README` file provides detailed instructions on usage.
@ -181,13 +181,13 @@ Windows::
[[hadoop]]
=== link:http://hadoop.apache.org[Hadoop](((Hadoop)))
=== link:https://hadoop.apache.org[Hadoop](((Hadoop)))
The following table summarizes the versions of Hadoop supported with each version of HBase.
Based on the version of HBase, you should select the most appropriate version of Hadoop.
You can use Apache Hadoop, or a vendor's distribution of Hadoop.
No distinction is made here.
See link:http://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support[the Hadoop wiki] for information about vendors of Hadoop.
See link:https://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support[the Hadoop wiki] for information about vendors of Hadoop.
.Hadoop 2.x is recommended.
[TIP]
@ -358,7 +358,7 @@ Distributed mode can be subdivided into distributed but all daemons run on a sin
The _pseudo-distributed_ vs. _fully-distributed_ nomenclature comes from Hadoop.
Pseudo-distributed mode can run against the local filesystem or it can run against an instance of the _Hadoop Distributed File System_ (HDFS). Fully-distributed mode can ONLY run on HDFS.
See the Hadoop link:http://hadoop.apache.org/docs/current/[documentation] for how to set up HDFS.
See the Hadoop link:https://hadoop.apache.org/docs/current/[documentation] for how to set up HDFS.
A good walk-through for setting up HDFS on Hadoop 2 can be found at http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide.
[[pseudo]]
@ -577,7 +577,7 @@ An example basic _hbase-site.xml_ for client only might look as follows:
[[java.client.config]]
==== Java client configuration
The configuration used by a Java client is kept in an link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HBaseConfiguration[HBaseConfiguration] instance.
The configuration used by a Java client is kept in an link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HBaseConfiguration[HBaseConfiguration] instance.
The factory method on HBaseConfiguration, `HBaseConfiguration.create();`, on invocation, will read in the content of the first _hbase-site.xml_ found on the client's `CLASSPATH`, if one is present (Invocation will also factor in any _hbase-default.xml_ found; an _hbase-default.xml_ ships inside the _hbase.X.X.X.jar_). It is also possible to specify configuration directly without having to read from a _hbase-site.xml_.
For example, to set the ZooKeeper ensemble for the cluster programmatically do as follows:
@ -588,7 +588,7 @@ Configuration config = HBaseConfiguration.create();
config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zookeeper locally
----
If multiple ZooKeeper instances make up your ZooKeeper ensemble, they may be specified in a comma-separated list (just as in the _hbase-site.xml_ file). This populated `Configuration` instance can then be passed to an link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table], and so on.
If multiple ZooKeeper instances make up your ZooKeeper ensemble, they may be specified in a comma-separated list (just as in the _hbase-site.xml_ file). This populated `Configuration` instance can then be passed to an link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table], and so on.
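As a minimal sketch of the comma-separated form described above, the following assumes a three-node ensemble and an existing table. The hostnames `zk1.example.com`, `zk2.example.com`, and `zk3.example.com` and the table name `t1` are hypothetical placeholders. In the HBase 1.0 and later client API, the populated `Configuration` is usually handed to `ConnectionFactory`, which then supplies `Table` instances.
[source,java]
----
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class QuorumConfigExample {
  public static void main(String[] args) throws IOException {
    // Reads hbase-default.xml and any hbase-site.xml found on the CLASSPATH.
    Configuration config = HBaseConfiguration.create();

    // Hypothetical hostnames: replace with the members of your own ensemble.
    config.set("hbase.zookeeper.quorum",
        "zk1.example.com,zk2.example.com,zk3.example.com");

    // Connection and Table are both Closeable, so try-with-resources releases them.
    try (Connection connection = ConnectionFactory.createConnection(config);
         Table table = connection.getTable(TableName.valueOf("t1"))) { // "t1" is a placeholder
      System.out.println("Connected to table " + table.getName());
    }
  }
}
----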
[[example_config]]
== Example Configurations
@ -822,7 +822,7 @@ See the entry for `hbase.hregion.majorcompaction` in the <<compaction.parameters
====
Major compactions are absolutely necessary for StoreFile clean-up.
Do not disable them altogether.
You can run major compactions manually via the HBase shell or via the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact-org.apache.hadoop.hbase.TableName-[Admin API].
You can run major compactions manually via the HBase shell or via the link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact-org.apache.hadoop.hbase.TableName-[Admin API].
====
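For illustration only, a manual major compaction through the Admin API might look like the sketch below. The table name `t1` is a hypothetical placeholder, and the cluster location is assumed to come from an _hbase-site.xml_ on the client's `CLASSPATH`. The call only requests the compaction; the work itself runs asynchronously on the RegionServers.
[source,java]
----
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ManualMajorCompaction {
  public static void main(String[] args) throws IOException {
    Configuration config = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(config);
         Admin admin = connection.getAdmin()) {
      // Requests a major compaction of every region of the (hypothetical) table "t1".
      // The method returns once the request is submitted, not when compaction finishes.
      admin.majorCompact(TableName.valueOf("t1"));
    }
  }
}
----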
For more information about compactions and the compaction file selection process, see <<compaction,compaction>>.


@ -61,7 +61,7 @@ coprocessor can severely degrade cluster performance and stability.
In HBase, you fetch data using a `Get` or `Scan`, whereas in an RDBMS you use a SQL
query. In order to fetch only the relevant data, you filter it using a HBase
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html[Filter]
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html[Filter]
, whereas in an RDBMS you use a `WHERE` predicate.
After fetching the data, you perform computations on it. This paradigm works well
@ -121,8 +121,8 @@ package.
Observer coprocessors are triggered either before or after a specific event occurs.
Observers that happen before an event use methods that start with a `pre` prefix,
such as link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#prePut-org.apache.hadoop.hbase.coprocessor.ObserverContext-org.apache.hadoop.hbase.client.Put-org.apache.hadoop.hbase.wal.WALEdit-org.apache.hadoop.hbase.client.Durability-[`prePut`]. Observers that happen just after an event override methods that start
with a `post` prefix, such as link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#postPut-org.apache.hadoop.hbase.coprocessor.ObserverContext-org.apache.hadoop.hbase.client.Put-org.apache.hadoop.hbase.wal.WALEdit-org.apache.hadoop.hbase.client.Durability-[`postPut`].
such as link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#prePut-org.apache.hadoop.hbase.coprocessor.ObserverContext-org.apache.hadoop.hbase.client.Put-org.apache.hadoop.hbase.wal.WALEdit-org.apache.hadoop.hbase.client.Durability-[`prePut`]. Observers that happen just after an event override methods that start
with a `post` prefix, such as link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#postPut-org.apache.hadoop.hbase.coprocessor.ObserverContext-org.apache.hadoop.hbase.client.Put-org.apache.hadoop.hbase.wal.WALEdit-org.apache.hadoop.hbase.client.Durability-[`postPut`].
==== Use Cases for Observer Coprocessors
@ -139,7 +139,7 @@ Referential Integrity::
Secondary Indexes::
You can use a coprocessor to maintain secondary indexes. For more information, see
link:http://wiki.apache.org/hadoop/Hbase/SecondaryIndexing[SecondaryIndexing].
link:https://wiki.apache.org/hadoop/Hbase/SecondaryIndexing[SecondaryIndexing].
==== Types of Observer Coprocessor
@ -163,7 +163,7 @@ MasterObserver::
WalObserver::
A WalObserver allows you to observe events related to writes to the Write-Ahead
Log (WAL). See
link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/WALObserver.html[WALObserver].
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/WALObserver.html[WALObserver].
<<cp_example,Examples>> provides working examples of observer coprocessors.


@ -270,21 +270,21 @@ Cell content is uninterpreted bytes
== Data Model Operations
The four primary data model operations are Get, Put, Scan, and Delete.
Operations are applied via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table] instances.
Operations are applied via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table] instances.
=== Get
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] returns attributes for a specified row.
Gets are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#get-org.apache.hadoop.hbase.client.Get-[Table.get]
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] returns attributes for a specified row.
Gets are executed via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#get-org.apache.hadoop.hbase.client.Get-[Table.get]
=== Put
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] either adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put-org.apache.hadoop.hbase.client.Put-[Table.put] (non-writeBuffer) or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch-java.util.List-java.lang.Object:A-[Table.batch] (non-writeBuffer)
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] either adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put-org.apache.hadoop.hbase.client.Put-[Table.put] (non-writeBuffer) or link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch-java.util.List-java.lang.Object:A-[Table.batch] (non-writeBuffer)
[[scan]]
=== Scans
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] allows iteration over multiple rows for specified attributes.
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] allows iteration over multiple rows for specified attributes.
The following is an example of a Scan on a Table instance.
Assume that a table is populated with rows with keys "row1", "row2", "row3", and then another set of rows with the keys "abc1", "abc2", and "abc3". The following example shows how to set a Scan instance to return the rows beginning with "row".
@ -311,12 +311,12 @@ try {
}
----
Note that generally the easiest way to specify a specific stop point for a scan is by using the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/InclusiveStopFilter.html[InclusiveStopFilter] class.
Note that generally the easiest way to specify a specific stop point for a scan is by using the link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/InclusiveStopFilter.html[InclusiveStopFilter] class.
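A minimal sketch of that approach follows. It assumes the "row1" to "row3" keys from the example above, an already opened `Table`, and HBase 1.4 or later for `Scan.withStartRow`; the class and method names are made up for illustration.
[source,java]
----
import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.InclusiveStopFilter;
import org.apache.hadoop.hbase.util.Bytes;

public final class InclusiveStopScan {
  private InclusiveStopScan() {}

  /** Prints the keys of all rows from "row1" up to and including "row3". */
  public static void printRows(Table table) throws IOException {
    Scan scan = new Scan();
    scan.withStartRow(Bytes.toBytes("row1"));
    // Unlike a plain stop row, which is exclusive by default, this filter
    // includes the stop row itself in the results.
    scan.setFilter(new InclusiveStopFilter(Bytes.toBytes("row3")));

    try (ResultScanner scanner = table.getScanner(scan)) {
      for (Result result : scanner) {
        System.out.println(Bytes.toString(result.getRow()));
      }
    }
  }
}
----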
=== Delete
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html[Delete] removes a row from a table.
Deletes are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete-org.apache.hadoop.hbase.client.Delete-[Table.delete].
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html[Delete] removes a row from a table.
Deletes are executed via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete-org.apache.hadoop.hbase.client.Delete-[Table.delete].
HBase does not modify data in place, and so deletes are handled by creating new markers called _tombstones_.
These tombstones, along with the dead values, are cleaned up on major compactions.
@ -355,7 +355,7 @@ Prior to HBase 0.96, the default number of versions kept was `3`, but in 0.96 an
.Modify the Maximum Number of Versions for a Column Family
====
This example uses HBase Shell to keep a maximum of 5 versions of all columns in column family `f1`.
You could also use link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
You could also use link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
----
hbase> alter t1, NAME => f1, VERSIONS => 5
@ -367,7 +367,7 @@ hbase> alter t1, NAME => f1, VERSIONS => 5
You can also specify the minimum number of versions to store per column family.
By default, this is set to 0, which means the feature is disabled.
The following example sets the minimum number of versions on all columns in column family `f1` to `2`, via HBase Shell.
You could also use link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
You could also use link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
----
hbase> alter t1, NAME => f1, MIN_VERSIONS => 2
@ -385,12 +385,12 @@ In this section we look at the behavior of the version dimension for each of the
==== Get/Scan
Gets are implemented on top of Scans.
The below discussion of link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] applies equally to link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scans].
The below discussion of link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] applies equally to link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scans].
By default, i.e. if you specify no explicit version, when doing a `get`, the cell whose version has the largest value is returned (which may or may not be the latest one written, see later). The default behavior can be modified in the following ways:
* to return more than one version, see link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setMaxVersions--[Get.setMaxVersions()]
* to return versions other than the latest, see link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setTimeRange-long-long-[Get.setTimeRange()]
* to return more than one version, see link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setMaxVersions--[Get.setMaxVersions()]
* to return versions other than the latest, see link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setTimeRange-long-long-[Get.setTimeRange()]
+
To retrieve the latest version that is less than or equal to a given value, thus giving the 'latest' state of the record at a certain point in time, just use a range from 0 to the desired version and set the max versions to 1.
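The point-in-time read described above can be sketched as follows. The class and method names are hypothetical, and the `Table` is assumed to be already open. One detail is easy to miss: the time range is half-open, `[minStamp, maxStamp)`, so the upper bound must be the desired version plus one for that version itself to be included.
[source,java]
----
import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

public final class PointInTimeGet {
  private PointInTimeGet() {}

  /**
   * Returns, for each column of the row, the newest cell whose version is
   * less than or equal to the given version. Assumes version < Long.MAX_VALUE.
   */
  public static Result latestAtOrBefore(Table table, byte[] row, long version)
      throws IOException {
    Get get = new Get(row);
    // Upper bound is exclusive, hence version + 1.
    get.setTimeRange(0L, version + 1);
    // Only the single newest version inside the range is returned per column.
    get.setMaxVersions(1);
    return table.get(get);
  }
}
----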


@ -46,7 +46,7 @@ As Apache HBase is an Apache Software Foundation project, see <<asf,asf>>
=== Mailing Lists
Sign up for the dev-list and the user-list.
See the link:http://hbase.apache.org/mail-lists.html[mailing lists] page.
See the link:https://hbase.apache.org/mail-lists.html[mailing lists] page.
Posing questions - and helping to answer other people's questions - is encouraged! There are varying levels of experience on both lists so patience and politeness are encouraged (and please stay on topic.)
[[slack]]
@ -173,8 +173,8 @@ GIT is our repository of record for all but the Apache HBase website.
We used to be on SVN.
We migrated.
See link:https://issues.apache.org/jira/browse/INFRA-7768[Migrate Apache HBase SVN Repos to Git].
See link:http://hbase.apache.org/source-repository.html[Source Code
Management] page for contributor and committer links or search for HBase on the link:http://git.apache.org/[Apache Git] page.
See link:https://hbase.apache.org/source-repository.html[Source Code
Management] page for contributor and committer links or search for HBase on the link:https://git.apache.org/[Apache Git] page.
== IDEs
@ -583,7 +583,7 @@ the checking of the produced artifacts to ensure they are 'good' --
e.g. extracting the produced tarballs, verifying that they
look right, then starting HBase and checking that everything is running
correctly -- or the signing and pushing of the tarballs to
link:http://people.apache.org[people.apache.org].
link:https://people.apache.org[people.apache.org].
Take a look. Modify/improve as you see fit.
====
@ -763,7 +763,7 @@ To finish the release, take up the script from here on out.
+
The artifacts are in the maven repository in the staging area in the 'open' state.
While in this 'open' state you can check out what you've published to make sure all is good.
To do this, log in to Apache's Nexus at link:http://repository.apache.org[repository.apache.org] using your Apache ID.
To do this, log in to Apache's Nexus at link:https://repository.apache.org[repository.apache.org] using your Apache ID.
Find your artifacts in the staging repository. Click on 'Staging Repositories' and look for a new one ending in "hbase" with a status of 'Open', select it.
Use the tree view to expand the list of repository contents and inspect if the artifacts you expect are present. Check the POMs.
As long as the staging repo is open you can re-upload if something is missing or built incorrectly.
@ -785,7 +785,7 @@ Be sure to edit the pom to point to the proper staging repository.
Make sure that Maven pulls from the staging repository when tests run, and not from your local repository, either by passing the `-U` flag or by deleting your local repo content, and then check that the artifacts are fetched from the remote staging repository.
====
See link:http://www.apache.org/dev/publishing-maven-artifacts.html[Publishing Maven Artifacts] for some pointers on this maven staging process.
See link:https://www.apache.org/dev/publishing-maven-artifacts.html[Publishing Maven Artifacts] for some pointers on this maven staging process.
If the HBase version ends in `-SNAPSHOT`, the artifacts go elsewhere.
They are put into the Apache snapshots repository directly and are immediately available.
@ -870,7 +870,7 @@ This plugin is run when you specify the +site+ goal as in when you run +mvn site
See <<appendix_contributing_to_documentation,appendix contributing to documentation>> for more information on building the documentation.
[[hbase.org]]
== Updating link:http://hbase.apache.org[hbase.apache.org]
== Updating link:https://hbase.apache.org[hbase.apache.org]
[[hbase.org.site.contributing]]
=== Contributing to hbase.apache.org
@ -878,7 +878,7 @@ See <<appendix_contributing_to_documentation,appendix contributing to documentat
See <<appendix_contributing_to_documentation,appendix contributing to documentation>> for more information on contributing to the documentation or website.
[[hbase.org.site.publishing]]
=== Publishing link:http://hbase.apache.org[hbase.apache.org]
=== Publishing link:https://hbase.apache.org[hbase.apache.org]
See <<website_publish>> for instructions on publishing the website and documentation.
@ -1278,7 +1278,7 @@ $ mvn clean install test -Dtest=TestZooKeeper -PskipIntegrationTests
==== Running integration tests against mini cluster
HBase 0.92 added a `verify` maven target.
Invoking it, for example by doing `mvn verify`, will run all the phases up to and including the verify phase via the maven link:http://maven.apache.org/plugins/maven-failsafe-plugin/[failsafe
Invoking it, for example by doing `mvn verify`, will run all the phases up to and including the verify phase via the maven link:https://maven.apache.org/plugins/maven-failsafe-plugin/[failsafe
plugin], running all the above mentioned HBase unit tests as well as tests that are in the HBase integration test group.
After you have completed +mvn install -DskipTests+, you can run just the integration tests by invoking:
@ -1333,7 +1333,7 @@ Currently there is no support for running integration tests against a distribute
The tests interact with the distributed cluster by using the methods in the `DistributedHBaseCluster` (implementing `HBaseCluster`) class, which in turn uses a pluggable `ClusterManager`.
Concrete implementations provide actual functionality for carrying out deployment-specific and environment-dependent tasks (SSH, etc.). The default `ClusterManager` is `HBaseClusterManager`, which uses SSH to remotely execute start/stop/kill/signal commands and assumes some POSIX commands (ps, etc.). It also assumes that the user running the test has enough "power" to start/stop servers on the remote machines.
By default, it picks up `HBASE_SSH_OPTS`, `HBASE_HOME`, `HBASE_CONF_DIR` from the env, and uses `bin/hbase-daemon.sh` to carry out the actions.
Currently tarball deployments, deployments which use _hbase-daemons.sh_, and link:http://incubator.apache.org/ambari/[Apache Ambari] deployments are supported.
Currently tarball deployments, deployments which use _hbase-daemons.sh_, and link:https://incubator.apache.org/ambari/[Apache Ambari] deployments are supported.
_/etc/init.d/_ scripts are not supported for now, but support can easily be added.
For other deployment options, a ClusterManager can be implemented and plugged in.
@ -1851,10 +1851,10 @@ The script checks the directory for sub-directory called _.git/_, before proceed
=== Submitting Patches
If you are new to submitting patches to open source or new to submitting patches to Apache, start by
reading the link:http://commons.apache.org/patches.html[On Contributing Patches] page from
link:http://commons.apache.org/[Apache Commons Project].
reading the link:https://commons.apache.org/patches.html[On Contributing Patches] page from
link:https://commons.apache.org/[Apache Commons Project].
It provides a nice overview that applies equally to the Apache HBase Project.
link:http://accumulo.apache.org/git.html[Accumulo doc on how to contribute and develop] is also
link:https://accumulo.apache.org/git.html[Accumulo doc on how to contribute and develop] is also
a good read for understanding the development workflow.
[[submitting.patches.create]]
@ -1948,11 +1948,11 @@ Significant new features should provide an integration test in addition to unit
[[reviewboard]]
==== ReviewBoard
Patches larger than one screen, or patches that will be tricky to review, should go through link:http://reviews.apache.org[ReviewBoard].
Patches larger than one screen, or patches that will be tricky to review, should go through link:https://reviews.apache.org[ReviewBoard].
.Procedure: Use ReviewBoard
. Register for an account if you don't already have one.
It does not use the credentials from link:http://issues.apache.org[issues.apache.org].
It does not use the credentials from link:https://issues.apache.org[issues.apache.org].
Log in.
. Click [label]#New Review Request#.
. Choose the `hbase-git` repository.
@ -1978,8 +1978,8 @@ For more information on how to use ReviewBoard, see link:http://www.reviewboard.
New committers are encouraged to first read Apache's generic committer documentation:
* link:http://www.apache.org/dev/new-committers-guide.html[Apache New Committer Guide]
* link:http://www.apache.org/dev/committers.html[Apache Committer FAQ]
* link:https://www.apache.org/dev/new-committers-guide.html[Apache New Committer Guide]
* link:https://www.apache.org/dev/committers.html[Apache Committer FAQ]
===== Review


@ -29,7 +29,7 @@
This chapter covers access to Apache HBase through non-Java languages and
through custom protocols. For information on using the native HBase APIs, refer to
link:http://hbase.apache.org/apidocs/index.html[User API Reference] and the
link:https://hbase.apache.org/apidocs/index.html[User API Reference] and the
<<hbase_apis,HBase APIs>> chapter.
== REST
@ -642,8 +642,8 @@ represent persistent data.
This code example has the following dependencies:
. HBase 0.90.x or newer
. commons-beanutils.jar (http://commons.apache.org/)
. commons-pool-1.5.5.jar (http://commons.apache.org/)
. commons-beanutils.jar (https://commons.apache.org/)
. commons-pool-1.5.5.jar (https://commons.apache.org/)
. transactional-tableindexed for HBase 0.90 (https://github.com/hbase-trx/hbase-transactional-tableindexed)
.Download `hbase-jdo`
@ -803,7 +803,7 @@ with HBase.
----
resolvers += "Apache HBase" at "https://repository.apache.org/content/repositories/releases"
resolvers += "Thrift" at "http://people.apache.org/~rawson/repo/"
resolvers += "Thrift" at "https://people.apache.org/~rawson/repo/"
libraryDependencies ++= Seq(
"org.apache.hadoop" % "hadoop-core" % "0.20.2",


@ -33,10 +33,10 @@ When should I use HBase?::
See <<arch.overview>> in the Architecture chapter.
Are there other HBase FAQs?::
See the FAQ that is up on the wiki, link:http://wiki.apache.org/hadoop/Hbase/FAQ[HBase Wiki FAQ].
See the FAQ that is up on the wiki, link:https://wiki.apache.org/hadoop/Hbase/FAQ[HBase Wiki FAQ].
Does HBase support SQL?::
Not really. SQL-ish support for HBase via link:http://hive.apache.org/[Hive] is in development, however Hive is based on MapReduce which is not generally suitable for low-latency requests. See the <<datamodel>> section for examples on the HBase client.
Not really. SQL-ish support for HBase via link:https://hive.apache.org/[Hive] is in development, however Hive is based on MapReduce which is not generally suitable for low-latency requests. See the <<datamodel>> section for examples on the HBase client.
How can I find examples of NoSQL/HBase?::
See the link to the BigTable paper in <<other.info>>, as well as the other papers.


@ -70,7 +70,7 @@ See <<java,Java>> for information about supported JDK versions.
=== Get Started with HBase
.Procedure: Download, Configure, and Start HBase in Standalone Mode
. Choose a download site from this list of link:http://www.apache.org/dyn/closer.cgi/hbase/[Apache Download Mirrors].
. Choose a download site from this list of link:https://www.apache.org/dyn/closer.cgi/hbase/[Apache Download Mirrors].
Click on the suggested top link.
This will take you to a mirror of _HBase Releases_.
Click on the folder named _stable_ and then download the binary file that ends in _.tar.gz_ to your local filesystem.
@ -307,7 +307,7 @@ You can skip the HDFS configuration to continue storing your data in the local f
This procedure assumes that you have configured Hadoop and HDFS on your local system and/or a remote
system, and that they are running and available. It also assumes you are using Hadoop 2.
The guide on
link:http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html[Setting up a Single Node Cluster]
link:https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html[Setting up a Single Node Cluster]
in the Hadoop documentation is a good starting point.
====


@ -459,7 +459,7 @@ The host name or IP address of the name server (DNS)
ZooKeeper session timeout in milliseconds. It is used in two different ways.
First, this value is used in the ZK client that HBase uses to connect to the ensemble.
It is also used by HBase when it starts a ZK server and it is passed as the 'maxSessionTimeout'. See
http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions.
https://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions.
For example, if an HBase region server connects to a ZK ensemble that's also managed
by HBase, then the
session timeout will be the one specified by this configuration. But, a region server that connects
@ -523,7 +523,7 @@ The host name or IP address of the name server (DNS)
+
.Description
Port used by ZooKeeper peers to talk to each other.
See http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperStarted.html#sc_RunningReplicatedZooKeeper
See https://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperStarted.html#sc_RunningReplicatedZooKeeper
for more information.
+
.Default
@ -535,7 +535,7 @@ Port used by ZooKeeper peers to talk to each other.
+
.Description
Port used by ZooKeeper for leader election.
See http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperStarted.html#sc_RunningReplicatedZooKeeper
See https://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperStarted.html#sc_RunningReplicatedZooKeeper
for more information.
+
.Default
@ -1946,7 +1946,7 @@ If the DFSClient configuration
Class used to execute the regions balancing when the period occurs.
See the class comment for more on how it works
http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.html
https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.html
It replaces the DefaultLoadBalancer as the default (since renamed
as the SimpleLoadBalancer).


@ -28,7 +28,7 @@
:experimental:
This chapter provides information about performing operations using HBase native APIs.
This information is not exhaustive, and provides a quick reference in addition to the link:http://hbase.apache.org/apidocs/index.html[User API Reference].
This information is not exhaustive, and provides a quick reference in addition to the link:https://hbase.apache.org/apidocs/index.html[User API Reference].
The examples here are not comprehensive or complete, and should be used for purposes of illustration only.
Apache HBase also works with multiple external APIs.


@ -29,8 +29,8 @@
Apache MapReduce is a software framework used to analyze large amounts of data. It is provided by link:https://hadoop.apache.org/[Apache Hadoop].
MapReduce itself is out of the scope of this document.
A good place to get started with MapReduce is http://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html.
MapReduce version 2 (MR2) is now part of link:http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/[YARN].
A good place to get started with MapReduce is https://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html.
MapReduce version 2 (MR2) is now part of link:https://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/[YARN].
This chapter discusses specific configuration steps you need to take to use MapReduce on data within HBase.
In addition, it discusses other interactions and issues between HBase and MapReduce
@ -259,10 +259,10 @@ $ ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-mapreduce-VERSION.jar rowcou
== HBase as a MapReduce Job Data Source and Data Sink
HBase can be used as a data source, link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html[TableInputFormat], and data sink, link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html[TableOutputFormat] or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/MultiTableOutputFormat.html[MultiTableOutputFormat], for MapReduce jobs.
Writing MapReduce jobs that read or write HBase, it is advisable to subclass link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapper.html[TableMapper] and/or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableReducer.html[TableReducer].
See the do-nothing pass-through classes link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/IdentityTableMapper.html[IdentityTableMapper] and link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/IdentityTableReducer.html[IdentityTableReducer] for basic usage.
For a more involved example, see link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter] or review the `org.apache.hadoop.hbase.mapreduce.TestTableMapReduce` unit test.
HBase can be used as a data source, link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html[TableInputFormat], and data sink, link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html[TableOutputFormat] or link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/MultiTableOutputFormat.html[MultiTableOutputFormat], for MapReduce jobs.
When writing MapReduce jobs that read from or write to HBase, it is advisable to subclass link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapper.html[TableMapper] and/or link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableReducer.html[TableReducer].
See the do-nothing pass-through classes link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/IdentityTableMapper.html[IdentityTableMapper] and link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/IdentityTableReducer.html[IdentityTableReducer] for basic usage.
For a more involved example, see link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter] or review the `org.apache.hadoop.hbase.mapreduce.TestTableMapReduce` unit test.
If you run MapReduce jobs that use HBase as source or sink, you need to specify the source and sink table and column names in your configuration.
@ -275,7 +275,7 @@ On insert, HBase 'sorts' so there is no point double-sorting (and shuffling data
If you do not need the Reduce, your map might emit counts of records processed for reporting at the end of the job, or set the number of Reduces to zero and use TableOutputFormat.
If running the Reduce step makes sense in your case, you should typically use multiple reducers so that load is spread across the HBase cluster.
A new HBase partitioner, the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/HRegionPartitioner.html[HRegionPartitioner], can run as many reducers as the number of existing regions.
A new HBase partitioner, the link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/HRegionPartitioner.html[HRegionPartitioner], can run as many reducers as the number of existing regions.
The HRegionPartitioner is suitable when your table is large and your upload will not greatly alter the number of existing regions upon completion.
Otherwise use the default partitioner.
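The idea of one reducer per region can be sketched in plain Java. This is a simplified illustration, not the actual `HRegionPartitioner` code: the class name `RegionPartitionSketch` is hypothetical, and it compares `String` keys where HBase compares raw bytes.

```java
import java.util.Arrays;

// Simplified illustration only: NOT HBase's HRegionPartitioner implementation.
// Row keys are modelled as Strings; HBase itself orders keys as byte arrays.
class RegionPartitionSketch {
    // Sorted region start keys; the first region starts at the empty key.
    private final String[] startKeys;

    RegionPartitionSketch(String[] sortedStartKeys) {
        this.startKeys = sortedStartKeys.clone();
    }

    // Index of the region (and hence the reducer) whose key range contains rowKey.
    int partitionFor(String rowKey) {
        int pos = Arrays.binarySearch(startKeys, rowKey);
        // Exact hit: rowKey is a region start key. Otherwise binarySearch returns
        // -(insertionPoint) - 1, and the containing region is the one before it.
        return pos >= 0 ? pos : -pos - 2;
    }

    // One reducer per existing region.
    int numReducers() {
        return startKeys.length;
    }

    public static void main(String[] args) {
        RegionPartitionSketch p = new RegionPartitionSketch(new String[] {"", "g", "p"});
        System.out.println(p.partitionFor("apple") + " of " + p.numReducers());
    }
}
```

Because every key in a region goes to the same reducer, each reducer writes to a single region, which is why this approach suits tables whose region boundaries will stay mostly stable during the upload.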
@ -286,7 +286,7 @@ For more on how this mechanism works, see <<arch.bulk.load>>.
== RowCounter Example
The included link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter] MapReduce job uses `TableInputFormat` and does a count of all rows in the specified table.
The included link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter] MapReduce job uses `TableInputFormat` and does a count of all rows in the specified table.
To run it, use the following command:
[source,bash]
@ -306,13 +306,13 @@ If you have classpath errors, see <<hbase.mapreduce.classpath>>.
[[splitter.default]]
=== The Default HBase MapReduce Splitter
When link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html[TableInputFormat] is used to source an HBase table in a MapReduce job, its splitter will make a map task for each region of the table.
When link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html[TableInputFormat] is used to source an HBase table in a MapReduce job, its splitter will make a map task for each region of the table.
Thus, if there are 100 regions in the table, there will be 100 map-tasks for the job - regardless of how many column families are selected in the Scan.
[[splitter.custom]]
=== Custom Splitters
For those interested in implementing custom splitters, see the method `getSplits` in link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html[TableInputFormatBase].
For those interested in implementing custom splitters, see the method `getSplits` in link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html[TableInputFormatBase].
That is where the logic for map-task assignment resides.
[[mapreduce.example]]
@ -352,7 +352,7 @@ if (!b) {
}
----
...and the mapper instance would extend link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapper.html[TableMapper]...
...and the mapper instance would extend link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableMapper.html[TableMapper]...
[source,java]
----
@ -400,7 +400,7 @@ if (!b) {
}
----
An explanation is required of what `TableMapReduceUtil` is doing, especially with the reducer. link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html[TableOutputFormat] is being used as the outputFormat class, and several parameters are being set on the config (e.g., `TableOutputFormat.OUTPUT_TABLE`), as well as setting the reducer output key to `ImmutableBytesWritable` and reducer value to `Writable`.
An explanation is required of what `TableMapReduceUtil` is doing, especially with the reducer. link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html[TableOutputFormat] is being used as the outputFormat class, and several parameters are being set on the config (e.g., `TableOutputFormat.OUTPUT_TABLE`), as well as setting the reducer output key to `ImmutableBytesWritable` and reducer value to `Writable`.
These could be set by the programmer on the job and conf, but `TableMapReduceUtil` tries to make things easier.
The following is the example mapper, which creates a `Put` matching the input `Result` and emits it.


@ -620,7 +620,7 @@ To NOT run WALPlayer as a mapreduce job on your cluster, force it to run all in
[[rowcounter]]
=== RowCounter and CellCounter
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter] is a mapreduce job to count all the rows of a table.
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter] is a mapreduce job to count all the rows of a table.
This is a good utility to use as a sanity check to ensure that HBase can read all the blocks of a table if there are any concerns of metadata inconsistency.
It will run the mapreduce all in a single process but it will run faster if you have a MapReduce cluster in place for it to exploit. It is also possible to limit
the time range of data to be scanned by using the `--starttime=[starttime]` and `--endtime=[endtime]` flags.
@ -633,7 +633,7 @@ RowCounter only counts one version per cell.
Note: caching for the input Scan is configured via `hbase.client.scanner.caching` in the job configuration.
HBase ships another diagnostic mapreduce job called link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/CellCounter.html[CellCounter].
HBase ships another diagnostic mapreduce job called link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/CellCounter.html[CellCounter].
It gathers more fine-grained statistics about your table than RowCounter does.
The statistics gathered by CellCounter include:
@ -666,7 +666,7 @@ See link:https://issues.apache.org/jira/browse/HBASE-4391[HBASE-4391 Add ability
=== Offline Compaction Tool
See the usage for the
link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/CompactionTool.html[CompactionTool].
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/CompactionTool.html[CompactionTool].
Run it like:
[source, bash]
@ -722,7 +722,7 @@ The LoadTestTool has received many updates in recent HBase releases, including s
[[ops.regionmgt.majorcompact]]
=== Major Compaction
Major compactions can be requested via the HBase shell or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact-org.apache.hadoop.hbase.TableName-[Admin.majorCompact].
Major compactions can be requested via the HBase shell or link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact-org.apache.hadoop.hbase.TableName-[Admin.majorCompact].
Note: major compactions do NOT do region merges.
See <<compaction,compaction>> for more information about compactions.
@ -868,7 +868,7 @@ But usually disks do the "John Wayne" -- i.e.
take a while to go down spewing errors in _dmesg_ -- or for some reason, run much slower than their companions.
In this case you want to decommission the disk.
You have two options.
You can link:http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F[decommission
You can link:https://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F[decommission
the datanode] or, less disruptively in that only the bad disk's data will be re-replicated, you can stop the datanode, unmount the bad volume (you can't unmount a volume while the datanode is using it), and then restart the datanode (presuming you have set `dfs.datanode.failed.volumes.tolerated` > 0). The regionserver will throw some errors in its logs as it recalibrates where to get its data from -- it will likely roll its WAL too -- but in general, apart from some latency spikes, it should keep on chugging.
.Short Circuit Reads
@ -1021,7 +1021,7 @@ To configure metrics for a given region server, edit the _conf/hadoop-metrics2-h
Restart the region server for the changes to take effect.
To change the sampling rate for the default sink, edit the line beginning with `*.period`.
To filter which metrics are emitted or to extend the metrics framework, see http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html
To filter which metrics are emitted or to extend the metrics framework, see https://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html
.HBase Metrics and Ganglia
[NOTE]
@ -1029,7 +1029,7 @@ To filter which metrics are emitted or to extend the metrics framework, see http
By default, HBase emits a large number of metrics per region server.
Ganglia may have difficulty processing all these metrics.
Consider increasing the capacity of the Ganglia server or reducing the number of metrics emitted by HBase.
See link:http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html#filtering[Metrics Filtering].
See link:https://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html#filtering[Metrics Filtering].
====
=== Disabling Metrics
@ -1455,7 +1455,7 @@ A single WAL edit goes through several steps in order to be replicated to a slav
. The edit is tagged with the master's UUID and added to a buffer.
When the buffer is filled, or the reader reaches the end of the file, the buffer is sent to a random region server on the slave cluster.
. The region server reads the edits sequentially and separates them into buffers, one buffer per table.
After all edits are read, each buffer is flushed using link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table], HBase's normal client.
After all edits are read, each buffer is flushed using link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table], HBase's normal client.
The master's UUID and the UUIDs of slaves which have already consumed the data are preserved in the edits when they are applied, in order to prevent replication loops.
. In the master, the offset for the WAL that is currently being replicated is registered in ZooKeeper.
@ -2090,7 +2090,7 @@ The act of copying these files creates new HDFS metadata, which is why a restore
=== Live Cluster Backup - Replication
This approach assumes that there is a second cluster.
See the HBase page on link:http://hbase.apache.org/book.html#_cluster_replication[replication] for more information.
See the HBase page on link:https://hbase.apache.org/book.html#_cluster_replication[replication] for more information.
[[ops.backup.live.copytable]]
=== Live Cluster Backup - CopyTable
@ -2299,7 +2299,7 @@ as in <<snapshots_s3>>.
- You must be using HBase 1.2 or higher with Hadoop 2.7.1 or
higher. No version of HBase supports Hadoop 2.7.0.
- Your hosts must be configured to be aware of the Azure blob storage filesystem.
See http://hadoop.apache.org/docs/r2.7.1/hadoop-azure/index.html.
See https://hadoop.apache.org/docs/r2.7.1/hadoop-azure/index.html.
After you meet the prerequisites, follow the instructions
in <<snapshots_s3>>, replacing the protocol specifier with `wasb://` or `wasbs://`.
@ -2362,7 +2362,7 @@ See <<gcpause,gcpause>>, <<trouble.log.gc,trouble.log.gc>> and elsewhere (TODO:
Generally, fewer regions make for a smoother running cluster (you can always manually split the big regions later, if necessary, to spread the data or request load over the cluster); 20-200 regions per RS is a reasonable range.
The number of regions cannot be configured directly (unless you go for fully <<disable.splitting,disable.splitting>>); instead, adjust the region size to achieve the target number of regions for a given table size.
When configuring regions for multiple tables, note that most region settings can be set on a per-table basis via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html[HTableDescriptor], as well as shell commands.
When configuring regions for multiple tables, note that most region settings can be set on a per-table basis via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html[HTableDescriptor], as well as shell commands.
These settings will override the ones in `hbase-site.xml`.
That is useful if your tables have different workloads/use cases.


@ -320,7 +320,7 @@ See also <<perf.compression.however>> for compression caveats.
[[schema.regionsize]]
=== Table RegionSize
The regionsize can be set on a per-table basis via `setFileSize` on link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html[HTableDescriptor] in the event where certain tables require different regionsizes than the configured default regionsize.
The regionsize can be set on a per-table basis via `setFileSize` on link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html[HTableDescriptor] in the event where certain tables require different regionsizes than the configured default regionsize.
See <<ops.capacity.regions>> for more information.
@ -372,7 +372,7 @@ Bloom filters are enabled on a Column Family.
You can do this by using the setBloomFilterType method of HColumnDescriptor or using the HBase API.
Valid values are `NONE`, `ROW` (default), or `ROWCOL`.
See <<bloom.filters.when>> for more information on `ROW` versus `ROWCOL`.
See also the API documentation for link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
See also the API documentation for link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
The following example creates a table and enables a ROWCOL Bloom filter on the `colfam1` column family.
@ -431,7 +431,7 @@ The blocksize can be configured for each ColumnFamily in a table, and defaults t
Larger cell values require larger blocksizes.
There is an inverse relationship between blocksize and the resulting StoreFile indexes (i.e., if the blocksize is doubled then the resulting indexes should be roughly halved).
See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor] and <<store>> for more information.
See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor] and <<store>> for more information.
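The inverse relationship between blocksize and index size can be checked with a little arithmetic. The sketch below assumes roughly one index entry per data block, which is an approximation; the class name and the sizes used are illustrative, not HBase defaults or APIs.

```java
// Back-of-the-envelope estimate only; assumes about one index entry per data block.
class BlockIndexEstimate {
    // Estimated number of index entries for a StoreFile of the given size.
    static long indexEntries(long storeFileBytes, long blockSizeBytes) {
        if (blockSizeBytes < 1) {
            throw new IllegalArgumentException("blockSizeBytes must be >= 1");
        }
        return (storeFileBytes + blockSizeBytes - 1) / blockSizeBytes; // ceiling division
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        // Doubling the block size roughly halves the number of index entries.
        System.out.println(indexEntries(oneGiB, 64 * 1024));
        System.out.println(indexEntries(oneGiB, 128 * 1024));
    }
}
```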
[[cf.in.memory]]
=== In-Memory ColumnFamilies
@ -440,7 +440,7 @@ ColumnFamilies can optionally be defined as in-memory.
Data is still persisted to disk, just like any other ColumnFamily.
In-memory blocks have the highest priority in the <<block.cache>>, but it is not a guarantee that the entire table will be in memory.
See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor] for more information.
See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor] for more information.
[[perf.compression]]
=== Compression
@ -549,7 +549,7 @@ If deferred log flush is used, WAL edits are kept in memory until the flush peri
The benefit is aggregated and asynchronous `WAL`- writes, but the potential downside is that if the RegionServer goes down the yet-to-be-flushed edits are lost.
This is safer, however, than not using WAL at all with Puts.
Deferred log flush can be configured on tables via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html[HTableDescriptor].
Deferred log flush can be configured on tables via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html[HTableDescriptor].
The default value of `hbase.regionserver.optionallogflushinterval` is 1000ms.
[[perf.hbase.client.putwal]]
@ -574,7 +574,7 @@ There is a utility `HTableUtil` currently on MASTER that does this, but you can
[[perf.hbase.write.mr.reducer]]
=== MapReduce: Skip The Reducer
When writing a lot of data to an HBase table from a MR job (e.g., with link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html[TableOutputFormat]), and specifically where Puts are being emitted from the Mapper, skip the Reducer step.
When writing a lot of data to an HBase table from a MR job (e.g., with link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html[TableOutputFormat]), and specifically where Puts are being emitted from the Mapper, skip the Reducer step.
When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other Reducers that will most likely be off-node.
It's far more efficient to just write directly to HBase.
@ -600,7 +600,7 @@ For example, here is a good general thread on what to look at addressing read-ti
[[perf.hbase.client.caching]]
=== Scan Caching
If HBase is used as an input source for a MapReduce job, for example, make sure that the input link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] instance to the MapReduce job has `setCaching` set to something greater than the default (which is 1). Using the default value means that the map-task will call back to the region-server for every record processed.
If HBase is used as an input source for a MapReduce job, for example, make sure that the input link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] instance to the MapReduce job has `setCaching` set to something greater than the default (which is 1). Using the default value means that the map-task will call back to the region-server for every record processed.
Setting this value to 500, for example, will transfer 500 rows at a time to the client to be processed.
There is a cost/benefit to have the cache value be large because it costs more in memory for both client and RegionServer, so bigger isn't always better.
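To get a feel for the trade-off, the number of scanner round trips can be estimated as the row count divided by the caching value, rounded up. This is a rough model that ignores region boundaries and any final empty call; the class and method names are hypothetical.

```java
// Rough model only: ignores region boundaries and any trailing empty fetch.
class ScanCachingEstimate {
    // Approximate round trips to the RegionServer needed to read rowCount rows
    // when each call returns up to `caching` rows.
    static long roundTrips(long rowCount, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be >= 1");
        }
        return (rowCount + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(1_000_000L, 1));   // one call per row
        System.out.println(roundTrips(1_000_000L, 500)); // 500 rows per call
    }
}
```

Under this model a caching value of 500 cuts the calls for a million rows from 1,000,000 to 2,000, at the price of buffering up to 500 rows per call in memory on both client and RegionServer.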
@ -649,7 +649,7 @@ For MapReduce jobs that use HBase tables as a source, if there a pattern where t
=== Close ResultScanners
This isn't so much about improving performance but rather _avoiding_ performance problems.
If you forget to close link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/ResultScanner.html[ResultScanners] you can cause problems on the RegionServers.
If you forget to close link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/ResultScanner.html[ResultScanners] you can cause problems on the RegionServers.
Always have ResultScanner processing enclosed in try/catch blocks.
[source,java]
@ -669,7 +669,7 @@ table.close();
[[perf.hbase.client.blockcache]]
=== Block Cache
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] instances can be set to use the block cache in the RegionServer via the `setCacheBlocks` method.
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] instances can be set to use the block cache in the RegionServer via the `setCacheBlocks` method.
For input Scans to MapReduce jobs, this should be `false`.
For frequently accessed rows, it is advisable to use the block cache.
@ -679,8 +679,8 @@ See <<offheap.blockcache>>
[[perf.hbase.client.rowkeyonly]]
=== Optimal Loading of Row Keys
When performing a table link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[scan] where only the row keys are needed (no families, qualifiers, values or timestamps), add a FilterList with a `MUST_PASS_ALL` operator to the scanner using `setFilter`.
The filter list should include both a link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html[FirstKeyOnlyFilter] and a link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/KeyOnlyFilter.html[KeyOnlyFilter].
When performing a table link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[scan] where only the row keys are needed (no families, qualifiers, values or timestamps), add a FilterList with a `MUST_PASS_ALL` operator to the scanner using `setFilter`.
The filter list should include both a link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html[FirstKeyOnlyFilter] and a link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/KeyOnlyFilter.html[KeyOnlyFilter].
Using this filter combination will result in a worst case scenario of a RegionServer reading a single value from disk and minimal network traffic to the client for a single row.
[[perf.hbase.read.dist]]
@ -816,7 +816,7 @@ In this case, special care must be taken to regularly perform major compactions
As is documented in <<datamodel>>, marking rows as deleted creates additional StoreFiles which then need to be processed on reads.
Tombstones only get cleaned up with major compactions.
See also <<compaction>> and link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact-org.apache.hadoop.hbase.TableName-[Admin.majorCompact].
See also <<compaction>> and link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact-org.apache.hadoop.hbase.TableName-[Admin.majorCompact].
[[perf.deleting.rpc]]
=== Delete RPC Behavior
@ -825,7 +825,7 @@ Be aware that `Table.delete(Delete)` doesn't use the writeBuffer.
It will execute a RegionServer RPC with each invocation.
For a large number of deletes, consider `Table.delete(List)`.
See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete-org.apache.hadoop.hbase.client.Delete-[hbase.client.Delete]
See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete-org.apache.hadoop.hbase.client.Delete-[hbase.client.Delete]
[[perf.hdfs]]
== HDFS


@ -27,11 +27,11 @@
:icons: font
:experimental:
This is the official reference guide for the link:http://hbase.apache.org/[HBase] version it ships with.
This is the official reference guide for the link:https://hbase.apache.org/[HBase] version it ships with.
Herein you will find either the definitive documentation on an HBase topic as of its
standing when the referenced HBase version shipped, or it will point to the location
in link:http://hbase.apache.org/apidocs/index.html[Javadoc] or
in link:https://hbase.apache.org/apidocs/index.html[Javadoc] or
link:https://issues.apache.org/jira/browse/HBASE[JIRA] where the pertinent information can be found.
.About This Guide


@ -28,7 +28,7 @@
:icons: font
:experimental:
In 0.95, all client/server communication is done with link:https://developers.google.com/protocol-buffers/[protobuf'ed] Messages rather than with link:http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/Writable.html[Hadoop
In 0.95, all client/server communication is done with link:https://developers.google.com/protocol-buffers/[protobuf'ed] Messages rather than with link:https://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/Writable.html[Hadoop
Writables].
Our RPC wire format therefore changes.
This document describes the client/server request/response protocol and our new RPC wire-format.


@ -44,7 +44,7 @@ modeling on HBase.
[[schema.creation]]
== Schema Creation
HBase schemas can be created or updated using the <<shell>> or by using link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html[Admin] in the Java API.
HBase schemas can be created or updated using the <<shell>> or by using link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html[Admin] in the Java API.
Tables must be disabled when making ColumnFamily modifications, for example:
@ -220,7 +220,7 @@ You could also optimize things so that certain pairs of keys were always in the
A third common trick for preventing hotspotting is to reverse a fixed-width or numeric row key so that the part that changes the most often (the least significant digit) is first.
This effectively randomizes row keys, but sacrifices row ordering properties.
See https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/11/10/discussion-on-designing-hbase-tables, and link:http://phoenix.apache.org/salted.html[article on Salted Tables] from the Phoenix project, and the discussion in the comments of link:https://issues.apache.org/jira/browse/HBASE-11682[HBASE-11682] for more information about avoiding hotspotting.
See https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/11/10/discussion-on-designing-hbase-tables, and link:https://phoenix.apache.org/salted.html[article on Salted Tables] from the Phoenix project, and the discussion in the comments of link:https://issues.apache.org/jira/browse/HBASE-11682[HBASE-11682] for more information about avoiding hotspotting.
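The key-reversal trick can be sketched as follows, assuming non-negative numeric ids that fit in a fixed width. The class and method names are hypothetical.

```java
// Minimal sketch of the reversed-key trick; assumes non-negative ids that fit in `width` digits.
class ReversedKeySketch {
    // Zero-pads the id to a fixed width, then reverses the digits so the
    // fastest-changing (least significant) digit comes first.
    static String reversedKey(long id, int width) {
        String padded = String.format("%0" + width + "d", id);
        return new StringBuilder(padded).reverse().toString();
    }

    public static void main(String[] args) {
        // Consecutive ids land far apart in sorted order.
        for (long id = 1001; id <= 1003; id++) {
            System.out.println(id + " -> " + reversedKey(id, 6));
        }
    }
}
```

Consecutive ids such as 1001, 1002, and 1003 become `100100`, `200100`, and `300100`, so sequential writes spread across the key space, but a range scan over the original id order is no longer possible.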
[[timeseries]]
=== Monotonically Increasing Row Keys/Timeseries Data
@ -430,7 +430,7 @@ public static byte[][] getHexSplits(String startKey, String endKey, int numRegio
[[schema.versions.max]]
=== Maximum Number of Versions
The maximum number of row versions to store is configured per column family via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
The maximum number of row versions to store is configured per column family via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
The default for max versions is 1.
This is an important parameter because as described in <<datamodel>> section HBase does _not_ overwrite row values, but rather stores different values per row by time (and qualifier). Excess versions are removed during major compactions.
The number of max versions may need to be increased or decreased depending on application needs.
@ -440,14 +440,14 @@ It is not recommended setting the number of max versions to an exceedingly high
[[schema.minversions]]
=== Minimum Number of Versions
Like maximum number of row versions, the minimum number of row versions to keep is configured per column family via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
Like maximum number of row versions, the minimum number of row versions to keep is configured per column family via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
The default for min versions is 0, which means the feature is disabled.
The minimum number of row versions parameter is used together with the time-to-live parameter and can be combined with the number of row versions parameter to allow configurations such as "keep the last T minutes worth of data, at most N versions, _but keep at least M versions around_" (where M is the value for minimum number of row versions, M<N). This parameter should only be set when time-to-live is enabled for a column family and must be less than the number of row versions.
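How the three parameters interact can be sketched with a small simulation. This models only the rule quoted above ("last T minutes, at most N versions, but at least M versions"); it is not HBase's compaction logic, and the class and parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simulation of the retention rule only; NOT HBase's compaction implementation.
class VersionRetentionSketch {
    // Returns the version timestamps that survive, newest first.
    static List<Long> retained(List<Long> timestamps, long now, long ttlMillis,
                               int minVersions, int maxVersions) {
        List<Long> sorted = new ArrayList<>(timestamps);
        sorted.sort(Collections.reverseOrder()); // newest version first
        List<Long> kept = new ArrayList<>();
        // Never keep more than maxVersions (N).
        for (int i = 0; i < sorted.size() && i < maxVersions; i++) {
            long ts = sorted.get(i);
            boolean withinTtl = now - ts <= ttlMillis;
            // Expired versions survive only while fewer than minVersions (M) are kept.
            if (withinTtl || i < minVersions) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> versions = List.of(100L, 200L, 300L, 400L, 500L);
        System.out.println(retained(versions, 1000L, 550L, 2, 3));
    }
}
```

With `now = 1000` and a TTL of 550, only the version at 500 is still live, yet the version at 400 is kept as well because M is 2.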
[[supported.datatypes]]
== Supported Datatypes
HBase supports a "bytes-in/bytes-out" interface via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] and link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html[Result], so anything that can be converted to an array of bytes can be stored as a value.
HBase supports a "bytes-in/bytes-out" interface via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] and link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html[Result], so anything that can be converted to an array of bytes can be stored as a value.
Input could be strings, numbers, complex objects, or even images as long as they can be rendered as bytes.
There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase would probably be too much to ask); search the mailing list for conversations on this topic.
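As a sketch of what "bytes-in/bytes-out" means in practice, the following converts a number and a string to byte arrays and back using only the JDK. The class `BytesSketch` is hypothetical; in HBase client code the `org.apache.hadoop.hbase.util.Bytes` utility class provides conversions of this kind.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// JDK-only illustration of round-tripping values through byte arrays.
class BytesSketch {
    // Encodes a long as 8 big-endian bytes.
    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Decodes 8 big-endian bytes back into a long.
    static long toLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    // Encodes a String as UTF-8 bytes.
    static byte[] toBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes back into a String.
    static String asString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(toLong(toBytes(42L)));
        System.out.println(asString(toBytes("row-1")));
    }
}
```

The store sees only the bytes, so the application must remember which encoding it used for each column in order to decode values correctly on read.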
@ -456,7 +456,7 @@ Take that into consideration when making your design, as well as block size for
=== Counters
One supported datatype that deserves special mention is "counters" (i.e., the ability to do atomic increments of numbers). See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#increment%28org.apache.hadoop.hbase.client.Increment%29[Increment] in `Table`.
One supported datatype that deserves special mention is "counters" (i.e., the ability to do atomic increments of numbers). See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#increment%28org.apache.hadoop.hbase.client.Increment%29[Increment] in `Table`.
Synchronization on counters is done on the RegionServer, not in the client.
@ -476,7 +476,7 @@ Store files which contains only expired rows are deleted on minor compaction.
Setting `hbase.store.delete.expired.storefile` to `false` disables this feature.
Setting minimum number of versions to other than 0 also disables this.
See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor] for more information.
See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor] for more information.
Recent versions of HBase also support setting time to live on a per cell basis.
See link:https://issues.apache.org/jira/browse/HBASE-10560[HBASE-10560] for more information.
@ -491,7 +491,7 @@ There are two notable differences between cell TTL handling and ColumnFamily TTL
== Keeping Deleted Cells
By default, delete markers extend back to the beginning of time.
Therefore, link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] operations will not see a deleted cell (row or column), even when the Get or Scan operation indicates a time range before the delete marker was placed.
Therefore, link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] or link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] operations will not see a deleted cell (row or column), even when the Get or Scan operation indicates a time range before the delete marker was placed.
ColumnFamilies can optionally keep deleted cells.
In this case, deleted cells can still be retrieved, as long as these operations specify a time range that ends before the timestamp of any delete that would affect the cells.
@ -681,7 +681,7 @@ in the table (e.g. make sure values are in the range 1-10). Constraints could
also be used to enforce referential integrity, but this is strongly discouraged
as it will dramatically decrease the write throughput of the tables where integrity
checking is enabled. Extensive documentation on using Constraints can be found at
link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/constraint/Constraint.html[Constraint]
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/constraint/Constraint.html[Constraint]
since version 0.94.
[[schema.casestudies]]
@ -1092,7 +1092,7 @@ The tl;dr version is that you should probably go with one row per user+value, an
Your two options mirror a common question people have when designing HBase schemas: should I go "tall" or "wide"? Your first schema is "tall": each row represents one value for one user, and so there are many rows in the table for each user; the row key is user + valueid, and there would be (presumably) a single column qualifier that means "the value". This is great if you want to scan over rows in sorted order by row key (thus my question above, about whether these ids are sorted correctly). You can start a scan at any user+valueid, read the next 30, and be done.
What you're giving up is the ability to have transactional guarantees around all the rows for one user, but it doesn't sound like you need that.
Doing it this way is generally recommended (see here http://hbase.apache.org/book.html#schema.smackdown).
Doing it this way is generally recommended (see here https://hbase.apache.org/book.html#schema.smackdown).
Your second option is "wide": you store a bunch of values in one row, using different qualifiers (where the qualifier is the valueid). The simple way to do that would be to just store ALL values for one user in a single row.
I'm guessing you jumped to the "paginated" version because you're assuming that storing millions of columns in a single row would be bad for performance, which may or may not be true; as long as you're not trying to do too much in a single request, or do things like scanning over and returning all of the cells in the row, it shouldn't be fundamentally worse.
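A minimal sketch of the "tall" layout described above, using a sorted Python list as a stand-in for HBase's sorted row keys. The `:` separator, the zero-padded value id, and the helper names `make_row_key` and `scan` are assumptions for illustration, not part of any HBase API.

```python
import bisect


def make_row_key(user, value_id):
    # Zero-pad the value id so lexicographic order matches numeric order.
    return f"{user}:{value_id:010d}"


def scan(sorted_keys, start_key, limit):
    """Return up to `limit` row keys starting at `start_key`, in sorted order."""
    start = bisect.bisect_left(sorted_keys, start_key)
    return sorted_keys[start:start + limit]


# One row per user + value id, kept in sorted order like an HBase table.
rows = sorted(make_row_key(user, v) for user in ("alice", "bob") for v in range(100))

# Start at any user+valueid, read the next 30 rows, and stop.
page = scan(rows, make_row_key("alice", 40), limit=30)
```

The padding matters for the sort-order question raised above: without it, the key for value id 100 would sort before the key for value id 20. A real scan would probably also set a stop row so that a page near the end of one user's rows does not run into the next user's.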

@ -354,7 +354,7 @@ grant 'rest_server', 'RWCA'
For more information about ACLs, please see the <<hbase.accesscontrol.configuration>> section
HBase REST gateway supports link:http://hadoop.apache.org/docs/stable/hadoop-auth/index.html[SPNEGO HTTP authentication] for client access to the gateway.
HBase REST gateway supports link:https://hadoop.apache.org/docs/stable/hadoop-auth/index.html[SPNEGO HTTP authentication] for client access to the gateway.
To enable REST gateway Kerberos authentication for client access, add the following to the `hbase-site.xml` file for every REST gateway.
[source,xml]
@ -390,7 +390,7 @@ Substitute the keytab for HTTP for _$KEYTAB_.
HBase REST gateway supports different 'hbase.rest.authentication.type': simple, kerberos.
You can also implement a custom authentication by implementing Hadoop AuthenticationHandler, then specify the full class name as 'hbase.rest.authentication.type' value.
For more information, refer to link:http://hadoop.apache.org/docs/stable/hadoop-auth/index.html[SPNEGO HTTP authentication].
For more information, refer to link:https://hadoop.apache.org/docs/stable/hadoop-auth/index.html[SPNEGO HTTP authentication].
[[security.rest.gateway]]
=== REST Gateway Impersonation Configuration
@ -1390,11 +1390,11 @@ When you issue a Scan or Get, HBase uses your default set of authorizations to
filter out cells that you do not have access to. A superuser can set the default
set of authorizations for a given user by using the `set_auths` HBase Shell command
or the
link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/visibility/VisibilityClient.html#setAuths-org.apache.hadoop.hbase.client.Connection-java.lang.String:A-java.lang.String-[VisibilityClient.setAuths()] method.
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/visibility/VisibilityClient.html#setAuths-org.apache.hadoop.hbase.client.Connection-java.lang.String:A-java.lang.String-[VisibilityClient.setAuths()] method.
You can specify a different authorization during the Scan or Get, by passing the
AUTHORIZATIONS option in HBase Shell, or the
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setAuthorizations-org.apache.hadoop.hbase.security.visibility.Authorizations-[Scan.setAuthorizations()]
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setAuthorizations-org.apache.hadoop.hbase.security.visibility.Authorizations-[Scan.setAuthorizations()]
method if you use the API. This authorization will be combined with your default
set as an additional filter. It will further filter your results, rather than
giving you additional authorization.
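One way to picture the "additional filter" behavior is as a set intersection. The sketch below is a simplified model in plain Python, not the HBase implementation: the function name and the label names are hypothetical, and it ignores visibility expressions entirely.

```python
def effective_authorizations(default_auths, requested_auths=None):
    """Labels a read may use: a request can narrow the default set, never widen it."""
    if requested_auths is None:
        return set(default_auths)
    return set(default_auths) & set(requested_auths)


defaults = {"secret", "confidential", "public"}

# Requesting a label outside the default set ("topsecret") grants nothing extra.
narrowed = effective_authorizations(defaults, {"public", "topsecret"})
unchanged = effective_authorizations(defaults)
```

Here `narrowed` contains only `public`: the request drops `secret` and `confidential`, and the unauthorized `topsecret` label is discarded rather than added.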
@ -1644,7 +1644,7 @@ Rotate the Master Key::
Bulk loading in secure mode is a bit more involved than normal setup, since the client has to transfer the ownership of the files generated from the MapReduce job to HBase.
Secure bulk loading is implemented by a coprocessor, named
link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.html[SecureBulkLoadEndpoint],
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/access/SecureBulkLoadEndpoint.html[SecureBulkLoadEndpoint],
which uses a staging directory configured by the configuration property `hbase.bulkload.staging.dir`, which defaults to
_/tmp/hbase-staging/_.

@ -33,10 +33,10 @@ The following projects offer some support for SQL over HBase.
[[phoenix]]
=== Apache Phoenix
link:http://phoenix.apache.org[Apache Phoenix]
link:https://phoenix.apache.org[Apache Phoenix]
=== Trafodion
link:http://trafodion.incubator.apache.org/[Trafodion: Transactional SQL-on-HBase]
link:https://trafodion.incubator.apache.org/[Trafodion: Transactional SQL-on-HBase]
:numbered:

@ -28,7 +28,7 @@
:experimental:
Apache link:http://thrift.apache.org/[Thrift] is a cross-platform, cross-language development framework.
Apache link:https://thrift.apache.org/[Thrift] is a cross-platform, cross-language development framework.
HBase includes a Thrift API and filter language.
The Thrift API relies on client and server processes.

@ -30,7 +30,7 @@
:icons: font
:experimental:
link:https://issues.apache.org/jira/browse/HBASE-6449[HBASE-6449] added support for tracing requests through HBase, using the open source tracing library, link:http://htrace.incubator.apache.org/[HTrace].
link:https://issues.apache.org/jira/browse/HBASE-6449[HBASE-6449] added support for tracing requests through HBase, using the open source tracing library, link:https://htrace.incubator.apache.org/[HTrace].
Setting up tracing is quite simple, however it currently requires some very minor changes to your client code (it would not be very difficult to remove this requirement).
[[tracing.spanreceivers]]
@ -67,7 +67,7 @@ The `LocalFileSpanReceiver` looks in _hbase-site.xml_ for a `hbase.local-fi
HTrace also provides `ZipkinSpanReceiver`, which converts spans to link:http://github.com/twitter/zipkin[Zipkin] span format and sends them to a Zipkin server. To use this span receiver, you need to install the htrace-zipkin jar on your HBase classpath on all of the nodes in your cluster.
_htrace-zipkin_ is published to the link:http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.htrace%22%20AND%20a%3A%22htrace-zipkin%22[Maven central repository]. You could get the latest version from there or just build it locally (see the link:http://htrace.incubator.apache.org/[HTrace] homepage for information on how to do this) and then copy it out to all nodes.
_htrace-zipkin_ is published to the link:http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.htrace%22%20AND%20a%3A%22htrace-zipkin%22[Maven central repository]. You could get the latest version from there or just build it locally (see the link:https://htrace.incubator.apache.org/[HTrace] homepage for information on how to do this) and then copy it out to all nodes.
`ZipkinSpanReceiver` looks for properties called `hbase.htrace.zipkin.collector-hostname` and `hbase.htrace.zipkin.collector-port` in _hbase-site.xml_, with values describing the Zipkin collector server to which span information is sent.

@ -225,7 +225,7 @@ Search here first when you have an issue as its more than likely someone has alr
[[trouble.resources.lists]]
=== Mailing Lists
Ask a question on the link:http://hbase.apache.org/mail-lists.html[Apache HBase mailing lists].
Ask a question on the link:https://hbase.apache.org/mail-lists.html[Apache HBase mailing lists].
The 'dev' mailing list is aimed at the community of developers actually building Apache HBase and for features currently under development, and 'user' is generally used for questions on released versions of Apache HBase.
Before going to the mailing list, make sure your question has not already been answered by searching the mailing list archives first.
Use <<trouble.resources.searchhadoop>>.
@ -596,7 +596,7 @@ See also Jesse Andersen's link:http://blog.cloudera.com/blog/2014/04/how-to-use-
In some situations clients that fetch data from a RegionServer get a LeaseException instead of the usual <<trouble.client.scantimeout>>.
Usually the source of the exception is `org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230)` (line number may vary). It tends to happen in the context of a slow/freezing `RegionServer#next` call.
It can be prevented by having `hbase.rpc.timeout` > `hbase.regionserver.lease.period`.
Harsh J investigated the issue as part of the mailing list thread link:http://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/%3CCAOcnVr3R-LqtKhFsk8Bhrm-YW2i9O6J6Fhjz2h7q6_sxvwd2yw%40mail.gmail.com%3E[HBase, mail # user - Lease does not exist exceptions]
Harsh J investigated the issue as part of the mailing list thread link:https://mail-archives.apache.org/mod_mbox/hbase-user/201209.mbox/%3CCAOcnVr3R-LqtKhFsk8Bhrm-YW2i9O6J6Fhjz2h7q6_sxvwd2yw%40mail.gmail.com%3E[HBase, mail # user - Lease does not exist exceptions]
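The timeout relationship above can be checked mechanically. The following sketch parses a hypothetical `hbase-site.xml` fragment with Python's standard library and verifies that `hbase.rpc.timeout` exceeds `hbase.regionserver.lease.period`. The millisecond values and the helper names are assumptions chosen for illustration, not recommended settings.

```python
import xml.etree.ElementTree as ET

# Hypothetical hbase-site.xml fragment; the millisecond values are illustrative only.
HBASE_SITE = """
<configuration>
  <property>
    <name>hbase.rpc.timeout</name>
    <value>90000</value>
  </property>
  <property>
    <name>hbase.regionserver.lease.period</name>
    <value>60000</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Collect <property> name/value pairs from an hbase-site.xml document."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def rpc_timeout_exceeds_lease(props):
    """True when hbase.rpc.timeout is strictly greater than the lease period."""
    return int(props["hbase.rpc.timeout"]) > int(props["hbase.regionserver.lease.period"])


props = read_properties(HBASE_SITE)
ok = rpc_timeout_exceeds_lease(props)
```

A check like this assumes both properties are set explicitly in the file. If either is missing, the lookup raises `KeyError`, and the effective value would come from the HBase defaults instead.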
[[trouble.client.scarylogs]]
=== Shell or client application throws lots of scary exceptions during normal operation
@ -802,7 +802,7 @@ hadoop fs -du /hbase/myTable
----
...returns a list of the regions under the HBase table 'myTable' and their disk utilization.
For more information on HDFS shell commands, see the link:http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/FileSystemShell.html[HDFS FileSystem Shell documentation].
For more information on HDFS shell commands, see the link:https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/FileSystemShell.html[HDFS FileSystem Shell documentation].
[[trouble.namenode.hbase.objects]]
=== Browsing HDFS for HBase Objects
@ -1174,7 +1174,7 @@ If you have a DNS server, you can set `hbase.zookeeper.dns.interface` and `hbase
ZooKeeper is the cluster's "canary in the mineshaft". It'll be the first to notice issues, if any, so making sure it's happy is the shortcut to a humming cluster.
See the link:http://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting[ZooKeeper Operating Environment Troubleshooting] page.
See the link:https://wiki.apache.org/hadoop/ZooKeeper/Troubleshooting[ZooKeeper Operating Environment Troubleshooting] page.
It has suggestions and tools for checking disk and networking performance; i.e.
the operating environment your ZooKeeper and HBase are running in.
@ -1313,7 +1313,7 @@ These changes were backported to HBase 0.98.x and apply to all newer versions.
== HBase and HDFS
General configuration guidance for Apache HDFS is out of the scope of this guide.
Refer to the documentation available at http://hadoop.apache.org/ for extensive information about configuring HDFS.
Refer to the documentation available at https://hadoop.apache.org/ for extensive information about configuring HDFS.
This section deals with HDFS in terms of HBase.
In most cases, HBase stores its data in Apache HDFS.

@ -171,7 +171,7 @@ Similarly, you can now expand into other operations such as Get, Scan, or Delete
== MRUnit
link:http://mrunit.apache.org/[Apache MRUnit] is a library that allows you to unit-test MapReduce jobs.
link:https://mrunit.apache.org/[Apache MRUnit] is a library that allows you to unit-test MapReduce jobs.
You can use it to test HBase jobs in the same way as other MapReduce jobs.
Given a MapReduce job that writes to an HBase table called `MyTest`, which has one column family called `CF`, the reducer of such a job could look like the following:

@ -125,14 +125,14 @@ for warning about incompatible changes). All effort will be made to provide a de
[[hbase.client.api.surface]]
==== HBase API Surface
HBase has a lot of API points, but for the compatibility matrix above, we differentiate between Client API, Limited Private API, and Private API. HBase uses link:http://yetus.apache.org/documentation/0.5.0/interface-classification/[Apache Yetus Audience Annotations] to guide downstream expectations for stability.
HBase has a lot of API points, but for the compatibility matrix above, we differentiate between Client API, Limited Private API, and Private API. HBase uses link:https://yetus.apache.org/documentation/0.5.0/interface-classification/[Apache Yetus Audience Annotations] to guide downstream expectations for stability.
* InterfaceAudience (link:http://yetus.apache.org/documentation/0.5.0/audience-annotations-apidocs/org/apache/yetus/audience/InterfaceAudience.html[javadocs]): captures the intended audience, possible values include:
* InterfaceAudience (link:https://yetus.apache.org/documentation/0.5.0/audience-annotations-apidocs/org/apache/yetus/audience/InterfaceAudience.html[javadocs]): captures the intended audience, possible values include:
- Public: safe for end users and external projects
- LimitedPrivate: used for internals we expect to be pluggable, such as coprocessors
- Private: strictly for use within HBase itself
Classes which are defined as `IA.Private` may be used as parameters or return values for interfaces which are declared `IA.LimitedPrivate`. Treat the `IA.Private` object as opaque; do not try to access its methods or fields directly.
* InterfaceStability (link:http://yetus.apache.org/documentation/0.5.0/audience-annotations-apidocs/org/apache/yetus/audience/InterfaceStability.html[javadocs]): describes what types of interface changes are permitted. Possible values include:
* InterfaceStability (link:https://yetus.apache.org/documentation/0.5.0/audience-annotations-apidocs/org/apache/yetus/audience/InterfaceStability.html[javadocs]): describes what types of interface changes are permitted. Possible values include:
- Stable: the interface is fixed and is not expected to change
- Evolving: the interface may change in future minor versions
- Unstable: the interface may change at any time
@ -159,7 +159,7 @@ HBase Private API::
=== Pre 1.0 versions
.HBase Pre-1.0 versions are all EOM
NOTE: For new installations, do not deploy 0.94.y, 0.96.y, or 0.98.y. Deploy our stable version. See link:https://issues.apache.org/jira/browse/HBASE-11642[EOL 0.96], link:https://issues.apache.org/jira/browse/HBASE-16215[clean up of EOM releases], and link:http://www.apache.org/dist/hbase/[the header of our downloads].
NOTE: For new installations, do not deploy 0.94.y, 0.96.y, or 0.98.y. Deploy our stable version. See link:https://issues.apache.org/jira/browse/HBASE-11642[EOL 0.96], link:https://issues.apache.org/jira/browse/HBASE-16215[clean up of EOM releases], and link:https://www.apache.org/dist/hbase/[the header of our downloads].
Before the semantic versioning scheme pre-1.0, HBase tracked either Hadoop's versions (0.2x) or 0.9x versions. If you are into the arcane, checkout our old wiki page on link:https://web.archive.org/web/20150905071342/https://wiki.apache.org/hadoop/Hbase/HBaseVersions[HBase Versioning] which tries to connect the HBase version dots. Below sections cover ONLY the releases before 1.0.

@ -106,7 +106,7 @@ The newer version, the better. ZooKeeper 3.4.x is required as of HBase 1.0.0
.ZooKeeper Maintenance
[CAUTION]
====
Be sure to set up the data dir cleaner described under link:http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_maintenance[ZooKeeper
Be sure to set up the data dir cleaner described under link:https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_maintenance[ZooKeeper
Maintenance] else you could have 'interesting' problems a couple of months in; i.e.
zookeeper could start dropping sessions if it has to run through a directory of hundreds of thousands of logs which is wont to do around leader reelection time -- a process rare but run on occasion whether because a machine is dropped or happens to hiccup.
====
@ -135,9 +135,9 @@ ${HBASE_HOME}/bin/hbase-daemons.sh {start,stop} zookeeper
Note that you can use HBase in this manner to spin up a ZooKeeper cluster, unrelated to HBase.
Just make sure to set `HBASE_MANAGES_ZK` to `false` if you want it to stay up across HBase restarts so that when HBase shuts down, it doesn't take ZooKeeper down with it.
For more information about running a distinct ZooKeeper cluster, see the ZooKeeper link:http://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html[Getting
For more information about running a distinct ZooKeeper cluster, see the ZooKeeper link:https://hadoop.apache.org/zookeeper/docs/current/zookeeperStarted.html[Getting
Started Guide].
Additionally, see the link:http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7[ZooKeeper Wiki] or the link:http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_zkMulitServerSetup[ZooKeeper
Additionally, see the link:https://wiki.apache.org/hadoop/ZooKeeper/FAQ#A7[ZooKeeper Wiki] or the link:https://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#sc_zkMulitServerSetup[ZooKeeper
documentation] for more information on ZooKeeper sizing.
[[zk.sasl.auth]]

@ -42,7 +42,7 @@
// Logo for HTML -- doesn't render in PDF
++++
<div>
<a href="http://hbase.apache.org"><img src="images/hbase_logo_with_orca.png" alt="Apache HBase Logo" /></a>
<a href="https://hbase.apache.org"><img src="images/hbase_logo_with_orca.png" alt="Apache HBase Logo" /></a>
</div>
++++