HBASE-18922 Fix all dead links in our HBase book
Signed-off-by: Chia-Ping Tsai <chia7712@gmail.com>
parent bfaacfdba3
commit 281bbc40c5
|
@ -76,7 +76,7 @@ HBase can run quite well stand-alone on a laptop - but this should be considered
|
|||
[[arch.overview.hbasehdfs]]
|
||||
=== What Is The Difference Between HBase and Hadoop/HDFS?
|
||||
|
||||
link:http://hadoop.apache.org/hdfs/[HDFS] is a distributed file system that is well suited for the storage of large files.
|
||||
link:https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html[HDFS] is a distributed file system that is well suited for the storage of large files.
|
||||
Its documentation states that it is not, however, a general purpose file system, and does not provide fast individual record lookups in files.
|
||||
HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables.
|
||||
This can sometimes be a point of conceptual confusion.
|
||||
|
@ -119,9 +119,7 @@ If a region has both an empty start and an empty end key, it is the only region
|
|||
====
|
||||
|
||||
In the (hopefully unlikely) event that programmatic processing of catalog metadata
|
||||
is required, see the
|
||||
+++<a href="http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/Writables.html#getHRegionInfo%28byte%5B%5D%29">Writables</a>+++
|
||||
utility.
|
||||
is required, see the link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/RegionInfo.html#parseFrom-byte:A-[RegionInfo.parseFrom] utility.
|
||||
|
||||
[[arch.catalog.startup]]
|
||||
=== Startup Sequencing
|
||||
|
@ -221,7 +219,7 @@ In HBase 2.0 and later, link:http://hbase.apache.org/devapidocs/org/apache/hadoo
|
|||
|
||||
For additional information on write durability, review the link:/acid-semantics.html[ACID semantics] page.
|
||||
|
||||
For fine-grained control of batching of ``Put``s or ``Delete``s, see the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch%28java.util.List%29[batch] methods on Table.
|
||||
For fine-grained control of batching of ``Put``s or ``Delete``s, see the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch-java.util.List-java.lang.Object:A-[batch] methods on Table.
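Most hunks in this commit rewrite Javadoc anchors from the older `method(args)` form (often percent-encoded) to the dash-separated form that newer Javadoc output uses, with `:A` standing in for `[]`. The sketch below illustrates that mapping as it appears in the before and after lines of this diff. The class and method names are hypothetical, the rules are inferred from the diff only, and it does not cover cases where the commit also changed a parameter type (for example `majorCompact(java.lang.String)` becoming `majorCompact-org.apache.hadoop.hbase.TableName-`).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper, not part of HBase: rewrites an old-style Javadoc
// anchor such as "batch(java.util.List,%20java.lang.Object%5B%5D)" into the
// dash-separated style seen in the new lines of this diff.
final class JavadocAnchors {
    private JavadocAnchors() {}

    static String toDashAnchor(String oldAnchor) {
        final String decoded;
        try {
            // Undo percent-encoding (%28 -> '(', %5B%5D -> "[]", %20 -> ' ').
            decoded = URLDecoder.decode(oldAnchor, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        int open = decoded.indexOf('(');
        int close = decoded.lastIndexOf(')');
        if (open < 0 || close < open) {
            return decoded; // no parameter list, nothing to rewrite
        }
        String name = decoded.substring(0, open);
        String params = decoded.substring(open + 1, close).trim();
        StringBuilder out = new StringBuilder(name).append('-');
        if (params.isEmpty()) {
            return out.append('-').toString(); // no-arg method: "name--"
        }
        for (String p : params.split(",")) {
            // Array types are written with ":A" in place of "[]".
            out.append(p.trim().replace("[]", ":A")).append('-');
        }
        return out.toString();
    }
}
```

For instance, `toDashAnchor("setTimeRange(long,%20long)")` yields `setTimeRange-long-long-`, matching the rewritten `Get.setTimeRange()` link elsewhere in this commit.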
|
||||
|
||||
[[async.client]]
|
||||
=== Asynchronous Client ===
|
||||
|
@ -2799,7 +2797,7 @@ if (result.isStale()) {
|
|||
=== Resources
|
||||
|
||||
. More information about the design and implementation can be found at the jira issue: link:https://issues.apache.org/jira/browse/HBASE-10070[HBASE-10070]
|
||||
. HBaseCon 2014 link:http://hbasecon.com/sessions/#session15[talk] also contains some details and link:http://www.slideshare.net/enissoz/hbase-high-availability-for-reads-with-time[slides].
|
||||
. HBaseCon 2014 talk: link:http://hbase.apache.org/www.hbasecon.com/#2014-PresentationsRecordings[HBase Read High Availability Using Timeline-Consistent Region Replicas] also contains some details and link:http://www.slideshare.net/enissoz/hbase-high-availability-for-reads-with-time[slides].
|
||||
|
||||
ifdef::backend-docbook[]
|
||||
[index]
|
||||
|
|
|
@ -47,9 +47,9 @@ The below policy is something we put in place 09/2012.
|
|||
It is a suggested policy rather than a hard requirement.
|
||||
We want to try it first to see if it works before we cast it in stone.
|
||||
|
||||
Apache HBase is made of link:https://issues.apache.org/jira/browse/HBASE#selectedTab=com.atlassian.jira.plugin.system.project%3Acomponents-panel[components].
|
||||
Apache HBase is made of link:https://issues.apache.org/jira/projects/HBASE?selectedItem=com.atlassian.jira.jira-projects-plugin:components-page[components].
|
||||
Components have one or more <<owner,OWNER>>s.
|
||||
See the 'Description' field on the link:https://issues.apache.org/jira/browse/HBASE#selectedTab=com.atlassian.jira.plugin.system.project%3Acomponents-panel[components] JIRA page for who the current owners are by component.
|
||||
See the 'Description' field on the link:https://issues.apache.org/jira/projects/HBASE?selectedItem=com.atlassian.jira.jira-projects-plugin:components-page[components] JIRA page for who the current owners are by component.
|
||||
|
||||
Patches that fit within the scope of a single Apache HBase component require, at least, a +1 by one of the component's owners before commit.
|
||||
If owners are absent -- busy or otherwise -- two +1s by non-owners will suffice.
|
||||
|
@ -88,7 +88,7 @@ We also are currently in violation of this basic tenet -- replication at least k
|
|||
[[owner]]
|
||||
.Component Owner/Lieutenant
|
||||
|
||||
Component owners are listed in the description field on this Apache HBase JIRA link:https://issues.apache.org/jira/browse/HBASE#selectedTab=com.atlassian.jira.plugin.system.project%3Acomponents-panel[components] page.
|
||||
Component owners are listed in the description field on this Apache HBase JIRA link:https://issues.apache.org/jira/projects/HBASE?selectedItem=com.atlassian.jira.jira-projects-plugin:components-page[components] page.
|
||||
The owners are listed in the 'Description' field rather than in the 'Component Lead' field because the latter only allows us to list one individual whereas it is encouraged that components have multiple owners.
|
||||
|
||||
Owners or component lieutenants are volunteers who are (usually, but not necessarily) expert in their component domain and may have an agenda on how they think their Apache HBase component should evolve.
|
||||
|
|
|
@ -267,8 +267,7 @@ See <<brand.new.compressor,brand.new.compressor>>).
|
|||
.Install LZO Support
|
||||
|
||||
HBase cannot ship with LZO because of incompatibility between HBase, which uses an Apache Software License (ASL) and LZO, which uses a GPL license.
|
||||
See the link:http://wiki.apache.org/hadoop/UsingLzoCompression[Using LZO
|
||||
Compression] wiki page for information on configuring LZO support for HBase.
|
||||
See the link:https://github.com/twitter/hadoop-lzo/blob/master/README.md[Hadoop-LZO at Twitter] for information on configuring LZO support for HBase.
|
||||
|
||||
If you depend upon LZO compression, consider configuring your RegionServers to fail to start if LZO is not available.
|
||||
See <<hbase.regionserver.codecs,hbase.regionserver.codecs>>.
|
||||
|
|
|
@ -820,7 +820,7 @@ See the entry for `hbase.hregion.majorcompaction` in the <<compaction.parameters
|
|||
====
|
||||
Major compactions are absolutely necessary for StoreFile clean-up.
|
||||
Do not disable them altogether.
|
||||
You can run major compactions manually via the HBase shell or via the http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact(org.apache.hadoop.hbase.TableName)[Admin API].
|
||||
You can run major compactions manually via the HBase shell or via the link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact-org.apache.hadoop.hbase.TableName-[Admin API].
|
||||
====
|
||||
|
||||
For more information about compactions and the compaction file selection process, see <<compaction,compaction>>
|
||||
|
|
|
@ -121,8 +121,8 @@ package.
|
|||
|
||||
Observer coprocessors are triggered either before or after a specific event occurs.
|
||||
Observers that happen before an event use methods that start with a `pre` prefix,
|
||||
such as link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#prePut%28org.apache.hadoop.hbase.coprocessor.ObserverContext,%20org.apache.hadoop.hbase.client.Put,%20org.apache.hadoop.hbase.regionserver.wal.WALEdit,%20org.apache.hadoop.hbase.client.Durability%29[`prePut`]. Observers that happen just after an event override methods that start
|
||||
with a `post` prefix, such as link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#postPut%28org.apache.hadoop.hbase.coprocessor.ObserverContext,%20org.apache.hadoop.hbase.client.Put,%20org.apache.hadoop.hbase.regionserver.wal.WALEdit,%20org.apache.hadoop.hbase.client.Durability%29[`postPut`].
|
||||
such as link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#prePut-org.apache.hadoop.hbase.coprocessor.ObserverContext-org.apache.hadoop.hbase.client.Put-org.apache.hadoop.hbase.wal.WALEdit-org.apache.hadoop.hbase.client.Durability-[`prePut`]. Observers that happen just after an event override methods that start
|
||||
with a `post` prefix, such as link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#postPut-org.apache.hadoop.hbase.coprocessor.ObserverContext-org.apache.hadoop.hbase.client.Put-org.apache.hadoop.hbase.wal.WALEdit-org.apache.hadoop.hbase.client.Durability-[`postPut`].
|
||||
|
||||
|
||||
==== Use Cases for Observer Coprocessors
|
||||
|
@ -178,7 +178,7 @@ average or summation for an entire table which spans hundreds of regions.
|
|||
|
||||
In contrast to observer coprocessors, where your code is run transparently, endpoint
|
||||
coprocessors must be explicitly invoked using the
|
||||
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/Table.html#coprocessorService%28java.lang.Class,%20byte%5B%5D,%20byte%5B%5D,%20org.apache.hadoop.hbase.client.coprocessor.Batch.Call%29[CoprocessorService()]
|
||||
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/Table.html#coprocessorService-java.lang.Class-byte:A-byte:A-org.apache.hadoop.hbase.client.coprocessor.Batch.Call-[CoprocessorService()]
|
||||
method available in
|
||||
link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/Table.html[Table]
|
||||
or
|
||||
|
|
|
@ -67,7 +67,7 @@ Timestamp::
|
|||
[[conceptual.view]]
|
||||
== Conceptual View
|
||||
|
||||
You can read a very understandable explanation of the HBase data model in the blog post link:http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable[Understanding HBase and BigTable] by Jim R. Wilson.
|
||||
You can read a very understandable explanation of the HBase data model in the blog post link:http://jimbojw.com/#understanding%20hbase[Understanding HBase and BigTable] by Jim R. Wilson.
|
||||
Another good explanation is available in the PDF link:http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/9353-login1210_khurana.pdf[Introduction to Basic Schema Design] by Amandeep Khurana.
|
||||
|
||||
It may help to read different perspectives to get a solid understanding of HBase schema design.
|
||||
|
@ -275,11 +275,11 @@ Operations are applied via link:http://hbase.apache.org/apidocs/org/apache/hadoo
|
|||
=== Get
|
||||
|
||||
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] returns attributes for a specified row.
|
||||
Gets are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#get(org.apache.hadoop.hbase.client.Get)[Table.get].
|
||||
Gets are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#get-org.apache.hadoop.hbase.client.Get-[Table.get]
|
||||
|
||||
=== Put
|
||||
|
||||
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] either adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put(org.apache.hadoop.hbase.client.Put)[Table.put] (non-writeBuffer) or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch(java.util.List,%20java.lang.Object%5B%5D)[Table.batch] (non-writeBuffer).
|
||||
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] either adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put-org.apache.hadoop.hbase.client.Put-[Table.put] (non-writeBuffer) or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch-java.util.List-java.lang.Object:A-[Table.batch] (non-writeBuffer)
|
||||
|
||||
[[scan]]
|
||||
=== Scans
|
||||
|
@ -316,7 +316,7 @@ Note that generally the easiest way to specify a specific stop point for a scan
|
|||
=== Delete
|
||||
|
||||
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html[Delete] removes a row from a table.
|
||||
Deletes are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete(org.apache.hadoop.hbase.client.Delete)[Table.delete].
|
||||
Deletes are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete-org.apache.hadoop.hbase.client.Delete-[Table.delete].
|
||||
|
||||
HBase does not modify data in place, and so deletes are handled by creating new markers called _tombstones_.
|
||||
These tombstones, along with the dead values, are cleaned up on major compactions.
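The tombstone behaviour described in the context lines above can be pictured with a small in-memory model. This is a conceptual sketch only: it does not use the HBase client API, it orders entries by append position rather than by timestamp, and every name in it (`TombstoneSketch`, `majorCompact`, and so on) is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy append-only store: deletes add a marker instead of editing in place.
final class TombstoneSketch {
    // One appended entry; a null value marks a tombstone.
    private static final class Cell {
        final String row;
        final String value;
        Cell(String row, String value) { this.row = row; this.value = value; }
    }

    private final List<Cell> log = new ArrayList<>();

    void put(String row, String value) {
        log.add(new Cell(row, Objects.requireNonNull(value)));
    }

    // A delete never removes anything; it appends a tombstone.
    void delete(String row) {
        log.add(new Cell(row, null));
    }

    // Reads scan newest-first; a tombstone masks older values.
    String get(String row) {
        for (int i = log.size() - 1; i >= 0; i--) {
            Cell c = log.get(i);
            if (c.row.equals(row)) {
                return c.value; // null when the newest entry is a tombstone
            }
        }
        return null;
    }

    int storedEntries() { return log.size(); }

    // Compaction rewrites the log, dropping tombstones and dead values.
    void majorCompact() {
        Map<String, String> live = new LinkedHashMap<>();
        for (Cell c : log) {
            if (c.value == null) live.remove(c.row); else live.put(c.row, c.value);
        }
        log.clear();
        for (Map.Entry<String, String> e : live.entrySet()) {
            log.add(new Cell(e.getKey(), e.getValue()));
        }
    }
}
```

In this model a deleted row is invisible to reads immediately, but its old value and the marker keep occupying space until `majorCompact()` runs, which is the trade-off the text describes.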
|
||||
|
@ -389,8 +389,8 @@ The below discussion of link:http://hbase.apache.org/apidocs/org/apache/hadoop/h
|
|||
|
||||
By default, i.e. if you specify no explicit version, when doing a `get`, the cell whose version has the largest value is returned (which may or may not be the latest one written, see later). The default behavior can be modified in the following ways:
|
||||
|
||||
* to return more than one version, see link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setMaxVersions()[Get.setMaxVersions()]
|
||||
* to return versions other than the latest, see link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setTimeRange(long,%20long)[Get.setTimeRange()]
|
||||
* to return more than one version, see link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setMaxVersions--[Get.setMaxVersions()]
|
||||
* to return versions other than the latest, see link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setTimeRange-long-long-[Get.setTimeRange()]
|
||||
+
|
||||
To retrieve the latest version that is less than or equal to a given value, thus giving the 'latest' state of the record at a certain point in time, just use a range from 0 to the desired version and set the max versions to 1.
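The "latest version at or before a given point" lookup described above behaves like a floor lookup over the sorted versions of a cell. The sketch below models that with a plain `TreeMap`; it is an illustration of the semantics, not HBase client code, and the class and method names are hypothetical. When doing this with `Get.setTimeRange()`, check its Javadoc for whether the upper bound is inclusive before relying on an exact boundary.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for the versions of one cell, keyed by version number.
final class VersionLookupSketch {
    private VersionLookupSketch() {}

    // Returns the value whose version is the largest one <= asOf,
    // or null if every stored version is newer than asOf.
    static String latestAsOf(NavigableMap<Long, String> versions, long asOf) {
        Map.Entry<Long, String> e = versions.floorEntry(asOf);
        return e == null ? null : e.getValue();
    }

    // Small fixture used for illustration: three versions of one cell.
    static NavigableMap<Long, String> sampleVersions() {
        NavigableMap<Long, String> versions = new TreeMap<>();
        versions.put(10L, "a");
        versions.put(20L, "b");
        versions.put(30L, "c");
        return versions;
    }
}
```

With the sample data, asking for the state as of version 25 returns the value written at version 20, because the version-30 write had not happened yet from that point of view.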
|
||||
|
||||
|
|
|
@ -64,7 +64,7 @@ FreeNode offers a web-based client, but most people prefer a native client, and
|
|||
|
||||
=== Jira
|
||||
|
||||
Check for existing issues in link:https://issues.apache.org/jira/browse/HBASE[Jira].
|
||||
Check for existing issues in link:https://issues.apache.org/jira/projects/HBASE/issues[Jira].
|
||||
If it's either a new feature request, enhancement, or a bug, file a ticket.
|
||||
|
||||
We track multiple types of work in JIRA:
|
||||
|
@ -479,8 +479,7 @@ mvn -DskipTests package assembly:single deploy
|
|||
|
||||
If you see `Unable to find resource 'VM_global_library.vm'`, ignore it.
|
||||
It's not an error.
|
||||
It is link:http://jira.codehaus.org/browse/MSITE-286[officially
|
||||
ugly] though.
|
||||
It is link:https://issues.apache.org/jira/browse/MSITE-286[officially ugly] though.
|
||||
|
||||
[[releasing]]
|
||||
== Releasing Apache HBase
|
||||
|
@ -593,8 +592,7 @@ Adjust the version in all the POM files appropriately.
|
|||
If you are making a release candidate, you must remove the `-SNAPSHOT` label from all versions
|
||||
in all pom.xml files.
|
||||
If you are running this recipe to publish a snapshot, you must keep the `-SNAPSHOT` suffix on the hbase version.
|
||||
The link:http://mojo.codehaus.org/versions-maven-plugin/[Versions
|
||||
Maven Plugin] can be of use here.
|
||||
The link:http://www.mojohaus.org/versions-maven-plugin/[Versions Maven Plugin] can be of use here.
|
||||
To set a version in all the many poms of the hbase multi-module project, use a command like the following:
|
||||
+
|
||||
[source,bourne]
|
||||
|
@ -738,7 +736,7 @@ If you run the script, do your checks at this stage verifying the src and bin ta
|
|||
Tag before you start the build.
|
||||
You can always delete it if the build goes haywire.
|
||||
|
||||
. Sign, fingerprint and then 'stage' your release candiate version directory via svnpubsub by committing your directory to link:https://dist.apache.org/repos/dist/dev/hbase/[The 'dev' distribution directory] (See comments on link:https://issues.apache.org/jira/browse/HBASE-10554[HBASE-10554 Please delete old releases from mirroring system] but in essence it is an svn checkout of https://dist.apache.org/repos/dist/dev/hbase -- releases are at https://dist.apache.org/repos/dist/release/hbase). In the _version directory_ run the following commands:
|
||||
. Sign, fingerprint and then 'stage' your release candiate version directory via svnpubsub by committing your directory to link:https://dist.apache.org/repos/dist/dev/hbase/[The 'dev' distribution directory] (See comments on link:https://issues.apache.org/jira/browse/HBASE-10554[HBASE-10554 Please delete old releases from mirroring system] but in essence, it is an svn checkout of https://dist.apache.org/repos/dist/dev/hbase. And releases are at https://dist.apache.org/repos/dist/release/hbase). In the _version directory_ run the following commands:
|
||||
+
|
||||
[source,bourne]
|
||||
----
|
||||
|
@ -920,7 +918,7 @@ Also, keep in mind that if you are running tests in the `hbase-server` module yo
|
|||
=== Unit Tests
|
||||
|
||||
Apache HBase test cases are subdivided into four categories: small, medium, large, and
|
||||
integration with corresponding JUnit link:http://www.junit.org/node/581[categories]: `SmallTests`, `MediumTests`, `LargeTests`, `IntegrationTests`.
|
||||
integration with corresponding JUnit link:https://github.com/junit-team/junit4/wiki/Categories[categories]: `SmallTests`, `MediumTests`, `LargeTests`, `IntegrationTests`.
|
||||
JUnit categories are denoted using java annotations and look like this in your unit test code.
|
||||
|
||||
[source,java]
|
||||
|
@ -1286,7 +1284,7 @@ For other deployment options, a ClusterManager can be implemented and plugged in
|
|||
==== Destructive integration / system tests (ChaosMonkey)
|
||||
|
||||
HBase 0.96 introduced a tool named `ChaosMonkey`, modeled after
|
||||
link:http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html[same-named tool by Netflix's Chaos Monkey tool].
|
||||
link:https://netflix.github.io/chaosmonkey/[same-named tool by Netflix's Chaos Monkey tool].
|
||||
ChaosMonkey simulates real-world
|
||||
faults in a running cluster by killing or disconnecting random servers, or injecting
|
||||
other failures into the environment. You can use ChaosMonkey as a stand-alone tool
|
||||
|
@ -1934,7 +1932,7 @@ Use the btn:[Submit Patch] button in JIRA, just like othe
|
|||
|
||||
===== Reject
|
||||
|
||||
Patches which do not adhere to the guidelines in link:https://wiki.apache.org/hadoop/Hbase/HowToCommit/hadoop/Hbase/HowToContribute#[HowToContribute] and to the link:https://wiki.apache.org/hadoop/Hbase/HowToCommit/hadoop/CodeReviewChecklist#[code review checklist] should be rejected.
|
||||
Patches which do not adhere to the guidelines in link:https://hbase.apache.org/book.html#developer[HowToContribute] and to the link:https://wiki.apache.org/hadoop/CodeReviewChecklist[code review checklist] should be rejected.
|
||||
Committers should always be polite to contributors and try to instruct and encourage them to contribute better patches.
|
||||
If a committer wishes to improve an unacceptable patch, then it should first be rejected, and a new patch should be attached by the committer for review.
|
||||
|
||||
|
|
|
@ -625,7 +625,9 @@ Documentation about Thrift has moved to <<thrift>>.
|
|||
== C/C++ Apache HBase Client
|
||||
|
||||
FB's Chip Turner wrote a pure C/C++ client.
|
||||
link:https://github.com/facebook/native-cpp-hbase-client[Check it out].
|
||||
link:https://github.com/hinaria/native-cpp-hbase-client[Check it out].
|
||||
|
||||
C++ client implementation. To see link:https://issues.apache.org/jira/browse/HBASE-14850[HBASE-14850].
|
||||
|
||||
[[jdo]]
|
||||
|
||||
|
|
|
@ -766,7 +766,7 @@ The LoadTestTool has received many updates in recent HBase releases, including s
|
|||
[[ops.regionmgt.majorcompact]]
|
||||
=== Major Compaction
|
||||
|
||||
Major compactions can be requested via the HBase shell or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact%28java.lang.String%29[Admin.majorCompact].
|
||||
Major compactions can be requested via the HBase shell or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact-org.apache.hadoop.hbase.TableName-[Admin.majorCompact].
|
||||
|
||||
Note: major compactions do NOT do region merges.
|
||||
See <<compaction,compaction>> for more information about compactions.
|
||||
|
@ -783,7 +783,7 @@ $ bin/hbase org.apache.hadoop.hbase.util.Merge <tablename> <region1> <region2>
|
|||
|
||||
If you feel you have too many regions and want to consolidate them, Merge is the utility you need.
|
||||
Merge must be run when the cluster is down.
|
||||
See the link:http://ofps.oreilly.com/titles/9781449396107/performance.html[O'Reilly HBase
|
||||
See the link:https://web.archive.org/web/20111231002503/http://ofps.oreilly.com/titles/9781449396107/performance.html[O'Reilly HBase
|
||||
Book] for an example of usage.
|
||||
|
||||
You will need to pass 3 parameters to this application.
|
||||
|
@ -1050,7 +1050,7 @@ In this case, or if you are in a OLAP environment and require having locality, t
|
|||
[[hbase_metrics]]
|
||||
== HBase Metrics
|
||||
|
||||
HBase emits metrics which adhere to the link:http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/metrics/package-summary.html[Hadoop metrics] API.
|
||||
HBase emits metrics which adhere to the link:https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Metrics.html[Hadoop Metrics] API.
|
||||
Starting with HBase 0.95footnote:[The Metrics system was redone in
|
||||
HBase 0.96. See Migration
|
||||
to the New Metrics Hotness – Metrics2 by Elliot Clark for detail], HBase is configured to emit a default set of metrics with a default sampling period of every 10 seconds.
|
||||
|
@ -1331,7 +1331,7 @@ Have a look in the Web UI.
|
|||
== Cluster Replication
|
||||
|
||||
NOTE: This information was previously available at
|
||||
link:http://hbase.apache.org#replication[Cluster Replication].
|
||||
link:https://hbase.apache.org/0.94/replication.html[Cluster Replication].
|
||||
|
||||
HBase provides a cluster replication mechanism which allows you to keep one cluster's state synchronized with that of another cluster, using the write-ahead log (WAL) of the source cluster to propagate the changes.
|
||||
Some use cases for cluster replication include:
|
||||
|
@ -2093,7 +2093,7 @@ The act of copying these files creates new HDFS metadata, which is why a restore
|
|||
=== Live Cluster Backup - Replication
|
||||
|
||||
This approach assumes that there is a second cluster.
|
||||
See the HBase page on link:http://hbase.apache.org/book.html#replication[replication] for more information.
|
||||
See the HBase page on link:http://hbase.apache.org/book.html#_cluster_replication[replication] for more information.
|
||||
|
||||
[[ops.backup.live.copytable]]
|
||||
=== Live Cluster Backup - CopyTable
|
||||
|
|
|
@ -32,16 +32,14 @@
|
|||
=== HBase Videos
|
||||
|
||||
.Introduction to HBase
|
||||
* link:http://www.cloudera.com/content/cloudera/en/resources/library/presentation/chicago_data_summit_apache_hbase_an_introduction_todd_lipcon.html[Introduction to HBase] by Todd Lipcon (Chicago Data Summit 2011).
|
||||
* link:http://www.cloudera.com/videos/intorduction-hbase-todd-lipcon[Introduction to HBase] by Todd Lipcon (2010).
|
||||
link:http://www.cloudera.com/videos/hadoop-world-2011-presentation-video-building-realtime-big-data-services-at-facebook-with-hadoop-and-hbase[Building Real Time Services at Facebook with HBase] by Jonathan Gray (Hadoop World 2011).
|
||||
|
||||
link:http://www.cloudera.com/videos/hw10_video_how_stumbleupon_built_and_advertising_platform_using_hbase_and_hadoop[HBase and Hadoop, Mixing Real-Time and Batch Processing at StumbleUpon] by JD Cryans (Hadoop World 2010).
|
||||
* link:https://vimeo.com/23400732[Introduction to HBase] by Todd Lipcon (Chicago Data Summit 2011).
|
||||
* link:https://vimeo.com/26804675[Building Real Time Services at Facebook with HBase] by Jonathan Gray (Berlin buzzwords 2011)
|
||||
* link:http://www.cloudera.com/videos/hw10_video_how_stumbleupon_built_and_advertising_platform_using_hbase_and_hadoop[The Multiple Uses Of HBase] by Jean-Daniel Cryans(Berlin buzzwords 2011).
|
||||
|
||||
[[other.info.pres]]
|
||||
=== HBase Presentations (Slides)
|
||||
|
||||
link:http://www.cloudera.com/content/cloudera/en/resources/library/hadoopworld/hadoop-world-2011-presentation-video-advanced-hbase-schema-design.html[Advanced HBase Schema Design] by Lars George (Hadoop World 2011).
|
||||
link:https://www.slideshare.net/cloudera/hadoop-world-2011-advanced-hbase-schema-design-lars-george-cloudera[Advanced HBase Schema Design] by Lars George (Hadoop World 2011).
|
||||
|
||||
link:http://www.slideshare.net/cloudera/chicago-data-summit-apache-hbase-an-introduction[Introduction to HBase] by Todd Lipcon (Chicago Data Summit 2011).
|
||||
|
||||
|
@ -61,9 +59,7 @@ link:http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf[No Rela
|
|||
|
||||
link:https://blog.cloudera.com/blog/category/hbase/[Cloudera's HBase Blog] has a lot of links to useful HBase information.
|
||||
|
||||
* link:https://blog.cloudera.com/blog/2010/04/cap-confusion-problems-with-partition-tolerance/[CAP Confusion] is a relevant entry for background information on distributed storage systems.
|
||||
|
||||
link:http://wiki.apache.org/hadoop/HBase/HBasePresentations[HBase Wiki] has a page with a number of presentations.
|
||||
link:https://blog.cloudera.com/blog/2010/04/cap-confusion-problems-with-partition-tolerance/[CAP Confusion] is a relevant entry for background information on distributed storage systems.
|
||||
|
||||
link:http://refcardz.dzone.com/refcardz/hbase[HBase RefCard] from DZone.
|
||||
|
||||
|
|
|
@ -587,7 +587,7 @@ If all your data is being written to one region at a time, then re-read the sect
|
|||
|
||||
Also, if you are pre-splitting regions and all your data is _still_ winding up in a single region even though your keys aren't monotonically increasing, confirm that your keyspace actually works with the split strategy.
|
||||
There are a variety of reasons that regions may appear "well split" but won't work with your data.
|
||||
As the HBase client communicates directly with the RegionServers, this can be obtained via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#getRegionLocation(byte%5B%5D)[Table.getRegionLocation].
|
||||
As the HBase client communicates directly with the RegionServers, this can be obtained via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/RegionLocator.html#getRegionLocation-byte:A-[RegionLocator.getRegionLocation].
|
||||
|
||||
See <<precreate.regions>>, as well as <<perf.configurations>>
|
||||
|
||||
|
@ -699,7 +699,7 @@ Enabling Bloom Filters can save your having to go to disk and can help improve r
|
|||
link:http://en.wikipedia.org/wiki/Bloom_filter[Bloom filters] were developed over in link:https://issues.apache.org/jira/browse/HBASE-1200[HBase-1200 Add bloomfilters].
|
||||
For description of the development process -- why static blooms rather than dynamic -- and for an overview of the unique properties that pertain to blooms in HBase, as well as possible future directions, see the _Development Process_ section of the document link:https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf[BloomFilters in HBase] attached to link:https://issues.apache.org/jira/browse/HBASE-1200[HBASE-1200].
|
||||
The bloom filters described here are actually version two of blooms in HBase.
|
||||
In versions up to 0.19.x, HBase had a dynamic bloom option based on work done by the link:http://www.one-lab.org/[European Commission One-Lab Project 034819].
|
||||
In versions up to 0.19.x, HBase had a dynamic bloom option based on work done by the link:http://www.onelab.org[European Commission One-Lab Project 034819].
|
||||
The core of the HBase bloom work was later pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile.
|
||||
Version 1 of HBase blooms never worked that well.
|
||||
Version 2 is a rewrite from scratch though again it starts with the one-lab work.
|
||||
|
@ -816,7 +816,7 @@ In this case, special care must be taken to regularly perform major compactions
|
|||
As is documented in <<datamodel>>, marking rows as deleted creates additional StoreFiles which then need to be processed on reads.
|
||||
Tombstones only get cleaned up with major compactions.
|
||||
|
||||
See also <<compaction>> and link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact%28java.lang.String%29[Admin.majorCompact].
|
||||
See also <<compaction>> and link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#majorCompact-org.apache.hadoop.hbase.TableName-[Admin.majorCompact].
|
||||
|
||||
[[perf.deleting.rpc]]
|
||||
=== Delete RPC Behavior
|
||||
|
@ -825,8 +825,7 @@ Be aware that `Table.delete(Delete)` doesn't use the writeBuffer.
|
|||
It will execute a RegionServer RPC with each invocation.
|
||||
For a large number of deletes, consider `Table.delete(List)`.
|
||||
|
||||
See
|
||||
+++<a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete%28org.apache.hadoop.hbase.client.Delete%29">hbase.client.Delete</a>+++.
|
||||
See link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete-org.apache.hadoop.hbase.client.Delete-[hbase.client.Delete]
|
||||
|
||||
[[perf.hdfs]]
|
||||
== HDFS
|
||||
|
|
|
@ -29,7 +29,7 @@
|
|||
|
||||
|
||||
== Protobuf
|
||||
HBase uses Google's link:http://protobuf.protobufs[protobufs] wherever
|
||||
HBase uses Google's link:https://developers.google.com/protocol-buffers/[protobufs] wherever
|
||||
it persists metadata -- in the tail of hfiles or Cells written by
|
||||
HBase into the system hbase:meta table or when HBase writes znodes
|
||||
to zookeeper, etc. -- and when it passes objects over the wire making
|
||||
|
|
|
@ -338,7 +338,7 @@ This is the main trade-off.
|
|||
====
|
||||
link:https://issues.apache.org/jira/browse/HBASE-4811[HBASE-4811] implements an API to scan a table or a range within a table in reverse, reducing the need to optimize your schema for forward or reverse scanning.
|
||||
This feature is available in HBase 0.98 and later.
|
||||
See https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setReversed%28boolean for more information.
|
||||
See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setReversed-boolean-[Scan.setReversed()] for more information.
|
||||
====
|
||||
|
||||
A common problem in database processing is quickly finding the most recent version of a value.
|
||||
|
@ -760,7 +760,7 @@ Neither approach is wrong, it just depends on what is most appropriate for the s
|
|||
====
|
||||
link:https://issues.apache.org/jira/browse/HBASE-4811[HBASE-4811] implements an API to scan a table or a range within a table in reverse, reducing the need to optimize your schema for forward or reverse scanning.
|
||||
This feature is available in HBase 0.98 and later.
|
||||
See https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setReversed%28boolean for more information.
|
||||
See link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setReversed-boolean-[Scan.setReversed()] for more information.
|
||||
====
|
||||
|
||||
[[schema.casestudies.log_timeseries.varkeys]]
|
||||
|
@ -789,8 +789,7 @@ The rowkey of LOG_TYPES would be:
|
|||
* `[bytes]` variable length bytes for raw hostname or event-type.
|
||||
|
||||
A column for this rowkey could be a long with an assigned number, which could be obtained
|
||||
by using an
|
||||
+++<a href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#incrementColumnValue%28byte[],%20byte[],%20byte[],%20long%29">HBase counter</a>+++.
|
||||
by using an link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#incrementColumnValue-byte:A-byte:A-byte:A-long-[HBase counter]
|
||||
|
||||
So the resulting composite rowkey would be:
|
||||
|
||||
|
@ -806,7 +805,7 @@ In either the Hash or Numeric substitution approach, the raw values for hostname
|
|||
This effectively is the OpenTSDB approach.
|
||||
What OpenTSDB does is re-write data and pack rows into columns for certain time-periods.
|
||||
For a detailed explanation, see: http://opentsdb.net/schema.html, and
|
||||
+++<a href="http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-lessons-learned-from-opentsdb.html">Lessons Learned from OpenTSDB</a>+++
|
||||
link:https://www.slideshare.net/cloudera/4-opentsdb-hbasecon[Lessons Learned from OpenTSDB]
|
||||
from HBaseCon2012.
|
||||
|
||||
But this is how the general concept works: data is ingested, for example, in this manner...
|
||||
|
|
|
@ -1390,11 +1390,11 @@ When you issue a Scan or Get, HBase uses your default set of authorizations to
|
|||
filter out cells that you do not have access to. A superuser can set the default
|
||||
set of authorizations for a given user by using the `set_auths` HBase Shell command
|
||||
or the
|
||||
link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/visibility/VisibilityClient.html#setAuths(org.apache.hadoop.hbase.client.Connection,%20java.lang.String\[\],%20java.lang.String)[VisibilityClient.setAuths()] method.
|
||||
link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/security/visibility/VisibilityClient.html#setAuths-org.apache.hadoop.hbase.client.Connection-java.lang.String:A-java.lang.String-[VisibilityClient.setAuths()] method.
|
||||
|
||||
You can specify a different authorization during the Scan or Get, by passing the
|
||||
AUTHORIZATIONS option in HBase Shell, or the
|
||||
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setAuthorizations%28org.apache.hadoop.hbase.security.visibility.Authorizations%29[setAuthorizations()]
|
||||
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setAuthorizations-org.apache.hadoop.hbase.security.visibility.Authorizations-[Scan.setAuthorizations()]
|
||||
method if you use the API. This authorization will be combined with your default
|
||||
set as an additional filter. It will further filter your results, rather than
|
||||
giving you additional authorization.
|
||||
|
|
|
@ -706,7 +706,10 @@ Because of a change in the format in which MIT Kerberos writes its credentials c
|
|||
If you have this problematic combination of components in your environment, to work around this problem, first log in with `kinit` and then immediately refresh the credential cache with `kinit -R`.
|
||||
The refresh will rewrite the credential cache without the problematic formatting.
|
||||
|
||||
Finally, depending on your Kerberos configuration, you may need to install the link:http://docs.oracle.com/javase/1.4.2/docs/guide/security/jce/JCERefGuide.html[Java Cryptography Extension], or JCE.
|
||||
Prior to JDK 1.4, the JCE was an unbundled product, and as such, the JCA and JCE were regularly referred to as separate, distinct components.
|
||||
As JCE is now bundled in the JDK 7.0, the distinction is becoming less apparent. Since the JCE uses the same architecture as the JCA, the JCE should be more properly thought of as a part of the JCA.
|
||||
|
||||
If you are running JDK 1.5 or earlier, you may need to install the link:https://docs.oracle.com/javase/1.5.0/docs/guide/security/jce/JCERefGuide.html[Java Cryptography Extension], or JCE.
|
||||
Ensure the JCE jars are on the classpath on both server and client systems.
|
||||
|
||||
You may also need to download the link:http://www.oracle.com/technetwork/java/javase/downloads/jce-6-download-429243.html[unlimited strength JCE policy files].
|
||||
|
@ -758,7 +761,7 @@ For example (substitute VERSION with your HBase version):
|
|||
HADOOP_CLASSPATH=`hbase classpath` hadoop jar $HBASE_HOME/hbase-server-VERSION.jar rowcounter usertable
|
||||
----
|
||||
|
||||
See http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#classpathfor more information on HBase MapReduce jobs and classpaths.
|
||||
See <<hbase.mapreduce.classpath,HBase, MapReduce, and the CLASSPATH>> for more information on HBase MapReduce jobs and classpaths.
|
||||
|
||||
[[trouble.hbasezerocopybytestring]]
|
||||
=== Launching a job, you get java.lang.IllegalAccessError: com/google/protobuf/HBaseZeroCopyByteString or class com.google.protobuf.ZeroCopyLiteralByteString cannot access its superclass com.google.protobuf.LiteralByteString
|
||||
|
@ -799,7 +802,7 @@ hadoop fs -du /hbase/myTable
|
|||
----
|
||||
...returns a list of the regions under the HBase table 'myTable' and their disk utilization.
|
||||
|
||||
For more information on HDFS shell commands, see the link:http://hadoop.apache.org/common/docs/current/file_system_shell.html[HDFS FileSystem Shell documentation].
|
||||
For more information on HDFS shell commands, see the link:http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/FileSystemShell.html[HDFS FileSystem Shell documentation].
|
||||
|
||||
[[trouble.namenode.hbase.objects]]
|
||||
=== Browsing HDFS for HBase Objects
|
||||
|
@ -830,7 +833,7 @@ The HDFS directory structure of HBase WAL is..
|
|||
/<WAL> (WAL files for the RegionServer)
|
||||
----
|
||||
|
||||
See the link:http://hadoop.apache.org/common/docs/current/hdfs_user_guide.html[HDFS User Guide] for other non-shell diagnostic utilities like `fsck`.
|
||||
See the link:https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html[HDFS User Guide] for other non-shell diagnostic utilities like `fsck`.
|
||||
|
||||
[[trouble.namenode.0size.hlogs]]
|
||||
==== Zero size WALs with data in them
|
||||
|
|
|
@ -33,7 +33,7 @@ For information on unit tests for HBase itself, see <<hbase.tests,hbase.tests>>.
|
|||
|
||||
== JUnit
|
||||
|
||||
HBase uses link:http://junit.org[JUnit] 4 for unit tests
|
||||
HBase uses link:http://junit.org[JUnit] for unit tests
|
||||
|
||||
This example will add unit tests to the following example class:
|
||||
|
||||
|
|
|
@ -161,7 +161,7 @@ HBase Private API::
|
|||
.HBase Pre-1.0 versions are all EOM
|
||||
NOTE: For new installations, do not deploy 0.94.y, 0.96.y, or 0.98.y. Deploy our stable version. See link:https://issues.apache.org/jira/browse/HBASE-11642[EOL 0.96], link:https://issues.apache.org/jira/browse/HBASE-16215[clean up of EOM releases], and link:http://www.apache.org/dist/hbase/[the header of our downloads].
|
||||
|
||||
Before adopting semantic versioning (pre-1.0), HBase tracked either Hadoop's versions (0.2x) or 0.9x versions. If you are into the arcane, check out our old wiki page on link:http://wiki.apache.org/hadoop/Hbase/HBaseVersions[HBase Versioning], which tries to connect the HBase version dots. The sections below cover ONLY the releases before 1.0.
|
||||
Before adopting semantic versioning (pre-1.0), HBase tracked either Hadoop's versions (0.2x) or 0.9x versions. If you are into the arcane, check out our old wiki page on link:https://web.archive.org/web/20150905071342/https://wiki.apache.org/hadoop/Hbase/HBaseVersions[HBase Versioning], which tries to connect the HBase version dots. The sections below cover ONLY the releases before 1.0.
|
||||
|
||||
[[hbase.development.series]]
|
||||
.Odd/Even Versioning or "Development" Series Releases
|
||||
|
|
|
@ -181,7 +181,7 @@ We'll refer to this JAAS configuration file as _$CLIENT_CONF_ below.
|
|||
|
||||
=== HBase-managed ZooKeeper Configuration
|
||||
|
||||
On each node that will run a zookeeper, a master, or a regionserver, create a link:http://docs.oracle.com/javase/1.4.2/docs/guide/security/jgss/tutorials/LoginConfigFile.html[JAAS] configuration file in the conf directory of the node's _HBASE_HOME_ directory that looks like the following:
|
||||
On each node that will run a zookeeper, a master, or a regionserver, create a link:http://docs.oracle.com/javase/7/docs/technotes/guides/security/jgss/tutorials/LoginConfigFile.html[JAAS] configuration file in the conf directory of the node's _HBASE_HOME_ directory that looks like the following:
|
||||
|
||||
[source,java]
|
||||
----
|
||||
|
|