Commit for HBASE-14825 -- corrections of typos, misspellings, and mangled links

Commit for HBASE-14825 -- corrections of typos, misspellings, and mangled links - additional commit for line lengths
Daniel Vimont 2015-11-19 17:05:17 +09:00 committed by Misty Stanley-Jones
parent 8b67df6948
commit 6a493ddff7
33 changed files with 191 additions and 171 deletions


@ -290,7 +290,7 @@ possible configurations would overwhelm and obscure the important.
updates are blocked and flushes are forced. Defaults to 40% of heap (0.4).
Updates are blocked and flushes are forced until size of all memstores
in a region server hits hbase.regionserver.global.memstore.size.lower.limit.
The default value in this configuration has been intentionally left emtpy in order to
The default value in this configuration has been intentionally left empty in order to
honor the old hbase.regionserver.global.memstore.upperLimit property if present.</description>
</property>
<property>
@ -300,7 +300,7 @@ possible configurations would overwhelm and obscure the important.
Defaults to 95% of hbase.regionserver.global.memstore.size (0.95).
A 100% value for this value causes the minimum possible flushing to occur when updates are
blocked due to memstore limiting.
The default value in this configuration has been intentionally left emtpy in order to
The default value in this configuration has been intentionally left empty in order to
honor the old hbase.regionserver.global.memstore.lowerLimit property if present.</description>
</property>
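As a rough illustration of how the upper and lower limits above work together, the sketch below turns the two fractions into byte thresholds. The 8 GB heap and the 0.4/0.95 values are assumptions chosen for the example, not measurements or recommendations.

[source,java]
----
// Illustrative arithmetic only; the heap size below is an assumed figure.
long heapBytes = 8L * 1024 * 1024 * 1024;   // assume an 8 GB RegionServer heap
double upperLimit = 0.4;                    // hbase.regionserver.global.memstore.size
double lowerLimit = 0.95;                   // hbase.regionserver.global.memstore.size.lower.limit

long blockingThreshold = (long) (heapBytes * upperLimit);        // updates are blocked past this point
long flushThreshold = (long) (blockingThreshold * lowerLimit);   // forced flushing begins here

System.out.printf("flushes forced above ~%d MB, updates blocked above ~%d MB%n",
    flushThreshold >> 20, blockingThreshold >> 20);
----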
<property>
@ -356,7 +356,8 @@ possible configurations would overwhelm and obscure the important.
First, this value is used in the ZK client that HBase uses to connect to the ensemble.
It is also used by HBase when it starts a ZK server and it is passed as the 'maxSessionTimeout'. See
http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions.
For example, if a HBase region server connects to a ZK ensemble that's also managed by HBase, then the
For example, if an HBase region server connects to a ZK ensemble that's also managed
by HBase, then the
session timeout will be the one specified by this configuration. But, a region server that connects
to an ensemble managed with a different configuration will be subjected that ensemble's maxSessionTimeout. So,
even though HBase might propose using 90 seconds, the ensemble can have a max timeout lower than this and
@ -368,7 +369,7 @@ possible configurations would overwhelm and obscure the important.
<value>/hbase</value>
<description>Root ZNode for HBase in ZooKeeper. All of HBase's ZooKeeper
files that are configured with a relative path will go under this node.
By default, all of HBase's ZooKeeper file path are configured with a
By default, all of HBase's ZooKeeper file paths are configured with a
relative path, so they will all go under this directory unless changed.</description>
</property>
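When a client has to reach a cluster whose root znode is not the default `/hbase`, the parent can be overridden in the client configuration. A minimal sketch; the `/hbase-secure` value is only an assumed example, not a value taken from this guide.

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

Configuration conf = HBaseConfiguration.create();
// Point the client at a non-default root znode (the value here is a made-up example).
conf.set("zookeeper.znode.parent", "/hbase-secure");
----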
<property>
@ -598,8 +599,8 @@ possible configurations would overwhelm and obscure the important.
<name>hbase.server.versionfile.writeattempts</name>
<value>3</value>
<description>
How many time to retry attempting to write a version file
before just aborting. Each attempt is seperated by the
How many times to retry attempting to write a version file
before just aborting. Each attempt is separated by the
hbase.server.thread.wakefrequency milliseconds.</description>
</property>
<property>
@ -739,7 +740,7 @@ possible configurations would overwhelm and obscure the important.
<description>A StoreFile (or a selection of StoreFiles, when using ExploringCompactionPolicy)
smaller than this size will always be eligible for minor compaction.
HFiles this size or larger are evaluated by hbase.hstore.compaction.ratio to determine if
they are eligible. Because this limit represents the "automatic include"limit for all
they are eligible. Because this limit represents the "automatic include" limit for all
StoreFiles smaller than this value, this value may need to be reduced in write-heavy
environments where many StoreFiles in the 1-2 MB range are being flushed, because every
StoreFile will be targeted for compaction and the resulting StoreFiles may still be under the
@ -808,7 +809,7 @@ possible configurations would overwhelm and obscure the important.
<value>2684354560</value>
<description>There are two different thread pools for compactions, one for large compactions and
the other for small compactions. This helps to keep compaction of lean tables (such as
<systemitem>hbase:meta</systemitem>) fast. If a compaction is larger than this threshold, it
hbase:meta) fast. If a compaction is larger than this threshold, it
goes into the large compaction pool. In most cases, the default value is appropriate. Default:
2 x hbase.hstore.compaction.max x hbase.hregion.memstore.flush.size (which defaults to 128MB).
The value field assumes that the value of hbase.hregion.memstore.flush.size is unchanged from
@ -1111,8 +1112,8 @@ possible configurations would overwhelm and obscure the important.
<description>Set to true to skip the 'hbase.defaults.for.version' check.
Setting this to true can be useful in contexts other than
the other side of a maven generation; i.e. running in an
ide. You'll want to set this boolean to true to avoid
seeing the RuntimException complaint: "hbase-default.xml file
IDE. You'll want to set this boolean to true to avoid
seeing the RuntimeException complaint: "hbase-default.xml file
seems to be for and old version of HBase (\${hbase.version}), this
version is X.X.X-SNAPSHOT"</description>
</property>
@ -1209,7 +1210,7 @@ possible configurations would overwhelm and obscure the important.
<property>
<name>hbase.rootdir.perms</name>
<value>700</value>
<description>FS Permissions for the root directory in a secure(kerberos) setup.
<description>FS Permissions for the root directory in a secure (kerberos) setup.
When master starts, it creates the rootdir with this permissions or sets the permissions
if it does not match.</description>
</property>
@ -1523,7 +1524,7 @@ possible configurations would overwhelm and obscure the important.
<description>
Whether asynchronous WAL replication to the secondary region replicas is enabled or not.
If this is enabled, a replication peer named "region_replica_replication" will be created
which will tail the logs and replicate the mutatations to region replicas for tables that
which will tail the logs and replicate the mutations to region replicas for tables that
have region replication > 1. If this is enabled once, disabling this replication also
requires disabling the replication peer using shell or ReplicationAdmin java class.
Replication to secondary region replicas works over standard inter-cluster replication.


@ -136,7 +136,7 @@
Setting this to true can be useful in contexts other than
the other side of a maven generation; i.e. running in an
ide. You'll want to set this boolean to true to avoid
seeing the RuntimException complaint: "hbase-default.xml file
seeing the RuntimeException complaint: "hbase-default.xml file
seems to be for and old version of HBase (@@@VERSION@@@), this
version is X.X.X-SNAPSHOT"
</description>


@ -144,7 +144,7 @@
Setting this to true can be useful in contexts other than
the other side of a maven generation; i.e. running in an
ide. You'll want to set this boolean to true to avoid
seeing the RuntimException complaint: "hbase-default.xml file
seeing the RuntimeException complaint: "hbase-default.xml file
seems to be for and old version of HBase (@@@VERSION@@@), this
version is X.X.X-SNAPSHOT"
</description>


@ -140,7 +140,7 @@
Setting this to true can be useful in contexts other than
the other side of a maven generation; i.e. running in an
ide. You'll want to set this boolean to true to avoid
seeing the RuntimException complaint: "hbase-default.xml file
seeing the RuntimeException complaint: "hbase-default.xml file
seems to be for and old version of HBase (@@@VERSION@@@), this
version is X.X.X-SNAPSHOT"
</description>


@ -144,7 +144,7 @@
Setting this to true can be useful in contexts other than
the other side of a maven generation; i.e. running in an
ide. You'll want to set this boolean to true to avoid
seeing the RuntimException complaint: "hbase-default.xml file
seeing the RuntimeException complaint: "hbase-default.xml file
seems to be for and old version of HBase (@@@VERSION@@@), this
version is X.X.X-SNAPSHOT"
</description>


@ -144,7 +144,7 @@
Setting this to true can be useful in contexts other than
the other side of a maven generation; i.e. running in an
ide. You'll want to set this boolean to true to avoid
seeing the RuntimException complaint: "hbase-default.xml file
seeing the RuntimeException complaint: "hbase-default.xml file
seems to be for and old version of HBase (@@@VERSION@@@), this
version is X.X.X-SNAPSHOT"
</description>


@ -125,7 +125,7 @@ This directory also stores images used in the HBase Reference Guide.
The website's pages are written in an HTML-like XML dialect called xdoc, which
has a reference guide at
link:http://maven.apache.org/archives/maven-1.x/plugins/xdoc/reference/xdocs.html.
http://maven.apache.org/archives/maven-1.x/plugins/xdoc/reference/xdocs.html.
You can edit these files in a plain-text editor, an IDE, or an XML editor such
as XML Mind XML Editor (XXE) or Oxygen XML Author.
@ -159,7 +159,7 @@ artifacts to the 0.94/ directory of the `asf-site` branch.
The HBase Reference Guide is written in Asciidoc and built using link:http://asciidoctor.org[AsciiDoctor].
The following cheat sheet is included for your reference. More nuanced and comprehensive documentation
is available at link:http://asciidoctor.org/docs/user-manual/.
is available at http://asciidoctor.org/docs/user-manual/.
.AsciiDoc Cheat Sheet
[cols="1,1,a",options="header"]
@ -186,7 +186,8 @@ is available at link:http://asciidoctor.org/docs/user-manual/.
include\::path/to/app.rb[]
----
................
| Include only part of a separate file | Similar to Javadoc | See link:http://asciidoctor.org/docs/user-manual/#by-tagged-regions
| Include only part of a separate file | Similar to Javadoc
| See http://asciidoctor.org/docs/user-manual/#by-tagged-regions
| Filenames, directory names, new terms | italic | \_hbase-default.xml_
| External naked URLs | A link with the URL as link text |
----
@ -285,7 +286,11 @@ Title:: content
Title::
content
----
| Sidebars, quotes, or other blocks of text | a block of text, formatted differently from the default | Delimited using different delimiters, see link:http://asciidoctor.org/docs/user-manual/#built-in-blocks-summary. Some of the examples above use delimiters like \...., ----,====.
| Sidebars, quotes, or other blocks of text
| a block of text, formatted differently from the default
| Delimited using different delimiters,
see http://asciidoctor.org/docs/user-manual/#built-in-blocks-summary.
Some of the examples above use delimiters like \...., ----,====.
........
[example]
====


@ -252,7 +252,8 @@ However, the version is always stored as the last four-byte integer in the file.
|===
| Version 1 | Version 2
| |File info offset (long)
| Data index offset (long)| loadOnOpenOffset (long) /The offset of the sectionthat we need toload when opening the file./
| Data index offset (long)
| loadOnOpenOffset (long) /The offset of the section that we need to load when opening the file./
| | Number of data index entries (int)
| metaIndexOffset (long) /This field is not being used by the version 1 reader, so we removed it from version 2./ | uncompressedDataIndexSize (long) /The total uncompressed size of the whole data block index, including root-level, intermediate-level, and leaf-level blocks./
| | Number of meta index entries (int)
@ -260,7 +261,7 @@ However, the version is always stored as the last four-byte integer in the file.
| numEntries (int) | numEntries (long)
| Compression codec: 0 = LZO, 1 = GZ, 2 = NONE (int) | Compression codec: 0 = LZO, 1 = GZ, 2 = NONE (int)
| | The number of levels in the data block index (int)
| | firstDataBlockOffset (long) /The offset of the first first data block. Used when scanning./
| | firstDataBlockOffset (long) /The offset of the first data block. Used when scanning./
| | lastDataBlockEnd (long) /The offset of the first byte after the last key/value data block. We don't need to go beyond this offset when scanning./
| Version: 1 (int) | Version: 2 (int)
|===


@ -41,7 +41,8 @@ Technically speaking, HBase is really more a "Data Store" than "Data Base" becau
However, HBase has many features which supports both linear and modular scaling.
HBase clusters expand by adding RegionServers that are hosted on commodity class servers.
If a cluster expands from 10 to 20 RegionServers, for example, it doubles both in terms of storage and as well as processing capacity.
RDBMS can scale well, but only up to a point - specifically, the size of a single database server - and for the best performance requires specialized hardware and storage devices.
An RDBMS can scale well, but only up to a point - specifically, the size of a single database
server - and for the best performance requires specialized hardware and storage devices.
HBase features of note are:
* Strongly consistent reads/writes: HBase is not an "eventually consistent" DataStore.
@ -140,7 +141,7 @@ If a region has both an empty start and an empty end key, it is the only region
In the (hopefully unlikely) event that programmatic processing of catalog metadata
is required, see the
+++<a href="http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/Writables.html#getHRegionInfo%28byte[]%29">Writables</a>+++
+++<a href="http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/util/Writables.html#getHRegionInfo%28byte%5B%5D%29">Writables</a>+++
utility.
[[arch.catalog.startup]]
@ -172,7 +173,7 @@ The API changed in HBase 1.0. For connection configuration information, see <<cl
==== API as of HBase 1.0.0
Its been cleaned up and users are returned Interfaces to work against rather than particular types.
It's been cleaned up and users are returned Interfaces to work against rather than particular types.
In HBase 1.0, obtain a `Connection` object from `ConnectionFactory` and thereafter, get from it instances of `Table`, `Admin`, and `RegionLocator` on an as-need basis.
When done, close the obtained instances.
Finally, be sure to cleanup your `Connection` instance before exiting.
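A minimal sketch of that 1.0-style lifecycle; the table name and row key are placeholders, not values from this guide.

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
// Connection is heavyweight and thread-safe: create it once, share it, close it on exit.
try (Connection connection = ConnectionFactory.createConnection(conf);
     Table table = connection.getTable(TableName.valueOf("example_table"))) {
  Result result = table.get(new Get(Bytes.toBytes("example_row")));
  System.out.println("cells returned: " + result.size());
}
----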
@ -295,7 +296,11 @@ scan.setFilter(list);
[[client.filter.cv.scvf]]
==== SingleColumnValueFilter
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html[SingleColumnValueFilter] can be used to test column values for equivalence (`CompareOp.EQUAL`), inequality (`CompareOp.NOT_EQUAL`), or ranges (e.g., `CompareOp.GREATER`). The following is example of testing equivalence a column to a String value "my value"...
A SingleColumnValueFilter (see:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html)
can be used to test column values for equivalence (`CompareOp.EQUAL`),
inequality (`CompareOp.NOT_EQUAL`), or ranges (e.g., `CompareOp.GREATER`). The following is an
example of testing equivalence of a column to a String value "my value"...
[source,java]
----
@ -694,7 +699,8 @@ Here are others that you may have to take into account:
Catalog Tables::
The `-ROOT-` (prior to HBase 0.96, see <<arch.catalog.root,arch.catalog.root>>) and `hbase:meta` tables are forced into the block cache and have the in-memory priority which means that they are harder to evict.
The former never uses more than a few hundreds bytes while the latter can occupy a few MBs (depending on the number of regions).
The former never uses more than a few hundred bytes while the latter can occupy a few MBs
(depending on the number of regions).
HFiles Indexes::
An _HFile_ is the file format that HBase uses to store data in HDFS.
@ -878,7 +884,10 @@ image::region_split_process.png[Region Split Process]
. The Master learns about this znode, since it has a watcher for the parent `region-in-transition` znode.
. The RegionServer creates a sub-directory named `.splits` under the parents `region` directory in HDFS.
. The RegionServer closes the parent region and marks the region as offline in its local data structures. *THE SPLITTING REGION IS NOW OFFLINE.* At this point, client requests coming to the parent region will throw `NotServingRegionException`. The client will retry with some backoff. The closing region is flushed.
. The RegionServer creates region directories under the `.splits` directory, for daughter regions A and B, and creates necessary data structures. Then it splits the store files, in the sense that it creates two link:http://www.google.com/url?q=http%3A%2F%2Fhbase.apache.org%2Fapidocs%2Forg%2Fapache%2Fhadoop%2Fhbase%2Fio%2FReference.html&sa=D&sntz=1&usg=AFQjCNEkCbADZ3CgKHTtGYI8bJVwp663CA[Reference] files per store file in the parent region. Those reference files will point to the parent regions'files.
. The RegionServer creates region directories under the `.splits` directory, for daughter
regions A and B, and creates necessary data structures. Then it splits the store files,
in the sense that it creates two Reference files per store file in the parent region.
Those reference files will point to the parent region's files.
. The RegionServer creates the actual region directory in HDFS, and moves the reference files for each daughter.
. The RegionServer sends a `Put` request to the `.META.` table, to set the parent as offline in the `.META.` table and add information about daughter regions. At this point, there wont be individual entries in `.META.` for the daughters. Clients will see that the parent region is split if they scan `.META.`, but wont know about the daughters until they appear in `.META.`. Also, if this `Put` to `.META`. succeeds, the parent will be effectively split. If the RegionServer fails before this RPC succeeds, Master and the next Region Server opening the region will clean dirty state about the region split. After the `.META.` update, though, the region split will be rolled-forward by Master.
. The RegionServer opens daughters A and B in parallel.
@ -1008,7 +1017,8 @@ If you set the `hbase.hlog.split.skip.errors` option to `true`, errors are treat
* Processing of the WAL will continue
If the `hbase.hlog.split.skip.errors` option is set to `false`, the default, the exception will be propagated and the split will be logged as failed.
See link:https://issues.apache.org/jira/browse/HBASE-2958[HBASE-2958 When hbase.hlog.split.skip.errors is set to false, we fail the split but thats it].
See link:https://issues.apache.org/jira/browse/HBASE-2958[HBASE-2958 When
hbase.hlog.split.skip.errors is set to false, we fail the split but that's it].
We need to do more than just fail split if this flag is set.
====== How EOFExceptions are treated when splitting a crashed RegionServer's WALs
@ -1117,7 +1127,8 @@ Based on the state of the task whose data is changed, the split log manager does
Each RegionServer runs a daemon thread called the _split log worker_, which does the work to split the logs.
The daemon thread starts when the RegionServer starts, and registers itself to watch HBase znodes.
If any splitlog znode children change, it notifies a sleeping worker thread to wake up and grab more tasks.
If if a worker's current task's node data is changed, the worker checks to see if the task has been taken by another worker.
If a worker's current task's node data is changed,
the worker checks to see if the task has been taken by another worker.
If so, the worker thread stops work on the current task.
+
The worker monitors the splitlog znode constantly.
@ -1127,7 +1138,7 @@ At this point, the split log worker scans for another unclaimed task.
+
.How the Split Log Worker Approaches a Task
* It queries the task state and only takes action if the task is in `TASK_UNASSIGNED `state.
* If the task is is in `TASK_UNASSIGNED` state, the worker attempts to set the state to `TASK_OWNED` by itself.
* If the task is in `TASK_UNASSIGNED` state, the worker attempts to set the state to `TASK_OWNED` by itself.
If it fails to set the state, another worker will try to grab it.
The split log manager will also ask all workers to rescan later if the task remains unassigned.
* If the worker succeeds in taking ownership of the task, it tries to get the task state again to make sure it really gets it asynchronously.
@ -1135,7 +1146,7 @@ At this point, the split log worker scans for another unclaimed task.
** Get the HBase root folder, create a temp folder under the root, and split the log file to the temp folder.
** If the split was successful, the task executor sets the task to state `TASK_DONE`.
** If the worker catches an unexpected IOException, the task is set to state `TASK_ERR`.
** If the worker is shutting down, set the the task to state `TASK_RESIGNED`.
** If the worker is shutting down, set the task to state `TASK_RESIGNED`.
** If the task is taken by another worker, just log it.
@ -1326,7 +1337,7 @@ image::region_states.png[]
. Before assigning a region, the master moves the region to `OFFLINE` state automatically if it is in `CLOSED` state.
. When a RegionServer is about to split a region, it notifies the master.
The master moves the region to be split from `OPEN` to `SPLITTING` state and add the two new regions to be created to the RegionServer.
These two regions are in `SPLITING_NEW` state initially.
These two regions are in `SPLITTING_NEW` state initially.
. After notifying the master, the RegionServer starts to split the region.
Once past the point of no return, the RegionServer notifies the master again so the master can update the `hbase:meta` table.
However, the master does not update the region states until it is notified by the server that the split is done.
@ -1404,8 +1415,8 @@ hbase> create 'test', {METHOD => 'table_att', CONFIG => {'SPLIT_POLICY' => 'org.
----
The default split policy can be overwritten using a custom
link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/RegionSplitPolicy.html
[RegionSplitPolicy(HBase 0.94+)]. Typically a custom split policy should extend HBase's default split policy:
link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/RegionSplitPolicy.html[RegionSplitPolicy(HBase 0.94+)].
Typically a custom split policy should extend HBase's default split policy:
link:http://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.html[ConstantSizeRegionSplitPolicy].
The policy can be set globally through the HBaseConfiguration used or on a per table basis:
@ -1972,8 +1983,8 @@ Why?
* 100 -> No, because sum(50, 23, 12, 12) * 1.0 = 97.
* 50 -> No, because sum(23, 12, 12) * 1.0 = 47.
* 23 -> Yes, because sum(12, 12) * 1.0 = 24.
* 12 -> Yes, because the previous file has been included, and because this does not exceed the the max-file limit of 5
* 12 -> Yes, because the previous file had been included, and because this does not exceed the the max-file limit of 5.
* 12 -> Yes, because the previous file has been included, and because this does not exceed the max-file limit of 5
* 12 -> Yes, because the previous file had been included, and because this does not exceed the max-file limit of 5.
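The walk-through above can be restated as a small check: a file qualifies once its size is at most the summed size of the newer files times `hbase.hstore.compaction.ratio`, and everything newer than the first qualifying file is then taken, up to the max-file limit. The sketch below is a paraphrase of that rule for this example's numbers, not the actual HBase policy code.

[source,java]
----
// Paraphrase of the selection walk-through above; not the real compaction policy implementation.
long[] fileSizes = {100, 50, 23, 12, 12};   // oldest to newest, the same numbers as the example
double ratio = 1.0;                          // hbase.hstore.compaction.ratio
int maxFiles = 5;                            // hbase.hstore.compaction.max

int start = 0;
while (start < fileSizes.length) {
  long sumOfNewer = 0;
  for (int j = start + 1; j < fileSizes.length; j++) {
    sumOfNewer += fileSizes[j];
  }
  if (fileSizes[start] <= sumOfNewer * ratio) {
    break;   // this file and every newer file behind it become eligible
  }
  start++;   // too large relative to the newer files; skip it and look further
}

int end = Math.min(fileSizes.length, start + maxFiles);
System.out.println("selected sizes: "
    + java.util.Arrays.toString(java.util.Arrays.copyOfRange(fileSizes, start, end)));
// With the numbers above this prints [23, 12, 12], matching the Yes/No answers listed earlier.
----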
[[compaction.file.selection.example2]]
====== Minor Compaction File Selection - Example #2 (Not Enough Files ToCompact)
@ -2234,7 +2245,7 @@ See link:http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and
[[arch.bulk.load.adv]]
=== Advanced Usage
Although the `importtsv` tool is useful in many cases, advanced users may want to generate data programatically, or import data from other formats.
Although the `importtsv` tool is useful in many cases, advanced users may want to generate data programmatically, or import data from other formats.
To get started doing so, dig into `ImportTsv.java` and check the JavaDoc for HFileOutputFormat.
The import step of the bulk load can also be done programmatically.
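For the programmatic import step mentioned above, one entry point is `LoadIncrementalHFiles`. The sketch below is an assumption about how that can look against a 1.x client: the table name and output directory are placeholders, and the exact `doBulkLoad` signature varies between releases, so check the javadoc for your version.

[source,java]
----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

Configuration conf = HBaseConfiguration.create();
TableName tableName = TableName.valueOf("mytable");            // placeholder table name
Path hfileOutput = new Path("/user/example/bulkload-output");  // placeholder HFileOutputFormat output dir

try (Connection connection = ConnectionFactory.createConnection(conf);
     Admin admin = connection.getAdmin();
     Table table = connection.getTable(tableName);
     RegionLocator locator = connection.getRegionLocator(tableName)) {
  LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
  loader.doBulkLoad(hfileOutput, admin, table, locator);
}
----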
@ -2330,8 +2341,8 @@ In terms of semantics, TIMELINE consistency as implemented by HBase differs from
.Timeline Consistency
image::timeline_consistency.png[Timeline Consistency]
To better understand the TIMELINE semantics, lets look at the above diagram.
Lets say that there are two clients, and the first one writes x=1 at first, then x=2 and x=3 later.
To better understand the TIMELINE semantics, let's look at the above diagram.
Let's say that there are two clients, and the first one writes x=1 at first, then x=2 and x=3 later.
As above, all writes are handled by the primary region replica.
The writes are saved in the write ahead log (WAL), and replicated to the other replicas asynchronously.
In the above diagram, notice that replica_id=1 received 2 updates, and its data shows that x=2, while the replica_id=2 only received a single update, and its data shows that x=1.
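On the client side, a read opts into these semantics per request. A minimal sketch, assuming a `Table` handle named `table` obtained as shown elsewhere in this chapter; the row key is a placeholder.

[source,java]
----
import org.apache.hadoop.hbase.client.Consistency;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

Get get = new Get(Bytes.toBytes("example_row"));   // placeholder row key
get.setConsistency(Consistency.TIMELINE);          // let the read be served by any replica
Result result = table.get(get);
if (result.isStale()) {
  // The value came from a secondary replica and may lag the primary.
}
----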
@ -2367,7 +2378,7 @@ The regions opened in secondary mode will share the same data files with the pri
This feature is delivered in two phases, Phase 1 and 2. The first phase is done in time for HBase-1.0.0 release. Meaning that using HBase-1.0.x, you can use all the features that are marked for Phase 1. Phase 2 is committed in HBase-1.1.0, meaning all HBase versions after 1.1.0 should contain Phase 2 items.
=== Propagating writes to region replicas
As discussed above writes only go to the primary region replica. For propagating the writes from the primary region replica to the secondaries, there are two different mechanisms. For read-only tables, you do not need to use any of the following methods. Disabling and enabling the table should make the data available in all region replicas. For mutable tables, you have to use *only* one of the following mechanisms: storefile refresher, or async wal replication. The latter is recommeded.
As discussed above writes only go to the primary region replica. For propagating the writes from the primary region replica to the secondaries, there are two different mechanisms. For read-only tables, you do not need to use any of the following methods. Disabling and enabling the table should make the data available in all region replicas. For mutable tables, you have to use *only* one of the following mechanisms: storefile refresher, or async wal replication. The latter is recommended.
==== StoreFile Refresher
The first mechanism is store file refresher which is introduced in HBase-1.0+. Store file refresher is a thread per region server, which runs periodically, and does a refresh operation for the store files of the primary region for the secondary region replicas. If enabled, the refresher will ensure that the secondary region replicas see the new flushed, compacted or bulk loaded files from the primary region in a timely manner. However, this means that only flushed data can be read back from the secondary region replicas, and after the refresher is run, making the secondaries lag behind the primary for an a longer time.
@ -2399,7 +2410,7 @@ Currently, Async WAL Replication is not done for the META tables WAL. The met
The secondary region replicas refer to the data files of the primary region replica, but they have their own memstores (in HBase-1.1+) and uses block cache as well. However, one distinction is that the secondary region replicas cannot flush the data when there is memory pressure for their memstores. They can only free up memstore memory when the primary region does a flush and this flush is replicated to the secondary. Since in a region server hosting primary replicas for some regions and secondaries for some others, the secondaries might cause extra flushes to the primary regions in the same host. In extreme situations, there can be no memory left for adding new writes coming from the primary via wal replication. For unblocking this situation (and since secondary cannot flush by itself), the secondary is allowed to do a “store file refresh” by doing a file system list operation to pick up new files from primary, and possibly dropping its memstore. This refresh will only be performed if the memstore size of the biggest secondary region replica is at least `hbase.region.replica.storefile.refresh.memstore.multiplier` (default 4) times bigger than the biggest memstore of a primary replica. One caveat is that if this is performed, the secondary can observe partial row updates across column families (since column families are flushed independently). The default should be good to not do this operation frequently. You can set this value to a large number to disable this feature if desired, but be warned that it might cause the replication to block forever.
=== Secondary replica failover
When a secondary region replica first comes online, or fails over, it may have served some edits from its memstore. Since the recovery is handled differently for secondary replicas, the secondary has to ensure that it does not go back in time before it starts serving requests after assignment. For doing that, the secondary waits until it observes a full flush cycle (start flush, commit flush) or a “region open event” replicated from the primary. Until this happens, the secondary region replica will reject all read requests by throwing an IOException with message “The region's reads are disabled”. However, the other replicas will probably still be available to read, thus not causing any impact for the rpc with TIMELINE consistency. To facilitate faster recovery, the secondary region will trigger a flush request from the primary when it is opened. The configuration property `hbase.region.replica.wait.for.primary.flush` (enabled by default) can be used to disable this feature if needed.
When a secondary region replica first comes online, or fails over, it may have served some edits from its memstore. Since the recovery is handled differently for secondary replicas, the secondary has to ensure that it does not go back in time before it starts serving requests after assignment. For doing that, the secondary waits until it observes a full flush cycle (start flush, commit flush) or a “region open event” replicated from the primary. Until this happens, the secondary region replica will reject all read requests by throwing an IOException with message “The region's reads are disabled”. However, the other replicas will probably still be available to read, thus not causing any impact for the rpc with TIMELINE consistency. To facilitate faster recovery, the secondary region will trigger a flush request from the primary when it is opened. The configuration property `hbase.region.replica.wait.for.primary.flush` (enabled by default) can be used to disable this feature if needed.
@ -2435,7 +2446,7 @@ Instead you can change the number of region replicas per table to increase or de
<name>hbase.region.replica.replication.enabled</name>
<value>true</value>
<description>
Whether asynchronous WAL replication to the secondary region replicas is enabled or not. If this is enabled, a replication peer named "region_replica_replication" will be created which will tail the logs and replicate the mutatations to region replicas for tables that have region replication > 1. If this is enabled once, disabling this replication also requires disabling the replication peer using shell or ReplicationAdmin java class. Replication to secondary region replicas works over standard inter-cluster replication. So replication, if disabled explicitly, also has to be enabled by setting "hbase.replication"· to true for this feature to work.
Whether asynchronous WAL replication to the secondary region replicas is enabled or not. If this is enabled, a replication peer named "region_replica_replication" will be created which will tail the logs and replicate the mutations to region replicas for tables that have region replication > 1. If this is enabled once, disabling this replication also requires disabling the replication peer using shell or ReplicationAdmin java class. Replication to secondary region replicas works over standard inter-cluster replication. So replication, if disabled explicitly, also has to be enabled by setting "hbase.replication"· to true for this feature to work.
</description>
</property>
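The replica count itself is a table attribute. A hedged sketch of setting it from the Java API at table-creation time; the table and family names are placeholders, and an `Admin` obtained from an open `Connection` is assumed.

[source,java]
----
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;

HTableDescriptor htd = new HTableDescriptor(TableName.valueOf("example_table"));
htd.addFamily(new HColumnDescriptor("cf"));
htd.setRegionReplication(2);   // one primary plus one secondary replica per region
admin.createTable(htd);        // admin is assumed to come from an open Connection
----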
<property>
@ -2603,7 +2614,7 @@ hbase> scan 't1', {CONSISTENCY => 'TIMELINE'}
==== Java
You can set set the consistency for Gets and Scans and do requests as follows.
You can set the consistency for Gets and Scans and do requests as follows.
[source,java]
----


@ -55,7 +55,7 @@ These jobs were consistently found to be waiting on map and reduce tasks assigne
.Datanodes:
* Two 12-core processors
* Six Enerprise SATA disks
* Six Enterprise SATA disks
* 24GB of RAM
* Two bonded gigabit NICs


@ -56,7 +56,7 @@ If owners are absent -- busy or otherwise -- two +1s by non-owners will suffice.
Patches that span components need at least two +1s before they can be committed, preferably +1s by owners of components touched by the x-component patch (TODO: This needs tightening up but I think fine for first pass).
Any -1 on a patch by anyone vetos a patch; it cannot be committed until the justification for the -1 is addressed.
Any -1 on a patch by anyone vetoes a patch; it cannot be committed until the justification for the -1 is addressed.
[[hbase.fix.version.in.jira]]
.How to set fix version in JIRA on issue resolve


@ -151,7 +151,7 @@ If you see the following in your HBase logs, you know that HBase was unable to l
----
If the libraries loaded successfully, the WARN message does not show.
Lets presume your Hadoop shipped with a native library that suits the platform you are running HBase on.
Let's presume your Hadoop shipped with a native library that suits the platform you are running HBase on.
To check if the Hadoop native library is available to HBase, run the following tool (available in Hadoop 2.1 and greater):
[source]
----
@ -170,7 +170,7 @@ Above shows that the native hadoop library is not available in HBase context.
To fix the above, either copy the Hadoop native libraries local or symlink to them if the Hadoop and HBase stalls are adjacent in the filesystem.
You could also point at their location by setting the `LD_LIBRARY_PATH` environment variable.
Where the JVM looks to find native librarys is "system dependent" (See `java.lang.System#loadLibrary(name)`). On linux, by default, is going to look in _lib/native/PLATFORM_ where `PLATFORM` is the label for the platform your HBase is installed on.
Where the JVM looks to find native libraries is "system dependent" (See `java.lang.System#loadLibrary(name)`). On linux, by default, is going to look in _lib/native/PLATFORM_ where `PLATFORM` is the label for the platform your HBase is installed on.
On a local linux machine, it seems to be the concatenation of the java properties `os.name` and `os.arch` followed by whether 32 or 64 bit.
HBase on startup prints out all of the java system properties so find the os.name and os.arch in the log.
For example:


@ -162,7 +162,7 @@ For example, assuming that a schema had 3 ColumnFamilies per region with an aver
+
Another related setting is the number of processes a user is allowed to run at once. In Linux and Unix, the number of processes is set using the `ulimit -u` command. This should not be confused with the `nproc` command, which controls the number of CPUs available to a given user. Under load, a `ulimit -u` that is too low can cause OutOfMemoryError exceptions. See Jack Levin's major HDFS issues thread on the hbase-users mailing list, from 2011.
+
Configuring the maximum number of file descriptors and processes for the user who is running the HBase process is an operating system configuration, rather than an HBase configuration. It is also important to be sure that the settings are changed for the user that actually runs HBase. To see which user started HBase, and that user's ulimit configuration, look at the first line of the HBase log for that instance. A useful read setting config on you hadoop cluster is Aaron Kimballs' Configuration Parameters: What can you just ignore?
Configuring the maximum number of file descriptors and processes for the user who is running the HBase process is an operating system configuration, rather than an HBase configuration. It is also important to be sure that the settings are changed for the user that actually runs HBase. To see which user started HBase, and that user's ulimit configuration, look at the first line of the HBase log for that instance. A useful read setting config on your hadoop cluster is Aaron Kimball's Configuration Parameters: What can you just ignore?
+
.`ulimit` Settings on Ubuntu
====
@ -410,7 +410,7 @@ Zookeeper binds to a well known port so clients may talk to HBase.
=== Distributed
Distributed mode can be subdivided into distributed but all daemons run on a single node -- a.k.a _pseudo-distributed_ -- and _fully-distributed_ where the daemons are spread across all nodes in the cluster.
Distributed mode can be subdivided into distributed but all daemons run on a single node -- a.k.a. _pseudo-distributed_ -- and _fully-distributed_ where the daemons are spread across all nodes in the cluster.
The _pseudo-distributed_ vs. _fully-distributed_ nomenclature comes from Hadoop.
Pseudo-distributed mode can run against the local filesystem or it can run against an instance of the _Hadoop Distributed File System_ (HDFS). Fully-distributed mode can ONLY run on HDFS.
@ -540,7 +540,7 @@ HBase logs can be found in the _logs_ subdirectory.
Check them out especially if HBase had trouble starting.
HBase also puts up a UI listing vital attributes.
By default it's deployed on the Master host at port 16010 (HBase RegionServers listen on port 16020 by default and put up an informational HTTP server at port 16030). If the Master is running on a host named `master.example.org` on the default port, point your browser at _http://master.example.org:16010_ to see the web interface.
By default it's deployed on the Master host at port 16010 (HBase RegionServers listen on port 16020 by default and put up an informational HTTP server at port 16030). If the Master is running on a host named `master.example.org` on the default port, point your browser at pass:[http://master.example.org:16010] to see the web interface.
Prior to HBase 0.98 the master UI was deployed on port 60010, and the HBase RegionServers UI on port 60030.
@ -604,7 +604,7 @@ ZooKeeper is where all these values are kept.
Thus clients require the location of the ZooKeeper ensemble before they can do anything else.
Usually this the ensemble location is kept out in the _hbase-site.xml_ and is picked up by the client from the `CLASSPATH`.
If you are configuring an IDE to run a HBase client, you should include the _conf/_ directory on your classpath so _hbase-site.xml_ settings can be found (or add _src/test/resources_ to pick up the hbase-site.xml used by tests).
If you are configuring an IDE to run an HBase client, you should include the _conf/_ directory on your classpath so _hbase-site.xml_ settings can be found (or add _src/test/resources_ to pick up the hbase-site.xml used by tests).
Minimally, a client of HBase needs several libraries in its `CLASSPATH` when connecting to a cluster, including:
[source]
@ -917,7 +917,7 @@ See <<master.processes.loadbalancer,master.processes.loadbalancer>> for more inf
==== Disabling Blockcache
Do not turn off block cache (You'd do it by setting `hbase.block.cache.size` to zero). Currently we do not do well if you do this because the RegionServer will spend all its time loading HFile indices over and over again.
If your working set it such that block cache does you no good, at least size the block cache such that HFile indices will stay up in the cache (you can get a rough idea on the size you need by surveying RegionServer UIs; you'll see index block size accounted near the top of the webpage).
If your working set is such that block cache does you no good, at least size the block cache such that HFile indices will stay up in the cache (you can get a rough idea on the size you need by surveying RegionServer UIs; you'll see index block size accounted near the top of the webpage).
[[nagles]]
==== link:http://en.wikipedia.org/wiki/Nagle's_algorithm[Nagle's] or the small package problem
@ -930,7 +930,7 @@ You might also see the graphs on the tail of link:https://issues.apache.org/jira
==== Better Mean Time to Recover (MTTR)
This section is about configurations that will make servers come back faster after a fail.
See the Deveraj Das an Nicolas Liochon blog post link:http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/[Introduction to HBase Mean Time to Recover (MTTR)] for a brief introduction.
See the Deveraj Das and Nicolas Liochon blog post link:http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/[Introduction to HBase Mean Time to Recover (MTTR)] for a brief introduction.
The issue link:https://issues.apache.org/jira/browse/HBASE-8389[HBASE-8354 forces Namenode into loop with lease recovery requests] is messy but has a bunch of good discussion toward the end on low timeouts and how to effect faster recovery including citation of fixes added to HDFS. Read the Varun Sharma comments.
The below suggested configurations are Varun's suggestions distilled and tested.
@ -1087,7 +1087,7 @@ NOTE: To enable the HBase JMX implementation on Master, you also need to add bel
[source,xml]
----
<property>
<ame>hbase.coprocessor.master.classes</name>
<name>hbase.coprocessor.master.classes</name>
<value>org.apache.hadoop.hbase.JMXListener</value>
</property>
----
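Once the listener is loaded, any standard JMX client can attach. A minimal sketch using the stock `javax.management` API; the host and port are placeholders for whatever connector port your JMXListener is configured with.

[source,java]
----
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Host and port are placeholders; substitute your Master's JMX endpoint.
JMXServiceURL url =
    new JMXServiceURL("service:jmx:rmi:///jndi/rmi://master.example.org:10101/jmxrmi");
JMXConnector connector = JMXConnectorFactory.connect(url);
try {
  MBeanServerConnection mbsc = connector.getMBeanServerConnection();
  System.out.println("MBean count: " + mbsc.getMBeanCount());
} finally {
  connector.close();
}
----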


@ -101,7 +101,7 @@ link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/Cop
. Load the Coprocessor: Currently there are two ways to load the Coprocessor. +
Static:: Loading from configuration
Dynammic:: Loading via 'hbase shell' or via Java code using HTableDescriptor class). +
Dynamic:: Loading via 'hbase shell' or via Java code using HTableDescriptor class). +
For more details see <<cp_loading,Loading Coprocessors>>.
. Finally your client-side code to call the Coprocessor. This is the easiest step, as HBase
@ -239,10 +239,10 @@ link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/HTable.h
From version 0.96, implementing Endpoint Coprocessor is not straight forward. Now it is done with
the help of Google's Protocol Buffer. For more details on Protocol Buffer, please see
link:https://developers.google.com/protocol-buffers/docs/proto[Protocol Buffer Guide].
Endpoints Coprocessor written in version 0.94 are not compatible with with version 0.96 or later
Endpoints Coprocessor written in version 0.94 are not compatible with version 0.96 or later
(for more details, see
link:https://issues.apache.org/jira/browse/HBASE-5448[HBASE-5448]),
so if your are upgrading your HBase cluster from version 0.94 (or before) to 0.96 (or later) you
so if you are upgrading your HBase cluster from version 0.94 (or before) to 0.96 (or later) you
have to rewrite your Endpoint coprocessor.
For example see <<cp_example,Examples>>
@ -252,7 +252,7 @@ For example see <<cp_example,Examples>>
== Loading Coprocessors
_Loading of Coprocessor refers to the process of making your custom Coprocessor implementation
available to the the HBase, so that when a requests comes in or an event takes place the desired
available to HBase, so that when a request comes in or an event takes place the desired
functionality implemented in your custom code gets executed. +
Coprocessor can be loaded broadly in two ways. One is static (loading through configuration files)
and the other one is dynamic loading (using hbase shell or java code).
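As a hedged illustration of the dynamic path, one way it can look from Java; the coprocessor class name, jar location, and table name are placeholders, and an `Admin` from an open `Connection` is assumed.

[source,java]
----
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Coprocessor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;

TableName users = TableName.valueOf("users");
admin.disableTable(users);
HTableDescriptor htd = admin.getTableDescriptor(users);
// The class name and jar location below are placeholders for your own coprocessor.
htd.addCoprocessor("org.example.coprocessor.SumEndPoint",
    new Path("hdfs:///user/hbase/coprocessor.jar"), Coprocessor.PRIORITY_USER, null);
admin.modifyTable(users, htd);
admin.enableTable(users);
----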
@ -271,10 +271,10 @@ sub elements <name> and <value> respectively.
... 'hbase.coprocessor.region.classes' for RegionObservers and Endpoints.
... 'hbase.coprocessor.wal.classes' for WALObservers.
... 'hbase.coprocessor.master.classes' for MasterObservers.
.. <value> must contain the fully qualified class name of your class implmenting the Coprocessor.
.. <value> must contain the fully qualified class name of your class implementing the Coprocessor.
+
For example to load a Coprocessor (implemented in class SumEndPoint.java) you have to create
following entry in RegionServer's 'hbase-site.xml' file (generally located under 'conf' directiory):
following entry in RegionServer's 'hbase-site.xml' file (generally located under 'conf' directory):
+
[source,xml]
----
@ -297,7 +297,7 @@ When calling out to registered observers, the framework executes their callbacks
sorted order of their priority. +
Ties are broken arbitrarily.
. Put your code on classpth of HBase: There are various ways to do so, like adding jars on
. Put your code on classpath of HBase: There are various ways to do so, like adding jars on
classpath etc. One easy way to do this is to drop the jar (containing you code and all the
dependencies) in 'lib' folder of the HBase installation.
@ -455,7 +455,7 @@ hbase(main):003:0> alter 'users', METHOD => 'table_att_unset',
hbase(main):004:0* NAME => 'coprocessor$1'
----
. Using HtableDescriptor: Simply reload the table definition _without_ setting the value of
. Using HTableDescriptor: Simply reload the table definition _without_ setting the value of
Coprocessor either in setValue() or addCoprocessor() methods. This will remove the Coprocessor
attached to this table, if any. For example:
+
@ -624,12 +624,12 @@ hadoop fs -copyFromLocal coprocessor.jar coprocessor.jar
[source,java]
----
Configuration conf = HBaseConfiguration.create();
// Use below code for HBase verion 1.x.x or above.
// Use below code for HBase version 1.x.x or above.
Connection connection = ConnectionFactory.createConnection(conf);
TableName tableName = TableName.valueOf("users");
Table table = connection.getTable(tableName);
//Use below code HBase verion 0.98.xx or below.
//Use below code HBase version 0.98.xx or below.
//HConnection connection = HConnectionManager.createConnection(conf);
//HTableInterface table = connection.getTable("users");
@ -789,12 +789,12 @@ following code as shown below:
----
Configuration conf = HBaseConfiguration.create();
// Use below code for HBase verion 1.x.x or above.
// Use below code for HBase version 1.x.x or above.
Connection connection = ConnectionFactory.createConnection(conf);
TableName tableName = TableName.valueOf("users");
Table table = connection.getTable(tableName);
//Use below code HBase verion 0.98.xx or below.
//Use below code HBase version 0.98.xx or below.
//HConnection connection = HConnectionManager.createConnection(conf);
//HTableInterface table = connection.getTable("users");


@ -171,7 +171,7 @@ For more information about the internals of how Apache HBase stores data, see <<
A namespace is a logical grouping of tables analogous to a database in relation database systems.
This abstraction lays the groundwork for upcoming multi-tenancy related features:
* Quota Management (link:https://issues.apache.org/jira/browse/HBASE-8410[HBASE-8410]) - Restrict the amount of resources (ie regions, tables) a namespace can consume.
* Quota Management (link:https://issues.apache.org/jira/browse/HBASE-8410[HBASE-8410]) - Restrict the amount of resources (i.e. regions, tables) a namespace can consume.
* Namespace Security Administration (link:https://issues.apache.org/jira/browse/HBASE-9206[HBASE-9206]) - Provide another level of security administration for tenants.
* Region server groups (link:https://issues.apache.org/jira/browse/HBASE-6721[HBASE-6721]) - A namespace/table can be pinned onto a subset of RegionServers thus guaranteeing a course level of isolation.
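A short, hedged sketch of working with a namespace from the Java Admin API; the namespace and table names are placeholders, and an `Admin` from an open `Connection` is assumed.

[source,java]
----
import org.apache.hadoop.hbase.NamespaceDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;

// Create a namespace, then address a table inside it with the "namespace:table" form.
admin.createNamespace(NamespaceDescriptor.create("example_ns").build());
TableName qualified = TableName.valueOf("example_ns:example_table");
----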
@ -257,7 +257,7 @@ For example, the columns _courses:history_ and _courses:math_ are both members o
The colon character (`:`) delimits the column family from the column family qualifier.
The column family prefix must be composed of _printable_ characters.
The qualifying tail, the column family _qualifier_, can be made of any arbitrary bytes.
Column families must be declared up front at schema definition time whereas columns do not need to be defined at schema time but can be conjured on the fly while the table is up an running.
Column families must be declared up front at schema definition time whereas columns do not need to be defined at schema time but can be conjured on the fly while the table is up and running.
Physically, all column family members are stored together on the filesystem.
Because tunings and storage specifications are done at the column family level, it is advised that all column family members have the same general access pattern and size characteristics.
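In client code, the up-front family versus on-the-fly qualifier split looks roughly like this. The `courses` family mirrors the example in this section; the table name, row key, and the assumed `admin` and `table` handles are placeholders.

[source,java]
----
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// The column family must exist in the table descriptor before the table is created...
HTableDescriptor htd = new HTableDescriptor(TableName.valueOf("example_table"));
htd.addFamily(new HColumnDescriptor("courses"));
admin.createTable(htd);   // admin is assumed to come from an open Connection

// ...but qualifiers under that family can be used on the fly, with no schema change.
Put put = new Put(Bytes.toBytes("example_row"));
put.addColumn(Bytes.toBytes("courses"), Bytes.toBytes("history"), Bytes.toBytes("A"));
table.put(put);           // table is an assumed handle for "example_table"
----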
@ -279,7 +279,7 @@ Gets are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hba
=== Put
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] either adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put(org.apache.hadoop.hbase.client.Put)[Table.put] (writeBuffer) or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch(java.util.List, java.lang.Object[])[Table.batch] (non-writeBuffer).
link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] either adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put(org.apache.hadoop.hbase.client.Put)[Table.put] (writeBuffer) or link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch(java.util.List,%20java.lang.Object%5B%5D)[Table.batch] (non-writeBuffer).
[[scan]]
=== Scans


@ -90,7 +90,7 @@ We used to be on SVN.
We migrated.
See link:https://issues.apache.org/jira/browse/INFRA-7768[Migrate Apache HBase SVN Repos to Git].
See link:http://hbase.apache.org/source-repository.html[Source Code
Management] page for contributor and committer links or seach for HBase on the link:http://git.apache.org/[Apache Git] page.
Management] page for contributor and committer links or search for HBase on the link:http://git.apache.org/[Apache Git] page.
== IDEs
@ -133,7 +133,7 @@ If you cloned the project via git, download and install the Git plugin (EGit). A
==== HBase Project Setup in Eclipse using `m2eclipse`
The easiest way is to use the +m2eclipse+ plugin for Eclipse.
Eclipse Indigo or newer includes +m2eclipse+, or you can download it from link:http://www.eclipse.org/m2e//. It provides Maven integration for Eclipse, and even lets you use the direct Maven commands from within Eclipse to compile and test your project.
Eclipse Indigo or newer includes +m2eclipse+, or you can download it from http://www.eclipse.org/m2e/. It provides Maven integration for Eclipse, and even lets you use the direct Maven commands from within Eclipse to compile and test your project.
To import the project, click and select the HBase root directory. `m2eclipse` locates all the hbase modules for you.
@ -146,7 +146,7 @@ If you install +m2eclipse+ and import HBase in your workspace, do the following
----
Failed to execute goal
org.apache.maven.plugins:maven-antrun-plugin:1.6:run (default) on project hbase:
'An Ant BuildException has occured: Replace: source file .../target/classes/hbase-default.xml
'An Ant BuildException has occurred: Replace: source file .../target/classes/hbase-default.xml
doesn't exist
----
+
@ -213,7 +213,7 @@ For additional information on setting up Eclipse for HBase development on Window
=== IntelliJ IDEA
You can set up IntelliJ IDEA for similar functinoality as Eclipse.
You can set up IntelliJ IDEA for similar functionality as Eclipse.
Follow these steps.
. Select
@ -227,7 +227,7 @@ Using the Eclipse Code Formatter plugin for IntelliJ IDEA, you can import the HB
=== Other IDEs
It would be userful to mirror the <<eclipse,eclipse>> set-up instructions for other IDEs.
It would be useful to mirror the <<eclipse,eclipse>> set-up instructions for other IDEs.
If you would like to assist, please have a look at link:https://issues.apache.org/jira/browse/HBASE-11704[HBASE-11704].
[[build]]
@ -331,13 +331,13 @@ Tests may not all pass so you may need to pass `-DskipTests` unless you are incl
====
You will see ERRORs like the above title if you pass the _default_ profile; e.g.
if you pass +hadoop.profile=1.1+ when building 0.96 or +hadoop.profile=2.0+ when building hadoop 0.98; just drop the hadoop.profile stipulation in this case to get your build to run again.
This seems to be a maven pecularity that is probably fixable but we've not spent the time trying to figure it.
This seems to be a maven peculiarity that is probably fixable but we've not spent the time trying to figure it.
====
Similarly, for 3.0, you would just replace the profile value.
Note that Hadoop-3.0.0-SNAPSHOT does not currently have a deployed maven artificat - you will need to build and install your own in your local maven repository if you want to run against this profile.
Note that Hadoop-3.0.0-SNAPSHOT does not currently have a deployed maven artifact - you will need to build and install your own in your local maven repository if you want to run against this profile.
In earilier versions of Apache HBase, you can build against older versions of Apache Hadoop, notably, Hadoop 0.22.x and 0.23.x.
In earlier versions of Apache HBase, you can build against older versions of Apache Hadoop, notably, Hadoop 0.22.x and 0.23.x.
If you are running, for example HBase-0.94 and wanted to build against Hadoop 0.23.x, you would run with:
[source,bourne]
@ -415,7 +415,7 @@ mvn -DskipTests package assembly:single deploy
==== Build Gotchas
If you see `Unable to find resource 'VM_global_library.vm'`, ignore it.
Its not an error.
It's not an error.
It is link:http://jira.codehaus.org/browse/MSITE-286[officially
ugly] though.
@ -504,7 +504,7 @@ For building earlier versions, the process is different.
See this section under the respective release documentation folders.
.Point Releases
If you are making a point release (for example to quickly address a critical incompatability or security problem) off of a release branch instead of a development branch, the tagging instructions are slightly different.
If you are making a point release (for example to quickly address a critical incompatibility or security problem) off of a release branch instead of a development branch, the tagging instructions are slightly different.
I'll prefix those special steps with _Point Release Only_.
.Before You Begin
@ -516,7 +516,7 @@ You should also have tried recent branch tips out on a cluster under load, perha
[NOTE]
====
At this point you should tag the previous release branch (ex: 0.96.1) with the new point release tag (e.g.
0.96.1.1 tag). Any commits with changes for the point release should be appled to the new tag.
0.96.1.1 tag). Any commits with changes for the point release should be applied to the new tag.
====
The Hadoop link:http://wiki.apache.org/hadoop/HowToRelease[How To
@ -584,8 +584,8 @@ $ mvn clean install -DskipTests assembly:single -Dassembly.file=hbase-assembly/s
Extract the tarball and make sure it looks good.
A good test for the src tarball being 'complete' is to see if you can build new tarballs from this source bundle.
If the source tarball is good, save it off to a _version directory_, a directory somewhere where you are collecting all of the tarballs you will publish as part of the release candidate.
For example if you were building a hbase-0.96.0 release candidate, you might call the directory _hbase-0.96.0RC0_.
Later you will publish this directory as our release candidate up on http://people.apache.org/~YOU.
For example if you were building an hbase-0.96.0 release candidate, you might call the directory _hbase-0.96.0RC0_.
Later you will publish this directory as our release candidate up on pass:[http://people.apache.org/~YOU].
. Build the binary tarball.
+
@ -1146,7 +1146,7 @@ However, maven will do this for us; just use: +mvn
This is very similar to how you specify running a subset of unit tests (see above), but use the property `it.test` instead of `test`.
To just run `IntegrationTestClassXYZ.java`, use: +mvn
failsafe:integration-test -Dit.test=IntegrationTestClassXYZ+ The next thing you might want to do is run groups of integration tests, say all integration tests that are named IntegrationTestClassX*.java: +mvn failsafe:integration-test -Dit.test=*ClassX*+ This runs everything that is an integration test that matches *ClassX*. This means anything matching: "**/IntegrationTest*ClassX*". You can also run multiple groups of integration tests using comma-delimited lists (similar to unit tests). Using a list of matches still supports full regex matching for each of the groups.This would look something like: +mvn
failsafe:integration-test -Dit.test=IntegrationTestClassXYZ+ The next thing you might want to do is run groups of integration tests, say all integration tests that are named IntegrationTestClassX*.java: +mvn failsafe:integration-test -Dit.test=*ClassX*+ This runs everything that is an integration test that matches *ClassX*. This means anything matching: "**/IntegrationTest*ClassX*". You can also run multiple groups of integration tests using comma-delimited lists (similar to unit tests). Using a list of matches still supports full regex matching for each of the groups. This would look something like: +mvn
failsafe:integration-test -Dit.test=*ClassX*, *ClassY+
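Collected into one runnable block for convenience (the class names are the same placeholders used above, not real test classes):

[source,bash]
----
# run a single integration test class
mvn failsafe:integration-test -Dit.test=IntegrationTestClassXYZ
# run every integration test whose name matches a pattern
mvn failsafe:integration-test -Dit.test=*ClassX*
# run several comma-delimited groups; each group still supports full regex matching
mvn failsafe:integration-test -Dit.test=*ClassX*,*ClassY
----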
[[maven.build.commands.integration.tests.distributed]]
@ -1183,8 +1183,9 @@ For other deployment options, a ClusterManager can be implemented and plugged in
[[maven.build.commands.integration.tests.destructive]]
==== Destructive integration / system tests (ChaosMonkey)
HBase 0.96 introduced a tool named `ChaosMonkey`, modeled after link:http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
[same-named tool by Netflix's Chaos Monkey tool]. ChaosMonkey simulates real-world
HBase 0.96 introduced a tool named `ChaosMonkey`, modeled after
link:http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html[the same-named tool by Netflix].
ChaosMonkey simulates real-world
faults in a running cluster by killing or disconnecting random servers, or injecting
other failures into the environment. You can use ChaosMonkey as a stand-alone tool
to run a policy while other tests are running. In some environments, ChaosMonkey is
@ -1262,8 +1263,8 @@ HBase ships with several ChaosMonkey policies, available in the
[[chaos.monkey.properties]]
==== Configuring Individual ChaosMonkey Actions
Since HBase version 1.0.0 (link:https://issues.apache.org/jira/browse/HBASE-11348
[HBASE-11348]), ChaosMonkey integration tests can be configured per test run.
Since HBase version 1.0.0 (link:https://issues.apache.org/jira/browse/HBASE-11348[HBASE-11348]),
ChaosMonkey integration tests can be configured per test run.
Create a Java properties file in the HBase classpath and pass it to ChaosMonkey using
the `-monkeyProps` configuration flag. Configurable properties, along with their default
values if applicable, are listed in the `org.apache.hadoop.hbase.chaos.factories.MonkeyConstants`
@ -1604,7 +1605,7 @@ All are subject to challenge of course but until then, please hold to the rules
ZooKeeper state should be transient (treat it like memory). If ZooKeeper state is deleted, hbase should be able to recover and essentially be in the same state.
* .ExceptionsThere are currently a few exceptions that we need to fix around whether a table is enabled or disabled.
* .Exceptions: There are currently a few exceptions that we need to fix around whether a table is enabled or disabled.
* Replication data is currently stored only in ZooKeeper.
Deleting ZooKeeper data related to replication may cause replication to be disabled.
Do not delete the replication tree, _/hbase/replication/_.
@ -1866,9 +1867,9 @@ If the push fails for any reason, fix the problem or ask for help.
Do not do a +git push --force+.
+
Before you can commit a patch, you need to determine how the patch was created.
The instructions and preferences around the way to create patches have changed, and there will be a transition periond.
The instructions and preferences around the way to create patches have changed, and there will be a transition period.
+
* .Determine How a Patch Was CreatedIf the first few lines of the patch look like the headers of an email, with a From, Date, and Subject, it was created using +git format-patch+.
* .Determine How a Patch Was Created: If the first few lines of the patch look like the headers of an email, with a From, Date, and Subject, it was created using +git format-patch+.
This is the preference, because you can reuse the submitter's commit message.
If the commit message is not appropriate, you can still use the commit, then run the command +git
rebase -i origin/master+, and squash and reword as appropriate.
@ -1971,7 +1972,7 @@ When the amending author is different from the original committer, add notice of
from master to branch].
[[committer.tests]]
====== Committers are responsible for making sure commits do not break thebuild or tests
====== Committers are responsible for making sure commits do not break the build or tests
If a committer commits a patch, it is their responsibility to make sure it passes the test suite.
It is helpful if contributors keep an eye out that their patch does not break the hbase build and/or tests, but ultimately, a contributor cannot be expected to be aware of all the particular vagaries and interconnections that occur in a project like HBase.
@ -77,7 +77,7 @@ of the <<security>> chapter.
=== Using REST Endpoints
The following examples use the placeholder server `http://example.com:8000`, and
The following examples use the placeholder server pass:[http://example.com:8000], and
the following commands can all be run using `curl` or `wget` commands. You can request
plain text (the default), XML, or JSON output by adding no header for plain text,
or the header "Accept: text/xml" for XML or "Accept: application/json" for JSON.
@ -46,7 +46,7 @@ What is the history of HBase?::
=== Upgrading
How do I upgrade Maven-managed projects from HBase 0.94 to HBase 0.96+?::
In HBase 0.96, the project moved to a modular structure. Adjust your project's dependencies to rely upon the `hbase-client` module or another module as appropriate, rather than a single JAR. You can model your Maven depency after one of the following, depending on your targeted version of HBase. See Section 3.5, “Upgrading from 0.94.x to 0.96.x” or Section 3.3, “Upgrading from 0.96.x to 0.98.x” for more information.
In HBase 0.96, the project moved to a modular structure. Adjust your project's dependencies to rely upon the `hbase-client` module or another module as appropriate, rather than a single JAR. You can model your Maven dependency after one of the following, depending on your targeted version of HBase. See Section 3.5, “Upgrading from 0.94.x to 0.96.x” or Section 3.3, “Upgrading from 0.96.x to 0.98.x” for more information.
+
.Maven Dependency for HBase 0.98
[source,xml]
@ -497,7 +497,8 @@ ZooKeeper session timeout in milliseconds. It is used in two different ways.
First, this value is used in the ZK client that HBase uses to connect to the ensemble.
It is also used by HBase when it starts a ZK server and it is passed as the 'maxSessionTimeout'. See
http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions.
For example, if a HBase region server connects to a ZK ensemble that's also managed by HBase, then the
For example, if an HBase region server connects to a ZK ensemble that's also managed
by HBase, then the
session timeout will be the one specified by this configuration. But, a region server that connects
to an ensemble managed with a different configuration will be subjected that ensemble's maxSessionTimeout. So,
even though HBase might propose using 90 seconds, the ensemble can have a max timeout lower than this and
@ -844,7 +845,7 @@ Time to sleep in between searches for work (in milliseconds).
.Description
How many time to retry attempting to write a version file
before just aborting. Each attempt is seperated by the
before just aborting. Each attempt is separated by the
hbase.server.thread.wakefrequency milliseconds.
+
.Default
@ -1578,7 +1579,7 @@ Set to true to skip the 'hbase.defaults.for.version' check.
Setting this to true can be useful in contexts other than
the other side of a maven generation; i.e. running in an
ide. You'll want to set this boolean to true to avoid
seeing the RuntimException complaint: "hbase-default.xml file
seeing the RuntimeException complaint: "hbase-default.xml file
seems to be for and old version of HBase (\${hbase.version}), this
version is X.X.X-SNAPSHOT"
+
@ -2139,7 +2140,7 @@ Fully qualified name of class implementing coordinated state manager.
Whether asynchronous WAL replication to the secondary region replicas is enabled or not.
If this is enabled, a replication peer named "region_replica_replication" will be created
which will tail the logs and replicate the mutatations to region replicas for tables that
which will tail the logs and replicate the mutations to region replicas for tables that
have region replication > 1. If this is enabled once, disabling this replication also
requires disabling the replication peer using shell or ReplicationAdmin java class.
Replication to secondary region replicas works over standard inter-cluster replication.
@ -115,7 +115,7 @@ suit your environment, and restart or rolling restart the RegionServer.
<value>1000</value>
<description>
Number of opened file handlers to cache.
A larger value will benefit reads by provinding more file handlers per mob
A larger value will benefit reads by providing more file handlers per mob
file cache and would reduce frequent file opening and closing.
However, if this is set too high, this could lead to a "too many opened file handlers" error.
The default value is 1000.
@ -167,7 +167,7 @@ These commands are also available via `Admin.compactMob` and
==== MOB Sweeper
HBase MOB ships with a MapReduce job called the Sweeper tool for
optimization. The Sweeper tool oalesces small MOB files or MOB files with many
optimization. The Sweeper tool coalesces small MOB files or MOB files with many
deletions or updates. The Sweeper tool is not required if you use native MOB compaction, which
does not rely on MapReduce.
@ -42,7 +42,7 @@ $ ./bin/hbase hbck
----
At the end of the command's output it prints OK or tells you the number of INCONSISTENCIES present.
You may also want to run run hbck a few times because some inconsistencies can be transient (e.g.
You may also want to run hbck a few times because some inconsistencies can be transient (e.g.
cluster is starting up or a region is splitting). Operationally you may want to run hbck regularly and set up alerts (e.g.
via nagios) if it repeatedly reports inconsistencies. A run of hbck will report a list of inconsistencies along with a brief description of the regions and tables affected.
Using the `-details` option will report more details including a representative listing of all the splits present in all the tables.
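For example, a detailed report as described above could be requested like this (run from the HBase install directory):

[source,bash]
----
# list inconsistencies with a representative listing of splits per table
$ ./bin/hbase hbck -details
----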
@ -177,7 +177,7 @@ $ ./bin/hbase hbck -fixMetaOnly -fixAssignments
==== Special cases: HBase version file is missing
HBase's data on the file system requires a version file in order to start.
If this flie is missing, you can use the `-fixVersionFile` option to fabricating a new HBase version file.
If this file is missing, you can use the `-fixVersionFile` option to fabricate a new HBase version file.
This assumes that the version of hbck you are running is the appropriate version for the HBase cluster.
==== Special case: Root and META are corrupt.
@ -65,7 +65,7 @@ The dependencies only need to be available on the local `CLASSPATH`.
The following example runs the bundled HBase link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html[RowCounter] MapReduce job against a table named `usertable`.
If you have not set the environment variables expected in the command (the parts prefixed by a `$` sign and surrounded by curly braces), you can use the actual system paths instead.
Be sure to use the correct version of the HBase JAR for your system.
The backticks (``` symbols) cause ths shell to execute the sub-commands, setting the output of `hbase classpath` (the command to dump HBase CLASSPATH) to `HADOOP_CLASSPATH`.
The backticks (``` symbols) cause the shell to execute the sub-commands, setting the output of `hbase classpath` (the command to dump HBase CLASSPATH) to `HADOOP_CLASSPATH`.
This example assumes you use a BASH-compatible shell.
[source,bash]
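----
# A hedged sketch of the command described above; VERSION and the environment
# variables are placeholders for your installation, and usertable is the example table.
HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` \
  ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/lib/hbase-server-VERSION.jar \
  rowcounter usertable
----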
@ -279,7 +279,7 @@ That is where the logic for map-task assignment resides.
The following is an example of using HBase as a MapReduce source in read-only manner.
Specifically, there is a Mapper instance but no Reducer, and nothing is being emitted from the Mapper.
There job would be defined as follows...
The job would be defined as follows...
[source,java]
----
@ -592,7 +592,7 @@ public class MyMapper extends TableMapper<Text, LongWritable> {
== Speculative Execution
It is generally advisable to turn off speculative execution for MapReduce jobs that use HBase as a source.
This can either be done on a per-Job basis through properties, on on the entire cluster.
This can either be done on a per-Job basis through properties, or on the entire cluster.
Especially for longer running jobs, speculative execution will create duplicate map-tasks which will double-write your data to HBase; this is probably not what you want.
See <<spec.ex,spec.ex>> for more information.
@ -613,7 +613,7 @@ The following example shows a Cascading `Flow` which "sinks" data into an HBase
// emits two fields: "offset" and "line"
Tap source = new Hfs( new TextLine(), inputFileLhs );
// store data in a HBase cluster
// store data in an HBase cluster
// accepts fields "num", "lower", and "upper"
// will automatically scope incoming fields to their proper familyname, "left" or "right"
Fields keyFields = new Fields( "num" );
@ -199,7 +199,7 @@ $ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -t 600000
By default, the canary tool only checks the read operations, so it's hard to find a problem in the
write path. To enable the write sniffing, you can run canary with the `-writeSniffing` option.
When the write sniffing is enabled, the canary tool will create a hbase table and make sure the
When the write sniffing is enabled, the canary tool will create an hbase table and make sure the
regions of the table are distributed on all region servers. In each sniffing period, the canary will
try to put data to these regions to check the write availability of each region server.
----
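# a hedged sketch: enable write sniffing in addition to the default read checks
$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -writeSniffing
----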
@ -351,7 +351,7 @@ You can invoke it via the HBase cli with the 'wal' command.
[NOTE]
====
Prior to version 2.0, the WAL Pretty Printer was called the `HLogPrettyPrinter`, after an internal name for HBase's write ahead log.
In those versions, you can pring the contents of a WAL using the same configuration as above, but with the 'hlog' command.
In those versions, you can print the contents of a WAL using the same configuration as above, but with the 'hlog' command.
----
$ ./bin/hbase hlog hdfs://example.org:8020/hbase/.logs/example.org,60020,1283516293161/10.10.21.10%3A60020.1283973724012
@ -523,7 +523,7 @@ row9 c1 c2
row10 c1 c2
----
For ImportTsv to use this imput file, the command line needs to look like this:
For ImportTsv to use this input file, the command line needs to look like this:
----
@ -781,7 +781,7 @@ To decommission a loaded RegionServer, run the following: +$
====
The `HOSTNAME` passed to _graceful_stop.sh_ must match the hostname that hbase is using to identify RegionServers.
Check the list of RegionServers in the master UI for how HBase is referring to servers.
Its usually hostname but can also be FQDN.
It's usually hostname but can also be FQDN.
Whatever HBase is using, this is what you should pass to the _graceful_stop.sh_ decommission script.
If you pass IPs, the script is not yet smart enough to make a hostname (or FQDN) of it and so it will fail when it checks if server is currently running; the graceful unloading of regions will not run.
====
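For concreteness, a hedged sketch of the call; the hostname below is a placeholder and should be whatever name the Master UI shows for the RegionServer:

[source,bash]
----
# decommission the RegionServer that HBase knows as rs-node-1.example.com
$ ./bin/graceful_stop.sh rs-node-1.example.com
----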
@ -821,12 +821,12 @@ Hence, it is better to manage the balancer apart from `graceful_stop` reenabling
[[draining.servers]]
==== Decommissioning several Regions Servers concurrently
If you have a large cluster, you may want to decommission more than one machine at a time by gracefully stopping mutiple RegionServers concurrently.
If you have a large cluster, you may want to decommission more than one machine at a time by gracefully stopping multiple RegionServers concurrently.
To gracefully drain multiple regionservers at the same time, RegionServers can be put into a "draining" state.
This is done by marking a RegionServer as a draining node by creating an entry in ZooKeeper under the _hbase_root/draining_ znode.
This znode has format `name,port,startcode` just like the regionserver entries under _hbase_root/rs_ znode.
Without this facility, decommissioning mulitple nodes may be non-optimal because regions that are being drained from one region server may be moved to other regionservers that are also draining.
Without this facility, decommissioning multiple nodes may be non-optimal because regions that are being drained from one region server may be moved to other regionservers that are also draining.
Marking RegionServers to be in the draining state prevents this from happening.
See this link:http://inchoate-clatter.blogspot.com/2012/03/hbase-ops-automation.html[blog
post] for more details.
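A hedged sketch of marking a draining node; the znode name below is invented, and it assumes your `hbase zkcli` accepts a one-shot command (otherwise run the same `create` inside the interactive zkcli session):

[source,bash]
----
# create the draining entry using the same name,port,startcode format as the entries under /hbase/rs
$ bin/hbase zkcli create /hbase/draining/rs-node-1.example.com,16020,1431961409213 ""
----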
@ -991,7 +991,7 @@ To configure metrics for a given region server, edit the _conf/hadoop-metrics2-h
Restart the region server for the changes to take effect.
To change the sampling rate for the default sink, edit the line beginning with `*.period`.
To filter which metrics are emitted or to extend the metrics framework, see link:http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html
To filter which metrics are emitted or to extend the metrics framework, see http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html
.HBase Metrics and Ganglia
[NOTE]
@ -1014,15 +1014,15 @@ Rather than listing each metric which HBase emits by default, you can browse thr
Different metrics are exposed for the Master process and each region server process.
.Procedure: Access a JSON Output of Available Metrics
. After starting HBase, access the region server's web UI, at `http://REGIONSERVER_HOSTNAME:60030` by default (or port 16030 in HBase 1.0+).
. After starting HBase, access the region server's web UI, at pass:[http://REGIONSERVER_HOSTNAME:60030] by default (or port 16030 in HBase 1.0+).
. Click the [label]#Metrics Dump# link near the top.
The metrics for the region server are presented as a dump of the JMX bean in JSON format.
This will dump out all metrics names and their values.
To include metrics descriptions in the listing -- this can be useful when you are exploring what is available -- add a query string of `?description=true` so your URL becomes `http://REGIONSERVER_HOSTNAME:60030/jmx?description=true`.
To include metrics descriptions in the listing -- this can be useful when you are exploring what is available -- add a query string of `?description=true` so your URL becomes pass:[http://REGIONSERVER_HOSTNAME:60030/jmx?description=true].
Not all beans and attributes have descriptions.
. To view metrics for the Master, connect to the Master's web UI instead (defaults to `http://localhost:60010` or port 16010 in HBase 1.0+) and click its [label]#Metrics
. To view metrics for the Master, connect to the Master's web UI instead (defaults to pass:[http://localhost:60010] or port 16010 in HBase 1.0+) and click its [label]#Metrics
Dump# link.
To include metrics descriptions in the listing -- this can be useful when you are exploring what is available -- add a query string of `?description=true` so your URL becomes `http://REGIONSERVER_HOSTNAME:60010/jmx?description=true`.
To include metrics descriptions in the listing -- this can be useful when you are exploring what is available -- add a query string of `?description=true` so your URL becomes pass:[http://REGIONSERVER_HOSTNAME:60010/jmx?description=true].
Not all beans and attributes have descriptions.
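A quick way to fetch either dump from the command line, using the default hostnames and ports mentioned above:

[source,bash]
----
# RegionServer metrics as JSON, with descriptions included
curl "http://REGIONSERVER_HOSTNAME:60030/jmx?description=true"
# Master metrics, same idea
curl "http://localhost:60010/jmx?description=true"
----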
@ -1341,9 +1341,9 @@ disable_peer <ID>::
remove_peer <ID>::
Disable and remove a replication relationship. HBase will no longer send edits to that peer cluster or keep track of WALs.
enable_table_replication <TABLE_NAME>::
Enable the table replication switch for all it's column families. If the table is not found in the destination cluster then it will create one with the same name and column families.
Enable the table replication switch for all its column families. If the table is not found in the destination cluster then it will create one with the same name and column families.
disable_table_replication <TABLE_NAME>::
Disable the table replication switch for all it's column families.
Disable the table replication switch for all its column families.
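A hedged sketch of scripting the `enable_table_replication` and `disable_table_replication` commands above non-interactively; the table name is a placeholder:

[source,bash]
----
# enable, and later disable, replication for every column family of a table
echo "enable_table_replication 'my_table'" | bin/hbase shell -n
echo "disable_table_replication 'my_table'" | bin/hbase shell -n
----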
=== Verifying Replicated Data
@ -1462,7 +1462,7 @@ Speed is also limited by total size of the list of edits to replicate per slave,
With this configuration, a master cluster region server with three slaves would use at most 192 MB to store data to replicate.
This does not account for the data which was filtered but not garbage collected.
Once the maximum size of edits has been buffered or the reader reaces the end of the WAL, the source thread stops reading and chooses at random a sink to replicate to (from the list that was generated by keeping only a subset of slave region servers). It directly issues a RPC to the chosen region server and waits for the method to return.
Once the maximum size of edits has been buffered or the reader reaches the end of the WAL, the source thread stops reading and chooses at random a sink to replicate to (from the list that was generated by keeping only a subset of slave region servers). It directly issues a RPC to the chosen region server and waits for the method to return.
If the RPC was successful, the source determines whether the current file has been emptied or it contains more data which needs to be read.
If the file has been emptied, the source deletes the znode in the queue.
Otherwise, it registers the new offset in the log's znode.
@ -1778,7 +1778,7 @@ but still suboptimal compared to a mechanism which allows large requests to be s
into multiple smaller ones.
HBASE-10993 introduces such a system for deprioritizing long-running scanners. There
are two types of queues,`fifo` and `deadline`.To configure the type of queue used,
are two types of queues, `fifo` and `deadline`. To configure the type of queue used,
configure the `hbase.ipc.server.callqueue.type` property in `hbase-site.xml`. There
is no way to estimate how long each request may take, so de-prioritization only affects
scans, and is based on the number of “next” calls a scan request has made. An assumption
@ -2049,7 +2049,7 @@ Aside from the disk space necessary to store the data, one RS may not be able to
[[ops.capacity.nodes.throughput]]
==== Read/Write throughput
Number of nodes can also be driven by required thoughput for reads and/or writes.
Number of nodes can also be driven by required throughput for reads and/or writes.
The throughput one can get per node depends a lot on data (esp.
key/value sizes) and request patterns, as well as node and system configuration.
Planning should be done for peak load if it is likely that the load would be the main driver of the increase of the node count.
@ -88,7 +88,7 @@ Multiple rack configurations carry the same potential issues as multiple switche
* Poor switch capacity performance
* Insufficient uplink to another rack
If the the switches in your rack have appropriate switching capacity to handle all the hosts at full speed, the next most likely issue will be caused by homing more of your cluster across racks.
If the switches in your rack have appropriate switching capacity to handle all the hosts at full speed, the next most likely issue will be caused by homing more of your cluster across racks.
The easiest way to avoid issues when spanning multiple racks is to use port trunking to create a bonded uplink to other racks.
The downside of this method however, is in the overhead of ports that could potentially be used.
An example of this is, creating an 8Gbps port channel from rack A to rack B, using 8 of your 24 ports to communicate between racks gives you a poor ROI, using too few however can mean you're not getting the most out of your cluster.
@ -102,7 +102,7 @@ Are all the network interfaces functioning correctly? Are you sure? See the Trou
[[perf.network.call_me_maybe]]
=== Network Consistency and Partition Tolerance
The link:http://en.wikipedia.org/wiki/CAP_theorem[CAP Theorem] states that a distributed system can maintain two out of the following three charateristics:
The link:http://en.wikipedia.org/wiki/CAP_theorem[CAP Theorem] states that a distributed system can maintain two out of the following three characteristics:
- *C*onsistency -- all nodes see the same data.
- *A*vailability -- every request receives a response about whether it succeeded or failed.
- *P*artition tolerance -- the system continues to operate even if some of its components become unavailable to the others.
@ -556,7 +556,7 @@ When writing a lot of data to an HBase table from a MR job (e.g., with link:http
When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other Reducers that will most likely be off-node.
It's far more efficient to just write directly to HBase.
For summary jobs where HBase is used as a source and a sink, then writes will be coming from the Reducer step (e.g., summarize values then write out result). This is a different processing problem than from the the above case.
For summary jobs where HBase is used as a source and a sink, then writes will be coming from the Reducer step (e.g., summarize values then write out result). This is a different processing problem than from the above case.
[[perf.one.region]]
=== Anti-Pattern: One Hot Region
@ -565,7 +565,7 @@ If all your data is being written to one region at a time, then re-read the sect
Also, if you are pre-splitting regions and all your data is _still_ winding up in a single region even though your keys aren't monotonically increasing, confirm that your keyspace actually works with the split strategy.
There are a variety of reasons that regions may appear "well split" but won't work with your data.
As the HBase client communicates directly with the RegionServers, this can be obtained via link:hhttp://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#getRegionLocation(byte[])[Table.getRegionLocation].
As the HBase client communicates directly with the RegionServers, this can be obtained via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#getRegionLocation(byte%5B%5D)[Table.getRegionLocation].
See <<precreate.regions>>, as well as <<perf.configurations>>
@ -607,7 +607,7 @@ When columns are selected explicitly with `scan.addColumn`, HBase will schedule
When rows have few columns and each column has only a few versions this can be inefficient.
A seek operation is generally slower if it does not seek at least past 5-10 columns/versions or 512-1024 bytes.
In order to opportunistically look ahead a few columns/versions to see if the next column/version can be found that way before a seek operation is scheduled, a new attribute `Scan.HINT_LOOKAHEAD` can be set the on Scan object.
In order to opportunistically look ahead a few columns/versions to see if the next column/version can be found that way before a seek operation is scheduled, a new attribute `Scan.HINT_LOOKAHEAD` can be set on the Scan object.
The following code instructs the RegionServer to attempt two iterations of next before a seek is scheduled:
[source,java]
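----
// A hedged sketch, not the guide's original listing: the column family and
// qualifier are placeholder names, and `table` is an existing Table instance.
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("attr"));
// look ahead two next() calls before scheduling a seek
scan.setAttribute(Scan.HINT_LOOKAHEAD, Bytes.toBytes(2));
ResultScanner scanner = table.getScanner(scan);
----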
@ -731,7 +731,7 @@ However, if hedged reads are enabled, the client waits some configurable amount
Whichever read returns first is used, and the other read request is discarded.
Hedged reads can be helpful for times where a rare slow read is caused by a transient error such as a failing disk or flaky network connection.
Because a HBase RegionServer is a HDFS client, you can enable hedged reads in HBase, by adding the following properties to the RegionServer's hbase-site.xml and tuning the values to suit your environment.
Because an HBase RegionServer is a HDFS client, you can enable hedged reads in HBase, by adding the following properties to the RegionServer's hbase-site.xml and tuning the values to suit your environment.
.Configuration for Hedged Reads
* `dfs.client.hedged.read.threadpool.size` - the number of threads dedicated to servicing hedged reads.
@ -870,7 +870,7 @@ If you are running on EC2 and post performance questions on the dist-list, pleas
== Collocating HBase and MapReduce
It is often recommended to have different clusters for HBase and MapReduce.
A better qualification of this is: don't collocate a HBase that serves live requests with a heavy MR workload.
A better qualification of this is: don't collocate an HBase that serves live requests with a heavy MR workload.
OLTP and OLAP-optimized systems have conflicting requirements and one will lose to the other, usually the former.
For example, short latency-sensitive disk reads will have to wait in line behind longer reads that are trying to squeeze out as much throughput as possible.
MR jobs that write to HBase will also generate flushes and compactions, which will in turn invalidate blocks in the <<block.cache>>.
@ -106,7 +106,7 @@ After client sends preamble and connection header, server does NOT respond if su
No response means server is READY to accept requests and to give out response.
If the version or authentication in the preamble is not agreeable or the server has trouble parsing the preamble, it will throw a org.apache.hadoop.hbase.ipc.FatalConnectionException explaining the error and will then disconnect.
If the client in the connection header -- i.e.
the protobuf'd Message that comes after the connection preamble -- asks for for a Service the server does not support or a codec the server does not have, again we throw a FatalConnectionException with explanation.
the protobuf'd Message that comes after the connection preamble -- asks for a Service the server does not support or a codec the server does not have, again we throw a FatalConnectionException with explanation.
==== Request
@ -118,7 +118,7 @@ The header includes the method name and optionally, metadata on the optional Cel
The parameter type suits the method being invoked: i.e.
if we are doing a getRegionInfo request, the protobuf Message param will be an instance of GetRegionInfoRequest.
The response will be a GetRegionInfoResponse.
The CellBlock is optionally used ferrying the bulk of the RPC data: i.e Cells/KeyValues.
The CellBlock is optionally used ferrying the bulk of the RPC data: i.e. Cells/KeyValues.
===== Request Parts
@ -182,7 +182,7 @@ Codecs will live on the server for all time so old clients can connect.
.Constraints
In some part, current wire-format -- i.e.
all requests and responses preceeded by a length -- has been dictated by current server non-async architecture.
all requests and responses preceded by a length -- has been dictated by current server non-async architecture.
.One fat pb request or header+param
We went with pb header followed by pb param making a request and a pb header followed by pb response for now.
@ -214,9 +214,9 @@ If a server sees no codec, it will return all responses in pure protobuf.
Running pure protobuf all the time will be slower than running with cellblocks.
.Compression
Uses hadoops compression codecs.
Uses hadoop's compression codecs.
To enable compressing of passed CellBlocks, set `hbase.client.rpc.compressor` to the name of the Compressor to use.
Compressor must implement Hadoops' CompressionCodec Interface.
Compressor must implement Hadoop's CompressionCodec Interface.
After connection setup, all passed cellblocks will be sent compressed.
The server will return cellblocks compressed using this same compressor as long as the compressor is on its CLASSPATH (else you will get `UnsupportedCompressionCodecException`).
@ -187,7 +187,7 @@ See this comic by IKai Lan on why monotonically increasing row keys are problema
The pile-up on a single region brought on by monotonically increasing keys can be mitigated by randomizing the input records to not be in sorted order, but in general it's best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row-key.
If you do need to upload time series data into HBase, you should study link:http://opentsdb.net/[OpenTSDB] as a successful example.
It has a page describing the link: http://opentsdb.net/schema.html[schema] it uses in HBase.
It has a page describing the link:http://opentsdb.net/schema.html[schema] it uses in HBase.
The key format in OpenTSDB is effectively [metric_type][event_timestamp], which would appear at first glance to contradict the previous advice about not using a timestamp as the key.
However, the difference is that the timestamp is not in the _lead_ position of the key, and the design assumption is that there are dozens or hundreds (or more) of different metric types.
Thus, even with a continual stream of input data with a mix of metric types, the Puts are distributed across various points of regions in the table.
@ -339,8 +339,8 @@ As an example of why this is important, consider the example of using displayabl
The problem is that all the data is going to pile up in the first 2 regions and the last region thus creating a "lumpy" (and possibly "hot") region problem.
To understand why, refer to an link:http://www.asciitable.com[ASCII Table].
'0' is byte 48, and 'f' is byte 102, but there is a huge gap in byte values (bytes 58 to 96) that will _never appear in this keyspace_ because the only values are [0-9] and [a-f]. Thus, the middle regions regions will never be used.
To make pre-spliting work with this example keyspace, a custom definition of splits (i.e., and not relying on the built-in split method) is required.
'0' is byte 48, and 'f' is byte 102, but there is a huge gap in byte values (bytes 58 to 96) that will _never appear in this keyspace_ because the only values are [0-9] and [a-f]. Thus, the middle regions will never be used.
To make pre-splitting work with this example keyspace, a custom definition of splits (i.e., and not relying on the built-in split method) is required.
Lesson #1: Pre-splitting tables is generally a best practice, but you need to pre-split them in such a way that all the regions are accessible in the keyspace.
While this example demonstrated the problem with a hex-key keyspace, the same problem can happen with _any_ keyspace.
@ -406,7 +406,7 @@ The minimum number of row versions parameter is used together with the time-to-l
HBase supports a "bytes-in/bytes-out" interface via link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] and link:http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html[Result], so anything that can be converted to an array of bytes can be stored as a value.
Input could be strings, numbers, complex objects, or even images as long as they can be rendered as bytes.
There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase would probably be too much to ask); search the mailling list for conversations on this topic.
There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase would probably be too much to ask); search the mailing list for conversations on this topic.
All rows in HBase conform to the <<datamodel>>, and that includes versioning.
Take that into consideration when making your design, as well as block size for the ColumnFamily.
@ -514,7 +514,7 @@ ROW COLUMN+CELL
Notice how delete cells are let go.
Now lets run the same test only with `KEEP_DELETED_CELLS` set on the table (you can do table or per-column-family):
Now let's run the same test only with `KEEP_DELETED_CELLS` set on the table (you can do table or per-column-family):
[source]
----
@ -605,7 +605,7 @@ However, don't try a full-scan on a large table like this from an application (i
[[secondary.indexes.periodic]]
=== Periodic-Update Secondary Index
A secondary index could be created in an other table which is periodically updated via a MapReduce job.
A secondary index could be created in another table which is periodically updated via a MapReduce job.
The job could be executed intra-day, but depending on load-strategy it could still potentially be out of sync with the main data table.
See <<mapreduce.example.readwrite,mapreduce.example.readwrite>> for more information.
@ -753,7 +753,7 @@ In either the Hash or Numeric substitution approach, the raw values for hostname
This effectively is the OpenTSDB approach.
What OpenTSDB does is re-write data and pack rows into columns for certain time-periods.
For a detailed explanation, see: link:http://opentsdb.net/schema.html, and
For a detailed explanation, see: http://opentsdb.net/schema.html, and
+++<a href="http://www.cloudera.com/content/cloudera/en/resources/library/hbasecon/video-hbasecon-2012-lessons-learned-from-opentsdb.html">Lessons Learned from OpenTSDB</a>+++
from HBaseCon2012.
@ -800,7 +800,7 @@ Assuming that the combination of customer number and sales order uniquely identi
[customer number][order number]
----
for a ORDER table.
for an ORDER table.
However, there are more design decisions to make: are the _raw_ values the best choices for rowkeys?
The same design questions in the Log Data use-case confront us here.
@ -931,9 +931,9 @@ For example, the ORDER table's rowkey was described above: <<schema.casestudies.
There are many options here: JSON, XML, Java Serialization, Avro, Hadoop Writables, etc.
All of them are variants of the same approach: encode the object graph to a byte-array.
Care should be taken with this approach to ensure backward compatibilty in case the object model changes such that older persisted structures can still be read back out of HBase.
Care should be taken with this approach to ensure backward compatibility in case the object model changes such that older persisted structures can still be read back out of HBase.
Pros are being able to manage complex object graphs with minimal I/O (e.g., a single HBase Get per Order in this example), but the cons include the aforementioned warning about backward compatiblity of serialization, language dependencies of serialization (e.g., Java Serialization only works with Java clients), the fact that you have to deserialize the entire object to get any piece of information inside the BLOB, and the difficulty in getting frameworks like Hive to work with custom objects like this.
Pros are being able to manage complex object graphs with minimal I/O (e.g., a single HBase Get per Order in this example), but the cons include the aforementioned warning about backward compatibility of serialization, language dependencies of serialization (e.g., Java Serialization only works with Java clients), the fact that you have to deserialize the entire object to get any piece of information inside the BLOB, and the difficulty in getting frameworks like Hive to work with custom objects like this.
[[schema.smackdown]]
=== Case Study - "Tall/Wide/Middle" Schema Design Smackdown
@ -945,7 +945,7 @@ These are general guidelines and not laws - each application must consider its o
==== Rows vs. Versions
A common question is whether one should prefer rows or HBase's built-in-versioning.
The context is typically where there are "a lot" of versions of a row to be retained (e.g., where it is significantly above the HBase default of 1 max versions). The rows-approach would require storing a timestamp in some portion of the rowkey so that they would not overwite with each successive update.
The context is typically where there are "a lot" of versions of a row to be retained (e.g., where it is significantly above the HBase default of 1 max versions). The rows-approach would require storing a timestamp in some portion of the rowkey so that they would not overwrite with each successive update.
Preference: Rows (generally speaking).
@ -1044,14 +1044,14 @@ The tl;dr version is that you should probably go with one row per user+value, an
Your two options mirror a common question people have when designing HBase schemas: should I go "tall" or "wide"? Your first schema is "tall": each row represents one value for one user, and so there are many rows in the table for each user; the row key is user + valueid, and there would be (presumably) a single column qualifier that means "the value". This is great if you want to scan over rows in sorted order by row key (thus my question above, about whether these ids are sorted correctly). You can start a scan at any user+valueid, read the next 30, and be done.
What you're giving up is the ability to have transactional guarantees around all the rows for one user, but it doesn't sound like you need that.
Doing it this way is generally recommended (see here link:http://hbase.apache.org/book.html#schema.smackdown).
Doing it this way is generally recommended (see here http://hbase.apache.org/book.html#schema.smackdown).
Your second option is "wide": you store a bunch of values in one row, using different qualifiers (where the qualifier is the valueid). The simple way to do that would be to just store ALL values for one user in a single row.
I'm guessing you jumped to the "paginated" version because you're assuming that storing millions of columns in a single row would be bad for performance, which may or may not be true; as long as you're not trying to do too much in a single request, or do things like scanning over and returning all of the cells in the row, it shouldn't be fundamentally worse.
The client has methods that allow you to get specific slices of columns.
Note that neither case fundamentally uses more disk space than the other; you're just "shifting" part of the identifying information for a value either to the left (into the row key, in option one) or to the right (into the column qualifiers in option 2). Under the covers, every key/value still stores the whole row key, and column family name.
(If this is a bit confusing, take an hour and watch Lars George's excellent video about understanding HBase schema design: link:http://www.youtube.com/watch?v=_HLoH_PgrLk).
(If this is a bit confusing, take an hour and watch Lars George's excellent video about understanding HBase schema design: http://www.youtube.com/watch?v=_HLoH_PgrLk).
A manually paginated version has lots more complexities, as you note, like having to keep track of how many things are in each page, re-shuffling if new values are inserted, etc.
That seems significantly more complex.
@ -331,7 +331,7 @@ To enable REST gateway Kerberos authentication for client access, add the follow
Substitute the keytab for HTTP for _$KEYTAB_.
HBase REST gateway supports different 'hbase.rest.authentication.type': simple, kerberos.
You can also implement a custom authentication by implemening Hadoop AuthenticationHandler, then specify the full class name as 'hbase.rest.authentication.type' value.
You can also implement a custom authentication by implementing Hadoop AuthenticationHandler, then specify the full class name as 'hbase.rest.authentication.type' value.
For more information, refer to link:http://hadoop.apache.org/docs/stable/hadoop-auth/index.html[SPNEGO HTTP authentication].
[[security.rest.gateway]]
@ -343,7 +343,7 @@ To the HBase server, all requests are from the REST gateway user.
The actual users are unknown.
You can turn on the impersonation support.
With impersonation, the REST gateway user is a proxy user.
The HBase server knows the acutal/real user of each request.
The HBase server knows the actual/real user of each request.
So it can apply proper authorizations.
To turn on REST gateway impersonation, we need to configure HBase servers (masters and region servers) to allow proxy users; configure REST gateway to enable impersonation.
@ -1117,7 +1117,7 @@ NOTE: Visibility labels are not currently applied for superusers.
| Interpretation
| fulltime
| Allow accesss to users associated with the fulltime label.
| Allow access to users associated with the fulltime label.
| !public
| Allow access to users not associated with the public label.
@ -76,7 +76,7 @@ NOTE: Spawning HBase Shell commands in this way is slow, so keep that in mind wh
.Passing Commands to the HBase Shell
====
You can pass commands to the HBase Shell in non-interactive mode (see <<hbasee.shell.noninteractive,hbasee.shell.noninteractive>>) using the `echo` command and the `|` (pipe) operator.
You can pass commands to the HBase Shell in non-interactive mode (see <<hbase.shell.noninteractive,hbase.shell.noninteractive>>) using the `echo` command and the `|` (pipe) operator.
Be sure to escape characters in the HBase commands which would otherwise be interpreted by the shell.
Some debug-level output has been truncated from the example below.
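A minimal sketch of the pattern; the table name is a placeholder, and only the command is shown, not its output:

[source,bash]
----
# run a single shell command non-interactively and exit
echo "describe 'test1'" | ./bin/hbase shell -n
----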
@ -36,9 +36,9 @@ more information on the Spark project and subprojects. This document will focus
on 4 main interaction points between Spark and HBase. Those interaction points are:
Basic Spark::
The ability to have a HBase Connection at any point in your Spark DAG.
The ability to have an HBase Connection at any point in your Spark DAG.
Spark Streaming::
The ability to have a HBase Connection at any point in your Spark Streaming
The ability to have an HBase Connection at any point in your Spark Streaming
application.
Spark Bulk Load::
The ability to write directly to HBase HFiles for bulk insertion into HBase
@ -205,7 +205,7 @@ There are three inputs to the `hbaseBulkPut` function.
. The hbaseContext that carries the configuration broadcast information linking us
to the HBase Connections in the executors
. The table name of the table we are putting data into
. A function that will convert a record in the DStream into a HBase Put object.
. A function that will convert a record in the DStream into an HBase Put object.
====
== Bulk Load
@ -350,7 +350,7 @@ FROM hbaseTmp
WHERE (KEY_FIELD = 'get1' or KEY_FIELD = 'get2' or KEY_FIELD = 'get3')
----
Now lets look at an example where we will end up doing two scans on HBase.
Now let's look at an example where we will end up doing two scans on HBase.
[source, sql]
----
@ -89,11 +89,11 @@ Additionally, each DataNode server will also have a TaskTracker/NodeManager log
[[rpc.logging]]
==== Enabling RPC-level logging
Enabling the RPC-level logging on a RegionServer can often given insight on timings at the server.
Enabling the RPC-level logging on a RegionServer can often give insight on timings at the server.
Once enabled, the amount of log spewed is voluminous.
It is not recommended that you leave this logging on for more than short bursts of time.
To enable RPC-level logging, browse to the RegionServer UI and click on _Log Level_.
Set the log level to `DEBUG` for the package `org.apache.hadoop.ipc` (Thats right, for `hadoop.ipc`, NOT, `hbase.ipc`). Then tail the RegionServers log.
Set the log level to `DEBUG` for the package `org.apache.hadoop.ipc` (That's right, for `hadoop.ipc`, NOT, `hbase.ipc`). Then tail the RegionServers log.
Analyze.
To disable, set the logging level back to `INFO` level.
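If you prefer the command line to the UI, the same log-level servlet can usually be driven with an HTTP request; the hostname and port below are placeholders for your RegionServer, and this assumes the stock Hadoop `/logLevel` servlet is enabled:

[source,bash]
----
# raise org.apache.hadoop.ipc to DEBUG on one RegionServer
curl "http://rs-node-1.example.com:60030/logLevel?log=org.apache.hadoop.ipc&level=DEBUG"
# and set it back when done
curl "http://rs-node-1.example.com:60030/logLevel?log=org.apache.hadoop.ipc&level=INFO"
----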
@ -185,7 +185,7 @@ The key points here is to keep all these pauses low.
CMS pauses are always low, but if your ParNew starts growing, you can see minor GC pauses approach 100ms, exceed 100ms and hit as high as 400ms.
This can be due to the size of the ParNew, which should be relatively small.
If your ParNew is very large after running HBase for a while, in one example a ParNew was about 150MB, then you might have to constrain the size of ParNew (The larger it is, the longer the collections take but if its too small, objects are promoted to old gen too quickly). In the below we constrain new gen size to 64m.
If your ParNew is very large after running HBase for a while, in one example a ParNew was about 150MB, then you might have to constrain the size of ParNew (The larger it is, the longer the collections take but if it's too small, objects are promoted to old gen too quickly). In the below we constrain new gen size to 64m.
Add the below line in _hbase-env.sh_:
[source,bourne]
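----
# A hedged sketch: the JVM flags come from the paragraph above; whether you append
# them to HBASE_OPTS or to a GC-specific variable depends on your hbase-env.sh layout.
export HBASE_OPTS="$HBASE_OPTS -XX:NewSize=64m -XX:MaxNewSize=64m"
----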
@ -443,7 +443,7 @@ java.lang.Thread.State: WAITING (on object monitor)
at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:146)
----
A handler thread that's waiting for stuff to do (like put, delete, scan, etc):
A handler thread that's waiting for stuff to do (like put, delete, scan, etc.):
[source]
----
@ -849,7 +849,7 @@ are snapshots and WALs.
Snapshots::
When you create a snapshot, HBase retains everything it needs to recreate the table's
state at that time of tne snapshot. This includes deleted cells or expired versions.
state at that time of the snapshot. This includes deleted cells or expired versions.
For this reason, your snapshot usage pattern should be well-planned, and you should
prune snapshots that you no longer need. Snapshots are stored in `/hbase/.snapshots`,
and archives needed to restore snapshots are stored in
@ -1070,7 +1070,7 @@ However, if the NotServingRegionException is logged ERROR, then the client ran o
Fix your DNS.
In versions of Apache HBase before 0.92.x, reverse DNS needs to give same answer as forward lookup.
See link:https://issues.apache.org/jira/browse/HBASE-3431[HBASE 3431 RegionServer is not using the name given it by the master; double entry in master listing of servers] for gorey details.
See link:https://issues.apache.org/jira/browse/HBASE-3431[HBASE 3431 RegionServer is not using the name given it by the master; double entry in master listing of servers] for gory details.
[[brand.new.compressor]]
==== Logs flooded with '2011-01-10 12:40:48,407 INFO org.apache.hadoop.io.compress.CodecPool: Gotbrand-new compressor' messages
@ -96,13 +96,13 @@ public class TestMyHbaseDAOData {
These tests ensure that your `createPut` method creates, populates, and returns a `Put` object with expected values.
Of course, JUnit can do much more than this.
For an introduction to JUnit, see link:https://github.com/junit-team/junit/wiki/Getting-started.
For an introduction to JUnit, see https://github.com/junit-team/junit/wiki/Getting-started.
== Mockito
Mockito is a mocking framework.
It goes further than JUnit by allowing you to test the interactions between objects without having to replicate the entire environment.
You can read more about Mockito at its project site, link:https://code.google.com/p/mockito/.
You can read more about Mockito at its project site, https://code.google.com/p/mockito/.
You can use Mockito to do unit testing on smaller units.
For instance, you can mock a `org.apache.hadoop.hbase.Server` instance or a `org.apache.hadoop.hbase.master.MasterServices` interface reference rather than a full-blown `org.apache.hadoop.hbase.master.HMaster`.
@ -182,7 +182,7 @@ public class MyReducer extends TableReducer<Text, Text, ImmutableBytesWritable>
public static final byte[] CF = "CF".getBytes();
public static final byte[] QUALIFIER = "CQ-1".getBytes();
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
//bunch of processing to extract data to be inserted, in our case, lets say we are simply
//bunch of processing to extract data to be inserted, in our case, let's say we are simply
//appending all the records we receive from the mapper for this particular
//key and insert one record into HBase
StringBuffer data = new StringBuffer();
@ -259,7 +259,7 @@ Your MRUnit test verifies that the output is as expected, the Put that is insert
MRUnit includes a MapperDriver to test mapping jobs, and you can use MRUnit to test other operations, including reading from HBase, processing data, or writing to HDFS.
== Integration Testing with a HBase Mini-Cluster
== Integration Testing with an HBase Mini-Cluster
HBase ships with HBaseTestingUtility, which makes it easy to write integration tests using a [firstterm]_mini-cluster_.
The first step is to add some dependencies to your Maven POM file.
@ -132,7 +132,7 @@ HBase Client API::
[[hbase.limitetprivate.api]]
HBase LimitedPrivate API::
LimitedPrivate annotation comes with a set of target consumers for the interfaces. Those consumers are coprocessors, phoenix, replication endpoint implemnetations or similar. At this point, HBase only guarantees source and binary compatibility for these interfaces between patch versions.
LimitedPrivate annotation comes with a set of target consumers for the interfaces. Those consumers are coprocessors, phoenix, replication endpoint implementations or similar. At this point, HBase only guarantees source and binary compatibility for these interfaces between patch versions.
[[hbase.private.api]]
HBase Private API::
@ -158,7 +158,7 @@ When we say two HBase versions are compatible, we mean that the versions are wir
A rolling upgrade is the process by which you update the servers in your cluster a server at a time. You can rolling upgrade across HBase versions if they are binary or wire compatible. See <<hbase.rolling.restart>> for more on what this means. Coarsely, a rolling upgrade is a graceful stop each server, update the software, and then restart. You do this for each server in the cluster. Usually you upgrade the Master first and then the RegionServers. See <<rolling>> for tools that can help use the rolling upgrade process.
For example, in the below, HBase was symlinked to the actual HBase install. On upgrade, before running a rolling restart over the cluser, we changed the symlink to point at the new HBase software version and then ran
For example, in the below, HBase was symlinked to the actual HBase install. On upgrade, before running a rolling restart over the cluster, we changed the symlink to point at the new HBase software version and then ran
[source,bash]
----
@ -200,7 +200,7 @@ ports.
[[upgrade1.0.hbase.bucketcache.percentage.in.combinedcache]]
.hbase.bucketcache.percentage.in.combinedcache configuration has been REMOVED
You may have made use of this configuration if you are using BucketCache. If NOT using BucketCache, this change does not effect you. Its removal means that your L1 LruBlockCache is now sized using `hfile.block.cache.size` -- i.e. the way you would size the on-heap L1 LruBlockCache if you were NOT doing BucketCache -- and the BucketCache size is not whatever the setting for `hbase.bucketcache.size` is. You may need to adjust configs to get the LruBlockCache and BucketCache sizes set to what they were in 0.98.x and previous. If you did not set this config., its default value was 0.9. If you do nothing, your BucketCache will increase in size by 10%. Your L1 LruBlockCache will become `hfile.block.cache.size` times your java heap size (`hfile.block.cache.size` is a float between 0.0 and 1.0). To read more, see link:https://issues.apache.org/jira/browse/HBASE-11520[HBASE-11520 Simplify offheap cache config by removing the confusing "hbase.bucketcache.percentage.in.combinedcache"].
You may have made use of this configuration if you are using BucketCache. If NOT using BucketCache, this change does not affect you. Its removal means that your L1 LruBlockCache is now sized using `hfile.block.cache.size` -- i.e. the way you would size the on-heap L1 LruBlockCache if you were NOT doing BucketCache -- and the BucketCache size is not whatever the setting for `hbase.bucketcache.size` is. You may need to adjust configs to get the LruBlockCache and BucketCache sizes set to what they were in 0.98.x and previous. If you did not set this config., its default value was 0.9. If you do nothing, your BucketCache will increase in size by 10%. Your L1 LruBlockCache will become `hfile.block.cache.size` times your java heap size (`hfile.block.cache.size` is a float between 0.0 and 1.0). To read more, see link:https://issues.apache.org/jira/browse/HBASE-11520[HBASE-11520 Simplify offheap cache config by removing the confusing "hbase.bucketcache.percentage.in.combinedcache"].
[[hbase-12068]]
.If you have your own custom filters.
@ -392,7 +392,7 @@ The migration is a one-time event. However, every time your cluster starts, `MET
[[upgrade0.94]]
=== Upgrading from 0.92.x to 0.94.x
We used to think that 0.92 and 0.94 were interface compatible and that you can do a rolling upgrade between these versions but then we figured that link:https://issues.apache.org/jira/browse/HBASE-5357[HBASE-5357 Use builder pattern in HColumnDescriptor] changed method signatures so rather than return `void` they instead return `HColumnDescriptor`. This will throw`java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HColumnDescriptor.setMaxVersions(I)V` so 0.92 and 0.94 are NOT compatible. You cannot do a rolling upgrade between them.
We used to think that 0.92 and 0.94 were interface compatible and that you can do a rolling upgrade between these versions but then we figured that link:https://issues.apache.org/jira/browse/HBASE-5357[HBASE-5357 Use builder pattern in HColumnDescriptor] changed method signatures so rather than return `void` they instead return `HColumnDescriptor`. This will throw `java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HColumnDescriptor.setMaxVersions(I)V` so 0.92 and 0.94 are NOT compatible. You cannot do a rolling upgrade between them.
[[upgrade0.92]]
=== Upgrading from 0.90.x to 0.92.x
@ -97,7 +97,7 @@ In the example below we have ZooKeeper persist to _/user/local/zookeeper_.
</configuration>
----
.What verion of ZooKeeper should I use?
.What version of ZooKeeper should I use?
[CAUTION]
====
The newer the version, the better.