diff --git a/src/main/docbkx/book.xml b/src/main/docbkx/book.xml
index 0fb3eba8fdc..ed26c1beb80 100644
--- a/src/main/docbkx/book.xml
+++ b/src/main/docbkx/book.xml
@@ -68,8 +68,8 @@
-
-
+
+ Data Model
@@ -658,7 +658,7 @@ htable.put(put);
-
+ HBase and MapReduce
@@ -1319,11 +1319,16 @@ scan.setFilter(list);
Column Value
SingleColumnValueFilter
- SingleColumnValueFilter
- can be used to test column values for equivalence (CompareOp.EQUAL
- ), inequality (CompareOp.NOT_EQUAL), or ranges
- (e.g., CompareOp.GREATER). The folowing is example of testing equivalence a column to a String value "my value"...
-
+ SingleColumnValueFilter can be used to test column values for equivalence
+ (CompareOp.EQUAL), inequality (CompareOp.NOT_EQUAL), or ranges (e.g.,
+ CompareOp.GREATER). The following is an example of testing the equivalence of a
+ column to a String value "my value"...
+
SingleColumnValueFilter filter = new SingleColumnValueFilter(
cf,
column,
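+ 	// a hedged sketch of the typical continuation: match when the column equals "my value"
+ 	CompareOp.EQUAL,
+ 	Bytes.toBytes("my value")
+ 	);
+ scan.setFilter(filter);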
@@ -1604,9 +1609,9 @@ rs.close();
Single access priority: The first time a block is loaded from HDFS it normally has this priority and it will be part of the first group to be considered
during evictions. The advantage is that scanned blocks are more likely to get evicted than blocks that are getting more usage.
- Mutli access priority: If a block in the previous priority group is accessed again, it upgrades to this priority. It is thus part of the second group
- considered during evictions.
-
+ Multi access priority: If a block in the previous priority group is accessed
+ again, it upgrades to this priority. It is thus part of the second group considered
+ during evictions.
+ In-memory access priority: If the block's family was configured to be "in-memory", it will be part of this priority regardless of the number of times it
was accessed. Catalog tables are configured like this. This group is the last one considered during evictions.
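+ As a minimal sketch of how a family ends up in the in-memory group (assuming a
+ 0.96-era Java client; admin is an HBaseAdmin, and the table/family names are illustrative):
+ HColumnDescriptor cf = new HColumnDescriptor("cf");
+ cf.setInMemory(true);   // blocks from this family get in-memory priority in the block cache
+ HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("t"));
+ desc.addFamily(cf);
+ admin.createTable(desc);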
@@ -2533,9 +2538,8 @@ All the settings that apply to normal compactions (file size limits, etc.) apply
Can I change a table's rowkeys?
-
- This is a very common quesiton. You can't. See .
-
+ This is a very common question. You can't. See .
diff --git a/src/main/docbkx/configuration.xml b/src/main/docbkx/configuration.xml
index 8811d60a63b..45bc157742b 100644
--- a/src/main/docbkx/configuration.xml
+++ b/src/main/docbkx/configuration.xml
@@ -29,7 +29,7 @@
Apache HBase Configuration
This chapter is the Not-So-Quick start guide to Apache HBase configuration. It goes
over system requirements, Hadoop setup, the different Apache HBase run modes, and the
- various configurations in HBase. Please read this chapter carefully. At a mimimum
+ various configurations in HBase. Please read this chapter carefully. At a minimum
ensure that all have
been satisfied. Failure to do so will cause you (and us) grief debugging strange errors
and/or data loss.
@@ -778,7 +778,7 @@ stopping hbase............... Shutdown can take a moment to
hbase-env.sh
Set HBase environment variables in this file.
Examples include options to pass the JVM on start of
- an HBase daemon such as heap size and garbarge collector configs.
+ an HBase daemon such as heap size and garbage collector configs.
You can also set configurations for HBase configuration, log directories,
niceness, ssh options, where to locate process pid files,
etc. Open the file at
diff --git a/src/main/docbkx/developer.xml b/src/main/docbkx/developer.xml
index 3475c8dd421..dd469b65d60 100644
--- a/src/main/docbkx/developer.xml
+++ b/src/main/docbkx/developer.xml
@@ -492,12 +492,13 @@ HBase have a character not usually seen in other projects.Apache HBase ModulesAs of 0.96, Apache HBase is split into multiple modules which creates "interesting" rules for
-how and where tests are written. If you are writting code for hbase-server, see
- for how to write your tests; these tests can spin
-up a minicluster and will need to be categorized. For any other module, for example
-hbase-common, the tests must be strict unit tests and just test the class
-under test - no use of the HBaseTestingUtility or minicluster is allowed (or even possible
-given the dependency tree).
+ how and where tests are written. If you are writing code for
+ hbase-server, see for
+ how to write your tests; these tests can spin up a minicluster and will need to be
+ categorized. For any other module, for example hbase-common,
+ the tests must be strict unit tests and just test the class under test - no use of
+ the HBaseTestingUtility or minicluster is allowed (or even possible given the
+ dependency tree).
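+ For instance, a categorized strict unit test might look like this (a hedged sketch;
+ the class and assertion are illustrative):
+ import static org.junit.Assert.assertEquals;
+ import org.apache.hadoop.hbase.SmallTests;
+ import org.apache.hadoop.hbase.util.Bytes;
+ import org.junit.Test;
+ import org.junit.experimental.categories.Category;
+
+ // A small test: no minicluster and no HBaseTestingUtility needed.
+ @Category(SmallTests.class)
+ public class TestExample {
+   @Test
+   public void testBytesRoundTrip() {
+     assertEquals("x", Bytes.toString(Bytes.toBytes("x")));
+   }
+ }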
Running Tests in other Modules
If the module you are developing in has no other dependencies on other HBase modules, then
@@ -643,22 +644,22 @@ error will be reported when a non-existent test case is specified.
Running tests faster
-
-By default, $ mvn test -P runAllTests runs 5 tests in parallel.
-It can be increased on a developer's machine. Allowing that you can have 2
-tests in parallel per core, and you need about 2Gb of memory per test (at the
-extreme), if you have an 8 core, 24Gb box, you can have 16 tests in parallel.
-but the memory available limits it to 12 (24/2), To run all tests with 12 tests
-in parallell, do this:
-mvn test -P runAllTests -Dsurefire.secondPartThreadCount=12.
-To increase the speed, you can as well use a ramdisk. You will need 2Gb of memory
-to run all tests. You will also need to delete the files between two test run.
-The typical way to configure a ramdisk on Linux is:
-$ sudo mkdir /ram2G
+ By default, $ mvn test -P runAllTests runs 5 tests in parallel. This can be
+ increased on a developer's machine. Assuming that you can have 2 tests in
+ parallel per core, and that you need about 2Gb of memory per test (at the
+ extreme), an 8 core, 24Gb box could run 16 tests in parallel, but the
+ memory available limits it to 12 (24/2). To run all tests with 12 tests in
+ parallel, do this: mvn test -P runAllTests
+ -Dsurefire.secondPartThreadCount=12. To increase the speed, you
+ can also use a ramdisk. You will need 2Gb of memory to run all tests. You
+ will also need to delete the files between two test runs. The typical way to
+ configure a ramdisk on Linux is:
+ $ sudo mkdir /ram2G
sudo mount -t tmpfs -o size=2048M tmpfs /ram2G
-You can then use it to run all HBase tests with the command:
-mvn test -P runAllTests -Dsurefire.secondPartThreadCount=12 -Dtest.build.data.basedirectory=/ram2G
-
+ You can then use it to run all HBase tests with the command: mvn test
+ -P runAllTests -Dsurefire.secondPartThreadCount=12
+ -Dtest.build.data.basedirectory=/ram2G
+
@@ -818,7 +819,7 @@ This actually runs ALL the integration tests.
mvn failsafe:verify
The above command basically looks at all the test results (so don't remove the 'target' directory) for test failures and reports the results.
-
+ Running a subset of Integration tests
+ This is very similar to how you specify running a subset of unit tests (see above), but use the property
it.test instead of test.
@@ -970,9 +971,9 @@ pecularity that is probably fixable but we've not spent the time trying to figur
Similarly, for 3.0, you would just replace the profile value. Note that Hadoop-3.0.0-SNAPSHOT does not currently have a
deployed maven artifact - you will need to build and install your own in your local maven repository if you want to run against this profile.
-
- In earilier verions of Apache HBase, you can build against older versions of Apache Hadoop, notably, Hadoop 0.22.x and 0.23.x.
- If you are running, for example HBase-0.94 and wanted to build against Hadoop 0.23.x, you would run with:
+ In earlier versions of Apache HBase, you can build against older versions of Apache
+ Hadoop, notably, Hadoop 0.22.x and 0.23.x. If you are running, for example,
+ HBase-0.94 and want to build against Hadoop 0.23.x, you would run with:
+ mvn -Dhadoop.profile=22 ...
@@ -1420,8 +1421,7 @@ Bar bar = foo.getBar(); <--- imagine there's an extra space(s) after the
Committers do this. See How To Commit in the Apache HBase wiki.
- Commiters will also resolve the Jira, typically after the patch passes a build.
-
+ Committers will also resolve the Jira, typically after the patch passes a build.
Committers are responsible for making sure commits do not break the build or tests
diff --git a/src/main/docbkx/external_apis.xml b/src/main/docbkx/external_apis.xml
index e4c9f23f59c..9e3ed372798 100644
--- a/src/main/docbkx/external_apis.xml
+++ b/src/main/docbkx/external_apis.xml
@@ -295,7 +295,8 @@
Description: This filter takes two
arguments – a limit and offset. It returns limit number of columns after offset number
of columns. It does this for all the rows
- Syntax: ColumnPaginationFilter(‘<limit>’, ‘<offest>’)
+ Syntax: ColumnPaginationFilter(‘<limit>’,
+ ‘<offset>’)
Example: "ColumnPaginationFilter (3, 5)"
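+ For the Java API, a minimal sketch of the same filter (constructor order is
+ limit then offset, matching the shell syntax above):
+ Scan scan = new Scan();
+ // return at most 3 columns per row, skipping the first 5 columns of each row
+ scan.setFilter(new ColumnPaginationFilter(3, 5));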
diff --git a/src/main/docbkx/ops_mgt.xml b/src/main/docbkx/ops_mgt.xml
index aab8928e062..dbd6d17bf37 100644
--- a/src/main/docbkx/ops_mgt.xml
+++ b/src/main/docbkx/ops_mgt.xml
@@ -36,10 +36,11 @@
Here we list HBase tools for administration, analysis, fixup, and debugging.
Canary
-There is a Canary class can help users to canary-test the HBase cluster status, with every column-family for every regions or regionservers granularity. To see the usage,
-$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -help
-Will output
-Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary [opts] [table1 [table2]...] | [regionserver1 [regionserver2]..]
+There is a Canary class that can help users canary-test the HBase cluster status, at the
+ granularity of every column-family of every region, or of every regionserver. To see the usage, run
+ $ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -help
+ which will output
+ Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary [opts] [table1 [table2]...] | [regionserver1 [regionserver2]..]
where [opts] are:
-help Show this help and exit.
-regionserver replace the table argument to regionserver,
@@ -49,25 +50,46 @@ Will output
-e Use region/regionserver as regular expression
which means the region/regionserver is regular expression pattern
-f <B> stop whole program if first error occurs, default is true
- -t <N> timeout for a check, default is 600000 (milisecs)
-This tool will return non zero error codes to user for collaborating with other monitoring tools, such as Nagios.
-The error code definitions are...
-private static final int USAGE_EXIT_CODE = 1;
+ -t <N> timeout for a check, default is 600000 (milliseconds)
+ This tool will return non-zero error codes to the user so it can collaborate with other
+ monitoring tools, such as Nagios. The error code definitions are...
+ private static final int USAGE_EXIT_CODE = 1;
private static final int INIT_ERROR_EXIT_CODE = 2;
private static final int TIMEOUT_ERROR_EXIT_CODE = 3;
private static final int ERROR_EXIT_CODE = 4;
-Here are some examples based on the following given case. There are two HTable called test-01 and test-02, they have two column family cf1 and cf2 respectively, and deployed on the 3 regionservers. see following table.
-
-Following are some examples based on the previous given case.
-
+ Here are some examples based on the following case. There are two HTables called
+ test-01 and test-02; they have two column families, cf1 and cf2, respectively, and are
+ deployed on the 3 regionservers. See the following table.
Following are some examples based on the previously given case.
Canary test for every column family (store) of every region of every table:
$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary
diff --git a/src/main/docbkx/performance.xml b/src/main/docbkx/performance.xml
index 174baf2b2fb..854e7c9fa1d 100644
--- a/src/main/docbkx/performance.xml
+++ b/src/main/docbkx/performance.xml
@@ -45,10 +45,9 @@
Network
-
- Perhaps the most important factor in avoiding network issues degrading Hadoop and HBbase performance is the switching hardware
- that is used, decisions made early in the scope of the project can cause major problems when you double or triple the size of your cluster (or more).
-
+ Perhaps the most important factor in avoiding network issues degrading Hadoop and HBase
+ performance is the switching hardware that is used; decisions made early in the scope of the
+ project can cause major problems when you double or triple the size of your cluster (or more).
Important items to consider:
@@ -400,7 +399,7 @@ Deferred log flush can be configured on tables via
HBase Client: Group Puts by RegionServerIn addition to using the writeBuffer, grouping Puts by RegionServer can reduce the number of client RPC calls per writeBuffer flush.
- There is a utility HTableUtil currently on TRUNK that does this, but you can either copy that or implement your own verison for
+ There is a utility HTableUtil currently on TRUNK that does this, but you can either copy that or implement your own version for
those still on 0.90.x or earlier.
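+ For those rolling their own, a hedged sketch of the grouping idea (assuming a 0.92+
+ client where HRegionLocation has getHostnamePort(); error handling omitted):
+ // Group the puts by the server hosting each row's region, then flush one group at a time.
+ Map<String, List<Put>> byServer = new HashMap<String, List<Put>>();
+ for (Put put : puts) {
+   HRegionLocation loc = htable.getRegionLocation(put.getRow());
+   List<Put> group = byServer.get(loc.getHostnamePort());
+   if (group == null) {
+     group = new ArrayList<Put>();
+     byServer.put(loc.getHostnamePort(), group);
+   }
+   group.add(put);
+ }
+ for (List<Put> group : byServer.values()) {
+   htable.put(group);
+   htable.flushCommits();   // each flush now carries puts for a single region server
+ }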
diff --git a/src/main/docbkx/schema_design.xml b/src/main/docbkx/schema_design.xml
index 13ebc2fb40e..482d3d12a4c 100644
--- a/src/main/docbkx/schema_design.xml
+++ b/src/main/docbkx/schema_design.xml
@@ -141,7 +141,7 @@ admin.enableTable(table);
See for more information on how HBase stores data internally and why this is important.
- Attributes
+ Attributes
Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer shorter attribute names (e.g., "via")
to store in HBase.
@@ -335,10 +335,11 @@ public static byte[][] getHexSplits(String startKey, String endKey, int numRegio
Result, so anything that can be
converted to an array of bytes can be stored as a value. Input could be strings, numbers, complex objects, or even images as long as they can be rendered as bytes.
- There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase would probably be too much to ask);
- search the mailling list for conversations on this topic. All rows in HBase conform to the datamodel, and
- that includes versioning. Take that into consideration when making your design, as well as block size for the ColumnFamily.
-
+ There are practical limits to the size of values (e.g., storing 10-50MB objects in HBase
+ would probably be too much to ask); search the mailing list for conversations on this topic.
+ All rows in HBase conform to the data model, and that includes
+ versioning. Take that into consideration when making your design, as well as block size for
+ the ColumnFamily.
Counters
@@ -396,10 +397,11 @@ public static byte[][] getHexSplits(String startKey, String endKey, int numRegio
... and solutions are also influenced by the size of the cluster and how much processing power you have to throw at the solution.
Common techniques are in sub-sections below. This is a comprehensive, but not exhaustive, list of approaches.
- It should not be a surprise that secondary indexes require additional cluster space and processing.
- This is precisely what happens in an RDBMS because the act of creating an alternate index requires both space and processing cycles to update. RBDMS products
- are more advanced in this regard to handle alternative index management out of the box. However, HBase scales better at larger data volumes, so this is a feature trade-off.
-
+ It should not be a surprise that secondary indexes require additional cluster space and
+ processing. This is precisely what happens in an RDBMS because the act of creating an
+ alternate index requires both space and processing cycles to update. RDBMS products are more
+ advanced in this regard, handling alternative index management out of the box. However, HBase
+ scales better at larger data volumes, so this is a feature trade-off.
Pay attention to when implementing any of these approaches.
Additionally, see the David Butler response in this dist-list thread HBase, mail # user - Stargate+hbase
@@ -765,10 +767,11 @@ reasonable spread in the keyspace, similar options appear:
tall and wide tables. These are general guidelines and not laws - each application must consider its own needs.
Rows vs. Versions
- A common question is whether one should prefer rows or HBase's built-in-versioning. The context is typically where there are
- "a lot" of versions of a row to be retained (e.g., where it is significantly above the HBase default of 1 max versions). The
- rows-approach would require storing a timstamp in some portion of the rowkey so that they would not overwite with each successive update.
-
+ A common question is whether one should prefer rows or HBase's built-in versioning. The
+ context is typically where there are "a lot" of versions of a row to be retained (e.g.,
+ where it is significantly above the HBase default of 1 max versions). The rows-approach
+ would require storing a timestamp in some portion of the rowkey so that successive updates
+ would not overwrite one another.
Preference: Rows (generally speaking).
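+ A hedged sketch of that rows-approach (family, qualifier, and values are illustrative):
+ // Reversed timestamp in the rowkey: each update lands in its own row, newest first.
+ long reverseTs = Long.MAX_VALUE - System.currentTimeMillis();
+ byte[] rowkey = Bytes.add(Bytes.toBytes("somekey"), Bytes.toBytes(reverseTs));
+ Put put = new Put(rowkey);
+ put.add(Bytes.toBytes("cf"), Bytes.toBytes("attr"), Bytes.toBytes("somevalue"));
+ htable.put(put);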
diff --git a/src/main/docbkx/shell.xml b/src/main/docbkx/shell.xml
index 778545773ed..aaddc8d636e 100644
--- a/src/main/docbkx/shell.xml
+++ b/src/main/docbkx/shell.xml
@@ -175,18 +175,16 @@ hbase(main):018:0>
irbrc
- Create an .irbrc file for yourself in your
- home directory. Add customizations. A useful one is
- command history so commands are save across Shell invocations:
-
+ Create an .irbrc file for yourself in your home
+ directory. Add customizations. A useful one is command history so commands are saved
+ across Shell invocations:
+
$ more .irbrc
require 'irb/ext/save-history'
IRB.conf[:SAVE_HISTORY] = 100
IRB.conf[:HISTORY_FILE] = "#{ENV['HOME']}/.irb-save-history"
- See the ruby documentation of
- .irbrc to learn about other possible
- confiurations.
-
+ See the ruby documentation of .irbrc
+ to learn about other possible configurations.
LOG data to timestamp
diff --git a/src/main/docbkx/troubleshooting.xml b/src/main/docbkx/troubleshooting.xml
index d9b7009a402..40f23d124f5 100644
--- a/src/main/docbkx/troubleshooting.xml
+++ b/src/main/docbkx/troubleshooting.xml
@@ -620,18 +620,21 @@ Harsh J investigated the issue as part of the mailing list thread
Client running out of memory though heap size seems to be stable (but the off-heap/direct heap keeps growing)
-
-You are likely running into the issue that is described and worked through in
-the mail thread HBase, mail # user - Suspected memory leak
-and continued over in HBase, mail # dev - FeedbackRe: Suspected memory leak.
-A workaround is passing your client-side JVM a reasonable value for -XX:MaxDirectMemorySize. By default,
-the MaxDirectMemorySize is equal to your -Xmx max heapsize setting (if -Xmx is set).
-Try seting it to something smaller (for example, one user had success setting it to 1g when
-they had a client-side heap of 12g). If you set it too small, it will bring on FullGCs so keep
-it a bit hefty. You want to make this setting client-side only especially if you are running the new experiemental
-server-side off-heap cache since this feature depends on being able to use big direct buffers (You may have to keep
-separate client-side and server-side config dirs).
-
+ You are likely running into the issue that is described and worked through in the
+ mail thread HBase, mail # user - Suspected memory leak and continued over in HBase, mail # dev - Feedback Re: Suspected memory leak. A workaround is passing
+ your client-side JVM a reasonable value for -XX:MaxDirectMemorySize. By
+ default, the MaxDirectMemorySize is equal to your -Xmx max
+ heapsize setting (if -Xmx is set). Try setting it to something smaller (for
+ example, one user had success setting it to 1g when they had a client-side heap
+ of 12g). If you set it too small, it will bring on FullGCs so keep
+ it a bit hefty. You want to make this setting client-side only, especially if you are running
+ the new experimental server-side off-heap cache since this feature depends on being able to
+ use big direct buffers (You may have to keep separate client-side and server-side config
+ dirs).
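+ For example, in the client-side hbase-env.sh you could add (a sketch; the 1g value
+ is illustrative, not a recommendation):
+ export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize=1g"
Client Slowdown When Calling Admin Methods (flush, compact, etc.)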
@@ -728,15 +731,17 @@ Caused by: java.io.FileNotFoundException: File _partition.lst does not exist.
Browsing HDFS for HBase Objects
- Somtimes it will be necessary to explore the HBase objects that exist on HDFS. These objects could include the WALs (Write Ahead Logs), tables, regions, StoreFiles, etc.
- The easiest way to do this is with the NameNode web application that runs on port 50070. The NameNode web application will provide links to the all the DataNodes in the cluster so that
- they can be browsed seamlessly.
+ Sometimes it will be necessary to explore the HBase objects that exist on HDFS.
+ These objects could include the WALs (Write Ahead Logs), tables, regions, StoreFiles, etc.
+ The easiest way to do this is with the NameNode web application that runs on port 50070. The
+ NameNode web application will provide links to all the DataNodes in the cluster so that
+ they can be browsed seamlessly.
The HDFS directory structure of HBase tables in the cluster is...
/hbase/<Table> (Tables in the cluster)
/<Region> (Regions for the table)
- /<ColumnFamiy> (ColumnFamilies for the Region for the table)
+ /<ColumnFamily> (ColumnFamilies for the Region for the table)
/<StoreFile> (StoreFiles for the ColumnFamily for the Regions for the table)
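+ To explore the same structure programmatically, a hedged sketch with the HDFS Java
+ API (assuming the default /hbase root directory):
+ Configuration conf = HBaseConfiguration.create();
+ FileSystem fs = FileSystem.get(conf);
+ for (FileStatus status : fs.listStatus(new Path("/hbase"))) {
+   System.out.println(status.getPath());   // one line per table, plus system directories
+ }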
diff --git a/src/main/docbkx/upgrading.xml b/src/main/docbkx/upgrading.xml
index 22324a3491a..7711614d21b 100644
--- a/src/main/docbkx/upgrading.xml
+++ b/src/main/docbkx/upgrading.xml
@@ -27,9 +27,8 @@
*/
-->
Upgrading
- You cannot skip major verisons upgrading. If you are upgrading from
- version 0.90.x to 0.94.x, you must first go from 0.90.x to 0.92.x and then go
- from 0.92.x to 0.94.x.
+ You cannot skip major versions when upgrading. If you are upgrading from version 0.90.x to
+ 0.94.x, you must first go from 0.90.x to 0.92.x and then go from 0.92.x to 0.94.x.
It may be possible to skip across versions -- for example go from
0.92.2 straight to 0.98.0 just following the 0.96.x upgrade instructions --
but we have not tried it so cannot say whether it works or not.