diff --git a/src/docbkx/book.xml b/src/docbkx/book.xml index f7700c464bf..2c19cef72e9 100644 --- a/src/docbkx/book.xml +++ b/src/docbkx/book.xml @@ -260,6 +260,15 @@ admin.enableTable(table); search the mailing list for conversations on this topic. All rows in HBase conform to the datamodel, and that includes versioning. Take that into consideration when making your design, as well as block size for the ColumnFamily. +
+ Counters + + One supported datatype that deserves special mention is "counters" (i.e., the ability to do atomic increments of numbers). See + Increment in HTable. + + Synchronization on counters is done on the RegionServer, not in the client. + +
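In the 0.90-era client the call behind this is HTable.incrementColumnValue(row, family, qualifier, amount) (or an Increment object), which requires a running cluster. The server-side atomicity idea can be sketched in plain Java with a hypothetical in-memory stand-in for the RegionServer's counter handling:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for RegionServer-side counters: the
// increment is applied atomically where the value lives, so concurrent
// clients never do their own read-modify-write round trip.
class CounterSketch {
    private final ConcurrentHashMap<String, AtomicLong> counters =
            new ConcurrentHashMap<>();

    // Analogous in spirit to HTable.incrementColumnValue(row, family, qualifier, amount)
    long increment(String key, long amount) {
        return counters.computeIfAbsent(key, k -> new AtomicLong()).addAndGet(amount);
    }
}
```

Two clients incrementing the same key never lose an update, which is the property the RegionServer provides for real counters.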
@@ -811,7 +820,7 @@ admin.enableTable(table); <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link> is responsible for finding RegionServers that are serving the particular row range of interest. It does this by querying - the <code>.META.</code> and <code>-ROOT</code> catalog tables + the <code>.META.</code> and <code>-ROOT-</code> catalog tables (TODO: Explain). After locating the required region(s), the client <emphasis>directly</emphasis> contacts the RegionServer serving that region (i.e., it does not go @@ -842,6 +851,11 @@ HTable table2 = new HTable(conf2, "myTable");</programlisting> For more information about how connections are handled in the HBase client, see <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnectionManager.html">HConnectionManager</link>. </para> + <section xml:id="client.connection.pooling"><title>Connection Pooling + For applications which require high-end multithreaded access (e.g., web-servers or application servers that may serve many application threads + in a single JVM), see HTablePool. + +
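The pattern behind HTablePool (reuse idle HTable instances instead of constructing one per request, since construction is not free) can be sketched generically. The class and method names below are illustrative, not the HTablePool API:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Illustrative object pool in the style of HTablePool: borrow an idle
// instance if one exists, create otherwise, and cap how many idle
// instances are retained.
class SimplePool<T> {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;

    SimplePool(int maxIdle, Supplier<T> factory) {
        this.idle = new ArrayBlockingQueue<>(maxIdle);
        this.factory = factory;
    }

    T borrow() {
        T t = idle.poll();                      // reuse an idle instance if available
        return (t != null) ? t : factory.get(); // otherwise create a fresh one
    }

    void release(T t) {
        idle.offer(t);                          // back to the pool; dropped if full
    }
}
```

Each application thread borrows a handle for the duration of a request and releases it afterwards, amortizing construction cost across requests.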
WriteBuffer and Batch Methods If autoflush is turned off on @@ -1055,7 +1069,7 @@ HTable table2 = new HTable(conf2, "myTable"); For a description of how a minor compaction picks files to compact, see the ascii diagram in the Store source code. After a major compaction runs there will be a single storefile per store, and this usually helps performance. Caution: major compactions rewrite all of the store's data and on a loaded system, this may not be tenable; - major compactions will usually have to be on large systems. + major compactions will usually have to be done manually on large systems. See .
diff --git a/src/docbkx/configuration.xml b/src/docbkx/configuration.xml index 6fb434b2605..3595e766a19 100644 --- a/src/docbkx/configuration.xml +++ b/src/docbkx/configuration.xml @@ -1076,7 +1076,18 @@ script to perform a network IO safe rolling split of all regions. - +
Managed Compactions + A common administrative technique is to manage major compactions manually, rather than letting + HBase do it. By default, HConstants.MAJOR_COMPACTION_PERIOD is one day and major compactions + may kick in when you least desire it - especially on a busy system. To "turn off" automatic major compactions, set + the value to Long.MAX_VALUE. + + It is important to stress that major compactions are absolutely necessary for StoreFile cleanup; the only variable is when + they occur. They can be administered through the HBase shell, or via + HBaseAdmin. + +
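The period is exposed in hbase-site.xml as hbase.hregion.majorcompaction, in milliseconds (this is the setting behind HConstants.MAJOR_COMPACTION_PERIOD; verify the name against your version's hbase-default.xml). Setting it to Long.MAX_VALUE might look like:

```xml
<!-- hbase-site.xml: effectively disable time-based major compactions -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <!-- Long.MAX_VALUE, in milliseconds -->
  <value>9223372036854775807</value>
</property>
```

Major compactions can then be triggered on demand, e.g. major_compact 'mytable' from the HBase shell, or HBaseAdmin.majorCompact from client code.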
+ diff --git a/src/docbkx/developer.xml b/src/docbkx/developer.xml index e56bcee8ed0..de4f1e2db24 100644 --- a/src/docbkx/developer.xml +++ b/src/docbkx/developer.xml @@ -35,6 +35,7 @@ git clone git://git.apache.org/hbase.git See HBASE-3678 Add Eclipse-based Apache Formatter to HBase Wiki for an Eclipse formatter to help ensure your code conforms to HBase's coding conventions. The issue includes instructions for loading the attached formatter. + Also, no @author tags - that's a rule. Quality Javadoc comments are appreciated. And include the Apache license.
Subversive Plugin @@ -129,13 +130,15 @@ mvn test -Dtest=TestXYZ
Getting Involved - HBase gets better only when people contribute! The following are highlights from the HBase wiki on - How To Contribute. + HBase gets better only when people contribute!
Mailing Lists - Sign up for the dev-list, and the user-list too for greater coverage. See the + Sign up for the dev-list and the user-list. See the mailing lists page. + Posing questions - and helping to answer other people's questions - is encouraged! + There are varying levels of experience on both lists so patience and politeness are encouraged (and please + stay on topic.)
@@ -144,14 +147,19 @@ mvn test -Dtest=TestXYZ If it's either a new feature request, enhancement, or a bug, file a ticket.
+
Codelines + Most development is done on TRUNK. However, there are branches for minor releases (e.g., 0.90.1, 0.90.2, and 0.90.3 are on the 0.90 branch). + If you have any questions on this just send an email to the dev dist-list. +
Submitting Patches
Create Patch - Patch files can be easily generated from Eclipse, for example by selecting Team -> Create Patch. + Patch files can be easily generated from Eclipse, for example by selecting "Team -> Create Patch". Please submit one patch-file per Jira. For example, if multiple files are changed make sure the selected resource when generating the patch is a directory. Patch files can reflect changes in multiple files. + Make sure you review for code style.
Patch File Naming @@ -162,12 +170,14 @@ mvn test -Dtest=TestXYZ
Unit Tests Yes, please. Please try to include unit tests with every code patch (and especially new classes and large changes). + Also, please make sure unit tests pass locally before submitting the patch.
Attach Patch to Jira - The patch should be attached to the associated Jira ticket. + The patch should be attached to the associated Jira ticket "More Actions -> Attach Files". Make sure you click the + ASF license inclusion, otherwise the patch can't be considered for inclusion. - Once attached to the ticket, click "submit patch" and + Once attached to the ticket, click "Submit Patch" and the status of the ticket will change. Committers will review submitted patches for inclusion into the codebase. Please understand that not every patch may get committed, and that feedback will likely be provided on the patch. Fear not, though, because the HBase community is helpful! @@ -177,7 +187,7 @@ mvn test -Dtest=TestXYZ
Committing Patches - See How To Commit in the HBase wiki. + Committers do this. See How To Commit in the HBase wiki.
diff --git a/src/docbkx/performance.xml b/src/docbkx/performance.xml index 796f09db1f6..d8e104f7d7e 100644 --- a/src/docbkx/performance.xml +++ b/src/docbkx/performance.xml @@ -232,10 +232,13 @@ Deferred log flush can be configured on tables via
MapReduce: Skip The Reducer - When writing a lot of data to an HBase table in a in a Mapper (e.g., with TableOutputFormat), - skip the Reducer step whenever possible. When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then shuffled to other - Reducers that will most likely be off-node. + When writing a lot of data to an HBase table from a MR job (e.g., with TableOutputFormat), and specifically where Puts are being emitted + from the Mapper, skip the Reducer step. When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other + Reducers that will most likely be off-node. It's far more efficient to just write directly to HBase. + + For summary jobs where HBase is used as a source and a sink, the writes will be coming from the Reducer step (e.g., summarize values then write out the result). + This is a different processing problem from the above case.
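In Hadoop, a map-only job is requested with job.setNumReduceTasks(0). The cost difference described above can be sketched with plain collections standing in for HBase; everything below is illustrative, not Hadoop or HBase API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative contrast between the two paths: a map-only job writes each
// record straight to the sink, while a Reducer path first buffers and
// sorts/shuffles the mapper output (spooled to disk in a real job).
class MapOnlySketch {
    static List<String> mapOnly(List<String> records, List<String> sink) {
        for (String r : records) sink.add(r);      // direct write, no intermediate copy
        return sink;
    }

    static List<String> viaReducer(List<String> records, List<String> sink) {
        List<String> spill = new ArrayList<>(records); // mapper output spooled
        Collections.sort(spill);                       // the sort/shuffle step
        sink.addAll(spill);                            // only now reaches the sink
        return sink;
    }
}
```

The extra spool-and-sort buys nothing when the records are independent Puts, which is why the map-only path is preferred in that case.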