From f0444014b8b2dd75284d317149addd8b288c91ff Mon Sep 17 00:00:00 2001 From: Doug Meil Date: Wed, 2 Nov 2011 20:40:14 +0000 Subject: [PATCH] HBASE-4730 book.xml, ops_mgt.xml, performance.xml - handful of changes git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1196792 13f79535-47bb-0310-9956-ffa450edef68 --- src/docbkx/book.xml | 18 +++++++++++++++++- src/docbkx/ops_mgt.xml | 5 +++++ src/docbkx/performance.xml | 5 +++++ 3 files changed, 27 insertions(+), 1 deletion(-) diff --git a/src/docbkx/book.xml b/src/docbkx/book.xml index 669aff03f8a..371a88e3cf1 100644 --- a/src/docbkx/book.xml +++ b/src/docbkx/book.xml @@ -565,6 +565,12 @@ admin.enableTable(table); second and third column family in the case where data access is usually column scoped; i.e. you query one column family or the other but usually not both at the one time. +
Cardinality of ColumnFamilies + Where multiple ColumnFamilies exist in a single table, be aware of the cardinality (i.e., number of rows). + If ColumnFamily-A has 1,000,000 rows and ColumnFamily-B has 1 billion rows, ColumnFamily-A's data will likely be spread + across many, many regions (and RegionServers). This makes mass scans for ColumnFamily-A less efficient. + +
Rowkey Design
@@ -972,6 +978,11 @@ public static class MyMapper extends TableMapper<ImmutableBytesWritable, Put&
+
+ HBase MapReduce Read/Write Example With Multi-Table Output + TODO: example for MultiTableOutputFormat. + +
HBase MapReduce Summary Example The following example uses HBase as a MapReduce source and sink with a summarization step. This example will @@ -1575,7 +1586,6 @@ scan.setFilter(filter); For more information, see the HFile source code.
-
HFile Tool @@ -1589,7 +1599,13 @@ scan.setFilter(filter); usage for other things to do with the HFile tool.
+
+ StoreFile Directory Structure on HDFS + For more information on what StoreFiles look like on HDFS with respect to the directory structure, see . +
+
+
Blocks StoreFiles are composed of blocks. The blocksize is configured on a per-ColumnFamily basis. diff --git a/src/docbkx/ops_mgt.xml b/src/docbkx/ops_mgt.xml index 8601427734f..fa534842564 100644 --- a/src/docbkx/ops_mgt.xml +++ b/src/docbkx/ops_mgt.xml @@ -417,6 +417,11 @@ false
+
Regions + Another common question for HBase administrators is determining the right number of regions per + RegionServer. This affects both storage and hardware planning. See . + +
diff --git a/src/docbkx/performance.xml b/src/docbkx/performance.xml index c2fffbb6716..12b1d19d006 100644 --- a/src/docbkx/performance.xml +++ b/src/docbkx/performance.xml @@ -140,6 +140,11 @@ The number of regions for an HBase table is driven by the . Also, see the architecture section on + A lower number of regions is preferred, generally in the range of 20 to 200 + per RegionServer. Adjust the regionsize as appropriate to achieve this number. There + are some clusters that set the regionsize to 20 GB, for example, so you may need to + experiment with this setting based on your hardware configuration and application needs. +