diff --git a/src/docbkx/book.xml b/src/docbkx/book.xml
index 9b652d51b58..ffac6377983 100644
--- a/src/docbkx/book.xml
+++ b/src/docbkx/book.xml
@@ -179,7 +179,7 @@ admin.enableTable(table);
 On the number of column families
- HBase currently does not do well with anything about two or three column families so keep the number
+ HBase currently does not do well with anything above two or three column families so keep the number
 of column families in your schema low. Currently, flushing and compactions are done on a per Region basis so if one column family is carrying the bulk of the data bringing on flushes, the adjacent families will also be flushed though the amount of data they carry is small. Compaction is currently triggered
@@ -187,7 +187,7 @@ admin.enableTable(table);
 flushing and compaction interaction can make for a bunch of needless i/o loading (To be addressed by changing flushing and compaction to work on a per column family basis).
- Try to make do with one column famliy if you can in your schemas. Only introduce a
+ Try to make do with one column family if you can in your schemas. Only introduce a
 second and third column family in the case where data access is usually column scoped; i.e. you query one column family or the other but usually not both at the one time.
@@ -214,7 +214,7 @@ admin.enableTable(table);
 Or why are my storefile indices large? In HBase, values are always freighted with their coordinates; as a cell value passes through the system, it'll be accompanied by its
- row, column name, and timestamp. Always. If your rows and column names
+ row, column name, and timestamp - always. If your rows and column names
 are large, especially compared to the size of the cell value, then you may run up against some interesting scenarios. One such is the case described by Marc Limotte at the tail of
@@ -231,6 +231,8 @@ admin.enableTable(table);
 the thread a question storefileIndexSize up on the user mailing list.
+ In summary, although verbose attribute names (e.g., "myImportantAttribute") are easier to read, you pay for the clarity in storage and increased I/O - use shorter attribute names and constants.
+ Also, try to keep the row-keys as small as possible too.
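Reviewer note: the storage cost described in the added summary can be illustrated with a rough back-of-the-envelope sketch. It is plain Java with no HBase dependency; the class name `CellCoordinateCost`, the sample row keys, and the family and qualifier names are all hypothetical. It assumes an 8-byte timestamp per cell and ignores any fixed per-cell framing bytes, so treat the numbers as a lower bound, not as what HBase actually writes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: estimates how many "coordinate" bytes travel with each
// cell value (row key + column family + column qualifier + timestamp).
final class CellCoordinateCost {

    // Assumption: the timestamp is stored as an 8-byte long with every cell.
    static final int TIMESTAMP_BYTES = 8;

    // Byte length of a string when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Coordinate bytes repeated for every single cell value.
    // Fixed framing overhead is deliberately left out of this estimate.
    static int coordinateBytes(String row, String family, String qualifier) {
        return utf8Length(row) + utf8Length(family) + utf8Length(qualifier)
                + TIMESTAMP_BYTES;
    }

    public static void main(String[] args) {
        // Verbose, readable names versus short names kept behind constants.
        int verbose = coordinateBytes(
                "customer-0000000042", "attributes", "myImportantAttribute");
        int compact = coordinateBytes("c42", "a", "mia");

        long cells = 1_000_000_000L; // illustrative cell count, not a measurement
        System.out.println("verbose: " + verbose + " bytes per cell");
        System.out.println("compact: " + compact + " bytes per cell");
        System.out.println("difference over " + cells + " cells: "
                + (verbose - compact) * cells + " bytes");
    }
}
```

Keeping the short names in named constants (for example a `static final byte[]` per qualifier) preserves readability in client code while the stored coordinates stay small.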
diff --git a/src/docbkx/performance.xml b/src/docbkx/performance.xml
index d8e104f7d7e..f7354386e3a 100644
--- a/src/docbkx/performance.xml
+++ b/src/docbkx/performance.xml
@@ -128,12 +128,19 @@
 </section>
- <section xml:id="perf.number.of.cfs">
- <title>Number of Column Families
-
- See .
+
+ Schema Design + +
+ Number of Column Families + See . +
+
+ Key and Attribute Lengths + See . +
- +
Writing to HBase