Merge pull request #1904 from rohitkochar/doc

modifying the documentation as per Issue#1903
Fangjin Yang 2015-11-04 09:48:45 -08:00
commit 861338c5e9
1 changed files with 6 additions and 5 deletions

@@ -43,18 +43,19 @@ group-by operations, so each dimension requires the following
 three data structures:
 1. A dictionary that maps values (which are always treated as strings) to integer IDs,
-2. For each distinct value in the column, a bitmap that indicates which rows contain that value, and
-3. A list of the columns values, encoded using the dictionary in 1.
+2. A list of the columns values, encoded using the dictionary in 1, and
+3. For each distinct value in the column, a bitmap that indicates which rows contain that value.
 Why these three data structures? The dictionary simply maps string
 values to integer ids so that the values in 2 and 3 can be
-represented compactly. The bitmaps in 2 -- also known as *inverted
+represented compactly. The bitmaps in 3 -- also known as *inverted
 indexes* allow for quick filtering operations (specifically, bitmaps
 are convenient for quickly applying AND and OR operators). Finally,
-the list of values in 3 are needed for *group by* and *TopN*
+the list of values in 2 is needed for *group by* and *TopN*
 queries. In other words, queries that solely aggregate metrics based
 on filters do not need to touch the list of dimension values stored in
-3.
+2.
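
As a rough illustration of the renumbered list (1: dictionary, 2: encoded values, 3: bitmaps), the three structures can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how the segment format is actually implemented: the function names, the `page_a`/`page_b` values, the `metric` column, the sorted ID assignment, and the use of plain lists as bitmaps are all hypothetical and chosen only for readability.

```python
def build_dimension_column(values):
    """Build the three per-dimension structures for one column (illustrative only)."""
    # 1. Dictionary: distinct string value -> integer ID.
    #    Assigning IDs in sorted order is an assumption made for this sketch.
    dictionary = {value: idx for idx, value in enumerate(sorted(set(values)))}
    # 2. Column data: each row's value, encoded with the dictionary from 1.
    encoded = [dictionary[value] for value in values]
    # 3. Bitmaps: for each distinct value, one bit per row, set where the row
    #    contains that value. Plain lists of 0/1 stand in for real bitmaps.
    bitmaps = {
        value: [1 if row_value == value else 0 for row_value in values]
        for value in dictionary
    }
    return dictionary, encoded, bitmaps


def bitmap_and(a, b):
    """Row-wise AND of two bitmaps of equal length."""
    return [x & y for x, y in zip(a, b)]


def bitmap_or(a, b):
    """Row-wise OR of two bitmaps of equal length."""
    return [x | y for x, y in zip(a, b)]


# Hypothetical four-row column and an accompanying metric column.
rows = ["page_a", "page_a", "page_b", "page_b"]
metric = [10, 20, 30, 40]
dictionary, encoded, bitmaps = build_dimension_column(rows)

# A filtered aggregate reads only the bitmap (3) and the metric,
# never the encoded value list (2).
filtered_sum = sum(m for m, bit in zip(metric, bitmaps["page_a"]) if bit)

# A group-by, in contrast, walks the encoded value list (2).
id_to_value = {idx: value for value, idx in dictionary.items()}
grouped = {}
for value_id, m in zip(encoded, metric):
    key = id_to_value[value_id]
    grouped[key] = grouped.get(key, 0) + m

either_page = bitmap_or(bitmaps["page_a"], bitmaps["page_b"])
both_pages = bitmap_and(bitmaps["page_a"], bitmaps["page_b"])
```

The sketch mirrors the point made in the paragraph: the filter is answered from the bitmaps alone, while the group-by is the only step that needs the encoded list.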
To get a concrete sense of these data structures, consider the page
column from the example data above. The three data structures that