mirror of https://github.com/apache/druid.git
Merge pull request #1904 from rohitkochar/doc
modifying the documentation as per Issue#1903
commit 861338c5e9

@@ -43,18 +43,19 @@ group-by operations, so each dimension requires the following
 three data structures:
 
 1. A dictionary that maps values (which are always treated as strings) to integer IDs,
-2. For each distinct value in the column, a bitmap that indicates which rows contain that value, and
-3. A list of the column’s values, encoded using the dictionary in 1.
+2. A list of the column’s values, encoded using the dictionary in 1, and
+3. For each distinct value in the column, a bitmap that indicates which rows contain that value.
+
 
 Why these three data structures? The dictionary simply maps string
 values to integer ids so that the values in 2 and 3 can be
-represented compactly. The bitmaps in 2 -- also known as *inverted
+represented compactly. The bitmaps in 3 -- also known as *inverted
 indexes* allow for quick filtering operations (specifically, bitmaps
 are convenient for quickly applying AND and OR operators). Finally,
-the list of values in 3 are needed for *group by* and *TopN*
+the list of values in 2 is needed for *group by* and *TopN*
 queries. In other words, queries that solely aggregate metrics based
 on filters do not need to touch the list of dimension values stored in
-3.
+2.
 
 To get a concrete sense of these data structures, consider the ‘page’
 column from the example data above. The three data structures that
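
This change renumbers the three per-dimension structures. Structure 2 is now the encoded value list and structure 3 is the set of per-value bitmaps. The Python sketch below shows what each structure holds and which kind of query reads it. It is an illustration only, not Druid's implementation. Python integers stand in for compressed bitmaps, and IDs are assigned in sorted order by assumption. Every function name (`encode_dimension`, `rows_matching`, `bitmap_rows`, `count_rows`, `group_by_count`) and every sample value is hypothetical.

```python
def encode_dimension(values):
    """Build the three per-dimension structures from raw row values.

    Returns (dictionary, encoded, bitmaps):
      1. dictionary: value (as a string) -> integer ID
      2. encoded:    one dictionary ID per row, in row order
      3. bitmaps:    ID -> bitmap of the rows containing that value
    A Python int stands in for a compressed bitmap: bit i set means row i matches.
    """
    strings = [str(v) for v in values]  # values are always treated as strings
    # Assigning IDs in sorted order is a choice made for this sketch.
    dictionary = {v: i for i, v in enumerate(sorted(set(strings)))}
    encoded = [dictionary[v] for v in strings]
    bitmaps = {}
    for row, value_id in enumerate(encoded):
        bitmaps[value_id] = bitmaps.get(value_id, 0) | (1 << row)
    return dictionary, encoded, bitmaps


def rows_matching(dictionary, bitmaps, value):
    """Return the bitmap (inverted index) for one value, or 0 if it never occurs."""
    value_id = dictionary.get(str(value))
    return 0 if value_id is None else bitmaps[value_id]


def bitmap_rows(bitmap):
    """List the row numbers whose bits are set."""
    return [row for row in range(bitmap.bit_length()) if (bitmap >> row) & 1]


def count_rows(bitmap):
    """Filter-only aggregate: a row count read straight from the bitmap."""
    return bin(bitmap).count("1")


def group_by_count(dictionary, encoded, row_filter=None):
    """Count rows per value by scanning the encoded list (structure 2)."""
    id_to_value = {i: v for v, i in dictionary.items()}
    counts = {}
    for row, value_id in enumerate(encoded):
        # Skip rows whose bit is not set in the optional filter bitmap.
        if row_filter is None or (row_filter >> row) & 1:
            name = id_to_value[value_id]
            counts[name] = counts.get(name, 0) + 1
    return counts


# Hypothetical rows, not the example data the documentation refers to.
page = ["Page_A", "Page_A", "Page_B", "Page_B", "Page_C"]
language = ["en", "ja", "en", "en", "ja"]

page_dict, page_ids, page_bitmaps = encode_dimension(page)
lang_dict, _, lang_bitmaps = encode_dimension(language)

en = rows_matching(lang_dict, lang_bitmaps, "en")
# AND of two bitmaps: page = 'Page_B' AND language = 'en'
both = rows_matching(page_dict, page_bitmaps, "Page_B") & en
print(bitmap_rows(both), count_rows(both))  # [2, 3] 2
# OR within one dimension: page = 'Page_A' OR page = 'Page_C'
either = (rows_matching(page_dict, page_bitmaps, "Page_A")
          | rows_matching(page_dict, page_bitmaps, "Page_C"))
print(bitmap_rows(either))  # [0, 1, 4]
# Group by page over the language = 'en' rows: this needs the encoded list
print(group_by_count(page_dict, page_ids, en))  # {'Page_A': 1, 'Page_B': 2}
```

In this sketch, the filters and the row count read only the bitmaps. The group-by has to walk the encoded list. This mirrors the point in the updated text: queries that only aggregate based on filters do not need to touch structure 2.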