diff --git a/docs/content/ingestion/schema-design.md b/docs/content/ingestion/schema-design.md
index 07ea4e77fd6..aaa28dd6b10 100644
--- a/docs/content/ingestion/schema-design.md
+++ b/docs/content/ingestion/schema-design.md
@@ -24,8 +24,11 @@ Below, we outline some best practices with schema design:
 
 If the user wishes to ingest a column as a numeric-typed dimension (Long or Float), it is necessary to specify the type of the column in the `dimensions` section of the `dimensionsSpec`. If the type is omitted, Druid will ingest a column as the default String type.
 
-See [Dimension Schema](../ingestion/index.html#dimension-schema) for more information.
+There are performance tradeoffs between string and numeric columns. Numeric columns are generally faster to group on
+than string columns. But unlike string columns, numeric columns don't have indexes, so they are generally slower to
+filter on.
 
+See [Dimension Schema](../ingestion/index.html#dimension-schema) for more information.
 
 ## High cardinality dimensions (e.g. unique IDs)
 
diff --git a/docs/content/querying/filters.md b/docs/content/querying/filters.md
index c9f51a70149..59d17f14c60 100644
--- a/docs/content/querying/filters.md
+++ b/docs/content/querying/filters.md
@@ -392,20 +392,53 @@ The following matches dimension values in `[product_1, product_3, product_5]` fo
 }
 ```
 
+## Column types
+
+Druid supports filtering on timestamp, string, long, and float columns.
+
+Note that only string columns have bitmap indexes. Therefore, queries that filter on other column types will need to
+scan those columns.
+
+### Filtering on numeric columns
+
+When filtering on numeric columns, you can write filters as if they were strings. In most cases, your filter will be
+converted into a numeric predicate and will be applied to the numeric column values directly. In some cases (such as
+the "regex" filter) the numeric column values will be converted to strings during the scan.
+
+For example, filtering on a specific value, `myFloatColumn = 10.1`:
+
+```json
+"filter": {
+  "type": "selector",
+  "dimension": "myFloatColumn",
+  "value": "10.1"
+}
+```
+
+Filtering on a range of values, `10 <= myFloatColumn < 20`:
+
+```json
+"filter": {
+  "type": "bound",
+  "dimension": "myFloatColumn",
+  "ordering": "numeric",
+  "lowerBound": "10",
+  "lowerStrict": false,
+  "upperBound": "20",
+  "upperStrict": true
+}
+```
+
 ### Filtering on the Timestamp Column
 
-Filters can also be applied to the timestamp column. The timestamp column has long millisecond values.
-To refer to the timestamp column, use the string `__time` as the dimension name.
-
-The filter parameters (e.g., the selector value for the SelectorFilter) should be provided as Strings.
+Query filters can also be applied to the timestamp column. The timestamp column has long millisecond values. To refer
+to the timestamp column, use the string `__time` as the dimension name. Like numeric dimensions, timestamp filters
+should be specified as if the timestamp values were strings.
 
 If the user wishes to interpret the timestamp with a specific format, timezone, or locale, the [Time Format Extraction Function](./dimensionspecs.html#time-format-extraction-function) is useful.
 
-Note that the timestamp column does not have a bitmap index. Thus, filtering on timestamp in a query requires a scan of the column, and performance will be affected accordingly. If possible, excluding time ranges by specifying the query interval will be faster.
+For example, filtering on a long timestamp value:
 
-**Example**
-
-Filtering on a long timestamp value:
 ```json
 "filter": {
   "type": "selector",
@@ -415,6 +448,7 @@ Filtering on a long timestamp value:
 ```
 
 Filtering on day of week:
+
 ```json
 "filter": {
   "type": "selector",
@@ -430,6 +464,7 @@ Filtering on day of week:
 ```
 
 Filtering on a set of ISO 8601 intervals:
+
 ```json
 {
   "type" : "interval",