18 KiB
id | title | sidebar_label |
---|---|---|
filters | Query filters | Filters |
Apache Druid supports two query languages: Druid SQL and native queries. This document describes the native language. For information about aggregators available in SQL, refer to the SQL documentation.
A filter is a JSON object indicating which rows of data should be included in the computation for a query. It’s essentially the equivalent of the WHERE clause in SQL. Filters are commonly applied on dimensions, but can be applied on aggregated metrics, for example, see Filtered aggregator and Having filters.
Apache Druid supports the following types of filters.
Selector filter
The simplest filter is a selector filter. The selector filter will match a specific dimension with a specific value. Selector filters can be used as the base filters for more complex Boolean expressions of filters.
The grammar for a SELECTOR filter is as follows:
"filter": { "type": "selector", "dimension": <dimension_string>, "value": <dimension_value_string> }
This is the equivalent of WHERE <dimension_string> = '<dimension_value_string>'
or WHERE <dimension_string> IS NULL
(if the value
is null
).
The selector filter supports the use of extraction functions, see Filtering with Extraction Functions for details.
Column comparison filter
The column comparison filter is similar to the selector filter, but instead compares dimensions to each other. For example:
"filter": { "type": "columnComparison", "dimensions": [<dimension_a>, <dimension_b>] }
This is the equivalent of WHERE <dimension_a> = <dimension_b>
.
dimensions
is list of DimensionSpecs, making it possible to apply an extraction function if needed.
Regular expression filter
The regular expression filter is similar to the selector filter, but using regular expressions. It matches the specified dimension with the given pattern. The pattern can be any standard Java regular expression.
"filter": { "type": "regex", "dimension": <dimension_string>, "pattern": <pattern_string> }
The regex filter supports the use of extraction functions, see Filtering with Extraction Functions for details.
Logical expression filters
AND
The grammar for an AND filter is as follows:
"filter": { "type": "and", "fields": [<filter>, <filter>, ...] }
The filters in fields can be any other filter defined on this page.
OR
The grammar for an OR filter is as follows:
"filter": { "type": "or", "fields": [<filter>, <filter>, ...] }
The filters in fields can be any other filter defined on this page.
NOT
The grammar for a NOT filter is as follows:
"filter": { "type": "not", "field": <filter> }
The filter specified at field can be any other filter defined on this page.
JavaScript filter
The JavaScript filter matches a dimension against the specified JavaScript function predicate. The filter matches values for which the function returns true.
The function takes a single argument, the dimension value, and returns either true or false.
{
"type" : "javascript",
"dimension" : <dimension_string>,
"function" : "function(value) { <...> }"
}
Example
The following matches any dimension values for the dimension name
between 'bar'
and 'foo'
{
"type" : "javascript",
"dimension" : "name",
"function" : "function(x) { return(x >= 'bar' && x <= 'foo') }"
}
The JavaScript filter supports the use of extraction functions, see Filtering with Extraction Functions for details.
JavaScript-based functionality is disabled by default. Please refer to the Druid JavaScript programming guide for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.
Extraction filter
The extraction filter is now deprecated. The selector filter with an extraction function specified provides identical functionality and should be used instead.
Extraction filter matches a dimension using some specific Extraction function.
The following filter matches the values for which the extraction function has transformation entry input_key=output_value
where
output_value
is equal to the filter value
and input_key
is present as dimension.
Example
The following matches dimension values in [product_1, product_3, product_5]
for the column product
{
"filter": {
"type": "extraction",
"dimension": "product",
"value": "bar_1",
"extractionFn": {
"type": "lookup",
"lookup": {
"type": "map",
"map": {
"product_1": "bar_1",
"product_5": "bar_1",
"product_3": "bar_1"
}
}
}
}
}
Search filter
Search filters can be used to filter on partial string matches.
{
"filter": {
"type": "search",
"dimension": "product",
"query": {
"type": "insensitive_contains",
"value": "foo"
}
}
}
property | description | required? |
---|---|---|
type | This String should always be "search". | yes |
dimension | The dimension to perform the search over. | yes |
query | A JSON object for the type of search. See search query spec for more information. | yes |
extractionFn | Extraction function to apply to the dimension | no |
The search filter supports the use of extraction functions, see Filtering with Extraction Functions for details.
Search query spec
Contains
property | description | required? |
---|---|---|
type | This String should always be "contains". | yes |
value | A String value to run the search over. | yes |
caseSensitive | Whether two string should be compared as case sensitive or not | no (default == false) |
Insensitive Contains
property | description | required? |
---|---|---|
type | This String should always be "insensitive_contains". | yes |
value | A String value to run the search over. | yes |
Note that an "insensitive_contains" search is equivalent to a "contains" search with "caseSensitive": false (or not provided).
Fragment
property | description | required? |
---|---|---|
type | This String should always be "fragment". | yes |
values | A JSON array of String values to run the search over. | yes |
caseSensitive | Whether strings should be compared as case sensitive or not. Default: false(insensitive) | no |
In filter
In filter can be used to express the following SQL query:
SELECT COUNT(*) AS 'Count' FROM `table` WHERE `outlaw` IN ('Good', 'Bad', 'Ugly')
The grammar for a "in" filter is as follows:
{
"type": "in",
"dimension": "outlaw",
"values": ["Good", "Bad", "Ugly"]
}
The "in" filter supports the use of extraction functions, see Filtering with Extraction Functions for details.
If an empty values
array is passed to the "in" filter, it will simply return an empty result.
If the dimension
is a multi-valued dimension, the "in" filter will return true if one of the dimension values is
in the values
array.
If the values
array contains null
, the "in" filter matches null values. This differs from the SQL IN filter, which
does not match NULL values.
Like filter
Like filters can be used for basic wildcard searches. They are equivalent to the SQL LIKE operator. Special characters supported are "%" (matches any number of characters) and "_" (matches any one character).
property | type | description | required? |
---|---|---|---|
type | String | This should always be "like". | yes |
dimension | String | The dimension to filter on | yes |
pattern | String | LIKE pattern, such as "foo%" or "___bar". | yes |
escape | String | An escape character that can be used to escape special characters. | no |
extractionFn | Extraction function | Extraction function to apply to the dimension | no |
Like filters support the use of extraction functions, see Filtering with Extraction Functions for details.
This Like filter expresses the condition last_name LIKE "D%"
(i.e. last_name starts with "D").
{
"type": "like",
"dimension": "last_name",
"pattern": "D%"
}
Bound filter
Bound filters can be used to filter on ranges of dimension values. It can be used for comparison filtering like greater than, less than, greater than or equal to, less than or equal to, and "between" (if both "lower" and "upper" are set).
property | type | description | required? |
---|---|---|---|
type | String | This should always be "bound". | yes |
dimension | String | The dimension to filter on | yes |
lower | String | The lower bound for the filter | no |
upper | String | The upper bound for the filter | no |
lowerStrict | Boolean | Perform strict comparison on the lower bound (">" instead of ">=") | no, default: false |
upperStrict | Boolean | Perform strict comparison on the upper bound ("<" instead of "<=") | no, default: false |
ordering | String | Specifies the sorting order to use when comparing values against the bound. Can be one of the following values: "lexicographic", "alphanumeric", "numeric", "strlen", "version". See Sorting Orders for more details. | no, default: "lexicographic" |
extractionFn | Extraction function | Extraction function to apply to the dimension | no |
Bound filters support the use of extraction functions, see Filtering with Extraction Functions for details.
The following bound filter expresses the condition 21 <= age <= 31
:
{
"type": "bound",
"dimension": "age",
"lower": "21",
"upper": "31" ,
"ordering": "numeric"
}
This filter expresses the condition foo <= name <= hoo
, using the default lexicographic sorting order.
{
"type": "bound",
"dimension": "name",
"lower": "foo",
"upper": "hoo"
}
Using strict bounds, this filter expresses the condition 21 < age < 31
{
"type": "bound",
"dimension": "age",
"lower": "21",
"lowerStrict": true,
"upper": "31" ,
"upperStrict": true,
"ordering": "numeric"
}
The user can also specify a one-sided bound by omitting "upper" or "lower". This filter expresses age < 31
.
{
"type": "bound",
"dimension": "age",
"upper": "31" ,
"upperStrict": true,
"ordering": "numeric"
}
Likewise, this filter expresses age >= 18
{
"type": "bound",
"dimension": "age",
"lower": "18" ,
"ordering": "numeric"
}
Interval filter
The Interval filter enables range filtering on columns that contain long millisecond values, with the boundaries specified as ISO 8601 time intervals. It is suitable for the __time
column, long metric columns, and dimensions with values that can be parsed as long milliseconds.
This filter converts the ISO 8601 intervals to long millisecond start/end ranges and translates to an OR of Bound filters on those millisecond ranges, with numeric comparison. The Bound filters will have left-closed and right-open matching (i.e., start <= time < end).
property | type | description | required? |
---|---|---|---|
type | String | This should always be "interval". | yes |
dimension | String | The dimension to filter on | yes |
intervals | Array | A JSON array containing ISO-8601 interval strings. This defines the time ranges to filter on. | yes |
extractionFn | Extraction function | Extraction function to apply to the dimension | no |
The interval filter supports the use of extraction functions, see Filtering with Extraction Functions for details.
If an extraction function is used with this filter, the extraction function should output values that are parseable as long milliseconds.
The following example filters on the time ranges of October 1-7, 2014 and November 15-16, 2014.
{
"type" : "interval",
"dimension" : "__time",
"intervals" : [
"2014-10-01T00:00:00.000Z/2014-10-07T00:00:00.000Z",
"2014-11-15T00:00:00.000Z/2014-11-16T00:00:00.000Z"
]
}
The filter above is equivalent to the following OR of Bound filters:
{
"type": "or",
"fields": [
{
"type": "bound",
"dimension": "__time",
"lower": "1412121600000",
"lowerStrict": false,
"upper": "1412640000000" ,
"upperStrict": true,
"ordering": "numeric"
},
{
"type": "bound",
"dimension": "__time",
"lower": "1416009600000",
"lowerStrict": false,
"upper": "1416096000000" ,
"upperStrict": true,
"ordering": "numeric"
}
]
}
Filtering with extraction functions
All filters except the "spatial" filter support extraction functions. An extraction function is defined by setting the "extractionFn" field on a filter. See Extraction function for more details on extraction functions.
If specified, the extraction function will be used to transform input values before the filter is applied. The example below shows a selector filter combined with an extraction function. This filter will transform input values according to the values defined in the lookup map; transformed values will then be matched with the string "bar_1".
Example
The following matches dimension values in [product_1, product_3, product_5]
for the column product
{
"filter": {
"type": "selector",
"dimension": "product",
"value": "bar_1",
"extractionFn": {
"type": "lookup",
"lookup": {
"type": "map",
"map": {
"product_1": "bar_1",
"product_5": "bar_1",
"product_3": "bar_1"
}
}
}
}
}
Column types
Druid supports filtering on timestamp, string, long, and float columns.
Note that only string columns have bitmap indexes. Therefore, queries that filter on other column types will need to scan those columns.
Filtering on numeric columns
When filtering on numeric columns, you can write filters as if they were strings. In most cases, your filter will be converted into a numeric predicate and will be applied to the numeric column values directly. In some cases (such as the "regex" filter) the numeric column values will be converted to strings during the scan.
For example, filtering on a specific value, myFloatColumn = 10.1
:
"filter": {
"type": "selector",
"dimension": "myFloatColumn",
"value": "10.1"
}
Filtering on a range of values, 10 <= myFloatColumn < 20
:
"filter": {
"type": "bound",
"dimension": "myFloatColumn",
"ordering": "numeric",
"lower": "10",
"lowerStrict": false,
"upper": "20",
"upperStrict": true
}
Filtering on the timestamp column
Query filters can also be applied to the timestamp column. The timestamp column has long millisecond values. To refer
to the timestamp column, use the string __time
as the dimension name. Like numeric dimensions, timestamp filters
should be specified as if the timestamp values were strings.
If the user wishes to interpret the timestamp with a specific format, timezone, or locale, the Time Format Extraction Function is useful.
For example, filtering on a long timestamp value:
"filter": {
"type": "selector",
"dimension": "__time",
"value": "124457387532"
}
Filtering on day of week:
"filter": {
"type": "selector",
"dimension": "__time",
"value": "Friday",
"extractionFn": {
"type": "timeFormat",
"format": "EEEE",
"timeZone": "America/New_York",
"locale": "en"
}
}
Filtering on a set of ISO 8601 intervals:
{
"type" : "interval",
"dimension" : "__time",
"intervals" : [
"2014-10-01T00:00:00.000Z/2014-10-07T00:00:00.000Z",
"2014-11-15T00:00:00.000Z/2014-11-16T00:00:00.000Z"
]
}
True filter
The true filter is a filter which matches all values. It can be used to temporarily disable other filters without removing the filter.
{ "type" : "true" }
Expression filter
The expression filter allows for the implementation of arbitrary conditions, leveraging the Druid expression system.
This filter allows for more flexibility, but it might be less performant than a combination of the other filters on this page due to the fact that not all filter optimizations are in place yet.
{
"type" : "expression" ,
"expression" : "((product_type == 42) && (!is_deleted))"
}
See the Druid expression system for more details.