druid/docs/content/DimensionSpecs.md

Transforming Dimension Values

The following JSON fields can be used in a query to operate on dimension values.

DimensionSpec

DimensionSpecs define how dimension values are transformed prior to aggregation.

Default DimensionSpec

Returns dimension values as is and optionally renames the dimension.

{ "type" : "default", "dimension" : <dimension>, "outputName": <output_name> }
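To show where a DimensionSpec sits in a query, here is a minimal sketch that assumes a groupBy query. The data source `sample_datasource`, the dimension `country`, the output name `countryName`, and the interval are hypothetical placeholders, not values from this page.

```json
{
  "queryType" : "groupBy",
  "dataSource" : "sample_datasource",
  "granularity" : "all",
  "dimensions" : [
    { "type" : "default", "dimension" : "country", "outputName" : "countryName" }
  ],
  "aggregations" : [
    { "type" : "count", "name" : "rows" }
  ],
  "intervals" : [ "2015-01-01/2015-01-02" ]
}
```

In this sketch the values of `country` are returned unchanged, and only the name of the result column differs from the stored dimension name.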

Extraction DimensionSpec

Returns dimension values transformed using the given extraction function.

{
  "type" : "extraction",
  "dimension" : <dimension>,
  "outputName" :  <output_name>,
  "extractionFn" : <extraction_function>
}

Extraction Functions

Extraction functions define the transformation applied to each dimension value.

Transformations can be applied both to regular (string) dimensions and to the special __time dimension, which represents the current time bucket according to the query aggregation granularity.

Note: for functions that take string values (such as regular expressions), __time dimension values are formatted as ISO-8601 strings before being passed to the extraction function.

Regular Expression Extraction Function

Returns the first matching group for the given regular expression. If there is no match, it returns the dimension value as is.

{ "type" : "regex", "expr" : <regular_expression> }

For example, using "expr" : "(\\w\\w\\w).*" will transform 'Monday', 'Tuesday', 'Wednesday' into 'Mon', 'Tue', 'Wed'.
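Put together with an extraction DimensionSpec, the regular expression above might be used as follows. This is a sketch only: the dimension name `dayName` and the output name `dayAbbreviation` are hypothetical.

```json
{
  "type" : "extraction",
  "dimension" : "dayName",
  "outputName" : "dayAbbreviation",
  "extractionFn" : { "type" : "regex", "expr" : "(\\w\\w\\w).*" }
}
```

Note that each backslash in the regular expression is doubled because it appears inside a JSON string.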

Partial Extraction Function

Returns the dimension value unchanged if the regular expression matches, otherwise returns null.

{ "type" : "partial", "expr" : <regular_expression> }
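As an illustrative sketch, the following function keeps only the values that start with a capital "T". The expression is a made-up example chosen for this page, not one taken from the Druid sources.

```json
{ "type" : "partial", "expr" : "^T.*" }
```

Following the description above, 'Tuesday' and 'Thursday' would be returned unchanged, while 'Monday' would be returned as null.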

Search Query Extraction Function

Returns the dimension value unchanged if the given SearchQuerySpec matches, otherwise returns null.

{ "type" : "searchQuery", "query" : <search_query_spec> }
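A possible instance, assuming the `insensitive_contains` SearchQuerySpec type described in the SearchQuerySpec documentation is available in your version, is sketched below. The search value `day` is a hypothetical example.

```json
{
  "type" : "searchQuery",
  "query" : { "type" : "insensitive_contains", "value" : "day" }
}
```

Under that assumption, values containing "day" in any letter case are returned unchanged and all other values are returned as null.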

Time Format Extraction Function

Returns the dimension value formatted according to the given format string, time zone, and locale.

For __time dimension values, this formats the time value bucketed by the aggregation granularity.

For a regular dimension, it assumes the string is formatted in ISO-8601 date and time format.

{ "type" : "timeFormat",
  "format" : <output_format>,
  "timeZone" : <time_zone> (optional),
  "locale" : <locale> (optional) }

For example, the following dimension spec returns the day of the week for Montréal in French:

{
  "type" : "extraction",
  "dimension" : "__time",
  "outputName" :  "dayOfWeek",
  "extractionFn" : {
    "type" : "timeFormat",
    "format" : "EEEE",
    "timeZone" : "America/Montreal",
    "locale" : "fr"
  }
}

Time Parsing Extraction Function

Parses dimension values as timestamps using the given input format, and returns them formatted using the given output format.

Note: if you are working with the __time dimension, consider using the time format extraction function instead, which works on time values directly rather than on string values.

Time formats are described in the SimpleDateFormat documentation.

{ "type" : "time",
  "timeFormat" : <input_format>,
  "resultFormat" : <output_format> }
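For instance, a function along the following lines could reduce date strings to their year. This is a sketch that assumes the dimension values look like "2014-03-15". Both patterns are ordinary SimpleDateFormat patterns chosen for illustration.

```json
{
  "type" : "time",
  "timeFormat" : "yyyy-MM-dd",
  "resultFormat" : "yyyy"
}
```

With these formats, a value such as "2014-03-15" would be expected to come back as "2014".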

JavaScript Extraction Function

Returns the dimension value, as transformed by the given JavaScript function.

For regular dimensions, the input value is passed as a string.

For the __time dimension, the input value is passed as a number representing the number of milliseconds since January 1, 1970 UTC.

Example for a regular dimension:

{
  "type" : "javascript",
  "function" : "function(str) { return str.substr(0, 3); }"
}

Example for the __time dimension:

{
  "type" : "javascript",
  "function" : "function(t) { return 'Second ' + Math.floor((t % 60000) / 1000); }"
}
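Like any other extraction function, a JavaScript function is used inside an extraction DimensionSpec. The sketch below wraps the __time example above. The output name `second` is a hypothetical choice.

```json
{
  "type" : "extraction",
  "dimension" : "__time",
  "outputName" : "second",
  "extractionFn" : {
    "type" : "javascript",
    "function" : "function(t) { return 'Second ' + Math.floor((t % 60000) / 1000); }"
  }
}
```

Because the function body is embedded in a JSON string, single quotes are used for JavaScript string literals to avoid escaping.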