Add ability to apply extraction functions to the time dimension

- Moves DimExtractionFn under a more generic ExtractionFn interface to
  support extracting dimension values other than strings
- Pushes the extractionFn down from the query engine into the storage
  adapter
- Deprecates the 'dimExtractionFn' parameter in favor of 'extractionFn'
- Adds a TimeFormatExtractionFn, making it possible to project the
  '__time' dimension
- Renames JavascriptDimExtractionFn to JavascriptExtractionFn, adding
  support for any dimension value type that maps directly to JavaScript
- Updates documentation for time column extraction and related changes
Xavier Léauté 2015-02-26 17:13:07 -08:00
parent c97fbdf616
commit d3f5bddc5c
44 changed files with 1037 additions and 231 deletions

View File

@@ -1,14 +1,16 @@
 ---
 layout: doc_page
 ---
 # Transforming Dimension Values
 The following JSON fields can be used in a query to operate on dimension values.
 ## DimensionSpec
 `DimensionSpec`s define how dimension values get transformed prior to aggregation.
-### DefaultDimensionSpec
+### Default DimensionSpec
 Returns dimension values as is and optionally renames the dimension.
@@ -16,63 +18,139 @@ Returns dimension values as is and optionally renames the dimension.
 { "type" : "default", "dimension" : <dimension>, "outputName": <output_name> }
 ```
-### ExtractionDimensionSpec
-Returns dimension values transformed using the given [DimExtractionFn](#toc_3)
+### Extraction DimensionSpec
+Returns dimension values transformed using the given [extraction function](#extraction-functions).
 ```json
 {
   "type" : "extraction",
   "dimension" : <dimension>,
   "outputName" : <output_name>,
-  "dimExtractionFn" : <dim_extraction_fn>
+  "extractionFn" : <extraction_function>
 }
 ```
-## <a id="toc_3"></a>DimExtractionFn
-`DimExtractionFn`s define the transformation applied to each dimension value
-### RegexDimExtractionFn
-Returns the first group matched by the given regular expression. If there is no match it returns the dimension value as is.
+## Extraction Functions
+Extraction functions define the transformation applied to each dimension value.
+Transformations can be applied to both regular (string) dimensions, as well
+as the special `__time` dimension, which represents the current time bucket
+according to the query [aggregation granularity](Granularities.html).
+**Note**: for functions taking string values (such as regular expressions),
+`__time` dimension values will be formatted in [ISO-8601 format](https://en.wikipedia.org/wiki/ISO_8601)
+before getting passed to the extraction function.
+### Regular Expression Extraction Function
+Returns the first matching group for the given regular expression.
+If there is no match, it returns the dimension value as is.
 ```json
-{ "type" : "regex", "expr", <regular_expression> }
+{ "type" : "regex", "expr" : <regular_expression> }
 ```
-### PartialDimExtractionFn
-Returns the dimension value as is if there is a match, otherwise returns null.
+For example, using `"expr" : "(\\w\\w\\w).*"` will transform
+`'Monday'`, `'Tuesday'`, `'Wednesday'` into `'Mon'`, `'Tue'`, `'Wed'`.
+### Partial Extraction Function
+Returns the dimension value unchanged if the regular expression matches, otherwise returns null.
 ```json
-{ "type" : "partial", "expr", <regular_expression> }
+{ "type" : "partial", "expr" : <regular_expression> }
 ```
-### SearchQuerySpecDimExtractionFn
-Returns the dimension value as is if the given [SearchQuerySpec](SearchQuerySpec.html) matches, otherwise returns null.
+### Search Query Extraction Function
+Returns the dimension value unchanged if the given [`SearchQuerySpec`](SearchQuerySpec.html)
+matches, otherwise returns null.
 ```json
 { "type" : "searchQuery", "query" : <search_query_spec> }
 ```
-### TimeDimExtractionFn
-Parses dimension values as timestamps using the given input format, and returns them formatted using the given output format. Time formats follow the [com.ibm.icu.text.SimpleDateFormat](http://icu-project.org/apiref/icu4j/com/ibm/icu/text/SimpleDateFormat.html) format
+### Time Format Extraction Function
+Returns the dimension value formatted according to the given format string, time zone, and locale.
+For `__time` dimension values, this formats the time value bucketed by the
+[aggregation granularity](Granularities.html).
+For a regular dimension, it assumes the string is formatted in
+[ISO-8601 date and time format](https://en.wikipedia.org/wiki/ISO_8601).
+* `format` : date time format for the resulting dimension value, in [Joda Time DateTimeFormat](http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html).
+* `locale` : locale (language and country) to use, given as an [IETF BCP 47 language tag](http://www.oracle.com/technetwork/java/javase/java8locales-2095355.html#util-text), e.g. `en-US`, `en-GB`, `fr-FR`, `fr-CA`, etc.
+* `timeZone` : time zone to use in [IANA tz database format](http://en.wikipedia.org/wiki/List_of_tz_database_time_zones), e.g. `Europe/Berlin` (this can possibly be different from the aggregation time zone)
 ```json
-{ "type" : "time", "timeFormat" : <input_format>, "resultFormat" : <output_format> }
+{ "type" : "timeFormat",
+  "format" : <output_format>,
+  "timeZone" : <time_zone> (optional),
+  "locale" : <locale> (optional) }
 ```
-### JavascriptDimExtractionFn
-Returns the dimension value as transformed by the given JavaScript function.
-Example
+For example, the following dimension spec returns the day of the week for Montréal in French:
+```json
+{
+  "type" : "extraction",
+  "dimension" : "__time",
+  "outputName" : "dayOfWeek",
+  "extractionFn" : {
+    "type" : "timeFormat",
+    "format" : "EEEE",
+    "timeZone" : "America/Montreal",
+    "locale" : "fr"
+  }
+}
+```
+### Time Parsing Extraction Function
+Parses dimension values as timestamps using the given input format,
+and returns them formatted using the given output format.
+Note, if you are working with the `__time` dimension, you should consider using the
+[time format extraction function](#time-format-extraction-function) instead,
+which works on the time value directly, as opposed to string values.
+Time formats are described in the
+[SimpleDateFormat documentation](http://icu-project.org/apiref/icu4j/com/ibm/icu/text/SimpleDateFormat.html).
+```json
+{ "type" : "time",
+  "timeFormat" : <input_format>,
+  "resultFormat" : <output_format> }
+```
+### Javascript Extraction Function
+Returns the dimension value as transformed by the given JavaScript function.
+For regular dimensions, the input value is passed as a string.
+For the `__time` dimension, the input value is passed as a number
+representing the number of milliseconds since January 1, 1970 UTC.
+Example for a regular dimension:
 ```json
 {
   "type" : "javascript",
   "function" : "function(str) { return str.substr(0, 3); }"
 }
 ```
+Example for the `__time` dimension:
+```json
+{
+  "type" : "javascript",
+  "function" : "function(t) { return 'Second ' + Math.floor((t % 60000) / 1000); }"
+}
+```
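The `(\\w\\w\\w).*` example added to the docs above is easy to sanity-check outside of Druid. A minimal standalone sketch using plain `java.util.regex` (illustrative only, not part of this commit; the class name is hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtractionExample
{
  public static void main(String[] args)
  {
    // Mirrors the docs: keep the first capture group, or the value as-is when there is no match.
    Pattern expr = Pattern.compile("(\\w\\w\\w).*");
    for (String day : new String[]{"Monday", "Tuesday", "Wednesday"}) {
      Matcher m = expr.matcher(day);
      System.out.println(m.matches() ? m.group(1) : day); // prints Mon, Tue, Wed
    }
  }
}
```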

View File

@@ -1,6 +1,7 @@
 ---
 layout: doc_page
 ---
 # Aggregation Granularity
 The granularity field determines how data gets bucketed across the time dimension, or how it gets aggregated by hour, day, minute, etc.

View File

@@ -256,7 +256,7 @@ public class IngestSegmentFirehoseFactory implements FirehoseFactory<InputRowPar
       final Map<String, DimensionSelector> dimSelectors = Maps.newHashMap();
       for (String dim : dims) {
-        final DimensionSelector dimSelector = cursor.makeDimensionSelector(dim);
+        final DimensionSelector dimSelector = cursor.makeDimensionSelector(dim, null);
         // dimSelector is null if the dimension is not present
         if (dimSelector != null) {
           dimSelectors.put(dim, dimSelector);

View File

@@ -105,7 +105,7 @@ public class CardinalityAggregatorFactory implements AggregatorFactory
           @Override
           public DimensionSelector apply(@Nullable String input)
           {
-            return columnFactory.makeDimensionSelector(input);
+            return columnFactory.makeDimensionSelector(input, null);
           }
         }
     ), Predicates.notNull()

View File

@@ -20,7 +20,7 @@ package io.druid.query.dimension;
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
 import com.metamx.common.StringUtils;
-import io.druid.query.extraction.DimExtractionFn;
+import io.druid.query.extraction.ExtractionFn;
 import java.nio.ByteBuffer;
@@ -59,7 +59,7 @@ public class DefaultDimensionSpec implements DimensionSpec
   }
   @Override
-  public DimExtractionFn getDimExtractionFn()
+  public ExtractionFn getExtractionFn()
   {
     return null;
   }

View File

@@ -19,7 +19,7 @@ package io.druid.query.dimension;
 import com.fasterxml.jackson.annotation.JsonSubTypes;
 import com.fasterxml.jackson.annotation.JsonTypeInfo;
-import io.druid.query.extraction.DimExtractionFn;
+import io.druid.query.extraction.ExtractionFn;
 /**
  */
@@ -34,7 +34,7 @@ public interface DimensionSpec
   public String getOutputName();
-  public DimExtractionFn getDimExtractionFn();
+  public ExtractionFn getExtractionFn();
   public byte[] getCacheKey();

View File

@@ -19,8 +19,9 @@ package io.druid.query.dimension;
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
+import com.google.common.base.Preconditions;
 import com.metamx.common.StringUtils;
-import io.druid.query.extraction.DimExtractionFn;
+import io.druid.query.extraction.ExtractionFn;
 import java.nio.ByteBuffer;
@@ -31,18 +32,23 @@ public class ExtractionDimensionSpec implements DimensionSpec
   private static final byte CACHE_TYPE_ID = 0x1;
   private final String dimension;
-  private final DimExtractionFn dimExtractionFn;
+  private final ExtractionFn extractionFn;
   private final String outputName;
   @JsonCreator
   public ExtractionDimensionSpec(
       @JsonProperty("dimension") String dimension,
      @JsonProperty("outputName") String outputName,
-      @JsonProperty("dimExtractionFn") DimExtractionFn dimExtractionFn
+      @JsonProperty("extractionFn") ExtractionFn extractionFn,
+      // for backwards compatibility
+      @Deprecated @JsonProperty("dimExtractionFn") ExtractionFn dimExtractionFn
   )
   {
+    Preconditions.checkNotNull(dimension, "dimension must not be null");
+    Preconditions.checkArgument(extractionFn != null || dimExtractionFn != null, "extractionFn must not be null");
     this.dimension = dimension;
-    this.dimExtractionFn = dimExtractionFn;
+    this.extractionFn = extractionFn != null ? extractionFn : dimExtractionFn;
     // Do null check for backwards compatibility
     this.outputName = outputName == null ? dimension : outputName;
@@ -64,16 +70,16 @@ public class ExtractionDimensionSpec implements DimensionSpec
   @Override
   @JsonProperty
-  public DimExtractionFn getDimExtractionFn()
+  public ExtractionFn getExtractionFn()
   {
-    return dimExtractionFn;
+    return extractionFn;
   }
   @Override
   public byte[] getCacheKey()
   {
     byte[] dimensionBytes = StringUtils.toUtf8(dimension);
-    byte[] dimExtractionFnBytes = dimExtractionFn.getCacheKey();
+    byte[] dimExtractionFnBytes = extractionFn.getCacheKey();
     return ByteBuffer.allocate(1 + dimensionBytes.length + dimExtractionFnBytes.length)
                      .put(CACHE_TYPE_ID)
@@ -85,7 +91,7 @@ public class ExtractionDimensionSpec implements DimensionSpec
   @Override
   public boolean preservesOrdering()
   {
-    return dimExtractionFn.preservesOrdering();
+    return extractionFn.preservesOrdering();
   }
   @Override
@@ -93,7 +99,7 @@ public class ExtractionDimensionSpec implements DimensionSpec
   {
     return "ExtractionDimensionSpec{" +
            "dimension='" + dimension + '\'' +
-           ", dimExtractionFn=" + dimExtractionFn +
+           ", extractionFn=" + extractionFn +
            ", outputName='" + outputName + '\'' +
            '}';
   }
@@ -106,7 +112,7 @@ public class ExtractionDimensionSpec implements DimensionSpec
     ExtractionDimensionSpec that = (ExtractionDimensionSpec) o;
-    if (dimExtractionFn != null ? !dimExtractionFn.equals(that.dimExtractionFn) : that.dimExtractionFn != null)
+    if (extractionFn != null ? !extractionFn.equals(that.extractionFn) : that.extractionFn != null)
       return false;
     if (dimension != null ? !dimension.equals(that.dimension) : that.dimension != null) return false;
     if (outputName != null ? !outputName.equals(that.outputName) : that.outputName != null) return false;
@@ -118,7 +124,7 @@ public class ExtractionDimensionSpec implements DimensionSpec
   public int hashCode()
   {
     int result = dimension != null ? dimension.hashCode() : 0;
-    result = 31 * result + (dimExtractionFn != null ? dimExtractionFn.hashCode() : 0);
+    result = 31 * result + (extractionFn != null ? extractionFn.hashCode() : 0);
     result = 31 * result + (outputName != null ? outputName.hashCode() : 0);
     return result;
   }
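The constructor above is the backwards-compatibility pattern this commit applies everywhere a `dimExtractionFn` used to be accepted: bind both property names, prefer the new one. A minimal standalone Jackson sketch of the same idea (the `Spec` class is hypothetical, not part of the commit):

```java
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;

public class DeprecatedPropertyDemo
{
  public static class Spec
  {
    public final String fn;

    @JsonCreator
    public Spec(
        @JsonProperty("extractionFn") String extractionFn,
        @Deprecated @JsonProperty("dimExtractionFn") String dimExtractionFn
    )
    {
      // the new property wins; the old property is still honored
      this.fn = extractionFn != null ? extractionFn : dimExtractionFn;
    }
  }

  public static void main(String[] args) throws Exception
  {
    ObjectMapper mapper = new ObjectMapper();
    // old-style JSON keeps deserializing
    System.out.println(mapper.readValue("{\"dimExtractionFn\":\"regex\"}", Spec.class).fn); // regex
    // when both are present, 'extractionFn' takes precedence
    System.out.println(mapper.readValue("{\"extractionFn\":\"new\",\"dimExtractionFn\":\"old\"}", Spec.class).fn); // new
  }
}
```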

View File

@@ -1,73 +1,35 @@
 /*
- * Druid - a distributed column store.
- * Copyright 2012 - 2015 Metamarkets Group Inc.
+ * Licensed to Metamarkets Group Inc. (Metamarkets) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. Metamarkets licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
  *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
  * http://www.apache.org/licenses/LICENSE-2.0
  *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
  */
 package io.druid.query.extraction;
-import com.fasterxml.jackson.annotation.JsonSubTypes;
-import com.fasterxml.jackson.annotation.JsonTypeInfo;
-/**
- */
-@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
-@JsonSubTypes(value = {
-    @JsonSubTypes.Type(name = "time", value = TimeDimExtractionFn.class),
-    @JsonSubTypes.Type(name = "regex", value = RegexDimExtractionFn.class),
-    @JsonSubTypes.Type(name = "partial", value = MatchingDimExtractionFn.class),
-    @JsonSubTypes.Type(name = "searchQuery", value = SearchQuerySpecDimExtractionFn.class),
-    @JsonSubTypes.Type(name = "javascript", value = JavascriptDimExtractionFn.class)
-})
-/**
- * A DimExtractionFn is a function that can be used to modify the values of a dimension column.
- *
- * A simple example of the type of operation this enables is the RegexDimExtractionFn which applies a
- * regular expression with a capture group. When the regular expression matches the value of a dimension,
- * the value captured by the group is used for grouping operations instead of the dimension value.
- */
-public interface DimExtractionFn
+public abstract class DimExtractionFn implements ExtractionFn
 {
-  /**
-   * Returns a byte[] unique to all concrete implementations of DimExtractionFn. This byte[] is used to
-   * generate a cache key for the specific query.
-   *
-   * @return a byte[] unit to all concrete implements of DimExtractionFn
-   */
-  public byte[] getCacheKey();
-  /**
-   * The "extraction" function. This should map a dimension value into some other value.
-   *
-   * In order to maintain the "null and empty string are equivalent" semantics that Druid provides, the
-   * empty string is considered invalid output for this method and should instead return null. This is
-   * a contract on the method rather than enforced at a lower level in order to eliminate a global check
-   * for extraction functions that do not already need one.
-   *
-   * @param dimValue the original value of the dimension
-   * @return a value that should be used instead of the original
-   */
-  public String apply(String dimValue);
-  /**
-   * Offers information on whether the extraction will preserve the original ordering of the values.
-   *
-   * Some optimizations of queries is possible if ordering is preserved. Null values *do* count towards
-   * ordering.
-   *
-   * @return true if ordering is preserved, false otherwise
-   */
-  public boolean preservesOrdering();
+  @Override
+  public String apply(Object value)
+  {
+    return apply(value.toString());
+  }
+
+  @Override
+  public String apply(long value)
+  {
+    return apply(Long.toString(value));
+  }
 }
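`DimExtractionFn` is now just an adapter: subclasses keep implementing the string-based `apply(String)`, while `Object` and `long` inputs are stringified first. A hypothetical subclass (not in the commit) showing what implementors still have to provide:

```java
package io.druid.query.extraction;

// Hypothetical example: string-based extraction functions extend DimExtractionFn
// and only override apply(String); apply(Object) and apply(long) are inherited.
public class UpperCaseExtractionFn extends DimExtractionFn
{
  @Override
  public String apply(String dimValue)
  {
    // per the ExtractionFn contract, empty output must be returned as null
    return (dimValue == null || dimValue.isEmpty()) ? null : dimValue.toUpperCase();
  }

  @Override
  public byte[] getCacheKey()
  {
    return new byte[]{(byte) 0x7F}; // placeholder cache type id for this sketch
  }

  @Override
  public boolean preservesOrdering()
  {
    return false; // upper-casing can reorder mixed-case values lexicographically
  }
}
```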

View File

@@ -0,0 +1,78 @@
/*
 * Druid - a distributed column store.
 * Copyright 2012 - 2015 Metamarkets Group Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package io.druid.query.extraction;

import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;

/**
 */
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
@JsonSubTypes(value = {
    @JsonSubTypes.Type(name = "time", value = TimeDimExtractionFn.class),
    @JsonSubTypes.Type(name = "regex", value = RegexDimExtractionFn.class),
    @JsonSubTypes.Type(name = "partial", value = MatchingDimExtractionFn.class),
    @JsonSubTypes.Type(name = "searchQuery", value = SearchQuerySpecDimExtractionFn.class),
    @JsonSubTypes.Type(name = "javascript", value = JavascriptExtractionFn.class),
    @JsonSubTypes.Type(name = "timeFormat", value = TimeFormatExtractionFn.class)
})
/**
 * An ExtractionFn is a function that can be used to transform the values of a column (typically a dimension)
 *
 * A simple example of the type of operation this enables is the RegexDimExtractionFn which applies a
 * regular expression with a capture group. When the regular expression matches the value of a dimension,
 * the value captured by the group is used for grouping operations instead of the dimension value.
 */
public interface ExtractionFn
{
  /**
   * Returns a byte[] unique to all concrete implementations of DimExtractionFn. This byte[] is used to
   * generate a cache key for the specific query.
   *
   * @return a byte[] unit to all concrete implements of DimExtractionFn
   */
  public byte[] getCacheKey();

  /**
   * The "extraction" function. This should map a value into some other String value.
   *
   * In order to maintain the "null and empty string are equivalent" semantics that Druid provides, the
   * empty string is considered invalid output for this method and should instead return null. This is
   * a contract on the method rather than enforced at a lower level in order to eliminate a global check
   * for extraction functions that do not already need one.
   *
   * @param value the original value of the dimension
   * @return a value that should be used instead of the original
   */
  public String apply(Object value);

  public String apply(String value);

  public String apply(long value);

  /**
   * Offers information on whether the extraction will preserve the original ordering of the values.
   *
   * Some optimizations of queries is possible if ordering is preserved. Null values *do* count towards
   * ordering.
   *
   * @return true if ordering is preserved, false otherwise
   */
  public boolean preservesOrdering();
}
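Unlike the string-only interface it replaces, `ExtractionFn` lets an implementation act on numeric `__time` values directly via `apply(long)`, avoiding the ISO-8601 string round-trip. A hypothetical direct implementation (not in the commit):

```java
package io.druid.query.extraction;

// Hypothetical sketch: extracts the millisecond-of-second from a timestamp
// without ever rendering the timestamp as a string.
public class MillisOfSecondExtractionFn implements ExtractionFn
{
  @Override
  public String apply(long value)
  {
    return String.valueOf(value % 1000);
  }

  @Override
  public String apply(Object value)
  {
    return value instanceof Number ? apply(((Number) value).longValue()) : apply(String.valueOf(value));
  }

  @Override
  public String apply(String value)
  {
    // assumes the string holds epoch millis; a real implementation would parse ISO-8601 here
    return value == null ? null : apply(Long.parseLong(value));
  }

  @Override
  public byte[] getCacheKey()
  {
    return new byte[]{(byte) 0x7E}; // placeholder cache type id for this sketch
  }

  @Override
  public boolean preservesOrdering()
  {
    return false; // wraps around at every second boundary
  }
}
```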

View File

@@ -22,17 +22,15 @@ import com.fasterxml.jackson.annotation.JsonProperty;
 import com.google.common.base.Function;
 import com.google.common.base.Strings;
 import com.metamx.common.StringUtils;
-import com.google.common.base.Strings;
-import com.metamx.common.StringUtils;
 import org.mozilla.javascript.Context;
 import org.mozilla.javascript.ContextFactory;
 import org.mozilla.javascript.ScriptableObject;
 import java.nio.ByteBuffer;
-public class JavascriptDimExtractionFn implements DimExtractionFn
+public class JavascriptExtractionFn implements ExtractionFn
 {
-  private static Function<String, String> compile(String function) {
+  private static Function<Object, String> compile(String function) {
     final ContextFactory contextFactory = ContextFactory.getGlobal();
     final Context context = contextFactory.enterContext();
     context.setOptimizationLevel(9);
@@ -43,9 +41,9 @@ public class JavascriptDimExtractionFn implements DimExtractionFn
     Context.exit();
-    return new Function<String, String>()
+    return new Function<Object, String>()
     {
-      public String apply(String input)
+      public String apply(Object input)
       {
         // ideally we need a close() function to discard the context once it is not used anymore
         Context cx = Context.getCurrentContext();
@@ -53,7 +51,7 @@ public class JavascriptDimExtractionFn implements DimExtractionFn
           cx = contextFactory.enterContext();
         }
-        final Object res = fn.call(cx, scope, scope, new String[]{input});
+        final Object res = fn.call(cx, scope, scope, new Object[]{input});
         return res != null ? Context.toString(res) : null;
       }
     };
@@ -62,10 +60,10 @@ public class JavascriptDimExtractionFn implements DimExtractionFn
   private static final byte CACHE_TYPE_ID = 0x4;
   private final String function;
-  private final Function<String, String> fn;
+  private final Function<Object, String> fn;
   @JsonCreator
-  public JavascriptDimExtractionFn(
+  public JavascriptExtractionFn(
       @JsonProperty("function") String function
   )
   {
@@ -90,10 +88,21 @@
   }
   @Override
-  public String apply(String dimValue)
+  public String apply(Object value)
   {
-    String retVal = fn.apply(dimValue);
-    return Strings.isNullOrEmpty(retVal) ? null : retVal;
+    return Strings.emptyToNull(fn.apply(value));
+  }
+  @Override
+  public String apply(String value)
+  {
+    return this.apply((Object) value);
+  }
+  @Override
+  public String apply(long value)
+  {
+    return this.apply((Long) value);
   }
   @Override
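Illustrative usage of the renamed class (not part of the diff; assumes Rhino on the classpath and the same package): the compiled JavaScript function now serves both string dimensions and numeric `__time` values.

```java
public class JavascriptExtractionFnDemo
{
  public static void main(String[] args)
  {
    // string dimension: same behavior as the old JavascriptDimExtractionFn
    JavascriptExtractionFn substr =
        new JavascriptExtractionFn("function(str) { return str.substr(0, 3); }");
    System.out.println(substr.apply("Monday")); // Mon

    // __time dimension: the long is handed to JavaScript as a number
    JavascriptExtractionFn second =
        new JavascriptExtractionFn("function(t) { return 'Second ' + Math.floor((t % 60000) / 1000); }");
    System.out.println(second.apply(1424952000000L)); // Second 0 (timestamp on a minute boundary)
  }
}
```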

View File

@@ -27,7 +27,7 @@ import java.util.regex.Pattern;
 /**
  */
-public class MatchingDimExtractionFn implements DimExtractionFn
+public class MatchingDimExtractionFn extends DimExtractionFn
 {
   private static final byte CACHE_TYPE_ID = 0x2;

View File

@@ -27,7 +27,7 @@ import java.util.regex.Pattern;
 /**
  */
-public class RegexDimExtractionFn implements DimExtractionFn
+public class RegexDimExtractionFn extends DimExtractionFn
 {
   private static final byte CACHE_TYPE_ID = 0x1;

View File

@@ -25,7 +25,7 @@ import java.nio.ByteBuffer;
 /**
  */
-public class SearchQuerySpecDimExtractionFn implements DimExtractionFn
+public class SearchQuerySpecDimExtractionFn extends DimExtractionFn
 {
   private static final byte CACHE_TYPE_ID = 0x3;

View File

@@ -28,7 +28,7 @@ import java.util.Date;
 /**
  */
-public class TimeDimExtractionFn implements DimExtractionFn
+public class TimeDimExtractionFn extends DimExtractionFn
 {
   private static final byte CACHE_TYPE_ID = 0x0;

View File

@@ -0,0 +1,113 @@
/*
 * Licensed to Metamarkets Group Inc. (Metamarkets) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. Metamarkets licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
package io.druid.query.extraction;

import com.fasterxml.jackson.annotation.JsonProperty;
import com.google.common.base.Preconditions;
import com.metamx.common.StringUtils;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

import java.nio.ByteBuffer;
import java.util.Locale;

public class TimeFormatExtractionFn implements ExtractionFn
{
  private static final byte CACHE_TYPE_ID = 0x5;

  private final DateTimeZone tz;
  private final String pattern;
  private final Locale locale;
  private final DateTimeFormatter formatter;

  public TimeFormatExtractionFn(
      @JsonProperty("format") String pattern,
      @JsonProperty("timeZone") DateTimeZone tz,
      @JsonProperty("locale") String localeString
  )
  {
    Preconditions.checkArgument(pattern != null, "format cannot be null");

    this.pattern = pattern;
    this.tz = tz;
    this.locale = localeString == null ? null : Locale.forLanguageTag(localeString);
    this.formatter = DateTimeFormat.forPattern(pattern)
                                   .withZone(tz == null ? DateTimeZone.UTC : tz)
                                   .withLocale(locale);
  }

  @JsonProperty
  public DateTimeZone getTimeZone()
  {
    return tz;
  }

  @JsonProperty
  public String getFormat()
  {
    return pattern;
  }

  @JsonProperty
  public String getLocale()
  {
    if (locale != null) {
      return locale.toLanguageTag();
    } else {
      return null;
    }
  }

  @Override
  public byte[] getCacheKey()
  {
    byte[] exprBytes = StringUtils.toUtf8(pattern + "\u0001" + tz.getID() + "\u0001" + locale.toLanguageTag());
    return ByteBuffer.allocate(1 + exprBytes.length)
                     .put(CACHE_TYPE_ID)
                     .put(exprBytes)
                     .array();
  }

  @Override
  public String apply(long value)
  {
    return formatter.print(value);
  }

  @Override
  public String apply(Object value)
  {
    return formatter.print(new DateTime(value));
  }

  @Override
  public String apply(String value)
  {
    return apply((Object) value);
  }

  @Override
  public boolean preservesOrdering()
  {
    return false;
  }
}
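A usage sketch for the new class (not part of the commit), mirroring the documentation example above:

```java
import org.joda.time.DateTimeZone;

public class TimeFormatExtractionFnDemo
{
  public static void main(String[] args)
  {
    TimeFormatExtractionFn fn = new TimeFormatExtractionFn(
        "EEEE",                                 // day-of-week pattern
        DateTimeZone.forID("America/Montreal"), // rendered in Montréal local time
        "fr"                                    // French day names
    );
    // 2015-02-26T12:00:00Z is a Thursday morning in Montréal, so this prints "jeudi"
    System.out.println(fn.apply(1424952000000L));
  }
}
```

One thing worth noting about the class as committed: `getCacheKey()` dereferences `tz` and `locale` directly, even though the constructor accepts null for both, so cache-key generation appears to assume the optional fields are set.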

View File

@@ -21,7 +21,7 @@ import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
 import com.google.common.base.Preconditions;
 import com.metamx.common.StringUtils;
-import io.druid.query.extraction.DimExtractionFn;
+import io.druid.query.extraction.ExtractionFn;
 import java.nio.ByteBuffer;
@@ -31,22 +31,24 @@ public class ExtractionDimFilter implements DimFilter
 {
   private final String dimension;
   private final String value;
-  private final DimExtractionFn dimExtractionFn;
+  private final ExtractionFn extractionFn;
   @JsonCreator
   public ExtractionDimFilter(
       @JsonProperty("dimension") String dimension,
       @JsonProperty("value") String value,
-      @JsonProperty("dimExtractionFn") DimExtractionFn dimExtractionFn
+      @JsonProperty("extractionFn") ExtractionFn extractionFn,
+      // for backwards compatibility
+      @Deprecated @JsonProperty("dimExtractionFn") ExtractionFn dimExtractionFn
   )
   {
     Preconditions.checkArgument(dimension != null, "dimension must not be null");
     Preconditions.checkArgument(value != null, "value must not be null");
-    Preconditions.checkArgument(dimExtractionFn != null, "extraction function must not be null");
+    Preconditions.checkArgument(extractionFn != null || dimExtractionFn != null, "extraction function must not be null");
     this.dimension = dimension;
     this.value = value;
-    this.dimExtractionFn = dimExtractionFn;
+    this.extractionFn = extractionFn != null ? extractionFn : dimExtractionFn;
   }
   @JsonProperty
@@ -62,9 +64,9 @@ public class ExtractionDimFilter implements DimFilter
   }
   @JsonProperty
-  public DimExtractionFn getDimExtractionFn()
+  public ExtractionFn getExtractionFn()
   {
-    return dimExtractionFn;
+    return extractionFn;
   }
   @Override
@@ -83,6 +85,6 @@ public class ExtractionDimFilter implements DimFilter
   @Override
   public String toString()
   {
-    return String.format("%s(%s) = %s", dimExtractionFn, dimension, value);
+    return String.format("%s(%s) = %s", extractionFn, dimension, value);
   }
 }

View File

@@ -42,10 +42,11 @@ import io.druid.query.aggregation.AggregatorFactory;
 import io.druid.query.aggregation.BufferAggregator;
 import io.druid.query.aggregation.PostAggregator;
 import io.druid.query.dimension.DimensionSpec;
-import io.druid.query.extraction.DimExtractionFn;
+import io.druid.query.extraction.ExtractionFn;
 import io.druid.segment.Cursor;
 import io.druid.segment.DimensionSelector;
 import io.druid.segment.StorageAdapter;
+import io.druid.segment.column.Column;
 import io.druid.segment.data.IndexedInts;
 import io.druid.segment.filter.Filters;
 import org.joda.time.DateTime;
@@ -300,7 +301,7 @@ public class GroupByQueryEngine
     private List<ByteBuffer> unprocessedKeys;
     private Iterator<Row> delegate;
-    public RowIterator(GroupByQuery query, Cursor cursor, ByteBuffer metricsBuffer, GroupByQueryConfig config)
+    public RowIterator(GroupByQuery query, final Cursor cursor, ByteBuffer metricsBuffer, GroupByQueryConfig config)
     {
       this.query = query;
       this.cursor = cursor;
@@ -312,9 +313,13 @@ public class GroupByQueryEngine
       dimensionSpecs = query.getDimensions();
       dimensions = Lists.newArrayListWithExpectedSize(dimensionSpecs.size());
       dimNames = Lists.newArrayListWithExpectedSize(dimensionSpecs.size());
       for (int i = 0; i < dimensionSpecs.size(); ++i) {
         final DimensionSpec dimSpec = dimensionSpecs.get(i);
-        final DimensionSelector selector = cursor.makeDimensionSelector(dimSpec.getDimension());
+        final DimensionSelector selector = cursor.makeDimensionSelector(
+            dimSpec.getDimension(),
+            dimSpec.getExtractionFn()
+        );
         if (selector != null) {
           dimensions.add(selector);
           dimNames.add(dimSpec.getOutputName());
@@ -395,16 +400,11 @@ public class GroupByQueryEngine
         ByteBuffer keyBuffer = input.getKey().duplicate();
         for (int i = 0; i < dimensions.size(); ++i) {
           final DimensionSelector dimSelector = dimensions.get(i);
-          final DimExtractionFn fn = dimensionSpecs.get(i).getDimExtractionFn();
           final int dimVal = keyBuffer.getInt();
           if (dimSelector.getValueCardinality() != dimVal) {
-            if (fn != null) {
-              theEvent.put(dimNames.get(i), fn.apply(dimSelector.lookupName(dimVal)));
-            } else {
             theEvent.put(dimNames.get(i), dimSelector.lookupName(dimVal));
           }
         }
-          }
         int position = input.getValue();
         for (int i = 0; i < aggregators.length; ++i) {

View File

@@ -161,7 +161,8 @@ public class SearchQueryRunner implements QueryRunner<Result<SearchResultValue>>
       Map<String, DimensionSelector> dimSelectors = Maps.newHashMap();
       for (String dim : dimsToSearch) {
-        dimSelectors.put(dim, cursor.makeDimensionSelector(dim));
+        // switching to using DimensionSpec for search would allow the use of extractionFn here.
+        dimSelectors.put(dim, cursor.makeDimensionSelector(dim, null));
       }
       while (!cursor.isDone()) {

View File

@@ -86,7 +86,8 @@ public class SelectQueryEngine
       final Map<String, DimensionSelector> dimSelectors = Maps.newHashMap();
       for (String dim : dims) {
-        final DimensionSelector dimSelector = cursor.makeDimensionSelector(dim);
+        // switching to using DimensionSpec for select would allow the use of extractionFn here.
+        final DimensionSelector dimSelector = cursor.makeDimensionSelector(dim, null);
         dimSelectors.put(dim, dimSelector);
       }

View File

@@ -19,6 +19,7 @@ package io.druid.query.topn;
 import com.google.common.collect.Maps;
 import io.druid.query.aggregation.Aggregator;
+import io.druid.query.extraction.ExtractionFn;
 import io.druid.segment.Capabilities;
 import io.druid.segment.Cursor;
 import io.druid.segment.DimensionSelector;
@@ -44,7 +45,8 @@ public class DimExtractionTopNAlgorithm extends BaseTopNAlgorithm<Aggregator[][]
   @Override
   public TopNParams makeInitParams(
-      final DimensionSelector dimSelector, final Cursor cursor
+      final DimensionSelector dimSelector,
+      final Cursor cursor
   )
   {
     return new TopNParams(
@@ -64,13 +66,12 @@ public class DimExtractionTopNAlgorithm extends BaseTopNAlgorithm<Aggregator[][]
         params.getCardinality()
     );
-    if (!query.getDimensionSpec().preservesOrdering()) {
+    // Unlike regular topN we cannot rely on ordering to optimize.
+    // Optimization possibly requires a reverse lookup from value to ID, which is
+    // not possible when applying an extraction function
     return provider.build();
   }
-    return query.getTopNMetricSpec().configureOptimizer(provider).build();
-  }
   @Override
   protected Aggregator[][] updateDimValSelector(Aggregator[][] aggregators, int numProcessed, int numToProcess)
   {
@@ -98,11 +99,11 @@ public class DimExtractionTopNAlgorithm extends BaseTopNAlgorithm<Aggregator[][]
     final IndexedInts dimValues = dimSelector.getRow();
     for (int i = 0; i < dimValues.size(); ++i) {
       final int dimIndex = dimValues.get(i);
       Aggregator[] theAggregators = rowSelector[dimIndex];
       if (theAggregators == null) {
-        String key = query.getDimensionSpec().getDimExtractionFn().apply(dimSelector.lookupName(dimIndex));
+        final String key = dimSelector.lookupName(dimIndex);
         theAggregators = aggregatesStore.get(key);
         if (theAggregators == null) {
           theAggregators = makeAggregators(cursor, query.getAggregatorSpecs());
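The replacement comment above rests on extraction functions being many-to-one: several dictionary values can collapse to one extracted value, so the reverse value-to-ID lookup that the ordering optimization would need is ill-defined. A quick standalone illustration (not in the commit):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ManyToOneDemo
{
  public static void main(String[] args)
  {
    Pattern expr = Pattern.compile("(\\w\\w\\w).*");
    Matcher a = expr.matcher("Monday");
    Matcher b = expr.matcher("Monkey");
    // both distinct dictionary values extract to "Mon":
    // which dictionary ID should lookupId("Mon") return?
    System.out.println(a.matches() ? a.group(1) : null); // Mon
    System.out.println(b.matches() ? b.group(1) : null); // Mon
  }
}
```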

View File

@@ -0,0 +1,139 @@
/*
 * Licensed to Metamarkets Group Inc. (Metamarkets) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. Metamarkets licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
package io.druid.query.topn;

import com.google.common.collect.Maps;
import io.druid.query.aggregation.Aggregator;
import io.druid.query.extraction.ExtractionFn;
import io.druid.segment.Capabilities;
import io.druid.segment.Cursor;
import io.druid.segment.DimensionSelector;
import io.druid.segment.LongColumnSelector;
import io.druid.segment.column.Column;

import java.util.Map;

public class TimeExtractionTopNAlgorithm extends BaseTopNAlgorithm<int[], Map<String, Aggregator[]>, TopNParams>
{
  public static final int[] EMPTY_INTS = new int[]{};
  private final TopNQuery query;

  public TimeExtractionTopNAlgorithm(Capabilities capabilities, TopNQuery query)
  {
    super(capabilities);
    this.query = query;
  }

  @Override
  public TopNParams makeInitParams(DimensionSelector dimSelector, Cursor cursor)
  {
    return new TopNParams(
        dimSelector,
        cursor,
        dimSelector.getValueCardinality(),
        Integer.MAX_VALUE
    );
  }

  @Override
  protected int[] makeDimValSelector(TopNParams params, int numProcessed, int numToProcess)
  {
    return EMPTY_INTS;
  }

  @Override
  protected int[] updateDimValSelector(int[] dimValSelector, int numProcessed, int numToProcess)
  {
    return dimValSelector;
  }

  @Override
  protected Map<String, Aggregator[]> makeDimValAggregateStore(TopNParams params)
  {
    return Maps.newHashMap();
  }

  @Override
  protected void scanAndAggregate(
      TopNParams params, int[] dimValSelector, Map<String, Aggregator[]> aggregatesStore, int numProcessed
  )
  {
    final Cursor cursor = params.getCursor();
    final DimensionSelector dimSelector = params.getDimSelector();

    while (!cursor.isDone()) {
      final String key = dimSelector.lookupName(dimSelector.getRow().get(0));

      Aggregator[] theAggregators = aggregatesStore.get(key);
      if (theAggregators == null) {
        theAggregators = makeAggregators(cursor, query.getAggregatorSpecs());
        aggregatesStore.put(key, theAggregators);
      }

      for (Aggregator aggregator : theAggregators) {
        aggregator.aggregate();
      }

      cursor.advance();
    }
  }

  @Override
  protected void updateResults(
      TopNParams params,
      int[] dimValSelector,
      Map<String, Aggregator[]> aggregatesStore,
      TopNResultBuilder resultBuilder
  )
  {
    for (Map.Entry<String, Aggregator[]> entry : aggregatesStore.entrySet()) {
      Aggregator[] aggs = entry.getValue();
      if (aggs != null && aggs.length > 0) {
        Object[] vals = new Object[aggs.length];
        for (int i = 0; i < aggs.length; i++) {
          vals[i] = aggs[i].get();
        }
        resultBuilder.addEntry(
            entry.getKey(),
            entry.getKey(),
            vals
        );
      }
    }
  }

  @Override
  protected void closeAggregators(Map<String, Aggregator[]> stringMap)
  {
    for (Aggregator[] aggregators : stringMap.values()) {
      for (Aggregator agg : aggregators) {
        agg.close();
      }
    }
  }

  @Override
  public void cleanup(TopNParams params)
  {
  }
}

View File

@@ -24,7 +24,7 @@ public class TopNAlgorithmSelector
   private final int cardinality;
   private final int numBytesPerRecord;
-  private volatile boolean hasDimExtractionFn;
+  private volatile boolean hasExtractionFn;
   private volatile boolean aggregateAllMetrics;
   private volatile boolean aggregateTopNMetricFirst;
@@ -34,9 +34,9 @@ public class TopNAlgorithmSelector
     this.numBytesPerRecord = numBytesPerRecord;
   }
-  public void setHasDimExtractionFn(boolean hasDimExtractionFn)
+  public void setHasExtractionFn(boolean hasExtractionFn)
   {
-    this.hasDimExtractionFn = hasDimExtractionFn;
+    this.hasExtractionFn = hasExtractionFn;
   }
   public void setAggregateAllMetrics(boolean aggregateAllMetrics)
@@ -53,9 +53,9 @@ public class TopNAlgorithmSelector
     }
   }
-  public boolean isHasDimExtractionFn()
+  public boolean isHasExtractionFn()
   {
-    return hasDimExtractionFn;
+    return hasExtractionFn;
   }
   public boolean isAggregateAllMetrics()

View File

@@ -41,7 +41,10 @@ public class TopNMapFn implements Function<Cursor, Result<TopNResultValue>>
   @SuppressWarnings("unchecked")
   public Result<TopNResultValue> apply(Cursor cursor)
   {
-    final DimensionSelector dimSelector = cursor.makeDimensionSelector(query.getDimensionSpec().getDimension());
+    final DimensionSelector dimSelector = cursor.makeDimensionSelector(
+        query.getDimensionSpec().getDimension(),
+        query.getDimensionSpec().getExtractionFn()
+    );
     if (dimSelector == null) {
       return null;
     }

View File

@@ -139,8 +139,8 @@ public class TopNQuery extends BaseQuery<Result<TopNResultValue>>
   public void initTopNAlgorithmSelector(TopNAlgorithmSelector selector)
   {
-    if (dimensionSpec.getDimExtractionFn() != null) {
-      selector.setHasDimExtractionFn(true);
+    if (dimensionSpec.getExtractionFn() != null) {
+      selector.setHasExtractionFn(true);
     }
     topNMetricSpec.initTopNAlgorithmSelector(selector);
   }

View File

@@ -32,6 +32,7 @@ import io.druid.segment.Capabilities;
 import io.druid.segment.Cursor;
 import io.druid.segment.SegmentMissingException;
 import io.druid.segment.StorageAdapter;
+import io.druid.segment.column.Column;
 import io.druid.segment.filter.Filters;
 import org.joda.time.Interval;
@@ -88,7 +89,10 @@ public class TopNQueryEngine
   private Function<Cursor, Result<TopNResultValue>> getMapFn(TopNQuery query, final StorageAdapter adapter)
   {
     final Capabilities capabilities = adapter.getCapabilities();
-    final int cardinality = adapter.getDimensionCardinality(query.getDimensionSpec().getDimension());
+    final String dimension = query.getDimensionSpec().getDimension();
+    final int cardinality = adapter.getDimensionCardinality(dimension);
     int numBytesPerRecord = 0;
     for (AggregatorFactory aggregatorFactory : query.getAggregatorSpecs()) {
       numBytesPerRecord += aggregatorFactory.getMaxIntermediateSize();
@@ -97,8 +101,18 @@ public class TopNQueryEngine
     final TopNAlgorithmSelector selector = new TopNAlgorithmSelector(cardinality, numBytesPerRecord);
     query.initTopNAlgorithmSelector(selector);
-    TopNAlgorithm topNAlgorithm = null;
-    if (selector.isHasDimExtractionFn()) {
+    final TopNAlgorithm topNAlgorithm;
+    if (
+        selector.isHasExtractionFn() &&
+        // TimeExtractionTopNAlgorithm can work on any single-value dimension of type long.
+        // Once we have arbitrary dimension types following check should be replaced by checking
+        // that the column is of type long and single-value.
+        dimension.equals(Column.TIME_COLUMN_NAME)
+    ) {
+      // A special TimeExtractionTopNAlgorithm is required, since DimExtractionTopNAlgorithm
+      // currently relies on the dimension cardinality to support lexicographic sorting
+      topNAlgorithm = new TimeExtractionTopNAlgorithm(capabilities, query);
+    } else if (selector.isHasExtractionFn()) {
       topNAlgorithm = new DimExtractionTopNAlgorithm(capabilities, query);
     } else if (selector.isAggregateAllMetrics()) {
       topNAlgorithm = new PooledTopNAlgorithm(capabilities, query, bufferPool);

View File

@@ -17,12 +17,16 @@
 package io.druid.segment;
+import io.druid.query.extraction.ExtractionFn;
+import javax.annotation.Nullable;
 /**
  * Factory class for MetricSelectors
  */
 public interface ColumnSelectorFactory
 {
-  public DimensionSelector makeDimensionSelector(String dimensionName);
+  public DimensionSelector makeDimensionSelector(String dimensionName, @Nullable ExtractionFn extractionFn);
   public FloatColumnSelector makeFloatColumnSelector(String columnName);
   public LongColumnSelector makeLongColumnSelector(String columnName);
   public ObjectColumnSelector makeObjectColumnSelector(String columnName);

View File

@@ -19,6 +19,7 @@ package io.druid.segment;
 import com.google.common.base.Function;
 import com.google.common.base.Predicates;
+import com.google.common.base.Strings;
 import com.google.common.collect.Iterators;
 import com.google.common.collect.Maps;
 import com.google.common.collect.Sets;
@@ -27,6 +28,7 @@ import com.metamx.common.guava.Sequence;
 import com.metamx.common.guava.Sequences;
 import io.druid.granularity.QueryGranularity;
 import io.druid.query.QueryInterruptedException;
+import io.druid.query.extraction.ExtractionFn;
 import io.druid.query.filter.Filter;
 import io.druid.segment.column.Column;
 import io.druid.segment.column.ColumnCapabilities;
@@ -40,6 +42,7 @@ import io.druid.segment.data.Offset;
 import org.joda.time.DateTime;
 import org.joda.time.Interval;
+import javax.annotation.Nullable;
 import java.io.Closeable;
 import java.io.IOException;
 import java.util.Iterator;
@@ -96,7 +99,7 @@ public class QueryableIndexStorageAdapter implements StorageAdapter
       return 0;
     }
     if (!column.getCapabilities().isDictionaryEncoded()) {
-      throw new UnsupportedOperationException("Only know cardinality of dictionary encoded columns.");
+      return Integer.MAX_VALUE;
     }
     return column.getDictionaryEncoding().getCardinality();
   }
@@ -272,14 +275,18 @@ public class QueryableIndexStorageAdapter implements StorageAdapter
       }
       @Override
-      public DimensionSelector makeDimensionSelector(String dimension)
+      public DimensionSelector makeDimensionSelector(final String dimension, @Nullable final ExtractionFn extractionFn)
       {
-        DictionaryEncodedColumn cachedColumn = dictionaryColumnCache.get(dimension);
         final Column columnDesc = index.getColumn(dimension);
         if (columnDesc == null) {
           return NULL_DIMENSION_SELECTOR;
         }
+        if (dimension.equals(Column.TIME_COLUMN_NAME)) {
+          return new SingleScanTimeDimSelector(makeLongColumnSelector(dimension), extractionFn);
+        }
+        DictionaryEncodedColumn cachedColumn = dictionaryColumnCache.get(dimension);
         if (cachedColumn == null) {
           cachedColumn = columnDesc.getDictionaryEncoding();
           dictionaryColumnCache.put(dimension, cachedColumn);
@@ -308,13 +315,18 @@ public class QueryableIndexStorageAdapter implements StorageAdapter
           @Override
           public String lookupName(int id)
           {
-            final String retVal = column.lookupName(id);
-            return retVal == null ? "" : retVal;
+            final String value = column.lookupName(id);
+            return extractionFn == null ?
+                   Strings.nullToEmpty(value) :
+                   extractionFn.apply(Strings.nullToEmpty(value));
           }
           @Override
           public int lookupId(String name)
           {
+            if (extractionFn != null) {
+              throw new UnsupportedOperationException("cannot perform lookup when applying an extraction function");
+            }
             return column.lookupId(name);
           }
         };
@@ -356,12 +368,16 @@ public class QueryableIndexStorageAdapter implements StorageAdapter
           @Override
           public String lookupName(int id)
           {
-            return column.lookupName(id);
+            final String value = column.lookupName(id);
+            return extractionFn == null ? value : extractionFn.apply(value);
           }
           @Override
           public int lookupId(String name)
           {
+            if (extractionFn != null) {
+              throw new UnsupportedOperationException("cannot perform lookup when applying an extraction function");
+            }
             return column.lookupId(name);
           }
         };
@@ -702,4 +718,5 @@ public class QueryableIndexStorageAdapter implements StorageAdapter
       return currentOffset;
     }
   }
 }
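Both selectors above follow the same shape, repeated again in the incremental-index adapter below: apply the extraction function on the way out of `lookupName`, and refuse `lookupId` because extraction is not invertible. A condensed, hypothetical sketch of the pattern (not in the commit):

```java
import io.druid.query.extraction.ExtractionFn;

public class ExtractingDictionarySketch
{
  interface Dictionary
  {
    String lookupName(int id);
    int lookupId(String name);
  }

  static Dictionary decorate(final Dictionary delegate, final ExtractionFn extractionFn)
  {
    if (extractionFn == null) {
      return delegate; // no extraction requested: pass through unchanged
    }
    return new Dictionary()
    {
      @Override
      public String lookupName(int id)
      {
        // transform values on the way out
        return extractionFn.apply(delegate.lookupName(id));
      }

      @Override
      public int lookupId(String name)
      {
        // many values may map to the same extracted string, so there is no single ID
        throw new UnsupportedOperationException("cannot perform lookup when applying an extraction function");
      }
    };
  }
}
```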

View File

@@ -29,7 +29,16 @@ public interface StorageAdapter extends CursorFactory
   public Interval getInterval();
   public Indexed<String> getAvailableDimensions();
   public Iterable<String> getAvailableMetrics();
-  public int getDimensionCardinality(String dimension);
+  /**
+   * Returns the number of distinct values for the given dimension column
+   * For dimensions of unknown cardinality, e.g. __time this currently returns
+   * Integer.MAX_VALUE
+   *
+   * @param column
+   * @return
+   */
+  public int getDimensionCardinality(String column);
   public DateTime getMinTime();
   public DateTime getMaxTime();
   public Capabilities getCapabilities();

View File

@@ -20,7 +20,7 @@ package io.druid.segment.filter;
 import com.google.common.collect.Lists;
 import com.metamx.collections.bitmap.ImmutableBitmap;
 import com.metamx.collections.bitmap.WrappedConciseBitmap;
-import io.druid.query.extraction.DimExtractionFn;
+import io.druid.query.extraction.ExtractionFn;
 import io.druid.query.filter.BitmapIndexSelector;
 import io.druid.query.filter.Filter;
 import io.druid.query.filter.ValueMatcher;
@@ -36,12 +36,12 @@ public class ExtractionFilter implements Filter
 {
   private final String dimension;
   private final String value;
-  private final DimExtractionFn fn;
+  private final ExtractionFn fn;
   public ExtractionFilter(
       String dimension,
       String value,
-      DimExtractionFn fn
+      ExtractionFn fn
   )
   {
     this.dimension = dimension;

View File

@@ -75,7 +75,7 @@ public class Filters
       filter = new ExtractionFilter(
           extractionDimFilter.getDimension(),
           extractionDimFilter.getValue(),
-          extractionDimFilter.getDimExtractionFn()
+          extractionDimFilter.getExtractionFn()
       );
     } else if (dimFilter instanceof RegexDimFilter) {
       final RegexDimFilter regexDimFilter = (RegexDimFilter) dimFilter;

View File

@@ -58,7 +58,7 @@ public class SelectorFilter implements Filter
   @Override
   public ValueMatcher makeMatcher(ColumnSelectorFactory columnSelectorFactory)
   {
-    final DimensionSelector dimensionSelector = columnSelectorFactory.makeDimensionSelector(dimension);
+    final DimensionSelector dimensionSelector = columnSelectorFactory.makeDimensionSelector(dimension, null);
     // Missing columns match a null or empty string value and don't match anything else
     if (dimensionSelector == null) {

View File

@ -34,6 +34,7 @@ import io.druid.data.input.impl.SpatialDimensionSchema;
import io.druid.granularity.QueryGranularity; import io.druid.granularity.QueryGranularity;
import io.druid.query.aggregation.AggregatorFactory; import io.druid.query.aggregation.AggregatorFactory;
import io.druid.query.aggregation.PostAggregator; import io.druid.query.aggregation.PostAggregator;
import io.druid.query.extraction.ExtractionFn;
import io.druid.segment.ColumnSelectorFactory; import io.druid.segment.ColumnSelectorFactory;
import io.druid.segment.DimensionSelector; import io.druid.segment.DimensionSelector;
import io.druid.segment.FloatColumnSelector; import io.druid.segment.FloatColumnSelector;
@ -163,7 +164,7 @@ public abstract class IncrementalIndex<AggregatorType> implements Iterable<Row>,
} }
@Override @Override
public DimensionSelector makeDimensionSelector(final String dimension) public DimensionSelector makeDimensionSelector(final String dimension, final ExtractionFn extractionFn)
{ {
return new DimensionSelector() return new DimensionSelector()
{ {
@ -209,12 +210,16 @@ public abstract class IncrementalIndex<AggregatorType> implements Iterable<Row>,
@Override @Override
public String lookupName(int id) public String lookupName(int id)
{ {
return in.get().getDimension(dimension).get(id); final String value = in.get().getDimension(dimension).get(id);
return extractionFn == null ? value : extractionFn.apply(value);
} }
@Override @Override
public int lookupId(String name) public int lookupId(String name)
{ {
if (extractionFn != null) {
throw new UnsupportedOperationException("cannot perform lookup when applying an extraction function");
}
return in.get().getDimension(dimension).indexOf(name); return in.get().getDimension(dimension).indexOf(name);
} }
}; };
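The decorated lookup above generalizes to any dictionary-encoded column: extraction happens on the way out of the dictionary, and reverse lookup becomes unsupported because extraction functions are generally not invertible. A standalone sketch with a simplified extraction function and assumed class names:

```java
import java.util.List;

// Simplified String-only extraction function for this sketch.
interface SimpleExtractionFn
{
  String apply(String value);
}

final class ExtractingLookup
{
  private final List<String> dictionary;         // id -> raw value
  private final SimpleExtractionFn extractionFn; // may be null

  ExtractingLookup(List<String> dictionary, SimpleExtractionFn extractionFn)
  {
    this.dictionary = dictionary;
    this.extractionFn = extractionFn;
  }

  String lookupName(int id)
  {
    final String value = dictionary.get(id);
    return extractionFn == null ? value : extractionFn.apply(value);
  }

  int lookupId(String name)
  {
    // Many raw values can map to one extracted value, so there is no
    // well-defined inverse; mirror the behavior above and fail fast.
    if (extractionFn != null) {
      throw new UnsupportedOperationException("cannot perform lookup when applying an extraction function");
    }
    return dictionary.indexOf(name);
  }
}
```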

View File

@ -28,6 +28,7 @@ import com.metamx.common.guava.Sequence;
import com.metamx.common.guava.Sequences; import com.metamx.common.guava.Sequences;
import io.druid.granularity.QueryGranularity; import io.druid.granularity.QueryGranularity;
import io.druid.query.QueryInterruptedException; import io.druid.query.QueryInterruptedException;
import io.druid.query.extraction.ExtractionFn;
import io.druid.query.filter.Filter; import io.druid.query.filter.Filter;
import io.druid.query.filter.ValueMatcher; import io.druid.query.filter.ValueMatcher;
import io.druid.query.filter.ValueMatcherFactory; import io.druid.query.filter.ValueMatcherFactory;
@ -38,6 +39,7 @@ import io.druid.segment.FloatColumnSelector;
import io.druid.segment.LongColumnSelector; import io.druid.segment.LongColumnSelector;
import io.druid.segment.NullDimensionSelector; import io.druid.segment.NullDimensionSelector;
import io.druid.segment.ObjectColumnSelector; import io.druid.segment.ObjectColumnSelector;
import io.druid.segment.SingleScanTimeDimSelector;
import io.druid.segment.StorageAdapter; import io.druid.segment.StorageAdapter;
import io.druid.segment.column.Column; import io.druid.segment.column.Column;
import io.druid.segment.data.Indexed; import io.druid.segment.data.Indexed;
@ -99,6 +101,9 @@ public class IncrementalIndexStorageAdapter implements StorageAdapter
@Override @Override
public int getDimensionCardinality(String dimension) public int getDimensionCardinality(String dimension)
{ {
if (dimension.equals(Column.TIME_COLUMN_NAME)) {
return Integer.MAX_VALUE;
}
IncrementalIndex.DimDim dimDim = index.getDimension(dimension); IncrementalIndex.DimDim dimDim = index.getDimension(dimension);
if (dimDim == null) { if (dimDim == null) {
return 0; return 0;
@ -272,8 +277,12 @@ public class IncrementalIndexStorageAdapter implements StorageAdapter
} }
@Override @Override
public DimensionSelector makeDimensionSelector(String dimension) public DimensionSelector makeDimensionSelector(final String dimension, @Nullable final ExtractionFn extractionFn)
{ {
if (dimension.equals(Column.TIME_COLUMN_NAME)) {
return new SingleScanTimeDimSelector(makeLongColumnSelector(dimension), extractionFn);
}
final IncrementalIndex.DimDim dimValLookup = index.getDimension(dimension); final IncrementalIndex.DimDim dimValLookup = index.getDimension(dimension);
if (dimValLookup == null) { if (dimValLookup == null) {
return NULL_DIMENSION_SELECTOR; return NULL_DIMENSION_SELECTOR;
@ -331,12 +340,17 @@ public class IncrementalIndexStorageAdapter implements StorageAdapter
@Override @Override
public String lookupName(int id) public String lookupName(int id)
{ {
return dimValLookup.getValue(id); final String value = dimValLookup.getValue(id);
return extractionFn == null ? value : extractionFn.apply(value);
} }
@Override @Override
public int lookupId(String name) public int lookupId(String name)
{ {
if (extractionFn != null) {
throw new UnsupportedOperationException("cannot perform lookup when applying an extraction function");
}
return dimValLookup.getId(name); return dimValLookup.getId(name);
} }
}; };
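SingleScanTimeDimSelector covers the case where cardinality is unknown up front: because a cursor visits rows in time order, ids for the distinct extracted time values can be assigned lazily during the single scan. A hedged sketch of that idea, with assumed names:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only; the real SingleScanTimeDimSelector wraps a
// LongColumnSelector and an ExtractionFn. This sketch keeps just the
// lazy dictionary at its core.
final class LazyTimeDictionary
{
  private final Map<String, Integer> idByValue = new HashMap<>();
  private final List<String> valueById = new ArrayList<>();

  int idFor(String extractedTimeValue)
  {
    Integer id = idByValue.get(extractedTimeValue);
    if (id == null) {
      id = valueById.size();
      idByValue.put(extractedTimeValue, id);
      valueById.add(extractedTimeValue);
    }
    return id;
  }

  String lookupName(int id)
  {
    return valueById.get(id);
  }
}
```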

View File

@ -18,6 +18,7 @@
package io.druid.query.aggregation; package io.druid.query.aggregation;
import com.google.common.collect.Lists; import com.google.common.collect.Lists;
import io.druid.query.extraction.ExtractionFn;
import io.druid.query.filter.AndDimFilter; import io.druid.query.filter.AndDimFilter;
import io.druid.query.filter.DimFilter; import io.druid.query.filter.DimFilter;
import io.druid.query.filter.NotDimFilter; import io.druid.query.filter.NotDimFilter;
@ -70,7 +71,7 @@ public class FilteredAggregatorTest
return new ColumnSelectorFactory() return new ColumnSelectorFactory()
{ {
@Override @Override
public DimensionSelector makeDimensionSelector(String dimensionName) public DimensionSelector makeDimensionSelector(String dimensionName, ExtractionFn extractionFn)
{ {
if (dimensionName.equals("dim")) { if (dimensionName.equals("dim")) {
return new DimensionSelector() return new DimensionSelector()

View File

@ -0,0 +1,98 @@
/*
* Licensed to Metamarkets Group Inc. (Metamarkets) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. Metamarkets licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package io.druid.query.extraction;
import com.fasterxml.jackson.databind.ObjectMapper;
import io.druid.jackson.DefaultObjectMapper;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.junit.Assert;
import org.junit.Test;
public class TimeFormatExtractionFnTest
{
private static final long[] timestamps = {
new DateTime("2015-01-01T23:00:00Z").getMillis(),
new DateTime("2015-01-02T23:00:00Z").getMillis(),
new DateTime("2015-03-03T23:00:00Z").getMillis(),
new DateTime("2015-03-04T23:00:00Z").getMillis(),
new DateTime("2015-05-02T23:00:00Z").getMillis(),
new DateTime("2015-12-21T23:00:00Z").getMillis()
};
@Test(expected = IllegalArgumentException.class)
public void testIAEForNullPattern() throws Exception
{
new TimeFormatExtractionFn(null, null, null);
}
@Test
public void testDayOfWeekExtraction() throws Exception
{
TimeFormatExtractionFn fn = new TimeFormatExtractionFn("EEEE", null, null);
Assert.assertEquals("Thursday", fn.apply(timestamps[0]));
Assert.assertEquals("Friday", fn.apply(timestamps[1]));
Assert.assertEquals("Tuesday", fn.apply(timestamps[2]));
Assert.assertEquals("Wednesday", fn.apply(timestamps[3]));
Assert.assertEquals("Saturday", fn.apply(timestamps[4]));
Assert.assertEquals("Monday", fn.apply(timestamps[5]));
testSerde(fn, "EEEE", null, null);
}
@Test
public void testLocalizedExtraction() throws Exception
{
TimeFormatExtractionFn fn = new TimeFormatExtractionFn("EEEE", null, "is");
Assert.assertEquals("fimmtudagur", fn.apply(timestamps[0]));
Assert.assertEquals("föstudagur", fn.apply(timestamps[1]));
Assert.assertEquals("þriðjudagur", fn.apply(timestamps[2]));
Assert.assertEquals("miðvikudagur", fn.apply(timestamps[3]));
Assert.assertEquals("laugardagur", fn.apply(timestamps[4]));
Assert.assertEquals("mánudagur", fn.apply(timestamps[5]));
testSerde(fn, "EEEE", null, "is");
}
@Test
public void testTimeZoneExtraction() throws Exception
{
TimeFormatExtractionFn fn = new TimeFormatExtractionFn("'In Berlin ist es schon 'EEEE", DateTimeZone.forID("Europe/Berlin"), "de");
Assert.assertEquals("In Berlin ist es schon Freitag", fn.apply(timestamps[0]));
Assert.assertEquals("In Berlin ist es schon Samstag", fn.apply(timestamps[1]));
Assert.assertEquals("In Berlin ist es schon Mittwoch", fn.apply(timestamps[2]));
Assert.assertEquals("In Berlin ist es schon Donnerstag", fn.apply(timestamps[3]));
Assert.assertEquals("In Berlin ist es schon Sonntag", fn.apply(timestamps[4]));
Assert.assertEquals("In Berlin ist es schon Dienstag", fn.apply(timestamps[5]));
testSerde(fn, "'In Berlin ist es schon 'EEEE", DateTimeZone.forID("Europe/Berlin"), "de");
}
public void testSerde(TimeFormatExtractionFn fn, String format, DateTimeZone tz, String locale) throws Exception
{
ObjectMapper objectMapper = new DefaultObjectMapper();
String json = objectMapper.writeValueAsString(fn);
TimeFormatExtractionFn deserialized = objectMapper.readValue(json, TimeFormatExtractionFn.class);
Assert.assertEquals(format, deserialized.getFormat());
Assert.assertEquals(tz, deserialized.getTimeZone());
Assert.assertEquals(locale, deserialized.getLocale());
}
}
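The behavior these tests pin down can be reproduced with Joda-Time directly. A minimal sketch of the formatting, not the class's actual source; the UTC and default-locale fallbacks for null arguments are assumptions:

```java
import java.util.Locale;
import org.joda.time.DateTimeZone;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

final class TimeFormatSketch
{
  private final DateTimeFormatter formatter;

  TimeFormatSketch(String format, DateTimeZone tz, String localeString)
  {
    if (format == null) {
      throw new IllegalArgumentException("format cannot be null");
    }
    // Null handling here is an assumption for the sketch.
    final Locale locale = localeString == null ? Locale.getDefault() : Locale.forLanguageTag(localeString);
    this.formatter = DateTimeFormat.forPattern(format)
                                   .withZone(tz == null ? DateTimeZone.UTC : tz)
                                   .withLocale(locale);
  }

  String apply(long timestamp)
  {
    return formatter.print(timestamp);
  }
}
```

For example, `new TimeFormatSketch("EEEE", null, "is").apply(timestamps[0])` yields "fimmtudagur", matching testLocalizedExtraction above.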

View File

@ -19,14 +19,15 @@ package io.druid.query.extraction.extraction;
import com.google.common.collect.Iterators; import com.google.common.collect.Iterators;
import com.google.common.collect.Lists; import com.google.common.collect.Lists;
import io.druid.query.extraction.DimExtractionFn; import io.druid.query.extraction.ExtractionFn;
import io.druid.query.extraction.JavascriptDimExtractionFn; import io.druid.query.extraction.JavascriptExtractionFn;
import org.joda.time.DateTime;
import org.junit.Assert; import org.junit.Assert;
import org.junit.Test; import org.junit.Test;
import java.util.Iterator; import java.util.Iterator;
public class JavascriptDimExtractionFnTest public class JavascriptExtractionFnTest
{ {
private static final String[] testStrings = { private static final String[] testStrings = {
"Quito", "Quito",
@ -43,24 +44,46 @@ public class JavascriptDimExtractionFnTest
public void testJavascriptSubstring() public void testJavascriptSubstring()
{ {
String function = "function(str) { return str.substring(0,3); }"; String function = "function(str) { return str.substring(0,3); }";
DimExtractionFn dimExtractionFn = new JavascriptDimExtractionFn(function); ExtractionFn extractionFn = new JavascriptExtractionFn(function);
for (String str : testStrings) { for (String str : testStrings) {
String res = dimExtractionFn.apply(str); String res = extractionFn.apply(str);
Assert.assertEquals(str.substring(0, 3), res); Assert.assertEquals(str.substring(0, 3), res);
} }
} }
@Test
public void testTimeExample() throws Exception
{
String secondOfMinute = "function(t) {\nreturn 'Second ' + Math.floor((t % 60000) / 1000);\n}";
final long millis = new DateTime("2015-01-02T13:00:59.999Z").getMillis();
Assert.assertEquals("Second 59", new JavascriptExtractionFn(secondOfMinute).apply(millis));
}
@Test
public void testLongs() throws Exception
{
String typeOf = "function(x) {\nreturn typeof x\n}";
Assert.assertEquals("number", new JavascriptExtractionFn(typeOf).apply(1234L));
}
@Test
public void testFloats() throws Exception
{
String typeOf = "function(x) {\nreturn typeof x\n}";
Assert.assertEquals("number", new JavascriptExtractionFn(typeOf).apply(1234.0));
}
@Test @Test
public void testCastingAndNull() public void testCastingAndNull()
{ {
String function = "function(x) {\n x = Number(x);\n if(isNaN(x)) return null;\n return Math.floor(x / 5) * 5;\n}"; String function = "function(x) {\n x = Number(x);\n if(isNaN(x)) return null;\n return Math.floor(x / 5) * 5;\n}";
DimExtractionFn dimExtractionFn = new JavascriptDimExtractionFn(function); ExtractionFn extractionFn = new JavascriptExtractionFn(function);
Iterator<String> it = Iterators.forArray("0", "5", "5", "10", null); Iterator<String> it = Iterators.forArray("0", "5", "5", "10", null);
for(String str : Lists.newArrayList("1", "5", "6", "10", "CA")) { for(String str : Lists.newArrayList("1", "5", "6", "10", "CA")) {
String res = dimExtractionFn.apply(str); String res = extractionFn.apply(str);
String expected = it.next(); String expected = it.next();
Assert.assertEquals(expected, res); Assert.assertEquals(expected, res);
} }
@ -70,11 +93,11 @@ public class JavascriptDimExtractionFnTest
public void testJavascriptRegex() public void testJavascriptRegex()
{ {
String function = "function(str) { return str.replace(/[aeiou]/g, ''); }"; String function = "function(str) { return str.replace(/[aeiou]/g, ''); }";
DimExtractionFn dimExtractionFn = new JavascriptDimExtractionFn(function); ExtractionFn extractionFn = new JavascriptExtractionFn(function);
Iterator it = Iterators.forArray("Qt", "Clgry", "Tky", "Stckhlm", "Vncvr", "Prtr", "Wllngtn", "Ontr"); Iterator it = Iterators.forArray("Qt", "Clgry", "Tky", "Stckhlm", "Vncvr", "Prtr", "Wllngtn", "Ontr");
for (String str : testStrings) { for (String str : testStrings) {
String res = dimExtractionFn.apply(str); String res = extractionFn.apply(str);
Assert.assertEquals(it.next(), res); Assert.assertEquals(it.next(), res);
} }
} }
@ -274,13 +297,13 @@ public class JavascriptDimExtractionFnTest
+ "" + ""
+ "}"; + "}";
DimExtractionFn dimExtractionFn = new JavascriptDimExtractionFn(function); ExtractionFn extractionFn = new JavascriptExtractionFn(function);
Iterator<String> inputs = Iterators.forArray("introducing", "exploratory", "analytics", "on", "large", "datasets"); Iterator<String> inputs = Iterators.forArray("introducing", "exploratory", "analytics", "on", "large", "datasets");
Iterator<String> it = Iterators.forArray("introduc", "exploratori", "analyt", "on", "larg", "dataset"); Iterator<String> it = Iterators.forArray("introduc", "exploratori", "analyt", "on", "larg", "dataset");
while(inputs.hasNext()) { while(inputs.hasNext()) {
String res = dimExtractionFn.apply(inputs.next()); String res = extractionFn.apply(inputs.next());
Assert.assertEquals(it.next(), res); Assert.assertEquals(it.next(), res);
} }
} }
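testTimeExample passes raw epoch millis into Javascript; the same arithmetic in plain Java, for the timestamp the test uses:

```java
import org.joda.time.DateTime;

public class SecondOfMinuteExample
{
  public static void main(String[] args)
  {
    // Millis within the current minute, divided down to whole seconds.
    final long millis = new DateTime("2015-01-02T13:00:59.999Z").getMillis();
    final long secondOfMinute = (millis % 60000L) / 1000L;
    System.out.println("Second " + secondOfMinute); // prints "Second 59"
  }
}
```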

View File

@ -18,7 +18,7 @@
package io.druid.query.extraction.extraction; package io.druid.query.extraction.extraction;
import com.google.common.collect.Sets; import com.google.common.collect.Sets;
import io.druid.query.extraction.DimExtractionFn; import io.druid.query.extraction.ExtractionFn;
import io.druid.query.extraction.MatchingDimExtractionFn; import io.druid.query.extraction.MatchingDimExtractionFn;
import org.junit.Assert; import org.junit.Assert;
import org.junit.Test; import org.junit.Test;
@ -47,12 +47,12 @@ public class MatchingDimExtractionFnTest
public void testExtraction() public void testExtraction()
{ {
String regex = ".*[Tt][Oo].*"; String regex = ".*[Tt][Oo].*";
DimExtractionFn dimExtractionFn = new MatchingDimExtractionFn(regex); ExtractionFn extractionFn = new MatchingDimExtractionFn(regex);
List<String> expected = Arrays.asList("Quito", "Tokyo", "Stockholm", "Pretoria", "Wellington"); List<String> expected = Arrays.asList("Quito", "Tokyo", "Stockholm", "Pretoria", "Wellington");
Set<String> extracted = Sets.newHashSet(); Set<String> extracted = Sets.newHashSet();
for (String str : testStrings) { for (String str : testStrings) {
String res = dimExtractionFn.apply(str); String res = extractionFn.apply(str);
if (res != null) { if (res != null) {
extracted.add(res); extracted.add(res);
} }

View File

@ -18,7 +18,7 @@
package io.druid.query.extraction.extraction; package io.druid.query.extraction.extraction;
import com.google.common.collect.Sets; import com.google.common.collect.Sets;
import io.druid.query.extraction.DimExtractionFn; import io.druid.query.extraction.ExtractionFn;
import io.druid.query.extraction.RegexDimExtractionFn; import io.druid.query.extraction.RegexDimExtractionFn;
import org.junit.Assert; import org.junit.Assert;
import org.junit.Test; import org.junit.Test;
@ -53,11 +53,11 @@ public class RegexDimExtractionFnTest
public void testPathExtraction() public void testPathExtraction()
{ {
String regex = "/([^/]+)/"; String regex = "/([^/]+)/";
DimExtractionFn dimExtractionFn = new RegexDimExtractionFn(regex); ExtractionFn extractionFn = new RegexDimExtractionFn(regex);
Set<String> extracted = Sets.newHashSet(); Set<String> extracted = Sets.newHashSet();
for (String path : paths) { for (String path : paths) {
extracted.add(dimExtractionFn.apply(path)); extracted.add(extractionFn.apply(path));
} }
Assert.assertEquals(2, extracted.size()); Assert.assertEquals(2, extracted.size());
@ -69,11 +69,11 @@ public class RegexDimExtractionFnTest
public void testDeeperPathExtraction() public void testDeeperPathExtraction()
{ {
String regex = "^/([^/]+/[^/]+)(/|$)"; String regex = "^/([^/]+/[^/]+)(/|$)";
DimExtractionFn dimExtractionFn = new RegexDimExtractionFn(regex); ExtractionFn extractionFn = new RegexDimExtractionFn(regex);
Set<String> extracted = Sets.newHashSet(); Set<String> extracted = Sets.newHashSet();
for (String path : paths) { for (String path : paths) {
extracted.add(dimExtractionFn.apply(path)); extracted.add(extractionFn.apply(path));
} }
Assert.assertEquals(4, extracted.size()); Assert.assertEquals(4, extracted.size());
@ -87,11 +87,11 @@ public class RegexDimExtractionFnTest
public void testStringExtraction() public void testStringExtraction()
{ {
String regex = "(.)"; String regex = "(.)";
DimExtractionFn dimExtractionFn = new RegexDimExtractionFn(regex); ExtractionFn extractionFn = new RegexDimExtractionFn(regex);
Set<String> extracted = Sets.newHashSet(); Set<String> extracted = Sets.newHashSet();
for (String testString : testStrings) { for (String testString : testStrings) {
extracted.add(dimExtractionFn.apply(testString)); extracted.add(extractionFn.apply(testString));
} }
Assert.assertEquals(3, extracted.size()); Assert.assertEquals(3, extracted.size());

View File

@ -18,7 +18,7 @@
package io.druid.query.extraction.extraction; package io.druid.query.extraction.extraction;
import com.google.common.collect.Sets; import com.google.common.collect.Sets;
import io.druid.query.extraction.DimExtractionFn; import io.druid.query.extraction.ExtractionFn;
import io.druid.query.extraction.SearchQuerySpecDimExtractionFn; import io.druid.query.extraction.SearchQuerySpecDimExtractionFn;
import io.druid.query.search.search.FragmentSearchQuerySpec; import io.druid.query.search.search.FragmentSearchQuerySpec;
import io.druid.query.search.search.SearchQuerySpec; import io.druid.query.search.search.SearchQuerySpec;
@ -50,12 +50,12 @@ public class SearchQuerySpecDimExtractionFnTest
SearchQuerySpec spec = new FragmentSearchQuerySpec( SearchQuerySpec spec = new FragmentSearchQuerySpec(
Arrays.asList("to", "yo") Arrays.asList("to", "yo")
); );
DimExtractionFn dimExtractionFn = new SearchQuerySpecDimExtractionFn(spec); ExtractionFn extractionFn = new SearchQuerySpecDimExtractionFn(spec);
List<String> expected = Arrays.asList("Kyoto", "Tokyo", "Toyokawa", "Yorktown"); List<String> expected = Arrays.asList("Kyoto", "Tokyo", "Toyokawa", "Yorktown");
Set<String> extracted = Sets.newHashSet(); Set<String> extracted = Sets.newHashSet();
for (String str : testStrings) { for (String str : testStrings) {
String res = dimExtractionFn.apply(str); String res = extractionFn.apply(str);
if (res != null) { if (res != null) {
extracted.add(res); extracted.add(res);
} }

View File

@ -18,7 +18,7 @@
package io.druid.query.extraction.extraction; package io.druid.query.extraction.extraction;
import com.google.common.collect.Sets; import com.google.common.collect.Sets;
import io.druid.query.extraction.DimExtractionFn; import io.druid.query.extraction.ExtractionFn;
import io.druid.query.extraction.TimeDimExtractionFn; import io.druid.query.extraction.TimeDimExtractionFn;
import org.junit.Assert; import org.junit.Assert;
import org.junit.Test; import org.junit.Test;
@ -42,10 +42,10 @@ public class TimeDimExtractionFnTest
public void testMonthExtraction() public void testMonthExtraction()
{ {
Set<String> months = Sets.newHashSet(); Set<String> months = Sets.newHashSet();
DimExtractionFn dimExtractionFn = new TimeDimExtractionFn("MM/dd/yyyy", "MM/yyyy"); ExtractionFn extractionFn = new TimeDimExtractionFn("MM/dd/yyyy", "MM/yyyy");
for (String dim : dims) { for (String dim : dims) {
months.add(dimExtractionFn.apply(dim)); months.add(extractionFn.apply(dim));
} }
Assert.assertEquals(months.size(), 4); Assert.assertEquals(months.size(), 4);
@ -59,10 +59,10 @@ public class TimeDimExtractionFnTest
public void testQuarterExtraction() public void testQuarterExtraction()
{ {
Set<String> quarters = Sets.newHashSet(); Set<String> quarters = Sets.newHashSet();
DimExtractionFn dimExtractionFn = new TimeDimExtractionFn("MM/dd/yyyy", "QQQ/yyyy"); ExtractionFn extractionFn = new TimeDimExtractionFn("MM/dd/yyyy", "QQQ/yyyy");
for (String dim : dims) { for (String dim : dims) {
quarters.add(dimExtractionFn.apply(dim)); quarters.add(extractionFn.apply(dim));
} }
Assert.assertEquals(quarters.size(), 3); Assert.assertEquals(quarters.size(), 3);

View File

@ -52,9 +52,14 @@ import io.druid.query.dimension.DefaultDimensionSpec;
import io.druid.query.dimension.DimensionSpec; import io.druid.query.dimension.DimensionSpec;
import io.druid.query.dimension.ExtractionDimensionSpec; import io.druid.query.dimension.ExtractionDimensionSpec;
import io.druid.query.extraction.DimExtractionFn; import io.druid.query.extraction.DimExtractionFn;
import io.druid.query.extraction.ExtractionFn;
import io.druid.query.extraction.RegexDimExtractionFn; import io.druid.query.extraction.RegexDimExtractionFn;
import io.druid.query.extraction.TimeFormatExtractionFn;
import io.druid.query.filter.DimFilter;
import io.druid.query.filter.JavaScriptDimFilter; import io.druid.query.filter.JavaScriptDimFilter;
import io.druid.query.filter.OrDimFilter;
import io.druid.query.filter.RegexDimFilter; import io.druid.query.filter.RegexDimFilter;
import io.druid.query.filter.SelectorDimFilter;
import io.druid.query.groupby.having.EqualToHavingSpec; import io.druid.query.groupby.having.EqualToHavingSpec;
import io.druid.query.groupby.having.GreaterThanHavingSpec; import io.druid.query.groupby.having.GreaterThanHavingSpec;
import io.druid.query.groupby.having.HavingSpec; import io.druid.query.groupby.having.HavingSpec;
@ -64,6 +69,7 @@ import io.druid.query.groupby.orderby.LimitSpec;
import io.druid.query.groupby.orderby.OrderByColumnSpec; import io.druid.query.groupby.orderby.OrderByColumnSpec;
import io.druid.query.spec.MultipleIntervalSegmentSpec; import io.druid.query.spec.MultipleIntervalSegmentSpec;
import io.druid.segment.TestHelper; import io.druid.segment.TestHelper;
import io.druid.segment.column.Column;
import org.joda.time.DateTime; import org.joda.time.DateTime;
import org.joda.time.DateTimeZone; import org.joda.time.DateTimeZone;
import org.joda.time.Interval; import org.joda.time.Interval;
@ -322,8 +328,8 @@ public class GroupByQueryRunnerTest
@Test @Test
public void testGroupByWithNullProducingDimExtractionFn() public void testGroupByWithNullProducingDimExtractionFn()
{ {
final DimExtractionFn fn1 = new RegexDimExtractionFn("(\\w{1})"); final ExtractionFn fn1 = new RegexDimExtractionFn("(\\w{1})");
final DimExtractionFn nullExtractionFn = new DimExtractionFn() final ExtractionFn nullExtractionFn = new DimExtractionFn()
{ {
@Override @Override
public byte[] getCacheKey() public byte[] getCacheKey()
@ -354,7 +360,7 @@ public class GroupByQueryRunnerTest
)) ))
.setGranularity(QueryRunnerTestHelper.dayGran) .setGranularity(QueryRunnerTestHelper.dayGran)
.setDimensions(Lists.<DimensionSpec>newArrayList( .setDimensions(Lists.<DimensionSpec>newArrayList(
new ExtractionDimensionSpec("quality", "alias", nullExtractionFn) new ExtractionDimensionSpec("quality", "alias", nullExtractionFn, null)
)) ))
.build(); .build();
@ -389,8 +395,8 @@ public class GroupByQueryRunnerTest
*/ */
public void testGroupByWithEmptyStringProducingDimExtractionFn() public void testGroupByWithEmptyStringProducingDimExtractionFn()
{ {
final DimExtractionFn fn1 = new RegexDimExtractionFn("(\\w{1})"); final ExtractionFn fn1 = new RegexDimExtractionFn("(\\w{1})");
final DimExtractionFn emptyStringExtractionFn = new DimExtractionFn() final ExtractionFn emptyStringExtractionFn = new DimExtractionFn()
{ {
@Override @Override
public byte[] getCacheKey() public byte[] getCacheKey()
@ -421,7 +427,7 @@ public class GroupByQueryRunnerTest
)) ))
.setGranularity(QueryRunnerTestHelper.dayGran) .setGranularity(QueryRunnerTestHelper.dayGran)
.setDimensions(Lists.<DimensionSpec>newArrayList( .setDimensions(Lists.<DimensionSpec>newArrayList(
new ExtractionDimensionSpec("quality", "alias", emptyStringExtractionFn) new ExtractionDimensionSpec("quality", "alias", emptyStringExtractionFn, null)
)) ))
.build(); .build();
@ -672,11 +678,13 @@ public class GroupByQueryRunnerTest
final GroupByQuery allGranQuery = builder.copy().setGranularity(QueryGranularity.ALL).build(); final GroupByQuery allGranQuery = builder.copy().setGranularity(QueryGranularity.ALL).build();
QueryRunner mergedRunner = factory.getToolchest().mergeResults( QueryRunner mergedRunner = factory.getToolchest().mergeResults(
new QueryRunner<Row>() { new QueryRunner<Row>()
{
@Override @Override
public Sequence<Row> run( public Sequence<Row> run(
Query<Row> query, Map<String, Object> responseContext Query<Row> query, Map<String, Object> responseContext
) { )
{
// simulate two daily segments // simulate two daily segments
final Query query1 = query.withQuerySegmentSpec( final Query query1 = query.withQuerySegmentSpec(
new MultipleIntervalSegmentSpec(Lists.newArrayList(new Interval("2011-04-02/2011-04-03"))) new MultipleIntervalSegmentSpec(Lists.newArrayList(new Interval("2011-04-02/2011-04-03")))
@ -2272,4 +2280,61 @@ public class GroupByQueryRunnerTest
Iterable<Row> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query); Iterable<Row> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
TestHelper.assertExpectedObjects(expectedResults, results, ""); TestHelper.assertExpectedObjects(expectedResults, results, "");
} }
@Test
public void testGroupByTimeExtraction()
{
GroupByQuery query = GroupByQuery
.builder()
.setDataSource(QueryRunnerTestHelper.dataSource)
.setQuerySegmentSpec(QueryRunnerTestHelper.fullOnInterval)
.setDimensions(
Lists.newArrayList(
new DefaultDimensionSpec("market", "market"),
new ExtractionDimensionSpec(
Column.TIME_COLUMN_NAME,
"dayOfWeek",
new TimeFormatExtractionFn("EEEE", null, null),
null
)
)
)
.setAggregatorSpecs(
Arrays.asList(
QueryRunnerTestHelper.rowsCount,
QueryRunnerTestHelper.indexDoubleSum
)
)
.setPostAggregatorSpecs(Arrays.<PostAggregator>asList(QueryRunnerTestHelper.addRowsIndexConstant))
.setGranularity(QueryRunnerTestHelper.allGran)
.setDimFilter(
new OrDimFilter(
Arrays.<DimFilter>asList(
new SelectorDimFilter("market", "spot"),
new SelectorDimFilter("market", "upfront")
)
)
)
.build();
List<Row> expectedResults = Arrays.asList(
GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01", "dayOfWeek", "Friday", "market", "spot", "index", 13219.574157714844, "rows", 117L, "addRowsIndexConstant", 13337.574157714844),
GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01", "dayOfWeek", "Monday", "market", "spot", "index", 13557.738830566406, "rows", 117L, "addRowsIndexConstant", 13675.738830566406),
GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01", "dayOfWeek", "Saturday", "market", "spot", "index", 13493.751281738281, "rows", 117L, "addRowsIndexConstant", 13611.751281738281),
GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01", "dayOfWeek", "Sunday", "market", "spot", "index", 13585.541015625, "rows", 117L, "addRowsIndexConstant", 13703.541015625),
GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01", "dayOfWeek", "Thursday", "market", "spot", "index", 14279.127197265625, "rows", 126L, "addRowsIndexConstant", 14406.127197265625),
GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01", "dayOfWeek", "Tuesday", "market", "spot", "index", 13199.471435546875, "rows", 117L, "addRowsIndexConstant", 13317.471435546875),
GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01", "dayOfWeek", "Wednesday", "market", "spot", "index", 14271.368591308594, "rows", 126L, "addRowsIndexConstant", 14398.368591308594),
GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01", "dayOfWeek", "Friday", "market", "upfront", "index", 27297.8623046875, "rows", 26L, "addRowsIndexConstant", 27324.8623046875),
GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01", "dayOfWeek", "Monday", "market", "upfront", "index", 27619.58447265625, "rows", 26L, "addRowsIndexConstant", 27646.58447265625),
GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01", "dayOfWeek", "Saturday", "market", "upfront", "index", 27820.83154296875, "rows", 26L, "addRowsIndexConstant", 27847.83154296875),
GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01", "dayOfWeek", "Sunday", "market", "upfront", "index", 24791.223876953125, "rows", 26L, "addRowsIndexConstant", 24818.223876953125),
GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01", "dayOfWeek", "Thursday", "market", "upfront", "index", 28562.748901367188, "rows", 28L, "addRowsIndexConstant", 28591.748901367188),
GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01", "dayOfWeek", "Tuesday", "market", "upfront", "index", 26968.280639648438, "rows", 26L, "addRowsIndexConstant", 26995.280639648438),
GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01", "dayOfWeek", "Wednesday", "market", "upfront", "index", 28985.5751953125, "rows", 28L, "addRowsIndexConstant", 29014.5751953125)
);
Iterable<Row> results = GroupByQueryRunnerTestHelper.runQuery(factory, runner, query);
TestHelper.assertExpectedObjects(expectedResults, results, "");
}
} }
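The expected rows follow from formatting each row's timestamp into a day-of-week grouping key, so every Friday in the interval aggregates into a single "Friday" row per market. A small self-contained illustration, assuming UTC and an English locale (consistent with the null timezone and locale in the query above):

```java
import java.util.Locale;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.format.DateTimeFormat;

public class DayOfWeekKeyExample
{
  public static void main(String[] args)
  {
    final long millis = new DateTime("2011-04-01T00:00:00.000Z", DateTimeZone.UTC).getMillis();
    final String key = DateTimeFormat.forPattern("EEEE")
                                     .withZone(DateTimeZone.UTC)
                                     .withLocale(Locale.ENGLISH)
                                     .print(millis);
    System.out.println(key); // prints "Friday"
  }
}
```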

View File

@ -41,13 +41,14 @@ import io.druid.query.aggregation.cardinality.CardinalityAggregatorFactory;
import io.druid.query.aggregation.hyperloglog.HyperUniquesAggregatorFactory; import io.druid.query.aggregation.hyperloglog.HyperUniquesAggregatorFactory;
import io.druid.query.dimension.ExtractionDimensionSpec; import io.druid.query.dimension.ExtractionDimensionSpec;
import io.druid.query.extraction.DimExtractionFn; import io.druid.query.extraction.DimExtractionFn;
import io.druid.query.extraction.ExtractionFn;
import io.druid.query.extraction.RegexDimExtractionFn; import io.druid.query.extraction.RegexDimExtractionFn;
import io.druid.query.extraction.TimeFormatExtractionFn;
import io.druid.query.filter.AndDimFilter; import io.druid.query.filter.AndDimFilter;
import io.druid.query.filter.DimFilter; import io.druid.query.filter.DimFilter;
import io.druid.query.spec.MultipleIntervalSegmentSpec; import io.druid.query.spec.MultipleIntervalSegmentSpec;
import io.druid.query.timeseries.TimeseriesQuery;
import io.druid.query.timeseries.TimeseriesResultValue;
import io.druid.segment.TestHelper; import io.druid.segment.TestHelper;
import io.druid.segment.column.Column;
import org.joda.time.DateTime; import org.joda.time.DateTime;
import org.joda.time.Interval; import org.joda.time.Interval;
import org.junit.Test; import org.junit.Test;
@ -1399,6 +1400,55 @@ public class TopNQueryRunnerTest
TestHelper.assertExpectedResults(expectedResults, runner.run(query, context)); TestHelper.assertExpectedResults(expectedResults, runner.run(query, context));
} }
@Test
public void testTopNCollapsingDimExtraction()
{
TopNQuery query = new TopNQueryBuilder()
.dataSource(QueryRunnerTestHelper.dataSource)
.granularity(QueryRunnerTestHelper.allGran)
.dimension(
new ExtractionDimensionSpec(
QueryRunnerTestHelper.qualityDimension, QueryRunnerTestHelper.qualityDimension,
new RegexDimExtractionFn(".(.)"), null
)
)
.metric("index")
.threshold(2)
.intervals(QueryRunnerTestHelper.fullOnInterval)
.aggregators(
Arrays.asList(
QueryRunnerTestHelper.rowsCount,
QueryRunnerTestHelper.indexDoubleSum
)
)
.postAggregators(Arrays.<PostAggregator>asList(QueryRunnerTestHelper.addRowsIndexConstant))
.build();
List<Result<TopNResultValue>> expectedResults = Arrays.asList(
new Result<>(
new DateTime("2011-01-12T00:00:00.000Z"),
new TopNResultValue(
Arrays.<Map<String, Object>>asList(
ImmutableMap.<String, Object>of(
QueryRunnerTestHelper.qualityDimension, "e",
"rows", 558L,
"index", 246645.1204032898,
"addRowsIndexConstant", 247204.1204032898
),
ImmutableMap.<String, Object>of(
QueryRunnerTestHelper.qualityDimension, "r",
"rows", 372L,
"index", 222051.08961486816,
"addRowsIndexConstant", 222424.08961486816
)
)
)
)
);
HashMap<String, Object> context = new HashMap<String, Object>();
TestHelper.assertExpectedResults(expectedResults, runner.run(query, context));
}
@Test @Test
public void testTopNDimExtraction() public void testTopNDimExtraction()
{ {
@ -1407,7 +1457,7 @@ public class TopNQueryRunnerTest
.granularity(QueryRunnerTestHelper.allGran) .granularity(QueryRunnerTestHelper.allGran)
.dimension( .dimension(
new ExtractionDimensionSpec( new ExtractionDimensionSpec(
QueryRunnerTestHelper.marketDimension, QueryRunnerTestHelper.marketDimension, new RegexDimExtractionFn("(.)") QueryRunnerTestHelper.marketDimension, QueryRunnerTestHelper.marketDimension, new RegexDimExtractionFn("(.)"), null
) )
) )
.metric("rows") .metric("rows")
@ -1459,7 +1509,7 @@ public class TopNQueryRunnerTest
.granularity(QueryRunnerTestHelper.allGran) .granularity(QueryRunnerTestHelper.allGran)
.dimension( .dimension(
new ExtractionDimensionSpec( new ExtractionDimensionSpec(
QueryRunnerTestHelper.marketDimension, QueryRunnerTestHelper.marketDimension, new RegexDimExtractionFn("(.)") QueryRunnerTestHelper.marketDimension, QueryRunnerTestHelper.marketDimension, new RegexDimExtractionFn("(.)"), null
) )
) )
.metric(new LexicographicTopNMetricSpec(null)) .metric(new LexicographicTopNMetricSpec(null))
@ -1511,7 +1561,7 @@ public class TopNQueryRunnerTest
.granularity(QueryRunnerTestHelper.allGran) .granularity(QueryRunnerTestHelper.allGran)
.dimension( .dimension(
new ExtractionDimensionSpec( new ExtractionDimensionSpec(
QueryRunnerTestHelper.marketDimension, QueryRunnerTestHelper.marketDimension, new RegexDimExtractionFn("..(.)") QueryRunnerTestHelper.marketDimension, QueryRunnerTestHelper.marketDimension, new RegexDimExtractionFn("..(.)"), null
) )
) )
.metric(new InvertedTopNMetricSpec(new LexicographicTopNMetricSpec(null))) .metric(new InvertedTopNMetricSpec(new LexicographicTopNMetricSpec(null)))
@ -1563,10 +1613,75 @@ public class TopNQueryRunnerTest
.granularity(QueryRunnerTestHelper.allGran) .granularity(QueryRunnerTestHelper.allGran)
.dimension( .dimension(
new ExtractionDimensionSpec( new ExtractionDimensionSpec(
QueryRunnerTestHelper.marketDimension, QueryRunnerTestHelper.marketDimension, new RegexDimExtractionFn("(.)") QueryRunnerTestHelper.marketDimension, QueryRunnerTestHelper.marketDimension, new RegexDimExtractionFn("(.)"), null
) )
) )
.metric(new LexicographicTopNMetricSpec("spot")) .metric(new LexicographicTopNMetricSpec("s"))
.threshold(4)
.intervals(QueryRunnerTestHelper.firstToThird)
.aggregators(QueryRunnerTestHelper.commonAggregators)
.postAggregators(Arrays.<PostAggregator>asList(QueryRunnerTestHelper.addRowsIndexConstant))
.build();
List<Result<TopNResultValue>> expectedResults = Arrays.asList(
new Result<TopNResultValue>(
new DateTime("2011-04-01T00:00:00.000Z"),
new TopNResultValue(
Arrays.<Map<String, Object>>asList(
ImmutableMap.<String, Object>of(
QueryRunnerTestHelper.marketDimension, "t",
"rows", 4L,
"index", 5351.814697265625D,
"addRowsIndexConstant", 5356.814697265625D,
"uniques", QueryRunnerTestHelper.UNIQUES_2
),
ImmutableMap.<String, Object>of(
QueryRunnerTestHelper.marketDimension, "u",
"rows", 4L,
"index", 4875.669677734375D,
"addRowsIndexConstant", 4880.669677734375D,
"uniques", QueryRunnerTestHelper.UNIQUES_2
)
)
)
)
);
HashMap<String, Object> context = new HashMap<String, Object>();
TestHelper.assertExpectedResults(expectedResults, runner.run(query, context));
}
@Test
public void testTopNLexicographicDimExtractionWithSortingPreservedAndPreviousStop()
{
TopNQuery query = new TopNQueryBuilder()
.dataSource(QueryRunnerTestHelper.dataSource)
.granularity(QueryRunnerTestHelper.allGran)
.dimension(
new ExtractionDimensionSpec(
QueryRunnerTestHelper.marketDimension, QueryRunnerTestHelper.marketDimension,
new DimExtractionFn()
{
@Override
public byte[] getCacheKey()
{
return new byte[0];
}
@Override
public String apply(String value)
{
return value.substring(0, 1);
}
@Override
public boolean preservesOrdering()
{
return true;
}
}, null
)
)
.metric(new LexicographicTopNMetricSpec("s"))
.threshold(4) .threshold(4)
.intervals(QueryRunnerTestHelper.firstToThird) .intervals(QueryRunnerTestHelper.firstToThird)
.aggregators(QueryRunnerTestHelper.commonAggregators) .aggregators(QueryRunnerTestHelper.commonAggregators)
@ -1609,7 +1724,7 @@ public class TopNQueryRunnerTest
.granularity(QueryRunnerTestHelper.allGran) .granularity(QueryRunnerTestHelper.allGran)
.dimension( .dimension(
new ExtractionDimensionSpec( new ExtractionDimensionSpec(
QueryRunnerTestHelper.marketDimension, QueryRunnerTestHelper.marketDimension, new RegexDimExtractionFn("(.)") QueryRunnerTestHelper.marketDimension, QueryRunnerTestHelper.marketDimension, new RegexDimExtractionFn("(.)"), null
) )
) )
.metric(new InvertedTopNMetricSpec(new LexicographicTopNMetricSpec("u"))) .metric(new InvertedTopNMetricSpec(new LexicographicTopNMetricSpec("u")))
@ -1654,7 +1769,7 @@ public class TopNQueryRunnerTest
.granularity(QueryRunnerTestHelper.allGran) .granularity(QueryRunnerTestHelper.allGran)
.dimension( .dimension(
new ExtractionDimensionSpec( new ExtractionDimensionSpec(
QueryRunnerTestHelper.marketDimension, QueryRunnerTestHelper.marketDimension, new RegexDimExtractionFn("..(.)") QueryRunnerTestHelper.marketDimension, QueryRunnerTestHelper.marketDimension, new RegexDimExtractionFn("..(.)"), null
) )
) )
.metric(new InvertedTopNMetricSpec(new LexicographicTopNMetricSpec("p"))) .metric(new InvertedTopNMetricSpec(new LexicographicTopNMetricSpec("p")))
@ -1693,7 +1808,7 @@ public class TopNQueryRunnerTest
@Test @Test
public void testTopNWithNullProducingDimExtractionFn() public void testTopNWithNullProducingDimExtractionFn()
{ {
final DimExtractionFn nullStringDimExtraction = new DimExtractionFn() final ExtractionFn nullStringDimExtraction = new DimExtractionFn()
{ {
@Override @Override
public byte[] getCacheKey() public byte[] getCacheKey()
@ -1723,7 +1838,7 @@ public class TopNQueryRunnerTest
.aggregators(QueryRunnerTestHelper.commonAggregators) .aggregators(QueryRunnerTestHelper.commonAggregators)
.postAggregators(Arrays.<PostAggregator>asList(QueryRunnerTestHelper.addRowsIndexConstant)) .postAggregators(Arrays.<PostAggregator>asList(QueryRunnerTestHelper.addRowsIndexConstant))
.dimension( .dimension(
new ExtractionDimensionSpec(QueryRunnerTestHelper.marketDimension, QueryRunnerTestHelper.marketDimension, nullStringDimExtraction) new ExtractionDimensionSpec(QueryRunnerTestHelper.marketDimension, QueryRunnerTestHelper.marketDimension, nullStringDimExtraction, null)
) )
.build(); .build();
@ -1771,7 +1886,7 @@ public class TopNQueryRunnerTest
*/ */
public void testTopNWithEmptyStringProducingDimExtractionFn() public void testTopNWithEmptyStringProducingDimExtractionFn()
{ {
final DimExtractionFn emptyStringDimExtraction = new DimExtractionFn() final ExtractionFn emptyStringDimExtraction = new DimExtractionFn()
{ {
@Override @Override
public byte[] getCacheKey() public byte[] getCacheKey()
@ -1801,7 +1916,7 @@ public class TopNQueryRunnerTest
.aggregators(QueryRunnerTestHelper.commonAggregators) .aggregators(QueryRunnerTestHelper.commonAggregators)
.postAggregators(Arrays.<PostAggregator>asList(QueryRunnerTestHelper.addRowsIndexConstant)) .postAggregators(Arrays.<PostAggregator>asList(QueryRunnerTestHelper.addRowsIndexConstant))
.dimension( .dimension(
new ExtractionDimensionSpec(QueryRunnerTestHelper.marketDimension, QueryRunnerTestHelper.marketDimension, emptyStringDimExtraction)) new ExtractionDimensionSpec(QueryRunnerTestHelper.marketDimension, QueryRunnerTestHelper.marketDimension, emptyStringDimExtraction, null))
.build(); .build();
@ -2159,4 +2274,55 @@ public class TopNQueryRunnerTest
TestHelper.assertExpectedResults(expectedResults, runner.run(query, Maps.newHashMap())); TestHelper.assertExpectedResults(expectedResults, runner.run(query, Maps.newHashMap()));
} }
@Test
public void testTopNTimeExtraction()
{
TopNQuery query = new TopNQueryBuilder()
.dataSource(QueryRunnerTestHelper.dataSource)
.granularity(QueryRunnerTestHelper.allGran)
.dimension(
new ExtractionDimensionSpec(
Column.TIME_COLUMN_NAME,
"dayOfWeek",
new TimeFormatExtractionFn("EEEE", null, null),
null
)
)
.metric("index")
.threshold(2)
.intervals(QueryRunnerTestHelper.fullOnInterval)
.aggregators(
Arrays.asList(
QueryRunnerTestHelper.rowsCount,
QueryRunnerTestHelper.indexDoubleSum
)
)
.postAggregators(Arrays.<PostAggregator>asList(QueryRunnerTestHelper.addRowsIndexConstant))
.build();
List<Result<TopNResultValue>> expectedResults = Arrays.asList(
new Result<>(
new DateTime("2011-01-12T00:00:00.000Z"),
new TopNResultValue(
Arrays.<Map<String, Object>>asList(
ImmutableMap.<String, Object>of(
"dayOfWeek", "Wednesday",
"rows", 182L,
"index", 76010.28100585938,
"addRowsIndexConstant", 76193.28100585938
),
ImmutableMap.<String, Object>of(
"dayOfWeek", "Thursday",
"rows", 182L,
"index", 75203.26300811768,
"addRowsIndexConstant", 75386.26300811768
)
)
)
)
);
HashMap<String, Object> context = new HashMap<String, Object>();
TestHelper.assertExpectedResults(expectedResults, runner.run(query, context));
}
} }

View File

@ -17,7 +17,6 @@
package io.druid.segment.filter; package io.druid.segment.filter;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap; import com.google.common.collect.ImmutableMap;
import com.metamx.collections.bitmap.BitmapFactory; import com.metamx.collections.bitmap.BitmapFactory;
import com.metamx.collections.bitmap.ConciseBitmapFactory; import com.metamx.collections.bitmap.ConciseBitmapFactory;
@ -25,18 +24,15 @@ import com.metamx.collections.bitmap.ImmutableBitmap;
import com.metamx.collections.bitmap.WrappedConciseBitmap; import com.metamx.collections.bitmap.WrappedConciseBitmap;
import com.metamx.collections.spatial.ImmutableRTree; import com.metamx.collections.spatial.ImmutableRTree;
import io.druid.query.extraction.DimExtractionFn; import io.druid.query.extraction.DimExtractionFn;
import io.druid.query.extraction.ExtractionFn;
import io.druid.query.filter.BitmapIndexSelector; import io.druid.query.filter.BitmapIndexSelector;
import io.druid.query.filter.Filter;
import io.druid.segment.data.ArrayIndexed; import io.druid.segment.data.ArrayIndexed;
import io.druid.segment.data.Indexed; import io.druid.segment.data.Indexed;
import it.uniroma3.mat.extendedset.intset.ConciseSet; import it.uniroma3.mat.extendedset.intset.ConciseSet;
import org.junit.Assert; import org.junit.Assert;
import org.junit.BeforeClass; import org.junit.BeforeClass;
import org.junit.Test; import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import java.util.Collection;
import java.util.Map; import java.util.Map;
/** /**
@ -94,7 +90,7 @@ public class ExtractionDimFilterTest
return null; return null;
} }
}; };
private static final DimExtractionFn DIM_EXTRACTION_FN = new DimExtractionFn() private static final ExtractionFn DIM_EXTRACTION_FN = new DimExtractionFn()
{ {
@Override @Override
public byte[] getCacheKey() public byte[] getCacheKey()

View File

@ -273,7 +273,7 @@ public class IncrementalIndexStorageAdapterTest
Cursor cursor = Sequences.toList(Sequences.limit(cursorSequence, 1), Lists.<Cursor>newArrayList()).get(0); Cursor cursor = Sequences.toList(Sequences.limit(cursorSequence, 1), Lists.<Cursor>newArrayList()).get(0);
DimensionSelector dimSelector; DimensionSelector dimSelector;
dimSelector = cursor.makeDimensionSelector("sally"); dimSelector = cursor.makeDimensionSelector("sally", null);
Assert.assertEquals("bo", dimSelector.lookupName(dimSelector.getRow().get(0))); Assert.assertEquals("bo", dimSelector.lookupName(dimSelector.getRow().get(0)));
index.add( index.add(
@ -287,7 +287,7 @@ public class IncrementalIndexStorageAdapterTest
// Cursor reset should not be affected by out of order values // Cursor reset should not be affected by out of order values
cursor.reset(); cursor.reset();
dimSelector = cursor.makeDimensionSelector("sally"); dimSelector = cursor.makeDimensionSelector("sally", null);
Assert.assertEquals("bo", dimSelector.lookupName(dimSelector.getRow().get(0))); Assert.assertEquals("bo", dimSelector.lookupName(dimSelector.getRow().get(0)));
} }