mirror of https://github.com/apache/druid.git
Merge pull request #2130 from himanshug/fix_filter_multi_valued
Feature to "fix" filtering on multi-valued dimensions
Commit: f8219432ee
@@ -252,3 +252,23 @@ A null dimension value can be mapped to a specific value by specifying the empty

This allows distinguishing between a null dimension and a lookup resulting in a null.

For example, specifying `{"":"bar","bat":"baz"}` with dimension values `[null, "foo", "bat"]` and replacing missing values with `"oof"` will yield results of `["bar", "oof", "baz"]`.

Omitting the empty string key will cause the missing value to take over. For example, specifying `{"bat":"baz"}` with dimension values `[null, "foo", "bat"]` and replacing missing values with `"oof"` will yield results of `["oof", "oof", "baz"]`.
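As an illustration (a hypothetical sketch, not part of this commit; the dimension name and the map-lookup extraction function syntax are assumptions), a spec producing the `["bar", "oof", "baz"]` result above might look like:

```json
{
  "type" : "extraction",
  "dimension" : "tags",
  "outputName" : "tags",
  "extractionFn" : {
    "type" : "lookup",
    "lookup" : { "type" : "map", "map" : { "" : "bar", "bat" : "baz" } },
    "retainMissingValue" : false,
    "replaceMissingValueWith" : "oof"
  }
}
```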
### Filtering DimensionSpecs

These are only valid for multi-valued dimensions. Suppose you have a row in Druid with a multi-valued dimension whose values are ["v1", "v2", "v3"], and you send a groupBy/topN query grouping by that dimension with a [query filter](filter.html) for the value "v1". The response will contain 3 rows, for "v1", "v2" and "v3". This behavior might be unintuitive for some use cases.

This happens because the query filter is used internally on bitmap indexes and only determines which rows are included in query result processing. With multi-valued dimensions, a query filter behaves like a contains check, so it matches the row with dimension value ["v1", "v2", "v3"]. Please see the section on "Multi-value columns" in [segment](../design/segments.html) for more details.

The groupBy/topN processing pipeline then "explodes" all multi-valued dimensions, resulting in 3 rows, one each for "v1", "v2" and "v3".

In addition to the query filter, which efficiently selects the rows to be processed, you can use a filtering dimension spec to filter for specific values within the values of a multi-valued dimension. These dimensionSpecs take a delegate DimensionSpec and filtering criteria. From the "exploded" rows, only rows matching the given filtering criteria are returned in the query result.

The following filtered dimension spec acts as a whitelist or blacklist for values, as per the "isWhitelist" attribute value.
```json
{ "type" : "listFiltered", "delegate" : <dimensionSpec>, "values": <array of strings>, "isWhitelist": <optional attribute for true/false, default is true> }
```
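For example, a concrete whitelist spec keeping only the values "v1" and "v3" of a hypothetical `tags` dimension (the JSON shape matches the serde test added in this commit):

```json
{
  "type" : "listFiltered",
  "delegate" : { "type" : "default", "dimension" : "tags", "outputName" : "tags" },
  "values" : ["v1", "v3"],
  "isWhitelist" : true
}
```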
The following filtered dimension spec retains only the values matching the given regular expression. Note that `listFiltered` is faster than this and should be preferred for simple whitelist or blacklist use cases.
```json
{ "type" : "regexFiltered", "delegate" : <dimensionSpec>, "pattern": <java regex pattern> }
```
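Similarly, a concrete regexFiltered spec (the dimension name and pattern are illustrative) that keeps only values starting with "v":

```json
{
  "type" : "regexFiltered",
  "delegate" : { "type" : "default", "dimension" : "tags", "outputName" : "tags" },
  "pattern" : "v.*"
}
```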
For more details and examples, see [multi-valued dimensions](multi-valued-dimensions.html).
@@ -0,0 +1,238 @@
---
layout: doc_page
---
Druid supports "multi-valued" dimensions. See the section on multi-valued columns in [segments](../design/segments.html) for details of the internal representation. This document describes the behavior of groupBy queries on multi-valued dimensions when they are used as a dimension being grouped by (topN has similar behavior).

Suppose you have a dataSource with a segment that contains the following rows, with a multi-valued dimension called `tags`.
```
2011-01-12T00:00:00.000Z,["t1","t2","t3"]  #row1
2011-01-13T00:00:00.000Z,["t3","t4","t5"]  #row2
2011-01-14T00:00:00.000Z,["t5","t6","t7"]  #row3
```
### Group-By query with no filtering

See [GroupBy querying](groupbyquery.html) for details.

```json
{
  "queryType": "groupBy",
  "dataSource": "test",
  "intervals": [
    "1970-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"
  ],
  "granularity": {
    "type": "all"
  },
  "dimensions": [
    {
      "type": "default",
      "dimension": "tags",
      "outputName": "tags"
    }
  ],
  "aggregations": [
    {
      "type": "count",
      "name": "count"
    }
  ]
}
```

returns the following result:
```json
[
  {
    "timestamp": "1970-01-01T00:00:00.000Z",
    "event": {
      "count": 1,
      "tags": "t1"
    }
  },
  {
    "timestamp": "1970-01-01T00:00:00.000Z",
    "event": {
      "count": 1,
      "tags": "t2"
    }
  },
  {
    "timestamp": "1970-01-01T00:00:00.000Z",
    "event": {
      "count": 2,
      "tags": "t3"
    }
  },
  {
    "timestamp": "1970-01-01T00:00:00.000Z",
    "event": {
      "count": 1,
      "tags": "t4"
    }
  },
  {
    "timestamp": "1970-01-01T00:00:00.000Z",
    "event": {
      "count": 2,
      "tags": "t5"
    }
  },
  {
    "timestamp": "1970-01-01T00:00:00.000Z",
    "event": {
      "count": 1,
      "tags": "t6"
    }
  },
  {
    "timestamp": "1970-01-01T00:00:00.000Z",
    "event": {
      "count": 1,
      "tags": "t7"
    }
  }
]
```
Notice how the original rows are "exploded" into multiple rows and then merged.

### Group-By query with a selector query filter

See [query filters](filters.html) for details of the selector query filter.
```json
{
  "queryType": "groupBy",
  "dataSource": "test",
  "intervals": [
    "1970-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"
  ],
  "filter": {
    "type": "selector",
    "dimension": "tags",
    "value": "t3"
  },
  "granularity": {
    "type": "all"
  },
  "dimensions": [
    {
      "type": "default",
      "dimension": "tags",
      "outputName": "tags"
    }
  ],
  "aggregations": [
    {
      "type": "count",
      "name": "count"
    }
  ]
}
```

returns the following result:
```json
[
  {
    "timestamp": "1970-01-01T00:00:00.000Z",
    "event": {
      "count": 1,
      "tags": "t1"
    }
  },
  {
    "timestamp": "1970-01-01T00:00:00.000Z",
    "event": {
      "count": 1,
      "tags": "t2"
    }
  },
  {
    "timestamp": "1970-01-01T00:00:00.000Z",
    "event": {
      "count": 2,
      "tags": "t3"
    }
  },
  {
    "timestamp": "1970-01-01T00:00:00.000Z",
    "event": {
      "count": 1,
      "tags": "t4"
    }
  },
  {
    "timestamp": "1970-01-01T00:00:00.000Z",
    "event": {
      "count": 1,
      "tags": "t5"
    }
  }
]
```
You might be surprised to see "t1", "t2", "t4" and "t5" included in the results. This happens because the query filter is applied to the row before it is exploded: for multi-valued dimensions, a selector filter for "t3" matches row1 and row2, and only afterwards is the explosion done. A query filter matches a row if any individual value among the multiple values matches the filter.

### Group-By query with a selector query filter and additional filter in "dimensions" attribute

To solve the problem above and have only rows for "t3" returned, you would use a "filtered dimension spec", as in the query below.

See the section on filtered dimensionSpecs in [dimensionSpecs](dimensionspecs.html) for details.
```json
{
  "queryType": "groupBy",
  "dataSource": "test",
  "intervals": [
    "1970-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z"
  ],
  "filter": {
    "type": "selector",
    "dimension": "tags",
    "value": "t3"
  },
  "granularity": {
    "type": "all"
  },
  "dimensions": [
    {
      "type": "listFiltered",
      "delegate": {
        "type": "default",
        "dimension": "tags",
        "outputName": "tags"
      },
      "values": ["t3"]
    }
  ],
  "aggregations": [
    {
      "type": "count",
      "name": "count"
    }
  ]
}
```

returns the following result:
```json
[
  {
    "timestamp": "1970-01-01T00:00:00.000Z",
    "event": {
      "count": 2,
      "tags": "t3"
    }
  }
]
```
Note that, for groupBy queries, you could get a similar result with a [having spec](having.html), but using a filtered dimensionSpec is much more efficient: it is applied at the lowest level of the query processing pipeline, while a having spec is applied at the highest level of groupBy query processing.
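For comparison, a hedged sketch of the less efficient having-spec alternative (this JSON is illustrative only and not part of this commit; it assumes a dimension-value having spec such as `dimSelector` is available in your Druid version):

```json
{
  "queryType": "groupBy",
  "dataSource": "test",
  "dimensions": [ { "type": "default", "dimension": "tags", "outputName": "tags" } ],
  "having": { "type": "dimSelector", "dimension": "tags", "value": "t3" }
}
```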
@@ -38,6 +38,7 @@ h2. Querying
** "Context":../querying/query-context.html
* "SQL":../querying/sql.html
* "Joins":../querying/joins.html
+* "Multi-Valued Dimensions":../querying/multi-valued-dimensions.html

h2. Design
* "Overview":../design/design.html
@@ -33,6 +33,7 @@ import io.druid.query.aggregation.Aggregators;
import io.druid.query.aggregation.BufferAggregator;
import io.druid.query.aggregation.hyperloglog.HyperLogLogCollector;
import io.druid.query.aggregation.hyperloglog.HyperUniquesAggregatorFactory;
+import io.druid.query.dimension.DefaultDimensionSpec;
import io.druid.segment.ColumnSelectorFactory;
import io.druid.segment.DimensionSelector;
import org.apache.commons.codec.binary.Base64;

@@ -107,7 +108,7 @@ public class CardinalityAggregatorFactory implements AggregatorFactory
  @Override
  public DimensionSelector apply(@Nullable String input)
  {
-    return columnFactory.makeDimensionSelector(input, null);
+    return columnFactory.makeDimensionSelector(new DefaultDimensionSpec(input, input));
  }
}
), Predicates.notNull()
@@ -0,0 +1,68 @@
/*
 * Licensed to Metamarkets Group Inc. (Metamarkets) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. Metamarkets licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package io.druid.query.dimension;

import com.fasterxml.jackson.annotation.JsonProperty;
import com.google.common.base.Preconditions;
import io.druid.query.extraction.ExtractionFn;

/**
 */
public abstract class BaseFilteredDimensionSpec implements DimensionSpec
{
  protected final DimensionSpec delegate;

  public BaseFilteredDimensionSpec(
      @JsonProperty("delegate") DimensionSpec delegate
  )
  {
    this.delegate = Preconditions.checkNotNull(delegate, "delegate must not be null");
  }

  @JsonProperty
  public DimensionSpec getDelegate()
  {
    return delegate;
  }

  @Override
  public String getDimension()
  {
    return delegate.getDimension();
  }

  @Override
  public String getOutputName()
  {
    return delegate.getOutputName();
  }

  @Override
  public ExtractionFn getExtractionFn()
  {
    return delegate.getExtractionFn();
  }

  @Override
  public boolean preservesOrdering()
  {
    return delegate.preservesOrdering();
  }
}
@@ -23,6 +23,7 @@ import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.metamx.common.StringUtils;
import io.druid.query.extraction.ExtractionFn;
+import io.druid.segment.DimensionSelector;

import java.nio.ByteBuffer;

@@ -66,6 +67,12 @@ public class DefaultDimensionSpec implements DimensionSpec
    return null;
  }

+  @Override
+  public DimensionSelector decorate(DimensionSelector selector)
+  {
+    return selector;
+  }
+
  @Override
  public byte[] getCacheKey()
  {
@@ -22,23 +22,30 @@ package io.druid.query.dimension;
import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;
import io.druid.query.extraction.ExtractionFn;
+import io.druid.segment.DimensionSelector;

/**
 */
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type", defaultImpl = LegacyDimensionSpec.class)
@JsonSubTypes(value = {
    @JsonSubTypes.Type(name = "default", value = DefaultDimensionSpec.class),
-    @JsonSubTypes.Type(name = "extraction", value = ExtractionDimensionSpec.class)
+    @JsonSubTypes.Type(name = "extraction", value = ExtractionDimensionSpec.class),
+    @JsonSubTypes.Type(name = "regexFiltered", value = RegexFilteredDimensionSpec.class),
+    @JsonSubTypes.Type(name = "listFiltered", value = ListFilteredDimensionSpec.class)
})
public interface DimensionSpec
{
-  public String getDimension();
+  String getDimension();

-  public String getOutputName();
+  String getOutputName();

-  public ExtractionFn getExtractionFn();
+  //ExtractionFn can be implemented with decorate(..) fn
+  @Deprecated
+  ExtractionFn getExtractionFn();

-  public byte[] getCacheKey();
+  DimensionSelector decorate(DimensionSelector selector);

-  public boolean preservesOrdering();
+  byte[] getCacheKey();
+
+  boolean preservesOrdering();
}
@@ -24,6 +24,7 @@ import com.fasterxml.jackson.annotation.JsonProperty;
import com.google.common.base.Preconditions;
import com.metamx.common.StringUtils;
import io.druid.query.extraction.ExtractionFn;
+import io.druid.segment.DimensionSelector;

import java.nio.ByteBuffer;

@@ -77,6 +78,12 @@ public class ExtractionDimensionSpec implements DimensionSpec
    return extractionFn;
  }

+  @Override
+  public DimensionSelector decorate(DimensionSelector selector)
+  {
+    return selector;
+  }
+
  @Override
  public byte[] getCacheKey()
  {
@@ -0,0 +1,191 @@
/* Apache License 2.0 header, identical to the one above. */

package io.druid.query.dimension;

import com.fasterxml.jackson.annotation.JsonProperty;
import com.google.common.base.Preconditions;
import com.metamx.common.StringUtils;
import io.druid.query.filter.DimFilterCacheHelper;
import io.druid.segment.DimensionSelector;
import io.druid.segment.data.IndexedInts;
import io.druid.segment.data.ListBasedIndexedInts;

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 */
public class ListFilteredDimensionSpec extends BaseFilteredDimensionSpec
{
  private static final byte CACHE_TYPE_ID = 0x3;

  private final List<String> values;
  private final boolean isWhitelist;

  public ListFilteredDimensionSpec(
      @JsonProperty("delegate") DimensionSpec delegate,
      @JsonProperty("values") List<String> values,
      @JsonProperty("isWhitelist") Boolean isWhitelist
  )
  {
    super(delegate);

    Preconditions.checkArgument(values != null && values.size() > 0, "values list must be non-empty");
    this.values = values;

    this.isWhitelist = isWhitelist == null ? true : isWhitelist.booleanValue();
  }

  @JsonProperty
  public List<String> getValues()
  {
    return values;
  }

  @JsonProperty("isWhitelist")
  public boolean isWhitelist()
  {
    return isWhitelist;
  }

  @Override
  public DimensionSelector decorate(final DimensionSelector selector)
  {
    if (selector == null) {
      return selector;
    }

    final Set<Integer> matched = new HashSet<>(values.size());
    for (String value : values) {
      int i = selector.lookupId(value);
      if (i >= 0) {
        matched.add(i);
      }
    }

    return new DimensionSelector()
    {
      @Override
      public IndexedInts getRow()
      {
        IndexedInts baseRow = selector.getRow();
        List<Integer> result = new ArrayList<>(baseRow.size());

        for (int i : baseRow) {
          if (matched.contains(i)) {
            if (isWhitelist) {
              result.add(i);
            }
          } else {
            if (!isWhitelist) {
              result.add(i);
            }
          }
        }

        return new ListBasedIndexedInts(result);
      }

      @Override
      public int getValueCardinality()
      {
        return matched.size();
      }

      @Override
      public String lookupName(int id)
      {
        return selector.lookupName(id);
      }

      @Override
      public int lookupId(String name)
      {
        return selector.lookupId(name);
      }
    };
  }

  @Override
  public byte[] getCacheKey()
  {
    byte[] delegateCacheKey = delegate.getCacheKey();

    byte[][] valuesBytes = new byte[values.size()][];
    int valuesBytesSize = 0;
    int index = 0;
    for (String value : values) {
      valuesBytes[index] = StringUtils.toUtf8(value);
      valuesBytesSize += valuesBytes[index].length + 1;
      ++index;
    }

    ByteBuffer filterCacheKey = ByteBuffer.allocate(3 + delegateCacheKey.length + valuesBytesSize)
                                          .put(CACHE_TYPE_ID)
                                          .put(delegateCacheKey)
                                          .put((byte) (isWhitelist ? 1 : 0))
                                          .put(DimFilterCacheHelper.STRING_SEPARATOR);
    for (byte[] bytes : valuesBytes) {
      filterCacheKey.put(bytes)
                    .put(DimFilterCacheHelper.STRING_SEPARATOR);
    }
    return filterCacheKey.array();
  }

  @Override
  public boolean equals(Object o)
  {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }

    ListFilteredDimensionSpec that = (ListFilteredDimensionSpec) o;

    if (isWhitelist != that.isWhitelist) {
      return false;
    }
    return values.equals(that.values);
  }

  @Override
  public int hashCode()
  {
    int result = values.hashCode();
    result = 31 * result + (isWhitelist ? 1 : 0);
    return result;
  }

  @Override
  public String toString()
  {
    return "ListFilteredDimensionSpec{" +
           "values=" + values +
           ", isWhitelist=" + isWhitelist +
           '}';
  }
}
@@ -0,0 +1,162 @@
/* Apache License 2.0 header, identical to the one above. */

package io.druid.query.dimension;

import com.fasterxml.jackson.annotation.JsonProperty;
import com.google.common.base.Preconditions;
import com.google.common.base.Strings;
import com.metamx.common.StringUtils;
import io.druid.query.filter.DimFilterCacheHelper;
import io.druid.segment.DimensionSelector;
import io.druid.segment.data.IndexedInts;
import io.druid.segment.data.ListBasedIndexedInts;

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.regex.Pattern;

/**
 */
public class RegexFilteredDimensionSpec extends BaseFilteredDimensionSpec
{
  private static final byte CACHE_TYPE_ID = 0x2;

  private final String pattern;

  private final Pattern compiledRegex;

  public RegexFilteredDimensionSpec(
      @JsonProperty("delegate") DimensionSpec delegate,
      @JsonProperty("pattern") String pattern //rows not matching the pattern will be discarded
  )
  {
    super(delegate);
    this.pattern = Preconditions.checkNotNull(pattern, "pattern must not be null");
    this.compiledRegex = Pattern.compile(pattern);
  }

  @JsonProperty
  public String getPattern()
  {
    return pattern;
  }

  @Override
  public DimensionSelector decorate(final DimensionSelector selector)
  {
    if (selector == null) {
      return selector;
    }

    final BitSet bitSetOfIds = new BitSet(selector.getValueCardinality());
    for (int i = 0; i < selector.getValueCardinality(); i++) {
      if (compiledRegex.matcher(Strings.nullToEmpty(selector.lookupName(i))).matches()) {
        bitSetOfIds.set(i);
      }
    }

    return new DimensionSelector()
    {
      @Override
      public IndexedInts getRow()
      {
        IndexedInts baseRow = selector.getRow();
        List<Integer> result = new ArrayList<>(baseRow.size());

        for (int i : baseRow) {
          if (bitSetOfIds.get(i)) {
            result.add(i);
          }
        }

        return new ListBasedIndexedInts(result);
      }

      @Override
      public int getValueCardinality()
      {
        return bitSetOfIds.cardinality();
      }

      @Override
      public String lookupName(int id)
      {
        return selector.lookupName(id);
      }

      @Override
      public int lookupId(String name)
      {
        return selector.lookupId(name);
      }
    };
  }

  @Override
  public byte[] getCacheKey()
  {
    byte[] delegateCacheKey = delegate.getCacheKey();
    byte[] regexBytes = StringUtils.toUtf8(pattern);
    return ByteBuffer.allocate(2 + delegateCacheKey.length + regexBytes.length)
                     .put(CACHE_TYPE_ID)
                     .put(delegateCacheKey)
                     .put(DimFilterCacheHelper.STRING_SEPARATOR)
                     .put(regexBytes)
                     .array();
  }

  @Override
  public boolean equals(Object o)
  {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }

    RegexFilteredDimensionSpec that = (RegexFilteredDimensionSpec) o;

    if (!delegate.equals(that.delegate)) {
      return false;
    }
    return pattern.equals(that.pattern);
  }

  @Override
  public int hashCode()
  {
    int result = delegate.hashCode();
    result = 31 * result + pattern.hashCode();
    return result;
  }

  @Override
  public String toString()
  {
    return "RegexFilteredDimensionSpec{" +
           "pattern='" + pattern + '\'' +
           '}';
  }
}
@@ -24,7 +24,7 @@ import java.util.List;

/**
 */
-class DimFilterCacheHelper
+public class DimFilterCacheHelper
{
  static final byte NOOP_CACHE_ID = -0x4;
  static final byte SELECTOR_CACHE_ID = 0x0;

@@ -37,7 +37,7 @@ class DimFilterCacheHelper
  static final byte JAVASCRIPT_CACHE_ID = 0x7;
  static final byte SPATIAL_CACHE_ID = 0x8;
  static final byte IN_CACHE_ID = 0x9;
-  static final byte STRING_SEPARATOR = (byte) 0xFF;
+  public static final byte STRING_SEPARATOR = (byte) 0xFF;
  public static byte BOUND_CACHE_ID = 0xA;

  static byte[] computeCacheKey(byte cacheIdKey, List<DimFilter> filters)
@@ -316,10 +316,7 @@ public class GroupByQueryEngine

      for (int i = 0; i < dimensionSpecs.size(); ++i) {
        final DimensionSpec dimSpec = dimensionSpecs.get(i);
-        final DimensionSelector selector = cursor.makeDimensionSelector(
-            dimSpec.getDimension(),
-            dimSpec.getExtractionFn()
-        );
+        final DimensionSelector selector = cursor.makeDimensionSelector(dimSpec);
        if (selector != null) {
          dimensions.add(selector);
          dimNames.add(dimSpec.getOutputName());
@@ -175,7 +175,7 @@ public class SearchQueryRunner implements QueryRunner<Result<SearchResultValue>>
        for (DimensionSpec dim : dimsToSearch) {
          dimSelectors.put(
              dim.getOutputName(),
-              cursor.makeDimensionSelector(dim.getDimension(), dim.getExtractionFn())
+              cursor.makeDimensionSelector(dim)
          );
        }
@@ -26,6 +26,7 @@ import com.metamx.common.ISE;
import com.metamx.common.guava.Sequence;
import io.druid.query.QueryRunnerHelper;
import io.druid.query.Result;
+import io.druid.query.dimension.DefaultDimensionSpec;
import io.druid.segment.Cursor;
import io.druid.segment.DimensionSelector;
import io.druid.segment.LongColumnSelector;

@@ -89,7 +90,7 @@ public class SelectQueryEngine
            final Map<String, DimensionSelector> dimSelectors = Maps.newHashMap();
            for (String dim : dims) {
              // switching to using DimensionSpec for select would allow the use of extractionFn here.
-              final DimensionSelector dimSelector = cursor.makeDimensionSelector(dim, null);
+              final DimensionSelector dimSelector = cursor.makeDimensionSelector(new DefaultDimensionSpec(dim, dim));
              dimSelectors.put(dim, dimSelector);
            }
@@ -43,8 +43,7 @@ public class TopNMapFn implements Function<Cursor, Result<TopNResultValue>>
  public Result<TopNResultValue> apply(Cursor cursor)
  {
    final DimensionSelector dimSelector = cursor.makeDimensionSelector(
-        query.getDimensionSpec().getDimension(),
-        query.getDimensionSpec().getExtractionFn()
+        query.getDimensionSpec()
    );
    if (dimSelector == null) {
      return null;
@@ -19,16 +19,14 @@

package io.druid.segment;

-import io.druid.query.extraction.ExtractionFn;
-
-import javax.annotation.Nullable;
+import io.druid.query.dimension.DimensionSpec;

/**
 * Factory class for MetricSelectors
 */
public interface ColumnSelectorFactory
{
-  public DimensionSelector makeDimensionSelector(String dimensionName, @Nullable ExtractionFn extractionFn);
+  public DimensionSelector makeDimensionSelector(DimensionSpec dimensionSpec);
  public FloatColumnSelector makeFloatColumnSelector(String columnName);
  public LongColumnSelector makeLongColumnSelector(String columnName);
  public ObjectColumnSelector makeObjectColumnSelector(String columnName);
@@ -30,6 +30,7 @@ import com.metamx.common.guava.Sequence;
import com.metamx.common.guava.Sequences;
import io.druid.granularity.QueryGranularity;
import io.druid.query.QueryInterruptedException;
+import io.druid.query.dimension.DimensionSpec;
import io.druid.query.extraction.ExtractionFn;
import io.druid.query.filter.Filter;
import io.druid.segment.column.Column;

@@ -44,7 +45,6 @@ import io.druid.segment.data.Offset;
import org.joda.time.DateTime;
import org.joda.time.Interval;

-import javax.annotation.Nullable;
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;

@@ -296,10 +296,19 @@ public class QueryableIndexStorageAdapter implements StorageAdapter

      @Override
      public DimensionSelector makeDimensionSelector(
-          final String dimension,
-          @Nullable final ExtractionFn extractionFn
+          DimensionSpec dimensionSpec
      )
      {
+        return dimensionSpec.decorate(makeDimensionSelectorUndecorated(dimensionSpec));
+      }
+
+      private DimensionSelector makeDimensionSelectorUndecorated(
+          DimensionSpec dimensionSpec
+      )
+      {
+        final String dimension = dimensionSpec.getDimension();
+        final ExtractionFn extractionFn = dimensionSpec.getExtractionFn();
+
        final Column columnDesc = index.getColumn(dimension);
        if (columnDesc == null) {
          return NULL_DIMENSION_SELECTOR;
@@ -0,0 +1,63 @@
/* Apache License 2.0 header, identical to the one above. */

package io.druid.segment.data;

import java.io.IOException;
import java.util.Iterator;
import java.util.List;

/**
 */
public class ListBasedIndexedInts implements IndexedInts
{
  private final List<Integer> expansion;

  public ListBasedIndexedInts(List<Integer> expansion)
  {
    this.expansion = expansion;
  }

  @Override
  public int size()
  {
    return expansion.size();
  }

  @Override
  public int get(int index)
  {
    return expansion.get(index);
  }

  @Override
  public Iterator<Integer> iterator()
  {
    return new IndexedIntsIterator(this);
  }

  @Override
  public void fill(int index, int[] toFill)
  {
    throw new UnsupportedOperationException("fill not supported");
  }

  @Override
  public void close() throws IOException
  {
  }
}
@@ -23,6 +23,7 @@ import com.google.common.base.Predicate;
import com.google.common.base.Strings;
import com.google.common.collect.Lists;
import com.metamx.collections.bitmap.ImmutableBitmap;
+import io.druid.query.dimension.DefaultDimensionSpec;
import io.druid.query.extraction.ExtractionFn;
import io.druid.query.filter.BitmapIndexSelector;
import io.druid.query.filter.Filter;

@@ -124,7 +125,9 @@ public class ExtractionFilter implements Filter
  @Override
  public ValueMatcher makeMatcher(ColumnSelectorFactory columnSelectorFactory)
  {
-    final DimensionSelector dimensionSelector = columnSelectorFactory.makeDimensionSelector(dimension, null);
+    final DimensionSelector dimensionSelector = columnSelectorFactory.makeDimensionSelector(
+        new DefaultDimensionSpec(dimension, dimension)
+    );
    if (dimensionSelector == null) {
      return new BooleanValueMatcher(value.equals(Strings.nullToEmpty(fn.apply(null))));
    } else {
@@ -21,6 +21,7 @@ package io.druid.segment.filter;

import com.google.common.base.Strings;
import com.metamx.collections.bitmap.ImmutableBitmap;
+import io.druid.query.dimension.DefaultDimensionSpec;
import io.druid.query.filter.BitmapIndexSelector;
import io.druid.query.filter.Filter;
import io.druid.query.filter.ValueMatcher;

@@ -60,7 +61,9 @@ public class SelectorFilter implements Filter
  @Override
  public ValueMatcher makeMatcher(ColumnSelectorFactory columnSelectorFactory)
  {
-    final DimensionSelector dimensionSelector = columnSelectorFactory.makeDimensionSelector(dimension, null);
+    final DimensionSelector dimensionSelector = columnSelectorFactory.makeDimensionSelector(
+        new DefaultDimensionSpec(dimension, dimension)
+    );

    // Missing columns match a null or empty string value and don't match anything else
    if (dimensionSelector == null) {
@@ -38,6 +38,7 @@ import io.druid.data.input.impl.SpatialDimensionSchema;
import io.druid.granularity.QueryGranularity;
import io.druid.query.aggregation.AggregatorFactory;
import io.druid.query.aggregation.PostAggregator;
+import io.druid.query.dimension.DimensionSpec;
import io.druid.query.extraction.ExtractionFn;
import io.druid.segment.ColumnSelectorFactory;
import io.druid.segment.DimensionSelector;

@@ -169,8 +170,20 @@ public abstract class IncrementalIndex<AggregatorType> implements Iterable<Row>,
    }

    @Override
-    public DimensionSelector makeDimensionSelector(final String dimension, final ExtractionFn extractionFn)
+    public DimensionSelector makeDimensionSelector(
+        DimensionSpec dimensionSpec
+    )
    {
+      return dimensionSpec.decorate(makeDimensionSelectorUndecorated(dimensionSpec));
+    }
+
+    private DimensionSelector makeDimensionSelectorUndecorated(
+        DimensionSpec dimensionSpec
+    )
+    {
+      final String dimension = dimensionSpec.getDimension();
+      final ExtractionFn extractionFn = dimensionSpec.getExtractionFn();
+
      return new DimensionSelector()
      {
        @Override
@@ -30,6 +30,7 @@ import com.metamx.common.guava.Sequence;
import com.metamx.common.guava.Sequences;
import io.druid.granularity.QueryGranularity;
import io.druid.query.QueryInterruptedException;
+import io.druid.query.dimension.DimensionSpec;
import io.druid.query.extraction.ExtractionFn;
import io.druid.query.filter.Filter;
import io.druid.query.filter.ValueMatcher;

@@ -294,10 +295,19 @@ public class IncrementalIndexStorageAdapter implements StorageAdapter

      @Override
      public DimensionSelector makeDimensionSelector(
-          final String dimension,
-          @Nullable final ExtractionFn extractionFn
+          DimensionSpec dimensionSpec
      )
      {
+        return dimensionSpec.decorate(makeDimensionSelectorUndecorated(dimensionSpec));
+      }
+
+      private DimensionSelector makeDimensionSelectorUndecorated(
+          DimensionSpec dimensionSpec
+      )
+      {
+        final String dimension = dimensionSpec.getDimension();
+        final ExtractionFn extractionFn = dimensionSpec.getExtractionFn();
+
        if (dimension.equals(Column.TIME_COLUMN_NAME)) {
          return new SingleScanTimeDimSelector(makeLongColumnSelector(dimension), extractionFn);
        }
@@ -0,0 +1,260 @@
/* Apache License 2.0 header, identical to the one above. */

package io.druid.query;

import com.fasterxml.jackson.databind.Module;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import com.google.common.collect.Lists;
import com.google.common.io.Files;
import com.metamx.common.guava.Sequence;
import com.metamx.common.guava.Sequences;
import io.druid.data.input.Row;
import io.druid.data.input.impl.CSVParseSpec;
import io.druid.data.input.impl.DimensionsSpec;
import io.druid.data.input.impl.StringInputRowParser;
import io.druid.data.input.impl.TimestampSpec;
import io.druid.granularity.QueryGranularity;
import io.druid.query.aggregation.AggregationTestHelper;
import io.druid.query.aggregation.AggregatorFactory;
import io.druid.query.aggregation.CountAggregatorFactory;
import io.druid.query.dimension.DefaultDimensionSpec;
import io.druid.query.dimension.DimensionSpec;
import io.druid.query.dimension.RegexFilteredDimensionSpec;
import io.druid.query.filter.SelectorDimFilter;
import io.druid.query.groupby.GroupByQuery;
import io.druid.query.groupby.GroupByQueryRunnerTestHelper;
import io.druid.query.spec.LegacySegmentSpec;
import io.druid.segment.IncrementalIndexSegment;
import io.druid.segment.IndexSpec;
import io.druid.segment.QueryableIndex;
import io.druid.segment.QueryableIndexSegment;
import io.druid.segment.TestHelper;
import io.druid.segment.incremental.IncrementalIndex;
import io.druid.segment.incremental.OnheapIncrementalIndex;
import org.apache.commons.io.FileUtils;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 */
public class MultiValuedDimensionTest
{
  private AggregationTestHelper helper;

  private static IncrementalIndex incrementalIndex;
  private static QueryableIndex queryableIndex;

  private static File persistedSegmentDir;

  public MultiValuedDimensionTest() throws Exception
  {
    helper = new AggregationTestHelper(
        ImmutableList.<Module>of(), null
    );
  }

  @BeforeClass
  public static void setupClass() throws Exception
  {
    incrementalIndex = new OnheapIncrementalIndex(
        0,
        QueryGranularity.NONE,
        new AggregatorFactory[]{
            new CountAggregatorFactory("count")
        },
        true,
        5000
    );

    StringInputRowParser parser = new StringInputRowParser(
        new CSVParseSpec(
            new TimestampSpec("timestamp", "iso", null),
            new DimensionsSpec(ImmutableList.of("product", "tags"), null, null),
            "\t",
            ImmutableList.of("timestamp", "product", "tags")
        ),
        "UTF-8"
    );

    String[] rows = new String[]{
        "2011-01-12T00:00:00.000Z,product_1,t1\tt2\tt3",
        "2011-01-13T00:00:00.000Z,product_2,t3\tt4\tt5",
        "2011-01-14T00:00:00.000Z,product_3,t5\tt6\tt7",
    };

    for (String row : rows) {
      incrementalIndex.add(parser.parse(row));
    }

    persistedSegmentDir = Files.createTempDir();
    TestHelper.getTestIndexMerger()
              .persist(incrementalIndex, persistedSegmentDir, ImmutableMap.<String, Object>of(), new IndexSpec());

    queryableIndex = TestHelper.getTestIndexIO().loadIndex(persistedSegmentDir);
  }

  @Test
  public void testGroupByNoFilter() throws Exception
  {
    GroupByQuery query = GroupByQuery
        .builder()
        .setDataSource("xx")
        .setQuerySegmentSpec(new LegacySegmentSpec("1970/3000"))
        .setGranularity(QueryGranularity.ALL)
        .setDimensions(Lists.<DimensionSpec>newArrayList(new DefaultDimensionSpec("tags", "tags")))
        .setAggregatorSpecs(
            Arrays.asList(
                new AggregatorFactory[]
                    {
                        new CountAggregatorFactory("count")
                    }
            )
        )
        .build();

    Sequence<Row> result = helper.runQueryOnSegmentsObjs(
        ImmutableList.of(
            new QueryableIndexSegment("sid1", queryableIndex),
            new IncrementalIndexSegment(incrementalIndex, "sid2")
        ),
        query
    );

    List<Row> expectedResults = Arrays.asList(
        GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01T00:00:00.000Z", "tags", "t1", "count", 2L),
        GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01T00:00:00.000Z", "tags", "t2", "count", 2L),
        GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01T00:00:00.000Z", "tags", "t3", "count", 4L),
        GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01T00:00:00.000Z", "tags", "t4", "count", 2L),
        GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01T00:00:00.000Z", "tags", "t5", "count", 4L),
        GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01T00:00:00.000Z", "tags", "t6", "count", 2L),
        GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01T00:00:00.000Z", "tags", "t7", "count", 2L)
    );

    TestHelper.assertExpectedObjects(expectedResults, Sequences.toList(result, new ArrayList<Row>()), "");

    result = helper.runQueryOnSegmentsObjs(
        ImmutableList.of(
            new QueryableIndexSegment("sid1", queryableIndex),
            new IncrementalIndexSegment(incrementalIndex, "sid2")
        ),
        query
    );
  }

  @Test
  public void testGroupByWithDimFilter() throws Exception
  {
    GroupByQuery query = GroupByQuery
        .builder()
        .setDataSource("xx")
        .setQuerySegmentSpec(new LegacySegmentSpec("1970/3000"))
        .setGranularity(QueryGranularity.ALL)
        .setDimensions(Lists.<DimensionSpec>newArrayList(new DefaultDimensionSpec("tags", "tags")))
        .setAggregatorSpecs(
            Arrays.asList(
                new AggregatorFactory[]
                    {
                        new CountAggregatorFactory("count")
                    }
            )
        )
        .setDimFilter(
            new SelectorDimFilter("tags", "t3")
        )
        .build();

    Sequence<Row> result = helper.runQueryOnSegmentsObjs(
        ImmutableList.of(
            new QueryableIndexSegment("sid1", queryableIndex),
            new IncrementalIndexSegment(incrementalIndex, "sid2")
        ),
        query
    );

    List<Row> expectedResults = Arrays.asList(
        GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01T00:00:00.000Z", "tags", "t1", "count", 2L),
        GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01T00:00:00.000Z", "tags", "t2", "count", 2L),
        GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01T00:00:00.000Z", "tags", "t3", "count", 4L),
        GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01T00:00:00.000Z", "tags", "t4", "count", 2L),
        GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01T00:00:00.000Z", "tags", "t5", "count", 2L)
    );

    TestHelper.assertExpectedObjects(expectedResults, Sequences.toList(result, new ArrayList<Row>()), "");
  }

  @Test
  public void testGroupByWithDimFilterAndWithFilteredDimSpec() throws Exception
  {
    GroupByQuery query = GroupByQuery
        .builder()
        .setDataSource("xx")
        .setQuerySegmentSpec(new LegacySegmentSpec("1970/3000"))
        .setGranularity(QueryGranularity.ALL)
        .setDimensions(
            Lists.<DimensionSpec>newArrayList(
                new RegexFilteredDimensionSpec(
                    new DefaultDimensionSpec("tags", "tags"),
                    "t3"
                )
            )
        )
        .setAggregatorSpecs(
            Arrays.asList(
                new AggregatorFactory[]
                    {
                        new CountAggregatorFactory("count")
                    }
            )
        )
        .setDimFilter(
            new SelectorDimFilter("tags", "t3")
        )
        .build();

    Sequence<Row> result = helper.runQueryOnSegmentsObjs(
        ImmutableList.of(
            new QueryableIndexSegment("sid1", queryableIndex),
            new IncrementalIndexSegment(incrementalIndex, "sid2")
        ),
        query
    );

    List<Row> expectedResults = Arrays.asList(
        GroupByQueryRunnerTestHelper.createExpectedRow("1970-01-01T00:00:00.000Z", "tags", "t3", "count", 4L)
    );

    TestHelper.assertExpectedObjects(expectedResults, Sequences.toList(result, new ArrayList<Row>()), "");
  }

  @AfterClass
  public static void cleanup() throws Exception
  {
    queryableIndex.close();
    incrementalIndex.close();
    FileUtils.deleteDirectory(persistedSegmentDir);
  }
}
@@ -296,12 +296,12 @@ public class AggregationTestHelper

  public Sequence<Row> runQueryOnSegments(final List<File> segmentDirs, final GroupByQuery query)
  {
-    final List<QueryableIndexSegment> segments = Lists.transform(
+    final List<Segment> segments = Lists.transform(
        segmentDirs,
-        new Function<File, QueryableIndexSegment>()
+        new Function<File, Segment>()
        {
          @Override
-          public QueryableIndexSegment apply(File segmentDir)
+          public Segment apply(File segmentDir)
          {
            try {
              return new QueryableIndexSegment("", indexIO.loadIndex(segmentDir));

@@ -314,6 +314,16 @@ public class AggregationTestHelper
    );

+    try {
+      return runQueryOnSegmentsObjs(segments, query);
+    } finally {
+      for(Segment segment: segments) {
+        CloseQuietly.close(segment);
+      }
+    }
+  }
+
+  public Sequence<Row> runQueryOnSegmentsObjs(final List<Segment> segments, final GroupByQuery query)
+  {
    final FinalizeResultsQueryRunner baseRunner = new FinalizeResultsQueryRunner(
        toolChest.postMergeQueryDecoration(
            toolChest.mergeResults(

@@ -350,12 +360,6 @@ public class AggregationTestHelper
    );

    return baseRunner.run(query, Maps.newHashMap());
-    } finally {
-      for(Segment segment: segments) {
-        CloseQuietly.close(segment);
-      }
-    }
  }

  public QueryRunner<Row> makeStringSerdeQueryRunner(final ObjectMapper mapper, final QueryToolChest toolChest, final Query<Row> query, final QueryRunner<Row> baseRunner)
@@ -20,6 +20,7 @@
package io.druid.query.aggregation;

import com.google.common.collect.Lists;
+import io.druid.query.dimension.DimensionSpec;
import io.druid.query.extraction.ExtractionFn;
import io.druid.query.filter.AndDimFilter;
import io.druid.query.filter.DimFilter;

@@ -73,10 +74,14 @@ public class FilteredAggregatorTest
    return new ColumnSelectorFactory()
    {
      @Override
-      public DimensionSelector makeDimensionSelector(String dimensionName, ExtractionFn extractionFn)
+      public DimensionSelector makeDimensionSelector(DimensionSpec dimensionSpec)
      {
+        final String dimensionName = dimensionSpec.getDimension();
+        final ExtractionFn extractionFn = dimensionSpec.getExtractionFn();
+
        if (dimensionName.equals("dim")) {
-          return new DimensionSelector()
+          return dimensionSpec.decorate(
+              new DimensionSelector()
          {
            @Override
            public IndexedInts getRow()

@@ -119,7 +124,8 @@ public class FilteredAggregatorTest
                throw new IllegalArgumentException();
              }
            }
-          };
+          }
+          );
        } else {
          throw new UnsupportedOperationException();
        }
@@ -0,0 +1,113 @@
/* Apache License 2.0 header, identical to the one above. */

package io.druid.query.dimension;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.common.collect.ImmutableList;
import io.druid.segment.TestHelper;
import org.junit.Assert;
import org.junit.Test;

import java.util.Arrays;

/**
 */
public class ListFilteredDimensionSpecTest
{

  @Test
  public void testSerde() throws Exception
  {
    ObjectMapper mapper = TestHelper.getObjectMapper();

    //isWhitelist = true
    String jsonStr = "{\n"
                     + "  \"type\": \"listFiltered\",\n"
                     + "  \"delegate\": {\n"
                     + "    \"type\": \"default\",\n"
                     + "    \"dimension\": \"foo\",\n"
                     + "    \"outputName\": \"bar\"\n"
                     + "  },\n"
                     + "  \"values\": [\"xxx\"]\n"
                     + "}";

    ListFilteredDimensionSpec actual = (ListFilteredDimensionSpec) mapper.readValue(
        mapper.writeValueAsString(mapper.readValue(jsonStr, DimensionSpec.class)),
        DimensionSpec.class
    );

    ListFilteredDimensionSpec expected = new ListFilteredDimensionSpec(
        new DefaultDimensionSpec("foo", "bar"),
        ImmutableList.of("xxx"),
        true
    );

    Assert.assertEquals(expected, actual);

    //isWhitelist = false
    jsonStr = "{\n"
              + "  \"type\": \"listFiltered\",\n"
              + "  \"delegate\": {\n"
              + "    \"type\": \"default\",\n"
              + "    \"dimension\": \"foo\",\n"
              + "    \"outputName\": \"bar\"\n"
              + "  },\n"
              + "  \"values\": [\"xxx\"],\n"
              + "  \"isWhitelist\": false\n"
              + "}";

    actual = (ListFilteredDimensionSpec) mapper.readValue(
        mapper.writeValueAsString(mapper.readValue(jsonStr, DimensionSpec.class)),
        DimensionSpec.class
    );

    expected = new ListFilteredDimensionSpec(
        new DefaultDimensionSpec("foo", "bar"),
        ImmutableList.of("xxx"),
        false
    );

    Assert.assertEquals(expected, actual);
  }

  @Test
  public void testGetCacheKey()
  {
    ListFilteredDimensionSpec spec1 = new ListFilteredDimensionSpec(
        new DefaultDimensionSpec("foo", "bar"),
        ImmutableList.of("xxx"),
        null
    );

    ListFilteredDimensionSpec spec2 = new ListFilteredDimensionSpec(
        new DefaultDimensionSpec("foo", "bar"),
        ImmutableList.of("xyz"),
        null
    );

    Assert.assertFalse(Arrays.equals(spec1.getCacheKey(), spec2.getCacheKey()));

    ListFilteredDimensionSpec spec3 = new ListFilteredDimensionSpec(
        new DefaultDimensionSpec("foo", "bar"),
        ImmutableList.of("xxx"),
        false
    );

    Assert.assertFalse(Arrays.equals(spec1.getCacheKey(), spec3.getCacheKey()));
  }
}
@@ -0,0 +1,76 @@
/* Apache License 2.0 header, identical to the one above. */

package io.druid.query.dimension;

import com.fasterxml.jackson.databind.ObjectMapper;
import io.druid.segment.TestHelper;
import org.junit.Assert;
import org.junit.Test;

import java.util.Arrays;

/**
 */
public class RegexFilteredDimensionSpecTest
{

  @Test
  public void testSerde() throws Exception
  {
    ObjectMapper mapper = TestHelper.getObjectMapper();

    String jsonStr = "{\n"
                     + "  \"type\": \"regexFiltered\",\n"
                     + "  \"delegate\": {\n"
                     + "    \"type\": \"default\",\n"
                     + "    \"dimension\": \"foo\",\n"
                     + "    \"outputName\": \"bar\"\n"
                     + "  },\n"
                     + "  \"pattern\": \"xxx\"\n"
                     + "}";

    RegexFilteredDimensionSpec actual = (RegexFilteredDimensionSpec) mapper.readValue(
        mapper.writeValueAsString(mapper.readValue(jsonStr, DimensionSpec.class)),
        DimensionSpec.class
    );

    RegexFilteredDimensionSpec expected = new RegexFilteredDimensionSpec(
        new DefaultDimensionSpec("foo", "bar"),
        "xxx"
    );

    Assert.assertEquals(expected, actual);
  }

  @Test
  public void testGetCacheKey()
  {
    RegexFilteredDimensionSpec spec1 = new RegexFilteredDimensionSpec(
        new DefaultDimensionSpec("foo", "bar"),
        "xxx"
    );

    RegexFilteredDimensionSpec spec2 = new RegexFilteredDimensionSpec(
        new DefaultDimensionSpec("foo", "bar"),
        "xyz"
    );

    Assert.assertFalse(Arrays.equals(spec1.getCacheKey(), spec2.getCacheKey()));
  }
}
@@ -36,6 +36,7 @@ import io.druid.query.aggregation.AggregatorFactory;
import io.druid.query.aggregation.CountAggregatorFactory;
import io.druid.query.aggregation.JavaScriptAggregatorFactory;
import io.druid.query.aggregation.LongSumAggregatorFactory;
+import io.druid.query.dimension.DefaultDimensionSpec;
import io.druid.query.filter.DimFilters;
import io.druid.query.groupby.GroupByQuery;
import io.druid.query.groupby.GroupByQueryConfig;

@@ -260,7 +261,7 @@ public class IncrementalIndexStorageAdapterTest
      Cursor cursor = Sequences.toList(Sequences.limit(cursorSequence, 1), Lists.<Cursor>newArrayList()).get(0);
      DimensionSelector dimSelector;

-      dimSelector = cursor.makeDimensionSelector("sally", null);
+      dimSelector = cursor.makeDimensionSelector(new DefaultDimensionSpec("sally", "sally"));
      Assert.assertEquals("bo", dimSelector.lookupName(dimSelector.getRow().get(0)));

      index.add(

@@ -274,7 +275,7 @@ public class IncrementalIndexStorageAdapterTest
      // Cursor reset should not be affected by out of order values
      cursor.reset();

-      dimSelector = cursor.makeDimensionSelector("sally", null);
+      dimSelector = cursor.makeDimensionSelector(new DefaultDimensionSpec("sally", "sally"));
      Assert.assertEquals("bo", dimSelector.lookupName(dimSelector.getRow().get(0)));
    }
@@ -31,6 +31,7 @@ import io.druid.data.input.Firehose;
import io.druid.data.input.InputRow;
import io.druid.data.input.MapBasedInputRow;
import io.druid.granularity.QueryGranularity;
+import io.druid.query.dimension.DefaultDimensionSpec;
import io.druid.query.filter.DimFilter;
import io.druid.query.select.EventHolder;
import io.druid.segment.Cursor;

@@ -85,7 +86,9 @@ public class IngestSegmentFirehose implements Firehose

                final Map<String, DimensionSelector> dimSelectors = Maps.newHashMap();
                for (String dim : dims) {
-                  final DimensionSelector dimSelector = cursor.makeDimensionSelector(dim, null);
+                  final DimensionSelector dimSelector = cursor.makeDimensionSelector(
+                      new DefaultDimensionSpec(dim, dim)
+                  );
                  // dimSelector is null if the dimension is not present
                  if (dimSelector != null) {
                    dimSelectors.put(dim, dimSelector);