Merge branch 'master' into druid-0.7.x

Conflicts:
	processing/src/test/java/io/druid/query/search/SearchQueryRunnerTest.java
This commit is contained in:
fjy 2014-11-10 14:52:10 -08:00
commit 6188315293
14 changed files with 126 additions and 36 deletions

View File

@@ -13,6 +13,15 @@ Druid supports the following types of having clauses.
The simplest having clause is a numeric filter.
Numeric filters can be used as the base filters for more complex boolean expressions of filters.
Here's an example of a having-clause numeric filter:
```json
{
"type": "greaterThan",
"aggregation": "myAggMetric",
"value": 100
}
```
#### Equal To
The equalTo filter will match rows with a specific aggregate value.
@@ -21,7 +30,7 @@ The grammar for an `equalTo` filter is as follows:
```json
{
"type": "equalTo",
"aggregation": <aggregate_metric>,
"aggregation": "<aggregate_metric>",
"value": <numeric_value>
}
```
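For instance, a having clause that keeps only rows whose aggregate equals 100 might look like this (the metric name `num_total` is hypothetical):

```json
{
  "type": "equalTo",
  "aggregation": "num_total",
  "value": 100
}
```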
@@ -36,7 +45,7 @@ The grammar for a `greaterThan` filter is as follows:
```json
{
"type": "greaterThan",
"aggregation": <aggregate_metric>,
"aggregation": "<aggregate_metric>",
"value": <numeric_value>
}
```
@@ -51,7 +60,7 @@ The grammar for a `lessThan` filter is as follows:
```json
{
"type": "lessThan",
"aggregation": <aggregate_metric>,
"aggregation": "<aggregate_metric>",
"value": <numeric_value>
}
```
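As noted above, numeric filters can serve as base filters in more complex boolean expressions. A sketch of such a composition (the `"and"` type and `havingSpecs` field name are assumptions following Druid's filter conventions, not shown in this excerpt):

```json
{
  "type": "and",
  "havingSpecs": [
    { "type": "greaterThan", "aggregation": "myAggMetric", "value": 100 },
    { "type": "lessThan", "aggregation": "myAggMetric", "value": 1000 }
  ]
}
```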

View File

@@ -119,15 +119,15 @@ Including this strategy means all timeBoundary queries are always routed to the
Queries with a priority set to less than minPriority are routed to the lowest priority broker. Queries with priority set to greater than maxPriority are routed to the highest priority broker. By default, minPriority is 0 and maxPriority is 1. Using these default values, if a query with priority 0 (the default query priority is 0) is sent, the query skips the priority selection logic.
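Based on the description above, a priority strategy entry might be configured as follows (field names are assumed, not confirmed by this excerpt):

```json
{
  "type" : "priority",
  "minPriority" : 0,
  "maxPriority" : 1
}
```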
### javascript
### JavaScript
Allows defining arbitrary routing rules using a JavaScript function. The function is passed the configuration and the query to be executed, and returns the tier it should be routed to, or null for the default tier.
*Example*: a function that returns the highest priority broker unless the given query has more than two aggregators.
*Example*: a function that sends queries containing more than three aggregators to the lowest priority broker.
```json
{
"type" : "javascript",
"function" : "function (config, query) { if (config.getTierToBrokerMap().values().size() > 0 && query.getAggregatorSpecs && query.getAggregatorSpecs().size() <= 2) { return config.getTierToBrokerMap().values().toArray()[0] } else { return config.getDefaultBrokerServiceName() } }"
"function" : "function (config, query) { if (query.getAggregatorSpecs && query.getAggregatorSpecs().size() >= 3) { var size = config.getTierToBrokerMap().values().size(); if (size > 0) { return config.getTierToBrokerMap().values().toArray()[size-1] } else { return config.getDefaultBrokerServiceName() } } else { return null } }"
}
```

View File

@@ -30,14 +30,14 @@ There are several main parts to a search query:
|property|description|required?|
|--------|-----------|---------|
|queryType|This String should always be "search"; this is the first thing Druid looks at to figure out how to interpret the query|yes|
|dataSource|A String defining the data source to query, very similar to a table in a relational database|yes|
|granularity|Defines the granularity of the query. See [Granularities](Granularities.html)|yes|
|filter|See [Filters](Filters.html)|no|
|queryType|This String should always be "search"; this is the first thing Druid looks at to figure out how to interpret the query.|yes|
|dataSource|A String defining the data source to query, very similar to a table in a relational database.|yes|
|granularity|Defines the granularity of the query. See [Granularities](Granularities.html).|yes|
|filter|See [Filters](Filters.html).|no|
|intervals|A JSON Object representing ISO-8601 Intervals. This defines the time ranges to run the query over.|yes|
|searchDimensions|The dimensions to run the search over. Excluding this means the search is run over all dimensions.|no|
|query|See [SearchQuerySpec](SearchQuerySpec.html).|yes|
|sort|How the results of the search should be sorted. Two possible types here are "lexicographic" and "strlen".|yes|
|sort|An object specifying how the results of the search should be sorted. Two possible types here are "lexicographic" (the default sort) and "strlen".|no|
|context|An additional JSON Object which can be used to specify certain flags.|no|
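Putting these properties together, a minimal search query might look like the following (the data source, dimension names, and query spec values are hypothetical):

```json
{
  "queryType": "search",
  "dataSource": "sample_datasource",
  "granularity": "all",
  "searchDimensions": ["dim1", "dim2"],
  "query": {
    "type": "insensitive_contains",
    "value": "Ke"
  },
  "sort": { "type": "lexicographic" },
  "intervals": ["2013-01-01T00:00:00.000/2013-01-03T00:00:00.000"]
}
```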
The format of the result is:

View File

@@ -26,9 +26,9 @@ There are several main parts to a select query:
|dataSource|A String defining the data source to query, very similar to a table in a relational database|yes|
|intervals|A JSON Object representing ISO-8601 Intervals. This defines the time ranges to run the query over.|yes|
|filter|See [Filters](Filters.html)|no|
|dimensions|The list of dimensions to select. If left empty, all dimensions are returned.|no|
|metrics|The list of metrics to select. If left empty, all metrics are returned.|no|
|pagingSpec|A JSON object indicating offsets into different scanned segments. Select query results will return a pagingSpec that can be reused for pagination.|yes|
|dimensions|A String array of dimensions to select. If left empty, all dimensions are returned.|no|
|metrics|A String array of metrics to select. If left empty, all metrics are returned.|no|
|pagingSpec|A JSON object indicating offsets into different scanned segments. Query results will return a `pagingIdentifiers` value that can be reused in the next query for pagination.|yes|
|context|An additional JSON Object which can be used to specify certain flags.|no|
The format of the result is:
@@ -140,4 +140,30 @@ The format of the result is:
} ]
```
The result returns a global pagingSpec that can be reused for the next select query. The offset will need to be increased by 1 on the client side.
The `threshold` determines how many hits are returned, with each hit indexed by an offset.
The results above include:
```json
"pagingIdentifiers" : {
"wikipedia_2012-12-29T00:00:00.000Z_2013-01-10T08:00:00.000Z_2013-01-10T08:13:47.830Z_v9" : 4
},
```
This can be used with the next query's pagingSpec:
```json
{
"queryType": "select",
"dataSource": "wikipedia",
"dimensions":[],
"metrics":[],
"granularity": "all",
"intervals": [
"2013-01-01/2013-01-02"
],
"pagingSpec":{"pagingIdentifiers": {"wikipedia_2012-12-29T00:00:00.000Z_2013-01-10T08:00:00.000Z_2013-01-10T08:13:47.830Z_v9" : 5}, "threshold":5}
}
```
Note that in the second query, the offset specified is 1 greater than the largest offset found in the initial results. To return the next "page", this offset must be incremented by 1 with each new query. When an empty result set is received, the very last page has been returned.
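The client-side bookkeeping described above can be sketched as follows; the query-issuing code is left out, and only the offset arithmetic is shown (the segment identifier is a placeholder):

```python
def next_paging_spec(paging_identifiers, threshold):
    """Build the pagingSpec for the next page of a select query:
    every segment offset returned by the previous query is incremented by 1."""
    return {
        "pagingIdentifiers": {segment: offset + 1 for segment, offset in paging_identifiers.items()},
        "threshold": threshold,
    }

# The offsets returned by the first query (its pagingIdentifiers value)
# become the starting offsets of the next query's pagingSpec.
spec = next_paging_spec({"segment_v9": 4}, threshold=5)
```

An empty result set signals that the last page has been reached and the loop should stop.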

View File

@@ -11,7 +11,7 @@ The topN metric spec specifies how topN values should be sorted.
The simplest metric specification is a String value indicating the metric to sort topN results by. It is included in a topN query with:
```json
"metric": <metric_value_string>
"metric": "<metric_name>"
```
The metric field can also be given as a JSON object. The grammar for dimension values sorted by numeric value is shown below:
@@ -19,7 +19,7 @@ The metric field can also be given as a JSON object. The grammar for dimension values sorted by numeric value is shown below:
```json
"metric": {
"type": "numeric",
"metric": "<metric_value>"
"metric": "<metric_name>"
}
```

View File

@@ -72,9 +72,9 @@ There are 10 parts to a topN query, but 7 of them are shared with [TimeseriesQue
|property|description|required?|
|--------|-----------|---------|
|dimension|A JSON object defining the dimension that you want the top taken for. For more info, see [DimensionSpecs](DimensionSpecs.html)|yes|
|dimension|A String or JSON object defining the dimension that you want the top taken for. For more info, see [DimensionSpecs](DimensionSpecs.html)|yes|
|threshold|An integer defining the N in the topN (i.e. how many you want in the top list)|yes|
|metric|A JSON object specifying the metric to sort by for the top list. For more info, see [TopNMetricSpec](TopNMetricSpec.html).|yes|
|metric|A String or JSON object specifying the metric to sort by for the top list. For more info, see [TopNMetricSpec](TopNMetricSpec.html).|yes|
Please note the context JSON object is also available for topN queries and should be used with the same caution as the timeseries case.
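For reference, a topN query combining these properties might look like the following (the data source, dimension, and metric names are hypothetical, and the aggregations list is assumed from the shared timeseries properties):

```json
{
  "queryType": "topN",
  "dataSource": "sample_data",
  "dimension": "sample_dim",
  "threshold": 5,
  "metric": "count",
  "granularity": "all",
  "aggregations": [
    { "type": "count", "name": "count" }
  ],
  "intervals": ["2013-08-31T00:00:00.000/2013-09-03T00:00:00.000"]
}
```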
The format of the results would look like so:

View File

@@ -75,9 +75,13 @@ Setting up Zookeeper
Before we get started, we need to start Apache Zookeeper.
```bash
curl http://apache.osuosl.org/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz -o zookeeper-3.4.5.tar.gz
tar xzf zookeeper-3.4.5.tar.gz
cd zookeeper-3.4.5
# Download zookeeper from http://www.apache.org/dyn/closer.cgi/zookeeper/
# Install zookeeper, e.g.:
curl http://www.gtlib.gatech.edu/pub/apache/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz -o zookeeper-3.4.6.tar.gz
tar xzf zookeeper-3.4.6.tar.gz
cd zookeeper-3.4.6
cp conf/zoo_sample.cfg conf/zoo.cfg
./bin/zkServer.sh start
cd ..
```

View File

@@ -48,9 +48,13 @@ CREATE database druid;
#### Setting up Zookeeper
```bash
curl http://apache.osuosl.org/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz -o zookeeper-3.4.5.tar.gz
tar xzf zookeeper-3.4.5.tar.gz
cd zookeeper-3.4.5
# Download zookeeper from http://www.apache.org/dyn/closer.cgi/zookeeper/
# Install zookeeper, e.g.:
curl http://www.gtlib.gatech.edu/pub/apache/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz -o zookeeper-3.4.6.tar.gz
tar xzf zookeeper-3.4.6.tar.gz
cd zookeeper-3.4.6
cp conf/zoo_sample.cfg conf/zoo.cfg
./bin/zkServer.sh start
cd ..
```

View File

@@ -41,7 +41,7 @@ public class LexicographicSearchSortSpec implements SearchSortSpec
@Override
public int compare(SearchHit searchHit, SearchHit searchHit1)
{
return searchHit.getValue().compareTo(searchHit1.getValue());
int retVal = searchHit.getValue().compareTo(searchHit1.getValue());
if (retVal == 0) {
retVal = searchHit.getDimension().compareTo(searchHit1.getDimension());
}
return retVal;
}
};
}
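The effect of the new tie-break can be illustrated outside Java: when two hits compare equal on value, ordering falls back to the dimension name. A minimal sketch (not the Druid implementation):

```python
# Each search hit is modeled as a (dimension, value) pair; ordering is by
# value first, falling back to the dimension name when values are equal,
# mirroring the comparator above.
hits = [
    ("placementish", "preferred"),
    ("placement", "preferred"),
    ("quality", "automotive"),
]
ordered = sorted(hits, key=lambda hit: (hit[1], hit[0]))
```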

View File

@@ -88,12 +88,35 @@ public class SearchQueryRunnerTest
QueryRunnerTestHelper.qualityDimension,
Sets.newHashSet("automotive", "mezzanine", "travel", "health", "entertainment")
);
expectedResults.put(QueryRunnerTestHelper.marketDimension.toLowerCase(), Sets.newHashSet("total_market"));
expectedResults.put(QueryRunnerTestHelper.marketDimension, Sets.newHashSet("total_market"));
expectedResults.put(QueryRunnerTestHelper.placementishDimension, Sets.newHashSet("a"));
checkSearchQuery(searchQuery, expectedResults);
}
@Test
public void testSearchSameValueInMultiDims()
{
SearchQuery searchQuery = Druids.newSearchQueryBuilder()
.dataSource(QueryRunnerTestHelper.dataSource)
.granularity(QueryRunnerTestHelper.allGran)
.intervals(QueryRunnerTestHelper.fullOnInterval)
.dimensions(
Arrays.asList(
QueryRunnerTestHelper.placementDimension,
QueryRunnerTestHelper.placementishDimension
)
)
.query("e")
.build();
Map<String, Set<String>> expectedResults = Maps.newTreeMap(String.CASE_INSENSITIVE_ORDER);
expectedResults.put(QueryRunnerTestHelper.placementDimension, Sets.newHashSet("preferred"));
expectedResults.put(QueryRunnerTestHelper.placementishDimension, Sets.newHashSet("e", "preferred"));
checkSearchQuery(searchQuery, expectedResults);
}
@Test
public void testFragmentSearch()
{

View File

@@ -32,7 +32,7 @@ public class DruidHttpClientConfig
{
@JsonProperty
@Min(0)
private int numConnections = 5;
private int numConnections = 20;
@JsonProperty
private Period readTimeout = new Period("PT15M");
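This default is surfaced through the HTTP client configuration; a runtime property along these lines (the exact key is an assumption, not shown in this diff) would override it:

```
druid.broker.http.numConnections=20
```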

View File

@@ -421,7 +421,7 @@ public class ServerManager implements QuerySegmentWalker
},
new ReferenceCountingSegmentQueryRunner<T>(factory, adapter),
"scan/time"
).withWaitMeasuredFromNow(),
),
cacheConfig
)
)

View File

@@ -31,8 +31,7 @@ public class ServerConfig
{
@JsonProperty
@Min(1)
// Jetty defaults are whack
private int numThreads = Math.max(10, (Runtime.getRuntime().availableProcessors() * 17) / 16 + 2);
private int numThreads = Math.max(10, (Runtime.getRuntime().availableProcessors() * 17) / 16 + 2) + 30;
@JsonProperty
@NotNull

View File

@@ -28,6 +28,7 @@ import io.druid.query.aggregation.AggregatorFactory;
import io.druid.query.aggregation.CountAggregatorFactory;
import io.druid.query.aggregation.DoubleSumAggregatorFactory;
import io.druid.query.aggregation.LongSumAggregatorFactory;
import io.druid.query.topn.TopNQueryBuilder;
import org.junit.Assert;
import org.junit.Test;
@@ -36,7 +37,7 @@ import java.util.LinkedHashMap;
public class JavaScriptTieredBrokerSelectorStrategyTest
{
final TieredBrokerSelectorStrategy jsStrategy = new JavaScriptTieredBrokerSelectorStrategy(
"function (config, query) { if (config.getTierToBrokerMap().values().size() > 0 && query.getAggregatorSpecs && query.getAggregatorSpecs().size() <= 2) { return config.getTierToBrokerMap().values().toArray()[0] } else { return config.getDefaultBrokerServiceName() } }"
"function (config, query) { if (query.getAggregatorSpecs && query.getDimensionSpec && query.getDimensionSpec().getDimension() == 'bigdim' && query.getAggregatorSpecs().size() >= 3) { var size = config.getTierToBrokerMap().values().size(); if (size > 0) { return config.getTierToBrokerMap().values().toArray()[size-1] } else { return config.getDefaultBrokerServiceName() } } else { return null } }"
);
@Test
@@ -57,7 +58,8 @@ public class JavaScriptTieredBrokerSelectorStrategyTest
{
final LinkedHashMap<String, String> tierBrokerMap = new LinkedHashMap<>();
tierBrokerMap.put("fast", "druid/fastBroker");
tierBrokerMap.put("slow", "druid/broker");
tierBrokerMap.put("fast", "druid/broker");
tierBrokerMap.put("slow", "druid/slowBroker");
final TieredBrokerConfig tieredBrokerConfig = new TieredBrokerConfig()
{
@@ -74,8 +76,11 @@
}
};
final Druids.TimeseriesQueryBuilder queryBuilder = Druids.newTimeseriesQueryBuilder().dataSource("test")
final TopNQueryBuilder queryBuilder = new TopNQueryBuilder().dataSource("test")
.intervals("2014/2015")
.dimension("bigdim")
.metric("count")
.threshold(1)
.aggregators(
ImmutableList.<AggregatorFactory>of(
new CountAggregatorFactory("count")
@@ -83,7 +88,7 @@
);
Assert.assertEquals(
Optional.of("druid/fastBroker"),
Optional.absent(),
jsStrategy.getBrokerServiceName(
tieredBrokerConfig,
queryBuilder.build()
@@ -92,13 +97,29 @@
Assert.assertEquals(
Optional.of("druid/broker"),
Optional.absent(),
jsStrategy.getBrokerServiceName(
tieredBrokerConfig,
Druids.newTimeBoundaryQueryBuilder().dataSource("test").bound("maxTime").build()
)
);
Assert.assertEquals(
Optional.of("druid/slowBroker"),
jsStrategy.getBrokerServiceName(
tieredBrokerConfig,
queryBuilder.aggregators(
ImmutableList.of(
new CountAggregatorFactory("count"),
new LongSumAggregatorFactory("longSum", "a"),
new DoubleSumAggregatorFactory("doubleSum", "b")
)
).build()
)
);
// in absence of tiers, expect the default
tierBrokerMap.clear();
Assert.assertEquals(
Optional.of("druid/broker"),
jsStrategy.getBrokerServiceName(