825 Commits

Author SHA1 Message Date
Gian Merlino
23364a47fd BaseFilterTest: Test optimized filters too. 2016-04-01 12:44:59 -07:00
navis.ryu
077522a46f stringFormat extractionFn should be able to return null on null values (Fix for #2706) 2016-04-01 13:40:56 +09:00
navis.ryu
f0e55f5d31 Null string is encoded as "null" in incremental index 2016-04-01 09:47:15 +09:00
navis.ryu
29bb00535b Add option for select query to get next page without modifying returned paging identifiers 2016-04-01 09:03:03 +09:00
navis.ryu
108535fd07 Implement native in filter (Fix for #2577) 2016-03-31 10:10:57 +09:00
Fangjin Yang
95733a362f Merge pull request #2753 from gianm/null-filtering-multi-value-columns
More consistent empty-set filtering behavior on multi-value columns.
2016-03-29 18:52:25 -07:00
Charles Allen
95d42cfd9e Merge pull request #2758 from pjain1/fix_npe_in_filter
handle null values in In Filter
2016-03-29 17:53:02 -07:00
Gian Merlino
1853f36e9f More consistent empty-set filtering behavior on multi-value columns.
The behavior is now that filters on "null" will match rows with no
values. The behavior in the past was inconsistent; sometimes these
filters would match and sometimes they wouldn't.

Adds tests for this behavior to SelectorFilterTest and
BoundFilterTest, for query-level filters and filtered aggregates.

Fixes #2750.
2016-03-29 15:32:13 -07:00
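A minimal sketch of the matching rule described in the commit above, assuming a row's multi-value dimension is modelled as a plain list of strings. The class and method names are hypothetical and are not taken from the Druid code base; the real filters operate on bitmaps and value matchers rather than lists.

```java
import java.util.List;

// Hypothetical illustration only; not Druid's SelectorFilter.
final class NullSelectorSketch {
    // Returns true when a selector on `target` should match the row.
    // A row with no values is treated as matching a filter on null.
    static boolean matches(List<String> rowValues, String target) {
        if (rowValues == null || rowValues.isEmpty()) {
            // Empty set of values: matches only a filter on null.
            return target == null;
        }
        for (String value : rowValues) {
            boolean hit = (target == null) ? value == null : target.equals(value);
            if (hit) {
                return true;
            }
        }
        return false;
    }
}
```

With this rule, a filter on null consistently matches rows whose value list is empty, which is the behaviour the commit message says was previously inconsistent.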
Parag Jain
d892918a3d handle null values in In Filter 2016-03-29 17:03:26 -05:00
Fangjin Yang
9cb197adec Merge pull request #2722 from himanshug/fix_hadoop_jar_upload
config to explicitly specify classpath for hadoop container during hadoop ingestion
2016-03-28 14:49:03 -07:00
Charles Allen
0ee861d0da Add ExtractionFn to LookupExtractor bridge 2016-03-28 13:14:43 -07:00
Fangjin Yang
7fe277e6da Merge pull request #2727 from gianm/optimize-bound-filter
BoundFilter optimizations, and related interface changes.
2016-03-26 18:59:05 -07:00
Gian Merlino
2970b49adc BoundFilter optimizations, and related interface changes.
BoundFilter:

- For lexicographic bounds, use bitmapIndex.getIndex to find the start and end points,
  then union all bitmaps between those points.
- For alphanumeric bounds, iterate through dimValues, and union all bitmaps for values
  matching the predicate.
- Change behavior for nulls: it used to be that the BoundFilter would never match nulls,
  now it matches nulls if "" is allowed by the lower limit and not excluded by the
  upper limit.

Interface changes:

- BitmapIndex: add `int getIndex(value)` to make it possible to get the index for a
  value without retrieving the bitmap.
- BitmapIndex: remove `ImmutableBitmap getBitmap(value)`, change callers to `getBitmap(getIndex(value))`.
- BitmapIndexSelector: allow retrieving the underlying BitmapIndex through getBitmapIndex.
- Clarified contract of indexOf in Indexed, GenericIndexed.

Also added tests for SelectorFilter, NotFilter, and BoundFilter.
2016-03-25 14:11:48 -07:00
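The lexicographic case from the commit above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Druid implementation: the class is hypothetical, `java.util.BitSet` stands in for the column's bitmaps, the dictionary is a sorted `String[]`, and `getIndex` is assumed to return `-(insertionPoint) - 1` for absent values, as `Arrays.binarySearch` does. The null handling described in the commit is left out.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical stand-in for a dictionary-encoded column with one bitmap per value.
final class BoundSketch {
    private final String[] dictionary; // sorted, distinct values
    private final BitSet[] bitmaps;    // bitmaps[i] holds the rows containing dictionary[i]

    BoundSketch(String[] sortedValues, BitSet[] bitmaps) {
        this.dictionary = sortedValues;
        this.bitmaps = bitmaps;
    }

    // Convenience helper for building a bitmap from row numbers.
    static BitSet rows(int... rowNumbers) {
        BitSet set = new BitSet();
        for (int row : rowNumbers) {
            set.set(row);
        }
        return set;
    }

    // Index of the value, or -(insertionPoint) - 1 when it is absent (assumed contract).
    int getIndex(String value) {
        return Arrays.binarySearch(dictionary, value);
    }

    // Union of all bitmaps whose value lies within the bounds; a null bound means unbounded.
    BitSet matchLexicographic(String lower, boolean lowerStrict, String upper, boolean upperStrict) {
        int start = 0;
        if (lower != null) {
            int i = getIndex(lower);
            start = (i >= 0) ? (lowerStrict ? i + 1 : i) : -(i + 1);
        }
        int end = dictionary.length; // exclusive
        if (upper != null) {
            int i = getIndex(upper);
            end = (i >= 0) ? (upperStrict ? i : i + 1) : -(i + 1);
        }
        BitSet result = new BitSet();
        for (int i = start; i < end; i++) {
            result.or(bitmaps[i]);
        }
        return result;
    }
}
```

Because the dictionary is sorted, the two lookups locate the range without evaluating a predicate on every value; only the bitmaps inside the range are unioned.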
jon-wei
9afaa2b94a Fix HyperUniquesAggregatorFactory comparator 2016-03-25 12:36:42 -07:00
Himanshu Gupta
e78a469fb7 UTs for ExtensionsConfig 2016-03-25 10:51:28 -05:00
Nishant
0b03c9405f Merge pull request #2614 from sirpkt/calendric_gran
Support week, month, quarter, and year in query granularity
2016-03-24 16:21:01 -07:00
Himanshu
56343c6cdc Merge pull request #2704 from navis/simple-optimize
optimize single elemented and/or filter
2016-03-24 16:13:48 -05:00
Gian Merlino
713062053c Filters: Add filter.toFilter method, use that instead of the instanceof chain in Filters.
I believe that the instanceof chain in Filters exists because in the past, Filter
and DimFilter were in different packages (DimFilter was in druid-client and Filter
was in druid-processing). And since druid-client didn't depend on druid-processing,
DimFilter couldn't have a toFilter method. But now it can.
2016-03-23 17:03:49 -07:00
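The design change in the commit above, replacing an instanceof chain with a method on the filter specification itself, might look roughly like this. All type names carry a `Sketch` suffix because they are hypothetical stand-ins; the real Druid interfaces have different methods and operate on bitmaps and matchers.

```java
// Hypothetical stand-in for the runtime filter.
interface FilterSketch {
    boolean matches(String value);
}

// Hypothetical stand-in for the filter specification; each one converts itself.
interface DimFilterSketch {
    FilterSketch toFilter();
}

final class SelectorDimFilterSketch implements DimFilterSketch {
    private final String value;

    SelectorDimFilterSketch(String value) {
        this.value = value;
    }

    @Override
    public FilterSketch toFilter() {
        return candidate -> value.equals(candidate);
    }
}

final class NotDimFilterSketch implements DimFilterSketch {
    private final DimFilterSketch field;

    NotDimFilterSketch(DimFilterSketch field) {
        this.field = field;
    }

    @Override
    public FilterSketch toFilter() {
        FilterSketch base = field.toFilter();
        return candidate -> !base.matches(candidate);
    }
}

final class FiltersSketch {
    // No instanceof chain: the conversion is a single polymorphic call.
    static FilterSketch toFilter(DimFilterSketch dimFilter) {
        return dimFilter == null ? null : dimFilter.toFilter();
    }
}
```

Adding a new filter type then only requires implementing `toFilter` on that type, rather than extending a central chain of type checks.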
Gian Merlino
dd86198902 All Filters should work with FilteredAggregators.
This removes Filter.makeMatcher(ColumnSelectorFactory) and adds a
ValueMatcherFactory implementation to FilteredAggregatorFactory so it can
take advantage of existing makeMatcher(ValueMatcherFactory) implementations.

This patch also removes the Bound-based method from ValueMatcherFactory. Its
only user was the SpatialFilter, which could use the Predicate-based method.

Fixes #2604.
2016-03-23 12:24:01 -07:00
binlijin
57d78d3293 clean tmp file when index merge fails 2016-03-23 10:55:12 +08:00
navis.ryu
91f6be4884 optimize single elemented and/or filter 2016-03-23 09:29:15 +09:00
jon-wei
a59c9ee1b1 Support use of DimensionSchema class in DimensionsSpec 2016-03-21 13:12:04 -07:00
Keuntae Park
7f29f2ac3b support week, month, quarter, year in query granularity 2016-03-21 17:41:53 +09:00
Charles Allen
5da9a280b6 Query Time Lookup - Dynamic Configuration 2016-03-18 09:45:05 -07:00
Slim
cf342d8d3c Merge pull request #2517 from b-slim/adding_lookup_snapshot_utility
[QTL][Lookup] lookup module with the snapshot utility
2016-03-17 11:39:47 -05:00
Slim Bouguerra
0c86b29ef0 lookup module with the snapshot utility 2016-03-17 09:20:41 -05:00
Charles Allen
2ac8a22173 Merge pull request #2579 from metamx/closerIsCloser
Make CloserRule use guava's Closer
2016-03-14 17:18:19 -07:00
Charles Allen
a64979463f Make CloserRule use guava's Closer 2016-03-14 15:01:24 -07:00
Fangjin Yang
06813b510a Merge pull request #2571 from himanshug/gp_by_avoid_sort
avoid sort while doing groupBy merging when possible
2016-03-14 14:46:51 -07:00
Fangjin Yang
dbdbacaa18 Merge pull request #2260 from navis/cardinality-for-searchquery
Support cardinality for search query
2016-03-14 13:24:40 -07:00
navis.ryu
be341bf4e3 Support cardinality for search query (Fix for #2260) 2016-03-12 09:51:01 +09:00
Xavier Léauté
6f0d6ef0e9 optimize timeboundary for min or max bound 2016-03-11 14:11:47 -08:00
Himanshu Gupta
02dfd5cd80 update IncrementalIndex to support unsorted facts map that can be used in groupBy merging to improve performance 2016-03-10 16:11:48 -06:00
Gian Merlino
a2b1652787 Clarify parser docs.
- Clarify what parseSpecs are used for.
- Avro, Protobuf should use timeAndDims parseSpecs.
- Hadoop jobs should use hadoopyString string parsers.
2016-03-10 08:45:04 -08:00
Gian Merlino
708bc674fa Make specifying query context booleans more consistent.
Before, some needed to be strings and some needed to be real booleans. Now
they can all be either one.
2016-03-08 19:38:26 -08:00
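One way to accept a context flag as either a string or a real boolean, as the commit above describes, is sketched below. The helper class, its name, and the key used in the usage note are assumptions for illustration; the source does not show how Druid implements this.

```java
import java.util.Map;

// Hypothetical helper; not taken from the Druid code base.
final class ContextBooleans {
    // Reads a boolean flag that may be stored as a Boolean or as a String such as "true".
    static boolean parseBoolean(Map<String, Object> context, String key, boolean defaultValue) {
        Object value = (context == null) ? null : context.get(key);
        if (value == null) {
            return defaultValue;
        }
        if (value instanceof Boolean) {
            return (Boolean) value;
        }
        if (value instanceof String) {
            return Boolean.parseBoolean((String) value);
        }
        throw new IllegalArgumentException(
            "Expected a boolean or string for [" + key + "] but got " + value.getClass().getName());
    }
}
```

A call such as `ContextBooleans.parseBoolean(context, "someFlag", false)` then behaves the same whether the JSON context carried `true` or `"true"`.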
Himanshu Gupta
099acb4966 allow groupBy max[Intermediate]Rows limit to be overridable by context 2016-03-07 15:22:41 -06:00
Himanshu Gupta
c544ebf25e reintroducing the safety check removed in commit-1d602be so that dim value ids are less than cardinality 2016-03-03 23:34:23 -06:00
Bingkun Guo
4a58462fc7 update querySegmentSpec when passing query to getQueryRunner
After finding the FireChief for a specific partition, Druid needs to find the queryRunner for each segment being queried by passing the query to the FireChief. Currently Druid passes the original query, which contains all the segments that need to be queried, so fireChief.getQueryRunner(query) may return more than one queryRunner because query.getIntervals() is not specific to a single segment.

In this patch, for each segment being queried, Druid will update the query with its corresponding SpecificSegmentSpec.
2016-03-02 16:44:56 -06:00
Nishant
31b502773a Merge pull request #2480 from navis/pagingfail-over-segments
Select query cannot span to next segment with paging
2016-03-01 11:42:41 +05:30
Himanshu Gupta
0722ced413 with GpBy query outer query results need to be further merged 2016-02-29 10:16:25 -06:00
navis.ryu
5f1e60324a Added more complex test case with versioned segments 2016-02-29 14:48:24 +09:00
navis.ryu
2686bfa394 Select query cannot span to next segment with paging 2016-02-29 00:01:46 +09:00
jon-wei
fd3782522c Rename 'replaceMissingValues...' parameters in RegexExtractionFn 2016-02-24 13:12:56 -08:00
Nishant
fb7eae34ed Merge pull request #2249 from metamx/workerExpanded
Use Worker instead of ZkWorker whenever possible
2016-02-24 13:23:22 +05:30
Charles Allen
ac13a5942a Use Worker instead of ZkWorker whenever possible
* Moves last run task state information to Worker
* Makes WorkerTaskRunner a TaskRunner which has interfaces to help with getting information about a Worker
2016-02-23 15:02:03 -08:00
Gian Merlino
3534483433 Better handling of ParseExceptions.
Two changes:
- Allow IncrementalIndex to suppress ParseExceptions on "aggregate".
- Add "reportParseExceptions" option to realtime tuning configs. By default this is "false".

Behavior of the counters should now be:

- processed: Number of rows indexed, including rows where some fields could be parsed and some could not.
- thrownAway: Number of rows thrown away due to rejection policy.
- unparseable: Number of rows thrown away due to being completely unparseable (no fields salvageable at all).

If "reportParseExceptions" is true then "unparseable" will always be zero (because a parse error would
cause an exception to be thrown). In addition, "processed" will only include fully parseable rows
(because even partial parse failures will cause exceptions to be thrown).

Fixes #2510.
2016-02-23 10:11:43 -08:00
Fangjin Yang
3bdd757024 Merge pull request #1773 from b-slim/log_details
Adding downstream source when throwing QueryInterruptedException
2016-02-22 10:16:07 -08:00
Slim Bouguerra
77925cc061 adding downstream source of QueryInterruptedException 2016-02-20 13:05:14 -06:00
Gian Merlino
d25c46cb9f Add comparator to HyperUniquesFinalizingPostAggregator.
This makes it possible to do groupBys with clauses like "HAVING uniques > 10".
Beforehand you couldn't do it with either an aggregator (because it returns
an HLLV1 which the havingSpec can't understand) or a finalized postaggregator
(because it didn't have a comparator).

Now you can at least do it with a finalizing postaggregator. Trying it with
the aggregator alone still doesn't work.

Added some topN and groupBy tests verifying the comparator, and added an
@Ignore test that should pass if havingSpecs are made to work on the aggregator
directly.
2016-02-19 08:36:08 -08:00
Jaehong Choi
32b9d57b23 handle a failing UT in GroupByQueryRunnerTest after merging into the master 2016-02-16 16:56:57 +09:00