1579 Commits

Author SHA1 Message Date
Jonathan Wei a42ccb6d19 Support filtering on long columns (including __time) (#3180)
* Support filtering on __time column

* Rename DruidPredicate

* Add docs for ValueMatcherFactory, add comment on getColumnCapabilities

* Combine ValueMatcherFactory predicate methods to accept DruidCompositePredicate

* Address PR comments (support filter on all long columns)

* Use predicate factory instead of composite predicate

* Address PR comments

* Lazily initialize long handling in selector/in filter

* Move long value parsing from InFilter to InDimFilter, make long value parsing thread-safe

* Add multithreaded selector/in filter test

* Fix non-final lock object in SelectorDimFilter
2016-07-20 17:08:49 -07:00
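The last few bullets of the commit above (lazy initialization of long handling, thread-safe long value parsing, and a final lock object) describe a common pattern. The following is only a rough sketch of that pattern, not the code from the commit: the class name, field names, and the choice to skip unparseable values are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not the actual SelectorDimFilter/InDimFilter code.
class LazyLongValuesSketch {
    private final List<String> values;
    // The lock must be final so every thread synchronizes on the same object.
    private final Object initLock = new Object();
    // Volatile flag publishes longValues safely once parsing is complete.
    private volatile boolean longsInitialized = false;
    private Set<Long> longValues;

    LazyLongValuesSketch(List<String> values) {
        this.values = values;
    }

    // Parses the string values as longs on first use only (double-checked locking).
    Set<Long> getLongValues() {
        if (!longsInitialized) {
            synchronized (initLock) {
                if (!longsInitialized) {
                    Set<Long> parsed = new HashSet<>();
                    for (String v : values) {
                        try {
                            parsed.add(Long.parseLong(v));
                        } catch (NumberFormatException ignored) {
                            // Assumption: values that are not longs can never match a long column.
                        }
                    }
                    longValues = parsed;
                    longsInitialized = true;
                }
            }
        }
        return longValues;
    }
}
```

Filters that never touch a long column pay no parsing cost, and concurrent callers parse at most once.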
Gian Merlino 06624c40c0 Share query handling between Appenderator and RealtimePlumber. (#3248)
Fixes inconsistent metric handling between the two implementations. Formerly,
RealtimePlumber only emitted query/segmentAndCache/time and query/wait and
Appenderator only emitted query/partial/time and query/wait (all per sink).

Now they both do the same thing:
- query/segmentAndCache/time, query/segment/time are the time spent per sink.
- query/cpu/time is the CPU time spent per query.
- query/wait/time is the executor waiting time per sink.

These generally match historical metrics, except segmentAndCache & segment
mean the same thing here, because one Sink may be partially cached and
partially uncached and we aren't splitting that out.
2016-07-19 22:15:13 -05:00
Nishant 7995818220 Increase test timeout to prevent failing on slow machines (#3224)
The test was constantly timing out on one of the slow build machines;
increasing the timeout fixed it.

Running io.druid.granularity.QueryGranularityTest
Tests run: 33, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.776
sec - in io.druid.granularity.QueryGranularityTest
2016-07-17 18:44:48 -07:00
Gian Merlino 6cd1f5375b Better harmonized dimensions for query metrics. (#3245)
All query metrics now start with toolChest.makeMetricBuilder, and all of
*those* now start with DruidMetrics.makePartialQueryTimeMetric. Also, "id"
moved to common code, since all query metrics added it anyway.

In particular this will add query-type specific dimensions like "threshold"
and "numDimensions" to servlet-originated metrics like query/time.
2016-07-14 11:55:51 -07:00
Gian Merlino ea03906fcf Configurable compressRunOnSerialization for Roaring bitmaps. (#3228)
Defaults to true, which is a change in behavior (this used to be false and unconfigurable).
2016-07-08 10:24:19 +05:30
Gian Merlino fdc7e88a7d Allow queries with no aggregators. (#3216)
This is actually reasonable for a groupBy or lexicographic topN that is
being used to do a "COUNT DISTINCT" kind of query. No aggregators are
needed for that query, and including a dummy aggregator wastes 8 bytes
per row.

It's kind of silly for timeseries, but why not.
2016-07-06 20:38:54 +05:30
Jonathan Wei f3a3662133 Fix compile error in SearchBinaryFnTest (#3201) 2016-06-29 09:44:45 -05:00
jaehong choi efbcbf5315 Support alphanumeric sort in search query (#2593)
* support alphanumeric sort in search query

* address a comment about handling equals() and hashCode()

* address comments

* add UT for string comparators

* address a comment about space indentations.
2016-06-28 15:06:18 -07:00
Hyukjin Kwon 45f553fc28 Replace the deprecated usage of NoneShardSpec (#3166) 2016-06-25 10:27:25 -07:00
Gian Merlino 4cc39b2ee7 Alternative groupBy strategy. (#2998)
This patch introduces a GroupByStrategy concept and two strategies: "v1"
is the current groupBy strategy and "v2" is a new one. It also introduces
a merge buffers concept in DruidProcessingModule, to try to better
manage memory used for merging.

Both of these are described in more detail in #2987.

There are two goals of this patch:

1. Make it possible for historical/realtime nodes to return larger groupBy
   result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by
   columns, avoiding materialization.

This patch does not do anything to help with memory management on the broker
when there are order-by columns or when there are nested queries. That could
potentially be done in a future patch.
2016-06-24 18:06:09 -07:00
Dave Li 8a08398977 Add segment pruning based on secondary partition dimension (#2982)
* add get dimension rangeset to filters

* add get domain to ShardSpec and added chunk filter in caching clustered client

* add null check and modified not filter, started with unit test

* add filter test with caching

* refactor and some comments

* extract filtershard to helper function

* fixup

* minor changes

* update javadoc
2016-06-24 14:52:19 -07:00
michaelschiff 66d8ad36d7 adds new coordinator metrics 'segment/unavailable/count' and 'segment/underReplicated/count' (#3173) (#3176)
2016-06-23 14:53:15 -07:00
Gian Merlino da660bb592 DumpSegment tool. (#3182)
Fixes #2723.
2016-06-23 14:37:50 -07:00
Gian Merlino a437fb150b Fix SegmentMetadataQuery when queryGranularity is requested but not present. (#3181) 2016-06-23 14:30:50 -07:00
Jonathan Wei 24860a1391 Two-stage filtering (#3018)
* Two-stage filtering

* PR comment
2016-06-22 16:08:21 -07:00
Nishant f46ad9a4cb support Union Segment metadata queries (#3132)
* support Union Segment metadata queries

fix 3128

* remove extraneous sys out
2016-06-21 10:30:50 -07:00
Dave Li 12be1c0a4b Add bucket extraction function (#3033)
* add bucket extraction function

* add doc and header

* updated doc and test
2016-06-17 09:24:27 -07:00
Gian Merlino ebf890fe79 Update master version to 0.9.2-SNAPSHOT. (#3133) 2016-06-13 13:10:38 -07:00
Nishant 0d427923c0 fix caching for search results (#3119)
* fix caching for search results

properly read count when reading from cache.

* fix NPE during merging search count and add test

* Update cache key to invalidate prev results
2016-06-09 17:49:47 -07:00
Gian Merlino 5998de7d5b Fix lenient merging of conflicting aggregators. (#3113)
This should have marked the conflicting aggregator as null, but instead it
threw an NPE for the entire query.
2016-06-08 15:56:48 -07:00
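The lenient-merge behavior described in the commit above (a conflicting aggregator is marked as null instead of failing the whole query) might look roughly like the sketch below. Everything here is hypothetical: aggregators are reduced to a name-to-type map of strings, and the class and method names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; real aggregator factories are richer than a type string.
class LenientMergeSketch {
    // Merges two name -> aggregator-type maps. A name present in both maps with
    // different types is kept with a null value to mark the conflict.
    static Map<String, String> merge(Map<String, String> a, Map<String, String> b) {
        Map<String, String> out = new HashMap<>(a);
        for (Map.Entry<String, String> e : b.entrySet()) {
            String name = e.getKey();
            if (!out.containsKey(name)) {
                out.put(name, e.getValue());
            } else {
                String existing = out.get(name);
                // A null here means an earlier conflict; it stays null.
                if (existing == null || !existing.equals(e.getValue())) {
                    out.put(name, null);
                }
            }
        }
        return out;
    }
}
```

The sketch uses an explicit loop rather than `Map.merge`, because `Map.merge` itself throws a `NullPointerException` when given a null value.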
Jonathan Wei 37c8a8f186 Speed up filter tests with adapter cache (#3103) 2016-06-08 07:41:10 -07:00
Gian Merlino 54139c6815 Fix NPE in registeredLookup extractionFn when "optimize" is not provided. (#3064) 2016-06-03 12:58:17 -05:00
Gian Merlino 6171e078c8 Improve NPE message in LookupDimensionSpec when lookup does not exist. (#3065)
The message used to be empty, which made things hard to debug.
2016-06-02 19:59:12 -07:00
John Wang e662efa79f segment interface refactor for proposal 2965 (#2990) 2016-05-26 20:36:41 -07:00
Kurt Young b5bd406597 fix #2991: race condition in OnheapIncrementalIndex#addToFacts (#3002)
* fix #2991: race condition in OnheapIncrementalIndex#addToFacts

* add missing header

* handle parseExceptions when doing first agg
2016-05-25 19:05:46 -07:00
Jonathan Wei b72c54c4f8 Add benchmark data generator, basic ingestion/persist/merge/query benchmarks (#2875) 2016-05-25 16:39:37 -07:00
Dave Li dcabd4b1ee Add lookup optimization for InDimFilter (#2938)
* Add lookup optimization for InDimFilter

* tests for in filter with lookup extraction fn

* refactor

* refactor2 and modified filter test

* make optimizeLookup private
2016-05-19 16:29:16 -07:00
Charles Allen 15ccf451f9 Move QueryGranularity static fields to QueryGranularities (#2980)
* Move QueryGranularity static fields to QueryGranularityUtil
* Fixes #2979

* Add test showing #2979

* change name to QueryGranularities
2016-05-17 16:23:48 -07:00
Charles Allen fb01db4db7 [QTL] Allows RegisteredLookupExtractionFn to find its lookups lazily (#2971)
* Allows RegisteredLookupExtractionFn to find its lookups lazily

* Use raw variables instead of AtomicReference

* Make sure to use volatile

* Remove extra local variable.

* Move from BAOS to ByteBuffer
2016-05-17 11:29:39 -07:00
Himanshu d3e9c47a5f use correct ObjectMapper in Index[IO/Merger] in AggregationTestHelper and minor fix in theta sketch SketchMergeAggregatorFactory.getMergingFactory(..) (#2943) 2016-05-13 10:06:31 +05:30
Himanshu d821144738 at historicals, GroupBy query mergeResults does not need merging, as results are already merged by GroupByQueryRunnerFactory.mergeRunners(..) (#2962) 2016-05-12 17:41:24 -07:00
Gian Merlino 01bebf432a GroupByQuery: Multi-value dimension tests. (#2959) 2016-05-12 11:31:50 -07:00
Charles Allen a31348450f Add toString for LookupConfig (#2935)
* Helps with operations and getting where the snapshot dir is
2016-05-09 18:20:00 -07:00
Dave Li 79a54283d4 Optimize filter for timeseries, search, and select queries (#2931)
* Optimize filter for timeseries, search, and select queries

* exception at failed toolchest type check

* took out query type check

* java7 error fix and test improvement
2016-05-09 11:04:06 -07:00
Slim 8b570ab130 make it clear what LookupExtractorFactory start/stop methods return (#2925) 2016-05-05 10:38:40 -07:00
David Lim b489f63698 Supervisor for KafkaIndexTask (#2656)
* supervisor for kafka indexing tasks

* cr changes
2016-05-04 23:13:13 -07:00
Himanshu 8e2742b7e8 adding QueryGranularity to segment metadata and optionally expose same from segmentMetadata query (#2873) 2016-05-03 11:31:10 -07:00
Gian Merlino 40e595c7a0 Remove types from TimeAndDims, they aren't needed. (#2865) 2016-05-03 13:10:25 -05:00
binlijin 841be5c61f periodically emit metric segment/scan/pending (#2854) 2016-05-02 22:38:13 -07:00
Navis Ryu 2729fea84d Fix parsing failure of segment id with datasource containing underscore (#2797)
* Fix parsing failure of segment id with underscored datasource (Fix for #2786)

* addressed comment

* renamed and moved code into api. added log4 dependency for tests

* addressed comments

* fixed test failures
2016-05-02 22:37:28 -07:00
Gian Merlino 90ce03c66f Fix integer overflow in SegmentMetadataQuery numRows. (#2890) 2016-04-27 14:37:04 -07:00
Gian Merlino 6dc7688a29 TimeAndDims equals/hashCode implementation. (#2870)
Adapted from #2692, thanks @navis for original implementation.
2016-04-22 08:45:20 +08:00
Himanshu 3cfd9c64c9 make singleThreaded groupBy query config overridable at query time (#2828)
* make isSingleThreaded groupBy query processing overridable at query time

* refactor code in GroupByMergedQueryRunner to make processing of single threaded and parallel merging of runners consistent
2016-04-21 17:12:58 -07:00
Slim 984a518c9f Merge pull request #2734 from b-slim/LookupIntrospection2
[QTL][Lookup] adding introspection endpoint
2016-04-21 12:15:57 -05:00
Gian Merlino c74391e54c JavaScript: Ability to disable. (#2853)
Fixes #2852.
2016-04-21 09:43:15 -05:00
Gian Merlino 7d3e55717d Reduce cost of various toFilter calls. (#2860)
These happen once per segment and so it's better if they don't do
as much work.
2016-04-21 04:28:46 +08:00
Gian Merlino 59460b17cc Add Filters.matchPredicate helper, use it where appropriate. (#2851)
This approach simplifies code and is generally faster, due to skipping
unnecessary dictionary lookups (see #2850).
2016-04-19 15:54:32 -07:00
Xavier Léauté b2745befb7 remove obsolete comment (#2858) 2016-04-19 13:06:58 -07:00
Jisoo Kim 7b65ca7889 refactor ClientQuerySegmentWalker (#2837)
* refactor ClientQuerySegmentWalker

* add header to FluentQueryRunnerBuilder

* refactor QueryRunnerTestHelper
2016-04-18 14:00:47 -07:00
Gian Merlino 7c0b1dde3a DimensionPredicateFilter: Skip unnecessary dictionary lookup. (#2850) 2016-04-18 12:38:25 -07:00
Jonathan Wei b534f7203c Fix performance regression from #2753 in IndexMerger (#2841) 2016-04-14 21:39:41 -07:00
Jonathan Wei a26134575b Fix NPE in TopNLexicographicResultBuilder.addEntry() (#2835) 2016-04-13 17:27:16 -07:00
Fangjin Yang abd951df1a Document how to use roaring bitmaps (#2824)
* Document how to use roaring bitmaps

This fixes #2408.
While not all indexSpec properties are explained, it does explain how roaring bitmaps can be turned on.

* fix

* fix

* fix

* fix
2016-04-12 19:28:02 -07:00
michaelschiff db35dd7508 fix issue #2744. Check for null before combining metrics (#2774) 2016-04-12 14:46:31 -07:00
Nishant 1bf1dd03a0 Merge pull request #2812 from mrijke/fix-missing-equals-hashcode-filters
Add missing equals/hashcode to JS, Regex and SearchQuery DimFilters
2016-04-12 12:00:23 +05:30
Charles Allen 21e406613c Merge pull request #2809 from metamx/fix2694
Fix test for snapshot taker to better check for lookup persist failure
2016-04-11 14:52:47 -07:00
Maarten Rijke de68d6b7c4 Add missing equals/hashcode to JS, Regex and SearchQuery DimFilters
This commit adds missing equals() and hashCode() methods to
the JavascriptDimFilter, RegexDimFilter and the SearchQueryDimFilter.
2016-04-11 12:16:24 +02:00
Nishant bbb326decf Merge pull request #2799 from b-slim/fix_snapshot
MapLookupFactory needs to be Ser/Desr ready.
2016-04-07 13:22:34 +05:30
Slim Bouguerra bf1eafc4e1 remove all the mock lookupFactory 2016-04-06 15:37:52 -05:00
Slim Bouguerra 59eb2490a0 MapLookupFactory needs to be Ser/Desr. 2016-04-06 15:02:18 -05:00
Charles Allen f915a59138 Merge pull request #2691 from metamx/lookupExtrFn
Add ExtractionFn to LookupExtractor bridge
2016-04-06 09:13:08 -07:00
jon-wei 051fd6c0eb Remove extra println from InFilter 2016-04-05 14:55:49 -07:00
Fangjin Yang 289bb6f885 Merge pull request #2690 from jon-wei/filter_support
Allow filters to use extraction functions
2016-04-05 15:40:15 -06:00
jon-wei 0e481d6f93 Allow filters to use extraction functions 2016-04-05 13:24:56 -07:00
Gian Merlino e060a9f283 Additional ExtractionFn null-handling adjustments.
Followup to comments on #2771.
2016-04-01 18:35:26 -07:00
Fangjin Yang 18b9ea62cf Merge pull request #2771 from gianm/extractionfn-stuff
Various ExtractionFn null handling fixes.
2016-04-01 16:35:46 -07:00
Gian Merlino 23d66e5ff9 Merge pull request #2765 from navis/invalid-encode-nullstring
Null string is encoded as "null" in incremental index
2016-04-01 14:43:40 -07:00
Gian Merlino b6e4d8b2c1 Various ExtractionFn null handling fixes.
- JavaScriptExtractionFn shouldn't pass empty strings to its JS functions
- Upper/LowerExtractionFn properly handles null Objects (DimExtractionFn's implementation works here)
- MatchingDimExtractionFn properly returns nulls rather than empties
- RegexDimExtractionFn properly attempts matching on nulls and empties
- SearchQuerySpecDimExtractionFn properly returns nulls when passed empties
2016-04-01 14:34:47 -07:00
Fangjin Yang eea7a47870 Merge pull request #2576 from navis/paging-from-next
Add option for select query to get next page without modifying returned paging identifiers
2016-04-01 13:50:36 -07:00
Fangjin Yang 4eb5a2c4f1 Merge pull request #2715 from navis/stringformat-null-handling
stringFormat extractionFn should be able to return null on null values (Fix for #2706)
2016-04-01 13:45:28 -07:00
Gian Merlino 23364a47fd BaseFilterTest: Test optimized filters too. 2016-04-01 12:44:59 -07:00
navis.ryu 077522a46f stringFormat extractionFn should be able to return null on null values (Fix for #2706) 2016-04-01 13:40:56 +09:00
navis.ryu f0e55f5d31 Null string is encoded as "null" in incremental index 2016-04-01 09:47:15 +09:00
navis.ryu 29bb00535b Add option for select query to get next page without modifying returned paging identifiers 2016-04-01 09:03:03 +09:00
Gian Merlino 5f9240fcbc Merge pull request #2577 from navis/native-in-filter
Implement native in filter
2016-03-30 20:02:54 -07:00
Fangjin Yang 3d68da94fe Merge pull request #2661 from navis/utf8-estimated-length
Utility method for length estimation of utf8
2016-03-30 19:56:14 -07:00
navis.ryu 108535fd07 Implement native in filter (Fix for #2577) 2016-03-31 10:10:57 +09:00
navis.ryu e0cfd9ee19 Utility method for length estimation of utf8 2016-03-31 10:07:00 +09:00
jon-wei 5503bf1b38 Remove unnecessary type check in TimeAndDimsComp 2016-03-30 17:54:15 -07:00
Fangjin Yang 95733a362f Merge pull request #2753 from gianm/null-filtering-multi-value-columns
More consistent empty-set filtering behavior on multi-value columns.
2016-03-29 18:52:25 -07:00
Charles Allen 95d42cfd9e Merge pull request #2758 from pjain1/fix_npe_in_filter
handle null values in In Filter
2016-03-29 17:53:02 -07:00
Gian Merlino 1853f36e9f More consistent empty-set filtering behavior on multi-value columns.
The behavior is now that filters on "null" will match rows with no
values. The behavior in the past was inconsistent; sometimes these
filters would match and sometimes they wouldn't.

Adds tests for this behavior to SelectorFilterTest and
BoundFilterTest, for query-level filters and filtered aggregates.

Fixes #2750.
2016-03-29 15:32:13 -07:00
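The rule stated in the commit above (a filter on null matches rows with no values) can be illustrated with a small sketch. This is a hypothetical matcher written for this note, not the SelectorFilter or BoundFilter implementation; the class and method names are assumptions.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical illustration of the empty-set rule for multi-value columns.
class NullMatchSketch {
    // A row matches a selector on `target` if any of its values equals it.
    // A row with no values is treated as if it held a single null.
    static boolean matches(List<String> rowValues, String target) {
        if (rowValues == null || rowValues.isEmpty()) {
            return target == null;
        }
        for (String v : rowValues) {
            if (Objects.equals(v, target)) {
                return true;
            }
        }
        return false;
    }
}
```

Under this rule an empty row matches a null selector every time, which is the consistency the commit message asks for.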
Parag Jain d892918a3d handle null values in In Filter 2016-03-29 17:03:26 -05:00
Fangjin Yang e023df2b92 Merge pull request #2754 from gianm/i-dont-get-it
Remove error suppression code from IncrementalIndexAdapter.
2016-03-28 19:29:53 -07:00
Gian Merlino c7ff0d698e Remove error suppression code from IncrementalIndexAdapter. 2016-03-28 18:40:27 -07:00
fjy c418a55638 cleanup distinct count agg 2016-03-28 17:29:41 -07:00
Fangjin Yang 9cb197adec Merge pull request #2722 from himanshug/fix_hadoop_jar_upload
config to explicitly specify classpath for hadoop container during hadoop ingestion
2016-03-28 14:49:03 -07:00
Charles Allen 4a98c4fbac Fix LookupExtractionFn equals and hashCode 2016-03-28 13:14:43 -07:00
Charles Allen 0ee861d0da Add ExtractionFn to LookupExtractor bridge 2016-03-28 13:14:43 -07:00
Fangjin Yang 7fe277e6da Merge pull request #2727 from gianm/optimize-bound-filter
BoundFilter optimizations, and related interface changes.
2016-03-26 18:59:05 -07:00
Fangjin Yang 0dae28b6af Merge pull request #2729 from jon-wei/fix_hyperunique_comparator
Fix HyperUniquesAggregatorFactory comparator
2016-03-26 15:39:35 -07:00
Gian Merlino 2970b49adc BoundFilter optimizations, and related interface changes.
BoundFilter:

- For lexicographic bounds, use bitmapIndex.getIndex to find the start and end points,
  then union all bitmaps between those points.
- For alphanumeric bounds, iterate through dimValues, and union all bitmaps for values
  matching the predicate.
- Change behavior for nulls: it used to be that the BoundFilter would never match nulls,
  now it matches nulls if "" is allowed by the lower limit and not excluded by the
  upper limit.

Interface changes:

- BitmapIndex: add `int getIndex(value)` to make it possible to get the index for a
  value without retrieving the bitmap.
- BitmapIndex: remove `ImmutableBitmap getBitmap(value)`, change callers to `getBitmap(getIndex(value))`.
- BitmapIndexSelector: allow retrieving the underlying BitmapIndex through getBitmapIndex.
- Clarified contract of indexOf in Indexed, GenericIndexed.

Also added tests for SelectorFilter, NotFilter, and BoundFilter.
2016-03-25 14:11:48 -07:00
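The lexicographic case in the commit above (find the start and end points with getIndex, then union every bitmap in between) can be sketched as follows. This is a simplified stand-in, not Druid's BitmapIndex: the column is a sorted `String[]` with one `java.util.BitSet` per value, and `getIndex` is assumed to follow the `Arrays.binarySearch` convention of returning `-(insertionPoint) - 1` for absent values. The null-handling change from the same commit is not modeled.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical dictionary-encoded column: sorted distinct values, one row bitmap each.
class BoundSketch {
    private final String[] sortedValues;
    private final BitSet[] bitmaps;

    BoundSketch(String[] sortedValues, BitSet[] bitmaps) {
        this.sortedValues = sortedValues;
        this.bitmaps = bitmaps;
    }

    // Dictionary index of value, or -(insertionPoint) - 1 if the value is absent.
    int getIndex(String value) {
        return Arrays.binarySearch(sortedValues, value);
    }

    // Union of the bitmaps for every value between lower and upper.
    BitSet rowsInRange(String lower, boolean lowerStrict, String upper, boolean upperStrict) {
        int start = getIndex(lower);
        if (start >= 0) {
            start = lowerStrict ? start + 1 : start; // skip the bound itself if strict
        } else {
            start = -(start + 1);                    // first value greater than lower
        }
        int end = getIndex(upper);                   // converted to an exclusive end below
        if (end >= 0) {
            end = upperStrict ? end : end + 1;
        } else {
            end = -(end + 1);
        }
        BitSet result = new BitSet();
        for (int i = start; i < end; i++) {
            result.or(bitmaps[i]);
        }
        return result;
    }
}
```

Two binary searches replace a predicate check against every dictionary value, so the cost scales with the number of values inside the range rather than the full dictionary.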
jon-wei 9afaa2b94a Fix HyperUniquesAggregatorFactory comparator 2016-03-25 12:36:42 -07:00
Gian Merlino 4ac9e03161 Fix predicate-based ValueMatcher behavior for IncrementalIndex on missing columns.
Missing columns should be treated the same as columns containing 100% nulls.
2016-03-25 10:23:59 -07:00
Himanshu Gupta e78a469fb7 UTs for ExtensionsConfig 2016-03-25 10:51:28 -05:00
Himanshu Gupta 004b00bb96 config to explicitly specify classpath for hadoop container during hadoop ingestion 2016-03-25 10:51:28 -05:00
Nishant 0b03c9405f Merge pull request #2614 from sirpkt/calendric_gran
Support week, month, quarter, and year in query granularity
2016-03-24 16:21:01 -07:00
Himanshu 56343c6cdc Merge pull request #2704 from navis/simple-optimize
optimize single-element and/or filter
2016-03-24 16:13:48 -05:00
Gian Merlino 713062053c Filters: Add filter.toFilter method, use that instead of the instanceof chain in Filters.
I believe that the instanceof chain in Filters exists because in the past, Filter
and DimFilter were in different packages (DimFilter was in druid-client and Filter
was in druid-processing). And since druid-client didn't depend on druid-processing,
DimFilter couldn't have a toFilter method. But now it can.
2016-03-23 17:03:49 -07:00
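The refactoring described in the commit above replaces a central instanceof chain with a method on the filter spec itself. A minimal sketch of that shape, using invented interface and class names rather than Druid's real DimFilter and Filter types, could look like this:

```java
// Hypothetical illustration of the toFilter pattern; all names here are invented.
class ToFilterSketch {
    // Runtime filter: decides whether a single value matches.
    interface Filter {
        boolean matches(String value);
    }

    // Query-level filter spec: each spec knows how to build its own runtime filter,
    // so no central instanceof chain is needed.
    interface DimFilter {
        Filter toFilter();
    }

    static class SelectorDimFilter implements DimFilter {
        private final String value;

        SelectorDimFilter(String value) {
            this.value = value;
        }

        @Override
        public Filter toFilter() {
            return candidate -> value.equals(candidate);
        }
    }

    static class NotDimFilter implements DimFilter {
        private final DimFilter field;

        NotDimFilter(DimFilter field) {
            this.field = field;
        }

        @Override
        public Filter toFilter() {
            Filter base = field.toFilter();
            return candidate -> !base.matches(candidate);
        }
    }
}
```

Adding a new filter type then only requires implementing `toFilter` on the new class, with no change to shared conversion code.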
Gian Merlino dd86198902 All Filters should work with FilteredAggregators.
This removes Filter.makeMatcher(ColumnSelectorFactory) and adds a
ValueMatcherFactory implementation to FilteredAggregatorFactory so it can
take advantage of existing makeMatcher(ValueMatcherFactory) implementations.

This patch also removes the Bound-based method from ValueMatcherFactory. Its
only user was the SpatialFilter, which could use the Predicate-based method.

Fixes #2604.
2016-03-23 12:24:01 -07:00