Commit Graph

1535 Commits

Author SHA1 Message Date
Jonathan Wei a6105cbb86 Add numeric StringComparator (#3270)
* Add numeric StringComparator

* Only use direct long comparison for numeric ordering in BoundFilter, add time filtering benchmark query

* Address PR comments, add multithreaded BoundDimFilter test

* Add comment on strlen tie handling

* Add timeseries interval filter benchmark

* Adjust docs

* Use jackson for StringComparator, address PR comments

* Add new TopNMetricSpec and SearchSortSpec with tests (WIP)

* More TopNMetricSpec and SearchSortSpec tests

* Fix NewSearchSortSpec serde

* Update docs for new DimensionTopNMetricSpec

* Delete NumericDimensionTopNMetricSpec

* Delete old SearchSortSpec

* Rename NewSearchSortSpec to SearchSortSpec

* Add TopN numeric comparator benchmark, address PR comments

* Refactor OrderByColumnSpec

* Add null checks to NumericComparator and String->BigDecimal conversion function

* Add more OrderByColumnSpec serde tests
2016-07-29 15:44:16 -07:00
Navis Ryu 884017d981 "all" type search query spec (#3300)
* "all" type search query spec

* addressed comments

* added unit test
2016-07-28 18:16:15 -07:00
Gian Merlino 2553997200 Associate groupBy v2 resources with the Sequence lifecycle. (#3296)
This fixes a potential issue where groupBy resources could be allocated to
create a Sequence, but then the Sequence is never used, and thus the resources
are never freed.

Also simplifies how groupBy handles config overrides (this made the new
unit test easier to write).
2016-07-27 18:44:19 -07:00
Gian Merlino 9b5523add3 Reference counting, better error handling for resources in groupBy v2. (#3268)
Refcounting prevents releasing the merge buffer, or closing the concurrent
grouper, before the processing threads have all finished. The better
error handling prevents an avalanche of per-runner exceptions when grouping
resources are exhausted, by grouping those all up into a single merged
exception.
2016-07-27 01:59:02 +05:30
Erik Dubbelboer 76fabcfdb2 Fix #2782, Unit test failed for DruidProcessingConfigTest.testDeserialization (#3231)
On systems with only once processor this test fails.
2016-07-25 15:51:09 -07:00
kaijianding 3dc2974894 Add timestampSpec to metadata.drd and SegmentMetadataQuery (#3227)
* save TimestampSpec in metadata.drd

* add timestampSpec info in SegmentMetadataQuery
2016-07-25 15:45:30 -07:00
Jonathan Wei a42ccb6d19 Support filtering on long columns (including __time) (#3180)
* Support filtering on __time column

* Rename DruidPredicate

* Add docs for ValueMatcherFactory, add comment on getColumnCapabilities

* Combine ValueMatcherFactory predicate methods to accept DruidCompositePredicate

* Address PR comments (support filter on all long columns)

* Use predicate factory instead of composite predicate

* Address PR comments

* Lazily initialize long handling in selector/in filter

* Move long value parsing from InFilter to InDimFilter, make long value parsing thread-safe

* Add multithreaded selector/in filter test

* Fix non-final lock object in SelectorDimFilter
2016-07-20 17:08:49 -07:00
Gian Merlino 06624c40c0 Share query handling between Appenderator and RealtimePlumber. (#3248)
Fixes inconsistent metric handling between the two implementations. Formerly,
RealtimePlumber only emitted query/segmentAndCache/time and query/wait and
Appenderator only emitted query/partial/time and query/wait (all per sink).

Now they both do the same thing:
- query/segmentAndCache/time, query/segment/time are the time spent per sink.
- query/cpu/time is the CPU time spent per query.
- query/wait/time is the executor waiting time per sink.

These generally match historical metrics, except segmentAndCache & segment
mean the same thing here, because one Sink may be partially cached and
partially uncached and we aren't splitting that out.
2016-07-19 22:15:13 -05:00
Nishant 7995818220 Increase test timeout to prevent failing on slow machines (#3224)
constantly timing out on one of slow build machines, increasing the
timeout fixed it.

Running io.druid.granularity.QueryGranularityTest
Tests run: 33, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.776
sec - in io.druid.granularity.QueryGranularityTest
2016-07-17 18:44:48 -07:00
Gian Merlino 6cd1f5375b Better harmonized dimensions for query metrics. (#3245)
All query metrics now start with toolChest.makeMetricBuilder, and all of
*those* now start with DruidMetrics.makePartialQueryTimeMetric. Also, "id"
moved to common code, since all query metrics added it anyway.

In particular this will add query-type specific dimensions like "threshold"
and "numDimensions" to servlet-originated metrics like query/time.
2016-07-14 11:55:51 -07:00
Gian Merlino ea03906fcf Configurable compressRunOnSerialization for Roaring bitmaps. (#3228)
Defaults to true, which is a change in behavior (this used to be false and unconfigurable).
2016-07-08 10:24:19 +05:30
Gian Merlino fdc7e88a7d Allow queries with no aggregators. (#3216)
This is actually reasonable for a groupBy or lexicographic topNs that is
being used to do a "COUNT DISTINCT" kind of query. No aggregators are
needed for that query, and including a dummy aggregator wastes 8 bytes
per row.

It's kind of silly for timeseries, but why not.
2016-07-06 20:38:54 +05:30
Jonathan Wei f3a3662133 Fix compile error in SearchBinaryFnTest (#3201) 2016-06-29 09:44:45 -05:00
jaehong choi efbcbf5315 Support alphanumeric sort in search query (#2593)
* support alphanumeric sort in search query

* address a comment about handling equals() and hashCode()

* address comments

* add Ut for string comparators

* address a comment about space indentations.
2016-06-28 15:06:18 -07:00
Hyukjin Kwon 45f553fc28 Replace the deprecated usage of NoneShardSpec (#3166) 2016-06-25 10:27:25 -07:00
Gian Merlino 4cc39b2ee7 Alternative groupBy strategy. (#2998)
This patch introduces a GroupByStrategy concept and two strategies: "v1"
is the current groupBy strategy and "v2" is a new one. It also introduces
a merge buffers concept in DruidProcessingModule, to try to better
manage memory used for merging.

Both of these are described in more detail in #2987.

There are two goals of this patch:

1. Make it possible for historical/realtime nodes to return larger groupBy
   result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by
   columns, avoiding materialization.

This patch does not do anything to help with memory management on the broker
when there are order-by columns or when there are nested queries. That could
potentially be done in a future patch.
2016-06-24 18:06:09 -07:00
Dave Li 8a08398977 Add segment pruning based on secondary partition dimension (#2982)
* add get dimension rangeset to filters

* add get domain to ShardSpec and added chunk filter in caching clustered client

* add null check and modified not filter, started with unit test

* add filter test with caching

* refactor and some comments

* extract filtershard to helper function

* fixup

* minor changes

* update javadoc
2016-06-24 14:52:19 -07:00
michaelschiff 66d8ad36d7 adds new coordinator metrics 'segment/unavailable/count' and (#3176)
'segment/underReplicated/count' (#3173)
2016-06-23 14:53:15 -07:00
Gian Merlino da660bb592 DumpSegment tool. (#3182)
Fixes #2723.
2016-06-23 14:37:50 -07:00
Gian Merlino a437fb150b Fix SegmentMetadataQuery when queryGranularity is requested but not present. (#3181) 2016-06-23 14:30:50 -07:00
Jonathan Wei 24860a1391 Two-stage filtering (#3018)
* Two-stage filtering

* PR comment
2016-06-22 16:08:21 -07:00
Nishant f46ad9a4cb support Union Segment metadata queries (#3132)
* support Union Segment metadata queries

fix 3128

* remove extraneous sys out
2016-06-21 10:30:50 -07:00
Dave Li 12be1c0a4b Add bucket extraction function (#3033)
* add bucket extraction function

* add doc and header

* updated doc and test
2016-06-17 09:24:27 -07:00
Gian Merlino ebf890fe79 Update master version to 0.9.2-SNAPSHOT. (#3133) 2016-06-13 13:10:38 -07:00
Nishant 0d427923c0 fix caching for search results (#3119)
* fix caching for search results

properly read count when reading from cache.

* fix NPE during merging search count and add test

* Update cache key to invalidate prev results
2016-06-09 17:49:47 -07:00
Gian Merlino 5998de7d5b Fix lenient merging of conflicting aggregators. (#3113)
This should have marked the conflicting aggregator as null, but instead it
threw an NPE for the entire query.
2016-06-08 15:56:48 -07:00
Jonathan Wei 37c8a8f186 Speed up filter tests with adapter cache (#3103) 2016-06-08 07:41:10 -07:00
Gian Merlino 54139c6815 Fix NPE in registeredLookup extractionFn when "optimize" is not provided. (#3064) 2016-06-03 12:58:17 -05:00
Gian Merlino 6171e078c8 Improve NPE message in LookupDimensionSpec when lookup does not exist. (#3065)
The message used to be empty, which made things hard to debug.
2016-06-02 19:59:12 -07:00
John Wang e662efa79f segment interface refactor for proposal 2965 (#2990) 2016-05-26 20:36:41 -07:00
Kurt Young b5bd406597 fix #2991: race condition in OnheapIncrementalIndex#addToFacts (#3002)
* fix #2991: race condition in OnheapIncrementalIndex#addToFacts

* add missing header

* handle parseExceptions when first doing first agg
2016-05-25 19:05:46 -07:00
Jonathan Wei b72c54c4f8 Add benchmark data generator, basic ingestion/persist/merge/query benchmarks (#2875) 2016-05-25 16:39:37 -07:00
Dave Li dcabd4b1ee Add lookup optimization for InDimFilter (#2938)
* Add lookup optimization for InDimFilter

* tests for in filter with lookup extraction fn

* refactor

* refactor2 and modified filter test

* make optimizeLookup private
2016-05-19 16:29:16 -07:00
Charles Allen 15ccf451f9 Move QueryGranularity static fields to QueryGranularities (#2980)
* Move QueryGranularity static fields to QueryGranularityUtil
* Fixes #2979

* Add test showing #2979

* change name to QueryGranularities
2016-05-17 16:23:48 -07:00
Charles Allen fb01db4db7 [QTL] Allows RegisteredLookupExtractionFn to find its lookups lazily (#2971)
* Allows RegisteredLookupExtractionFn to find its lookups lazily

* Use raw variables instead of AtomicReference

* Make sure to use volatile

* Remove extra local variable.

* Move from BAOS to ByteBuffer
2016-05-17 11:29:39 -07:00
Himanshu d3e9c47a5f use correct ObjectMapper in Index[IO/Merger] in AggregationTestHelper and minor fix in theta sketch SketchMergeAggregatorFactory.getMergingFactory(..) (#2943) 2016-05-13 10:06:31 +05:30
Himanshu d821144738 at historicals GpBy query mergeResults does not need merging as results are already merged by GroupByQueryRunnerFactory.mergeRunners(..) (#2962) 2016-05-12 17:41:24 -07:00
Gian Merlino 01bebf432a GroupByQuery: Multi-value dimension tests. (#2959) 2016-05-12 11:31:50 -07:00
Charles Allen a31348450f Add toString for LookupConfig (#2935)
* Helps with operations and getting where the snapshot dir is
2016-05-09 18:20:00 -07:00
Dave Li 79a54283d4 Optimize filter for timeseries, search, and select queries (#2931)
* Optimize filter for timeseries, search, and select queries

* exception at failed toolchest type check

* took out query type check

* java7 error fix and test improvement
2016-05-09 11:04:06 -07:00
Slim 8b570ab130 make it clear what LookupExtractorFactory start/stop methods return (#2925) 2016-05-05 10:38:40 -07:00
David Lim b489f63698 Supervisor for KafkaIndexTask (#2656)
* supervisor for kafka indexing tasks

* cr changes
2016-05-04 23:13:13 -07:00
Himanshu 8e2742b7e8 adding QueryGranularity to segment metadata and optionally expose same from segmentMetadata query (#2873) 2016-05-03 11:31:10 -07:00
Gian Merlino 40e595c7a0 Remove types from TimeAndDims, they aren't needed. (#2865) 2016-05-03 13:10:25 -05:00
binlijin 841be5c61f periodically emit metric segment/scan/pending (#2854) 2016-05-02 22:38:13 -07:00
Navis Ryu 2729fea84d Fix parsing fail of segment id with datasource containing underscore (#2797)
* Fix parsing fail of segment id with underscored datasource (Fix for #2786)

* addressed comment

* renamed and moved code into api. added log4 dependency for tests

* addressed comments

* fixed test fails
2016-05-02 22:37:28 -07:00
Gian Merlino 90ce03c66f Fix integer overflow in SegmentMetadataQuery numRows. (#2890) 2016-04-27 14:37:04 -07:00
Gian Merlino 6dc7688a29 TimeAndDims equals/hashCode implementation. (#2870)
Adapted from #2692, thanks @navis for original implementation.
2016-04-22 08:45:20 +08:00
Himanshu 3cfd9c64c9 make singleThreaded groupBy query config overridable at query time (#2828)
* make isSingleThreaded groupBy query processing overridable at query time

* refactor code in GroupByMergedQueryRunner to make processing of single threaded and parallel merging of runners consistent
2016-04-21 17:12:58 -07:00
Slim 984a518c9f Merge pull request #2734 from b-slim/LookupIntrospection2
[QTL][Lookup] adding introspection endpoint
2016-04-21 12:15:57 -05:00