Commit Graph

1628 Commits

Author SHA1 Message Date
Jonathan Wei 2bfcc8a592 First and Last Aggregator (#3566)
* add first and last aggregator

* add test and fix

* moving around

* separate aggregator valueType

* address PR comment

* add finalize inner query and adjust v1 inner indexing

* better test and fixes

* java-util import fixes

* PR comments

* Add first/last aggs to ITWikipediaQueryTest
2016-12-16 15:26:40 -08:00
Himanshu ed322a4beb remove size from default analysisTypes list for segmentMetadata query (#3773) 2016-12-13 18:01:21 -08:00
Jonathan Wei 880a021a7a Fix missed travis failures from PR 3567 and 2798 (#3761)
* Fix checkstyle failures from PR 3567

* Fix GranularityPathSpecTest compile failure
2016-12-07 19:07:31 -08:00
Erik Dubbelboer bb9e35e1af Add Greatest and Least post aggregations (#3567) 2016-12-07 17:58:23 -08:00
Roman Leventov dc8f814acc Optimize Iterator<ImmutableBitmap> implementation inside Filters.matchPredicate() so that it doesn't emit empty bitmap in the end of the iteration, and make it to follow Iterator contract, that is throw NoSuchElementException from next() if there are no more bitmaps (#3754) 2016-12-07 12:54:09 -08:00
Jonathan Wei d1896a2d62 Disable flush after every ObjectMapper write (#3748) 2016-12-06 16:45:23 -08:00
Gian Merlino b1bac9f2d3 groupBy v2: Ignore timestamp completely when granularity = all, except for the final merge. (#3740)
* GroupByBenchmark: Add serde, spilling, all-gran benchmarks.

Also use more iterations.

* groupBy v2: Ignore timestamp completely when granularity = all, except for the final merge.

Specifically:

- Remove timestamp from RowBasedKey when not needed
- Set timestamp to null in MapBasedRows that are not part of the final merge.
2016-12-06 16:17:32 -08:00
Himanshu 45da7e48f1 groupBy sort results by (dimensions,timestamp) instead of (timestamp,dimension) (#3672)
* sortByDimsFirst flag for groupBy query

* Remove need for KeyType in Grouper<KeyType> to be Comparable<KeyType>

* fix review comments

* fix review comments regarding removing code duplication of dim/time comparison

* move comparator for KeyType object to KeySerdeFactory so that creation of comparator does not need KeySerde

* remove unnecessary system.out.println

* make access static var NATURAL_NULLS_FIRST directly

* further review comments addressing
2016-12-06 09:48:56 -08:00
Navis Ryu c74d267f50 Support virtual column for select query (#2511)
* Support virtual column for select query

* Addressed comments
2016-12-05 15:14:35 -08:00
Gian Merlino b64e06704e Fix SingleScanTimeDimSelector when an extractionFn returns null for a timestamp. (#3732) 2016-12-02 15:27:54 -08:00
Gian Merlino f4cc8c2b2f IndexBuilder: Close IncrementalIndex when done. (#3734) 2016-12-02 16:56:34 -06:00
Gian Merlino 353fee79dd Add "asMillis" option to "timeFormat" extractionFn. (#3733)
This is useful for chaining extractionFns that all want to treat time as millis,
such as having a javascript extractionFn after a timeFormat.
2016-12-02 13:45:16 -08:00
Gian Merlino 102375d9bb Add "strlen" extractionFn. (#3731) 2016-12-02 12:08:51 -08:00
Gian Merlino 4c5d10f8a3 Add DimFilterHavingSpec. (#3727)
* Add DimFilterHavingSpec.

* Add test for DimFilterHavingSpec with extractionFns.
2016-12-02 10:04:30 -08:00
Gian Merlino 68735829ca Add, fix equals, hashCode, toString on various classes. (#3723)
* TimeFormatExtractionFn: Add toString.

* InDimFilter: Add toString, allow accepting any Collection of values.

* DimensionTopNMetricSpec: Fix toString.

* InvertedTopNMetricSpec: Add toString.

* HyperUniqueFinalizingPostAggregator: Add equals, hashCode, toString.
2016-11-30 19:00:14 -08:00
Gian Merlino 477e0cab7c Filter fixes and tests (#3724)
* More robust Filter tests.

All Filter tests now exercise the CNF and post-filtering features.

* Fixes to RowBasedValueMatcherFactory and to bound filters.

- Change Comparables to Strings in ValueMatcher related code.
- Break out RowBasedValueMatcherFactory, fix a variety of issues around nulls, and add tests.
- Fix bound filters on long columns with non-numeric bounds, and add tests.
2016-11-30 16:10:05 -08:00
Gian Merlino 6922d684bf GroupBy: Validation of output names, and a gross hack for v1 subqueries. (#3686)
v1 subqueries try to use aggregators to "transfer" values from the inner
results to an incremental index, but aggregators can't transfer all kinds of
values (strings are a common one). This is a workaround that selectively
ignores what the outer aggregators ask for and instead assumes that we know
best.

These are in the same commit because the name validation changed the kinds of
errors that were thrown by v1 subqueries.
2016-11-29 12:35:03 +05:30
Roman Leventov c070b4a816 Fix concurrency defects, remove unnecessary volatiles (#3701) 2016-11-22 16:42:28 -08:00
Roman Leventov 7b56cec3b9 Fix resource leaks (#3702) 2016-11-18 21:21:36 +05:30
Gian Merlino 7e80d1045a Exercise v2 engine in the groupBy aggregator and multi-value dimension tests. (#3698)
This also involved some other test changes:

- Added a factory.mergeRunners step to AggregationTestHelper's groupBy chain, since the v2
  engine does merging there.
- Changed test byteBuffer pools from on-heap to off-heap to work around
  https://github.com/DataSketches/sketches-core/pull/116 for datasketches tests.
2016-11-16 20:02:25 -08:00
Keuntae Park 094f5b851b Support Min/Max for Timestamp (#3299)
* Min/Max aggregator for Timestamp

* remove unused imports and method

* rebase and zip the test data

* add docs
2016-11-14 23:00:21 -08:00
Gian Merlino 9ad34a3f03 groupBy v1: Force all dimensions to strings. (#3685)
Fixes #3683.
2016-11-14 09:30:18 -08:00
Jisoo Kim 7c0f462fbc fix bug in StringDimensionHandler and add a cli tool for validating segments (#3666) 2016-11-11 18:46:25 -08:00
Roman Leventov fbbb55f867 Update emitter dependency to 0.4.0 and emit "version" dimension for all druid metrics (#3679)
* Update emitter dependency to 0.4.0 and emit "version" dimension for all druid metrics, not only query metrics

* Remove unused imports

* Use empty string instead of "testing-version" as a version placeholder
2016-11-11 17:17:27 -06:00
Akash Dwivedi 3e408497b3 Migrating bytebuffercollections from Metamarkets. (#3647)
* Migrating  bytebuffercollections from Metamarkets.

* resolving code conflicts and removing <p> from bytebuffer-collections.
2016-11-11 10:51:07 -08:00
Gian Merlino fd5451486c Short-circuiting AndFilter. (#3676)
If any of the bitmaps are empty, the result will be false.
2016-11-11 10:14:56 -08:00
Gian Merlino 657e4512d2 Checkstyle checks for AvoidStaticImport, UnusedImports. (#3660)
Excludes tests from AvoidStaticImport, since those are used often there and
I didn't want to make this changeset too large. Production code use was minimal
and I switched those to non-static imports.
2016-11-05 11:34:36 -07:00
Gian Merlino 4cbebd0931 SubstringDimExtractionFn, BoundDimFilter: Implement typical style toString. (#3658) 2016-11-04 13:31:47 -07:00
Gian Merlino 600bbd4a17 BucketExtractionFn: Implement hashCode, fix toString. (#3656) 2016-11-04 11:24:02 -07:00
Gian Merlino 8b3c86f41f Fix FilteredAggregatorFactory toString formatting. (#3657) 2016-11-04 11:23:55 -07:00
Gian Merlino 2c504b6258 Add "like" filter. (#3642)
* Add "like" filter.

* Addressed some PR comments.

* Slight simplifications to LikeFilter.

* Additional simplifications.

* Fix comment in LikeFilter.

* Clarify comment in LikeFilter.

* Simplify LikeMatcher a bit.

* No use going through the optimized path if prefix is empty.

* Add more tests.
2016-11-04 23:25:03 +05:30
Navis Ryu b99e14e732 Support configuration for handling multi-valued dimension (#2541)
* Support configuration for handling multi-valued dimension

* Addressed comments

* use MultiValueHandling.ofDefault() for missing policy
2016-11-03 22:38:54 -06:00
Navis Ryu e10def32f2 Support string type in math expression (#2836)
* Support string type in math expression

addressed comments

addressed comments

Addressed comments

* Updated math function document

* Addressed comments
2016-11-02 21:10:48 -06:00
kaijianding 2961406b90 fix zero period in PeriodGranularity causing gran.iterable(start, end) infinite loop (#3644) 2016-11-02 15:40:07 +05:30
Roman Leventov 4b0d6cf789 Fix resource leaks (ComplexColumn and GenericColumn) (#3629)
* Remove unused ComplexColumnImpl class

* Remove throws IOException from close() in GenericColumn, ComplexColumn, IndexedFloats and IndexedLongs

* Use concise try-with-resources syntax in several places

* Fix resource leaks (ComplexColumn and GenericColumn) in SegmentAnalyzer, SearchQueryRunner, QueryableIndexIndexableAdapter and QueryableIndexStorageAdapter

* Use Closer in Iterable, returned from QueryableIndexIndexableAdapter.getRows(), in order to try to close everything even if closing some parts thew exceptions
2016-11-02 09:23:52 +05:30
Gian Merlino 45940d6e40 Math expressions support for missing columns. (#3630)
Also add SchemaEvolutionTest to help test this kind of thing.

Fixes #3627 and includes test for #3625.
2016-11-01 09:40:25 -07:00
Gian Merlino 89d9c61894 Deprecate Aggregator.getName and AggregatorFactory.getAggregatorStartValue. (#3572) 2016-10-31 15:24:30 -07:00
Navis Ryu 3fca3be9ea SpecificSegmentQueryRunner misses missing segments from toYielder() (#3617) 2016-10-30 11:47:29 -07:00
Himanshu 23a8e22836 fix SketchMergeAggregatorFactory.finalizeResults, comparator and more UTs for timeseries, topN (#3613) 2016-10-28 15:48:33 -07:00
Navis Ryu 898c1c21af More best-effort parse long (#3603)
* More best-effort parse long

* addressed comments
2016-10-25 10:31:51 -07:00
Akash Dwivedi 4b3bd8bd63 Migrating java-util from Metamarkets. (#3585)
* Migrating java-util from Metamarkets.

* checkstyle and updated license on java-util files.

* Removed unused imports from whole project.

* cherry pick metamx/java-util@826021f.

* Copyright changes on java-util pom, address review comments.
2016-10-21 14:57:07 -07:00
Navis Ryu 8b7ff4409a Math expressional parameters for aggregator (#2783)
* Supports expression-paramed aggregator (squashed and rebased on master) also includes math post aggregator (was #2820)

* Addressed comments

* addressed comments
2016-10-19 13:58:35 -05:00
Roman Leventov b113a34355 In CPUTimeMetricQueryRunner, account CPU consumed in baseSequence.toYielder() (#3587) 2016-10-18 09:06:42 -05:00
Charles Allen 2c5c8198db Make query/cpu/time still report on error (#3535) 2016-10-18 08:26:21 -05:00
Roman Leventov 9611358f0a Small topn scan improvements (#3526)
* Remove unused numProcessed param from PooledTopNAlgorithm.aggregateDimValue()

* Replace AtomicInteger with simple int in PooledTopNAlgorithm.scanAndAggregate() and aggregateDimValue()

* Remove unused import
2016-10-17 10:36:19 -07:00
Gian Merlino 285516bede Workaround non-thread-safe use of HLL aggregators. (#3578)
Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used
by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and
"get" methods can be called simultaneously by OnheapIncrementalIndex, since its
"doAggregate" and "getMetricObjectValue" methods are not synchronized.

This means that the optimization of HyperLogLogCollector.fold in #3314 (saving and
restoring position rather than duplicating the storage buffer of the right-hand side)
could cause corruption in the face of concurrent writes.

This patch works around the issue by duplicating the storage buffer in "get" before
returning a collector. The returned collector still shares data with the original one,
but the situation is no worse than before #3314. In the future we may want to consider
making a thread safe version of HLLC that avoids these kinds of problems in realtime
indexing. But for now I thought it was best to do a small change that restored the old
behavior.
2016-10-17 09:39:12 -07:00
Roman Leventov 5dc95389f7 Add Checkstyle framework (#3551)
* Add Checkstyle framework

* Avoid star import

* Need braces for control flow statements

* Redundant imports

* Add NewLineAtEndOfFile check
2016-10-13 13:37:47 -07:00
Roman Leventov 85ac8eff90 Improve performance of IndexMergerV9 (#3440)
* Improve performance of StringDimensionMergerV9 and StringDimensionMergerLegacy by avoiding primitive int boxing by using IntIterator in IndexedInts instead of Iterator<Integer>; Extract some common logic for V9 and Legacy mergers; Minor improvements to resource handling in StringDimensionMergerV9

* Don't mask index in MergeIntIterator.makeQueueElement()

* DRY conversion RoaringBitmap's IntIterator to fastutil's IntIterator

* Do implement skip(n) in IntIterators extending AbstractIntIterator because original implementation is not reliable

* Use Test(expected=Exception.class) instead of try { } catch (Exception e) { /* ignore */ }
2016-10-13 08:28:46 -07:00
Charles Allen 76e77cb610 Make segment creation gauva 14 friendly (#3520) 2016-10-05 15:25:03 -07:00
Gian Merlino 40f2fe7893 Bump versions to 0.9.3-SNAPSHOT (#3524) 2016-09-29 13:53:32 -07:00