Commit Graph

1579 Commits

Author SHA1 Message Date
Gian Merlino 40f2fe7893 Bump versions to 0.9.3-SNAPSHOT (#3524) 2016-09-29 13:53:32 -07:00
Charles Allen 654e1db309 Add simple test to FunctionalExtractionTest (#3522) 2016-09-28 23:45:15 -07:00
Gian Merlino d5a8a35fec groupBy: GroupByRowProcessor fixes, invert subquery context overrides. (#3502)
- Fix GroupByRowProcessor config overrides
- Fix GroupByRowProcessor resource limit checking
- Invert subquery context overrides such that for the subquery, its own
  keys override keys from the outer query, not the other way around.

The last bit is necessary for the test to work, and seems like a better
way to do it anyway.
2016-09-23 14:41:09 -07:00
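
The override order described in the last bullet of #3502 can be sketched as a plain map merge. This is a minimal illustration, not Druid's code: the class `ContextMerge`, the method `mergeContexts`, and the context keys used below are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not the actual Druid implementation.
class ContextMerge
{
  // Start from the outer query's context, then apply the subquery's own keys,
  // so that the subquery's values win whenever both define the same key.
  static Map<String, Object> mergeContexts(Map<String, Object> outer, Map<String, Object> subquery)
  {
    Map<String, Object> merged = new HashMap<>(outer);
    merged.putAll(subquery);
    return merged;
  }

  public static void main(String[] args)
  {
    Map<String, Object> outer = new HashMap<>();
    outer.put("timeout", 1000);   // assumed key names
    outer.put("priority", 5);

    Map<String, Object> subquery = new HashMap<>();
    subquery.put("timeout", 250);

    // The subquery keeps its own timeout and inherits the outer priority.
    System.out.println(mergeContexts(outer, subquery));
  }
}
```

Copying the outer map first and calling `putAll` with the subquery's map second is what gives the subquery's keys precedence; swapping the two would give the old behavior.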
Gian Merlino 7195be32d8 groupBy v2: Fix dangling references. (#3500)
Acquiring references in the processing task prevents dangling references
caused by canceled processing tasks.
2016-09-24 01:59:11 +05:30
Gian Merlino f8d71fc602 groupBy: Fix maxMergingDictionarySize config. (#3488) 2016-09-22 10:02:33 -07:00
Gian Merlino c87ecea975 Fix ListFilteredDimensionSpec blacklisting on non-present values. (#3487) 2016-09-22 09:12:02 -07:00
Navis Ryu 49c0fe0e8b Show candidate hosts for the given query (#2282)
* Show candidate hosts for the given query

* Added test cases & minor changes to address comments

* Changed path-param to query-param for intervals/numCandidates
2016-09-22 08:32:38 -07:00
Keuntae Park 54ec4dd584 Support renaming of outputName for cached select and search query results (#3395)
* support renaming of outputName for cached select and search queries

* rebase and resolve conflicts

* rollback CacheStrategy interface change

* updated based on review comments
2016-09-20 08:19:14 -07:00
Charles Allen 95e08b38ea [QTL] Reduced Locking Lookups (#3071)
* Lockless lookups

* Fix compile problem

* Make stack trace throw instead

* Remove non-germane change

* Add better naming to cache keys. Makes logging nicer

* Fix #3459

* Move start/stop lock to non-interruptable for readability purposes
2016-09-16 11:54:23 -07:00
Jonathan Wei df766b2bbd Add dimension handling interface for ingestion and segment creation (#3217)
* Add dimension handling interface for ingestion and segment creation

* update javadocs for DimensionHandler/DimensionIndexer

* Move IndexIO row validation into DimensionHandler

* Fix null column skipping in mergerV9

* Add deprecation note for 'numeric_dims' filename pattern in IndexIO v8->v9 conversion

* Fix java7 test failure
2016-09-12 12:54:02 -07:00
Gian Merlino d108461838 groupBy v2: Parallel disk spilling. (#3433)
In ConcurrentGrouper, when it becomes clear that disk spilling is necessary, switch
from hash-based partitioning to thread-based partitioning. This stops processing
threads from blocking each other while spilling is occurring.
2016-09-09 16:49:58 -06:00
Gian Merlino 1e3f94237e groupBy v2: Configurable load factor. (#3437)
Also change defaults:

- bufferGrouperMaxLoadFactor from 0.75 to 0.7.
- maxMergingDictionarySize to 100MB from 25MB, should be more appropriate
  for most heaps.
2016-09-07 14:14:59 -05:00
Roman Leventov 4f0bcdce36 Eager file unmapping in IndexIO, IndexMerger and IndexMergerV9 (#3422)
* Eager file unmapping in IndexIO, IndexMerger and IndexMergerV9. The immediate purpose of this change is to allow running IndexMergeBenchmark on Windows; however, it should also be universally 'better' than non-deterministic unmapping, which happens when MappedByteBuffers are garbage-collected (BACKEND-312)

* Use Closer with a proper pattern in IndexIO, IndexMerger and IndexMergerV9

* Unmap file in IndexMergerV9.makeInvertedIndexes() using try-with-resources

* Reformat IndexIO
2016-09-07 10:43:47 -07:00
Gian Merlino 8d2ae144a8 groupBy: Short-circuit identity preCompute manipulators. (#3434) 2016-09-06 22:28:44 -06:00
Gian Merlino 1d07964987 LimitedTemporaryStorage: Fix perf bug. (#3432)
FilterOutputStream has an inefficient implementation of write(byte[], int, int).
So let's extend OutputStream directly and use efficient implementations of all
methods.
2016-09-06 15:39:36 -07:00
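
The point made in #3432 can be illustrated with a small byte-limited stream. `java.io.FilterOutputStream.write(byte[], int, int)` forwards each byte through the single-byte `write(int)`, so a wrapper that extends `OutputStream` directly and delegates the bulk call avoids that overhead. The class below is a sketch under assumptions: `LimitedOutputStream` and its limit handling are hypothetical and are not taken from LimitedTemporaryStorage.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical byte-limited stream for illustration; not Druid's LimitedTemporaryStorage code.
class LimitedOutputStream extends OutputStream
{
  private final OutputStream out;
  private final long maxBytes;
  private long written = 0;

  LimitedOutputStream(OutputStream out, long maxBytes)
  {
    this.out = out;
    this.maxBytes = maxBytes;
  }

  // Account for n more bytes, failing before anything is written if the limit would be exceeded.
  private void reserve(int n) throws IOException
  {
    if (written + n > maxBytes) {
      throw new IOException("Limit of " + maxBytes + " bytes exceeded");
    }
    written += n;
  }

  @Override
  public void write(int b) throws IOException
  {
    reserve(1);
    out.write(b);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException
  {
    reserve(len);
    out.write(b, off, len); // one bulk call instead of len single-byte calls
  }

  @Override
  public void flush() throws IOException
  {
    out.flush();
  }

  @Override
  public void close() throws IOException
  {
    out.close();
  }
}
```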
Gian Merlino 8ed1894488 groupBy: Omit timestamp from merge key when granularity = all. (#3416)
Fixes #3412.
2016-09-01 09:02:54 -07:00
Gian Merlino 6d25c5e053 Avoid materializing all groupBy results with order + limit. (#3410)
The old TopNFunction code did Sequences.toList on the input sequence before
using a priority queue to find the top N items. Now, the priority queue
is used in an accumulator, so there is no need to fully materialize the results.

Also removed equals/hashCode from the limitFn and removed limitFn from the
GroupByQuery's hashCode, since that wasn't necessary and the implementation
of hashCode wasn't correct anyway.
2016-08-31 14:08:07 -07:00
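
The idea behind #3410, keeping only the current best N rows in a priority queue while consuming the input one element at a time, can be sketched as below. This is an assumption-laden illustration: `BoundedTopN` and `topN` are hypothetical names, and a plain `Iterator` stands in for Druid's Sequence and accumulator types.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch; a plain Iterator stands in for a Druid Sequence.
class BoundedTopN
{
  // Returns the first `limit` elements under `ordering`, sorted, holding at most `limit` items in memory.
  static <T> List<T> topN(Iterator<T> input, Comparator<T> ordering, int limit)
  {
    if (limit <= 0) {
      return new ArrayList<>();
    }
    // Reversed ordering puts the worst retained element at the head, so it is cheap to evict.
    PriorityQueue<T> heap = new PriorityQueue<>(ordering.reversed());
    while (input.hasNext()) {
      T item = input.next();
      if (heap.size() < limit) {
        heap.add(item);
      } else if (ordering.compare(item, heap.peek()) < 0) {
        heap.poll();
        heap.add(item);
      }
    }
    List<T> result = new ArrayList<>(heap);
    result.sort(ordering);
    return result;
  }
}
```

Memory use is bounded by `limit` rather than by the size of the input, which is the difference from materializing the whole input with something like Sequences.toList first.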
Gian Merlino 1268e2902c Add groupBy test for multiple multi-value dimensions. (#3415) 2016-08-31 11:21:10 -07:00
Gian Merlino e9050c2b4c TimeFormatExtractionFn: Allow null formats (equivalent to ISO8601) and granular bucketing. (#3411) 2016-08-31 20:58:53 +05:30
Keuntae Park 0076b5fc1a Interval bug fix for search query (#2903)
* support query granularity and interval for search query

* skip unnecessary bitmap calculation when the query interval contains the whole data interval of the given segments.

* use binary search to find start and end index for the given interval

* fix based on comment

* bug fix based on the review comments and add unit tests
2016-08-31 20:52:44 +05:30
Dave Li c4e8440c22 Adds long compression methods (#3148)
* add read

* update deprecated guava calls

* add write and vsizeserde

* add benchmark

* separate encoding and compression

* add header and reformat

* update doc

* address PR comment

* fix buffer order

* generate benchmark files

* separate encoding strategy and format

* fix benchmark

* modify supplier write to channel

* add float NONE handling

* address PR comment

* address PR comment 2
2016-08-30 16:17:46 -07:00
Jonathan Wei 4e91330a17 Use DimensionSpec in CardinalityAggregatorFactory (#3406)
* Use DimensionSpec in CardinalityAggregatorFactory

* Address PR comments

* Fix requiredFields()
2016-08-30 15:54:02 -07:00
Gian Merlino b11e9544ea GroupBy v2: Improve hash code distribution. (#3407)
Without this transformation, distribution of hash % X is poor in general.
It is catastrophically poor when X is a multiple of 31 (many slots would
be empty).
2016-08-30 12:09:08 +05:30
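
The problem described in #3407 can be reproduced with hash codes built the usual Java way (multiply by 31, add the next field). The commit message does not say which transformation was applied, so the `smear` function below, the MurmurHash3 32-bit finalizer, is an assumed stand-in, and `HashSmear` and `bucketsUsed` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demonstration; the mixing function is an assumption, not the one from the commit.
class HashSmear
{
  // MurmurHash3 32-bit finalizer: spreads the input bits across the whole int.
  static int smear(int h)
  {
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }

  // Counts how many of numBuckets slots receive at least one of 3000 two-field keys.
  static int bucketsUsed(int numBuckets, boolean smeared)
  {
    Set<Integer> used = new HashSet<>();
    for (int a = 0; a < 1000; a++) {
      for (int b = 0; b < 3; b++) {
        // Arrays.hashCode computes 31 * (31 + a) + b, so modulo 31 only b survives.
        int h = Arrays.hashCode(new int[]{a, b});
        used.add(Math.floorMod(smeared ? smear(h) : h, numBuckets));
      }
    }
    return used.size();
  }

  public static void main(String[] args)
  {
    System.out.println("raw:     " + bucketsUsed(31, false) + " of 31 buckets used");
    System.out.println("smeared: " + bucketsUsed(31, true) + " of 31 buckets used");
  }
}
```

With 31 buckets the raw hash codes land in only as many slots as the last field has distinct values, which is the "many slots would be empty" case from the commit message.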
kaijianding f037dfcaa4 fix missing segments duplicate retried (#3398) 2016-08-29 23:46:21 +05:30
jaehong choi 2e0f253c32 introducing lists of existing columns in the fields of select queries' output (#2491)
* introducing lists of existing columns in the fields of select queries' output

* rebase master

* address the comment. add test code for select query caching

* change the cache code in SelectQueryQueryToolChest to 0x16
2016-08-25 21:37:53 +05:30
rajk-tetration 362b9266f8 Adding filters for TimeBoundary on backend (#3168)
* Adding filters for TimeBoundary on backend

Signed-off-by: Balachandar Kesavan <raj.ksvn@gmail.com>

* updating TimeBoundaryQuery constructor in QueryHostFinderTest

* add filter helpers

* update filterSegments + test

* Conditional filterSegment depending on whether a filter exists

* Style changes

* Trigger rebuild

* Adding documentation for timeboundaryquery filtering

* added filter serialization to timeboundaryquery cache

* code style changes
2016-08-15 10:25:24 -07:00
Gian Merlino e1b0b7de3e IndexBuilder: Allow replacing rows, customizable maxRows. (#3359) 2016-08-12 15:22:45 -07:00
Jonathan Wei 454587857c Make StringComparator deserialization case-insensitive (#3356) 2016-08-11 18:00:11 -07:00
Himanshu 043562914d Update IncrementalIndex.getMetricType() to return type name stored by ComplexMetricsSerde instead of AggregatorFactory.getTypeName() (#3341) 2016-08-10 11:03:44 -07:00
Gian Merlino 1eb7a7e882 Restore optimizations in BoundFilter. (#3343) 2016-08-10 08:53:17 -07:00
Gian Merlino a2bcd97512 IncrementalIndex: Fix multi-value dimensions returned from iterators. (#3344)
They had arrays as values, which MapBasedRow doesn't understand and
toStrings rather than converting to lists.
2016-08-10 08:47:29 -07:00
Jonathan Wei 890e3bdd3f More informative query unit test names (#3342) 2016-08-09 22:24:48 -07:00
Gian Merlino 8899affe48 Introduce standardized "Resource limit exceeded" error. (#3338)
Fixes #3336.
2016-08-09 10:50:56 -07:00
Gian Merlino 21bce96c4c More useful query errors. (#3335)
Follow-up to #1773, which meant to add more useful query errors but
did not actually do so. Since that patch, any error other than
interrupt/cancel/timeout was reported as `{"error":"Unknown exception"}`.

With this patch, the error fields are:

- error, one of the specific strings "Query interrupted", "Query timeout",
  "Query cancelled", or "Unknown exception" (same behavior as before).
- errorMessage, the message of the topmost non-QueryInterruptedException
  in the causality chain.
- errorClass, the class of the topmost non-QueryInterruptedException
  in the causality chain.
- host, the host that failed the query.
2016-08-09 16:14:52 +08:00
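
The way #3335 picks errorMessage and errorClass, from the topmost non-QueryInterruptedException in the causality chain, can be sketched as a walk over `getCause()`. This is only an illustration: the nested `QueryInterruptedException` below is a local stand-in so the example compiles on its own, and `ErrorFields` and `topmostRealCause` are hypothetical names.

```java
// Hypothetical sketch of the cause-chain walk; not Druid's actual code.
class ErrorFields
{
  // Local stand-in for Druid's QueryInterruptedException.
  static class QueryInterruptedException extends RuntimeException
  {
    QueryInterruptedException(Throwable cause)
    {
      super(cause);
    }
  }

  // Returns the first throwable in the chain that is not a QueryInterruptedException,
  // or null if the chain consists only of such wrappers.
  static Throwable topmostRealCause(Throwable t)
  {
    Throwable current = t;
    while (current instanceof QueryInterruptedException) {
      current = current.getCause();
    }
    return current;
  }

  public static void main(String[] args)
  {
    Throwable root = new IllegalStateException("boom");
    Throwable wrapped = new QueryInterruptedException(new QueryInterruptedException(root));
    Throwable real = topmostRealCause(wrapped);
    System.out.println("errorMessage = " + real.getMessage());
    System.out.println("errorClass   = " + real.getClass().getName());
  }
}
```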
Gian Merlino 1aae5bd67d Nicer handling for cancelled groupBy v2 queries. (#3330)
1. Wrap temporaryStorage in a resource holder, to avoid spurious "Closed"
   errors from already-running processing tasks.
2. Exit early from the merging accumulator if the query is cancelled.
2016-08-05 14:48:06 -07:00
Jonathan Wei decefb7477 Add time interval dim filter and retention analysis example (#3315)
* Add time interval dim filter and retention analysis example

* Use closed-open matching for intervals, update cache key generation

* Fix time filtering tests for interval boundary change
2016-08-05 07:25:04 -07:00
Navis Ryu 5b3f0ccb1f Support variance and standard deviation (#2525)
* Support variance and standard deviation

* addressed comments
2016-08-04 17:32:58 -07:00
Gian Merlino 9437a7a313 HLL: Avoid some allocations when possible. (#3314)
- HLLC.fold avoids duplicating the other buffer by saving and restoring its position.
- HLLC.makeCollector(buffer) no longer duplicates incoming BBs.
- Updated call sites where appropriate to duplicate BBs passed to HLLC.
2016-08-03 18:08:52 -07:00
Gian Merlino a4b95af839 Fix grouper closing in GroupByMergingQueryRunnerV2. (#3316)
The grouperHolder should be closed on failure, not the grouper.
2016-08-02 21:02:30 -07:00
Gian Merlino 0299ac73b8 Fix FilteredAggregators at ingestion time and in groupBy v2 nested queries. (#3312)
The common theme between the two is they both create "fake" DimensionSelectors
that work on top of Rows. They both do it because there isn't really any
dictionary for the underlying Rows, they're just a stream of data. The fix for
both is to allow a DimensionSelector to tell callers that it has no dictionary
by returning CARDINALITY_UNKNOWN from getValueCardinality. The callers, in
turn, can avoid using it in ways that assume it has a dictionary.

Fixes #3311.
2016-08-02 17:39:40 -07:00
Gian Merlino ae3e0015b6 Fix ClassCastException in nested v2 groupBys with timeouts. (#3310)
Add tests for the CCE and for a bunch of other groupBy stuff.

Also avoids setting the interrupted flag when InterruptedExceptions
happen, since this might interfere with resource closing, no other
query does it, and is probably pointless anyway since the thread
is likely to be a jetty thread that we don't actually want to set
an interrupt flag on.

Also fixes toString on OrderByColumnSpec.
2016-08-02 16:02:44 -06:00
kaijianding 50d52a24fc ability to not rollup at index time, make pre aggregation an option (#3020)
* ability to not rollup at index time, make pre aggregation an option

* rename getRowIndexForRollup to getPriorIndex

* fix doc misspelling

* test query using no-rollup indexes

* fix benchmark fail due to jmh bug
2016-08-02 11:13:05 -07:00
Jonathan Wei 0bdaaa224b Use Long.compare for NumericComparator when possible (#3309) 2016-08-01 20:36:56 -07:00
Dave Li bc20658239 groupBy nested query using v2 strategy (#3269)
* changed v2 nested query strategy

* add test for #3239

* update for new ValueMatcher interface and add benchmarks

* enable time filtering

* address PR comments

* add failing test for outer filter aggregator

* add helper class for sharing code

* update nested groupby doc

* move temporary storage instantiation

* address PR comment

* address PR comment 2
2016-08-01 18:30:39 -07:00
Jonathan Wei a6105cbb86 Add numeric StringComparator (#3270)
* Add numeric StringComparator

* Only use direct long comparison for numeric ordering in BoundFilter, add time filtering benchmark query

* Address PR comments, add multithreaded BoundDimFilter test

* Add comment on strlen tie handling

* Add timeseries interval filter benchmark

* Adjust docs

* Use jackson for StringComparator, address PR comments

* Add new TopNMetricSpec and SearchSortSpec with tests (WIP)

* More TopNMetricSpec and SearchSortSpec tests

* Fix NewSearchSortSpec serde

* Update docs for new DimensionTopNMetricSpec

* Delete NumericDimensionTopNMetricSpec

* Delete old SearchSortSpec

* Rename NewSearchSortSpec to SearchSortSpec

* Add TopN numeric comparator benchmark, address PR comments

* Refactor OrderByColumnSpec

* Add null checks to NumericComparator and String->BigDecimal conversion function

* Add more OrderByColumnSpec serde tests
2016-07-29 15:44:16 -07:00
Navis Ryu 884017d981 "all" type search query spec (#3300)
* "all" type search query spec

* addressed comments

* added unit test
2016-07-28 18:16:15 -07:00
Gian Merlino 2553997200 Associate groupBy v2 resources with the Sequence lifecycle. (#3296)
This fixes a potential issue where groupBy resources could be allocated to
create a Sequence, but then the Sequence is never used, and thus the resources
are never freed.

Also simplifies how groupBy handles config overrides (this made the new
unit test easier to write).
2016-07-27 18:44:19 -07:00
Gian Merlino 9b5523add3 Reference counting, better error handling for resources in groupBy v2. (#3268)
Refcounting prevents releasing the merge buffer, or closing the concurrent
grouper, before the processing threads have all finished. The better
error handling prevents an avalanche of per-runner exceptions when grouping
resources are exhausted, by grouping those all up into a single merged
exception.
2016-07-27 01:59:02 +05:30
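
The refcounting idea in #3268, not releasing a shared resource until every processing thread is done with it, can be sketched with an atomic counter. This is a minimal illustration under assumptions: `RefCountedResource`, `acquire`, and `release` are hypothetical names and do not reflect Druid's actual classes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reference-counted wrapper; not the class used in Druid.
class RefCountedResource
{
  private final Runnable releaser;
  // Starts at 1: the creator holds the first reference.
  private final AtomicInteger refs = new AtomicInteger(1);

  RefCountedResource(Runnable releaser)
  {
    this.releaser = releaser;
  }

  // Takes a reference; returns false if the resource has already been released.
  boolean acquire()
  {
    while (true) {
      int n = refs.get();
      if (n <= 0) {
        return false;
      }
      if (refs.compareAndSet(n, n + 1)) {
        return true;
      }
    }
  }

  // Drops a reference; the underlying resource is released when the last one goes away.
  void release()
  {
    int n = refs.decrementAndGet();
    if (n == 0) {
      releaser.run();
    } else if (n < 0) {
      throw new IllegalStateException("released more times than acquired");
    }
  }
}
```

In this sketch each processing task would call `acquire()` before touching the resource and `release()` in a `finally` block, while the owner calls `release()` once when the query finishes; whichever call comes last runs the releaser.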
Erik Dubbelboer 76fabcfdb2 Fix #2782, Unit test failed for DruidProcessingConfigTest.testDeserialization (#3231)
On systems with only one processor this test fails.
2016-07-25 15:51:09 -07:00
kaijianding 3dc2974894 Add timestampSpec to metadata.drd and SegmentMetadataQuery (#3227)
* save TimestampSpec in metadata.drd

* add timestampSpec info in SegmentMetadataQuery
2016-07-25 15:45:30 -07:00