Commit Graph

7503 Commits

Author SHA1 Message Date
Gian Merlino 6d25c5e053 Avoid materializing all groupBy results with order + limit. (#3410)
The old TopNFunction code did Sequences.toList on the input sequence before
using a priority queue to find the top N items. Now, the priority queue
is used in an accumulator, so there is no need to fully materialize the results.

Also removed equals/hashCode from the limitFn and remove limitFn from the
GroupByQuery's hashCode, since that wasn't necessary and the implementation
of hashCode wasn't correct anyway.
2016-08-31 14:08:07 -07:00
Gian Merlino 1268e2902c Add groupBy test for multiple multi-value dimensions. (#3415) 2016-08-31 11:21:10 -07:00
Gian Merlino e9050c2b4c TimeFormatExtractionFn: Allow null formats (equivalent to ISO8601) and granular bucketing. (#3411) 2016-08-31 20:58:53 +05:30
Keuntae Park 0076b5fc1a Interval bug fix for search query (#2903)
* support query granularity and interval for search query

* skip unncessary bitmap calculation when query interval contains whole the data interval of the given segments.

* use binary search to find start and end index for the given interval

* fix based on comment

* bug fix based on the review comments and add unit tests
2016-08-31 20:52:44 +05:30
Stéphane Derosiaux 48dce88aab Add flag binaryAsString for parquet ingestion (#3381) 2016-08-30 17:30:50 -07:00
Dave Li c4e8440c22 Adds long compression methods (#3148)
* add read

* update deprecated guava calls

* add write and vsizeserde

* add benchmark

* separate encoding and compression

* add header and reformat

* update doc

* address PR comment

* fix buffer order

* generate benchmark files

* separate encoding strategy and format

* fix benchmark

* modify supplier write to channel

* add float NONE handling

* address PR comment

* address PR comment 2
2016-08-30 16:17:46 -07:00
Jonathan Wei 4e91330a17 Use DimensionSpec in CardinalityAggregatorFactory (#3406)
* Use DimensionSpec in CardinalityAggregatorFactory

* Address PR comments

* Fix requiredFields()
2016-08-30 15:54:02 -07:00
Nishant 4c2b8d29d3 Make RTR assign pending tasks by insertion order (#3405) 2016-08-30 12:22:44 -07:00
Gian Merlino b11e9544ea GroupBy v2: Improve hash code distribution. (#3407)
Without this transformation, distribution of hash % X is poor in general.
It is catastrophically poor when X is a multiple of 31 (many slots would
be empty).
2016-08-30 12:09:08 +05:30
kaijianding f037dfcaa4 fix missing segments duplicate retried (#3398) 2016-08-29 23:46:21 +05:30
Ashish 6b40bf8b32 doc: added note to README, about necessary hdfs config after insert-segment-to-db (#3402) 2016-08-28 16:39:33 -07:00
Gleb Smirnov 8bee07e81e Respect server-side sorting of tasks in coordinator console (#3404) 2016-08-28 16:38:29 -07:00
jaehong choi 2e0f253c32 introducing lists of existing columns in the fields of select queries' output (#2491)
* introducing lists of existing columns in the fields of select queries' output

* rebase master

* address the comment. add test code for select query caching

* change the cache code in SelectQueryQueryToolChest to 0x16
2016-08-25 21:37:53 +05:30
Chanh Le d624037698 Pull-deps: correct the library directory in the document (#3361)
* Pull-deps: correct the library directory in the document

* Pull-deps: correct the library directory in the document in the last example command
2016-08-16 17:18:15 -07:00
Fangjin Yang edb0eca3a9 fix docs (#3370) 2016-08-16 16:25:50 -07:00
Fangjin Yang 6beb8ac342 fix some docs and add new content (#3369) 2016-08-16 15:00:18 -07:00
Hamlet Lee e4f0eac8e6 Fix issue #2707 (#2708) 2016-08-16 12:19:44 -05:00
kaijianding eafafce1aa fix old usage of dimension as string instead of dimensionSchema in DataSchema (#3365) 2016-08-16 09:58:04 -07:00
David Lim ed924bf214 allow registrants to opt out of announcing themselves when registering as a chat handler (#3360) 2016-08-16 10:51:28 +05:30
rajk-tetration 362b9266f8 Adding filters for TimeBoundary on backend (#3168)
* Adding filters for TimeBoundary on backend

Signed-off-by: Balachandar Kesavan <raj.ksvn@gmail.com>

* updating TimeBoundaryQuery constructor in QueryHostFinderTest

* add filter helpers

* update filterSegments + test

* Conditional filterSegment depending on whether a filter exists

* Style changes

* Trigger rebuild

* Adding documentation for timeboundaryquery filtering

* added filter serialization to timeboundaryquery cache

* code style changes
2016-08-15 10:25:24 -07:00
Himanshu 70d99fe3c6 Initialize ApproximateHistogram Module in ApproximateHistogramGroupByQueryTest (#3363)
or else the test fails if ran independently.
2016-08-15 10:19:33 -07:00
Gian Merlino e1b0b7de3e IndexBuilder: Allow replacing rows, customizable maxRows. (#3359) 2016-08-12 15:22:45 -07:00
kaijianding df89f25b15 fix can't get latest offset in KafkaEightSimpleConsumerFirehoseFactory (#3355) 2016-08-11 18:00:24 -07:00
Jonathan Wei 454587857c Make StringComparator deserialization case-insensitive (#3356) 2016-08-11 18:00:11 -07:00
jianran 18af480017 Rename fields in OrderedMergeIterator (#3149)
* code readable

* fix the pre middle manager peon no stop

* Revert "fix the pre middle manager peon no stop"

This reverts commit 6cef4980bf.
2016-08-11 09:42:12 -07:00
Gian Merlino 2f46effc8e FileTaskLogsTest: Throw unthrown exception. (#3352) 2016-08-11 09:40:28 -07:00
Himanshu 03cfcf002b fix the race described in #3174 (#3205) 2016-08-10 11:29:50 -07:00
Himanshu 043562914d Update IncrementalIndex.getMetricType() to return type name stored by ComplexMetricsSerde instead of AggregatorFactory.getTypeName() (#3341) 2016-08-10 11:03:44 -07:00
Himanshu 46da682231 avro-extensions -- feature to specify avro reader schema inline in the task json for all events (#3249) 2016-08-10 10:49:26 -07:00
Gian Merlino 1eb7a7e882 Restore optimizations in BoundFilter. (#3343) 2016-08-10 08:53:17 -07:00
Gian Merlino a2bcd97512 IncrementalIndex: Fix multi-value dimensions returned from iterators. (#3344)
They had arrays as values, which MapBasedRow doesn't understand and
toStrings rather than converting to lists.
2016-08-10 08:47:29 -07:00
kaijianding b21a98e2f6 fix NPE if queueBufferLength is null in KafkaEightSimpleConsumerFirehoseFactory (#3345) 2016-08-10 07:59:17 -07:00
Jonathan Wei 890e3bdd3f More informative query unit test names (#3342) 2016-08-09 22:24:48 -07:00
Nishant 8035c73409 Implement EnvironmentVariablePasswordProvider (#3329)
* Implement EnvironmentVariablePasswordProvider

* Review Comment : rename passwordKey to passwordVariable

* add docs

* improve doc layout

* review comment: rename property for variable
2016-08-10 05:33:51 +08:00
Gian Merlino 8899affe48 Introduce standardized "Resource limit exceeded" error. (#3338)
Fixes #3336.
2016-08-09 10:50:56 -07:00
Gian Merlino 21bce96c4c More useful query errors. (#3335)
Follow-up to #1773, which meant to add more useful query errors but
did not actually do so. Since that patch, any error other than
interrupt/cancel/timeout was reported as `{"error":"Unknown exception"}`.

With this patch, the error fields are:

- error, one of the specific strings "Query interrupted", "Query timeout",
  "Query cancelled", or "Unknown exception" (same behavior as before).
- errorMessage, the message of the topmost non-QueryInterruptedException
  in the causality chain.
- errorClass, the class of the topmost non-QueryInterruptedException
  in the causality chain.
- host, the host that failed the query.
2016-08-09 16:14:52 +08:00
Gian Merlino 2613e68477 Update java-util to 0.27.10. (#3337) 2016-08-09 13:37:30 +05:30
Navis Ryu 39351fb8d2 Mask properties from logging (#3332)
* Mask properties from logging

* mask "password" by default
2016-08-08 21:36:10 +05:30
Himanshu ed5b92d612 document how to check MM enabled/disabled (#3331) 2016-08-06 05:56:51 +08:00
Gian Merlino 1aae5bd67d Nicer handling for cancelled groupBy v2 queries. (#3330)
1. Wrap temporaryStorage in a resource holder, to avoid spurious "Closed"
   errors from already-running processing tasks.
2. Exit early from the merging accumulator if the query is cancelled.
2016-08-05 14:48:06 -07:00
Jonathan Wei decefb7477 Add time interval dim filter and retention analysis example (#3315)
* Add time interval dim filter and retention analysis example

* Use closed-open matching for intervals, update cache key generation

* Fix time filtering tests for interval boundary change
2016-08-05 07:25:04 -07:00
Jonathan Wei 1e3979a5e8 Add variance aggregator from hive to NOTICE (#3327) 2016-08-04 17:43:55 -07:00
Navis Ryu 5b3f0ccb1f Support variance and standard deviation (#2525)
* Support variance and standard deviation

* addressed comments
2016-08-04 17:32:58 -07:00
Gleb Smirnov 33dbe0800c Makes kafka lookup extraction factory's replace() behavior consistent with other lookup extraction factories (#3326) 2016-08-04 10:24:19 -07:00
Gian Merlino 9437a7a313 HLL: Avoid some allocations when possible. (#3314)
- HLLC.fold avoids duplicating the other buffer by saving and restoring its position.
- HLLC.makeCollector(buffer) no longer duplicates incoming BBs.
- Updated call sites where appropriate to duplicate BBs passed to HLLC.
2016-08-03 18:08:52 -07:00
Himanshu be79b095ba fixing expected result for segmentMetadata query in integration tests (#3318) 2016-08-03 12:13:27 -07:00
Gian Merlino a4b95af839 Fix grouper closing in GroupByMergingQueryRunnerV2. (#3316)
The grouperHolder should be closed on failure, not the grouper.
2016-08-02 21:02:30 -07:00
Gian Merlino 0299ac73b8 Fix FilteredAggregators at ingestion time and in groupBy v2 nested queries. (#3312)
The common theme between the two is they both create "fake" DimensionSelectors
that work on top of Rows. They both do it because there isn't really any
dictionary for the underlying Rows, they're just a stream of data. The fix for
both is to allow a DimensionSelector to tell callers that it has no dictionary
by returning CARDINALITY_UNKNOWN from getValueCardinality. The callers, in
turn, can avoid using it in ways that assume it has a dictionary.

Fixes #3311.
2016-08-02 17:39:40 -07:00
Gian Merlino ae3e0015b6 Fix ClassCastException in nested v2 groupBys with timeouts. (#3310)
Add tests for the CCE and for a bunch of other groupBy stuff.

Also avoids setting the interrupted flag when InterruptedExceptions
happen, since this might interfere with resource closing, no other
query does it, and is probably pointless anyway since the thread
is likely to be a jetty thread that we don't actually want to set
an interrupt flag on.

Also fixes toString on OrderByColumnSpec.
2016-08-02 16:02:44 -06:00
kaijianding 50d52a24fc ability to not rollup at index time, make pre aggregation an option (#3020)
* ability to not rollup at index time, make pre aggregation an option

* rename getRowIndexForRollup to getPriorIndex

* fix doc misspelling

* test query using no-rollup indexes

* fix benchmark fail due to jmh bug
2016-08-02 11:13:05 -07:00