Gian Merlino
76a24054e3
JavaScript docs, including docs for globals. ( #3454 )
2016-09-13 13:46:55 -07:00
Gian Merlino
bcff08826b
KafkaIndexTask: Treat null values as unparseable. ( #3453 )
2016-09-13 10:56:38 -07:00
Slim
ba6ddf307e
Adding hadoop kerberos authentication. ( #3419 )
...
* adding kerberos authentication
* make the 2 functions identical
2016-09-13 10:42:50 -07:00
Jonathan Wei
df766b2bbd
Add dimension handling interface for ingestion and segment creation ( #3217 )
...
* Add dimension handling interface for ingestion and segment creation
* update javadocs for DimensionHandler/DimensionIndexer
* Move IndexIO row validation into DimensionHandler
* Fix null column skipping in mergerV9
* Add deprecation note for 'numeric_dims' filename pattern in IndexIO v8->v9 conversion
* Fix java7 test failure
2016-09-12 12:54:02 -07:00
Alexander Saydakov
1a5042ca26
updated dependency on sketches-core ( #3443 )
...
* updated dependency on sketches-core to 0.7.0
* Use sketches-core-0.4.1, which is the latest version still compatible
with JDK7
2016-09-09 16:21:32 -07:00
Slim
6a1cd7fc66
avoid throwing exceptions, fix #3389 ( #3441 )
...
* avoid throwing exceptions
* log alert
* fix comments
2016-09-09 16:19:50 -07:00
Gian Merlino
d108461838
groupBy v2: Parallel disk spilling. ( #3433 )
...
In ConcurrentGrouper, when it becomes clear that disk spilling is necessary, switch
from hash-based partitioning to thread-based partitioning. This stops processing
threads from blocking each other while spilling is occurring.
2016-09-09 16:49:58 -06:00
Gian Merlino
1344e3c3af
Clearer filter docs. ( #3448 )
2016-09-09 13:47:13 -07:00
Himanshu
3b6c81e7c0
fix cleanup of hadoop ingestion intermediate path ( #3385 )
2016-09-08 01:36:56 +05:30
Gian Merlino
1e3f94237e
groupBy v2: Configurable load factor. ( #3437 )
...
Also change defaults:
- bufferGrouperMaxLoadFactor from 0.75 to 0.7.
- maxMergingDictionarySize to 100MB from 25MB, should be more appropriate
for most heaps.
2016-09-07 14:14:59 -05:00
Roman Leventov
4f0bcdce36
Eager file unmapping in IndexIO, IndexMerger and IndexMergerV9 ( #3422 )
...
* Eager file unmapping in IndexIO, IndexMerger and IndexMergerV9. The exact purpose of this change is to allow running IndexMergeBenchmark on Windows; however, it should also be universally 'better' than non-deterministic unmapping, which is done when MappedByteBuffers are garbage-collected (BACKEND-312)
* Use Closer with a proper pattern in IndexIO, IndexMerger and IndexMergerV9
* Unmap file in IndexMergerV9.makeInvertedIndexes() using try-with-resources
* Reformat IndexIO
2016-09-07 10:43:47 -07:00
Fangjin Yang
c0e62b536a
Changes to lambda architecture paper required for HICSS ( #3382 )
...
* changes for hicss
* more updates
* revert test changes
* final edits
2016-09-06 21:32:21 -07:00
David Lim
146a17de48
KafkaIndexTask: allow pause to break out of retry loop ( #3401 )
2016-09-06 22:29:37 -06:00
Gian Merlino
8d2ae144a8
groupBy: Short-circuit identity preCompute manipulators. ( #3434 )
2016-09-06 22:28:44 -06:00
Gian Merlino
1d07964987
LimitedTemporaryStorage: Fix perf bug. ( #3432 )
...
FilterOutputStream has an inefficient implementation of write(byte[], int, int).
So let's extend OutputStream directly and use efficient implementations of all
methods.
2016-09-06 15:39:36 -07:00
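The point in the LimitedTemporaryStorage entry above can be sketched as follows. FilterOutputStream's write(byte[], int, int) forwards each byte through write(int), so a wrapper that extends OutputStream directly can pass bulk writes to the delegate in one call. This is a minimal illustration only, not the code from the commit: the class name LimitedOutputStreamSketch, the maxBytes limit, and the grab helper are all hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: a size-limited stream that extends OutputStream directly
// so that bulk writes reach the delegate as a single call.
class LimitedOutputStreamSketch extends OutputStream {
  private final OutputStream delegate;
  private final long maxBytes; // assumed limit, for illustration only
  private long written;

  LimitedOutputStreamSketch(OutputStream delegate, long maxBytes) {
    this.delegate = delegate;
    this.maxBytes = maxBytes;
  }

  @Override
  public void write(int b) throws IOException {
    grab(1);
    delegate.write(b);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    grab(len);
    // One bulk call instead of len single-byte calls.
    delegate.write(b, off, len);
  }

  @Override
  public void flush() throws IOException {
    delegate.flush();
  }

  @Override
  public void close() throws IOException {
    delegate.close();
  }

  long getWritten() {
    return written;
  }

  // Reserve n bytes against the limit, failing before anything is written.
  private void grab(long n) throws IOException {
    if (written + n > maxBytes) {
      throw new IOException("Cannot write " + n + " more bytes: limit is " + maxBytes);
    }
    written += n;
  }
}
```

OutputStream.write(byte[]) already delegates to write(byte[], int, int), so it does not need its own override here.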
Gian Merlino
6827c09311
GroupByBenchmark: Fix queryable index generation, improve memory use. ( #3431 )
...
With the old code, all on-disk segments were the same. Now they're different.
This will end up altering benchmark results for queryMultiQueryableIndex,
likely making them slower (since values won't group as well as they used to).
The memory changes will help test with larger/more segments, since we won't
have to hold them all in memory at once.
2016-09-06 14:37:55 -07:00
David Lim
5b1ae21bd1
retry calls to getStartTime ( #3429 )
2016-09-06 14:02:22 -07:00
David Lim
3a97fd4d6c
doc fix ( #3430 )
2016-09-06 13:13:30 -06:00
Himanshu
2235988069
update wikipedia search query in the integration tests as per the fix in commit 0076b5f ( #3420 )
2016-09-01 10:13:17 -07:00
Gian Merlino
8ed1894488
groupBy: Omit timestamp from merge key when granularity = all. ( #3416 )
...
Fixes #3412 .
2016-09-01 09:02:54 -07:00
Gian Merlino
6d25c5e053
Avoid materializing all groupBy results with order + limit. ( #3410 )
...
The old TopNFunction code did Sequences.toList on the input sequence before
using a priority queue to find the top N items. Now, the priority queue
is used in an accumulator, so there is no need to fully materialize the results.
Also removed equals/hashCode from the limitFn and remove limitFn from the
GroupByQuery's hashCode, since that wasn't necessary and the implementation
of hashCode wasn't correct anyway.
2016-08-31 14:08:07 -07:00
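The approach described in the order + limit entry above, feeding a priority queue from an accumulator instead of materializing the whole input first, can be sketched roughly as below. This is an illustration under assumptions, not the TopNFunction code: a plain Iterator stands in for Druid's Sequence, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: keep only the best `limit` rows while streaming the input.
class TopNAccumulatorSketch {
  static <T> List<T> topN(Iterator<T> rows, Comparator<T> ordering, int limit) {
    if (limit <= 0) {
      return new ArrayList<>();
    }
    // Reversed ordering puts the worst retained row at the head of the heap.
    PriorityQueue<T> heap = new PriorityQueue<>(limit, Collections.reverseOrder(ordering));
    while (rows.hasNext()) {
      T row = rows.next();
      if (heap.size() < limit) {
        heap.add(row);
      } else if (ordering.compare(row, heap.peek()) < 0) {
        // The new row beats the worst retained row, so it takes its place.
        heap.poll();
        heap.add(row);
      }
    }
    List<T> result = new ArrayList<>(heap);
    result.sort(ordering);
    return result;
  }
}
```

Memory use is bounded by the limit rather than by the size of the input, which is the benefit the commit message describes.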
Gian Merlino
1268e2902c
Add groupBy test for multiple multi-value dimensions. ( #3415 )
2016-08-31 11:21:10 -07:00
Gian Merlino
e9050c2b4c
TimeFormatExtractionFn: Allow null formats (equivalent to ISO8601) and granular bucketing. ( #3411 )
2016-08-31 20:58:53 +05:30
Keuntae Park
0076b5fc1a
Interval bug fix for search query ( #2903 )
...
* support query granularity and interval for search query
* skip unnecessary bitmap calculation when the query interval contains the whole data interval of the given segments.
* use binary search to find start and end index for the given interval
* fix based on comment
* bug fix based on the review comments and add unit tests
2016-08-31 20:52:44 +05:30
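The binary-search step mentioned in the search query entry above can be illustrated with a small sketch. It assumes timestamps are held in a sorted long array and that the interval is half-open (start inclusive, end exclusive). The class name, method names, and array representation are hypothetical and do not reflect the actual segment data structures.

```java
// Hypothetical sketch: locate the row index range covered by an interval
// using two lower-bound binary searches over sorted timestamps.
class IntervalIndexSketch {
  // Returns the first index whose timestamp is >= target, or the array length.
  static int lowerBound(long[] sortedTimes, long target) {
    int lo = 0;
    int hi = sortedTimes.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sortedTimes[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  // Returns {startIndex, endIndex}; rows in [startIndex, endIndex) fall inside the interval.
  static int[] indexRange(long[] sortedTimes, long startInclusive, long endExclusive) {
    return new int[]{
        lowerBound(sortedTimes, startInclusive),
        lowerBound(sortedTimes, endExclusive)
    };
  }
}
```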
Stéphane Derosiaux
48dce88aab
Add flag binaryAsString for parquet ingestion ( #3381 )
2016-08-30 17:30:50 -07:00
Dave Li
c4e8440c22
Adds long compression methods ( #3148 )
...
* add read
* update deprecated guava calls
* add write and vsizeserde
* add benchmark
* separate encoding and compression
* add header and reformat
* update doc
* address PR comment
* fix buffer order
* generate benchmark files
* separate encoding strategy and format
* fix benchmark
* modify supplier write to channel
* add float NONE handling
* address PR comment
* address PR comment 2
2016-08-30 16:17:46 -07:00
Jonathan Wei
4e91330a17
Use DimensionSpec in CardinalityAggregatorFactory ( #3406 )
...
* Use DimensionSpec in CardinalityAggregatorFactory
* Address PR comments
* Fix requiredFields()
2016-08-30 15:54:02 -07:00
Nishant
4c2b8d29d3
Make RTR assign pending tasks by insertion order ( #3405 )
2016-08-30 12:22:44 -07:00
Gian Merlino
b11e9544ea
GroupBy v2: Improve hash code distribution. ( #3407 )
...
Without this transformation, distribution of hash % X is poor in general.
It is catastrophically poor when X is a multiple of 31 (many slots would
be empty).
2016-08-30 12:09:08 +05:30
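The problem described in the hash code distribution entry above can be reproduced with a small experiment. Hash codes built with a multiplier of 31 (as Arrays.hashCode does) can all be multiples of 31, so taking them modulo a slot count that is also a multiple of 31 leaves most slots unused. The mixing step below is an assumption chosen for illustration (the MurmurHash3 32-bit finalizer); the commit does not say which transformation it applies, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: count how many slots are hit with and without a mixing step.
class HashSpreadSketch {
  // Assumed mixing function (MurmurHash3 32-bit finalizer); not taken from the commit.
  static int mix(int h) {
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }

  // Arrays.hashCode(new int[]{key, 0}) equals 31 * (31 + key), a multiple of 31.
  static int rawHash(int key) {
    return Arrays.hashCode(new int[]{key, 0});
  }

  // Number of distinct slots used by keys 0..keys-1 in a table with `slots` slots.
  static int usedSlots(int keys, int slots, boolean mixed) {
    Set<Integer> used = new HashSet<>();
    for (int key = 0; key < keys; key++) {
      int h = rawHash(key);
      used.add(Math.floorMod(mixed ? mix(h) : h, slots));
    }
    return used.size();
  }
}
```

With 10000 keys and 31 * 64 = 1984 slots, the unmixed hashes can only land on slot indexes that are multiples of 31, so at most 64 slots are ever used. Calling usedSlots with mixed set to true shows how much a mixing step spreads the same keys.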
kaijianding
f037dfcaa4
fix missing segments duplicate retried ( #3398 )
2016-08-29 23:46:21 +05:30
Ashish
6b40bf8b32
doc: added note to README, about necessary hdfs config after insert-segment-to-db ( #3402 )
2016-08-28 16:39:33 -07:00
Gleb Smirnov
8bee07e81e
Respect server-side sorting of tasks in coordinator console ( #3404 )
2016-08-28 16:38:29 -07:00
jaehong choi
2e0f253c32
introducing lists of existing columns in the fields of select queries' output ( #2491 )
...
* introducing lists of existing columns in the fields of select queries' output
* rebase master
* address the comment. add test code for select query caching
* change the cache code in SelectQueryQueryToolChest to 0x16
2016-08-25 21:37:53 +05:30
Chanh Le
d624037698
Pull-deps: correct the library directory in the document ( #3361 )
...
* Pull-deps: correct the library directory in the document
* Pull-deps: correct the library directory in the document in the last example command
2016-08-16 17:18:15 -07:00
Fangjin Yang
edb0eca3a9
fix docs ( #3370 )
2016-08-16 16:25:50 -07:00
Fangjin Yang
6beb8ac342
fix some docs and add new content ( #3369 )
2016-08-16 15:00:18 -07:00
Hamlet Lee
e4f0eac8e6
Fix issue #2707 ( #2708 )
2016-08-16 12:19:44 -05:00
kaijianding
eafafce1aa
fix old usage of dimension as string instead of dimensionSchema in DataSchema ( #3365 )
2016-08-16 09:58:04 -07:00
David Lim
ed924bf214
allow registrants to opt out of announcing themselves when registering as a chat handler ( #3360 )
2016-08-16 10:51:28 +05:30
rajk-tetration
362b9266f8
Adding filters for TimeBoundary on backend ( #3168 )
...
* Adding filters for TimeBoundary on backend
Signed-off-by: Balachandar Kesavan <raj.ksvn@gmail.com>
* updating TimeBoundaryQuery constructor in QueryHostFinderTest
* add filter helpers
* update filterSegments + test
* Conditional filterSegment depending on whether a filter exists
* Style changes
* Trigger rebuild
* Adding documentation for timeboundaryquery filtering
* added filter serialization to timeboundaryquery cache
* code style changes
2016-08-15 10:25:24 -07:00
Himanshu
70d99fe3c6
Initialize ApproximateHistogram Module in ApproximateHistogramGroupByQueryTest ( #3363 )
...
or else the test fails if run independently.
2016-08-15 10:19:33 -07:00
Gian Merlino
e1b0b7de3e
IndexBuilder: Allow replacing rows, customizable maxRows. ( #3359 )
2016-08-12 15:22:45 -07:00
kaijianding
df89f25b15
fix can't get latest offset in KafkaEightSimpleConsumerFirehoseFactory ( #3355 )
2016-08-11 18:00:24 -07:00
Jonathan Wei
454587857c
Make StringComparator deserialization case-insensitive ( #3356 )
2016-08-11 18:00:11 -07:00
jianran
18af480017
Rename fields in OrderedMergeIterator ( #3149 )
...
* code readable
* fix the pre middle manager peon no stop
* Revert "fix the pre middle manager peon no stop"
This reverts commit 6cef4980bf.
2016-08-11 09:42:12 -07:00
Gian Merlino
2f46effc8e
FileTaskLogsTest: Throw unthrown exception. ( #3352 )
2016-08-11 09:40:28 -07:00
Himanshu
03cfcf002b
fix the race described in #3174 ( #3205 )
2016-08-10 11:29:50 -07:00
Himanshu
043562914d
Update IncrementalIndex.getMetricType() to return type name stored by ComplexMetricsSerde instead of AggregatorFactory.getTypeName() ( #3341 )
2016-08-10 11:03:44 -07:00
Himanshu
46da682231
avro-extensions -- feature to specify avro reader schema inline in the task json for all events ( #3249 )
2016-08-10 10:49:26 -07:00
Gian Merlino
1eb7a7e882
Restore optimizations in BoundFilter. ( #3343 )
2016-08-10 08:53:17 -07:00