7710 Commits

Author SHA1 Message Date
Gian Merlino
600bbd4a17 BucketExtractionFn: Implement hashCode, fix toString. (#3656) 2016-11-04 11:24:02 -07:00
Gian Merlino
8b3c86f41f Fix FilteredAggregatorFactory toString formatting. (#3657) 2016-11-04 11:23:55 -07:00
Gian Merlino
2c504b6258 Add "like" filter. (#3642)
* Add "like" filter.

* Addressed some PR comments.

* Slight simplifications to LikeFilter.

* Additional simplifications.

* Fix comment in LikeFilter.

* Clarify comment in LikeFilter.

* Simplify LikeMatcher a bit.

* No use going through the optimized path if prefix is empty.

* Add more tests.
2016-11-04 23:25:03 +05:30
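The "optimized path" mentioned in this commit's notes appears to depend on the literal prefix of the LIKE pattern. The sketch below is one plausible reading, not the actual LikeFilter code: the class `LikeSketch` and its methods are hypothetical, escape characters are ignored, and only `%` and `_` are treated as wildcards. It uses the prefix for a cheap rejection, and skips that step when the prefix is empty.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; not Druid's LikeFilter or LikeMatcher.
final class LikeSketch {
    // Literal characters before the first wildcard ('%' or '_'). No escape handling.
    static String literalPrefix(String likePattern) {
        int i = 0;
        while (i < likePattern.length()) {
            char c = likePattern.charAt(i);
            if (c == '%' || c == '_') {
                break;
            }
            i++;
        }
        return likePattern.substring(0, i);
    }

    // Translate the LIKE pattern to a regex: '%' matches any run, '_' matches one character.
    static Pattern toRegex(String likePattern) {
        StringBuilder sb = new StringBuilder();
        for (char c : likePattern.toCharArray()) {
            if (c == '%') {
                sb.append(".*");
            } else if (c == '_') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    static boolean matches(String likePattern, String value) {
        String prefix = literalPrefix(likePattern);
        // The prefix check only pays off when there is a prefix to check.
        if (!prefix.isEmpty() && !value.startsWith(prefix)) {
            return false;
        }
        return toRegex(likePattern).matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("foo%", "foobar"));
        System.out.println(matches("%bar", "foobar"));
    }
}
```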
Nishant
b961b6a69f Add configs to enable running integration tests for cluster running behind proxy (#3646)
* Add configs to enable running integration tests for cluster running behind proxy

As part of https://issues.apache.org/jira/browse/KNOX-758
I am working on adding support for proxying druid queries & UIs using
Apache KNOX gateway.
This PR adds support for integration-tests to be run using the proxy
gateway.
Changes Include -
1) Instead of hostName and port, ability to specify url in the config
file.
2) tests now use HTTPClient defined in DruidTestModule that can pass in
the request.

Note - the config changes are backwards compatible and existing configs
should work fine.

* review comments

* review comments
2016-11-04 08:19:38 -07:00
Navis Ryu
b99e14e732 Support configuration for handling multi-valued dimension (#2541)
* Support configuration for handling multi-valued dimension

* Addressed comments

* use MultiValueHandling.ofDefault() for missing policy
2016-11-03 22:38:54 -06:00
Gian Merlino
4203580290 URIExtractionNamespace: Treat null values in lookup maps as missing entries. (#3512)
* URIExtractionNamespace: Treat null values in lookup maps as missing entries.

This is useful when many logical lookups are derived from the same base JSON file,
and some lookups' values may be unknown sometimes.

* Add test, logging message, and address other comments.

* Update docs.
2016-11-03 13:53:04 -07:00
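Treating null values as missing entries can be illustrated with a small filter over the parsed map. This is a minimal sketch under assumptions: `NullAsMissingSketch` and `dropNullValues` are hypothetical names, and the real change in URIExtractionNamespace may be structured differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not the URIExtractionNamespace implementation.
final class NullAsMissingSketch {
    // Copy the map, leaving out any key whose value is null,
    // so a later lookup of that key behaves like a missing entry.
    static Map<String, String> dropNullValues(Map<String, String> raw) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            if (e.getValue() != null) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new HashMap<>();
        raw.put("known", "value");
        raw.put("unknown", null);
        System.out.println(dropNullValues(raw).containsKey("unknown"));
    }
}
```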
Navis Ryu
e10def32f2 Support string type in math expression (#2836)
* Support string type in math expression

addressed comments

addressed comments

Addressed comments

* Updated math function document

* Addressed comments
2016-11-02 21:10:48 -06:00
Himanshu
2362effd8c use FileSystem.rename(from,to,Rename.NONE) so that tmp dirs from replicating tasks are not moved to the segment directory created by first task (#3650) 2016-11-02 15:58:55 -07:00
Roman Leventov
36a1543222 Lookup cache bug fixes (#3609)
* Return better lastVersion from JDBCExtractionNamespaceCacheFactory's cache populator callable

* Return the lastVersion if URI lookup last modified date is not later than the last cached, from URIExtractionNamespaceCacheFactory's cache populator callable

* Fix a race condition in NamespaceExtractionCacheManager.cancelFuture()

* Don't delete cache from NamespaceExtractionCacheManager if the ExtractionNamespaceCacheFactory returned the same version as the last; Better exception treatment in the scheduled cache updater runnable in NamespaceExtractionCacheManager (in particular, don't consume Errors); throw AssertionError in StaticMapExtractionNamespaceCacheFactory if the lastVersion != null)

* In NamespaceExtractionCacheManager, put NamespaceImplData.latestVersion update in the same synchronized() block with swapAndClearCache(id, cacheId); Turn getPostRunnable which returns a callback into a simple updateNamespace() method

* In StaticMapExtractionNamespaceCacheFactory.getCachePopulator(), check the input directly, not inside a callback

* In URIExtractionNamespaceCacheFactory, allow URI last modified time to go backwards

* Better logging in NamespaceExtractionCacheManager

* Add comment on lastVersion nullability in URIExtractionNamespaceCacheFactory
2016-11-02 09:40:19 -07:00
kaijianding
2961406b90 fix zero period in PeriodGranularity causing gran.iterable(start, end) infinite loop (#3644) 2016-11-02 15:40:07 +05:30
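The infinite loop described here is what happens whenever a bucket iterator advances by a zero-length step: the cursor never reaches the end. The sketch below shows that failure mode and one possible guard, using `java.time` as a stand-in for the period type used by PeriodGranularity. `BucketIterationSketch` and `bucketStarts` are hypothetical names, and the actual fix may differ.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not PeriodGranularity.
final class BucketIterationSketch {
    // Enumerate bucket start times in [start, end).
    static List<Instant> bucketStarts(Instant start, Instant end, Duration step) {
        // Without this check, a zero step would leave t unchanged and the loop would never end.
        if (step.isZero() || step.isNegative()) {
            throw new IllegalArgumentException("step must be positive, got " + step);
        }
        List<Instant> out = new ArrayList<>();
        for (Instant t = start; t.isBefore(end); t = t.plus(step)) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        Instant start = Instant.EPOCH;
        System.out.println(bucketStarts(start, start.plusSeconds(3), Duration.ofSeconds(1)));
    }
}
```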
Roman Leventov
4b0d6cf789 Fix resource leaks (ComplexColumn and GenericColumn) (#3629)
* Remove unused ComplexColumnImpl class

* Remove throws IOException from close() in GenericColumn, ComplexColumn, IndexedFloats and IndexedLongs

* Use concise try-with-resources syntax in several places

* Fix resource leaks (ComplexColumn and GenericColumn) in SegmentAnalyzer, SearchQueryRunner, QueryableIndexIndexableAdapter and QueryableIndexStorageAdapter

* Use Closer in Iterable, returned from QueryableIndexIndexableAdapter.getRows(), in order to try to close everything even if closing some parts threw exceptions
2016-11-02 09:23:52 +05:30
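The last bullet relies on a "close everything, then report" pattern. The commit uses a Closer for this; the sketch below is a standard-library approximation of the same idea, not the commit's code. `CloseAllSketch` is a hypothetical name, and attaching later failures as suppressed exceptions is an assumption about the desired behaviour.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of closing every resource even if some closes fail.
final class CloseAllSketch implements AutoCloseable {
    private final Deque<AutoCloseable> stack = new ArrayDeque<>();

    void register(AutoCloseable resource) {
        stack.push(resource);
    }

    // Close in reverse registration order. The first failure is rethrown at the end;
    // any later failures are attached to it as suppressed exceptions.
    @Override
    public void close() throws Exception {
        Exception first = null;
        while (!stack.isEmpty()) {
            try {
                stack.pop().close();
            } catch (Exception e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    public static void main(String[] args) throws Exception {
        try (CloseAllSketch closer = new CloseAllSketch()) {
            closer.register(() -> System.out.println("closed column A"));
            closer.register(() -> System.out.println("closed column B"));
        }
    }
}
```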
Himanshu
eb70a12e43 fix cleanup of tmp dir in HdfsDataSegmentPusher (#3636) 2016-11-01 12:45:38 -05:00
kaijianding
f1dee037d6 fix 'No Such File' error when executing script outside of druid installation directory (#3517) 2016-11-01 09:57:09 -07:00
Gian Merlino
45940d6e40 Math expressions support for missing columns. (#3630)
Also add SchemaEvolutionTest to help test this kind of thing.

Fixes #3627 and includes test for #3625.
2016-11-01 09:40:25 -07:00
Himanshu
0e269ce72a max row limit is necessary if connector is set up for streaming (#3635) 2016-11-01 09:39:55 -07:00
Gian Merlino
89d9c61894 Deprecate Aggregator.getName and AggregatorFactory.getAggregatorStartValue. (#3572) 2016-10-31 15:24:30 -07:00
Himanshu
32c5494e97 eagerly allocate the intermediate computation buffers (#3628) 2016-10-31 15:24:07 -07:00
Gian Merlino
9f5c895d5f Fix imports in DefaultOfflineAppenderatorFactoryTest. (#3624) 2016-10-31 12:23:19 -07:00
Slim
f6995bc908 offline appenderator factory. (#3483)
* adding default offline appenderator

* adding test

* fix comments

* fix comments
2016-10-31 10:05:58 -07:00
Aveplatter
317d62e18c teeny tiny wording change (#3623) 2016-10-31 09:46:54 -07:00
Navis Ryu
3fca3be9ea SpecificSegmentQueryRunner misses missing segments from toYielder() (#3617) 2016-10-30 11:47:29 -07:00
Himanshu
23a8e22836 fix SketchMergeAggregatorFactory.finalizeResults, comparator and more UTs for timeseries, topN (#3613) 2016-10-28 15:48:33 -07:00
Akash Dwivedi
6a845e1f7b Adding getDelegate() to directly access delegate. (#3616)
👍
2016-10-27 15:57:36 -07:00
Charles Allen
78159d7ca4 Move off-heap QTL global cache delete lock outside of subclass lock (#3597)
* Move off-heap QTL global cache delete lock outside of subclass lock

* Make `delete` thread safe
2016-10-27 22:23:53 +05:30
Navis Ryu
0799640299 Faster interval comparator (#3605) 2016-10-26 14:20:27 +05:30
Parag Jain
98465f47b5 start stop metamx lifecycle annotated objects as well (#3610) 2016-10-25 15:56:16 -07:00
Navis Ryu
898c1c21af More best-effort parse long (#3603)
* More best-effort parse long

* addressed comments
2016-10-25 10:31:51 -07:00
David Lim
3c56cbdf82 fix timing issue with KafkaLookupExtractorFactoryTest (#3604) 2016-10-25 07:04:51 -07:00
Himanshu
641469fc38 manage overshadowing efficiently at coordinator (#3584)
* manage overshadowing efficiently at coordinator

* take readlock in VersionedIntervalTimeline.isOvershadowed()
2016-10-24 22:49:08 +05:30
Charles Allen
9bb735133f Move copyright to proper druid.io form (#3602) 2016-10-21 16:23:53 -07:00
Akash Dwivedi
4b3bd8bd63 Migrating java-util from Metamarkets. (#3585)
* Migrating java-util from Metamarkets.

* checkstyle and updated license on java-util files.

* Removed unused imports from whole project.

* cherry pick metamx/java-util@826021f.

* Copyright changes on java-util pom, address review comments.
2016-10-21 14:57:07 -07:00
Navis Ryu
8b7ff4409a Math expressional parameters for aggregator (#2783)
* Supports expression-paramed aggregator (squashed and rebased on master) also includes math post aggregator (was #2820)

* Addressed comments

* addressed comments
2016-10-19 13:58:35 -05:00
Roman Leventov
b113a34355 In CPUTimeMetricQueryRunner, account CPU consumed in baseSequence.toYielder() (#3587) 2016-10-18 09:06:42 -05:00
Charles Allen
2c5c8198db Make query/cpu/time still report on error (#3535) 2016-10-18 08:26:21 -05:00
Nishant
8ea5f9324d Integration Tests - fix middlemanager property name in doc (#3586) 2016-10-18 08:23:34 -05:00
Gian Merlino
dd0bb6da1e Unit test for #3544: Avoid exceptions for dataSource spec when using s3. (#3571) 2016-10-17 12:41:43 -07:00
Roman Leventov
9611358f0a Small topn scan improvements (#3526)
* Remove unused numProcessed param from PooledTopNAlgorithm.aggregateDimValue()

* Replace AtomicInteger with simple int in PooledTopNAlgorithm.scanAndAggregate() and aggregateDimValue()

* Remove unused import
2016-10-17 10:36:19 -07:00
Gian Merlino
285516bede Workaround non-thread-safe use of HLL aggregators. (#3578)
Despite the non-thread-safety of HyperLogLogCollector, it is actually currently used
by multiple threads during realtime indexing. HyperUniquesAggregator's "aggregate" and
"get" methods can be called simultaneously by OnheapIncrementalIndex, since its
"doAggregate" and "getMetricObjectValue" methods are not synchronized.

This means that the optimization of HyperLogLogCollector.fold in #3314 (saving and
restoring position rather than duplicating the storage buffer of the right-hand side)
could cause corruption in the face of concurrent writes.

This patch works around the issue by duplicating the storage buffer in "get" before
returning a collector. The returned collector still shares data with the original one,
but the situation is no worse than before #3314. In the future we may want to consider
making a thread safe version of HLLC that avoids these kinds of problems in realtime
indexing. But for now I thought it was best to do a small change that restored the old
behavior.
2016-10-17 09:39:12 -07:00
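The difference between "saving and restoring position" and "duplicating the storage buffer" comes down to how `java.nio.ByteBuffer` tracks its cursor. A duplicate shares the underlying bytes but has its own position and limit, so reading through it does not move the original's cursor while another thread is using it. The sketch below shows only that property; `DuplicateBufferSketch` is a hypothetical name and this is not the HyperLogLogCollector code.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of reading through a duplicate instead of the shared buffer.
final class DuplicateBufferSketch {
    // Sum the remaining bytes without touching the caller's position or limit.
    static int sumViaDuplicate(ByteBuffer storage) {
        ByteBuffer view = storage.duplicate(); // shares content, independent position/limit
        int sum = 0;
        while (view.hasRemaining()) {
            sum += view.get();
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer storage = ByteBuffer.wrap(new byte[] {1, 2, 3});
        System.out.println(sumViaDuplicate(storage));
        System.out.println(storage.position());
    }
}
```

As the commit message notes, the duplicate still shares data with the original, so this protects the cursor state only, not the contents.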
David Lim
c2ae734848 KafkaIndexTask: Allow run thread to stop gracefully instead of interrupting (#3534)
* allow run thread to gracefully complete instead of interrupting when stopGracefully() is called

* add comments
2016-10-17 10:52:19 -04:00
Gian Merlino
c1d3b8a30c Remove dropwizard-jdbc dependency from lookups-cached-single. (#3573)
Fixes #3548.
2016-10-17 10:37:47 -04:00
Gian Merlino
0ce33bc95f HdfsDataSegmentPusher: Properly include scheme, host in output path if necessary. (#3577)
Fixes #3576.
2016-10-17 10:37:18 -04:00
David Lim
472c409b99 KafkaLookupExtractorFactory: shutdown kafka consumer on close() (#3539)
* shutdown kafka consumer on close

* handle close() race condition
2016-10-15 09:55:51 -07:00
Charles Allen
3b6261c690 Add druid-lookups-cached-single to default distribution build (#3550)
Fixes #3527
2016-10-15 08:11:04 -07:00
Navis Ryu
4554c1214b Avoid exceptions for dataSource spec when using s3 (#3544) 2016-10-14 18:24:19 -07:00
Roman Leventov
5dc95389f7 Add Checkstyle framework (#3551)
* Add Checkstyle framework

* Avoid star import

* Need braces for control flow statements

* Redundant imports

* Add NewLineAtEndOfFile check
2016-10-13 13:37:47 -07:00
Roman Leventov
85ac8eff90 Improve performance of IndexMergerV9 (#3440)
* Improve performance of StringDimensionMergerV9 and StringDimensionMergerLegacy by avoiding primitive int boxing by using IntIterator in IndexedInts instead of Iterator<Integer>; Extract some common logic for V9 and Legacy mergers; Minor improvements to resource handling in StringDimensionMergerV9

* Don't mask index in MergeIntIterator.makeQueueElement()

* DRY conversion RoaringBitmap's IntIterator to fastutil's IntIterator

* Do implement skip(n) in IntIterators extending AbstractIntIterator because original implementation is not reliable

* Use Test(expected=Exception.class) instead of try { } catch (Exception e) { /* ignore */ }
2016-10-13 08:28:46 -07:00
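The boxing cost mentioned in the first bullet comes from `Iterator<Integer>` returning an `Integer` object per element. The commit switches to an IntIterator; the sketch below shows the same idea with the JDK's `PrimitiveIterator.OfInt` as a stand-in, so it is an analogy rather than the merger code. `PrimitiveIterationSketch` is a hypothetical name.

```java
import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

// Hypothetical illustration of iterating ints without boxing.
final class PrimitiveIterationSketch {
    // nextInt() returns a primitive int, so no Integer is allocated per element.
    static long sum(PrimitiveIterator.OfInt it) {
        long total = 0;
        while (it.hasNext()) {
            total += it.nextInt();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(IntStream.rangeClosed(1, 4).iterator()));
    }
}
```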
Gian Merlino
ddc856214d When inserting segments, mark unused if already overshadowed. (#3499)
This is useful for the insert-segment-to-db tool, which would otherwise
potentially insert a lot of overshadowed segments as "used", causing
load and drop churn in the cluster.
2016-10-10 18:10:18 -07:00
jaehong choi
6f21778364 Support finding segments in AWS S3. (#3399)
* support finding segments from AWS S3 storage.

* add more Uts

* address comments and add a document for the feature.

* update docs indentation

* update docs indentation

* address comments.
1. add a Ut for json ser/deser for the config object.
2. more informative error message in a Ut.

* address comments.
1. use @Min to validate the configuration object
2. change updateDescriptor to a string as it does not take an argument otherwise

* fix a Ut failure - delete a Ut for testing default max length.
2016-10-10 17:27:09 -07:00
Parag Jain
1e79a1be82 fix useExplicitVersion (#3559) 2016-10-10 14:28:06 -05:00
Akash Dwivedi
3a83e0513e Doc update(batch-ingestion) to include useExplicitVersion. (#3557) 2016-10-07 14:48:00 -07:00