Commit Graph

1217 Commits

Author SHA1 Message Date
Roman Leventov 38000576ea Optimizations of union, intersection and iterators of concise bitsets (part of #3798) (#3883)
* Port of metamx/extendedset#10, metamx/extendedset#13, metamx/extendedset#14, metamx/extendedset#15, metamx/bytebuffer-collections@9b199e3349, metamx/bytebuffer-collections#38 to Druid, remove unused code from extendedset module

* Remove ConciseSet.modCount

* Replace comments with assertions in ImmutableConciseSet

* Fix comments

* Fix asssertions in ImmutableConciseSet

* Add tests

* Comment fix
2017-02-10 18:02:26 -08:00
Gian Merlino 9191588656 Fix mvn javadoc:jar failure due to HadoopFsWrapper. (#3912) 2017-02-08 13:54:41 -06:00
Gian Merlino 12317fd001 Bump version to 0.10.0-SNAPSHOT. (#3913) 2017-02-06 17:54:35 -08:00
DaimonPl 93b71e265e Extract HLL related code to separate module (#3900) 2017-02-03 09:45:11 -08:00
Nishant Bangarwa a457cded28 Druid Extension to enable Authentication using Kerberos. (#3853)
* Add extension for supporting kerberos security

- This PR adds an extension for supporting druid authentication via
Kerberos.
- Working on the docs.

* Add docs

* review comments

* more review comments

* Block all paths by default

* more review comments - use proper Oid

* Allow extensions to override httpclient for integration tests

* Add kerberos lock to prevent multithreaded issues.

* review comment - remove enabled flag and fix router injection

* Add Cookie Handling and more detailed docs

* review comment - rename DruidKerberosConfig -> AuthKerberosConfig

* review comments

* fix travis failure on jdk7
2017-02-02 14:55:21 -06:00
Himanshu efb1b40fe0 build sqlserver-metadata-storage contrib extension (#3871) 2017-01-20 14:39:15 -08:00
kaijianding 33ae9dd485 streaming version of select query (#3307)
* streaming version of select query

* use columns instead of dimensions and metrics;prepare for valueVector;remove granularity

* respect query limit within historical

* use constant

* fix thread name corrupted bug when using jetty qtp thread rather than processing thread while working with SpecificSegmentQueryRunner

* add some test for scan query

* add scan query document

* fix merge conflicts

* add compactedList resultFormat, this format is better for json ser/der

* respect query timeout

* respect query limit on broker

* use static consts and remove unused code
2017-01-19 16:09:53 -06:00
Slim ae5a349a54 Exclude the transitive dependency LGPL jar since it is not needed (#3865)
* Exclude the transitive dependency LGPL jar since it is not needed

* add reason why exclude

* exclude from the root dependency

* add banning tool  to enforce exclusions
2017-01-19 11:49:08 -08:00
Akash Dwivedi dd0c4e2ead Migrating extendedset from Metamarkets. (#3694)
* Migrating extendedset from Metamarkets.

* Notice change

* More details in NOTICE

* NOTICE formatting.

* suppress header checkstlye for extendedset.
2017-01-17 10:10:27 -08:00
Jihoon Son d80bec83cc Enable auto license checking (#3836)
* Enable license checking

* Clean duplicated license headers
2017-01-10 18:13:47 -08:00
Gian Merlino a4f81a6471 Update to Calcite 1.11.0. (#3825) 2017-01-06 14:45:17 -08:00
Gian Merlino 1f35120c7e Downgrade to avatica-server 1.8.0, skip avatica-core. (#3813)
This matches the version bundled by Calcite 1.10.0.
2017-01-03 16:00:37 -08:00
Roman Leventov 76cb06a8d8 Lookup cache refactoring (the main part of #3667) (#3697)
* Lookup cache refactoring (the main part of druid-io/druid#3667)

* Use PowerMock's static methods in NamespaceLookupExtractorFactoryTest

* Fix KafkaLookupExtractorFactoryTest

* Use VisibleForTesting annotation instead of Javadoc comment

* Create a NamespaceExtractionCacheManager separately for each test in NamespaceExtractionCacheManagersTest

* Rename CacheScheduler.NoCache.ENTRY_DISPOSED to ENTRY_CLOSED

* Reduce visibility of NamespaceExtractionCacheManager.cacheCount() and monitor() implementations, and don't run NamespaceExtractionCacheManagerExecutorsTest with off-heap cache (it didn't before)

* In NamespaceLookupExtractorFactory, use safer idiom to check if CacheState is NoCache or VersionedCache

* More logging in CacheHandler constructor and close(), VersionedCache.close()

* PR comments addressed

* Make CacheScheduler.EntryImpl AutoCloseable, avoid 'dispose' verb in comments, logging and naming in CacheScheduler in favor of 'close'

* More Javadoc comments to CacheScheduler

* Fix NPE

* Remove logging in OnHeapNamespaceExtractionCacheManager.expungeCollectedCaches()

* Make NamespaceExtractionCacheManagersTest.testRacyCreation() to have similar load to what it be before the refactoring

* Unwrap NamespaceExtractionCacheManager.scheduledExecutorService from unneeded MoreExecutors.listeningDecorator() and specify that this is ScheduledThreadPoolExecutor, which ensures happens-before between periodic runs of the tasks

* More comments on MapDbCacheDisposer.disposed

* Replace concat with Long.toString()

* Comment on why NamespaceExtractionCacheManager.scheduledExecutorService() returns ScheduledThreadPoolExecutor

* Place logging statements in VersionedCache.close() and CacheHandler.close() after actual closing logic, because logging may fail

* Make JDBCExtractionNamespaceCacheFactory and StaticMapExtractionNamespaceCacheFactory to try to close newly created VersionedCache if population has failed, as it is done already in URIExtractionNamespaceCacheFactory

* Don't close the whole CacheScheduler.Entry, if the cache update task failed

* Replace AtomicLong updateCounter and firstRunLatch with Phaser-based UpdateCounter in CacheScheduler.EntryImpl
2016-12-23 18:04:27 -08:00
Gian Merlino 6440ddcbca Fix #3795 (Java 7 compatibility). (#3796)
* Fix #3795 (Java 7 compatibility).

Also introduce Animal Sniffer checks during build, which would
have caught the original problems.

* Add Animal Sniffer on caffeine-cache for JDK8.
2016-12-21 10:19:13 -08:00
Nishant f576a0ff14 Contrib Extension for Ambari Metrics Emitter (#3767)
* Contrib Extension for Ambari Metrics Emitter

extension to enable druid to send metrics to ambari metrics server
(https://cwiki.apache.org/confluence/display/AMBARI/Metrics)

review comments

switch to public repo

* review comments

* add docs

* fix pom version

* Add link for doc page in extensions.md

* remove unused imports

* review comments

review comments

remove unused dependency

review comment
2016-12-19 11:12:47 -08:00
Gian Merlino dd63f54325 Built-in SQL. (#3682) 2016-12-16 17:15:59 -08:00
Jihoon Son 5e39578eee Enable parallel test (#3774)
* Enable parallel test

* Remove unnecessary NotThreadSafe annocation

* Randomize the start port when finding available ports

* Fix test failure

* Change to handle all negatives
2016-12-14 21:05:56 -08:00
Ninglin Du 469ab21091 [Feature] Thrift support for realtime and batch ingestion (#3418)
* Thrift ingestion plugin

1. thrift binary is platform dependent, use scrooge to generate java files to avoid style check failure
2. stream and hadoop ingesion are both supported, input format can be sequence file and lzo thrift block file.
3. base64 and protocol aware

change header

* fix conlicts in pom
2016-12-13 10:05:15 -08:00
Gleb Smirnov 07384d6f40 Update Apache curator to a non-leaky version (see CURATOR-354) (#3769) 2016-12-12 09:52:40 -08:00
Akash Dwivedi 6386e6a4dc root and java-util pom cleanup (#3764)
* Remove bytebuffer-collections dependency from the root pom and java-util pom cleanup.

* Remove json-smart exclusion from root pom
2016-12-08 11:30:19 -08:00
Gian Merlino 943982b7b0 Configurable HTTP compression. (#3759)
* Configurable HTTP compression.

* Call real-time nodes real-time processes in docs.
2016-12-07 17:40:39 -08:00
Roman Leventov 949e65165c Bitset iteration optimization and improve safety (#3753)
* Deduplicate looking for bitset.nextSetBit() in BitSetIterator.next() and hasNext()

* Add BitmapIterationTest

* More elaborate comment on why Roaring is not tested in BitmapIterationTest
2016-12-07 15:49:16 -08:00
Navis Ryu c74d267f50 Support virtual column for select query (#2511)
* Support virtual column for select query

* Addressed comments
2016-12-05 15:14:35 -08:00
Erik Dubbelboer 7d36f540e8 WIP: Add Google Storage support (#2458)
Also excludes the correct artifacts from #2741
2016-11-16 14:06:45 +05:30
Keuntae Park 094f5b851b Support Min/Max for Timestamp (#3299)
* Min/Max aggregator for Timestamp

* remove unused imports and method

* rebase and zip the test data

* add docs
2016-11-14 23:00:21 -08:00
Roman Leventov fbbb55f867 Update emitter dependency to 0.4.0 and emit "version" dimension for all druid metrics (#3679)
* Update emitter dependency to 0.4.0 and emit "version" dimension for all druid metrics, not only query metrics

* Remove unused imports

* Use empty string instead of "testing-version" as a version placeholder
2016-11-11 17:17:27 -06:00
Akash Dwivedi 3e408497b3 Migrating bytebuffercollections from Metamarkets. (#3647)
* Migrating  bytebuffercollections from Metamarkets.

* resolving code conflicts and removing <p> from bytebuffer-collections.
2016-11-11 10:51:07 -08:00
Gian Merlino 657e4512d2 Checkstyle checks for AvoidStaticImport, UnusedImports. (#3660)
Excludes tests from AvoidStaticImport, since those are used often there and
I didn't want to make this changeset too large. Production code use was minimal
and I switched those to non-static imports.
2016-11-05 11:34:36 -07:00
Akash Dwivedi 4b3bd8bd63 Migrating java-util from Metamarkets. (#3585)
* Migrating java-util from Metamarkets.

* checkstyle and updated license on java-util files.

* Removed unused imports from whole project.

* cherry pick metamx/java-util@826021f.

* Copyright changes on java-util pom, address review comments.
2016-10-21 14:57:07 -07:00
Roman Leventov 5dc95389f7 Add Checkstyle framework (#3551)
* Add Checkstyle framework

* Avoid star import

* Need braces for control flow statements

* Redundant imports

* Add NewLineAtEndOfFile check
2016-10-13 13:37:47 -07:00
Roman Leventov 85ac8eff90 Improve performance of IndexMergerV9 (#3440)
* Improve performance of StringDimensionMergerV9 and StringDimensionMergerLegacy by avoiding primitive int boxing by using IntIterator in IndexedInts instead of Iterator<Integer>; Extract some common logic for V9 and Legacy mergers; Minor improvements to resource handling in StringDimensionMergerV9

* Don't mask index in MergeIntIterator.makeQueueElement()

* DRY conversion RoaringBitmap's IntIterator to fastutil's IntIterator

* Do implement skip(n) in IntIterators extending AbstractIntIterator because original implementation is not reliable

* Use Test(expected=Exception.class) instead of try { } catch (Exception e) { /* ignore */ }
2016-10-13 08:28:46 -07:00
Gian Merlino 40f2fe7893 Bump versions to 0.9.3-SNAPSHOT (#3524) 2016-09-29 13:53:32 -07:00
John Zhang 78b06a7d7e make global http client worker threads configurable (#3514) 2016-09-28 23:18:51 -07:00
Slim 3175e17a3b Cached lookup module. first cut implementing JDBC cache (#2819) 2016-09-16 13:45:54 -07:00
Gian Merlino 76fcbd8fc5 Update Curator, ZK to latest stable versions. (#3461) 2016-09-16 09:16:14 -07:00
Gian Merlino 2613e68477 Update java-util to 0.27.10. (#3337) 2016-08-09 13:37:30 +05:30
Navis Ryu 5b3f0ccb1f Support variance and standard deviation (#2525)
* Support variance and standard deviation

* addressed comments
2016-08-04 17:32:58 -07:00
Keuntae Park 95a58097e2 Hadoop InputRowParser for Orc file (#3019)
* InputRowParser to decode OrcStruct from OrcNewInputFormat

* add unit test for orc hadoop indexing

* update docs and fix test code bug

* doc updated

* resove maven dependency conflict

* remove unused imports

* fix returning array type from Object[] to correct primitive array type

* fix to support getDimension() of MapBasedRow : changing return type of orc list from array to list

* rebase and updated based on comments

* updated based on comments

* on reflecting review comments

* fix bug in typeStringFromParseSpec() and add unit test

* add license header
2016-07-26 09:42:56 -07:00
Nishant 47894c4eff add comment for default hadoop coordinates (#3257)
1) Modify CliHadoopIndexer to share constant from `TaskConfig.DEFAULT_DEFAULT_HADOOP_COORDINATES`
2) add comment to pom.xml as discussed in
https://github.com/druid-io/druid/pull/3044

fix name
2016-07-18 15:23:11 -07:00
Gian Merlino 13d8d96bc6 Update to guice-4.1.0. (#3222) 2016-07-18 08:08:43 -07:00
Charles Allen 3f1681c16c Caffeine cache extension (#3028)
* Initial commit of caffeine cache

* Address code comments

* Move and fixup README.md a bit

* Improve caffeine readme information

* Cleanup caffeine pom

* Address review comments

* Bump caffeine to 2.3.1

* Bump druid version to 0.9.2-SNAPSHOT

* Make test not fail randomly.

See https://github.com/ben-manes/caffeine/pull/93#issuecomment-227617998 for an explanation

* Fix distribution and documentation

* Add caffeine to extensions.md

* Fix links in extensions.md

* Lexicographic
2016-07-06 15:42:54 -07:00
Gian Merlino ebf890fe79 Update master version to 0.9.2-SNAPSHOT. (#3133) 2016-06-13 13:10:38 -07:00
Charles Allen aa2982ee31 Update bytebuffer-collections to 0.2.5 (#3117) 2016-06-13 08:41:20 -07:00
Fangjin Yang 53886a677c include avro in the druid tarball (#3123) 2016-06-13 16:58:21 +05:30
David Lim 6d38dde2f8 exclude slf4j-log4j12 (#3075) 2016-06-03 11:39:23 -07:00
Charles Allen 8024b915e2 [QTL] Implement LookupExtractorFactory of namespaced lookup (#2926)
* support LookupReferencesManager registration of namespaced lookup and eliminate static configurations for lookup from namespecd lookup extensions

- druid-namespace-lookup and druid-kafka-extraction-namespace are modified
- However, druid-namespace-lookup still has configuration about ON/OFF
  HEAP cache manager selection, which is not namespace wide
  configuration but node wide configuration as multiple namespace shares
  the same cache manager

* update KafkaExtractionNamespaceTest to reflect argument signature changes

* Add more synchronization functionality to NamespaceLookupExtractorFactory

* Remove old way of using extraction namespaces

* resolve compile error by supporting LookupIntrospectHandler

* Remove kafka lookups

* Remove unused stuff

* Fix start and stop behavior to be consistent with new javadocs

* Remove unused strings

* Add timeout option

* Address comments on configurations and improve docs

* Add more options and update hash key and replaces

* Move monitoring to the overriding classes

* Add better start/stop logging

* Remove old docs about namespace names

* Fix bad comma

* Add `@JsonIgnore` to lookup factory

* Address code review comments

* Remove ExtractionNamespace from module json registration

* Fix problems with naming and initialization. Add tests

* Optimize imports / reformat

* Fix future not being properly cancelled on failed initial scheduling

* Fix delete returns

* Add more docs about whole introspection

* Add `/version` introspection point for lookups

* Add more tests and address comments

* Add StaticMap extraction namespace for testing. Also add a bunch of tests

* Move cache system property to `druid.lookup.namespace.cache.type`

* Make VERSION lower case

* Change poll period to 0ms  for StaticMap

* Move cache key to bytebuffer

* Change hashCode and equals on static map extraction fn

* Add more comments on StaticMap

* Address comments

* Make scheduleAndWait use a latch

* Sanity renames and fix imports

* Remove extra info in docs

* Fix review comments

* Strengthen failure on start from warn to error

* Address comments

* Rename namespace-lookup to lookups-cached-global

* Fix injective mis-naming
* Also add serde test
2016-05-24 10:56:40 -07:00
Xavier Léauté e79284da59 new interval based cost function (#2972)
* new interval based cost function

Addresses issues with balancing of segments in the existing cost function
- `gapPenalty` led to clusters of segments ~30 days apart
- `recencyPenalty` caused imbalance among recent segments
- size-based cost could be skewed by compression

New cost function is purely based on segment intervals:
- assumes each time-slice of a partition is a constant cost
- cost is additive, i.e. cost(A, B union C) = cost(A, B) + cost(A, C)
- cost decays exponentially based on distance between time-slices

* comments and formatting

* add more comments to explain the calculation
2016-05-17 09:56:00 -07:00
michaelschiff 2203a812bc statsd-emitter (#2410) 2016-04-28 18:41:02 -07:00
Xavier Léauté fc91120b54 Merge pull request #2857 from metamx/upgrade-zk
upgrade zookeeper client dependency to 3.4.8
2016-04-20 10:36:07 +05:30
Xavier Léauté 838768c632 upgrade curator, fixes #2829 (#2849) 2016-04-18 13:17:36 -07:00