Commit Graph

954 Commits

Author SHA1 Message Date
Jerry Chung 0bcfd9354c Fix S3 deep storage push and s3 insert-segment-to-db (#4174)
* Fix S3 deep storage push and s3 insert-segment-to-db

* Less verbose checks in S3DataSegmentFinder
2017-04-14 19:42:10 -07:00
Gian Merlino b2954d5fea Better groupBy error messages and docs around resource limits. (#4162)
* Better groupBy error messages and docs around resource limits.

* Fix BufferGrouper test from datasketches.

* Further clarify.
2017-04-13 10:38:53 -07:00
Roman Leventov 15f3a94474 Copy closer into Druid codebase (fixes #3652) (#4153) 2017-04-10 09:38:45 +09:00
Parag Jain 7e0d4c9555 secure supervisor endpoints (#3985) 2017-04-05 16:42:32 -07:00
Roman Leventov 73d9b31664 GenericIndexed minor bug fixes, optimizations and refactoring (#3951)
* Minor bug fixes in GenericIndexed; Refactor and optimize GenericIndexed; Remove some unnecessary ByteBuffer duplications in some deserialization paths; Add ZeroCopyByteArrayOutputStream

* Fixes

* Move GenericIndexedWriter.writeLongValueToOutputStream() and writeIntValueToOutputStream() to SerializerUtils

* Move constructors

* Add GenericIndexedBenchmark

* Comments

* Typo

* Note in Javadoc that IntermediateLongSupplierSerializer, LongColumnSerializer and LongMetricColumnSerializer are thread-unsafe

* Use primitive collections in IntermediateLongSupplierSerializer instead of BiMap

* Optimize TableLongEncodingWriter

* Add checks to SerializerUtils methods

* Don't restrict byte order in SerializerUtils.writeLongToOutputStream() and writeIntToOutputStream()

* Update GenericIndexedBenchmark

* SerializerUtils.writeIntToOutputStream() and writeLongToOutputStream() separate for big-endian and native-endian

* Add GenericIndexedBenchmark.indexOf()

* More checks in methods in SerializerUtils

* Use helperBuffer.arrayOffset()

* Optimizations in SerializerUtils
2017-03-27 14:17:31 -05:00
Benedict Jin 23f77ebd20 Explain Avro's unnecessary EOFException (#4098) (#4100)
* Explain Avro's unnecessary EOFException (#4098)

* add jira link into log message
2017-03-24 10:45:45 -05:00
Gian Merlino 4b9f975f50 Rename SketchAggregationWithSimpleDataTest. (#4105)
Tests that don't end in "Test" won't get run automatically by Maven.
2017-03-23 14:20:50 -07:00
Akash Dwivedi ff7f90b02d relocate method in BufferAggregator. (#4071)
*  relocate method in BufferAggregator.

* Unused import.

* Detailed javadoc.

* using Int2ObjectMap.

* batch relocate.

* Revert batch relocate.

* Unused import.

* code comments.

* code comment.
2017-03-23 13:07:59 -07:00
Roman Leventov 84fe91ba0b Monomorphic processing of TopN queries with 1 and 2 aggregators (key part of #3798) (#3889)
* Monomorphic processing: add HotLoopCallee, CalledFromHotLoop, RuntimeShapeInspector, SpecializationService. Specialize topN queries with 1 or 2 aggregators. Add Cursor.advanceUninterruptibly() and isDoneOrInterrupted() for exception-free query processing.

* Use Execs.singleThreaded()

* RuntimeShapeInspector to support nullable fields

* Make CalledFromHotLoop annotation Inherited

* Remove unnecessary conversion of array of ColumnSelectorPluses to list and back to array in CardinalityAggregatorFactory

* Close InputStream in SpecializationService

* Formatting

* Test specialized PooledTopNScanners

* Set flags in PooledTopNAlgorithm directly

* Fix tests, dependent on CountAggregatorFactory toString() form

* Fix

* Revert CountAggregatorFactory changes

* Implement inspectRuntimeShape() for LongWrappingDimensionSelector and FloatWrappingDimensionSelector

* Remove duplicate RoaringBitmap dependency in the extendedset pom.xml

* Fix

* Treat ByteBuffers specially in StringRuntimeShape

* Doc fix

* Annotate BufferAggregator.init() with CalledFromHotLoop

* Make triggerSpecializationIterationsThreshold an int

* Remove SpecializationService.PerPrototypeClassState.of()

* Add comments

* Limit the number of specializations that SpecializationService could make

* Add default implementation for BufferAggregator.inspectRuntimeShape(), for compatibility with extensions

* Use more efficient ConcurrentMap's idioms in SpecializationService
2017-03-17 14:44:36 -05:00
Charles Allen 805d85afda Allow compilation as Java8 source and target (#3328)
* Allow compilation as Java8 source and target for everything except API

* Remove conditions in tests which assume that we may run with Java 7

* Update easymock to 3.4

* Make Animal Sniffer check Java 1.8 usage; remove redundant druid-caffeine-cache configuration

* Use try-with-resources in LargeColumnSupportedComplexColumnSerializerTest.testSanity()

* Remove java7 special for druid-api
2017-03-14 22:23:47 -06:00
Gian Merlino 3216134f8c SQL: Make row extractions extensible and add one for lookups. (#3991)
This is a reopening of #3989, since that PR was merged to master prematurely
and accidentally.
2017-03-13 21:56:16 -07:00
Nishant Bangarwa adbe89e7d6 Fix race in KafkaIndexTaskTest (#4031)
task.pause(0) can return early before the task is actually paused.
Exception for failure -
java.lang.AssertionError: expected:<PAUSED> but was:<READING>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:144)
	at io.druid.indexing.kafka.KafkaIndexTaskTest.testRunWithOffsetOutOfRangeExceptionAndPause(KafkaIndexTaskTest.java:1229)

To reproduce, add Thread.sleep(10000) at the beginning of the
KafkaIndexTask.possiblyPause method.
2017-03-09 07:34:46 -08:00
Gian Merlino 4ca5270e88 Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods. (#4004)
* Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods.

Includes two fixes:
- groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults
returns a lazy sequence) and it generates incorrect results.
- Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y".

Also includes doc and test fixes:
- groupBy v1 was no longer being tested by GroupByQueryRunnerTest since #3953, now it
  is once again.
- chunkPeriod documentation was misleading due to its checkered past. Updated it to
  be more accurate.

* Remove unused import.

* Restore buffer size.
2017-03-06 12:27:02 -06:00
Akash Dwivedi bebf9f34c7 HdfsDataSegmentPusher bug fix (#4003)
* Fix for HdfsDataSegmentPusher.

* Add missing loadspec in actual descriptor file. Tests to check actual content of descriptor file.
2017-03-06 00:53:44 -08:00
Gian Merlino df623ebfe3 Fix a couple bugs due to calling Period.getMillis(). (#4006) 2017-03-05 18:44:20 +05:30
Roman Leventov 81a5f9851f TmpFileIOPeons to create files under the merging output directory, instead of java.io.tmpdir (#3990)
* In IndexMerger and IndexMergerV9, create temporary files under the output directory/tmpPeonFiles, instead of java.io.tmpdir

* Use FileUtils.forceMkdir() across the codebase and remove some unused code

* Fix test

* Fix PullDependencies.run()

* Unused import
2017-03-02 14:05:12 -08:00
Gian Merlino e63eefd7ff Revert "SQL: Make row extractions extensible and add one for lookups. (#3989)"
The PR was merged to master accidentally.

This reverts commit 23927a3c96.
2017-03-01 17:06:12 -08:00
Gian Merlino 23927a3c96 SQL: Make row extractions extensible and add one for lookups. (#3989)
* SQL: Make row extractions extensible and add one for lookups.

* Fix QuantileSqlAggregatorTest.
2017-03-01 17:03:43 -08:00
Akash Dwivedi 94da5e80f9 Namespace optimization for hdfs data segments. (#3877)
* NN optimization for hdfs data segments.

* HdfsDataSegmentKiller, HdfsDataSegmentFinder changes to use new storage
format. Docs update.

* Common utility function in DataSegmentPusherUtil.

* new static method `makeSegmentOutputPathUptoVersionForHdfs` in JobHelper

* reuse getHdfsStorageDirUptoVersion in
DataSegmentPusherUtil.getHdfsStorageDir()

* Addressed comments.

* Review comments.

* HdfsDataSegmentKiller requested changes.

* extra newline

* Add maprfs.
2017-03-01 09:51:20 -08:00
Akash Dwivedi 91344cbe57 Enable GenericIndexed V2 for built-in(druid-io managed) complex columns. (#3987)
* Enable GenericIndexed V2 for complex columns.

* SerializerBuilder to use  GenericColumnSerializer.
2017-02-28 22:06:54 -08:00
praveev 5ccfdcc48b Fix testDeadlock timeout delay (#3979)
* No more singleton. Reduce iterations

* Granularities

* Fix the delay in the test

* Add license header

* Remove unused imports

* Lot more unused imports from all the rearranging

* CR feedback

* Move javadoc to constructor
2017-02-28 12:51:41 -06:00
praveev c3bf40108d One granularity (#3850)
* Refactor Segment Granularity

* Beginning of one granularity

* Copy the fix for custom periods in segment-granularity over here.

* Remove the custom serialization for now.

* Compilation cleanup

* Reformat code

* Fixing unit tests

* Unify to use a single iterable

* Backward compatibility for rolling upgrade

* Minor check style. Cosmetic changes.

* Rename length and millis to duration

* CR feedback

* Minor changes.
2017-02-25 01:02:29 -06:00
Gian Merlino f21641f0dc Fix over-optimistic log message. (#3963)
"Wrote task log" could be logged before the output stream is flushed and
closed, which could generate an error and not actually write the log.
2017-02-22 15:02:53 -08:00
Parag Jain edb032b96d add datasource in intermediate segment path (#3961) 2017-02-22 16:31:00 -06:00
Gian Merlino 985203b634 Finalize fields in postaggs (#3957)
* initial commits for finalizeFieldAccess #2433

* fix some bugs to run a query

* change name of method Queries.verifyAggregations to Queries.prepareAggregations

* add Uts

* fix Ut failures

* rebased to master

* address comments and add a Ut for arithmetic post aggregators

* rebased to the master

* address the comment of injection within arithmetic post aggregator

* address comments and introduce decorate() in the PostAggregator interface.

* Address comments. 1. Implements getComparator in FinalizingFieldAccessPostAggregator and add Uts for it 2. Some minor changes like renaming a method name.

* Fix a code style mismatch.

* Rebased to the master
2017-02-21 16:32:14 -08:00
Gian Merlino 16ef513c7d SQL: Add context and contextual functions to planner. (#3919)
* SQL: Add context and contextual functions to planner.

Added support for context parameters specified as JDBC connection properties
or a JSON object for SQL-over-JSON-over-HTTP.

Also added features that depend on context functionality:

- Added CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions.
- Added support for time zones other than UTC via a "timeZone" context.
- Pass down query context to Druid queries too.

Also some bug fixes:

- Fix DATE handling, it was largely done incorrectly before.
- Fix CAST(__time TO DATE) which should do a floor-to-day.
- Fix non-equality comparisons to FLOOR(__time TO X).
- Fix maxQueryCount property.

* Pass down context to nested queries too.
2017-02-15 14:09:14 -08:00
Gian Merlino 78b0d134ae Require Java 8 and include some Java 8 dependencies. (#3914)
* Require Java 8 and include some Java 8 dependencies.

- Upgrade Jetty to 9.3.16.v20170120.
- Upgrade DataSketches to 0.8.4.
- Bundle caffeine-cache by default.
- Still target Java 7 when compiling base Druid classes.

* Update cluster, quickstart docs.

* Remove oraclejdk7 from travis.yml.
2017-02-14 12:51:51 -08:00
Akash Dwivedi 8854ce018e File.deleteOnExit() (#3923)
* Less use of File.deleteOnExit()
 * removed deleteOnExit from most of the tests/benchmarks/iopeon
 * Made IOpeon closable

* Formatting.

* Revert DeterminePartitionsJobTest, remove cleanup method from IOPeon
2017-02-13 15:12:14 -08:00
Parag Jain 1f263fe50b alert when resetting offsets (#3931)
* alert when resetting offsets

* add more data to alerts
2017-02-13 13:49:24 -08:00
michaelschiff c1eee9bbf3 modified "end" column to `end` (#3903)
* modified "end" column to `end`.  "end" is interpreted as a string rather than dereferencing the column value

* SQLMetadataConnector.getQuoteString defines the string that should be used to quote string fields

* positional arguments for String.format

* for Connectors that use " need to include the \ escape as well
2017-02-13 12:36:27 -08:00
Jihoon Son 991e2852da Add PostAggregators to generator cache keys for top-n queries (#3899)
* Add PostAggregators to generator cache keys for top-n queries

* Add tests for strings

* Remove debug comments

* Add type keys and list sizes to cache key

* Make post aggregators used for sort be considered for cache key generation

* Use assertArrayEquals()

* Improve findPostAggregatorsForSort()

* Address comments

* fix test failure

* address comments
2017-02-13 12:23:44 -08:00
Parag Jain 8e31a465ad report hand off count finite appenderator driver (#3925) 2017-02-13 10:41:24 -08:00
Gian Merlino 12317fd001 Bump version to 0.10.0-SNAPSHOT. (#3913) 2017-02-06 17:54:35 -08:00
Parag Jain 1aabb45a09 auto reset option for Kafka Indexing service (#3842)
* auto reset option for Kafka Indexing service in case message at the offset being fetched is not present anymore at kafka brokers

* review comments

* review comments

* reverted last change

* review comments

* review comments

* fix typo
2017-02-02 14:57:45 -06:00
Nishant Bangarwa a457cded28 Druid Extension to enable Authentication using Kerberos. (#3853)
* Add extension for supporting kerberos security

- This PR adds an extension for supporting druid authentication via
Kerberos.
- Working on the docs.

* Add docs

* review comments

* more review comments

* Block all paths by default

* more review comments - use proper Oid

* Allow extensions to override httpclient for integration tests

* Add kerberos lock to prevent multithreaded issues.

* review comment - remove enabled flag and fix router injection

* Add Cookie Handling and more detailed docs

* review comment - rename DruidKerberosConfig -> AuthKerberosConfig

* review comments

* fix travis failure on jdk7
2017-02-02 14:55:21 -06:00
Charles Allen a73f1c9c70 Make s3 work better (#3898) 2017-02-02 10:04:30 -08:00
Jonathan Wei e6b95e80aa Remove deprecated Aggregator/AggregatorFactory methods (#3894) 2017-02-01 14:43:18 -08:00
Gian Merlino ac84a3e011 SQL: Add resolution parameter, fix filtering bug with APPROX_QUANTILE (#3868)
* SQL: Add resolution parameter to quantile agg, rename to APPROX_QUANTILE.

* Fix bug with re-use of filtered approximate histogram aggregators.

Also add APPROX_QUANTILE tests for filtering and running on complex columns.
Includes some slight refactoring to allow tests to make DruidTables that
include complex columns.

* Remove unused import
2017-01-25 18:39:26 -08:00
Parag Jain b3dae0efc3 catch all errors (#3844) 2017-01-24 18:01:30 -07:00
Gian Merlino d51f5e058d SQL: Ditch CalciteConnection layer and add DruidMeta, extension aggregators. (#3852)
* SQL: Ditch CalciteConnection layer and add DruidMeta, extension aggregators.

Switched from CalciteConnection to Planner, bringing benefits:

- CalciteConnection's JDBC interface no longer sits between the SQL server
  (HTTP/Avatica) and Druid's query layer. Instead, the SQL servers can use
  Druid Sequence objects directly, reducing overhead in the query return path.

- Implemented our own Planner-based Avatica Meta, letting us control
  connection timeouts and connection / statement limits. The previous
  CalciteConnection-based implementation didn't have any limits or timeouts.

- The Planner interface lets us override the operator table, opening up
  SQL language extensions. This patch includes two: APPROX_COUNT_DISTINCT
  in core, and a QUANTILE aggregator in the druid-histogram extension.

Also:

- Added INFORMATION_SCHEMA metadata schema.

- Added tests for Unicode literals and escapes.

* Verify statement is actually open before closing it.

* More detailed INFORMATION_SCHEMA docs.
2017-01-19 16:32:20 -08:00
Akash Dwivedi e550d48772 Using fully qualified hdfs path. (#3705)
* Using fully qualified hdfs path.

* Review changes.

* Remove unused imports.

* Variable name change.
2017-01-17 14:40:22 -06:00
Jihoon Son d80bec83cc Enable auto license checking (#3836)
* Enable license checking

* Clean duplicated license headers
2017-01-10 18:13:47 -08:00
Roman Leventov 49d71e9b38 Fix the build after #3697 (#3807) 2016-12-26 17:06:48 -06:00
Roman Leventov 33800122ad Don't return leaked Objects back to StupidPool, because this is dangerous. Reuse Cleaners in StupidPool. Make StupidPools named. Add StupidPool.leakedObjectCount(). Minor fixes (#3631) 2016-12-26 00:35:35 -06:00
Roman Leventov 76cb06a8d8 Lookup cache refactoring (the main part of #3667) (#3697)
* Lookup cache refactoring (the main part of druid-io/druid#3667)

* Use PowerMock's static methods in NamespaceLookupExtractorFactoryTest

* Fix KafkaLookupExtractorFactoryTest

* Use VisibleForTesting annotation instead of Javadoc comment

* Create a NamespaceExtractionCacheManager separately for each test in NamespaceExtractionCacheManagersTest

* Rename CacheScheduler.NoCache.ENTRY_DISPOSED to ENTRY_CLOSED

* Reduce visibility of NamespaceExtractionCacheManager.cacheCount() and monitor() implementations, and don't run NamespaceExtractionCacheManagerExecutorsTest with off-heap cache (it didn't before)

* In NamespaceLookupExtractorFactory, use safer idiom to check if CacheState is NoCache or VersionedCache

* More logging in CacheHandler constructor and close(), VersionedCache.close()

* PR comments addressed

* Make CacheScheduler.EntryImpl AutoCloseable, avoid 'dispose' verb in comments, logging and naming in CacheScheduler in favor of 'close'

* More Javadoc comments to CacheScheduler

* Fix NPE

* Remove logging in OnHeapNamespaceExtractionCacheManager.expungeCollectedCaches()

* Make NamespaceExtractionCacheManagersTest.testRacyCreation() have a load similar to what it was before the refactoring

* Unwrap NamespaceExtractionCacheManager.scheduledExecutorService from unneeded MoreExecutors.listeningDecorator() and specify that this is ScheduledThreadPoolExecutor, which ensures happens-before between periodic runs of the tasks

* More comments on MapDbCacheDisposer.disposed

* Replace concat with Long.toString()

* Comment on why NamespaceExtractionCacheManager.scheduledExecutorService() returns ScheduledThreadPoolExecutor

* Place logging statements in VersionedCache.close() and CacheHandler.close() after actual closing logic, because logging may fail

* Make JDBCExtractionNamespaceCacheFactory and StaticMapExtractionNamespaceCacheFactory to try to close newly created VersionedCache if population has failed, as it is done already in URIExtractionNamespaceCacheFactory

* Don't close the whole CacheScheduler.Entry, if the cache update task failed

* Replace AtomicLong updateCounter and firstRunLatch with Phaser-based UpdateCounter in CacheScheduler.EntryImpl
2016-12-23 18:04:27 -08:00
Himanshu 4ca3b7f1e4 overlord helpers framework and tasklog auto cleanup (#3677)
* overlord helpers framework and tasklog auto cleanup

* review comment changes

* further review comments addressed
2016-12-21 15:18:55 -08:00
Gian Merlino 6440ddcbca Fix #3795 (Java 7 compatibility). (#3796)
* Fix #3795 (Java 7 compatibility).

Also introduce Animal Sniffer checks during build, which would
have caught the original problems.

* Add Animal Sniffer on caffeine-cache for JDK8.
2016-12-21 10:19:13 -08:00
David Lim 0b9dff0bc1 fix worker thread pool exhaustion bug (#3760)
* fix worker thread pool exhaustion bug

* code review changes

* code review changes
2016-12-09 15:23:11 -08:00
David Lim 7f087cdd3b allow Kafka consumer group.id to be overridden by config (#3765) 2016-12-08 15:53:13 -08:00
Charles Allen 27ab23ef44 Don't update segment metadata if archive doesn't move anything (#3476)
* Don't update segment metadata if archive doesn't move anything

* Fix restore task to handle potential null values

* Don't try to update empty metadata

* Address review comments

* Move to druid-io java-util
2016-12-01 07:49:28 -08:00
Parag Jain 7ee6bb7410 option to reset offset automatically in case of OffsetOutOfRangeException (#3678)
* option to reset offset automatically in case of OffsetOutOfRangeException
if the next offset is less than the earliest available offset for that partition

* review comments

* refactoring

* refactor

* review comments
2016-11-21 16:29:46 -06:00
Roman Leventov 7b56cec3b9 Fix resource leaks (#3702) 2016-11-18 21:21:36 +05:30
Gian Merlino 7e80d1045a Exercise v2 engine in the groupBy aggregator and multi-value dimension tests. (#3698)
This also involved some other test changes:

- Added a factory.mergeRunners step to AggregationTestHelper's groupBy chain, since the v2
  engine does merging there.
- Changed test byteBuffer pools from on-heap to off-heap to work around
  https://github.com/DataSketches/sketches-core/pull/116 for datasketches tests.
2016-11-16 20:02:25 -08:00
Gian Merlino bcd20441be Make buildV9Directly the default. (#3688) 2016-11-14 09:29:32 -08:00
Roman Leventov 988d97b09c Unwrap exceptions from RuntimeException in URIExtractionNamespaceCacheFactory.populateCache() (part of #3667) (#3668)
* Unwrap exceptions from RuntimeException in URIExtractionNamespaceCacheFactory.populateCache()

* Fix tests
2016-11-11 17:25:41 -08:00
Himanshu ddc078926b consolidate different theta sketch representations into SketchHolder (#3671) 2016-11-11 10:20:41 -08:00
Himanshu b76b3f8d85 reset-cluster command to clean up druid state stored on metadata and deep storage (#3670) 2016-11-09 11:07:01 -06:00
Nicolas Colomer 37ecffb648 Add support for Confluent Schema Registry in the avro extension (#3529) 2016-11-08 16:10:45 -06:00
Gian Merlino 657e4512d2 Checkstyle checks for AvoidStaticImport, UnusedImports. (#3660)
Excludes tests from AvoidStaticImport, since those are used often there and
I didn't want to make this changeset too large. Production code use was minimal
and I switched those to non-static imports.
2016-11-05 11:34:36 -07:00
Roman Leventov 22b57ddd60 Make ExtractionNamespaceCacheFactory to populate cache directly instead of returning callable (#3651)
* Rename ExtractionNamespaceCacheFactory.getCachePopulator() to populateCache() and make it to populate cache itself instead of returning a Callable which populates cache, because this "callback style" is not actually needed.

ExtractionNamespaceCacheFactory isn't a "factory" so it should be renamed, but renaming right in this commit would tear the git history for files, because ExtractionNamespaceCacheFactory implementations have too many changed lines. Going to rename ExtractionNamespaceCacheFactory to something like "CachePopulator" in one of subsequent PRs.

This commit is a part of a bigger refactoring of the lookup cache subsystem.

* Remove unused line and imports
2016-11-04 13:33:16 -07:00
Gian Merlino 4203580290 URIExtractionNamespace: Treat null values in lookup maps as missing entries. (#3512)
* URIExtractionNamespace: Treat null values in lookup maps as missing entries.

This is useful when many logical lookups are derived from the same base JSON file,
and some lookups' values may be unknown sometimes.

* Add test, logging message, and address other comments.

* Update docs.
2016-11-03 13:53:04 -07:00
Himanshu 2362effd8c use FileSystem.rename(from,to,Rename.NONE) so that tmp dirs from replicating tasks are not moved to the segment directory created by first task (#3650) 2016-11-02 15:58:55 -07:00
Roman Leventov 36a1543222 Lookup cache bug fixes (#3609)
* Return better lastVersion from JDBCExtractionNamespaceCacheFactory's cache populator callable

* Return the lastVersion if URI lookup last modified date is not later than the last cached, from URIExtractionNamespaceCacheFactory's cache populator callable

* Fix a race condition in NamespaceExtractionCacheManager.cancelFuture()

* Don't delete cache from NamespaceExtractionCacheManager if the ExtractionNamespaceCacheFactory returned the same version as the last; Better exception treatment in the scheduled cache updater runnable in NamespaceExtractionCacheManager (in particular, don't consume Errors); throw AssertionError in StaticMapExtractionNamespaceCacheFactory if the lastVersion != null

* In NamespaceExtractionCacheManager, put NamespaceImplData.latestVersion update in the same synchronized() block with swapAndClearCache(id, cacheId); Turn getPostRunnable which returns a callback into a simple updateNamespace() method

* In StaticMapExtractionNamespaceCacheFactory.getCachePopulator(), check the input directly, not inside a callback

* In URIExtractionNamespaceCacheFactory, allow URI last modified time to go backwards

* Better logging in NamespaceExtractionCacheManager

* Add comment on lastVersion nullability in URIExtractionNamespaceCacheFactory
2016-11-02 09:40:19 -07:00
Himanshu eb70a12e43 fix cleanup of tmp dir in HdfsDataSegmentPusher (#3636) 2016-11-01 12:45:38 -05:00
Gian Merlino 89d9c61894 Deprecate Aggregator.getName and AggregatorFactory.getAggregatorStartValue. (#3572) 2016-10-31 15:24:30 -07:00
Himanshu 23a8e22836 fix SketchMergeAggregatorFactory.finalizeResults, comparator and more UTs for timeseries, topN (#3613) 2016-10-28 15:48:33 -07:00
Charles Allen 78159d7ca4 Move off-heap QTL global cache delete lock outside of subclass lock (#3597)
* Move off-heap QTL global cache delete lock outside of subclass lock

* Make `delete` thread safe
2016-10-27 22:23:53 +05:30
David Lim 3c56cbdf82 fix timing issue with KafkaLookupExtractorFactoryTest (#3604) 2016-10-25 07:04:51 -07:00
Akash Dwivedi 4b3bd8bd63 Migrating java-util from Metamarkets. (#3585)
* Migrating java-util from Metamarkets.

* checkstyle and updated license on java-util files.

* Removed unused imports from whole project.

* cherry pick metamx/java-util@826021f.

* Copyright changes on java-util pom, address review comments.
2016-10-21 14:57:07 -07:00
David Lim c2ae734848 KafkaIndexTask: Allow run thread to stop gracefully instead of interrupting (#3534)
* allow run thread to gracefully complete instead of interrupting when stopGracefully() is called

* add comments
2016-10-17 10:52:19 -04:00
Gian Merlino c1d3b8a30c Remove dropwizard-jdbc dependency from lookups-cached-single. (#3573)
Fixes #3548.
2016-10-17 10:37:47 -04:00
Gian Merlino 0ce33bc95f HdfsDataSegmentPusher: Properly include scheme, host in output path if necessary. (#3577)
Fixes #3576.
2016-10-17 10:37:18 -04:00
David Lim 472c409b99 KafkaLookupExtractorFactory: shutdown kafka consumer on close() (#3539)
* shutdown kafka consumer on close

* handle close() race condition
2016-10-15 09:55:51 -07:00
Roman Leventov 5dc95389f7 Add Checkstyle framework (#3551)
* Add Checkstyle framework

* Avoid star import

* Need braces for control flow statements

* Redundant imports

* Add NewLineAtEndOfFile check
2016-10-13 13:37:47 -07:00
jaehong choi 6f21778364 Support finding segments in AWS S3. (#3399)
* support finding segments from AWS S3 storage.

* add more Uts

* address comments and add a document for the feature.

* update docs indentation

* update docs indentation

* address comments.
1. add a Ut for json ser/deser for the config object.
2. more informative error message in a Ut.

* address comments.
1. use @Min to validate the configuration object
2. change updateDescriptor to a string as it does not take an argument otherwise

* fix a Ut failure - delete a Ut for testing default max length.
2016-10-10 17:27:09 -07:00
Parag Jain c255dd8b19 fix datasegment metadata (#3555) 2016-10-07 16:30:33 -05:00
Parag Jain 76a60a007e create parent dir on HDFS if it does not exist (#3547) 2016-10-06 16:14:00 -07:00
Himanshu 1523de08fb SketchAggregatorFactory.combine(..) returns Union object now so that it can be reused across multiple combine(..) calls (#3471) 2016-10-05 08:40:14 -07:00
Parag Jain 592903571a add context to kafka supervisor for the kafka indexing task (#3464) 2016-10-04 20:08:43 -05:00
Parag Jain e419407eba handle supervisor spec metadata failures (#3456)
close kafka consumer in case supervisor start fails
2016-10-04 10:15:28 -07:00
Gian Merlino 40f2fe7893 Bump versions to 0.9.3-SNAPSHOT (#3524) 2016-09-29 13:53:32 -07:00
Parag Jain 15c9918c65 log exceptions while trying to pause task (#3504) 2016-09-23 16:53:23 -07:00
David Lim 9226d4af3c configurable shutdownTimeout for Kafka supervisor (#3497)
* configurable shutdownTimeout

* cr change
2016-09-23 13:26:45 -06:00
David Lim ca9114b41b add supervisor reset API (#3484)
* add supervisor reset API

* CR doc changes and kill running tasks / clear offsets from supervisor
2016-09-22 17:51:06 -07:00
Nishant 6099d20303 [FIX] ReleaseException when the path is being written by multiple tasks (#3494)
* fix ReleaseException when the path is being written by multiple tasks

* Do not throw IOException if another replica wins the race for segment creation

fix if check

* handle logging comments

* fix test
2016-09-22 14:25:41 -05:00
Navis Ryu 74e1243c7e Fix test fail of PollingLookupTest.testApplyAfterDataChange (#3489) 2016-09-22 08:33:59 -07:00
Himanshu 05ea88df5c fix kafka-indexing-service pom to not reference specific version but parent version for druid core dependencies (#3472) 2016-09-20 15:18:21 -07:00
David Lim 96fcca18ea update KafkaSupervisor to make HTTP requests to tasks in parallel where possible (#3452) 2016-09-20 22:51:15 +05:30
Slim 3175e17a3b Cached lookup module. first cut implementing JDBC cache (#2819) 2016-09-16 13:45:54 -07:00
Charles Allen 95e08b38ea [QTL] Reduced Locking Lookups (#3071)
* Lockless lookups

* Fix compile problem

* Make stack trace throw instead

* Remove non-germane change

* * Add better naming to cache keys. Makes logging nicer
* Fix #3459

* Move start/stop lock to non-interruptable for readability purposes
2016-09-16 11:54:23 -07:00
Gleb Smirnov d981a2aa02 Avoid interrupting ZookeeperConsumerConnector.shutdown() #3346 (#3403) 2016-09-14 17:44:27 -07:00
Himanshu a069257d37 avro-extension -- feature to specify multiple avro reader schemas inline (#3368)
* rename SimpleAvroBytesDecoder to InlineSchemaAvroBytesDecoder

* feature to specify multiple schemas inline in avro module
2016-09-13 14:54:31 -07:00
Gian Merlino bcff08826b KafkaIndexTask: Treat null values as unparseable. (#3453) 2016-09-13 10:56:38 -07:00
Slim ba6ddf307e Adding hadoop kerberos authentication. (#3419)
* adding kerberos authentication

* make the 2 functions identical
2016-09-13 10:42:50 -07:00
Jonathan Wei df766b2bbd Add dimension handling interface for ingestion and segment creation (#3217)
* Add dimension handling interface for ingestion and segment creation

* update javadocs for DimensionHandler/DimensionIndexer

* Move IndexIO row validation into DimensionHandler

* Fix null column skipping in mergerV9

* Add deprecation note for 'numeric_dims' filename pattern in IndexIO v8->v9 conversion

* Fix java7 test failure
2016-09-12 12:54:02 -07:00
Alexander Saydakov 1a5042ca26 updated dependency on sketches-core (#3443)
* updated dependency on sketches-core to 0.7.0

* Use sketches-core-0.4.1, which is the latest version still compatible
with JDK7
2016-09-09 16:21:32 -07:00
David Lim 146a17de48 KafkaIndexTask: allow pause to break out of retry loop (#3401) 2016-09-06 22:29:37 -06:00
David Lim 5b1ae21bd1 retry calls to getStartTime (#3429) 2016-09-06 14:02:22 -07:00
Stéphane Derosiaux 48dce88aab Add flag binaryAsString for parquet ingestion (#3381) 2016-08-30 17:30:50 -07:00
David Lim ed924bf214 allow registrants to opt out of announcing themselves when registering as a chat handler (#3360) 2016-08-16 10:51:28 +05:30
Himanshu 70d99fe3c6 Initialize ApproximateHistogram Module in ApproximateHistogramGroupByQueryTest (#3363)
or else the test fails if run independently.
2016-08-15 10:19:33 -07:00
Himanshu 46da682231 avro-extensions -- feature to specify avro reader schema inline in the task json for all events (#3249) 2016-08-10 10:49:26 -07:00
Jonathan Wei 890e3bdd3f More informative query unit test names (#3342) 2016-08-09 22:24:48 -07:00
Jonathan Wei decefb7477 Add time interval dim filter and retention analysis example (#3315)
* Add time interval dim filter and retention analysis example

* Use closed-open matching for intervals, update cache key generation

* Fix time filtering tests for interval boundary change
2016-08-05 07:25:04 -07:00
Navis Ryu 5b3f0ccb1f Support variance and standard deviation (#2525)
* Support variance and standard deviation

* addressed comments
2016-08-04 17:32:58 -07:00
Gleb Smirnov 33dbe0800c Makes kafka lookup extraction factory's replace() behavior consistent with other lookup extraction factories (#3326) 2016-08-04 10:24:19 -07:00
Gian Merlino 8030f1cb67 Be more respectful of maxRowsInMemory. (#3284)
- Appenderator: Respect maxRowsInMemory across all sinks.
- KafkaIndexTask: Respect maxRowsInMemory across all partitions.
2016-07-26 15:02:35 -06:00
Charles Allen 3f1681c16c Caffeine cache extension (#3028)
* Initial commit of caffeine cache

* Address code comments

* Move and fixup README.md a bit

* Improve caffeine readme information

* Cleanup caffeine pom

* Address review comments

* Bump caffeine to 2.3.1

* Bump druid version to 0.9.2-SNAPSHOT

* Make test not fail randomly.

See https://github.com/ben-manes/caffeine/pull/93#issuecomment-227617998 for an explanation

* Fix distribution and documentation

* Add caffeine to extensions.md

* Fix links in extensions.md

* Lexicographic
2016-07-06 15:42:54 -07:00
Charles Allen bfa5c05aaa Make global lookup cache introspector class public (#3199)
* Make global lookup cache introspector class public
* Fixes #3187

* Make KafkaLookupExtractorIntrospectionHandler a public static class
2016-07-01 15:50:57 -07:00
Xavier Léauté 485e381387 remove datasource from hadoop output path (#3196)
fixes #2083, follow-up to #1702
2016-06-29 08:53:45 -07:00
David Lim 1d40df4bb7 fix kafka consumer concurrent access during shutdown (#3193) 2016-06-28 13:23:17 -07:00
Hyukjin Kwon 45f553fc28 Replace the deprecated usage of NoneShardSpec (#3166) 2016-06-25 10:27:25 -07:00
Gian Merlino 4cc39b2ee7 Alternative groupBy strategy. (#2998)
This patch introduces a GroupByStrategy concept and two strategies: "v1"
is the current groupBy strategy and "v2" is a new one. It also introduces
a merge buffers concept in DruidProcessingModule, to try to better
manage memory used for merging.

Both of these are described in more detail in #2987.

There are two goals of this patch:

1. Make it possible for historical/realtime nodes to return larger groupBy
   result sets, faster, with better memory management.
2. Make it possible for brokers to merge streams when there are no order-by
   columns, avoiding materialization.

This patch does not do anything to help with memory management on the broker
when there are order-by columns or when there are nested queries. That could
potentially be done in a future patch.
2016-06-24 18:06:09 -07:00
du00cs ebd654228b fix: avro types exception in sketch (#3167) 2016-06-22 15:54:20 -05:00
Charles Allen 674f94083e Add more logging around failed S3DataSegmentMover DeleteExceptions (#3104)
* Add more logging around failed S3DataSegmentMover DeleteExceptions

* Fix test NPE
2016-06-16 14:58:33 -07:00
Charles Allen f7fa1d8c62 [QTL] Allow S3 version finder to search entire s3 object key (#3139)
* Allow S3 version finder to search entire s3 object key
* Previously was only able to search the immediate "directory"

* Update method javadoc

* Expand docs a bit better
2016-06-13 21:02:28 -07:00
Gian Merlino ebf890fe79 Update master version to 0.9.2-SNAPSHOT. (#3133) 2016-06-13 13:10:38 -07:00
David Lim 4faa298977 update kafka client for kafka indexing service to 0.9.0.1 (#3109) 2016-06-08 06:51:03 -07:00
Charles Allen 8cac710546 Async lookups-cached-global by default (#3074)
* Async lookups-cached-global by default
* Also better lookup docs

* Fix test timeouts

* Fix timing of deserialized test

* Fix problem with 0 wait failing immediately
2016-06-03 15:58:10 -05:00
David Lim a2290a8f05 support seamless config changes (#3051) 2016-06-03 13:50:19 -07:00
Charles Allen 447033985e Make S3DataSegmentMover not bother checking for items if they are the same (#3032)
* Make S3DataSegmentMover not bother checking for items if they are the same
2016-06-02 17:27:21 +01:00
David Lim f6c39cc844 Kafka task minimum message time (#3035)
* add KafkaIndexTask support for minimumMessageTime

* add Kafka supervisor support for lateMessageRejectionPeriod
2016-05-31 11:37:00 -07:00
David Lim 3ef24c03b3 Validate X-Druid-Task-Id header in request/response and support retrying on outdated TaskLocation information, add KafkaIndexTaskClient unit tests (#3006)
* validate X-Druid-Task-Id header in request and add header to response

* modify KafkaIndexTaskClient to take a TaskLocationProvider as the TaskLocation may not remain constant
2016-05-25 22:05:18 -07:00
Charles Allen 8024b915e2 [QTL] Implement LookupExtractorFactory of namespaced lookup (#2926)
* support LookupReferencesManager registration of namespaced lookup and eliminate static configurations for lookup from namespaced lookup extensions

- druid-namespace-lookup and druid-kafka-extraction-namespace are modified
- However, druid-namespace-lookup still has configuration about ON/OFF
  HEAP cache manager selection, which is not a namespace-wide
  configuration but a node-wide configuration, as multiple namespaces share
  the same cache manager

* update KafkaExtractionNamespaceTest to reflect argument signature changes

* Add more synchronization functionality to NamespaceLookupExtractorFactory

* Remove old way of using extraction namespaces

* resolve compile error by supporting LookupIntrospectHandler

* Remove kafka lookups

* Remove unused stuff

* Fix start and stop behavior to be consistent with new javadocs

* Remove unused strings

* Add timeout option

* Address comments on configurations and improve docs

* Add more options and update hash key and replaces

* Move monitoring to the overriding classes

* Add better start/stop logging

* Remove old docs about namespace names

* Fix bad comma

* Add `@JsonIgnore` to lookup factory

* Address code review comments

* Remove ExtractionNamespace from module json registration

* Fix problems with naming and initialization. Add tests

* Optimize imports / reformat

* Fix future not being properly cancelled on failed initial scheduling

* Fix delete returns

* Add more docs about whole introspection

* Add `/version` introspection point for lookups

* Add more tests and address comments

* Add StaticMap extraction namespace for testing. Also add a bunch of tests

* Move cache system property to `druid.lookup.namespace.cache.type`

* Make VERSION lower case

* Change poll period to 0ms  for StaticMap

* Move cache key to bytebuffer

* Change hashCode and equals on static map extraction fn

* Add more comments on StaticMap

* Address comments

* Make scheduleAndWait use a latch

* Sanity renames and fix imports

* Remove extra info in docs

* Fix review comments

* Strengthen failure on start from warn to error

* Address comments

* Rename namespace-lookup to lookups-cached-global

* Fix injective mis-naming
* Also add serde test
2016-05-24 10:56:40 -07:00
Charles Allen 15ccf451f9 Move QueryGranularity static fields to QueryGranularities (#2980)
* Move QueryGranularity static fields to QueryGranularityUtil
* Fixes #2979

* Add test showing #2979

* change name to QueryGranularities
2016-05-17 16:23:48 -07:00
Himanshu d3e9c47a5f use correct ObjectMapper in Index[IO/Merger] in AggregationTestHelper and minor fix in theta sketch SketchMergeAggregatorFactory.getMergingFactory(..) (#2943) 2016-05-13 10:06:31 +05:30
Slim 45b2e65d75 [QTL] adding listDelimiter to lookup parser spec (#2941)
* adding listDelimiter to lookup parser spec

* cleaning code
2016-05-10 15:41:16 +05:30
Charles Allen 90b0b0a4ad Make URIExtraction not require FileSystem impls for URIs it understands (#2929)
* Make URIExtraction not require FileSystem impls for URIs it understands
* Fixes #2928

* Preserve URI information

* Simplify case for exact matching

* Move unused variable
2016-05-08 23:23:53 +05:30
David Lim b489f63698 Supervisor for KafkaIndexTask (#2656)
* supervisor for kafka indexing tasks

* cr changes
2016-05-04 23:13:13 -07:00
Charles Allen 2a769a9fb7 Make S3DataSegmentPuller do GET requests less often (#2900)
* Make S3DataSegmentPuller do GET requests less often
* Fixes #2894

* Run intellij formatting on S3Utils

* Remove forced stream fetching on getVersion

* Remove unneeded finalize

* Allow initial object fetching to fail and be retried
2016-05-04 16:21:35 -07:00
Gian Merlino f8ddfb9a4b Split SegmentInsertAction and SegmentTransactionalInsertAction for backwards compat. (#2922)
Fixes #2912.
2016-05-04 13:54:34 -07:00
Charles Allen 6b957aa072 [QTL] Make URI Extraction Namespace take more sane arguments (#2738)
* Make URI Extraction Namespace take more sane arguments
* Fixes https://github.com/druid-io/druid/issues/2669

* Update docs

* Rename error message

* Undo overzealous deletion of docs

* Explain caching mechanism a bit more in docs
2016-05-02 12:54:34 -07:00
Charles Allen 54b717bdc3 [QTL] Move kafka-extraction-namespace to the Lookup framework. (#2800)
* Move kafka-extraction-namespace to the Lookup framework.

* Address comments

* Fix missing kafka introspection

* Fix tests to be less racy

* Make testing a bit more lenient

* Make tests even more forgiving

* Add comments to kafka lookup cache method

* Move startStopLock to just use started

* Make start() and stop() idempotent

* Forgot to update test after last change, test now accounts for idempotency

* Add extra idempotency on stop check

* Add more descriptive docs of behavior
2016-05-02 09:45:13 -07:00
Gian Merlino 67b47c982f Datasketches: Remove isInputThetaSketch from cache key. (#2899) 2016-04-28 18:14:52 -07:00
Gian Merlino 16080dc54f Adjust colliding aggregator cache IDs. (#2891)
- Renumbered ApproximateHistogramAggregatorFactory from 8 to 12,
  8 was taken by CardinalityAggregatorFactory
- Renumbered ApproximateHistogramFoldingAggregatorFactory from 9 to 13,
  9 was taken by FilteredAggregatorFactory
2016-04-28 10:11:33 -07:00
Gian Merlino 909abd17f3 Sketch cache key should include size, isInputThetaSketch. (#2893) 2016-04-28 10:10:46 -07:00
David Lim 7641f2628f add control and status endpoints to KafkaIndexTask (#2730) 2016-04-21 15:34:59 -07:00
Xavier Léauté 5938d9085b Stream segments from database (#2859)
* Avoids fetching all segment records into heap by JDBC driver
* Set connection to read-only to help database optimize queries
* Update JDBC drivers (MySQL has fixes for streaming results)
2016-04-21 05:40:07 +08:00
Gian Merlino 08c784fbf6 KafkaIndexTask: Use a separate sequence per Kafka partition in order to make (#2844)
segment creation deterministic.

This means that each segment will contain data from just one Kafka
partition. So, users will probably not want to have a super high number
of Kafka partitions...

Fixes #2703.
2016-04-18 22:29:52 -07:00
Xavier Léauté 0f8a037bcd support PostgreSQL >= 9.5 upsert capability 2016-04-01 16:53:27 -07:00
Gian Merlino 977e867ad8 Downgrade geoip2, exclude com.google.http-client.
Reverts "Update com.maxmind.geoip2 to 2.6.0" and exclude the google http client
from com.maxmind.geoip2. This should satisfy the original need from #2646 (wanting
to run Druid along with an upgraded com.google.http-client) while preventing
Jackson conflicts pointed out in #2717.

Fixes #2717.

This reverts commit 21b7572533.
2016-03-25 14:43:22 -07:00
Himanshu f26e73d133 Merge pull request #2720 from gianm/druid-api
Move druid-api into the druid repo.
2016-03-24 15:51:10 -05:00
Gian Merlino 7e7a886f65 Move druid-api into the druid repo.
This is from druid-api-0.3.17, as of commit 51884f1d05d5512cacaf62cedfbb28c6ab2535cf
in the druid-api repo.
2016-03-24 11:04:34 -07:00
Himanshu Gupta 4aead38130 fix SketchEstimate post aggregator's getComparator() and test changes to verify same 2016-03-24 10:11:06 -05:00
jon-wei a59c9ee1b1 Support use of DimensionSchema class in DimensionsSpec 2016-03-21 13:12:04 -07:00
Gian Merlino 738dcd8cd9 Update version to 0.9.1-SNAPSHOT.
Fixes #2462
2016-03-17 10:34:20 -07:00
Slim cf342d8d3c Merge pull request #2517 from b-slim/adding_lookup_snapshot_utility
[QTL][Lookup] lookup module with the snapshot utility
2016-03-17 11:39:47 -05:00
Slim Bouguerra 0c86b29ef0 lookup module with the snapshot utility 2016-03-17 09:20:41 -05:00
Charles Allen 02805a74a1 Merge pull request #2648 from chtefi/master
Ignore case when testing for table existence
2016-03-14 13:57:53 -07:00
Stéphane Derosiaux 416cb03687 Ignore case when testing for table existence 2016-03-13 11:17:30 +01:00
Gian Merlino f22fb2c2cf KafkaIndexTask.
Reads a specific offset range from specific partitions, and can use dataSource metadata
transactions to guarantee exactly-once ingestion.

Each task has a finite lifecycle, so it is expected that some process will be supervising
existing tasks and creating new ones when needed.
2016-03-10 18:41:43 -08:00
Gian Merlino 187569e702 DataSource metadata.
Geared towards supporting transactional inserts of new segments. This involves an
interface "DataSourceMetadata" that allows combining of partially specified metadata
(useful for partitioned ingestion).

DataSource metadata is stored in a new "dataSource" table.
2016-03-10 17:41:50 -08:00
Nishant ba1185963b Fix a bunch of dependencies
* Eliminate exclusion groups from pull-deps
* Only consider dependency nodes in pull-deps if they are not in the following scopes
	* provided
	* test
	* system
* Fix a bunch of `<scope>provided</scope>` missing tags
* Better exclusions for a couple of problematic libs
2016-03-10 10:18:08 -08:00
fjy e3e932a4d4 refactor extensions into core and contrib 2016-03-08 17:12:09 -08:00