druid

Commit Graph

Author	SHA1	Message	Date
Jerry Chung	0bcfd9354c	Fix S3 deep storage push and s3 insert-segment-to-db (#4174 ) * Fix S3 deep storage push and s3 insert-segment-to-db * Less verbose checks in S3DataSegmentFinder	2017-04-14 19:42:10 -07:00
Gian Merlino	b2954d5fea	Better groupBy error messages and docs around resource limits. (#4162 ) * Better groupBy error messages and docs around resource limits. * Fix BufferGrouper test from datasketches. * Further clarify.	2017-04-13 10:38:53 -07:00
Roman Leventov	15f3a94474	Copy closer into Druid codebase (fixes #3652 ) (#4153 )	2017-04-10 09:38:45 +09:00
Parag Jain	7e0d4c9555	secure supervisor endpoints (#3985 )	2017-04-05 16:42:32 -07:00
Roman Leventov	73d9b31664	GenericIndexed minor bug fixes, optimizations and refactoring (#3951 ) * Minor bug fixes in GenericIndexed; Refactor and optimize GenericIndexed; Remove some unnecessary ByteBuffer duplications in some deserialization paths; Add ZeroCopyByteArrayOutputStream * Fixes * Move GenericIndexedWriter.writeLongValueToOutputStream() and writeIntValueToOutputStream() to SerializerUtils * Move constructors * Add GenericIndexedBenchmark * Comments * Typo * Note in Javadoc that IntermediateLongSupplierSerializer, LongColumnSerializer and LongMetricColumnSerializer are thread-unsafe * Use primitive collections in IntermediateLongSupplierSerializer instead of BiMap * Optimize TableLongEncodingWriter * Add checks to SerializerUtils methods * Don't restrict byte order in SerializerUtils.writeLongToOutputStream() and writeIntToOutputStream() * Update GenericIndexedBenchmark * SerializerUtils.writeIntToOutputStream() and writeLongToOutputStream() separate for big-endian and native-endian * Add GenericIndexedBenchmark.indexOf() * More checks in methods in SerializerUtils * Use helperBuffer.arrayOffset() * Optimizations in SerializerUtils	2017-03-27 14:17:31 -05:00
Benedict Jin	23f77ebd20	Explain Avro's unnecessary EOFException (#4098 ) (#4100 ) * Explain Avro's unnecessary EOFException (#4098) * add jira link into log message	2017-03-24 10:45:45 -05:00
Gian Merlino	4b9f975f50	Rename SketchAggregationWithSimpleDataTest. (#4105 ) Tests that don't end in "Test" won't get run automatically by Maven.	2017-03-23 14:20:50 -07:00
Akash Dwivedi	ff7f90b02d	relocate method in BufferAggregator. (#4071 ) * relocate method in BufferAggregator. * Unused import. * Detailed javadoc. * using Int2ObjectMap. * batch relocate. * Revert batch relocate. * Unused import. * code comments. * code comment.	2017-03-23 13:07:59 -07:00
Roman Leventov	84fe91ba0b	Monomorphic processing of TopN queries with 1 and 2 aggregators (key part of #3798 ) (#3889 ) * Monomorphic processing: add HotLoopCallee, CalledFromHotLoop, RuntimeShapeInspector, SpecializationService. Specialize topN queries with 1 or 2 aggregators. Add Cursor.advanceUninterruptibly() and isDoneOrInterrupted() for exception-free query processing. * Use Execs.singleThreaded() * RuntimeShapeInspector to support nullable fields * Make CalledFromHotLoop annotation Inherited * Remove unnecessary conversion of array of ColumnSelectorPluses to list and back to array in CardinalityAggregatorFactory * Close InputStream in SpecializationService * Formatting * Test specialized PooledTopNScanners * Set flags in PooledTopNAlgorithm directly * Fix tests, dependent on CountAggragatorFactory toString() form * Fix * Revert CountAggregatorFactory changes * Implement inspectRuntimeShape() for LongWrappingDimensionSelector and FloatWrappingDimensionSelector * Remove duplicate RoaringBitmap dependency in the extendedset pom.xml * Fix * Treat ByteBuffers specially in StringRuntimeShape * Doc fix * Annotate BufferAggregator.init() with CalledFromHotLoop * Make triggerSpecializationIterationsThreshold an int * Remove SpecializationService.PerPrototypeClassState.of() * Add comments * Limit the amount of specializations that SpecializationService could make * Add default implementation for BufferAggregator.inspectRuntimeShape(), for compatibility with extensions * Use more efficient ConcurrentMap's idioms in SpecializationService	2017-03-17 14:44:36 -05:00
Charles Allen	805d85afda	Allow compilation as Java8 source and target (#3328 ) * Allow compilation as Java8 source and target for everything except API * Remove conditions in tests which assume that we may run with Java 7 * Update easymock to 3.4 * Make Animal Sniffer to check Java 1.8 usage; remove redundant druid-caffeine-cache configuration * Use try-with-resources in LargeColumnSupportedComplexColumnSerializerTest.testSanity() * Remove java7 special for druid-api	2017-03-14 22:23:47 -06:00
Gian Merlino	3216134f8c	SQL: Make row extractions extensible and add one for lookups. (#3991 ) This is a reopening of #3989, since that PR was merged to master prematurely and accidentally.	2017-03-13 21:56:16 -07:00
Nishant Bangarwa	adbe89e7d6	Fix race in KafkaIndexTaskTest (#4031 ) task.pause(0) can return early before the task is actually paused. Exception for failure - java.lang.AssertionError: expected:<PAUSED> but was:<READING> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at io.druid.indexing.kafka.KafkaIndexTaskTest.testRunWithOffsetOutOfRangeEx ceptionAndPause(KafkaIndexTaskTest.java:1229) To reproduce add Thread.sleep(10000) in beginning of KafkaIndexTask.possiblypause method.	2017-03-09 07:34:46 -08:00
Gian Merlino	4ca5270e88	Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods. (#4004 ) * Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods. Includes two fixes: - groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults returns a lazy sequence) and it generates incorrect results. - Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y". Also includes doc and test fixes: - groupBy v1 was no longer being tested by GroupByQueryRunnerTest since #3953, now it is once again. - chunkPeriod documentation was misleading due to its checkered past. Updated it to be more accurate. * Remove unused import. * Restore buffer size.	2017-03-06 12:27:02 -06:00
Akash Dwivedi	bebf9f34c7	HdfsDataSegmentPusher bug fix (#4003 ) * Fix for HdfsDataSegmentPusher. * Add missing loadspec in actual descriptor file. Tests to check actual content of descriptor file.	2017-03-06 00:53:44 -08:00
Gian Merlino	df623ebfe3	Fix a couple bugs due to calling Period.getMillis(). (#4006 )	2017-03-05 18:44:20 +05:30
Roman Leventov	81a5f9851f	TmpFileIOPeons to create files under the merging output directory, instead of java.io.tmpdir (#3990 ) * In IndexMerger and IndexMergerV9, create temporary files under the output directory/tmpPeonFiles, instead of java.io.tmpdir * Use FileUtils.forceMkdir() across the codebase and remove some unused code * Fix test * Fix PullDependencies.run() * Unused import	2017-03-02 14:05:12 -08:00
Gian Merlino	e63eefd7ff	Revert "SQL: Make row extractions extensible and add one for lookups. (#3989 )" The PR was merged to master accidentally. This reverts commit `23927a3c96`.	2017-03-01 17:06:12 -08:00
Gian Merlino	23927a3c96	SQL: Make row extractions extensible and add one for lookups. (#3989 ) * SQL: Make row extractions extensible and add one for lookups. * Fix QuantileSqlAggregatorTest.	2017-03-01 17:03:43 -08:00
Akash Dwivedi	94da5e80f9	Namespace optimization for hdfs data segments. (#3877 ) * NN optimization for hdfs data segments. * HdfsDataSegmentKiller, HdfsDataSegment finder changes to use new storage format.Docs update. * Common utility function in DataSegmentPusherUtil. * new static method `makeSegmentOutputPathUptoVersionForHdfs` in JobHelper * reuse getHdfsStorageDirUptoVersion in DataSegmentPusherUtil.getHdfsStorageDir() * Addressed comments. * Review comments. * HdfsDataSegmentKiller requested changes. * extra newline * Add maprfs.	2017-03-01 09:51:20 -08:00
Akash Dwivedi	91344cbe57	Enable GenericIndexed V2 for built-in(druid-io managed) complex columns. (#3987 ) * Enable GenericIndexed V2 for complex columns. * SerializerBuilder to use GenericColumnSerializer.	2017-02-28 22:06:54 -08:00
praveev	5ccfdcc48b	Fix testDeadlock timeout delay (#3979 ) * No more singleton. Reduce iterations * Granularities * Fix the delay in the test * Add license header * Remove unused imports * Lot more unused imports from all the rearranging * CR feedback * Move javadoc to constructor	2017-02-28 12:51:41 -06:00
praveev	c3bf40108d	One granularity (#3850 ) * Refactor Segment Granularity * Beginning of one granularity * Copy the fix for custom periods in segment-grunalrity over here. * Remove the custom serialization for now. * Compilation cleanup * Reformat code * Fixing unit tests * Unify to use a single iterable * Backward compatibility for rolling upgrade * Minor check style. Cosmetic changes. * Rename length and millis to duration * CR feedback * Minor changes.	2017-02-25 01:02:29 -06:00
Gian Merlino	f21641f0dc	Fix over-optimistic log message. (#3963 ) "Wrote task log" could be logged before the output stream is flushed and closed, which could generate an error and not actually write the log.	2017-02-22 15:02:53 -08:00
Parag Jain	edb032b96d	add datasource in intermediate segment path (#3961 )	2017-02-22 16:31:00 -06:00
Gian Merlino	985203b634	Finalize fields in postaggs (#3957 ) * initial commits for finalizeFieldAccess #2433 * fix some bugs to run a query * change name of method Queries.verifyAggregations to Queries.prepareAggregations * add Uts * fix Ut failures * rebased to master * address comments and add a Ut for arithmetic post aggregators * rebased to the master * address the comment of injection within arithmetic post aggregator * address comments and introduce decorate() in the PostAggregator interface. * Address comments. 1. Implements getComparator in FinalizingFieldAccessPostAggregator and add Uts for it 2. Some minor changes like renaming a method name. * Fix a code style mismatch. * Rebased to the master	2017-02-21 16:32:14 -08:00
Gian Merlino	16ef513c7d	SQL: Add context and contextual functions to planner. (#3919 ) * SQL: Add context and contextual functions to planner. Added support for context parameters specified as JDBC connection properties or a JSON object for SQL-over-JSON-over-HTTP. Also added features that depend on context functionality: - Added CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions. - Added support for time zones other than UTC via a "timeZone" context. - Pass down query context to Druid queries too. Also some bug fixes: - Fix DATE handling, it was largely done incorrectly before. - Fix CAST(__time TO DATE) which should do a floor-to-day. - Fix non-equality comparisons to FLOOR(__time TO X). - Fix maxQueryCount property. * Pass down context to nested queries too.	2017-02-15 14:09:14 -08:00
Gian Merlino	78b0d134ae	Require Java 8 and include some Java 8 dependencies. (#3914 ) * Require Java 8 and include some Java 8 dependencies. - Upgrade Jetty to 9.3.16.v20170120. - Upgrade DataSketches to 0.8.4. - Bundle caffeine-cache by default. - Still target Java 7 when compiling base Druid classes. * Update cluster, quickstart docs. * Remove oraclejdk7 from travis.yml.	2017-02-14 12:51:51 -08:00
Akash Dwivedi	8854ce018e	File.deleteOnExit() (#3923 ) * Less use of File.deleteOnExit() * removed deleteOnExit from most of the tests/benchmarks/iopeon * Made IOpeon closable * Formatting. * Revert DeterminePartitionsJobTest, remove cleanup method from IOPeon	2017-02-13 15:12:14 -08:00
Parag Jain	1f263fe50b	alert when resetting offsets (#3931 ) * alert when resetting offsets * add more data to alerts	2017-02-13 13:49:24 -08:00
michaelschiff	c1eee9bbf3	modified "end" column to `end` (#3903 ) * modified "end" column to `end`. "end" is interpretted as a string rather than dereferencing the column value * SQLMetadataConnector.getQuoteString defines the string that should be used to quote string fields * positional arguments for String.format * for Connectors that use " need to include the \ escape as well	2017-02-13 12:36:27 -08:00
Jihoon Son	991e2852da	Add PostAggregators to generator cache keys for top-n queries (#3899 ) * Add PostAggregators to generator cache keys for top-n queries * Add tests for strings * Remove debug comments * Add type keys and list sizes to cache key * Make post aggregators used for sort are considered for cache key generation * Use assertArrayEquals() * Improve findPostAggregatorsForSort() * Address comments * fix test failure * address comments	2017-02-13 12:23:44 -08:00
Parag Jain	8e31a465ad	report hand off count finite appenderator driver (#3925 )	2017-02-13 10:41:24 -08:00
Gian Merlino	12317fd001	Bump version to 0.10.0-SNAPSHOT. (#3913 )	2017-02-06 17:54:35 -08:00
Parag Jain	1aabb45a09	auto reset option for Kafka Indexing service (#3842 ) * auto reset option for Kafka Indexing service in case message at the offset being fetched is not present anymore at kafka brokers * review comments * review comments * reverted last change * review comments * review comments * fix typo	2017-02-02 14:57:45 -06:00
Nishant Bangarwa	a457cded28	Druid Extension to enable Authentication using Kerberos. (#3853 ) * Add extension for supporting kerberos security - This PR adds an extension for supporting druid authentication via Kerberos. - Working on the docs. * Add docs * review comments * more review comments * Block all paths by default * more review comments - use proper Oid * Allow extensions to override httpclient for integration tests * Add kerberos lock to prevent multithreaded issues. * review comment - remove enabled flag and fix router injection * Add Cookie Handling and more detailed docs * review comment - rename DruidKerberosConfig -> AuthKerberosConfig * review comments * fix travis failure on jdk7	2017-02-02 14:55:21 -06:00
Charles Allen	a73f1c9c70	Make s3 work better (#3898 )	2017-02-02 10:04:30 -08:00
Jonathan Wei	e6b95e80aa	Remove deprecated Aggregator/AggregatorFactory methods (#3894 )	2017-02-01 14:43:18 -08:00
Gian Merlino	ac84a3e011	SQL: Add resolution parameter, fix filtering bug with APPROX_QUANTILE (#3868 ) * SQL: Add resolution parameter to quantile agg, rename to APPROX_QUANTILE. * Fix bug with re-use of filtered approximate histogram aggregators. Also add APPROX_QUANTILE tests for filtering and running on complex columns. Includes some slight refactoring to allow tests to make DruidTables that include complex columns. * Remove unused import	2017-01-25 18:39:26 -08:00
Parag Jain	b3dae0efc3	catch all errors (#3844 )	2017-01-24 18:01:30 -07:00
Gian Merlino	d51f5e058d	SQL: Ditch CalciteConnection layer and add DruidMeta, extension aggregators. (#3852 ) * SQL: Ditch CalciteConnection layer and add DruidMeta, extension aggregators. Switched from CalciteConnection to Planner, bringing benefits: - CalciteConnection's JDBC interface no longer sits between the SQL server (HTTP/Avatica) and Druid's query layer. Instead, the SQL servers can use Druid Sequence objects directly, reducing overhead in the query return path. - Implemented our own Planner-based Avatica Meta, letting us control connection timeouts and connection / statement limits. The previous CalciteConnection-based implementation didn't have any limits or timeouts. - The Planner interface lets us override the operator table, opening up SQL language extensions. This patch includes two: APPROX_COUNT_DISTINCT in core, and a QUANTILE aggregator in the druid-histogram extension. Also: - Added INFORMATION_SCHEMA metadata schema. - Added tests for Unicode literals and escapes. * Verify statement is actually open before closing it. * More detailed INFORMATION_SCHEMA docs.	2017-01-19 16:32:20 -08:00
Akash Dwivedi	e550d48772	Using fully qualified hdfs path. (#3705 ) * Using fully qualified hdfs path. * Review changes. * Remove unused imports. * Variable name change.	2017-01-17 14:40:22 -06:00
Jihoon Son	d80bec83cc	Enable auto license checking (#3836 ) * Enable license checking * Clean duplicated license headers	2017-01-10 18:13:47 -08:00
Roman Leventov	49d71e9b38	Fix the build after #3697 (#3807 )	2016-12-26 17:06:48 -06:00
Roman Leventov	33800122ad	Don't return leaked Objects back to StupidPool, because this is dangerous. Reuse Cleaners in StupidPool. Make StupidPools named. Add StupidPool.leakedObjectCount(). Minor fixes (#3631 )	2016-12-26 00:35:35 -06:00
Roman Leventov	76cb06a8d8	Lookup cache refactoring (the main part of #3667 ) (#3697 ) * Lookup cache refactoring (the main part of druid-io/druid#3667) * Use PowerMock's static methods in NamespaceLookupExtractorFactoryTest * Fix KafkaLookupExtractorFactoryTest * Use VisibleForTesting annotation instead of Javadoc comment * Create a NamespaceExtractionCacheManager separately for each test in NamespaceExtractionCacheManagersTest * Rename CacheScheduler.NoCache.ENTRY_DISPOSED to ENTRY_CLOSED * Reduce visibility of NamespaceExtractionCacheManager.cacheCount() and monitor() implementations, and don't run NamespaceExtractionCacheManagerExecutorsTest with off-heap cache (it didn't before) * In NamespaceLookupExtractorFactory, use safer idiom to check if CacheState is NoCache or VersionedCache * More logging in CacheHandler constructor and close(), VersionedCache.close() * PR comments addressed * Make CacheScheduler.EntryImpl AutoCloseable, avoid 'dispose' verb in comments, logging and naming in CacheScheduler in favor of 'close' * More Javadoc comments to CacheScheduler * Fix NPE * Remove logging in OnHeapNamespaceExtractionCacheManager.expungeCollectedCaches() * Make NamespaceExtractionCacheManagersTest.testRacyCreation() to have similar load to what it be before the refactoring * Unwrap NamespaceExtractionCacheManager.scheduledExecutorService from unneeded MoreExecutors.listeningDecorator() and specify that this is ScheduledThreadPoolExecutor, which ensures happens-before between periodic runs of the tasks * More comments on MapDbCacheDisposer.disposed * Replace concat with Long.toString() * Comment on why NamespaceExtractionCacheManager.scheduledExecutorService() returns ScheduledThreadPoolExecutor * Place logging statements in VersionedCache.close() and CacheHandler.close() after actual closing logic, because logging may fail * Make JDBCExtractionNamespaceCacheFactory and StaticMapExtractionNamespaceCacheFactory to try to close newly created VersionedCache if population has failed, as it is done already in URIExtractionNamespaceCacheFactory * Don't close the whole CacheScheduler.Entry, if the cache update task failed * Replace AtomicLong updateCounter and firstRunLatch with Phaser-based UpdateCounter in CacheScheduler.EntryImpl	2016-12-23 18:04:27 -08:00
Himanshu	4ca3b7f1e4	overlord helpers framework and tasklog auto cleanup (#3677 ) * overlord helpers framework and tasklog auto cleanup * review comment changes * further review comments addressed	2016-12-21 15:18:55 -08:00
Gian Merlino	6440ddcbca	Fix #3795 (Java 7 compatibility). (#3796 ) * Fix #3795 (Java 7 compatibility). Also introduce Animal Sniffer checks during build, which would have caught the original problems. * Add Animal Sniffer on caffeine-cache for JDK8.	2016-12-21 10:19:13 -08:00
David Lim	0b9dff0bc1	fix worker thread pool exhaustion bug (#3760 ) * fix worker thread pool exhaustion bug * code review changes * code review changes	2016-12-09 15:23:11 -08:00
David Lim	7f087cdd3b	allow Kafka consumer group.id to be overriden by config (#3765 )	2016-12-08 15:53:13 -08:00
Charles Allen	27ab23ef44	Don't update segment metadata if archive doesn't move anything (#3476 ) * Don't update segment metadata if archive doesn't move anything * Fix restore task to handle potential null values * Don't try to update empty metadata * Address review comments * Move to druid-io java-util	2016-12-01 07:49:28 -08:00
Parag Jain	7ee6bb7410	option to reset offest automatically in case of OffsetOutOfRangeException (#3678 ) * option to reset offset automatically in case of OffsetOutOfRangeException if the next offset is less than the earliest available offset for that partition * review comments * refactoring * refactor * review comments	2016-11-21 16:29:46 -06:00
Roman Leventov	7b56cec3b9	Fix resource leaks (#3702 )	2016-11-18 21:21:36 +05:30
Gian Merlino	7e80d1045a	Exercise v2 engine in the groupBy aggregator and multi-value dimension tests. (#3698 ) This also involved some other test changes: - Added a factory.mergeRunners step to AggregationTestHelper's groupBy chain, since the v2 engine does merging there. - Changed test byteBuffer pools from on-heap to off-heap to work around https://github.com/DataSketches/sketches-core/pull/116 for datasketches tests.	2016-11-16 20:02:25 -08:00
Gian Merlino	bcd20441be	Make buildV9Directly the default. (#3688 )	2016-11-14 09:29:32 -08:00
Roman Leventov	988d97b09c	Unwrap exceptions from RuntimeException in URIExtractionNamespaceCacheFactory.populateCache() (part of #3667 ) (#3668 ) * Unwrap exceptions from RuntimeException in URIExtractionNamespaceCacheFactory.populateCache() * Fix tests	2016-11-11 17:25:41 -08:00
Himanshu	ddc078926b	consolidate different theta sketch representations into SketchHolder (#3671 )	2016-11-11 10:20:41 -08:00
Himanshu	b76b3f8d85	reset-cluster command to clean up druid state stored on metadata and deep storage (#3670 )	2016-11-09 11:07:01 -06:00
Nicolas Colomer	37ecffb648	Add support for Confluent Schema Registry in the avro extension (#3529 )	2016-11-08 16:10:45 -06:00
Gian Merlino	657e4512d2	Checkstyle checks for AvoidStaticImport, UnusedImports. (#3660 ) Excludes tests from AvoidStaticImport, since those are used often there and I didn't want to make this changeset too large. Production code use was minimal and I switched those to non-static imports.	2016-11-05 11:34:36 -07:00
Roman Leventov	22b57ddd60	Make ExtractionNamespaceCacheFactory to populate cache directly instead of returning callable (#3651 ) * Rename ExtractionNamespaceCacheFactory.getCachePopulator() to populateCache() and make it to populate cache itself instead of returning a Callable which populates cache, because this "callback style" is not actually needed. ExtractionNamespaceCacheFactory isn't a "factory" so it should be renamed, but renaming right in this commit would tear the git history for files, because ExtractionNamespaceCacheFactory implementations have too many changed lines. Going to rename ExtractionNamespaceCacheFactory to something like "CachePopulator" in one of subsequent PRs. This commit is a part of a bigger refactoring of the lookup cache subsystem. * Remove unused line and imports	2016-11-04 13:33:16 -07:00
Gian Merlino	4203580290	URIExtractionNamespace: Treat null values in lookup maps as missing entries. (#3512 ) * URIExtractionNamespace: Treat null values in lookup maps as missing entries. This is useful when many logical lookups are derived from the same base JSON file, and some lookups' values may be unknown sometimes. * Add test, logging message, and address other comments. * Update docs.	2016-11-03 13:53:04 -07:00
Himanshu	2362effd8c	use FileSystem.rename(from,to,Rename.NONE) so that tmp dirs from replicating tasks are not moved to the segment directory created by first task (#3650 )	2016-11-02 15:58:55 -07:00
Roman Leventov	36a1543222	Lookup cache bug fixes (#3609 ) * Return better lastVersion from JDBCExtractionNamespaceCacheFactory's cache populator callable * Return the lastVersion if URI lookup last modified date is not later than the last cached, from URIExtractionNamespaceCacheFactory's cache populator callable * Fix a race condition in NamespaceExtractionCacheManager.cancelFuture() * Don't delete cache from NamespaceExtractionCacheManager if the ExtractionNamespaceCacheFactory returned the same version as the last; Better exception treatment in the scheduled cache updater runnable in NamespaceExtractionCacheManager (in particular, don't consume Errors); throw AssertionError in StaticMapExtractionNamespaceCacheFactory if the lastVersion != null) * In NamespaceExtractionCacheManager, put NamespaceImplData.latestVersion update in the same synchronized() block with swapAndClearCache(id, cacheId); Turn getPostRunnable which returns a callback into a simple updateNamespace() method * In StaticMapExtractionNamespaceCacheFactory.getCachePopulator(), check the input directly, not inside a callback * In URIExtractionNamespaceCacheFactory, allow URI last modified time to go backwards * Better logging in NamespaceExtractionCacheManager * Add comment on lastVersion nullability in URIExtractionNamespaceCacheFactory	2016-11-02 09:40:19 -07:00
Himanshu	eb70a12e43	fix cleanup of tmp dir in HdfsDataSegmentPusher (#3636 )	2016-11-01 12:45:38 -05:00
Gian Merlino	89d9c61894	Deprecate Aggregator.getName and AggregatorFactory.getAggregatorStartValue. (#3572 )	2016-10-31 15:24:30 -07:00
Himanshu	23a8e22836	fix SketchMergeAggregatorFactory.finalizeResults, comparator and more UTs for timeseries, topN (#3613 )	2016-10-28 15:48:33 -07:00
Charles Allen	78159d7ca4	Move off-heap QTL global cache delete lock outside of subclass lock (#3597 ) * Move off-heap QTL global cache delete lock outside of subclass lock * Make `delete` thread safe	2016-10-27 22:23:53 +05:30
David Lim	3c56cbdf82	fix timing issue with KafkaLookupExtractorFactoryTest (#3604 )	2016-10-25 07:04:51 -07:00
Akash Dwivedi	4b3bd8bd63	Migrating java-util from Metamarkets. (#3585 ) * Migrating java-util from Metamarkets. * checkstyle and updated license on java-util files. * Removed unused imports from whole project. * cherry pick metamx/java-util@826021f. * Copyright changes on java-util pom, address review comments.	2016-10-21 14:57:07 -07:00
David Lim	c2ae734848	KafkaIndexTask: Allow run thread to stop gracefully instead of interrupting (#3534 ) * allow run thread to gracefully complete instead of interrupting when stopGracefully() is called * add comments	2016-10-17 10:52:19 -04:00
Gian Merlino	c1d3b8a30c	Remove dropwizard-jdbc dependency from lookups-cached-single. (#3573 ) Fixes #3548.	2016-10-17 10:37:47 -04:00
Gian Merlino	0ce33bc95f	HdfsDataSegmentPusher: Properly include scheme, host in output path if necessary. (#3577 ) Fixes #3576.	2016-10-17 10:37:18 -04:00
David Lim	472c409b99	KafkaLookupExtractorFactory: shutdown kafka consumer on close() (#3539 ) * shutdown kafka consumer on close * handle close() race condition	2016-10-15 09:55:51 -07:00
Roman Leventov	5dc95389f7	Add Checkstyle framework (#3551 ) * Add Checkstyle framework * Avoid star import * Need braces for control flow statements * Redundant imports * Add NewLineAtEndOfFile check	2016-10-13 13:37:47 -07:00
jaehong choi	6f21778364	Support finding segments in AWS S3. (#3399 ) * support finding segments from a AWS S3 storage. * add more Uts * address comments and add a document for the feature. * update docs indentation * update docs indentation * address comments. 1. add a Ut for json ser/deser for the config object. 2. more informant error message in a Ut. * address comments. 1. use @Min to validate the configuration object 2. change updateDescriptor to a string as it does not take an argument otherwise * fix a Ut failure - delete a Ut for testing default max length.	2016-10-10 17:27:09 -07:00
Parag Jain	c255dd8b19	fix datasegment metadata (#3555 )	2016-10-07 16:30:33 -05:00
Parag Jain	76a60a007e	create parent dir on HDFS if it does not exist (#3547 )	2016-10-06 16:14:00 -07:00
Himanshu	1523de08fb	SketchAggregatorFactory.combine(..) returns Union object now so that it can be reused across multiple combine(..) calls (#3471 )	2016-10-05 08:40:14 -07:00
Parag Jain	592903571a	add context to kafka supervisor for the kafka indexing task (#3464 )	2016-10-04 20:08:43 -05:00
Parag Jain	e419407eba	handle supervisor spec metadata failures (#3456 ) close kafka consumer in case supervisor start fails	2016-10-04 10:15:28 -07:00
Gian Merlino	40f2fe7893	Bump versions to 0.9.3-SNAPSHOT (#3524 )	2016-09-29 13:53:32 -07:00
Parag Jain	15c9918c65	log exceptions while trying to pause task (#3504 )	2016-09-23 16:53:23 -07:00
David Lim	9226d4af3c	configurable shutdownTimeout for Kakfa supervisor (#3497 ) * configurable shutdownTimeout * cr change	2016-09-23 13:26:45 -06:00
David Lim	ca9114b41b	add supervisor reset API (#3484 ) * add supervisor reset API * CR doc changes and kill running tasks / clear offsets from supervisor	2016-09-22 17:51:06 -07:00
Nishant	6099d20303	[FIX] ReleaseException when the path is being written by multiple tasks (#3494 ) * fix ReleaseException when the path is being written by multiple task * Do not throw IOException if another replica wins the race for segment creation fix if check * handle logging comments * fix test	2016-09-22 14:25:41 -05:00
Navis Ryu	74e1243c7e	Fix test fail of PollingLookupTest.testApplyAfterDataChange (#3489 )	2016-09-22 08:33:59 -07:00
Himanshu	05ea88df5c	fix kafka-indexing-service pom to not reference specific version but parent version for druid core dependencies (#3472 )	2016-09-20 15:18:21 -07:00
David Lim	96fcca18ea	update KafkaSupervisor to make HTTP requests to tasks in parallel where possible (#3452 )	2016-09-20 22:51:15 +05:30
Slim	3175e17a3b	Cached lookup module. first cut implementing JDBC cache (#2819 )	2016-09-16 13:45:54 -07:00
Charles Allen	95e08b38ea	[QTL] Reduced Locking Lookups (#3071 ) * Lockless lookups * Fix compile problem * Make stack trace throw instead * Remove non-germane change * * Add better naming to cache keys. Makes logging nicer * Fix #3459 * Move start/stop lock to non-interruptable for readability purposes	2016-09-16 11:54:23 -07:00
Gleb Smirnov	d981a2aa02	Avoid interrupting ZookeeperConsumerConnector.shutdown() #3346 (#3403 )	2016-09-14 17:44:27 -07:00
Himanshu	a069257d37	avro-extension -- feature to specify multiple avro reader schemas inline (#3368 ) * rename SimpleAvroBytesDecoder to InlineSchemaAvroBytesDecoder * feature to specify multiple schemas inline in avro module	2016-09-13 14:54:31 -07:00
Gian Merlino	bcff08826b	KafkaIndexTask: Treat null values as unparseable. (#3453 )	2016-09-13 10:56:38 -07:00
Slim	ba6ddf307e	Adding hadoop kerberos authentification. (#3419 ) * adding kerberos authentication * make the 2 functions identical	2016-09-13 10:42:50 -07:00
Jonathan Wei	df766b2bbd	Add dimension handling interface for ingestion and segment creation (#3217 ) * Add dimension handling interface for ingestion and segment creation * update javadocs for DimensionHandler/DimensionIndexer * Move IndexIO row validation into DimensionHandler * Fix null column skipping in mergerV9 * Add deprecation note for 'numeric_dims' filename pattern in IndexIO v8->v9 conversion * Fix java7 test failure	2016-09-12 12:54:02 -07:00
Alexander Saydakov	1a5042ca26	updated dependency on sketches-core (#3443 ) * updated dependency on sketches-core to 0.7.0 * Use sketches-core-0.4.1, which is the latest version still compatible with JDK7	2016-09-09 16:21:32 -07:00
David Lim	146a17de48	KafkaIndexTask: allow pause to break out of retry loop (#3401 )	2016-09-06 22:29:37 -06:00
David Lim	5b1ae21bd1	retry calls to getStartTime (#3429 )	2016-09-06 14:02:22 -07:00
Stéphane Derosiaux	48dce88aab	Add flag binaryAsString for parquet ingestion (#3381 )	2016-08-30 17:30:50 -07:00
David Lim	ed924bf214	allow registrants to opt out of announcing themselves when registering as a chat handler (#3360 )	2016-08-16 10:51:28 +05:30
Himanshu	70d99fe3c6	Initialize ApproximateHistogram Module in ApproximateHistogramGroupByQueryTest (#3363 ) or else the test fails if ran independently.	2016-08-15 10:19:33 -07:00
Himanshu	46da682231	avro-extensions -- feature to specify avro reader schema inline in the task json for all events (#3249 )	2016-08-10 10:49:26 -07:00
Jonathan Wei	890e3bdd3f	More informative query unit test names (#3342 )	2016-08-09 22:24:48 -07:00
Jonathan Wei	decefb7477	Add time interval dim filter and retention analysis example (#3315 ) * Add time interval dim filter and retention analysis example * Use closed-open matching for intervals, update cache key generation * Fix time filtering tests for interval boundary change	2016-08-05 07:25:04 -07:00
Navis Ryu	5b3f0ccb1f	Support variance and standard deviation (#2525 ) * Support variance and standard deviation * addressed comments	2016-08-04 17:32:58 -07:00
Gleb Smirnov	33dbe0800c	Makes kafka lookup extraction factory's replace() behavior consistent with other lookup extraction factories (#3326 )	2016-08-04 10:24:19 -07:00
Gian Merlino	8030f1cb67	Be more respectful of maxRowsInMemory. (#3284 ) - Appenderator: Respect maxRowsInMemory across all sinks. - KafkaIndexTask: Respect maxRowsInMemory across all partitions.	2016-07-26 15:02:35 -06:00
Charles Allen	3f1681c16c	Caffeine cache extension (#3028 ) * Initial commit of caffeine cache * Address code comments * Move and fixup README.md a bit * Improve caffeine readme information * Cleanup caffeine pom * Address review comments * Bump caffeine to 2.3.1 * Bump druid version to 0.9.2-SNAPSHOT * Make test not fail randomly. See https://github.com/ben-manes/caffeine/pull/93#issuecomment-227617998 for an explanation * Fix distribution and documentation * Add caffeine to extensions.md * Fix links in extensions.md * Lexicographic	2016-07-06 15:42:54 -07:00
Charles Allen	bfa5c05aaa	Make global lookup cache introspector class public (#3199 ) * Make global lookup cache introspector class public * Fixes #3187 * Make KafkaLookupExtractorIntrospectionHandler a public static class	2016-07-01 15:50:57 -07:00
Xavier Léauté	485e381387	remove datasource from hadoop output path (#3196 ) fixes #2083, follow-up to #1702	2016-06-29 08:53:45 -07:00
David Lim	1d40df4bb7	fix kafka consumer concurrent access during shutdown (#3193 )	2016-06-28 13:23:17 -07:00
Hyukjin Kwon	45f553fc28	Replace the deprecated usage of NoneShardSpec (#3166 )	2016-06-25 10:27:25 -07:00
Gian Merlino	4cc39b2ee7	Alternative groupBy strategy. (#2998 ) This patch introduces a GroupByStrategy concept and two strategies: "v1" is the current groupBy strategy and "v2" is a new one. It also introduces a merge buffers concept in DruidProcessingModule, to try to better manage memory used for merging. Both of these are described in more detail in #2987. There are two goals of this patch: 1. Make it possible for historical/realtime nodes to return larger groupBy result sets, faster, with better memory management. 2. Make it possible for brokers to merge streams when there are no order-by columns, avoiding materialization. This patch does not do anything to help with memory management on the broker when there are order-by columns or when there are nested queries. That could potentially be done in a future patch.	2016-06-24 18:06:09 -07:00
du00cs	ebd654228b	fix: avro types exception in sketch (#3167 )	2016-06-22 15:54:20 -05:00
Charles Allen	674f94083e	Add more logging around failed S3DataSegmentMover DeleteExceptions (#3104 ) * Add more logging around failed S3DataSegmentMover DeleteExceptions * Fix test NPE	2016-06-16 14:58:33 -07:00
Charles Allen	f7fa1d8c62	[QTL] Allow S3 version finder to search entire s3 object key (#3139 ) * Allow S3 version finder to search entire s3 object key * Previously only was able to search immediate "directory" * Update method javadoc * Expand docs a bit better	2016-06-13 21:02:28 -07:00
Gian Merlino	ebf890fe79	Update master version to 0.9.2-SNAPSHOT. (#3133 )	2016-06-13 13:10:38 -07:00
David Lim	4faa298977	update kafka client for kafka indexing service to 0.9.0.1 (#3109 )	2016-06-08 06:51:03 -07:00
Charles Allen	8cac710546	Async lookups-cached-global by default (#3074 ) * Async lookups-cached-global by default * Also better lookup docs * Fix test timeouts * Fix timing of deserialized test * Fix problem with 0 wait failing immediately	2016-06-03 15:58:10 -05:00
David Lim	a2290a8f05	support seamless config changes (#3051 )	2016-06-03 13:50:19 -07:00
Charles Allen	447033985e	Make S3DataSegmentMover not bother checking for items if they are the same (#3032 ) * Make S3DataSegmentMover not bother checking for items if they are the same	2016-06-02 17:27:21 +01:00
David Lim	f6c39cc844	Kafka task minimum message time (#3035 ) * add KafkaIndexTask support for minimumMessageTime * add Kafka supervisor support for lateMessageRejectionPeriod	2016-05-31 11:37:00 -07:00
David Lim	3ef24c03b3	Validate X-Druid-Task-Id header in request/response and support retrying on outdated TaskLocation information, add KafkaIndexTaskClient unit tests (#3006 ) * validate X-Druid-Task-Id header in request and add header to response * modify KafkaIndexTaskClient to take a TaskLocationProvider as the TaskLocation may not remain constant	2016-05-25 22:05:18 -07:00
Charles Allen	8024b915e2	[QTL] Implement LookupExtractorFactory of namespaced lookup (#2926 ) * support LookupReferencesManager registration of namespaced lookup and eliminate static configurations for lookup from namespecd lookup extensions - druid-namespace-lookup and druid-kafka-extraction-namespace are modified - However, druid-namespace-lookup still has configuration about ON/OFF HEAP cache manager selection, which is not namespace wide configuration but node wide configuration as multiple namespace shares the same cache manager * update KafkaExtractionNamespaceTest to reflect argument signature changes * Add more synchronization functionality to NamespaceLookupExtractorFactory * Remove old way of using extraction namespaces * resolve compile error by supporting LookupIntrospectHandler * Remove kafka lookups * Remove unused stuff * Fix start and stop behavior to be consistent with new javadocs * Remove unused strings * Add timeout option * Address comments on configurations and improve docs * Add more options and update hash key and replaces * Move monitoring to the overriding classes * Add better start/stop logging * Remove old docs about namespace names * Fix bad comma * Add `@JsonIgnore` to lookup factory * Address code review comments * Remove ExtractionNamespace from module json registration * Fix problems with naming and initialization. Add tests * Optimize imports / reformat * Fix future not being properly cancelled on failed initial scheduling * Fix delete returns * Add more docs about whole introspection * Add `/version` introspection point for lookups * Add more tests and address comments * Add StaticMap extraction namespace for testing. Also add a bunch of tests * Move cache system property to `druid.lookup.namespace.cache.type` * Make VERSION lower case * Change poll period to 0ms for StaticMap * Move cache key to bytebuffer * Change hashCode and equals on static map extraction fn * Add more comments on StaticMap * Address comments * Make scheduleAndWait use a latch * Sanity renames and fix imports * Remove extra info in docs * Fix review comments * Strengthen failure on start from warn to error * Address comments * Rename namespace-lookup to lookups-cached-global * Fix injective mis-naming * Also add serde test	2016-05-24 10:56:40 -07:00
Charles Allen	15ccf451f9	Move QueryGranularity static fields to QueryGranularities (#2980 ) * Move QueryGranularity static fields to QueryGranularityUtil * Fixes #2979 * Add test showing #2979 * change name to QueryGranularities	2016-05-17 16:23:48 -07:00
Himanshu	d3e9c47a5f	use correct ObjectMapper in Index[IO/Merger] in AggregationTestHelper and minor fix in theta sketch SketchMergeAggregatorFactory.getMergingFactory(..) (#2943 )	2016-05-13 10:06:31 +05:30
Slim	45b2e65d75	[QTL] adding listDelimiter to lookup parser spec (#2941 ) * adding listDelimiter to lookup parser spec * cleaning code	2016-05-10 15:41:16 +05:30
Charles Allen	90b0b0a4ad	Make URIExtraction not require FileSystem impls for URIs it understands (#2929 ) * Make URIExtraction not require FileSystem impls for URIs it understands * Fixes #2928 * Preserve URI information * Simply case for exact matching * Move unused variable	2016-05-08 23:23:53 +05:30
David Lim	b489f63698	Supervisor for KafkaIndexTask (#2656 ) * supervisor for kafka indexing tasks * cr changes	2016-05-04 23:13:13 -07:00
Charles Allen	2a769a9fb7	Make S3DataSegmentPuller do GET requests less often (#2900 ) * Make S3DataSegmentPuller do GET requests less often * Fixes #2894 * Run intellij formatting on S3Utils * Remove forced stream fetching on getVersion * Remove unneeded finalize * Allow initial object fetching to fail and be retried	2016-05-04 16:21:35 -07:00
Gian Merlino	f8ddfb9a4b	Split SegmentInsertAction and SegmentTransactionalInsertAction for backwards compat. (#2922 ) Fixes #2912.	2016-05-04 13:54:34 -07:00
Charles Allen	6b957aa072	[QTL] Make URI Exctraction Namespace take more sane arguments (#2738 ) * Make URI Exctraction Namespace take more sane arguments * Fixes https://github.com/druid-io/druid/issues/2669 * Update docs * Rename error message * Undo overzealous deletion of docs * Explain caching mechanism a bit more in docs	2016-05-02 12:54:34 -07:00
Charles Allen	54b717bdc3	[QTL] Move kafka-extraction-namespace to the Lookup framework. (#2800 ) * Move kafka-extraction-namespace to the Lookup framework. * Address comments * Fix missing kafka introspection * Fix tests to be less racy * Make testing a bit more leniant * Make tests even more forgiving * Add comments to kafka lookup cache method * Move startStopLock to just use started * Make start() and stop() idempotent * Forgot to update test after last change, test now accounts for idempotency * Add extra idempotency on stop check * Add more descriptive docs of behavior	2016-05-02 09:45:13 -07:00
Gian Merlino	67b47c982f	Datasketches: Remove isInputThetaSketch from cache key. (#2899 )	2016-04-28 18:14:52 -07:00
Gian Merlino	16080dc54f	Adjust colliding aggregator cache IDs. (#2891 ) - Renumbered ApproximateHistogramAggregatorFactory from 8 to 12, 8 was taken by CardinalityAggregatorFactory - Renumbered ApproximateHistogramFoldingAggregatorFactory from 9 to 13, 9 was taken by FilteredAggregatorFactory	2016-04-28 10:11:33 -07:00
Gian Merlino	909abd17f3	Sketch cache key should include size, isInputThetaSketch. (#2893 )	2016-04-28 10:10:46 -07:00
David Lim	7641f2628f	add control and status endpoints to KafkaIndexTask (#2730 )	2016-04-21 15:34:59 -07:00
Xavier Léauté	5938d9085b	Stream segments from database (#2859 ) * Avoids fetching all segment records into heap by JDBC driver * Set connection to read-only to help database optimize queries * Update JDBC drivers (MySQL has fixes for streaming results)	2016-04-21 05:40:07 +08:00
Gian Merlino	08c784fbf6	KafkaIndexTask: Use a separate sequence per Kafka partition in order to make (#2844 ) segment creation deterministic. This means that each segment will contain data from just one Kafka partition. So, users will probably not want to have a super high number of Kafka partitions... Fixes #2703.	2016-04-18 22:29:52 -07:00
Xavier Léauté	0f8a037bcd	support PostgreSQL >= 9.5 upsert capability	2016-04-01 16:53:27 -07:00
Gian Merlino	977e867ad8	Downgrade geoip2, exclude com.google.http-client. Reverts "Update com.maxmind.geoip2 to 2.6.0" and exclude the google http client from com.maxmind.geoip2. This should satisfy the original need from #2646 (wanting to run Druid along with an upgraded com.google.http-client) while preventing Jackson conflicts pointed out in #2717. Fixes #2717. This reverts commit `21b7572533`.	2016-03-25 14:43:22 -07:00
Himanshu	f26e73d133	Merge pull request #2720 from gianm/druid-api Move druid-api into the druid repo.	2016-03-24 15:51:10 -05:00
Gian Merlino	7e7a886f65	Move druid-api into the druid repo. This is from druid-api-0.3.17, as of commit 51884f1d05d5512cacaf62cedfbb28c6ab2535cf in the druid-api repo.	2016-03-24 11:04:34 -07:00
Himanshu Gupta	4aead38130	fix SketchEstimate post aggregator's getComparator() and test changes to verify same	2016-03-24 10:11:06 -05:00
jon-wei	a59c9ee1b1	Support use of DimensionSchema class in DimensionsSpec	2016-03-21 13:12:04 -07:00
Gian Merlino	738dcd8cd9	Update version to 0.9.1-SNAPSHOT. Fixes #2462	2016-03-17 10:34:20 -07:00
Slim	cf342d8d3c	Merge pull request #2517 from b-slim/adding_lookup_snapshot_utility [QTL][Lookup] lookup module with the snapshot utility	2016-03-17 11:39:47 -05:00
Slim Bouguerra	0c86b29ef0	lookup module with the snapshot utility	2016-03-17 09:20:41 -05:00
Charles Allen	02805a74a1	Merge pull request #2648 from chtefi/master Ignore case when testing for table existence	2016-03-14 13:57:53 -07:00
Stéphane Derosiaux	416cb03687	Ignore case when testing for table existence	2016-03-13 11:17:30 +01:00
Gian Merlino	f22fb2c2cf	KafkaIndexTask. Reads a specific offset range from specific partitions, and can use dataSource metadata transactions to guarantee exactly-once ingestion. Each task has a finite lifecycle, so it is expected that some process will be supervising existing tasks and creating new ones when needed.	2016-03-10 18:41:43 -08:00
Gian Merlino	187569e702	DataSource metadata. Geared towards supporting transactional inserts of new segments. This involves an interface "DataSourceMetadata" that allows combining of partially specified metadata (useful for partitioned ingestion). DataSource metadata is stored in a new "dataSource" table.	2016-03-10 17:41:50 -08:00
Nishant	ba1185963b	Fix a bunch of dependencies * Eliminate exclusion groups from pull-deps * Only consider dependency nodes in pull-deps if they are not in the following scopes * provided * test * system * Fix a bunch of `<scope>provided</scope>` missing tags * Better exclusions for a couple of problematic libs	2016-03-10 10:18:08 -08:00
fjy	e3e932a4d4	refactor extensions into core and contrib	2016-03-08 17:12:09 -08:00

... 15 16 17 18 19 ...

954 Commits