druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	4203580290	URIExtractionNamespace: Treat null values in lookup maps as missing entries. (#3512 ) * URIExtractionNamespace: Treat null values in lookup maps as missing entries. This is useful when many logical lookups are derived from the same base JSON file, and some lookups' values may be unknown sometimes. * Add test, logging message, and address other comments. * Update docs.	2016-11-03 13:53:04 -07:00
Himanshu	2362effd8c	use FileSystem.rename(from,to,Rename.NONE) so that tmp dirs from replicating tasks are not moved to the segment directory created by first task (#3650 )	2016-11-02 15:58:55 -07:00
Roman Leventov	36a1543222	Lookup cache bug fixes (#3609 ) * Return better lastVersion from JDBCExtractionNamespaceCacheFactory's cache populator callable * Return the lastVersion if URI lookup last modified date is not later than the last cached, from URIExtractionNamespaceCacheFactory's cache populator callable * Fix a race condition in NamespaceExtractionCacheManager.cancelFuture() * Don't delete cache from NamespaceExtractionCacheManager if the ExtractionNamespaceCacheFactory returned the same version as the last; Better exception treatment in the scheduled cache updater runnable in NamespaceExtractionCacheManager (in particular, don't consume Errors); throw AssertionError in StaticMapExtractionNamespaceCacheFactory if the lastVersion != null) * In NamespaceExtractionCacheManager, put NamespaceImplData.latestVersion update in the same synchronized() block with swapAndClearCache(id, cacheId); Turn getPostRunnable which returns a callback into a simple updateNamespace() method * In StaticMapExtractionNamespaceCacheFactory.getCachePopulator(), check the input directly, not inside a callback * In URIExtractionNamespaceCacheFactory, allow URI last modified time to go backwards * Better logging in NamespaceExtractionCacheManager * Add comment on lastVersion nullability in URIExtractionNamespaceCacheFactory	2016-11-02 09:40:19 -07:00
Himanshu	eb70a12e43	fix cleanup of tmp dir in HdfsDataSegmentPusher (#3636 )	2016-11-01 12:45:38 -05:00
Gian Merlino	89d9c61894	Deprecate Aggregator.getName and AggregatorFactory.getAggregatorStartValue. (#3572 )	2016-10-31 15:24:30 -07:00
Himanshu	23a8e22836	fix SketchMergeAggregatorFactory.finalizeResults, comparator and more UTs for timeseries, topN (#3613 )	2016-10-28 15:48:33 -07:00
Charles Allen	78159d7ca4	Move off-heap QTL global cache delete lock outside of subclass lock (#3597 ) * Move off-heap QTL global cache delete lock outside of subclass lock * Make `delete` thread safe	2016-10-27 22:23:53 +05:30
David Lim	3c56cbdf82	fix timing issue with KafkaLookupExtractorFactoryTest (#3604 )	2016-10-25 07:04:51 -07:00
Akash Dwivedi	4b3bd8bd63	Migrating java-util from Metamarkets. (#3585 ) * Migrating java-util from Metamarkets. * checkstyle and updated license on java-util files. * Removed unused imports from whole project. * cherry pick metamx/java-util@826021f. * Copyright changes on java-util pom, address review comments.	2016-10-21 14:57:07 -07:00
David Lim	c2ae734848	KafkaIndexTask: Allow run thread to stop gracefully instead of interrupting (#3534 ) * allow run thread to gracefully complete instead of interrupting when stopGracefully() is called * add comments	2016-10-17 10:52:19 -04:00
Gian Merlino	c1d3b8a30c	Remove dropwizard-jdbc dependency from lookups-cached-single. (#3573 ) Fixes #3548.	2016-10-17 10:37:47 -04:00
Gian Merlino	0ce33bc95f	HdfsDataSegmentPusher: Properly include scheme, host in output path if necessary. (#3577 ) Fixes #3576.	2016-10-17 10:37:18 -04:00
David Lim	472c409b99	KafkaLookupExtractorFactory: shutdown kafka consumer on close() (#3539 ) * shutdown kafka consumer on close * handle close() race condition	2016-10-15 09:55:51 -07:00
Roman Leventov	5dc95389f7	Add Checkstyle framework (#3551 ) * Add Checkstyle framework * Avoid star import * Need braces for control flow statements * Redundant imports * Add NewLineAtEndOfFile check	2016-10-13 13:37:47 -07:00
jaehong choi	6f21778364	Support finding segments in AWS S3. (#3399 ) * support finding segments from a AWS S3 storage. * add more Uts * address comments and add a document for the feature. * update docs indentation * update docs indentation * address comments. 1. add a Ut for json ser/deser for the config object. 2. more informant error message in a Ut. * address comments. 1. use @Min to validate the configuration object 2. change updateDescriptor to a string as it does not take an argument otherwise * fix a Ut failure - delete a Ut for testing default max length.	2016-10-10 17:27:09 -07:00
Parag Jain	c255dd8b19	fix datasegment metadata (#3555 )	2016-10-07 16:30:33 -05:00
Parag Jain	76a60a007e	create parent dir on HDFS if it does not exist (#3547 )	2016-10-06 16:14:00 -07:00
Himanshu	1523de08fb	SketchAggregatorFactory.combine(..) returns Union object now so that it can be reused across multiple combine(..) calls (#3471 )	2016-10-05 08:40:14 -07:00
Parag Jain	592903571a	add context to kafka supervisor for the kafka indexing task (#3464 )	2016-10-04 20:08:43 -05:00
Parag Jain	e419407eba	handle supervisor spec metadata failures (#3456 ) close kafka consumer in case supervisor start fails	2016-10-04 10:15:28 -07:00
Gian Merlino	40f2fe7893	Bump versions to 0.9.3-SNAPSHOT (#3524 )	2016-09-29 13:53:32 -07:00
Parag Jain	15c9918c65	log exceptions while trying to pause task (#3504 )	2016-09-23 16:53:23 -07:00
David Lim	9226d4af3c	configurable shutdownTimeout for Kakfa supervisor (#3497 ) * configurable shutdownTimeout * cr change	2016-09-23 13:26:45 -06:00
David Lim	ca9114b41b	add supervisor reset API (#3484 ) * add supervisor reset API * CR doc changes and kill running tasks / clear offsets from supervisor	2016-09-22 17:51:06 -07:00
Nishant	6099d20303	[FIX] ReleaseException when the path is being written by multiple tasks (#3494 ) * fix ReleaseException when the path is being written by multiple task * Do not throw IOException if another replica wins the race for segment creation fix if check * handle logging comments * fix test	2016-09-22 14:25:41 -05:00
Navis Ryu	74e1243c7e	Fix test fail of PollingLookupTest.testApplyAfterDataChange (#3489 )	2016-09-22 08:33:59 -07:00
Himanshu	05ea88df5c	fix kafka-indexing-service pom to not reference specific version but parent version for druid core dependencies (#3472 )	2016-09-20 15:18:21 -07:00
David Lim	96fcca18ea	update KafkaSupervisor to make HTTP requests to tasks in parallel where possible (#3452 )	2016-09-20 22:51:15 +05:30
Slim	3175e17a3b	Cached lookup module. first cut implementing JDBC cache (#2819 )	2016-09-16 13:45:54 -07:00
Charles Allen	95e08b38ea	[QTL] Reduced Locking Lookups (#3071 ) * Lockless lookups * Fix compile problem * Make stack trace throw instead * Remove non-germane change * * Add better naming to cache keys. Makes logging nicer * Fix #3459 * Move start/stop lock to non-interruptable for readability purposes	2016-09-16 11:54:23 -07:00
Gleb Smirnov	d981a2aa02	Avoid interrupting ZookeeperConsumerConnector.shutdown() #3346 (#3403 )	2016-09-14 17:44:27 -07:00
Himanshu	a069257d37	avro-extension -- feature to specify multiple avro reader schemas inline (#3368 ) * rename SimpleAvroBytesDecoder to InlineSchemaAvroBytesDecoder * feature to specify multiple schemas inline in avro module	2016-09-13 14:54:31 -07:00
Gian Merlino	bcff08826b	KafkaIndexTask: Treat null values as unparseable. (#3453 )	2016-09-13 10:56:38 -07:00
Slim	ba6ddf307e	Adding hadoop kerberos authentification. (#3419 ) * adding kerberos authentication * make the 2 functions identical	2016-09-13 10:42:50 -07:00
Jonathan Wei	df766b2bbd	Add dimension handling interface for ingestion and segment creation (#3217 ) * Add dimension handling interface for ingestion and segment creation * update javadocs for DimensionHandler/DimensionIndexer * Move IndexIO row validation into DimensionHandler * Fix null column skipping in mergerV9 * Add deprecation note for 'numeric_dims' filename pattern in IndexIO v8->v9 conversion * Fix java7 test failure	2016-09-12 12:54:02 -07:00
Alexander Saydakov	1a5042ca26	updated dependency on sketches-core (#3443 ) * updated dependency on sketches-core to 0.7.0 * Use sketches-core-0.4.1, which is the latest version still compatible with JDK7	2016-09-09 16:21:32 -07:00
David Lim	146a17de48	KafkaIndexTask: allow pause to break out of retry loop (#3401 )	2016-09-06 22:29:37 -06:00
David Lim	5b1ae21bd1	retry calls to getStartTime (#3429 )	2016-09-06 14:02:22 -07:00
Stéphane Derosiaux	48dce88aab	Add flag binaryAsString for parquet ingestion (#3381 )	2016-08-30 17:30:50 -07:00
David Lim	ed924bf214	allow registrants to opt out of announcing themselves when registering as a chat handler (#3360 )	2016-08-16 10:51:28 +05:30
Himanshu	70d99fe3c6	Initialize ApproximateHistogram Module in ApproximateHistogramGroupByQueryTest (#3363 ) or else the test fails if ran independently.	2016-08-15 10:19:33 -07:00
Himanshu	46da682231	avro-extensions -- feature to specify avro reader schema inline in the task json for all events (#3249 )	2016-08-10 10:49:26 -07:00
Jonathan Wei	890e3bdd3f	More informative query unit test names (#3342 )	2016-08-09 22:24:48 -07:00
Jonathan Wei	decefb7477	Add time interval dim filter and retention analysis example (#3315 ) * Add time interval dim filter and retention analysis example * Use closed-open matching for intervals, update cache key generation * Fix time filtering tests for interval boundary change	2016-08-05 07:25:04 -07:00
Navis Ryu	5b3f0ccb1f	Support variance and standard deviation (#2525 ) * Support variance and standard deviation * addressed comments	2016-08-04 17:32:58 -07:00
Gleb Smirnov	33dbe0800c	Makes kafka lookup extraction factory's replace() behavior consistent with other lookup extraction factories (#3326 )	2016-08-04 10:24:19 -07:00
Gian Merlino	8030f1cb67	Be more respectful of maxRowsInMemory. (#3284 ) - Appenderator: Respect maxRowsInMemory across all sinks. - KafkaIndexTask: Respect maxRowsInMemory across all partitions.	2016-07-26 15:02:35 -06:00
Charles Allen	3f1681c16c	Caffeine cache extension (#3028 ) * Initial commit of caffeine cache * Address code comments * Move and fixup README.md a bit * Improve caffeine readme information * Cleanup caffeine pom * Address review comments * Bump caffeine to 2.3.1 * Bump druid version to 0.9.2-SNAPSHOT * Make test not fail randomly. See https://github.com/ben-manes/caffeine/pull/93#issuecomment-227617998 for an explanation * Fix distribution and documentation * Add caffeine to extensions.md * Fix links in extensions.md * Lexicographic	2016-07-06 15:42:54 -07:00
Charles Allen	bfa5c05aaa	Make global lookup cache introspector class public (#3199 ) * Make global lookup cache introspector class public * Fixes #3187 * Make KafkaLookupExtractorIntrospectionHandler a public static class	2016-07-01 15:50:57 -07:00
Xavier Léauté	485e381387	remove datasource from hadoop output path (#3196 ) fixes #2083, follow-up to #1702	2016-06-29 08:53:45 -07:00
David Lim	1d40df4bb7	fix kafka consumer concurrent access during shutdown (#3193 )	2016-06-28 13:23:17 -07:00
Hyukjin Kwon	45f553fc28	Replace the deprecated usage of NoneShardSpec (#3166 )	2016-06-25 10:27:25 -07:00
Gian Merlino	4cc39b2ee7	Alternative groupBy strategy. (#2998 ) This patch introduces a GroupByStrategy concept and two strategies: "v1" is the current groupBy strategy and "v2" is a new one. It also introduces a merge buffers concept in DruidProcessingModule, to try to better manage memory used for merging. Both of these are described in more detail in #2987. There are two goals of this patch: 1. Make it possible for historical/realtime nodes to return larger groupBy result sets, faster, with better memory management. 2. Make it possible for brokers to merge streams when there are no order-by columns, avoiding materialization. This patch does not do anything to help with memory management on the broker when there are order-by columns or when there are nested queries. That could potentially be done in a future patch.	2016-06-24 18:06:09 -07:00
du00cs	ebd654228b	fix: avro types exception in sketch (#3167 )	2016-06-22 15:54:20 -05:00
Charles Allen	674f94083e	Add more logging around failed S3DataSegmentMover DeleteExceptions (#3104 ) * Add more logging around failed S3DataSegmentMover DeleteExceptions * Fix test NPE	2016-06-16 14:58:33 -07:00
Charles Allen	f7fa1d8c62	[QTL] Allow S3 version finder to search entire s3 object key (#3139 ) * Allow S3 version finder to search entire s3 object key * Previously only was able to search immediate "directory" * Update method javadoc * Expand docs a bit better	2016-06-13 21:02:28 -07:00
Gian Merlino	ebf890fe79	Update master version to 0.9.2-SNAPSHOT. (#3133 )	2016-06-13 13:10:38 -07:00
David Lim	4faa298977	update kafka client for kafka indexing service to 0.9.0.1 (#3109 )	2016-06-08 06:51:03 -07:00
Charles Allen	8cac710546	Async lookups-cached-global by default (#3074 ) * Async lookups-cached-global by default * Also better lookup docs * Fix test timeouts * Fix timing of deserialized test * Fix problem with 0 wait failing immediately	2016-06-03 15:58:10 -05:00
David Lim	a2290a8f05	support seamless config changes (#3051 )	2016-06-03 13:50:19 -07:00
Charles Allen	447033985e	Make S3DataSegmentMover not bother checking for items if they are the same (#3032 ) * Make S3DataSegmentMover not bother checking for items if they are the same	2016-06-02 17:27:21 +01:00
David Lim	f6c39cc844	Kafka task minimum message time (#3035 ) * add KafkaIndexTask support for minimumMessageTime * add Kafka supervisor support for lateMessageRejectionPeriod	2016-05-31 11:37:00 -07:00
David Lim	3ef24c03b3	Validate X-Druid-Task-Id header in request/response and support retrying on outdated TaskLocation information, add KafkaIndexTaskClient unit tests (#3006 ) * validate X-Druid-Task-Id header in request and add header to response * modify KafkaIndexTaskClient to take a TaskLocationProvider as the TaskLocation may not remain constant	2016-05-25 22:05:18 -07:00
Charles Allen	8024b915e2	[QTL] Implement LookupExtractorFactory of namespaced lookup (#2926 ) * support LookupReferencesManager registration of namespaced lookup and eliminate static configurations for lookup from namespecd lookup extensions - druid-namespace-lookup and druid-kafka-extraction-namespace are modified - However, druid-namespace-lookup still has configuration about ON/OFF HEAP cache manager selection, which is not namespace wide configuration but node wide configuration as multiple namespace shares the same cache manager * update KafkaExtractionNamespaceTest to reflect argument signature changes * Add more synchronization functionality to NamespaceLookupExtractorFactory * Remove old way of using extraction namespaces * resolve compile error by supporting LookupIntrospectHandler * Remove kafka lookups * Remove unused stuff * Fix start and stop behavior to be consistent with new javadocs * Remove unused strings * Add timeout option * Address comments on configurations and improve docs * Add more options and update hash key and replaces * Move monitoring to the overriding classes * Add better start/stop logging * Remove old docs about namespace names * Fix bad comma * Add `@JsonIgnore` to lookup factory * Address code review comments * Remove ExtractionNamespace from module json registration * Fix problems with naming and initialization. Add tests * Optimize imports / reformat * Fix future not being properly cancelled on failed initial scheduling * Fix delete returns * Add more docs about whole introspection * Add `/version` introspection point for lookups * Add more tests and address comments * Add StaticMap extraction namespace for testing. Also add a bunch of tests * Move cache system property to `druid.lookup.namespace.cache.type` * Make VERSION lower case * Change poll period to 0ms for StaticMap * Move cache key to bytebuffer * Change hashCode and equals on static map extraction fn * Add more comments on StaticMap * Address comments * Make scheduleAndWait use a latch * Sanity renames and fix imports * Remove extra info in docs * Fix review comments * Strengthen failure on start from warn to error * Address comments * Rename namespace-lookup to lookups-cached-global * Fix injective mis-naming * Also add serde test	2016-05-24 10:56:40 -07:00
Charles Allen	15ccf451f9	Move QueryGranularity static fields to QueryGranularities (#2980 ) * Move QueryGranularity static fields to QueryGranularityUtil * Fixes #2979 * Add test showing #2979 * change name to QueryGranularities	2016-05-17 16:23:48 -07:00
Himanshu	d3e9c47a5f	use correct ObjectMapper in Index[IO/Merger] in AggregationTestHelper and minor fix in theta sketch SketchMergeAggregatorFactory.getMergingFactory(..) (#2943 )	2016-05-13 10:06:31 +05:30
Slim	45b2e65d75	[QTL] adding listDelimiter to lookup parser spec (#2941 ) * adding listDelimiter to lookup parser spec * cleaning code	2016-05-10 15:41:16 +05:30
Charles Allen	90b0b0a4ad	Make URIExtraction not require FileSystem impls for URIs it understands (#2929 ) * Make URIExtraction not require FileSystem impls for URIs it understands * Fixes #2928 * Preserve URI information * Simply case for exact matching * Move unused variable	2016-05-08 23:23:53 +05:30
David Lim	b489f63698	Supervisor for KafkaIndexTask (#2656 ) * supervisor for kafka indexing tasks * cr changes	2016-05-04 23:13:13 -07:00
Charles Allen	2a769a9fb7	Make S3DataSegmentPuller do GET requests less often (#2900 ) * Make S3DataSegmentPuller do GET requests less often * Fixes #2894 * Run intellij formatting on S3Utils * Remove forced stream fetching on getVersion * Remove unneeded finalize * Allow initial object fetching to fail and be retried	2016-05-04 16:21:35 -07:00
Gian Merlino	f8ddfb9a4b	Split SegmentInsertAction and SegmentTransactionalInsertAction for backwards compat. (#2922 ) Fixes #2912.	2016-05-04 13:54:34 -07:00
Charles Allen	6b957aa072	[QTL] Make URI Exctraction Namespace take more sane arguments (#2738 ) * Make URI Exctraction Namespace take more sane arguments * Fixes https://github.com/druid-io/druid/issues/2669 * Update docs * Rename error message * Undo overzealous deletion of docs * Explain caching mechanism a bit more in docs	2016-05-02 12:54:34 -07:00
Charles Allen	54b717bdc3	[QTL] Move kafka-extraction-namespace to the Lookup framework. (#2800 ) * Move kafka-extraction-namespace to the Lookup framework. * Address comments * Fix missing kafka introspection * Fix tests to be less racy * Make testing a bit more leniant * Make tests even more forgiving * Add comments to kafka lookup cache method * Move startStopLock to just use started * Make start() and stop() idempotent * Forgot to update test after last change, test now accounts for idempotency * Add extra idempotency on stop check * Add more descriptive docs of behavior	2016-05-02 09:45:13 -07:00
Gian Merlino	67b47c982f	Datasketches: Remove isInputThetaSketch from cache key. (#2899 )	2016-04-28 18:14:52 -07:00
Gian Merlino	16080dc54f	Adjust colliding aggregator cache IDs. (#2891 ) - Renumbered ApproximateHistogramAggregatorFactory from 8 to 12, 8 was taken by CardinalityAggregatorFactory - Renumbered ApproximateHistogramFoldingAggregatorFactory from 9 to 13, 9 was taken by FilteredAggregatorFactory	2016-04-28 10:11:33 -07:00
Gian Merlino	909abd17f3	Sketch cache key should include size, isInputThetaSketch. (#2893 )	2016-04-28 10:10:46 -07:00
David Lim	7641f2628f	add control and status endpoints to KafkaIndexTask (#2730 )	2016-04-21 15:34:59 -07:00
Xavier Léauté	5938d9085b	Stream segments from database (#2859 ) * Avoids fetching all segment records into heap by JDBC driver * Set connection to read-only to help database optimize queries * Update JDBC drivers (MySQL has fixes for streaming results)	2016-04-21 05:40:07 +08:00
Gian Merlino	08c784fbf6	KafkaIndexTask: Use a separate sequence per Kafka partition in order to make (#2844 ) segment creation deterministic. This means that each segment will contain data from just one Kafka partition. So, users will probably not want to have a super high number of Kafka partitions... Fixes #2703.	2016-04-18 22:29:52 -07:00
Xavier Léauté	0f8a037bcd	support PostgreSQL >= 9.5 upsert capability	2016-04-01 16:53:27 -07:00
Gian Merlino	977e867ad8	Downgrade geoip2, exclude com.google.http-client. Reverts "Update com.maxmind.geoip2 to 2.6.0" and exclude the google http client from com.maxmind.geoip2. This should satisfy the original need from #2646 (wanting to run Druid along with an upgraded com.google.http-client) while preventing Jackson conflicts pointed out in #2717. Fixes #2717. This reverts commit `21b7572533`.	2016-03-25 14:43:22 -07:00
Himanshu	f26e73d133	Merge pull request #2720 from gianm/druid-api Move druid-api into the druid repo.	2016-03-24 15:51:10 -05:00
Gian Merlino	7e7a886f65	Move druid-api into the druid repo. This is from druid-api-0.3.17, as of commit 51884f1d05d5512cacaf62cedfbb28c6ab2535cf in the druid-api repo.	2016-03-24 11:04:34 -07:00
Himanshu Gupta	4aead38130	fix SketchEstimate post aggregator's getComparator() and test changes to verify same	2016-03-24 10:11:06 -05:00
jon-wei	a59c9ee1b1	Support use of DimensionSchema class in DimensionsSpec	2016-03-21 13:12:04 -07:00
Gian Merlino	738dcd8cd9	Update version to 0.9.1-SNAPSHOT. Fixes #2462	2016-03-17 10:34:20 -07:00
Slim	cf342d8d3c	Merge pull request #2517 from b-slim/adding_lookup_snapshot_utility [QTL][Lookup] lookup module with the snapshot utility	2016-03-17 11:39:47 -05:00
Slim Bouguerra	0c86b29ef0	lookup module with the snapshot utility	2016-03-17 09:20:41 -05:00
Charles Allen	02805a74a1	Merge pull request #2648 from chtefi/master Ignore case when testing for table existence	2016-03-14 13:57:53 -07:00
Stéphane Derosiaux	416cb03687	Ignore case when testing for table existence	2016-03-13 11:17:30 +01:00
Gian Merlino	f22fb2c2cf	KafkaIndexTask. Reads a specific offset range from specific partitions, and can use dataSource metadata transactions to guarantee exactly-once ingestion. Each task has a finite lifecycle, so it is expected that some process will be supervising existing tasks and creating new ones when needed.	2016-03-10 18:41:43 -08:00
Gian Merlino	187569e702	DataSource metadata. Geared towards supporting transactional inserts of new segments. This involves an interface "DataSourceMetadata" that allows combining of partially specified metadata (useful for partitioned ingestion). DataSource metadata is stored in a new "dataSource" table.	2016-03-10 17:41:50 -08:00
Nishant	ba1185963b	Fix a bunch of dependencies * Eliminate exclusion groups from pull-deps * Only consider dependency nodes in pull-deps if they are not in the following scopes * provided * test * system * Fix a bunch of `<scope>provided</scope>` missing tags * Better exclusions for a couple of problematic libs	2016-03-10 10:18:08 -08:00
fjy	e3e932a4d4	refactor extensions into core and contrib	2016-03-08 17:12:09 -08:00

844 Commits