druid

Commit Graph

Author	SHA1	Message	Date
Justin Borromeo	7bfa77d3c1	Merge branch 'Update-Query-Interrupted-Exception' into 6088-Time-Ordering-On-Scans-N-Way-Merge	2019-03-12 16:57:45 -07:00
Clint Wylie	d7ba19d477	sql, filters, and virtual columns (#6902) * refactor sql planning to re-use expression virtual columns when possible when constructing a DruidQuery, allowing virtual columns to be defined in filter expressions, and making resulting native druid queries more concise. also minor refactor of built-in sql aggregators to maximize code re-use * fix it * fix it in the right place * fixup for base64 stuff * fixup tests * fix merge conflict on import order * fixup * fix imports * fix tests * review comments * refactor * re-arrange * better javadoc * fixup merge * fixup tests * fix accidental changes	2019-03-11 11:37:58 -07:00
Justin Borromeo	2d1978d571	Merge branch 'master' into 6088-Time-Ordering-On-Scans-N-Way-Merge	2019-03-04 15:24:49 -08:00
Gian Merlino	fa218f5160	Fix two SeekableStream serde issues. (#7176) * Fix two SeekableStream serde issues. 1) Fix backwards-compatibility serde for SeekableStreamPartitions. It is needed for split 0.13 / 0.14 clusters to work properly during a rolling update. 2) Abstract classes don't need JsonCreator constructors; remove them. * Comment fixes.	2019-03-01 22:27:08 -08:00
Justin Borromeo	5bd0e1a32c	Merge branch 'master' into 6088-Time-Ordering-On-Scans-N-Way-Merge	2019-02-26 16:39:16 -08:00
Jihoon Son	9a066558a4	Fix exception when the scheme is missing in endpointUrl for S3 (#7129) * Fix exception when the scheme is missing in endpointUrl for S3 * add null check	2019-02-25 11:10:35 -08:00
Himanshu Pandey	8b803cbc22	Added checkstyle for "Methods starting with Capital Letters" (#7118) * Added checkstyle for "Methods starting with Capital Letters" and changed the method names violating this. * Un-abbreviate the method names in the calcite tests * Fixed checkstyle errors * Changed asserts position in the code	2019-02-23 20:10:31 -08:00
David Glasser	1c2753ab90	ParallelIndexSubTask: support ingestSegment in delegating factories (#7089) IndexTask had special-cased code to properly send a TaskToolbox to a IngestSegmentFirehoseFactory that's nested inside a CombiningFirehoseFactory, but ParallelIndexSubTask didn't. This change refactors IngestSegmentFirehoseFactory so that it doesn't need a TaskToolbox; it instead gets a CoordinatorClient and a SegmentLoaderFactory directly injected into it. This also refactors SegmentLoaderFactory so it doesn't depend on an injectable SegmentLoaderConfig, since its only method always replaces the preconfigured SegmentLoaderConfig anyway. This makes it possible to use SegmentLoaderFactory without setting druid.segmentCaches.locations to some dummy value. Another goal of this PR is to make it possible for IngestSegmentFirehoseFactory to list data segments outside of connect() --- specifically, to make it a FiniteFirehoseFactory which can query the coordinator in order to calculate its splits. See #7048. This also adds missing datasource name URL-encoding to an API used by CoordinatorBasedSegmentHandoffNotifier.	2019-02-23 17:02:56 -08:00
Justin Borromeo	69b24bd851	Merge branch 'master' into 6088-Time-Ordering-On-Scans-N-Way-Merge	2019-02-22 18:13:26 -08:00
Justin Borromeo	06a5218917	Wrote docs	2019-02-22 16:59:57 -08:00
Jihoon Son	4e2b085201	Remove DataSegmentFinder, InsertSegmentToDb, and descriptor.json file in deep storage (#6911) * Remove DataSegmentFinder, InsertSegmentToDb, and descriptor.json file * delete descriptor.file when killing segments * fix test * Add doc for ha * improve warning	2019-02-20 15:10:29 -08:00
Surekha	80a2ef7be4	Support kafka transactional topics (#5404) (#6496) * Support kafka transactional topics * update kafka to version 2.0.0 * Remove the skipOffsetGaps option since it's not used anymore * Adjust kafka consumer to use transactional semantics * Update tests * Remove unused import from test * Fix compilation * Invoke transaction api to fix a unit test * temporary modification of travis.yml for debugging * another attempt to get travis tasklogs * update kafka to 2.0.1 at all places * Remove druid-kafka-eight dependency from integration-tests, remove the kafka firehose test and deprecate kafka-eight classes * Add deprecated in docs for kafka-eight and kafka-simple extensions * Remove skipOffsetGaps and code changes for transaction support * Fix indentation * remove skipOffsetGaps from kinesis * Add transaction api to KafkaRecordSupplierTest * Fix indent * Fix test * update kafka version to 2.1.0	2019-02-18 11:50:08 -08:00
scrawfor	0fa9000849	Add Postgresql SqlFirehose (#6813) * Add Postgresql SqlFirehose * Fix Code Style. * Fix style. * Fix Import Order. * Add Line Break before package.	2019-02-14 22:52:03 -08:00
Mingming Qiu	d0abf5c20a	fix kafka index task doesn't resume when recieve duplicate request (#6990) * fix kafka index task doesn't resume when recieve duplicate request * add unit test	2019-02-12 13:24:28 -08:00
Ankit Kothari	16a4a50e91	[Issue #6967] NoClassDefFoundError when using druid-hdfs-storage (#7015) * Fix: 1. hadoop-common dependency for druid-hdfs and druid-kerberos extensions Refactoring: 2. Hadoop config call in the inner static class to avoid class path conflicts for stopGracefully kill * Fix: 1. hadoop-common test dependency * Fix: 1. Avoid issue of kill command once the job is actually completed	2019-02-08 18:26:37 -08:00
Jonathan Wei	fafbc4a80e	Set version to 0.15.0-incubating-SNAPSHOT (#7014)	2019-02-07 14:02:52 -08:00
Furkan KAMACI	3097562adf	Improper getter value is fixed. (#6930) * Improper getter value is fixed. * Test class is added.	2019-02-07 11:51:07 -08:00
anantmf	315ccb76b8	Fix for getSingleObjectSummary, replacing keyCount with objectSummaries().size (#7000) * Instead of using keyCount, changing it to check the size of objectSummaries. For issue: https://github.com/apache/incubator-druid/issues/6980 https://github.com/apache/incubator-druid/issues/6980#issuecomment-460006580 * Changing another usage of keyCount with size of objectSummaries. * Adding some comments to explain why using keyCount is not working as expected.	2019-02-05 15:45:44 -08:00
Jonathan Wei	8bc5eaa908	Set version to 0.14.0-incubating-SNAPSHOT (#7003)	2019-02-04 19:36:20 -08:00
Roman Leventov	0e926e8652	Prohibit assigning concurrent maps into Map-typed variables and fields and fix a race condition in CoordinatorRuleManager (#6898) * Prohibit assigning concurrent maps into Map-types variables and fields; Fix a race condition in CoordinatorRuleManager; improve logic in DirectDruidClient and ResourcePool * Enforce that if compute(), computeIfAbsent(), computeIfPresent() or merge() is called on a ConcurrentHashMap, it's stored in a ConcurrentHashMap-typed variable, not ConcurrentMap; add comments explaining get()-before-computeIfAbsent() optimization; refactor Counters; fix a race condition in Intialization.java * Remove unnecessary comment * Checkstyle * Fix getFromExtensions() * Add a reference to the comment about guarded computeIfAbsent() optimization; IdentityHashMap optimization * Fix UriCacheGeneratorTest * Workaround issue with MaterializedViewQueryQueryToolChest * Strengthen Appenderator's contract regarding concurrency	2019-02-04 09:18:12 -08:00
Clint Wylie	6207b66e20	fix build (#6994)	2019-02-03 09:38:51 -08:00
Jonathan Wei	953b96d0a4	Add more sketch aggregator support in Druid SQL (#6951) * Add more sketch aggregator support in Druid SQL * Add docs * Tweak module serde register * Fix tests * Checkstyle * Test fix * PR comment * PR comment * PR comments	2019-02-02 22:34:53 -08:00
Surekha	7baa33049c	Introduce published segment cache in broker (#6901) * Add published segment cache in broker * Change the DataSegment interner so it's not based on DataSEgment's equals only and size is preserved if set * Added a trueEquals to DataSegment class * Use separate interner for realtime and historical segments * Remove trueEquals as it's not used anymore, change log message * PR comments * PR comments * Fix tests * PR comments * Few more modification to * change the coordinator api * removeall segments at once from MetadataSegmentView in order to serve a more consistent view of published segments * Change the poll behaviour to avoid multiple poll execution at same time * minor changes * PR comments * PR comments * Make the segment cache in broker off by default * Added a config to PlannerConfig * Moved MetadataSegmentView to sql module * Add doc for new planner config * Update documentation * PR comments * some more changes * PR comments * fix test * remove unintentional change, whether to synchronize on lifecycleLock is still in discussion in PR * minor changes * some changes to initialization * use pollPeriodInMS * Add boolean cachePopulated to check if first poll succeeds * Remove poll from start() * take the log message out of condition in stop()	2019-02-02 22:27:13 -08:00
Clint Wylie	7a5827e12e	bloom filter sql aggregator (#6950) * adds sql aggregator for bloom filter, adds complex value serde for sql results * fix tests * checkstyle * fix copy-paste	2019-02-01 13:54:46 -08:00
Roman Leventov	f7df5fedcc	Add several missing inspectRuntimeShape() calls (#6893) * Add several missing inspectRuntimeShape() calls * Add lgK to runtime shapes	2019-01-31 20:04:26 -08:00
Jihoon Son	d4fbbb8deb	Support protocol configuration for S3 (#6954) * Support protocol configuration for S3 * Add doc	2019-01-30 19:32:00 -08:00
Clint Wylie	a6d81c0d16	Adds bloom filter aggregator to 'druid-bloom-filters' extension (#6397) * blooming aggs * partially address review * fix docs * minor test refactor after rebase * use copied bloomkfilter * add ByteBuffer methods to BloomKFilter to allow agg to use in place, simplify some things, more tests * add methods to BloomKFilter to get number of set bits, use in comparator, fixes * more docs * fix * fix style * simplify bloomfilter bytebuffer merge, change methods to allow passing buffer offsets * oof, more fixes * more sane docs example * fix it * do the right thing in the right place * formatting * fix * avoid conflict * typo fixes, faster comparator, docs for comparator behavior * unused imports * use buffer comparator instead of deserializing * striped readwrite lock for buffer agg, null handling comparator, other review changes * style fixes * style * remove sync for now * oops * consistency * inspect runtime shape of selector instead of selector plus, static comparator, add inner exception on serde exception * CardinalityBufferAggregator inspect selectors instead of selectorPluses * fix style * refactor away from using ColumnSelectorPlus and ColumnSelectorStrategyFactory to instead use specialized aggregators for each supported column type, other review comments * adjustment * fix teamcity error? * rename nil aggs to empty, change empty agg constructor signature, add comments * use stringutils base64 stuff to be chill with master * add aggregate combiner, comment	2019-01-29 20:05:17 +07:00
Gian Merlino	ba33bdc497	Add exclusions to limit doubling up on jars. (#6927)	2019-01-28 11:06:30 -08:00
Clint Wylie	af3cbc3687	add bloom filter druid expression (#6904) * add "bloom_filter_test" druid expression to support bloom filters in ExpressionVirtualColumn and ExpressionDimFilter and sql expressions * more docs * use java.util.Base64, doc fixes	2019-01-28 08:41:45 -05:00
Benedict Jin	72a571fbf7	For performance reasons, use `java.util.Base64` instead of Base64 in Apache Commons Codec and Guava (#6913) * * Add few methods about base64 into StringUtils * Use `java.util.Base64` instead of others * Add org.apache.commons.codec.binary.Base64 & com.google.common.io.BaseEncoding into druid-forbidden-apis * Rename encodeBase64String & decodeBase64String * Update druid-forbidden-apis	2019-01-25 17:32:29 -08:00
Ankit Kothari	8492d94f59	Kill Hadoop MR task on kill of Hadoop ingestion task (#6828) * KillTask from overlord UI now makes sure that it terminates the underlying MR job, thus saving unnecessary compute Run in jobby is now split into 2 1. submitAndGetHadoopJobId followed by 2. run submitAndGetHadoopJobId is responsible for submitting the job and returning the jobId as a string, run monitors this job for completion JobHelper writes this jobId in the path provided by HadoopIndexTask which in turn is provided by the ForkingTaskRunner HadoopIndexTask reads this path when kill task is clicked to get hte jobId and fire the kill command via the yarn api. This is taken care in the stopGracefully method which is called in SingleTaskBackgroundRunner. Have enabled `canRestore` method to return `true` for HadoopIndexTask in order for the stopGracefully method to be called HadoopJob files have been changed to incorporate the changes to jobby Addressing PR comments * Addressing PR comments - Fix taskDir * Addressing PR comments - For changing the contract of Task.stopGracefully() `SingleTaskBackgroundRunner` calls stopGracefully in stop() and then checks for canRestore condition to return the status of the task * Addressing PR comments 1. Formatting 2. Removing `submitAndGetHadoopJobId` from `Jobby` and calling writeJobIdToFile in the job itself * Addressing PR comments 1. POM change. Moving hadoop dependency to indexing-hadoop * Addressing PR comments 1. stopGracefully now accepts TaskConfig as a param Handling isRestoreOnRestart in stopGracefully for `AppenderatorDriverRealtimeIndexTask, RealtimeIndexTask, SeekableStreamIndexTask` Changing tests to make TaskConfig param isRestoreOnRestart to true	2019-01-25 15:43:06 -08:00
Clint Wylie	ffded61f5e	fix build (#6897)	2019-01-21 17:18:14 -08:00
Roman Leventov	8eae26fd4e	Introduce SegmentId class (#6370) * Introduce SegmentId class * tmp * Fix SelectQueryRunnerTest * Fix indentation * Fixes * Remove Comparators.inverse() tests * Refinements * Fix tests * Fix more tests * Remove duplicate DataSegmentTest, fixes #6064 * SegmentDescriptor doc * Fix SQLMetadataStorageUpdaterJobHandler * Fix DataSegment deserialization for ignoring id * Add comments * More comments * Address more comments * Fix compilation * Restore segment2 in SystemSchemaTest according to a comment * Fix style * fix testServerSegmentsTable * Fix compilation * Add comments about why SegmentId and SegmentIdWithShardSpec are separate classes * Fix SystemSchemaTest * Fix style * Compare SegmentDescriptor with SegmentId in Javadoc and comments rather than with DataSegment * Remove a link, see https://youtrack.jetbrains.com/issue/IDEA-205164 * Fix compilation	2019-01-21 11:11:10 -08:00
Jonathan Wei	68f744ec0a	Fixed buckets histogram aggregator (#6638) * Fixed buckets histogram aggregator * PR comments * More PR comments * Checkstyle * TeamCity * More TeamCity * PR comment * PR comment * Fix doc formatting	2019-01-17 14:51:16 -08:00
Alexander Saydakov	161dac1d23	datasketches quantiles module - implemented makeAggregateCombiner (#6882) * implemented makeAggregateCombiner * fixed import order	2019-01-17 14:09:55 -08:00
Dayue Gao	5b8a221713	Add SQL id, request logs, and metrics (#6302) * use SqlLifecyle to manage sql execution, add sqlId * add sql request logger * fix UT * rename sqlId to sqlQueryId, sql/time to sqlQuery/time, etc * add docs and more sql request logger impls * add UT for http and jdbc * fix forbidden use of com.google.common.base.Charsets * fix UT in QuantileSqlAggregatorTest, supressed unused warning of getSqlQueryId * do not use default method in QueryMetrics interface * capitalize 'sql' everywhere in the non-property parts of the docs * use RequestLogger interface to log sql query * minor bugfixes and add switching request logger * add filePattern configs for FileRequestLogger * address review comments, adjust sql request log format * fix inspection error * try SuppressWarnings("RedundantThrows") to fix inspection error on ComposingRequestLoggerProvider	2019-01-15 23:12:59 -08:00
Charles Allen	5d2947cd52	Use Guava Compatible immediate executor service (#6815) * Use multi-guava version friendly direct executor implementation * Don't use a singleton * Fix strict compliation complaints * Copy Guava's DirectExecutor * Fix javadoc * Imports are the devil	2019-01-11 10:42:19 -08:00
Jonathan Wei	b18d681551	Use kafka_2.12-0.10.2.2 (#6846)	2019-01-10 20:52:55 -08:00
Jihoon Son	c35a39d70b	Add support maxRowsPerSegment for auto compaction (#6780) * Add support maxRowsPerSegment for auto compaction * fix build * fix build * fix teamcity * add test * fix test * address comment	2019-01-10 09:50:14 -08:00
Clint Wylie	ccfd1244d1	fix parquet parse performance issue (#6833) * check that value is present before conversion to prevent silent, expensive exception and fix another bug * cleanup * now with less parenthesis	2019-01-10 09:18:57 -08:00
Mingming Qiu	6761663509	make kafka poll timeout can be configured (#6773) * make kafka poll timeout can be configured * add doc * rename DEFAULT_POLL_TIMEOUT to DEFAULT_POLL_TIMEOUT_MILLIS	2019-01-03 12:16:02 +08:00
Joshua Sun	7c7997e8a1	Add Kinesis Indexing Service to core Druid (#6431) * created seekablestream classes * created seekablestreamsupervisor class * first attempt to integrate kafa indexing service to use SeekableStream * seekablestream bug fixes * kafkarecordsupplier * integrated kafka indexing service with seekablestream * implemented resume/suspend and refactored some package names * moved kinesis indexing service into core druid extensions * merged some changes from kafka supervisor race condition * integrated kinesis-indexing-service with seekablestream * unite tests for kinesis-indexing-service * various bug fixes for kinesis-indexing-service * refactored kinesisindexingtask * finished up more kinesis unit tests * more bug fixes for kinesis-indexing-service * finsihed refactoring kinesis unit tests * removed KinesisParititons and KafkaPartitions to use SeekableStreamPartitions * kinesis-indexing-service code cleanup and docs * merge #6291 merge #6337 merge #6383 * added more docs and reordered methods * fixd kinesis tests after merging master and added docs in seekablestream * fix various things from pr comment * improve recordsupplier and add unit tests * migrated to aws-java-sdk-kinesis * merge changes from master * fix pom files and forbiddenapi checks * checkpoint JavaType bug fix * fix pom and stuff * disable checkpointing in kinesis * fix kinesis sequence number null in closed shard * merge changes from master * fixes for kinesis tasks * capitalized <partitionType, sequenceType> * removed abstract class loggers * conform to guava api restrictions * add docker for travis other modules test * address comments * improve RecordSupplier to supply records in batch * fix strict compile issue * add test scope for localstack dependency * kinesis indexing task refactoring * comments * github comments * minor fix * removed unneeded readme * fix deserialization bug * fix various bugs * KinesisRecordSupplier unable to catch up to earliest position in stream bug fix * minor changes to kinesis * implement deaggregate for kinesis * Merge remote-tracking branch 'upstream/master' into seekablestream * fix kinesis offset discrepancy with kafka * kinesis record supplier disable getPosition * pr comments * mock for kinesis tests and remove docker dependency for unit tests * PR comments * avg lag in kafkasupervisor #6587 * refacotred SequenceMetadata in taskRunners * small fix * more small fix * recordsupplier resource leak * revert .travis.yml formatting * fix style * kinesis docs * doc part2 * more docs * comments * comments2 revert string replace changes * comments * teamcity * comments part 1 * comments part 2 * comments part 3 * merge #6754 * fix injection binding * comments * KinesisRegion refactor * comments part idk lol * can't think of a commit msg anymore * remove possiblyResetDataSourceMetadata() for IncrementalPublishingTaskRunner * commmmmmmmmmmments * extra error handling in KinesisRecordSupplier getRecords * comments * quickfix * typo * oof	2018-12-21 12:49:24 -07:00
Jihoon Son	4591c56afb	Fix error handling after pause request in Kafka supervisor (#6754) * Fix error handling after pause request in kafka supervisor * fix test * fix test	2018-12-18 17:52:44 -08:00
Clint Wylie	4ec068642d	move parquet extension input formats up a level to `org.apache.druid.data.input.parquet.DruidParquetInputFormat` for `parquet` and `org.apache.druid.data.input.parquet.DruidParquetAvroInputFormat` for `parquet-avro` (#6727)	2018-12-13 16:33:42 -08:00
Atul Mohan	86e3ae5b48	Add fail message (#6720)	2018-12-11 08:05:50 -08:00
Gian Merlino	b7709e1245	FileUtils: Sync directory entry too on writeAtomically. (#6677) * FileUtils: Sync directory entry too on writeAtomically. See the fsync(2) man page for why this is important: https://linux.die.net/man/2/fsync This also plumbs CompressionUtils's "zip" function through writeAtomically, so the code for handling atomic local filesystem writes is all done in the same place. * Remove unused import. * Avoid FileOutputStream. * Allow non-atomic writes to overwrite. * Add some comments. And no need to flush an unbuffered stream.	2018-12-08 17:12:59 +01:00
Clint Wylie	43adb391c2	remove AbstractResourceFilter.isApplicable because it is not (#6691) * remove AbstractResourceFilter.isApplicable because it is not, add tests for OverlordResource.doShutdown and OverlordResource.shutdownTasksForDatasource * cleanup	2018-12-01 21:52:31 +08:00
Roman Leventov	ec38df7575	Simplify DruidNodeDiscoveryProvider; add DruidNodeDiscovery.Listener.nodeViewInitialized() (#6606) * Simplify DruidNodeDiscoveryProvider; add DruidNodeDiscovery.Listener.nodeViewInitialized() method; prohibit and eliminate some suboptimal Java 8 patterns * Fix style * Fix HttpEmitterTest.timeoutEmptyQueue() * Add DruidNodeDiscovery.Listener.nodeViewInitialized() calls in tests * Clarify code	2018-12-01 01:12:56 +01:00
陈春斌	624f328ea1	lazy create descriptor in ProtobufInputRowParser (#6678)	2018-11-28 21:59:29 -08:00
Mingming Qiu	c5405bb592	emit maxLag/avgLag in KafkaSupervisor (#6587) * emit maxLag/totalLag/avgLag in KafkaSupervisor * modify ingest/kafka/totalLag to ingest/kafka/lag for backwards compatibility	2018-11-28 02:11:14 -08:00

496 Commits