Commit Graph

244 Commits

Author SHA1 Message Date
Surekha 80a2ef7be4 Support kafka transactional topics (#5404) (#6496)
* Support kafka transactional topics

* update kafka to version 2.0.0
* Remove the skipOffsetGaps option since it's not used anymore
* Adjust kafka consumer to use transactional semantics
* Update tests

* Remove unused import from test

* Fix compilation

* Invoke transaction api to fix a unit test

* temporary modification of travis.yml for debugging

* another attempt to get travis tasklogs

* update kafka to 2.0.1 at all places

* Remove druid-kafka-eight dependency from integration-tests, remove the kafka firehose test and deprecate kafka-eight classes

* Add deprecated in docs for kafka-eight and kafka-simple extensions

* Remove skipOffsetGaps and code changes for transaction support

* Fix indentation

* remove skipOffsetGaps from kinesis

* Add transaction api to KafkaRecordSupplierTest

* Fix indent

* Fix test

* update kafka version to 2.1.0
2019-02-18 11:50:08 -08:00
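
The consumer-side change described above ("adjust kafka consumer to use transactional semantics") amounts, at the Kafka client level, to reading with isolation.level=read_committed so records from aborted transactions are skipped. A minimal sketch using the plain Kafka 2.x consumer API; the broker address, group id, and topic are placeholders, and this is not Druid's actual KafkaRecordSupplier:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ReadCommittedConsumerExample
{
  public static void main(String[] args)
  {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");           // placeholder group
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
              "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
              "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    // Only return messages from committed transactions (plus non-transactional messages);
    // records from aborted transactions are filtered out by the consumer.
    props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

    try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(Collections.singletonList("example-topic")); // placeholder topic
      ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(1000));
      for (ConsumerRecord<byte[], byte[]> record : records) {
        System.out.printf("offset=%d size=%d%n", record.offset(), record.value().length);
      }
    }
  }
}
```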
Mingming Qiu d0abf5c20a fix kafka index task not resuming when it receives a duplicate request (#6990)
* fix kafka index task not resuming when it receives a duplicate request

* add unit test
2019-02-12 13:24:28 -08:00
Jonathan Wei fafbc4a80e
Set version to 0.15.0-incubating-SNAPSHOT (#7014) 2019-02-07 14:02:52 -08:00
Jonathan Wei 8bc5eaa908
Set version to 0.14.0-incubating-SNAPSHOT (#7003) 2019-02-04 19:36:20 -08:00
Roman Leventov 0e926e8652 Prohibit assigning concurrent maps into Map-typed variables and fields and fix a race condition in CoordinatorRuleManager (#6898)
* Prohibit assigning concurrent maps into Map-typed variables and fields; Fix a race condition in CoordinatorRuleManager; improve logic in DirectDruidClient and ResourcePool

* Enforce that if compute(), computeIfAbsent(), computeIfPresent() or merge() is called on a ConcurrentHashMap, it's stored in a ConcurrentHashMap-typed variable, not ConcurrentMap; add comments explaining get()-before-computeIfAbsent() optimization; refactor Counters; fix a race condition in Initialization.java

* Remove unnecessary comment

* Checkstyle

* Fix getFromExtensions()

* Add a reference to the comment about guarded computeIfAbsent() optimization; IdentityHashMap optimization

* Fix UriCacheGeneratorTest

* Workaround issue with MaterializedViewQueryQueryToolChest

* Strengthen Appenderator's contract regarding concurrency
2019-02-04 09:18:12 -08:00
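
A sketch of the convention the commit above enforces: when compute()/computeIfAbsent()/merge() is called on a ConcurrentHashMap, the field is declared as ConcurrentHashMap (not Map or ConcurrentMap), and a cheap get() probe precedes computeIfAbsent(). Class and field names here are illustrative, not Druid code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class TaskCounters
{
  // Declared as ConcurrentHashMap rather than Map/ConcurrentMap, so readers of the field
  // know computeIfAbsent() here has ConcurrentHashMap's atomicity (and cost) characteristics.
  private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

  void add(String key, long delta)
  {
    // get()-before-computeIfAbsent() optimization: on Java 8, computeIfAbsent() can take a
    // bin lock even when the mapping already exists, so probing with get() first keeps the
    // hot path cheap.
    AtomicLong counter = counters.get(key);
    if (counter == null) {
      counter = counters.computeIfAbsent(key, k -> new AtomicLong());
    }
    counter.addAndGet(delta);
  }
}
```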
Ankit Kothari 8492d94f59 Kill Hadoop MR task on kill of Hadoop ingestion task (#6828)
* KillTask from overlord UI now makes sure that it terminates the underlying MR job, thus avoiding unnecessary compute

Run in Jobby is now split into two steps:
 1. submitAndGetHadoopJobId, followed by 2. run
  submitAndGetHadoopJobId is responsible for submitting the job and returning the jobId as a string; run monitors this job for completion

JobHelper writes this jobId in the path provided by HadoopIndexTask which in turn is provided by the ForkingTaskRunner

HadoopIndexTask reads this path when the kill task is clicked to get the jobId and fire the kill command via the YARN API. This is taken care of in the stopGracefully method which is called in SingleTaskBackgroundRunner. Have enabled `canRestore` method to return `true` for HadoopIndexTask in order for the stopGracefully method to be called

Hadoop*Job files have been changed to incorporate the changes to jobby

* Addressing PR comments

* Addressing PR comments - Fix taskDir

* Addressing PR comments - For changing the contract of Task.stopGracefully()
`SingleTaskBackgroundRunner` calls stopGracefully in stop() and then checks for canRestore condition to return the status of the task

* Addressing PR comments
 1. Formatting
 2. Removing `submitAndGetHadoopJobId` from `Jobby` and calling writeJobIdToFile in the job itself

* Addressing PR comments
 1. POM change. Moving hadoop dependency to indexing-hadoop

* Addressing PR comments
 1. stopGracefully now accepts TaskConfig as a param
     Handling isRestoreOnRestart in stopGracefully for `AppenderatorDriverRealtimeIndexTask, RealtimeIndexTask, SeekableStreamIndexTask`
     Changing tests to make TaskConfig param isRestoreOnRestart to true
2019-01-25 15:43:06 -08:00
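
A rough sketch of the kill path described above: read back the persisted jobId and terminate the MR job through the stock Hadoop client API. The file location and class names are placeholders, and this is not Druid's actual JobHelper/HadoopIndexTask code:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobID;

class HadoopJobKiller
{
  /**
   * Reads the jobId previously written by the indexing job and asks the cluster to kill it.
   * "jobIdFile" stands in for the path the task runner hands to the task.
   */
  static void killIfRunning(Path jobIdFile) throws IOException, InterruptedException
  {
    if (!Files.exists(jobIdFile)) {
      return; // job was never submitted, nothing to kill
    }
    String jobIdString = new String(Files.readAllBytes(jobIdFile), StandardCharsets.UTF_8).trim();
    Cluster cluster = new Cluster(new Configuration());
    Job job = cluster.getJob(JobID.forName(jobIdString));
    if (job != null) {
      job.killJob(); // terminates the underlying MR/YARN application
    }
  }
}
```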
Roman Leventov 8eae26fd4e Introduce SegmentId class (#6370)
* Introduce SegmentId class

* tmp

* Fix SelectQueryRunnerTest

* Fix indentation

* Fixes

* Remove Comparators.inverse() tests

* Refinements

* Fix tests

* Fix more tests

* Remove duplicate DataSegmentTest, fixes #6064

* SegmentDescriptor doc

* Fix SQLMetadataStorageUpdaterJobHandler

* Fix DataSegment deserialization for ignoring id

* Add comments

* More comments

* Address more comments

* Fix compilation

* Restore segment2 in SystemSchemaTest according to a comment

* Fix style

* fix testServerSegmentsTable

* Fix compilation

* Add comments about why SegmentId and SegmentIdWithShardSpec are separate classes

* Fix SystemSchemaTest

* Fix style

* Compare SegmentDescriptor with SegmentId in Javadoc and comments rather than with DataSegment

* Remove a link, see https://youtrack.jetbrains.com/issue/IDEA-205164

* Fix compilation
2019-01-21 11:11:10 -08:00
Charles Allen 5d2947cd52 Use Guava Compatible immediate executor service (#6815)
* Use multi-guava version friendly direct executor implementation

* Don't use a singleton

* Fix strict compliation complaints

* Copy Guava's DirectExecutor

* Fix javadoc

* Imports are the devil
2019-01-11 10:42:19 -08:00
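
The idea behind the "multi-guava version friendly" direct executor above is an executor that simply runs each task on the calling thread, removing the dependency on Guava's sameThreadExecutor()/directExecutor(), whose names and availability changed across Guava versions. A minimal illustrative sketch, not the class Druid actually copied:

```java
import java.util.concurrent.Executor;

final class DirectExecution
{
  private DirectExecution() {}

  // Runs each submitted task synchronously on the caller's thread; no Guava dependency,
  // so it keeps working regardless of which Guava version is on the classpath.
  static Executor directExecutor()
  {
    return Runnable::run;
  }
}
```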
Jonathan Wei b18d681551 Use kafka_2.12-0.10.2.2 (#6846) 2019-01-10 20:52:55 -08:00
Jihoon Son c35a39d70b
Add support for maxRowsPerSegment for auto compaction (#6780)
* Add support for maxRowsPerSegment for auto compaction

* fix build

* fix build

* fix teamcity

* add test

* fix test

* address comment
2019-01-10 09:50:14 -08:00
Mingming Qiu 6761663509 make kafka poll timeout configurable (#6773)
* make kafka poll timeout configurable

* add doc

* rename DEFAULT_POLL_TIMEOUT to DEFAULT_POLL_TIMEOUT_MILLIS
2019-01-03 12:16:02 +08:00
Joshua Sun 7c7997e8a1 Add Kinesis Indexing Service to core Druid (#6431)
* created seekablestream classes

* created seekablestreamsupervisor class

* first attempt to integrate kafka indexing service to use SeekableStream

* seekablestream bug fixes

* kafkarecordsupplier

* integrated kafka indexing service with seekablestream

* implemented resume/suspend and refactored some package names

* moved kinesis indexing service into core druid extensions

* merged some changes from kafka supervisor race condition

* integrated kinesis-indexing-service with seekablestream

* unit tests for kinesis-indexing-service

* various bug fixes for kinesis-indexing-service

* refactored kinesisindexingtask

* finished up more kinesis unit tests

* more bug fixes for kinesis-indexing-service

* finished refactoring kinesis unit tests

* removed KinesisPartitions and KafkaPartitions to use SeekableStreamPartitions

* kinesis-indexing-service code cleanup and docs

* merge #6291

merge #6337

merge #6383

* added more docs and reordered methods

* fixed kinesis tests after merging master and added docs in seekablestream

* fix various things from pr comment

* improve recordsupplier and add unit tests

* migrated to aws-java-sdk-kinesis

* merge changes from master

* fix pom files and forbiddenapi checks

* checkpoint JavaType bug fix

* fix pom and stuff

* disable checkpointing in kinesis

* fix kinesis sequence number null in closed shard

* merge changes from master

* fixes for kinesis tasks

* capitalized <partitionType, sequenceType>

* removed abstract class loggers

* conform to guava api restrictions

* add docker for travis other modules test

* address comments

* improve RecordSupplier to supply records in batch

* fix strict compile issue

* add test scope for localstack dependency

* kinesis indexing task refactoring

* comments

* github comments

* minor fix

* removed unneeded readme

* fix deserialization bug

* fix various bugs

* KinesisRecordSupplier unable to catch up to earliest position in stream bug fix

* minor changes to kinesis

* implement deaggregate for kinesis

* Merge remote-tracking branch 'upstream/master' into seekablestream

* fix kinesis offset discrepancy with kafka

* kinesis record supplier disable getPosition

* pr comments

* mock for kinesis tests and remove docker dependency for unit tests

* PR comments

* avg lag in kafkasupervisor #6587

* refactored SequenceMetadata in taskRunners

* small fix

* more small fix

* recordsupplier resource leak

* revert .travis.yml formatting

* fix style

* kinesis docs

* doc part2

* more docs

* comments

* comments*2

* revert string replace changes

* comments

* teamcity

* comments part 1

* comments part 2

* comments part 3

* merge #6754

* fix injection binding

* comments

* KinesisRegion refactor

* comments part idk lol

* can't think of a commit msg anymore

* remove possiblyResetDataSourceMetadata() for IncrementalPublishingTaskRunner

* commmmmmmmmmments

* extra error handling in KinesisRecordSupplier getRecords

* comments

* quickfix

* typo

* oof
2018-12-21 12:49:24 -07:00
Jihoon Son 4591c56afb Fix error handling after pause request in Kafka supervisor (#6754)
* Fix error handling after pause request in kafka supervisor

* fix test

* fix test
2018-12-18 17:52:44 -08:00
Roman Leventov ec38df7575
Simplify DruidNodeDiscoveryProvider; add DruidNodeDiscovery.Listener.nodeViewInitialized() (#6606)
* Simplify DruidNodeDiscoveryProvider; add DruidNodeDiscovery.Listener.nodeViewInitialized() method; prohibit and eliminate some suboptimal Java 8 patterns

* Fix style

* Fix HttpEmitterTest.timeoutEmptyQueue()

* Add DruidNodeDiscovery.Listener.nodeViewInitialized() calls in tests

* Clarify code
2018-12-01 01:12:56 +01:00
Mingming Qiu c5405bb592 emit maxLag/avgLag in KafkaSupervisor (#6587)
* emit maxLag/totalLag/avgLag in KafkaSupervisor

* modify ingest/kafka/totalLag to ingest/kafka/lag for backwards compatibility
2018-11-28 02:11:14 -08:00
Roman Leventov 87b96fb1fd
Add checkstyle rules about imports and empty lines between members (#6543)
* Add checkstyle rules about imports and empty lines between members

* Add suppressions

* Update Eclipse import order

* Add empty line

* Fix StatsDEmitter
2018-11-20 12:42:15 +01:00
Joshua Sun 7928e81e3f kafkasupervisor checkpointing bug (#6639) 2018-11-16 16:09:55 -08:00
hate13 c4cb4b4909 optimize loading end offsets of all partitions (#6623) 2018-11-16 15:24:01 +08:00
Jihoon Son d738ce4d2a Enforce logging when killing a task (#6621)
* Enforce logging when killing a task

* fix test

* address comment

* address comment
2018-11-16 10:01:56 +08:00
David Lim afb239b17a add missing license headers, in particular to MD files; clean up RAT … (#6563)
* add missing license headers, in particular to MD files; clean up RAT exclusions

* revert inadvertent doc changes

* docs

* cr changes

* fix modified druid-production.svg
2018-11-13 09:38:37 -08:00
Clint Wylie e326086604 fix kafka indexing task not processing through end offsets on publish, fixes #6602 (#6603) 2018-11-12 14:27:32 -08:00
Roman Leventov 54351a5c75 Fix various bugs; Enable more IntelliJ inspections and update error-prone (#6490)
* Fix various bugs; Enable more IntelliJ inspections and update error-prone

* Fix NPE

* Fix inspections

* Remove unused imports
2018-11-06 14:38:08 -08:00
QiuMM 676f5e6d7f Prohibit some guava collection APIs and use JDK collection APIs directly (#6511)
* Prohibit some guava collection APIs and use JDK APIs directly

* reset files that changed by accident

* sort codestyle/druid-forbidden-apis.txt alphabetically
2018-10-29 13:02:43 +01:00
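
The flavor of substitution the forbidden-APIs entries above enforce (the authoritative list is codestyle/druid-forbidden-apis.txt, mentioned in the commit): prefer the JDK collection constructors over Guava's factory methods. An illustrative before/after:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class JdkCollectionsExample
{
  void example()
  {
    // Instead of Guava's Lists.newArrayList() / Maps.newHashMap(), which mainly predate
    // the diamond operator, call the JDK constructors directly:
    List<String> names = new ArrayList<>();
    Map<String, Integer> counts = new HashMap<>();
    names.add("segment");
    counts.put("segment", 1);
  }
}
```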
Joshua Sun f7753ef1e2 fix KafkaSupervisor stats report error (#6508)
* fix kafkasupervisor stats 500

* added unit test

* throw error if group already exists
2018-10-25 15:45:54 -07:00
Roman Leventov 84ac18dc1b
Catch some incorrect method parameter or call argument formatting patterns with checkstyle (#6461)
* Catch some incorrect method parameter or call argument formatting patterns with checkstyle

* Fix DiscoveryModule

* Inline parameters_and_arguments.txt

* Fix a bug in PolyBind

* Fix formatting
2018-10-23 07:17:38 -03:00
elloooooo 1e82b6291e Remove consumer.listTopics() usage, since too many topics in kafka can cause full GC in the Overlord (#6455)
* remove consumer.listTopics() method

* add consumerLock and exception handling for consumer.partitionsFor() and remove some useless checks

* add check in case consumer.partitionsFor() returns null

* fix CI failure

* fix failed UT

* Revert "fix CI failure"

This reverts commit f839d09e1e.

* revert useless commit and re-commit the useful part to fix failed UT
2018-10-22 10:46:31 -07:00
Roman Leventov aa121da25f Use NodeType enum instead of Strings (#6377)
* Use NodeType enum instead of Strings

* Make NodeType constants uppercase

* Fix CommonCacheNotifier and NodeType/ServerType comments

* Reconsidering comment

* Fix import

* Add a comment to CommonCacheNotifier.NODE_TYPES
2018-10-14 20:49:38 -07:00
Clint Wylie 84598fba3b combine druid-api, druid-common, java-util into druid-core (#6443)
* combine druid-api, druid-common, java-util

* spacing
2018-10-14 20:37:37 -07:00
Roman Leventov e3397ba00f Enforce Druid's exception class use (#6456) 2018-10-13 16:35:14 -07:00
David Lim 20ab213ba6 change project versions to 0.13.0-incubating-SNAPSHOT (#6453) 2018-10-11 19:28:01 -07:00
Atul Mohan ab7b4798cc Securing passwords used for SSL connections to Kafka (#6285)
* Secure credentials in consumer properties

* Merge master

* Refactor property population into separate method

* Fix property setter

* Fix tests
2018-10-11 10:03:01 -07:00
Jonathan Wei c7ac8785a1 Prevent failed KafkaConsumer creation from blocking overlord startup (#6383)
* Prevent failed KafkaConsumer creation from blocking overlord startup

* PR comments

* Fix random task ID length

* Adjust test timer

* Use Integer.SIZE
2018-10-03 19:08:20 -07:00
Roman Leventov 3ae563263a
Renamed 'Generic Column' -> 'Numeric Column'; Fixed a few resource leaks in processing; misc refinements (#5957)
This PR accumulates many refactorings and small improvements that I did while preparing the next change set of https://github.com/druid-io/druid/projects/2. I finally decided to make them a separate PR to minimize the volume of the main PR.

Some of the changes:
 - Renamed confusing "Generic Column" term to "Numeric Column" (what it actually implies) in many class names.
 - Generified `ComplexMetricExtractor`
2018-10-02 14:50:22 -03:00
QiuMM 993bc5e9d3 Fix Kafka Indexing Service notice handling thread that may never terminate (#6337)
* Fix Kafka Indexing Service notice handling thread that may never terminate

* address comment

* handle null value
2018-09-26 20:09:53 -07:00
QiuMM 00ea8c00ac using Entry directly instead of Map.Entry in KafkaSupervisor (#6291) 2018-09-26 19:01:36 -07:00
Jihoon Son 6fb503c073 Deprecate task audit logging (#6368)
* Deprecate task audit logging

* fix test

* fix it test
2018-09-26 16:28:02 -07:00
QiuMM 255214cbe6 correct variable name in KafkaSupervisor (#6354) 2018-09-20 16:22:03 -07:00
Joshua Sun 4fafc2ccc9 fixes race condition in kafkasupervisor (#6304)
* fixes race condition in kafkasupervisor

* async verify checkpoints

* fixes race condition in kafkasupervisor

* replace commonly used methods with variables

* remove countdownlatch import

* reformat

* fixes
2018-09-18 22:37:22 -07:00
Roman Leventov 0c4bd2b57b Prohibit some Random usage patterns (#6226)
* Prohibit Random usage patterns

* Fix FlattenJSONBenchmarkUtil
2018-09-14 13:35:51 -07:00
Roman Leventov d50b69e6d4 Prohibit LinkedList (#6112)
* Prohibit LinkedList

* Fix tests

* Fix

* Remove unused import
2018-09-13 18:07:06 -07:00
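
For the prohibition above, the usual replacements are ArrayList for list-style access and ArrayDeque for queue/deque-style access, both of which avoid LinkedList's per-node allocation and poor cache locality. Field names are illustrative only:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class NoLinkedListExample
{
  // List-style access: ArrayList instead of LinkedList.
  private final List<String> pendingSegments = new ArrayList<>();

  // Queue/deque-style access: ArrayDeque instead of LinkedList.
  private final Deque<Runnable> notices = new ArrayDeque<>();
}
```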
Clint Wylie 91a37c692d 'suspend' and 'resume' support for supervisors (kafka indexing service, materialized views) (#6234)
* 'suspend' and 'resume' support for kafka indexing service
changes:
* introduces `SuspendableSupervisorSpec` interface to describe supervisors which support suspend/resume functionality controlled through the `SupervisorManager`, which will gracefully shut down the supervisor and its tasks, update its `SupervisorSpec` with either a suspended or running state, and update with the toggled spec. Spec updates are provided by `SuspendableSupervisorSpec.createSuspendedSpec` and `SuspendableSupervisorSpec.createRunningSpec` respectively.
* `KafkaSupervisorSpec` extends `SuspendableSupervisorSpec` and now supports suspend/resume functionality. The difference in behavior between 'running' and 'suspended' state is whether the supervisor will attempt to ensure that indexing tasks are or are not running respectively. Behavior is identical otherwise.
* `SupervisorResource` now provides `/druid/indexer/v1/supervisor/{id}/suspend` and `/druid/indexer/v1/supervisor/{id}/resume` which are used to suspend/resume suspendable supervisors
* Deprecated `/druid/indexer/v1/supervisor/{id}/shutdown` and moved its functionality to `/druid/indexer/v1/supervisor/{id}/terminate` since 'shutdown' is ambiguous verbiage for something that effectively stops a supervisor forever
* Added ability to get all supervisor specs from `/druid/indexer/v1/supervisor` by supplying the 'full' query parameter `/druid/indexer/v1/supervisor?full` which will return a list of json objects of the form `{"id":<id>, "spec":<SupervisorSpec>}`
* Updated overlord console ui to enable suspend/resume, and changed 'shutdown' to 'terminate'

* move overlord console status to its own column in supervisor table so it does not look like garbage

* spacing

* padding

* other kind of spacing

* fix rebase fail

* fix more better

* all supervisors now suspendable, updated materialized view supervisor to support suspend, more tests

* fix log
2018-09-13 14:42:18 -07:00
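
The endpoints introduced above can be exercised with any HTTP client. A hedged sketch using java.net.http (Java 11+); the Overlord address and supervisor id are placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class SupervisorSuspendExample
{
  public static void main(String[] args) throws Exception
  {
    HttpClient client = HttpClient.newHttpClient();
    String overlord = "http://localhost:8090";     // assumed Overlord address
    String supervisorId = "my-kafka-supervisor";   // placeholder supervisor id

    // Suspend: the supervisor gracefully shuts down its tasks and its spec is stored as suspended.
    HttpRequest suspend = HttpRequest.newBuilder()
        .uri(URI.create(overlord + "/druid/indexer/v1/supervisor/" + supervisorId + "/suspend"))
        .POST(HttpRequest.BodyPublishers.noBody())
        .build();
    System.out.println(client.send(suspend, HttpResponse.BodyHandlers.ofString()).body());

    // Resume: the toggled spec is submitted again and indexing tasks are started back up.
    HttpRequest resume = HttpRequest.newBuilder()
        .uri(URI.create(overlord + "/druid/indexer/v1/supervisor/" + supervisorId + "/resume"))
        .POST(HttpRequest.BodyPublishers.noBody())
        .build();
    System.out.println(client.send(resume, HttpResponse.BodyHandlers.ofString()).body());

    // List all supervisor specs as {"id": <id>, "spec": <SupervisorSpec>} objects.
    HttpRequest listFull = HttpRequest.newBuilder()
        .uri(URI.create(overlord + "/druid/indexer/v1/supervisor?full"))
        .GET()
        .build();
    System.out.println(client.send(listFull, HttpResponse.BodyHandlers.ofString()).body());
  }
}
```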
Clint Wylie e6e068ce60 Add support for 'maxTotalRows' to incremental publishing kafka indexing task and appenderator based realtime task (#6129)
* resolves #5898 by adding maxTotalRows to incremental publishing kafka index task and appenderator based realtime indexing task, as available in IndexTask

* address review comments

* changes due to review

* merge fail
2018-09-07 13:17:49 -07:00
Gian Merlino 431d3d8497
Rename io.druid to org.apache.druid. (#6266)
* Rename io.druid to org.apache.druid.

* Fix META-INF files and remove some benchmark results.

* MonitorsConfig update for metrics package migration.

* Reorder some dimensions in inner queries for some reason.

* Fix protobuf tests.
2018-08-30 09:56:26 -07:00
Jonathan Wei c9a27e3e8e
Don't let catch/finally suppress main exception in IncrementalPublishingKafkaIndexTaskRunner (#6258) 2018-08-28 16:12:02 -07:00
Jihoon Son bda5a8a95e Fix NPE in KafkaSupervisor.checkpointTaskGroup (#6206)
* Fix NPE in KafkaSupervisor.checkpointTaskGroup

* address comments

* address comment
2018-08-26 22:23:33 -07:00
Jihoon Son 64d33eef7e Fix timeout in KafkaSupervisorTest.testCheckpointForInactiveTaskGroup (#6207)
* Fix timeout in KafkaSupervisorTest.testCheckpointForInactiveTaskGroup

* fix npe

* add taskRunner.getRunningTasks()
2018-08-26 19:59:01 -06:00
Benedict Jin 3647d4c94a Make time-related variables more readable (#6158)
* Make time-related variables more readable

* Patch some improvements from the code reviewer

* Remove unnecessary boxing of Long type variables
2018-08-21 15:29:40 -07:00
Benedict Jin 7d4b2d51e8 Fix assertionError at testCheckpointForInactiveTaskGroup in KafkaSupervisorTest (#6192) 2018-08-21 11:33:45 -07:00
Jihoon Son 2bfe1b6a5a Fix NPE for taskGroupId when rolling update (#6168)
* Fix NPE for taskGroupId

* missing changes

* fix wrong annotation

* fix potential race

* keep baseSequenceName

* make deprecated old param
2018-08-17 10:15:45 -07:00
Gian Merlino 5ce3185b9c Fix three bugs with segment publishing. (#6155)
* Fix three bugs with segment publishing.

1. In AppenderatorImpl: always use a unique path if requested, even if the segment
   was already pushed. This is important because if we don't do this, it causes
   the issue mentioned in #6124.
2. In IndexerSQLMetadataStorageCoordinator: Fix a bug that could cause it to return
   a "not published" result instead of throwing an exception, when there was one
   metadata update failure, followed by some random exception. This is done by
   resetting the AtomicBoolean that tracks what case we're in, each time the
   callback runs.
3. In BaseAppenderatorDriver: Only kill segments if we get an affirmative false
   publish result. Skip killing if we just got some exception. The reason for this
   is that we want to avoid killing segments if they are in an unknown state.

Two other changes to clarify the contracts a bit and hopefully prevent future bugs:

1. Return SegmentPublishResult from TransactionalSegmentPublisher, to make it
more similar to announceHistoricalSegments.
2. Make it explicit, at multiple levels of javadocs, that a "false" publish result
must indicate that the publish _definitely_ did not happen. Unknown states must be
exceptions. This helps BaseAppenderatorDriver do the right thing.

* Remove javadoc-only import.

* Updates.

* Fix test.

* Fix tests.
2018-08-15 13:55:53 -07:00
Jihoon Son a7ca4589dd Fix race in testCheckpointForUnknownTaskGroup() of KafkaSupervisorTest (#6153) 2018-08-11 08:26:46 -07:00
Gian Merlino 3525d4059e
Cache: Add maxEntrySize config, make groupBy cacheable by default. (#5108)
* Cache: Add maxEntrySize config.

The idea is this makes it more feasible to cache query types that
can potentially generate large result sets, like groupBy and select,
without fear of writing too much to the cache per query.

Includes a refactor of cache population code in CachingQueryRunner and
CachingClusteredClient, such that they now use the same CachePopulator
interface with two implementations: one for foreground and one for
background.

The main reason for splitting the foreground / background impls is
that the foreground impl can have a more effective implementation of
maxEntrySize. It can stop retaining subvalues for the cache early.

* Add CachePopulatorStats.

* Fix whitespace.

* Fix docs.

* Fix various tests.

* Add tests.

* Fix tests.

* Better tests

* Remove conflict markers.

* Fix licenses.
2018-08-07 10:23:15 -07:00
Jihoon Son 56ab4363ea
Native parallel batch indexing without shuffle (#5492)
* Native parallel indexing without shuffle

* fix build

* fix ci

* fix ingestion without intervals

* fix retry

* fix retry

* add it test

* use chat handler

* fix build

* add docs

* fix ITUnionQueryTest

* fix failures

* disable metrics reporting

* working

* Fix split of static-s3 firehose

* Add endpoints to supervisor task and a unit test for endpoints

* increase timeout in test

* Added doc

* Address comments

* Fix overlapping locks

* address comments

* Fix static s3 firehose

* Fix test

* fix build

* fix test

* fix typo in docs

* add missing maxBytesInMemory to doc

* address comments

* fix race in test

* fix test

* Rename to ParallelIndexSupervisorTask

* fix teamcity

* address comments

* Fix license

* addressing comments

* addressing comments

* indexTaskClient-based segmentAllocator instead of CountingActionBasedSegmentAllocator

* Fix race in TaskMonitor and move HTTP endpoints to supervisorTask from runner

* Add more javadocs

* use StringUtils.nonStrictFormat for logging

* fix typo and remove unused class

* fix tests

* change package

* fix strict build

* tmp

* Fix overlord api according to the recent change in master

* Fix it test
2018-08-06 23:59:42 -07:00
Jihoon Son ef2d6e9118
Fix IllegalArgumentException in TaskLockBox.syncFromStorage() when updating from 0.12.x to 0.12.2 (#6086)
* Fix TaskLockBox.syncFromStorage() when updating from 0.12.x to 0.12.2

* Make the priority of taskLock nullable

* fix test

* fix build
2018-08-03 17:13:44 -07:00
Nishant Bangarwa 75c8a87ce1 Part 2 of changes for SQL Compatible Null Handling (#5958)
* Part 2 of changes for SQL Compatible Null Handling

* Review comments - break lines longer than 120 characters

* review comments

* review comments

* fix license

* fix test failure

* fix CalciteQueryTest failure

* Null Handling - Review comments

* review comments

* review comments

* fix checkstyle

* fix checkstyle

* remove unrelated change

* fix test failure

* fix failing test

* fix travis failures

* Make StringLast and StringFirst aggregators nullable and fix travis failures
2018-08-02 08:20:25 -07:00
Benedict Jin 331a0afb98 Remove redundant type parameters and enforce some other style and inspection rules (#5980)
* Various changes about druid-services module

* Patch improvements from reviewer

* Add ToArrayCallWithZeroLengthArrayArgument & ArraysAsListWithZeroOrOneArgument into inspection profile

* Fix ArraysAsListWithZeroOrOneArgument

* Fix conflict

* Fix ToArrayCallWithZeroLengthArrayArgument

* Fix AliEqualsAvoidNull

* Remove blank line

* Remove unused import clauses

* Fix code style in TopNQueryRunnerTest

* Fix conflict

* Don't use Collections.singletonList when converting the type of array type

* Add argLine into maven-surefire-plugin in druid-process module & increase the timeout value for testMoveSegment testcase

* Roll back the latest commit

* Add java.io.File#toURL() into druid-forbidden-apis

* Using Boolean.parseBoolean instead of Boolean.valueOf for CliCoordinator#isOverlord

* Add a new regexp element into stylecode xml file

* Fix style error for new regexp

* Set the level of ArraysAsListWithZeroOrOneArgument as WARNING

* Fix style error for new regexp

* Add option BY_LEVEL for ToArrayCallWithZeroLengthArrayArgument in inspection profile

* Roll back the level as ToArrayCallWithZeroLengthArrayArgument as ERROR

* Add toArray(new Object[0]) regexp into checkstyle config file & fix them

* Set the level of ArraysAsListWithZeroOrOneArgument as ERROR & Roll back the level of ToArrayCallWithZeroLengthArrayArgument as WARNING until Youtrack fix it

* Add a comment for string equals regexp in checkstyle config

* Fix code format

* Add RedundantTypeArguments as ERROR level inspection

* Fix cannot resolve symbol datasource
2018-07-27 16:56:49 -05:00
Jihoon Son 1524af703d
Fix IllegalArgumentException in TaskLockBox.syncFromStorage() (#6050) 2018-07-27 10:43:32 -07:00
Jihoon Son b7d42edb0f Check the kafka topic when comparing checkpoints from tasks with the one stored in metastore (#6015) 2018-07-20 11:20:23 -07:00
Jihoon Son c48aa74a30 Fix NPE while handling CheckpointNotice in KafkaSupervisor (#5996)
* Fix NPE while handling CheckpointNotice

* fix code style

* Fix test

* fix test

* add a log for creating a new taskGroup

* fix backward compatibility in KafkaIOConfig
2018-07-13 17:14:57 -07:00
Gian Merlino 04ea3c9f8c
Update license headers. (#5976)
* Update license headers.

For compliance with http://www.apache.org/legal/src-headers.html.

* More license adjustments.

* Fix mistakenly edited package line.
2018-07-11 09:55:18 -07:00
Gian Merlino 948e73da77 Extend various test timeouts. (#5978)
False failures on Travis due to spurious timeout (in turn due to noisy
neighbors) is a bigger problem than legitimate failures taking too long
to time out. So it makes sense to extend timeouts.
2018-07-10 13:02:14 -07:00
Surekha 9bece8ce1e Prevent KafkaSupervisor NPE in generateSequenceName (#5900) (#5902)
* Prevent KafkaSupervisor NPE in checkPendingCompletionTasks (#5900)

* throw IAE in generateSequenceName if groupId not found in taskGroups
* add null check in checkPendingCompletionTasks

* Add warn log in checkPendingCompletionTasks

* Address PR comments

Replace warn with error log

* Address PR comments

* change signature of generateSequenceName to take a TaskGroup object instead of int

* Address comments

* Remove unnecessary method from KafkaSupervisorTest
2018-07-04 23:45:42 -07:00
Jihoon Son 1ccabab98e Fix the broken Appenderator contract in KafkaIndexTask (#5905)
* Fix broken Appenderator contract in KafkaIndexTask

* fix build

* add publishFuture

* reuse sequenceToUse if possible
2018-07-03 13:31:29 -07:00
Jihoon Son b76a056c14 Fix ConcurrentModificationException in IncrementalPublishingKafkaIndexTaskRunner (#5907)
* Fix ConcurrentModificationException in IncrementalPublishingKafkaIndexTaskRunner

* fix lock and add comments
2018-06-30 17:20:41 -07:00
Surekha 0f429298cf Fix Kafka Indexing task pause forever if no events in taskDuration (#5656) (#5899)
* Fix Kafka Indexing task pause forever (#5656)

* Fix NullPointerException in overlord if taskGroups does not contain the groupId
* If the endOffset is the same as the startOffset, still let the task resume instead of returning
   endOffsets early, which causes the tasks to pause forever and ultimately fail on timeout

* Address PR comment

* Remove the null check and do not return null from generateSequenceName
2018-06-25 19:29:36 -07:00
Jihoon Son 8c5ded0fad
Splitting KafkaIndexTask for better code maintenance (#5854)
* Refactoring KafkaIndexTask for better code maintenance

* fix bug

* fix test

* add annotation

* fix checkstyle

* remove SetEndOffsetsResult
2018-06-22 13:00:03 -07:00
Surekha 8619adb5b9 Improve task retrieval APIs on Overlord (#5801)
* Add the new tasks api in overlordResource

It takes 4 optional query params
* state (pending/running/waiting/complete)
* dataSource
* interval (applies to completed tasks)
* maxCompletedTasks (applies to completed tasks)

If all params are null, the api returns all the tasks

* Add the state to each task returned by tasks endpoint

* divide active tasks into waiting, pending or running
* Add more unit tests

* Add UNKNOWN state to TaskState

* Fix the authorization calls

* WIP: PR comments

Added new class to capture task info for caching
Other refactoring

* Refactoring : move TaskStatus class to druid-api

so it can be accessed within server
And other related classes like TaskState and TaskStatusPlus are in api

* Remove unused class and apis accessing it

* Add a separate cache for recently completed tasks

This is to mainly capture the task type from payload

* Ignore a test

* Add a RuntimeTaskState to encompass all states a task can be in

* Revert "Add a RuntimeTaskState to encompass all states a task can be in"

This reverts commit 2a527a0731.

* Fix wrong api call

* Fix and unignore tests

* Remove waiting,pending state from TaskState

* Add RunnerTaskState

* Missed the annotation runnerStatusCode

* Fix the creationTime

* Fix the createdTime and queueInsertionTime for running/active tasks
* Clean up tests

* Add javadocs

* Potentially fix the teamcity build

* Address PR comments

* Get rid of TaskInfoBuilder
* Make TaskInfoMapper static nested class
* Other changes

* fix import in MaterializedViewSupervisor after merge

* Address PR comments on

* Replace global cache with local map
* combine multiple queries into one
* Removed unused code

* Fix unit tests

Fix a bug in securedTaskStatusPlus

* Remove getRecentlyFinishedTaskStatuses method

Change TaskInfoMapper signature to add generic type

* Address PR comments

* Passed datasource as argument to be used in sql query
* Other minor fixes

* Address PR comments

* Some minor changes, rename method, spacing changes

* Add early auth check if datasource is not null

* Fix test case

* Add max limit to getRecentlyFinishedTaskInfo in HeapMemoryTaskStorage
* Add TaskLocation to Anytask object

* Address PR comments

* Fix a bug in test case causing ClassCastException
2018-06-19 11:34:59 -07:00
Jonathan Wei 684b5d18c1
Moving averages for ingestion row stats (#5748)
* Moving averages for ingestion row stats

* PR comments

* Make RowIngestionMeters extensible

* test and checkstyle fixes

* More PR comments

* Fix metrics

* Add some comments

* PR comments

* Comments
2018-06-05 09:08:57 -07:00
Kirill Kozlov 67d0b0ee42 Add taskType dimension to task metrics (#5664) 2018-05-07 09:42:26 -07:00
Surekha 13c616ba24 'maxBytesInMemory' tuningConfig introduced for ingestion tasks (#5583)
* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks

Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing. If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.

 * The default value is 1/3 of Runtime.maxMemory()
 * To maintain the current behaviour set 'maxBytesInMemory' to -1
 * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
   will be respected i.e. the first one to go above threshold will trigger persist

* Fix check style and remove a comment

* Add overlord unsecured paths to coordinator when using combined service (#5579)

* Add overlord unsecured paths to coordinator when using combined service

* PR comment

* More error reporting and stats for ingestion tasks (#5418)

* Add more indexing task status and error reporting

* PR comments, add support in AppenderatorDriverRealtimeIndexTask

* Use TaskReport instead of metrics/context

* Fix tests

* Use TaskReport uploads

* Refactor fire department metrics retrieval

* Refactor input row serde in hadoop task

* Refactor hadoop task loader names

* Truncate error message in TaskStatus, add errorMsg to task report

* PR comments

* Allow getDomain to return disjointed intervals (#5570)

* Allow getDomain to return disjointed intervals

* Indentation issues

* Adding feature thetaSketchConstant to do some set operation in PostAgg (#5551)

* Adding feature thetaSketchConstant to do some set operation in PostAggregator

* Updated review comments for PR #5551 - Adding thetaSketchConstant

* Fixed CI build issue

* Updated review comments 2 for PR #5551 - Adding thetaSketchConstant

* Fix taskDuration docs for KafkaIndexingService (#5572)

* With incremental handoff the changed line is no longer true.

* Add doc for automatic pendingSegments (#5565)

* Add missing doc for automatic pendingSegments

* address comments

* Fix indexTask to respect forceExtendableShardSpecs (#5509)

* Fix indexTask to respect forceExtendableShardSpecs

* add comments

* Deprecate spark2 profile in pom.xml (#5581)

Deprecated due to https://github.com/druid-io/druid/pull/5382

* CompressionUtils: Add support for decompressing xz, bz2, zip. (#5586)

Also switch various firehoses to the new method.

Fixes #5585.

* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks

Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing. If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.

 * The default value is 1/3 of Runtime.maxMemory()
 * To maintain the current behaviour set 'maxBytesInMemory' to -1
 * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
   will be respected i.e. the first one to go above threshold will trigger persist

* Address code review comments

* Fix the coding style according to druid conventions
* Add more javadocs
* Rename some variables/methods
* Other minor issues

* Address more code review comments

* Some refactoring to put defaults in IndexTaskUtils
* Added check for maxBytesInMemory in AppenderatorImpl
* Decrement bytes in abandonSegment
* Test unit test for multiple sinks in single appenderator
* Fix some merge conflicts after rebase

* Fix some style checks

* Merge conflicts

* Fix failing tests

Add back check for 0 maxBytesInMemory in OnHeapIncrementalIndex

* Address PR comments

* Put defaults for maxRows and maxBytes in TuningConfig
* Change/add javadocs
* Refactoring and renaming some variables/methods

* Fix TeamCity inspection warnings

* Added maxBytesInMemory config to HadoopTuningConfig

* Updated the docs and examples

* Added maxBytesInMemory config in docs
* Removed references to maxRowsInMemory under tuningConfig in examples

* Set maxBytesInMemory to 0 until used

Set the maxBytesInMemory to 0 if user does not set it as part of tuningConfing
and set to part of max jvm memory when ingestion task starts

* Update toString in KafkaSupervisorTuningConfig

* Use correct maxBytesInMemory value in AppenderatorImpl

* Update DEFAULT_MAX_BYTES_IN_MEMORY to 1/6 max jvm memory

Experimenting with various defaults, 1/3 jvm memory causes OOM

* Update docs to correct maxBytesInMemory default value

* Minor to rename and add comment

* Add more details in docs

* Address new PR comments

* Address PR comments

* Fix spelling typo
2018-05-03 16:25:58 -07:00
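
A small sketch of the sizing rule spelled out above: 0 means "not set, use the heap-based default", -1 means "disable the byte-based trigger and rely on maxRowsInMemory only", and the default was ultimately changed to 1/6 of the JVM's max heap after 1/3 caused OOMs. The method and constant names below are illustrative, not Druid's actual TuningConfig code:

```java
class MaxBytesInMemoryExample
{
  // Per the commit message above: 1/3 of the heap could OOM in experiments,
  // so the default became 1/6 of Runtime.getRuntime().maxMemory().
  private static final long DEFAULT_MAX_BYTES_IN_MEMORY = Runtime.getRuntime().maxMemory() / 6;

  /**
   * Resolves the effective in-memory byte limit for an ingestion task.
   *  0 -> not set by the user, fall back to the heap-based default
   * -1 -> disabled, only maxRowsInMemory triggers persists
   */
  static long effectiveMaxBytesInMemory(long configuredMaxBytesInMemory)
  {
    if (configuredMaxBytesInMemory == 0) {
      return DEFAULT_MAX_BYTES_IN_MEMORY;
    }
    if (configuredMaxBytesInMemory < 0) {
      return Long.MAX_VALUE; // effectively disables the byte-based persist trigger
    }
    return configuredMaxBytesInMemory;
  }
}
```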
David Lim 8ec2d2fe18 Use unique segment paths for Kafka indexing (#5692)
* support unique segment file paths

* forbiddenapis

* code review changes

* code review changes

* code review changes

* checkstyle fix
2018-04-29 21:59:48 -07:00
Jonathan Wei 969342cd28
More error reporting and stats for ingestion tasks (#5418)
* Add more indexing task status and error reporting

* PR comments, add support in AppenderatorDriverRealtimeIndexTask

* Use TaskReport instead of metrics/context

* Fix tests

* Use TaskReport uploads

* Refactor fire department metrics retrieval

* Refactor input row serde in hadoop task

* Refactor hadoop task loader names

* Truncate error message in TaskStatus, add errorMsg to task report

* PR comments
2018-04-05 21:38:57 -07:00
Jonathan Wei 723f7ac550
Add support for task reports, upload reports to deep storage (#5524)
* Add support for task reports, upload reports to deep storage

* PR comments

* Better name for method

* Fix report file upload

* Use TaskReportFileWriter

* Checkstyle

* More PR comments
2018-04-02 12:10:56 -07:00
Kirill Kozlov 8878a7ff94 Replace guava Charsets with native java StandardCharsets (#5545) 2018-03-28 21:00:08 -07:00
Roman Leventov 693e3575f9
Remove unused code and exception declarations (#5461)
* Remove unused code and exception declarations

* Address comments

* Remove redundant Exception declarations

* Make FirehoseFactoryV2.connect() to throw IOException again
2018-03-16 22:11:12 +01:00
Jihoon Son 9b2a25bd84
Refactor supervisorReport to be type-safe (#5479)
* refactor supervisorReport

* use primitives
2018-03-13 09:28:44 -07:00
Niraja Mishra 96cebfc222 As part of this feature, implemented a new endpoint to get running tasks by datasources (#5260)
and added datasource information as part of existing endpoint /druid/indexer/v1/runningTasks.

Added junit test cases for the newly implemented API and fixed existing junit test cases.

Fixed review comments - added new method getCreatedDateTimeAndDataSource into TaskStorageQueryAdapter class
and formatted changed files
2018-03-12 23:48:11 -07:00
Clint Wylie f948066710 KafkaIndexTask remove branch with unreachable code (#5434) 2018-03-02 17:26:12 -08:00
Jihoon Son 16e08c9adb add task priority for kafka indexing (#5444) 2018-02-28 22:29:23 -08:00
Parag Jain fba13d8978 time based checkpointing for Kafka Indexing Service (#5255)
* time based checkpointing

* add test and fix issue

* fix comments

* fix formatting

* update docs
2018-02-15 20:57:02 -08:00
Jihoon Son cd929000ca
Change early publishing to early pushing in indexTask & refactor AppenderatorDriver (#5297)
* Change early publishing to early pushing in batch indexing & refactor appenderatorDriver

* fix compile

* rename and add more javadocs

* Fix conflicts

* address comments

* revert await executors

* fix test
2018-02-14 12:48:33 -08:00
Jonathan Wei b234a119ac Log exceptions thrown before persist() for indexing tasks (#5374)
* Log exceptions thrown before persist() for indexing tasks

* PR comment
2018-02-13 09:20:07 -08:00
Roman Leventov e64ffb10c2 Standardize on using Integer.BYTES instead of Ints.BYTES from Guava, same for other primitives (#5366) 2018-02-07 13:24:30 -08:00
Gian Merlino 9a62b02cb7 Extensions: Option to load classes from extension jars first. (#5321)
The behavior is configurable through druid.extensions.useExtensionClassloaderFirst.
It is useful when extensions want to load a dependency different from one provided
by Druid, for example a different version of geoip or protobuf.
2018-02-06 16:14:03 +05:30
Gian Merlino 7e02408510 Update versions to 0.13.0-SNAPSHOT. (#5323) 2018-02-02 12:06:38 -06:00
Jonathan Wei 80419752b5 Add metamx emitter, http clients, and metrics packages to druid java-util (#5289)
* Add metamx java-util emitter, http clients, and metrics packages to druid java-util

* Remove metamx java-util from pom.xml files

* Checkstyle fixes

* Import fix

* TeamCity inspection fixes

* Use slf4j, move some version defs to master pom.xml

* Use parent jvm-attach-api and maven-surefire-plugin versions

* Add ] to log msg, suppress inspection
2018-01-24 22:10:36 +01:00
Jihoon Son 241efafbb2
Automatic compaction by coordinators (#5102)
* Automatic compaction by coordinator

* add links

* skip compaction for very recent segments if they are small

* fix finding search interval

* fix finding search interval

* fix TimelineHolder iteration

* add test for newestSegmentFirstPolicy

* add CompactionSegmentIterator

* add numTargetCompactionSegments

* add missing config

* fix skipping huge shards

* fix handling large number of segments per shard

* fix test failure

* change recursive call to loop

* fix logging

* fix build

* fix test failure

* address comments

* change dataSources type

* check running pendingTasks at each run

* fix test

* address comments

* fix build

* fix test

* address comments

* address comments

* add doc for segment size optimization

* address comment
2018-01-13 13:52:37 +09:00
Roman Leventov 8877ce38d6
Enforce modifier order with Checkstyle (#5246) 2018-01-11 09:50:42 +01:00
Parag Jain 83c6c48bed Fix state check bug in Kafka Index Task (#5204)
* fix state check for replacement task

* fix comments

* rebase with master
2018-01-08 18:01:36 -08:00
Jonathan Wei cdd374a417 Throw away rows with timestamps beyond long bounds in kafka indexing (#5215)
* Throw away rows with timestamps beyond long bounds in kafka indexing

* PR comments
2018-01-08 17:40:50 -06:00
Roman Leventov 579f9fbedf Add IndexedInts.debugToString() and AbstractIndex.toString(); Add Sequence.toList() and limit() (#5175)
* Add IndexedInts.debugToString() and AbstractIndex.toString()

* Fix AppenderatorTest
2018-01-04 09:56:47 +09:00
Jihoon Son 9199d61389 Automatic pendingSegments cleanup (#5149)
* PendingSegments cleanup

* fix build

* address comments

* address comments

* fix potential npe

* address comments

* fix build

* fix test

* fix test
2017-12-20 14:46:34 -08:00
Parag Jain c56a9807d4
prevent npe on mismatch between number of kafka partitions and task count (#5139) 2017-12-20 16:23:15 -06:00
Roman Leventov 5787d04fad Bump Druid version to 0.12.0 (#5138) 2017-12-15 07:37:01 -08:00
Parag Jain 677e24b760 prevent NPE from suppressing actual exception (#5146) 2017-12-12 11:42:30 -08:00
Roman Leventov 64848c7ebf DataSegment memory optimizations (#5094)
* Deduplicate DataSegments contents (loadSpec's keys, dimensions and metrics lists as a whole) more aggressively; use ArrayMap instead of default LinkedHashMap for DataSegment.loadSpec, because they have only 3 entries on average; prune DataSegment.loadSpec on brokers

* Fix DataSegmentTest

* Refinements

* Try to fix

* Fix the second DataSegmentTest

* Nullability

* Fix tests

* Fix tests, unify to use TestHelper.getJsonMapper()

* Revert TestUtil as ServerTestHelper, fix tests

* Add newline

* Fix indexing tests

* Fix s3 tests

* Try to fix tests, remove lazy caching of ObjectMapper in TestHelper, rename TestHelper.getJsonMapper() to makeJsonMapper()

* Fix HDFS tests

* Fix HdfsDataSegmentPusherTest

* Capitalize constant names
2017-12-12 11:41:40 -08:00
Roman Leventov a7a6a0487e Replace IOPeon with SegmentWriteOutMedium; Improve buffer compression (#4762)
* Replace IOPeon with OutputMedium; Improve compression

* Fix test

* Cleanup CompressionStrategy

* Javadocs

* Add OutputBytesTest

* Address comments

* Random access in OutputBytes and GenericIndexedWriter

* Fix bugs

* Fixes

* Test OutputBytes.readFully()

* Address comments

* Rename OutputMedium to SegmentWriteOutMedium and OutputBytes to WriteOutBytes

* Add comments to ByteBufferInputStream

* Remove unused declarations
2017-12-04 18:04:27 -08:00
Parag Jain 7c01f77b04 Parse Batch support (#5081)
* add parseBatch and deprecate parse method in InputRowParser

add addAll method, skip max rows in memory check for it

remove parse method from implementations

transform transformers

add string multiplier input row parser

fix withParseSpec

fix kafka batch indexing

fix isPersistRequired

comments

* add unit test

* make persist async

* review comments
2017-12-04 16:06:16 -06:00
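
The shape of the interface change described above, as a minimal sketch: a batch-oriented parseBatch() is added, and the single-row parse() is deprecated (plausibly kept as the default delegate). This is an illustrative interface, not the actual org.apache.druid InputRowParser:

```java
import java.util.Collections;
import java.util.List;

interface RowParserSketch<T, R>
{
  /** New batch entry point: one raw input may expand into many rows. */
  default List<R> parseBatch(T input)
  {
    return Collections.singletonList(parse(input));
  }

  /** Old single-row entry point, retained for compatibility but deprecated. */
  @Deprecated
  R parse(T input);
}
```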
Parag Jain cb03efeb14 Kafka Index Task that supports Incremental handoffs (#4815)
* Kafka Index Task that supports Incremental handoffs
- Incrementally handoff segments when they hit maxRowsPerSegment limit
- Decouple segment partitioning from Kafka partitioning, all records from consumed partitions go to a single druid segment
- Support for restoring task on middle manager restarts by check pointing end offsets for segments

* take care of review comments

* make getCurrentOffsets call async, keep track of publishing sequence, review comments

* fix setEndOffsets duplicate request handling, formatting

* fix unit test

* backward compatibility

* make AppenderatorDriverMetadata backwards compatible

* add unit test

* fix deadlock between persist and push executors in AppenderatorImpl

* fix formatting

* use persist dir instead of work dir

* review comments

* fix deadlock

* actually fix deadlock
2017-11-17 16:05:20 -06:00
Gian Merlino 5da0241ac8
Kafka: Fixes needlessly low interpretation of maxRowsInMemory. (#5034)
AppenderatorImpl already applies maxRowsInMemory across all sinks. So dividing by
the number of Kafka partitions is pointless and effectively makes the interpretation
of maxRowsInMemory lower than expected.

This undoes one of the two changes from #3284, which fixed the original bug twice.
In this case, that's worse than fixing it once.
2017-11-02 13:45:04 -06:00
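
In effect, the fix removes a per-partition division like the one sketched below; names are illustrative, not the actual KafkaIndexTask code:

```java
class MaxRowsInMemoryExample
{
  static int effectiveMaxRowsInMemory(int configuredMaxRowsInMemory, int numKafkaPartitions)
  {
    // Before the fix (per #3284): the limit was divided across Kafka partitions, even though
    // AppenderatorImpl already applies maxRowsInMemory across all sinks, making the effective
    // limit needlessly low.
    //   return configuredMaxRowsInMemory / numKafkaPartitions;

    // After the fix: use the configured value as-is.
    return configuredMaxRowsInMemory;
  }
}
```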