Commit Graph

1681 Commits

Author SHA1 Message Date
Clint Wylie b00dd84fa2 clarify error messaging for parallel indexing task when when missing numShards or intervals (#8513) 2019-09-11 20:47:27 -07:00
Chi Cao Minh 5f61374cb3 Fix dependency analyze warnings (#8230)
* Fix dependency analyze warnings

Update the maven dependency plugin to the latest version and fix all
warnings for unused declared and used undeclared dependencies in the
compile scope. Added new travis job to add the check to CI. Also fixed
some source code files to use the correct packages for their imports and
updated druid-forbidden-apis to prevent regressions.

* Address review comments

* Adjust scope for org.glassfish.jaxb:jaxb-runtime

* Fix dependencies for hdfs-storage

* Consolidate netty4 versions
2019-09-09 14:37:21 -07:00
Clint Wylie c73a489335
bump master version to 0.17.0-incubating-SNAPSHOT (#8421) 2019-08-28 01:58:36 -07:00
Jihoon Son e5ef5ddafa Fix the shuffle with TLS enabled for parallel indexing; add an integration test; improve unit tests (#8350)
* Fix shuffle with tls enabled; add an integration test; improve unit tests

* remove debug log

* fix tests

* unused import

* add javadoc

* rename to getContent
2019-08-26 19:27:41 -07:00
Dylan Wylie b2821a8371 do not exclude client core jar (#8339)
make indexing service depend on hadoop client
2019-08-26 13:48:24 -07:00
Xavier Léauté 8e0c307e54
Do not assume system classloader is URLClassLoader in Java 9+ (#8392)
* Fallback to parsing classpath for hadoop task in Java 9+
In Java 9 and above we cannot assume that the system classloader is an
instance of URLClassLoader. This change adds a fallback method to parse
the system classpath in that case, and adds a unit test to validate it matches
what JDK8 would do.

Note: This has not been tested in an actual hadoop setup, so this is mostly
to help us pass unit tests.

* Remove granularity test of dubious value
One of our granularity tests relies on system classloader being a URLClassLoaders to
catch a bug related to class initialization and static initializers using a subclass (see
#2979)
This test was added to catch a potential regression, but it assumes we would add back
the same type of static initializers to this specific class, so it seems to be of dubious value
as a unit test and mostly serves to illustrate the bug.

relates to #5589
2019-08-24 20:47:54 -04:00
Jihoon Son 95fa609615 Fix wrong partitionsSpec type names in the document (#8297)
* Fix wrong type names for partitionsSpec

* add unit tests; add json properties for backward compatibility

* beautify conf names

* remove maxRowsPerSegment from hashed partitionsSpec

* fix doc build
2019-08-23 13:44:58 -07:00
SandishKumarHN 33f0753a70 Add Checkstyle for constant name static final (#8060)
* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* merging with upstream

* review-1

* unknow changes

* unknow changes

* review-2

* merging with master

* review-2 1 changes

* review changes-2 2

* bug fix
2019-08-23 13:13:54 +03:00
Surekha cf2a2dd917
Add group_id to the sys.tasks table (#8304)
* Add group_id to overlord tasks API and sys.tasks table

* adjust test

* modify docs

* Make groupId nullable

* fix integration test

* fix toString

* Remove groupId from TaskInfo

* Modify docs and tests

* modify TaskMonitorTest
2019-08-22 15:28:23 -07:00
Jihoon Son fba92ae469 Fix to always use end sequenceNumber for reset (#8305)
* Fix to always use end sequenceNumber for reset

* fix checkstyle

* fix style and add log
2019-08-22 16:51:25 -05:00
Jonathan Wei 96e2142ea3 Cleanup appenderators and segment walkers in UnifiedIndexerAppenderatorsManager (#8287)
* Cleanup Appenderators in UnifiedIndexerAppenderatorsManager

* PR comments

* More PR comments

* Fix test
2019-08-22 12:18:46 -07:00
Clint Wylie b95607d31c
remove YeOldePlumberSchool.java, unused (#8347) 2019-08-21 18:15:51 -07:00
Jihoon Son 22d6384d36
Fix unrealistic test variables in KafkaSupervisorTest and tidy up unused variable in checkpointing process (#7319)
* Fix unrealistic test arguments in KafkaSupervisorTest

* remove currentCheckpoint from checkpoint action

* rename variable
2019-08-21 10:58:22 -07:00
Benedict Jin 566dc8c719
Fix missing format argument (#8331) 2019-08-19 16:19:44 +08:00
Jihoon Son 31af4eb9ad
Rename maxNumSubTasks to maxNumConcurrentSubTasks for native parallel index task (#8324) 2019-08-16 15:57:13 -07:00
Jihoon Son 5dac6375f3
Add support for parallel native indexing with shuffle for perfect rollup (#8257)
* Add TaskResourceCleaner; fix a couple of concurrency bugs in batch tasks

* kill runner when it's ready

* add comment

* kill run thread

* fix test

* Take closeable out of Appenderator

* add javadoc

* fix test

* fix test

* update javadoc

* add javadoc about killed task

* address comment

* Add support for parallel native indexing with shuffle for perfect rollup.

* Add comment about volatiles

* fix test

* fix test

* handling missing exceptions

* more clear javadoc for stopGracefully

* unused import

* update javadoc

* Add missing statement in javadoc

* address comments; fix doc

* add javadoc for isGuaranteedRollup

* Rename confusing variable name and fix typos

* fix typos; move fetch() to a better home; fix the expiration time

* add support https
2019-08-15 17:43:35 -07:00
Jonathan Wei ef7b9606f2
Keep track of task location for completed tasks (#8286)
* Keep track of task location for completed tasks

* Add TaskLifecycleTest location checks
2019-08-15 16:57:02 -05:00
Jihoon Son 312cdc2452 Add TaskResourceCleaner; fix a couple of concurrency bugs in batch tasks (#8236)
* Add TaskResourceCleaner; fix a couple of concurrency bugs in batch tasks

* kill runner when it's ready

* add comment

* kill run thread

* fix test

* Take closeable out of Appenderator

* add javadoc

* fix test

* fix test

* update javadoc

* add javadoc about killed task

* address comment

* handling missing exceptions

* more clear javadoc for stopGracefully

* update javadoc

* Add missing statement in javadoc

* typo
2019-08-12 19:42:06 -05:00
Clint Wylie 1054d85171
add mechanism to control filter optimization in historical query processing (#8209)
* add support for mechanism to control filter optimization in historical query processing

* oops

* adjust

* woo

* javadoc

* review comments

* fix

* default

* oops

* oof

* this will fix it

* more nullable, refactor DimFilter.getRequiredColumns to use Set, formatting

* extract class DimFilterToStringBuilder with common code from custom DimFilter toString implementations

* adjust variable naming

* missing nullable

* more nullable

* fix javadocs

* nullable

* address review comments

* javadocs, precondition

* nullable

* rename method to be consistent

* review comments

* remove tuning from ColumnComparisonFilter/ColumnComparisonDimFilter
2019-08-09 16:36:18 -07:00
Jihoon Son ab5b3be6c6 Add shuffleSegmentPusher for data shuffle (#8115)
* Fix race between canHandle() and addSegment() in StorageLocation

* add comment

* Add shuffleSegmentPusher which is a dataSegmentPusher used for writing shuffle data in local storage.

* add comments

* unused import

* add comments

* fix test

* address comments

* remove <p> tag from javadoc

* address comments

* comparingLong

* Address comments

* fix test
2019-08-05 13:38:35 -07:00
Jihoon Son 1ee828ff49
Add a cluster-wide configuration to force timeChunk lock and add a doc for segment locking (#8173)
* Add a cluster-wide configuration to force timeChunk lock and add a doc for segment locking

* add more test

* javadoc for missingIntervalsInOverwriteMode

* Fix test

* Address comments

* avoid spotbugs
2019-08-02 20:30:05 -07:00
Jihoon Son 8a16a8e97f
Teach tasks what machine they are running on (#8190)
* Teach the middleManager port to tasks

* parent annotation

* Bind parent for indexer
2019-08-02 15:34:44 -07:00
Jonathan Wei 41893d4647 Simple memory allocation for CliIndexer tasks (#8201)
* Simple memory allocation for CliIndexer

* PR comments

* Checkstyle
2019-08-01 10:22:41 +08:00
Gian Merlino 77297f4e6f GroupBy array-based result rows. (#8196)
* GroupBy array-based result rows.

Fixes #8118; see that proposal for details.

Other than the GroupBy changes, the main other "interesting" classes are:

- ResultRow: The array-based result type.
- BaseQuery: T is no longer required to be Comparable.
- QueryToolChest: Adds "decorateObjectMapper" to enable query-aware serialization
  and deserialization of result rows (necessary due to their positional nature).
- QueryResource: Uses the new decoration functionality.
- DirectDruidClient: Also uses the new decoration functionality.
- QueryMaker (in Druid SQL): Modifications to read ResultRows.

These classes weren't changed, but got some new javadocs:

- BySegmentQueryRunner
- FinalizeResultsQueryRunner
- Query

* Adjustments for TC stuff.
2019-07-31 16:15:12 -07:00
Fokko Driesprong faf51107d5 Add SuppressWarnings SS_SHOULD_BE_STATIC (#8138)
* Spotbugs: SS_SHOULD_BE_STATIC (#8073)

* Add SuppressWarnings SS_SHOULD_BE_STATIC

Fixes #8073

* Fix the voilation

* Make them non-final

* Remove @Nonnull
2019-07-31 19:44:42 +03:00
Jihoon Son 385f492a55
Use PartitionsSpec for all task types (#8141)
* Use partitionsSpec for all task types

* fix doc

* fix typos and revert to use isPushRequired

* address comments

* move partitionsSpec to core

* remove hadoopPartitionsSpec
2019-07-30 17:24:39 -07:00
Fokko Driesprong e016995d1f Enable Spotbugs: WMI_WRONG_MAP_ITERATOR (#8005)
* WMI_WRONG_MAP_ITERATOR

* Fixed missing loop
2019-07-30 19:51:53 +03:00
Jihoon Son fb653ceef9 Add benchmark for VersionedIntervalTimeline (#8161)
* Add benchmark for VersionedIntervalTimeline

* rename
2019-07-30 08:10:00 -07:00
Jonathan Wei 640b7afc1c Add CliIndexer process type and initial task runner implementation (#8107)
* Add CliIndexer process type and initial task runner implementation

* Fix HttpRemoteTaskRunnerTest

* Remove batch sanity check on PeonAppenderatorsManager

* Fix paralle index tests

* PR comments

* Adjust Jersey resource logging

* Additional cleanup

* Fix SystemSchemaTest

* Add comment to LocalDataSegmentPusherTest absolute path test

* More PR comments

* Use Server annotated with RemoteChatHandler

* More PR comments

* Checkstyle

* PR comments

* Add task shutdown to stopGracefully

* Small cleanup

* Compile fix

* Address PR comments

* Adjust TaskReportFileWriter and fix nits

* Remove unnecessary closer

* More PR comments

* Minor adjustments

* PR comments

* ThreadingTaskRunner: cancel  task run future not shutdownFuture and remove thread from workitem
2019-07-29 17:06:33 -07:00
Chi Cao Minh ab71a2e1e4 Revert "Fix dependency analyze warnings (#8128)" (#8189)
This reverts commit 5dd0d8e873.
2019-07-29 11:42:16 -07:00
Jihoon Son adf7bafb9f Fix race between canHandle() and addSegment() in StorageLocation (#8114)
* Fix race between canHandle() and addSegment() in StorageLocation

* add comment

* add comments

* fix test

* address comments

* remove <p> tag from javadoc

* address comments

* comparingLong
2019-07-27 11:11:06 +03:00
Chi Cao Minh 5dd0d8e873 Fix dependency analyze warnings (#8128)
* Fix dependency analyze warnings

Update the maven dependency plugin to the latest version and fix all
warnings for unused declared and used undeclared dependencies in the
compile scope. Added new travis job to add the check to CI. Also fixed
some source code files to use the correct packages for their imports.

* Fix licenses and dependencies

* Fix licenses and dependencies again

* Fix integration test dependency

* Address review comments

* Fix unit test dependencies

* Fix integration test dependency

* Fix integration test dependency again

* Fix integration test dependency third time

* Fix integration test dependency fourth time

* Fix compile error

* Fix assert package
2019-07-26 10:49:03 -07:00
Jihoon Son db14946207
Add support minor compaction with segment locking (#7547)
* Segment locking

* Allow both timeChunk and segment lock in the same gruop

* fix it test

* Fix adding same chunk to atomicUpdateGroup

* resolving todos

* Fix segments to lock

* fix segments to lock

* fix kill task

* resolving todos

* resolving todos

* fix teamcity

* remove unused class

* fix single map

* resolving todos

* fix build

* fix SQLMetadataSegmentManager

* fix findInputSegments

* adding more tests

* fixing task lock checks

* add SegmentTransactionalOverwriteAction

* changing publisher

* fixing something

* fix for perfect rollup

* fix test

* adjust package-lock.json

* fix test

* fix style

* adding javadocs

* remove unused classes

* add more javadocs

* unused import

* fix test

* fix test

* Support forceTimeChunk context and force timeChunk lock for parallel index task if intervals are missing

* fix travis

* fix travis

* unused import

* spotbug

* revert getMaxVersion

* address comments

* fix tc

* add missing error handling

* fix backward compatibility

* unused import

* Fix perf of versionedIntervalTimeline

* fix timeline

* fix tc

* remove remaining todos

* add comment for parallel index

* fix javadoc and typos

* typo

* address comments
2019-07-24 17:35:46 -07:00
Eugene Sevastianov 799d20249f Response context refactoring (#8110)
* Response context refactoring

* Serialization/Deserialization of ResponseContext

* Added java doc comments

* Renamed vars related to ResponseContext

* Renamed empty() methods to createEmpty()

* Fixed ResponseContext usage

* Renamed multiple ResponseContext static fields

* Added PublicApi annotations

* Renamed QueryResponseContext class to ResourceIOReaderWriter

* Moved the protected method below public static constants

* Added createEmpty method to ResponseContext with DefaultResponseContext creation

* Fixed inspection error

* Added comments to the ResponseContext length limit and ResponseContext
http header name

* Added a comment of possible future refactoring

* Removed .gitignore file of indexing-service

* Removed a never-used method

* VisibleForTesting method reducing boilerplate

Co-Authored-By: Clint Wylie <cjwylie@gmail.com>

* Reduced boilerplate

* Renamed the method serialize to serializeWith

* Removed unused import

* Fixed incorrectly refactored test method

* Added comments for ResponseContext keys

* Fixed incorrectly refactored test method

* Fixed IntervalChunkingQueryRunnerTest mocks
2019-07-24 18:29:03 +03:00
Clint Wylie cb82d72547
fix forking task runner task shutdown to be more graceful (#8085)
* fix forking task runner shutdown to be more graceful

* javadoc
2019-07-23 11:47:23 -07:00
Himanshu 54a7b54d2d avoid 'must return non-void type' warning (#8105) 2019-07-18 15:02:27 -07:00
Jihoon Son c7eb7cd018
Add intermediary data server for shuffle (#8088)
* Add intermediary data server for shuffle

* javadoc

* adjust timeout

* resolved todo

* fix test

* style

* address comments

* rename to shuffleDataLocations

* Address comments

* bit adjustment StorageLocation

* fix test

* address comment & fix test

* handle interrupted exception
2019-07-18 14:46:47 -07:00
Surekha da16144495 Refactoring to use `CollectionUtils.mapValues` (#8059)
* doc updates and changes to use the CollectionUtils.mapValues utility method

* Add Structural Search patterns to intelliJ

* refactoring from PR comments

* put -> putIfAbsent

* do single key lookup
2019-07-17 23:02:22 -07:00
Roman Leventov ceb969903f
Refactor SQLMetadataSegmentManager; Change contract of REST met… (#7653)
* Refactor SQLMetadataSegmentManager; Change contract of REST methods in DataSourcesResource

* Style fixes

* Unused imports

* Fix tests

* Fix style

* Comments

* Comment fix

* Remove unresolvable Javadoc references; address comments

* Add comments to ImmutableDruidDataSource

* Merge with master

* Fix bad web-console merge

* Fixes in api-reference.md

* Rename in DruidCoordinatorRuntimeParams

* Fix compilation

* Residual changes
2019-07-17 17:18:48 +03:00
Clint Wylie 15fbf5983d add Class.getCanonicalName to forbidden-apis (#8086)
* add checkstyle to forbid unecessary use of Class.getCanonicalName

* use forbiddin-api instead of checkstyle

* add space
2019-07-16 15:21:50 -07:00
Himanshu 14aec7fcec
add config to optionally disable all compression in intermediate segment persists while ingestion (#7919)
* disable all compression in intermediate segment persists while ingestion

* more changes and build fix

* by default retain existing indexingSpec for intermediate persisted segments

* document indexSpecForIntermediatePersists index tuning config

* fix build issues

* update serde tests
2019-07-10 12:22:24 -07:00
Jihoon Son fcf56f2330 Add IS_INCREMENTAL_HANDOFF_SUPPORTED for KIS backward compatibility (#8050)
* Add IS_INCREMENTAL_HANDOFF_SUPPORTED for KIS backward compatibility

* do it for kafka only

* fix test
2019-07-10 08:29:37 -07:00
Gian Merlino 338b8b3fef
SupervisorManager: Add authorization checks to bulk endpoints. (#8044)
The endpoints added in #6272 were missing authorization checks. This patch removes the bulk
methods from SupervisorManager, and instead has SupervisorResource run the full list through
filterAuthorizedSupervisorIds before calling resume/suspend/terminate one by one.
2019-07-09 13:16:54 -07:00
Sashidhar Thallam 6701dc08fe Making StatusResponseHandler singleton and fixing all its instantiation invocations (#7969)
* Making StatusResponseHandler singleton and fixing all its instantiation invocations

* Using StatusResponseHandler.getInstance() where applicable
2019-07-08 13:33:00 +05:30
Chi Cao Minh 1166bbcb75 Remove static imports from tests (#8036)
Make static imports forbidden in tests and remove all occurrences to be
consistent with the non-test code.

Also, various changes to files affected by above:
- Reformat to adhere to druid style guide
- Fix various IntelliJ warnings
- Fix various SonarLint warnings (e.g., the expected/actual args to
  Assert.assertEquals() were flipped)
2019-07-06 09:33:12 -07:00
Clint Wylie 42a7b8849a remove FirehoseV2 and realtime node extensions (#8020)
* remove firehosev2 and realtime node extensions

* revert intellij stuff

* rat exclusion
2019-07-04 15:40:22 -07:00
Fokko Driesprong 4508d3ff55 Enable RCN_REDUNDANT_NULLCHECK_OF_NULL_VALUE (#7972) 2019-07-01 11:25:06 +05:30
Fokko Driesprong e8dedb3de4 Enable MS_MUTABLE_COLLECTION_PKGPROTECT (#7971) 2019-07-01 11:24:05 +05:30
Clint Wylie 494b8ebe56 multi-value string column support for expressions (#7588)
* array support for expression language for multi-value string columns

* fix tests?

* fixes

* more tests

* fixes

* cleanup

* more better, more test

* ignore inspection

* license

* license fix

* inspection

* remove dumb import

* more better

* some comments

* add expr rewrite for arrayfn args for more magic, tests

* test stuff

* more tests

* fix test

* fix test

* castfunc can deal with arrays

* needs more empty array

* more tests, make cast to long array more forgiving

* refactor

* simplify ExprMacro Expr implementations with base classes in core

* oops

* more test

* use Shuttle for Parser.flatten, javadoc, cleanup

* fixes and more tests

* unused import

* fixes

* javadocs, cleanup, refactors

* fix imports

* more javadoc

* more javadoc

* more

* more javadocs, nonnullbydefault, minor refactor

* markdown fix

* adjustments

* more doc

* move initial filter out

* docs

* map empty arg lambda, apply function argument validation

* check function args at parse time instead of eval time

* more immutable

* more more immutable

* clarify grammar

* fix docs

* empty array is string test, we need a way to make arrays better maybe in the future, or define empty arrays as other types..
2019-06-19 13:57:37 -07:00
Himanshu 417fcef385 WorkerTaskManager to create disk files atomically and ignore task file corruption (#7917)
* WorkerTaskManager to create disk files atomically and ignore task file
corruptions

* fixing weird checkstyle lambda indentation issues
2019-06-18 09:18:43 -07:00
SandishKumarHN 01881e3a98 Use only com.google.errorprone.annotations.concurrent.GuardedBy, not javax.annotations.concurrent.GuardedBy (#7889) 2019-06-17 15:58:51 +02:00
litao91 daed1bc141 Fix HttpRemoteTaskRunnerConfig not loading (#7902) 2019-06-16 00:31:42 -07:00
Sashidhar Thallam 3bee6adcf7 Use map.putIfAbsent() or map.computeIfAbsent() as appropriate instead of containsKey() + put() (#7764)
* https://github.com/apache/incubator-druid/issues/7316 Use Map.putIfAbsent() instead of containsKey() + put()

* fixing indentation

* Using map.computeIfAbsent() instead of map.putIfAbsent() where appropriate

* fixing checkstyle

* Changing the recommendation text

* Reverting auto changes made by IDE

* Implementing recommendation: A ConcurrentHashMap on which computeIfAbsent() is called should be assigned into variables of ConcurrentHashMap type, not ConcurrentMap

* Removing unused import
2019-06-14 17:59:36 +02:00
Clint Wylie 3fbb0a5e00 Supervisor list api with states and health (#7839)
* allow optionally listing all supervisors with their state and health

* docs

* add state to full

* clean

* casing

* format

* spelling
2019-06-07 16:26:33 -07:00
Surekha ea752ef562 Optimize overshadowed segments computation (#7595)
* Move the overshadowed segment computation to SQLMetadataSegmentManager's poll

* rename method in MetadataSegmentManager

* Fix tests

* PR comments

* PR comments

* PR comments

* fix indentation

* fix tests

*  fix test

*  add test for SegmentWithOvershadowedStatus serde format

* PR comments

* PR comments

* fix test

* remove snapshot updates outside poll

* PR comments

* PR comments

* PR comments

*  removed unused import
2019-06-07 19:15:54 +02:00
Jihoon Son 61ec521135
Remove keepSegmentGranularity option for compaction (#7747)
* Remove keepSegmentGranularity option from compaction

* fix it test

* clean up

* remove from web console

* fix test
2019-06-03 12:59:15 -07:00
Justin Borromeo 8032c4add8 Add errors and state to stream supervisor status API endpoint (#7428)
* Add state and error tracking for seekable stream supervisors

* Fixed nits in docs

* Made inner class static and updated spec test with jackson inject

* Review changes

* Remove redundant config param in supervisor

* Style

* Applied some of Jon's recommendations

* Add transience field

* write test

* implement code review changes except for reconsidering logic of markRunFinishedAndEvaluateHealth()

* remove transience reporting and fix SeekableStreamSupervisorStateManager impl

* move call to stateManager.markRunFinished() from RunNotice to runInternal() for tests

* remove stateHistory because it wasn't adding much value, some fixes, and add more tests

* fix tests

* code review changes and add HTTP health check status

* fix test failure

* refactor to split into a generic SupervisorStateManager and a specific SeekableStreamSupervisorStateManager

* fixup after merge

* code review changes - add additional docs

* cleanup KafkaIndexTaskTest

* add additional documentation for Kinesis indexing

* remove unused throws class
2019-05-31 17:16:01 -07:00
Jihoon Son 7abfbb066a Bump up snapshot version to 0.16.0 (#7802) 2019-05-30 17:17:33 -07:00
Roman Leventov 782863ed0f Fix some problems reported by PVS-Studio (#7738)
* Fix some problems reported by PVS-Studio

* Address comments
2019-05-29 11:20:45 -07:00
Jonathan Wei c97eb93e16 Fix dead code in IndexTask.collectIntervalsAndShardSpecs() (#7746) 2019-05-25 10:12:14 -07:00
Bartosz Ługowski cbdac49ab3 Web console - add enable/disable actions for middle manager workers (#7642)
* Overlord console - add enable/disable button for remote workers.

* Overlord console - add proxy for remote workers API.

* WorkerResourceTest - revert newline change.

* Remote worker proxy tests - remove empty line.

* Refactor remote worker proxy for readability and security

* Rename method in remote task runner tests for readability

* Remove enable/disable button for remote workers from old web console

* Add enable/disable actions for middle manager worker in new web console

* Fix variable type

* Add worker task runner query adapter

* Fix web console tests: segments-view, servers-view

* Fix overlord resource tests
2019-05-23 16:47:23 -07:00
Himanshu daf20b4b86 use memoized supplier for lazy singleton in SeekableStreamIndexTask.java (#7740) 2019-05-23 15:58:38 -07:00
Merlin Lee 26fad7e06a Add checkstyle for "Local variable names shouldn't start with capital" (#7681)
* Add checkstyle for "Local variable names shouldn't start with capital"

* Adjust some local variables to constants

* Replace StringUtils.LINE_SEPARATOR with System.lineSeparator()
2019-05-23 18:40:28 +02:00
Jihoon Son eff2be4f8f Remove LegacyKafkaIndexTaskRunner (#7735) 2019-05-23 09:25:35 -07:00
Gian Merlino 53b6467fc8 SeekableStreamIndexTaskRunner: Lazy init of runner. (#7729)
The main motivation is that this fixes #7724, by making it so the overlord
doesn't try to create a task runner and parser when all it really wants to
do is create a task object and serialize it.
2019-05-22 21:13:57 -07:00
David Lim d38457933f Data loader (sampler component) - Kafka/Kinesis samplers (#7566)
* implement Kafka/Kinesis sampler

* add KafkaSamplerSpecTest and KinesisSamplerSpecTest

* code review changes
2019-05-16 20:26:23 -07:00
Fokko Driesprong 2aa9613bed Bump Checkstyle to 8.20 (#7651)
* Bump Checkstyle to 8.20

Moderate severity vulnerability that affects:
com.puppycrawl.tools:checkstyle

Checkstyle prior to 8.18 loads external DTDs by default,
which can potentially lead to denial of service attacks
or the leaking of confidential information.

Affected versions: < 8.18

* Oops, missed one

* Oops, missed a few
2019-05-14 11:53:37 -07:00
Xavier Léauté b3c7463059
fix test pattern matching for jdk11 (#7602)
This change makes the error pattern more generic to account for
differences between JDK versions.
2019-05-06 21:02:44 -07:00
David Lim ec8562c885 Data loader (sampler component) (#7531)
* sampler initial check-in
fix checkstyle issues
add sampler fix to process CSV files from cache properly
change to composition and rename some classes
add tests and report num rows read and indexed
remove excludedByFilter flag and don't send filtered out data
fix tests to handle both settings for druid.generic.useDefaultValueForNull

* wrap sampler firehose in TimedShutoffFirehoseFactory to support timeouts

* code review changes - add additional comments, limit maxRows
2019-05-01 22:37:14 -07:00
Jihoon Son c60e7feab8 Fix encoded taskId check in chatHandlerResource (#7520)
* Fix encoded taskId check in chatHandlerResource

* fix tests
2019-04-20 18:08:34 -07:00
Gian Merlino 1fb5ec3989 Adds backwards-compatible serde for SeekableStreamStartSequenceNumbers. (#7512)
This allows them to be deserialized by older Druid versions as
KafkaPartitions objects.

Fixes #7470.
2019-04-19 13:19:45 -07:00
Jihoon Son 625c030744 Fix ParallelIndexTuningConfig constructor (#7479)
* Fix ParallelIndexTuningConfig constructor

* unused import
2019-04-15 17:41:53 -07:00
Lucas Capistrant 8acad27d99 Enhance the Http Firehose to work with URIs requiring basic authentication (#7145)
* Enhnace the HttpFirehose to work with both insecure URIs and URIs requiring basic authentication

* Improve security of enhanced HttpFirehoseFactory by not logging auth credentials

* Fix checkstyle failure in HttpFirehoseFactory.java

* Update docs and fix TeamCity build with required noinspection

* Indentation cleanup and logic modification for HttpFirehose object stream

* Remove default Empty string password provider in http firehose

* Add JavaDoc for MixIn describing its intended use

* Reverting documentation notation for json code to be inline with rest of doc

* Improve instantiation of ObjectMappers that require MixIn for redacting password from task logs

* Add comment to clarify fully qualified references of Objects in SQLMetadataStorageActionHandler
2019-04-15 14:29:01 -07:00
Mingming Qiu 7a38d28cf3 run pending tasks when assigned a task that is already pending (#6991)
* run pending tasks when assigned a task that is already pending

* add unit test

* fix pending tasks aren't going to run

* address comments
2019-04-14 23:37:15 +08:00
Eugene Sevastyanov c57e62bcd6 7153: Isolated a supplier's close by removing its invocation (#7459)
from a non-notice-processing thread
2019-04-12 19:13:08 -07:00
Justin Borromeo 2771ed50b0 Support Kafka supervisor adopting running tasks between versions (#7212)
* Recompute hash in isTaskCurrent() and added tests

* Fixed checkstyle stuff

* Fixed failing tests

* Make TestableKafkaSupervisorWithCustomIsTaskCurrent static

* Add doc

* baseSequenceName change

* Added comment

* WIP

* Fixed imports

* Undid lambda change for diff sake

* Cleanup

* Added comment

* Reinsert Kafka tests

* Readded kinesis test

* Readd bad partition assignment in kinesis supervisor test

* Nit

* Misnamed var
2019-04-10 18:16:38 -07:00
lxqfy 6b741f0800 Overwrite index task maxTotalRows with computed maxRowsPerSegments (#7338)
fixed identation

set maxTotalRows to higher value in case mutiple segments in appendrator
with comments
2019-04-03 11:47:59 -07:00
David Glasser 4e23c11345 Make IngestSegmentFirehoseFactory splittable for parallel ingestion (#7048)
* Make IngestSegmentFirehoseFactory splittable for parallel ingestion

* Code review feedback

- Get rid of WindowedSegment
- Don't document 'segments' parameter or support splitting firehoses that use it
- Require 'intervals' in WindowedSegmentId (since it won't be written by hand)

* Add missing @JsonProperty

* Integration test passes

* Add unit test

* Remove two FIXME comments from CompactionTask

I'd like to leave this PR in a potentially mergeable state, but I still would
appreciate reviewer eyes on the questions I'm removing here.

* Updates from code review
2019-04-02 14:59:17 -07:00
Charles Allen eeb3dbe79d Move GCP to a core extension (#6953)
* Move GCP to a core extension

* Don't provide druid-core >.<

* Keep AWS and GCP modules separate

* Move AWSModule to its own module

* Add aws ec2 extension and more modules in more places

* Fix bad imports

* Fix test jackson module

* Include AWS and GCP core in server

* Add simple empty method comment

* Update version to 15

* One more 0.13.0-->0.15.0 change

* Fix multi-binding problem

* Grep for s3-extensions and update docs

* Update extensions.md
2019-03-27 09:00:43 -07:00
Jihoon Son 5294277cb4
Fix exclusive start partitions for sequenceMetadata (#7339)
* Fix exclusvie start partitions for sequenceMetadata

* add empty check
2019-03-26 14:39:07 -07:00
Roman Leventov bca40dcdaf
Fix some IntelliJ inspections (#7273)
Prepare TeamCity for IntelliJ 2018.3.1 upgrade. Mostly removed redundant exceptions declarations in `throws` clauses.
2019-03-25 21:11:01 -03:00
Jihoon Son 0c5dcf5586 Fix exclusivity for start offset in kinesis indexing service & check exclusivity properly in IndexerSQLMetadataStorageCoordinator (#7291)
* Fix exclusivity for start offset in kinesis indexing service

* some adjustment

* Fix SeekableStreamDataSourceMetadata

* Add missing javadocs

* Add missing comments and unit test

* fix SeekableStreamStartSequenceNumbers.plus and add comments

* remove extra exclusivePartitions in KafkaIOConfig and fix downgrade issue

* Add javadocs

* fix compilation

* fix test

* remove unused variable
2019-03-21 13:12:22 -07:00
Surekha e170203876 Consolidate kafka consumer configs (#7249)
* Consolidate kafka consumer configs

* change the order of adding properties

* Add consumer properties to fix test

it seems kafka consumer does not reveive any message without these configs

* Use KafkaConsumerConfigs in integration test

* Update zookeeper and kafka versions in the setup.sh for the base druid image

*  use version 0.2 of base druid image

* Try to fix tests in KafkaRecordSupplierTest

* unused import

* Fix tests in KafkaSupervisorTest
2019-03-21 11:19:49 -07:00
Roman Leventov dfd27e00c0
Avoid many unnecessary materializations of collections of 'all segments in cluster' cardinality (#7185)
* Avoid many  unnecessary materializations of collections of 'all segments in cluster' cardinality

* Fix DruidCoordinatorTest; Renamed DruidCoordinator.getReplicationStatus() to computeUnderReplicationCountsPerDataSourcePerTier()

* More Javadocs, typos, refactor DruidCoordinatorRuntimeParams.createAvailableSegmentsSet()

* Style

* typo

* Disable StaticPseudoFunctionalStyleMethod inspection because of too much false positives

* Fixes
2019-03-19 18:22:56 -03:00
Jihoon Son 892d1d35d6
Deprecate NoneShardSpec and drop support for automatic segment merge (#6883)
* Deprecate noneShardSpec

* clean up noneShardSpec constructor

* revert unnecessary change

* Deprecate mergeTask

* add more doc

* remove convert from indexMerger

* Remove mergeTask

* remove HadoopDruidConverterConfig

* fix build

* fix build

* fix teamcity

* fix teamcity

* fix ServerModule

* fix compilation

* fix compilation
2019-03-15 23:29:25 -07:00
Gian Merlino a8c7132482 Logic adjustments to SeekableStreamIndexTaskRunner. (#7267)
* Logic adjustments to SeekableStreamIndexTaskRunner.

A mix of simplifications and bug fixes. They are intermingled because
some of the bugs were made difficult to fix, and also more likely to
happen in the first place, by how the code was structured. I tried to
keep restructuring to a minimum. The changes are:

- Remove "initialOffsetsSnapshot", which was used to determine when to
  skip start offsets. Replace it with "lastReadOffsets", which I hope
  is more intuitive. (There is a connection: start offsets must be
  skipped if and only if they have already been read, either by a
  previous task or by a previous sequence in the same task, post-restoring.)
- Remove "isStartingSequenceOffsetsExclusive", because it should always
  be the opposite of isEndOffsetExclusive. The reason is that starts are
  exclusive exactly when the prior ends are inclusive: they must match
  up in that way for adjacent reads to link up properly.
- Don't call "seekToStartingSequence" after the initial seek. There is
  no reason to, since we expect to read continuous message streams
  throughout the task. And calling it makes offset-tracking logic
  trickier, so better to avoid the need for trickiness. I believe the
  call being here was causing a bug in Kinesis ingestion where a
  message might get double-read.
- Remove the "continue" calls in the main read loop. They are bad
  because they prevent keeping currOffsets and lastReadOffsets up to
  date, and prevent us from detecting that we have finished reading.
- Rework "verifyInitialRecordAndSkipExclusivePartition" into
  "verifyRecordInRange". It no longer has side effects. It does a sanity
  check on the message offset and also makes sure that it is not past
  the endOffsets.
- Rework "assignPartitions" to replace inline comparisons with
  "isRecordAlreadyRead" and "isMoreToReadBeforeReadingRecord" calls. I
  believe this fixes an off-by-one error with Kinesis where the last
  record would not get read. It also makes the logic easier to read.
- When doing the final publish, only adjust end offsets of the final
  sequence, rather than potentially adjusting any unpublished sequence.
  Adjusting sequences other than the last one is a mistake since it
  will extend their endOffsets beyond what they actually read. (I'm not
  sure if this was an issue in practice, since I'm not sure if real
  world situations would have more than one unpublished sequence.)
- Rename "isEndSequenceOffsetsExclusive" to "isEndOffsetExclusive". It's
  shorter and more clear, I think.
- Add equals/hashCode/toString methods to OrderedSequenceNumber.

Kafka test changes:

- Added a Kafka "testRestoreAtEndOffset" test to verify that restores at
  the very end of the task lifecycle still work properly.

Kinesis test changes:

- Renamed "testRunOnNothing" to "testRunOnSingletonRange". I think that
  given Kinesis semantics, the right behavior when start offset equals
  end offset (and there aren't exclusive partitions set) is to read that
  single offset. This is because they are both meant to be treated as
  inclusive.
- Adjusted "testRestoreAfterPersistingSequences" to expect one more
  message read. I believe the old test was wrong; it expected the task
  not to read message number 5.
- Adjusted "testRunContextSequenceAheadOfStartingOffsets" to use a
  checkpoint starting from 1 rather than 2. I believe the old test was
  wrong here too; it was expecting the task to start reading from the
  checkpointed offset, but it actually should have started reading from
  one past the checkpointed offset.
- Adjusted "testIncrementalHandOffReadsThroughEndOffsets" to expect
  11 messages read instead of 12. It's starting at message 0 and reading
  up to 10, which should be 11 messages.

* Changes from code review.
2019-03-15 00:22:42 -07:00
Furkan KAMACI 7ada1c49f9 Prohibit Throwables.propagate() (#7121)
* Throw caught exception.

* Throw caught exceptions.

* Related checkstyle rule is added to prevent further bugs.

* RuntimeException() is used instead of Throwables.propagate().

* Missing import is added.

* Throwables are propogated if possible.

* Throwables are propogated if possible.

* Throwables are propogated if possible.

* Throwables are propogated if possible.

* * Checkstyle definition is improved.
* Throwables.propagate() usages are removed.

* Checkstyle pattern is changed for only scanning "Throwables.propagate(" instead of checking lookbehind.

* Throwable is kept before firing a Runtime Exception.

* Fix unused assignments.
2019-03-14 18:28:33 -03:00
Clint Wylie fb1489d313 fix SequenceMetadata deserialization (#7256)
* wip

* fix tests, stop reading if we are at end offset

* fix build

* remove restore at end offsets fix in favor of a separate PR

* use typereference from method for serialization too
2019-03-13 17:29:39 -04:00
Jihoon Son 873232954f Fix log level and throw NPE on null currOffset in SeekableStreamIndexTaskRunner (#7253) 2019-03-13 10:20:43 -04:00
Jihoon Son 32e86ea75e Fix record validation in SeekableStreamIndexTaskRunner (#7246)
* Fix record validation in SeekableStreamIndexTaskRunner

* add kinesis test
2019-03-12 21:12:21 -07:00
Gian Merlino dcfca03718
More accurate RealtimeMetricsMonitor messages. (#7230)
The old messages did not reflect the full range of reasons why messages
could be thrown away.
2019-03-11 19:50:32 -04:00
Jihoon Son e48a9c138e Reduce default max # of subTasks to 1 for native parallel task (#7181)
* Reduce # of max subTasks to 2

* fix typo and add more doc

* add more doc and link

* change default and add warning

* fix doc

* add test

* fix it test
2019-03-05 22:06:36 -08:00
Gian Merlino fa218f5160 Fix two SeekableStream serde issues. (#7176)
* Fix two SeekableStream serde issues.

1) Fix backwards-compatibility serde for SeekableStreamPartitions. It is needed
   for split 0.13 / 0.14 clusters to work properly during a rolling update.
2) Abstract classes don't need JsonCreator constructors; remove them.

* Comment fixes.
2019-03-01 22:27:08 -08:00
Jihoon Son 06c8229c08
Kill all running tasks when the supervisor task is killed (#7041)
* Kill all running tasks when the supervisor task is killed

* add some docs and simplify

* address comment
2019-03-01 11:28:03 -08:00
Jihoon Son cacdc83cad Improve error message for integer overflow in compaction task (#7131)
* improve error message for integer overflow in compaction task

* fix build
2019-02-28 11:07:37 +08:00
Himanshu Pandey 8b803cbc22 Added checkstyle for "Methods starting with Capital Letters" (#7118)
* Added checkstyle for "Methods starting with Capital Letters" and changed the method names violating this.

* Un-abbreviate the method names in the calcite tests

* Fixed checkstyle errors

* Changed asserts position in the code
2019-02-23 20:10:31 -08:00
David Glasser 1c2753ab90 ParallelIndexSubTask: support ingestSegment in delegating factories (#7089)
IndexTask had special-cased code to properly send a TaskToolbox to a
IngestSegmentFirehoseFactory that's nested inside a CombiningFirehoseFactory,
but ParallelIndexSubTask didn't.

This change refactors IngestSegmentFirehoseFactory so that it doesn't need a
TaskToolbox; it instead gets a CoordinatorClient and a SegmentLoaderFactory
directly injected into it.

This also refactors SegmentLoaderFactory so it doesn't depend on
an injectable SegmentLoaderConfig, since its only method always
replaces the preconfigured SegmentLoaderConfig anyway.
This makes it possible to use SegmentLoaderFactory without setting
druid.segmentCaches.locations to some dummy value.

Another goal of this PR is to make it possible for IngestSegmentFirehoseFactory
to list data segments outside of connect() --- specifically, to make it a
FiniteFirehoseFactory which can query the coordinator in order to calculate its
splits. See #7048.

This also adds missing datasource name URL-encoding to an API used by
CoordinatorBasedSegmentHandoffNotifier.
2019-02-23 17:02:56 -08:00
Jihoon Son 4e2b085201
Remove DataSegmentFinder, InsertSegmentToDb, and descriptor.json file in deep storage (#6911)
* Remove DataSegmentFinder, InsertSegmentToDb, and descriptor.json file

* delete descriptor.file when killing segments

* fix test

* Add doc for ha

* improve warning
2019-02-20 15:10:29 -08:00
David Glasser a81b1b8c9c index_parallel: support !appendToExisting with no explicit intervals (#7046)
* index_parallel: support !appendToExisting with no explicit intervals

This enables ParallelIndexSupervisorTask to dynamically request locks at runtime
if it is run without explicit intervals in the granularity spec and with
appendToExisting set to false.  Previously, it behaved as if appendToExisting
was set to true, which was undocumented and inconsistent with IndexTask and
Hadoop indexing.

Also, when ParallelIndexSupervisorTask allocates segments in the explicit
interval case, fail if its locks on the interval have been revoked.

Also make a few other additions/clarifications to native ingestion docs.

Fixes #6989.

* Review feedback.

PR description on GitHub updated to match.

* Make native batch ingestion partitions start at 0

* Fix to previous commit

* Unit test. Verified to fail without the other commits on this branch.

* Another round of review

* Slightly scarier warning
2019-02-20 10:54:26 -08:00
Clint Wylie cadb6c5280 Missing Overlord and MiddleManager api docs (#7042)
* document middle manager api

* re-arrange

* correction

* document more missing overlord api calls, minor re-arrange of some code i was referencing

* fix it

* this will fix it

* fixup

* link to other docs
2019-02-19 10:52:05 -08:00