Commit Graph

388 Commits

Author SHA1 Message Date
Jihoon Son 1524af703d
Fix IllegalArgumentException in TaskLockBox.syncFromStorage() (#6050) 2018-07-27 10:43:32 -07:00
Jihoon Son b7d42edb0f Check the kafka topic when comparing checkpoints from tasks and the one stored in metastore (#6015) 2018-07-20 11:20:23 -07:00
Jihoon Son c48aa74a30 Fix NPE while handling CheckpointNotice in KafkaSupervisor (#5996)
* Fix NPE while handling CheckpointNotice

* fix code style

* Fix test

* fix test

* add a log for creating a new taskGroup

* fix backward compatibility in KafkaIOConfig
2018-07-13 17:14:57 -07:00
Gian Merlino 04ea3c9f8c
Update license headers. (#5976)
* Update license headers.

For compliance with http://www.apache.org/legal/src-headers.html.

* More license adjustments.

* Fix mistakenly edited package line.
2018-07-11 09:55:18 -07:00
Gian Merlino 948e73da77 Extend various test timeouts. (#5978)
False failures on Travis due to spurious timeouts (in turn due to noisy
neighbors) are a bigger problem than legitimate failures taking too long
to time out. So it makes sense to extend timeouts.
2018-07-10 13:02:14 -07:00
Surekha 9bece8ce1e Prevent KafkaSupervisor NPE in generateSequenceName (#5900) (#5902)
* Prevent KafkaSupervisor NPE in checkPendingCompletionTasks (#5900)

* throw IAE in generateSequenceName if groupId not found in taskGroups
* add null check in checkPendingCompletionTasks

* Add warn log in checkPendingCompletionTasks

* Address PR comments

Replace warn with error log

* Address PR comments

* change signature of generateSequenceName to take a TaskGroup object instead of int

* Address comments

* Remove unnecessary method from KafkaSupervisorTest
2018-07-04 23:45:42 -07:00
Jihoon Son 1ccabab98e Fix the broken Appenderator contract in KafkaIndexTask (#5905)
* Fix broken Appenderator contract in KafkaIndexTask

* fix build

* add publishFuture

* reuse sequenceToUse if possible
2018-07-03 13:31:29 -07:00
Jihoon Son b76a056c14 Fix ConcurrentModificationException in IncrementalPublishingKafkaIndexTaskRunner (#5907)
* Fix ConcurrentModificationException in IncrementalPublishingKafkaIndexTaskRunner

* fix lock and add comments
2018-06-30 17:20:41 -07:00
Surekha 0f429298cf Fix Kafka Indexing task pause forever if no events in taskDuration (#5656) (#5899)
* Fix Kafka Indexing task pause forever (#5656)

* Fix Nullpointer Exception in overlord if taskGroups does not contain the groupId
* If the endOffset is same as startOffset, still let the task resume instead of returning
   endOffsets early which causes the tasks to pause forever and ultimately fail on timeout

* Address PR comment

* Remove the null check and do not return null from generateSequenceName
2018-06-25 19:29:36 -07:00
Jihoon Son 8c5ded0fad
Splitting KafkaIndexTask for better code maintenance (#5854)
* Refactoring KafkaIndexTask for better code maintenance

* fix bug

* fix test

* add annotation

* fix checkstyle

* remove SetEndOffsetsResult
2018-06-22 13:00:03 -07:00
Surekha 8619adb5b9 Improve task retrieval APIs on Overlord (#5801)
* Add the new tasks api in overlordResource

It takes 4 optional query params
* state(pending/running/waiting/complete)
* dataSource
* interval (applies to completed tasks)
* maxCompletedTasks (applies to completed tasks)

If all params are null, the api returns all the tasks

* Add the state to each task returned by tasks endpoint

* divide active tasks into waiting, pending or running
* Add more unit tests

* Add UNKNOWN state to TaskState

* Fix the authorization calls

* WIP: PR comments

Added new class to capture task info for caching
Other refactoring

* Refactoring : move TaskStatus class to druid-api

so it can be accessed within server
And other related classes like TaskState and TaskStatusPlus are in api

* Remove unused class and apis accessing it

* Add a separate cache for recently completed tasks

This is to mainly capture the task type from payload

* Ignore a test

* Add a RuntimeTaskState to encompass all states a task can be in

* Revert "Add a RuntimeTaskState to encompass all states a task can be in"

This reverts commit 2a527a0731.

* Fix wrong api call

* Fix and unignore tests

* Remove waiting,pending state from TaskState

* Add RunnerTaskState

* Missed the annotation runnerStatusCode

* Fix the creationTime

* Fix the createdTime and queueInsertionTime for running/active tasks
* Clean up tests

* Add javadocs

* Potentially fix the teamcity build

* Address PR comments

* Get rid of TaskInfoBuilder
* Make TaskInfoMapper static nested class
* Other changes

* fix import in MaterializedViewSupervisor after merge

* Address PR comments on

* Replace global cache with local map
* combine multiple queries into one
* Removed unused code

* Fix unit tests

Fix a bug in securedTaskStatusPlus

* Remove getRecentlyFinishedTaskStatuses method

Change TaskInfoMapper signature to add generic type

* Address PR comments

* Passed datasource as argument to be used in sql query
* Other minor fixes

* Address PR comments

* Some minor changes, rename method, spacing changes

* Add early auth check if datasource is not null

* Fix test case

* Add max limit to getRecentlyFinishedTaskInfo in HeapMemoryTaskStorage
* Add TaskLocation to Anytask object

* Address PR comments

* Fix a bug in test case causing ClassCastException
2018-06-19 11:34:59 -07:00
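The four optional query parameters listed in the commit above (#5801) could be assembled into a request URL roughly as follows. This is only a sketch: the endpoint path `/druid/indexer/v1/tasks`, the host name, and the helper `tasks_url` are assumptions, and the parameter names are copied from the commit message, so they may differ from what the merged endpoint actually accepts.

```python
from urllib.parse import urlencode

# Assumed path; the commit message only says a "tasks" API was added to the Overlord resource.
TASKS_PATH = "/druid/indexer/v1/tasks"


def tasks_url(base, state=None, data_source=None, interval=None, max_completed_tasks=None):
    """Build a task-listing URL, omitting every parameter that is None."""
    candidates = [
        ("state", state),                            # pending / running / waiting / complete
        ("dataSource", data_source),
        ("interval", interval),                      # applies to completed tasks
        ("maxCompletedTasks", max_completed_tasks),  # applies to completed tasks
    ]
    params = [(key, value) for key, value in candidates if value is not None]
    query = urlencode(params)
    # With no parameters set, the bare path asks for all tasks.
    return base + TASKS_PATH + ("?" + query if query else "")
```

Calling `tasks_url("http://overlord:8090")` leaves off the query string entirely, which matches the note that the API returns all tasks when every parameter is null.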
Jonathan Wei 684b5d18c1
Moving averages for ingestion row stats (#5748)
* Moving averages for ingestion row stats

* PR comments

* Make RowIngestionMeters extensible

* test and checkstyle fixes

* More PR comments

* Fix metrics

* Add some comments

* PR comments

* Comments
2018-06-05 09:08:57 -07:00
Kirill Kozlov 67d0b0ee42 Add taskType dimension to task metrics (#5664) 2018-05-07 09:42:26 -07:00
Surekha 13c616ba24 'maxBytesInMemory' tuningConfig introduced for ingestion tasks (#5583)
* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks

Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing. If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.

 * The default value is 1/3(Runtime.maxMemory())
 * To maintain the current behaviour set 'maxBytesInMemory' to -1
 * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
   will be respected i.e. the first one to go above threshold will trigger persist

* Fix check style and remove a comment

* Add overlord unsecured paths to coordinator when using combined service (#5579)

* Add overlord unsecured paths to coordinator when using combined service

* PR comment

* More error reporting and stats for ingestion tasks (#5418)

* Add more indexing task status and error reporting

* PR comments, add support in AppenderatorDriverRealtimeIndexTask

* Use TaskReport instead of metrics/context

* Fix tests

* Use TaskReport uploads

* Refactor fire department metrics retrieval

* Refactor input row serde in hadoop task

* Refactor hadoop task loader names

* Truncate error message in TaskStatus, add errorMsg to task report

* PR comments

* Allow getDomain to return disjointed intervals (#5570)

* Allow getDomain to return disjointed intervals

* Indentation issues

* Adding feature thetaSketchConstant to do some set operation in PostAgg (#5551)

* Adding feature thetaSketchConstant to do some set operation in PostAggregator

* Updated review comments for PR #5551 - Adding thetaSketchConstant

* Fixed CI build issue

* Updated review comments 2 for PR #5551 - Adding thetaSketchConstant

* Fix taskDuration docs for KafkaIndexingService (#5572)

* With incremental handoff the changed line is no longer true.

* Add doc for automatic pendingSegments (#5565)

* Add missing doc for automatic pendingSegments

* address comments

* Fix indexTask to respect forceExtendableShardSpecs (#5509)

* Fix indexTask to respect forceExtendableShardSpecs

* add comments

* Deprecate spark2 profile in pom.xml (#5581)

Deprecated due to https://github.com/druid-io/druid/pull/5382

* CompressionUtils: Add support for decompressing xz, bz2, zip. (#5586)

Also switch various firehoses to the new method.

Fixes #5585.

* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks

Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing. If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.

 * The default value is 1/3(Runtime.maxMemory())
 * To maintain the current behaviour set 'maxBytesInMemory' to -1
 * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
   will be respected i.e. the first one to go above threshold will trigger persist

* Address code review comments

* Fix the coding style according to druid conventions
* Add more javadocs
* Rename some variables/methods
* Other minor issues

* Address more code review comments

* Some refactoring to put defaults in IndexTaskUtils
* Added check for maxBytesInMemory in AppenderatorImpl
* Decrement bytes in abandonSegment
* Test unit test for multiple sinks in single appenderator
* Fix some merge conflicts after rebase

* Fix some style checks

* Merge conflicts

* Fix failing tests

Add back check for 0 maxBytesInMemory in OnHeapIncrementalIndex

* Address PR comments

* Put defaults for maxRows and maxBytes in TuningConfig
* Change/add javadocs
* Refactoring and renaming some variables/methods

* Fix TeamCity inspection warnings

* Added maxBytesInMemory config to HadoopTuningConfig

* Updated the docs and examples

* Added maxBytesInMemory config in docs
* Removed references to maxRowsInMemory under tuningConfig in examples

* Set maxBytesInMemory to 0 until used

Set the maxBytesInMemory to 0 if user does not set it as part of tuningConfig
and set to part of max jvm memory when ingestion task starts

* Update toString in KafkaSupervisorTuningConfig

* Use correct maxBytesInMemory value in AppenderatorImpl

* Update DEFAULT_MAX_BYTES_IN_MEMORY to 1/6 max jvm memory

Experimenting with various defaults, 1/3 jvm memory causes OOM

* Update docs to correct maxBytesInMemory default value

* Minor to rename and add comment

* Add more details in docs

* Address new PR comments

* Address PR comments

* Fix spelling typo
2018-05-03 16:25:58 -07:00
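The persist rule described in the commit above (#5583) can be sketched in a few lines: both limits are checked, and whichever is crossed first triggers a persist. The function names are hypothetical, the use of `>=` rather than `>` is an assumption, and the 1/6 fallback follows the later bullet that lowered the default from 1/3; none of this is the actual Druid code.

```python
def resolve_max_bytes(configured, jvm_max_memory):
    """Turn the configured maxBytesInMemory into an effective limit.

    0 is treated as "not set by the user" and falls back to 1/6 of the
    JVM max memory; any other value (including -1) is returned unchanged.
    """
    if configured == 0:
        return jvm_max_memory // 6
    return configured


def should_persist(rows_in_memory, bytes_in_memory, max_rows, max_bytes):
    """Return True when either limit has been reached."""
    row_limit_hit = rows_in_memory >= max_rows
    # A non-positive max_bytes (for example -1) disables the byte check,
    # which keeps the older rows-only behaviour.
    byte_limit_hit = max_bytes > 0 and bytes_in_memory >= max_bytes
    return row_limit_hit or byte_limit_hit
```

With `max_bytes` set to -1 only the row count matters, so a task holding few but very large rows would not persist early under this sketch.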
David Lim 8ec2d2fe18 Use unique segment paths for Kafka indexing (#5692)
* support unique segment file paths

* forbiddenapis

* code review changes

* code review changes

* code review changes

* checkstyle fix
2018-04-29 21:59:48 -07:00
Jonathan Wei 969342cd28
More error reporting and stats for ingestion tasks (#5418)
* Add more indexing task status and error reporting

* PR comments, add support in AppenderatorDriverRealtimeIndexTask

* Use TaskReport instead of metrics/context

* Fix tests

* Use TaskReport uploads

* Refactor fire department metrics retrieval

* Refactor input row serde in hadoop task

* Refactor hadoop task loader names

* Truncate error message in TaskStatus, add errorMsg to task report

* PR comments
2018-04-05 21:38:57 -07:00
Jonathan Wei 723f7ac550
Add support for task reports, upload reports to deep storage (#5524)
* Add support for task reports, upload reports to deep storage

* PR comments

* Better name for method

* Fix report file upload

* Use TaskReportFileWriter

* Checkstyle

* More PR comments
2018-04-02 12:10:56 -07:00
Kirill Kozlov 8878a7ff94 Replace guava Charsets with native java StandardCharsets (#5545) 2018-03-28 21:00:08 -07:00
Roman Leventov 693e3575f9
Remove unused code and exception declarations (#5461)
* Remove unused code and exception declarations

* Address comments

* Remove redundant Exception declarations

* Make FirehoseFactoryV2.connect() to throw IOException again
2018-03-16 22:11:12 +01:00
Jihoon Son 9b2a25bd84
Refactor supervisorReport to be type-safe (#5479)
* refactor supervisorReport

* use primitives
2018-03-13 09:28:44 -07:00
Niraja Mishra 96cebfc222 As part of this feature, implemented a new endpoint to get running tasks by datasources (#5260)
and added datasource information as part of existing endpoint /druid/indexer/v1/runningTasks.

Added junit test cases for the newly implemented API and fixed existing junit test cases.

Fixed review comments - added new method getCreatedDateTimeAndDataSource into TaskStorageQueryAdapter class
and formatted changed files
2018-03-12 23:48:11 -07:00
Clint Wylie f948066710 KafkaIndexTask remove branch with unreachable code (#5434) 2018-03-02 17:26:12 -08:00
Jihoon Son 16e08c9adb add task priority for kafka indexing (#5444) 2018-02-28 22:29:23 -08:00
Parag Jain fba13d8978 time based checkpointing for Kafka Indexing Service (#5255)
* time based checkpointing

* add test and fix issue

* fix comments

* fix formatting

* update docs
2018-02-15 20:57:02 -08:00
Jihoon Son cd929000ca
Change early publishing to early pushing in indexTask & refactor AppenderatorDriver (#5297)
* Fix early publishing to early pushing in batch indexing & refactor appenderatorDriver

* fix compile

* rename and add more javadocs

* Fix conflicts

* address comments

* revert await executors

* fix test
2018-02-14 12:48:33 -08:00
Jonathan Wei b234a119ac Log exceptions thrown before persist() for indexing tasks (#5374)
* Log exceptions thrown before persist() for indexing tasks

* PR comment
2018-02-13 09:20:07 -08:00
Roman Leventov e64ffb10c2 Standardize on using Integer.BYTES instead of Ints.BYTES from Guava, same for other primitives (#5366) 2018-02-07 13:24:30 -08:00
Gian Merlino 9a62b02cb7 Extensions: Option to load classes from extension jars first. (#5321)
The behavior is configurable through druid.extensions.useExtensionClassloaderFirst.
It is useful when extensions want to load a dependency different from one provided
by Druid, for example a different version of geoip or protobuf.
2018-02-06 16:14:03 +05:30
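The effect of `druid.extensions.useExtensionClassloaderFirst` described in the commit above (#5321) comes down to lookup order. The toy model below uses plain dictionaries in place of classloaders; the function name, the dictionaries, and the version strings are all hypothetical, and it assumes that Druid's own classes win when the option is off.

```python
def resolve_class(name, extension_classes, druid_classes, extension_first):
    """Return the first definition of `name` found in the chosen lookup order."""
    if extension_first:
        order = [extension_classes, druid_classes]
    else:
        order = [druid_classes, extension_classes]
    for source in order:
        if name in source:
            return source[name]
    raise LookupError(name)


# Hypothetical example: both sides bundle protobuf, at different versions.
druid = {"com.google.protobuf.Message": "protobuf 2.x (from Druid)"}
extension = {"com.google.protobuf.Message": "protobuf 3.x (from extension jar)"}
```

With `extension_first=True` the extension's copy is returned, which is the case the commit message gives for a different version of geoip or protobuf.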
Gian Merlino 7e02408510 Update versions to 0.13.0-SNAPSHOT. (#5323) 2018-02-02 12:06:38 -06:00
Jonathan Wei 80419752b5 Add metamx emitter, http clients, and metrics packages to druid java-util (#5289)
* Add metamx java-util emitter, http clients, and metrics packages to druid java-util

* Remove metamx java-util from pom.xml files

* Checkstyle fixes

* Import fix

* TeamCity inspection fixes

* Use slf4j, move some version defs to master pom.xml

* Use parent jvm-attach-api and maven-surefire-plugin versions

* Add ] to log msg, suppress inspection
2018-01-24 22:10:36 +01:00
Jihoon Son 241efafbb2
Automatic compaction by coordinators (#5102)
* Automatic compaction by coordinator

* add links

* skip compaction for very recent segments if they are small

* fix finding search interval

* fix finding search interval

* fix TimelineHolder iteration

* add test for newestSegmentFirstPolicy

* add CompactionSegmentIterator

* add numTargetCompactionSegments

* add missing config

* fix skipping huge shards

* fix handling large number of segments per shard

* fix test failure

* change recursive call to loop

* fix logging

* fix build

* fix test failure

* address comments

* change dataSources type

* check running pendingTasks at each run

* fix test

* address comments

* fix build

* fix test

* address comments

* address comments

* add doc for segment size optimization

* address comment
2018-01-13 13:52:37 +09:00
Roman Leventov 8877ce38d6
Enforce modifier order with Checkstyle (#5246) 2018-01-11 09:50:42 +01:00
Parag Jain 83c6c48bed Fix state check bug in Kafka Index Task (#5204)
* fix state check for replacement task

* fix comments

* rebase with master
2018-01-08 18:01:36 -08:00
Jonathan Wei cdd374a417 Throw away rows with timestamps beyond long bounds in kafka indexing (#5215)
* Throw away rows with timestamps beyond long bounds in kafka indexing

* PR comments
2018-01-08 17:40:50 -06:00
Roman Leventov 579f9fbedf Add IndexedInts.debugToString() and AbstractIndex.toString(); Add Sequence.toList() and limit() (#5175)
* Add IndexedInts.debugToString() and AbstractIndex.toString()

* Fix AppenderatorTest
2018-01-04 09:56:47 +09:00
Jihoon Son 9199d61389 Automatic pendingSegments cleanup (#5149)
* PendingSegments cleanup

* fix build

* address comments

* address comments

* fix potential npe

* address comments

* fix build

* fix test

* fix test
2017-12-20 14:46:34 -08:00
Parag Jain c56a9807d4
prevent npe on mismatch between number of kafka partitions and task count (#5139) 2017-12-20 16:23:15 -06:00
Roman Leventov 5787d04fad Bump Druid version to 0.12.0 (#5138) 2017-12-15 07:37:01 -08:00
Parag Jain 677e24b760 prevent NPE from suppressing actual exception (#5146) 2017-12-12 11:42:30 -08:00
Roman Leventov 64848c7ebf DataSegment memory optimizations (#5094)
* Deduplicate DataSegments contents (loadSpec's keys, dimensions and metrics lists as a whole) more aggressively; use ArrayMap instead of default LinkedHashMap for DataSegment.loadSpec, because they have only 3 entries on average; prune DataSegment.loadSpec on brokers

* Fix DataSegmentTest

* Refinements

* Try to fix

* Fix the second DataSegmentTest

* Nullability

* Fix tests

* Fix tests, unify to use TestHelper.getJsonMapper()

* Revert TestUtil as ServerTestHelper, fix tests

* Add newline

* Fix indexing tests

* Fix s3 tests

* Try to fix tests, remove lazy caching of ObjectMapper in TestHelper, rename TestHelper.getJsonMapper() to makeJsonMapper()

* Fix HDFS tests

* Fix HdfsDataSegmentPusherTest

* Capitalize constant names
2017-12-12 11:41:40 -08:00
Roman Leventov a7a6a0487e Replace IOPeon with SegmentWriteOutMedium; Improve buffer compression (#4762)
* Replace IOPeon with OutputMedium; Improve compression

* Fix test

* Cleanup CompressionStrategy

* Javadocs

* Add OutputBytesTest

* Address comments

* Random access in OutputBytes and GenericIndexedWriter

* Fix bugs

* Fixes

* Test OutputBytes.readFully()

* Address comments

* Rename OutputMedium to SegmentWriteOutMedium and OutputBytes to WriteOutBytes

* Add comments to ByteBufferInputStream

* Remove unused declarations
2017-12-04 18:04:27 -08:00
Parag Jain 7c01f77b04 Parse Batch support (#5081)
* add parseBatch and deprecate parse method in InputRowParser

add addAll method, skip max rows in memory check for it

remove parse method from implementations

transform transformers

add string multiplier input row parser

fix withParseSpec

fix kafka batch indexing

fix isPersistRequired

comments

* add unit test

* make persist async

* review comments
2017-12-04 16:06:16 -06:00
Parag Jain cb03efeb14 Kafka Index Task that supports Incremental handoffs (#4815)
* Kafka Index Task that supports Incremental handoffs
- Incrementally handoff segments when they hit maxRowsPerSegment limit
- Decouple segment partitioning from Kafka partitioning, all records from consumed partitions go to a single druid segment
- Support for restoring task on middle manager restarts by check pointing end offsets for segments

* take care of review comments

* make getCurrentOffsets call async, keep track of publishing sequence, review comments

* fix setEndoffset duplicate request handling, formatting

* fix unit test

* backward compatibility

* make AppenderatorDriverMetadata backwards compatible

* add unit test

* fix deadlock between persist and push executors in AppenderatorImpl

* fix formatting

* use persist dir instead of work dir

* review comments

* fix deadlock

* actually fix deadlock
2017-11-17 16:05:20 -06:00
Gian Merlino 5da0241ac8
Kafka: Fixes needlessly low interpretation of maxRowsInMemory. (#5034)
AppenderatorImpl already applies maxRowsInMemory across all sinks. So dividing by
the number of Kafka partitions is pointless and effectively makes the interpretation
of maxRowsInMemory lower than expected.

This undoes one of the two changes from #3284, which fixed the original bug twice.
In this case, that's worse than fixing it once.
2017-11-02 13:45:04 -06:00
Gian Merlino 6c725a7e06 Fix havingSpec on complex aggregators. (#5024)
* Fix havingSpec on complex aggregators.

- Uses the technique from #4883 on DimFilterHavingSpec too.
- Also uses Transformers from #4890, necessitating a move of that and other
  related classes from druid-server to druid-processing. They probably make
  more sense there anyway.
- Adds a SQL query test.

Fixes #4957.

* Remove unused import.
2017-11-01 12:58:08 -04:00
Gian Merlino 0ce406bdf1
Introduce "transformSpec" at ingest-time. (#4890)
* Introduce "transformSpec" at ingest-time.

It accepts a "filter" (standard query filter object) and "transforms" (a
list of objects with "name" and "expression"). These can be used to do
filtering and single-row transforms without need for a separate data
processing job.

The "expression" fields use the same expression language as other
expression-based features.

* Remove forbidden api.

* Fix compile error.

* Fix tests.

* Some more changes.

- Add nullable annotation to Firehose.nextRow.
- Add tests for index task, realtime task, kafka task, hadoop mapper,
  and ingestSegment firehose.

* Fix bad merge.

* Adjust imports.

* Adjust whitespace.

* Make Transform into an interface.

* Add missing annotation.

* Switch logger.

* Switch logger.

* Adjust test.

* Adjustment to handling for DatasourceIngestionSpec.

* Fix test.

* CR comments.

* Remove unused method.

* Add javadocs.

* More javadocs, and always decorate.

* Fix bug in TransformingStringInputRowParser.

* Fix bad merge.

* Fix ISFF tests.

* Fix DORC test.
2017-10-30 17:38:52 -07:00
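A "transformSpec" of the shape described in the commit above (#4890) might look like the structure below, built as a Python dictionary and serialized to JSON. The column names (`country`, `firstName`, `lastName`, `fullName`) are made up for illustration. The commit message names only the "filter", "transforms", "name" and "expression" fields; the `"type"` entries follow Druid's usual typed-JSON convention and are an assumption here.

```python
import json

transform_spec = {
    # A standard query filter object: keep only rows where country == "US".
    "filter": {"type": "selector", "dimension": "country", "value": "US"},
    # Single-row transforms: each entry has a "name" and an "expression".
    "transforms": [
        {
            "type": "expression",  # assumed discriminator, not named in the commit message
            "name": "fullName",
            "expression": "concat(firstName, ' ', lastName)",
        }
    ],
}

# JSON text that would sit alongside the rest of an ingestion spec.
transform_spec_json = json.dumps(transform_spec, indent=2)
```

Because the filter and the expression are evaluated per row at ingest time, this kind of spec covers the filtering and single-row transforms the commit mentions without a separate data processing job.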
elloooooo 52a162e302 define earlyMessageRejectionPeriod as the period after the taskduration (#4990) 2017-10-27 01:13:46 +05:30
Jihoon Son 8d9902831e Refactoring PrefetchableTextFilesFirehoseFactory (#4836)
* Refactoring prefetchable firehose

* Fix to read cache when prefetch is disabled

* More tests

* Cleanup codes

* Add Fetcher

* Fix test failure

* Count file size

* Fix test

* rename generic parameter

* address comments

* address comments

* reuse buffer

* move Execs to java-util

* use execs

* Fix build
2017-10-13 21:39:28 -05:00
Jihoon Son dfa9cdc982 Prioritized locking (#4550)
* Implementation of prioritized locking

* Fix build failure

* Fix tc fail

* Fix typos

* Fix IndexTaskTest

* Addressed comments

* Fix test

* Fix spacing

* Fix build error

* Fix build error

* Add lock status

* Cleanup suspicious method

* Add nullables

*  add doInCriticalSection to TaskLockBox and revert return type of task actions

* fix build

* refactor CriticalAction

* make replaceLock transactional

* fix formatting

* fix javadoc

* fix build
2017-10-11 23:16:31 -07:00
Jihoon Son 56fb11ce0b Lazy initialization for JavaScript functions (#4871)
* Lazy initialization of JavaScript functions

* Fix test failure

* Fix thread-safety and postpone js conf check

* Fix test fail

* Fix test

* Fix KafkaIndexTaskTest

* Move config check
2017-10-10 21:52:42 -07:00
Gian Merlino 1f2074c247 Bump versions in master to 0.11.1-SNAPSHOT. (#4878)
* Bump versions in master to 0.11.1-SNAPSHOT.

* Missed a few.
2017-09-28 17:09:51 -05:00
Gian Merlino bf8fd4c203 Add flattenSpec support to the Avro parser. (#4832)
* Add flattenSpec support to the Avro parser.

Also:

- Refactor the JSONPathParser a bit so it can share flattening code
  with Avro (see ObjectFlatteners).
- Remove the JSONParser. It was only used in two places: by
  UriNamespaceExtractor, and as a base for JSONToLowerParser. Migrated
  the former to JSONPathParser and made the latter a standalone.
- Move GenericRecordAsMap to the Parquet extension, since the Avro
  extension no longer uses it.

* Fix indentation.

* Fix equals/hashCode.
2017-09-26 09:26:06 -07:00
Parag Jain 07446ef32c warn if topic not found (#4834) 2017-09-25 12:21:46 +09:00
Roman Leventov e267f3901b Enforce Indentation with Checkstyle (#4799) 2017-09-21 13:06:48 -07:00
Jonathan Wei c2a0e753b6 Extension points for authentication/authorization (#4271)
* Extension points for authentication/authorization

* Address some PR comments

* Authorization result caching

* Add unit tests for SecuritySanityCheckFilter and PreResponseAuthorizationCheckFilter

* Use Set for auth caching, close outputstreams in filters

* Don't close output stream on success in sanity check filter

* Add ConfigResourceFilter to coordinator lookups

* Fix filtering authorization check for empty resource list

* HttpClient users must explicitly escalate the client

* Remove response modification from PreResponseAuthorizationCheckFilter

* Remove extraneous pom.xml

* Fix unit test

* Better lifecycle management

* Rename AuthorizationManager to Authorizer

* Fix authorization denials for empty supervisor list

* Address some PR comments

* Address more PR comments

* Small cleanup

* Add Jetty HttpClient wrapper to Authenticator

* Remove Authorizer start/stop

* Restore immutable context map in DruidConnection, UT fix

* Fix/update docs

* Add authorization checks to EventReceiverFirehose

* Fix router authorization check failure, restore PreResponseAuthorizationFilter changes

* Compile fixes

* Test fixes

* Update Authenticator/Authorizer doc comments

* Merge fixes

* PR comments

* Fix test

* Fix IT

* More PR comments

* PR comments

* SSL fix
2017-09-15 23:45:48 -07:00
Himanshu 4c04083926 kafkaIndexTask unannounce service in final block (#4736) 2017-09-01 09:31:15 -07:00
Roman Leventov cbd1902db8 Add forbidden-apis plugin; prohibit using system time zone (#4611)
* Forbidden APIs WIP

* Remove some tests

* Restore io.druid.math.expr.Function

* Integration tests fix

* Add comments

* Fix in SimpleWorkerProvisioningStrategy

* Formatting

* Replace String.format() with StringUtils.format() in RemoteTaskRunnerTest

* Address comments

* Fix GroupByMultiSegmentTest
2017-08-21 13:02:42 -07:00
Himanshu 74a64c88ab internal-discovery: interfaces for announcement/discovery, curator based impls (#4634)
* internal-discovery: interfaces for announcement/discovery, curator impls

* more tests

* address some review comments

* more fixes

* address more review comments

* simplify ObjectMapper setup in CuratorDruidNodeAnnouncerAndDiscoveryTest

* fix KafkaIndexTaskTest

* make lookupTier overridable via RealtimeIndexTask and KafkaIndexTask context

* make teamcity build happy
2017-08-16 13:07:16 -07:00
Parag Jain 725a144096 add localhost as advertised hostname (#4689)
* add localhost as advertised hostname

* set advertised.host.name to localhost for test kafka broker
2017-08-14 16:59:26 -07:00
Roman Leventov bf28d0775b Remove QueryRunner.run(Query, responseContext) and related legacy methods (#4482)
* Remove QueryRunner.run(Query, responseContext) and related legacy methods

* Remove local var
2017-08-11 09:12:38 +09:00
Yuewen Wang c821bc9a5a Implement "earlyMessageRejectionPeriod" config discussed in issue #4599 (#4607)
* Implement "earlyMessageRejectionPeriod" config discussed in issue #4599
    * implement the logics of this param
    * Added doc of this config
    * Added unit tests of it

* Update KafkaSupervisor.java

ameliorate comment

* fix format

* fix bug when rebasing
2017-08-11 09:12:08 +09:00
Jihoon Son d5606bc558 Passing lockTimeout as a parameter for TaskLockbox.lock() (#4549)
* Passing lockTimeout as a parameter for TaskLockbox.lock()

* Remove TIME_UNIT

* Fix tc fail

* Add taskLockTimeout to TaskContext

* Add caution
2017-08-08 18:21:07 -07:00
Roman Leventov f5d4171459 Prohibit for loops which could be foreach with IntelliJ (#4653)
* Replace for with foreach

* Replace for with for-each in GroupByQueryEngineV2

* Remove io.druid.collections.IntList
2017-08-08 18:05:33 -07:00
Roman Leventov aa7e4ae5e4 Enforce correct spacing with Checkstyle (#4651) 2017-08-05 10:18:25 -07:00
Roman Leventov c0beb78ffd Enforce brace formatting with Checkstyle (#4564) 2017-07-21 10:26:59 -05:00
Chris Gavin 960cb07ea6 Fix some unnecessary use of boxed types and incorrect format strings spotted by lgtm. (#4474)
* Remove some unnecessary use of boxed types.

* Fix some incorrect format strings.

* Enable IDEA's MalformedFormatString inspection.

* Add a Checkstyle check for finding uses of incorrect logging packages.

* Fix some incorrect usages of the metamx logger.

* Bypass incorrect logger Checkstyle check where using the correct logger is not simple.

* Fix some more places where the wrong number of arguments are provided to format strings.

* Suppress `MalformedFormatString` inspection on legacy logging test.

* Use @SuppressWarnings rather than a noinspection suppression comment.

* Fix some more incorrect format strings.

* Suppress some more incorrect format string warnings where the incorrect string is intentional.

* Log the aggregator when closing it fails.

* Remove some unneeded log lines.
2017-07-13 12:15:32 -07:00
Roman Leventov b2865b7c7b Make possible to start Peon without DI loading of any querying-related stuff (#4516)
* Make QueryRunnerFactoryConglomerate injection lazy in TaskToolbox/TaskToolboxFactory

* Extract QueryablePeonModule and add druid.modules.excludeList config

* Typo
2017-07-12 13:18:25 -05:00
Akash Dwivedi 5f411f14af Timeout for LockAcquireAction (#4461)
* Timeout for LockAcquireAction

* Static inner class.

* Rebase changes.

* makeAlert and throw exception in case of overlapping interval.

* Addressed comments.

* remove unused import.

* Addressed comments
2017-07-11 18:59:32 +09:00
Jihoon Son cc20260078 Early publishing segments in the middle of data ingestion (#4238)
* Early publishing segments in the middle of data ingestion

* Remove unnecessary logs

* Address comments

* Refactoring the patch according to #4292 and address comments

* Set the total shard number of NumberedShardSpec to 0

* refactoring

* Address comments

* Fix tests

* Address comments

* Fix sync problem of committer and retry push only

* Fix doc

* Fix build failure

* Address comments

* Fix compilation failure

* Fix transient test failure
2017-07-10 22:35:36 -07:00
Parag Jain 6e2f78f552 TLS support (#4270) 2017-07-06 17:40:12 -07:00
Roman Leventov 9ae457f7ad Avoid using the default system Locale and printing to System.out in production code (#4409)
* Avoid usages of Default system Locale and printing to System.out or System.err in production code

* Fix Charset in DruidKerberosUtil

* Remove redundant string format in GenericIndexed

* Rename StringUtils.safeFormat() to unimportantSafeFormat(); add StringUtils.format() which fails as well as String.format()

* Fix testSafeFormat()

* More fixes of redundant StringUtils.format() inside ISE

* Rename unimportantSafeFormat() to nonStrictFormat()
2017-06-29 14:06:19 -07:00
Roman Leventov ae900a4934 Update versions to 0.11.0-SNAPSHOT (#4483) 2017-06-28 17:05:58 -07:00
Roman Leventov 05d58689ad Remove the ability to create segments in v8 format (#4420)
* Remove ability to create segments in v8 format

* Fix IndexGeneratorJobTest

* Fix parameterized test name in IndexMergerTest

* Remove extra legacy merging stuff

* Remove legacy serializer builders

* Remove ConciseBitmapIndexMergerTest and RoaringBitmapIndexMergerTest
2017-06-26 13:21:39 -07:00
Roman Leventov 5285eb961b Update dependencies (#4313)
* Update dependencies

* Downgrade curator

* Rollback aws-java-sdk dependency to 1.10.77

* Revert exclusions in integration-tests

* Depend only on aws-java-sdk-ec2 instead of umbrella aws-java-sdk (fixes #4382)
2017-06-09 14:32:07 -07:00
Gian Merlino 1f2afccdf8 Expressions: Add ExprMacros. (#4365)
* Expressions: Add ExprMacros, which have the same syntax as functions, but
can convert themselves to any kind of Expr at parse-time.

ExprMacroTable is an extension point for adding new ExprMacros. Anything
that might need to parse expressions needs an ExprMacroTable, which can
be injected through Guice.

* Address code review comments.
2017-06-08 09:32:10 -04:00
Roman Leventov 31d33b333e Make using implicit system Charset an error (#4326)
* Make using implicit system charset an error

* Use StringUtils.toUtf8() and fromUtf8() instead of String.getBytes() and new String()

* Use English locale in StringUtils.safeFormat()

* Restore comment
2017-06-05 23:57:25 -07:00
David Lim 13ecf90923 Report Kafka lag information in supervisor status report (#4314)
* refactor lag reporting and report lag at status endpoint

* refactor offset reporting logic to fetch offsets periodically vs. at request time

* remove JavaCompatUtils

* code review changes

* code review changes
2017-06-05 13:26:25 -07:00
Jihoon Son da32e1ae53 Reducing testing time for KafkaIndexTaskTest and KafkaSupervisorTest (#4352) 2017-06-03 00:53:07 +09:00
Jihoon Son f876246af7 Rename FiniteAppenderatorDriver to AppenderatorDriver (#4356) 2017-06-03 00:48:44 +09:00
Jihoon Son 1150bf7a2c Refactoring Appenderator Driver (#4292)
* Refactoring Appenderator

1) Added publishExecutor and handoffExecutor for background publishing and handing segments off
2) Change add() to not move segments out in it

* Address comments

1) Remove publishTimeout for KafkaIndexTask
2) Simplifying registerHandoff()
3) Add incremental handoff test

* Remove unused variable

* Add persist() to Appenderator and more tests for AppenderatorDriver

* Remove unused imports

* Fix strict build

* Address comments
2017-06-02 07:09:11 +09:00
Jihoon Son 7889891bd3 Fix integration tests (#4337)
* Fix integration tests

1) Use the same version of kafka
2) Change ServiceEmitter from LazySingleton to ManageLifecycle

* Revert unnecessary change
2017-05-28 08:48:39 -07:00
Jihoon Son 733dfc9b30 Add PrefetchableTextFilesFirehoseFactory for cloud storage types (#4193)
* Add PrefetchableTextFilesFirehoseFactory

* fix comment

* exception handling

* Fix wrong json property

* Remove ReplayableFirehoseFactory and fix misspelling

* Defer object initialization

* Add a temporaryDirectory parameter to FirehoseFactory.connect()

* fix when cache and fetch are disabled

* Address comments

* Add more test

* Increase timeout for test

* Add wrapObjectStream

* Move methods to Firehose from PrefetchableFirehoseFactory

* Cleanup comment

* add directory listing to s3 firehose

* Rename a variable

* Addressing comments

* Update document

* Support disabling prefetch

* Fix race condition

* Add fetchLock

* Remove ReplayableFirehoseFactoryTest

* Fix compilation error

* Fix test failure

* Address comments

* Add default implementation for new method
2017-05-18 15:37:18 +09:00
Himanshu daa8ef8658 Optional long-polling based segment announcement via HTTP instead of Zookeeper (#3902)
* Optional long-polling based segment announcement via HTTP instead of Zookeeper

* address review comments

* make endpoint /druid-internal/v1 instead of /druid/internal so that jetty qos filters can be configured easily when needed

* update segment callback initialization to be called only after the first segment list fetch has succeeded from all servers

* address review comments

* remove size check not required anymore as only segment servers announce themselves and not all peon processes

* announce segment server on historical only after cached segments are loaded

* fix checkstyle errors
2017-05-17 16:31:58 -05:00
Roman Leventov b7a52286e8 Make @Override annotation obligatory (#4274)
* Make MissingOverride an error

* Make travis script fail fast

* Add missing Override annotations

* Comment
2017-05-16 13:30:30 -05:00
David Lim 8333043b7b add skipOffsetGaps flag (#4256) 2017-05-16 12:19:28 -06:00
Roman Leventov 1ebfa22955 Update Error prone configuration; Fix bugs (#4252)
* Make Errorprone the default compiler

* Address comments

* Make Error Prone's ClassCanBeStatic rule an error

* Preconditions allow only %s pattern

* Fix DruidCoordinatorBalancerTester

* Try to give the compiler more memory

* Remove distribution module activation on jdk 1.8 because only jdk 1.8 is used now

* Don't show compiler warnings

* Try different travis script

* Fix travis.yml

* Make Error Prone optional again

* For error-prone compiler

* Increase compiler's maxmem

* Don't run Error Prone for benchmarks because of OOM

* Skip install step in Travis

* Remove MetricHolder.writeToChannel()

* In travis.yml, check compilation before tests, because it may fail faster
2017-05-12 15:55:17 +09:00
Roman Leventov e09e892477 Refactor QueryRunner to accept QueryPlus: Query + QueryMetrics (part of #3798) (#4184)
* Add QueryPlus. Add QueryRunner.run(QueryPlus, Map) method with default implementation, to replace QueryRunner.run(Query, Map).

* Fix GroupByMergingQueryRunnerV2

* Fix QueryResourceTest

* Expand the comment to Query.run(walker, context)

* Remove legacy version of BySegmentSkippingQueryRunner.doRun()

* Add LegacyApiQueryRunnerTest and be more specific about legacy API removal plans in Druid 0.11 in Javadocs
2017-05-10 12:25:00 -07:00
Parag Jain 1fd177039d fix auto reset - pause task instead of putting thread to sleep (#4244) 2017-05-08 15:08:25 -07:00
Parag Jain eb8e1b0a97 Prevent interrupted exception from polluting log during supervisor shutdown (#4253)
* Prevent interrupted exception from polluting log during supervisor shutdown

* do nothing in case of InterruptedException
2017-05-08 15:05:25 -07:00
Parag Jain 4502c207af fix injection bug and documentation (#4243) 2017-05-03 15:07:43 -05:00
Parag Jain f9a61ea2ba Kafka lag emitter - Kafka Indexing Service (#4194)
* Kafka lag emitter

* enforce minimum emit period to a minute

* fixed comment
2017-05-02 17:30:07 -06:00
satishbhor d51097c809 Fix lz4 library incompatibility in kafka-indexing-service extension (#4115)
* Fix lz4 library incompatibility in kafka-indexing-service extension #3266

* Bumped Kafka version to 0.10.2.0 for : Fix lz4 library incompatibility in kafka-indexing-service extension #3266

* Replaced Lists.newArrayList() with Collections.singletonList() For Fix lz4 library incompatibility in kafka-indexing-service extension #4115
2017-04-25 12:23:51 +09:00
Gian Merlino 2ca7b00346 Update versions to 0.10.1-SNAPSHOT. (#4191) 2017-04-20 18:12:28 -07:00
Parag Jain 7e0d4c9555 secure supervisor endpoints (#3985) 2017-04-05 16:42:32 -07:00
Nishant Bangarwa adbe89e7d6 Fix race in KafkaIndexTaskTest (#4031)
task.pause(0) can return early before the task is actually paused.
Exception for failure -
java.lang.AssertionError: expected:<PAUSED> but was:<READING>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:144)
	at io.druid.indexing.kafka.KafkaIndexTaskTest.testRunWithOffsetOutOfRangeExceptionAndPause(KafkaIndexTaskTest.java:1229)

To reproduce, add Thread.sleep(10000) at the beginning of the
KafkaIndexTask.possiblyPause method.
2017-03-09 07:34:46 -08:00
praveev 5ccfdcc48b Fix testDeadlock timeout delay (#3979)
* No more singleton. Reduce iterations

* Granularities

* Fix the delay in the test

* Add license header

* Remove unused imports

* Lot more unused imports from all the rearranging

* CR feedback

* Move javadoc to constructor
2017-02-28 12:51:41 -06:00
praveev c3bf40108d One granularity (#3850)
* Refactor Segment Granularity

* Beginning of one granularity

* Copy the fix for custom periods in segment-granularity over here.

* Remove the custom serialization for now.

* Compilation cleanup

* Reformat code

* Fixing unit tests

* Unify to use a single iterable

* Backward compatibility for rolling upgrade

* Minor check style. Cosmetic changes.

* Rename length and millis to duration

* CR feedback

* Minor changes.
2017-02-25 01:02:29 -06:00
Parag Jain 1f263fe50b alert when resetting offsets (#3931)
* alert when resetting offsets

* add more data to alerts
2017-02-13 13:49:24 -08:00
Parag Jain 8e31a465ad report hand off count finite appenderator driver (#3925) 2017-02-13 10:41:24 -08:00
Gian Merlino 12317fd001 Bump version to 0.10.0-SNAPSHOT. (#3913) 2017-02-06 17:54:35 -08:00
Parag Jain 1aabb45a09 auto reset option for Kafka Indexing service (#3842)
* auto reset option for Kafka Indexing service in case message at the offset being fetched is not present anymore at kafka brokers

* review comments

* review comments

* reverted last change

* review comments

* review comments

* fix typo
2017-02-02 14:57:45 -06:00
Parag Jain b3dae0efc3 catch all errors (#3844) 2017-01-24 18:01:30 -07:00
Gian Merlino 6440ddcbca Fix #3795 (Java 7 compatibility). (#3796)
* Fix #3795 (Java 7 compatibility).

Also introduce Animal Sniffer checks during build, which would
have caught the original problems.

* Add Animal Sniffer on caffeine-cache for JDK8.
2016-12-21 10:19:13 -08:00
David Lim 0b9dff0bc1 fix worker thread pool exhaustion bug (#3760)
* fix worker thread pool exhaustion bug

* code review changes

* code review changes
2016-12-09 15:23:11 -08:00
David Lim 7f087cdd3b allow Kafka consumer group.id to be overridden by config (#3765) 2016-12-08 15:53:13 -08:00
Parag Jain 7ee6bb7410 option to reset offset automatically in case of OffsetOutOfRangeException (#3678)
* option to reset offset automatically in case of OffsetOutOfRangeException
if the next offset is less than the earliest available offset for that partition

* review comments

* refactoring

* refactor

* review comments
2016-11-21 16:29:46 -06:00
Gian Merlino bcd20441be Make buildV9Directly the default. (#3688) 2016-11-14 09:29:32 -08:00
Akash Dwivedi 4b3bd8bd63 Migrating java-util from Metamarkets. (#3585)
* Migrating java-util from Metamarkets.

* checkstyle and updated license on java-util files.

* Removed unused imports from whole project.

* cherry pick metamx/java-util@826021f.

* Copyright changes on java-util pom, address review comments.
2016-10-21 14:57:07 -07:00
David Lim c2ae734848 KafkaIndexTask: Allow run thread to stop gracefully instead of interrupting (#3534)
* allow run thread to gracefully complete instead of interrupting when stopGracefully() is called

* add comments
2016-10-17 10:52:19 -04:00
Parag Jain 592903571a add context to kafka supervisor for the kafka indexing task (#3464) 2016-10-04 20:08:43 -05:00
Parag Jain e419407eba handle supervisor spec metadata failures (#3456)
close kafka consumer in case supervisor start fails
2016-10-04 10:15:28 -07:00
Gian Merlino 40f2fe7893 Bump versions to 0.9.3-SNAPSHOT (#3524) 2016-09-29 13:53:32 -07:00
Parag Jain 15c9918c65 log exceptions while trying to pause task (#3504) 2016-09-23 16:53:23 -07:00
David Lim 9226d4af3c configurable shutdownTimeout for Kafka supervisor (#3497)
* configurable shutdownTimeout

* cr change
2016-09-23 13:26:45 -06:00
David Lim ca9114b41b add supervisor reset API (#3484)
* add supervisor reset API

* CR doc changes and kill running tasks / clear offsets from supervisor
2016-09-22 17:51:06 -07:00
Himanshu 05ea88df5c fix kafka-indexing-service pom to not reference specific version but parent version for druid core dependencies (#3472) 2016-09-20 15:18:21 -07:00
David Lim 96fcca18ea update KafkaSupervisor to make HTTP requests to tasks in parallel where possible (#3452) 2016-09-20 22:51:15 +05:30
Gian Merlino bcff08826b KafkaIndexTask: Treat null values as unparseable. (#3453) 2016-09-13 10:56:38 -07:00
Jonathan Wei df766b2bbd Add dimension handling interface for ingestion and segment creation (#3217)
* Add dimension handling interface for ingestion and segment creation

* update javadocs for DimensionHandler/DimensionIndexer

* Move IndexIO row validation into DimensionHandler

* Fix null column skipping in mergerV9

* Add deprecation note for 'numeric_dims' filename pattern in IndexIO v8->v9 conversion

* Fix java7 test failure
2016-09-12 12:54:02 -07:00
David Lim 146a17de48 KafkaIndexTask: allow pause to break out of retry loop (#3401) 2016-09-06 22:29:37 -06:00
David Lim 5b1ae21bd1 retry calls to getStartTime (#3429) 2016-09-06 14:02:22 -07:00
David Lim ed924bf214 allow registrants to opt out of announcing themselves when registering as a chat handler (#3360) 2016-08-16 10:51:28 +05:30
Gian Merlino 8030f1cb67 Be more respectful of maxRowsInMemory. (#3284)
- Appenderator: Respect maxRowsInMemory across all sinks.
- KafkaIndexTask: Respect maxRowsInMemory across all partitions.
2016-07-26 15:02:35 -06:00
David Lim 1d40df4bb7 fix kafka consumer concurrent access during shutdown (#3193) 2016-06-28 13:23:17 -07:00
Gian Merlino ebf890fe79 Update master version to 0.9.2-SNAPSHOT. (#3133) 2016-06-13 13:10:38 -07:00
David Lim 4faa298977 update kafka client for kafka indexing service to 0.9.0.1 (#3109) 2016-06-08 06:51:03 -07:00
David Lim a2290a8f05 support seamless config changes (#3051) 2016-06-03 13:50:19 -07:00
David Lim f6c39cc844 Kafka task minimum message time (#3035)
* add KafkaIndexTask support for minimumMessageTime

* add Kafka supervisor support for lateMessageRejectionPeriod
2016-05-31 11:37:00 -07:00
David Lim 3ef24c03b3 Validate X-Druid-Task-Id header in request/response and support retrying on outdated TaskLocation information, add KafkaIndexTaskClient unit tests (#3006)
* validate X-Druid-Task-Id header in request and add header to response

* modify KafkaIndexTaskClient to take a TaskLocationProvider as the TaskLocation may not remain constant
2016-05-25 22:05:18 -07:00
Charles Allen 15ccf451f9 Move QueryGranularity static fields to QueryGranularities (#2980)
* Move QueryGranularity static fields to QueryGranularityUtil
* Fixes #2979

* Add test showing #2979

* change name to QueryGranularities
2016-05-17 16:23:48 -07:00
David Lim b489f63698 Supervisor for KafkaIndexTask (#2656)
* supervisor for kafka indexing tasks

* cr changes
2016-05-04 23:13:13 -07:00
Gian Merlino f8ddfb9a4b Split SegmentInsertAction and SegmentTransactionalInsertAction for backwards compat. (#2922)
Fixes #2912.
2016-05-04 13:54:34 -07:00
David Lim 7641f2628f add control and status endpoints to KafkaIndexTask (#2730) 2016-04-21 15:34:59 -07:00
Gian Merlino 08c784fbf6 KafkaIndexTask: Use a separate sequence per Kafka partition in order to make (#2844)
segment creation deterministic.

This means that each segment will contain data from just one Kafka
partition. So, users will probably not want to have a super high number
of Kafka partitions...

Fixes #2703.
2016-04-18 22:29:52 -07:00
Gian Merlino 7e7a886f65 Move druid-api into the druid repo.
This is from druid-api-0.3.17, as of commit 51884f1d05d5512cacaf62cedfbb28c6ab2535cf
in the druid-api repo.
2016-03-24 11:04:34 -07:00
jon-wei a59c9ee1b1 Support use of DimensionSchema class in DimensionsSpec 2016-03-21 13:12:04 -07:00
Gian Merlino 738dcd8cd9 Update version to 0.9.1-SNAPSHOT.
Fixes #2462
2016-03-17 10:34:20 -07:00
Gian Merlino f22fb2c2cf KafkaIndexTask.
Reads a specific offset range from specific partitions, and can use dataSource metadata
transactions to guarantee exactly-once ingestion.

Each task has a finite lifecycle, so it is expected that some process will be supervising
existing tasks and creating new ones when needed.
2016-03-10 18:41:43 -08:00