Commit Graph

1498 Commits

Author SHA1 Message Date
Gian Merlino 5ce3185b9c Fix three bugs with segment publishing. (#6155)
* Fix three bugs with segment publishing.

1. In AppenderatorImpl: always use a unique path if requested, even if the segment
   was already pushed. This is important because if we don't do this, it causes
   the issue mentioned in #6124.
2. In IndexerSQLMetadataStorageCoordinator: Fix a bug that could cause it to return
   a "not published" result instead of throwing an exception, when there was one
   metadata update failure, followed by some random exception. This is done by
   resetting the AtomicBoolean that tracks what case we're in, each time the
   callback runs.
3. In BaseAppenderatorDriver: Only kill segments if we get an affirmative false
   publish result. Skip killing if we just got some exception. The reason for this
   is that we want to avoid killing segments if they are in an unknown state.

Two other changes to clarify the contracts a bit and hopefully prevent future bugs:

1. Return SegmentPublishResult from TransactionalSegmentPublisher, to make it
more similar to announceHistoricalSegments.
2. Make it explicit, at multiple levels of javadocs, that a "false" publish result
must indicate that the publish _definitely_ did not happen. Unknown states must be
exceptions. This helps BaseAppenderatorDriver do the right thing.

* Remove javadoc-only import.

* Updates.

* Fix test.

* Fix tests.
2018-08-15 13:55:53 -07:00
Karol Woźniak da3a1f61ac Fix appenderator_realtime creating shards bigger by 1 than maxRowsPerSegment (#6125) 2018-08-10 22:29:06 -07:00
Jihoon Son d6a02de5b5 Add support 'keepSegmentGranularity' for compactionTask (#6095)
* Add keepSegmentGranularity for compactionTask

* fix build

* createIoConfig method

* fix build

* fix build

* address comments

* fix build
2018-08-09 13:51:20 -07:00
Jihoon Son 577632f5c1
Fix missing argument of TaskToolbox (#6121) 2018-08-07 17:18:56 -07:00
Gian Merlino 3525d4059e
Cache: Add maxEntrySize config, make groupBy cacheable by default. (#5108)
* Cache: Add maxEntrySize config.

The idea is this makes it more feasible to cache query types that
can potentially generate large result sets, like groupBy and select,
without fear of writing too much to the cache per query.

Includes a refactor of cache population code in CachingQueryRunner and
CachingClusteredClient, such that they now use the same CachePopulator
interface with two implementations: one for foreground and one for
background.

The main reason for splitting the foreground / background impls is
that the foreground impl can have a more effective implementation of
maxEntrySize. It can stop retaining subvalues for the cache early.

* Add CachePopulatorStats.

* Fix whitespace.

* Fix docs.

* Fix various tests.

* Add tests.

* Fix tests.

* Better tests

* Remove conflict markers.

* Fix licenses.
2018-08-07 10:23:15 -07:00
Jihoon Son 56ab4363ea
Native parallel batch indexing without shuffle (#5492)
* Native parallel indexing without shuffle

* fix build

* fix ci

* fix ingestion without intervals

* fix retry

* fix retry

* add it test

* use chat handler

* fix build

* add docs

* fix ITUnionQueryTest

* fix failures

* disable metrics reporting

* working

* Fix split of static-s3 firehose

* Add endpoints to supervisor task and a unit test for endpoints

* increase timeout in test

* Added doc

* Address comments

* Fix overlapping locks

* address comments

* Fix static s3 firehose

* Fix test

* fix build

* fix test

* fix typo in docs

* add missing maxBytesInMemory to doc

* address comments

* fix race in test

* fix test

* Rename to ParallelIndexSupervisorTask

* fix teamcity

* address comments

* Fix license

* addressing comments

* addressing comments

* indexTaskClient-based segmentAllocator instead of CountingActionBasedSegmentAllocator

* Fix race in TaskMonitor and move HTTP endpoints to supervisorTask from runner

* Add more javadocs

* use StringUtils.nonStrictFormat for logging

* fix typo and remove unused class

* fix tests

* change package

* fix strict build

* tmp

* Fix overlord api according to the recent change in master

* Fix it test
2018-08-06 23:59:42 -07:00
Jihoon Son ef2d6e9118
Fix IllegalArgumentException in TaskLockBox.syncFromStorage() when updating from 0.12.x to 0.12.2 (#6086)
* Fix TaskLockBox.syncFromStorage() when updating from 0.12.x to 0.12.2

* Make the priority of taskLock nullable

* fix test

* fix build
2018-08-03 17:13:44 -07:00
Nishant Bangarwa 75c8a87ce1 Part 2 of changes for SQL Compatible Null Handling (#5958)
* Part 2 of changes for SQL Compatible Null Handling

* Review comments - break lines longer than 120 characters

* review comments

* review comments

* fix license

* fix test failure

* fix CalciteQueryTest failure

* Null Handling - Review comments

* review comments

* review comments

* fix checkstyle

* fix checkstyle

* remove unrelated change

* fix test failure

* fix failing test

* fix travis failures

* Make StringLast and StringFirst aggregators nullable and fix travis failures
2018-08-02 08:20:25 -07:00
Roman Leventov 0754d78a2e Prohibit Lists.newArrayList() with a single argument (#6068)
* Prohibit Lists.newArrayList() with a single argument

* Test fixes

* Add Javadoc to Node constructor
2018-07-31 20:09:10 -07:00
Jonathan Wei 91943a24db
Fix CombiningFirehoseFactory with IngestSegmentFirehoseFactory in IndexTask (#6065)
* Fix CombiningFirehoseFactory with IngestSegmentFirehoseFactory in IndexTask

* Make recursive
2018-07-31 15:30:44 -07:00
Gian Merlino 3aa7017975 Remove some unnecessary task storage internal APIs. (#6058)
* Remove some unnecessary task storage internal APIs.

- Remove MetadataStorageActionHandler's getInactiveStatusesSince and getActiveEntriesWithStatus.
- Remove TaskStorage's getCreatedDateTimeAndDataSource.
- Remove TaskStorageQueryAdapter's getCreatedTime, and getCreatedDateAndDataSource.
- Migrated all callers to getActiveTaskInfo and getCompletedTaskInfo.

This has one side effect: since getActiveTaskInfo (new) warns and continues when it
sees unreadable tasks, but getActiveEntriesWithStatus threw an exception when it
encountered those, it means that after this patch bad tasks will be ignored when
syncing from metadata storage rather than causing an exception to be thrown.

IMO, this is an improvement, since the most likely reason for bad tasks is either:

- A new version introduced an additional validation, and a pre-existing task doesn't
  pass it.
- You are rolling back from a newer version to an older version.

In both cases, I believe you would want to skip tasks that can't be deserialized,
rather than blocking overlord startup.

* Remove unused import.

* Fix formatting.

* Fix formatting.
2018-07-30 18:35:06 -07:00
Gian Merlino 63be028cee
CompactionTask: Reject empty intervals on construction. (#6059)
* CompactionTask: Reject empty intervals on construction.

They don't make sense anyway, and it's better to fail fast.

* Switch API.
2018-07-30 08:52:50 -07:00
Benedict Jin 331a0afb98 Remove redundant type parameters and enforce some other style and inspection rules (#5980)
* Various changes about druid-services module

* Patch improvements from reviewer

* Add ToArrayCallWithZeroLengthArrayArgument & ArraysAsListWithZeroOrOneArgument into inspection profile

* Fix ArraysAsListWithZeroOrOneArgument

* Fix conflict

* Fix ToArrayCallWithZeroLengthArrayArgument

* Fix AliEqualsAvoidNull

* Remove blank line

* Remove unused import clauses

* Fix code style in TopNQueryRunnerTest

* Fix conflict

* Don't use Collections.singletonList when converting the type of array type

* Add argLine into maven-surefire-plugin in druid-process module & increase the timeout value for testMoveSegment testcase

* Roll back the latest commit

* Add java.io.File#toURL() into druid-forbidden-apis

* Using Boolean.parseBoolean instead of Boolean.valueOf for CliCoordinator#isOverlord

* Add a new regexp element into stylecode xml file

* Fix style error for new regexp

* Set the level of ArraysAsListWithZeroOrOneArgument as WARNING

* Fix style error for new regexp

* Add option BY_LEVEL for ToArrayCallWithZeroLengthArrayArgument in inspection profile

* Roll back the level as ToArrayCallWithZeroLengthArrayArgument as ERROR

* Add toArray(new Object[0]) regexp into checkstyle config file & fix them

* Set the level of ArraysAsListWithZeroOrOneArgument as ERROR & Roll back the level of ToArrayCallWithZeroLengthArrayArgument as WARNING until Youtrack fix it

* Add a comment for string equals regexp in checkstyle config

* Fix code format

* Add RedundantTypeArguments as ERROR level inspection

* Fix cannot resolve symbol datasource
2018-07-27 16:56:49 -05:00
Surekha 74ae73df42 Add the correct createdTime to waiting tasks (#6060) 2018-07-27 14:56:19 -07:00
Jihoon Son 1524af703d
Fix IllegalArgumentException in TaskLockBox.syncFromStorage() (#6050) 2018-07-27 10:43:32 -07:00
Surekha 414487a78e Add support to filter on datasource for active tasks (#5998)
* Add support to filter on datasource for active tasks

* Added datasource filter to sql query for active tasks
* Fixed unit tests

* Address PR comments
2018-07-19 16:33:46 -07:00
Jihoon Son c48aa74a30 Fix NPE while handling CheckpointNotice in KafkaSupervisor (#5996)
* Fix NPE while handling CheckpointNotice

* fix code style

* Fix test

* fix test

* add a log for creating a new taskGroup

* fix backward compatibility in KafkaIOConfig
2018-07-13 17:14:57 -07:00
Gian Merlino 04ea3c9f8c
Update license headers. (#5976)
* Update license headers.

For compliance with http://www.apache.org/legal/src-headers.html.

* More license adjustments.

* Fix mistakenly edited package line.
2018-07-11 09:55:18 -07:00
Gian Merlino 948e73da77 Extend various test timeouts. (#5978)
False failures on Travis due to spurious timeout (in turn due to noisy
neighbors) is a bigger problem than legitimate failures taking too long
to time out. So it makes sense to extend timeouts.
2018-07-10 13:02:14 -07:00
Jihoon Son 1ccabab98e Fix the broken Appenderator contract in KafkaIndexTask (#5905)
* Fix broken Appenderator contract in KafkaIndexTask

* fix build

* add publishFuture

* reuse sequenceToUse if possible
2018-07-03 13:31:29 -07:00
Jihoon Son b6c957b0d2 Allow reordered segment allocation in kafka indexing service (#5805)
* Allow reordered segment allocation in kafka indexing service

* address comments

* fix a bug
2018-07-02 15:09:12 -07:00
Surekha 933b25416c Handle task deserialization failure in the tasks api (#5911)
If task payload fails to deserialize json to Java, make the task null and handle null task in OverlordResource
2018-06-29 11:57:48 -07:00
Jonathan Wei f3e1520360
Fix merge for TrueDimFilter (#5916)
* Fix merge for TrueDimFilter

* remove unused cache ID
2018-06-28 14:46:47 -07:00
Jihoon Son 8c5ded0fad
Splitting KafkaIndexTask for better code maintenance (#5854)
* Refactoring KafkaIndexTask for better code maintenance

* fix bug

* fix test

* add annotation

* fix checkstyle

* remove SetEndOffsetsResult
2018-06-22 13:00:03 -07:00
Surekha 8619adb5b9 Improve task retrieval APIs on Overlord (#5801)
* Add the new tasks api in overlordResource

It takes 4 optional query params
* state(pending/running/waiting/compelte)
* dataSource
* interval (applies to completed tasks)
* maxCompletedTasks (applies to completed tasks)

If all params are null, the api returns all the tasks

* Add the state to each task returned by tasks endpoint

* divide active tasks into waiting, pending or running
* Add more unit tests

* Add UNKNOWN state to TaskState

* Fix the authorization calls

* WIP: PR comments

Added new class to capture task info for caching
Other refactoring

* Refactoring : move TaskStatus class to druid-api

so it can be accessed within server
And other related classes like TaskState and TaskStatusPlus are in api

* Remove unused class and apis accessing it

* Add a separate cache for recently completed tasks

This is to mainly capture the task type from payload

* Ignore a test

* Add a RuntimeTaskState to encompass all states a task can be in

* Revert "Add a RuntimeTaskState to encompass all states a task can be in"

This reverts commit 2a527a0731.

* Fix wrong api call

* Fix and unignore tests

* Remove waiting,pending state from TaskState

* Add RunnerTaskState

* Missed the annotation runnerStatusCode

* Fix the creationTime

* Fix the createdTime and queueInsertionTime for running/active tasks
* Clean up tests

* Add javadocs

* Potentially fix the teamcity build

* Address PR comments

*Get rid of TaskInfoBuilder
*Make TaskInfoMapper static nested class
*Other changes

* fix import in MaterializedViewSupervisor after merge

* Address PR comments on

* Replace global cache with local map
* combine multiple queries into one
* Removed unused code

* Fix unit tests

Fix a bug in securedTaskStatusPlus

* Remove getRecentlyFinishedTaskStatuses method

Change TaskInfoMapper signature to add generic type

* Address PR comments

* Passed datasource as argument to be used in sql query
* Other minor fixes

* Address PR comments

*Some minor changes, rename method, spacing changes

* Add early auth check if datasource is not null

* Fix test case

* Add max limit to getRecentlyFinishedTaskInfo in HeapMemoryTaskStorage
* Add TaskLocation to Anytask object

* Address PR comments

* Fix a bug in test case causing ClassCastException
2018-06-19 11:34:59 -07:00
Jihoon Son 2feec44a55 Fix mismatch in revoked task locks between memory and metastore after sync from storage (#5858)
* Fix mismatched revoked task locks after sync from storage

* fix build

* fix log

* fix lock release
2018-06-12 10:25:34 -04:00
zhangxinyu e43e5ebbcd Materialized view implementation (#5556)
* implement materialized view

* modify code according to jihoonson's comments

* modify code according to jihoonson's comments - 2

* add documentation about materialized view

* use new HadoopTuningConfig in pr 5583

* add minDataLag and fix optimizer bug

* correct value of DEFAULT_MIN_DATA_LAG_MS

* modify code according to jihoonson's comments - 3

* use the boolean expression instead of if-else
2018-06-09 12:24:54 -07:00
Jonathan Wei 684b5d18c1
Moving averages for ingestion row stats (#5748)
* Moving averages for ingestion row stats

* PR comments

* Make RowIngestionMeters extensible

* test and checkstyle fixes

* More PR comments

* Fix metrics

* Add some comments

* PR comments

* Comments
2018-06-05 09:08:57 -07:00
Atul Mohan 1b9611a60e Local indexing from RDBMS (#5441)
* Local indexing from RDBMS

*  Fix content

* Remove pom changes

* Remove extraneous space

* Add tests and update documentation

* Fix comments

* Fix docs

*  Fix build related issue

*  Handle invalid strings

* Make target database independent of metadata storage

* Add firehose connector

* Fix accessibility

* Add docs

* Remove unused def

* Remove lazy instantiation of jsoniterator

* Move unused changes

* Move unused changes

* Fix build

* Make Sqlfirehose method private
2018-05-22 12:33:01 +09:00
Gian Merlino f2cc6ce4d5
VersionedIntervalTimeline: Optimize construction with heavily populated holders. (#5777)
* VersionedIntervalTimeline: Optimize construction with heavily populated holders.

Each time a segment is "add"ed to a timeline, "isComplete" is called on the holder
that it is added to. "isComplete" is an O(segments per chunk) operation, meaning
that adding N segments to a chunk is an O(N^2) operation. This blows up badly if
we have thousands of segments per chunk.

The patch defers the "isComplete" check until after all segments have been
inserted.

* Fix imports.
2018-05-16 09:16:59 -07:00
Jihoon Son 9dca5ec76b Simple cleanup for ThreadPoolTaskRunner and SetAndVerifyContextQueryRunner / Add ThreadPoolTaskRunnerTest (#5557)
* Simple fix for ThreadPoolTaskRunner

* fix build

* address comments

* update javadoc

* fix build

* fix test

* add dependency
2018-05-15 22:53:11 +05:30
Jihoon Son c9d645103b Fix metrics for inserting segments (#5749)
* Fix metrics for inserting segments

* Add a comment
2018-05-08 13:07:39 -07:00
Kirill Kozlov 67d0b0ee42 Add taskType dimension to task metrics (#5664) 2018-05-07 09:42:26 -07:00
Slim Bouguerra 8aa8d9fa5b
Kerberos Spnego Authentication Router Issue (#5706)
* Adding decoration method to proxy servlet

Change-Id: I872f9282fb60bfa20524271535980a36a87b9621

* moving the proxy request decoration to authenticators

Change-Id: I7f94b9ff5ecf08e8abf7169b58bc410f33148448

* added docs

Change-Id: I901543e52f0faf4666bfea6256a7c05593b1ae70

* use the authentication result to decorate request

Change-Id: I052650de9cd02b4faefdbcdaf2332dd3b2966af5

* adding authenticated by name

Change-Id: I074d2933460165feeddb19352eac9bd0f96f42ca

* ensure that authenticator is not null

Change-Id: Idb58e308f90db88224a06f3759114872165b24f5

* fix types and minor bug

Change-Id: I6801d49a05d5d8324406fc0280286954eb66db10

* fix typo

Change-Id: I390b12af74f44d760d0812a519125fbf0df4e97b

* use actual type names

Change-Id: I62c3ee763363781e52809ec912aafd50b8486b8e

* set authenitcatedBy to null for AutheticationResults created by
Escalator.

Change-Id: I4a675c372f59ebd8a8d19c61b85a1e4bf227a8ba
2018-05-05 20:33:51 -07:00
kaijianding c12c16385e support throw duplcate row during realtime ingestion in RealtimePlumber (#5693) 2018-05-04 10:12:25 -07:00
Surekha 13c616ba24 'maxBytesInMemory' tuningConfig introduced for ingestion tasks (#5583)
* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks

Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing.If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.

 * The default value is 1/3(Runtime.maxMemory())
 * To maintain the current behaviour set 'maxBytesInMemory' to -1
 * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
   will be respected i.e. the first one to go above threshold will trigger persist

* Fix check style and remove a comment

* Add overlord unsecured paths to coordinator when using combined service (#5579)

* Add overlord unsecured paths to coordinator when using combined service

* PR comment

* More error reporting and stats for ingestion tasks (#5418)

* Add more indexing task status and error reporting

* PR comments, add support in AppenderatorDriverRealtimeIndexTask

* Use TaskReport instead of metrics/context

* Fix tests

* Use TaskReport uploads

* Refactor fire department metrics retrieval

* Refactor input row serde in hadoop task

* Refactor hadoop task loader names

* Truncate error message in TaskStatus, add errorMsg to task report

* PR comments

* Allow getDomain to return disjointed intervals (#5570)

* Allow getDomain to return disjointed intervals

* Indentation issues

* Adding feature thetaSketchConstant to do some set operation in PostAgg (#5551)

* Adding feature thetaSketchConstant to do some set operation in PostAggregator

* Updated review comments for PR #5551 - Adding thetaSketchConstant

* Fixed CI build issue

* Updated review comments 2 for PR #5551 - Adding thetaSketchConstant

* Fix taskDuration docs for KafkaIndexingService (#5572)

* With incremental handoff the changed line is no longer true.

* Add doc for automatic pendingSegments (#5565)

* Add missing doc for automatic pendingSegments

* address comments

* Fix indexTask to respect forceExtendableShardSpecs (#5509)

* Fix indexTask to respect forceExtendableShardSpecs

* add comments

* Deprecate spark2 profile in pom.xml (#5581)

Deprecated due to https://github.com/druid-io/druid/pull/5382

* CompressionUtils: Add support for decompressing xz, bz2, zip. (#5586)

Also switch various firehoses to the new method.

Fixes #5585.

* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks

Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing.If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.

 * The default value is 1/3(Runtime.maxMemory())
 * To maintain the current behaviour set 'maxBytesInMemory' to -1
 * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
   will be respected i.e. the first one to go above threshold will trigger persist

* Address code review comments

* Fix the coding style according to druid conventions
* Add more javadocs
* Rename some variables/methods
* Other minor issues

* Address more code review comments

* Some refactoring to put defaults in IndexTaskUtils
* Added check for maxBytesInMemory in AppenderatorImpl
* Decrement bytes in abandonSegment
* Test unit test for multiple sinks in single appenderator
* Fix some merge conflicts after rebase

* Fix some style checks

* Merge conflicts

* Fix failing tests

Add back check for 0 maxBytesInMemory in OnHeapIncrementalIndex

* Address PR comments

* Put defaults for maxRows and maxBytes in TuningConfig
* Change/add javadocs
* Refactoring and renaming some variables/methods

* Fix TeamCity inspection warnings

* Added maxBytesInMemory config to HadoopTuningConfig

* Updated the docs and examples

* Added maxBytesInMemory config in docs
* Removed references to maxRowsInMemory under tuningConfig in examples

* Set maxBytesInMemory to 0 until used

Set the maxBytesInMemory to 0 if user does not set it as part of tuningConfing
and set to part of max jvm memory when ingestion task starts

* Update toString in KafkaSupervisorTuningConfig

* Use correct maxBytesInMemory value in AppenderatorImpl

* Update DEFAULT_MAX_BYTES_IN_MEMORY to 1/6 max jvm memory

Experimenting with various defaults, 1/3 jvm memory causes OOM

* Update docs to correct maxBytesInMemory default value

* Minor to rename and add comment

* Add more details in docs

* Address new PR comments

* Address PR comments

* Fix spelling typo
2018-05-03 16:25:58 -07:00
David Lim 8ec2d2fe18 Use unique segment paths for Kafka indexing (#5692)
* support unique segment file paths

* forbiddenapis

* code review changes

* code review changes

* code review changes

* checkstyle fix
2018-04-29 21:59:48 -07:00
Gian Merlino 762f8829e4
Add task action metrics, add taskId metric dimension. (#5714)
* Add task action metrics, add taskId metric dimension.

Adds two new metrics: task/action/log/time and task/action/run/time. Also
adds taskId as a dimension, to give us the ability to drill down into metrics
for an individual task. Also standardizes metrics-attachment using two helper
methods in IndexTaskUtils.

* Fix typo
2018-04-29 21:24:06 -07:00
Jihoon Son 23dc0d5b24 Better logging for taskLockBox (#5703) 2018-04-28 21:08:10 -07:00
Roman Leventov 9be000758d Refactor index merging, replace Rowboats with RowIterators and RowPointers (#5335)
* Refactor index merging, replace Rowboats with RowIterators and RowPointers

* Add javadocs

* Fix a bug in QueryableIndexIndexableAdapter

* Fixes

* Remove unused declarations

* Remove unused GenericColumn.isNull() method

* Fix test

* Address comments

* Rearrange some code in MergingRowIterator for more clarity

* Self-review

* Fix style

* Improve docs

* Fix docs

* Rename IndexMergerV9.writeDimValueAndSetupDimConversion to setUpDimConversion()

* Update Javadocs

* Minor fixes

* Doc fixes, more code comments, cleanup of RowCombiningTimeAndDimsIterator

* Fix doc link
2018-04-27 17:34:32 -07:00
Roman Leventov a3a9ada843 Add GenericWhitespace checkstyle check (#5668) 2018-04-24 01:09:14 +05:30
Jihoon Son f349e03091 Fix NPE in compactionTask (#5613)
* Fix NPE in compactionTask

* more annotations for metadata

* better error message for empty input

* fix build

* revert some null checks

* address comments
2018-04-13 00:11:03 -04:00
Caroline1000 d709d1a59f correct overlord console header. Because it's not the coordinator console, it's the overlord console. (#5627) 2018-04-11 20:44:31 -04:00
Nishant Bangarwa e6efd75a3d Add config to allow setting up custom unsecured paths for druid nodes. (#5614)
* Add config to allow setting up custom unsecured paths for druid nodes.

* return all resources for Unsecured paths

* review comment - Add test

* fix tests

* fix test
2018-04-11 17:10:07 -07:00
Jihoon Son 298ed1755d Fix indexTask to respect forceExtendableShardSpecs (#5509)
* Fix indexTask to respect forceExtendableShardSpecs

* add comments
2018-04-05 23:54:59 -07:00
Jonathan Wei 969342cd28
More error reporting and stats for ingestion tasks (#5418)
* Add more indexing task status and error reporting

* PR comments, add support in AppenderatorDriverRealtimeIndexTask

* Use TaskReport instead of metrics/context

* Fix tests

* Use TaskReport uploads

* Refactor fire department metrics retrieval

* Refactor input row serde in hadoop task

* Refactor hadoop task loader names

* Truncate error message in TaskStatus, add errorMsg to task report

* PR comments
2018-04-05 21:38:57 -07:00
Jihoon Son 7239f56131 Fix NPE in RemoteTaskRunner when some tasks in ZooKeeper but not in Overlord (#5511)
* Fix NPE in RemoteTaskRunner when some tasks in ZooKeeper but not in Overlord

* revert unnecessary change
2018-04-03 21:15:58 -07:00
Jonathan Wei 723f7ac550
Add support for task reports, upload reports to deep storage (#5524)
* Add support for task reports, upload reports to deep storage

* PR comments

* Better name for method

* Fix report file upload

* Use TaskReportFileWriter

* Checkstyle

* More PR comments
2018-04-02 12:10:56 -07:00
Kirill Kozlov 8878a7ff94 Replace guava Charsets with native java StandardCharsets (#5545) 2018-03-28 21:00:08 -07:00
Jihoon Son 1ad898bde2
Use the official aws-sdk instead of jet3t (#5382)
* Use the official aws-sdk instead of jet3t

* fix compile and serde tests

* address comments and fix test

* add http version string

* remove redundant dependencies, fix potential NPE, and fix test

* resolve TODOs

* fix build

* downgrade jackson version to 2.6.7

* fix test

* resolve the last TODO

* support proxy and endpoint configurations

* fix build

* remove debugging log

* downgrade hadoop version to 2.8.3

* fix tests

* remove unused log

* fix it test

* revert KerberosAuthenticator change

* change hadoop-aws scope to provided in hdfs-storage

* address comments

* address comments
2018-03-21 15:36:54 -07:00
Charles Allen 58f110f7f8 Future-proof some Guava usage (#5414)
* Future-proof some Guava usage

* Use a java-util EmptyIterator instead of Guava's
* Change some of the guava future handling to do manual async
transforms. Guava changes transform into transformAsync by deprecating
transform in ONLY Guava 19. Then its gone in 20

* Use `Collections.emptyIterator()`

* Pretty formatting

* Make listenable future transforms a thing in default druid

* Format fix

* Add forbidden guava apis

* Make the ListenableFutrues.transformAsync have comments

* Undo intellij bad pattern matching in comments

* Futrues --> Futures

* Add empty iterators forbidding

* Fix extra `A`

* Correct method signature

* Address review comments

* Finish Gian review comments

* Proper syntax from https://github.com/policeman-tools/forbidden-apis/wiki/SignaturesSyntax
2018-03-20 08:59:33 -07:00
Jonathan Wei b22455b924
Fix supervisor tombstone auth handling (#5504) 2018-03-19 12:55:47 -07:00
Roman Leventov 693e3575f9
Remove unused code and exception declarations (#5461)
* Remove unused code and exception declarations

* Address comments

* Remove redundant Exception declarations

* Make FirehoseFactoryV2.connect() to throw IOException again
2018-03-16 22:11:12 +01:00
Jonathan Wei 30e6bdedf3 Authorize supervisor history instead of current active supervisors for supervisor history API (#5501) 2018-03-16 12:29:17 -07:00
Jihoon Son 9b2a25bd84
Refactor supervisorReport to be type-safe (#5479)
* refactor supervisorReport

* use primitives
2018-03-13 09:28:44 -07:00
Niraja Mishra 96cebfc222 As part of this feature, implemented a new endpoint to get running tasks by datasources (#5260)
and added datasource information as part of existing endpoint /druid/indexer/v1/runningTasks.

Added junit test cases for the newly implemented API and fixed existing junit test cases.

Fixed review comments - added new method getCreatedDateTimeAndDataSource into TaskStorageQueryAdapter class
and formatted changed files
2018-03-12 23:48:11 -07:00
Alexander Korablev 8a51800693 fix PortFinder issue #5466 (#5467) 2018-03-05 16:58:49 -08:00
Jonathan Wei b63f1c0e45
Fix authorization check in supervisor history API (#5460) 2018-03-02 14:03:07 -08:00
Kevin Conaway 969a12f6ca #5425 Refactor to use map.get() when asserting the existence of published segments (#5426) 2018-03-02 19:42:39 +01:00
Gian Merlino e4eaee3806
Support for disabling bitmap indexes. (#5402)
* Support for disabling bitmap indexes.

Can save space for columns where bitmap indexes are pointless (like
free-form text).

* Remove import.

* Fix CompactionTaskTest.

* Update for review comments.

* Review comments, tests.

* Fix test.
2018-02-28 19:19:56 -08:00
Gian Merlino 3f72537787 Fix missing task type in task payload API. (#5399)
* Fix missing task type in task payload API.

Apparently embedding a polymorphic object inside a Map<String, Object> is
a bit too much for Jackson to serialize properly. Fix this by using
wrapper classes.

* Fix OverlordTest casts.

* Remove import.

* Remove unused imports.

* Clarify comments.
2018-02-21 10:00:13 -08:00
Jihoon Son cd929000ca
Change early publishing to early pushing in indexTask & refactor AppenderatorDriver (#5297)
* Fix early publishing to early pushing in batch indexing & refactor appenderatorDriver

* fix compile

* rename and add more javadocs

* Fix conflicts

* address comments

* revert await executors

* fix test
2018-02-14 12:48:33 -08:00
Jonathan Wei b234a119ac Log exceptions thrown before persist() for indexing tasks (#5374)
* Log exceptions thrown before persist() for indexing tasks

* PR comment
2018-02-13 09:20:07 -08:00
Roman Leventov e64ffb10c2 Standartize on using Integer.BYTES instead of Ints.BYTES from Guava, same for other primitives (#5366) 2018-02-07 13:24:30 -08:00
Jihoon Son 2099b43e5f Add a new config object for compactConfig (#5264)
* add a new config object for compactConfig

* fix test

* address comments

* Update doc
2018-02-06 12:13:52 -08:00
Kevin Conaway 93fdbcb364 Change RealtimeIndexTask to use AppenderatorDriver (#5261)
* Change RealtimeIndexTask to use AppenderatorDriver instead of RealtimePlumber.  Related to #4774

* Remove unused throwableDuringPublishing

* Fix usage of forbidden API

* Update realtime index IT to account for not skipping older data any more

* Separate out waiting on publish futures and handoff futures to avoid a race condition where the handoff timeout expires before the segment is published

* #5261 Add separate AppenderatorDriverRealtimeIndexTask and revert changes to RealtimeIndexTask

* #5261 Add separate AppenderatorDriverRealtimeIndexTask and revert changes to RealtimeIndexTask

* #5261 Readability improvements in AppenderatorDriverRealtimeIndexTask.  Combine publish and handoff futures in to single future

* #5261 Add separate tuningConfig for RealtimeAppenderatorIndexTask.  Revert changes to RealtimeTuningConfig

* #5261 Change JSON type to realtime_appenderator to keep the same naming pattern as RealtimeIndexTask
2018-02-06 10:21:31 -08:00
Gian Merlino 9a62b02cb7 Extensions: Option to load classes from extension jars first. (#5321)
The behavior is configurable through druid.extensions.useExtensionClassloaderFirst.
It is useful when extensions want to load a dependency different from one provided
by Druid, for example a different version of geoip or protobuf.
2018-02-06 16:14:03 +05:30
Gian Merlino 7e02408510 Update versions to 0.13.0-SNAPSHOT. (#5323) 2018-02-02 12:06:38 -06:00
Clint Wylie 1fffc681d2 fix RemoteTaskRunner terminating lazy workers below autoscaler minNumWorkers value (#5310)
* fix RemoteTaskRunner terminating lazy workers below autoscaler minNumWorkers value

* add comment
2018-02-01 17:57:01 +01:00
Jihoon Son 3a69b0e513 Handle nullable taskTypes for rolling upgrade (#5309) 2018-01-30 13:32:54 -08:00
Himanshu 59250cf19b add taskType from announcement in HttpRemoteTaskRunnerWorkItem (#5301) 2018-01-26 15:47:35 -08:00
Jonathan Wei 80419752b5 Add metamx emitter, http clients, and metrics packages to druid java-util (#5289)
* Add metamx java-util emitter, http clients, and metrics packages to druid java-util

* Remove metamx java-util from pom.xml files

* Checkstyle fixes

* Import fix

* TeamCity inspection fixes

* Use slf4j, move some version defs to master pom.xml

* Use parent jvm-attach-api and maven-surefire-plugin versions

* Add ] to log msg, suppress inspection
2018-01-24 22:10:36 +01:00
Roman Leventov 61e6878afd Check Javadoc reference integrity (#5279) 2018-01-22 13:51:28 -08:00
Jihoon Son 241efafbb2
Automatic compaction by coordinators (#5102)
* Automatic compaction by coordinator

* add links

* skip compaction for very recent segments if they are small

* fix finding search interval

* fix finding search interval

* fix TimelineHolder iteration

* add test for newestSegmentFirstPolicy

* add CompactionSegmentIterator

* add numTargetCompactionSegments

* add missing config

* fix skipping huge shards

* fix handling large number of segments per shard

* fix test failure

* change recursive call to loop

* fix logging

* fix build

* fix test failure

* address comments

* change dataSources type

* check running pendingTasks at each run

* fix test

* address comments

* fix build

* fix test

* address comments

* address comments

* add doc for segment size optimization

* address comment
2018-01-13 13:52:37 +09:00
Roman Leventov 8877ce38d6
Enforce modifier order with Checkstyle (#5246) 2018-01-11 09:50:42 +01:00
Gian Merlino 7b8b0a96d6 IndexTask: Add summary stats line at the end. (#5241) 2018-01-10 14:06:54 +09:00
Himanshu a46d34daa2 HTTP based task/worker management. (#5104)
* just renaming of SegmentChangeRequestHistory etc

* additional change history refactoring changes

* WorkerTaskManager a replica of WorkerTaskMonitor

* HttpServerInventoryView refactoring to extract sync code and robustification

* Introducing HttpRemoteTaskRunner

* Additional Worker side updates
2018-01-04 19:19:35 -08:00
Roman Leventov 579f9fbedf Add IndexedInts.debugToString() and AbstractIndex.toString(); Add Sequence.toList() and limit() (#5175)
* Add IndexedInts.debugToString() and AbstractIndex.toString()

* Fix AppenderatorTest
2018-01-04 09:56:47 +09:00
David Lim a7967ade4d Support replaceExisting parameter for segments pushers (#5187)
* support replaceExisting parameter for segments pushers

* code review changes

* code review changes
2018-01-03 16:13:21 -08:00
Jihoon Son b31abd03ad
Fix timeout in RemoteTaskRunnerTest (#5191)
* Fix timeout in RemoteTaskRunnerTest

* add message for npe
2017-12-22 17:40:11 +09:00
Jihoon Son 9199d61389 Automatic pendingSegments cleanup (#5149)
* PendingSegments cleanup

* fix build

* address comments

* address comments

* fix potential npe

* address comments

* fix build

* fix test

* fix test
2017-12-20 14:46:34 -08:00
Roman Leventov 5787d04fad Bump Druid version to 0.12.0 (#5138) 2017-12-15 07:37:01 -08:00
Roman Leventov 64848c7ebf DataSegment memory optimizations (#5094)
* Deduplicate DataSegments contents (loadSpec's keys, dimensions and metrics lists as a whole) more aggressively; use ArrayMap instead of default LinkedHashMap for DataSegment.loadSpec, because they have only 3 entries on average; prune DataSegment.loadSpec on brokers

* Fix DataSegmentTest

* Refinements

* Try to fix

* Fix the second DataSegmentTest

* Nullability

* Fix tests

* Fix tests, unify to use TestHelper.getJsonMapper()

* Revert TestUtil as ServerTestHelper, fix tests

* Add newline

* Fix indexing tests

* Fix s3 tests

* Try to fix tests, remove lazy caching of ObjectMapper in TestHelper, rename TestHelper.getJsonMapper() to makeJsonMapper()

* Fix HDFS tests

* Fix HdfsDataSegmentPusherTest

* Capitalize constant names
2017-12-12 11:41:40 -08:00
Gian Merlino 4f5e2b4549 Fix some unemitted alerts. (#5141) 2017-12-06 18:37:01 -08:00
Roman Leventov a7a6a0487e Replace IOPeon with SegmentWriteOutMedium; Improve buffer compression (#4762)
* Replace IOPeon with OutputMedium; Improve compression

* Fix test

* Cleanup CompressionStrategy

* Javadocs

* Add OutputBytesTest

* Address comments

* Random access in OutputBytes and GenericIndexedWriter

* Fix bugs

* Fixes

* Test OutputBytes.readFully()

* Address comments

* Rename OutputMedium to SegmentWriteOutMedium and OutputBytes to WriteOutBytes

* Add comments to ByteBufferInputStream

* Remove unused declarations
2017-12-04 18:04:27 -08:00
Parag Jain 7c01f77b04 Parse Batch support (#5081)
* add parseBatch and deprecate parse method in InputRowParser

add addAll method, skip max rows in memory check for it

remove parse method from implemetations

transform transformers

add string multiplier input row parser

fix withParseSpec

fix kafka batch indexing

fix isPersistRequired

comments

* add unit test

* make persist async

* review comments
2017-12-04 16:06:16 -06:00
Jihoon Son 645de02fb2 Remove Injector from IngestSegmentFirehoseFactory (#5045)
* Add lock check to segment list actions

* fix test

* remove lock check
2017-11-20 15:54:35 -08:00
Parag Jain cb03efeb14 Kafka Index Task that supports Incremental handoffs (#4815)
* Kafka Index Task that supports Incremental handoffs
- Incrementally handoff segments when they hit maxRowsPerSegment limit
- Decouple segment partitioning from Kafka partitioning, all records from consumed partitions go to a single druid segment
- Support for restoring task on middle manager restarts by check pointing end offsets for segments

* take care of review comments

* make getCurrentOffsets call async, keep track of publishing sequence, review comments

* fix setEndoffset duplicate request handling, formatting

* fix unit test

* backward compatibility

* make AppenderatorDriverMetadata backwards compatible

* add unit test

* fix deadlock between persist and push executors in AppenderatorImpl

* fix formatting

* use persist dir instead of work dir

* review comments

* fix deadlock

* actually fix deadlock
2017-11-17 16:05:20 -06:00
Jihoon Son 93459f748f Fix missing intervals after compacting intervals (#5092)
* Fix missing intervals after compacting intervals

* fix build
2017-11-16 11:42:38 -08:00
Jonathan Wei af44d1142b Add unsecured /health endpoint, remove auth checks from isLeader (#5087)
* Add unsecured /health endpoint, remove auth checks from isLeader

* PR comments
2017-11-15 14:41:30 -06:00
Akash Dwivedi c1538f29fc maxQueryTimeout property in runtime properties. (#4852)
* maxQueryTimeout property in runtime properties.

* extra line

* move withTimeoutAndMaxScatterGatherBytes method to QueryLifeCycle.

* Fix initialize method.

* remove unused import.

* doc update.

* some more details in doc about query failure..

* minor fix.

* decorating QueryRunner to set and verify context. Added by servers.

* remove whitespace.
2017-11-13 19:23:11 -06:00
Himanshu 2ecebb3173 Fix coordinator/overlord redirects when TLS is enabled (#5037)
* Fix coordinator/overlord redirects when TLS is enabled

* address review comment

* fix UTs

* workaround to not ignore URL instance to fix the teamcity build

* update tls doc
2017-11-09 13:10:28 -08:00
Jihoon Son c11c71ab3e Using ImmutableDruidDataSource as a key for map and set instead of DruidDataSource (#5054)
* use ImmutableDruidDataSource for map and set

* address comments

* unused import

* allow returning only ImmutableDruidDataSource in MetadataSegmentManager

* address comments

* remove TreeSet

* revert to use TreeSet
2017-11-09 16:07:58 -03:00
Roman Leventov 3541b7544b Prohibit and remove unused declarations in the processing module (#4930)
* Prohibit and remove unused declarations in the processing module

* Fix tests

* Fix integration tests

* Suppress unused

* Try to remove SuppressWarnings unused in VirtualColumn

* Remove reset 'false positives'

* Annotate CliCommandCreator as ExtensionPoint

* Unused import warning instead of error in IntelliJ

* Fixes

* Add comment

* Fix AzureBlob

* Fix CloudFilesBlob

* Address comments

* Add Project SDK section to INTELLIJ_SETUP.md

* Fix image
2017-11-09 09:27:27 -08:00
Jihoon Son 5f3c863d5e Add compaction task (#4985)
* Add compaction task

* added doc

* use combining aggregators

* address comments

* add support for dimensionsSpec

* fix getUniqueDims and getUniqueMetics

* find unique dimensionsSpec

* fix compilation

* add unit test

* fix test

* fix test

* test for different dimension orderings and types, and doc for type and ordering

* add control for custom ordering and type

* update doc

* fix compile

* fix compile

* add segments param

* fix serde error

* fix build
2017-11-03 21:55:27 -06:00
Gian Merlino 6c725a7e06 Fix havingSpec on complex aggregators. (#5024)
* Fix havingSpec on complex aggregators.

- Uses the technique from #4883 on DimFilterHavingSpec too.
- Also uses Transformers from #4890, necessitating a move of that and other
  related classes from druid-server to druid-processing. They probably make
  more sense there anyway.
- Adds a SQL query test.

Fixes #4957.

* Remove unused import.
2017-11-01 12:58:08 -04:00
Jihoon Son e96daa2593 Fix SQLMetadataSegmentManager (#5001) 2017-10-31 08:02:41 -07:00
Gian Merlino 0ce406bdf1
Introduce "transformSpec" at ingest-time. (#4890)
* Introduce "transformSpec" at ingest-time.

It accepts a "filter" (standard query filter object) and "transforms" (a
list of objects with "name" and "expression"). These can be used to do
filtering and single-row transforms without need for a separate data
processing job.

The "expression" fields use the same expression language as other
expression-based feature.

* Remove forbidden api.

* Fix compile error.

* Fix tests.

* Some more changes.

- Add nullable annotation to Firehose.nextRow.
- Add tests for index task, realtime task, kafka task, hadoop mapper,
  and ingestSegment firehose.

* Fix bad merge.

* Adjust imports.

* Adjust whitespace.

* Make Transform into an interface.

* Add missing annotation.

* Switch logger.

* Switch logger.

* Adjust test.

* Adjustment to handling for DatasourceIngestionSpec.

* Fix test.

* CR comments.

* Remove unused method.

* Add javadocs.

* More javadocs, and always decorate.

* Fix bug in TransformingStringInputRowParser.

* Fix bad merge.

* Fix ISFF tests.

* Fix DORC test.
2017-10-30 17:38:52 -07:00
Gian Merlino 5fc6891404 Reduce code duplication between test ExprMacroTables. (#4979) 2017-10-18 15:57:49 -05:00
Roman Leventov dc7cb117a1 Refactor ColumnSelectorFactory; Rely on ColumnValueSelector's polymorphism (#4886)
* Refactor ColumnSelectorFactory; Rely on ColumnValueSelector's polymorphism

* Fix MapVirtualColumn.makeColumnValueSelector()

* Minor fixes

* Fix IndexGeneratorCombinerTest

* DimensionSelector to return zeros when treated as numeric ColumnValueSelector

* Fix IncrementalIndexTest

* Fix IncrementalIndex.makeColumnSelectorFactory()

* Optimize MapBasedRow.getMetric()

* Fix VarianceAggregatorTest

* Simplify IncrementalIndex.makeColumnSelectorFactory()

* Address comments

* More comments

* Test
2017-10-13 21:44:17 -05:00