Commit Graph

1271 Commits

Author SHA1 Message Date
Parag Jain c7b46671b3
option to use deep storage for storing shuffle data (#11507)
Fixes #11297.
Description

Description and design in the proposal #11297
Key changed/added classes in this PR

    *DataSegmentPusher
    *ShuffleClient
    *PartitionStat
    *PartitionLocation
    *IntermediaryDataManager
2021-08-13 16:40:25 -04:00
Kashif Faraz aaf0aaad8f
Enable routing of SQL queries at Router (#11566)
This PR adds a new property druid.router.sql.enable which allows the
Router to handle SQL queries when set to true.

This change does not affect Avatica JDBC requests and they are still routed
by hashing the Connection ID.

To allow parsing of the request object as a SqlQuery (contained in module druid-sql),
some classes have been moved from druid-server to druid-services with
the same package name.
2021-08-13 18:44:39 +05:30
Maytas Monsereenusorn c068906fca
Make intermediate store for shuffle tasks an extension point (#11492)
* add interface

* add docs

* fix errors

* fix injection

* fix injection

* update javadoc
2021-07-27 11:29:43 +07:00
Maytas Monsereenusorn f5d53569ca
Supervisor metadata auto cleanup failing as missing Guice injection (#11424)
* Fix Supervisor metadata auto cleanup failing as missing Guice injection

* Fix Supervisor metadata auto cleanup failing as missing Guice injection

* fix IT

* fix IT

* Update services/src/main/java/org/apache/druid/cli/CliCoordinator.java

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* fix

* fix

* fix

* fix

* fix

* fix

* fix

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-07-13 09:47:49 +07:00
Abhishek Agarwal 03a6a6d6e1
Replace Processing ExecutorService with QueryProcessingPool (#11382)
This PR refactors the code for QueryRunnerFactory#mergeRunners to accept a new interface called QueryProcessingPool instead of ExecutorService for concurrent execution of query runners. This interface will let custom extensions inject their own implementation for deciding which query-runner to prioritize first. The default implementation is the same as today that takes the priority of query into account. QueryProcessingPool can also be used as a regular executor service. It has a dedicated method for accepting query execution work so implementations can differentiate between regular async tasks and query execution tasks. This dedicated method also passes the QueryRunner object as part of the task information. This hook will let custom extensions carry any state from QuerySegmentWalker to QueryProcessingPool#mergeRunners which is not possible currently.
2021-07-01 16:03:08 +05:30
Yi Yuan de8daf8139
Delete buildV9Directly in Kafka and Kinesis Indexing Service (#11351)
* delete_buildV9Directly_in_kafka_and_kinesis_indexing_service

* delete

* delete them from server

* delete buildV9Directly from hadoop indexing

* bug fixed

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-06-23 16:36:46 -07:00
Yi Yuan 8de0d36c52
Allow query through router when load moving average extension (#11276)
* init commit

* change NoopQuerySegmentWalker name

* change doc

* move NoopQuerySegmentWalker and add document

* fix doc

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-06-10 18:46:53 +08:00
Maytas Monsereenusorn 3455352241
Add feature to automatically remove compaction configurations for inactive datasources (#11232)
* add auto cleanup

* add auto cleanup

* add auto cleanup

* add tests

* add tests

* use retryutils

* use retryutils

* use retryutils

* address comments
2021-05-11 18:49:18 -07:00
Maytas Monsereenusorn 4326e699bd
Add feature to automatically remove datasource metadata based on retention period (#11227)
* add auto clean up datasource metadata

* add test

* fix checkstyle

* add comments

* fix error

* address comments

* Address comments

* fix test

* fix test

* fix typo

* add comment

* fix test

* fix test
2021-05-11 01:22:33 -07:00
Jihoon Son 2df42143ae
Fix idempotence of segment allocation and task report apis in native batch ingestion (#11189)
* Fix idempotence of segment allocation and task report apis in native
batch ingestion

* better error and javadoc

* checkstyle and dependency

* fix tests and add more tests

* task config instead of context; add doc

* unused import and dependency

* typo in doc

* fix unintended changes

* fix wrong import

* remove unnecessary error handling

* add task context back

* default task context

* fix test and doc

* address comments

* unused imports
2021-05-07 14:29:48 -07:00
Maytas Monsereenusorn d73f72e508
Add feature to automatically remove supervisor based on retention period (#11200)
* add auto clean up

* add test

* add test

* fix test

* Address comments

* Address comments
2021-05-06 22:25:23 -07:00
zachjsh 99f39c7202
Hadoop segment index file rename (#11194)
* Do stuff

* Do more stuff

* * Do more stuff

* * Do more stuff

* * working

* * cleanup

* * more cleanup

* * more cleanup

* * add license header

* * Add unit tests

* * add java docs

* * add more unit tests

* * Cleanup test

* * Move removing of workingPath to index task rather than in hadoop job.

* * Address review comments

* * remove unused import

* * Address review comments

* Do not overwrite segment descriptor for segment if it already exists.

* * add comments to FileSystemHelper class

* * fix local hadoop integration test

* * Fix failing test failures when running with java11

* Revert "Revert "Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions (#11075)" (#11151)"

This reverts commit 49a9c3ffb7.

* * remove JobHelperPowerMockTest

* * remove FileSystemHelper class
2021-05-04 20:22:18 -04:00
Maytas Monsereenusorn 84aac4832d
Add feature to automatically remove rules based on retention period (#11164)
* Add feature to automatically remove rules based on retention period

* Add feature to automatically remove rules based on retention period

* address comments
2021-05-03 11:50:45 -07:00
Jonathan Wei 49a9c3ffb7
Revert "Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions (#11075)" (#11151)
This reverts commit a2892d9c40.
2021-04-22 15:33:27 -07:00
zachjsh a2892d9c40
Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions (#11075)
* Do stuff

* Do more stuff

* * Do more stuff

* * Do more stuff

* * working

* * cleanup

* * more cleanup

* * more cleanup

* * add license header

* * Add unit tests

* * add java docs

* * add more unit tests

* * Cleanup test

* * Move removing of workingPath to index task rather than in hadoop job.

* * Address review comments

* * remove unused import

* * Address review comments

* Do not overwrite segment descriptor for segment if it already exists.

* * add comments to FileSystemHelper class

* * fix local hadoop integration test
2021-04-21 12:24:31 -07:00
Maytas Monsereenusorn 6d2b5cdd7e
Add feature to automatically remove audit logs based on retention period (#11084)
* add docs

* add impl

* fix checkstyle

* fix test

* add test

* fix checkstyle

* fix checkstyle

* fix test

* Address comments

* Address comments

* fix spelling

* fix docs
2021-04-20 17:10:43 -07:00
Jihoon Son a6a2758095
More unit tests for JsonParserIterator; Integration tests for query errors (#11091)
* unit tests for timeout exception in init

* integration tests

* run integraion test on travis

* fix inspection
2021-04-12 15:08:50 -07:00
Lucas Capistrant 8264203cee
Allow client to configure batch ingestion task to wait to complete until segments are confirmed to be available by other (#10676)
* Add ability to wait for segment availability for batch jobs

* IT updates

* fix queries in legacy hadoop IT

* Fix broken indexing integration tests

* address an lgtm flag

* spell checker still flagging for hadoop doc. adding under that file header too

* fix compaction IT

* Updates to wait for availability method

* improve unit testing for patch

* fix bad indentation

* refactor waitForSegmentAvailability

* Fixes based off of review comments

* cleanup to get compile after merging with master

* fix failing test after previous logic update

* add back code that must have gotten deleted during conflict resolution

* update some logging code

* fixes to get compilation working after merge with master

* reset interrupt flag in catch block after code review pointed it out

* small changes following self-review

* fixup some issues brought on by merge with master

* small changes after review

* cleanup a little bit after merge with master

* Fix potential resource leak in AbstractBatchIndexTask

* syntax fix

* Add a Compcation TuningConfig type

* add docs stipulating the lack of support by Compaction tasks for the new config

* Fixup compilation errors after merge with master

* Remove erreneous newline
2021-04-08 21:03:00 -07:00
Lasse Krogh Mammen 782a1d4e6c
Add Calcite Avatica protobuf handler (#10543) 2021-03-31 12:46:25 -07:00
Abhishek Agarwal 96d26e5338
Fix kinesis ingestion bugs (#10761)
* add offsetFetchPeriod to kinesis ingestion doc

* Remove jackson dependencies from extensions

* Use fixed delay for lag collection

* Metrics reset after finishing processing

* comments

* Broaden the list of exceptions to retry for

* Unit tests

* Add more tests

* Refactoring

* re-order metrics

* Doc suggestions

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

* Add tests

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-02-05 02:49:58 -08:00
Maytas Monsereenusorn a46d561bd7
Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead (#10740)
* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* fix checkstyle

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* fix test

* fix test

* add log

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* address comments

* fix checkstyle

* fix checkstyle

* add config to skip overhead memory calculation

* add test for the skipBytesInMemoryOverheadCheck config

* add docs

* fix checkstyle

* fix checkstyle

* fix spelling

* address comments

* fix travis

* address comments
2021-01-27 00:34:56 -08:00
Jihoon Son 95065bdf1a
Bump dev version to 0.22.0-SNAPSHOT (#10759) 2021-01-15 13:16:23 -08:00
Clint Wylie 74fbdd322d
refactor NodeRole so extensions can participate in disco and announcement (#10700)
* refactor NodeRole so extensions can participate in disco and announcement

* fixes, maybe

* retries

* javadoc

* fix

* spelling
2020-12-24 15:29:32 -08:00
Himanshu ac1882bf74
kubernetes based discovery druid extension to run Druid on K8S without Zookeeper (#10544)
* honor zk enablement config in more places in druid code

* kubernetes based discovery module

* fix spotbugs check

* fix intellij checks error

* fix doc link to kubernetes.md from extension

* make spellchecker happy

* update license.yaml

* fix dependency check errors

* update extension coverage

* UTs for BaseNodeRoleWatcher

* fix forbidden-api check

* update k8s module coverage ignores

* add Bouncy Castle License being same as MIT License for license checking purposes

* further update licenses.yaml

* label/annotation pre-existence assumption

* address review comment
2020-12-14 21:10:31 -08:00
Harini Rajendran c2e26d2e1c
Add status/selfDiscovered endpoint to indexer for self discovery of indexer (#10679)
Added the status/selfDiscovered endpoint to indexer. Per the api-reference doc, all services support status/selfDiscovered endpoint. So this change would fix that expected behavior.

Also added example config files for indexer process that can be used to spin up the indexer process.
2020-12-14 19:04:14 -08:00
Himanshu ee136303bb
optionally disable all of hardcoded zookeeper use (#9507)
* optionally disable all of hardcoded zookeeper use

* fix DruidCoordinatorTest compilation

* fix test in DruidCoordinatorTest

* fix strict compilation

Co-authored-by: Himanshu Gupta <fill email>
2020-10-26 22:35:59 -07:00
Liran Funaro f3a2903218
Configurable Index Type (#10335)
* Introduce a Configurable Index Type

* Change to @UnstableApi

* Add AppendableIndexSpecTest

* Update doc

* Add spelling exception

* Add tests coverage

* Revert some of the changes to reduce diff

* Minor fixes

* Update getMaxBytesInMemoryOrDefault() comment

* Fix typo, remove redundant interface

* Remove off-heap spec (postponed to a later PR)

* Add javadocs to AppendableIndexSpec

* Describe testCreateTask()

* Add tests for AppendableIndexSpec within TuningConfig

* Modify hashCode() to conform with equals()

* Add comment where building incremental-index

* Add "EqualsVerifier" tests

* Revert some of the API back to AppenderatorConfig

* Don't use multi-line comments

* Remove knob documentation (deferred)
2020-10-23 18:34:26 -07:00
Jihoon Son ad437dd655
Add shuffle metrics for parallel indexing (#10359)
* Add shuffle metrics for parallel indexing

* javadoc and concurrency test

* concurrency

* fix javadoc

* Feature flag

* doc

* fix doc and add a test

* checkstyle

* add tests

* fix build and address comments
2020-10-10 19:35:17 -07:00
Jonathan Wei 65c0d64676
Update version to 0.21.0-SNAPSHOT (#10450)
* [maven-release-plugin] prepare release druid-0.21.0

* [maven-release-plugin] prepare for next development iteration

* Update web-console versions
2020-10-03 16:08:34 -07:00
Mainak Ghosh 8168e14e92
Adding task slot count metrics to Druid Overlord (#10379)
* Adding more worker metrics to Druid Overlord

* Changing the nomenclature from worker to peon as that represents the metrics that we want to monitor better

* Few more instance of worker usage replaced with peon

* Modifying the peon idle count logic to only use eligible workers available capacity

* Changing the naming to task slot count instead of peon

* Adding some unit test coverage for the new test runner apis

* Addressing Review Comments

* Modifying the TaskSlotCountStatsProvider apis so that overlords which are not leader do not emit these metrics

* Fixing the spelling issue in the docs

* Setting the annotation Nullable on the TaskSlotCountStatsProvider methods
2020-09-28 23:50:38 -07:00
Maytas Monsereenusorn 72f1b55f56
Add last_compaction_state to sys.segments table (#10413)
* Add is_compacted to sys.segments table

* change is_compacted to last_compaction_state

* fix tests

* fix tests

* address comments
2020-09-23 15:29:36 -07:00
Jihoon Son 8f14ac814e
More structured way to handle parse exceptions (#10336)
* More structured way to handle parse exceptions

* checkstyle; add more tests

* forbidden api; test

* address comment; new test

* address review comments

* javadoc for parseException; remove redundant parseException in streaming ingestion

* fix tests

* unnecessary catch

* unused imports

* appenderator test

* unused import
2020-09-11 16:31:10 -07:00
Gian Merlino 8ab1979304
Remove implied profanity from error messages. (#10270)
i.e. WTF, WTH.
2020-08-28 11:38:50 -07:00
Jihoon Son f82fd22fa7
Move tools for indexing to TaskToolbox instead of injecting them in constructor (#10308)
* Move tools for indexing to TaskToolbox instead of injecting them in constructor

* oops, other changes

* fix test

* unnecessary new file

* fix test

* fix build
2020-08-26 17:08:12 -07:00
Jihoon Son 6fdce36e41
Add integration tests for query retry on missing segments (#10171)
* Add integration tests for query retry on missing segments

* add missing dependencies; fix travis conf

* address comments

* Integration tests extension

* remove unused dependency

* remove druid_main

* fix java agent port
2020-07-22 22:30:35 -07:00
Jihoon Son 53a2550571
Follow-up for RetryQueryRunner fix (#10144)
* address comments; use guice instead of query context

* typo

* QueryResource tests

* address comments

* catch queryException

* fix spell check
2020-07-08 13:28:11 -07:00
Clint Wylie c86e7ce30b
bump version to 0.20.0-SNAPSHOT (#10124) 2020-07-06 15:08:32 -07:00
Suneet Saldanha b91a16943b
Make 0.19 brokers compatible with 0.18 router (#10091)
* Make brokers backwards compatible

In 0.19, Brokers gained the ability to serve segments. To support this change,
a `BROKER` ServerType was added to `druid.server.coordination`.

Druid nodes prior to this change do not know of this new server type and so
they would fail to deserialize this node's announcement.

This change makes it so that the broker only announces itself if the segment
cache is configured on the broker. It is expected that a Druid admin will only
configure the segment cache on the broker once the cluster has been upgraded
to a version that supports a broker using the segment cache.

* make code nicer

* Add tests

* Ignore icode coverage for nitialization classes

* Revert "Ignore icode coverage for nitialization classes"

This reverts commit aeec0c2ac2.

* code review
2020-06-29 20:57:33 -07:00
Suneet Saldanha 15a0b4ffe2
Filter http requests by http method (#10085)
* Filter http requests by http method

Add a config that allows a user which http methods to allow against their
Druid server.

Druid will only accept http requests with the method: GET, PUT, POST, DELETE
and OPTIONS.
If a Druid admin wants to allow other methods, they can do so by using the
ServerConfig#allowedHttpMethods config.

If a Druid user would like to disallow OPTIONS, this can be done by changing
the AuthConfig#allowUnauthenticatedHttpOptions config

* Exclude OPTIONS from always supported HTTP methods

Add HEAD as an allowed method for web console e2e tests

* fix docs

* fix security IT

* Actually fix the web console e2e tests

* Ignore icode coverage for nitialization classes

* code review
2020-06-29 16:59:31 -07:00
Atul Mohan 17cf8ea8f2
Add Sql InputSource (#9449)
* Add Sql InputSource

* Add spelling

* Use separate DruidModule

* Change module name

* Fix docs

* Use sqltestutils for tests

* Add additional tests

* Fix inspection

* Add module test

* Fix md in docs

* Remove annotation

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-06-09 12:55:20 -07:00
Jonathan Wei 771870ae2d
Load broadcast datasources on broker and tasks (#9971)
* Load broadcast datasources on broker and tasks

* Add javadocs

* Support HTTP segment management

* Fix indexer maxSize

* inspection fix

* Make segment cache optional on non-historicals

* Fix build

* Fix inspections, some coverage, failed tests

* More tests

* Add CliIndexer to MainTest

* Fix inspection

* Rename UnprunedDataSegment to LoadableDataSegment

* Address PR comments

* Fix
2020-06-08 20:15:59 -07:00
Maytas Monsereenusorn 8328d91b30
Add missing integration tests for the compaction by the coordinator (#9644)
* Add API to trigger a compaction by the coordinator for integration tests

* Add missing integration tests for the compaction by the coordinator

* address comments
2020-04-15 14:27:33 -07:00
Suneet Saldanha 1ced3b33fb
IntelliJ inspections cleanup (#9339)
* IntelliJ inspections cleanup

* Standard Charset object can be used
* Redundant Collection.addAll() call
* String literal concatenation missing whitespace
* Statement with empty body
* Redundant Collection operation
* StringBuilder can be replaced with String
* Type parameter hides visible type

* fix warnings in test code

* more test fixes

* remove string concatenation inspection error

* fix extra curly brace

* cleanup AzureTestUtils

* fix charsets for RangerAdminClient

* review comments
2020-04-10 10:04:40 -07:00
Jihoon Son 0da8ffc3ff
Bump up development version to 0.19.0-SNAPSHOT (#9586) 2020-03-30 16:24:04 -07:00
Himanshu 5604ac7963
druid extension for OpenID Connect auth using pac4j lib (#8992)
* druid pac4j security extension for OpenID Connect OAuth 2.0 authentication

* update version in druid-pac4j pom

* introducing unauthorized resource filter

* authenticated but authorized /unified-webconsole.html

* use httpReq.getRequestURI() for matching callback path

* add documentation

* minor doc addition

* licesne file updates

* make dependency analyze succeed

* fix doc build

* hopefully fixes doc build

* hopefully fixes license check build

* yet another try on fixing license build

* revert unintentional changes to website folder

* update version to 0.18.0-SNAPSHOT

* check session and its expiry on each request

* add crypto service

* code for encrypting the cookie

* update doc with cookiePassphrase

* update license yaml

* make sessionstore in Pac4jFilter private non static

* make Pac4jFilter fields final

* okta: use sha256 for hmac

* remove incubating

* add UTs for crypto util and session store impl

* use standard charsets

* add license header

* remove unused file

* add org.objenesis.objenesis to license.yaml

* a bit of nit changes  in CryptoService  and embedding EncryptionResult for clarity

* rename alg  to cipherAlgName

* take cipher alg name, mode and padding as input

* add java doc  for CryptoService  and make it more understandable

* another  UT for CryptoService

* cache pac4j Config

* use generics clearly in Pac4jSessionStore

* update cookiePassphrase doc to mention PasswordProvider

* mark stuff Nullable where appropriate in Pac4jSessionStore

* update doc to mention jdbc

* add error log on reaching callback resource

* javadoc  for Pac4jCallbackResource

* introduce NOOP_HTTP_ACTION_ADAPTER

* add correct module name in license file

* correct extensions folder name in licenses.yaml

* replace druid-kubernetes-extensions to druid-pac4j

* cache SecureRandom instance

* rename UnauthorizedResourceFilter to AuthenticationOnlyResourceFilter
2020-03-23 18:15:45 -07:00
Himanshu 1ba1a3c523
fix worker category on Indexer node (#9510) 2020-03-12 14:11:02 -07:00
Gian Merlino 2ef5c17441
Link up row-based datasources to serving layer. (#9503)
* Link up row-based datasources to serving layer.

- Add SegmentWrangler interface that allows linking of DataSources to Segments.
- Add LocalQuerySegmentWalker that uses SegmentWranglers to compute queries on
  data that is available locally.
- Modify ClientQuerySegmentWalker to use LocalQuerySegmentWalker when the base
  datasource is concrete and not a table.
- Add SegmentWranglerModule to the Broker so it has them available and can
  properly instantiate . LocalQuerySegmentWalkers.
- Set InlineDataSource and LookupDataSource to concrete, since they can be
  directly queried now.

* Fix tests.
2020-03-11 11:32:27 -07:00
Jihoon Son 141d8dd875
Enable druid.coordinator.kill.pendingSegments.on by default (#9385)
* Enable druid.coordinator.kill.pendingSegments.on by default

* checkstyle
2020-02-21 13:13:49 -08:00
Jihoon Son 3bb9e7e53a
Inject things instead of subclassing everything for parallel task testing (#9353)
* Inject things instead of subclassing everything for parallel task
testing

* javadoc

* fix compilation

* fix wrong merge

* Address comments
2020-02-16 13:00:12 -08:00
Suneet Saldanha 303b02eba1
intelliJ inspections cleanup (#9260)
* intelliJ inspections cleanup

- remove redundant escapes
- performance warnings
- access static member via instance reference
- static method declared final
- inner class may be static

Most of these changes are aesthetic, however, they will allow inspections to
be enabled as part of CI checks going forward

The valuable changes in this delta are:
- using StringBuilder instead of string addition in a loop
    indexing-hadoop/.../Utils.java
    processing/.../ByteBufferMinMaxOffsetHeap.java
- Use class variables instead of static variables for parameterized test
    processing/src/.../ScanQueryLimitRowIteratorTest.java

* Add intelliJ inspection warnings as errors to druid profile

* one more static inner class
2020-01-29 11:50:52 -08:00