Commit Graph

3994 Commits

Author SHA1 Message Date
Gian Merlino bfffbabb56
Async task client for SeekableStreamSupervisors. (#13354)
Main changes:
1) Convert SeekableStreamIndexTaskClient to an interface, move old code
   to SeekableStreamIndexTaskClientSyncImpl, and add new implementation
   SeekableStreamIndexTaskClientAsyncImpl that uses ServiceClient.
2) Add "chatAsync" parameter to seekable stream supervisors that causes
   the supervisor to use an async task client.
3) In SeekableStreamSupervisor.discoverTasks, adjust logic to avoid making
   blocking RPC calls in workerExec threads.
4) In SeekableStreamSupervisor generally, switch from Futures.successfulAsList
   to FutureUtils.coalesce, so we can better capture the errors that occurred
   with contacting individual tasks.

Other, related changes:
1) Add ServiceRetryPolicy.retryNotAvailable, which controls whether
   ServiceClient retries unavailable services. Useful since we do not
   want to retry calls unavailable tasks within the service client. (The
   supervisor does its own higher-level retries.)
2) Add FutureUtils.transformAsync, a more lambda friendly version of
   Futures.transform(f, AsyncFunction).
3) Add FutureUtils.coalesce. Similar to Futures.successfulAsList, but
   returns Either instead of using null on error.
4) Add JacksonUtils.readValue overloads for JavaType and TypeReference.
2022-11-21 19:20:26 +05:30
Rohan Garg 6ccf31490e
Allow injection of node-role set to all non base modules (#13371) 2022-11-18 12:12:03 +05:30
Kashif Faraz 71b133f3ff
Add `RoundRobinServerSelector` to speed up segment assignments (#13367)
Segment assignments can take very long due to the strategy cost computation
for a large number of segments. This commit allows segment assignments to be
done in a round-robin fashion within a tier. Only segment balancing takes cost-based
decisions to move segments around.

Changes
- Add dynamic config `useRoundRobinSegmentAssignment` with default value false
- Add `RoundRobinServerSelector`. This does not implement the `BalancerStrategy`
as it does not conform to that contract and may also be used in conjunction with a
strategy (round-robin for `RunRules` and a cost strategy for `BalanceSegments`)
- Drops are still cost-based even when round-robin assignment is enabled.
2022-11-16 20:05:17 +05:30
Paul Rogers 81d005f267
Druid Catalog basics (#13165)
Druid catalog basics

Catalog object model for tables, columns
Druid metadata DB storage (as an extension)
REST API to update the catalog (as an extension)
Integration tests
Model only: no planner integration yet
2022-11-12 15:30:22 -08:00
AmatyaAvadhanula fb23e38aa7
Fix messageGap emission (#13346)
* Fix messageGap emission

* Do not emit messageGap after stopping reading events

* Refactoring

* Fix tests
2022-11-10 17:50:19 +05:30
Paul Rogers 7e600d2c63
Enhancements to the Calcite test framework (#13283)
* Enhancements to the Calcite test framework
* Standardize "Unauthorized" messages
* Additional test framework extension points
* Resolved joinable factory dependency issue
2022-11-08 14:28:49 -08:00
Kashif Faraz 9f7fd57a69
Improve fetch of pending segments from metadata store (#13310)
* Deserialize only when needed

* Update query to fetch pending segments

* Revert unneeded changes

* Fix query
2022-11-08 05:46:19 -08:00
Kashif Faraz ff8e0c3397
Fix issues with caching cost strategy (#13321)
`cachingCost` strategy has some discrepancies when compared to cost strategy.
This commit addresses two of these by retaining the same behaviour as the `cost` strategy
when computing the cost of moving a segment to a server:
- subtract the self cost of a segment if it is being served by the target server
- subtract the cost of segments that are marked to be dropped

Other changes:
- Add tests to verify fixed strategy. These tests would fail without the fixes made to `CachingCostStrategy.computeCost()`
- Fix the definition of the segment related metrics in the docs.
- Fix some docs issues introduced in #13181
2022-11-08 16:11:39 +05:30
Tejaswini Bandlamudi 594545da55
Adds cluster level idleConfig setting for supervisor (#13311)
* adds cluster level idleConfig

* updates docs

* refactoring

* spelling nit

* nit

* nit

* refactoring
2022-11-08 14:54:14 +05:30
AmatyaAvadhanula 47c32a9d92
Skip ALL granularity compaction (#13304)
* Skip autocompaction for datasources with ETERNITY segments
2022-11-07 17:55:03 +05:30
AmatyaAvadhanula 650840ddaf
Add segment handoff time metric (#13238)
* Add segment handoff time metric

* Remove monitors on scheduler stop

* Add warning log for slow handoff

* Remove monitor when scheduler stops
2022-11-07 17:49:10 +05:30
Dr. Sizzles e5ad24ff9f
Support for middle manager less druid, tasks launch as k8s jobs (#13156)
* Support for middle manager less druid, tasks launch as k8s jobs

* Fixing forking task runner test

* Test cleanup, dependency cleanup, intellij inspections cleanup

* Changes per PR review

Add configuration option to disable http/https proxy for the k8s client
Update the docs to provide more detail about sidecar support

* Removing un-needed log lines

* Small changes per PR review

* Upon task completion we callback to the overlord to update the status / locaiton, for slower k8s clusters, this reduces locking time significantly

* Merge conflict fix

* Fixing tests and docs

* update tiny-cluster.yaml 

changed `enableTaskLevelLogPush` to `encapsulatedTask`

* Apply suggestions from code review

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* Minor changes per PR request

* Cleanup, adding test to AbstractTask

* Add comment in peon.sh

* Bumping code coverage

* More tests to make code coverage happy

* Doh a duplicate dependnecy

* Integration test setup is weird for k8s, will do this in a different PR

* Reverting back all integration test changes, will do in anotbher PR

* use StringUtils.base64 instead of Base64

* Jdk is nasty, if i compress in jdk 11 in jdk 17 the decompressed result is different

Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2022-11-02 19:44:47 -07:00
Kashif Faraz fd7864ae33
Improve run time of coordinator duty MarkAsUnusedOvershadowedSegments (#13287)
In clusters with a large number of segments, the duty `MarkAsUnusedOvershadowedSegments`
can take a long very long time to finish. This is because of the costly invocation of 
`timeline.isOvershadowed` which is done for every used segment in every coordinator run.

Changes
- Use `DataSourceSnapshot.getOvershadowedSegments` to get all overshadowed segments
- Iterate over this set instead of all used segments to identify segments that can be marked as unused
- Mark segments as unused in the DB in batches rather than one at a time
- Refactor: Add class `SegmentTimeline` for ease of use and readability while using a
`VersionedIntervalTimeline` of segments.
2022-11-01 20:19:52 +05:30
somu-imply affc522b9f
Refactoring the data source before unnest (#13085)
* First set of changes for framework

* Second set of changes to move segment map function to data source

* Minot change to server manager

* Removing the createSegmentMapFunction from JoinableFactoryWrapper and moving to JoinDataSource

* Checkstyle fixes

* Patching Eric's fix for injection

* Checkstyle and fixing some CI issues

* Fixing code inspections and some failed tests and one injector for test in avatica

* Another set of changes for CI...almost there

* Equals and hashcode part update

* Fixing injector from Eric + refactoring for broadcastJoinHelper

* Updating second injector. Might revert later if better way found

* Fixing guice issue in JoinableFactory

* Addressing review comments part 1

* Temp changes refactoring

* Revert "Temp changes refactoring"

This reverts commit 9da42a9ef0.

* temp

* Temp discussions

* Refactoring temp

* Refatoring the query rewrite to refer to a datasource

* Refactoring getCacheKey by moving it inside data source

* Nullable annotation check in injector

* Addressing some comments, removing 2 analysis.isJoin() checks and correcting the benchmark files

* Minor changes for refactoring

* Addressing reviews part 1

* Refactoring part 2 with new test cases for broadcast join

* Set for nullables

* removing instance of checks

* Storing nullables in guice to avoid checking on reruns

* Fixing a test case and removing an irrelevant line

* Addressing the atomic reference review comments
2022-10-26 15:58:58 -07:00
Gian Merlino d98c808d3f
Remove basePersistDirectory from tuning configs. (#13040)
* Remove basePersistDirectory from tuning configs.

Since the removal of CliRealtime, it serves no purpose, since it is
always overridden in production using withBasePersistDirectory given
some subdirectory of the task work directory.

Removing this from the tuning config has a benefit beyond removing
no-longer-needed logic: it also avoids the side effect of empty
"druid-realtime-persist" directories getting created in the systemwide
temp directory.

* Test adjustments to appropriately set basePersistDirectory.

* Remove unused import.

* Fix RATC constructor.
2022-10-21 17:25:36 -07:00
Paul Rogers 86e6e61e88
Modular Calcite Test Framework (#12965)
* Refactor Calcite test "framework" for planner tests

Refactors the current Calcite tests to make it a bit easier
to adjust the set of runtime objects used within a test.

* Move data creation out of CalciteTests into TestDataBuilder
* Move "framework" creation out of CalciteTests into
  a QueryFramework
* Move injector-dependent functions from CalciteTests
  into QueryFrameworkUtils
* Wrapper around the planner factory, etc. to allow
  customization.
* Bulk of the "framework" created once per class rather
  than once per test.
* Refactor tests to use a test builder
* Change all testQuery() methods to use the test builder.
Move test execution & verification into a test runner.
2022-10-20 15:45:44 -07:00
Paul Rogers f4dcc52dac
Redesign QueryContext class (#13071)
We introduce two new configuration keys that refine the query context security model controlled by druid.auth.authorizeQueryContextParams. When that value is set to true then two other configuration options become available:

druid.auth.unsecuredContextKeys: The set of query context keys that do not require a security check. Use this for the "white-list" of key to allow. All other keys go through the existing context key security checks.
druid.auth.securedContextKeys: The set of query context keys that do require a security check. Use this when you want to allow all but a specific set of keys: only these keys go through the existing context key security checks.
Both are set using JSON list format:

druid.auth.securedContextKeys=["secretKey1", "secretKey2"]
You generally set one or the other values. If both are set, unsecuredContextKeys acts as exceptions to securedContextKeys.

In addition, Druid defines two query context keys which always bypass checks because Druid uses them internally:

sqlQueryId
sqlStringifyArrays
2022-10-15 11:02:11 +05:30
Tejaswini Bandlamudi 3e13584e0e
Adds Idle feature to `SeekableStreamSupervisor` for inactive stream (#13144)
* Idle Seekable stream supervisor changes.

* nit

* nit

* nit

* Adds unit tests

* Supervisor decides it's idle state instead of AutoScaler

* docs update

* nit

* nit

* docs update

* Adds Kafka unit test

* Adds Kafka Integration test.

* Updates travis config.

* Updates kafka-indexing-service dependencies.

* updates previous offsets snapshot & doc

* Doesn't act if supervisor is suspended.

* Fixes highest current offsets fetch bug, adds new Kafka UT tests, doc changes.

* Reverts Kinesis Supervisor idle behaviour changes.

* nit

* nit

* Corrects SeekableStreamSupervisorSpec check on idle behaviour config, adds tests.

* Fixes getHighestCurrentOffsets to fetch offsets of publishing tasks too

* Adds Kafka Supervisor UT

* Improves test coverage in druid-server

* Corrects IT override config

* Doc updates and Syntactic changes

* nit

* supervisorSpec.ioConfig.idleConfig changes
2022-10-12 18:31:08 +05:30
Gian Merlino c19ae13323
Improve direct-memory check on startup. (#13207)
1) Better support for Java 9+ in RuntimeInfo. This means that in many cases,
   an actual validation can be done.

2) Clearer log message in cases where an actual validation cannot be done.
2022-10-12 05:10:25 +08:00
AmatyaAvadhanula 41e51b21c3
Make http options the default configurations (#13092)
Druid currently uses Zookeeper dependent options as the default.
This commit updates the following to use HTTP as the default instead.
- task runner. `druid.indexer.runner.type=remote -> httpRemote`
- load queue peon. `druid.coordinator.loadqueuepeon.type=curator -> http`
- server inventory view. `druid.serverview.type=curator -> http`
2022-10-05 05:35:17 +05:30
Abhishek Agarwal e3f9a0ed44
Lazy initialization of segment killers, movers and archivers (#13170)
* Lazy initialization of segment killers, movers and archivers

* Add test for lazy killer

* Add more tests

* Intellij fixes
2022-10-04 15:55:46 +05:30
Kashif Faraz ce5f55e5ce
Fix over-replication caused by balancing when inventory is not updated yet (#13114)
* Add coordinator test framework

* Remove outdated changes

* Add more tests

* Add option to auto-sync inventory

* Minor cleanup

* Fix inspections

* Add README for simulations, add SegmentLoadingNegativeTest

* Fix over-replication from balancing

* Fix README

* Cleanup unnecessary fields from DruidCoordinator

* Add a test

* Fix DruidCoordinatorTest

* Remove unused import

* Fix CuratorDruidCoordinatorTest

* Remove test log4j2.xml
2022-09-29 12:06:23 +05:30
Kashif Faraz 0039409817
Add test framework to simulate segment loading and balancing (#13074)
Fixes #12822 

The framework added here make it easy to write tests that verify the behaviour and interactions
of the following entities under various conditions:
- `DruidCoordinator`
- `HttpLoadQueuePeon`, `LoadQueueTaskMaster`
- coordinator duties: `BalanceSegments`, `RunRules`, `UnloadUnusedSegments`, etc.
- datasource retention rules: `LoadRule`, `DropRule`

Changes:
Add the following main classes:
- `CoordinatorSimulation` and related interfaces to dictate behaviour of simulation
- `CoordinatorSimulationBuilder` to build a simulation.
- `BlockingExecutorService` to keep submitted tasks in queue and execute them
  only when explicitly invoked.

Add tests:
- `CoordinatorSimulationBaseTest`, `SegmentLoadingTest`, `SegmentBalancingTest`
- `SegmentLoadingNegativeTest` to contain tests which assert the existing erroneous behaviour
of segment loading. Once the behaviour is fixed, these tests will be moved to the regular
`SegmentLoadingTest`.

Please refer to the README.md in `org.apache.druid.server.coordinator.simulate` for more details
2022-09-21 09:51:58 +05:30
Paul Rogers 8ce03eb094
Convert the Druid planner to use statement handlers (#12905)
* Converted Druid planner to use statement handlers

Converts the large collection of if-statements for statement
types into a set of classes: one per supported statement type.
Cleans up a few error messages.

* Revisions from review comments

* Build fix

* Build fix

* Resolve merge confict.

* More merges with QueryResponse PR

* More parameterized type cleanup

Forces a rebuild due to a flaky test
2022-09-19 11:58:45 +05:30
AmatyaAvadhanula 9b53b0184f
Allocate numCorePartitions using only used segments (#13070)
* Allocate numCorePartitions using only used segments

* Add corePartition checks in existing test

* Separate committedMaxId and overallMaxId

* Fix bug: replace overall with committed
2022-09-16 19:16:36 +05:30
AmatyaAvadhanula 1311e85f65
Faster fix for dangling tasks upon supervisor termination (#13072)
This commit fixes issues with delayed supervisor termination during certain transient states.
Tasks can be created during supervisor termination and left behind since the cleanup may
not consider these newly added tasks.

#12178 added a lock for the entire process of task creation to prevent such dangling tasks.
But it also introduced a deadlock scenario as follows:
- An invocation of `runInternal` is in progress.
- A `stop` request comes, acquires `stateChangeLock` and submit a `ShutdownNotice`
- `runInternal` keeps waiting to acquire the `stateChangeLock`
- `ShutdownNotice` remains stuck in the notice queue because `runInternal` is still running
- After some timeout, the supervisor goes through a forced termination

Fix:
 * `SeekableStreamSupervisor.runInternal` - do not try to acquire lock if supervisor is already stopping
 * `SupervisorStateManager.maybeSetState` - do not allow transitions from STOPPING state
2022-09-15 15:31:14 +05:30
Frank Chen aa9b0900d4
Move web-console dependency declaration from druid-server to druid-distribution (#12501)
* Move web-console dependency from druid-server to distribution

* Add a test to check if the web-console is correctly integrated

* exclude web-console from 'other integration tests'

* Revert "exclude web-console from 'other integration tests'"

This reverts commit 8d72225544.

* Revert "Add a test to check if the web-console is correctly integrated"

This reverts commit d6ac8f3087.
2022-09-15 10:39:30 +08:00
Clint Wylie f4ec50bf7a
fix JsonParserIteratorTest (#13083) 2022-09-13 20:49:57 -07:00
Frank Chen fd6c05eee8
Avoid ClassCastException when getting values from `QueryContext` (#13022)
* Use safe conversion methods

* Rename method

* Add getContextAsBoolean

* Update test case

* Remove generic from getContextValue

* Update catch-handler

* Add test

* Resolve comments

* Replace 'getContextXXX' to 'getQueryContext().getAsXXXX'
2022-09-13 18:00:09 +08:00
imply-cheddar 5ba0075c0c
Expose HTTP Response headers from SqlResource (#13052)
* Expose HTTP Response headers from SqlResource

This change makes the SqlResource expose HTTP response
headers in the same way that the QueryResource exposes them.

Fundamentally, the change is to pipe the QueryResponse
object all the way through to the Resource so that it can
populate response headers.  There is also some code
cleanup around DI, as there was a superfluous FactoryFactory
class muddying things up.
2022-09-12 01:40:06 -07:00
Gian Merlino e29e7a8434
Add ARRAY_QUANTILE function. (#13061)
* Add ARRAY_QUANTILE function.

Expected usage is like: ARRAY_QUANTILE(ARRAY_AGG(x), 0.9).

* Fix test.
2022-09-09 11:29:20 -07:00
Lucas Capistrant 99fd22c79b
fix bug in /status/properties filtering (#13045)
* fix bug in /status/properties filtering

* Refactor tests to use jackson for parsing druid.server.hiddenProperties instead of hacky string modifications

* make javadoc more descriptive using example

* add in a sanity assertion that raw properties keyset size is greater than filtered properties keyset size
2022-09-07 17:45:28 -07:00
Rohan Garg 7aa8d7f987
Add query/time metric for SQL queries from router (#12867)
* Add query/time metric for SQL queries from router

* Fix query cancel bug when user has overriden native query-id in a SQL query
2022-09-07 13:54:46 +05:30
Adam Peck ee22663dd3
Add interpolation to JsonConfigurator (#13023)
* Add interpolation to JsonConfigurator

* Fix checkstyle

* Fix tests by removing common-text override

* Add back commons-text without version

* Remove unused hadoopDir configs

* Move some stuff to hopefully pass coverage
2022-09-07 12:48:01 +05:30
Clint Wylie a3a377e570
more consistent expression error messages (#12995)
* more consistent expression error messages

* review stuff

* add NamedFunction for Function, ApplyFunction, and ExprMacro to share common stuff

* fixes

* add expression transform name to transformer failure, better parse_json error messaging
2022-09-06 23:21:38 -07:00
zemin 6805a7f9c2
Ease of hidding sensitive properties from /status/proper… (#12950)
* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint

* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint

* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint

using one property for hiding properties, updated the index.md to document hiddenProperties

* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint

Added java docs

* apache#12063 Ease of hidding sensitive properties from /status/properties endpoint

Add "password", "key", "token", "pwd" as default druid.server.hiddenProperties

fixed typo and removed redundant space

Co-authored-by: zemin <zemin.piao@adyen.com>
2022-09-02 08:51:25 -05:00
Abhishek Agarwal 618757352b
Bump up the version to 25.0.0 (#12975)
* Bump up the version to 25.0.0

* Fix the version in console
2022-08-29 11:27:38 +05:30
Clint Wylie 16f5ac5bd5
json_value adjustments (#12968)
* json_value adjustments
changes:
* native json_value expression now has optional 3rd argument to specify type, which will cast all values to the specified type
* rework how JSON_VALUE is wired up in SQL. Now we are using a custom convertlet to translate JSON_VALUE(... RETURNING type) into dedicated JSON_VALUE_BIGINT, JSON_VALUE_DOUBLE, JSON_VALUE_VARCHAR, JSON_VALUE_ANY instead of using the calcite StandardConvertletTable that wraps JSON_VALUE_ANY in a CAST, so that we preserve the typing of JSON_VALUE to pass down to the native expression as the 3rd argument

* fix json_value_any to be usable by humans too, coverage

* fix bug

* checkstyle

* checkstyle

* review stuff

* validate that options to json_value are the supported options rather than ignore them

* remove more legacy undocumented functions
2022-08-27 07:15:47 -07:00
Adam Peck 21b73bde20
Update Curator to 5.3.0 (#12939)
* Update Curator to 5.3.0

* Update licenses.yaml

* Fix inspections + add tests.

* Fix checkstyle

* Another intellij inspection fix

* Update curator exclusions

* Cleanup new exhibitor references

* Remove unused dep and checkstyle fix
2022-08-26 18:23:40 -07:00
Santosh Pingale 31dc9004bd
Auto-reload TLS certs for druid endpoints (#12933)
* #12064 Auto-reload tls certs for druid endpoints

* #12064 Add missing toString param

* #12064 Add tests and new jks
Co-authored-by: zemin-piao <pzm6391@gmail.com>

* #12064 Refine tests

* #12064 Add documentation

* Apply suggestions from code review

Co-authored-by: Frank Chen <frankchen@apache.org>

Co-authored-by: santosh <santosh.pingale@adyen.com>
Co-authored-by: Frank Chen <frankchen@apache.org>
2022-08-25 20:12:43 +08:00
Paul Rogers cfed036091
Add the new integration test framework (#12368)
This commit is a first draft of the revised integration test framework which provides:
- A new directory, integration-tests-ex that holds the new integration test structure. (For now, the existing integration-tests is left unchanged.)
- Maven module druid-it-tools to hold code placed into the Docker image.
- Maven module druid-it-image to build the Druid-only test image from the tarball produced in distribution. (Dependencies live in their "official" image.)
- Maven module druid-it-cases that holds the revised tests and the framework itself. The framework includes file-based test configuration, test-specific clients, test initialization and updated versions of some of the common test support classes.

The integration test setup is primarily a huge mass of details. This approach refactors many of those details: from how the image is built and configured to how the Docker Compose scripts are structured to test configuration. An extensive set of "readme" files explains those details. Rather than repeat that material here, please consult those files for explanations.
2022-08-24 17:03:23 +05:30
Bartosz Mikulski 0bc9f9f303
#12912 Fix KafkaEmitter not emitting queryType for a native query (#12915)
Fixes KafkaEmitter not emitting queryType for a native query. The Event to JSON serialization was extracted to the external class: EventToJsonSerializer. This was done to simplify the testing logic for the serialization as well as extract the responsibility of serialization to the separate class.

The logic builds ObjectNode incrementally based on the event .toMap method. Parsing each entry individually ensures that the Jackson polymorphic annotations are respected. Not respecting these annotation caused the missing of the queryType from output event.
2022-08-24 14:07:00 +05:30
Adarsh Sanjeev 3b58a01c7c
Correct spelling in messages and variable names. (#12932) 2022-08-24 11:06:31 +05:30
Clint Wylie 289e43281e
stricter behavior for parse_json, add try_parse_json, remove to_json (#12920) 2022-08-22 18:41:07 -07:00
Rohan Garg a879d91a20
Remove misleading logging on router for JDBC queries (#12925) 2022-08-22 11:58:51 +05:30
Rohan Garg 3c129f6728
Add sql planning time metric (#12923) 2022-08-22 11:09:44 +05:30
Karan Kumar a3a9c5f409
Fixing overlord issued too many redirects (#12908)
* Fixing race in overlord redirects where the node was redirecting to itself

* Fixing test cases
2022-08-17 18:27:39 +05:30
Paul Rogers 41712b7a3a
Refactor SqlLifecycle into statement classes (#12845)
* Refactor SqlLifecycle into statement classes

Create direct & prepared statements
Remove redundant exceptions from tests
Tidy up Calcite query tests
Make PlannerConfig more testable

* Build fixes

* Added builder to SqlQueryPlus

* Moved Calcites system properties to saffron.properties

* Build fix

* Resolve merge conflict

* Fix IntelliJ inspection issue

* Revisions from reviews

Backed out a revision to Calcite tests that didn't work out as planned

* Build fix

* Fixed spelling errors

* Fixed failed test

Prepare now enforces security; before it did not.

* Rebase and fix IntelliJ inspections issue

* Clean up exception handling

* Fix handling of JDBC auth errors

* Build fix

* More tweaks to security messages
2022-08-14 00:44:08 -07:00
Gian Merlino 836430019a
Add EXTERNAL resource type. (#12896)
This is used to control access to the EXTERN function, which allows
reading external data in SQL. The EXTERN function is not usable in
production as of today, but it is used by the task-based SQL engine
contemplated in #12262.
2022-08-12 10:57:30 -07:00
Karan Kumar 2f2d8ded5a
Introducing Storage connector Interface (#12874)
In the current druid code base, we have the interface DataSegmentPusher which allows us to push segments to the appropriate deep storage without the extension being worried about the semantics of how to push too deep storage.

While working on #12262, whose some part of the code will go as an extension, I realized that we do not have an interface that allows us to do basic "write, get, delete, deleteAll" operations on the appropriate deep storage without let's say pulling the s3-storage-extension dependency in the custom extension.

Hence, the idea of StorageConnector was born where the storage connector sits inside the druid core so all extensions have access to it.

Each deep storage implementation, for eg s3, GCS, will implement this interface.
Now with some Jackson magic, we bind the implementation of the correct deep storage implementation on runtime using a type variable.
2022-08-12 16:11:49 +05:30
Adarsh Sanjeev 24f8f9e1ab
Add check for eternity time segment to SqlSegmentsMetadataQuery (#12844)
* Add check for eternity time segment to SqlSegmentsMetadataQuery

* Add check for half eternities

* Add multiple segments test

* Add failing test to document known issue
2022-08-04 22:33:08 -07:00
Paul Rogers a618458bf0
Tidy up construction of the Guice Injectors (#12816)
* Refactor Guice initialization

Builders for various module collections
Revise the extensions loader
Injector builders for server startup
Move Hadoop init to indexer
Clean up server node role filtering
Calcite test injector builder

* Revisions from review comments

* Build fixes

* Revisions from review comments
2022-08-04 00:05:07 -07:00
Gian Merlino ef6811ef88
Improved Java 17 support and Java runtime docs. (#12839)
* Improved Java 17 support and Java runtime docs.

1) Add a "Java runtime" doc page with information about supported
   Java versions, garbage collection, and strong encapsulation..

2) Update asm and equalsverifier to versions that support Java 17.

3) Add additional "--add-opens" lines to surefire configuration, so
   tests can pass successfully under Java 17.

4) Switch openjdk15 tests to openjdk17.

5) Update FrameFile to specifically mention Java runtime incompatibility
   as the cause of not being able to use Memory.map.

6) Update SegmentLoadDropHandler to log an error for Errors too, not
   just Exceptions. This is important because an IllegalAccessError is
   encountered when the correct "--add-opens" line is not provided,
   which would otherwise be silently ignored.

7) Update example configs to use druid.indexer.runner.javaOptsArray
   instead of druid.indexer.runner.javaOpts. (The latter is deprecated.)

* Adjustments.

* Use run-java in more places.

* Add run-java.

* Update .gitignore.

* Exclude hadoop-client-api.

Brought in when building on Java 17.

* Swap one more usage of java.

* Fix the run-java script.

* Fix flag.

* Include link to Temurin.

* Spelling.

* Update examples/bin/run-java

Co-authored-by: Xavier Léauté <xl+github@xvrl.net>

Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
2022-08-03 23:16:05 -07:00
Gian Merlino 2912a36a20
Use nonzero default value of maxQueuedBytes. (#12840)
* Use nonzero default value of maxQueuedBytes.

The purpose of this parameter is to prevent the Broker from running out
of memory. The prior default is unlimited; this patch changes it to a
relatively conservative 25MB.

This may be too low for larger clusters. The risk is that throughput
can decrease for queries with large resultsets or large amounts of intermediate
data. However, I think this is better than the risk of the prior default, which
is that these queries can cause the Broker to go OOM.

* Alter calculation.
2022-08-02 17:57:27 -07:00
Atul Mohan 93a9a4b1c5
Add retention for file request logs (#12559)
* Add retention for file request logs

* Spelling
2022-07-27 08:17:02 -07:00
Rohan Garg bf0886a8ab
Fix hash calcuation in RendezvousHasher (#12817) 2022-07-27 12:16:27 +05:30
Tejaswini Bandlamudi 5772dfd155
Peons should not report SysMonitor stats since MiddleManager reports them. (#12802)
Sysmonitor stats (mem, fs, disk, net, cpu, swap, sys, tcp) are reported by all Druid processes, including Peons that are ephemeral in nature. Since Peons always run on the same host as the MiddleManager that spawned them and is unlikely to change, the SyMonitor metrics emitted by Peon are merely duplicates. This is often not a problem except when machines are super-beefy. Imagine a 64-core machine and 32 workers running on this machine. now you will have each Peon reporting metrics for each core. that's an increase of (32 * 64)x in the number of metrics. This leads to a metric explosion.

This PR updates MetricsModule to check node role running while registering SysMonitor and not to load any existing SysMonitor$Stats.
2022-07-23 13:32:16 +05:30
Tejaswini Bandlamudi cc1ff56ca5
Unregisters `RealtimeMetricsMonitor`, `TaskRealtimeMetricsMonitor` on Indexers after task completion (#12743)
Few indexing tasks register RealtimeMetricsMonitor or TaskRealtimeMetricsMonitor with the process’s MonitorScheduler when they start. These monitors never unregister themselves (they always return true, they'd need to return false to unregister). Each of these monitors emits a set of metrics once every druid.monitoring.emissionPeriod.
As a result, after executing several tasks for a while, Indexer emits metrics of these tasks even after they're long gone.

Proposed Solution
Since one should be able to obtain the last round of ingestion metrics after the task unregisters the monitor, introducing lastRoundMetricsToBePushed variable to keep track of the same and overriding the AbstractMonitor.monitor method in RealtimeMetricsMonitor, TaskRealtimeMetricsMonitor to implement the new logic.
2022-07-18 14:34:18 +05:30
Clint Wylie 05b2e967ed
druid nested data column type (#12753)
* add new druid nested data column type

* fixes and such

* fixes

* adjustments, more tests

* self review

* oops

* fix and test

* more better

* style
2022-07-14 12:07:23 -07:00
zachjsh c0380e7b0a
* fix duplicate dimension (#12778) 2022-07-14 10:39:03 +05:30
TSFenwick 8c02880d5f
Emit metrics for distribution of number of rows per segment (#12730)
* initial commit of bucket dimensions for metrics

return counts of segments that have rowcount in a bucket size for a datasource
return average value of rowcount per segment in a datasource
added unit test
naming could use a lot of work
buckets right now are not finalized
added javadocs
altered metrics.md

* fix checkstyle issues

* addressed review comments

add monitor test
move added functionality to new monitor
update docs

* address comments

renamed monitor
handle tombstones better
update docs
added javadocs

* Add support for tombstones in the segment distribution

* undo changes to tombstone segmentizer factory

* fix accidental whitespacing changes

* address comments regarding metrics documentation

and rename variable to be more accurate

* fix tests

* fix checkstyle issues

* fix broken test

* undo removal of timeout
2022-07-12 07:04:42 -07:00
Rohan Garg bb953be09b
Refactor usage of JoinableFactoryWrapper + more test coverage (#12767)
Refactor usage of JoinableFactoryWrapper to add e2e test for createSegmentMapFn with joinToFilter feature enabled
2022-07-12 06:25:36 -07:00
Gian Merlino d2576584a0
Consolidate the two TaskStatus classes. (#12765)
* Consolidate the two TaskStatus classes.

There are two, but we don't need more than one.

* Fix import order.
2022-07-11 07:25:22 -07:00
Gian Merlino 378fea9517
Retain CSP configuration in ServerConfig constructor. (#12755)
Without this change, CliIndexer would not apply custom CSP headers
and would revert to the default.
2022-07-08 19:19:14 +05:30
Gian Merlino 2b330186e2
Mid-level service client and updated high-level clients. (#12696)
* Mid-level service client and updated high-level clients.

Our servers talk to each other over HTTP. We have a low-level HTTP
client (HttpClient) that is super-asynchronous and super-customizable
through its handlers. It's also proven to be quite robust: we use it
for Broker -> Historical communication over the wide variety of query
types and workloads we support.

But the low-level client has no facilities for service location or
retries, which means we have a variety of high-level clients that
implement these in their own ways. Some high-level clients do a better
job than others. This patch adds a mid-level ServiceClient that makes
it easier for high-level clients to be built correctly and harmoniously,
and migrates some of the high-level logic to use ServiceClients.

Main changes:

1) Add ServiceClient org.apache.druid.rpc package. That package also
   contains supporting stuff like ServiceLocator and RetryPolicy
   interfaces, and a DiscoveryServiceLocator based on
   DruidNodeDiscoveryProvider.

2) Add high-level OverlordClient in org.apache.druid.rpc.indexing.

3) Indexing task client creator in TaskServiceClients. It uses
   SpecificTaskServiceLocator to find the tasks. This improves on
   ClientInfoTaskProvider by caching task locations for up to 30 seconds
   across calls, reducing load on the Overlord.

4) Rework ParallelIndexSupervisorTaskClient to use a ServiceClient
   instead of extending IndexTaskClient.

5) Rework RemoteTaskActionClient to use a ServiceClient instead of
   DruidLeaderClient.

6) Rework LocalIntermediaryDataManager, TaskMonitor, and
   ParallelIndexSupervisorTask. As a result, MiddleManager, Peon, and
   Overlord no longer need IndexingServiceClient (which internally used
   DruidLeaderClient).

There are some concrete benefits over the prior logic, namely:

- DruidLeaderClient does retries in its "go" method, but only retries
  exactly 5 times, does not sleep between retries, and does not retry
  retryable HTTP codes like 502, 503, 504. (It only retries IOExceptions.)
  ServiceClient handles retries in a more reasonable way.

- DruidLeaderClient's methods are all synchronous, whereas ServiceClient
  methods are asynchronous. This is used in one place so far: the
  SpecificTaskServiceLocator, so we don't need to block a thread trying
  to locate a task. It can be used in other places in the future.

- HttpIndexingServiceClient does not properly handle all server errors.
  In some cases, it tries to parse a server error as a successful
  response (for example: in getTaskStatus).

- IndexTaskClient currently makes an Overlord call on every task-to-task
  HTTP request, as a way to find where the target task is. ServiceClient,
  through SpecificTaskServiceLocator, caches these target locations
  for a period of time.

* Style adjustments.

* For the coverage.

* Adjustments.

* Better behaviors.

* Fixes.
2022-07-05 09:43:26 -07:00
Kashif Faraz f5b5cb93ea
Fix expiry timeout bug in LocalIntermediateDataManager (#12722)
The expiry timeout is compared against the current time but the condition is reversed.
This means that as soon as a supervisor task finishes, its partitions are cleaned up,
irrespective of the specified `intermediaryPartitionTimeout` period.

After these changes, the `intermediaryPartitionTimeout` will start getting honored.

Changes
* Fix the condition
* Add tests to verify the new correct behaviour
* Reduce the default expiry timeout from P1D to PT5M
   to retain current behaviour in case of default configs.
2022-07-01 16:29:22 +05:30
Gian Merlino 679ccffe0f
Revert "SqlSegmentsMetadataQuery: Fix OVERLAPS for wide target segments. (#12600)" (#12679)
This reverts commit 8fbf92e047.
2022-06-25 09:08:26 +05:30
Tejaswini Bandlamudi 1fc2f6e4b0
Throw BadQueryContextException if context params cannot be parsed (#12680) 2022-06-24 09:21:25 +05:30
Paul Rogers ffcb996468
Cleanup changes pulled out of PR #12368 (#12672)
This commit contains the cleanup needed for the new integration test framework.

Changes:
- Fix log lines, misspellings, docs, etc.
- Allow the use of some of Druid's "JSON config" objects in tests
- Fix minor bug in `BaseNodeRoleWatcher`
2022-06-23 23:19:50 +05:30
AmatyaAvadhanula eccdec9139
Reduce interval creation cost for segment cost computation (#12670)
Changes:
- Reuse created interval in `SegmentId.getInterval()`
- Intern intervals to save on memory footprint
2022-06-21 17:39:43 +05:30
Paul Rogers 893759de91
Remove null and empty fields from native queries (#12634)
* Remove null and empty fields from native queries

* Test fixes

* Attempted IT fix.

* Revisions from review comments

* Build fixes resulting from changes suggested by reviews

* IT fix for changed segment size
2022-06-16 14:07:25 -07:00
AmatyaAvadhanula f970757efc
Optimize overlord GET /tasks memory usage (#12404)
The web-console (indirectly) calls the Overlord’s GET tasks API to fetch the tasks' summary which in turn queries the metadata tasks table. This query tries to fetch several columns, including payload, of all the rows at once. This introduces a significant memory overhead and can cause unresponsiveness or overlord failure when the ingestion tab is opened multiple times (due to several parallel calls to this API)

Another thing to note is that the task table (the payload column in particular) can be very large. Extracting large payloads from such tables can be very slow, leading to slow UI. While we are fixing the memory pressure in the overlord, we can also fix the slowness in UI caused by fetching large payloads from the table. Fetching large payloads also puts pressure on the metadata store as reported in the community (Metadata store query performance degrades as the tasks in druid_tasks table grows · Issue #12318 · apache/druid )

The task summaries returned as a response for the API are several times smaller and can fit comfortably in memory. So, there is an opportunity here to fix the memory usage, slow ingestion, and under-pressure metadata store by removing the need to handle large payloads in every layer we can. Of course, the solution becomes complex as we try to fix more layers. With that in mind, this page captures two approaches. They vary in complexity and also in the degree to which they fix the aforementioned problems.
2022-06-16 22:30:37 +05:30
Lucas Capistrant 602d95d865
Add a builder class for TestDruidCoordinatorConfig (#12624)
* Add a builder class for TestDruidCoordinatorConfig

* updates after review

* Fix formatting
2022-06-16 09:11:31 -05:00
Paul Rogers 45e3111549
Clean up query contexts (#12633)
* Clean up query contexts

Uses constants in place of literal strings for context keys.
Moves some QueryContext methods to QueryContexts for reuse.

* Revisions from review comments
2022-06-15 11:31:22 -07:00
Gian Merlino 1f6e888472
Add QoSFilters first in the chain. (#12625)
* Add QoSFilters first in the chain.

When a request is suspended and later resumed due to QoS constraints,
its filter chain is restarted. Placing QoSFilters first in the chain
avoids double-execution of other filters.

Fixes an issue where requests deferred by QoS would report 403 Forbidden
due to double-execution of SecuritySanityCheckFilter.

* Smaller changes.

* Add QoS filters in BaseJettyTest.

* Remove unused parameter.
2022-06-14 13:37:00 -07:00
Gian Merlino 8fbf92e047
SqlSegmentsMetadataQuery: Fix OVERLAPS for wide target segments. (#12600)
* SqlSegmentsMetadataQuery: Fix OVERLAPS for wide target segments.

Segments with endpoints prior to year 0 or after year 9999 may overlap
the search intervals but not match the generated SQL conditions. So, we
need to add an additional OR condition to catch these.

I checked a real, live MySQL metadata store to confirm that the query
still uses metadata store indexes. It does.

* Add comments.
2022-06-07 11:33:46 -07:00
Abhishek Agarwal 59a0c10c47
Add remedial information in error message when type is unknown (#12612)
Often users are submitting queries, and ingestion specs that work only if the relevant extension is not loaded. However, the error is too technical for the users and doesn't suggest them to check for missing extensions. This PR modifies the error message so users can at least check their settings before assuming that the error is because of a bug.
2022-06-07 20:22:45 +05:30
Gian Merlino a503683a4a
Add caching and CSP response headers. (#12609)
* Add caching and CSP response headers.

* Fix tests.

* Fix checkstyle issues

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2022-06-04 21:46:49 +05:30
Clint Wylie d0c9c37e35
make query context changes backwards compatible (#12564)
Adds a default implementation of getQueryContext, which was added to the Query interface in #12396. Query is marked with @ExtensionPoint, and lately we have been trying to be less volatile on these interfaces by providing default implementations to be more chill for extension writers.

The way this default implementation is done in this PR is a bit strange due to the way that getQueryContext is used (mutated with system default and system generated keys); the default implementation has a specific object that it returns, and I added another temporary default method isLegacyContext that checks if the getQueryContext returns that object or not. If not, callers fall back to using getContext and withOverriddenContext to set these default and system values.

I am open to other ideas as well, but this way should work at least without exploding, and added some tests to ensure that it is wired up correctly for QueryLifecycle, including the context authorization stuff.

The added test shows the strange behavior if query context authorization is enabled, mainly that the system default and system generated query context keys also need to be granted as permissions for things to function correctly. This is not great, so I mentioned it in the javadocs as well. Not sure if it needs to be called out anywhere else.
2022-05-25 15:24:41 +05:30
Agustin Gonzalez 2f3d7a4c07
Emit state of replace and append for native batch tasks (#12488)
* Emit state of replace and append for native batch tasks

* Emit count of one depending on batch ingestion mode (APPEND, OVERWRITE, REPLACE)

* Add metric to compaction job

* Avoid null ptr exc when null emitter

* Coverage

* Emit tombstone & segment counts

* Tasks need a type

* Spelling

* Integrate BatchIngestionMode in batch ingestion tasks functionality

* Typos

* Remove batch ingestion type from metric since it is already in a dimension. Move IngestionMode to AbstractTask to facilitate having mode as a dimension. Add metrics to streaming. Add missing coverage.

* Avoid inner class referenced by sub-class inspection. Refactor computation of IngestionMode to make it more robust to null IOConfig and fix test.

* Spelling

* Avoid polluting the Task interface

* Rename computeCompaction methods to avoid ambiguous java compiler error if they are passed null. Other minor cleanup.
2022-05-23 12:32:47 -07:00
superivaj f9bdb3b236
Fix usage of maxColumnsToMerge in auto-compaction tuning config (#12551)
Issue: 
Even though `CompactionTuningConfig` allows a `maxColumnsToMerge` config
(to optimize memory usage, particulary for datasources with many dimensions),
the corresponding client object `ClientCompactionTaskQueryTuningConfig`
(used by the coordinator duty `CompactSegments` to trigger auto-compaction)
does not contain this field. Thus, the value of `maxColumnsToMerge` specified
in any datasource compaction config is ignored.

Changes:
- Add field `maxColumnsToMerge` in `ClientCompactionTaskQueryTuningConfig`
  and `UserCompactionTaskQueryTuningConfig`
- Fix tests
2022-05-20 22:23:08 +05:30
Gian Merlino 4631cff2a9
Free ByteBuffers in tests and fix some bugs. (#12521)
* Ensure ByteBuffers allocated in tests get freed.

Many tests had problems where a direct ByteBuffer would be allocated
and then not freed. This is bad because it causes flaky tests.

To fix this:

1) Add ByteBufferUtils.allocateDirect(size), which returns a ResourceHolder.
   This makes it easy to free the direct buffer. Currently, it's only used
   in tests, because production code seems OK.

2) Update all usages of ByteBuffer.allocateDirect (off-heap) in tests either
   to ByteBuffer.allocate (on-heap, which are garbaged collected), or to
   ByteBufferUtils.allocateDirect (wherever it seemed like there was a good
   reason for the buffer to be off-heap). Make sure to close all direct
   holders when done.

* Changes based on CI results.

* A different approach.

* Roll back BitmapOperationTest stuff.

* Try additional surefire memory.

* Revert "Roll back BitmapOperationTest stuff."

This reverts commit 49f846d9e3.

* Add TestBufferPool.

* Revert Xmx change in tests.

* Better behaved NestedQueryPushDownTest. Exit tests on OOME.

* Fix TestBufferPool.

* Remove T1C from ARM tests.

* Somewhat safer.

* Fix tests.

* Fix style stuff.

* Additional debugging.

* Reset null / expr configs better.

* ExpressionLambdaAggregatorFactory thread-safety.

* Alter forkNode to try to get better info when a JVM crashes.

* Fix buffer retention in ExpressionLambdaAggregatorFactory.

* Remove unused import.
2022-05-19 07:42:29 -07:00
Tejaswini Bandlamudi c877d8a981
Updates default inputSegmentSizeBytes in Compaction config (#12534)
Fixes Cannot serialize BigInt value as JSON error while loading compaction config in console.
2022-05-19 14:43:34 +05:30
Clint Wylie b23ddc5939
print replication levels in coordinator segment logs (#12511)
* print replication levels in coordinator segment logs

* add served segment count to stats

* also for drops
2022-05-17 02:24:13 -07:00
Lucas Capistrant deb69d1bc0
Allow coordinator to be configured to kill segments in future (#10877)
Allow a Druid cluster to kill segments whose interval_end is a date in the future. This can be done by setting druid.coordinator.kill.durationToRetain to a negative period. For example PT-24H would allow segments to be killed if their interval_end date was 24 hours or less into the future at the time that the kill task is generated by the system.

A cluster operator can also disregard the druid.coordinator.kill.durationToRetain entirely by setting a new configuration, druid.coordinator.kill.ignoreDurationToRetain=true. This ignores interval_end date when looking for segments to kill, and instead is capable of killing any segment marked unused. This new configuration is off by default, and a cluster operator should fully understand and accept the risks if they enable it.
2022-05-11 07:35:15 +05:30
Vadim Ogievetsky fb08bac01a
Web console: Misc table fixes (#12489)
* Misc table fixes

* extract default className

* table spacing updates

* fix e2e action selector

* try more times

* make the web console exist again
2022-05-03 12:08:08 -07:00
MC-JY bb080693a9
Improve build performance of modules (#12486)
* improve build performance of modules

* improve build performance of modules

* Update pom.xml

* improve build performance of modules
2022-05-01 22:43:11 +08:00
Gian Merlino 529b983ad0
GroupBy: Reduce allocations by reusing entry and key holders. (#12474)
* GroupBy: Reduce allocations by reusing entry and key holders.

Two main changes:

1) Reuse Entry objects returned by various implementations of
   Grouper.iterator.

2) Reuse key objects contained within those Entry objects.

This is allowed by the contract, which states that entries must be
processed and immediately discarded. However, not all call sites
respected this, so this patch also updates those call sites.

One particularly sneaky way that the old code retained entries too long
is due to Guava's MergingIterator and CombiningIterator. Internally,
these both advance to the next value prior to returning the current
value. So, this patch addresses that in two ways:

1) For merging, we have our own implementation MergeIterator already,
   although it had the same problem. So, this patch updates our
   implementation to return the current item prior to advancing to the
   next item. It also adds a forbidden-api entry to ensure that this
   safer implementation is used instead of Guava's.

2) For combining, we address the problem in a different way: by copying
   the key when creating the new, combined entry.

* Attempt to fix test.

* Remove unused import.
2022-04-28 23:21:13 -07:00
Gian Merlino a2bad0b3a2
Reduce allocations due to Jackson serialization. (#12468)
* Reduce allocations due to Jackson serialization.

This patch attacks two sources of allocations during Jackson
serialization:

1) ObjectMapper.writeValue and JsonGenerator.writeObject create a new
   DefaultSerializerProvider instance for each call. It has lots of
   fields and creates pressure on the garbage collector. So, this patch
   adds helper functions in JacksonUtils that enable reuse of
   SerializerProvider objects and updates various call sites to make
   use of this.

2) GroupByQueryToolChest copies the ObjectMapper for every query to
   install a special module that supports backwards compatibility with
   map-based rows. This isn't needed if resultAsArray is set and
   all servers are running Druid 0.16.0 or later. This release was a
   while ago. So, this patch disables backwards compatibility by default,
   which eliminates the need to copy the heavyweight ObjectMapper. The
   patch also introduces a configuration option that allows admins to
   explicitly enable backwards compatibility.

* Add test.

* Update additional call sites and add to forbidden APIs.
2022-04-27 14:17:26 -07:00
Abhishek Agarwal 2fe053c5cb
Bump up the versions (#12480) 2022-04-27 14:28:20 +05:30
zachjsh 564d6defd4
Worker level task metrics (#12446)
* * fix metric name inconsistency

* * add task slot metrics for middle managers

* * add new WorkerTaskCountStatsMonitor to report task count metrics
  from worker

* * more stuff

* * remove unused variable

* * more stuff

* * add javadocs

* * fix checkstyle

* * fix hadoop test failure

* * cleanup

* * add more code coverage in tests

* * fix test failure

* * add docs

* * increase code coverage

* * fix spelling

* * fix failing tests

* * remove dead code

* * fix spelling
2022-04-26 11:44:44 -05:00
Rohan Garg 95694b5afa
Convert simple min/max SQL queries on __time to timeBoundary queries (#12472)
* Support array based results in timeBoundary query

* Fix bug with query interval in timeBoundary

* Convert min(__time) and max(__time) SQL queries to timeBoundary

* Add tests for timeBoundary backed SQL queries

* Fix query plans for existing tests

* fixup! Convert min(__time) and max(__time) SQL queries to timeBoundary

* fixup! Add tests for timeBoundary backed SQL queries

* fixup! Fix bug with query interval in timeBoundary
2022-04-25 08:18:58 -07:00
Gian Merlino b7621226d2
QueryScheduler: Log per-query message at DEBUG level. (#12467)
We generally want to avoid having any routine per-query messages at
INFO level, because they pollute logs.
2022-04-22 11:22:34 -07:00
Jihoon Son 73ce5df22d
Add support for authorizing query context params (#12396)
The query context is a way that the user gives a hint to the Druid query engine, so that they enforce a certain behavior or at least let the query engine prefer a certain plan during query planning. Today, there are 3 types of query context params as below.

Default context params. They are set via druid.query.default.context in runtime properties. Any user context params can be default params.
User context params. They are set in the user query request. See https://druid.apache.org/docs/latest/querying/query-context.html for parameters.
System context params. They are set by the Druid query engine during query processing. These params override other context params.
Today, any context params are allowed to users. This can cause 
1) a bad UX if the context param is not matured yet or 
2) even query failure or system fault in the worst case if a sensitive param is abused, ex) maxSubqueryRows.

This PR adds an ability to limit context params per user role. That means, a query will fail if you have a context param set in the query that is not allowed to you. To do that, this PR adds a new built-in resource type, QUERY_CONTEXT. The resource to authorize has a name of the context param (such as maxSubqueryRows) and the type of QUERY_CONTEXT. To allow a certain context param for a user, the user should be granted WRITE permission on the context param resource. Here is an example of the permission.

{
  "resourceAction" : {
    "resource" : {
      "name" : "maxSubqueryRows",
      "type" : "QUERY_CONTEXT"
    },
    "action" : "WRITE"
  },
  "resourceNamePattern" : "maxSubqueryRows"
}
Each role can have multiple permissions for context params. Each permission should be set for different context params.

When a query is issued with a query context X, the query will fail if the user who issued the query does not have WRITE permission on the query context X. In this case,

HTTP endpoints will return 403 response code.
JDBC will throw ForbiddenException.
Note: there is a context param called brokerService that is used only by the router. This param is used to pin your query to run it in a specific broker. Because the authorization is done not in the router, but in the broker, if you have brokerService set in your query without a proper permission, your query will fail in the broker after routing is done. Technically, this is not right because the authorization is checked after the context param takes effect. However, this should not cause any user-facing issue and thus should be OK. The query will still fail if the user doesn’t have permission for brokerService.

The context param authorization can be enabled using druid.auth.authorizeQueryContextParams. This is disabled by default to avoid any hassle when someone upgrades his cluster blindly without reading release notes.
2022-04-21 14:21:16 +05:30
Maytas Monsereenusorn c25a556827
Fix bug in auto compaction preserveExistingMetrics feature (#12438)
* fix bug

* fix test

* fix IT
2022-04-15 15:47:47 -07:00
Agustin Gonzalez 0460d45e92
Make tombstones ingestible by having them return an empty result set. (#12392)
* Make tombstones ingestible by having them return an empty result set.

* Spotbug

* Coverage

* Coverage

* Remove unnecessary exception (checkstyle)

* Fix integration test and add one more to test dropExisting set to false over tombstones

* Force dropExisting to true in auto-compaction when the interval contains only tombstones

* Checkstyle, fix unit test

* Changed flag by mistake, fixing it

* Remove method from interface since this method is specific to only DruidSegmentInputentity

* Fix typo

* Adapt to latest code

* Update comments when only tombstones to compact

* Move empty iterator to a new DruidTombstoneSegmentReader

* Code review feedback

* Checkstyle

* Review feedback

* Coverage
2022-04-15 09:08:06 -07:00
Parag Jain 2c79d28bb7
Copy of #11309 with fixes (#12402)
* Optionally load segment index files into page cache on bootstrap and new segment download

* Fix unit test failure

* Fix test case

* fix spelling

* fix spelling

* fix test and test coverage issues

Co-authored-by: Jian Wang <wjhypo@gmail.com>
2022-04-11 21:05:24 +05:30
Maytas Monsereenusorn 36e17a20ea
Improve metrics for Auto Compaction (#12413)
* add impl

* add docs

* fix
2022-04-08 20:14:36 -07:00
Maytas Monsereenusorn 8edea5a82d
Add a new flag for ingestion to preserve existing metrics (#12185)
* add impl

* add impl

* fix checkstyle

* add impl

* add unit test

* fix stuff

* fix stuff

* fix stuff

* add unit test

* add more unit tests

* add more unit tests

* add IT

* add IT

* add IT

* add IT

* add ITs

* address comments

* fix test

* fix test

* fix test

* address comments

* address comments

* address comments

* fix conflict

* fix checkstyle

* address comments

* fix test

* fix checkstyle

* fix test

* fix test

* fix IT
2022-04-08 11:02:02 -07:00
Paul Rogers 2cc2088720
Method to specify eternity in the scan query builder (#12223)
* Method to specify eternity in the scan query builder

* Fix checkstyle issue

* Renamed eterity() to eternityInterval()

* Minor fixes
2022-04-04 15:11:32 -07:00