Commit Graph

4274 Commits

Author SHA1 Message Date
Jonathan Wei 5d1e66b8f9
Allow broker to use catalog for datasource schemas for SQL queries (#15469)
* Allow broker to use catalog for datasource schemas

* More PR comments

* PR comments
2024-01-08 13:46:08 -06:00
AmatyaAvadhanula 63bfb3e6c9
Handle half-eternity intervals while fetching segments with created dates (#15608)
* Handle all intervals while fetching segments with created dates
2024-01-08 12:07:11 +05:30
George Shiqi Wu 9fe67958be
Increase ServerConnector accept queue size (#15596)
* Allow overwriting ServerConnector accept queue size

* Use a single config

* Fix spacing

* fix spacing

* fixed value

* read value from environment

* fix spacing

* Unpack value before reading

* check somaxconn on linux only
2024-01-05 12:04:15 -05:00
Gian Merlino e40b96e026
Reverse lookup fixes and enhancements. (#15611)
* Reverse lookup fixes and enhancements.

1) Add a "mayIncludeUnknown" parameter to DimFilter#optimize. This is important
   because otherwise the reverse-lookup optimization is done improperly when
   the "in" filter appears under a "not", and the lookup extractionFn may return
   null for some possible values of the filtered column. The "includeUnknown" test
   cases in InDimFilterTest illustrate the difference in behavior.

2) Enhance InDimFilter#optimizeLookup to handle "mayIncludeUnknown", and to be able
   to do a reverse lookup in a wider variety of cases.

3) Make "unapply" protected in LookupExtractor, and move callers to "unapplyAll".
   The main reason is that MapLookupExtractor, a common implementation, lacks a
   reverse mapping and therefore does a scan of the map for each call to "unapply".
   For performance sake these calls need to be batched.

* Remove optimize call from BloomDimFilter.

* Follow the law.

* Fix tests.

* Fix imports.

* Switch function.

* Fix tests.

* More tests.
2024-01-03 13:28:44 -08:00
Abhishek Radhakrishnan f0f428274a
Prometheus config property doc fixup (#15613)
* Minor fixes

* Update docs/development/extensions-contrib/prometheus.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-01-02 16:28:42 -08:00
kaisun2000 a5e9b14be0
Add delay before the peon drops the segments after publishing them (#15373)
Currently in the realtime ingestion (Kafka/Kinesis) case, after publishing the segments, upon acknowledgement from the coordinator that the segments are already placed in some historicals, the peon would unannounce the segments (basically saying the segments are not in this peon anymore to the whole cluster) and drop the segments from cache and sink timeline in one shot.

The in transit queries from the brokers that still thinks the segments are in the peon can get a NullPointer exception when the peon is unsetting the hydrants in the sinks.

The fix would let the peon to wait for a configurable delay period before dropping segments, remove segments from cache etc after the peon unannounce the segments.

This delayed approach is similar to how the historicals handle segments moving out.
2024-01-02 11:08:28 +05:30
Kashif Faraz 9f568858ef
Add logging implementation for AuditManager and audit more endpoints (#15480)
Changes
- Add `log` implementation for `AuditManager` alongwith `SQLAuditManager`
- `LoggingAuditManager` simply logs the audit event. Thus, it returns empty for
all `fetchAuditHistory` calls.
- Add new config `druid.audit.manager.type` which can take values `log`, `sql` (default)
- Add new config `druid.audit.manager.logLevel` which can take values `DEBUG`, `INFO`, `WARN`.
This gets activated only if `type` is `log`.
- Remove usage of `ConfigSerde` from `AuditManager` as audit is not just limited to configs
- Add `AuditSerdeHelper` for a single implementation of serialization/deserialization of
audit payload and other utility methods.
2023-12-19 13:14:04 +05:30
Kashif Faraz feeb4f0fb0
Allocate pending segments at latest committed version (#15459)
The segment allocation algorithm reuses an already allocated pending segment if the new allocation request is made for the same parameters:

datasource
sequence name
same interval
same value of skipSegmentLineageCheck (false for batch append, true for streaming append)
same previous segment id (used only when skipSegmentLineageCheck = false)
The above parameters can thus uniquely identify a pending segment (enforced by the UNIQUE constraint on the sequence_name_prev_id_sha1 column in druid_pendingSegments metadata table).

This reuse is done in order to

allow replica tasks (in case of streaming ingestion) to use the same set of segment IDs.
allow re-run of a failed batch task to use the same segment ID and prevent unnecessary allocations
2023-12-14 16:18:39 +05:30
AmatyaAvadhanula 91ca8e73d6
Skip compaction for datasources with partial-eternity segments (#15542)
This PR builds on #13304 to skip compaction for datasources with segments that have their interval start or end coinciding with Eternity interval end-points.

This is needed in order to prevent an issue similar to #13208 as the Coordinator tries to iterate over a large number of intervals when trying to compact an interval with infinite start or end.
2023-12-12 15:06:45 +05:30
zachjsh ab7d9bc6ec
Add api for Retrieving unused segments (#15415)
### Description

This pr adds an api for retrieving unused segments for a particular datasource. The api supports pagination by the addition of `limit` and `lastSegmentId` parameters. The resulting unused segments are returned with optional `sortOrder`, `ASC` or `DESC` with respect to the matching segments `id`, `start time`, and `end time`, or not returned in any guarenteed order if `sortOrder` is not specified

`GET /druid/coordinator/v1/datasources/{dataSourceName}/unusedSegments?interval={interval}&limit={limit}&lastSegmentId={lastSegmentId}&sortOrder={sortOrder}`

Returns a list of unused segments for a datasource in the cluster contained within an optionally specified interval.
Optional parameters for limit and lastSegmentId can be given as well, to limit results and enable paginated results.
The results may be sorted in either ASC, or DESC order depending on specifying the sortOrder parameter.

`dataSourceName`: The name of the datasource
`interval`:                 the specific interval to search for unused segments for.
`limit`:                      the maximum number of unused segments to return information about. This property helps to
                                 support pagination
`lastSegmentId`:     the last segment id from which to search for results. All segments returned are > this segment 
                                 lexigraphically if sortOrder is null or ASC, or < this segment lexigraphically if sortOrder is DESC.
`sortOrder`:            Specifies the order with which to return the matching segments by start time, end time. A null
                                value indicates that order does not matter.

This PR has:

- [x] been self-reviewed.
   - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [x] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [x] been tested in a test Druid cluster.
2023-12-11 16:32:18 -05:00
Abhishek Radhakrishnan 96be82a3e6
Clean up duty for non-overlapping eternity tombstones (#15281)
* Add initial draft of MarkDanglingTombstonesAsUnused duty.

* Use overshadowed segments instead of all used segments.

* Add unit test for MarkDanglingSegmentsAsUnused duty.

* Add mock call

* Simplify code.

* Docs

* shorter lines formatting

* metric doc

* More tests, refactor and fix up some logic.

* update javadocs; other review comments.

* Make numCorePartitions as 0 in the TombstoneShardSpec.

* fix up test

* Add tombstone core partition tests

* Update docs/design/coordinator.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* review comment

* Minor cleanup

* Only consider tombstones with 0 core partitions

* Need to register the test shard type to make jackson happy

* test comments

* checkstyle

* fixup misc typos in comments

* Update logic to use overshadowed segments

* minor cleanup

* Rename duty to eternity tombstone instead of dangling. Add test for full eternity tombstone.

* Address review feedback.

---------

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-12-11 08:57:15 -08:00
Clint Wylie 557f3f6f57
add array column type support to EXTEND operator (#15458) 2023-12-06 23:21:35 -08:00
Rishabh Singh 6a64f72c67
Lookup on incomplete partition set in SegmentMetadataQuerySegmentWalker (#15496)
Description
With CentralizedDatasourceSchema (#14989) feature enabled, metadata for appended segments was not being refreshed. This caused numRows to be 0 for the new segments and would probably cause the datasource schema to not include columns from the new segments.

Analysis
The problem turned out in the new QuerySegmentWalker implementation in the Coordinator. It first finds the segment to be queried in the Coordinator timeline. Then it creates a new timeline of the segments present in the timeline.
The problem was that it is looking up complete partition set in the new timeline. Since the appended segments by themselves do not make a complete partition set, no SegmentMetadataQuery were executed.
2023-12-06 15:25:28 +05:30
Abhishek Radhakrishnan f4949afdd7
clarify and fixup typos related to unused segments in docs and javadocs. (#15498) 2023-12-05 22:30:32 -08:00
Vishesh Garg 326b7b731d
Upgrade zookeeper from 3.5.10 to 3.8.3 (#15477)
Upgrade zookeeper from 3.5.10 to 3.8.3
2023-12-05 18:57:56 +05:30
Rishabh Singh d968bb3f43
Rename config for enabling CentralizedDatasourceSchema feature (#15476)
* Rename property to druid.centralizedDatasourceSchema.enabled
* Update config name in docker-compose
2023-12-05 16:57:25 +05:30
Jan Werner 8cc256b079
update guava to 32.0.1-jre to address CVEs (#15482)
Update guava to 32.0.1-jre to address two CVEs: CVE-2020-8908, CVE-2023-2976
This change requires a minor test change to remove assumptions about ordering.

---------

Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
2023-12-04 13:18:42 -08:00
Pranav 9f3b26676d
Log full stack when exception message is null (#15467) 2023-11-30 16:47:37 -08:00
Kashif Faraz 8ddb847658
Fix message when skipping compaction (#15460) 2023-11-30 14:57:13 +05:30
Vadim Ogievetsky 31fa63e789
Web console: better management proxy detection (#15453)
* better management proxy detection

* fix checkstyle issue

* add test

* test should read the body also

* use ObjectMapper

* assert read ammount
2023-11-29 21:43:42 -08:00
Kashif Faraz 58a724c7e4
Use StubServiceEmitter in tests (#15426)
* Use StubServiceEmitter in tests
* Remove unthrown exception from declaration
2023-11-28 09:43:09 +05:30
Karan Kumar a0188192de
Fixing failing compaction/parallel index jobs during upgrade due to new actions being available on the overlord. (#15430)
* Fixing failing compaction/parallel index jobs during upgrade due to new actions not available on the overlord.

* Fixing build

* Removing extra space.

* Fixing json getter.

* Review comments.
2023-11-25 13:50:29 +05:30
Gian Merlino 5ccd79d62b
Fix NPE caused by realtime segment closing race, fix possible missing-segment retry bug. (#15260)
* Fix NPE caused by realtime segment closing race, fix possible missing-segment retry bug.

Fixes #12168, by returning empty from FireHydrant when the segment is
swapped to null. This causes the SinkQuerySegmentWalker to use
ReportTimelineMissingSegmentQueryRunner, which causes the Broker to look
for the segment somewhere else.

In addition, this patch changes SinkQuerySegmentWalker to acquire references
to all hydrants (subsegments of a sink) at once, and return a
ReportTimelineMissingSegmentQueryRunner if *any* of them could not be acquired.
I suspect, although have not confirmed, that the prior behavior could lead to
segments being reported as missing even though results from some hydrants were
still included.

* Some more test coverage.
2023-11-20 10:48:14 -08:00
Abhishek Radhakrishnan 470c8ed7b0
Make `numCorePartitions` as 0 for tombstones (#15379)
* Make numCorePartitions as 0 in the TombstoneShardSpec.

* fix up test

* Add tombstone core partition tests

* review comment

* Need to register the test shard type to make jackson happy
2023-11-20 09:42:51 -08:00
Rishabh Singh 03a092f4ec
Document Nuances in SegmentMetadataCache Behaviour (#15367)
* Document segment metadata cache behaviour

* Fix typo

* Minor update

* Minor change
2023-11-15 18:41:20 +05:30
Rishabh Singh 5446494e63
Non-existent datasource shouldn't affect schema rebuilding for other datasources (#15355)
In pull request #14985, a bug was introduced where periodic refresh would skip rebuilding a datasource's schema after encountering a non-existent datasource. This resulted in remaining datasources having stale schema information.

This change addresses the bug and adds a unit test to validate the refresh mechanism's behaviour when a datasource is removed, and other datasources have schema changes.
2023-11-14 12:52:33 +05:30
AmatyaAvadhanula 895e53555c
Optimize mark segments as unused (#15352) 2023-11-09 15:13:45 +05:30
Pranav e2fde8c516
Refactor lookups behavior while loading/dropping the containers (#14806) 2023-11-07 10:07:28 -08:00
Abhishek Radhakrishnan 2136dc3591
Batch segment retrieval from the metadata store (#15305)
* Add a unit test that fails when used segments with too many intervals are retrieved.

- This is a failing test case that needs to be ignored.

* Batch the intervals (use 100 as it's consistent with batching in other places).

* move the filtering inside the batch

* Account for limit cross the batch splits.

* Adjustments

* Fixup and add tests

* small refactor

* add more tests.

* remove wrapper.

* Minor edits

* assert out of range
2023-11-06 11:30:24 -08:00
Rishabh Singh 8c802e4c9b
Relocating Table Schema Building: Shifting from Brokers to Coordinator for Improved Efficiency (#14985)
In the current design, brokers query both data nodes and tasks to fetch the schema of the segments they serve. The table schema is then constructed by combining the schemas of all segments within a datasource. However, this approach leads to a high number of segment metadata queries during broker startup, resulting in slow startup times and various issues outlined in the design proposal.

To address these challenges, we propose centralizing the table schema management process within the coordinator. This change is the first step in that direction. In the new arrangement, the coordinator will take on the responsibility of querying both data nodes and tasks to fetch segment schema and subsequently building the table schema. Brokers will now simply query the Coordinator to fetch table schema. Importantly, brokers will still retain the capability to build table schemas if the need arises, ensuring both flexibility and resilience.
2023-11-04 19:33:25 +05:30
Gian Merlino 98f1eb8ede
Use filters for pruning properly for hash-joins. (#15299)
* Use filters for pruning properly for hash-joins.

Native used them too aggressively: it might use filters for the RHS
to prune the LHS. MSQ used them not at all. Now, both use them properly,
pruning based on base (LHS) columns only.

* Fix tests.

* Fix style.

* Clear filterFields too.

* Update.
2023-11-03 07:29:16 -07:00
Gian Merlino d87d92bc43
Add system fields to input sources. (#15276)
* Add system fields to input sources.

Main changes:

1) The SystemField enum defines system fields "__file_uri", "__file_path",
   and "__file_bucket". They are associated with each input entity.

2) The SystemFieldInputSource interface can be added to any InputSource
   to make it system-field-capable. It sets up serialization of a list
   of configured "systemFields" in the JSON form of the input source, and
   provides a method getSystemFieldValue for computing the value of each
   system field. Cloud object, HDFS, HTTP, and Local now have this.

* Fix various LocalInputSource calls.

* Fix style stuff.

* Fixups.

* Fix tests and coverage.
2023-11-02 10:31:28 -07:00
Laksh Singla b82ad59dfe
Better logging in ServiceClientImpl (#15269)
ServiceClientImpl logs the cause of every retry, even though we are retrying the connection attempt. This leads to slight pollution in the logs because a lot of the time, the reason for retrying is the same. This is seen primarily in MSQ, when the worker task hasn't launched yet however controller attempts to connect to the worker task, which can lead to scary-looking messages (with INFO log level), even though they are normal.
This PR changes the logging logic to log every 10 (arbitrary number) retries instead of every retry, to reduce the pollution of the logs.
Note: If there are no retries left, the client returns an exception, which would get thrown up by the caller, and therefore this change doesn't hide any important information.
2023-11-02 11:32:49 +05:30
Gian Merlino 6b6d73b5d4
Use min of scheduler threads and server threads for subquery guardrails. (#15295)
* Use min of scheduler threads and server threads for subquery guardrails.

This allows more memory to be used for subqueries when the query scheduler
is configured to limit queries below the number of server threads. The patch
also refactors the code so SubqueryGuardrailHelper is provided by a Guice
Provider rather than being created by ClientQuerySegmentWalker, to achieve
better separation of concerns.

* Exclude provider from coverage.
2023-11-01 22:34:53 -07:00
Gian Merlino 37e158c2c4
Frames: consider writing singly-valued column when input column hasMultipleValues is UNKNOWN. (#15300)
* Frames: consider writing singly-valued column when input column hasMultipleValues is UNKNOWN.

Prior to this patch, columnar frames would always write multi-valued columns if
the input column had hasMultipleValues = UNKNOWN. This had the effect of flipping
UNKNOWN to TRUE when copying data into frames, which is problematic because TRUE
causes expressions to assume that string inputs must be treated as arrays.

We now avoid this by flipping UNKNOWN to FALSE if no multi-valuedness
is encountered, and flipping it to TRUE if multi-valuedness is encountered.

* Add regression test case.
2023-11-01 22:05:53 -07:00
Suneet Saldanha e6b7c36e74
LoadRules with 0 replicas should be treated as handoff complete (#15274)
* LoadRules with 0 replicas should be treated as handoff complete

* fix it

* pr feedback

* fixit
2023-10-30 10:42:58 -07:00
George Shiqi Wu 3173093415
Handle status failures for streaming supervisors (#15174)
* Cleanup logic

* newline

* remove whitespace

* Fix log message

* Add test class

* PR changes
2023-10-30 10:21:23 -07:00
kaisun2000 60c2ad597a
Enhance json parser error logging to better track Istio Proxy error message (#15176)
Currently the inter Druid communication via rest endpoints is based on json formatted payload. Upon parsing error, there is only a generic exception stating expected json token type and current json token type. There is no detailed error log about the content of the payload causing the violation.

In the micro-service world, the trend is to deploy the Druid servers in k8 with the mesh network. Often the istio proxy or other proxies is used to intercept the network connection between Druid servers. The proxy may give error messages for various reasons. These error messages are not expected by the json parser. The generic error message from Druid can be very misleading as the user may think the message is based on the response from the other Druid server.

For example, this is an example of mysterious error message

QueryInterruptedException{msg=Next token wasn't a START_ARRAY, was[VALUE_STRING] from url[http://xxxxx:8088/druid/v2/], code=Unknown exception, class=org.apache.druid.java.util.common.IAE, host=xxxxx:8088}"

While the context of the message is the following from the proxy when it can't tunnel the network connection.

pstream connect error or disconnect/reset before header

So this very simple PR is just to enhance the logging and get the real underlying message printed out. This would save a lot of head scratching time if Druid is deployed with mesh network.

Co-authored-by: Kai Sun <kai.sun@salesforce.com>
2023-10-27 14:20:19 +05:30
AmatyaAvadhanula 65b69cded4
Filter pending segments upgraded with transactional replace (#15169)
* Filter pending segments upgraded with transactional replace

* Push sequence name filter to metadata query
2023-10-23 21:18:47 +05:30
AmatyaAvadhanula a8febd457c
A Replacing task must read segments created before it acquired its lock (#15085)
* Replacing tasks must read segments created before they acquired their locks
2023-10-19 11:13:07 +05:30
AmatyaAvadhanula d25caaefa4
Add support for streaming ingestion with concurrent replace (#15039)
Add support for streaming ingestion with concurrent replace

---------

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2023-10-13 09:09:03 +05:30
Laksh Singla 5f86072456
Prepare master for Druid 29 (#15121)
Prepare master for Druid 29
2023-10-11 10:33:45 +05:30
Laksh Singla 95bf331c08
Rename the default setting of 'maxSubqueryBytes' from 'unlimited' to 'disabled' (#15108)
The default setting of 'maxSubqueryBytes' is renamed from 'unlimited' to 'disabled'.
2023-10-10 02:03:29 +05:30
Adarsh Sanjeev 7a35ce886d
Add ability for MSQ tasks to query realtime tasks (#15024)
This PR aims to add the capabilities to:
1. Fetch the realtime segment metadata from the coordinator server view,
2. Adds the ability for workers to query indexers, similar to how brokers do the same for native queries.
2023-10-09 15:14:03 +05:30
kaisun2000 e2cc1c4ad1
Add metric -- count of queries waiting for merge buffers (#15025)
Add 'mergeBuffer/pendingRequests' metric that exposes the count of waiting queries (threads) blocking in the merge buffers pools.
2023-10-09 12:56:23 +05:30
Pranav f1edd671fb
Exposing optional replaceMissingValueWith in lookup function and macros (#14956)
* Exposing optional replaceMissingValueWith in lookup function and macros

* args range validation

* Updating docs

* Addressing comments

* Update docs/querying/sql-scalar.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Update docs/querying/sql-functions.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Addressing comments

---------

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2023-10-02 17:09:23 -07:00
Parth Agrawal d038237ece
memcached cache: switch to AWS elasticache-java-cluster-client and add TLS support (#14827)
This PR updates the library used for Memcached client to AWS Elasticache Client : https://github.com/awslabs/aws-elasticache-cluster-client-memcached-for-java

This enables us to use the option of encrypting data in transit:
Amazon ElastiCache for Memcached now supports encryption of data in transit

For clusters running the Memcached engine, ElastiCache supports Auto Discovery—the ability for client programs to automatically identify all of the nodes in a cache cluster, and to initiate and maintain connections to all of these nodes.
Benefits of Auto Discovery - Amazon ElastiCache

AWS has forked spymemcached 2.12.1, and has since added all the patches included in 2.12.2 and 2.12.3 as part of the 1.2.0 release. So, this can now be considered as an equivalent drop-in replacement.

GitHub - awslabs/aws-elasticache-cluster-client-memcached-for-java: Amazon ElastiCache Cluster Client for Java - enhanced library to connect to ElastiCache clusters.
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/elasticache/AmazonElastiCacheClient.html#AmazonElastiCacheClient--

How to enable TLS with Elasticache

On server side:
https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/in-transit-encryption-mc.html#in-transit-encryption-enable-existing-mc

On client side:
GitHub - awslabs/aws-elasticache-cluster-client-memcached-for-java: Amazon ElastiCache Cluster Client for Java - enhanced library to connect to ElastiCache clusters.
2023-10-02 12:51:05 -07:00
Gian Merlino 3dabfead05
Fix getResultType for HLL, quantiles aggregators. (#15043)
The aggregators had incorrect types for getResultType when shouldFinalze
is false. They had the finalized type, but they should have had the
intermediate type.

Also includes a refactor of how ExprMacroTable is handled in tests, to make
it easier to add tests for this to the MSQ module. The bug was originally
noticed because the incorrect result types caused MSQ queries with DS_HLL
to behave erratically.
2023-09-27 08:51:14 +05:30
AmatyaAvadhanula c62193c4d7
Add support for concurrent batch Append and Replace (#14407)
Changes:
- Add task context parameter `taskLockType`. This determines the type of lock used by a batch task.
- Add new task actions for transactional replace and append of segments
- Add methods StorageCoordinator.commitAppendSegments and commitReplaceSegments
- Upgrade segments to appropriate versions when performing replace and append
- Add new metadata table `upgradeSegments` to track segments that need to be upgraded
- Add tests
2023-09-25 07:06:37 +05:30
Kashif Faraz d7c152c82c
Add a TaskReport for "kill" tasks (#15023)
- Add `KillTaskReport` that contains stats for `numSegmentsKilled`,
`numBatchesProcessed`, `numSegmentsMarkedAsUnused`
- Fix bug where exception message had no formatter but was was still being passed some args.
- Add some comments regarding deprecation of `markAsUnused` flag.
2023-09-23 07:44:27 +05:30
Kashif Faraz 409bffe7f2
Rename IMSC.announceHistoricalSegments to commitSegments (#15021)
This commit pulls out some changes from #14407 to simplify that PR.

Changes:
- Rename `IndexerMetadataStorageCoordinator.announceHistoricalSegments` to `commitSegments`
- Rename the overloaded method to `commitSegmentsAndMetadata`
- Fix some typos
2023-09-21 16:19:03 +05:30
Laksh Singla 82e809c8d0
fix (#15017) 2023-09-20 15:48:26 -07:00
Zoltan Haindrich 79f882f48c
Fix exception cause logging in QueryResultPusher (#14975) 2023-09-20 15:44:02 +05:30
Laksh Singla 4c57504960
Fix the uncaught exceptions when materializing results as frames (#14970)
When materializing the results as frames, we defer the creation of the frames in ScanQueryQueryToolChest, which passes through the catch-all block reserved for catching cases when we don't have the complete row signature in the query (and falls back to the old code).
This PR aims to resolve it by adding the frame generation code to the try-catch block we have at the outer level.
2023-09-13 15:41:28 +05:30
Kashif Faraz 286eecad7c
Simplify DruidCoordinatorConfig and binding of metadata cleanup duties (#14891)
Changes:
- Move following configs from `CliCoordinator` to `DruidCoordinatorConfig`:
  - `druid.coordinator.kill.on`
  - `druid.coordinator.kill.pendingSegments.on`
  - `druid.coordinator.kill.supervisors.on`
  - `druid.coordinator.kill.rules.on`
  - `druid.coordinator.kill.audit.on`
  - `druid.coordinator.kill.datasource.on`
  - `druid.coordinator.kill.compaction.on`
- In the Coordinator style used by historical management duties, always instantiate all
 the metadata cleanup duties but execute only if enabled. In the existing code, they are
instantiated only when enabled by using optional binding with Guice.
- Add a wrapper `MetadataManager` which contains handles to all the different
metadata managers for rules, supervisors, segments, etc.
- Add a `CoordinatorConfigManager` to simplify read and update of coordinator configs
- Remove persistence related methods from `CoordinatorCompactionConfig` and
`CoordinatorDynamicConfig` as these are config classes.
- Remove annotations `@CoordinatorIndexingServiceDuty`,
`@CoordinatorMetadataStoreManagementDuty`
2023-09-13 09:06:57 +05:30
Clint Wylie 891f0a3fe9
longer compatibility window for nested column format v4 (#14955)
changes:
* add back nested column v4 serializers
* 'json' schema by default still uses the newer 'nested common format' used by 'auto', but now has an optional 'formatVersion' property which can be specified to override format versions on native ingest jobs
* add system config to specify default column format stuff, 'druid.indexing.formats', and property 'druid.indexing.formats.nestedColumnFormatVersion' to specify system level preferred nested column format for friendly rolling upgrades from versions which do not support the newer 'nested common format' used by 'auto'
2023-09-12 14:07:53 -07:00
Kashif Faraz 7871e633c6
Fix bug in KillStalePendingSegments (#14961) 2023-09-11 15:18:15 +05:30
Kashif Faraz 647686aee2
Add test and metrics for KillStalePendingSegments duty (#14951)
Changes:
- Add new metric `kill/pendingSegments/count` with dimension `dataSource`
- Add tests for `KillStalePendingSegments`
- Reduce no-op logs that spit out for each datasource even when no pending
segments have been deleted. This can get particularly noisy at low values of `indexingPeriod`.
- Refactor the code in `KillStalePendingSegments` for readability and add javadocs
2023-09-08 10:33:47 +05:30
Kashif Faraz 88f3c9baed
Fix bug in computed value of balancerComputeThreads (#14947)
In smartSegmentLoading mode, use computed value of balancerComputeThreads
rather than configured value.
2023-09-07 01:14:05 +05:30
Laksh Singla 6ee0b06e38
Auto configuration for maxSubqueryBytes (#14808)
A new monitor SubqueryCountStatsMonitor which emits the metrics corresponding to the subqueries and their execution is now introduced. Moreover, the user can now also use the auto mode to automatically set the number of bytes available per query for the inlining of its subquery's results.
2023-09-06 05:47:19 +00:00
Adarsh Sanjeev 959148ad37
Add code to wait for segments generated to be loaded on historicals (#14322)
Currently, after an MSQ query, the web console is responsible for waiting for the segments to load. It does so by checking if there are any segments loading into the datasource ingested into, which can cause some issues, like in cases where the segments would never be loaded, or would end up waiting for other ingests as well.

This PR shifts this responsibility to the controller, which would have the list of segments created.
2023-09-06 10:35:57 +05:30
Kashif Faraz ec630e3671
Remove deprecated coordinator dynamic configs (#14923)
Changes:

[A] Remove config `decommissioningMaxPercentOfMaxSegmentsToMove`
- It is a complicated config 😅 , 
- It is always desirable to prioritize move from decommissioning servers so that
they can be terminated quickly, so this should always be 100%
- It is already handled by `smartSegmentLoading` (enabled by default)

[B] Remove config `maxNonPrimaryReplicantsToLoad`
This was added in #11135 to address two requirements:
- Prevent coordinator runs from getting stuck assigning too many segments to historicals
- Prevent load of replicas from competing with load of unavailable segments

Both of these requirements are now already met thanks to:
- Round-robin segment assignment
- Prioritization in the new coordinator
- Modifications to `replicationThrottleLimit`
- `smartSegmentLoading` (enabled by default)
2023-09-04 11:54:36 +05:30
Kashif Faraz 7f26b80e21
Simplify ServiceMetricEvent.Builder (#14933)
Changes:
- Make ServiceMetricEvent.Builder extend ServiceEventBuilder<ServiceMetricEvent>
and thus convert it to a plain builder rather than a builder of builder.
- Add methods setCreatedTime , setMetricAndValue to the builder
2023-09-01 11:30:45 +05:30
Clint Wylie dea9d4f1a7
cleaning DruidProcessingConfig bindings (#14927) 2023-08-30 22:35:08 -07:00
Kashif Faraz 8263f0d1e9
Reduce coordinator logs when operating normally (#14926)
Changes:
- Reduce log level of some coordinator stats, which only denote normal coordinator operation.
These stats are still emitted and can be logged by setting debugDimensions in the coordinator
dynamic config.
- Initialize SegmentLoadingConfig only for historical management duties. This config is not
needed in other duties and initializing it creates logs which are misleading.
2023-08-30 11:30:38 +05:30
Kashif Faraz d6565f46b0
Increase the computed value of replicationThrottleLimit (#14913)
Changes
- Increase value of `replicationThrottleLimit` computed by `smartSegmentLoading` from
2% to 5% of total number of used segments.
- Assign replicas to a tier even when some replicas are already being loaded in that tier
- Limit the total number of replicas in load queue at start of run + replica assignments in
the run to the `replicationThrottleLimit`.

i.e. for every tier,
    num loading replicas at start of run + num replicas assigned in run <= replicationThrottleLimit
2023-08-28 18:20:22 +05:30
Karan Kumar 9fcbf05c5d
Adjusting `SqlStatementResource` and `SqlTaskResource` to set request attribute via a new method. (#14878) 2023-08-26 10:59:47 +00:00
Kashif Faraz e51181957c
Use num cores to determine balancerComputeThreads (#14902)
Changes:
- Determine the default value of balancerComputeThreads based on number of
coordinator cpus rather than number of segments. Even if the number of segments
is low and we create more balancer threads, it doesn't hurt the system as threads
would mostly be idle.
- Remove unused field from SegmentLoadQueueManager

Expected values:
- Clusters with ~1M segments typically work with Coordinators having 16 cores or more.
This would give us 8 balancer threads, which is the same as the current maximum.
- On small clusters, even a single thread is enough to do the required balancing work.
2023-08-25 08:15:27 +05:30
Clint Wylie 36e659a501
remove group-by v1 (#14866)
* remove group-by v1

* docs

* remove unused configs, fix test

* fix test

* adjustments

* why not

* adjust

* review stuff
2023-08-23 12:44:06 -07:00
zachjsh 0c76df1c7d
Enable Continuous auto kill (#14831)
### Description

This change enables the `KillUnusedSegments` coordinator duty to be scheduled continuously. Things that prevented this, or made this difficult before were the following:

1. If scheduled at fast enough rate, the duty would find the same intervals to kill for the same datasources, while kill tasks submitted for those same datasources and intervals were already underway, thus wasting task slots on duplicated work.

2. The task resources used by auto kill were previously unbounded.  Each duty run period, if unused
 segments were found for any datasource, a kill task would be submitted to kill them.

This pr solves for both of these issues:

1. The duty keeps track of the end time of the last interval found when killing unused segments for each datasource, in a in memory map. The end time for each datasource, if found, is used as the start time lower bound, when searching for unused intervals for that same datasource. Each duty run, we remove any datasource keys from this map that are no longer found to match datasources in the system, or in whitelist, and also remove a datasource entry, if there is found to be no unused segments for the datasource, which happens when we fail to find an interval which includes unused segments. Removing the datasource entry from the map,  allows for searching for unusedSegments in the datasource from the beginning of time once again

2. The unbounded task resource usage can be mitigated with coordinator dynamic config added as part of ba957a9b97


Operators can configure continous auto kill by providing coordinator runtime properties similar to the following:

```
druid.coordinator.period.indexingPeriod=PT60S
druid.coordinator.kill.period=PT60S
```

And providing sensible limits to the killTask usage via coordinator dynamic properties.
2023-08-23 09:23:08 -04:00
Adarsh Sanjeev dfb5a98888
Add coordinator API for unused segments (#14846)
There is a current issue due to inconsistent metadata between worker and controller in MSQ. A controller can receive one set of segments, which are then marked as unused by, say, a compaction job. The worker would be unable to get the segment information as MetadataResource.
2023-08-23 14:51:25 +05:30
Kashif Faraz 9376d8d6e1
Refactor: Move `UpdateCoordinatorStateAndPrepareCluster` duty out of `DruidCoordinator` (#14845)
Motivation:
- Clean up `DruidCoordinator` and move methods to classes where they are most relevant

Changes:
- No functional change
- Add duty `PrepareBalancerAndLoadQueues` to replace `UpdateCoordinatorState`
- Move map of `LoadQueuePeon` from `DruidCoordinator` to `LoadQueueTaskMaster`
- Make `BalancerStrategyFactory` an abstract class and keep the balancer executor here
- Move reporting of used segment stats and historical capacity stats from
`CollectSegmentAndServerStats` to `PrepareBalancerAndLoadQueues`
- Move reporting of unavailable and under-replicated segment stats from
`CollectSegmentAndServerStats` to `UpdateReplicationStatus` duty
2023-08-22 19:50:41 +05:30
Tejaswini Bandlamudi d87056e708
Upgrade guava version to 31.1-jre (#14767)
Currently, Druid is using Guava 16.0.1 version. This upgrade to 31.1-jre fixes the following issues.

CVE-2018-10237 (Unbounded memory allocation in Google Guava 11.0 through 24.x before 24.1.1 allows remote attackers to conduct denial of service attacks against servers that depend on this library and deserialize attacker-provided data because the AtomicDoubleArray class (when serialized with Java serialization) and the CompoundOrdering class (when serialized with GWT serialization) perform eager allocation without appropriate checks on what a client has sent and whether the data size is reasonable). We don't use Java or GWT serializations. Despite being false positive they're causing red security scans on Druid distribution.
Latest version of google-client-api is incompatible with the existing Guava version. This PR unblocks Update google client apis to latest version #14414
2023-08-22 12:09:53 +05:30
Kashif Faraz 92906059d2
Remove segmentsToBeDropped from SegmentTransactionInsertAction (#14883)
Motivation:
- There is no usage of the `SegmentTransactionInsertAction` which passes a
non-null non-empty value of `segmentsToBeDropped`.
- This is not really needed either as overshadowed segments are marked as unused
by the Coordinator and need not be done in the same transaction as committing segments.
- It will also help simplify the changes being made in #14407 

Changes:
- Remove `segmentsToBeDropped` from the task action and all intermediate methods
- Remove related tests which are not needed anymore
2023-08-21 20:08:56 +05:30
Kashif Faraz c211dcc4b3
Clean up compaction logs on coordinator (#14875)
Changes:
- Move logic of `NewestSegmentFirstIterator.needsCompaction` to `CompactionStatus`
to improve testability and readability
- Capture the list of checks performed to determine if compaction is needed in a readable
manner in `CompactionStatus.CHECKS`
- Make `CompactionSegmentIterator` iterate over instances of `SegmentsToCompact`
instead of `List<DataSegment>`. This allows use of the `umbrellaInterval` later.
- Replace usages of `QueueEntry` with `SegmentsToCompact`
- Move `SegmentsToCompact` out of `NewestSegmentFirstIterator`
- Simplify `CompactionStatistics`
- Reduce level of less important logs to debug
- No change made to tests to ensure correctness
2023-08-21 17:30:41 +05:30
Kashif Faraz 07a193a142
Use separate executor for each coordinator duty group (#14869)
Changes:
- Use separate executor for every duty group
- This change is thread-safe as every duty group uses its own copy of
`DruidCoordinatorRuntimeParams` and does not share any other mutable instances
with other duty groups.
- With the exception of `HistoricalManagementDuties`, duty groups are typically not
very compute intensive and mostly perform database or HTTP I/O. So, coordinator
resources would still mostly be available for `HistoricalManagementDuties`.
2023-08-21 15:53:22 +05:30
Abhishek Agarwal 9065ef1aff
Fix a bug in QosFilter (#14859)
QoSFilter class is trying to parse the timeout as an integer. We need to round a value of query timeout that is higher than INT.MAX to INT.MAX.
2023-08-21 13:00:41 +05:30
Kashif Faraz 097b645005
Clean up after add kill bufferPeriod (#14868)
Follow up changes to #12599 

Changes:
- Rename column `used_flag_last_updated` to `used_status_last_updated`
- Remove new CLI tool `UpdateTables`.
  - We already have a `CreateTables` with similar functionality, which should be able to
  handle update cases too.
  - Any user running the cluster for the first time should either just have `connector.createTables`
  enabled or run `CreateTables` which should create tables at the latest version.
  - For instance, the `UpdateTables` tool would be inadequate when a new metadata table has
  been added to Druid, and users would have to run `CreateTables` anyway.
- Remove `upgrade-prep.md` and include that info in `metadata-init.md`.
- Fix log messages to adhere to Druid style
- Use lambdas
2023-08-19 00:00:04 +05:30
Lucas Capistrant 9c124f2cde
Add a configurable bufferPeriod between when a segment is marked unused and deleted by KillUnusedSegments duty (#12599)
* Add new configurable buffer period to create gap between mark unused and kill of segment

* Changes after testing

* fixes and improvements

* changes after initial self review

* self review changes

* update sql statement that was lacking last_used

* shore up some code in SqlMetadataConnector after self review

* fix derby compatibility and improve testing/docs

* fix checkstyle violations

* Fixes post merge with master

* add some unit tests to improve coverage

* ignore test coverage on new UpdateTools cli tool

* another attempt to ignore UpdateTables in coverage check

* change column name to used_flag_last_updated

* fix a method signature after column name switch

* update docs spelling

* Update spelling dictionary

* Fixing up docs/spelling and integrating altering tasks table with my alteration code

* Update NULL values for used_flag_last_updated in the background

* Remove logic to allow segs with null used_flag_last_updated to be killed regardless of bufferPeriod

* remove unneeded things now that the new column is automatically updated

* Test new background row updater method

* fix broken tests

* fix create table statement

* cleanup DDL formatting

* Revert adding columns to entry table by default

* fix compilation issues after merge with master

* discovered and fixed metastore inserts that were breaking integration tests

* fixup forgotten insert by using pattern of sharing now timestamp across columns

* fix issue introduced by merge

* fixup after merge with master

* add some directions to docs in the case of segment table validation issues
2023-08-17 19:32:51 -05:00
Abhishek Radhakrishnan 37db5d9b81
Reset offsets supervisor API (#14772)
* Add supervisor /resetOffsets API.

- Add a new endpoint /druid/indexer/v1/supervisor/<supervisorId>/resetOffsets
which accepts DataSourceMetadata as a body parameter.
- Update logs, unit tests and docs.

* Add a new interface method for backwards compatibility.

* Rename

* Adjust tests and javadocs.

* Use CoreInjectorBuilder instead of deprecated makeInjectorWithModules

* UT fix

* Doc updates.

* remove extraneous debugging logs.

* Remove the boolean setting; only ResetHandle() and resetInternal()

* Relax constraints and add a new ResetOffsetsNotice; cleanup old logic.

* A separate ResetOffsetsNotice and some cleanup.

* Minor cleanup

* Add a check & test to verify that sequence numbers are only of type SeekableStreamEndSequenceNumbers

* Add unit tests for the no op implementations for test coverage

* CodeQL fix

* checkstyle from merge conflict

* Doc changes

* DOCUSAURUS code tabs fix. Thanks, Brian!
2023-08-17 14:13:10 -07:00
Kashif Faraz fffb2e4fe7
Speed up SQLMetadataStorageActionHandlerTest (#14856)
Changes
- Reduce test time of `SQLMetadataStorageActionHandlerTest.testMigration`
- Slightly modify log messages to adhere to Druid style
2023-08-17 18:02:43 +05:30
Kashif Faraz 5d4ac64178
Adapt maxSegmentsToMove based on cluster skew (#14584)
Changes:
- No change in behaviour if `smartSegmentLoading` is disabled
- If `smartSegmentLoading` is enabled
  - Compute `balancerComputeThreads` based on `numUsedSegments`
  - Compute `maxSegmentsToMove` based on `balancerComputeThreads`
  - Compute `segmentsToMoveToFixSkew` based on usage skew
  - Compute `segmentsToMove = Math.min(maxSegmentsToMove, segmentsToMoveToFixSkew)`

Limits:
- 1 <= `balancerComputeThreads` <= 8
- `maxSegmentsToMove` <= 20% of total segments
- `minSegmentsToMove` = 0.15% of total segments
2023-08-17 11:14:54 +05:30
Clint Wylie 6b14dde50e
deprecate config-magic in favor of json configuration stuff (#14695)
* json config based processing and broker merge configs to deprecate config-magic
2023-08-16 18:23:57 -07:00
Kashif Faraz d9221e46e4
Completely disable cachingCost balancer strategy (#14798)
`cachingCost` has been deprecated in #14484 and is not advised to be used in
production clusters as it may cause usage skew across historicals which the
coordinator is unable to rectify. This PR completely disables `cachingCost` strategy
as it has now been rendered redundant due to recent performance improvements
made to `cost` strategy.

Changes
- Disable `cachingCost` strategy
- Add `DisabledCachingCostBalancerStrategyFactory` for the time being so that we
can give a proper error message before falling back to `CostBalancerStrategy`. This
will be removed in subsequent releases.
- Retain `CachingCostBalancerStrategy` for testing/benchmarking purposes.
- Add javadocs to `DiskNormalizedCostBalancerStrategy`
2023-08-16 11:43:52 +05:30
AmatyaAvadhanula e16096735b
Fix 404 when segment is used but not in the Coordinator snapshot (#14762)
* Fix 404 when used segment has not been updated in the Coordinator snapshot

* Add unit test
2023-08-14 13:20:43 +05:30
Kashif Faraz 786e772d26
Remove config `druid.coordinator.compaction.skipLockedIntervals` (#14807)
The value of `druid.coordinator.compaction.skipLockedIntervals` should always be `true`.
2023-08-14 12:31:15 +05:30
Rishabh Singh 0dc305f9e4
Upgrade hibernate validator version to fix CVE-2019-10219 (#14757) 2023-08-14 11:50:51 +05:30
zachjsh 82d82dfbd6
Add stats to KillUnusedSegments coordinator duty (#14782)
### Description

Added the following metrics, which are calculated from the `KillUnusedSegments` coordinatorDuty

`"killTask/availableSlot/count"`: calculates the number remaining task slots available for auto kill
`"killTask/maxSlot/count"`: calculates the maximum number of tasks available for auto kill
`"killTask/task/count"`: calculates the number of tasks submitted by auto kill. 

#### Release note
NEW: metrics added for auto kill

`"killTask/availableSlot/count"`: calculates the number remaining task slots available for auto kill
`"killTask/maxSlot/count"`: calculates the maximum number of tasks available for auto kill
`"killTask/task/count"`: calculates the number of tasks submitted by auto kill.
2023-08-10 18:36:53 -04:00
Rishabh Singh 4b9846b90f
Improve exception message when DruidLeaderClient doesn't find leader node (#14775)
The existing exception message No known server thrown in DruidLeaderClient is unhelpful.
2023-08-10 16:37:37 +05:30
Tejaswini Bandlamudi 550a66d71e
Upgrade jackson-databind to 2.12.7 (#14770)
The current version of jackson-databind is flagged for vulnerabilities CVE-2020-28491 (Although cbor format is not used in druid), CVE-2020-36518 (Seems genuine as deeply nested json in can cause resource exhaustion). Updating the dependency to the latest version 2.12.7 to fix these vulnerabilities.
2023-08-09 12:22:16 +05:30
zachjsh 660e6cfa01
Allow for task limit on kill tasks spawned by auto kill coordinator duty (#14769)
### Description

Previously, the `KillUnusedSegments` coordinator duty, in charge of periodically deleting unused segments, could spawn an unlimited number of kill tasks for unused segments. This change adds 2 new coordinator dynamic configs that can be used to control the limit of tasks spawned by this coordinator duty

`killTaskSlotRatio`: Ratio of total available task slots, including autoscaling if applicable that will be allowed for kill tasks. This limit only applies for kill tasks that are spawned automatically by the coordinator's auto kill duty. Default is 1, which allows all available tasks to be used, which is the existing behavior

`maxKillTaskSlots`: Maximum number of tasks that will be allowed for kill tasks. This limit only applies for kill tasks that are spawned automatically by the coordinator's auto kill duty. Default is INT.MAX, which essentially allows for unbounded number of tasks, which is the existing behavior. 

Realize that we can effectively get away with just the one `killTaskSlotRatio`, but following similarly to the compaction config, which has similar properties; I thought it was good to have some control of the upper limit regardless of ratio provided.

#### Release note
NEW: `killTaskSlotRatio`  and `maxKillTaskSlots` coordinator dynamic config properties added that allow control of task resource usage spawned by `KillUnusedSegments` coordinator task (auto kill)
2023-08-08 08:40:55 -04:00
Kashif Faraz 2d8e0f28f3
Refactor: Cleanup coordinator duties for metadata cleanup (#14631)
Changes
- Add abstract class `MetadataCleanupDuty`
- Make `KillAuditLogs`, `KillCompactionConfig`, etc extend `MetadataCleanupDuty` 
- Improve log and error messages
- Cleanup tests
- No functional change
2023-08-05 13:08:23 +05:30
Suneet Saldanha 62ddeaf16f
Additional dimensions for service/heartbeat (#14743)
* Additional dimensions for service/heartbeat

* docs

* review

* review
2023-08-04 11:01:07 -07:00
Pranav d31c04c4c6
Fix the bug in getIndexInfo for mysql (#14750) 2023-08-03 21:45:01 -07:00
zachjsh ba957a9b97
Add ability to limit the number of segments killed in kill task (#14662)
### Description

Previously, the `maxSegments` configured for auto kill could be ignored if an interval of data for a  given datasource had more than this number of unused segments, causing the kill task spawned with the task of deleting unused segments in that given interval of data to delete more than the `maxSegments` configured. Now each kill task spawned by the auto kill coordinator duty, will kill at most `limit` segments. This is done by adding a new config property to the `KillUnusedSegmentTask` which allows users to specify this limit.
2023-08-03 22:17:04 -04:00
imply-cheddar 748874405c
Minimize PostAggregator computations (#14708)
* Minimize PostAggregator computations

Since a change back in 2014, the topN query has been computing
all PostAggregators on all intermediate responses from leaf nodes
to brokers.  This generates significant slow downs for queries
with relatively expensive PostAggregators.  This change rewrites
the query that is pushed down to only have the minimal set of
PostAggregators such that it is impossible for downstream
processing to do too much work.  The final PostAggregators are
applied at the very end.
2023-08-04 00:04:31 +05:30
Kashif Faraz b27d281b11
Remove unused param in MetadataResource (#14747) 2023-08-03 19:18:01 +05:30
Kashif Faraz ee4e0c93b4
Improve alert message for segment assignments (#14696)
Changes:
- Add interface `SegmentDeleteHandler` for marking segments as unused
- In `StrategicSegmentAssigner`, collect all segments on which a drop rule applies in a list
- Process the list above as a batch delete rather than individual deletes
- Improve alert messages when an invalid tier is specified in a load rule
- Improve alert message when no rule applies on a segment
2023-08-01 23:33:05 +05:30
Kashif Faraz 10328c0743
Rename metadatacache and serverview metrics (#14716) 2023-08-01 14:18:20 +05:30
Kashif Faraz d04521d58f
Improve description field when emitting metric for broadcast failure (#14703)
Changes:
- Emit descriptions such as `Load queue is full`, `No disk space` etc. instead of `Unknown error`
- Rewrite `BroadcastDistributionRuleTest`
2023-08-01 10:13:55 +05:30
Jason Koch 44d5c1a15f
split KillUnusedSegmentsTask to processing in smaller chunks (#14642)
split KillUnusedSegmentsTask to smaller batches

Processing in smaller chunks allows the task execution to yield the TaskLockbox lock,
which allows the overlord to continue being responsive to other tasks and users while
this particular kill task is executing.

* introduce KillUnusedSegmentsTask batchSize parameter to control size of batching

* provide an explanation for kill task batchSize parameter

* add logging details for kill batch progress
2023-07-31 12:56:27 -07:00
Kashif Faraz 844a9c7ffb
Cancel loads of unused segments (#14644) 2023-07-31 18:01:50 +05:30
Kashif Faraz e9b4f1e95c
Fix reported replication factor of segment with zero required replicas (#14701) 2023-07-31 14:51:01 +05:30
Kashif Faraz 22290fd632
Test: Simplify test impl of LoadQueuePeon (#14684)
Changes
- Rename `LoadQueuePeonTester` to `TestLoadQueuePeon`
- Simplify `TestLoadQueuePeon` by removing dependency on `CuratorLoadQueuePeon`
- Remove usages of mock peons in `LoadRuleTest` and use `TestLoadQueuePeon` instead
2023-07-28 16:14:23 +05:30
TSFenwick 9a9038c7ae
Speed up kill tasks by deleting segments in batch (#14131)
* allow for batched delete of segments instead of deleting segment data one by one

create new batchdelete method in datasegment killer that has default functionality
of iterating through all segments and calling delete on them. This will enable
a slow rollout of other deepstorage implementations to move to a batched delete
on their own time

* cleanup batchdelete segments

* batch delete with the omni data deleter

cleaned up code
just need to add tests and docs for this functionality

* update java doc to explain how it will try to use batch if function is overwritten

* rename killBatch to kill
add unit tests

* add omniDataSegmentKillerTest for deleting multiple segments at a time. fix checkstyle

* explain test peculiarity better

* clean up batch kill in s3.

* remove unused return value. cleanup comments and fix checkstyle

* default to batch delete. more specific java docs. list segments that couldn't be deleted
if there was a client error or server error

* simplify error handling

* add tests where an exception is thrown when killing multiple s3 segments

* add test for failing to delete two calls with the s3 client

* fix javadoc for kill(List<DataSegment> segments) clean up tests remove feature flag

* fix typo in javadocs

* fix test failure

* fix checkstyle and improve tests

* fix intellij inspections issues

* address comments, make delete multiple segments not assume same bucket

* fix test errors

* better grammar and punctuation. fix test. and better logging for exception

* remove unused code

* avoid extra arraylist instantiation

* fix broken test

* fix broken test

* fix tests to use assert.throws
2023-07-27 15:34:44 -07:00
Gian Merlino 986a271a7d
Merge core CoordinatorClient with MSQ CoordinatorServiceClient. (#14652)
* Merge core CoordinatorClient with MSQ CoordinatorServiceClient.

Continuing the work from #12696, this patch merges the MSQ
CoordinatorServiceClient into the core CoordinatorClient, yielding a single
interface that serves both needs and is based on the ServiceClient RPC
system rather than DruidLeaderClient.

Also removes the backwards-compatibility code for the handoff API in
CoordinatorBasedSegmentHandoffNotifier, because the new API was added
in 0.14.0. That's long enough ago that we don't need backwards
compatibility for rolling updates.

* Fixups.

* Trigger GHA.

* Remove unnecessary retrying in DruidInputSource. Add "about an hour"
retry policy and h

* EasyMock
2023-07-27 13:23:37 -07:00
Kashif Faraz 7634ac896e
Quick fix for SegmentLoadDropHandler bug (#14670) 2023-07-27 11:53:58 +05:30
Gian Merlino 4a68f8a294
Fix maxCompletedTasks parameter in OverlordClientImpl. (#14667)
It was sent to the server as "maxCompletedTasks", but the server expects
"max". This caused it to be ignored. This bug was introduced in #14581.
2023-07-26 15:12:20 -07:00
Gian Merlino 2f9619a96f
Use OverlordClient for all Overlord RPCs. (#14581)
* Use OverlordClient for all Overlord RPCs.

Continuing the work from #12696, this patch removes HttpIndexingServiceClient
and the IndexingService flavor of DruidLeaderClient completely. All remaining
usages are migrated to OverlordClient.

Supporting changes include:

1) Add a variety of methods to OverlordClient.

2) Update MetadataTaskStorage to skip the complete-task lookup when
   the caller requests zero completed tasks. This helps performance of
   the "get active tasks" APIs, which don't want to see complete ones.

* Use less forbidden APIs.

* Fixes from CI.

* Add test coverage.

* Two more tests.

* Fix test.

* Updates from CR.

* Remove unthrown exceptions.

* Refactor to improve testability and test coverage.

* Add isNil tests.

* Remove unnecessary "deserialize" methods.
2023-07-24 21:14:27 -07:00
aho135 607f511767
Improve logging in CoordinatorBasedSegmentHandoffNotifier (#14640) 2023-07-24 18:04:21 +05:30
Jason Koch 54f29fedce
Use PreparedBatch while deleting segments (#14639)
Related to #14634 

Changes:
- Update `IndexerSQLMetadataStorageCoordinator.deleteSegments` to use
JDBI PreparedBatch instead of issuing single DELETE statements
2023-07-23 22:55:04 +05:30
Abhishek Agarwal 1ddbaa8744
Reserve threads for non-query requests without using laning (#14576)
This PR uses the QoSFilter available in Jetty to park the query requests that exceed a configured limit. This is done so that other HTTP requests such as health check calls do not get blocked if the query server is busy serving long-running queries. The same mechanism can also be used in the future to isolate interactive queries from long-running select queries from interactive queries within the same broker.

Right now, you can still get that isolation by setting druid.query.scheduler.numThreads to a value lowe than druid.server.http.numThreads. That enables total laning but the side effect is that excess requests are not queued and rejected outright that leads to a bad user experience.

Parked requests are timed out after 30 seconds by default. I overrode that to the maxQueryTimeout in this PR.
2023-07-20 15:03:48 +05:30
Clint Wylie 913416c669
add equality, null, and range filter (#14542)
changes:
* new filters that preserve match value typing to better handle filtering different column types
* sql planner uses new filters by default in sql compatible null handling mode
* remove isFilterable from column capabilities
* proper handling of array filtering, add array processor to column processors
* javadoc for sql test filter functions
* range filter support for arrays, tons more tests, fixes
* add dimension selector tests for mixed type roots
* support json equality
* rename semantic index maker thingys to mostly have plural names since they typically make many indexes, e.g. StringValueSetIndex -> StringValueSetIndexes
* add cooler equality index maker, ValueIndexes 
* fix missing string utf8 index supplier
* expression array comparator stuff
2023-07-18 12:15:22 -07:00
AmatyaAvadhanula 0412f40d36
Prepare master branch for next release, 28.0.0 (#14595)
* Prepare master branch for next release, 28.0.0
2023-07-18 09:22:30 +05:30
Kashif Faraz ab051d9c5e
Add test for ReservoirSegmentSampler (#14591)
Tests to verify the following behaviour have been added:
- Segments from more populous servers are more likely to be picked irrespective of
sample size.
- Segments from all servers are equally likely to be picked if all servers have equivalent
number of segments.
2023-07-17 18:50:02 +05:30
Gian Merlino 95ca43034f
Change default handoffConditionTimeout to 15 minutes. (#14539)
* Change default handoffConditionTimeout to 15 minutes.

Most of the time, when handoff is taking this long, it's because something
is preventing Historicals from loading new data. In this case, we have
two choices:

1) Stop making progress on ingestion, wait for Historicals to load stuff,
   and keep the waiting-for-handoff segments available on realtime tasks.
   (handoffConditionTimeout = 0, the current default)

2) Continue making progress on ingestion, by exiting the realtime tasks
   that were waiting for handoff. Once the Historicals get their act
   together, the segments will be loaded, as they are still there on
   deep storage. They will just not be continuously available.
   (handoffConditionTimeout > 0)

I believe most users would prefer [2], because [1] risks ingestion falling
behind the stream, which causes many other problems. It can cause data loss
if the stream ages-out data before we have a chance to ingest it.

Due to the way tuningConfigs are serialized -- defaults are baked into the
serialized form that is written to the database -- this default change will
not change anyone's existing supervisors. It will take effect for newly
created supervisors.

* Fix tests.

* Update docs/development/extensions-core/kafka-supervisor-reference.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/kinesis-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

---------

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2023-07-13 13:17:14 -07:00
Abhishek Radhakrishnan f4ee58eaa8
Add `aggregatorMergeStrategy` property in SegmentMetadata queries (#14560)
* Add aggregatorMergeStrategy property to SegmentMetadaQuery.

- Adds a new property aggregatorMergeStrategy to segmentMetadata query.
aggregatorMergeStrategy currently supports three types of merge strategies -
the legacy strict and lenient strategies, and the new latest strategy.
- The latest strategy considers the latest aggregator from the latest segment
by time order when there's a conflict when merging aggregators from different
segments.
- Deprecate lenientAggregatorMerge property; The API validates that both the new
and old properties are not set, and returns an exception.
- When merging segments as part of segmentMetadata query, the segments have a more
elaborate id -- <datasource>_<interval>_merged_<partition_number> format, similar to
the name format that segments usually contain. Previously it was simply "merged".
- Adjust unit tests to test the latest strategy, to assert the returned complete
SegmentAnalysis object instead of just the aggregators for completeness.

* Don't explicitly set strict strategy in tests

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/querying/segmentmetadataquery.md

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

---------

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2023-07-13 12:37:36 -04:00
Gian Merlino 3ff51487b7
Add ZooKeeper connection state alerts and metrics. (#14333)
* Add ZooKeeper connection state alerts and metrics.

- New metric "zk/connected" is an indicator showing 1 when connected,
  0 when disconnected.
- New metric "zk/disconnected/time" measures time spent disconnected.
- New alert when Curator connection state enters LOST or SUSPENDED.

* Use right GuardedBy.

* Test fixes, coverage.

* Adjustment.

* Fix tests.

* Fix ITs.

* Improved injection.

* Adjust metric name, add tests.
2023-07-12 09:34:28 -07:00
hqx871 7142b0c39e
Enable result level cache for GroupByStrategyV2 on broker (#11595)
Cache is disabled for GroupByStrategyV2 on broker since the pr #3820 [groupBy v2: Results not fully merged when caching is enabled on the broker]. But we can enable the result-level cache on broker for GroupByStrategyV2 and keep the segment-level cache disabled.
2023-07-12 15:00:01 +05:30
Kashif Faraz 58a35bf07e
Deprecate EntryExistsException in Druid 27 and remove in Druid 28 (#14554)
Also deprecate UnknownSegmentIdsException.
2023-07-08 15:40:14 +05:30
Kashif Faraz 40d0dc9e0e
Use separate executor to handle task updates in TaskQueue (#14533)
Description:
`TaskQueue.notifyStatus` is often a heavy call as it performs the following operations:
- Update task status in metadata DB
- Update task locks in metadata DB
- Request (synchronously) the task runner to shutdown the completed task
- Clean up in-memory data structures

This method can often be slow and can cause worker sync / task runners to slow down.

Main changes:
- Run task completion callbacks in a separate executor to handle task completion updates
- Add new config `druid.indexer.queue.taskCompleteHandlerNumThreads`
- Add metrics to monitor number of processed and queued items
- There are still other paths that can invoke `notifyStatus`, but those need not be moved to
the new executor as they are synchronous on purpose.

Other changes:
- Add new metrics `task/status/queue/count`, `task/status/handled/count`
- Add `TaskCountStatsProvider.getStats()` which deprecates the other `getXXXTaskCount` methods.
- Use `CoordinatorRunStats` to collect and report metrics. This class has been used as is
for now but will later be renamed and repurposed to use across all Druid services.
2023-07-07 20:43:12 +05:30
Gian Merlino 1fe61bc869
ChangeRequestHttpSyncer: Don't wait 1ms when checking isInitialized(). (#14547)
The wait doesn't seem to serve a purpose, other than causing delays
when checking isInitialized() for a large number of things that have
not yet been initialized.
2023-07-07 05:54:39 -07:00
imply-cheddar 277b357256
Optimize IntervalIterator (#14530)
UniformGranularityTest's test to test a large number of intervals
runs through 10 years of 1 second intervals.  This pushes a lot of
stuff through IntervalIterator and shows up in terms of test
runtime as one of the hottest tests.  Most of the time is going to
constructing jodatime objects because it is doing things with
DateTime objects instead of millis.  Change the calls to use
millis instead and things go faster.
2023-07-06 14:44:23 +05:30
Kashif Faraz 87bb1b9709
Fix bug during initialization of HttpServerInventoryView (#14517)
If a server is removed during `HttpServerInventoryView.serverInventoryInitialized`,
the initialization gets stuck as this server is never synced. The method eventually times
out (default 250s).

Fix: Mark a server as stopped if it is removed. `serverInventoryInitialized` only waits for
non-stopped servers to sync.

Other changes:
- Add new metrics for better debugging of slow broker/coordinator startup
  - `segment/serverview/sync/healthy`: whether the server view is syncing properly with a server
  - `segment/serverview/sync/unstableTime`: time for which sync with a server has been unstable  
- Clean up logging in `HttpServerInventoryView` and `ChangeRequestHttpSyncer`
- Minor refactor for readability
- Add utility class `Stopwatch`
- Add tests and stubs
2023-07-06 13:04:53 +05:30
Kashif Faraz a6547febaf
Remove unused coordinator dynamic configs (#14524)
After #13197 , several coordinator configs are now redundant as they are not being
used anymore, neither with `smartSegmentLoading` nor otherwise.

Changes:
- Remove dynamic configs `emitBalancingStats`: balancer error stats are always
emitted, debug stats can be logged by using `debugDimensions`
- `useBatchedSegmentSampler`, `percentOfSegmentsToConsiderPerMove`:
batched segment sampling is always used
- Add test to verify deserialization with unknown properties
- Update `CoordinatorRunStats` to always track stats, this can be optimized later.
2023-07-06 12:11:10 +05:30
Clint Wylie 277aaa5c57
remove druid.processing.columnCache.sizeBytes and CachingIndexed, combine string column implementations (#14500)
* combine string column implementations
changes:
* generic indexed, front-coded, and auto string columns now all share the same column and index supplier implementations
* remove CachingIndexed implementation, which I think is largely no longer needed by the switch of many things to directly using ByteBuffer, avoiding the cost of creating Strings
* remove ColumnConfig.columnCacheSizeBytes since CachingIndexed was the only user
2023-07-02 19:37:15 -07:00
Pranav 4b2d87336a
Add additional index on task table (#14470) 2023-06-29 15:32:43 -07:00
Gian Merlino e10e35aa2c
Add REGEXP_REPLACE function. (#14460)
* Add REGEXP_REPLACE function.

Replaces all instances of a pattern with a replacement string.

* Fixes.

* Improve test coverage.

* Adjust behavior.
2023-06-29 13:47:57 -07:00
Gian Merlino a6cabbe10f
SQL: Avoid "intervals" for non-table-based datasources. (#14336)
In these other cases, stick to plain "filter". This simplifies lots of
logic downstream, and doesn't hurt since we don't have intervals-specific
optimizations outside of tables.

Fixes an issue where we couldn't properly filter on a column from an
external datasource if it was named __time.
2023-06-29 09:57:11 +05:30
Karan Kumar cb3a9d2b57
Adding Interactive API's for MSQ engine (#14416)
This PR aims to expose a new API called
"@path("/druid/v2/sql/statements/")" which takes the same payload as the current "/druid/v2/sql" endpoint and allows users to fetch results in an async manner.
2023-06-28 17:51:58 +05:30
Kashif Faraz 4bd6bd0d4f
Improve CostBalancerStrategy, deprecate cachingCost (#14484)
Changes to `cost` strategy:
- In every `ServerHolder`, track the number of segments per datasource per interval
- Perform cost computations for a given interval just once, and then multiply by a constant
factor to account for the total segment count in that interval
- Do not perform joint cost computations with segments that are outside the compute interval
(± 45 days) for the segment being considered for move
- Remove metrics `segment/cost/*` as they were coordinator killers! Turning on these metrics
(by setting `emitBalancingStats` to true) has often caused the coordinator to be stuck for hours.
Moreover, they are too complicated to decipher and do not provide any meaningful insight into
a Druid cluster.
- Add new simpler metrics `segment/balancer/compute/*` to track cost computation time,
count and errors.

Other changes:
- Remove flaky test from `CostBalancerStrategyTest`.
- Add tests to verify that computed cost has remained unchanged
- Remove usages of mock `BalancerStrategy` from `LoadRuleTest`, `BalanceSegmentsTest`
- Clean up `BalancerStrategy` interface
2023-06-27 13:23:29 +05:30
YongGang b7434be99e
Add ServiceStatusMonitor to monitor service health (#14443)
* Add OverlordStatusMonitor and CoordinatorStatusMonitor to monitor service leader status

* make the monitor more general

* resolve conflict

* use Supplier pattern to provide metrics

* reformat code and doc

* move service specific tag to dimension

* minor refine

* update doc

* reformat code

* address comments

* remove declared exception

* bind HeartbeatSupplier conditionally in Coordinator
2023-06-26 10:26:37 -07:00
Laksh Singla 1647d5f4a0
Limit the subquery results by memory usage (#13952)
Users can now add a guardrail to prevent subquery’s results from exceeding the set number of bytes by setting druid.server.http.maxSubqueryRows in Broker's config or maxSubqueryRows in the query context. This feature is experimental for now and would default back to row-based limiting in case it fails to get the accurate size of the results consumed by the query.
2023-06-26 18:12:28 +05:30
Adarsh Sanjeev 90b8f850a5
Allow empty tiered replicants map for load rules (#14432)
Changes:
- Add property `useDefaultTierForNull` for all load rules. This property determines the default
value of `tieredReplicants` if it is not specified. When true, the default is `_default_tier => 2 replicas`.
When false, the default is empty, i.e. no replicas on any tier.
- Fix validation to allow empty replicants map, so that the segment is used but not loaded anywhere.
2023-06-22 14:44:06 +05:30
Rishabh Singh 92a7febacb
Revert "Add method to authorize native query using authentication result (#14376)" (#14452)
This reverts commit 8b212e73d7.
2023-06-21 10:42:26 +05:30
Hardik Bajaj 1ea9158a50
Added new SysMonitorOshi v0 using Oshi library (#14359)
Added a new monitor SysMonitorOshi to replace SysMonitor. The new monitor has a wider support for different machine architectures including ARM instances. Please switch to SysMonitorOshi as SysMonitor is now deprecated and will be removed in future releases.
2023-06-20 20:57:58 +05:30
Adarsh Sanjeev f5cc823d0f
Handle nulls in DruidCoordinator.getReplicationFactor (#14447) 2023-06-20 15:25:57 +05:30
Kashif Faraz 50461c3bd5
Enable smartSegmentLoading on the Coordinator (#13197)
This commit does a complete revamp of the coordinator to address problem areas:
- Stability: Fix several bugs, add capabilities to prioritize and cancel load queue items
- Visibility: Add new metrics, improve logs, revamp `CoordinatorRunStats`
- Configuration: Add dynamic config `smartSegmentLoading` to automatically set
optimal values for all segment loading configs such as `maxSegmentsToMove`,
`replicationThrottleLimit` and `maxSegmentsInNodeLoadingQueue`.

Changed classes:
- Add `StrategicSegmentAssigner` to make assignment decisions for load, replicate and move
- Add `SegmentAction` to distinguish between load, replicate, drop and move operations
- Add `SegmentReplicationStatus` to capture current state of replication of all used segments
- Add `SegmentLoadingConfig` to contain recomputed dynamic config values
- Simplify classes `LoadRule`, `BroadcastRule`
- Simplify the `BalancerStrategy` and `CostBalancerStrategy`
- Add several new methods to `ServerHolder` to track loaded and queued segments
- Refactor `DruidCoordinator`

Impact:
- Enable `smartSegmentLoading` by default. With this enabled, none of the following
dynamic configs need to be set: `maxSegmentsToMove`, `replicationThrottleLimit`,
`maxSegmentsInNodeLoadingQueue`, `useRoundRobinSegmentAssignment`,
`emitBalancingStats` and `replicantLifetime`.
- Coordinator reports richer metrics and produces cleaner and more informative logs
- Coordinator uses an unlimited load queue for all serves, and makes better assignment decisions
2023-06-19 14:27:35 +05:30
imply-cheddar cfd07a95b7
Errors take 3 (#14004)
Introduce DruidException, an exception whose goal in life is to be delivered to a user.

DruidException itself has javadoc on it to describe how it should be used.  This commit both introduces the Exception and adjusts some of the places that are generating exceptions to generate DruidException objects instead, as a way to show how the Exception should be used.

This work was a 3rd iteration on top of work that was started by Paul Rogers.  I don't know if his name will survive the squash-and-merge, so I'm calling it out here and thanking him for starting on this.
2023-06-19 01:11:13 -07:00
Adarsh Sanjeev 128133fadc
Add column replication_factor column to sys.segments table (#14403)
Description:
Druid allows a configuration of load rules that may cause a used segment to not be loaded
on any historical. This status is not tracked in the sys.segments table on the broker, which
makes it difficult to determine if the unavailability of a segment is expected and if we should
not wait for it to be loaded on a server after ingestion has finished.

Changes:
- Track replication factor in `SegmentReplicantLookup` during evaluation of load rules
- Update API `/druid/coordinator/v1metadata/segments` to return replication factor
- Add column `replication_factor` to the sys.segments virtual table and populate it in
`MetadataSegmentView`
- If this column is 0, the segment is not assigned to any historical and will not be loaded.
2023-06-18 10:02:21 +05:30
Maytas Monsereenusorn 5d76d0ea74
Fix segment/deleted/count metric not being emitted (#14433)
* Fix segment/deleted/count metric

* Fix segment/deleted/count metric

* Fix segment/deleted/count metric
2023-06-15 14:08:19 -07:00
Rishabh Singh 66c3cc1391
Handle unparseable SupervisorSpec in metadata store (#14382)
Changes:
- Skip a supervisor spec entry which cannot be deserialised into a `SupervisorSpec` object.
- Log an error for the unparseable spec
2023-06-13 08:02:01 +05:30
Rishabh Singh 8b212e73d7
Add method to authorize native query using authentication result (#14376) 2023-06-12 11:06:00 +05:30
Kashif Faraz 6e158704cb
Do not retry INSERT task into metadata if max_allowed_packet limit is violated (#14271)
Changes
- Add a `DruidException` which contains a user-facing error message, HTTP response code
- Make `EntryExistsException` extend `DruidException`
- If metadata store max_allowed_packet limit is violated while inserting a new task, throw
`DruidException` with response code 400 (bad request) to prevent retries
- Add `SQLMetadataConnector.isRootCausePacketTooBigException` with impl for MySQL
2023-06-10 12:15:44 +05:30
Abhishek Radhakrishnan 23c2dcaf8d
Add NullHandling module initialization for `LookupDimensionSpecTest` (#14393) 2023-06-09 09:07:32 +05:30
Kashif Faraz 12e8fa5c97
Prevent coordinator from getting stuck if leadership changes during coordinator run (#14385)
Changes:
- Add a timeout of 1 minute to resultFuture.get() in `CostBalancerStrategy.chooseBestServer`.
1 minute is the typical time for a full coordinator run and is more than enough time for cost
computations of a single segment.
- Raise an alert if an exception is encountered while computing costs and if the executor has
not been shutdown. This is because a shutdown is intentional and does not require an alert.
2023-06-08 15:29:20 +05:30
zachjsh 04a82da63d
Input source security fixes (#14266)
It was found that several supported tasks / input sources did not have implementations for the methods used by the input source security feature, causing these tasks and input sources to fail when used with this feature. This pr adds the needed missing implementations. Also securing the sampling endpoint with input source security, when enabled.
2023-06-01 16:37:19 -07:00
Kashif Faraz d4cacebf79
Add tests for CostBalancerStrategy (#14230)
Changes:
- `CostBalancerStrategyTest`
  - Focus on verification of cost computations rather than choosing servers in this test
  - Add new tests `testComputeCost` and `testJointSegmentsCost`
  - Add tests to demonstrate that with a long enough interval gap, all costs become negligible
  - Retain `testIntervalCost` and `testIntervalCostAdditivity`
  - Remove redundant tests such as `testStrategyMultiThreaded`, `testStrategySingleThreaded`as
verification of this behaviour is better suited to `BalancingStrategiesTest`.
- `CostBalancerStrategyBenchmark`
  - Remove usage of static method from `CostBalancerStrategyTest`
  - Explicitly setup cluster and segments to use for benchmarking
2023-05-30 08:52:56 +05:30
Kashif Faraz 8091c6a547
Update default values in CoordinatorDynamicConfig (#14269)
The defaults of the following config values in the `CoordinatorDynamicConfig` are being updated.

1. `maxSegmentsInNodeLoadingQueue = 500` (previous = 100)
2. `replicationThrottleLimit = 500` (previous = 10)
Rationale: With round-robin segment assignment now being the default assignment technique,
the Coordinator can assign a large number of under-replicated/unavailable segments very quickly,
without getting stuck in `RunRules` duty due to very slow strategy-based cost computations.

3. `maxSegmentsToMove = 100` (previous = 5)
Rationale: A very low value (say 5) is ineffective in balancing especially if there are many segments
to balance. A very large value can cause excessive moves, which has these disadvantages:
- Load of moving segments competing with load of unavailable/under-replicated segments
- Unnecessary network costs due to constant download and delete of segments

These defaults will be revisited after #13197 is merged.
2023-05-30 08:51:33 +05:30
Soumyava 22ba457d29
Expr getCacheKey now delegates to children (#14287)
* Expr getCacheKey now delegates to children

* Removed the LOOKUP_EXPR_CACHE_KEY as we do not need it

* Adding an unit test

* Update processing/src/main/java/org/apache/druid/math/expr/Expr.java

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

---------

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2023-05-23 14:49:38 -07:00