Commit Graph

4060 Commits

Author SHA1 Message Date
Kashif Faraz ab051d9c5e
Add test for ReservoirSegmentSampler (#14591)
Tests to verify the following behaviour have been added:
- Segments from more populous servers are more likely to be picked irrespective of
sample size.
- Segments from all servers are equally likely to be picked if all servers have equivalent
number of segments.
2023-07-17 18:50:02 +05:30
Gian Merlino 95ca43034f
Change default handoffConditionTimeout to 15 minutes. (#14539)
* Change default handoffConditionTimeout to 15 minutes.

Most of the time, when handoff is taking this long, it's because something
is preventing Historicals from loading new data. In this case, we have
two choices:

1) Stop making progress on ingestion, wait for Historicals to load stuff,
   and keep the waiting-for-handoff segments available on realtime tasks.
   (handoffConditionTimeout = 0, the current default)

2) Continue making progress on ingestion, by exiting the realtime tasks
   that were waiting for handoff. Once the Historicals get their act
   together, the segments will be loaded, as they are still there on
   deep storage. They will just not be continuously available.
   (handoffConditionTimeout > 0)

I believe most users would prefer [2], because [1] risks ingestion falling
behind the stream, which causes many other problems. It can cause data loss
if the stream ages-out data before we have a chance to ingest it.

Due to the way tuningConfigs are serialized -- defaults are baked into the
serialized form that is written to the database -- this default change will
not change anyone's existing supervisors. It will take effect for newly
created supervisors.

* Fix tests.

* Update docs/development/extensions-core/kafka-supervisor-reference.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/kinesis-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

---------

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2023-07-13 13:17:14 -07:00
Abhishek Radhakrishnan f4ee58eaa8
Add `aggregatorMergeStrategy` property in SegmentMetadata queries (#14560)
* Add aggregatorMergeStrategy property to SegmentMetadaQuery.

- Adds a new property aggregatorMergeStrategy to segmentMetadata query.
aggregatorMergeStrategy currently supports three types of merge strategies -
the legacy strict and lenient strategies, and the new latest strategy.
- The latest strategy considers the latest aggregator from the latest segment
by time order when there's a conflict when merging aggregators from different
segments.
- Deprecate lenientAggregatorMerge property; The API validates that both the new
and old properties are not set, and returns an exception.
- When merging segments as part of segmentMetadata query, the segments have a more
elaborate id -- <datasource>_<interval>_merged_<partition_number> format, similar to
the name format that segments usually contain. Previously it was simply "merged".
- Adjust unit tests to test the latest strategy, to assert the returned complete
SegmentAnalysis object instead of just the aggregators for completeness.

* Don't explicitly set strict strategy in tests

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/querying/segmentmetadataquery.md

* Apply suggestions from code review

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

---------

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2023-07-13 12:37:36 -04:00
Gian Merlino 3ff51487b7
Add ZooKeeper connection state alerts and metrics. (#14333)
* Add ZooKeeper connection state alerts and metrics.

- New metric "zk/connected" is an indicator showing 1 when connected,
  0 when disconnected.
- New metric "zk/disconnected/time" measures time spent disconnected.
- New alert when Curator connection state enters LOST or SUSPENDED.

* Use right GuardedBy.

* Test fixes, coverage.

* Adjustment.

* Fix tests.

* Fix ITs.

* Improved injection.

* Adjust metric name, add tests.
2023-07-12 09:34:28 -07:00
hqx871 7142b0c39e
Enable result level cache for GroupByStrategyV2 on broker (#11595)
Cache is disabled for GroupByStrategyV2 on broker since the pr #3820 [groupBy v2: Results not fully merged when caching is enabled on the broker]. But we can enable the result-level cache on broker for GroupByStrategyV2 and keep the segment-level cache disabled.
2023-07-12 15:00:01 +05:30
Kashif Faraz 58a35bf07e
Deprecate EntryExistsException in Druid 27 and remove in Druid 28 (#14554)
Also deprecate UnknownSegmentIdsException.
2023-07-08 15:40:14 +05:30
Kashif Faraz 40d0dc9e0e
Use separate executor to handle task updates in TaskQueue (#14533)
Description:
`TaskQueue.notifyStatus` is often a heavy call as it performs the following operations:
- Update task status in metadata DB
- Update task locks in metadata DB
- Request (synchronously) the task runner to shutdown the completed task
- Clean up in-memory data structures

This method can often be slow and can cause worker sync / task runners to slow down.

Main changes:
- Run task completion callbacks in a separate executor to handle task completion updates
- Add new config `druid.indexer.queue.taskCompleteHandlerNumThreads`
- Add metrics to monitor number of processed and queued items
- There are still other paths that can invoke `notifyStatus`, but those need not be moved to
the new executor as they are synchronous on purpose.

Other changes:
- Add new metrics `task/status/queue/count`, `task/status/handled/count`
- Add `TaskCountStatsProvider.getStats()` which deprecates the other `getXXXTaskCount` methods.
- Use `CoordinatorRunStats` to collect and report metrics. This class has been used as is
for now but will later be renamed and repurposed to use across all Druid services.
2023-07-07 20:43:12 +05:30
Gian Merlino 1fe61bc869
ChangeRequestHttpSyncer: Don't wait 1ms when checking isInitialized(). (#14547)
The wait doesn't seem to serve a purpose, other than causing delays
when checking isInitialized() for a large number of things that have
not yet been initialized.
2023-07-07 05:54:39 -07:00
imply-cheddar 277b357256
Optimize IntervalIterator (#14530)
UniformGranularityTest's test to test a large number of intervals
runs through 10 years of 1 second intervals.  This pushes a lot of
stuff through IntervalIterator and shows up in terms of test
runtime as one of the hottest tests.  Most of the time is going to
constructing jodatime objects because it is doing things with
DateTime objects instead of millis.  Change the calls to use
millis instead and things go faster.
2023-07-06 14:44:23 +05:30
Kashif Faraz 87bb1b9709
Fix bug during initialization of HttpServerInventoryView (#14517)
If a server is removed during `HttpServerInventoryView.serverInventoryInitialized`,
the initialization gets stuck as this server is never synced. The method eventually times
out (default 250s).

Fix: Mark a server as stopped if it is removed. `serverInventoryInitialized` only waits for
non-stopped servers to sync.

Other changes:
- Add new metrics for better debugging of slow broker/coordinator startup
  - `segment/serverview/sync/healthy`: whether the server view is syncing properly with a server
  - `segment/serverview/sync/unstableTime`: time for which sync with a server has been unstable  
- Clean up logging in `HttpServerInventoryView` and `ChangeRequestHttpSyncer`
- Minor refactor for readability
- Add utility class `Stopwatch`
- Add tests and stubs
2023-07-06 13:04:53 +05:30
Kashif Faraz a6547febaf
Remove unused coordinator dynamic configs (#14524)
After #13197 , several coordinator configs are now redundant as they are not being
used anymore, neither with `smartSegmentLoading` nor otherwise.

Changes:
- Remove dynamic configs `emitBalancingStats`: balancer error stats are always
emitted, debug stats can be logged by using `debugDimensions`
- `useBatchedSegmentSampler`, `percentOfSegmentsToConsiderPerMove`:
batched segment sampling is always used
- Add test to verify deserialization with unknown properties
- Update `CoordinatorRunStats` to always track stats, this can be optimized later.
2023-07-06 12:11:10 +05:30
Clint Wylie 277aaa5c57
remove druid.processing.columnCache.sizeBytes and CachingIndexed, combine string column implementations (#14500)
* combine string column implementations
changes:
* generic indexed, front-coded, and auto string columns now all share the same column and index supplier implementations
* remove CachingIndexed implementation, which I think is largely no longer needed by the switch of many things to directly using ByteBuffer, avoiding the cost of creating Strings
* remove ColumnConfig.columnCacheSizeBytes since CachingIndexed was the only user
2023-07-02 19:37:15 -07:00
Pranav 4b2d87336a
Add additional index on task table (#14470) 2023-06-29 15:32:43 -07:00
Gian Merlino e10e35aa2c
Add REGEXP_REPLACE function. (#14460)
* Add REGEXP_REPLACE function.

Replaces all instances of a pattern with a replacement string.

* Fixes.

* Improve test coverage.

* Adjust behavior.
2023-06-29 13:47:57 -07:00
Gian Merlino a6cabbe10f
SQL: Avoid "intervals" for non-table-based datasources. (#14336)
In these other cases, stick to plain "filter". This simplifies lots of
logic downstream, and doesn't hurt since we don't have intervals-specific
optimizations outside of tables.

Fixes an issue where we couldn't properly filter on a column from an
external datasource if it was named __time.
2023-06-29 09:57:11 +05:30
Karan Kumar cb3a9d2b57
Adding Interactive API's for MSQ engine (#14416)
This PR aims to expose a new API called
"@path("/druid/v2/sql/statements/")" which takes the same payload as the current "/druid/v2/sql" endpoint and allows users to fetch results in an async manner.
2023-06-28 17:51:58 +05:30
Kashif Faraz 4bd6bd0d4f
Improve CostBalancerStrategy, deprecate cachingCost (#14484)
Changes to `cost` strategy:
- In every `ServerHolder`, track the number of segments per datasource per interval
- Perform cost computations for a given interval just once, and then multiply by a constant
factor to account for the total segment count in that interval
- Do not perform joint cost computations with segments that are outside the compute interval
(± 45 days) for the segment being considered for move
- Remove metrics `segment/cost/*` as they were coordinator killers! Turning on these metrics
(by setting `emitBalancingStats` to true) has often caused the coordinator to be stuck for hours.
Moreover, they are too complicated to decipher and do not provide any meaningful insight into
a Druid cluster.
- Add new simpler metrics `segment/balancer/compute/*` to track cost computation time,
count and errors.

Other changes:
- Remove flaky test from `CostBalancerStrategyTest`.
- Add tests to verify that computed cost has remained unchanged
- Remove usages of mock `BalancerStrategy` from `LoadRuleTest`, `BalanceSegmentsTest`
- Clean up `BalancerStrategy` interface
2023-06-27 13:23:29 +05:30
YongGang b7434be99e
Add ServiceStatusMonitor to monitor service health (#14443)
* Add OverlordStatusMonitor and CoordinatorStatusMonitor to monitor service leader status

* make the monitor more general

* resolve conflict

* use Supplier pattern to provide metrics

* reformat code and doc

* move service specific tag to dimension

* minor refine

* update doc

* reformat code

* address comments

* remove declared exception

* bind HeartbeatSupplier conditionally in Coordinator
2023-06-26 10:26:37 -07:00
Laksh Singla 1647d5f4a0
Limit the subquery results by memory usage (#13952)
Users can now add a guardrail to prevent subquery’s results from exceeding the set number of bytes by setting druid.server.http.maxSubqueryRows in Broker's config or maxSubqueryRows in the query context. This feature is experimental for now and would default back to row-based limiting in case it fails to get the accurate size of the results consumed by the query.
2023-06-26 18:12:28 +05:30
Adarsh Sanjeev 90b8f850a5
Allow empty tiered replicants map for load rules (#14432)
Changes:
- Add property `useDefaultTierForNull` for all load rules. This property determines the default
value of `tieredReplicants` if it is not specified. When true, the default is `_default_tier => 2 replicas`.
When false, the default is empty, i.e. no replicas on any tier.
- Fix validation to allow empty replicants map, so that the segment is used but not loaded anywhere.
2023-06-22 14:44:06 +05:30
Rishabh Singh 92a7febacb
Revert "Add method to authorize native query using authentication result (#14376)" (#14452)
This reverts commit 8b212e73d7.
2023-06-21 10:42:26 +05:30
Hardik Bajaj 1ea9158a50
Added new SysMonitorOshi v0 using Oshi library (#14359)
Added a new monitor SysMonitorOshi to replace SysMonitor. The new monitor has a wider support for different machine architectures including ARM instances. Please switch to SysMonitorOshi as SysMonitor is now deprecated and will be removed in future releases.
2023-06-20 20:57:58 +05:30
Adarsh Sanjeev f5cc823d0f
Handle nulls in DruidCoordinator.getReplicationFactor (#14447) 2023-06-20 15:25:57 +05:30
Kashif Faraz 50461c3bd5
Enable smartSegmentLoading on the Coordinator (#13197)
This commit does a complete revamp of the coordinator to address problem areas:
- Stability: Fix several bugs, add capabilities to prioritize and cancel load queue items
- Visibility: Add new metrics, improve logs, revamp `CoordinatorRunStats`
- Configuration: Add dynamic config `smartSegmentLoading` to automatically set
optimal values for all segment loading configs such as `maxSegmentsToMove`,
`replicationThrottleLimit` and `maxSegmentsInNodeLoadingQueue`.

Changed classes:
- Add `StrategicSegmentAssigner` to make assignment decisions for load, replicate and move
- Add `SegmentAction` to distinguish between load, replicate, drop and move operations
- Add `SegmentReplicationStatus` to capture current state of replication of all used segments
- Add `SegmentLoadingConfig` to contain recomputed dynamic config values
- Simplify classes `LoadRule`, `BroadcastRule`
- Simplify the `BalancerStrategy` and `CostBalancerStrategy`
- Add several new methods to `ServerHolder` to track loaded and queued segments
- Refactor `DruidCoordinator`

Impact:
- Enable `smartSegmentLoading` by default. With this enabled, none of the following
dynamic configs need to be set: `maxSegmentsToMove`, `replicationThrottleLimit`,
`maxSegmentsInNodeLoadingQueue`, `useRoundRobinSegmentAssignment`,
`emitBalancingStats` and `replicantLifetime`.
- Coordinator reports richer metrics and produces cleaner and more informative logs
- Coordinator uses an unlimited load queue for all serves, and makes better assignment decisions
2023-06-19 14:27:35 +05:30
imply-cheddar cfd07a95b7
Errors take 3 (#14004)
Introduce DruidException, an exception whose goal in life is to be delivered to a user.

DruidException itself has javadoc on it to describe how it should be used.  This commit both introduces the Exception and adjusts some of the places that are generating exceptions to generate DruidException objects instead, as a way to show how the Exception should be used.

This work was a 3rd iteration on top of work that was started by Paul Rogers.  I don't know if his name will survive the squash-and-merge, so I'm calling it out here and thanking him for starting on this.
2023-06-19 01:11:13 -07:00
Adarsh Sanjeev 128133fadc
Add column replication_factor column to sys.segments table (#14403)
Description:
Druid allows a configuration of load rules that may cause a used segment to not be loaded
on any historical. This status is not tracked in the sys.segments table on the broker, which
makes it difficult to determine if the unavailability of a segment is expected and if we should
not wait for it to be loaded on a server after ingestion has finished.

Changes:
- Track replication factor in `SegmentReplicantLookup` during evaluation of load rules
- Update API `/druid/coordinator/v1metadata/segments` to return replication factor
- Add column `replication_factor` to the sys.segments virtual table and populate it in
`MetadataSegmentView`
- If this column is 0, the segment is not assigned to any historical and will not be loaded.
2023-06-18 10:02:21 +05:30
Maytas Monsereenusorn 5d76d0ea74
Fix segment/deleted/count metric not being emitted (#14433)
* Fix segment/deleted/count metric

* Fix segment/deleted/count metric

* Fix segment/deleted/count metric
2023-06-15 14:08:19 -07:00
Rishabh Singh 66c3cc1391
Handle unparseable SupervisorSpec in metadata store (#14382)
Changes:
- Skip a supervisor spec entry which cannot be deserialised into a `SupervisorSpec` object.
- Log an error for the unparseable spec
2023-06-13 08:02:01 +05:30
Rishabh Singh 8b212e73d7
Add method to authorize native query using authentication result (#14376) 2023-06-12 11:06:00 +05:30
Kashif Faraz 6e158704cb
Do not retry INSERT task into metadata if max_allowed_packet limit is violated (#14271)
Changes
- Add a `DruidException` which contains a user-facing error message, HTTP response code
- Make `EntryExistsException` extend `DruidException`
- If metadata store max_allowed_packet limit is violated while inserting a new task, throw
`DruidException` with response code 400 (bad request) to prevent retries
- Add `SQLMetadataConnector.isRootCausePacketTooBigException` with impl for MySQL
2023-06-10 12:15:44 +05:30
Abhishek Radhakrishnan 23c2dcaf8d
Add NullHandling module initialization for `LookupDimensionSpecTest` (#14393) 2023-06-09 09:07:32 +05:30
Kashif Faraz 12e8fa5c97
Prevent coordinator from getting stuck if leadership changes during coordinator run (#14385)
Changes:
- Add a timeout of 1 minute to resultFuture.get() in `CostBalancerStrategy.chooseBestServer`.
1 minute is the typical time for a full coordinator run and is more than enough time for cost
computations of a single segment.
- Raise an alert if an exception is encountered while computing costs and if the executor has
not been shutdown. This is because a shutdown is intentional and does not require an alert.
2023-06-08 15:29:20 +05:30
zachjsh 04a82da63d
Input source security fixes (#14266)
It was found that several supported tasks / input sources did not have implementations for the methods used by the input source security feature, causing these tasks and input sources to fail when used with this feature. This pr adds the needed missing implementations. Also securing the sampling endpoint with input source security, when enabled.
2023-06-01 16:37:19 -07:00
Kashif Faraz d4cacebf79
Add tests for CostBalancerStrategy (#14230)
Changes:
- `CostBalancerStrategyTest`
  - Focus on verification of cost computations rather than choosing servers in this test
  - Add new tests `testComputeCost` and `testJointSegmentsCost`
  - Add tests to demonstrate that with a long enough interval gap, all costs become negligible
  - Retain `testIntervalCost` and `testIntervalCostAdditivity`
  - Remove redundant tests such as `testStrategyMultiThreaded`, `testStrategySingleThreaded`as
verification of this behaviour is better suited to `BalancingStrategiesTest`.
- `CostBalancerStrategyBenchmark`
  - Remove usage of static method from `CostBalancerStrategyTest`
  - Explicitly setup cluster and segments to use for benchmarking
2023-05-30 08:52:56 +05:30
Kashif Faraz 8091c6a547
Update default values in CoordinatorDynamicConfig (#14269)
The defaults of the following config values in the `CoordinatorDynamicConfig` are being updated.

1. `maxSegmentsInNodeLoadingQueue = 500` (previous = 100)
2. `replicationThrottleLimit = 500` (previous = 10)
Rationale: With round-robin segment assignment now being the default assignment technique,
the Coordinator can assign a large number of under-replicated/unavailable segments very quickly,
without getting stuck in `RunRules` duty due to very slow strategy-based cost computations.

3. `maxSegmentsToMove = 100` (previous = 5)
Rationale: A very low value (say 5) is ineffective in balancing especially if there are many segments
to balance. A very large value can cause excessive moves, which has these disadvantages:
- Load of moving segments competing with load of unavailable/under-replicated segments
- Unnecessary network costs due to constant download and delete of segments

These defaults will be revisited after #13197 is merged.
2023-05-30 08:51:33 +05:30
Soumyava 22ba457d29
Expr getCacheKey now delegates to children (#14287)
* Expr getCacheKey now delegates to children

* Removed the LOOKUP_EXPR_CACHE_KEY as we do not need it

* Adding an unit test

* Update processing/src/main/java/org/apache/druid/math/expr/Expr.java

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

---------

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2023-05-23 14:49:38 -07:00
imply-cheddar e9fed1445f
Revert PreResponseAuthorizationCheckFilter (#13813)
Make it permissive like it used to be again so that we
ensure that validation errors make it out.
2023-05-18 18:16:43 -07:00
Paul Rogers 3c0983c8e9
Extend the IT framework to allow tests in extensions (#13877)
The "new" IT framework provides a convenient way to package and run integration tests (ITs), but only for core modules. We have a use case to run an IT for a contrib extension: the proposed gRPC query extension. This PR provides the IT framework functionality to allow non-core ITs.
2023-05-15 20:29:51 +05:30
imply-cheddar f9861808bc
Be able to load segments on Peons (#14239)
* Be able to load segments on Peons

This change introduces a new config on WorkerConfig
that indicates how many bytes of each storage
location to use for storage of a task.  Said config
is divided up amongst the locations and slots
and then used to set TaskConfig.tmpStorageBytesPerTask

The Peons use their local task dir and
tmpStorageBytesPerTask as their StorageLocations for
the SegmentManager such that they can accept broadcast
segments.
2023-05-12 16:51:00 -07:00
Kashif Faraz ba11b3d462
Refactor: Add OverlordDuty to replace OverlordHelper and align with CoordinatorDuty (#14235)
Changes:
- Replace `OverlordHelper` with `OverlordDuty` to align with `CoordinatorDuty`
  - Each duty has a `run()` method and defines a `Schedule` with an initial delay and period.
  - Update existing duties `TaskLogAutoCleaner` and `DurableStorageCleaner`
- Add utility class `Configs`
- Update log, error messages and javadocs
- Other minor style improvements
2023-05-12 22:39:56 +05:30
Kashif Faraz 64e6283eca
Do not allow retention rules to be null (#14223)
Changes:
- Do not allow retention rules for any datasource or cluster to be null
- Allow empty rules at the datasource level but not at the cluster level
- Add validation to ensure that `druid.manager.rules.defaultRule` is always set correctly
- Minor style refactors
2023-05-11 14:33:56 +05:30
Clint Wylie a7a4bfd331
modify QueryScheduler to lazily acquire lanes when executing queries to avoid leaks (#14184)
This PR fixes an issue that could occur if druid.query.scheduler.numThreads is configured and any exception occurs after QueryScheduler.run has been called to create a Sequence. This would result in total and/or lane specific locks being acquired, but because the sequence was not actually being evaluated, the "baggage" which typically releases these locks was not being executed. An example of how this can happen is if a group-by having filter, which wraps and transforms this sequence happens to explode while wrapping the sequence. The end result is that the locks are acquired, but never released, eventually halting the ability to execute any queries.
2023-05-08 11:42:05 +05:30
Clint Wylie 90ea192d9c
fix bugs with auto encoded long vector deserializers (#14186)
This PR fixes an issue when using 'auto' encoded LONG typed columns and the 'vectorized' query engine. These columns use a delta based bit-packing mechanism, and errors in the vectorized reader would cause it to incorrectly read column values for some bit sizes (1 through 32 bits). This is a regression caused by #11004, which added the optimized readers to improve performance, so impacts Druid versions 0.22.0+.

While writing the test I finally got sad enough about IndexSpec not having a "builder", so I made one, and switched all the things to use it. Apologies for the noise in this bug fix PR, the only real changes are in VSizeLongSerde, and the tests that have been modified to cover the buggy behavior, VSizeLongSerdeTest and ExpressionVectorSelectorsTest. Everything else is just cleanup of IndexSpec usage.
2023-05-01 11:49:27 +05:30
George Shiqi Wu d0654e2174
Register emitter (#14180) 2023-04-27 18:32:50 -07:00
Suneet Saldanha 84c11df980
Make LoggingEmitter more useful by using Markers (#14121)
* Make LoggingEmitter more useful

* Skip code coverage for facade classes

* fix spellcheck

* code review

* fix dependency

* logging.md

* fix checkstyle

* Add back jacoco version to main pom
2023-04-27 15:06:06 -07:00
Gian Merlino 752475b799
Fix two concurrency issues with segment fetching. (#14042)
* Fix two concurrency issues with segment fetching.

1) SegmentLocalCacheManager: Fix a concurrency issue where certain directory
   cleanup happened outside of directoryWriteRemoveLock. This created the
   possibility that segments would be deleted by one thread, while being
   actively downloaded by another thread.

2) TaskDataSegmentProcessor (MSQ): Fix a concurrency issue when two stages
   in the same process both use the same segment. For example: a self-join
   using distributed sort-merge. Prior to this change, the two stages could
   delete each others' segments.

3) ReferenceCountingResourceHolder: increment() returns a new ResourceHolder,
   rather than a Releaser. This allows it to be passed to callers without them
   having to hold on to both the original ResourceHolder *and* a Releaser.

4) Simplify various interfaces and implementations by using ResourceHolder
   instead of Pair and instead of split-up fields.

* Add test.

* Fix style.

* Remove Releaser.

* Updates from master.

* Add some GuardedBys.

* Use the correct GuardedBy.

* Adjustments.
2023-04-25 20:49:27 -07:00
Abhishek Singh Chouhan 895abd8929
Refresh DruidLeaderClient cache selectively for non-200 responses (#14092)
* Refresh DruidLeaderClient cache for non-200 responses

* Change local variable name to avoid confusion

* Implicit retries for 503 and 504

* Remove unused imports

* Use argumentmatcher instead of Mockito for #any in test

* Remove flag to disable retry for 503/504

* Remove unused import from test

* Add log line for internal retry

---------

Co-authored-by: Abhishek Singh Chouhan <abhishek.chouhan@salesforce.com>
2023-04-20 01:46:54 -07:00
zachjsh 04da0102cb
KillTask should return empty inputSource resources (#14106)
### Description

This pr fixes a few bugs found with the inputSource security feature.

1. `KillUnusedSegmentsTask` previously had no definition for the `getInputSourceResources`, which caused an unsupportedOperationException to be thrown when this task type was submitted with the inputSource security feature enabled. This task type should not require any input source specific resources, so returning an empty set for this task type now.

2. Fixed a bug where when the input source type security feature is enabled, all of the input source type specific resources used where authenticated against:

`{"resource": {"name": "EXTERNAL", "type": "{INPUT_SOURCE_TYPE}"}, "action": "READ"}`

When they should be instead authenticated against:

`{"resource": {"name": "{INPUT_SOURCE_TYPE}", "type": "EXTERNAL"}, "action": "READ"}`

3. fixed bug where supervisor tasks were not authenticated against the specific input source types used, if input source security feature was enabled.
2023-04-18 15:27:16 -04:00
Atul Mohan e3c160f2f2
Add start_time column to sys.servers (#13358)
Adds a new column start_time to sys.servers that captures the time at which the server was added to the cluster.
2023-04-14 15:23:34 +05:30
imply-cheddar aaa6cc1883
Make the tasks run with only a single directory (#14063)
* Make the tasks run with only a single directory

There was a change that tried to get indexing to run on multiple disks
It made a bunch of changes to how tasks run, effectively hiding the
"safe" directory for tasks to write files into from the task code itself
making it extremely difficult to do anything correctly inside of a task.

This change reverts those changes inside of the tasks and makes it so that
only the task runners are the ones that make decisions about which
mount points should be used for storing task-related files.

It adds the config druid.worker.baseTaskDirs which can be used by the
task runners to know which directories they should schedule tasks inside of.
The TaskConfig remains the authoritative source of configuration for where
and how an individual task should be operating.
2023-04-13 00:45:02 -07:00