Commit Graph

4143 Commits

Author SHA1 Message Date
Gian Merlino d87d92bc43
Add system fields to input sources. (#15276)
* Add system fields to input sources.

Main changes:

1) The SystemField enum defines system fields "__file_uri", "__file_path",
   and "__file_bucket". They are associated with each input entity.

2) The SystemFieldInputSource interface can be added to any InputSource
   to make it system-field-capable. It sets up serialization of a list
   of configured "systemFields" in the JSON form of the input source, and
   provides a method getSystemFieldValue for computing the value of each
   system field. Cloud object, HDFS, HTTP, and Local now have this.

* Fix various LocalInputSource calls.

* Fix style stuff.

* Fixups.

* Fix tests and coverage.
2023-11-02 10:31:28 -07:00
Laksh Singla b82ad59dfe
Better logging in ServiceClientImpl (#15269)
ServiceClientImpl logs the cause of every retry, even though we are retrying the connection attempt. This leads to slight pollution in the logs because a lot of the time, the reason for retrying is the same. This is seen primarily in MSQ, when the worker task hasn't launched yet however controller attempts to connect to the worker task, which can lead to scary-looking messages (with INFO log level), even though they are normal.
This PR changes the logging logic to log every 10 (arbitrary number) retries instead of every retry, to reduce the pollution of the logs.
Note: If there are no retries left, the client returns an exception, which would get thrown up by the caller, and therefore this change doesn't hide any important information.
2023-11-02 11:32:49 +05:30
Gian Merlino 6b6d73b5d4
Use min of scheduler threads and server threads for subquery guardrails. (#15295)
* Use min of scheduler threads and server threads for subquery guardrails.

This allows more memory to be used for subqueries when the query scheduler
is configured to limit queries below the number of server threads. The patch
also refactors the code so SubqueryGuardrailHelper is provided by a Guice
Provider rather than being created by ClientQuerySegmentWalker, to achieve
better separation of concerns.

* Exclude provider from coverage.
2023-11-01 22:34:53 -07:00
Gian Merlino 37e158c2c4
Frames: consider writing singly-valued column when input column hasMultipleValues is UNKNOWN. (#15300)
* Frames: consider writing singly-valued column when input column hasMultipleValues is UNKNOWN.

Prior to this patch, columnar frames would always write multi-valued columns if
the input column had hasMultipleValues = UNKNOWN. This had the effect of flipping
UNKNOWN to TRUE when copying data into frames, which is problematic because TRUE
causes expressions to assume that string inputs must be treated as arrays.

We now avoid this by flipping UNKNOWN to FALSE if no multi-valuedness
is encountered, and flipping it to TRUE if multi-valuedness is encountered.

* Add regression test case.
2023-11-01 22:05:53 -07:00
Suneet Saldanha e6b7c36e74
LoadRules with 0 replicas should be treated as handoff complete (#15274)
* LoadRules with 0 replicas should be treated as handoff complete

* fix it

* pr feedback

* fixit
2023-10-30 10:42:58 -07:00
George Shiqi Wu 3173093415
Handle status failures for streaming supervisors (#15174)
* Cleanup logic

* newline

* remove whitespace

* Fix log message

* Add test class

* PR changes
2023-10-30 10:21:23 -07:00
kaisun2000 60c2ad597a
Enhance json parser error logging to better track Istio Proxy error message (#15176)
Currently the inter Druid communication via rest endpoints is based on json formatted payload. Upon parsing error, there is only a generic exception stating expected json token type and current json token type. There is no detailed error log about the content of the payload causing the violation.

In the micro-service world, the trend is to deploy the Druid servers in k8 with the mesh network. Often the istio proxy or other proxies is used to intercept the network connection between Druid servers. The proxy may give error messages for various reasons. These error messages are not expected by the json parser. The generic error message from Druid can be very misleading as the user may think the message is based on the response from the other Druid server.

For example, this is an example of mysterious error message

QueryInterruptedException{msg=Next token wasn't a START_ARRAY, was[VALUE_STRING] from url[http://xxxxx:8088/druid/v2/], code=Unknown exception, class=org.apache.druid.java.util.common.IAE, host=xxxxx:8088}"

While the context of the message is the following from the proxy when it can't tunnel the network connection.

pstream connect error or disconnect/reset before header

So this very simple PR is just to enhance the logging and get the real underlying message printed out. This would save a lot of head scratching time if Druid is deployed with mesh network.

Co-authored-by: Kai Sun <kai.sun@salesforce.com>
2023-10-27 14:20:19 +05:30
AmatyaAvadhanula 65b69cded4
Filter pending segments upgraded with transactional replace (#15169)
* Filter pending segments upgraded with transactional replace

* Push sequence name filter to metadata query
2023-10-23 21:18:47 +05:30
AmatyaAvadhanula a8febd457c
A Replacing task must read segments created before it acquired its lock (#15085)
* Replacing tasks must read segments created before they acquired their locks
2023-10-19 11:13:07 +05:30
AmatyaAvadhanula d25caaefa4
Add support for streaming ingestion with concurrent replace (#15039)
Add support for streaming ingestion with concurrent replace

---------

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2023-10-13 09:09:03 +05:30
Laksh Singla 5f86072456
Prepare master for Druid 29 (#15121)
Prepare master for Druid 29
2023-10-11 10:33:45 +05:30
Laksh Singla 95bf331c08
Rename the default setting of 'maxSubqueryBytes' from 'unlimited' to 'disabled' (#15108)
The default setting of 'maxSubqueryBytes' is renamed from 'unlimited' to 'disabled'.
2023-10-10 02:03:29 +05:30
Adarsh Sanjeev 7a35ce886d
Add ability for MSQ tasks to query realtime tasks (#15024)
This PR aims to add the capabilities to:
1. Fetch the realtime segment metadata from the coordinator server view,
2. Adds the ability for workers to query indexers, similar to how brokers do the same for native queries.
2023-10-09 15:14:03 +05:30
kaisun2000 e2cc1c4ad1
Add metric -- count of queries waiting for merge buffers (#15025)
Add 'mergeBuffer/pendingRequests' metric that exposes the count of waiting queries (threads) blocking in the merge buffers pools.
2023-10-09 12:56:23 +05:30
Pranav f1edd671fb
Exposing optional replaceMissingValueWith in lookup function and macros (#14956)
* Exposing optional replaceMissingValueWith in lookup function and macros

* args range validation

* Updating docs

* Addressing comments

* Update docs/querying/sql-scalar.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Update docs/querying/sql-functions.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Addressing comments

---------

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2023-10-02 17:09:23 -07:00
Parth Agrawal d038237ece
memcached cache: switch to AWS elasticache-java-cluster-client and add TLS support (#14827)
This PR updates the library used for Memcached client to AWS Elasticache Client : https://github.com/awslabs/aws-elasticache-cluster-client-memcached-for-java

This enables us to use the option of encrypting data in transit:
Amazon ElastiCache for Memcached now supports encryption of data in transit

For clusters running the Memcached engine, ElastiCache supports Auto Discovery—the ability for client programs to automatically identify all of the nodes in a cache cluster, and to initiate and maintain connections to all of these nodes.
Benefits of Auto Discovery - Amazon ElastiCache

AWS has forked spymemcached 2.12.1, and has since added all the patches included in 2.12.2 and 2.12.3 as part of the 1.2.0 release. So, this can now be considered as an equivalent drop-in replacement.

GitHub - awslabs/aws-elasticache-cluster-client-memcached-for-java: Amazon ElastiCache Cluster Client for Java - enhanced library to connect to ElastiCache clusters.
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/elasticache/AmazonElastiCacheClient.html#AmazonElastiCacheClient--

How to enable TLS with Elasticache

On server side:
https://docs.aws.amazon.com/AmazonElastiCache/latest/mem-ug/in-transit-encryption-mc.html#in-transit-encryption-enable-existing-mc

On client side:
GitHub - awslabs/aws-elasticache-cluster-client-memcached-for-java: Amazon ElastiCache Cluster Client for Java - enhanced library to connect to ElastiCache clusters.
2023-10-02 12:51:05 -07:00
Gian Merlino 3dabfead05
Fix getResultType for HLL, quantiles aggregators. (#15043)
The aggregators had incorrect types for getResultType when shouldFinalze
is false. They had the finalized type, but they should have had the
intermediate type.

Also includes a refactor of how ExprMacroTable is handled in tests, to make
it easier to add tests for this to the MSQ module. The bug was originally
noticed because the incorrect result types caused MSQ queries with DS_HLL
to behave erratically.
2023-09-27 08:51:14 +05:30
AmatyaAvadhanula c62193c4d7
Add support for concurrent batch Append and Replace (#14407)
Changes:
- Add task context parameter `taskLockType`. This determines the type of lock used by a batch task.
- Add new task actions for transactional replace and append of segments
- Add methods StorageCoordinator.commitAppendSegments and commitReplaceSegments
- Upgrade segments to appropriate versions when performing replace and append
- Add new metadata table `upgradeSegments` to track segments that need to be upgraded
- Add tests
2023-09-25 07:06:37 +05:30
Kashif Faraz d7c152c82c
Add a TaskReport for "kill" tasks (#15023)
- Add `KillTaskReport` that contains stats for `numSegmentsKilled`,
`numBatchesProcessed`, `numSegmentsMarkedAsUnused`
- Fix bug where exception message had no formatter but was was still being passed some args.
- Add some comments regarding deprecation of `markAsUnused` flag.
2023-09-23 07:44:27 +05:30
Kashif Faraz 409bffe7f2
Rename IMSC.announceHistoricalSegments to commitSegments (#15021)
This commit pulls out some changes from #14407 to simplify that PR.

Changes:
- Rename `IndexerMetadataStorageCoordinator.announceHistoricalSegments` to `commitSegments`
- Rename the overloaded method to `commitSegmentsAndMetadata`
- Fix some typos
2023-09-21 16:19:03 +05:30
Laksh Singla 82e809c8d0
fix (#15017) 2023-09-20 15:48:26 -07:00
Zoltan Haindrich 79f882f48c
Fix exception cause logging in QueryResultPusher (#14975) 2023-09-20 15:44:02 +05:30
Laksh Singla 4c57504960
Fix the uncaught exceptions when materializing results as frames (#14970)
When materializing the results as frames, we defer the creation of the frames in ScanQueryQueryToolChest, which passes through the catch-all block reserved for catching cases when we don't have the complete row signature in the query (and falls back to the old code).
This PR aims to resolve it by adding the frame generation code to the try-catch block we have at the outer level.
2023-09-13 15:41:28 +05:30
Kashif Faraz 286eecad7c
Simplify DruidCoordinatorConfig and binding of metadata cleanup duties (#14891)
Changes:
- Move following configs from `CliCoordinator` to `DruidCoordinatorConfig`:
  - `druid.coordinator.kill.on`
  - `druid.coordinator.kill.pendingSegments.on`
  - `druid.coordinator.kill.supervisors.on`
  - `druid.coordinator.kill.rules.on`
  - `druid.coordinator.kill.audit.on`
  - `druid.coordinator.kill.datasource.on`
  - `druid.coordinator.kill.compaction.on`
- In the Coordinator style used by historical management duties, always instantiate all
 the metadata cleanup duties but execute only if enabled. In the existing code, they are
instantiated only when enabled by using optional binding with Guice.
- Add a wrapper `MetadataManager` which contains handles to all the different
metadata managers for rules, supervisors, segments, etc.
- Add a `CoordinatorConfigManager` to simplify read and update of coordinator configs
- Remove persistence related methods from `CoordinatorCompactionConfig` and
`CoordinatorDynamicConfig` as these are config classes.
- Remove annotations `@CoordinatorIndexingServiceDuty`,
`@CoordinatorMetadataStoreManagementDuty`
2023-09-13 09:06:57 +05:30
Clint Wylie 891f0a3fe9
longer compatibility window for nested column format v4 (#14955)
changes:
* add back nested column v4 serializers
* 'json' schema by default still uses the newer 'nested common format' used by 'auto', but now has an optional 'formatVersion' property which can be specified to override format versions on native ingest jobs
* add system config to specify default column format stuff, 'druid.indexing.formats', and property 'druid.indexing.formats.nestedColumnFormatVersion' to specify system level preferred nested column format for friendly rolling upgrades from versions which do not support the newer 'nested common format' used by 'auto'
2023-09-12 14:07:53 -07:00
Kashif Faraz 7871e633c6
Fix bug in KillStalePendingSegments (#14961) 2023-09-11 15:18:15 +05:30
Kashif Faraz 647686aee2
Add test and metrics for KillStalePendingSegments duty (#14951)
Changes:
- Add new metric `kill/pendingSegments/count` with dimension `dataSource`
- Add tests for `KillStalePendingSegments`
- Reduce no-op logs that spit out for each datasource even when no pending
segments have been deleted. This can get particularly noisy at low values of `indexingPeriod`.
- Refactor the code in `KillStalePendingSegments` for readability and add javadocs
2023-09-08 10:33:47 +05:30
Kashif Faraz 88f3c9baed
Fix bug in computed value of balancerComputeThreads (#14947)
In smartSegmentLoading mode, use computed value of balancerComputeThreads
rather than configured value.
2023-09-07 01:14:05 +05:30
Laksh Singla 6ee0b06e38
Auto configuration for maxSubqueryBytes (#14808)
A new monitor SubqueryCountStatsMonitor which emits the metrics corresponding to the subqueries and their execution is now introduced. Moreover, the user can now also use the auto mode to automatically set the number of bytes available per query for the inlining of its subquery's results.
2023-09-06 05:47:19 +00:00
Adarsh Sanjeev 959148ad37
Add code to wait for segments generated to be loaded on historicals (#14322)
Currently, after an MSQ query, the web console is responsible for waiting for the segments to load. It does so by checking if there are any segments loading into the datasource ingested into, which can cause some issues, like in cases where the segments would never be loaded, or would end up waiting for other ingests as well.

This PR shifts this responsibility to the controller, which would have the list of segments created.
2023-09-06 10:35:57 +05:30
Kashif Faraz ec630e3671
Remove deprecated coordinator dynamic configs (#14923)
Changes:

[A] Remove config `decommissioningMaxPercentOfMaxSegmentsToMove`
- It is a complicated config 😅 , 
- It is always desirable to prioritize move from decommissioning servers so that
they can be terminated quickly, so this should always be 100%
- It is already handled by `smartSegmentLoading` (enabled by default)

[B] Remove config `maxNonPrimaryReplicantsToLoad`
This was added in #11135 to address two requirements:
- Prevent coordinator runs from getting stuck assigning too many segments to historicals
- Prevent load of replicas from competing with load of unavailable segments

Both of these requirements are now already met thanks to:
- Round-robin segment assignment
- Prioritization in the new coordinator
- Modifications to `replicationThrottleLimit`
- `smartSegmentLoading` (enabled by default)
2023-09-04 11:54:36 +05:30
Kashif Faraz 7f26b80e21
Simplify ServiceMetricEvent.Builder (#14933)
Changes:
- Make ServiceMetricEvent.Builder extend ServiceEventBuilder<ServiceMetricEvent>
and thus convert it to a plain builder rather than a builder of builder.
- Add methods setCreatedTime , setMetricAndValue to the builder
2023-09-01 11:30:45 +05:30
Clint Wylie dea9d4f1a7
cleaning DruidProcessingConfig bindings (#14927) 2023-08-30 22:35:08 -07:00
Kashif Faraz 8263f0d1e9
Reduce coordinator logs when operating normally (#14926)
Changes:
- Reduce log level of some coordinator stats, which only denote normal coordinator operation.
These stats are still emitted and can be logged by setting debugDimensions in the coordinator
dynamic config.
- Initialize SegmentLoadingConfig only for historical management duties. This config is not
needed in other duties and initializing it creates logs which are misleading.
2023-08-30 11:30:38 +05:30
Kashif Faraz d6565f46b0
Increase the computed value of replicationThrottleLimit (#14913)
Changes
- Increase value of `replicationThrottleLimit` computed by `smartSegmentLoading` from
2% to 5% of total number of used segments.
- Assign replicas to a tier even when some replicas are already being loaded in that tier
- Limit the total number of replicas in load queue at start of run + replica assignments in
the run to the `replicationThrottleLimit`.

i.e. for every tier,
    num loading replicas at start of run + num replicas assigned in run <= replicationThrottleLimit
2023-08-28 18:20:22 +05:30
Karan Kumar 9fcbf05c5d
Adjusting `SqlStatementResource` and `SqlTaskResource` to set request attribute via a new method. (#14878) 2023-08-26 10:59:47 +00:00
Kashif Faraz e51181957c
Use num cores to determine balancerComputeThreads (#14902)
Changes:
- Determine the default value of balancerComputeThreads based on number of
coordinator cpus rather than number of segments. Even if the number of segments
is low and we create more balancer threads, it doesn't hurt the system as threads
would mostly be idle.
- Remove unused field from SegmentLoadQueueManager

Expected values:
- Clusters with ~1M segments typically work with Coordinators having 16 cores or more.
This would give us 8 balancer threads, which is the same as the current maximum.
- On small clusters, even a single thread is enough to do the required balancing work.
2023-08-25 08:15:27 +05:30
Clint Wylie 36e659a501
remove group-by v1 (#14866)
* remove group-by v1

* docs

* remove unused configs, fix test

* fix test

* adjustments

* why not

* adjust

* review stuff
2023-08-23 12:44:06 -07:00
zachjsh 0c76df1c7d
Enable Continuous auto kill (#14831)
### Description

This change enables the `KillUnusedSegments` coordinator duty to be scheduled continuously. Things that prevented this, or made this difficult before were the following:

1. If scheduled at fast enough rate, the duty would find the same intervals to kill for the same datasources, while kill tasks submitted for those same datasources and intervals were already underway, thus wasting task slots on duplicated work.

2. The task resources used by auto kill were previously unbounded.  Each duty run period, if unused
 segments were found for any datasource, a kill task would be submitted to kill them.

This pr solves for both of these issues:

1. The duty keeps track of the end time of the last interval found when killing unused segments for each datasource, in a in memory map. The end time for each datasource, if found, is used as the start time lower bound, when searching for unused intervals for that same datasource. Each duty run, we remove any datasource keys from this map that are no longer found to match datasources in the system, or in whitelist, and also remove a datasource entry, if there is found to be no unused segments for the datasource, which happens when we fail to find an interval which includes unused segments. Removing the datasource entry from the map,  allows for searching for unusedSegments in the datasource from the beginning of time once again

2. The unbounded task resource usage can be mitigated with coordinator dynamic config added as part of ba957a9b97


Operators can configure continous auto kill by providing coordinator runtime properties similar to the following:

```
druid.coordinator.period.indexingPeriod=PT60S
druid.coordinator.kill.period=PT60S
```

And providing sensible limits to the killTask usage via coordinator dynamic properties.
2023-08-23 09:23:08 -04:00
Adarsh Sanjeev dfb5a98888
Add coordinator API for unused segments (#14846)
There is a current issue due to inconsistent metadata between worker and controller in MSQ. A controller can receive one set of segments, which are then marked as unused by, say, a compaction job. The worker would be unable to get the segment information as MetadataResource.
2023-08-23 14:51:25 +05:30
Kashif Faraz 9376d8d6e1
Refactor: Move `UpdateCoordinatorStateAndPrepareCluster` duty out of `DruidCoordinator` (#14845)
Motivation:
- Clean up `DruidCoordinator` and move methods to classes where they are most relevant

Changes:
- No functional change
- Add duty `PrepareBalancerAndLoadQueues` to replace `UpdateCoordinatorState`
- Move map of `LoadQueuePeon` from `DruidCoordinator` to `LoadQueueTaskMaster`
- Make `BalancerStrategyFactory` an abstract class and keep the balancer executor here
- Move reporting of used segment stats and historical capacity stats from
`CollectSegmentAndServerStats` to `PrepareBalancerAndLoadQueues`
- Move reporting of unavailable and under-replicated segment stats from
`CollectSegmentAndServerStats` to `UpdateReplicationStatus` duty
2023-08-22 19:50:41 +05:30
Tejaswini Bandlamudi d87056e708
Upgrade guava version to 31.1-jre (#14767)
Currently, Druid is using Guava 16.0.1 version. This upgrade to 31.1-jre fixes the following issues.

CVE-2018-10237 (Unbounded memory allocation in Google Guava 11.0 through 24.x before 24.1.1 allows remote attackers to conduct denial of service attacks against servers that depend on this library and deserialize attacker-provided data because the AtomicDoubleArray class (when serialized with Java serialization) and the CompoundOrdering class (when serialized with GWT serialization) perform eager allocation without appropriate checks on what a client has sent and whether the data size is reasonable). We don't use Java or GWT serializations. Despite being false positive they're causing red security scans on Druid distribution.
Latest version of google-client-api is incompatible with the existing Guava version. This PR unblocks Update google client apis to latest version #14414
2023-08-22 12:09:53 +05:30
Kashif Faraz 92906059d2
Remove segmentsToBeDropped from SegmentTransactionInsertAction (#14883)
Motivation:
- There is no usage of the `SegmentTransactionInsertAction` which passes a
non-null non-empty value of `segmentsToBeDropped`.
- This is not really needed either as overshadowed segments are marked as unused
by the Coordinator and need not be done in the same transaction as committing segments.
- It will also help simplify the changes being made in #14407 

Changes:
- Remove `segmentsToBeDropped` from the task action and all intermediate methods
- Remove related tests which are not needed anymore
2023-08-21 20:08:56 +05:30
Kashif Faraz c211dcc4b3
Clean up compaction logs on coordinator (#14875)
Changes:
- Move logic of `NewestSegmentFirstIterator.needsCompaction` to `CompactionStatus`
to improve testability and readability
- Capture the list of checks performed to determine if compaction is needed in a readable
manner in `CompactionStatus.CHECKS`
- Make `CompactionSegmentIterator` iterate over instances of `SegmentsToCompact`
instead of `List<DataSegment>`. This allows use of the `umbrellaInterval` later.
- Replace usages of `QueueEntry` with `SegmentsToCompact`
- Move `SegmentsToCompact` out of `NewestSegmentFirstIterator`
- Simplify `CompactionStatistics`
- Reduce level of less important logs to debug
- No change made to tests to ensure correctness
2023-08-21 17:30:41 +05:30
Kashif Faraz 07a193a142
Use separate executor for each coordinator duty group (#14869)
Changes:
- Use separate executor for every duty group
- This change is thread-safe as every duty group uses its own copy of
`DruidCoordinatorRuntimeParams` and does not share any other mutable instances
with other duty groups.
- With the exception of `HistoricalManagementDuties`, duty groups are typically not
very compute intensive and mostly perform database or HTTP I/O. So, coordinator
resources would still mostly be available for `HistoricalManagementDuties`.
2023-08-21 15:53:22 +05:30
Abhishek Agarwal 9065ef1aff
Fix a bug in QosFilter (#14859)
QoSFilter class is trying to parse the timeout as an integer. We need to round a value of query timeout that is higher than INT.MAX to INT.MAX.
2023-08-21 13:00:41 +05:30
Kashif Faraz 097b645005
Clean up after add kill bufferPeriod (#14868)
Follow up changes to #12599 

Changes:
- Rename column `used_flag_last_updated` to `used_status_last_updated`
- Remove new CLI tool `UpdateTables`.
  - We already have a `CreateTables` with similar functionality, which should be able to
  handle update cases too.
  - Any user running the cluster for the first time should either just have `connector.createTables`
  enabled or run `CreateTables` which should create tables at the latest version.
  - For instance, the `UpdateTables` tool would be inadequate when a new metadata table has
  been added to Druid, and users would have to run `CreateTables` anyway.
- Remove `upgrade-prep.md` and include that info in `metadata-init.md`.
- Fix log messages to adhere to Druid style
- Use lambdas
2023-08-19 00:00:04 +05:30
Lucas Capistrant 9c124f2cde
Add a configurable bufferPeriod between when a segment is marked unused and deleted by KillUnusedSegments duty (#12599)
* Add new configurable buffer period to create gap between mark unused and kill of segment

* Changes after testing

* fixes and improvements

* changes after initial self review

* self review changes

* update sql statement that was lacking last_used

* shore up some code in SqlMetadataConnector after self review

* fix derby compatibility and improve testing/docs

* fix checkstyle violations

* Fixes post merge with master

* add some unit tests to improve coverage

* ignore test coverage on new UpdateTools cli tool

* another attempt to ignore UpdateTables in coverage check

* change column name to used_flag_last_updated

* fix a method signature after column name switch

* update docs spelling

* Update spelling dictionary

* Fixing up docs/spelling and integrating altering tasks table with my alteration code

* Update NULL values for used_flag_last_updated in the background

* Remove logic to allow segs with null used_flag_last_updated to be killed regardless of bufferPeriod

* remove unneeded things now that the new column is automatically updated

* Test new background row updater method

* fix broken tests

* fix create table statement

* cleanup DDL formatting

* Revert adding columns to entry table by default

* fix compilation issues after merge with master

* discovered and fixed metastore inserts that were breaking integration tests

* fixup forgotten insert by using pattern of sharing now timestamp across columns

* fix issue introduced by merge

* fixup after merge with master

* add some directions to docs in the case of segment table validation issues
2023-08-17 19:32:51 -05:00
Abhishek Radhakrishnan 37db5d9b81
Reset offsets supervisor API (#14772)
* Add supervisor /resetOffsets API.

- Add a new endpoint /druid/indexer/v1/supervisor/<supervisorId>/resetOffsets
which accepts DataSourceMetadata as a body parameter.
- Update logs, unit tests and docs.

* Add a new interface method for backwards compatibility.

* Rename

* Adjust tests and javadocs.

* Use CoreInjectorBuilder instead of deprecated makeInjectorWithModules

* UT fix

* Doc updates.

* remove extraneous debugging logs.

* Remove the boolean setting; only ResetHandle() and resetInternal()

* Relax constraints and add a new ResetOffsetsNotice; cleanup old logic.

* A separate ResetOffsetsNotice and some cleanup.

* Minor cleanup

* Add a check & test to verify that sequence numbers are only of type SeekableStreamEndSequenceNumbers

* Add unit tests for the no op implementations for test coverage

* CodeQL fix

* checkstyle from merge conflict

* Doc changes

* DOCUSAURUS code tabs fix. Thanks, Brian!
2023-08-17 14:13:10 -07:00
Kashif Faraz fffb2e4fe7
Speed up SQLMetadataStorageActionHandlerTest (#14856)
Changes
- Reduce test time of `SQLMetadataStorageActionHandlerTest.testMigration`
- Slightly modify log messages to adhere to Druid style
2023-08-17 18:02:43 +05:30