Commit Graph

2127 Commits

Author SHA1 Message Date
Gian Merlino 450ecd6370
More efficient generation of ImmutableWorkerHolder from WorkerHolder. (#14546)
* More efficient generation of ImmutableWorkerHolder from WorkerHolder.

Taking the work done in #12096 a little further:

1) Applying a similar optimization to WorkerHolder (HttpRemoteTaskRunner).
   The original patch only helped with the ZkWorker (RemoteTaskRunner).

2) Improve the ZkWorker version somewhat by avoiding multiple iterations
   through the task announcements map.

* Pick better names and use better logic.

* Only runnable tasks.

* Fix test.

* Fix testBlacklistZKWorkers50Percent.
2023-07-13 07:57:16 -07:00
Gian Merlino 63ee69b4e8
Claim full support for Java 17. (#14384)
* Claim full support for Java 17.

No production code has changed, except the startup scripts.

Changes:

1) Allow Java 17 without DRUID_SKIP_JAVA_CHECK.

2) Include the full list of opens and exports on both Java 11 and 17.

3) Document that Java 17 is both supported and preferred.

4) Switch some tests from Java 11 to 17 to get better coverage on the
   preferred version.

* Doc update.

* Update errorprone.

* Update docker_build_containers.sh.

* Update errorprone in licenses.yaml.

* Add some more run-javas.

* Additional run-javas.

* Update errorprone.

* Suppress new errorprone error.

* Add exports and opens in ForkingTaskRunner for Java 11+.

Test, doc changes.

* Additional errorprone updates.

* Update for errorprone.

* Restore old fomatting in LdapCredentialsValidator.

* Copy bin/ too.

* Fix Java 15, 17 build line in docker_build_containers.sh.

* Update busybox image.

* One more java command.

* Fix interpolation.

* IT commandline refinements.

* Switch to busybox 1.34.1-glibc.

* POM adjustments, build and test one IT on 17.

* Additional debugging.

* Fix silly thing.

* Adjust command line.

* Add exports and opens one more place.

* Additional harmonization of strong encapsulation parameters.
2023-07-07 12:52:35 -07:00
Gian Merlino 021a01df45
RTR, HRTR: Fix incorrect maxLazyWorkers check in markLazyWorkers. (#14545)
Recently #14532 fixed a problem when maxLazyWorkers == 0 and lazyWorkers
starts out empty. Unfortunately, even after that patch, there remained
a more general version of this problem when maxLazyWorkers == lazyWorkers.size().
This patch fixes it.

I'm not sure if this would actually happen in production, because the
provisioning strategies do try to avoid calling markWorkersLazy until
previously-initiated terminations have finished. Nevertheless, it still
seems like a good thing to fix.
2023-07-07 10:08:12 -07:00
Kashif Faraz 40d0dc9e0e
Use separate executor to handle task updates in TaskQueue (#14533)
Description:
`TaskQueue.notifyStatus` is often a heavy call as it performs the following operations:
- Update task status in metadata DB
- Update task locks in metadata DB
- Request (synchronously) the task runner to shutdown the completed task
- Clean up in-memory data structures

This method can often be slow and can cause worker sync / task runners to slow down.

Main changes:
- Run task completion callbacks in a separate executor to handle task completion updates
- Add new config `druid.indexer.queue.taskCompleteHandlerNumThreads`
- Add metrics to monitor number of processed and queued items
- There are still other paths that can invoke `notifyStatus`, but those need not be moved to
the new executor as they are synchronous on purpose.

Other changes:
- Add new metrics `task/status/queue/count`, `task/status/handled/count`
- Add `TaskCountStatsProvider.getStats()` which deprecates the other `getXXXTaskCount` methods.
- Use `CoordinatorRunStats` to collect and report metrics. This class has been used as is
for now but will later be renamed and repurposed to use across all Druid services.
2023-07-07 20:43:12 +05:30
Gian Merlino 1fe61bc869
ChangeRequestHttpSyncer: Don't wait 1ms when checking isInitialized(). (#14547)
The wait doesn't seem to serve a purpose, other than causing delays
when checking isInitialized() for a large number of things that have
not yet been initialized.
2023-07-07 05:54:39 -07:00
Kashif Faraz d63eff3b1b
Reduce contention in HttpRemoteTaskRunner.getKnownTasks() (#14541) 2023-07-07 13:43:59 +05:30
Gian Merlino 037f09bef2
HttpRemoteTaskRunner: Fix markLazyWorkers for maxLazyWorkers == 0. (#14532) 2023-07-06 11:51:04 -07:00
Kashif Faraz 87bb1b9709
Fix bug during initialization of HttpServerInventoryView (#14517)
If a server is removed during `HttpServerInventoryView.serverInventoryInitialized`,
the initialization gets stuck as this server is never synced. The method eventually times
out (default 250s).

Fix: Mark a server as stopped if it is removed. `serverInventoryInitialized` only waits for
non-stopped servers to sync.

Other changes:
- Add new metrics for better debugging of slow broker/coordinator startup
  - `segment/serverview/sync/healthy`: whether the server view is syncing properly with a server
  - `segment/serverview/sync/unstableTime`: time for which sync with a server has been unstable  
- Clean up logging in `HttpServerInventoryView` and `ChangeRequestHttpSyncer`
- Minor refactor for readability
- Add utility class `Stopwatch`
- Add tests and stubs
2023-07-06 13:04:53 +05:30
AmatyaAvadhanula 609833c97b
Do not emit negative lag because of stale offsets (#14292)
The latest topic offsets are polled frequently and used to determine the lag based on the current offsets. However, when the offsets are stale (which can happen due to connection issues commonly), we may see a negative lag .

This PR prevents emission of metrics when the offsets are stale and at least one of the partitions has a negative lag.
2023-07-05 14:44:23 +05:30
Clint Wylie 277aaa5c57
remove druid.processing.columnCache.sizeBytes and CachingIndexed, combine string column implementations (#14500)
* combine string column implementations
changes:
* generic indexed, front-coded, and auto string columns now all share the same column and index supplier implementations
* remove CachingIndexed implementation, which I think is largely no longer needed by the switch of many things to directly using ByteBuffer, avoiding the cost of creating Strings
* remove ColumnConfig.columnCacheSizeBytes since CachingIndexed was the only user
2023-07-02 19:37:15 -07:00
Karan Kumar cb3a9d2b57
Adding Interactive API's for MSQ engine (#14416)
This PR aims to expose a new API called
"@path("/druid/v2/sql/statements/")" which takes the same payload as the current "/druid/v2/sql" endpoint and allows users to fetch results in an async manner.
2023-06-28 17:51:58 +05:30
Clint Wylie 31b9d5695d
Extend InitializedNullHandlingTest instead of NullHandlingTest (#14467)
NullHandlingTest is an actual test, it shouldn't be used as a base class
2023-06-22 15:01:50 +05:30
imply-cheddar cfd07a95b7
Errors take 3 (#14004)
Introduce DruidException, an exception whose goal in life is to be delivered to a user.

DruidException itself has javadoc on it to describe how it should be used.  This commit both introduces the Exception and adjusts some of the places that are generating exceptions to generate DruidException objects instead, as a way to show how the Exception should be used.

This work was a 3rd iteration on top of work that was started by Paul Rogers.  I don't know if his name will survive the squash-and-merge, so I'm calling it out here and thanking him for starting on this.
2023-06-19 01:11:13 -07:00
George Shiqi Wu 64af9bfe5b
Add groupId to metrics (#14402)
* Add group id as a dimension

* Revert changes

* Add to forking task runner

* Add missing metrics

* Fix indenting

* revert metrics

* Fix indentation
2023-06-16 09:28:16 -07:00
Gian Merlino 85656a467c
MSQ: Load broadcast tables on workers. (#14437)
They were not previously loaded because supportsQueries was false.
This patch sets supportsQueries to true, and clarifies in Task
javadocs that supportsQueries can be true for tasks that aren't
directly queryable over HTTP.
2023-06-16 12:02:20 +05:30
Clint Wylie 8454cc619a
auto columns fixes (#14422)
changes:
* auto columns no longer participate in generic 'null column' handling, this was a mistake to try to support and caused ingestion failures due to mismatched ColumnFormat, and will be replaced in the future with nested common format constant column functionality (not in this PR)
* fix bugs with auto columns which contain empty objects, empty arrays, or primitive types mixed with either of these empty constructs
* fix bug with bound filter when upper is null equivalent but is strict
2023-06-14 08:57:06 -07:00
Kashif Faraz 6e158704cb
Do not retry INSERT task into metadata if max_allowed_packet limit is violated (#14271)
Changes
- Add a `DruidException` which contains a user-facing error message, HTTP response code
- Make `EntryExistsException` extend `DruidException`
- If metadata store max_allowed_packet limit is violated while inserting a new task, throw
`DruidException` with response code 400 (bad request) to prevent retries
- Add `SQLMetadataConnector.isRootCausePacketTooBigException` with impl for MySQL
2023-06-10 12:15:44 +05:30
Harini Rajendran 4ff6026d30
Adding SegmentMetadataEvent and publishing them via KafkaEmitter (#14281)
In this PR, we are enhancing KafkaEmitter, to emit metadata about published segments (SegmentMetadataEvent) into a Kafka topic. This segment metadata information that gets published into Kafka, can be used by any other downstream services to query Druid intelligently based on the segments published. The segment metadata gets published into kafka topic in json string format similar to other events.
2023-06-02 21:28:26 +05:30
Andreas Maechler 45014bd5b4
Handle all types of exceptions when initializing input source in sampler API (#14355)
The sampler API returns a `400 bad request` response if it encounters a `SamplerException`.
Otherwise, it returns a generic `500 Internal server error` response, with the message
"The RuntimeException could not be mapped to a response, re-throwing to the HTTP container".

This commit updates `RecordSupplierInputSource` to handle all types of exceptions instead of just
`InterruptedException`and wrap them in a `SamplerException` so that the actual error is
propagated back to the user.
2023-06-02 19:43:53 +05:30
zachjsh 04a82da63d
Input source security fixes (#14266)
It was found that several supported tasks / input sources did not have implementations for the methods used by the input source security feature, causing these tasks and input sources to fail when used with this feature. This pr adds the needed missing implementations. Also securing the sampling endpoint with input source security, when enabled.
2023-06-01 16:37:19 -07:00
Rishabh Singh 2086ff88bc
Add logging for task stop operations (#14192)
Log more details when task cannot be stopped for various reasons
2023-05-30 18:50:52 +05:30
Alexander Saydakov 4131c0df13
use the latest datasketches-java-4.0.0 (#14334)
* use the latest datasketches-java-4.0.0

* updated versions of datasketches

* adjusted expectation

* fixed the expectations

---------

Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
2023-05-27 22:19:18 -07:00
Kashif Faraz 0cde3a8b52
Fix regression in batch segment allocation (#14337)
* Improve batch segment allocation logs

* Fix batch seg alloc regression

* Fix logs

* Fix logs

* Fix tests and logs
2023-05-25 22:34:54 -07:00
AmatyaAvadhanula e9913abbbf
Add new lock types: APPEND and REPLACE (#14258)
* Add new lock types: APPEND and REPLACE
2023-05-14 22:38:32 -07:00
imply-cheddar f9861808bc
Be able to load segments on Peons (#14239)
* Be able to load segments on Peons

This change introduces a new config on WorkerConfig
that indicates how many bytes of each storage
location to use for storage of a task.  Said config
is divided up amongst the locations and slots
and then used to set TaskConfig.tmpStorageBytesPerTask

The Peons use their local task dir and
tmpStorageBytesPerTask as their StorageLocations for
the SegmentManager such that they can accept broadcast
segments.
2023-05-12 16:51:00 -07:00
Kashif Faraz ba11b3d462
Refactor: Add OverlordDuty to replace OverlordHelper and align with CoordinatorDuty (#14235)
Changes:
- Replace `OverlordHelper` with `OverlordDuty` to align with `CoordinatorDuty`
  - Each duty has a `run()` method and defines a `Schedule` with an initial delay and period.
  - Update existing duties `TaskLogAutoCleaner` and `DurableStorageCleaner`
- Add utility class `Configs`
- Update log, error messages and javadocs
- Other minor style improvements
2023-05-12 22:39:56 +05:30
AmatyaAvadhanula 47e48ee657
Remove incorrect optimization (#14246) 2023-05-11 00:54:41 -07:00
Clint Wylie e833a4700d
suppress hadoop3 cve that seem not applicable to us (#14252) 2023-05-10 23:08:05 -07:00
Abhishek Radhakrishnan 46dabab36d
Fix NPE in test parse exception report. Add more tests with different thresholds. (#14209) 2023-05-05 10:05:41 -07:00
Abhishek Radhakrishnan 68f908e511
Fix uncaught `ParseException` when reading Avro from Kafka (#14183)
In StreamChunkParser#parseWithInputFormat, we call byteEntityReader.read() without handling a potential ParseException, which is thrown during this function call by the delegate AvroStreamReader#intermediateRowIterator.
A ParseException can be thrown if an Avro stream has corrupt data or data that doesn't conform to the schema specified or for other decoding reasons. This exception if uncaught, can cause ingestion to fail.
2023-05-04 12:35:36 +05:30
AmatyaAvadhanula ac7181bbda
Persist supervisor spec only after successful start (#14150)
* Persist spec after successful start

* Fix checkstyle.

* checkstyle after mvn install
2023-05-03 18:27:39 +05:30
Clint Wylie 90ea192d9c
fix bugs with auto encoded long vector deserializers (#14186)
This PR fixes an issue when using 'auto' encoded LONG typed columns and the 'vectorized' query engine. These columns use a delta based bit-packing mechanism, and errors in the vectorized reader would cause it to incorrectly read column values for some bit sizes (1 through 32 bits). This is a regression caused by #11004, which added the optimized readers to improve performance, so impacts Druid versions 0.22.0+.

While writing the test I finally got sad enough about IndexSpec not having a "builder", so I made one, and switched all the things to use it. Apologies for the noise in this bug fix PR, the only real changes are in VSizeLongSerde, and the tests that have been modified to cover the buggy behavior, VSizeLongSerdeTest and ExpressionVectorSelectorsTest. Everything else is just cleanup of IndexSpec usage.
2023-05-01 11:49:27 +05:30
Suneet Saldanha 84c11df980
Make LoggingEmitter more useful by using Markers (#14121)
* Make LoggingEmitter more useful

* Skip code coverage for facade classes

* fix spellcheck

* code review

* fix dependency

* logging.md

* fix checkstyle

* Add back jacoco version to main pom
2023-04-27 15:06:06 -07:00
Tejaswini Bandlamudi 774073b2e7
Update Hadoop3 as default build version (#14005)
Hadoop 2 often causes red security scans on Druid distribution because of the dependencies it brings. We want to move away from Hadoop 2 and provide Hadoop 3 distribution available. Switch druid to building with Hadoop 3 by default. Druid will still be compatible with Hadoop 2 and users can build hadoop-2 compatible distribution using hadoop2 profile.
2023-04-26 12:52:51 +05:30
Gian Merlino a7d4162195
Compaction: Block input specs not aligned with segmentGranularity. (#14127)
* Compaction: Block input specs not aligned with segmentGranularity.

When input intervals are not aligned with segmentGranularity, data may be
overshadowed if it lies in the space between the input intervals and the
output segmentGranularity.

In MSQ REPLACE, this is a validation error. IMO the same behavior makes
sense for compaction tasks. In case anyone was depending on the ability
to compact nonaligned intervals, a configuration parameter
allowNonAlignedInterval is provided. I don't expect it to be used much.

* Remove unused.

* ITCompactionTaskTest uses non-aligned intervals.
2023-04-25 17:06:16 -07:00
Nicholas Lippis 9d4cc501f7
return task status reported by peon (#14040)
* return task status reported by peon

* Write TaskStatus to file in AbstractTask.cleanUp

* Get TaskStatus from task log

* Fix merge conflicts in AbstractTaskTest

* Add unit tests for TaskLogPusher, TaskLogStreamer, NoopTaskLogs to satisfy code coverage

* Add license headerss

* Fix style

* Remove unknown exception declarations
2023-04-24 12:05:39 -07:00
TSFenwick accd5536df
Allow for Log4J to be configured for peons but still ensure console logging is enforced (#14094)
* Allow for Log4J to be configured for peons but still ensure console logging is enforced

This change will allow for log4j to be configured for peons but require console logging is still
configured for them to ensure peon logs are saved to deep storage.

Also fixed the test ConsoleLoggingEnforcementTest to use a valid appender for the non console
Config as the previous config was incorrect and would never return a logger.

* fix checkstyle

* add warning to logger when it overwrites all loggers to be console

* optimize calls for altering logging config for ConsoleLoggingEnforcementConfigurationFactory

add getName to the druid logger class

* update docs, and error message

* edit docs to be more clear

* fix checkstyle issues

* CI fixes - LoggerTest code coverage and fix spelling issue for logging docs
2023-04-24 10:41:56 -07:00
Clint Wylie 887f8db1b5
preserve explicitly specified dimension schema in "logical" schema of sampler response (#14144) 2023-04-23 21:28:05 +05:30
zachjsh 04da0102cb
KillTask should return empty inputSource resources (#14106)
### Description

This pr fixes a few bugs found with the inputSource security feature.

1. `KillUnusedSegmentsTask` previously had no definition for the `getInputSourceResources`, which caused an unsupportedOperationException to be thrown when this task type was submitted with the inputSource security feature enabled. This task type should not require any input source specific resources, so returning an empty set for this task type now.

2. Fixed a bug where when the input source type security feature is enabled, all of the input source type specific resources used where authenticated against:

`{"resource": {"name": "EXTERNAL", "type": "{INPUT_SOURCE_TYPE}"}, "action": "READ"}`

When they should be instead authenticated against:

`{"resource": {"name": "{INPUT_SOURCE_TYPE}", "type": "EXTERNAL"}, "action": "READ"}`

3. fixed bug where supervisor tasks were not authenticated against the specific input source types used, if input source security feature was enabled.
2023-04-18 15:27:16 -04:00
Adarsh Sanjeev a7d5c64aeb
Move MSQ temporary storage to a runtime parameter instead of being configured from query context (#14061)
* 
    Adds new run time parameter druid.indexer.task.tmpStorageBytesPerTask. This sets a limit for the amount of temporary storage disk space used by tasks. This limit is currently only respected by MSQ tasks.
*   Removes query context parameters intermediateSuperSorterStorageMaxLocalBytes and composedIntermediateSuperSorterStorageEnabled. Composed intermediate super sorter (which was enabled by composedIntermediateSuperSorterStorageEnabled) is now enabled automatically if durableShuffleStorage is set to true. intermediateSuperSorterStorageMaxLocalBytes is calculated from the limit set by the run time parameter druid.indexer.task.tmpStorageBytesPerTask.
2023-04-18 16:56:51 +05:30
Rohan Garg 086b2b8efe
Log merge and push timings for PartialGenericSegmentMergeTask (#14089) 2023-04-18 11:51:26 +05:30
imply-cheddar aaa6cc1883
Make the tasks run with only a single directory (#14063)
* Make the tasks run with only a single directory

There was a change that tried to get indexing to run on multiple disks
It made a bunch of changes to how tasks run, effectively hiding the
"safe" directory for tasks to write files into from the task code itself
making it extremely difficult to do anything correctly inside of a task.

This change reverts those changes inside of the tasks and makes it so that
only the task runners are the ones that make decisions about which
mount points should be used for storing task-related files.

It adds the config druid.worker.baseTaskDirs which can be used by the
task runners to know which directories they should schedule tasks inside of.
The TaskConfig remains the authoritative source of configuration for where
and how an individual task should be operating.
2023-04-13 00:45:02 -07:00
Clint Wylie 179e2e8108
adjust useSchemaDiscovery to also include the behavior of includeAllDimensions to support partial schema declaration without having to set two flags (#14076) 2023-04-12 23:12:49 -07:00
Clint Wylie 9ed8beca5e
bug fixes and add support for boolean inputs to classic long dimension indexer (#14069)
changes:
* adds support for boolean inputs to the classic long dimension indexer, which plays nice with LONG being the semi official boolean type in Druid, and even nicer when druid.expressions.useStrictBooleans is set to true, since the sampler when using the new 'auto' schema when 'useSchemaDiscovery' is specified on the dimensions spec will call the type out as LONG
* fix bugs with sampler response and new schema discovery stuff incorrectly using classic 'json' type for the logical schema instead of the new 'auto' type
2023-04-11 20:49:52 -07:00
Clint Wylie 1aef72aa7e
Bump up the version in pom to 27.0.0 in preparation of release (#14051) 2023-04-10 14:56:59 +05:30
Karan Kumar 8712098301
Fixing overlord unable to become a leader when syncing the lock from metadata store. (#14038) 2023-04-10 12:37:31 +05:30
zachjsh 5c0221375c
Allow for Input source security in native task layer (#14003)
Fixes #13837.

### Description

This change allows for input source type security in the native task layer.

To enable this feature, the user must set the following property to true:

`druid.auth.enableInputSourceSecurity=true`

The default value for this property is false, which will continue the existing functionality of needing authorization to write to the respective datasource.

When this config is enabled, the users will be required to be authorized for the following resource action, in addition to write permission on the respective datasource.

`new ResourceAction(new Resource(ResourceType.EXTERNAL, {INPUT_SOURCE_TYPE}, Action.READ`

where `{INPUT_SOURCE_TYPE}` is the type of the input source being used;, http, inline, s3, etc..

Only tasks that provide a non-default implementation of the `getInputSourceResources` method can be submitted when config `druid.auth.enableInputSourceSecurity=true` is set. Otherwise, a 400 error will be thrown.
2023-04-06 13:13:09 -04:00
Clint Wylie 1c8a184677
add null safety checks for DiscoveryDruidNode services for more resilient http server and task views (#13930)
* add null safety checks for DiscoveryDruidNode services for more resilient http server and task vi
2023-04-05 02:45:39 -07:00
Clint Wylie d21babc5b8
remix nested columns (#14014)
changes:
* introduce ColumnFormat to separate physical storage format from logical type. ColumnFormat is now used instead of ColumnCapabilities to get column handlers for segment creation
* introduce new 'auto' type indexer and merger which produces a new common nested format of columns, which is the next logical iteration of the nested column stuff. Essentially this is an automatic type column indexer that produces the most appropriate column for the given inputs, making either STRING, ARRAY<STRING>, LONG, ARRAY<LONG>, DOUBLE, ARRAY<DOUBLE>, or COMPLEX<json>.
* revert NestedDataColumnIndexer, NestedDataColumnMerger, NestedDataColumnSerializer to their version pre #13803 behavior (v4) for backwards compatibility
* fix a bug in RoaringBitmapSerdeFactory if anything actually ever wrote out an empty bitmap using toBytes and then later tried to read it (the nerve!)
2023-04-04 17:51:59 -07:00
Clint Wylie 518698a952
lower segment heap footprint and fix bug with expression type coercion (#14002) 2023-03-31 13:53:22 -07:00
kaijianding 13ffeb50ba
should retry when failed to pause realtime task (#11515) 2023-03-25 19:03:13 +05:30
Kashif Faraz b7752a909c
Enable round-robin segment assignment and batch segment allocation by default (#13942)
Changes:
- Set `useRoundRobinSegmentAssignment` in coordinator dynamic config to `true` by default.
- Set `batchSegmentAllocation` in `TaskLockConfig` (used in Overlord runtime properties) to `true` by default.
2023-03-22 08:20:01 +05:30
Gian Merlino 1c7a03a47b
Lower default maxRowsInMemory for realtime ingestion. (#13939)
* Lower default maxRowsInMemory for realtime ingestion.

The thinking here is that for best ingestion throughput, we want
intermediate persists to be as big as possible without using up all
available memory. So, we rely mainly on maxBytesInMemory. The default
maxRowsInMemory (1 million) is really just a safety: in case we have
a large number of very small rows, we don't want to get overwhelmed
by per-row overheads.

However, maximum ingestion throughput isn't necessarily the primary
goal for realtime ingestion. Query performance is also important. And
because query performance is not as good on the in-memory dataset, it's
helpful to keep it from growing too large. 150k seems like a reasonable
balance here. It means that for a typical 5 million row segment, we
won't trigger more than 33 persists due to this limit, which is a
reasonable number of persists.

* Update tests.

* Update server/src/main/java/org/apache/druid/segment/indexing/RealtimeTuningConfig.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* Fix test.

* Fix link.

---------

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2023-03-21 10:36:36 -07:00
Laksh Singla c16d9da35a
Improve documentation for tombstone generation and minor improvement (#13907)
* As a follow up to #13893, this PR improves the comments added along with examples for the code, as well as adds handling for an edge case where the generated tombstone boundaries were overshooting the bounds of MIN_TIME (or MAX_TIME).
2023-03-10 06:59:51 +05:30
Gian Merlino fe9d0c46d5
Improve memory efficiency of WrappedRoaringBitmap. (#13889)
* Improve memory efficiency of WrappedRoaringBitmap.

Two changes:

1) Use an int[] for sizes 4 or below.
2) Remove the boolean compressRunOnSerialization. Doesn't save much
   space, but it does save a little, and it isn't adding a ton of value
   to have it be configurable. It was originally configurable in case
   anything broke when enabling it, but it's been a while and nothing
   has broken.

* Slight adjustment.

* Adjust for inspection.

* Updates.

* Update snaps.

* Update test.

* Adjust test.

* Fix snaps.
2023-03-09 15:48:02 -08:00
Laksh Singla dc67296e9d
Fix for OOM in the Tombstone generating logic in MSQ (#13893)
fix OOMs using a different logic for generating tombstones

---------

Co-authored-by: Paul Rogers <paul-rogers@users.noreply.github.com>
2023-03-08 21:38:08 -08:00
Clint Wylie 68db39d08a
fix ci (#13901)
This PR is #13899 plus spotbugs fix to fix the failures introduced by #13815
2023-03-08 16:55:47 +05:30
Nicholas Lippis faac43eabe
Use base task dir in kubernetes task runner (#13880)
* Use TaskConfig to get task dir in KubernetesTaskRunner

* Use the first path specified in baseTaskDirPaths instead of deprecated baseTaskDirPath

* Use getBaseTaskDirPaths in generate command
2023-03-07 15:30:42 -07:00
Karan Kumar 65c3954942
Adding forbidden api for Properties#get() and Properties#getOrDefault() (#13882)
Properties#getOrDefault method does not check the default map for values where as Properties#getProperty() does.
2023-03-06 10:42:04 +05:30
Tejaswini Bandlamudi 7103cb4b9d
Removes FiniteFirehoseFactory and its implementations (#12852)
The FiniteFirehoseFactory and InputRowParser classes were deprecated in 0.17.0 (#8823) in favor of InputSource & InputFormat. This PR removes the FiniteFirehoseFactory and all its implementations along with classes solely used by them like Fetcher (Used by PrefetchableTextFilesFirehoseFactory). Refactors classes including tests using FiniteFirehoseFactory to use InputSource instead.
Removing InputRowParser may not be as trivial as many classes that aren't deprecated depends on it (with no alternatives), like EventReceiverFirehoseFactory. Hence FirehoseFactory, EventReceiverFirehoseFactory, and Firehose are marked deprecated.
2023-03-02 18:07:17 +05:30
Laksh Singla ca68fd93a6
Generate tombstones when running MSQ's replace (#13706)
*When running REPLACE queries, the segments which contain no data are dropped (marked as unused). This PR aims to generate tombstones in place of segments which contain no data to mark their deletion, as is the behavior with the native ingestion.

This will cause InsertCannotReplaceExistingSegmentFault to be removed since it was generated if the interval to be marked unused didn't fully overlap one of the existing segments to replace.
2023-03-01 12:01:30 +05:30
Clint Wylie 1d8fff4096
sampler + type detection = bff (#13711)
* sampler + type detection = bff
* split logical and physical dimensions, tidy up
2023-02-28 04:14:30 -08:00
Abhishek Agarwal d2dbb8b2c0
Fix infinite checkpointing between tasks and overlord (#13825)
If the intermediate handoff period is less than the task duration and there is no new data in the input topic, task will continuously checkpoint the same offsets again and again. This PR fixes that bug by resetting the checkpoint time even when the task receives the same end offset request again.
2023-02-22 19:25:59 +05:30
Clint Wylie 08b5951cc5
merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698)
* merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything
* fix poms and license stuff
* mockito is evil
* allow reset of JvmUtils RuntimeInfo if tests used static injection to override
2023-02-17 14:27:41 -08:00
Paul Rogers 333196d207
Code cleanup & message improvements (#13778)
* Misc cleanup edits

Correct spacing
Add type parameters
Add toString() methods to formats so tests compare correctly
IT doc revisions
Error message edits
Display UT query results when tests fail

* Edit

* Build fix

* Build fixes
2023-02-15 15:22:54 +05:30
Suneet Saldanha 714ac07b52
Allow users to add additional metadata to ingestion metrics (#13760)
* Allow users to add additional metadata to ingestion metrics

When submitting an ingestion spec, users may pass a map of metadata
in the ingestion spec config that will be added to ingestion metrics.

This will make it possible for operators to tag metrics with other
metadata that doesn't necessarily line up with the existing tags
like taskId.

Druid clusters that ingest these metrics can take advantage of the
nested data columns feature to process this additional metadata.

* rename to tags

* docs

* tests

* fix test

* make code cov happy

* checkstyle
2023-02-08 18:07:23 -08:00
AmatyaAvadhanula 34c04daa9f
Fix infinite iteration in http sync monitoring (#13731)
* Fix infinite iteration in http task runner

* Fix infinite iteration in http server view

* Add tests
2023-02-08 15:14:11 +05:30
AmatyaAvadhanula 0cf1fc3d55
Indexing on multiple disks (#13476)
* Initial commit

* Simple UTs

* Parameterize tests

* Parameterized tests for k8s task runner

* Fix restore bug

* Refactor TaskStorageDirTracker

* Change CliPeon args
2023-02-08 11:31:34 +05:30
Churro f022a9f246
When a task fails and doesn't throw an exception, report it correctly… (#13668)
* When a task fails and doesn't throw an exception, report it correctly in mm-less druid

* Removing unthrown exception from test
2023-02-02 09:04:18 -08:00
Kashif Faraz 7c188d80b8
Make batch segment allocation logs less noisy (#13725) 2023-02-02 09:54:53 +05:30
Clint Wylie fb26a1093d
discover nested columns when using nested column indexer for schemaless ingestion (#13672)
* discover nested columns when using nested column indexer for schemaless
* move useNestedColumnIndexerForSchemaDiscovery from AppendableIndexSpec to DimensionsSpec
2023-01-18 12:57:28 -08:00
Gian Merlino 182c4fad29
Kinesis: More robust default fetch settings. (#13539)
* Kinesis: More robust default fetch settings.

1) Default recordsPerFetch and recordBufferSize based on available memory
   rather than using hardcoded numbers. For this, we need an estimate
   of record size. Use 10 KB for regular records and 1 MB for aggregated
   records. With 1 GB heaps, 2 processors per task, and nonaggregated
   records, recordBufferSize comes out to the same as the old
   default (10000), and recordsPerFetch comes out slightly lower (1250
   instead of 4000).

2) Default maxRecordsPerPoll based on whether records are aggregated
   or not (100 if not aggregated, 1 if aggregated). Prior default was 100.

3) Default fetchThreads based on processors divided by task count on
   Indexers, rather than overall processor count.

4) Additionally clean up the serialized JSON a bit by adding various
   JsonInclude annotations.

* Updates for tests.

* Additional important verify.
2023-01-13 11:03:54 +05:30
Clint Wylie b5b740bbbb
allow using nested column indexer for schema discovery (#13653)
* single typed "root" only nested columns now mimic "regular" columns of those types
* incremental index can now use nested column indexer instead of string indexer for discovered columns
2023-01-12 18:31:12 -08:00
Adarsh Sanjeev 0a486c3bcf
Update forbidden apis with fixed executor (#13633)
* Update forbidden apis with fixed executor
2023-01-12 15:34:36 +05:30
Karan Kumar 56076d33fb
Worker retry for MSQ task (#13353)
* Initial commit.

* Fixing error message in retry exceeded exception

* Cleaning up some code

* Adding some test cases.

* Adding java docs.

* Finishing up state test cases.

* Adding some more java docs and fixing spot bugs, intellij inspections

* Fixing intellij inspections and added tests

* Documenting error codes

* Migrate current integration batch tests to equivalent MSQ tests (#13374)

* Migrate current integration batch tests to equivalent MSQ tests using new IT framework

* Fix build issues

* Trigger Build

* Adding more tests and addressing comments

* fixBuildIssues

* fix dependency issues

* Parameterized the test and addressed comments

* Addressing comments

* fixing checkstyle errors

* Adressing comments

* Adding ITTest which kills the worker abruptly

* Review comments phase one

* Adding doc changes

* Adjusting for single threaded execution.

* Adding Sequential Merge PR state handling

* Merge things

* Fixing checkstyle.

* Adding new context param for fault tolerance.
Adding stale task handling in sketchFetcher.
Adding UT's.

* Merge things

* Merge things

* Adding parameterized tests
Created separate module for faultToleranceTests

* Adding missed files

* Review comments and fixing tests.

* Documentation things.

* Fixing IT

* Controller impl fix.

* Fixing racy WorkerSketchFetcherTest.java exception handling.

Co-authored-by: abhagraw <99210446+abhagraw@users.noreply.github.com>
Co-authored-by: Karan Kumar <cryptoe@karans-mbp.lan>
2023-01-11 07:38:29 +05:30
Maytas Monsereenusorn 62a105ee65
Add context to HadoopIngestionSpec (#13624)
* add context to HadoopIngestionSpec

* fix alert
2023-01-09 14:37:02 -10:00
AmatyaAvadhanula af05cfa78c
Fix shutdown in httpRemote task runner (#13558)
* Fix shutdown in httpRemote task runner

* Add UT
2022-12-22 14:50:04 +05:30
imply-cheddar 089d8da561
Support Framing for Window Aggregations (#13514)
* Support Framing for Window Aggregations

This adds support for framing  over ROWS
for window aggregations.

Still not implemented as yet:
1. RANGE frames
2. Multiple different frames in the same query
3. Frames on last/first functions
2022-12-14 18:04:39 -08:00
Kashif Faraz 58a3acc2c4
Add InputStats to track bytes processed by a task (#13520)
This commit adds a new class `InputStats` to track the total bytes processed by a task.
The field `processedBytes` is published in task reports along with other row stats.

Major changes:
- Add class `InputStats` to track processed bytes
- Add method `InputSourceReader.read(InputStats)` to read input rows while counting bytes.
> Since we need to count the bytes, we could not just have a wrapper around `InputSourceReader` or `InputEntityReader` (the way `CountableInputSourceReader` does) because the `InputSourceReader` only deals with `InputRow`s and the byte information is already lost.
- Classic batch: Use the new `InputSourceReader.read(inputStats)` in `AbstractBatchIndexTask`
- Streaming: Increment `processedBytes` in `StreamChunkParser`. This does not use the new `InputSourceReader.read(inputStats)` method.
- Extend `InputStats` with `RowIngestionMeters` so that bytes can be exposed in task reports

Other changes:
- Update tests to verify the value of `processedBytes`
- Rename `MutableRowIngestionMeters` to `SimpleRowIngestionMeters` and remove duplicate class
- Replace `CacheTestSegmentCacheManager` with `NoopSegmentCacheManager`
- Refactor `KafkaIndexTaskTest` and `KinesisIndexTaskTest`
2022-12-13 18:54:42 +05:30
somu-imply 7682b0b6b1
Analysis refactor (#13501)
Refactor DataSource to have a getAnalysis method()

This removes various parts of the code where while loops and instanceof
checks were being used to walk through the structure of DataSource objects
in order to build a DataSourceAnalysis.  Instead we just ask the DataSource
for its analysis and allow the stack to rebuild whatever structure existed.
2022-12-12 17:35:44 -08:00
Gian Merlino de5a4bafcb
Zero-copy local deep storage. (#13394)
* Zero-copy local deep storage.

This is useful for local deep storage, since it reduces disk usage and
makes Historicals able to load segments instantaneously.

Two changes:

1) Introduce "druid.storage.zip" parameter for local storage, which defaults
   to false. This changes default behavior from writing an index.zip to writing
   a regular directory. This is safe to do even during a rolling update, because
   the older code actually already handled unzipped directories being present
   on local deep storage.

2) In LocalDataSegmentPuller and LocalDataSegmentPusher, use hard links
   instead of copies when possible. (Generally this is possible when the
   source and destination directory are on the same filesystem.)
2022-12-12 17:28:24 -08:00
Kashif Faraz 69951273b8
Fix typo in metric name (#13521) 2022-12-08 06:41:23 +05:30
Kashif Faraz c7229fc787
Limit max batch size for segment allocation, add docs (#13503)
Changes:
- Limit max batch size in `SegmentAllocationQueue` to 500
- Rename `batchAllocationMaxWaitTime` to `batchAllocationWaitTime` since the actual
wait time may exceed this configured value.
- Replace usage of `SegmentInsertAction` in `TaskToolbox` with `SegmentTransactionalInsertAction`
2022-12-07 10:07:14 +05:30
Gian Merlino fda0a1aadd
Set chatAsync default to true. (#13491)
This functionality was originally added in #13354.
2022-12-05 20:53:59 -08:00
Kashif Faraz 45a8fa280c
Add SegmentAllocationQueue to batch SegmentAllocateActions (#13369)
In a cluster with a large number of streaming tasks (~1000), SegmentAllocateActions 
on the overlord can often take very long intervals of time to finish thus causing spikes 
in the `task/action/run/time`. This may result in lag building up while a task waits for a
segment to get allocated.

The root causes are:
- large number of metadata calls made to the segments and pending segments tables
- `giant` lock held in `TaskLockbox.tryLock()` to acquire task locks and allocate segments

Since the contention typically arises when several tasks of the same datasource try
to allocate segments for the same interval/granularity, the allocation run times can be
improved by batching the requests together.

Changes
- Add flags
   - `druid.indexer.tasklock.batchSegmentAllocation` (default `false`)
   - `druid.indexer.tasklock.batchAllocationMaxWaitTime` (in millis) (default `1000`)
- Add methods `canPerformAsync` and `performAsync` to `TaskAction`
- Submit each allocate action to a `SegmentAllocationQueue`, and add to correct batch
- Process batch after `batchAllocationMaxWaitTime`
- Acquire `giant` lock just once per batch in `TaskLockbox`
- Reduce metadata calls by batching statements together and updating query filters
- Except for batching, retain the whole behaviour (order of steps, retries, etc.)
- Respond to leadership changes and fail items in queue when not leader
- Emit batch and request level metrics
2022-12-05 14:00:07 +05:30
AmatyaAvadhanula cc307e4c29
Fix needless task shutdown on leader switch (#13411)
* Fix needless task shutdown on leader switch

* Add unit test

* Fix style

* Fix UTs
2022-12-01 18:31:08 +05:30
Tejaswini Bandlamudi b091b32f21
Fixes reindexing bug with filter on long column (#13386)
* fixes BlockLayoutColumnarLongs close method to nullify internal buffer.

* fixes other BlockLayoutColumnar supplier close methods to nullify internal buffers.

* fix spotbugs
2022-11-25 19:22:48 +05:30
Kashif Faraz 7cf761cee4
Prepare master branch for next release, 26.0.0 (#13401)
* Prepare master branch for next release, 26.0.0

* Use docker image for druid 24.0.1

* Fix version in druid-it-cases pom.xml
2022-11-22 15:31:01 +05:30
Gian Merlino bfffbabb56
Async task client for SeekableStreamSupervisors. (#13354)
Main changes:
1) Convert SeekableStreamIndexTaskClient to an interface, move old code
   to SeekableStreamIndexTaskClientSyncImpl, and add new implementation
   SeekableStreamIndexTaskClientAsyncImpl that uses ServiceClient.
2) Add "chatAsync" parameter to seekable stream supervisors that causes
   the supervisor to use an async task client.
3) In SeekableStreamSupervisor.discoverTasks, adjust logic to avoid making
   blocking RPC calls in workerExec threads.
4) In SeekableStreamSupervisor generally, switch from Futures.successfulAsList
   to FutureUtils.coalesce, so we can better capture the errors that occurred
   with contacting individual tasks.

Other, related changes:
1) Add ServiceRetryPolicy.retryNotAvailable, which controls whether
   ServiceClient retries unavailable services. Useful since we do not
   want to retry calls unavailable tasks within the service client. (The
   supervisor does its own higher-level retries.)
2) Add FutureUtils.transformAsync, a more lambda friendly version of
   Futures.transform(f, AsyncFunction).
3) Add FutureUtils.coalesce. Similar to Futures.successfulAsList, but
   returns Either instead of using null on error.
4) Add JacksonUtils.readValue overloads for JavaType and TypeReference.
2022-11-21 19:20:26 +05:30
Gian Merlino b8ca03d283
SeekableStreamSupervisor: Unique type name for GracefulShutdownNotice. (#13399)
Allows GracefulShutdownNotice to be differentiated from ShutdownNotice.
2022-11-21 19:10:14 +05:30
AmatyaAvadhanula de566eb0db
Fix shared lock acquisition criteria (#13390)
Currently, a shared lock is acquired only when all other locks are also shared locks.

This commit updates the behaviour and acquires a shared lock only if all locks
of equal or higher priority are either shared locks or are already revoked.
The lock type of locks with lower priority does not matter as they can be revoked.
2022-11-21 15:31:38 +05:30
Gian Merlino c61313f4c4
Quieter streaming supervisors. (#13392)
Eliminates two common sources of noise with Kafka supervisors that have
large numbers of tasks and partitions:

1) Log the report at DEBUG rather than INFO level at each run cycle.
   It can get quite large, and can be retrieved via API when needed.

2) Use log4j2.xml to quiet down the org.apache.kafka.clients.consumer.internals
   package. Avoids a log message per-partition per-minute as part of seeking
   to the latest offset in the reporting thread. In the tasks, where this
   sort of logging might be more useful, we have another log message with
   the same information: "Seeking partition[%s] to[%s]".
2022-11-20 23:53:17 -08:00
Rohan Garg 6ccf31490e
Allow injection of node-role set to all non base modules (#13371) 2022-11-18 12:12:03 +05:30
Gian Merlino e78f648023
SeekableStreamSupervisor: Don't enqueue duplicate notices. (#13334)
* SeekableStreamSupervisor: Don't enqueue duplicate notices.

Similar goal to #12018, but more aggressive. Don't enqueue a notice at
all if it is equal to one currently in the queue.

* Adjustments from review.

* Update indexing-service/src/test/java/org/apache/druid/indexing/overlord/supervisor/NoticesQueueTest.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2022-11-11 01:54:01 -08:00
Gian Merlino 77478f25fb
Add taskActionType dimension to task/action/run/time. (#13333)
* Add taskActionType dimension to task/action/run/time.

* Spelling.
2022-11-11 12:00:08 +05:30
AmatyaAvadhanula fb23e38aa7
Fix messageGap emission (#13346)
* Fix messageGap emission

* Do not emit messageGap after stopping reading events

* Refactoring

* Fix tests
2022-11-10 17:50:19 +05:30
Clint Wylie 44f29030dd
fix flaky RemoteTaskRunnerTest.testRunPendingTaskFailToAssignTask with ugly Thread.sleep (#13344) 2022-11-10 14:28:53 +05:30
AmatyaAvadhanula 0512ae4922
Optimize metadata calls in SeekableStreamSupervisor (#13328)
* Optimize metadata calls

* Modify isTaskCurrent

* Fix tests

* Refactoring
2022-11-10 07:22:51 +05:30
AmatyaAvadhanula a2013e6566
Enhance streaming ingestion metrics (#13331)
Changes:
- Add a metric for partition-wise kafka/kinesis lag for streaming ingestion.
- Emit lag metrics for streaming ingestion when supervisor is not suspended and state is in {RUNNING, IDLE, UNHEALTHY_TASKS, UNHEALTHY_SUPERVISOR}
- Document metrics
2022-11-09 23:44:15 +05:30
Tejaswini Bandlamudi 594545da55
Adds cluster level idleConfig setting for supervisor (#13311)
* adds cluster level idleConfig

* updates docs

* refactoring

* spelling nit

* nit

* nit

* refactoring
2022-11-08 14:54:14 +05:30