Commit Graph

2060 Commits

Author SHA1 Message Date
AmatyaAvadhanula 0cf1fc3d55
Indexing on multiple disks (#13476)
* Initial commit

* Simple UTs

* Parameterize tests

* Parameterized tests for k8s task runner

* Fix restore bug

* Refactor TaskStorageDirTracker

* Change CliPeon args
2023-02-08 11:31:34 +05:30
Churro f022a9f246
When a task fails and doesn't throw an exception, report it correctly… (#13668)
* When a task fails and doesn't throw an exception, report it correctly in mm-less druid

* Removing unthrown exception from test
2023-02-02 09:04:18 -08:00
Kashif Faraz 7c188d80b8
Make batch segment allocation logs less noisy (#13725) 2023-02-02 09:54:53 +05:30
Clint Wylie fb26a1093d
discover nested columns when using nested column indexer for schemaless ingestion (#13672)
* discover nested columns when using nested column indexer for schemaless
* move useNestedColumnIndexerForSchemaDiscovery from AppendableIndexSpec to DimensionsSpec
2023-01-18 12:57:28 -08:00
Gian Merlino 182c4fad29
Kinesis: More robust default fetch settings. (#13539)
* Kinesis: More robust default fetch settings.

1) Default recordsPerFetch and recordBufferSize based on available memory
   rather than using hardcoded numbers. For this, we need an estimate
   of record size. Use 10 KB for regular records and 1 MB for aggregated
   records. With 1 GB heaps, 2 processors per task, and nonaggregated
   records, recordBufferSize comes out to the same as the old
   default (10000), and recordsPerFetch comes out slightly lower (1250
   instead of 4000).

2) Default maxRecordsPerPoll based on whether records are aggregated
   or not (100 if not aggregated, 1 if aggregated). Prior default was 100.

3) Default fetchThreads based on processors divided by task count on
   Indexers, rather than overall processor count.

4) Additionally clean up the serialized JSON a bit by adding various
   JsonInclude annotations.

* Updates for tests.

* Additional important verify.
2023-01-13 11:03:54 +05:30
Clint Wylie b5b740bbbb
allow using nested column indexer for schema discovery (#13653)
* single typed "root" only nested columns now mimic "regular" columns of those types
* incremental index can now use nested column indexer instead of string indexer for discovered columns
2023-01-12 18:31:12 -08:00
Adarsh Sanjeev 0a486c3bcf
Update forbidden apis with fixed executor (#13633)
* Update forbidden apis with fixed executor
2023-01-12 15:34:36 +05:30
Karan Kumar 56076d33fb
Worker retry for MSQ task (#13353)
* Initial commit.

* Fixing error message in retry exceeded exception

* Cleaning up some code

* Adding some test cases.

* Adding java docs.

* Finishing up state test cases.

* Adding some more java docs and fixing spot bugs, intellij inspections

* Fixing intellij inspections and added tests

* Documenting error codes

* Migrate current integration batch tests to equivalent MSQ tests (#13374)

* Migrate current integration batch tests to equivalent MSQ tests using new IT framework

* Fix build issues

* Trigger Build

* Adding more tests and addressing comments

* fixBuildIssues

* fix dependency issues

* Parameterized the test and addressed comments

* Addressing comments

* fixing checkstyle errors

* Adressing comments

* Adding ITTest which kills the worker abruptly

* Review comments phase one

* Adding doc changes

* Adjusting for single threaded execution.

* Adding Sequential Merge PR state handling

* Merge things

* Fixing checkstyle.

* Adding new context param for fault tolerance.
Adding stale task handling in sketchFetcher.
Adding UT's.

* Merge things

* Merge things

* Adding parameterized tests
Created separate module for faultToleranceTests

* Adding missed files

* Review comments and fixing tests.

* Documentation things.

* Fixing IT

* Controller impl fix.

* Fixing racy WorkerSketchFetcherTest.java exception handling.

Co-authored-by: abhagraw <99210446+abhagraw@users.noreply.github.com>
Co-authored-by: Karan Kumar <cryptoe@karans-mbp.lan>
2023-01-11 07:38:29 +05:30
Maytas Monsereenusorn 62a105ee65
Add context to HadoopIngestionSpec (#13624)
* add context to HadoopIngestionSpec

* fix alert
2023-01-09 14:37:02 -10:00
AmatyaAvadhanula af05cfa78c
Fix shutdown in httpRemote task runner (#13558)
* Fix shutdown in httpRemote task runner

* Add UT
2022-12-22 14:50:04 +05:30
imply-cheddar 089d8da561
Support Framing for Window Aggregations (#13514)
* Support Framing for Window Aggregations

This adds support for framing  over ROWS
for window aggregations.

Still not implemented as yet:
1. RANGE frames
2. Multiple different frames in the same query
3. Frames on last/first functions
2022-12-14 18:04:39 -08:00
Kashif Faraz 58a3acc2c4
Add InputStats to track bytes processed by a task (#13520)
This commit adds a new class `InputStats` to track the total bytes processed by a task.
The field `processedBytes` is published in task reports along with other row stats.

Major changes:
- Add class `InputStats` to track processed bytes
- Add method `InputSourceReader.read(InputStats)` to read input rows while counting bytes.
> Since we need to count the bytes, we could not just have a wrapper around `InputSourceReader` or `InputEntityReader` (the way `CountableInputSourceReader` does) because the `InputSourceReader` only deals with `InputRow`s and the byte information is already lost.
- Classic batch: Use the new `InputSourceReader.read(inputStats)` in `AbstractBatchIndexTask`
- Streaming: Increment `processedBytes` in `StreamChunkParser`. This does not use the new `InputSourceReader.read(inputStats)` method.
- Extend `InputStats` with `RowIngestionMeters` so that bytes can be exposed in task reports

Other changes:
- Update tests to verify the value of `processedBytes`
- Rename `MutableRowIngestionMeters` to `SimpleRowIngestionMeters` and remove duplicate class
- Replace `CacheTestSegmentCacheManager` with `NoopSegmentCacheManager`
- Refactor `KafkaIndexTaskTest` and `KinesisIndexTaskTest`
2022-12-13 18:54:42 +05:30
somu-imply 7682b0b6b1
Analysis refactor (#13501)
Refactor DataSource to have a getAnalysis method()

This removes various parts of the code where while loops and instanceof
checks were being used to walk through the structure of DataSource objects
in order to build a DataSourceAnalysis.  Instead we just ask the DataSource
for its analysis and allow the stack to rebuild whatever structure existed.
2022-12-12 17:35:44 -08:00
Gian Merlino de5a4bafcb
Zero-copy local deep storage. (#13394)
* Zero-copy local deep storage.

This is useful for local deep storage, since it reduces disk usage and
makes Historicals able to load segments instantaneously.

Two changes:

1) Introduce "druid.storage.zip" parameter for local storage, which defaults
   to false. This changes default behavior from writing an index.zip to writing
   a regular directory. This is safe to do even during a rolling update, because
   the older code actually already handled unzipped directories being present
   on local deep storage.

2) In LocalDataSegmentPuller and LocalDataSegmentPusher, use hard links
   instead of copies when possible. (Generally this is possible when the
   source and destination directory are on the same filesystem.)
2022-12-12 17:28:24 -08:00
Kashif Faraz 69951273b8
Fix typo in metric name (#13521) 2022-12-08 06:41:23 +05:30
Kashif Faraz c7229fc787
Limit max batch size for segment allocation, add docs (#13503)
Changes:
- Limit max batch size in `SegmentAllocationQueue` to 500
- Rename `batchAllocationMaxWaitTime` to `batchAllocationWaitTime` since the actual
wait time may exceed this configured value.
- Replace usage of `SegmentInsertAction` in `TaskToolbox` with `SegmentTransactionalInsertAction`
2022-12-07 10:07:14 +05:30
Gian Merlino fda0a1aadd
Set chatAsync default to true. (#13491)
This functionality was originally added in #13354.
2022-12-05 20:53:59 -08:00
Kashif Faraz 45a8fa280c
Add SegmentAllocationQueue to batch SegmentAllocateActions (#13369)
In a cluster with a large number of streaming tasks (~1000), SegmentAllocateActions 
on the overlord can often take very long intervals of time to finish thus causing spikes 
in the `task/action/run/time`. This may result in lag building up while a task waits for a
segment to get allocated.

The root causes are:
- large number of metadata calls made to the segments and pending segments tables
- `giant` lock held in `TaskLockbox.tryLock()` to acquire task locks and allocate segments

Since the contention typically arises when several tasks of the same datasource try
to allocate segments for the same interval/granularity, the allocation run times can be
improved by batching the requests together.

Changes
- Add flags
   - `druid.indexer.tasklock.batchSegmentAllocation` (default `false`)
   - `druid.indexer.tasklock.batchAllocationMaxWaitTime` (in millis) (default `1000`)
- Add methods `canPerformAsync` and `performAsync` to `TaskAction`
- Submit each allocate action to a `SegmentAllocationQueue`, and add to correct batch
- Process batch after `batchAllocationMaxWaitTime`
- Acquire `giant` lock just once per batch in `TaskLockbox`
- Reduce metadata calls by batching statements together and updating query filters
- Except for batching, retain the whole behaviour (order of steps, retries, etc.)
- Respond to leadership changes and fail items in queue when not leader
- Emit batch and request level metrics
2022-12-05 14:00:07 +05:30
AmatyaAvadhanula cc307e4c29
Fix needless task shutdown on leader switch (#13411)
* Fix needless task shutdown on leader switch

* Add unit test

* Fix style

* Fix UTs
2022-12-01 18:31:08 +05:30
Tejaswini Bandlamudi b091b32f21
Fixes reindexing bug with filter on long column (#13386)
* fixes BlockLayoutColumnarLongs close method to nullify internal buffer.

* fixes other BlockLayoutColumnar supplier close methods to nullify internal buffers.

* fix spotbugs
2022-11-25 19:22:48 +05:30
Kashif Faraz 7cf761cee4
Prepare master branch for next release, 26.0.0 (#13401)
* Prepare master branch for next release, 26.0.0

* Use docker image for druid 24.0.1

* Fix version in druid-it-cases pom.xml
2022-11-22 15:31:01 +05:30
Gian Merlino bfffbabb56
Async task client for SeekableStreamSupervisors. (#13354)
Main changes:
1) Convert SeekableStreamIndexTaskClient to an interface, move old code
   to SeekableStreamIndexTaskClientSyncImpl, and add new implementation
   SeekableStreamIndexTaskClientAsyncImpl that uses ServiceClient.
2) Add "chatAsync" parameter to seekable stream supervisors that causes
   the supervisor to use an async task client.
3) In SeekableStreamSupervisor.discoverTasks, adjust logic to avoid making
   blocking RPC calls in workerExec threads.
4) In SeekableStreamSupervisor generally, switch from Futures.successfulAsList
   to FutureUtils.coalesce, so we can better capture the errors that occurred
   with contacting individual tasks.

Other, related changes:
1) Add ServiceRetryPolicy.retryNotAvailable, which controls whether
   ServiceClient retries unavailable services. Useful since we do not
   want to retry calls unavailable tasks within the service client. (The
   supervisor does its own higher-level retries.)
2) Add FutureUtils.transformAsync, a more lambda friendly version of
   Futures.transform(f, AsyncFunction).
3) Add FutureUtils.coalesce. Similar to Futures.successfulAsList, but
   returns Either instead of using null on error.
4) Add JacksonUtils.readValue overloads for JavaType and TypeReference.
2022-11-21 19:20:26 +05:30
Gian Merlino b8ca03d283
SeekableStreamSupervisor: Unique type name for GracefulShutdownNotice. (#13399)
Allows GracefulShutdownNotice to be differentiated from ShutdownNotice.
2022-11-21 19:10:14 +05:30
AmatyaAvadhanula de566eb0db
Fix shared lock acquisition criteria (#13390)
Currently, a shared lock is acquired only when all other locks are also shared locks.

This commit updates the behaviour and acquires a shared lock only if all locks
of equal or higher priority are either shared locks or are already revoked.
The lock type of locks with lower priority does not matter as they can be revoked.
2022-11-21 15:31:38 +05:30
Gian Merlino c61313f4c4
Quieter streaming supervisors. (#13392)
Eliminates two common sources of noise with Kafka supervisors that have
large numbers of tasks and partitions:

1) Log the report at DEBUG rather than INFO level at each run cycle.
   It can get quite large, and can be retrieved via API when needed.

2) Use log4j2.xml to quiet down the org.apache.kafka.clients.consumer.internals
   package. Avoids a log message per-partition per-minute as part of seeking
   to the latest offset in the reporting thread. In the tasks, where this
   sort of logging might be more useful, we have another log message with
   the same information: "Seeking partition[%s] to[%s]".
2022-11-20 23:53:17 -08:00
Rohan Garg 6ccf31490e
Allow injection of node-role set to all non base modules (#13371) 2022-11-18 12:12:03 +05:30
Gian Merlino e78f648023
SeekableStreamSupervisor: Don't enqueue duplicate notices. (#13334)
* SeekableStreamSupervisor: Don't enqueue duplicate notices.

Similar goal to #12018, but more aggressive. Don't enqueue a notice at
all if it is equal to one currently in the queue.

* Adjustments from review.

* Update indexing-service/src/test/java/org/apache/druid/indexing/overlord/supervisor/NoticesQueueTest.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2022-11-11 01:54:01 -08:00
Gian Merlino 77478f25fb
Add taskActionType dimension to task/action/run/time. (#13333)
* Add taskActionType dimension to task/action/run/time.

* Spelling.
2022-11-11 12:00:08 +05:30
AmatyaAvadhanula fb23e38aa7
Fix messageGap emission (#13346)
* Fix messageGap emission

* Do not emit messageGap after stopping reading events

* Refactoring

* Fix tests
2022-11-10 17:50:19 +05:30
Clint Wylie 44f29030dd
fix flaky RemoteTaskRunnerTest.testRunPendingTaskFailToAssignTask with ugly Thread.sleep (#13344) 2022-11-10 14:28:53 +05:30
AmatyaAvadhanula 0512ae4922
Optimize metadata calls in SeekableStreamSupervisor (#13328)
* Optimize metadata calls

* Modify isTaskCurrent

* Fix tests

* Refactoring
2022-11-10 07:22:51 +05:30
AmatyaAvadhanula a2013e6566
Enhance streaming ingestion metrics (#13331)
Changes:
- Add a metric for partition-wise kafka/kinesis lag for streaming ingestion.
- Emit lag metrics for streaming ingestion when supervisor is not suspended and state is in {RUNNING, IDLE, UNHEALTHY_TASKS, UNHEALTHY_SUPERVISOR}
- Document metrics
2022-11-09 23:44:15 +05:30
Tejaswini Bandlamudi 594545da55
Adds cluster level idleConfig setting for supervisor (#13311)
* adds cluster level idleConfig

* updates docs

* refactoring

* spelling nit

* nit

* nit

* refactoring
2022-11-08 14:54:14 +05:30
AmatyaAvadhanula a738ac9ad7
Improve task pause logging and metrics for streaming ingestion (#13313)
* Improve task pause logging and metrics for streaming ingestion

* Add metrics doc

* Fix spelling
2022-11-07 21:33:54 +05:30
AmatyaAvadhanula 650840ddaf
Add segment handoff time metric (#13238)
* Add segment handoff time metric

* Remove monitors on scheduler stop

* Add warning log for slow handoff

* Remove monitor when scheduler stops
2022-11-07 17:49:10 +05:30
Gian Merlino 227b57dd8e
Compaction: Fetch segments one at a time on main task; skip when possible. (#13280)
* Compaction: Fetch segments one at a time on main task; skip when possible.

Compact tasks include the ability to fetch existing segments and determine
reasonable defaults for granularitySpec, dimensionsSpec, and metricsSpec.
This is a useful feature that makes compact tasks work well even when the
user running the compaction does not have a clear idea of what they want
the compacted segments to be like.

However, this comes at a cost: it takes time, and disk space, to do all
of these fetches. This patch improves the situation in two ways:

1) When segments do need to be fetched, download them one at a time and
   delete them when we're done. This still takes time, but minimizes the
   required disk space.

2) Don't fetch segments on the main compact task when they aren't needed.
   If the user provides a full granularitySpec, dimensionsSpec, and
   metricsSpec, we can skip it.

* Adjustments.

* Changes from code review.

* Fix logic for determining rollup.
2022-11-07 14:50:14 +05:30
Jonathan Wei 2fdaa2fcab
Make RecordSupplierInputSource respect sampler timeout when stream is empty (#13296)
* Make RecordSupplierInputSource respect sampler timeout when stream is empty

* Rename timeout param, make it nullable, add timeout test
2022-11-03 17:45:35 -05:00
Dr. Sizzles e5ad24ff9f
Support for middle manager less druid, tasks launch as k8s jobs (#13156)
* Support for middle manager less druid, tasks launch as k8s jobs

* Fixing forking task runner test

* Test cleanup, dependency cleanup, intellij inspections cleanup

* Changes per PR review

Add configuration option to disable http/https proxy for the k8s client
Update the docs to provide more detail about sidecar support

* Removing un-needed log lines

* Small changes per PR review

* Upon task completion we callback to the overlord to update the status / locaiton, for slower k8s clusters, this reduces locking time significantly

* Merge conflict fix

* Fixing tests and docs

* update tiny-cluster.yaml 

changed `enableTaskLevelLogPush` to `encapsulatedTask`

* Apply suggestions from code review

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* Minor changes per PR request

* Cleanup, adding test to AbstractTask

* Add comment in peon.sh

* Bumping code coverage

* More tests to make code coverage happy

* Doh a duplicate dependnecy

* Integration test setup is weird for k8s, will do this in a different PR

* Reverting back all integration test changes, will do in anotbher PR

* use StringUtils.base64 instead of Base64

* Jdk is nasty, if i compress in jdk 11 in jdk 17 the decompressed result is different

Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2022-11-02 19:44:47 -07:00
Kashif Faraz fd7864ae33
Improve run time of coordinator duty MarkAsUnusedOvershadowedSegments (#13287)
In clusters with a large number of segments, the duty `MarkAsUnusedOvershadowedSegments`
can take a long very long time to finish. This is because of the costly invocation of 
`timeline.isOvershadowed` which is done for every used segment in every coordinator run.

Changes
- Use `DataSourceSnapshot.getOvershadowedSegments` to get all overshadowed segments
- Iterate over this set instead of all used segments to identify segments that can be marked as unused
- Mark segments as unused in the DB in batches rather than one at a time
- Refactor: Add class `SegmentTimeline` for ease of use and readability while using a
`VersionedIntervalTimeline` of segments.
2022-11-01 20:19:52 +05:30
AmatyaAvadhanula e1ff3ca289
Resume streaming tasks on Overlord switch (#13223)
* Resume streaming tasks on Overlord switch

* Refactoring and better messages

* Better docs

* Add unit test

* Fix tests' setup

* Update indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* Update indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* Better logs

* Fix test again

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2022-10-29 09:38:49 +05:30
Gian Merlino 5429b9d764
RTR: Dedupe items in getKnownTasks. (#13273)
Fixes a problem where the tasks API in OverlordResource would complain
about duplicate keys in the map it's building.
2022-10-28 08:31:26 -07:00
AmatyaAvadhanula 9cbda66d96
Remove skip ignorable shards (#13221)
* Revert "Improve kinesis task assignment after resharding (#12235)"

This reverts commit 1ec57cb935.
2022-10-28 16:19:01 +05:30
Gian Merlino d98c808d3f
Remove basePersistDirectory from tuning configs. (#13040)
* Remove basePersistDirectory from tuning configs.

Since the removal of CliRealtime, it serves no purpose, since it is
always overridden in production using withBasePersistDirectory given
some subdirectory of the task work directory.

Removing this from the tuning config has a benefit beyond removing
no-longer-needed logic: it also avoids the side effect of empty
"druid-realtime-persist" directories getting created in the systemwide
temp directory.

* Test adjustments to appropriately set basePersistDirectory.

* Remove unused import.

* Fix RATC constructor.
2022-10-21 17:25:36 -07:00
AmatyaAvadhanula b88e1c21ea
Fix Overlord leader election when task lock re-acquisition fails (#13172)
Overlord leader election can sometimes fail due to task lock re-acquisition issues.
This commit solves the issue by failing such tasks and clearing all their locks.
2022-10-17 15:23:16 +05:30
Tejaswini Bandlamudi 3e13584e0e
Adds Idle feature to `SeekableStreamSupervisor` for inactive stream (#13144)
* Idle Seekable stream supervisor changes.

* nit

* nit

* nit

* Adds unit tests

* Supervisor decides it's idle state instead of AutoScaler

* docs update

* nit

* nit

* docs update

* Adds Kafka unit test

* Adds Kafka Integration test.

* Updates travis config.

* Updates kafka-indexing-service dependencies.

* updates previous offsets snapshot & doc

* Doesn't act if supervisor is suspended.

* Fixes highest current offsets fetch bug, adds new Kafka UT tests, doc changes.

* Reverts Kinesis Supervisor idle behaviour changes.

* nit

* nit

* Corrects SeekableStreamSupervisorSpec check on idle behaviour config, adds tests.

* Fixes getHighestCurrentOffsets to fetch offsets of publishing tasks too

* Adds Kafka Supervisor UT

* Improves test coverage in druid-server

* Corrects IT override config

* Doc updates and Syntactic changes

* nit

* supervisorSpec.ioConfig.idleConfig changes
2022-10-12 18:31:08 +05:30
Kashif Faraz b07f01d645
Set useMaxMemoryEstimates=false by default (#13178)
A value of `false` denotes that the new flow with improved estimates will be used.
2022-10-04 15:04:23 +05:30
Kashif Faraz ce5f55e5ce
Fix over-replication caused by balancing when inventory is not updated yet (#13114)
* Add coordinator test framework

* Remove outdated changes

* Add more tests

* Add option to auto-sync inventory

* Minor cleanup

* Fix inspections

* Add README for simulations, add SegmentLoadingNegativeTest

* Fix over-replication from balancing

* Fix README

* Cleanup unnecessary fields from DruidCoordinator

* Add a test

* Fix DruidCoordinatorTest

* Remove unused import

* Fix CuratorDruidCoordinatorTest

* Remove test log4j2.xml
2022-09-29 12:06:23 +05:30
Jonathan Wei 1f1fced6d4
Add JsonInputFormat option to assume newline delimited JSON, improve parse exception handling for multiline JSON (#13089)
* Add JsonInputFormat option to assume newline delimited JSON, improve handling for non-NDJSON

* Fix serde and docs

* Add PR comment check
2022-09-26 19:51:04 -05:00
Laksh Singla 0bfa81b7df
Fix the Injector creation in HadoopTask (#13138)
* Injector fix in HadoopTask

* Log the ExtensionsConfig while instantiating the HadoopTask

* Log the config in the run() method instead of the ctor
2022-09-24 10:38:25 +05:30
Jonathan Wei 331e6d707b
Add KafkaConfigOverrides extension point (#13122)
* Add KafkaConfigOverrides extension point

* X
2022-09-21 11:47:19 +05:30
Gian Merlino 2e729170cc
Kill task: Don't include markAsUnused unless set. (#13104)
Cleans up the serialized JSON.
2022-09-17 14:03:34 -07:00
AmatyaAvadhanula 1311e85f65
Faster fix for dangling tasks upon supervisor termination (#13072)
This commit fixes issues with delayed supervisor termination during certain transient states.
Tasks can be created during supervisor termination and left behind since the cleanup may
not consider these newly added tasks.

#12178 added a lock for the entire process of task creation to prevent such dangling tasks.
But it also introduced a deadlock scenario as follows:
- An invocation of `runInternal` is in progress.
- A `stop` request comes, acquires `stateChangeLock` and submit a `ShutdownNotice`
- `runInternal` keeps waiting to acquire the `stateChangeLock`
- `ShutdownNotice` remains stuck in the notice queue because `runInternal` is still running
- After some timeout, the supervisor goes through a forced termination

Fix:
 * `SeekableStreamSupervisor.runInternal` - do not try to acquire lock if supervisor is already stopping
 * `SupervisorStateManager.maybeSetState` - do not allow transitions from STOPPING state
2022-09-15 15:31:14 +05:30
Abhishek Agarwal 618757352b
Bump up the version to 25.0.0 (#12975)
* Bump up the version to 25.0.0

* Fix the version in console
2022-08-29 11:27:38 +05:30
Karan Kumar 275f834b2a
Race in Task report/log streamer (#12931)
* Fixing RACE in HTTP remote task Runner

* Changes in the interface

* Updating documentation

* Adding test cases to SwitchingTaskLogStreamer

* Adding more tests
2022-08-25 17:56:01 -07:00
Clint Wylie 8ee8786d3c
add maxBytesInMemory and maxClientResponseBytes to SamplerConfig (#12947)
* add maxBytesInMemory and maxClientResponseBytes to SamplerConfig
2022-08-25 00:50:41 -07:00
Gian Merlino 35aaaa9573
Fix serialization in TaskReportFileWriters. (#12938)
* Fix serialization in TaskReportFileWriters.

For some reason, serializing a Map<String, TaskReport> would omit the
"type" field. Explicitly sending each value through the ObjectMapper
fixes this, because the type information does not get lost.

* Fixes for static analysis.
2022-08-24 08:11:01 -07:00
Adarsh Sanjeev 3b58a01c7c
Correct spelling in messages and variable names. (#12932) 2022-08-24 11:06:31 +05:30
Gian Merlino d7d15ba51f
Add druid-multi-stage-query extension. (#12918)
* Add druid-multi-stage-query extension.

* Adjustments from CI.

* Task ID validation.

* Various changes from code review.

* Remove unnecessary code.

* LGTM-related.
2022-08-23 18:44:01 -07:00
Karan Kumar a3a9c5f409
Fixing overlord issued too many redirects (#12908)
* Fixing race in overlord redirects where the node was redirecting to itself

* Fixing test cases
2022-08-17 18:27:39 +05:30
Abhishek Agarwal adbebc174a
Fix flaky tests in SeekableStreamSupervisorStateTest (#12875)
* Fix flaky test in SeekableStreamSupervisorStateTest

* Fix for flaky security IT Test

* fix tests

* retry queries if there is some flakiness
2022-08-16 18:38:03 +05:30
Gian Merlino 28836dfa71
Fix race in TaskQueue.notifyStatus. (#12901)
* Fix race in TaskQueue.notifyStatus.

It was possible for manageInternal to relaunch a task while it was
being cleaned up, due to a race that happens when notifyStatus is
called to clean up a successful task:

1) In a critical section, notifyStatus removes the task from "tasks".
2) Outside a critical section, notifyStatus calls taskRunner.shutdown
   to let the task runner know it can clear out its data structures.
3) In a critical section, syncFromStorage adds the task back to "tasks",
   because it is still present in metadata storage.
4) In a critical section, manageInternalCritical notices that the task
   is in "tasks" and is not running in the taskRunner, so it launches
   it again.
5) In a (different) critical section, notifyStatus updates the metadata
   store to set the task status to SUCCESS.
6) The task continues running even though it should not be.

The possibility for this race was introduced in #12099, which shrunk
the critical section in notifyStatus. Prior to that patch, a single
critical section encompassed (1), (2), and (5), so the ordering above
was not possible.

This patch does the following:

1) Fixes the race by adding a recentlyCompletedTasks set that prevents
   the main management loop from doing anything with tasks that are
   currently being cleaned up.
2) Switches the order of the critical sections in notifyStatus, so
   metadata store updates happen first. This is useful in case of
   server failures: it ensures that if the Overlord fails in the midst
   of notifyStatus, then completed-task statuses are still available in
   ZK or on MMs for the next Overlord. (Those are cleaned up by
   taskRunner.shutdown, which formerly ran first.) This isn't related
   to the race described above, but is fixed opportunistically as part
   of the same patch.
3) Changes the "tasks" list to a map. Many operations require retrieval
   or removal of individual tasks; those are now O(1) instead of O(N)
   in the number of running tasks.
4) Changes various log messages to use task ID instead of full task
   payload, to make the logs more readable.

* Fix format string.

* Update comment.
2022-08-14 23:34:36 -07:00
Herb Brewer 9f8982a9a6
fix(druid-indexing): failed to get shardSpec for interval issue (#12573) 2022-08-05 17:57:36 -07:00
AmatyaAvadhanula d294404924
Kinesis ingestion with empty shards (#12792)
Kinesis ingestion requires all shards to have at least 1 record at the required position in druid.
Even if this is satisified initially, resharding the stream can lead to empty intermediate shards. A significant delay in writing to newly created shards was also problematic.

Kinesis shard sequence numbers are big integers. Introduce two more custom sequence tokens UNREAD_TRIM_HORIZON and UNREAD_LATEST to indicate that a shard has not been read from and that it needs to be read from the start or the end respectively.
These values can be used to avoid the need to read at least one record to obtain a sequence number for ingesting a newly discovered shard.

If a record cannot be obtained immediately, use a marker to obtain the relevant shardIterator and use this shardIterator to obtain a valid sequence number. As long as a valid sequence number is not obtained, continue storing the token as the offset.

These tokens (UNREAD_TRIM_HORIZON and UNREAD_LATEST) are logically ordered to be earlier than any valid sequence number.

However, the ordering requires a few subtle changes to the existing mechanism for record sequence validation:

The sequence availability check ensures that the current offset is before the earliest available sequence in the shard. However, current token being an UNREAD token indicates that any sequence number in the shard is valid (despite the ordering)

Kinesis sequence numbers are inclusive i.e if current sequence == end sequence, there are more records left to read.
However, the equality check is exclusive when dealing with UNREAD tokens.
2022-08-05 22:38:58 +05:30
Paul Rogers a618458bf0
Tidy up construction of the Guice Injectors (#12816)
* Refactor Guice initialization

Builders for various module collections
Revise the extensions loader
Injector builders for server startup
Move Hadoop init to indexer
Clean up server node role filtering
Calcite test injector builder

* Revisions from review comments

* Build fixes

* Revisions from review comments
2022-08-04 00:05:07 -07:00
刘小辉 6f5c1434b8
fix get task may be null (#12100) 2022-08-03 09:23:48 -07:00
Maytas Monsereenusorn 5417aa2055
Fix: ParseException swallow cause Exception (#12810)
* add impl

* add impl

* fix checkstyle
2022-07-22 13:46:28 -07:00
Tejaswini Bandlamudi cc1ff56ca5
Unregisters `RealtimeMetricsMonitor`, `TaskRealtimeMetricsMonitor` on Indexers after task completion (#12743)
Few indexing tasks register RealtimeMetricsMonitor or TaskRealtimeMetricsMonitor with the process’s MonitorScheduler when they start. These monitors never unregister themselves (they always return true, they'd need to return false to unregister). Each of these monitors emits a set of metrics once every druid.monitoring.emissionPeriod.
As a result, after executing several tasks for a while, Indexer emits metrics of these tasks even after they're long gone.

Proposed Solution
Since one should be able to obtain the last round of ingestion metrics after the task unregisters the monitor, introducing lastRoundMetricsToBePushed variable to keep track of the same and overriding the AbstractMonitor.monitor method in RealtimeMetricsMonitor, TaskRealtimeMetricsMonitor to implement the new logic.
2022-07-18 14:34:18 +05:30
Abhishek Agarwal 2ab20c9fc9
Surface more information about task status in tests (#12759)
I see some test runs failing because task status is not as expected. It will be helpful to know what error the task has.
2022-07-13 14:53:53 +05:30
Rohan Garg bb953be09b
Refactor usage of JoinableFactoryWrapper + more test coverage (#12767)
Refactor usage of JoinableFactoryWrapper to add e2e test for createSegmentMapFn with joinToFilter feature enabled
2022-07-12 06:25:36 -07:00
Gian Merlino d2576584a0
Consolidate the two TaskStatus classes. (#12765)
* Consolidate the two TaskStatus classes.

There are two, but we don't need more than one.

* Fix import order.
2022-07-11 07:25:22 -07:00
Didip Kerabat 06251c5d2a
Add EIGHT_HOUR into possible list of Granularities. (#12717)
* Add EIGHT_HOUR into possible list of Granularities.

* Add the missing definition.

* fix test.

* Fix another test.

* Stylecheck finally passed.

Co-authored-by: Didip Kerabat <didip@apple.com>
2022-07-05 11:05:37 -07:00
Gian Merlino 2b330186e2
Mid-level service client and updated high-level clients. (#12696)
* Mid-level service client and updated high-level clients.

Our servers talk to each other over HTTP. We have a low-level HTTP
client (HttpClient) that is super-asynchronous and super-customizable
through its handlers. It's also proven to be quite robust: we use it
for Broker -> Historical communication over the wide variety of query
types and workloads we support.

But the low-level client has no facilities for service location or
retries, which means we have a variety of high-level clients that
implement these in their own ways. Some high-level clients do a better
job than others. This patch adds a mid-level ServiceClient that makes
it easier for high-level clients to be built correctly and harmoniously,
and migrates some of the high-level logic to use ServiceClients.

Main changes:

1) Add ServiceClient org.apache.druid.rpc package. That package also
   contains supporting stuff like ServiceLocator and RetryPolicy
   interfaces, and a DiscoveryServiceLocator based on
   DruidNodeDiscoveryProvider.

2) Add high-level OverlordClient in org.apache.druid.rpc.indexing.

3) Indexing task client creator in TaskServiceClients. It uses
   SpecificTaskServiceLocator to find the tasks. This improves on
   ClientInfoTaskProvider by caching task locations for up to 30 seconds
   across calls, reducing load on the Overlord.

4) Rework ParallelIndexSupervisorTaskClient to use a ServiceClient
   instead of extending IndexTaskClient.

5) Rework RemoteTaskActionClient to use a ServiceClient instead of
   DruidLeaderClient.

6) Rework LocalIntermediaryDataManager, TaskMonitor, and
   ParallelIndexSupervisorTask. As a result, MiddleManager, Peon, and
   Overlord no longer need IndexingServiceClient (which internally used
   DruidLeaderClient).

There are some concrete benefits over the prior logic, namely:

- DruidLeaderClient does retries in its "go" method, but only retries
  exactly 5 times, does not sleep between retries, and does not retry
  retryable HTTP codes like 502, 503, 504. (It only retries IOExceptions.)
  ServiceClient handles retries in a more reasonable way.

- DruidLeaderClient's methods are all synchronous, whereas ServiceClient
  methods are asynchronous. This is used in one place so far: the
  SpecificTaskServiceLocator, so we don't need to block a thread trying
  to locate a task. It can be used in other places in the future.

- HttpIndexingServiceClient does not properly handle all server errors.
  In some cases, it tries to parse a server error as a successful
  response (for example: in getTaskStatus).

- IndexTaskClient currently makes an Overlord call on every task-to-task
  HTTP request, as a way to find where the target task is. ServiceClient,
  through SpecificTaskServiceLocator, caches these target locations
  for a period of time.

* Style adjustments.

* For the coverage.

* Adjustments.

* Better behaviors.

* Fixes.
2022-07-05 09:43:26 -07:00
imply-cheddar e3128e3fa3
Poison stupid pool (#12646)
* Poison StupidPool and fix resource leaks

There are various resource leaks from test setup as well as some
corners in query processing.  We poison the StupidPool to start failing
tests when the leaks come and fix any issues uncovered from that so
that we can start from a clean baseline.

Unfortunately, because of how poisoning works,
we can only fail future checkouts from the same pool,
which means that there is a natural race between a
leak happening -> GC occurs -> leak detected -> pool poisoned.

This race means that, depending on interleaving of tests,
if the very last time that an object is checked out
from the pool leaks, then it won't get caught.
At some point in the future, something will catch it,
 however and from that point on it will be deterministic.

* Remove various things left over from iterations

* Clean up FilterAnalysis and add javadoc on StupidPool

* Revert changes to .idea/misc.xml that accidentally got pushed

* Style and test branches

* Stylistic woes
2022-07-03 14:36:22 -07:00
Kashif Faraz f5b5cb93ea
Fix expiry timeout bug in LocalIntermediateDataManager (#12722)
The expiry timeout is compared against the current time but the condition is reversed.
This means that as soon as a supervisor task finishes, its partitions are cleaned up,
irrespective of the specified `intermediaryPartitionTimeout` period.

After these changes, the `intermediaryPartitionTimeout` will start getting honored.

Changes
* Fix the condition
* Add tests to verify the new correct behaviour
* Reduce the default expiry timeout from P1D to PT5M
   to retain current behaviour in case of default configs.
2022-07-01 16:29:22 +05:30
Gian Merlino d5abd06b96
Fix flaky KafkaIndexTaskTest. (#12657)
* Fix flaky KafkaIndexTaskTest.

The testRunTransactionModeRollback case had many race conditions. Most notably,
it would commit a transaction and then immediately check to see that the results
were *not* indexed. This is racey because it relied on the indexing thread being
slower than the test thread.

Now, the case waits for the transaction to be processed by the indexing thread
before checking the results.

* Changes from review.
2022-06-24 13:53:51 -07:00
Gian Merlino 4d892483ca
Fix thread-unsafe emitter usage in SeekableStreamSupervisorStateTest. (#12658)
The TestEmitter is used from different threads without concurrency
control. This patch makes the emitter thread-safe.
2022-06-22 22:29:16 -07:00
Paul Rogers 893759de91
Remove null and empty fields from native queries (#12634)
* Remove null and empty fields from native queries

* Test fixes

* Attempted IT fix.

* Revisions from review comments

* Build fixes resulting from changes suggested by reviews

* IT fix for changed segment size
2022-06-16 14:07:25 -07:00
AmatyaAvadhanula f970757efc
Optimize overlord GET /tasks memory usage (#12404)
The web-console (indirectly) calls the Overlord’s GET tasks API to fetch the tasks' summary which in turn queries the metadata tasks table. This query tries to fetch several columns, including payload, of all the rows at once. This introduces a significant memory overhead and can cause unresponsiveness or overlord failure when the ingestion tab is opened multiple times (due to several parallel calls to this API)

Another thing to note is that the task table (the payload column in particular) can be very large. Extracting large payloads from such tables can be very slow, leading to slow UI. While we are fixing the memory pressure in the overlord, we can also fix the slowness in UI caused by fetching large payloads from the table. Fetching large payloads also puts pressure on the metadata store as reported in the community (Metadata store query performance degrades as the tasks in druid_tasks table grows · Issue #12318 · apache/druid )

The task summaries returned as a response for the API are several times smaller and can fit comfortably in memory. So, there is an opportunity here to fix the memory usage, slow ingestion, and under-pressure metadata store by removing the need to handle large payloads in every layer we can. Of course, the solution becomes complex as we try to fix more layers. With that in mind, this page captures two approaches. They vary in complexity and also in the degree to which they fix the aforementioned problems.
2022-06-16 22:30:37 +05:30
Gian Merlino 70f3b13621
ForkingTaskRunner: Set ActiveProcessorCount for tasks. (#12592)
* ForkingTaskRunner: Set ActiveProcessorCount for tasks.

This prevents various automatically-sized thread pools from being unreasonably
large (we don't want each task to size its pools as if it is the only thing on
the entire machine).

* Fix tests.

* Add missing LifecycleStart annotation.

* ForkingTaskRunner needs ManageLifecycle.
2022-06-15 15:56:32 -07:00
Abhishek Agarwal 59a0c10c47
Add remedial information in error message when type is unknown (#12612)
Often users are submitting queries, and ingestion specs that work only if the relevant extension is not loaded. However, the error is too technical for the users and doesn't suggest them to check for missing extensions. This PR modifies the error message so users can at least check their settings before assuming that the error is because of a bug.
2022-06-07 20:22:45 +05:30
Agustin Gonzalez 2f3d7a4c07
Emit state of replace and append for native batch tasks (#12488)
* Emit state of replace and append for native batch tasks

* Emit count of one depending on batch ingestion mode (APPEND, OVERWRITE, REPLACE)

* Add metric to compaction job

* Avoid null ptr exc when null emitter

* Coverage

* Emit tombstone & segment counts

* Tasks need a type

* Spelling

* Integrate BatchIngestionMode in batch ingestion tasks functionality

* Typos

* Remove batch ingestion type from metric since it is already in a dimension. Move IngestionMode to AbstractTask to facilitate having mode as a dimension. Add metrics to streaming. Add missing coverage.

* Avoid inner class referenced by sub-class inspection. Refactor computation of IngestionMode to make it more robust to null IOConfig and fix test.

* Spelling

* Avoid polluting the Task interface

* Rename computeCompaction methods to avoid ambiguous java compiler error if they are passed null. Other minor cleanup.
2022-05-23 12:32:47 -07:00
Agustin Gonzalez c236227905
Deal with potential cardinality estimate being negative and add logging to hash determine partitions phase (#12443)
* Deal with potential cardinality estimate being negative and add logging

* Fix typo in name

* Refine and minimize logging

* Make it info based on code review

* Create a named constant for the magic number
2022-05-20 10:51:06 -07:00
superivaj f9bdb3b236
Fix usage of maxColumnsToMerge in auto-compaction tuning config (#12551)
Issue: 
Even though `CompactionTuningConfig` allows a `maxColumnsToMerge` config
(to optimize memory usage, particulary for datasources with many dimensions),
the corresponding client object `ClientCompactionTaskQueryTuningConfig`
(used by the coordinator duty `CompactSegments` to trigger auto-compaction)
does not contain this field. Thus, the value of `maxColumnsToMerge` specified
in any datasource compaction config is ignored.

Changes:
- Add field `maxColumnsToMerge` in `ClientCompactionTaskQueryTuningConfig`
  and `UserCompactionTaskQueryTuningConfig`
- Fix tests
2022-05-20 22:23:08 +05:30
Gian Merlino 5f95cc61fe
RemoteTaskRunner: Fix NPE in streamTaskReports. (#12006)
* RemoteTaskRunner: Fix NPE in streamTaskReports.

It is possible for a work item to drop out of runningTasks after the
ZkWorker is retrieved. In this case, the current code would throw
an NPE.

* Additional tests and additional fixes.

* Fix import.
2022-05-19 14:23:55 -07:00
Gian Merlino 1d258d2108
Slightly improve RTR log messages. (#12540)
1) Align "Assigning task" log messages between RTR and HRTR.

2) Remove confusing reference to "Coordinator".

3) Move "Not assigning task" message from INFO to DEBUG. It's not super
   important to see this message: we mainly want to see what _does_ get
   assigned.

4) Reword "Task switched from pending to running" message to better
   match the structure of the  "Assigning task" message from the same
   method.
2022-05-19 07:43:55 -07:00
Gian Merlino 485de6a14a
Add builder for TaskToolbox. (#12539)
* Add builder for TaskToolbox.

The main purpose of this change is to make it easier to create
TaskToolboxes in tests. However, the builder is used in production
too, by TaskToolboxFactory.

* Fix imports, adjust formatting.

* Fix import.
2022-05-19 07:43:50 -07:00
Frank Chen c33ff1c745
Enforce console logging for peon process (#12067)
Currently all Druid processes share the same log4j2 configuration file located in _common directory. Since peon processes are spawned by middle manager process, they derivate the environment variables from the middle manager. These variables include those in the log4j2.xml controlling to which file the logger writes the log.

But current task logging mechanism requires the peon processes to output the log to console so that the middle manager can redirect the console output to a file and upload this file to task log storage.

So, this PR imposes this requirement to peon processes, whatever the configuration is in the shared log4j2.xml, peon processes always write the log to console.
2022-05-16 15:07:21 +05:30
Jason Koch bb1a6def9d
Task queue unblock (#12099)
* concurrency: introduce GuardedBy to TaskQueue

* perf: Introduce TaskQueueScaleTest to test performance of TaskQueue with large task counts

This introduces a test case to confirm how long it will take to launch and manage (aka shutdown)
a large number of threads in the TaskQueue.

h/t to @gianm for main implementation.

* perf: improve scalability of TaskQueue with large task counts

* linter fixes, expand test coverage

* pr feedback suggestion; swap to different linter

* swap to use SuppressWarnings

* Fix TaskQueueScaleTest.

Co-authored-by: Gian Merlino <gian@imply.io>
2022-05-14 16:44:29 -07:00
Clint Wylie 9e5a940cf1
remake column indexes and query processing of filters (#12388)
Following up on #12315, which pushed most of the logic of building ImmutableBitmap into BitmapIndex in order to hide the details of how column indexes are implemented from the Filter implementations, this PR totally refashions how Filter consume indexes. The end result, while a rather dramatic reshuffling of the existing code, should be extraordinarily flexible, eventually allowing us to model any type of index we can imagine, and providing the machinery to build the filters that use them, while also allowing for other column implementations to implement the built-in index types to provide adapters to make use indexing in the current set filters that Druid provides.
2022-05-11 11:57:08 +05:30
zachjsh de14f511d6
Fix broken ForkingTaskRunnerTest (#12499)
A recent commit broke this test. This pr fixes the test.
2022-05-03 04:00:36 -04:00
Rocky Chen 770ad95169
Add a metric for task duration in the pending queue (#12492)
This PR is to measure how long a task stays in the pending queue and emits the value with the metric task/pending/time. The metric is measured in RemoteTaskRunner and HttpRemoteTaskRunner.

An example of the metric:

```
2022-04-26T21:59:09,488 INFO [rtr-pending-tasks-runner-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2022-04-26T21:59:09.487Z","service":"druid/coordinator","host":"localhost:8081","version":"2022.02.0-iap-SNAPSHOT","metric":"task/pending/time","value":8,"dataSource":"wikipedia","taskId":"index_parallel_wikipedia_gecpcglg_2022-04-26T21:59:09.432Z","taskType":"index_parallel"}
```

------------------------------------------
Key changed/added classes in this PR

    Emit metric task/pending/time in classes RemoteTaskRunner and HttpRemoteTaskRunner.
    Update related factory classes and tests.
2022-05-02 23:47:25 -04:00
Abhishek Agarwal 2fe053c5cb
Bump up the versions (#12480) 2022-04-27 14:28:20 +05:30
zachjsh 564d6defd4
Worker level task metrics (#12446)
* * fix metric name inconsistency

* * add task slot metrics for middle managers

* * add new WorkerTaskCountStatsMonitor to report task count metrics
  from worker

* * more stuff

* * remove unused variable

* * more stuff

* * add javadocs

* * fix checkstyle

* * fix hadoop test failure

* * cleanup

* * add more code coverage in tests

* * fix test failure

* * add docs

* * increase code coverage

* * fix spelling

* * fix failing tests

* * remove dead code

* * fix spelling
2022-04-26 11:44:44 -05:00
Agustin Gonzalez 0460d45e92
Make tombstones ingestible by having them return an empty result set. (#12392)
* Make tombstones ingestible by having them return an empty result set.

* Spotbug

* Coverage

* Coverage

* Remove unnecessary exception (checkstyle)

* Fix integration test and add one more to test dropExisting set to false over tombstones

* Force dropExisting to true in auto-compaction when the interval contains only tombstones

* Checkstyle, fix unit test

* Changed flag by mistake, fixing it

* Remove method from interface since this method is specific to only DruidSegmentInputentity

* Fix typo

* Adapt to latest code

* Update comments when only tombstones to compact

* Move empty iterator to a new DruidTombstoneSegmentReader

* Code review feedback

* Checkstyle

* Review feedback

* Coverage
2022-04-15 09:08:06 -07:00
Jihoon Son 5e5625f3ae
Fix indexMerger to respect the includeAllDimensions flag (#12428)
* Fix indexMerger to respect flag includeAllDimensions flag; jsonInputFormat should set keepNullColumns if useFieldDiscovery is set

* address comments
2022-04-13 12:43:11 -07:00
Parag Jain 2c79d28bb7
Copy of #11309 with fixes (#12402)
* Optionally load segment index files into page cache on bootstrap and new segment download

* Fix unit test failure

* Fix test case

* fix spelling

* fix spelling

* fix test and test coverage issues

Co-authored-by: Jian Wang <wjhypo@gmail.com>
2022-04-11 21:05:24 +05:30
Maytas Monsereenusorn 8edea5a82d
Add a new flag for ingestion to preserve existing metrics (#12185)
* add impl

* add impl

* fix checkstyle

* add impl

* add unit test

* fix stuff

* fix stuff

* fix stuff

* add unit test

* add more unit tests

* add more unit tests

* add IT

* add IT

* add IT

* add IT

* add ITs

* address comments

* fix test

* fix test

* fix test

* address comments

* address comments

* address comments

* fix conflict

* fix checkstyle

* address comments

* fix test

* fix checkstyle

* fix test

* fix test

* fix IT
2022-04-08 11:02:02 -07:00
AmatyaAvadhanula 9c6b9abcde
Use javaOptsArray provided in task context (#12326)
The `javaOpts` property is being read from task context but not `javaOptsArray`.
Changes:
- Read `javaOptsArray` from task context in `ForkingTaskRunner`.
- Add test to verify that `javaOptsArray` in task context takes precedence over `javaOpts`
2022-03-28 16:33:40 +05:30
Jihoon Son b6eeef31e5
Store null columns in the segments (#12279)
* Store null columns in the segments

* fix test

* remove NullNumericColumn and unused dependency

* fix compile failure

* use guava instead of apache commons

* split new tests

* unused imports

* address comments
2022-03-23 16:54:04 -07:00
Kashif Faraz 0867ca75e1
Fix OOM failures in dimension distribution phase of parallel indexing (#12331)
Parallel indexing with range partitioning can often cause OOM in the
`ParallelIndexSupervisorTask` during the dimension distribution phase.
This typically happens because of too many `StringSketch` objects
obtained from the different `partial_dimension_distribution` sub-tasks.

We need not keep any of the sketches in memory until we need to compute
the PartitionBoundaries for the respective interval.

Changes
- Extract `StringDistribution` from `DimensionDistributionReport`s when they are received
  and write to disk inside the task/temp/distributions
- After all the subtasks have finished, iterate over all the intervals one by one
- For each interval, read the distributions from disk, merge them and create `PartitionBoundaries`.
- Cleanup task/temp/distributions directory when all `PartitionBoundaries` have been determined
2022-03-22 19:28:15 +05:30