Commit Graph

1903 Commits

Author SHA1 Message Date
Frank Chen 36bc41855d
Set Content-Type for String based response (#12295) 2022-03-04 15:17:03 +08:00
Samarth Jain 58d05d7014
Fix ci (#12304) 2022-03-03 23:05:50 -08:00
Jason Koch 36193955b6
perf: eliminate expensive log construction in remote-task-runner shutdown (#12097) 2022-03-03 13:38:21 -08:00
Jason Koch f594e7ac24
perf: improve RemoteTaskRunner task assignment loop performance (#12096)
* perf: improve ZkWorker task lookup performance

This improves the performance of the ZkWorker task lookup loop by
eliminating repeat calls to getRunningTasks() in toImmutable(),
and reduces the work performed in isRunningTask() to stream-parse
the id field instead of entire JSON blob.
2022-03-02 09:38:32 -08:00
Tejaswini Bandlamudi 1af4c9c933
Display row stats for multiphase parallel indexing tasks (#12280)
Row stats are reported for single phase tasks in the `/liveReports` and `/rowStats` APIs
and are also a part of the overall task report. This commit adds changes to report
row stats for multiphase tasks too.

Changes:
- Add `TaskReport` in `GeneratedPartitionsReport` generated during hash and range partitioning
- Collect the reports for `index_generate` phase in `ParallelIndexSupervisorTask`
2022-03-02 10:10:31 +05:30
Laksh Singla 3f709db173
Make ParseExceptions more informative (#12259)
This PR aims to make the ParseExceptions in Druid more informative, by adding additional information (metadata) to the ParseException, which can contain additional information about the exception. For example - the path of the file generating the issue, the line number (where it can be easily fetched - like CsvReader)

Following changes are addressed in this PR:

A new class CloseableIteratorWithMetadata has been created which is like CloseableIterator but also has a metadata method that returns a context Map<String, Object> about the current element returned by next().
IntermediateRowParsingReader#read() now attaches the InputEntity and the "record number" which created the exception (while parsing them), and IntermediateRowParsingReader#sample attaches the InputEntity (but not the "record number").
TextReader (and its subclasses), which is a specific implementation of the IntermediateRowParsingReader also include the line number which caused the generation of the error.
This will also help in triaging the issues when InputSourceReader generates ParseException because it can point to the specific InputEntity which caused the exception (while trying to read it).
2022-02-28 22:31:15 +05:30
Xavier Léauté d105519558
Replace use of PowerMock with Mockito (#12282)
Mockito now supports all our needs and plays much better with recent Java versions.
Migrating to Mockito also simplifies running the kind of tests that required PowerMock in the past. 

* replace all uses of powermock with mockito-inline
* upgrade mockito to 4.3.1 and fix use of deprecated methods
* import mockito bom to align all our mockito dependencies
* add powermock to forbidden-apis to avoid accidentally reintroducing it in the future
2022-02-27 22:47:09 -08:00
Xavier Léauté 1434197ee1
update airline dependency to 2.x (#12270)
* upgrade Airline to Airline 2
  https://github.com/airlift/airline is no longer maintained, updating to
  https://github.com/rvesse/airline (Airline 2) to use an actively
  maintained version, while minimizing breaking changes.

  Note, this is a backwards incompatible change, and extensions relying on
  the CliCommandCreator extension point will also need to be updated.

* fix dependency checks where jakarta.inject is now resolved first instead
  of javax.inject, due to Airline 2 using jakarta
2022-02-27 15:19:28 -08:00
Jihoon Son e5ad862665
A new includeAllDimension flag for dimensionsSpec (#12276)
* includeAllDimensions in dimensionsSpec

* doc

* address comments

* unused import and doc spelling
2022-02-25 18:27:48 -08:00
AmatyaAvadhanula 1ec57cb935
Improve kinesis task assignment after resharding (#12235)
Problem:
- When a kinesis stream is resharded, the original shards are closed.
   Any intermediate shard created in the process is eventually closed as well.
- If a shard is closed before any record is put into it, it can be safely ignored for ingestion.
- It is expensive to determine if a closed shard is empty, since it requires a call to the Kinesis cluster.

Changes:
- Maintain a cache of closed empty and closed non-empty shards in `KinesisSupervisor`
- Add config `skipIngorableShards` to `KinesisSupervisorTuningConfig`
- The caches are used and updated only when `skipIgnorableShards = true`
2022-02-18 12:37:06 +05:30
AmatyaAvadhanula 393e9b68a8
Add config to limit task slots for parallel indexing tasks (#12221)
In extreme cases where many parallel indexing jobs are submitted together, it is possible
that the `ParallelIndexSupervisorTasks` take up all slots leaving no slot to schedule
their own sub-tasks thus stalling progress of all the indexing jobs.

Key changes:
- Add config `druid.indexer.runner.parallelIndexTaskSlotRatio` to limit the task slots
  for `ParallelIndexSupervisorTasks` per worker
- `ratio = 1` implies supervisor tasks can use all slots on a worker if needed (default behavior)
- `ratio = 0` implies supervisor tasks can not use any slot on a worker
   (actually, at least 1 slot is always available to ensure progress of parallel indexing jobs)
- `ImmutableWorkerInfo.canRunTask()`
- `WorkerHolder`, `ZkWorker`, `WorkerSelectUtils`
2022-02-15 23:15:09 +05:30
Kashif Faraz 95b388d2d1
Assign partitionIds in the same order as bucketIds (#12236)
When `ParallelIndexSupervisorTask` converts `BucketNumberedShardSpecs`
to corresponding `BuildingShardSpecs`, the bucketId order gets lost.
Particularly, for range partitioning, this results in the partitionIds not being in the same order
as increasing partition boundaries.

Changes
- Refactor `ParallelIndexSupervisorTask.groupGenericPartitionLocationsPerPartition()`
2022-02-10 11:08:39 +05:30
Jonathan Wei 33bc9226f0
Move task creation under stateChangeLock in SeekableStreamSupervisor (#12178) 2022-02-09 13:24:46 -06:00
Jihoon Son ab3d994a17
Lazy instantiation for segmentKillers, segmentMovers, and segmentArchivers (#12207)
* working

* Lazily load segmentKillers, segmentMovers, and segmentArchivers

* more tests

* test-jar plugin

* more coverage

* lazy client

* clean up changes

* checkstyle

* i did not change the branch condition

* adjust failure rate to run tests faster

* javadocs

* checkstyle
2022-02-08 13:02:06 -08:00
Maytas Monsereenusorn 2b8e7fc0b4
Add a flag to allow auto compaction task slot ratio to consider auto scaler slots (#12228)
* add impl

* fix checkstyle

* add unit tests

* checkstyle

* add IT

* fix IT

* add comments

* fix checkstyle
2022-02-06 20:46:05 -08:00
Kashif Faraz e648b01afb
Improve memory estimates in Aggregator and DimensionIndexer (#12073)
Fixes #12022  

### Description
The current implementations of memory estimation in `OnHeapIncrementalIndex` and `StringDimensionIndexer` tend to over-estimate which leads to more persistence cycles than necessary.

This PR replaces the max estimation mechanism with getting the incremental memory used by the aggregator or indexer at each invocation of `aggregate` or `encode` respectively.

### Changes
- Add new flag `useMaxMemoryEstimates` in the task context. This overrides the same flag in DefaultTaskConfig i.e. `druid.indexer.task.default.context` map
- Add method `AggregatorFactory.factorizeWithSize()` that returns an `AggregatorAndSize` which contains
  the aggregator instance and the estimated initial size of the aggregator
- Add method `Aggregator.aggregateWithSize()` which returns the incremental memory used by this aggregation step
- Update the method `DimensionIndexer.processRowValsToKeyComponent()` to return the encoded key component as well as its effective size in bytes
- Update `OnHeapIncrementalIndex` to use the new estimations only if `useMaxMemoryEstimates = false`
2022-02-03 10:34:02 +05:30
zachjsh f47e1e0dcc
Reduce RemoteTaskRunnerTest flakiness (#12211)
* * add more logging to start / stop of RemoteTaskRunner

* * add more logging

* Increase timeout on RemoteTaskRunnerTest

* Apply suggestions from code review

Co-authored-by: Suneet Saldanha <suneet@apache.org>

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2022-02-01 15:35:18 -08:00
zachjsh f906f2f577
Fix HttpRemoteTaskRunner LifecycleStart / LifecycleStop race condition (#12184)
* * stop workers, remove listener, and call exitStop() on HttpRemoteTaskRunner @LifecycleStop

* * fix test failure
2022-01-27 13:15:14 -06:00
zachjsh 376d7c069d
Close provisioner during HttpRemotetaskRunner LifecycleStop (#12176)
Fixed an issue where the provisionerService which can be used to spawn resources as needed is left running on a non-leader coordinator/overlord, after it is removed from leadership. Provisioning should only be done by the leader. To fix the issue, a call to stop the provisionerService was added to the stop() method of HttpRemoteTaskRunner class. The provisionerService was properly closed on other TaskRunner types.
2022-01-20 13:32:08 -05:00
Abhishek Agarwal 53c0e489c2
Fix infinite retrying during task pausing (#12167)
This fixes a bug that causes TaskClient in overlord to continuously retry to pause tasks. This can happen when a task is not responding to the pause command. Ideally, in such a case when the task is unresponsive, the overlord would have given up after a few retries and would have killed the task. However, due to this bug, retries go on forever.
2022-01-19 09:03:36 +05:30
Maytas Monsereenusorn bd7fe45da0
Support adding metrics in Auto Compaction (#12125)
* add impl

* add impl

* add unit tests

* add unit tests

* add unit tests

* add unit tests

* add unit tests

* add integration tests

* add integration tests

* fix LGTM

* fix test

* remove doc
2022-01-17 20:19:31 -08:00
Jonathan Wei 3f79453506
Lock count guardrail for parallel single phase/sequential task (#12052)
* Lock count guardrail for parallel single phase/sequential task

* PR comments
2021-12-15 11:12:21 -06:00
Lucas Capistrant 761fe9f144
Add new metric that quantifies how long batch ingest jobs waited for segment availability and whether or not that wait was successful (#12002)
* add a unit test that tests that new metric is emitted

* remove unused import

* clarify in doc that this is for batch tasks

* fix IndexTaskTest
2021-12-10 11:40:52 -06:00
imply-cheddar a8b916576d
Allow for appending tasks to co-exist with each other. (#12041)
* Allow for appending tasks to co-exist with each other.

Add a config parameter for appending tasks to allow them to
use a SHARED lock.  This will allow multiple appending tasks
to add segments to the same datasource at the same time.

This config should actually be the default, but it is added
as a config to enable a smooth transition/validation in
production settings before forcing it as the default
behavior going forward.

This change leverages the TaskLockType.SHARED that existed
previously, this used to carry the semantics of a READ lock,
which was "escalated" when the task wanted to actually
persist the segment.  As of many moons before this diff, the
SHARED lock had stopped being used but was still piped into
the code.  It turns out that with a few tweaks, it can be
adjusted to be a shared lock for append tasks to allow them
all to write to the same datasource, so that is what this does.

* Can only reuse the shared lock if using the same groupId

* Need to serialize out the task lock type

* Adjust Unit tests to expect new field in JSON
2021-12-09 16:46:40 -08:00
Jonathan Wei 229f82a6f0
Add parse error list API for stream supervisors, use structured object for parse exceptions, simplify parse exception message (#11961)
* Add parse error list API for stream supervisors, simplify parse exception message

* Add input string to parse exception

* Use structured ParseExceptionReport

* Fix tests

* Add test

* PR comments, add ParseExceptionReport equals verifier

* Fix test
2021-12-09 15:42:55 -06:00
Gian Merlino 76d281d64f
Enable allocating segments at ALL granularity. (#12003)
* Enable allocating segments at ALL granularity.

The main change is that Granularity.granularitiesFinerThan will return ALL if ALL
is passed in.

Allocating segments at ALL granularity is somewhat unconventional, but there
is nothing wrong with it, and it actually makes a lot of sense for tables that
are meant to be used for lookups or dimensions rather than main fact tables.
This change enables ALL segmentGranularity to work properly in appendToExisting
mode.

Also clarifies behavior in javadocs and tests.

* Move tests to improve coverage.
2021-12-03 14:15:05 -08:00
Gian Merlino bc2cc47db6
SeekableStreamSupervisor: Coalesce adjacent RunNotices. (#12018)
The idea is that if multiple notices come in around the same time due
to rapid task status changes, we only need to execute one of them.
2021-12-03 13:42:03 -08:00
Gian Merlino e0e05aad99
Enhancements to IndexTaskClient. (#12011)
* Enhancements to IndexTaskClient.

1) Ability to use handlers other than StringFullResponseHandler. This
   functionality is not used in production code yet, but is useful
   because it will allow tasks to communicate with each other in
   non-string-based formats and in streaming fashion. In the future,
   we'll be able to use this to make task-to-task communication
   more efficient.

2) Truncate server errors at 1KB, so long errors do not pollute logs.

3) Change error log level for retryable errors from WARN to INFO. (The
   final error is still WARN.)

4) Harmonize log and exception messages to have a more consistent format.

* Additional tests and improvements.
2021-12-03 09:14:32 -08:00
Frank Chen c2cea25a6b
Improve exception message when loading data from web-console (#11723)
* Improve exception handling

* Revert some changes

* Resolve comments

* Update indexing-service/src/main/java/org/apache/druid/indexing/overlord/sampler/SamplerExceptionMapper.java

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>

* Update indexing-service/src/main/java/org/apache/druid/indexing/overlord/sampler/SamplerExceptionMapper.java

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>

* Address review comments

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
2021-12-03 21:33:49 +08:00
Paul Rogers a66f10eea1
Code cleanup from query profile project (#11822)
* Code cleanup from query profile project

* Fix spelling errors
* Fix Javadoc formatting
* Abstract out repeated test code
* Reuse constants in place of some string literals
* Fix up some parameterized types
* Reduce warnings reported by Eclipse

* Reverted change due to lack of tests
2021-11-30 11:35:38 -08:00
Gian Merlino f6e6ca2893
Use intermediate-persist IndexSpec during multiphase merge. (#11940)
* Use intermediate-persist IndexSpec during multiphase merge.

The main change is the addition of an intermediate-persist IndexSpec
to the main "merge" method in IndexMerger. There are also a few minor
adjustments to the IndexMerger interface to encourage more harmonious
usage of its methods in the future.

* Additional changes inspired by the test coverage checker.

- Remove unused-in-production IndexMerger methods "append" and "convert".
- Add additional unit tests to UnifiedIndexerAppenderatorsManager.

* Additional adjustments.

* Even more additional adjustments.

* Test fixes.
2021-11-29 15:08:49 -08:00
Frank Chen 98957be044
Return HTTP 404 instead of 400 for supervisor/task endpoints (#11724)
* Use 404 instead of 400

* Use 404 instead of 400

* Add UT test cases

* Add IT testcases

* add UT for task resource filter

Signed-off-by: frank chen <frank.chen021@outlook.com>

* Using org.testing.Assert instead of org.junit.Assert

* Resolve comments and fix test

* Fix test

* Fix tests

* Resolve comments
2021-11-25 13:09:47 +08:00
Rohan Garg 2c08055962
Specify time column for first/last aggregators (#11949)
Add the ability to pass time column in first/last aggregator (and latest/earliest SQL functions). It is to support cases where the time to query upon is stored as a part of a column different than __time. Also, some other logical time column can be specified.
2021-11-25 09:44:14 +05:30
Gian Merlino 3d72e66f56
Consolidate a bunch of ad-hoc segments metadata SQL; fix some bugs. (#11582)
* Consolidate a bunch of ad-hoc segments metadata SQL; fix some bugs.

This patch gathers together a variety of SQL from SqlSegmentsMetadataManager
and IndexerSQLMetadataStorageCoordinator into a new class SqlSegmentsMetadataQuery.
It focuses on SQL related to retrieving segment payloads and marking
segments used and unused.

In addition to cleaning up the code a bit, this patch also fixes a bug
with years before 0 or after 9999. The prior SQL did not work properly
because dates outside this range cannot be compared as strings. The new
code does work for these far-past and far-future years.

So, if you're ever interested in using Druid to analyze things from
ancient Babylon, you better apply this patch first!

* Fix test compiling.

* Fixes and improvements.

* Fix forbidden API.

* Additional fixes.
2021-11-24 14:51:53 -08:00
Gian Merlino 5e168b861a
StorageAdapter: Add getRowSignature method. (#11953)
Simplifies logic for callers that only want to get a list of all the
column names, or column names and types. Updated callers SegmentAnalyzer,
HashJoinSegmentStorageAdapter, and DruidSegmentReader.
2021-11-24 13:14:25 -08:00
Maytas Monsereenusorn bb3d2a433a
Support filtering data in Auto Compaction (#11922)
* add impl

* fix checkstyle

* add test

* add test

* add unit tests

* fix unit tests

* fix unit tests

* fix unit tests

* add IT

* add IT

* add comments

* fix spelling
2021-11-24 10:56:38 -08:00
Kashif Faraz 48dbe0ea45
Handle null values in Range Partition dimension distribution (#11973)
This PR adds support for handling null dimension values while creating partition boundaries
in range partitioning.

This means that we can now have partition boundaries like [null, "abc"] or ["abc", null, "def"].
2021-11-24 14:30:02 +05:30
Clint Wylie f260bbed23
restore and deprecate AggregatorFactory methods (#11917)
* add back and deprecate aggregator factory methods so i can say i told you so when i delete these later

* rename to make less ambiguous, fix fill method

* adjust
2021-11-19 15:59:35 -08:00
Gian Merlino 36ee0367ff
Scan: Add "orderBy" parameter. (#11930)
* Scan: Add "orderBy" parameter.

This patch adds an API for requesting non-time orderings, although it
does not actually add the ability to execute such queries.

The changes are done in such a way that no matter how Scan query objects
are constructed, they will have a correct "getOrderBy". This will enable
us to switch the execution to exclusively use "getOrderBy" later on when
it's implemented.

Scan queries are serialized such that they only include "order" (time
order) if the ordering is time-based, and they only include "orderBy" if
the ordering is non-time-based. This maximizes compatibility with
the existing API while also providing a clean look for formatted queries.

Because this patch does not include execution logic, if someone actually
tries to run a query with non-time ordering, then they will get an error
like "Cannot execute query with orderBy [quality ASC]".

* SQL module fixes.

* Add spotbugs-exclude.

* Remove unused method.
2021-11-19 08:19:12 -08:00
Nikhil Navadiya 3c51136098
Add worker category dimension (#11554)
* Add worker category as dimension in TaskSlotCountStatsMonitor

* Change description

* Add workerConfig as field

* Modify HttpRemoteTaskRunnerTest to test worker category in taskslot metrics

* Fixing tests

* Fixing alerts

* Adding unit test in SingleTaskBackgroundRunnerTest for task slot metrics APIs

* Resolving false positive spell check

* addressing comments

* throw UnsupportedOperationException for tasklotmetrics APIs in SingleTaskBackgroundRunner

Co-authored-by: Nikhil Navadiya <nnavadiya@twitter.com>
2021-11-18 22:59:07 -08:00
Clint Wylie 7f0bede878
autocompaction support for complex dimensions (#11924)
* autocompaction support for complex dimensions

* more test
2021-11-16 15:57:44 -08:00
Agustin Gonzalez a13a96d5e0
Avoid materializing list of segment files when finding a partition file during shuffle (#11903)
* Avoid materializing list of segment files (it can cause OOM/memory pressure) as well as looping over the files.

* Validate subTaskId
2021-11-11 10:51:52 -07:00
Kashif Faraz 223c5692a8
Add dimension partitioningType to metrics to track usage of different partitioning schemes (#11902)
Add method ShardSpec.getType() to get name of shard spec type
List all names of shard spec types in the interface ShardSpec itself
for easy reference and maintenance
Add dimension partitioningType to metric segment/added/bytes
2021-11-11 18:34:27 +05:30
Gian Merlino babf00f8e3
Migrate File.mkdirs to FileUtils.mkdirp. (#11879)
* Migrate File.mkdirs to FileUtils.mkdirp.

* Remove unused imports.

* Fix LookupReferencesManager.

* Simplify.

* Also migrate usages of forceMkdir.

* Fix var name.

* Fix incorrect call.

* Update test.
2021-11-09 11:10:49 -08:00
Maytas Monsereenusorn ddc68c6a81
Support changing dimension schema in Auto Compaction (#11874)
* add impl

* add unit tests

* fix checkstyle

* add impl

* add impl

* add impl

* add impl

* add impl

* add impl

* fix test

* add IT

* add IT

* fix docs

* add test

* address comments

* fix conflict
2021-11-08 21:17:08 -08:00
Kashif Faraz 2d77e1a3c6
Add support for multi dimension range partitioning (#11848)
This PR adds support for range partitioning on multiple dimensions. It extends on the
concept and implementation of single dimension range partitioning.

The new partition type added is range which corresponds to a set of Dimension Range Partition classes. single_dim is now treated as a range type partition with a single partition dimension.

The start and end values of a DimensionRangeShardSpec are represented
by StringTuples, where each String in the tuple is the value of a partition dimension.
2021-11-06 12:50:17 +05:30
Karan Kumar 90640bb316
Support for hadoop 3 via maven profiles (#11794)
Add support for hadoop 3 profiles . Most of the details are captured in #11791 .
We use a combination of maven profiles and resource filtering to achieve this. Hadoop2 is supported by default and a new maven profile with the name hadoop3 is created. This will allow the user to choose the profile which is best suited for the use case.
2021-10-30 22:46:24 +05:30
Maytas Monsereenusorn 33d9d9bd74
Add rollup config to auto and manual compaction (#11850)
* add rollup to auto and manual compaction

* add unit tests

* add unit tests

* add IT

* fix checkstyle
2021-10-29 10:22:25 -07:00
Jonathan Wei a96aed021e
Fix indefinite WAITING batch task when lock is revoked (#11788)
* Fix indefinite WAITING batch task when lock is revoked

* Use revoked property on TaskLock

* Update TimeChunkLockAcquireAction to return TaskLock for revoked locks
2021-10-27 17:49:15 -05:00
Liran Funaro 9ca8f1ec97
Remove IncrementalIndex template modifier (#11160)
Co-authored-by: Liran Funaro <liran.funaro@verizonmedia.com>
2021-10-27 13:10:37 -07:00