Commit Graph

13229 Commits

Author SHA1 Message Date
Giulio Talarico 76e5048aab
fix supervisor spec api submission commands (#14877) 2023-08-23 14:38:09 +05:30
Zoltan Haindrich e806d09309
Allow EARLIEST/EARLIEST_BY/LATEST/LATEST_BY for STRING columns without specifying maxStringBytes (#14848) 2023-08-22 22:50:19 -07:00
Clint Wylie 7b5012ea6e
override retry attempts for InputEntityIteratingReaderTest for much faster test run (#14897) 2023-08-22 22:01:47 -07:00
Clint Wylie fb053c399c
consolidate json and auto indexers, remove v4 nested column serializer (#14456) 2023-08-22 18:50:11 -07:00
Soumyava 6817de9376
Doc changes for avatica transparent reconnection (#14896) 2023-08-22 11:58:17 -07:00
Zoltan Haindrich b9a33949fd
Fix aggregation filter expression processing in the absense of projection (#14893)
* test

* fix

* add 33 test

* crap

* Revert "crap"

This reverts commit 2751198deb.

* cleanup test

* celanup

* rename test
2023-08-22 10:17:14 -07:00
Kashif Faraz 9376d8d6e1
Refactor: Move `UpdateCoordinatorStateAndPrepareCluster` duty out of `DruidCoordinator` (#14845)
Motivation:
- Clean up `DruidCoordinator` and move methods to classes where they are most relevant

Changes:
- No functional change
- Add duty `PrepareBalancerAndLoadQueues` to replace `UpdateCoordinatorState`
- Move map of `LoadQueuePeon` from `DruidCoordinator` to `LoadQueueTaskMaster`
- Make `BalancerStrategyFactory` an abstract class and keep the balancer executor here
- Move reporting of used segment stats and historical capacity stats from
`CollectSegmentAndServerStats` to `PrepareBalancerAndLoadQueues`
- Move reporting of unavailable and under-replicated segment stats from
`CollectSegmentAndServerStats` to `UpdateReplicationStatus` duty
2023-08-22 19:50:41 +05:30
Zoltan Haindrich 14c1aff150
Fix error messages relating to OVERWRITE keyword (#14870)
OVERWRITE should not be a fully reserved keyword
2023-08-22 16:17:49 +05:30
AmatyaAvadhanula bd505062de
Improve streaming ingestion completion timeout error message (#14636)
* Improve streaming ingestion completion timeout error message

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Co-authored-by: Benedict Jin <asdf2014@apache.org>
2023-08-22 14:33:28 +05:30
Clint Wylie 194a9c9abc
set druid.expressions.useStrictBooleans to true by default (#14734) 2023-08-22 00:19:56 -07:00
Tejaswini Bandlamudi d87056e708
Upgrade guava version to 31.1-jre (#14767)
Currently, Druid is using Guava 16.0.1 version. This upgrade to 31.1-jre fixes the following issues.

CVE-2018-10237 (Unbounded memory allocation in Google Guava 11.0 through 24.x before 24.1.1 allows remote attackers to conduct denial of service attacks against servers that depend on this library and deserialize attacker-provided data because the AtomicDoubleArray class (when serialized with Java serialization) and the CompoundOrdering class (when serialized with GWT serialization) perform eager allocation without appropriate checks on what a client has sent and whether the data size is reasonable). We don't use Java or GWT serializations. Despite being false positive they're causing red security scans on Druid distribution.
Latest version of google-client-api is incompatible with the existing Guava version. This PR unblocks Update google client apis to latest version #14414
2023-08-22 12:09:53 +05:30
Benedict Jin 18f7cb6926
Fixed broken URL of python api tutorial (#14881) 2023-08-22 09:53:41 +05:30
Clint Wylie 5d1412949e
enable sql compatible null handling mode by default (#14792)
* enable sql compatible null handling mode by default
* fix bug with string first/last aggs when druid.generic.useDefaultValueForNull=false
2023-08-21 20:07:13 -07:00
Katya Macedo 5f74ef56f1
Clean up Kafka supervisor topic (#14651)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-08-21 11:55:38 -07:00
Nhi Pham 9fe7c01c16
Automatic compaction API documentation refactor (#14740)
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-08-21 11:34:41 -07:00
Peter Marshall 0dfd99e381
202307-notebook-unionall (#14726)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-08-21 10:55:58 -07:00
Vadim Ogievetsky 631dc3b589
add Kafka topic column controls (#14865) 2023-08-21 21:33:23 +05:30
Abhishek Agarwal a38b4f0491
Add topic name as a column in the Kafka Input format (#14857)
This PR adds a way to store the topic name in a column. Such a column can be used to distinguish messages coming from different topics in multi-topic ingestion.
2023-08-21 21:32:34 +05:30
Kashif Faraz 92906059d2
Remove segmentsToBeDropped from SegmentTransactionInsertAction (#14883)
Motivation:
- There is no usage of the `SegmentTransactionInsertAction` which passes a
non-null non-empty value of `segmentsToBeDropped`.
- This is not really needed either as overshadowed segments are marked as unused
by the Coordinator and need not be done in the same transaction as committing segments.
- It will also help simplify the changes being made in #14407 

Changes:
- Remove `segmentsToBeDropped` from the task action and all intermediate methods
- Remove related tests which are not needed anymore
2023-08-21 20:08:56 +05:30
Kashif Faraz c211dcc4b3
Clean up compaction logs on coordinator (#14875)
Changes:
- Move logic of `NewestSegmentFirstIterator.needsCompaction` to `CompactionStatus`
to improve testability and readability
- Capture the list of checks performed to determine if compaction is needed in a readable
manner in `CompactionStatus.CHECKS`
- Make `CompactionSegmentIterator` iterate over instances of `SegmentsToCompact`
instead of `List<DataSegment>`. This allows use of the `umbrellaInterval` later.
- Replace usages of `QueueEntry` with `SegmentsToCompact`
- Move `SegmentsToCompact` out of `NewestSegmentFirstIterator`
- Simplify `CompactionStatistics`
- Reduce level of less important logs to debug
- No change made to tests to ensure correctness
2023-08-21 17:30:41 +05:30
Kashif Faraz 07a193a142
Use separate executor for each coordinator duty group (#14869)
Changes:
- Use separate executor for every duty group
- This change is thread-safe as every duty group uses its own copy of
`DruidCoordinatorRuntimeParams` and does not share any other mutable instances
with other duty groups.
- With the exception of `HistoricalManagementDuties`, duty groups are typically not
very compute intensive and mostly perform database or HTTP I/O. So, coordinator
resources would still mostly be available for `HistoricalManagementDuties`.
2023-08-21 15:53:22 +05:30
Abhishek Agarwal 9065ef1aff
Fix a bug in QosFilter (#14859)
QoSFilter class is trying to parse the timeout as an integer. We need to round a value of query timeout that is higher than INT.MAX to INT.MAX.
2023-08-21 13:00:41 +05:30
317brian 263ac36e8d
docs: fix autolabeler for jupyter notebooks (#14862) 2023-08-18 12:42:36 -07:00
Kashif Faraz 097b645005
Clean up after add kill bufferPeriod (#14868)
Follow up changes to #12599 

Changes:
- Rename column `used_flag_last_updated` to `used_status_last_updated`
- Remove new CLI tool `UpdateTables`.
  - We already have a `CreateTables` with similar functionality, which should be able to
  handle update cases too.
  - Any user running the cluster for the first time should either just have `connector.createTables`
  enabled or run `CreateTables` which should create tables at the latest version.
  - For instance, the `UpdateTables` tool would be inadequate when a new metadata table has
  been added to Druid, and users would have to run `CreateTables` anyway.
- Remove `upgrade-prep.md` and include that info in `metadata-init.md`.
- Fix log messages to adhere to Druid style
- Use lambdas
2023-08-19 00:00:04 +05:30
dependabot[bot] 1e14df4c49
Bump com.ibm.icu:icu4j from 55.1 to 73.2 (#14853)
* Bump com.ibm.icu:icu4j from 55.1 to 73.2

Bumps [com.ibm.icu:icu4j](https://github.com/unicode-org/icu) from 55.1 to 73.2.
- [Release notes](https://github.com/unicode-org/icu/releases)
- [Commits](https://github.com/unicode-org/icu/commits)

---
updated-dependencies:
- dependency-name: com.ibm.icu:icu4j
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* update licenses.yaml

* update Unicode/ICU license

* fix license check for unicode/icu

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
2023-08-18 09:10:39 -04:00
Jonathan Wei a8eaa1e4ed
Skip streaming auto-scaling action if supervisor is idle (#14773)
* Skip streaming auto-scaling action if supervisor is idle

* Update indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>

---------

Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
2023-08-17 19:43:25 -05:00
Lucas Capistrant 9c124f2cde
Add a configurable bufferPeriod between when a segment is marked unused and deleted by KillUnusedSegments duty (#12599)
* Add new configurable buffer period to create gap between mark unused and kill of segment

* Changes after testing

* fixes and improvements

* changes after initial self review

* self review changes

* update sql statement that was lacking last_used

* shore up some code in SqlMetadataConnector after self review

* fix derby compatibility and improve testing/docs

* fix checkstyle violations

* Fixes post merge with master

* add some unit tests to improve coverage

* ignore test coverage on new UpdateTools cli tool

* another attempt to ignore UpdateTables in coverage check

* change column name to used_flag_last_updated

* fix a method signature after column name switch

* update docs spelling

* Update spelling dictionary

* Fixing up docs/spelling and integrating altering tasks table with my alteration code

* Update NULL values for used_flag_last_updated in the background

* Remove logic to allow segs with null used_flag_last_updated to be killed regardless of bufferPeriod

* remove unneeded things now that the new column is automatically updated

* Test new background row updater method

* fix broken tests

* fix create table statement

* cleanup DDL formatting

* Revert adding columns to entry table by default

* fix compilation issues after merge with master

* discovered and fixed metastore inserts that were breaking integration tests

* fixup forgotten insert by using pattern of sharing now timestamp across columns

* fix issue introduced by merge

* fixup after merge with master

* add some directions to docs in the case of segment table validation issues
2023-08-17 19:32:51 -05:00
Vadim Ogievetsky 7e147ee905
Web console: Reset to specific offsets dialog (#14863)
* add dialog

* copy changes
2023-08-17 15:38:56 -07:00
Vadim Ogievetsky 59415ba9b2
Web console: expose new coordinator properties in the dialog (#14791)
* expose new coordinator properties in the dialog

* escape
2023-08-17 15:37:23 -07:00
Abhishek Radhakrishnan 37db5d9b81
Reset offsets supervisor API (#14772)
* Add supervisor /resetOffsets API.

- Add a new endpoint /druid/indexer/v1/supervisor/<supervisorId>/resetOffsets
which accepts DataSourceMetadata as a body parameter.
- Update logs, unit tests and docs.

* Add a new interface method for backwards compatibility.

* Rename

* Adjust tests and javadocs.

* Use CoreInjectorBuilder instead of deprecated makeInjectorWithModules

* UT fix

* Doc updates.

* remove extraneous debugging logs.

* Remove the boolean setting; only ResetHandle() and resetInternal()

* Relax constraints and add a new ResetOffsetsNotice; cleanup old logic.

* A separate ResetOffsetsNotice and some cleanup.

* Minor cleanup

* Add a check & test to verify that sequence numbers are only of type SeekableStreamEndSequenceNumbers

* Add unit tests for the no op implementations for test coverage

* CodeQL fix

* checkstyle from merge conflict

* Doc changes

* DOCUSAURUS code tabs fix. Thanks, Brian!
2023-08-17 14:13:10 -07:00
dependabot[bot] 2cc3bd6383
Bump joda-time:joda-time from 2.12.4 to 2.12.5 (#14855)
* Bump joda-time:joda-time from 2.12.4 to 2.12.5

Bumps [joda-time:joda-time](https://github.com/JodaOrg/joda-time) from 2.12.4 to 2.12.5.
- [Release notes](https://github.com/JodaOrg/joda-time/releases)
- [Changelog](https://github.com/JodaOrg/joda-time/blob/main/RELEASE-NOTES.txt)
- [Commits](https://github.com/JodaOrg/joda-time/compare/v2.12.4...v2.12.5)

---
updated-dependencies:
- dependency-name: joda-time:joda-time
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* update licenses.yaml

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
2023-08-17 11:24:22 -07:00
dependabot[bot] 2a7fbf2ab4
Bump org.apache.directory.api:api-util from 1.0.3 to 2.1.3 (#14852)
Bumps [org.apache.directory.api:api-util](https://github.com/apache/directory-ldap-api) from 1.0.3 to 2.1.3.
- [Commits](https://github.com/apache/directory-ldap-api/compare/1.0.3...2.1.3)

---
updated-dependencies:
- dependency-name: org.apache.directory.api:api-util
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-17 08:56:34 -07:00
Kashif Faraz fffb2e4fe7
Speed up SQLMetadataStorageActionHandlerTest (#14856)
Changes
- Reduce test time of `SQLMetadataStorageActionHandlerTest.testMigration`
- Slightly modify log messages to adhere to Druid style
2023-08-17 18:02:43 +05:30
Abhishek Agarwal b97cc45d81
Add clarification to the docs for multi-topic Kafka ingestion (#14847)
Follow-up to #14828. Added some more clarification about how topicPattern is used.
2023-08-17 12:52:06 +05:30
Vadim Ogievetsky dc2ae1e99c
Web console: improving the helper queries by allowing for running inline helper queries (#14801)
* remove helper queries

* fix tests

* take care of zero queries also

* switch to better place
2023-08-16 23:50:43 -07:00
Kashif Faraz 5d4ac64178
Adapt maxSegmentsToMove based on cluster skew (#14584)
Changes:
- No change in behaviour if `smartSegmentLoading` is disabled
- If `smartSegmentLoading` is enabled
  - Compute `balancerComputeThreads` based on `numUsedSegments`
  - Compute `maxSegmentsToMove` based on `balancerComputeThreads`
  - Compute `segmentsToMoveToFixSkew` based on usage skew
  - Compute `segmentsToMove = Math.min(maxSegmentsToMove, segmentsToMoveToFixSkew)`

Limits:
- 1 <= `balancerComputeThreads` <= 8
- `maxSegmentsToMove` <= 20% of total segments
- `minSegmentsToMove` = 0.15% of total segments
2023-08-17 11:14:54 +05:30
Vadim Ogievetsky cb27d0d2ed
Web console: enable Kafka multi-topic ingestion from the data loader (#14833)
* multi topic ux

* updated to match new api
2023-08-17 09:57:34 +05:30
317brian 6b4dda964d
Docusaurus2 upgrade for master (#14411)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-08-16 19:01:21 -07:00
Clint Wylie 6b14dde50e
deprecate config-magic in favor of json configuration stuff (#14695)
* json config based processing and broker merge configs to deprecate config-magic
2023-08-16 18:23:57 -07:00
Pranav 26d82fd342
fix filtering bug in filtering unnest cols and dim cols: Received a non-applicable rewrite (#14587) 2023-08-16 17:57:16 -07:00
Peter Marshall f585f0a8ed
202306-docs-notebook topn (#14478)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-08-16 14:50:49 -07:00
Jill Osborne 2561477e87
Jupyter nested columns tutorial (#14788) 2023-08-16 14:45:37 -07:00
dependabot[bot] faf79470ae
Bump io.dropwizard.metrics:metrics-graphite from 3.1.2 to 4.2.19 (#14842)
* Bump io.dropwizard.metrics:metrics-graphite from 3.1.2 to 4.2.19

Bumps [io.dropwizard.metrics:metrics-graphite](https://github.com/dropwizard/metrics) from 3.1.2 to 4.2.19.
- [Release notes](https://github.com/dropwizard/metrics/releases)
- [Commits](https://github.com/dropwizard/metrics/compare/v3.1.2...v4.2.19)

---
updated-dependencies:
- dependency-name: io.dropwizard.metrics:metrics-graphite
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* align graphite-emitter dropwizard version with core

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
2023-08-16 13:58:35 -07:00
YongGang 3954685aae
Report more metrics to monitor K8s task runner (#14771)
* Report pod running metrics to monitor K8s task runner

* refine method definition

* fix checkstyle

* implement task metrics

* more comment

* address comments

* update doc for the new metrics reported

* fix checkstyle

* refine method definition

* minor refine
2023-08-16 14:03:53 -04:00
dependabot[bot] 97c3773012
Bump commons-cli:commons-cli from 1.3.1 to 1.5.0 (#14837)
* Bump commons-cli:commons-cli from 1.3.1 to 1.5.0

Bumps commons-cli:commons-cli from 1.3.1 to 1.5.0.

---
updated-dependencies:
- dependency-name: commons-cli:commons-cli
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* update licenses.yaml

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
2023-08-16 07:37:56 -07:00
dependabot[bot] 8be7751dbd
Bump org.tukaani:xz from 1.8 to 1.9 (#14839)
* Bump org.tukaani:xz from 1.8 to 1.9

Bumps org.tukaani:xz from 1.8 to 1.9.

---
updated-dependencies:
- dependency-name: org.tukaani:xz
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* update licenses.yaml

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
2023-08-16 07:37:29 -07:00
dependabot[bot] e55fe67535
Bump apache.curator.version from 5.4.0 to 5.5.0 (#14843)
* Bump apache.curator.version from 5.4.0 to 5.5.0

Bumps `apache.curator.version` from 5.4.0 to 5.5.0.

Updates `org.apache.curator:curator-client` from 5.4.0 to 5.5.0
- [Commits](https://github.com/apache/curator/compare/apache-curator-5.4.0...apache-curator-5.5.0)

Updates `org.apache.curator:curator-framework` from 5.4.0 to 5.5.0
- [Commits](https://github.com/apache/curator/compare/apache-curator-5.4.0...apache-curator-5.5.0)

Updates `org.apache.curator:curator-recipes` from 5.4.0 to 5.5.0
- [Commits](https://github.com/apache/curator/compare/apache-curator-5.4.0...apache-curator-5.5.0)

Updates `org.apache.curator:curator-x-discovery` from 5.4.0 to 5.5.0
- [Commits](https://github.com/apache/curator/compare/apache-curator-5.4.0...apache-curator-5.5.0)

Updates `org.apache.curator:curator-test` from 5.4.0 to 5.5.0
- [Commits](https://github.com/apache/curator/compare/apache-curator-5.4.0...apache-curator-5.5.0)

---
updated-dependencies:
- dependency-name: org.apache.curator:curator-client
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: org.apache.curator:curator-framework
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: org.apache.curator:curator-recipes
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: org.apache.curator:curator-x-discovery
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: org.apache.curator:curator-test
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* update licenses.yaml

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
2023-08-16 07:36:58 -07:00
Abhishek Agarwal 7911a04064
Refactoring of multi-topic kafka ingestion docs (#14828)
In this PR, I have gotten rid of multiTopic parameter and instead added a topicPattern parameter. Kafka supervisor will pass topicPattern or topic as the stream name to the core ingestion engine. There is validation to ensure that only one of topic or topicPattern will be set. This new setting is easier to understand than overloading the topic field that earlier could be interpreted differently depending on the value of some other field.
2023-08-16 18:00:11 +05:30
Kashif Faraz d9221e46e4
Completely disable cachingCost balancer strategy (#14798)
`cachingCost` has been deprecated in #14484 and is not advised to be used in
production clusters as it may cause usage skew across historicals which the
coordinator is unable to rectify. This PR completely disables `cachingCost` strategy
as it has now been rendered redundant due to recent performance improvements
made to `cost` strategy.

Changes
- Disable `cachingCost` strategy
- Add `DisabledCachingCostBalancerStrategyFactory` for the time being so that we
can give a proper error message before falling back to `CostBalancerStrategy`. This
will be removed in subsequent releases.
- Retain `CachingCostBalancerStrategy` for testing/benchmarking purposes.
- Add javadocs to `DiskNormalizedCostBalancerStrategy`
2023-08-16 11:43:52 +05:30
dependabot[bot] 9be0f64f50
Bump org.apache.commons:commons-compress from 1.21 to 1.23.0 (#14820)
* Bump org.apache.commons:commons-compress from 1.21 to 1.23.0

Bumps org.apache.commons:commons-compress from 1.21 to 1.23.0.

---
updated-dependencies:
- dependency-name: org.apache.commons:commons-compress
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* update licenses.yaml

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
2023-08-15 20:08:54 -04:00