Commit Graph

11753 Commits

Author SHA1 Message Date
dependabot[bot] a5dfb911de
Bump maven-site-plugin from 3.1 to 3.11.0 (#12310)
Bumps [maven-site-plugin](https://github.com/apache/maven-site-plugin) from 3.1 to 3.11.0.
- [Release notes](https://github.com/apache/maven-site-plugin/releases)
- [Commits](https://github.com/apache/maven-site-plugin/compare/maven-site-plugin-3.1...maven-site-plugin-3.11.0)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-site-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-17 15:17:29 +08:00
Jihoon Son 5e23674fe5
Fix a race condition in the '/tasks' Overlord API (#12330)
* finds complete and active tasks from the same snapshot

* overlord resource

* unit test

* integration test

* javadoc and cleanup

* more cleanup

* fix test and add more
2022-03-17 10:47:45 +09:00
Frank Chen d745d0b338
Add JDK 11 (#12333) 2022-03-16 15:03:04 -07:00
Dr. Sizzles 69f928f50e
Adding k8s support for human readable parsing (#12316)
* Adding k8s support for human readable parsing

* Update docs/configuration/human-readable-byte.md

Co-authored-by: Frank Chen <frankchen@apache.org>

* Update docs/configuration/human-readable-byte.md

Co-authored-by: Frank Chen <frankchen@apache.org>

* Update core/src/main/java/org/apache/druid/java/util/common/HumanReadableBytes.java

Co-authored-by: Frank Chen <frankchen@apache.org>

* Changes per review

Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
Co-authored-by: Frank Chen <frankchen@apache.org>
2022-03-16 11:18:47 +08:00
Xavier Léauté 5d02a91faa
upgrade Error Prone to 2.11 (requires Java 11) (#12306)
The latest version of Error Prone now requires Java 11. Upgrading means we can
remove a lot of the maven profile complexity required to run checks with Java 8.
This also requires switching our strict build to use Java 11.

* update error-prone to 2.11
* remove need for specific maven profiles for Java 8 and Java 15
* fix additional Error Prone warnings with Java 11
* update strict build to use Java 11
2022-03-14 19:40:48 -07:00
somu-imply b5195c5095
Graceful null handling and correctness in DoubleMean Aggregator (#12320)
* Adding null handling for double mean aggregator

* Updating code to handle nulls in DoubleMean aggregator

* oops last one should have checkstyle issues. fixed

* Updating some code and test cases

* Checking on object is null in case of numeric aggregator

* Adding one more test to improve coverage

* Changing one test as asked in the review

* Changing one test as asked in the review for nulls
2022-03-14 16:52:47 -07:00
mchades 3de1272926
bug fix: merge results of group by limit push down (#11969) 2022-03-11 09:04:34 -08:00
Kyle Larose db91961af7
kubernetes: restart watch on null response (#12233)
* kubernetes: restart watch on null response

Kubernetes watches allow a client to efficiently processes changes to
resources. However, they have some idiosyncrasies. In particular, they
can error out for various reasons leading to what would normally be seen
as an invalid result.

The Druid kubernetes node discovery subsystem does not handle a certain
case properly. The watch can return an item with a null object.  These
leads to a null pointer exception. When this happens, the provider needs
to restart the watch, because rerunning the watch from the same resource
version leads to the same result: yet another null pointer exception.

This commit changes the provider to handle null objects by restarting
the watch.

* review: add more coverage

This adds a bit more coverage to the K8sDruidNodeDiscoveryProvider watch
loop, and removes an unnecessay return.

* kubernetes: reduce logging verbosity

The log messages about items being NULL don't really deserve to be at a
level other than DEBUG since they are not actionable, particularly since
we automatically recover now. Move them to the DEBUG level.
2022-03-10 12:56:40 -08:00
Gian Merlino cb2b2b696d
Fix error message for groupByEnableMultiValueUnnesting. (#12325)
* Fix error message for groupByEnableMultiValueUnnesting.

It referred to the incorrect context parameter.

Also, create a dedicated exception class, to allow easier detection of this
specific error.

* Fix other test.

* More better error messages.

* Test getDimensionName method.
2022-03-10 11:37:24 -08:00
Parag Jain 2efb74ff1e
fix supervisor auto scaler config serde bug (#12317) 2022-03-09 16:17:12 -08:00
Jihoon Son d89d4ff588
Git hooks should fail on errors; pass args to git hooks (#12322)
* Git hooks should fail on errors

* don't set shell to pass args
2022-03-10 09:07:50 +09:00
Abhishek Agarwal 6346b9561d
Reuse the InputEntityReader in SettableByteEntityReader (#12269)
* Reuse the InputEntityReader in SettableByteEntityReader

* Fix logic

* Fix kafka streaming ingestion

* Add Tests for kafka input format change

* Address review comments
2022-03-09 14:38:31 -08:00
Clint Wylie 9cfb23935f
push value range and set index get operations into BitmapIndex (#12315)
* push value range and set index get operations into BitmapIndex

* fix bug

* oops, fix better

* better like, fix test, javadocs

* fix checkstyle

* simplify and fixes

* cache

* fix tests

* move indexOf into GenericIndexed

* oops

* fix tests
2022-03-09 13:30:58 -08:00
Rohan Garg 9f6a930462
Fix join query incase of filter explosion during CNF conversion (#12324) 2022-03-09 12:43:09 -08:00
Clint Wylie dc0372a28e
improve FileWriteOutBytes.readFully (#12323)
* improve FileWriteOutBytes.readFully

* no need to flush if out of bounds
2022-03-09 11:45:45 -08:00
AmatyaAvadhanula 7bf1d8c5c0
Facilitate lazy initialization of connections to mitigate overwhelming of Coordinator (#12298)
Add config for eager / lazy connection initialization in ResourcePool

Description
Currently, when multiple tasks are launched, each of them eagerly initializes a full pool's worth of connections to the coordinator.

While this is acceptable when the parameter for number of eagerConnections (== maxSize) is small, this can be problematic in environments where it's a large value (say 1000) and multiple tasks are launched simultaneously, which can cause a large number of connections to be created to the coordinator, thereby overwhelming it.

Patch
Nodes like the broker may require eager initialization of resources and do not create connections with the Coordinator.
It is unnecessary to do this with other types of nodes.

A config parameter eagerInitialization is added, which when set to true, initializes the max permissible connections when ResourcePool is initialized.

If set to false, lazy initialization of connection resources takes place.

NOTE: All nodes except the broker have this new parameter set to false in the quickstart as part of this PR

Algorithm
The current implementation relies on the creation of maxSize resources eagerly.

The new implementation's behaviour is as follows:

If a resource has been previously created and is available, lend it.
Else if the number of created resources is less than the allowed parameter, create and lend it.
Else, wait for one of the lent resources to be returned.
2022-03-09 23:17:43 +05:30
Rohan Garg 56fbd2af6f
Guard against exponential increase of filters during CNF conversion (#12314)
Currently, the CNF conversion of a filter is unbounded, which means that it can create as many filters as possible thereby also leading to OOMs in historical heap. We should throw an error or disable CNF conversion if the filter count starts getting out of hand. There are ways to do CNF conversion with linear increase in filters as well but that has been left out of the scope of this change since those algorithms add new variables in the predicate - which can be contentious.
2022-03-09 13:19:52 +05:30
Clint Wylie 0600772cce
use a non-concurrent map for lookups-cached-global unless incremental updates are actually required (#12293)
* use a non-concurrent map for lookups-cached-global unless incremental updates are actually required
* adjustments
* fix test
2022-03-08 21:54:25 -08:00
Agustin Gonzalez abe76ccb90
Batch ingestion replace (#12137)
* Tombstone support for replace functionality

* A used segment interval is the interval of a current used segment that overlaps any of the input intervals for the spec

* Update compaction test to match replace behavior

* Adapt ITAutoCompactionTest to work with tombstones rather than dropping segments. Add support for tombstones in the broker.

* Style plus simple queriableindex test

* Add segment cache loader tombstone test

* Add more tests

* Add a method to the LogicalSegment to test whether it has any data

* Test filter with some empty logical segments

* Refactor more compaction/dropexisting tests

* Code coverage

* Support for all empty segments

* Skip tombstones when looking-up broker's timeline. Discard changes made to tool chest to avoid empty segments since they will no longer have empty segments after lookup because we are skipping over them.

* Fix null ptr when segment does not have a queriable index

* Add support for empty replace interval (all input data has been filtered out)

* Fixed coverage & style

* Find tombstone versions from lock versions

* Test failures & style

* Interner was making this fail since the two segments were consider equal due to their id's being equal

* Cleanup tombstone version code

* Force timeChunkLock whenever replace (i.e. dropExisting=true) is being used

* Reject replace spec when input intervals are empty

* Documentation

* Style and unit test

* Restore test code deleted by mistake

* Allocate forces TIME_CHUNK locking and uses lock versions. TombstoneShardSpec added.

* Unused imports. Dead code. Test coverage.

* Coverage.

* Prevent killer from throwing an exception for tombstones. This is the killer used in the peon for killing segments.

* Fix OmniKiller + more test coverage.

* Tombstones are now marked using a shard spec

* Drop a segment factory.json in the segment cache for tombstones

* Style

* Style + coverage

* style

* Add TombstoneLoadSpec.class to mapper in test

* Update core/src/main/java/org/apache/druid/segment/loading/TombstoneLoadSpec.java

Typo

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>

* Update docs/configuration/index.md

Missing

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>

* Typo

* Integrated replace with an existing test since the replace part was redundant and more importantly, the test file was very close or exceeding the 10 min default "no output" CI Travis threshold.

* Range does not work with multi-dim

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
2022-03-08 20:07:02 -07:00
Clint Wylie dae53ae36a
adjust topn heap operation when string is dictionary encoded, but not uniquely (#12291)
* add topn heap optimization when string is dictionary encoded, but not uniquely

* use array instead

* is same

* fix javadoc

* fix

* Update StringTopNColumnAggregatesProcessor.java
2022-03-08 14:32:40 -08:00
Jihoon Son 0e097ead36
Add git hooks that can run multiple scripts (#12300)
* Add git hooks that can run multiple scripts

* scripts to install/uninstall hooks

* better message for uninstall; support pre-push params
2022-03-09 07:16:47 +09:00
Gian Merlino 875e0696e0
GroupBy: Cap dictionary-building selector memory usage. (#12309)
* GroupBy: Cap dictionary-building selector memory usage.

New context parameter "maxSelectorDictionarySize" controls when the
per-segment processing code should return early and trigger a trip
to the merge buffer.

Includes:

- Vectorized and nonvectorized implementations.
- Adjustments to GroupByQueryRunnerTest to exercise this code in
  the v2SmallDictionary suite. (Both the selector dictionary and
  the merging dictionary will be small in that suite.)
- Tests for the new config parameter.

* Fix issues from tests.

* Add "pre-existing" to dictionary.

* Simplify GroupByColumnSelectorStrategy interface by removing one of the writeToKeyBuffer methods.

* Adjustments from review comments.
2022-03-08 13:13:11 -08:00
Kashif Faraz baea3ec614
Break up parallel indexing unit test to reduce test times (#12313)
* Break up parallel indexing unit test to reduce test times

* Fix checkstyle
2022-03-07 16:26:24 -07:00
Gian Merlino 28f8bcce9b
Always reopen stream in FileUtils.copyLarge, RetryingInputStream. (#12307)
* Always reopen stream in FileUtils.copyLarge, RetryingInputStream.

When an InputStream throws an exception from one of its read methods,
we should assume it's bad and reopen it.

The main changes here are:

- In FileUtils.copyLarge, replace InputStream with InputStreamSupplier.
- In RetryingInputStream, collapse retryCondition and resetCondition
  into a single condition. Also, make it required, since every usage
  is passing in a specific condition anyway.

* Test fixes.

* Fix read impl.
2022-03-05 14:39:14 -08:00
Victoria Lim 903174de20
correct errors on compaction doc (#12308) 2022-03-04 15:33:35 -08:00
Gian Merlino 3b373114dc
Officially support Java 11. (#12232)
There aren't any changes in this patch that improve Java 11
compatibility; these changes have already been done separately. This
patch merely updates documentation and explicit Java version checks.

The log message adjustments in DruidProcessingConfig are there to make
things a little nicer when running in Java 11, where we can't measure
direct memory _directly_, and so we may auto-size processing buffers
incorrectly.
2022-03-04 14:15:45 -08:00
Gian Merlino ada3ae08df
Retain order in TaskReport. (#12005) 2022-03-04 08:06:20 -08:00
Sandeep 61e1ffc7f7
add a new query laning metrics to visualize lane assignment (#12111)
* add a new query laning metrics to visualize lane assignment

* fixes :spotbugs check

* Update docs/operations/metrics.md

Co-authored-by: Benedict Jin <asdf2014@apache.org>

* Update server/src/main/java/org/apache/druid/server/QueryScheduler.java

Co-authored-by: Benedict Jin <asdf2014@apache.org>

* Update server/src/main/java/org/apache/druid/server/QueryScheduler.java

Co-authored-by: Benedict Jin <asdf2014@apache.org>

Co-authored-by: Benedict Jin <asdf2014@apache.org>
2022-03-04 15:21:17 +08:00
Frank Chen 36bc41855d
Set Content-Type for String based response (#12295) 2022-03-04 15:17:03 +08:00
Samarth Jain 58d05d7014
Fix ci (#12304) 2022-03-03 23:05:50 -08:00
dependabot[bot] a1cdee2a3a
Bump jersey.version from 1.19.3 to 1.19.4 (#12290)
* Bump jersey.version from 1.19.3 to 1.19.4

Bumps `jersey.version` from 1.19.3 to 1.19.4.

Updates `jersey-client` from 1.19.3 to 1.19.4

Updates `jersey-core` from 1.19.3 to 1.19.4

Updates `jersey-grizzly2` from 1.19.3 to 1.19.4

Updates `jersey-guice` from 1.19.3 to 1.19.4

Updates `jersey-server` from 1.19.3 to 1.19.4

Updates `jersey-servlet` from 1.19.3 to 1.19.4

Updates `jersey-json` from 1.19.3 to 1.19.4

Updates `jersey-test-framework-core` from 1.19.3 to 1.19.4

Updates `jersey-test-framework-grizzly2` from 1.19.3 to 1.19.4

---
updated-dependencies:
- dependency-name: com.sun.jersey:jersey-client
  dependency-type: direct:development
  update-type: version-update:semver-patch
- dependency-name: com.sun.jersey:jersey-core
  dependency-type: direct:development
  update-type: version-update:semver-patch
- dependency-name: com.sun.jersey:jersey-grizzly2
  dependency-type: direct:development
  update-type: version-update:semver-patch
- dependency-name: com.sun.jersey.contribs:jersey-guice
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: com.sun.jersey:jersey-server
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: com.sun.jersey:jersey-servlet
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: com.sun.jersey:jersey-json
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: com.sun.jersey.jersey-test-framework:jersey-test-framework-core
  dependency-type: direct:development
  update-type: version-update:semver-patch
- dependency-name: com.sun.jersey.jersey-test-framework:jersey-test-framework-grizzly2
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update licenses.yaml

* Update licenses.yaml

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Clint Wylie <cwylie@apache.org>
2022-03-04 09:57:20 +08:00
Clint Wylie 1c004ea47e
use virtual columns for sql simple aggregators instead of inline expressions (#12251)
* use virtual columns for sql simple aggregators instead of inline expressions

* fixes

* always use virtual columns

* add more tests
2022-03-03 15:05:28 -08:00
Jason Koch 36193955b6
perf: eliminate expensive log construction in remote-task-runner shutdown (#12097) 2022-03-03 13:38:21 -08:00
Jason Koch f594e7ac24
perf: improve RemoteTaskRunner task assignment loop performance (#12096)
* perf: improve ZkWorker task lookup performance

This improves the performance of the ZkWorker task lookup loop by
eliminating repeat calls to getRunningTasks() in toImmutable(),
and reduces the work performed in isRunningTask() to stream-parse
the id field instead of entire JSON blob.
2022-03-02 09:38:32 -08:00
Tejaswini Bandlamudi 1af4c9c933
Display row stats for multiphase parallel indexing tasks (#12280)
Row stats are reported for single phase tasks in the `/liveReports` and `/rowStats` APIs
and are also a part of the overall task report. This commit adds changes to report
row stats for multiphase tasks too.

Changes:
- Add `TaskReport` in `GeneratedPartitionsReport` generated during hash and range partitioning
- Collect the reports for `index_generate` phase in `ParallelIndexSupervisorTask`
2022-03-02 10:10:31 +05:30
Alexander Saydakov 50038d9344
latest datasketches-java-3.1.0 (#12224)
These changes are to use the latest datasketches-java-3.1.0 and also to restore support for quantile and HLL4 sketches to be able to grow larger than a given buffer in a buffer aggregator and move to heap in rare cases. This was discussed in #11544.

Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
2022-03-01 17:14:42 -08:00
Laksh Singla 3f709db173
Make ParseExceptions more informative (#12259)
This PR aims to make the ParseExceptions in Druid more informative, by adding additional information (metadata) to the ParseException, which can contain additional information about the exception. For example - the path of the file generating the issue, the line number (where it can be easily fetched - like CsvReader)

Following changes are addressed in this PR:

A new class CloseableIteratorWithMetadata has been created which is like CloseableIterator but also has a metadata method that returns a context Map<String, Object> about the current element returned by next().
IntermediateRowParsingReader#read() now attaches the InputEntity and the "record number" which created the exception (while parsing them), and IntermediateRowParsingReader#sample attaches the InputEntity (but not the "record number").
TextReader (and its subclasses), which is a specific implementation of the IntermediateRowParsingReader also include the line number which caused the generation of the error.
This will also help in triaging the issues when InputSourceReader generates ParseException because it can point to the specific InputEntity which caused the exception (while trying to read it).
2022-02-28 22:31:15 +05:30
Xavier Léauté d105519558
Replace use of PowerMock with Mockito (#12282)
Mockito now supports all our needs and plays much better with recent Java versions.
Migrating to Mockito also simplifies running the kind of tests that required PowerMock in the past. 

* replace all uses of powermock with mockito-inline
* upgrade mockito to 4.3.1 and fix use of deprecated methods
* import mockito bom to align all our mockito dependencies
* add powermock to forbidden-apis to avoid accidentally reintroducing it in the future
2022-02-27 22:47:09 -08:00
Xavier Léauté 1434197ee1
update airline dependency to 2.x (#12270)
* upgrade Airline to Airline 2
  https://github.com/airlift/airline is no longer maintained, updating to
  https://github.com/rvesse/airline (Airline 2) to use an actively
  maintained version, while minimizing breaking changes.

  Note, this is a backwards incompatible change, and extensions relying on
  the CliCommandCreator extension point will also need to be updated.

* fix dependency checks where jakarta.inject is now resolved first instead
  of javax.inject, due to Airline 2 using jakarta
2022-02-27 15:19:28 -08:00
Xavier Léauté 4c61878f9c
Reduce use of mocking and simplify some tests (#12283)
* remove use of mocks for ServiceMetricEvent
* simplify KafkaEmitterTests by moving to Mockito
* speed up KafkaEmitterTest by adjusting reporting frequency in tests
* remove unnecessary easymock and JUnitParams dependencies
2022-02-26 17:23:09 -08:00
Karan Kumar a080fcdd7b
Fixing hadoop 3 Dockerfile (#12284) 2022-02-26 19:18:29 +05:30
Jihoon Son e5ad862665
A new includeAllDimension flag for dimensionsSpec (#12276)
* includeAllDimensions in dimensionsSpec

* doc

* address comments

* unused import and doc spelling
2022-02-25 18:27:48 -08:00
Jason Koch eb1b53b7f8
perf: indexing: Introduce a bulk getValuesInto function to read values (#12105)
* perf: indexing: Introduce a bulk getValuesInto function to read values in bulk

If large number of values are required from DimensionDictionary
during indexing, fetch them all in a single lock/unlock instead of
lock/unlock each individual item.

* refactor: rename key to keys in function args

* fix: check explicitly that argument length on arrays match

* refactor: getValuesInto renamed to getValues, now creates and returns a new T[] rather than filling
2022-02-25 12:19:04 -08:00
Karan Kumar b86f2d4c2e
Performance fixes in proto readers (#12267) 2022-02-24 23:21:48 +05:30
Xavier Léauté 009dd9e09a
upgrade core Apache Kafka dependencies to 3.1.0 (#12203)
Announcement: https://blogs.apache.org/kafka/entry/what-s-new-in-apache7
Release notes: https://dist.apache.org/repos/dist/release/kafka/3.1.0/RELEASE_NOTES.html

* upgrade core Apache Kafka dependencies to 3.1.0
* fix use of private Kafka APIs
* remove deprecated test rules
* remove mock calls that weren't verified in the first place
* remove the need for powermock in KafkaLookupExtractorFactoryTest
* align curator-test version with curator itself
* update easymock to 4.3.0
2022-02-23 18:42:51 -08:00
Jonathan Wei b1640a72ee
Re-enable segment metadata cache when using external schema (#12264) 2022-02-22 19:50:29 -06:00
Karan Kumar b94390ba33
Adding Shared Access resource support for azure (#12266)
Azure Blob storage has multiple modes of authentication. One of them is Shared access resource
. This is very useful in cases when we do not want to add the account key in the druid properties .
2022-02-22 18:27:43 +05:30
Maytas Monsereenusorn 6e2eded277
Allow coordinator run auto compaction duty period to be configured separately from other indexing duties (#12263)
* add impl

* add impl

* add unit tests

* add impl

* add impl

* add serde test

* add tests

* add docs

* fix test

* fix test

* fix docs

* fix docs

* fix spelling
2022-02-18 23:02:57 -08:00
AmatyaAvadhanula 1ec57cb935
Improve kinesis task assignment after resharding (#12235)
Problem:
- When a kinesis stream is resharded, the original shards are closed.
   Any intermediate shard created in the process is eventually closed as well.
- If a shard is closed before any record is put into it, it can be safely ignored for ingestion.
- It is expensive to determine if a closed shard is empty, since it requires a call to the Kinesis cluster.

Changes:
- Maintain a cache of closed empty and closed non-empty shards in `KinesisSupervisor`
- Add config `skipIngorableShards` to `KinesisSupervisorTuningConfig`
- The caches are used and updated only when `skipIgnorableShards = true`
2022-02-18 12:37:06 +05:30
tejaswini-imply 70c40c4281
Fix long overflow in SegmentCostCache.Bucket.toLocalInterval (#12257)
Problem:
When using a `CachingCostBalancerStrategy` with segments of granularity ALL,
no segment gets loaded.
- With granularity ALL, segments of eternity interval are created which have 
  `start = Long.MIN_VALUE / 2` and `end = Long.MAX_VALUE / 2`.
- For cost calculation in the balancer strategy, `toLocalInterval()` method is invoked where
  `Long.MIN_VALUE / 2` or `Long.MAX_VALUE / 2` cause an overflow thus resulting in no overlap.
- The strategy is unable to find any eligible server for loading a given segment.

Fix:
- Reverse order of operations to divide by `MILLIS_FACTOR` (~10^8) first,
   then do the subtraction to prevent Long overflow.
2022-02-17 15:13:51 +05:30