* New handling for COALESCE, SEARCH, and filter optimization.
COALESCE is converted by Calcite's parser to CASE, which is largely
counterproductive for us, because it ends up duplicating expressions.
In the current code we end up un-doing it in our CaseOperatorConversion.
This patch has a different approach:
1) Add CaseToCoalesceRule to convert CASE back to COALESCE earlier, before
the Volcano planner runs, using CaseToCoalesceRule.
2) Add FilterDecomposeCoalesceRule to decompose calls like
"f(COALESCE(x, y))" into "(x IS NOT NULL AND f(x)) OR (x IS NULL AND f(y))".
This helps use indexes when available on x and y.
3) Add CoalesceLookupRule to push COALESCE into the third arg of LOOKUP.
4) Add a native "coalesce" function so we can convert 3+ arg COALESCE.
The advantage of this approach is that by un-doing the CASE to COALESCE
conversion earlier, we have flexibility to do more stuff with
COALESCE (like decomposition and pushing into LOOKUP).
SEARCH is an operator used internally by Calcite to represent matching
an argument against some set of ranges. This patch improves our handling
of SEARCH in two ways:
1) Expand NOT points (point "holes" in the range set) from SEARCH as
`!(a || b)` rather than `!a && !b`, which makes it possible to convert
them to a "not" of "in" filter later.
2) Generate those nice conversions for NOT points even if the SEARCH
is not composed of 100% NOT points. Without this change, a SEARCH
for "x NOT IN ('a', 'b') AND x < 'm'" would get converted like
"x < 'a' OR (x > 'a' AND x < 'b') OR (x > 'b' AND x < 'm')".
One of the steps we take when generating Druid queries from Calcite
plans is to optimize native filters. This patch improves this step:
1) Extract common ANDed predicates in ConvertSelectorsToIns, so we can
convert "(a && x = 'b') || (a && x = 'c')" into "a && x IN ('b', 'c')".
2) Speed up CombineAndSimplifyBounds and ConvertSelectorsToIns on
ORs with lots of children by adjusting the logic to avoid calling
"indexOf" and "remove" on an ArrayList.
3) Refactor ConvertSelectorsToIns to reduce duplicated code between the
handling for "selector" and "equals" filters.
* Not so final.
* Fixes.
* Fix test.
* Fix test.
* Fix ColumnSelectorColumnIndexSelector#getColumnCapabilities.
It was using virtualColumns.getColumnCapabilities, which only returns
capabilities for virtual columns, not regular columns. The effect of this
is that expression filters (and in some cases, arrayContainsElement filters)
would build value matchers rather than use indexes.
I think this has been like this since #12315, which added the
getColumnCapabilities method to BitmapIndexSelector, and included the same
implementation as exists in the code today.
This error is easy to make due to the design of virtualColumns.getColumnCapabilities,
so to help avoid it in the future, this patch renames the method to
getColumnCapabilitiesWithoutFallback to emphasize that it does not return
capabilities for regular columns.
* Make getColumnCapabilitiesWithoutFallback package-private.
* Fix expression filter bitmap usage.
* Minor fixes
* Update docs/development/extensions-contrib/prometheus.md
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
---------
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
* Allow empty inserts and replace.
- Introduce a new query context failOnEmptyInsert which defaults to false.
- When this context is false (default), MSQE will now allow empty inserts and replaces.
- When this context is true, MSQE will throw the existing InsertCannotBeEmpty MSQ fault.
- For REPLACE ALL over an ALL grain segment, the query will generate a tombstone spanning eternity
which will be removed eventually be the coordinator.
- Add unit tests in MSQInsertTest, MSQReplaceTest to test the new default behavior (i.e., when failOnEmptyInsert = false)
- Update unit tests in MSQFaultsTest to test the non-default behavior (i.e., when failOnEmptyInsert = true)
* Ignore test to see if it's the culprit for OOM
* Add heap dump config
* Bump up -Xmx from 1500 MB to 2048 MB
* Add steps to tarball and collect hprof dump to GHA action
* put back mx to 1500MB to trigger the failure
* add the step to reusable unit test workflow as well
* Revert the temp heap dump & @Ignore changes since max heap size is increased
* Minor updates
* Review comments
1. Doc suggestions
2. Add tests for empty insert and replace queries with ALL grain and limit in the
default failOnEmptyInsert mode (=false). Add similar tests to MSQFaultsTest with
failOnEmptyInsert = true, so the query does fail with an InsertCannotBeEmpty fault.
3. Nullable annotation and javadocs
* Add comment
replace_limit.patch
The PR: #13947 introduced a function evalDimension() in the interface RowFunction.
There was no default implementation added for this interface which causes all the implementations and custom transforms to fail and require to implement their own version of evalDimension method. This PR adds a default implementation in the interface which allows the evalDimension to return value as a Singleton array of eval result.
Currently in the realtime ingestion (Kafka/Kinesis) case, after publishing the segments, upon acknowledgement from the coordinator that the segments are already placed in some historicals, the peon would unannounce the segments (basically saying the segments are not in this peon anymore to the whole cluster) and drop the segments from cache and sink timeline in one shot.
The in transit queries from the brokers that still thinks the segments are in the peon can get a NullPointer exception when the peon is unsetting the hydrants in the sinks.
The fix would let the peon to wait for a configurable delay period before dropping segments, remove segments from cache etc after the peon unannounce the segments.
This delayed approach is similar to how the historicals handle segments moving out.
Database slowness while doing audits seems to be causing flakiness in auth ITs.
The failing test is almost always
`ITBasicAuthConfigurationTest.test_avaticaQuery_datasourceAndContextParamsUser`
but in some rare cases, other tests fail too. Alternately, this failing test has been seen to pass too.
It is most likely because the auth changes are not able to propagate in time from
the coordinator to other services.
Fix: Just log the audits rather than persisting them to database.
Most audits have been newly added and it is okay to not have them persisted.
Moreover, logging audits can also be more beneficial while debugging an IT.
Fixes#15072
Before this modification , the third parameter (timezone) require to be a Literal, it will throw a error when this parameter is column Identifier.
Changes
- Add `log` implementation for `AuditManager` alongwith `SQLAuditManager`
- `LoggingAuditManager` simply logs the audit event. Thus, it returns empty for
all `fetchAuditHistory` calls.
- Add new config `druid.audit.manager.type` which can take values `log`, `sql` (default)
- Add new config `druid.audit.manager.logLevel` which can take values `DEBUG`, `INFO`, `WARN`.
This gets activated only if `type` is `log`.
- Remove usage of `ConfigSerde` from `AuditManager` as audit is not just limited to configs
- Add `AuditSerdeHelper` for a single implementation of serialization/deserialization of
audit payload and other utility methods.
* unpin snakeyaml globally, add suppressions and licenses
* pin snakeyaml in the specific modules that require version 1.x, update licenses and owasp suppression
This removes the pin of the Snakeyaml introduced in: https://github.com/apache/druid/pull/14519
After the updates of io.kubernetes.java-client and io.confluent.kafka-clients, the only uses of the Snakeyaml 1.x are:
- in test scope, transitive dependency of jackson-dataformat-yaml🫙2.12.7
- in compile scope in contrib extension druid-cassandra-storage
- in compile scope in it-tests.
With the dependency version un-pinned, io.kubernetes.java-client and io.confluent.kafka-clients bring Snakeyaml versions 2.0 and 2.2, consequently allowing to build a Druid distribution without the contrib-extension and free of vulnerable Snakeyaml versions.
* Allow for kafka emitter producer secrets to be masked in logs instead of being visible
This change will allow for kafka producer config values that should be secrets to not show up in the logs.
This will enhance the security of the people who use the kafka emitter to use this if they want to.
This is opt in and will not affect prior configs for this emitter
* fix checkstyle issue
* change property name
I was looking into a query which was performing a bit poorly because the case_searched was touching more than 1 columns (if there is only 1 column there is a cache based evaluator).
While I was doing that I've noticed that there are a few simple things which could help a bit:
use a static TRUE/FALSE instead of creating a new object every time
create the ExprEval early for ConstantExpr -s (except the one for BigInteger which seem to have some odd contract)
return early from type autodetection
these changes mostly reduce the amount of garbage the query creates during case_searched evaluation; although ExpressionSelectorBenchmark shows some improvements ~15% - but my manual trials on the taxi dataset with 60M rows showed more improvements - probably due to the fact that these changes mostly only reduce gc pressure.
* Clean useless InterruptedException warn in ingestion task log
* test coverage for the code change, manually close the scheduler thread to trigger Interrupt signal
---------
Co-authored-by: Qiong Chen <qiong.chen@shopee.com>
The segment allocation algorithm reuses an already allocated pending segment if the new allocation request is made for the same parameters:
datasource
sequence name
same interval
same value of skipSegmentLineageCheck (false for batch append, true for streaming append)
same previous segment id (used only when skipSegmentLineageCheck = false)
The above parameters can thus uniquely identify a pending segment (enforced by the UNIQUE constraint on the sequence_name_prev_id_sha1 column in druid_pendingSegments metadata table).
This reuse is done in order to
allow replica tasks (in case of streaming ingestion) to use the same set of segment IDs.
allow re-run of a failed batch task to use the same segment ID and prevent unnecessary allocations
The PR addresses 2 things:
Add MSQ durable storage connector for GCS
Change GCS client library from the old Google API Client Library to the recommended Google Cloud Client Library. Ref: https://cloud.google.com/apis/docs/client-libraries-explained
* Upgrade org.pac4j:pac4j-oidc to 4.5.5 to address CVE-2021-44878
* add CVE suppression and notes, since vulnerability scan still shows this CVE
* Add tests to improve coverage
* Optional removal of metrics from Prometheus PushGateway on shutdown
* Make pushGatewayDeleteOnShutdown property nullable
* Add waitForShutdownDelay property
* Fix unit test
* Address PR comments
* Address PR comments
* Add explanation on why it is useful to have deletePushGatewayMetricsOnShutdown
* Fix spelling error
* Fix spelling error
This PR updates the tag present in pom.xml to match the druid version in pom.xml
This was last updated in 0da8ffc
It seems to me like this was missed in further Druid version upgrades.
Update of direct dependencies:
* kubernetes java-client to 19.0.0
* docker-java-bom to 3.3.4
In order to update transitive dependencies:
* okio to 3.6.0
* bcjava to 1.76
To address CVES:
- CVE-2023-3635 in okio
- CVE-2023-33201 in bcjava
---------
Co-authored-by: Xavier Léauté <xvrl@apache.org>
This PR builds on #13304 to skip compaction for datasources with segments that have their interval start or end coinciding with Eternity interval end-points.
This is needed in order to prevent an issue similar to #13208 as the Coordinator tries to iterate over a large number of intervals when trying to compact an interval with infinite start or end.
The website pom was removed as part of
https://github.com/apache/druid/pull/14411 so we no longer need to
reference it as a module and the profile can be removed.
Dependabot is currently failing trying to look for this module, so
removing it should also fix that.
### Description
This pr adds an api for retrieving unused segments for a particular datasource. The api supports pagination by the addition of `limit` and `lastSegmentId` parameters. The resulting unused segments are returned with optional `sortOrder`, `ASC` or `DESC` with respect to the matching segments `id`, `start time`, and `end time`, or not returned in any guarenteed order if `sortOrder` is not specified
`GET /druid/coordinator/v1/datasources/{dataSourceName}/unusedSegments?interval={interval}&limit={limit}&lastSegmentId={lastSegmentId}&sortOrder={sortOrder}`
Returns a list of unused segments for a datasource in the cluster contained within an optionally specified interval.
Optional parameters for limit and lastSegmentId can be given as well, to limit results and enable paginated results.
The results may be sorted in either ASC, or DESC order depending on specifying the sortOrder parameter.
`dataSourceName`: The name of the datasource
`interval`: the specific interval to search for unused segments for.
`limit`: the maximum number of unused segments to return information about. This property helps to
support pagination
`lastSegmentId`: the last segment id from which to search for results. All segments returned are > this segment
lexigraphically if sortOrder is null or ASC, or < this segment lexigraphically if sortOrder is DESC.
`sortOrder`: Specifies the order with which to return the matching segments by start time, end time. A null
value indicates that order does not matter.
This PR has:
- [x] been self-reviewed.
- [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [x] added documentation for new or modified features or behaviors.
- [ ] a release note entry in the PR description.
- [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [x] been tested in a test Druid cluster.
* Add initial draft of MarkDanglingTombstonesAsUnused duty.
* Use overshadowed segments instead of all used segments.
* Add unit test for MarkDanglingSegmentsAsUnused duty.
* Add mock call
* Simplify code.
* Docs
* shorter lines formatting
* metric doc
* More tests, refactor and fix up some logic.
* update javadocs; other review comments.
* Make numCorePartitions as 0 in the TombstoneShardSpec.
* fix up test
* Add tombstone core partition tests
* Update docs/design/coordinator.md
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
* review comment
* Minor cleanup
* Only consider tombstones with 0 core partitions
* Need to register the test shard type to make jackson happy
* test comments
* checkstyle
* fixup misc typos in comments
* Update logic to use overshadowed segments
* minor cleanup
* Rename duty to eternity tombstone instead of dangling. Add test for full eternity tombstone.
* Address review feedback.
---------
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Query with lookups in FilteredAggregator fails with this exception in router,
Cannot construct instance of `org.apache.druid.query.aggregation.FilteredAggregatorFactory`, problem: Lookup [campaigns_lookup[campaignId][is_sold][autodsp]] not found at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 913] (through reference chain: org.apache.druid.query.groupby.GroupByQuery["aggregations"]->java.util.ArrayList[1])
T
he problem is that constructor of FilteredAggregatorFactory is actually validating if the lookup exists in this statement dimFilter.toFilter().
This is failing on the router, which is to be expected, because, the router isn’t assigned any lookups.
The fix is to move to a lazy initialisation of the filter object in the constructor.