13755 Commits

Author SHA1 Message Date
Katya Macedo f37d019fe6
Fix redirects for streaming ingestion (#15943) 2024-02-22 22:34:19 +05:30
Clint Wylie cc5964fbcb
fix NestedCommonFormatColumnHandler to use nullable comparator when castToType is set (#15921)
Fixes a bug when the undocumented castToType parameter is set on an 'auto' column schema: the handler should have used the 'nullable' comparator to allow null values when merging columns, but did not, which led to null pointer exceptions. Also fixes an issue I noticed while adding tests: specifying the 'FLOAT' type for the castToType parameter threw an exception because that type is not expected to be present, since 'auto' uses the native expressions to determine the input types and expressions don't have direct support for floats, only doubles.

In the future I should probably split this functionality out of the 'auto' schema (maybe even have a simpler version of the auto indexer dedicated to handling non-nested data) but still have the same results of writing out the newer 'nested common format' columns used by 'auto', but I haven't taken that on in this PR.
2024-02-22 21:35:50 +05:30
Jamie 80942d5754
Feature: add support for ingesting from rabbitmq super streams (#14137)
* Add support for ingesting from RabbitMQ Super Streams
2024-02-22 10:50:37 +05:30
George Shiqi Wu 59bb72a926
Fix parsing of env variables when properties have underscores (#15919)
* Fix parsing of env variables when properties have underscores

* Add documentation

* Use a % sign instead
2024-02-21 13:18:21 -05:00
Zoltan Haindrich bcce0806d7
Support Union in decoupled mode (#15870) 2024-02-21 10:54:50 -05:00
Zoltan Haindrich 170d37f188
add check to build docker image (#15894) 2024-02-21 10:53:35 -05:00
Benedict Jin 0f38a98368
Update the link of Helm Chart to avoid 404 error (#15905) 2024-02-21 16:57:41 +08:00
Suneet Saldanha cbc53d53b4
Update k8sTaskRunner log message (#15871) 2024-02-21 14:34:00 +08:00
Gian Merlino e20004c7df
Remove helm chart. (#15904)
The helm chart was originally moved here in #11163 from
https://github.com/helm/charts/tree/master/incubator/druid after the
helm/charts repository was deprecated. However, it has been excluded
from releases since then, due to uncertainty around whether we need
IP clearance. We have not had volunteers willing to sort this out,
so this patch removes the code.

It can be re-added if a volunteer is available to sort out the
IP clearance process.

See thread at: https://lists.apache.org/thread/ygyzt23m06vc775nq5dsm349rf0j47dg
2024-02-21 14:21:37 +08:00
Laksh Singla a1b2c7326e
Numeric array support for columnar frames (#15917)
Columnar frames used in subquery materialization and window functions now support numeric arrays.
2024-02-21 11:32:33 +05:30
George Shiqi Wu 2c0d1128f8
Fix pod template reading logic (#15915)
* Fix pod template reading

* PR changes

* Fix unit tests
2024-02-20 11:13:51 -05:00
Adarsh Sanjeev 9eaaeb5c16
Add security ITs to the revised integration tests (#15885)
* Add IT for security

* Add admin client

* Clean up code

* Clean up code

* Address review comments
2024-02-20 11:32:08 +05:30
Gian Merlino 9c41827dba
Globally disable AUTO_CLOSE_JSON_CONTENT. (#15880)
* Globally disable AUTO_CLOSE_JSON_CONTENT.

This JsonGenerator feature is on by default. It causes problems with code
like this:

  try (JsonGenerator jg = ...) {
    jg.writeStartArray();
    for (Object x : xs) {
      jg.writeObject(x);
    }
    jg.writeEndArray();
  }

If a jg.writeObject call fails due to some problem with the data it's
reading, the JsonGenerator will write the end array marker automatically
when closed as part of the try-with-resources. If the generator is writing
to a stream where the reader does not have some other mechanism to realize
that an exception was thrown, this leads the reader to believe that the
array is complete when it actually isn't.

Prior to this patch, we disabled AUTO_CLOSE_JSON_CONTENT for JSON-wrapped
SQL result formats in #11685, which fixed an issue where such results
could be erroneously interpreted as complete. This patch fixes a similar
issue with task reports, and all similar issues that may exist elsewhere,
by disabling the feature globally.

* Update test.
2024-02-16 08:52:48 -08:00
Clint Wylie fe2ba8cc28
fix return type inference of parse_long, which can also be null if string is not parseable into a long (#15909)
* fix return type inference of parse_long, which can also be null if string is not parseable into a long

* fix msq test
2024-02-15 08:45:34 -08:00
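To illustrate why the inferred return type in #15909 has to be nullable, here is a minimal sketch of a parse function that yields null on unparseable input. It is not Druid's parse_long implementation; the class and method names are hypothetical, and the exact set of strings Druid accepts may differ from what Long.parseLong accepts.

```java
/** Illustrative only: a hypothetical stand-in, not Druid's parse_long code. */
final class ParseLongDemo {
  // Returns a boxed Long so that "not parseable" can be represented as null.
  // A planner that assumed a non-null primitive long here would mis-infer the type.
  static Long parseLongOrNull(String value, int radix) {
    if (value == null) {
      return null;
    }
    try {
      return Long.parseLong(value, radix);
    } catch (NumberFormatException e) {
      // Unparseable input yields null instead of an exception.
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(parseLongOrNull("42", 10));
    System.out.println(parseLongOrNull("ff", 16));
    System.out.println(parseLongOrNull("not a number", 10));
  }
}
```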
Vadim Ogievetsky 66f54f2066
allow compaction config slots to drop to 0 (#15877) 2024-02-15 15:27:15 +08:00
Parth Agrawal 495e66f2e7
CVE Fix: Update json-path version (#15772)
Apache Druid depends on json-path, which is affected by CVE-2023-51074.
The latest version, 2.9.0, fixes this CVE.

json-path now includes an append function, so the unit test that checked for its absence has been updated.

---------

Co-authored-by: Xavier Léauté <xvrl@apache.org>
2024-02-14 20:58:27 -08:00
Tom f224035c7e
Fix Flakiness in KafkaEmitterTest (#15907)
* thrust of the fix to allow for the json values to be out of order

The existing problem is that toMap doesn't turn some values into json primitive
values, for example segmentMetadata just has DateTime objects for its time in
the EventMap, but Alert event converts those into strings when calling toMap.
This creates an issue because when we check the emitted events the mapper
deserializing the string value for dateTime leaves it as a string in the
EventMap. So the question is do we alter the events toMap() to return string/map
version of objects or to make the expected events do a round trip of
eventMap -> string -> eventMap to turn everything into json primitives

* fix issue by making toMap events convert Objects into strings, or maps

* fix linting errors

* use method of using mapper to round trip expected data to make it have same type
as those of the events emitted

* remove unnecessary comment
2024-02-15 10:01:55 +05:30
317brian c98d54f3c4
docs: delete unused file that causes confusion (#15910) 2024-02-14 16:42:02 -08:00
YongGang 19ed5c863f
Enhance rolling Supervisor restarts at taskDuration (#15859) 2024-02-14 15:44:34 -08:00
Abhishek Radhakrishnan c324e37751
Add javadocs to `KafkaEmitterTest` & fix flaky test (#15898)
* Address review comment: add test javadocs

* Fix flaky assertion failure.

Use ConcurrentHashMap instead of HashMap because the producer callback
can trigger concurrently and override the map initialization.

* fixup intellij inspection
2024-02-14 11:52:06 -08:00
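The race described in #15898 (callbacks firing concurrently and overriding map initialization) can be sketched roughly as follows. This is an assumption-laden illustration, not the KafkaEmitterTest code: the class, the onSend method, and the feed name are hypothetical, and a thread pool stands in for the producer's callback threads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Illustrative only: hypothetical names, not the actual test code. */
final class ConcurrentCallbackDemo {
  // Feed name -> events recorded for that feed.
  private final Map<String, List<String>> eventsByFeed = new ConcurrentHashMap<>();

  // Stand-in for a producer callback that may run on several threads at once.
  void onSend(String feed, String event) {
    // computeIfAbsent is atomic on ConcurrentHashMap, so two threads cannot
    // both initialize the list for the same feed and overwrite each other.
    eventsByFeed
        .computeIfAbsent(feed, k -> Collections.synchronizedList(new ArrayList<>()))
        .add(event);
  }

  int count(String feed) {
    List<String> events = eventsByFeed.get(feed);
    return events == null ? 0 : events.size();
  }

  // Fires callbacks from several threads and returns how many events were kept.
  static int runDemo(int threads, int eventsPerThread) {
    ConcurrentCallbackDemo demo = new ConcurrentCallbackDemo();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      final int id = t;
      pool.submit(() -> {
        for (int i = 0; i < eventsPerThread; i++) {
          demo.onSend("metrics", "event-" + id + "-" + i);
        }
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("callbacks did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return demo.count("metrics");
  }
}
```

With a plain HashMap and a check-then-put initialization, some of these events could be lost or the map could be corrupted; with the atomic computeIfAbsent above, every event is retained.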
Sam Rash be0ee2ee33
update version check for profiling to >= 17 (#15686) 2024-02-14 21:44:20 +05:30
Peter Marshall cae9cbd7d7
Update tasks.md (#15887)
Remove erroneous white space causing render issues on this page.
2024-02-13 05:20:09 -08:00
Clint Wylie dad8398a4d
start process of deprecating non-sql compatible legacy configurations (#15713)
Starting the process to officially deprecate non-SQL-compatible modes by updating docs to aggressively call out that Druid's non-SQL-compliant modes are deprecated and will go away someday. There are no code or behavior changes in this PR.
2024-02-13 15:31:45 +05:30
Tom c225c19f81
fix copy paste issue in earlier PR (#15890) 2024-02-12 19:49:19 -05:00
Gian Merlino 0f6a895372
Rework ExprMacro base classes to simplify implementations. (#15622)
* Rework ExprMacro base classes to simplify implementations.

This patch removes BaseScalarUnivariateMacroFunctionExpr, adds
BaseMacroFunctionExpr at the top of the hierarchy (a suitable base class
for ExprMacros that take either arrays or scalars), and adds an
implementation for "visit" to BaseMacroFunctionExpr.

The effect on implementations is generally cleaner code:

- Exprs no longer need to implement "visit".
- Exprs no longer need to implement "stringify", even if they don't
  use all of their args at runtime, because BaseMacroFunctionExpr has
  access to even unused args.
- Exprs that accept arrays can extend BaseMacroFunctionExpr and
  inherit a bunch of useful methods. The only one they need to
  implement themselves that scalar exprs don't is "supplyAnalyzeInputs".

* Make StringDecodeBase64UTFExpression a static class.

* Remove unused import.

* Formatting, annotation changes.
2024-02-12 15:50:45 -08:00
Katya Macedo 0f29ece6a9
[Docs] Refactor streaming ingestion section (#15591)
Merging the work so far. @ektravel, @vogievetsky if there are additional improvements, let's track them and make another PR.

* Refactor streaming ingestion docs

* Update property definition

* Update after review

* Update known issues

* Move kinesis and kafka topics to ingestion, add redirects

* Saving changes

* Saving

* Add input format text

* Update after review

* Minor text edit

* Update example syntax

* Revert back to colon

* Fix merge conflicts

* Fix broken links

* Fix spelling error
2024-02-12 13:52:42 -08:00
Charles Smith 2a42b11660
remove legacy Jupyter tutorial files (#15834)
* remove legacy files

* redirection for the jupyter tutorial page

* remove tutorial from sidebar

* remove redirection
2024-02-12 13:45:47 -08:00
Abhishek Radhakrishnan 51fd79ee58
Clean up kafka emitter tests, add more validations and code coverage. (#15878)
* Clean up kafka emitter tests a bit and add more validations.

The test wasn't validating what events were sent, but simply the dropped counters, which
aren't that useful.
Additionally, this module has fewer tests, so folks often run into code coverage issues
in this extension. Hopefully this change helps with that too.

* Change things to feed-based rather than topic-based.

* Another test for shared topic

* Switch to DruidException, add test dependencies and sad path config tests.

* missing test dependency

* minor renames.

* Add more tests - to test unknown events and drop when queue is full
2024-02-12 16:22:19 -05:00
Gian Merlino 7fea34abdd
LOOKUP docs: clarify behavior of replaceMissingValueWith. (#15879)
Clarify behavior when expr is null.
2024-02-11 13:11:00 -08:00
zachjsh f9ee2c353b
Extend the PARTITION BY clause to accept string literals for the time partitioning (#15836)
This PR contains a portion of the changes from the inactive draft PR for integrating the catalog with the Calcite planner https://github.com/apache/druid/pull/13686 from @paul-rogers, extending the PARTITION BY clause to accept string literals for the time partitioning
2024-02-09 11:45:38 -05:00
Vishesh Garg 6e9eee4c5f
Add failure check (#15873) 2024-02-09 08:27:10 -08:00
Lasse Mammen 4255711b3e
fix: handle BOOKMARK events in kubernetes pod discovery (#15819) 2024-02-09 18:50:04 +05:30
Tom 11a8624ef1
allow for kafka-emitter to have extra dimensions be set for each event it emits (#15845)
* allow for kafka-emitter to have extra dimensions be set for each event it emits

* fix checkstyle issue in kafkaemitterconfig

* make changes to fix docs, and cleanup copy paste error in #toString()

* undo formatting to markdown table

* add more branches so test passes

* fix checkstyle issue
2024-02-08 22:55:24 -08:00
George Shiqi Wu d703b2c709
Add azure kill test (#15833)
* Add kill test

* Extra line

* Don't need toString

* Add back test

* Remove newline

* move kill verification into main test
2024-02-08 16:15:30 -05:00
Sree Charan Manamala 57e12df352
Sql Single Value Aggregator for scalar queries (#15700)
Executing single-value correlated queries currently throws an exception because the single_value function is not available in Druid.
These added classes give Druid the capability to plan and run such queries.
2024-02-08 19:20:30 +05:30
Soumyava f3996b96ff
Fixes for safe_divide with vectorize and datatypes (#15839)
* Fix for safe_divide with vectorize

* More fixes

* Update to use expr.eval(null) for both cases when denominator is 0
2024-02-08 14:40:42 +05:30
Abhishek Radhakrishnan 1a5b57df84
Update `groupId` for delta-lake and iceberg extensions (#15843)
* Update the group id to org.apache.druid.extensions.contrib for contrib exts.

* Note iceberg and delta lake extensions in extensions.md

* properties and shell backticks

* Update groupId in distribution/pom.xml

* remove delta-lake from dist.

* Add note on downloading extension.
2024-02-07 23:54:06 -08:00
Vadim Ogievetsky 26815d425b
Web console: add system fields UI (#15858)
This PR adds console support for configuring system fields in the batch data loader.
2024-02-08 11:08:55 +05:30
Gian Merlino 21a97f4c61
Fix HllSketchHolderObjectStrategy#isSafeToConvertToNullSketch. (#15860)
* Fix HllSketchHolderObjectStrategy#isSafeToConvertToNullSketch.

The prior code from #15162 was reading only the low-order byte of an int
representing the size of a coupon set. As a result, it would erroneously
believe that a coupon set with a multiple of 256 elements was empty.
2024-02-08 08:14:28 +05:30
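The low-order-byte problem described in #15860 can be reproduced in isolation. The sketch below is not the HllSketchHolderObjectStrategy code; it assumes a little-endian int holding the element count, and the class and method names are hypothetical. It shows why any count that is a multiple of 256 looks like zero when only the first byte is read.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative only: hypothetical names, assumed little-endian layout. */
final class LowOrderByteDemo {
  // Writes the element count as a 4-byte little-endian int.
  static ByteBuffer encodeCount(int count) {
    ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    buf.putInt(0, count);
    return buf;
  }

  // Buggy check: inspects only the low-order byte of the int.
  static boolean looksEmptyBuggy(ByteBuffer buf, int offset) {
    return buf.get(offset) == 0;
  }

  // Fixed check: decodes all four bytes before comparing with zero.
  static boolean looksEmptyFixed(ByteBuffer buf, int offset) {
    return buf.getInt(offset) == 0;
  }

  public static void main(String[] args) {
    for (int count : new int[] {0, 1, 255, 256, 512}) {
      ByteBuffer buf = encodeCount(count);
      System.out.println(
          count + ": buggy=" + looksEmptyBuggy(buf, 0) + ", fixed=" + looksEmptyFixed(buf, 0));
    }
  }
}
```

For a count of 256 the encoded bytes are 00 01 00 00, so the single-byte read sees 0 and wrongly reports an empty set, while the full int read does not.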
Adarsh Sanjeev 514b3b4d01
Add export capabilities to MSQ with SQL syntax (#15689)
* Add test

* Parser changes to support export statements

* Fix builds

* Address comments

* Add frame processor

* Address review comments

* Fix builds

* Update syntax

* Webconsole workaround

* Refactor

* Refactor

* Change export file path

* Update docs

* Remove webconsole changes

* Fix spelling mistake

* Parser changes, add tests

* Parser changes, resolve build warnings

* Fix failing test

* Fix failing test

* Fix IT tests

* Add tests

* Cleanup

* Fix unparse

* Fix forbidden API

* Update docs

* Update docs

* Address review comments

* Address review comments

* Fix tests

* Address review comments

* Fix insert unparse

* Add external write resource action

* Fix tests

* Add resource check to overlord resource

* Fix tests

* Add IT

* Update syntax

* Update tests

* Update permission

* Address review comments

* Address review comments

* Address review comments

* Add tests

* Add check for runtime parameter for bucket and path

* Add check for runtime parameter for bucket and path

* Add tests

* Update docs

* Fix NPE

* Update docs, remove deadcode

* Fix formatting
2024-02-07 22:08:50 +05:30
Vadim Ogievetsky f2b242b6e6
update console to core Druid changes (#15854) 2024-02-07 19:44:25 +05:30
Clint Wylie 23d4fade90
use NullFilter for SQL rewrite of MV_CONTAINS and MV_OVERLAP for null array elements (#15855)
Fixes an oversight after #14542 that happens in the SQL planner rewrite of MV_CONTAINS and MV_OVERLAP when faced with array elements that are NULL, where we were incorrectly using EqualityFilter instead of NullFilter for null elements (EqualityFilter does not accept null elements).
2024-02-07 19:40:41 +05:30
Bartosz Mikulski 45c26e8682
Fix Inspection Check in DirectDruidClientTest (#15857) 2024-02-07 02:56:26 -08:00
Zoltan Haindrich fdc7cec271
Support Window operators in decoupled planning (#15815) 2024-02-07 04:09:48 -05:00
Bartosz Mikulski 43a1c96cd1
Fix query-cancellation-executor memory leak (#15754)
This PR fixes #15069 by resolving a memory leak caused by a thread leak in query-cancellation-executor.
2024-02-07 10:54:38 +05:30
Pramod Immaneni 59bca0951a
Parallelize storage of incremental segments (#13982)
During ingestion, incremental segments are created in memory for the different time chunks and persisted to disk when certain thresholds are reached (max number of rows, max memory, incremental persist period, etc.). When there are many dimensions and metrics (1000+), creating and serializing the incremental segment file format and persisting the file was observed to take a long time, blocking ingestion of new data and affecting real-time ingestion. This serialization and persistence can be parallelized across the different time chunks, which is what this update does.

The patch adds a simple configuration parameter to the ingestion tuning configuration to specify the number of persistence threads. The default value is 1 if it is not specified, which preserves the current behavior.
2024-02-07 10:43:05 +05:30
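The idea in #13982 of persisting time chunks in parallel can be sketched with a fixed-size thread pool. This is only a rough illustration under stated assumptions, not Druid's ingestion code: numPersistThreads is a hypothetical parameter name, and persistChunk is a placeholder for the real serialize-and-write step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative only: hypothetical names, not Druid's persistence code. */
final class ParallelPersistDemo {
  // Placeholder for serializing one time chunk's in-memory segment and writing it out.
  static String persistChunk(String timeChunk) {
    return timeChunk + ".persisted";
  }

  // With numPersistThreads = 1 the chunks are persisted one at a time,
  // which mirrors the default described in the commit message.
  static List<String> persistAll(List<String> timeChunks, int numPersistThreads) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, numPersistThreads));
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String chunk : timeChunks) {
        Callable<String> task = () -> persistChunk(chunk);
        futures.add(pool.submit(task));
      }
      // Collect results in submission order so output order is deterministic.
      List<String> results = new ArrayList<>();
      for (Future<String> future : futures) {
        results.add(future.get());
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```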
Sam Wheating 4c58856f10
Fix incorrect ordering of args in log statement (#15846) 2024-02-06 16:12:04 -08:00
Abhishek Radhakrishnan 1affa35b29
Bump up Delta Lake Kernel to 3.1.0 (#15842)
This patch bumps the Delta Lake Kernel dependency from 3.0.0 to 3.1.0, which was released last week. Please see https://github.com/delta-io/delta/releases/tag/v3.1.0 for release notes.

There were a few "breaking" API changes in 3.1.0, you can find the rationale for some of those changes here.

Next-up in this extension: add and expose filter predicates.
2024-02-06 21:25:17 +05:30
317brian 2dc71c7874
docs: fix rendering (#15835) 2024-02-06 07:18:43 -08:00
Gian Merlino 54b30646f3
Add sqlReverseLookupThreshold for ReverseLookupRule. (#15832)
If lots of keys map to the same value, reversing a LOOKUP call can slow
things down unacceptably. To protect against this, this patch introduces
a parameter sqlReverseLookupThreshold representing the maximum size of an
IN filter that will be created as part of lookup reversal.

If inSubQueryThreshold is set to a smaller value than
sqlReverseLookupThreshold, then inSubQueryThreshold will be used instead.
This allows users to use that single parameter to control IN sizes if they
wish.
2024-02-06 16:32:05 +05:30
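The interaction between the two thresholds in #15832 amounts to taking the smaller value. The sketch below is a hedged illustration, not the ReverseLookupRule code: the class and method names are hypothetical, and treating a key count equal to the threshold as allowed is an assumption based on the phrase "maximum size".

```java
/** Illustrative only: hypothetical helper, not Druid's planner code. */
final class ReverseLookupThresholdDemo {
  // The smaller of the two settings caps the size of the generated IN filter.
  static int effectiveThreshold(int sqlReverseLookupThreshold, int inSubQueryThreshold) {
    return Math.min(sqlReverseLookupThreshold, inSubQueryThreshold);
  }

  // Reversal is attempted only when the number of keys mapping to the value
  // fits within the effective cap (boundary handling is an assumption).
  static boolean shouldReverse(
      int matchingKeys, int sqlReverseLookupThreshold, int inSubQueryThreshold) {
    return matchingKeys <= effectiveThreshold(sqlReverseLookupThreshold, inSubQueryThreshold);
  }

  public static void main(String[] args) {
    System.out.println(effectiveThreshold(10000, 20));
    System.out.println(shouldReverse(50, 10000, 20));
    System.out.println(shouldReverse(50, 10000, 100));
  }
}
```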