Commit Graph

12564 Commits

Author SHA1 Message Date
Clint Wylie 2219e68fa3
add backwards compat mode for frontCoded stringEncodingStrategy (#13988) 2023-03-28 14:44:44 -07:00
Paul Rogers 76fe26d4ba
Fix typos, add tests for http() function (#13954) 2023-03-28 14:41:06 -07:00
frankgrimes97 2f98675285
Tuple sketch SQL support (#13887)
This PR is a follow-up to #13819 so that the Tuple sketch functionality can be used in SQL for both ingestion using Multi-Stage Queries (MSQ) and also for analytic queries against Tuple sketch columns.
2023-03-28 18:47:12 +05:30
Karan Kumar c2fe6a4956
Reworking s3 connector with various improvements (#13960)
* Reworking s3 connector with
1. Adding retries
2. Adding max fetch size
3. Using s3Utils for most of the api's
4. Fixing bugs in DurableStorageCleaner
5. Moving to Iterator for listDir call
2023-03-28 17:05:16 +05:30
Rishabh Singh e8e8082573
Update OIDCConfig with scope information (#13973)
Allow users to provide custom scope through OIDC configuration
2023-03-28 14:50:00 +05:30
Clint Wylie d5b1b5bc8e
nested columns + arrays = array columns! (#13803)
array columns!
changes:
* add support for storing nested arrays of string, long, and double values as specialized nested columns instead of breaking them into separate element columns
* nested column type mimic behavior means that columns ingested with only root arrays of primitive values will be ARRAY typed columns
* neat test refactor stuff
* add v4 segment test
* add array element indexes
* add tests for unnest and array columns
* fix unnest column value selector cursor handling of null and empty arrays
2023-03-27 12:42:35 -07:00
Gian Merlino 062d72b67e
Add timeout to TaskStartTimeoutFault. (#13970)
* Add timeout to TaskStartTimeoutFault.

Makes the error message a bit more useful.

* Update docs.
2023-03-27 23:37:19 +05:30
Arnout Engelen daff7fe73b
Document how to report security issues (#13886)
Document how to report security issues on the security overview page, so we can link this page from the homepage. That should make all the other important security information easier to find as well.
2023-03-27 11:26:37 +05:30
kaijianding 13ffeb50ba
should retry when failed to pause realtime task (#11515) 2023-03-25 19:03:13 +05:30
Atul Mohan 19db32d6b4
Add JWT authenticator support for validating ID Tokens (#13242)
Expands the OIDC based auth in Druid by adding a JWT Authenticator that validates ID Tokens associated with a request. The existing pac4j authenticator works for authenticating web users while accessing the console, whereas this authenticator is for validating Druid API requests made by Direct clients. Services already supporting OIDC can attach their ID tokens to the Druid requests
under the Authorization request header.
2023-03-25 18:41:40 +05:30
Rishabh Singh 598eaad7e1
Fix HSTS for middle manager (#13975)
Fix HSTS for middle manager
2023-03-25 14:01:09 +05:30
Gian Merlino 549018d076 Revert "Update docs."
This reverts commit de27c7d3c1.
2023-03-24 17:16:12 -07:00
Gian Merlino de27c7d3c1 Update docs. 2023-03-24 17:15:27 -07:00
Nicholas Lippis 8a72544bd2
Hook up pod template adapter (#13966)
* Hook up PodTemplateTaskAdapter

* Make task adapter TYPE parameters final

* Rename adapters types

* Include specified adapter name in exception message

* Documentation for sidecarSupport deprecation

* Fix order

* Set TASK_ID as environment variable in PodTemplateTaskAdapter (#13969)

* Update docs/development/extensions-contrib/k8s-jobs.md

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* Hook up PodTemplateTaskAdapter

* Make task adapter TYPE parameters final

* Rename adapters types

* Include specified adapter name in exception message

* Documentation for sidecarSupport deprecation

* Fix order

* fix spelling errors

---------

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2023-03-24 12:13:46 -06:00
Jill Osborne 976d39281f
Fix some broken links in docs (#13968) 2023-03-24 10:48:23 -07:00
Nicholas Lippis 36df2495e1
Set TASK_ID as environment variable in PodTemplateTaskAdapter (#13969) 2023-03-23 16:45:01 -06:00
Abhishek Agarwal 139a058ba7
Use sonatype maven central for plugin repositories (#13961)
* Change search order of maven repositories

* Update pom.xml
2023-03-23 15:35:47 +05:30
abhagraw c52d15d65d
Fixing security vulnerability check errors (#13956)
* Fixing security vulnerability check errors

* Updating javax.el to jakarta.el

* Adding cron job trigger on changes to suppressions file
2023-03-23 11:10:06 +05:30
Paul Rogers da42ee5bfa
Added TYPE(native) data type for external tables (#13958) 2023-03-22 21:43:29 -07:00
Soumyava 2ad133c06e
Unnest changes for moving the filter on right side of correlate to inside the unnest datasource (#13934)
* Refactoring and bug fixes on top of unnest. The filter now is passed inside the unnest cursors. Added tests for scenarios such as
1. filter on unnested column which involves a left filter rewrite
2. filter on unnested virtual column which pushes the filter to the right only and involves no rewrite
3. not filters
4. SQL functions applied on top of unnested column
5. null present in first row of the column to be unnested
2023-03-22 18:24:00 -07:00
Vadim Ogievetsky 8d125b7c7f
Web console: segment writing progress indication (#13929)
* add segment writing progress indication

* update with more metrics

* add push metric
2023-03-22 16:34:38 -07:00
Nicholas Lippis d81d13b9ba
Pod template task adapter (#13896)
* Pod template task adapter

* Use getBaseTaskDirPaths

* Remove unused task from getEnv

* Use Optional.ifPresent() instead of Optional.map()

* Pass absolute path

* Don't pass task to getEnv

* Assert the correct adapter is created

* Javadocs and Comments

* Add exception message to assertions
2023-03-22 14:20:24 -06:00
Clint Wylie 086eb26b74
fix join and unnest planning to ensure that duplicate join prefixes are not used (#13943)
* fix join and unnest planning to ensure that duplicate join prefixes are not used

* wont somebody please think of the children
2023-03-22 12:53:55 -07:00
Adarsh Sanjeev 7bab407495
Add segment generator counters to MSQ reports (#13909)
* Add segment generator counters to reports

* Remove unneeded annotation

* Fix checkstyle and coverage

* Add persist and merged as new metrics

* Address review comments

* Fix checkstyle

* Create metrics class to handle updating counters

* Address review comments

* Add rowsPushed as a new metrics
2023-03-22 09:17:26 -07:00
Clint Wylie f4392a3155
expression transform improvements and fixes (#13947)
changes:
* fixes inconsistent handling of byte[] values between ExprEval.bestEffortOf and ExprEval.ofType, which could cause byte[] values to end up as java toString values instead of base64 encoded strings in ingest time transforms
* improved ExpressionTransform binding to re-use ExprEval.bestEffortOf when evaluating a binding instead of throwing it away
* improved ExpressionTransform array handling, added RowFunction.evalDimension that returns List<String> to back Row.getDimension and remove the automatic coercing of array types that would typically happen to expression transforms unless using Row.getDimension
* added some tests for ExpressionTransform with array inputs
* improved ExpressionPostAggregator to use partial type information from decoration
* migrate some test uses of InputBindings.forMap to use other methods
2023-03-21 23:26:53 -07:00
Kashif Faraz b7752a909c
Enable round-robin segment assignment and batch segment allocation by default (#13942)
Changes:
- Set `useRoundRobinSegmentAssignment` in coordinator dynamic config to `true` by default.
- Set `batchSegmentAllocation` in `TaskLockConfig` (used in Overlord runtime properties) to `true` by default.
2023-03-22 08:20:01 +05:30
Victoria Lim ede9903ff4
pip install for Python Druid API (#13938)
Broken test appears unrelated to this PR

* make druidapi pip installable

* include druidapi in prerequisites

* add license to setup.py

* updates from Paul's review

* note about editable install

* Apply suggestions from code review

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* update install instructions

* found unrelated typos

* standardize install cmd with pip

---------

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-03-21 11:37:39 -07:00
Gian Merlino 1c7a03a47b
Lower default maxRowsInMemory for realtime ingestion. (#13939)
* Lower default maxRowsInMemory for realtime ingestion.

The thinking here is that for best ingestion throughput, we want
intermediate persists to be as big as possible without using up all
available memory. So, we rely mainly on maxBytesInMemory. The default
maxRowsInMemory (1 million) is really just a safety: in case we have
a large number of very small rows, we don't want to get overwhelmed
by per-row overheads.

However, maximum ingestion throughput isn't necessarily the primary
goal for realtime ingestion. Query performance is also important. And
because query performance is not as good on the in-memory dataset, it's
helpful to keep it from growing too large. 150k seems like a reasonable
balance here. It means that for a typical 5 million row segment, we
won't trigger more than 33 persists due to this limit, which is a
reasonable number of persists.

* Update tests.

* Update server/src/main/java/org/apache/druid/segment/indexing/RealtimeTuningConfig.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* Fix test.

* Fix link.

---------

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2023-03-21 10:36:36 -07:00
Jill Osborne 4f95285406
Correct nested columns JSON example (#13953) 2023-03-21 09:17:26 -07:00
Atul Mohan 617c325c70
Make zk connection retries configurable (#13913)
* This makes the zookeeper connection retry count configurable. This is presently hardcoded to 29 tries which ends up taking a long time for the druid node to shutdown in case of ZK connectivity loss.
Having a shorter retry count helps k8s deployments to fail fast. In situations where the underlying k8s node loses network connectivity or is no longer able to talk to zookeeper, failing fast can trigger pod restarts which can then reassign the pod to a healthy k8s node.
Existing behavior is preserved, but users can override this property if needed.
2023-03-21 14:45:28 +05:30
Adarsh Sanjeev 143fdcfacf
Change test name so it triggers in CI (#13844)
As the name of the class did not end or start with "Test", CalciteSelectQueryMSQTest was not triggered in CI. This PR renames the test.
2023-03-20 15:55:52 +05:30
Tejaswini Bandlamudi 1c250a0bc0
Fix error in cron job ITs workflow (#13945) 2023-03-17 17:29:45 +05:30
John Gozde 38adac4369
Dart sass (#13937)
* Run npx saas-migrator division

* Switch to dart sass

* Upgrade blueprint

* Remove deprecated import syntax

* Prettify

* Snapshots
2023-03-16 12:44:24 -07:00
Karan Kumar bf13156b55
Regression bug fix where ever LimitFrameProcessor's were used. (#13941) 2023-03-16 09:18:18 -07:00
Karan Kumar 67df1324ee
Undocumenting certain context parameter in MSQ. (#13928)
* Removing intermediateSuperSorterStorageMaxLocalBytes, maxInputBytesPerWorker, composedIntermediateSuperSorterStorageEnabled, clusterStatisticsMergeMode from docs

* Adding documentation in the context class.
2023-03-16 17:56:44 +05:30
Tejaswini Bandlamudi da197c9273
Migrate existing jdk11 ITs to cron job (#13918)
This cron job runs on the latest commit of the master branch by default daily at 3:00 AM UTC.
2023-03-16 15:30:07 +05:30
Tejaswini Bandlamudi 6837289cb0
Fixes parquet uint_32 datatype conversion (#13935)
After parquet ingestion, uint_32 parquet datatypes are stored as null values in the dataSource. This PR fixes this conversion bug.
2023-03-16 15:27:38 +05:30
abhagraw c7d864d3bc
Update container creation in AzureTestUtil.java (#13911)
*
1. Handling deletion/creation of container created during the previously run test in AzureTestUtil.java.
2. Adding/updating log messages and comments in Azure and GCS deep storage tests.
2023-03-16 11:04:43 +05:30
317brian 65a663adbb
docs: clarify Java precision (#13671)
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-03-15 11:43:41 -07:00
Benedict Jin cee2dfd768
Upgrade ZK from 3.5.9 to 3.5.10 to avoid data inconsistency risk (#13715) 2023-03-15 19:21:09 +05:30
Andreas Maechler 46766d245c
Replace deprecated substr with slice (#13822) 2023-03-15 03:57:06 -07:00
Clint Wylie ed57c5c853
better FrontCodedIndexed (#13854)
* Adds new implementation of 'frontCoded' string encoding strategy, which writes out a v1 FrontCodedIndexed which stores buckets on a prefix of the previous value instead of the first value in the bucket
2023-03-14 18:14:11 -07:00
somu-imply 7ce3371730
Fixing a github workflow to resolve conflicts and use the correct tag for jdk (#13933) 2023-03-14 16:06:27 -07:00
somu-imply a7ba361666
Refactoring and bug fixes on top of unnest. The allowList now is not passed … (#13922)
* Refactoring and bug fixes on top of unnest. The filter now is passed inside the unnest cursors. Added tests for scenarios such as
1. filter on unnested column which involves a left filter rewrite
2. filter on unnested virtual column which pushes the filter to the right only and involves no rewrite
3. not filters
4. SQL functions applied on top of unnested column
5. null present in first row of the column to be unnested
2023-03-14 16:05:56 -07:00
Paul Rogers 4493275d88
Use Maven central repo rather than Apache (#13921)
* Use Maven central repo rather than Apache

* Disable snapshots
2023-03-13 10:49:32 -07:00
Karan Kumar 29b6bf0942
Removing the forbidden check on getOrDefault due to java8 incompatibility. (#13920) 2023-03-11 09:49:32 +05:30
Elliott Freis 8a1dc2f51c
We want to tag the container based on the build jdk version, not the runtime version (#13917)
Co-authored-by: Elliott Freis <elliottfreis@Elliott-Freis.earth.dynamic.blacklight.net>
2023-03-10 11:35:33 -08:00
Suneet Saldanha 44547614ae
Report engine as a dimension for sqlQuery metrics (#13906)
* Report engine as a dimension for sqlQuery metrics

* docs
2023-03-10 11:23:57 -08:00
Karan Kumar 67be70e82e
Removing the forbidden check until we find a fix for java 8 to unblock builds. (#13910) 2023-03-10 21:37:19 +05:30
Gian Merlino 4b1ffbc452
Various changes and fixes to UNNEST. (#13892)
* Various changes and fixes to UNNEST.

Native changes:

1) UnnestDataSource: Replace "column" and "outputName" with "virtualColumn".
   This enables pushing expressions into the datasource. This in turn
   allows us to do the next thing...

2) UnnestStorageAdapter: Logically apply query-level filters and virtual
   columns after the unnest operation. (Physically, filters are pulled up,
   when possible.) This is beneficial because it allows filters and
   virtual columns to reference the unnested column, and because it is
   consistent with how the join datasource works.

3) Various documentation updates, including declaring "unnest" as an
   experimental feature for now.

SQL changes:

1) Rename DruidUnnestRel (& Rule) to DruidUnnestRel (& Rule). The rel
   is simplified: it only handles the UNNEST part of a correlated join.
   Constant UNNESTs are handled with regular inline rels.

2) Rework DruidCorrelateUnnestRule to focus on pulling Projects from
   the left side up above the Correlate. New test testUnnestTwice verifies
   that this works even when two UNNESTs are stacked on the same table.

3) Include ProjectCorrelateTransposeRule from Calcite to encourage
   pushing mappings down below the left-hand side of the Correlate.

4) Add a new CorrelateFilterLTransposeRule and CorrelateFilterRTransposeRule
   to handle pulling Filters up above the Correlate. New tests
   testUnnestWithFiltersOutside and testUnnestTwiceWithFilters verify
   this behavior.

5) Require a context feature flag for SQL UNNEST, since it's undocumented.
   As part of this, also cleaned up how we handle feature flags in SQL.
   They're now hooked into EngineFeatures, which is useful because not
   all engines support all features.
2023-03-10 16:42:08 +05:30