12278 Commits

Author SHA1 Message Date
Vadim Ogievetsky
9679f6a9b5
Web console: add arrayOfDoublesSketch and other small fixes (#13486)
* add padding and keywords

* add arrayOfDoubles

* Update docs/development/extensions-core/datasketches-tuple.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* partition int

* fix docs

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-12-06 21:21:49 -08:00
Kashif Faraz
c7229fc787
Limit max batch size for segment allocation, add docs (#13503)
Changes:
- Limit max batch size in `SegmentAllocationQueue` to 500
- Rename `batchAllocationMaxWaitTime` to `batchAllocationWaitTime` since the actual
wait time may exceed this configured value.
- Replace usage of `SegmentInsertAction` in `TaskToolbox` with `SegmentTransactionalInsertAction`
2022-12-07 10:07:14 +05:30
Abhishek Agarwal
b25cf216d5
Better error message when theta_sketch_intersect is used on scalar expression (#13508) 2022-12-07 09:35:43 +05:30
Clint Wylie
37d8833125
fix bug with broker parallel merge metrics emitting, add wall time, fast/slow partition time metrics (#13420) 2022-12-06 17:50:59 -08:00
imply-cheddar
83261f9641
Starting on Window Functions (#13458)
* Processors for Window Processing

This is an initial take on how to use Processors
for Window Processing.  A Processor is an interface
that transforms RowsAndColumns objects.
RowsAndColumns objects are essentially combinations
of rows and columns.

The intention is that these Processors are the start
of a set of operators that more closely resemble what
DB engineers would be accustomed to seeing.

* Wire up windowed processors with a query type that
can run them end-to-end.  This code can be used to
actually run a query, so yay!

* Some SQL tests for window functions. Added wikipedia 
data to the indexes available to the
SQL queries and tests validating the windowing
functionality as it exists now.

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2022-12-06 15:54:05 -08:00
Clint Wylie
cf472162a6
fix issue with jetty graceful shutdown of data servers when druid.serverview.type=http (#13499)
* fix issue with http server inventory view blocking data node http server shutdown with long polling

* adjust

* fix test inspections
2022-12-06 15:52:44 -08:00
Tejaswini Bandlamudi
136322d13b
clean install before license checks (#13502) 2022-12-05 22:38:03 -08:00
Gian Merlino
fda0a1aadd
Set chatAsync default to true. (#13491)
This functionality was originally added in #13354.
2022-12-05 20:53:59 -08:00
AmatyaAvadhanula
658a9c2d35
Early stop on failed start (Alternative to #13087) (#13258)
* Make halt configurable. Don't halt in tests
2022-12-05 21:05:07 +05:30
Kashif Faraz
65945a686f
Docs: Update docs for coordinator dynamic config (#13494)
* Update docs for useBatchedSegmentSampler

* Update docs for round robin assignment
2022-12-05 16:53:10 +05:30
TSFenwick
10bec54acc
Switching emitter. This will allow for a per feed emitter designation. (#13363)
* Switching emitter. This will allow for a per feed emitter designation.

This will work by looking at an event's feed and directing it to a specific emitter. If no specific emitter is configured for a feed,
the switching emitter can direct the event to a default emitter.

* fix checkstyle issues and make docs for switching emitter use basic event feeds

* fix broken docs, add test, and guard against misconfigurations

* add module test
add switching emitter module test

* fix broken SwitchingEmitterModuleTest

* add apache license to top of test

* fix checkstyle issues

* address comments by adding javadocs, removing a todo, and making druid docs more clear
2022-12-05 16:04:34 +05:30
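The per-feed routing described in #13363 can be illustrated with a minimal sketch. This is not the Druid implementation (which is Java); the class name, constructor arguments, and event shape below are assumptions chosen only to show the idea of looking up an event's feed and falling back to a default emitter.

```python
from typing import Callable, Dict, List, Optional

Event = Dict[str, object]
Emitter = Callable[[Event], None]


class SwitchingEmitter:
    """Hypothetical sketch: routes each event to the emitters registered for its feed."""

    def __init__(
        self,
        feed_to_emitters: Dict[str, List[Emitter]],
        default_emitters: Optional[List[Emitter]] = None,
    ) -> None:
        self._feed_to_emitters = feed_to_emitters
        self._default_emitters = default_emitters or []

    def emit(self, event: Event) -> None:
        feed = event.get("feed")
        # Fall back to the default emitters when the feed has no dedicated entry.
        targets = self._feed_to_emitters.get(feed, self._default_emitters)
        for emitter in targets:
            emitter(event)


# Two in-memory "emitters" that simply collect what they receive.
metrics_sink: List[Event] = []
default_sink: List[Event] = []

switching = SwitchingEmitter(
    feed_to_emitters={"metrics": [metrics_sink.append]},
    default_emitters=[default_sink.append],
)
switching.emit({"feed": "metrics", "metric": "query/time", "value": 12})
switching.emit({"feed": "alerts", "description": "something went wrong"})
```

In this sketch an event whose feed has no mapping is sent to the default emitters; with an empty default list it would be dropped silently.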
Kashif Faraz
45a8fa280c
Add SegmentAllocationQueue to batch SegmentAllocateActions (#13369)
In a cluster with a large number of streaming tasks (~1000), SegmentAllocateActions
on the overlord can often take a long time to finish, causing spikes
in `task/action/run/time`. This may result in lag building up while a task waits for a
segment to get allocated.

The root causes are:
- large number of metadata calls made to the segments and pending segments tables
- `giant` lock held in `TaskLockbox.tryLock()` to acquire task locks and allocate segments

Since the contention typically arises when several tasks of the same datasource try
to allocate segments for the same interval/granularity, the allocation run times can be
improved by batching the requests together.

Changes
- Add flags
   - `druid.indexer.tasklock.batchSegmentAllocation` (default `false`)
   - `druid.indexer.tasklock.batchAllocationMaxWaitTime` (in millis) (default `1000`)
- Add methods `canPerformAsync` and `performAsync` to `TaskAction`
- Submit each allocate action to a `SegmentAllocationQueue`, and add to correct batch
- Process batch after `batchAllocationMaxWaitTime`
- Acquire `giant` lock just once per batch in `TaskLockbox`
- Reduce metadata calls by batching statements together and updating query filters
- Except for batching, retain the whole behaviour (order of steps, retries, etc.)
- Respond to leadership changes and fail items in queue when not leader
- Emit batch and request level metrics
2022-12-05 14:00:07 +05:30
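The batching idea in #13369 can be sketched roughly as follows. This is a simplified illustration, not Druid's `SegmentAllocationQueue`: the class, the batch key (datasource plus interval), the placeholder segment ids, and the explicit `process_all` call (standing in for "process batch after the wait time") are all assumptions. The point it shows is that requests sharing a key are grouped, and the lock is acquired once per batch rather than once per request.

```python
import threading
from collections import defaultdict
from typing import Dict, List, Tuple

BatchKey = Tuple[str, str]  # (datasource, interval): illustrative key, not Druid's


class AllocationQueue:
    """Hypothetical sketch of batching allocation requests by key."""

    def __init__(self) -> None:
        self._giant = threading.Lock()  # stands in for the `giant` lock
        self._pending: Dict[BatchKey, List[str]] = defaultdict(list)
        self.lock_acquisitions = 0

    def submit(self, datasource: str, interval: str, task_id: str) -> None:
        # Add the request to the batch for its (datasource, interval).
        self._pending[(datasource, interval)].append(task_id)

    def process_all(self) -> Dict[str, str]:
        """Process every pending batch; a real queue would do this after a wait time."""
        results: Dict[str, str] = {}
        batches, self._pending = self._pending, defaultdict(list)
        for (datasource, interval), task_ids in batches.items():
            with self._giant:  # one lock acquisition per batch, not per request
                self.lock_acquisitions += 1
                for n, task_id in enumerate(task_ids):
                    # Placeholder segment id; real allocation consults metadata tables.
                    results[task_id] = f"{datasource}_{interval}_{n}"
        return results


queue = AllocationQueue()
for i in range(4):
    queue.submit("wiki", "2022-12-05/2022-12-06", f"task-{i}")
queue.submit("logs", "2022-12-05/2022-12-06", "task-4")
allocated = queue.process_all()
```

Here five requests result in two lock acquisitions, one per distinct key.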
somu-imply
9177419628
Unnest functionality for Druid (#13268)
* Moving all unnest cursor code atop refactored code for unnest

* Updating unnest cursor

* Removing dedup and fixing up some null checks

* AllowList changes

* Fixing some NPEs

* Using bitset for allowlist

* Updating the initialization only when cursor is in non-done state

* Updating code to skip rows not in allow list

* Adding a flag for cases when first element is not in allowed list

* Updating for a null in allowList

* Splitting unnest cursor into 2 subclasses

* Intercepting some apis with columnName for new unnested column

* Adding test cases and renaming some stuff

* checkstyle fixes

* Moving to an interface for Unnest

* handling null rows in a dimension

* Updating cursors after comments part-1

* Addressing comments and adding some more tests

* Reverting a change to ScanQueryRunner and improving a comment

* removing an unused function

* Updating cursors after comments part 2

* One last fix for review comments

* Making some functions private, deleting some comments, adding a test for unnest of unnest with allowList

* Adding an exception for a case

* Closure for unnest data source

* Adding some javadocs

* One minor change in makeDimSelector of columnarCursor

* Updating an error message

* Update processing/src/main/java/org/apache/druid/segment/DimensionUnnestCursor.java

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* Unnesting on virtual columns was missing an object array, adding that to support virtual columns unnesting

* Updating exceptions to use UOE

* Renamed files, added column capability test on adapter, return statement and made unnest datasource not cacheable for the time being

* Handling for null values in dim selector

* Fixing a NPE for null row

* Updating capabilities

* Updating capabilities

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2022-12-02 18:48:25 -08:00
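As a rough illustration of what unnesting with an allow list means in #13268, the sketch below expands a multi-value column into one output row per value and skips values not in the allow list. It is not the cursor-based Druid code; the function name, the row representation, and the choice to emit nothing for a null dimension are assumptions made for this example.

```python
from typing import Dict, Iterable, Iterator, Optional, Set

Row = Dict[str, object]


def unnest(
    rows: Iterable[Row],
    column: str,
    output_name: str,
    allow_list: Optional[Set[object]] = None,
) -> Iterator[Row]:
    """Hypothetical sketch: yield one row per value of `column`, filtered by `allow_list`."""
    for row in rows:
        values = row.get(column)
        if values is None:
            # Assumption for this sketch: a null dimension produces no output rows.
            continue
        if not isinstance(values, list):
            values = [values]  # treat a single value as a one-element list
        for value in values:
            if allow_list is not None and value not in allow_list:
                continue  # skip values that are not in the allow list
            out = dict(row)
            out[output_name] = value
            yield out


rows = [
    {"page": "a", "tags": ["x", "y"]},
    {"page": "b", "tags": "z"},
    {"page": "c", "tags": None},
]
result = list(unnest(rows, "tags", "tag", allow_list={"x", "z"}))
```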
Katya Macedo
78c1a2bd66
Remove limit from timeseries (#13457)
CI build failures seem unrelated to docs
2022-12-02 12:19:59 -08:00
Paul Rogers
b76ff16d00
SQL test framework extensions (#13426)
SQL test framework extensions

* Capture planner artifacts: logical plan, etc.
* Planner test builder validates the logical plan
* Validation for the SQL result schema (we already have
  validation for the Druid row signature)
* Better Guice integration: properties, reuse Guice modules
* Avoid need for hand-coded expr, macro tables
* Retire some of the test-specific query component creation
* Fix query log hook race condition
2022-12-02 09:11:59 -08:00
Tejaswini Bandlamudi
30498c1f98
Update gha & travis checks (#13412)
* update static-checks GHA to run sequentially
remove static-checks from travis.yml
move docs, web-console, packaging checks from travis to GHA

* nit

* nit

* groups all checks, runs on 8, 11, 17 jdks

* nit

* adds license info

* update permissions on scripts folder

* nit

* nit

* fix packaging check

* changes naming, cleans repo before license checks

* simulate failure

* bump up license checks

* test license checks failure

* test license checks failure

* test license checks failure

* verify gha script run exit code

* fail fast in case of shell script

* verified fail fast in case of shell script
2022-12-02 15:06:31 +05:30
Jill Osborne
138a6de507
Update nested columns docs (#13461)
* Update nested columns docs

(cherry picked from commit 04206c5179e0eb46a30d4113c7332daee46c390d)

* Update nested-columns.md

(cherry picked from commit 8085ee7217d90e0e3f133985a52ec2e0b0552992)
2022-12-01 10:47:32 -08:00
AmatyaAvadhanula
cc307e4c29
Fix needless task shutdown on leader switch (#13411)
* Fix needless task shutdown on leader switch

* Add unit test

* Fix style

* Fix UTs
2022-12-01 18:31:08 +05:30
abhagraw
f6f625ee08
MSQ Reindex IT (#13433)
* MSQ Reindex IT

* Fixing checkstyle errors

* Addressing comments

* Addressing comments
2022-12-01 12:13:23 +05:30
Adarsh Sanjeev
8395273099
Add unit tests for MSQ ingestion faults (#13439)
* Add unit tests for MSQ ingestion faults

* Resolve build failure

* Move test to MSQFaultTest

* Rename test
2022-12-01 10:11:49 +05:30
Adarsh Sanjeev
2f3b97194f
Fix hardcoded version in pom file (#13460) 2022-12-01 10:10:04 +05:30
Vadim Ogievetsky
2fdcfffe40
don't render duration if aggregated (#13455) 2022-11-30 19:21:07 -08:00
317brian
cc2e4a80ff
doc: add a basic JDBC tutorial (#13343)
* initial commit for jdbc tutorial

(cherry picked from commit 04c4adad71e5436b76c3425fe369df03aaaf0acb)

* add commentary

* address comments from charles

* add query context to example

* fix typo

* add links

* Apply suggestions from code review

Co-authored-by: Frank Chen <frankchen@apache.org>

* fix datatype

* address feedback

* add parameterize to spelling file. the past tense version was already there

Co-authored-by: Frank Chen <frankchen@apache.org>
2022-11-30 16:25:35 -08:00
xiaokang
6ba35f6d59
update org.bouncycastle:bcprov-jdk15on 1.68 to 1.69 (#13440) 2022-11-30 21:57:38 +05:30
Adarsh Sanjeev
af164cbc10
Fix an issue with WorkerSketchFetcher not terminating on shutdown (#13459)
* Fix an issue with WorkerSketchFetcher not terminating on shutdown

* Change threadpool name
2022-11-30 21:02:48 +05:30
Kashif Faraz
8ff1b2d5d4
Revert "Add filter in cloud object input source for backward compatibility (#13437)" (#13450)
This reverts commit b12e5f300e7c2795ba3d9c7ef17fb64f4925b9c0.
2022-11-30 16:33:05 +05:30
Jill Osborne
291ded22d5
Update experimental features doc (#13452) 2022-11-30 16:14:43 +05:30
Gian Merlino
50963edcae
Fix compile error in MSQSelectTest. (#13456) 2022-11-29 15:51:03 -08:00
Jill Osborne
5c520e0cf9
Update LDAP configuration docs (#13245)
* Update LDAP configuration docs

* Updated after review

* Update auth-ldap.md

Updated.

* Update auth-ldap.md

* Updated spelling file

* Update docs/operations/auth-ldap.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/operations/auth-ldap.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/operations/auth-ldap.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update auth-ldap.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-11-29 09:26:32 -08:00
Laksh Singla
79df11c16c
Improve unit test coverage for MSQ (#13398)
* add faults tests for the multi stage query

* add too many partitions fault

* add toomanyinputfilesfault

* programmatically generate the file

* refactor

* Trigger Build
2022-11-29 17:27:04 +05:30
Laksh Singla
4ed6255bdf
Convert errors based on implicit type conversion in multi value arrays to parse exception in MSQ (#13366)
* initial commit

* fix test

* push the json changes

* reduce the area of the try..catch

* Trigger Build

* review
2022-11-29 17:19:57 +05:30
Karan Kumar
edd076ca69
Remove duplicate FrameRowTooLargeException.java (#13441)
* Removing duplicate FrameRowTooLargeException.java

* Fixing intellij inspection
2022-11-29 08:46:59 +05:30
Jill Osborne
100a2aa4a2
Update and document experimental features (#13348)
* Update and document experimental features
* Updated
* Update experimental-features.md
* Update docs/development/experimental-features.md
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
* Updated after review
* Updated
* Update materialized-view.md
* Update experimental-features.md
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2022-11-29 08:01:28 +05:30
Vadim Ogievetsky
d8f4353c43
Web console: be more robust to aux queries failing and improve kill tasks (#13431)
* be more robust to aux queries failing

* feedback fixes

* remove empty block

* fix spelling

* remove killAllDataSources from the console
2022-11-28 16:50:38 -08:00
Clint Wylie
37b8d4861c
fix issues with nested data conversion (#13407) 2022-11-28 12:29:43 -08:00
Clint Wylie
4b58f5f23c
fix KafkaInputFormat with nested columns by delegating to underlying inputRow map instead of eagerly copying (#13406) 2022-11-28 12:28:07 -08:00
Vadim Ogievetsky
a2d5e335f3
Web console: Index spec dialog (#13425)
* add index spec dialog

* add snapshot
2022-11-28 11:40:45 -08:00
Tejaswini Bandlamudi
b12e5f300e
Add filter in cloud object input source for backward compatibility (#13437)
PR https://github.com/apache/druid/pull/13027 replaced the `filter` parameter with
`objectGlob` in the ingestion input source. However, this causes existing ingestion
jobs to fail if they already use a filter. This PR adds the old filter functionality
alongside objectGlob to preserve backward compatibility.
2022-11-28 23:04:33 +05:30
Gian Merlino
58c896ea0b
ServiceClient: More robust redirect handling. (#13413)
Detects self-redirects, redirect loops, long redirect chains, and redirects to unknown servers.
Treats all of these cases as an unavailable service, retrying if the retry policy allows it.

Previously, some of these cases would lead to a prompt, unretryable error. This caused
clients contacting an Overlord during a leader change to fail with error messages like:

org.apache.druid.rpc.RpcException: Service [overlord] redirected too many times

Additionally, a slight refactor of callbacks in ServiceClientImpl improves readability of
the flow through onSuccess.
2022-11-28 22:24:46 +05:30
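The four redirect cases listed in #13413 can be sketched as a small resolver. This is not `ServiceClientImpl`; the function, the `ServiceUnavailable` exception, the `next_redirect` callback, and the limit of 3 redirects are assumptions for illustration. Each problematic case is reported as "service unavailable" so that a caller's retry policy could decide whether to try again.

```python
from typing import Callable, Optional, Set

MAX_REDIRECTS = 3  # assumed limit, chosen only for this sketch


class ServiceUnavailable(Exception):
    """Raised for redirect situations that should be treated as a retryable outage."""


def resolve(
    start: str,
    known_servers: Set[str],
    next_redirect: Callable[[str], Optional[str]],
    max_redirects: int = MAX_REDIRECTS,
) -> str:
    """Follow redirects from `start` and return the server that finally answers."""
    visited = {start}
    current = start
    redirects = 0
    while True:
        target = next_redirect(current)
        if target is None:
            return current  # no redirect: this server handles the request
        if target == current:
            raise ServiceUnavailable("self-redirect")
        if target in visited:
            raise ServiceUnavailable("redirect loop")
        if target not in known_servers:
            raise ServiceUnavailable("redirect to unknown server")
        redirects += 1
        if redirects > max_redirects:
            raise ServiceUnavailable("redirect chain too long")
        visited.add(target)
        current = target


# A follower that redirects to the current leader.
table = {"overlord-1": "overlord-2", "overlord-2": None}
leader = resolve("overlord-1", {"overlord-1", "overlord-2"}, table.get)
```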
Kashif Faraz
656b6cdf62
Add MetricsVerifier to simplify verification of metric values in tests (#13442) 2022-11-28 19:32:37 +05:30
Jill Osborne
db7c29c6f9
Correction to firehose migration doc (#13423) 2022-11-28 10:24:27 +05:30
Tejaswini Bandlamudi
b091b32f21
Fixes reindexing bug with filter on long column (#13386)
* fixes BlockLayoutColumnarLongs close method to nullify internal buffer.

* fixes other BlockLayoutColumnar supplier close methods to nullify internal buffers.

* fix spotbugs
2022-11-25 19:22:48 +05:30
dependabot[bot]
16385c7101
Bump minimatch and replace in /web-console (#13396)
Bumps [minimatch](https://github.com/isaacs/minimatch) to 3.0.5 and updates ancestor dependency [replace](https://github.com/ALMaclaine/replace). These dependencies need to be updated together.


Updates `minimatch` from 3.0.4 to 3.0.5
- [Release notes](https://github.com/isaacs/minimatch/releases)
- [Commits](https://github.com/isaacs/minimatch/compare/v3.0.4...v3.0.5)

Updates `replace` from 1.2.1 to 1.2.2
- [Release notes](https://github.com/ALMaclaine/replace/releases)
- [Commits](https://github.com/ALMaclaine/replace/commits)

---
updated-dependencies:
- dependency-name: minimatch
  dependency-type: indirect
- dependency-name: replace
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-23 12:16:00 -08:00
Clint Wylie
f524c68f08
Add mechanism for 'safe' memory reads for complex types (#13361)
* we can read where we want to
we can leave your bounds behind
'cause if the memory is not there
we really don't care
and we'll crash this process of mine
2022-11-23 00:25:22 -08:00
Rohan Garg
c26b18c953
Port CVE suppressions from 24.0.1 (#13415)
* Suppress jackson-databind CVE-2022-42003 and CVE-2022-42004
(cherry picked from commit 1f4d892c9a2dbc3ce6df1481fd4c6d242ba0ea8d)
* Suppress CVEs
(cherry picked from commit ed55baa8fa7d7f914a0addabb072d9ed47e1cd9f)
* Suppress vulnerabilities from druid-website package
(cherry picked from commit c0fb364f8049d53cd704e414e2ffeab6c49b012e)
* Add more suppressions for website package
(cherry picked from commit 9bba569ebd52c5480bf4219c420ed78eb053701f)
2022-11-23 11:35:33 +05:30
Clint Wylie
be4914dcd9
fix off by one error in nested column range index (#13405) 2022-11-22 12:46:06 -08:00
Kashif Faraz
7cf761cee4
Prepare master branch for next release, 26.0.0 (#13401)
* Prepare master branch for next release, 26.0.0

* Use docker image for druid 24.0.1

* Fix version in druid-it-cases pom.xml
2022-11-22 15:31:01 +05:30
Gian Merlino
c6054b7cb7
Attach IO error to parse error when we can't contact Avro schema registry. (#13403)
* Attach IO error to parse error when we can't contact Avro schema registry.

The change in #12080 lost the original exception context. This patch
adds it back.

* Add hamcrest-core.

* Fix format string.
2022-11-21 22:20:26 -08:00
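The fix in #13403 is about keeping the original exception as the cause of the parse error. A minimal sketch of that pattern, using Python exception chaining rather than the Java code in the patch, might look like this; `ParseException`, `fetch_schema`, and `parse_record` are hypothetical names, and the registry call is a stand-in that always fails.

```python
class ParseException(Exception):
    """Hypothetical parse error type for this sketch."""


def fetch_schema(schema_id: int) -> dict:
    # Stand-in for a schema registry lookup; here it always fails with an IO error.
    raise OSError(f"cannot reach schema registry for schema id {schema_id}")


def parse_record(schema_id: int, payload: bytes) -> dict:
    try:
        schema = fetch_schema(schema_id)
    except OSError as e:
        # `from e` keeps the IO error attached as __cause__, so the original
        # context is not lost when the parse error is reported.
        raise ParseException(f"Failed to fetch schema with id {schema_id}") from e
    return {"schema": schema, "payload": payload}


try:
    parse_record(7, b"")
    caught = None
except ParseException as e:
    caught = e
```

Without the `from e` clause and outside an `except` block, the parse error would carry no trace of the underlying IO failure, which is the situation the commit describes.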
Adarsh Sanjeev
280a0f7158
Add sequential sketch merging to MSQ (#13205)
* Add sketch fetching framework

* Refactor code to support sequential merge

* Update worker sketch fetcher

* Refactor sketch fetcher

* Refactor sketch fetcher

* Add context parameter and threshold to trigger sequential merge

* Fix test

* Add integration test for non sequential merge

* Address review comments

* Address review comments

* Address review comments

* Resolve maxRetainedBytes

* Add new classes

* Renamed key statistics information class

* Rename fetchStatisticsSnapshotForTimeChunk function

* Address review comments

* Address review comments

* Update documentation and add comments

* Resolve build issues

* Resolve build issues

* Change worker APIs to async

* Address review comments

* Resolve build issues

* Add null time check

* Update integration tests

* Address review comments

* Add log messages and comments

* Resolve build issues

* Add unit tests

* Add unit tests

* Fix timing issue in tests
2022-11-22 09:56:32 +05:30
Vadim Ogievetsky
fe34ecc5e3
add ability to make inputFormat part of the example datasets (#13402) 2022-11-21 12:50:44 -08:00