12195 Commits

Author SHA1 Message Date
Sam Rash
f89496ccac
Revert Accidental Change to Druid.xml (#13190)
See commit 54a2eb for accidental commit
2022-10-06 14:42:35 -07:00
317brian
0edceead80
msq: update known issue about GROUPING SETS and COUNT DISTINCT (#13185)
* msq: update known issue about GROUPING SETS and COUNT DISTINCT

* address feedback from Gian
2022-10-05 19:47:03 -07:00
AmatyaAvadhanula
41e51b21c3
Make http options the default configurations (#13092)
Druid currently uses Zookeeper dependent options as the default.
This commit updates the following to use HTTP as the default instead.
- task runner. `druid.indexer.runner.type=remote -> httpRemote`
- load queue peon. `druid.coordinator.loadqueuepeon.type=curator -> http`
- server inventory view. `druid.serverview.type=curator -> http`
2022-10-05 05:35:17 +05:30
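The three default changes listed in #13092 above can be written down as key/value pairs. This is a minimal sketch, not Druid code: the property names and values come from the commit message, while the class `DefaultsSketch`, its methods, and the idea of rendering the pairs as `runtime.properties`-style lines are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Druid: it only restates the property
// names and values quoted in the commit message.
final class DefaultsSketch {

    // Previous ZooKeeper-dependent defaults, per the commit message.
    static Map<String, String> zookeeperBased() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("druid.indexer.runner.type", "remote");
        settings.put("druid.coordinator.loadqueuepeon.type", "curator");
        settings.put("druid.serverview.type", "curator");
        return settings;
    }

    // New HTTP-based defaults, per the commit message.
    static Map<String, String> httpBased() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("druid.indexer.runner.type", "httpRemote");
        settings.put("druid.coordinator.loadqueuepeon.type", "http");
        settings.put("druid.serverview.type", "http");
        return settings;
    }

    // Renders the settings as key=value lines, one per property.
    static String toPropertyLines(Map<String, String> settings) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            out.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(toPropertyLines(httpBased()));
    }
}
```

A cluster that wants to keep the earlier behaviour would presumably set the three `zookeeperBased()` values explicitly; whether that is sufficient for a given deployment is not stated in the commit message.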
Xavier Léauté
eff7edb603
update core Apache Kafka dependencies to 3.3.1 (#13176)
Announcement:
- https://blogs.apache.org/kafka/entry/what-rsquo-s-new-in

Release notes:
- https://archive.apache.org/dist/kafka/3.3.0/RELEASE_NOTES.html
- https://downloads.apache.org/kafka/3.3.1/RELEASE_NOTES.html
2022-10-04 12:52:16 -07:00
Abhishek Agarwal
e3f9a0ed44
Lazy initialization of segment killers, movers and archivers (#13170)
* Lazy initialization of segment killers, movers and archivers

* Add test for lazy killer

* Add more tests

* Intellij fixes
2022-10-04 15:55:46 +05:30
Kashif Faraz
b07f01d645
Set useMaxMemoryEstimates=false by default (#13178)
A value of `false` denotes that the new flow with improved estimates will be used.
2022-10-04 15:04:23 +05:30
Abhishek Agarwal
7fa53ff4b3
Exclude calcite from dependabot (#13160)
* Exclude calcite from dependabot

* Update .github/dependabot.yml

Co-authored-by: Liam Newman <96086065+liam-verta@users.noreply.github.com>

* Update dependabot.yml

Co-authored-by: Liam Newman <96086065+liam-verta@users.noreply.github.com>
2022-10-04 10:21:11 +08:00
Vadim Ogievetsky
4bfae1deee
Docs: fix doc search (#13164)
* fix doc search

* upgrade website node to 16

* change website travis script

* move spellcheck notification

* explicit path to npm bin

* cd to the correct place
2022-10-03 16:48:13 -07:00
Adarsh Sanjeev
92d2633ae6
Update ClusterByStatisticsCollectorImpl to use bytes instead of keys (#12998)
* Update clusterByStatistics to use bytes instead of keys

* Address review comments

* Resolve checkstyle

* Increase test coverage

* Update test

* Update thresholds

* Update retained keys function

* Update docs

* Fix spelling
2022-10-03 12:08:23 +05:30
Vadim Ogievetsky
ebfe1c0c90
Web console: fix DQT import (#13159)
* fix dqt import

* update licenses

* update tests
2022-09-30 09:31:06 -07:00
Kashif Faraz
ce5f55e5ce
Fix over-replication caused by balancing when inventory is not updated yet (#13114)
* Add coordinator test framework

* Remove outdated changes

* Add more tests

* Add option to auto-sync inventory

* Minor cleanup

* Fix inspections

* Add README for simulations, add SegmentLoadingNegativeTest

* Fix over-replication from balancing

* Fix README

* Cleanup unnecessary fields from DruidCoordinator

* Add a test

* Fix DruidCoordinatorTest

* Remove unused import

* Fix CuratorDruidCoordinatorTest

* Remove test log4j2.xml
2022-09-29 12:06:23 +05:30
Abhishek Agarwal
61b34950e7
Fix assertion error in sql planning for latest aggregators (#13151)
* Fix sql planning bug for latest aggregators

* change test name

* Fix error messages

* fix error message again
2022-09-28 21:01:32 +05:30
AmatyaAvadhanula
acafd0d1e0
Upgrade kafka version to 3.2.3 to fix CVE (#13142)
Upgrade to 3.2.3 to fix CVE: https://nvd.nist.gov/vuln/detail/CVE-2022-34917
2022-09-28 10:47:09 +05:30
Jill Osborne
548d810baa
Correct nested columns example (#13150)
2022-09-28 10:39:56 +05:30
David Palmer
0d7bf66578
Add a note to the documentation about pre-built HLLSketches (#13088)
* add a note to the documentation about pre-built HLLSketches

Druid actually supports ingesting a pre-generated sketch column by using
the HLLSketchMerge aggregator. However, this functionality was
previously not made clear in the documentation.

* copyedit from the King's English to American English

* add suggested style changes

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-09-27 10:29:39 +08:00
Apoorv Gupta
c8f4d72fb1
Fix documentation bug about injective lookups (#13147)
replace mapping to `unique keys` with mapping to `unique values`.
2022-09-27 10:16:48 +08:00
Sam Rash
28b9edc2a8
Add BIG_SUM SQL function (#13102)
This adds a sql function, "BIG_SUM", that uses
CompressedBigDecimal to do a sum. Other misc changes:

1. handle NumberFormatExceptions when parsing a string (default to set
   to 0, configurable in agg factory to be strict and throw on error)
2. format pom file (whitespace) + add dependency
3. scaleUp -> scale and always require scale as a parameter
2022-09-26 18:02:25 -07:00
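Item 1 of #13102 above (default to 0 on an unparseable string, or throw when configured as strict) can be illustrated with plain `java.math.BigDecimal`. This is a sketch under stated assumptions: it does not use `CompressedBigDecimal`, and the class name, the `strict` flag name, and the `HALF_UP` rounding mode are hypothetical choices, not taken from the commit.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Arrays;

// Hypothetical illustration of lenient vs. strict parsing while summing.
final class LenientDecimalSumSketch {

    // Parses raw as a decimal. In lenient mode a malformed string counts as 0;
    // in strict mode the NumberFormatException is rethrown.
    static BigDecimal parse(String raw, boolean strict) {
        try {
            return new BigDecimal(raw);
        } catch (NumberFormatException e) {
            if (strict) {
                throw e;
            }
            return BigDecimal.ZERO;
        }
    }

    // Sums the values at a caller-supplied scale (the scale is always required).
    // The rounding mode is an assumption made for this sketch.
    static BigDecimal sum(Iterable<String> values, int scale, boolean strict) {
        BigDecimal total = BigDecimal.ZERO.setScale(scale);
        for (String value : values) {
            total = total.add(parse(value, strict).setScale(scale, RoundingMode.HALF_UP));
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList("1.25", "oops", "2.50"), 2, false));
    }
}
```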
Jonathan Wei
1f1fced6d4
Add JsonInputFormat option to assume newline delimited JSON, improve parse exception handling for multiline JSON (#13089)
* Add JsonInputFormat option to assume newline delimited JSON, improve handling for non-NDJSON

* Fix serde and docs

* Add PR comment check
2022-09-26 19:51:04 -05:00
imply-cheddar
e839660b6a
Grab the thread name in a poisoned pool (#13143)
2022-09-26 17:09:10 -07:00
Laksh Singla
0bfa81b7df
Fix the Injector creation in HadoopTask (#13138)
* Injector fix in HadoopTask

* Log the ExtensionsConfig while instantiating the HadoopTask

* Log the config in the run() method instead of the ctor
2022-09-24 10:38:25 +05:30
Adarsh Sanjeev
306f612f86
Suppress Calcite CVE (#13119)
* Suppress Calcite CVE

* Update comment
2022-09-23 16:23:26 +05:30
Vadim Ogievetsky
a910764e41
better spec conversion with issues (#13136)
2022-09-22 10:46:57 -07:00
Vadim Ogievetsky
6c1dc6589e
initialize all counters for stages with input (#13137)
2022-09-22 08:10:50 -07:00
Laksh Singla
728745a1d3
Add IT for MSQ task engine using the new IT framework (#12992)
* first test, serde causing problems

* serde working

* insert and select check

* Add cluster annotations for MSQ test cases

* Add cluster config for MSQ

* Add MSQ config to the pom.xml

* cleanup unnecessary changes

* Remove model classes

* Comments, checkstyle, check queries from file

* fixup test case name

* build failure fix

* review changes

* build failure fix

* Trigger Build

* Log the mismatch in QueryResultsVerifier

* Trigger Build

* Change the signature of the results verifier

* review changes

* LGTM fix

* build, change pom

* Trigger Build

* Trigger Build

* trigger build with minimal pom changes

* guice fix in tests

* travis.yml
2022-09-22 16:09:47 +05:30
Sam Rash
044cab5094
Optimize CompressedBigDecimal compareTo() (#13086)
Optimizes the compareTo() function in
CompressedBigDecimal. It directly compares the int[] rather than
creating BigDecimal objects and using its compareTo.

It handles unequal sized CBDs, but does require
the scales to match.
2022-09-21 20:31:02 -07:00
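The idea in #13086 above, comparing the `int[]` contents directly instead of building `BigDecimal` objects, can be sketched as follows. The storage layout is an assumption for this illustration (little-endian two's-complement words, index 0 least significant) and is not confirmed by the commit message; the class and method names are hypothetical. As in the commit, arrays of unequal length are handled, and both operands are assumed to share the same scale.

```java
// Hypothetical sketch: compares two non-empty two's-complement integers stored as
// little-endian int[] words. Callers must ensure both values use the same scale.
final class IntArrayCompareSketch {

    static int compare(int[] lhs, int[] rhs) {
        boolean lhsNegative = lhs[lhs.length - 1] < 0;
        boolean rhsNegative = rhs[rhs.length - 1] < 0;
        if (lhsNegative != rhsNegative) {
            // Different signs: the negative operand is the smaller one.
            return lhsNegative ? -1 : 1;
        }
        // Same sign: after sign extension to a common length, numeric order
        // matches unsigned word order from the most significant word down.
        int words = Math.max(lhs.length, rhs.length);
        for (int i = words - 1; i >= 0; i--) {
            int l = wordAt(lhs, i, lhsNegative);
            int r = wordAt(rhs, i, rhsNegative);
            if (l != r) {
                return Integer.compareUnsigned(l, r);
            }
        }
        return 0;
    }

    // Returns the word at index i, sign-extending past the end of the array.
    private static int wordAt(int[] value, int i, boolean negative) {
        if (i < value.length) {
            return value[i];
        }
        return negative ? -1 : 0;
    }

    public static void main(String[] args) {
        // {1, 0} and {1} both encode the value 1 with different word counts.
        System.out.println(compare(new int[] {1, 0}, new int[] {1}));
    }
}
```

Skipping the intermediate `BigDecimal` objects avoids allocation on every comparison, which is presumably where the optimization comes from.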
Vadim Ogievetsky
f1d3728371
append to existing callout (#13130)
2022-09-21 19:39:28 -07:00
Charles Smith
eb760c3d1d
update log4j example (#13095)
* update log4j example

* fix some style issues

* Update docs/configuration/logging.md

Co-authored-by: Frank Chen <frankchen@apache.org>

Co-authored-by: Frank Chen <frankchen@apache.org>
2022-09-22 09:46:49 +08:00
317brian
12f12a13a9
fix: fix broken postgres link (#13135)
2022-09-22 09:46:20 +08:00
317brian
7fa35839c0
fix: follow naming convention for msq task engine (#13127)
* fix: follow naming convention for msq task engine

* more fixes

* add back in experimental

* fix anchor
2022-09-21 18:46:06 -07:00
Gian Merlino
2f731f356e
Update pull-deps docs with correct repo list. (#13134)
There is only one default remote repo at this time.
2022-09-21 12:16:57 -07:00
Jonathan Wei
331e6d707b
Add KafkaConfigOverrides extension point (#13122)
* Add KafkaConfigOverrides extension point

* X
2022-09-21 11:47:19 +05:30
Katya Macedo
90d14f629a
spatial-filters (#13124)
2022-09-20 22:48:36 -07:00
Kashif Faraz
0039409817
Add test framework to simulate segment loading and balancing (#13074)
Fixes #12822 

The framework added here makes it easy to write tests that verify the behaviour and interactions
of the following entities under various conditions:
- `DruidCoordinator`
- `HttpLoadQueuePeon`, `LoadQueueTaskMaster`
- coordinator duties: `BalanceSegments`, `RunRules`, `UnloadUnusedSegments`, etc.
- datasource retention rules: `LoadRule`, `DropRule`

Changes:
Add the following main classes:
- `CoordinatorSimulation` and related interfaces to dictate behaviour of simulation
- `CoordinatorSimulationBuilder` to build a simulation.
- `BlockingExecutorService` to keep submitted tasks in queue and execute them
  only when explicitly invoked.

Add tests:
- `CoordinatorSimulationBaseTest`, `SegmentLoadingTest`, `SegmentBalancingTest`
- `SegmentLoadingNegativeTest` to contain tests which assert the existing erroneous behaviour
of segment loading. Once the behaviour is fixed, these tests will be moved to the regular
`SegmentLoadingTest`.

Please refer to the README.md in `org.apache.druid.server.coordinator.simulate` for more details
2022-09-21 09:51:58 +05:30
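The `BlockingExecutorService` described in #13074 above keeps submitted tasks in a queue and executes them only when explicitly invoked. The following is a minimal sketch of that general technique, not the class from the commit: it implements only `java.util.concurrent.Executor`, and the class name and the methods `runAllPending` and `pendingCount` are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: an Executor that never runs a task on submission.
final class QueueingExecutorSketch implements Executor {

    private final Queue<Runnable> pending = new ArrayDeque<>();

    // Only enqueues the task; nothing runs until runAllPending() is called.
    @Override
    public synchronized void execute(Runnable task) {
        pending.add(task);
    }

    // Runs queued tasks on the calling thread, including any tasks queued
    // while draining, and returns how many were run.
    public int runAllPending() {
        int ran = 0;
        Runnable task;
        while ((task = poll()) != null) {
            task.run();
            ran++;
        }
        return ran;
    }

    public synchronized int pendingCount() {
        return pending.size();
    }

    private synchronized Runnable poll() {
        return pending.poll();
    }

    public static void main(String[] args) {
        QueueingExecutorSketch executor = new QueueingExecutorSketch();
        AtomicInteger counter = new AtomicInteger();
        executor.execute(counter::incrementAndGet);
        System.out.println("before: " + counter.get());
        executor.runAllPending();
        System.out.println("after: " + counter.get());
    }
}
```

Deferring execution this way lets a test decide exactly when each asynchronous step happens, which makes interleavings reproducible.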
hosswald
5ed5c83aab
Clarified the behaviour of SQL COUNT(DISTINCT dim) on multi-value dimensions (#13128)
* Clarified the behaviour of COUNT(DISTINCT column) on multi-value columns

* Update docs/querying/sql-aggregations.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Vadim Ogievetsky <vadimon@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-09-20 18:03:34 -07:00
Vadim Ogievetsky
edc444a4bc
fix quickstart (#13126)
2022-09-20 17:44:21 -07:00
Abhishek Agarwal
455b074b36
Move JDK11 ITs to cron stage (#13075)
* Move JDK11 ITs to cron stage

* Make cron run on release branches

* Review comments

* fix spelling
2022-09-20 09:18:52 -07:00
Vadim Ogievetsky
b9edfe34a4
be consistent about referring to the web console by its name (#13118)
2022-09-19 15:02:17 -07:00
Frank Chen
a3391693eb
Improve a MSQ planning error message (#13113)
2022-09-19 23:11:54 +08:00
abhagraw
48638a5438
Getting extension list from pom (#13073)
* Getting extension list from pom

* Trigger Build
2022-09-19 15:14:21 +05:30
Clint Wylie
a0e0fbe1b3
nested column serializer performance improvement for sparse columns (#13101)
2022-09-19 14:07:48 +05:30
Paul Rogers
8ce03eb094
Convert the Druid planner to use statement handlers (#12905)
* Converted Druid planner to use statement handlers

Converts the large collection of if-statements for statement
types into a set of classes: one per supported statement type.
Cleans up a few error messages.

* Revisions from review comments

* Build fix

* Build fix

* Resolve merge conflict.

* More merges with QueryResponse PR

* More parameterized type cleanup

Forces a rebuild due to a flaky test
2022-09-19 11:58:45 +05:30
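The refactoring described in #12905 above replaces a chain of if-statements over statement types with one class per supported statement type. The sketch below shows that general pattern only. Every name in it (`StatementHandler`, `SelectHandler`, `InsertHandler`, `handlerFor`) is hypothetical, and dispatching on the first SQL keyword is an assumption for illustration, not how the Druid planner selects a handler.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of "one handler class per statement type".
final class StatementHandlerSketch {

    interface StatementHandler {
        String describe(String sql);
    }

    static final class SelectHandler implements StatementHandler {
        @Override
        public String describe(String sql) {
            return "query: " + sql;
        }
    }

    static final class InsertHandler implements StatementHandler {
        @Override
        public String describe(String sql) {
            return "ingest: " + sql;
        }
    }

    // The lookup table replaces the chain of if-statements.
    private static final Map<String, StatementHandler> HANDLERS = new HashMap<>();

    static {
        HANDLERS.put("SELECT", new SelectHandler());
        HANDLERS.put("INSERT", new InsertHandler());
    }

    // Picks a handler by the statement's first keyword (an assumption of this sketch).
    static StatementHandler handlerFor(String sql) {
        String keyword = sql.trim().split("\\s+", 2)[0].toUpperCase(Locale.ROOT);
        StatementHandler handler = HANDLERS.get(keyword);
        if (handler == null) {
            throw new IllegalArgumentException("Unsupported statement type: " + keyword);
        }
        return handler;
    }

    public static void main(String[] args) {
        System.out.println(handlerFor("select 1").describe("select 1"));
    }
}
```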
Vadim Ogievetsky
bb0b810b1d
fix html tags in docs (#13117)
* fix html tags in docs

* revert not null
2022-09-18 19:40:33 -07:00
Gian Merlino
2e729170cc
Kill task: Don't include markAsUnused unless set. (#13104)
Cleans up the serialized JSON.
2022-09-17 14:03:34 -07:00
Vadim Ogievetsky
de8f229bed
Web console: correctly escape path based flatten specs (#13105)
* fix path generation

* do escape

* fix replace

* fix replace for good
2022-09-17 14:02:42 -07:00
Gian Merlino
d9b2968edb
Docs: Clarify the situation with SELECT. (#13109)
2022-09-17 10:47:57 -07:00
Charles Smith
b366a6c5a4
Add clarification around docker environment #8926 (#13084)
* Add clarification around docker environment #8926

* fix spelling

* Update docs/tutorials/docker.md

Co-authored-by: Frank Chen <frankchen@apache.org>

* Update docs/tutorials/docker.md

Co-authored-by: Frank Chen <frankchen@apache.org>

* fix nano quickstart

Co-authored-by: Frank Chen <frankchen@apache.org>
2022-09-17 20:44:24 +08:00
Ellen Shen
da30c8070a
kafka consumer: custom serializer can't be configured after its instantiation (#12960) (#13097)
* allow kafka custom serializer to be configured

  * add unit tests

Co-authored-by: ellen shen <ellenshen@apple.com>
2022-09-17 20:42:21 +08:00
Gian Merlino
d4967c38f8
Various documentation updates. (#13107)
* Various documentation updates.

1) Split out "data management" from "ingestion". Break it into thematic pages.

2) Move "SQL-based ingestion" into the Ingestion category. Adjust content so
   all conceptual content is in concepts.md and all syntax content is in reference.md.
   Shorten the known issues page to the most interesting ones.

3) Add SQL-based ingestion to the ingestion method comparison page. Remove the
   index task, since index_parallel is just as good when maxNumConcurrentSubTasks: 1.

4) Rename various mentions of "Druid console" to "web console".

5) Add additional information to ingestion/partitioning.md.

6) Remove a mention of Tranquility.

7) Remove a note about upgrading to Druid 0.10.1.

8) Remove no-longer-relevant task types from ingestion/tasks.md.

9) Move ingestion/native-batch-firehose.md to the hidden section. It was previously deprecated.

10) Move ingestion/native-batch-simple-task.md to the hidden section. It is still linked in some
    places, but it isn't very useful compared to index_parallel, so it shouldn't take up space
    in the sidebar.

11) Make all br tags self-closing.

12) Certain other cosmetic changes.

13) Update to node-sass 7.

* make travis use node12 for docs

Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
2022-09-16 21:58:11 -07:00
Vadim Ogievetsky
c62a822121
support kafka lookups (#13098)
2022-09-16 15:25:25 -07:00
AmatyaAvadhanula
9b53b0184f
Allocate numCorePartitions using only used segments (#13070)
* Allocate numCorePartitions using only used segments

* Add corePartition checks in existing test

* Separate committedMaxId and overallMaxId

* Fix bug: replace overall with committed
2022-09-16 19:16:36 +05:30