2726 Commits

Author SHA1 Message Date
zachjsh
38e620aa4c
Operator conversion deny list (#13766)
### Description

This change adds a new config property `druid.sql.planner.operatorConversion.denyList`, which allows a user to specify
any operator conversions that they wish to disallow. A user may want to do this for a number of reasons, including security concerns. The default value of this property is the empty list `[]`, which does not disallow any operator conversions.

An example usage of this property is `druid.sql.planner.operatorConversion.denyList=["extern"]`, which disallows the usage of the `extern` operator conversion. If the property is configured this way, and a user of the Druid cluster tries to submit a query that uses the `extern` function, such as the example given [here](https://druid.apache.org/docs/latest/multi-stage-query/examples.html#insert-with-no-rollup), a response with http response code `400` is returned with en error body similar to the following:

```
{
  "taskId": "4ec5b0b6-fa9b-4c3a-827d-2308294e9985",
  "state": "FAILED",
  "error": {
    "error": "Plan validation failed",
    "errorMessage": "org.apache.calcite.runtime.CalciteContextException: From line 28, column 5 to line 32, column 5: No match found for function signature EXTERN(<CHARACTER>, <CHARACTER>, <CHARACTER>)",
    "errorClass": "org.apache.calcite.tools.ValidationException",
    "host": null
  }
}
```
2023-02-10 09:59:26 -08:00
Anshu Makkar
d7b95988d7
Add missing documentation for constant post-aggregator (#13664)
Thanks @anshu-makkar , I was waiting for CI to complete yesterday. Failures seem unrelated, so merging.
2023-02-09 08:53:45 -08:00
Suneet Saldanha
714ac07b52
Allow users to add additional metadata to ingestion metrics (#13760)
* Allow users to add additional metadata to ingestion metrics

When submitting an ingestion spec, users may pass a map of metadata
in the ingestion spec config that will be added to ingestion metrics.

This will make it possible for operators to tag metrics with other
metadata that doesn't necessarily line up with the existing tags
like taskId.

Druid clusters that ingest these metrics can take advantage of the
nested data columns feature to process this additional metadata.

* rename to tags

* docs

* tests

* fix test

* make code cov happy

* checkstyle
2023-02-08 18:07:23 -08:00
AmatyaAvadhanula
0cf1fc3d55
Indexing on multiple disks (#13476)
* Initial commit

* Simple UTs

* Parameterize tests

* Parameterized tests for k8s task runner

* Fix restore bug

* Refactor TaskStorageDirTracker

* Change CliPeon args
2023-02-08 11:31:34 +05:30
AmatyaAvadhanula
dcdae84888
Add server view initialization metrics (#13716)
* Add server view init metrics

* Test coverage

* Rename metrics
2023-02-07 20:02:00 +05:30
Suneet Saldanha
bea18dc9e4
Update basic auth examples (#13750) 2023-02-03 14:45:48 -08:00
drudi-at-coffee
7580248770
Update api.md (#13727)
Added missing '/status' in HTTP status request
2023-02-02 10:43:22 -08:00
Victoria Lim
33efd5ab1d
docs: Refresh the update data tutorial (#13641)
Merging regardless of nit since topic is in better shape.

* refresh the update data tutorial

* Apply suggestions from code review

Co-authored-by: Jill Osborne <jill.osborne@imply.io>

---------

Co-authored-by: Jill Osborne <jill.osborne@imply.io>
2023-02-01 18:18:16 -08:00
Kashif Faraz
f629643c50
Fix value of lookup sync period in docs (#13695)
* Fix lookup docs

* Fix spelling

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-02-01 18:12:00 -08:00
Sergio Ferragut
7f830b20d7
fixed init commands for both mysql and postgresql (#13713) 2023-02-01 18:07:31 -08:00
Suneet Saldanha
cfc3115a59
Compaction history returns empty list instead of 404 when not found (#13730)
* Compaction history returns empty list instead of 404 when not found

* checkstyle
2023-02-01 17:44:07 -08:00
Tijo Thomas
1beef30bb2
Support postaggregation function as in Math.pow() (#13703) (#13704)
Support postaggregation function as in Math.pow()
2023-01-31 22:55:04 +05:30
Adarsh Sanjeev
51dfde0284
Add maxInputBytesPerWorker as query context parameter (#13707)
* Add maxInputBytesPerWorker as query context parameter

* Move documenation to msq specific docs

* Update tests

* Spacing

* Address review comments

* Fix test

* Update docs/multi-stage-query/reference.md

* Correct spelling mistake

---------

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
2023-01-31 20:55:28 +05:30
Jill Osborne
356b0e37cf
Tutorial: Query view (#13565)
* Tutorial: Query view

* Removed duplicate file

* Update tutorial-sql-query-view.md

* Update tutorial-sql-query-view.md

* Update tutorial-sql-query-view.md

* Updated after review

* Update docs/tutorials/tutorial-sql-query-view.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update tutorial-sql-query-view.md

Update title

* Update sidebars.json

fix merge conflict w/ sidebar

* address spelling ci

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-01-27 14:29:43 -08:00
sairam devarashetty
6164c420a1
Create update.md (#13451)
* Create update.md

Important Line highlighted

* Update docs/data-management/update.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-01-25 16:23:40 -08:00
317brian
9021161c8c
doc: fix markdown spacing (#13683)
* doc: fix markdown spacing

* fix spacing
2023-01-25 16:22:49 -08:00
Victoria Lim
00cee329bd
pitfall when using combining input source (#13639) 2023-01-25 12:50:19 -08:00
Suneet Saldanha
016c881795
Add API to return automatic compaction config history (#13699)
Add a new API to return the history of changes to automatic compaction config history to make it easy for users to see what changes have been made to their auto-compaction config.

The API is scoped per dataSource to allow users to triage issues with an individual dataSource. The API responds with a list of configs when there is a change to either the settings that impact all auto-compaction configs on a cluster or the dataSource in question.
2023-01-23 13:23:45 -08:00
Rohan Garg
f76acccff2
Allow using composed storage for SuperSorter intermediate data (#13368) 2023-01-24 01:02:03 +05:30
Eyal Yurman
44374f91bc
Fix broken links to Oracle JDK docs (#13687)
* Fix broken link for SSLContext java doc

* Update tls-support.md

* Update tls-support.md

* Update tls-support.md

* Update simple-client-sslcontext.md
2023-01-18 14:46:08 +05:30
Paul Rogers
22630b0aab
Much improved table functions (#13627)
Much improved table functions

* Revises properties, definitions in the catalog
* Adds a "table function" abstraction to model such functions
* Specific functions for HTTP, inline, local and S3.
* Extended SQL types in the catalog
* Restructure external table definitions to use table functions
* EXTEND syntax for Druid's extern table function
* Support for array-valued table function parameters
* Support for array-valued SQL query parameters
* Much new documentation
2023-01-17 08:41:57 -08:00
Gian Merlino
182c4fad29
Kinesis: More robust default fetch settings. (#13539)
* Kinesis: More robust default fetch settings.

1) Default recordsPerFetch and recordBufferSize based on available memory
   rather than using hardcoded numbers. For this, we need an estimate
   of record size. Use 10 KB for regular records and 1 MB for aggregated
   records. With 1 GB heaps, 2 processors per task, and nonaggregated
   records, recordBufferSize comes out to the same as the old
   default (10000), and recordsPerFetch comes out slightly lower (1250
   instead of 4000).

2) Default maxRecordsPerPoll based on whether records are aggregated
   or not (100 if not aggregated, 1 if aggregated). Prior default was 100.

3) Default fetchThreads based on processors divided by task count on
   Indexers, rather than overall processor count.

4) Additionally clean up the serialized JSON a bit by adding various
   JsonInclude annotations.

* Updates for tests.

* Additional important verify.
2023-01-13 11:03:54 +05:30
Vadim Ogievetsky
93dc01b6c5
fix broken table missing new line (#13666) 2023-01-12 15:29:51 -08:00
Vadim Ogievetsky
f97bcc69d3
Docs: reword single server page (#13659)
* reword single server page

* fix typo

* Update docs/operations/single-server.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* spelling

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-01-11 21:12:52 -08:00
Karan Kumar
56076d33fb
Worker retry for MSQ task (#13353)
* Initial commit.

* Fixing error message in retry exceeded exception

* Cleaning up some code

* Adding some test cases.

* Adding java docs.

* Finishing up state test cases.

* Adding some more java docs and fixing spot bugs, intellij inspections

* Fixing intellij inspections and added tests

* Documenting error codes

* Migrate current integration batch tests to equivalent MSQ tests (#13374)

* Migrate current integration batch tests to equivalent MSQ tests using new IT framework

* Fix build issues

* Trigger Build

* Adding more tests and addressing comments

* fixBuildIssues

* fix dependency issues

* Parameterized the test and addressed comments

* Addressing comments

* fixing checkstyle errors

* Adressing comments

* Adding ITTest which kills the worker abruptly

* Review comments phase one

* Adding doc changes

* Adjusting for single threaded execution.

* Adding Sequential Merge PR state handling

* Merge things

* Fixing checkstyle.

* Adding new context param for fault tolerance.
Adding stale task handling in sketchFetcher.
Adding UT's.

* Merge things

* Merge things

* Adding parameterized tests
Created separate module for faultToleranceTests

* Adding missed files

* Review comments and fixing tests.

* Documentation things.

* Fixing IT

* Controller impl fix.

* Fixing racy WorkerSketchFetcherTest.java exception handling.

Co-authored-by: abhagraw <99210446+abhagraw@users.noreply.github.com>
Co-authored-by: Karan Kumar <cryptoe@karans-mbp.lan>
2023-01-11 07:38:29 +05:30
Abhishek Agarwal
17936e2920
Add an option to enable HSTS in druid services (#13489)
* Add an option to enable HSTS

* Fix code and add docs

* Deduplicate headers

* unused import

* Fix spelling
2023-01-10 22:31:51 +05:30
Victoria Lim
a800dae87a
doc: List Protobuf as a supported format (#13640) 2023-01-06 15:09:37 -08:00
317brian
6bbf4266b2
docs: documentation for unnest datasource (#13479)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-01-06 11:41:11 -08:00
Kashif Faraz
0d97e658b2
Docs: Update quickstart instructions (#13611)
Changes:
- Remove specification of a Druid version in the quickstart, because the previous step
instructs downloading the latest version anyway.
- Mention usage of memory parameter in the quickstart
2022-12-22 11:51:08 +05:30
Vadim Ogievetsky
07597c687d
Docs: Remove large data file (#13595) 2022-12-19 13:14:22 +05:30
Gian Merlino
ee890965f4
LocalInputSource: Serialize File paths without forcing resolution. (#13534)
* LocalInputSource: Serialize File paths without forcing resolution.

Fixes #13359.

* Add one more javadoc.
2022-12-19 11:47:36 +05:30
Victoria Lim
09d8b16447
Document shouldFinalize for sketches that have the parameter (#13524)
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-12-17 10:48:06 -08:00
317brian
d9c27d6102
docs: add index page and related stuff for jupyter tutorials (#13342) 2022-12-16 13:33:50 -08:00
Gian Merlino
7f3c117e3a
SQL: Improve docs around casts. (#13466)
Main change: clarify that the "default value" for casts only applies if
druid.generic.useDefaultValueForNull = true.

Secondary change: adjust a bunch of wording from future to present tense.
2022-12-15 15:01:40 -08:00
Kashif Faraz
d6949b1b79
Track input processedBytes with MSQ ingestion (#13559)
Follow up to #13520

Bytes processed are currently tracked for intermediate stages in MSQ ingestion.
This patch adds the capability to track the bytes processed by an MSQ controller
task while reading from an external input source or a segment source.

Changes:
- Track `processedBytes` for every `InputSource` read in `ExternalInputSliceReader`
- Update `ChannelCounters` with the above obtained `processedBytes` when incrementing
the input file count.
- Update task report structure in docs

The total input processed bytes can be obtained by summing the `processedBytes` as follows:

totalBytes = 0
for every root stage (i.e. a stage which does not have another stage as an input):
    for every worker in that stage:
        for every input channel: (i.e. channels with prefix "input", e.g. "input0", "input1", etc.)
            totalBytes += processedBytes
2022-12-16 02:20:01 +05:30
Adarsh Sanjeev
2b605aa9cf
Multiple fixes for the MSQ stats merging piece which (#13463)
* Add validation checks to worker chat handler apis

* Merge things and polishing the error messages.

* Minor error message change

* Fixing race and adding some tests

* Fixing controller fetching stats from wrong workers.
Fixing race
Changing default mode to Parallel
Adding logging.
Fixing exceptions not propagated properly.

* Changing to kernel worker count

* Added a better logic to figure out assigned worker for a stage.

* Nits

* Moving to existing kernel methods

* Adding more coverage

Co-authored-by: cryptoe <karankumar1100@gmail.com>
2022-12-15 09:35:11 +05:30
Vadim Ogievetsky
2729e25295
Link to java docs (#13478)
* add link to page about selecting a JRE

* add link to script also

* simplify text
2022-12-14 11:45:23 -08:00
Gian Merlino
de5a4bafcb
Zero-copy local deep storage. (#13394)
* Zero-copy local deep storage.

This is useful for local deep storage, since it reduces disk usage and
makes Historicals able to load segments instantaneously.

Two changes:

1) Introduce "druid.storage.zip" parameter for local storage, which defaults
   to false. This changes default behavior from writing an index.zip to writing
   a regular directory. This is safe to do even during a rolling update, because
   the older code actually already handled unzipped directories being present
   on local deep storage.

2) In LocalDataSegmentPuller and LocalDataSegmentPusher, use hard links
   instead of copies when possible. (Generally this is possible when the
   source and destination directory are on the same filesystem.)
2022-12-12 17:28:24 -08:00
Rishabh Singh
4ebdfe226d
Druid automated quickstart (#13365)
* Druid automated quickstart

* remove conf/druid/single-server/quickstart/_common/historical/jvm.config

* Minor changes in python script

* Add lower bound memory for some services

* Additional runtime properties for services

* Update supervise script to accept command arguments, corresponding changes in druid-quickstart.py

* File end newline

* Limit the ability to start multiple instances of a service, documentation changes

* simplify script arguments

* restore changes in medium profile

* run-druid refactor

* compute and pass middle manager runtime properties to run-druid
supervise script changes to process java opts array
use argparse, leave free memory, logging

* Remove extra quotes from mm task javaopts array

* Update logic to compute minimum memory

* simplify run-druid

* remove debug options from run-druid

* resolve the config_path provided

* comment out service specific runtime properties which are computed in the code

* simplify run-druid

* clean up docs, naming changes

* Throw ValueError exception on illegal state

* update docs

* rename args, compute_only -> compute, run_zk -> zk

* update help documentation

* update help documentation

* move task memory computation into separate method

* Add validation checks

* remove print

* Add validations

* remove start-druid bash script, rename start-druid-main

* Include tasks in lower bound memory calculation

* Fix test

* 256m instead of 256g

* caffeine cache uses 5% of heap

* ensure min task count is 2, task count is monotonic

* update configs and documentation for runtime props in conf/druid/single-server/quickstart

* Update docs

* Specify memory argument for each profile in single-server.md

* Update middleManager runtime.properties

* Move quickstart configs to conf/druid/base, add bash launch script, support python2

* Update supervise script

* rename base config directory to auto

* rename python script, changes to pass repeated args to supervise

* remove exmaples/conf/druid/base dir

* add docs

* restore changes in conf dir

* update start-druid-auto

* remove hashref for commands in supervise script

* start-druid-main java_opts array is comma separated

* update entry point script name in python script

* Update help docs

* documentation changes

* docs changes

* update docs

* add support for running indexer

* update supported services list

* update help

* Update python.md

* remove dir

* update .spelling

* Remove dependency on psutil and pathlib

* update docs

* Update get_physical_memory method

* Update help docs

* update docs

* update method to get physical memory on python

* udpate spelling

* update .spelling

* minor change

* Minor change

* memory comptuation for indexer

* update start-druid

* Update python.md

* Update single-server.md

* Update python.md

* run python3 --version to check if python is installed

* Update supervise script

* start-druid: echo message if python not found

* update anchor text

* minor change

* Update condition in supervise script

* JVM not jvm in docs
2022-12-09 11:04:02 -08:00
Paul Rogers
013a12e86f
Enhanced MSQ table functions (#13360)
* Enhanced MSQ table functions
* HTTP, LOCALFILES and INLINE table functions powered by
catalog metadata.
* Documentation
2022-12-08 13:56:02 -08:00
Gian Merlino
91ef9872ec
MSQ: Improve TooManyBuckets error message, improve error docs. (#13525)
1) Edited the TooManyBuckets error message to mention PARTITIONED BY
   instead of segmentGranularity.

2) Added error-code-specific anchors in the docs.

3) Add information to various error codes in the docs about common
   causes and solutions.
2022-12-08 13:18:26 -08:00
Jill Osborne
b56855b837
Update to native ingestion doc (#13482)
* Update to native ingestion doc

* Update docs/ingestion/native-batch.md

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* Update native-batch.md

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2022-12-07 15:08:19 +05:30
Vadim Ogievetsky
9679f6a9b5
Web console: add arrayOfDoublesSketch and other small fixes (#13486)
* add padding and keywords

* add arrayOfDoubles

* Update docs/development/extensions-core/datasketches-tuple.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/development/extensions-core/datasketches-tuple.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/development/extensions-core/datasketches-tuple.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/development/extensions-core/datasketches-tuple.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/development/extensions-core/datasketches-tuple.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* partiton int

* fix docs

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-12-06 21:21:49 -08:00
Kashif Faraz
c7229fc787
Limit max batch size for segment allocation, add docs (#13503)
Changes:
- Limit max batch size in `SegmentAllocationQueue` to 500
- Rename `batchAllocationMaxWaitTime` to `batchAllocationWaitTime` since the actual
wait time may exceed this configured value.
- Replace usage of `SegmentInsertAction` in `TaskToolbox` with `SegmentTransactionalInsertAction`
2022-12-07 10:07:14 +05:30
Gian Merlino
fda0a1aadd
Set chatAsync default to true. (#13491)
This functionality was originally added in #13354.
2022-12-05 20:53:59 -08:00
Kashif Faraz
65945a686f
Docs: Update docs for coordinator dynamic config (#13494)
* Update docs for useBatchedSegmentSampler

* Update docs for round robin assigment
2022-12-05 16:53:10 +05:30
TSFenwick
10bec54acc
Switching emitter. This will allow for a per feed emitter designation. (#13363)
* Switching emitter. This will allow for a per feed emitter designation.

This will work by looking at an event's feed and direct it to a specific emitter. If no specific feed is specified for a feed.
The emitter can direct the event to a default emitter.

* fix checkstyle issues and make docs for switching emitter use basic event feeds

* fix broken docs, add test, and guard against misconfigurations

* add module test
add switching emitter module test

* fix broken SwitchingEmitterModuleTest

* add apache license to top of test

* fix checkstyle issues

* address comments by adding javadocs, removing a todo, and making druid docs more clear
2022-12-05 16:04:34 +05:30
Katya Macedo
78c1a2bd66
Remove limit from timeseries (#13457)
CI build failures seem unrelated to docs
2022-12-02 12:19:59 -08:00
Jill Osborne
138a6de507
Update nested columns docs (#13461)
* Update nested columns docs

(cherry picked from commit 04206c5179e0eb46a30d4113c7332daee46c390d)

* Update nested-columns.md

(cherry picked from commit 8085ee7217d90e0e3f133985a52ec2e0b0552992)
2022-12-01 10:47:32 -08:00
317brian
cc2e4a80ff
doc: add a basic JDBC tutorial (#13343)
* initial commit for jdbc tutorial

(cherry picked from commit 04c4adad71e5436b76c3425fe369df03aaaf0acb)

* add commentary

* address comments from charles

* add query context to example

* fix typo

* add links

* Apply suggestions from code review

Co-authored-by: Frank Chen <frankchen@apache.org>

* fix datatype

* address feedback

* add parameterize to spelling file. the past tense version was already there

Co-authored-by: Frank Chen <frankchen@apache.org>
2022-11-30 16:25:35 -08:00