Commit Graph

2738 Commits

Author SHA1 Message Date
Win Min Soe 70f9052f1d
docs: update correct config base on server spec (#13832)
Co-authored-by: Winn Minn <winn.minn@grabtaxi.com>
2023-02-23 08:50:47 -08:00
Abhishek Radhakrishnan 17a3cd0b68
Remove the additional backtick that's causing a SA issue. (#13838) 2023-02-23 09:01:08 +05:30
benkrug 66034dd8bc
Update default for finalize in query-context.md (#13763)
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
2023-02-22 12:35:36 -08:00
Katya Macedo 1595653e6f
docs: add a link for the Druid SQL tutorial (#13468)
* docs: add juptyer API tutorial for API and jupyter tutorial index (#3)

(cherry picked from commit aeb8d9e3390fa26d9c533dce0862295b80c58583)

* update prereqs and fix jupyterlab name

* Removing notebook since 13345 has it

13345 should be merged first

* update contributing instructions

* docs: link to the  Druid SQL tutorial

* Add link to partitioning

* fix merge conflict

* Saving

* Update docs/tutorials/tutorial-jupyter-index.md

* Remove partitioning

---------

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Co-authored-by: brian.le <brian.le@imply.io>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-02-22 09:36:13 -08:00
317brian 07883e311e
doc: fix unnecessary link (#13785)
CI errors look unrelated to this change.
2023-02-21 17:34:46 -08:00
zachjsh 665dee43bf
Revert "Operator conversion deny list (#13766)" (#13829)
This reverts commit 38e620aa4c.
2023-02-21 15:14:49 -08:00
Paul Rogers 5dadbdf4d0
Generate the IT docker-compose.yaml files (#13669)
Generate IT docker-compose.sh files

Generates test-specific docker-compose.sh files using a simple
Python template script.
2023-02-21 15:03:02 -08:00
benkrug c6b1576fc1
Update clean-metadata-store.md (#13131)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-02-21 12:53:54 -08:00
Paul Rogers 85d36be085
Information schema now uses numeric column types (#13777)
Change to use SQL schemas to allow null numeric columns

* Updated docs
2023-02-17 14:39:31 -08:00
Katya Macedo bc8b710b7e
Fix broken link (#13767) 2023-02-17 09:02:12 -08:00
Churro c1f283fd31
Better sidecar support (#13655)
* Better sidecar support

* remove un-thrown exception from test

* Druid you are such a stickler about spelling :)

* Only require the primaryContainerName, no need to exclude containers
2023-02-14 10:56:15 +05:30
Guy ☀️ Moore 306997be87
Add Perl 5 to druid requirements (#13708)
Without perl 5 I was unable to start druid using the instructions in the quickstart guide. I'm not certain what versions it might require, but the one that I got working was perl 5

> This is perl 5, version 36, subversion 0 (v5.36.0) built for x86_64-linux-thread-multi
2023-02-13 13:34:49 -08:00
zachjsh 38e620aa4c
Operator conversion deny list (#13766)
### Description

This change adds a new config property `druid.sql.planner.operatorConversion.denyList`, which allows a user to specify
any operator conversions that they wish to disallow. A user may want to do this for a number of reasons, including security concerns. The default value of this property is the empty list `[]`, which does not disallow any operator conversions.

An example usage of this property is `druid.sql.planner.operatorConversion.denyList=["extern"]`, which disallows the usage of the `extern` operator conversion. If the property is configured this way, and a user of the Druid cluster tries to submit a query that uses the `extern` function, such as the example given [here](https://druid.apache.org/docs/latest/multi-stage-query/examples.html#insert-with-no-rollup), a response with http response code `400` is returned with en error body similar to the following:

```
{
  "taskId": "4ec5b0b6-fa9b-4c3a-827d-2308294e9985",
  "state": "FAILED",
  "error": {
    "error": "Plan validation failed",
    "errorMessage": "org.apache.calcite.runtime.CalciteContextException: From line 28, column 5 to line 32, column 5: No match found for function signature EXTERN(<CHARACTER>, <CHARACTER>, <CHARACTER>)",
    "errorClass": "org.apache.calcite.tools.ValidationException",
    "host": null
  }
}
```
2023-02-10 09:59:26 -08:00
Anshu Makkar d7b95988d7
Add missing documentation for constant post-aggregator (#13664)
Thanks @anshu-makkar , I was waiting for CI to complete yesterday. Failures seem unrelated, so merging.
2023-02-09 08:53:45 -08:00
Suneet Saldanha 714ac07b52
Allow users to add additional metadata to ingestion metrics (#13760)
* Allow users to add additional metadata to ingestion metrics

When submitting an ingestion spec, users may pass a map of metadata
in the ingestion spec config that will be added to ingestion metrics.

This will make it possible for operators to tag metrics with other
metadata that doesn't necessarily line up with the existing tags
like taskId.

Druid clusters that ingest these metrics can take advantage of the
nested data columns feature to process this additional metadata.

* rename to tags

* docs

* tests

* fix test

* make code cov happy

* checkstyle
2023-02-08 18:07:23 -08:00
AmatyaAvadhanula 0cf1fc3d55
Indexing on multiple disks (#13476)
* Initial commit

* Simple UTs

* Parameterize tests

* Parameterized tests for k8s task runner

* Fix restore bug

* Refactor TaskStorageDirTracker

* Change CliPeon args
2023-02-08 11:31:34 +05:30
AmatyaAvadhanula dcdae84888
Add server view initialization metrics (#13716)
* Add server view init metrics

* Test coverage

* Rename metrics
2023-02-07 20:02:00 +05:30
Suneet Saldanha bea18dc9e4
Update basic auth examples (#13750) 2023-02-03 14:45:48 -08:00
drudi-at-coffee 7580248770
Update api.md (#13727)
Added missing '/status' in HTTP status request
2023-02-02 10:43:22 -08:00
Victoria Lim 33efd5ab1d
docs: Refresh the update data tutorial (#13641)
Merging regardless of nit since topic is in better shape.

* refresh the update data tutorial

* Apply suggestions from code review

Co-authored-by: Jill Osborne <jill.osborne@imply.io>

---------

Co-authored-by: Jill Osborne <jill.osborne@imply.io>
2023-02-01 18:18:16 -08:00
Kashif Faraz f629643c50
Fix value of lookup sync period in docs (#13695)
* Fix lookup docs

* Fix spelling

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-02-01 18:12:00 -08:00
Sergio Ferragut 7f830b20d7
fixed init commands for both mysql and postgresql (#13713) 2023-02-01 18:07:31 -08:00
Suneet Saldanha cfc3115a59
Compaction history returns empty list instead of 404 when not found (#13730)
* Compaction history returns empty list instead of 404 when not found

* checkstyle
2023-02-01 17:44:07 -08:00
Tijo Thomas 1beef30bb2
Support postaggregation function as in Math.pow() (#13703) (#13704)
Support postaggregation function as in Math.pow()
2023-01-31 22:55:04 +05:30
Adarsh Sanjeev 51dfde0284
Add maxInputBytesPerWorker as query context parameter (#13707)
* Add maxInputBytesPerWorker as query context parameter

* Move documenation to msq specific docs

* Update tests

* Spacing

* Address review comments

* Fix test

* Update docs/multi-stage-query/reference.md

* Correct spelling mistake

---------

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
2023-01-31 20:55:28 +05:30
Jill Osborne 356b0e37cf
Tutorial: Query view (#13565)
* Tutorial: Query view

* Removed duplicate file

* Update tutorial-sql-query-view.md

* Update tutorial-sql-query-view.md

* Update tutorial-sql-query-view.md

* Updated after review

* Update docs/tutorials/tutorial-sql-query-view.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update tutorial-sql-query-view.md

Update title

* Update sidebars.json

fix merge conflict w/ sidebar

* address spelling ci

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-01-27 14:29:43 -08:00
sairam devarashetty 6164c420a1
Create update.md (#13451)
* Create update.md

Important Line highlighted

* Update docs/data-management/update.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-01-25 16:23:40 -08:00
317brian 9021161c8c
doc: fix markdown spacing (#13683)
* doc: fix markdown spacing

* fix spacing
2023-01-25 16:22:49 -08:00
Victoria Lim 00cee329bd
pitfall when using combining input source (#13639) 2023-01-25 12:50:19 -08:00
Suneet Saldanha 016c881795
Add API to return automatic compaction config history (#13699)
Add a new API to return the history of changes to automatic compaction config history to make it easy for users to see what changes have been made to their auto-compaction config.

The API is scoped per dataSource to allow users to triage issues with an individual dataSource. The API responds with a list of configs when there is a change to either the settings that impact all auto-compaction configs on a cluster or the dataSource in question.
2023-01-23 13:23:45 -08:00
Rohan Garg f76acccff2
Allow using composed storage for SuperSorter intermediate data (#13368) 2023-01-24 01:02:03 +05:30
Eyal Yurman 44374f91bc
Fix broken links to Oracle JDK docs (#13687)
* Fix broken link for SSLContext java doc

* Update tls-support.md

* Update tls-support.md

* Update tls-support.md

* Update simple-client-sslcontext.md
2023-01-18 14:46:08 +05:30
Paul Rogers 22630b0aab
Much improved table functions (#13627)
Much improved table functions

* Revises properties, definitions in the catalog
* Adds a "table function" abstraction to model such functions
* Specific functions for HTTP, inline, local and S3.
* Extended SQL types in the catalog
* Restructure external table definitions to use table functions
* EXTEND syntax for Druid's extern table function
* Support for array-valued table function parameters
* Support for array-valued SQL query parameters
* Much new documentation
2023-01-17 08:41:57 -08:00
Gian Merlino 182c4fad29
Kinesis: More robust default fetch settings. (#13539)
* Kinesis: More robust default fetch settings.

1) Default recordsPerFetch and recordBufferSize based on available memory
   rather than using hardcoded numbers. For this, we need an estimate
   of record size. Use 10 KB for regular records and 1 MB for aggregated
   records. With 1 GB heaps, 2 processors per task, and nonaggregated
   records, recordBufferSize comes out to the same as the old
   default (10000), and recordsPerFetch comes out slightly lower (1250
   instead of 4000).

2) Default maxRecordsPerPoll based on whether records are aggregated
   or not (100 if not aggregated, 1 if aggregated). Prior default was 100.

3) Default fetchThreads based on processors divided by task count on
   Indexers, rather than overall processor count.

4) Additionally clean up the serialized JSON a bit by adding various
   JsonInclude annotations.

* Updates for tests.

* Additional important verify.
2023-01-13 11:03:54 +05:30
Vadim Ogievetsky 93dc01b6c5
fix broken table missing new line (#13666) 2023-01-12 15:29:51 -08:00
Vadim Ogievetsky f97bcc69d3
Docs: reword single server page (#13659)
* reword single server page

* fix typo

* Update docs/operations/single-server.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* spelling

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-01-11 21:12:52 -08:00
Karan Kumar 56076d33fb
Worker retry for MSQ task (#13353)
* Initial commit.

* Fixing error message in retry exceeded exception

* Cleaning up some code

* Adding some test cases.

* Adding java docs.

* Finishing up state test cases.

* Adding some more java docs and fixing spot bugs, intellij inspections

* Fixing intellij inspections and added tests

* Documenting error codes

* Migrate current integration batch tests to equivalent MSQ tests (#13374)

* Migrate current integration batch tests to equivalent MSQ tests using new IT framework

* Fix build issues

* Trigger Build

* Adding more tests and addressing comments

* fixBuildIssues

* fix dependency issues

* Parameterized the test and addressed comments

* Addressing comments

* fixing checkstyle errors

* Adressing comments

* Adding ITTest which kills the worker abruptly

* Review comments phase one

* Adding doc changes

* Adjusting for single threaded execution.

* Adding Sequential Merge PR state handling

* Merge things

* Fixing checkstyle.

* Adding new context param for fault tolerance.
Adding stale task handling in sketchFetcher.
Adding UT's.

* Merge things

* Merge things

* Adding parameterized tests
Created separate module for faultToleranceTests

* Adding missed files

* Review comments and fixing tests.

* Documentation things.

* Fixing IT

* Controller impl fix.

* Fixing racy WorkerSketchFetcherTest.java exception handling.

Co-authored-by: abhagraw <99210446+abhagraw@users.noreply.github.com>
Co-authored-by: Karan Kumar <cryptoe@karans-mbp.lan>
2023-01-11 07:38:29 +05:30
Abhishek Agarwal 17936e2920
Add an option to enable HSTS in druid services (#13489)
* Add an option to enable HSTS

* Fix code and add docs

* Deduplicate headers

* unused import

* Fix spelling
2023-01-10 22:31:51 +05:30
Victoria Lim a800dae87a
doc: List Protobuf as a supported format (#13640) 2023-01-06 15:09:37 -08:00
317brian 6bbf4266b2
docs: documentation for unnest datasource (#13479)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-01-06 11:41:11 -08:00
Kashif Faraz 0d97e658b2
Docs: Update quickstart instructions (#13611)
Changes:
- Remove specification of a Druid version in the quickstart, because the previous step
instructs downloading the latest version anyway.
- Mention usage of memory parameter in the quickstart
2022-12-22 11:51:08 +05:30
Vadim Ogievetsky 07597c687d
Docs: Remove large data file (#13595) 2022-12-19 13:14:22 +05:30
Gian Merlino ee890965f4
LocalInputSource: Serialize File paths without forcing resolution. (#13534)
* LocalInputSource: Serialize File paths without forcing resolution.

Fixes #13359.

* Add one more javadoc.
2022-12-19 11:47:36 +05:30
Victoria Lim 09d8b16447
Document shouldFinalize for sketches that have the parameter (#13524)
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2022-12-17 10:48:06 -08:00
317brian d9c27d6102
docs: add index page and related stuff for jupyter tutorials (#13342) 2022-12-16 13:33:50 -08:00
Gian Merlino 7f3c117e3a
SQL: Improve docs around casts. (#13466)
Main change: clarify that the "default value" for casts only applies if
druid.generic.useDefaultValueForNull = true.

Secondary change: adjust a bunch of wording from future to present tense.
2022-12-15 15:01:40 -08:00
Kashif Faraz d6949b1b79
Track input processedBytes with MSQ ingestion (#13559)
Follow up to #13520

Bytes processed are currently tracked for intermediate stages in MSQ ingestion.
This patch adds the capability to track the bytes processed by an MSQ controller
task while reading from an external input source or a segment source.

Changes:
- Track `processedBytes` for every `InputSource` read in `ExternalInputSliceReader`
- Update `ChannelCounters` with the above obtained `processedBytes` when incrementing
the input file count.
- Update task report structure in docs

The total input processed bytes can be obtained by summing the `processedBytes` as follows:

totalBytes = 0
for every root stage (i.e. a stage which does not have another stage as an input):
    for every worker in that stage:
        for every input channel: (i.e. channels with prefix "input", e.g. "input0", "input1", etc.)
            totalBytes += processedBytes
2022-12-16 02:20:01 +05:30
Adarsh Sanjeev 2b605aa9cf
Multiple fixes for the MSQ stats merging piece which (#13463)
* Add validation checks to worker chat handler apis

* Merge things and polishing the error messages.

* Minor error message change

* Fixing race and adding some tests

* Fixing controller fetching stats from wrong workers.
Fixing race
Changing default mode to Parallel
Adding logging.
Fixing exceptions not propagated properly.

* Changing to kernel worker count

* Added a better logic to figure out assigned worker for a stage.

* Nits

* Moving to existing kernel methods

* Adding more coverage

Co-authored-by: cryptoe <karankumar1100@gmail.com>
2022-12-15 09:35:11 +05:30
Vadim Ogievetsky 2729e25295
Link to java docs (#13478)
* add link to page about selecting a JRE

* add link to script also

* simplify text
2022-12-14 11:45:23 -08:00
Gian Merlino de5a4bafcb
Zero-copy local deep storage. (#13394)
* Zero-copy local deep storage.

This is useful for local deep storage, since it reduces disk usage and
makes Historicals able to load segments instantaneously.

Two changes:

1) Introduce "druid.storage.zip" parameter for local storage, which defaults
   to false. This changes default behavior from writing an index.zip to writing
   a regular directory. This is safe to do even during a rolling update, because
   the older code actually already handled unzipped directories being present
   on local deep storage.

2) In LocalDataSegmentPuller and LocalDataSegmentPusher, use hard links
   instead of copies when possible. (Generally this is possible when the
   source and destination directory are on the same filesystem.)
2022-12-12 17:28:24 -08:00