druid

Commit Graph

Author	SHA1	Message	Date
zachjsh	665dee43bf	Revert "Operator conversion deny list (#13766 )" (#13829 ) This reverts commit `38e620aa4c`.	2023-02-21 15:14:49 -08:00
Paul Rogers	5dadbdf4d0	Generate the IT docker-compose.yaml files (#13669 ) Generate IT docker-compose.sh files Generates test-specific docker-compose.sh files using a simple Python template script.	2023-02-21 15:03:02 -08:00
benkrug	c6b1576fc1	Update clean-metadata-store.md (#13131 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-02-21 12:53:54 -08:00
Paul Rogers	85d36be085	Information schema now uses numeric column types (#13777 ) Change to use SQL schemas to allow null numeric columns * Updated docs	2023-02-17 14:39:31 -08:00
Katya Macedo	bc8b710b7e	Fix broken link (#13767 )	2023-02-17 09:02:12 -08:00
Churro	c1f283fd31	Better sidecar support (#13655 ) * Better sidecar support * remove un-thrown exception from test * Druid you are such a stickler about spelling :) * Only require the primaryContainerName, no need to exclude containers	2023-02-14 10:56:15 +05:30
Guy ☀️ Moore	306997be87	Add Perl 5 to druid requirements (#13708 ) Without perl 5 I was unable to start druid using the instructions in the quickstart guide. I'm not certain what versions it might require, but the one that I got working was perl 5 > This is perl 5, version 36, subversion 0 (v5.36.0) built for x86_64-linux-thread-multi	2023-02-13 13:34:49 -08:00
zachjsh	38e620aa4c	Operator conversion deny list (#13766 ) ### Description This change adds a new config property `druid.sql.planner.operatorConversion.denyList`, which allows a user to specify any operator conversions that they wish to disallow. A user may want to do this for a number of reasons, including security concerns. The default value of this property is the empty list `[]`, which does not disallow any operator conversions. An example usage of this property is `druid.sql.planner.operatorConversion.denyList=["extern"]`, which disallows the usage of the `extern` operator conversion. If the property is configured this way, and a user of the Druid cluster tries to submit a query that uses the `extern` function, such as the example given [here](https://druid.apache.org/docs/latest/multi-stage-query/examples.html#insert-with-no-rollup), a response with http response code `400` is returned with en error body similar to the following: ``` { "taskId": "4ec5b0b6-fa9b-4c3a-827d-2308294e9985", "state": "FAILED", "error": { "error": "Plan validation failed", "errorMessage": "org.apache.calcite.runtime.CalciteContextException: From line 28, column 5 to line 32, column 5: No match found for function signature EXTERN(<CHARACTER>, <CHARACTER>, <CHARACTER>)", "errorClass": "org.apache.calcite.tools.ValidationException", "host": null } } ```	2023-02-10 09:59:26 -08:00
Anshu Makkar	d7b95988d7	Add missing documentation for constant post-aggregator (#13664 ) Thanks @anshu-makkar , I was waiting for CI to complete yesterday. Failures seem unrelated, so merging.	2023-02-09 08:53:45 -08:00
Suneet Saldanha	714ac07b52	Allow users to add additional metadata to ingestion metrics (#13760 ) * Allow users to add additional metadata to ingestion metrics When submitting an ingestion spec, users may pass a map of metadata in the ingestion spec config that will be added to ingestion metrics. This will make it possible for operators to tag metrics with other metadata that doesn't necessarily line up with the existing tags like taskId. Druid clusters that ingest these metrics can take advantage of the nested data columns feature to process this additional metadata. * rename to tags * docs * tests * fix test * make code cov happy * checkstyle	2023-02-08 18:07:23 -08:00
AmatyaAvadhanula	0cf1fc3d55	Indexing on multiple disks (#13476 ) * Initial commit * Simple UTs * Parameterize tests * Parameterized tests for k8s task runner * Fix restore bug * Refactor TaskStorageDirTracker * Change CliPeon args	2023-02-08 11:31:34 +05:30
AmatyaAvadhanula	dcdae84888	Add server view initialization metrics (#13716 ) * Add server view init metrics * Test coverage * Rename metrics	2023-02-07 20:02:00 +05:30
Suneet Saldanha	bea18dc9e4	Update basic auth examples (#13750 )	2023-02-03 14:45:48 -08:00
drudi-at-coffee	7580248770	Update api.md (#13727 ) Added missing '/status' in HTTP status request	2023-02-02 10:43:22 -08:00
Victoria Lim	33efd5ab1d	docs: Refresh the update data tutorial (#13641 ) Merging regardless of nit since topic is in better shape. * refresh the update data tutorial * Apply suggestions from code review Co-authored-by: Jill Osborne <jill.osborne@imply.io> --------- Co-authored-by: Jill Osborne <jill.osborne@imply.io>	2023-02-01 18:18:16 -08:00
Kashif Faraz	f629643c50	Fix value of lookup sync period in docs (#13695 ) * Fix lookup docs * Fix spelling * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-02-01 18:12:00 -08:00
Sergio Ferragut	7f830b20d7	fixed init commands for both mysql and postgresql (#13713 )	2023-02-01 18:07:31 -08:00
Suneet Saldanha	cfc3115a59	Compaction history returns empty list instead of 404 when not found (#13730 ) * Compaction history returns empty list instead of 404 when not found * checkstyle	2023-02-01 17:44:07 -08:00
Tijo Thomas	1beef30bb2	Support postaggregation function as in Math.pow() (#13703 ) (#13704 ) Support postaggregation function as in Math.pow()	2023-01-31 22:55:04 +05:30
Adarsh Sanjeev	51dfde0284	Add maxInputBytesPerWorker as query context parameter (#13707 ) * Add maxInputBytesPerWorker as query context parameter * Move documenation to msq specific docs * Update tests * Spacing * Address review comments * Fix test * Update docs/multi-stage-query/reference.md * Correct spelling mistake --------- Co-authored-by: Karan Kumar <karankumar1100@gmail.com>	2023-01-31 20:55:28 +05:30
Jill Osborne	356b0e37cf	Tutorial: Query view (#13565 ) * Tutorial: Query view * Removed duplicate file * Update tutorial-sql-query-view.md * Update tutorial-sql-query-view.md * Update tutorial-sql-query-view.md * Updated after review * Update docs/tutorials/tutorial-sql-query-view.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update tutorial-sql-query-view.md Update title * Update sidebars.json fix merge conflict w/ sidebar * address spelling ci --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-01-27 14:29:43 -08:00
sairam devarashetty	6164c420a1	Create update.md (#13451 ) * Create update.md Important Line highlighted * Update docs/data-management/update.md Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-01-25 16:23:40 -08:00
317brian	9021161c8c	doc: fix markdown spacing (#13683 ) * doc: fix markdown spacing * fix spacing	2023-01-25 16:22:49 -08:00
Victoria Lim	00cee329bd	pitfall when using combining input source (#13639 )	2023-01-25 12:50:19 -08:00
Suneet Saldanha	016c881795	Add API to return automatic compaction config history (#13699 ) Add a new API to return the history of changes to automatic compaction config history to make it easy for users to see what changes have been made to their auto-compaction config. The API is scoped per dataSource to allow users to triage issues with an individual dataSource. The API responds with a list of configs when there is a change to either the settings that impact all auto-compaction configs on a cluster or the dataSource in question.	2023-01-23 13:23:45 -08:00
Rohan Garg	f76acccff2	Allow using composed storage for SuperSorter intermediate data (#13368 )	2023-01-24 01:02:03 +05:30
Eyal Yurman	44374f91bc	Fix broken links to Oracle JDK docs (#13687 ) * Fix broken link for SSLContext java doc * Update tls-support.md * Update tls-support.md * Update tls-support.md * Update simple-client-sslcontext.md	2023-01-18 14:46:08 +05:30
Paul Rogers	22630b0aab	Much improved table functions (#13627 ) Much improved table functions * Revises properties, definitions in the catalog * Adds a "table function" abstraction to model such functions * Specific functions for HTTP, inline, local and S3. * Extended SQL types in the catalog * Restructure external table definitions to use table functions * EXTEND syntax for Druid's extern table function * Support for array-valued table function parameters * Support for array-valued SQL query parameters * Much new documentation	2023-01-17 08:41:57 -08:00
Gian Merlino	182c4fad29	Kinesis: More robust default fetch settings. (#13539 ) * Kinesis: More robust default fetch settings. 1) Default recordsPerFetch and recordBufferSize based on available memory rather than using hardcoded numbers. For this, we need an estimate of record size. Use 10 KB for regular records and 1 MB for aggregated records. With 1 GB heaps, 2 processors per task, and nonaggregated records, recordBufferSize comes out to the same as the old default (10000), and recordsPerFetch comes out slightly lower (1250 instead of 4000). 2) Default maxRecordsPerPoll based on whether records are aggregated or not (100 if not aggregated, 1 if aggregated). Prior default was 100. 3) Default fetchThreads based on processors divided by task count on Indexers, rather than overall processor count. 4) Additionally clean up the serialized JSON a bit by adding various JsonInclude annotations. * Updates for tests. * Additional important verify.	2023-01-13 11:03:54 +05:30
Vadim Ogievetsky	93dc01b6c5	fix broken table missing new line (#13666 )	2023-01-12 15:29:51 -08:00
Vadim Ogievetsky	f97bcc69d3	Docs: reword single server page (#13659 ) * reword single server page * fix typo * Update docs/operations/single-server.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * spelling Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-01-11 21:12:52 -08:00
Karan Kumar	56076d33fb	Worker retry for MSQ task (#13353 ) * Initial commit. * Fixing error message in retry exceeded exception * Cleaning up some code * Adding some test cases. * Adding java docs. * Finishing up state test cases. * Adding some more java docs and fixing spot bugs, intellij inspections * Fixing intellij inspections and added tests * Documenting error codes * Migrate current integration batch tests to equivalent MSQ tests (#13374) * Migrate current integration batch tests to equivalent MSQ tests using new IT framework * Fix build issues * Trigger Build * Adding more tests and addressing comments * fixBuildIssues * fix dependency issues * Parameterized the test and addressed comments * Addressing comments * fixing checkstyle errors * Adressing comments * Adding ITTest which kills the worker abruptly * Review comments phase one * Adding doc changes * Adjusting for single threaded execution. * Adding Sequential Merge PR state handling * Merge things * Fixing checkstyle. * Adding new context param for fault tolerance. Adding stale task handling in sketchFetcher. Adding UT's. * Merge things * Merge things * Adding parameterized tests Created separate module for faultToleranceTests * Adding missed files * Review comments and fixing tests. * Documentation things. * Fixing IT * Controller impl fix. * Fixing racy WorkerSketchFetcherTest.java exception handling. Co-authored-by: abhagraw <99210446+abhagraw@users.noreply.github.com> Co-authored-by: Karan Kumar <cryptoe@karans-mbp.lan>	2023-01-11 07:38:29 +05:30
Abhishek Agarwal	17936e2920	Add an option to enable HSTS in druid services (#13489 ) * Add an option to enable HSTS * Fix code and add docs * Deduplicate headers * unused import * Fix spelling	2023-01-10 22:31:51 +05:30
Victoria Lim	a800dae87a	doc: List Protobuf as a supported format (#13640 )	2023-01-06 15:09:37 -08:00
317brian	6bbf4266b2	docs: documentation for unnest datasource (#13479 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-01-06 11:41:11 -08:00
Kashif Faraz	0d97e658b2	Docs: Update quickstart instructions (#13611 ) Changes: - Remove specification of a Druid version in the quickstart, because the previous step instructs downloading the latest version anyway. - Mention usage of memory parameter in the quickstart	2022-12-22 11:51:08 +05:30
Vadim Ogievetsky	07597c687d	Docs: Remove large data file (#13595 )	2022-12-19 13:14:22 +05:30
Gian Merlino	ee890965f4	LocalInputSource: Serialize File paths without forcing resolution. (#13534 ) * LocalInputSource: Serialize File paths without forcing resolution. Fixes #13359. * Add one more javadoc.	2022-12-19 11:47:36 +05:30
Victoria Lim	09d8b16447	Document shouldFinalize for sketches that have the parameter (#13524 ) Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-12-17 10:48:06 -08:00
317brian	d9c27d6102	docs: add index page and related stuff for jupyter tutorials (#13342 )	2022-12-16 13:33:50 -08:00
Gian Merlino	7f3c117e3a	SQL: Improve docs around casts. (#13466 ) Main change: clarify that the "default value" for casts only applies if druid.generic.useDefaultValueForNull = true. Secondary change: adjust a bunch of wording from future to present tense.	2022-12-15 15:01:40 -08:00
Kashif Faraz	d6949b1b79	Track input processedBytes with MSQ ingestion (#13559 ) Follow up to #13520 Bytes processed are currently tracked for intermediate stages in MSQ ingestion. This patch adds the capability to track the bytes processed by an MSQ controller task while reading from an external input source or a segment source. Changes: - Track `processedBytes` for every `InputSource` read in `ExternalInputSliceReader` - Update `ChannelCounters` with the above obtained `processedBytes` when incrementing the input file count. - Update task report structure in docs The total input processed bytes can be obtained by summing the `processedBytes` as follows: totalBytes = 0 for every root stage (i.e. a stage which does not have another stage as an input): for every worker in that stage: for every input channel: (i.e. channels with prefix "input", e.g. "input0", "input1", etc.) totalBytes += processedBytes	2022-12-16 02:20:01 +05:30
Adarsh Sanjeev	2b605aa9cf	Multiple fixes for the MSQ stats merging piece which (#13463 ) * Add validation checks to worker chat handler apis * Merge things and polishing the error messages. * Minor error message change * Fixing race and adding some tests * Fixing controller fetching stats from wrong workers. Fixing race Changing default mode to Parallel Adding logging. Fixing exceptions not propagated properly. * Changing to kernel worker count * Added a better logic to figure out assigned worker for a stage. * Nits * Moving to existing kernel methods * Adding more coverage Co-authored-by: cryptoe <karankumar1100@gmail.com>	2022-12-15 09:35:11 +05:30
Vadim Ogievetsky	2729e25295	Link to java docs (#13478 ) * add link to page about selecting a JRE * add link to script also * simplify text	2022-12-14 11:45:23 -08:00
Gian Merlino	de5a4bafcb	Zero-copy local deep storage. (#13394 ) * Zero-copy local deep storage. This is useful for local deep storage, since it reduces disk usage and makes Historicals able to load segments instantaneously. Two changes: 1) Introduce "druid.storage.zip" parameter for local storage, which defaults to false. This changes default behavior from writing an index.zip to writing a regular directory. This is safe to do even during a rolling update, because the older code actually already handled unzipped directories being present on local deep storage. 2) In LocalDataSegmentPuller and LocalDataSegmentPusher, use hard links instead of copies when possible. (Generally this is possible when the source and destination directory are on the same filesystem.)	2022-12-12 17:28:24 -08:00
Rishabh Singh	4ebdfe226d	Druid automated quickstart (#13365 ) * Druid automated quickstart * remove conf/druid/single-server/quickstart/_common/historical/jvm.config * Minor changes in python script * Add lower bound memory for some services * Additional runtime properties for services * Update supervise script to accept command arguments, corresponding changes in druid-quickstart.py * File end newline * Limit the ability to start multiple instances of a service, documentation changes * simplify script arguments * restore changes in medium profile * run-druid refactor * compute and pass middle manager runtime properties to run-druid supervise script changes to process java opts array use argparse, leave free memory, logging * Remove extra quotes from mm task javaopts array * Update logic to compute minimum memory * simplify run-druid * remove debug options from run-druid * resolve the config_path provided * comment out service specific runtime properties which are computed in the code * simplify run-druid * clean up docs, naming changes * Throw ValueError exception on illegal state * update docs * rename args, compute_only -> compute, run_zk -> zk * update help documentation * update help documentation * move task memory computation into separate method * Add validation checks * remove print * Add validations * remove start-druid bash script, rename start-druid-main * Include tasks in lower bound memory calculation * Fix test * 256m instead of 256g * caffeine cache uses 5% of heap * ensure min task count is 2, task count is monotonic * update configs and documentation for runtime props in conf/druid/single-server/quickstart * Update docs * Specify memory argument for each profile in single-server.md * Update middleManager runtime.properties * Move quickstart configs to conf/druid/base, add bash launch script, support python2 * Update supervise script * rename base config directory to auto * rename python script, changes to pass repeated args to supervise * remove exmaples/conf/druid/base dir * add docs * restore changes in conf dir * update start-druid-auto * remove hashref for commands in supervise script * start-druid-main java_opts array is comma separated * update entry point script name in python script * Update help docs * documentation changes * docs changes * update docs * add support for running indexer * update supported services list * update help * Update python.md * remove dir * update .spelling * Remove dependency on psutil and pathlib * update docs * Update get_physical_memory method * Update help docs * update docs * update method to get physical memory on python * udpate spelling * update .spelling * minor change * Minor change * memory comptuation for indexer * update start-druid * Update python.md * Update single-server.md * Update python.md * run python3 --version to check if python is installed * Update supervise script * start-druid: echo message if python not found * update anchor text * minor change * Update condition in supervise script * JVM not jvm in docs	2022-12-09 11:04:02 -08:00
Paul Rogers	013a12e86f	Enhanced MSQ table functions (#13360 ) * Enhanced MSQ table functions * HTTP, LOCALFILES and INLINE table functions powered by catalog metadata. * Documentation	2022-12-08 13:56:02 -08:00
Gian Merlino	91ef9872ec	MSQ: Improve TooManyBuckets error message, improve error docs. (#13525 ) 1) Edited the TooManyBuckets error message to mention PARTITIONED BY instead of segmentGranularity. 2) Added error-code-specific anchors in the docs. 3) Add information to various error codes in the docs about common causes and solutions.	2022-12-08 13:18:26 -08:00
Jill Osborne	b56855b837	Update to native ingestion doc (#13482 ) * Update to native ingestion doc * Update docs/ingestion/native-batch.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update native-batch.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-12-07 15:08:19 +05:30
Vadim Ogievetsky	9679f6a9b5	Web console: add arrayOfDoublesSketch and other small fixes (#13486 ) * add padding and keywords * add arrayOfDoubles * Update docs/development/extensions-core/datasketches-tuple.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/development/extensions-core/datasketches-tuple.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/development/extensions-core/datasketches-tuple.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/development/extensions-core/datasketches-tuple.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/development/extensions-core/datasketches-tuple.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * partiton int * fix docs Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-12-06 21:21:49 -08:00
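
Note on commit d6949b1b79 (Track input processedBytes with MSQ ingestion): the summation described in that commit message can be sketched roughly as follows. This is only an illustration. The report layout used here (`stages`, `inputStages`, `workers`, and the per-channel dictionaries) is a hypothetical, simplified structure invented for the example and is not the actual Druid task report schema. Only the `processedBytes` field name and the "input" channel prefix come from the commit message.

```python
from typing import Any, Dict, List


def total_processed_bytes(stages: List[Dict[str, Any]]) -> int:
    """Sum processedBytes over the input channels of every root stage."""
    total = 0
    for stage in stages:
        # A root stage does not have another stage as an input.
        if stage.get("inputStages"):
            continue
        for channels in stage["workers"].values():
            for name, counters in channels.items():
                # Only channels with the prefix "input" count ("input0", "input1", ...).
                if name.startswith("input"):
                    total += counters.get("processedBytes", 0)
    return total


# Hypothetical, simplified report (assumed layout, not the real Druid schema).
example_stages = [
    {
        "stageNumber": 0,
        "inputStages": [],  # root stage: reads from an external or segment source
        "workers": {
            "0": {"input0": {"processedBytes": 100}, "output": {"processedBytes": 40}},
            "1": {"input0": {"processedBytes": 250}, "input1": {"processedBytes": 50}},
        },
    },
    {
        "stageNumber": 1,
        "inputStages": [0],  # not a root stage, so it is skipped
        "workers": {"0": {"input0": {"processedBytes": 40}}},
    },
]

print(total_processed_bytes(example_stages))
```

Skipping non-root stages avoids double counting: their input channels carry data already read once by an upstream stage.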

2733 Commits