druid

Commit Graph

Author	SHA1	Message	Date
Abhishek Radhakrishnan	7400ed3c93	Fixup data deletion tutorial docs (#14283 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-05-17 17:05:35 -07:00
317brian	ceda1e98b9	docs: add docs for schema auto-discovery (#14065 ) * wip schemaless * wip * more cleanup * update tuningconfig example * updates based on feedback from clint * remove errant comma * update dimension object to include auto * update to include string schemaless way * fix spelling errors * updates for type-aware and string-based changes * Update docs/ingestion/schema-design.md * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * update spelling file * Update docs/ingestion/schema-design.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * copyedits * fix anchor --------- Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2023-05-17 01:36:02 -07:00
Adarsh Sanjeev	e8ef31fe92	Fix condition for timeout in worker task launcher (#14270 ) * Fix condition for timeout in worker task launcher	2023-05-16 08:30:00 +05:30
Victoria Lim	66d4ea014c	Docs: Tutorial for streaming ingestion using Kafka + Docker file to use with Jupyter tutorials (#13984 )	2023-05-15 15:20:52 -07:00
Peter Marshall	c4aa98953b	202304-docs-removeDF (#14132 )	2023-05-15 15:08:57 -07:00
imply-cheddar	f9861808bc	Be able to load segments on Peons (#14239 ) * Be able to load segments on Peons This change introduces a new config on WorkerConfig that indicates how many bytes of each storage location to use for storage of a task. Said config is divided up amongst the locations and slots and then used to set TaskConfig.tmpStorageBytesPerTask The Peons use their local task dir and tmpStorageBytesPerTask as their StorageLocations for the SegmentManager such that they can accept broadcast segments.	2023-05-12 16:51:00 -07:00
317brian	8bda7297e1	doc: fix unnest datasource syntax (#14272 )	2023-05-12 13:05:27 -07:00
317brian	6254658f61	docs: fix links (#14111 )	2023-05-12 09:59:16 -07:00
Kashif Faraz	47a70d03e8	Docs: Minor rephrase in indexing-service.md (#14231 ) * Fix language in indexing-service * Apply suggestions from code review Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-05-12 08:22:02 +05:30
317brian	cc37987dff	docs: copyedits for MSQ join algos (#14012 )	2023-05-11 14:21:09 -07:00
Clint Wylie	a58cebe491	add array_to_mv function to convert arrays into mvds to assist with migration from mvds to arrays (#14236 )	2023-05-11 04:43:28 -07:00
Kashif Faraz	bd0080c4ce	Update default values in docs (#14233 )	2023-05-09 19:13:51 +05:30
Shingo Kitagawa	152e9375e2	update documentation about multiValueHandling (#14197 ) * update documentation about multiValueHandling * Update docs/ingestion/ingestion-spec.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/ingestion/ingestion-spec.md Co-authored-by: Gian Merlino <gianmerlino@gmail.com> * fix spelling --------- Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Gian Merlino <gianmerlino@gmail.com>	2023-05-08 16:16:54 -07:00
Abhishek Radhakrishnan	6ca3fb9b08	Remove the redundant ISO-8601 text in the readme. (#14210 )	2023-05-05 11:27:29 -07:00
zachjsh	48cde236c4	Add columnMappings to explain plan output (#14187 ) * Add columnMappings to explain plan output * * fix checkstyle * add tests * * improve test coverage * * temporarily remove unit-test need to run ITs * * depend on build * * temporarily lower unit test threshold * * add back dependency on unit-tests * * add license headers * * fix header order * * review comments * * fix intellij inspection errors * * revert code coverage change	2023-05-04 10:36:28 -07:00
Karan Kumar	6f0cdd0c3f	`TaskStartTimeoutFault` now depends on the last successful worker launch time. (#14172 ) * `TaskStartTimeoutFault` now depends on the last successful worker launch time.	2023-05-03 00:05:15 +05:30
Vadim Ogievetsky	32af570fb2	fix API doc formatting (#14167 )	2023-04-29 09:29:41 -07:00
Suneet Saldanha	84c11df980	Make LoggingEmitter more useful by using Markers (#14121 ) * Make LoggingEmitter more useful * Skip code coverage for facade classes * fix spellcheck * code review * fix dependency * logging.md * fix checkstyle * Add back jacoco version to main pom	2023-04-27 15:06:06 -07:00
Jill Osborne	d4e478c909	NVL function docs update (#14169 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-04-27 11:17:21 -07:00
TSFenwick	6c99fbea92	fix typo in s3 docs. add readme to s3 module. (#14135 ) * fix typo in s3 docs. add readme to s3 module. * Update extensions-core/s3-extensions/README.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * cleanup readme for s3 extension and link to repo markdown doc instead of web docs --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-04-26 14:03:11 -07:00
Tejaswini Bandlamudi	774073b2e7	Update Hadoop3 as default build version (#14005 ) Hadoop 2 often causes red security scans on Druid distribution because of the dependencies it brings. We want to move away from Hadoop 2 and provide Hadoop 3 distribution available. Switch druid to building with Hadoop 3 by default. Druid will still be compatible with Hadoop 2 and users can build hadoop-2 compatible distribution using hadoop2 profile.	2023-04-26 12:52:51 +05:30
Gian Merlino	a7d4162195	Compaction: Block input specs not aligned with segmentGranularity. (#14127 ) * Compaction: Block input specs not aligned with segmentGranularity. When input intervals are not aligned with segmentGranularity, data may be overshadowed if it lies in the space between the input intervals and the output segmentGranularity. In MSQ REPLACE, this is a validation error. IMO the same behavior makes sense for compaction tasks. In case anyone was depending on the ability to compact nonaligned intervals, a configuration parameter allowNonAlignedInterval is provided. I don't expect it to be used much. * Remove unused. * ITCompactionTaskTest uses non-aligned intervals.	2023-04-25 17:06:16 -07:00
Gian Merlino	89e7948159	MSQ: Subclass CalciteJoinQueryTest, other supporting changes. (#14105 ) * MSQ: Subclass CalciteJoinQueryTest, other supporting changes. The main change is the new tests: we now subclass CalciteJoinQueryTest in CalciteSelectJoinQueryMSQTest twice, once for Broadcast and once for SortMerge. Two supporting production changes for default-value mode: 1) InputNumberDataSource is marked as concrete, to allow leftFilter to be pushed down to it. 2) In default-value mode, numeric frame field readers can now return nulls. This is necessary when stacking joins on top of joins: nulls must be preserved for semantics that match broadcast joins and native queries. 3) In default-value mode, StringFieldReader.isNull returns true on empty strings in addition to nulls. This is more consistent with the behavior of the selectors, which map empty strings to null as well in that mode. As an effect of change (2), the InsertTimeNull change from #14020 (to replace null timestamps with default timestamps) is reverted. IMO, this is fine, as either behavior is defensible, and the change from #14020 hasn't been released yet. * Adjust tests. * Style fix. * Additional tests.	2023-04-25 12:10:23 -07:00
TSFenwick	accd5536df	Allow for Log4J to be configured for peons but still ensure console logging is enforced (#14094 ) * Allow for Log4J to be configured for peons but still ensure console logging is enforced This change will allow for log4j to be configured for peons but require console logging is still configured for them to ensure peon logs are saved to deep storage. Also fixed the test ConsoleLoggingEnforcementTest to use a valid appender for the non console Config as the previous config was incorrect and would never return a logger. * fix checkstyle * add warning to logger when it overwrites all loggers to be console * optimize calls for altering logging config for ConsoleLoggingEnforcementConfigurationFactory add getName to the druid logger class * update docs, and error message * edit docs to be more clear * fix checkstyle issues * CI fixes - LoggerTest code coverage and fix spelling issue for logging docs	2023-04-24 10:41:56 -07:00
Adarsh Sanjeev	a7d5c64aeb	Move MSQ temporary storage to a runtime parameter instead of being configured from query context (#14061 ) * Adds new run time parameter druid.indexer.task.tmpStorageBytesPerTask. This sets a limit for the amount of temporary storage disk space used by tasks. This limit is currently only respected by MSQ tasks. * Removes query context parameters intermediateSuperSorterStorageMaxLocalBytes and composedIntermediateSuperSorterStorageEnabled. Composed intermediate super sorter (which was enabled by composedIntermediateSuperSorterStorageEnabled) is now enabled automatically if durableShuffleStorage is set to true. intermediateSuperSorterStorageMaxLocalBytes is calculated from the limit set by the run time parameter druid.indexer.task.tmpStorageBytesPerTask.	2023-04-18 16:56:51 +05:30
Laksh Singla	8eb854c845	Remove maxResultsSize config property from S3OutputConfig (#14101 ) * "maxResultsSize" has been removed from the S3OutputConfig and a default "chunkSize" of 100MiB is now present. This change primarily affects users who wish to use durable storage for MSQ jobs.	2023-04-18 14:25:20 +05:30
Clint Wylie	f6a0888bc0	document arrays in sql (#12549 ) * document arrays in sql * adjustments * Update docs/querying/sql-array-functions.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/querying/sql-data-types.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/querying/sql-data-types.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/querying/sql-array-functions.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/querying/sql-array-functions.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update sql-array-functions.md * fix stuff * fix spelling --------- Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-04-17 19:08:46 -07:00
Abhishek Radhakrishnan	c98c66558f	Include statement attributes in `EXPLAIN PLAN` output (#14074 ) This commit adds attributes that contain metadata information about the query in the EXPLAIN PLAN output. The attributes currently contain two items: - `statementType`: SELECT, INSERT or REPLACE - `targetDataSource`: provides the target datasource name for DML statements It is added to both the legacy and native query plan outputs.	2023-04-17 21:00:25 +05:30
Atul Mohan	e3c160f2f2	Add start_time column to sys.servers (#13358 ) Adds a new column start_time to sys.servers that captures the time at which the server was added to the cluster.	2023-04-14 15:23:34 +05:30
317brian	6c9b7b6efd	msq: add durable storage info (#14035 ) * msq: add durable storage info * fix duplicate row * Apply suggestions from code review Co-authored-by: Karan Kumar <karankumar1100@gmail.com> --------- Co-authored-by: Karan Kumar <karankumar1100@gmail.com>	2023-04-14 13:28:23 +05:30
imply-cheddar	aaa6cc1883	Make the tasks run with only a single directory (#14063 ) * Make the tasks run with only a single directory There was a change that tried to get indexing to run on multiple disks It made a bunch of changes to how tasks run, effectively hiding the "safe" directory for tasks to write files into from the task code itself making it extremely difficult to do anything correctly inside of a task. This change reverts those changes inside of the tasks and makes it so that only the task runners are the ones that make decisions about which mount points should be used for storing task-related files. It adds the config druid.worker.baseTaskDirs which can be used by the task runners to know which directories they should schedule tasks inside of. The TaskConfig remains the authoritative source of configuration for where and how an individual task should be operating.	2023-04-13 00:45:02 -07:00
Vadim Ogievetsky	3a7e4efdd6	Docs: updating Kafka input format docs (#14049 ) * updating Kafka input format docs * typo * spellcheck * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-04-11 20:06:23 -07:00
Abhishek Radhakrishnan	5ce1b0903e	Add basic security functions to druidapi (follow up to #14009 ) (#14055 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Paul Rogers <progers@apache.org>	2023-04-11 10:55:27 -07:00
Gian Merlino	d52bc333aa	Frames: Ensure nulls are read as default values when appropriate. (#14020 ) * Frames: Ensure nulls are read as default values when appropriate. Fixes a bug where LongFieldWriter didn't write a properly transformed zero when writing out a null. This had no meaningful effect in SQL-compatible null handling mode, because the field would get treated as a null anyway. But it does have an effect in default-value mode: it would cause Long.MIN_VALUE to get read out instead of zero. Also adds NullHandling checks to the various frame-based column selectors, allowing reading of nullable frames by servers in default-value mode.	2023-04-10 05:28:46 +05:30
Charles Smith	166cb6203b	Remove unnecessary python topic. Style changes to quickstart. (#13647 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-04-07 09:55:52 -07:00
Vadim Ogievetsky	5ee4ecee62	Web console: use new sampler features (#14017 ) * use new sampler features * support kafka format * update DQT, fix tests * prefer non numeric formats * fix input format step * boost SQL data loader * delete dimension in auto discover mode * inline example specs * feedback updates * yeet the format into valueFormat when switching to kafka * kafka format is now a toggle * even better form layout * rename	2023-04-07 06:28:29 -07:00
Suraj Sanjay Kadam	b4157e32ae	Update api.md (#13436 ) * Update api.md I have created changes in api call of python according to latest version of requests 2.28.1 library. Along with this there are some irregularities between use of <your-instance> and <hostname> so I have tried to fix that also. * Update api.md made some changes in declaring USER and PASSWORD	2023-04-06 15:05:36 -07:00
Charles Smith	1c2744b31e	Fix querying sql (#14026 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-04-06 14:50:06 -07:00
Paul Rogers	030ed911d4	Temporarily revert extended table functions for Druid 26 (#14019 )	2023-04-05 21:09:33 -07:00
Nicholas Lippis	5810e650d4	K8s mm less fixes (#14028 ) Update Fabric8 version and allow metrics monitors to be overridden	2023-04-05 22:23:16 +05:30
Tejaswini Bandlamudi	ccf48245d7	Update documentation for Kafka Supervisor IdleConfig (#14032 )	2023-04-05 21:55:39 +05:30
Karan Kumar	e6a11707cb	Adding query stack fault to MSQ to capture native query errors. (#13926 ) * Add a new fault "QueryRuntimeError" to MSQ engine to capture native query errors. * Fixed bug in MSQ fault tolerance where workers were being retried if `UnexpectedMultiValueDimensionException` was thrown. * An exception from the query runtime with `org.apache.druid.query` as the package name is thrown as a QueryRuntimeError	2023-04-05 16:29:10 +05:30
317brian	7e572eef08	docs: sql unnest and cleanup unnest datasource (#13736 ) Co-authored-by: Elliott Freis <elliottfreis@Elliott-Freis.earth.dynamic.blacklight.net> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Paul Rogers <paul-rogers@users.noreply.github.com> Co-authored-by: Jill Osborne <jill.osborne@imply.io> Co-authored-by: Anshu Makkar <83963638+anshu-makkar@users.noreply.github.com> Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: Elliott Freis <108356317+imply-elliott@users.noreply.github.com> Co-authored-by: Nicholas Lippis <nick.lippis@imply.io> Co-authored-by: Rohan Garg <7731512+rohangarg@users.noreply.github.com> Co-authored-by: Karan Kumar <karankumar1100@gmail.com> Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com> Co-authored-by: Gian Merlino <gianmerlino@gmail.com> Co-authored-by: Clint Wylie <cwylie@apache.org> Co-authored-by: Adarsh Sanjeev <adarshsanjeev@gmail.com> Co-authored-by: Laksh Singla <lakshsingla@gmail.com>	2023-04-04 13:07:54 -07:00
Vadim Ogievetsky	981662e9f4	Web console: add a nice UI for overlord dynamic configs and improve the docs (#13993 ) * in progress * better form * doc updates * doc changes * add inline docs * fix tests * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * final fixes * fix case * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * fix overflow * fix spelling --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2023-03-31 10:12:25 -07:00
Clint Wylie	e3211e3be0	actually backwards compatible frontCoded string encoding strategy (#13996 )	2023-03-31 02:24:12 -07:00
Clint Wylie	2219e68fa3	add backwards compat mode for frontCoded stringEncodingStrategy (#13988 )	2023-03-28 14:44:44 -07:00
Paul Rogers	76fe26d4ba	Fix typos, add tests for http() function (#13954 )	2023-03-28 14:41:06 -07:00
frankgrimes97	2f98675285	Tuple sketch SQL support (#13887 ) This PR is a follow-up to #13819 so that the Tuple sketch functionality can be used in SQL for both ingestion using Multi-Stage Queries (MSQ) and also for analytic queries against Tuple sketch columns.	2023-03-28 18:47:12 +05:30
Rishabh Singh	e8e8082573	Update OIDCConfig with scope information (#13973 ) Allow users to provide custom scope through OIDC configuration	2023-03-28 14:50:00 +05:30
Gian Merlino	062d72b67e	Add timeout to TaskStartTimeoutFault. (#13970 ) * Add timeout to TaskStartTimeoutFault. Makes the error message a bit more useful. * Update docs.	2023-03-27 23:37:19 +05:30
Arnout Engelen	daff7fe73b	Document how to report security issues (#13886 ) Document how to report security issues on the security overview page, so we can link this page from the homepage. That should make all the other important security information easier to find as well.	2023-03-27 11:26:37 +05:30
Atul Mohan	19db32d6b4	Add JWT authenticator support for validating ID Tokens (#13242 ) Expands the OIDC based auth in Druid by adding a JWT Authenticator that validates ID Tokens associated with a request. The existing pac4j authenticator works for authenticating web users while accessing the console, whereas this authenticator is for validating Druid API requests made by Direct clients. Services already supporting OIDC can attach their ID tokens to the Druid requests under the Authorization request header.	2023-03-25 18:41:40 +05:30
Gian Merlino	549018d076	Revert "Update docs." This reverts commit `de27c7d3c1`.	2023-03-24 17:16:12 -07:00
Gian Merlino	de27c7d3c1	Update docs.	2023-03-24 17:15:27 -07:00
Nicholas Lippis	8a72544bd2	Hook up pod template adapter (#13966 ) * Hook up PodTemplateTaskAdapter * Make task adapter TYPE parameters final * Rename adapters types * Include specified adapter name in exception message * Documentation for sidecarSupport deprecation * Fix order * Set TASK_ID as environment variable in PodTemplateTaskAdapter (#13969) * Update docs/development/extensions-contrib/k8s-jobs.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * Hook up PodTemplateTaskAdapter * Make task adapter TYPE parameters final * Rename adapters types * Include specified adapter name in exception message * Documentation for sidecarSupport deprecation * Fix order * fix spelling errors --------- Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2023-03-24 12:13:46 -06:00
Jill Osborne	976d39281f	Fix some broken links in docs (#13968 )	2023-03-24 10:48:23 -07:00
Paul Rogers	da42ee5bfa	Added TYPE(native) data type for external tables (#13958 )	2023-03-22 21:43:29 -07:00
Adarsh Sanjeev	7bab407495	Add segment generator counters to MSQ reports (#13909 ) * Add segment generator counters to reports * Remove unneeded annotation * Fix checkstyle and coverage * Add persist and merged as new metrics * Address review comments * Fix checkstyle * Create metrics class to handle updating counters * Address review comments * Add rowsPushed as a new metrics	2023-03-22 09:17:26 -07:00
Jill Osborne	4f95285406	Correct nested columns JSON example (#13953 )	2023-03-21 09:17:26 -07:00
Karan Kumar	67df1324ee	Undocumenting certain context parameter in MSQ. (#13928 ) * Removing intermediateSuperSorterStorageMaxLocalBytes, maxInputBytesPerWorker, composedIntermediateSuperSorterStorageEnabled, clusterStatisticsMergeMode from docs * Adding documentation in the context class.	2023-03-16 17:56:44 +05:30
317brian	65a663adbb	docs: clarify Java precision (#13671 ) Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-03-15 11:43:41 -07:00
somu-imply	a7ba361666	Refactoring and bug fixes on top of unnest. The allowList now is not passed … (#13922 ) * Refactoring and bug fixes on top of unnest. The filter now is passed inside the unnest cursors. Added tests for scenarios such as 1. filter on unnested column which involves a left filter rewrite 2. filter on unnested virtual column which pushes the filter to the right only and involves no rewrite 3. not filters 4. SQL functions applied on top of unnested column 5. null present in first row of the column to be unnested	2023-03-14 16:05:56 -07:00
Suneet Saldanha	44547614ae	Report engine as a dimension for sqlQuery metrics (#13906 ) * Report engine as a dimension for sqlQuery metrics * docs	2023-03-10 11:23:57 -08:00
Gian Merlino	4b1ffbc452	Various changes and fixes to UNNEST. (#13892 ) * Various changes and fixes to UNNEST. Native changes: 1) UnnestDataSource: Replace "column" and "outputName" with "virtualColumn". This enables pushing expressions into the datasource. This in turn allows us to do the next thing... 2) UnnestStorageAdapter: Logically apply query-level filters and virtual columns after the unnest operation. (Physically, filters are pulled up, when possible.) This is beneficial because it allows filters and virtual columns to reference the unnested column, and because it is consistent with how the join datasource works. 3) Various documentation updates, including declaring "unnest" as an experimental feature for now. SQL changes: 1) Rename DruidUnnestRel (& Rule) to DruidUnnestRel (& Rule). The rel is simplified: it only handles the UNNEST part of a correlated join. Constant UNNESTs are handled with regular inline rels. 2) Rework DruidCorrelateUnnestRule to focus on pulling Projects from the left side up above the Correlate. New test testUnnestTwice verifies that this works even when two UNNESTs are stacked on the same table. 3) Include ProjectCorrelateTransposeRule from Calcite to encourage pushing mappings down below the left-hand side of the Correlate. 4) Add a new CorrelateFilterLTransposeRule and CorrelateFilterRTransposeRule to handle pulling Filters up above the Correlate. New tests testUnnestWithFiltersOutside and testUnnestTwiceWithFilters verify this behavior. 5) Require a context feature flag for SQL UNNEST, since it's undocumented. As part of this, also cleaned up how we handle feature flags in SQL. They're now hooked into EngineFeatures, which is useful because not all engines support all features.	2023-03-10 16:42:08 +05:30
Gian Merlino	fe9d0c46d5	Improve memory efficiency of WrappedRoaringBitmap. (#13889 ) * Improve memory efficiency of WrappedRoaringBitmap. Two changes: 1) Use an int[] for sizes 4 or below. 2) Remove the boolean compressRunOnSerialization. Doesn't save much space, but it does save a little, and it isn't adding a ton of value to have it be configurable. It was originally configurable in case anything broke when enabling it, but it's been a while and nothing has broken. * Slight adjustment. * Adjust for inspection. * Updates. * Update snaps. * Update test. * Adjust test. * Fix snaps.	2023-03-09 15:48:02 -08:00
Gian Merlino	82f7a56475	Sort-merge join and hash shuffles for MSQ. (#13506 ) * Sort-merge join and hash shuffles for MSQ. The main changes are in the processing, multi-stage-query, and sql modules. processing module: 1) Rename SortColumn to KeyColumn, replace boolean descending with KeyOrder. This makes it nicer to model hash keys, which use KeyOrder.NONE. 2) Add nullability checkers to the FieldReader interface, and an "isPartiallyNullKey" method to FrameComparisonWidget. The join processor uses this to detect null keys. 3) Add WritableFrameChannel.isClosed and OutputChannel.isReadableChannelReady so callers can tell which OutputChannels are ready for reading and which aren't. 4) Specialize FrameProcessors.makeCursor to return FrameCursor, a random-access implementation. The join processor uses this to rewind when it needs to replay a set of rows with a particular key. 5) Add MemoryAllocatorFactory, which is embedded inside FrameWriterFactory instead of a particular MemoryAllocator. This allows FrameWriterFactory to be shared in more scenarios. multi-stage-query module: 1) ShuffleSpec: Add hash-based shuffles. New enum ShuffleKind helps callers figure out what kind of shuffle is happening. The change from SortColumn to KeyColumn allows ClusterBy to be used for both hash-based and sort-based shuffling. 2) WorkerImpl: Add ability to handle hash-based shuffles. Refactor the logic to be more readable by moving the work-order-running code to the inner class RunWorkOrder, and the shuffle-pipeline-building code to the inner class ShufflePipelineBuilder. 3) Add SortMergeJoinFrameProcessor and factory. 4) WorkerMemoryParameters: Adjust logic to reserve space for output frames for hash partitioning. (We need one frame per partition.) sql module: 1) Add sqlJoinAlgorithm context parameter; can be "broadcast" or "sortMerge". With native, it must always be "broadcast", or it's a validation error. MSQ supports both. Default is "broadcast" in both engines. 2) Validate that MSQs do not use broadcast join with RIGHT or FULL join, as results are not correct for broadcast join with those types. Allow this in native for two reasons: legacy (the docs caution against it, but it's always been allowed), and the fact that it actually does generate correct results in native when the join is processed on the Broker. It is much less likely that MSQ will plan in such a way that generates correct results. 3) Remove subquery penalty in DruidJoinQueryRel when using sort-merge join, because subqueries are always required, so there's no reason to penalize them. 4) Move previously-disabled join reordering and manipulation rules to FANCY_JOIN_RULES, and enable them when using sort-merge join. Helps get to better plans where projections and filters are pushed down. * Work around compiler problem. * Updates from static analysis. * Fix @param tag. * Fix declared exception. * Fix spelling. * Minor adjustments. * wip * Merge fixups * fixes * Fix CalciteSelectQueryMSQTest * Empty keys are sortable. * Address comments from code review. Rename mux -> mix. * Restore inspection config. * Restore original doc. * Reorder imports. * Adjustments * Fix. * Fix imports. * Adjustments from review. * Update header. * Adjust docs.	2023-03-08 14:19:39 -08:00
Abhishek Agarwal	52bd9e6adb	Improved error message when topic name changes within same supervisor (#13815 ) Improved error message when topic name changes within same supervisor Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-03-07 18:10:18 -08:00
Adarsh Sanjeev	ef82756176	Add validation for aggregations on __time (#13793 ) * Add validation for aggregations on __time	2023-03-07 17:16:36 -08:00
Karan Kumar	94cfabea18	Suggested memory calculation in case NOT_ENOUGH_MEMORY_FAULT is thrown. (#13846 ) * Suggested memory calculation in case NOT_ENOUGH_MEMORY_FAULT is thrown. Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-03-06 18:00:36 +05:30
Paul Rogers	a580aca551	Python Druid API for use in notebooks (#13787 ) Python Druid API for use in notebooks Revises existing notebooks and readme to reference the new API. Notebook to explain the new API. Split README into a console version and a notebook version to work around lack of a nice display for md files. Update the REST API notebook to use simpler Requests calls Converted the SQL tutorial to use the Python library README file, converted to using properties --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-03-04 18:25:19 -08:00
Anshu Makkar	a10e4150d5	Add Post Aggregators for Tuple Sketches (#13819 ) You can now do the following operations with TupleSketches in Post Aggregation Step Get the Sketch Output as Base64 String Provide a constant Tuple Sketch in post-aggregation step that can be used in Set Operations Get the Estimated Value(Sum) of Summary/Metrics Objects associated with Tuple Sketch	2023-03-03 09:32:09 +05:30
317brian	b4b354b658	docs: fix html nits (#13835 )	2023-03-02 11:19:32 -08:00
Jill Osborne	26c5cac41a	Fix a link problem (#13876 )	2023-03-02 09:09:51 -08:00
Tejaswini Bandlamudi	7103cb4b9d	Removes FiniteFirehoseFactory and its implementations (#12852 ) The FiniteFirehoseFactory and InputRowParser classes were deprecated in 0.17.0 (#8823) in favor of InputSource & InputFormat. This PR removes the FiniteFirehoseFactory and all its implementations along with classes solely used by them like Fetcher (Used by PrefetchableTextFilesFirehoseFactory). Refactors classes including tests using FiniteFirehoseFactory to use InputSource instead. Removing InputRowParser may not be as trivial as many classes that aren't deprecated depends on it (with no alternatives), like EventReceiverFirehoseFactory. Hence FirehoseFactory, EventReceiverFirehoseFactory, and Firehose are marked deprecated.	2023-03-02 18:07:17 +05:30
Apoorv Gupta	b26f1b4a5d	Update datasources.md: Fix Documentation. (#13865 ) Fixed documentation to clarify that union query cant be run over query datasources.	2023-03-01 20:29:15 +05:30
Laksh Singla	ca68fd93a6	Generate tombstones when running MSQ's replace (#13706 ) *When running REPLACE queries, the segments which contain no data are dropped (marked as unused). This PR aims to generate tombstones in place of segments which contain no data to mark their deletion, as is the behavior with the native ingestion. This will cause InsertCannotReplaceExistingSegmentFault to be removed since it was generated if the interval to be marked unused didn't fully overlap one of the existing segments to replace.	2023-03-01 12:01:30 +05:30
AdheipSingh	22e516fd53	Update kubernetes.md (#13858 )	2023-02-28 11:20:24 -08:00
Kashif Faraz	12f62e2c42	Clarify doc of ingest/handoff/time metric (#13856 )	2023-02-28 10:37:47 +05:30
Victoria Lim	e46379ba7a	Docs: Update name of the metadata tables (#13734 ) * Update name of the metadata tables * emend spelling file * fix spelling --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-02-23 13:57:59 -08:00
tejasparbat	d74d6824ec	update LDAP endpoint (#13839 ) Current DOC at step https://druid.apache.org/docs/latest/operations/auth-ldap.html#add-an-ldap-user-to-druid-and-assign-a-role Example request to add the LDAP user myuser to Druid: curl -i -v -H "Content-Type: application/json" -u internal -X POST http://localhost:8081/druid-ext/basic-security/authentication/db/ldap/users/myuser Example request to assign the myuser user to the queryRole role: curl -i -v -H "Content-Type: application/json" -u internal -X POST http://localhost:8081/druid-ext/basic-security/authentication/db/ldap/users/myuser/roles/queryRole Expected: Example request to add the LDAP user myuser to Druid: curl -i -v -H "Content-Type: application/json" -u internal -X POST http://localhost:8081/druid-ext/basic-security/authorization/db/ldapauth/users/myuser Example request to assign the myuser user to the queryRole role curl -i -v -H "Content-Type: application/json" -u internal -X POST http://localhost:8081/druid-ext/basic-security/authorization/db/ldapauth/users/myuser/roles/queryRole	2023-02-23 13:55:06 -08:00
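The endpoint correction in d74d6824ec can be sketched as a small URL builder. This is only an illustration: the helper names `ldap_user_url` and `ldap_role_url` are hypothetical, and the base URL, the `ldapauth` authorizer name, `myuser`, and `queryRole` are taken from the example requests in the commit message.

```python
from urllib.parse import quote

# Base URL copied from the commit message's example requests.
BASE = "http://localhost:8081/druid-ext/basic-security"


def ldap_user_url(user, authorizer="ldapauth", base=BASE):
    """Build the corrected endpoint: /authorization/db/<authorizer>/users/<user>."""
    return (
        f"{base}/authorization/db/{quote(authorizer, safe='')}"
        f"/users/{quote(user, safe='')}"
    )


def ldap_role_url(user, role, authorizer="ldapauth", base=BASE):
    """Build the endpoint that assigns a role to the user."""
    return f"{ldap_user_url(user, authorizer, base)}/roles/{quote(role, safe='')}"


if __name__ == "__main__":
    print(ldap_user_url("myuser"))
    print(ldap_role_url("myuser", "queryRole"))
```

Either URL would then be used as the target of the `curl -X POST` calls quoted in the commit message.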
Win Min Soe	70f9052f1d	docs: update correct config base on server spec (#13832 ) Co-authored-by: Winn Minn <winn.minn@grabtaxi.com>	2023-02-23 08:50:47 -08:00
Abhishek Radhakrishnan	17a3cd0b68	Remove the additional backtick that's causing a SA issue. (#13838 )	2023-02-23 09:01:08 +05:30
benkrug	66034dd8bc	Update default for finalize in query-context.md (#13763 ) Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-02-22 12:35:36 -08:00
Katya Macedo	1595653e6f	docs: add a link for the Druid SQL tutorial (#13468 ) * docs: add juptyer API tutorial for API and jupyter tutorial index (#3) (cherry picked from commit aeb8d9e3390fa26d9c533dce0862295b80c58583) * update prereqs and fix jupyterlab name * Removing notebook since 13345 has it 13345 should be merged first * update contributing instructions * docs: link to the Druid SQL tutorial * Add link to partitioning * fix merge conflict * Saving * Update docs/tutorials/tutorial-jupyter-index.md * Remove partitioning --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: brian.le <brian.le@imply.io> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-02-22 09:36:13 -08:00
317brian	07883e311e	doc: fix unnecessary link (#13785 ) CI errors look unrelated to this change.	2023-02-21 17:34:46 -08:00
zachjsh	665dee43bf	Revert "Operator conversion deny list (#13766 )" (#13829 ) This reverts commit `38e620aa4c`.	2023-02-21 15:14:49 -08:00
Paul Rogers	5dadbdf4d0	Generate the IT docker-compose.yaml files (#13669 ) Generate IT docker-compose.sh files Generates test-specific docker-compose.sh files using a simple Python template script.	2023-02-21 15:03:02 -08:00
benkrug	c6b1576fc1	Update clean-metadata-store.md (#13131 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-02-21 12:53:54 -08:00
Paul Rogers	85d36be085	Information schema now uses numeric column types (#13777 ) Change to use SQL schemas to allow null numeric columns * Updated docs	2023-02-17 14:39:31 -08:00
Katya Macedo	bc8b710b7e	Fix broken link (#13767 )	2023-02-17 09:02:12 -08:00
Churro	c1f283fd31	Better sidecar support (#13655 ) * Better sidecar support * remove un-thrown exception from test * Druid you are such a stickler about spelling :) * Only require the primaryContainerName, no need to exclude containers	2023-02-14 10:56:15 +05:30
Guy ☀️ Moore	306997be87	Add Perl 5 to druid requirements (#13708 ) Without perl 5 I was unable to start druid using the instructions in the quickstart guide. I'm not certain what versions it might require, but the one that I got working was perl 5 > This is perl 5, version 36, subversion 0 (v5.36.0) built for x86_64-linux-thread-multi	2023-02-13 13:34:49 -08:00
zachjsh	38e620aa4c	Operator conversion deny list (#13766 ) ### Description This change adds a new config property `druid.sql.planner.operatorConversion.denyList`, which allows a user to specify any operator conversions that they wish to disallow. A user may want to do this for a number of reasons, including security concerns. The default value of this property is the empty list `[]`, which does not disallow any operator conversions. An example usage of this property is `druid.sql.planner.operatorConversion.denyList=["extern"]`, which disallows the usage of the `extern` operator conversion. If the property is configured this way, and a user of the Druid cluster tries to submit a query that uses the `extern` function, such as the example given [here](https://druid.apache.org/docs/latest/multi-stage-query/examples.html#insert-with-no-rollup), a response with HTTP response code `400` is returned with an error body similar to the following: ``` { "taskId": "4ec5b0b6-fa9b-4c3a-827d-2308294e9985", "state": "FAILED", "error": { "error": "Plan validation failed", "errorMessage": "org.apache.calcite.runtime.CalciteContextException: From line 28, column 5 to line 32, column 5: No match found for function signature EXTERN(<CHARACTER>, <CHARACTER>, <CHARACTER>)", "errorClass": "org.apache.calcite.tools.ValidationException", "host": null } } ```	2023-02-10 09:59:26 -08:00
Anshu Makkar	d7b95988d7	Add missing documentation for constant post-aggregator (#13664 ) Thanks @anshu-makkar , I was waiting for CI to complete yesterday. Failures seem unrelated, so merging.	2023-02-09 08:53:45 -08:00
Suneet Saldanha	714ac07b52	Allow users to add additional metadata to ingestion metrics (#13760 ) * Allow users to add additional metadata to ingestion metrics When submitting an ingestion spec, users may pass a map of metadata in the ingestion spec config that will be added to ingestion metrics. This will make it possible for operators to tag metrics with other metadata that doesn't necessarily line up with the existing tags like taskId. Druid clusters that ingest these metrics can take advantage of the nested data columns feature to process this additional metadata. * rename to tags * docs * tests * fix test * make code cov happy * checkstyle	2023-02-08 18:07:23 -08:00
AmatyaAvadhanula	0cf1fc3d55	Indexing on multiple disks (#13476 ) * Initial commit * Simple UTs * Parameterize tests * Parameterized tests for k8s task runner * Fix restore bug * Refactor TaskStorageDirTracker * Change CliPeon args	2023-02-08 11:31:34 +05:30
AmatyaAvadhanula	dcdae84888	Add server view initialization metrics (#13716 ) * Add server view init metrics * Test coverage * Rename metrics	2023-02-07 20:02:00 +05:30
Suneet Saldanha	bea18dc9e4	Update basic auth examples (#13750 )	2023-02-03 14:45:48 -08:00
drudi-at-coffee	7580248770	Update api.md (#13727 ) Added missing '/status' in HTTP status request	2023-02-02 10:43:22 -08:00
Victoria Lim	33efd5ab1d	docs: Refresh the update data tutorial (#13641 ) Merging regardless of nit since topic is in better shape. * refresh the update data tutorial * Apply suggestions from code review Co-authored-by: Jill Osborne <jill.osborne@imply.io> --------- Co-authored-by: Jill Osborne <jill.osborne@imply.io>	2023-02-01 18:18:16 -08:00
Kashif Faraz	f629643c50	Fix value of lookup sync period in docs (#13695 ) * Fix lookup docs * Fix spelling * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-02-01 18:12:00 -08:00
Sergio Ferragut	7f830b20d7	fixed init commands for both mysql and postgresql (#13713 )	2023-02-01 18:07:31 -08:00
Suneet Saldanha	cfc3115a59	Compaction history returns empty list instead of 404 when not found (#13730 ) * Compaction history returns empty list instead of 404 when not found * checkstyle	2023-02-01 17:44:07 -08:00
Tijo Thomas	1beef30bb2	Support postaggregation function as in Math.pow() (#13703 ) (#13704 ) Support postaggregation function as in Math.pow()	2023-01-31 22:55:04 +05:30
Adarsh Sanjeev	51dfde0284	Add maxInputBytesPerWorker as query context parameter (#13707 ) * Add maxInputBytesPerWorker as query context parameter * Move documenation to msq specific docs * Update tests * Spacing * Address review comments * Fix test * Update docs/multi-stage-query/reference.md * Correct spelling mistake --------- Co-authored-by: Karan Kumar <karankumar1100@gmail.com>	2023-01-31 20:55:28 +05:30
Jill Osborne	356b0e37cf	Tutorial: Query view (#13565 ) * Tutorial: Query view * Removed duplicate file * Update tutorial-sql-query-view.md * Update tutorial-sql-query-view.md * Update tutorial-sql-query-view.md * Updated after review * Update docs/tutorials/tutorial-sql-query-view.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update tutorial-sql-query-view.md Update title * Update sidebars.json fix merge conflict w/ sidebar * address spelling ci --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-01-27 14:29:43 -08:00
sairam devarashetty	6164c420a1	Create update.md (#13451 ) * Create update.md Important Line highlighted * Update docs/data-management/update.md Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-01-25 16:23:40 -08:00
317brian	9021161c8c	doc: fix markdown spacing (#13683 ) * doc: fix markdown spacing * fix spacing	2023-01-25 16:22:49 -08:00
Victoria Lim	00cee329bd	pitfall when using combining input source (#13639 )	2023-01-25 12:50:19 -08:00
Suneet Saldanha	016c881795	Add API to return automatic compaction config history (#13699 ) Add a new API to return the history of changes to automatic compaction config history to make it easy for users to see what changes have been made to their auto-compaction config. The API is scoped per dataSource to allow users to triage issues with an individual dataSource. The API responds with a list of configs when there is a change to either the settings that impact all auto-compaction configs on a cluster or the dataSource in question.	2023-01-23 13:23:45 -08:00
Rohan Garg	f76acccff2	Allow using composed storage for SuperSorter intermediate data (#13368 )	2023-01-24 01:02:03 +05:30
Eyal Yurman	44374f91bc	Fix broken links to Oracle JDK docs (#13687 ) * Fix broken link for SSLContext java doc * Update tls-support.md * Update tls-support.md * Update tls-support.md * Update simple-client-sslcontext.md	2023-01-18 14:46:08 +05:30
Paul Rogers	22630b0aab	Much improved table functions (#13627 ) Much improved table functions * Revises properties, definitions in the catalog * Adds a "table function" abstraction to model such functions * Specific functions for HTTP, inline, local and S3. * Extended SQL types in the catalog * Restructure external table definitions to use table functions * EXTEND syntax for Druid's extern table function * Support for array-valued table function parameters * Support for array-valued SQL query parameters * Much new documentation	2023-01-17 08:41:57 -08:00
Gian Merlino	182c4fad29	Kinesis: More robust default fetch settings. (#13539 ) * Kinesis: More robust default fetch settings. 1) Default recordsPerFetch and recordBufferSize based on available memory rather than using hardcoded numbers. For this, we need an estimate of record size. Use 10 KB for regular records and 1 MB for aggregated records. With 1 GB heaps, 2 processors per task, and nonaggregated records, recordBufferSize comes out to the same as the old default (10000), and recordsPerFetch comes out slightly lower (1250 instead of 4000). 2) Default maxRecordsPerPoll based on whether records are aggregated or not (100 if not aggregated, 1 if aggregated). Prior default was 100. 3) Default fetchThreads based on processors divided by task count on Indexers, rather than overall processor count. 4) Additionally clean up the serialized JSON a bit by adding various JsonInclude annotations. * Updates for tests. * Additional important verify.	2023-01-13 11:03:54 +05:30
Vadim Ogievetsky	93dc01b6c5	fix broken table missing new line (#13666 )	2023-01-12 15:29:51 -08:00
Vadim Ogievetsky	f97bcc69d3	Docs: reword single server page (#13659 ) * reword single server page * fix typo * Update docs/operations/single-server.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * spelling Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-01-11 21:12:52 -08:00
Karan Kumar	56076d33fb	Worker retry for MSQ task (#13353 ) * Initial commit. * Fixing error message in retry exceeded exception * Cleaning up some code * Adding some test cases. * Adding java docs. * Finishing up state test cases. * Adding some more java docs and fixing spot bugs, intellij inspections * Fixing intellij inspections and added tests * Documenting error codes * Migrate current integration batch tests to equivalent MSQ tests (#13374) * Migrate current integration batch tests to equivalent MSQ tests using new IT framework * Fix build issues * Trigger Build * Adding more tests and addressing comments * fixBuildIssues * fix dependency issues * Parameterized the test and addressed comments * Addressing comments * fixing checkstyle errors * Adressing comments * Adding ITTest which kills the worker abruptly * Review comments phase one * Adding doc changes * Adjusting for single threaded execution. * Adding Sequential Merge PR state handling * Merge things * Fixing checkstyle. * Adding new context param for fault tolerance. Adding stale task handling in sketchFetcher. Adding UT's. * Merge things * Merge things * Adding parameterized tests Created separate module for faultToleranceTests * Adding missed files * Review comments and fixing tests. * Documentation things. * Fixing IT * Controller impl fix. * Fixing racy WorkerSketchFetcherTest.java exception handling. Co-authored-by: abhagraw <99210446+abhagraw@users.noreply.github.com> Co-authored-by: Karan Kumar <cryptoe@karans-mbp.lan>	2023-01-11 07:38:29 +05:30
Abhishek Agarwal	17936e2920	Add an option to enable HSTS in druid services (#13489 ) * Add an option to enable HSTS * Fix code and add docs * Deduplicate headers * unused import * Fix spelling	2023-01-10 22:31:51 +05:30
Victoria Lim	a800dae87a	doc: List Protobuf as a supported format (#13640 )	2023-01-06 15:09:37 -08:00
317brian	6bbf4266b2	docs: documentation for unnest datasource (#13479 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-01-06 11:41:11 -08:00
Kashif Faraz	0d97e658b2	Docs: Update quickstart instructions (#13611 ) Changes: - Remove specification of a Druid version in the quickstart, because the previous step instructs downloading the latest version anyway. - Mention usage of memory parameter in the quickstart	2022-12-22 11:51:08 +05:30
Vadim Ogievetsky	07597c687d	Docs: Remove large data file (#13595 )	2022-12-19 13:14:22 +05:30
Gian Merlino	ee890965f4	LocalInputSource: Serialize File paths without forcing resolution. (#13534 ) * LocalInputSource: Serialize File paths without forcing resolution. Fixes #13359. * Add one more javadoc.	2022-12-19 11:47:36 +05:30
Victoria Lim	09d8b16447	Document shouldFinalize for sketches that have the parameter (#13524 ) Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-12-17 10:48:06 -08:00
317brian	d9c27d6102	docs: add index page and related stuff for jupyter tutorials (#13342 )	2022-12-16 13:33:50 -08:00
Gian Merlino	7f3c117e3a	SQL: Improve docs around casts. (#13466 ) Main change: clarify that the "default value" for casts only applies if druid.generic.useDefaultValueForNull = true. Secondary change: adjust a bunch of wording from future to present tense.	2022-12-15 15:01:40 -08:00
Kashif Faraz	d6949b1b79	Track input processedBytes with MSQ ingestion (#13559 ) Follow up to #13520 Bytes processed are currently tracked for intermediate stages in MSQ ingestion. This patch adds the capability to track the bytes processed by an MSQ controller task while reading from an external input source or a segment source. Changes: - Track `processedBytes` for every `InputSource` read in `ExternalInputSliceReader` - Update `ChannelCounters` with the above obtained `processedBytes` when incrementing the input file count. - Update task report structure in docs The total input processed bytes can be obtained by summing the `processedBytes` as follows: totalBytes = 0 for every root stage (i.e. a stage which does not have another stage as an input): for every worker in that stage: for every input channel: (i.e. channels with prefix "input", e.g. "input0", "input1", etc.) totalBytes += processedBytes	2022-12-16 02:20:01 +05:30
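The summation described in d6949b1b79 can be written out as a minimal Python sketch. The nested dictionary layout used here (`input_stages`, `workers`, `channels`) is a simplified, hypothetical stand-in and not the actual task report structure; only the `processedBytes` counter name and the "input" channel prefix come from the commit message.

```python
def total_processed_bytes(stages):
    """Sum processedBytes over the input channels of every root stage.

    `stages` is a hypothetical, simplified structure: a list of dicts with
    "input_stages" (stage numbers this stage reads from) and "workers",
    where each worker maps channel names to counter dicts.
    """
    total = 0
    for stage in stages:
        # A root stage is one that does not have another stage as an input.
        if stage["input_stages"]:
            continue
        for worker in stage["workers"]:
            for channel_name, counters in worker["channels"].items():
                # Only channels with the prefix "input" (input0, input1, ...).
                if channel_name.startswith("input"):
                    total += counters.get("processedBytes", 0)
    return total


# Illustrative data only: stage 0 is a root stage, stage 1 reads from stage 0.
example_stages = [
    {
        "input_stages": [],
        "workers": [
            {"channels": {"input0": {"processedBytes": 100},
                          "input1": {"processedBytes": 50},
                          "output": {"processedBytes": 999}}},
            {"channels": {"input0": {"processedBytes": 25}}},
        ],
    },
    {
        "input_stages": [0],
        "workers": [{"channels": {"input0": {"processedBytes": 175}}}],
    },
]

if __name__ == "__main__":
    print(total_processed_bytes(example_stages))
```

Stage 1 is skipped because it reads from stage 0, so its bytes are not counted a second time.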
Adarsh Sanjeev	2b605aa9cf	Multiple fixes for the MSQ stats merging piece which (#13463 ) * Add validation checks to worker chat handler apis * Merge things and polishing the error messages. * Minor error message change * Fixing race and adding some tests * Fixing controller fetching stats from wrong workers. Fixing race Changing default mode to Parallel Adding logging. Fixing exceptions not propagated properly. * Changing to kernel worker count * Added a better logic to figure out assigned worker for a stage. * Nits * Moving to existing kernel methods * Adding more coverage Co-authored-by: cryptoe <karankumar1100@gmail.com>	2022-12-15 09:35:11 +05:30
Vadim Ogievetsky	2729e25295	Link to java docs (#13478 ) * add link to page about selecting a JRE * add link to script also * simplify text	2022-12-14 11:45:23 -08:00
Gian Merlino	de5a4bafcb	Zero-copy local deep storage. (#13394 ) * Zero-copy local deep storage. This is useful for local deep storage, since it reduces disk usage and makes Historicals able to load segments instantaneously. Two changes: 1) Introduce "druid.storage.zip" parameter for local storage, which defaults to false. This changes default behavior from writing an index.zip to writing a regular directory. This is safe to do even during a rolling update, because the older code actually already handled unzipped directories being present on local deep storage. 2) In LocalDataSegmentPuller and LocalDataSegmentPusher, use hard links instead of copies when possible. (Generally this is possible when the source and destination directory are on the same filesystem.)	2022-12-12 17:28:24 -08:00
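The "hard links instead of copies when possible" idea in de5a4bafcb can be illustrated with a short Python sketch. This is not the code in LocalDataSegmentPuller or LocalDataSegmentPusher (those are Java classes); the function name `link_or_copy` is hypothetical and the sketch only shows the general technique of trying a hard link and falling back to a copy.

```python
import os
import shutil
import tempfile


def link_or_copy(src, dst):
    """Hard-link src to dst when the filesystem allows it, else copy.

    Returns "linked" or "copied" so callers can see which path was taken.
    """
    try:
        # Succeeds only when src and dst are on the same filesystem and
        # the filesystem supports hard links.
        os.link(src, dst)
        return "linked"
    except OSError:
        # Cross-device or unsupported: fall back to a regular copy.
        shutil.copy2(src, dst)
        return "copied"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "segment.bin")
        target = os.path.join(tmp, "segment-loaded.bin")
        with open(source, "wb") as f:
            f.write(b"segment bytes")
        print(link_or_copy(source, target))
```

A hard link adds a second directory entry for the same data, so no bytes are duplicated on disk, which is the disk-usage saving the commit message describes.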
Rishabh Singh	4ebdfe226d	Druid automated quickstart (#13365 ) * Druid automated quickstart * remove conf/druid/single-server/quickstart/_common/historical/jvm.config * Minor changes in python script * Add lower bound memory for some services * Additional runtime properties for services * Update supervise script to accept command arguments, corresponding changes in druid-quickstart.py * File end newline * Limit the ability to start multiple instances of a service, documentation changes * simplify script arguments * restore changes in medium profile * run-druid refactor * compute and pass middle manager runtime properties to run-druid supervise script changes to process java opts array use argparse, leave free memory, logging * Remove extra quotes from mm task javaopts array * Update logic to compute minimum memory * simplify run-druid * remove debug options from run-druid * resolve the config_path provided * comment out service specific runtime properties which are computed in the code * simplify run-druid * clean up docs, naming changes * Throw ValueError exception on illegal state * update docs * rename args, compute_only -> compute, run_zk -> zk * update help documentation * update help documentation * move task memory computation into separate method * Add validation checks * remove print * Add validations * remove start-druid bash script, rename start-druid-main * Include tasks in lower bound memory calculation * Fix test * 256m instead of 256g * caffeine cache uses 5% of heap * ensure min task count is 2, task count is monotonic * update configs and documentation for runtime props in conf/druid/single-server/quickstart * Update docs * Specify memory argument for each profile in single-server.md * Update middleManager runtime.properties * Move quickstart configs to conf/druid/base, add bash launch script, support python2 * Update supervise script * rename base config directory to auto * rename python script, changes to pass repeated args to supervise * remove 
exmaples/conf/druid/base dir * add docs * restore changes in conf dir * update start-druid-auto * remove hashref for commands in supervise script * start-druid-main java_opts array is comma separated * update entry point script name in python script * Update help docs * documentation changes * docs changes * update docs * add support for running indexer * update supported services list * update help * Update python.md * remove dir * update .spelling * Remove dependency on psutil and pathlib * update docs * Update get_physical_memory method * Update help docs * update docs * update method to get physical memory on python * udpate spelling * update .spelling * minor change * Minor change * memory comptuation for indexer * update start-druid * Update python.md * Update single-server.md * Update python.md * run python3 --version to check if python is installed * Update supervise script * start-druid: echo message if python not found * update anchor text * minor change * Update condition in supervise script * JVM not jvm in docs	2022-12-09 11:04:02 -08:00
Paul Rogers	013a12e86f	Enhanced MSQ table functions (#13360 ) * Enhanced MSQ table functions * HTTP, LOCALFILES and INLINE table functions powered by catalog metadata. * Documentation	2022-12-08 13:56:02 -08:00
Gian Merlino	91ef9872ec	MSQ: Improve TooManyBuckets error message, improve error docs. (#13525 ) 1) Edited the TooManyBuckets error message to mention PARTITIONED BY instead of segmentGranularity. 2) Added error-code-specific anchors in the docs. 3) Add information to various error codes in the docs about common causes and solutions.	2022-12-08 13:18:26 -08:00
Jill Osborne	b56855b837	Update to native ingestion doc (#13482 ) * Update to native ingestion doc * Update docs/ingestion/native-batch.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update native-batch.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-12-07 15:08:19 +05:30
Vadim Ogievetsky	9679f6a9b5	Web console: add arrayOfDoublesSketch and other small fixes (#13486 ) * add padding and keywords * add arrayOfDoubles * Update docs/development/extensions-core/datasketches-tuple.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/development/extensions-core/datasketches-tuple.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/development/extensions-core/datasketches-tuple.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/development/extensions-core/datasketches-tuple.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/development/extensions-core/datasketches-tuple.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * partiton int * fix docs Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-12-06 21:21:49 -08:00
Kashif Faraz	c7229fc787	Limit max batch size for segment allocation, add docs (#13503 ) Changes: - Limit max batch size in `SegmentAllocationQueue` to 500 - Rename `batchAllocationMaxWaitTime` to `batchAllocationWaitTime` since the actual wait time may exceed this configured value. - Replace usage of `SegmentInsertAction` in `TaskToolbox` with `SegmentTransactionalInsertAction`	2022-12-07 10:07:14 +05:30
Gian Merlino	fda0a1aadd	Set chatAsync default to true. (#13491 ) This functionality was originally added in #13354.	2022-12-05 20:53:59 -08:00
Kashif Faraz	65945a686f	Docs: Update docs for coordinator dynamic config (#13494 ) * Update docs for useBatchedSegmentSampler * Update docs for round robin assigment	2022-12-05 16:53:10 +05:30
TSFenwick	10bec54acc	Switching emitter. This will allow for a per feed emitter designation. (#13363 ) * Switching emitter. This will allow for a per feed emitter designation. This will work by looking at an event's feed and directing it to a specific emitter. If no emitter is specified for a feed, the emitter can direct the event to a default emitter. * fix checkstyle issues and make docs for switching emitter use basic event feeds * fix broken docs, add test, and guard against misconfigurations * add module test add switching emitter module test * fix broken SwitchingEmitterModuleTest * add apache license to top of test * fix checkstyle issues * address comments by adding javadocs, removing a todo, and making druid docs more clear	2022-12-05 16:04:34 +05:30
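The per-feed routing described in 10bec54acc can be sketched in a few lines of Python. This is an assumption-laden illustration, not the Java implementation: the class layout, the `feed` key on events, and the use of plain callables as emitters are all hypothetical.

```python
class SwitchingEmitter:
    """Route each event to the emitters registered for its feed.

    Hypothetical sketch: emitters are plain callables that accept an event
    dict, and events carry their feed name under the "feed" key.
    """

    def __init__(self, feed_to_emitters, default_emitters):
        self.feed_to_emitters = feed_to_emitters
        self.default_emitters = default_emitters

    def emit(self, event):
        # Fall back to the default emitters when the feed has no entry.
        emitters = self.feed_to_emitters.get(event["feed"], self.default_emitters)
        for emitter in emitters:
            emitter(event)


if __name__ == "__main__":
    metrics_sink, default_sink = [], []
    switching = SwitchingEmitter(
        feed_to_emitters={"metrics": [metrics_sink.append]},
        default_emitters=[default_sink.append],
    )
    switching.emit({"feed": "metrics", "value": 1})
    switching.emit({"feed": "alerts", "value": 2})
    print(len(metrics_sink), len(default_sink))
```

Keeping the routing table as a plain mapping from feed name to a list of emitters makes it straightforward to send one feed to several destinations.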
Katya Macedo	78c1a2bd66	Remove limit from timeseries (#13457 ) CI build failures seem unrelated to docs	2022-12-02 12:19:59 -08:00
Jill Osborne	138a6de507	Update nested columns docs (#13461 ) * Update nested columns docs (cherry picked from commit `04206c5179`) * Update nested-columns.md (cherry picked from commit `8085ee7217`)	2022-12-01 10:47:32 -08:00
317brian	cc2e4a80ff	doc: add a basic JDBC tutorial (#13343 ) * initial commit for jdbc tutorial (cherry picked from commit 04c4adad71e5436b76c3425fe369df03aaaf0acb) * add commentary * address comments from charles * add query context to example * fix typo * add links * Apply suggestions from code review Co-authored-by: Frank Chen <frankchen@apache.org> * fix datatype * address feedback * add parameterize to spelling file. the past tense version was already there Co-authored-by: Frank Chen <frankchen@apache.org>	2022-11-30 16:25:35 -08:00
Jill Osborne	291ded22d5	Update experimental features doc (#13452 )	2022-11-30 16:14:43 +05:30
Jill Osborne	5c520e0cf9	Update LDAP configuration docs (#13245 ) * Update LDAP configuration docs * Updated after review * Update auth-ldap.md Updated. * Update auth-ldap.md * Updated spelling file * Update docs/operations/auth-ldap.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/auth-ldap.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/auth-ldap.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update auth-ldap.md Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-11-29 09:26:32 -08:00
Jill Osborne	100a2aa4a2	Update and document experimental features (#13348 ) * Update and document experimental features * Updated * Update experimental-features.md * Update docs/development/experimental-features.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * Updated after review * Updated * Update materialized-view.md * Update experimental-features.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2022-11-29 08:01:28 +05:30
Jill Osborne	db7c29c6f9	Correction to firehose migration doc (#13423 )	2022-11-28 10:24:27 +05:30
Adarsh Sanjeev	280a0f7158	Add sequential sketch merging to MSQ (#13205 ) * Add sketch fetching framework * Refactor code to support sequential merge * Update worker sketch fetcher * Refactor sketch fetcher * Refactor sketch fetcher * Add context parameter and threshold to trigger sequential merge * Fix test * Add integration test for non sequential merge * Address review comments * Address review comments * Address review comments * Resolve maxRetainedBytes * Add new classes * Renamed key statistics information class * Rename fetchStatisticsSnapshotForTimeChunk function * Address review comments * Address review comments * Update documentation and add comments * Resolve build issues * Resolve build issues * Change worker APIs to async * Address review comments * Resolve build issues * Add null time check * Update integration tests * Address review comments * Add log messages and comments * Resolve build issues * Add unit tests * Add unit tests * Fix timing issue in tests	2022-11-22 09:56:32 +05:30
Jill Osborne	68018a808f	Firehose migration doc (#12981 ) * Firehose migration doc * Update migrate-from-firehose-ingestion.md * Updated with review comments and suggestions * Update migrate-from-firehose-ingestion.md * Update migrate-from-firehose-ingestion.md * Update migrate-from-firehose-ingestion.md	2022-11-21 11:17:12 -08:00
Gian Merlino	bfffbabb56	Async task client for SeekableStreamSupervisors. (#13354 ) Main changes: 1) Convert SeekableStreamIndexTaskClient to an interface, move old code to SeekableStreamIndexTaskClientSyncImpl, and add new implementation SeekableStreamIndexTaskClientAsyncImpl that uses ServiceClient. 2) Add "chatAsync" parameter to seekable stream supervisors that causes the supervisor to use an async task client. 3) In SeekableStreamSupervisor.discoverTasks, adjust logic to avoid making blocking RPC calls in workerExec threads. 4) In SeekableStreamSupervisor generally, switch from Futures.successfulAsList to FutureUtils.coalesce, so we can better capture the errors that occurred with contacting individual tasks. Other, related changes: 1) Add ServiceRetryPolicy.retryNotAvailable, which controls whether ServiceClient retries unavailable services. Useful since we do not want to retry calls unavailable tasks within the service client. (The supervisor does its own higher-level retries.) 2) Add FutureUtils.transformAsync, a more lambda friendly version of Futures.transform(f, AsyncFunction). 3) Add FutureUtils.coalesce. Similar to Futures.successfulAsList, but returns Either instead of using null on error. 4) Add JacksonUtils.readValue overloads for JavaType and TypeReference.	2022-11-21 19:20:26 +05:30
Katya Macedo	fd239305d9	Update metrics doc (#13316 ) Changes: - used inline code-style to format dimension names - removed unnecessary punctuation	2022-11-21 09:43:52 +05:30

2918 Commits