druid

Commit Graph

Author	SHA1	Message	Date
Suneet Saldanha	84c11df980	Make LoggingEmitter more useful by using Markers (#14121 ) * Make LoggingEmitter more useful * Skip code coverage for facade classes * fix spellcheck * code review * fix dependency * logging.md * fix checkstyle * Add back jacoco version to main pom	2023-04-27 15:06:06 -07:00
Jill Osborne	d4e478c909	NVL function docs update (#14169 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-04-27 11:17:21 -07:00
TSFenwick	6c99fbea92	fix typo in s3 docs. add readme to s3 module. (#14135 ) * fix typo in s3 docs. add readme to s3 module. * Update extensions-core/s3-extensions/README.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * cleanup readme for s3 extension and link to repo markdown doc instead of web docs --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-04-26 14:03:11 -07:00
Tejaswini Bandlamudi	774073b2e7	Update Hadoop3 as default build version (#14005 ) Hadoop 2 often causes red security scans on the Druid distribution because of the dependencies it brings. We want to move away from Hadoop 2 and make a Hadoop 3 distribution available. Switch Druid to building with Hadoop 3 by default. Druid will still be compatible with Hadoop 2, and users can build a Hadoop 2 compatible distribution using the hadoop2 profile.	2023-04-26 12:52:51 +05:30
Gian Merlino	a7d4162195	Compaction: Block input specs not aligned with segmentGranularity. (#14127 ) * Compaction: Block input specs not aligned with segmentGranularity. When input intervals are not aligned with segmentGranularity, data may be overshadowed if it lies in the space between the input intervals and the output segmentGranularity. In MSQ REPLACE, this is a validation error. IMO the same behavior makes sense for compaction tasks. In case anyone was depending on the ability to compact nonaligned intervals, a configuration parameter allowNonAlignedInterval is provided. I don't expect it to be used much. * Remove unused. * ITCompactionTaskTest uses non-aligned intervals.	2023-04-25 17:06:16 -07:00
Gian Merlino	89e7948159	MSQ: Subclass CalciteJoinQueryTest, other supporting changes. (#14105 ) * MSQ: Subclass CalciteJoinQueryTest, other supporting changes. The main change is the new tests: we now subclass CalciteJoinQueryTest in CalciteSelectJoinQueryMSQTest twice, once for Broadcast and once for SortMerge. Two supporting production changes for default-value mode: 1) InputNumberDataSource is marked as concrete, to allow leftFilter to be pushed down to it. 2) In default-value mode, numeric frame field readers can now return nulls. This is necessary when stacking joins on top of joins: nulls must be preserved for semantics that match broadcast joins and native queries. 3) In default-value mode, StringFieldReader.isNull returns true on empty strings in addition to nulls. This is more consistent with the behavior of the selectors, which map empty strings to null as well in that mode. As an effect of change (2), the InsertTimeNull change from #14020 (to replace null timestamps with default timestamps) is reverted. IMO, this is fine, as either behavior is defensible, and the change from #14020 hasn't been released yet. * Adjust tests. * Style fix. * Additional tests.	2023-04-25 12:10:23 -07:00
TSFenwick	accd5536df	Allow for Log4J to be configured for peons but still ensure console logging is enforced (#14094 ) * Allow for Log4J to be configured for peons but still ensure console logging is enforced This change will allow for log4j to be configured for peons but require console logging is still configured for them to ensure peon logs are saved to deep storage. Also fixed the test ConsoleLoggingEnforcementTest to use a valid appender for the non console Config as the previous config was incorrect and would never return a logger. * fix checkstyle * add warning to logger when it overwrites all loggers to be console * optimize calls for altering logging config for ConsoleLoggingEnforcementConfigurationFactory add getName to the druid logger class * update docs, and error message * edit docs to be more clear * fix checkstyle issues * CI fixes - LoggerTest code coverage and fix spelling issue for logging docs	2023-04-24 10:41:56 -07:00
Adarsh Sanjeev	a7d5c64aeb	Move MSQ temporary storage to a runtime parameter instead of being configured from query context (#14061 ) * Adds new run time parameter druid.indexer.task.tmpStorageBytesPerTask. This sets a limit for the amount of temporary storage disk space used by tasks. This limit is currently only respected by MSQ tasks. * Removes query context parameters intermediateSuperSorterStorageMaxLocalBytes and composedIntermediateSuperSorterStorageEnabled. Composed intermediate super sorter (which was enabled by composedIntermediateSuperSorterStorageEnabled) is now enabled automatically if durableShuffleStorage is set to true. intermediateSuperSorterStorageMaxLocalBytes is calculated from the limit set by the run time parameter druid.indexer.task.tmpStorageBytesPerTask.	2023-04-18 16:56:51 +05:30
Laksh Singla	8eb854c845	Remove maxResultsSize config property from S3OutputConfig (#14101 ) * "maxResultsSize" has been removed from the S3OutputConfig and a default "chunkSize" of 100MiB is now present. This change primarily affects users who wish to use durable storage for MSQ jobs.	2023-04-18 14:25:20 +05:30
Clint Wylie	f6a0888bc0	document arrays in sql (#12549 ) * document arrays in sql * adjustments * Update docs/querying/sql-array-functions.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/querying/sql-data-types.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/querying/sql-data-types.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/querying/sql-array-functions.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/querying/sql-array-functions.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update sql-array-functions.md * fix stuff * fix spelling --------- Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-04-17 19:08:46 -07:00
Abhishek Radhakrishnan	c98c66558f	Include statement attributes in `EXPLAIN PLAN` output (#14074 ) This commit adds attributes that contain metadata information about the query in the EXPLAIN PLAN output. The attributes currently contain two items: - `statementType`: SELECT, INSERT or REPLACE - `targetDataSource`: provides the target datasource name for DML statements It is added to both the legacy and native query plan outputs.	2023-04-17 21:00:25 +05:30
Atul Mohan	e3c160f2f2	Add start_time column to sys.servers (#13358 ) Adds a new column start_time to sys.servers that captures the time at which the server was added to the cluster.	2023-04-14 15:23:34 +05:30
317brian	6c9b7b6efd	msq: add durable storage info (#14035 ) * msq: add durable storage info * fix duplicate row * Apply suggestions from code review Co-authored-by: Karan Kumar <karankumar1100@gmail.com> --------- Co-authored-by: Karan Kumar <karankumar1100@gmail.com>	2023-04-14 13:28:23 +05:30
imply-cheddar	aaa6cc1883	Make the tasks run with only a single directory (#14063 ) * Make the tasks run with only a single directory There was a change that tried to get indexing to run on multiple disks. It made a bunch of changes to how tasks run, effectively hiding the "safe" directory for tasks to write files into from the task code itself, making it extremely difficult to do anything correctly inside of a task. This change reverts those changes inside of the tasks and makes it so that only the task runners are the ones that make decisions about which mount points should be used for storing task-related files. It adds the config druid.worker.baseTaskDirs which can be used by the task runners to know which directories they should schedule tasks inside of. The TaskConfig remains the authoritative source of configuration for where and how an individual task should be operating.	2023-04-13 00:45:02 -07:00
Vadim Ogievetsky	3a7e4efdd6	Docs: updating Kafka input format docs (#14049 ) * updating Kafka input format docs * typo * spellcheck * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/development/extensions-core/kafka-ingestion.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/ingestion/data-formats.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-04-11 20:06:23 -07:00
Abhishek Radhakrishnan	5ce1b0903e	Add basic security functions to druidapi (follow up to #14009 ) (#14055 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Paul Rogers <progers@apache.org>	2023-04-11 10:55:27 -07:00
Gian Merlino	d52bc333aa	Frames: Ensure nulls are read as default values when appropriate. (#14020 ) * Frames: Ensure nulls are read as default values when appropriate. Fixes a bug where LongFieldWriter didn't write a properly transformed zero when writing out a null. This had no meaningful effect in SQL-compatible null handling mode, because the field would get treated as a null anyway. But it does have an effect in default-value mode: it would cause Long.MIN_VALUE to get read out instead of zero. Also adds NullHandling checks to the various frame-based column selectors, allowing reading of nullable frames by servers in default-value mode.	2023-04-10 05:28:46 +05:30
Charles Smith	166cb6203b	Remove unnecessary python topic. Style changes to quickstart. (#13647 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-04-07 09:55:52 -07:00
Vadim Ogievetsky	5ee4ecee62	Web console: use new sampler features (#14017 ) * use new sampler features * support kafka format * update DQT, fix tests * prefer non numeric formats * fix input format step * boost SQL data loader * delete dimension in auto discover mode * inline example specs * feedback updates * yeet the format into valueFormat when switching to kafka * kafka format is now a toggle * even better form layout * rename	2023-04-07 06:28:29 -07:00
Suraj Sanjay Kadam	b4157e32ae	Update api.md (#13436 ) * Update api.md I have created changes in api call of python according to latest version of requests 2.28.1 library. Along with this there are some irregularities between use of <your-instance> and <hostname> so I have tried to fix that also. * Update api.md made some changes in declaring USER and PASSWORD	2023-04-06 15:05:36 -07:00
Charles Smith	1c2744b31e	Fix querying sql (#14026 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-04-06 14:50:06 -07:00
Paul Rogers	030ed911d4	Temporarily revert extended table functions for Druid 26 (#14019 )	2023-04-05 21:09:33 -07:00
Nicholas Lippis	5810e650d4	K8s mm less fixes (#14028 ) Update Fabric8 version and allow metrics monitors to be overridden	2023-04-05 22:23:16 +05:30
Tejaswini Bandlamudi	ccf48245d7	Update documentation for Kafka Supervisor IdleConfig (#14032 )	2023-04-05 21:55:39 +05:30
Karan Kumar	e6a11707cb	Adding query stack fault to MSQ to capture native query errors. (#13926 ) * Add a new fault "QueryRuntimeError" to MSQ engine to capture native query errors. * Fixed bug in MSQ fault tolerance where workers were being retried if `UnexpectedMultiValueDimensionException` was thrown. * An exception from the query runtime with `org.apache.druid.query` as the package name is thrown as a QueryRuntimeError	2023-04-05 16:29:10 +05:30
317brian	7e572eef08	docs: sql unnest and cleanup unnest datasource (#13736 ) Co-authored-by: Elliott Freis <elliottfreis@Elliott-Freis.earth.dynamic.blacklight.net> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Paul Rogers <paul-rogers@users.noreply.github.com> Co-authored-by: Jill Osborne <jill.osborne@imply.io> Co-authored-by: Anshu Makkar <83963638+anshu-makkar@users.noreply.github.com> Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: Elliott Freis <108356317+imply-elliott@users.noreply.github.com> Co-authored-by: Nicholas Lippis <nick.lippis@imply.io> Co-authored-by: Rohan Garg <7731512+rohangarg@users.noreply.github.com> Co-authored-by: Karan Kumar <karankumar1100@gmail.com> Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com> Co-authored-by: Gian Merlino <gianmerlino@gmail.com> Co-authored-by: Clint Wylie <cwylie@apache.org> Co-authored-by: Adarsh Sanjeev <adarshsanjeev@gmail.com> Co-authored-by: Laksh Singla <lakshsingla@gmail.com>	2023-04-04 13:07:54 -07:00
Vadim Ogievetsky	981662e9f4	Web console: add a nice UI for overlord dynamic configs and improve the docs (#13993 ) * in progress * better form * doc updates * doc changes * add inline docs * fix tests * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian 
<53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * final fixes * fix case * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * fix overflow * fix spelling --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2023-03-31 10:12:25 -07:00
Clint Wylie	e3211e3be0	actually backwards compatible frontCoded string encoding strategy (#13996 )	2023-03-31 02:24:12 -07:00
Clint Wylie	2219e68fa3	add backwards compat mode for frontCoded stringEncodingStrategy (#13988 )	2023-03-28 14:44:44 -07:00
Paul Rogers	76fe26d4ba	Fix typos, add tests for http() function (#13954 )	2023-03-28 14:41:06 -07:00
frankgrimes97	2f98675285	Tuple sketch SQL support (#13887 ) This PR is a follow-up to #13819 so that the Tuple sketch functionality can be used in SQL for both ingestion using Multi-Stage Queries (MSQ) and also for analytic queries against Tuple sketch columns.	2023-03-28 18:47:12 +05:30
Rishabh Singh	e8e8082573	Update OIDCConfig with scope information (#13973 ) Allow users to provide custom scope through OIDC configuration	2023-03-28 14:50:00 +05:30
Gian Merlino	062d72b67e	Add timeout to TaskStartTimeoutFault. (#13970 ) * Add timeout to TaskStartTimeoutFault. Makes the error message a bit more useful. * Update docs.	2023-03-27 23:37:19 +05:30
Arnout Engelen	daff7fe73b	Document how to report security issues (#13886 ) Document how to report security issues on the security overview page, so we can link this page from the homepage. That should make all the other important security information easier to find as well.	2023-03-27 11:26:37 +05:30
Atul Mohan	19db32d6b4	Add JWT authenticator support for validating ID Tokens (#13242 ) Expands the OIDC based auth in Druid by adding a JWT Authenticator that validates ID Tokens associated with a request. The existing pac4j authenticator works for authenticating web users while accessing the console, whereas this authenticator is for validating Druid API requests made by Direct clients. Services already supporting OIDC can attach their ID tokens to the Druid requests under the Authorization request header.	2023-03-25 18:41:40 +05:30
Gian Merlino	549018d076	Revert "Update docs." This reverts commit `de27c7d3c1`.	2023-03-24 17:16:12 -07:00
Gian Merlino	de27c7d3c1	Update docs.	2023-03-24 17:15:27 -07:00
Nicholas Lippis	8a72544bd2	Hook up pod template adapter (#13966 ) * Hook up PodTemplateTaskAdapter * Make task adapter TYPE parameters final * Rename adapters types * Include specified adapter name in exception message * Documentation for sidecarSupport deprecation * Fix order * Set TASK_ID as environment variable in PodTemplateTaskAdapter (#13969) * Update docs/development/extensions-contrib/k8s-jobs.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * Hook up PodTemplateTaskAdapter * Make task adapter TYPE parameters final * Rename adapters types * Include specified adapter name in exception message * Documentation for sidecarSupport deprecation * Fix order * fix spelling errors --------- Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2023-03-24 12:13:46 -06:00
Jill Osborne	976d39281f	Fix some broken links in docs (#13968 )	2023-03-24 10:48:23 -07:00
Paul Rogers	da42ee5bfa	Added TYPE(native) data type for external tables (#13958 )	2023-03-22 21:43:29 -07:00
Adarsh Sanjeev	7bab407495	Add segment generator counters to MSQ reports (#13909 ) * Add segment generator counters to reports * Remove unneeded annotation * Fix checkstyle and coverage * Add persist and merged as new metrics * Address review comments * Fix checkstyle * Create metrics class to handle updating counters * Address review comments * Add rowsPushed as a new metrics	2023-03-22 09:17:26 -07:00
Jill Osborne	4f95285406	Correct nested columns JSON example (#13953 )	2023-03-21 09:17:26 -07:00
Karan Kumar	67df1324ee	Undocumenting certain context parameter in MSQ. (#13928 ) * Removing intermediateSuperSorterStorageMaxLocalBytes, maxInputBytesPerWorker, composedIntermediateSuperSorterStorageEnabled, clusterStatisticsMergeMode from docs * Adding documentation in the context class.	2023-03-16 17:56:44 +05:30
317brian	65a663adbb	docs: clarify Java precision (#13671 ) Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-03-15 11:43:41 -07:00
somu-imply	a7ba361666	Refactoring and bug fixes on top of unnest. The allowList now is not passed … (#13922 ) * Refactoring and bug fixes on top of unnest. The filter now is passed inside the unnest cursors. Added tests for scenarios such as 1. filter on unnested column which involves a left filter rewrite 2. filter on unnested virtual column which pushes the filter to the right only and involves no rewrite 3. not filters 4. SQL functions applied on top of unnested column 5. null present in first row of the column to be unnested	2023-03-14 16:05:56 -07:00
Suneet Saldanha	44547614ae	Report engine as a dimension for sqlQuery metrics (#13906 ) * Report engine as a dimension for sqlQuery metrics * docs	2023-03-10 11:23:57 -08:00
Gian Merlino	4b1ffbc452	Various changes and fixes to UNNEST. (#13892 ) * Various changes and fixes to UNNEST. Native changes: 1) UnnestDataSource: Replace "column" and "outputName" with "virtualColumn". This enables pushing expressions into the datasource. This in turn allows us to do the next thing... 2) UnnestStorageAdapter: Logically apply query-level filters and virtual columns after the unnest operation. (Physically, filters are pulled up, when possible.) This is beneficial because it allows filters and virtual columns to reference the unnested column, and because it is consistent with how the join datasource works. 3) Various documentation updates, including declaring "unnest" as an experimental feature for now. SQL changes: 1) Rename DruidUnnestRel (& Rule) to DruidUnnestRel (& Rule). The rel is simplified: it only handles the UNNEST part of a correlated join. Constant UNNESTs are handled with regular inline rels. 2) Rework DruidCorrelateUnnestRule to focus on pulling Projects from the left side up above the Correlate. New test testUnnestTwice verifies that this works even when two UNNESTs are stacked on the same table. 3) Include ProjectCorrelateTransposeRule from Calcite to encourage pushing mappings down below the left-hand side of the Correlate. 4) Add a new CorrelateFilterLTransposeRule and CorrelateFilterRTransposeRule to handle pulling Filters up above the Correlate. New tests testUnnestWithFiltersOutside and testUnnestTwiceWithFilters verify this behavior. 5) Require a context feature flag for SQL UNNEST, since it's undocumented. As part of this, also cleaned up how we handle feature flags in SQL. They're now hooked into EngineFeatures, which is useful because not all engines support all features.	2023-03-10 16:42:08 +05:30
Gian Merlino	fe9d0c46d5	Improve memory efficiency of WrappedRoaringBitmap. (#13889 ) * Improve memory efficiency of WrappedRoaringBitmap. Two changes: 1) Use an int[] for sizes 4 or below. 2) Remove the boolean compressRunOnSerialization. Doesn't save much space, but it does save a little, and it isn't adding a ton of value to have it be configurable. It was originally configurable in case anything broke when enabling it, but it's been a while and nothing has broken. * Slight adjustment. * Adjust for inspection. * Updates. * Update snaps. * Update test. * Adjust test. * Fix snaps.	2023-03-09 15:48:02 -08:00
Gian Merlino	82f7a56475	Sort-merge join and hash shuffles for MSQ. (#13506 ) * Sort-merge join and hash shuffles for MSQ. The main changes are in the processing, multi-stage-query, and sql modules. processing module: 1) Rename SortColumn to KeyColumn, replace boolean descending with KeyOrder. This makes it nicer to model hash keys, which use KeyOrder.NONE. 2) Add nullability checkers to the FieldReader interface, and an "isPartiallyNullKey" method to FrameComparisonWidget. The join processor uses this to detect null keys. 3) Add WritableFrameChannel.isClosed and OutputChannel.isReadableChannelReady so callers can tell which OutputChannels are ready for reading and which aren't. 4) Specialize FrameProcessors.makeCursor to return FrameCursor, a random-access implementation. The join processor uses this to rewind when it needs to replay a set of rows with a particular key. 5) Add MemoryAllocatorFactory, which is embedded inside FrameWriterFactory instead of a particular MemoryAllocator. This allows FrameWriterFactory to be shared in more scenarios. multi-stage-query module: 1) ShuffleSpec: Add hash-based shuffles. New enum ShuffleKind helps callers figure out what kind of shuffle is happening. The change from SortColumn to KeyColumn allows ClusterBy to be used for both hash-based and sort-based shuffling. 2) WorkerImpl: Add ability to handle hash-based shuffles. Refactor the logic to be more readable by moving the work-order-running code to the inner class RunWorkOrder, and the shuffle-pipeline-building code to the inner class ShufflePipelineBuilder. 3) Add SortMergeJoinFrameProcessor and factory. 4) WorkerMemoryParameters: Adjust logic to reserve space for output frames for hash partitioning. (We need one frame per partition.) sql module: 1) Add sqlJoinAlgorithm context parameter; can be "broadcast" or "sortMerge". With native, it must always be "broadcast", or it's a validation error. MSQ supports both. Default is "broadcast" in both engines. 
2) Validate that MSQs do not use broadcast join with RIGHT or FULL join, as results are not correct for broadcast join with those types. Allow this in native for two reasons: legacy (the docs caution against it, but it's always been allowed), and the fact that it actually does generate correct results in native when the join is processed on the Broker. It is much less likely that MSQ will plan in such a way that generates correct results. 3) Remove subquery penalty in DruidJoinQueryRel when using sort-merge join, because subqueries are always required, so there's no reason to penalize them. 4) Move previously-disabled join reordering and manipulation rules to FANCY_JOIN_RULES, and enable them when using sort-merge join. Helps get to better plans where projections and filters are pushed down. * Work around compiler problem. * Updates from static analysis. * Fix @param tag. * Fix declared exception. * Fix spelling. * Minor adjustments. * wip * Merge fixups * fixes * Fix CalciteSelectQueryMSQTest * Empty keys are sortable. * Address comments from code review. Rename mux -> mix. * Restore inspection config. * Restore original doc. * Reorder imports. * Adjustments * Fix. * Fix imports. * Adjustments from review. * Update header. * Adjust docs.	2023-03-08 14:19:39 -08:00
Abhishek Agarwal	52bd9e6adb	Improved error message when topic name changes within same supervisor (#13815 ) Improved error message when topic name changes within same supervisor Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-03-07 18:10:18 -08:00
Adarsh Sanjeev	ef82756176	Add validation for aggregations on __time (#13793 ) * Add validation for aggregations on __time	2023-03-07 17:16:36 -08:00
Karan Kumar	94cfabea18	Suggested memory calculation in case NOT_ENOUGH_MEMORY_FAULT is thrown. (#13846 ) * Suggested memory calculation in case NOT_ENOUGH_MEMORY_FAULT is thrown. Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-03-06 18:00:36 +05:30
Paul Rogers	a580aca551	Python Druid API for use in notebooks (#13787 ) Python Druid API for use in notebooks. Revises existing notebooks and readme to reference the new API. Notebook to explain the new API. Split README into a console version and a notebook version to work around lack of a nice display for md files. Update the REST API notebook to use simpler Requests calls. Converted the SQL tutorial to use the Python library README file, converted to using properties --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-03-04 18:25:19 -08:00
Anshu Makkar	a10e4150d5	Add Post Aggregators for Tuple Sketches (#13819 ) You can now do the following operations with Tuple sketches in the post-aggregation step: get the sketch output as a Base64 string; provide a constant Tuple sketch in the post-aggregation step that can be used in set operations; get the estimated value (sum) of summary/metrics objects associated with a Tuple sketch.	2023-03-03 09:32:09 +05:30
317brian	b4b354b658	docs: fix html nits (#13835 )	2023-03-02 11:19:32 -08:00
Jill Osborne	26c5cac41a	Fix a link problem (#13876 )	2023-03-02 09:09:51 -08:00
Tejaswini Bandlamudi	7103cb4b9d	Removes FiniteFirehoseFactory and its implementations (#12852 ) The FiniteFirehoseFactory and InputRowParser classes were deprecated in 0.17.0 (#8823) in favor of InputSource & InputFormat. This PR removes the FiniteFirehoseFactory and all its implementations along with classes solely used by them like Fetcher (Used by PrefetchableTextFilesFirehoseFactory). Refactors classes including tests using FiniteFirehoseFactory to use InputSource instead. Removing InputRowParser may not be as trivial as many classes that aren't deprecated depends on it (with no alternatives), like EventReceiverFirehoseFactory. Hence FirehoseFactory, EventReceiverFirehoseFactory, and Firehose are marked deprecated.	2023-03-02 18:07:17 +05:30
Apoorv Gupta	b26f1b4a5d	Update datasources.md: Fix Documentation. (#13865 ) Fixed documentation to clarify that union queries cannot be run over query datasources.	2023-03-01 20:29:15 +05:30
Laksh Singla	ca68fd93a6	Generate tombstones when running MSQ's replace (#13706 ) * When running REPLACE queries, the segments which contain no data are dropped (marked as unused). This PR aims to generate tombstones in place of segments which contain no data to mark their deletion, as is the behavior with the native ingestion. This will cause InsertCannotReplaceExistingSegmentFault to be removed since it was generated if the interval to be marked unused didn't fully overlap one of the existing segments to replace.	2023-03-01 12:01:30 +05:30
AdheipSingh	22e516fd53	Update kubernetes.md (#13858 )	2023-02-28 11:20:24 -08:00
Kashif Faraz	12f62e2c42	Clarify doc of ingest/handoff/time metric (#13856 )	2023-02-28 10:37:47 +05:30
Victoria Lim	e46379ba7a	Docs: Update name of the metadata tables (#13734 ) * Update name of the metadata tables * emend spelling file * fix spelling --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-02-23 13:57:59 -08:00
tejasparbat	d74d6824ec	update LDAP endpoint (#13839 ) Current DOC at step https://druid.apache.org/docs/latest/operations/auth-ldap.html#add-an-ldap-user-to-druid-and-assign-a-role Example request to add the LDAP user myuser to Druid: curl -i -v -H "Content-Type: application/json" -u internal -X POST http://localhost:8081/druid-ext/basic-security/authentication/db/ldap/users/myuser Example request to assign the myuser user to the queryRole role: curl -i -v -H "Content-Type: application/json" -u internal -X POST http://localhost:8081/druid-ext/basic-security/authentication/db/ldap/users/myuser/roles/queryRole Expected: Example request to add the LDAP user myuser to Druid: curl -i -v -H "Content-Type: application/json" -u internal -X POST http://localhost:8081/druid-ext/basic-security/authorization/db/ldapauth/users/myuser Example request to assign the myuser user to the queryRole role curl -i -v -H "Content-Type: application/json" -u internal -X POST http://localhost:8081/druid-ext/basic-security/authorization/db/ldapauth/users/myuser/roles/queryRole	2023-02-23 13:55:06 -08:00
Win Min Soe	70f9052f1d	docs: update correct config base on server spec (#13832 ) Co-authored-by: Winn Minn <winn.minn@grabtaxi.com>	2023-02-23 08:50:47 -08:00
Abhishek Radhakrishnan	17a3cd0b68	Remove the additional backtick that's causing a SA issue. (#13838 )	2023-02-23 09:01:08 +05:30
benkrug	66034dd8bc	Update default for finalize in query-context.md (#13763 ) Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-02-22 12:35:36 -08:00
Katya Macedo	1595653e6f	docs: add a link for the Druid SQL tutorial (#13468 ) * docs: add juptyer API tutorial for API and jupyter tutorial index (#3) (cherry picked from commit aeb8d9e3390fa26d9c533dce0862295b80c58583) * update prereqs and fix jupyterlab name * Removing notebook since 13345 has it 13345 should be merged first * update contributing instructions * docs: link to the Druid SQL tutorial * Add link to partitioning * fix merge conflict * Saving * Update docs/tutorials/tutorial-jupyter-index.md * Remove partitioning --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: brian.le <brian.le@imply.io> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-02-22 09:36:13 -08:00
317brian	07883e311e	doc: fix unnecessary link (#13785 ) CI errors look unrelated to this change.	2023-02-21 17:34:46 -08:00
zachjsh	665dee43bf	Revert "Operator conversion deny list (#13766 )" (#13829 ) This reverts commit `38e620aa4c`.	2023-02-21 15:14:49 -08:00
Paul Rogers	5dadbdf4d0	Generate the IT docker-compose.yaml files (#13669 ) Generate IT docker-compose.sh files Generates test-specific docker-compose.sh files using a simple Python template script.	2023-02-21 15:03:02 -08:00
benkrug	c6b1576fc1	Update clean-metadata-store.md (#13131 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-02-21 12:53:54 -08:00
Paul Rogers	85d36be085	Information schema now uses numeric column types (#13777 ) Change to use SQL schemas to allow null numeric columns * Updated docs	2023-02-17 14:39:31 -08:00
Katya Macedo	bc8b710b7e	Fix broken link (#13767 )	2023-02-17 09:02:12 -08:00
Churro	c1f283fd31	Better sidecar support (#13655 ) * Better sidecar support * remove un-thrown exception from test * Druid you are such a stickler about spelling :) * Only require the primaryContainerName, no need to exclude containers	2023-02-14 10:56:15 +05:30
Guy ☀️ Moore	306997be87	Add Perl 5 to druid requirements (#13708 ) Without perl 5 I was unable to start druid using the instructions in the quickstart guide. I'm not certain what versions it might require, but the one that I got working was perl 5 > This is perl 5, version 36, subversion 0 (v5.36.0) built for x86_64-linux-thread-multi	2023-02-13 13:34:49 -08:00
zachjsh	38e620aa4c	Operator conversion deny list (#13766 ) ### Description This change adds a new config property `druid.sql.planner.operatorConversion.denyList`, which allows a user to specify any operator conversions that they wish to disallow. A user may want to do this for a number of reasons, including security concerns. The default value of this property is the empty list `[]`, which does not disallow any operator conversions. An example usage of this property is `druid.sql.planner.operatorConversion.denyList=["extern"]`, which disallows the usage of the `extern` operator conversion. If the property is configured this way, and a user of the Druid cluster tries to submit a query that uses the `extern` function, such as the example given [here](https://druid.apache.org/docs/latest/multi-stage-query/examples.html#insert-with-no-rollup), a response with http response code `400` is returned with en error body similar to the following: ``` { "taskId": "4ec5b0b6-fa9b-4c3a-827d-2308294e9985", "state": "FAILED", "error": { "error": "Plan validation failed", "errorMessage": "org.apache.calcite.runtime.CalciteContextException: From line 28, column 5 to line 32, column 5: No match found for function signature EXTERN(<CHARACTER>, <CHARACTER>, <CHARACTER>)", "errorClass": "org.apache.calcite.tools.ValidationException", "host": null } } ```	2023-02-10 09:59:26 -08:00
Anshu Makkar	d7b95988d7	Add missing documentation for constant post-aggregator (#13664 ) Thanks @anshu-makkar , I was waiting for CI to complete yesterday. Failures seem unrelated, so merging.	2023-02-09 08:53:45 -08:00
Suneet Saldanha	714ac07b52	Allow users to add additional metadata to ingestion metrics (#13760 ) * Allow users to add additional metadata to ingestion metrics When submitting an ingestion spec, users may pass a map of metadata in the ingestion spec config that will be added to ingestion metrics. This will make it possible for operators to tag metrics with other metadata that doesn't necessarily line up with the existing tags like taskId. Druid clusters that ingest these metrics can take advantage of the nested data columns feature to process this additional metadata. * rename to tags * docs * tests * fix test * make code cov happy * checkstyle	2023-02-08 18:07:23 -08:00
AmatyaAvadhanula	0cf1fc3d55	Indexing on multiple disks (#13476 ) * Initial commit * Simple UTs * Parameterize tests * Parameterized tests for k8s task runner * Fix restore bug * Refactor TaskStorageDirTracker * Change CliPeon args	2023-02-08 11:31:34 +05:30
AmatyaAvadhanula	dcdae84888	Add server view initialization metrics (#13716 ) * Add server view init metrics * Test coverage * Rename metrics	2023-02-07 20:02:00 +05:30
Suneet Saldanha	bea18dc9e4	Update basic auth examples (#13750 )	2023-02-03 14:45:48 -08:00
drudi-at-coffee	7580248770	Update api.md (#13727 ) Added missing '/status' in HTTP status request	2023-02-02 10:43:22 -08:00
Victoria Lim	33efd5ab1d	docs: Refresh the update data tutorial (#13641 ) Merging regardless of nit since topic is in better shape. * refresh the update data tutorial * Apply suggestions from code review Co-authored-by: Jill Osborne <jill.osborne@imply.io> --------- Co-authored-by: Jill Osborne <jill.osborne@imply.io>	2023-02-01 18:18:16 -08:00
Kashif Faraz	f629643c50	Fix value of lookup sync period in docs (#13695 ) * Fix lookup docs * Fix spelling * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-02-01 18:12:00 -08:00
Sergio Ferragut	7f830b20d7	fixed init commands for both mysql and postgresql (#13713 )	2023-02-01 18:07:31 -08:00
Suneet Saldanha	cfc3115a59	Compaction history returns empty list instead of 404 when not found (#13730 ) * Compaction history returns empty list instead of 404 when not found * checkstyle	2023-02-01 17:44:07 -08:00
Tijo Thomas	1beef30bb2	Support postaggregation function as in Math.pow() (#13703 ) (#13704 ) Support postaggregation function as in Math.pow()	2023-01-31 22:55:04 +05:30
Adarsh Sanjeev	51dfde0284	Add maxInputBytesPerWorker as query context parameter (#13707 ) * Add maxInputBytesPerWorker as query context parameter * Move documenation to msq specific docs * Update tests * Spacing * Address review comments * Fix test * Update docs/multi-stage-query/reference.md * Correct spelling mistake --------- Co-authored-by: Karan Kumar <karankumar1100@gmail.com>	2023-01-31 20:55:28 +05:30
Jill Osborne	356b0e37cf	Tutorial: Query view (#13565 ) * Tutorial: Query view * Removed duplicate file * Update tutorial-sql-query-view.md * Update tutorial-sql-query-view.md * Update tutorial-sql-query-view.md * Updated after review * Update docs/tutorials/tutorial-sql-query-view.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update tutorial-sql-query-view.md Update title * Update sidebars.json fix merge conflict w/ sidebar * address spelling ci --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-01-27 14:29:43 -08:00
sairam devarashetty	6164c420a1	Create update.md (#13451 ) * Create update.md Important Line highlighted * Update docs/data-management/update.md Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-01-25 16:23:40 -08:00
317brian	9021161c8c	doc: fix markdown spacing (#13683 ) * doc: fix markdown spacing * fix spacing	2023-01-25 16:22:49 -08:00
Victoria Lim	00cee329bd	pitfall when using combining input source (#13639 )	2023-01-25 12:50:19 -08:00
Suneet Saldanha	016c881795	Add API to return automatic compaction config history (#13699 ) Add a new API to return the history of changes to automatic compaction config history to make it easy for users to see what changes have been made to their auto-compaction config. The API is scoped per dataSource to allow users to triage issues with an individual dataSource. The API responds with a list of configs when there is a change to either the settings that impact all auto-compaction configs on a cluster or the dataSource in question.	2023-01-23 13:23:45 -08:00
Rohan Garg	f76acccff2	Allow using composed storage for SuperSorter intermediate data (#13368 )	2023-01-24 01:02:03 +05:30
Eyal Yurman	44374f91bc	Fix broken links to Oracle JDK docs (#13687 ) * Fix broken link for SSLContext java doc * Update tls-support.md * Update tls-support.md * Update tls-support.md * Update simple-client-sslcontext.md	2023-01-18 14:46:08 +05:30
Paul Rogers	22630b0aab	Much improved table functions (#13627 ) Much improved table functions * Revises properties, definitions in the catalog * Adds a "table function" abstraction to model such functions * Specific functions for HTTP, inline, local and S3. * Extended SQL types in the catalog * Restructure external table definitions to use table functions * EXTEND syntax for Druid's extern table function * Support for array-valued table function parameters * Support for array-valued SQL query parameters * Much new documentation	2023-01-17 08:41:57 -08:00
Gian Merlino	182c4fad29	Kinesis: More robust default fetch settings. (#13539 ) * Kinesis: More robust default fetch settings. 1) Default recordsPerFetch and recordBufferSize based on available memory rather than using hardcoded numbers. For this, we need an estimate of record size. Use 10 KB for regular records and 1 MB for aggregated records. With 1 GB heaps, 2 processors per task, and nonaggregated records, recordBufferSize comes out to the same as the old default (10000), and recordsPerFetch comes out slightly lower (1250 instead of 4000). 2) Default maxRecordsPerPoll based on whether records are aggregated or not (100 if not aggregated, 1 if aggregated). Prior default was 100. 3) Default fetchThreads based on processors divided by task count on Indexers, rather than overall processor count. 4) Additionally clean up the serialized JSON a bit by adding various JsonInclude annotations. * Updates for tests. * Additional important verify.	2023-01-13 11:03:54 +05:30
Vadim Ogievetsky	93dc01b6c5	fix broken table missing new line (#13666 )	2023-01-12 15:29:51 -08:00
Vadim Ogievetsky	f97bcc69d3	Docs: reword single server page (#13659 ) * reword single server page * fix typo * Update docs/operations/single-server.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * spelling Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-01-11 21:12:52 -08:00
Karan Kumar	56076d33fb	Worker retry for MSQ task (#13353 ) * Initial commit. * Fixing error message in retry exceeded exception * Cleaning up some code * Adding some test cases. * Adding java docs. * Finishing up state test cases. * Adding some more java docs and fixing spot bugs, intellij inspections * Fixing intellij inspections and added tests * Documenting error codes * Migrate current integration batch tests to equivalent MSQ tests (#13374) * Migrate current integration batch tests to equivalent MSQ tests using new IT framework * Fix build issues * Trigger Build * Adding more tests and addressing comments * fixBuildIssues * fix dependency issues * Parameterized the test and addressed comments * Addressing comments * fixing checkstyle errors * Adressing comments * Adding ITTest which kills the worker abruptly * Review comments phase one * Adding doc changes * Adjusting for single threaded execution. * Adding Sequential Merge PR state handling * Merge things * Fixing checkstyle. * Adding new context param for fault tolerance. Adding stale task handling in sketchFetcher. Adding UT's. * Merge things * Merge things * Adding parameterized tests Created separate module for faultToleranceTests * Adding missed files * Review comments and fixing tests. * Documentation things. * Fixing IT * Controller impl fix. * Fixing racy WorkerSketchFetcherTest.java exception handling. Co-authored-by: abhagraw <99210446+abhagraw@users.noreply.github.com> Co-authored-by: Karan Kumar <cryptoe@karans-mbp.lan>	2023-01-11 07:38:29 +05:30
Abhishek Agarwal	17936e2920	Add an option to enable HSTS in druid services (#13489 ) * Add an option to enable HSTS * Fix code and add docs * Deduplicate headers * unused import * Fix spelling	2023-01-10 22:31:51 +05:30
Victoria Lim	a800dae87a	doc: List Protobuf as a supported format (#13640 )	2023-01-06 15:09:37 -08:00
317brian	6bbf4266b2	docs: documentation for unnest datasource (#13479 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-01-06 11:41:11 -08:00
Kashif Faraz	0d97e658b2	Docs: Update quickstart instructions (#13611 ) Changes: - Remove specification of a Druid version in the quickstart, because the previous step instructs downloading the latest version anyway. - Mention usage of memory parameter in the quickstart	2022-12-22 11:51:08 +05:30
Vadim Ogievetsky	07597c687d	Docs: Remove large data file (#13595 )	2022-12-19 13:14:22 +05:30
Gian Merlino	ee890965f4	LocalInputSource: Serialize File paths without forcing resolution. (#13534 ) * LocalInputSource: Serialize File paths without forcing resolution. Fixes #13359. * Add one more javadoc.	2022-12-19 11:47:36 +05:30
Victoria Lim	09d8b16447	Document shouldFinalize for sketches that have the parameter (#13524 ) Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-12-17 10:48:06 -08:00
317brian	d9c27d6102	docs: add index page and related stuff for jupyter tutorials (#13342 )	2022-12-16 13:33:50 -08:00
Gian Merlino	7f3c117e3a	SQL: Improve docs around casts. (#13466 ) Main change: clarify that the "default value" for casts only applies if druid.generic.useDefaultValueForNull = true. Secondary change: adjust a bunch of wording from future to present tense.	2022-12-15 15:01:40 -08:00
Kashif Faraz	d6949b1b79	Track input processedBytes with MSQ ingestion (#13559 ) Follow up to #13520 Bytes processed are currently tracked for intermediate stages in MSQ ingestion. This patch adds the capability to track the bytes processed by an MSQ controller task while reading from an external input source or a segment source. Changes: - Track `processedBytes` for every `InputSource` read in `ExternalInputSliceReader` - Update `ChannelCounters` with the above obtained `processedBytes` when incrementing the input file count. - Update task report structure in docs The total input processed bytes can be obtained by summing the `processedBytes` as follows: totalBytes = 0 for every root stage (i.e. a stage which does not have another stage as an input): for every worker in that stage: for every input channel: (i.e. channels with prefix "input", e.g. "input0", "input1", etc.) totalBytes += processedBytes	2022-12-16 02:20:01 +05:30
Adarsh Sanjeev	2b605aa9cf	Multiple fixes for the MSQ stats merging piece which (#13463 ) * Add validation checks to worker chat handler apis * Merge things and polishing the error messages. * Minor error message change * Fixing race and adding some tests * Fixing controller fetching stats from wrong workers. Fixing race Changing default mode to Parallel Adding logging. Fixing exceptions not propagated properly. * Changing to kernel worker count * Added a better logic to figure out assigned worker for a stage. * Nits * Moving to existing kernel methods * Adding more coverage Co-authored-by: cryptoe <karankumar1100@gmail.com>	2022-12-15 09:35:11 +05:30
Vadim Ogievetsky	2729e25295	Link to java docs (#13478 ) * add link to page about selecting a JRE * add link to script also * simplify text	2022-12-14 11:45:23 -08:00
Gian Merlino	de5a4bafcb	Zero-copy local deep storage. (#13394 ) * Zero-copy local deep storage. This is useful for local deep storage, since it reduces disk usage and makes Historicals able to load segments instantaneously. Two changes: 1) Introduce "druid.storage.zip" parameter for local storage, which defaults to false. This changes default behavior from writing an index.zip to writing a regular directory. This is safe to do even during a rolling update, because the older code actually already handled unzipped directories being present on local deep storage. 2) In LocalDataSegmentPuller and LocalDataSegmentPusher, use hard links instead of copies when possible. (Generally this is possible when the source and destination directory are on the same filesystem.)	2022-12-12 17:28:24 -08:00
Rishabh Singh	4ebdfe226d	Druid automated quickstart (#13365 ) * Druid automated quickstart * remove conf/druid/single-server/quickstart/_common/historical/jvm.config * Minor changes in python script * Add lower bound memory for some services * Additional runtime properties for services * Update supervise script to accept command arguments, corresponding changes in druid-quickstart.py * File end newline * Limit the ability to start multiple instances of a service, documentation changes * simplify script arguments * restore changes in medium profile * run-druid refactor * compute and pass middle manager runtime properties to run-druid supervise script changes to process java opts array use argparse, leave free memory, logging * Remove extra quotes from mm task javaopts array * Update logic to compute minimum memory * simplify run-druid * remove debug options from run-druid * resolve the config_path provided * comment out service specific runtime properties which are computed in the code * simplify run-druid * clean up docs, naming changes * Throw ValueError exception on illegal state * update docs * rename args, compute_only -> compute, run_zk -> zk * update help documentation * update help documentation * move task memory computation into separate method * Add validation checks * remove print * Add validations * remove start-druid bash script, rename start-druid-main * Include tasks in lower bound memory calculation * Fix test * 256m instead of 256g * caffeine cache uses 5% of heap * ensure min task count is 2, task count is monotonic * update configs and documentation for runtime props in conf/druid/single-server/quickstart * Update docs * Specify memory argument for each profile in single-server.md * Update middleManager runtime.properties * Move quickstart configs to conf/druid/base, add bash launch script, support python2 * Update supervise script * rename base config directory to auto * rename python script, changes to pass repeated args to supervise * remove exmaples/conf/druid/base dir * add docs * restore changes in conf dir * update start-druid-auto * remove hashref for commands in supervise script * start-druid-main java_opts array is comma separated * update entry point script name in python script * Update help docs * documentation changes * docs changes * update docs * add support for running indexer * update supported services list * update help * Update python.md * remove dir * update .spelling * Remove dependency on psutil and pathlib * update docs * Update get_physical_memory method * Update help docs * update docs * update method to get physical memory on python * udpate spelling * update .spelling * minor change * Minor change * memory comptuation for indexer * update start-druid * Update python.md * Update single-server.md * Update python.md * run python3 --version to check if python is installed * Update supervise script * start-druid: echo message if python not found * update anchor text * minor change * Update condition in supervise script * JVM not jvm in docs	2022-12-09 11:04:02 -08:00
Paul Rogers	013a12e86f	Enhanced MSQ table functions (#13360 ) * Enhanced MSQ table functions * HTTP, LOCALFILES and INLINE table functions powered by catalog metadata. * Documentation	2022-12-08 13:56:02 -08:00
Gian Merlino	91ef9872ec	MSQ: Improve TooManyBuckets error message, improve error docs. (#13525 ) 1) Edited the TooManyBuckets error message to mention PARTITIONED BY instead of segmentGranularity. 2) Added error-code-specific anchors in the docs. 3) Add information to various error codes in the docs about common causes and solutions.	2022-12-08 13:18:26 -08:00
Jill Osborne	b56855b837	Update to native ingestion doc (#13482 ) * Update to native ingestion doc * Update docs/ingestion/native-batch.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update native-batch.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-12-07 15:08:19 +05:30
Vadim Ogievetsky	9679f6a9b5	Web console: add arrayOfDoublesSketch and other small fixes (#13486 ) * add padding and keywords * add arrayOfDoubles * Update docs/development/extensions-core/datasketches-tuple.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/development/extensions-core/datasketches-tuple.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/development/extensions-core/datasketches-tuple.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/development/extensions-core/datasketches-tuple.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/development/extensions-core/datasketches-tuple.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * partiton int * fix docs Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-12-06 21:21:49 -08:00
Kashif Faraz	c7229fc787	Limit max batch size for segment allocation, add docs (#13503 ) Changes: - Limit max batch size in `SegmentAllocationQueue` to 500 - Rename `batchAllocationMaxWaitTime` to `batchAllocationWaitTime` since the actual wait time may exceed this configured value. - Replace usage of `SegmentInsertAction` in `TaskToolbox` with `SegmentTransactionalInsertAction`	2022-12-07 10:07:14 +05:30
Gian Merlino	fda0a1aadd	Set chatAsync default to true. (#13491 ) This functionality was originally added in #13354.	2022-12-05 20:53:59 -08:00
Kashif Faraz	65945a686f	Docs: Update docs for coordinator dynamic config (#13494 ) * Update docs for useBatchedSegmentSampler * Update docs for round robin assigment	2022-12-05 16:53:10 +05:30
TSFenwick	10bec54acc	Switching emitter. This will allow for a per feed emitter designation. (#13363 ) * Switching emitter. This will allow for a per feed emitter designation. This will work by looking at an event's feed and direct it to a specific emitter. If no specific feed is specified for a feed. The emitter can direct the event to a default emitter. * fix checkstyle issues and make docs for switching emitter use basic event feeds * fix broken docs, add test, and guard against misconfigurations * add module test add switching emitter module test * fix broken SwitchingEmitterModuleTest * add apache license to top of test * fix checkstyle issues * address comments by adding javadocs, removing a todo, and making druid docs more clear	2022-12-05 16:04:34 +05:30
Katya Macedo	78c1a2bd66	Remove limit from timeseries (#13457 ) CI build failures seem unrelated to docs	2022-12-02 12:19:59 -08:00
Jill Osborne	138a6de507	Update nested columns docs (#13461 ) * Update nested columns docs (cherry picked from commit `04206c5179`) * Update nested-columns.md (cherry picked from commit `8085ee7217`)	2022-12-01 10:47:32 -08:00
317brian	cc2e4a80ff	doc: add a basic JDBC tutorial (#13343 ) * initial commit for jdbc tutorial (cherry picked from commit 04c4adad71e5436b76c3425fe369df03aaaf0acb) * add commentary * address comments from charles * add query context to example * fix typo * add links * Apply suggestions from code review Co-authored-by: Frank Chen <frankchen@apache.org> * fix datatype * address feedback * add parameterize to spelling file. the past tense version was already there Co-authored-by: Frank Chen <frankchen@apache.org>	2022-11-30 16:25:35 -08:00
Jill Osborne	291ded22d5	Update experimental features doc (#13452 )	2022-11-30 16:14:43 +05:30
Jill Osborne	5c520e0cf9	Update LDAP configuration docs (#13245 ) * Update LDAP configuration docs * Updated after review * Update auth-ldap.md Updated. * Update auth-ldap.md * Updated spelling file * Update docs/operations/auth-ldap.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/auth-ldap.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/auth-ldap.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update auth-ldap.md Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-11-29 09:26:32 -08:00
Jill Osborne	100a2aa4a2	Update and document experimental features (#13348 ) * Update and document experimental features * Updated * Update experimental-features.md * Update docs/development/experimental-features.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * Updated after review * Updated * Update materialized-view.md * Update experimental-features.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2022-11-29 08:01:28 +05:30
Jill Osborne	db7c29c6f9	Correction to firehose migration doc (#13423 )	2022-11-28 10:24:27 +05:30
Adarsh Sanjeev	280a0f7158	Add sequential sketch merging to MSQ (#13205 ) * Add sketch fetching framework * Refactor code to support sequential merge * Update worker sketch fetcher * Refactor sketch fetcher * Refactor sketch fetcher * Add context parameter and threshold to trigger sequential merge * Fix test * Add integration test for non sequential merge * Address review comments * Address review comments * Address review comments * Resolve maxRetainedBytes * Add new classes * Renamed key statistics information class * Rename fetchStatisticsSnapshotForTimeChunk function * Address review comments * Address review comments * Update documentation and add comments * Resolve build issues * Resolve build issues * Change worker APIs to async * Address review comments * Resolve build issues * Add null time check * Update integration tests * Address review comments * Add log messages and comments * Resolve build issues * Add unit tests * Add unit tests * Fix timing issue in tests	2022-11-22 09:56:32 +05:30
Jill Osborne	68018a808f	Firehose migration doc (#12981 ) * Firehose migration doc * Update migrate-from-firehose-ingestion.md * Updated with review comments and suggestions * Update migrate-from-firehose-ingestion.md * Update migrate-from-firehose-ingestion.md * Update migrate-from-firehose-ingestion.md	2022-11-21 11:17:12 -08:00
Gian Merlino	bfffbabb56	Async task client for SeekableStreamSupervisors. (#13354 ) Main changes: 1) Convert SeekableStreamIndexTaskClient to an interface, move old code to SeekableStreamIndexTaskClientSyncImpl, and add new implementation SeekableStreamIndexTaskClientAsyncImpl that uses ServiceClient. 2) Add "chatAsync" parameter to seekable stream supervisors that causes the supervisor to use an async task client. 3) In SeekableStreamSupervisor.discoverTasks, adjust logic to avoid making blocking RPC calls in workerExec threads. 4) In SeekableStreamSupervisor generally, switch from Futures.successfulAsList to FutureUtils.coalesce, so we can better capture the errors that occurred with contacting individual tasks. Other, related changes: 1) Add ServiceRetryPolicy.retryNotAvailable, which controls whether ServiceClient retries unavailable services. Useful since we do not want to retry calls unavailable tasks within the service client. (The supervisor does its own higher-level retries.) 2) Add FutureUtils.transformAsync, a more lambda friendly version of Futures.transform(f, AsyncFunction). 3) Add FutureUtils.coalesce. Similar to Futures.successfulAsList, but returns Either instead of using null on error. 4) Add JacksonUtils.readValue overloads for JavaType and TypeReference.	2022-11-21 19:20:26 +05:30
Katya Macedo	fd239305d9	Update metrics doc (#13316 ) Changes: - used inline code-style to format dimension names - removed unnecessary punctuation	2022-11-21 09:43:52 +05:30
Jill Osborne	a860baf496	Updated docs on front coding (#13387 )	2022-11-19 00:01:04 -08:00
Laksh Singla	9e938b5a6f	Add a limit to the number of columns in the CLUSTERED BY clause (#13352 ) * Add clustered by limit * change semantics, add docs * add fault class to the module * add test * unambiguate test	2022-11-15 22:05:15 +05:30
Clint Wylie	1231ce3b75	dump-segment tool support for examining nested columns (#13356 ) * add nested mode to dump segment tool to dump nested columns * docs * more test * fix it	2022-11-14 16:08:47 -08:00
Jill Osborne	b0db2a87d8	Update Kafka ingestion tutorial (#13261 ) * Update Kafka ingestion tutorial * Update tutorial-kafka.md Updated location of sample data file * Added sample data file * Update tutorial-kafka.md * Add sample data file * Update tutorial-kafka.md Updated sample file location in curl commands * Update and reuploading sample data files * Updated spelling file * Delete .spelling * Added spelling file * Update docs/tutorials/tutorial-kafka.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-kafka.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Updated after review * Update tutorial-kafka.md * Updated * Update tutorial-kafka.md * Update tutorial-kafka.md * Update tutorial-kafka.md * Updated sample data file and command * Add files via upload * Delete kttm-nested-data.json.tgz * Delete kttm-nested-data.json.tgz * Add files via upload * Update tutorial-kafka.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-11-11 14:47:54 -08:00
Jill Osborne	47dd4ed2e7	Added experimental feature text for front coding feature (#13349 )	2022-11-11 02:06:13 -08:00
Didip Kerabat	56d5c9780d	Use standard library to correctly glob and stop at the correct folder structure when filtering cloud objects (#13027 ) * Use standard library to correctly glob and stop at the correct folder structure when filtering cloud objects. Removed: import org.apache.commons.io.FilenameUtils; Add: import java.nio.file.FileSystems; import java.nio.file.PathMatcher; import java.nio.file.Paths; * Forgot to update CloudObjectInputSource as well. * Fix tests. * Removed unused exceptions. * Able to reduced user mistakes, by removing the protocol and the bucket on filter. * add 1 more test. * add comment on filterWithoutProtocolAndBucket * Fix lint issue. * Fix another lint issue. * Replace all mention of filter -> objectGlob per convo here: https://github.com/apache/druid/pull/13027#issuecomment-1266410707 * fix 1 bad constructor. * Fix the documentation. * Don’t do anything clever with the object path. * Remove unused imports. * Fix spelling error. * Fix incorrect search and replace. * Addressing Gian’s comment. * add filename on .spelling * Fix documentation. * fix documentation again Co-authored-by: Didip Kerabat <didip@apple.com>	2022-11-10 23:46:40 -08:00
Gian Merlino	77478f25fb	Add taskActionType dimension to task/action/run/time. (#13333 ) * Add taskActionType dimension to task/action/run/time. * Spelling.	2022-11-11 12:00:08 +05:30
Andreas Maechler	03175a2b8d	Add missing MSQ error code fields to docs (#13308 ) * Fix typo * Fix some spacing * Add missing fields * Cleanup table spacing * Remove durable storage docs again Thanks Brian for pointing out previous discussions. * Update docs/multi-stage-query/reference.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Mark codes as code * And even more codes as code * Another set of spaces * Combine `ColumnTypeNotSupported` Thanks Karan. * More whitespaces and typos * Add spelling and fix links Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-11-10 21:03:04 +05:30
Jill Osborne	c2210c4e09	Update ingestion spec doc (#13329 ) * Update ingestion spec doc * Updated * Updated * Update docs/ingestion/ingestion-spec.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * Updated * Updated Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2022-11-10 02:54:35 -08:00
Jill Osborne	965e41538e	Update nested columns doc (#13314 ) * Updated nested columns doc * Update nested-columns.md * Update nested-columns.md	2022-11-10 09:53:28 +08:00
AmatyaAvadhanula	a2013e6566	Enhance streaming ingestion metrics (#13331 ) Changes: - Add a metric for partition-wise kafka/kinesis lag for streaming ingestion. - Emit lag metrics for streaming ingestion when supervisor is not suspended and state is in {RUNNING, IDLE, UNHEALTHY_TASKS, UNHEALTHY_SUPERVISOR} - Document metrics	2022-11-09 23:44:15 +05:30
Laksh Singla	b7a513fe09	Add a OverlordHelper that cleans up durable storage objects in MSQ (#13269 ) * scratch * s3 ls fix, add docs * add documentation, update method name * Add tests, address commits, change default value of the helper * fix test * update the default value of config, remove initial delay config * Trigger Build * update class * add more tests * docs update * spellcheck * remove ioe from the signature * add back dmmy constructor for initialization * fix guice bindings, intellij inspections	2022-11-09 17:23:35 +05:30
Kashif Faraz	ff8e0c3397	Fix issues with caching cost strategy (#13321 ) `cachingCost` strategy has some discrepancies when compared to cost strategy. This commit addresses two of these by retaining the same behaviour as the `cost` strategy when computing the cost of moving a segment to a server: - subtract the self cost of a segment if it is being served by the target server - subtract the cost of segments that are marked to be dropped Other changes: - Add tests to verify fixed strategy. These tests would fail without the fixes made to `CachingCostStrategy.computeCost()` - Fix the definition of the segment related metrics in the docs. - Fix some docs issues introduced in #13181	2022-11-08 16:11:39 +05:30
Tejaswini Bandlamudi	594545da55	Adds cluster level idleConfig setting for supervisor (#13311 ) * adds cluster level idleConfig * updates docs * refactoring * spelling nit * nit * nit * refactoring	2022-11-08 14:54:14 +05:30
Gian Merlino	48528a0c98	MSQ: Fix task lock checking during publish, fix lock priority. (#13282 ) * MSQ: Fix task lock checking during publish, fix lock priority. Fixes two issues: 1) ControllerImpl did not properly check the return value of SegmentTransactionalInsertAction when doing a REPLACE. This could cause it to not realize that its locks were preempted. 2) Task lock priority was the default of 0. It should be the higher batch default of 50. The low priority made it possible for MSQ tasks to be preempted by compaction tasks, which is not desired. * Restructuring, add docs. * Add performSegmentPublish tests. * Fix tests.	2022-11-08 09:27:34 +05:30
Jill Osborne	d1a4de022a	Update retention rules doc (#13181 ) * Update retention rules doc * Update rule-configuration.md * Updated * Updated * Updated * Updated * Update rule-configuration.md * Update rule-configuration.md	2022-11-07 14:47:33 -08:00
AmatyaAvadhanula	a738ac9ad7	Improve task pause logging and metrics for streaming ingestion (#13313 ) * Improve task pause logging and metrics for streaming ingestion * Add metrics doc * Fix spelling	2022-11-07 21:33:54 +05:30
AmatyaAvadhanula	47c32a9d92	Skip ALL granularity compaction (#13304 ) * Skip autocompaction for datasources with ETERNITY segments	2022-11-07 17:55:03 +05:30
Gian Merlino	227b57dd8e	Compaction: Fetch segments one at a time on main task; skip when possible. (#13280 ) * Compaction: Fetch segments one at a time on main task; skip when possible. Compact tasks include the ability to fetch existing segments and determine reasonable defaults for granularitySpec, dimensionsSpec, and metricsSpec. This is a useful feature that makes compact tasks work well even when the user running the compaction does not have a clear idea of what they want the compacted segments to be like. However, this comes at a cost: it takes time, and disk space, to do all of these fetches. This patch improves the situation in two ways: 1) When segments do need to be fetched, download them one at a time and delete them when we're done. This still takes time, but minimizes the required disk space. 2) Don't fetch segments on the main compact task when they aren't needed. If the user provides a full granularitySpec, dimensionsSpec, and metricsSpec, we can skip it. * Adjustments. * Changes from code review. * Fix logic for determining rollup.	2022-11-07 14:50:14 +05:30
Gian Merlino	9423aa9163	MSQ: Consider PARTITION_STATS_MAX_BYTES in WorkerMemoryParameters. (#13274 ) * MSQ: Consider PARTITION_STATS_MAX_BYTES in WorkerMemoryParameters. This consideration is important, because otherwise we can run out of memory due to large statistics-tracking objects. * Improved calculations.	2022-11-07 14:27:18 +05:30
Gian Merlino	8f90589ce5	Always return sketches from DS_HLL, DS_THETA, DS_QUANTILES_SKETCH. (#13247 ) * Always return sketches from DS_HLL, DS_THETA, DS_QUANTILES_SKETCH. These aggregation functions are documented as creating sketches. However, they are planned into native aggregators that include finalization logic to convert the sketch to a number of some sort. This creates an inconsistency: the functions sometimes return sketches, and sometimes return numbers, depending on where they lie in the native query plan. This patch changes these SQL aggregators to _never_ finalize, by using the "shouldFinalize" feature of the native aggregators. It already existed for theta sketches. This patch adds the feature for hll and quantiles sketches. As to impact, Druid finalizes aggregators in two cases: - When they appear in the outer level of a query (not a subquery). - When they are used as input to an expression or finalizing-field-access post-aggregator (not any other kind of post-aggregator). With this patch, the functions will no longer be finalized in these cases. The second item is not likely to matter much. The SQL functions all declare return type OTHER, which would be usable as an input to any other function that makes sense and that would be planned into an expression. So, the main effect of this patch is the first item. To provide backwards compatibility with anyone that was depending on the old behavior, the patch adds a "sqlFinalizeOuterSketches" query context parameter that restores the old behavior. Other changes: 1) Move various argument-checking logic from runtime to planning time in DoublesSketchListArgBaseOperatorConversion, by adding an OperandTypeChecker. 2) Add various JsonIgnores to the sketches to simplify their JSON representations. 3) Allow chaining of ExpressionPostAggregators and other PostAggregators in the SQL layer. 4) Avoid unnecessary FieldAccessPostAggregator wrapping in the SQL layer, now that expressions can operate on complex inputs. 5) Adjust return type to thetaSketch (instead of OTHER) in ThetaSketchSetBaseOperatorConversion. * Fix benchmark class. * Fix compilation error. * Fix ThetaSketchSqlAggregatorTest. * Hopefully fix ITAutoCompactionTest. * Adjustment to ITAutoCompactionTest.	2022-11-03 09:43:00 -07:00
Gian Merlino	d1877e41ec	Use lookup memory footprint in MSQ memory computations. (#13271 ) * Use lookup memory footprint in MSQ memory computations. Two main changes: 1) Add estimateHeapFootprint to LookupExtractor. 2) Use this in MSQ's IndexerWorkerContext when determining the total amount of available memory. It's taken off the top. This prevents MSQ tasks from running out of memory when there are lookups defined in the cluster. * Updates from code review.	2022-11-03 07:36:54 -07:00
317brian	ae638e338c	docs(msq): update insert vs replace for dimension-based segment pruning (#13228 ) * docs(msq): update insert vs replace to mention dimension-based segment pruning * make suggested changes	2022-11-03 14:17:44 +05:30
Dr. Sizzles	e5ad24ff9f	Support for middle manager less druid, tasks launch as k8s jobs (#13156 ) * Support for middle manager less druid, tasks launch as k8s jobs * Fixing forking task runner test * Test cleanup, dependency cleanup, intellij inspections cleanup * Changes per PR review Add configuration option to disable http/https proxy for the k8s client Update the docs to provide more detail about sidecar support * Removing un-needed log lines * Small changes per PR review * Upon task completion we callback to the overlord to update the status / locaiton, for slower k8s clusters, this reduces locking time significantly * Merge conflict fix * Fixing tests and docs * update tiny-cluster.yaml changed `enableTaskLevelLogPush` to `encapsulatedTask` * Apply suggestions from code review Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * Minor changes per PR request * Cleanup, adding test to AbstractTask * Add comment in peon.sh * Bumping code coverage * More tests to make code coverage happy * Doh a duplicate dependnecy * Integration test setup is weird for k8s, will do this in a different PR * Reverting back all integration test changes, will do in anotbher PR * use StringUtils.base64 instead of Base64 * Jdk is nasty, if i compress in jdk 11 in jdk 17 the decompressed result is different Co-authored-by: Rahul Gidwani <r_gidwani@apple.com> Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2022-11-02 19:44:47 -07:00
Jason Koch	0d03ce435f	introduce a "tree" type to the flattenSpec (#12177 ) * introduce a "tree" type to the flattenSpec * feedback - rename exprs to nodes, use CollectionsUtils.isNullOrEmpty for guard * feedback - expand docs to more clearly capture limitations of "tree" flattenSpec * feedback - fix for typo on docs * introduce a comment to explain defensive copy, tweak null handling * fix: part of rebase * mark ObjectFlatteners.FlattenerMaker as an ExtensionPoint and provide default for new tree type * fix: objectflattener restore previous behavior to call getRootField for root type * docs: ingestion/data-formats add note that ORC only supports path expressions * chore: linter remove unused import * fix: use correct newer form for empty DimensionsSpec in FlattenJSONBenchmark	2022-11-01 14:49:30 +08:00
Gian Merlino	d851985cf5	MSQ: Add support for indexSpec. (#13275 )	2022-10-28 14:27:50 -07:00
Adarsh Sanjeev	4775427e2c	Add task start status to worker report (#13263 ) * Add task start status to worker report * Address review comments * Address review comments * Update documentation * Update spelling checks	2022-10-28 12:00:15 +05:30
Tejaswini Bandlamudi	49e54a0ec6	Docs: Update inputSegmentSizeBytes description (#13266 )	2022-10-28 09:33:52 +05:30
Clint Wylie	77e4246598	add support for 'front coded' string dictionaries for smaller string columns (#12277 ) * add FrontCodedIndexed for delta string encoding * now for actual segments * fix indexOf * fixes and thread safety * add bucket size 4, which seems generally better * fixes * fixes maybe * update indexes to latest interfaces * utf8 support * adjust * oops * oops * refactor, better, faster * more test * fixes * revert * adjustments * fix prefixing * more chill * sql nested benchmark too * refactor * more comments and javadocs * better get * remove base class * fix * hot rod * adjust comments * faster still * minor adjustments * spatial index support * spotbugs * add isSorted to Indexed to strengthen indexOf contract if set, improve javadocs, add docs * fix docs * push into constructor * use base buffer instead of copy * oops	2022-10-25 18:05:38 -07:00
317brian	c83115e4e1	api: change API page formatting (#13213 ) Tracking additional improvements requested by @paul-rogers: #13239 * api: refactor page so that indented bullet is child and unindented portion is parent * get rid of post etc headings and combine them with the endpoint * Update docs/operations/api-reference.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * fix broken links * fix typo Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-10-18 13:22:26 -07:00
Paul Rogers	b34b4353f4	Async reads for JDBC (#13196 ) Async reads for JDBC: Prevents JDBC timeouts on long queries by returning empty batches when a batch fetch takes too long. Uses an async model to run the result fetch concurrently with JDBC requests. Fixed race condition in Druid's Avatica server-side handler Fixed issue with no-user connections	2022-10-18 11:40:57 -07:00
cristian-popa	cc10350870	Collocated processes instructions (#13224 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-10-17 11:56:00 -07:00
Gian Merlino	3bbb76f17b	Docs: Add query/cpu/time to real-time metrics. (#13229 )	2022-10-15 18:26:44 +05:30
arvindanugula	42384d85e7	Update nested-columns.md (#13227 ) typo error corrected.	2022-10-14 16:15:46 -07:00
Victoria Lim	02ad62a08c	Docs: update description of query priority default value (#13191 ) * update description of default for query priority * update order * update terms * standardize to query context parameters	2022-10-14 14:28:04 -07:00
Karan Kumar	9d51e466b1	Minor doc update for BroadcastTablesTooLarge (#13218 ) Minor doc update for `BroadcastTablesTooLarge`. Now the user will know what to do in case this fault is encountered.	2022-10-14 09:06:55 +05:30
Tejaswini Bandlamudi	3e13584e0e	Adds Idle feature to `SeekableStreamSupervisor` for inactive stream (#13144 ) * Idle Seekable stream supervisor changes. * nit * nit * nit * Adds unit tests * Supervisor decides it's idle state instead of AutoScaler * docs update * nit * nit * docs update * Adds Kafka unit test * Adds Kafka Integration test. * Updates travis config. * Updates kafka-indexing-service dependencies. * updates previous offsets snapshot & doc * Doesn't act if supervisor is suspended. * Fixes highest current offsets fetch bug, adds new Kafka UT tests, doc changes. * Reverts Kinesis Supervisor idle behaviour changes. * nit * nit * Corrects SeekableStreamSupervisorSpec check on idle behaviour config, adds tests. * Fixes getHighestCurrentOffsets to fetch offsets of publishing tasks too * Adds Kafka Supervisor UT * Improves test coverage in druid-server * Corrects IT override config * Doc updates and Syntactic changes * nit * supervisorSpec.ioConfig.idleConfig changes	2022-10-12 18:31:08 +05:30
Jonathan Wei	9b8e69c99a	Add inline descriptor Protobuf bytes decoder (#13192 ) * Add inline descriptor Protobuf bytes decoder * PR comments * Update tests, check for IllegalArgumentException * Fix license, add equals test * Update extensions-core/protobuf-extensions/src/main/java/org/apache/druid/data/input/protobuf/InlineDescriptorProtobufBytesDecoder.java Co-authored-by: Frank Chen <frankchen@apache.org> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-10-11 13:37:28 -05:00
Charles Smith	25c1d55dd6	Clarify behavior when decommissioningMaxPercentOfMaxSegmentsToMove = 0 (#13157 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-10-07 09:01:32 -07:00
317brian	0edceead80	msq: update known issue about GROUPING SETS and COUNT DISTINCT (#13185 ) * msq: update known issue about GROUPING SETS and COUNT DISTINCT * address feedback from Gian	2022-10-05 19:47:03 -07:00
AmatyaAvadhanula	41e51b21c3	Make http options the default configurations (#13092 ) Druid currently uses Zookeeper dependent options as the default. This commit updates the following to use HTTP as the default instead. - task runner. `druid.indexer.runner.type=remote -> httpRemote` - load queue peon. `druid.coordinator.loadqueuepeon.type=curator -> http` - server inventory view. `druid.serverview.type=curator -> http`	2022-10-05 05:35:17 +05:30
Adarsh Sanjeev	92d2633ae6	Update ClusterByStatisticsCollectorImpl to use bytes instead of keys (#12998 ) * Update clusterByStatistics to use bytes instead of keys * Address review comments * Resolve checkstyle * Increase test coverage * Update test * Update thresholds * Update retained keys function * Update docs * Fix spelling	2022-10-03 12:08:23 +05:30
Jill Osborne	548d810baa	Correct nested columns example (#13150 )	2022-09-28 10:39:56 +05:30
David Palmer	0d7bf66578	Add a note to the documentation about pre-built HLLSketches (#13088 ) * add a note to the documentation about pre-built HLLSketches Druid actually supports ingesting a pre-generated sketch column by using the HLLSketchMerge aggregator. However, this functionality was previously not made clear in the documentation. * copyedit from the King's English to American English * add suggested style changes Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-09-27 10:29:39 +08:00
Apoorv Gupta	c8f4d72fb1	Fix documentation bug about injective lookups (#13147 ) replace mapping to `unique keys` with mapping to `unique values`.	2022-09-27 10:16:48 +08:00
Jonathan Wei	1f1fced6d4	Add JsonInputFormat option to assume newline delimited JSON, improve parse exception handling for multiline JSON (#13089 ) * Add JsonInputFormat option to assume newline delimited JSON, improve handling for non-NDJSON * Fix serde and docs * Add PR comment check	2022-09-26 19:51:04 -05:00
Charles Smith	eb760c3d1d	update log4j example (#13095 ) * update log4j example * fix some style issues * Update docs/configuration/logging.md Co-authored-by: Frank Chen <frankchen@apache.org> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-09-22 09:46:49 +08:00
317brian	12f12a13a9	fix: fix broken postgres link (#13135 )	2022-09-22 09:46:20 +08:00
317brian	7fa35839c0	fix: follow naming convention for msq task engine (#13127 ) * fix: follow naming convention for msq task engine * more fixes * add back in experimental * fix anchor	2022-09-21 18:46:06 -07:00
Gian Merlino	2f731f356e	Update pull-deps docs with correct repo list. (#13134 ) There is only one default remote repo at this time.	2022-09-21 12:16:57 -07:00
Katya Macedo	90d14f629a	spatial-filters (#13124 )	2022-09-20 22:48:36 -07:00
hosswald	5ed5c83aab	Clarified the behaviour of SQL COUNT(DISTINCT dim) on multi-value dimensions (#13128 ) * Clarified the behaviour of COUNT(DISTINCT column) on multi-value columns * Update docs/querying/sql-aggregations.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Vadim Ogievetsky <vadimon@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-09-20 18:03:34 -07:00
Vadim Ogievetsky	edc444a4bc	fix quickstart (#13126 )	2022-09-20 17:44:21 -07:00
Vadim Ogievetsky	b9edfe34a4	be consistent about referring to the web console by its name (#13118 )	2022-09-19 15:02:17 -07:00
Vadim Ogievetsky	bb0b810b1d	fix html tags in docs (#13117 ) * fix html tags in docs * revert not null	2022-09-18 19:40:33 -07:00
Gian Merlino	d9b2968edb	Docs: Clarify the situation with SELECT. (#13109 )	2022-09-17 10:47:57 -07:00
Charles Smith	b366a6c5a4	Add clarification around docker environment #8926 (#13084 ) * Add clarification around docker environment #8926 * fix spelling * Update docs/tutorials/docker.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/tutorials/docker.md Co-authored-by: Frank Chen <frankchen@apache.org> * fix nano quickstart Co-authored-by: Frank Chen <frankchen@apache.org>	2022-09-17 20:44:24 +08:00
Gian Merlino	d4967c38f8	Various documentation updates. (#13107 ) * Various documentation updates. 1) Split out "data management" from "ingestion". Break it into thematic pages. 2) Move "SQL-based ingestion" into the Ingestion category. Adjust content so all conceptual content is in concepts.md and all syntax content is in reference.md. Shorten the known issues page to the most interesting ones. 3) Add SQL-based ingestion to the ingestion method comparison page. Remove the index task, since index_parallel is just as good when maxNumConcurrentSubTasks: 1. 4) Rename various mentions of "Druid console" to "web console". 5) Add additional information to ingestion/partitioning.md. 6) Remove a mention of Tranquility. 7) Remove a note about upgrading to Druid 0.10.1. 8) Remove no-longer-relevant task types from ingestion/tasks.md. 9) Move ingestion/native-batch-firehose.md to the hidden section. It was previously deprecated. 10) Move ingestion/native-batch-simple-task.md to the hidden section. It is still linked in some places, but it isn't very useful compared to index_parallel, so it shouldn't take up space in the sidebar. 11) Make all br tags self-closing. 12) Certain other cosmetic changes. 13) Update to node-sass 7. * make travis use node12 for docs Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>	2022-09-16 21:58:11 -07:00
Vadim Ogievetsky	2493eb17bf	Doc fixes around msq (#13090 ) * remove things that do not apply * fix more things * pin node to a working version * fix * fixes * known issues tidy up * revert auto formatting changes * remove management-uis page which is 100% lies * don't mention the Coordinator console (that no longer exits) * goodies * fix typo	2022-09-16 02:15:26 -07:00
Katya Macedo	2218c8d23c	Documentation: Update spatial indexing example (#12555 ) * fix spatial indexing example * Update docs/development/geo.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/development/geo.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update text and example * Format JSON example * Update docs/development/geo.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/development/geo.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/development/geo.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/development/geo.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/development/geo.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/development/geo.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/development/geo.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/development/geo.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Accept review suggestions Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-09-16 10:32:19 +08:00
Atul Mohan	a8fd3a9077	Provide service specific log4j overrides in containerized deployments (#13020 ) * Provide service specific log4j overrides * Clarify comments * Add docs	2022-09-14 11:47:11 +08:00
Benedict Jin	4bde50e683	Bump the version of Druid docker image from 0.16.0-incubating to latest (#13058 )	2022-09-10 14:06:00 +05:30
Vadim Ogievetsky	4fc43670e5	adjust docs and images (#13067 )	2022-09-10 14:05:19 +05:30
DENNIS	dced61645f	prometheus-emitter supports sending metrics to pushgateway regularly … (#13034 ) * prometheus-emitter supports sending metrics to pushgateway regularly and continuously * spell check fix * Optimization variable name and related documents * Update docs/development/extensions-contrib/prometheus.md OK, it looks more conspicuous Co-authored-by: Frank Chen <frankchen@apache.org> * Update doc * Update docs/development/extensions-contrib/prometheus.md Co-authored-by: Frank Chen <frankchen@apache.org> * When PrometheusEmitter is closed, close the scheduler * Ensure that registeredMetrics is thread safe. * Local variable name optimization * Remove unnecessary white space characters Co-authored-by: Frank Chen <frankchen@apache.org>	2022-09-09 20:46:14 +08:00
sachidananda007	48c99054d0	Update tutorial-kafka.md (#13056 ) * Update tutorial-kafka.md Added missing command to the doc for zookeeper before starting kafka * Update docs/tutorials/tutorial-kafka.md Co-authored-by: Frank Chen <frankchen@apache.org>	2022-09-09 10:06:19 +08:00
Frank Chen	d57557d51d	Improve doc and configuration of prometheus emitter (#13028 ) * Improve doc and validation * Add configuration for peon tasks * Update doc * Update test case * Fix typo * Update docs/development/extensions-contrib/prometheus.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/development/extensions-contrib/prometheus.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-09-09 02:20:34 +08:00
Rohan Garg	7aa8d7f987	Add query/time metric for SQL queries from router (#12867 ) * Add query/time metric for SQL queries from router * Fix query cancel bug when user has overriden native query-id in a SQL query	2022-09-07 13:54:46 +05:30

2951 Commits