druid

Commit Graph

Author	SHA1	Message	Date
Peter Marshall	5167d328b1	Docs - query caching (#11584 ) * Update caching.md Knowledge from https://the-asf.slack.com/archives/CJ8D1JTB8/p1597781107153900 Update caching.md A few additional updates OTBO https://the-asf.slack.com/archives/CJ8D1JTB8/p1608669046041300 * Update caching.md Typos * Amendments on the segment cache Significant updates on content around the segment cache, pull process, and in-memory cache * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/basic-cluster-tuning.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/basic-cluster-tuning.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update basic-cluster-tuning.md typo * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Whole-query caching update Made more succinct and removed specific config to change. * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-04-18 17:00:21 +08:00
Charles Smith	408b46ae9f	Fixes a small typo in ingestion spec doc (#12143 ) * small typo * Update docs/ingestion/ingestion-spec.md Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: sthetland <steve.hetland@imply.io>	2022-04-18 16:53:50 +08:00
Peter Marshall	1201c9b2e5	Docs - added another common config property to tuningConfig (#11935 ) * Update ingestion-spec.md Added indexSpecForIntermediatePersists as a common configuration property. * Update ingestion-spec.md Amended to remove "below" and add link to the table. * Update ingestion-spec.md Removed passive.	2022-04-18 13:41:39 +08:00
Alexandre BERTHIOT	9f2b37f250	Update tutorial-compaction.md to change an unclear statement (#11988 ) * Update tutorial-compaction.md Unclear statement on the explanation of tuningConfig section. * Update docs/tutorials/tutorial-compaction.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-04-18 13:25:09 +08:00
Maytas Monsereenusorn	5d37d9f9d8	Add docs to metric spec for auto compaction (#12415 ) * add docs * Update docs/configuration/index.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update index.md * Update docs/configuration/index.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-04-13 13:27:00 -07:00
Katya Macedo	f24e9c6862	Add Kinesis ListShards permission (#12387 ) * add Kinesis permission * List Kinesis IAM permissions * Adopt review suggestions * Fix merge conflicts	2022-04-13 15:29:56 +05:30
Parag Jain	2c79d28bb7	Copy of #11309 with fixes (#12402 ) * Optionally load segment index files into page cache on bootstrap and new segment download * Fix unit test failure * Fix test case * fix spelling * fix spelling * fix test and test coverage issues Co-authored-by: Jian Wang <wjhypo@gmail.com>	2022-04-11 21:05:24 +05:30
mark-imply	bf96ddf5ba	Update index.md (#12390 ) Added guidance on when to increase druid.indexer.storage.recentlyFinishedThreshold.	2022-04-08 18:01:54 +05:30
mark-imply	d98cbd90f0	Update basic-cluster-tuning.md (#12412 ) Changed "Other useful JVM flags" to "Other generally useful JVM flags" in order to align with the introduction to the doc.	2022-04-08 15:29:55 +05:30
317brian	d82a8185d1	fix(docs): clarify what s3 permissions are needed based on the access management type (#12405 ) * fix(docs): clarify what s3 permissions are needed based on the permissions model * fix typo * Update docs/development/extensions-core/s3.md Co-authored-by: Jihoon Son <jihoonson@apache.org> Co-authored-by: Jihoon Son <jihoonson@apache.org>	2022-04-07 16:22:56 -07:00
Victoria Lim	e6229b76a6	Document data format and example for featureSpec (#12394 ) * add data format and example for featureSpec * add second feature in example * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-04-06 15:17:15 -07:00
317brian	ac6c24793e	docs(fix): add clarity around granularitySpec (#12362 ) * fix: add clarify around granularitySpec * fix spacing * Update docs/ingestion/compaction.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-04-06 09:24:37 -07:00
Victoria Lim	d326c681c1	Document config for ingesting null columns (#12389 ) * config for ingesting null columns * add link * edit .spelling * what happens if storeEmptyColumns is disabled	2022-04-05 09:15:42 -07:00
AmatyaAvadhanula	067254b778	Package kinesis client jar within the extension (#12370 ) amazon-kinesis-client was not covered under the apache license and required separate insertion in the kinesis extension. This can now be avoided since it is covered, and including it within druid helps prevent incompatibilities. Allows enabling of deaggregation out of the box by packaging amazon-kinesis-client (1.14.4) with druid for kinesis ingestion.	2022-04-04 21:31:18 +05:30
Tejaswini Bandlamudi	984904779b	Increase default DatasourceCompactionConfig.inputSegmentSizeBytes to Long.MAX_VALUE (#12381 ) The current default value of inputSegmentSizeBytes is 400MB, which is pretty low for most compaction use cases. Thus most users are forced to override the default. The default value is now increased to Long.MAX_VALUE.	2022-04-04 16:28:53 +05:30
AmatyaAvadhanula	c5531be553	Add feature flag for Kinesis listShards API usage (#12383 ) listShards API was used to get all the shards for kinesis ingestion to improve its resiliency as part of #12161. However, this may require additional permissions in the IAM policy where the stream is present. (Please refer to: https://docs.aws.amazon.com/kinesis/latest/APIReference/API_ListShards.html). A dynamic configuration useListShards has been added to KinesisSupervisorTuningConfig to control the usage of this API and prevent issues upon upgrade. It can be safely turned on (and is recommended when using kinesis ingestion) by setting this configuration to true.	2022-04-04 14:58:10 +05:30
somu-imply	a1ea658115	Introducing a new config to ignore nulls while computing String Cardinality (#12345 ) * Counting nulls in String cardinality with a config * Adding tests for the new config * Wrapping the vectorize part to allow backward compatibility * Adding different tests, cleaning the code and putting the check at the proper position, handling hasRow() and hasValue() changes * Updating testcase and code * Adding null handling test to improve coverage * Checkstyle fix * Adding 1 more change in docs * Making docs clearer	2022-03-29 14:31:36 -07:00
Peter Marshall	f1841c6444	Docs - S3 masking and nav update to S3 page (#11490 ) * Docs: Masking S3 creds and some rewording Knowledge transfer from https://groups.google.com/g/druid-user/c/FydcpFrA688 * Removed bold in one of the quote sections * Update s3.md * Update s3.md Quick grammar change * Update docs/development/extensions-core/s3.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update s3.md Typo * Update docs/development/extensions-core/s3.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update s3.md Active lang * Update s3.md Lang nit * Update native-batch.md Lang nit * Update docs/ingestion/native-batch.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Grammar tidy-up and link fix Corrected 2 x links to old page H2s, resolved the question around precedence, and some other grammatical changes. * Update docs/development/extensions-core/s3.md * Update s3.md Removed an Erroneous E Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-03-29 09:13:05 -07:00
Peter Marshall	b9a968e7ff	Docs – expressions link back and timestamp hint (#11674 ) * Update math-expr.md Link back to transformSpec * Update ingestion-spec.md Moved info about using the timestamp inside transforms into the actual timestamp section. * Update ingestion-spec.md Active language.	2022-03-29 09:12:30 -07:00
mark-imply	3c55565398	Update ingestion-spec.md (#12371 ) * Update ingestion-spec.md Added best practice point to dimensions description. * Update docs/ingestion/ingestion-spec.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-03-29 09:12:02 -07:00
Victoria Lim	9ed7aa33ec	Docs for request logging (#12363 ) * add docs for request logging * remove stray character * Update docs/operations/request-logging.md Co-authored-by: TSFenwick <tsfenwick@gmail.com> * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: TSFenwick <tsfenwick@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-03-28 14:09:41 -07:00
Adarsh Sanjeev	ef45a1551e	Convert inQueryThreshold into query context parameter. (#12357 ) Added Calcite's InQueryThreshold as a query context parameter. Setting this parameter appropriately reduces the time taken for queries with a large number of values in their IN conditions.	2022-03-22 18:33:57 +05:30
Frank Chen	d745d0b338	Add JDK 11 (#12333 )	2022-03-16 15:03:04 -07:00
Dr. Sizzles	69f928f50e	Adding k8s support for human readable parsing (#12316 ) * Adding k8s support for human readable parsing * Update docs/configuration/human-readable-byte.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/configuration/human-readable-byte.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update core/src/main/java/org/apache/druid/java/util/common/HumanReadableBytes.java Co-authored-by: Frank Chen <frankchen@apache.org> * Changes per review Co-authored-by: Rahul Gidwani <r_gidwani@apple.com> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-03-16 11:18:47 +08:00
AmatyaAvadhanula	7bf1d8c5c0	Facilitate lazy initialization of connections to mitigate overwhelming of Coordinator (#12298 ) Add config for eager / lazy connection initialization in ResourcePool Description Currently, when multiple tasks are launched, each of them eagerly initializes a full pool's worth of connections to the coordinator. While this is acceptable when the parameter for number of eagerConnections (== maxSize) is small, this can be problematic in environments where it's a large value (say 1000) and multiple tasks are launched simultaneously, which can cause a large number of connections to be created to the coordinator, thereby overwhelming it. Patch Nodes like the broker may require eager initialization of resources and do not create connections with the Coordinator. It is unnecessary to do this with other types of nodes. A config parameter eagerInitialization is added, which when set to true, initializes the max permissible connections when ResourcePool is initialized. If set to false, lazy initialization of connection resources takes place. NOTE: All nodes except the broker have this new parameter set to false in the quickstart as part of this PR Algorithm The current implementation relies on the creation of maxSize resources eagerly. The new implementation's behaviour is as follows: If a resource has been previously created and is available, lend it. Else if the number of created resources is less than the allowed parameter, create and lend it. Else, wait for one of the lent resources to be returned.	2022-03-09 23:17:43 +05:30
Agustin Gonzalez	abe76ccb90	Batch ingestion replace (#12137 ) * Tombstone support for replace functionality * A used segment interval is the interval of a current used segment that overlaps any of the input intervals for the spec * Update compaction test to match replace behavior * Adapt ITAutoCompactionTest to work with tombstones rather than dropping segments. Add support for tombstones in the broker. * Style plus simple queryable index test * Add segment cache loader tombstone test * Add more tests * Add a method to the LogicalSegment to test whether it has any data * Test filter with some empty logical segments * Refactor more compaction/dropexisting tests * Code coverage * Support for all empty segments * Skip tombstones when looking up broker's timeline. Discard changes made to tool chest to avoid empty segments since they will no longer have empty segments after lookup because we are skipping over them. * Fix null ptr when segment does not have a queryable index * Add support for empty replace interval (all input data has been filtered out) * Fixed coverage & style * Find tombstone versions from lock versions * Test failures & style * Interner was making this fail since the two segments were considered equal due to their IDs being equal * Cleanup tombstone version code * Force timeChunkLock whenever replace (i.e. dropExisting=true) is being used * Reject replace spec when input intervals are empty * Documentation * Style and unit test * Restore test code deleted by mistake * Allocate forces TIME_CHUNK locking and uses lock versions. TombstoneShardSpec added. * Unused imports. Dead code. Test coverage. * Coverage. * Prevent killer from throwing an exception for tombstones. This is the killer used in the peon for killing segments. * Fix OmniKiller + more test coverage. * Tombstones are now marked using a shard spec * Drop a segment factory.json in the segment cache for tombstones * Style * Style + coverage * style * Add TombstoneLoadSpec.class to mapper in test * Update core/src/main/java/org/apache/druid/segment/loading/TombstoneLoadSpec.java Typo Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Update docs/configuration/index.md Missing Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Typo * Integrated replace with an existing test since the replace part was redundant and more importantly, the test file was very close or exceeding the 10 min default "no output" CI Travis threshold. * Range does not work with multi-dim Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>	2022-03-08 20:07:02 -07:00
Gian Merlino	875e0696e0	GroupBy: Cap dictionary-building selector memory usage. (#12309 ) * GroupBy: Cap dictionary-building selector memory usage. New context parameter "maxSelectorDictionarySize" controls when the per-segment processing code should return early and trigger a trip to the merge buffer. Includes: - Vectorized and nonvectorized implementations. - Adjustments to GroupByQueryRunnerTest to exercise this code in the v2SmallDictionary suite. (Both the selector dictionary and the merging dictionary will be small in that suite.) - Tests for the new config parameter. * Fix issues from tests. * Add "pre-existing" to dictionary. * Simplify GroupByColumnSelectorStrategy interface by removing one of the writeToKeyBuffer methods. * Adjustments from review comments.	2022-03-08 13:13:11 -08:00
Victoria Lim	903174de20	correct errors on compaction doc (#12308 )	2022-03-04 15:33:35 -08:00
Gian Merlino	3b373114dc	Officially support Java 11. (#12232 ) There aren't any changes in this patch that improve Java 11 compatibility; these changes have already been done separately. This patch merely updates documentation and explicit Java version checks. The log message adjustments in DruidProcessingConfig are there to make things a little nicer when running in Java 11, where we can't measure direct memory _directly_, and so we may auto-size processing buffers incorrectly.	2022-03-04 14:15:45 -08:00
Sandeep	61e1ffc7f7	add a new query laning metrics to visualize lane assignment (#12111 ) * add a new query laning metrics to visualize lane assignment * fixes :spotbugs check * Update docs/operations/metrics.md Co-authored-by: Benedict Jin <asdf2014@apache.org> * Update server/src/main/java/org/apache/druid/server/QueryScheduler.java Co-authored-by: Benedict Jin <asdf2014@apache.org> * Update server/src/main/java/org/apache/druid/server/QueryScheduler.java Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Benedict Jin <asdf2014@apache.org>	2022-03-04 15:21:17 +08:00
Jihoon Son	e5ad862665	A new includeAllDimension flag for dimensionsSpec (#12276 ) * includeAllDimensions in dimensionsSpec * doc * address comments * unused import and doc spelling	2022-02-25 18:27:48 -08:00
Karan Kumar	b94390ba33	Adding Shared Access resource support for azure (#12266 ) Azure Blob storage has multiple modes of authentication. One of them is Shared access resource. This is very useful when we do not want to add the account key to the druid properties.	2022-02-22 18:27:43 +05:30
Maytas Monsereenusorn	6e2eded277	Allow coordinator run auto compaction duty period to be configured separately from other indexing duties (#12263 ) * add impl * add impl * add unit tests * add impl * add impl * add serde test * add tests * add docs * fix test * fix test * fix docs * fix docs * fix spelling	2022-02-18 23:02:57 -08:00
Karan Kumar	5794331eb1	Adding new config for disabling group by on multiValue column (#12253 ) As part of #12078 one of the follow-ups was to have a specific config which does not allow accidental unnesting of multi value columns if such columns become part of the grouping key. Added a config groupByEnableMultiValueUnnesting which can be set in the query context. The default value of groupByEnableMultiValueUnnesting is true, therefore it does not change the current engine behavior. If groupByEnableMultiValueUnnesting is set to false, the query will fail if it encounters a multi-value column in the grouping key.	2022-02-16 20:53:26 +05:30
somu-imply	eae163a797	Moving in filter check to broker (#12195 ) * Moving in filter check to broker * Adding more unit tests, making error message meaningful * Spelling and doc changes * Updating default to -1 and making this feature hidden by default. The number of IN filters can grow up to a max limit of 100 * Removing upper limit of 100, updated docs * Making documentation more meaningful * Moving check outside to PlannerConfig, updating test cases and adding back max limit * Updated with some additional code comments * Missed removing one line during the checkin * Addressing doc changes and one forbidden API correction * Final doc change * Adding a spelling exception, correcting a testcase * Reading entire filter tree to address combinations of ANDs and ORs * Specifying in docs that this case works only for ORs * Revert "Reading entire filter tree to address combinations of ANDs and ORs" This reverts commit `81ca8f8496`. * Covering a class cast exception and updating docs * Counting changed Co-authored-by: Jihoon Son <jihoonson@apache.org>	2022-02-15 20:45:07 -08:00
AmatyaAvadhanula	393e9b68a8	Add config to limit task slots for parallel indexing tasks (#12221 ) In extreme cases where many parallel indexing jobs are submitted together, it is possible that the `ParallelIndexSupervisorTasks` take up all slots leaving no slot to schedule their own sub-tasks thus stalling progress of all the indexing jobs. Key changes: - Add config `druid.indexer.runner.parallelIndexTaskSlotRatio` to limit the task slots for `ParallelIndexSupervisorTasks` per worker - `ratio = 1` implies supervisor tasks can use all slots on a worker if needed (default behavior) - `ratio = 0` implies supervisor tasks can not use any slot on a worker (actually, at least 1 slot is always available to ensure progress of parallel indexing jobs) - `ImmutableWorkerInfo.canRunTask()` - `WorkerHolder`, `ZkWorker`, `WorkerSelectUtils`	2022-02-15 23:15:09 +05:30
Victoria Lim	c61b19d443	Refactor SQL docs (#12239 ) * refactor and link fixes * add sql docs to left nav * code format for needle * updated web console script * link fixes * update earliest/latest functions * edits for grammar and style * more link fixes * another link * update with #12226 * update .spelling file	2022-02-11 14:43:30 -08:00
Clint Wylie	ae71e05fc5	array_concat_agg and array_agg support for array inputs (#12226 ) * array_concat_agg and array_agg support for array inputs changes: * added array_concat_agg to aggregate arrays into a single array * added array_agg support for array inputs to make nested array * added 'shouldAggregateNullInputs' and 'shouldCombineAggregateNullInputs' to fix a correctness issue with STRING_AGG and ARRAY_AGG when merging results, with dual purpose of being an optimization for aggregating * fix test * tie capabilities type to legacy mode flag about coercing arrays to strings * oops * better javadoc	2022-02-07 19:59:30 -08:00
Suneet Saldanha	ced1389d4c	Enable auto kill segments by default (#12187 ) * Enable auto-kill by default * tests * wip * test * fix IT * fix it * remove from docs * make coverage bot happy	2022-02-07 06:57:54 -08:00
Suneet Saldanha	159f97dcb0	Update docs for druid.processing.numThreads in brokers (#12231 ) * Update docs for druid.processing.numThreads * error msg * one more reference	2022-02-04 17:34:21 -08:00
Victoria Lim	24716bfedc	Doc updates for metadata cleanup and storage (#12190 ) * doc updates for metadata storage/cleanup * Add comments for disabling cleanup * Apply suggestions from code review * updated for https://github.com/apache/druid/pull/12201 * Apply suggestions from code review Co-authored-by: Maytas Monsereenusorn <maytasm@apache.org> * move retention period line earlier; more concise text * fix typo Co-authored-by: Maytas Monsereenusorn <maytasm@apache.org>	2022-01-27 11:40:54 -08:00
Maytas Monsereenusorn	fac6a48a8f	add impl (#12201 )	2022-01-27 11:39:59 -08:00
Suneet Saldanha	2b32d86f3b	Enable automatic metadata cleanup by default (#12188 )	2022-01-24 20:04:17 -08:00
Victoria Lim	d2ac146365	Docs for cluster tiering to improve query concurrency (#12128 ) * add new doc * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> * reorder query laning properties * rename doc * new name in doc header * organize material into "service tiering" section * text edits and update sidebars.json * update query laning * how queries get assigned to lanes * add more details to intro; use more consistent terminology * more content * Apply suggestions from code review Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/operations/mixed-workloads.md * Apply suggestions from code review Co-authored-by: Charles Smith <techdocsmith@gmail.com> * typo Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Jihoon Son <jihoonson@apache.org>	2022-01-15 12:22:08 +08:00
Jonathan Wei	74c876e578	Throw parse exceptions on schema get errors for SchemaRegistryBasedAvroBytesDecoder (#12080 ) * Add option to throw parse exceptions on schema get errors for SchemaRegistryBasedAvroBytesDecoder * Remove option	2022-01-13 12:36:51 -06:00
Clint Wylie	f2ce76966c	add EARLIEST_BY/LATEST_BY to make EARLIEST/LATEST function signatures less ambiguous (#12145 ) * add EARLIEST_BY/LATEST_BY to make EARLIEST/LATEST function signatures unambiguous * switcheroo * EARLIEST_BY/LATEST_BY use timestamp instead of numeric types, update docs * revert unintended change * fix docs * fix docs better	2022-01-12 03:48:53 -08:00
Vadim Ogievetsky	2299eb321e	Standardizing SQL function docs (#12091 ) * fix typos in SQL function docs * more code * Update docs/querying/sql.md Co-authored-by: Frank Chen <frankchen@apache.org> * a few more expr, fixes * more fixes * quote TIME_SHIFT * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * undo header change Co-authored-by: Frank Chen <frankchen@apache.org> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-01-06 23:57:03 -08:00
Jihoon Son	4a74c5adcc	Use Druid's extension loading for integration test instead of maven (#12095 ) * Use Druid's extension loading for integration test instead of maven * fix maven command * override config path * load input format extensions and kafka by default; add prepopulated-data group * all docker-composes are overridable * fix s3 configs * override config for all * fix docker_compose_args * fix security tests * turn off debug logs for overlord api calls * clean up stuff * revert docker-compose.yml * fix override config for query error test; fix circular dependency in docker compose * add back some dependencies in docker compose * new maven profile for integration test * example file filter	2022-01-05 23:33:04 -08:00
Victoria Lim	6846622080	Docs: add FILTER to sql query syntax (#12093 ) * docs: add FILTER to sql query syntax * Update docs/querying/sql.md * Update docs/querying/sql.md * Update docs/querying/sql.md * Update docs/querying/sql.md * move and update FILTER section	2022-01-05 12:59:41 -08:00
somu-imply	c267b65f97	Removing unused processing threadpool on broker (#12070 ) * Thread pool for broker * Updating two tests to improve coverage for new method added * Updating druidProcessingConfigTest to cover coverage * Adding missed spelling errors caused in doc * Adding test to cover lines of new function added	2021-12-21 13:07:53 -08:00
Victoria Lim	acbeae23b8	New doc for troubleshooting query execution (#12075 ) * new doc for troubleshooting query execution * add doc to sidebar * Apply suggestions from code review	2021-12-16 17:34:34 -08:00
Karan Kumar	377edff042	Ingestion metrics doc fix (#12066 ) * Ingestion metrics doc fix. * Fixing typo * Adding missed keywords in ignore list	2021-12-15 12:51:53 +05:30
Victoria Lim	4ede3bbff6	Docs updates (#12069 ) * minor updates to docs * remove en.json	2021-12-14 14:38:18 -08:00
Victoria Lim	e77bdfa70d	Document query context parameters related to join filters (#12057 ) * docs update for query context and filters * updates from review * Update docs/querying/filters.md	2021-12-13 17:47:21 -08:00
Lucas Capistrant	761fe9f144	Add new metric that quantifies how long batch ingest jobs waited for segment availability and whether or not that wait was successful (#12002 ) * add a unit test that tests that new metric is emitted * remove unused import * clarify in doc that this is for batch tasks * fix IndexTaskTest	2021-12-10 11:40:52 -06:00
Frank Chen	58245b4617	Support JsonPath functions in JsonPath expressions (#11722 ) * Add jsonPath functions support * Add jsonPath function test for Avro * Add jsonPath function length() to Orc * Add jsonPath function length() to Parquet * Add more tests to ORC format * update doc * Fix exception during ingestion * Add IT test case * Revert "Fix exception during ingestion" This reverts commit `5a5484b9ea`. * update IT test case * Add 'keys()' * Commit IT test case * Fix UT	2021-12-10 10:53:23 +08:00
shallada	25c9eba2f7	clarify time format for intervals (#12035 )	2021-12-08 08:31:21 -08:00
Lucas Capistrant	150902b95c	clean up the balancing code around the batched vs deprecated way of sampling segments to balance (#11960 ) * clean up the balancing code around the batched vs deprecated way of sampling segments to balance * fix docs, clarify comments, add deprecated annotations to legacy code * remove unused variable * update dynamic config dialog in console to state percentOfSegmentsToConsiderPerMove deprecated * fix dynamic config text for percentOfSegmentsToConsiderPerMove * run prettier to cleanup coordinator-dynamic-config.tsx changes * update jest snapshot * update documentation per review feedback	2021-12-07 14:47:46 -08:00
Clint Wylie	a8815f671e	Fix druid client timeout zero (#12023 ) * fix bug where queries fail immediately when timeout is 0 instead of using default timeout * fix to use serverside max * more better * less flaky test * oops	2021-12-07 12:41:01 -08:00
Peter Marshall	0b3f0bbbd8	Docs - Metrics docs layout and info about query/bytes (#11481 ) * Metrics docs layout and info about query/bytes Knowledge transfer from https://groups.google.com/g/druid-user/c/8fiflmSEoTQ - updated the layout of the Metrics part, adding links between docs pages. Update index.md Amended typo * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/metrics.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/metrics.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/metrics.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Feedback applied Http --> HTTP and moved content / removed > * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/configuration/index.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-12-07 09:45:24 -08:00
Peter Marshall	c209db3a1d	Docs - roll-up tip (#11677 ) * Update rollup.md Added SE tip around roll-up. * Update docs/ingestion/rollup.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-12-07 09:17:36 -08:00
Peter Marshall	d7463c99e9	Docs - Task ref logs correction (#11746 ) * Update tasks.md Removed confusing backreference * Update tasks.md Changed silly grammar.	2021-12-07 09:15:19 -08:00
Jihoon Son	fc9513b6cd	Make NodeRole available during binding; add support for dynamic registration of DruidService (#12012 ) * Make nodeRole available during binding; add support for dynamic registration of DruidService * fix checkstyle and test * fix customRole test * address comments * add more javadoc	2021-12-03 11:59:00 -08:00
jacobtolar	f7f5505631	Add avro_ocf to supported Kafka/Kinesis InputFormats (#11865 ) * Update docs - Kinesis InputFormat ingestion * Add avro_ocf to list of supported Kafka InputFormats * Remove extra whitespace. * Update kafka-supervisor-reference.md * Delete extra whitespace.	2021-12-03 07:57:26 -08:00
Frank Chen	4631a66723	Support rolling log files (#10147 ) * apply log file rolling strategy * fix doc Signed-off-by: frank chen <frank.chen021@outlook.com> * Use absolute log path and allow spaces in log path * Update log4j2 configuration * apply FileAppender to ZooKeeper * DO NOT redirect application's console log to file in supervisor	2021-12-03 21:32:01 +08:00
Charles Smith	7ed46800c3	Docs: Add multi-dimension partitioning doc; refactor native batch and separate into smaller topics. (#11983 ) Adds documentation for multi-dimension partitioning. cc: @kfaraz Refactors the native batch partitioning topic as follows: Native batch ingestion covers parallel-index Native batch simple task indexing covers index Native batch input sources covers ioSource Native batch ingestion with firehose covers deprecated firehose	2021-12-03 16:37:14 +05:30
Clint Wylie	84b4bf56d8	vectorize logical operators and boolean functions (#11184 ) changes: * adds new config, druid.expressions.useStrictBooleans which make longs the official boolean type of all expressions * vectorize logical operators and boolean functions, some only if useStrictBooleans is true	2021-12-02 16:40:23 -08:00
benkrug	11746b8536	Update datasketches-hll.md (#12010 ) under "Aggregators", about the lgK setting, it said "Must be a power of 2 from 4 to 21 inclusively." 21 is not a power of 2, nor is 12, the given default. I think there may have been confusion because lgK represents log2 of K. We could say "K must be a power of 2...", or just say lgK must be between 4 and 21.	2021-11-30 18:52:00 -08:00
Charles Smith	f536f31229	clarify avro support & general style improvements (#11975 ) * clarify avro support & general style improvements * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update docs/development/extensions-core/avro.md Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update avro.md remove redundancy Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2021-11-28 16:10:18 +08:00
Laksh Singla	c381cae51b	Improve the output of SQL explain message (#11908 ) Currently, when we try to do EXPLAIN PLAN FOR, it returns the structure of the SQL parsed (via Calcite's internal planner util), which is verbose (since it tries to explain about the nodes in the SQL, instead of the Druid Query), and not representative of the native Druid query which will get executed on the broker side. This PR aims to change the format when user tries to EXPLAIN PLAN FOR for queries which are executed by converting them into Druid's native queries (i.e. not sys schemas).	2021-11-25 21:08:33 +05:30
Rohan Garg	2c08055962	Specify time column for first/last aggregators (#11949 ) Add the ability to pass time column in first/last aggregator (and latest/earliest SQL functions). It is to support cases where the time to query upon is stored as a part of a column different than __time. Also, some other logical time column can be specified.	2021-11-25 09:44:14 +05:30
Maytas Monsereenusorn	bb3d2a433a	Support filtering data in Auto Compaction (#11922 ) * add impl * fix checkstyle * add test * add test * add unit tests * fix unit tests * fix unit tests * fix unit tests * add IT * add IT * add comments * fix spelling	2021-11-24 10:56:38 -08:00
Kashif Faraz	6607e4cc75	Docs: Remove reference to deprecated field `targetPartitionSize` (#11974 ) * Remove reference to deprecated field `targetPartitionSize` * Fix spelling of LeaderLatch	2021-11-23 15:32:16 +05:30
Peter Marshall	ed0606db69	Docs - Corrected admonition issue (#11926 ) * Corrected admonition issue * Update data-formats.md Removed all admonition bits, and took out sf linebreaks. * Update data-formats.md Changed the shocker line into something a little more practical.	2021-11-22 12:14:30 -08:00
Katya Macedo	706d057ccc	corrected leaderlatch name (#11966 )	2021-11-22 11:58:42 -08:00
jacobtolar	0a9a908031	Add inline native query example to tutorial (#11642 ) * Add inline native query example to tutorial Minor change to the tutorial that adds an example of a native HTTP query request body, and adds a link to the more detailed "native query over HTTP" documentation. * cleanup * Apply suggestions from code review. Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: sthetland <steve.hetland@imply.io>	2021-11-22 21:35:05 +08:00
Peter Marshall	0c0001579d	Update compaction.md (#11937 ) Removed superfluous tabs that caused issues in rendering Added nav to the `inputSpec`	2021-11-22 21:33:47 +08:00
jacobtolar	3aee5d9ec3	Fix: invalid JSON in ingestion spec doc example (#11880 ) * Fix: invalid JSON in ingestion spec doc example * Update ingestion-spec.md	2021-11-22 21:33:26 +08:00
Nikhil Navadiya	3c51136098	Add worker category dimension (#11554 ) * Add worker category as dimension in TaskSlotCountStatsMonitor * Change description * Add workerConfig as field * Modify HttpRemoteTaskRunnerTest to test worker category in taskslot metrics * Fixing tests * Fixing alerts * Adding unit test in SingleTaskBackgroundRunnerTest for task slot metrics APIs * Resolving false positive spell check * addressing comments * throw UnsupportedOperationException for tasklotmetrics APIs in SingleTaskBackgroundRunner Co-authored-by: Nikhil Navadiya <nnavadiya@twitter.com>	2021-11-18 22:59:07 -08:00
somu-imply	29710789a4	Adding safe divide function (#11904 ) * IMPLY-4344: Adding safe divide function along with testcases and documentation updates * Changing based on review comments * Addressing review comments, fixing coding style, docs and spelling * Checkstyle passes for all code * Fixing expected results for infinity * Revert "Fixing expected results for infinity" This reverts commit `5fd5cd480d`. * Updating test result and a space in docs	2021-11-17 08:22:41 -08:00
TSFenwick	1487f558b1	Use a simple class to sanitize JDBC exceptions and also log them (#11843 ) * Use a simple class to sanitize sanitizable errors and log them The purpose of this is to sanitize JDBC errors, but can sanitize other errors if they implement SanitizableError Interface add a class to log errors and sanitize them added a simple test that tests out that the error gets sanitized add @NonNull annotation to serverconfig's ErrorResponseTransfromStrategy * return less information as part of too many connections, and instead only log specific details This is so an end user gets relevant information but not too much info since they might know how many brokers they have * return only runtime exceptions added new error types that need to be sanitized also sanitize deprecated and unsupported exceptions. * don't rewrite exceptions unless necessary for checked exceptions add docs avoid blanket turning all exceptions into runtime exceptions * address comments, to fix up docs. add more javadocs add support UOE sanitization * use try catch instead and sanitize at public methods * checkstyle fixes * throw noSuchStatement and NoSuchConnection as Avatica is affected by those * address comments. move log error back to druid meta clean up bad formatting and commented code. add missed catch for NoSuchStatementException clean up comments for error handler and add comment explaining not wanting to sanitize avatica exceptions * alter test to reflect new error message	2021-11-16 13:13:03 -08:00
sthetland	02b578a3dd	Fixing a few typos and style issues (#11883 ) * grammar and format work * light writing touchup Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-11-16 10:13:35 -08:00
Gian Merlino	6f6e88e02e	SQL: Add type headers to response formats. (#11914 ) This allows clients to interpret the results of SQL queries without having to guess types.	2021-11-13 11:30:57 +05:30
Jihoon Son	f91868602d	Remove stale warning for HTTP inputSource (#11907 )	2021-11-13 10:27:14 +08:00
Charles Smith	33a5cda061	Docs: Splits Kafka topic. Adds detailed example for kafka inputFormat (#11912 ) * Splits Kafka topic according to function. Adds detailed example for kafka inputFormat * Apply suggestions from code review accept suggestions from review Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review accept suggestions Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * accept suggestions * accept suggestions * final typos and clarifications * bringing forward some syntax fixes Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2021-11-12 13:02:23 -08:00
Clint Wylie	5baa22148e	revert ColumnAnalysis type, add typeSignature and use it for DruidSchema (#11895 ) * revert ColumnAnalysis type, add typeSignature and use it for DruidSchema * review stuffs * maybe null * better maybe null * Update docs/querying/segmentmetadataquery.md * Update docs/querying/segmentmetadataquery.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * fix null right * sad * oops * Update batch_hadoop_queries.json Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-11-10 18:46:29 -08:00
Maytas Monsereenusorn	a36a41da73	Support routing data through an HTTP proxy (#11891 ) * Support routing data through an HTTP proxy * Support routing data through an HTTP proxy This adds the ability for the HttpClient to connect through an HTTP proxy. We augment the channel factory to check if it is supposed to be proxied and, if so, we connect to the proxy host first, issue a CONNECT command through to the final recipient host and then give the channel to the normal http client for usage. * add docs * address comments Co-authored-by: imply-cheddar <86940447+imply-cheddar@users.noreply.github.com>	2021-11-09 17:24:06 -08:00
Maytas Monsereenusorn	ddc68c6a81	Support changing dimension schema in Auto Compaction (#11874 ) * add impl * add unit tests * fix checkstyle * add impl * add impl * add impl * add impl * add impl * add impl * fix test * add IT * add IT * fix docs * add test * address comments * fix conflict	2021-11-08 21:17:08 -08:00
Jian Wang	8e7e679984	Add more metrics for Jetty server thread pool usage (#11113 ) Add more metrics for jetty server thread pool usage so we know if we have allocated enough http threads to handle requests.	2021-11-07 16:51:44 +05:30
zachjsh	1d6df48145	Warn if cache size of lookup is beyond max size (#11863 ) Enhanced the ExtractionNamespace interface in lookups-cached-global core extension with the ability to set a maxHeapPercentage for the cache of the respective namespace. The reason for adding this functionality is to make it easier to detect when a lookup table grows to a size that the underlying service cannot handle, because it does not have enough memory. The default value of maxHeap for the interface is -1, which indicates that no maxHeapPercentage has been set. For the JdbcExtractionNamespace and UriExtractionNamespace implementations, the default value is null, which will cause the respective service that the lookup is loaded in to warn when its cache is beyond maxHeapPercentage of the service's configured max heap size. If a positive non-null value is set for the namespace's maxHeapPercentage config, this value will be honored for all services that the respective lookup is loaded onto, and consequently log warning messages when the cache of the respective lookup grows beyond this respective percentage of the service's configured max heap size. Warnings are logged every time that either Uri based or Jdbc based lookups are regenerated, if the maxHeapPercentage constraint is violated. No other implementations will log warnings at this time. No error is thrown when the size exceeds the maxHeapPercentage at this time, as doing so could break functionality for existing users. Previously the JdbcCacheGenerator generated its cache by materializing all rows of the underlying table in memory at once; this made it difficult to log warning messages in the case that the results from the jdbc query were very large and caused the service to run out of memory. To help with this, this PR makes it so that the jdbc query results are instead streamed through an iterator.	2021-11-03 21:32:22 -04:00
Kashif Faraz	a22687ecbe	Add Broker config `druid.broker.segment.watchRealtimeNodes` (#11732 ) The new config is an extension of the concept of "watchedTiers" where the Broker can choose to add the info of only the specified tiers to its timeline. Similarly, with this config, Broker can choose to skip the realtime nodes and thus it would query only Historical processes for any given segment.	2021-11-02 12:38:42 +05:30
Katya Macedo	5e1dc843d1	Fix quickstart link (#11864 )	2021-11-02 13:27:53 +08:00
Maytas Monsereenusorn	ba2874ee1f	Support changing query granularity in Auto Compaction (#11856 ) * add queryGranularity * fix checkstyle * fix test	2021-11-01 15:18:44 -07:00
Karan Kumar	90640bb316	Support for hadoop 3 via maven profiles (#11794 ) Add support for hadoop 3 profiles . Most of the details are captured in #11791 . We use a combination of maven profiles and resource filtering to achieve this. Hadoop2 is supported by default and a new maven profile with the name hadoop3 is created. This will allow the user to choose the profile which is best suited for the use case.	2021-10-30 22:46:24 +05:30
Maytas Monsereenusorn	33d9d9bd74	Add rollup config to auto and manual compaction (#11850 ) * add rollup to auto and manual compaction * add unit tests * add unit tests * add IT * fix checkstyle	2021-10-29 10:22:25 -07:00
Gian Merlino	fc95c92806	Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs. (#11124 ) * Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs. This patch does the following: - Removes OffheapIncrementalIndex. - Clarifies that Aggregators are required to be thread safe. - Clarifies that BufferAggregators and VectorAggregators are not required to be thread safe. - Removes thread safety code from some DataSketches aggregators that had it. (Not all of them did, and that's OK, because it wasn't necessary anyway.) - Makes enabling "useOffheap" with groupBy v1 an error. Rationale for removing the offheap incremental index: - It is only used in one rare scenario: groupBy v1 (which is non-default) in "useOffheap" mode (also non-default). So you have to go pretty deep into the wilderness to get this code to activate in production. It is never used during ingestion. - Its existence complicates developer efforts to reason about how aggregators get used, because the way it uses buffer aggregators is so different from how every other query engine uses them. - It doesn't have meaningful testing. By the way, I do believe that the given way the offheap incremental index works, it actually didn't require buffer aggregators to be thread-safe. It synchronizes on "aggregate" and doesn't call "get" until it has stopped calling "aggregate". Nevertheless, this is a bother to think about, and for the above reasons I think it makes sense to remove the code anyway. * Remove things that are now unused. * Revert removal of getFloat, getLong, getDouble from BufferAggregator. * OAK-related warnings, suppressions. * Unused item suppressions.	2021-10-26 08:05:56 -07:00
Sergio Ferragut	000a5551fa	docker mem reqs (#11827 ) * docker mem reqs * Update docs/tutorials/docker.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Sergio Ferragut <sergio.ferragut@imply.io> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-10-25 12:23:25 -07:00
Gian Merlino	8276c031c5	Add druid.sql.approxCountDistinct.function property. (#11181 ) * Add druid.sql.approxCountDistinct.function property. The new property allows admins to configure the implementation for APPROX_COUNT_DISTINCT and COUNT(DISTINCT expr) in approximate mode. The motivation for adding this setting is to enable site admins to switch the default HLL implementation to DataSketches. For example, an admin can set: druid.sql.approxCountDistinct.function = APPROX_COUNT_DISTINCT_DS_HLL * Fixes * Fix tests. * Remove erroneous cannotVectorize. * Remove unused import. * Remove unused test imports.	2021-10-25 12:16:21 -07:00
Kashif Faraz	abac9e39ed	Revert permission changes to Supervisor and Task APIs (#11819 ) * Revert "Require Datasource WRITE authorization for Supervisor and Task access (#11718)" This reverts commit `f2d6100124`. * Revert "Require DATASOURCE WRITE access in SupervisorResourceFilter and TaskResourceFilter (#11680)" This reverts commit `6779c4652d`. * Fix docs for the reverted commits * Fix and restore deleted tests * Fix and restore SystemSchemaTest	2021-10-25 14:50:38 +05:30
Charles Smith	10c5fa93f1	remove dupe sentence (#11821 )	2021-10-25 14:48:20 +05:30

2559 Commits