druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	cccf13ea82	Reverse, pull up lookups in the SQL planner. (#15626 ) * Reverse, pull up lookups in the SQL planner. Adds two new rules: 1) ReverseLookupRule, which eliminates calls to LOOKUP by doing reverse lookups. 2) AggregatePullUpLookupRule, which pulls up calls to LOOKUP above GROUP BY, when the lookup is injective. Adds configs `sqlReverseLookup` and `sqlPullUpLookup` to control whether these rules fire. Both are enabled by default. To minimize the chance of performance problems due to many keys mapping to the same value, ReverseLookupRule refrains from reversing a lookup if there are more keys than `inSubQueryThreshold`. The rationale for using this setting is that reversal works by generating an IN, and the `inSubQueryThreshold` describes the largest IN the user wants the planner to create. * Add additional line. * Style. * Remove commented-out lines. * Fix tests. * Add test. * Fix doc link. * Fix docs. * Add one more test. * Fix tests. * Logic, test updates. * - Make FilterDecomposeConcatRule more flexible. - Make CalciteRulesManager apply reduction rules til fixpoint. * Additional tests, simplify code.	2024-01-12 00:06:31 -08:00
Adarsh Sanjeev	a134cc30a6	Change default inSubQueryThreshold (#15336 )	2023-11-14 14:08:12 +05:30
Laksh Singla	95bf331c08	Rename the default setting of 'maxSubqueryBytes' from 'unlimited' to 'disabled' (#15108 ) The default setting of 'maxSubqueryBytes' is renamed from 'unlimited' to 'disabled'.	2023-10-10 02:03:29 +05:30
Katya Macedo	269137c682	Update Ingestion section (#14023 ) Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Victoria Lim <lim.t.victoria@gmail.com>	2023-05-19 09:42:27 -07:00
benkrug	66034dd8bc	Update default for finalize in query-context.md (#13763 ) Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-02-22 12:35:36 -08:00
Victoria Lim	02ad62a08c	Docs: update description of query priority default value (#13191 ) * update description of default for query priority * update order * update terms * standardize to query context parameters	2022-10-14 14:28:04 -07:00
Rohan Garg	5394838030	Enable conversion of join to filter by default (#12868 )	2022-08-13 20:37:43 +05:30
Gian Merlino	5b6727f319	Enable vectorized virtual column processing by default. (#12520 ) In the majority of cases, this improves performance. There's only one case I'm aware of where this may be a net negative: for time_floor(__time, <period>) where there are many repeated __time values. In nonvectorized processing, SingleLongInputCachingExpressionColumnValueSelector implements an optimization to avoid computing the time_floor function on every row. There is no such optimization in vectorized processing. IMO, we shouldn't mention this in the docs. Rationale: It's too fiddly of a thing: it's not guaranteed that nonvectorized processing will be faster due to the optimization, because it would have to overcome the inherent speed advantage of vectorization. So it'd always require testing to determine the best setting for a specific dataset. It would be bad if users disabled vectorization thinking it would speed up their queries, and it actually slowed them down. And even if users do their own testing, at some point in the future we'll implement the optimization for vectorized processing too, and it's likely that users that explicitly disabled vectorization will continue to have it disabled. I'd like to avoid this outcome by encouraging all users to enable vectorization at all times. Really advanced users would be following development activity anyway, and can read this issue	2022-05-16 15:43:53 +05:30
Gian Merlino	ff253fd8a3	Add setProcessingThreadNames context parameter. (#12514 ) setting thread names takes a measurable amount of time in the case where segment scans are very quick. In high-QPS testing we found a slight performance boost from turning off processing thread renaming. This option makes that possible.	2022-05-16 13:42:00 +05:30
Victoria Lim	f95447070e	updated docs for sql query context (#12406 )	2022-04-21 11:19:39 -07:00
Adarsh Sanjeev	ef45a1551e	Convert inQueryThreshold into query context parameter. (#12357 ) Added Calcites InQueryThreshold as a query context parameter. Setting this parameter appropriately reduces the time taken for queries with large number of values in their IN conditions.	2022-03-22 18:33:57 +05:30
somu-imply	eae163a797	Moving in filter check to broker (#12195 ) * Moving in filter check to broker * Adding more unit tests, making error message meaningful * Spelling and doc changes * Updating default to -1 and making this feature hide by default. The number of IN filters can grow upto a max limit of 100 * Removing upper limit of 100, updated docs * Making documentation more meaningful * Moving check outside to PlannerConfig, updating test cases and adding back max limit * Updated with some additional code comments * Missed removing one line during the checkin * Addressing doc changes and one forbidden API correction * Final doc change * Adding a speling exception, correcting a testcase * Reading entire filter tree to address combinations of ANDs and ORs * Specifying in docs that, this case works only for ORs * Revert "Reading entire filter tree to address combinations of ANDs and ORs" This reverts commit `81ca8f8496`. * Covering a class cast exception and updating docs * Counting changed Co-authored-by: Jihoon Son <jihoonson@apache.org>	2022-02-15 20:45:07 -08:00
Victoria Lim	c61b19d443	Refactor SQL docs (#12239 ) * refactor and link fixes * add sql docs to left nav * code format for needle * updated web console script * link fixes * update earliest/latest functions * edits for grammar and style * more link fixes * another link * update with #12226 * update .spelling file	2022-02-11 14:43:30 -08:00
Victoria Lim	e77bdfa70d	Document query context parameters related to join filters (#12057 ) * docs update for query context and filters * updates from review * Update docs/querying/filters.md	2021-12-13 17:47:21 -08:00
Clint Wylie	a8815f671e	Fix druid client timeout zero (#12023 ) * fix bug where queries fail immediately when timeout is 0 instead of using default timeout * fix to use serverside max * more better * less flaky test * oops	2021-12-07 12:41:01 -08:00
Maytas Monsereenusorn	3257913737	Improve query error logging (#11519 ) * Improve query error logging * add docs * address comments * address comments	2021-08-05 22:51:09 +07:00
Kashif Faraz	8a4e27f51d	Select broker based on query context parameter `brokerService` (#11495 ) This change allows the selection of a specific broker service (or broker tier) by the Router. The newly added ManualTieredBrokerSelectorStrategy works as follows: Check for the parameter brokerService in the query context. If this is a valid broker service, use it. Check if the field defaultManualBrokerService has been set in the strategy. If this is a valid broker service, use it. Move on to the next strategy	2021-07-27 20:56:05 +05:30
Charles Smith	a1ed3a407d	clarify bySegment is native only (#11331 )	2021-06-11 13:48:17 -07:00
Abhishek Agarwal	c66951a59e	Add flag in SQL to disable left base filter optimization for joins (#10947 ) * Add flag to disable left base filter * code coverage * Draft * Review comments * code coverage * add docs * Add old tests	2021-03-09 13:07:34 -08:00
sthetland	6ae8059c09	cleaning up and fixing links (#10528 ) * cleaning up and fixing links * reverting local link * Update indexer.md * link checking * Fixing one more stale link for PostgreSQL	2020-12-17 13:37:43 -08:00
Clint Wylie	1d6cb624f4	add vectorizeVirtualColumns query context parameter (#10432 ) * add vectorizeVirtualColumns query context parameter * oops * spelling * default to false, more docs * fix test * fix spelling	2020-09-28 18:48:34 -07:00
Jihoon Son	0cc9eb4903	Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided (#10288 ) * Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided * query context * fix tests; add more test * javadoc * docs and more tests * remove default and hadoop tests * consistent name and fix javadoc * spelling and field name * default function for partitionsSpec * other comments * address comments * fix tests and spelling * test * doc	2020-09-24 16:32:56 -07:00
Suneet Saldanha	f71ba6f2c2	Vectorized ANY aggregators (#10338 ) * WIP vectorized ANY aggregators * tests * fix aggs * cleanup * code review + tests * docs * use NilVectorSelector when needed * fix spellcheck * dont instantiate vectors * cleanup	2020-09-14 19:44:58 -07:00
Abhishek Agarwal	a5c46dc84b	Add vectorization for druid-histogram extension (#10304 ) * First draft * Remove redundant code from FixedBucketsHistogramAggregator classes * Add test cases for new classes * Fix tests in sql compatible mode * Typo fix * Fix comment * Add spelling * Vectorize only for supported types * Rename internal aggregator files * Fix tests	2020-09-09 13:56:33 -07:00
Abhishek Radhakrishnan	dc16abae34	Vectorization support for long, double, float min & max aggregators. (#10260 ) * LongMaxVectorAggregator support and test case. * DoubleMinVectorAggregator and test cases. * DoubleMaxVectorAggregator and unit test. * FloatMinVectorAggregator and FloatMaxVectorAggregator. * Documentation update to include the other vector aggregators. * Bug fix. * checkstyle formatting fixes. * CalciteQueryTest cases update. * Separate test classes for FloatMaxAggregation and FloatMniAggregation. * remove the cannotVectorize for float max/min aggregator in test. * Tests in GroupByQueryRunner, GroupByTimeseriesQueryRunner and TimeseriesQueryRunner.	2020-08-10 15:18:55 -07:00
Abhishek Radhakrishnan	34a4113752	Add vectorization support for the longMin aggregator. (#10211 ) * Fix minor formatting in docs. * Add Nullhandling initialization for test to run from IDE. * Vectorize longMin aggregator. - A new vectorized class for the vectorized long min aggregator. - Changes to AggregatorFactory to support vectorize functionality. - Few changes to schema evolution test to add LongMinAggregatorFactory. * Add longSum to the supported vectorized aggregator implementations. * Add MIN() long min to calcite query test that can vectorize. * Add simple long aggregations test. * Fixup formatting per checkstyle guide. * fixup and add more tests for long min aggregator. * Override test for groupBy since timestamps are handled differently. * Null compatibility check in test. * Review comment: Add a test case to LongMinAggregationTest.	2020-08-01 15:32:09 -07:00
Maytas Monsereenusorn	574b062f1f	Cluster wide default query context setting (#10208 ) * Cluster wide default query context setting * Cluster wide default query context setting * Cluster wide default query context setting * add docs * fix docs * update props * fix checkstyle * fix checkstyle * fix checkstyle * update docs * address comments * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * fix NPE	2020-07-29 15:19:18 -07:00
Maytas Monsereenusorn	9be5039f68	Enable query vectorization by default (#10065 ) * Enable query vectorization by default * update docs	2020-06-24 13:08:49 -07:00
Clint Wylie	e677c62484	document useFilterCNF query context parameter (#9647 ) * document useFilterCNF query context parameter * move context key to QueryContexts * Update .spelling	2020-04-16 22:12:20 -07:00
Gian Merlino	42590ae64b	Refresh query docs. (#9704 ) * Refresh query docs. Larger changes: - New doc: querying/datasource.md describes the various kinds of datasources you can use, and has examples for both SQL and native. - New doc: querying/query-execution.md describes how native queries are executed at a high level. It doesn't go into the details of specific query engines or how queries run at a per-segment level. But I think it would be good to add or link that content here in the future. - Refreshed doc: querying/sql.md updated to refer to joins, reformatted a bit, added a new "Query translation" section that explains how queries are translated from SQL to native, and removed configuration details (moved to configuration/index.md). - Refreshed doc: querying/joins.md updated to refer to join datasources. Smaller changes: - Add helpful banners to the top of query documentation pages telling people whether a given page describes SQL, native, or both. - Add SQL metrics to operations/metrics.md. - Add some color and cross-links in various places. - Add native query component docs to the sidebar, and renamed them so they look nicer. - Remove Select query from the sidebar. - Fix Broker SQL configs in configuration/index.md. Remove them from querying/sql.md. - Combined querying/searchquery.md and querying/searchqueryspec.md. * Updates. * Fix numbering. * Fix glitches. * Add new words to spellcheck file. * Assorted changes. * Further adjustments. * Add missing punctuation.	2020-04-15 16:12:20 -07:00
Clint Wylie	8b9fe6f584	query laning and load shedding (#9407 ) * prototype * merge QueryScheduler and QueryManager * everything in its right place * adjustments * docs * fixes * doc fixes * use resilience4j instead of semaphore * more tests * simplify * checkstyle * spelling * oops heh * remove unused * simplify * concurrency tests * add SqlResource tests, refactor error response * add json config tests * use LongAdder instead of AtomicLong * remove test only stuffs from scheduler * javadocs, etc * style * partial review stuffs * adjust * review stuffs * more javadoc * error response documentation * spelling * preserve user specified lane for NoSchedulingStrategy * more test, why not * doc adjustment * style * missed review for make a thing a constant * fixes and tests * fix test * Update docs/configuration/index.md Co-Authored-By: sthetland <steve.hetland@imply.io> * doc update Co-authored-by: sthetland <steve.hetland@imply.io>	2020-03-10 02:57:16 -07:00
Gian Merlino	d21054f7c5	Remove the deprecated interval-chunking stuff. (#9216 ) * Remove the deprecated interval-chunking stuff. See https://github.com/apache/druid/pull/6591, https://github.com/apache/druid/pull/4004#issuecomment-284171911 for details. * Remove unused import. * Remove chunkInterval too.	2020-01-19 17:14:23 -08:00
Jonathan Wei	aa539177ec	De-incubation cleanup in code, docs, packaging (#9108 ) * De-incubation cleanup in code, docs, packaging * remove unused docs script	2020-01-03 12:33:19 -05:00
Clint Wylie	7aafcf8bca	parallel broker merges on fork join pool (#8578 ) * sketch of broker parallel merges done in small batches on fork join pool * fix non-terminating sequences, auto compute parallelism * adjust benches * adjust benchmarks * now hella more faster, fixed dumb * fix * remove comments * log.info for debug * javadoc * safer block for sequence to yielder conversion * refactor LifecycleForkJoinPool into LifecycleForkJoinPoolProvider which wraps a ForkJoinPool * smooth yield rate adjustment, more logs to help tune * cleanup, less logs * error handling, bug fixes, on by default, more parallel, more tests * remove unused var * comments * timeboundary mergeFn * simplify, more javadoc * formatting * pushdown config * use nanos consistently, move logs back to debug level, bit more javadoc * static terminal result batch * javadoc for nullability of createMergeFn * cleanup * oops * fix race, add docs * spelling, remove todo, add unhandled exception log * cleanup, revert unintended change * another unintended change * review stuff * add ParallelMergeCombiningSequenceBenchmark, fixes * hyper-threading is the enemy * fix initial start delay, lol * parallelism computer now balances partition sizes to partition counts using sqrt of sequence count instead of sequence count by 2 * fix those important style issues with the benchmarks code * lazy sequence creation for benchmarks * more benchmark comments * stable sequence generation time * update defaults to use 100ms target time, 4096 batch size, 16384 initial yield, also update user docs * add jmh thread based benchmarks, cleanup some stuff * oops * style * add spread to jmh thread benchmark start range, more comments to benchmarks parameters and purpose * retool benchmark to allow modeling more typical heterogenous heavy workloads * spelling * fix * refactor benchmarks * formatting * docs * add maxThreadStartDelay parameter to threaded benchmark * why does catch need to be on its own line but else doesnt	2019-11-07 11:58:46 -08:00
Jihoon Son	f5b9bf5525	Cluster-wide configuration for query vectorization (#8657 ) * Cluster-wide configuration for query vectorization * add doc * fix build * fix doc * rename to QueryConfig and add javadoc * fix checkstyle * fix variable names	2019-10-23 21:44:28 +08:00
Clint Wylie	8bda3afea4	fix spelling errors triggered by another doc PR (#8653 )	2019-10-08 23:43:58 -07:00
Vadim Ogievetsky	52f3f2c229	fix docs version interpolation (#8568 )	2019-09-22 17:38:55 -07:00
Gian Merlino	d007477742	Docusaurus build framework + ingestion doc refresh. (#8311 ) * Docusaurus build framework + ingestion doc refresh. * stick to npm instead of yarn * fix typos * restore some _bin * Adjustments. * detect and fix redirect anchors * update anchor lint * Web-console: remove specific column filters (#8343) * add clear filter * update tool kit * remove usless check * auto run * add % * Fix resource leak (#8337) * Fix resource leak * Patch comments * Enable Spotbugs NP_NONNULL_RETURN_VIOLATION (#8234) * Fixes from PR review. * Fix more anchors. * Preamble nix. * Fix more anchors, headers * clean up placeholder page * add to website lint to travis config * better broken link checking * travis fix * Fixed more broken links * better redirects * unfancy catch * fix LGTM error * link fixes * fix md issues * Addl fixes	2019-08-20 21:48:59 -07:00

38 Commits