druid

Commit Graph

Author	SHA1	Message	Date
Clint Wylie	35b876436b	remove native scan query legacy mode (#16659 )	2024-07-18 23:33:27 -07:00
Edgar Melendrez	721a65046f	docs: add examples for SQL functions (#16745 ) * updating first batch of numeric functions * First batch of functions * addressing first few comments * alphabetize list * draft with suggestions applied * minor discrepency expr -> <NUMERIC> * changed raises to calculates * Update docs/querying/sql-functions.md * switch to underscore * changed to exp(1) to match slack message * adding html text for trademark symbol to .spelling * fixed discrepancy between description and example --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-07-18 17:06:22 -07:00
Gian Merlino	dbed1b0f50	Defer more expressions in vectorized groupBy. (#16338 ) * Defer more expressions in vectorized groupBy. This patch adds a way for columns to provide GroupByVectorColumnSelectors, which controls how the groupBy engine operates on them. This mechanism is used by ExpressionVirtualColumn to provide an ExpressionDeferredGroupByVectorColumnSelector that uses the inputs of an expression as the grouping key. The actual expression evaluation is deferred until the grouped ResultRow is created. A new context parameter "deferExpressionDimensions" allows users to control when this deferred selector is used. The default is "fixedWidthNonNumeric", which is a behavioral change from the prior behavior. Users can get the prior behavior by setting this to "singleString". * Fix style. * Add deferExpressionDimensions to SqlExpressionBenchmark. * Fix style. * Fix inspections. * Add more testing. * Use valueOrDefault. * Compute exprKeyBytes a bit lighter-weight.	2024-06-26 17:28:36 -07:00
Victoria Lim	836cdb48a5	docs: Migration guide for MVDs to arrays (#16516 ) Co-authored-by: Clint Wylie <cjwylie@gmail.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2024-06-13 13:05:58 -07:00
317brian	8e11adfc6f	docs: remove outdated druidversion var from a page (#16570 ) Co-authored-by: asdf2014 <asdf2014@apache.org>	2024-06-10 15:30:36 +08:00
Gian Merlino	b837ce565b	Simplify serialized form of JsonInputFormat. (#15691 ) * Simplify serialized form of JsonInputFormat. Use JsonInclude for keepNullColumns, assumeNewlineDelimited, and useJsonNodeReader. Because the default value of keepNullColumns is variable, we store the original configured value rather than the derived value, and include if the original value is nonnull. * Fix test.	2024-06-05 20:01:14 -07:00
Charles Smith	8f78c901e7	docs: add lookups to the sidebar (#16530 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-06-03 16:04:15 -07:00
Vadim Ogievetsky	a124c6cbbd	fix typo in extension name (#16466 )	2024-05-20 09:47:22 +08:00
Gian Merlino	72432c2e78	Speed up SQL IN using SCALAR_IN_ARRAY. (#16388 ) * Speed up SQL IN using SCALAR_IN_ARRAY. Main changes: 1) DruidSqlValidator now includes a rewrite of IN to SCALAR_IN_ARRAY, when the size of the IN is above inFunctionThreshold. The default value of inFunctionThreshold is 100. Users can restore the prior behavior by setting it to Integer.MAX_VALUE. 2) SearchOperatorConversion now generates SCALAR_IN_ARRAY when converting to a regular expression, when the size of the SEARCH is above inFunctionExprThreshold. The default value of inFunctionExprThreshold is 2. Users can restore the prior behavior by setting it to Integer.MAX_VALUE. 3) ReverseLookupRule generates SCALAR_IN_ARRAY if the set of reverse-looked-up values is greater than inFunctionThreshold. * Revert test. * Additional coverage. * Update docs/querying/sql-query-context.md Co-authored-by: Benedict Jin <asdf2014@apache.org> * New test. --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-05-14 08:09:27 -07:00
Misha	b5958b6b07	Feature configurable calcite bloat (#16248 ) * Configurable bloat for calcite ProjectMergeRule implemented * Comment added * Default bloat value increased to 1000 * Implemented bloat configuration from QueryContext * Code refactored, docs updated --------- Co-authored-by: sviatahorau <mikhail.sviatahorau@deep.bi>	2024-05-06 20:43:39 +05:30
Gian Merlino	db82adcdfd	SCALAR_IN_ARRAY: Optimization and behavioral follow-ups. (#16311 ) * Four changes to scalar_in_array as follow-ups to #16306: 1) Align behavior for `null` scalars to the behavior of the native `in` and `inType` filters: return `true` if the array itself contains null, else return `null`. 2) Rename the class to more closely match the function name. 3) Add a specialization for constant arrays, where we build a `HashSet`. 4) Use `castForEqualityComparison` to properly handle cross-type comparisons. Additional tests verify comparisons between LONG and DOUBLE are now handled properly. * Fix spelling. * Adjustments from review.	2024-04-26 16:01:17 -07:00
Sree Charan Manamala	ad5701e891	new SCALAR_IN_ARRAY function analogous to DRUID_IN (#16306 ) * scalar_in function * api doc * refactor	2024-04-18 21:15:15 -07:00
Gian Merlino	4285a5e2c6	Update documentation for exceptions to subquery limit. (#16295 ) The true exception for groupBy is somewhat more narrow than the docs suggest.	2024-04-17 21:04:43 -07:00
Charles Smith	1aa6808b9a	docs: add tutorial with examples of sql null handling (#16185 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-04-01 11:03:42 -07:00
Pranav	20de7fd95a	Geo spatial interfaces (#16029 ) This PR creates an interface for ImmutableRTree and moved the existing implementation to new class which represent 32 bit implementation (stores coordinate as floats). This PR makes the ImmutableRTree extendable to create higher precision implementation as well (64 bit). In all spatial bound filters, we accept float as input which might not be accurate in the case of high precision implementation of ImmutableRTree. This PR changed the bound filters to accepts the query bounds as double instead of float and it is backward compatible change as it compares double to existing float values in RTree. Previously it was comparing input float to RTree floats which can cause precision loss, now it is little better as it compares double to float which is still not 100% accurate. There are no changes in the way that we query spatial dimension today except input bound parsing. There is little improvement in string filter predicate which now parse double strings instead of float and compares double to double which is 100% accurate but string predicate is only called when we dont have spatial index. With allowing the interface to extend ImmutableRTree, we allow to create high precision (HP) implementation and defines new search strategies to perform HP search Iterable<ImmutableBitmap> search(ImmutableDoubleNode node, Bound bound); With possible HP implementations, Radius bound filter can not really focus on accuracy, it is calculating Euclidean distance in comparing. As EARTH 🌍 is round and not flat, Euclidean distances are not accurate in geo system. This PR adds new param called 'radiusUnit' which allows you to specify units like meters, km, miles etc. It uses https://en.wikipedia.org/wiki/Haversine_formula to check if given geo point falls inside circle or not. Added a test that generates set of points inside and outside in RadiusBoundTest.	2024-04-01 14:58:03 +05:30
Gian Merlino	256160aba6	MSQ: Validate that strings and string arrays are not mixed. (#15920 ) * MSQ: Validate that strings and string arrays are not mixed. When multi-value strings and string arrays coexist in the same column, it causes problems with "classic MVD" style queries such as: select * from wikipedia -- fails at runtime select count() from wikipedia where flags = 'B' -- fails at planning time select flags, count() from wikipedia group by 1 -- fails at runtime To avoid these problems, this patch adds type verification for INSERT and REPLACE. It is targeted: the only type changes that are blocked are string-to-array and array-to-string. There is also a way to exclude certain columns from the type checks, if the user really knows what they're doing. * Fixes. * Tests and docs and error messages. * More docs. * Adjustments. * Adjust message. * Fix tests. * Fix test in DV mode.	2024-03-13 15:37:27 -07:00
Charles Smith	3caacba8c5	update window functions doc (#15902 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-03-07 15:16:52 -08:00
Zoltan Haindrich	bf0995f846	Introduce dynamic table append (#15897 )	2024-03-01 04:31:57 -05:00
317brian	c98d54f3c4	docs: delete unused file that causes confusion (#15910 )	2024-02-14 16:42:02 -08:00
Clint Wylie	dad8398a4d	start process of deprecating non-sql compatible legacy configurations (#15713 ) Starting the process to officially deprecate non SQL compatible modes by updating docs to aggressively call out that Druids non SQL compliant modes are deprecated and will go away someday. There are no code or behavior changes at this PR.	2024-02-13 15:31:45 +05:30
Katya Macedo	0f29ece6a9	[Docs] Refactor streaming ingestion section (#15591 ) Merging the work so far. @ektravel , @vogievetsky if there are additional improvements, let's track them & make another pr. * Refactor streaming ingestion docs * Update property definition * Update after review * Update known issues * Move kinesis and kafka topics to ingestion, add redirects * Saving changes * Saving * Add input format text * Update after review * Minor text edit * Update example syntax * Revert back to colon * Fix merge conflicts * Fix broken links * Fix spelling error	2024-02-12 13:52:42 -08:00
Gian Merlino	7fea34abdd	LOOKUP docs: clarify behavior of replaceMissingValueWith. (#15879 ) Clarify behavior when expr is null.	2024-02-11 13:11:00 -08:00
317brian	2dc71c7874	docs: fix rendering (#15835 )	2024-02-06 07:18:43 -08:00
Gian Merlino	54b30646f3	Add sqlReverseLookupThreshold for ReverseLookupRule. (#15832 ) If lots of keys map to the same value, reversing a LOOKUP call can slow things down unacceptably. To protect against this, this patch introduces a parameter sqlReverseLookupThreshold representing the maximum size of an IN filter that will be created as part of lookup reversal. If inSubQueryThreshold is set to a smaller value than sqlReverseLookupThreshold, then inSubQueryThreshold will be used instead. This allows users to use that single parameter to control IN sizes if they wish.	2024-02-06 16:32:05 +05:30
Laksh Singla	7d65caf0c5	Update the docs for EARLIEST_BY/LATEST_BY aggregators with the newly added numeric capabilities (#15670 )	2024-02-01 10:24:43 +05:30
Katya Macedo	867c636629	Document pivot and unpivot operators (#15669 )	2024-01-25 09:53:39 -08:00
Pranav	45b30dc07d	Revert "Change default inSubQueryThreshold (#15336 )" (#15722 ) A low value of inSubQueryThreshold can cause queries with IN filter to plan as joins more commonly. However, some of these join queries may not get planned as IN filter on data nodes and causes significant perf regression.	2024-01-22 11:34:39 +05:30
Gian Merlino	cccf13ea82	Reverse, pull up lookups in the SQL planner. (#15626 ) * Reverse, pull up lookups in the SQL planner. Adds two new rules: 1) ReverseLookupRule, which eliminates calls to LOOKUP by doing reverse lookups. 2) AggregatePullUpLookupRule, which pulls up calls to LOOKUP above GROUP BY, when the lookup is injective. Adds configs `sqlReverseLookup` and `sqlPullUpLookup` to control whether these rules fire. Both are enabled by default. To minimize the chance of performance problems due to many keys mapping to the same value, ReverseLookupRule refrains from reversing a lookup if there are more keys than `inSubQueryThreshold`. The rationale for using this setting is that reversal works by generating an IN, and the `inSubQueryThreshold` describes the largest IN the user wants the planner to create. * Add additional line. * Style. * Remove commented-out lines. * Fix tests. * Add test. * Fix doc link. * Fix docs. * Add one more test. * Fix tests. * Logic, test updates. * - Make FilterDecomposeConcatRule more flexible. - Make CalciteRulesManager apply reduction rules til fixpoint. * Additional tests, simplify code.	2024-01-12 00:06:31 -08:00
Victoria Lim	52313c51ac	docs: Anchor link checker (#15624 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2024-01-08 15:19:05 -08:00
Gian Merlino	01eec4a55e	New handling for COALESCE, SEARCH, and filter optimization. (#15609 ) * New handling for COALESCE, SEARCH, and filter optimization. COALESCE is converted by Calcite's parser to CASE, which is largely counterproductive for us, because it ends up duplicating expressions. In the current code we end up un-doing it in our CaseOperatorConversion. This patch has a different approach: 1) Add CaseToCoalesceRule to convert CASE back to COALESCE earlier, before the Volcano planner runs, using CaseToCoalesceRule. 2) Add FilterDecomposeCoalesceRule to decompose calls like "f(COALESCE(x, y))" into "(x IS NOT NULL AND f(x)) OR (x IS NULL AND f(y))". This helps use indexes when available on x and y. 3) Add CoalesceLookupRule to push COALESCE into the third arg of LOOKUP. 4) Add a native "coalesce" function so we can convert 3+ arg COALESCE. The advantage of this approach is that by un-doing the CASE to COALESCE conversion earlier, we have flexibility to do more stuff with COALESCE (like decomposition and pushing into LOOKUP). SEARCH is an operator used internally by Calcite to represent matching an argument against some set of ranges. This patch improves our handling of SEARCH in two ways: 1) Expand NOT points (point "holes" in the range set) from SEARCH as `!(a || b)` rather than `!a && !b`, which makes it possible to convert them to a "not" of "in" filter later. 2) Generate those nice conversions for NOT points even if the SEARCH is not composed of 100% NOT points. Without this change, a SEARCH for "x NOT IN ('a', 'b') AND x < 'm'" would get converted like "x < 'a' OR (x > 'a' AND x < 'b') OR (x > 'b' AND x < 'm')". One of the steps we take when generating Druid queries from Calcite plans is to optimize native filters. This patch improves this step: 1) Extract common ANDed predicates in ConvertSelectorsToIns, so we can convert "(a && x = 'b') || (a && x = 'c')" into "a && x IN ('b', 'c')". 2) Speed up CombineAndSimplifyBounds and ConvertSelectorsToIns on ORs with lots of children by adjusting the logic to avoid calling "indexOf" and "remove" on an ArrayList. 3) Refactor ConvertSelectorsToIns to reduce duplicated code between the handling for "selector" and "equals" filters. * Not so final. * Fixes. * Fix test. * Fix test.	2024-01-03 08:56:22 -08:00
Clint Wylie	e55f6b6202	remove search auto strategy, estimateSelectivity of BitmapColumnIndex (#15550 ) * remove search auto strategy, estimateSelectivity of BitmapColumnIndex * more cleanup	2023-12-13 16:30:01 -08:00
Clint Wylie	e8fcf2cac8	minor doc adjustments (#15531 )	2023-12-11 18:22:44 -08:00
Katya Macedo	fc222377ae	[Docs] Document decode_base64_complex and decode_base64_utf8 functions (#15444 )	2023-12-11 09:12:06 -08:00
Katya Macedo	355c800108	Revamp design page (#15486 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-12-08 11:40:24 -08:00
Clint Wylie	e64b92eb35	add JSON_QUERY_ARRAY function to pluck ARRAY<COMPLEX<json>> out of COMPLEX<json> (#15521 )	2023-12-08 05:28:46 -08:00
sb89594	5fda8613ad	Feature: Add IPv6 Match Function (#15212 )	2023-12-07 23:09:06 -08:00
Charles Smith	db3a633250	update timeseries to reflect NULL filling (#15512 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-12-07 14:41:27 -08:00
Clint Wylie	82ac48786b	document arrayContainsElement filter (#15455 )	2023-12-07 00:14:00 -08:00
Pranav	74ab6024e1	Native doc update (#15456 ) Updating the native docs for #15434	2023-11-30 10:37:23 +05:30
Pranav	93cd638645	Enabling aggregateMultipleValues in all StringAnyAggregators (#15434 ) * Enabling aggregateMultipleValues in all StringAnyAggregators * Adding more tests * More validation * fix warning * updating asserts in decoupled mode * fix intellij inspection * Addressing comments * Addressing comments * Adding early validations and make aggregate consistent across all * fixing tests * fixing tests * Update docs/querying/sql-aggregations.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * fixing static check --------- Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2023-11-29 14:32:49 -08:00
Abhishek Agarwal	0a56c87e93	SQL: Plan non-equijoin conditions as cross join followed by filter (#15302 ) This PR revives #14978 with a few more bells and whistles. Instead of an unconditional cross-join, we will now split the join condition such that some conditions are now evaluated post-join. To decide what sub-condition goes where, I have refactored DruidJoinRule class to extract unsupported sub-conditions. We build a postJoinFilter out of these unsupported sub-conditions and push to the join.	2023-11-29 13:46:11 +05:30
Zoltan Haindrich	eb056e23b5	Fix dictionarySize overrides in tests (#15354 ) I think this is a problem as it discards the false return value when the putToKeyBuffer can't store the value because of the limit Not forwarding the return value at that point may lead to the normal continuation here regardless something was not added to the dictionary like here	2023-11-28 18:49:09 +05:30
Charles Smith	a929b9f16e	clafiry DISTINCT is optional for COUNT() (#15394 )	2023-11-28 16:52:16 +05:30
Clint Wylie	a95c22ce70	support non-constant expressions for path arguments for json_value and json_query (#15320 ) * support dynamic expressions for path arguments for json_value and json_query	2023-11-17 01:12:05 -08:00
Charles Smith	6a5da5a05e	fix redirect for api docs and misc array-related typos (#15387 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-11-16 13:29:19 -08:00
Karan Kumar	857b8de425	Query from deep storage doc fixes. (#15382 ) Fixing outdated query from deep storage docs.	2023-11-16 14:05:20 +05:30
Adarsh Sanjeev	a134cc30a6	Change default inSubQueryThreshold (#15336 )	2023-11-14 14:08:12 +05:30
Charles Smith	0403e48266	window functions docs (#14739 ) * draft window functions * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * address comments * remove default column * Update docs/querying/sql-window-functions.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql-window-functions.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * fix ntile * remove default header column * code tics to remove spelling errors * add known issues, add SUM example * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * address spelling * remove extra chars * add to sidebar, fix admonition * Update sql-window-functions.md accept suggestion, change admonition style * update sidebar * Delete Untitled.ipynb rm unwanted file * Update docs/querying/sql-window-functions.md * Update docs/querying/sql-window-functions.md * update context param, accept suggestions * accept suggestions * Apply suggestions from code review * Fix known issues * require GROUP BY, explain order of operation * accept suggestions * fix spelling --------- Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-11-06 11:34:42 -08:00
Karan Kumar	5036af6fb3	Doc fixes for query from deep storage and MSQ (#15313 ) Minor updates to the documentation. Added prerequisites. Removed a known issue in MSQ since its no longer valid. --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-11-03 10:52:20 +05:30
Clint Wylie	d261587f4a	explicit outputType for ExpressionPostAggregator, better documentation for the differences between arrays and mvds (#15245 ) * better documentation for the differences between arrays and mvds * add outputType to ExpressionPostAggregator to make docs true * add output coercion if outputType is defined on ExpressionPostAgg * updated post-aggregations.md to be consistent with aggregations.md and filters.md and use tables	2023-11-02 00:31:37 -07:00

310 Commits