druid

Commit Graph

Author	SHA1	Message	Date
Victoria Lim	63a993c33a	stringFirst and stringLast supported in ingestion (#12466 )	2022-04-22 10:28:49 +08:00
Victoria Lim	f95447070e	updated docs for sql query context (#12406 )	2022-04-21 11:19:39 -07:00
jacobtolar	0edc22179c	Document expression post-aggregators (#11896 ) * Document expression post-aggregators * Update docs/querying/post-aggregations.md Co-authored-by: Frank Chen <frankchen@apache.org> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-04-19 10:36:19 +08:00
Victoria Lim	c86c48203e	recommendation for comparing strings and numbers (#12442 )	2022-04-18 09:28:32 -07:00
Peter Marshall	5167d328b1	Docs - query caching (#11584 ) * Update caching.md Knowledge from https://the-asf.slack.com/archives/CJ8D1JTB8/p1597781107153900 Update caching.md A few additional updates OTBO https://the-asf.slack.com/archives/CJ8D1JTB8/p1608669046041300 * Update caching.md Typos * Amendments on the segment cache Significant updates on content around the segment cache, pull process, and in-memory cache * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/basic-cluster-tuning.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/operations/basic-cluster-tuning.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update basic-cluster-tuning.md typo * Update docs/querying/caching.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Whole-query caching update Made more succinct and removed specific config to change. * Update docs/design/historical.md Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-04-18 17:00:21 +08:00
Adarsh Sanjeev	ef45a1551e	Convert inQueryThreshold into query context parameter. (#12357 ) Added Calcites InQueryThreshold as a query context parameter. Setting this parameter appropriately reduces the time taken for queries with large number of values in their IN conditions.	2022-03-22 18:33:57 +05:30
Gian Merlino	875e0696e0	GroupBy: Cap dictionary-building selector memory usage. (#12309 ) * GroupBy: Cap dictionary-building selector memory usage. New context parameter "maxSelectorDictionarySize" controls when the per-segment processing code should return early and trigger a trip to the merge buffer. Includes: - Vectorized and nonvectorized implementations. - Adjustments to GroupByQueryRunnerTest to exercise this code in the v2SmallDictionary suite. (Both the selector dictionary and the merging dictionary will be small in that suite.) - Tests for the new config parameter. * Fix issues from tests. * Add "pre-existing" to dictionary. * Simplify GroupByColumnSelectorStrategy interface by removing one of the writeToKeyBuffer methods. * Adjustments from review comments.	2022-03-08 13:13:11 -08:00
Karan Kumar	5794331eb1	Adding new config for disabling group by on multiValue column (#12253 ) As part of #12078 one of the followup's was to have a specific config which does not allow accidental unnesting of multi value columns if such columns become part of the grouping key. Added a config groupByEnableMultiValueUnnesting which can be set in the query context. The default value of groupByEnableMultiValueUnnesting is true, therefore it does not change the current engine behavior. If groupByEnableMultiValueUnnesting is set to false, the query will fail if it encounters a multi-value column in the grouping key.	2022-02-16 20:53:26 +05:30
somu-imply	eae163a797	Moving in filter check to broker (#12195 ) * Moving in filter check to broker * Adding more unit tests, making error message meaningful * Spelling and doc changes * Updating default to -1 and making this feature hide by default. The number of IN filters can grow upto a max limit of 100 * Removing upper limit of 100, updated docs * Making documentation more meaningful * Moving check outside to PlannerConfig, updating test cases and adding back max limit * Updated with some additional code comments * Missed removing one line during the checkin * Addressing doc changes and one forbidden API correction * Final doc change * Adding a speling exception, correcting a testcase * Reading entire filter tree to address combinations of ANDs and ORs * Specifying in docs that, this case works only for ORs * Revert "Reading entire filter tree to address combinations of ANDs and ORs" This reverts commit `81ca8f8496`. * Covering a class cast exception and updating docs * Counting changed Co-authored-by: Jihoon Son <jihoonson@apache.org>	2022-02-15 20:45:07 -08:00
Victoria Lim	c61b19d443	Refactor SQL docs (#12239 ) * refactor and link fixes * add sql docs to left nav * code format for needle * updated web console script * link fixes * update earliest/latest functions * edits for grammar and style * more link fixes * another link * update with #12226 * update .spelling file	2022-02-11 14:43:30 -08:00
Clint Wylie	ae71e05fc5	array_concat_agg and array_agg support for array inputs (#12226 ) * array_concat_agg and array_agg support for array inputs changes: * added array_concat_agg to aggregate arrays into a single array * added array_agg support for array inputs to make nested array * added 'shouldAggregateNullInputs' and 'shouldCombineAggregateNullInputs' to fix a correctness issue with STRING_AGG and ARRAY_AGG when merging results, with dual purpose of being an optimization for aggregating * fix test * tie capabilities type to legacy mode flag about coercing arrays to strings * oops * better javadoc	2022-02-07 19:59:30 -08:00
Clint Wylie	f2ce76966c	add EARLIEST_BY/LATEST_BY to make EARLIEST/LATEST function signatures less ambiguous (#12145 ) * add EARLIEST_BY/LATEST_BY to make EARLIEST/LATEST function signatures unambiguous * switcheroo * EARLIEST_BY/LATEST_BY use timestamp instead of numeric types, update docs * revert unintended change * fix docs * fix docs better	2022-01-12 03:48:53 -08:00
Vadim Ogievetsky	2299eb321e	Standardizing SQL function docs (#12091 ) * fix typos in SQL function docs * more code * Update docs/querying/sql.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/querying/sql.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/querying/sql.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/querying/sql.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/querying/sql.md Co-authored-by: Frank Chen <frankchen@apache.org> * a few more expr, fixes * more fixes * quote TIME_SHIFT * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * undo header change Co-authored-by: Frank Chen <frankchen@apache.org> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-01-06 23:57:03 -08:00
Victoria Lim	6846622080	Docs: add FILTER to sql query syntax (#12093 ) * docs: add FILTER to sql query syntax * Update docs/querying/sql.md * Update docs/querying/sql.md * Update docs/querying/sql.md * Update docs/querying/sql.md * move and update FILTER section	2022-01-05 12:59:41 -08:00
Victoria Lim	acbeae23b8	New doc for troubleshooting query execution (#12075 ) * new doc for troubleshooting query execution * add doc to sidebar * Apply suggestions from code review	2021-12-16 17:34:34 -08:00
Victoria Lim	4ede3bbff6	Docs updates (#12069 ) * minor updates to docs * remove en.json	2021-12-14 14:38:18 -08:00
Victoria Lim	e77bdfa70d	Document query context parameters related to join filters (#12057 ) * docs update for query context and filters * updates from review * Update docs/querying/filters.md	2021-12-13 17:47:21 -08:00
Clint Wylie	a8815f671e	Fix druid client timeout zero (#12023 ) * fix bug where queries fail immediately when timeout is 0 instead of using default timeout * fix to use serverside max * more better * less flaky test * oops	2021-12-07 12:41:01 -08:00
Rohan Garg	2c08055962	Specify time column for first/last aggregators (#11949 ) Add the ability to pass time column in first/last aggregator (and latest/earliest SQL functions). It is to support cases where the time to query upon is stored as a part of a column different than __time. Also, some other logical time column can be specified.	2021-11-25 09:44:14 +05:30
somu-imply	29710789a4	Adding safe divide function (#11904 ) * IMPLY-4344: Adding safe divide function along with testcases and documentation updates * Changing based on review comments * Addressing review comments, fixing coding style, docs and spelling * Checkstyle passes for all code * Fixing expected results for infinity * Revert "Fixing expected results for infinity" This reverts commit `5fd5cd480d`. * Updating test result and a space in docs	2021-11-17 08:22:41 -08:00
sthetland	02b578a3dd	Fixing a few typos and style issues (#11883 ) * grammar and format work * light writing touchup Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-11-16 10:13:35 -08:00
Gian Merlino	6f6e88e02e	SQL: Add type headers to response formats. (#11914 ) This allows clients to interpret the results of SQL queries without having to guess types.	2021-11-13 11:30:57 +05:30
Charles Smith	33a5cda061	Docs: Splits Kafka topic. Adds detailed example for kafka inputFormat (#11912 ) * Splits Kafka topic according to function. Adds detailed example for kafka inputFormat * Apply suggestions from code review accept suggestions from review Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Apply suggestions from code review accept suggestions Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * accept suggestions * accept suggestions * final typos and clarifications * bringing forward some syntax fixes Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2021-11-12 13:02:23 -08:00
Clint Wylie	5baa22148e	revert ColumnAnalysis type, add typeSignature and use it for DruidSchema (#11895 ) * revert ColumnAnalysis type, add typeSignature and use it for DruidSchema * review stuffs * maybe null * better maybe null * Update docs/querying/segmentmetadataquery.md * Update docs/querying/segmentmetadataquery.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * fix null right * sad * oops * Update batch_hadoop_queries.json Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-11-10 18:46:29 -08:00
Gian Merlino	fc95c92806	Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs. (#11124 ) * Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs. This patch does the following: - Removes OffheapIncrementalIndex. - Clarifies that Aggregators are required to be thread safe. - Clarifies that BufferAggregators and VectorAggregators are not required to be thread safe. - Removes thread safety code from some DataSketches aggregators that had it. (Not all of them did, and that's OK, because it wasn't necessary anyway.) - Makes enabling "useOffheap" with groupBy v1 an error. Rationale for removing the offheap incremental index: - It is only used in one rare scenario: groupBy v1 (which is non-default) in "useOffheap" mode (also non-default). So you have to go pretty deep into the wilderness to get this code to activate in production. It is never used during ingestion. - Its existence complicates developer efforts to reason about how aggregators get used, because the way it uses buffer aggregators is so different from how every other query engine uses them. - It doesn't have meaningful testing. By the way, I do believe that the given way the offheap incremental index works, it actually didn't require buffer aggregators to be thread-safe. It synchronizes on "aggregate" and doesn't call "get" until it has stopped calling "aggregate". Nevertheless, this is a bother to think about, and for the above reasons I think it makes sense to remove the code anyway. * Remove things that are now unused. * Revert removal of getFloat, getLong, getDouble from BufferAggregator. * OAK-related warnings, suppressions. * Unused item suppressions.	2021-10-26 08:05:56 -07:00
Gian Merlino	8276c031c5	Add druid.sql.approxCountDistinct.function property. (#11181 ) * Add druid.sql.approxCountDistinct.function property. The new property allows admins to configure the implementation for APPROX_COUNT_DISTINCT and COUNT(DISTINCT expr) in approximate mode. The motivation for adding this setting is to enable site admins to switch the default HLL implementation to DataSketches. For example, an admin can set: druid.sql.approxCountDistinct.function = APPROX_COUNT_DISTINCT_DS_HLL * Fixes * Fix tests. * Remove erroneous cannotVectorize. * Remove unused import. * Remove unused test imports.	2021-10-25 12:16:21 -07:00
Victoria Lim	43103632fb	Docs - add description on time origin (#11826 ) * add description on time origin * reorder parameter descriptions * add example of origin value	2021-10-22 14:57:13 -07:00
Victoria Lim	a31d99fb37	update docs with X-Druid-SQL-Query-Id (#11761 ) * update docs with X-Druid-SQL-Query-Id * review comments * update header description * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-10-06 00:15:05 +07:00
Peter Marshall	abd19a8896	Docs - SYS query examples (#11673 ) * Update sql.md Added two example queries and adjusted formatting of one that was already there * Update docs/querying/sql.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/querying/sql.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/querying/sql.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update docs/querying/sql.md Co-authored-by: Frank Chen <frankchen@apache.org> * Update sql.md Co-authored-by: Frank Chen <frankchen@apache.org>	2021-09-17 08:27:34 -07:00
Clint Wylie	5e092ccb9b	add MV_FILTER_ONLY, MV_FILTER_NONE, ListFilteredVirtualColumn (#11650 ) * add MV_FILTER_ONLY SQL function, and list filter virtual column * MV_FILTER_NONE and more tests * formatting * o yeah, forgot can do easy thing * style * hmm why was that there * test filtering on virtual column * style * meh * do it right * good bot	2021-09-16 09:31:53 -07:00
Charles Smith	1ae1bbfc4f	docs: delete / cancel query (#11708 ) * draft delete query * Update docs/querying/sql.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * address comments * Update docs/querying/sql.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update docs/querying/sql.md Co-authored-by: Jihoon Son <jihoonson@apache.org> * Update sql.md fix port for router * Update sql.md remove authorization until it is 403 * Update sql.md add 403 message Co-authored-by: Jihoon Son <jihoonson@apache.org> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2021-09-15 20:26:04 -07:00
Charles Smith	f9329fbf9e	add clarification for maxSubqueryRows (#11687 ) * add clarification for maxSubqueryRows	2021-09-13 11:49:30 -07:00
Peter Marshall	f16cd2a815	Docs - granularities link back to segmentGranularity (#11672 ) * Update granularities.md Link-back to the ingestion spec as well as Native queries plus examples. * Update docs/querying/granularities.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/granularities.md Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-09-10 10:40:11 -07:00
Jihoon Son	7e90d00cc0	Configurable maxStreamLength for doubles sketches (#11574 ) * Configurable maxStreamLength for doubles sketches * fix equals/hashcode and it test failure * fix test * fix it test * benchmark * doc * grouping key * fix comment * dependency check * Update docs/development/extensions-core/datasketches-quantiles.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Update docs/querying/sql.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-08-31 14:56:37 -07:00
Gian Merlino	ec6c6e2d53	Docs: Clarify segmentMetadata cardinality, minmax, and size behavior. (#11549 ) * Docs: Clarify segmentMetadata cardinality, minmax, and size behavior. * Further clarifications. * Update docs/querying/segmentmetadataquery.md style update Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2021-08-26 15:39:40 -07:00
Charles Smith	66964a261b	fixes syntax for TRIM (#11619 ) * fixes syntax for TRIM * trim erroneous quote * fix typo	2021-08-23 11:44:19 -07:00
Charles Smith	91cd573472	fixes web console introduction and addresses linking issues (#11609 ) * fixes web console introduction and addresses linking issues * fix merge conflict	2021-08-18 08:37:05 -07:00
Gian Merlino	4e5f9cdacf	Add pushes to DataSketches in SQL docs. (#11578 ) * Add pushes to DataSketches in SQL docs. These notices were already in the native docs, but they were missing from the SQL docs. * Grammar fix.	2021-08-16 10:38:56 -07:00
frank chen	e40be0ae28	Add SQL functions to format numbers into human readable format (#10635 ) * add binary_byte_format/decimal_byte_format/decimal_format * clean code * fix doc * fix review comments * add spelling check rules * remove extra param * improve type handling and null handling * remove extra zeros * fix tests and add space between unit suffix and number as most size-format functions do * fix tests * add examples * change function names according to review comments * fix merge Signed-off-by: frank chen <frank.chen021@outlook.com> * no need to configure NullHandling explicitly for tests Signed-off-by: frank chen <frank.chen021@outlook.com> * fix tests in SQL-Compatible mode Signed-off-by: frank chen <frank.chen021@outlook.com> * Resolve review comments * Update SQL test case to check null handling * Fix intellij inspections * Add more examples * Fix example	2021-08-13 10:27:49 -07:00
Charles Smith	6524d838d7	Docs refactor of ingestion. Carries #11541 (#11576 ) * Docs refactor of ingestion. Carries #11541 * Update docs/misc/math-expr.md * add Apache license * fix header, add topics to sidebar * Update docs/ingestion/partitioning.md * pick up changes to and md from `c7fdf1d`, #11479 Co-authored-by: Suneet Saldanha <suneet@apache.org> Co-authored-by: Jihoon Son <jihoonson@apache.org>	2021-08-13 08:42:03 -07:00
Clint Wylie	9af7ba9d2a	STRING_AGG SQL aggregator function (#11241 ) * add string_agg * oops * style and fix test * spelling * fixup * review stuffs	2021-08-10 13:47:09 -07:00
Maytas Monsereenusorn	3257913737	Improve query error logging (#11519 ) * Improve query error logging * add docs * address comments * address comments	2021-08-05 22:51:09 +07:00
Yi Yuan	f1e52ab356	add doc (#11531 ) Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-08-03 12:20:29 +08:00
Victoria Lim	949484728f	docs fix for doubleMean description (#11513 ) * fix for doubleMean description * include quantile aggregator description from Suneet * update hyperlink to quantiles aggregator	2021-07-30 12:39:44 -07:00
Kashif Faraz	8a4e27f51d	Select broker based on query context parameter `brokerService` (#11495 ) This change allows the selection of a specific broker service (or broker tier) by the Router. The newly added ManualTieredBrokerSelectorStrategy works as follows: Check for the parameter brokerService in the query context. If this is a valid broker service, use it. Check if the field defaultManualBrokerService has been set in the strategy. If this is a valid broker service, use it. Move on to the next strategy	2021-07-27 20:56:05 +05:30
Peter Marshall	973e5bf7d0	Docs - HLL lgK tip and slight layout change (#11482 ) * HLL lgK and a tip Knowledge transfer from https://the-asf.slack.com/archives/CJ8D1JTB8/p1600699967024200. Attempted to make a connection between the SQL HLL function and the HLL underneath without getting too complicated. Also added a note about using K over 16 being pretty much pointless. * Corrected spelling * Create datasketches-hll.md Put roll-up back to rollup * Update docs/development/extensions-core/datasketches-hll.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2021-07-26 12:28:53 -07:00
sthetland	a366753ba5	Consolidate multi-value dimension doc and highlight configurability (#11428 ) * Clarify options for multi-value dims * Add first example	2021-07-15 10:19:10 -07:00
Clint Wylie	17efa6f556	add single input string expression dimension vector selector and better expression planning (#11213 ) * add single input string expression dimension vector selector and better expression planning * better * fixes * oops * rework how vector processor factories choose string processors, fix to be less aggressive about vectorizing * oops * javadocs, renaming * more javadocs * benchmarks * use string expression vector processor with vector size 1 instead of expr.eval * better logging * javadocs, surprising number of the the * more * simplify	2021-07-06 11:20:49 -07:00
frank chen	906a704c55	Eliminate ambiguities of KB/MB/GB in the doc (#11333 ) * GB ---> GiB * suppress spelling check * MB --> MiB, KB --> KiB * Use IEC binary prefix * Add reference link * Fix doc style	2021-06-30 13:42:45 -07:00
Clint Wylie	df9b57aa1a	bitwise aggregators, better null handling options for expression agg (#11280 ) * bitwise aggregators, better nulls for expression agg * correct behavior * rework deserialize, better names * fix json, share mask	2021-06-25 16:51:16 -07:00
Clint Wylie	bfbd7ec432	fix a bugs related to SQL type inference return type nullability (#11327 ) * fix a bunch of type inference nullability bugs * fixes * style * fix test * fix concat	2021-06-15 12:26:59 -07:00
Charles Smith	a1ed3a407d	clarify bySegment is native only (#11331 )	2021-06-11 13:48:17 -07:00
Clint Wylie	3649c608d2	array handling improvements (#11233 ) * fix jdbc array handling, split handling for some array and multi value operator, split and add more tests * formatting	2021-05-13 18:50:32 -07:00
Clint Wylie	691d7a1d54	SQL timeseries no longer skip empty buckets with all granularity (#11188 ) * SQL timeseries no longer skip empty buckets with all granularity * add comment, fix tests * the ol switcheroo * revert unintended change * docs and more tests * style * make checkstyle happy * docs fixes and more tests * add docs, tests for array_agg * fixes * oops * doc stuffs * fix compile, match doc style	2021-05-10 10:13:37 -07:00
benkrug	49c8307b72	Update datasource.md (#10864 ) * Update datasource.md Change "table" to "datasource" in join discussion: This means that all datasources other than the leftmost "base" table must fit in memory. According to docs on datasources, "datasource" is the more general term, and a table is a kind of datasource. In the context here, then, "datasource" is applicable. * left-hand table -> left-hand datasource Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-05-07 01:14:45 -07:00
Lasse Krogh Mammen	9be2a5cdc2	Add documentation re alphabetical sorted of MV dimensions (#10695 )	2021-05-07 01:12:32 -07:00
Clint Wylie	554f1ffeee	ARRAY_AGG sql aggregator function (#11157 ) * ARRAY_AGG sql aggregator function * add javadoc * spelling * review stuff, return null instead of empty when nil input * review stuff * Update sql.md * use type inference for finalize, refactor some things	2021-05-03 22:17:10 -07:00
imply-jbalik	6f7701e742	fixed array syntax (#11191 )	2021-05-03 21:38:16 -07:00
Gian Merlino	cb7c6ac314	Doc updates for union datasources. (#11103 ) The main one is updating datasources.md to talk about SQL. (It still said that table unions are not supported in SQL.) Also, this doc update adds some clarifying details on limitations.	2021-04-14 18:18:14 -07:00
sthetland	dd4c5f2a17	Update using-caching.md (#11069 )	2021-04-08 16:48:26 -05:00
sthetland	fb6751fa45	Fix old broken link (#11048 ) * link check fixes * updated link target * Update aggregations.md * spelling error	2021-04-07 20:40:50 -07:00
Cameron Teasdale	786207995e	add minimal documentation for expression filters (#11045 ) * add minimal documentation for expression filters * Update docs/querying/filters.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * Update docs/querying/filters.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/filters.md Co-authored-by: Alejandro Lujan <andanthor@gmail.com> * Update docs/querying/filters.md Co-authored-by: Alejandro Lujan <andanthor@gmail.com> Co-authored-by: Clint Wylie <cjwylie@gmail.com> Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Alejandro Lujan <andanthor@gmail.com>	2021-04-07 16:58:28 -07:00
Abhishek Agarwal	0df0bff44b	Enable multiple distinct aggregators in same query (#11014 ) * Enable multiple distinct count * Add more tests * fix sql test * docs fix * Address nits	2021-04-07 00:52:19 -07:00
Lasse Krogh Mammen	782a1d4e6c	Add Calcite Avatica protobuf handler (#10543 )	2021-03-31 12:46:25 -07:00
benkrug	7f96ca8f5e	Update topnquery.md (#10944 ) minor edits of the English, no meanings changed (imo)	2021-03-09 15:19:02 -08:00
Abhishek Agarwal	c66951a59e	Add flag in SQL to disable left base filter optimization for joins (#10947 ) * Add flag to disable left base filter * code coverage * Draft * Review comments * code coverage * add docs * Add old tests	2021-03-09 13:07:34 -08:00
Charles Smith	0f81ce32a0	refactor query caching docs (#10848 ) * refactor query caching * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * add description for context link * accept suggestions * reword, rework some awkward language * incorporate feedback, fix errors * add back perf considerations * Apply suggestions from code review applying @suneet-s 's changes Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update caching.md fix link Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Suneet Saldanha <suneet@apache.org>	2021-03-08 22:25:48 -08:00
Atul Mohan	be2ac8d6ce	Document type inference issues with dynamic params in SQL (#10801 ) * Clarify docs * Apply suggestions from code review Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-03-04 03:48:11 -08:00
Clint Wylie	cbbef80c7f	add SQL operators for bitwise expressions (#10823 ) * add SQL operators for bitwise expressions * more test * fix spelling * more tests	2021-02-18 20:56:33 -08:00
Jihoon Son	ac41e41232	Update doc for query errors and add unit tests for JsonParserIterator (#10833 ) * Update doc for query errors and add unit tests for JsonParserIterator * static constructor for convenience * rename method	2021-02-05 02:55:32 -08:00
Makdon	f9fc1892d1	Typo: missing comma in json (#10711 )	2021-01-06 13:49:50 -08:00
Clint Wylie	da0eabaa01	integration test for coordinator and overlord leadership client (#10680 ) * integration test for coordinator and overlord leadership, added sys.servers is_leader column * docs * remove not needed * fix comments * fix compile heh * oof * revert unintended * fix tests, split out docker-compose file selection from starting cluster, use docker-compose down to stop cluster * fixes * style * dang * heh * scripts are hard * fix spelling * fix thing that must not matter since was already wrong ip, log when test fails * needs more heap * fix merge * less aggro	2020-12-17 22:50:12 -08:00
sthetland	6ae8059c09	cleaning up and fixing links (#10528 ) * cleaning up and fixing links * reverting local link * Update indexer.md * link checking * Fixing one more stale link for PostgreSQL	2020-12-17 13:37:43 -08:00
Abhishek Agarwal	4ea1ab8531	Fix links in the grouping function doc (#10654 )	2020-12-09 14:56:32 +08:00
Abhishek Agarwal	26d74b3580	Add grouping_id function (#10518 ) * First draft of grouping_id function * Add more tests and documentation * Add calcite tests * Fix travis failures * bit of a change * Add documentation * Fix typos * typo fix	2020-12-07 11:46:29 -08:00
frank chen	24f1e35b5d	fix desc of 'required' for granularity property (#10616 )	2020-12-01 18:29:51 -08:00
sthetland	ba915b7f56	Security overview documentation (#10339 ) * initial file * initial file * security overview added * ldap added * spacing adjustments * nits * security graphics and doc review * Update docs/operations/security-overview.md Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Update docs/operations/security-user-auth.md Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Update docs/operations/security-overview.md Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Update docs/operations/security-overview.md Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * updates frm review * review comments * finish up review and light edits * broken links * spell check Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>	2020-11-19 15:24:58 -08:00
michaelschiff	2f4d6da33f	Updates segment metadata query documentation (#10589 ) * updates segment metadata query documentation to be clearer about cardinality estimation * typo in documentation	2020-11-20 00:08:27 +05:30
Atul Mohan	21e3c4b39c	Add missing docs for timeout exceptions (#10554 ) * Add missing docs for timeout exceptions * Add info on auth failures	2020-11-13 08:45:40 -06:00
Gian Merlino	3436297354	Clarify how ORDER BY works with UNION ALL (#10561 ) Hopefully a bit clearer.	2020-11-05 20:12:03 -08:00
Abhishek Agarwal	04546b65ec	Additional documentation for query caching (#10503 ) * Add documentation for when caching is unsupported * Minor changes * Minor doc fix * Review comments * Add more details * Fix spelling check * Fix doc for union query * Trailing dot	2020-10-20 13:49:13 -07:00
Maytas Monsereenusorn	3538abd5d0	Make sure all fields in sys.segments are JSON-serialized (#10481 ) * fix JSON format * Change all columns in sys segments to be JSON * Change all columns in sys segments to be JSON * add tests * fix failing tests * fix failing tests	2020-10-14 13:49:46 -07:00
Clint Wylie	1d6cb624f4	add vectorizeVirtualColumns query context parameter (#10432 ) * add vectorizeVirtualColumns query context parameter * oops * spelling * default to false, more docs * fix test * fix spelling	2020-09-28 18:48:34 -07:00
Jihoon Son	0cc9eb4903	Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided (#10288 ) * Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided * query context * fix tests; add more test * javadoc * docs and more tests * remove default and hadoop tests * consistent name and fix javadoc * spelling and field name * default function for partitionsSpec * other comments * address comments * fix tests and spelling * test * doc	2020-09-24 16:32:56 -07:00
Clint Wylie	dad69481f0	add light weight version of /druid/coordinator/v1/lookups/nodeStatus (#10422 ) * add light weight version /druid/coordinator/v1/lookups/nodeStatus * review stuffs	2020-09-24 14:36:53 +08:00
Maytas Monsereenusorn	72f1b55f56	Add last_compaction_state to sys.segments table (#10413 ) * Add is_compacted to sys.segments table * change is_compacted to last_compaction_state * fix tests * fix tests * address comments	2020-09-23 15:29:36 -07:00
sthetland	ae247b6e63	Document change in results of groupBy queries with subtotalsSpec (#10405 ) * subtotalsSpec results with null values Document the format change in results of a groupBy query with a subtotalsSpec. This update applies to 0.18 and later. * Review catches	2020-09-19 10:51:23 -07:00
Suneet Saldanha	f71ba6f2c2	Vectorized ANY aggregators (#10338 ) * WIP vectorized ANY aggregators * tests * fix aggs * cleanup * code review + tests * docs * use NilVectorSelector when needed * fix spellcheck * dont instantiate vectors * cleanup	2020-09-14 19:44:58 -07:00
Abhishek Agarwal	f5e2645bbb	Support SearchQueryDimFilter in sql via new methods (#10350 ) * Support SearchQueryDimFilter in sql via new methods * Contains is a reserved word * revert unnecessary change * Fix toDruidExpression method * rename methods * java docs * Add native functions * revert change in dockerfile * remove changes from dockerfile * More tests * travis fix * Handle null values better	2020-09-14 09:57:54 -07:00
Abhishek Agarwal	a5c46dc84b	Add vectorization for druid-histogram extension (#10304 ) * First draft * Remove redundant code from FixedBucketsHistogramAggregator classes * Add test cases for new classes * Fix tests in sql compatible mode * Typo fix * Fix comment * Add spelling * Vectorize only for supported types * Rename internal aggregator files * Fix tests	2020-09-09 13:56:33 -07:00
Gian Merlino	5cd7610fb6	SQL support for union datasources. (#10324 ) * SQL support for union datasources. Exposed via the "UNION ALL" operator. This means that there are now two different implementations of UNION ALL: one at the top level of a query that works by concatenating subquery results, and one at the table level that works by creating a UnionDataSource. The SQL documentation is updated to discuss these two use cases and how they behave. Future work could unify these by building support for a native datasource that represents the union of multiple subqueries. (Today, UnionDataSource can only represent the union of tables, not subqueries.) * Fixes. * Error message for sanity check. * Additional test fixes. * Add some error messages.	2020-08-28 07:57:06 -07:00
Gian Merlino	21703d81ac	Fix handling of 'join' on top of 'union' datasources. (#10318 ) * Fix handling of 'join' on top of 'union' datasources. The problem is that unions are typically rewritten into a series of individual queries on the underlying tables, but this isn't done when the union is wrapped in a join. The main changes are in UnionQueryRunner: 1) Replace an instanceof UnionQueryRunner check with DataSourceAnalysis. 2) Replace a "query.withDataSource" call with a new function, "Queries.withBaseDataSource". Together, these enable UnionQueryRunner to "see through" a join. * Tests. * Adjust heap sizes for integration tests. * Different approach, more tests. * Tweak. * Styling.	2020-08-26 14:23:54 -07:00
Gian Merlino	91bb27cdf7	Clarify SQL behavior for multi-value dimensions. (#10276 ) There are some known inconsistencies between SQL and native that users should be aware of.	2020-08-25 10:11:16 -07:00
Gian Merlino	0910d22f48	Add SQL "OFFSET" clause. (#10279 ) * Add SQL "OFFSET" clause. Under the hood, this uses the new offset features from #10233 (Scan) and #10235 (GroupBy). Since Timeseries and TopN queries do not currently have an offset feature, SQL planning will switch from one of those to Scan or GroupBy if users add an OFFSET. Includes a refactoring to harmonize offset and limit planning using an OffsetLimit wrapper class. This is useful because it ensures that the various places that need to deal with offset and limit collapsing all behave the same way, using its "andThen" method. * Fix test and add another test.	2020-08-21 14:11:54 -07:00
Suneet Saldanha	0891b1f833	Add note about aggregations on floats (#10285 ) * Add note about aggreations on floats Floating point math is known to be unstable. Due to the way aggregators work across segments it's possible for the same query operating on the same data to produce slightly different results. The same problem exists with any aggregators that are not commutative since the merge order across segments is not guaranteed. * Also talk about doubles * Apply suggestions from code review	2020-08-17 13:29:57 -07:00
Gian Merlino	6cca7242de	Add "offset" parameter to the Scan query. (#10233 ) * Add "offset" parameter to the Scan query. It works by doing the query as normal and then throwing away the first "offset" number of rows on the broker. * Fix constructor call. * Fix up JSONs. * Fix call to ScanQuery. * Doc update. * Fix javadocs. * Spotbugs, LGTM suppressions. * Javadocs. * Fix suppression. * Stabilize Scan query result order, add tests. * Update LGTM comment. * Fixup. * Test different batch sizes too. * Nicer tests. * Fix comment.	2020-08-13 14:56:24 -07:00
Clint Wylie	e053348f74	add hasNulls to ColumnCapabilities, ColumnAnalysis (#10219 ) * add isNullable to ColumnCapabilities, ColumnAnalysis * better builder * fix segment metadata queries in integration tests * adjustments * cleanup * fix spotbugs * treat unknown as true in segmentmetadata * rename to hasNulls, add docs * fixup * test the dim indexer selector isNull fix for numeric columns * fixes * oof	2020-08-13 14:55:32 -07:00
Abhishek Radhakrishnan	dc16abae34	Vectorization support for long, double, float min & max aggregators. (#10260 ) * LongMaxVectorAggregator support and test case. * DoubleMinVectorAggregator and test cases. * DoubleMaxVectorAggregator and unit test. * FloatMinVectorAggregator and FloatMaxVectorAggregator. * Documentation update to include the other vector aggregators. * Bug fix. * checkstyle formatting fixes. * CalciteQueryTest cases update. * Separate test classes for FloatMaxAggregation and FloatMniAggregation. * remove the cannotVectorize for float max/min aggregator in test. * Tests in GroupByQueryRunner, GroupByTimeseriesQueryRunner and TimeseriesQueryRunner.	2020-08-10 15:18:55 -07:00
Gian Merlino	b6aaf59e8c	Add "offset" parameter to GroupBy query. (#10235 ) * Add "offset" parameter to GroupBy query. It works by doing the query as normal and then throwing away the first "offset" number of rows on the broker. * Stabilize GroupBy sorts. * Fix inspections. * Fix suppression. * Fixups. * Move TopNSequence to druid-core. * Addl comments. * NumberedElement equals verification. * Changes from review.	2020-08-05 15:39:58 -07:00
Abhishek Radhakrishnan	34a4113752	Add vectorization support for the longMin aggregator. (#10211 ) * Fix minor formatting in docs. * Add Nullhandling initialization for test to run from IDE. * Vectorize longMin aggregator. - A new vectorized class for the vectorized long min aggregator. - Changes to AggregatorFactory to support vectorize functionality. - Few changes to schema evolution test to add LongMinAggregatorFactory. * Add longSum to the supported vectorized aggregator implementations. * Add MIN() long min to calcite query test that can vectorize. * Add simple long aggregations test. * Fixup formatting per checkstyle guide. * fixup and add more tests for long min aggregator. * Override test for groupBy since timestamps are handled differently. * Null compatibility check in test. * Review comment: Add a test case to LongMinAggregationTest.	2020-08-01 15:32:09 -07:00

1 2 3 4 5

205 Commits