druid

Commit Graph

Author	SHA1	Message	Date
Atul Mohan	77333e56fa	Docs: Add missing kafka emitter config (#16332 )	2024-04-25 10:37:14 +05:30
Gian Merlino	8a5cc976a9	ArrayOfDoublesSketchBuildAggregator: Fix NPE in get() for empty sketch. (#16330 ) Fixes a bug introduced in #16296, where the sketch might not be initialized if get() is called without calling aggregate(). Also adds a test for this case.	2024-04-25 00:59:59 -04:00
Bünyamin	e74da6a6b6	Add new metrics for prometheus emitter (#16329 )	2024-04-25 07:16:24 +05:30
Katya Macedo	ceb6646dec	Add supervisor actions (#16276 ) * Add supervisor actions * Update text * Update text * Update after review * Update after review	2024-04-24 13:14:01 -07:00
Laksh Singla	6bca406d31	Grouping on complex columns aka unifying GroupBy strategies (#16068 ) Users can pass complex types as dimensions to the group by queries. For example: SELECT nested_col1, count(*) FROM foo GROUP BY nested_col1	2024-04-24 23:00:14 +05:30
Rishabh Singh	e30790e013	Introduce Segment Schema Publishing and Polling for Efficient Datasource Schema Building (#15817 ) Issue: #14989 The initial step in optimizing segment metadata was to centralize the construction of datasource schema in the Coordinator (#14985). Thereafter, we addressed the problem of publishing schema for realtime segments (#15475). Subsequently, our goal is to eliminate the requirement for regularly executing queries to obtain segment schema information. This is the final change which involves publishing segment schema for finalized segments from task and periodically polling them in the Coordinator.	2024-04-24 22:22:53 +05:30
Sree Charan Manamala	080476f9ea	WINDOWING - Fix 2 nodes with same digest causing mapping issue (#16301 ) Fixes the mapping issue in window functions where 2 nodes get the same reference.	2024-04-24 16:45:02 +05:30
Gian Merlino	274ccbfd85	Reset buffer aggregators when resetting Groupers. (#16296 ) Buffer aggregators can contain some cached objects within them, such as Memory references or HLL Unions. Prior to this patch, various Grouper implementations were not releasing this state when resetting their own internal state, which could lead to excessive memory use. This patch renames AggregatorAdapater#close to "reset", and updates Grouper implementations to call this reset method whenever they reset their internal state. The base method on BufferAggregator and VectorAggregator remains named "close", for compatibility with existing extensions, but the contract is adjusted to say that the aggregator may be reused after the method is called. All existing implementations in core already adhere to this new contract, except for the ArrayOfDoubles build flavors, which are updated in this patch to adhere. Additionally, this patch harmonizes buffer sketch helpers to call their clear method "clear" rather than a mix of "clear" and "close". (Others were already using "clear".)	2024-04-24 05:39:24 -04:00
Kashif Faraz	1dabb02843	Fix `ForkingTaskRunnerTest` (#16323 ) Changes: - Use non-static fields to track task counts in `ForkingTaskRunner` - Update assertions in `ForkingTaskRunnerTest` to ensure that the tests are idempotent	2024-04-24 14:05:05 +05:30
Tim Williamson	4bdc1890f7	Improve worst-case performance of LIKE filters by 20x (#16153 )
* Expected-linear-time LIKE

`LikeDimFilter` was compiling the `LIKE` clause down to a `java.util.regex.Pattern`. Unfortunately, even seemingly simple regexes can lead to [catastrophic backtracking](https://www.regular-expressions.info/catastrophic.html). In particular, something as simple as a few `%` wildcards can end up in [exploding the time complexity](https://www.rexegg.com/regex-explosive-quantifiers.html#remote).

This MR implements a simple greedy algorithm that avoids backtracking. Technically, the algorithm runs in `O(nm)`, where `n` is the length of the string to match and `m` is the length of the pattern. In practice, it should run in linear time: essentially as fast as `String.indexOf()` can search for the next match.

Running an updated version of the `LikeFilterBenchmark` with Java 11 on a `t2.xlarge` instance showed at least a 1.7x speed up for a simple "contains" query (`%50%`), and more than a 20x speed up for a "killer" query with four wildcards but no matches (`%%%%x`). The benchmark uses short strings: cases with longer strings should benefit more.

Note that the `REGEX` operator still suffers from the same potentially-catastrophic runtimes. Using a better library than the built-in `java.util.regex.Pattern` (e.g., [joni](https://github.com/jruby/joni)) would be a good idea to avoid accidental or intentional DoSing.

```
Benchmark  (cardinality)  Mode  Cnt  Before Score  Error  After Score  Error  Units  Before / After
LikeFilterBenchmark.matchBoundPrefix  1000  avgt  10  6.686 ± 0.026  6.765 ± 0.087  us/op  0.99x
LikeFilterBenchmark.matchBoundPrefix  100000  avgt  10  163.936 ± 1.589  140.014 ± 0.563  us/op  1.17x
LikeFilterBenchmark.matchBoundPrefix  1000000  avgt  10  1235.259 ± 7.318  1165.330 ± 9.300  us/op  1.06x
LikeFilterBenchmark.matchLikeContains  1000  avgt  10  255.074 ± 1.530  130.212 ± 3.314  us/op  1.96x
LikeFilterBenchmark.matchLikeContains  100000  avgt  10  34789.639 ± 210.219  18563.644 ± 100.030  us/op  1.87x
LikeFilterBenchmark.matchLikeContains  1000000  avgt  10  287265.302 ± 1790.957  164684.778 ± 317.698  us/op  1.74x
LikeFilterBenchmark.matchLikeEquals  1000  avgt  10  0.410 ± 0.003  0.399 ± 0.001  us/op  1.03x
LikeFilterBenchmark.matchLikeEquals  100000  avgt  10  0.793 ± 0.005  0.719 ± 0.003  us/op  1.10x
LikeFilterBenchmark.matchLikeEquals  1000000  avgt  10  0.864 ± 0.004  0.839 ± 0.005  us/op  1.03x
LikeFilterBenchmark.matchLikeKiller  1000  avgt  10  3077.629 ± 7.928  103.714 ± 2.417  us/op  29.67x
LikeFilterBenchmark.matchLikeKiller  100000  avgt  10  311048.049 ± 13466.911  14777.567 ± 70.242  us/op  21.05x
LikeFilterBenchmark.matchLikeKiller  1000000  avgt  10  3055855.099 ± 18387.839  92476.621 ± 1198.255  us/op  33.04x
LikeFilterBenchmark.matchLikePrefix  1000  avgt  10  6.711 ± 0.035  6.653 ± 0.046  us/op  1.01x
LikeFilterBenchmark.matchLikePrefix  100000  avgt  10  161.535 ± 0.574  163.740 ± 0.833  us/op  0.99x
LikeFilterBenchmark.matchLikePrefix  1000000  avgt  10  1255.696 ± 5.207  1201.378 ± 3.466  us/op  1.05x
LikeFilterBenchmark.matchRegexContains  1000  avgt  10  467.736 ± 2.546  481.431 ± 5.647  us/op  0.97x
LikeFilterBenchmark.matchRegexContains  100000  avgt  10  64871.766 ± 223.341  65483.992 ± 391.249  us/op  0.99x
LikeFilterBenchmark.matchRegexContains  1000000  avgt  10  482906.004 ± 2003.583  477195.835 ± 3094.605  us/op  1.01x
LikeFilterBenchmark.matchRegexKiller  1000  avgt  10  8071.881 ± 18.026  8052.322 ± 17.336  us/op  1.00x
LikeFilterBenchmark.matchRegexKiller  100000  avgt  10  1120094.520 ± 2428.172  808321.542 ± 2411.032  us/op  1.39x
LikeFilterBenchmark.matchRegexKiller  1000000  avgt  10  8096745.012 ± 40782.747  8114114.896 ± 43250.204  us/op  1.00x
LikeFilterBenchmark.matchRegexPrefix  1000  avgt  10  170.843 ± 1.095  175.924 ± 1.144  us/op  0.97x
LikeFilterBenchmark.matchRegexPrefix  100000  avgt  10  17785.280 ± 116.813  18708.888 ± 61.857  us/op  0.95x
LikeFilterBenchmark.matchRegexPrefix  1000000  avgt  10  174415.586 ± 1827.478  173190.799 ± 949.224  us/op  1.01x
LikeFilterBenchmark.matchSelectorEquals  1000  avgt  10  0.411 ± 0.003  0.416 ± 0.002  us/op  0.99x
LikeFilterBenchmark.matchSelectorEquals  100000  avgt  10  0.728 ± 0.003  0.739 ± 0.003  us/op  0.99x
LikeFilterBenchmark.matchSelectorEquals  1000000  avgt  10  0.842 ± 0.002  0.879 ± 0.007  us/op  0.96x
```

* Take into account whether druid.generic.useDefaultValueForNull is set in LikeDimFilterTest assertions.
* Attempt to placate CodeQL.
* Fix handling of multi-pattern suffixes.
* Expected-linear-time LIKE

`LikeDimFilter` was compiling the `LIKE` clause down to a `java.util.regex.Pattern`. Unfortunately, even seemingly simple regexes can lead to [catastrophic backtracking](https://www.regular-expressions.info/catastrophic.html). In particular, something as simple as a few `%` wildcards can end up in [exploding the time complexity](https://www.rexegg.com/regex-explosive-quantifiers.html#remote).

This MR implements a simple greedy algorithm that avoids the catastrophic backtracking, converting the `LIKE` pattern into a list of `java.util.regex.Pattern` by splitting on the `%` wildcard. The resulting sub-patterns do no backtracking, and a simple greedy loop using `Matcher.find()` to progress through the string is used.

Running an updated version of the `LikeFilterBenchmark` with Java 11 on a `t2.xlarge` instance showed at least a 1.15x speed up for a simple "contains" query (`%50%`), and more than a 20x speed up for a "killer" query with four wildcards but no matches (`%%%%x`). The benchmark uses short strings: cases with longer strings should benefit more.

Note that the `REGEX` operator still suffers from the same potentially-catastrophic runtimes. Using a better library than the built-in `java.util.regex.Pattern` (e.g., [joni](https://github.com/jruby/joni)) would be a good idea to avoid accidental or intentional DoSing.

```
Benchmark  (cardinality)  Mode  Cnt  Before Score  Error  After Score  Error  Units  Before/After
LikeFilterBenchmark.matchBoundPrefix  1000  avgt  10  5.410 ± 0.010  5.582 ± 0.004  us/op  0.97x
LikeFilterBenchmark.matchBoundPrefix  100000  avgt  10  140.920 ± 0.306  141.082 ± 0.391  us/op  1.00x
LikeFilterBenchmark.matchBoundPrefix  1000000  avgt  10  1082.762 ± 1.070  1171.407 ± 1.628  us/op  0.92x
LikeFilterBenchmark.matchLikeComplexContains  1000  avgt  10  221.572 ± 0.228  183.742 ± 0.210  us/op  1.21x
LikeFilterBenchmark.matchLikeComplexContains  100000  avgt  10  25461.362 ± 21.481  17373.828 ± 42.577  us/op  1.47x
LikeFilterBenchmark.matchLikeComplexContains  1000000  avgt  10  221075.917 ± 919.238  177454.683 ± 506.420  us/op  1.25x
LikeFilterBenchmark.matchLikeContains  1000  avgt  10  283.015 ± 0.219  218.835 ± 3.126  us/op  1.29x
LikeFilterBenchmark.matchLikeContains  100000  avgt  10  30202.910 ± 32.697  26713.488 ± 49.525  us/op  1.13x
LikeFilterBenchmark.matchLikeContains  1000000  avgt  10  284661.411 ± 130.324  243381.857 ± 540.143  us/op  1.17x
LikeFilterBenchmark.matchLikeEquals  1000  avgt  10  0.386 ± 0.001  0.380 ± 0.001  us/op  1.02x
LikeFilterBenchmark.matchLikeEquals  100000  avgt  10  0.670 ± 0.001  0.705 ± 0.002  us/op  0.95x
LikeFilterBenchmark.matchLikeEquals  1000000  avgt  10  0.839 ± 0.001  0.796 ± 0.001  us/op  1.05x
LikeFilterBenchmark.matchLikeKiller  1000  avgt  10  4882.099 ± 7.953  170.142 ± 0.494  us/op  28.69x
LikeFilterBenchmark.matchLikeKiller  100000  avgt  10  524122.010 ± 390.170  19461.637 ± 117.090  us/op  26.93x
LikeFilterBenchmark.matchLikeKiller  1000000  avgt  10  5121795.377 ± 4176.052  181162.978 ± 368.443  us/op  28.27x
LikeFilterBenchmark.matchLikePrefix  1000  avgt  10  5.708 ± 0.005  5.677 ± 0.011  us/op  1.01x
LikeFilterBenchmark.matchLikePrefix  100000  avgt  10  141.853 ± 0.554  108.313 ± 0.330  us/op  1.31x
LikeFilterBenchmark.matchLikePrefix  1000000  avgt  10  1199.148 ± 1.298  1153.297 ± 1.575  us/op  1.04x
LikeFilterBenchmark.matchLikeSuffix  1000  avgt  10  256.020 ± 0.283  196.339 ± 0.564  us/op  1.30x
LikeFilterBenchmark.matchLikeSuffix  100000  avgt  10  29917.931 ± 28.218  21450.997 ± 20.341  us/op  1.39x
LikeFilterBenchmark.matchLikeSuffix  1000000  avgt  10  241225.193 ± 465.824  194034.292 ± 362.312  us/op  1.24x
LikeFilterBenchmark.matchRegexComplexContains  1000  avgt  10  119.597 ± 0.635  135.550 ± 0.697  us/op  0.88x
LikeFilterBenchmark.matchRegexComplexContains  100000  avgt  10  13089.670 ± 13.738  13766.712 ± 12.802  us/op  0.95x
LikeFilterBenchmark.matchRegexComplexContains  1000000  avgt  10  130822.830 ± 1624.048  131076.029 ± 1636.811  us/op  1.00x
LikeFilterBenchmark.matchRegexContains  1000  avgt  10  573.273 ± 0.421  615.399 ± 0.633  us/op  0.93x
LikeFilterBenchmark.matchRegexContains  100000  avgt  10  57259.313 ± 162.747  62900.380 ± 44.746  us/op  0.91x
LikeFilterBenchmark.matchRegexContains  1000000  avgt  10  571335.768 ± 2822.776  542536.982 ± 780.290  us/op  1.05x
LikeFilterBenchmark.matchRegexKiller  1000  avgt  10  11525.499 ± 8.741  11061.791 ± 21.746  us/op  1.04x
LikeFilterBenchmark.matchRegexKiller  100000  avgt  10  1170414.723 ± 766.160  1144437.291 ± 886.263  us/op  1.02x
LikeFilterBenchmark.matchRegexKiller  1000000  avgt  10  11507668.302 ± 11318.176  110381620.014 ± 10707.974  us/op  1.11x
LikeFilterBenchmark.matchRegexPrefix  1000  avgt  10  156.460 ± 0.097  155.217 ± 0.431  us/op  1.01x
LikeFilterBenchmark.matchRegexPrefix  100000  avgt  10  15056.491 ± 23.906  15508.965 ± 763.976  us/op  0.97x
LikeFilterBenchmark.matchRegexPrefix  1000000  avgt  10  154416.563 ± 473.108  153737.912 ± 273.347  us/op  1.00x
LikeFilterBenchmark.matchRegexSuffix  1000  avgt  10  610.684 ± 0.462  590.352 ± 0.334  us/op  1.03x
LikeFilterBenchmark.matchRegexSuffix  100000  avgt  10  53196.517 ± 78.155  59460.261 ± 56.934  us/op  0.89x
LikeFilterBenchmark.matchRegexSuffix  1000000  avgt  10  536100.944 ± 440.353  550098.917 ± 740.464  us/op  0.97x
LikeFilterBenchmark.matchSelectorEquals  1000  avgt  10  0.390 ± 0.001  0.366 ± 0.001  us/op  1.07x
LikeFilterBenchmark.matchSelectorEquals  100000  avgt  10  0.724 ± 0.001  0.714 ± 0.001  us/op  1.01x
LikeFilterBenchmark.matchSelectorEquals  1000000  avgt  10  0.826 ± 0.001  0.847 ± 0.001  us/op  0.98x
```
	2024-04-23 22:45:23 -07:00
Parth Agrawal	f1d24c868f	[CVE Fixes] Update version of Nimbus.jose.jwt (#16320 ) * Update version of nimbus.jose.jwt.version * update licenses.yaml	2024-04-23 15:11:54 +05:30
Charles Smith	65412f80ab	remove additional column marks (#16319 )	2024-04-22 19:41:54 -07:00
AmatyaAvadhanula	08b5a8b88e	Ignore append locks for compaction when using concurrent locks (#16316 ) * Ignore append locks for compaction when using concurrent locks	2024-04-22 23:26:45 +05:30
Vishesh Garg	173a206829	Fix incorrect check of InvalidFieldException to InvalidFieldFault while generating MSQ Error Report (#16273 ) InvalidFieldFault is incorrectly checked as InvalidFieldException in mapQueryColumnNameToOutputColumnName. This fixes the bug.	2024-04-22 15:18:49 +05:30
Laksh Singla	b9bbde5c0a	Fix deadlock that can occur while merging group by results (#15420 ) This PR prevents such a deadlock from happening by acquiring the merge buffers in a single place and passing it down to the runner that might need it.	2024-04-22 14:10:44 +05:30
Adithya Chakilam	cff5d1e369	Add method Supervisor.computeLagForAutoScaler (#16314 ) Tries to address the comments made on #16284 after it was merged. Changes: - Remove method `Supervisor.getLagMetric()` - Add method `Supervisor.computeLagForAutoScaler()` - Remove classes `LagMetric` and `LagMetricTest`	2024-04-20 07:57:50 +05:30
Vadim Ogievetsky	3e42ebbaea	Web console: Fix the supervisor offset reset dialog. (#16298 ) * Add host to query output * Init fixes for reset offsets * fix the supervisor offset reset dialog * Update web-console/src/views/load-data-view/load-data-view.tsx Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update web-console/src/views/load-data-view/load-data-view.tsx Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * Update web-console/src/views/load-data-view/load-data-view.tsx Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> * reformat code * ' * fix conflict --------- Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2024-04-19 17:25:46 -07:00
Sree Charan Manamala	ad5701e891	new SCALAR_IN_ARRAY function analogous to DRUID_IN (#16306 ) * scalar_in function * api doc * refactor	2024-04-18 21:15:15 -07:00
Akshat Jain	79e48c6b45	Fix NPE while loading lookups from empty JDBC source (#16307 )	2024-04-18 21:52:02 +05:30
Sree Charan Manamala	960a674442	Corrected Strict NON NULL return type checks (#16279 )	2024-04-18 12:17:13 +02:00
Gian Merlino	4285a5e2c6	Update documentation for exceptions to subquery limit. (#16295 ) The true exception for groupBy is somewhat more narrow than the docs suggest.	2024-04-17 21:04:43 -07:00
YongGang	6974498d98	Improve error message when task fails before becoming ready (#16286 )	2024-04-18 08:15:41 +05:30
zachjsh	3f2dd46ede	Catalog table should not need explicit segment granularity set (#16278 ) * * fix * * fix * * address review comments * * fix * * simplify tests * * fix complex type nullability issue * * fix and update test * * address review comments * * address test review comments * * fix checkstyle * * fix checkstyle * * fix failing test	2024-04-17 11:46:24 -04:00
Clint Wylie	aa230642dd	use PeekableIntIterator for OR filter "partial index" value matchers (#16300 )	2024-04-17 08:27:21 -07:00
zachjsh	2351f038eb	Kafka with topicPattern can ignore old offsets spuriously (#16190 ) * * fix * * simplify * * simplify tests * * update matches function definition for Kafka Datasource Metadata * * add matchesOld * * override matches and plus for kafka based metadata / sequence numbers * * implement minus * add tests * * fix failing tests * * remove TODO comments * * simplify and add comments * * remove unused variable in tests * * remove unneeded function * * add serde tests * * more stuff * * address review comments * * remove unneeded code.	2024-04-17 10:00:17 -04:00
Hardik Bajaj	0bf5e7745d	Add configurable parameters for statsd client (#16283 ) The statsd client sometimes drops metrics when its queue of unprocessed messages (sized by queueSize) is completely full. This causes some high-cardinality metrics, such as per-partition lag, to be dropped. Several statsd client parameters can be set at initialization to increase the client's capacity so that it drops metrics less often. The properties queueSize, poolSize, processorWorkers and senderWorkers are now configurable at runtime.	2024-04-17 18:35:31 +05:30
Adithya Chakilam	34237bc112	Consider max lag for kinesis while autoscaling (#16284 ) * Consider max lag for kinesis while autoscaling * add test for coverage * test folder	2024-04-17 15:05:05 +05:30
Gian Merlino	ccc1ffb032	Additional short circuiting knowledge in filter bundles. (#16292 ) * Additional short circuiting knowledge in filter bundles. Three updates: 1) The parameter "selectionRowCount" on "makeFilterBundle" is renamed "applyRowCount", and redefined as an upper bound on rows remaining after short-circuiting (rather than number of rows selected so far). This definition works better for OR filters, which pass through the FALSE set rather than the TRUE set to the next subfilter. 2) AndFilter uses min(applyRowCount, indexIntersectionSize) rather than using selectionRowCount for the first subfilter and indexIntersectionSize for each filter thereafter. This improves accuracy when the incoming applyRowCount is smaller than the row count from the first few indexes. 3) OrFilter uses min(applyRowCount, totalRowCount - indexUnionSize) rather than applyRowCount for subfilters. This allows an OR filter to pass information about short-circuiting to its subfilters. To help write tests for this, the patch also moves the sampled wikiticker data file from sql to processing. * Forbidden APIs. * Forbidden APIs. * Better comments. * Fix inspection. * Adjustments to tests.	2024-04-16 22:42:28 -07:00
aho135	4fa377c7fd	Improve logging for lookups (#16287 )	2024-04-17 10:20:09 +05:30
AmatyaAvadhanula	f3d69f30e6	Associate pending segments with the tasks that requested them (#16144 ) Changes: - Add column `task_allocator_id` to `pendingSegments` metadata table. - Add column `upgraded_from_segment_id` to `pendingSegments` metadata table. - Add interface `PendingSegmentAllocatingTask` and implement it by all tasks which can allocate pending segments. - Use `taskAllocatorId` to identify the task (and its sub-tasks or replicas) to which a pending segment has been allocated. - Perform active cleanup of pending segments in `TaskLockbox` once there are no active tasks for the corresponding task allocator id. - When committing APPEND segments, also commit all upgraded pending segments corresponding to that task allocator id. - When committing REPLACE segments, upgrade all overlapping pending segments in the same transaction.	2024-04-17 09:06:31 +05:30
zachjsh	a5428e75ff	INSERT/REPLACE complex target column types are validated against source input expressions (#16223 ) * * fix * * fix * * address review comments * * fix * * simplify tests * * fix complex type nullability issue * * address review comments * * address test review comments * * fix checkstyle	2024-04-16 17:20:35 -04:00
Gian Merlino	cf841b8e67	Fix incorrect class in BaseMacroFunctionExpr.equals. (#16294 ) The equals method cast to the wrong class, potentially leading to ClassCastException.	2024-04-16 09:40:46 -07:00
AmatyaAvadhanula	ad6bd62140	Handle task location fetch from overlord during rolling upgrades (#16227 ) Bug: #15724 introduced a bug where a rolling upgrade would cause all task locations returned by the Overlord on an older version to be unknown. Fix: If the new API fails, fall back to single task status API which always returns a valid task location.	2024-04-16 21:01:37 +05:30
Jan Werner	c45da431fb	update netty and zookeeper dependencies to address CVEs (#16267 ) Update dependencies to address CVEs: - Update netty from 4.1.107.Final to 4.1.108.Final to address: CVE-2024-29025 - Update zookeeper from 3.8.3 to 3.8.4 to address: CVE-2024-23944 Release notes: - Update netty from 4.1.107.Final to 4.1.108.Final to address: CVE-2024-29025 - Update zookeeper from 3.8.3 to 3.8.4 to address: CVE-2024-23944	2024-04-15 20:40:50 -07:00
YongGang	6964297b53	Remove the unused Controller context reference from Worker (#16285 )	2024-04-16 08:34:24 +05:30
Nikhil Rao	a805c5612e	Adds Druid SQL query examples for the Stats aggregator Native Queries (#16277 ) * Adds Druid SQL query examples for the Timeseries and GroupBy Native queries in the stats aggregator docs page * Updates intervals in Native Query to remove excess Time part in timestamp * Moves Druid SQL section above Native query because sql used more often by users * removes old Druid SQL sections * Adds TopN Druid SQL query using ORDER BY and LIMIT * Adds table for Druid SQL variance and standard deviation functions * Update docs/development/extensions-core/stats.md Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com> --------- Co-authored-by: Karan Kumar <karankumar1100@gmail.com> Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>	2024-04-15 08:05:34 -07:00
Sree Charan Manamala	5247059d2f	Allow Double & null values in sql type array through dynamic params (#16274 )	2024-04-15 10:44:42 +02:00
Adarsh Sanjeev	3df00aef9d	Add manifest file for MSQ export (#15953 ) Currently, export creates the files at the provided destination. The addition of the manifest file will provide a list of files created as part of the manifest. This will allow easier consumption of the data exported from Druid, especially for automated data pipelines	2024-04-15 11:37:31 +05:30
Kashif Faraz	81d7b6ebe1	Fix OverlordClient to read reports as a concrete `ReportMap` (#16226 ) Follow up to #16217 Changes: - Update `OverlordClient.getReportAsMap()` to return `TaskReport.ReportMap` - Move the following classes to `org.apache.druid.indexer.report` in the `druid-processing` module - `TaskReport` - `KillTaskReport` - `IngestionStatsAndErrorsTaskReport` - `TaskContextReport` - `TaskReportFileWriter` - `SingleFileTaskReportFileWriter` - `TaskReportSerdeTest` - Remove `MsqOverlordResourceTestClient` as it had only one method which is already present in `OverlordResourceTestClient` itself	2024-04-15 08:00:59 +05:30
Abhishek Radhakrishnan	041d0bff5e	Set default `KillUnusedSegments` duty to coordinator's indexing period & `killTaskSlotRatio` to 0.1 (#16247 ) The default value for druid.coordinator.kill.period (if unspecified) has changed from P1D to the value of druid.coordinator.period.indexingPeriod. Operators can choose to override druid.coordinator.kill.period and that will take precedence over the default behavior. The default value for the coordinator dynamic config killTaskSlotRatio is updated from 1.0 to 0.1. This ensures that kill tasks take up only 1 task slot right out-of-the-box instead of taking up all the task slots. * Remove stale comment and inline canDutyRun() * druid.coordinator.kill.period defaults to druid.coordinator.period.indexingPeriod if not set. - Remove the default P1D value for druid.coordinator.kill.period. Instead default druid.coordinator.kill.period to whatever value druid.coordinator.period.indexingPeriod is set to if the former config isn't specified. - If druid.coordinator.kill.period is set, the value will take precedence over druid.coordinator.period.indexingPeriod * Update server/src/test/java/org/apache/druid/server/coordinator/DruidCoordinatorConfigTest.java * Fix checkstyle error * Clarify comment * Update server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinatorConfig.java * Put back canDutyRun() * Default killTaskSlotsRatio to 0.1 instead of 1.0 (all slots) * Fix typo DEFAULT_MAX_COMPACTION_TASK_SLOTS * Remove unused test method. * Update default value of killTaskSlotsRatio in docs and web-console default mock * Move initDuty() after params and config setup.	2024-04-14 18:56:17 -07:00
Gian Merlino	b0c5184f9d	Fix ORDER BY on certain GROUPING SETS. (#16268 ) * Fix ORDER BY on certain GROUPING SETS. DefaultLimitSpec (part of native groupBy) had a bug where it would assume that results are naturally ordered by dimensions even when subtotalsSpec is present. However, this is not necessarily the case. For certain combinations of ORDER BY and GROUPING SETS, this would cause the ORDER BY to be ignored. * Fix test testGroupByWithSubtotalsSpecWithOrderLimitForcePushdown. Resorting was necessary.	2024-04-12 12:06:47 -07:00
Katya Macedo	7f06a53cb1	[Docs] Fix API placeholder formatting (#16240 )	2024-04-12 09:19:13 -07:00
Sree Charan Manamala	3340b200db	Fix window function drill tests failures falling under RESULT_MISMATCH & RESULT_COUNT_MISMATCH (#16264 ) * Updated the drill test expected results which are failing due to druid's default sorting algorithm taking nulls first approach. * Corrected the queries where date time values are directly provided * marked 2 cases failing with resultset casting issues	2024-04-12 13:54:48 +02:00
Laksh Singla	cce2d0f127	Upload openrewrite patch via GHA (#16270 ) This patch adds a step to the openrewrite action, such that it uploads the correcting patch, in case it fails.	2024-04-12 15:31:07 +05:30
Sree Charan Manamala	f65c166327	Windowed aggregates should update the aggregation value based on final compute (#16244 )	2024-04-12 08:28:33 +02:00
YongGang	da9feb4430	Introduce TaskContextReport for reporting task context (#16041 ) Changes: - Add `TaskContextEnricher` interface to improve task management and monitoring - Invoke `enrichContext` in `TaskQueue.add()` whenever a new task is submitted to the Overlord - Add `TaskContextReport` to write out task context information in reports	2024-04-12 08:57:49 +05:30
Pranav	fc2600b8e2	Adding jvmVersion dimension in JVM Monitor (#16262 )	2024-04-11 15:44:56 -07:00
Gian Merlino	9f358f5f4a	SQL tests: avoid mixing skip and cannot vectorize. (#16251 ) * SQL tests: avoid mixing skip and cannot vectorize. skipVectorize switches off vectorization tests completely, and cannotVectorize turns vectorization tests into negative tests. It doesn't make sense to use them together, so this patch makes it an error to do so, and cleans up cases where both are mentioned. This patch also has the effect of changing various tests from skipVectorize to cannotVectorize, because in the past when both were mentioned, skipVectorize would take priority. * Fix bug with StringAnyAggregatorFactory attempting to vectorize when it cannot. * Fix tests.	2024-04-11 15:06:11 -07:00
317brian	df9e1bb97b	Docs: Fix typo in tutorial (#16254 )	2024-04-10 08:59:52 +05:30
Katya Macedo	cd69f145b7	docs: Add upgrade notes for Druid 29.0.1 (#16123 )	2024-04-09 13:56:57 -07:00
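
The message of commit 4bdc1890f7 (#16153) describes replacing a backtracking regex with a greedy, left-to-right scan for `LIKE` patterns. The following is a minimal sketch of that idea only, not Druid's actual `LikeDimFilter` code: the class name `GreedyLikeSketch` and the method `matches` are hypothetical, and the sketch assumes that `%` is the only wildcard (no `_` and no escape character).

```java
public final class GreedyLikeSketch {
    private GreedyLikeSketch() {}

    /**
     * Hypothetical helper: returns true if value matches a LIKE pattern
     * in which '%' is the only wildcard (assumption for this sketch).
     */
    public static boolean matches(String pattern, String value) {
        // Keep empty segments (limit -1) so leading/trailing '%' stay visible.
        String[] parts = pattern.split("%", -1);
        if (parts.length == 1) {
            // No wildcard at all: plain equality.
            return value.equals(pattern);
        }

        String prefix = parts[0];
        String suffix = parts[parts.length - 1];

        // The first segment is anchored at the start, the last at the end.
        if (!value.startsWith(prefix)) {
            return false;
        }
        int pos = prefix.length();
        int end = value.length() - suffix.length();
        if (end < pos || !value.endsWith(suffix)) {
            return false;
        }

        // Middle segments: always take the leftmost occurrence. Taking the
        // earliest match leaves the most room for later segments, so the
        // scan never needs to back up.
        for (int i = 1; i < parts.length - 1; i++) {
            String segment = parts[i];
            if (segment.isEmpty()) {
                continue; // consecutive '%' wildcards collapse into one
            }
            int idx = value.indexOf(segment, pos);
            if (idx < 0 || idx + segment.length() > end) {
                return false;
            }
            pos = idx + segment.length();
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("%50%", "a50b"));
        System.out.println(matches("%%%%x", "no match here"));
    }
}
```

In this sketch the "killer" pattern `%%%%x` reduces to a single `endsWith` check, since empty segments between consecutive wildcards are skipped; a non-matching value is rejected without any repeated rescanning.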


14200 Commits

All Branches