druid

Commit Graph

Author	SHA1	Message	Date
Jill Osborne	b0db2a87d8	Update Kafka ingestion tutorial (#13261 ) * Update Kafka ingestion tutorial * Update tutorial-kafka.md Updated location of sample data file * Added sample data file * Update tutorial-kafka.md * Add sample data file * Update tutorial-kafka.md Updated sample file location in curl commands * Update and reuploading sample data files * Updated spelling file * Delete .spelling * Added spelling file * Update docs/tutorials/tutorial-kafka.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/tutorials/tutorial-kafka.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Updated after review * Update tutorial-kafka.md * Updated * Update tutorial-kafka.md * Update tutorial-kafka.md * Update tutorial-kafka.md * Updated sample data file and command * Add files via upload * Delete kttm-nested-data.json.tgz * Delete kttm-nested-data.json.tgz * Add files via upload * Update tutorial-kafka.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2022-11-11 14:47:54 -08:00
Laksh Singla	3e172d44ab	Bind DurableStorageCleaner only on the Overlord nodes (#13355 )	2022-11-11 21:56:33 +05:30
Jill Osborne	47dd4ed2e7	Added experimental feature text for front coding feature (#13349 )	2022-11-11 02:06:13 -08:00
Gian Merlino	e78f648023	SeekableStreamSupervisor: Don't enqueue duplicate notices. (#13334 ) * SeekableStreamSupervisor: Don't enqueue duplicate notices. Similar goal to #12018, but more aggressive. Don't enqueue a notice at all if it is equal to one currently in the queue. * Adjustments from review. * Update indexing-service/src/test/java/org/apache/druid/indexing/overlord/supervisor/NoticesQueueTest.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-11-11 01:54:01 -08:00
Didip Kerabat	56d5c9780d	Use standard library to correctly glob and stop at the correct folder structure when filtering cloud objects (#13027 ) * Use standard library to correctly glob and stop at the correct folder structure when filtering cloud objects. Removed: import org.apache.commons.io.FilenameUtils; Add: import java.nio.file.FileSystems; import java.nio.file.PathMatcher; import java.nio.file.Paths; * Forgot to update CloudObjectInputSource as well. * Fix tests. * Removed unused exceptions. * Able to reduced user mistakes, by removing the protocol and the bucket on filter. * add 1 more test. * add comment on filterWithoutProtocolAndBucket * Fix lint issue. * Fix another lint issue. * Replace all mention of filter -> objectGlob per convo here: https://github.com/apache/druid/pull/13027#issuecomment-1266410707 * fix 1 bad constructor. * Fix the documentation. * Don’t do anything clever with the object path. * Remove unused imports. * Fix spelling error. * Fix incorrect search and replace. * Addressing Gian’s comment. * add filename on .spelling * Fix documentation. * fix documentation again Co-authored-by: Didip Kerabat <didip@apple.com>	2022-11-10 23:46:40 -08:00
Gian Merlino	77478f25fb	Add taskActionType dimension to task/action/run/time. (#13333 ) * Add taskActionType dimension to task/action/run/time. * Spelling.	2022-11-11 12:00:08 +05:30
Andreas Maechler	03175a2b8d	Add missing MSQ error code fields to docs (#13308 ) * Fix typo * Fix some spacing * Add missing fields * Cleanup table spacing * Remove durable storage docs again Thanks Brian for pointing out previous discussions. * Update docs/multi-stage-query/reference.md Co-authored-by: Charles Smith <techdocsmith@gmail.com> * Mark codes as code * And even more codes as code * Another set of spaces * Combine `ColumnTypeNotSupported` Thanks Karan. * More whitespaces and typos * Add spelling and fix links Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2022-11-10 21:03:04 +05:30
AmatyaAvadhanula	fb23e38aa7	Fix messageGap emission (#13346 ) * Fix messageGap emission * Do not emit messageGap after stopping reading events * Refactoring * Fix tests	2022-11-10 17:50:19 +05:30
Jill Osborne	c2210c4e09	Update ingestion spec doc (#13329 ) * Update ingestion spec doc * Updated * Updated * Update docs/ingestion/ingestion-spec.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * Updated * Updated Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2022-11-10 02:54:35 -08:00
Clint Wylie	44f29030dd	fix flaky RemoteTaskRunnerTest.testRunPendingTaskFailToAssignTask with ugly Thread.sleep (#13344 )	2022-11-10 14:28:53 +05:30
Clint Wylie	27215d1ff1	fix complex_decode_base64 function, add SQL bindings (#13332 ) * fix complex_decode_base64 function, add SQL bindings * more permissive	2022-11-09 23:40:25 -08:00
Jill Osborne	965e41538e	Update nested columns doc (#13314 ) * Updated nested columns doc * Update nested-columns.md * Update nested-columns.md	2022-11-10 09:53:28 +08:00
AmatyaAvadhanula	0512ae4922	Optimize metadata calls in SeekableStreamSupervisor (#13328 ) * Optimize metadata calls * Modify isTaskCurrent * Fix tests * Refactoring	2022-11-10 07:22:51 +05:30
Jason Koch	0040042863	HttpPostEmitter back off send() busy-loop (#12102 ) * HttpPostEmitter back off send() busy-loop The HttpPostEmitter gets in a loop until the flush timeout can be triggered, OR until some new events arrive that reset the minimum batch fill timeout delay. As a tactical fix, this introduces a simple backoff delay to the send loop to prevent spamming logs. * Update core/src/main/java/org/apache/druid/java/util/emitter/core/HttpPostEmitter.java Co-authored-by: Frank Chen <frankchen@apache.org> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-11-09 14:32:40 -08:00
Clint Wylie	3e2bb4cf10	fix front-coded bucket size handling, better validation (#13335 ) * fix front-coded bucket size handling, better validation * Update FrontCodedIndexedTest.java	2022-11-09 13:33:01 -08:00
AmatyaAvadhanula	a2013e6566	Enhance streaming ingestion metrics (#13331 ) Changes: - Add a metric for partition-wise kafka/kinesis lag for streaming ingestion. - Emit lag metrics for streaming ingestion when supervisor is not suspended and state is in {RUNNING, IDLE, UNHEALTHY_TASKS, UNHEALTHY_SUPERVISOR} - Document metrics	2022-11-09 23:44:15 +05:30
Laksh Singla	b7a513fe09	Add a OverlordHelper that cleans up durable storage objects in MSQ (#13269 ) * scratch * s3 ls fix, add docs * add documentation, update method name * Add tests, address commits, change default value of the helper * fix test * update the default value of config, remove initial delay config * Trigger Build * update class * add more tests * docs update * spellcheck * remove ioe from the signature * add back dmmy constructor for initialization * fix guice bindings, intellij inspections	2022-11-09 17:23:35 +05:30
Tejaswini Bandlamudi	d242a9314b	Adds license and security vulnerabilities checks for Hadoop3 build (#13270 ) * adds license and security vulnerabilities check for Hadoop3 builds * spacing * fixes bugs * updates check_test_suite.py to always run license checks with Hadoop3 * nit * run analyze dependencies, analyze hadoop 3 dependencies * run tests * revert analyze dependencies, analyze hadoop 3 dependencies addition in check_test_suite.py * fixes bug * revert code change	2022-11-09 14:50:31 +05:30
Paul Rogers	7e600d2c63	Enhancements to the Calcite test framework (#13283 ) * Enhancements to the Calcite test framework * Standardize "Unauthorized" messages * Additional test framework extension points * Resolved joinable factory dependency issue	2022-11-08 14:28:49 -08:00
Kashif Faraz	9f7fd57a69	Improve fetch of pending segments from metadata store (#13310 ) * Deserialize only when needed * Update query to fetch pending segments * Revert unneeded changes * Fix query	2022-11-08 05:46:19 -08:00
Kashif Faraz	ff8e0c3397	Fix issues with caching cost strategy (#13321 ) `cachingCost` strategy has some discrepancies when compared to cost strategy. This commit addresses two of these by retaining the same behaviour as the `cost` strategy when computing the cost of moving a segment to a server: - subtract the self cost of a segment if it is being served by the target server - subtract the cost of segments that are marked to be dropped Other changes: - Add tests to verify fixed strategy. These tests would fail without the fixes made to `CachingCostStrategy.computeCost()` - Fix the definition of the segment related metrics in the docs. - Fix some docs issues introduced in #13181	2022-11-08 16:11:39 +05:30
Tejaswini Bandlamudi	594545da55	Adds cluster level idleConfig setting for supervisor (#13311 ) * adds cluster level idleConfig * updates docs * refactoring * spelling nit * nit * nit * refactoring	2022-11-08 14:54:14 +05:30
Churro	9a684af3c9	Fixing the K8s task runner to work with MSQ (#13305 ) * Fixing the K8s task runner to work with MSQ * Sorry incomplete PR Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>	2022-11-08 14:41:05 +05:30
Adarsh Sanjeev	a28b8c2674	Improve rowkey object size estimate (#13319 ) * Improve rowkey object size estimate * Address review comments * Update comment * Fix test	2022-11-08 10:12:07 +05:30
Gian Merlino	48528a0c98	MSQ: Fix task lock checking during publish, fix lock priority. (#13282 ) * MSQ: Fix task lock checking during publish, fix lock priority. Fixes two issues: 1) ControllerImpl did not properly check the return value of SegmentTransactionalInsertAction when doing a REPLACE. This could cause it to not realize that its locks were preempted. 2) Task lock priority was the default of 0. It should be the higher batch default of 50. The low priority made it possible for MSQ tasks to be preempted by compaction tasks, which is not desired. * Restructuring, add docs. * Add performSegmentPublish tests. * Fix tests.	2022-11-08 09:27:34 +05:30
Vadim Ogievetsky	f6aca21e82	Web console: update DQT to version 0.17 (#13323 ) * update to DQT 17 * update licenses * after npm i	2022-11-07 17:47:11 -08:00
Jill Osborne	d1a4de022a	Update retention rules doc (#13181 ) * Update retention rules doc * Update rule-configuration.md * Updated * Updated * Updated * Updated * Update rule-configuration.md * Update rule-configuration.md	2022-11-07 14:47:33 -08:00
Rohan Garg	a9b39fc29d	Try converting all inner joins to filters (#13201 )	2022-11-07 23:19:18 +05:30
AmatyaAvadhanula	a738ac9ad7	Improve task pause logging and metrics for streaming ingestion (#13313 ) * Improve task pause logging and metrics for streaming ingestion * Add metrics doc * Fix spelling	2022-11-07 21:33:54 +05:30
Abhishek Agarwal	b1eaf7a21f	MSQ should load even if node roles are not set (#13318 )	2022-11-07 21:11:16 +05:30
AmatyaAvadhanula	47c32a9d92	Skip ALL granularity compaction (#13304 ) * Skip autocompaction for datasources with ETERNITY segments	2022-11-07 17:55:03 +05:30
AmatyaAvadhanula	650840ddaf	Add segment handoff time metric (#13238 ) * Add segment handoff time metric * Remove monitors on scheduler stop * Add warning log for slow handoff * Remove monitor when scheduler stops	2022-11-07 17:49:10 +05:30
Gian Merlino	227b57dd8e	Compaction: Fetch segments one at a time on main task; skip when possible. (#13280 ) * Compaction: Fetch segments one at a time on main task; skip when possible. Compact tasks include the ability to fetch existing segments and determine reasonable defaults for granularitySpec, dimensionsSpec, and metricsSpec. This is a useful feature that makes compact tasks work well even when the user running the compaction does not have a clear idea of what they want the compacted segments to be like. However, this comes at a cost: it takes time, and disk space, to do all of these fetches. This patch improves the situation in two ways: 1) When segments do need to be fetched, download them one at a time and delete them when we're done. This still takes time, but minimizes the required disk space. 2) Don't fetch segments on the main compact task when they aren't needed. If the user provides a full granularitySpec, dimensionsSpec, and metricsSpec, we can skip it. * Adjustments. * Changes from code review. * Fix logic for determining rollup.	2022-11-07 14:50:14 +05:30
Gian Merlino	9423aa9163	MSQ: Consider PARTITION_STATS_MAX_BYTES in WorkerMemoryParameters. (#13274 ) * MSQ: Consider PARTITION_STATS_MAX_BYTES in WorkerMemoryParameters. This consideration is important, because otherwise we can run out of memory due to large statistics-tracking objects. * Improved calculations.	2022-11-07 14:27:18 +05:30
dependabot[bot]	081508f1aa	Bump commons-text from 1.9 to 1.10.0 in /extensions-contrib/kubernetes-overlord-extensions (#13299 ) * Bump commons-text in /extensions-contrib/kubernetes-overlord-extensions Bumps commons-text from 1.9 to 1.10.0. --- updated-dependencies: - dependency-name: org.apache.commons:commons-text dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * Cleanup pom Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Frank Chen <frank.chen021@outlook.com>	2022-11-05 15:21:39 +08:00
AmatyaAvadhanula	a17ffdfc5d	Fix flaky test method in KafkaSupervisorTest (#13315 )	2022-11-05 10:31:40 +05:30
Clint Wylie	d8329195f7	fix bug when front-coded index has only the null value (#13309 )	2022-11-04 05:26:33 -07:00
Clint Wylie	e60e305ddb	fix issue with parquet list conversion of nullable lists with complex nullable elements (#13294 ) * fix issue with parquet list conversion of nullable lists with complex nullable elements * pom stuff * fix style * adjustments	2022-11-04 05:25:42 -07:00
abhagraw	848570d8db	Suppressing package-lock.json?d3-color vulnerability (#13301 )	2022-11-04 11:47:02 +05:30
Jonathan Wei	2fdaa2fcab	Make RecordSupplierInputSource respect sampler timeout when stream is empty (#13296 ) * Make RecordSupplierInputSource respect sampler timeout when stream is empty * Rename timeout param, make it nullable, add timeout test	2022-11-03 17:45:35 -05:00
Gian Merlino	2a757b64e8	Update Curator in licenses.yaml. (#13306 )	2022-11-03 15:42:30 -07:00
Didip Kerabat	c875f4bd04	Upgrade curator to 5.4.0 (#13302 )	2022-11-03 11:26:19 -07:00
Gian Merlino	8f90589ce5	Always return sketches from DS_HLL, DS_THETA, DS_QUANTILES_SKETCH. (#13247 ) * Always return sketches from DS_HLL, DS_THETA, DS_QUANTILES_SKETCH. These aggregation functions are documented as creating sketches. However, they are planned into native aggregators that include finalization logic to convert the sketch to a number of some sort. This creates an inconsistency: the functions sometimes return sketches, and sometimes return numbers, depending on where they lie in the native query plan. This patch changes these SQL aggregators to _never_ finalize, by using the "shouldFinalize" feature of the native aggregators. It already existed for theta sketches. This patch adds the feature for hll and quantiles sketches. As to impact, Druid finalizes aggregators in two cases: - When they appear in the outer level of a query (not a subquery). - When they are used as input to an expression or finalizing-field-access post-aggregator (not any other kind of post-aggregator). With this patch, the functions will no longer be finalized in these cases. The second item is not likely to matter much. The SQL functions all declare return type OTHER, which would be usable as an input to any other function that makes sense and that would be planned into an expression. So, the main effect of this patch is the first item. To provide backwards compatibility with anyone that was depending on the old behavior, the patch adds a "sqlFinalizeOuterSketches" query context parameter that restores the old behavior. Other changes: 1) Move various argument-checking logic from runtime to planning time in DoublesSketchListArgBaseOperatorConversion, by adding an OperandTypeChecker. 2) Add various JsonIgnores to the sketches to simplify their JSON representations. 3) Allow chaining of ExpressionPostAggregators and other PostAggregators in the SQL layer. 4) Avoid unnecessary FieldAccessPostAggregator wrapping in the SQL layer, now that expressions can operate on complex inputs. 5) Adjust return type to thetaSketch (instead of OTHER) in ThetaSketchSetBaseOperatorConversion. * Fix benchmark class. * Fix compilation error. * Fix ThetaSketchSqlAggregatorTest. * Hopefully fix ITAutoCompactionTest. * Adjustment to ITAutoCompactionTest.	2022-11-03 09:43:00 -07:00
Gian Merlino	d1877e41ec	Use lookup memory footprint in MSQ memory computations. (#13271 ) * Use lookup memory footprint in MSQ memory computations. Two main changes: 1) Add estimateHeapFootprint to LookupExtractor. 2) Use this in MSQ's IndexerWorkerContext when determining the total amount of available memory. It's taken off the top. This prevents MSQ tasks from running out of memory when there are lookups defined in the cluster. * Updates from code review.	2022-11-03 07:36:54 -07:00
DENNIS	c5fcc03bdf	PrometheusEmitter NullPointerException fix (#13286 ) * PrometheusEmitter NullPointerException fix * Improved null value judgment in pushMetric * Delete meaningless judgments about namespace * Delete unnecessary @Nullable above namespace attribute	2022-11-03 18:50:27 +08:00
Laksh Singla	ccc55ef899	Mask SQL String in the MSQTaskQueryMaker for secrets (#13231 ) * add test * add masking code * fix test * oops * refactor json usage * refactor, variable update * add test cases * Trigger Build * add comment to the regex * address review comment	2022-11-03 15:27:28 +05:30
317brian	ae638e338c	docs(msq): update insert vs replace for dimension-based segment pruning (#13228 ) * docs(msq): update insert vs replace to mention dimension-based segment pruning * make suggested changes	2022-11-03 14:17:44 +05:30
Laksh Singla	7cb21cb968	Use worker number instead of task id in MSQ for communication to/from workers. (#13062 ) * Conversion from taskId to workerNumber in the workerClient * storage connector changes, suffix file when finish writing to it * Fix tests * Trigger Build * convert IntFunction to a dedicated interface * first review round * use a dummy file to indicate success * fetch the first filename from the list in case of multiple files * tests working, fix semantic issue with ls * change how the success flag works * comments, checkstyle, method rename * fix test * forbiddenapis fix * Trigger Build * change the writer * dead store fix * Review comments * revert changes * review * review comments * Update extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/shuffle/DurableStorageInputChannelFactory.java Co-authored-by: Karan Kumar <karankumar1100@gmail.com> * Update extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/shuffle/DurableStorageInputChannelFactory.java Co-authored-by: Karan Kumar <karankumar1100@gmail.com> * update error messages * better error messages * fix checkstyle Co-authored-by: Karan Kumar <karankumar1100@gmail.com>	2022-11-03 10:25:45 +05:30
Clint Wylie	018f984781	fix nested column range index range computation (#13297 ) * fix nested column range index range computation * simplify, add missing bounds check for FixedIndexed	2022-11-02 21:37:41 -07:00
Dr. Sizzles	e5ad24ff9f	Support for middle manager less druid, tasks launch as k8s jobs (#13156 ) * Support for middle manager less druid, tasks launch as k8s jobs * Fixing forking task runner test * Test cleanup, dependency cleanup, intellij inspections cleanup * Changes per PR review Add configuration option to disable http/https proxy for the k8s client Update the docs to provide more detail about sidecar support * Removing un-needed log lines * Small changes per PR review * Upon task completion we callback to the overlord to update the status / locaiton, for slower k8s clusters, this reduces locking time significantly * Merge conflict fix * Fixing tests and docs * update tiny-cluster.yaml changed `enableTaskLevelLogPush` to `encapsulatedTask` * Apply suggestions from code review Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * Minor changes per PR request * Cleanup, adding test to AbstractTask * Add comment in peon.sh * Bumping code coverage * More tests to make code coverage happy * Doh a duplicate dependnecy * Integration test setup is weird for k8s, will do this in a different PR * Reverting back all integration test changes, will do in anotbher PR * use StringUtils.base64 instead of Base64 * Jdk is nasty, if i compress in jdk 11 in jdk 17 the decompressed result is different Co-authored-by: Rahul Gidwani <r_gidwani@apple.com> Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2022-11-02 19:44:47 -07:00

12197 Commits