druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	566013a5f5	Docs: Fix spelling of 5 GB. (#16040) The spellchecker does not consider "5GB" to be spelled correctly.	2024-03-04 22:37:38 -08:00
Zoltan Haindrich	e469b7ed34	Make setting QUERY_CONTEXT_DEFAULT explicit in tests (#16010)	2024-03-05 10:54:16 +05:30
zachjsh	720f1e834a	Add support for AzureDNSZone enabled storage accounts used for deep storage (#16016) * Add support for AzureDNSZone enabled storage accounts used for deep storage. Added a new config to AzureAccountConfig, `storageAccountEndpointSuffix`, which allows the user to specify a storage account endpoint suffix where the underlying storage account is enabled for AzureDNSZone. The previous config, `endpointSuffix`, did not support such accounts. The previous config has been deprecated in favor of this new config. Also fixed an issue where `managedIdentityClientId` was not being set properly. * * address review comments * * add back azure government link and docs	2024-03-04 16:13:28 -05:00
Gian Merlino	930655ff18	Move retries into DataSegmentPusher implementations. (#15938) * Move retries into DataSegmentPusher implementations. The individual implementations know better when they should and should not retry. They can also generate better error messages. The inspiration for this patch was a situation where EntityTooLarge was generated by the S3DataSegmentPusher, and retried uselessly by the retry harness in PartialSegmentMergeTask. * Fix missing var. * Adjust imports. * Tests, comments, style. * Remove unused import.	2024-03-04 10:36:21 -08:00
Katya Macedo	ced8be3044	docs: Add upgrade notes for Druid 29.0.0 (#16022)	2024-03-04 08:58:52 -08:00
Gian Merlino	376a41f1e9	Rows.objectToNumber: Accept decimals with output type LONG. (#15999) * Rows.objectToNumber: Accept decimals with output type LONG. PR #15615 added an optimization to avoid parsing numbers twice in cases where we know that they should definitely be longs or definitely be doubles. Rather than try parsing as long first, and then try parsing as double, it would use only the parsing routine specific to the requested outputType. This caused a bug: previously, we would accept decimals like "1.0" or "1.23" as longs, by truncating them to "1". After that patch, we would treat such decimals as nulls when the outputType is set to LONG. This patch retains the short-circuit for doubles: if outputType is DOUBLE, we only parse the string as a double. But for outputType LONG, this patch restores the old behavior: try to parse as long first, then double.	2024-03-04 22:00:27 +05:30
Sensor	4e9b758661	Support CPU resource configurable for Kubernetes job under MoK Mode (#16008) * support CPU resource configurable for Kubernetes job * update property doc * fix test name * refine doc format	2024-03-04 10:12:09 -05:00
Adithya Chakilam	ec52f686c0	Fix compaction tasks reports getting overwritten (#15981) * Fix compaction tasks reports getting overwritten * only skip for compaction task * address comments * fix boolean * move boolean flag to task rather than spec * rename variable * add docs, fix missing case * Update docs/ingestion/tasks.md * rename var * add task report decode check in IT * change assert	2024-03-04 10:10:17 -05:00
Adarsh Sanjeev	93eeb05eaf	Revert explain attributes change to old behaviour. (#16004) * Revert explain attributes change * Fix tests * Fix tests * Rename function	2024-03-04 15:56:02 +05:30
Sree Charan Manamala	820febf38c	Improved Connection Count server select strategy (#15975) Updated the Direct Druid Client so as to make Connection Count Server Selector Strategy work more efficiently. If creating a connection to a node is slow, then that slowness wouldn't be accounted for if we count the open connections after sending the request. So we increment the counter first and then send the request.	2024-03-04 15:02:32 +05:30
317brian	b3015bd7ce	docs: mention acid-compliance for meta store (#16014) * docs: add mermaid diagram support * fix crash when parsing data in data loader that can not be parsed (#15983) * update jetty to address CVE (#16000) * Concurrent replace should work with supervisors using concurrent locks (#15995) * Concurrent replace should work with supervisors using concurrent locks * Ignore supervisors with useConcurrentLocks set to false * Apply feedback * Add pre-check for heavy debug logs (#15706) Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Benedict Jin <asdf2014@apache.org> * Remove helm paths from CodeQL config (#16006) * docs: mention acid-compliance for metadb --------- Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com> Co-authored-by: Jan Werner <105367074+janjwerner-confluent@users.noreply.github.com> Co-authored-by: AmatyaAvadhanula <amatya.avadhanula@imply.io> Co-authored-by: Sensor <fectrain@outlook.com> Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-03-04 11:00:38 +08:00
Gian Merlino	8d3ed31015	MSQ: Nicer error when sortMerge join falls back to broadcast. (#16002) * MSQ: Nicer error when sortMerge join falls back to broadcast. In certain cases, joins run as broadcast even when the user hinted that they wanted sortMerge. This happens when the sortMerge algorithm is unable to process the join, because it isn't a direct comparison between two fields on the LHS and RHS. When this happens, the error message from BroadcastTablesTooLargeFault is quite confusing, since it mentions that you should try sortMerge to fix it. But the user may have already configured sortMerge. This patch fixes it by having two error messages, based on whether broadcast join was used as a primary selection or as a fallback selection. * Style. * Better message.	2024-03-01 13:16:39 -08:00
George Shiqi Wu	ef48aceff8	Fix segment/unavailable/count (#16020)	2024-03-01 15:38:27 -05:00
Zoltan Haindrich	bf0995f846	Introduce dynamic table append (#15897)	2024-03-01 04:31:57 -05:00
Vadim Ogievetsky	acb5124679	make double detection better (#15998)	2024-02-29 15:45:58 -08:00
Vadim Ogievetsky	c5b032799c	Web console: add table and column search (#15990) * Make a search * fix snapshot * added message when not found	2024-02-29 15:45:50 -08:00
Clint Wylie	101176590c	adaptive filter partitioning (#15838) * cooler cursor filter processing allowing much smarter utilization of indexes by feeding selectivity forward, with implementations for range and predicate based filters * added new method Filter.makeFilterBundle which cursors use to get indexes and matchers for building offsets * AND filter partitioning is now pushed all the way down, even to nested AND filters * vector engine now uses same indexed base value matcher strategy for OR filters which partially support indexes	2024-02-29 15:38:12 -08:00
Jan Werner	baaa4a6808	update commons-compress to address CVE-2024-25710 CVE-2024-26308 (#16009) * Update commons-compress to 1.26.0 to address CVEs CVE-2024-25710 CVE-2024-26308 * Add commons-codec as a runtime dependency required by commons-compress 1.26.0 --------- Co-authored-by: Xavier Léauté <xl+github@xvrl.net>	2024-02-29 14:05:31 -08:00
Sensor	3acfc95453	Remove helm paths from CodeQL config (#16006)	2024-02-29 20:02:27 +05:30
Sensor	e0bce0ef90	Add pre-check for heavy debug logs (#15706) Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-02-29 12:58:14 +05:30
AmatyaAvadhanula	7c42e87db9	Concurrent replace should work with supervisors using concurrent locks (#15995) * Concurrent replace should work with supervisors using concurrent locks * Ignore supervisors with useConcurrentLocks set to false * Apply feedback	2024-02-29 12:06:47 +05:30
Jan Werner	d6f59d1999	update jetty to address CVE (#16000)	2024-02-29 09:27:31 +08:00
Vadim Ogievetsky	6e222d47c8	fix crash when parsing data in data loader that can not be parsed (#15983)	2024-02-28 14:37:24 -08:00
Kashif Faraz	f757231420	Use cache for password hash while validating LDAP password (#15993)	2024-02-28 18:33:33 +05:30
Adarsh Sanjeev	d2c2036ea2	Optimize MSQ realtime queries (#15399) Currently, while reading results from realtime tasks, requests are sent on a segment level. This is slightly wasteful, as when contacting a data server, it is possible to transfer results for all segments which it is hosting, instead of only one segment at a time. One change this PR makes is to group the segments on the basis of servers. This reduces the number of queries made to data servers. Since we don't have access to the number of rows for realtime segments, the grouping is done with a fixed estimated number of rows for each realtime segment.	2024-02-28 11:32:14 +05:30
317brian	3df161f73c	docs: update security doc for hashing (#15970) * docs: add mermaid diagram support * docs: update druid-basic-security doc to mention caching * Update docs/development/extensions-core/druid-basic-security.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2024-02-28 09:48:37 +08:00
benkrug	0c601bf430	Update basic-cluster-tuning.md (#14964) * Update basic-cluster-tuning.md The sentence "When free system memory is greater than or equal to druid.segmentCache.locations, the more segment data the Historical can be held in the memory-mapped segment cache" didn't read well. Updated to clarify it. * Update docs/operations/basic-cluster-tuning.md * Update docs/operations/basic-cluster-tuning.md --------- Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2024-02-28 09:48:20 +08:00
AlbericByte	f07d402f48	pin TestNG dependencies to 7.3.0 (#15924)	2024-02-28 09:48:13 +08:00
AlbericByte	e7d753d4b0	update the doc for dump-segment tool when using jdk11+ (#15971) * update the doc for dump-segment tool when using jdk11+ * update the style * fix spell check error	2024-02-28 09:40:10 +08:00
Abhishek Radhakrishnan	beccc401e1	Segments created in the same batch have the same `created_date` entry & rename metric (#15977) * All segments stored in the same batch have the same created_date entry. In the absence of a group_id column, this metadata would allow us to easily reason about and troubleshoot ingestion-related issues. * Rename metric name and code references to eligibleUnusedSegments. Address review comment from https://github.com/apache/druid/pull/15941#discussion_r1503631992	2024-02-27 17:28:43 +05:30
Karan Kumar	5bb5b41b18	Adding task pending time in MSQ reports (#15966) Added a new field pendingMs in MSQ task reports. This helps in figuring out the exact run time of the MSQ worker tasks. Fixed data races.	2024-02-27 14:41:28 +05:30
Abhishek Radhakrishnan	38ecf980d0	Refactor and add tests and metric to KillUnusedSegments duty (auto-kill) (#15941) * Kill duty and test improvements. Initial commit with: - Bug fixes - auto-kill can throw NPE when there are no datasources present and defaults mismatch. - Add new stat for candidate segment intervals killed. - Move a couple of debug logs to info logs for improved visibility (should only log once per kill period). - Remove redundant checks for code readability. - Updated tests from using mocks (also the mocks weren't using last updated timestamp) and add more test coverage for different config parameters. - Add a couple of unit tests that are ignored for the eternity case to prove that the kill duty doesn't clean up segments with ALL grain or that end in DateTimes.MAX. - Migrate Druid exception from user to operator persona. * Address review comments. * Remove unused methods. * fix up format specifier and validate bad config tests. * Consolidate the helpers a bit more and add another test. * Update test names. Add javadoc placeholders for slightly involved tests. * Add docs for metric kill/candidateUnusedSegments/count. Also, rename to disambiguate. * Comments. * Apply logging suggestions from code review Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Review comments - Clarify docs on eligibility. - Add test for multiple segments in the same interval. Clarify comment. - Remove log line from test. - Remove lastUpdatedDate = now.plus(10) from test. * minor cleanup. * Clarify javadocs for getUnusedSegmentIntervals(). --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2024-02-27 12:14:41 +05:30
Laksh Singla	17e4f3ac60	Refactor GroupBy and TopN code to relax the constraint of dimensions being comparable (#15559) The code in the groupBy engine and the topN engine assumes that the dimensions are comparable and can call dimA.compareTo(dimB) to sort the dimensions and group them together. This works well for the primitive dimensions, because they are Comparable; however, it falls apart when the dimensions can be arrays (or, in future scenarios, complex columns). In cases when the dimensions are not comparable, Druid resorts to wrapper types, ComparableStringArray and ComparableList, which are Comparable based on the list comparator.	2024-02-27 11:39:29 +05:30
Soumyava	51cc729fd1	Enforcing type checking for flatten concat (#15903)	2024-02-26 21:53:49 -08:00
Vadim Ogievetsky	a81429746d	Web console: fix typos in Kinesis suggestions, add regions and groups (#15900) * fix typo * update regions * add China * Update web-console/src/druid-models/ingestion-spec/ingestion-spec.tsx Co-authored-by: Benedict Jin <asdf2014@apache.org> * add , --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2024-02-27 10:00:02 +08:00
Vadim Ogievetsky	bf3139562c	Web console: support for the export execution state (#15969) * init * add CSV keyword	2024-02-26 11:28:25 -08:00
Vadim Ogievetsky	28b3e117cf	Web console: Add input format props (#15950) * fix typo * add Protobuf * better padding	2024-02-26 11:28:09 -08:00
Abhishek Radhakrishnan	67a6224d91	Fix up incorrect `PARTITIONED BY` error messages (#15961) * Fix up typos, inaccuracies and clean up code related to PARTITIONED BY. * Remove wrapper function and update tests to use DruidExceptionMatcher. * Checkstyle and Intellij inspection fixes.	2024-02-26 14:17:53 -05:00
Abhishek Agarwal	ddfc31d7ed	Reduce the size of distribution docker image (#15968) This PR creates symlinks when there are duplicate jars present in the extension. The Docker image includes contrib extensions, too, and the size of the image has bloated quite a lot of late. This change also fixes the ITNestedQueryPushDownTest integration test.	2024-02-26 21:18:55 +05:30
AmatyaAvadhanula	e2b7289dea	Try to fetch the task status for an active task from memory (#15724) * Reduce metadata calls to fetch the status for an active task	2024-02-26 13:53:05 +05:30
Benjamin Hopp	ebb7190545	Docs: Change single-dim to hashed in example for index task (#15529)	2024-02-26 09:16:10 +05:30
Zoltan Haindrich	06deda9415	ScanAndSort query fails with NPE for simple queries (#15914) * some stuff * add dummy fields * draft-fix * rename test * cleanup * add null * cleanup * cleanup * add test * updates * move check to constructor * cleanup * updates/etc * fix some more * add rowSignatureMode * checkstyle/etc * override * missing msqIncompat * fix test * fixes * undo * updates * remove param	2024-02-24 15:33:50 -08:00
Clint Wylie	6145c8dd01	fix bug with expression virtual column indexes on missing columns for expressions that turn null values into not null values (#15959)	2024-02-23 15:07:32 -08:00
Gian Merlino	b69f89d9f8	Clarify where to set druid.monitoring.monitors. (#15729)	2024-02-23 18:49:37 +05:30
Adithya Chakilam	1f443d218c	Enable partition stats on streaming task completion report (#15930) Changes: - Add visibility into number of records processed by each streaming task per partition - Add field `recordsProcessed` to `IngestionStatsAndErrorsTaskReportData` - Populate number of records processed per partition in `SeekableStreamIndexTaskRunner`	2024-02-23 16:29:03 +05:30
dependabot[bot]	3011829419	Bump log4j.version from 2.18.0 to 2.22.1 (#15934) * Bump log4j.version from 2.18.0 to 2.22.1 Bumps `log4j.version` from 2.18.0 to 2.22.1. Updates `org.apache.logging.log4j:log4j-api` from 2.18.0 to 2.22.1 Updates `org.apache.logging.log4j:log4j-core` from 2.18.0 to 2.22.1 Updates `org.apache.logging.log4j:log4j-slf4j-impl` from 2.18.0 to 2.22.1 Updates `org.apache.logging.log4j:log4j-1.2-api` from 2.18.0 to 2.22.1 Updates `org.apache.logging.log4j:log4j-jul` from 2.18.0 to 2.22.1 --- updated-dependencies: - dependency-name: org.apache.logging.log4j:log4j-api dependency-type: direct:production update-type: version-update:semver-minor - dependency-name: org.apache.logging.log4j:log4j-core dependency-type: direct:production update-type: version-update:semver-minor - dependency-name: org.apache.logging.log4j:log4j-slf4j-impl dependency-type: direct:production update-type: version-update:semver-minor - dependency-name: org.apache.logging.log4j:log4j-1.2-api dependency-type: direct:production update-type: version-update:semver-minor - dependency-name: org.apache.logging.log4j:log4j-jul dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * Update License --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: frank chen <frank.chen021@outlook.com>	2024-02-23 16:19:35 +08:00
dependabot[bot]	936ba25e85	Bump org.postgresql:postgresql from 42.6.0 to 42.7.2 (#15931) * Bump org.postgresql:postgresql from 42.6.0 to 42.7.2 Bumps [org.postgresql:postgresql](https://github.com/pgjdbc/pgjdbc) from 42.6.0 to 42.7.2. - [Release notes](https://github.com/pgjdbc/pgjdbc/releases) - [Changelog](https://github.com/pgjdbc/pgjdbc/blob/master/CHANGELOG.md) - [Commits](https://github.com/pgjdbc/pgjdbc/commits) --- updated-dependencies: - dependency-name: org.postgresql:postgresql dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * Update License --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: frank chen <frank.chen021@outlook.com>	2024-02-23 16:19:26 +08:00
Vadim Ogievetsky	c52ddd0b86	make flattenSpec location adaptive (#15946)	2024-02-22 14:07:04 -08:00
zachjsh	8ebf237576	Move INSERT & REPLACE validation to the Calcite validator (#15908) This PR contains a portion of the changes from the inactive draft PR for integrating the catalog with the Calcite planner, https://github.com/apache/druid/pull/13686 from @paul-rogers: refactoring the IngestHandler and subclasses to produce a validated SqlInsert instance node instead of the previous Insert source node. The SqlInsert node is then validated in the Calcite validator. The validation implemented as part of this PR covers only the source node and some of the validation that was previously done in the ingest handlers. As part of this change, the partitionedBy clause can be supplied by the table catalog metadata if it exists, and can be omitted from the ingest time query in this case.	2024-02-22 14:01:59 -05:00
Katya Macedo	f37d019fe6	Fix redirects for streaming ingestion (#15943)	2024-02-22 22:34:19 +05:30
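
The Rows.objectToNumber fix (#15999) describes a specific parsing order. For outputType DOUBLE, the string is parsed once as a double; for outputType LONG, a long parse is tried first, and only if it fails is the string parsed as a double and truncated, so that "1.23" becomes 1 rather than null. The Java sketch below illustrates that order under stated assumptions. It is not Druid's code: the class ObjectToNumberSketch, its OutputType enum, and the helpers tryParseLong and tryParseDouble are hypothetical names, only String input is handled, and the JDK's Long.parseLong and Double.parseDouble stand in for whatever parsing routines Druid actually uses.

```java
// Hypothetical sketch of the parse order described in #15999.
// Not Druid's Rows.objectToNumber; names and helpers are illustrative only.
final class ObjectToNumberSketch {

  /** Requested output type, mirroring the LONG/DOUBLE distinction in the commit message. */
  enum OutputType { LONG, DOUBLE }

  private ObjectToNumberSketch() {}

  /** Returns the parsed long, or null instead of throwing when s is not a valid long. */
  static Long tryParseLong(String s) {
    try {
      return Long.parseLong(s);
    } catch (NumberFormatException e) {
      return null;
    }
  }

  /** Returns the parsed double, or null instead of throwing when s is not a valid double. */
  static Double tryParseDouble(String s) {
    try {
      return Double.parseDouble(s);
    } catch (NumberFormatException e) {
      return null;
    }
  }

  static Number parse(String s, OutputType outputType) {
    if (s == null) {
      return null;
    }
    String trimmed = s.trim();
    if (outputType == OutputType.DOUBLE) {
      // Short-circuit kept for DOUBLE: a single parse attempt.
      return tryParseDouble(trimmed);
    }
    // LONG: try an exact long parse first...
    Long asLong = tryParseLong(trimmed);
    if (asLong != null) {
      return asLong;
    }
    // ...then fall back to a double parse and truncate toward zero, so "1.23" becomes 1.
    Double asDouble = tryParseDouble(trimmed);
    return asDouble == null ? null : Long.valueOf(asDouble.longValue());
  }

  public static void main(String[] args) {
    System.out.println(parse("1.23", OutputType.LONG));
    System.out.println(parse("42", OutputType.LONG));
    System.out.println(parse("1.23", OutputType.DOUBLE));
    System.out.println(parse("abc", OutputType.LONG));
  }
}
```

Running main prints 1, 42, 1.23 and null, one per line. In this sketch a second parse happens only for LONG inputs that fail the long parse; DOUBLE inputs are still parsed exactly once.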

14054 Commits