druid

Commit Graph

Author	SHA1	Message	Date
Vadim Ogievetsky	f7c0e425a9	fix URI wording (#16111 )	2024-03-13 11:38:51 -07:00
Gian Merlino	910124d4de	MSQ: Plan without implicit sorting. (#16073 ) * MSQ: Plan without implicit sorting. This patch adds an EngineFeature "GROUPBY_IMPLICITLY_SORTS" and sets it true for native, false for MSQ. It's useful for two reasons: 1) In the future we'll likely want MSQ to hash-partition for GROUP BY instead of using a global sort, which would mean MSQ would not implicitly ORDER BY when there is a GROUP BY. 2) When doing REPLACE with MSQ, CLUSTERED BY is transformed to ORDER BY. We should retain that ORDER BY, as it may be a subset of the GROUP BY, and it is important to remember which fields the user wanted to include in range shard specs. * Fix tests. * Fix tests for real. * Fix test.	2024-03-13 08:27:39 -07:00
Zoltan Haindrich	818cc9eedf	Fix toString for SingleThreadSpecializable ConstantExprs (#16084 )	2024-03-13 03:48:52 -07:00
Clint Wylie	aa2959b2bd	reset keySerde when closing groupers to clear out heap dictionaries (#16114 ) ConcurrentGrouper kind of misuses ThreadLocal to hold a SpillingGrouper, and never calls remove() on it, which can result in large amounts of heap being retained as weak references even after grouping is finished. This PR calls keySerde.reset() on all of the Grouper.close() implementations that have a KeySerde and should free up a bunch of space that is no longer needed.	2024-03-13 15:09:54 +05:30
Clint Wylie	795e342ba8	fix sql results mixed array and scalar values (#16105 ) * fix sql results mixed array and scalar values * simplify	2024-03-12 23:47:35 -07:00
Kashif Faraz	82fced571b	Remove deprecated UnknownSegmentIdsException (#16112 ) Changes - Replace usages of `UnknownSegmentIdsException` with `DruidException` - Add method `SqlMetadataQuery.retrieveSegments` - Add new field `used` to `DataSegmentPlus`	2024-03-13 11:07:37 +05:30
Abhishek Radhakrishnan	fb7bb0953d	Kill segments by versions (#15994 ) * Kill task version support. Kill tasks by default kill all versions of unused segments in the specified interval. Users wanting to delete specific versions (for example, data compliance reasons) and keep the rest of the versions can specify the optional version in the kill task payload. * Formatting changes. * Multi version tests in RetrieveSegmentsActionsTest Sort of like method-level parameterized tests. * Address review feedback * Accept a list of versions instead of a single version. Support multiple versions. * Tests for multiple versions. * Update docs * Cleanup * Address review comments. Retain the old interface method and make it default and route it to the method with nullable versions variant. Update usages to use the default method where versions don't matter. * Remove versions from retrieve used segments action. * Some updates. * Apply suggestions from code review Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * /s/actual/observed/g * minor test cleanup * WIP: Test fixes and updates. Also add test for kill by version with used load spec. Checkpoint. --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2024-03-13 09:37:30 +05:30
Karan Kumar	84c5098473	Fix data race in getting results from MSQ select tasks. (#16107 ) * Fix data race in getting results from MSQ select tasks. * Add better logging * Handling number overflow.	2024-03-13 08:58:18 +05:30
Katya Macedo	6f6f86c325	Update `maxRowsInMemory` and `maxBytesInMemory` description (#16104 )	2024-03-12 14:40:15 -07:00
Zoltan Haindrich	8252d72e2a	Pull up literals in InputAccessor (#16033 ) * Pull up literals in InputAccessor * pull up literals in `InputAccessor` * remove the need to pass `constants` of `Window` operator Fixes #15353 * update test * enable relax_nulls	2024-03-12 09:14:31 -07:00
Sree Charan Manamala	ef9637eef1	Handling array with boolean literals (#16093 ) Handling array with boolean literals like ARRAY[true, false] Druid appears to be able to convert an array with boolean expressions like this array[added=deleted, added=delta] into a numeric array of 0 and 1: select array[added=deleted, added=delta] from wikipedia However, select array[true, false] from wikipedia doesn't work. This PR fixes this.	2024-03-12 12:28:16 +05:30
Abhishek Radhakrishnan	0a615f16de	Fix bug where numSegmentsKilled is reported incorrectly. Also, add a unit test. (#16103 )	2024-03-12 10:02:54 +05:30
Vadim Ogievetsky	8ef3eebd30	Web console: upgrade axios and follow-redirects (#16087 ) * upgrade axios * upgrade jest	2024-03-11 18:57:00 -07:00
Soumyava	85ee775390	Handling latest_by and earliest_by on numeric columns correctly (#15939 ) * Handling latest_by and earliest_by on numeric columns correctly * Adding test	2024-03-11 13:49:21 -07:00
Clint Wylie	313da98879	decouple column serializer compression closers from SegmentWriteoutMedium to optionally allow serializers to release direct memory allocated for compression earlier than when segment is completed (#16076 )	2024-03-11 12:28:04 -07:00
Abhishek Radhakrishnan	8084f2206b	Remove `@JsonIgnore` annotations for private members of `TaskAction` classes (#16099 ) * Remove @JsonIgnore annotations for private members * checkstyle fix - removed unused imports.	2024-03-12 00:12:36 +05:30
George Shiqi Wu	94d2a28465	Add deep storage segment metric (#16072 ) * Add new metric for deepStorage segments * Add docs * change metric name	2024-03-11 10:24:46 -04:00
Vishesh Garg	2dd8b16467	Correct the API used to fetch the version for a GCS object (#16097 ) Current API used to fetch the version for a GCS object is incorrect. This PR fixes that API.	2024-03-11 18:30:34 +05:30
Abhishek Radhakrishnan	c7f1872bd1	Fixup KillUnusedSegmentsTest (#16094 ) Changes: - Use an actual SqlSegmentsMetadataManager instead of TestSqlSegmentsMetadataManager - Simplify TestSegmentsMetadataManager - Add a test for large interval segments.	2024-03-11 13:37:48 +05:30
sullis	148ad32e75	netty 4.1.107 (#16027 ) * netty 4.1.107 * update licenses.yaml	2024-03-11 15:57:44 +08:00
Zoltan Haindrich	2eb7d7a89b	Calcite tests remove expected exception (#16046 ) * Calcite tests remove expected exception * update testcases using `expectedException` to utilize `assertThrows` instead * remove `BaseCalciteQueryTest#expectedException` * fixes `cannotVectorize` so it no longer stops further processing * `msqIncompatible` no longer toggles a boolean - it's an `Assume` instead Fixes #15423 * cleanup * move msqIncompat * update test * cleanup * remove comment * empty-commit * empty-commit	2024-03-11 13:23:57 +05:30
Sensor	2d62b4f09b	docs refinement: json format (#16080 ) * docs refinement: json format * Update docs/api-reference/tasks-api.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2024-03-11 15:49:14 +08:00
gzhao9	2d628cce84	Refactor AsyncQueryForwardingServletTest to reduce code duplication (#16092 )	2024-03-10 17:32:43 +05:30
Sensor	ba3d4daf45	Fix field names in PeonCommandContext (#16067 )	2024-03-09 08:46:51 +05:30
Vishesh Garg	b1c1937e94	Change last update timestamp granularity of GCS objects from seconds to milliseconds (#16083 ) The previously used GCS API client library returned last update time for objects directly in milliseconds. The new library returns it in OffsetDateTime format which was being converted to seconds and stored against the object. This fix converts the time back to ms before storing it.	2024-03-09 07:54:33 +05:30
Parth Agrawal	3aec90563e	Fix Jest and Prettify Checks (#15544 ) * Fix jest and prettify checks * Remove defaultQueryContext and run jest again --------- Co-authored-by: Ghazanfar-CFLT <mghazanfar@confluent.io>	2024-03-08 14:36:26 -08:00
Vadim Ogievetsky	2816121ef0	wait for the coordinator (and proxy service to start) (#16088 )	2024-03-08 13:31:29 -08:00
George Shiqi Wu	40ebaf83c9	Fix bug with mmless ingestion and compaction tasks on azure (#16065 ) * Update azure behavior to match s3 * Add test * Cleanup logic * fix checkstyle * Add comment	2024-03-08 15:42:44 -05:00
Vadim Ogievetsky	19a8af866b	make detail archive opening more robust (#16071 )	2024-03-08 10:55:31 -08:00
dependabot[bot]	775c1180ae	Bump redis.clients:jedis from 5.0.2 to 5.1.2 (#16074 ) Bumps [redis.clients:jedis](https://github.com/redis/jedis) from 5.0.2 to 5.1.2. - [Release notes](https://github.com/redis/jedis/releases) - [Commits](https://github.com/redis/jedis/compare/v5.0.2...v5.1.2) --- updated-dependencies: - dependency-name: redis.clients:jedis dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-03-08 07:40:37 -08:00
Jan Werner	834a0ad9f1	update jose4j and corresponding license file (#16078 ) Update org.bitbucket.b_c:jose4j from 0.9.3 to 0.9.6. to resolve https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-51775 fixes #16075	2024-03-08 07:36:07 -08:00
Jan Werner	a7b2747e56	remove aws-sdk from ranger-extension (#16011 ) Fixes # size blowup regression introduced in https://github.com/apache/druid/pull/15443 This PR removes the transitive dependency of ranger-plugins-audit to reduce the size of the compiled artifacts * add aws-logs-sdk to ensure that all the transitive dependencies are satisfied * replace aws-bundle-sdk with aws-logs-sdk * add additional guidance on ranger update, add dependency ignore to satisfy dependency analyzer * add aws-sdk-logs to list of ignored dependencies to satisfy the maven plugin * align aws-sdk versions	2024-03-08 07:35:29 -08:00
Zoltan Haindrich	60766495aa	Use dorny/paths-filter@v3.0.0 (#16082 )	2024-03-08 13:35:26 +05:30
Abhishek Radhakrishnan	daf03939a9	Upgrade GHA dependencies (#15954 ) * Upgrade actions/checkout from v3 to v4. * Upgrade actions/setup-java from v3 to v4. * Upgrade dorny/paths-filter, actions/cache/restore, actions/stale to v3, v4 and v9 respectively. * Add a GHA label for .github/** and skip UT/IT on .github files. * remove skipping UT/IT on .github/** changes.	2024-03-08 07:54:02 +05:30
Kashif Faraz	5f203725dd	Clean up SqlSegmentsMetadataManager and corresponding tests (#16044 ) Changes: Improve `SqlSegmentsMetadataManager` - Break the loop in `populateUsedStatusLastUpdated` before going to sleep if there are no more segments to update - Add comments and clean up logs Refactor `SqlSegmentsMetadataManagerTest` - Merge `SqlSegmentsMetadataManagerEmptyTest` into this test - Add method `testPollEmpty` - Shave a few seconds off of the tests by reducing poll duration - Simplify creation of test segments - Some renames here and there - Remove unused methods - Move `TestDerbyConnector.allowLastUsedFlagToBeNull` to this class Other minor changes - Add javadoc to `NoneShardSpec` - Use lambda in `SqlSegmentMetadataPublisher`	2024-03-08 07:34:51 +05:30
Charles Smith	3caacba8c5	update window functions doc (#15902 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2024-03-07 15:16:52 -08:00
AmatyaAvadhanula	5871b81a78	Fix race in BaseNodeRoleWatcher tests (#16064 ) * Fix race in BaseNodeRoleWatcher tests * Make non static	2024-03-07 13:41:16 -08:00
Zoltan Haindrich	aaa64832fd	Disable DecoupledPlanningCalciteJoinQueryTest until it gets fixed (#16070 ) Recently this test started preventing other tests from executing by triggering a bug somewhere in surefire. This patch disables the testcases in case of non-sql compat mode.	2024-03-07 12:55:48 -08:00
George Shiqi Wu	80cab51d50	Fix bug with cancelling pending tasks when running kubernetes ingestion. (#16036 ) * Fix bug * Add new test	2024-03-07 15:48:14 -05:00
Jill Osborne	67ae0ff450	Update docs for rabbit community extension (#16069 ) * Updated docs for rabbit community extension * Updated after review	2024-03-07 11:29:53 -08:00
Vishesh Garg	bed5d9c3b2	Remove exception on failure response from GCS delete API (#16047 ) * Throw 404 Exception on failure response from GCS delete API * Replace String.format * Apply suggestions from code review Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com> * Remove exception for file not found and fix tests * Add warn log and fix intellij inspection errors * More intellij inspection fixes * * Change to debug log * change runtime exception class for code coverage * Add file paths for batch delete failures * Move failedPaths computation to inside isDebugEnabled flag * Correct handling of StorageException * Address review comments * Remove unused exceptions * Address code coverage and review comments * Minor corrections --------- Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>	2024-03-07 17:57:17 +05:30
Laksh Singla	5f588fa45c	Fix bug while materializing scan's result to frames (#15987 ) While converting Sequence<ScanResultValue> to Sequence<Frames>, when maxSubqueryBytes is enabled, we batch the results to prevent creating a single frame per ScanResultValue. Batching requires peeking into the actual value, and checking if the row signature of the scan result’s value matches that of the previous value. Since we can do this indefinitely (in the worst case all of them have the same signature), we keep fetching them and accumulating them in a list (on the heap). We don’t really know how much to batch before we actually write the value as frames. The PR modifies the batching logic to not accumulate the results in an intermediary list	2024-03-07 17:11:44 +05:30
Parth Agrawal	bf39c71d2a	Update protocol for MemcachedCache (#16035 )	2024-03-06 22:28:11 -08:00
Adithya Chakilam	564c44ed85	Add stats segmentsRead and segmentsPublished to compaction task reports (#15947 ) Changes: - Add visibility into number of segments read/published by each parallel compaction - Add new fields `segmentsRead`, `segmentsPublished` to `IngestionStatsAndErrorsTaskReportData` - Update `ParallelIndexSupervisorTask` to populate the new stats	2024-03-07 09:37:23 +05:30
Charles Smith	ebf3bdd909	restore information about truncated responses to sql api (#16001 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Victoria Lim <lim.t.victoria@gmail.com>	2024-03-06 14:03:58 -08:00
Adithya Chakilam	ae022cc0c9	fixup!: #15981 Missing completion reports on index_parallel tasks (#16042 ) * initial commit * comments * typo * comments * comments * remove var * initialize global var early * remove new line * small test fix * same fix another test	2024-03-06 13:58:34 -05:00
AmatyaAvadhanula	c2841425f4	Handle uninitialized cache in Node role watchers (#15726 ) BaseNodeRoleWatcher counts down cacheInitialized after a timeout, but also sets a flag indicating that it was a timed-out initialization, and calls nodeViewInitializationTimedOut (a new method on listeners) instead of nodeViewInitialized. Listeners can then do what is most appropriate with this information.	2024-03-06 16:00:24 +05:30
Vishesh Garg	cf9bc507f6	Fix compilation failure due to missing constant MISSING_JOIN_CONVERSION (#16050 ) * Reintroduce variable MISSING_JOIN_CONVERSION * Remove redundant constant MISSING_JOIN_CONVERSION2 * Correct fix to address failing tests	2024-03-06 15:34:39 +08:00
Adarsh Sanjeev	ddd9da2e09	Make servedSegments nullable to maintain compatibility (#16034 ) * Make servedSegments nullable to maintain compatibility	2024-03-06 11:39:24 +05:30
Zoltan Haindrich	65c3b4d31a	Support join in decoupled mode (#15957 ) * plan join(s) in decoupled mode * configure DecoupledPlanningCalciteJoinQueryTest the test has 593 cases; however there are quite a few parameterized from the 107 methods annotated with @Test - 42 is not yet working * replace the isRoot hack in DruidQueryGenerator with a logic that instead looks ahead for the next node; and doesn't let the previous node do the Project - this makes it plan more likely than the existing planner	2024-03-05 19:10:13 -06:00

14210 Commits