druid

Commit Graph

Author	SHA1	Message	Date
Kashif Faraz	64e6283eca	Do not allow retention rules to be null (#14223 ) Changes: - Do not allow retention rules for any datasource or cluster to be null - Allow empty rules at the datasource level but not at the cluster level - Add validation to ensure that `druid.manager.rules.defaultRule` is always set correctly - Minor style refactors	2023-05-11 14:33:56 +05:30
AmatyaAvadhanula	47e48ee657	Remove incorrect optimization (#14246 )	2023-05-11 00:54:41 -07:00
Clint Wylie	e833a4700d	suppress hadoop3 cve that seem not applicable to us (#14252 )	2023-05-10 23:08:05 -07:00
Abhishek Agarwal	f3ff36a004	Move the stale bot to a GHA action (#14238 ) Move the stale bot to a GHA action	2023-05-11 11:31:28 +05:30
Clint Wylie	aaaff74740	fix npe regression in json_value when filtering non-existent paths (#14250 ) * fix npe regression in json_value when filtering non-existent paths * more coverage	2023-05-10 22:39:22 -07:00
Clint Wylie	6db11bfc60	suppress some cves and fix javadoc build when using java 17 (#14241 )	2023-05-10 15:47:10 -07:00
Clint Wylie	625c4745b1	add context flag "useAutoColumnSchemas" to use new auto types for MSQ segment generation (#14175 )	2023-05-10 15:37:14 -07:00
George Shiqi Wu	161d12eb44	Fix unit tests for java 17 (#14207 ) Fix a unit test that fails in java 17	2023-05-09 20:02:31 +05:30
Kashif Faraz	bd0080c4ce	Update default values in docs (#14233 )	2023-05-09 19:13:51 +05:30
Shingo Kitagawa	152e9375e2	update documentation about multiValueHandling (#14197 ) * update documentation about multiValueHandling * Update docs/ingestion/ingestion-spec.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/ingestion/ingestion-spec.md Co-authored-by: Gian Merlino <gianmerlino@gmail.com> * fix spelling --------- Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Gian Merlino <gianmerlino@gmail.com>	2023-05-08 16:16:54 -07:00
Clint Wylie	8805d8d7db	fix issues with filtering nulls on values coerced to numeric types (#14139 ) * fix issues with filtering nulls on values coerced to numeric types * fix issues with 'auto' type numeric columns in default value mode * optimize variant typed columns without nested data * more tests for 'auto' type column ingestion	2023-05-08 13:19:02 -07:00
Vadim Ogievetsky	0a3889b192	account for auto allowing for leading and trailing spaces (#14224 )	2023-05-08 13:18:31 -07:00
minseok	3c62c00d4c	Fix Typos in DruidToGraphiteEventConverter (#14219 )	2023-05-08 17:46:32 +05:30
Clint Wylie	a7a4bfd331	modify QueryScheduler to lazily acquire lanes when executing queries to avoid leaks (#14184 ) This PR fixes an issue that could occur if druid.query.scheduler.numThreads is configured and any exception occurs after QueryScheduler.run has been called to create a Sequence. This would result in total and/or lane specific locks being acquired, but because the sequence was not actually being evaluated, the "baggage" which typically releases these locks was not being executed. An example of how this can happen is if a group-by having filter, which wraps and transforms this sequence happens to explode while wrapping the sequence. The end result is that the locks are acquired, but never released, eventually halting the ability to execute any queries.	2023-05-08 11:42:05 +05:30
Rohan Garg	4d8feeb279	Fix planning in CASE expressions with complex WHEN and ELSE expressions (#14220 )	2023-05-08 11:35:04 +05:30
George Shiqi Wu	eed5f4f291	Add labels to k8s jobs for the PodTemplateTaskAdapter (#14205 ) * Add labels * Add prefix * remove newline * fix syntax * Update prefix	2023-05-08 10:56:52 +08:00
Adarsh Sanjeev	fb38085ddb	Add wait for worker shutdown to MSQ task cancel (#14198 ) * Add wait for worker shutdown to MSQ task cancel * Fix checkstyle	2023-05-05 16:29:59 -07:00
Churro	123c4908c8	Ephemeral storage is respected from the overlord for peon tasks (#14201 )	2023-05-05 16:27:29 -07:00
Vadim Ogievetsky	4c15e978f1	Web console: misc bug fixes (#14216 ) * fixing little things * clear edit columns when switching to SQL tab * updated snapshots	2023-05-05 15:45:19 -07:00
Abhishek Radhakrishnan	6ca3fb9b08	Remove the redundant ISO-8601 text in the readme. (#14210 )	2023-05-05 11:27:29 -07:00
Abhishek Radhakrishnan	46dabab36d	Fix NPE in test parse exception report. Add more tests with different thresholds. (#14209 )	2023-05-05 10:05:41 -07:00
Clint Wylie	01e88848ce	restore .idea/misc.xml to see if it fixes intellij inspection ci (#14208 )	2023-05-05 11:47:16 +05:30
zachjsh	48cde236c4	Add columnMappings to explain plan output (#14187 ) * Add columnMappings to explain plan output * * fix checkstyle * add tests * * improve test coverage * * temporarily remove unit-test need to run ITs * * depend on build * * temporarily lower unit test threshold * * add back dependency on unit-tests * * add license headers * * fix header order * * review comments * * fix intellij inspection errors * * revert code coverage change	2023-05-04 10:36:28 -07:00
Abhishek Agarwal	edfd46ed45	Better actionable error message when druid services are not running (#14202 ) We have seen that the first-time users often don't know the next steps if druid services are unresponsive for some reason. This PR makes some of those messages a bit more clear.	2023-05-04 18:03:59 +05:30
Abhishek Radhakrishnan	68f908e511	Fix uncaught `ParseException` when reading Avro from Kafka (#14183 ) In StreamChunkParser#parseWithInputFormat, we call byteEntityReader.read() without handling a potential ParseException, which is thrown during this function call by the delegate AvroStreamReader#intermediateRowIterator. A ParseException can be thrown if an Avro stream has corrupt data, data that doesn't conform to the specified schema, or for other decoding reasons. This exception, if uncaught, can cause ingestion to fail.	2023-05-04 12:35:36 +05:30
Abhishek Radhakrishnan	954f3917ef	Add check for required avroBytesDecoder property that otherwise causes NPE. (#14177 )	2023-05-03 09:53:58 -07:00
AmatyaAvadhanula	ac7181bbda	Persist supervisor spec only after successful start (#14150 ) * Persist spec after successful start * Fix checkstyle. * checkstyle after mvn install	2023-05-03 18:27:39 +05:30
Vadim Ogievetsky	ad93635e45	Web console: allow stringly schemas in the data loader (#14189 ) * allow stringly schemas * fix copy * feedback fixes * feedback * fix copy * add warning * indicate submitting * Update web-console/src/views/load-data-view/load-data-view.tsx Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com> * feedback fix * copy fix --------- Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>	2023-05-02 23:13:21 -07:00
Karan Kumar	6f0cdd0c3f	`TaskStartTimeoutFault` now depends on the last successful worker launch time. (#14172 ) * `TaskStartTimeoutFault` now depends on the last successful worker launch time.	2023-05-03 00:05:15 +05:30
Laksh Singla	387e682fbc	Fix memory calculations for WorkerMemoryParameters for machines with relatively less heap space (#14117 ) * update worker memory parameters	2023-05-02 09:24:56 +05:30
Karan Kumar	078d5ac590	Preference to first worker error in-case job fails with `TooManyAttemptsForWorker` (#14170 )	2023-05-01 14:47:11 +05:30
Clint Wylie	90ea192d9c	fix bugs with auto encoded long vector deserializers (#14186 ) This PR fixes an issue when using 'auto' encoded LONG typed columns and the 'vectorized' query engine. These columns use a delta based bit-packing mechanism, and errors in the vectorized reader would cause it to incorrectly read column values for some bit sizes (1 through 32 bits). This is a regression caused by #11004, which added the optimized readers to improve performance, so impacts Druid versions 0.22.0+. While writing the test I finally got sad enough about IndexSpec not having a "builder", so I made one, and switched all the things to use it. Apologies for the noise in this bug fix PR, the only real changes are in VSizeLongSerde, and the tests that have been modified to cover the buggy behavior, VSizeLongSerdeTest and ExpressionVectorSelectorsTest. Everything else is just cleanup of IndexSpec usage.	2023-05-01 11:49:27 +05:30
Vadim Ogievetsky	32af570fb2	fix API doc formatting (#14167 )	2023-04-29 09:29:41 -07:00
Vadim Ogievetsky	f976837eaa	allow marking segments as used when the whole datasource is unused (#14185 )	2023-04-28 19:45:50 -07:00
George Shiqi Wu	d0654e2174	Register emitter (#14180 )	2023-04-27 18:32:50 -07:00
Vadim Ogievetsky	98db960794	fix task query error decode (#14174 )	2023-04-27 15:26:07 -07:00
Suneet Saldanha	84c11df980	Make LoggingEmitter more useful by using Markers (#14121 ) * Make LoggingEmitter more useful * Skip code coverage for facade classes * fix spellcheck * code review * fix dependency * logging.md * fix checkstyle * Add back jacoco version to main pom	2023-04-27 15:06:06 -07:00
Jill Osborne	d4e478c909	NVL function docs update (#14169 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-04-27 11:17:21 -07:00
Vadim Ogievetsky	fceb505833	Web console: allow __time in MSQ (#14165 ) * works in MSQ * fix spec conversion	2023-04-27 09:02:22 -07:00
Nicholas Lippis	6579c1c5b6	remove unneeded TaskLogStreamer binding override (#14176 )	2023-04-27 19:39:24 +05:30
Adarsh Sanjeev	63268a5023	Relaunch track of failed workers without work orders (#14166 ) * If a worker dies after it has finished generating results, MSQ decides to not retry it as it has no active work orders. However, since we don't keep track of it further, if it is required for a future stage, the controller hangs waiting for the worker to be ready. This PR keeps track of any workers the controller decides to not restart immediately and, while starting workers for the next stage, queues these workers for retry.	2023-04-27 19:38:05 +05:30
Adarsh Sanjeev	5aa119dfda	Add retry to opening retrying stream (#14126 ) * Add retry to opening retrying stream * Add retry to S3Entity for network issues * Fix tests and clean up code	2023-04-27 16:52:22 +05:30
Gian Merlino	42c8c84eb6	TimeBoundary: Use cursor when datasource is not a regular table. (#14151 ) * TimeBoundary: Use cursor when datasource is not a regular table. Fixes a bug where TimeBoundary could return incorrect results with INNER Join or inline data. * Addl Javadocs.	2023-04-26 17:00:13 -07:00
TSFenwick	6c99fbea92	fix typo in s3 docs. add readme to s3 module. (#14135 ) * fix typo in s3 docs. add readme to s3 module. * Update extensions-core/s3-extensions/README.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * cleanup readme for s3 extension and link to repo markdown doc instead of web docs --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-04-26 14:03:11 -07:00
robo220	5db7396c78	fix(avro-json-path-expressions): allow more complex jsonpath expressions (#14149 )	2023-04-26 14:58:11 +05:30
Vadim Ogievetsky	e4d99c3e26	set count on rule history api (#14164 )	2023-04-26 01:44:20 -07:00
Tejaswini Bandlamudi	774073b2e7	Update Hadoop3 as default build version (#14005 ) Hadoop 2 often causes red security scans on Druid distribution because of the dependencies it brings. We want to move away from Hadoop 2 and make a Hadoop 3 distribution available. Switch druid to building with Hadoop 3 by default. Druid will still be compatible with Hadoop 2 and users can build a hadoop-2 compatible distribution using the hadoop2 profile.	2023-04-26 12:52:51 +05:30
Gian Merlino	752475b799	Fix two concurrency issues with segment fetching. (#14042 ) * Fix two concurrency issues with segment fetching. 1) SegmentLocalCacheManager: Fix a concurrency issue where certain directory cleanup happened outside of directoryWriteRemoveLock. This created the possibility that segments would be deleted by one thread, while being actively downloaded by another thread. 2) TaskDataSegmentProcessor (MSQ): Fix a concurrency issue when two stages in the same process both use the same segment. For example: a self-join using distributed sort-merge. Prior to this change, the two stages could delete each others' segments. 3) ReferenceCountingResourceHolder: increment() returns a new ResourceHolder, rather than a Releaser. This allows it to be passed to callers without them having to hold on to both the original ResourceHolder and a Releaser. 4) Simplify various interfaces and implementations by using ResourceHolder instead of Pair and instead of split-up fields. * Add test. * Fix style. * Remove Releaser. * Updates from master. * Add some GuardedBys. * Use the correct GuardedBy. * Adjustments.	2023-04-25 20:49:27 -07:00
Gian Merlino	2dfb693d4c	Improved handling for zero-length intervals. (#14136 ) * Improved handling for zero-length intervals. 1) Return an empty list from VersionedIntervalTimeline.lookup when provided with an empty interval. (The logic doesn't quite work when intervals are empty, which led to #14129.) 2) Don't return zero-length intervals from JodaUtils.condenseIntervals. 3) Detect "incorrect" comparator in JodaUtils.condenseIntervals, and recreate the SortedSet if needed. (Not strictly related to the theme of this patch. Just another thing in the same file.) 4) Remove unused method JodaUtils.containOverlappingIntervals. Fixes #14129. * Fix TimewarpOperatorTest.	2023-04-25 17:12:56 -07:00
Gian Merlino	a7d4162195	Compaction: Block input specs not aligned with segmentGranularity. (#14127 ) * Compaction: Block input specs not aligned with segmentGranularity. When input intervals are not aligned with segmentGranularity, data may be overshadowed if it lies in the space between the input intervals and the output segmentGranularity. In MSQ REPLACE, this is a validation error. IMO the same behavior makes sense for compaction tasks. In case anyone was depending on the ability to compact nonaligned intervals, a configuration parameter allowNonAlignedInterval is provided. I don't expect it to be used much. * Remove unused. * ITCompactionTaskTest uses non-aligned intervals.	2023-04-25 17:06:16 -07:00

12761 Commits All Branches Search
