Commit Graph

14444 Commits

Author SHA1 Message Date
Zoltan Haindrich cc4e0adcbf make synch 2024-06-26 12:06:50 +00:00
Zoltan Haindrich b8992434a9 x 2024-06-26 11:13:32 +00:00
Zoltan Haindrich c559bc3422 Revert "Revert "Revert "Revert "add kttm tx""""
This reverts commit 9a4d04a818.
2024-06-26 10:08:01 +00:00
Zoltan Haindrich e7141e2080 Revert "stuff"
This reverts commit 1b7dd8fd3c.
2024-06-26 10:07:48 +00:00
Zoltan Haindrich 1b7dd8fd3c stuff 2024-06-26 10:07:47 +00:00
Laksh Singla 71b3b5ab5d
Add query context parameter to remove null bytes when writing frames (#16579)
MSQ cannot process null bytes in string fields, and the current workaround is to remove them using the REPLACE function. 'removeNullBytes' context parameter has been added which sanitizes the input string fields by removing these null bytes.
2024-06-26 15:00:30 +05:30
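A minimal sketch of the sanitization described in the commit above, written in Python rather than Druid's Java. The function name `remove_null_bytes` and the dict-based row shape are hypothetical, chosen only for illustration; they are not Druid's API, and the actual implementation may differ.

```python
def remove_null_bytes(row):
    """Return a copy of `row` with NUL characters stripped from string values.

    Hypothetical helper: it illustrates the idea behind the 'removeNullBytes'
    context parameter, not Druid's actual implementation.
    """
    return {
        # Only string fields are sanitized; other types pass through unchanged.
        key: value.replace("\x00", "") if isinstance(value, str) else value
        for key, value in row.items()
    }


row = {"page": "Dru\x00id", "added": 17}
print(remove_null_bytes(row))
```

Non-string values are left alone here, since the commit message only mentions string fields.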
Kashif Faraz d9bd02256a
Refactor: Rename UsedSegmentChecker and cleanup task actions (#16644)
Changes:
- Rename `UsedSegmentChecker` to `PublishedSegmentsRetriever`
- Remove deprecated single `Interval` argument from `RetrieveUsedSegmentsAction`
as it is now unused and has been deprecated since #1988 
- Return `Set` of segments instead of a `Collection` from `IndexerMetadataStorageCoordinator.retrieveUsedSegments()`
2024-06-26 10:48:59 +05:30
Tom 52c9929019
Column name in parse exceptions (#16529)
* first pass

* more changes

* fix tests and formatting

* fix kinesis failing tests

* fix kafka tests

* add dimension name to float parse errors

* double and convertToType handling of dimensionName can report parse errors with dimension name

* fix checkstyle issue

* fix tests

* more cases to have better parse exception messages

* fix test

* fix tests

* partially address comments

* annotate method parameter with nullable

* address comments

* fix tests

* let float, double, long dimensionIndexer pass dimensionName down to dimensionHandlerUtils

* fix compilation error and clean up formatting

* clean up whitespace

* address feedback. undo change, pass down report parse exception for convertToType

* fix test
2024-06-25 13:42:52 -07:00
Abhishek Radhakrishnan e01f155209
Add missing `delta-storage` dependency and class loader workaround to Delta table ingestion (#16648)
* Workaround to ingesting from Delta table in 3.2.0.

With the upgrade to Kernel 3.2.0, the Druid Delta connector extension
isn't able to ingest from Delta tables successfully.

Please see https://github.com/delta-io/delta/issues/3299

The underlying problem seems to be coming from
https://github.com/delta-io/delta/blob/master/kernel/kernel-defaults/src/main/java/io/delta/kernel/defaults/internal/logstore/LogStoreProvider.java#L99

This patch is a workaround to setting the thread class loader explicitly.
The Kernel community may consider a fix in the next release as it's affected another
connector as well.

* Address review comment: clear the CL after the Thread CL is set.
2024-06-25 09:16:13 -07:00
Edgar Melendrez b43f4063c5
Docs: update link and title of quickstart (#16638)
* update link and title

* Discard changes to website/package.json

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

---------

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-06-25 09:07:00 -07:00
Abhishek Radhakrishnan 2979f73e89
Fix Intellij inspection (#16651) 2024-06-25 04:32:43 -07:00
Zoltan Haindrich 1a5faf1afb more pomxml stuff 2024-06-25 08:13:31 +00:00
Zoltan Haindrich 3dfe5c4a05 add reflections 2024-06-25 07:08:01 +00:00
Zoltan Haindrich 6c02cbdf4d fixes 2024-06-25 06:38:13 +00:00
Zoltan Haindrich 0d76a73c4c remove final 2024-06-25 06:33:45 +00:00
Kashif Faraz f1043d20bc
Support csv input format in Kafka ingestion with header (#16630)
* Support ListBasedInputRow in Kafka ingestion with header
* Fix up buildBlendedEventMap
* Add new test for KafkaInputFormat with csv value and headers
* Do not use forbidden APIs
* Move utility method to TestUtils
2024-06-25 11:50:01 +05:30
Clint Wylie 37a50e6803
Remove index_realtime and index_realtime_appenderator tasks (#16602)
index_realtime tasks were removed from the documentation in #13107. Even
at that time, they weren't really documented per se— just mentioned. They
existed solely to support Tranquility, which is an obsolete ingestion
method that predates migration of Druid to ASF and is no longer being
maintained. Tranquility docs were also de-linked from the sidebars and
the other doc pages in #11134. Only a stub remains, so people with
links to the page can see that it's no longer recommended.

index_realtime_appenderator tasks existed in the code base, but were
never documented, nor as far as I am aware were they used for any purpose.

This patch removes both task types completely, as well as removes all
supporting code that was otherwise unused. It also updates the stub
doc for Tranquility to be firmer that it is not compatible. (Previously,
the stub doc said it wasn't recommended, and pointed out that it is
built against an ancient 0.9.2 version of Druid.)

ITUnionQueryTest has been migrated to the new integration tests framework and updated to use Kafka ingestion.

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2024-06-24 20:13:33 -07:00
317brian 2131917f16
docs: added front-coded dictionaries to upgrade notes (#16647)
* docs: add front-coded dictionaries to upgrade notes

* add it to release notes template
2024-06-24 10:52:26 -07:00
Abhishek Radhakrishnan 7463589b07
Support for bootstrap segments (#16609)
* Initial support for bootstrap segments.

  - Adds a new API in the coordinator.
  - All processes that have storage locations configured (including tasks)
    talk to the coordinator if they can, and fetch bootstrap segments from it.
  - Then load the segments onto the segment cache as part of startup.
  - This addresses the segment bootstrapping logic required by processes before
    they can start serving queries or ingesting.

    This patch also lays the foundation to speed up upgrades.

* Fail open by default if there are any errors talking to the coordinator.

* Add test for failure scenario and cleanup logs.

* Cleanup and add debug log

* Assert the events so we know the list exactly.

* Revert RunRules test.

The rules aren't evaluated if there are no clusters.

* Revert RunRulesTest too.

* Remove debug info.

* Make the API POST and update log.

* Fix up UTs.

* Throw 503 from MetadataResource; clean up exception handling and DruidException.

* Remove unused logger, add verification of metrics and docs.

* Update error message

* Update server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* Apply suggestions from code review

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* Adjust test metric expectations with the rename.

* Add BootstrapSegmentResponse container in the response for future extensibility.

* Rename to BootstrapSegmentsInfo for internal consistency.

* Remove unused log.

* Use a member variable for broadcast segments instead of segmentAssigner.

* Minor cleanup

* Add test for loadable bootstrap segments and clarify comment.

* Review suggestions.

---------

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2024-06-24 09:27:17 -07:00
Misha 354a3bea0b
The default `WHERE` filter for automatically generated SQL queries is returned (#16608)
* Returned the default `WHERE` filter for auto-generated SQL queries

* Checkstyle fix

---------

Co-authored-by: sviatahorau <mikhail.sviatahorau@deep.bi>
2024-06-24 08:52:35 -07:00
Sree Charan Manamala 990fd5f5fb
Make use group iterator for all window frames & support for same bound kinds (#16603)
Fixes apache/druid#15739
2024-06-24 15:52:41 +02:00
Kashif Faraz 0fe6a2af68
Fix replica task failures with metadata inconsistency while running concurrent append replace (#16614)
Changes:
- Add new task action `RetrieveSegmentsByIdAction`
- Use new task action to retrieve segments irrespective of their visibility
- During rolling upgrades, this task action would fail as Overlord would be on old version
- If new action fails, fall back to just fetching used segments as before
2024-06-24 09:56:04 +05:30
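The fallback described in the commit above can be sketched roughly as follows. This is an illustration in Python, not Druid's code: `UnsupportedActionError`, `retrieve_segments`, and the two callables are hypothetical stand-ins for the new task action, the older used-segments lookup, and whatever failure an older Overlord would produce.

```python
class UnsupportedActionError(Exception):
    """Hypothetical error raised when the server does not know an action."""


def retrieve_segments(segment_ids, fetch_by_id, fetch_used):
    """Try the newer by-ID lookup first; fall back to the older lookup."""
    try:
        return fetch_by_id(segment_ids)
    except UnsupportedActionError:
        # Server is on an older version (e.g. during a rolling upgrade):
        # fetch only used segments, as before.
        return fetch_used(segment_ids)


def old_server_fetch_by_id(segment_ids):
    # Simulates a server that predates the new action.
    raise UnsupportedActionError("unknown action")


def fetch_used(segment_ids):
    # Simulates the pre-existing lookup, which only sees used segments.
    return [s for s in segment_ids if not s.startswith("unused")]


print(retrieve_segments(["seg1", "unused_seg2"], old_server_fetch_by_id, fetch_used))
```

Only the specific "action not supported" failure triggers the fallback in this sketch; whether the real change narrows the failure that way is not stated in the commit message.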
Adarsh Sanjeev 1a883ba1f7
Fix complex columns with export (#16572)
This PR fixes a few bugs with MSQ export. The main change is calling SqlResults#coerce before writing the column. This allows sketches and json to be correctly deserialized. The format of the exported complex columns is similar to that produced by Async MSQ queries with CSV format.

Notes:

- Fix printing of complex columns during export. Sketches and JSON are now correctly formatted during export.
- Fix an NPE if the writer has not been initialized. Empty export queries will create an empty file at the location.
- Fix a bug with counters for MSQ export, where rows were reported for only the first partition.
2024-06-24 09:03:30 +05:30
Akshat Jain 641f739a47
Fix flaky test in RetryableS3OutputStreamTest (#16639)
As part of #16481, we have started uploading the chunks in parallel.
That means that it's not necessary for the part that finished uploading last
to be less than or equal to the chunkSize (as the final part could've been uploaded earlier).

This made a test in RetryableS3OutputStreamTest flaky where we were
asserting that the final part should be smaller than chunk size.

This commit fixes the test, and also adds another test where the file size
is such that all chunk sizes would be of equal size.
2024-06-24 08:13:47 +05:30
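The arithmetic behind the flaky assertion in the commit above can be sketched as follows. `part_sizes` is a hypothetical helper, not a method of RetryableS3OutputStream, and the byte counts are made up for illustration.

```python
def part_sizes(file_size, chunk_size):
    """Split `file_size` bytes into parts of at most `chunk_size` bytes."""
    full, remainder = divmod(file_size, chunk_size)
    sizes = [chunk_size] * full
    if remainder:
        # Only the last part by position can be smaller than chunk_size.
        sizes.append(remainder)
    return sizes


# The last part by position is the short one...
print(part_sizes(25, 10))
# ...but when file_size is a multiple of chunk_size, every part is full-size.
print(part_sizes(30, 10))
```

With parallel uploads, parts may finish in any order, so the part that completes last is not necessarily the last part by position. An assertion about the size of the last-completed part can therefore only hold for some completion orders.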
Laksh Singla 00c96432af
Materialize scan results correctly when columns are not present in the segments (#16619)
Fixes a bug causing maxSubqueryBytes not to work when segments have missing columns.
2024-06-23 23:15:45 +05:30
Rishabh Singh a63c12bf34
Upload tasklogs along with service logs on Standard IT failure (#16631)
* Fix build

* Push tasklogs along with service logs

* temp changes to run standard its without unit test results

* test

* minor change

* test

* test

* Update datasource name for ITSystemTableBatchIndexTaskTest

* Publish task logs

* Revert other changes

* update standard-it yaml
2024-06-22 11:45:54 +05:30
Vadim Ogievetsky 51c73b5a4e
Web console: show formatted JSON value (#16632)
* show formatted json value

* update snapshot

* window functions

* count star can also have a window

* better edit query context
2024-06-21 18:33:15 -07:00
Rishabh Singh 4eced9b3c9
Fix CentralizedDatasourceSchema group IT failure (#16636)
* Fix build

* Update datasource name in ITSystemTableBatchIndexTaskTest
2024-06-21 15:40:12 -07:00
Suneet Saldanha 4e0ea7823b
Update docs for K8s TaskRunner Dynamic Config (#16600)
* Update docs for K8s TaskRunner Dynamic Config

* touchups

* code review

* npe

* oopsies
2024-06-21 06:01:59 -07:00
Akshat Jain cd438b1918
Emit metrics for S3UploadThreadPool (#16616)
* Emit metrics for S3UploadThreadPool

* Address review comments

* Revert unnecessary formatting change

* Revert unnecessary formatting change in metrics.md file

* Address review comments

* Add metric for task duration

* Minor fix in metrics.md

* Add s3Key and uploadId in the log message

* Address review comments

* Create new instance of ServiceMetricEvent.Builder for thread safety

* Address review comments

* Address review comments
2024-06-21 11:36:47 +05:30
Zoltan Haindrich 9a4d04a818 Revert "Revert "Revert "add kttm tx"""
This reverts commit 26a16fb4fe.
2024-06-20 17:45:08 +00:00
Zoltan Haindrich 0af3b910f1 update readme 2024-06-20 17:45:01 +00:00
Adithya Chakilam 35709de549
CgroupCpuSetMonitor: Initialize the cgroup discoverer (#16621) 2024-06-20 10:23:59 -07:00
Zoltan Haindrich cefdf96a26 prep 2024-06-20 16:45:39 +00:00
Zoltan Haindrich 724212381c close stuff 2024-06-20 16:40:06 +00:00
Andreas Maechler ae70e18bc8
docs: Update Azure extension (#16585)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2024-06-20 09:31:29 -07:00
Zoltan Haindrich 26a16fb4fe Revert "Revert "add kttm tx""
This reverts commit 82f24e61f2.
2024-06-20 16:02:09 +00:00
Zoltan Haindrich 82f24e61f2 Revert "add kttm tx"
This reverts commit eda48497e2.
2024-06-20 16:01:53 +00:00
Abhishek Radhakrishnan b20c3dbadf
Fix malformed period throwing `ADMIN` persona error (#16626)
* Turn invalid periods into user-facing exception providing more context.

The current exception is targeting the ADMIN persona. Catch that and turn
it into a USER persona instead. Also, provide more context in the error
message.

* Review comment: pass the wrapping expression and stringify.

* Update processing/src/main/java/org/apache/druid/query/expression/ExprUtils.java

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

---------

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2024-06-20 08:40:28 -07:00
Zoltan Haindrich d329686d5c fix msq test 2024-06-20 15:37:13 +00:00
Zoltan Haindrich 6e48cb86d5 move annotation 2024-06-20 14:41:45 +00:00
Zoltan Haindrich ebb27cf462 add extension to disable when not sql compat 2024-06-20 14:40:44 +00:00
Zoltan Haindrich 604910cead cleanup 2024-06-20 14:27:09 +00:00
Zoltan Haindrich b2be5abdd5 fix md 2024-06-20 14:21:56 +00:00
Zoltan Haindrich eda48497e2 add kttm tx 2024-06-20 14:20:29 +00:00
Sree Charan Manamala 7ac0862287
Grouping Engine fix when a limit spec with different order by columns is applied (#16534) 2024-06-20 11:35:58 +02:00
Zoltan Haindrich 4bd8039715 fix delegate 2024-06-19 16:49:24 +00:00
Zoltan Haindrich 1a0ab2c3b1 Merge remote-tracking branch 'apache/master' into quidem-record 2024-06-19 12:59:26 +00:00
Rishabh Singh 169a8dbd1a
Disable TestValidateIncompatibleCentralizedDatasourceSchemaConfig (#16627)
* Fix build

* Ignore test
2024-06-18 17:50:46 -07:00
Maytas Monsereenusorn 44268e7fad
Pass requestBufferSize from Config to Proxy servlet (#16611) 2024-06-19 02:42:16 +07:00