druid

Commit Graph

Author	SHA1	Message	Date
Abhishek Agarwal	3c7b237c22	Add docs for ingesting Kafka topic name (#14894 ) Add documentation on how to extract the Kafka topic name and ingest it into the data.	2023-08-24 19:19:59 +05:30
Zoltan Haindrich	54336e2a3e	Improve on incremental compilation (#14860 ) This patch fixes a few issues toward #14858 1. some phony classes were added to enable maven to track the compilation of those classes 2. cyclonedx 2.7.9 seems to handle incremental compilation better; it had a PR relating to that 3. needed to update root pom to 25 4. update antlr to 4.5.3; the older one didn't really work incrementally; 4.5.3 works much better	2023-08-24 16:06:16 +05:30
Laksh Singla	f9f734cde5	Display the output column name in InvalidNullByteException (#14780 ) This PR maps the query column to the output column name while surfacing the fault since that is readily visible to the user while executing the query.	2023-08-24 04:24:41 +00:00
Clint Wylie	36e659a501	remove group-by v1 (#14866 ) * remove group-by v1 * docs * remove unused configs, fix test * fix test * adjustments * why not * adjust * review stuff	2023-08-23 12:44:06 -07:00
zachjsh	0c76df1c7d	Enable Continuous auto kill (#14831 ) ### Description This change enables the `KillUnusedSegments` coordinator duty to be scheduled continuously. Things that prevented this, or made this difficult before were the following: 1. If scheduled at a fast enough rate, the duty would find the same intervals to kill for the same datasources, while kill tasks submitted for those same datasources and intervals were already underway, thus wasting task slots on duplicated work. 2. The task resources used by auto kill were previously unbounded. Each duty run period, if unused segments were found for any datasource, a kill task would be submitted to kill them. This PR solves both of these issues: 1. The duty keeps track of the end time of the last interval found when killing unused segments for each datasource, in an in-memory map. The end time for each datasource, if found, is used as the start time lower bound when searching for unused intervals for that same datasource. Each duty run, we remove any datasource keys from this map that no longer match datasources in the system or in the whitelist, and also remove a datasource entry if no unused segments are found for the datasource, which happens when we fail to find an interval which includes unused segments. Removing the datasource entry from the map allows for searching for unusedSegments in the datasource from the beginning of time once again 2. The unbounded task resource usage can be mitigated with coordinator dynamic config added as part of `ba957a9b97` Operators can configure continuous auto kill by providing coordinator runtime properties similar to the following: ``` druid.coordinator.period.indexingPeriod=PT60S druid.coordinator.kill.period=PT60S ``` And providing sensible limits to the killTask usage via coordinator dynamic properties.	2023-08-23 09:23:08 -04:00
Adarsh Sanjeev	dfb5a98888	Add coordinator API for unused segments (#14846 ) There is a current issue due to inconsistent metadata between worker and controller in MSQ. A controller can receive one set of segments, which are then marked as unused by, say, a compaction job. The worker would be unable to get the segment information as MetadataResource.	2023-08-23 14:51:25 +05:30
Atul Mohan	989ed8d0c2	Fix null check for JWT claims (#14872 )	2023-08-23 14:39:23 +05:30
Giulio Talarico	76e5048aab	fix supervisor spec api submission commands (#14877 )	2023-08-23 14:38:09 +05:30
Zoltan Haindrich	e806d09309	Allow EARLIEST/EARLIEST_BY/LATEST/LATEST_BY for STRING columns without specifying maxStringBytes (#14848 )	2023-08-22 22:50:19 -07:00
Clint Wylie	7b5012ea6e	override retry attempts for InputEntityIteratingReaderTest for much faster test run (#14897 )	2023-08-22 22:01:47 -07:00
Clint Wylie	fb053c399c	consolidate json and auto indexers, remove v4 nested column serializer (#14456 )	2023-08-22 18:50:11 -07:00
Soumyava	6817de9376	Doc changes for avatica transparent reconnection (#14896 )	2023-08-22 11:58:17 -07:00
Zoltan Haindrich	b9a33949fd	Fix aggregation filter expression processing in the absence of projection (#14893 ) * test * fix * add 33 test * crap * Revert "crap" This reverts commit `2751198deb`. * cleanup test * cleanup * rename test	2023-08-22 10:17:14 -07:00
Kashif Faraz	9376d8d6e1	Refactor: Move `UpdateCoordinatorStateAndPrepareCluster` duty out of `DruidCoordinator` (#14845 ) Motivation: - Clean up `DruidCoordinator` and move methods to classes where they are most relevant Changes: - No functional change - Add duty `PrepareBalancerAndLoadQueues` to replace `UpdateCoordinatorState` - Move map of `LoadQueuePeon` from `DruidCoordinator` to `LoadQueueTaskMaster` - Make `BalancerStrategyFactory` an abstract class and keep the balancer executor here - Move reporting of used segment stats and historical capacity stats from `CollectSegmentAndServerStats` to `PrepareBalancerAndLoadQueues` - Move reporting of unavailable and under-replicated segment stats from `CollectSegmentAndServerStats` to `UpdateReplicationStatus` duty	2023-08-22 19:50:41 +05:30
Zoltan Haindrich	14c1aff150	Fix error messages relating to OVERWRITE keyword (#14870 ) OVERWRITE should not be a fully reserved keyword	2023-08-22 16:17:49 +05:30
AmatyaAvadhanula	bd505062de	Improve streaming ingestion completion timeout error message (#14636 ) * Improve streaming ingestion completion timeout error message Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Benedict Jin <asdf2014@apache.org>	2023-08-22 14:33:28 +05:30
Clint Wylie	194a9c9abc	set druid.expressions.useStrictBooleans to true by default (#14734 )	2023-08-22 00:19:56 -07:00
Tejaswini Bandlamudi	d87056e708	Upgrade guava version to 31.1-jre (#14767 ) Currently, Druid is using Guava 16.0.1 version. This upgrade to 31.1-jre fixes the following issues. CVE-2018-10237 (Unbounded memory allocation in Google Guava 11.0 through 24.x before 24.1.1 allows remote attackers to conduct denial of service attacks against servers that depend on this library and deserialize attacker-provided data because the AtomicDoubleArray class (when serialized with Java serialization) and the CompoundOrdering class (when serialized with GWT serialization) perform eager allocation without appropriate checks on what a client has sent and whether the data size is reasonable). We don't use Java or GWT serializations. Despite being false positive they're causing red security scans on Druid distribution. Latest version of google-client-api is incompatible with the existing Guava version. This PR unblocks Update google client apis to latest version #14414	2023-08-22 12:09:53 +05:30
Benedict Jin	18f7cb6926	Fixed broken URL of python api tutorial (#14881 )	2023-08-22 09:53:41 +05:30
Clint Wylie	5d1412949e	enable sql compatible null handling mode by default (#14792 ) * enable sql compatible null handling mode by default * fix bug with string first/last aggs when druid.generic.useDefaultValueForNull=false	2023-08-21 20:07:13 -07:00
Katya Macedo	5f74ef56f1	Clean up Kafka supervisor topic (#14651 ) Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-08-21 11:55:38 -07:00
Nhi Pham	9fe7c01c16	Automatic compaction API documentation refactor (#14740 ) Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-08-21 11:34:41 -07:00
Peter Marshall	0dfd99e381	202307-notebook-unionall (#14726 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-08-21 10:55:58 -07:00
Vadim Ogievetsky	631dc3b589	add Kafka topic column controls (#14865 )	2023-08-21 21:33:23 +05:30
Abhishek Agarwal	a38b4f0491	Add topic name as a column in the Kafka Input format (#14857 ) This PR adds a way to store the topic name in a column. Such a column can be used to distinguish messages coming from different topics in multi-topic ingestion.	2023-08-21 21:32:34 +05:30
Kashif Faraz	92906059d2	Remove segmentsToBeDropped from SegmentTransactionInsertAction (#14883 ) Motivation: - There is no usage of the `SegmentTransactionInsertAction` which passes a non-null non-empty value of `segmentsToBeDropped`. - This is not really needed either as overshadowed segments are marked as unused by the Coordinator and need not be done in the same transaction as committing segments. - It will also help simplify the changes being made in #14407 Changes: - Remove `segmentsToBeDropped` from the task action and all intermediate methods - Remove related tests which are not needed anymore	2023-08-21 20:08:56 +05:30
Kashif Faraz	c211dcc4b3	Clean up compaction logs on coordinator (#14875 ) Changes: - Move logic of `NewestSegmentFirstIterator.needsCompaction` to `CompactionStatus` to improve testability and readability - Capture the list of checks performed to determine if compaction is needed in a readable manner in `CompactionStatus.CHECKS` - Make `CompactionSegmentIterator` iterate over instances of `SegmentsToCompact` instead of `List<DataSegment>`. This allows use of the `umbrellaInterval` later. - Replace usages of `QueueEntry` with `SegmentsToCompact` - Move `SegmentsToCompact` out of `NewestSegmentFirstIterator` - Simplify `CompactionStatistics` - Reduce level of less important logs to debug - No change made to tests to ensure correctness	2023-08-21 17:30:41 +05:30
Kashif Faraz	07a193a142	Use separate executor for each coordinator duty group (#14869 ) Changes: - Use separate executor for every duty group - This change is thread-safe as every duty group uses its own copy of `DruidCoordinatorRuntimeParams` and does not share any other mutable instances with other duty groups. - With the exception of `HistoricalManagementDuties`, duty groups are typically not very compute intensive and mostly perform database or HTTP I/O. So, coordinator resources would still mostly be available for `HistoricalManagementDuties`.	2023-08-21 15:53:22 +05:30
Abhishek Agarwal	9065ef1aff	Fix a bug in QosFilter (#14859 ) QoSFilter class is trying to parse the timeout as an integer. We need to round a value of query timeout that is higher than INT.MAX to INT.MAX.	2023-08-21 13:00:41 +05:30
317brian	263ac36e8d	docs: fix autolabeler for jupyter notebooks (#14862 )	2023-08-18 12:42:36 -07:00
Kashif Faraz	097b645005	Clean up after add kill bufferPeriod (#14868 ) Follow up changes to #12599 Changes: - Rename column `used_flag_last_updated` to `used_status_last_updated` - Remove new CLI tool `UpdateTables`. - We already have a `CreateTables` with similar functionality, which should be able to handle update cases too. - Any user running the cluster for the first time should either just have `connector.createTables` enabled or run `CreateTables` which should create tables at the latest version. - For instance, the `UpdateTables` tool would be inadequate when a new metadata table has been added to Druid, and users would have to run `CreateTables` anyway. - Remove `upgrade-prep.md` and include that info in `metadata-init.md`. - Fix log messages to adhere to Druid style - Use lambdas	2023-08-19 00:00:04 +05:30
dependabot[bot]	1e14df4c49	Bump com.ibm.icu:icu4j from 55.1 to 73.2 (#14853 ) * Bump com.ibm.icu:icu4j from 55.1 to 73.2 Bumps [com.ibm.icu:icu4j](https://github.com/unicode-org/icu) from 55.1 to 73.2. - [Release notes](https://github.com/unicode-org/icu/releases) - [Commits](https://github.com/unicode-org/icu/commits) --- updated-dependencies: - dependency-name: com.ibm.icu:icu4j dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * update licenses.yaml * update Unicode/ICU license * fix license check for unicode/icu --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xavier Léauté <xvrl@apache.org>	2023-08-18 09:10:39 -04:00
Jonathan Wei	a8eaa1e4ed	Skip streaming auto-scaling action if supervisor is idle (#14773 ) * Skip streaming auto-scaling action if supervisor is idle * Update indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com> --------- Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>	2023-08-17 19:43:25 -05:00
Lucas Capistrant	9c124f2cde	Add a configurable bufferPeriod between when a segment is marked unused and deleted by KillUnusedSegments duty (#12599 ) * Add new configurable buffer period to create gap between mark unused and kill of segment * Changes after testing * fixes and improvements * changes after initial self review * self review changes * update sql statement that was lacking last_used * shore up some code in SqlMetadataConnector after self review * fix derby compatibility and improve testing/docs * fix checkstyle violations * Fixes post merge with master * add some unit tests to improve coverage * ignore test coverage on new UpdateTools cli tool * another attempt to ignore UpdateTables in coverage check * change column name to used_flag_last_updated * fix a method signature after column name switch * update docs spelling * Update spelling dictionary * Fixing up docs/spelling and integrating altering tasks table with my alteration code * Update NULL values for used_flag_last_updated in the background * Remove logic to allow segs with null used_flag_last_updated to be killed regardless of bufferPeriod * remove unneeded things now that the new column is automatically updated * Test new background row updater method * fix broken tests * fix create table statement * cleanup DDL formatting * Revert adding columns to entry table by default * fix compilation issues after merge with master * discovered and fixed metastore inserts that were breaking integration tests * fixup forgotten insert by using pattern of sharing now timestamp across columns * fix issue introduced by merge * fixup after merge with master * add some directions to docs in the case of segment table validation issues	2023-08-17 19:32:51 -05:00
Vadim Ogievetsky	7e147ee905	Web console: Reset to specific offsets dialog (#14863 ) * add dialog * copy changes	2023-08-17 15:38:56 -07:00
Vadim Ogievetsky	59415ba9b2	Web console: expose new coordinator properties in the dialog (#14791 ) * expose new coordinator properties in the dialog * escape	2023-08-17 15:37:23 -07:00
Abhishek Radhakrishnan	37db5d9b81	Reset offsets supervisor API (#14772 ) * Add supervisor /resetOffsets API. - Add a new endpoint /druid/indexer/v1/supervisor/<supervisorId>/resetOffsets which accepts DataSourceMetadata as a body parameter. - Update logs, unit tests and docs. * Add a new interface method for backwards compatibility. * Rename * Adjust tests and javadocs. * Use CoreInjectorBuilder instead of deprecated makeInjectorWithModules * UT fix * Doc updates. * remove extraneous debugging logs. * Remove the boolean setting; only ResetHandle() and resetInternal() * Relax constraints and add a new ResetOffsetsNotice; cleanup old logic. * A separate ResetOffsetsNotice and some cleanup. * Minor cleanup * Add a check & test to verify that sequence numbers are only of type SeekableStreamEndSequenceNumbers * Add unit tests for the no op implementations for test coverage * CodeQL fix * checkstyle from merge conflict * Doc changes * DOCUSAURUS code tabs fix. Thanks, Brian!	2023-08-17 14:13:10 -07:00
dependabot[bot]	2cc3bd6383	Bump joda-time:joda-time from 2.12.4 to 2.12.5 (#14855 ) * Bump joda-time:joda-time from 2.12.4 to 2.12.5 Bumps [joda-time:joda-time](https://github.com/JodaOrg/joda-time) from 2.12.4 to 2.12.5. - [Release notes](https://github.com/JodaOrg/joda-time/releases) - [Changelog](https://github.com/JodaOrg/joda-time/blob/main/RELEASE-NOTES.txt) - [Commits](https://github.com/JodaOrg/joda-time/compare/v2.12.4...v2.12.5) --- updated-dependencies: - dependency-name: joda-time:joda-time dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> * update licenses.yaml --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xavier Léauté <xvrl@apache.org>	2023-08-17 11:24:22 -07:00
dependabot[bot]	2a7fbf2ab4	Bump org.apache.directory.api:api-util from 1.0.3 to 2.1.3 (#14852 ) Bumps [org.apache.directory.api:api-util](https://github.com/apache/directory-ldap-api) from 1.0.3 to 2.1.3. - [Commits](https://github.com/apache/directory-ldap-api/compare/1.0.3...2.1.3) --- updated-dependencies: - dependency-name: org.apache.directory.api:api-util dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-08-17 08:56:34 -07:00
Kashif Faraz	fffb2e4fe7	Speed up SQLMetadataStorageActionHandlerTest (#14856 ) Changes - Reduce test time of `SQLMetadataStorageActionHandlerTest.testMigration` - Slightly modify log messages to adhere to Druid style	2023-08-17 18:02:43 +05:30
Abhishek Agarwal	b97cc45d81	Add clarification to the docs for multi-topic Kafka ingestion (#14847 ) Follow-up to #14828. Added some more clarification about how topicPattern is used.	2023-08-17 12:52:06 +05:30
Vadim Ogievetsky	dc2ae1e99c	Web console: improving the helper queries by allowing for running inline helper queries (#14801 ) * remove helper queries * fix tests * take care of zero queries also * switch to better place	2023-08-16 23:50:43 -07:00
Kashif Faraz	5d4ac64178	Adapt maxSegmentsToMove based on cluster skew (#14584 ) Changes: - No change in behaviour if `smartSegmentLoading` is disabled - If `smartSegmentLoading` is enabled - Compute `balancerComputeThreads` based on `numUsedSegments` - Compute `maxSegmentsToMove` based on `balancerComputeThreads` - Compute `segmentsToMoveToFixSkew` based on usage skew - Compute `segmentsToMove = Math.min(maxSegmentsToMove, segmentsToMoveToFixSkew)` Limits: - 1 <= `balancerComputeThreads` <= 8 - `maxSegmentsToMove` <= 20% of total segments - `minSegmentsToMove` = 0.15% of total segments	2023-08-17 11:14:54 +05:30
Vadim Ogievetsky	cb27d0d2ed	Web console: enable Kafka multi-topic ingestion from the data loader (#14833 ) * multi topic ux * updated to match new api	2023-08-17 09:57:34 +05:30
317brian	6b4dda964d	Docusaurus2 upgrade for master (#14411 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-08-16 19:01:21 -07:00
Clint Wylie	6b14dde50e	deprecate config-magic in favor of json configuration stuff (#14695 ) * json config based processing and broker merge configs to deprecate config-magic	2023-08-16 18:23:57 -07:00
Pranav	26d82fd342	fix filtering bug in filtering unnest cols and dim cols: Received a non-applicable rewrite (#14587 )	2023-08-16 17:57:16 -07:00
Peter Marshall	f585f0a8ed	202306-docs-notebook topn (#14478 ) Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-08-16 14:50:49 -07:00
Jill Osborne	2561477e87	Jupyter nested columns tutorial (#14788 )	2023-08-16 14:45:37 -07:00
dependabot[bot]	faf79470ae	Bump io.dropwizard.metrics:metrics-graphite from 3.1.2 to 4.2.19 (#14842 ) * Bump io.dropwizard.metrics:metrics-graphite from 3.1.2 to 4.2.19 Bumps [io.dropwizard.metrics:metrics-graphite](https://github.com/dropwizard/metrics) from 3.1.2 to 4.2.19. - [Release notes](https://github.com/dropwizard/metrics/releases) - [Commits](https://github.com/dropwizard/metrics/compare/v3.1.2...v4.2.19) --- updated-dependencies: - dependency-name: io.dropwizard.metrics:metrics-graphite dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * align graphite-emitter dropwizard version with core --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xavier Léauté <xvrl@apache.org>	2023-08-16 13:58:35 -07:00
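
The QosFilter fix listed above (9065ef1aff, #14859) says that a query timeout higher than INT.MAX needs to be rounded to INT.MAX before it is handled as an integer. A minimal sketch of that kind of clamp follows. It is not the code from the commit: the class name `QueryTimeoutClamp` and the method `clampToInt` are hypothetical, and the lower-bound clamp is an extra assumption added so that the cast is safe for any input.

```java
// Illustrative sketch only; names are hypothetical and not taken from Druid.
final class QueryTimeoutClamp {
    private QueryTimeoutClamp() {
    }

    /**
     * Converts a timeout expressed as a long into an int without overflow.
     * Values above Integer.MAX_VALUE are rounded down to Integer.MAX_VALUE.
     * Values below Integer.MIN_VALUE are rounded up to Integer.MIN_VALUE
     * (an assumption of this sketch; the commit message only mentions the
     * upper bound).
     */
    static int clampToInt(long timeoutMillis) {
        long clamped = Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, timeoutMillis));
        return (int) clamped;
    }

    public static void main(String[] args) {
        // A timeout that already fits in an int is returned unchanged.
        System.out.println(clampToInt(300_000L));
        // A timeout larger than Integer.MAX_VALUE is rounded to Integer.MAX_VALUE.
        System.out.println(clampToInt(Long.MAX_VALUE));
    }
}
```

Clamping before the narrowing cast matters because a plain `(int)` cast of a large long silently wraps around rather than saturating.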

13336 Commits

All Branches