druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	2f9619a96f	Use OverlordClient for all Overlord RPCs. (#14581 ) * Use OverlordClient for all Overlord RPCs. Continuing the work from #12696, this patch removes HttpIndexingServiceClient and the IndexingService flavor of DruidLeaderClient completely. All remaining usages are migrated to OverlordClient. Supporting changes include: 1) Add a variety of methods to OverlordClient. 2) Update MetadataTaskStorage to skip the complete-task lookup when the caller requests zero completed tasks. This helps performance of the "get active tasks" APIs, which don't want to see complete ones. * Use less forbidden APIs. * Fixes from CI. * Add test coverage. * Two more tests. * Fix test. * Updates from CR. * Remove unthrown exceptions. * Refactor to improve testability and test coverage. * Add isNil tests. * Remove unnecessary "deserialize" methods.	2023-07-24 21:14:27 -07:00
Gian Merlino	bac5ef347c	Add ingest/input/bytes metric and Kafka consumer metrics. (#14582 ) * Add ingest/input/bytes metric and Kafka consumer metrics. New metrics: 1) ingest/input/bytes. Equivalent to processedBytes in the task reports. 2) kafka/consumer/bytesConsumed: Equivalent to the Kafka consumer metric "bytes-consumed-total". Only emitted for Kafka tasks. 3) kafka/consumer/recordsConsumed: Equivalent to the Kafka consumer metric "records-consumed-total". Only emitted for Kafka tasks. * Fix anchor. * Fix KafkaConsumerMonitor. * Interface updates. * Doc changes. * Update indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamIndexTask.java Co-authored-by: Benedict Jin <asdf2014@apache.org> --------- Co-authored-by: Benedict Jin <asdf2014@apache.org>	2023-07-20 10:56:22 +08:00
Clint Wylie	913416c669	add equality, null, and range filter (#14542 ) changes: * new filters that preserve match value typing to better handle filtering different column types * sql planner uses new filters by default in sql compatible null handling mode * remove isFilterable from column capabilities * proper handling of array filtering, add array processor to column processors * javadoc for sql test filter functions * range filter support for arrays, tons more tests, fixes * add dimension selector tests for mixed type roots * support json equality * rename semantic index maker thingys to mostly have plural names since they typically make many indexes, e.g. StringValueSetIndex -> StringValueSetIndexes * add cooler equality index maker, ValueIndexes * fix missing string utf8 index supplier * expression array comparator stuff	2023-07-18 12:15:22 -07:00
Maytas Monsereenusorn	aef221f71b	Allow multiple consoleAppender to be used in peon logging (#14521 ) * Allow multiple consoleAppender to be used in peon logging * Fix Attempted to append to non-started appender error	2023-07-17 21:29:45 -07:00
AmatyaAvadhanula	0412f40d36	Prepare master branch for next release, 28.0.0 (#14595 ) * Prepare master branch for next release, 28.0.0	2023-07-18 09:22:30 +05:30
Gian Merlino	450ecd6370	More efficient generation of ImmutableWorkerHolder from WorkerHolder. (#14546 ) * More efficient generation of ImmutableWorkerHolder from WorkerHolder. Taking the work done in #12096 a little further: 1) Applying a similar optimization to WorkerHolder (HttpRemoteTaskRunner). The original patch only helped with the ZkWorker (RemoteTaskRunner). 2) Improve the ZkWorker version somewhat by avoiding multiple iterations through the task announcements map. * Pick better names and use better logic. * Only runnable tasks. * Fix test. * Fix testBlacklistZKWorkers50Percent.	2023-07-13 07:57:16 -07:00
Gian Merlino	63ee69b4e8	Claim full support for Java 17. (#14384 ) * Claim full support for Java 17. No production code has changed, except the startup scripts. Changes: 1) Allow Java 17 without DRUID_SKIP_JAVA_CHECK. 2) Include the full list of opens and exports on both Java 11 and 17. 3) Document that Java 17 is both supported and preferred. 4) Switch some tests from Java 11 to 17 to get better coverage on the preferred version. * Doc update. * Update errorprone. * Update docker_build_containers.sh. * Update errorprone in licenses.yaml. * Add some more run-javas. * Additional run-javas. * Update errorprone. * Suppress new errorprone error. * Add exports and opens in ForkingTaskRunner for Java 11+. Test, doc changes. * Additional errorprone updates. * Update for errorprone. * Restore old formatting in LdapCredentialsValidator. * Copy bin/ too. * Fix Java 15, 17 build line in docker_build_containers.sh. * Update busybox image. * One more java command. * Fix interpolation. * IT commandline refinements. * Switch to busybox 1.34.1-glibc. * POM adjustments, build and test one IT on 17. * Additional debugging. * Fix silly thing. * Adjust command line. * Add exports and opens one more place. * Additional harmonization of strong encapsulation parameters.	2023-07-07 12:52:35 -07:00
Gian Merlino	021a01df45	RTR, HRTR: Fix incorrect maxLazyWorkers check in markLazyWorkers. (#14545 ) Recently #14532 fixed a problem when maxLazyWorkers == 0 and lazyWorkers starts out empty. Unfortunately, even after that patch, there remained a more general version of this problem when maxLazyWorkers == lazyWorkers.size(). This patch fixes it. I'm not sure if this would actually happen in production, because the provisioning strategies do try to avoid calling markWorkersLazy until previously-initiated terminations have finished. Nevertheless, it still seems like a good thing to fix.	2023-07-07 10:08:12 -07:00
Kashif Faraz	40d0dc9e0e	Use separate executor to handle task updates in TaskQueue (#14533 ) Description: `TaskQueue.notifyStatus` is often a heavy call as it performs the following operations: - Update task status in metadata DB - Update task locks in metadata DB - Request (synchronously) the task runner to shutdown the completed task - Clean up in-memory data structures This method can often be slow and can cause worker sync / task runners to slow down. Main changes: - Run task completion callbacks in a separate executor to handle task completion updates - Add new config `druid.indexer.queue.taskCompleteHandlerNumThreads` - Add metrics to monitor number of processed and queued items - There are still other paths that can invoke `notifyStatus`, but those need not be moved to the new executor as they are synchronous on purpose. Other changes: - Add new metrics `task/status/queue/count`, `task/status/handled/count` - Add `TaskCountStatsProvider.getStats()` which deprecates the other `getXXXTaskCount` methods. - Use `CoordinatorRunStats` to collect and report metrics. This class has been used as is for now but will later be renamed and repurposed to use across all Druid services.	2023-07-07 20:43:12 +05:30
Gian Merlino	1fe61bc869	ChangeRequestHttpSyncer: Don't wait 1ms when checking isInitialized(). (#14547 ) The wait doesn't seem to serve a purpose, other than causing delays when checking isInitialized() for a large number of things that have not yet been initialized.	2023-07-07 05:54:39 -07:00
Kashif Faraz	d63eff3b1b	Reduce contention in HttpRemoteTaskRunner.getKnownTasks() (#14541 )	2023-07-07 13:43:59 +05:30
Gian Merlino	037f09bef2	HttpRemoteTaskRunner: Fix markLazyWorkers for maxLazyWorkers == 0. (#14532 )	2023-07-06 11:51:04 -07:00
Kashif Faraz	87bb1b9709	Fix bug during initialization of HttpServerInventoryView (#14517 ) If a server is removed during `HttpServerInventoryView.serverInventoryInitialized`, the initialization gets stuck as this server is never synced. The method eventually times out (default 250s). Fix: Mark a server as stopped if it is removed. `serverInventoryInitialized` only waits for non-stopped servers to sync. Other changes: - Add new metrics for better debugging of slow broker/coordinator startup - `segment/serverview/sync/healthy`: whether the server view is syncing properly with a server - `segment/serverview/sync/unstableTime`: time for which sync with a server has been unstable - Clean up logging in `HttpServerInventoryView` and `ChangeRequestHttpSyncer` - Minor refactor for readability - Add utility class `Stopwatch` - Add tests and stubs	2023-07-06 13:04:53 +05:30
AmatyaAvadhanula	609833c97b	Do not emit negative lag because of stale offsets (#14292 ) The latest topic offsets are polled frequently and used to determine the lag based on the current offsets. However, when the offsets are stale (which commonly happens due to connection issues), we may see a negative lag. This PR prevents emission of metrics when the offsets are stale and at least one of the partitions has a negative lag.	2023-07-05 14:44:23 +05:30
Clint Wylie	277aaa5c57	remove druid.processing.columnCache.sizeBytes and CachingIndexed, combine string column implementations (#14500 ) * combine string column implementations changes: * generic indexed, front-coded, and auto string columns now all share the same column and index supplier implementations * remove CachingIndexed implementation, which I think is largely no longer needed by the switch of many things to directly using ByteBuffer, avoiding the cost of creating Strings * remove ColumnConfig.columnCacheSizeBytes since CachingIndexed was the only user	2023-07-02 19:37:15 -07:00
Karan Kumar	cb3a9d2b57	Adding Interactive API's for MSQ engine (#14416 ) This PR aims to expose a new API called "@path("/druid/v2/sql/statements/")" which takes the same payload as the current "/druid/v2/sql" endpoint and allows users to fetch results in an async manner.	2023-06-28 17:51:58 +05:30
Clint Wylie	31b9d5695d	Extend InitializedNullHandlingTest instead of NullHandlingTest (#14467 ) NullHandlingTest is an actual test, it shouldn't be used as a base class	2023-06-22 15:01:50 +05:30
imply-cheddar	cfd07a95b7	Errors take 3 (#14004 ) Introduce DruidException, an exception whose goal in life is to be delivered to a user. DruidException itself has javadoc on it to describe how it should be used. This commit both introduces the Exception and adjusts some of the places that are generating exceptions to generate DruidException objects instead, as a way to show how the Exception should be used. This work was a 3rd iteration on top of work that was started by Paul Rogers. I don't know if his name will survive the squash-and-merge, so I'm calling it out here and thanking him for starting on this.	2023-06-19 01:11:13 -07:00
George Shiqi Wu	64af9bfe5b	Add groupId to metrics (#14402 ) * Add group id as a dimension * Revert changes * Add to forking task runner * Add missing metrics * Fix indenting * revert metrics * Fix indentation	2023-06-16 09:28:16 -07:00
Gian Merlino	85656a467c	MSQ: Load broadcast tables on workers. (#14437 ) They were not previously loaded because supportsQueries was false. This patch sets supportsQueries to true, and clarifies in Task javadocs that supportsQueries can be true for tasks that aren't directly queryable over HTTP.	2023-06-16 12:02:20 +05:30
Clint Wylie	8454cc619a	auto columns fixes (#14422 ) changes: * auto columns no longer participate in generic 'null column' handling, this was a mistake to try to support and caused ingestion failures due to mismatched ColumnFormat, and will be replaced in the future with nested common format constant column functionality (not in this PR) * fix bugs with auto columns which contain empty objects, empty arrays, or primitive types mixed with either of these empty constructs * fix bug with bound filter when upper is null equivalent but is strict	2023-06-14 08:57:06 -07:00
Kashif Faraz	6e158704cb	Do not retry INSERT task into metadata if max_allowed_packet limit is violated (#14271 ) Changes - Add a `DruidException` which contains a user-facing error message, HTTP response code - Make `EntryExistsException` extend `DruidException` - If metadata store max_allowed_packet limit is violated while inserting a new task, throw `DruidException` with response code 400 (bad request) to prevent retries - Add `SQLMetadataConnector.isRootCausePacketTooBigException` with impl for MySQL	2023-06-10 12:15:44 +05:30
Harini Rajendran	4ff6026d30	Adding SegmentMetadataEvent and publishing them via KafkaEmitter (#14281 ) In this PR, we are enhancing KafkaEmitter, to emit metadata about published segments (SegmentMetadataEvent) into a Kafka topic. This segment metadata information that gets published into Kafka, can be used by any other downstream services to query Druid intelligently based on the segments published. The segment metadata gets published into kafka topic in json string format similar to other events.	2023-06-02 21:28:26 +05:30
Andreas Maechler	45014bd5b4	Handle all types of exceptions when initializing input source in sampler API (#14355 ) The sampler API returns a `400 bad request` response if it encounters a `SamplerException`. Otherwise, it returns a generic `500 Internal server error` response, with the message "The RuntimeException could not be mapped to a response, re-throwing to the HTTP container". This commit updates `RecordSupplierInputSource` to handle all types of exceptions instead of just `InterruptedException` and wrap them in a `SamplerException` so that the actual error is propagated back to the user.	2023-06-02 19:43:53 +05:30
zachjsh	04a82da63d	Input source security fixes (#14266 ) It was found that several supported tasks / input sources did not have implementations for the methods used by the input source security feature, causing these tasks and input sources to fail when used with this feature. This pr adds the needed missing implementations. Also securing the sampling endpoint with input source security, when enabled.	2023-06-01 16:37:19 -07:00
Rishabh Singh	2086ff88bc	Add logging for task stop operations (#14192 ) Log more details when task cannot be stopped for various reasons	2023-05-30 18:50:52 +05:30
Alexander Saydakov	4131c0df13	use the latest datasketches-java-4.0.0 (#14334 ) * use the latest datasketches-java-4.0.0 * updated versions of datasketches * adjusted expectation * fixed the expectations --------- Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>	2023-05-27 22:19:18 -07:00
Kashif Faraz	0cde3a8b52	Fix regression in batch segment allocation (#14337 ) * Improve batch segment allocation logs * Fix batch seg alloc regression * Fix logs * Fix logs * Fix tests and logs	2023-05-25 22:34:54 -07:00
AmatyaAvadhanula	e9913abbbf	Add new lock types: APPEND and REPLACE (#14258 ) * Add new lock types: APPEND and REPLACE	2023-05-14 22:38:32 -07:00
imply-cheddar	f9861808bc	Be able to load segments on Peons (#14239 ) * Be able to load segments on Peons This change introduces a new config on WorkerConfig that indicates how many bytes of each storage location to use for storage of a task. Said config is divided up amongst the locations and slots and then used to set TaskConfig.tmpStorageBytesPerTask The Peons use their local task dir and tmpStorageBytesPerTask as their StorageLocations for the SegmentManager such that they can accept broadcast segments.	2023-05-12 16:51:00 -07:00
Kashif Faraz	ba11b3d462	Refactor: Add OverlordDuty to replace OverlordHelper and align with CoordinatorDuty (#14235 ) Changes: - Replace `OverlordHelper` with `OverlordDuty` to align with `CoordinatorDuty` - Each duty has a `run()` method and defines a `Schedule` with an initial delay and period. - Update existing duties `TaskLogAutoCleaner` and `DurableStorageCleaner` - Add utility class `Configs` - Update log, error messages and javadocs - Other minor style improvements	2023-05-12 22:39:56 +05:30
AmatyaAvadhanula	47e48ee657	Remove incorrect optimization (#14246 )	2023-05-11 00:54:41 -07:00
Clint Wylie	e833a4700d	suppress hadoop3 cve that seem not applicable to us (#14252 )	2023-05-10 23:08:05 -07:00
Abhishek Radhakrishnan	46dabab36d	Fix NPE in test parse exception report. Add more tests with different thresholds. (#14209 )	2023-05-05 10:05:41 -07:00
Abhishek Radhakrishnan	68f908e511	Fix uncaught `ParseException` when reading Avro from Kafka (#14183 ) In StreamChunkParser#parseWithInputFormat, we call byteEntityReader.read() without handling a potential ParseException, which is thrown during this function call by the delegate AvroStreamReader#intermediateRowIterator. A ParseException can be thrown if an Avro stream has corrupt data or data that doesn't conform to the schema specified or for other decoding reasons. This exception, if uncaught, can cause ingestion to fail.	2023-05-04 12:35:36 +05:30
AmatyaAvadhanula	ac7181bbda	Persist supervisor spec only after successful start (#14150 ) * Persist spec after successful start * Fix checkstyle. * checkstyle after mvn install	2023-05-03 18:27:39 +05:30
Clint Wylie	90ea192d9c	fix bugs with auto encoded long vector deserializers (#14186 ) This PR fixes an issue when using 'auto' encoded LONG typed columns and the 'vectorized' query engine. These columns use a delta based bit-packing mechanism, and errors in the vectorized reader would cause it to incorrectly read column values for some bit sizes (1 through 32 bits). This is a regression caused by #11004, which added the optimized readers to improve performance, so impacts Druid versions 0.22.0+. While writing the test I finally got sad enough about IndexSpec not having a "builder", so I made one, and switched all the things to use it. Apologies for the noise in this bug fix PR, the only real changes are in VSizeLongSerde, and the tests that have been modified to cover the buggy behavior, VSizeLongSerdeTest and ExpressionVectorSelectorsTest. Everything else is just cleanup of IndexSpec usage.	2023-05-01 11:49:27 +05:30
Suneet Saldanha	84c11df980	Make LoggingEmitter more useful by using Markers (#14121 ) * Make LoggingEmitter more useful * Skip code coverage for facade classes * fix spellcheck * code review * fix dependency * logging.md * fix checkstyle * Add back jacoco version to main pom	2023-04-27 15:06:06 -07:00
Tejaswini Bandlamudi	774073b2e7	Update Hadoop3 as default build version (#14005 ) Hadoop 2 often causes red security scans on Druid distribution because of the dependencies it brings. We want to move away from Hadoop 2 and make a Hadoop 3 distribution available. Switch Druid to building with Hadoop 3 by default. Druid will still be compatible with Hadoop 2 and users can build a hadoop-2 compatible distribution using the hadoop2 profile.	2023-04-26 12:52:51 +05:30
Gian Merlino	a7d4162195	Compaction: Block input specs not aligned with segmentGranularity. (#14127 ) * Compaction: Block input specs not aligned with segmentGranularity. When input intervals are not aligned with segmentGranularity, data may be overshadowed if it lies in the space between the input intervals and the output segmentGranularity. In MSQ REPLACE, this is a validation error. IMO the same behavior makes sense for compaction tasks. In case anyone was depending on the ability to compact nonaligned intervals, a configuration parameter allowNonAlignedInterval is provided. I don't expect it to be used much. * Remove unused. * ITCompactionTaskTest uses non-aligned intervals.	2023-04-25 17:06:16 -07:00
Nicholas Lippis	9d4cc501f7	return task status reported by peon (#14040 ) * return task status reported by peon * Write TaskStatus to file in AbstractTask.cleanUp * Get TaskStatus from task log * Fix merge conflicts in AbstractTaskTest * Add unit tests for TaskLogPusher, TaskLogStreamer, NoopTaskLogs to satisfy code coverage * Add license headers * Fix style * Remove unknown exception declarations	2023-04-24 12:05:39 -07:00
TSFenwick	accd5536df	Allow for Log4J to be configured for peons but still ensure console logging is enforced (#14094 ) * Allow for Log4J to be configured for peons but still ensure console logging is enforced This change will allow for log4j to be configured for peons but require console logging is still configured for them to ensure peon logs are saved to deep storage. Also fixed the test ConsoleLoggingEnforcementTest to use a valid appender for the non console Config as the previous config was incorrect and would never return a logger. * fix checkstyle * add warning to logger when it overwrites all loggers to be console * optimize calls for altering logging config for ConsoleLoggingEnforcementConfigurationFactory add getName to the druid logger class * update docs, and error message * edit docs to be more clear * fix checkstyle issues * CI fixes - LoggerTest code coverage and fix spelling issue for logging docs	2023-04-24 10:41:56 -07:00
Clint Wylie	887f8db1b5	preserve explicitly specified dimension schema in "logical" schema of sampler response (#14144 )	2023-04-23 21:28:05 +05:30
zachjsh	04da0102cb	KillTask should return empty inputSource resources (#14106 ) ### Description This pr fixes a few bugs found with the inputSource security feature. 1. `KillUnusedSegmentsTask` previously had no definition for the `getInputSourceResources`, which caused an unsupportedOperationException to be thrown when this task type was submitted with the inputSource security feature enabled. This task type should not require any input source specific resources, so returning an empty set for this task type now. 2. Fixed a bug where when the input source type security feature is enabled, all of the input source type specific resources used were authenticated against: `{"resource": {"name": "EXTERNAL", "type": "{INPUT_SOURCE_TYPE}"}, "action": "READ"}` When they should instead be authenticated against: `{"resource": {"name": "{INPUT_SOURCE_TYPE}", "type": "EXTERNAL"}, "action": "READ"}` 3. fixed bug where supervisor tasks were not authenticated against the specific input source types used, if input source security feature was enabled.	2023-04-18 15:27:16 -04:00
Adarsh Sanjeev	a7d5c64aeb	Move MSQ temporary storage to a runtime parameter instead of being configured from query context (#14061 ) * Adds new run time parameter druid.indexer.task.tmpStorageBytesPerTask. This sets a limit for the amount of temporary storage disk space used by tasks. This limit is currently only respected by MSQ tasks. * Removes query context parameters intermediateSuperSorterStorageMaxLocalBytes and composedIntermediateSuperSorterStorageEnabled. Composed intermediate super sorter (which was enabled by composedIntermediateSuperSorterStorageEnabled) is now enabled automatically if durableShuffleStorage is set to true. intermediateSuperSorterStorageMaxLocalBytes is calculated from the limit set by the run time parameter druid.indexer.task.tmpStorageBytesPerTask.	2023-04-18 16:56:51 +05:30
Rohan Garg	086b2b8efe	Log merge and push timings for PartialGenericSegmentMergeTask (#14089 )	2023-04-18 11:51:26 +05:30
imply-cheddar	aaa6cc1883	Make the tasks run with only a single directory (#14063 ) * Make the tasks run with only a single directory There was a change that tried to get indexing to run on multiple disks It made a bunch of changes to how tasks run, effectively hiding the "safe" directory for tasks to write files into from the task code itself making it extremely difficult to do anything correctly inside of a task. This change reverts those changes inside of the tasks and makes it so that only the task runners are the ones that make decisions about which mount points should be used for storing task-related files. It adds the config druid.worker.baseTaskDirs which can be used by the task runners to know which directories they should schedule tasks inside of. The TaskConfig remains the authoritative source of configuration for where and how an individual task should be operating.	2023-04-13 00:45:02 -07:00
Clint Wylie	179e2e8108	adjust useSchemaDiscovery to also include the behavior of includeAllDimensions to support partial schema declaration without having to set two flags (#14076 )	2023-04-12 23:12:49 -07:00
Clint Wylie	9ed8beca5e	bug fixes and add support for boolean inputs to classic long dimension indexer (#14069 ) changes: * adds support for boolean inputs to the classic long dimension indexer, which plays nice with LONG being the semi official boolean type in Druid, and even nicer when druid.expressions.useStrictBooleans is set to true, since the sampler when using the new 'auto' schema when 'useSchemaDiscovery' is specified on the dimensions spec will call the type out as LONG * fix bugs with sampler response and new schema discovery stuff incorrectly using classic 'json' type for the logical schema instead of the new 'auto' type	2023-04-11 20:49:52 -07:00
Clint Wylie	1aef72aa7e	Bump up the version in pom to 27.0.0 in preparation of release (#14051 )	2023-04-10 14:56:59 +05:30
Karan Kumar	8712098301	Fixing overlord unable to become a leader when syncing the lock from metadata store. (#14038 )	2023-04-10 12:37:31 +05:30
zachjsh	5c0221375c	Allow for Input source security in native task layer (#14003 ) Fixes #13837. ### Description This change allows for input source type security in the native task layer. To enable this feature, the user must set the following property to true: `druid.auth.enableInputSourceSecurity=true` The default value for this property is false, which will continue the existing functionality of needing authorization to write to the respective datasource. When this config is enabled, the users will be required to be authorized for the following resource action, in addition to write permission on the respective datasource. `new ResourceAction(new Resource(ResourceType.EXTERNAL, {INPUT_SOURCE_TYPE}), Action.READ)` where `{INPUT_SOURCE_TYPE}` is the type of the input source being used: http, inline, s3, etc. Only tasks that provide a non-default implementation of the `getInputSourceResources` method can be submitted when config `druid.auth.enableInputSourceSecurity=true` is set. Otherwise, a 400 error will be thrown.	2023-04-06 13:13:09 -04:00
Clint Wylie	1c8a184677	add null safety checks for DiscoveryDruidNode services for more resilient http server and task views (#13930 ) * add null safety checks for DiscoveryDruidNode services for more resilient http server and task vi	2023-04-05 02:45:39 -07:00
Clint Wylie	d21babc5b8	remix nested columns (#14014 ) changes: * introduce ColumnFormat to separate physical storage format from logical type. ColumnFormat is now used instead of ColumnCapabilities to get column handlers for segment creation * introduce new 'auto' type indexer and merger which produces a new common nested format of columns, which is the next logical iteration of the nested column stuff. Essentially this is an automatic type column indexer that produces the most appropriate column for the given inputs, making either STRING, ARRAY<STRING>, LONG, ARRAY<LONG>, DOUBLE, ARRAY<DOUBLE>, or COMPLEX<json>. * revert NestedDataColumnIndexer, NestedDataColumnMerger, NestedDataColumnSerializer to their version pre #13803 behavior (v4) for backwards compatibility * fix a bug in RoaringBitmapSerdeFactory if anything actually ever wrote out an empty bitmap using toBytes and then later tried to read it (the nerve!)	2023-04-04 17:51:59 -07:00
Clint Wylie	518698a952	lower segment heap footprint and fix bug with expression type coercion (#14002 )	2023-03-31 13:53:22 -07:00
kaijianding	13ffeb50ba	should retry when failed to pause realtime task (#11515 )	2023-03-25 19:03:13 +05:30
Kashif Faraz	b7752a909c	Enable round-robin segment assignment and batch segment allocation by default (#13942 ) Changes: - Set `useRoundRobinSegmentAssignment` in coordinator dynamic config to `true` by default. - Set `batchSegmentAllocation` in `TaskLockConfig` (used in Overlord runtime properties) to `true` by default.	2023-03-22 08:20:01 +05:30
Gian Merlino	1c7a03a47b	Lower default maxRowsInMemory for realtime ingestion. (#13939 ) * Lower default maxRowsInMemory for realtime ingestion. The thinking here is that for best ingestion throughput, we want intermediate persists to be as big as possible without using up all available memory. So, we rely mainly on maxBytesInMemory. The default maxRowsInMemory (1 million) is really just a safety: in case we have a large number of very small rows, we don't want to get overwhelmed by per-row overheads. However, maximum ingestion throughput isn't necessarily the primary goal for realtime ingestion. Query performance is also important. And because query performance is not as good on the in-memory dataset, it's helpful to keep it from growing too large. 150k seems like a reasonable balance here. It means that for a typical 5 million row segment, we won't trigger more than 33 persists due to this limit, which is a reasonable number of persists. * Update tests. * Update server/src/main/java/org/apache/druid/segment/indexing/RealtimeTuningConfig.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Fix test. * Fix link. --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2023-03-21 10:36:36 -07:00
Laksh Singla	c16d9da35a	Improve documentation for tombstone generation and minor improvement (#13907 ) * As a follow up to #13893, this PR improves the comments added along with examples for the code, as well as adds handling for an edge case where the generated tombstone boundaries were overshooting the bounds of MIN_TIME (or MAX_TIME).	2023-03-10 06:59:51 +05:30
Gian Merlino	fe9d0c46d5	Improve memory efficiency of WrappedRoaringBitmap. (#13889 ) * Improve memory efficiency of WrappedRoaringBitmap. Two changes: 1) Use an int[] for sizes 4 or below. 2) Remove the boolean compressRunOnSerialization. Doesn't save much space, but it does save a little, and it isn't adding a ton of value to have it be configurable. It was originally configurable in case anything broke when enabling it, but it's been a while and nothing has broken. * Slight adjustment. * Adjust for inspection. * Updates. * Update snaps. * Update test. * Adjust test. * Fix snaps.	2023-03-09 15:48:02 -08:00
Laksh Singla	dc67296e9d	Fix for OOM in the Tombstone generating logic in MSQ (#13893 ) fix OOMs using a different logic for generating tombstones --------- Co-authored-by: Paul Rogers <paul-rogers@users.noreply.github.com>	2023-03-08 21:38:08 -08:00
Clint Wylie	68db39d08a	fix ci (#13901 ) This PR is #13899 plus spotbugs fix to fix the failures introduced by #13815	2023-03-08 16:55:47 +05:30
Nicholas Lippis	faac43eabe	Use base task dir in kubernetes task runner (#13880 ) * Use TaskConfig to get task dir in KubernetesTaskRunner * Use the first path specified in baseTaskDirPaths instead of deprecated baseTaskDirPath * Use getBaseTaskDirPaths in generate command	2023-03-07 15:30:42 -07:00
Karan Kumar	65c3954942	Adding forbidden api for Properties#get() and Properties#getOrDefault() (#13882 ) Properties#getOrDefault method does not check the default map for values where as Properties#getProperty() does.	2023-03-06 10:42:04 +05:30
Tejaswini Bandlamudi	7103cb4b9d	Removes FiniteFirehoseFactory and its implementations (#12852 ) The FiniteFirehoseFactory and InputRowParser classes were deprecated in 0.17.0 (#8823) in favor of InputSource & InputFormat. This PR removes the FiniteFirehoseFactory and all its implementations along with classes solely used by them like Fetcher (Used by PrefetchableTextFilesFirehoseFactory). Refactors classes including tests using FiniteFirehoseFactory to use InputSource instead. Removing InputRowParser may not be as trivial as many classes that aren't deprecated depends on it (with no alternatives), like EventReceiverFirehoseFactory. Hence FirehoseFactory, EventReceiverFirehoseFactory, and Firehose are marked deprecated.	2023-03-02 18:07:17 +05:30
Laksh Singla	ca68fd93a6	Generate tombstones when running MSQ's replace (#13706 ) * When running REPLACE queries, the segments which contain no data are dropped (marked as unused). This PR aims to generate tombstones in place of segments which contain no data to mark their deletion, as is the behavior with the native ingestion. This will cause InsertCannotReplaceExistingSegmentFault to be removed since it was generated if the interval to be marked unused didn't fully overlap one of the existing segments to replace.	2023-03-01 12:01:30 +05:30
Clint Wylie	1d8fff4096	sampler + type detection = bff (#13711 ) * sampler + type detection = bff * split logical and physical dimensions, tidy up	2023-02-28 04:14:30 -08:00
Abhishek Agarwal	d2dbb8b2c0	Fix infinite checkpointing between tasks and overlord (#13825 ) If the intermediate handoff period is less than the task duration and there is no new data in the input topic, task will continuously checkpoint the same offsets again and again. This PR fixes that bug by resetting the checkpoint time even when the task receives the same end offset request again.	2023-02-22 19:25:59 +05:30
Clint Wylie	08b5951cc5	merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698 ) * merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything * fix poms and license stuff * mockito is evil * allow reset of JvmUtils RuntimeInfo if tests used static injection to override	2023-02-17 14:27:41 -08:00
Paul Rogers	333196d207	Code cleanup & message improvements (#13778 ) * Misc cleanup edits Correct spacing Add type parameters Add toString() methods to formats so tests compare correctly IT doc revisions Error message edits Display UT query results when tests fail * Edit * Build fix * Build fixes	2023-02-15 15:22:54 +05:30
Suneet Saldanha	714ac07b52	Allow users to add additional metadata to ingestion metrics (#13760 ) * Allow users to add additional metadata to ingestion metrics When submitting an ingestion spec, users may pass a map of metadata in the ingestion spec config that will be added to ingestion metrics. This will make it possible for operators to tag metrics with other metadata that doesn't necessarily line up with the existing tags like taskId. Druid clusters that ingest these metrics can take advantage of the nested data columns feature to process this additional metadata. * rename to tags * docs * tests * fix test * make code cov happy * checkstyle	2023-02-08 18:07:23 -08:00
AmatyaAvadhanula	34c04daa9f	Fix infinite iteration in http sync monitoring (#13731 ) * Fix infinite iteration in http task runner * Fix infinite iteration in http server view * Add tests	2023-02-08 15:14:11 +05:30
AmatyaAvadhanula	0cf1fc3d55	Indexing on multiple disks (#13476 ) * Initial commit * Simple UTs * Parameterize tests * Parameterized tests for k8s task runner * Fix restore bug * Refactor TaskStorageDirTracker * Change CliPeon args	2023-02-08 11:31:34 +05:30
Churro	f022a9f246	When a task fails and doesn't throw an exception, report it correctly… (#13668 ) * When a task fails and doesn't throw an exception, report it correctly in mm-less druid * Removing unthrown exception from test	2023-02-02 09:04:18 -08:00
Kashif Faraz	7c188d80b8	Make batch segment allocation logs less noisy (#13725 )	2023-02-02 09:54:53 +05:30
Clint Wylie	fb26a1093d	discover nested columns when using nested column indexer for schemaless ingestion (#13672 ) * discover nested columns when using nested column indexer for schemaless * move useNestedColumnIndexerForSchemaDiscovery from AppendableIndexSpec to DimensionsSpec	2023-01-18 12:57:28 -08:00
Gian Merlino	182c4fad29	Kinesis: More robust default fetch settings. (#13539 ) * Kinesis: More robust default fetch settings. 1) Default recordsPerFetch and recordBufferSize based on available memory rather than using hardcoded numbers. For this, we need an estimate of record size. Use 10 KB for regular records and 1 MB for aggregated records. With 1 GB heaps, 2 processors per task, and nonaggregated records, recordBufferSize comes out to the same as the old default (10000), and recordsPerFetch comes out slightly lower (1250 instead of 4000). 2) Default maxRecordsPerPoll based on whether records are aggregated or not (100 if not aggregated, 1 if aggregated). Prior default was 100. 3) Default fetchThreads based on processors divided by task count on Indexers, rather than overall processor count. 4) Additionally clean up the serialized JSON a bit by adding various JsonInclude annotations. * Updates for tests. * Additional important verify.	2023-01-13 11:03:54 +05:30
Clint Wylie	b5b740bbbb	allow using nested column indexer for schema discovery (#13653 ) * single typed "root" only nested columns now mimic "regular" columns of those types * incremental index can now use nested column indexer instead of string indexer for discovered columns	2023-01-12 18:31:12 -08:00
Adarsh Sanjeev	0a486c3bcf	Update forbidden apis with fixed executor (#13633 ) * Update forbidden apis with fixed executor	2023-01-12 15:34:36 +05:30
Karan Kumar	56076d33fb	Worker retry for MSQ task (#13353 ) * Initial commit. * Fixing error message in retry exceeded exception * Cleaning up some code * Adding some test cases. * Adding java docs. * Finishing up state test cases. * Adding some more java docs and fixing spot bugs, intellij inspections * Fixing intellij inspections and added tests * Documenting error codes * Migrate current integration batch tests to equivalent MSQ tests (#13374) * Migrate current integration batch tests to equivalent MSQ tests using new IT framework * Fix build issues * Trigger Build * Adding more tests and addressing comments * fixBuildIssues * fix dependency issues * Parameterized the test and addressed comments * Addressing comments * fixing checkstyle errors * Adressing comments * Adding ITTest which kills the worker abruptly * Review comments phase one * Adding doc changes * Adjusting for single threaded execution. * Adding Sequential Merge PR state handling * Merge things * Fixing checkstyle. * Adding new context param for fault tolerance. Adding stale task handling in sketchFetcher. Adding UT's. * Merge things * Merge things * Adding parameterized tests Created separate module for faultToleranceTests * Adding missed files * Review comments and fixing tests. * Documentation things. * Fixing IT * Controller impl fix. * Fixing racy WorkerSketchFetcherTest.java exception handling. Co-authored-by: abhagraw <99210446+abhagraw@users.noreply.github.com> Co-authored-by: Karan Kumar <cryptoe@karans-mbp.lan>	2023-01-11 07:38:29 +05:30
Maytas Monsereenusorn	62a105ee65	Add context to HadoopIngestionSpec (#13624 ) * add context to HadoopIngestionSpec * fix alert	2023-01-09 14:37:02 -10:00
AmatyaAvadhanula	af05cfa78c	Fix shutdown in httpRemote task runner (#13558 ) * Fix shutdown in httpRemote task runner * Add UT	2022-12-22 14:50:04 +05:30
imply-cheddar	089d8da561	Support Framing for Window Aggregations (#13514 ) * Support Framing for Window Aggregations This adds support for framing over ROWS for window aggregations. Still not implemented as yet: 1. RANGE frames 2. Multiple different frames in the same query 3. Frames on last/first functions	2022-12-14 18:04:39 -08:00
Kashif Faraz	58a3acc2c4	Add InputStats to track bytes processed by a task (#13520 ) This commit adds a new class `InputStats` to track the total bytes processed by a task. The field `processedBytes` is published in task reports along with other row stats. Major changes: - Add class `InputStats` to track processed bytes - Add method `InputSourceReader.read(InputStats)` to read input rows while counting bytes. > Since we need to count the bytes, we could not just have a wrapper around `InputSourceReader` or `InputEntityReader` (the way `CountableInputSourceReader` does) because the `InputSourceReader` only deals with `InputRow`s and the byte information is already lost. - Classic batch: Use the new `InputSourceReader.read(inputStats)` in `AbstractBatchIndexTask` - Streaming: Increment `processedBytes` in `StreamChunkParser`. This does not use the new `InputSourceReader.read(inputStats)` method. - Extend `InputStats` with `RowIngestionMeters` so that bytes can be exposed in task reports Other changes: - Update tests to verify the value of `processedBytes` - Rename `MutableRowIngestionMeters` to `SimpleRowIngestionMeters` and remove duplicate class - Replace `CacheTestSegmentCacheManager` with `NoopSegmentCacheManager` - Refactor `KafkaIndexTaskTest` and `KinesisIndexTaskTest`	2022-12-13 18:54:42 +05:30
somu-imply	7682b0b6b1	Analysis refactor (#13501 ) Refactor DataSource to have a getAnalysis method() This removes various parts of the code where while loops and instanceof checks were being used to walk through the structure of DataSource objects in order to build a DataSourceAnalysis. Instead we just ask the DataSource for its analysis and allow the stack to rebuild whatever structure existed.	2022-12-12 17:35:44 -08:00
Gian Merlino	de5a4bafcb	Zero-copy local deep storage. (#13394 ) * Zero-copy local deep storage. This is useful for local deep storage, since it reduces disk usage and makes Historicals able to load segments instantaneously. Two changes: 1) Introduce "druid.storage.zip" parameter for local storage, which defaults to false. This changes default behavior from writing an index.zip to writing a regular directory. This is safe to do even during a rolling update, because the older code actually already handled unzipped directories being present on local deep storage. 2) In LocalDataSegmentPuller and LocalDataSegmentPusher, use hard links instead of copies when possible. (Generally this is possible when the source and destination directory are on the same filesystem.)	2022-12-12 17:28:24 -08:00
Kashif Faraz	69951273b8	Fix typo in metric name (#13521 )	2022-12-08 06:41:23 +05:30
Kashif Faraz	c7229fc787	Limit max batch size for segment allocation, add docs (#13503 ) Changes: - Limit max batch size in `SegmentAllocationQueue` to 500 - Rename `batchAllocationMaxWaitTime` to `batchAllocationWaitTime` since the actual wait time may exceed this configured value. - Replace usage of `SegmentInsertAction` in `TaskToolbox` with `SegmentTransactionalInsertAction`	2022-12-07 10:07:14 +05:30
Gian Merlino	fda0a1aadd	Set chatAsync default to true. (#13491 ) This functionality was originally added in #13354.	2022-12-05 20:53:59 -08:00
Kashif Faraz	45a8fa280c	Add SegmentAllocationQueue to batch SegmentAllocateActions (#13369 ) In a cluster with a large number of streaming tasks (~1000), SegmentAllocateActions on the overlord can often take very long intervals of time to finish thus causing spikes in the `task/action/run/time`. This may result in lag building up while a task waits for a segment to get allocated. The root causes are: - large number of metadata calls made to the segments and pending segments tables - `giant` lock held in `TaskLockbox.tryLock()` to acquire task locks and allocate segments Since the contention typically arises when several tasks of the same datasource try to allocate segments for the same interval/granularity, the allocation run times can be improved by batching the requests together. Changes - Add flags - `druid.indexer.tasklock.batchSegmentAllocation` (default `false`) - `druid.indexer.tasklock.batchAllocationMaxWaitTime` (in millis) (default `1000`) - Add methods `canPerformAsync` and `performAsync` to `TaskAction` - Submit each allocate action to a `SegmentAllocationQueue`, and add to correct batch - Process batch after `batchAllocationMaxWaitTime` - Acquire `giant` lock just once per batch in `TaskLockbox` - Reduce metadata calls by batching statements together and updating query filters - Except for batching, retain the whole behaviour (order of steps, retries, etc.) - Respond to leadership changes and fail items in queue when not leader - Emit batch and request level metrics	2022-12-05 14:00:07 +05:30
AmatyaAvadhanula	cc307e4c29	Fix needless task shutdown on leader switch (#13411 ) * Fix needless task shutdown on leader switch * Add unit test * Fix style * Fix UTs	2022-12-01 18:31:08 +05:30
Tejaswini Bandlamudi	b091b32f21	Fixes reindexing bug with filter on long column (#13386 ) * fixes BlockLayoutColumnarLongs close method to nullify internal buffer. * fixes other BlockLayoutColumnar supplier close methods to nullify internal buffers. * fix spotbugs	2022-11-25 19:22:48 +05:30
Kashif Faraz	7cf761cee4	Prepare master branch for next release, 26.0.0 (#13401 ) * Prepare master branch for next release, 26.0.0 * Use docker image for druid 24.0.1 * Fix version in druid-it-cases pom.xml	2022-11-22 15:31:01 +05:30
Gian Merlino	bfffbabb56	Async task client for SeekableStreamSupervisors. (#13354 ) Main changes: 1) Convert SeekableStreamIndexTaskClient to an interface, move old code to SeekableStreamIndexTaskClientSyncImpl, and add new implementation SeekableStreamIndexTaskClientAsyncImpl that uses ServiceClient. 2) Add "chatAsync" parameter to seekable stream supervisors that causes the supervisor to use an async task client. 3) In SeekableStreamSupervisor.discoverTasks, adjust logic to avoid making blocking RPC calls in workerExec threads. 4) In SeekableStreamSupervisor generally, switch from Futures.successfulAsList to FutureUtils.coalesce, so we can better capture the errors that occurred with contacting individual tasks. Other, related changes: 1) Add ServiceRetryPolicy.retryNotAvailable, which controls whether ServiceClient retries unavailable services. Useful since we do not want to retry calls unavailable tasks within the service client. (The supervisor does its own higher-level retries.) 2) Add FutureUtils.transformAsync, a more lambda friendly version of Futures.transform(f, AsyncFunction). 3) Add FutureUtils.coalesce. Similar to Futures.successfulAsList, but returns Either instead of using null on error. 4) Add JacksonUtils.readValue overloads for JavaType and TypeReference.	2022-11-21 19:20:26 +05:30
Gian Merlino	b8ca03d283	SeekableStreamSupervisor: Unique type name for GracefulShutdownNotice. (#13399 ) Allows GracefulShutdownNotice to be differentiated from ShutdownNotice.	2022-11-21 19:10:14 +05:30
AmatyaAvadhanula	de566eb0db	Fix shared lock acquisition criteria (#13390 ) Currently, a shared lock is acquired only when all other locks are also shared locks. This commit updates the behaviour and acquires a shared lock only if all locks of equal or higher priority are either shared locks or are already revoked. The lock type of locks with lower priority does not matter as they can be revoked.	2022-11-21 15:31:38 +05:30
Gian Merlino	c61313f4c4	Quieter streaming supervisors. (#13392 ) Eliminates two common sources of noise with Kafka supervisors that have large numbers of tasks and partitions: 1) Log the report at DEBUG rather than INFO level at each run cycle. It can get quite large, and can be retrieved via API when needed. 2) Use log4j2.xml to quiet down the org.apache.kafka.clients.consumer.internals package. Avoids a log message per-partition per-minute as part of seeking to the latest offset in the reporting thread. In the tasks, where this sort of logging might be more useful, we have another log message with the same information: "Seeking partition[%s] to[%s]".	2022-11-20 23:53:17 -08:00
Rohan Garg	6ccf31490e	Allow injection of node-role set to all non base modules (#13371 )	2022-11-18 12:12:03 +05:30
Gian Merlino	e78f648023	SeekableStreamSupervisor: Don't enqueue duplicate notices. (#13334 ) * SeekableStreamSupervisor: Don't enqueue duplicate notices. Similar goal to #12018, but more aggressive. Don't enqueue a notice at all if it is equal to one currently in the queue. * Adjustments from review. * Update indexing-service/src/test/java/org/apache/druid/indexing/overlord/supervisor/NoticesQueueTest.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-11-11 01:54:01 -08:00
Gian Merlino	77478f25fb	Add taskActionType dimension to task/action/run/time. (#13333 ) * Add taskActionType dimension to task/action/run/time. * Spelling.	2022-11-11 12:00:08 +05:30
AmatyaAvadhanula	fb23e38aa7	Fix messageGap emission (#13346 ) * Fix messageGap emission * Do not emit messageGap after stopping reading events * Refactoring * Fix tests	2022-11-10 17:50:19 +05:30
Clint Wylie	44f29030dd	fix flaky RemoteTaskRunnerTest.testRunPendingTaskFailToAssignTask with ugly Thread.sleep (#13344 )	2022-11-10 14:28:53 +05:30
AmatyaAvadhanula	0512ae4922	Optimize metadata calls in SeekableStreamSupervisor (#13328 ) * Optimize metadata calls * Modify isTaskCurrent * Fix tests * Refactoring	2022-11-10 07:22:51 +05:30
AmatyaAvadhanula	a2013e6566	Enhance streaming ingestion metrics (#13331 ) Changes: - Add a metric for partition-wise kafka/kinesis lag for streaming ingestion. - Emit lag metrics for streaming ingestion when supervisor is not suspended and state is in {RUNNING, IDLE, UNHEALTHY_TASKS, UNHEALTHY_SUPERVISOR} - Document metrics	2022-11-09 23:44:15 +05:30
Tejaswini Bandlamudi	594545da55	Adds cluster level idleConfig setting for supervisor (#13311 ) * adds cluster level idleConfig * updates docs * refactoring * spelling nit * nit * nit * refactoring	2022-11-08 14:54:14 +05:30
AmatyaAvadhanula	a738ac9ad7	Improve task pause logging and metrics for streaming ingestion (#13313 ) * Improve task pause logging and metrics for streaming ingestion * Add metrics doc * Fix spelling	2022-11-07 21:33:54 +05:30
AmatyaAvadhanula	650840ddaf	Add segment handoff time metric (#13238 ) * Add segment handoff time metric * Remove monitors on scheduler stop * Add warning log for slow handoff * Remove monitor when scheduler stops	2022-11-07 17:49:10 +05:30
Gian Merlino	227b57dd8e	Compaction: Fetch segments one at a time on main task; skip when possible. (#13280 ) * Compaction: Fetch segments one at a time on main task; skip when possible. Compact tasks include the ability to fetch existing segments and determine reasonable defaults for granularitySpec, dimensionsSpec, and metricsSpec. This is a useful feature that makes compact tasks work well even when the user running the compaction does not have a clear idea of what they want the compacted segments to be like. However, this comes at a cost: it takes time, and disk space, to do all of these fetches. This patch improves the situation in two ways: 1) When segments do need to be fetched, download them one at a time and delete them when we're done. This still takes time, but minimizes the required disk space. 2) Don't fetch segments on the main compact task when they aren't needed. If the user provides a full granularitySpec, dimensionsSpec, and metricsSpec, we can skip it. * Adjustments. * Changes from code review. * Fix logic for determining rollup.	2022-11-07 14:50:14 +05:30
Jonathan Wei	2fdaa2fcab	Make RecordSupplierInputSource respect sampler timeout when stream is empty (#13296 ) * Make RecordSupplierInputSource respect sampler timeout when stream is empty * Rename timeout param, make it nullable, add timeout test	2022-11-03 17:45:35 -05:00
Dr. Sizzles	e5ad24ff9f	Support for middle manager less druid, tasks launch as k8s jobs (#13156 ) * Support for middle manager less druid, tasks launch as k8s jobs * Fixing forking task runner test * Test cleanup, dependency cleanup, intellij inspections cleanup * Changes per PR review Add configuration option to disable http/https proxy for the k8s client Update the docs to provide more detail about sidecar support * Removing un-needed log lines * Small changes per PR review * Upon task completion we callback to the overlord to update the status / locaiton, for slower k8s clusters, this reduces locking time significantly * Merge conflict fix * Fixing tests and docs * update tiny-cluster.yaml changed `enableTaskLevelLogPush` to `encapsulatedTask` * Apply suggestions from code review Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * Minor changes per PR request * Cleanup, adding test to AbstractTask * Add comment in peon.sh * Bumping code coverage * More tests to make code coverage happy * Doh a duplicate dependnecy * Integration test setup is weird for k8s, will do this in a different PR * Reverting back all integration test changes, will do in anotbher PR * use StringUtils.base64 instead of Base64 * Jdk is nasty, if i compress in jdk 11 in jdk 17 the decompressed result is different Co-authored-by: Rahul Gidwani <r_gidwani@apple.com> Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2022-11-02 19:44:47 -07:00
Kashif Faraz	fd7864ae33	Improve run time of coordinator duty MarkAsUnusedOvershadowedSegments (#13287 ) In clusters with a large number of segments, the duty `MarkAsUnusedOvershadowedSegments` can take a very long time to finish. This is because of the costly invocation of `timeline.isOvershadowed` which is done for every used segment in every coordinator run. Changes - Use `DataSourceSnapshot.getOvershadowedSegments` to get all overshadowed segments - Iterate over this set instead of all used segments to identify segments that can be marked as unused - Mark segments as unused in the DB in batches rather than one at a time - Refactor: Add class `SegmentTimeline` for ease of use and readability while using a `VersionedIntervalTimeline` of segments.	2022-11-01 20:19:52 +05:30
AmatyaAvadhanula	e1ff3ca289	Resume streaming tasks on Overlord switch (#13223 ) * Resume streaming tasks on Overlord switch * Refactoring and better messages * Better docs * Add unit test * Fix tests' setup * Update indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Better logs * Fix test again Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-10-29 09:38:49 +05:30
Gian Merlino	5429b9d764	RTR: Dedupe items in getKnownTasks. (#13273 ) Fixes a problem where the tasks API in OverlordResource would complain about duplicate keys in the map it's building.	2022-10-28 08:31:26 -07:00
AmatyaAvadhanula	9cbda66d96	Remove skip ignorable shards (#13221 ) * Revert "Improve kinesis task assignment after resharding (#12235)" This reverts commit `1ec57cb935`.	2022-10-28 16:19:01 +05:30
Gian Merlino	d98c808d3f	Remove basePersistDirectory from tuning configs. (#13040 ) * Remove basePersistDirectory from tuning configs. Since the removal of CliRealtime, it serves no purpose, since it is always overridden in production using withBasePersistDirectory given some subdirectory of the task work directory. Removing this from the tuning config has a benefit beyond removing no-longer-needed logic: it also avoids the side effect of empty "druid-realtime-persist" directories getting created in the systemwide temp directory. * Test adjustments to appropriately set basePersistDirectory. * Remove unused import. * Fix RATC constructor.	2022-10-21 17:25:36 -07:00
AmatyaAvadhanula	b88e1c21ea	Fix Overlord leader election when task lock re-acquisition fails (#13172 ) Overlord leader election can sometimes fail due to task lock re-acquisition issues. This commit solves the issue by failing such tasks and clearing all their locks.	2022-10-17 15:23:16 +05:30
Tejaswini Bandlamudi	3e13584e0e	Adds Idle feature to `SeekableStreamSupervisor` for inactive stream (#13144 ) * Idle Seekable stream supervisor changes. * nit * nit * nit * Adds unit tests * Supervisor decides it's idle state instead of AutoScaler * docs update * nit * nit * docs update * Adds Kafka unit test * Adds Kafka Integration test. * Updates travis config. * Updates kafka-indexing-service dependencies. * updates previous offsets snapshot & doc * Doesn't act if supervisor is suspended. * Fixes highest current offsets fetch bug, adds new Kafka UT tests, doc changes. * Reverts Kinesis Supervisor idle behaviour changes. * nit * nit * Corrects SeekableStreamSupervisorSpec check on idle behaviour config, adds tests. * Fixes getHighestCurrentOffsets to fetch offsets of publishing tasks too * Adds Kafka Supervisor UT * Improves test coverage in druid-server * Corrects IT override config * Doc updates and Syntactic changes * nit * supervisorSpec.ioConfig.idleConfig changes	2022-10-12 18:31:08 +05:30
Kashif Faraz	b07f01d645	Set useMaxMemoryEstimates=false by default (#13178 ) A value of `false` denotes that the new flow with improved estimates will be used.	2022-10-04 15:04:23 +05:30
Kashif Faraz	ce5f55e5ce	Fix over-replication caused by balancing when inventory is not updated yet (#13114 ) * Add coordinator test framework * Remove outdated changes * Add more tests * Add option to auto-sync inventory * Minor cleanup * Fix inspections * Add README for simulations, add SegmentLoadingNegativeTest * Fix over-replication from balancing * Fix README * Cleanup unnecessary fields from DruidCoordinator * Add a test * Fix DruidCoordinatorTest * Remove unused import * Fix CuratorDruidCoordinatorTest * Remove test log4j2.xml	2022-09-29 12:06:23 +05:30
Jonathan Wei	1f1fced6d4	Add JsonInputFormat option to assume newline delimited JSON, improve parse exception handling for multiline JSON (#13089 ) * Add JsonInputFormat option to assume newline delimited JSON, improve handling for non-NDJSON * Fix serde and docs * Add PR comment check	2022-09-26 19:51:04 -05:00
Laksh Singla	0bfa81b7df	Fix the Injector creation in HadoopTask (#13138 ) * Injector fix in HadoopTask * Log the ExtensionsConfig while instantiating the HadoopTask * Log the config in the run() method instead of the ctor	2022-09-24 10:38:25 +05:30
Jonathan Wei	331e6d707b	Add KafkaConfigOverrides extension point (#13122 ) * Add KafkaConfigOverrides extension point * X	2022-09-21 11:47:19 +05:30
Gian Merlino	2e729170cc	Kill task: Don't include markAsUnused unless set. (#13104 ) Cleans up the serialized JSON.	2022-09-17 14:03:34 -07:00
AmatyaAvadhanula	1311e85f65	Faster fix for dangling tasks upon supervisor termination (#13072 ) This commit fixes issues with delayed supervisor termination during certain transient states. Tasks can be created during supervisor termination and left behind since the cleanup may not consider these newly added tasks. #12178 added a lock for the entire process of task creation to prevent such dangling tasks. But it also introduced a deadlock scenario as follows: - An invocation of `runInternal` is in progress. - A `stop` request comes, acquires `stateChangeLock` and submit a `ShutdownNotice` - `runInternal` keeps waiting to acquire the `stateChangeLock` - `ShutdownNotice` remains stuck in the notice queue because `runInternal` is still running - After some timeout, the supervisor goes through a forced termination Fix: * `SeekableStreamSupervisor.runInternal` - do not try to acquire lock if supervisor is already stopping * `SupervisorStateManager.maybeSetState` - do not allow transitions from STOPPING state	2022-09-15 15:31:14 +05:30
Abhishek Agarwal	618757352b	Bump up the version to 25.0.0 (#12975 ) * Bump up the version to 25.0.0 * Fix the version in console	2022-08-29 11:27:38 +05:30
Karan Kumar	275f834b2a	Race in Task report/log streamer (#12931 ) * Fixing RACE in HTTP remote task Runner * Changes in the interface * Updating documentation * Adding test cases to SwitchingTaskLogStreamer * Adding more tests	2022-08-25 17:56:01 -07:00
Clint Wylie	8ee8786d3c	add maxBytesInMemory and maxClientResponseBytes to SamplerConfig (#12947 ) * add maxBytesInMemory and maxClientResponseBytes to SamplerConfig	2022-08-25 00:50:41 -07:00
Gian Merlino	35aaaa9573	Fix serialization in TaskReportFileWriters. (#12938 ) * Fix serialization in TaskReportFileWriters. For some reason, serializing a Map<String, TaskReport> would omit the "type" field. Explicitly sending each value through the ObjectMapper fixes this, because the type information does not get lost. * Fixes for static analysis.	2022-08-24 08:11:01 -07:00
Adarsh Sanjeev	3b58a01c7c	Correct spelling in messages and variable names. (#12932 )	2022-08-24 11:06:31 +05:30
Gian Merlino	d7d15ba51f	Add druid-multi-stage-query extension. (#12918 ) * Add druid-multi-stage-query extension. * Adjustments from CI. * Task ID validation. * Various changes from code review. * Remove unnecessary code. * LGTM-related.	2022-08-23 18:44:01 -07:00
Karan Kumar	a3a9c5f409	Fixing overlord issued too many redirects (#12908 ) * Fixing race in overlord redirects where the node was redirecting to itself * Fixing test cases	2022-08-17 18:27:39 +05:30
Abhishek Agarwal	adbebc174a	Fix flaky tests in SeekableStreamSupervisorStateTest (#12875 ) * Fix flaky test in SeekableStreamSupervisorStateTest * Fix for flaky security IT Test * fix tests * retry queries if there is some flakiness	2022-08-16 18:38:03 +05:30
Gian Merlino	28836dfa71	Fix race in TaskQueue.notifyStatus. (#12901 ) * Fix race in TaskQueue.notifyStatus. It was possible for manageInternal to relaunch a task while it was being cleaned up, due to a race that happens when notifyStatus is called to clean up a successful task: 1) In a critical section, notifyStatus removes the task from "tasks". 2) Outside a critical section, notifyStatus calls taskRunner.shutdown to let the task runner know it can clear out its data structures. 3) In a critical section, syncFromStorage adds the task back to "tasks", because it is still present in metadata storage. 4) In a critical section, manageInternalCritical notices that the task is in "tasks" and is not running in the taskRunner, so it launches it again. 5) In a (different) critical section, notifyStatus updates the metadata store to set the task status to SUCCESS. 6) The task continues running even though it should not be. The possibility for this race was introduced in #12099, which shrunk the critical section in notifyStatus. Prior to that patch, a single critical section encompassed (1), (2), and (5), so the ordering above was not possible. This patch does the following: 1) Fixes the race by adding a recentlyCompletedTasks set that prevents the main management loop from doing anything with tasks that are currently being cleaned up. 2) Switches the order of the critical sections in notifyStatus, so metadata store updates happen first. This is useful in case of server failures: it ensures that if the Overlord fails in the midst of notifyStatus, then completed-task statuses are still available in ZK or on MMs for the next Overlord. (Those are cleaned up by taskRunner.shutdown, which formerly ran first.) This isn't related to the race described above, but is fixed opportunistically as part of the same patch. 3) Changes the "tasks" list to a map. Many operations require retrieval or removal of individual tasks; those are now O(1) instead of O(N) in the number of running tasks. 4) Changes various log messages to use task ID instead of full task payload, to make the logs more readable. * Fix format string. * Update comment.	2022-08-14 23:34:36 -07:00
Herb Brewer	9f8982a9a6	fix(druid-indexing): failed to get shardSpec for interval issue (#12573 )	2022-08-05 17:57:36 -07:00
AmatyaAvadhanula	d294404924	Kinesis ingestion with empty shards (#12792 ) Kinesis ingestion requires all shards to have at least 1 record at the required position in druid. Even if this is satisfied initially, resharding the stream can lead to empty intermediate shards. A significant delay in writing to newly created shards was also problematic. Kinesis shard sequence numbers are big integers. Introduce two more custom sequence tokens UNREAD_TRIM_HORIZON and UNREAD_LATEST to indicate that a shard has not been read from and that it needs to be read from the start or the end respectively. These values can be used to avoid the need to read at least one record to obtain a sequence number for ingesting a newly discovered shard. If a record cannot be obtained immediately, use a marker to obtain the relevant shardIterator and use this shardIterator to obtain a valid sequence number. As long as a valid sequence number is not obtained, continue storing the token as the offset. These tokens (UNREAD_TRIM_HORIZON and UNREAD_LATEST) are logically ordered to be earlier than any valid sequence number. However, the ordering requires a few subtle changes to the existing mechanism for record sequence validation: The sequence availability check ensures that the current offset is before the earliest available sequence in the shard. However, current token being an UNREAD token indicates that any sequence number in the shard is valid (despite the ordering). Kinesis sequence numbers are inclusive, i.e., if current sequence == end sequence, there are more records left to read. However, the equality check is exclusive when dealing with UNREAD tokens.	2022-08-05 22:38:58 +05:30
Paul Rogers	a618458bf0	Tidy up construction of the Guice Injectors (#12816 ) * Refactor Guice initialization Builders for various module collections Revise the extensions loader Injector builders for server startup Move Hadoop init to indexer Clean up server node role filtering Calcite test injector builder * Revisions from review comments * Build fixes * Revisions from review comments	2022-08-04 00:05:07 -07:00
刘小辉	6f5c1434b8	fix get task may be null (#12100 )	2022-08-03 09:23:48 -07:00
Maytas Monsereenusorn	5417aa2055	Fix: ParseException swallow cause Exception (#12810 ) * add impl * add impl * fix checkstyle	2022-07-22 13:46:28 -07:00
Tejaswini Bandlamudi	cc1ff56ca5	Unregisters `RealtimeMetricsMonitor`, `TaskRealtimeMetricsMonitor` on Indexers after task completion (#12743 ) Few indexing tasks register RealtimeMetricsMonitor or TaskRealtimeMetricsMonitor with the process’s MonitorScheduler when they start. These monitors never unregister themselves (they always return true, they'd need to return false to unregister). Each of these monitors emits a set of metrics once every druid.monitoring.emissionPeriod. As a result, after executing several tasks for a while, Indexer emits metrics of these tasks even after they're long gone. Proposed Solution Since one should be able to obtain the last round of ingestion metrics after the task unregisters the monitor, introducing lastRoundMetricsToBePushed variable to keep track of the same and overriding the AbstractMonitor.monitor method in RealtimeMetricsMonitor, TaskRealtimeMetricsMonitor to implement the new logic.	2022-07-18 14:34:18 +05:30
Abhishek Agarwal	2ab20c9fc9	Surface more information about task status in tests (#12759 ) I see some test runs failing because task status is not as expected. It will be helpful to know what error the task has.	2022-07-13 14:53:53 +05:30
Rohan Garg	bb953be09b	Refactor usage of JoinableFactoryWrapper + more test coverage (#12767 ) Refactor usage of JoinableFactoryWrapper to add e2e test for createSegmentMapFn with joinToFilter feature enabled	2022-07-12 06:25:36 -07:00
Gian Merlino	d2576584a0	Consolidate the two TaskStatus classes. (#12765 ) * Consolidate the two TaskStatus classes. There are two, but we don't need more than one. * Fix import order.	2022-07-11 07:25:22 -07:00
Didip Kerabat	06251c5d2a	Add EIGHT_HOUR into possible list of Granularities. (#12717 ) * Add EIGHT_HOUR into possible list of Granularities. * Add the missing definition. * fix test. * Fix another test. * Stylecheck finally passed. Co-authored-by: Didip Kerabat <didip@apple.com>	2022-07-05 11:05:37 -07:00
Gian Merlino	2b330186e2	Mid-level service client and updated high-level clients. (#12696 ) * Mid-level service client and updated high-level clients. Our servers talk to each other over HTTP. We have a low-level HTTP client (HttpClient) that is super-asynchronous and super-customizable through its handlers. It's also proven to be quite robust: we use it for Broker -> Historical communication over the wide variety of query types and workloads we support. But the low-level client has no facilities for service location or retries, which means we have a variety of high-level clients that implement these in their own ways. Some high-level clients do a better job than others. This patch adds a mid-level ServiceClient that makes it easier for high-level clients to be built correctly and harmoniously, and migrates some of the high-level logic to use ServiceClients. Main changes: 1) Add ServiceClient org.apache.druid.rpc package. That package also contains supporting stuff like ServiceLocator and RetryPolicy interfaces, and a DiscoveryServiceLocator based on DruidNodeDiscoveryProvider. 2) Add high-level OverlordClient in org.apache.druid.rpc.indexing. 3) Indexing task client creator in TaskServiceClients. It uses SpecificTaskServiceLocator to find the tasks. This improves on ClientInfoTaskProvider by caching task locations for up to 30 seconds across calls, reducing load on the Overlord. 4) Rework ParallelIndexSupervisorTaskClient to use a ServiceClient instead of extending IndexTaskClient. 5) Rework RemoteTaskActionClient to use a ServiceClient instead of DruidLeaderClient. 6) Rework LocalIntermediaryDataManager, TaskMonitor, and ParallelIndexSupervisorTask. As a result, MiddleManager, Peon, and Overlord no longer need IndexingServiceClient (which internally used DruidLeaderClient). 
There are some concrete benefits over the prior logic, namely: - DruidLeaderClient does retries in its "go" method, but only retries exactly 5 times, does not sleep between retries, and does not retry retryable HTTP codes like 502, 503, 504. (It only retries IOExceptions.) ServiceClient handles retries in a more reasonable way. - DruidLeaderClient's methods are all synchronous, whereas ServiceClient methods are asynchronous. This is used in one place so far: the SpecificTaskServiceLocator, so we don't need to block a thread trying to locate a task. It can be used in other places in the future. - HttpIndexingServiceClient does not properly handle all server errors. In some cases, it tries to parse a server error as a successful response (for example: in getTaskStatus). - IndexTaskClient currently makes an Overlord call on every task-to-task HTTP request, as a way to find where the target task is. ServiceClient, through SpecificTaskServiceLocator, caches these target locations for a period of time. * Style adjustments. * For the coverage. * Adjustments. * Better behaviors. * Fixes.	2022-07-05 09:43:26 -07:00
imply-cheddar	e3128e3fa3	Poison stupid pool (#12646 ) * Poison StupidPool and fix resource leaks There are various resource leaks from test setup as well as some corners in query processing. We poison the StupidPool to start failing tests when the leaks come and fix any issues uncovered from that so that we can start from a clean baseline. Unfortunately, because of how poisoning works, we can only fail future checkouts from the same pool, which means that there is a natural race between a leak happening -> GC occurs -> leak detected -> pool poisoned. This race means that, depending on interleaving of tests, if the very last time that an object is checked out from the pool leaks, then it won't get caught. At some point in the future, something will catch it, however and from that point on it will be deterministic. * Remove various things left over from iterations * Clean up FilterAnalysis and add javadoc on StupidPool * Revert changes to .idea/misc.xml that accidentally got pushed * Style and test branches * Stylistic woes	2022-07-03 14:36:22 -07:00
Kashif Faraz	f5b5cb93ea	Fix expiry timeout bug in LocalIntermediateDataManager (#12722 ) The expiry timeout is compared against the current time but the condition is reversed. This means that as soon as a supervisor task finishes, its partitions are cleaned up, irrespective of the specified `intermediaryPartitionTimeout` period. After these changes, the `intermediaryPartitionTimeout` will start getting honored. Changes * Fix the condition * Add tests to verify the new correct behaviour * Reduce the default expiry timeout from P1D to PT5M to retain current behaviour in case of default configs.	2022-07-01 16:29:22 +05:30
Gian Merlino	d5abd06b96	Fix flaky KafkaIndexTaskTest. (#12657 ) * Fix flaky KafkaIndexTaskTest. The testRunTransactionModeRollback case had many race conditions. Most notably, it would commit a transaction and then immediately check to see that the results were not indexed. This is racey because it relied on the indexing thread being slower than the test thread. Now, the case waits for the transaction to be processed by the indexing thread before checking the results. * Changes from review.	2022-06-24 13:53:51 -07:00
Gian Merlino	4d892483ca	Fix thread-unsafe emitter usage in SeekableStreamSupervisorStateTest. (#12658 ) The TestEmitter is used from different threads without concurrency control. This patch makes the emitter thread-safe.	2022-06-22 22:29:16 -07:00
Paul Rogers	893759de91	Remove null and empty fields from native queries (#12634 ) * Remove null and empty fields from native queries * Test fixes * Attempted IT fix. * Revisions from review comments * Build fixes resulting from changes suggested by reviews * IT fix for changed segment size	2022-06-16 14:07:25 -07:00
AmatyaAvadhanula	f970757efc	Optimize overlord GET /tasks memory usage (#12404 ) The web-console (indirectly) calls the Overlord’s GET tasks API to fetch the tasks' summary which in turn queries the metadata tasks table. This query tries to fetch several columns, including payload, of all the rows at once. This introduces a significant memory overhead and can cause unresponsiveness or overlord failure when the ingestion tab is opened multiple times (due to several parallel calls to this API) Another thing to note is that the task table (the payload column in particular) can be very large. Extracting large payloads from such tables can be very slow, leading to slow UI. While we are fixing the memory pressure in the overlord, we can also fix the slowness in UI caused by fetching large payloads from the table. Fetching large payloads also puts pressure on the metadata store as reported in the community (Metadata store query performance degrades as the tasks in druid_tasks table grows · Issue #12318 · apache/druid ) The task summaries returned as a response for the API are several times smaller and can fit comfortably in memory. So, there is an opportunity here to fix the memory usage, slow ingestion, and under-pressure metadata store by removing the need to handle large payloads in every layer we can. Of course, the solution becomes complex as we try to fix more layers. With that in mind, this page captures two approaches. They vary in complexity and also in the degree to which they fix the aforementioned problems.	2022-06-16 22:30:37 +05:30

2182 Commits