druid

Commit Graph

Author	SHA1	Message	Date
Rishabh Singh	66c3cc1391	Handle unparseable SupervisorSpec in metadata store (#14382 ) Changes: - Skip a supervisor spec entry which cannot be deserialised into a `SupervisorSpec` object. - Log an error for the unparseable spec	2023-06-13 08:02:01 +05:30
Rishabh Singh	8b212e73d7	Add method to authorize native query using authentication result (#14376 )	2023-06-12 11:06:00 +05:30
Kashif Faraz	6e158704cb	Do not retry INSERT task into metadata if max_allowed_packet limit is violated (#14271 ) Changes - Add a `DruidException` which contains a user-facing error message, HTTP response code - Make `EntryExistsException` extend `DruidException` - If metadata store max_allowed_packet limit is violated while inserting a new task, throw `DruidException` with response code 400 (bad request) to prevent retries - Add `SQLMetadataConnector.isRootCausePacketTooBigException` with impl for MySQL	2023-06-10 12:15:44 +05:30
Abhishek Radhakrishnan	23c2dcaf8d	Add NullHandling module initialization for `LookupDimensionSpecTest` (#14393 )	2023-06-09 09:07:32 +05:30
Kashif Faraz	12e8fa5c97	Prevent coordinator from getting stuck if leadership changes during coordinator run (#14385 ) Changes: - Add a timeout of 1 minute to resultFuture.get() in `CostBalancerStrategy.chooseBestServer`. 1 minute is the typical time for a full coordinator run and is more than enough time for cost computations of a single segment. - Raise an alert if an exception is encountered while computing costs and if the executor has not been shut down. This is because a shutdown is intentional and does not require an alert.	2023-06-08 15:29:20 +05:30
zachjsh	04a82da63d	Input source security fixes (#14266 ) It was found that several supported tasks / input sources did not have implementations for the methods used by the input source security feature, causing these tasks and input sources to fail when used with this feature. This PR adds the missing implementations. It also secures the sampling endpoint with input source security, when enabled.	2023-06-01 16:37:19 -07:00
Kashif Faraz	d4cacebf79	Add tests for CostBalancerStrategy (#14230 ) Changes: - `CostBalancerStrategyTest` - Focus on verification of cost computations rather than choosing servers in this test - Add new tests `testComputeCost` and `testJointSegmentsCost` - Add tests to demonstrate that with a long enough interval gap, all costs become negligible - Retain `testIntervalCost` and `testIntervalCostAdditivity` - Remove redundant tests such as `testStrategyMultiThreaded`, `testStrategySingleThreaded` as verification of this behaviour is better suited to `BalancingStrategiesTest`. - `CostBalancerStrategyBenchmark` - Remove usage of static method from `CostBalancerStrategyTest` - Explicitly setup cluster and segments to use for benchmarking	2023-05-30 08:52:56 +05:30
Kashif Faraz	8091c6a547	Update default values in CoordinatorDynamicConfig (#14269 ) The defaults of the following config values in the `CoordinatorDynamicConfig` are being updated. 1. `maxSegmentsInNodeLoadingQueue = 500` (previous = 100) 2. `replicationThrottleLimit = 500` (previous = 10) Rationale: With round-robin segment assignment now being the default assignment technique, the Coordinator can assign a large number of under-replicated/unavailable segments very quickly, without getting stuck in `RunRules` duty due to very slow strategy-based cost computations. 3. `maxSegmentsToMove = 100` (previous = 5) Rationale: A very low value (say 5) is ineffective in balancing especially if there are many segments to balance. A very large value can cause excessive moves, which has these disadvantages: - Load of moving segments competing with load of unavailable/under-replicated segments - Unnecessary network costs due to constant download and delete of segments These defaults will be revisited after #13197 is merged.	2023-05-30 08:51:33 +05:30
Soumyava	22ba457d29	Expr getCacheKey now delegates to children (#14287 ) * Expr getCacheKey now delegates to children * Removed the LOOKUP_EXPR_CACHE_KEY as we do not need it * Adding a unit test * Update processing/src/main/java/org/apache/druid/math/expr/Expr.java Co-authored-by: Clint Wylie <cjwylie@gmail.com> --------- Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2023-05-23 14:49:38 -07:00
imply-cheddar	e9fed1445f	Revert PreResponseAuthorizationCheckFilter (#13813 ) Make it permissive like it used to be again so that we ensure that validation errors make it out.	2023-05-18 18:16:43 -07:00
Paul Rogers	3c0983c8e9	Extend the IT framework to allow tests in extensions (#13877 ) The "new" IT framework provides a convenient way to package and run integration tests (ITs), but only for core modules. We have a use case to run an IT for a contrib extension: the proposed gRPC query extension. This PR provides the IT framework functionality to allow non-core ITs.	2023-05-15 20:29:51 +05:30
imply-cheddar	f9861808bc	Be able to load segments on Peons (#14239 ) * Be able to load segments on Peons This change introduces a new config on WorkerConfig that indicates how many bytes of each storage location to use for storage of a task. Said config is divided up amongst the locations and slots and then used to set TaskConfig.tmpStorageBytesPerTask. The Peons use their local task dir and tmpStorageBytesPerTask as their StorageLocations for the SegmentManager such that they can accept broadcast segments.	2023-05-12 16:51:00 -07:00
Kashif Faraz	ba11b3d462	Refactor: Add OverlordDuty to replace OverlordHelper and align with CoordinatorDuty (#14235 ) Changes: - Replace `OverlordHelper` with `OverlordDuty` to align with `CoordinatorDuty` - Each duty has a `run()` method and defines a `Schedule` with an initial delay and period. - Update existing duties `TaskLogAutoCleaner` and `DurableStorageCleaner` - Add utility class `Configs` - Update log, error messages and javadocs - Other minor style improvements	2023-05-12 22:39:56 +05:30
Kashif Faraz	64e6283eca	Do not allow retention rules to be null (#14223 ) Changes: - Do not allow retention rules for any datasource or cluster to be null - Allow empty rules at the datasource level but not at the cluster level - Add validation to ensure that `druid.manager.rules.defaultRule` is always set correctly - Minor style refactors	2023-05-11 14:33:56 +05:30
Clint Wylie	a7a4bfd331	modify QueryScheduler to lazily acquire lanes when executing queries to avoid leaks (#14184 ) This PR fixes an issue that could occur if druid.query.scheduler.numThreads is configured and any exception occurs after QueryScheduler.run has been called to create a Sequence. This would result in total and/or lane specific locks being acquired, but because the sequence was not actually being evaluated, the "baggage" which typically releases these locks was not being executed. An example of how this can happen is if a group-by having filter, which wraps and transforms this sequence happens to explode while wrapping the sequence. The end result is that the locks are acquired, but never released, eventually halting the ability to execute any queries.	2023-05-08 11:42:05 +05:30
Clint Wylie	90ea192d9c	fix bugs with auto encoded long vector deserializers (#14186 ) This PR fixes an issue when using 'auto' encoded LONG typed columns and the 'vectorized' query engine. These columns use a delta based bit-packing mechanism, and errors in the vectorized reader would cause it to incorrectly read column values for some bit sizes (1 through 32 bits). This is a regression caused by #11004, which added the optimized readers to improve performance, so impacts Druid versions 0.22.0+. While writing the test I finally got sad enough about IndexSpec not having a "builder", so I made one, and switched all the things to use it. Apologies for the noise in this bug fix PR, the only real changes are in VSizeLongSerde, and the tests that have been modified to cover the buggy behavior, VSizeLongSerdeTest and ExpressionVectorSelectorsTest. Everything else is just cleanup of IndexSpec usage.	2023-05-01 11:49:27 +05:30
George Shiqi Wu	d0654e2174	Register emitter (#14180 )	2023-04-27 18:32:50 -07:00
Suneet Saldanha	84c11df980	Make LoggingEmitter more useful by using Markers (#14121 ) * Make LoggingEmitter more useful * Skip code coverage for facade classes * fix spellcheck * code review * fix dependency * logging.md * fix checkstyle * Add back jacoco version to main pom	2023-04-27 15:06:06 -07:00
Gian Merlino	752475b799	Fix two concurrency issues with segment fetching. (#14042 ) * Fix two concurrency issues with segment fetching. 1) SegmentLocalCacheManager: Fix a concurrency issue where certain directory cleanup happened outside of directoryWriteRemoveLock. This created the possibility that segments would be deleted by one thread, while being actively downloaded by another thread. 2) TaskDataSegmentProcessor (MSQ): Fix a concurrency issue when two stages in the same process both use the same segment. For example: a self-join using distributed sort-merge. Prior to this change, the two stages could delete each others' segments. 3) ReferenceCountingResourceHolder: increment() returns a new ResourceHolder, rather than a Releaser. This allows it to be passed to callers without them having to hold on to both the original ResourceHolder and a Releaser. 4) Simplify various interfaces and implementations by using ResourceHolder instead of Pair and instead of split-up fields. * Add test. * Fix style. * Remove Releaser. * Updates from master. * Add some GuardedBys. * Use the correct GuardedBy. * Adjustments.	2023-04-25 20:49:27 -07:00
Abhishek Singh Chouhan	895abd8929	Refresh DruidLeaderClient cache selectively for non-200 responses (#14092 ) * Refresh DruidLeaderClient cache for non-200 responses * Change local variable name to avoid confusion * Implicit retries for 503 and 504 * Remove unused imports * Use argumentmatcher instead of Mockito for #any in test * Remove flag to disable retry for 503/504 * Remove unused import from test * Add log line for internal retry --------- Co-authored-by: Abhishek Singh Chouhan <abhishek.chouhan@salesforce.com>	2023-04-20 01:46:54 -07:00
zachjsh	04da0102cb	KillTask should return empty inputSource resources (#14106 ) This PR fixes a few bugs found with the inputSource security feature. 1. `KillUnusedSegmentsTask` previously had no definition for `getInputSourceResources`, which caused an `UnsupportedOperationException` to be thrown when this task type was submitted with the inputSource security feature enabled. This task type should not require any input source specific resources, so it now returns an empty set. 2. Fixed a bug where, when the input source type security feature is enabled, all of the input source type specific resources used were authenticated against: `{"resource": {"name": "EXTERNAL", "type": "{INPUT_SOURCE_TYPE}"}, "action": "READ"}` when they should instead be authenticated against: `{"resource": {"name": "{INPUT_SOURCE_TYPE}", "type": "EXTERNAL"}, "action": "READ"}` 3. Fixed a bug where supervisor tasks were not authenticated against the specific input source types used when the input source security feature was enabled.	2023-04-18 15:27:16 -04:00
Atul Mohan	e3c160f2f2	Add start_time column to sys.servers (#13358 ) Adds a new column start_time to sys.servers that captures the time at which the server was added to the cluster.	2023-04-14 15:23:34 +05:30
imply-cheddar	aaa6cc1883	Make the tasks run with only a single directory (#14063 ) * Make the tasks run with only a single directory There was a change that tried to get indexing to run on multiple disks. It made a bunch of changes to how tasks run, effectively hiding the "safe" directory for tasks to write files into from the task code itself, making it extremely difficult to do anything correctly inside of a task. This change reverts those changes inside of the tasks and makes it so that only the task runners are the ones that make decisions about which mount points should be used for storing task-related files. It adds the config druid.worker.baseTaskDirs which can be used by the task runners to know which directories they should schedule tasks inside of. The TaskConfig remains the authoritative source of configuration for where and how an individual task should be operating.	2023-04-13 00:45:02 -07:00
zachjsh	2e87b5a901	Input source security sql layer can handle input source with multiple types (#14050 ) ### Description This change allows for input sources used during MSQ ingestion to be authorized for multiple input source types, instead of just 1. Such an input source that allows for multiple types is the CombiningInputSource. Also fixed bug that caused some input source specific functions to be authorized against the permissions ` [ new ResourceAction(new Resource(ResourceType.EXTERNAL, ResourceType.EXTERNAL), Action.READ), new ResourceAction(new Resource(ResourceType.EXTERNAL, {input_source_type}), Action.READ) ] ` when the inputSource based authorization feature is enabled, when it should instead be authorized against ` [ new ResourceAction(new Resource(ResourceType.EXTERNAL, {input_source_type}), Action.READ) ] `	2023-04-10 09:48:57 -04:00
Clint Wylie	1aef72aa7e	Bump up the version in pom to 27.0.0 in preparation of release (#14051 )	2023-04-10 14:56:59 +05:30
zachjsh	5c0221375c	Allow for Input source security in native task layer (#14003 ) Fixes #13837. This change allows for input source type security in the native task layer. To enable this feature, the user must set the following property to true: `druid.auth.enableInputSourceSecurity=true` The default value for this property is false, which will continue the existing functionality of needing authorization to write to the respective datasource. When this config is enabled, the users will be required to be authorized for the following resource action, in addition to write permission on the respective datasource: `new ResourceAction(new Resource(ResourceType.EXTERNAL, {INPUT_SOURCE_TYPE}), Action.READ)` where `{INPUT_SOURCE_TYPE}` is the type of the input source being used: http, inline, s3, etc. Only tasks that provide a non-default implementation of the `getInputSourceResources` method can be submitted when config `druid.auth.enableInputSourceSecurity=true` is set. Otherwise, a 400 error will be thrown.	2023-04-06 13:13:09 -04:00
Abhishek Agarwal	92912a6a2b	JOIN or UNNEST queries over tombstone segment can fail (#14021 ) Join,Unnest queries over tombstone segment can fail	2023-04-06 16:55:58 +05:30
Clint Wylie	1c8a184677	add null safety checks for DiscoveryDruidNode services for more resilient http server and task views (#13930 ) * add null safety checks for DiscoveryDruidNode services for more resilient http server and task vi	2023-04-05 02:45:39 -07:00
Clint Wylie	d21babc5b8	remix nested columns (#14014 ) changes: * introduce ColumnFormat to separate physical storage format from logical type. ColumnFormat is now used instead of ColumnCapabilities to get column handlers for segment creation * introduce new 'auto' type indexer and merger which produces a new common nested format of columns, which is the next logical iteration of the nested column stuff. Essentially this is an automatic type column indexer that produces the most appropriate column for the given inputs, making either STRING, ARRAY<STRING>, LONG, ARRAY<LONG>, DOUBLE, ARRAY<DOUBLE>, or COMPLEX<json>. * revert NestedDataColumnIndexer, NestedDataColumnMerger, NestedDataColumnSerializer to their version pre #13803 behavior (v4) for backwards compatibility * fix a bug in RoaringBitmapSerdeFactory if anything actually ever wrote out an empty bitmap using toBytes and then later tried to read it (the nerve!)	2023-04-04 17:51:59 -07:00
soullkk	51f3db2ce6	Fix peon errors when executing tasks in ipv6(#13972 ) (#13995 )	2023-03-31 09:18:10 +05:30
Kashif Faraz	47face9ca9	Handle null values in BrokerServerView.serverAddedSegment (#13980 ) Due to race conditions, the BrokerServerView may sometimes try to add a segment to a server which has already been removed from the inventory. This results in an NPE and keeps the BrokerServerView from processing all change requests.	2023-03-30 16:19:05 +05:30
zachjsh	3bb67721f7	Allow for Input source security in SQL layer (#13989 ) This change introduces the concept of the input source type security model, proposed in #13837. With this change, this feature is only available at the SQL layer, but we will expand to the native layer in a follow-up PR. To enable this feature, the user must set the following property to true: druid.auth.enableInputSourceSecurity=true The default value for this property is false, which will continue the existing functionality of authorizing the usage of all external sources against the hardcoded resource action: new ResourceAction(new Resource(ResourceType.EXTERNAL, ResourceType.EXTERNAL), Action.READ) When this config is enabled, the users will be required to be authorized for the following resource action: new ResourceAction(new Resource(ResourceType.EXTERNAL, {INPUT_SOURCE_TYPE}), Action.READ) where {INPUT_SOURCE_TYPE} is the type of the input source being used: http, inline, s3, etc. Documentation has not been added for the feature as it is not complete at the moment; we still need to enable this for the native layer in a follow-up PR.	2023-03-29 22:15:33 -04:00
Paul Rogers	da42ee5bfa	Added TYPE(native) data type for external tables (#13958 )	2023-03-22 21:43:29 -07:00
Adarsh Sanjeev	7bab407495	Add segment generator counters to MSQ reports (#13909 ) * Add segment generator counters to reports * Remove unneeded annotation * Fix checkstyle and coverage * Add persist and merged as new metrics * Address review comments * Fix checkstyle * Create metrics class to handle updating counters * Address review comments * Add rowsPushed as a new metrics	2023-03-22 09:17:26 -07:00
Clint Wylie	f4392a3155	expression transform improvements and fixes (#13947 ) changes: * fixes inconsistent handling of byte[] values between ExprEval.bestEffortOf and ExprEval.ofType, which could cause byte[] values to end up as java toString values instead of base64 encoded strings in ingest time transforms * improved ExpressionTransform binding to re-use ExprEval.bestEffortOf when evaluating a binding instead of throwing it away * improved ExpressionTransform array handling, added RowFunction.evalDimension that returns List<String> to back Row.getDimension and remove the automatic coercing of array types that would typically happen to expression transforms unless using Row.getDimension * added some tests for ExpressionTransform with array inputs * improved ExpressionPostAggregator to use partial type information from decoration * migrate some test uses of InputBindings.forMap to use other methods	2023-03-21 23:26:53 -07:00
Kashif Faraz	b7752a909c	Enable round-robin segment assignment and batch segment allocation by default (#13942 ) Changes: - Set `useRoundRobinSegmentAssignment` in coordinator dynamic config to `true` by default. - Set `batchSegmentAllocation` in `TaskLockConfig` (used in Overlord runtime properties) to `true` by default.	2023-03-22 08:20:01 +05:30
Gian Merlino	1c7a03a47b	Lower default maxRowsInMemory for realtime ingestion. (#13939 ) * Lower default maxRowsInMemory for realtime ingestion. The thinking here is that for best ingestion throughput, we want intermediate persists to be as big as possible without using up all available memory. So, we rely mainly on maxBytesInMemory. The default maxRowsInMemory (1 million) is really just a safety: in case we have a large number of very small rows, we don't want to get overwhelmed by per-row overheads. However, maximum ingestion throughput isn't necessarily the primary goal for realtime ingestion. Query performance is also important. And because query performance is not as good on the in-memory dataset, it's helpful to keep it from growing too large. 150k seems like a reasonable balance here. It means that for a typical 5 million row segment, we won't trigger more than 33 persists due to this limit, which is a reasonable number of persists. * Update tests. * Update server/src/main/java/org/apache/druid/segment/indexing/RealtimeTuningConfig.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Fix test. * Fix link. --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2023-03-21 10:36:36 -07:00
Atul Mohan	617c325c70	Make zk connection retries configurable (#13913 ) * This makes the zookeeper connection retry count configurable. This is presently hardcoded to 29 tries which ends up taking a long time for the druid node to shut down in case of ZK connectivity loss. Having a shorter retry count helps k8s deployments to fail fast. In situations where the underlying k8s node loses network connectivity or is no longer able to talk to zookeeper, failing fast can trigger pod restarts which can then reassign the pod to a healthy k8s node. Existing behavior is preserved, but users can override this property if needed.	2023-03-21 14:45:28 +05:30
Gian Merlino	fe9d0c46d5	Improve memory efficiency of WrappedRoaringBitmap. (#13889 ) * Improve memory efficiency of WrappedRoaringBitmap. Two changes: 1) Use an int[] for sizes 4 or below. 2) Remove the boolean compressRunOnSerialization. Doesn't save much space, but it does save a little, and it isn't adding a ton of value to have it be configurable. It was originally configurable in case anything broke when enabling it, but it's been a while and nothing has broken. * Slight adjustment. * Adjust for inspection. * Updates. * Update snaps. * Update test. * Adjust test. * Fix snaps.	2023-03-09 15:48:02 -08:00
Clint Wylie	68db39d08a	fix ci (#13901 ) This PR is #13899 plus spotbugs fix to fix the failures introduced by #13815	2023-03-08 16:55:47 +05:30
Abhishek Agarwal	52bd9e6adb	Improved error message when topic name changes within same supervisor (#13815 ) Improved error message when topic name changes within same supervisor Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-03-07 18:10:18 -08:00
Karan Kumar	94cfabea18	Suggested memory calculation in case NOT_ENOUGH_MEMORY_FAULT is thrown. (#13846 ) * Suggested memory calculation in case NOT_ENOUGH_MEMORY_FAULT is thrown. Co-authored-by: Charles Smith <techdocsmith@gmail.com>	2023-03-06 18:00:36 +05:30
Karan Kumar	65c3954942	Adding forbidden api for Properties#get() and Properties#getOrDefault() (#13882 ) The Properties#getOrDefault() method does not check the default map for values, whereas Properties#getProperty() does.	2023-03-06 10:42:04 +05:30
Tejaswini Bandlamudi	7103cb4b9d	Removes FiniteFirehoseFactory and its implementations (#12852 ) The FiniteFirehoseFactory and InputRowParser classes were deprecated in 0.17.0 (#8823) in favor of InputSource & InputFormat. This PR removes the FiniteFirehoseFactory and all its implementations along with classes solely used by them like Fetcher (used by PrefetchableTextFilesFirehoseFactory). Refactors classes including tests using FiniteFirehoseFactory to use InputSource instead. Removing InputRowParser may not be as trivial, as many classes that aren't deprecated depend on it (with no alternatives), like EventReceiverFirehoseFactory. Hence FirehoseFactory, EventReceiverFirehoseFactory, and Firehose are marked deprecated.	2023-03-02 18:07:17 +05:30
Clint Wylie	1d8fff4096	sampler + type detection = bff (#13711 ) * sampler + type detection = bff * split logical and physical dimensions, tidy up	2023-02-28 04:14:30 -08:00
Gian Merlino	6f7f391762	Remove unused imports. (#13860 ) Crept in during #13842. Possibly logical conflict with another PR.	2023-02-27 15:14:34 -08:00
Suneet Saldanha	31c7de1087	Make CompactionSearchPolicy injectable (#13842 ) * Make CompactionSearchPolicy injectable A small refactoring that makes the search policy for compaction injectable. Future changes can introduce new search policies that can be configured and injected so that operators can choose which search policy is best suited for their cluster. This will also allow us to de-couple the scheduling of compaction jobs from the CompactSegments duty, allowing the coordinator to schedule compaction jobs faster than the duty lifecycle. This PR is made so that it is easy to review the future changes. * fix tests	2023-02-27 07:57:03 -08:00
Abhishek Agarwal	48f4330100	Make leader redirection work when both plainText and TLS ports are set (#13847 ) When both plainText and TLS ports are set in druid, the redirection to a different leader node can fail. This is caused by how we compare a redirect path and the leader locations registered with a druid node. While the registered location has both plainText and TLS port set, the redirect path only has one port since it's a URI.	2023-02-26 21:23:29 +05:30
Kashif Faraz	3a67a43c8a	Add method SegmentTimeline.addSegments (#13831 )	2023-02-21 23:58:01 -08:00
Lucas Capistrant	46eafa57e1	Improve client change counter management in HTTP Server View (#13010 ) * Avoid calling resolveWaitingFutures if there are no changes made * Avoid telling HTTP server view client to reset counter when their counter is valid	2023-02-20 17:32:27 +05:30
Clint Wylie	08b5951cc5	merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698 ) * merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything * fix poms and license stuff * mockito is evil * allow reset of JvmUtils RuntimeInfo if tests used static injection to override	2023-02-17 14:27:41 -08:00
Paul Rogers	333196d207	Code cleanup & message improvements (#13778 ) * Misc cleanup edits Correct spacing Add type parameters Add toString() methods to formats so tests compare correctly IT doc revisions Error message edits Display UT query results when tests fail * Edit * Build fix * Build fixes	2023-02-15 15:22:54 +05:30
Paul Rogers	842ee554de	Refinements to input-source specific table functions (#13780 ) Refinements to table functions Fixes various bugs Improves the structure of the table function classes Adds unit and integration tests	2023-02-13 16:21:27 -08:00
AmatyaAvadhanula	34c04daa9f	Fix infinite iteration in http sync monitoring (#13731 ) * Fix infinite iteration in http task runner * Fix infinite iteration in http server view * Add tests	2023-02-08 15:14:11 +05:30
imply-cheddar	f684df4c22	Use an HllSketchHolder object to enable optimized merge (#13737 ) * Use an HllSketchHolder object to enable optimized merge HllSketchAggregatorFactory.combine had been implemented using a pure pair-wise, "make a union -> add 2 things to union -> get sketch" algorithm. This algorithm does 2 things that waste CPU: 1) The Union object always builds an HLL_8 sketch regardless of the target type. This means that when the target type is not HLL_8, we spent CPU cycles converting to HLL_8 and back over and over again. 2) By throwing away the Union object and converting back to the HllSketch only to build another Union object, we do lots and lots of copy+conversions of the HllSketch. This change introduces an HllSketchHolder object which can hold onto a Union object and delay conversion back into an HllSketch until it is actually needed. This follows the same pattern as the SketchHolder object for theta sketches.	2023-02-07 13:57:48 -08:00
AmatyaAvadhanula	dcdae84888	Add server view initialization metrics (#13716 ) * Add server view init metrics * Test coverage * Rename metrics	2023-02-07 20:02:00 +05:30
imply-cheddar	9c5b61e114	Fallback virtual column (#13739 ) * Fallback virtual column This virtual column enables falling back to another column if the original column doesn't exist. This is useful when doing column migrations and you have some old data with column X, new data with column Y and you want to use Y if it exists, X otherwise so that you can run a consistent query against all of the data.	2023-02-06 19:36:50 -08:00
Suneet Saldanha	cfc3115a59	Compaction history returns empty list instead of 404 when not found (#13730 ) * Compaction history returns empty list instead of 404 when not found * checkstyle	2023-02-01 17:44:07 -08:00
Clint Wylie	ec1e6ac840	fix nested column handling of null and "null" (#13714 ) * fix nested column handling of null and "null" * fix issue merging nested column value dictionaries that could incorrectly lose dictionary values	2023-01-31 20:59:19 -08:00
Suneet Saldanha	016c881795	Add API to return automatic compaction config history (#13699 ) Add a new API to return the history of changes to the automatic compaction config, to make it easy for users to see what changes have been made to their auto-compaction config. The API is scoped per dataSource to allow users to triage issues with an individual dataSource. The API responds with a list of configs when there is a change to either the settings that impact all auto-compaction configs on a cluster or the dataSource in question.	2023-01-23 13:23:45 -08:00
Clint Wylie	fb26a1093d	discover nested columns when using nested column indexer for schemaless ingestion (#13672 ) * discover nested columns when using nested column indexer for schemaless * move useNestedColumnIndexerForSchemaDiscovery from AppendableIndexSpec to DimensionsSpec	2023-01-18 12:57:28 -08:00
Maytas Monsereenusorn	1582d74f37	Fix Parquet Reader for schema-less ingestion need to read all columns (#13689 ) * fix stuff * address comments	2023-01-18 12:52:12 -08:00
Paul Rogers	22630b0aab	Much improved table functions (#13627 ) Much improved table functions * Revises properties, definitions in the catalog * Adds a "table function" abstraction to model such functions * Specific functions for HTTP, inline, local and S3. * Extended SQL types in the catalog * Restructure external table definitions to use table functions * EXTEND syntax for Druid's extern table function * Support for array-valued table function parameters * Support for array-valued SQL query parameters * Much new documentation	2023-01-17 08:41:57 -08:00
imply-cheddar	7ff3722cb9	Swap LazySingleton for Singleton (#13673 ) * Swap LazySingleton for Singleton * Initialize WebserverTestUtils properly	2023-01-15 21:38:37 -08:00
Gian Merlino	182c4fad29	Kinesis: More robust default fetch settings. (#13539 ) * Kinesis: More robust default fetch settings. 1) Default recordsPerFetch and recordBufferSize based on available memory rather than using hardcoded numbers. For this, we need an estimate of record size. Use 10 KB for regular records and 1 MB for aggregated records. With 1 GB heaps, 2 processors per task, and nonaggregated records, recordBufferSize comes out to the same as the old default (10000), and recordsPerFetch comes out slightly lower (1250 instead of 4000). 2) Default maxRecordsPerPoll based on whether records are aggregated or not (100 if not aggregated, 1 if aggregated). Prior default was 100. 3) Default fetchThreads based on processors divided by task count on Indexers, rather than overall processor count. 4) Additionally clean up the serialized JSON a bit by adding various JsonInclude annotations. * Updates for tests. * Additional important verify.	2023-01-13 11:03:54 +05:30
Clint Wylie	b5b740bbbb	allow using nested column indexer for schema discovery (#13653 ) * single typed "root" only nested columns now mimic "regular" columns of those types * incremental index can now use nested column indexer instead of string indexer for discovered columns	2023-01-12 18:31:12 -08:00
Adarsh Sanjeev	0a486c3bcf	Update forbidden apis with fixed executor (#13633 ) * Update forbidden apis with fixed executor	2023-01-12 15:34:36 +05:30
Maytas Monsereenusorn	7f54ebbf47	Fix Parquet Parser missing column when reading parquet file (#13612 ) * fix parquet reader * fix checkstyle * fix bug * fix inspection * refactor * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * add test * fix checkstyle * fix tests * add IT * add IT * add more tests * fix checkstyle * fix stuff * fix stuff * add more tests * add more tests	2023-01-11 20:08:48 -10:00
Abhishek Agarwal	17936e2920	Add an option to enable HSTS in druid services (#13489 ) * Add an option to enable HSTS * Fix code and add docs * Deduplicate headers * unused import * Fix spelling	2023-01-10 22:31:51 +05:30
imply-cheddar	a8ecc48ffe	Validate response headers and fix exception logging (#13609 ) * Validate response headers and fix exception logging A class of QueryException were throwing away their causes making it really hard to determine what's going wrong when something goes wrong in the SQL planner specifically. Fix that and adjust tests to do more validation of response headers as well. We allow 404s and 307s to be returned even without authorization validated, but others get converted to 403	2023-01-05 14:15:15 -08:00
Kashif Faraz	36e6765596	Fix flaky test (#13603 )	2023-01-03 13:52:05 +05:30
imply-cheddar	7b92b85168	Unify DummyRequest with MockHttpServletRequest (#13602 ) We had 2 different classes both creating fake instances of an HttpServletRequest, this makes it so that we only have one in a common location	2022-12-21 20:15:08 -08:00
imply-cheddar	0efd0879a8	Unify the handling of HTTP between SQL and Native (#13564 ) * Unify the handling of HTTP between SQL and Native The SqlResource and QueryResource have been using independent logic for things like error handling and response context stuff. This became abundantly clear and painful during a change I was making for Window Functions, so I unified them into using the same code for walking the response and serializing it. Things are still not perfectly unified (it would be the absolute best if the SqlResource just took SQL, planned it and then delegated the query run entirely to the QueryResource), but this refactor doesn't take that fully on. The new code leverages async query processing from our jetty container, the different interaction model with the Resource means that a lot of tests had to be adjusted to align with the async query model. The semantics of the tests remain the same with one exception: the SqlResource used to not log requests that failed authorization checks, now it does.	2022-12-19 00:25:33 -08:00
Kashif Faraz	58a3acc2c4	Add InputStats to track bytes processed by a task (#13520 ) This commit adds a new class `InputStats` to track the total bytes processed by a task. The field `processedBytes` is published in task reports along with other row stats. Major changes: - Add class `InputStats` to track processed bytes - Add method `InputSourceReader.read(InputStats)` to read input rows while counting bytes. > Since we need to count the bytes, we could not just have a wrapper around `InputSourceReader` or `InputEntityReader` (the way `CountableInputSourceReader` does) because the `InputSourceReader` only deals with `InputRow`s and the byte information is already lost. - Classic batch: Use the new `InputSourceReader.read(inputStats)` in `AbstractBatchIndexTask` - Streaming: Increment `processedBytes` in `StreamChunkParser`. This does not use the new `InputSourceReader.read(inputStats)` method. - Extend `InputStats` with `RowIngestionMeters` so that bytes can be exposed in task reports Other changes: - Update tests to verify the value of `processedBytes` - Rename `MutableRowIngestionMeters` to `SimpleRowIngestionMeters` and remove duplicate class - Replace `CacheTestSegmentCacheManager` with `NoopSegmentCacheManager` - Refactor `KafkaIndexTaskTest` and `KinesisIndexTaskTest`	2022-12-13 18:54:42 +05:30
somu-imply	7682b0b6b1	Analysis refactor (#13501 ) Refactor DataSource to have a getAnalysis method() This removes various parts of the code where while loops and instanceof checks were being used to walk through the structure of DataSource objects in order to build a DataSourceAnalysis. Instead we just ask the DataSource for its analysis and allow the stack to rebuild whatever structure existed.	2022-12-12 17:35:44 -08:00
Gian Merlino	de5a4bafcb	Zero-copy local deep storage. (#13394 ) * Zero-copy local deep storage. This is useful for local deep storage, since it reduces disk usage and makes Historicals able to load segments instantaneously. Two changes: 1) Introduce "druid.storage.zip" parameter for local storage, which defaults to false. This changes default behavior from writing an index.zip to writing a regular directory. This is safe to do even during a rolling update, because the older code actually already handled unzipped directories being present on local deep storage. 2) In LocalDataSegmentPuller and LocalDataSegmentPusher, use hard links instead of copies when possible. (Generally this is possible when the source and destination directory are on the same filesystem.)	2022-12-12 17:28:24 -08:00
Rishabh Singh	4ebdfe226d	Druid automated quickstart (#13365 ) * Druid automated quickstart * remove conf/druid/single-server/quickstart/_common/historical/jvm.config * Minor changes in python script * Add lower bound memory for some services * Additional runtime properties for services * Update supervise script to accept command arguments, corresponding changes in druid-quickstart.py * File end newline * Limit the ability to start multiple instances of a service, documentation changes * simplify script arguments * restore changes in medium profile * run-druid refactor * compute and pass middle manager runtime properties to run-druid supervise script changes to process java opts array use argparse, leave free memory, logging * Remove extra quotes from mm task javaopts array * Update logic to compute minimum memory * simplify run-druid * remove debug options from run-druid * resolve the config_path provided * comment out service specific runtime properties which are computed in the code * simplify run-druid * clean up docs, naming changes * Throw ValueError exception on illegal state * update docs * rename args, compute_only -> compute, run_zk -> zk * update help documentation * update help documentation * move task memory computation into separate method * Add validation checks * remove print * Add validations * remove start-druid bash script, rename start-druid-main * Include tasks in lower bound memory calculation * Fix test * 256m instead of 256g * caffeine cache uses 5% of heap * ensure min task count is 2, task count is monotonic * update configs and documentation for runtime props in conf/druid/single-server/quickstart * Update docs * Specify memory argument for each profile in single-server.md * Update middleManager runtime.properties * Move quickstart configs to conf/druid/base, add bash launch script, support python2 * Update supervise script * rename base config directory to auto * rename python script, changes to pass repeated args to supervise * remove examples/conf/druid/base dir * add docs * restore changes in conf dir * update start-druid-auto * remove hashref for commands in supervise script * start-druid-main java_opts array is comma separated * update entry point script name in python script * Update help docs * documentation changes * docs changes * update docs * add support for running indexer * update supported services list * update help * Update python.md * remove dir * update .spelling * Remove dependency on psutil and pathlib * update docs * Update get_physical_memory method * Update help docs * update docs * update method to get physical memory on python * update spelling * update .spelling * minor change * Minor change * memory computation for indexer * update start-druid * Update python.md * Update single-server.md * Update python.md * run python3 --version to check if python is installed * Update supervise script * start-druid: echo message if python not found * update anchor text * minor change * Update condition in supervise script * JVM not jvm in docs	2022-12-09 11:04:02 -08:00
Paul Rogers	013a12e86f	Enhanced MSQ table functions (#13360 ) * Enhanced MSQ table functions * HTTP, LOCALFILES and INLINE table functions powered by catalog metadata. * Documentation	2022-12-08 13:56:02 -08:00
Clint Wylie	37d8833125	fix bug with broker parallel merge metrics emitting, add wall time, fast/slow partition time metrics (#13420 )	2022-12-06 17:50:59 -08:00
imply-cheddar	83261f9641	Starting on Window Functions (#13458 ) * Processors for Window Processing This is an initial take on how to use Processors for Window Processing. A Processor is an interface that transforms RowsAndColumns objects. RowsAndColumns objects are essentially combinations of rows and columns. The intention is that these Processors are the start of a set of operators that more closely resemble what DB engineers would be accustomed to seeing. * Wire up windowed processors with a query type that can run them end-to-end. This code can be used to actually run a query, so yay! * Wire up windowed processors with a query type that can run them end-to-end. This code can be used to actually run a query, so yay! * Some SQL tests for window functions. Added wikipedia data to the indexes available to the SQL queries and tests validating the windowing functionality as it exists now. Co-authored-by: Gian Merlino <gianmerlino@gmail.com>	2022-12-06 15:54:05 -08:00
Clint Wylie	cf472162a6	fix issue with jetty graceful shutdown of data servers when druid.serverview.type=http (#13499 ) * fix issue with http server inventory view blocking data node http server shutdown with long polling * adjust * fix test inspections	2022-12-06 15:52:44 -08:00
AmatyaAvadhanula	658a9c2d35	Early stop on failed start (Alternative to #13087 ) (#13258 ) * Make halt configurable. Don't halt in tests	2022-12-05 21:05:07 +05:30
TSFenwick	10bec54acc	Switching emitter. This will allow for a per feed emitter designation. (#13363 ) * Switching emitter. This will allow for a per feed emitter designation. This will work by looking at an event's feed and directing it to a specific emitter. If no specific emitter is specified for a feed, the emitter can direct the event to a default emitter. * fix checkstyle issues and make docs for switching emitter use basic event feeds * fix broken docs, add test, and guard against misconfigurations * add module test add switching emitter module test * fix broken SwitchingEmitterModuleTest * add apache license to top of test * fix checkstyle issues * address comments by adding javadocs, removing a todo, and making druid docs more clear	2022-12-05 16:04:34 +05:30
Kashif Faraz	45a8fa280c	Add SegmentAllocationQueue to batch SegmentAllocateActions (#13369 ) In a cluster with a large number of streaming tasks (~1000), SegmentAllocateActions on the overlord can often take very long intervals of time to finish thus causing spikes in the `task/action/run/time`. This may result in lag building up while a task waits for a segment to get allocated. The root causes are: - large number of metadata calls made to the segments and pending segments tables - `giant` lock held in `TaskLockbox.tryLock()` to acquire task locks and allocate segments Since the contention typically arises when several tasks of the same datasource try to allocate segments for the same interval/granularity, the allocation run times can be improved by batching the requests together. Changes - Add flags - `druid.indexer.tasklock.batchSegmentAllocation` (default `false`) - `druid.indexer.tasklock.batchAllocationMaxWaitTime` (in millis) (default `1000`) - Add methods `canPerformAsync` and `performAsync` to `TaskAction` - Submit each allocate action to a `SegmentAllocationQueue`, and add to correct batch - Process batch after `batchAllocationMaxWaitTime` - Acquire `giant` lock just once per batch in `TaskLockbox` - Reduce metadata calls by batching statements together and updating query filters - Except for batching, retain the whole behaviour (order of steps, retries, etc.) - Respond to leadership changes and fail items in queue when not leader - Emit batch and request level metrics	2022-12-05 14:00:07 +05:30
Paul Rogers	b76ff16d00	SQL test framework extensions (#13426 ) SQL test framework extensions * Capture planner artifacts: logical plan, etc. * Planner test builder validates the logical plan * Validation for the SQL result schema (we already have validation for the Druid row signature) * Better Guice integration: properties, reuse Guice modules * Avoid need for hand-coded expr, macro tables * Retire some of the test-specific query component creation * Fix query log hook race condition	2022-12-02 09:11:59 -08:00
Gian Merlino	58c896ea0b	ServiceClient: More robust redirect handling. (#13413 ) Detects self-redirects, redirect loops, long redirect chains, and redirects to unknown servers. Treat all of these cases as an unavailable service, retrying if the retry policy allows it. Previously, some of these cases would lead to a prompt, unretryable error. This caused clients contacting an Overlord during a leader change to fail with error messages like: org.apache.druid.rpc.RpcException: Service [overlord] redirected too many times Additionally, a slight refactor of callbacks in ServiceClientImpl improves readability of the flow through onSuccess.	2022-11-28 22:24:46 +05:30
Kashif Faraz	656b6cdf62	Add MetricsVerifier to simplify verification of metric values in tests (#13442 )	2022-11-28 19:32:37 +05:30
Kashif Faraz	7cf761cee4	Prepare master branch for next release, 26.0.0 (#13401 ) * Prepare master branch for next release, 26.0.0 * Use docker image for druid 24.0.1 * Fix version in druid-it-cases pom.xml	2022-11-22 15:31:01 +05:30
Kashif Faraz	133054bf27	Make batched segment sampling the default, minor cleanup of coordinator config (#13391 ) The batch segment sampling performs significantly better than the older method of sampling if there are a large number of used segments. It also avoids duplicates. Changes: - Make batch segment sampling the default - Deprecate the property `useBatchedSegmentSampler` - Remove unused coordinator config `druid.coordinator.loadqueuepeon.repeatDelay` - Cleanup `KillUnusedSegments` - Simplify `KillUnusedSegmentsTest`, add better tests, remove redundant tests	2022-11-21 20:31:46 +05:30
Gian Merlino	bfffbabb56	Async task client for SeekableStreamSupervisors. (#13354 ) Main changes: 1) Convert SeekableStreamIndexTaskClient to an interface, move old code to SeekableStreamIndexTaskClientSyncImpl, and add new implementation SeekableStreamIndexTaskClientAsyncImpl that uses ServiceClient. 2) Add "chatAsync" parameter to seekable stream supervisors that causes the supervisor to use an async task client. 3) In SeekableStreamSupervisor.discoverTasks, adjust logic to avoid making blocking RPC calls in workerExec threads. 4) In SeekableStreamSupervisor generally, switch from Futures.successfulAsList to FutureUtils.coalesce, so we can better capture the errors that occurred with contacting individual tasks. Other, related changes: 1) Add ServiceRetryPolicy.retryNotAvailable, which controls whether ServiceClient retries unavailable services. Useful since we do not want to retry calls to unavailable tasks within the service client. (The supervisor does its own higher-level retries.) 2) Add FutureUtils.transformAsync, a more lambda friendly version of Futures.transform(f, AsyncFunction). 3) Add FutureUtils.coalesce. Similar to Futures.successfulAsList, but returns Either instead of using null on error. 4) Add JacksonUtils.readValue overloads for JavaType and TypeReference.	2022-11-21 19:20:26 +05:30
Rohan Garg	6ccf31490e	Allow injection of node-role set to all non base modules (#13371 )	2022-11-18 12:12:03 +05:30
Kashif Faraz	71b133f3ff	Add `RoundRobinServerSelector` to speed up segment assignments (#13367 ) Segment assignments can take very long due to the strategy cost computation for a large number of segments. This commit allows segment assignments to be done in a round-robin fashion within a tier. Only segment balancing takes cost-based decisions to move segments around. Changes - Add dynamic config `useRoundRobinSegmentAssignment` with default value false - Add `RoundRobinServerSelector`. This does not implement the `BalancerStrategy` as it does not conform to that contract and may also be used in conjunction with a strategy (round-robin for `RunRules` and a cost strategy for `BalanceSegments`) - Drops are still cost-based even when round-robin assignment is enabled.	2022-11-16 20:05:17 +05:30
Paul Rogers	81d005f267	Druid Catalog basics (#13165 ) Druid catalog basics Catalog object model for tables, columns Druid metadata DB storage (as an extension) REST API to update the catalog (as an extension) Integration tests Model only: no planner integration yet	2022-11-12 15:30:22 -08:00
AmatyaAvadhanula	fb23e38aa7	Fix messageGap emission (#13346 ) * Fix messageGap emission * Do not emit messageGap after stopping reading events * Refactoring * Fix tests	2022-11-10 17:50:19 +05:30
Paul Rogers	7e600d2c63	Enhancements to the Calcite test framework (#13283 ) * Enhancements to the Calcite test framework * Standardize "Unauthorized" messages * Additional test framework extension points * Resolved joinable factory dependency issue	2022-11-08 14:28:49 -08:00
Kashif Faraz	9f7fd57a69	Improve fetch of pending segments from metadata store (#13310 ) * Deserialize only when needed * Update query to fetch pending segments * Revert unneeded changes * Fix query	2022-11-08 05:46:19 -08:00
Kashif Faraz	ff8e0c3397	Fix issues with caching cost strategy (#13321 ) `cachingCost` strategy has some discrepancies when compared to cost strategy. This commit addresses two of these by retaining the same behaviour as the `cost` strategy when computing the cost of moving a segment to a server: - subtract the self cost of a segment if it is being served by the target server - subtract the cost of segments that are marked to be dropped Other changes: - Add tests to verify fixed strategy. These tests would fail without the fixes made to `CachingCostStrategy.computeCost()` - Fix the definition of the segment related metrics in the docs. - Fix some docs issues introduced in #13181	2022-11-08 16:11:39 +05:30
Tejaswini Bandlamudi	594545da55	Adds cluster level idleConfig setting for supervisor (#13311 ) * adds cluster level idleConfig * updates docs * refactoring * spelling nit * nit * nit * refactoring	2022-11-08 14:54:14 +05:30
AmatyaAvadhanula	47c32a9d92	Skip ALL granularity compaction (#13304 ) * Skip autocompaction for datasources with ETERNITY segments	2022-11-07 17:55:03 +05:30
AmatyaAvadhanula	650840ddaf	Add segment handoff time metric (#13238 ) * Add segment handoff time metric * Remove monitors on scheduler stop * Add warning log for slow handoff * Remove monitor when scheduler stops	2022-11-07 17:49:10 +05:30
Dr. Sizzles	e5ad24ff9f	Support for middle manager less druid, tasks launch as k8s jobs (#13156 ) * Support for middle manager less druid, tasks launch as k8s jobs * Fixing forking task runner test * Test cleanup, dependency cleanup, intellij inspections cleanup * Changes per PR review Add configuration option to disable http/https proxy for the k8s client Update the docs to provide more detail about sidecar support * Removing un-needed log lines * Small changes per PR review * Upon task completion we callback to the overlord to update the status / location, for slower k8s clusters, this reduces locking time significantly * Merge conflict fix * Fixing tests and docs * update tiny-cluster.yaml changed `enableTaskLevelLogPush` to `encapsulatedTask` * Apply suggestions from code review Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * Minor changes per PR request * Cleanup, adding test to AbstractTask * Add comment in peon.sh * Bumping code coverage * More tests to make code coverage happy * Doh a duplicate dependency * Integration test setup is weird for k8s, will do this in a different PR * Reverting back all integration test changes, will do in another PR * use StringUtils.base64 instead of Base64 * Jdk is nasty, if i compress in jdk 11 in jdk 17 the decompressed result is different Co-authored-by: Rahul Gidwani <r_gidwani@apple.com> Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2022-11-02 19:44:47 -07:00
Kashif Faraz	fd7864ae33	Improve run time of coordinator duty MarkAsUnusedOvershadowedSegments (#13287 ) In clusters with a large number of segments, the duty `MarkAsUnusedOvershadowedSegments` can take a very long time to finish. This is because of the costly invocation of `timeline.isOvershadowed` which is done for every used segment in every coordinator run. Changes - Use `DataSourceSnapshot.getOvershadowedSegments` to get all overshadowed segments - Iterate over this set instead of all used segments to identify segments that can be marked as unused - Mark segments as unused in the DB in batches rather than one at a time - Refactor: Add class `SegmentTimeline` for ease of use and readability while using a `VersionedIntervalTimeline` of segments.	2022-11-01 20:19:52 +05:30
somu-imply	affc522b9f	Refactoring the data source before unnest (#13085 ) * First set of changes for framework * Second set of changes to move segment map function to data source * Minor change to server manager * Removing the createSegmentMapFunction from JoinableFactoryWrapper and moving to JoinDataSource * Checkstyle fixes * Patching Eric's fix for injection * Checkstyle and fixing some CI issues * Fixing code inspections and some failed tests and one injector for test in avatica * Another set of changes for CI...almost there * Equals and hashcode part update * Fixing injector from Eric + refactoring for broadcastJoinHelper * Updating second injector. Might revert later if better way found * Fixing guice issue in JoinableFactory * Addressing review comments part 1 * Temp changes refactoring * Revert "Temp changes refactoring" This reverts commit `9da42a9ef0`. * temp * Temp discussions * Refactoring temp * Refactoring the query rewrite to refer to a datasource * Refactoring getCacheKey by moving it inside data source * Nullable annotation check in injector * Addressing some comments, removing 2 analysis.isJoin() checks and correcting the benchmark files * Minor changes for refactoring * Addressing reviews part 1 * Refactoring part 2 with new test cases for broadcast join * Set for nullables * removing instance of checks * Storing nullables in guice to avoid checking on reruns * Fixing a test case and removing an irrelevant line * Addressing the atomic reference review comments	2022-10-26 15:58:58 -07:00
Gian Merlino	d98c808d3f	Remove basePersistDirectory from tuning configs. (#13040 ) * Remove basePersistDirectory from tuning configs. Since the removal of CliRealtime, it serves no purpose, since it is always overridden in production using withBasePersistDirectory given some subdirectory of the task work directory. Removing this from the tuning config has a benefit beyond removing no-longer-needed logic: it also avoids the side effect of empty "druid-realtime-persist" directories getting created in the systemwide temp directory. * Test adjustments to appropriately set basePersistDirectory. * Remove unused import. * Fix RATC constructor.	2022-10-21 17:25:36 -07:00
Paul Rogers	86e6e61e88	Modular Calcite Test Framework (#12965 ) * Refactor Calcite test "framework" for planner tests Refactors the current Calcite tests to make it a bit easier to adjust the set of runtime objects used within a test. * Move data creation out of CalciteTests into TestDataBuilder * Move "framework" creation out of CalciteTests into a QueryFramework * Move injector-dependent functions from CalciteTests into QueryFrameworkUtils * Wrapper around the planner factory, etc. to allow customization. * Bulk of the "framework" created once per class rather than once per test. * Refactor tests to use a test builder * Change all testQuery() methods to use the test builder. Move test execution & verification into a test runner.	2022-10-20 15:45:44 -07:00
Paul Rogers	f4dcc52dac	Redesign QueryContext class (#13071 ) We introduce two new configuration keys that refine the query context security model controlled by druid.auth.authorizeQueryContextParams. When that value is set to true then two other configuration options become available: druid.auth.unsecuredContextKeys: The set of query context keys that do not require a security check. Use this for the "white-list" of keys to allow. All other keys go through the existing context key security checks. druid.auth.securedContextKeys: The set of query context keys that do require a security check. Use this when you want to allow all but a specific set of keys: only these keys go through the existing context key security checks. Both are set using JSON list format: druid.auth.securedContextKeys=["secretKey1", "secretKey2"] You generally set one or the other value. If both are set, unsecuredContextKeys acts as exceptions to securedContextKeys. In addition, Druid defines two query context keys which always bypass checks because Druid uses them internally: sqlQueryId sqlStringifyArrays	2022-10-15 11:02:11 +05:30
Tejaswini Bandlamudi	3e13584e0e	Adds Idle feature to `SeekableStreamSupervisor` for inactive stream (#13144 ) * Idle Seekable stream supervisor changes. * nit * nit * nit * Adds unit tests * Supervisor decides its idle state instead of AutoScaler * docs update * nit * nit * docs update * Adds Kafka unit test * Adds Kafka Integration test. * Updates travis config. * Updates kafka-indexing-service dependencies. * updates previous offsets snapshot & doc * Doesn't act if supervisor is suspended. * Fixes highest current offsets fetch bug, adds new Kafka UT tests, doc changes. * Reverts Kinesis Supervisor idle behaviour changes. * nit * nit * Corrects SeekableStreamSupervisorSpec check on idle behaviour config, adds tests. * Fixes getHighestCurrentOffsets to fetch offsets of publishing tasks too * Adds Kafka Supervisor UT * Improves test coverage in druid-server * Corrects IT override config * Doc updates and Syntactic changes * nit * supervisorSpec.ioConfig.idleConfig changes	2022-10-12 18:31:08 +05:30
Gian Merlino	c19ae13323	Improve direct-memory check on startup. (#13207 ) 1) Better support for Java 9+ in RuntimeInfo. This means that in many cases, an actual validation can be done. 2) Clearer log message in cases where an actual validation cannot be done.	2022-10-12 05:10:25 +08:00
AmatyaAvadhanula	41e51b21c3	Make http options the default configurations (#13092 ) Druid currently uses Zookeeper dependent options as the default. This commit updates the following to use HTTP as the default instead. - task runner. `druid.indexer.runner.type=remote -> httpRemote` - load queue peon. `druid.coordinator.loadqueuepeon.type=curator -> http` - server inventory view. `druid.serverview.type=curator -> http`	2022-10-05 05:35:17 +05:30
Abhishek Agarwal	e3f9a0ed44	Lazy initialization of segment killers, movers and archivers (#13170 ) * Lazy initialization of segment killers, movers and archivers * Add test for lazy killer * Add more tests * Intellij fixes	2022-10-04 15:55:46 +05:30
Kashif Faraz	ce5f55e5ce	Fix over-replication caused by balancing when inventory is not updated yet (#13114 ) * Add coordinator test framework * Remove outdated changes * Add more tests * Add option to auto-sync inventory * Minor cleanup * Fix inspections * Add README for simulations, add SegmentLoadingNegativeTest * Fix over-replication from balancing * Fix README * Cleanup unnecessary fields from DruidCoordinator * Add a test * Fix DruidCoordinatorTest * Remove unused import * Fix CuratorDruidCoordinatorTest * Remove test log4j2.xml	2022-09-29 12:06:23 +05:30
Kashif Faraz	0039409817	Add test framework to simulate segment loading and balancing (#13074 ) Fixes #12822 The framework added here makes it easy to write tests that verify the behaviour and interactions of the following entities under various conditions: - `DruidCoordinator` - `HttpLoadQueuePeon`, `LoadQueueTaskMaster` - coordinator duties: `BalanceSegments`, `RunRules`, `UnloadUnusedSegments`, etc. - datasource retention rules: `LoadRule`, `DropRule` Changes: Add the following main classes: - `CoordinatorSimulation` and related interfaces to dictate behaviour of simulation - `CoordinatorSimulationBuilder` to build a simulation. - `BlockingExecutorService` to keep submitted tasks in queue and execute them only when explicitly invoked. Add tests: - `CoordinatorSimulationBaseTest`, `SegmentLoadingTest`, `SegmentBalancingTest` - `SegmentLoadingNegativeTest` to contain tests which assert the existing erroneous behaviour of segment loading. Once the behaviour is fixed, these tests will be moved to the regular `SegmentLoadingTest`. Please refer to the README.md in `org.apache.druid.server.coordinator.simulate` for more details	2022-09-21 09:51:58 +05:30
Paul Rogers	8ce03eb094	Convert the Druid planner to use statement handlers (#12905 ) * Converted Druid planner to use statement handlers Converts the large collection of if-statements for statement types into a set of classes: one per supported statement type. Cleans up a few error messages. * Revisions from review comments * Build fix * Build fix * Resolve merge conflict. * More merges with QueryResponse PR * More parameterized type cleanup Forces a rebuild due to a flaky test	2022-09-19 11:58:45 +05:30
AmatyaAvadhanula	9b53b0184f	Allocate numCorePartitions using only used segments (#13070 ) * Allocate numCorePartitions using only used segments * Add corePartition checks in existing test * Separate committedMaxId and overallMaxId * Fix bug: replace overall with committed	2022-09-16 19:16:36 +05:30
AmatyaAvadhanula	1311e85f65	Faster fix for dangling tasks upon supervisor termination (#13072 ) This commit fixes issues with delayed supervisor termination during certain transient states. Tasks can be created during supervisor termination and left behind since the cleanup may not consider these newly added tasks. #12178 added a lock for the entire process of task creation to prevent such dangling tasks. But it also introduced a deadlock scenario as follows: - An invocation of `runInternal` is in progress. - A `stop` request comes, acquires `stateChangeLock` and submit a `ShutdownNotice` - `runInternal` keeps waiting to acquire the `stateChangeLock` - `ShutdownNotice` remains stuck in the notice queue because `runInternal` is still running - After some timeout, the supervisor goes through a forced termination Fix: * `SeekableStreamSupervisor.runInternal` - do not try to acquire lock if supervisor is already stopping * `SupervisorStateManager.maybeSetState` - do not allow transitions from STOPPING state	2022-09-15 15:31:14 +05:30
Frank Chen	aa9b0900d4	Move web-console dependency declaration from druid-server to druid-distribution (#12501 ) * Move web-console dependency from druid-server to distribution * Add a test to check if the web-console is correctly integrated * exclude web-console from 'other integration tests' * Revert "exclude web-console from 'other integration tests'" This reverts commit `8d72225544`. * Revert "Add a test to check if the web-console is correctly integrated" This reverts commit `d6ac8f3087`.	2022-09-15 10:39:30 +08:00
Clint Wylie	f4ec50bf7a	fix JsonParserIteratorTest (#13083 )	2022-09-13 20:49:57 -07:00
Frank Chen	fd6c05eee8	Avoid ClassCastException when getting values from `QueryContext` (#13022 ) * Use safe conversion methods * Rename method * Add getContextAsBoolean * Update test case * Remove generic from getContextValue * Update catch-handler * Add test * Resolve comments * Replace 'getContextXXX' to 'getQueryContext().getAsXXXX'	2022-09-13 18:00:09 +08:00
imply-cheddar	5ba0075c0c	Expose HTTP Response headers from SqlResource (#13052 ) * Expose HTTP Response headers from SqlResource This change makes the SqlResource expose HTTP response headers in the same way that the QueryResource exposes them. Fundamentally, the change is to pipe the QueryResponse object all the way through to the Resource so that it can populate response headers. There is also some code cleanup around DI, as there was a superfluous FactoryFactory class muddying things up.	2022-09-12 01:40:06 -07:00
Gian Merlino	e29e7a8434	Add ARRAY_QUANTILE function. (#13061 ) * Add ARRAY_QUANTILE function. Expected usage is like: ARRAY_QUANTILE(ARRAY_AGG(x), 0.9). * Fix test.	2022-09-09 11:29:20 -07:00
Lucas Capistrant	99fd22c79b	fix bug in /status/properties filtering (#13045 ) * fix bug in /status/properties filtering * Refactor tests to use jackson for parsing druid.server.hiddenProperties instead of hacky string modifications * make javadoc more descriptive using example * add in a sanity assertion that raw properties keyset size is greater than filtered properties keyset size	2022-09-07 17:45:28 -07:00
Rohan Garg	7aa8d7f987	Add query/time metric for SQL queries from router (#12867 ) * Add query/time metric for SQL queries from router * Fix query cancel bug when user has overridden native query-id in a SQL query	2022-09-07 13:54:46 +05:30
Adam Peck	ee22663dd3	Add interpolation to JsonConfigurator (#13023 ) * Add interpolation to JsonConfigurator * Fix checkstyle * Fix tests by removing common-text override * Add back commons-text without version * Remove unused hadoopDir configs * Move some stuff to hopefully pass coverage	2022-09-07 12:48:01 +05:30
Clint Wylie	a3a377e570	more consistent expression error messages (#12995 ) * more consistent expression error messages * review stuff * add NamedFunction for Function, ApplyFunction, and ExprMacro to share common stuff * fixes * add expression transform name to transformer failure, better parse_json error messaging	2022-09-06 23:21:38 -07:00
zemin	6805a7f9c2	Ease of hidding sensitive properties from /status/proper… (#12950 ) * apache#12063 Ease of hidding sensitive properties from /status/properties endpoint * apache#12063 Ease of hidding sensitive properties from /status/properties endpoint * apache#12063 Ease of hidding sensitive properties from /status/properties endpoint using one property for hiding properties, updated the index.md to document hiddenProperties * apache#12063 Ease of hidding sensitive properties from /status/properties endpoint Added java docs * apache#12063 Ease of hidding sensitive properties from /status/properties endpoint Add "password", "key", "token", "pwd" as default druid.server.hiddenProperties fixed typo and removed redundant space Co-authored-by: zemin <zemin.piao@adyen.com>	2022-09-02 08:51:25 -05:00
Abhishek Agarwal	618757352b	Bump up the version to 25.0.0 (#12975 ) * Bump up the version to 25.0.0 * Fix the version in console	2022-08-29 11:27:38 +05:30
Clint Wylie	16f5ac5bd5	json_value adjustments (#12968 ) * json_value adjustments changes: * native json_value expression now has optional 3rd argument to specify type, which will cast all values to the specified type * rework how JSON_VALUE is wired up in SQL. Now we are using a custom convertlet to translate JSON_VALUE(... RETURNING type) into dedicated JSON_VALUE_BIGINT, JSON_VALUE_DOUBLE, JSON_VALUE_VARCHAR, JSON_VALUE_ANY instead of using the calcite StandardConvertletTable that wraps JSON_VALUE_ANY in a CAST, so that we preserve the typing of JSON_VALUE to pass down to the native expression as the 3rd argument * fix json_value_any to be usable by humans too, coverage * fix bug * checkstyle * checkstyle * review stuff * validate that options to json_value are the supported options rather than ignore them * remove more legacy undocumented functions	2022-08-27 07:15:47 -07:00
Adam Peck	21b73bde20	Update Curator to 5.3.0 (#12939 ) * Update Curator to 5.3.0 * Update licenses.yaml * Fix inspections + add tests. * Fix checkstyle * Another intellij inspection fix * Update curator exclusions * Cleanup new exhibitor references * Remove unused dep and checkstyle fix	2022-08-26 18:23:40 -07:00
Santosh Pingale	31dc9004bd	Auto-reload TLS certs for druid endpoints (#12933 ) * #12064 Auto-reload tls certs for druid endpoints * #12064 Add missing toString param * #12064 Add tests and new jks Co-authored-by: zemin-piao <pzm6391@gmail.com> * #12064 Refine tests * #12064 Add documentation * Apply suggestions from code review Co-authored-by: Frank Chen <frankchen@apache.org> Co-authored-by: santosh <santosh.pingale@adyen.com> Co-authored-by: Frank Chen <frankchen@apache.org>	2022-08-25 20:12:43 +08:00
Paul Rogers	cfed036091	Add the new integration test framework (#12368 ) This commit is a first draft of the revised integration test framework which provides: - A new directory, integration-tests-ex that holds the new integration test structure. (For now, the existing integration-tests is left unchanged.) - Maven module druid-it-tools to hold code placed into the Docker image. - Maven module druid-it-image to build the Druid-only test image from the tarball produced in distribution. (Dependencies live in their "official" image.) - Maven module druid-it-cases that holds the revised tests and the framework itself. The framework includes file-based test configuration, test-specific clients, test initialization and updated versions of some of the common test support classes. The integration test setup is primarily a huge mass of details. This approach refactors many of those details: from how the image is built and configured to how the Docker Compose scripts are structured to test configuration. An extensive set of "readme" files explains those details. Rather than repeat that material here, please consult those files for explanations.	2022-08-24 17:03:23 +05:30
Bartosz Mikulski	0bc9f9f303	#12912 Fix KafkaEmitter not emitting queryType for a native query (#12915 ) Fixes KafkaEmitter not emitting queryType for a native query. The Event to JSON serialization was extracted to the external class: EventToJsonSerializer. This was done to simplify the testing logic for the serialization as well as extract the responsibility of serialization to the separate class. The logic builds ObjectNode incrementally based on the event .toMap method. Parsing each entry individually ensures that the Jackson polymorphic annotations are respected. Not respecting these annotation caused the missing of the queryType from output event.	2022-08-24 14:07:00 +05:30
Adarsh Sanjeev	3b58a01c7c	Correct spelling in messages and variable names. (#12932 )	2022-08-24 11:06:31 +05:30
Clint Wylie	289e43281e	stricter behavior for parse_json, add try_parse_json, remove to_json (#12920 )	2022-08-22 18:41:07 -07:00
Rohan Garg	a879d91a20	Remove misleading logging on router for JDBC queries (#12925 )	2022-08-22 11:58:51 +05:30
Rohan Garg	3c129f6728	Add sql planning time metric (#12923 )	2022-08-22 11:09:44 +05:30
Karan Kumar	a3a9c5f409	Fixing overlord issued too many redirects (#12908 ) * Fixing race in overlord redirects where the node was redirecting to itself * Fixing test cases	2022-08-17 18:27:39 +05:30
Paul Rogers	41712b7a3a	Refactor SqlLifecycle into statement classes (#12845 ) * Refactor SqlLifecycle into statement classes Create direct & prepared statements Remove redundant exceptions from tests Tidy up Calcite query tests Make PlannerConfig more testable * Build fixes * Added builder to SqlQueryPlus * Moved Calcites system properties to saffron.properties * Build fix * Resolve merge conflict * Fix IntelliJ inspection issue * Revisions from reviews Backed out a revision to Calcite tests that didn't work out as planned * Build fix * Fixed spelling errors * Fixed failed test Prepare now enforces security; before it did not. * Rebase and fix IntelliJ inspections issue * Clean up exception handling * Fix handling of JDBC auth errors * Build fix * More tweaks to security messages	2022-08-14 00:44:08 -07:00
Gian Merlino	836430019a	Add EXTERNAL resource type. (#12896 ) This is used to control access to the EXTERN function, which allows reading external data in SQL. The EXTERN function is not usable in production as of today, but it is used by the task-based SQL engine contemplated in #12262.	2022-08-12 10:57:30 -07:00
Karan Kumar	2f2d8ded5a	Introducing Storage connector Interface (#12874 ) In the current druid code base, we have the interface DataSegmentPusher which allows us to push segments to the appropriate deep storage without the extension being worried about the semantics of how to push to deep storage. While working on #12262, part of whose code will go into an extension, I realized that we do not have an interface that allows us to do basic "write, get, delete, deleteAll" operations on the appropriate deep storage without, let's say, pulling the s3-storage-extension dependency into the custom extension. Hence, the idea of StorageConnector was born, where the storage connector sits inside the druid core so all extensions have access to it. Each deep storage implementation, e.g. s3, GCS, will implement this interface. Now with some Jackson magic, we bind the correct deep storage implementation at runtime using a type variable.	2022-08-12 16:11:49 +05:30
Adarsh Sanjeev	24f8f9e1ab	Add check for eternity time segment to SqlSegmentsMetadataQuery (#12844 ) * Add check for eternity time segment to SqlSegmentsMetadataQuery * Add check for half eternities * Add multiple segments test * Add failing test to document known issue	2022-08-04 22:33:08 -07:00
Paul Rogers	a618458bf0	Tidy up construction of the Guice Injectors (#12816 ) * Refactor Guice initialization Builders for various module collections Revise the extensions loader Injector builders for server startup Move Hadoop init to indexer Clean up server node role filtering Calcite test injector builder * Revisions from review comments * Build fixes * Revisions from review comments	2022-08-04 00:05:07 -07:00
Gian Merlino	ef6811ef88	Improved Java 17 support and Java runtime docs. (#12839 ) * Improved Java 17 support and Java runtime docs. 1) Add a "Java runtime" doc page with information about supported Java versions, garbage collection, and strong encapsulation. 2) Update asm and equalsverifier to versions that support Java 17. 3) Add additional "--add-opens" lines to surefire configuration, so tests can pass successfully under Java 17. 4) Switch openjdk15 tests to openjdk17. 5) Update FrameFile to specifically mention Java runtime incompatibility as the cause of not being able to use Memory.map. 6) Update SegmentLoadDropHandler to log an error for Errors too, not just Exceptions. This is important because an IllegalAccessError is encountered when the correct "--add-opens" line is not provided, which would otherwise be silently ignored. 7) Update example configs to use druid.indexer.runner.javaOptsArray instead of druid.indexer.runner.javaOpts. (The latter is deprecated.) * Adjustments. * Use run-java in more places. * Add run-java. * Update .gitignore. * Exclude hadoop-client-api. Brought in when building on Java 17. * Swap one more usage of java. * Fix the run-java script. * Fix flag. * Include link to Temurin. * Spelling. * Update examples/bin/run-java Co-authored-by: Xavier Léauté <xl+github@xvrl.net> Co-authored-by: Xavier Léauté <xl+github@xvrl.net>	2022-08-03 23:16:05 -07:00
Gian Merlino	2912a36a20	Use nonzero default value of maxQueuedBytes. (#12840 ) * Use nonzero default value of maxQueuedBytes. The purpose of this parameter is to prevent the Broker from running out of memory. The prior default is unlimited; this patch changes it to a relatively conservative 25MB. This may be too low for larger clusters. The risk is that throughput can decrease for queries with large resultsets or large amounts of intermediate data. However, I think this is better than the risk of the prior default, which is that these queries can cause the Broker to go OOM. * Alter calculation.	2022-08-02 17:57:27 -07:00
Atul Mohan	93a9a4b1c5	Add retention for file request logs (#12559 ) * Add retention for file request logs * Spelling	2022-07-27 08:17:02 -07:00
Rohan Garg	bf0886a8ab	Fix hash calculation in RendezvousHasher (#12817 )	2022-07-27 12:16:27 +05:30
Tejaswini Bandlamudi	5772dfd155	Peons should not report SysMonitor stats since MiddleManager reports them. (#12802 ) SysMonitor stats (mem, fs, disk, net, cpu, swap, sys, tcp) are reported by all Druid processes, including Peons that are ephemeral in nature. Since Peons always run on the same host as the MiddleManager that spawned them, and that host is unlikely to change, the SysMonitor metrics emitted by Peons are merely duplicates. This is often not a problem except when machines are super-beefy. Imagine a 64-core machine and 32 workers running on this machine. Now you will have each Peon reporting metrics for each core. That's an increase of (32 * 64)x in the number of metrics. This leads to a metric explosion. This PR updates MetricsModule to check the node role while registering SysMonitor and not to load any existing SysMonitor$Stats.	2022-07-23 13:32:16 +05:30
Tejaswini Bandlamudi	cc1ff56ca5	Unregisters `RealtimeMetricsMonitor`, `TaskRealtimeMetricsMonitor` on Indexers after task completion (#12743 ) Few indexing tasks register RealtimeMetricsMonitor or TaskRealtimeMetricsMonitor with the process’s MonitorScheduler when they start. These monitors never unregister themselves (they always return true, they'd need to return false to unregister). Each of these monitors emits a set of metrics once every druid.monitoring.emissionPeriod. As a result, after executing several tasks for a while, Indexer emits metrics of these tasks even after they're long gone. Proposed Solution Since one should be able to obtain the last round of ingestion metrics after the task unregisters the monitor, introducing lastRoundMetricsToBePushed variable to keep track of the same and overriding the AbstractMonitor.monitor method in RealtimeMetricsMonitor, TaskRealtimeMetricsMonitor to implement the new logic.	2022-07-18 14:34:18 +05:30
Clint Wylie	05b2e967ed	druid nested data column type (#12753 ) * add new druid nested data column type * fixes and such * fixes * adjustments, more tests * self review * oops * fix and test * more better * style	2022-07-14 12:07:23 -07:00
zachjsh	c0380e7b0a	* fix duplicate dimension (#12778 )	2022-07-14 10:39:03 +05:30
TSFenwick	8c02880d5f	Emit metrics for distribution of number of rows per segment (#12730 ) * initial commit of bucket dimensions for metrics return counts of segments that have rowcount in a bucket size for a datasource return average value of rowcount per segment in a datasource added unit test naming could use a lot of work buckets right now are not finalized added javadocs altered metrics.md * fix checkstyle issues * addressed review comments add monitor test move added functionality to new monitor update docs * address comments renamed monitor handle tombstones better update docs added javadocs * Add support for tombstones in the segment distribution * undo changes to tombstone segmentizer factory * fix accidental whitespacing changes * address comments regarding metrics documentation and rename variable to be more accurate * fix tests * fix checkstyle issues * fix broken test * undo removal of timeout	2022-07-12 07:04:42 -07:00
Rohan Garg	bb953be09b	Refactor usage of JoinableFactoryWrapper + more test coverage (#12767 ) Refactor usage of JoinableFactoryWrapper to add e2e test for createSegmentMapFn with joinToFilter feature enabled	2022-07-12 06:25:36 -07:00
Gian Merlino	d2576584a0	Consolidate the two TaskStatus classes. (#12765 ) * Consolidate the two TaskStatus classes. There are two, but we don't need more than one. * Fix import order.	2022-07-11 07:25:22 -07:00
Gian Merlino	378fea9517	Retain CSP configuration in ServerConfig constructor. (#12755 ) Without this change, CliIndexer would not apply custom CSP headers and would revert to the default.	2022-07-08 19:19:14 +05:30
Gian Merlino	2b330186e2	Mid-level service client and updated high-level clients. (#12696 ) * Mid-level service client and updated high-level clients. Our servers talk to each other over HTTP. We have a low-level HTTP client (HttpClient) that is super-asynchronous and super-customizable through its handlers. It's also proven to be quite robust: we use it for Broker -> Historical communication over the wide variety of query types and workloads we support. But the low-level client has no facilities for service location or retries, which means we have a variety of high-level clients that implement these in their own ways. Some high-level clients do a better job than others. This patch adds a mid-level ServiceClient that makes it easier for high-level clients to be built correctly and harmoniously, and migrates some of the high-level logic to use ServiceClients. Main changes: 1) Add ServiceClient org.apache.druid.rpc package. That package also contains supporting stuff like ServiceLocator and RetryPolicy interfaces, and a DiscoveryServiceLocator based on DruidNodeDiscoveryProvider. 2) Add high-level OverlordClient in org.apache.druid.rpc.indexing. 3) Indexing task client creator in TaskServiceClients. It uses SpecificTaskServiceLocator to find the tasks. This improves on ClientInfoTaskProvider by caching task locations for up to 30 seconds across calls, reducing load on the Overlord. 4) Rework ParallelIndexSupervisorTaskClient to use a ServiceClient instead of extending IndexTaskClient. 5) Rework RemoteTaskActionClient to use a ServiceClient instead of DruidLeaderClient. 6) Rework LocalIntermediaryDataManager, TaskMonitor, and ParallelIndexSupervisorTask. As a result, MiddleManager, Peon, and Overlord no longer need IndexingServiceClient (which internally used DruidLeaderClient). 
There are some concrete benefits over the prior logic, namely: - DruidLeaderClient does retries in its "go" method, but only retries exactly 5 times, does not sleep between retries, and does not retry retryable HTTP codes like 502, 503, 504. (It only retries IOExceptions.) ServiceClient handles retries in a more reasonable way. - DruidLeaderClient's methods are all synchronous, whereas ServiceClient methods are asynchronous. This is used in one place so far: the SpecificTaskServiceLocator, so we don't need to block a thread trying to locate a task. It can be used in other places in the future. - HttpIndexingServiceClient does not properly handle all server errors. In some cases, it tries to parse a server error as a successful response (for example: in getTaskStatus). - IndexTaskClient currently makes an Overlord call on every task-to-task HTTP request, as a way to find where the target task is. ServiceClient, through SpecificTaskServiceLocator, caches these target locations for a period of time. * Style adjustments. * For the coverage. * Adjustments. * Better behaviors. * Fixes.	2022-07-05 09:43:26 -07:00
Kashif Faraz	f5b5cb93ea	Fix expiry timeout bug in LocalIntermediateDataManager (#12722 ) The expiry timeout is compared against the current time but the condition is reversed. This means that as soon as a supervisor task finishes, its partitions are cleaned up, irrespective of the specified `intermediaryPartitionTimeout` period. After these changes, the `intermediaryPartitionTimeout` will start getting honored. Changes * Fix the condition * Add tests to verify the new correct behaviour * Reduce the default expiry timeout from P1D to PT5M to retain current behaviour in case of default configs.	2022-07-01 16:29:22 +05:30
Gian Merlino	679ccffe0f	Revert "SqlSegmentsMetadataQuery: Fix OVERLAPS for wide target segments. (#12600 )" (#12679 ) This reverts commit `8fbf92e047`.	2022-06-25 09:08:26 +05:30
Tejaswini Bandlamudi	1fc2f6e4b0	Throw BadQueryContextException if context params cannot be parsed (#12680 )	2022-06-24 09:21:25 +05:30
Paul Rogers	ffcb996468	Cleanup changes pulled out of PR #12368 (#12672 ) This commit contains the cleanup needed for the new integration test framework. Changes: - Fix log lines, misspellings, docs, etc. - Allow the use of some of Druid's "JSON config" objects in tests - Fix minor bug in `BaseNodeRoleWatcher`	2022-06-23 23:19:50 +05:30
AmatyaAvadhanula	eccdec9139	Reduce interval creation cost for segment cost computation (#12670 ) Changes: - Reuse created interval in `SegmentId.getInterval()` - Intern intervals to save on memory footprint	2022-06-21 17:39:43 +05:30
Paul Rogers	893759de91	Remove null and empty fields from native queries (#12634 ) * Remove null and empty fields from native queries * Test fixes * Attempted IT fix. * Revisions from review comments * Build fixes resulting from changes suggested by reviews * IT fix for changed segment size	2022-06-16 14:07:25 -07:00
AmatyaAvadhanula	f970757efc	Optimize overlord GET /tasks memory usage (#12404 ) The web-console (indirectly) calls the Overlord’s GET tasks API to fetch the tasks' summary which in turn queries the metadata tasks table. This query tries to fetch several columns, including payload, of all the rows at once. This introduces a significant memory overhead and can cause unresponsiveness or overlord failure when the ingestion tab is opened multiple times (due to several parallel calls to this API) Another thing to note is that the task table (the payload column in particular) can be very large. Extracting large payloads from such tables can be very slow, leading to slow UI. While we are fixing the memory pressure in the overlord, we can also fix the slowness in UI caused by fetching large payloads from the table. Fetching large payloads also puts pressure on the metadata store as reported in the community (Metadata store query performance degrades as the tasks in druid_tasks table grows · Issue #12318 · apache/druid ) The task summaries returned as a response for the API are several times smaller and can fit comfortably in memory. So, there is an opportunity here to fix the memory usage, slow ingestion, and under-pressure metadata store by removing the need to handle large payloads in every layer we can. Of course, the solution becomes complex as we try to fix more layers. With that in mind, this page captures two approaches. They vary in complexity and also in the degree to which they fix the aforementioned problems.	2022-06-16 22:30:37 +05:30
Lucas Capistrant	602d95d865	Add a builder class for TestDruidCoordinatorConfig (#12624 ) * Add a builder class for TestDruidCoordinatorConfig * updates after review * Fix formatting	2022-06-16 09:11:31 -05:00
Paul Rogers	45e3111549	Clean up query contexts (#12633 ) * Clean up query contexts Uses constants in place of literal strings for context keys. Moves some QueryContext methods to QueryContexts for reuse. * Revisions from review comments	2022-06-15 11:31:22 -07:00
Gian Merlino	1f6e888472	Add QoSFilters first in the chain. (#12625 ) * Add QoSFilters first in the chain. When a request is suspended and later resumed due to QoS constraints, its filter chain is restarted. Placing QoSFilters first in the chain avoids double-execution of other filters. Fixes an issue where requests deferred by QoS would report 403 Forbidden due to double-execution of SecuritySanityCheckFilter. * Smaller changes. * Add QoS filters in BaseJettyTest. * Remove unused parameter.	2022-06-14 13:37:00 -07:00
Gian Merlino	8fbf92e047	SqlSegmentsMetadataQuery: Fix OVERLAPS for wide target segments. (#12600 ) * SqlSegmentsMetadataQuery: Fix OVERLAPS for wide target segments. Segments with endpoints prior to year 0 or after year 9999 may overlap the search intervals but not match the generated SQL conditions. So, we need to add an additional OR condition to catch these. I checked a real, live MySQL metadata store to confirm that the query still uses metadata store indexes. It does. * Add comments.	2022-06-07 11:33:46 -07:00
Abhishek Agarwal	59a0c10c47	Add remedial information in error message when type is unknown (#12612 ) Users often submit queries and ingestion specs that work only if the relevant extension is loaded. However, the error is too technical for the users and doesn't suggest that they check for missing extensions. This PR modifies the error message so users can at least check their settings before assuming that the error is because of a bug.	2022-06-07 20:22:45 +05:30
Gian Merlino	a503683a4a	Add caching and CSP response headers. (#12609 ) * Add caching and CSP response headers. * Fix tests. * Fix checkstyle issues Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2022-06-04 21:46:49 +05:30
Clint Wylie	d0c9c37e35	make query context changes backwards compatible (#12564 ) Adds a default implementation of getQueryContext, which was added to the Query interface in #12396. Query is marked with @ExtensionPoint, and lately we have been trying to be less volatile on these interfaces by providing default implementations to be more chill for extension writers. The way this default implementation is done in this PR is a bit strange due to the way that getQueryContext is used (mutated with system default and system generated keys); the default implementation has a specific object that it returns, and I added another temporary default method isLegacyContext that checks if the getQueryContext returns that object or not. If not, callers fall back to using getContext and withOverriddenContext to set these default and system values. I am open to other ideas as well, but this way should work at least without exploding, and added some tests to ensure that it is wired up correctly for QueryLifecycle, including the context authorization stuff. The added test shows the strange behavior if query context authorization is enabled, mainly that the system default and system generated query context keys also need to be granted as permissions for things to function correctly. This is not great, so I mentioned it in the javadocs as well. Not sure if it needs to be called out anywhere else.	2022-05-25 15:24:41 +05:30
Agustin Gonzalez	2f3d7a4c07	Emit state of replace and append for native batch tasks (#12488 ) * Emit state of replace and append for native batch tasks * Emit count of one depending on batch ingestion mode (APPEND, OVERWRITE, REPLACE) * Add metric to compaction job * Avoid null ptr exc when null emitter * Coverage * Emit tombstone & segment counts * Tasks need a type * Spelling * Integrate BatchIngestionMode in batch ingestion tasks functionality * Typos * Remove batch ingestion type from metric since it is already in a dimension. Move IngestionMode to AbstractTask to facilitate having mode as a dimension. Add metrics to streaming. Add missing coverage. * Avoid inner class referenced by sub-class inspection. Refactor computation of IngestionMode to make it more robust to null IOConfig and fix test. * Spelling * Avoid polluting the Task interface * Rename computeCompaction methods to avoid ambiguous java compiler error if they are passed null. Other minor cleanup.	2022-05-23 12:32:47 -07:00
superivaj	f9bdb3b236	Fix usage of maxColumnsToMerge in auto-compaction tuning config (#12551 ) Issue: Even though `CompactionTuningConfig` allows a `maxColumnsToMerge` config (to optimize memory usage, particularly for datasources with many dimensions), the corresponding client object `ClientCompactionTaskQueryTuningConfig` (used by the coordinator duty `CompactSegments` to trigger auto-compaction) does not contain this field. Thus, the value of `maxColumnsToMerge` specified in any datasource compaction config is ignored. Changes: - Add field `maxColumnsToMerge` in `ClientCompactionTaskQueryTuningConfig` and `UserCompactionTaskQueryTuningConfig` - Fix tests	2022-05-20 22:23:08 +05:30
Gian Merlino	4631cff2a9	Free ByteBuffers in tests and fix some bugs. (#12521 ) * Ensure ByteBuffers allocated in tests get freed. Many tests had problems where a direct ByteBuffer would be allocated and then not freed. This is bad because it causes flaky tests. To fix this: 1) Add ByteBufferUtils.allocateDirect(size), which returns a ResourceHolder. This makes it easy to free the direct buffer. Currently, it's only used in tests, because production code seems OK. 2) Update all usages of ByteBuffer.allocateDirect (off-heap) in tests either to ByteBuffer.allocate (on-heap, which are garbage collected), or to ByteBufferUtils.allocateDirect (wherever it seemed like there was a good reason for the buffer to be off-heap). Make sure to close all direct holders when done. * Changes based on CI results. * A different approach. * Roll back BitmapOperationTest stuff. * Try additional surefire memory. * Revert "Roll back BitmapOperationTest stuff." This reverts commit `49f846d9e3`. * Add TestBufferPool. * Revert Xmx change in tests. * Better behaved NestedQueryPushDownTest. Exit tests on OOME. * Fix TestBufferPool. * Remove T1C from ARM tests. * Somewhat safer. * Fix tests. * Fix style stuff. * Additional debugging. * Reset null / expr configs better. * ExpressionLambdaAggregatorFactory thread-safety. * Alter forkNode to try to get better info when a JVM crashes. * Fix buffer retention in ExpressionLambdaAggregatorFactory. * Remove unused import.	2022-05-19 07:42:29 -07:00
Tejaswini Bandlamudi	c877d8a981	Updates default inputSegmentSizeBytes in Compaction config (#12534 ) Fixes Cannot serialize BigInt value as JSON error while loading compaction config in console.	2022-05-19 14:43:34 +05:30
Clint Wylie	b23ddc5939	print replication levels in coordinator segment logs (#12511 ) * print replication levels in coordinator segment logs * add served segment count to stats * also for drops	2022-05-17 02:24:13 -07:00
Lucas Capistrant	deb69d1bc0	Allow coordinator to be configured to kill segments in future (#10877 ) Allow a Druid cluster to kill segments whose interval_end is a date in the future. This can be done by setting druid.coordinator.kill.durationToRetain to a negative period. For example PT-24H would allow segments to be killed if their interval_end date was 24 hours or less into the future at the time that the kill task is generated by the system. A cluster operator can also disregard the druid.coordinator.kill.durationToRetain entirely by setting a new configuration, druid.coordinator.kill.ignoreDurationToRetain=true. This ignores interval_end date when looking for segments to kill, and instead is capable of killing any segment marked unused. This new configuration is off by default, and a cluster operator should fully understand and accept the risks if they enable it.	2022-05-11 07:35:15 +05:30
Vadim Ogievetsky	fb08bac01a	Web console: Misc table fixes (#12489 ) * Misc table fixes * extract default className * table spacing updates * fix e2e action selector * try more times * make the web console exist again	2022-05-03 12:08:08 -07:00
MC-JY	bb080693a9	Improve build performance of modules (#12486 ) * improve build performance of modules * improve build performance of modules * Update pom.xml * improve build performance of modules	2022-05-01 22:43:11 +08:00
Gian Merlino	529b983ad0	GroupBy: Reduce allocations by reusing entry and key holders. (#12474 ) * GroupBy: Reduce allocations by reusing entry and key holders. Two main changes: 1) Reuse Entry objects returned by various implementations of Grouper.iterator. 2) Reuse key objects contained within those Entry objects. This is allowed by the contract, which states that entries must be processed and immediately discarded. However, not all call sites respected this, so this patch also updates those call sites. One particularly sneaky way that the old code retained entries too long is due to Guava's MergingIterator and CombiningIterator. Internally, these both advance to the next value prior to returning the current value. So, this patch addresses that in two ways: 1) For merging, we have our own implementation MergeIterator already, although it had the same problem. So, this patch updates our implementation to return the current item prior to advancing to the next item. It also adds a forbidden-api entry to ensure that this safer implementation is used instead of Guava's. 2) For combining, we address the problem in a different way: by copying the key when creating the new, combined entry. * Attempt to fix test. * Remove unused import.	2022-04-28 23:21:13 -07:00
Gian Merlino	a2bad0b3a2	Reduce allocations due to Jackson serialization. (#12468 ) * Reduce allocations due to Jackson serialization. This patch attacks two sources of allocations during Jackson serialization: 1) ObjectMapper.writeValue and JsonGenerator.writeObject create a new DefaultSerializerProvider instance for each call. It has lots of fields and creates pressure on the garbage collector. So, this patch adds helper functions in JacksonUtils that enable reuse of SerializerProvider objects and updates various call sites to make use of this. 2) GroupByQueryToolChest copies the ObjectMapper for every query to install a special module that supports backwards compatibility with map-based rows. This isn't needed if resultAsArray is set and all servers are running Druid 0.16.0 or later. This release was a while ago. So, this patch disables backwards compatibility by default, which eliminates the need to copy the heavyweight ObjectMapper. The patch also introduces a configuration option that allows admins to explicitly enable backwards compatibility. * Add test. * Update additional call sites and add to forbidden APIs.	2022-04-27 14:17:26 -07:00
Abhishek Agarwal	2fe053c5cb	Bump up the versions (#12480 )	2022-04-27 14:28:20 +05:30
zachjsh	564d6defd4	Worker level task metrics (#12446 ) * * fix metric name inconsistency * * add task slot metrics for middle managers * * add new WorkerTaskCountStatsMonitor to report task count metrics from worker * * more stuff * * remove unused variable * * more stuff * * add javadocs * * fix checkstyle * * fix hadoop test failure * * cleanup * * add more code coverage in tests * * fix test failure * * add docs * * increase code coverage * * fix spelling * * fix failing tests * * remove dead code * * fix spelling	2022-04-26 11:44:44 -05:00
Rohan Garg	95694b5afa	Convert simple min/max SQL queries on __time to timeBoundary queries (#12472 ) * Support array based results in timeBoundary query * Fix bug with query interval in timeBoundary * Convert min(__time) and max(__time) SQL queries to timeBoundary * Add tests for timeBoundary backed SQL queries * Fix query plans for existing tests * fixup! Convert min(__time) and max(__time) SQL queries to timeBoundary * fixup! Add tests for timeBoundary backed SQL queries * fixup! Fix bug with query interval in timeBoundary	2022-04-25 08:18:58 -07:00
Gian Merlino	b7621226d2	QueryScheduler: Log per-query message at DEBUG level. (#12467 ) We generally want to avoid having any routine per-query messages at INFO level, because they pollute logs.	2022-04-22 11:22:34 -07:00
Jihoon Son	73ce5df22d	Add support for authorizing query context params (#12396 ) The query context is a way that the user gives a hint to the Druid query engine, so that they enforce a certain behavior or at least let the query engine prefer a certain plan during query planning. Today, there are 3 types of query context params as below. Default context params. They are set via druid.query.default.context in runtime properties. Any user context params can be default params. User context params. They are set in the user query request. See https://druid.apache.org/docs/latest/querying/query-context.html for parameters. System context params. They are set by the Druid query engine during query processing. These params override other context params. Today, any context params are allowed to users. This can cause 1) a bad UX if the context param is not matured yet or 2) even query failure or system fault in the worst case if a sensitive param is abused, ex) maxSubqueryRows. This PR adds an ability to limit context params per user role. That means, a query will fail if you have a context param set in the query that is not allowed to you. To do that, this PR adds a new built-in resource type, QUERY_CONTEXT. The resource to authorize has a name of the context param (such as maxSubqueryRows) and the type of QUERY_CONTEXT. To allow a certain context param for a user, the user should be granted WRITE permission on the context param resource. Here is an example of the permission. { "resourceAction" : { "resource" : { "name" : "maxSubqueryRows", "type" : "QUERY_CONTEXT" }, "action" : "WRITE" }, "resourceNamePattern" : "maxSubqueryRows" } Each role can have multiple permissions for context params. Each permission should be set for different context params. When a query is issued with a query context X, the query will fail if the user who issued the query does not have WRITE permission on the query context X. In this case, HTTP endpoints will return 403 response code. JDBC will throw ForbiddenException. Note: there is a context param called brokerService that is used only by the router. This param is used to pin your query to run it in a specific broker. Because the authorization is done not in the router, but in the broker, if you have brokerService set in your query without a proper permission, your query will fail in the broker after routing is done. Technically, this is not right because the authorization is checked after the context param takes effect. However, this should not cause any user-facing issue and thus should be OK. The query will still fail if the user doesn’t have permission for brokerService. The context param authorization can be enabled using druid.auth.authorizeQueryContextParams. This is disabled by default to avoid any hassle when someone upgrades his cluster blindly without reading release notes.	2022-04-21 14:21:16 +05:30
Maytas Monsereenusorn	c25a556827	Fix bug in auto compaction preserveExistingMetrics feature (#12438 ) * fix bug * fix test * fix IT	2022-04-15 15:47:47 -07:00
Agustin Gonzalez	0460d45e92	Make tombstones ingestible by having them return an empty result set. (#12392 ) * Make tombstones ingestible by having them return an empty result set. * Spotbug * Coverage * Coverage * Remove unnecessary exception (checkstyle) * Fix integration test and add one more to test dropExisting set to false over tombstones * Force dropExisting to true in auto-compaction when the interval contains only tombstones * Checkstyle, fix unit test * Changed flag by mistake, fixing it * Remove method from interface since this method is specific to only DruidSegmentInputentity * Fix typo * Adapt to latest code * Update comments when only tombstones to compact * Move empty iterator to a new DruidTombstoneSegmentReader * Code review feedback * Checkstyle * Review feedback * Coverage	2022-04-15 09:08:06 -07:00
Parag Jain	2c79d28bb7	Copy of #11309 with fixes (#12402 ) * Optionally load segment index files into page cache on bootstrap and new segment download * Fix unit test failure * Fix test case * fix spelling * fix spelling * fix test and test coverage issues Co-authored-by: Jian Wang <wjhypo@gmail.com>	2022-04-11 21:05:24 +05:30
Maytas Monsereenusorn	36e17a20ea	Improve metrics for Auto Compaction (#12413 ) * add impl * add docs * fix	2022-04-08 20:14:36 -07:00
Maytas Monsereenusorn	8edea5a82d	Add a new flag for ingestion to preserve existing metrics (#12185 ) * add impl * add impl * fix checkstyle * add impl * add unit test * fix stuff * fix stuff * fix stuff * add unit test * add more unit tests * add more unit tests * add IT * add IT * add IT * add IT * add ITs * address comments * fix test * fix test * fix test * address comments * address comments * address comments * fix conflict * fix checkstyle * address comments * fix test * fix checkstyle * fix test * fix test * fix IT	2022-04-08 11:02:02 -07:00
Paul Rogers	2cc2088720	Method to specify eternity in the scan query builder (#12223 ) * Method to specify eternity in the scan query builder * Fix checkstyle issue * Renamed eterity() to eternityInterval() * Minor fixes	2022-04-04 15:11:32 -07:00
Tejaswini Bandlamudi	984904779b	Increase default DatasourceCompactionConfig.inputSegmentSizeBytes to Long.MAX_VALUE (#12381 ) The current default value of inputSegmentSizeBytes is 400MB, which is pretty low for most compaction use cases. Thus most users are forced to override the default. The default value is now increased to Long.MAX_VALUE.	2022-04-04 16:28:53 +05:30
Yuanli Han	f2495a67d2	fix messageGap metric (#12337 )	2022-03-28 09:21:06 -07:00
Maytas Monsereenusorn	ea51d8a16c	Duties in Indexing group (such as Auto Compaction) does not report metrics (#12352 ) * add impl * add unit tests * fix checkstyle * address comments * fix checkstyle	2022-03-23 18:18:28 -07:00
Jihoon Son	b6eeef31e5	Store null columns in the segments (#12279 ) * Store null columns in the segments * fix test * remove NullNumericColumn and unused dependency * fix compile failure * use guava instead of apache commons * split new tests * unused imports * address comments	2022-03-23 16:54:04 -07:00
Maytas Monsereenusorn	dbb9518f50	Fix auto compaction by adjusting compaction task's interval to align with segmentGranularity when segmentGranularity is set (#12334 ) * add impl * add ITs * address comments * address comments * address comments * fix failure * fix checkstyle * fix checkstyle	2022-03-18 12:46:16 -07:00
Jihoon Son	5e23674fe5	Fix a race condition in the '/tasks' Overlord API (#12330 ) * finds complete and active tasks from the same snapshot * overlord resource * unit test * integration test * javadoc and cleanup * more cleanup * fix test and add more	2022-03-17 10:47:45 +09:00
AmatyaAvadhanula	7bf1d8c5c0	Facilitate lazy initialization of connections to mitigate overwhelming of Coordinator (#12298 ) Add config for eager / lazy connection initialization in ResourcePool Description Currently, when multiple tasks are launched, each of them eagerly initializes a full pool's worth of connections to the coordinator. While this is acceptable when the parameter for number of eagerConnections (== maxSize) is small, this can be problematic in environments where it's a large value (say 1000) and multiple tasks are launched simultaneously, which can cause a large number of connections to be created to the coordinator, thereby overwhelming it. Patch Nodes like the broker may require eager initialization of resources and do not create connections with the Coordinator. It is unnecessary to do this with other types of nodes. A config parameter eagerInitialization is added, which when set to true, initializes the max permissible connections when ResourcePool is initialized. If set to false, lazy initialization of connection resources takes place. NOTE: All nodes except the broker have this new parameter set to false in the quickstart as part of this PR Algorithm The current implementation relies on the creation of maxSize resources eagerly. The new implementation's behaviour is as follows: If a resource has been previously created and is available, lend it. Else if the number of created resources is less than the allowed parameter, create and lend it. Else, wait for one of the lent resources to be returned.	2022-03-09 23:17:43 +05:30
Agustin Gonzalez	abe76ccb90	Batch ingestion replace (#12137 ) * Tombstone support for replace functionality * A used segment interval is the interval of a current used segment that overlaps any of the input intervals for the spec * Update compaction test to match replace behavior * Adapt ITAutoCompactionTest to work with tombstones rather than dropping segments. Add support for tombstones in the broker. * Style plus simple queriableindex test * Add segment cache loader tombstone test * Add more tests * Add a method to the LogicalSegment to test whether it has any data * Test filter with some empty logical segments * Refactor more compaction/dropexisting tests * Code coverage * Support for all empty segments * Skip tombstones when looking-up broker's timeline. Discard changes made to tool chest to avoid empty segments since they will no longer have empty segments after lookup because we are skipping over them. * Fix null ptr when segment does not have a queriable index * Add support for empty replace interval (all input data has been filtered out) * Fixed coverage & style * Find tombstone versions from lock versions * Test failures & style * Interner was making this fail since the two segments were consider equal due to their id's being equal * Cleanup tombstone version code * Force timeChunkLock whenever replace (i.e. dropExisting=true) is being used * Reject replace spec when input intervals are empty * Documentation * Style and unit test * Restore test code deleted by mistake * Allocate forces TIME_CHUNK locking and uses lock versions. TombstoneShardSpec added. * Unused imports. Dead code. Test coverage. * Coverage. * Prevent killer from throwing an exception for tombstones. This is the killer used in the peon for killing segments. * Fix OmniKiller + more test coverage. * Tombstones are now marked using a shard spec * Drop a segment factory.json in the segment cache for tombstones * Style * Style + coverage * style * Add TombstoneLoadSpec.class to mapper in test * Update core/src/main/java/org/apache/druid/segment/loading/TombstoneLoadSpec.java Typo Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Update docs/configuration/index.md Missing Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Typo * Integrated replace with an existing test since the replace part was redundant and more importantly, the test file was very close or exceeding the 10 min default "no output" CI Travis threshold. * Range does not work with multi-dim Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>	2022-03-08 20:07:02 -07:00
Gian Merlino	28f8bcce9b	Always reopen stream in FileUtils.copyLarge, RetryingInputStream. (#12307 ) * Always reopen stream in FileUtils.copyLarge, RetryingInputStream. When an InputStream throws an exception from one of its read methods, we should assume it's bad and reopen it. The main changes here are: - In FileUtils.copyLarge, replace InputStream with InputStreamSupplier. - In RetryingInputStream, collapse retryCondition and resetCondition into a single condition. Also, make it required, since every usage is passing in a specific condition anyway. * Test fixes. * Fix read impl.	2022-03-05 14:39:14 -08:00
Sandeep	61e1ffc7f7	add a new query laning metrics to visualize lane assignment (#12111 ) * add a new query laning metrics to visualize lane assignment * fixes :spotbugs check * Update docs/operations/metrics.md Co-authored-by: Benedict Jin <asdf2014@apache.org> * Update server/src/main/java/org/apache/druid/server/QueryScheduler.java Co-authored-by: Benedict Jin <asdf2014@apache.org> * Update server/src/main/java/org/apache/druid/server/QueryScheduler.java Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Benedict Jin <asdf2014@apache.org>	2022-03-04 15:21:17 +08:00
Laksh Singla	3f709db173	Make ParseExceptions more informative (#12259 ) This PR aims to make the ParseExceptions in Druid more informative, by adding additional information (metadata) to the ParseException, which can contain additional information about the exception. For example - the path of the file generating the issue, the line number (where it can be easily fetched - like CsvReader) Following changes are addressed in this PR: A new class CloseableIteratorWithMetadata has been created which is like CloseableIterator but also has a metadata method that returns a context Map<String, Object> about the current element returned by next(). IntermediateRowParsingReader#read() now attaches the InputEntity and the "record number" which created the exception (while parsing them), and IntermediateRowParsingReader#sample attaches the InputEntity (but not the "record number"). TextReader (and its subclasses), which is a specific implementation of the IntermediateRowParsingReader also include the line number which caused the generation of the error. This will also help in triaging the issues when InputSourceReader generates ParseException because it can point to the specific InputEntity which caused the exception (while trying to read it).	2022-02-28 22:31:15 +05:30
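
The idea in 3f709db173 (an iterator that also exposes context about the element most recently returned by next()) can be illustrated with a minimal sketch. This is not the Druid implementation: the interface name IteratorWithMetadata, the method name currentMetadata(), the "lineNumber" key, and the helper firstParseError() are all hypothetical names chosen for illustration, and the real CloseableIteratorWithMetadata may differ in signature and behavior.

```java
import java.io.Closeable;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class MetadataIteratorSketch {

  /** Hypothetical stand-in for the iterator described in the commit message. */
  interface IteratorWithMetadata<T> extends Iterator<T>, Closeable {
    /** Context about the element most recently returned by next(). */
    Map<String, Object> currentMetadata();

    /** Narrowed so that this sketch never has to handle IOException. */
    @Override
    void close();
  }

  /** Iterates over in-memory lines and reports a 1-based line number as metadata. */
  static final class LineIterator implements IteratorWithMetadata<String> {
    private final Iterator<String> delegate;
    private int lineNumber = 0;

    LineIterator(List<String> lines) {
      this.delegate = lines.iterator();
    }

    @Override
    public boolean hasNext() {
      return delegate.hasNext();
    }

    @Override
    public String next() {
      String line = delegate.next(); // throws NoSuchElementException when exhausted
      lineNumber++;
      return line;
    }

    @Override
    public Map<String, Object> currentMetadata() {
      return Collections.<String, Object>singletonMap("lineNumber", lineNumber);
    }

    @Override
    public void close() {
      // Nothing to release for an in-memory list.
    }
  }

  /**
   * Parses each line as a long. Returns a message naming the first offending
   * line together with its metadata, or null if every line parses.
   */
  static String firstParseError(List<String> lines) {
    try (LineIterator it = new LineIterator(lines)) {
      while (it.hasNext()) {
        String line = it.next();
        try {
          Long.parseLong(line.trim());
        } catch (NumberFormatException e) {
          return "Unparseable value [" + line + "] " + it.currentMetadata();
        }
      }
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(firstParseError(Arrays.asList("1", "2", "oops", "4")));
  }
}
```

Because the metadata is read from the iterator at the moment parsing fails, the error message can point at the offending record without the parser itself tracking positions.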


4183 Commits