druid

Commit Graph

Author	SHA1	Message	Date
Clint Wylie	d0c9c37e35	make query context changes backwards compatible (#12564 ) Adds a default implementation of getQueryContext, which was added to the Query interface in #12396. Query is marked with @ExtensionPoint, and lately we have been trying to be less volatile on these interfaces by providing default implementations to be more chill for extension writers. The way this default implementation is done in this PR is a bit strange due to the way that getQueryContext is used (mutated with system default and system generated keys); the default implementation has a specific object that it returns, and I added another temporary default method isLegacyContext that checks if the getQueryContext returns that object or not. If not, callers fall back to using getContext and withOverriddenContext to set these default and system values. I am open to other ideas as well, but this way should work at least without exploding, and added some tests to ensure that it is wired up correctly for QueryLifecycle, including the context authorization stuff. The added test shows the strange behavior if query context authorization is enabled, mainly that the system default and system generated query context keys also need to be granted as permissions for things to function correctly. This is not great, so I mentioned it in the javadocs as well. Not sure if it needs to be called out anywhere else.	2022-05-25 15:24:41 +05:30
Agustin Gonzalez	2f3d7a4c07	Emit state of replace and append for native batch tasks (#12488 ) * Emit state of replace and append for native batch tasks * Emit count of one depending on batch ingestion mode (APPEND, OVERWRITE, REPLACE) * Add metric to compaction job * Avoid null ptr exc when null emitter * Coverage * Emit tombstone & segment counts * Tasks need a type * Spelling * Integrate BatchIngestionMode in batch ingestion tasks functionality * Typos * Remove batch ingestion type from metric since it is already in a dimension. Move IngestionMode to AbstractTask to facilitate having mode as a dimension. Add metrics to streaming. Add missing coverage. * Avoid inner class referenced by sub-class inspection. Refactor computation of IngestionMode to make it more robust to null IOConfig and fix test. * Spelling * Avoid polluting the Task interface * Rename computeCompaction methods to avoid ambiguous java compiler error if they are passed null. Other minor cleanup.	2022-05-23 12:32:47 -07:00
superivaj	f9bdb3b236	Fix usage of maxColumnsToMerge in auto-compaction tuning config (#12551 ) Issue: Even though `CompactionTuningConfig` allows a `maxColumnsToMerge` config (to optimize memory usage, particularly for datasources with many dimensions), the corresponding client object `ClientCompactionTaskQueryTuningConfig` (used by the coordinator duty `CompactSegments` to trigger auto-compaction) does not contain this field. Thus, the value of `maxColumnsToMerge` specified in any datasource compaction config is ignored. Changes: - Add field `maxColumnsToMerge` in `ClientCompactionTaskQueryTuningConfig` and `UserCompactionTaskQueryTuningConfig` - Fix tests	2022-05-20 22:23:08 +05:30
Gian Merlino	4631cff2a9	Free ByteBuffers in tests and fix some bugs. (#12521 ) * Ensure ByteBuffers allocated in tests get freed. Many tests had problems where a direct ByteBuffer would be allocated and then not freed. This is bad because it causes flaky tests. To fix this: 1) Add ByteBufferUtils.allocateDirect(size), which returns a ResourceHolder. This makes it easy to free the direct buffer. Currently, it's only used in tests, because production code seems OK. 2) Update all usages of ByteBuffer.allocateDirect (off-heap) in tests either to ByteBuffer.allocate (on-heap, which are garbage collected), or to ByteBufferUtils.allocateDirect (wherever it seemed like there was a good reason for the buffer to be off-heap). Make sure to close all direct holders when done. * Changes based on CI results. * A different approach. * Roll back BitmapOperationTest stuff. * Try additional surefire memory. * Revert "Roll back BitmapOperationTest stuff." This reverts commit `49f846d9e3`. * Add TestBufferPool. * Revert Xmx change in tests. * Better behaved NestedQueryPushDownTest. Exit tests on OOME. * Fix TestBufferPool. * Remove T1C from ARM tests. * Somewhat safer. * Fix tests. * Fix style stuff. * Additional debugging. * Reset null / expr configs better. * ExpressionLambdaAggregatorFactory thread-safety. * Alter forkNode to try to get better info when a JVM crashes. * Fix buffer retention in ExpressionLambdaAggregatorFactory. * Remove unused import.	2022-05-19 07:42:29 -07:00
Tejaswini Bandlamudi	c877d8a981	Updates default inputSegmentSizeBytes in Compaction config (#12534 ) Fixes Cannot serialize BigInt value as JSON error while loading compaction config in console.	2022-05-19 14:43:34 +05:30
Clint Wylie	b23ddc5939	print replication levels in coordinator segment logs (#12511 ) * print replication levels in coordinator segment logs * add served segment count to stats * also for drops	2022-05-17 02:24:13 -07:00
Lucas Capistrant	deb69d1bc0	Allow coordinator to be configured to kill segments in future (#10877 ) Allow a Druid cluster to kill segments whose interval_end is a date in the future. This can be done by setting druid.coordinator.kill.durationToRetain to a negative period. For example PT-24H would allow segments to be killed if their interval_end date was 24 hours or less into the future at the time that the kill task is generated by the system. A cluster operator can also disregard the druid.coordinator.kill.durationToRetain entirely by setting a new configuration, druid.coordinator.kill.ignoreDurationToRetain=true. This ignores interval_end date when looking for segments to kill, and instead is capable of killing any segment marked unused. This new configuration is off by default, and a cluster operator should fully understand and accept the risks if they enable it.	2022-05-11 07:35:15 +05:30
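A minimal sketch of the eligibility rule described in #10877, assuming the cutoff is computed as "now minus durationToRetain" and compared inclusively against interval_end. The class and method names here are hypothetical and are not taken from the Druid code base; `java.time.Duration` stands in for whatever period type the coordinator actually uses.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration only; not the coordinator's actual implementation.
final class KillCutoffSketch {
    /**
     * Returns true if an unused segment ending at intervalEnd may be killed.
     * A negative durationToRetain (for example PT-24H) moves the cutoff into the future.
     */
    static boolean eligible(Instant intervalEnd,
                            Instant now,
                            Duration durationToRetain,
                            boolean ignoreDurationToRetain) {
        if (ignoreDurationToRetain) {
            // interval_end is not consulted at all in this mode
            return true;
        }
        // Subtracting a negative duration yields a point in the future
        Instant cutoff = now.minus(durationToRetain);
        return !intervalEnd.isAfter(cutoff);
    }
}
```

Under these assumptions, a segment whose interval_end lies 12 hours ahead is eligible with `PT-24H` but not with `PT24H`.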
Vadim Ogievetsky	fb08bac01a	Web console: Misc table fixes (#12489 ) * Misc table fixes * extract default className * table spacing updates * fix e2e action selector * try more times * make the web console exist again	2022-05-03 12:08:08 -07:00
MC-JY	bb080693a9	Improve build performance of modules (#12486 ) * improve build performance of modules * improve build performance of modules * Update pom.xml * improve build performance of modules	2022-05-01 22:43:11 +08:00
Gian Merlino	529b983ad0	GroupBy: Reduce allocations by reusing entry and key holders. (#12474 ) * GroupBy: Reduce allocations by reusing entry and key holders. Two main changes: 1) Reuse Entry objects returned by various implementations of Grouper.iterator. 2) Reuse key objects contained within those Entry objects. This is allowed by the contract, which states that entries must be processed and immediately discarded. However, not all call sites respected this, so this patch also updates those call sites. One particularly sneaky way that the old code retained entries too long is due to Guava's MergingIterator and CombiningIterator. Internally, these both advance to the next value prior to returning the current value. So, this patch addresses that in two ways: 1) For merging, we have our own implementation MergeIterator already, although it had the same problem. So, this patch updates our implementation to return the current item prior to advancing to the next item. It also adds a forbidden-api entry to ensure that this safer implementation is used instead of Guava's. 2) For combining, we address the problem in a different way: by copying the key when creating the new, combined entry. * Attempt to fix test. * Remove unused import.	2022-04-28 23:21:13 -07:00
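The hazard described in #12474 can be reproduced in a few lines. The sketch below is not Druid's Grouper or MergeIterator; `Holder`, `reusing`, `lookAhead`, and `consumeImmediately` are hypothetical names chosen to show why a consumer that advances before using the current entry observes corrupted values once entries are reused.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration of entry reuse; not Druid code.
final class ReusedEntrySketch {
    static final class Holder {
        int value;
    }

    // Returns the same Holder instance on every call, overwriting its contents.
    static Iterator<Holder> reusing(int[] values) {
        Holder shared = new Holder();
        return new Iterator<Holder>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < values.length;
            }

            @Override
            public Holder next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.value = values[i++];
                return shared;
            }
        };
    }

    // Unsafe pattern: fetches the following entry before reading the current one.
    static List<Integer> lookAhead(Iterator<Holder> it) {
        List<Integer> seen = new ArrayList<>();
        Holder current = it.hasNext() ? it.next() : null;
        while (current != null) {
            Holder next = it.hasNext() ? it.next() : null; // also overwrites `current`
            seen.add(current.value);
            current = next;
        }
        return seen;
    }

    // Safe pattern: each entry is read and discarded before advancing.
    static List<Integer> consumeImmediately(Iterator<Holder> it) {
        List<Integer> seen = new ArrayList<>();
        while (it.hasNext()) {
            seen.add(it.next().value);
        }
        return seen;
    }
}
```

For the input `{1, 2, 3}`, `consumeImmediately` yields `[1, 2, 3]` while `lookAhead` yields `[2, 3, 3]`, which is the kind of retention bug the commit message attributes to look-ahead iterators.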
Gian Merlino	a2bad0b3a2	Reduce allocations due to Jackson serialization. (#12468 ) * Reduce allocations due to Jackson serialization. This patch attacks two sources of allocations during Jackson serialization: 1) ObjectMapper.writeValue and JsonGenerator.writeObject create a new DefaultSerializerProvider instance for each call. It has lots of fields and creates pressure on the garbage collector. So, this patch adds helper functions in JacksonUtils that enable reuse of SerializerProvider objects and updates various call sites to make use of this. 2) GroupByQueryToolChest copies the ObjectMapper for every query to install a special module that supports backwards compatibility with map-based rows. This isn't needed if resultAsArray is set and all servers are running Druid 0.16.0 or later. This release was a while ago. So, this patch disables backwards compatibility by default, which eliminates the need to copy the heavyweight ObjectMapper. The patch also introduces a configuration option that allows admins to explicitly enable backwards compatibility. * Add test. * Update additional call sites and add to forbidden APIs.	2022-04-27 14:17:26 -07:00
Abhishek Agarwal	2fe053c5cb	Bump up the versions (#12480 )	2022-04-27 14:28:20 +05:30
zachjsh	564d6defd4	Worker level task metrics (#12446 ) * * fix metric name inconsistency * * add task slot metrics for middle managers * * add new WorkerTaskCountStatsMonitor to report task count metrics from worker * * more stuff * * remove unused variable * * more stuff * * add javadocs * * fix checkstyle * * fix hadoop test failure * * cleanup * * add more code coverage in tests * * fix test failure * * add docs * * increase code coverage * * fix spelling * * fix failing tests * * remove dead code * * fix spelling	2022-04-26 11:44:44 -05:00
Rohan Garg	95694b5afa	Convert simple min/max SQL queries on __time to timeBoundary queries (#12472 ) * Support array based results in timeBoundary query * Fix bug with query interval in timeBoundary * Convert min(__time) and max(__time) SQL queries to timeBoundary * Add tests for timeBoundary backed SQL queries * Fix query plans for existing tests * fixup! Convert min(__time) and max(__time) SQL queries to timeBoundary * fixup! Add tests for timeBoundary backed SQL queries * fixup! Fix bug with query interval in timeBoundary	2022-04-25 08:18:58 -07:00
Gian Merlino	b7621226d2	QueryScheduler: Log per-query message at DEBUG level. (#12467 ) We generally want to avoid having any routine per-query messages at INFO level, because they pollute logs.	2022-04-22 11:22:34 -07:00
Jihoon Son	73ce5df22d	Add support for authorizing query context params (#12396 ) The query context is a way that the user gives a hint to the Druid query engine, so that they enforce a certain behavior or at least let the query engine prefer a certain plan during query planning. Today, there are 3 types of query context params as below. Default context params. They are set via druid.query.default.context in runtime properties. Any user context params can be default params. User context params. They are set in the user query request. See https://druid.apache.org/docs/latest/querying/query-context.html for parameters. System context params. They are set by the Druid query engine during query processing. These params override other context params. Today, any context params are allowed to users. This can cause 1) a bad UX if the context param is not matured yet or 2) even query failure or system fault in the worst case if a sensitive param is abused, ex) maxSubqueryRows. This PR adds an ability to limit context params per user role. That means, a query will fail if you have a context param set in the query that is not allowed to you. To do that, this PR adds a new built-in resource type, QUERY_CONTEXT. The resource to authorize has a name of the context param (such as maxSubqueryRows) and the type of QUERY_CONTEXT. To allow a certain context param for a user, the user should be granted WRITE permission on the context param resource. Here is an example of the permission. { "resourceAction" : { "resource" : { "name" : "maxSubqueryRows", "type" : "QUERY_CONTEXT" }, "action" : "WRITE" }, "resourceNamePattern" : "maxSubqueryRows" } Each role can have multiple permissions for context params. Each permission should be set for different context params. When a query is issued with a query context X, the query will fail if the user who issued the query does not have WRITE permission on the query context X. In this case, HTTP endpoints will return 403 response code. 
JDBC will throw ForbiddenException. Note: there is a context param called brokerService that is used only by the router. This param is used to pin your query to run it in a specific broker. Because the authorization is done not in the router, but in the broker, if you have brokerService set in your query without a proper permission, your query will fail in the broker after routing is done. Technically, this is not right because the authorization is checked after the context param takes effect. However, this should not cause any user-facing issue and thus should be OK. The query will still fail if the user doesn’t have permission for brokerService. The context param authorization can be enabled using druid.auth.authorizeQueryContextParams. This is disabled by default to avoid any hassle when someone upgrades his cluster blindly without reading release notes.	2022-04-21 14:21:16 +05:30
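The authorization rule in #12396 can be sketched as a set difference: every key in the query context must be covered by a WRITE permission on a QUERY_CONTEXT resource of the same name. This is a simplified illustration, assuming exact-name matching and ignoring `resourceNamePattern` handling; `ContextAuthSketch` and its methods are hypothetical and are not Druid's authorizer API.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not Druid's authorization code.
final class ContextAuthSketch {
    // Returns the context keys the user has no WRITE permission for.
    static Set<String> unauthorizedKeys(Map<String, Object> queryContext,
                                        Set<String> grantedWriteParams) {
        Set<String> denied = new TreeSet<>(queryContext.keySet());
        denied.removeAll(grantedWriteParams);
        return denied;
    }

    // Mirrors the described HTTP behaviour: 403 when any key is not permitted.
    static int httpStatusFor(Map<String, Object> queryContext,
                             Set<String> grantedWriteParams) {
        return unauthorizedKeys(queryContext, grantedWriteParams).isEmpty() ? 200 : 403;
    }
}
```

With a grant for `maxSubqueryRows` only, a query that sets `maxSubqueryRows` passes, and one that also sets `brokerService` is rejected.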
Maytas Monsereenusorn	c25a556827	Fix bug in auto compaction preserveExistingMetrics feature (#12438 ) * fix bug * fix test * fix IT	2022-04-15 15:47:47 -07:00
Agustin Gonzalez	0460d45e92	Make tombstones ingestible by having them return an empty result set. (#12392 ) * Make tombstones ingestible by having them return an empty result set. * Spotbug * Coverage * Coverage * Remove unnecessary exception (checkstyle) * Fix integration test and add one more to test dropExisting set to false over tombstones * Force dropExisting to true in auto-compaction when the interval contains only tombstones * Checkstyle, fix unit test * Changed flag by mistake, fixing it * Remove method from interface since this method is specific to only DruidSegmentInputentity * Fix typo * Adapt to latest code * Update comments when only tombstones to compact * Move empty iterator to a new DruidTombstoneSegmentReader * Code review feedback * Checkstyle * Review feedback * Coverage	2022-04-15 09:08:06 -07:00
Parag Jain	2c79d28bb7	Copy of #11309 with fixes (#12402 ) * Optionally load segment index files into page cache on bootstrap and new segment download * Fix unit test failure * Fix test case * fix spelling * fix spelling * fix test and test coverage issues Co-authored-by: Jian Wang <wjhypo@gmail.com>	2022-04-11 21:05:24 +05:30
Maytas Monsereenusorn	36e17a20ea	Improve metrics for Auto Compaction (#12413 ) * add impl * add docs * fix	2022-04-08 20:14:36 -07:00
Maytas Monsereenusorn	8edea5a82d	Add a new flag for ingestion to preserve existing metrics (#12185 ) * add impl * add impl * fix checkstyle * add impl * add unit test * fix stuff * fix stuff * fix stuff * add unit test * add more unit tests * add more unit tests * add IT * add IT * add IT * add IT * add ITs * address comments * fix test * fix test * fix test * address comments * address comments * address comments * fix conflict * fix checkstyle * address comments * fix test * fix checkstyle * fix test * fix test * fix IT	2022-04-08 11:02:02 -07:00
Paul Rogers	2cc2088720	Method to specify eternity in the scan query builder (#12223 ) * Method to specify eternity in the scan query builder * Fix checkstyle issue * Renamed eterity() to eternityInterval() * Minor fixes	2022-04-04 15:11:32 -07:00
Tejaswini Bandlamudi	984904779b	Increase default DatasourceCompactionConfig.inputSegmentSizeBytes to Long.MAX_VALUE (#12381 ) The current default value of inputSegmentSizeBytes is 400MB, which is pretty low for most compaction use cases. Thus most users are forced to override the default. The default value is now increased to Long.MAX_VALUE.	2022-04-04 16:28:53 +05:30
Yuanli Han	f2495a67d2	fix messageGap metric (#12337 )	2022-03-28 09:21:06 -07:00
Maytas Monsereenusorn	ea51d8a16c	Duties in Indexing group (such as Auto Compaction) does not report metrics (#12352 ) * add impl * add unit tests * fix checkstyle * address comments * fix checkstyle	2022-03-23 18:18:28 -07:00
Jihoon Son	b6eeef31e5	Store null columns in the segments (#12279 ) * Store null columns in the segments * fix test * remove NullNumericColumn and unused dependency * fix compile failure * use guava instead of apache commons * split new tests * unused imports * address comments	2022-03-23 16:54:04 -07:00
Maytas Monsereenusorn	dbb9518f50	Fix auto compaction by adjusting compaction task's interval to align with segmentGranularity when segmentGranularity is set (#12334 ) * add impl * add ITs * address comments * address comments * address comments * fix failure * fix checkstyle * fix checkstyle	2022-03-18 12:46:16 -07:00
Jihoon Son	5e23674fe5	Fix a race condition in the '/tasks' Overlord API (#12330 ) * finds complete and active tasks from the same snapshot * overlord resource * unit test * integration test * javadoc and cleanup * more cleanup * fix test and add more	2022-03-17 10:47:45 +09:00
AmatyaAvadhanula	7bf1d8c5c0	Facilitate lazy initialization of connections to mitigate overwhelming of Coordinator (#12298 ) Add config for eager / lazy connection initialization in ResourcePool Description Currently, when multiple tasks are launched, each of them eagerly initializes a full pool's worth of connections to the coordinator. While this is acceptable when the parameter for number of eagerConnections (== maxSize) is small, this can be problematic in environments where it's a large value (say 1000) and multiple tasks are launched simultaneously, which can cause a large number of connections to be created to the coordinator, thereby overwhelming it. Patch Nodes like the broker may require eager initialization of resources and do not create connections with the Coordinator. It is unnecessary to do this with other types of nodes. A config parameter eagerInitialization is added, which when set to true, initializes the max permissible connections when ResourcePool is initialized. If set to false, lazy initialization of connection resources takes place. NOTE: All nodes except the broker have this new parameter set to false in the quickstart as part of this PR Algorithm The current implementation relies on the creation of maxSize resources eagerly. The new implementation's behaviour is as follows: If a resource has been previously created and is available, lend it. Else if the number of created resources is less than the allowed parameter, create and lend it. Else, wait for one of the lent resources to be returned.	2022-03-09 23:17:43 +05:30
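The three-step lending algorithm in #12298 might look roughly like the following. This is a sketch, not Druid's ResourcePool: the class name, method names, and the single-lock design are assumptions made for brevity, and only the `eagerInitialization` flag and `maxSize` limit come from the commit message.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical illustration of lazy vs. eager pool initialization; not Druid code.
final class LazyPoolSketch<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxSize;
    private int created = 0;

    LazyPoolSketch(Supplier<T> factory, int maxSize, boolean eagerInitialization) {
        this.factory = factory;
        this.maxSize = maxSize;
        if (eagerInitialization) {
            // Eager mode: create the full pool up front
            while (created < maxSize) {
                idle.push(factory.get());
                created++;
            }
        }
    }

    synchronized T take() {
        while (true) {
            if (!idle.isEmpty()) {
                return idle.pop();          // 1) lend a previously created resource
            }
            if (created < maxSize) {
                T resource = factory.get(); // 2) create a new one while under the limit
                created++;
                return resource;
            }
            try {
                wait();                     // 3) otherwise wait for a resource to be returned
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for a resource", e);
            }
        }
    }

    synchronized void giveBack(T resource) {
        idle.push(resource);
        notifyAll();
    }

    synchronized int createdCount() {
        return created;
    }
}
```

In lazy mode nothing is created until the first `take()`, so launching many tasks at once does not immediately open `maxSize` connections per task.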
Agustin Gonzalez	abe76ccb90	Batch ingestion replace (#12137 ) * Tombstone support for replace functionality * A used segment interval is the interval of a current used segment that overlaps any of the input intervals for the spec * Update compaction test to match replace behavior * Adapt ITAutoCompactionTest to work with tombstones rather than dropping segments. Add support for tombstones in the broker. * Style plus simple queriableindex test * Add segment cache loader tombstone test * Add more tests * Add a method to the LogicalSegment to test whether it has any data * Test filter with some empty logical segments * Refactor more compaction/dropexisting tests * Code coverage * Support for all empty segments * Skip tombstones when looking-up broker's timeline. Discard changes made to tool chest to avoid empty segments since they will no longer have empty segments after lookup because we are skipping over them. * Fix null ptr when segment does not have a queriable index * Add support for empty replace interval (all input data has been filtered out) * Fixed coverage & style * Find tombstone versions from lock versions * Test failures & style * Interner was making this fail since the two segments were consider equal due to their id's being equal * Cleanup tombstone version code * Force timeChunkLock whenever replace (i.e. dropExisting=true) is being used * Reject replace spec when input intervals are empty * Documentation * Style and unit test * Restore test code deleted by mistake * Allocate forces TIME_CHUNK locking and uses lock versions. TombstoneShardSpec added. * Unused imports. Dead code. Test coverage. * Coverage. * Prevent killer from throwing an exception for tombstones. This is the killer used in the peon for killing segments. * Fix OmniKiller + more test coverage. 
* Tombstones are now marked using a shard spec * Drop a segment factory.json in the segment cache for tombstones * Style * Style + coverage * style * Add TombstoneLoadSpec.class to mapper in test * Update core/src/main/java/org/apache/druid/segment/loading/TombstoneLoadSpec.java Typo Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Update docs/configuration/index.md Missing Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com> * Typo * Integrated replace with an existing test since the replace part was redundant and more importantly, the test file was very close or exceeding the 10 min default "no output" CI Travis threshold. * Range does not work with multi-dim Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>	2022-03-08 20:07:02 -07:00
Gian Merlino	28f8bcce9b	Always reopen stream in FileUtils.copyLarge, RetryingInputStream. (#12307 ) * Always reopen stream in FileUtils.copyLarge, RetryingInputStream. When an InputStream throws an exception from one of its read methods, we should assume it's bad and reopen it. The main changes here are: - In FileUtils.copyLarge, replace InputStream with InputStreamSupplier. - In RetryingInputStream, collapse retryCondition and resetCondition into a single condition. Also, make it required, since every usage is passing in a specific condition anyway. * Test fixes. * Fix read impl.	2022-03-05 14:39:14 -08:00
Sandeep	61e1ffc7f7	add a new query laning metrics to visualize lane assignment (#12111 ) * add a new query laning metrics to visualize lane assignment * fixes :spotbugs check * Update docs/operations/metrics.md Co-authored-by: Benedict Jin <asdf2014@apache.org> * Update server/src/main/java/org/apache/druid/server/QueryScheduler.java Co-authored-by: Benedict Jin <asdf2014@apache.org> * Update server/src/main/java/org/apache/druid/server/QueryScheduler.java Co-authored-by: Benedict Jin <asdf2014@apache.org> Co-authored-by: Benedict Jin <asdf2014@apache.org>	2022-03-04 15:21:17 +08:00
Laksh Singla	3f709db173	Make ParseExceptions more informative (#12259 ) This PR aims to make the ParseExceptions in Druid more informative, by adding additional information (metadata) to the ParseException, which can contain additional information about the exception. For example - the path of the file generating the issue, the line number (where it can be easily fetched - like CsvReader) Following changes are addressed in this PR: A new class CloseableIteratorWithMetadata has been created which is like CloseableIterator but also has a metadata method that returns a context Map<String, Object> about the current element returned by next(). IntermediateRowParsingReader#read() now attaches the InputEntity and the "record number" which created the exception (while parsing them), and IntermediateRowParsingReader#sample attaches the InputEntity (but not the "record number"). TextReader (and its subclasses), which is a specific implementation of the IntermediateRowParsingReader also include the line number which caused the generation of the error. This will also help in triaging the issues when InputSourceReader generates ParseException because it can point to the specific InputEntity which caused the exception (while trying to read it).	2022-02-28 22:31:15 +05:30
Xavier Léauté	d105519558	Replace use of PowerMock with Mockito (#12282 ) Mockito now supports all our needs and plays much better with recent Java versions. Migrating to Mockito also simplifies running the kind of tests that required PowerMock in the past. * replace all uses of powermock with mockito-inline * upgrade mockito to 4.3.1 and fix use of deprecated methods * import mockito bom to align all our mockito dependencies * add powermock to forbidden-apis to avoid accidentally reintroducing it in the future	2022-02-27 22:47:09 -08:00
Xavier Léauté	1434197ee1	update airline dependency to 2.x (#12270 ) * upgrade Airline to Airline 2 https://github.com/airlift/airline is no longer maintained, updating to https://github.com/rvesse/airline (Airline 2) to use an actively maintained version, while minimizing breaking changes. Note, this is a backwards incompatible change, and extensions relying on the CliCommandCreator extension point will also need to be updated. * fix dependency checks where jakarta.inject is now resolved first instead of javax.inject, due to Airline 2 using jakarta	2022-02-27 15:19:28 -08:00
Jihoon Son	e5ad862665	A new includeAllDimension flag for dimensionsSpec (#12276 ) * includeAllDimensions in dimensionsSpec * doc * address comments * unused import and doc spelling	2022-02-25 18:27:48 -08:00
Maytas Monsereenusorn	6e2eded277	Allow coordinator run auto compaction duty period to be configured separately from other indexing duties (#12263 ) * add impl * add impl * add unit tests * add impl * add impl * add serde test * add tests * add docs * fix test * fix test * fix docs * fix docs * fix spelling	2022-02-18 23:02:57 -08:00
tejaswini-imply	70c40c4281	Fix long overflow in SegmentCostCache.Bucket.toLocalInterval (#12257 ) Problem: When using a `CachingCostBalancerStrategy` with segments of granularity ALL, no segment gets loaded. - With granularity ALL, segments of eternity interval are created which have `start = Long.MIN_VALUE / 2` and `end = Long.MAX_VALUE / 2`. - For cost calculation in the balancer strategy, `toLocalInterval()` method is invoked where `Long.MIN_VALUE / 2` or `Long.MAX_VALUE / 2` cause an overflow thus resulting in no overlap. - The strategy is unable to find any eligible server for loading a given segment. Fix: - Reverse order of operations to divide by `MILLIS_FACTOR` (~10^8) first, then do the subtraction to prevent Long overflow.	2022-02-17 15:13:51 +05:30
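The reordering described in #12257 can be illustrated in isolation. The sketch below is not the actual `toLocalInterval()` code: the method names are hypothetical, the `MILLIS_FACTOR` value is only the rough magnitude quoted in the commit message, and the operands in the usage note are chosen purely to force a wrap-around rather than taken from the original bug.

```java
// Hypothetical illustration of "divide first, then subtract"; not Druid code.
final class LocalIntervalSketch {
    // Assumed magnitude only (the commit message says ~10^8)
    static final double MILLIS_FACTOR = 1.0e8;

    // Subtracts in long arithmetic first, so the difference can wrap around.
    static double subtractThenScale(long millis, long originMillis) {
        return (millis - originMillis) / MILLIS_FACTOR;
    }

    // Scales each operand to a double first; the subtraction cannot overflow a long.
    static double scaleThenSubtract(long millis, long originMillis) {
        return millis / MILLIS_FACTOR - originMillis / MILLIS_FACTOR;
    }
}
```

For example, with `millis = Long.MAX_VALUE` and `originMillis = -1000`, the first method returns a negative number because the long subtraction wraps, while the second returns a large positive value as expected.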
Jihoon Son	ab3d994a17	Lazy instantiation for segmentKillers, segmentMovers, and segmentArchivers (#12207 ) * working * Lazily load segmentKillers, segmentMovers, and segmentArchivers * more tests * test-jar plugin * more coverage * lazy client * clean up changes * checkstyle * i did not change the branch condition * adjust failure rate to run tests faster * javadocs * checkstyle	2022-02-08 13:02:06 -08:00
Suneet Saldanha	ced1389d4c	Enable auto kill segments by default (#12187 ) * Enable auto-kill by default * tests * wip * test * fix IT * fix it * remove from docs * make coverage bot happy	2022-02-07 06:57:54 -08:00
Maytas Monsereenusorn	2b8e7fc0b4	Add a flag to allow auto compaction task slot ratio to consider auto scaler slots (#12228 ) * add impl * fix checkstyle * add unit tests * checkstyle * add IT * fix IT * add comments * fix checkstyle	2022-02-06 20:46:05 -08:00
Suneet Saldanha	159f97dcb0	Update docs for druid.processing.numThreads in brokers (#12231 ) * Update docs for druid.processing.numThreads * error msg * one more reference	2022-02-04 17:34:21 -08:00
Clint Wylie	8fd587b28c	remove duplicate Broker ServerInventoryView, improve HttpServerInventoryView logging (#12209 ) * changes: * remove SystemSchema duplicate ServerInventoryView in broker * suppress duplicate segment added/removed warnings in HttpServerInventoryView when doing a full sync * fixes	2022-02-03 12:57:34 -08:00
Kashif Faraz	e648b01afb	Improve memory estimates in Aggregator and DimensionIndexer (#12073 ) Fixes #12022 ### Description The current implementations of memory estimation in `OnHeapIncrementalIndex` and `StringDimensionIndexer` tend to over-estimate which leads to more persistence cycles than necessary. This PR replaces the max estimation mechanism with getting the incremental memory used by the aggregator or indexer at each invocation of `aggregate` or `encode` respectively. ### Changes - Add new flag `useMaxMemoryEstimates` in the task context. This overrides the same flag in DefaultTaskConfig i.e. `druid.indexer.task.default.context` map - Add method `AggregatorFactory.factorizeWithSize()` that returns an `AggregatorAndSize` which contains the aggregator instance and the estimated initial size of the aggregator - Add method `Aggregator.aggregateWithSize()` which returns the incremental memory used by this aggregation step - Update the method `DimensionIndexer.processRowValsToKeyComponent()` to return the encoded key component as well as its effective size in bytes - Update `OnHeapIncrementalIndex` to use the new estimations only if `useMaxMemoryEstimates = false`	2022-02-03 10:34:02 +05:30
Rohan Garg	c4fa3ccfc4	Fix load-drop-load sequence for same segment and historical in http loadqueue peon (#11717 ) Fixes an issue where a load-drop-load sequence for a segment and historical doesn't work correctly for the http-based load queue peon. The first cycle of load-drop works fine - the problem comes when there is an attempt to reload the segment. The historical caches load success for some recent segments and makes the reload a no-op. But it doesn't consider the fact that the segment was also dropped in between the load requests. This change invalidates the cache after a client tries to fetch a success result.	2022-01-31 13:16:58 +05:30
Clint Wylie	5d2291991e	use reflection to check for mysql transient exception type (#12205 ) * use reflection to check for mysql transient exception type * better * oops	2022-01-27 13:13:16 -08:00
zachjsh	f906f2f577	Fix HttpRemoteTaskRunner LifecycleStart / LifecycleStop race condition (#12184 ) * * stop workers, remove listener, and call exitStop() on HttpRemoteTaskRunner @LifecycleStop * * fix test failure	2022-01-27 13:15:14 -06:00
TSFenwick	a813816fb1	add module test for QueryableModule to allow for better runtime.properties testing (#12202 ) added a default GetRequestLoggerProviderTest and GetEmitterRequestLoggerProviderTest	2022-01-25 22:26:11 -08:00
Karan Kumar	96b3498a40	Grouping on arrays as arrays (#12078 ) * init multiValue column group by * Changing sorting to Lexicographic as default * Adding initial tests * 1.Fixing test cases adding 2.Optimized inmem structs * Linking SQL layer to native layer * Adding multiDimension support to group by column strategy * 1. Removing array coercion in Calcite layer 2. Removing ResultRowDeserializer * 1. Supporting all primitive array types 2. Removing dimension spec as part of columnSelector * 1. Supporting all primitive array types 2. Removing dimension spec as part of columnSelector * 1. Checkstyle things 2. Removing flag * Minor naming things * CheckStyle Things * Fixing test case * Fixing hashing * 1. Adding the MV function 2. Added few test cases * 1. Adding MV function test cases * Adding Selector strategy function test cases * Fixing ClientQuerySegmentWalkerTest * Adding GroupByQueryRunnerTest test cases * Fixing test cases * Adding few more test cases * Fixing Exception asset statement and intellij inspection * Adding null compatibility tests * Review comments * Fixing few failing tests * Fixing few failing tests * Do no convert to topN Q incase of group by on array * Fixing checkstyle * Fixing differences between jdk's class cast exception message * 1. Fixing ordering if the grouping key is an array * Fixing DefaultLimitSpec * Fixing CalciteArraysQueryTest * Dummy commit for LGTM * changes: * only coerce multi-value string null values when `ExpressionPlan.Trait.NEEDS_APPLIED` is set * correct return type inference for ARRAY_APPEND,ARRAY_PREPEND,ARRAY_SLICE,ARRAY_CONCAT * fix bug with ExprEval.ofType when actual type of object from binding doesn't match its claimed type * Review comments * Fixing test cases * Fixing spot bugs * Fixing strict compile Co-authored-by: Clint Wylie <cwylie@apache.org>	2022-01-25 20:30:56 -08:00
Suneet Saldanha	2b32d86f3b	Enable automatic metadata cleanup by default (#12188 )	2022-01-24 20:04:17 -08:00


3866 Commits