druid

Commit Graph

Author	SHA1	Message	Date
Rishabh Singh	74422b58f5	Emit disk spill and merge buffer utilisation metrics for GroupBy queries (#17360 ) This change is to emit following metrics as part of GroupByStatsMonitor monitor, mergeBuffer/used -> Number of merge buffers used. mergeBuffer/acquisitionTimeNs -> Total time required to acquire merge buffer. mergeBuffer/acquisition -> Number of queries that acquired a batch of merge buffers. groupBy/spilledQueries -> Number of queries that spilled onto the disk. groupBy/spilledBytes-> Spilled bytes on the disk. groupBy/mergeDictionarySize -> Size of the merging dictionary.	2024-11-22 14:22:03 +05:30
Gian Merlino	fd6706cd6a	MSQ: Rework memory management. (#17057 ) * MSQ: Rework memory management. This patch reworks memory management to better support multi-threaded workers running in shared JVMs. There are two main changes. First, processing buffers and threads are moved from a per-JVM model to a per-worker model. This enables queries to hold processing buffers without blocking other concurrently-running queries. Changes: - Introduce ProcessingBuffersSet and ProcessingBuffers to hold the per-worker and per-work-order processing buffers (respectively). On Peons, this is the JVM-wide processing pool. On Indexers, this is a per-worker pool of on-heap buffers. (This change fixes a bug on Indexers where excessive processing buffers could be used if MSQ tasks ran concurrently with realtime tasks.) - Add "bufferPool" argument to GroupingEngine#process so a per-worker pool can be passed in. - Add "druid.msq.task.memory.maxThreads" property, which controls the maximum number of processing threads to use per task. This allows usage of multiple processing buffers per task if admins desire. - IndexerWorkerContext acquires processingBuffers when creating the FrameContext for a work order, and releases them when closing the FrameContext. - Add "usesProcessingBuffers()" to FrameProcessorFactory so workers know how many sets of processing buffers are needed to run a given query. Second, adjustments to how WorkerMemoryParameters slices up bundles, to favor more memory for sorting and segment generation. Changes: - Instead of using same-sized bundles for processing and for sorting, workers now use minimally-sized processing bundles (just enough to read inputs plus a little overhead). The rest is devoted to broadcast data buffering, sorting, and segment-building. - Segment-building is now limited to 1 concurrent segment per work order. This allows each segment-building action to use more memory. Note that segment-building is internally multi-threaded to a degree. (Build and persist can run concurrently.) - Simplify frame size calculations by removing the distinction between "standard" and "large" frames. The new default frame size is the same as the old "standard" frames, 1 MB. The original goal of of the large frames was to reduce the number of temporary files during sorting, but I think we can achieve the same thing by simply merging a larger number of standard frames at once. - Remove the small worker adjustment that was added in #14117 to account for an extra frame involved in writing to durable storage. Instead, account for the extra frame whenever we are actually using durable storage. - Cap super-sorter parallelism using the number of output partitions, rather than using a hard coded cap at 4. Note that in practice, so far, this cap has not been relevant for tasks because they have only been using a single processing thread anyway. * Remove unused import. * Fix errorprone annotation. * Fixes for javadocs and inspections. * Additional test coverage. * Fix test.	2024-09-14 15:35:21 -07:00
Abhishek Agarwal	78775ad398	Prepare master for 32.0.0 release (#17022 )	2024-09-10 11:01:20 +05:30
Adarsh Sanjeev	9a2d7c28bc	Prepare master branch for 31.0.0 release (#16333 )	2024-04-26 09:22:43 +05:30
Laksh Singla	6bca406d31	Grouping on complex columns aka unifying GroupBy strategies (#16068 ) Users can pass complex types as dimensions to the group by queries. For example: SELECT nested_col1, count(*) FROM foo GROUP BY nested_col1	2024-04-24 23:00:14 +05:30
Laksh Singla	b9bbde5c0a	Fix deadlock that can occur while merging group by results (#15420 ) This PR prevents such a deadlock from happening by acquiring the merge buffers in a single place and passing it down to the runner that might need it.	2024-04-22 14:10:44 +05:30
Karan Kumar	c4990f56d6	Prepare main branch for next 30.0.0 release. (#15707 )	2024-01-23 15:55:54 +05:30
Clint Wylie	c221a2634b	overhaul DruidPredicateFactory to better handle 3VL (#15629 ) * overhaul DruidPredicateFactory to better handle 3VL fixes some bugs caused by some limitations of the original design of how DruidPredicateFactory interacts with 3-value logic. The primary impacted area was with how filters on values transformed with expressions or extractionFn which turn non-null values into nulls, which were not possible to be modelled with the 'isNullInputUnknown' method changes: * adds DruidObjectPredicate to specialize string, array, and object based predicates instead of using guava Predicate * DruidPredicateFactory now uses DruidObjectPredicate * introduces DruidPredicateMatch enum, which all predicates returned from DruidPredicateFactory now use instead of booleans to indicate match. This means DruidLongPredicate, DruidFloatPredicate, DruidDoublePredicate, and the newly added DruidObjectPredicate apply methods all now return DruidPredicateMatch. This allows matchers and indexes * isNullInputUnknown has been removed from DruidPredicateFactory * rename, fix test * adjust * style * npe * more test * fix default value mode to not match new test	2024-01-05 19:08:02 -08:00
Clint Wylie	d0f64608eb	sql compatible three-valued logic native filters (#15058 ) * sql compatible tri-state native logical filters when druid.expressions.useStrictBooleans=true and druid.generic.useDefaultValueForNull=false, and new druid.generic.useThreeValueLogicForNativeFilters=true * log.warn if non-default configurations are used to guide operators towards SQL complaint behavior	2023-10-12 00:06:23 -07:00
Laksh Singla	5f86072456	Prepare master for Druid 29 (#15121 ) Prepare master for Druid 29	2023-10-11 10:33:45 +05:30
Tejaswini Bandlamudi	48b6d2abf9	skip org.owasp:dependency-check on extensions-contrib modules and suppress false-positive gRPC CVEs (#15026 )	2023-09-25 12:14:42 +05:30
Clint Wylie	36e659a501	remove group-by v1 (#14866 ) * remove group-by v1 * docs * remove unused configs, fix test * fix test * adjustments * why not * adjust * review stuff	2023-08-23 12:44:06 -07:00
imply-cheddar	748874405c	Minimize PostAggregator computations (#14708 ) * Minimize PostAggregator computations Since a change back in 2014, the topN query has been computing all PostAggregators on all intermediate responses from leaf nodes to brokers. This generates significant slow downs for queries with relatively expensive PostAggregators. This change rewrites the query that is pushed down to only have the minimal set of PostAggregators such that it is impossible for downstream processing to do too much work. The final PostAggregators are applied at the very end.	2023-08-04 00:04:31 +05:30
AmatyaAvadhanula	0412f40d36	Prepare master branch for next release, 28.0.0 (#14595 ) * Prepare master branch for next release, 28.0.0	2023-07-18 09:22:30 +05:30
Clint Wylie	1aef72aa7e	Bump up the version in pom to 27.0.0 in preparation of release (#14051 )	2023-04-10 14:56:59 +05:30
Clint Wylie	08b5951cc5	merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698 ) * merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything * fix poms and license stuff * mockito is evil * allow reset of JvmUtils RuntimeInfo if tests used static injection to override	2023-02-17 14:27:41 -08:00
Kashif Faraz	7cf761cee4	Prepare master branch for next release, 26.0.0 (#13401 ) * Prepare master branch for next release, 26.0.0 * Use docker image for druid 24.0.1 * Fix version in druid-it-cases pom.xml	2022-11-22 15:31:01 +05:30
Gian Merlino	6aca61763e	SQL: Use timestamp_floor when granularity is not safe. (#13206 ) * SQL: Use timestamp_floor when granularity is not safe. PR #12944 added a check at the execution layer to avoid materializing excessive amounts of time-granular buckets. This patch modifies the SQL planner to avoid generating queries that would throw such errors, by switching certain plans to use the timestamp_floor function instead of granularities. This applies both to the Timeseries query type, and the GroupBy timestampResultFieldGranularity feature. The patch also goes one step further: we switch to timestamp_floor not just in the ETERNITY + non-ALL case, but also if the estimated number of time-granular buckets exceeds 100,000. Finally, the patch modifies the timestampResultFieldGranularity field to consistently be a String rather than a Granularity. This ensures that it can be round-trip serialized and deserialized, which is useful when trying to execute the results of "EXPLAIN PLAN FOR" with GroupBy queries that use the timestampResultFieldGranularity feature. * Fix test, address PR comments. * Fix ControllerImpl. * Fix test. * Fix unused import.	2022-10-17 08:22:45 -07:00
Abhishek Agarwal	618757352b	Bump up the version to 25.0.0 (#12975 ) * Bump up the version to 25.0.0 * Fix the version in console	2022-08-29 11:27:38 +05:30
Abhishek Agarwal	2fe053c5cb	Bump up the versions (#12480 )	2022-04-27 14:28:20 +05:30
Xavier Léauté	4c61878f9c	Reduce use of mocking and simplify some tests (#12283 ) * remove use of mocks for ServiceMetricEvent * simplify KafkaEmitterTests by moving to Mockito * speed up KafkaEmitterTest by adjusting reporting frequency in tests * remove unnecessary easymock and JUnitParams dependencies	2022-02-26 17:23:09 -08:00
Jihoon Son	e5ad862665	A new includeAllDimension flag for dimensionsSpec (#12276 ) * includeAllDimensions in dimensionsSpec * doc * address comments * unused import and doc spelling	2022-02-25 18:27:48 -08:00
Clint Wylie	187df58e30	better types (#11713 ) * better type system * needle in a haystack * ColumnCapabilities is a TypeSignature instead of having one, INFORMATION_SCHEMA support * fixup merge * more test * fixup * intern * fix * oops * oops again * ... * more test coverage * fix error message * adjust interning, more javadocs * oops * more docs more better	2021-10-19 01:47:25 -07:00
Clint Wylie	fe1d8c206a	bump version to 0.23.0-SNAPSHOT (#11670 )	2021-09-08 15:56:04 -07:00
Jihoon Son	95065bdf1a	Bump dev version to 0.22.0-SNAPSHOT (#10759 )	2021-01-15 13:16:23 -08:00
Liran Funaro	08ab82f55c	IncrementalIndex Tests and Benchmarks Parametrization (#10593 ) * Remove redundant IncrementalIndex.Builder * Parametrize incremental index tests and benchmarks - Reveal and fix a bug in OffheapIncrementalIndex * Fix forbiddenapis error: Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale] * Fix Intellij errors: declared exception is never thrown * Add documentation and validate before closing objects on tearDown. * Add documentation to OffheapIncrementalIndexTestSpec * Doc corrections and minor changes. * Add logging for generated rows. * Refactor new tests/benchmarks. * Improve IncrementalIndexCreator documentation * Add required tests for DataGenerator * Revert "rollupOpportunity" to be a string	2021-01-07 22:18:47 -08:00
Jonathan Wei	65c0d64676	Update version to 0.21.0-SNAPSHOT (#10450 ) * [maven-release-plugin] prepare release druid-0.21.0 * [maven-release-plugin] prepare for next development iteration * Update web-console versions	2020-10-03 16:08:34 -07:00
Maytas Monsereenusorn	574b062f1f	Cluster wide default query context setting (#10208 ) * Cluster wide default query context setting * Cluster wide default query context setting * Cluster wide default query context setting * add docs * fix docs * update props * fix checkstyle * fix checkstyle * fix checkstyle * update docs * address comments * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * fix checkstyle * fix NPE	2020-07-29 15:19:18 -07:00
Clint Wylie	c86e7ce30b	bump version to 0.20.0-SNAPSHOT (#10124 )	2020-07-06 15:08:32 -07:00
Jihoon Son	0da8ffc3ff	Bump up development version to 0.19.0-SNAPSHOT (#9586 )	2020-03-30 16:24:04 -07:00
Gian Merlino	d21054f7c5	Remove the deprecated interval-chunking stuff. (#9216 ) * Remove the deprecated interval-chunking stuff. See https://github.com/apache/druid/pull/6591, https://github.com/apache/druid/pull/4004#issuecomment-284171911 for details. * Remove unused import. * Remove chunkInterval too.	2020-01-19 17:14:23 -08:00
Jonathan Wei	4e8368a5d9	Set version to 0.18.0-SNAPSHOT (#9109 )	2020-01-02 17:55:10 -05:00
Jonathan Wei	8af41d7cd0	Update version to 0.18.0-incubating-SNAPSHOT (#9009 )	2019-12-11 14:04:03 -08:00
jon-wei	dfbc066163	Revert "[maven-release-plugin] prepare release druid-0.16.1-incubating-rc1" This reverts commit `a0f21d9b07`.	2019-11-27 23:22:43 -08:00
jon-wei	0402ff85b8	Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit `8ffa71e7e6`.	2019-11-27 23:22:32 -08:00
jon-wei	8ffa71e7e6	[maven-release-plugin] prepare for next development iteration	2019-11-27 23:18:48 -08:00
jon-wei	a0f21d9b07	[maven-release-plugin] prepare release druid-0.16.1-incubating-rc1	2019-11-27 23:18:37 -08:00
Clint Wylie	3fcaa1a61b	fix sql compatible null handling config work with runtime.properties (#8876 ) * fix sql compatible null handling config work with runtime.properties * fix npe * fix tests * add friendly error * comment, and friendlier still * fix compile * fix from merges	2019-11-20 03:55:29 -08:00
Clint Wylie	3ff5e02237	remove select query (#8739 ) * remove select query * thanks teamcity * oops * oops * add back a SelectQuery class that throws RuntimeExceptions linking to docs * adjust text * update docs per review * deprecated	2019-10-30 19:29:56 -07:00
Jihoon Son	f5b9bf5525	Cluster-wide configuration for query vectorization (#8657 ) * Cluster-wide configuration for query vectorization * add doc * fix build * fix doc * rename to QueryConfig and add javadoc * fix checkstyle * fix variable names	2019-10-23 21:44:28 +08:00
Chi Cao Minh	5f61374cb3	Fix dependency analyze warnings (#8230 ) * Fix dependency analyze warnings Update the maven dependency plugin to the latest version and fix all warnings for unused declared and used undeclared dependencies in the compile scope. Added new travis job to add the check to CI. Also fixed some source code files to use the correct packages for their imports and updated druid-forbidden-apis to prevent regressions. * Address review comments * Adjust scope for org.glassfish.jaxb:jaxb-runtime * Fix dependencies for hdfs-storage * Consolidate netty4 versions	2019-09-09 14:37:21 -07:00
Clint Wylie	c73a489335	bump master version to 0.17.0-incubating-SNAPSHOT (#8421 )	2019-08-28 01:58:36 -07:00
SandishKumarHN	33f0753a70	Add Checkstyle for constant name static final (#8060 ) * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * merging with upstream * review-1 * unknow changes * unknow changes * review-2 * merging with master * review-2 1 changes * review changes-2 2 * bug fix	2019-08-23 13:13:54 +03:00
Gian Merlino	77297f4e6f	GroupBy array-based result rows. (#8196 ) * GroupBy array-based result rows. Fixes #8118; see that proposal for details. Other than the GroupBy changes, the main other "interesting" classes are: - ResultRow: The array-based result type. - BaseQuery: T is no longer required to be Comparable. - QueryToolChest: Adds "decorateObjectMapper" to enable query-aware serialization and deserialization of result rows (necessary due to their positional nature). - QueryResource: Uses the new decoration functionality. - DirectDruidClient: Also uses the new decoration functionality. - QueryMaker (in Druid SQL): Modifications to read ResultRows. These classes weren't changed, but got some new javadocs: - BySegmentQueryRunner - FinalizeResultsQueryRunner - Query * Adjustments for TC stuff.	2019-07-31 16:15:12 -07:00
Chi Cao Minh	ab71a2e1e4	Revert "Fix dependency analyze warnings (#8128 )" (#8189 ) This reverts commit `5dd0d8e873`.	2019-07-29 11:42:16 -07:00
Chi Cao Minh	5dd0d8e873	Fix dependency analyze warnings (#8128 ) * Fix dependency analyze warnings Update the maven dependency plugin to the latest version and fix all warnings for unused declared and used undeclared dependencies in the compile scope. Added new travis job to add the check to CI. Also fixed some source code files to use the correct packages for their imports. * Fix licenses and dependencies * Fix licenses and dependencies again * Fix integration test dependency * Address review comments * Fix unit test dependencies * Fix integration test dependency * Fix integration test dependency again * Fix integration test dependency third time * Fix integration test dependency fourth time * Fix compile error * Fix assert package	2019-07-26 10:49:03 -07:00
Eugene Sevastianov	799d20249f	Response context refactoring (#8110 ) * Response context refactoring * Serialization/Deserialization of ResponseContext * Added java doc comments * Renamed vars related to ResponseContext * Renamed empty() methods to createEmpty() * Fixed ResponseContext usage * Renamed multiple ResponseContext static fields * Added PublicApi annotations * Renamed QueryResponseContext class to ResourceIOReaderWriter * Moved the protected method below public static constants * Added createEmpty method to ResponseContext with DefaultResponseContext creation * Fixed inspection error * Added comments to the ResponseContext length limit and ResponseContext http header name * Added a comment of possible future refactoring * Removed .gitignore file of indexing-service * Removed a never-used method * VisibleForTesting method reducing boilerplate Co-Authored-By: Clint Wylie <cjwylie@gmail.com> * Reduced boilerplate * Renamed the method serialize to serializeWith * Removed unused import * Fixed incorrectly refactored test method * Added comments for ResponseContext keys * Fixed incorrectly refactored test method * Fixed IntervalChunkingQueryRunnerTest mocks	2019-07-24 18:29:03 +03:00
Jihoon Son	7abfbb066a	Bump up snapshot version to 0.16.0 (#7802 )	2019-05-30 17:17:33 -07:00
Roman Leventov	782863ed0f	Fix some problems reported by PVS-Studio (#7738 ) * Fix some problems reported by PVS-Studio * Address comments	2019-05-29 11:20:45 -07:00
Jonathan Wei	fafbc4a80e	Set version to 0.15.0-incubating-SNAPSHOT (#7014 )	2019-02-07 14:02:52 -08:00

1 2

93 Commits