druid

Commit Graph

Author	SHA1	Message	Date
Clint Wylie	38ac71ee56	one version of mockito is more than enough (#13871)	2023-03-01 23:27:18 -08:00
Nicholas Lippis	d32dc1b0c9	Remove K8sOverlordConfig.java (#13866)	2023-03-02 09:43:48 +05:30
Clint Wylie	08b5951cc5	merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698 ) * merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything * fix poms and license stuff * mockito is evil * allow reset of JvmUtils RuntimeInfo if tests used static injection to override	2023-02-17 14:27:41 -08:00
Churro	c1f283fd31	Better sidecar support (#13655 ) * Better sidecar support * remove un-thrown exception from test * Druid you are such a stickler about spelling :) * Only require the primaryContainerName, no need to exclude containers	2023-02-14 10:56:15 +05:30
AmatyaAvadhanula	0cf1fc3d55	Indexing on multiple disks (#13476 ) * Initial commit * Simple UTs * Parameterize tests * Parameterized tests for k8s task runner * Fix restore bug * Refactor TaskStorageDirTracker * Change CliPeon args	2023-02-08 11:31:34 +05:30
Clint Wylie	2d3bee8545	various nested column (and other) fixes (#13732 ) changes: * modified druid schema column type compution to special case COMPLEX<json> handling to choose COMPLEX<json> if any column in any segment is COMPLEX<json> * NestedFieldVirtualColumn can now work correctly on any type of column, returning either a column selector if a root path, or nil selector if not * fixed a random bug with NilVectorSelector when using a vector size larger than the default and druid.generic.useDefaultValueForNull=false would have the nulls vector set to all false instead of true * fixed an overly aggressive check in ExprEval.ofType when handling complex types which would try to treat any string as base64 without gracefully falling back if it was not in fact base64 encoded, along with special handling for complex<json> * added ExpressionVectorSelectors.castValueSelectorToObject and ExpressionVectorSelectors.castObjectSelectorToNumeric as convience methods to cast vector selectors using cast expressions without the trouble of constructing an expression. the polymorphic nature of the non-vectorized engine (and significantly larger overhead of non-vectorized expression processing) made adding similar methods for non-vectorized selectors less attractive and so have not been added at this time * fix inconsistency between nested column indexer and serializer in handling values (coerce non primitive and non arrays of primitives using asString) * ExprEval best effort mode now handles byte[] as string * added test for ExprEval.bestEffortOf, and add missing conversion cases that tests uncovered * more tests more better	2023-02-06 19:48:02 -08:00
Kashif Faraz	78ae0b7533	Upgrade to netty 4.1.86.Final to address CVEs (#13604 ) This commit addresses the following CVEs: - CVE-2021-43797 - CVE-2022-41881	2022-12-23 01:44:01 +05:30
Kashif Faraz	58a3acc2c4	Add InputStats to track bytes processed by a task (#13520 ) This commit adds a new class `InputStats` to track the total bytes processed by a task. The field `processedBytes` is published in task reports along with other row stats. Major changes: - Add class `InputStats` to track processed bytes - Add method `InputSourceReader.read(InputStats)` to read input rows while counting bytes. > Since we need to count the bytes, we could not just have a wrapper around `InputSourceReader` or `InputEntityReader` (the way `CountableInputSourceReader` does) because the `InputSourceReader` only deals with `InputRow`s and the byte information is already lost. - Classic batch: Use the new `InputSourceReader.read(inputStats)` in `AbstractBatchIndexTask` - Streaming: Increment `processedBytes` in `StreamChunkParser`. This does not use the new `InputSourceReader.read(inputStats)` method. - Extend `InputStats` with `RowIngestionMeters` so that bytes can be exposed in task reports Other changes: - Update tests to verify the value of `processedBytes` - Rename `MutableRowIngestionMeters` to `SimpleRowIngestionMeters` and remove duplicate class - Replace `CacheTestSegmentCacheManager` with `NoopSegmentCacheManager` - Refactor `KafkaIndexTaskTest` and `KinesisIndexTaskTest`	2022-12-13 18:54:42 +05:30
somu-imply	7682b0b6b1	Analysis refactor (#13501 ) Refactor DataSource to have a getAnalysis method() This removes various parts of the code where while loops and instanceof checks were being used to walk through the structure of DataSource objects in order to build a DataSourceAnalysis. Instead we just ask the DataSource for its analysis and allow the stack to rebuild whatever structure existed.	2022-12-12 17:35:44 -08:00
Paul Rogers	b76ff16d00	SQL test framework extensions (#13426) SQL test framework extensions * Capture planner artifacts: logical plan, etc. * Planner test builder validates the logical plan * Validation for the SQL result schema (we already have validation for the Druid row signature) * Better Guice integration: properties, reuse Guice modules * Avoid need for hand-coded expr, macro tables * Retire some of the test-specific query component creation * Fix query log hook race condition	2022-12-02 09:11:59 -08:00
Kashif Faraz	8ff1b2d5d4	Revert "Add filter in cloud object input source for backward compatibility (#13437 )" (#13450 ) This reverts commit `b12e5f300e`.	2022-11-30 16:33:05 +05:30
Tejaswini Bandlamudi	b12e5f300e	Add filter in cloud object input source for backward compatibility (#13437 ) https://github.com/apache/druid/pull/13027 PR replaces `filter` parameter with `objectGlob` in ingestion input source. However, this will cause existing ingestion jobs to fail if they are using a filter already. This PR adds old filter functionality alongside objectGlob to preserve backward compatibility.	2022-11-28 23:04:33 +05:30
Kashif Faraz	7cf761cee4	Prepare master branch for next release, 26.0.0 (#13401 ) * Prepare master branch for next release, 26.0.0 * Use docker image for druid 24.0.1 * Fix version in druid-it-cases pom.xml	2022-11-22 15:31:01 +05:30
imply-cheddar	6b9344cd39	Persist legacy LatestPairs for now (#13378 ) We added compression to the latest/first pair storage, but the code change was forcing new things to be persisted with the new format, meaning that any segment created with the new code cannot be read by the old code. Instead, we need to default to creating the old format and then remove that default in a future version.	2022-11-17 21:37:02 +05:30
Didip Kerabat	56d5c9780d	Use standard library to correctly glob and stop at the correct folder structure when filtering cloud objects (#13027 ) * Use standard library to correctly glob and stop at the correct folder structure when filtering cloud objects. Removed: import org.apache.commons.io.FilenameUtils; Add: import java.nio.file.FileSystems; import java.nio.file.PathMatcher; import java.nio.file.Paths; * Forgot to update CloudObjectInputSource as well. * Fix tests. * Removed unused exceptions. * Able to reduced user mistakes, by removing the protocol and the bucket on filter. * add 1 more test. * add comment on filterWithoutProtocolAndBucket * Fix lint issue. * Fix another lint issue. * Replace all mention of filter -> objectGlob per convo here: https://github.com/apache/druid/pull/13027#issuecomment-1266410707 * fix 1 bad constructor. * Fix the documentation. * Don’t do anything clever with the object path. * Remove unused imports. * Fix spelling error. * Fix incorrect search and replace. * Addressing Gian’s comment. * add filename on .spelling * Fix documentation. * fix documentation again Co-authored-by: Didip Kerabat <didip@apple.com>	2022-11-10 23:46:40 -08:00
AmatyaAvadhanula	a2013e6566	Enhance streaming ingestion metrics (#13331 ) Changes: - Add a metric for partition-wise kafka/kinesis lag for streaming ingestion. - Emit lag metrics for streaming ingestion when supervisor is not suspended and state is in {RUNNING, IDLE, UNHEALTHY_TASKS, UNHEALTHY_SUPERVISOR} - Document metrics	2022-11-09 23:44:15 +05:30
Paul Rogers	7e600d2c63	Enhancements to the Calcite test framework (#13283 ) * Enhancements to the Calcite test framework * Standardize "Unauthorized" messages * Additional test framework extension points * Resolved joinable factory dependency issue	2022-11-08 14:28:49 -08:00
Churro	9a684af3c9	Fixing the K8s task runner to work with MSQ (#13305 ) * Fixing the K8s task runner to work with MSQ * Sorry incomplete PR Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>	2022-11-08 14:41:05 +05:30
dependabot[bot]	081508f1aa	Bump commons-text from 1.9 to 1.10.0 in /extensions-contrib/kubernetes-overlord-extensions (#13299 ) * Bump commons-text in /extensions-contrib/kubernetes-overlord-extensions Bumps commons-text from 1.9 to 1.10.0. --- updated-dependencies: - dependency-name: org.apache.commons:commons-text dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * Cleanup pom Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Frank Chen <frank.chen021@outlook.com>	2022-11-05 15:21:39 +08:00
DENNIS	c5fcc03bdf	PrometheusEmitter NullPointerException fix (#13286 ) * PrometheusEmitter NullPointerException fix * Improved null value judgment in pushMetric * Delete meaningless judgments about namespace * Delete unnecessary @Nullable above namespace attribute	2022-11-03 18:50:27 +08:00
Dr. Sizzles	e5ad24ff9f	Support for middle manager less druid, tasks launch as k8s jobs (#13156 ) * Support for middle manager less druid, tasks launch as k8s jobs * Fixing forking task runner test * Test cleanup, dependency cleanup, intellij inspections cleanup * Changes per PR review Add configuration option to disable http/https proxy for the k8s client Update the docs to provide more detail about sidecar support * Removing un-needed log lines * Small changes per PR review * Upon task completion we callback to the overlord to update the status / locaiton, for slower k8s clusters, this reduces locking time significantly * Merge conflict fix * Fixing tests and docs * update tiny-cluster.yaml changed `enableTaskLevelLogPush` to `encapsulatedTask` * Apply suggestions from code review Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * Minor changes per PR request * Cleanup, adding test to AbstractTask * Add comment in peon.sh * Bumping code coverage * More tests to make code coverage happy * Doh a duplicate dependnecy * Integration test setup is weird for k8s, will do this in a different PR * Reverting back all integration test changes, will do in anotbher PR * use StringUtils.base64 instead of Base64 * Jdk is nasty, if i compress in jdk 11 in jdk 17 the decompressed result is different Co-authored-by: Rahul Gidwani <r_gidwani@apple.com> Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2022-11-02 19:44:47 -07:00
Paul Rogers	86e6e61e88	Modular Calcite Test Framework (#12965 ) * Refactor Calcite test "framework" for planner tests Refactors the current Calcite tests to make it a bit easier to adjust the set of runtime objects used within a test. * Move data creation out of CalciteTests into TestDataBuilder * Move "framework" creation out of CalciteTests into a QueryFramework * Move injector-dependent functions from CalciteTests into QueryFrameworkUtils * Wrapper around the planner factory, etc. to allow customization. * Bulk of the "framework" created once per class rather than once per test. * Refactor tests to use a test builder * Change all testQuery() methods to use the test builder. Move test execution & verification into a test runner.	2022-10-20 15:45:44 -07:00
Gian Merlino	6aca61763e	SQL: Use timestamp_floor when granularity is not safe. (#13206 ) * SQL: Use timestamp_floor when granularity is not safe. PR #12944 added a check at the execution layer to avoid materializing excessive amounts of time-granular buckets. This patch modifies the SQL planner to avoid generating queries that would throw such errors, by switching certain plans to use the timestamp_floor function instead of granularities. This applies both to the Timeseries query type, and the GroupBy timestampResultFieldGranularity feature. The patch also goes one step further: we switch to timestamp_floor not just in the ETERNITY + non-ALL case, but also if the estimated number of time-granular buckets exceeds 100,000. Finally, the patch modifies the timestampResultFieldGranularity field to consistently be a String rather than a Granularity. This ensures that it can be round-trip serialized and deserialized, which is useful when trying to execute the results of "EXPLAIN PLAN FOR" with GroupBy queries that use the timestampResultFieldGranularity feature. * Fix test, address PR comments. * Fix ControllerImpl. * Fix test. * Fix unused import.	2022-10-17 08:22:45 -07:00
Paul Rogers	f4dcc52dac	Redesign QueryContext class (#13071 ) We introduce two new configuration keys that refine the query context security model controlled by druid.auth.authorizeQueryContextParams. When that value is set to true then two other configuration options become available: druid.auth.unsecuredContextKeys: The set of query context keys that do not require a security check. Use this for the "white-list" of key to allow. All other keys go through the existing context key security checks. druid.auth.securedContextKeys: The set of query context keys that do require a security check. Use this when you want to allow all but a specific set of keys: only these keys go through the existing context key security checks. Both are set using JSON list format: druid.auth.securedContextKeys=["secretKey1", "secretKey2"] You generally set one or the other values. If both are set, unsecuredContextKeys acts as exceptions to securedContextKeys. In addition, Druid defines two query context keys which always bypass checks because Druid uses them internally: sqlQueryId sqlStringifyArrays	2022-10-15 11:02:11 +05:30
zachjsh	2f2fe20089	Improve global-cached-lookups metric reporting (#13219 ) It was found that the namespace/cache/heapSizeInBytes metric that tracks the total heap size in bytes of all lookup caches loaded on a service instance was being under reported. We were not accounting for the memory overhead of the String object, which I've found in testing to be ~40 bytes. While this overhead may be java version dependent, it should not vary much, and accounting for this provides a better estimate. Also fixed some logging, and reading bytes from the JDBI result set a little more efficient by saving hash table lookups. Also added some of the lookup metrics to the default statsD emitter metric whitelist.	2022-10-13 18:51:54 -04:00
Sam Rash	80e10ffe22	CompressedBigDecimal Min/Max (#13141) This adds min/max functions for CompressedBigDecimal. It exposes these functions via sql (BIG_MAX, BIG_MIN--see the SqlAggFunction implementations). It also includes various bug fixes and cleanup to the original CompressedBigDecimal code include the AggregatorFactories. Various null handling was improved. Additional test cases were added for both new and existing code including a base test case for AggregationFactories. Other tests common across sum,min,max may be refactored also to share the various cases in the future.	2022-10-11 16:35:21 -07:00
Frank Chen	d30cf8c308	Dependency cleanup (#13194 ) * Clean up dependency in extensions * Bump protobuf/aws.sdk * Bump aws-sdk to 1.12.317 * Fix CI * Fix CI * Update license * Update license	2022-10-10 20:34:38 +08:00
Abhishek Agarwal	e3f9a0ed44	Lazy initialization of segment killers, movers and archivers (#13170 ) * Lazy initialization of segment killers, movers and archivers * Add test for lazy killer * Add more tests * Intellij fixes	2022-10-04 15:55:46 +05:30
Sam Rash	28b9edc2a8	Add BIG_SUM SQL function (#13102 ) This adds a sql function, "BIG_SUM", that uses CompressedBigDecimal to do a sum. Other misc changes: 1. handle NumberFormatExceptions when parsing a string (default to set to 0, configurable in agg factory to be strict and throw on error) 2. format pom file (whitespace) + add dependency 3. scaleUp -> scale and always require scale as a parameter	2022-09-26 18:02:25 -07:00
Jonathan Wei	1f1fced6d4	Add JsonInputFormat option to assume newline delimited JSON, improve parse exception handling for multiline JSON (#13089 ) * Add JsonInputFormat option to assume newline delimited JSON, improve handling for non-NDJSON * Fix serde and docs * Add PR comment check	2022-09-26 19:51:04 -05:00
Sam Rash	044cab5094	Optimize CompressedBigDecimal compareTo() (#13086 ) Optimizes the compareTo() function in CompressedBigDecimal. It directly compares the int[] rather than creating BigDecimal objects and using its compareTo. It handles unequal sized CBDs, but does require the scales to match.	2022-09-21 20:31:02 -07:00
sr	54a2eb7dcc	Compressed Big Decimal Cleanup and Extension (#13048 ) 1. remove unnecessary generic type from CompressedBigDecimal 2. support Number input types 3. support aggregator reading supported input types directly (uningested data) 4. fix scaling bug in buffer aggregator	2022-09-13 19:14:31 -07:00
Frank Chen	fd6c05eee8	Avoid ClassCastException when getting values from `QueryContext` (#13022 ) * Use safe conversion methods * Rename method * Add getContextAsBoolean * Update test case * Remove generic from getContextValue * Update catch-handler * Add test * Resolve comments * Replace 'getContextXXX' to 'getQueryContext().getAsXXXX'	2022-09-13 18:00:09 +08:00
DENNIS	dced61645f	prometheus-emitter supports sending metrics to pushgateway regularly … (#13034 ) * prometheus-emitter supports sending metrics to pushgateway regularly and continuously * spell check fix * Optimization variable name and related documents * Update docs/development/extensions-contrib/prometheus.md OK, it looks more conspicuous Co-authored-by: Frank Chen <frankchen@apache.org> * Update doc * Update docs/development/extensions-contrib/prometheus.md Co-authored-by: Frank Chen <frankchen@apache.org> * When PrometheusEmitter is closed, close the scheduler * Ensure that registeredMetrics is thread safe. * Local variable name optimization * Remove unnecessary white space characters Co-authored-by: Frank Chen <frankchen@apache.org>	2022-09-09 20:46:14 +08:00
Frank Chen	d57557d51d	Improve doc and configuration of prometheus emitter (#13028 ) * Improve doc and validation * Add configuration for peon tasks * Update doc * Update test case * Fix typo * Update docs/development/extensions-contrib/prometheus.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Update docs/development/extensions-contrib/prometheus.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2022-09-09 02:20:34 +08:00
Didip Kerabat	66545a0f3d	Fix compiler error: The project was not built since its build path is incomplete. Cannot find the class file for org.slf4j.Logger. Fix the build path then try building this project (#13029 ) Co-authored-by: Didip Kerabat <didip@apple.com>	2022-09-06 20:49:41 +05:30
senthilkv	3d9aef225d	compressed big decimal - module (#10705 ) Compressed Big Decimal is an extension which provides support for Mutable big decimal value that can be used to accumulate values without losing precision or reallocating memory. This type helps in absolute precision arithmetic on large numbers in applications, where greater level of accuracy is required, such as financial applications, currency based transactions. This helps avoid rounding issues where in potentially large amount of money can be lost. Accumulation requires that the two numbers have the same scale, but does not require that they are of the same size. If the value being accumulated has a larger underlying array than this value (the result), then the higher order bits are dropped, similar to what happens when adding a long to an int and storing the result in an int. A compressed big decimal that holds its data with an embedded array. Compressed big decimal is an absolute number based complex type based on big decimal in Java. This supports all the functionalities supported by Java Big Decimal. Java Big Decimal is not mutable in order to avoid big garbage collection issues. Compressed big decimal is needed to mutate the value in the accumulator.	2022-09-06 00:06:57 -07:00
Abhishek Agarwal	618757352b	Bump up the version to 25.0.0 (#12975 ) * Bump up the version to 25.0.0 * Fix the version in console	2022-08-29 11:27:38 +05:30
Karan Kumar	275f834b2a	Race in Task report/log streamer (#12931 ) * Fixing RACE in HTTP remote task Runner * Changes in the interface * Updating documentation * Adding test cases to SwitchingTaskLogStreamer * Adding more tests	2022-08-25 17:56:01 -07:00
Bartosz Mikulski	0bc9f9f303	#12912 Fix KafkaEmitter not emitting queryType for a native query (#12915 ) Fixes KafkaEmitter not emitting queryType for a native query. The Event to JSON serialization was extracted to the external class: EventToJsonSerializer. This was done to simplify the testing logic for the serialization as well as extract the responsibility of serialization to the separate class. The logic builds ObjectNode incrementally based on the event .toMap method. Parsing each entry individually ensures that the Jackson polymorphic annotations are respected. Not respecting these annotation caused the missing of the queryType from output event.	2022-08-24 14:07:00 +05:30
Paul Rogers	41712b7a3a	Refactor SqlLifecycle into statement classes (#12845 ) * Refactor SqlLifecycle into statement classes Create direct & prepared statements Remove redundant exceptions from tests Tidy up Calcite query tests Make PlannerConfig more testable * Build fixes * Added builder to SqlQueryPlus * Moved Calcites system properties to saffron.properties * Build fix * Resolve merge conflict * Fix IntelliJ inspection issue * Revisions from reviews Backed out a revision to Calcite tests that didn't work out as planned * Build fix * Fixed spelling errors * Fixed failed test Prepare now enforces security; before it did not. * Rebase and fix IntelliJ inspections issue * Clean up exception handling * Fix handling of JDBC auth errors * Build fix * More tweaks to security messages	2022-08-14 00:44:08 -07:00
Lucas Capistrant	3a3271eddc	Introduce defaultOnDiskStorage config for Group By (#12833 ) * Introduce defaultOnDiskStorage config for groupBy * add debug log to groupby query config * Apply config change suggestion from review * Remove accidental new lines * update default value of new default disk storage config * update debug log to have more descriptive text * Make maxOnDiskStorage and defaultOnDiskStorage HumanRedadableBytes * improve test coverage * Provide default implementation to new default method on advice of reviewer	2022-08-12 09:40:21 -07:00
Karan Kumar	607b0b9310	Adding withName implementation to AggregatorFactory (#12862 ) * Adding agg factory with name impl * Adding test cases * Fixing test case * Fixing test case * Updated java docs.	2022-08-08 18:31:56 +05:30
Jianhuan Liu	d4403c15aa	Upgrade prometheus version, add more labels to PrometheusEmitter (#12769 ) Changes: - Upgrade prometheus to version 0.16.0 - Add optional labels `druid_service` and `host_name` to `PrometheusEmitter`	2022-07-15 14:43:12 +05:30
zachjsh	c0380e7b0a	* fix duplicate dimension (#12778)	2022-07-14 10:39:03 +05:30
Rohan Garg	bb953be09b	Refactor usage of JoinableFactoryWrapper + more test coverage (#12767 ) Refactor usage of JoinableFactoryWrapper to add e2e test for createSegmentMapFn with joinToFilter feature enabled	2022-07-12 06:25:36 -07:00
Didip Kerabat	48fd2e6400	Add missing metrics into statsd-reporter. (#12762)	2022-07-08 23:13:06 -07:00
Didip Kerabat	6ddb828c7a	Able to filter Cloud objects with glob notation. (#12659 ) In a heterogeneous environment, sometimes you don't have control over the input folder. Upstream can put any folder they want. In this situation the S3InputSource.java is unusable. Most people like me solved it by using Airflow to fetch the full list of parquet files and pass it over to Druid. But doing this explodes the JSON spec. We had a situation where 1 of the JSON spec is 16MB and that's simply too much for Overlord. This patch allows users to pass {"filter": "*.parquet"} and let Druid performs the filtering of the input files. I am using the glob notation to be consistent with the LocalFirehose syntax.	2022-06-24 11:40:08 +05:30
AmatyaAvadhanula	f970757efc	Optimize overlord GET /tasks memory usage (#12404 ) The web-console (indirectly) calls the Overlord’s GET tasks API to fetch the tasks' summary which in turn queries the metadata tasks table. This query tries to fetch several columns, including payload, of all the rows at once. This introduces a significant memory overhead and can cause unresponsiveness or overlord failure when the ingestion tab is opened multiple times (due to several parallel calls to this API) Another thing to note is that the task table (the payload column in particular) can be very large. Extracting large payloads from such tables can be very slow, leading to slow UI. While we are fixing the memory pressure in the overlord, we can also fix the slowness in UI caused by fetching large payloads from the table. Fetching large payloads also puts pressure on the metadata store as reported in the community (Metadata store query performance degrades as the tasks in druid_tasks table grows · Issue #12318 · apache/druid ) The task summaries returned as a response for the API are several times smaller and can fit comfortably in memory. So, there is an opportunity here to fix the memory usage, slow ingestion, and under-pressure metadata store by removing the need to handle large payloads in every layer we can. Of course, the solution becomes complex as we try to fix more layers. With that in mind, this page captures two approaches. They vary in complexity and also in the degree to which they fix the aforementioned problems.	2022-06-16 22:30:37 +05:30
Abhishek Agarwal	59a0c10c47	Add remedial information in error message when type is unknown (#12612 ) Often users are submitting queries, and ingestion specs that work only if the relevant extension is not loaded. However, the error is too technical for the users and doesn't suggest them to check for missing extensions. This PR modifies the error message so users can at least check their settings before assuming that the error is because of a bug.	2022-06-07 20:22:45 +05:30
dependabot[bot]	86d01b3681	Bump opentelemetry-instrumentation-bom-alpha (#12531 ) Bumps [opentelemetry-instrumentation-bom-alpha](https://github.com/open-telemetry/opentelemetry-java-instrumentation) from 1.7.0-alpha to 1.14.0-alpha. - [Release notes](https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/CHANGELOG.md) - [Commits](https://github.com/open-telemetry/opentelemetry-java-instrumentation/commits) --- updated-dependencies: - dependency-name: io.opentelemetry.instrumentation:opentelemetry-instrumentation-bom-alpha dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-01 13:51:39 -07:00
Gian Merlino	4631cff2a9	Free ByteBuffers in tests and fix some bugs. (#12521 ) * Ensure ByteBuffers allocated in tests get freed. Many tests had problems where a direct ByteBuffer would be allocated and then not freed. This is bad because it causes flaky tests. To fix this: 1) Add ByteBufferUtils.allocateDirect(size), which returns a ResourceHolder. This makes it easy to free the direct buffer. Currently, it's only used in tests, because production code seems OK. 2) Update all usages of ByteBuffer.allocateDirect (off-heap) in tests either to ByteBuffer.allocate (on-heap, which are garbaged collected), or to ByteBufferUtils.allocateDirect (wherever it seemed like there was a good reason for the buffer to be off-heap). Make sure to close all direct holders when done. * Changes based on CI results. * A different approach. * Roll back BitmapOperationTest stuff. * Try additional surefire memory. * Revert "Roll back BitmapOperationTest stuff." This reverts commit `49f846d9e3`. * Add TestBufferPool. * Revert Xmx change in tests. * Better behaved NestedQueryPushDownTest. Exit tests on OOME. * Fix TestBufferPool. * Remove T1C from ARM tests. * Somewhat safer. * Fix tests. * Fix style stuff. * Additional debugging. * Reset null / expr configs better. * ExpressionLambdaAggregatorFactory thread-safety. * Alter forkNode to try to get better info when a JVM crashes. * Fix buffer retention in ExpressionLambdaAggregatorFactory. * Remove unused import.	2022-05-19 07:42:29 -07:00
Rohan Garg	2dd073c2cd	Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation (#12484 ) * Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation * fixup! Pass metrics object for Scan, Timeseries and GroupBy queries during cursor creation * Document vectorized dimension	2022-05-09 10:40:17 -07:00
Abhishek Agarwal	2fe053c5cb	Bump up the versions (#12480)	2022-04-27 14:28:20 +05:30
zachjsh	564d6defd4	Worker level task metrics (#12446 ) * * fix metric name inconsistency * * add task slot metrics for middle managers * * add new WorkerTaskCountStatsMonitor to report task count metrics from worker * * more stuff * * remove unused variable * * more stuff * * add javadocs * * fix checkstyle * * fix hadoop test failure * * cleanup * * add more code coverage in tests * * fix test failure * * add docs * * increase code coverage * * fix spelling * * fix failing tests * * remove dead code * * fix spelling	2022-04-26 11:44:44 -05:00
Jihoon Son	73ce5df22d	Add support for authorizing query context params (#12396) The query context is a way that the user gives a hint to the Druid query engine, so that they enforce a certain behavior or at least let the query engine prefer a certain plan during query planning. Today, there are 3 types of query context params as below. Default context params. They are set via druid.query.default.context in runtime properties. Any user context params can be default params. User context params. They are set in the user query request. See https://druid.apache.org/docs/latest/querying/query-context.html for parameters. System context params. They are set by the Druid query engine during query processing. These params override other context params. Today, any context params are allowed to users. This can cause 1) a bad UX if the context param is not matured yet or 2) even query failure or system fault in the worst case if a sensitive param is abused, ex) maxSubqueryRows. This PR adds an ability to limit context params per user role. That means, a query will fail if you have a context param set in the query that is not allowed to you. To do that, this PR adds a new built-in resource type, QUERY_CONTEXT. The resource to authorize has a name of the context param (such as maxSubqueryRows) and the type of QUERY_CONTEXT. To allow a certain context param for a user, the user should be granted WRITE permission on the context param resource. Here is an example of the permission. { "resourceAction" : { "resource" : { "name" : "maxSubqueryRows", "type" : "QUERY_CONTEXT" }, "action" : "WRITE" }, "resourceNamePattern" : "maxSubqueryRows" } Each role can have multiple permissions for context params. Each permission should be set for different context params. When a query is issued with a query context X, the query will fail if the user who issued the query does not have WRITE permission on the query context X. In this case, HTTP endpoints will return 403 response code. JDBC will throw ForbiddenException. Note: there is a context param called brokerService that is used only by the router. This param is used to pin your query to run it in a specific broker. Because the authorization is done not in the router, but in the broker, if you have brokerService set in your query without a proper permission, your query will fail in the broker after routing is done. Technically, this is not right because the authorization is checked after the context param takes effect. However, this should not cause any user-facing issue and thus should be OK. The query will still fail if the user doesn’t have permission for brokerService. The context param authorization can be enabled using druid.auth.authorizeQueryContextParams. This is disabled by default to avoid any hassle when someone upgrades his cluster blindly without reading release notes.	2022-04-21 14:21:16 +05:30
dependabot[bot]	ee44fe45c6	Bump java-dogstatsd-client from 2.13.0 to 4.0.0 (#12353 ) * Bump java-dogstatsd-client from 2.13.0 to 4.0.0 Bumps [java-dogstatsd-client](https://github.com/DataDog/java-dogstatsd-client) from 2.13.0 to 4.0.0. - [Release notes](https://github.com/DataDog/java-dogstatsd-client/releases) - [Changelog](https://github.com/DataDog/java-dogstatsd-client/blob/master/CHANGELOG.md) - [Commits](https://github.com/DataDog/java-dogstatsd-client/compare/v2.13.0...v4.0.0) * migrate statsd-emitter tests from easymock to mockito * add simple init test to make diff coverage happy Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xavier Léauté <xvrl@apache.org>	2022-03-26 16:25:13 -07:00
syacobovitz	d7308e9290	Added support in urls, and grouped metrics (#12296 )	2022-03-22 11:22:05 -07:00
Aurélien Dunand	8f3a631cbf	Fix missing conversionFactor in prometheus emitter (#12338 ) query/node/ttfb metrics are in milliseconds.	2022-03-17 21:46:06 -07:00
Xavier Léauté	5d02a91faa	upgrade Error Prone to 2.11 (requires Java 11) (#12306 ) The latest version of Error Prone now requires Java 11. Upgrading means we can remove a lot of the maven profile complexity required to run checks with Java 8. This also requires switching our strict build to use Java 11. * update error-prone to 2.11 * remove need for specific maven profiles for Java 8 and Java 15 * fix additional Error Prone warnings with Java 11 * update strict build to use Java 11	2022-03-14 19:40:48 -07:00
Xavier Léauté	4c61878f9c	Reduce use of mocking and simplify some tests (#12283 ) * remove use of mocks for ServiceMetricEvent * simplify KafkaEmitterTests by moving to Mockito * speed up KafkaEmitterTest by adjusting reporting frequency in tests * remove unnecessary easymock and JUnitParams dependencies	2022-02-26 17:23:09 -08:00
Jihoon Son	e5ad862665	A new includeAllDimension flag for dimensionsSpec (#12276 ) * includeAllDimensions in dimensionsSpec * doc * address comments * unused import and doc spelling	2022-02-25 18:27:48 -08:00
Clint Wylie	3ee66bb492	allow optimizing sql expressions and virtual columns (#12241 ) * rework sql planner expression and virtual column handling * simplify a bit * add back and deprecate old methods, more tests, fix multi-value string coercion bug and associated tests * spotbugs * fix bugs with multi-value string array expression handling * javadocs and adjust test * better * fix tests	2022-02-09 14:55:50 -08:00
Jihoon Son	ab3d994a17	Lazy instantiation for segmentKillers, segmentMovers, and segmentArchivers (#12207 ) * working * Lazily load segmentKillers, segmentMovers, and segmentArchivers * more tests * test-jar plugin * more coverage * lazy client * clean up changes * checkstyle * i did not change the branch condition * adjust failure rate to run tests faster * javadocs * checkstyle	2022-02-08 13:02:06 -08:00
JoyKing	ac87bdd736	fix typo in materialized view (#12174 )	2022-01-22 11:32:22 +08:00
Uwe Schindler	1f7dd6d86c	Forbiddenapis: Split the guava16-only signatures file from main signatures file (#12170 )	2022-01-19 17:50:28 -08:00
Ivan Vankovich	6a93872586	OpenTelemetry emitter extension (#12015 ) * Add OpenTelemetry emitter extension * Fix build * Fix checkstyle * Add used undeclared dependencies * Ignore unused declared dependencies	2022-01-15 12:18:04 +08:00
Jonathan Wei	229f82a6f0	Add parse error list API for stream supervisors, use structured object for parse exceptions, simplify parse exception message (#11961 ) * Add parse error list API for stream supervisors, simplify parse exception message * Add input string to parse exception * Use structured ParseExceptionReport * Fix tests * Add test * PR comments, add ParseExceptionReport equals verifier * Fix test	2021-12-09 15:42:55 -06:00
Paul Rogers	34a3d45737	Refactor ResponseContext (#11828 ) * Refactor ResponseContext Fixes a number of issues in preparation for request trailers and the query profile. * Converts keys from an enum to classes for smaller code * Wraps stored values in functions for easier capture for other uses * Reworks the "header squeezer" to handle types other than arrays. * Uses metadata for visibility, and ability to compress, to replace ad-hoc code. * Cleans up JSON serialization for the response context. * Other miscellaneous cleanup. * Handle unknown keys in deserialization Also, make "Visibility" into a boolean. * Revised comment * Renamd variable	2021-12-06 17:03:12 -08:00
Jihoon Son	1f052b43c5	Better serverView exec name; remove SingleServerInventoryView (#11770 ) Druid currently has 2 serverViews, regular serverView and filtered serverView. The regular serverView is used to monitor all segment announcements from all data nodes (historicals, tasks, indexers). The filtered serverView is used when you want to watch segment announcements from particular tiers. Since these server views keep track of different sets of druidServers and segments in memory, they should be maintained separately. However, they currently share the same name for their executorService, which can cause confusion and make debugging harder especially in the broker since it is using both serverViews, the filtered view for normal query processing and the regular view to serve the servers table (I'm unsure whether this is intended or whether this is a good behavior). This PR changes it to a more obvious name. This PR also removes SingleServerInventoryView. This view was deprecated a long time ago and has not been documented at least since 0.13 (#6127). I also don't think this can be better in any case than BatchServerInventoryView. Finally, I merged AbstractCuratorServerInventoryView and BatchServerInventoryView as we no longer need AbstractCuratorServerInventoryView after SingleServerInventoryView is removed.	2021-12-04 18:43:05 +05:30
Jihoon Son	fc9513b6cd	Make NodeRole available during binding; add support for dynamic registration of DruidService (#12012 ) * Make nodeRole available during binding; add support for dynamic registration of DruidService * fix checkstyle and test * fix customRole test * address comments * add more javadoc	2021-12-03 11:59:00 -08:00
Gian Merlino	3d72e66f56	Consolidate a bunch of ad-hoc segments metadata SQL; fix some bugs. (#11582 ) * Consolidate a bunch of ad-hoc segments metadata SQL; fix some bugs. This patch gathers together a variety of SQL from SqlSegmentsMetadataManager and IndexerSQLMetadataStorageCoordinator into a new class SqlSegmentsMetadataQuery. It focuses on SQL related to retrieving segment payloads and marking segments used and unused. In addition to cleaning up the code a bit, this patch also fixes a bug with years before 0 or after 9999. The prior SQL did not work properly because dates outside this range cannot be compared as strings. The new code does work for these far-past and far-future years. So, if you're ever interested in using Druid to analyze things from ancient Babylon, you better apply this patch first! * Fix test compiling. * Fixes and improvements. * Fix forbidden API. * Additional fixes.	2021-11-24 14:51:53 -08:00
XIAO WANG	f1cf1c8f39	update count distinct tests (#11927 ) Co-authored-by: wangxiao060 <wangxiao060@ke.com>	2021-11-22 21:34:00 +08:00
Clint Wylie	f260bbed23	restore and deprecate AggregatorFactory methods (#11917 ) * add back and deprecate aggregator factory methods so i can say i told you so when i delete these later * rename to make less ambiguous, fix fill method * adjust	2021-11-19 15:59:35 -08:00
Nikhil Navadiya	3c51136098	Add worker category dimension (#11554 ) * Add worker category as dimension in TaskSlotCountStatsMonitor * Change description * Add workerConfig as field * Modify HttpRemoteTaskRunnerTest to test worker category in taskslot metrics * Fixing tests * Fixing alerts * Adding unit test in SingleTaskBackgroundRunnerTest for task slot metrics APIs * Resolving false positive spell check * addressing comments * throw UnsupportedOperationException for tasklotmetrics APIs in SingleTaskBackgroundRunner Co-authored-by: Nikhil Navadiya <nnavadiya@twitter.com>	2021-11-18 22:59:07 -08:00
Clint Wylie	a8805ab60d	add missing json type for ListFilteredVirtualColumn (#11887 ) * add missing json type for ListFilteredVirtualColumn, and tests to try to avoid this happening again * fixes * ugly, but maybe this * oops * too many mappers	2021-11-09 17:25:12 -08:00
Gian Merlino	babf00f8e3	Migrate File.mkdirs to FileUtils.mkdirp. (#11879 ) * Migrate File.mkdirs to FileUtils.mkdirp. * Remove unused imports. * Fix LookupReferencesManager. * Simplify. * Also migrate usages of forceMkdir. * Fix var name. * Fix incorrect call. * Update test.	2021-11-09 11:10:49 -08:00
Jian Wang	8e7e679984	Add more metrics for Jetty server thread pool usage (#11113 ) Add more metrics for jetty server thread pool usage so we know if we have allocated enough http threads to handle requests.	2021-11-07 16:51:44 +05:30
Karan Kumar	90640bb316	Support for hadoop 3 via maven profiles (#11794 ) Add support for hadoop 3 profiles . Most of the details are captured in #11791 . We use a combination of maven profiles and resource filtering to achieve this. Hadoop2 is supported by default and a new maven profile with the name hadoop3 is created. This will allow the user to choose the profile which is best suited for the use case.	2021-10-30 22:46:24 +05:30
Clint Wylie	741b4ed516	add output type information to ExpressionPostAggregator (#11818 ) * add ColumnInspector argument to PostAggregator.getType to allow post-aggs to compute their output type based on input types * add test for test for coverage * simplify * Remove unused imports. Co-authored-by: Gian Merlino <gian@imply.io>	2021-10-22 13:52:51 -07:00
Clint Wylie	187df58e30	better types (#11713 ) * better type system * needle in a haystack * ColumnCapabilities is a TypeSignature instead of having one, INFORMATION_SCHEMA support * fixup merge * more test * fixup * intern * fix * oops * oops again * ... * more test coverage * fix error message * adjust interning, more javadocs * oops * more docs more better	2021-10-19 01:47:25 -07:00
Rohan Garg	3c46577eec	Fix moving average extension loading in middle manager and overlord (#11662 )	2021-09-08 22:09:22 -07:00
Clint Wylie	fe1d8c206a	bump version to 0.23.0-SNAPSHOT (#11670 )	2021-09-08 15:56:04 -07:00
Frank Chen	c7e5fee452	Fix an exception when using redis cluster as cache (#11369 ) * Redis mget problem in cluster mode * Format code * push down implementation of getBulk to sub-classes * Add tests * revert some changes * Fix intelllij inspections * Fix comments Signed-off-by: frank chen <frank.chen021@outlook.com> * Update extensions-contrib/redis-cache/src/main/java/org/apache/druid/client/cache/RedisClusterCache.java Co-authored-by: Benedict Jin <asdf2014@apache.org> * Update extensions-contrib/redis-cache/src/test/java/org/apache/druid/client/cache/RedisClusterCacheTest.java Co-authored-by: Benedict Jin <asdf2014@apache.org> * Update extensions-contrib/redis-cache/src/main/java/org/apache/druid/client/cache/AbstractRedisCache.java Co-authored-by: Benedict Jin <asdf2014@apache.org> * returns empty map in case of internal exception Co-authored-by: Benedict Jin <asdf2014@apache.org>	2021-08-30 16:59:53 -07:00
imply-jhan	332e68edb5	improve the metric definition (#11602 )	2021-08-17 12:31:42 +07:00
Parag Jain	c7b46671b3	option to use deep storage for storing shuffle data (#11507 ) Fixes #11297. Description Description and design in the proposal #11297 Key changed/added classes in this PR DataSegmentPusher ShuffleClient PartitionStat PartitionLocation *IntermediaryDataManager	2021-08-13 16:40:25 -04:00
dependabot[bot]	eceacf74c0	Bump java-dogstatsd-client from 2.6.1 to 2.13.0 (#11533 ) Bumps [java-dogstatsd-client](https://github.com/DataDog/java-dogstatsd-client) from 2.6.1 to 2.13.0. - [Release notes](https://github.com/DataDog/java-dogstatsd-client/releases) - [Changelog](https://github.com/DataDog/java-dogstatsd-client/blob/master/CHANGELOG.md) - [Commits](https://github.com/DataDog/java-dogstatsd-client/compare/java-dogstatsd-client-2.6.1...v2.13.0) --- updated-dependencies: - dependency-name: com.datadoghq:java-dogstatsd-client dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2021-08-03 17:53:45 -07:00
Agustin Gonzalez	a2da407b70	Add error msg to parallel task's TaskStatus (#11486 ) * Add error msg to parallel task's TaskStatus * Consolidate failure block * Add failure test * Make it fail * Add fail while stopped * Simplify hash task test using a runner that fails after so many runs (parameter) * Remove unthrown exception * Use runner names to identify phase * Added range partition kill test & fixed a timing bug with the custom runner * Forbidden api * Style * Unit test code cleanup * Added message to invalid state exception and improved readability of the phase error messages for the parallel task failure unit tests	2021-08-02 12:11:28 -07:00
Xavier Léauté	4bca7f014e	update error-prone to 2.8.0 with fix for crashing check (#11494 ) * error-prone 2.8.0 fixes https://github.com/google/error-prone/issues/2396 * fix for a few ignored return values * fix unknown args in sub-modules	2021-07-29 09:13:46 -07:00
Lucas Capistrant	9767b42e85	Add a new metric query/segments/count that is not emitted by default (#11394 ) * Add a new metric query/segments/count that is not emitted by default * docs * test the default implementation of the metric * fix spelling error in docs * document the fact that query retries will result in additional metric emissions * update using recommended text from @jihoonson	2021-07-22 17:57:35 -07:00
Maytas Monsereenusorn	8d7d60d18e	Improve Auto scaler pendingTaskBased provisioning strategy to handle when there are no currently running worker node better (#11440 ) * fix pendingTaskBased * fix doc * address comments * address comments * address comments * address comments * address comments * address comments * address comments	2021-07-15 06:52:25 +07:00
Yi Yuan	de8daf8139	Delete buildV9Directly in Kafka and Kinesis Indexing Service (#11351 ) * delete_buildV9Directly_in_kafka_and_kinesis_indexing_service * delete * delete them from server * delete buildV9Directly from hadoop indexing * bug fixed Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-06-23 16:36:46 -07:00
Xavier Léauté	a1c20d7457	update jackson dependencies to use bom (#11353 ) Switching to the bom dependency declaration simplifies managing jackson dependencies. It also removes the need to override individual library versions for CVE fixes, since the bom takes care of that internally. This change aligns our jackson dependency versions on 2.10.5(.x): - updates jackson libraries from 2.10.2 to 2.10.5 - jackson-databind remains at 2.10.5.1 as defined in the bom Release notes: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10	2021-06-16 18:37:30 -07:00
dependabot[bot]	167044f715	Bump fastutil from 8.2.3 to 8.5.4 (#11347 ) * Bump fastutil from 8.2.3 to 8.5.4 Bumps [fastutil](https://github.com/vigna/fastutil) from 8.2.3 to 8.5.4. - [Release notes](https://github.com/vigna/fastutil/releases) - [Changelog](https://github.com/vigna/fastutil/blob/master/CHANGES) - [Commits](https://github.com/vigna/fastutil/compare/8.2.3...8.5.4) --- updated-dependencies: - dependency-name: it.unimi.dsi:fastutil dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * update licenses.yaml * update maven dependency list for -core and -extra libraries to pass maven dependency checks Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xavier Léauté <xvrl@apache.org>	2021-06-10 07:43:18 -07:00
frank chen	04fefb0ca3	Fix ClassCastException (#11266 ) Signed-off-by: frank chen <frank.chen021@outlook.com>	2021-05-27 21:25:51 -07:00
Clint Wylie	f6662b4893	fix count and average SQL aggregators on constant virtual columns (#11208 ) * fix count and average SQL aggregators on constant virtual columns * style * even better, why are we tracking virtual columns in aggregations at all if we have a virtual column registry * oops missed a few * remove unused * this will fix it	2021-05-10 13:41:48 -07:00
Clint Wylie	691d7a1d54	SQL timeseries no longer skip empty buckets with all granularity (#11188 ) * SQL timeseries no longer skip empty buckets with all granularity * add comment, fix tests * the ol switcheroo * revert unintended change * docs and more tests * style * make checkstyle happy * docs fixes and more tests * add docs, tests for array_agg * fixes * oops * doc stuffs * fix compile, match doc style	2021-05-10 10:13:37 -07:00
John Bampton	a8c00d8d9b	chore: fix case of GitHub (#10928 )	2021-05-07 01:15:43 -07:00
Clint Wylie	57ddae782e	fix serde issues with time-min-max extension (#11146 ) * fix serde issues with time-min-max extension * fix pom dependencies	2021-04-27 10:33:13 -07:00
Jihoon Son	25db8787b3	Fix CAST being ignored when aggregating on strings after cast (#11083 ) * Fix CAST being ignored when aggregating on strings after cast * fix checkstyle and dependency * unused import	2021-04-12 22:21:24 -07:00

552 Commits