druid

Commit Graph

Author	SHA1	Message	Date
Clint Wylie	2219e68fa3	add backwards compat mode for frontCoded stringEncodingStrategy (#13988)	2023-03-28 14:44:44 -07:00
Paul Rogers	76fe26d4ba	Fix typos, add tests for http() function (#13954)	2023-03-28 14:41:06 -07:00
frankgrimes97	2f98675285	Tuple sketch SQL support (#13887) This PR is a follow-up to #13819 so that the Tuple sketch functionality can be used in SQL for both ingestion using Multi-Stage Queries (MSQ) and also for analytic queries against Tuple sketch columns.	2023-03-28 18:47:12 +05:30
Karan Kumar	c2fe6a4956	Reworking s3 connector with various improvements (#13960) * Reworking s3 connector with 1. Adding retries 2. Adding max fetch size 3. Using s3Utils for most of the api's 4. Fixing bugs in DurableStorageCleaner 5. Moving to Iterator for listDir call	2023-03-28 17:05:16 +05:30
Rishabh Singh	e8e8082573	Update OIDCConfig with scope information (#13973) Allow users to provide custom scope through OIDC configuration	2023-03-28 14:50:00 +05:30
Clint Wylie	d5b1b5bc8e	nested columns + arrays = array columns! (#13803) array columns! changes: * add support for storing nested arrays of string, long, and double values as specialized nested columns instead of breaking them into separate element columns * nested column type mimic behavior means that columns ingested with only root arrays of primitive values will be ARRAY typed columns * neat test refactor stuff * add v4 segment test * add array element indexes * add tests for unnest and array columns * fix unnest column value selector cursor handling of null and empty arrays	2023-03-27 12:42:35 -07:00
Gian Merlino	062d72b67e	Add timeout to TaskStartTimeoutFault. (#13970) * Add timeout to TaskStartTimeoutFault. Makes the error message a bit more useful. * Update docs.	2023-03-27 23:37:19 +05:30
Arnout Engelen	daff7fe73b	Document how to report security issues (#13886) Document how to report security issues on the security overview page, so we can link this page from the homepage. That should make all the other important security information easier to find as well.	2023-03-27 11:26:37 +05:30
kaijianding	13ffeb50ba	should retry when failed to pause realtime task (#11515)	2023-03-25 19:03:13 +05:30
Atul Mohan	19db32d6b4	Add JWT authenticator support for validating ID Tokens (#13242) Expands the OIDC based auth in Druid by adding a JWT Authenticator that validates ID Tokens associated with a request. The existing pac4j authenticator works for authenticating web users while accessing the console, whereas this authenticator is for validating Druid API requests made by Direct clients. Services already supporting OIDC can attach their ID tokens to the Druid requests under the Authorization request header.	2023-03-25 18:41:40 +05:30
Rishabh Singh	598eaad7e1	Fix HSTS for middle manager (#13975) Fix HSTS for middle manager	2023-03-25 14:01:09 +05:30
Gian Merlino	549018d076	Revert "Update docs." This reverts commit `de27c7d3c1`.	2023-03-24 17:16:12 -07:00
Gian Merlino	de27c7d3c1	Update docs.	2023-03-24 17:15:27 -07:00
Nicholas Lippis	8a72544bd2	Hook up pod template adapter (#13966) * Hook up PodTemplateTaskAdapter * Make task adapter TYPE parameters final * Rename adapters types * Include specified adapter name in exception message * Documentation for sidecarSupport deprecation * Fix order * Set TASK_ID as environment variable in PodTemplateTaskAdapter (#13969) * Update docs/development/extensions-contrib/k8s-jobs.md Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * Hook up PodTemplateTaskAdapter * Make task adapter TYPE parameters final * Rename adapters types * Include specified adapter name in exception message * Documentation for sidecarSupport deprecation * Fix order * fix spelling errors --------- Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2023-03-24 12:13:46 -06:00
Jill Osborne	976d39281f	Fix some broken links in docs (#13968)	2023-03-24 10:48:23 -07:00
Nicholas Lippis	36df2495e1	Set TASK_ID as environment variable in PodTemplateTaskAdapter (#13969)	2023-03-23 16:45:01 -06:00
Abhishek Agarwal	139a058ba7	Use sonatype maven central for plugin repositories (#13961) * Change search order of maven repositories * Update pom.xml	2023-03-23 15:35:47 +05:30
abhagraw	c52d15d65d	Fixing security vulnerability check errors (#13956) * Fixing security vulnerability check errors * Updating javax.el to jakarta.el * Adding cron job trigger on changes to suppressions file	2023-03-23 11:10:06 +05:30
Paul Rogers	da42ee5bfa	Added TYPE(native) data type for external tables (#13958)	2023-03-22 21:43:29 -07:00
Soumyava	2ad133c06e	Unnest changes for moving the filter on right side of correlate to inside the unnest datasource (#13934) * Refactoring and bug fixes on top of unnest. The filter now is passed inside the unnest cursors. Added tests for scenarios such as 1. filter on unnested column which involves a left filter rewrite 2. filter on unnested virtual column which pushes the filter to the right only and involves no rewrite 3. not filters 4. SQL functions applied on top of unnested column 5. null present in first row of the column to be unnested	2023-03-22 18:24:00 -07:00
Vadim Ogievetsky	8d125b7c7f	Web console: segment writing progress indication (#13929) * add segment writing progress indication * update with more metrics * add push metric	2023-03-22 16:34:38 -07:00
Nicholas Lippis	d81d13b9ba	Pod template task adapter (#13896) * Pod template task adapter * Use getBaseTaskDirPaths * Remove unused task from getEnv * Use Optional.ifPresent() instead of Optional.map() * Pass absolute path * Don't pass task to getEnv * Assert the correct adapter is created * Javadocs and Comments * Add exception message to assertions	2023-03-22 14:20:24 -06:00
Clint Wylie	086eb26b74	fix join and unnest planning to ensure that duplicate join prefixes are not used (#13943) * fix join and unnest planning to ensure that duplicate join prefixes are not used * wont somebody please think of the children	2023-03-22 12:53:55 -07:00
Adarsh Sanjeev	7bab407495	Add segment generator counters to MSQ reports (#13909) * Add segment generator counters to reports * Remove unneeded annotation * Fix checkstyle and coverage * Add persist and merged as new metrics * Address review comments * Fix checkstyle * Create metrics class to handle updating counters * Address review comments * Add rowsPushed as a new metrics	2023-03-22 09:17:26 -07:00
Clint Wylie	f4392a3155	expression transform improvements and fixes (#13947) changes: * fixes inconsistent handling of byte[] values between ExprEval.bestEffortOf and ExprEval.ofType, which could cause byte[] values to end up as java toString values instead of base64 encoded strings in ingest time transforms * improved ExpressionTransform binding to re-use ExprEval.bestEffortOf when evaluating a binding instead of throwing it away * improved ExpressionTransform array handling, added RowFunction.evalDimension that returns List<String> to back Row.getDimension and remove the automatic coercing of array types that would typically happen to expression transforms unless using Row.getDimension * added some tests for ExpressionTransform with array inputs * improved ExpressionPostAggregator to use partial type information from decoration * migrate some test uses of InputBindings.forMap to use other methods	2023-03-21 23:26:53 -07:00
Kashif Faraz	b7752a909c	Enable round-robin segment assignment and batch segment allocation by default (#13942) Changes: - Set `useRoundRobinSegmentAssignment` in coordinator dynamic config to `true` by default. - Set `batchSegmentAllocation` in `TaskLockConfig` (used in Overlord runtime properties) to `true` by default.	2023-03-22 08:20:01 +05:30
Victoria Lim	ede9903ff4	pip install for Python Druid API (#13938) Broken test appears unrelated to this PR * make druidapi pip installable * include druidapi in prerequisites * add license to setup.py * updates from Paul's review * note about editable install * Apply suggestions from code review Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * update install instructions * found unrelated typos * standardize install cmd with pip --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-03-21 11:37:39 -07:00
Gian Merlino	1c7a03a47b	Lower default maxRowsInMemory for realtime ingestion. (#13939) * Lower default maxRowsInMemory for realtime ingestion. The thinking here is that for best ingestion throughput, we want intermediate persists to be as big as possible without using up all available memory. So, we rely mainly on maxBytesInMemory. The default maxRowsInMemory (1 million) is really just a safety: in case we have a large number of very small rows, we don't want to get overwhelmed by per-row overheads. However, maximum ingestion throughput isn't necessarily the primary goal for realtime ingestion. Query performance is also important. And because query performance is not as good on the in-memory dataset, it's helpful to keep it from growing too large. 150k seems like a reasonable balance here. It means that for a typical 5 million row segment, we won't trigger more than 33 persists due to this limit, which is a reasonable number of persists. * Update tests. * Update server/src/main/java/org/apache/druid/segment/indexing/RealtimeTuningConfig.java Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * Fix test. * Fix link. --------- Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2023-03-21 10:36:36 -07:00
Jill Osborne	4f95285406	Correct nested columns JSON example (#13953)	2023-03-21 09:17:26 -07:00
Atul Mohan	617c325c70	Make zk connection retries configurable (#13913) * This makes the zookeeper connection retry count configurable. This is presently hardcoded to 29 tries which ends up taking a long time for the druid node to shutdown in case of ZK connectivity loss. Having a shorter retry count helps k8s deployments to fail fast. In situations where the underlying k8s node loses network connectivity or is no longer able to talk to zookeeper, failing fast can trigger pod restarts which can then reassign the pod to a healthy k8s node. Existing behavior is preserved, but users can override this property if needed.	2023-03-21 14:45:28 +05:30
Adarsh Sanjeev	143fdcfacf	Change test name so it triggers in CI (#13844) As the name of the class did not end or start with "Test", CalciteSelectQueryMSQTest was not triggered in CI. This PR renames the test.	2023-03-20 15:55:52 +05:30
Tejaswini Bandlamudi	1c250a0bc0	Fix error in cron job ITs workflow (#13945)	2023-03-17 17:29:45 +05:30
John Gozde	38adac4369	Dart sass (#13937) * Run npx saas-migrator division * Switch to dart sass * Upgrade blueprint * Remove deprecated import syntax * Prettify * Snapshots	2023-03-16 12:44:24 -07:00
Karan Kumar	bf13156b55	Regression bug fix where ever LimitFrameProcessor's were used. (#13941)	2023-03-16 09:18:18 -07:00
Karan Kumar	67df1324ee	Undocumenting certain context parameter in MSQ. (#13928) * Removing intermediateSuperSorterStorageMaxLocalBytes, maxInputBytesPerWorker, composedIntermediateSuperSorterStorageEnabled, clusterStatisticsMergeMode from docs * Adding documentation in the context class.	2023-03-16 17:56:44 +05:30
Tejaswini Bandlamudi	da197c9273	Migrate existing jdk11 ITs to cron job (#13918) This cron job runs on the latest commit of the master branch by default daily at 3:00 AM UTC.	2023-03-16 15:30:07 +05:30
Tejaswini Bandlamudi	6837289cb0	Fixes parquet uint_32 datatype conversion (#13935) After parquet ingestion, uint_32 parquet datatypes are stored as null values in the dataSource. This PR fixes this conversion bug.	2023-03-16 15:27:38 +05:30
abhagraw	c7d864d3bc	Update container creation in AzureTestUtil.java (#13911) * 1. Handling deletion/creation of container created during the previously run test in AzureTestUtil.java. 2. Adding/updating log messages and comments in Azure and GCS deep storage tests.	2023-03-16 11:04:43 +05:30
317brian	65a663adbb	docs: clarify Java precision (#13671) Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-03-15 11:43:41 -07:00
Benedict Jin	cee2dfd768	Upgrade ZK from 3.5.9 to 3.5.10 to avoid data inconsistency risk (#13715)	2023-03-15 19:21:09 +05:30
Andreas Maechler	46766d245c	Replace deprecated substr with slice (#13822)	2023-03-15 03:57:06 -07:00
Clint Wylie	ed57c5c853	better FrontCodedIndexed (#13854) * Adds new implementation of 'frontCoded' string encoding strategy, which writes out a v1 FrontCodedIndexed which stores buckets on a prefix of the previous value instead of the first value in the bucket	2023-03-14 18:14:11 -07:00
somu-imply	7ce3371730	Fixing a github workflow to resolve conflicts and use the correct tag for jdk (#13933)	2023-03-14 16:06:27 -07:00
somu-imply	a7ba361666	Refactoring and bug fixes on top of unnest. The allowList now is not passed … (#13922) * Refactoring and bug fixes on top of unnest. The filter now is passed inside the unnest cursors. Added tests for scenarios such as 1. filter on unnested column which involves a left filter rewrite 2. filter on unnested virtual column which pushes the filter to the right only and involves no rewrite 3. not filters 4. SQL functions applied on top of unnested column 5. null present in first row of the column to be unnested	2023-03-14 16:05:56 -07:00
Paul Rogers	4493275d88	Use Maven central repo rather than Apache (#13921) * Use Maven central repo rather than Apache * Disable snapshots	2023-03-13 10:49:32 -07:00
Karan Kumar	29b6bf0942	Removing the forbidden check on getOrDefault due to java8 incompatibility. (#13920)	2023-03-11 09:49:32 +05:30
Elliott Freis	8a1dc2f51c	We want to tag the container based on the build jdk version, not the runtime version (#13917) Co-authored-by: Elliott Freis <elliottfreis@Elliott-Freis.earth.dynamic.blacklight.net>	2023-03-10 11:35:33 -08:00
Suneet Saldanha	44547614ae	Report engine as a dimension for sqlQuery metrics (#13906) * Report engine as a dimension for sqlQuery metrics * docs	2023-03-10 11:23:57 -08:00
Karan Kumar	67be70e82e	Removing the forbidden check until we find a fix for java 8 to unblock builds. (#13910)	2023-03-10 21:37:19 +05:30
Gian Merlino	4b1ffbc452	Various changes and fixes to UNNEST. (#13892) * Various changes and fixes to UNNEST. Native changes: 1) UnnestDataSource: Replace "column" and "outputName" with "virtualColumn". This enables pushing expressions into the datasource. This in turn allows us to do the next thing... 2) UnnestStorageAdapter: Logically apply query-level filters and virtual columns after the unnest operation. (Physically, filters are pulled up, when possible.) This is beneficial because it allows filters and virtual columns to reference the unnested column, and because it is consistent with how the join datasource works. 3) Various documentation updates, including declaring "unnest" as an experimental feature for now. SQL changes: 1) Rename DruidUnnestRel (& Rule) to DruidUnnestRel (& Rule). The rel is simplified: it only handles the UNNEST part of a correlated join. Constant UNNESTs are handled with regular inline rels. 2) Rework DruidCorrelateUnnestRule to focus on pulling Projects from the left side up above the Correlate. New test testUnnestTwice verifies that this works even when two UNNESTs are stacked on the same table. 3) Include ProjectCorrelateTransposeRule from Calcite to encourage pushing mappings down below the left-hand side of the Correlate. 4) Add a new CorrelateFilterLTransposeRule and CorrelateFilterRTransposeRule to handle pulling Filters up above the Correlate. New tests testUnnestWithFiltersOutside and testUnnestTwiceWithFilters verify this behavior. 5) Require a context feature flag for SQL UNNEST, since it's undocumented. As part of this, also cleaned up how we handle feature flags in SQL. They're now hooked into EngineFeatures, which is useful because not all engines support all features.	2023-03-10 16:42:08 +05:30

12564 Commits

All Branches