druid

Commit Graph

Author	SHA1	Message	Date
Clint Wylie	f41468fd46	fix off by one error in FrontCodedIndexedWriter and FrontCodedIntArrayIndexedWriter getCardinality method (#14047 ) * fix off by one error in FrontCodedIndexedWriter and FrontCodedIntArrayIndexedWriter getCardinality method	2023-04-07 03:11:15 -07:00
Abhishek Radhakrishnan	f47b05a98c	Hyphenate multi value string for consistency. Fixup extra space in javadoc. (#14043 )	2023-04-07 11:46:07 +05:30
Clint Wylie	1b75b2d3d6	revert .idea/misc.xml changes (#14044 )	2023-04-06 17:45:03 -07:00
Suraj Sanjay Kadam	b4157e32ae	Update api.md (#13436 ) * Update api.md I have created changes in api call of python according to latest version of requests 2.28.1 library. Along with this there are some irregularities between use of <your-instance> and <hostname> so I have tried to fix that also. * Update api.md made some changes in declaring USER and PASSWORD	2023-04-06 15:05:36 -07:00
Charles Smith	1c2744b31e	Fix querying sql (#14026 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-04-06 14:50:06 -07:00
zachjsh	5c0221375c	Allow for Input source security in native task layer (#14003 ) Fixes #13837. ### Description This change allows for input source type security in the native task layer. To enable this feature, the user must set the following property to true: `druid.auth.enableInputSourceSecurity=true` The default value for this property is false, which will continue the existing functionality of needing authorization to write to the respective datasource. When this config is enabled, users will be required to be authorized for the following resource action, in addition to write permission on the respective datasource: `new ResourceAction(new Resource(ResourceType.EXTERNAL, {INPUT_SOURCE_TYPE}), Action.READ)` where `{INPUT_SOURCE_TYPE}` is the type of the input source being used: http, inline, s3, etc. Only tasks that provide a non-default implementation of the `getInputSourceResources` method can be submitted when config `druid.auth.enableInputSourceSecurity=true` is set. Otherwise, a 400 error will be thrown.	2023-04-06 13:13:09 -04:00
Abhishek Agarwal	92912a6a2b	JOIN or UNNEST queries over tombstone segment can fail (#14021 ) Join,Unnest queries over tombstone segment can fail	2023-04-06 16:55:58 +05:30
Clint Wylie	b11c0bc249	smarter nested column index utilization (#13977 ) * smarter nested column index utilization changes: * adds skipValueRangeIndexScale and skipValuePredicateIndexScale to ColumnConfig (e.g. DruidProcessingConfig) available as system config via druid.processing.indexes.skipValueRangeIndexScale and druid.processing.indexes.skipValuePredicateIndexScale * NestedColumnIndexSupplier uses skipValueRangeIndexScale and skipValuePredicateIndexScale to multiply by the total number of rows to be processed to determine the threshold at which we should no longer consider using bitmap indexes because it will be too many operations * Default values for skipValueRangeIndexScale and skipValuePredicateIndexScale have been initially set to 0.08, but are separate to allow independent tuning * these are not documented on purpose yet because they are kind of hard to explain, they mainly exist to help conduct larger scale experiments than the jmh benchmarks used to derive the initial set of values * these changes provide a pretty sweet performance boost for filter processing on nested columns	2023-04-06 04:09:24 -07:00
Paul Rogers	030ed911d4	Temporarily revert extended table functions for Druid 26 (#14019 )	2023-04-05 21:09:33 -07:00
Abhishek Radhakrishnan	b98eed8fb8	Revert quoting lookup fix. (#14034 ) * Revert "Add ANSI_QUOTES propety to DBI init in lookups. (#13826)" This reverts commit `9e9976001c`. * Revert "Quote and escape literals in JDBC lookup to allow reserved identifiers. (#13632)" This reverts commit `41fdf6eafb`. * fix typo.	2023-04-05 20:52:36 -07:00
Nicholas Lippis	5810e650d4	K8s mm less fixes (#14028 ) Update Fabric8 version and allow metrics monitors to be overridden	2023-04-05 22:23:16 +05:30
Tejaswini Bandlamudi	ccf48245d7	Update documentation for Kafka Supervisor IdleConfig (#14032 )	2023-04-05 21:55:39 +05:30
Gian Merlino	319f99db05	Always use file sizes when determining batch ingest splits (#13955 ) * Always use file sizes when determining batch ingest splits. Main changes: 1) Update CloudObjectInputSource and its subclasses (S3, GCS, Azure, Aliyun OSS) to use SplitHintSpecs in all cases. Previously, they were only used for prefixes, not uris or objects. 2) Update ExternalInputSpecSlicer (MSQ) to consider file size. Previously, file size was ignored; all files were treated as equal weight when determining splits. A side effect of these changes is that we'll make additional network calls to find the sizes of objects when users specify URIs or objects as opposed to prefixes. IMO, this is worth it because it's the only way to respect the user's split hint and task assignment settings. Secondary changes: 1) S3, Aliyun OSS: Use getObjectMetadata instead of listObjects to get metadata for a single object. This is a simpler call that is also expected to be less expensive. 2) Azure: Fix a bug where getBlobLength did not populate blob reference attributes, and therefore would not actually retrieve the blob length. 3) MSQ: Align dynamic slicing logic between ExternalInputSpecSlicer and TableInputSpecSlicer. 4) MSQ: Adjust WorkerInputs to ensure there is always at least one worker, even if it has a nil slice. * Add msqCompatible to testGroupByWithImpossibleTimeFilter. * Fix tests. * Add additional tests. * Remove unused stuff. * Remove more unused stuff. * Adjust thresholds. * Remove irrelevant test. * Fix comments. * Fix bug. * Updates.	2023-04-05 08:54:01 -07:00
Karan Kumar	e6a11707cb	Adding query stack fault to MSQ to capture native query errors. (#13926 ) * Add a new fault "QueryRuntimeError" to MSQ engine to capture native query errors. * Fixed bug in MSQ fault tolerance where workers were being retried if `UnexpectedMultiValueDimensionException` was thrown. * An exception from the query runtime with `org.apache.druid.query` as the package name is thrown as a QueryRuntimeError	2023-04-05 16:29:10 +05:30
Clint Wylie	1c8a184677	add null safety checks for DiscoveryDruidNode services for more resilient http server and task views (#13930 ) * add null safety checks for DiscoveryDruidNode services for more resilient http server and task vi	2023-04-05 02:45:39 -07:00
Laksh Singla	012b49d5e5	Fix the order of aggregator finalization in GroupByPostShuffleFrameProcessor (MSQ) (#14022 ) * fix the order in which finalization is done * add comment explaining the change * null handling case	2023-04-05 11:04:06 +05:30
Clint Wylie	d21babc5b8	remix nested columns (#14014 ) changes: * introduce ColumnFormat to separate physical storage format from logical type. ColumnFormat is now used instead of ColumnCapabilities to get column handlers for segment creation * introduce new 'auto' type indexer and merger which produces a new common nested format of columns, which is the next logical iteration of the nested column stuff. Essentially this is an automatic type column indexer that produces the most appropriate column for the given inputs, making either STRING, ARRAY<STRING>, LONG, ARRAY<LONG>, DOUBLE, ARRAY<DOUBLE>, or COMPLEX<json>. * revert NestedDataColumnIndexer, NestedDataColumnMerger, NestedDataColumnSerializer to their pre-#13803 behavior (v4) for backwards compatibility * fix a bug in RoaringBitmapSerdeFactory if anything actually ever wrote out an empty bitmap using toBytes and then later tried to read it (the nerve!)	2023-04-04 17:51:59 -07:00
George Shiqi Wu	f60f377e5f	Fix issues with null pointers on jobResponse (#14010 ) * Fix issues with null pointers on jobResponse * fix unit tests * Update extensions-contrib/kubernetes-overlord-extensions/src/main/java/org/apache/druid/k8s/overlord/common/DruidKubernetesPeonClient.java Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * nullable * fix error message --------- Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>	2023-04-04 17:48:18 -07:00
317brian	7e572eef08	docs: sql unnest and cleanup unnest datasource (#13736 ) Co-authored-by: Elliott Freis <elliottfreis@Elliott-Freis.earth.dynamic.blacklight.net> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Paul Rogers <paul-rogers@users.noreply.github.com> Co-authored-by: Jill Osborne <jill.osborne@imply.io> Co-authored-by: Anshu Makkar <83963638+anshu-makkar@users.noreply.github.com> Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: Elliott Freis <108356317+imply-elliott@users.noreply.github.com> Co-authored-by: Nicholas Lippis <nick.lippis@imply.io> Co-authored-by: Rohan Garg <7731512+rohangarg@users.noreply.github.com> Co-authored-by: Karan Kumar <karankumar1100@gmail.com> Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com> Co-authored-by: Gian Merlino <gianmerlino@gmail.com> Co-authored-by: Clint Wylie <cwylie@apache.org> Co-authored-by: Adarsh Sanjeev <adarshsanjeev@gmail.com> Co-authored-by: Laksh Singla <lakshsingla@gmail.com>	2023-04-04 13:07:54 -07:00
imply-cheddar	232491eea4	Document our conventions for writing messages (#13916 ) Document our conventions for writing messages	2023-04-03 21:30:20 -07:00
Benedict Jin	ab91768ddf	Fix broken shields (#14015 )	2023-04-04 09:41:53 +05:30
Benedict Jin	6c33ef8b15	Improve entries (#14016 )	2023-04-04 09:40:59 +05:30
Tejaswini Bandlamudi	5a9c13293b	remove duplicate trigger on Cron Job ITs workflow (#14013 )	2023-04-04 09:39:48 +05:30
Soumyava	ca94f7146f	Planning correctly for order by queries on time which previously thre… (#13965 ) * Planning correctly for order by queries on time which previously threw a planning error * Updating toDruidQueryForExplaining on a query data source if there is a window on the partial query	2023-04-03 18:30:19 -07:00
George Shiqi Wu	4560b9d8aa	New error message for task deletion (#14008 ) * New error message * Add unit test	2023-04-03 14:26:09 -07:00
Karan Kumar	217b0f6832	Eagerly fetching remote s3 files leading to out of disk (OOD) (#13981 ) * Eagerly fetching remote s3 files leading to OOD.	2023-04-03 14:10:37 +05:30
Clint Wylie	518698a952	lower segment heap footprint and fix bug with expression type coercion (#14002 )	2023-03-31 13:53:22 -07:00
Vadim Ogievetsky	981662e9f4	Web console: add a nice UI for overlord dynamic configs and improve the docs (#13993 ) * in progress * better form * doc updates * doc changes * add inline docs * fix tests * Update docs/configuration/index.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * final fixes * fix case * Update docs/configuration/index.md Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com> * fix overflow * fix spelling --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2023-03-31 10:12:25 -07:00
Clint Wylie	e3211e3be0	actually backwards compatible frontCoded string encoding strategy (#13996 )	2023-03-31 02:24:12 -07:00
soullkk	51f3db2ce6	Fix peon errors when executing tasks in ipv6(#13972 ) (#13995 )	2023-03-31 09:18:10 +05:30
Soumyava	1eeecf5fb2	Fixing regression issues on unnest (#13976 ) * select sum(c) on an unnested column now does not return 'Type mismatch' error and works properly * Making sure an inner join query works properly * Having on unnested column with a group by now works correctly * count(*) on an unnested query now works correctly	2023-03-31 09:06:43 +05:30
abhagraw	eb31207402	Using MinIO to run S3DeepStorage ITs (#13997 ) * Using MinIO to S3DeepStorage ITs * Adding S3DeepStorageTest to github actions revised ITs	2023-03-30 12:15:53 -07:00
Kashif Faraz	47face9ca9	Handle null values in BrokerServerView.serverAddedSegment (#13980 ) Due to race conditions, the BrokerServerView may sometimes try to add a segment to a server which has already been removed from the inventory. This results in an NPE and keeps the BrokerServerView from processing all change requests.	2023-03-30 16:19:05 +05:30
Nicholas Lippis	61a35262ec	Kubernetes task runner live reports (#13986 ) Implement Live Reports for the KubernetesTaskRunner	2023-03-30 10:30:22 +05:30
George Shiqi Wu	44abe2b96f	Fix bug in k8s task runner in handling deleted jobs (#14001 ) With the KubernetesTaskRunner, if a task is manually shutdown via the web console while running or the corresponding k8s job is manually deleted, the thread responsible for overseeing the task gets stuck in a loop because the fabric8 client sends one event to it that the job is null when the job is deleted, but this doesn't pass the condition. This means that the thread is stuck waiting on a fabric8 event (the job being successful) that will never come up until maxTaskDuration (default 4 hours). If a user of the extension is trying to use a limited taskqueue maxSize, this can cause problems as the k8s executor pool is unable to pick up additional tasks (since threads are stuck waiting on the old tasks that have already been deleted).	2023-03-30 10:09:52 +05:30
zachjsh	3bb67721f7	Allow for Input source security in SQL layer (#13989 ) This change introduces the concept of an input source type security model, proposed in #13837. With this change, this feature is only available at the SQL layer, but we will expand to the native layer in a follow-up PR. To enable this feature, the user must set the following property to true: druid.auth.enableInputSourceSecurity=true The default value for this property is false, which will continue the existing functionality of having the usage of all external sources authorized against the hardcoded resource action new ResourceAction(new Resource(ResourceType.EXTERNAL, ResourceType.EXTERNAL), Action.READ) When this config is enabled, users will be required to be authorized for the following resource action: new ResourceAction(new Resource(ResourceType.EXTERNAL, {INPUT_SOURCE_TYPE}), Action.READ) where {INPUT_SOURCE_TYPE} is the type of the input source being used: http, inline, s3, etc. Documentation has not been added for the feature as it is not complete at the moment, as we still need to enable this for the native layer in a follow-up PR.	2023-03-29 22:15:33 -04:00
Vadim Ogievetsky	abb7133153	Web console: use EXTEND syntax (#13985 ) * use EXTEND syntax * update licenses * update demo queries * updated snapshots * add join algorithm selector * dismiss	2023-03-29 16:19:49 -07:00
Tejaswini Bandlamudi	f715887172	Debug docker logs on ITs failure. (#13978 )	2023-03-29 09:06:41 -07:00
Karan Kumar	e4c5122a60	Fixing checkstyle (#14000 )	2023-03-29 20:21:21 +05:30
Sandeep	ccdf30e399	Bump Joda-Time version for current DateTimeZone data (#13999 )	2023-03-29 20:15:49 +05:30
Karan Kumar	8dce3ca4d5	OOM fix for running MSQ jobs with `intermediateSuperSorterStorageMaxLocalBytes` set (#13974 ) While using intermediateSuperSorterStorageMaxLocalBytes the super sorter was retaining references of the memory allocator. The fix clears the current outputChannel when close() is called on the ComposingWritableFrameChannel.java	2023-03-29 18:00:00 +05:30
Tejaswini Bandlamudi	3c096c01a2	cache mvn dependencies across tests without building (#13962 )	2023-03-29 16:27:36 +05:30
Nicholas Lippis	488f1d8363	Do not print error message if pod not found when getting task location (#13971 ) Do not print error message if pod not found when getting task location	2023-03-29 13:27:06 +05:30
Clint Wylie	2219e68fa3	add backwards compat mode for frontCoded stringEncodingStrategy (#13988 )	2023-03-28 14:44:44 -07:00
Paul Rogers	76fe26d4ba	Fix typos, add tests for http() function (#13954 )	2023-03-28 14:41:06 -07:00
frankgrimes97	2f98675285	Tuple sketch SQL support (#13887 ) This PR is a follow-up to #13819 so that the Tuple sketch functionality can be used in SQL for both ingestion using Multi-Stage Queries (MSQ) and also for analytic queries against Tuple sketch columns.	2023-03-28 18:47:12 +05:30
Karan Kumar	c2fe6a4956	Reworking s3 connector with various improvements (#13960 ) * Reworking s3 connector with 1. Adding retries 2. Adding max fetch size 3. Using s3Utils for most of the api's 4. Fixing bugs in DurableStorageCleaner 5. Moving to Iterator for listDir call	2023-03-28 17:05:16 +05:30
Rishabh Singh	e8e8082573	Update OIDCConfig with scope information (#13973 ) Allow users to provide custom scope through OIDC configuration	2023-03-28 14:50:00 +05:30
Clint Wylie	d5b1b5bc8e	nested columns + arrays = array columns! (#13803 ) array columns! changes: * add support for storing nested arrays of string, long, and double values as specialized nested columns instead of breaking them into separate element columns * nested column type mimic behavior means that columns ingested with only root arrays of primitive values will be ARRAY typed columns * neat test refactor stuff * add v4 segment test * add array element indexes * add tests for unnest and array columns * fix unnest column value selector cursor handling of null and empty arrays	2023-03-27 12:42:35 -07:00
Gian Merlino	062d72b67e	Add timeout to TaskStartTimeoutFault. (#13970 ) * Add timeout to TaskStartTimeoutFault. Makes the error message a bit more useful. * Update docs.	2023-03-27 23:37:19 +05:30

12607 Commits
