druid

mirror of https://github.com/apache/druid.git synced 2025-02-18 16:12:23 +00:00

Author	SHA1	Message	Date
dependabot[bot]	99da4f3057	Bump commons-codec:commons-codec from 1.13 to 1.16.0 (#14819 ) * Bump commons-codec:commons-codec from 1.13 to 1.16.0 Bumps [commons-codec:commons-codec](https://github.com/apache/commons-codec) from 1.13 to 1.16.0. - [Changelog](https://github.com/apache/commons-codec/blob/master/RELEASE-NOTES.txt) - [Commits](https://github.com/apache/commons-codec/compare/commons-codec-1.13...rel/commons-codec-1.16.0) --- updated-dependencies: - dependency-name: commons-codec:commons-codec dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * update licenses.yaml * update licences.yaml --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xavier Léauté <xvrl@apache.org>	2023-11-13 08:54:55 -08:00
YongGang	3a3d37ef40	Fix for segment/count Metric Not Emitting with Statsd-emitter (#15347 ) * fix segment/count metric in Statsd-emitter * update doc * Update docs/development/extensions-contrib/prometheus.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update docs/development/extensions-contrib/statsd.md Co-authored-by: Suneet Saldanha <suneet@apache.org> --------- Co-authored-by: Suneet Saldanha <suneet@apache.org>	2023-11-10 08:08:58 -08:00
Charles Smith	e7d0429f5b	docs: suggest metadata store with instant ADD COLUMN semantics (#15334 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-11-09 12:56:30 -08:00
AmatyaAvadhanula	895e53555c	Optimize mark segments as unused (#15352 )	2023-11-09 15:13:45 +05:30
Vadim Ogievetsky	fa48d4ea7d	use is not distinct from (#15349 )	2023-11-08 18:02:42 -08:00
Vadim Ogievetsky	d12f557492	fix ingest datasource detection falling over on paren (#15339 )	2023-11-08 13:32:27 -08:00
George Shiqi Wu	130bfbfc6d	Revert "Separate task lifecycle from kubernetes/location lifecycle (#15133 )" (#15346 ) This reverts commit dc0b163e192545c802b7fe2b3271e035cc1e70ff.	2023-11-08 13:12:30 -05:00
Kengo Seki	b7d7f84bce	Bump Jedis version to 5.0.2 (#15344 ) Currently, the redis-cache extension uses Jedis 2.9.0, which was released over seven years ago and is no longer listed in the official support matrix. This patch upgrades it to ensure compatibility with recent versions of Redis and make future upgrades easier, including: Upgrade Jedis to v5.0.2, the latest version at this writing, and address the API changes and dependency version mismatch. Replace mock-jedis with jedis-mock, since the former is no longer actively maintained and is not compatible with recent versions of Jedis.	2023-11-08 20:22:41 +05:30
Rishabh Singh	db95c375a6	Increase historical heap for standard IT (#15337 ) Lately, Query IT has been failing due to historical server running out of memory (OOM). We are investigating the historical heap dump from the test. Until the issue is resolved, we are increasing the heap size of historical server.	2023-11-08 15:21:30 +05:30
Pranav	e2fde8c516	Refactor lookups behavior while loading/dropping the containers (#14806 )	2023-11-07 10:07:28 -08:00
17px	54fa3425c3	fix: Creating span label not closed (#15323 )	2023-11-07 11:01:28 +08:00
nasuiyile	9333dd1f73	Correct the path of ipynb file of notebook introduction. (#15327 )	2023-11-07 11:01:06 +08:00
HudsonShi	e6ab8a15eb	Fixed the table in docker.md (#15328 )	2023-11-07 11:00:23 +08:00
Charles Smith	0403e48266	window functions docs (#14739 ) * draft window functions * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * address comments * remove default column * Update docs/querying/sql-window-functions.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql-window-functions.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * fix ntile * remove default header column * code tics to remove spelling errors * add known issues, add SUM example * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * address spelling * remove extra chars * add to sidebar, fix admonition * Update sql-window-functions.md accept suggestion, change admonition style * update sidebar * Delete Untitled.ipynb rm unwanted file * Update docs/querying/sql-window-functions.md * Update docs/querying/sql-window-functions.md * update context param, accept suggestions * accept suggestions * Apply suggestions from code review * Fix known issues * require GROUP BY, explain order of operation * accept suggestions * fix spelling --------- Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-11-06 11:34:42 -08:00
Abhishek Radhakrishnan	2136dc3591	Batch segment retrieval from the metadata store (#15305 ) * Add a unit test that fails when used segments with too many intervals are retrieved. - This is a failing test case that needs to be ignored. * Batch the intervals (use 100 as it's consistent with batching in other places). * move the filtering inside the batch * Account for limit cross the batch splits. * Adjustments * Fixup and add tests * small refactor * add more tests. * remove wrapper. * Minor edits * assert out of range	2023-11-06 11:30:24 -08:00
Abhishek Agarwal	4b64a5693b	Move service specific JVM parameters to the right in tests (#15325 ) Historical OOMs were not getting dumped into /shared/logs because common JVM flags will override service-specific JVM flags. This PR fixes that and also removes unnecessary overrides in historical.	2023-11-06 15:45:59 +05:30
Atul Mohan	ff7de49015	Consolidate and reduce dependency footprint for iceberg extension (#15280 ) * Consolidate and reduce dependency footprint * Fix dependency analysis	2023-11-06 12:17:32 +05:30
Rishabh Singh	8c802e4c9b	Relocating Table Schema Building: Shifting from Brokers to Coordinator for Improved Efficiency (#14985 ) In the current design, brokers query both data nodes and tasks to fetch the schema of the segments they serve. The table schema is then constructed by combining the schemas of all segments within a datasource. However, this approach leads to a high number of segment metadata queries during broker startup, resulting in slow startup times and various issues outlined in the design proposal. To address these challenges, we propose centralizing the table schema management process within the coordinator. This change is the first step in that direction. In the new arrangement, the coordinator will take on the responsibility of querying both data nodes and tasks to fetch segment schema and subsequently building the table schema. Brokers will now simply query the Coordinator to fetch table schema. Importantly, brokers will still retain the capability to build table schemas if the need arises, ensuring both flexibility and resilience.	2023-11-04 19:33:25 +05:30
George Shiqi Wu	a8906b6ea0	Fix k8s task runner failure reporting (#15311 ) * Fix k8s task runner failure reporting * Fix reference * add jsonignore * PR changes	2023-11-03 21:28:46 -04:00
Clint Wylie	5d39b94149	allow compaction to work with spatial dimensions (#15321 )	2023-11-03 11:27:50 -07:00
Laksh Singla	0cc8839a60	Allow casted literal values in SQL functions accepting literals (Part 2) (#15316 )	2023-11-03 21:22:19 +05:30
Tts-233	f39a778f7d	Fix 404 URL about native query (#15324 )	2023-11-03 08:39:59 -07:00
Gian Merlino	98f1eb8ede	Use filters for pruning properly for hash-joins. (#15299 ) * Use filters for pruning properly for hash-joins. Native used them too aggressively: it might use filters for the RHS to prune the LHS. MSQ used them not at all. Now, both use them properly, pruning based on base (LHS) columns only. * Fix tests. * Fix style. * Clear filterFields too. * Update.	2023-11-03 07:29:16 -07:00
Karan Kumar	5036af6fb3	Doc fixes for query from deep storage and MSQ (#15313 ) Minor updates to the documentation. Added prerequisites. Removed a known issue in MSQ since it's no longer valid. --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-11-03 10:52:20 +05:30
Adarsh Sanjeev	9576fd3141	HllSketch Merge Aggregator optimizations (#15162 ) * Null byte serde for empty sketches * Cache for HllSketchMerge * Check for empty sketches * Address review comments * Revert changes to HllSketchHolder * Handle null sketch holders instead of null sketches * Add unit test for MSQ HllSketch * Add comments * Fix style	2023-11-03 11:01:22 +08:00
cristian-popa	fb260f3e41	docs: LDAP trust store property clarification (#15028 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-11-02 13:00:08 -07:00
Gian Merlino	d87d92bc43	Add system fields to input sources. (#15276 ) * Add system fields to input sources. Main changes: 1) The SystemField enum defines system fields "__file_uri", "__file_path", and "__file_bucket". They are associated with each input entity. 2) The SystemFieldInputSource interface can be added to any InputSource to make it system-field-capable. It sets up serialization of a list of configured "systemFields" in the JSON form of the input source, and provides a method getSystemFieldValue for computing the value of each system field. Cloud object, HDFS, HTTP, and Local now have this. * Fix various LocalInputSource calls. * Fix style stuff. * Fixups. * Fix tests and coverage.	2023-11-02 10:31:28 -07:00
AmatyaAvadhanula	dc3213b05d	Fix used segment retrieval in Kill tasks (#15306 ) Fix used segment retrieval in Kill tasks	2023-11-02 19:07:17 +05:30
Clint Wylie	d261587f4a	explicit outputType for ExpressionPostAggregator, better documentation for the differences between arrays and mvds (#15245 ) * better documentation for the differences between arrays and mvds * add outputType to ExpressionPostAggregator to make docs true * add output coercion if outputType is defined on ExpressionPostAgg * updated post-aggregations.md to be consistent with aggregations.md and filters.md and use tables	2023-11-02 00:31:37 -07:00
Adarsh Sanjeev	22443ab87e	Fix an issue with passing order by and limit to realtime tasks (#15301 ) While running queries on real time tasks using MSQ, there is an issue with queries with certain order by columns. If the query specifies a non time column, the query is planned as it is supported by MSQ. However, this throws an exception when passed to real time tasks, as the native query stack does not support it. This PR resolves this by removing the ordering from the query before contacting real time tasks. Fixes a bug with MSQ while reading data from real time tasks with non time ordering	2023-11-02 11:38:26 +05:30
Laksh Singla	b82ad59dfe	Better logging in ServiceClientImpl (#15269 ) ServiceClientImpl logs the cause of every retry, even though we are retrying the connection attempt. This leads to slight pollution in the logs because a lot of the time, the reason for retrying is the same. This is seen primarily in MSQ, when the worker task hasn't launched yet however controller attempts to connect to the worker task, which can lead to scary-looking messages (with INFO log level), even though they are normal. This PR changes the logging logic to log every 10 (arbitrary number) retries instead of every retry, to reduce the pollution of the logs. Note: If there are no retries left, the client returns an exception, which would get thrown up by the caller, and therefore this change doesn't hide any important information.	2023-11-02 11:32:49 +05:30
Gian Merlino	6b6d73b5d4	Use min of scheduler threads and server threads for subquery guardrails. (#15295 ) * Use min of scheduler threads and server threads for subquery guardrails. This allows more memory to be used for subqueries when the query scheduler is configured to limit queries below the number of server threads. The patch also refactors the code so SubqueryGuardrailHelper is provided by a Guice Provider rather than being created by ClientQuerySegmentWalker, to achieve better separation of concerns. * Exclude provider from coverage.	2023-11-01 22:34:53 -07:00
Gian Merlino	37e158c2c4	Frames: consider writing singly-valued column when input column hasMultipleValues is UNKNOWN. (#15300 ) * Frames: consider writing singly-valued column when input column hasMultipleValues is UNKNOWN. Prior to this patch, columnar frames would always write multi-valued columns if the input column had hasMultipleValues = UNKNOWN. This had the effect of flipping UNKNOWN to TRUE when copying data into frames, which is problematic because TRUE causes expressions to assume that string inputs must be treated as arrays. We now avoid this by flipping UNKNOWN to FALSE if no multi-valuedness is encountered, and flipping it to TRUE if multi-valuedness is encountered. * Add regression test case.	2023-11-01 22:05:53 -07:00
Charles Smith	de557a62ad	Suggest adoption of Google Style guide (#14905 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-11-01 13:31:03 -07:00
Charles Smith	3860052de0	remove references to Jupyter notebooks within the Druid repo (#15143 ) Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-11-01 13:17:06 -07:00
Katya Macedo	935050bf43	docs: Dynamic config cleanup (#15265 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-11-01 11:22:33 -07:00
Sergio Ferragut	c9c3df204e	Redirect to new jupyter notebook project (#15136 )	2023-11-01 08:38:40 -07:00
Laksh Singla	2ea7177f15	Allow casted literal values in SQL functions accepting literals (#15282 ) Functions that accept literals also allow casted literals. This shouldn't have an impact on the queries that the user writes. It enables the SQL functions to accept explicit cast, which is required with JDBC.	2023-11-01 10:38:48 +05:30
George Shiqi Wu	49e0cba7ba	Fix dockerfile for druid image (#15264 ) Fixes docker image build issues with apache/druid.	2023-11-01 09:55:54 +05:30
317brian	436ded3d78	docs: durable storage azure cleanup (#15120 ) Co-authored-by: Laksh Singla <lakshsingla@gmail.com>	2023-10-31 15:20:38 -07:00
Katya Macedo	a43ffbdf2b	[Docs] Improvements to JSON-based batch Ingestion page (#15286 )	2023-10-31 14:50:45 -07:00
317brian	87695410ac	docs: blurb about msq union all (#15223 )	2023-10-31 14:15:38 -07:00
Suneet Saldanha	e6b7c36e74	LoadRules with 0 replicas should be treated as handoff complete (#15274 ) * LoadRules with 0 replicas should be treated as handoff complete * fix it * pr feedback * fixit	2023-10-30 10:42:58 -07:00
George Shiqi Wu	3173093415	Handle status failures for streaming supervisors (#15174 ) * Cleanup logic * newline * remove whitespace * Fix log message * Add test class * PR changes	2023-10-30 10:21:23 -07:00
Vishesh Garg	a27598a487	Segregate advance and advanceUninterruptibly flow in postJoinCursor to allow for interrupts in advance (#15222 ) Currently advance function in postJoinCursor calls advanceUninterruptibly which in turn keeps calling baseCursor.advanceUninterruptibly until the post join condition matches, without checking for interrupts. This causes the CPU to hit 100% without getting a chance for query to be cancelled. With this change, the call flow of advance and advanceUninterruptibly is separated out so that they call baseCursor.advance and baseCursor.advanceUninterruptibly in them, respectively, giving a chance for interrupts in the former case between successive calls to baseCursor.advance.	2023-10-30 14:39:15 +05:30
Ben Sykes	275c1ec64c	Fix error assuming a Complex Type that is a Number is a double (#15272 ) * Fix error assuming a Complex Type that is a Number is a double In the case where a complex type is a number, it may not be castable to double. It can safely be cast to Number first to get to the doubleValue.	2023-10-30 09:52:52 +05:30
Vishesh Garg	039b05585c	Add worker status and duration metrics in live and task reports (#15180 ) Add worker status and duration metrics in live and task reports for tracking.	2023-10-30 09:43:22 +05:30
Zoltan Haindrich	f4a74710e6	Process pure ordering changes with windowing operators (#15241 ) - adds a new query build path: DruidQuery#toScanAndSortQuery which: - builds a ScanQuery without considering the current ordering - builds an operator to execute the sort - fixes a null string to "null" literal string conversion in the frame serializer code - fixes some DrillWindowQueryTest cases - fix NPE in NaiveSortOperator in case there was no input - enables back CoreRules.AGGREGATE_REMOVE - adds a processing level OffsetLimit class and uses that instead of just the limit in the rac parts - earlier window expressions on top of a subquery with an offset may have ignored the offset	2023-10-29 16:40:49 +05:30
317brian	737947754d	docs: add concurent compaction docs (#15218 ) Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2023-10-27 10:29:34 -07:00
kaisun2000	60c2ad597a	Enhance json parser error logging to better track Istio Proxy error message (#15176 ) Currently the inter Druid communication via rest endpoints is based on json formatted payload. Upon parsing error, there is only a generic exception stating expected json token type and current json token type. There is no detailed error log about the content of the payload causing the violation. In the micro-service world, the trend is to deploy the Druid servers in k8 with the mesh network. Often the istio proxy or other proxies are used to intercept the network connection between Druid servers. The proxy may give error messages for various reasons. These error messages are not expected by the json parser. The generic error message from Druid can be very misleading as the user may think the message is based on the response from the other Druid server. For example, this is an example of a mysterious error message: QueryInterruptedException{msg=Next token wasn't a START_ARRAY, was[VALUE_STRING] from url[http://xxxxx:8088/druid/v2/], code=Unknown exception, class=org.apache.druid.java.util.common.IAE, host=xxxxx:8088} while the content of the message is the following from the proxy when it can't tunnel the network connection: upstream connect error or disconnect/reset before header So this very simple PR is just to enhance the logging and get the real underlying message printed out. This would save a lot of head scratching time if Druid is deployed with mesh network. Co-authored-by: Kai Sun <kai.sun@salesforce.com>	2023-10-27 14:20:19 +05:30
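
The idea described in commit 275c1ec64c (#15272) above, going through Number rather than casting straight to double, can be illustrated with a minimal sketch. This is not the code from that commit: the class name ComplexNumberCoercion and the method toDouble are hypothetical, and BigDecimal merely stands in for a complex-typed value that happens to be a Number.

```java
import java.math.BigDecimal;

public class ComplexNumberCoercion {
    // Hypothetical helper (not Druid's actual code): widen to Number first,
    // then ask for the double value, instead of assuming the object is a Double.
    public static double toDouble(Object value) {
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        throw new IllegalArgumentException("Not a number: " + value);
    }

    public static void main(String[] args) {
        // A value that is a Number but not a Double.
        Object complexValue = new BigDecimal("1.5");

        // A direct cast such as (Double) complexValue would throw
        // ClassCastException here; going through Number avoids that.
        System.out.println(toDouble(complexValue));
    }
}
```

Any Number subclass (Integer, Long, BigDecimal, and so on) takes the same path, so the caller does not need to know the concrete type.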

13381 Commits