13385 Commits

Author SHA1 Message Date
Vadim Ogievetsky
2cb74433fd
Web console: fix time shifting (#15359)
* fix time shifting
2023-11-14 15:33:52 -08:00
Krishna Anandan
5edeac28df
+ Switching Comparison from String to JSON (#15364) 2023-11-14 08:07:19 -08:00
Adarsh Sanjeev
a134cc30a6
Change default inSubQueryThreshold (#15336) 2023-11-14 14:08:12 +05:30
Rishabh Singh
5446494e63
Non-existent datasource shouldn't affect schema rebuilding for other datasources (#15355)
In pull request #14985, a bug was introduced where periodic refresh would skip rebuilding a datasource's schema after encountering a non-existent datasource. This resulted in remaining datasources having stale schema information.

This change addresses the bug and adds a unit test to validate the refresh mechanism's behaviour when a datasource is removed, and other datasources have schema changes.
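
The failure mode described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the code from this change; `refresh_schemas`, `known`, and `build_schema` are hypothetical names. The point it shows is that a missing datasource should be skipped rather than ending the refresh for the datasources that follow it.

```python
def refresh_schemas(datasources, known, build_schema):
    """Rebuild schemas for every datasource that still exists.

    `known` is the set of datasources that currently exist. Names that are
    not in it are skipped so that the remaining datasources are still rebuilt.
    """
    rebuilt = {}
    for name in datasources:
        if name not in known:
            # Skip only this datasource; do not abort the whole refresh.
            continue
        rebuilt[name] = build_schema(name)
    return rebuilt


# "removed" no longer exists, but "metrics" after it is still rebuilt.
schemas = refresh_schemas(
    ["wiki", "removed", "metrics"],
    known={"wiki", "metrics"},
    build_schema=lambda name: f"schema-of-{name}",
)
```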
2023-11-14 12:52:33 +05:30
dependabot[bot]
99da4f3057
Bump commons-codec:commons-codec from 1.13 to 1.16.0 (#14819)
* Bump commons-codec:commons-codec from 1.13 to 1.16.0

Bumps [commons-codec:commons-codec](https://github.com/apache/commons-codec) from 1.13 to 1.16.0.
- [Changelog](https://github.com/apache/commons-codec/blob/master/RELEASE-NOTES.txt)
- [Commits](https://github.com/apache/commons-codec/compare/commons-codec-1.13...rel/commons-codec-1.16.0)

---
updated-dependencies:
- dependency-name: commons-codec:commons-codec
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* update licenses.yaml

* update licences.yaml

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
2023-11-13 08:54:55 -08:00
YongGang
3a3d37ef40
Fix for segment/count Metric Not Emitting with Statsd-emitter (#15347)
* fix segment/count metric in Statsd-emitter

* update doc

* Update docs/development/extensions-contrib/prometheus.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Update docs/development/extensions-contrib/statsd.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

---------

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2023-11-10 08:08:58 -08:00
Charles Smith
e7d0429f5b
docs: suggest metadata store with instant ADD COLUMN semantics (#15334)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-11-09 12:56:30 -08:00
AmatyaAvadhanula
895e53555c
Optimize mark segments as unused (#15352) 2023-11-09 15:13:45 +05:30
Vadim Ogievetsky
fa48d4ea7d
use is not distinct from (#15349) 2023-11-08 18:02:42 -08:00
Vadim Ogievetsky
d12f557492
fix ingest datasource detection falling over on paren (#15339) 2023-11-08 13:32:27 -08:00
George Shiqi Wu
130bfbfc6d
Revert "Separate task lifecycle from kubernetes/location lifecycle (#15133)" (#15346)
This reverts commit dc0b163e192545c802b7fe2b3271e035cc1e70ff.
2023-11-08 13:12:30 -05:00
Kengo Seki
b7d7f84bce
Bump Jedis version to 5.0.2 (#15344)
Currently, the redis-cache extension uses Jedis 2.9.0, which was released over seven years ago and is no longer listed in the official support matrix. This patch upgrades it to ensure compatibility with recent versions of Redis and to make future upgrades easier. It includes the following changes:

Upgrade Jedis to v5.0.2, the latest version at this writing, and address the API changes and dependency version mismatch.

Replace mock-jedis with jedis-mock, since the former is no longer actively maintained and is not compatible with recent versions of Jedis.
2023-11-08 20:22:41 +05:30
Rishabh Singh
db95c375a6
Increase historical heap for standard IT (#15337)
Lately, Query IT has been failing because the historical server runs out of memory (OOM).
We are investigating the historical heap dump from the test. Until the issue is resolved, we are increasing the heap size of the historical server.
2023-11-08 15:21:30 +05:30
Pranav
e2fde8c516
Refactor lookups behavior while loading/dropping the containers (#14806) 2023-11-07 10:07:28 -08:00
17px
54fa3425c3
fix: Creating span label not closed (#15323) 2023-11-07 11:01:28 +08:00
nasuiyile
9333dd1f73
Correct the path of ipynb file of notebook introduction. (#15327) 2023-11-07 11:01:06 +08:00
HudsonShi
e6ab8a15eb
Fixed the table in docker.md (#15328) 2023-11-07 11:00:23 +08:00
Charles Smith
0403e48266
window functions docs (#14739)
* draft window functions

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* address comments

* remove default column

* Update docs/querying/sql-window-functions.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/querying/sql-window-functions.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* fix ntile

* remove default header column

* code tics to remove spelling errors

* add known issues, add SUM example

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* address spelling

* remove extra chars

* add to sidebar, fix admonition

* Update sql-window-functions.md

accept suggestion, change admonition style

* update sidebar

* Delete Untitled.ipynb

rm unwanted file

* Update docs/querying/sql-window-functions.md

* Update docs/querying/sql-window-functions.md

* update context param, accept suggestions

* accept suggestions

* Apply suggestions from code review

* Fix known issues

* require GROUP BY, explain order of operation

* accept suggestions

* fix spelling

---------

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-11-06 11:34:42 -08:00
Abhishek Radhakrishnan
2136dc3591
Batch segment retrieval from the metadata store (#15305)
* Add a unit test that fails when used segments with too many intervals are retrieved.

- This is a failing test case that needs to be ignored.

* Batch the intervals (use 100 as it's consistent with batching in other places).

* move the filtering inside the batch

* Account for limit across the batch splits.

* Adjustments

* Fixup and add tests

* small refactor

* add more tests.

* remove wrapper.

* Minor edits

* assert out of range
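
The batching described in the items above can be sketched roughly as follows. This is an illustrative Python sketch, not the code from this change: `batched`, `retrieve_segments`, and `fetch_batch` are hypothetical names, and `fetch_batch` stands in for a single metadata store query. Only the batch size of 100 and the need to carry the limit across batches come from the commit message.

```python
from itertools import islice

BATCH_SIZE = 100  # batch size named in the commit message


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def retrieve_segments(intervals, fetch_batch, limit=None):
    """Fetch segments batch by batch, honouring `limit` across batches.

    `fetch_batch(chunk, remaining)` is a hypothetical stand-in for one
    metadata store query over a batch of intervals.
    """
    results = []
    for chunk in batched(intervals, BATCH_SIZE):
        remaining = None if limit is None else limit - len(results)
        if remaining is not None and remaining <= 0:
            break  # the limit was already reached in an earlier batch
        results.extend(fetch_batch(chunk, remaining))
    return results


def fake_fetch(chunk, remaining):
    """Pretend each interval maps to exactly one segment."""
    segments = [f"segment-{i}" for i in chunk]
    return segments if remaining is None else segments[:remaining]


all_segments = retrieve_segments(range(250), fake_fetch)
limited = retrieve_segments(range(250), fake_fetch, limit=150)
```

Passing the remaining count into each batch keeps the total at or below the limit even when the limit falls in the middle of a batch.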
2023-11-06 11:30:24 -08:00
Abhishek Agarwal
4b64a5693b
Move service specific JVM parameters to the right in tests (#15325)
Historical OOMs were not getting dumped into /shared/logs because common JVM flags will override service-specific JVM flags. This PR fixes that and also removes unnecessary overrides in historical.
2023-11-06 15:45:59 +05:30
Atul Mohan
ff7de49015
Consolidate and reduce dependency footprint for iceberg extension (#15280)
* Consolidate and reduce dependency footprint

* Fix dependency analysis
2023-11-06 12:17:32 +05:30
Rishabh Singh
8c802e4c9b
Relocating Table Schema Building: Shifting from Brokers to Coordinator for Improved Efficiency (#14985)
In the current design, brokers query both data nodes and tasks to fetch the schema of the segments they serve. The table schema is then constructed by combining the schemas of all segments within a datasource. However, this approach leads to a high number of segment metadata queries during broker startup, resulting in slow startup times and various issues outlined in the design proposal.

To address these challenges, we propose centralizing the table schema management process within the coordinator. This change is the first step in that direction. In the new arrangement, the coordinator will take on the responsibility of querying both data nodes and tasks to fetch segment schema and subsequently building the table schema. Brokers will now simply query the Coordinator to fetch table schema. Importantly, brokers will still retain the capability to build table schemas if the need arises, ensuring both flexibility and resilience.
2023-11-04 19:33:25 +05:30
George Shiqi Wu
a8906b6ea0
Fix k8s task runner failure reporting (#15311)
* Fix k8s task runner failure reporting

* Fix reference

* add jsonignore

* PR changes
2023-11-03 21:28:46 -04:00
Clint Wylie
5d39b94149
allow compaction to work with spatial dimensions (#15321) 2023-11-03 11:27:50 -07:00
Laksh Singla
0cc8839a60
Allow casted literal values in SQL functions accepting literals (Part 2) (#15316) 2023-11-03 21:22:19 +05:30
Tts-233
f39a778f7d
Fix 404 URL about native query (#15324) 2023-11-03 08:39:59 -07:00
Gian Merlino
98f1eb8ede
Use filters for pruning properly for hash-joins. (#15299)
* Use filters for pruning properly for hash-joins.

Native used them too aggressively: it might use filters for the RHS
to prune the LHS. MSQ used them not at all. Now, both use them properly,
pruning based on base (LHS) columns only.

* Fix tests.

* Fix style.

* Clear filterFields too.

* Update.
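
As a rough illustration of "pruning based on base (LHS) columns only", the following Python sketch keeps only the filters whose required columns all belong to the base table. It is not the code from this change: the function name, the filter representation, and the `j0.` prefix for right-hand-side columns are assumptions made for the example, as is the choice to reject filters that mix LHS and RHS columns.

```python
def filters_usable_for_pruning(filters, base_columns):
    """Return the filters that reference only base (LHS) columns.

    Each filter is a (description, required_columns) pair.
    """
    base = set(base_columns)
    return [desc for desc, required in filters if set(required) <= base]


filters = [
    ("country = 'US'", ["country"]),                # LHS column only: usable
    ("j0.population > 1000", ["j0.population"]),    # RHS column: not usable
    ("country = j0.code", ["country", "j0.code"]),  # mixed: rejected in this sketch
]
usable = filters_usable_for_pruning(filters, ["__time", "country", "city"])
```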
2023-11-03 07:29:16 -07:00
Karan Kumar
5036af6fb3
Doc fixes for query from deep storage and MSQ (#15313)
Minor updates to the documentation:

* Added prerequisites.
* Removed a known issue in MSQ since it's no longer valid.

---------

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-11-03 10:52:20 +05:30
Adarsh Sanjeev
9576fd3141
HllSketch Merge Aggregator optimizations (#15162)
* Null byte serde for empty sketches

* Cache for HllSketchMerge

* Check for empty sketches

* Address review comments

* Revert changes to HllSketchHolder

* Handle null sketch holders instead of null sketches

* Add unit test for MSQ HllSketch

* Add comments

* Fix style
2023-11-03 11:01:22 +08:00
cristian-popa
fb260f3e41
docs: LDAP trust store property clarification (#15028)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-11-02 13:00:08 -07:00
Gian Merlino
d87d92bc43
Add system fields to input sources. (#15276)
* Add system fields to input sources.

Main changes:

1) The SystemField enum defines system fields "__file_uri", "__file_path",
   and "__file_bucket". They are associated with each input entity.

2) The SystemFieldInputSource interface can be added to any InputSource
   to make it system-field-capable. It sets up serialization of a list
   of configured "systemFields" in the JSON form of the input source, and
   provides a method getSystemFieldValue for computing the value of each
   system field. Cloud object, HDFS, HTTP, and Local now have this.

* Fix various LocalInputSource calls.

* Fix style stuff.

* Fixups.

* Fix tests and coverage.
2023-11-02 10:31:28 -07:00
AmatyaAvadhanula
dc3213b05d
Fix used segment retrieval in Kill tasks (#15306)
2023-11-02 19:07:17 +05:30
Clint Wylie
d261587f4a
explicit outputType for ExpressionPostAggregator, better documentation for the differences between arrays and mvds (#15245)
* better documentation for the differences between arrays and mvds
* add outputType to ExpressionPostAggregator to make docs true
* add output coercion if outputType is defined on ExpressionPostAgg
* updated post-aggregations.md to be consistent with aggregations.md and filters.md and use tables
2023-11-02 00:31:37 -07:00
Adarsh Sanjeev
22443ab87e
Fix an issue with passing order by and limit to realtime tasks (#15301)
When running queries on realtime tasks using MSQ, queries that order by certain columns fail.

If the query orders by a non-time column, it is planned successfully because MSQ supports such ordering. However, it throws an exception when passed to realtime tasks, because the native query stack does not support it. This PR resolves the issue by removing the ordering from the query before contacting realtime tasks.

* Fixes a bug in MSQ when reading data from realtime tasks with non-time ordering
2023-11-02 11:38:26 +05:30
Laksh Singla
b82ad59dfe
Better logging in ServiceClientImpl (#15269)
ServiceClientImpl logs the cause of every retry when it retries a connection attempt. This pollutes the logs somewhat, because the reason for retrying is often the same each time. It is seen primarily in MSQ, when the controller attempts to connect to a worker task that hasn't launched yet, which can lead to scary-looking messages (at INFO log level) even though they are normal.
This PR changes the logging logic to log once every 10 retries (an arbitrary number) instead of on every retry, to reduce log pollution.
Note: If there are no retries left, the client returns an exception, which would be thrown up by the caller, so this change doesn't hide any important information.
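
A minimal Python sketch of this kind of log throttling follows. It is illustrative only and not the ServiceClientImpl code: the names are hypothetical, and it assumes retries are counted from 1 and that only every tenth one is logged.

```python
LOG_EVERY_N_RETRIES = 10  # the commit message calls this number arbitrary


def should_log_retry(attempt):
    """Return True when the given 1-based retry attempt should be logged."""
    return attempt % LOG_EVERY_N_RETRIES == 0


# Out of 35 retries, only three produce a log line in this sketch.
logged = [attempt for attempt in range(1, 36) if should_log_retry(attempt)]
```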
2023-11-02 11:32:49 +05:30
Gian Merlino
6b6d73b5d4
Use min of scheduler threads and server threads for subquery guardrails. (#15295)
* Use min of scheduler threads and server threads for subquery guardrails.

This allows more memory to be used for subqueries when the query scheduler
is configured to limit queries below the number of server threads. The patch
also refactors the code so SubqueryGuardrailHelper is provided by a Guice
Provider rather than being created by ClientQuerySegmentWalker, to achieve
better separation of concerns.

* Exclude provider from coverage.
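
The effect of taking the minimum can be shown with a small Python sketch. It is not the SubqueryGuardrailHelper code: the function names, the even split of a memory budget, and the convention that `None` means "no scheduler limit" are assumptions made for illustration.

```python
def max_concurrent_queries(server_threads, scheduler_threads=None):
    """Concurrency assumed when splitting the subquery memory budget.

    `scheduler_threads` is None (or non-positive) when the scheduler
    imposes no limit of its own.
    """
    if scheduler_threads is None or scheduler_threads <= 0:
        return server_threads
    return min(server_threads, scheduler_threads)


def per_query_budget(total_bytes, server_threads, scheduler_threads=None):
    """Evenly split a hypothetical memory budget across concurrent queries."""
    return total_bytes // max_concurrent_queries(server_threads, scheduler_threads)
```

With 10 server threads and a scheduler limit of 4, each query in this sketch gets a quarter of the budget instead of a tenth.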
2023-11-01 22:34:53 -07:00
Gian Merlino
37e158c2c4
Frames: consider writing singly-valued column when input column hasMultipleValues is UNKNOWN. (#15300)
* Frames: consider writing singly-valued column when input column hasMultipleValues is UNKNOWN.

Prior to this patch, columnar frames would always write multi-valued columns if
the input column had hasMultipleValues = UNKNOWN. This had the effect of flipping
UNKNOWN to TRUE when copying data into frames, which is problematic because TRUE
causes expressions to assume that string inputs must be treated as arrays.

We now avoid this by flipping UNKNOWN to FALSE if no multi-valuedness
is encountered, and flipping it to TRUE if multi-valuedness is encountered.

* Add regression test case.
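
The resolution rule described above can be sketched in Python as follows. This is an illustration, not the frame-writing code: the enum and function names are hypothetical, and a row is treated as multi-valued when it holds more than one value.

```python
from enum import Enum


class HasMultipleValues(Enum):
    TRUE = "TRUE"
    FALSE = "FALSE"
    UNKNOWN = "UNKNOWN"


def resolve_multi_valuedness(declared, rows):
    """Resolve UNKNOWN based on the data actually seen.

    `rows` holds one list of values per row. A declared TRUE or FALSE is
    returned unchanged; UNKNOWN becomes TRUE only if some row is multi-valued.
    """
    if declared is not HasMultipleValues.UNKNOWN:
        return declared
    if any(len(row) > 1 for row in rows):
        return HasMultipleValues.TRUE
    return HasMultipleValues.FALSE
```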
2023-11-01 22:05:53 -07:00
Charles Smith
de557a62ad
Suggest adoption of Google Style guide (#14905)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-11-01 13:31:03 -07:00
Charles Smith
3860052de0
remove references to Jupyter notebooks within the Druid repo (#15143)
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2023-11-01 13:17:06 -07:00
Katya Macedo
935050bf43
docs: Dynamic config cleanup (#15265)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-11-01 11:22:33 -07:00
Sergio Ferragut
c9c3df204e
Redirect to new jupyter notebook project (#15136) 2023-11-01 08:38:40 -07:00
Laksh Singla
2ea7177f15
Allow casted literal values in SQL functions accepting literals (#15282)
Functions that accept literals also allow casted literals. This shouldn't have an impact on the queries that the user writes. It enables the SQL functions to accept explicit casts, which are required with JDBC.
2023-11-01 10:38:48 +05:30
George Shiqi Wu
49e0cba7ba
Fix dockerfile for druid image (#15264)
Fixes docker image build issues with apache/druid.
2023-11-01 09:55:54 +05:30
317brian
436ded3d78
docs: durable storage azure cleanup (#15120)
Co-authored-by: Laksh Singla <lakshsingla@gmail.com>
2023-10-31 15:20:38 -07:00
Katya Macedo
a43ffbdf2b
[Docs] Improvements to JSON-based batch Ingestion page (#15286) 2023-10-31 14:50:45 -07:00
317brian
87695410ac
docs: blurb about msq union all (#15223) 2023-10-31 14:15:38 -07:00
Suneet Saldanha
e6b7c36e74
LoadRules with 0 replicas should be treated as handoff complete (#15274)
* LoadRules with 0 replicas should be treated as handoff complete

* fix it

* pr feedback

* fixit
2023-10-30 10:42:58 -07:00
George Shiqi Wu
3173093415
Handle status failures for streaming supervisors (#15174)
* Cleanup logic

* newline

* remove whitespace

* Fix log message

* Add test class

* PR changes
2023-10-30 10:21:23 -07:00
Vishesh Garg
a27598a487
Segregate advance and advanceUninterruptibly flow in postJoinCursor to allow for interrupts in advance (#15222)
Currently, the advance function in postJoinCursor calls advanceUninterruptibly, which in turn keeps calling baseCursor.advanceUninterruptibly until the post-join condition matches, without checking for interrupts. This causes the CPU to hit 100% without giving the query a chance to be cancelled.

With this change, the call flows of advance and advanceUninterruptibly are separated so that they call baseCursor.advance and baseCursor.advanceUninterruptibly, respectively. In the former case, this allows interrupts between successive calls to baseCursor.advance.
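
The separated call flows can be sketched in Python as follows. This is a toy model, not the Druid cursor code: `ListCursor`, `PostJoinCursor`, `QueryInterrupted`, and the `interrupted` flag are hypothetical stand-ins for the real cursor classes and thread interruption.

```python
class QueryInterrupted(Exception):
    """Raised by the toy base cursor when the query has been cancelled."""


class ListCursor:
    """Toy base cursor over a list; `interrupted` simulates cancellation."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = 0
        self.interrupted = False

    def is_done(self):
        return self.pos >= len(self.rows)

    def current(self):
        return self.rows[self.pos]

    def advance_uninterruptibly(self):
        self.pos += 1

    def advance(self):
        # The interruptible variant checks for cancellation on every step.
        if self.interrupted:
            raise QueryInterrupted()
        self.advance_uninterruptibly()


class PostJoinCursor:
    """Skips base rows that do not satisfy the post-join condition."""

    def __init__(self, base, matches):
        self.base = base
        self.matches = matches

    def _skip_non_matching(self, step):
        while not self.base.is_done() and not self.matches(self.base.current()):
            step()

    def advance(self):
        # Every step goes through base.advance, so interrupts are seen.
        self.base.advance()
        self._skip_non_matching(self.base.advance)

    def advance_uninterruptibly(self):
        self.base.advance_uninterruptibly()
        self._skip_non_matching(self.base.advance_uninterruptibly)


base = ListCursor([1, 3, 5, 6, 7, 8])
cursor = PostJoinCursor(base, lambda value: value % 2 == 0)
cursor.advance()  # moves past 3 and 5, then stops on 6
```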
2023-10-30 14:39:15 +05:30
Ben Sykes
275c1ec64c
Fix error assuming a Complex Type that is a Number is a double (#15272)
* Fix error assuming a Complex Type that is a Number is a double
In the case where a complex type is a number, it may not be castable to double. It can safely be cast as Number first to get to the doubleValue.
2023-10-30 09:52:52 +05:30