3166 Commits

Author SHA1 Message Date
Katya Macedo
099a9825d1
[Docs] Add a release notes template (#15333)
* Add release notes template

* Update spellcheck
2023-12-11 11:35:16 +05:30
Victoria Lim
e68979e03b
Docs: update SQL API reference (#15515)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-12-08 11:53:19 -08:00
Katya Macedo
355c800108
Revamp design page (#15486)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-12-08 11:40:24 -08:00
Clint Wylie
e64b92eb35
add JSON_QUERY_ARRAY function to pluck ARRAY<COMPLEX<json>> out of COMPLEX<json> (#15521) 2023-12-08 05:28:46 -08:00
Clint Wylie
1eafe983ec
fix array presenting columns to not match single element arrays to scalars for equality (#15503)
* fix array presenting columns to not match single element arrays to scalars for equality
* update docs to clarify usage model of mixed type columns
2023-12-08 01:22:07 -08:00
sb89594
5fda8613ad
Feature: Add IPv6 Match Function (#15212) 2023-12-07 23:09:06 -08:00
Charles Smith
db3a633250
update timeseries to reflect NULL filling (#15512)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-12-07 14:41:27 -08:00
Clint Wylie
82ac48786b
document arrayContainsElement filter (#15455) 2023-12-07 00:14:00 -08:00
Benjamin Hopp
fea53c7084
Re-arranging sections for append and replace docs. (#15497) 2023-12-06 13:13:05 -08:00
Abhishek Radhakrishnan
f4949afdd7
clarify and fixup typos related to unused segments in docs and javadocs. (#15498) 2023-12-05 22:30:32 -08:00
Jill Osborne
0e14a2c77f
Update retention rules doc (#15439) 2023-12-05 09:53:17 -08:00
Jan Werner
f4856bc1c1
ranger-security: exclude jackson-jaxrs from + fix outdated documentation (#15481)
* Excluding jackson-jaxrs dependency from ranger-plugin-common to address CVE regression introduced by ranger-upgrade: CVE-2019-10202, CVE-2019-10172
* remove the reference to outdated ranger 2.0 from the docs

---------

Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
2023-12-05 08:24:37 -08:00
Rishabh Singh
d968bb3f43
Rename config for enabling CentralizedDatasourceSchema feature (#15476)
* Rename property to druid.centralizedDatasourceSchema.enabled
* Update config name in docker-compose
2023-12-05 16:57:25 +05:30
Pranav
74ab6024e1
Native doc update (#15456)
Updating the native docs for #15434
2023-11-30 10:37:23 +05:30
Pranav
93cd638645
Enabling aggregateMultipleValues in all StringAnyAggregators (#15434)
* Enabling aggregateMultipleValues in all StringAnyAggregators

* Adding more tests

* More validation

* fix warning

* updating asserts in decoupled mode

* fix intellij inspection

* Addressing comments

* Addressing comments

* Adding early validations and make aggregate consistent across all

* fixing tests

* fixing tests

* Update docs/querying/sql-aggregations.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* fixing static check

---------

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2023-11-29 14:32:49 -08:00
Abhishek Agarwal
0a56c87e93
SQL: Plan non-equijoin conditions as cross join followed by filter (#15302)
This PR revives #14978 with a few more bells and whistles. Instead of an unconditional cross-join, we will now split the join condition such that some conditions are now evaluated post-join. To decide what sub-condition goes where, I have refactored DruidJoinRule class to extract unsupported sub-conditions. We build a postJoinFilter out of these unsupported sub-conditions and push to the join.
2023-11-29 13:46:11 +05:30
Zoltan Haindrich
eb056e23b5
Fix dictionarySize overrides in tests (#15354)
I think this is a problem as it discards the false return value when the putToKeyBuffer can't store the value because of the limit

Not forwarding the return value at that point may lead to the normal continuation here regardless something was not added to the dictionary like here
2023-11-28 18:49:09 +05:30
Charles Smith
a929b9f16e
clafiry DISTINCT is optional for COUNT() (#15394) 2023-11-28 16:52:16 +05:30
Petrichor
b102667695
[Docs] Add example connection parameters for Java APIs (#15345) 2023-11-28 15:09:41 +05:30
Jill Osborne
3fa856b3ff
Update Kinesis resharding doc (#15401) 2023-11-20 15:40:59 -08:00
Jill Osborne
6ed343c047
Data management API doc refactor (#15087)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: George Shiqi Wu <george.wu@imply.io>
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Co-authored-by: ythorat2 <ythorat2@illinois.edu>
Co-authored-by: Krishna Anandan <krishna1729atom@gmail.com>
Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
Co-authored-by: Rishabh Singh <6513075+findingrish@users.noreply.github.com>
Co-authored-by: Magnus Henoch <magnus@gameanalytics.com>
Co-authored-by: AmatyaAvadhanula <amatya.avadhanula@imply.io>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Yashdeep Thorat <yashdeep97@gmail.com>
Co-authored-by: Atul Mohan <atulmohan.mec@gmail.com>
Co-authored-by: Clint Wylie <cwylie@apache.org>
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2023-11-20 12:34:42 -08:00
317brian
dfc52994d4
docs: fix code tabs (#15403) 2023-11-20 11:16:10 -08:00
Clint Wylie
a95c22ce70
support non-constant expressions for path arguments for json_value and json_query (#15320)
* support dynamic expressions for path arguments for json_value and json_query
2023-11-17 01:12:05 -08:00
Atul Mohan
a2914789d7
Add support for ingesting older iceberg snapshots (#15348)
This patch introduces a param snapshotTime in the iceberg inputsource spec that allows the user to ingest data files associated with the most recent snapshot as of the given time. This helps the user ingest data based on older snapshots by specifying the associated snapshot time.
This patch also upgrades the iceberg core version to 1.4.1
2023-11-17 12:32:28 +05:30
Charles Smith
6a5da5a05e
fix redirect for api docs and misc array-related typos (#15387)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-11-16 13:29:19 -08:00
Karan Kumar
857b8de425
Query from deep storage doc fixes. (#15382)
Fixing outdated query from deep storage docs.
2023-11-16 14:05:20 +05:30
Adarsh Sanjeev
a134cc30a6
Change default inSubQueryThreshold (#15336) 2023-11-14 14:08:12 +05:30
YongGang
3a3d37ef40
Fix for segment/count Metric Not Emitting with Statsd-emitter (#15347)
* fix segment/count metric in Statsd-emitter

* update doc

* Update docs/development/extensions-contrib/prometheus.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Update docs/development/extensions-contrib/statsd.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

---------

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2023-11-10 08:08:58 -08:00
Charles Smith
e7d0429f5b
docs: suggest metadata store with instant ADD COLUMN semantics (#15334)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-11-09 12:56:30 -08:00
Pranav
e2fde8c516
Refactor lookups behavior while loading/dropping the containers (#14806) 2023-11-07 10:07:28 -08:00
Charles Smith
0403e48266
window functions docs (#14739)
* draft window functions

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* address comments

* remove default column

* Update docs/querying/sql-window-functions.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Update docs/querying/sql-window-functions.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* fix ntile

* remove default header column

* code tics to remove spelling errors

* add known issues, add SUM example

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* address spelling

* remove extra chars

* add to sidebar, fix admonition

* Update sql-window-functions.md

accept suggestion, change admonition style

* update sidebar

* Delete Untitled.ipynb

rm unwanted file

* Update docs/querying/sql-window-functions.md

* Update docs/querying/sql-window-functions.md

* update context param, accept suggestions

* accept suggestions

* Apply suggestions from code review

* Fix known issues

* require GROUP BY, explain order of operation

* accept suggestions

* fix spelling

---------

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-11-06 11:34:42 -08:00
Rishabh Singh
8c802e4c9b
Relocating Table Schema Building: Shifting from Brokers to Coordinator for Improved Efficiency (#14985)
In the current design, brokers query both data nodes and tasks to fetch the schema of the segments they serve. The table schema is then constructed by combining the schemas of all segments within a datasource. However, this approach leads to a high number of segment metadata queries during broker startup, resulting in slow startup times and various issues outlined in the design proposal.

To address these challenges, we propose centralizing the table schema management process within the coordinator. This change is the first step in that direction. In the new arrangement, the coordinator will take on the responsibility of querying both data nodes and tasks to fetch segment schema and subsequently building the table schema. Brokers will now simply query the Coordinator to fetch table schema. Importantly, brokers will still retain the capability to build table schemas if the need arises, ensuring both flexibility and resilience.
2023-11-04 19:33:25 +05:30
Tts-233
f39a778f7d
Fix 404 URL about native query (#15324) 2023-11-03 08:39:59 -07:00
Karan Kumar
5036af6fb3
Doc fixes for query from deep storage and MSQ (#15313)
Minor updates to the documentation.

    Added prerequisites.
    Removed a known issue in MSQ since its no longer valid.

---------

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-11-03 10:52:20 +05:30
cristian-popa
fb260f3e41
docs: LDAP trust store property clarification (#15028)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-11-02 13:00:08 -07:00
Gian Merlino
d87d92bc43
Add system fields to input sources. (#15276)
* Add system fields to input sources.

Main changes:

1) The SystemField enum defines system fields "__file_uri", "__file_path",
   and "__file_bucket". They are associated with each input entity.

2) The SystemFieldInputSource interface can be added to any InputSource
   to make it system-field-capable. It sets up serialization of a list
   of configured "systemFields" in the JSON form of the input source, and
   provides a method getSystemFieldValue for computing the value of each
   system field. Cloud object, HDFS, HTTP, and Local now have this.

* Fix various LocalInputSource calls.

* Fix style stuff.

* Fixups.

* Fix tests and coverage.
2023-11-02 10:31:28 -07:00
Clint Wylie
d261587f4a
explicit outputType for ExpressionPostAggregator, better documentation for the differences between arrays and mvds (#15245)
* better documentation for the differences between arrays and mvds
* add outputType to ExpressionPostAggregator to make docs true
* add output coercion if outputType is defined on ExpressionPostAgg
* updated post-aggregations.md to be consistent with aggregations.md and filters.md and use tables
2023-11-02 00:31:37 -07:00
Charles Smith
de557a62ad
Suggest adoption of Google Style guide (#14905)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2023-11-01 13:31:03 -07:00
Charles Smith
3860052de0
remove references to Jupyter notebooks within the Druid repo (#15143)
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2023-11-01 13:17:06 -07:00
Katya Macedo
935050bf43
docs: Dynamic config cleanup (#15265)
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-11-01 11:22:33 -07:00
317brian
436ded3d78
docs: durable storage azure cleanup (#15120)
Co-authored-by: Laksh Singla <lakshsingla@gmail.com>
2023-10-31 15:20:38 -07:00
Katya Macedo
a43ffbdf2b
[Docs] Improvements to JSON-based batch Ingestion page (#15286) 2023-10-31 14:50:45 -07:00
317brian
87695410ac
docs: blurb about msq union all (#15223) 2023-10-31 14:15:38 -07:00
Vishesh Garg
039b05585c
Add worker status and duration metrics in live and task reports (#15180)
Add worker status and duration metrics in live and task reports for tracking.
2023-10-30 09:43:22 +05:30
317brian
737947754d
docs: add concurent compaction docs (#15218)
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2023-10-27 10:29:34 -07:00
David Christle
fc0b940f78
Document the allowed range of announcer maxBytesPerNode (#15063) 2023-10-26 14:51:01 -07:00
YongGang
7a25ee4fd9
Ability to send task types to k8s or worker task runner (#15196)
* Ability to send task types to k8s or worker task runner

* add more tests

* use runnerStrategy to determine task runner

* minor refine

* refine runner strategy config

* move workerType config to upper level

* validate config when application start
2023-10-25 09:55:56 -07:00
Adarsh Sanjeev
c5fa649ea5
Rename segment load wait parameter (#15251) 2023-10-25 18:08:37 +05:30
Karan Kumar
61ea9e07c5
Limit pages size to a configurable limit (#14994)
Adding the ability to limit the pages sizes of select queries.

    We piggyback on the same machinery that is used to control the numRowsPerSegment.
    This patch introduces a new context parameter rowsPerPage for which the default value is set to 100000 rows.
    This patch also optimizes adding the last selectResults stage only when the previous stages have sorted outputs. Currently for each select query with selectDestination=durableStorage, we used to add this extra selectResults stage.
2023-10-12 14:01:46 +05:30
Clint Wylie
d0f64608eb
sql compatible three-valued logic native filters (#15058)
* sql compatible tri-state native logical filters when druid.expressions.useStrictBooleans=true and druid.generic.useDefaultValueForNull=false, and new druid.generic.useThreeValueLogicForNativeFilters=true
* log.warn if non-default configurations are used to guide operators towards SQL complaint behavior
2023-10-12 00:06:23 -07:00