druid

Commit Graph

Author	SHA1	Message	Date
Clint Wylie	e8fcf2cac8	minor doc adjustments (#15531 )	2023-12-11 18:22:44 -08:00
zachjsh	ab7d9bc6ec	Add api for Retrieving unused segments (#15415 ) ### Description This PR adds an API for retrieving unused segments for a particular datasource. The API supports pagination through the `limit` and `lastSegmentId` parameters. The resulting unused segments are returned in the optional `sortOrder`, `ASC` or `DESC`, with respect to the matching segments' `id`, `start time`, and `end time`, or in no guaranteed order if `sortOrder` is not specified. `GET /druid/coordinator/v1/datasources/{dataSourceName}/unusedSegments?interval={interval}&limit={limit}&lastSegmentId={lastSegmentId}&sortOrder={sortOrder}` Returns a list of unused segments for a datasource in the cluster contained within an optionally specified interval. Optional `limit` and `lastSegmentId` parameters can be given as well, to limit results and enable paginated results. The results may be sorted in either ASC or DESC order, depending on the `sortOrder` parameter. `dataSourceName`: the name of the datasource. `interval`: the specific interval in which to search for unused segments. `limit`: the maximum number of unused segments to return information about. This property helps to support pagination. `lastSegmentId`: the last segment id from which to search for results. All segments returned are > this segment lexicographically if sortOrder is null or ASC, or < this segment lexicographically if sortOrder is DESC. `sortOrder`: specifies the order in which to return the matching segments by start time, end time. A null value indicates that order does not matter. This PR has: - [x] been self-reviewed. - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.) - [x] added documentation for new or modified features or behaviors. - [ ] a release note entry in the PR description. - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links. - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md) - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader. - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met. - [ ] added integration tests. - [x] been tested in a test Druid cluster.	2023-12-11 16:32:18 -05:00
Katya Macedo	fc222377ae	[Docs] Document decode_base64_complex and decode_base64_utf8 functions (#15444 )	2023-12-11 09:12:06 -08:00
Abhishek Radhakrishnan	96be82a3e6	Clean up duty for non-overlapping eternity tombstones (#15281 ) * Add initial draft of MarkDanglingTombstonesAsUnused duty. * Use overshadowed segments instead of all used segments. * Add unit test for MarkDanglingSegmentsAsUnused duty. * Add mock call * Simplify code. * Docs * shorter lines formatting * metric doc * More tests, refactor and fix up some logic. * update javadocs; other review comments. * Make numCorePartitions as 0 in the TombstoneShardSpec. * fix up test * Add tombstone core partition tests * Update docs/design/coordinator.md Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> * review comment * Minor cleanup * Only consider tombstones with 0 core partitions * Need to register the test shard type to make jackson happy * test comments * checkstyle * fixup misc typos in comments * Update logic to use overshadowed segments * minor cleanup * Rename duty to eternity tombstone instead of dangling. Add test for full eternity tombstone. * Address review feedback. --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-12-11 08:57:15 -08:00
Katya Macedo	099a9825d1	[Docs] Add a release notes template (#15333 ) * Add release notes template * Update spellcheck	2023-12-11 11:35:16 +05:30
Victoria Lim	e68979e03b	Docs: update SQL API reference (#15515 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-12-08 11:53:19 -08:00
Katya Macedo	355c800108	Revamp design page (#15486 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-12-08 11:40:24 -08:00
Clint Wylie	e64b92eb35	add JSON_QUERY_ARRAY function to pluck ARRAY<COMPLEX<json>> out of COMPLEX<json> (#15521 )	2023-12-08 05:28:46 -08:00
Clint Wylie	1eafe983ec	fix array presenting columns to not match single element arrays to scalars for equality (#15503 ) * fix array presenting columns to not match single element arrays to scalars for equality * update docs to clarify usage model of mixed type columns	2023-12-08 01:22:07 -08:00
sb89594	5fda8613ad	Feature: Add IPv6 Match Function (#15212 )	2023-12-07 23:09:06 -08:00
Charles Smith	db3a633250	update timeseries to reflect NULL filling (#15512 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-12-07 14:41:27 -08:00
Clint Wylie	82ac48786b	document arrayContainsElement filter (#15455 )	2023-12-07 00:14:00 -08:00
Benjamin Hopp	fea53c7084	Re-arranging sections for append and replace docs. (#15497 )	2023-12-06 13:13:05 -08:00
Abhishek Radhakrishnan	f4949afdd7	clarify and fixup typos related to unused segments in docs and javadocs. (#15498 )	2023-12-05 22:30:32 -08:00
Jill Osborne	0e14a2c77f	Update retention rules doc (#15439 )	2023-12-05 09:53:17 -08:00
Jan Werner	f4856bc1c1	ranger-security: exclude jackson-jaxrs from + fix outdated documentation (#15481 ) * Excluding jackson-jaxrs dependency from ranger-plugin-common to address CVE regression introduced by ranger-upgrade: CVE-2019-10202, CVE-2019-10172 * remove the reference to outdated ranger 2.0 from the docs --------- Co-authored-by: Xavier Léauté <xl+github@xvrl.net>	2023-12-05 08:24:37 -08:00
Rishabh Singh	d968bb3f43	Rename config for enabling CentralizedDatasourceSchema feature (#15476 ) * Rename property to druid.centralizedDatasourceSchema.enabled * Update config name in docker-compose	2023-12-05 16:57:25 +05:30
Pranav	74ab6024e1	Native doc update (#15456 ) Updating the native docs for #15434	2023-11-30 10:37:23 +05:30
Pranav	93cd638645	Enabling aggregateMultipleValues in all StringAnyAggregators (#15434 ) * Enabling aggregateMultipleValues in all StringAnyAggregators * Adding more tests * More validation * fix warning * updating asserts in decoupled mode * fix intellij inspection * Addressing comments * Addressing comments * Adding early validations and make aggregate consistent across all * fixing tests * fixing tests * Update docs/querying/sql-aggregations.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * fixing static check --------- Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2023-11-29 14:32:49 -08:00
Abhishek Agarwal	0a56c87e93	SQL: Plan non-equijoin conditions as cross join followed by filter (#15302 ) This PR revives #14978 with a few more bells and whistles. Instead of an unconditional cross-join, we will now split the join condition such that some conditions are now evaluated post-join. To decide what sub-condition goes where, I have refactored DruidJoinRule class to extract unsupported sub-conditions. We build a postJoinFilter out of these unsupported sub-conditions and push to the join.	2023-11-29 13:46:11 +05:30
Zoltan Haindrich	eb056e23b5	Fix dictionarySize overrides in tests (#15354 ) I think this is a problem as it discards the false return value when the putToKeyBuffer can't store the value because of the limit Not forwarding the return value at that point may lead to the normal continuation here regardless something was not added to the dictionary like here	2023-11-28 18:49:09 +05:30
Charles Smith	a929b9f16e	clarify DISTINCT is optional for COUNT() (#15394 )	2023-11-28 16:52:16 +05:30
Petrichor	b102667695	[Docs] Add example connection parameters for Java APIs (#15345 )	2023-11-28 15:09:41 +05:30
Jill Osborne	3fa856b3ff	Update Kinesis resharding doc (#15401 )	2023-11-20 15:40:59 -08:00
Jill Osborne	6ed343c047	Data management API doc refactor (#15087 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: George Shiqi Wu <george.wu@imply.io> Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com> Co-authored-by: ythorat2 <ythorat2@illinois.edu> Co-authored-by: Krishna Anandan <krishna1729atom@gmail.com> Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com> Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com> Co-authored-by: Karan Kumar <karankumar1100@gmail.com> Co-authored-by: Rishabh Singh <6513075+findingrish@users.noreply.github.com> Co-authored-by: Magnus Henoch <magnus@gameanalytics.com> Co-authored-by: AmatyaAvadhanula <amatya.avadhanula@imply.io> Co-authored-by: Charles Smith <techdocsmith@gmail.com> Co-authored-by: Yashdeep Thorat <yashdeep97@gmail.com> Co-authored-by: Atul Mohan <atulmohan.mec@gmail.com> Co-authored-by: Clint Wylie <cwylie@apache.org> Co-authored-by: Gian Merlino <gianmerlino@gmail.com>	2023-11-20 12:34:42 -08:00
317brian	dfc52994d4	docs: fix code tabs (#15403 )	2023-11-20 11:16:10 -08:00
Clint Wylie	a95c22ce70	support non-constant expressions for path arguments for json_value and json_query (#15320 ) * support dynamic expressions for path arguments for json_value and json_query	2023-11-17 01:12:05 -08:00
Atul Mohan	a2914789d7	Add support for ingesting older iceberg snapshots (#15348 ) This patch introduces a param snapshotTime in the iceberg inputsource spec that allows the user to ingest data files associated with the most recent snapshot as of the given time. This helps the user ingest data based on older snapshots by specifying the associated snapshot time. This patch also upgrades the iceberg core version to 1.4.1	2023-11-17 12:32:28 +05:30
Charles Smith	6a5da5a05e	fix redirect for api docs and misc array-related typos (#15387 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-11-16 13:29:19 -08:00
Karan Kumar	857b8de425	Query from deep storage doc fixes. (#15382 ) Fixing outdated query from deep storage docs.	2023-11-16 14:05:20 +05:30
Adarsh Sanjeev	a134cc30a6	Change default inSubQueryThreshold (#15336 )	2023-11-14 14:08:12 +05:30
YongGang	3a3d37ef40	Fix for segment/count Metric Not Emitting with Statsd-emitter (#15347 ) * fix segment/count metric in Statsd-emitter * update doc * Update docs/development/extensions-contrib/prometheus.md Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update docs/development/extensions-contrib/statsd.md Co-authored-by: Suneet Saldanha <suneet@apache.org> --------- Co-authored-by: Suneet Saldanha <suneet@apache.org>	2023-11-10 08:08:58 -08:00
Charles Smith	e7d0429f5b	docs: suggest metadata store with instant ADD COLUMN semantics (#15334 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-11-09 12:56:30 -08:00
Pranav	e2fde8c516	Refactor lookups behavior while loading/dropping the containers (#14806 )	2023-11-07 10:07:28 -08:00
Charles Smith	0403e48266	window functions docs (#14739 ) * draft window functions * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * address comments * remove default column * Update docs/querying/sql-window-functions.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Update docs/querying/sql-window-functions.md Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * fix ntile * remove default header column * code tics to remove spelling errors * add known issues, add SUM example * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * address spelling * remove extra chars * add to sidebar, fix admonition * Update sql-window-functions.md accept suggestion, change admonition style * update sidebar * Delete Untitled.ipynb rm unwanted file * Update docs/querying/sql-window-functions.md * Update docs/querying/sql-window-functions.md * update context param, accept suggestions * accept suggestions * Apply suggestions from code review * Fix known issues * require GROUP BY, explain order of operation * accept suggestions * fix spelling --------- Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-11-06 11:34:42 -08:00
Rishabh Singh	8c802e4c9b	Relocating Table Schema Building: Shifting from Brokers to Coordinator for Improved Efficiency (#14985 ) In the current design, brokers query both data nodes and tasks to fetch the schema of the segments they serve. The table schema is then constructed by combining the schemas of all segments within a datasource. However, this approach leads to a high number of segment metadata queries during broker startup, resulting in slow startup times and various issues outlined in the design proposal. To address these challenges, we propose centralizing the table schema management process within the coordinator. This change is the first step in that direction. In the new arrangement, the coordinator will take on the responsibility of querying both data nodes and tasks to fetch segment schema and subsequently building the table schema. Brokers will now simply query the Coordinator to fetch table schema. Importantly, brokers will still retain the capability to build table schemas if the need arises, ensuring both flexibility and resilience.	2023-11-04 19:33:25 +05:30
Tts-233	f39a778f7d	Fix 404 URL about native query (#15324 )	2023-11-03 08:39:59 -07:00
Karan Kumar	5036af6fb3	Doc fixes for query from deep storage and MSQ (#15313 ) Minor updates to the documentation. Added prerequisites. Removed a known issue in MSQ since it's no longer valid. --------- Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-11-03 10:52:20 +05:30
cristian-popa	fb260f3e41	docs: LDAP trust store property clarification (#15028 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-11-02 13:00:08 -07:00
Gian Merlino	d87d92bc43	Add system fields to input sources. (#15276 ) * Add system fields to input sources. Main changes: 1) The SystemField enum defines system fields "__file_uri", "__file_path", and "__file_bucket". They are associated with each input entity. 2) The SystemFieldInputSource interface can be added to any InputSource to make it system-field-capable. It sets up serialization of a list of configured "systemFields" in the JSON form of the input source, and provides a method getSystemFieldValue for computing the value of each system field. Cloud object, HDFS, HTTP, and Local now have this. * Fix various LocalInputSource calls. * Fix style stuff. * Fixups. * Fix tests and coverage.	2023-11-02 10:31:28 -07:00
Clint Wylie	d261587f4a	explicit outputType for ExpressionPostAggregator, better documentation for the differences between arrays and mvds (#15245 ) * better documentation for the differences between arrays and mvds * add outputType to ExpressionPostAggregator to make docs true * add output coercion if outputType is defined on ExpressionPostAgg * updated post-aggregations.md to be consistent with aggregations.md and filters.md and use tables	2023-11-02 00:31:37 -07:00
Charles Smith	de557a62ad	Suggest adoption of Google Style guide (#14905 ) Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>	2023-11-01 13:31:03 -07:00
Charles Smith	3860052de0	remove references to Jupyter notebooks within the Druid repo (#15143 ) Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com> Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>	2023-11-01 13:17:06 -07:00
Katya Macedo	935050bf43	docs: Dynamic config cleanup (#15265 ) Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>	2023-11-01 11:22:33 -07:00
317brian	436ded3d78	docs: durable storage azure cleanup (#15120 ) Co-authored-by: Laksh Singla <lakshsingla@gmail.com>	2023-10-31 15:20:38 -07:00
Katya Macedo	a43ffbdf2b	[Docs] Improvements to JSON-based batch Ingestion page (#15286 )	2023-10-31 14:50:45 -07:00
317brian	87695410ac	docs: blurb about msq union all (#15223 )	2023-10-31 14:15:38 -07:00
Vishesh Garg	039b05585c	Add worker status and duration metrics in live and task reports (#15180 ) Add worker status and duration metrics in live and task reports for tracking.	2023-10-30 09:43:22 +05:30
317brian	737947754d	docs: add concurrent compaction docs (#15218 ) Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>	2023-10-27 10:29:34 -07:00
David Christle	fc0b940f78	Document the allowed range of announcer maxBytesPerNode (#15063 )	2023-10-26 14:51:01 -07:00


3020 Commits