3215 Commits

Author SHA1 Message Date
Hugh Evans e91f680d50
Removed deprecated deep storage properties (#16904) 2024-08-15 11:54:34 -07:00
Hugh Evans 6cfdeb3894
Added a topic listing reserved keywords (#16843) 2024-08-15 10:25:09 -07:00
Hugh Evans 8c030feefc
Migration guide fixes (#16902)
* Fix typo in table header

* Fixed example NVL result
2024-08-15 09:26:34 -07:00
Rishabh Singh f67ff92d07
[bugfix] Run cold schema refresh thread periodically (#16873)
* Fix build

* Run coldSchemaExec thread periodically

* Bugfix: Run cold schema refresh periodically

* Rename metrics for deep storage only segment schema process
2024-08-13 11:44:01 +05:30
Abhishek Radhakrishnan d7dfbebf97
[Docs]: Fix typo and update broadcast rules section (#16882)
* Fix typo in waitUntilSegmentsLoad.

* Add a note on configuring druid.segmentCache.locations for broadcast rules.

* Update docs/operations/rule-configuration.md

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

---------

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2024-08-12 13:55:33 -07:00
aaronm-bi ceed4a0634
Docs: Update list of ingestion types that support concurrent append and replace (#16852) 2024-08-08 08:06:22 +05:30
Atul Mohan 76ad17fb4c
Add config for http client connect timeout (#16831)
Adds a configuration clientConnectTimeout to our http client config which controls the connection timeout for our http client requests.

It was observed that on busy K8S clusters, the default connect timeout of 500ms is sometimes not enough to complete the SYN/ACK exchange for a request. In these cases, the requests time out with the error:
exceptionType=java.net.SocketTimeoutException, exceptionMessage=Connect Timeout
This behavior was mostly observed on the router while forwarding queries to the broker.
Having a slightly higher connect timeout helped resolve these issues.
2024-08-07 19:31:10 +05:30
Sree Charan Manamala 1f6d2c41d2
Update doc for dynamic parameters supporting array (#16660)
Update dynamic parameter docs to explain how a dynamic parameter can be used to replace an array
2024-08-07 12:33:37 +05:30
Edgar Melendrez 83cf4dc554
[docs] fixes to sql-scalar.md (#16826)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2024-08-06 17:12:57 -07:00
zachjsh c324f09108
Kinesis input format docs (#16840)
* SQL syntax error should target USER persona

* revert change to queryHandler and related tests, based on review comments

* add test

* Docs for Kinesis input format

* remove reference to kafka

* fix spellcheck error

* Apply suggestions from code review

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

---------

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2024-08-06 18:53:10 -04:00
Edgar Melendrez ebea34a814
[Docs] Batch06: starting string functions (#16838)
* batch06, starting string functions

* adding space after Syntax

* quick change

* correcting spelling

* Update docs/querying/sql-functions.md

* Update sql-functions.md

* applying suggestions

* Update docs/querying/sql-functions.md

* Update docs/querying/sql-functions.md

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-08-06 11:32:26 -07:00
Kashif Faraz aa49be61ea
Do not create ZK paths if not needed (#16816)
Background:
ZK-based segment loading has been completely disabled in #15705.
ZK `servedSegmentsPath` has been deprecated since Druid 0.7.1, #1182.
This legacy path has been replaced by the `liveSegmentsPath` and is not used in the code anymore.

Changes:
- Never create ZK loadQueuePath as it is never used.
- Never create ZK servedSegmentsPath as it is never used.
- Do not create ZK liveSegmentsPath if announcement on ZK is disabled
- Fix up tests
2024-08-06 19:29:13 +05:30
Rushikesh Bankar c8323d1a7c
Add indexer task success and failure metrics (#16829)
This PR adds indexer-level task metrics:

"indexer/task/failed/count"
"indexer/task/success/count"

The current "worker/task/completed/count" metric counts all completed tasks irrespective of success or failure status, so these metrics give more visibility into the status of the completed tasks.
2024-08-05 16:21:27 +05:30
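
The distinction drawn in #16829 above can be illustrated with a small tally. This is only a sketch: the `task_statuses` list and the `SUCCESS`/`FAILED` strings are hypothetical stand-ins, not Druid's internal representation. Only the three metric names come from the commit message.

```python
from collections import Counter

def tally_task_metrics(task_statuses):
    """Split completed tasks into success and failure counts.

    `task_statuses` is a hypothetical list of "SUCCESS"/"FAILED" strings,
    one per completed task.
    """
    counts = Counter(task_statuses)
    return {
        # Existing metric: every completed task, whatever its outcome.
        "worker/task/completed/count": counts["SUCCESS"] + counts["FAILED"],
        # New metrics: the same tasks, broken down by outcome.
        "indexer/task/success/count": counts["SUCCESS"],
        "indexer/task/failed/count": counts["FAILED"],
    }

metrics = tally_task_metrics(["SUCCESS", "FAILED", "SUCCESS"])
print(metrics)
```

The completed count alone cannot separate a healthy indexer from one whose tasks mostly fail, which is the visibility gap the commit message describes.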
Laksh Singla 0411c4e67e
Add metrics for number of rows/bytes materialized while running subqueries (#16835)
subquery/rows and subquery/bytes metrics have been added, which indicate the size of the results materialized on the heap.
2024-08-05 14:13:20 +05:30
Kashif Faraz 9dc2569f22
Track and emit segment loading rate for HttpLoadQueuePeon on Coordinator (#16691)
Design:
The loading rate is computed as a moving average of at least the last 10 GiB of successful segment loads.
To account for multiple loading threads on a server, we use the concept of a batch to track load times.
A batch is a set of segments added by the coordinator to the load queue of a server in one go.

Computation:
batchDurationMillis = t(load queue becomes empty) - t(first load request in batch is sent to server)
batchBytes = total bytes successfully loaded in batch
avg loading rate in batch (kbps) = (8 * batchBytes) / batchDurationMillis
overall avg loading rate (kbps) = (8 * sumOverWindow(batchBytes)) / sumOverWindow(batchDurationMillis)

Changes:
- Add `LoadingRateTracker` which computes a moving average load rate based on
the last few GBs of successful segment loads.
- Emit metric `segment/loading/rateKbps` from the Coordinator. In the future, we may
also consider emitting this metric from the historicals themselves.
- Add `expectedLoadTimeMillis` to response of API `/druid/coordinator/v1/loadQueue?simple`
2024-08-03 13:14:21 +05:30
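
The computation described in #16691 above can be sketched as follows. This is a minimal illustration, not the actual `LoadingRateTracker`: the class name `LoadingRateSketch`, its methods, and the eviction rule (drop the oldest batch only while the remaining batches still cover the window) are assumptions made for the example. The formulas themselves are taken from the commit message.

```python
from collections import deque

class LoadingRateSketch:
    """Moving-average load rate over at least `window_bytes` of loaded data."""

    def __init__(self, window_bytes=10 * 1024**3):  # 10 GiB, per the commit message
        self.window_bytes = window_bytes
        self.batches = deque()  # (batch_bytes, batch_duration_millis) pairs
        self.total_bytes = 0
        self.total_millis = 0

    def add_batch(self, batch_bytes, batch_duration_millis):
        """Record one completed batch of successful segment loads."""
        self.batches.append((batch_bytes, batch_duration_millis))
        self.total_bytes += batch_bytes
        self.total_millis += batch_duration_millis
        # Assumed eviction rule: discard the oldest batch only if the
        # window would still hold at least `window_bytes` without it.
        while self.batches and self.total_bytes - self.batches[0][0] >= self.window_bytes:
            old_bytes, old_millis = self.batches.popleft()
            self.total_bytes -= old_bytes
            self.total_millis -= old_millis

    def rate_kbps(self):
        """Overall rate: (8 * sum(batchBytes)) / sum(batchDurationMillis)."""
        if self.total_millis == 0:
            return None  # no data yet
        return 8 * self.total_bytes / self.total_millis

tracker = LoadingRateSketch(window_bytes=1000)  # tiny window for demonstration
tracker.add_batch(1000, 100)
print(tracker.rate_kbps())
```

Bits per millisecond equal kilobits per second, which is why multiplying bytes by 8 and dividing by milliseconds yields kbps directly.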
Akshat Jain bb4d6cc001
Add task report fields in response of SQL statements endpoint (#16808)
If the optional query parameter detail is supplied, then the response also includes the following:

* A stages object that summarizes information about the different stages being used for query execution, such as stage number, phase, start time, duration, input and output information, processing methods, and partitioning.
* A counters object that provides details on the rows, bytes, and files processed at various stages for each worker across different channels, along with sort progress.
* A warnings object that provides details about any warnings.
2024-08-01 10:26:04 +05:30
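
As a rough sketch of how a client might use the `detail` parameter from #16808 above: the commit message names only the "SQL statements endpoint" and the `detail` query parameter, so the URL path, the `true` value, the base URL, and both helper names below are assumptions for illustration. The `stages`, `counters`, and `warnings` keys are the ones listed in the commit message.

```python
from urllib.parse import quote, urlencode

def statement_status_url(base_url, query_id, detail=False):
    """Build a status URL for a SQL statement (path is an assumption)."""
    url = f"{base_url.rstrip('/')}/druid/v2/sql/statements/{quote(query_id, safe='')}"
    if detail:
        # Assumed value; the commit only says the parameter is optional.
        url += "?" + urlencode({"detail": "true"})
    return url

def extract_report_fields(response):
    """Pick out the task report fields from an already-parsed JSON response."""
    return {key: response.get(key) for key in ("stages", "counters", "warnings")}

print(statement_status_url("http://localhost:8888", "query-123", detail=True))
```

Fields that the server omits come back as `None` from `extract_report_fields`, so a caller can handle responses requested without `detail` in the same code path.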
Edgar Melendrez 3bb6d40285
[docs] batch 5 updating functions (#16812)
* batch 5

* Update docs/querying/sql-functions.md

* applying suggestions

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
2024-07-30 17:30:01 -07:00
Edgar Melendrez 85a8a1d805
[Docs]Batch04 - Bitwise numeric functions (#16805)
* Batch04 - Bitwise numeric functions

* Batch04 - Bitwise numeric functions

* minor fixes

* rewording bitwise_shift functions

* rewording bitwise_shift functions

* Update docs/querying/sql-functions.md

* applying suggestions

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
2024-07-30 10:53:59 -07:00
Edgar Melendrez c07aeedbec
[docs] Updating Rollup tutorial (#16762)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2024-07-26 15:43:31 -07:00
Edgar Melendrez 028ee23a1e
[Docs] batch 03 - trig functions (#16795)
* batch 03 - trig functions

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* applying suggestions and corrections

---------

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-07-26 13:11:17 -07:00
Charles Smith ed48cb82e9
[Docs] Remove avro_ocf support from Kafka & Kinesis streaming sources (Revert changes from #11865) (#16807) 2024-07-26 13:06:22 -07:00
Clint Wylie 5da69a01cb
change arrayIngestMode default to array (#16789)
* change arrayIngestMode default to array

* remove arrayIngestMode flag option none

* fix space

* fix test
2024-07-25 15:09:40 +08:00
Zoltan Haindrich 7e3fab5bf9
Make WindowFrames more specific (#16741)
Changes the WindowFrame internals / representation a bit; introduces dedicated frame types for rows and groups, which correspond to the implemented processing methods
2024-07-25 04:57:36 +02:00
Edgar Melendrez ca787885c9
[docs] batch02 of updating functions (#16761)
* applying changes

* ensuring batch is updated

* Update docs/querying/sql-functions.md

* raise -> raises

* addressing review

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-07-24 15:28:57 -07:00
317brian 704962ec8e
doc: minor fixes to migration guides (#16784) 2024-07-23 13:09:51 -07:00
Edgar Melendrez 934c10b1cd
docs: Adding admonition box to warn about MVD (#16712)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Benedict Jin <asdf2014@apache.org>
2024-07-22 17:32:23 -07:00
Clint Wylie 02b8738c00
remove batchProcessingMode from task config, remove AppenderatorImpl (#16765)
changes:
* removes `druid.indexer.task.batchProcessingMode` in favor of always using `CLOSED_SEGMENT_SINKS`, which uses `BatchAppenderator`. This was intended to become the default for native batch, but that was missed, so `CLOSED_SEGMENTS` (using `AppenderatorImpl`) remained the default. However, MSQ has been exclusively using `BatchAppenderator` with no problems, so it seems safe to roll it out as the only option for batch ingestion everywhere.
* with `batchProcessingMode` gone, there is no use for `AppenderatorImpl` so it has been removed
* simplify `Appenderator` construction since there are only separate stream and batch versions now
* simplify tests since `batchProcessingMode` is gone
2024-07-22 13:56:44 -07:00
Clint Wylie a34a06e192
remove Firehose and FirehoseFactory (#16758)
changes:
* removed `Firehose` and `FirehoseFactory` and remaining implementations which were mostly no longer used after #16602
* Moved `IngestSegmentFirehose` which was still used internally by Hadoop ingestion to `DatasourceRecordReader.SegmentReader`
* Rename `SQLFirehoseFactoryDatabaseConnector` to `SQLInputSourceDatabaseConnector` and similar renames for sub-classes
* Moved anything remaining in a 'firehose' package somewhere else
* Clean up docs on firehose stuff
2024-07-19 14:37:21 -07:00
Charles Smith 1881880714
[Docs] Adds a migration guide for SQL compatible null handling (#16704)
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2024-07-19 09:25:05 -07:00
Clint Wylie 35b876436b
remove native scan query legacy mode (#16659) 2024-07-18 23:33:27 -07:00
Edgar Melendrez 721a65046f
docs: add examples for SQL functions (#16745)
* updating first batch of numeric functions

* First batch of functions

* addressing first few comments

* alphabetize list

* draft with suggestions applied

* minor discrepancy expr -> <NUMERIC>

* changed raises to calculates

* Update docs/querying/sql-functions.md

* switch to underscore

* changed to exp(1) to match slack message

* adding html text for trademark symbol to .spelling

* fixed discrepancy between description and example

---------

Co-authored-by: Benedict Jin <asdf2014@apache.org>
2024-07-18 17:06:22 -07:00
Kashif Faraz 9f6ce6ddc0
Remove task action audit logging and druid_taskLog metadata table (#16309)
Description:
Task action audit logging was first deprecated and disabled by default in Druid 0.13, #6368.

As called out in the original discussion #5859, there are several drawbacks to persisting task action audit logs. 
- The only use of the task audit logs is to serve the API `/indexer/v1/task/{taskId}/segments`
which returns the list of segments created by a task.
- The use case is really narrow and no prod clusters really use this information.
- There can be better ways of obtaining this information, such as the metric
`segment/added/bytes` which reports both the segment ID and task ID
when a segment is committed by a task. We could also include committed segment IDs in task reports.
- A task persisting several segments would bloat up the audit logs table putting unnecessary strain
on metadata storage.

Changes:
- Remove `TaskAuditLogConfig`
- Remove method `TaskAction.isAudited()`. No task action is audited anymore.
- Remove `SegmentInsertAction` as it is not used anymore. `SegmentTransactionalInsertAction`
is the new incarnation which has been in use for a while.
- Deprecate `MetadataStorageActionHandler.addLog()` and `getLogs()`. These are not used anymore
but need to be retained for backward compatibility of extensions.
- Do not create `druid_taskLog` metadata table anymore.
2024-07-17 17:09:00 +05:30
Vadim Ogievetsky 307b8849de
Web console: better sql data loader reset (#16696)
* better sql data loader reset

* snapshot

* fix destination pane sizing

* clean doc links

* update doc links

* more doc links

* extract getClusterCapacity

* update snapshots

* allow submit suspended

* some renaming

* diff with current

* Do delta
2024-07-11 14:45:04 -07:00
YongGang 4b293fc2a9
Docs: Fix k8s dynamic config URL (#16720) 2024-07-11 10:05:47 +05:30
Lars Francke 586c713d12
Updates build documentation to not mention explicit Java version as it was out of sync with the dedicated Java page. (#16674)
This means there is one less place to keep information in sync.
2024-07-03 20:53:15 +05:30
317brian d65e015c94
docs: nit for link format (#16687) 2024-07-02 16:45:09 -07:00
Victoria Lim adde024e11
docs: Subtitle updates in migration guide overview (#16683) 2024-07-02 12:56:05 -07:00
Jill Osborne bd49ecfd29
Addition to subquery limit migration guide (#16671)
Co-authored-by: Laksh Singla <lakshsingla@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2024-07-01 14:22:47 -07:00
Hugh Evans 920d9020c0
Docs: Fix default value for globalIngestionHeapLimitBytes (#16654)
Use the new default value added in #8255
2024-06-27 07:01:56 +05:30
Gian Merlino dbed1b0f50
Defer more expressions in vectorized groupBy. (#16338)
* Defer more expressions in vectorized groupBy.

This patch adds a way for columns to provide GroupByVectorColumnSelectors,
which controls how the groupBy engine operates on them. This mechanism is used
by ExpressionVirtualColumn to provide an ExpressionDeferredGroupByVectorColumnSelector
that uses the inputs of an expression as the grouping key. The actual expression
evaluation is deferred until the grouped ResultRow is created.

A new context parameter "deferExpressionDimensions" allows users to control when
this deferred selector is used. The default is "fixedWidthNonNumeric", which is a
behavioral change from the prior behavior. Users can get the prior behavior by setting
this to "singleString".

* Fix style.

* Add deferExpressionDimensions to SqlExpressionBenchmark.

* Fix style.

* Fix inspections.

* Add more testing.

* Use valueOrDefault.

* Compute exprKeyBytes a bit lighter-weight.
2024-06-26 17:28:36 -07:00
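
A hedged sketch of how the context parameter from #16338 above might be set to restore the prior behavior. The request-body shape (`query` and `context` keys), the helper name `sql_payload`, and the example query and table are assumptions for illustration. The parameter name and the values `fixedWidthNonNumeric` and `singleString` come from the commit message.

```python
import json

def sql_payload(query, defer_mode=None):
    """Build a JSON-serializable SQL request body (shape is an assumption)."""
    context = {}
    if defer_mode is not None:
        # Omitting the key leaves the default, "fixedWidthNonNumeric".
        context["deferExpressionDimensions"] = defer_mode
    return {"query": query, "context": context}

# Hypothetical query; "singleString" selects the pre-#16338 behavior.
payload = sql_payload(
    "SELECT UPPER(channel), COUNT(*) FROM wikipedia GROUP BY 1",
    defer_mode="singleString",
)
print(json.dumps(payload, indent=2))
```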
Andreas Maechler ab76d851ad
Update docs contribution with correct script (#16581)
* Spacing

* Fix ordering

* npm run start
2024-06-26 10:30:52 -07:00
Laksh Singla 71b3b5ab5d
Add query context parameter to remove null bytes when writing frames (#16579)
MSQ cannot process null bytes in string fields, and the current workaround is to remove them using the REPLACE function. A 'removeNullBytes' context parameter has been added, which sanitizes the input string fields by removing these null bytes.
2024-06-26 15:00:30 +05:30
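
The effect described in #16579 above can be pictured with a short sketch. It only illustrates what stripping null bytes from string fields means and is not MSQ's implementation. The helper `remove_null_bytes`, the sample row, and the Boolean value given to the context parameter are assumptions.

```python
def remove_null_bytes(value):
    """Strip NUL characters from strings; leave other types untouched."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    return value

# Hypothetical input row with an embedded null byte in a string field.
row = {"name": "dru\x00id", "count": 3}
clean = {key: remove_null_bytes(val) for key, val in row.items()}
print(clean)

# Assumed way of enabling the behavior through the query context.
context = {"removeNullBytes": True}
```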
Edgar Melendrez b43f4063c5
Docs: update link and title of quickstart (#16638)
* update link and title

* Discard changes to website/package.json

* Apply suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

---------

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2024-06-25 09:07:00 -07:00
Clint Wylie 37a50e6803
Remove index_realtime and index_realtime_appenderator tasks (#16602)
index_realtime tasks were removed from the documentation in #13107. Even
at that time, they weren't really documented per se, just mentioned. They
existed solely to support Tranquility, which is an obsolete ingestion
method that predates migration of Druid to ASF and is no longer being
maintained. Tranquility docs were also de-linked from the sidebars and
the other doc pages in #11134. Only a stub remains, so people with
links to the page can see that it's no longer recommended.

index_realtime_appenderator tasks existed in the code base, but were
never documented, nor as far as I am aware were they used for any purpose.

This patch removes both task types completely, as well as removes all
supporting code that was otherwise unused. It also updates the stub
doc for Tranquility to be firmer that it is not compatible. (Previously,
the stub doc said it wasn't recommended, and pointed out that it is
built against an ancient 0.9.2 version of Druid.)

ITUnionQueryTest has been migrated to the new integration tests framework and updated to use Kafka ingestion.

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2024-06-24 20:13:33 -07:00
317brian 2131917f16
docs: added front-coded dictionaries to upgrade notes (#16647)
* docs: add front-coded dictionaries to upgrade notes

* add it to release notes template
2024-06-24 10:52:26 -07:00
Abhishek Radhakrishnan 7463589b07
Support for bootstrap segments (#16609)
* Initial support for bootstrap segments.

  - Adds a new API in the coordinator.
  - All processes that have storage locations configured (including tasks)
    talk to the coordinator if they can, and fetch bootstrap segments from it.
  - Then load the segments onto the segment cache as part of startup.
  - This addresses the segment bootstrapping logic required by processes before
    they can start serving queries or ingesting.

    This patch also lays the foundation to speed up upgrades.

* Fail open by default if there are any errors talking to the coordinator.

* Add test for failure scenario and cleanup logs.

* Cleanup and add debug log

* Assert the events so we know the list exactly.

* Revert RunRules test.

The rules aren't evaluated if there are no clusters.

* Revert RunRulesTest too.

* Remove debug info.

* Make the API POST and update log.

* Fix up UTs.

* Throw 503 from MetadataResource; clean up exception handling and DruidException.

* Remove unused logger, add verification of metrics and docs.

* Update error message

* Update server/src/main/java/org/apache/druid/server/coordination/SegmentLoadDropHandler.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* Apply suggestions from code review

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>

* Adjust test metric expectations with the rename.

* Add BootstrapSegmentResponse container in the response for future extensibility.

* Rename to BootstrapSegmentsInfo for internal consistency.

* Remove unused log.

* Use a member variable for broadcast segments instead of segmentAssigner.

* Minor cleanup

* Add test for loadable bootstrap segments and clarify comment.

* Review suggestions.

---------

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
2024-06-24 09:27:17 -07:00
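
The "fail open" behavior mentioned in #16609 above can be sketched as follows. This is an illustration under stated assumptions, not Druid's code: `load_bootstrap_segments` and the `fetch_from_coordinator` callable are hypothetical names, and returning an empty list on error is one plausible reading of "fail open".

```python
def load_bootstrap_segments(fetch_from_coordinator, log=print):
    """Fetch bootstrap segments at startup, continuing if the fetch fails.

    `fetch_from_coordinator` is a hypothetical callable returning an
    iterable of segment identifiers.
    """
    try:
        return list(fetch_from_coordinator())
    except Exception as error:
        # Fail open: log the problem and start up with no bootstrap segments
        # rather than blocking the process.
        log(f"Could not fetch bootstrap segments, continuing startup: {error}")
        return []

def unreachable_coordinator():
    raise ConnectionError("coordinator unavailable")

print(load_bootstrap_segments(lambda: ["segment-a", "segment-b"]))
print(load_bootstrap_segments(unreachable_coordinator))
```

Failing open trades completeness for availability: a process that cannot reach the coordinator still starts, which matches the commit's stated default.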
Suneet Saldanha 4e0ea7823b
Update docs for K8s TaskRunner Dynamic Config (#16600)
* Update docs for K8s TaskRunner Dynamic Config

* touchups

* code review

* npe

* oopsies
2024-06-21 06:01:59 -07:00
Akshat Jain cd438b1918
Emit metrics for S3UploadThreadPool (#16616)
* Emit metrics for S3UploadThreadPool

* Address review comments

* Revert unnecessary formatting change

* Revert unnecessary formatting change in metrics.md file

* Address review comments

* Add metric for task duration

* Minor fix in metrics.md

* Add s3Key and uploadId in the log message

* Address review comments

* Create new instance of ServiceMetricEvent.Builder for thread safety

* Address review comments

* Address review comments
2024-06-21 11:36:47 +05:30
Andreas Maechler ae70e18bc8
docs: Update Azure extension (#16585)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
2024-06-20 09:31:29 -07:00
Jill Osborne aec1d5ddd6
Link fix (#16596)
* Link fix

* Update docs/operations/auth.md

Co-authored-by: Andreas Maechler <amaechler@gmail.com>

---------

Co-authored-by: Andreas Maechler <amaechler@gmail.com>
2024-06-14 11:40:53 -07:00