Commit Graph

24 Commits

Author SHA1 Message Date
Katya Macedo 5f74ef56f1
Clean up Kafka supervisor topic (#14651)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-08-21 11:55:38 -07:00
Abhishek Agarwal b97cc45d81
Add clarification to the docs for multi-topic Kafka ingestion (#14847)
Follow-up to #14828. Added some more clarification about how topicPattern is used.
2023-08-17 12:52:06 +05:30
317brian 6b4dda964d
Docusaurus2 upgrade for master (#14411)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-08-16 19:01:21 -07:00
Abhishek Agarwal 7911a04064
Refactoring of multi-topic kafka ingestion docs (#14828)
In this PR, I have gotten rid of multiTopic parameter and instead added a topicPattern parameter. Kafka supervisor will pass topicPattern or topic as the stream name to the core ingestion engine. There is validation to ensure that only one of topic or topicPattern will be set. This new setting is easier to understand than overloading the topic field that earlier could be interpreted differently depending on the value of some other field.
2023-08-16 18:00:11 +05:30
Abhishek Agarwal 30b5dd4ca7
Add support to read from multiple kafka topics in same supervisor (#14424)
This PR adds support to read from multiple Kafka topics in the same supervisor. A multi-topic ingestion can be useful in scenarios where a cluster admin has no control over input streams. Different teams in an org may create different input topics that they can write the data to. However, the cluster admin wants all this data to be queryable in one data source.
2023-08-14 22:24:49 +05:30
Gian Merlino 5387f1bac0
Remove chatAsync parameter, so chat is always async. (#14692)
* Remove chatAsync parameter, so chat is always async.

chatAsync has been made default in Druid 26. I have seen good
battle-testing of it in production, and am comfortable removing the
older sync client.

This was the last remaining usage of IndexTaskClient, so this patch
deletes all that stuff too.

* Remove unthrown exception.

* Remove unthrown exception.

* No more TimeoutException.
2023-07-31 19:42:51 -07:00
Jaehui Lee 1f4ee5e21b
Docs: Change default value of "maxRowsInMemory" in tuningConfig (#14618)
Reflecting fixes from https://github.com/apache/druid/pull/13939
2023-07-19 23:14:15 +05:30
Gian Merlino 95ca43034f
Change default handoffConditionTimeout to 15 minutes. (#14539)
* Change default handoffConditionTimeout to 15 minutes.

Most of the time, when handoff is taking this long, it's because something
is preventing Historicals from loading new data. In this case, we have
two choices:

1) Stop making progress on ingestion, wait for Historicals to load stuff,
   and keep the waiting-for-handoff segments available on realtime tasks.
   (handoffConditionTimeout = 0, the current default)

2) Continue making progress on ingestion, by exiting the realtime tasks
   that were waiting for handoff. Once the Historicals get their act
   together, the segments will be loaded, as they are still there on
   deep storage. They will just not be continuously available.
   (handoffConditionTimeout > 0)

I believe most users would prefer [2], because [1] risks ingestion falling
behind the stream, which causes many other problems. It can cause data loss
if the stream ages-out data before we have a chance to ingest it.

Due to the way tuningConfigs are serialized -- defaults are baked into the
serialized form that is written to the database -- this default change will
not change anyone's existing supervisors. It will take effect for newly
created supervisors.

* Fix tests.

* Update docs/development/extensions-core/kafka-supervisor-reference.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/kinesis-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

---------

Co-authored-by: Katya Macedo <38017980+ektravel@users.noreply.github.com>
2023-07-13 13:17:14 -07:00
Nhi Pham 579b93f282
API reference refactor (#14372)
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2023-06-26 15:48:54 -07:00
Pramod Immaneni 1ac5544da7
Updated default value of maxTotalRows to reflect the value in the code (#14298) 2023-05-30 14:41:06 +05:30
Katya Macedo 269137c682
Update Ingestion section (#14023)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Victoria Lim <lim.t.victoria@gmail.com>
2023-05-19 09:42:27 -07:00
Vadim Ogievetsky 3a7e4efdd6
Docs: updating Kafka input format docs (#14049)
* updating Kafka input format docs

* typo

* spellcheck

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

* Update docs/ingestion/data-formats.md

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>

---------

Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
2023-04-11 20:06:23 -07:00
Tejaswini Bandlamudi ccf48245d7
Update documentation for Kafka Supervisor IdleConfig (#14032) 2023-04-05 21:55:39 +05:30
Gian Merlino fe9d0c46d5
Improve memory efficiency of WrappedRoaringBitmap. (#13889)
* Improve memory efficiency of WrappedRoaringBitmap.

Two changes:

1) Use an int[] for sizes 4 or below.
2) Remove the boolean compressRunOnSerialization. Doesn't save much
   space, but it does save a little, and it isn't adding a ton of value
   to have it be configurable. It was originally configurable in case
   anything broke when enabling it, but it's been a while and nothing
   has broken.

* Slight adjustment.

* Adjust for inspection.

* Updates.

* Update snaps.

* Update test.

* Adjust test.

* Fix snaps.
2023-03-09 15:48:02 -08:00
Gian Merlino fda0a1aadd
Set chatAsync default to true. (#13491)
This functionality was originally added in #13354.
2022-12-05 20:53:59 -08:00
Jill Osborne 100a2aa4a2
Update and document experimental features (#13348)
* Update and document experimental features
* Updated
* Update experimental-features.md
* Update docs/development/experimental-features.md
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
* Updated after review
* Updated
* Update materialized-view.md
* Update experimental-features.md
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2022-11-29 08:01:28 +05:30
Gian Merlino bfffbabb56
Async task client for SeekableStreamSupervisors. (#13354)
Main changes:
1) Convert SeekableStreamIndexTaskClient to an interface, move old code
   to SeekableStreamIndexTaskClientSyncImpl, and add new implementation
   SeekableStreamIndexTaskClientAsyncImpl that uses ServiceClient.
2) Add "chatAsync" parameter to seekable stream supervisors that causes
   the supervisor to use an async task client.
3) In SeekableStreamSupervisor.discoverTasks, adjust logic to avoid making
   blocking RPC calls in workerExec threads.
4) In SeekableStreamSupervisor generally, switch from Futures.successfulAsList
   to FutureUtils.coalesce, so we can better capture the errors that occurred
   with contacting individual tasks.

Other, related changes:
1) Add ServiceRetryPolicy.retryNotAvailable, which controls whether
   ServiceClient retries unavailable services. Useful since we do not
   want to retry calls unavailable tasks within the service client. (The
   supervisor does its own higher-level retries.)
2) Add FutureUtils.transformAsync, a more lambda friendly version of
   Futures.transform(f, AsyncFunction).
3) Add FutureUtils.coalesce. Similar to Futures.successfulAsList, but
   returns Either instead of using null on error.
4) Add JacksonUtils.readValue overloads for JavaType and TypeReference.
2022-11-21 19:20:26 +05:30
Tejaswini Bandlamudi 594545da55
Adds cluster level idleConfig setting for supervisor (#13311)
* adds cluster level idleConfig

* updates docs

* refactoring

* spelling nit

* nit

* nit

* refactoring
2022-11-08 14:54:14 +05:30
Tejaswini Bandlamudi 3e13584e0e
Adds Idle feature to `SeekableStreamSupervisor` for inactive stream (#13144)
* Idle Seekable stream supervisor changes.

* nit

* nit

* nit

* Adds unit tests

* Supervisor decides it's idle state instead of AutoScaler

* docs update

* nit

* nit

* docs update

* Adds Kafka unit test

* Adds Kafka Integration test.

* Updates travis config.

* Updates kafka-indexing-service dependencies.

* updates previous offsets snapshot & doc

* Doesn't act if supervisor is suspended.

* Fixes highest current offsets fetch bug, adds new Kafka UT tests, doc changes.

* Reverts Kinesis Supervisor idle behaviour changes.

* nit

* nit

* Corrects SeekableStreamSupervisorSpec check on idle behaviour config, adds tests.

* Fixes getHighestCurrentOffsets to fetch offsets of publishing tasks too

* Adds Kafka Supervisor UT

* Improves test coverage in druid-server

* Corrects IT override config

* Doc updates and Syntactic changes

* nit

* supervisorSpec.ioConfig.idleConfig changes
2022-10-12 18:31:08 +05:30
Gian Merlino d4967c38f8
Various documentation updates. (#13107)
* Various documentation updates.

1) Split out "data management" from "ingestion". Break it into thematic pages.

2) Move "SQL-based ingestion" into the Ingestion category. Adjust content so
   all conceptual content is in concepts.md and all syntax content is in reference.md.
   Shorten the known issues page to the most interesting ones.

3) Add SQL-based ingestion to the ingestion method comparison page. Remove the
   index task, since index_parallel is just as good when maxNumConcurrentSubTasks: 1.

4) Rename various mentions of "Druid console" to "web console".

5) Add additional information to ingestion/partitioning.md.

6) Remove a mention of Tranquility.

7) Remove a note about upgrading to Druid 0.10.1.

8) Remove no-longer-relevant task types from ingestion/tasks.md.

9) Move ingestion/native-batch-firehose.md to the hidden section. It was previously deprecated.

10) Move ingestion/native-batch-simple-task.md to the hidden section. It is still linked in some
    places, but it isn't very useful compared to index_parallel, so it shouldn't take up space
    in the sidebar.

11) Make all br tags self-closing.

12) Certain other cosmetic changes.

13) Update to node-sass 7.

* make travis use node12 for docs

Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
2022-09-16 21:58:11 -07:00
Dr. Sizzles 7291c92f4f
Adding zstandard compression library (#12408)
* Adding zstandard compression library

* 1. Took @clintropolis's advice to have ZStandard decompressor use the byte array when the buffers are not direct.
2. Cleaned up checkstyle issues.

* Fixing zstandard version to latest stable version in pom's and updating license files

* Removing zstd from benchmarks and adding to processing (poms)

* fix the intellij inspection issue

* Removing the prefix v for the version in the license check for ztsd

* Fixing license checks

Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
2022-05-28 17:01:44 -07:00
Victoria Lim 4ede3bbff6
Docs updates (#12069)
* minor updates to docs

* remove en.json
2021-12-14 14:38:18 -08:00
jacobtolar f7f5505631
Add avro_ocf to supported Kafka/Kinesis InputFormats (#11865)
* Update docs - Kinesis InputFormat ingestion

* Add avro_ocf to list of supported Kafka InputFormats

* Remove extra whitespace.

* Update kafka-supervisor-reference.md

* Delete extra whitespace.
2021-12-03 07:57:26 -08:00
Charles Smith 33a5cda061
Docs: Splits Kafka topic. Adds detailed example for kafka inputFormat (#11912)
* Splits Kafka topic according to function. Adds detailed example for kafka inputFormat

* Apply suggestions from code review

accept suggestions from review

Co-authored-by: sthetland <steve.hetland@imply.io>
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Apply suggestions from code review

accept suggestions

Co-authored-by: sthetland <steve.hetland@imply.io>
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* accept suggestions

* accept suggestions

* final typos and clarifications

* bringing forward some syntax fixes

Co-authored-by: sthetland <steve.hetland@imply.io>
Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
2021-11-12 13:02:23 -08:00