Make docs on reset supervisor operation scarier (#9288)

* Update kafka-ingestion.md

Companion doc update to #9253, intended to make a supervisor reset scarier

* Update kinesis-ingestion.md
sthetland 2020-02-04 15:30:32 -08:00 committed by GitHub
parent 768d60c7b4
commit 556a3861ed
2 changed files with 40 additions and 30 deletions


@@ -308,26 +308,30 @@ it will just ensure that no indexing tasks are running until the supervisor is r
### Resetting Supervisors
To reset a running supervisor, you can use `POST /druid/indexer/v1/supervisor/<supervisorId>/reset`.
The `POST /druid/indexer/v1/supervisor/<supervisorId>/reset` operation clears stored
offsets, causing the supervisor to start reading offsets from either the earliest or latest
offsets in Kafka (depending on the value of `useEarliestOffset`). After clearing stored
offsets, the supervisor kills and recreates any active tasks, so that tasks begin reading
from valid offsets.
The indexing service keeps track of the latest persisted Kafka offsets in order to provide exactly-once ingestion
guarantees across tasks. Subsequent tasks must start reading from where the previous task completed in order for the
generated segments to be accepted. If the messages at the expected starting offsets are no longer available in Kafka
(typically because the message retention period has elapsed or the topic was removed and re-created) the supervisor will
refuse to start and in-flight tasks will fail.
Use care when using this operation! Resetting the supervisor may cause Kafka messages
to be skipped or read twice, resulting in missing or duplicate data.
This endpoint can be used to clear the stored offsets which will cause the supervisor to start reading from
either the earliest or latest offsets in Kafka (depending on the value of `useEarliestOffset`). The supervisor must be
running for this endpoint to be available. After the stored offsets are cleared, the supervisor will automatically kill
and re-create any active tasks so that tasks begin reading from valid offsets.
The reason for using this operation is to recover from a state in which the supervisor
ceases operating due to missing offsets. The indexing service keeps track of the latest
persisted Kafka offsets in order to provide exactly-once ingestion guarantees across
tasks. Subsequent tasks must start reading from where the previous task completed in
order for the generated segments to be accepted. If the messages at the expected
starting offsets are no longer available in Kafka (typically because the message retention
period has elapsed or the topic was removed and re-created) the supervisor will refuse
to start and in-flight tasks will fail. This operation enables you to recover from this condition.
Note that since the stored offsets are necessary to guarantee exactly-once ingestion, resetting them with this endpoint
may cause some Kafka messages to be skipped or to be read twice.
Note that the supervisor must be running for this endpoint to be available.
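
As a rough illustration of the reset call described above, the following Python sketch builds the `POST .../reset` request and only sends it when the caller explicitly confirms. The Overlord address `http://localhost:8090`, the function names, and the `confirm` guard are assumptions made for this example; they are not part of Druid or of this change.

```python
import urllib.parse
import urllib.request

RESET_PATH = "/druid/indexer/v1/supervisor/{supervisor_id}/reset"


def build_reset_request(overlord_url: str, supervisor_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that resets a supervisor."""
    # Percent-encode the ID so unusual characters cannot change the URL path.
    quoted_id = urllib.parse.quote(supervisor_id, safe="")
    url = overlord_url.rstrip("/") + RESET_PATH.format(supervisor_id=quoted_id)
    # The endpoint is called with POST; an empty body is sent here (assumption).
    return urllib.request.Request(url, data=b"", method="POST")


def reset_supervisor(overlord_url: str, supervisor_id: str, confirm: bool = False) -> int:
    """Send the reset request and return the HTTP status code.

    Hypothetical safety guard: a reset clears stored offsets, so messages may
    be skipped or read twice. The caller must opt in explicitly.
    """
    if not confirm:
        raise ValueError("reset may skip or duplicate data; pass confirm=True")
    with urllib.request.urlopen(build_reset_request(overlord_url, supervisor_id)) as response:
        return response.status
```

A call such as `reset_supervisor("http://localhost:8090", "my-supervisor", confirm=True)` would then issue the request, provided the supervisor is running.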
### Terminating Supervisors
`POST /druid/indexer/v1/supervisor/<supervisorId>/terminate` terminates a supervisor and causes all associated indexing
tasks managed by this supervisor to immediately stop and begin
The `POST /druid/indexer/v1/supervisor/<supervisorId>/terminate` operation terminates a supervisor and causes all
associated indexing tasks managed by this supervisor to immediately stop and begin
publishing their segments. This supervisor will still exist in the metadata store and its history may be retrieved
with the supervisor history API, but will not be listed in the 'get supervisors' API response nor can its configuration
or status report be retrieved. The only way this supervisor can start again is by submitting a functioning supervisor


@@ -306,28 +306,34 @@ it will just ensure that no indexing tasks are running until the supervisor is r
### Resetting Supervisors
To reset a running supervisor, you can use `POST /druid/indexer/v1/supervisor/<supervisorId>/reset`.
The `POST /druid/indexer/v1/supervisor/<supervisorId>/reset` operation clears stored
sequence numbers, causing the supervisor to start reading from either the earliest or
latest sequence numbers in Kinesis (depending on the value of `useEarliestSequenceNumber`).
After clearing stored sequence numbers, the supervisor kills and recreates active tasks,
so that tasks begin reading from valid sequence numbers.
The indexing service keeps track of the latest persisted Kinesis sequence number in order to provide exactly-once ingestion
guarantees across tasks. Subsequent tasks must start reading from where the previous task completed in order for the
generated segments to be accepted. If the messages at the expected starting sequence numbers are no longer available in Kinesis
(typically because the message retention period has elapsed or the topic was removed and re-created) the supervisor will
refuse to start and in-flight tasks will fail.
Use care when using this operation! Resetting the supervisor may cause Kinesis messages
to be skipped or read twice, resulting in missing or duplicate data.
This endpoint can be used to clear the stored sequence numbers which will cause the supervisor to start reading from
either the earliest or latest sequence numbers in Kinesis (depending on the value of `useEarliestSequenceNumber`). The supervisor must be
running for this endpoint to be available. After the stored sequence numbers are cleared, the supervisor will automatically kill
and re-create any active tasks so that tasks begin reading from valid sequence numbers.
The reason for using this operation is to recover from a state in which the supervisor
ceases operating due to missing sequence numbers. The indexing service keeps track of the latest
persisted sequence number in order to provide exactly-once ingestion guarantees across
tasks.
Note that since the stored sequence numbers are necessary to guarantee exactly-once ingestion, resetting them with this endpoint
may cause some Kinesis messages to be skipped or to be read twice.
Subsequent tasks must start reading from where the previous task completed in
order for the generated segments to be accepted. If the messages at the expected starting sequence numbers are
no longer available in Kinesis (typically because the message retention period has elapsed or the topic was
removed and re-created) the supervisor will refuse to start and in-flight tasks will fail. This operation
enables you to recover from this condition.
Note that the supervisor must be running for this endpoint to be available.
### Terminating Supervisors
`POST /druid/indexer/v1/supervisor/<supervisorId>/terminate` terminates a supervisor and causes all associated indexing
tasks managed by this supervisor to immediately stop and begin
publishing their segments. This supervisor will still exist in the metadata store and it's history may be retrieved
with the supervisor history API, but will not be listed in the 'get supervisors' API response nor can it's configuration
The `POST /druid/indexer/v1/supervisor/<supervisorId>/terminate` operation terminates a supervisor and causes
all associated indexing tasks managed by this supervisor to immediately stop and begin
publishing their segments. This supervisor will still exist in the metadata store and its history may be retrieved
with the supervisor history API, but will not be listed in the 'get supervisors' API response nor can its configuration
or status report be retrieved. The only way this supervisor can start again is by submitting a functioning supervisor
spec to the create API.
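
The terminate-then-resubmit flow described above can be sketched in Python as two request builders. This is a minimal sketch under assumptions: the create API is taken to be `POST /druid/indexer/v1/supervisor` with a JSON spec body (the path is not shown in this diff), and the helper names and the example spec are hypothetical.

```python
import json
import urllib.parse
import urllib.request

SUPERVISOR_BASE = "/druid/indexer/v1/supervisor"


def build_terminate_request(overlord_url: str, supervisor_id: str) -> urllib.request.Request:
    """Build the POST request that terminates a supervisor."""
    quoted_id = urllib.parse.quote(supervisor_id, safe="")
    url = f"{overlord_url.rstrip('/')}{SUPERVISOR_BASE}/{quoted_id}/terminate"
    return urllib.request.Request(url, data=b"", method="POST")


def build_create_request(overlord_url: str, spec: dict) -> urllib.request.Request:
    """Build the POST request that submits a supervisor spec.

    Assumption: the create API accepts the spec as a JSON body at
    SUPERVISOR_BASE; check the API reference for your Druid version.
    """
    url = f"{overlord_url.rstrip('/')}{SUPERVISOR_BASE}"
    body = json.dumps(spec).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(request: urllib.request.Request) -> int:
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

Keeping request construction separate from sending makes it possible to inspect the URL and body before anything reaches the Overlord, which is useful for an operation that stops running tasks.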