Data management API doc refactor (#15087)

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: George Shiqi Wu <george.wu@imply.io>
Co-authored-by: 317brian <53799971+317brian@users.noreply.github.com>
Co-authored-by: ythorat2 <ythorat2@illinois.edu>
Co-authored-by: Krishna Anandan <krishna1729atom@gmail.com>
Co-authored-by: Vadim Ogievetsky <vadim@ogievetsky.com>
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
Co-authored-by: Rishabh Singh <6513075+findingrish@users.noreply.github.com>
Co-authored-by: Magnus Henoch <magnus@gameanalytics.com>
Co-authored-by: AmatyaAvadhanula <amatya.avadhanula@imply.io>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Yashdeep Thorat <yashdeep97@gmail.com>
Co-authored-by: Atul Mohan <atulmohan.mec@gmail.com>
Co-authored-by: Clint Wylie <cwylie@apache.org>
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
This commit is contained in:
Jill Osborne 2023-11-20 20:34:42 +00:00 committed by GitHub
parent dfc52994d4
commit 6ed343c047
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 548 additions and 33 deletions

View File

@ -23,59 +23,571 @@ sidebar_label: Data management
~ under the License.
-->
This document describes the data management API endpoints for Apache Druid. This includes information on how to mark segments as `used` or `unused` and delete them from Druid.
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
## Note for Coordinator's POST and DELETE APIs
This topic describes the data management API endpoints for Apache Druid.
This includes information on how to mark segments as used or unused and delete them from Druid.
While segments may be enabled by issuing POST requests for the datasources, the Coordinator may again disable segments if they match any configured [drop rules](../operations/rule-configuration.md#drop-rules). Even if segments are enabled by these APIs, you must configure a [load rule](../operations/rule-configuration.md#load-rules) to load them onto Historical processes. If an indexing or kill task runs at the same time these APIs are invoked, the behavior is undefined. Some segments might be killed and others might be enabled. It's also possible that all segments might be disabled, but the indexing task can still read data from those segments and succeed.
In this topic, `http://ROUTER_IP:ROUTER_PORT` is a placeholder for your Router service address and port.
Replace it with the information for your deployment.
For example, use `http://localhost:8888` for quickstart deployments.
:::info
Avoid using indexing or kill tasks and these APIs at the same time for the same datasource and time chunk.
Avoid using indexing or kill tasks and these APIs at the same time for the same datasource and time chunk.
:::
`POST /druid/coordinator/v1/datasources/{dataSourceName}`
## Segment management
Marks as used all segments belonging to a datasource. Returns a JSON object of the form
`{"numChangedSegments": <number>}` with the number of segments in the database whose state has been changed (that is,
the segments were marked as used) as the result of this API call.
You can mark segments as used by sending POST requests to the datasource, but the Coordinator may subsequently mark segments as unused if they meet any configured [drop rules](../operations/rule-configuration.md#drop-rules).
Even if these API requests update segments to used, you still need to configure a [load rule](../operations/rule-configuration.md#load-rules) to load them onto Historical processes.
`POST /druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}`
When you use these APIs concurrently with an indexing task or a kill task, the behavior is undefined.
Druid terminates some segments and marks others as used.
Furthermore, it is possible that all segments could be unused, yet an indexing task might still be able to read data from these segments and complete successfully.
Marks as used a segment of a datasource. Returns a JSON object of the form `{"segmentStateChanged": <boolean>}` with
the boolean indicating if the state of the segment has been changed (that is, the segment was marked as used) as the
result of this API call.
### Segment IDs
`POST /druid/coordinator/v1/datasources/{dataSourceName}/markUsed`
You must provide segment IDs when using many of the endpoints described in this topic.
For information on segment IDs, see [Segment identification](../design/segments.md#segment-identification).
For information on finding segment IDs in the web console, see [Segments](../operations/web-console.md#segments).
`POST /druid/coordinator/v1/datasources/{dataSourceName}/markUnused`
### Mark a single segment unused
Marks segments (un)used for a datasource by interval or set of segment Ids. When marking used only segments that are not overshadowed will be updated.
Marks the state of a segment as unused, using the segment ID.
This is a "soft delete" of the segment from Historicals.
To undo this action, [mark the segment used](#mark-a-single-segment-as-used).
The request payload contains the interval or set of segment IDs to be marked unused.
Either interval or segment IDs should be provided, if both or none are provided in the payload, the API would throw an error (400 BAD REQUEST).
Note that this endpoint returns an HTTP `200 OK` response code even if the segment ID or datasource doesn't exist.
Interval specifies the start and end times as IS0 8601 strings. `interval=(start/end)` where start and end both are inclusive and only the segments completely contained within the specified interval will be disabled, partially overlapping segments will not be affected.
#### URL
JSON Request Payload:
<code class="deleteAPI">DELETE</code> <code>/druid/coordinator/v1/datasources/:datasource/segments/:segmentId</code>
|Key|Description|Example|
#### Header
The following headers are required for this request:
```json
Content-Type: application/json
Accept: application/json, text/plain
```
#### Responses
<Tabs>
<TabItem value="1" label="200 SUCCESS">
*Successfully updated segment*
</TabItem>
</Tabs>
---
#### Sample request
The following example updates the segment `wikipedia_hour_2015-09-12T16:00:00.000Z_2015-09-12T17:00:00.000Z_2023-08-10T04:12:03.860Z` from datasource `wikipedia_hour` as `unused`.
<Tabs>
<TabItem value="2" label="cURL">
```shell
curl --request DELETE "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/datasources/wikipedia_hour/segments/wikipedia_hour_2015-09-12T16:00:00.000Z_2015-09-12T17:00:00.000Z_2023-08-10T04:12:03.860Z" \
--header 'Content-Type: application/json' \
--header 'Accept: application/json, text/plain'
```
</TabItem>
<TabItem value="3" label="HTTP">
```HTTP
DELETE /druid/coordinator/v1/datasources/wikipedia_hour/segments/wikipedia_hour_2015-09-12T16:00:00.000Z_2015-09-12T17:00:00.000Z_2023-08-10T04:12:03.860Z HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
Content-Type: application/json
Accept: application/json, text/plain
```
</TabItem>
</Tabs>
#### Sample response
<details>
<summary>Show sample response</summary>
```json
{
"segmentStateChanged": true
}
```
</details>
### Mark a single segment as used
Marks the state of a segment as used, using the segment ID.
#### URL
<code class="postAPI">POST</code> <code>/druid/coordinator/v1/datasources/segments/:segmentId</code>
#### Header
The following headers are required for this request:
```json
Content-Type: application/json
Accept: application/json, text/plain
```
#### Responses
<Tabs>
<TabItem value="4" label="200 SUCCESS">
*Successfully updated segments*
</TabItem>
</Tabs>
---
#### Sample request
The following example updates the segment with ID `wikipedia_hour_2015-09-12T18:00:00.000Z_2015-09-12T19:00:00.000Z_2023-08-10T04:12:03.860Z` to used.
<Tabs>
<TabItem value="5" label="cURL">
```shell
curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/datasources/wikipedia_hour/segments/wikipedia_hour_2015-09-12T18:00:00.000Z_2015-09-12T19:00:00.000Z_2023-08-10T04:12:03.860Z" \
--header 'Content-Type: application/json' \
--header 'Accept: application/json, text/plain'
```
</TabItem>
<TabItem value="6" label="HTTP">
```HTTP
POST /druid/coordinator/v1/datasources/wikipedia_hour/segments/wikipedia_hour_2015-09-12T18:00:00.000Z_2015-09-12T19:00:00.000Z_2023-08-10T04:12:03.860Z HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
Content-Type: application/json
Accept: application/json, text/plain
```
</TabItem>
</Tabs>
#### Sample response
<details>
<summary>Show sample response</summary>
```json
{
"segmentStateChanged": true
}
```
</details>
### Mark a group of segments unused
Marks the state of a group of segments as unused, using an array of segment IDs or an interval.
Pass the array of segment IDs or interval as a JSON object in the request body.
For the interval, specify the start and end times as ISO 8601 strings to identify segments inclusive of the start time and exclusive of the end time.
Druid only updates the segments completely contained within the specified interval; partially overlapping segments are not affected.
#### URL
<code class="postAPI">POST</code> <code>/druid/coordinator/v1/datasources/:datasource/markUnused</code>
#### Request body
The group of segments is sent as a JSON request payload that accepts one of the following properties:
|Property|Description|Example|
|----------|-------------|---------|
|`interval`|The interval for which to mark segments unused|`"2015-09-12T03:00:00.000Z/2015-09-12T05:00:00.000Z"`|
|`segmentIds`|Set of segment IDs to be marked unused|`["segmentId1", "segmentId2"]`|
|`interval`|ISO 8601 segments interval.|`"2015-09-12T03:00:00.000Z/2015-09-12T05:00:00.000Z"`|
|`segmentIds`|Array of segment IDs.|`["segmentId1", "segmentId2"]`|
`DELETE /druid/coordinator/v1/datasources/{dataSourceName}`
#### Responses
Marks as unused all segments belonging to a datasource. Returns a JSON object of the form
`{"numChangedSegments": <number>}` with the number of segments in the database whose state has been changed (that is,
the segments were marked as unused) as the result of this API call.
<Tabs>
`DELETE /druid/coordinator/v1/datasources/{dataSourceName}/intervals/{interval}`
`@Deprecated. /druid/coordinator/v1/datasources/{dataSourceName}?kill=true&interval={myInterval}`
<TabItem value="7" label="200 SUCCESS">
Runs a [Kill task](../ingestion/tasks.md) for a given interval and datasource.
`DELETE /druid/coordinator/v1/datasources/{dataSourceName}/segments/{segmentId}`
*Successfully updated segments*
Marks as unused a segment of a datasource. Returns a JSON object of the form `{"segmentStateChanged": <boolean>}` with
the boolean indicating if the state of the segment has been changed (that is, the segment was marked as unused) as the
result of this API call.
</TabItem>
<TabItem value="8" label="204 NO CONTENT">
*Invalid datasource name*
</TabItem>
<TabItem value="9" label="400 BAD REQUEST">
*Invalid request payload*
</TabItem>
</Tabs>
---
#### Sample request
The following example marks two segments from the `wikipedia_hour` datasource unused based on their segment IDs.
<Tabs>
<TabItem value="10" label="cURL">
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/datasources/wikipedia_hour/markUnused" \
--header 'Content-Type: application/json' \
--data '{
"segmentIds": [
"wikipedia_hour_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2023-08-10T04:12:03.860Z",
"wikipedia_hour_2015-09-12T04:00:00.000Z_2015-09-12T05:00:00.000Z_2023-08-10T04:12:03.860Z"
]
}'
```
</TabItem>
<TabItem value="11" label="HTTP">
```HTTP
POST /druid/coordinator/v1/datasources/wikipedia_hour/markUnused HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
Content-Type: application/json
Content-Length: 230
{
"segmentIds": [
"wikipedia_hour_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2023-08-10T04:12:03.860Z",
"wikipedia_hour_2015-09-12T04:00:00.000Z_2015-09-12T05:00:00.000Z_2023-08-10T04:12:03.860Z"
]
}
```
</TabItem>
</Tabs>
#### Sample response
<details>
<summary>Show sample response</summary>
```json
{
"numChangedSegments": 2
}
```
</details>
### Mark a group of segments used
Marks the state of a group of segments as used, using an array of segment IDs or an interval.
Pass the array of segment IDs or interval as a JSON object in the request body.
For the interval, specify the start and end times as ISO 8601 strings to identify segments inclusive of the start time and exclusive of the end time.
Druid only updates the segments completely contained within the specified interval; partially overlapping segments are not affected.
#### URL
<code class="postAPI">POST</code> <code>/druid/coordinator/v1/datasources/:datasource/markUsed</code>
#### Request body
The group of segments is sent as a JSON request payload that accepts one of the following properties:
|Property|Description|Example|
|----------|-------------|---------|
|`interval`| ISO 8601 segments interval.|`"2015-09-12T03:00:00.000Z/2015-09-12T05:00:00.000Z"`|
|`segmentIds`|Array of segment IDs.|`["segmentId1", "segmentId2"]`|
#### Responses
<Tabs>
<TabItem value="12" label="200 SUCCESS">
*Successfully updated segments*
</TabItem>
<TabItem value="13" label="204 NO CONTENT">
*Invalid datasource name*
</TabItem>
<TabItem value="14" label="400 BAD REQUEST">
*Invalid request payload*
</TabItem>
</Tabs>
---
#### Sample request
The following example marks two segments from the `wikipedia_hour` datasource used based on their segment IDs.
<Tabs>
<TabItem value="15" label="cURL">
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/datasources/wikipedia_hour/markUsed" \
--header 'Content-Type: application/json' \
--data '{
"segmentIds": [
"wikipedia_hour_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2023-08-10T04:12:03.860Z",
"wikipedia_hour_2015-09-12T04:00:00.000Z_2015-09-12T05:00:00.000Z_2023-08-10T04:12:03.860Z"
]
}'
```
</TabItem>
<TabItem value="16" label="HTTP">
```HTTP
POST /druid/coordinator/v1/datasources/wikipedia_hour/markUsed HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
Content-Type: application/json
Content-Length: 230
{
"segmentIds": [
"wikipedia_hour_2015-09-12T14:00:00.000Z_2015-09-12T15:00:00.000Z_2023-08-10T04:12:03.860Z",
"wikipedia_hour_2015-09-12T04:00:00.000Z_2015-09-12T05:00:00.000Z_2023-08-10T04:12:03.860Z"
]
}
```
</TabItem>
</Tabs>
#### Sample response
<details>
<summary>Show sample response</summary>
```json
{
"numChangedSegments": 2
}
```
</details>
### Mark all segments unused
Marks the state of all segments of a datasource as unused.
This action performs a "soft delete" of the segments from Historicals.
Note that this endpoint returns an HTTP `200 OK` response code even if the datasource doesn't exist.
#### URL
<code class="deleteAPI">DELETE</code> <code>/druid/coordinator/v1/datasources/:datasource</code>
#### Responses
<Tabs>
<TabItem value="17" label="200 SUCCESS">
*Successfully updated segments*
</TabItem>
</Tabs>
---
#### Sample request
<Tabs>
<TabItem value="18" label="cURL">
```shell
curl --request DELETE "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/datasources/wikipedia_hour"
```
</TabItem>
<TabItem value="19" label="HTTP">
```HTTP
DELETE /druid/coordinator/v1/datasources/wikipedia_hour HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
</TabItem>
</Tabs>
#### Sample response
<details>
<summary>Show sample response</summary>
```json
{
"numChangedSegments": 24
}
```
</details>
### Mark all segments used
Marks the state of all unused segments of a datasource as used.
The endpoint returns the number of changed segments.
Note that this endpoint returns an HTTP `200 OK` response code even if the datasource doesn't exist.
#### URL
<code class="postAPI">POST</code> <code>/druid/coordinator/v1/datasources/:datasource</code>
#### Header
The following headers are required for this request:
```json
Content-Type: application/json
Accept: application/json, text/plain
```
#### Responses
<Tabs>
<TabItem value="20" label="200 SUCCESS">
*Successfully updated segments*
</TabItem>
</Tabs>
---
#### Sample request
The following example updates all unused segments of `wikipedia_hour` to used.
`wikipedia_hour` contains one unused segment eligible to be marked as used.
<Tabs>
<TabItem value="21" label="cURL">
```shell
curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/datasources/wikipedia_hour" \
--header 'Content-Type: application/json' \
--header 'Accept: application/json, text/plain'
```
</TabItem>
<TabItem value="22" label="HTTP">
```HTTP
POST /druid/coordinator/v1/datasources/wikipedia_hour HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
Content-Type: application/json
Accept: application/json, text/plain
```
</TabItem>
</Tabs>
#### Sample response
<details>
<summary>Show sample response</summary>
```json
{
"numChangedSegments": 1
}
```
</details>
## Segment deletion
### Permanently delete segments
The DELETE endpoint sends a [kill task](../ingestion/tasks.md) for a given interval and datasource. The interval value is an ISO 8601 string delimited by `_`. This request permanently deletes all metadata for unused segments and removes them from deep storage.
Note that this endpoint returns an HTTP `200 OK` response code even if the datasource doesn't exist.
This endpoint supersedes the deprecated endpoint: `DELETE /druid/coordinator/v1/datasources/:datasource?kill=true&interval=:interval`
#### URL
<code class="deleteAPI">DELETE</code> <code>/druid/coordinator/v1/datasources/:datasource/intervals/:interval</code>
#### Responses
<Tabs>
<TabItem value="23" label="200 SUCCESS">
*Successfully sent kill task*
</TabItem>
</Tabs>
---
#### Sample request
The following example sends a kill task to permanently delete segments in the datasource `wikipedia_hour` from the interval `2015-09-12` to `2015-09-13`.
<Tabs>
<TabItem value="24" label="cURL">
```shell
curl --request DELETE "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/datasources/wikipedia_hour/intervals/2015-09-12_2015-09-13"
```
</TabItem>
<TabItem value="25" label="HTTP">
```HTTP
DELETE /druid/coordinator/v1/datasources/wikipedia_hour/intervals/2015-09-12_2015-09-13 HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
</TabItem>
</Tabs>
#### Sample response
A successful request returns an HTTP `200 OK` and an empty response body.

View File

@ -2327,3 +2327,6 @@ taskslots
loadstatus
sqlQueryId
LAST_VALUE
markUnused
markUsed
segmentId