druid/docs/api-reference/automatic-compaction-api.md

851 lines
23 KiB
Markdown
Raw Normal View History

---
id: automatic-compaction-api
title: Automatic compaction API
sidebar_label: Automatic compaction
---
2023-11-20 14:16:10 -05:00
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
<!--
2023-11-20 14:16:10 -05:00
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->
2023-11-20 14:16:10 -05:00
This topic describes the status and configuration API endpoints for [automatic compaction](../data-management/automatic-compaction.md) in Apache Druid. You can configure automatic compaction in the Druid web console or API.
In this topic, `http://ROUTER_IP:ROUTER_PORT` is a placeholder for your Router service address and port. Replace it with the information for your deployment. For example, use `http://localhost:8888` for quickstart deployments.
## Manage automatic compaction
### Create or update automatic compaction configuration
Creates or updates the automatic compaction configuration for a datasource. Pass the automatic compaction as a JSON object in the request body.
The automatic compaction configuration requires only the `dataSource` property. Druid fills all other properties with default values if not specified. See [Automatic compaction dynamic configuration](../configuration/index.md#automatic-compaction-dynamic-configuration) for configuration details.
Note that this endpoint returns an HTTP `200 OK` message code even if the datasource name does not exist.
2023-11-20 14:16:10 -05:00
#### URL
`POST` `/druid/coordinator/v1/config/compaction`
#### Responses
2023-11-20 14:16:10 -05:00
<Tabs>
<TabItem value="1" label="200 SUCCESS">
2023-11-20 14:16:10 -05:00
*Successfully submitted auto compaction configuration*
2023-11-20 14:16:10 -05:00
</TabItem>
</Tabs>
---
#### Sample request
2023-11-20 14:16:10 -05:00
The following example creates an automatic compaction configuration for the datasource `wikipedia_hour`, which was ingested with `HOUR` segment granularity. This automatic compaction configuration performs compaction on `wikipedia_hour`, resulting in compacted segments that represent a day interval of data.
2023-11-20 14:16:10 -05:00
In this example:
* `wikipedia_hour` is a datasource with `HOUR` segment granularity.
2023-11-20 14:16:10 -05:00
* `skipOffsetFromLatest` is set to `PT0S`, meaning that no data is skipped.
* `partitionsSpec` is set to the default `dynamic`, allowing Druid to dynamically determine the optimal partitioning strategy.
* `type` is set to `index_parallel`, meaning that parallel indexing is used.
* `segmentGranularity` is set to `DAY`, meaning that each compacted segment is a day of data.
2023-11-20 14:16:10 -05:00
<Tabs>
<TabItem value="2" label="cURL">
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/config/compaction"\
--header 'Content-Type: application/json' \
--data '{
"dataSource": "wikipedia_hour",
"skipOffsetFromLatest": "PT0S",
"tuningConfig": {
"partitionsSpec": {
"type": "dynamic"
},
"type": "index_parallel"
},
"granularitySpec": {
"segmentGranularity": "DAY"
}
}'
```
2023-11-20 14:16:10 -05:00
</TabItem>
<TabItem value="3" label="HTTP">
```HTTP
POST /druid/coordinator/v1/config/compaction HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
Content-Type: application/json
Content-Length: 281
{
"dataSource": "wikipedia_hour",
"skipOffsetFromLatest": "PT0S",
"tuningConfig": {
"partitionsSpec": {
"type": "dynamic"
},
"type": "index_parallel"
},
"granularitySpec": {
"segmentGranularity": "DAY"
}
}
```
2023-11-20 14:16:10 -05:00
</TabItem>
</Tabs>
#### Sample response
A successful request returns an HTTP `200 OK` message code and an empty response body.
### Remove automatic compaction configuration
2023-11-20 14:16:10 -05:00
Removes the automatic compaction configuration for a datasource. This updates the compaction status of the datasource to "Not enabled."
#### URL
`DELETE` `/druid/coordinator/v1/config/compaction/{dataSource}`
#### Responses
2023-11-20 14:16:10 -05:00
<Tabs>
<TabItem value="4" label="200 SUCCESS">
2023-11-20 14:16:10 -05:00
*Successfully deleted automatic compaction configuration*
2023-11-20 14:16:10 -05:00
</TabItem>
<TabItem value="5" label="404 NOT FOUND">
2023-11-20 14:16:10 -05:00
*Datasource does not have automatic compaction or invalid datasource name*
2023-11-20 14:16:10 -05:00
</TabItem>
</Tabs>
---
#### Sample request
2023-11-20 14:16:10 -05:00
<Tabs>
<TabItem value="6" label="cURL">
```shell
curl --request DELETE "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/config/compaction/wikipedia_hour"
```
2023-11-20 14:16:10 -05:00
</TabItem>
<TabItem value="7" label="HTTP">
```HTTP
DELETE /druid/coordinator/v1/config/compaction/wikipedia_hour HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
2023-11-20 14:16:10 -05:00
</TabItem>
</Tabs>
#### Sample response
A successful request returns an HTTP `200 OK` message code and an empty response body.
### Update capacity for compaction tasks
2023-11-20 14:16:10 -05:00
Updates the capacity for compaction tasks. The minimum number of compaction tasks is 1 and the maximum is 2147483647.
2023-11-20 14:16:10 -05:00
Note that while the max compaction tasks can theoretically be set to 2147483647, the practical limit is determined by the available cluster capacity and is capped at 10% of the cluster's total capacity.
#### URL
`POST` `/druid/coordinator/v1/config/compaction/taskslots`
#### Query parameters
To limit the maximum number of compaction tasks, use the optional query parameters `ratio` and `max`:
* `ratio` (optional)
* Type: Float
* Default: 0.1
* Limits the ratio of the total task slots to compaction task slots.
* `max` (optional)
* Type: Int
* Default: 2147483647
* Limits the maximum number of task slots for compaction tasks.
#### Responses
2023-11-20 14:16:10 -05:00
<Tabs>
<TabItem value="8" label="200 SUCCESS">
2023-11-20 14:16:10 -05:00
*Successfully updated compaction configuration*
2023-11-20 14:16:10 -05:00
</TabItem>
<TabItem value="9" label="404 NOT FOUND">
2023-11-20 14:16:10 -05:00
*Invalid `max` value*
2023-11-20 14:16:10 -05:00
</TabItem>
</Tabs>
---
#### Sample request
2023-11-20 14:16:10 -05:00
<Tabs>
<TabItem value="10" label="cURL">
```shell
curl --request POST "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/config/compaction/taskslots?ratio=0.2&max=250000"
```
2023-11-20 14:16:10 -05:00
</TabItem>
<TabItem value="11" label="HTTP">
```HTTP
POST /druid/coordinator/v1/config/compaction/taskslots?ratio=0.2&max=250000 HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
2023-11-20 14:16:10 -05:00
</TabItem>
</Tabs>
#### Sample response
A successful request returns an HTTP `200 OK` message code and an empty response body.
## View automatic compaction configuration
### Get all automatic compaction configurations
Retrieves all automatic compaction configurations. Returns a `compactionConfigs` object containing the active automatic compaction configurations of all datasources.
You can use this endpoint to retrieve `compactionTaskSlotRatio` and `maxCompactionTaskSlots` values for managing resource allocation of compaction tasks.
#### URL
`GET` `/druid/coordinator/v1/config/compaction`
#### Responses
2023-11-20 14:16:10 -05:00
<Tabs>
2023-11-20 14:16:10 -05:00
<TabItem value="12" label="200 SUCCESS">
2023-11-20 14:16:10 -05:00
*Successfully retrieved automatic compaction configurations*
</TabItem>
</Tabs>
---
#### Sample request
2023-11-20 14:16:10 -05:00
<Tabs>
<TabItem value="13" label="cURL">
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/config/compaction"
```
2023-11-20 14:16:10 -05:00
</TabItem>
<TabItem value="14" label="HTTP">
```HTTP
GET /druid/coordinator/v1/config/compaction HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
2023-11-20 14:16:10 -05:00
</TabItem>
</Tabs>
#### Sample response
<details>
<summary>View the response</summary>
```json
{
"compactionConfigs": [
{
"dataSource": "wikipedia_hour",
"taskPriority": 25,
"inputSegmentSizeBytes": 100000000000000,
"maxRowsPerSegment": null,
"skipOffsetFromLatest": "PT0S",
"tuningConfig": {
"maxRowsInMemory": null,
"appendableIndexSpec": null,
"maxBytesInMemory": null,
"maxTotalRows": null,
"splitHintSpec": null,
"partitionsSpec": {
"type": "dynamic",
"maxRowsPerSegment": 5000000,
"maxTotalRows": null
},
"indexSpec": null,
"indexSpecForIntermediatePersists": null,
"maxPendingPersists": null,
"pushTimeout": null,
"segmentWriteOutMediumFactory": null,
"maxNumConcurrentSubTasks": null,
"maxRetry": null,
"taskStatusCheckPeriodMs": null,
"chatHandlerTimeout": null,
"chatHandlerNumRetries": null,
"maxNumSegmentsToMerge": null,
"totalNumMergeTasks": null,
"maxColumnsToMerge": null,
"type": "index_parallel",
"forceGuaranteedRollup": false
},
"granularitySpec": {
"segmentGranularity": "DAY",
"queryGranularity": null,
"rollup": null
},
"dimensionsSpec": null,
"metricsSpec": null,
"transformSpec": null,
"ioConfig": null,
"taskContext": null
},
{
"dataSource": "wikipedia",
"taskPriority": 25,
"inputSegmentSizeBytes": 100000000000000,
"maxRowsPerSegment": null,
"skipOffsetFromLatest": "PT0S",
"tuningConfig": {
"maxRowsInMemory": null,
"appendableIndexSpec": null,
"maxBytesInMemory": null,
"maxTotalRows": null,
"splitHintSpec": null,
"partitionsSpec": {
"type": "dynamic",
"maxRowsPerSegment": 5000000,
"maxTotalRows": null
},
"indexSpec": null,
"indexSpecForIntermediatePersists": null,
"maxPendingPersists": null,
"pushTimeout": null,
"segmentWriteOutMediumFactory": null,
"maxNumConcurrentSubTasks": null,
"maxRetry": null,
"taskStatusCheckPeriodMs": null,
"chatHandlerTimeout": null,
"chatHandlerNumRetries": null,
"maxNumSegmentsToMerge": null,
"totalNumMergeTasks": null,
"maxColumnsToMerge": null,
"type": "index_parallel",
"forceGuaranteedRollup": false
},
"granularitySpec": {
"segmentGranularity": "DAY",
"queryGranularity": null,
"rollup": null
},
"dimensionsSpec": null,
"metricsSpec": null,
"transformSpec": null,
"ioConfig": null,
"taskContext": null
}
],
"compactionTaskSlotRatio": 0.1,
"maxCompactionTaskSlots": 2147483647,
"useAutoScaleSlots": false
}
```
</details>
### Get automatic compaction configuration
Retrieves the automatic compaction configuration for a datasource.
#### URL
`GET` `/druid/coordinator/v1/config/compaction/{dataSource}`
#### Responses
2023-11-20 14:16:10 -05:00
<Tabs>
<TabItem value="15" label="200 SUCCESS">
2023-11-20 14:16:10 -05:00
*Successfully retrieved configuration for datasource*
</TabItem>
<TabItem value="16" label="404 NOT FOUND">
*Invalid datasource or datasource does not have automatic compaction enabled*
2023-11-20 14:16:10 -05:00
</TabItem>
</Tabs>
---
#### Sample request
The following example retrieves the automatic compaction configuration for datasource `wikipedia_hour`.
2023-11-20 14:16:10 -05:00
<Tabs>
<TabItem value="17" label="cURL">
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/config/compaction/wikipedia_hour"
```
2023-11-20 14:16:10 -05:00
</TabItem>
<TabItem value="18" label="HTTP">
```HTTP
GET /druid/coordinator/v1/config/compaction/wikipedia_hour HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
2023-11-20 14:16:10 -05:00
</TabItem>
</Tabs>
#### Sample response
<details>
<summary>View the response</summary>
```json
{
"dataSource": "wikipedia_hour",
"taskPriority": 25,
"inputSegmentSizeBytes": 100000000000000,
"maxRowsPerSegment": null,
"skipOffsetFromLatest": "PT0S",
"tuningConfig": {
"maxRowsInMemory": null,
"appendableIndexSpec": null,
"maxBytesInMemory": null,
"maxTotalRows": null,
"splitHintSpec": null,
"partitionsSpec": {
"type": "dynamic",
"maxRowsPerSegment": 5000000,
"maxTotalRows": null
},
"indexSpec": null,
"indexSpecForIntermediatePersists": null,
"maxPendingPersists": null,
"pushTimeout": null,
"segmentWriteOutMediumFactory": null,
"maxNumConcurrentSubTasks": null,
"maxRetry": null,
"taskStatusCheckPeriodMs": null,
"chatHandlerTimeout": null,
"chatHandlerNumRetries": null,
"maxNumSegmentsToMerge": null,
"totalNumMergeTasks": null,
"maxColumnsToMerge": null,
"type": "index_parallel",
"forceGuaranteedRollup": false
},
"granularitySpec": {
"segmentGranularity": "DAY",
"queryGranularity": null,
"rollup": null
},
"dimensionsSpec": null,
"metricsSpec": null,
"transformSpec": null,
"ioConfig": null,
"taskContext": null
}
```
</details>
### Get automatic compaction configuration history
Retrieves the history of the automatic compaction configuration for a datasource. Returns an empty list if the datasource does not exist or there is no compaction history for the datasource.
The response contains a list of objects with the following keys:
2023-11-20 14:16:10 -05:00
* `globalConfig`: A JSON object containing automatic compaction configuration that applies to the entire cluster.
* `compactionConfig`: A JSON object containing the automatic compaction configuration for the datasource.
* `auditInfo`: A JSON object containing information about the change made, such as `author`, `comment` or `ip`.
* `auditTime`: The date and time when the change was made.
#### URL
`GET` `/druid/coordinator/v1/config/compaction/{dataSource}/history`
#### Query parameters
* `interval` (optional)
* Type: ISO-8601
2023-11-20 14:16:10 -05:00
* Limits the results within a specified interval. Use `/` as the delimiter for the interval string.
* `count` (optional)
* Type: Int
* Limits the number of results.
#### Responses
2023-11-20 14:16:10 -05:00
<Tabs>
<TabItem value="19" label="200 SUCCESS">
2023-11-20 14:16:10 -05:00
*Successfully retrieved configuration history*
2023-11-20 14:16:10 -05:00
</TabItem>
<TabItem value="20" label="400 BAD REQUEST">
2023-11-20 14:16:10 -05:00
*Invalid `count` value*
</TabItem>
</Tabs>
---
#### Sample request
2023-11-20 14:16:10 -05:00
<Tabs>
<TabItem value="21" label="cURL">
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/config/compaction/wikipedia_hour/history"
```
2023-11-20 14:16:10 -05:00
</TabItem>
<TabItem value="22" label="HTTP">
```HTTP
GET /druid/coordinator/v1/config/compaction/wikipedia_hour/history HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
2023-11-20 14:16:10 -05:00
</TabItem>
</Tabs>
#### Sample response
<details>
<summary>View the response</summary>
```json
[
{
"globalConfig": {
"compactionTaskSlotRatio": 0.1,
"maxCompactionTaskSlots": 2147483647,
"useAutoScaleSlots": false
},
"compactionConfig": {
"dataSource": "wikipedia_hour",
"taskPriority": 25,
"inputSegmentSizeBytes": 100000000000000,
"maxRowsPerSegment": null,
"skipOffsetFromLatest": "P1D",
"tuningConfig": null,
"granularitySpec": {
"segmentGranularity": "DAY",
"queryGranularity": null,
"rollup": null
},
"dimensionsSpec": null,
"metricsSpec": null,
"transformSpec": null,
"ioConfig": null,
"taskContext": null
},
"auditInfo": {
"author": "",
"comment": "",
"ip": "127.0.0.1"
},
"auditTime": "2023-07-31T18:15:19.302Z"
},
{
"globalConfig": {
"compactionTaskSlotRatio": 0.1,
"maxCompactionTaskSlots": 2147483647,
"useAutoScaleSlots": false
},
"compactionConfig": {
"dataSource": "wikipedia_hour",
"taskPriority": 25,
"inputSegmentSizeBytes": 100000000000000,
"maxRowsPerSegment": null,
"skipOffsetFromLatest": "PT0S",
"tuningConfig": {
"maxRowsInMemory": null,
"appendableIndexSpec": null,
"maxBytesInMemory": null,
"maxTotalRows": null,
"splitHintSpec": null,
"partitionsSpec": {
"type": "dynamic",
"maxRowsPerSegment": 5000000,
"maxTotalRows": null
},
"indexSpec": null,
"indexSpecForIntermediatePersists": null,
"maxPendingPersists": null,
"pushTimeout": null,
"segmentWriteOutMediumFactory": null,
"maxNumConcurrentSubTasks": null,
"maxRetry": null,
"taskStatusCheckPeriodMs": null,
"chatHandlerTimeout": null,
"chatHandlerNumRetries": null,
"maxNumSegmentsToMerge": null,
"totalNumMergeTasks": null,
"maxColumnsToMerge": null,
"type": "index_parallel",
"forceGuaranteedRollup": false
},
"granularitySpec": {
"segmentGranularity": "DAY",
"queryGranularity": null,
"rollup": null
},
"dimensionsSpec": null,
"metricsSpec": null,
"transformSpec": null,
"ioConfig": null,
"taskContext": null
},
"auditInfo": {
"author": "",
"comment": "",
"ip": "127.0.0.1"
},
"auditTime": "2023-07-31T18:16:16.362Z"
}
]
```
</details>
## View automatic compaction status
### Get segments awaiting compaction
Returns the total size of segments awaiting compaction for a given datasource. Returns a 404 response if a datasource does not have automatic compaction enabled.
#### URL
`GET` `/druid/coordinator/v1/compaction/progress?dataSource={dataSource}`
#### Query parameter
* `dataSource` (required)
* Type: String
* Name of the datasource for this status information.
#### Responses
2023-11-20 14:16:10 -05:00
<Tabs>
<TabItem value="23" label="200 SUCCESS">
2023-11-20 14:16:10 -05:00
*Successfully retrieved segment size awaiting compaction*
2023-11-20 14:16:10 -05:00
</TabItem>
<TabItem value="24" label="404 NOT FOUND">
2023-11-20 14:16:10 -05:00
*Unknown datasource name or datasource does not have automatic compaction enabled*
</TabItem>
</Tabs>
---
#### Sample request
The following example retrieves the remaining segments to be compacted for datasource `wikipedia_hour`.
2023-11-20 14:16:10 -05:00
<Tabs>
<TabItem value="25" label="cURL">
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/compaction/progress?dataSource=wikipedia_hour"
```
2023-11-20 14:16:10 -05:00
</TabItem>
<TabItem value="26" label="HTTP">
```HTTP
GET /druid/coordinator/v1/compaction/progress?dataSource=wikipedia_hour HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
2023-11-20 14:16:10 -05:00
</TabItem>
</Tabs>
#### Sample response
<details>
<summary>View the response</summary>
```json
{
"remainingSegmentSize": 7615837
}
```
</details>
### Get compaction status and statistics
Retrieves an array of `latestStatus` objects representing the status and statistics from the latest automatic compaction run for all datasources with automatic compaction enabled.
The `latestStatus` object has the following properties:
* `dataSource`: Name of the datasource for this status information.
* `scheduleStatus`: Automatic compaction scheduling status. Possible values are `NOT_ENABLED` and `RUNNING`. Returns `RUNNING ` if the datasource has an active automatic compaction configuration submitted. Otherwise, returns `NOT_ENABLED`.
* `bytesAwaitingCompaction`: Total bytes of this datasource waiting to be compacted by the automatic compaction (only consider intervals/segments that are eligible for automatic compaction).
* `bytesCompacted`: Total bytes of this datasource that are already compacted with the spec set in the automatic compaction configuration.
* `bytesSkipped`: Total bytes of this datasource that are skipped (not eligible for automatic compaction) by the automatic compaction.
* `segmentCountAwaitingCompaction`: Total number of segments of this datasource waiting to be compacted by the automatic compaction (only consider intervals/segments that are eligible for automatic compaction).
* `segmentCountCompacted`: Total number of segments of this datasource that are already compacted with the spec set in the automatic compaction configuration.
* `segmentCountSkipped`: Total number of segments of this datasource that are skipped (not eligible for automatic compaction) by the automatic compaction.
* `intervalCountAwaitingCompaction`: Total number of intervals of this datasource waiting to be compacted by the automatic compaction (only consider intervals/segments that are eligible for automatic compaction).
* `intervalCountCompacted`: Total number of intervals of this datasource that are already compacted with the spec set in the automatic compaction configuration.
* `intervalCountSkipped`: Total number of intervals of this datasource that are skipped (not eligible for automatic compaction) by the automatic compaction.
#### URL
`GET` `/druid/coordinator/v1/compaction/status`
#### Query parameters
* `dataSource` (optional)
* Type: String
* Filter the result by name of a specific datasource.
#### Responses
2023-11-20 14:16:10 -05:00
<Tabs>
<TabItem value="27" label="200 SUCCESS">
2023-11-20 14:16:10 -05:00
*Successfully retrieved `latestStatus` object*
2023-11-20 14:16:10 -05:00
</TabItem>
</Tabs>
---
#### Sample request
2023-11-20 14:16:10 -05:00
<Tabs>
<TabItem value="28" label="cURL">
```shell
curl "http://ROUTER_IP:ROUTER_PORT/druid/coordinator/v1/compaction/status"
```
2023-11-20 14:16:10 -05:00
</TabItem>
<TabItem value="29" label="HTTP">
```HTTP
GET /druid/coordinator/v1/compaction/status HTTP/1.1
Host: http://ROUTER_IP:ROUTER_PORT
```
2023-11-20 14:16:10 -05:00
</TabItem>
</Tabs>
#### Sample response
<details>
<summary>View the response</summary>
```json
{
"latestStatus": [
{
"dataSource": "wikipedia_api",
"scheduleStatus": "RUNNING",
"bytesAwaitingCompaction": 0,
"bytesCompacted": 0,
"bytesSkipped": 64133616,
"segmentCountAwaitingCompaction": 0,
"segmentCountCompacted": 0,
"segmentCountSkipped": 8,
"intervalCountAwaitingCompaction": 0,
"intervalCountCompacted": 0,
"intervalCountSkipped": 1
},
{
"dataSource": "wikipedia_hour",
"scheduleStatus": "RUNNING",
"bytesAwaitingCompaction": 0,
"bytesCompacted": 5998634,
"bytesSkipped": 0,
"segmentCountAwaitingCompaction": 0,
"segmentCountCompacted": 1,
"segmentCountSkipped": 0,
"intervalCountAwaitingCompaction": 0,
"intervalCountCompacted": 1,
"intervalCountSkipped": 0
}
]
}
```
</details>