---
title: "Automatic compaction"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

In Apache Druid, compaction is a special type of ingestion task that reads data from a Druid datasource and writes it back into the same datasource. A common use case for this is to [optimally size segments](../operations/segment-optimization.md) after ingestion to improve query performance. Automatic compaction, or auto-compaction, refers to the system for automatic execution of compaction tasks issued by Druid itself. In addition to auto-compaction, you can perform [manual compaction](./manual-compaction.md) using the Overlord APIs.

As a best practice, you should set up auto-compaction for all Druid datasources. You can run compaction tasks manually for cases where you want to allocate more system resources. For example, you may choose to run multiple compaction tasks in parallel to compact an existing datasource for the first time. See [Compaction](compaction.md) for additional details and use cases.

This topic guides you through setting up automatic compaction for your Druid cluster. See the [examples](#examples) for common use cases for automatic compaction.

## Auto-compaction syntax
You can configure automatic compaction dynamically without restarting Druid.
The automatic compaction system uses the following syntax:

```json
{
    "dataSource": <dataSource_name>,
    "skipOffsetFromLatest": <time period to avoid compaction>,
    "tuningConfig": <custom tuningConfig>,
    "granularitySpec": <custom granularitySpec>,
    ...
}
```

:::info Experimental
The MSQ task engine is available as a compaction engine when you run automatic compaction as a compaction supervisor. For more information, see [Auto-compaction using compaction supervisors](#auto-compaction-using-compaction-supervisors).
:::

For automatic compaction using Coordinator duties, you submit the spec to the [Compaction config UI](#manage-auto-compaction-using-the-web-console) or the [Compaction configuration API](#manage-auto-compaction-using-coordinator-apis).

Most fields in the auto-compaction configuration correlate to a typical [Druid ingestion spec](../ingestion/ingestion-spec.md).
The following properties only apply to auto-compaction:

* `skipOffsetFromLatest`

For more details on each of the specs in an auto-compaction configuration, see [Automatic compaction dynamic configuration](../configuration/index.md#automatic-compaction-dynamic-configuration).

## Auto-compaction using Coordinator duties

You can control how often the Coordinator checks to see if auto-compaction is needed. The Coordinator [indexing period](../configuration/index.md#coordinator-operation), `druid.coordinator.period.indexingPeriod`, controls the frequency of compaction tasks.
The default indexing period is 30 minutes, meaning that the Coordinator first checks for segments to compact at most 30 minutes from when auto-compaction is enabled.
This time period also affects other Coordinator duties such as cleanup of unused segments and stale pending segments.
To configure the auto-compaction time period without interfering with `indexingPeriod`, see [Change compaction frequency](#change-compaction-frequency).

At every invocation of auto-compaction, the Coordinator initiates a [segment search](../design/coordinator.md#segment-search-policy-in-automatic-compaction) to determine eligible segments to compact.
When there are eligible segments to compact, the Coordinator issues compaction tasks based on available worker capacity.
If a compaction task takes longer than the indexing period, the Coordinator waits for it to finish before resuming the period for segment search.

:::info
Auto-compaction skips datasources that have a segment granularity of `ALL`.
:::

No additional configuration is needed to run automatic compaction tasks using the Coordinator and native engine. This is the default behavior for Druid.
You can configure auto-compaction for a datasource through the web console or programmatically via an API.
This process differs for manual compaction tasks, which can be submitted from the [Tasks view of the web console](../operations/web-console.md) or the [Tasks API](../api-reference/tasks-api.md).

### Manage auto-compaction using the web console

Use the web console to enable automatic compaction for a datasource as follows:

1. Click **Datasources** in the top-level navigation.
2. In the **Compaction** column, click the edit icon for the datasource to compact.
3. In the **Compaction config** dialog, configure the auto-compaction settings. The dialog offers a form view as well as a JSON view. Editing the form updates the JSON specification, and editing the JSON updates the corresponding form fields, where present. Form fields not present in the JSON indicate default values. You may add additional properties to the JSON for auto-compaction settings not displayed in the form. See [Auto-compaction syntax](#auto-compaction-syntax) for supported settings for auto-compaction.
4. Click **Submit**.
5. Refresh the **Datasources** view. The **Compaction** column for the datasource changes from “Not enabled” to “Awaiting first run.”

The following screenshot shows the compaction config dialog for a datasource with auto-compaction enabled.

![Compaction config in web console](../assets/compaction-config.png)

To disable auto-compaction for a datasource, click **Delete** from the **Compaction config** dialog. Druid does not retain your auto-compaction configuration.

### Manage auto-compaction using Coordinator APIs

Use the [Automatic compaction API](../api-reference/automatic-compaction-api.md#manage-automatic-compaction) to configure automatic compaction.
To enable auto-compaction for a datasource, create a JSON object with the desired auto-compaction settings.
See [Auto-compaction syntax](#auto-compaction-syntax) for the syntax of an auto-compaction spec.
Send the JSON object as a payload in a [`POST` request](../api-reference/automatic-compaction-api.md#create-or-update-automatic-compaction-configuration) to `/druid/coordinator/v1/config/compaction`.
The following example configures auto-compaction for the `wikipedia` datasource:

```sh
curl --location --request POST 'http://localhost:8081/druid/coordinator/v1/config/compaction' \
--header 'Content-Type: application/json' \
--data-raw '{
    "dataSource": "wikipedia",
    "granularitySpec": {
        "segmentGranularity": "DAY"
    }
}'
```

To disable auto-compaction for a datasource, send a [`DELETE` request](../api-reference/automatic-compaction-api.md#remove-automatic-compaction-configuration) to `/druid/coordinator/v1/config/compaction/{dataSource}`. Replace `{dataSource}` with the name of the datasource for which to disable auto-compaction. For example:

```sh
curl --location --request DELETE 'http://localhost:8081/druid/coordinator/v1/config/compaction/wikipedia'
```

### Change compaction frequency

If you want the Coordinator to check for compaction more frequently than its indexing period, create a separate group to handle compaction duties.
Set the time period of the duty group in the `coordinator/runtime.properties` file.
For example, the following properties create a `compaction` duty group that runs every 60 seconds:

```
druid.coordinator.dutyGroups=["compaction"]
druid.coordinator.compaction.duties=["compactSegments"]
druid.coordinator.compaction.period=PT60S
```

### View Coordinator duty auto-compaction stats

After the Coordinator has initiated auto-compaction, you can view compaction statistics for the datasource, including the number of bytes, segments, and intervals already compacted and those awaiting compaction. The Coordinator also reports the total bytes, segments, and intervals not eligible for compaction in accordance with its [segment search policy](../design/coordinator.md#segment-search-policy-in-automatic-compaction).

In the web console, the Datasources view displays auto-compaction statistics. The Tasks view shows the task information for compaction tasks that were triggered by the automatic compaction system.

To get statistics by API, send a [`GET` request](../api-reference/automatic-compaction-api.md#view-automatic-compaction-status) to `/druid/coordinator/v1/compaction/status`. To filter the results to a particular datasource, pass the datasource name as a query parameter to the request. For example: `/druid/coordinator/v1/compaction/status?dataSource=wikipedia`.

## Avoid conflicts with ingestion

Compaction tasks may be interrupted when they interfere with ingestion. For example, this occurs when an ingestion task needs to write data to a segment for a time interval locked for compaction. If there are continuous failures that prevent compaction from making progress, consider one of the following strategies:

To set `skipOffsetFromLatest`, consider how frequently you expect the stream to receive late arriving data. If your stream only occasionally receives late arriving data, the auto-compaction system robustly compacts your data even though data is ingested outside the `skipOffsetFromLatest` window. For most realtime streaming ingestion use cases, it is reasonable to set `skipOffsetFromLatest` to a few hours or a day.
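
For example, an auto-compaction config that keeps the most recent day of data out of compaction might look like the following sketch. The `wikipedia` datasource and the `P1D` offset are illustrative; pick an offset that covers your expected late-arrival window:

```json
{
  "dataSource": "wikipedia",
  "skipOffsetFromLatest": "P1D",
  "granularitySpec": {
    "segmentGranularity": "DAY"
  }
}
```

Submit this spec the same way as any other auto-compaction config, for example with a `POST` request to `/druid/coordinator/v1/config/compaction`.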
## Examples

The following examples demonstrate potential use cases in which auto-compaction may improve your Druid performance. See more details in [Compaction strategies](../data-management/compaction.md#compaction-guidelines). The examples in this section do not change the underlying data.

## Auto-compaction using compaction supervisors

:::info Experimental
Compaction supervisors are experimental. For production use, we recommend [auto-compaction using Coordinator duties](#auto-compaction-using-coordinator-duties).
:::

You can run automatic compaction using compaction supervisors on the Overlord rather than Coordinator duties. Compaction supervisors provide the following benefits over Coordinator duties:

* You can use the supervisor framework to get information about the auto-compaction, such as status or state.
* You can more easily suspend or resume compaction for a datasource.
* You can use either the native compaction engine or the [MSQ task engine](#use-msq-for-auto-compaction).
* Compaction is more reactive: the supervisor submits tasks as soon as a compaction slot is available.
* The supervisor tracks compaction task status to avoid repeatedly re-compacting an interval.

To use compaction supervisors, set the following properties in your Overlord runtime properties:

* Set `druid.supervisor.compaction.enabled` to `true` so that compaction tasks can run as supervisor tasks.
* Optionally, set `druid.supervisor.compaction.engine` to `msq` to make the MSQ task engine the default compaction engine, or to `native` to keep the native engine as the default. Druid uses the default engine whenever the `engine` field is omitted from your compaction config.
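
For example, a minimal Overlord `runtime.properties` sketch that enables compaction supervisors and makes the MSQ task engine the default:

```
druid.supervisor.compaction.enabled=true
druid.supervisor.compaction.engine=msq
```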

Compaction supervisors use the same syntax as auto-compaction using Coordinator duties, with one key difference: you submit the auto-compaction as a supervisor spec. In the spec, set the `type` to `autocompact` and include the auto-compaction config in the `spec` field.

To submit an automatic compaction task, you can submit a supervisor spec through the [web console](#manage-compaction-supervisors-with-the-web-console) or the [supervisor API](#manage-compaction-supervisors-with-supervisor-apis).

### Manage compaction supervisors with the web console

To submit a supervisor spec for automatic compaction, perform the following steps:

1. In the web console, go to the **Supervisors** tab.
1. Click **...** > **Submit JSON supervisor**.
1. In the dialog, include the following:
   - The type of supervisor spec by setting `"type": "autocompact"`
   - The compaction configuration by adding it to the `spec` field

   ```json
   {
     "type": "autocompact",
     "spec": {
       "dataSource": YOUR_DATASOURCE,
       "tuningConfig": {...},
       "granularitySpec": {...},
       "engine": <native|msq>,
       ...
     }
   }
   ```
1. Submit the supervisor.

To stop the automatic compaction task, suspend or terminate the supervisor through the UI or API.

### Manage compaction supervisors with supervisor APIs

Submitting an automatic compaction spec as a supervisor task uses the same endpoint as supervisor tasks for streaming ingestion.

The following example configures auto-compaction for the `wikipedia` datasource:

```sh
curl --location --request POST 'http://localhost:8081/druid/indexer/v1/supervisor' \
--header 'Content-Type: application/json' \
--data-raw '{
    "type": "autocompact",  // required
    "suspended": false,  // optional
    "spec": {  // required
        "dataSource": "wikipedia",  // required
        "tuningConfig": {...},  // optional
        "granularitySpec": {...},  // optional
        "engine": <native|msq>,  // optional
        ...
    }
}'
```

If you omit `spec.engine`, Druid uses the default compaction engine. You can control the default compaction engine with the `druid.supervisor.compaction.engine` Overlord runtime property. If both `spec.engine` and `druid.supervisor.compaction.engine` are omitted, Druid defaults to the native engine.

To stop the automatic compaction task, suspend or terminate the supervisor through the UI or API.

### Use MSQ for auto-compaction

The MSQ task engine is available as a compaction engine if you configure auto-compaction to use compaction supervisors. To use the MSQ task engine for automatic compaction, make sure the following requirements are met:

* [Load the MSQ task engine extension](../multi-stage-query/index.md#load-the-extension).
* In your Overlord runtime properties, set the following properties:
  * Set `druid.supervisor.compaction.enabled` to `true` so that compaction tasks can run as supervisor tasks.
  * Optionally, set `druid.supervisor.compaction.engine` to `msq` to make the MSQ task engine the default compaction engine. Otherwise, you need to set `spec.engine` to `msq` in each compaction supervisor spec where you want to use the MSQ task engine.
* Have at least two compaction task slots available, or set `compactionConfig.taskContext.maxNumTasks` to two or more. The MSQ task engine requires at least two tasks: one controller task and one worker task.

You can use [MSQ task engine context parameters](../multi-stage-query/reference.md#context-parameters) in `spec.taskContext` when configuring your datasource for automatic compaction, such as setting the maximum number of tasks using the `spec.taskContext.maxNumTasks` parameter. Some of the MSQ task engine context parameters overlap with automatic compaction parameters. When these settings overlap, set one or the other.
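
For instance, the following compaction supervisor spec sketch sets `spec.taskContext.maxNumTasks` to give the MSQ task engine four task slots. The `wikipedia` datasource and the value `4` are illustrative:

```json
{
  "type": "autocompact",
  "spec": {
    "dataSource": "wikipedia",
    "engine": "msq",
    "taskContext": {
      "maxNumTasks": 4
    }
  }
}
```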
#### MSQ task engine limitations

<!--This list also exists in multi-stage-query/known-issues-->

When using the MSQ task engine for auto-compaction, keep the following limitations in mind:

- The `metricsSpec` field is only supported for certain aggregators. For more information, see [Supported aggregators](#supported-aggregators).
- Only dynamic and range-based partitioning are supported.
- Set `rollup` to `true` if and only if `metricsSpec` is not empty or null.
- You can only partition on string dimensions; multi-valued string dimensions are not supported.
- The `maxTotalRows` config is not supported in `DynamicPartitionsSpec`. Use `maxRowsPerSegment` instead.
- Segments can only be sorted on `__time` as the first column.
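
Within these limitations, a range-partitioned compaction supervisor spec might look like the following sketch. The `channel` partition dimension and the row target are illustrative; the partition dimension must be a single-valued string column:

```json
{
  "type": "autocompact",
  "spec": {
    "dataSource": "wikipedia",
    "engine": "msq",
    "tuningConfig": {
      "partitionsSpec": {
        "type": "range",
        "partitionDimensions": ["channel"],
        "targetRowsPerSegment": 5000000
      }
    }
  }
}
```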
#### Supported aggregators

Auto-compaction using the MSQ task engine supports only aggregators that satisfy the following properties:

* __Mergeability__: the aggregator can combine partial aggregates.
* __Idempotency__: the aggregator produces the same results on repeated runs over previously aggregated values in a column.

This is exemplified by the following `longSum` aggregator:

```json
{"name": "added", "type": "longSum", "fieldName": "added"}
```

Here, `longSum` satisfies mergeability because it can combine partial results, and idempotency because the input and output column are the same (`added`).

The following are some examples of aggregators that aren't supported because at least one of the required conditions isn't satisfied:

* A `longSum` aggregator where the `added` column rolls up into a `sum_added` column, discarding the input `added` column. This violates idempotency because subsequent runs would no longer find the `added` column:

  ```json
  {"name": "sum_added", "type": "longSum", "fieldName": "added"}
  ```

* Partial sketches, which cannot themselves combine partial aggregates and instead require a merging aggregator, such as the `HLLSketchMerge` aggregator required for the `HLLSketchBuild` aggregator below. This violates mergeability:

  ```json
  {"name": "added", "type": "HLLSketchBuild", "fieldName": "added"}
  ```

* The count aggregator, which cannot combine partial aggregates and rolls up into a different `count` column, discarding the input columns. This violates both mergeability and idempotency:

  ```json
  {"type": "count", "name": "count"}
  ```

## Learn more

See the following topics for more information: