druid/docs/operations/durable-storage.md

---
id: durable-storage
title: "Durable storage for the multi-stage query engine"
sidebar_label: "Durable storage"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

You can use durable storage to improve querying from deep storage and SQL-based ingestion.

> Note that only S3 is supported as a durable storage location.

Durable storage for queries from deep storage provides a location where you can write the results of deep storage queries to. Durable storage for SQL-based ingestion is used to temporarily house intermediate files, which can improve reliability.

Enabling durable storage also enables the use of local disk to store temporary files, such as the intermediate files produced
while sorting the data. Tasks will use whatever has been configured for their temporary usage as described in [Configuring task storage sizes](../ingestion/tasks.md#configuring-task-storage-sizes).
If the configured limit is too low, Druid may throw the error, `NotEnoughTemporaryStorageFault`.

## Enable durable storage

To enable durable storage, you need to set the following common service properties:

```
druid.msq.intermediate.storage.enable=true
druid.msq.intermediate.storage.tempDir=/path/to/your/temp/dir

# Include these configs if you're using S3
# druid.msq.intermediate.storage.type=s3
# druid.msq.intermediate.storage.bucket=YOUR_BUCKET

# Include these configs if you're using Azure Blob Storage
# druid.msq.intermediate.storage.type=azure
# druid.sq.intermediate.storage.container=YOUR_CONTAINER

druid.msq.intermediate.storage.prefix=YOUR_PREFIX
```

For detailed information about these and additional settings related to durable storage, see [Durable storage configurations](../multi-stage-query/reference.md#durable-storage-configurations).


## Use durable storage for SQL-based ingestion queries

When you run a query, include the context parameter `durableShuffleStorage` and set it to `true`.

For queries where you want to use fault tolerance for workers,  set `faultTolerance` to `true`, which automatically sets `durableShuffleStorage` to `true`.

## Use durable storage for queries from deep storage

Depending on the size of the results you're expecting, saving the final results for queries from deep storage to durable storage might be needed.

By default, Druid saves the final results for queries from deep storage to task reports. Generally, this is acceptable for smaller result sets but may lead to timeouts for larger result sets. 

When you run a query, include the context parameter `selectDestination` and set it to `DURABLESTORAGE`:

```json
    "context":{
        ...
        "selectDestination": "DURABLESTORAGE"
    }
```

You can also write intermediate results to durable storage (`durableShuffleStorage`) for better reliability. The location where workers write intermediate results is different than the location where final results get stored. This means that durable storage for results can be enabled even if you don't write intermediate results to durable storage. 

If you write the results for queries from deep storage to durable storage, the results are cleaned up when the task is removed from the metadata store. 

## Durable storage clean up

To prevent durable storage from getting filled up with temporary files in case the tasks fail to clean them up, a periodic
cleaner can be scheduled to clean the directories corresponding to which there isn't a controller task running. It utilizes
the storage connector to work upon the durable storage. The durable storage location should only be utilized to store the output
for the cluster's MSQ tasks. If the location contains other files or directories, then they will get cleaned up as well.

Use `druid.msq.intermediate.storage.cleaner.enabled` and `druid.msq.intermediate.storage.cleaner.delaySeconds` to configure the cleaner. For more information, see [Durable storage configurations](../multi-stage-query/reference.md#durable-storage-configurations).

Note that if you choose to write query results to durable storage,the results are cleaned up when the task is removed from the metadata store.
docs: query from deep storage (#14609) * cold tier wip * wip * copyedits * wip * copyedits * copyedits * wip * wip * update rules page * typo * typo * update sidebar * moves durable storage info to its own page in operations * update screenshots * add apache license * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * add query from deep storage tutorial stub * address some of the feedback * revert screenshot update. handled in separate pr * load rule update * wip tutorial * reformat deep storage endpoints * rest of tutorial * typo * cleanup * screenshot and sidebar for tutorial * add license * typos * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * rest of review comments * clarify where results are stored * update api reference for durablestorage context param * Apply suggestions from code review Co-authored-by: Karan Kumar <karankumar1100@gmail.com> * comments * incorporate #14720 * address rest of comments * missed one * Update docs/api-reference/sql-api.md * Update docs/api-reference/sql-api.md --------- Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: demo-kratia <56242907+demo-kratia@users.noreply.github.com> Co-authored-by: Karan Kumar <karankumar1100@gmail.com> 2023-08-04 01:40:08 -04:00			`---`
			`id: durable-storage`
			`title: "Durable storage for the multi-stage query engine"`
			`sidebar_label: "Durable storage"`
			`---`

			`<!--`
			`~ Licensed to the Apache Software Foundation (ASF) under one`
			`~ or more contributor license agreements. See the NOTICE file`
			`~ distributed with this work for additional information`
			`~ regarding copyright ownership. The ASF licenses this file`
			`~ to you under the Apache License, Version 2.0 (the`
			`~ "License"); you may not use this file except in compliance`
			`~ with the License. You may obtain a copy of the License at`
			`~`
			`~ http://www.apache.org/licenses/LICENSE-2.0`
			`~`
			`~ Unless required by applicable law or agreed to in writing,`
			`~ software distributed under the License is distributed on an`
			`~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY`
			`~ KIND, either express or implied. See the License for the`
			`~ specific language governing permissions and limitations`
			`~ under the License.`
			`-->`

			`You can use durable storage to improve querying from deep storage and SQL-based ingestion.`

			`> Note that only S3 is supported as a durable storage location.`

			`Durable storage for queries from deep storage provides a location where you can write the results of deep storage queries to. Durable storage for SQL-based ingestion is used to temporarily house intermediate files, which can improve reliability.`

			`Enabling durable storage also enables the use of local disk to store temporary files, such as the intermediate files produced`
			`while sorting the data. Tasks will use whatever has been configured for their temporary usage as described in [Configuring task storage sizes](../ingestion/tasks.md#configuring-task-storage-sizes).`
			If the configured limit is too low, Druid may throw the error, `NotEnoughTemporaryStorageFault`.

			`## Enable durable storage`

			`To enable durable storage, you need to set the following common service properties:`

			```
			`druid.msq.intermediate.storage.enable=true`
			`druid.msq.intermediate.storage.tempDir=/path/to/your/temp/dir`
docs: durable storage azure cleanup (#15120) Co-authored-by: Laksh Singla <lakshsingla@gmail.com> 2023-10-31 18:20:38 -04:00
			`# Include these configs if you're using S3`
			`# druid.msq.intermediate.storage.type=s3`
			`# druid.msq.intermediate.storage.bucket=YOUR_BUCKET`

			`# Include these configs if you're using Azure Blob Storage`
			`# druid.msq.intermediate.storage.type=azure`
			`# druid.sq.intermediate.storage.container=YOUR_CONTAINER`

			`druid.msq.intermediate.storage.prefix=YOUR_PREFIX`
docs: query from deep storage (#14609) * cold tier wip * wip * copyedits * wip * copyedits * copyedits * wip * wip * update rules page * typo * typo * update sidebar * moves durable storage info to its own page in operations * update screenshots * add apache license * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * add query from deep storage tutorial stub * address some of the feedback * revert screenshot update. handled in separate pr * load rule update * wip tutorial * reformat deep storage endpoints * rest of tutorial * typo * cleanup * screenshot and sidebar for tutorial * add license * typos * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * rest of review comments * clarify where results are stored * update api reference for durablestorage context param * Apply suggestions from code review Co-authored-by: Karan Kumar <karankumar1100@gmail.com> * comments * incorporate #14720 * address rest of comments * missed one * Update docs/api-reference/sql-api.md * Update docs/api-reference/sql-api.md --------- Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: demo-kratia <56242907+demo-kratia@users.noreply.github.com> Co-authored-by: Karan Kumar <karankumar1100@gmail.com> 2023-08-04 01:40:08 -04:00			```

docs: durable storage azure cleanup (#15120) Co-authored-by: Laksh Singla <lakshsingla@gmail.com> 2023-10-31 18:20:38 -04:00			`For detailed information about these and additional settings related to durable storage, see [Durable storage configurations](../multi-stage-query/reference.md#durable-storage-configurations).`
docs: query from deep storage (#14609) * cold tier wip * wip * copyedits * wip * copyedits * copyedits * wip * wip * update rules page * typo * typo * update sidebar * moves durable storage info to its own page in operations * update screenshots * add apache license * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * add query from deep storage tutorial stub * address some of the feedback * revert screenshot update. handled in separate pr * load rule update * wip tutorial * reformat deep storage endpoints * rest of tutorial * typo * cleanup * screenshot and sidebar for tutorial * add license * typos * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * rest of review comments * clarify where results are stored * update api reference for durablestorage context param * Apply suggestions from code review Co-authored-by: Karan Kumar <karankumar1100@gmail.com> * comments * incorporate #14720 * address rest of comments * missed one * Update docs/api-reference/sql-api.md * Update docs/api-reference/sql-api.md --------- Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: demo-kratia <56242907+demo-kratia@users.noreply.github.com> Co-authored-by: Karan Kumar <karankumar1100@gmail.com> 2023-08-04 01:40:08 -04:00

			`## Use durable storage for SQL-based ingestion queries`

			When you run a query, include the context parameter `durableShuffleStorage` and set it to `true`.

			For queries where you want to use fault tolerance for workers, set `faultTolerance` to `true`, which automatically sets `durableShuffleStorage` to `true`.

			`## Use durable storage for queries from deep storage`

			`Depending on the size of the results you're expecting, saving the final results for queries from deep storage to durable storage might be needed.`

			`By default, Druid saves the final results for queries from deep storage to task reports. Generally, this is acceptable for smaller result sets but may lead to timeouts for larger result sets.`

			When you run a query, include the context parameter `selectDestination` and set it to `DURABLESTORAGE`:

			```json
			`"context":{`
			`...`
			`"selectDestination": "DURABLESTORAGE"`
			`}`
			```

			You can also write intermediate results to durable storage (`durableShuffleStorage`) for better reliability. The location where workers write intermediate results is different than the location where final results get stored. This means that durable storage for results can be enabled even if you don't write intermediate results to durable storage.

			`If you write the results for queries from deep storage to durable storage, the results are cleaned up when the task is removed from the metadata store.`

			`## Durable storage clean up`

			`To prevent durable storage from getting filled up with temporary files in case the tasks fail to clean them up, a periodic`
			`cleaner can be scheduled to clean the directories corresponding to which there isn't a controller task running. It utilizes`
			`the storage connector to work upon the durable storage. The durable storage location should only be utilized to store the output`
			`for the cluster's MSQ tasks. If the location contains other files or directories, then they will get cleaned up as well.`

docs: durable storage azure cleanup (#15120) Co-authored-by: Laksh Singla <lakshsingla@gmail.com> 2023-10-31 18:20:38 -04:00			Use `druid.msq.intermediate.storage.cleaner.enabled` and `druid.msq.intermediate.storage.cleaner.delaySeconds` to configure the cleaner. For more information, see [Durable storage configurations](../multi-stage-query/reference.md#durable-storage-configurations).
docs: query from deep storage (#14609) * cold tier wip * wip * copyedits * wip * copyedits * copyedits * wip * wip * update rules page * typo * typo * update sidebar * moves durable storage info to its own page in operations * update screenshots * add apache license * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * add query from deep storage tutorial stub * address some of the feedback * revert screenshot update. handled in separate pr * load rule update * wip tutorial * reformat deep storage endpoints * rest of tutorial * typo * cleanup * screenshot and sidebar for tutorial * add license * typos * Apply suggestions from code review Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> * rest of review comments * clarify where results are stored * update api reference for durablestorage context param * Apply suggestions from code review Co-authored-by: Karan Kumar <karankumar1100@gmail.com> * comments * incorporate #14720 * address rest of comments * missed one * Update docs/api-reference/sql-api.md * Update docs/api-reference/sql-api.md --------- Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com> Co-authored-by: demo-kratia <56242907+demo-kratia@users.noreply.github.com> Co-authored-by: Karan Kumar <karankumar1100@gmail.com> 2023-08-04 01:40:08 -04:00
			`Note that if you choose to write query results to durable storage,the results are cleaned up when the task is removed from the metadata store.`