---
id: update
title: "Data updates"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements. See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership. The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License. You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied. See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

## Overwrite

Apache Druid stores data [partitioned by time chunk](../design/storage.md) and supports
overwriting existing data using time ranges. Data outside the replacement time range is not touched. Overwriting of
existing data is done using the same mechanisms as [batch ingestion](../ingestion/index.md#batch).

For example:

- [Native batch](../ingestion/native-batch.md) with `appendToExisting: false`, and `intervals` set to a specific
  time range, overwrites data for that time range.
- [SQL `REPLACE <table> OVERWRITE [ALL | WHERE ...]`](../multi-stage-query/reference.md#replace) overwrites data for
  the entire table or for a specified time range.

In both cases, Druid's atomic update mechanism ensures that queries will flip seamlessly from the old data to the new
data on a time-chunk-by-time-chunk basis.
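
For illustration, here is a minimal sketch of the SQL approach, overwriting a single day of a hypothetical `wikipedia` datasource with data read from an external JSON file. The datasource name, input URI, columns, and time range are placeholders:

```sql
-- Replace one day of data; time chunks outside the OVERWRITE WHERE range are untouched.
-- Input rows must fall within the replaced time range (hypothetical datasource and input).
REPLACE INTO "wikipedia"
OVERWRITE WHERE __time >= TIMESTAMP '2023-01-01 00:00:00' AND __time < TIMESTAMP '2023-01-02 00:00:00'
SELECT
  TIME_PARSE("timestamp") AS __time,
  channel,
  page,
  added
FROM TABLE(
  EXTERN(
    '{"type": "http", "uris": ["https://example.com/wikipedia-2023-01-01.json.gz"]}',
    '{"type": "json"}',
    '[{"name": "timestamp", "type": "string"}, {"name": "channel", "type": "string"}, {"name": "page", "type": "string"}, {"name": "added", "type": "long"}]'
  )
)
PARTITIONED BY DAY
```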

Ingestion and overwriting cannot run concurrently for the same time range of the same datasource. While an overwrite job
is ongoing for a particular time range of a datasource, new ingestions for that time range are queued up. Ingestions for
other time ranges proceed as normal. Read-only queries also proceed as normal, using the pre-existing version of the
data.

:::info
Druid does not support single-record updates by primary key.
:::

## Reindex

Reindexing is an [overwrite of existing data](#overwrite) where the source of new data is the existing data itself. It
is used to perform schema changes, repartition data, filter out unwanted data, enrich existing data, and so on. This
behaves just like any other [overwrite](#overwrite) with regard to atomic updates and locking.

With [native batch](../ingestion/native-batch.md), use the [`druid` input
source](../ingestion/input-sources.md#druid-input-source). If needed,
[`transformSpec`](../ingestion/ingestion-spec.md#transformspec) can be used to filter or modify data during the
reindexing job.

With SQL, use [`REPLACE <table> OVERWRITE`](../multi-stage-query/reference.md#replace) with `SELECT ... FROM <table>`.
(Druid does not have `UPDATE` or `ALTER TABLE` statements.) Any SQL SELECT query can be used to filter,
modify, or enrich the data during the reindexing job.
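
As an illustrative sketch, the following statement reindexes one day of a hypothetical `wikipedia` datasource in place, keeping only one channel and dropping a column. The datasource name, columns, filter, and time range are placeholders:

```sql
-- Reindex a single day of the hypothetical "wikipedia" datasource from itself,
-- filtering out unwanted rows and narrowing the schema.
REPLACE INTO "wikipedia"
OVERWRITE WHERE __time >= TIMESTAMP '2023-01-01 00:00:00' AND __time < TIMESTAMP '2023-01-02 00:00:00'
SELECT
  __time,
  channel,
  added
FROM "wikipedia"
WHERE __time >= TIMESTAMP '2023-01-01 00:00:00'
  AND __time < TIMESTAMP '2023-01-02 00:00:00'
  AND channel = '#en.wikipedia'
PARTITIONED BY DAY
```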

## Rolled-up datasources

Rolled-up datasources can be effectively updated using appends, without rewrites. When you append a row that has an
identical set of dimensions to an existing row, queries that use aggregation operators automatically combine those two
rows together at query time.
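
For example, assuming a hypothetical rolled-up datasource `wikipedia_rollup` with `channel` as a dimension and `added` as a summed metric, a query like the following combines the originally ingested row and a later appended row for the same day and channel:

```sql
-- Aggregation at query time sums the original row and the appended row
-- that share the same dimension values (hypothetical datasource and columns).
SELECT
  FLOOR(__time TO DAY) AS "day",
  channel,
  SUM(added) AS total_added
FROM "wikipedia_rollup"
GROUP BY 1, 2
```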

[Compaction](compaction.md) or [automatic compaction](automatic-compaction.md) can be used to physically combine these
matching rows together later on, by rewriting segments in the background.

## Lookups

If you have a dimension whose values need to be updated frequently, try first using [lookups](../querying/lookups.md). A
classic use case of lookups is when you have an ID dimension stored in a Druid segment, and want to map the ID dimension to a
human-readable string that may need to be updated periodically.
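
As a sketch, assuming a hypothetical lookup named `country_name` that maps country codes stored in the data to display names, the ID can be resolved at query time. Updating the lookup later changes the query results without rewriting any segments:

```sql
-- Resolve the stored code to a human-readable name via the hypothetical
-- 'country_name' lookup at query time (hypothetical datasource and column).
SELECT
  LOOKUP(country_code, 'country_name') AS country,
  COUNT(*) AS edits
FROM "wikipedia"
GROUP BY 1
```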