From ae638e338c4ec7a7d76eefaff7b4bacbd6fed084 Mon Sep 17 00:00:00 2001
From: 317brian <53799971+317brian@users.noreply.github.com>
Date: Thu, 3 Nov 2022 01:47:44 -0700
Subject: [PATCH] docs(msq): update insert vs replace for dimension-based
 segment pruning (#13228)

* docs(msq): update insert vs replace to mention dimension-based segment pruning

* make suggested changes
---
 docs/multi-stage-query/concepts.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/multi-stage-query/concepts.md b/docs/multi-stage-query/concepts.md
index 5955b5fc14c..f08d733d7ae 100644
--- a/docs/multi-stage-query/concepts.md
+++ b/docs/multi-stage-query/concepts.md
@@ -79,6 +79,8 @@ publishes them at the end of its run. For this reason, it is best suited to load
 INSERT statements to load data in a sequence of microbatches; for that, use
 [streaming ingestion](../ingestion/index.md#streaming) instead.
 
+When deciding whether to use REPLACE or INSERT, keep in mind that segments generated with REPLACE can be pruned with dimension-based pruning but those generated with INSERT cannot. For more information about the requirements for dimension-based pruning, see [Clustering](#clustering).
+
 For more information about the syntax, see [INSERT](./reference.md#insert).
 
 
@@ -102,6 +104,8 @@ issues](./known-issues.md#select) page.
 
 For more information about the syntax, see [REPLACE](./reference.md#replace).
 
+When deciding whether to use REPLACE or INSERT, keep in mind that segments generated with REPLACE can be pruned with dimension-based pruning but those generated with INSERT cannot. For more information about the requirements for dimension-based pruning, see [Clustering](#clustering).
+
 ### Primary timestamp
 
 Druid tables always include a primary timestamp named `__time`.
@@ -159,7 +163,7 @@ To activate dimension-based pruning, these requirements must be met:
 - Segments were generated by a REPLACE statement, not an INSERT statement.
 - All CLUSTERED BY columns are single-valued string columns.
 
-If these requirements are _not_ met, Druid still clusters data during ingestion, but will not be able to perform
+If these requirements are _not_ met, Druid still clusters data during ingestion but will not be able to perform
 dimension-based segment pruning at query time. You can tell if dimension-based segment pruning is possible by using the
 `sys.segments` table to inspect the `shard_spec` for the segments generated by an ingestion query. If they are of type
 `range` or `single`, then dimension-based segment pruning is possible. Otherwise, it is not. The shard spec type is also