docs(msq): update insert vs replace for dimension-based segment pruning (#13228)

* docs(msq): update insert vs replace to mention dimension-based segment pruning

* make suggested changes
317brian 2022-11-03 01:47:44 -07:00 committed by GitHub
parent 7cb21cb968
commit ae638e338c
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 5 additions and 1 deletion


@@ -79,6 +79,8 @@ publishes them at the end of its run. For this reason, it is best suited to load
 INSERT statements to load data in a sequence of microbatches; for that, use [streaming
 ingestion](../ingestion/index.md#streaming) instead.
+When deciding whether to use REPLACE or INSERT, keep in mind that segments generated with REPLACE can be pruned with dimension-based pruning but those generated with INSERT cannot. For more information about the requirements for dimension-based pruning, see [Clustering](#clustering).
 For more information about the syntax, see [INSERT](./reference.md#insert).
 <a name="replace"></a>
@@ -102,6 +104,8 @@ issues](./known-issues.md#select) page.
 For more information about the syntax, see [REPLACE](./reference.md#replace).
+When deciding whether to use REPLACE or INSERT, keep in mind that segments generated with REPLACE can be pruned with dimension-based pruning but those generated with INSERT cannot. For more information about the requirements for dimension-based pruning, see [Clustering](#clustering).
 ### Primary timestamp
 Druid tables always include a primary timestamp named `__time`.
@@ -159,7 +163,7 @@ To activate dimension-based pruning, these requirements must be met:
 - Segments were generated by a REPLACE statement, not an INSERT statement.
 - All CLUSTERED BY columns are single-valued string columns.
-If these requirements are _not_ met, Druid still clusters data during ingestion, but will not be able to perform
+If these requirements are _not_ met, Druid still clusters data during ingestion but will not be able to perform
 dimension-based segment pruning at query time. You can tell if dimension-based segment pruning is possible by using the
 `sys.segments` table to inspect the `shard_spec` for the segments generated by an ingestion query. If they are of type
 `range` or `single`, then dimension-based segment pruning is possible. Otherwise, it is not. The shard spec type is also
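The `sys.segments` check described above can be scripted. The sketch below is a minimal illustration, not part of Druid: it assumes a Router reachable at `http://localhost:8888` and a datasource named `my_table` (both hypothetical placeholders), sends the query to the Druid SQL HTTP API at `/druid/v2/sql`, and flags each segment whose `shard_spec` type is `range` or `single`. The helper names (`fetch_segments`, `shard_spec_type`, `pruning_report`) are illustrative only.

```python
import json
import urllib.request

# Shard spec types that, per the docs above, allow dimension-based segment pruning.
PRUNABLE_SHARD_SPEC_TYPES = {"range", "single"}

# Query against the sys.segments system table. 'my_table' is a hypothetical datasource name.
SEGMENTS_QUERY = (
    "SELECT segment_id, shard_spec FROM sys.segments "
    "WHERE datasource = 'my_table'"
)


def fetch_segments(router_url="http://localhost:8888", query=SEGMENTS_QUERY):
    """POST a query to the Druid SQL API and return the result rows as a list of dicts.

    The router URL is an assumed placeholder; adjust it for your deployment.
    """
    body = json.dumps({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        router_url + "/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def shard_spec_type(shard_spec):
    """Return the "type" field of a shard spec given as a JSON string or a dict."""
    if isinstance(shard_spec, str):
        shard_spec = json.loads(shard_spec)
    if not isinstance(shard_spec, dict):
        return None
    return shard_spec.get("type")


def pruning_report(rows):
    """Map each segment_id to True if its shard spec type permits dimension-based pruning."""
    return {
        row["segment_id"]: shard_spec_type(row["shard_spec"]) in PRUNABLE_SHARD_SPEC_TYPES
        for row in rows
    }
```

Against a live cluster, `pruning_report(fetch_segments())` returns one boolean per segment; any `False` entry marks a segment whose shard spec type does not allow dimension-based pruning. Keeping the parsing separate from the HTTP call lets you run `pruning_report` on rows exported some other way, such as from the web console.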