docs: Separate section on ingesting MVDs in migration guide (#17109)

Victoria Lim 2024-09-25 14:45:25 -07:00 committed by GitHub
parent 6f7e8ca74a
commit 203d6345af
1 changed file with 16 additions and 2 deletions


@@ -231,7 +231,11 @@ GROUP BY 1, 2
```
## How to ingest data as arrays
## How to ingest arrays
:::tip
As a best practice, store data as arrays rather than MVDs.
:::
You can ingest arrays in Druid as follows:
@@ -242,5 +246,15 @@ For an example, see [Ingesting arrays: Native batch and streaming ingestion](../
* For SQL-based batch ingestion, include the [query context parameter](../multi-stage-query/reference.md#context-parameters) `"arrayIngestMode": "array"` and reference the relevant array type (`VARCHAR ARRAY`, `BIGINT ARRAY`, or `DOUBLE ARRAY`) in the [EXTEND clause](../multi-stage-query/reference.md#extern-function) that lists the column names and data types.
For examples, see [Ingesting arrays: SQL-based ingestion](../querying/arrays.md#sql-based-ingestion).
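The SQL-based approach described above can be sketched as follows. This is a minimal illustration, not the guide's own example: the datasource name `array_example`, the column names, and the inline input row are hypothetical. It assumes the query is submitted with `"arrayIngestMode": "array"` in its query context, which is set on the request rather than in the SQL text.

```sql
-- Assumes the query context contains: "arrayIngestMode": "array"
-- Datasource, column names, and inline data are hypothetical.
REPLACE INTO "array_example" OVERWRITE ALL
WITH "ext" AS (
  SELECT *
  FROM TABLE(
    EXTERN(
      '{"type":"inline","data":"{\"timestamp\":\"2023-01-01T00:00:00\",\"label\":\"row1\",\"arrayString\":[\"a\",\"b\"],\"arrayLong\":[1,2,3],\"arrayDouble\":[1.1,2.2]}"}',
      '{"type":"json"}'
    )
  ) EXTEND (
    -- The EXTEND clause declares each input column and its type,
    -- including the array types.
    "timestamp" VARCHAR,
    "label" VARCHAR,
    "arrayString" VARCHAR ARRAY,
    "arrayLong" BIGINT ARRAY,
    "arrayDouble" DOUBLE ARRAY
  )
)
SELECT
  TIME_PARSE("timestamp") AS "__time",
  "label",
  "arrayString",
  "arrayLong",
  "arrayDouble"
FROM "ext"
PARTITIONED BY DAY
```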
As a best practice, always use the ARRAY data type in your input schema. If you want to ingest MVDs, explicitly wrap the string array in [ARRAY_TO_MV](../querying/sql-functions.md#array_to_mv). For an example, see [Multi-value dimensions: SQL-based ingestion](../querying/multi-value-dimensions.md#sql-based-ingestion).
## How to ingest MVDs
You can't mix arrays and MVDs in the same column.
If you need to continue to use MVDs, use the [ARRAY_TO_MV](../querying/sql-functions.md#array_to_mv) function when you ingest data.
This ensures that `VARCHAR ARRAY` values are stored as MVDs rather than as arrays of strings.
Because arrays and MVDs behave differently in queries, you need to ingest MVDs explicitly if you want your existing MVD queries to keep working as they do now.
For an example using ARRAY_TO_MV, see [Multi-value dimensions: SQL-based ingestion](../querying/multi-value-dimensions.md#sql-based-ingestion).
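As a rough sketch of the explicit conversion described above, the query below declares the input column as `VARCHAR ARRAY` and wraps it in ARRAY_TO_MV so that it is stored as an MVD. The datasource name `mvd_example`, the column names, and the inline input row are hypothetical.

```sql
-- Datasource, column names, and inline data are hypothetical.
REPLACE INTO "mvd_example" OVERWRITE ALL
WITH "ext" AS (
  SELECT *
  FROM TABLE(
    EXTERN(
      '{"type":"inline","data":"{\"timestamp\":\"2023-01-01T00:00:00\",\"label\":\"row1\",\"tags\":[\"a\",\"b\"]}"}',
      '{"type":"json"}'
    )
  ) EXTEND (
    "timestamp" VARCHAR,
    "label" VARCHAR,
    "tags" VARCHAR ARRAY  -- declared as an array in the input schema
  )
)
SELECT
  TIME_PARSE("timestamp") AS "__time",
  "label",
  ARRAY_TO_MV("tags") AS "tags"  -- stored as a multi-value dimension
FROM "ext"
PARTITIONED BY DAY
```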
If you have MVD columns and want to migrate to array columns, [reindex](../data-management/update.md#reindex) your data to update its schema.
Reindexing overwrites existing data, using the existing data itself as the source of the new data.
When you reindex, follow the same guidance in [How to ingest arrays](#how-to-ingest-arrays).
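One possible shape for such a reindexing query is sketched below. It is an assumption-laden illustration rather than the guide's prescribed method: it assumes a hypothetical datasource `mvd_example` with an MVD column `tags`, assumes the query context includes `"arrayIngestMode": "array"`, and uses the MV_TO_ARRAY function to convert the MVD column to a `VARCHAR ARRAY`.

```sql
-- Assumes the query context contains: "arrayIngestMode": "array"
-- Datasource and column names are hypothetical.
REPLACE INTO "mvd_example" OVERWRITE ALL
SELECT
  "__time",
  "label",
  MV_TO_ARRAY("tags") AS "tags"  -- convert the MVD column to an array column
FROM "mvd_example"
PARTITIONED BY DAY
```

Because the query overwrites the datasource it reads from, consider testing it against a copy of the data first, and check any queries that relied on MVD behavior before switching them to the array column.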