Multi-value dimensions are distinct from [array types]( While array types behave like standard SQL arrays, multi-value dimensions do not. This document describes the behavior of multi-value dimensions; additional details are in the [SQL data type documentation](
This document describes inserting, filtering, and grouping behavior for multi-value dimensions. For information about the internal representation of multi-value dimensions, see
are in the form of both [SQL]( and [native Druid queries]( Refer to the [Druid SQL documentation]( for details
about the functions available for using multi-value string dimensions in SQL.
The following sections describe inserting, filtering, and grouping behavior based on the following example data, which includes a multi-value dimension, `tags`.
When using native [batch](../ingestion/ or streaming ingestion such as with [Apache Kafka](../ingestion/, the Druid web console data loader can detect multi-value dimensions and configure the `dimensionsSpec` accordingly.
For TSV or CSV data, you can specify the multi-value delimiters using the `listDelimiter` field in the `inputFormat`. JSON data must be formatted as a JSON array to be ingested as a multi-value dimension. JSON data does not require `inputFormat` configuration.
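To make the effect of `listDelimiter` concrete, the following is a minimal plain-Python sketch of splitting one CSV field into multiple values. It is a simplified model, not Druid's parser: the function name, the `|` delimiter, the sample row, and the explicit `multi_value_columns` argument are all assumptions made for illustration.

```python
import csv
import io


def parse_delimited_row(line, columns, list_delimiter, multi_value_columns):
    """Parse one CSV line and split the named columns on list_delimiter.

    Hypothetical helper that only models the idea behind `listDelimiter`.
    """
    fields = next(csv.reader(io.StringIO(line)))
    row = dict(zip(columns, fields))
    for name in multi_value_columns:
        # A delimited field such as "t1|t2|t3" becomes a list of values.
        row[name] = row[name].split(list_delimiter)
    return row


# Assumed sample row; "|" is an assumed delimiter choice.
row = parse_delimited_row(
    "2011-01-12T00:00:00.000Z,t1|t2|t3",
    columns=["timestamp", "tags"],
    list_delimiter="|",
    multi_value_columns=["tags"],
)
print(row["tags"])  # ['t1', 't2', 't3']
```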
By default, Druid sorts values in multi-value dimensions. This behavior is controlled by the `SORTED_ARRAY` value of the `multiValueHandling` field. Alternatively, you can specify multi-value handling as:
* `SORTED_SET`: results in the removal of duplicate values
* `ARRAY`: retains the original order of the values
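The three `multiValueHandling` modes can be modeled roughly as follows. This is a plain-Python sketch of the behavior described above, not Druid code: the function name is hypothetical, and it assumes `SORTED_SET` also sorts the values, as its name suggests.

```python
def apply_multi_value_handling(values, mode="SORTED_ARRAY"):
    """Model how each multiValueHandling mode treats one row's values.

    Hypothetical helper for illustration only.
    """
    if mode == "SORTED_ARRAY":
        # Default: sort the values, keep duplicates.
        return sorted(values)
    if mode == "SORTED_SET":
        # Sort the values and remove duplicates.
        return sorted(set(values))
    if mode == "ARRAY":
        # Keep the original order and any duplicates.
        return list(values)
    raise ValueError(f"unknown multiValueHandling mode: {mode}")


sample = ["t3", "t1", "t3", "t2"]  # assumed input values
print(apply_multi_value_handling(sample, "SORTED_ARRAY"))  # ['t1', 't2', 't3', 't3']
print(apply_multi_value_handling(sample, "SORTED_SET"))    # ['t1', 't2', 't3']
print(apply_multi_value_handling(sample, "ARRAY"))         # ['t3', 't1', 't3', 't2']
```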
Multi-value dimensions can also be inserted with [SQL-based ingestion](../multi-stage-query/ The functions `MV_TO_ARRAY` and `ARRAY_TO_MV` can assist in converting `VARCHAR` to `VARCHAR ARRAY` and `VARCHAR ARRAY` into `VARCHAR` respectively. `multiValueHandling` is not available when using the multi-stage query engine to insert data.
For example, to insert the data used in this document:
Notice that `ARRAY_TO_MV` is not present in the `GROUP BY` clause since we only wish to coerce the type _after_ grouping.
`EXTERN` can also declare the `tags` input type as `VARCHAR`, which is also how a query on a Druid table containing a multi-value dimension would specify the type of the `tags` column. In that case, you must use `MV_TO_ARRAY`, because the multi-stage query engine only supports grouping on multi-value dimensions as arrays, so the values must be coerced first. These arrays must then be coerced back into `VARCHAR` in the `SELECT` part of the statement with `ARRAY_TO_MV`.
Native queries can also perform filtering that would be considered a "contradiction" in SQL, such as this "and" filter which would match only row1 of the dataset above:
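The reason such a filter is not a contradiction for a multi-value dimension can be sketched in plain Python. On a scalar column, `tags = 't1' AND tags = 't3'` can never be true, but a multi-value row matches an equality test when any one of its values matches. The rows, the filter values `t1` and `t3`, and the function names below are assumptions for illustration; this is a model of the matching rule, not Druid's filter implementation.

```python
# Assumed example rows; each row holds the values of its `tags` dimension.
rows = {
    "row1": ["t1", "t2", "t3"],
    "row2": ["t3", "t4", "t5"],
    "row3": ["t5", "t6", "t7"],
    "row4": [],
}


def matches_value(tags, value):
    """An equality-style filter matches if any value in the row equals `value`."""
    return value in tags


def matches_all(tags, values):
    """An "and" of equality filters: every filter must match the same row."""
    return all(matches_value(tags, v) for v in values)


matched = [name for name, tags in rows.items() if matches_all(tags, ["t1", "t3"])]
print(matched)  # ['row1']
```

In this model only `row1` contains both `t1` and `t3`, so it is the only row that satisfies the "and" filter.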
To include only the values that match your filter, you can use the SQL functions [`MV_FILTER_ONLY`/`MV_FILTER_NONE`](,
a [filtered virtual column](, or a [filtered dimensionSpec]( This can also improve performance.
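As a rough mental model of what `MV_FILTER_ONLY` and `MV_FILTER_NONE` do to the values inside a single row, the plain-Python sketch below may help. The helper names are hypothetical stand-ins, not Druid APIs, and the sample row is an assumption.

```python
def mv_filter_only(values, allowed):
    """Keep only the values of a multi-value row that appear in `allowed`."""
    return [v for v in values if v in allowed]


def mv_filter_none(values, excluded):
    """Drop the values of a multi-value row that appear in `excluded`."""
    return [v for v in values if v not in excluded]


tags = ["t3", "t4", "t5"]  # assumed row
print(mv_filter_only(tags, ["t3"]))  # ['t3']
print(mv_filter_none(tags, ["t3"]))  # ['t4', 't5']
```

The row itself still matches or fails the query filter as a whole; these functions only narrow which of its values are carried forward.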
multi-valued dimensions that were inadvertently included.
## Differences between arrays and multi-value dimensions
Avoid confusing string arrays with [multi-value dimensions]( Arrays and multi-value dimensions are stored in different column types, and their query behavior differs. You can use the functions `MV_TO_ARRAY` and `ARRAY_TO_MV` to convert between the two if needed. In general, we recommend using arrays whenever possible, since they are a newer and more powerful feature and have SQL-compliant behavior.
Use care during ingestion to ensure you get the type you want.
To get arrays when performing an ingestion using JSON ingestion specs, such as [native batch](../ingestion/ or streaming ingestion such as with [Apache Kafka](../ingestion/, use dimension type `auto` or enable `useSchemaDiscovery`. When performing a [SQL-based ingestion](../multi-stage-query/, write a query that generates arrays and set the context parameter `"arrayIngestMode": "array"`. Arrays may contain strings or numbers.
To get multi-value dimensions when performing an ingestion using JSON ingestion specs, use dimension type `string` and do not enable `useSchemaDiscovery`. When performing a [SQL-based ingestion](../multi-stage-query/, wrap arrays in [`ARRAY_TO_MV`](, which ensures you get multi-value dimensions in any `arrayIngestMode`. Multi-value dimensions can only contain strings.
You can tell which type you have by checking the `INFORMATION_SCHEMA.COLUMNS` table, using a query like:
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'mytable'
Arrays are type `ARRAY`; multi-value strings are type `VARCHAR`.
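If you collect the result of such a query as a list of records, a small helper can separate the two kinds of columns. The sketch below assumes each record is a dictionary with `COLUMN_NAME` and `DATA_TYPE` keys; the helper name and the sample records are hypothetical. Note that `VARCHAR` also covers ordinary single-value string columns, so this check alone cannot confirm that a column holds multiple values.

```python
def split_columns_by_type(columns):
    """Separate ARRAY columns from VARCHAR columns in INFORMATION_SCHEMA-style rows.

    Hypothetical helper; `columns` is a list of dicts with COLUMN_NAME and DATA_TYPE.
    """
    arrays = [c["COLUMN_NAME"] for c in columns if c["DATA_TYPE"] == "ARRAY"]
    varchars = [c["COLUMN_NAME"] for c in columns if c["DATA_TYPE"] == "VARCHAR"]
    return arrays, varchars


# Assumed sample records for a table named 'mytable'.
sample_columns = [
    {"COLUMN_NAME": "__time", "DATA_TYPE": "TIMESTAMP"},
    {"COLUMN_NAME": "tags", "DATA_TYPE": "VARCHAR"},
    {"COLUMN_NAME": "labels", "DATA_TYPE": "ARRAY"},
]

arrays, varchars = split_columns_by_type(sample_columns)
print(arrays)    # ['labels']
print(varchars)  # ['tags']
```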