diff --git a/docs/design/architecture.md b/docs/design/architecture.md
index 0362ca3c1d2..323cfb4fd3c 100644
--- a/docs/design/architecture.md
+++ b/docs/design/architecture.md
@@ -29,7 +29,7 @@ Druid has a distributed architecture that is designed to be cloud-friendly and e
 The following diagram shows the services that make up the Druid architecture, how they are typically organized into servers, and how queries and data flow through this architecture.
-
+![Druid architecture](../assets/druid-architecture.png)
 The following sections describe the components of this architecture.
@@ -107,7 +107,7 @@ example, a single day, if your datasource is partitioned by day). Within a chunk
 [_segments_](../design/segments.md). Each segment is a single file, typically comprising up to a few million rows of data. Since segments are
 organized into time chunks, it's sometimes helpful to think of segments as living on a timeline like the following:
-
+![Segment timeline](../assets/druid-timeline.png)
 A datasource may have anywhere from just a few segments, up to hundreds of thousands and even millions of segments. Each
 segment is created by a MiddleManager as _mutable_ and _uncommitted_. Data is queryable as soon as it is added to
diff --git a/docs/design/processes.md b/docs/design/processes.md
index a2c314ff49c..c802f27b28d 100644
--- a/docs/design/processes.md
+++ b/docs/design/processes.md
@@ -43,7 +43,7 @@ Druid processes can be deployed any way you like, but for ease of deployment we
 * **Query**
 * **Data**
-
+![Druid architecture](../assets/druid-architecture.png)
 This section describes the Druid processes and the suggested Master/Query/Data server organization, as shown in the architecture diagram above.
diff --git a/docs/ingestion/ingestion-spec.md b/docs/ingestion/ingestion-spec.md
index fd91694f0d3..18d54fd8e5a 100644
--- a/docs/ingestion/ingestion-spec.md
+++ b/docs/ingestion/ingestion-spec.md
@@ -237,7 +237,7 @@ Dimension objects can have the following components:
 | Field | Description | Default |
 |-------|-------------|---------|
-| type | Either `auto`, `string`, `long`, `float`, `double`, or `json`. For the `auto` type, Druid determines the most appropriate type for the dimension and assigns one of the following: STRING, ARRAY, LONG, ARRAY, DOUBLE, ARRAY, or COMPLEX columns, all sharing a common 'nested' format. When Druid infers the schema with schema auto-discovery, the type is `auto`. | `string` |
+| type | Either `auto`, `string`, `long`, `float`, `double`, or `json`. For the `auto` type, Druid determines the most appropriate type for the dimension and assigns one of the following: STRING, ARRAY, LONG, ARRAY, DOUBLE, ARRAY, or COMPLEX columns, all sharing a common 'nested' format. When Druid infers the schema with schema auto-discovery, the type is `auto`. | `string` |
 | name | The name of the dimension. This will be used as the field name to read from input records, as well as the column name stored in generated segments. Note that you can use a [`transformSpec`](#transformspec) if you want to rename columns during ingestion time. | none (required) |
 | createBitmapIndex | For `string` typed dimensions, whether or not bitmap indexes should be created for the column in generated segments. Creating a bitmap index requires more storage, but speeds up certain kinds of filtering (especially equality and prefix filtering). Only supported for `string` typed dimensions. | `true` |
 | multiValueHandling | For `string` typed dimensions, specifies the type of handling for [multi-value fields](../querying/multi-value-dimensions.md). Possible values are `array` (ingest string arrays as-is), `sorted_array` (sort string arrays during ingestion), and `sorted_set` (sort and de-duplicate string arrays during ingestion). This parameter is ignored for types other than `string`. | `sorted_array` |
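
Review note: the dimension-object table touched by the last hunk can be illustrated with a small sketch. The Python below builds dimension objects using the field names, allowed values, and defaults listed in the table rows above. `make_dimension` is a hypothetical helper written for this note, not part of Druid, and the choice to emit `createBitmapIndex` and `multiValueHandling` only for `string` dimensions is an assumption based on the table saying those fields apply to `string` typed dimensions.

```python
import json

# Allowed values, copied from the table rows in the hunk above.
DIMENSION_TYPES = {"auto", "string", "long", "float", "double", "json"}
MULTI_VALUE_HANDLING = {"array", "sorted_array", "sorted_set"}


def make_dimension(name, type="string", create_bitmap_index=True,
                   multi_value_handling="sorted_array"):
    """Build one dimension object (hypothetical helper, not a Druid API)."""
    if not name:
        raise ValueError("name is required")
    if type not in DIMENSION_TYPES:
        raise ValueError(f"unsupported type: {type}")
    if multi_value_handling not in MULTI_VALUE_HANDLING:
        raise ValueError(f"unsupported multiValueHandling: {multi_value_handling}")

    dim = {"type": type, "name": name}
    # Assumption: only emit the string-specific fields for string dimensions,
    # since the table says they are ignored or unsupported for other types.
    if type == "string":
        dim["createBitmapIndex"] = create_bitmap_index
        dim["multiValueHandling"] = multi_value_handling
    return dim


if __name__ == "__main__":
    dimensions = [
        make_dimension("page"),                # string dimension, table defaults
        make_dimension("added", type="long"),  # numeric dimension
    ]
    print(json.dumps({"dimensions": dimensions}, indent=2))
```

The printed JSON is the shape that would sit under `dimensionsSpec` in an ingestion spec; the column names `page` and `added` are placeholders chosen for the example.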