428 lines
13 KiB
Plaintext
428 lines
13 KiB
Plaintext
[[breaking_20_mapping_changes]]
|
|
=== Mapping changes
|
|
|
|
A number of changes have been made to mappings to remove ambiguity and to
|
|
ensure that conflicting mappings cannot be created.
|
|
|
|
One major change is that dynamically added fields must have their mapping
|
|
confirmed by the master node before indexing continues. This is to avoid a
|
|
problem where different shards in the same index dynamically add different
|
|
mappings for the same field. These conflicting mappings can silently return
|
|
incorrect results and can lead to index corruption.
|
|
|
|
This change can make indexing slower when frequently adding many new fields.
|
|
We are looking at ways of optimising this process but we chose safety over
|
|
performance for this extreme use case.
|
|
|
|
==== Conflicting field mappings
|
|
|
|
Fields with the same name, in the same index, in different types, must have
|
|
the same mapping, with the exception of the <<copy-to>>, <<dynamic>>,
|
|
<<enabled>>, <<ignore-above>>, <<include-in-all>>, and <<properties>>
|
|
parameters, which may have different settings per field.
|
|
|
|
[source,js]
|
|
---------------
|
|
PUT my_index
|
|
{
|
|
"mappings": {
|
|
"type_one": {
|
|
"properties": {
|
|
"name": { <1>
|
|
"type": "string"
|
|
}
|
|
}
|
|
},
|
|
"type_two": {
|
|
"properties": {
|
|
"name": { <1>
|
|
"type": "string",
|
|
"analyzer": "english"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
---------------
|
|
<1> The two `name` fields have conflicting mappings and will prevent Elasticsearch
|
|
from starting.
|
|
|
|
Elasticsearch will not start in the presence of conflicting field mappings.
|
|
These indices must be deleted or reindexed using a new mapping.
|
|
|
|
The `ignore_conflicts` option of the put mappings API has been removed.
|
|
Conflicts can't be ignored anymore.
|
|
|
|
==== Fields cannot be referenced by short name
|
|
|
|
A field can no longer be referenced using its short name. Instead, the full
|
|
path to the field is required. For instance:
|
|
|
|
[source,js]
|
|
---------------
|
|
PUT my_index
|
|
{
|
|
"mappings": {
|
|
"my_type": {
|
|
"properties": {
|
|
"title": { "type": "string" }, <1>
|
|
"name": {
|
|
"properties": {
|
|
"title": { "type": "string" }, <2>
|
|
"first": { "type": "string" },
|
|
"last": { "type": "string" }
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
---------------
|
|
<1> This field is referred to as `title`.
|
|
<2> This field is referred to as `name.title`.
|
|
|
|
Previously, the two `title` fields in the example above could have been
|
|
confused with each other when using the short name `title`.
|
|
|
|
==== Type name prefix removed
|
|
|
|
Previously, two fields with the same name in two different types could
|
|
sometimes be disambiguated by prepending the type name. As a side effect, it
|
|
would add a filter on the type name to the relevant query. This feature was
|
|
ambiguous -- a type name could be confused with a field name -- and didn't
|
|
work everywhere e.g. aggregations.
|
|
|
|
Instead, fields should be specified with the full path, but without a type
|
|
name prefix. If you wish to filter by the `_type` field, either specify the
|
|
type in the URL or add an explicit filter.
|
|
|
|
The following example query in 1.x:
|
|
|
|
[source,js]
|
|
----------------------------
|
|
GET my_index/_search
|
|
{
|
|
"query": {
|
|
"match": {
|
|
"my_type.some_field": "quick brown fox"
|
|
}
|
|
}
|
|
}
|
|
----------------------------
|
|
|
|
would be rewritten in 2.0 as:
|
|
|
|
[source,js]
|
|
----------------------------
|
|
GET my_index/my_type/_search <1>
|
|
{
|
|
"query": {
|
|
"match": {
|
|
"some_field": "quick brown fox" <2>
|
|
}
|
|
}
|
|
}
|
|
----------------------------
|
|
<1> The type name can be specified in the URL to act as a filter.
|
|
<2> The field name should be specified without the type prefix.
|
|
|
|
==== Field names may not contain dots
|
|
|
|
In 1.x, it was possible to create fields with dots in their name, for
|
|
instance:
|
|
|
|
[source,js]
|
|
----------------------------
|
|
PUT my_index
|
|
{
|
|
"mappings": {
|
|
"my_type": {
|
|
"properties": {
|
|
"foo.bar": { <1>
|
|
"type": "string"
|
|
},
|
|
"foo": {
|
|
"properties": {
|
|
"bar": { <1>
|
|
"type": "string"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
----------------------------
|
|
<1> These two fields cannot be distinguised as both are referred to as `foo.bar`.
|
|
|
|
You can no longer create fields with dots in the name.
|
|
|
|
==== Type names may not start with a dot
|
|
|
|
In 1.x, Elasticsearch would issue a warning if a type name included a dot,
|
|
e.g. `my.type`. Now that type names are no longer used to distinguish between
|
|
fields in different types, this warning has been relaxed: type names may now
|
|
contain dots, but they may not *begin* with a dot. The only exception to this
|
|
is the special `.percolator` type.
|
|
|
|
==== Type names may not be longer than 255 characters
|
|
|
|
Mapping type names may not be longer than 255 characters. Long type names
|
|
will continue to function on indices created before upgrade, but it will not
|
|
be possible create types with long names in new indices.
|
|
|
|
==== Types may no longer be deleted
|
|
|
|
In 1.x it was possible to delete a type mapping, along with all of the
|
|
documents of that type, using the delete mapping API. This is no longer
|
|
supported, because remnants of the fields in the type could remain in the
|
|
index, causing corruption later on.
|
|
|
|
Instead, if you need to delete a type mapping, you should reindex to a new
|
|
index which does not contain the mapping. If you just need to delete the
|
|
documents that belong to that type, then use the delete-by-query plugin
|
|
instead.
|
|
|
|
[[migration-meta-fields]]
|
|
==== Type meta-fields
|
|
|
|
The <<mapping-fields,meta-fields>> associated with had configuration options
|
|
removed, to make them more reliable:
|
|
|
|
* `_id` configuration can no longer be changed. If you need to sort, use the <<mapping-uid-field,`_uid`>> field instead.
|
|
* `_type` configuration can no longer be changed.
|
|
* `_index` configuration can no longer be changed.
|
|
* `_routing` configuration is limited to marking routing as required.
|
|
* `_field_names` configuration is limited to disabling the field.
|
|
* `_size` configuration is limited to enabling the field.
|
|
* `_timestamp` configuration is limited to enabling the field, setting format and default value.
|
|
* `_boost` has been removed.
|
|
* `_analyzer` has been removed.
|
|
|
|
Importantly, *meta-fields can no longer be specified as part of the document
|
|
body.* Instead, they must be specified in the query string parameters. For
|
|
instance, in 1.x, the `routing` could be specified as follows:
|
|
|
|
[source,json]
|
|
-----------------------------
|
|
PUT my_index
|
|
{
|
|
"mappings": {
|
|
"my_type": {
|
|
"_routing": {
|
|
"path": "group" <1>
|
|
},
|
|
"properties": {
|
|
"group": { <1>
|
|
"type": "string"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
|
|
PUT my_index/my_type/1 <2>
|
|
{
|
|
"group": "foo"
|
|
}
|
|
-----------------------------
|
|
<1> This 1.x mapping tells Elasticsearch to extract the `routing` value from the `group` field in the document body.
|
|
<2> This indexing request uses a `routing` value of `foo`.
|
|
|
|
In 2.0, the routing must be specified explicitly:
|
|
|
|
[source,json]
|
|
-----------------------------
|
|
PUT my_index
|
|
{
|
|
"mappings": {
|
|
"my_type": {
|
|
"_routing": {
|
|
"required": true <1>
|
|
},
|
|
"properties": {
|
|
"group": {
|
|
"type": "string"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
|
|
PUT my_index/my_type/1?routing=bar <2>
|
|
{
|
|
"group": "foo"
|
|
}
|
|
-----------------------------
|
|
<1> Routing can be marked as required to ensure it is not forgotten during indexing.
|
|
<2> This indexing request uses a `routing` value of `bar`.
|
|
|
|
==== `_timestamp` and `_ttl` deprecated
|
|
|
|
The `_timestamp` and `_ttl` fields are deprecated, but will remain functional
|
|
for the remainder of the 2.x series.
|
|
|
|
Instead of the `_timestamp` field, use a normal <<date,`date`>> field and set
|
|
the value explicitly.
|
|
|
|
The current `_ttl` functionality will be replaced in a future version with a
|
|
new implementation of TTL, possibly with different semantics, and will not
|
|
depend on the `_timestamp` field.
|
|
|
|
==== Analyzer mappings
|
|
|
|
Previously, `index_analyzer` and `search_analyzer` could be set separately,
|
|
while the `analyzer` setting would set both. The `index_analyzer` setting has
|
|
been removed in favour of just using the `analyzer` setting.
|
|
|
|
If just the `analyzer` is set, it will be used at index time and at search time. To use a different analyzer at search time, specify both the `analyzer` and a `search_analyzer`.
|
|
|
|
The `index_analyzer`, `search_analyzer`, and `analyzer` type-level settings
|
|
have also been removed, as is is no longer possible to select fields based on
|
|
the type name.
|
|
|
|
The `_analyzer` meta-field, which allowed setting an analyzer per document has
|
|
also been removed. It will be ignored on older indices.
|
|
|
|
==== Date fields and Unix timestamps
|
|
|
|
Previously, `date` fields would first try to parse values as a Unix timestamp
|
|
-- milliseconds-since-the-epoch -- before trying to use their defined date
|
|
`format`. This meant that formats like `yyyyMMdd` could never work, as values
|
|
would be interpreted as timestamps.
|
|
|
|
In 2.0, we have added two formats: `epoch_millis` and `epoch_second`. Only
|
|
date fields that use these formats will be able to parse timestamps.
|
|
|
|
These formats cannot be used in dynamic templates, because they are
|
|
indistinguishable from long values.
|
|
|
|
==== Default date format
|
|
|
|
The default date format has changed from `date_optional_time` to
|
|
`strict_date_optional_time`, which expects a 4 digit year, and a 2 digit month
|
|
and day, (and optionally, 2 digit hour, minute, and second).
|
|
|
|
A dynamically added date field, by default, includes the `epoch_millis`
|
|
format to support timestamp parsing. For instance:
|
|
|
|
[source,js]
|
|
-------------------------
|
|
PUT my_index/my_type/1
|
|
{
|
|
"date_one": "2015-01-01" <1>
|
|
}
|
|
-------------------------
|
|
<1> Has `format`: `"strict_date_optional_time||epoch_millis"`.
|
|
|
|
[[migration-bool-fields]]
|
|
==== Boolean fields
|
|
|
|
Boolean fields used to have a string fielddata with `F` meaning `false` and `T`
|
|
meaning `true`. They have been refactored to use numeric fielddata, with `0`
|
|
for `false` and `1` for `true`. As a consequence, the format of the responses of
|
|
the following APIs changed when applied to boolean fields: `0`/`1` is returned
|
|
instead of `F`/`T`:
|
|
|
|
* <<search-request-fielddata-fields,fielddata fields>>
|
|
* <<search-request-sort,sort values>>
|
|
* <<search-aggregations-bucket-terms-aggregation,terms aggregations>>
|
|
|
|
In addition, terms aggregations use a custom formatter for boolean (like for
|
|
dates and ip addresses, which are also backed by numbers) in order to return
|
|
the user-friendly representation of boolean fields: `false`/`true`:
|
|
|
|
[source,js]
|
|
---------------
|
|
"buckets": [
|
|
{
|
|
"key": 0,
|
|
"key_as_string": "false",
|
|
"doc_count": 42
|
|
},
|
|
{
|
|
"key": 1,
|
|
"key_as_string": "true",
|
|
"doc_count": 12
|
|
}
|
|
]
|
|
---------------
|
|
|
|
==== `index_name` and `path` removed
|
|
|
|
The `index_name` setting was used to change the name of the Lucene field,
|
|
and the `path` setting was used on `object` fields to determine whether the
|
|
Lucene field should use the full path (including parent object fields), or
|
|
just the final `name`.
|
|
|
|
These setting have been removed as their purpose is better served with the
|
|
<<copy-to>> parameter.
|
|
|
|
==== Murmur3 Fields
|
|
|
|
Fields of type `murmur3` can no longer change `doc_values` or `index` setting.
|
|
They are always mapped as follows:
|
|
|
|
[source,js]
|
|
---------------------
|
|
{
|
|
"type": "murmur3",
|
|
"index": "no",
|
|
"doc_values": true
|
|
}
|
|
---------------------
|
|
|
|
==== Mappings in config files not supported
|
|
|
|
The ability to specify mappings in configuration files has been removed. To
|
|
specify default mappings that apply to multiple indexes, use
|
|
<<indices-templates,index templates>> instead.
|
|
|
|
Along with this change, the following settings have been removed:
|
|
|
|
* `index.mapper.default_mapping_location`
|
|
* `index.mapper.default_percolator_mapping_location`
|
|
|
|
==== Fielddata formats
|
|
|
|
Now that doc values are the default for fielddata, specialized in-memory
|
|
formats have become an esoteric option. These fielddata formats have been removed:
|
|
|
|
* `fst` on string fields
|
|
* `compressed` on geo points
|
|
|
|
The default fielddata format will be used instead.
|
|
|
|
==== Posting and doc-values codecs
|
|
|
|
It is no longer possible to specify per-field postings and doc values formats
|
|
in the mappings. This setting will be ignored on indices created before 2.0
|
|
and will cause mapping parsing to fail on indices created on or after 2.0. For
|
|
old indices, this means that new segments will be written with the default
|
|
postings and doc values formats of the current codec.
|
|
|
|
It is still possible to change the whole codec by using the `index.codec`
|
|
setting. Please however note that using a non-default codec is discouraged as
|
|
it could prevent future versions of Elasticsearch from being able to read the
|
|
index.
|
|
|
|
==== Compress and compress threshold
|
|
|
|
The `compress` and `compress_threshold` options have been removed from the
|
|
`_source` field and fields of type `binary`. These fields are compressed by
|
|
default. If you would like to increase compression levels, use the new
|
|
<<index-codec,`index.codec: best_compression`>> setting instead.
|
|
|
|
==== position_offset_gap
|
|
|
|
The `position_offset_gap` option is renamed to 'position_increment_gap'. This was
|
|
done to clear away the confusion. Elasticsearch's 'position_increment_gap' now is
|
|
mapped directly to Lucene's 'position_increment_gap'
|
|
|
|
The default `position_increment_gap` is now 100. Indexes created in Elasticsearch
|
|
2.0.0 will default to using 100 and indexes created before that will continue
|
|
to use the old default of 0. This was done to prevent phrase queries from
|
|
matching across different values of the same term unexpectedly. Specifically,
|
|
100 was chosen to cause phrase queries with slops up to 99 to match only within
|
|
a single value of a field.
|