[[breaking-changes-2.0]]
== Breaking changes in 2.0

This section discusses the changes that you need to be aware of when migrating
your application to Elasticsearch 2.0.

=== Indices API

The <<alias-retrieving, get alias api>> will, by default, produce an error response
if a requested index does not exist. This change brings the defaults for this API in
line with the other Indices APIs. The <<multi-index>> options can be used on a request
to change this behavior.

`GetIndexRequest.features()` now returns an array of Feature Enums instead of an array of String values.

The following deprecated methods have been removed:

* `GetIndexRequest.addFeatures(String[])` - Please use `GetIndexRequest.addFeatures(Feature[])` instead
* `GetIndexRequest.features(String[])` - Please use `GetIndexRequest.features(Feature[])` instead
* `GetIndexRequestBuilder.addFeatures(String[])` - Please use `GetIndexRequestBuilder.addFeatures(Feature[])` instead
* `GetIndexRequestBuilder.setFeatures(String[])` - Please use `GetIndexRequestBuilder.setFeatures(Feature[])` instead

=== Partial fields

Partial fields have been deprecated since 1.0.0beta1 in favor of <<search-request-source-filtering,source filtering>>.

=== More Like This

The More Like This API and the More Like This Field query have been removed in
favor of the <<query-dsl-mlt-query, More Like This Query>>.

The parameter `percent_terms_to_match` has been removed in favor of
`minimum_should_match`.

=== Routing

The default hash function that is used for routing has been changed from djb2 to
murmur3, which helps balance document counts more evenly between shards. This
change should be transparent unless you relied on very specific properties of
djb2.

In addition, the following node settings related to routing have been deprecated:

[horizontal]
`cluster.routing.operation.hash.type`::

  This was an undocumented setting that made it possible to configure which hash
  function to use for routing. `murmur3` is now enforced on new indices.

`cluster.routing.operation.use_type`::

  This was an undocumented setting that made it possible to take the `_type` of
  the document into account when computing its shard (default: `false`). `false`
  is now enforced on new indices.

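To see why the choice of hash function determines where a document lives, the following minimal sketch maps a routing value to a shard number using the classic djb2 string hash. The class and method names are hypothetical, the modulo step is an assumption about how a hash is reduced to a shard number, and this is not Elasticsearch's actual implementation; murmur3 is left out for brevity.

```java
/**
 * Illustrative sketch only: hypothetical names, not Elasticsearch's code.
 */
final class RoutingHashSketch {

    /** Classic djb2 string hash: start at 5381, then hash = hash * 33 + c. */
    static int djb2(String routing) {
        long hash = 5381;
        for (int i = 0; i < routing.length(); i++) {
            hash = ((hash << 5) + hash) + routing.charAt(i);
        }
        return (int) hash;
    }

    /** Assumed reduction step: maps a hash to a shard in [0, numberOfShards). */
    static int shardFor(int hash, int numberOfShards) {
        return Math.floorMod(hash, numberOfShards);
    }

    public static void main(String[] args) {
        int shards = 5;
        for (String id : new String[] {"user-1", "user-2", "user-3"}) {
            System.out.println(id + " -> shard " + shardFor(djb2(id), shards));
        }
    }
}
```

Swapping `djb2` for another hash function changes the result of `shardFor` for most routing values, which is presumably why the new default is only enforced on new indices.
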
=== Async replication

The `replication` parameter has been removed from all CRUD operations (index,
update, delete, bulk, delete-by-query). These operations are now synchronous
only, and a request will only return once the changes have been replicated to
all active shards in the shard group.

=== Store

The `memory` / `ram` store (`index.store.type`) option was removed in Elasticsearch 2.0.

=== Term Vectors API

Usage of `/_termvector` is deprecated in favor of `/_termvectors`.

=== Script fields

Script fields in 1.x were only returned as a single value. Even if a script
returned a list, it was returned as an array containing a single value that
was itself a list, such as:

[source,json]
---------------
"fields": {
  "my_field": [
    [
      "v1",
      "v2"
    ]
  ]
}
---------------

In Elasticsearch 2.x, scripts that return a list of values are treated as
multivalued fields. The same example now returns the following response,
with the values in a single array:

[source,json]
---------------
"fields": {
  "my_field": [
    "v1",
    "v2"
  ]
}
---------------

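Client code that reads script fields may have to cope with both response shapes during a migration. The following sketch models the two behaviours described above with plain Java collections, and adds a small helper that accepts either shape. All class and method names are hypothetical and nothing here is part of the Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/**
 * Illustrative sketch only: hypothetical names, not Elasticsearch's code.
 */
final class ScriptFieldValuesSketch {

    /** 1.x behaviour: the script result is always wrapped as one single value. */
    static List<Object> wrapAs1x(Object scriptResult) {
        return Collections.singletonList(scriptResult);
    }

    /** 2.x behaviour: a returned collection becomes the values of the field. */
    static List<Object> wrapAs2x(Object scriptResult) {
        if (scriptResult instanceof Collection) {
            return new ArrayList<>((Collection<?>) scriptResult);
        }
        return Collections.singletonList(scriptResult);
    }

    /** Client-side helper: accepts either response shape and returns a flat list. */
    static List<Object> normalize(List<?> fieldValues) {
        if (fieldValues.size() == 1 && fieldValues.get(0) instanceof Collection) {
            return new ArrayList<>((Collection<?>) fieldValues.get(0));
        }
        return new ArrayList<>(fieldValues);
    }
}
```

Note that `normalize` cannot tell a 1.x wrapper apart from a 2.x field whose only value is itself a list, so it is only suitable where scripts return flat lists.
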
=== Main API

Previously, calling `GET /` returned the HTTP status code within the JSON response
in addition to the actual HTTP status code. The `status` field has been removed from the JSON response.

=== Java API

`org.elasticsearch.index.query.FilterBuilders` has been removed as part of the merge of
queries and filters. These filters are now available in `QueryBuilders` with the same name.
All methods that used to accept a `FilterBuilder` now accept a `QueryBuilder` instead.

In addition, some query builders have been removed or renamed:

* `commonTerms(...)` renamed to `commonTermsQuery(...)`
* `queryString(...)` renamed to `queryStringQuery(...)`
* `simpleQueryString(...)` renamed to `simpleQueryStringQuery(...)`
* `textPhrase(...)` removed
* `textPhrasePrefix(...)` removed
* `textPhrasePrefixQuery(...)` removed
* `filtered(...)` removed. Use `filteredQuery(...)` instead.
* `inQuery(...)` removed.

=== Aggregations

The `date_histogram` aggregation now returns a `Histogram` object in the response, and the `DateHistogram` class has been removed. Similarly,
the `date_range`, `ipv4_range`, and `geo_distance` aggregations all return a `Range` object in the response, and the `IPV4Range`, `DateRange`,
and `GeoDistance` classes have been removed. The motivation for this is to have a single response API for the Range and Histogram aggregations
regardless of the type of data being queried. To support this, some changes were made in the `MultiBucketsAggregation` interface, which applies
to all bucket aggregations:

* The `getKey()` method now returns `Object` instead of `String`. The actual object type returned depends on the type of aggregation requested
(e.g. the `date_histogram` will return a `DateTime` object for this method whereas a `histogram` will return a `Number`).
* A `getKeyAsString()` method has been added to return the String representation of the key.
* All other `getKeyAsX()` methods have been removed.
* The `getBucketByKey(String)` methods have been removed on all aggregations except the `filters` and `terms` aggregations.

The `histogram` and the `date_histogram` aggregations now support a simplified `offset` option that replaces the previous `pre_offset` and
`post_offset` rounding options. Instead of having to specify two separate offset shifts of the underlying buckets, the `offset` option
moves the bucket boundaries in a positive or negative direction depending on its argument.

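The effect of the `offset` option described above can be illustrated with a small calculation. The sketch below assumes that the bucket key of a value is the lower boundary of its bucket and that boundaries sit at `offset + n * interval`; this formula is an assumption made for illustration and is not taken from the Elasticsearch source. The class and method names are hypothetical.

```java
/**
 * Illustrative sketch only: hypothetical names, assumed rounding formula.
 */
final class HistogramOffsetSketch {

    /**
     * Returns the key (lower boundary) of the bucket that contains value,
     * for buckets of the given interval shifted by offset.
     */
    static long bucketKey(long value, long interval, long offset) {
        // floorDiv rounds towards negative infinity, so negative values
        // and negative offsets land in the correct bucket.
        return Math.floorDiv(value - offset, interval) * interval + offset;
    }
}
```

Under these assumptions, with an interval of 10, the value 12 falls in the bucket keyed 10 when the offset is 0, and in the bucket keyed 5 when the offset is 5.
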
The `date_histogram` options for `pre_zone` and `post_zone` are replaced by the `time_zone` option. The behavior of `time_zone` is
equivalent to the former `pre_zone` option. Setting `time_zone` to a value like "+01:00" now leads to the bucket calculations
being applied in the specified time zone, but dates and bucket keys are always returned in UTC. For the same reason, the
`pre_zone_adjust_large_interval` option has also been removed.

Both the `histogram` and `date_histogram` aggregations now have a default `min_doc_count` of `0` instead of `1`.

`include`/`exclude` filtering on the `terms` aggregation now uses the same syntax as regexp queries instead of the Java syntax. While simple
regexps should still work, more complex ones might need some rewriting. Also, the `flags` parameter is no longer supported.

=== Terms filter lookup caching

The terms filter lookup mechanism no longer supports the `cache` option
and relies on the filesystem cache instead. If the lookup index is not too
large, it is recommended to replicate it to all nodes by setting
`index.auto_expand_replicas: 0-all` in order to remove the network overhead as
well.

=== Delete by query

The meaning of the `_shards` headers in the delete by query response has changed. Before version 2.0, the `total`,
`successful` and `failed` fields in the header were based on the number of primary shards, and failures on replica
shards were not tracked. From version 2.0, the stats in the `_shards` header are based on all shards
of an index. The HTTP status code is left unchanged and is only based on failures that occurred while executing on
primary shards.

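As a rough illustration of how the `total` value in the `_shards` header changes, the sketch below compares the two ways of counting. It assumes that "all shards" means every primary plus every configured replica; the class and method names are hypothetical and nothing here is part of the Elasticsearch API.

```java
/**
 * Illustrative sketch only: hypothetical names, assumed counting rules.
 */
final class ShardHeaderSketch {

    /** Before 2.0: only primary shards were counted. */
    static int totalBefore20(int primaries) {
        return primaries;
    }

    /** From 2.0: primaries and their replicas are counted (assumption). */
    static int totalFrom20(int primaries, int replicasPerPrimary) {
        return primaries * (1 + replicasPerPrimary);
    }
}
```

Under these assumptions, an index with 5 primary shards and 1 replica per primary would report a `total` of 5 before 2.0 and 10 from 2.0, so monitoring code that compares `successful` against a hard-coded shard count may need adjusting.
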
=== Delete api with missing routing when required

The delete API now requires a routing value when deleting a document that belongs to a type with routing set to required in its
mapping. Previous Elasticsearch versions would trigger a broadcast delete on all shards belonging to the index;
a `RoutingMissingException` is now thrown instead.

=== Mappings

* The setting `index.mapping.allow_type_wrapper` has been removed. Documents should always be sent without the type as the root element.
* The delete mappings API has been removed. Mapping types can no longer be deleted.

==== Removed type prefix on field names in queries
Types can no longer be specified on fields within queries. Instead, specify type restrictions in the search request.

The following is an example query in 1.x over types `t1` and `t2`:

[source,json]
---------------
curl -XGET 'localhost:9200/index/_search'
{
  "query": {
    "bool": {
      "should": [
        {"match": { "t1.field_only_in_t1": "foo" }},
        {"match": { "t2.field_only_in_t2": "bar" }}
      ]
    }
  }
}
---------------

In 2.0, the query should look like the following:

[source,json]
---------------
curl -XGET 'localhost:9200/index/t1,t2/_search'
{
  "query": {
    "bool": {
      "should": [
        {"match": { "field_only_in_t1": "foo" }},
        {"match": { "field_only_in_t2": "bar" }}
      ]
    }
  }
}
---------------

==== Removed short name field access
Field names in queries, aggregations, etc. must now use the complete name. Use of the short name
caused ambiguities in field lookups when the same name existed within multiple object mappings.

The following example illustrates the difference between 1.x and 2.0.

Given these mappings:

[source,json]
---------------
curl -XPUT 'localhost:9200/index'
{
  "mappings": {
    "type": {
      "properties": {
        "name": {
          "type": "object",
          "properties": {
            "first": {"type": "string"},
            "last": {"type": "string"}
          }
        }
      }
    }
  }
}
---------------

The following query was possible in 1.x:

[source,json]
---------------
curl -XGET 'localhost:9200/index/type/_search'
{
  "query": {
    "match": { "first": "foo" }
  }
}
---------------

In 2.0, the same query should now be:

[source,json]
---------------
curl -XGET 'localhost:9200/index/type/_search'
{
  "query": {
    "match": { "name.first": "foo" }
  }
}
---------------

==== Meta fields have limited configuration
Meta fields (those beginning with underscore) are fields used by elasticsearch
to provide special features. They now have limited configuration options.

* `_id` configuration can no longer be changed. If you need to sort, use `_uid` instead.
* `_type` configuration can no longer be changed.
* `_index` configuration is limited to enabling the field.
* `_routing` configuration is limited to requiring the field.
* `_boost` has been removed.
* `_field_names` configuration is limited to disabling the field.
* `_size` configuration is limited to enabling the field.

==== Source field limitations
The `_source` field could previously be disabled dynamically. Since this field
is a critical piece of many features like the Update API, it is no longer
possible to disable.

The options for `compress` and `compress_threshold` have also been removed.
The source field is already compressed. To minimize the storage cost,
set `index.codec: best_compression` in index settings.

==== Boolean fields

Boolean fields used to have a string fielddata with `F` meaning `false` and `T`
meaning `true`. They have been refactored to use numeric fielddata, with `0`
for `false` and `1` for `true`. As a consequence, the format of the responses of
the following APIs changed when applied to boolean fields: `0`/`1` is returned
instead of `F`/`T`:

- <<search-request-fielddata-fields,fielddata fields>>
- <<search-request-sort,sort values>>
- <<search-aggregations-bucket-terms-aggregation,terms aggregations>>

In addition, terms aggregations use a custom formatter for booleans (as they do
for dates and IP addresses, which are also backed by numbers) in order to return
the user-friendly representation of boolean fields: `false`/`true`:

[source,json]
---------------
"buckets": [
  {
    "key": 0,
    "key_as_string": "false",
    "doc_count": 42
  },
  {
    "key": 1,
    "key_as_string": "true",
    "doc_count": 12
  }
]
---------------

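Client code that previously compared boolean fielddata values against `T` and `F` will need to handle the numeric form. The following sketch shows one possible way to parse a value in either representation; the class and method names are hypothetical and this helper is not part of the Elasticsearch API.

```java
/**
 * Illustrative sketch only: hypothetical names, not Elasticsearch's code.
 */
final class BooleanFielddataSketch {

    /**
     * Parses a boolean fielddata value in the 1.x form ("T"/"F"), the 2.0
     * numeric form (1/0), or the formatted form ("true"/"false").
     */
    static boolean parse(Object raw) {
        String text = String.valueOf(raw);
        switch (text) {
            case "T":
            case "1":
            case "true":
                return true;
            case "F":
            case "0":
            case "false":
                return false;
            default:
                throw new IllegalArgumentException("Not a boolean fielddata value: " + text);
        }
    }
}
```

The helper compares string forms, so it expects integral numbers such as `0` and `1`; a floating-point `1.0` would be rejected and would need extra handling.
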
==== Murmur3 Fields
Fields of type `murmur3` can no longer change their `doc_values` or `index` settings.
They are always stored with doc values, and not indexed.

==== Source field configuration
The `_source` field no longer supports `includes` and `excludes` parameters. When
`_source` is enabled, the entire original source will be stored.

==== Config based mappings
The ability to specify mappings in configuration files has been removed. To specify
default mappings that apply to multiple indexes, use index templates.

The following settings are no longer valid:

* `index.mapper.default_mapping_location`
* `index.mapper.default_percolator_mapping_location`

=== Codecs

It is no longer possible to specify per-field postings and doc values formats
in the mappings. This setting will be ignored on indices created before
Elasticsearch 2.0 and will cause mapping parsing to fail on indices created on
or after 2.0. For old indices, this means that new segments will be written
with the default postings and doc values formats of the current codec.

It is still possible to change the whole codec by using the `index.codec`
setting. Note, however, that using a non-default codec is discouraged as
it could prevent future versions of Elasticsearch from being able to read the
index.

=== Scripting settings

Support for the `script.disable_dynamic` node setting has been removed. It is replaced by
the fine-grained script settings described in the <<enable-dynamic-scripting,scripting docs>>.
The following setting, previously used to enable dynamic scripts:

[source,yaml]
---------------
script.disable_dynamic: false
---------------

can be replaced with the following two settings in `elasticsearch.yml` that
achieve the same result:

[source,yaml]
---------------
script.inline: on
script.indexed: on
---------------

=== Script parameters

Deprecated script parameters `id`, `file`, and `scriptField` have been removed
from all scriptable APIs. `script_id`, `script_file` and `script` should be used
in their place.

=== Groovy scripts sandbox

The groovy sandbox and related settings have been removed. Groovy is now a
non-sandboxed scripting language, without any option to turn the sandbox on.

=== Plugins making use of scripts

Plugins that make use of scripts must register their own script context through
`ScriptModule`. Script contexts can be used as part of fine-grained settings to
enable/disable scripts selectively.

=== Thrift and memcached transport

The thrift and memcached transport plugins are no longer supported. Instead, use
either the HTTP transport (enabled by default) or the node or transport Java client.

=== `search_type=count` deprecation

The `count` search type has been deprecated. All benefits from this search type can
now be achieved by using the `query_then_fetch` search type (which is the
default) and setting `size` to `0`.

=== JSONP support

JSONP callback support has been removed. CORS should be used to access Elasticsearch
over AJAX instead:

[source,yaml]
---------------
http.cors.enabled: true
http.cors.allow-origin: /https?:\/\/localhost(:[0-9]+)?/
---------------

=== Cluster state REST api

The cluster state api no longer returns the `routing_nodes` section when
`routing_table` is requested. The newly introduced `routing_nodes` flag can
be used separately to control whether `routing_nodes` should be returned.

=== Query DSL

Change to ranking behaviour: single-term queries on numeric fields now score in the same way as string fields (use of IDF, norms if enabled).
Previously, term queries on numeric fields were deliberately prevented from using the usual Lucene scoring logic, and this behaviour was undocumented and, to some, unexpected.
If the introduction of scoring to numeric fields is undesirable for your query clauses, the fix is simple: wrap them in a `constant_score` or use a `filter` expression instead.

The `fuzzy_like_this` and `fuzzy_like_this_field` queries have been removed.

The `limit` filter is deprecated and becomes a no-op. You can achieve similar
behaviour using the <<search-request-body,terminate_after>> parameter.

`or` and `and` on the one hand and `bool` on the other hand used to have
different performance characteristics depending on the wrapped filters. This has
been fixed. As a consequence, the `or` and `and` filters are now deprecated in
favour of `bool`.

The `execution` option of the `terms` filter is now deprecated and ignored if
provided.

The `_cache` and `_cache_key` parameters of filters are deprecated in the REST
layer and removed in the Java API. If they are specified, they will be
ignored. Instead, filters are always used as their own cache key, and Elasticsearch
decides by itself whether it should cache filters based on how
often they are used.

==== Query/filter merge

Elasticsearch no longer distinguishes between queries and filters in the
DSL; it detects when scores are not needed and automatically optimizes the
query to not compute scores and optionally caches the result.

As a consequence, the `query` filter serves no purpose anymore and is deprecated.

=== Snapshot and Restore

The obsolete parameters `expand_wildcards_open` and `expand_wildcards_close` are no longer
supported by the snapshot and restore operations. These parameters have been replaced by
a single `expand_wildcards` parameter. See <<multi-index,the multi-index docs>> for more.

=== `_shutdown` API

The `_shutdown` API has been removed without a replacement. Nodes should be managed via the operating
system and the provided start/stop scripts.

=== Analyze API

The Analyze API now returns 0 as the first token's position instead of 1.

=== Multiple path.data striping

Previously, if the `path.data` setting listed multiple data paths, then a
shard would be ``striped'' across all paths by writing a whole file to each
path in turn (in accordance with the `index.store.distributor` setting). The
result was that the files from a single segment in a shard could be spread
across multiple disks, and the failure of any one disk could corrupt multiple
shards.

This striping is no longer supported. Instead, different shards may be
allocated to different paths, but all of the files in a single shard will be
written to the same path.

If striping is detected while starting Elasticsearch 2.0.0 or later, all of
the files belonging to the same shard will be migrated to the same path. If
there is not enough disk space to complete this migration, the upgrade will be
cancelled and can only be resumed once enough disk space is made available.

The `index.store.distributor` setting has also been removed.

=== Hunspell dictionary configuration

The parameter `indices.analysis.hunspell.dictionary.location` has been removed,
and `<path.conf>/hunspell` is always used.

=== Java API Transport API construction

The `TransportClient` construction code has changed and now uses the builder
pattern. Instead of using:

[source,java]
--------------------------------------------------
Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "myClusterName").build();
Client client = new TransportClient(settings);
--------------------------------------------------

Use:

[source,java]
--------------------------------------------------
Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "myClusterName").build();
Client client = TransportClient.builder().settings(settings).build();
--------------------------------------------------

=== Logging

Log messages are now truncated at 10,000 characters. This can be changed in the
`logging.yml` configuration file.

[float]
=== Removed `top_children` query

The `top_children` query has been removed in favour of the `has_child` query. The `top_children` query wasn't always faster
than the `has_child` query, and it was often inaccurate: the total hits and any aggregations in the
same search request were likely to be off if `top_children` was used.