OpenSearch/docs/reference/migration/migrate_2_0.asciidoc

392 lines
14 KiB
Plaintext

[[breaking-changes-2.0]]
== Breaking changes in 2.0
This section discusses the changes that you need to be aware of when migrating
your application to Elasticsearch 2.0.
=== Indices API
The <<alias-retrieving, get alias api>> will, by default produce an error response
if a requested index does not exist. This change brings the defaults for this API in
line with the other Indices APIs. The <<multi-index>> options can be used on a request
to change this behavior
`GetIndexRequest.features()` now returns an array of Feature Enums instead of an array of String values.
The following deprecated methods have been removed:
* `GetIndexRequest.addFeatures(String[])` - Please use `GetIndexRequest.addFeatures(Feature[])` instead
* `GetIndexRequest.features(String[])` - Please use `GetIndexRequest.features(Feature[])` instead
* `GetIndexRequestBuilder.addFeatures(String[])` - Please use `GetIndexRequestBuilder.addFeatures(Feature[])` instead
* `GetIndexRequestBuilder.setFeatures(String[])` - Please use `GetIndexRequestBuilder.setFeatures(Feature[])` instead
=== Partial fields
Partial fields were deprecated since 1.0.0beta1 in favor of <<search-request-source-filtering,source filtering>>.
=== More Like This Field
The More Like This Field query has been removed in favor of the <<query-dsl-mlt-query, More Like This Query>>
restrained set to a specific `field`.
=== Routing
The default hash function that is used for routing has been changed from djb2 to
murmur3. This change should be transparent unless you relied on very specific
properties of djb2. This will help ensure a better balance of the document counts
between shards.
In addition, the following node settings related to routing have been deprecated:
[horizontal]
`cluster.routing.operation.hash.type`::
This was an undocumented setting that allowed to configure which hash function
to use for routing. `murmur3` is now enforced on new indices.
`cluster.routing.operation.use_type`::
This was an undocumented setting that allowed to take the `_type` of the
document into account when computing its shard (default: `false`). `false` is
now enforced on new indices.
=== Async replication
The `replication` parameter has been removed from all CRUD operations (index,
update, delete, bulk, delete-by-query). These operations are now synchronous
only, and a request will only return once the changes have been replicated to
all active shards in the shard group.
=== Store
The `memory` / `ram` store (`index.store.type`) option was removed in Elasticsearch 2.0.
=== Term Vectors API
Usage of `/_termvector` is deprecated, and replaced in favor of `/_termvectors`.
=== Script fields
Script fields in 1.x were only returned as a single value. So even if the return
value of a script used to be list, it would be returned as an array containing
a single value that is a list too, such as:
[source,json]
---------------
"fields": {
"my_field": [
[
"v1",
"v2"
]
]
}
---------------
In elasticsearch 2.x, scripts that return a list of values are considered as
multivalued fields. So the same example would return the following response,
with values in a single array.
[source,json]
---------------
"fields": {
"my_field": [
"v1",
"v2"
]
}
---------------
=== Main API
Previously, calling `GET /` was giving back the http status code within the json response
in addition to the actual HTTP status code. We removed `status` field in json response.
=== Java API
Some query builders have been removed or renamed:
* `commonTerms(...)` renamed with `commonTermsQuery(...)`
* `queryString(...)` renamed with `queryStringQuery(...)`
* `simpleQueryString(...)` renamed with `simpleQueryStringQuery(...)`
* `textPhrase(...)` removed
* `textPhrasePrefix(...)` removed
* `textPhrasePrefixQuery(...)` removed
* `filtered(...)` removed. Use `filteredQuery(...)` instead.
* `inQuery(...)` removed.
=== Aggregations
The `date_histogram` aggregation now returns a `Histogram` object in the response, and the `DateHistogram` class has been removed. Similarly
the `date_range`, `ipv4_range`, and `geo_distance` aggregations all return a `Range` object in the response, and the `IPV4Range`, `DateRange`,
and `GeoDistance` classes have been removed. The motivation for this is to have a single response API for the Range and Histogram aggregations
regardless of the type of data being queried. To support this some changes were made in the `MultiBucketAggregation` interface which applies
to all bucket aggregations:
* The `getKey()` method now returns `Object` instead of `String`. The actual object type returned depends on the type of aggregation requested
(e.g. the `date_histogram` will return a `DateTime` object for this method whereas a `histogram` will return a `Number`).
* A `getKeyAsString()` method has been added to return the String representation of the key.
* All other `getKeyAsX()` methods have been removed.
* The `getBucketAsKey(String)` methods have been removed on all aggregations except the `filters` and `terms` aggregations.
The `histogram` and the `date_histogram` aggregation now support a simplified `offset` option that replaces the previous `pre_offset` and
`post_offset` rounding options. Instead of having to specify two separate offset shifts of the underlying buckets, the `offset` option
moves the bucket boundaries in positive or negative direction depending on its argument.
The `date_histogram` options for `pre_zone` and `post_zone` are replaced by the `time_zone` option. The behavior of `time_zone` is
equivalent to the former `pre_zone` option. Setting `time_zone` to a value like "+01:00" now will lead to the bucket calculations
being applied in the specified time zone but In addition to this, also the `pre_zone_adjust_large_interval` is removed because we
now always return dates and bucket keys in UTC.
`include`/`exclude` filtering on the `terms` aggregation now uses the same syntax as regexp queries instead of the Java syntax. While simple
regexps should still work, more complex ones might need some rewriting. Also, the `flags` parameter is not supported anymore.
=== Terms filter lookup caching
The terms filter lookup mechanism does not support the `cache` option anymore
and relies on the filesystem cache instead. If the lookup index is not too
large, it is recommended to make it replicated to all nodes by setting
`index.auto_expand_replicas: 0-all` in order to remove the network overhead as
well.
=== Delete by query
The meaning of the `_shards` headers in the delete by query response has changed. Before version 2.0 the `total`,
`successful` and `failed` fields in the header are based on the number of primary shards. The failures on replica
shards aren't being kept track of. From version 2.0 the stats in the `_shards` header are based on all shards
of an index. The http status code is left unchanged and is only based on failures that occurred while executing on
primary shards.
=== Delete api with missing routing when required
Delete api requires a routing value when deleting a document belonging to a type that has routing set to required in its
mapping, whereas previous elasticsearch versions would trigger a broadcast delete on all shards belonging to the index.
A `RoutingMissingException` is now thrown instead.
=== Mappings
* The setting `index.mapping.allow_type_wrapper` has been removed. Documents should always be sent without the type as the root element.
* The delete mappings API has been removed. Mapping types can no longer be deleted.
==== Removed type prefix on field names in queries
Types can no longer be specified on fields within queries. Instead, specify type restrictions in the search request.
The following is an example query in 1.x over types `t1` and `t2`:
[source,json]
---------------
curl -XGET 'localhost:9200/index/_search'
{
"query": {
"bool": {
"should": [
{"match": { "t1.field_only_in_t1": "foo" }},
{"match": { "t2.field_only_in_t2": "bar" }}
]
}
}
}
---------------
In 2.0, the query should look like the following:
[source,json]
---------------
curl -XGET 'localhost:9200/index/t1,t2/_search'
{
"query": {
"bool": {
"should": [
{"match": { "field_only_in_t1": "foo" }},
{"match": { "field_only_in_t2": "bar" }}
]
}
}
}
---------------
==== Removed short name field access
Field names in queries, aggregations, etc. must now use the complete name. Use of the short name
caused ambiguities in field lookups when the same name existed within multiple object mappings.
The following example illustrates the difference between 1.x and 2.0.
Given these mappings:
[source,json]
---------------
curl -XPUT 'localhost:9200/index'
{
"mappings": {
"type": {
"properties": {
"name": {
"type": "object",
"properties": {
"first": {"type": "string"},
"last": {"type": "string"}
}
}
}
}
}
}
---------------
The following query was possible in 1.x:
[source,json]
---------------
curl -XGET 'localhost:9200/index/type/_search'
{
"query": {
"match": { "first": "foo" }
}
}
---------------
In 2.0, the same query should now be:
[source,json]
---------------
curl -XGET 'localhost:9200/index/type/_search'
{
"query": {
"match": { "name.first": "foo" }
}
}
---------------
==== Meta fields have limited configuration
Meta fields (those beginning with underscore) are fields used by elasticsearch
to provide special features. They now have limited configuration options.
* `_id` configuration can no longer be changed. If you need to sort, use `_uid` instead.
* `_type` configuration can no longer be changed.
* `_index` configuration is limited to enabling the field.
* `_routing` configuration is limited to requiring the field.
* `_boost` has been removed.
* `_field_names` configuration is limited to disabling the field.
* `_size` configuration is limited to enabling the field.
=== Boolean fields
Boolean fields used to have a string fielddata with `F` meaning `false` and `T`
meaning `true`. They have been refactored to use numeric fielddata, with `0`
for `false` and `1` for `true`. As a consequence, the format of the responses of
the following APIs changed when applied to boolean fields: `0`/`1` is returned
instead of `F`/`T`:
- <<search-request-fielddata-fields,fielddata fields>>
- <<search-request-sort,sort values>>
- <<search-aggregations-bucket-terms-aggregation,terms aggregations>>
In addition, terms aggregations use a custom formatter for boolean (like for
dates and ip addresses, which are also backed by numbers) in order to return
the user-friendly representation of boolean fields: `false`/`true`:
[source,json]
---------------
"buckets": [
{
"key": 0,
"key_as_string": "false",
"doc_count": 42
},
{
"key": 1,
"key_as_string": "true",
"doc_count": 12
}
]
---------------
=== Codecs
It is no longer possible to specify per-field postings and doc values formats
in the mappings. This setting will be ignored on indices created before
elasticsearch 2.0 and will cause mapping parsing to fail on indices created on
or after 2.0. For old indices, this means that new segments will be written
with the default postings and doc values formats of the current codec.
It is still possible to change the whole codec by using the `index.codec`
setting. Please however note that using a non-default codec is discouraged as
it could prevent future versions of Elasticsearch from being able to read the
index.
=== Scripting settings
Removed support for `script.disable_dynamic` node setting, replaced by
fine-grained script settings described in the <<enable-dynamic-scripting,scripting docs>>.
The following setting previously used to enable dynamic scripts:
[source,yaml]
---------------
script.disable_dynamic: false
---------------
can be replaced with the following two settings in `elasticsearch.yml` that
achieve the same result:
[source,yaml]
---------------
script.inline: on
script.indexed: on
---------------
=== Script parameters
Deprecated script parameters `id`, `file`, and `scriptField` have been removed
from all scriptable APIs. `script_id`, `script_file` and `script` should be used
in their place.
=== Plugins making use of scripts
Plugins that make use of scripts must register their own script context through
`ScriptModule`. Script contexts can be used as part of fine-grained settings to
enable/disable scripts selectively.
=== Thrift and memcached transport
The thrift and memcached transport plugins are no longer supported. Instead, use
either the HTTP transport (enabled by default) or the node or transport Java client.
=== `search_type=count` deprecation
The `count` search type has been deprecated. All benefits from this search type can
now be achieved by using the `query_then_fetch` search type (which is the
default) and setting `size` to `0`.
=== JSONP support
JSONP callback support has now been removed. CORS should be used to access Elasticsearch
over AJAX instead:
[source,yaml]
---------------
http.cors.enabled: true
http.cors.allow-origin: /https?:\/\/localhost(:[0-9]+)?/
---------------
=== Cluster state REST api
The cluster state api doesn't return the `routing_nodes` section anymore when
`routing_table` is requested. The newly introduced `routing_nodes` flag can
be used separately to control whether `routing_nodes` should be returned.
=== Query DSL
The `fuzzy_like_this` and `fuzzy_like_this_field` queries have been removed.
The `limit` filter is deprecated and becomes a no-op. You can achieve similar
behaviour using the <<search-request-body,terminate_after>> parameter.
`or` and `and` on the one hand and `bool` on the other hand used to have
different performance characteristics depending on the wrapped filters. This is
fixed now, as a consequence the `or` and `and` filters are now deprecated in
favour or `bool`.
The `execution` option of the `terms` filter is now deprecated and ignored if
provided.