[[breaking-changes-2.0]]
== Breaking changes in 2.0

This section discusses the changes that you need to be aware of when migrating
your application to Elasticsearch 2.0.

=== Indices API

The <<alias-retrieving, get alias api>> will, by default, produce an error response
if a requested index does not exist. This change brings the defaults for this API in
line with the other Indices APIs. The <<multi-index>> options can be used on a request
to change this behavior.

`GetIndexRequest.features()` now returns an array of `Feature` enums instead of an array of `String` values.

The following deprecated methods have been removed:

* `GetIndexRequest.addFeatures(String[])` - Please use `GetIndexRequest.addFeatures(Feature[])` instead
* `GetIndexRequest.features(String[])` - Please use `GetIndexRequest.features(Feature[])` instead
* `GetIndexRequestBuilder.addFeatures(String[])` - Please use `GetIndexRequestBuilder.addFeatures(Feature[])` instead
* `GetIndexRequestBuilder.setFeatures(String[])` - Please use `GetIndexRequestBuilder.setFeatures(Feature[])` instead

=== Partial fields

Partial fields have been deprecated since 1.0.0beta1 in favor of <<search-request-source-filtering,source filtering>>.

=== More Like This Field

The More Like This Field query has been removed in favor of the <<query-dsl-mlt-query, More Like This Query>>
restricted to a specific `field`.

=== Routing

The default hash function that is used for routing has been changed from djb2 to
murmur3. This change should be transparent unless you relied on very specific
properties of djb2. This will help ensure a better balance of the document counts
between shards.

In addition, the following node settings related to routing have been deprecated:

[horizontal]

`cluster.routing.operation.hash.type`::

This was an undocumented setting that allowed you to configure which hash function
to use for routing. `murmur3` is now enforced on new indices.

`cluster.routing.operation.use_type`::

This was an undocumented setting that allowed the `_type` of the
document to be taken into account when computing its shard (default: `false`). `false` is
now enforced on new indices.
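
To illustrate why a djb2-style hash can balance documents poorly, the following sketch assigns 100000 incremental ids to 33 shards. It is a hypothetical Python illustration, not Elasticsearch code: the function names, the signed 32-bit wrap-around (chosen to mimic a Java `int`) and the plain modulo are assumptions made for this example.

[source,python]
---------------
def djb2(value):
    """djb2-style hash (h * 33 + c), wrapped to a signed 32-bit integer.

    Assumption: the wrap-around mimics Java `int` arithmetic.
    """
    h = 5381
    for ch in value:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed one.
    return h - 0x100000000 if h >= 0x80000000 else h


def shard_counts(num_docs, num_shards):
    """Count how many incremental ids ("0", "1", ...) land on each shard."""
    counts = [0] * num_shards
    for i in range(num_docs):
        # Python's % is non-negative for a positive modulus.
        counts[djb2(str(i)) % num_shards] += 1
    return counts


counts = shard_counts(100000, 33)
empty = sum(1 for c in counts if c == 0)
print("documents per shard:", counts)
print("empty shards:", empty)
---------------

With numeric ids and 33 shards, a number of shards receive no documents at all in this sketch, which is the kind of imbalance a hash with a better distribution is meant to avoid.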

=== Async replication

The `replication` parameter has been removed from all CRUD operations (index,
update, delete, bulk, delete-by-query). These operations are now synchronous
only, and a request will only return once the changes have been replicated to
all active shards in the shard group.

=== Store

The `memory` / `ram` store (`index.store.type`) option was removed in Elasticsearch 2.0.

=== Term Vectors API

Usage of `/_termvector` is deprecated in favor of `/_termvectors`.

=== Script fields

Script fields in 1.x were only returned as a single value. Even if the return
value of a script was a list, it would be returned as an array containing
a single value that is itself a list, such as:

[source,json]
---------------
"fields": {
  "my_field": [
    [
      "v1",
      "v2"
    ]
  ]
}
---------------

In Elasticsearch 2.x, scripts that return a list of values are treated as
multivalued fields, so the same example would return the following response,
with the values in a single array:

[source,json]
---------------
"fields": {
  "my_field": [
    "v1",
    "v2"
  ]
}
---------------
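
Client code that has to read script fields from both 1.x and 2.x clusters during a migration could normalize the two shapes. The following is a minimal Python sketch; `flatten_script_field` is a hypothetical helper, not part of any Elasticsearch client, and it assumes that a single-element array holding a list is always the 1.x wrapping.

[source,python]
---------------
def flatten_script_field(values):
    """Convert a 1.x-style script field value to the 2.x shape.

    In 1.x a script returning a list was wrapped in an outer
    single-element array; in 2.x the values are returned directly.
    """
    if len(values) == 1 and isinstance(values[0], list):
        return values[0]
    return values


old_response = {"fields": {"my_field": [["v1", "v2"]]}}
new_fields = {
    name: flatten_script_field(values)
    for name, values in old_response["fields"].items()
}
print(new_fields)
---------------

Values that already have the 2.x shape pass through unchanged, so the helper can be applied regardless of the cluster version.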

=== Main API

Previously, calling `GET /` returned the HTTP status code within the JSON response
in addition to the actual HTTP status code. The `status` field has been removed from the JSON response.

=== Java API

Some query builders have been removed or renamed:

* `commonTerms(...)` renamed to `commonTermsQuery(...)`
* `queryString(...)` renamed to `queryStringQuery(...)`
* `simpleQueryString(...)` renamed to `simpleQueryStringQuery(...)`
* `textPhrase(...)` removed
* `textPhrasePrefix(...)` removed
* `textPhrasePrefixQuery(...)` removed
* `filtered(...)` removed. Use `filteredQuery(...)` instead.
* `inQuery(...)` removed.

=== Aggregations

The `date_histogram` aggregation now returns a `Histogram` object in the response, and the `DateHistogram` class has been removed. Similarly,
the `date_range`, `ipv4_range`, and `geo_distance` aggregations all return a `Range` object in the response, and the `IPV4Range`, `DateRange`,
and `GeoDistance` classes have been removed. The motivation for this is to have a single response API for the Range and Histogram aggregations
regardless of the type of data being queried. To support this, some changes were made in the `MultiBucketAggregation` interface, which applies
to all bucket aggregations:

* The `getKey()` method now returns `Object` instead of `String`. The actual object type returned depends on the type of aggregation requested
(e.g. the `date_histogram` will return a `DateTime` object for this method whereas a `histogram` will return a `Number`).
* A `getKeyAsString()` method has been added to return the String representation of the key.
* All other `getKeyAsX()` methods have been removed.
* The `getBucketAsKey(String)` methods have been removed on all aggregations except the `filters` and `terms` aggregations.

The `histogram` and the `date_histogram` aggregation now support a simplified `offset` option that replaces the previous `pre_offset` and
`post_offset` rounding options. Instead of having to specify two separate offset shifts of the underlying buckets, the `offset` option
moves the bucket boundaries in a positive or negative direction depending on its argument.
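
As an illustration of how a single `offset` shifts bucket boundaries, the following Python sketch computes the bucket key for a value. The formula is an assumption chosen to match the description above (boundaries at `offset + n * interval`), and `bucket_key` is a hypothetical helper, not Elasticsearch code.

[source,python]
---------------
import math


def bucket_key(value, interval, offset=0):
    """Return the lower boundary of the bucket that contains `value`.

    Assumed model: bucket boundaries sit at offset + n * interval.
    """
    return math.floor((value - offset) / interval) * interval + offset


print(bucket_key(12, 10))             # boundaries at ..., 0, 10, 20, ...
print(bucket_key(12, 10, offset=3))   # boundaries at ..., 3, 13, 23, ...
print(bucket_key(12, 10, offset=-3))  # boundaries at ..., -3, 7, 17, ...
---------------

A positive argument moves every boundary up and a negative one moves it down, so the same value can fall into a different bucket without any second offset being involved.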

The `date_histogram` options for `pre_zone` and `post_zone` are replaced by the `time_zone` option. The behavior of `time_zone` is
equivalent to the former `pre_zone` option. Setting `time_zone` to a value like "+01:00" now leads to the bucket calculations
being applied in the specified time zone. In addition, the `pre_zone_adjust_large_interval` option has been removed because
dates and bucket keys are now always returned in UTC.

=== Terms filter lookup caching

The terms filter lookup mechanism no longer supports the `cache` option
and relies on the filesystem cache instead. If the lookup index is not too
large, it is recommended to replicate it to all nodes by setting
`index.auto_expand_replicas: 0-all` in order to remove the network overhead as
well.

=== Parent parameter on update request

The `parent` parameter has been removed from the update request. Before 2.x it just set the routing parameter. The
`routing` setting should be used instead. The `parent` setting was confusing because it gave the impression that the parent
a child document points to can be changed, which is not the case.

=== Delete by query

The meaning of the `_shards` headers in the delete by query response has changed. Before version 2.0, the `total`,
`successful` and `failed` fields in the header were based on the number of primary shards, and failures on replica
shards were not tracked. From version 2.0, the stats in the `_shards` header are based on all shards
of an index. The HTTP status code is unchanged and is based only on failures that occurred while executing on
primary shards.

=== Delete api with missing routing when required

The delete API now requires a routing value when deleting a document belonging to a type that has routing set to required in its
mapping, whereas previous Elasticsearch versions would trigger a broadcast delete on all shards belonging to the index.
A `RoutingMissingException` is now thrown instead.

=== Mappings

* The setting `index.mapping.allow_type_wrapper` has been removed. Documents should always be sent without the type as the root element.
* The delete mappings API has been removed. Mapping types can no longer be deleted.

==== Removed type prefix on field names in queries
Types can no longer be specified on fields within queries. Instead, specify type restrictions in the search request.

The following is an example query in 1.x over types `t1` and `t2`:

[source,json]
---------------
curl -XGET 'localhost:9200/index/_search'
{
  "query": {
    "bool": {
      "should": [
        {"match": { "t1.field_only_in_t1": "foo" }},
        {"match": { "t2.field_only_in_t2": "bar" }}
      ]
    }
  }
}
---------------

In 2.0, the query should look like the following:

[source,json]
---------------
curl -XGET 'localhost:9200/index/t1,t2/_search'
{
  "query": {
    "bool": {
      "should": [
        {"match": { "field_only_in_t1": "foo" }},
        {"match": { "field_only_in_t2": "bar" }}
      ]
    }
  }
}
---------------
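
When many stored queries need the same rewrite, it can be scripted. The following Python sketch removes a leading type name from field names in a query body and builds the search path with the type restriction. `strip_type_prefix` is a hypothetical helper, and it assumes that every key beginning with a known type name followed by a dot is a type-prefixed field, which would be wrong for an object field that happens to share a type's name.

[source,python]
---------------
def strip_type_prefix(node, types):
    """Recursively drop a leading "<type>." from field names in a query body.

    Assumption: any dict key starting with a known type name followed by
    a dot is a type-prefixed field name.
    """
    if isinstance(node, list):
        return [strip_type_prefix(item, types) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        for type_name in types:
            prefix = type_name + "."
            if key.startswith(prefix):
                key = key[len(prefix):]
                break
        result[key] = strip_type_prefix(value, types)
    return result


types = ["t1", "t2"]
old_query = {
    "query": {
        "bool": {
            "should": [
                {"match": {"t1.field_only_in_t1": "foo"}},
                {"match": {"t2.field_only_in_t2": "bar"}},
            ]
        }
    }
}
new_query = strip_type_prefix(old_query, types)
# The type restriction moves from the field names to the request path.
search_path = "/index/" + ",".join(types) + "/_search"
print(search_path)
print(new_query)
---------------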

==== Removed short name field access
Field names in queries, aggregations, etc. must now use the complete name. Use of the short name
caused ambiguities in field lookups when the same name existed within multiple object mappings.

The following example illustrates the difference between 1.x and 2.0.

Given these mappings:

[source,json]
---------------
curl -XPUT 'localhost:9200/index'
{
  "mappings": {
    "type": {
      "properties": {
        "name": {
          "type": "object",
          "properties": {
            "first": {"type": "string"},
            "last": {"type": "string"}
          }
        }
      }
    }
  }
}
---------------

The following query was possible in 1.x:

[source,json]
---------------
curl -XGET 'localhost:9200/index/type/_search'
{
  "query": {
    "match": { "first": "foo" }
  }
}
---------------

In 2.0, the same query should now be:

[source,json]
---------------
curl -XGET 'localhost:9200/index/type/_search'
{
  "query": {
    "match": { "name.first": "foo" }
  }
}
---------------
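
To find the complete names that queries must now use, the `properties` section of a mapping can be walked recursively. The following is a minimal Python sketch over a mapping held as a plain dictionary; `full_field_names` is a hypothetical helper and it only handles object fields nested through `properties`.

[source,python]
---------------
def full_field_names(properties, prefix=""):
    """Yield the complete dotted path of every leaf field in a mapping."""
    for name, definition in properties.items():
        path = prefix + name
        if "properties" in definition:
            # Object field: descend into its sub-fields.
            yield from full_field_names(definition["properties"], path + ".")
        else:
            yield path


properties = {
    "name": {
        "type": "object",
        "properties": {
            "first": {"type": "string"},
            "last": {"type": "string"},
        },
    }
}
names = sorted(full_field_names(properties))
print(names)
---------------

For the mapping used in this section the sketch lists `name.first` and `name.last`, the names that replace the short forms `first` and `last`.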

==== Meta fields have limited configuration
Meta fields (those beginning with underscore) are fields used by Elasticsearch
to provide special features. They now have limited configuration options.

* `_id` configuration can no longer be changed. If you need to sort, use `_uid` instead.
* `_type` configuration can no longer be changed.
* `_index` configuration is limited to enabling the field.
* `_routing` configuration is limited to requiring the field.
* `_boost` has been removed.
* `_field_names` configuration is limited to disabling the field.
* `_size` configuration is limited to enabling the field.

=== Codecs

It is no longer possible to specify per-field postings and doc values formats
in the mappings. This setting will be ignored on indices created before
Elasticsearch 2.0 and will cause mapping parsing to fail on indices created on
or after 2.0. For old indices, this means that new segments will be written
with the default postings and doc values formats of the current codec.

It is still possible to change the whole codec by using the `index.codec`
setting. Note, however, that using a non-default codec is discouraged as
it could prevent future versions of Elasticsearch from being able to read the
index.

=== Scripts

Deprecated script parameters `id`, `file`, and `scriptField` have been removed
from all scriptable APIs. `script_id`, `script_file` and `script` should be used
in their place.
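
Request bodies that still carry the removed names can be rewritten before they are sent. The following Python sketch renames the keys of a script parameter dictionary. The one-to-one pairing of old and new names is an assumption based on the order in which they are listed above, and `rename_script_params` is a hypothetical helper meant to be applied only to the script portion of a request.

[source,python]
---------------
# Assumed one-to-one mapping, following the order in which the
# parameters are listed in the text above.
RENAMED_SCRIPT_PARAMS = {
    "id": "script_id",
    "file": "script_file",
    "scriptField": "script",
}


def rename_script_params(params):
    """Return a copy of `params` with the removed names replaced."""
    return {RENAMED_SCRIPT_PARAMS.get(k, k): v for k, v in params.items()}


migrated = rename_script_params({"id": "my_script", "lang": "groovy"})
print(migrated)
---------------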

=== Thrift and memcached transport

The thrift and memcached transport plugins are no longer supported. Instead, use
either the HTTP transport (enabled by default) or the node or transport Java client.

=== `search_type=count` deprecation

The `count` search type has been deprecated. All benefits from this search type can
now be achieved by using the `query_then_fetch` search type (which is the
default) and setting `size` to `0`.
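
The replacement described above can be applied mechanically to existing requests. The following Python sketch takes the URL parameters and the body of a search request as dictionaries; `migrate_count_search` and the sample aggregation are hypothetical and only illustrate the rewrite.

[source,python]
---------------
def migrate_count_search(params, body):
    """Rewrite a `search_type=count` request as a `size: 0` request.

    Returns new (params, body) dicts; the inputs are left unchanged.
    """
    params = dict(params)
    body = dict(body)
    if params.get("search_type") == "count":
        del params["search_type"]  # fall back to the default search type
        body["size"] = 0           # ask for no hits in the response
    return params, body


old_params = {"search_type": "count"}
old_body = {"aggs": {"by_user": {"terms": {"field": "user"}}}}
new_params, new_body = migrate_count_search(old_params, old_body)
print(new_params)
print(new_body)
---------------

Requests that do not use `search_type=count` are returned unchanged, so the helper can be applied to every search request in a code base.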