Reworked 5.0 breaking changes docs

This commit is contained in:
Clinton Gormley 2016-03-13 21:17:48 +01:00
parent 25531b7299
commit 5c845f8bb5
12 changed files with 957 additions and 856 deletions

@ -4,877 +4,52 @@
This section discusses the changes that you need to be aware of when migrating
your application to Elasticsearch 5.0.
[IMPORTANT]
.Reindex indices from Elasticsearch 1.x or before
=========================================
Indices created in Elasticsearch 1.x or before will need to be reindexed with
Elasticsearch 2.x in order to be readable by Elasticsearch 5.x. The easiest
way to do this is to upgrade to Elasticsearch 2.3 or later and to use the
`reindex` API.
=========================================
[float]
=== Also see:
* <<breaking_50_search_changes>>
* <<breaking_50_mapping_changes>>
* <<breaking_50_percolator>>
* <<breaking_50_index_apis>>
* <<breaking_50_settings_changes>>
* <<breaking_50_allocation>>
* <<breaking_50_rest_api_changes>>
* <<breaking_50_cat_api>>
* <<breaking_50_parent_child_changes>>
* <<breaking_50_plugins>>
* <<breaking_50_java_api_changes>>
* <<breaking_50_cache_concurrency>>
* <<breaking_50_non_loopback>>
* <<breaking_50_thread_pool>>
* <<breaking_50_packaging>>
* <<breaking_50_scripting>>
* <<breaking_50_term_vectors>>
* <<breaking_50_security>>
* <<breaking_50_snapshot_restore>>
[[breaking_50_search_changes]]
=== Warmers
include::migrate_5_0/search.asciidoc[]
Thanks to several changes like doc values by default or disk-based norms,
warmers have become quite useless. As a consequence, warmers and the warmer
API have been removed: it is not possible anymore to register queries that
will run before a new IndexSearcher is published.
include::migrate_5_0/mapping.asciidoc[]
Don't worry if you have warmers defined on your indices, they will simply be
ignored when upgrading to 5.0.
include::migrate_5_0/percolator.asciidoc[]
=== Search changes
include::migrate_5_0/index-apis.asciidoc[]
==== `search_type=count` removed
include::migrate_5_0/settings.asciidoc[]
The `count` search type was deprecated in version 2.0.0 and has now been removed.
To get the same benefits, set the value of the `size` parameter to `0`.
include::migrate_5_0/allocation.asciidoc[]
For instance, the following request:
include::migrate_5_0/rest.asciidoc[]
[source,sh]
---------------
GET /my_index/_search?search_type=count
{
  "aggs": {
    "my_terms": {
      "terms": {
        "field": "foo"
      }
    }
  }
}
---------------
include::migrate_5_0/cat.asciidoc[]
can be replaced with:
include::migrate_5_0/java.asciidoc[]
[source,sh]
---------------
GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "my_terms": {
      "terms": {
        "field": "foo"
      }
    }
  }
}
---------------
include::migrate_5_0/packaging.asciidoc[]
==== `search_type=scan` removed
include::migrate_5_0/plugins.asciidoc[]
The `scan` search type was deprecated in version 2.1.0 and has now been removed.
All benefits of this search type can now be achieved with a scroll
request that sorts documents in `_doc` order, for instance:
[source,sh]
---------------
GET /my_index/_search?scroll=2m
{
  "sort": [
    "_doc"
  ]
}
---------------
Scroll requests sorted by `_doc` have been optimized to more efficiently resume
from where the previous request stopped, so this will have the same performance
characteristics as the former `scan` search type.
==== Boost accuracy for queries on `_all`
Per-field boosts on the `_all` field are now compressed into a single byte instead
of the 4 bytes used previously. While this makes the index more space-efficient, it
also means that the boosts are encoded less accurately.
[[breaking_50_rest_api_changes]]
=== REST API changes
==== id values longer than 512 bytes are rejected
When specifying an `_id` value longer than 512 bytes, the request will be
rejected.
==== search exists api removed
The search exists api has been removed in favour of using the search api with
`size` set to `0` and `terminate_after` set to `1`.
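As a minimal sketch of the replacement, assuming a hypothetical index `my_index` with a `user` field, the following request checks whether any document matches the query. A `hits.total` greater than zero in the response indicates that at least one matching document exists:
[source,sh]
---------------
GET /my_index/_search
{
  "size": 0,
  "terminate_after": 1,
  "query": {
    "term": {
      "user": "kimchy"
    }
  }
}
---------------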
==== `/_optimize` endpoint removed
The deprecated `/_optimize` endpoint has been removed. The `/_forcemerge`
endpoint should be used in lieu of optimize.
The `GET` HTTP verb is no longer supported for `/_forcemerge`; use the
`POST` HTTP verb instead.
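For instance, a former `/_optimize` call on a hypothetical index named `my_index` could be rewritten along these lines:
[source,sh]
---------------
POST /my_index/_forcemerge
---------------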
==== Deprecated queries removed
The following deprecated queries have been removed:
* `filtered`: use `bool` query instead, which supports `filter` clauses too
* `and`: use `must` clauses in a `bool` query instead
* `or`: use `should` clauses in a `bool` query instead
* `limit`: use `terminate_after` parameter instead
* `fquery`: obsolete after filters and queries have been merged
* `query`: obsolete after filters and queries have been merged
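As an illustration of the first replacement, a former `filtered` query with a `query` part and a `filter` part can be expressed as a `bool` query with a `must` clause and a `filter` clause. The index name, field names and values in this sketch are placeholders:
[source,sh]
---------------
GET /my_index/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": "search"
        }
      },
      "filter": {
        "term": {
          "status": "published"
        }
      }
    }
  }
}
---------------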
==== Unified fuzziness parameter
* Removed support for the deprecated `min_similarity` parameter in `fuzzy` query, in favour of `fuzziness`.
* Removed support for the deprecated `fuzzy_min_sim` parameter in `query_string` query, in favour of `fuzziness`.
* Removed support for the deprecated `edit_distance` parameter in completion suggester, in favour of `fuzziness`.
==== indices query
Removed support for the deprecated `filter` and `no_match_filter` fields in `indices` query,
in favour of `query` and `no_match_query`.
==== nested query
Removed support for the deprecated `filter` fields in `nested` query, in favour of `query`.
==== terms query
Removed support for the deprecated `minimum_should_match` and `disable_coord` in `terms` query, use `bool` query instead.
Removed also support for the deprecated `execution` parameter.
==== function_score query
Removed support for the top level `filter` element in `function_score` query, replaced by `query`.
==== highlighters
Removed support for multiple highlighter names, the only supported ones are: `plain`, `fvh` and `postings`.
==== top level filter
Removed support for the deprecated top level `filter` in the search api, replaced by `post_filter`.
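A request that previously used a top level `filter` might now look like the following sketch, where the index name, the `color` field and its value are placeholders:
[source,sh]
---------------
GET /my_index/_search
{
  "query": {
    "match_all": {}
  },
  "post_filter": {
    "term": {
      "color": "red"
    }
  }
}
---------------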
==== `query_binary` and `filter_binary` removed
Removed support for the undocumented `query_binary` and `filter_binary` sections of a search request.
==== `span_near`'s `collect_payloads` deprecated
Payloads are now loaded when needed.
[[breaking_50_cat_api]]
=== CAT API changes
==== Use Accept header for specifying response media type
Previous versions of Elasticsearch accepted the `Content-Type` header
field for controlling the media type of the response in the cat API.
This contradicts the HTTP spec, which specifies the `Accept`
header field for this purpose. Elasticsearch now uses the `Accept` header
field, and support for using the `Content-Type` header field for this
purpose has been removed.
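As a rough sketch, assuming a node listening on `localhost:9200`, a JSON response from the cat nodes API could be requested like this:
[source,sh]
---------------
curl -H 'Accept: application/json' 'localhost:9200/_cat/nodes'
---------------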
==== Host field removed from the cat nodes API
The `host` field has been removed from the cat nodes API as its value
is always equal to the `ip` field. The `name` field is available in the
cat nodes API and should be used instead of the `host` field.
==== Changes to cat recovery API
The fields `bytes_recovered` and `files_recovered` have been added to
the cat recovery API. These fields, respectively, indicate the total
number of bytes and files that have been recovered.
The fields `total_files` and `total_bytes` have been renamed to
`files_total` and `bytes_total`, respectively.
Additionally, the field `translog` has been renamed to
`translog_ops_recovered`, the field `translog_total` to
`translog_ops` and the field `translog_percent` to
`translog_ops_percent`. The short aliases for these fields are `tor`,
`to`, and `top`, respectively.
[[breaking_50_parent_child_changes]]
=== Parent/Child changes
The `children` aggregation, parent/child inner hits, and the `has_child` and `has_parent` queries will not work on indices
with a `_parent` field mapping created before version `2.0.0`. The data in these indices needs to be reindexed into a new index.
The format of the join between parent and child documents changed with the `2.0.0` release. The old
format can't be read from version `5.0.0` onwards. The new format allows for a much more efficient and
scalable join between parent and child documents, and the join data structures are now stored on disk
rather than in the JVM heap space as before.
==== `score_type` has been removed
The `score_type` option has been removed from the `has_child` and `has_parent` queries in favour of the `score_mode` option
which does the exact same thing.
==== `sum` score mode removed
The `sum` score mode has been removed in favour of the `total` mode which does the same and is already available in
previous versions.
==== `max_children` option
When `max_children` was set to `0` on the `has_child` query, there was no upper limit on how many child documents
were allowed to match. This has changed: `0` now really means that zero child documents are allowed. If no upper limit
is needed, the `max_children` option shouldn't be defined at all on the `has_child` query.
==== `_parent` field no longer indexed
The join between parent and child documents no longer relies on indexed fields, and therefore from `5.0.0` onwards
the `_parent` field is no longer indexed. To find documents that refer to a specific parent id,
the new `parent_id` query can be used. The get response and the hits inside the search response still include
the parent id under the `_parent` key.
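A minimal sketch of a `parent_id` query, assuming a hypothetical child type `blog_tag` whose parent document has the id `1`:
[source,sh]
---------------
GET /my_index/_search
{
  "query": {
    "parent_id": {
      "type": "blog_tag",
      "id": "1"
    }
  }
}
---------------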
[[breaking_50_settings_changes]]
=== Settings changes
From Elasticsearch 5.0 on, all settings are validated before they are applied. Node-level and default index-level
settings are validated on node startup; dynamic cluster and index settings are validated before they are updated or added
to the cluster state. Every setting must be a _known_ setting; in other words, all settings must be registered with the
node or transport client they are used with. This implies that plugins that define custom settings must register all of their
settings during plugin loading using the `SettingsModule#registerSettings(Setting)` method.
==== Node settings
The `name` setting has been removed and is replaced by `node.name`. Usage of `-Dname=some_node_name` is not supported
anymore.
==== Transport Settings
All settings with a `netty` infix have been replaced by their already existing `transport` synonyms. For instance `transport.netty.bind_host` is
no longer supported and should be replaced by the superseding setting `transport.bind_host`.
==== Analysis settings
The `index.analysis.analyzer.default_index` analyzer is not supported anymore.
If you wish to change the analyzer to use for indexing, change the
`index.analysis.analyzer.default` analyzer instead.
==== Ping timeout settings
Previously, there were three settings for the ping timeout: `discovery.zen.initial_ping_timeout`,
`discovery.zen.ping.timeout` and `discovery.zen.ping_timeout`. The former two have been removed and
the only setting key for the ping timeout is now `discovery.zen.ping_timeout`. The default value for
ping timeouts remains at three seconds.
==== Recovery settings
Recovery settings deprecated in 1.x have been removed:
* `index.shard.recovery.translog_size` is superseded by `indices.recovery.translog_size`
* `index.shard.recovery.translog_ops` is superseded by `indices.recovery.translog_ops`
* `index.shard.recovery.file_chunk_size` is superseded by `indices.recovery.file_chunk_size`
* `index.shard.recovery.concurrent_streams` is superseded by `indices.recovery.concurrent_streams`
* `index.shard.recovery.concurrent_small_file_streams` is superseded by `indices.recovery.concurrent_small_file_streams`
* `indices.recovery.max_size_per_sec` is superseded by `indices.recovery.max_bytes_per_sec`
If you are using any of these settings, please take the time to review their purpose. All of the settings above are considered
_expert settings_ and should only be used if absolutely necessary. If you have set any of the above settings as persistent
cluster settings, please use the settings update API to set their superseding keys accordingly.
The following settings have been removed without replacement:
* `indices.recovery.concurrent_small_file_streams` - recoveries are now single threaded. The number of concurrent outgoing recoveries is throttled via allocation deciders
* `indices.recovery.concurrent_file_streams` - recoveries are now single threaded. The number of concurrent outgoing recoveries is throttled via allocation deciders
==== Translog settings
The `index.translog.flush_threshold_ops` setting is not supported anymore. In order to control flushes based on the transaction log
growth use `index.translog.flush_threshold_size` instead. Changing the translog type with `index.translog.fs.type` is not supported
anymore, the `buffered` implementation is now the only available option and uses a fixed `8kb` buffer.
The translog is fsynced by default on a per-request basis, so the ability to fsync on every operation is no longer necessary. In fact, it can
be a performance bottleneck, and it is trappy since it is enabled by a special value set on `index.translog.sync_interval`. `index.translog.sync_interval`
no longer accepts a value less than `100ms`, which prevents fsyncing too often if async durability is enabled. The special value `0` is not supported anymore.
==== Request Cache Settings
The deprecated settings `index.cache.query.enable` and `indices.cache.query.size` have been removed and are replaced with
`index.requests.cache.enable` and `indices.requests.cache.size` respectively.
`indices.requests.cache.clean_interval` has been replaced with `indices.cache.clean_interval` and is no longer supported.
==== Field Data Cache Settings
`indices.fielddata.cache.clean_interval` has been replaced with `indices.cache.clean_interval` and is no longer supported.
==== Allocation settings
Allocation settings deprecated in 1.x have been removed:
* `cluster.routing.allocation.concurrent_recoveries` is superseded by `cluster.routing.allocation.node_concurrent_recoveries`
Please change the setting in your configuration files or in the cluster state to use the new settings instead.
==== Similarity settings
The 'default' similarity has been renamed to 'classic'.
==== Indexing settings
`indices.memory.min_shard_index_buffer_size` and `indices.memory.max_shard_index_buffer_size` are removed since Elasticsearch now allows any one shard to use any
amount of heap as long as the total indexing buffer heap used across all shards is below the node's `indices.memory.index_buffer_size` (default: 10% of the JVM heap).
==== Removed es.max-open-files
Setting the system property es.max-open-files to true to get
Elasticsearch to print the number of maximum open files for the
Elasticsearch process has been removed. This same information can be
obtained from the <<cluster-nodes-info>> API, and a warning is logged
on startup if it is set too low.
==== Removed es.netty.gathering
Disabling Netty from using NIO gathering could be done via the escape
hatch of setting the system property "es.netty.gathering" to "false".
Time has proven enabling gathering by default is a non-issue and this
non-documented setting has been removed.
==== Removed es.useLinkedTransferQueue
The system property `es.useLinkedTransferQueue` could be used to
control the queue implementation used in the cluster service and the
handling of ping responses during discovery. This was an undocumented
setting and has been removed.
[[breaking_50_mapping_changes]]
=== Mapping changes
==== Default doc values settings
Doc values are now also on by default on numeric and boolean fields that are
not indexed.
==== Transform removed
The `transform` feature from mappings has been removed. It made issues very hard to debug.
==== Default number mappings
When a floating-point number is encountered, it is now dynamically mapped as a
float by default instead of a double. The reasoning is that floats should be
more than enough for most cases but would decrease storage requirements
significantly.
==== `index` property
On all types but `string`, the `index` property now only accepts `true`/`false`
instead of `not_analyzed`/`no`. The `string` field still accepts
`analyzed`/`not_analyzed`/`no`.
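As a sketch, a numeric field that should not be searchable could be mapped as follows. The index, type and field names are placeholders:
[source,sh]
---------------
PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "session_count": {
          "type": "integer",
          "index": false
        }
      }
    }
  }
}
---------------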
==== ++_source++'s `format` option
The `_source` mapping does not support the `format` option anymore. This option
will still be accepted for indices created before the upgrade to 5.0 for backward
compatibility, but it will have no effect. Indices created on or after 5.0 will
reject this option.
==== Object notation
Core types don't support the object notation anymore, which allowed values to be
provided as follows:
[source,json]
---------------
{
  "value": "field_value",
  "boost": 42
}
---------------
==== `fielddata.format`
Setting `fielddata.format: doc_values` in the mappings used to implicitly
enable doc values on a field. This no longer works: the only way to enable or
disable doc values is by using the `doc_values` property of mappings.
[[breaking_50_plugins]]
=== Plugin changes
The command `bin/plugin` has been renamed to `bin/elasticsearch-plugin`.
The structure of the plugin has changed. All the plugin files must be contained in a directory called `elasticsearch`.
If you use the gradle build, this structure is automatically generated.
==== Site plugins removed
Site plugins have been removed. It is recommended to migrate site plugins to Kibana plugins.
==== Multicast plugin removed
Multicast has been removed. Use unicast discovery, or one of the cloud discovery plugins.
==== Plugins with custom query implementations
Plugins implementing custom queries need to implement the `fromXContent(QueryParseContext)` method in their
`QueryParser` subclass rather than `parse`. This method takes care of parsing the query from `XContent` format
into an intermediate query representation that can be streamed between the nodes in binary format, effectively the
query object used in the Java API. The query parser also needs to implement the `getBuilderPrototype` method, which
returns a prototype of the `NamedWriteable` query. The prototype makes it possible to deserialize an incoming query by calling
`readFrom(StreamInput)` against it, which creates a new object (see usages of `Writeable`). The `QueryParser`
also needs to declare the generic type of the query that it supports and is able to parse.
The query object can then transform itself into a Lucene query through the new `toQuery(QueryShardContext)` method,
which returns a Lucene query to be executed on the data node.
Similarly, plugins implementing custom score functions need to implement the `fromXContent(QueryParseContext)`
method in their `ScoreFunctionParser` subclass rather than `parse`. This method takes care of parsing
the function from `XContent` format into an intermediate function representation that can be streamed between
the nodes in binary format, effectively the function object used in the Java API. The function parser also needs
to implement the `getBuilderPrototype` method, which returns a prototype of the `NamedWriteable` function. The prototype
makes it possible to deserialize an incoming function by calling `readFrom(StreamInput)` against it, which creates a
new object (see usages of `Writeable`). The `ScoreFunctionParser` also needs to declare the generic type of the
function that it supports and is able to parse. The function object can then transform itself into a Lucene
function through the new `toFunction(QueryShardContext)` method, which returns a Lucene function to be executed
on the data node.
==== Cloud AWS plugin changes
Cloud AWS plugin has been split in two plugins:
* {plugins}/discovery-ec2.html[Discovery EC2 plugin]
* {plugins}/repository-s3.html[Repository S3 plugin]
Proxy settings for both plugins have been renamed:
* from `cloud.aws.proxy_host` to `cloud.aws.proxy.host`
* from `cloud.aws.ec2.proxy_host` to `cloud.aws.ec2.proxy.host`
* from `cloud.aws.s3.proxy_host` to `cloud.aws.s3.proxy.host`
* from `cloud.aws.proxy_port` to `cloud.aws.proxy.port`
* from `cloud.aws.ec2.proxy_port` to `cloud.aws.ec2.proxy.port`
* from `cloud.aws.s3.proxy_port` to `cloud.aws.s3.proxy.port`
==== Cloud Azure plugin changes
Cloud Azure plugin has been split in three plugins:
* {plugins}/discovery-azure.html[Discovery Azure plugin]
* {plugins}/repository-azure.html[Repository Azure plugin]
* {plugins}/store-smb.html[Store SMB plugin]
If you were using the `cloud-azure` plugin for snapshot and restore, you had in `elasticsearch.yml`:
[source,yaml]
-----
cloud:
    azure:
        storage:
            account: your_azure_storage_account
            key: your_azure_storage_key
-----
You need to give a unique id to the storage details now as you can define multiple storage accounts:
[source,yaml]
-----
cloud:
    azure:
        storage:
            my_account:
                account: your_azure_storage_account
                key: your_azure_storage_key
-----
==== Cloud GCE plugin changes
Cloud GCE plugin has been renamed to {plugins}/discovery-gce.html[Discovery GCE plugin].
==== Mapper Attachments plugin deprecated
Mapper attachments has been deprecated. Users should now use the {plugins}/ingest-attachment.html[`ingest-attachment`]
plugin.
[[breaking_50_java_api_changes]]
=== Java API changes
==== Count api has been removed
The deprecated count api has been removed from the Java api, use the search api instead and set size to 0.
The following call
[source,java]
-----
client.prepareCount(indices).setQuery(query).get();
-----
can be replaced with
[source,java]
-----
client.prepareSearch(indices).setSource(new SearchSourceBuilder().size(0).query(query)).get();
-----
==== BoostingQueryBuilder
Removed setters for mandatory positive/negative query. Both arguments now have
to be supplied at construction time already and have to be non-null.
==== SpanContainingQueryBuilder
Removed setters for mandatory big/little inner span queries. Both arguments now have
to be supplied at construction time already and have to be non-null. Updated
static factory methods in QueryBuilders accordingly.
==== SpanOrQueryBuilder
Making sure that query contains at least one clause by making initial clause mandatory
in constructor.
==== SpanNearQueryBuilder
Removed setter for mandatory slop parameter, needs to be set in constructor now. Also
making sure that query contains at least one clause by making initial clause mandatory
in constructor. Updated the static factory methods in QueryBuilders accordingly.
==== SpanNotQueryBuilder
Removed setter for mandatory include/exclude span query clause, needs to be set in constructor now.
Updated the static factory methods in QueryBuilders and tests accordingly.
==== SpanWithinQueryBuilder
Removed setters for mandatory big/little inner span queries. Both arguments now have
to be supplied at construction time already and have to be non-null. Updated
static factory methods in QueryBuilders accordingly.
==== QueryFilterBuilder
Removed the setter `queryName(String queryName)` since this field is not supported
in this type of query. Use `FQueryFilterBuilder.queryName(String queryName)` instead
when in need to wrap a named query as a filter.
==== WrapperQueryBuilder
Removed `wrapperQueryBuilder(byte[] source, int offset, int length)`. Instead simply
use `wrapperQueryBuilder(byte[] source)`. Updated the static factory methods in
QueryBuilders accordingly.
==== QueryStringQueryBuilder
Removed ability to pass in boost value using `field(String field)` method in form e.g. `field^2`.
Use the `field(String, float)` method instead.
==== Operator
Removed the enums called `Operator` from `MatchQueryBuilder`, `QueryStringQueryBuilder`,
`SimpleQueryStringBuilder`, and `CommonTermsQueryBuilder` in favour of using the enum
defined in `org.elasticsearch.index.query.Operator` in an effort to consolidate the
codebase and avoid duplication.
==== queryName and boost support
Support for `queryName` and `boost` has been streamlined across all of the queries. This is
a breaking change as long as queries are sent over the network as serialized JSON rather
than in `Streamable` format. In fact, whenever additional fields are added to the JSON
representation of a query, older nodes might throw an error when they find unknown fields.
==== InnerHitsBuilder
InnerHitsBuilder now has dedicated addParentChildInnerHits and addNestedInnerHits methods
to differentiate between inner hits for nested vs. parent / child documents. This change
makes the type / path parameter mandatory.
==== MatchQueryBuilder
Moving MatchQueryBuilder.Type and MatchQueryBuilder.ZeroTermsQuery enum to MatchQuery.Type.
Also reusing new Operator enum.
==== MoreLikeThisQueryBuilder
Removed `MoreLikeThisQueryBuilder.Item#id(String id)`, `Item#doc(BytesReference doc)`,
`Item#doc(XContentBuilder doc)`. Use provided constructors instead.
Removed `MoreLikeThisQueryBuilder#addLike` in favor of texts and/or items being provided
at construction time. Using arrays there instead of lists now.
Removed `MoreLikeThisQueryBuilder#addUnlike` in favor of using the `unlike` methods,
which now take arrays as arguments rather than the lists used before.
The deprecated `docs(Item... docs)`, `ignoreLike(Item... docs)`,
`ignoreLike(String... likeText)`, `addItem(Item... likeItems)` have been removed.
==== GeoDistanceQueryBuilder
Removing individual setters for lon() and lat() values, both values should be set together
using point(lon, lat).
==== GeoDistanceRangeQueryBuilder
Removing setters for to(Object ...) and from(Object ...) in favour of the only two allowed input
arguments (String, Number). Removing setter for center point (point(), geohash()) because parameter
is mandatory and should already be set in constructor.
Also removing setters for lt(), lte(), gt(), gte() since they can all be replaced by equivalent
calls to to/from() and includeLower()/includeUpper().
==== GeoPolygonQueryBuilder
Require shell of polygon already to be specified in constructor instead of adding it pointwise.
This enables validation, but makes it necessary to remove the addPoint() methods.
==== MultiMatchQueryBuilder
Moving MultiMatchQueryBuilder.ZeroTermsQuery enum to MatchQuery.ZeroTermsQuery.
Also reusing new Operator enum.
Removed ability to pass in boost value using `field(String field)` method in form e.g. `field^2`.
Use the `field(String, float)` method instead.
==== MissingQueryBuilder
The MissingQueryBuilder which was deprecated in 2.2.0 is removed. As a replacement use ExistsQueryBuilder
inside a mustNot() clause. So instead of using `new MissingQueryBuilder(name)` now use
`new BoolQueryBuilder().mustNot(new ExistsQueryBuilder(name))`.
==== NotQueryBuilder
The NotQueryBuilder which was deprecated in 2.1.0 is removed. As a replacement use BoolQueryBuilder
with added mustNot() clause. So instead of using `new NotQueryBuilder(filter)` now use
`new BoolQueryBuilder().mustNot(filter)`.
==== TermsQueryBuilder
Remove the setter for `termsLookup()`, making it only possible to either use a TermsLookup object or
individual values at construction time. Also moving individual settings for the TermsLookup (lookupIndex,
lookupType, lookupId, lookupPath) to the separate TermsLookup class, using constructor only and moving
checks for validation there. Removed `TermsLookupQueryBuilder` in favour of `TermsQueryBuilder`.
==== FunctionScoreQueryBuilder
`add` methods have been removed, all filters and functions must be provided as constructor arguments by
creating an array of `FunctionScoreQueryBuilder.FilterFunctionBuilder` objects, containing one element
for each filter/function pair.
`scoreMode` and `boostMode` can only be provided using corresponding enum members instead
of string values: see `FilterFunctionScoreQuery.ScoreMode` and `CombineFunction`.
`CombineFunction.MULT` has been renamed to `MULTIPLY`.
==== IdsQueryBuilder
For simplicity, only one way of adding the ids to the existing list (empty by default) is left: `addIds(String...)`
==== DocumentAlreadyExistsException removed
`DocumentAlreadyExistsException` is removed and a `VersionConflictException` is thrown instead (with a better
error description). This will affect code that uses `IndexRequest.opType()` or `IndexRequest.create()`
to index a document only if it doesn't already exist.
==== ShapeBuilders
`InternalLineStringBuilder` is removed in favour of `LineStringBuilder`, `InternalPolygonBuilder` in favour of `PolygonBuilder` and `Ring` has been replaced with `LineStringBuilder`. Also the abstract base classes `BaseLineStringBuilder` and `BasePolygonBuilder` have been merged with their corresponding implementations.
==== RescoreBuilder
`RescoreBuilder.Rescorer` was merged with `RescoreBuilder`, which is now an abstract superclass. `QueryRescoreBuilder` is currently its only implementation.
==== PhraseSuggestionBuilder
The inner DirectCandidateGenerator class has been moved out to its own class called DirectCandidateGeneratorBuilder.
==== Elasticsearch will no longer detect logging implementations
Elasticsearch now logs only to log4j 1.2. Previously, if log4j wasn't on the classpath, it made some effort to degrade to
slf4j or java.util.logging. Now it will fail to work without the log4j 1.2 API. The log4j-over-slf4j bridge ought to work
when using the Java client, as should log4j 2's log4j-1.2-api. The Elasticsearch server now only supports log4j as
configured by logging.yml and no longer makes any effort to work if log4j isn't present.
[[breaking_50_cache_concurrency]]
=== Cache concurrency level settings removed
The two cache concurrency level settings `indices.requests.cache.concurrency_level` and
`indices.fielddata.cache.concurrency_level` have been removed because they no longer apply to the cache
implementation used for the request cache and the field data cache.
[[breaking_50_non_loopback]]
=== Remove bind option of `non_loopback`
This setting would arbitrarily pick the first interface not marked as loopback. Instead, specify by address
scope (e.g. `_local_,_site_` for all loopback and private network addresses) or by explicit interface names,
hostnames, or addresses.
[[breaking_50_thread_pool]]
=== Forbid changing of thread pool types
Previously, <<modules-threadpool,thread pool types>> could be dynamically adjusted. The thread pool type effectively
controls the backing queue for the thread pool and modifying this is an expert setting with minimal practical benefits
and high risk of being misused. The ability to change the thread pool type for any thread pool has been removed; do note
that it is still possible to adjust relevant thread pool parameters for each of the thread pools (e.g., depending on
the thread pool type, `keep_alive`, `queue_size`, etc.).
[[breaking_50_cpu_stats]]
=== System CPU stats
The recent CPU usage (as a percent) has been added to the OS stats
reported under the node stats API and the cat nodes API. The breaking
change here is that there is a new object in the `os` object in the node
stats response. This object is called `cpu` and includes `percent` and
`load_average` as fields. This moves the `load_average` field that was
previously a top-level field in the `os` object to the `cpu` object. The
format of the `load_average` field has changed to an object with fields
`1m`, `5m`, and `15m` representing the one-minute, five-minute and
fifteen-minute loads respectively. If any of these fields are not present,
it indicates that the corresponding value is not available.
In the cat nodes API response, the `cpu` field is output by default. The
previous `load` field has been removed and is replaced by `load_1m`,
`load_5m`, and `load_15m` which represent the one-minute, five-minute
and fifteen-minute loads respectively. The field will be null if the
corresponding value is not available.
Finally, the API for `org.elasticsearch.monitor.os.OsStats` has
changed. The `getLoadAverage` method has been removed. The value for
this can now be obtained from `OsStats.Cpu#getLoadAverage` but it is no
longer a double and is instead an object encapsulating the one-minute,
five-minute and fifteen-minute load averages. Additionally, the recent
CPU usage can be obtained from `OsStats.Cpu#getPercent`.
=== Fields option
Only stored fields are retrievable with this option.
The `fields` option can no longer load non-stored fields from `_source`.
[[breaking_50_allocation]]
=== Primary shard allocation
Previously, primary shards were only assigned if a quorum of shard copies were found (configurable using
`index.recovery.initial_shards`, now deprecated). In the case where a primary had only a single replica, quorum was defined
to be a single shard. This meant that any shard copy of an index with replication factor 1 could become primary, even if it
was a stale copy of the data on disk. This is now fixed by using allocation IDs.
Allocation IDs assign unique identifiers to shard copies. This allows the cluster to differentiate between multiple
copies of the same data and track which shards have been active, so that after a cluster restart, shard copies
containing only the most recent data can become primaries.
=== Indices Shard Stores command
By using allocation IDs instead of version numbers to identify shard copies for primary shard allocation, the former versioning scheme
has become obsolete. This is reflected in the indices-shards-stores.html[Indices Shard Stores API]. A new field `allocation_id` replaces the
former `version` field in the result of the Indices Shard Stores command. This field is available for all shard copies that have been either
created with the current version of Elasticsearch or have been active in a cluster running a current version of Elasticsearch. For legacy
shard copies that have not been active in a current version of Elasticsearch, a `legacy_version` field is available instead (equivalent to
the former `version` field).
=== Reroute commands
The reroute command `allocate` has been split into two distinct commands `allocate_replica` and `allocate_empty_primary`.
This was done as we introduced a new `allocate_stale_primary` command. The new `allocate_replica` command corresponds to the
old `allocate` command with `allow_primary` set to false. The new `allocate_empty_primary` command corresponds to the old
`allocate` command with `allow_primary` set to true.
==== `index.shared_filesystem.recover_on_any_node` changes
The behavior of `index.shared_filesystem.recover_on_any_node = true` has been changed. Previously, in the case where no
shard copies could be found, an arbitrary node was chosen by potentially ignoring allocation deciders. Now, we take
balancing into account but don't assign the shard if the allocation deciders are not satisfied. The behavior has also changed
in the case where shard copies can be found. Previously, a node not holding the shard copy was chosen if none of the nodes
holding shard copies were satisfying the allocation deciders. Now, the shard will be assigned to a node having a shard copy,
even if none of the nodes holding a shard copy satisfy the allocation deciders.
[[breaking_50_percolator]]
=== Percolator
Adding percolator queries and modifications to existing percolator queries are no longer immediately visible
to the percolator. A refresh is required before the changes are visible to the percolator.
The reason that this has changed is that on newly created indices the percolator automatically indexes the query terms
and these query terms are used at percolate time to reduce the number of queries the percolate API needs to evaluate.
This optimization didn't work in the percolate API mode where modifications to queries are immediately visible.
The percolator by default sets the `size` option to `10` whereas before this was set to unlimited.
The percolate api can no longer accept documents that have fields that don't exist in the mapping.
When percolating an existing document, specifying a document in the source of the percolate request is no longer
allowed.
The percolate api no longer modifies the mappings. Before the percolate api could be used to dynamically introduce new
fields to the mappings based on the fields in the document being percolated. This no longer works, because these
unmapped fields are not persisted in the mapping.
Percolator documents are no longer excluded from the search response.
[[breaking_50_packaging]]
=== Packaging
==== Default logging using systemd (since Elasticsearch 2.2.0)
In previous versions of Elasticsearch, the default logging
configuration routed standard output to /dev/null and standard error to
the journal. However, there are often critical error messages at
startup that are logged to standard output rather than standard error
and these error messages would be lost to the nether. The default has
changed to now route standard output to the journal and standard error
to inherit this setting (these are the defaults for systemd). These
settings can be modified by editing the elasticsearch.service file.
==== Longer startup times
In Elasticsearch 5.0.0 the `-XX:+AlwaysPreTouch` flag has been added to the JVM
startup options. This option touches all memory pages used by the JVM heap
during initialization of the HotSpot VM to reduce the chance of having to commit
a memory page during GC time. This will increase the startup time of
Elasticsearch as well as increasing the initial resident memory usage of the
Java process.
[[breaking_50_scripting]]
=== Scripting
==== Script mode settings
Previously script mode settings (e.g., `script.inline: true`,
`script.engine.groovy.inline.aggs: false`, etc.) accepted the values
`on`, `true`, `1`, and `yes` for enabling a scripting mode, and the
values `off`, `false`, `0`, and `no` for disabling a scripting mode.
The variants `on`, `1`, and `yes` for enabling and `off`, `0`,
and `no` for disabling are no longer supported.
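Existing configuration values can be normalized mechanically before upgrading. The following is a minimal Python sketch and not part of Elasticsearch: the helper name `normalize_script_mode` is hypothetical, the two sets are taken from the lists above, and any other value is assumed to be something the helper should leave alone.

```python
# Spellings accepted before 5.0, grouped by meaning.
_ENABLED = {"on", "true", "1", "yes"}
_DISABLED = {"off", "false", "0", "no"}


def normalize_script_mode(value):
    """Map a legacy script mode value to "true" or "false" (hypothetical helper).

    Values outside the two legacy sets are returned unchanged so that the
    caller can decide how to handle them.
    """
    text = str(value).strip().lower()
    if text in _ENABLED:
        return "true"
    if text in _DISABLED:
        return "false"
    return value
```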
==== Groovy dependencies
In previous versions of Elasticsearch, the Groovy scripting capabilities
depended on the `org.codehaus.groovy:groovy-all` artifact. In addition
to pulling in the Groovy language, this pulls in a very large set of
functionality, none of which is needed for scripting within
Elasticsearch. Aside from the inherent difficulties in managing such a
large set of dependencies, this also increases the surface area for
security issues. This dependency has been reduced to the core Groovy
language `org.codehaus.groovy:groovy` artifact.
[[breaking_50_term_vectors]]
=== Term vectors
The term vectors APIs no longer persist unmapped fields in the mappings.
The `dfs` parameter has been removed completely: term vectors no longer support
distributed document frequencies.
[[breaking_50_security]]
=== Security
The option to disable the security manager `--security.manager.enabled` has been removed. In order to grant special
permissions to Elasticsearch, users must tweak the local Java Security Policy.
[[breaking_50_snapshot_restore]]
=== Snapshot/Restore
==== Closing / deleting indices while running snapshot
In previous versions of Elasticsearch, closing or deleting an index during a full snapshot would make the snapshot fail. This is now changed
by failing the close/delete index request instead. The behavior for partial snapshots remains unchanged: Closing or deleting an index during
a partial snapshot is still possible. The snapshot result is then marked as partial.

@ -0,0 +1,54 @@
[[breaking_50_allocation]]
=== Allocation changes
==== Primary shard allocation
Previously, primary shards were only assigned if a quorum of shard copies were
found (configurable using `index.recovery.initial_shards`, now deprecated). In
the case where a primary had only a single replica, quorum was defined to be a
single shard. This meant that any shard copy of an index with replication
factor 1 could become primary, even if it was a stale copy of the data on disk.
This is now fixed thanks to shard allocation IDs.
Allocation IDs assign unique identifiers to shard copies. This allows the
cluster to differentiate between multiple copies of the same data and track
which shards have been active so that, after a cluster restart, only shard
copies containing the most recent data can become primaries.
==== Indices Shard Stores command
By using allocation IDs instead of version numbers to identify shard copies
for primary shard allocation, the former versioning scheme has become
obsolete. This is reflected in the
<<indices-shards-stores,Indices Shard Stores API>>.
A new `allocation_id` field replaces the former `version` field in the result
of the Indices Shard Stores command. This field is available for all shard
copies that have been either created with the current version of Elasticsearch
or have been active in a cluster running a current version of Elasticsearch.
For legacy shard copies that have not been active in a current version of
Elasticsearch, a `legacy_version` field is available instead (equivalent to
the former `version` field).
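Client code that reads shard store entries may need to cope with both kinds of shard copy. The following is a minimal Python sketch under that assumption: the helper name `shard_copy_identifier` is hypothetical, and only the two field names come from the text above; the rest of the entry layout is not assumed.

```python
def shard_copy_identifier(store):
    """Return (field_name, value) identifying one shard copy.

    `store` is a dict for a single shard copy. Copies that have been active
    on a current version carry `allocation_id`; legacy copies carry
    `legacy_version` instead.
    """
    if "allocation_id" in store:
        return ("allocation_id", store["allocation_id"])
    if "legacy_version" in store:
        return ("legacy_version", store["legacy_version"])
    return (None, None)
```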
==== Reroute commands
The reroute command `allocate` has been split into two distinct commands
`allocate_replica` and `allocate_empty_primary`. This was done as we
introduced a new `allocate_stale_primary` command. The new `allocate_replica`
command corresponds to the old `allocate` command with `allow_primary` set to
false. The new `allocate_empty_primary` command corresponds to the old
`allocate` command with `allow_primary` set to true.
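Scripts that generate reroute requests can be migrated by renaming the command according to its `allow_primary` flag. The following is a minimal Python sketch of that mapping only: the helper name `migrate_allocate_command` is hypothetical, a missing `allow_primary` is assumed to mean false, and any extra parameters the new commands may require are not covered here.

```python
def migrate_allocate_command(command):
    """Rewrite a pre-5.0 {"allocate": {...}} reroute command to its new name."""
    (name, params), = command.items()  # a command is a single-key dict
    if name != "allocate":
        return command  # not an old-style command, leave it untouched
    params = dict(params)  # copy so the caller's dict is not modified
    # Assumption: an absent allow_primary behaves like false.
    allow_primary = params.pop("allow_primary", False)
    new_name = "allocate_empty_primary" if allow_primary else "allocate_replica"
    return {new_name: params}
```

The new commands should be checked against the reroute API documentation before use, since this sketch only renames the command and drops the old flag.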
==== `index.shared_filesystem.recover_on_any_node` changes
The behavior of `index.shared_filesystem.recover_on_any_node: true` has been
changed. Previously, in the case where no shard copies could be found, an
arbitrary node was chosen by potentially ignoring allocation deciders. Now, we
take balancing into account but don't assign the shard if the allocation
deciders are not satisfied.
The behavior has also changed in the case where shard copies can be found.
Previously, a node not holding the shard copy was chosen if none of the nodes
holding shard copies were satisfying the allocation deciders. Now, the shard
will be assigned to a node having a shard copy, even if none of the nodes
holding a shard copy satisfy the allocation deciders.

@ -0,0 +1,33 @@
[[breaking_50_cat_api]]
=== CAT API changes
==== Use Accept header for specifying response media type
Previous versions of Elasticsearch accepted the Content-type header
field for controlling the media type of the response in the cat API.
This is in opposition to the HTTP spec which specifies the Accept
header field for this purpose. Elasticsearch now uses the Accept header
field and support for using the Content-Type header field for this
purpose has been removed.
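For clients that previously set `Content-Type` to choose the cat response format, the change amounts to sending `Accept` instead. The following is a minimal Python sketch using only the standard library: the helper name `cat_request`, the host `localhost:9200` and the `application/json` media type are illustrative assumptions, and the request is built but not sent.

```python
import urllib.request


def cat_request(base_url, path, media_type="application/json"):
    """Build (but do not send) a cat API request asking for `media_type`."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Accept": media_type},  # Accept, not Content-Type
    )


request = cat_request("http://localhost:9200", "/_cat/nodes")
```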
==== Host field removed from the cat nodes API
The `host` field has been removed from the cat nodes API as its value
is always equal to the `ip` field. The `name` field is available in the
cat nodes API and should be used instead of the `host` field.
==== Changes to cat recovery API
The fields `bytes_recovered` and `files_recovered` have been added to
the cat recovery API. These fields, respectively, indicate the total
number of bytes and files that have been recovered.
The fields `total_files` and `total_bytes` have been renamed to
`files_total` and `bytes_total`, respectively.
Additionally, the field `translog` has been renamed to
`translog_ops_recovered`, the field `translog_total` to
`translog_ops` and the field `translog_percent` to
`translog_ops_percent`. The short aliases for these fields are `tor`,
`to`, and `top`, respectively.
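Monitoring scripts that request specific cat recovery columns can translate the old names with a lookup table. The following is a minimal Python sketch: the renames are the ones listed above, while the helper name `migrate_recovery_columns` is hypothetical.

```python
# Old cat recovery column name -> new column name, as listed above.
RENAMED_RECOVERY_COLUMNS = {
    "total_files": "files_total",
    "total_bytes": "bytes_total",
    "translog": "translog_ops_recovered",
    "translog_total": "translog_ops",
    "translog_percent": "translog_ops_percent",
}


def migrate_recovery_columns(columns):
    """Replace renamed column names, keeping order and unknown names as-is."""
    return [RENAMED_RECOVERY_COLUMNS.get(name, name) for name in columns]
```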

@ -0,0 +1,48 @@
[[breaking_50_index_apis]]
=== Index APIs changes
==== Closing / deleting indices while running snapshot
In previous versions of Elasticsearch, closing or deleting an index during a
full snapshot would make the snapshot fail. In 5.0, the close/delete index
request will fail instead. The behavior for partial snapshots remains
unchanged: Closing or deleting an index during a partial snapshot is still
possible. The snapshot result is then marked as partial.
==== Warmers
Thanks to several changes like doc values by default and disk-based norms,
warmers are no longer useful. As a consequence, warmers and the warmer API
have been removed: it is no longer possible to register queries that will run
before a new IndexSearcher is published.
Don't worry if you have warmers defined on your indices: they will simply be
ignored when upgrading to 5.0.
==== System CPU stats
The recent CPU usage (as a percent) has been added to the OS stats
reported under the node stats API and the cat nodes API. The breaking
change here is that there is a new object in the `os` object in the node
stats response. This object is called `cpu` and includes `percent` and
`load_average` as fields. This moves the `load_average` field that was
previously a top-level field in the `os` object to the `cpu` object. The
format of the `load_average` field has changed to an object with fields
`1m`, `5m`, and `15m` representing the one-minute, five-minute and
fifteen-minute loads respectively. If any of these fields are not present,
it indicates that the corresponding value is not available.
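Code that consumes the node stats response has to look one level deeper and tolerate missing load values. The following is a minimal Python sketch of that lookup: the helper name `load_averages` is hypothetical, and the input is assumed to be the `os` object of a single node, already parsed into a dict.

```python
def load_averages(os_stats):
    """Return the 1m/5m/15m loads from a 5.0-style `os` object.

    A load that is absent from the response is reported as None, matching
    the "not available" meaning described above.
    """
    load = os_stats.get("cpu", {}).get("load_average", {})
    return {key: load.get(key) for key in ("1m", "5m", "15m")}
```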
In the cat nodes API response, the `cpu` field is output by default. The
previous `load` field has been removed and is replaced by `load_1m`,
`load_5m`, and `load_15m` which represent the one-minute, five-minute
and fifteen-minute loads respectively. The field will be null if the
corresponding value is not available.
Finally, the API for `org.elasticsearch.monitor.os.OsStats` has
changed. The `getLoadAverage` method has been removed. The value for
this can now be obtained from `OsStats.Cpu#getLoadAverage` but it is no
longer a double and is instead an object encapsulating the one-minute,
five-minute and fifteen-minute load averages. Additionally, the recent
CPU usage can be obtained from `OsStats.Cpu#getPercent`.

@ -0,0 +1,213 @@
[[breaking_50_java_api_changes]]
=== Java API changes
==== Count api has been removed
The deprecated count api has been removed from the Java api, use the search api instead and set size to 0.
The following call
[source,java]
-----
client.prepareCount(indices).setQuery(query).get();
-----
can be replaced with
[source,java]
-----
client.prepareSearch(indices).setSource(new SearchSourceBuilder().size(0).query(query)).get();
-----
==== Elasticsearch will no longer detect logging implementations
Elasticsearch now logs only to log4j 1.2. Previously if log4j wasn't on the
classpath it made some effort to degrade to slf4j or java.util.logging. Now it
will fail to work without the log4j 1.2 api. The log4j-over-slf4j bridge ought
to work when using the java client, as should log4j 2's log4j-1.2-api. The
Elasticsearch server now only supports log4j as configured by `logging.yml`
and will fail if log4j isn't present.
==== Groovy dependencies
In previous versions of Elasticsearch, the Groovy scripting capabilities
depended on the `org.codehaus.groovy:groovy-all` artifact. In addition
to pulling in the Groovy language, this pulls in a very large set of
functionality, none of which is needed for scripting within
Elasticsearch. Aside from the inherent difficulties in managing such a
large set of dependencies, this also increases the surface area for
security issues. This dependency has been reduced to the core Groovy
language `org.codehaus.groovy:groovy` artifact.
==== DocumentAlreadyExistsException removed
`DocumentAlreadyExistsException` is removed and a `VersionConflictEngineException` is thrown instead (with a better
error description). This will influence code that uses `IndexRequest.opType()` or `IndexRequest.create()`
to index a document only if it doesn't already exist.
==== Changes to Query Builders
===== BoostingQueryBuilder
Removed setters for mandatory positive/negative query. Both arguments now have
to be supplied at construction time already and have to be non-null.
===== SpanContainingQueryBuilder
Removed setters for mandatory big/little inner span queries. Both arguments now have
to be supplied at construction time already and have to be non-null. Updated
static factory methods in QueryBuilders accordingly.
===== SpanOrQueryBuilder
Making sure that query contains at least one clause by making initial clause mandatory
in constructor.
===== SpanNearQueryBuilder
Removed setter for mandatory slop parameter, needs to be set in constructor now. Also
making sure that query contains at least one clause by making initial clause mandatory
in constructor. Updated the static factory methods in QueryBuilders accordingly.
===== SpanNotQueryBuilder
Removed setter for mandatory include/exclude span query clause, needs to be set in constructor now.
Updated the static factory methods in QueryBuilders and tests accordingly.
===== SpanWithinQueryBuilder
Removed setters for mandatory big/little inner span queries. Both arguments now have
to be supplied at construction time already and have to be non-null. Updated
static factory methods in QueryBuilders accordingly.
===== QueryFilterBuilder
Removed the setter `queryName(String queryName)` since this field is not supported
in this type of query. Use `FQueryFilterBuilder.queryName(String queryName)` instead
when in need to wrap a named query as a filter.
===== WrapperQueryBuilder
Removed `wrapperQueryBuilder(byte[] source, int offset, int length)`. Instead simply
use `wrapperQueryBuilder(byte[] source)`. Updated the static factory methods in
QueryBuilders accordingly.
===== QueryStringQueryBuilder
Removed ability to pass in boost value using `field(String field)` method in form e.g. `field^2`.
Use the `field(String, float)` method instead.
===== Operator
Removed the enums called `Operator` from `MatchQueryBuilder`, `QueryStringQueryBuilder`,
`SimpleQueryStringBuilder`, and `CommonTermsQueryBuilder` in favour of using the enum
defined in `org.elasticsearch.index.query.Operator` in an effort to consolidate the
codebase and avoid duplication.
===== queryName and boost support
Support for `queryName` and `boost` has been streamlined across all of the queries. This is
a breaking change for as long as queries are sent over the network as serialized JSON rather
than in `Streamable` format. In fact, whenever additional fields are added to the JSON
representation of the query, older nodes might throw an error when they find unknown fields.
===== InnerHitsBuilder
`InnerHitsBuilder` now has dedicated `addParentChildInnerHits` and `addNestedInnerHits` methods
to differentiate between inner hits for nested vs. parent / child documents. This change
makes the type / path parameter mandatory.
===== MatchQueryBuilder
Moving MatchQueryBuilder.Type and MatchQueryBuilder.ZeroTermsQuery enum to MatchQuery.Type.
Also reusing new Operator enum.
===== MoreLikeThisQueryBuilder
Removed `MoreLikeThisQueryBuilder.Item#id(String id)`, `Item#doc(BytesReference doc)`,
`Item#doc(XContentBuilder doc)`. Use provided constructors instead.
Removed `MoreLikeThisQueryBuilder#addLike` in favor of texts and/or items being provided
at construction time. Using arrays there instead of lists now.
Removed `MoreLikeThisQueryBuilder#addUnlike` in favor of using the `unlike` methods
which take arrays as arguments now rather than the lists used before.
The deprecated `docs(Item... docs)`, `ignoreLike(Item... docs)`,
`ignoreLike(String... likeText)`, `addItem(Item... likeItems)` have been removed.
===== GeoDistanceQueryBuilder
Removing individual setters for lon() and lat() values, both values should be set together
using point(lon, lat).
===== GeoDistanceRangeQueryBuilder
Removing setters for to(Object ...) and from(Object ...) in favour of the only two allowed input
arguments (String, Number). Removing setter for center point (point(), geohash()) because parameter
is mandatory and should already be set in constructor.
Also removing setters for lt(), lte(), gt(), gte() since they can all be replaced by equivalent
calls to to/from() and includeLower()/includeUpper().
===== GeoPolygonQueryBuilder
Require shell of polygon already to be specified in constructor instead of adding it pointwise.
This enables validation, but makes it necessary to remove the addPoint() methods.
===== MultiMatchQueryBuilder
Moving MultiMatchQueryBuilder.ZeroTermsQuery enum to MatchQuery.ZeroTermsQuery.
Also reusing new Operator enum.
Removed ability to pass in boost value using `field(String field)` method in form e.g. `field^2`.
Use the `field(String, float)` method instead.
===== MissingQueryBuilder
The MissingQueryBuilder which was deprecated in 2.2.0 is removed. As a replacement use ExistsQueryBuilder
inside a mustNot() clause. So instead of using `new MissingQueryBuilder(name)` now use
`new BoolQueryBuilder().mustNot(new ExistsQueryBuilder(name))`.
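For clients that build the request body directly rather than through the Java builders, the same replacement can be written in the query DSL. The following is a minimal Python sketch that only constructs the JSON structure: the helper name `missing_field_query` is hypothetical and no request is sent.

```python
import json


def missing_field_query(field):
    """Build a bool/must_not/exists query matching documents without `field`."""
    return {"bool": {"must_not": {"exists": {"field": field}}}}


body = json.dumps({"query": missing_field_query("user")})
```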
===== NotQueryBuilder
The NotQueryBuilder which was deprecated in 2.1.0 is removed. As a replacement use BoolQueryBuilder
with added mustNot() clause. So instead of using `new NotQueryBuilder(filter)` now use
`new BoolQueryBuilder().mustNot(filter)`.
===== TermsQueryBuilder
Remove the setter for `termsLookup()`, making it only possible to either use a TermsLookup object or
individual values at construction time. Also moving individual settings for the TermsLookup (lookupIndex,
lookupType, lookupId, lookupPath) to the separate TermsLookup class, using constructor only and moving
checks for validation there. Removed `TermsLookupQueryBuilder` in favour of `TermsQueryBuilder`.
===== FunctionScoreQueryBuilder
`add` methods have been removed, all filters and functions must be provided as constructor arguments by
creating an array of `FunctionScoreQueryBuilder.FilterFunctionBuilder` objects, containing one element
for each filter/function pair.
`scoreMode` and `boostMode` can only be provided using corresponding enum members instead
of string values: see `FilterFunctionScoreQuery.ScoreMode` and `CombineFunction`.
`CombineFunction.MULT` has been renamed to `MULTIPLY`.
===== IdsQueryBuilder
For simplicity, only one way of adding the ids to the existing list (empty by default) is left: `addIds(String...)`
===== ShapeBuilders
`InternalLineStringBuilder` is removed in favour of `LineStringBuilder`, `InternalPolygonBuilder` in favour of `PolygonBuilder` and `Ring` has been replaced with `LineStringBuilder`. Also the abstract base classes `BaseLineStringBuilder` and `BasePolygonBuilder` have been merged with their corresponding implementations.
===== RescoreBuilder
`RescoreBuilder.Rescorer` was merged with `RescoreBuilder`, which is now an abstract superclass. `QueryRescoreBuilder` is currently its only implementation.
===== PhraseSuggestionBuilder
The inner DirectCandidateGenerator class has been moved out to its own class called DirectCandidateGeneratorBuilder.

@ -0,0 +1,82 @@
[[breaking_50_mapping_changes]]
=== Mapping changes
==== `string` fields replaced by `text`/`keyword` fields
The `string` field datatype has been replaced by the `text` field for full
text analyzed content, and the `keyword` field for not-analyzed exact string
values. For backwards compatibility purposes, during the 5.x series:
* `string` fields on pre-5.0 indices will function as before.
* New `string` fields can be added to pre-5.0 indices as before.
* `text` and `keyword` fields can also be added to pre-5.0 indices.
* When adding a `string` field to a new index, the field mapping will be
rewritten as a `text` or `keyword` field if possible, otherwise
an exception will be thrown. Certain configurations that were possible
with `string` fields are no longer possible with `text`/`keyword` fields
such as enabling `term_vectors` on a not-analyzed `keyword` field.
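Mappings kept in templates or application code can be converted ahead of time instead of relying on the automatic rewrite. The following is a minimal Python sketch covering only the two simplest cases: the helper name `rewrite_string_field` is hypothetical, a missing `index` value is assumed to mean `analyzed`, and this is not the rewrite logic Elasticsearch itself applies.

```python
def rewrite_string_field(mapping):
    """Rewrite a simple `string` field mapping as `text` or `keyword`."""
    rewritten = dict(mapping)  # never modify the caller's mapping
    if rewritten.get("type") != "string":
        return rewritten
    # Assumption: a string field without `index` was analyzed.
    index = rewritten.pop("index", "analyzed")
    if index == "analyzed":
        rewritten["type"] = "text"
    elif index == "not_analyzed":
        rewritten["type"] = "keyword"
    else:
        raise ValueError("cannot rewrite string field with index=%r" % (index,))
    return rewritten
```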
==== `index` property
On all field datatypes (except for the deprecated `string` field), the `index`
property now only accepts `true`/`false` instead of `not_analyzed`/`no`. The
`string` field still accepts `analyzed`/`not_analyzed`/`no`.
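For non-`string` fields, the old values can be translated to booleans. The following is a minimal Python sketch: the helper name `migrate_index_property` is hypothetical, and the mapping of `not_analyzed` to `true` and `no` to `false` is inferred from the sentence above rather than taken from the Elasticsearch source.

```python
# Assumed translation for non-string fields: indexed -> True, not indexed -> False.
_LEGACY_INDEX_VALUES = {"not_analyzed": True, "no": False}


def migrate_index_property(value):
    """Return the boolean form of a legacy `index` value on a non-string field."""
    if isinstance(value, bool):
        return value  # already in the new form
    if value in _LEGACY_INDEX_VALUES:
        return _LEGACY_INDEX_VALUES[value]
    raise ValueError("unexpected index value: %r" % (value,))
```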
==== Doc values on unindexed fields
Previously, setting a field to `index:no` would also disable doc-values. Now,
doc-values are always enabled on numeric and boolean fields unless
`doc_values` is set to `false`.
==== Floating points use `float` instead of `double`
When dynamically mapping a field containing a floating point number, the field
now defaults to using `float` instead of `double`. The reasoning is that
floats should be more than enough for most cases but would decrease storage
requirements significantly.
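The trade-off can be seen by round-tripping a value through 32-bit storage. The following is a minimal Python sketch using the standard `struct` module; it illustrates IEEE 754 single precision in general and says nothing about how Elasticsearch encodes fields internally. The helper name `as_float32` is hypothetical.

```python
import struct


def as_float32(value):
    """Round-trip a 64-bit Python float through 32-bit IEEE 754 storage."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# A float occupies half the bytes of a double.
FLOAT_BYTES = struct.calcsize("<f")
DOUBLE_BYTES = struct.calcsize("<d")
```

Values such as `0.5` survive the round trip exactly, while `0.1` comes back with a small error, which is the precision given up in exchange for the smaller size.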
==== `fielddata.format`
Setting `fielddata.format: doc_values` in the mappings used to implicitly
enable doc-values on a field. This no longer works: the only way to enable or
disable doc-values is by using the `doc_values` property of mappings.
==== Source-transform removed
The source `transform` feature has been removed. Instead, use an ingest pipeline.
==== `_parent` field no longer indexed
The join between parent and child documents no longer relies on indexed fields
and therefore from 5.0.0 onwards the `_parent` field is no longer indexed. In
order to find documents that refer to a specific parent id, the new
`parent_id` query can be used. The GET response and hits inside the search
response still include the parent id under the `_parent` key.
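A search body using the new query might be built as follows. This is a minimal Python sketch that only constructs the JSON structure: the helper name `parent_id_query` is hypothetical, and the `type` and `id` parameter names are assumptions that should be checked against the `parent_id` query documentation.

```python
def parent_id_query(child_type, parent_id):
    """Build a search body finding `child_type` documents of one parent."""
    # Assumed parameter names: "type" (the child type) and "id" (the parent id).
    return {"query": {"parent_id": {"type": child_type, "id": parent_id}}}
```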
==== Source `format` option
The `_source` mapping no longer supports the `format` option. It will still be
accepted for indices created before the upgrade to 5.0 for backwards
compatibility, but it will have no effect. Indices created on or after 5.0
will reject this option.
==== Object notation
Core types no longer support the object notation, which was used to provide
per document boosts as follows:
[source,json]
---------------
{
  "value": "field_value",
  "boost": 42
}
---------------
==== Boost accuracy for queries on `_all`
Per-field boosts on the `_all` field are now compressed into a single byte instead
of the 4 bytes used previously. While this will make the index much more
space-efficient, it also means that index time boosts will be less accurately
encoded.

@ -0,0 +1,24 @@
[[breaking_50_packaging]]
=== Packaging
==== Default logging using systemd (since Elasticsearch 2.2.0)
In previous versions of Elasticsearch, the default logging
configuration routed standard output to /dev/null and standard error to
the journal. However, there are often critical error messages at
startup that are logged to standard output rather than standard error
and these error messages would be lost to the nether. The default has
changed to now route standard output to the journal and standard error
to inherit this setting (these are the defaults for systemd). These
settings can be modified by editing the elasticsearch.service file.
==== Longer startup times
In Elasticsearch 5.0.0 the `-XX:+AlwaysPreTouch` flag has been added to the JVM
startup options. This option touches all memory pages used by the JVM heap
during initialization of the HotSpot VM to reduce the chance of having to commit
a memory page during GC time. This will increase the startup time of
Elasticsearch as well as increasing the initial resident memory usage of the
Java process.

@ -0,0 +1,41 @@
[[breaking_50_percolator]]
=== Percolator changes
==== Percolator is near-real time
Previously percolators were activated in real-time, i.e. as soon as they were
indexed. Now, changes to the percolator query are visible in near-real time,
as soon as the index has been refreshed. This change was required because, in
indices created from 5.0 onwards, the terms used in a percolator query are
automatically indexed to allow for more efficient query selection during
percolation.
==== Percolator mapping
The percolate API can no longer accept documents that reference fields that
don't already exist in the mapping.
The percolate API no longer modifies the mappings. Before the percolate API
could be used to dynamically introduce new fields to the mappings based on the
fields in the document being percolated. This no longer works, because these
unmapped fields are not persisted in the mapping.
==== Percolator documents returned by search
Documents with the `.percolate` type were previously excluded from the search
response, unless the `.percolate` type was specified explicitly in the search
request. Now, percolator documents are treated in the same way as any other
document and are returned by search requests.
==== Percolator `size` default
The percolator by default sets the `size` option to `10` whereas before this
was unlimited.
==== Percolate API
When percolating an existing document, specifying a document in the source
of the percolate request is no longer allowed.

@ -0,0 +1,99 @@
[[breaking_50_plugins]]
=== Plugin changes
The command `bin/plugin` has been renamed to `bin/elasticsearch-plugin`. The
structure of the plugin ZIP archive has changed. All the plugin files must be
contained in a top-level directory called `elasticsearch`. If you use the
gradle build, this structure is automatically generated.
==== Site plugins removed
Site plugins have been removed. Site plugins should be reimplemented as Kibana
plugins.
==== Multicast plugin removed
Multicast has been removed. Use unicast discovery, or one of the cloud
discovery plugins.
==== Plugins with custom query implementations
Plugins implementing custom queries need to implement the `fromXContent(QueryParseContext)` method in their
`QueryParser` subclass rather than `parse`. This method will take care of parsing the query from `XContent` format
into an intermediate query representation that can be streamed between the nodes in binary format, effectively the
query object used in the java api. Also, the query parser needs to implement the `getBuilderPrototype` method that
returns a prototype of the `NamedWriteable` query, which allows an incoming query to be deserialized by calling
`readFrom(StreamInput)` against it, which will create a new object, see usages of `Writeable`. The `QueryParser`
also needs to declare the generic type of the query that it supports and is able to parse.
The query object can then transform itself into a lucene query through the new `toQuery(QueryShardContext)` method,
which returns a lucene query to be executed on the data node.
Similarly, plugins implementing custom score functions need to implement the `fromXContent(QueryParseContext)`
method in their `ScoreFunctionParser` subclass rather than `parse`. This method will take care of parsing
the function from `XContent` format into an intermediate function representation that can be streamed between
the nodes in binary format, effectively the function object used in the java api. Also, the function parser needs
to implement the `getBuilderPrototype` method that returns a prototype of the `NamedWriteable` function, which
allows an incoming function to be deserialized by calling `readFrom(StreamInput)` against it, which will create a
new object, see usages of `Writeable`. The `ScoreFunctionParser` also needs to declare the generic type of the
function that it supports and is able to parse. The function object can then transform itself into a lucene
function through the new `toFunction(QueryShardContext)` method, which returns a lucene function to be executed
on the data node.
==== Cloud AWS plugin changes
The Cloud AWS plugin has been split into two plugins:
* {plugins}/discovery-ec2.html[Discovery EC2 plugin]
* {plugins}/repository-s3.html[Repository S3 plugin]
Proxy settings for both plugins have been renamed:
* from `cloud.aws.proxy_host` to `cloud.aws.proxy.host`
* from `cloud.aws.ec2.proxy_host` to `cloud.aws.ec2.proxy.host`
* from `cloud.aws.s3.proxy_host` to `cloud.aws.s3.proxy.host`
* from `cloud.aws.proxy_port` to `cloud.aws.proxy.port`
* from `cloud.aws.ec2.proxy_port` to `cloud.aws.ec2.proxy.port`
* from `cloud.aws.s3.proxy_port` to `cloud.aws.s3.proxy.port`
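All six renames follow the same pattern, so existing settings can be converted with one rule. The following is a minimal Python sketch operating on flat setting keys: the helper names `rename_proxy_setting` and `migrate_settings` are hypothetical, and only the keys listed above are rewritten.

```python
import re

# Matches cloud.aws.proxy_host, cloud.aws.ec2.proxy_port, cloud.aws.s3.proxy_host, ...
_PROXY_KEY = re.compile(r"^(cloud\.aws(?:\.ec2|\.s3)?)\.proxy_(host|port)$")


def rename_proxy_setting(key):
    """Rename one proxy setting key; any other key is returned unchanged."""
    return _PROXY_KEY.sub(r"\1.proxy.\2", key)


def migrate_settings(settings):
    """Return a copy of a flat settings dict with proxy keys renamed."""
    return {rename_proxy_setting(key): value for key, value in settings.items()}
```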
==== Cloud Azure plugin changes
The Cloud Azure plugin has been split into three plugins:
* {plugins}/discovery-azure.html[Discovery Azure plugin]
* {plugins}/repository-azure.html[Repository Azure plugin]
* {plugins}/store-smb.html[Store SMB plugin]
If you were using the `cloud-azure` plugin for snapshot and restore, you had in `elasticsearch.yml`:
[source,yaml]
-----
cloud:
    azure:
        storage:
            account: your_azure_storage_account
            key: your_azure_storage_key
-----
You need to give a unique id to the storage details now as you can define multiple storage accounts:
[source,yaml]
-----
cloud:
    azure:
        storage:
            my_account:
                account: your_azure_storage_account
                key: your_azure_storage_key
-----
==== Cloud GCE plugin changes
Cloud GCE plugin has been renamed to {plugins}/discovery-gce.html[Discovery GCE plugin].
==== Mapper Attachments plugin deprecated
Mapper attachments has been deprecated. Users should now use the {plugins}/ingest-attachment.html[`ingest-attachment`]
plugin.

@ -0,0 +1,17 @@
[[breaking_50_rest_api_changes]]
=== REST API changes
==== id values longer than 512 bytes are rejected
When specifying an `_id` value longer than 512 bytes, the request will be
rejected.
==== `/_optimize` endpoint removed
The deprecated `/_optimize` endpoint has been removed. The `/_forcemerge`
endpoint should be used in lieu of optimize.
The `GET` HTTP verb for `/_forcemerge` is no longer supported. Please use the
`POST` HTTP verb instead.
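For instance, a former `/_optimize` call can be expressed as the following request. The index name `my_index` and the `max_num_segments` value are illustrative assumptions:
[source,sh]
---------------
POST /my_index/_forcemerge?max_num_segments=1
---------------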

@ -0,0 +1,141 @@
[[breaking_50_search_changes]]
=== Search and Query DSL changes
==== `search_type`
===== `search_type=count` removed
The `count` search type was deprecated in version 2.0.0 and is now removed.
To get the same benefits, set the value of the `size` parameter to `0`.
For instance, the following request:
[source,sh]
---------------
GET /my_index/_search?search_type=count
{
  "aggs": {
    "my_terms": {
      "terms": {
        "field": "foo"
      }
    }
  }
}
---------------
can be replaced with:
[source,sh]
---------------
GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "my_terms": {
      "terms": {
        "field": "foo"
      }
    }
  }
}
---------------
===== `search_type=scan` removed
The `scan` search type was deprecated in version 2.1.0 and is now removed.
All benefits from this search type can now be achieved by doing a scroll
request that sorts documents in `_doc` order, for instance:
[source,sh]
---------------
GET /my_index/_search?scroll=2m
{
  "sort": [
    "_doc"
  ]
}
---------------
Scroll requests sorted by `_doc` have been optimized to more efficiently resume
from where the previous request stopped, so this will have the same performance
characteristics as the former `scan` search type.
==== `fields` parameter
The `fields` parameter used to try to retrieve field values from stored
fields, and fall back to extracting from the `_source` if a field is not
marked as stored. Now, the `fields` parameter will only return stored fields
-- it will no longer extract values from the `_source`.
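If you relied on `fields` to return values of fields that are not stored, one possible replacement is source filtering. The sketch below assumes a hypothetical index `my_index` with non-stored fields `title` and `user`:
[source,sh]
---------------
GET /my_index/_search
{
  "_source": ["title", "user"],
  "query": {
    "match_all": {}
  }
}
---------------
The requested values are then returned under `_source` in each hit rather than under `fields`.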
==== search-exists API removed
The search exists API has been removed in favour of using the search API with
`size` set to `0` and `terminate_after` set to `1`.
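A minimal sketch of such an existence check follows. The index name, field name and value are placeholders:
[source,sh]
---------------
GET /my_index/_search
{
  "size": 0,
  "terminate_after": 1,
  "query": {
    "term": {
      "user": "kimchy"
    }
  }
}
---------------
A matching document exists if `hits.total` in the response is greater than `0`.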
==== Deprecated queries removed
The following deprecated queries have been removed:
`filtered`:: Use `bool` query instead, which supports `filter` clauses too.
`and`:: Use `must` clauses in a `bool` query instead.
`or`:: Use `should` clauses in a `bool` query instead.
`limit`:: Use the `terminate_after` parameter instead.
`fquery`:: Is obsolete after filters and queries have been merged.
`query`:: Is obsolete after filters and queries have been merged.
`query_binary`:: Was undocumented and has been removed.
`filter_binary`:: Was undocumented and has been removed.
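As an illustration of the first replacement in the list above, a former `filtered` query with a query part and a filter part can be sketched as a `bool` query with `must` and `filter` clauses. The index and field names are hypothetical:
[source,sh]
---------------
GET /my_index/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": "search"
        }
      },
      "filter": {
        "term": {
          "status": "published"
        }
      }
    }
  }
}
---------------
Clauses under `filter` do not contribute to the score, which mirrors the behaviour of the filter part of the old `filtered` query.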
==== Changes to queries
* Removed support for the deprecated `min_similarity` parameter in `fuzzy`
query, in favour of `fuzziness`.
* Removed support for the deprecated `fuzzy_min_sim` parameter in
`query_string` query, in favour of `fuzziness`.
* Removed support for the deprecated `edit_distance` parameter in completion
suggester, in favour of `fuzziness`.
* Removed support for the deprecated `filter` and `no_match_filter` fields in `indices` query,
in favour of `query` and `no_match_query`.
* Removed support for the deprecated `filter` field in `nested` query, in favour of `query`.
* Removed support for the deprecated `minimum_should_match` and
`disable_coord` in `terms` query, use `bool` query instead. Also removed
support for the deprecated `execution` parameter.
* Removed support for the top level `filter` element in `function_score` query, replaced by `query`.
* The `collect_payloads` parameter of the `span_near` query has been deprecated. Payloads will be loaded when needed.
* The `score_type` parameter to the `has_child` and `has_parent` queries has been removed in favour of `score_mode`.
Also, the `sum` score mode has been removed in favour of the `total` mode.
* When the `max_children` parameter was set to `0` on the `has_child` query
then there was no upper limit on how many child documents were allowed to
match. Now, `0` really means that zero child documents are allowed. If no
upper limit is needed then the `max_children` parameter shouldn't be specified
at all.
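To illustrate the first item in the list above, a `fuzzy` query that previously used `min_similarity` could be rewritten with `fuzziness` along these lines. The index name, field name, term and fuzziness value are assumptions made for the example:
[source,sh]
---------------
GET /my_index/_search
{
  "query": {
    "fuzzy": {
      "user": {
        "value": "ki",
        "fuzziness": 2
      }
    }
  }
}
---------------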
==== Top level `filter` parameter
Removed support for the deprecated top level `filter` in the search api,
replaced by `post_filter`.
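For example, a request that used the top level `filter` to narrow hits after aggregations were computed could be sketched as follows, with hypothetical index and field names:
[source,sh]
---------------
GET /my_index/_search
{
  "query": {
    "match": {
      "title": "shirt"
    }
  },
  "aggs": {
    "colors": {
      "terms": {
        "field": "color"
      }
    }
  },
  "post_filter": {
    "term": {
      "color": "red"
    }
  }
}
---------------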
==== Highlighters
Removed support for multiple highlighter names. The only supported ones are
`plain`, `fvh` and `postings`.
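A request that selects a highlighter explicitly should therefore use one of these three names in the `type` parameter. The sketch below assumes a hypothetical `content` field:
[source,sh]
---------------
GET /my_index/_search
{
  "query": {
    "match": {
      "content": "elasticsearch"
    }
  },
  "highlight": {
    "fields": {
      "content": {
        "type": "plain"
      }
    }
  }
}
---------------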
==== Term vectors API
The term vectors APIs no longer persist unmapped fields in the mappings.
The `dfs` parameter to the term vectors API has been removed completely. Term
vectors don't support distributed document frequencies anymore.

@ -0,0 +1,174 @@
[[breaking_50_settings_changes]]
=== Settings changes
From Elasticsearch 5.0 on, all settings are validated before they are applied.
Node level and default index level settings are validated on node startup;
dynamic cluster and index settings are validated before they are updated/added
to the cluster state.
Every setting must be a *known* setting. All settings must have been
registered with the node or transport client they are used with. This implies
that plugins that define custom settings must register all of their settings
during plugin loading using the `SettingsModule#registerSettings(Setting)`
method.
==== Node settings
The `name` setting has been removed and is replaced by `node.name`. Usage of
`-Dname=some_node_name` is not supported anymore.
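In `elasticsearch.yml`, the equivalent configuration would look like this, reusing the placeholder node name from above:
[source,yaml]
-----
node.name: some_node_name
-----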
==== Transport Settings
All settings with a `netty` infix have been replaced by their already existing
`transport` synonyms. For instance `transport.netty.bind_host` is no longer
supported and should be replaced by the superseding setting
`transport.bind_host`.
==== Script mode settings
Previously, script mode settings (e.g., `script.inline: true`,
`script.engine.groovy.inline.aggs: false`, etc.) accepted the values
`on`, `true`, `1`, and `yes` for enabling a scripting mode, and the
values `off`, `false`, `0`, and `no` for disabling a scripting mode.
The variants `on`, `1`, and `yes` for enabling and `off`, `0`,
and `no` for disabling are no longer supported.
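A configuration that remains valid therefore uses only `true` and `false`, as in this sketch built from the two example settings mentioned above:
[source,yaml]
-----
script.inline: true                           # was also accepted as on, 1 or yes
script.engine.groovy.inline.aggs: false       # was also accepted as off, 0 or no
-----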
==== Security manager settings
The option to disable the security manager, `security.manager.enabled`, has been
removed. In order to grant special permissions to Elasticsearch, users must
edit the local Java Security Policy.
==== Network settings
The `_non_loopback_` value for settings like `network.host` would arbitrarily
pick the first interface not marked as loopback. Instead, specify by address
scope (e.g. `_local_,_site_` for all loopback and private network addresses)
or by explicit interface names, hostnames, or addresses.
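As a sketch, binding to all loopback and private network addresses by address scope could be configured in `elasticsearch.yml` like this:
[source,yaml]
-----
network.host: "_local_,_site_"
-----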
==== Forbid changing of thread pool types
Previously, <<modules-threadpool,thread pool types>> could be dynamically
adjusted. The thread pool type effectively controls the backing queue for the
thread pool and modifying this is an expert setting with minimal practical
benefits and high risk of being misused. The ability to change the thread pool
type for any thread pool has been removed. It is still possible to adjust
relevant thread pool parameters for each of the thread pools (e.g., depending
on the thread pool type, `keep_alive`, `queue_size`, etc.).
==== Analysis settings
The `index.analysis.analyzer.default_index` analyzer is not supported anymore.
If you wish to change the analyzer to use for indexing, change the
`index.analysis.analyzer.default` analyzer instead.
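A minimal sketch of defining the `default` analyzer at index creation follows. The index name `my_index` and the choice of the built-in `english` analyzer are assumptions for illustration:
[source,sh]
---------------
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "english"
        }
      }
    }
  }
}
---------------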
==== Ping timeout settings
Previously, there were three settings for the ping timeout:
`discovery.zen.initial_ping_timeout`, `discovery.zen.ping.timeout` and
`discovery.zen.ping_timeout`. The first two have been removed and the only
setting key for the ping timeout is now `discovery.zen.ping_timeout`. The
default value for ping timeouts remains at three seconds.
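In `elasticsearch.yml`, the remaining key can be set as follows. The value shown is simply the default mentioned above:
[source,yaml]
-----
discovery.zen.ping_timeout: 3s
-----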
==== Recovery settings
Recovery settings deprecated in 1.x have been removed:
* `index.shard.recovery.translog_size` is superseded by `indices.recovery.translog_size`
* `index.shard.recovery.translog_ops` is superseded by `indices.recovery.translog_ops`
* `index.shard.recovery.file_chunk_size` is superseded by `indices.recovery.file_chunk_size`
* `index.shard.recovery.concurrent_streams` is superseded by `indices.recovery.concurrent_streams`
* `index.shard.recovery.concurrent_small_file_streams` is superseded by `indices.recovery.concurrent_small_file_streams`
* `indices.recovery.max_size_per_sec` is superseded by `indices.recovery.max_bytes_per_sec`
If you are using any of these settings, please take the time to review their
purpose. All of the settings above are considered _expert settings_ and should
only be used if absolutely necessary. If you have set any of the above settings
as persistent cluster settings, please use the settings update API and set
their superseding keys accordingly.
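For instance, a persistent `indices.recovery.max_size_per_sec` value could be carried over to its replacement key with a request like the following. The value `40mb` is a placeholder and should be replaced with whatever you had configured:
[source,sh]
---------------
PUT /_cluster/settings
{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "40mb"
  }
}
---------------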
The following settings have been removed without replacement:
* `indices.recovery.concurrent_small_file_streams` - recoveries are now single threaded. The number of concurrent outgoing recoveries is throttled via allocation deciders.
* `indices.recovery.concurrent_file_streams` - recoveries are now single threaded. The number of concurrent outgoing recoveries is throttled via allocation deciders.
==== Translog settings
The `index.translog.flush_threshold_ops` setting is not supported anymore. In
order to control flushes based on the transaction log growth use
`index.translog.flush_threshold_size` instead.
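A sketch of switching to the size based threshold on an existing index follows. The index name and the `512mb` value are illustrative assumptions:
[source,sh]
---------------
PUT /my_index/_settings
{
  "index.translog.flush_threshold_size": "512mb"
}
---------------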
Changing the translog type with `index.translog.fs.type` is not supported
anymore. The `buffered` implementation is now the only available option and
uses a fixed `8kb` buffer.
The translog by default is fsynced after every `index`, `create`, `update`,
`delete`, or `bulk` request. The ability to fsync on every operation is not
necessary anymore. In fact, it can be a performance bottleneck, and it is a trap
since it is enabled by a special value set on `index.translog.sync_interval`.
Now, `index.translog.sync_interval` doesn't accept a value less than `100ms`,
which prevents fsyncing too often if async durability is enabled. The special
value `0` is no longer supported.
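As an illustration, an index that uses async durability with an explicit sync interval could be created as sketched below. The index name and the `5s` interval are assumptions, and the settings are shown at index creation because whether `index.translog.sync_interval` can be updated on a live index is not stated here:
[source,sh]
---------------
PUT /my_index
{
  "settings": {
    "index.translog.durability": "async",
    "index.translog.sync_interval": "5s"
  }
}
---------------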
==== Request Cache Settings
The deprecated settings `index.cache.query.enable` and
`indices.cache.query.size` have been removed and are replaced with
`index.requests.cache.enable` and `indices.requests.cache.size` respectively.
`indices.requests.cache.clean_interval` has been replaced with
`indices.cache.clean_interval` and is no longer supported.
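For example, the request cache can be enabled on a hypothetical index `my_index` using the new index level key:
[source,sh]
---------------
PUT /my_index/_settings
{
  "index.requests.cache.enable": true
}
---------------
The node level `indices.requests.cache.size` and `indices.cache.clean_interval` keys belong in `elasticsearch.yml` instead.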
==== Field Data Cache Settings
The `indices.fielddata.cache.clean_interval` setting has been replaced with
`indices.cache.clean_interval`.
==== Allocation settings
The `cluster.routing.allocation.concurrent_recoveries` setting has been
replaced with `cluster.routing.allocation.node_concurrent_recoveries`.
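A sketch of setting the replacement key through the cluster update settings API follows. The value `2` is only an example:
[source,sh]
---------------
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 2
  }
}
---------------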
==== Similarity settings
The `default` similarity has been renamed to `classic`.
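A field mapping that names this similarity explicitly would then be written as sketched below. The index, type and field names are hypothetical, and the `text` field type is an assumption about the mapping in use:
[source,sh]
---------------
PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "text",
          "similarity": "classic"
        }
      }
    }
  }
}
---------------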
==== Indexing settings
The `indices.memory.min_shard_index_buffer_size` and
`indices.memory.max_shard_index_buffer_size` settings have been removed, as
Elasticsearch now allows any one shard to use any amount of heap as long as the
total indexing buffer heap used across all shards is below the node's
`indices.memory.index_buffer_size` (defaults to 10% of the JVM heap).
==== Removed es.max-open-files
Setting the system property `es.max-open-files` to `true` to get
Elasticsearch to print the number of maximum open files for the
Elasticsearch process has been removed. This same information can be
obtained from the <<cluster-nodes-info>> API, and a warning is logged
on startup if it is set too low.
==== Removed es.netty.gathering
Disabling Netty from using NIO gathering could be done via the escape
hatch of setting the system property `es.netty.gathering` to `false`.
Time has proven enabling gathering by default is a non-issue and this
non-documented setting has been removed.
==== Removed es.useLinkedTransferQueue
The system property `es.useLinkedTransferQueue` could be used to
control the queue implementation used in the cluster service and the
handling of ping responses during discovery. This was an undocumented
setting and has been removed.
==== Cache concurrency level settings removed
Two cache concurrency level settings,
`indices.requests.cache.concurrency_level` and
`indices.fielddata.cache.concurrency_level`, have been removed because they no longer apply to
the cache implementation used for the request cache and the field data cache.