From 5c845f8bb557db280086fe016b5b141b0ae34f2c Mon Sep 17 00:00:00 2001
From: Clinton Gormley
Date: Sun, 13 Mar 2016 21:17:48 +0100
Subject: [PATCH] Reworked 5.0 breaking changes docs

---
 docs/reference/migration/migrate_5_0.asciidoc | 887 +-----------------
 .../migration/migrate_5_0/allocation.asciidoc |  54 ++
 .../migration/migrate_5_0/cat.asciidoc        |  33 +
 .../migration/migrate_5_0/index-apis.asciidoc |  48 +
 .../migration/migrate_5_0/java.asciidoc       | 213 +++++
 .../migration/migrate_5_0/mapping.asciidoc    |  82 ++
 .../migration/migrate_5_0/packaging.asciidoc  |  24 +
 .../migration/migrate_5_0/percolator.asciidoc |  41 +
 .../migration/migrate_5_0/plugins.asciidoc    |  99 ++
 .../migration/migrate_5_0/rest.asciidoc       |  17 +
 .../migration/migrate_5_0/search.asciidoc     | 141 +++
 .../migration/migrate_5_0/settings.asciidoc   | 174 ++++
 12 files changed, 957 insertions(+), 856 deletions(-)
 create mode 100644 docs/reference/migration/migrate_5_0/allocation.asciidoc
 create mode 100644 docs/reference/migration/migrate_5_0/cat.asciidoc
 create mode 100644 docs/reference/migration/migrate_5_0/index-apis.asciidoc
 create mode 100644 docs/reference/migration/migrate_5_0/java.asciidoc
 create mode 100644 docs/reference/migration/migrate_5_0/mapping.asciidoc
 create mode 100644 docs/reference/migration/migrate_5_0/packaging.asciidoc
 create mode 100644 docs/reference/migration/migrate_5_0/percolator.asciidoc
 create mode 100644 docs/reference/migration/migrate_5_0/plugins.asciidoc
 create mode 100644 docs/reference/migration/migrate_5_0/rest.asciidoc
 create mode 100644 docs/reference/migration/migrate_5_0/search.asciidoc
 create mode 100644 docs/reference/migration/migrate_5_0/settings.asciidoc

diff --git a/docs/reference/migration/migrate_5_0.asciidoc b/docs/reference/migration/migrate_5_0.asciidoc
index 8e082a1e426..23cadbbd9ed 100644
--- a/docs/reference/migration/migrate_5_0.asciidoc
+++ b/docs/reference/migration/migrate_5_0.asciidoc
@@ -4,877 +4,52 @@
 This section discusses the changes that you need
to be aware of when migrating your application to Elasticsearch 5.0.

+[IMPORTANT]
+.Reindex indices from Elasticsearch 1.x or before
+=========================================
+
+Indices created in Elasticsearch 1.x or before will need to be reindexed with
+Elasticsearch 2.x in order to be readable by Elasticsearch 5.x. The easiest
+way to do this is to upgrade to Elasticsearch 2.3 or later and to use the
+`reindex` API.
+
+=========================================
+
+[float]
+=== Also see:
+
 * <>
+* <>
+* <>
+* <>
+* <>
+* <>
 * <>
 * <>
-* <>
-* <>
-* <>
-* <>
 * <>
-* <>
-* <>
-* <>
-* <>
-* <>
 * <>
-* <>
-* <>
-* <>
-* <>
+* <>

-[[breaking_50_search_changes]]
-=== Warmers
+include::migrate_5_0/search.asciidoc[]

-Thanks to several changes like doc values by default or disk-based norms,
-warmers have become quite useless. As a consequence, warmers and the warmer
-API have been removed: it is not possible anymore to register queries that
-will run before a new IndexSearcher is published.
+include::migrate_5_0/mapping.asciidoc[]

-Don't worry if you have warmers defined on your indices, they will simply be
-ignored when upgrading to 5.0.
+include::migrate_5_0/percolator.asciidoc[]

-=== Search changes
+include::migrate_5_0/index-apis.asciidoc[]

-==== `search_type=count` removed
+include::migrate_5_0/settings.asciidoc[]

-The `count` search type was deprecated since version 2.0.0 and is now removed.
-In order to get the same benefits, you just need to set the value of the `size`
-parameter to `0`.
+include::migrate_5_0/allocation.asciidoc[]

-For instance, the following request:
+include::migrate_5_0/rest.asciidoc[]

-[source,sh]
----------------
-GET /my_index/_search?search_type=count
-{
-  "aggs": {
-    "my_terms": {
-      "terms": {
-        "field": "foo"
-      }
-    }
-  }
-}
----------------
+include::migrate_5_0/cat.asciidoc[]

-can be replaced with:
+include::migrate_5_0/java.asciidoc[]

-[source,sh]
----------------
-GET /my_index/_search
-{
-  "size": 0,
-  "aggs": {
-    "my_terms": {
-      "terms": {
-        "field": "foo"
-      }
-    }
-  }
-}
----------------
+include::migrate_5_0/packaging.asciidoc[]

-==== `search_type=scan` removed
+include::migrate_5_0/plugins.asciidoc[]

-The `scan` search type was deprecated since version 2.1.0 and is now removed.
-All benefits from this search type can now be achieved by doing a scroll
-request that sorts documents in `_doc` order, for instance:

-[source,sh]
----------------
-GET /my_index/_search?scroll=2m
-{
-  "sort": [
-    "_doc"
-  ]
-}
----------------
-
-Scroll requests sorted by `_doc` have been optimized to more efficiently resume
-from where the previous request stopped, so this will have the same performance
-characteristics as the former `scan` search type.
-
-==== Boost accuracy for queries on `_all`
-
-Per-field boosts on the `_all` are now compressed on a single byte instead of
-4 bytes previously. While this will make the index more space-efficient, this
-also means that the boosts will be less accurately encoded.
-
-[[breaking_50_rest_api_changes]]
-=== REST API changes
-
-==== id values longer than 512 bytes are rejected
-
-When specifying an `_id` value longer than 512 bytes, the request will be
-rejected.
-
-==== search exists api removed
-
-The search exists api has been removed in favour of using the search api with
-`size` set to `0` and `terminate_after` set to `1`.
-
-==== `/_optimize` endpoint removed
-
-The deprecated `/_optimize` endpoint has been removed. The `/_forcemerge`
-endpoint should be used in lieu of optimize.
-
-The `GET` HTTP verb for `/_forcemerge` is no longer supported, please use the
-`POST` HTTP verb.
-
-==== Deprecated queries removed
-
-The following deprecated queries have been removed:
-
-* `filtered`: use `bool` query instead, which supports `filter` clauses too
-* `and`: use `must` clauses in a `bool` query instead
-* `or`: use should clauses in a `bool` query instead
-* `limit`: use `terminate_after` parameter instead
-* `fquery`: obsolete after filters and queries have been merged
-* `query`: obsolete after filters and queries have been merged
-
-==== Unified fuzziness parameter
-
-* Removed support for the deprecated `min_similarity` parameter in `fuzzy query`, in favour of `similarity`.
-* Removed support for the deprecated `fuzzy_min_sim` parameter in `query_string` query, in favour of `similarity`.
-* Removed support for the deprecated `edit_distance` parameter in completion suggester, in favour of `similarity`.
-
-==== indices query
-
-Removed support for the deprecated `filter` and `no_match_filter` fields in `indices` query,
-in favour of `query` and `no_match_query`.
-
-==== nested query
-
-Removed support for the deprecated `filter` fields in `nested` query, in favour of `query`.
-
-==== terms query
-
-Removed support for the deprecated `minimum_should_match` and `disable_coord` in `terms` query, use `bool` query instead.
-Removed also support for the deprecated `execution` parameter.
-
-==== function_score query
-
-Removed support for the top level `filter` element in `function_score` query, replaced by `query`.
-
-==== highlighters
-
-Removed support for multiple highlighter names, the only supported ones are: `plain`, `fvh` and `postings`.
-
-==== top level filter
-
-Removed support for the deprecated top level `filter` in the search api, replaced by `post_filter`.
-
-==== `query_binary` and `filter_binary` removed
-
-Removed support for the undocumented `query_binary` and `filter_binary` sections of a search request.
-
-==== `span_near`'s `collect_payloads` deprecated
-
-Payloads are now loaded when needed.
-
-[[breaking_50_cat_api]]
-=== CAT API changes
-
-==== Use Accept header for specifying response media type
-
-Previous versions of Elasticsearch accepted the Content-Type header
-field for controlling the media type of the response in the cat API.
-This is in opposition to the HTTP spec which specifies the Accept
-header field for this purpose. Elasticsearch now uses the Accept header
-field and support for using the Content-Type header field for this
-purpose has been removed.
-
-==== Host field removed from the cat nodes API
-
-The `host` field has been removed from the cat nodes API as its value
-is always equal to the `ip` field. The `name` field is available in the
-cat nodes API and should be used instead of the `host` field.
-
-==== Changes to cat recovery API
-
-The fields `bytes_recovered` and `files_recovered` have been added to
-the cat recovery API. These fields, respectively, indicate the total
-number of bytes and files that have been recovered.
-
-The fields `total_files` and `total_bytes` have been renamed to
-`files_total` and `bytes_total`, respectively.
-
-Additionally, the field `translog` has been renamed to
-`translog_ops_recovered`, the field `translog_total` to
-`translog_ops` and the field `translog_percent` to
-`translog_ops_percent`. The short aliases for these fields are `tor`,
-`to`, and `top`, respectively.
-
-[[breaking_50_parent_child_changes]]
-=== Parent/Child changes
-
-The `children` aggregation, parent child inner hits and `has_child` and `has_parent` queries will not work on indices
-with `_parent` field mapping created before version `2.0.0`. The data of these indices needs to be re-indexed into a new index.
-
-The format of the join between parent and child documents has changed with the `2.0.0` release. The old
-format can't be read from version `5.0.0` and onwards.
The new format allows for a much more efficient and
-scalable join between parent and child documents and the join data structures are stored in on-disk
-data structures, whereas before the join data structures were stored in the JVM heap space.
-
-==== `score_type` has been removed
-
-The `score_type` option has been removed from the `has_child` and `has_parent` queries in favour of the `score_mode` option
-which does the exact same thing.
-
-==== `sum` score mode removed
-
-The `sum` score mode has been removed in favour of the `total` mode which does the same and is already available in
-previous versions.
-
-==== `max_children` option
-
-When `max_children` was set to `0` on the `has_child` query then there was no upper limit on how many children documents
-are allowed to match. This has changed and `0` now really means that zero child documents are allowed. If no upper limit
-is needed then the `max_children` option shouldn't be defined at all on the `has_child` query.
-
-==== `_parent` field no longer indexed
-
-The join between parent and child documents no longer relies on indexed fields and therefore from `5.0.0` onwards
-the `_parent` indexed field won't be indexed. In order to find documents that refer to a specific parent id
-the new `parent_id` query can be used. The get response and hits inside the search response continue to include
-the parent id under the `_parent` key.
-
-[[breaking_50_settings_changes]]
-=== Settings changes
-
-From Elasticsearch 5.0 on all settings are validated before they are applied. Node level and default index
-level settings are validated on node startup, dynamic cluster and index settings are validated before they are updated/added
-to the cluster state. Every setting must be a _known_ setting or in other words all settings must be registered with the
-node or transport client they are used with.
This implies that plugins that define custom settings must register all of their
-settings during plugin loading using the `SettingsModule#registerSettings(Setting)` method.
-
-==== Node settings
-
-The `name` setting has been removed and is replaced by `node.name`. Usage of `-Dname=some_node_name` is not supported
-anymore.
-
-==== Transport Settings
-
-All settings with a `netty` infix have been replaced by their already existing `transport` synonyms. For instance `transport.netty.bind_host` is
-no longer supported and should be replaced by the superseding setting `transport.bind_host`.
-
-==== Analysis settings
-
-The `index.analysis.analyzer.default_index` analyzer is not supported anymore.
-If you wish to change the analyzer to use for indexing, change the
-`index.analysis.analyzer.default` analyzer instead.
-
-==== Ping timeout settings
-
-Previously, there were three settings for the ping timeout: `discovery.zen.initial_ping_timeout`,
-`discovery.zen.ping.timeout` and `discovery.zen.ping_timeout`. The former two have been removed and
-the only setting key for the ping timeout is now `discovery.zen.ping_timeout`. The default value for
-ping timeouts remains at three seconds.
-
-==== Recovery settings
-
-Recovery settings deprecated in 1.x have been removed:
-
- * `index.shard.recovery.translog_size` is superseded by `indices.recovery.translog_size`
- * `index.shard.recovery.translog_ops` is superseded by `indices.recovery.translog_ops`
- * `index.shard.recovery.file_chunk_size` is superseded by `indices.recovery.file_chunk_size`
- * `index.shard.recovery.concurrent_streams` is superseded by `indices.recovery.concurrent_streams`
- * `index.shard.recovery.concurrent_small_file_streams` is superseded by `indices.recovery.concurrent_small_file_streams`
- * `indices.recovery.max_size_per_sec` is superseded by `indices.recovery.max_bytes_per_sec`
-
-If you are using any of these settings please take the time to review their purpose.
All of the settings above are considered
-_expert settings_ and should only be used if absolutely necessary. If you have set any of the above settings as persistent
-cluster settings please use the settings update API and set their superseded keys accordingly.
-
-The following settings have been removed without replacement
-
- * `indices.recovery.concurrent_small_file_streams` - recoveries are now single threaded. The number of concurrent outgoing recoveries is throttled via allocation deciders
- * `indices.recovery.concurrent_file_streams` - recoveries are now single threaded. The number of concurrent outgoing recoveries is throttled via allocation deciders
-
-==== Translog settings
-
-The `index.translog.flush_threshold_ops` setting is not supported anymore. In order to control flushes based on the transaction log
-growth use `index.translog.flush_threshold_size` instead. Changing the translog type with `index.translog.fs.type` is not supported
-anymore, the `buffered` implementation is now the only available option and uses a fixed `8kb` buffer.
-
-The translog by default is fsynced on a request basis such that the ability to fsync on every operation is not necessary anymore. In fact it can
-be a performance bottleneck and it's trappy since it is enabled by a special value set on `index.translog.sync_interval`. `index.translog.sync_interval`
-now doesn't accept a value less than `100ms` which prevents fsyncing too often if async durability is enabled. The special value `0` is not supported anymore.
-
-==== Request Cache Settings
-
-The deprecated settings `index.cache.query.enable` and `indices.cache.query.size` have been removed and are replaced with
-`index.requests.cache.enable` and `indices.requests.cache.size` respectively.
-
-`indices.requests.cache.clean_interval` has been replaced with `indices.cache.clean_interval` and is no longer supported.
-
-==== Field Data Cache Settings
-
-`indices.fielddata.cache.clean_interval` has been replaced with `indices.cache.clean_interval` and is no longer supported.
-
-==== Allocation settings
-
-Allocation settings deprecated in 1.x have been removed:
-
- * `cluster.routing.allocation.concurrent_recoveries` is superseded by `cluster.routing.allocation.node_concurrent_recoveries`
-
-Please change the setting in your configuration files or in the cluster state to use the new settings instead.
-
-==== Similarity settings
-
-The 'default' similarity has been renamed to 'classic'.
-
-==== Indexing settings
-
-`indices.memory.min_shard_index_buffer_size` and `indices.memory.max_shard_index_buffer_size` are removed since Elasticsearch now allows any one shard to use any
-amount of heap as long as the total indexing buffer heap used across all shards is below the node's `indices.memory.index_buffer_size` (default: 10% of the JVM heap).
-
-==== Removed es.max-open-files
-
-Setting the system property es.max-open-files to true to get
-Elasticsearch to print the number of maximum open files for the
-Elasticsearch process has been removed. This same information can be
-obtained from the <> API, and a warning is logged
-on startup if it is set too low.
-
-==== Removed es.netty.gathering
-
-Disabling Netty from using NIO gathering could be done via the escape
-hatch of setting the system property "es.netty.gathering" to "false".
-Time has proven enabling gathering by default is a non-issue and this
-non-documented setting has been removed.
-
-==== Removed es.useLinkedTransferQueue
-
-The system property `es.useLinkedTransferQueue` could be used to
-control the queue implementation used in the cluster service and the
-handling of ping responses during discovery. This was an undocumented
-setting and has been removed.
-
-[[breaking_50_mapping_changes]]
-=== Mapping changes
-
-==== Default doc values settings
-
-Doc values are now also on by default on numeric and boolean fields that are
-not indexed.
-
-==== Transform removed
-
-The `transform` feature from mappings has been removed. It made issues very hard to debug.
-
-==== Default number mappings
-
-When a floating-point number is encountered, it is now dynamically mapped as a
-float by default instead of a double. The reasoning is that floats should be
-more than enough for most cases but would decrease storage requirements
-significantly.
-
-==== `index` property
-
-On all types but `string`, the `index` property now only accepts `true`/`false`
-instead of `not_analyzed`/`no`. The `string` field still accepts
-`analyzed`/`not_analyzed`/`no`.
-
-==== ++_source++'s `format` option
-
-The `_source` mapping does not support the `format` option anymore. This option
-will still be accepted for indices created before the upgrade to 5.0 for backward
-compatibility, but it will have no effect. Indices created on or after 5.0 will
-reject this option.
-
-==== Object notation
-
-Core types don't support the object notation anymore, which allowed to provide
-values as follows:
-
-[source,json]
----------------
-{
-  "value": "field_value",
-  "boost": 42
-}
----------------
-
-==== `fielddata.format`
-
-Setting `fielddata.format: doc_values` in the mappings used to implicitly
-enable doc values on a field. This no longer works: the only way to enable or
-disable doc values is by using the `doc_values` property of mappings.
-
-
-[[breaking_50_plugins]]
-=== Plugin changes
-
-The command `bin/plugin` has been renamed to `bin/elasticsearch-plugin`.
-The structure of the plugin has changed. All the plugin files must be contained in a directory called `elasticsearch`.
-If you use the gradle build, this structure is automatically generated.
-
-==== Site plugins removed
-
-Site plugins have been removed.
It is recommended to migrate site plugins to Kibana plugins.
-
-==== Multicast plugin removed
-
-Multicast has been removed. Use unicast discovery, or one of the cloud discovery plugins.
-
-==== Plugins with custom query implementations
-
-Plugins implementing custom queries need to implement the `fromXContent(QueryParseContext)` method in their
-`QueryParser` subclass rather than `parse`. This method will take care of parsing the query from `XContent` format
-into an intermediate query representation that can be streamed between the nodes in binary format, effectively the
-query object used in the java api. Also, the query parser needs to implement the `getBuilderPrototype` method that
-returns a prototype of the `NamedWriteable` query, which allows to deserialize an incoming query by calling
-`readFrom(StreamInput)` against it, which will create a new object, see usages of `Writeable`. The `QueryParser`
-also needs to declare the generic type of the query that it supports and it's able to parse.
-The query object can then transform itself into a lucene query through the new `toQuery(QueryShardContext)` method,
-which returns a lucene query to be executed on the data node.
-
-Similarly, plugins implementing custom score functions need to implement the `fromXContent(QueryParseContext)`
-method in their `ScoreFunctionParser` subclass rather than `parse`. This method will take care of parsing
-the function from `XContent` format into an intermediate function representation that can be streamed between
-the nodes in binary format, effectively the function object used in the java api. Also, the query parser needs
-to implement the `getBuilderPrototype` method that returns a prototype of the `NamedWriteable` function, which
-allows to deserialize an incoming function by calling `readFrom(StreamInput)` against it, which will create a
-new object, see usages of `Writeable`.
The `ScoreFunctionParser` also needs to declare the generic type of the
-function that it supports and it's able to parse. The function object can then transform itself into a lucene
-function through the new `toFunction(QueryShardContext)` method, which returns a lucene function to be executed
-on the data node.
-
-==== Cloud AWS plugin changes
-
-Cloud AWS plugin has been split in two plugins:
-
-* {plugins}/discovery-ec2.html[Discovery EC2 plugin]
-* {plugins}/repository-s3.html[Repository S3 plugin]
-
-Proxy settings for both plugins have been renamed:
-
-* from `cloud.aws.proxy_host` to `cloud.aws.proxy.host`
-* from `cloud.aws.ec2.proxy_host` to `cloud.aws.ec2.proxy.host`
-* from `cloud.aws.s3.proxy_host` to `cloud.aws.s3.proxy.host`
-* from `cloud.aws.proxy_port` to `cloud.aws.proxy.port`
-* from `cloud.aws.ec2.proxy_port` to `cloud.aws.ec2.proxy.port`
-* from `cloud.aws.s3.proxy_port` to `cloud.aws.s3.proxy.port`
-
-==== Cloud Azure plugin changes
-
-Cloud Azure plugin has been split in three plugins:
-
-* {plugins}/discovery-azure.html[Discovery Azure plugin]
-* {plugins}/repository-azure.html[Repository Azure plugin]
-* {plugins}/store-smb.html[Store SMB plugin]
-
-If you were using the `cloud-azure` plugin for snapshot and restore, you had in `elasticsearch.yml`:
-
-[source,yaml]
------
-cloud:
-    azure:
-        storage:
-            account: your_azure_storage_account
-            key: your_azure_storage_key
------
-
-You need to give a unique id to the storage details now as you can define multiple storage accounts:
-
-[source,yaml]
------
-cloud:
-    azure:
-        storage:
-            my_account:
-                account: your_azure_storage_account
-                key: your_azure_storage_key
------
-
-
-==== Cloud GCE plugin changes
-
-Cloud GCE plugin has been renamed to {plugins}/discovery-gce.html[Discovery GCE plugin].
-
-
-==== Mapper Attachments plugin deprecated
-
-Mapper attachments has been deprecated. Users should now use the {plugins}/ingest-attachment.html[`ingest-attachment`]
-plugin.
-
-
-[[breaking_50_java_api_changes]]
-=== Java API changes
-
-==== Count api has been removed
-
-The deprecated count api has been removed from the Java api, use the search api instead and set size to 0.
-
-The following call
-
-[source,java]
------
-client.prepareCount(indices).setQuery(query).get();
------
-
-can be replaced with
-
-[source,java]
------
-client.prepareSearch(indices).setSource(new SearchSourceBuilder().size(0).query(query)).get();
------
-
-==== BoostingQueryBuilder
-
-Removed setters for mandatory positive/negative query. Both arguments now have
-to be supplied at construction time already and have to be non-null.
-
-==== SpanContainingQueryBuilder
-
-Removed setters for mandatory big/little inner span queries. Both arguments now have
-to be supplied at construction time already and have to be non-null. Updated
-static factory methods in QueryBuilders accordingly.
-
-==== SpanOrQueryBuilder
-
-Making sure that query contains at least one clause by making initial clause mandatory
-in constructor.
-
-==== SpanNearQueryBuilder
-
-Removed setter for mandatory slop parameter, needs to be set in constructor now. Also
-making sure that query contains at least one clause by making initial clause mandatory
-in constructor. Updated the static factory methods in QueryBuilders accordingly.
-
-==== SpanNotQueryBuilder
-
-Removed setter for mandatory include/exclude span query clause, needs to be set in constructor now.
-Updated the static factory methods in QueryBuilders and tests accordingly.
-
-==== SpanWithinQueryBuilder
-
-Removed setters for mandatory big/little inner span queries. Both arguments now have
-to be supplied at construction time already and have to be non-null. Updated
-static factory methods in QueryBuilders accordingly.
-
-==== QueryFilterBuilder
-
-Removed the setter `queryName(String queryName)` since this field is not supported
-in this type of query.
Use `FQueryFilterBuilder.queryName(String queryName)` instead
-when in need to wrap a named query as a filter.
-
-==== WrapperQueryBuilder
-
-Removed `wrapperQueryBuilder(byte[] source, int offset, int length)`. Instead simply
-use `wrapperQueryBuilder(byte[] source)`. Updated the static factory methods in
-QueryBuilders accordingly.
-
-==== QueryStringQueryBuilder
-
-Removed ability to pass in boost value using `field(String field)` method in form e.g. `field^2`.
-Use the `field(String, float)` method instead.
-
-==== Operator
-
-Removed the enums called `Operator` from `MatchQueryBuilder`, `QueryStringQueryBuilder`,
-`SimpleQueryStringBuilder`, and `CommonTermsQueryBuilder` in favour of using the enum
-defined in `org.elasticsearch.index.query.Operator` in an effort to consolidate the
-codebase and avoid duplication.
-
-==== queryName and boost support
-
-Support for `queryName` and `boost` has been streamlined to all of the queries. That is
-a breaking change till queries get sent over the network as serialized json rather
-than in `Streamable` format. In fact whenever additional fields are added to the json
-representation of the query, older nodes might throw an error when they find unknown fields.
-
-==== InnerHitsBuilder
-
-InnerHitsBuilder now has dedicated addParentChildInnerHits and addNestedInnerHits methods
-to differentiate between inner hits for nested vs. parent / child documents. This change
-makes the type / path parameter mandatory.
-
-==== MatchQueryBuilder
-
-Moving MatchQueryBuilder.Type and MatchQueryBuilder.ZeroTermsQuery enum to MatchQuery.Type.
-Also reusing new Operator enum.
-
-==== MoreLikeThisQueryBuilder
-
-Removed `MoreLikeThisQueryBuilder.Item#id(String id)`, `Item#doc(BytesReference doc)`,
-`Item#doc(XContentBuilder doc)`. Use provided constructors instead.
-
-Removed `MoreLikeThisQueryBuilder#addLike` in favor of texts and/or items being provided
-at construction time. Using arrays there instead of lists now.
-
-Removed `MoreLikeThisQueryBuilder#addUnlike` in favor of using the `unlike` methods
-which take arrays as arguments now rather than the lists used before.
-
-The deprecated `docs(Item... docs)`, `ignoreLike(Item... docs)`,
-`ignoreLike(String... likeText)`, `addItem(Item... likeItems)` have been removed.
-
-==== GeoDistanceQueryBuilder
-
-Removing individual setters for lon() and lat() values, both values should be set together
- using point(lon, lat).
-
-==== GeoDistanceRangeQueryBuilder
-
-Removing setters for to(Object ...) and from(Object ...) in favour of the only two allowed input
-arguments (String, Number). Removing setter for center point (point(), geohash()) because parameter
-is mandatory and should already be set in constructor.
-Also removing setters for lt(), lte(), gt(), gte() since they can all be replaced by equivalent
-calls to to/from() and includeLower()/includeUpper().
-
-==== GeoPolygonQueryBuilder
-
-Require shell of polygon already to be specified in constructor instead of adding it pointwise.
-This enables validation, but makes it necessary to remove the addPoint() methods.
-
-==== MultiMatchQueryBuilder
-
-Moving MultiMatchQueryBuilder.ZeroTermsQuery enum to MatchQuery.ZeroTermsQuery.
-Also reusing new Operator enum.
-
-Removed ability to pass in boost value using `field(String field)` method in form e.g. `field^2`.
-Use the `field(String, float)` method instead.
-
-==== MissingQueryBuilder
-
-The MissingQueryBuilder which was deprecated in 2.2.0 is removed. As a replacement use ExistsQueryBuilder
-inside a mustNot() clause. So instead of using `new MissingQueryBuilder(name)` now use
-`new BoolQueryBuilder().mustNot(new ExistsQueryBuilder(name))`.
-
-==== NotQueryBuilder
-
-The NotQueryBuilder which was deprecated in 2.1.0 is removed. As a replacement use BoolQueryBuilder
-with added mustNot() clause. So instead of using `new NotQueryBuilder(filter)` now use
-`new BoolQueryBuilder().mustNot(filter)`.
-
-==== TermsQueryBuilder
-
-Remove the setter for `termsLookup()`, making it only possible to either use a TermsLookup object or
-individual values at construction time. Also moving individual settings for the TermsLookup (lookupIndex,
-lookupType, lookupId, lookupPath) to the separate TermsLookup class, using constructor only and moving
-checks for validation there. Removed `TermsLookupQueryBuilder` in favour of `TermsQueryBuilder`.
-
-==== FunctionScoreQueryBuilder
-
-`add` methods have been removed, all filters and functions must be provided as constructor arguments by
-creating an array of `FunctionScoreQueryBuilder.FilterFunctionBuilder` objects, containing one element
-for each filter/function pair.
-
-`scoreMode` and `boostMode` can only be provided using corresponding enum members instead
-of string values: see `FilterFunctionScoreQuery.ScoreMode` and `CombineFunction`.
-
-`CombineFunction.MULT` has been renamed to `MULTIPLY`.
-
-==== IdsQueryBuilder
-
-For simplicity, only one way of adding the ids to the existing list (empty by default) is left: `addIds(String...)`
-
-==== DocumentAlreadyExistsException removed
-
-`DocumentAlreadyExistsException` is removed and a `VersionConflictException` is thrown instead (with a better
-error description). This will influence code that uses the `IndexRequest.opType()` or `IndexRequest.create()`
-to index a document only if it doesn't already exist.
-
-==== ShapeBuilders
-
-`InternalLineStringBuilder` is removed in favour of `LineStringBuilder`, `InternalPolygonBuilder` in favour of `PolygonBuilder` and `Ring` has been replaced with `LineStringBuilder`. Also the abstract base classes `BaseLineStringBuilder` and `BasePolygonBuilder` have been merged with their corresponding implementations.
-
-==== RescoreBuilder
-
-`RescoreBuilder.Rescorer` was merged with `RescoreBuilder`, which now is an abstract superclass. QueryRescoreBuilder currently is its only implementation.
-
-==== PhraseSuggestionBuilder
-
-The inner DirectCandidateGenerator class has been moved out to its own class called DirectCandidateGeneratorBuilder.
-
-==== Elasticsearch will no longer detect logging implementations
-
-Elasticsearch now logs only to log4j 1.2. Previously if log4j wasn't on the classpath it made some effort to degrade to
-slf4j or java.util.logging. Now it'll fail to work without the log4j 1.2 api. The log4j-over-slf4j bridge ought to work
-when using the java client. As should log4j 2's log4j-1.2-api. The Elasticsearch server now only supports log4j as
-configured by logging.yml and it no longer makes any effort to work if log4j isn't present.
-
-[[breaking_50_cache_concurrency]]
-=== Cache concurrency level settings removed
-
-Two cache concurrency level settings `indices.requests.cache.concurrency_level` and
-`indices.fielddata.cache.concurrency_level` have been removed because they no longer apply to the cache implementation used for the
-request cache and the field data cache.
-
-[[breaking_50_non_loopback]]
-=== Remove bind option of `non_loopback`
-
-This setting would arbitrarily pick the first interface not marked as loopback. Instead, specify by address
-scope (e.g. `_local_,_site_` for all loopback and private network addresses) or by explicit interface names,
-hostnames, or addresses.
-
-[[breaking_50_thread_pool]]
-=== Forbid changing of thread pool types
-
-Previously, <> could be dynamically adjusted. The thread pool type effectively
-controls the backing queue for the thread pool and modifying this is an expert setting with minimal practical benefits
-and high risk of being misused. The ability to change the thread pool type for any thread pool has been removed; do note
-that it is still possible to adjust relevant thread pool parameters for each of the thread pools (e.g., depending on
-the thread pool type, `keep_alive`, `queue_size`, etc.).
- -[[breaking_50_cpu_stats]] -=== System CPU stats - -The recent CPU usage (as a percent) has been added to the OS stats -reported under the node stats API and the cat nodes API. The breaking -change here is that there is a new object in the `os` object in the node -stats response. This object is called `cpu` and includes "percent" and -`load_average` as fields. This moves the `load_average` field that was -previously a top-level field in the `os` object to the `cpu` object. The -format of the `load_average` field has changed to an object with fields -`1m`, `5m`, and `15m` representing the one-minute, five-minute and -fifteen-minute loads respectively. If any of these fields are not present, -it indicates that the corresponding value is not available. - -In the cat nodes API response, the `cpu` field is output by default. The -previous `load` field has been removed and is replaced by `load_1m`, -`load_5m`, and `load_15m` which represent the one-minute, five-minute -and fifteen-minute loads respectively. The field will be null if the -corresponding value is not available. - -Finally, the API for `org.elasticsearch.monitor.os.OsStats` has -changed. The `getLoadAverage` method has been removed. The value for -this can now be obtained from `OsStats.Cpu#getLoadAverage` but it is no -longer a double and is instead an object encapsulating the one-minute, -five-minute and fifteen-minute load averages. Additionally, the recent -CPU usage can be obtained from `OsStats.Cpu#getPercent`. - -=== Fields option -Only stored fields are retrievable with this option. -The fields option won't be able to load non stored fields from _source anymore. - -[[breaking_50_allocation]] -=== Primary shard allocation - -Previously, primary shards were only assigned if a quorum of shard copies were found (configurable using -`index.recovery.initial_shards`, now deprecated). In case where a primary had only a single replica, quorum was defined -to be a single shard. 
This meant that any shard copy of an index with replication factor 1 could become primary, even it -was a stale copy of the data on disk. This is now fixed by using allocation IDs. - -Allocation IDs assign unique identifiers to shard copies. This allows the cluster to differentiate between multiple -copies of the same data and track which shards have been active, so that after a cluster restart, shard copies -containing only the most recent data can become primaries. - -=== Indices Shard Stores command - -By using allocation IDs instead of version numbers to identify shard copies for primary shard allocation, the former versioning scheme -has become obsolete. This is reflected in the indices-shards-stores.html[Indices Shard Stores API]. A new field `allocation_id` replaces the -former `version` field in the result of the Indices Shard Stores command. This field is available for all shard copies that have been either -created with the current version of Elasticsearch or have been active in a cluster running a current version of Elasticsearch. For legacy -shard copies that have not been active in a current version of Elasticsearch, a `legacy_version` field is available instead (equivalent to -the former `version` field). - -=== Reroute commands - -The reroute command `allocate` has been split into two distinct commands `allocate_replica` and `allocate_empty_primary`. -This was done as we introduced a new `allocate_stale_primary` command. The new `allocate_replica` command corresponds to the -old `allocate` command with `allow_primary` set to false. The new `allocate_empty_primary` command corresponds to the old -`allocate` command with `allow_primary` set to true. - -==== `index.shared_filesystem.recover_on_any_node` changes - -The behavior of `index.shared_filesystem.recover_on_any_node = true` has been changed. Previously, in the case where no -shard copies could be found, an arbitrary node was chosen by potentially ignoring allocation deciders. 
Now, we take -balancing into account but don't assign the shard if the allocation deciders are not satisfied. The behavior has also changed -in the case where shard copies can be found. Previously, a node not holding the shard copy was chosen if none of the nodes -holding shard copies were satisfying the allocation deciders. Now, the shard will be assigned to a node having a shard copy, -even if none of the nodes holding a shard copy satisfy the allocation deciders. - -[[breaking_50_percolator]] -=== Percolator - -Adding percolator queries and modifications to existing percolator queries are no longer visible in immediately -to the percolator. A refresh is required to run before the changes are visible to the percolator. - -The reason that this has changed is that on newly created indices the percolator automatically indexes the query terms -and these query terms are used at percolate time to reduce the amount of queries the percolate API needs evaluate. -This optimization didn't work in the percolate API mode where modifications to queries are immediately visible. - -The percolator by defaults sets the `size` option to `10` whereas before this was set to unlimited. - -The percolate api can no longer accept documents that have fields that don't exist in the mapping. - -When percolating an existing document then specifying a document in the source of the percolate request is not allowed -any more. - -The percolate api no longer modifies the mappings. Before the percolate api could be used to dynamically introduce new -fields to the mappings based on the fields in the document being percolated. This no longer works, because these -unmapped fields are not persisted in the mapping. - -Percolator documents are no longer excluded from the search response. 
- -[[breaking_50_packaging]] -=== Packaging - -==== Default logging using systemd (since Elasticsearch 2.2.0) - -In previous versions of Elasticsearch, the default logging -configuration routed standard output to /dev/null and standard error to -the journal. However, there are often critical error messages at -startup that are logged to standard output rather than standard error -and these error messages would be lost to the nether. The default has -changed to now route standard output to the journal and standard error -to inherit this setting (these are the defaults for systemd). These -settings can be modified by editing the elasticsearch.service file. - -==== Longer startup times - -In Elasticsearch 5.0.0 the `-XX:+AlwaysPreTouch` flag has been added to the JVM -startup options. This option touches all memory pages used by the JVM heap -during initialization of the HotSpot VM to reduce the chance of having to commit -a memory page during GC time. This will increase the startup time of -Elasticsearch as well as increasing the initial resident memory usage of the -Java process. - -[[breaking_50_scripting]] -=== Scripting - -==== Script mode settings - -Previously script mode settings (e.g., "script.inline: true", -"script.engine.groovy.inline.aggs: false", etc.) accepted the values -`on`, `true`, `1`, and `yes` for enabling a scripting mode, and the -values `off`, `false`, `0`, and `no` for disabling a scripting mode. -The variants `on`, `1`, and `yes ` for enabling and `off`, `0`, -and `no` for disabling are no longer supported. - -==== Groovy dependencies - -In previous versions of Elasticsearch, the Groovy scripting capabilities -depended on the `org.codehaus.groovy:groovy-all` artifact. In addition -to pulling in the Groovy language, this pulls in a very large set of -functionality, none of which is needed for scripting within -Elasticsearch. 
Aside from the inherent difficulties in managing such a -large set of dependencies, this also increases the surface area for -security issues. This dependency has been reduced to the core Groovy -language `org.codehaus.groovy:groovy` artifact. - -[[breaking_50_term_vectors]] -=== Term vectors - -The term vectors APIs no longer persist unmapped fields in the mappings. - -The `dfs` parameter has been removed completely, term vectors don't support -distributed document frequencies anymore. - -[[breaking_50_security]] -=== Security - -The option to disable the security manager `--security.manager.enabled` has been removed. In order to grant special -permissions to elasticsearch users must tweak the local Java Security Policy. - -[[breaking_50_snapshot_restore]] -=== Snapshot/Restore - -==== Closing / deleting indices while running snapshot - -In previous versions of Elasticsearch, closing or deleting an index during a full snapshot would make the snapshot fail. This is now changed -by failing the close/delete index request instead. The behavior for partial snapshots remains unchanged: Closing or deleting an index during -a partial snapshot is still possible. The snapshot result is then marked as partial. diff --git a/docs/reference/migration/migrate_5_0/allocation.asciidoc b/docs/reference/migration/migrate_5_0/allocation.asciidoc new file mode 100644 index 00000000000..1e095831381 --- /dev/null +++ b/docs/reference/migration/migrate_5_0/allocation.asciidoc @@ -0,0 +1,54 @@ +[[breaking_50_allocation]] +=== Allocation changes + +==== Primary shard allocation + +Previously, primary shards were only assigned if a quorum of shard copies were +found (configurable using `index.recovery.initial_shards`, now deprecated). In +the case where a primary had only a single replica, quorum was defined to be a +single shard. This meant that any shard copy of an index with replication +factor 1 could become primary, even if it was a stale copy of the data on disk. 
+This is now fixed thanks to shard allocation IDs. + +Allocation IDs assign unique identifiers to shard copies. This allows the +cluster to differentiate between multiple copies of the same data and track +which shards have been active so that, after a cluster restart, only shard +copies containing the most recent data can become primaries. + +==== Indices Shard Stores command + +By using allocation IDs instead of version numbers to identify shard copies +for primary shard allocation, the former versioning scheme has become +obsolete. This is reflected in the +<>. + +A new `allocation_id` field replaces the former `version` field in the result +of the Indices Shard Stores command. This field is available for all shard +copies that have been either created with the current version of Elasticsearch +or have been active in a cluster running a current version of Elasticsearch. +For legacy shard copies that have not been active in a current version of +Elasticsearch, a `legacy_version` field is available instead (equivalent to +the former `version` field). + +==== Reroute commands + +The reroute command `allocate` has been split into two distinct commands +`allocate_replica` and `allocate_empty_primary`. This was done as we +introduced a new `allocate_stale_primary` command. The new `allocate_replica` +command corresponds to the old `allocate` command with `allow_primary` set to +false. The new `allocate_empty_primary` command corresponds to the old +`allocate` command with `allow_primary` set to true. + +==== `index.shared_filesystem.recover_on_any_node` changes + +The behavior of `index.shared_filesystem.recover_on_any_node: true` has been +changed. Previously, in the case where no shard copies could be found, an +arbitrary node was chosen by potentially ignoring allocation deciders. Now, we +take balancing into account but don't assign the shard if the allocation +deciders are not satisfied. 
+ +The behavior has also changed in the case where shard copies can be found. +Previously, a node not holding the shard copy was chosen if none of the nodes +holding shard copies were satisfying the allocation deciders. Now, the shard +will be assigned to a node having a shard copy, even if none of the nodes +holding a shard copy satisfy the allocation deciders. diff --git a/docs/reference/migration/migrate_5_0/cat.asciidoc b/docs/reference/migration/migrate_5_0/cat.asciidoc new file mode 100644 index 00000000000..c3b1c84ee8d --- /dev/null +++ b/docs/reference/migration/migrate_5_0/cat.asciidoc @@ -0,0 +1,33 @@ +[[breaking_50_cat_api]] +=== CAT API changes + +==== Use Accept header for specifying response media type + +Previous versions of Elasticsearch accepted the Content-type header +field for controlling the media type of the response in the cat API. +This is in opposition to the HTTP spec which specifies the Accept +header field for this purpose. Elasticsearch now uses the Accept header +field and support for using the Content-Type header field for this +purpose has been removed. + +==== Host field removed from the cat nodes API + +The `host` field has been removed from the cat nodes API as its value +is always equal to the `ip` field. The `name` field is available in the +cat nodes API and should be used instead of the `host` field. + +==== Changes to cat recovery API + +The fields `bytes_recovered` and `files_recovered` have been added to +the cat recovery API. These fields, respectively, indicate the total +number of bytes and files that have been recovered. + +The fields `total_files` and `total_bytes` have been renamed to +`files_total` and `bytes_total`, respectively. + +Additionally, the field `translog` has been renamed to +`translog_ops_recovered`, the field `translog_total` to +`translog_ops` and the field `translog_percent` to +`translog_ops_percent`. The short aliases for these fields are `tor`, +`to`, and `top`, respectively. 
+ diff --git a/docs/reference/migration/migrate_5_0/index-apis.asciidoc b/docs/reference/migration/migrate_5_0/index-apis.asciidoc new file mode 100644 index 00000000000..72651295bbc --- /dev/null +++ b/docs/reference/migration/migrate_5_0/index-apis.asciidoc @@ -0,0 +1,48 @@ +[[breaking_50_index_apis]] +=== Index APIs changes + +==== Closing / deleting indices while running snapshot + +In previous versions of Elasticsearch, closing or deleting an index during a +full snapshot would make the snapshot fail. In 5.0, the close/delete index +request will fail instead. The behavior for partial snapshots remains +unchanged: Closing or deleting an index during a partial snapshot is still +possible. The snapshot result is then marked as partial. + +==== Warmers + +Thanks to several changes like doc values by default and disk-based norms, +warmers are no longer useful. As a consequence, warmers and the warmer API +have been removed: it is no longer possible to register queries that will run +before a new IndexSearcher is published. + +Don't worry if you have warmers defined on your indices; they will simply be +ignored when upgrading to 5.0. + +==== System CPU stats + +The recent CPU usage (as a percent) has been added to the OS stats +reported under the node stats API and the cat nodes API. The breaking +change here is that there is a new object in the `os` object in the node +stats response. This object is called `cpu` and includes `percent` and +`load_average` as fields. This moves the `load_average` field that was +previously a top-level field in the `os` object to the `cpu` object. The +format of the `load_average` field has changed to an object with fields +`1m`, `5m`, and `15m` representing the one-minute, five-minute and +fifteen-minute loads respectively. If any of these fields are not present, +it indicates that the corresponding value is not available. + +In the cat nodes API response, the `cpu` field is output by default. 
The +previous `load` field has been removed and is replaced by `load_1m`, +`load_5m`, and `load_15m` which represent the one-minute, five-minute +and fifteen-minute loads respectively. The field will be null if the +corresponding value is not available. + +Finally, the API for `org.elasticsearch.monitor.os.OsStats` has +changed. The `getLoadAverage` method has been removed. The value for +this can now be obtained from `OsStats.Cpu#getLoadAverage` but it is no +longer a double and is instead an object encapsulating the one-minute, +five-minute and fifteen-minute load averages. Additionally, the recent +CPU usage can be obtained from `OsStats.Cpu#getPercent`. + + diff --git a/docs/reference/migration/migrate_5_0/java.asciidoc b/docs/reference/migration/migrate_5_0/java.asciidoc new file mode 100644 index 00000000000..d1b96eb9446 --- /dev/null +++ b/docs/reference/migration/migrate_5_0/java.asciidoc @@ -0,0 +1,213 @@ + + + +[[breaking_50_java_api_changes]] +=== Java API changes + +==== Count api has been removed + +The deprecated count api has been removed from the Java api, use the search api instead and set size to 0. + +The following call + +[source,java] +----- +client.prepareCount(indices).setQuery(query).get(); +----- + +can be replaced with + +[source,java] +----- +client.prepareSearch(indices).setSource(new SearchSourceBuilder().size(0).query(query)).get(); +----- + +==== Elasticsearch will no longer detect logging implementations + +Elasticsearch now logs only to log4j 1.2. Previously if log4j wasn't on the +classpath it made some effort to degrade to slf4j or java.util.logging. Now it +will fail to work without the log4j 1.2 api. The log4j-over-slf4j bridge ought +to work when using the java client, as should log4j 2's log4j-1.2-api. The +Elasticsearch server now only supports log4j as configured by `logging.yml` +and will fail if log4j isn't present. 
+ +==== Groovy dependencies + +In previous versions of Elasticsearch, the Groovy scripting capabilities +depended on the `org.codehaus.groovy:groovy-all` artifact. In addition +to pulling in the Groovy language, this pulls in a very large set of +functionality, none of which is needed for scripting within +Elasticsearch. Aside from the inherent difficulties in managing such a +large set of dependencies, this also increases the surface area for +security issues. This dependency has been reduced to the core Groovy +language `org.codehaus.groovy:groovy` artifact. + +==== DocumentAlreadyExistsException removed + +`DocumentAlreadyExistsException` is removed and a `VersionConflictException` is thrown instead (with a better +error description). This will influence code that uses `IndexRequest.opType()` or `IndexRequest.create()` +to index a document only if it doesn't already exist. + +==== Changes to Query Builders + +===== BoostingQueryBuilder + +Removed setters for mandatory positive/negative query. Both arguments now have +to be supplied at construction time already and have to be non-null. + +===== SpanContainingQueryBuilder + +Removed setters for mandatory big/little inner span queries. Both arguments now have +to be supplied at construction time already and have to be non-null. Updated +static factory methods in QueryBuilders accordingly. + +===== SpanOrQueryBuilder + +Making sure that query contains at least one clause by making initial clause mandatory +in constructor. + +===== SpanNearQueryBuilder + +Removed setter for mandatory slop parameter, needs to be set in constructor now. Also +making sure that query contains at least one clause by making initial clause mandatory +in constructor. Updated the static factory methods in QueryBuilders accordingly. + +===== SpanNotQueryBuilder + +Removed setter for mandatory include/exclude span query clause, needs to be set in constructor now. +Updated the static factory methods in QueryBuilders and tests accordingly. 
+ +===== SpanWithinQueryBuilder + +Removed setters for mandatory big/little inner span queries. Both arguments now have +to be supplied at construction time already and have to be non-null. Updated +static factory methods in QueryBuilders accordingly. + +===== QueryFilterBuilder + +Removed the setter `queryName(String queryName)` since this field is not supported +in this type of query. Use `FQueryFilterBuilder.queryName(String queryName)` instead +when you need to wrap a named query as a filter. + +===== WrapperQueryBuilder + +Removed `wrapperQueryBuilder(byte[] source, int offset, int length)`. Instead simply +use `wrapperQueryBuilder(byte[] source)`. Updated the static factory methods in +QueryBuilders accordingly. + +===== QueryStringQueryBuilder + +Removed ability to pass in boost value using `field(String field)` method in form e.g. `field^2`. +Use the `field(String, float)` method instead. + +===== Operator + +Removed the enums called `Operator` from `MatchQueryBuilder`, `QueryStringQueryBuilder`, +`SimpleQueryStringBuilder`, and `CommonTermsQueryBuilder` in favour of using the enum +defined in `org.elasticsearch.index.query.Operator` in an effort to consolidate the +codebase and avoid duplication. + +===== queryName and boost support + +Support for `queryName` and `boost` has been streamlined to all of the queries. That is +a breaking change till queries get sent over the network as serialized json rather +than in `Streamable` format. In fact whenever additional fields are added to the json +representation of the query, older nodes might throw an error when they find unknown fields. + +===== InnerHitsBuilder + +InnerHitsBuilder now has dedicated addParentChildInnerHits and addNestedInnerHits methods +to differentiate between inner hits for nested vs. parent / child documents. This change +makes the type / path parameter mandatory. + +===== MatchQueryBuilder + +Moving MatchQueryBuilder.Type and MatchQueryBuilder.ZeroTermsQuery enum to MatchQuery.Type. 
+Also reusing new Operator enum. + +===== MoreLikeThisQueryBuilder + +Removed `MoreLikeThisQueryBuilder.Item#id(String id)`, `Item#doc(BytesReference doc)`, +`Item#doc(XContentBuilder doc)`. Use provided constructors instead. + +Removed `MoreLikeThisQueryBuilder#addLike` in favor of texts and/or items being provided +at construction time. Using arrays there instead of lists now. + +Removed `MoreLikeThisQueryBuilder#addUnlike` in favor of using the `unlike` methods +which take arrays as arguments now rather than the lists used before. + +The deprecated `docs(Item... docs)`, `ignoreLike(Item... docs)`, +`ignoreLike(String... likeText)`, `addItem(Item... likeItems)` have been removed. + +===== GeoDistanceQueryBuilder + +Removing individual setters for lon() and lat() values, both values should be set together +using point(lon, lat). + +===== GeoDistanceRangeQueryBuilder + +Removing setters for to(Object ...) and from(Object ...) in favour of the only two allowed input +arguments (String, Number). Removing setter for center point (point(), geohash()) because parameter +is mandatory and should already be set in constructor. +Also removing setters for lt(), lte(), gt(), gte() since they can all be replaced by equivalent +calls to to/from() and includeLower()/includeUpper(). + +===== GeoPolygonQueryBuilder + +Require shell of polygon already to be specified in constructor instead of adding it pointwise. +This enables validation, but makes it necessary to remove the addPoint() methods. + +===== MultiMatchQueryBuilder + +Moving MultiMatchQueryBuilder.ZeroTermsQuery enum to MatchQuery.ZeroTermsQuery. +Also reusing new Operator enum. + +Removed ability to pass in boost value using `field(String field)` method in form e.g. `field^2`. +Use the `field(String, float)` method instead. + +===== MissingQueryBuilder + +The MissingQueryBuilder which was deprecated in 2.2.0 is removed. As a replacement use ExistsQueryBuilder +inside a mustNot() clause. 
So instead of using `new MissingQueryBuilder(name)` now use +`new BoolQueryBuilder().mustNot(new ExistsQueryBuilder(name))`. + +===== NotQueryBuilder + +The NotQueryBuilder which was deprecated in 2.1.0 is removed. As a replacement use BoolQueryBuilder +with added mustNot() clause. So instead of using `new NotQueryBuilder(filter)` now use +`new BoolQueryBuilder().mustNot(filter)`. + +===== TermsQueryBuilder + +Remove the setter for `termsLookup()`, making it only possible to either use a TermsLookup object or +individual values at construction time. Also moving individual settings for the TermsLookup (lookupIndex, +lookupType, lookupId, lookupPath) to the separate TermsLookup class, using constructor only and moving +checks for validation there. Removed `TermsLookupQueryBuilder` in favour of `TermsQueryBuilder`. + +===== FunctionScoreQueryBuilder + +`add` methods have been removed, all filters and functions must be provided as constructor arguments by +creating an array of `FunctionScoreQueryBuilder.FilterFunctionBuilder` objects, containing one element +for each filter/function pair. + +`scoreMode` and `boostMode` can only be provided using corresponding enum members instead +of string values: see `FilterFunctionScoreQuery.ScoreMode` and `CombineFunction`. + +`CombineFunction.MULT` has been renamed to `MULTIPLY`. + +===== IdsQueryBuilder + +For simplicity, only one way of adding the ids to the existing list (empty by default) is left: `addIds(String...)`. + +===== ShapeBuilders + +`InternalLineStringBuilder` is removed in favour of `LineStringBuilder`, `InternalPolygonBuilder` in favour of `PolygonBuilder` and `Ring` has been replaced with `LineStringBuilder`. Also the abstract base classes `BaseLineStringBuilder` and `BasePolygonBuilder` have been merged with their corresponding implementations. + +===== RescoreBuilder + +`RescoreBuilder.Rescorer` was merged with `RescoreBuilder`, which now is an abstract superclass. 
QueryRescoreBuilder currently is its only implementation. + +===== PhraseSuggestionBuilder + +The inner DirectCandidateGenerator class has been moved out to its own class called DirectCandidateGeneratorBuilder. + diff --git a/docs/reference/migration/migrate_5_0/mapping.asciidoc b/docs/reference/migration/migrate_5_0/mapping.asciidoc new file mode 100644 index 00000000000..768a2438d3e --- /dev/null +++ b/docs/reference/migration/migrate_5_0/mapping.asciidoc @@ -0,0 +1,82 @@ +[[breaking_50_mapping_changes]] +=== Mapping changes + +==== `string` fields replaced by `text`/`keyword` fields + +The `string` field datatype has been replaced by the `text` field for full +text analyzed content, and the `keyword` field for not-analyzed exact string +values. For backwards compatibility purposes, during the 5.x series: + +* `string` fields on pre-5.0 indices will function as before. +* New `string` fields can be added to pre-5.0 indices as before. +* `text` and `keyword` fields can also be added to pre-5.0 indices. +* When adding a `string` field to a new index, the field mapping will be + rewritten as a `text` or `keyword` field if possible, otherwise + an exception will be thrown. Certain configurations that were possible + with `string` fields are no longer possible with `text`/`keyword` fields + such as enabling `term_vectors` on a not-analyzed `keyword` field. + +==== `index` property + +On all field datatypes (except for the deprecated `string` field), the `index` +property now only accepts `true`/`false` instead of `not_analyzed`/`no`. The +`string` field still accepts `analyzed`/`not_analyzed`/`no`. + +==== Doc values on unindexed fields + +Previously, setting a field to `index:no` would also disable doc-values. Now, +doc-values are always enabled on numeric and boolean fields unless +`doc_values` is set to `false`. 
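+
+As a rough sketch of how the three mapping changes above fit together, the
+following body could be sent when creating a new 5.0 index. The type name
+`my_type` and all field names are hypothetical and chosen only for
+illustration: `title` is analyzed full text, `tag` is an exact value,
+`session_id` is not searchable but keeps its default doc-values setting, and
+`hits` is a numeric field with both indexing and doc-values switched off
+explicitly.
+
+[source,json]
+---------------
+{
+  "mappings": {
+    "my_type": {
+      "properties": {
+        "title":      { "type": "text" },
+        "tag":        { "type": "keyword" },
+        "session_id": { "type": "keyword", "index": false },
+        "hits":       { "type": "long", "index": false, "doc_values": false }
+      }
+    }
+  }
+}
+---------------
+
+Note that leaving out `"doc_values": false` on `hits` would keep doc-values
+enabled even though the field is not indexed.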
+ +==== Floating points use `float` instead of `double` + +When dynamically mapping a field containing a floating point number, the field +now defaults to using `float` instead of `double`. The reasoning is that +floats should be more than enough for most cases and decrease storage +requirements significantly. + +==== `fielddata.format` + +Setting `fielddata.format: doc_values` in the mappings used to implicitly +enable doc-values on a field. This no longer works: the only way to enable or +disable doc-values is by using the `doc_values` property of mappings. + +==== Source-transform removed + +The source `transform` feature has been removed. Instead, use an ingest pipeline. + +==== `_parent` field no longer indexed + +The join between parent and child documents no longer relies on indexed fields +and therefore from 5.0.0 onwards the `_parent` field is no longer indexed. In +order to find documents that refer to a specific parent id the new +`parent_id` query can be used. The GET response and hits inside the search +response still include the parent id under the `_parent` key. + +==== Source `format` option + +The `_source` mapping no longer supports the `format` option. It will still be +accepted for indices created before the upgrade to 5.0 for backwards +compatibility, but it will have no effect. Indices created on or after 5.0 +will reject this option. + +==== Object notation + +Core types no longer support the object notation, which was used to provide +per document boosts as follows: + +[source,json] +--------------- +{ + "value": "field_value", + "boost": 42 +} +--------------- + +==== Boost accuracy for queries on `_all` + +Per-field boosts on the `_all` field are now compressed into a single byte instead +of the 4 bytes used previously. While this will make the index much more +space-efficient, it also means that index time boosts will be less accurately +encoded. 
+ diff --git a/docs/reference/migration/migrate_5_0/packaging.asciidoc b/docs/reference/migration/migrate_5_0/packaging.asciidoc new file mode 100644 index 00000000000..9be2d4accac --- /dev/null +++ b/docs/reference/migration/migrate_5_0/packaging.asciidoc @@ -0,0 +1,24 @@ +[[breaking_50_packaging]] +=== Packaging + +==== Default logging using systemd (since Elasticsearch 2.2.0) + +In previous versions of Elasticsearch, the default logging +configuration routed standard output to /dev/null and standard error to +the journal. However, there are often critical error messages at +startup that are logged to standard output rather than standard error +and these error messages would be lost to the nether. The default has +changed to now route standard output to the journal and standard error +to inherit this setting (these are the defaults for systemd). These +settings can be modified by editing the elasticsearch.service file. + +==== Longer startup times + +In Elasticsearch 5.0.0 the `-XX:+AlwaysPreTouch` flag has been added to the JVM +startup options. This option touches all memory pages used by the JVM heap +during initialization of the HotSpot VM to reduce the chance of having to commit +a memory page during GC time. This will increase the startup time of +Elasticsearch as well as increasing the initial resident memory usage of the +Java process. + + diff --git a/docs/reference/migration/migrate_5_0/percolator.asciidoc b/docs/reference/migration/migrate_5_0/percolator.asciidoc new file mode 100644 index 00000000000..3c560182c87 --- /dev/null +++ b/docs/reference/migration/migrate_5_0/percolator.asciidoc @@ -0,0 +1,41 @@ +[[breaking_50_percolator]] +=== Percolator changes + +==== Percolator is near-real time + +Previously percolators were activated in real-time, i.e. as soon as they were +indexed. Now, changes to the percolator query are visible in near-real time, +as soon as the index has been refreshed. 
This change was required because, in +indices created from 5.0 onwards, the terms used in a percolator query are +automatically indexed to allow for more efficient query selection during +percolation. + +==== Percolator mapping + +The percolate API can no longer accept documents that reference fields that +don't already exist in the mapping. + +The percolate API no longer modifies the mappings. Before the percolate API +could be used to dynamically introduce new fields to the mappings based on the +fields in the document being percolated. This no longer works, because these +unmapped fields are not persisted in the mapping. + +==== Percolator documents returned by search + +Documents with the `.percolate` type were previously excluded from the search +response, unless the `.percolate` type was specified explicitly in the search +request. Now, percolator documents are treated in the same way as any other +document and are returned by search requests. + +==== Percolator `size` default + +The percolator by default sets the `size` option to `10` whereas before this +was unlimited. + +==== Percolate API + +When percolating an existing document then specifying a document in the source +of the percolate request is not allowed any more. + + + diff --git a/docs/reference/migration/migrate_5_0/plugins.asciidoc b/docs/reference/migration/migrate_5_0/plugins.asciidoc new file mode 100644 index 00000000000..10268887417 --- /dev/null +++ b/docs/reference/migration/migrate_5_0/plugins.asciidoc @@ -0,0 +1,99 @@ +[[breaking_50_plugins]] +=== Plugin changes + +The command `bin/plugin` has been renamed to `bin/elasticsearch-plugin`. The +structure of the plugin ZIP archive has changed. All the plugin files must be +contained in a top-level directory called `elasticsearch`. If you use the +gradle build, this structure is automatically generated. + +==== Site plugins removed + +Site plugins have been removed. Site plugins should be reimplemented as Kibana +plugins. 
+ +==== Multicast plugin removed + +Multicast has been removed. Use unicast discovery, or one of the cloud +discovery plugins. + +==== Plugins with custom query implementations + +Plugins implementing custom queries need to implement the `fromXContent(QueryParseContext)` method in their +`QueryParser` subclass rather than `parse`. This method will take care of parsing the query from `XContent` format +into an intermediate query representation that can be streamed between the nodes in binary format, effectively the +query object used in the Java API. Also, the query parser needs to implement the `getBuilderPrototype` method that +returns a prototype of the `NamedWriteable` query, which allows an incoming query to be deserialized by calling +`readFrom(StreamInput)` against it, which will create a new object (see usages of `Writeable`). The `QueryParser` +also needs to declare the generic type of the query that it supports and is able to parse. +The query object can then transform itself into a Lucene query through the new `toQuery(QueryShardContext)` method, +which returns a Lucene query to be executed on the data node. + +Similarly, plugins implementing custom score functions need to implement the `fromXContent(QueryParseContext)` +method in their `ScoreFunctionParser` subclass rather than `parse`. This method will take care of parsing +the function from `XContent` format into an intermediate function representation that can be streamed between +the nodes in binary format, effectively the function object used in the Java API. Also, the function parser needs +to implement the `getBuilderPrototype` method that returns a prototype of the `NamedWriteable` function, which +allows an incoming function to be deserialized by calling `readFrom(StreamInput)` against it, which will create a +new object (see usages of `Writeable`). The `ScoreFunctionParser` also needs to declare the generic type of the +function that it supports and is able to parse.
The function object can then transform itself into a Lucene +function through the new `toFunction(QueryShardContext)` method, which returns a Lucene function to be executed +on the data node. + +==== Cloud AWS plugin changes + +Cloud AWS plugin has been split into two plugins: + +* {plugins}/discovery-ec2.html[Discovery EC2 plugin] +* {plugins}/repository-s3.html[Repository S3 plugin] + +Proxy settings for both plugins have been renamed: + +* from `cloud.aws.proxy_host` to `cloud.aws.proxy.host` +* from `cloud.aws.ec2.proxy_host` to `cloud.aws.ec2.proxy.host` +* from `cloud.aws.s3.proxy_host` to `cloud.aws.s3.proxy.host` +* from `cloud.aws.proxy_port` to `cloud.aws.proxy.port` +* from `cloud.aws.ec2.proxy_port` to `cloud.aws.ec2.proxy.port` +* from `cloud.aws.s3.proxy_port` to `cloud.aws.s3.proxy.port` + +==== Cloud Azure plugin changes + +Cloud Azure plugin has been split into three plugins: + +* {plugins}/discovery-azure.html[Discovery Azure plugin] +* {plugins}/repository-azure.html[Repository Azure plugin] +* {plugins}/store-smb.html[Store SMB plugin] + +If you were using the `cloud-azure` plugin for snapshot and restore, you had the following in `elasticsearch.yml`: + +[source,yaml] +----- +cloud: + azure: + storage: + account: your_azure_storage_account + key: your_azure_storage_key +----- + +You now need to give a unique id to the storage details, because multiple storage accounts can be defined: + +[source,yaml] +----- +cloud: + azure: + storage: + my_account: + account: your_azure_storage_account + key: your_azure_storage_key +----- + + +==== Cloud GCE plugin changes + +Cloud GCE plugin has been renamed to {plugins}/discovery-gce.html[Discovery GCE plugin]. + + +==== Mapper Attachments plugin deprecated + +Mapper attachments has been deprecated. Users should now use the {plugins}/ingest-attachment.html[`ingest-attachment`] +plugin.
+ diff --git a/docs/reference/migration/migrate_5_0/rest.asciidoc b/docs/reference/migration/migrate_5_0/rest.asciidoc new file mode 100644 index 00000000000..590a097f021 --- /dev/null +++ b/docs/reference/migration/migrate_5_0/rest.asciidoc @@ -0,0 +1,17 @@ + +[[breaking_50_rest_api_changes]] +=== REST API changes + +==== id values longer than 512 bytes are rejected + +When specifying an `_id` value longer than 512 bytes, the request will be +rejected. + +==== `/_optimize` endpoint removed + +The deprecated `/_optimize` endpoint has been removed. The `/_forcemerge` +endpoint should be used in lieu of optimize. + +The `GET` HTTP verb for `/_forcemerge` is no longer supported, please use the +`POST` HTTP verb. + diff --git a/docs/reference/migration/migrate_5_0/search.asciidoc b/docs/reference/migration/migrate_5_0/search.asciidoc new file mode 100644 index 00000000000..48807bf187a --- /dev/null +++ b/docs/reference/migration/migrate_5_0/search.asciidoc @@ -0,0 +1,141 @@ +[[breaking_50_search_changes]] +=== Search and Query DSL changes + +==== `search_type` + +===== `search_type=count` removed + +The `count` search type was deprecated since version 2.0.0 and is now removed. +In order to get the same benefits, you just need to set the value of the `size` +parameter to `0`. + +For instance, the following request: + +[source,sh] +--------------- +GET /my_index/_search?search_type=count +{ + "aggs": { + "my_terms": { + "terms": { + "field": "foo" + } + } + } +} +--------------- + +can be replaced with: + +[source,sh] +--------------- +GET /my_index/_search +{ + "size": 0, + "aggs": { + "my_terms": { + "terms": { + "field": "foo" + } + } + } +} +--------------- + +===== `search_type=scan` removed + +The `scan` search type was deprecated since version 2.1.0 and is now removed. 
+All benefits from this search type can now be achieved by doing a scroll +request that sorts documents in `_doc` order, for instance: + +[source,sh] +--------------- +GET /my_index/_search?scroll=2m +{ + "sort": [ + "_doc" + ] +} +--------------- + +Scroll requests sorted by `_doc` have been optimized to more efficiently resume +from where the previous request stopped, so this will have the same performance +characteristics as the former `scan` search type. + +==== `fields` parameter + +The `fields` parameter used to try to retrieve field values from stored +fields, and fall back to extracting from the `_source` if a field is not +marked as stored. Now, the `fields` parameter will only return stored fields +-- it will no longer extract values from the `_source`. + +==== search-exists API removed + +The search exists api has been removed in favour of using the search api with +`size` set to `0` and `terminate_after` set to `1`. + + +==== Deprecated queries removed + +The following deprecated queries have been removed: + +`filtered`:: Use `bool` query instead, which supports `filter` clauses too. +`and`:: Use `must` clauses in a `bool` query instead. +`or`:: Use `should` clauses in a `bool` query instead. +`limit`:: Use the `terminate_after` parameter instead. +`fquery`:: Is obsolete after filters and queries have been merged. +`query`:: Is obsolete after filters and queries have been merged. +`query_binary`:: Was undocumented and has been removed. +`filter_binary`:: Was undocumented and has been removed. + + +==== Changes to queries + +* Removed support for the deprecated `min_similarity` parameter in `fuzzy + query`, in favour of `fuzziness`. + +* Removed support for the deprecated `fuzzy_min_sim` parameter in + `query_string` query, in favour of `fuzziness`. + +* Removed support for the deprecated `edit_distance` parameter in completion + suggester, in favour of `fuzziness`. 
+ +* Removed support for the deprecated `filter` and `no_match_filter` fields in `indices` query, +in favour of `query` and `no_match_query`. + +* Removed support for the deprecated `filter` fields in `nested` query, in favour of `query`. + +* Removed support for the deprecated `minimum_should_match` and + `disable_coord` in `terms` query, use `bool` query instead. Also removed + support for the deprecated `execution` parameter. + +* Removed support for the top level `filter` element in `function_score` query, replaced by `query`. + +* The `collect_payloads` parameter of the `span_near` query has been deprecated. Payloads will be loaded when needed. + +* The `score_type` parameter to the `has_child` and `has_parent` queries has been removed in favour of `score_mode`. + Also, the `sum` score mode has been removed in favour of the `total` mode. + +* When the `max_children` parameter was set to `0` on the `has_child` query + then there was no upper limit on how many child documents were allowed to + match. Now, `0` really means that zero child documents are allowed. If no + upper limit is needed then the `max_children` parameter shouldn't be specified + at all. + + +==== Top level `filter` parameter + +Removed support for the deprecated top level `filter` in the search api, +replaced by `post_filter`. + +==== Highlighters + +Removed support for multiple highlighter names, the only supported ones are: +`plain`, `fvh` and `postings`. + +==== Term vectors API + +The term vectors APIs no longer persist unmapped fields in the mappings. + +The `dfs` parameter to the term vectors API has been removed completely. Term +vectors don't support distributed document frequencies anymore. 
diff --git a/docs/reference/migration/migrate_5_0/settings.asciidoc b/docs/reference/migration/migrate_5_0/settings.asciidoc new file mode 100644 index 00000000000..002d6cf05df --- /dev/null +++ b/docs/reference/migration/migrate_5_0/settings.asciidoc @@ -0,0 +1,174 @@ +[[breaking_50_settings_changes]] +=== Settings changes + +From Elasticsearch 5.0 on, all settings are validated before they are applied. +Node level and default index level settings are validated on node startup; +dynamic cluster and index settings are validated before they are updated/added +to the cluster state. + +Every setting must be a *known* setting. All settings must have been +registered with the node or transport client they are used with. This implies +that plugins that define custom settings must register all of their settings +during plugin loading using the `SettingsModule#registerSettings(Setting)` +method. + +==== Node settings + +The `name` setting has been removed and is replaced by `node.name`. Usage of +`-Dname=some_node_name` is not supported anymore. + +==== Transport Settings + +All settings with a `netty` infix have been replaced by their already existing +`transport` synonyms. For instance `transport.netty.bind_host` is no longer +supported and should be replaced by the superseding setting +`transport.bind_host`. + +==== Script mode settings + +Previously script mode settings (e.g., `script.inline: true`, +`script.engine.groovy.inline.aggs: false`, etc.) accepted the values +`on`, `true`, `1`, and `yes` for enabling a scripting mode, and the +values `off`, `false`, `0`, and `no` for disabling a scripting mode. +The variants `on`, `1`, and `yes` for enabling and `off`, `0`, +and `no` for disabling are no longer supported. + + +==== Security manager settings + +The option to disable the security manager `security.manager.enabled` has been +removed. In order to grant special permissions to Elasticsearch, users must +edit the local Java Security Policy.
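+
+As a minimal sketch, the node, transport and script mode changes described above might look like this in `elasticsearch.yml`. The node name and bind address are illustrative placeholders, not recommended values:
+
+[source,yaml]
+-----
+# `name` and -Dname=... are no longer accepted; use node.name (placeholder value)
+node.name: some_node_name
+# replaces transport.netty.bind_host (placeholder address)
+transport.bind_host: 127.0.0.1
+# script modes: `true` / `false` rather than on/1/yes or off/0/no
+script.inline: true
+script.engine.groovy.inline.aggs: false
+-----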
+ +==== Network settings + +The `_non_loopback_` value for settings like `network.host` would arbitrarily +pick the first interface not marked as loopback. Instead, specify by address +scope (e.g. `_local_,_site_` for all loopback and private network addresses) +or by explicit interface names, hostnames, or addresses. + +==== Forbid changing of thread pool types + +Previously, <> could be dynamically +adjusted. The thread pool type effectively controls the backing queue for the +thread pool and modifying this is an expert setting with minimal practical +benefits and high risk of being misused. The ability to change the thread pool +type for any thread pool has been removed. It is still possible to adjust +relevant thread pool parameters for each of the thread pools (e.g., depending +on the thread pool type, `keep_alive`, `queue_size`, etc.). + + +==== Analysis settings + +The `index.analysis.analyzer.default_index` analyzer is not supported anymore. +If you wish to change the analyzer to use for indexing, change the +`index.analysis.analyzer.default` analyzer instead. + +==== Ping timeout settings + +Previously, there were three settings for the ping timeout: +`discovery.zen.initial_ping_timeout`, `discovery.zen.ping.timeout` and +`discovery.zen.ping_timeout`. The former two have been removed and the only +setting key for the ping timeout is now `discovery.zen.ping_timeout`. The +default value for ping timeouts remains at three seconds. 
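+
+As a sketch of the network and ping timeout settings above in `elasticsearch.yml`: the choice of `_site_` is only an illustration of an address scope, and `3s` simply restates the default mentioned above.
+
+[source,yaml]
+-----
+# select by address scope instead of using `_non_loopback_` (illustrative choice)
+network.host: _site_
+# the only remaining key for the ping timeout
+discovery.zen.ping_timeout: 3s
+-----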
+ +==== Recovery settings + +Recovery settings deprecated in 1.x have been removed: + + * `index.shard.recovery.translog_size` is superseded by `indices.recovery.translog_size` + * `index.shard.recovery.translog_ops` is superseded by `indices.recovery.translog_ops` + * `index.shard.recovery.file_chunk_size` is superseded by `indices.recovery.file_chunk_size` + * `index.shard.recovery.concurrent_streams` is superseded by `indices.recovery.concurrent_streams` + * `index.shard.recovery.concurrent_small_file_streams` is superseded by `indices.recovery.concurrent_small_file_streams` + * `indices.recovery.max_size_per_sec` is superseded by `indices.recovery.max_bytes_per_sec` + +If you are using any of these settings please take the time to review their +purpose. All of the settings above are considered _expert settings_ and should +only be used if absolutely necessary. If you have set any of the above settings +as persistent cluster settings please use the settings update API and set +their superseding keys accordingly. + +The following settings have been removed without replacement: + + * `indices.recovery.concurrent_small_file_streams` - recoveries are now single threaded. The number of concurrent outgoing recoveries is throttled via allocation deciders + * `indices.recovery.concurrent_file_streams` - recoveries are now single threaded. The number of concurrent outgoing recoveries is throttled via allocation deciders + +==== Translog settings + +The `index.translog.flush_threshold_ops` setting is not supported anymore. In +order to control flushes based on the transaction log growth use +`index.translog.flush_threshold_size` instead. + +Changing the translog type with `index.translog.fs.type` is not supported +anymore; the `buffered` implementation is now the only available option and +uses a fixed `8kb` buffer. + +The translog by default is fsynced after every `index`, `create`, `update`, +`delete`, or `bulk` request.
The ability to fsync on every operation is not +necessary anymore. In fact, it can be a performance bottleneck and it is error-prone +since it was enabled by a special value set on `index.translog.sync_interval`. +Now, `index.translog.sync_interval` doesn't accept a value less than `100ms`, +which prevents fsyncing too often if async durability is enabled. The special +value `0` is no longer supported. + +==== Request Cache Settings + +The deprecated settings `index.cache.query.enable` and +`indices.cache.query.size` have been removed and are replaced with +`index.requests.cache.enable` and `indices.requests.cache.size` respectively. + +`indices.requests.cache.clean_interval` has been replaced with +`indices.cache.clean_interval` and is no longer supported. + +==== Field Data Cache Settings + +The `indices.fielddata.cache.clean_interval` setting has been replaced with +`indices.cache.clean_interval`. + +==== Allocation settings + +The `cluster.routing.allocation.concurrent_recoveries` setting has been +replaced with `cluster.routing.allocation.node_concurrent_recoveries`. + +==== Similarity settings + +The `default` similarity has been renamed to `classic`. + +==== Indexing settings + +The `indices.memory.min_shard_index_buffer_size` and +`indices.memory.max_shard_index_buffer_size` settings have been removed as +Elasticsearch now allows any one shard to use any amount of heap as long as the +total indexing buffer heap used across all shards is below the node's +`indices.memory.index_buffer_size` (defaults to 10% of the JVM heap). + +==== Removed es.max-open-files + +Setting the system property `es.max-open-files` to `true` to get +Elasticsearch to print the maximum number of open files for the +Elasticsearch process has been removed. This same information can be +obtained from the <> API, and a warning is logged +on startup if it is set too low.
+ +==== Removed es.netty.gathering + +Disabling Netty from using NIO gathering could be done via the escape +hatch of setting the system property `es.netty.gathering` to `false`. +Time has proven that enabling gathering by default is a non-issue and this +undocumented setting has been removed. + +==== Removed es.useLinkedTransferQueue + +The system property `es.useLinkedTransferQueue` could be used to +control the queue implementation used in the cluster service and the +handling of ping responses during discovery. This was an undocumented +setting and has been removed. + +==== Cache concurrency level settings removed + +Two cache concurrency level settings, +`indices.requests.cache.concurrency_level` and +`indices.fielddata.cache.concurrency_level`, have been removed because they no longer apply to +the cache implementation used for the request cache and the field data cache. +