Docs: Rewrote the migrating-to-2.0 section

Clinton Gormley 2015-08-14 20:26:06 +02:00
parent 0240b581e7
commit db1e83884f
16 changed files with 1545 additions and 982 deletions

@ -0,0 +1,69 @@
=== Aggregation changes
==== Min doc count defaults to zero
Both the `histogram` and `date_histogram` aggregations now have a default
`min_doc_count` of `0` instead of `1`.
==== Timezone for date field
The `time_zone` parameter in queries or aggregations on fields of type `date`
must now be either an ISO 8601 UTC offset or a time zone ID. For example, the
value `+1:00` must now be written as `+01:00`.
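If your application stores 1.x-style offsets such as `+1:00`, they can be padded before they are sent. The following is a minimal sketch: `OffsetMigration` and its methods are hypothetical helpers, and the validity check uses `java.time.ZoneOffset` as a stand-in for the parser Elasticsearch itself uses, so it only approximates what the server accepts.

[source,java]
--------------------------------------------------
import java.time.DateTimeException;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OffsetMigration {
    // Matches a 1.x-style offset with a single-digit hour, e.g. "+1:00" or "-5:30".
    private static final Pattern SHORT_HOUR = Pattern.compile("^([+-])(\\d):(\\d{2})$");

    // Hypothetical helper: pads the hour so "+1:00" becomes "+01:00".
    // Already padded offsets and time zone IDs are returned unchanged.
    static String normalize(String timeZone) {
        Matcher m = SHORT_HOUR.matcher(timeZone);
        if (m.matches()) {
            return m.group(1) + "0" + m.group(2) + ":" + m.group(3);
        }
        return timeZone;
    }

    // Hypothetical sanity check using java.time (not Elasticsearch's parser):
    // true if the value parses as an ISO 8601 offset such as "+01:00".
    static boolean isIsoOffset(String value) {
        try {
            ZoneOffset.of(value);
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }
}
--------------------------------------------------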
==== Time zones and offsets
The `histogram` and the `date_histogram` aggregations now support a simplified
`offset` option that replaces the previous `pre_offset` and `post_offset`
rounding options. Instead of having to specify two separate offset shifts of
the underlying buckets, the `offset` option moves the bucket boundaries in the
positive or negative direction, depending on the sign of its value.
The `date_histogram` options for `pre_zone` and `post_zone` are replaced by
the `time_zone` option. The behavior of `time_zone` is equivalent to the
former `pre_zone` option. Setting `time_zone` to a value like `"+01:00"` now
causes the bucket calculations to be applied in the specified time zone.
The `key` is returned as the timestamp in UTC, but the `key_as_string` is
returned in the time zone specified.
In addition, the `pre_zone_adjust_large_interval` option has been removed
because we now always return dates and bucket keys in UTC.
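To make the combined effect of `time_zone` and `offset` concrete, the following minimal sketch computes daily bucket keys with `java.time`. It is not Elasticsearch's rounding code: `DailyBuckets` is hypothetical, and the sketch assumes the offset is applied by shifting each value back by the offset, rounding down to midnight in the requested time zone, and shifting the result forward again. The key stays a UTC timestamp, while the string form is rendered in the requested zone.

[source,java]
--------------------------------------------------
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

final class DailyBuckets {
    // Hypothetical helper: returns the key (UTC epoch millis) of the one-day
    // bucket containing utcMillis. Day boundaries are computed in `timeZone`,
    // and `offset` moves every boundary by the same (positive or negative) amount.
    static long bucketKey(long utcMillis, ZoneId timeZone, Duration offset) {
        long shifted = utcMillis - offset.toMillis();
        long dayStart = Instant.ofEpochMilli(shifted)
                .atZone(timeZone)
                .truncatedTo(ChronoUnit.DAYS) // midnight in the requested time zone
                .toInstant()
                .toEpochMilli();
        return dayStart + offset.toMillis();
    }

    // Renders a bucket key in the requested time zone, like key_as_string.
    static String keyAsString(long key, ZoneId timeZone) {
        return Instant.ofEpochMilli(key)
                .atZone(timeZone)
                .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }
}
--------------------------------------------------

For example, with `time_zone` set to `+01:00`, a document at `2015-01-01T23:30:00Z` falls in the bucket that starts at `2015-01-01T23:00:00Z` (local midnight on January 2nd), and with an `offset` of six hours in UTC the same document falls in the bucket starting at `2015-01-01T06:00:00Z`.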
==== Including/excluding terms
`include`/`exclude` filtering on the `terms` aggregation now uses the same
syntax as <<regexp-syntax,regexp queries>> instead of the Java regular
expression syntax. While simple regexps should still work, more complex ones
might need some rewriting. Also, the `flags` parameter is no longer supported.
==== Boolean fields
Aggregations on `boolean` fields will now return `0` and `1` as keys, and
`"true"` and `"false"` as string keys. See <<migration-bool-fields>> for more
information.
==== Java aggregation classes
The `date_histogram` aggregation now returns a `Histogram` object in the
response, and the `DateHistogram` class has been removed. Similarly the
`date_range`, `ipv4_range`, and `geo_distance` aggregations all return a
`Range` object in the response, and the `IPV4Range`, `DateRange`, and
`GeoDistance` classes have been removed.
The motivation for this is to have a single response API for the Range and
Histogram aggregations regardless of the type of data being queried. To
support this, some changes were made to the `MultiBucketAggregation` interface,
which applies to all bucket aggregations:
* The `getKey()` method now returns `Object` instead of `String`. The actual
object type returned depends on the type of aggregation requested (e.g. the
`date_histogram` will return a `DateTime` object for this method whereas a
`histogram` will return a `Number`).
* A `getKeyAsString()` method has been added to return the String
representation of the key.
* All other `getKeyAsX()` methods have been removed.
* The `getBucketByKey(String)` methods have been removed on all aggregations
except the `filters` and `terms` aggregations.

@ -0,0 +1,129 @@
=== CRUD and routing changes
==== Explicit custom routing
Custom `routing` values can no longer be extracted from the document body, but
must be specified explicitly as part of the query string, or in the metadata
line in the <<docs-bulk,`bulk`>> API. See <<migration-meta-fields>> for an
example.
==== Routing hash function
The default hash function that is used for routing has been changed from
`djb2` to `murmur3`. This change should be transparent unless you relied on
very specific properties of `djb2`. This will help ensure a better balance of
the document counts between shards.
In addition, the following routing-related node settings have been deprecated:
`cluster.routing.operation.hash.type`::
This was an undocumented setting that allowed you to configure which hash
function to use for routing. `murmur3` is now enforced on new indices.
`cluster.routing.operation.use_type`::
This was an undocumented setting that allowed you to take the `_type` of the
document into account when computing its shard (default: `false`). `false` is
now enforced on new indices.
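The following minimal sketch contrasts the two hash functions and the usual "hash modulo number of shards" step. It hashes the UTF-8 bytes of the routing value with 32-bit MurmurHash3 (x86 variant, seed 0); Elasticsearch's own routing code may encode the routing string differently, so the shard numbers produced here are illustrative and not guaranteed to match a real cluster. `RoutingHashes` is a hypothetical class.

[source,java]
--------------------------------------------------
import java.nio.charset.StandardCharsets;

final class RoutingHashes {
    // Classic djb2: hash = hash * 33 + c, starting from 5381 (int overflow wraps).
    static int djb2(String s) {
        int hash = 5381;
        for (int i = 0; i < s.length(); i++) {
            hash = ((hash << 5) + hash) + s.charAt(i);
        }
        return hash;
    }

    // 32-bit MurmurHash3 (x86_32 variant) over a byte array.
    static int murmur3(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int roundedEnd = data.length & ~3; // length rounded down to a multiple of 4
        for (int i = 0; i < roundedEnd; i += 4) {
            // read a little-endian 4-byte block
            int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }
        // mix in the remaining 0 to 3 tail bytes
        int k1 = 0;
        switch (data.length & 3) {
            case 3:
                k1 = (data[roundedEnd + 2] & 0xff) << 16;
                // fall through
            case 2:
                k1 |= (data[roundedEnd + 1] & 0xff) << 8;
                // fall through
            case 1:
                k1 |= data[roundedEnd] & 0xff;
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
                break;
            default:
                break;
        }
        // finalization mix
        h1 ^= data.length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Maps a (possibly negative) hash onto a shard number in [0, numShards).
    static int shardFor(String routing, int numShards) {
        int hash = murmur3(routing.getBytes(StandardCharsets.UTF_8), 0);
        return Math.floorMod(hash, numShards);
    }
}
--------------------------------------------------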
==== Delete API with custom routing
The delete API used to be broadcast to all shards in the index, which meant
that, when using custom routing, the `routing` parameter was optional. Now,
the delete request is forwarded only to the shard holding the document. If you
are using custom routing, then you should specify the `routing` value when
deleting a document, just as is already required for the `index`, `create`,
and `update` APIs.
To make sure that you never forget a routing value, make routing required with
the following mapping:
[source,js]
---------------------------
PUT my_index
{
"mappings": {
"my_type": {
"_routing": {
"required": true
}
}
}
}
---------------------------
==== All stored meta-fields returned by default
Previously, meta-fields like `_routing`, `_timestamp`, etc., would only be
included in a GET request if specifically requested with the `fields`
parameter. Now, all meta-fields which have stored values will be returned by
default. Additionally, they are now returned at the top level (along with
`_index`, `_type`, and `_id`) instead of in the `fields` element.
For instance, the following request:
[source,sh]
---------------
GET /my_index/my_type/1
---------------
might return:
[source,js]
---------------
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_timestamp": 10000000, <1>
"_source": {
"foo" : [ "bar" ]
}
}
---------------
<1> The `_timestamp` is returned by default, and at the top level.
==== Async replication
The `replication` parameter has been removed from all CRUD operations
(`index`, `create`, `update`, `delete`, `bulk`) as it interfered with the
<<indices-synced-flush,synced flush>> feature. These operations are now
synchronous only, and a request will only return once the changes have been
replicated to all active shards in the shard group.
To increase throughput, send more requests in parallel instead, for instance by
using more client processes.
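Because each request now waits for its replicas, throughput has to come from concurrency on the client side. The minimal sketch below fans requests out over a fixed-size thread pool within one process, which serves the same purpose as running more client processes. `ParallelIndexer` and `sendIndexRequest` are hypothetical: the latter stands in for whatever blocking client call you use, and the parallelism level is a parameter you would tune against your own cluster.

[source,java]
--------------------------------------------------
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class ParallelIndexer {
    // Counts calls so the sketch can be checked without a cluster.
    static final AtomicInteger SENT = new AtomicInteger();

    // Hypothetical stand-in for a blocking index request; in 2.0 such a call
    // returns only after the change has been replicated to all active shard copies.
    static void sendIndexRequest(String docId) {
        SENT.incrementAndGet();
    }

    // Sends one request per document with at most `parallelism` in flight and
    // waits for all of them; a failed request surfaces as an ExecutionException.
    static void indexAll(List<String> docIds, int parallelism) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String id : docIds) {
                pending.add(pool.submit(() -> sendIndexRequest(id)));
            }
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
    }
}
--------------------------------------------------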
==== Documents must be specified without a type wrapper
Previously, the document body could be wrapped in another object with the name
of the `type`:
[source,js]
--------------------------
PUT my_index/my_type/1
{
"my_type": { <1>
"text": "quick brown fox"
}
}
--------------------------
<1> This `my_type` wrapper is not part of the document itself, but represents the document type.
This feature was previously deprecated but could be re-enabled with the
`mapping.allow_type_wrapper` index setting. This setting is no longer
supported. The above document should be indexed as follows:
[source,js]
--------------------------
PUT my_index/my_type/1
{
"text": "quick brown fox"
}
--------------------------
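If existing client code still produces wrapped documents, a small pre-processing step can strip the wrapper before indexing. This is a minimal sketch over a plain `java.util.Map` representation of the JSON body; `TypeWrapper.unwrap` is a hypothetical helper, and it assumes a body is wrapped only when its sole top-level key equals the type name and holds an object. That heuristic would also unwrap a genuine document whose only field happens to share the type's name, so apply it only where you know the old wrapper was used.

[source,java]
--------------------------------------------------
import java.util.LinkedHashMap;
import java.util.Map;

final class TypeWrapper {
    // Hypothetical helper: returns the inner object if `body` looks like
    // {"<type>": {...}}, otherwise returns `body` unchanged.
    @SuppressWarnings("unchecked")
    static Map<String, Object> unwrap(Map<String, Object> body, String type) {
        if (body.size() == 1 && body.get(type) instanceof Map) {
            return (Map<String, Object>) body.get(type);
        }
        return body;
    }
}
--------------------------------------------------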
==== Term Vectors API
Usage of `/_termvector` is deprecated in favor of `/_termvectors`.

@ -0,0 +1,42 @@
=== Index API changes
==== Index aliases
Fields used in alias filters no longer have to exist in the mapping at alias
creation time. Previously, alias filters were parsed at alias creation time
and the parsed form was cached in memory. Now, alias filters are parsed at
request time and the fields in filters are resolved from the current mapping.
This also means that index aliases now support `has_parent` and `has_child`
queries.
The <<alias-retrieving, GET alias api>> will now throw an exception if no
matching aliases are found. This change brings the defaults for this API in
line with the other Indices APIs. The <<multi-index>> options can be used on a
request to change this behavior.
==== File based index templates
Index templates can no longer be configured on disk. Use the
<<indices-templates,`_template`>> API instead.
==== Analyze API changes
The Analyze API now returns the `position` of the first token as `0`
instead of `1`.
The `prefer_local` parameter has been removed. The `_analyze` API is a light
operation and the caller shouldn't be concerned about whether it executes on
the node that receives the request or another node.
The `text()` method on `AnalyzeRequest` now returns `String[]` instead of
`String`.
==== Removed `id_cache` from clear cache api
The <<indices-clearcache,clear cache>> API no longer supports the `id_cache`
option. Instead, use the `fielddata` option to clear the cache for the
`_parent` field.

@ -0,0 +1,76 @@
=== Java API changes
==== Transport API construction
The `TransportClient` construction code has changed; it now uses the builder
pattern. Instead of:
[source,java]
--------------------------------------------------
Settings settings = Settings.settingsBuilder()
.put("cluster.name", "myClusterName").build();
Client client = new TransportClient(settings);
--------------------------------------------------
Use the following:
[source,java]
--------------------------------------------------
Settings settings = Settings.settingsBuilder()
.put("cluster.name", "myClusterName").build();
Client client = TransportClient.builder().settings(settings).build();
--------------------------------------------------
==== Automatically thread client listeners
Previously, users had to set request listeners to be threaded on the client
side so that heavy operations would not block IO threads. This proved to be a
trap for many users, and it created problems that were very hard to debug.
In 2.0, Elasticsearch automatically threads listeners that are used from the
client, whether it is a node client or a transport client. Threading can no
longer be set manually.
==== Query/filter refactoring
`org.elasticsearch.index.query.FilterBuilders` has been removed as part of the merge of
queries and filters. These filters are now available in `QueryBuilders` with the same name.
All methods that used to accept a `FilterBuilder` now accept a `QueryBuilder` instead.
In addition, some query builders have been removed or renamed:
* `commonTerms(...)` renamed to `commonTermsQuery(...)`
* `queryString(...)` renamed to `queryStringQuery(...)`
* `simpleQueryString(...)` renamed to `simpleQueryStringQuery(...)`
* `textPhrase(...)` removed
* `textPhrasePrefix(...)` removed
* `textPhrasePrefixQuery(...)` removed
* `filtered(...)` removed. Use `filteredQuery(...)` instead.
* `inQuery(...)` removed.
==== GetIndexRequest
`GetIndexRequest.features()` now returns an array of Feature Enums instead of an array of String values.
The following deprecated methods have been removed:
* `GetIndexRequest.addFeatures(String[])` - Use
`GetIndexRequest.addFeatures(Feature[])` instead.
* `GetIndexRequest.features(String[])` - Use
`GetIndexRequest.features(Feature[])` instead.
* `GetIndexRequestBuilder.addFeatures(String[])` - Use
`GetIndexRequestBuilder.addFeatures(Feature[])` instead.
* `GetIndexRequestBuilder.setFeatures(String[])` - Use
`GetIndexRequestBuilder.setFeatures(Feature[])` instead.
==== BytesQueryBuilder removed
The redundant `BytesQueryBuilder` has been removed in favour of the
`WrapperQueryBuilder` internally.

@ -0,0 +1,390 @@
=== Mapping changes
A number of changes have been made to mappings to remove ambiguity and to
ensure that conflicting mappings cannot be created.
One major change is that dynamically added fields must have their mapping
confirmed by the master node before indexing continues. This is to avoid a
problem where different shards in the same index dynamically add different
mappings for the same field. These conflicting mappings can silently return
incorrect results and can lead to index corruption.
This change can make indexing slower when frequently adding many new fields.
We are looking at ways of optimising this process but we chose safety over
performance for this extreme use case.
==== Conflicting field mappings
Fields with the same name, in the same index, in different types, must have
the same mapping, with the exception of the <<copy-to>>, <<dynamic>>,
<<enabled>>, <<ignore-above>>, <<include-in-all>>, and <<properties>>
parameters, which may have different settings per field.
[source,js]
---------------
PUT my_index
{
"mappings": {
"type_one": {
"properties": {
"name": { <1>
"type": "string"
}
}
},
"type_two": {
"properties": {
"name": { <1>
"type": "string",
"analyzer": "english"
}
}
}
}
}
---------------
<1> The two `name` fields have conflicting mappings and will prevent Elasticsearch
from starting.
Elasticsearch will not start in the presence of conflicting field mappings.
These indices must be deleted or reindexed using a new mapping.
The `ignore_conflicts` option of the put mappings API has been removed.
Conflicts can't be ignored anymore.
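Before upgrading, you may want to check your own mappings for such conflicts. The following minimal sketch models each type's mapping as a map from full field path to that field's parameters, and reports fields whose parameters differ across types once the per-type exempt parameters listed above are ignored. `MappingConflicts` is a hypothetical helper, and the flattened map representation is an assumption: it does not parse real mapping JSON, and it compares parameters literally, so an explicit default and an omitted one count as different.

[source,java]
--------------------------------------------------
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class MappingConflicts {
    // Parameters that may differ per type for the same field name.
    static final Set<String> EXEMPT = new HashSet<>(Arrays.asList(
            "copy_to", "dynamic", "enabled", "ignore_above", "include_in_all", "properties"));

    // Hypothetical check. types: type name -> (full field path -> field parameters).
    // Returns the sorted set of field paths whose parameters conflict.
    static Set<String> findConflicts(Map<String, Map<String, Map<String, Object>>> types) {
        Map<String, Map<String, Object>> firstSeen = new HashMap<>();
        Set<String> conflicts = new TreeSet<>();
        for (Map<String, Map<String, Object>> fields : types.values()) {
            for (Map.Entry<String, Map<String, Object>> field : fields.entrySet()) {
                Map<String, Object> params = new HashMap<>(field.getValue());
                params.keySet().removeAll(EXEMPT); // ignore parameters allowed to differ
                Map<String, Object> previous = firstSeen.putIfAbsent(field.getKey(), params);
                if (previous != null && !previous.equals(params)) {
                    conflicts.add(field.getKey());
                }
            }
        }
        return conflicts;
    }
}
--------------------------------------------------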
==== Fields cannot be referenced by short name
A field can no longer be referenced using its short name. Instead, the full
path to the field is required. For instance:
[source,js]
---------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"title": { "type": "string" }, <1>
"name": {
"properties": {
"title": { "type": "string" }, <2>
"first": { "type": "string" },
"last": { "type": "string" }
}
}
}
}
}
}
---------------
<1> This field is referred to as `title`.
<2> This field is referred to as `name.title`.
Previously, the two `title` fields in the example above could have been
confused with each other when using the short name `title`.
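The full path is the chain of `properties` keys from the root of the mapping to the field, joined with dots. The minimal sketch below walks a mapping represented as nested `java.util.Map`s (an assumption; real code would parse the mapping JSON first) and lists every leaf field by its full path. `FieldPaths` is a hypothetical helper.

[source,java]
--------------------------------------------------
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FieldPaths {
    // Hypothetical helper: collects the full path of every leaf field under a
    // "properties" map. Each field definition is assumed to be a Map.
    @SuppressWarnings("unchecked")
    static List<String> fullPaths(Map<String, Object> properties, String prefix) {
        List<String> paths = new ArrayList<>();
        for (Map.Entry<String, Object> e : properties.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            Map<String, Object> field = (Map<String, Object>) e.getValue();
            Object children = field.get("properties");
            if (children instanceof Map) {
                // object field: recurse into its sub-fields
                paths.addAll(fullPaths((Map<String, Object>) children, path));
            } else {
                paths.add(path);
            }
        }
        return paths;
    }
}
--------------------------------------------------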
==== Type name prefix removed
Previously, two fields with the same name in two different types could
sometimes be disambiguated by prepending the type name. As a side effect, it
would add a filter on the type name to the relevant query. This feature was
ambiguous -- a type name could be confused with a field name -- and didn't
work everywhere, e.g. in aggregations.
Instead, fields should be specified with the full path, but without a type
name prefix. If you wish to filter by the `_type` field, either specify the
type in the URL or add an explicit filter.
The following example query in 1.x:
[source,js]
----------------------------
GET my_index/_search
{
"query": {
"match": {
"my_type.some_field": "quick brown fox"
}
}
}
----------------------------
would be rewritten in 2.0 as:
[source,js]
----------------------------
GET my_index/my_type/_search <1>
{
"query": {
"match": {
"some_field": "quick brown fox" <2>
}
}
}
----------------------------
<1> The type name can be specified in the URL to act as a filter.
<2> The field name should be specified without the type prefix.
==== Field names may not contain dots
In 1.x, it was possible to create fields with dots in their name, for
instance:
[source,js]
----------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"foo.bar": { <1>
"type": "string"
},
"foo": {
"properties": {
"bar": { <1>
"type": "string"
}
}
}
}
}
}
}
----------------------------
<1> These two fields cannot be distinguished as both are referred to as `foo.bar`.
You can no longer create fields with dots in the name.
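A pre-upgrade check for this walks the mapping and flags any property whose own name contains a dot. Minimal sketch, again assuming nested `java.util.Map`s as the mapping representation; `DottedFieldCheck` is a hypothetical helper.

[source,java]
--------------------------------------------------
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DottedFieldCheck {
    // Hypothetical helper: returns the full path of every property whose own
    // name contains a dot, recursing into object fields.
    @SuppressWarnings("unchecked")
    static List<String> findDottedNames(Map<String, Object> properties, String prefix) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, Object> e : properties.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getKey().contains(".")) {
                bad.add(path);
            }
            Object field = e.getValue();
            if (field instanceof Map) {
                Object children = ((Map<String, Object>) field).get("properties");
                if (children instanceof Map) {
                    bad.addAll(findDottedNames((Map<String, Object>) children, path));
                }
            }
        }
        return bad;
    }
}
--------------------------------------------------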
==== Type names may not start with a dot
In 1.x, Elasticsearch would issue a warning if a type name included a dot,
e.g. `my.type`. Now that type names are no longer used to distinguish between
fields in different types, this warning has been relaxed: type names may now
contain dots, but they may not *begin* with a dot. The only exception to this
is the special `.percolator` type.
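A pre-upgrade check on type names only needs the rule just described. In this minimal sketch, `TypeNames.isValidTypeName` is a hypothetical helper that models only the leading-dot rule; any other restrictions on type names are not covered.

[source,java]
--------------------------------------------------
final class TypeNames {
    // Hypothetical helper: a type name may contain dots but may not begin with
    // one, except for the special ".percolator" type.
    static boolean isValidTypeName(String type) {
        return !type.startsWith(".") || type.equals(".percolator");
    }
}
--------------------------------------------------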
==== Types may no longer be deleted
In 1.x it was possible to delete a type mapping, along with all of the
documents of that type, using the delete mapping API. This is no longer
supported, because remnants of the fields in the type could remain in the
index, causing corruption later on.
Instead, if you need to delete a type mapping, you should reindex to a new
index which does not contain the mapping. If you just need to delete the
documents that belong to that type, then use the delete-by-query plugin
instead.
[[migration-meta-fields]]
==== Type meta-fields
The <<mapping-fields,meta-fields>> associated with each type have had some
configuration options removed, to make them more reliable:
* `_id` configuration can no longer be changed. If you need to sort, use the <<mapping-uid-field,`_uid`>> field instead.
* `_type` configuration can no longer be changed.
* `_index` configuration can no longer be changed.
* `_routing` configuration is limited to marking routing as required.
* `_field_names` configuration is limited to disabling the field.
* `_size` configuration is limited to enabling the field.
* `_timestamp` configuration is limited to enabling the field, setting format and default value.
* `_boost` has been removed.
* `_analyzer` has been removed.
Importantly, *meta-fields can no longer be specified as part of the document
body.* Instead, they must be specified in the query string parameters. For
instance, in 1.x, the `routing` could be specified as follows:
[source,js]
-----------------------------
PUT my_index
{
"mappings": {
"my_type": {
"_routing": {
"path": "group" <1>
},
"properties": {
"group": { <1>
"type": "string"
}
}
}
}
}
PUT my_index/my_type/1 <2>
{
"group": "foo"
}
-----------------------------
<1> This 1.x mapping tells Elasticsearch to extract the `routing` value from the `group` field in the document body.
<2> This indexing request uses a `routing` value of `foo`.
In 2.0, the routing must be specified explicitly:
[source,js]
-----------------------------
PUT my_index
{
"mappings": {
"my_type": {
"_routing": {
"required": true <1>
},
"properties": {
"group": {
"type": "string"
}
}
}
}
}
PUT my_index/my_type/1?routing=bar <2>
{
"group": "foo"
}
-----------------------------
<1> Routing can be marked as required to ensure it is not forgotten during indexing.
<2> This indexing request uses a `routing` value of `bar`.
==== Analyzer mappings
Previously, `index_analyzer` and `search_analyzer` could be set separately,
while the `analyzer` setting would set both. The `index_analyzer` setting has
been removed in favour of just using the `analyzer` setting.
If just the `analyzer` is set, it will be used at index time and at search time. To use a different analyzer at search time, specify both the `analyzer` and a `search_analyzer`.
The `index_analyzer`, `search_analyzer`, and `analyzer` type-level settings
have also been removed, as it is no longer possible to select fields based on
the type name.
The `_analyzer` meta-field, which allowed setting an analyzer per document has
also been removed. It will be ignored on older indices.
==== Date fields and Unix timestamps
Previously, `date` fields would first try to parse values as a Unix timestamp
-- milliseconds-since-the-epoch -- before trying to use their defined date
`format`. This meant that formats like `yyyyMMdd` could never work, as values
would be interpreted as timestamps.
In 2.0, we have added two formats: `epoch_millis` and `epoch_second`. Only
date fields that use these formats will be able to parse timestamps.
These formats cannot be used in dynamic templates, because they are
indistinguishable from long values.
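To see why timestamp-first parsing broke formats like `yyyyMMdd`, consider a parser that tries a list of formats in order, as the `||` syntax does. This is a minimal sketch using `java.time` rather than Elasticsearch's own date parsing: `DateFormats` is hypothetical, `epoch_millis` means "parse as a long", and every other entry is treated as a `java.time` pattern, an assumption made to keep the example self-contained.

[source,java]
--------------------------------------------------
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;

final class DateFormats {
    // Hypothetical parser: tries each format in order, like "fmt1||fmt2",
    // and returns UTC epoch millis for the first one that matches.
    static long parse(String value, List<String> formats) {
        for (String format : formats) {
            try {
                if (format.equals("epoch_millis")) {
                    return Long.parseLong(value);
                }
                // any other entry is treated as a java.time date pattern
                return LocalDate.parse(value, DateTimeFormatter.ofPattern(format))
                        .atStartOfDay(ZoneOffset.UTC)
                        .toInstant()
                        .toEpochMilli();
            } catch (NumberFormatException | DateTimeParseException e) {
                // this format does not match; try the next one
            }
        }
        throw new IllegalArgumentException("no format matched: " + value);
    }
}
--------------------------------------------------

With `epoch_millis` tried first, `20150101` is accepted as a timestamp about 5.6 hours after the epoch and the date pattern is never reached, which is the 1.x problem in miniature.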
==== Default date format
The default date format has changed from `date_optional_time` to
`strict_date_optional_time`, which expects a 4 digit year, and a 2 digit month
and day, (and optionally, 2 digit hour, minute, and second).
A dynamically added date field, by default, includes the `epoch_millis`
format to support timestamp parsing. For instance:
[source,js]
-------------------------
PUT my_index/my_type/1
{
"date_one": "2015-01-01" <1>
}
-------------------------
<1> Has `format`: `"strict_date_optional_time||epoch_millis"`.
[[migration-bool-fields]]
==== Boolean fields
Boolean fields used to have a string fielddata with `F` meaning `false` and `T`
meaning `true`. They have been refactored to use numeric fielddata, with `0`
for `false` and `1` for `true`. As a consequence, the format of the responses of
the following APIs changed when applied to boolean fields: `0`/`1` is returned
instead of `F`/`T`:
* <<search-request-fielddata-fields,fielddata fields>>
* <<search-request-sort,sort values>>
* <<search-aggregations-bucket-terms-aggregation,terms aggregations>>
In addition, terms aggregations use a custom formatter for booleans (as they
do for dates and IP addresses, which are also backed by numbers) in order to
return the user-friendly representation of boolean fields: `false`/`true`:
[source,js]
---------------
"buckets": [
{
"key": 0,
"key_as_string": "false",
"doc_count": 42
},
{
"key": 1,
"key_as_string": "true",
"doc_count": 12
}
]
---------------
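The bucket shape above can be modelled in a few lines: booleans are stored as `0`/`1`, counted per numeric key, and formatted back to `false`/`true` for `key_as_string`. This is a minimal sketch, not Elasticsearch's aggregation code; `BooleanTerms` and its `Bucket` class are hypothetical, and buckets are returned in key order here, whereas a real `terms` aggregation orders buckets by document count by default.

[source,java]
--------------------------------------------------
import java.util.ArrayList;
import java.util.List;

final class BooleanTerms {
    // Hypothetical bucket mirroring the JSON response above.
    static final class Bucket {
        final long key;            // numeric fielddata value: 0 or 1
        final String keyAsString;  // formatted value: "false" or "true"
        final long docCount;

        Bucket(long key, long docCount) {
            this.key = key;
            this.keyAsString = key == 1 ? "true" : "false";
            this.docCount = docCount;
        }
    }

    // Counts documents per boolean value and returns one bucket per value
    // seen, ordered by key (0 before 1).
    static List<Bucket> aggregate(boolean[] values) {
        long[] counts = new long[2];
        for (boolean v : values) {
            counts[v ? 1 : 0]++;
        }
        List<Bucket> buckets = new ArrayList<>();
        for (int key = 0; key < 2; key++) {
            if (counts[key] > 0) {
                buckets.add(new Bucket(key, counts[key]));
            }
        }
        return buckets;
    }
}
--------------------------------------------------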
==== `index_name` and `path` removed
The `index_name` setting was used to change the name of the Lucene field,
and the `path` setting was used on `object` fields to determine whether the
Lucene field should use the full path (including parent object fields), or
just the final `name`.
These settings have been removed as their purpose is better served with the
<<copy-to>> parameter.
==== Murmur3 Fields
Fields of type `murmur3` can no longer change their `doc_values` or `index` settings.
They are always mapped as follows:
[source,js]
---------------------
{
"type": "murmur3",
"index": "no",
"doc_values": true
}
---------------------
==== Mappings in config files not supported
The ability to specify mappings in configuration files has been removed. To
specify default mappings that apply to multiple indexes, use
<<indices-templates,index templates>> instead.
Along with this change, the following settings have been removed:
* `index.mapper.default_mapping_location`
* `index.mapper.default_percolator_mapping_location`
==== Postings and doc-values codecs
It is no longer possible to specify per-field postings and doc values formats
in the mappings. This setting will be ignored on indices created before 2.0
and will cause mapping parsing to fail on indices created on or after 2.0. For
old indices, this means that new segments will be written with the default
postings and doc values formats of the current codec.
It is still possible to change the whole codec by using the `index.codec`
setting. Note, however, that using a non-default codec is discouraged, as it
could prevent future versions of Elasticsearch from being able to read the
index.
==== Compress and compress threshold
The `compress` and `compress_threshold` options have been removed from the
`_source` field and fields of type `binary`. These fields are compressed by
default. If you would like to increase compression levels, use the new
<<index-codec,`index.codec: best_compression`>> setting instead.

@ -0,0 +1,58 @@
=== Plugin and packaging changes
==== Symbolic links and paths
Elasticsearch 2.0 runs with the Java security manager enabled and is much more
restrictive about which paths it is allowed to access. Various paths can be
configured, e.g. `path.data`, `path.scripts`, `path.repo`. A configured path
may itself be a symbolic link, but no symlinks under that path will be
followed (with the exception of `path.scripts`, which does follow symlinks).
==== Running `/bin/elasticsearch`
The command line parameter parsing has been rewritten to deal properly with
spaces in parameters. All config settings can still be specified on the
command line when starting Elasticsearch, but they must appear after the
built-in "static parameters", such as `-d` (to daemonize) and `-p` (the PID path).
For instance:
[source,sh]
-----------
/bin/elasticsearch -d -p /tmp/foo.pid --http.cors.enabled=true --http.cors.allow-origin='*'
-----------
For a list of static parameters, run `/bin/elasticsearch -h`.
==== `-f` removed
The `-f` parameter, which used to indicate that Elasticsearch should be run in
the foreground, was deprecated in 1.0 and removed in 2.0.
==== `-V` for version
The `-v` parameter now means `--verbose` for both `bin/plugin` and
`bin/elasticsearch` (although it has no effect on the latter). To output the
version, use `-V` or `--version` instead.
==== Plugin manager should run as root
The permissions of the `config`, `bin`, and `plugins` directories in the RPM
and deb packages have been made more restrictive. The plugin manager should
be run as root, otherwise it will not be able to install plugins.
==== Support for official plugins
Almost all of the official Elasticsearch plugins have been moved to the main
`elasticsearch` repository. They will be released at the same time as
Elasticsearch and have the same version number as Elasticsearch.
Official plugins can be installed as follows:
[source,sh]
---------------
sudo bin/plugin install analysis-icu
---------------
Community-provided plugins can be installed as before.

@ -0,0 +1,43 @@
=== Parent/Child changes
Parent/child has been rewritten completely to reduce memory usage and to
execute `has_child` and `has_parent` queries faster and more efficiently. The
`_parent` field uses doc values by default. The refactored and improved
implementation is only active for indices created on or after version 2.0.
In order to benefit from all the performance and memory improvements, we
recommend reindexing all existing indices that use the `_parent` field.
==== Parent type cannot pre-exist
A mapping type is declared as a child of another mapping type by specifying
the `_parent` meta field:
[source,js]
--------------------------
DELETE *
PUT my_index
{
"mappings": {
"my_parent": {},
"my_child": {
"_parent": {
"type": "my_parent" <1>
}
}
}
}
--------------------------
<1> The `my_parent` type is the parent of the `my_child` type.
The mapping for the parent type can be added at the same time as the mapping
for the child type, but cannot be added before the child type.
==== `top_children` query removed
The `top_children` query has been removed in favour of the `has_child` query.
It wasn't always faster than the `has_child` query and was usually
inaccurate. The total hits and any aggregations in the same search request
would be incorrect if `top_children` was used.

@ -0,0 +1,186 @@
=== Query DSL changes
==== Queries and filters merged
Queries and filters have been merged: all filter clauses are now query
clauses. Query clauses can now be used in either _query context_ or
_filter context_:
Query context::
A query used in query context calculates relevance scores and is not
cacheable. Query context is used whenever filter context does not apply.
Filter context::
+
--
A query used in filter context will not calculate relevance scores, and will
be cacheable. Filter context is introduced by:
* the `constant_score` query
* the `must_not` and (newly added) `filter` parameter in the `bool` query
* the `filter` and `filters` parameters in the `function_score` query
* any API called `filter`, such as the `post_filter` search parameter, or in
aggregations or index aliases
--
As a result of this change, the `execution` option of the `terms` filter is now
deprecated and ignored if provided.
==== `or` and `and` now implemented via `bool`
The `or` and `and` filters previously had a different execution pattern to the
`bool` filter. It used to be important to use `and`/`or` with certain filter
clauses, and `bool` with others.
This distinction has been removed: the `bool` query is now smart enough to
handle both cases optimally. As a result of this change, the `or` and `and`
filters are now syntactic sugar and are executed internally as a `bool` query.
These filters may be removed in the future.
==== `filtered` query and `query` filter deprecated
The `query` filter is deprecated as it is no longer needed: all queries can
be used in query or filter context.
The `filtered` query is deprecated in favour of the `bool` query. Instead of
the following:
[source,js]
-------------------------
GET _search
{
"query": {
"filtered": {
"query": {
"match": {
"text": "quick brown fox"
}
},
"filter": {
"term": {
"status": "published"
}
}
}
}
}
-------------------------
move the query and filter to the `must` and `filter` parameters in the `bool`
query:
[source,js]
-------------------------
GET _search
{
"query": {
"bool": {
"must": {
"match": {
"text": "quick brown fox"
}
},
"filter": {
"term": {
"status": "published"
}
}
}
}
}
-------------------------
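If you build queries programmatically, the rewrite is mechanical. The following minimal sketch performs it on a query represented as nested `java.util.Map`s (an assumption; it does not use the Elasticsearch Java API) and maps only the two keys shown in the example, `query` to `must` and `filter` to `filter`. `FilteredToBool` is a hypothetical helper.

[source,java]
--------------------------------------------------
import java.util.LinkedHashMap;
import java.util.Map;

final class FilteredToBool {
    // Hypothetical helper: rewrites {"filtered": {"query": Q, "filter": F}}
    // into {"bool": {"must": Q, "filter": F}}; other queries are returned unchanged.
    @SuppressWarnings("unchecked")
    static Map<String, Object> rewrite(Map<String, Object> query) {
        Object filtered = query.get("filtered");
        if (query.size() != 1 || !(filtered instanceof Map)) {
            return query;
        }
        Map<String, Object> parts = (Map<String, Object>) filtered;
        Map<String, Object> bool = new LinkedHashMap<>();
        if (parts.containsKey("query")) {
            bool.put("must", parts.get("query")); // scoring clause
        }
        if (parts.containsKey("filter")) {
            bool.put("filter", parts.get("filter")); // non-scoring clause
        }
        Map<String, Object> result = new LinkedHashMap<>();
        result.put("bool", bool);
        return result;
    }
}
--------------------------------------------------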
==== Filter auto-caching
It used to be possible to control which filters were cached with the `_cache`
option and to provide a custom `_cache_key`. These options are deprecated
and, if present, will be ignored.
Query clauses used in filter context are now auto-cached when it makes sense
to do so. The algorithm takes into account the frequency of use, the cost of
query execution, and the cost of building the filter.
The `terms` filter lookup mechanism no longer caches the values of the
document containing the terms. It relies on the filesystem cache instead. If
the lookup index is not too large, it is recommended to replicate it to all
nodes by setting `index.auto_expand_replicas: 0-all` in order to remove the
network overhead as well.
==== Numeric queries use IDF for scoring
Previously, term queries on numeric fields were deliberately prevented from
using the usual Lucene scoring logic and this behaviour was undocumented and,
to some, unexpected.
Single `term` queries on numeric fields now score in the same way as string
fields, using IDF and norms (if enabled).
To query numeric fields without scoring, the query clause should be used in
filter context, e.g. in the `filter` parameter of the `bool` query, or wrapped
in a `constant_score` query:
[source,js]
----------------------------
GET _search
{
"query": {
"bool": {
"must": [
{
"match": { <1>
"numeric_tag": 5
}
}
],
"filter": [
{
"match": { <2>
"count": 5
}
}
]
}
}
}
----------------------------
<1> This clause would include IDF in the relevance score calculation.
<2> This clause would have no effect on the relevance score.
==== Fuzziness and fuzzy-like-this
Fuzzy matching used to calculate the score for each fuzzy alternative, meaning
that rare misspellings would have a higher score than the more common correct
spellings. Now, fuzzy matching blends the scores of all the fuzzy alternatives
to use the IDF of the most frequently occurring alternative.
Fuzziness can no longer be specified using a percentage, but should instead
use the number of allowed edits:
* `0`, `1`, `2`, or
* `AUTO` (which chooses `0`, `1`, or `2` based on the length of the term)
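The `AUTO` rule can be sketched as a function of term length. The thresholds below (no edits for terms of up to 2 characters, 1 edit for 3 to 5 characters, 2 edits for longer terms) follow the Elasticsearch reference's description of `AUTO` and should be checked against the documentation for your version. The distance used is plain Levenshtein distance, whereas Lucene's fuzzy matching can also count transpositions. `AutoFuzziness` is a hypothetical class.

[source,java]
--------------------------------------------------
final class AutoFuzziness {
    // Maximum edits AUTO allows for a query term of the given length (assumed thresholds).
    static int maxEdits(String term) {
        int len = term.length();
        if (len <= 2) return 0;
        if (len <= 5) return 1;
        return 2;
    }

    // Plain Levenshtein distance (insertions, deletions, substitutions), two-row DP.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True if the indexed term is within the AUTO edit budget of the query term.
    static boolean fuzzyMatches(String queryTerm, String indexedTerm) {
        return levenshtein(queryTerm, indexedTerm) <= maxEdits(queryTerm);
    }
}
--------------------------------------------------

With these rules `brwn` matches `brown`, but the transposed `quikc` does not match `quick` under plain Levenshtein distance, because the swap costs two edits.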
The `fuzzy_like_this` and `fuzzy_like_this_field` queries used a very
expensive approach to fuzzy matching and have been removed.
==== More Like This
The More Like This (`mlt`) API and the `more_like_this_field` (`mlt_field`)
query have been removed in favor of the
<<query-dsl-mlt-query, `more_like_this`>> query.
The parameter `percent_terms_to_match` has been removed in favor of
`minimum_should_match`.
==== `limit` filter deprecated
The `limit` filter is deprecated and becomes a no-op. You can achieve similar
behaviour using the <<search-request-body,terminate_after>> parameter.
==== Java plugins registering custom queries
Java plugins that register custom queries can do so by using the
`IndicesQueriesModule#addQuery(Class<? extends QueryParser>)` method. Other
ways to register custom queries are not supported anymore.

@ -0,0 +1,68 @@
=== Removed features
==== Rivers have been removed
Elasticsearch does not support rivers anymore. While we had first planned to
keep them around to ease migration, keeping support for rivers proved to be
challenging as it conflicted with other important changes that we wanted to
bring to 2.0, such as synchronous dynamic mapping updates, so we eventually
decided to remove them entirely. See
link:/blog/deprecating_rivers[Deprecating Rivers] for more background about
why we took this decision.
==== Facets have been removed
Facets, deprecated since 1.0, have now been removed. Instead, use the much
more powerful and flexible <<search-aggregations,aggregations>> framework.
This also means that Kibana 3 will not work with Elasticsearch 2.0.
==== Delete-by-query is now a plugin
The old delete-by-query functionality was fast but unsafe. It could lead to
document differences between the primary and replica shards, and could even
produce out of memory exceptions and cause the cluster to crash.
This feature has been reimplemented using the <<scroll-scan,scroll/scan>> and
<<docs-bulk,`bulk`>> APIs, which may be slower for queries that match
large numbers of documents, but is safe.
Currently, a long-running delete-by-query job cannot be cancelled, which is
one of the reasons that this functionality is only available as a plugin. You
can install the plugin with:
[source,sh]
------------------
./bin/plugin install delete-by-query
------------------
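Once the plugin is installed, a delete-by-query request might look like the following sketch. It assumes the plugin exposes the `_query` endpoint, and the index, type, and field names are placeholders:
[source,js]
---------------
DELETE my_index/my_type/_query
{
  "query": {
    "term": { "user": "kimchy" }
  }
}
---------------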
==== `_shutdown` API
The `_shutdown` API has been removed without a replacement. Nodes should be
managed via the operating system and the provided start/stop scripts.
==== `_size` is now a plugin
The `_size` meta-data field, which indexes the size in bytes of the original
JSON document, has been moved out of core and is available as a plugin. It
can be installed with:
[source,sh]
------------------
./bin/plugin install mapper-size
------------------
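After installing the plugin, the `_size` field has to be enabled in the mapping of each type that should record it. A sketch with placeholder index and type names:
[source,js]
---------------
PUT my_index
{
  "mappings": {
    "my_type": {
      "_size": {
        "enabled": true
      }
    }
  }
}
---------------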
==== Thrift and memcached transport
The Thrift and memcached transport plugins are no longer supported. Instead, use
either the HTTP transport (enabled by default) or the node or transport Java client.
==== Bulk UDP
The bulk UDP API has been removed. Instead, use the standard
<<docs-bulk,`bulk`>> API, or use UDP to send documents to Logstash first.
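A minimal `bulk` request that could replace a bulk UDP payload is sketched below. Each action line is followed by the document source on its own line, and the body must end with a newline. The index, type, and IDs are placeholders:
[source,js]
---------------
POST _bulk
{ "index": { "_index": "my_index", "_type": "my_type", "_id": "1" } }
{ "field1": "value1" }
{ "index": { "_index": "my_index", "_type": "my_type", "_id": "2" } }
{ "field1": "value2" }
---------------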
==== MergeScheduler pluggability
The merge scheduler is no longer pluggable.
=== Scripting changes
==== Scripting syntax
The syntax for scripts has been made consistent across all APIs. The accepted
format is as follows:
Inline/Dynamic scripts::
+
--
[source,js]
---------------
"script": {
"inline": "doc['foo'].value + val", <1>
"lang": "groovy", <2>
"params": { "val": 3 } <3>
}
---------------
<1> The inline script to execute.
<2> The optional language of the script.
<3> Any named parameters.
--
Indexed scripts::
+
--
[source,js]
---------------
"script": {
"id": "my_script_id", <1>
"lang": "groovy", <2>
"params": { "val": 3 } <3>
}
---------------
<1> The ID of the indexed script.
<2> The optional language of the script.
<3> Any named parameters.
--
File scripts::
+
--
[source,js]
---------------
"script": {
"file": "my_file", <1>
"lang": "groovy", <2>
"params": { "val": 3 } <3>
}
---------------
<1> The filename of the script, without the `.lang` suffix.
<2> The optional language of the script.
<3> Any named parameters.
--
For example, an update request might look like this:
[source,js]
---------------
POST my_index/my_type/1/_update
{
"script": {
"inline": "ctx._source.count += val",
"params": { "val": 3 }
},
"upsert": {
"count": 0
}
}
---------------
A short syntax exists for running inline scripts in the default scripting
language without any parameters:
[source,js]
----------------
GET _search
{
"script_fields": {
"concat_fields": {
"script": "doc['one'].value + ' ' + doc['two'].value"
}
}
}
----------------
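An indexed script has to be stored before it can be referenced by `id`. A sketch, assuming the `_scripts` endpoint takes the language and script ID in the path; `my_script_id` and the script body are placeholders:
[source,js]
---------------
POST _scripts/groovy/my_script_id
{
  "script": "doc['foo'].value + val"
}
---------------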
==== Scripting settings
The `script.disable_dynamic` node setting has been replaced by fine-grained
script settings described in <<migration-script-settings>>.
==== Groovy scripts sandbox
The Groovy sandbox and related settings have been removed. Groovy is now a
non-sandboxed scripting language, without any option to turn the sandbox on.
==== Plugins making use of scripts
Plugins that make use of scripts must register their own script context
through `ScriptModule`. Script contexts can be used as part of fine-grained
settings to enable/disable scripts selectively.
=== Search changes
==== Partial fields
Partial fields have been removed in favor of <<search-request-source-filtering,source filtering>>.
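For example, a request that used partial fields to return parts of the `_source` can use source filtering instead. The field patterns are placeholders:
[source,js]
---------------
GET my_index/_search
{
  "_source": {
    "include": [ "obj1.*", "obj2.*" ],
    "exclude": [ "*.description" ]
  },
  "query": {
    "match_all": {}
  }
}
---------------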
==== `search_type=count` deprecated
The `count` search type has been deprecated. All of its benefits can now be
achieved by using the (default) `query_then_fetch` search type and setting
`size` to `0`.
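For example, instead of `?search_type=count`, a request can set `size` to `0` so that it returns `hits.total` (and any aggregations) without any hits. The index and field names are placeholders:
[source,js]
---------------
GET my_index/_search
{
  "size": 0,
  "query": {
    "match": { "title": "elasticsearch" }
  }
}
---------------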
==== The count API internally uses the search API
The count API is now a shortcut to the search API with `size` set to `0`. As a
result, a total failure (when all shards fail) results in an exception being
returned rather than a normal response with `count` set to `0` and shard failures.
==== All stored meta-fields returned by default
Previously, meta-fields like `_routing`, `_timestamp`, etc. were only
included in the search results if specifically requested with the `fields`
parameter. Now, all meta-fields which have stored values are returned by
default. Additionally, they are now returned at the top level (along with
`_index`, `_type`, and `_id`) instead of in the `fields` element.
For instance, the following request:
[source,sh]
---------------
GET /my_index/_search?fields=foo
---------------
might return:
[source,js]
---------------
{
[...]
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": 1,
"_timestamp": 10000000, <1>
"fields": {
"foo" : [ "bar" ]
}
}
]
}
}
---------------
<1> The `_timestamp` is returned by default, and at the top level.
==== Script fields
Script fields in 1.x were only returned as a single value. Even if the return
value of a script was a list, it would be returned as an array containing an
array:
[source,js]
---------------
"fields": {
"my_field": [
[
"v1",
"v2"
]
]
}
---------------
In Elasticsearch 2.0, scripts that return a list of values are treated as
multi-valued fields. The same example would return the following response, with
the values in a single array:
[source,js]
---------------
"fields": {
"my_field": [
"v1",
"v2"
]
}
---------------
==== Timezone for date field
Specifying the `time_zone` parameter in queries or aggregations on fields of
type `date` must now be either an ISO 8601 UTC offset, or a timezone id. For
example, the value `+1:00` must now be written as `+01:00`.
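For example, a `date_histogram` that buckets documents by day in a UTC+1 zone could be written as follows (the index and field names are placeholders):
[source,js]
---------------
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "per_day": {
      "date_histogram": {
        "field": "date",
        "interval": "day",
        "time_zone": "+01:00"
      }
    }
  }
}
---------------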
==== Only highlight queried fields
The default value for the `require_field_match` option has changed from
`false` to `true`, meaning that, by default, the highlighters only take the
fields that were queried into account.
This means that, when querying the `_all` field, trying to highlight on any
field other than `_all` will produce no highlighted snippets. The cleaner
solution is to query the same fields that need to be highlighted. Otherwise,
the `require_field_match` option can be set to `false` to ignore field names
completely when highlighting.
The postings highlighter no longer supports the `require_field_match` option;
it only highlights fields that were queried.
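For highlighters other than the postings highlighter, the 1.x behavior can be restored per request by disabling `require_field_match` explicitly. In this sketch, `my_index` and the `title` field are placeholders:
[source,js]
---------------
GET my_index/_search
{
  "query": {
    "match": { "_all": "quick brown fox" }
  },
  "highlight": {
    "require_field_match": false,
    "fields": {
      "title": {}
    }
  }
}
---------------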
==== Postings highlighter doesn't support `match_phrase_prefix`
The `match` query with type set to `phrase_prefix` (or the
`match_phrase_prefix` query) is not supported by the postings highlighter. No
highlighted snippets will be returned.
=== Setting changes
[[migration-script-settings]]
==== Scripting settings
The `script.disable_dynamic` node setting has been replaced by fine-grained
script settings described in the <<enable-dynamic-scripting,scripting docs>>.
The following setting was previously used to enable dynamic scripts:
[source,yaml]
---------------
script.disable_dynamic: false
---------------
It should be replaced with the following two settings in `elasticsearch.yml` that
achieve the same result:
[source,yaml]
---------------
script.inline: on
script.indexed: on
---------------
==== Units required for time and byte-sized settings
Any settings which accept time or byte values must now be specified with
units. This prevents mistakes such as accidentally setting the
`refresh_interval` to 1 *millisecond* instead of 1 second:
[source,js]
---------------
PUT _settings
{
"index.refresh_interval": 1
}
---------------
In 2.0, the above request will throw an exception. Instead the refresh
interval should be set to `"1s"` for one second.
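The corrected request specifies the unit explicitly:
[source,js]
---------------
PUT _settings
{
  "index.refresh_interval": "1s"
}
---------------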
==== Shadow replica settings
The `node.enable_custom_paths` setting has been removed and replaced by the
`path.shared_data` setting to allow shadow replicas with custom paths to work
with the security manager. For example, if your previous configuration had:
[source,yaml]
------
node.enable_custom_paths: true
------
And you created an index using shadow replicas with `index.data_path` set to
`/opt/data/my_index` with the following:
[source,js]
--------------------------------------------------
PUT /my_index
{
"index": {
"number_of_shards": 1,
"number_of_replicas": 4,
"data_path": "/opt/data/my_index",
"shadow_replicas": true
}
}
--------------------------------------------------
For 2.0, you will need to set `path.shared_data` to a parent directory of the
index's `data_path`, so:
[source,yaml]
-----------
path.shared_data: /opt/data
-----------
==== Resource watcher settings renamed
The settings for configuring the resource watcher have been renamed
to prevent clashes with the Watcher plugin:
* `watcher.enabled` is now `resource.reload.enabled`
* `watcher.interval` is now `resource.reload.interval`
* `watcher.interval.low` is now `resource.reload.interval.low`
* `watcher.interval.medium` is now `resource.reload.interval.medium`
* `watcher.interval.high` is now `resource.reload.interval.high`
==== Hunspell dictionary configuration
The parameter `indices.analysis.hunspell.dictionary.location` has been
removed, and `<path.conf>/hunspell` is always used.
==== CORS allowed origins
The CORS allowed origins setting, `http.cors.allow-origin`, no longer has a default value. Previously, the default value
was `*`, which would allow CORS requests from any origin and is considered insecure. The `http.cors.allow-origin` setting
should be specified with only the origins that should be allowed, like so:
[source,yaml]
---------------
http.cors.allow-origin: /https?:\/\/localhost(:[0-9]+)?/
---------------
==== JSONP support
JSONP callback support has now been removed. CORS should be used to access Elasticsearch
over AJAX instead:
[source,yaml]
---------------
http.cors.enabled: true
http.cors.allow-origin: /https?:\/\/localhost(:[0-9]+)?/
---------------
==== In memory indices
The `memory` / `ram` store (`index.store.type`) option has been removed in
Elasticsearch 2.0. In-memory indices are no longer supported.
==== Log messages truncated
Log messages are now truncated at 10,000 characters. This can be changed in
the `logging.yml` configuration file with the `file.layout.conversionPattern`
setting.
==== `mapping.date.round_ceil` setting removed
The `mapping.date.round_ceil` setting for date math parsing has been removed
(#8889; issues #8556, #8598).
=== Snapshot and Restore changes
==== File-system repositories must be whitelisted
Locations of the shared file system repositories and the URL repositories with
`file:` URLs now have to be registered before starting Elasticsearch using the
`path.repo` setting. The `path.repo` setting can contain one or more
repository locations:
[source,yaml]
---------------
path.repo: ["/mnt/daily", "/mnt/weekly"]
---------------
If the repository location is specified as an absolute path it has to start
with one of the locations specified in `path.repo`. If the location is
specified as a relative path, it will be resolved against the first location
specified in the `path.repo` setting.
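With the `path.repo` setting shown above, a shared file system repository can then be registered. This sketch uses a relative location, which would resolve against the first entry, `/mnt/daily`, giving `/mnt/daily/my_backup`. The repository name `my_backup` is a placeholder:
[source,js]
---------------
PUT _snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "my_backup"
  }
}
---------------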
==== URL repositories must be whitelisted
URL repositories with `http:`, `https:`, and `ftp:` URLs have to be
whitelisted before starting Elasticsearch with the
`repositories.url.allowed_urls` setting. This setting supports wildcards in
the place of host, path, query, and fragment. For example:
[source,yaml]
-----------------------------------
repositories.url.allowed_urls: ["http://www.example.org/root/*", "https://*.mydomain.com/*?*#*"]
-----------------------------------
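A URL repository whose URL matches one of the whitelisted patterns above can then be registered. The repository name and URL are placeholders:
[source,js]
---------------
PUT _snapshot/my_url_repo
{
  "type": "url",
  "settings": {
    "url": "http://www.example.org/root/snapshots/"
  }
}
---------------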
==== Wildcard expansion
The obsolete parameters `expand_wildcards_open` and `expand_wildcards_close`
are no longer supported by the snapshot and restore operations. These
parameters have been replaced by a single `expand_wildcards` parameter. See
<<multi-index,the multi-index docs>> for more.
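For example, a snapshot limited to open indices matching a pattern might be requested as follows. This sketch assumes `expand_wildcards` is accepted in the request body alongside `indices`; the repository, snapshot, and index names are placeholders:
[source,js]
---------------
PUT _snapshot/my_backup/snapshot_1
{
  "indices": "logs-*",
  "expand_wildcards": "open"
}
---------------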
=== Stats, info, and `cat` changes
==== Sigar removed
We no longer ship the Sigar library for operating-system-dependent statistics,
as it no longer seems to be maintained. Instead, we rely on the statistics
provided by the JVM. This has resulted in a number of changes to the nodes
info and nodes stats responses:
* `network.*` has been removed from nodes info and nodes stats.
* `fs.*.dev` and `fs.*.disk*` have been removed from nodes stats.
* `os.*` has been removed from nodes stats, except for `os.timestamp`,
`os.load_average`, `os.mem.*`, and `os.swap.*`.
* `os.mem.total` and `os.swap.total` have been removed from nodes info.
* `process.mem.resident` and `process.mem.share` have been removed from nodes stats.
==== Removed `id_cache` from stats APIs
The `id_cache` metric has been removed from the nodes stats, indices stats and
cluster stats APIs. It has also been removed from the shards, indices and
nodes `cat` APIs. Parent/child memory is now reported under fielddata, because
parent/child has internally been using fielddata for a while now.
To see how much memory parent/child fielddata is taking, the
`fielddata_fields` option can be used on the stats APIs. Indices stats
example:
[source,js]
--------------------------------------------------
GET /_stats/fielddata?fielddata_fields=_parent
--------------------------------------------------
==== Percolator stats
The total time spent running percolator queries is now called `percolate.time`
instead of `percolate.get_time`.
==== Cluster state REST API
The cluster state API doesn't return the `routing_nodes` section anymore when
`routing_table` is requested. The newly introduced `routing_nodes` flag can be
used separately to control whether `routing_nodes` should be returned.
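For example, to get both sections back, both metrics can be requested explicitly:
[source,js]
---------------
GET _cluster/state/routing_table,routing_nodes
---------------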
==== Index status API
The deprecated index status API has been removed.
==== `cat` APIs verbose by default
The `cat` APIs now default to being verbose, which means they output column
headers by default. Verbosity can be turned off with the `v` parameter:
[source,sh]
-----------------
GET _cat/shards?v=0
-----------------
=== Multiple `data.path` striping
Previously, if the `data.path` setting listed multiple data paths, then a
shard would be ``striped'' across all paths by writing a whole file to each
path in turn (in accordance with the `index.store.distributor` setting). The
result was that files from a single segment in a shard could be spread across
multiple disks, and the failure of any one disk could corrupt multiple shards.
This striping is no longer supported. Instead, different shards may be
allocated to different paths, but all of the files in a single shard will be
written to the same path.
If striping is detected while starting Elasticsearch 2.0.0 or later, *all of
the files belonging to the same shard will be migrated to the same path*. If
there is not enough disk space to complete this migration, the upgrade will be
cancelled and can only be resumed once enough disk space is made available.
The `index.store.distributor` setting has also been removed.