This commit limits the `index.translog.sync_interval` to a value not less than `100ms` and
removes the support for fsync on every operation which used to be enabled if `index.translog.sync_interval` was set to `0s`
Now this pr also only schedules an async fsync if the durability is set to `async`. By default not async task is scheduled.
Closes#16152
This commit converts the script mode settings to the new settings
infrastructure. This is a major refactoring of the handling of script
mode settings. This refactoring is necessary because these settings are
determined at runtime based on the registered script engines and the
registered script contexts.
Doc values can now only be enabled by setting `doc_values: true` in the
mappings. Removing this feature also means that we can now fail mapping updates
that try to disable doc values.
Doc values currently default to `true` if the field is indexed and not analyzed.
So setting `index:no` automatically disables doc values, which is not explicit
in the documentation.
This commit makes doc values default to true for numerics, booleans regardless
of whether they are indexed. Not indexed strings still don't have doc values,
since we can't know whether it is rather a text or keyword field. This
potential source of confusion should go away when we split `string` into `text`
and `keyword`.
RescoreBuilder: Add parsing and creating of RescoreSearchContext
Adding the ability to parse from xContent to the rescore builder. Also making RescoreBuilder an abstract base class that encapsulates the window_size setting, with QueryRescoreBuilder as its only implementation at the moment.
Relates to #15559
This commit modifies the default setting for standard output in the
systemd configuration to the journal instead of /dev/null. This is to
address a user pain point where Elasticsearch would fail to start but
the error message would be sent to standard output and therefore
/dev/null leading to difficult-to-debug situations.
* Cleaned up MapperService#searchFilter(...) and moved it DefaultSearchContext, since it that class was the only user. As part of the cleanup percolate query documents are no longer excluded from the search response.
* Removed resolveClosestNestedObjectMapper(...) method as it was no longer used.
* Removed DocumentTypeListener infrastructure. Before it was used by the percolator and parent/child, but these features no longer use it.
Closes#15924
* Remove remaining 1.x bwc logic.
* Stop storing stored fields and indexed terms. The _parent field's only purpose is to support joins between parent and child type and only storing doc values is sufficient.
* In the mapping the parent field mapper is now known under '{parent}#{child}' key, because this is the field the parent/child join uses too.
* Added new sub fetch phase to lookup that _parent field from doc values field if that is required (before this was fetched from stored _parent field)
* Removed the ability to query directly on `_parent` in the query dsl. Instead the `{parent}#{child}` field should be used. Under the hood a doc values query is used instead of a term query, because only doc values fields are stored now.
* Added a new `parent_id` query to easily query child documents with a specific parent id without having to know what join field to use
* Also in aggregations `_parent` field can't be used any more and `{parent}#{child}` field name should be used instead to aggregate directly on the _parent join field.
This commit modifies the load_average in the node stats API response
to be an object containing the one-minute, five-minute and
fifteen-minute load averages as fields (if those values are
available). Additionally, this commit modifies the cat nodes API
response to format the one-minute, five-minute and fifteen-minute load
averages as null if any of the respective values are not available.
The indexing buffer on a node (default: 10% of the JVM heap) is now a "shared pool" across all shards on that node. This way, shards doing intense indexing can use much more than other shards doing only light indexing, and only once the sum of all indexing buffers across all shards exceeds the node's indexing buffer will we ask shards to move recently indexed documents to segments on disk.
Warmers are now barely useful and will be removed in 3.0. Note that this only
removes the warmer API and query-based warmers. We still have warmers internally
for eg. global ordinals.
Close#15607
* Added percolator field mapper that extracts the query terms and indexes these terms with the percolator query.
* At percolate time these extracted terms are used to query percolator queries that are like to be evaluated. This can significantly cut down the time it takes to percolate. Whereas before all percolator queries were evaluated if they matches with the document being percolated.
* Changes made to percolator queries are no longer immediately visible, a refresh needs to happen before the changes are visible.
* By default the percolate api only returns upto 10 matches instead of returning all matching percolator queries.
* Made percolate more modular, so that it is easier to add unit tests.
* Added unit tests for the percolator.
Closes#12664Closes#13646
When specifying a string field, you can either do:
```
{
"foo": "bar"
}
```
or
```
{
"foo": {
"value": "bar",
"boost": 42
}
}
```
The latter option is now removed.
Closes#15388
Today we throttle recoveries only for incoming recoveries. Nodes that have a lot
of primaries can get overloaded due to too many recoveries. To still keep that at bay
we limit the number of threads that are sending files to the target to overcome this problem.
The right solution here is to also throttle the outgoing recoveries that are today unbounded on
the master and don't start the recovery until we have enough resources on both source and target nodes.
The concurrency aspects of the recovery source also added a lot of complexity and additional threadpools
that are hard to configure. This commit removes the concurrent streamns notion completely and sends files
in the thread that drives the recovery simplifying the recovery code considerably.
Outgoing recoveries are not throttled on the master via a allocation decider.
Today we have two variants of translogs for indexing. We only recommend the buffered
one which also has a 20% advantage in indexing speed. This commit removes the option and defaults
to the buffered case. It also hard-wires the translog buffer to 8kb instead of 64kb. We used to
adjust that buffer based on if the shard is active or not, this code has also been removed and
instead we just keep an 8kb buffer arround.
This option allows to force the xcontent type to use to store the `_source`
document. The default is to use the same format as the input format.
This commit makes this option ignored for 2.x indices and rejected for 3.0
indices.
As a replacement use ExistsQueryBuilder inside a mustNot() clause.
So instead of using `new ExistsQueryBuilder(name)` now use:
`new BoolQueryBuilder().mustNot(new ExistsQueryBuilder(name))`.
Closes#14112
throw exception if a copy_to is within a multi field
Copy to within multi field is ignored from 2.0 on, see #10802.
Instead of just ignoring it, we should throw an exception if this
is found in the mapping when a mapping is added. For already
existing indices we should at least log a warning.
We remove the copy_to in any case.
related to #14946
When using S3 or EC2, it was possible to use a proxy to access EC2 or S3 API but username and password were not possible to be set.
This commit adds support for this. Also, to make all that consistent, proxy settings for both plugins have been renamed:
* from `cloud.aws.proxy_host` to `cloud.aws.proxy.host`
* from `cloud.aws.ec2.proxy_host` to `cloud.aws.ec2.proxy.host`
* from `cloud.aws.s3.proxy_host` to `cloud.aws.s3.proxy.host`
* from `cloud.aws.proxy_port` to `cloud.aws.proxy.port`
* from `cloud.aws.ec2.proxy_port` to `cloud.aws.ec2.proxy.port`
* from `cloud.aws.s3.proxy_port` to `cloud.aws.s3.proxy.port`
New settings are `proxy.username` and `proxy.password`.
```yml
cloud:
aws:
protocol: https
proxy:
host: proxy1.company.com
port: 8083
username: myself
password: theBestPasswordEver!
```
You can also set different proxies for `ec2` and `s3`:
```yml
cloud:
aws:
s3:
proxy:
host: proxy1.company.com
port: 8083
username: myself1
password: theBestPasswordEver1!
ec2:
proxy:
host: proxy2.company.com
port: 8083
username: myself2
password: theBestPasswordEver2!
```
Note that `password` is filtered with `SettingsFilter`.
We also fix a potential issue in S3 repository. We were supposed to accept key/secret either set under `cloud.aws` or `cloud.aws.s3` but the actual code never implemented that.
It was:
```java
account = settings.get("cloud.aws.access_key");
key = settings.get("cloud.aws.secret_key");
```
We replaced that by:
```java
String account = settings.get(CLOUD_S3.KEY, settings.get(CLOUD_AWS.KEY));
String key = settings.get(CLOUD_S3.SECRET, settings.get(CLOUD_AWS.SECRET));
```
Also, we extract all settings for S3 in `AwsS3Service` as it's already the case for `AwsEc2Service` class.
Closes#15268.
Do not to load fields from _source when using the `fields` option.
Non stored (non existing) fields are ignored by the fields visitor when using the `fields` option.
Fixes#10783
Support * wildcard to retrieve stored fields when using the `fields` option.
Supported pattern styles are "xxx*", "*xxx", "*xxx*" and "xxx*yyy".
We actually want to keep the test when using deprecated setting in 3.0.
We will keep this setting deprecated as well so people will be able to update in a smoother way.
Also add the deprecating information to the migration documentation.
query_binary and filter_binary are unused at this point, as we only parse on the coordinating node and the java api only holds structured java objects for queries and filters, meaning they all implement Writeable and get natively serialized.
Relates to #14308Closes#14433
This commit forbids the changing of thread pool types for any thread
pool. The motivation here is that these are expert settings with
little practical advantage.
Closes#14294, relates #2509, relates #2858, relates #5152
Removes the mapping transform feature which when used made debugging very
difficult. Users should transform their documents on the way into
Elasticsearch rather than having Elasticsearch do it.
Closes#12674
We have two types of parse methods for queries: one for the inner query, to be used once the parser is positioned within the query element, and one for the whole query source, including the query element that wraps the actual query.
With the search refactoring we ended up using the former in count, cat count and delete by query, whereas we should have used the former. It ends up working properly given that we have a registered (deprecated) query called "query", which used to allow to wrap a filter into a query, but this has the following downsides:
1) prevents us from removing the deprecated "query" query
2) we end up supporting a top level query that is not wrapped within a query element (pre 1.0 syntax iirc that shouldn't be supported anymore)
This commit finally removes the "query" query and fixes the related parsing bugs. We also had some tests that were providing queries in the wrong format, those have been fixed too.
Closes#13326Closes#14304
* Allow for multiple host specifications (e.g. _en0_,192.168.1.2,_site_).
* Add _site_ and _global_ scopes as counterparts to _local_.
* Warn on heuristic selection of publish address.
* Remove the arbitrary _non_loopback_ setting.
Closes#13954
The NotQueryBuilder has been deprecated on the 2.x branches
and can be removed with the next major version. It can be
replaced by boolean query with added mustNot() clause.
Closes#13761
This adds an API for force merging lucene segments. The `/_optimize` API is now
deprecated and replaced by the `/_forcemerge` API, which has all the same flags
and action, just a different name.
This commit removes some cache concurrency level settings that were
applicable when the cache was backed by the Guava cache implementation,
but no longer apply with the cache implementation completed in #13717.
Relates #7836, relates #13224, relates #13717
The `_create` API is handy way to specify an index operation should only be done if the document doesn't exist. This is currently implemented in explicit code paths all the way down to the engine. However, conceptually this is no different than any other versioned operation - instead of requiring a document is on a specific version, we require it to be deleted (or non-existent). This PR removes Engine.Create in favor of a slight extension in the VersionType logic.
There are however a couple of side effects:
- DocumentAlreadyExistsException is removed and VersionConflictException is used instead (with an improved error message)
- Update will reject version parameters if the upsert option is used (it doesn't compute anyway).
- Translog.Create is also removed infavor of Translog.Index (that's OK because their binary format was the same, so we can just read Translog.Index of the translog file)
Closes#13955
Types are still optional, but if you do provide them, they can't be null. Split the existing constructor that accepted nnull into two, one that accepts no arguments, and another one that accepts the types argument, which must be not null.
Also trimmed down different ways of setting ids, some were misleading as they would always add the ids to the existing ones and not set them, the add prefix makes that clear. Left `addIds` method that accepts a varargs argument. Added check for ids not be null.
TermsLookupQueryBuilder was left around only for bw comp reasons, but TermsQueryBuilder is its replacement. We can remove it now that it is clear query refactoring goes in master (3.0).
This commit fixes ping timeout settings inconsistencies in
ZenDiscovery. In particular, the documentation refers to the ping
timeout setting as discovery.zen.ping_timeout but the code was
ultimately using discovery.zen.ping.timeout if this was set.
This commit also changes all instances of the raw string
“discovery.zen.ping_timeout” to the constant
o.e.d.z.ZenDiscovery.SETTING_PING_TIMEOUT.
Finally, this commit removes the legacy setting
"discovery.zen.initial_ping_timeout".
Closes#6579, #9581, #9908
The current MoreLikeThisQueryBuilder validation checks for existence of at
least one `like` text or item. This is hard to check in setters, so this PR
tries to change the construction of the query so that we can do these checks
already at construction time.
Changing to using arrays for fieldnames, likeTexts, likeItems, unlikeTexts
and unlikeItems. `likeTexts` and/or `likeItems` need to be specified at
construction time to validate we have at least one item there.
Relates to #10217
Refactor the function_score query so it can be parsed on the coordinating node, split parse into fromXContent and toQuery, make FunctionScoreQueryBuilder Writeable.
Closes#13653
Moving validation from validate() to constructors and setters for the
following query builders:
* GeoDistanceQueryBuilder
* GeoDistanceRangeQueryBuilder
* GeoPolygonQueryBuilder
* GeoShapeQueryBuilder
* GeohashCellQuery
* TermsQueryBuilder
Relates to #10217
This PR is the second batch in moving the query validation we started
to collect in the validate() method to the corresponding setters
and constructors.
Requesting a million hits, or page 100,000 is always a bad idea, but users
may not be aware of this. This adds a per-index limit on the maximum size +
from that can be requested which defaults to 10,000.
This should not interfere with deep-scrolling.
Closes#9311
* Dropped ScoreType in favour of Lucene's ScoreMode
* Removed `score_type` option from `has_child` and `has_parent` queries in favour for the already existing `score_mode` option.
* Removed the score mode `sum` in favour for the already existing `total` score mode. (`sum` doesn't exist in Lucene's ScoreMode class)
* If `max_children` is set to `0` it now really means that zero children are allowed to match.
This add equals, hashcode, read/write methods, separates toQuery and JSON parsing and adds tests.
Also moving MatchQueryBuilder.Type to MatchQuery to MatchQuery, adding serialization and hashcode,
equals there.
Relates to #10217
Previously the parser could take any Term Vectors request, but this would be
not the case of the builder which would still use MultiGetRequest.Item. This
introduces a new Item class which is used by both the builder and parser.
Beyond that the rest is mostly cleanups such as:
1) Deprecating the ignoreLike methods, in favor to using unlike.
2) Deprecating and renaming MoreLikeThisBuilder#addItem to addLikeItem.
3) Ordering the methods of MoreLikeThisBuilder more logically.
This change is needed for the upcoming query refactoring of MLT.
Closes#13372
This is an intial commit that splits HasChildQueryParser / Builder into
the two seperate steps. This one is particularly nasty since it transports
a pretty wild InnerHits object that needs heavy refactoring. Yet, this commit
has still some nocommits and needs more tests and maybe another cleanup but
it's a start to get the code out there.
detect_noop is pretty cheap and noop updates compartively expensive so this
feels like a sensible default.
Also had to do some testing and documentation around how _ttl works with
detect_noop.
Closes#11282
Until a couple of hours ago we expected the position_offset_gap to default
to 0 in 2.0 and 100 in 2.1. We decided it was worth backporting that new
default to 2.0. So now that its backported we need to teach 2.1 that 2.0
also defaults to 100.
Closes#7268
This is much more fiddly than you'd expect it to be because of the way
position_offset_gap is applied in StringFieldMapper. Instead of setting
the default to 100 its simpler to make sure that all the analyzers default
to 100 and that StringFieldMapper doesn't override the default unless the
user specifies something different. Unless the index was created before
2.1, in which case the old default of 0 has to take.
Also postition_offset_gaps less than 0 aren't allowed at all.
New tests test that:
1. the new default doesn't match phrases across values with reasonably low
slop (5)
2. the new default doest match phrases across values with reasonably high
slop (50)
3. you can override the value and phrases work as you'd expect
4. if you leave the value undefined in the mapping and define it on a
custom analyzer the the value from the custom analyzer shines through
Closes#7268
This move the `murmur3` field to the `mapper-murmur3` plugin and fixes its
defaults so that values will not be indexed by default, as the only purpose
of this field is to speed up `cardinality` aggregations on high-cardinality
string fields, which only requires doc values.
I also removed the `rehash` option from the `cardinality` aggregation as it
doesn't bring much value (rehashing is cheap) and allowed to remove the
coupling between the `cardinality` aggregation and the `murmur3` field.
Close#12874
Due to the limited abilities of parsing of dynamic (not configured) arguments
like `http.cors.enabled`, that dont map to a command line argument but will
become configuration, we need to mention explicitely, that those dynamic arguments
must come last.
Also fixed some mentions of a memory index setting, that does not exist anymore.
Closes#12758
In order to ensure, we have the same experience across operating systems
and shells, this commit uses the java CLI parser instead of the shell
getopt parsing to parse arguments.
This also allows for support for paths, which contain spaces.
Also commons-cli depdency was upgraded to 1.3.1 and tests have been added.
Changes
* new exit code, OK_AND_EXIT, allowing to tell the caller to exit, as everything
went as expected (e.g. when running a version output)
BWC breaking:
* execute() returns an ExitStatus instead of an integer, otherwise there is no
possibility to signal by a command, if the JVM should be exited after a run.
This affects plugins, that have command line tools
* -v used to be version, but is a verbose flag by default in the current CLI infra,
must be -V or --version now
* -X has been removed - the current implementation was useless anyway, as
it prefixed those properties with "es.". You should use
ES_JAVA_OPTS/JAVA_OPTS for JVM configuration
The TransportSingleCustomOperationAction `prefer_local` option has been removed as it isn't worth the effort.
The TransportSingleShardAction will execute the operation on the receiving node if a concrete list doesn't provide a list of candite shards routings to perform the operation on.
Moving the query building functionality from the parser to the builders
new toQuery() method analogous to other recent query refactorings.
Relates to #10217
Moving the query building functionality from the parser to the builders
new doToQuery() method analogous to other recent query refactorings.
Relates to #10217Closes#12365
The `_index` field is now a completely virtual field thanks
to #12027. It is no longer necessary to index the actual value
of the index name.
closes#12329
Require urls for URL repository to be listed in repositories.url.allowed_urls setting. This change ensures that only authorized URLs can be accessed by elasticsearch
Moving the query building functionality from the parser to the builders
new toQuery() method analogous to other recent query refactorings.
Relates to #10217
Plugin Manager can now use another simplified form when a user wants to install an official plugin hosted at elasticsearch download service.
The form we use is:
```sh
bin/plugin install pluginname
```
As plugins share now the same version as elasticsearch, we can automatically guess what is the exact current version of the plugin manager script.
Also, download service will now use `/org.elasticsearch.plugins/pluginName/pluginName-version.zip` URL path to download a plugin.
If the older form is provided (`user/plugin/version` or `user/plugin`), we will still use:
* elasticsearch download service at `/user/plugin/plugin-version.zip`
* maven central with groupIp=user, artifactId=plugin and version=version
* github with user=user, repoName=plugin and tag=version
* github with user=user, repoName=plugin and branch=master if no version is set
Note that community plugin providers can use other download services by using `--url` option.
If you try to use the new form with a non core elasticsearch plugin, the plugin manager will reject
it and will give you all known core plugins.
```
Usage:
-u, --url [plugin location] : Set exact URL to download the plugin from
-i, --install [plugin name] : Downloads and installs listed plugins [*]
-t, --timeout [duration] : Timeout setting: 30s, 1m, 1h... (infinite by default)
-r, --remove [plugin name] : Removes listed plugins
-l, --list : List installed plugins
-v, --verbose : Prints verbose messages
-s, --silent : Run in silent mode
-h, --help : Prints this help message
[*] Plugin name could be:
elasticsearch-plugin-name for Elasticsearch 2.0 Core plugin (download from download.elastic.co)
elasticsearch/plugin/version for elasticsearch commercial plugins (download from download.elastic.co)
groupId/artifactId/version for community plugins (download from maven central or oss sonatype)
username/repository for site plugins (download from github master)
Elasticsearch Core plugins:
- elasticsearch-analysis-icu
- elasticsearch-analysis-kuromoji
- elasticsearch-analysis-phonetic
- elasticsearch-analysis-smartcn
- elasticsearch-analysis-stempel
- elasticsearch-cloud-aws
- elasticsearch-cloud-azure
- elasticsearch-cloud-gce
- elasticsearch-delete-by-query
- elasticsearch-lang-javascript
- elasticsearch-lang-python
```
If you are using the default date or the named identifiers of dates,
the current implementation was allowed to read a year with only one
digit. In order to make this more strict, this fixes a year to be at
least 4 digits. Same applies for month, day, hour, minute, seconds.
Also the new default is `strictDateOptionalTime` for indices created
with Elasticsearch 2.0 or newer.
In addition a couple of not exposed date formats have been exposed, as they
have been mentioned in the documentation.
Closes#6158
Moving the query building functionality from the parser to the builders
new toQuery() method analogous to other recent query refactorings.
Relates to #10217
Currently parsing inner queries can return `null` which leads to unnecessary complicated
null checks further up the query hierarchy. By introducing a special EmptyQueryBuilder
that can be used to signal that this query clause should be ignored upstream where possible,
we can avoid additional null checks in parent query builders and still allow for this query
to be ignored when creating the lucene query later.
This new builder returns `null` when calling its `toQuery()` method, so at this point
we still need to handle it, but having an explicit object for the intermediate query
representation makes it easier to differentiate between cases where inner query clauses are
empty (legal) or are not set at all (illegal).
Also added check for empty inner builder list to BoolQueryBuilder and OrQueryBuilder.
Removed setters for positive/negatice query in favour of constructor with
mandatory non-null arguments in BoostingQueryBuilder.
Closes#11915
Following the discussion in #11744, move boost and query _name to base class AbstractQueryBuilder with their getters and setters. Unify their serialization code and equals/hashcode handling in the base class too. This guarantess that every query supports both _name and boost and nothing needs to be done around those in subclasses besides properly parsing the fields in the parsers and printing them out as part of the doXContent method in the builders. More specifically, these are the performed changes:
- Introduced printBoostAndQueryName utility method in AbstractQueryBuilder that subclasses can use to print out _name and boost in their doXContent method.
- readFrom and writeTo are now final methods that take care of _name and boost serialization. Subclasses have to implement doReadFrom and doWriteTo instead.
- toQuery is a final method too that takes care of properly applying _name and boost to the lucene query. Subclasses have to implement doToQuery instead. The query returned will have boost and queryName applied automatically.
- Removed BoostableQueryBuilder interface, given that every query is boostable after this change. This won't have any negative effect on filters, as the boost simply gets ignored in that case.
- Extended equals and hashcode to handle queryName and boost automatically as well.
- Update the query test infra so that queryName and boost are tested automatically, and whenever they are forgotten in parser or doXContent tests fail, so this makes things a lot less error-prone
- Introduced DEFAULT_BOOST constant to make sure we don't repeat 1.0f all the time for default boost values.
SpanQueryBuilder is again a marker interface only. The convenient toQuery that allowed us to override the return type to SpanQuery cannot be supported anymore due to a clash with the toQuery implementation from AbstractQueryBuilder. We have to go back to castin lucene Query to SpanQuery when dealing with span queries unfortunately.
Note that this change touches not only the already refactored queries but also the untouched ones, by making sure that we parse _name and boost whenever we need to and that we print them out as part of QueryBuilder#doXContent. This will result in printing out the default boost all the time rather than skipping it in non refactored queries, something that we would have changed anyway as part of the query refactoring.
The following are the queries that support boost now while previously they didn't (parser now parses it and builder prints it out): and, exists, fquery, geo_bounding_box, geo_distance, geo_distance_range, geo_hash_cell, geo_polygon, indices, limit, missing, not, or, script, type.
The following are the queries that support _name now while previously they didn't (parser now parses it and builder prints it out): boosting, constant_score, function_score, limit, match_all, type.
Range query parser supports now _name at the same level as boost too (_name is still supported on the outer object though for bw comp).
There are two exceptions that despite have getters and setters for queryName and boost don't really support boost and queryName: query filter and span multi term query. The reason for this is that they only support a single inner object which is another query that they wrap, no other elements.
Relates to #11744Closes#10776Closes#11974
In order to be backwards compatible, indices created before 2.x must support
indexing of a unix timestamp and its configured date format. Indices created
with 2.x must configure the `epoch_millis` date formatter in order to
support this.
Relates #10971
As we now have an enum Operator that comes with many useful helper methods switching to use
that instead of the enums defined separately. Also switches to using the new enum's helper
methods where applicable removing duplicate parsing logic.
This breaks backwards compatibility. Documenting the break in
migrate_query_refactoring.asciidoc
Relates to #10217
This is a follow up to #8143 and #6730 for _timestamp. It removes
support for `path`, as well as any field type settings, and
enables docvalues for _timestamp, for 2.0. Users who need to
adjust these settings can use a date field.
Moving the query building functionality from the parser to the builders
new toQuery() method analogous to other recent query refactorings. In
this case this also includes FQueryFilterParser, since both queries are
closely related.
Relates to #10217Closes#11729
This fixes an issue to allow for negative unix timestamps.
An own printer for epochs instead of just having a parser has been added.
Added docs that only 10/13 length unix timestamps are supported
Added docs in upgrade documentation
Fixes#11478
While we had initially planned to keep rivers around in 2.0 to ease migration,
keeping support for rivers is challenging as it conflicts with other important
changes that we want to bring to 2.0 like synchronous dynamic mappings updates.
Nothing impossible to fix, but it would increase the complexity of how we
deal with dynamic mappings updates and manage rivers, while handling dynamic
mappings updates correctly is important for resiliency and rivers are on the go.
So removing rivers in 2.0 may well be a better trade-off.
The ResourceWatcher used settings prefixed `watcher.`, which
potentially could clash with the watcher plugin.
In order to prevent confusion, the settings have been renamed to
`resource.reload` prefixes.
This also uses the deprecation logging infrastructure introduced
in #11033 to log deprecated settings and their alternative at
startup.
Closes#11175
There are different ways to register custom query parsers through plugins, a couple of them work per index via index settings, which is probably even too flexible. There also three different ways to add a global custom query parser through either IndicesQueriesModule or IndicesQueriesRegistry. This commit consolidates the registration of custom query parsers via IndicesQueriesModule#addQuery(Class<? extends QueryParser>). The complexity of supporting parsers per index is not needed hence it got removed. Also the other ways of registering global custom parsers are dropped in favour of the one mentioned above.
Closes#11481
Some of our meta fields (such as _id, _version, ...) are returned as top-level
properties of the json document, while other properties (_timestamp, _routing,
...) are returned under `fields`. This commit makes all meta fields returned
as top-level properties.
So eg. `GET test/test/1?fields=_timestamp,foo` would now return
```json
{
"_index": "test",
"_type": "test",
"_id": "1",
"_version": 1,
"_timestamp": 10000000,
"found": true,
"fields": {
"foo": [ "bar" ]
}
}
```
while it used to return
```json
{
"_index": "test",
"_type": "test",
"_id": "1",
"_version": 1,
"found": true,
"fields": {
"_timestamp": 10000000,
"foo": [ "bar" ]
}
}
```
* Cut the `has_child` and `has_parent` queries over to use Lucene's query time global ordinal join. The main benefit of this change is that parent/child queries can now efficiently execute if parent/child queries are wrapped in a bigger boolean query. If the rest of the query only hit a few documents both has_child and has_parent queries don't need to evaluate all parent or child documents any more.
* Cut the `_parent` field over to use doc values. This significantly reduces the on heap memory footprint of parent/child, because the parent id values are never loaded into memory.
Breaking changes:
* The `type` option on the `_parent` field can only point to a parent type that doesn't exist yet, so this means that an existing type/mapping can't become a parent type any longer.
* The `has_child` and `has_parent` queries can no longer be use in alias filters.
All these changes, improvements and breaks in compatibility only apply for indices created with ES version 2.0 or higher. For indices creates with ES <= 2.0 the older implementation is used.
It is highly recommended to re-index all your indices with parent and child documents to benefit from all the improvements that come with this refactoring. The easiest way to achieve this is by using the scan and bulk apis using a simple script.
Closes#6107Closes#8134
The count api used to have its own execution path, although it would do the same (up to bugs!) of the search api. This commit makes it a shortcut to the search api with size set to 0. The change is made in a backwards compatible manner, by leaving all of the java api code around too, given that you may not want to get back a whole SearchResponse when asking only for number of hits matching a query, also cause migrating from countResponse.getCount() to searchResponse.getHits().totalHits() doesn't look great from a user perspective. We can always decide to drop more code around the count api if we want to break backwards compatibility on the java api, making it a shortcut on the rest layer only.
Closes#9117Closes#11198
This option is broken currently since it potentially interprets an incoming
binary value as compressed while it just happens that the first bytes are the
same as the LZF header.
Mappings conflicts should not be ignored. If I read the history correctly, this
option was added when a mapping update to an existing field was considered a
conflict, even if the new mapping was exactly the same. Now that mapping updates
are smart enough to detect conflicting options, we don't need an option to
ignore conflicts.
The default `false` for `require_field_match` is a bit odd and confusing for users, given that field names get ignored by default and every field gets highlighted if it contains terms extracted out of the query, regardless of which fields were queries. Changed the default to `true`, it can always be changed per request.
Closes#10627Closes#11067
Our own fork of the lucene PostingsHighlighter is not easy to maintain and doesn't give us any added value at this point. In particular, it was introduced to support the require_field_match option and discrete per value highlighting, used in case one wants to highlight the whole content of a field, but get back one snippet per value. These two features won't
make it into lucene as they slow things down and shouldn't have been supported from day one on our end probably.
One other customization we had was support for a wider range of queries via custom rewrite etc. (yet another way to slow
things down), which got added to lucene and works much much better than what we used to do (instead of or rewrite, term
s are pulled out of the automata for multi term queries).
Removing our fork means the following in terms of features:
- dropped support for require_field_match: the postings highlighter will only highlight fields that were queried
- some custom es queries won't be supported anymore, meaning they won't be highlighted. The only one I found up until now is the phrase_prefix. Postings highlighter rewrites against an empty reader to avoid slow operations (like the ones that we were performing with the fork that we are removing here), thus the prefix will not be expanded to any term. What the postings highlighter does instead is pulling the automata out of multi term queries, but this is not supported at the moment with our MultiPhrasePrefixQuery.
Closes#10625Closes#11077
These clauses filter the document space without affecting scoring and map to
Lucene's BooleanClause.Occur.FILTER. The `filtered` query is now deprecated and
```json
{
"filtered": {
"query": { //query },
"filter": { //filter }
}
}
```
should be replaced with
```json
{
"bool": {
"must": { //query },
"filter": { //filter }
}
}
```
A few meta fields can currently be set within a document's source.
However, the recommended way to set meta fields like this is through
the api, and setting within the document can be a performance trap
(e.g. needing to find _id in order to route the document).
This change removes the ability to set meta fields within
a document source for 2.0+ indexes.
closes#11051closes#11074
As a follow up to #10870, this removes support for
index templates on disk. It also removes a missed
place still allowing disk based mappings.
closes#11052
This commit makes queries and filters parsed the same way using the
QueryParser abstraction. This allowed to remove duplicate code that we had
for similar queries/filters such as `range`, `prefix` or `term`.
Removes the More Like This API, users should now use the More Like This query.
The MLT API tests were converted to their query equivalent. Also some clean
ups in MLT tests.
Closes#10736Closes#11003
Current features (eg. update API) and future features (eg. reindex API)
depend on _source. This change locks down the field so that
it can no longer be disabled. It also removes legacy settings
compress/compress_threshold.
closes#8142closes#10915
This change removes the multiple implementations of different admin interfaces and centralizes it with AbstractClient. It also makes sure *all* executions of actions now go through a single AbstractClient#execute method, taking care of copying headers and wrapping listener.
This also has the side benefit of removing all the code around differnet possible clients, and removes quite a bit of code (most of the + code is actually removal of generics and such).
This change also changes how TransportClient is constructed, requiring a Builder to create it, its a breaking change and its noted in the migration guide.
Yea another step towards simplifying the action infra and making it simpler...
This removes Elasticsearch's filter cache and uses Lucene's instead. It has some
implications:
- custom cache keys (`_cache_key`) are unsupported
- decisions are made internally and can't be overridden by users ('_cache`)
- not only filters can be cached but also all queries that do not need scores
- parent/child queries can now be cached, however cached entries are only
valid for the current top-level reader so in practice it will likely only
be used on read-only indices
- the cache deduplicates filters, which plays nicer with large keys (eg. `terms`)
- better stats: we already had ram usage and evictions, but now also hit count,
miss count, lookup count, number of cached doc id sets and current number of
doc id sets in the cache
- dynamically changing the filter cache size is not supported anymore
Internally, an important change is that it removes the NoCacheFilter infrastructure
in favour of making Query.rewrite specializing the query for the current reader so
that it will only be cached on this reader (look for IndexCacheableQuery).
Note that consuming filters with the query API (createWeight/scorer) instead of
the filter API (getDocIdSet) is important for parent/child queries because
otherwise a QueryWrapperFilter(ParentQuery) would run the wrapped query per
segment while relations might be cross segments.
Using files that must be specified on each node is an anti-pattern
from the API based goal of ES. This change removes the ability
to specify the default mapping with a file on each node.
closes#10620
The assumption is that gaps in histogram are generally undesirable, for instance
if you want to build a visualization from it. Additionally, we are building new
aggregations that require that there are no gaps to work correctly (eg.
derivatives).
this commit removes the obsolete settings for distributors and updates
the documentation on multiple data.path. It also adds an explain to the
migration guide.
Relates to #9498Closes#10770
Regardless of the outcome of #8142, we should at least enforce that
when _source is enabled, it is sufficient to reindex. This change
removes the excludes and includes settings, since these modify
the source, causing us to lose the ability to reindex some fields.
closes#10814
Groovy sandboxing was disabled by default from 1.4.3 on though since we found out that it could be worked around, so it makes little sense to keep it and maintain it.
Closes#10156Closes#10480
This changes the default ranking behaviour of single-term queries on numeric fields to use the usual Lucene TermQuery scoring logic rather than a constant-scoring wrapper.
Closes#10628
Disabling doc values or trying to index hash values are not
correct uses of this the murmur3 field type, and just cause
problems. This disallows changing doc values or index options
for 2.0+.
closes#10465
In Lucene 5.1 lots of filters got deprecated in favour of equivalent queries.
Additionally, random-access to filters is now replaced with approximations on
scorers. This commit
- replaces the deprecated NumericRangeFilter, PrefixFilter, TermFilter and
TermsFilter with NumericRangeQuery, PrefixQuery, TermQuery and TermsQuery,
wrapped in a QueryWrapperFilter
- replaces XBooleanFilter, AndFilter and OrFilter with a BooleanQuery in a
QueryWrapperFilter
- removes DocIdSets.isBroken: the new two-phase iteration API will now help
execute slow filters efficiently
- replaces FilterCachingPolicy with QueryCachingPolicy
Close#8960
This is really a Collector instead of a filter. This commit deprecates the
`limit` filter, makes it a no-op and recommends to use the `terminate_after`
parameter instead that we introduced in the meantime.
Today we check every regular expression eagerly against every possible term.
This can be very slow if you have lots of unique terms, and even the bottleneck
if your query is selective.
This commit switches to Lucene regular expressions instead of Java (not exactly
the same syntax yet most existing regular expressions should keep working) and
uses the same logic as RegExpQuery to intersect the regular expression with the
terms dictionary. I wrote a quick benchmark (in the PR) to make sure it made
things faster and the same request that took 750ms on master now takes 74ms with
this change.
Close#7526
For bacwards compatibility reasons routing_nodes were previously printed out when routing_table was requested, together with the actual routing_table. Now they are printed out only when requests through `routing_nodes` flag.
Relates to #10412Closes#10486
Removed the following methods from `ScriptService`, which don't require the `ScriptContext` argument:
```
public CompiledScript compile(String lang, String script, ScriptType scriptType)
public ExecutableScript executable(String lang, String script, ScriptType scriptType, Map<String, Object> vars)
public SearchScript search(SearchLookup lookup, String lang, String script, ScriptType scriptType, @Nullable Map<String, Object> vars)
```
Also removed the ScriptContext.Standard.GENERIC_PLUGIN enum value, as it was used only for backwards compatibility.
Plugins that make use of scripts should declare their own script contexts through `ScriptModule#registerScriptContext` and use them when compiling/executing scripts.
Closes#10476
This pull request makes boolean handled like dates and ipv4 addresses: things
are stored as as numerics under the hood and aggregations add some special
formatting logic in order to return true/false in addition to 1/0.
For example, here is an output of a terms aggregation on a boolean field:
```
"aggregations": {
"top_f": {
"doc_count_error_upper_bound": 0,
"buckets": [
{
"key": 0,
"key_as_string": "false",
"doc_count": 2
},
{
"key": 1,
"key_as_string": "true",
"doc_count": 1
}
]
}
}
```
Sorted numeric doc values are used under the hood.
Close#4678Close#7851
Now that fine-grained script settings are supported (#10116) we can remove support for the script.disable_dynamic setting.
Same result as `script.disable_dynamic: false` can be obtained as follows:
```
script.inline: on
script.indexed: on
```
An exception is thrown at startup when the old setting is set, so we make sure we tell users they have to change it rather than ignoring the setting.
Closes#10286
This commit brings the benefits of the `count` search type to search requests
that have a `size` of 0:
- a single round-trip to shards (no fetch phase)
- ability to use the query cache
Since `count` now provides no benefits over `query_then_fetch`, it has been
deprecated.
Close#7630
Deleting a type from an index is inherently dangerous because
the type can be recreated with new mappings which may conflict
with existing segments still using the old mappings. This
removes the ability to delete a type (similar to how deleting
fields within a type is not allowed, for the same reason).
closes#8877closes#10231
This commit changes the behaviour of the delete api when processing a delete request that refers to a type that has routing set to required in the mapping, and the routing is missing in the request. Up until now the delete api sent a broadcast delete request to all of the shards that belong to the index, making sure that the document could be found although the routing value wasn't specified. This was probably not the best choice: if the routing is set to required, an error should be thrown instead.
A `RoutingMissingException` gets now thrown instead, like it happens in the same situation with every other api (index, update, get etc.). Last but not least, this change allows to get rid of a couple of `TransportAction`s, `Request`s and `Response`s and simplify the codebase.
Closes#9123Closes#10136
There are two implications to this change.
First, percolator now uses _uid internally, extracting the id portion
when needed. Second, sorting on _id is no longer possible, since you
can no longer index _id. However, _uid can still be used to sort, and
is better anyways as indexing _id just to make it available to
fielddata for sorting is wasteful.
see #8143closes#9842
This change removes the deprecated script parameter names ('file', 'id', and 'scriptField').
It also removes the ability to load file scripts using the 'script' parameter. File scripts should be loaded using the 'script_file' parameter only.
This commit makes the `postings_format` and `doc_values_format` options of
mappings illegal on 2.0 and ignored on 1.x (meaning that the default postings
and doc values formats from the codec will be used in such a case).
This removes a fair amount of code.
Close#8746#9741
Removed the existing `pre_zone` and `post_zone` option in `date_histogram` in favor of
the simpler `time_zone` option. Previously, specifying different values for these could
lead to confusing scenarios where ES would return bucket keys that are not UTC.
Now `time_zone` is the only option setting, the calculation of date buckets to take place in the
preferred time zone, but after rounding converting the bucket key values back to UTC.
Closes#9062Closes#9637
When multiple fields under object fields share the same name, accessing
by short name is ambiguous. This removes support for short names,
always requiring the full name when used in queries.
closes#8872
Add offset option to 'date_histogram' replacing and simplifying the previous 'pre_offset' and 'post_offset' options.
This change is part of a larger clean up task for `date_histogram` from issue #9062.
This change makes the response API object for Histogram Aggregations the same for all types of Histogram, and does the same for all types of Ranges.
The change removes getBucketByKey() from all aggregations except filters and terms. It also reduces the methods on the Bucket class to just getKey() and getKeyAsString().
The getKey() method returns Object and the actual Type is returns will be appropriate for the type of aggregation being run. e.g. date_histogram will return a DateTime for this method and Histogram will return a Number.
Today we give the HTTP status back within the HTTP response itself and within the JSON response as well:
```sh
curl localhost:9200/
```
```js
{
"status" : 200,
"name" : "Red Wolf",
"version" : {
"number" : "2.0.0",
"build_hash" : "6837a61d8a646a2ac7dc8da1ab3c4ab85d60882d",
"build_timestamp" : "2014-08-19T13:55:56Z",
"build_snapshot" : true,
"lucene_version" : "4.9"
},
"tagline" : "You Know, for Search"
}
```
Before Elasticsearch 1.0, the type was allowed to be passed as the root
element when uploading a document. However, this was ambiguous if the
mappings also contained a field with the same name as the type. The
behavior was changed in 1.0 to not allow this, but a setting was added
for backwards compatibility. This change removes the setting for 2.0.
The header indicates to how many shard copies (primary and replicas shards) a write was supposed to go to, to how many
shard copies to write succeeded and potentially captures shard failures if writing into a replica shard fails.
For async writes it also includes the number of shards a write is still pending.
Closes#7994
This is our only cache which is not 'exact' and might allow for stalled results.
Additionally, a similar cache that we have and needs to perform lookups in other
indices in order to run queries is the script index, and for this index we rely
on the filesystem cache, so we should probably do the same with terms filters
lookups.
Close#9056
Related to #8667:
Some QueryBuilders have been deprecated in 1.x branches. We removed them in 2.0.
Removed
-------
* `textPhrase(...)`
* `textPhrasePrefix(...)`
* `textPhrasePrefixQuery(...)`
* `filtered(...)`
* `inQuery(...)`
* `commonTerms(...)`
* `queryString(...)`
* `simpleQueryString(...)`
Closes#8721.
We speak of the term vectors of a document, where each field has an associated
stored term vector. Since by default we are requesting all the term vectors of
a document, the HTTP request endpoint should rather be called `_termvectors`
instead of `_termvector`. The usage of `_termvector` is now deprecated, as
well as the transport client call to termVector and prepareTermVector.
Closes#8484
Previous to this change all features (_alias,_mapping,_settings,_warmer) are run regardless of which features are actually requested. This change fixes the request object to resolve this bug
We currently use the djb2 hash function in order to compute the shard a
document should go to. Unfortunately this hash function is not very
sophisticated and you can sometimes hit adversarial cases, such as numeric ids
on 33 shards.
Murmur3 generates hashes with a better distribution, which should avoid the
adversarial cases.
Here are some examples of how 100000 incremental ids are distributed to shards
using either djb2 or murmur3.
5 shards:
Murmur3: [19933, 19964, 19940, 20030, 20133]
DJB: [20000, 20000, 20000, 20000, 20000]
3 shards:
Murmur3: [33185, 33347, 33468]
DJB: [30100, 30000, 39900]
33 shards:
Murmur3: [2999, 3096, 2930, 2986, 3070, 3093, 3023, 3052, 3112, 2940, 3036, 2985, 3031, 3048, 3127, 2961, 2901, 3105, 3041, 3130, 3013, 3035, 3031, 3019, 3008, 3022, 3111, 3086, 3016, 2996, 3075, 2945, 2977]
DJB: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 900, 900, 900, 900, 1000, 1000, 10000, 10000, 10000, 10000, 9100, 9100, 9100, 9100, 9000, 9000, 0, 0, 0, 0, 0, 0]
Even if djb2 looks ideal in some cases (5 shards), the fact that the
distribution of its hashes has some patterns can raise issues with some shard
counts (eg. 3, or even worse 33).
Some tests have been modified because they relied on implementation details of
the routing hash function.
Close#7954
The MLT field query is simply replaced by a MLT query set to specififc field.
To simplify code maintenance we should deprecate it in 1.4 and remove it in
2.0.
Closes#8238
Returns information about settings, aliases, warmers, and mappings. Basically returns the IndexMetadata. This new endpoint replaces the /{index}/_alias|_aliases|_mapping|_mappings|_settings|_warmer|_warmers and /_alias|_aliases|_mapping|_mappings|_settings|_warmer|_warmers endpoints whilst maintaining the same response formats. The only exception to this is on the /_alias|_aliases|_warmer|_warmers endpoint which will now return a section for 'aliases' or 'warmers' even if no aliases or warmers exist. This backwards compatibility change is documented in the reference docs.
Closes#4069