Lucene deprecated this in 4.0 and we only try best effort to support it.
Folks should only use edit distance rather than some length based
similarity. Yet the formular is simple enough such that users can
still do it in the client if they really need to.
Closes#10638
Change the default delayed allocation timeout from 0 (no delayed allocation) to 1m. The value came from a test of having a node with 50 shards being indexed into (so beefy translog requiring flush on shutdown), then shutting it down and starting it back up and waiting for it to join the cluster. This took, on a slow machine, about 30s.
The value is conservatively low and does not try to address a virtual machine / OS restart for now, in order to not have the affect of node going away and users being concerned that shards are not being allocated to the rest of the cluster as a result of that. The setting can always be changed in order to increase the delayed allocation if needed.
closes#12166
This PR is a simple doc patch to explicitly mention with an example of
how to create an alias using a glob pattern. This comes up from
time-to-time with our customers and in the community and although
mentioned in the documentation already, is not obvious.
Also mention that the alias will not auto-update as indices matching the
glob change.
Closes#12175Closes#12176
ignore_above is used to guard against the lucene limitation
that a term cannot exceed 32766 bytes.
However, the implementation just used the character count, which
doesn't take into account the fact that some characters have
multi-byte utf-8 encodings.
This commit updates the docs to make this relationship clear.
Closes#11563
Moving the query building functionality from the parser to the builders
new toQuery() method analogous to other recent query refactorings.
Relates to #10217
Add support for retrieving fields in bulk updates
This commit adds support to retrieve fields when using the bulk update API. This functionality was previously available for the update API
but not for the bulk update API.
Closes#11527
This commit adds support to retrieve fields when using the bulk update API. This functionality was previously available for the update API
but not for the bulk update API.
Closes#11527
Fixed documentation since the default rewrite method for fuzzy queries is to
select top terms, fixed usage of the fuzzy rewrite method, and removed unused
`rewrite` parameter.
Close#6932
This rewrite method is interesting because it computes scores as if all terms
had the same frequencies, which avoids disappointments with ranking when a fuzzy
query ranks typos first given that they are less frequent than the correct term.
Plugin Manager can now use another simplified form when a user wants to install an official plugin hosted at elasticsearch download service.
The form we use is:
```sh
bin/plugin install pluginname
```
As plugins share now the same version as elasticsearch, we can automatically guess what is the exact current version of the plugin manager script.
Also, download service will now use `/org.elasticsearch.plugins/pluginName/pluginName-version.zip` URL path to download a plugin.
If the older form is provided (`user/plugin/version` or `user/plugin`), we will still use:
* elasticsearch download service at `/user/plugin/plugin-version.zip`
* maven central with groupIp=user, artifactId=plugin and version=version
* github with user=user, repoName=plugin and tag=version
* github with user=user, repoName=plugin and branch=master if no version is set
Note that community plugin providers can use other download services by using `--url` option.
If you try to use the new form with a non core elasticsearch plugin, the plugin manager will reject
it and will give you all known core plugins.
```
Usage:
-u, --url [plugin location] : Set exact URL to download the plugin from
-i, --install [plugin name] : Downloads and installs listed plugins [*]
-t, --timeout [duration] : Timeout setting: 30s, 1m, 1h... (infinite by default)
-r, --remove [plugin name] : Removes listed plugins
-l, --list : List installed plugins
-v, --verbose : Prints verbose messages
-s, --silent : Run in silent mode
-h, --help : Prints this help message
[*] Plugin name could be:
elasticsearch-plugin-name for Elasticsearch 2.0 Core plugin (download from download.elastic.co)
elasticsearch/plugin/version for elasticsearch commercial plugins (download from download.elastic.co)
groupId/artifactId/version for community plugins (download from maven central or oss sonatype)
username/repository for site plugins (download from github master)
Elasticsearch Core plugins:
- elasticsearch-analysis-icu
- elasticsearch-analysis-kuromoji
- elasticsearch-analysis-phonetic
- elasticsearch-analysis-smartcn
- elasticsearch-analysis-stempel
- elasticsearch-cloud-aws
- elasticsearch-cloud-azure
- elasticsearch-cloud-gce
- elasticsearch-delete-by-query
- elasticsearch-lang-javascript
- elasticsearch-lang-python
```
This pipeline aggregation runs a script on each bucket in the parent aggregation to determine whether the bucket is kept in the final aggregation tree. If the script returns true the bucket is retained, if it returns false the bucket is dropped
If you are using the default date or the named identifiers of dates,
the current implementation was allowed to read a year with only one
digit. In order to make this more strict, this fixes a year to be at
least 4 digits. Same applies for month, day, hour, minute, seconds.
Also the new default is `strictDateOptionalTime` for indices created
with Elasticsearch 2.0 or newer.
In addition a couple of not exposed date formats have been exposed, as they
have been mentioned in the documentation.
Closes#6158
Moving the query building functionality from the parser to the builders
new toQuery() method analogous to other recent query refactorings.
Relates to #10217
Currently parsing inner queries can return `null` which leads to unnecessary complicated
null checks further up the query hierarchy. By introducing a special EmptyQueryBuilder
that can be used to signal that this query clause should be ignored upstream where possible,
we can avoid additional null checks in parent query builders and still allow for this query
to be ignored when creating the lucene query later.
This new builder returns `null` when calling its `toQuery()` method, so at this point
we still need to handle it, but having an explicit object for the intermediate query
representation makes it easier to differentiate between cases where inner query clauses are
empty (legal) or are not set at all (illegal).
Also added check for empty inner builder list to BoolQueryBuilder and OrQueryBuilder.
Removed setters for positive/negatice query in favour of constructor with
mandatory non-null arguments in BoostingQueryBuilder.
Closes#11915
The `:ref:` link in java-api doc is connected to `current` version which is at this time `1.6`.
This commit patch this.
That being said, we might have to change it again once master will become `current` doc.
This commit reorganizes the docs to make Java API docs looking more like the REST docs.
Also, with 2.0.0, FilterBuilders don't exist anymore but only QueryBuilders.
Also, all docs api move now to docs/java-api/docs dir as for REST doc.
Remove removed queries/filters
-----
* Remove Constant Score Query with filter
* Remove Fuzzy Like This (Field) Query (flt and flt_field)
* Remove FilterBuilders
Move filters to queries
-----
* Move And Filter to And Query
* Move Bool Filter to Bool Query
* Move Exists Filter to Exists Query
* Move Geo Bounding Box Filter to Geo Bounding Box Query
* Move Geo Distance Filter to Geo Distance Query
* Move Geo Distance Range Filter to Geo Distance Range Query
* Move Geo Polygon Filter to Geo Polygon Query
* Move Geo Shape Filter to Geo Shape Query
* Move Has Child Filter by Has Child Query
* Move Has Parent Filter by Has Parent Query
* Move Ids Filter by Ids Query
* Move Limit Filter to Limit Query
* Move MatchAll Filter to MatchAll Query
* Move Missing Filter to Missing Query
* Move Nested Filter to Nested Query
* Move Not Filter to Not Query
* Move Or Filter to Or Query
* Move Range Filter to Range Query
* Move Ids Filter to Ids Query
* Move Term Filter to Term Query
* Move Terms Filter to Terms Query
* Move Type Filter to Type Query
Add missing queries
-----
* Add Common Terms Query
* Add Filtered Query
* Add Function Score Query
* Add Geohash Cell Query
* Add Regexp Query
* Add Script Query
* Add Simple Query String Query
* Add Span Containing Query
* Add Span Multi Term Query
* Add Span Within Query
Reorganize the documentation
-----
* Organize by full text queries
* Organize by term level queries
* Organize by compound queries
* Organize by joining queries
* Organize by geo queries
* Organize by specialized queries
* Organize by span queries
* Move Boosting Query
* Move DisMax Query
* Move Fuzzy Query
* Move Indices Query
* Move Match Query
* Move Mlt Query
* Move Multi Match Query
* Move Prefix Query
* Move Query String Query
* Move Span First Query
* Move Span Near Query
* Move Span Not Query
* Move Span Or Query
* Move Span Term Query
* Move Template Query
* Move Wildcard Query
Add some missing pages
----
* Add multi get API
* Add indexed-scripts link
Also closes#7826
Related to https://github.com/elastic/elasticsearch/pull/11477#issuecomment-114745934
The work around for resolving `now` doesn't need to be used for aliases, becuase alias filters are parsed at search time. However it can't be removed, because the percolator relies on it.
Parent/child can be specified again in alias filters, this now works again because alias filters are parsed at search time. Parent/child will also use the late query parse work around, to make sure to do the final preparations when the search context is around. This allows the aliases api to validate the parent/child queries without failing because there is no search context.
Closes#10485
Following the discussion in #11744, move boost and query _name to base class AbstractQueryBuilder with their getters and setters. Unify their serialization code and equals/hashcode handling in the base class too. This guarantess that every query supports both _name and boost and nothing needs to be done around those in subclasses besides properly parsing the fields in the parsers and printing them out as part of the doXContent method in the builders. More specifically, these are the performed changes:
- Introduced printBoostAndQueryName utility method in AbstractQueryBuilder that subclasses can use to print out _name and boost in their doXContent method.
- readFrom and writeTo are now final methods that take care of _name and boost serialization. Subclasses have to implement doReadFrom and doWriteTo instead.
- toQuery is a final method too that takes care of properly applying _name and boost to the lucene query. Subclasses have to implement doToQuery instead. The query returned will have boost and queryName applied automatically.
- Removed BoostableQueryBuilder interface, given that every query is boostable after this change. This won't have any negative effect on filters, as the boost simply gets ignored in that case.
- Extended equals and hashcode to handle queryName and boost automatically as well.
- Update the query test infra so that queryName and boost are tested automatically, and whenever they are forgotten in parser or doXContent tests fail, so this makes things a lot less error-prone
- Introduced DEFAULT_BOOST constant to make sure we don't repeat 1.0f all the time for default boost values.
SpanQueryBuilder is again a marker interface only. The convenient toQuery that allowed us to override the return type to SpanQuery cannot be supported anymore due to a clash with the toQuery implementation from AbstractQueryBuilder. We have to go back to castin lucene Query to SpanQuery when dealing with span queries unfortunately.
Note that this change touches not only the already refactored queries but also the untouched ones, by making sure that we parse _name and boost whenever we need to and that we print them out as part of QueryBuilder#doXContent. This will result in printing out the default boost all the time rather than skipping it in non refactored queries, something that we would have changed anyway as part of the query refactoring.
The following are the queries that support boost now while previously they didn't (parser now parses it and builder prints it out): and, exists, fquery, geo_bounding_box, geo_distance, geo_distance_range, geo_hash_cell, geo_polygon, indices, limit, missing, not, or, script, type.
The following are the queries that support _name now while previously they didn't (parser now parses it and builder prints it out): boosting, constant_score, function_score, limit, match_all, type.
Range query parser supports now _name at the same level as boost too (_name is still supported on the outer object though for bw comp).
There are two exceptions that despite have getters and setters for queryName and boost don't really support boost and queryName: query filter and span multi term query. The reason for this is that they only support a single inner object which is another query that they wrap, no other elements.
Relates to #11744Closes#10776Closes#11974
The filters aggregation now has an option to add an 'other' bucket which will, when turned on, contain all documents which do not match any of the defined filters. There is also an option to change the name of the 'other' bucket from the default of '_other_'
Closes#11289
Field stats index constraints allows to omit all field stats for indices that don't match with the constraint. An index
constraint can exclude indices' field stats based on the `min_value` and `max_value` statistic. This option is only
useful if the `level` option is set to `indices`.
For example index constraints can be useful to find out the min and max value of a particular property of your data in
a time based scenario. The following request only returns field stats for the `answer_count` property for indices
holding questions created in the year 2014:
curl -XPOST 'http://localhost:9200/_field_stats?level=indices' -d '{
"fields" : ["answer_count"] <1>
"index_constraints" : { <2>
"creation_date" : { <3>
"min_value" : { <4>
"gte" : "2014-01-01T00:00:00.000Z",
},
"max_value" : {
"lt" : "2015-01-01T00:00:00.000Z"
}
}
}
}'
Closes#11187
In order to be more consistent with what they do, the query cache has been
renamed to request cache and the filter cache has been renamed to query
cache.
A known issue is that package/logger names do no longer match settings names,
please speak up if you think this is an issue.
Here are the settings for which I kept backward compatibility. Note that they
are a bit different from what was discussed on #11569 but putting `cache` before
the name of what is cached has the benefit of making these settings consistent
with the fielddata cache whose size is configured by
`indices.fielddata.cache.size`:
* index.cache.query.enable -> index.requests.cache.enable
* indices.cache.query.size -> indices.requests.cache.size
* indices.cache.filter.size -> indices.queries.cache.size
Close#11569
Today, we disable CORS by default, but if a user simply enables CORS their instance of
elasticsearch will allow cross origin requests from anywhere, as the default value for allowed
origins is `*`.
This changes the default to be `null` so that no origins are allowed and the user must explicitly
specify the origins they wish to allow requests from. The documentation also mentions that there
is a security risk in using `*` as the value.
Closes#11169
In order to be backwards compatible, indices created before 2.x must support
indexing of a unix timestamp and its configured date format. Indices created
with 2.x must configure the `epoch_millis` date formatter in order to
support this.
Relates #10971
This adds a new pipeline aggregation, the cumulative sum aggregation. This is a parent aggregation which must be specified as a sub-aggregation to a histogram or date_histogram aggregation. It will add a new aggregation to each bucket containing the sum of a specified metrics over this and all previous buckets.
As we now have an enum Operator that comes with many useful helper methods switching to use
that instead of the enums defined separately. Also switches to using the new enum's helper
methods where applicable removing duplicate parsing logic.
This breaks backwards compatibility. Documenting the break in
migrate_query_refactoring.asciidoc
Relates to #10217
This commit consolidates several abstractions on the shard level in
ordinary classes not managed by the shard level guice injector.
Several classes have been collapsed into IndexShard and IndexShardGatewayService
was cleaned up to be more lightweight and self-contained. It has also been moved into
the index.shard package and it's operation is renamed from recovery from "gateway" to recovery
from "store" or "shard_store".
Closes#11847
Since there is a recommended version of JDK, it would be helpful to provide a link to the Oracle documentation. Since there are many versions of Java, those that are new or infrequent users of Java would find the link helpful. Thanks!
Closes#11792
we currently don't expose this.
This adds the following to the OS section of `_nodes`:
```
"os": {
"name": "Mac OS X",
...
}
```
and the following to the OS section of `_cluster/stats`:
```
"os": {
...
"names": [
{
"name": "Mac OS X",
"count": 1
}
],
...
},
```
Closes#11807
This is a follow up to #8143 and #6730 for _timestamp. It removes
support for `path`, as well as any field type settings, and
enables docvalues for _timestamp, for 2.0. Users who need to
adjust these settings can use a date field.
Moving the query building functionality from the parser to the builders
new toQuery() method analogous to other recent query refactorings. In
this case this also includes FQueryFilterParser, since both queries are
closely related.
Relates to #10217Closes#11729
This fixes an issue to allow for negative unix timestamps.
An own printer for epochs instead of just having a parser has been added.
Added docs that only 10/13 length unix timestamps are supported
Added docs in upgrade documentation
Fixes#11478
Replace the previous example which leveraged a range filter, which causes unnecessary confusion about when to use a range filter to create a single bucket or a range aggregation with exactly one member in ranges.
Closes#11704
I was unable to get my BulkProcessor script to work without importing the "ByteSizeUnit" and "ByteSizeValue" classes. Perhaps I overlooked something in the example and do not understand its code.
This changes the parameter name `ignore_like` to the more user friendly name
`unlike`. This later feature generates a query from the terms in `A` but not
from the terms in `B`. This translates to a result set which is like `A` but
unlike `B`. We could have further negatively boosted any documents that have
some `B`, but these documents already do not receive any contribution from
having `B`, and would therefore negatively compete with documents having `A`.
Closes#11117
* QueryBuilders.queryString is now QueryBuilders.queryStringQuery
* DateHistogram.Interval is now DateHistogramInterval
* Refactoring of buckets in aggs
* FilterBuilders has been replaced by QueryBuilders
Closes#9976.
Now that doc values are the default for fielddata, specialized in-memory
formats are becoming an esoteric option. This commit removes such formats:
- `fst` on string fields,
- `compressed` on geo points.
I also removed documentation and tests that the fielddata cache is shared if
you change the format, since this is only true for in-memory fielddata formats
(given that for doc values, the caching is done directly in Lucene).
Information about in-progress snapshot and restore processes is not really metadata and should be represented as a part of the cluster state similar to discovery nodes, routing table, and cluster blocks. Since in-progress snapshot and restore information is no longer part of metadata, this refactoring also enables us to handle cluster blocks in more consistent manner and allow creation of snapshots of a read-only cluster.
Closes#8102
Today we provide the ability to plug in MergePolicy and
we provide the once lucene ships with. We do not recommend to change
the default and even only a small number of expert users would ever touch
this. This commit removes the ancient log byte size and log doc count
merge policy providers, simplifies the MergePolicy wiring and makes the
tiered MP the one and only default. All notions of a merge policy has been
removed from the docs and should be deprecated in the previous version.
Closes#11588
While we had initially planned to keep rivers around in 2.0 to ease migration,
keeping support for rivers is challenging as it conflicts with other important
changes that we want to bring to 2.0 like synchronous dynamic mappings updates.
Nothing impossible to fix, but it would increase the complexity of how we
deal with dynamic mappings updates and manage rivers, while handling dynamic
mappings updates correctly is important for resiliency and rivers are on the go.
So removing rivers in 2.0 may well be a better trade-off.
The ResourceWatcher used settings prefixed `watcher.`, which
potentially could clash with the watcher plugin.
In order to prevent confusion, the settings have been renamed to
`resource.reload` prefixes.
This also uses the deprecation logging infrastructure introduced
in #11033 to log deprecated settings and their alternative at
startup.
Closes#11175
There are different ways to register custom query parsers through plugins, a couple of them work per index via index settings, which is probably even too flexible. There also three different ways to add a global custom query parser through either IndicesQueriesModule or IndicesQueriesRegistry. This commit consolidates the registration of custom query parsers via IndicesQueriesModule#addQuery(Class<? extends QueryParser>). The complexity of supporting parsers per index is not needed hence it got removed. Also the other ways of registering global custom parsers are dropped in favour of the one mentioned above.
Closes#11481
In #10918, we introduced the prompt placeholders. These were had a different format
than our existing placeholders. This changes the prompt placeholders to follow the
format of the existing placeholders.
Relates to #11455