Removes the More Like This API, users should now use the More Like This query.
The MLT API tests were converted to their query equivalent. Also some clean
ups in MLT tests.
Closes#10736Closes#11003
This removes Elasticsearch's filter cache and uses Lucene's instead. It has some
implications:
- custom cache keys (`_cache_key`) are unsupported
- decisions are made internally and can't be overridden by users ('_cache`)
- not only filters can be cached but also all queries that do not need scores
- parent/child queries can now be cached, however cached entries are only
valid for the current top-level reader so in practice it will likely only
be used on read-only indices
- the cache deduplicates filters, which plays nicer with large keys (eg. `terms`)
- better stats: we already had ram usage and evictions, but now also hit count,
miss count, lookup count, number of cached doc id sets and current number of
doc id sets in the cache
- dynamically changing the filter cache size is not supported anymore
Internally, an important change is that it removes the NoCacheFilter infrastructure
in favour of making Query.rewrite specializing the query for the current reader so
that it will only be cached on this reader (look for IndexCacheableQuery).
Note that consuming filters with the query API (createWeight/scorer) instead of
the filter API (getDocIdSet) is important for parent/child queries because
otherwise a QueryWrapperFilter(ParentQuery) would run the wrapped query per
segment while relations might be cross segments.
The current implementation is dangerous: it unexpectedly refreshes,
which can quickly cause an unhealthy index (segment explosion). It
can also delete different documents on primary vs replicas, causing
inconsistent replicas.
For 2.0 we will replace this with an optional plugin that does a
scan/scroll search and then issues bulk delete requests.
Closes#10859
The field stats api returns field level statistics such as lowest, highest values and number of documents that have at least one value for a field.
An api like this can be useful to explore a data set you don't know much about. For example you can figure at with the lowest and highest response times are, so that you can create a histogram or range aggregation with sane settings.
This api doesn't run a search to figure this statistics out, but rather use the Lucene index look these statics up (using Terms class in Lucene). So finding out these stats for fields is cheap and quick.
The min/max values are based on the type of the field. So for a numeric field min/max are numbers and date field the min/max date and other fields the min/max are term based.
Closes#10523
This option defaults to false, because it is also important to upgrade
the "merely old" segments since many Lucene improvements happen within
minor releases.
But you can pass true to do the minimal work necessary to upgrade to
the next major Elasticsearch release.
The HTTP GET upgrade request now also breaks out how many bytes of
ancient segments need upgrading.
Closes#10213Closes#10540
Conflicts:
dev-tools/create_bwc_index.py
rest-api-spec/api/indices.upgrade.json
src/main/java/org/elasticsearch/action/admin/indices/optimize/OptimizeRequest.java
src/main/java/org/elasticsearch/action/admin/indices/optimize/ShardOptimizeRequest.java
src/main/java/org/elasticsearch/action/admin/indices/optimize/TransportOptimizeAction.java
src/main/java/org/elasticsearch/index/engine/InternalEngine.java
src/test/java/org/elasticsearch/bwcompat/StaticIndexBackwardCompatibilityTest.java
src/test/java/org/elasticsearch/index/engine/InternalEngineTests.java
src/test/java/org/elasticsearch/rest/action/admin/indices/upgrade/UpgradeReallyOldIndexTest.java
Cluster state api returns both routing_table and routing_nodes sections whenever routing_table is requested. That is pretty much the same info, just grouped differently. This commit allows to differentiate between the two. Yet, routing_table still returns both for bw comp reasons.
Closes#10352Closes#10412
This commit brings the benefits of the `count` search type to search requests
that have a `size` of 0:
- a single round-trip to shards (no fetch phase)
- ability to use the query cache
Since `count` now provides no benefits over `query_then_fetch`, it has been
deprecated.
Close#7630
The analysis chain should be used instead of relying on this, as it is
confusing when dealing with different per-field analysers.
The `locale` option was only used for `lowercase_expanded_terms`, which,
once removed, is no longer needed, so it was removed as well.
Fixes#9978
Relates to #9973
Whenever we have an api that supports GET with a body, we always support the POST method too, as well as providing the body as a query_string parameter called `source`. Our REST spec should reflect this convention. FIxed them and introduced a hard check at parse time in our Java REST tests runner, which will cause the tests to fail if spec are not compliant.
Closes#9629
The `full` option and `FlushType.NEW_WRITER` only exists to allow
realtime changes to two settings (`index.codec` and `index.concurrency`).
Those settings are very expert and don't really need to be updateable
in realtime.
Adding missing support for the multi-index query parameters 'ignore_unavailable',
'allow_no_indices' and 'expand_wildcards' to '_cluster/state' API. These
parameters are supposed to be supported for APIs that work across multiple indices.
So far overwriting the default settings per REST call was not possible which is
fixed here.
Closes#5229Closes#9295
Add a new ignore_idle_threads boolean option (default true) to
/_nodes/hot_threads, to filter out threads in known idle places like
waiting on a socket select or on pulling the next task from an empty
queue.
Closes#8985Closes#8908
This commit adds support for version and version_type to the Term Vectors API.
This could be useful in the following case whereby the user gets a document
and later wants to generate its TVs. With version, this would ensure that only
the TVs of that particular document are generated, and error out if the
document has been updated in between.
Closes#7480
We speak of the term vectors of a document, where each field has an associated
stored term vector. Since by default we are requesting all the term vectors of
a document, the HTTP request endpoint should rather be called `_termvectors`
instead of `_termvector`. The usage of `_termvector` is now deprecated, as
well as the transport client call to termVector and prepareTermVector.
Closes#8484
* `get_upgrade` => `GET _upgrade` -- Return the status
* `upgrade` => `POST _upgrade` -- Perform the operation
Original specification part of c021f22523.
Related: #7884, #7922
This commit does the following:
* Add the new API at the rest layer, being backed by the optimize API
with upgrade flag, and segments api to find upgrade status.
* Add `upgrade` flag to optimize API, and deprecate `force` flag (will
remove in master)
* Add test for both synchronous and async upgrade
closes#7884closes#7922
By default term vectors are now realtime, as opposed to previously near
realtime. If they are not found in the index, they will be generated on the
fly. The document is fetched from the transaction log and treated as an
artificial document. One can set `realtime` parameter to `false` in order to
disable this functionality. This consequently makes the MLT query realtime in
fetching documents, as it previsouly used to be before switching from using
the multi get API to the mtv API.
Closes#7846