This adds a `per_field_analyzer` parameter to the Term Vectors API, which
allows to override the default analyzer at the field. If the field already
stores term vectors, then they will be re-generated. Since the MLT Query uses
the Term Vectors API under its hood, this commits also adds the same ability
to the MLT Query, thereby allowing users to fine grain how each field item
should be processed and analyzed.
Closes#7801
The randomisation that deletes documents is also removed from tests as this doc-accounting change would mean the specific scores being expected in tests would now be subject to random variability and so fail.
Closes#7951
When asking for `GET /_cat/indices?v`, you can now retrieve closed indices in addition to opened ones.
```
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open .marvel-2014.05.21 1 1 8792 0 21.7mb 21.7mb
close test
yellow open .marvel-2014.05.22 1 1 3871 0 10.7mb 10.7mb
red open .marvel-2014.05.27 1 1
```
Closes#7907.
Closes#7936.
`took` is computed based on the system clock and can be negative if the clock
time was updated during the execution of the search request. This commit
protects against these cases by replacing `took` with 1 if the elapsed time is
negative.
Close#7968
By default term vectors are now realtime, as opposed to previously near
realtime. If they are not found in the index, they will be generated on the
fly. The document is fetched from the transaction log and treated as an
artificial document. One can set `realtime` parameter to `false` in order to
disable this functionality. This consequently makes the MLT query realtime in
fetching documents, as it previsouly used to be before switching from using
the multi get API to the mtv API.
Closes#7846
Currently DEBUG logs can get very verbose because IndicesClusterStateService logs the complete mapping with every mapping update. We should suppress it if long in DEBUG mode and always log the full one in TRACE.
Closes#7949
When sending a multicast ping, there is no way to determine how long it will take before all nodes will respond. Currently we send two pings (one at start, one after half timeout) and wait until the ping timeout has passed for all responses to come back. However, if all nodes are fast to respond, there is a gap relatively large between the moment that pings were gathered and the election that is based on them. This commits adds a last ping round (at timeout) where we know the number of nodes we expect to receive answers from. Once all nodes responded, we complete the pinging.
Closes#7924
Due to component start order we may process an incoming ping while the ZenDiscovery module is not yet started. This leads to exception (from which we recover correctly, but the logs are note nice). UnicastZenPing should only start processing pings if it is started. We previously processed if not closed or stopped.
Closes#7950
This commit makes the lookup structures that are used for mappings immutable.
When changes are required, a new instance is created while the current instance
is left unmodified. This is done efficiently thanks to a hash table
implementation based on a array hash trie, see
org.elasticsearch.common.collect.CopyOnWriteHashMap.
ManyMappingsBenchmark returns indexing times that are similar to the ones that
can be observed in current master.
Ultimately, I would like to see if we can make mappings completely immutable as
well and updated atomically. This is not trivial however, eg. because of dynamic
mappings. So here is a first baby step that should help move towards that
direction.
Close#7486
Don't insist on log file removal until after usage is printed.
Some simple Python code improvements (x.find(y) != -1 --> y in x)
Make sure the git area is "clean" (has no unpushed changes, has pulled
all changes, has no untracked files)
Add label color detail when creating next github version label.
Closes#7913
The Put Warmer API executes the search encapsulated in the warmer before accepting it. This requires that at least one shard will be started. The tests used to use ensureGreen to check for that because of a publish timeout of 0 (needed to check the ack mechanism) that doesn't guarantee the shard is really started - just that the master has changed the CS to say so. This commit changes the ensureGreen to a the indexing of a single document.
Adds a check to make sure that all ids in the query are either strings
or numbers. This is to prevent the case where a user accidentally
specifies:
"ids": [["1", "2"]]
(note the double array)
With this change, an exception will be thrown since the second "[" is
not a string or number, it is a Token.START_ARRAY.
Fixes#7686
The PluginManager had a subtle bug in case the config directory was not in the
es home directory - which is always true in case of packaging.
This fixes the plugin manager, so that when specifying a path.home and a
path.conf variable on the commandline, the plugin manager acts
appropriately.
Before this change all persistent custom metadata is stored as part of snapshot. It requires us to remove repositories metadata later during recovery process. This change allows custom metadata to specify whether or not it should be stored as part of a snapshot.
Fixes#7900
Failing a parent breaker check is eventually consistent, so the test
could fail the parent limit, throw an exception, and before being
adjusted back down, increment more and throw a circuit breaking
exception on the child. This increases the child's limit, to ensure
we're only testing the parent limit.
It adds an additional assert to ensure that the breaker total is
correctly re-adjusted when the parent breaker has been tripped.
This does the following:
* Make 'force' flag only build a merge if the delegate MP returned no merges
* Add async handling for 'flush' when 'waitForMerges' is false
* Remove flush at the beginning of optimize. This is something the user can
do if they wish, before calling optimize.
closes#7886closes#7904closes#7920
Remove the creation of a node client if not there before each test through setup method. `numClientNodes` makes sure that the client node gets created during suite cluster initialization.
Previously, the only way to specify a document not present in the index was to
use `like_text`. This would usually lead to complex queries made of multiple
MLT queries per document field. This commit adds the ability to the MLT query
to directly specify documents not present in the index (artificial documents).
The syntax is similar to the Percolator API or to the Multi Term Vector API.
Closes#7725
Create client nodes using `node.client: true` instead of `node.data: false` and `node.master: false`.
We should create client nodes in our test infra using the `node.client:true` settings as that is the one that users use, and the one that we use as well in `ClientNodePredicate` thus we end up not finding client nodes otherwise as they weren't created with the proper setting.
Updated also the `DataNodePredicate` so that `client: true` is enough, no need for `data: false` as well.
Closes#7911
The minimum number of optional should clauses of the generated query to match
can now be set using the more extensive minimum should match syntax. This
makes the `percent_terms_to_match` parameter deprecated, and replaced in favor
to a new `minimum_should_match` parameter.
Closes#7898