To support the `_recovery` API, the recovery process keeps track of current progress in a class called RecoveryState. This class currently have some issues, mostly around concurrency (see #6644 ). This PR cleans it up as well as other issues around it:
- Make the Index subsection API cleaner:
- remove redundant information - all calculation is done based on the underlying file map
- clearer definition of what is what: total files, vs reused files (local files that match the source) vs recovered files (copied over). % based progress is reported based on recovered files only.
- cleaned up json response to match other API (sadly this breaks the structure). We now properly report human values for dates and other units.
- Add more robust unit testing
- Detail flag was passed along as state (it's now a ToXContent param)
- State lookup during reporting is now always done via the IndexShard , no more fall backs to many other classes.
- Cleanup APIs around time and move the little computations to the state class as opposed to doing them out of the API
I also improved error messages out of the REST testing infra for things I run into.
Closes#6644Closes#9811
The number of current pending tasks is useful to detect and overloaded master. This commit adds it to the cluster health API. The complete list can be retrieved from the dedicated pending tasks API.
It also adds rest tests for the cluster health variants.
Closes#9877
The plus sign is not treated correctly in encoding and can lead
to problems, if the search request is encoded as HTTP parameter
instead of the HTTP body.
Relates #9769
Using '_cat/segments' or the indices segments api without matching any index
now returns empty result instead of throwing IndexMissingException.
Closes#9219
Whenever we have an api that supports GET with a body, we always support the POST method too, as well as providing the body as a query_string parameter called `source`. Our REST spec should reflect this convention. FIxed them and introduced a hard check at parse time in our Java REST tests runner, which will cause the tests to fail if spec are not compliant.
Closes#9629
The percolate api doesn't parse the encoded body provided as `source` query string parameter, when percolating an existing document. Fixed and added REST test that would have caught this since we randomly use GET + encoded `source` param instead of GET + request body in our java runner (the perl runner does the same too).
Closes#9628
The `full` option and `FlushType.NEW_WRITER` only exists to allow
realtime changes to two settings (`index.codec` and `index.concurrency`).
Those settings are very expert and don't really need to be updateable
in realtime.
Until now, there was no possibility to expose infos about configured
transport profiles. This commit adds the ability to expose those
information in the TransportInfo class.
The channel was well as the netty pipeline handler now also contain
the profile they were configured for, as this information cannot be
extracted elsewhere.
In addition, each profile now can set its own publish host and port,
which might be needed in case of portforwarding or using docker.
Closes#9134
Adding missing support for the multi-index query parameters 'ignore_unavailable',
'allow_no_indices' and 'expand_wildcards' to '_cluster/state' API. These
parameters are supposed to be supported for APIs that work across multiple indices.
So far overwriting the default settings per REST call was not possible which is
fixed here.
Closes#5229Closes#9295
Today we give the HTTP status back within the HTTP response itself and within the JSON response as well:
```sh
curl localhost:9200/
```
```js
{
"status" : 200,
"name" : "Red Wolf",
"version" : {
"number" : "2.0.0",
"build_hash" : "6837a61d8a646a2ac7dc8da1ab3c4ab85d60882d",
"build_timestamp" : "2014-08-19T13:55:56Z",
"build_snapshot" : true,
"lucene_version" : "4.9"
},
"tagline" : "You Know, for Search"
}
```
The header indicates to how many shard copies (primary and replicas shards) a write was supposed to go to, to how many
shard copies to write succeeded and potentially captures shard failures if writing into a replica shard fails.
For async writes it also includes the number of shards a write is still pending.
Closes#7994
This fix ensures that calls to the GET alias/mappings/settings/warmers APIs return the aliases/mappings/settings/warmers object even if there is no content within them.. This make them consistent with the GET Index API docs and the breaking changes in 1.4 docs
Closes#9148
Add a new ignore_idle_threads boolean option (default true) to
/_nodes/hot_threads, to filter out threads in known idle places like
waiting on a socket select or on pulling the next task from an empty
queue.
Closes#8985Closes#8908
This commit adds support for version and version_type to the Term Vectors API.
This could be useful in the following case whereby the user gets a document
and later wants to generate its TVs. With version, this would ensure that only
the TVs of that particular document are generated, and error out if the
document has been updated in between.
Closes#7480
Adds a `ignore_like` parameter to the MLT Query, which simply tells the
algorithm to skip all the terms from the given documents. This could be useful
in order to better guide nearest neighbor search by telling the algorithm to
never explore the space spanned by the given `ignore_like` docs. In essence we
are interested about the characteristic of a given item, but not of the ones
provided by `ignore_like`, thereby forcing the algorithm to go deeper in its
selection of terms. Note that this is different than simply performing a must
not boolean query on the unliked items. The syntax is exactly the same as the
`like` parameter.
Closes#8674
Today, Elasticsearch has a separate merge thread pool checking once
per second (by default) if any merges are necessary, but this is no
longer necessary since we can and do now tell Lucene's
ConcurrentMergeScheduler never to "hard pause" threads when merges
fall behind, since we do our own index throttling.
This change goes back to letting Lucene launch merges as needed, and
removes these two expert settings:
index.merge.force_async_merge
index.merge.async_interval
Now merges kick off immediately instead of waiting up to 1 second
before running.
Closes#8643
We speak of the term vectors of a document, where each field has an associated
stored term vector. Since by default we are requesting all the term vectors of
a document, the HTTP request endpoint should rather be called `_termvectors`
instead of `_termvector`. The usage of `_termvector` is now deprecated, as
well as the transport client call to termVector and prepareTermVector.
Closes#8484
Fixed behaviour where two representations of the default index analyzer weren't being treated as equivalent. Added REST test to confirm fix.
Closes#2716
If a shard (e.g. replica) gets initialized after we indexed the document it gets refreshed internally and we find the doc and its term_vectors, thus the test fails
We currently use the djb2 hash function in order to compute the shard a
document should go to. Unfortunately this hash function is not very
sophisticated and you can sometimes hit adversarial cases, such as numeric ids
on 33 shards.
Murmur3 generates hashes with a better distribution, which should avoid the
adversarial cases.
Here are some examples of how 100000 incremental ids are distributed to shards
using either djb2 or murmur3.
5 shards:
Murmur3: [19933, 19964, 19940, 20030, 20133]
DJB: [20000, 20000, 20000, 20000, 20000]
3 shards:
Murmur3: [33185, 33347, 33468]
DJB: [30100, 30000, 39900]
33 shards:
Murmur3: [2999, 3096, 2930, 2986, 3070, 3093, 3023, 3052, 3112, 2940, 3036, 2985, 3031, 3048, 3127, 2961, 2901, 3105, 3041, 3130, 3013, 3035, 3031, 3019, 3008, 3022, 3111, 3086, 3016, 2996, 3075, 2945, 2977]
DJB: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 900, 900, 900, 900, 1000, 1000, 10000, 10000, 10000, 10000, 9100, 9100, 9100, 9100, 9000, 9000, 0, 0, 0, 0, 0, 0]
Even if djb2 looks ideal in some cases (5 shards), the fact that the
distribution of its hashes has some patterns can raise issues with some shard
counts (eg. 3, or even worse 33).
Some tests have been modified because they relied on implementation details of
the routing hash function.
Close#7954
Fixes a bug where alias creation would allow `null` for index name, which thereby
applied the alias to _all_ indices. This patch makes the validator throw an
exception if the index is null.
```bash
POST /_aliases
{
"actions": [
{
"add": {
"alias": "empty-alias",
"index": null
}
}
]
}
```
```json
{
"error": "ActionRequestValidationException[Validation Failed: 1: Alias action [add]: [index] may not be null;]",
"status": 400
}
```
The reason this bug wasn't caught by the existing tests is because
the old test for nullness only validated against a cluster which had
zero indices. The null index is translated into "_all", and since
there are no indices, this fails because the index doesn't exist.
So the test passes.
However, as soon as you add an index, "_all" resolves and you get the
situation described in the original bug report: null index is
accepted by the alias, resolves to "_all" and gets applied to everything.
The REST tests, otoh, explicitly tested this bug as a real feature and therefore
passed. The REST tests were modified to change this behavior.
Fixes#7863
Add source_node and target_node fields to the recovery cat API. Also fixed and updated the documentation which was not complete concerning fields names.
Closes#8041
Storing `_timestamp` by default means that under the default configuration, you
would have all the information you need in order to reindex into a different
index.
Close#8139
cat/nodes currently does not report any details related to file descriptors. This adds the current number in use, the maximum number available as well as their ratio (percentage) to cat/nodes as hidden-by-default metrics. In addition, this also adds current heap usage (as a non-percentage of ts max) and ram usage (as a non-percerntage of its max) to allow tools to provide more granularity.
Closes#7652
* `get_upgrade` => `GET _upgrade` -- Return the status
* `upgrade` => `POST _upgrade` -- Perform the operation
Original specification part of c021f22523.
Related: #7884, #7922
This commit does the following:
* Add the new API at the rest layer, being backed by the optimize API
with upgrade flag, and segments api to find upgrade status.
* Add `upgrade` flag to optimize API, and deprecate `force` flag (will
remove in master)
* Add test for both synchronous and async upgrade
closes#7884closes#7922
When asking for `GET /_cat/indices?v`, you can now retrieve closed indices in addition to opened ones.
```
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open .marvel-2014.05.21 1 1 8792 0 21.7mb 21.7mb
close test
yellow open .marvel-2014.05.22 1 1 3871 0 10.7mb 10.7mb
red open .marvel-2014.05.27 1 1
```
Closes#7907.
Closes#7936.
By default term vectors are now realtime, as opposed to previously near
realtime. If they are not found in the index, they will be generated on the
fly. The document is fetched from the transaction log and treated as an
artificial document. One can set `realtime` parameter to `false` in order to
disable this functionality. This consequently makes the MLT query realtime in
fetching documents, as it previsouly used to be before switching from using
the multi get API to the mtv API.
Closes#7846
Previously, the only way to specify a document not present in the index was to
use `like_text`. This would usually lead to complex queries made of multiple
MLT queries per document field. This commit adds the ability to the MLT query
to directly specify documents not present in the index (artificial documents).
The syntax is similar to the Percolator API or to the Multi Term Vector API.
Closes#7725
This contains several cleanups to the indexed scripts.
Remove the unused FetchSourceContext from the Get request..
Add lang,_version,_id to the REST GET API.
Removes the routing from GetIndexedScriptRequest since the script index is a single shard that is replicated across all nodes.
Fix backward compatible template file reference
Before 1.3.0 on disk scripts could be referenced by requesting
````
_search/template
{
"template" : "ondiskscript"
}
````
This was broken in 1.3.0 by requiring
````
{
"template" :
{
"file" : "ondiskscript"
}
}
````
This commit restores the previous behavior.
Remove support for preference, realtime and refresh
These parameters don't make sense anymore for indexed scripts as we always force the preference to _local and
always refresh after a Put to the indexed scripts index.
Closes#7568Closes#7559Closes#7647Closes#7567
Returns information about settings, aliases, warmers, and mappings. Basically returns the IndexMetadata. This new endpoint replaces the /{index}/_alias|_aliases|_mapping|_mappings|_settings|_warmer|_warmers and /_alias|_aliases|_mapping|_mappings|_settings|_warmer|_warmers endpoints whilst maintaining the same response formats. The only exception to this is on the /_alias|_aliases|_warmer|_warmers endpoint which will now return a section for 'aliases' or 'warmers' even if no aliases or warmers exist. This backwards compatibility change is documented in the reference docs.
Closes#4069
By default the reroute API should return the new cluster state, excluding the metadata. It was however it was wrongly using an old parameter (filter_metadata) and thus failed to do so. This commits restores but wiring it to the correct `metric` parameter. We also add an enum representing the possible metrics, to avoid similar future mistakes.
Closes#7520Closes#7523
The get, put and delete indexed script apis map to get, index and delete api and internally create those corresponding requests. We need to make sure that the original headers are handed over to the new request by passing the original request in the constructor when creating the new one.
Also streamlined the support for version and version_type in the REST layer since the parameters were not consistently parsed and set to the internal java API requests.
Modified the REST delete template and delete script actions to make use of a client instead of using the `ScriptService` directly.
Closes#7569
The root endpoint returns basic information about this node, like it's name and ES version etc. The cluster name is an important information that belongs in that list.
Closes#7524
This change means that the default settings for expand_wildcards are only applied if the expand_wildcards parameter is not specified rather than being set upfront. It also adds the none and all options to the parameter to allow the user to specify no expansion and expansion to all indexes (equivalent to 'open,closed')
Closes#7258
Index, type and id were returned as part of the REST explain api response, but not through java api. That info was read out of the request, relying on the fact that the index would get overridden with the concrete one within that same request.
Closes#7201
A request level flag, defaults to be unset, to control the query cache. When not set, it defaults to the index level settings, when explicitly set, will override the index level setting
closes#7167
In the case of inserts the UpdateHelper class will now allow the script used to apply updates to run on the upsert doc provided by clients. This allows the logic for managing the internal state of the data item to be managed by the script and is not reliant on clients performing the initialisation of data structures managed by the script.
Closes#7143
Implements a new Exists API allowing users to do fast exists check on any matched documents for a given query.
This API should be faster then using the Count API as it will:
- early terminate the search execution once any document is found to exist
- return the response as soon as the first shard reports matched documents
closes#6995
This commit adds the ability to force blocking on the flush operaition
to make sure all files have been written and synced to disk. Without
this option a flush might be executing at the same time causing the
current flush to fail and return before all files being synced.
Closes#6996