We moved a lot of repositories into elasticsearch, but in their new
location they retained their LICENSE.txt and NOTICE.txt files. These are
all the same, and having the license and notice and the root of the
repository should be sufficient.
Adds a node attribute to all test runs and uses the attribute to test
`_cat/nodeattrs`.
Note that its quite possible create an impressively slow regex while doing
this and you have to be careful. See comment in commit for more if curious.
Closes#12558
detect_noop is pretty cheap and noop updates compartively expensive so this
feels like a sensible default.
Also had to do some testing and documentation around how _ttl works with
detect_noop.
Closes#11282
In order to have consistent deploys across several repositories,
we should deploy to sonatype first, then mirror those contents,
and then upload to s3.
This means, the aws wagon is not needed anymore.
Adds an explicit description the RPM package so it doesn't inherit the description from the POM.
Closes#12550
Also, modified descriptions for deb and rpm packages to be the same and to reference the documentation rather than listing features that are out of date.
Conflicting mappings that were allowed before v2.0 can cause runaway shard failures on upgrade. This commit adds a check that prevents a cluster from starting if it contains such indices as well as restoring such indices from a snapshot into already running cluster.
Closes#11857
this change was added recently which uses default timezone for the creation
date on CAT endpoints. We should be consistent and use UTC across the board.
This commit adds #getDefaultTimzone() to forbidden API and fixes the REST tests.
Relates to #11688
Store information reports on which nodes shard copies exist, the shard
copy version, indicating how recent they are, and any exceptions
encountered while opening the shard index or from earlier engine failure.
closes#10952
Today we have a intermediate hierarchy for shard and index exceptions
which makes it hard to introduce generic exceptions like ResourceNotFoundException
intoduced in this commit. This commit breaks up the hierarchy by adding index and shard
as a special internal header that gets rendered for every exception that fills that header.
This commit removes dedicated exceptions like `IndexMissingException` or
`IndexShardMissingException` in favour of `ResourceNotFoundException`
This test asserts that the first recovery result is of type SNAPSHOT.
This assertion might not be true depending on the rendering order if
a replica is recovered quick enough. This commit disables replicas and
their recovery since it's not the purpose of this test.
Today everything is tight to having the next version as the latest.
In order to work towards 2.0.0.beta1 we need to fix all the usage of
2.0.0-SNAPSHOT to reflect the version we will release soon.
Usually we do this on the release branch but to simplify things I wanna
keep this on master for now and move to 2.1.0-SNAPSHOT on master once
we created a 2.0 branch.
Closes#12148
This information was stored with the snapshot but wasn't available on the interface. Knowing the version of elasticsearch that created the snapshot can be useful to determine the minimal version of the cluster that is required in order to restore this snapshot.
Closes#11980
This commit adds support to retrieve fields when using the bulk update API. This functionality was previously available for the update API
but not for the bulk update API.
Closes#11527
Field stats index constraints allows to omit all field stats for indices that don't match with the constraint. An index
constraint can exclude indices' field stats based on the `min_value` and `max_value` statistic. This option is only
useful if the `level` option is set to `indices`.
For example index constraints can be useful to find out the min and max value of a particular property of your data in
a time based scenario. The following request only returns field stats for the `answer_count` property for indices
holding questions created in the year 2014:
curl -XPOST 'http://localhost:9200/_field_stats?level=indices' -d '{
"fields" : ["answer_count"] <1>
"index_constraints" : { <2>
"creation_date" : { <3>
"min_value" : { <4>
"gte" : "2014-01-01T00:00:00.000Z",
},
"max_value" : {
"lt" : "2015-01-01T00:00:00.000Z"
}
}
}
}'
Closes#11187
In order to be more consistent with what they do, the query cache has been
renamed to request cache and the filter cache has been renamed to query
cache.
A known issue is that package/logger names do no longer match settings names,
please speak up if you think this is an issue.
Here are the settings for which I kept backward compatibility. Note that they
are a bit different from what was discussed on #11569 but putting `cache` before
the name of what is cached has the benefit of making these settings consistent
with the fielddata cache whose size is configured by
`indices.fielddata.cache.size`:
* index.cache.query.enable -> index.requests.cache.enable
* indices.cache.query.size -> indices.requests.cache.size
* indices.cache.filter.size -> indices.queries.cache.size
Close#11569
This commit consolidates several abstractions on the shard level in
ordinary classes not managed by the shard level guice injector.
Several classes have been collapsed into IndexShard and IndexShardGatewayService
was cleaned up to be more lightweight and self-contained. It has also been moved into
the index.shard package and it's operation is renamed from recovery from "gateway" to recovery
from "store" or "shard_store".
Closes#11847
This is a follow up to #8143 and #6730 for _timestamp. It removes
support for `path`, as well as any field type settings, and
enables docvalues for _timestamp, for 2.0. Users who need to
adjust these settings can use a date field.
In order to get a quick overview using by simply checking the cluster state
and its corresponding cat API, the following two attributes have been added
to the cluster health response:
* task max waiting time, the time value of the first task of the
queue and how long it has been waiting
* active shards percent: The percentage of the number of shards that are in
initializing state
This makes the cluster health API handy to check, when a fully restarted
cluster is back up and running.
Closes#10805
The change makes rest-spec-api a project in the same way as we build dev-tools. it packages the tests and api in a bundle using the maven-remote-resources-plugin and uses the same plugin in the plugins and core pom to unpack the rest-api-spec into the target directory and references the rest tests there in the test resources.
The main stimulus for this change is that for those using Eclipse the current build does not work. After running `mvn eclipse:eclipse` the Eclipse IDE errors because the rest-api-spec is outside of the project scope, meaning that every time the command is run (required whenever any dependencies change), the class path of all the projects has to be manually fixed.
By setting human parameter to true, it's now possible to see human readable versions of Elasticsearch that created and updated the index as well as the date when the index was created.
Closes#11484
This changes the parameter name `ignore_like` to the more user friendly name
`unlike`. This later feature generates a query from the terms in `A` but not
from the terms in `B`. This translates to a result set which is like `A` but
unlike `B`. We could have further negatively boosted any documents that have
some `B`, but these documents already do not receive any contribution from
having `B`, and would therefore negatively compete with documents having `A`.
Closes#11117
Unassigned meta includes additional information as to why a shard is unassigned, this is especially handy when a shard moves to unassigned due to node leaving or shard failure.
The additional data is provided as part of the cluster state, and as part of `_cat/shards` API.
The additional meta includes the timestamp that the shard has moved to unassigned, allowing us in the future to build functionality such as delay allocation due to node leaving until a copy of the shard is found.
closes#11653
We have to make sure all shards are started to know the synced flush will hit them all. Shards that are still initializing during the sync flush may be missed and confuse the stats call
Some of our meta fields (such as _id, _version, ...) are returned as top-level
properties of the json document, while other properties (_timestamp, _routing,
...) are returned under `fields`. This commit makes all meta fields returned
as top-level properties.
So eg. `GET test/test/1?fields=_timestamp,foo` would now return
```json
{
"_index": "test",
"_type": "test",
"_id": "1",
"_version": 1,
"_timestamp": 10000000,
"found": true,
"fields": {
"foo": [ "bar" ]
}
}
```
while it used to return
```json
{
"_index": "test",
"_type": "test",
"_id": "1",
"_version": 1,
"found": true,
"fields": {
"_timestamp": 10000000,
"foo": [ "bar" ]
}
}
```
The wildcard cat API REST tests relied on bulk.max and bulk.min in
the thread_pool response. However due to the thread pool types being
randomized in InternalTestCluster, the min/max values were not guaranteed
to exist (the cached thread pool type is unbounded and thus does not have a
max value).
In order to prevent this, the test has been removed and now the cat
nodes test is used for wildcard testing, which always returns stats
about the heap.
The tests for the recently added added wildcard feature were
relying on order of the hashmap being used, which could be
different.
The implementation now ensures, that the header fields are
parsed in the order they have been added.
This change adds a new "filter_path" parameter that can be used to filter and reduce the responses returned by the REST API of elasticsearch.
For example, returning only the shards that failed to be optimized:
```
curl -XPOST 'localhost:9200/beer/_optimize?filter_path=_shards.failed'
{"_shards":{"failed":0}}%
```
It supports multiple filters (separated by a comma):
```
curl -XGET 'localhost:9200/_mapping?pretty&filter_path=*.mappings.*.properties.name,*.mappings.*.properties.title'
```
It also supports the YAML response format. Here it returns only the `_id` field of a newly indexed document:
```
curl -XPOST 'localhost:9200/library/book?filter_path=_id' -d '---hello:\n world: 1\n'
---
_id: "AU0j64-b-stVfkvus5-A"
```
It also supports wildcards. Here it returns only the host name of every nodes in the cluster:
```
curl -XGET 'http://localhost:9200/_nodes/stats?filter_path=nodes.*.host*'
{"nodes":{"lvJHed8uQQu4brS-SXKsNA":{"host":"portable"}}}
```
And "**" can be used to include sub fields without knowing the exact path. Here it returns only the Lucene version of every segment:
```
curl 'http://localhost:9200/_segments?pretty&filter_path=indices.**.version'
{
"indices" : {
"beer" : {
"shards" : {
"0" : [ {
"segments" : {
"_0" : {
"version" : "5.2.0"
},
"_1" : {
"version" : "5.2.0"
}
}
} ]
}
}
}
}
```
Note that elasticsearch sometimes returns directly the raw value of a field, like the _source field. If you want to filter _source fields, you should consider combining the already existing _source parameter (see Get API for more details) with the filter_path parameter like this:
```
curl -XGET 'localhost:9200/_search?pretty&filter_path=hits.hits._source&_source=title'
{
"hits" : {
"hits" : [ {
"_source":{"title":"Book #2"}
}, {
"_source":{"title":"Book #1"}
}, {
"_source":{"title":"Book #3"}
} ]
}
}
```
#10032 introduced the notion of sealing an index by marking it with a special read only marker, allowing for a couple of optimization to happen. The most important one was to speed up recoveries of shards where we know nothing has changed since they were online by skipping the file based sync phase. During the implementation we came up with a light notion which achieves the same recovery benefits but without the read only aspects which we dubbed synced flush. The fact that it was light weight and didn't put the index in read only mode, allowed us to do it automatically in the background which has great advantage. However we also felt the need to allow users to manually trigger this operation.
The implementation at #11179 added the sync flush internal logic and the manual (rest) rest API. The name of the API was modeled after the sealing terminology which may end up being confusing. This commit changes the API name to match the internal synced flush naming, namely `{index}/_flush/synced'.
On top of that it contains a couple other changes:
- Remove all java client API. This feature is not supposed to be called programtically by applications but rather by admins.
- Improve rest responses making structure similar to other (flush) API
- Change IndexShard#getOperationsCount to exclude the internal +1 on open shard . it's confusing to get 1 while there are actually no ongoing operations
- Some minor other clean ups