Today we only run 10% of the time, and the test doesn't fail when
corruption is detected.
I think it's better to always run and fail the test, so we can catch
any possible resiliency bugs in Lucene/Elasticsearch causing corruption.
For known tests that create corrupted indices, it's easy to set
MockFSDirectoryService.CHECK_INDEX_ON_CLOSE to false...
Closes#7730
Only when segments are merged away due to merging then entries in this cache are cleaned up.
Nested and parent/child rely on the fact that type filters produce a FixedBitSet, the FixedBitSetFilterCache does this.
Also if nested and parent/child is configured the type filters are eagerly loaded by default via the FixedBitSetFilterCache.
Closes#7037Closes#7031
The query cache allow to cache the (binary serialized) response of the shard level query phase execution based on the actual request as the key. The cache is fully coherent with the semantics of NRT, with a refresh (that actually ended up refreshing) causing previous cached entries on the relevant shard to be invalidated and eventually evicted.
This change enables query caching as an opt in index level setting, called `index.cache.query.enable` and defaults to `false`. The setting can be changed dynamically on an index. The cache is only enabled for search requests with search_type count.
The indices query cache is a node level query cache. The `indices.cache.query.size` controls what is the size (bytes wise) the cache will take, and defaults to `1%` of the heap. Note, this cache is very effective with small values in it already. There is also the advanced option to set `indices.cache.query.expire` that allow to control after a certain time of inaccessibility the cache will be evicted.
Note, the request takes the search "body" as is (bytes), and uses it as the key. This means same JSON but with different key order will constitute different cache entries.
This change includes basic stats (shard level, index/indices level, and node level) for the query cache, showing how much is used and eviction rates.
While this is a good first step, and the goal is to get it in, there are a few things that would be great additions to this work, but they can be done as additional pull requests:
- More stats, specifically cache hit and cache miss, per shard.
- Request level flag, defaults to "not set" (inheriting what the setting is).
- Allowing to change the cache size using the cluster update settings API
- Consider enabling the cache to query phase also when asking hits are involved, note, this will only include the "top docs", not the actual hits.
- See if there is a performant manner to solve the "out of order" of keys in the JSON case.
- Maybe introduce a filter element, that is outside of the request, that is checked, and if it matches all docs in a shard, will not be used as part of the key. This will help with time based indices and moving windows for shards that fall "inside" the window to be more effective caching wise.
- Add a more infra level support in search context that allows for any element to mark the search as non deterministic (on top of the support for "now"), and use it to not cache search responses.
closes#7161
correctly.
InterruptedExceptions should be handled by either rethrowing or
restoring the interrupt state (i.e. calling
`Thread.currentThread().interrupt()`). This is important since the
caller of the is method or subequent method calls might also be
interested in this exception. If we ignore the interrupt state the
caller might be left unaware of the exception and blocks again on
a subsequent method.
Closes#4712
* Clean up s/ElasticSearch/Elasticsearch on docs/*
* Clean up s/ElasticSearch/Elasticsearch on src/* bin/* & pom.xml
* Clean up s/ElasticSearch/Elasticsearch on NOTICE.txt and README.textile
Closes#4634
This can go wrong if indices with the same name are repeatably created and deleted.
UUIDs can not be null anymore. If UUID is not available `_na_` will be used as a value.
Also - some minor clean up in ShardStateAction where shard started events could be added twice to the to-be-applied list where the second instance will be ignored.
Closes#3783
MockDirectoryWrapper adds asserting logic to the low level directory
implementation that helps to track and catch resource leaks like
unclosed index inputs caused by dangling IndexReader or IndexSearcher
instances. It prevents double writes to files and allows low level
random exceptions to be thrown for testing index consistency etc.
Closes#3654
--------------------------
This feature allows to retrieve [term vectors](https://github.com/elasticsearch/elasticsearch/issues/3114) for a list of documents. The json request has exactly the same [format](https://github.com/elasticsearch/elasticsearch/issues/3484) as the ```_termvectors``` endpoint
It use it, call
```
curl -XGET 'http://localhost:9200/index/type/_mtermvectors' -d '{
"fields": [
"field1",
"field2",
...
],
"ids": [
"docId1",
"docId2",
...
],
"offsets": false|true,
"payloads": false|true,
"positions": false|true,
"term_statistics": false|true,
"field_statistics": false|true
}'
```
The return format is an array, each entry of which conatins the term vector response for one document:
```
{
"docs": [
{
"_index": "index",
"_type": "type",
"_id": "docId1",
"_version": 1,
"exists": true,
"term_vectors": {
...
}
},
{
"_index": "index",
"_type": "type",
"_id": "docId2",
"_version": 1,
"exists": true,
"term_vectors": {
...
}
}
]
}
```
Note that, like term vectors, the mult term vectors request will silenty skip over documents that have no term vectors stored in the index and will simply return an empty response in this case.
Closes#3536
Currently the timeout for an delete index operation is set to 10 seconds.
Yet, if a full flush is running while we delete and index this can
easily exceed 10 seconds. The timeout is not dramatic ie. the index
will be deleted eventually but the client request is not acked which
can cause confusion. We should raise it to prevent unnecessary confusion
especially in client tests where this can happen if the machine is pretty busy.
The new timeout is set to 60 seconds.
Closes#3498
With this design the percolate queries will be stored in a special `_percolator` type with its own mapping in the same index where the actual data is or in a different index (dedicated percolation index, which might require different sharding behavior compared to the index that holds actual data and being search on). This approach allows percolate requests to scale to the number of primary shards an index has been configured with and effectively distributes the percolate execution.
This commit doesn't add new percolate features other than scaling. The response remains similar, with exception that a header similar to the search api has been added to the percolate response.
Closes#3173
Allow to configure store throttling (only applied on file system based storage), which allows to control the maximum bytes per sec written to the file system. It can be configured to only apply while merging, or on all output operations. The setting can eb set on the node level (in which case the throttling is done across all shards allocated on the node), or index level, in which case it only applied to that index.
The node level settings are indices.store.throttle.type to set the type, with values of none, merge and all (defaults to none). And, also, indices.store.throttle.max_bytes_per_sec (defaults to 0), which can be set to something like 1mb.
The index level settings is index.store.throttle.type for the type, with values of node, none, merge, and all. Defaults to node which will use the "shared" throttling on the node level. And, index.store.throttle.max_bytes_per_sec (defaults to 0).