Stats, histogram and range facets and sorting currently fail if a field that they are running on is not defined in the mapping. In the case of dynamic fields this might mean that by the time the facet query is executed, the new field mapping has not yet been propagated to all nodes.
When startNode exits there is no guarantee that shard cleanup is finished because the cleanup operation is performed on another thread and startNode doesn't wait for it to complete. Therefore we might need to wait for the shard to disappear.
We want to support the ~ notation in the query parser for types other than strings. We are getting there: one can now do age:10~5. We would love to support it for dates too, as in timestamp:2012-10-10~5d, but that requires changes in the query parser to support strings after the ~ sign.
# Suggest feature
The suggest feature suggests similar looking terms based on a provided text by using a suggester. At the moment the only supported suggester is `fuzzy`. The suggest feature is available from version `0.21.0`.
# Fuzzy suggester
The `fuzzy` suggester suggests terms based on edit distance. The provided suggest text is analyzed before terms are suggested. The suggested terms are provided per analyzed suggest text token. The `fuzzy` suggester doesn't take the query that is part of the request into account.
# Suggest API
The suggest request part is defined alongside the query part as a top-level field in the JSON request.
```
curl -s -XPOST 'localhost:9200/_search' -d '{
"query" : {
...
},
"suggest" : {
...
}
}'
```
Several suggestions can be specified per request. Each suggestion is identified with an arbitrary name. In the example below two suggestions are requested. Both the `my-suggest-1` and `my-suggest-2` suggestions use the `fuzzy` suggester, but have a different `text`.
```
"suggest" : {
"my-suggest-1" : {
"text" : "the amsterdma meetpu",
"fuzzy" : {
"field" : "body"
}
},
"my-suggest-2" : {
"text" : "the rottredam meetpu",
"fuzzy" : {
"field" : "title",
}
}
}
```
The below suggest response example includes the suggestion responses for `my-suggest-1` and `my-suggest-2`. Each suggestion part contains entries. Each entry is effectively a token from the suggest text and contains the suggestion entry text, the original start offset and length in the suggest text and, if found, an arbitrary number of options.
```
{
...
"suggest": {
"my-suggest-1": [
{
"text" : "amsterdma",
"offset": 4,
"length": 9,
"options": [
...
]
},
...
],
"my-suggest-2" : [
...
]
}
...
}
```
Each options array contains an option object that includes the suggested text, its document frequency and its score compared to the suggest entry text. The meaning of the score depends on the used suggester. The fuzzy suggester's score is based on the edit distance.
```
"options": [
{
"text": "amsterdam",
"freq": 77,
"score": 0.8888889
},
...
]
```
# Global suggest text
To avoid repetition of the suggest text, it is possible to define a global text. In the example below the suggest text is defined globally and applies to the `my-suggest-1` and `my-suggest-2` suggestions.
```
"suggest" : {
"text" : "the amsterdma meetpu"
"my-suggest-1" : {
"fuzzy" : {
"field" : "title"
}
},
"my-suggest-2" : {
"fuzzy" : {
"field" : "body"
}
}
}
```
The suggest text in the above example could also be specified as a suggestion-specific option. A suggest text specified on the suggestion level overrides the suggest text on the global level.
# Other suggest example
In the below example we request suggestions for the suggest text `devloping distibutd saerch engies` on the `title` field, with a maximum of 3 suggestions per term inside the suggest text. Note that in this example we use the `count` search type. This isn't required, but a nice optimization. The suggestions are gathered in the `query` phase and in the case that we only care about suggestions (so no hits) we don't need to execute the `fetch` phase.
```
curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{
"suggest" : {
"my-title-suggestions-1" : {
"text" : "devloping distibutd saerch engies",
"fuzzy" : {
"size" : 3,
"field" : "title"
}
}
}
}'
```
The above request could yield the response as stated in the code example below. As you can see, if we take the first suggested option of each suggestion entry we get `developing distributed search engines` as a result.
```
{
...
"suggest": {
"my-title-suggestions-1": [
{
"text": "devloping",
"offset": 0,
"length": 9,
"options": [
{
"text": "developing",
"freq": 77,
"score": 0.8888889
},
{
"text": "deloping",
"freq": 1,
"score": 0.875
},
{
"text": "deploying",
"freq": 2,
"score": 0.7777778
}
]
},
{
"text": "distibutd",
"offset": 10,
"length": 9,
"options": [
{
"text": "distributed",
"freq": 217,
"score": 0.7777778
},
{
"text": "disributed",
"freq": 1,
"score": 0.7777778
},
{
"text": "distribute",
"freq": 1,
"score": 0.7777778
}
]
},
{
"text": "saerch",
"offset": 20,
"length": 6,
"options": [
{
"text": "search",
"freq": 1038,
"score": 0.8333333
},
{
"text": "smerch",
"freq": 3,
"score": 0.8333333
},
{
"text": "serch",
"freq": 2,
"score": 0.8
}
]
},
{
"text": "engies",
"offset": 27,
"length": 6,
"options": [
{
"text": "engines",
"freq": 568,
"score": 0.8333333
},
{
"text": "engles",
"freq": 3,
"score": 0.8333333
},
{
"text": "eggies",
"freq": 1,
"score": 0.8333333
}
]
}
]
}
...
}
```
# Common suggest options
* `text` - The suggest text. The suggest text is a required option that needs to be set globally or per suggestion.
# Common fuzzy suggest options
* `field` - The field to fetch the candidate suggestions from. This is a required option that either needs to be set globally or per suggestion.
* `analyzer` - The analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field.
* `size` - The maximum number of corrections to be returned per suggest text token.
* `sort` - Defines how suggestions should be sorted per suggest text term. Two possible values:
** `score` - Sort by score first, then document frequency and then the term itself.
** `frequency` - Sort by document frequency first, then similarity score and then the term itself.
* `suggest_mode` - The suggest mode controls which suggestions are included, and for which suggest text terms suggestions should be suggested. Three possible values can be specified:
** `missing` - Only suggest terms in the suggest text that aren't in the index. This is the default.
** `popular` - Only suggest suggestions that occur in more docs than the original suggest text term.
** `always` - Suggest any matching suggestions based on terms in the suggest text.
# Other fuzzy suggest options
* `lowercase_terms` - Lower cases the suggest text terms after text analysis.
* `max_edits` - The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be a value between 1 and 2. Any other value results in a bad request error being thrown. Defaults to 2.
* `min_prefix` - The minimal number of prefix characters that must match in order to be a candidate suggestion. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur in the beginning of terms.
* `min_query_length` - The minimum length a suggest text term must have in order to be included. Defaults to 4.
* `shard_size` - Sets the maximum number of suggestions to be retrieved from each individual shard. During the reduce phase only the top N suggestions are returned based on the `size` option. Defaults to the `size` option. Setting this to a value higher than the `size` can be useful in order to get a more accurate document frequency for spelling corrections at the cost of performance. Due to the fact that terms are partitioned amongst shards, the shard level document frequencies of spelling corrections may not be precise. Increasing this will make these document frequencies more precise.
* `max_inspections` - A factor that is multiplied with the `shard_size` in order to inspect more candidate spelling corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5.
* `threshold_frequency` - The minimal threshold in number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of the number of documents. This can improve quality by only suggesting high frequency terms. Defaults to 0f, meaning it is not enabled. If a value higher than 1 is specified then the number cannot be fractional. The shard level document frequencies are used for this option.
* `max_query_frequency` - The maximum threshold in number of documents in which a suggest text token can exist in order to be included. Can be a relative percentage number (e.g. 0.4) or an absolute number to represent document frequencies. If a value higher than 1 is specified then it cannot be fractional. Defaults to 0.01f. This can be used to exclude high frequency terms from being spellchecked. High frequency terms are usually spelled correctly; on top of this, excluding them also improves the spellcheck performance. The shard level document frequencies are used for this option.
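Putting these options together, a request sketch that combines several of them might look like the following; the field name `body` and the option values are illustrative, not defaults:
```
curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{
    "suggest" : {
        "my-suggest" : {
            "text" : "the amsterdma meetpu",
            "fuzzy" : {
                "field" : "body",
                "size" : 5,
                "sort" : "frequency",
                "suggest_mode" : "popular",
                "max_edits" : 2,
                "min_prefix" : 2
            }
        }
    }
}'
```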
# Suggest feature
The suggest feature suggests similar looking terms based on a provided text by using a suggester. At the moment the only supported suggester is `fuzzy`. The suggest feature is available since version `0.21.0`.
# Fuzzy suggester
The `fuzzy` suggester suggests terms based on edit distance. The provided suggest text is analyzed before terms are suggested. The suggested terms are provided per analyzed suggest text token. The `fuzzy` suggester doesn't take the query that is part of the request into account.
# Suggest API
The suggest request part is defined alongside the query part as a top-level field in the JSON request.
```
curl -s -XPOST 'localhost:9200/_search' -d '{
"query" : {
...
},
"suggest" : {
...
}
}'
```
Several suggestions can be specified per request. Each suggestion is identified with an arbitrary name. In the example below two suggestions are requested. The `my-suggest-1` suggestion uses the `body` field and `my-suggest-2` uses the `title` field. The `type` field is a required field and defines what suggester to use for a suggestion.
```
"suggest" : {
"suggestions" : {
"my-suggest-1" : {
"type" : "fuzzy",
"field" : "body",
"text" : "the amsterdma meetpu"
},
"my-suggest-2" : {
"type" : "fuzzy",
"field" : "title",
"text" : "the rottredam meetpu"
}
}
}
```
The below suggest response example includes the suggestions part for `my-suggest-1` and `my-suggest-2`. Each suggestion part contains a terms array that holds all terms produced by analyzing the suggest text. Each term object includes the term itself, the original start and end offset in the suggest text and, if found, an arbitrary number of suggestions.
```
{
...
"suggest": {
"my-suggest-1": {
"terms" : [
{
"term" : "amsterdma",
"start_offset": 5,
"end_offset": 14,
"suggestions": [
...
]
}
...
]
},
"my-suggest-2" : {
"terms" : [
...
]
}
}
...
}
```
Each suggestions array contains a suggestion object that includes the suggested term, its document frequency and score compared to the suggest text term. The meaning of the score depends on the used suggester. The fuzzy suggester's score is based on the edit distance.
```
"suggestions": [
{
"term": "amsterdam",
"frequency": 77,
"score": 0.8888889
},
...
]
```
# Global suggest text
To avoid repetition of the suggest text, it is possible to define a global text. In the example below the suggest text is a global option and applies to the `my-suggest-1` and `my-suggest-2` suggestions.
```
"suggest" : {
"suggestions" : {
"text" : "the amsterdma meetpu",
"my-suggest-1" : {
"type" : "fuzzy",
"field" : "title"
},
"my-suggest-2" : {
"type" : "fuzzy",
"field" : "body"
}
}
}
```
The suggest text can be specified as a global option or as a suggestion-specific option. A suggest text specified on the suggestion level overrides the suggest text on the global level.
# Other suggest example
In the below example we request suggestions for the suggest text `devloping distibutd saerch engies` on the `title` field, with a maximum of 3 suggestions per term inside the suggest text. Note that in this example we use the `count` search type. This isn't required, but a nice optimization. The suggestions are gathered in the `query` phase and in the case that we only care about suggestions (so no hits) we don't need to execute the `fetch` phase.
```
curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{
"suggest" : {
"suggestions" : {
"my-title-suggestions" : {
"suggester" : "fuzzy",
"field" : "title",
"text" : "devloping distibutd saerch engies",
"size" : 3
}
}
}
}'
```
The above request could yield the response as stated in the code example below. As you can see, if we take the first suggested term of each suggest text term we get `developing distributed search engines` as a result.
```
{
...
"suggest": {
"my-title-suggestions": {
"terms": [
{
"term": "devloping",
"start_offset": 0,
"end_offset": 9,
"suggestions": [
{
"term": "developing",
"frequency": 77,
"score": 0.8888889
},
{
"term": "deloping",
"frequency": 1,
"score": 0.875
},
{
"term": "deploying",
"frequency": 2,
"score": 0.7777778
}
]
},
{
"term": "distibutd",
"start_offset": 10,
"end_offset": 19,
"suggestions": [
{
"term": "distributed",
"frequency": 217,
"score": 0.7777778
},
{
"term": "disributed",
"frequency": 1,
"score": 0.7777778
},
{
"term": "distribute",
"frequency": 1,
"score": 0.7777778
}
]
},
{
"term": "saerch",
"start_offset": 20,
"end_offset": 26,
"suggestions": [
{
"term": "search",
"frequency": 1038,
"score": 0.8333333
},
{
"term": "smerch",
"frequency": 3,
"score": 0.8333333
},
{
"term": "serch",
"frequency": 2,
"score": 0.8
}
]
},
{
"term": "engies",
"start_offset": 27,
"end_offset": 33,
"suggestions": [
{
"term": "engines",
"frequency": 568,
"score": 0.8333333
},
{
"term": "engles",
"frequency": 3,
"score": 0.8333333
},
{
"term": "eggies",
"frequency": 1,
"score": 0.8333333
}
]
}
]
}
}
...
}
```
# Common suggest options
* `suggester` - The suggester implementation type. The only supported value is `fuzzy`. This is a required option.
* `text` - The suggest text. The suggest text is a required option that needs to be set globally or per suggestion.
# Common fuzzy suggest options
* `field` - The field to fetch the candidate suggestions from. This is a required option that either needs to be set globally or per suggestion.
* `analyzer` - The analyzer to analyse the suggest text with. Defaults to the search analyzer of the suggest field.
* `size` - The maximum number of corrections to be returned per suggest text token.
* `sort` - Defines how suggestions should be sorted per suggest text term. Two possible values:
** `score` - Sort by score first, then document frequency and then the term itself.
** `frequency` - Sort by document frequency first, then similarity score and then the term itself.
* `suggest_mode` - The suggest mode controls which suggestions are included, and for which suggest text terms suggestions should be suggested. Three possible values can be specified:
** `missing` - Only suggest terms in the suggest text that aren't in the index. This is the default.
** `popular` - Only suggest suggestions that occur in more docs than the original suggest text term.
** `always` - Suggest any matching suggestions based on terms in the suggest text.
# Other fuzzy suggest options
* `lowercase_terms` - Lower cases the suggest text terms after text analysis.
* `max_edits` - The maximum edit distance candidate suggestions can have in order to be considered as a suggestion. Can only be a value between 1 and 2. Any other value results in a bad request error being thrown. Defaults to 2.
* `min_prefix` - The minimal number of prefix characters that must match in order to be a candidate suggestion. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur in the beginning of terms.
* `min_query_length` - The minimum length a suggest text term must have in order to be included. Defaults to 4.
* `shard_size` - Sets the maximum number of suggestions to be retrieved from each individual shard. During the reduce phase only the top N suggestions are returned based on the `size` option. Defaults to the `size` option. Setting this to a value higher than the `size` can be useful in order to get a more accurate document frequency for spelling corrections at the cost of performance. Due to the fact that terms are partitioned amongst shards, the shard level document frequencies of spelling corrections may not be precise. Increasing this will make these document frequencies more precise.
* `max_inspections` - A factor that is multiplied with the `shard_size` in order to inspect more candidate spelling corrections on the shard level. Can improve accuracy at the cost of performance. Defaults to 5.
* `threshold_frequency` - The minimal threshold in number of documents a suggestion should appear in. This can be specified as an absolute number or as a relative percentage of the number of documents. This can improve quality by only suggesting high frequency terms. Defaults to 0f, meaning it is not enabled. If a value higher than 1 is specified then the number cannot be fractional. The shard level document frequencies are used for this option.
* `max_query_frequency` - The maximum threshold in number of documents in which a suggest text token can exist in order to be included. Can be a relative percentage number (e.g. 0.4) or an absolute number to represent document frequencies. If a value higher than 1 is specified then it cannot be fractional. Defaults to 0.01f. This can be used to exclude high frequency terms from being spellchecked. High frequency terms are usually spelled correctly; on top of this, excluding them also improves the spellcheck performance. The shard level document frequencies are used for this option.
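For completeness, a request sketch combining several of these options in the wrapped `suggestions` request format shown earlier, using the `suggester` option as documented above; the field name and option values are illustrative:
```
curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{
    "suggest" : {
        "suggestions" : {
            "my-suggest" : {
                "suggester" : "fuzzy",
                "field" : "body",
                "text" : "the amsterdma meetpu",
                "size" : 5,
                "sort" : "frequency",
                "suggest_mode" : "popular"
            }
        }
    }
}'
```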
Closes #2585
* added a configurable MemoryIndexPool that pools MemoryIndex instances across threads
* the pool can be configured based on the number of pooled instances as well as the maximum number of bytes that are reused across the pooled instances
Closes #2581
* Removed CustomMemoryIndex in favor of MemoryIndex, which as of Lucene 4.1 supports adding the same field twice
* Replaced duplicated logic in X[*]FSDirectory for rate limiting with a RateLimitedFSDirectory wrapper
* Remove hacks to find out merge context in rate limiting in favor of IOContext
* replaced Scorer#freq() return type (from float to int)
* Upgraded FVHighlighter to new 'centered' highlighting
* Fixed RobinEngine to use separate setCommitData
* Default ShardsAllocator is set to BalancedShardsAllocator
* Core ShardsAllocator implementations can be defined via 'cluster.routing.allocation.type'
* Core ShardsAllocator implementations are exposed via short keys 'balanced' (BalancedShardsAllocator) and 'even_shards' (EvenShardsCountAllocator)
* Third party allocators can be loaded via fully-qualified class names.
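As a configuration sketch, selecting the allocator in `elasticsearch.yml` using the short keys listed above; the third party class name below is hypothetical:
```
# elasticsearch.yml - select a core ShardsAllocator by short key
cluster.routing.allocation.type: balanced
# or load a third party allocator via its fully-qualified class name (hypothetical):
# cluster.routing.allocation.type: org.example.allocation.MyShardsAllocator
```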
Closes #2557
* Weights are calculated per index and incorporate index level, global and primary related parameters
* Balance operations are executed based on a win maximization strategy that relocates first the shards that offer the biggest gain towards the weight function's optimum
* The WeightFunction allows settings to prefer index based balance over global balance and vice versa
* Balance operations can be throttled by raising a threshold, resulting in less aggressive balance operations
* WeightFunction ships with defaults to achieve evenly distributed indexes while maintaining a global balance
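A sketch of what tuning this might look like; the `cluster.routing.allocation.balance.*` setting keys and values below are assumptions for illustration based on the description above, not documented defaults:
```
# elasticsearch.yml - assumed keys for the WeightFunction parameters (illustrative)
cluster.routing.allocation.balance.index: 0.5      # weight of index level balance
cluster.routing.allocation.balance.shard: 0.45     # weight of global (cluster wide) balance
cluster.routing.allocation.balance.primary: 0.05   # weight of the primary related parameter
cluster.routing.allocation.balance.threshold: 1.0  # raise to throttle balance operations
```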
Closes #2555
we need to in order to properly handle bytes, and to normalize Integer to Long, for example, for consistency; the fact that mappers now handle different Objects helps here
Closes: #646
- Introduced HunspellService which holds a repository of hunspell dictionaries
- It is possible to register a dictionary via a plugin or by placing the dictionary files on the file system
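A sketch of the file system option: dictionary files are placed under the node's config directory and referenced from an analysis token filter via a locale; the index and filter names are illustrative:
```
# place the dictionary files under the config directory, e.g.:
#   config/hunspell/en_US/en_US.aff
#   config/hunspell/en_US/en_US.dic
curl -XPUT 'localhost:9200/my_index' -d '{
    "settings" : {
        "analysis" : {
            "filter" : {
                "my_hunspell" : {
                    "type" : "hunspell",
                    "locale" : "en_US"
                }
            }
        }
    }
}'
```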
Added score support to `has_child` and `has_parent` queries. Both queries support a `score_type` option. The `has_child` query supports the same options as the `top_children` query, plus the `none` option, which is the default and yields the current behaviour. The `has_parent` query supports the score type options `score` and `none`. The latter is the default and yields the current behaviour.
If the `score_type` is set to a value other than `none`, the `has_parent` query maps the matched parent's score into the related children documents, and the `has_child` query maps the matched children's scores into the related parent document. The `score_type` on both queries defines how the children documents' scores are mapped into the parent documents. Both queries are executed in two phases. The first phase collects the parent uid values of matching documents, with an aggregated score per parent uid value. In the second phase either child or parent typed documents that have a parent uid value found during the first phase are emitted as hits. The score computed in the first phase is used as the score.
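A sketch of a `has_child` query with scoring enabled; the `comment` type and the `max` score type (one of the `top_children` style options) are illustrative:
```
curl -XPOST 'localhost:9200/_search' -d '{
    "query" : {
        "has_child" : {
            "type" : "comment",
            "score_type" : "max",
            "query" : {
                "term" : {
                    "tag" : "something"
                }
            }
        }
    }
}'
```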
Closes #2502
Fixed an error with the top_children query when `DFS_QUERY_*` is used as search_type and it wraps a query that gets rewritten (e.g. a wildcard query).
Closes #2501
If a routing value isn't id based, the get part of the mlt request couldn't retrieve the document for the second part of the mlt request, and a 500 code was returned instead. This fix addresses this issue.
Closes #2489
- Added "regexp" query type (based on Lucene 4 RegexpQuery)
- Added "regexp" filter type
- Fixed a bug in IdFieldMapper where prefixQuery on a single type would be redundantly wrapped in a boolean query
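A minimal sketch of the new query type; the field name and pattern are illustrative:
```
curl -XPOST 'localhost:9200/_search' -d '{
    "query" : {
        "regexp" : {
            "name.first" : "s.*y"
        }
    }
}'
```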
Our idea is to apply it on the "filtered/constant" level, and not on compound filters, so we won't apply it multiple times. The solution is a bit conservative for now; we can further optimize it in the future, for example by not wrapping it when no caching is done within the filter chain.
- remove the DocSet abstraction, and use Bits where we can by getting it from DocIdSet
- better handling of acceptDocs, though we still need to properly apply them when caching is involved
This feature adds the option to configure a `PostingsFormat` and assign it to a field in the mapping. This is a very expert feature and in almost all cases Elasticsearch's defaults will suit your needs.
## Configuring a postings format per field
Several postings formats are configured by default and can be used in your mapping:
* `direct` - A codec that wraps the default postings format at write time, but at read time loads the terms and postings lists directly into memory as raw arrays. This postings format is exceptionally memory intensive, but can give a substantial increase in search performance.
* `memory` - A codec that loads and stores terms and postings lists in memory using an FST. Acts like a cached postings list.
* `bloom_default` - Maintains a bloom filter for the indexed terms, which is stored to disk and builds on top of the `default` postings format. This postings format is useful for low document frequency terms and offers fail-fast seeks for terms that don't exist.
* `bloom_pulsing` - Similar to the `bloom_default` postings format, but builds on top of the `pulsing` postings format.
* `default` - The default postings format. The default if none is specified.
On all fields it is possible to configure a `postings_format` attribute. Example mapping:
```
{
"person" : {
"properties" : {
"second_person_id" : {"type" : "string", "postings_format" : "pulsing"}
}
}
}
```
## Configuring a custom postings format
It is possible to instantiate custom postings formats. These can be specified via the index settings.
```
{
"codec" : {
"postings_format" : {
"my_format" : {
"type" : "pulsing40"
"freq_cut_off" : "5"
}
}
}
}
```
In the above example `freq_cut_off` is set to 5 (it defaults to 1). This tells the pulsing postings format to inline the postings list of terms with a document frequency lower than or equal to 5 into the term dictionary.
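Once defined in the index settings, the custom format can presumably be assigned to a field in the mapping just like the built-in formats; a sketch with an illustrative field name:
```
{
    "person" : {
        "properties" : {
            "second_person_id" : {"type" : "string", "postings_format" : "my_format"}
        }
    }
}
```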
Closes #2411
No need to test for boost, we already have specific boost tests. In general, we should get rid of this test and use more specialized tests if we are missing some.
This commit enables setting boost for numeric fields. However, there is still no way to take advantage of boosted numeric fields during searching because all queries against numeric fields are translated into range queries wrapped in ConstantScore. Boost for numeric fields is broken on master as well https://gist.github.com/7ecedea4f6a5219efb89
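For reference, a mapping sketch that sets a boost on a numeric field; the type, field names and value are illustrative:
```
{
    "person" : {
        "properties" : {
            "age" : {"type" : "integer", "boost" : 2.0}
        }
    }
}
```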
The issue was that under these circumstances the delete by query operation would run forever.
What is also fixed is that nested docs are now also deleted when delete by query is replayed during shard recovery.
Closes #2302
The NPE occurred when, for an arbitrary segment, no parent documents exist for a has_parent filter/query, or no child documents exist for a has_child filter/query.
Closes #2297
introduce a new class, TransportRequest, which includes headers. This class can be used when sending requests over the transport layer, and ActionRequest also extends it now.
This is the first phase of the refactoring part in the transport layer and action layer to allow for simpler implementations of those as well as simpler "filtering" capabilities in the future
The types exists api checks whether one or more types exist in one or more indices.
## Example usage
```
curl -XHEAD 'localhost:9200/twitter/tweet'
```
## Options
* `index` - One or more indices. Either specified as query string parameter or in the uri path.
* `type` - One or more types. Either specified as query string parameter or in the uri path.
* `ignore_missing` - Determines what type of indices to exclude from a request. The option can have the following values: `none` or `missing`.
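For example, checking multiple indices and types with `ignore_missing` passed as a query string parameter; the index and type names are illustrative:
```
curl -XHEAD 'localhost:9200/twitter,blog/tweet,post?ignore_missing=missing'
```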
Closes #2273
If has_parent, has_child or top_children are executed incorrectly then a better exception is thrown. This gives a better error description when one of these queries or filters is being used in the count api.
Closes #2261
Setting cluster.routing.allocation.disable_allocation caused primary shards of new indices not to be allocated. By default, newly created indices should be allowed to, at the very least, allocate primary shards so they become operational. A new setting, cluster.routing.allocation.disable_new_allocation, allows to also disable "new" allocations.
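A sketch of toggling both settings via the cluster update settings API, assuming they are dynamically updatable; the values are illustrative:
```
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient" : {
        "cluster.routing.allocation.disable_allocation" : true,
        "cluster.routing.allocation.disable_new_allocation" : false
    }
}'
```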
closes #2258.
Here is a short example of a simple reroute API call:
```
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
"commands" : [
{"move" : {"index" : "test", "shard" : 0, "from_node" : "node1", "to_node" : "node2"}},
{"allocate" : {"index" : "test", "shard" : 1, "node" : "node3"}}
]
}'
```
An important aspect to remember is that once an allocation occurs, the cluster will aim at rebalancing its state back to an even state. For example, if the allocation includes moving a shard from `node1` to `node2`, in an "even" state, then another shard will be moved from `node2` to `node1` to even things out.
The cluster can be set to disable allocations, which means that only the explicit allocations will be performed. Obviously, once all commands have been applied, the cluster will aim to rebalance its state.
Another option is to run the commands in "dry_run" mode (as a URI flag, or in the request body). This will cause the commands to be applied to the current cluster state, and return the resulting cluster state after the commands (and rebalancing) have been applied.
The commands supported are:
* `move`: Move a started shard from one node to another node. Accepts `index` and `shard` for the index name and shard number, `from_node` for the node to move the shard "from", and `to_node` for the node to move the shard to.
* `cancel`: Cancel allocation of a shard (or recovery). Accepts `index` and `shard` for the index name and shard number, and `node` for the node to cancel the shard allocation on.
* `allocate`: Allocate an unassigned shard to a node. Accepts `index` and `shard` for the index name and shard number, and `node` to allocate the shard to. It also accepts an `allow_primary` flag to explicitly specify that it is allowed to allocate a primary shard (might result in data loss).
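A dry run sketch of the move command above, with the flag passed as a URI parameter as described earlier:
```
curl -XPOST 'localhost:9200/_cluster/reroute?dry_run=true' -d '{
"commands" : [
{"move" : {"index" : "test", "shard" : 0, "from_node" : "node1", "to_node" : "node2"}}
]
}'
```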
closes #2256
The `has_parent` filter accepts a query and a parent type. The query is executed in the parent document space, which is specified by the parent type. This filter returns child documents whose associated parents have matched. For the rest, the `has_parent` filter has the same options and works in the same manner as the `has_child` filter.
This is an experimental filter.
## Filter example
```
{
"has_parent" : {
"parent_type" : "blog"
"query" : {
"term" : {
"tag" : "something"
}
}
}
}
```
The `parent_type` field name can also be abbreviated to `type`.
## Memory considerations
With the current implementation, all _id values are loaded to memory (heap) in order to support fast lookups, so make sure there is enough memory for it.
This issue originates from issue #792
add support for internal custom allocation commands, including allocation, move, and cancel (shard).
also, fix #2242, which caused the cluster state to be in an inconsistent state when a shard that is the source of a relocation failed
* Fixed an issue where dynamic update to minimum_master_nodes settings would not take immediate effect
* Added LocalNodeMasterListener support to the ClusterService. Enables listening to when the local node becomes/stops being the master
By introducing the Text abstraction, we can keep (long) text fields in their UTF8 bytes format, with no need to convert them to a string when serializing back to JSON, for example.
The first place we can apply this is to highlighted text, which can be long. This does break backward compatibility for people using the Java API, where HighlightField now has a Text as its content, and not a String.
First phase at improving buffer management and reducing even further buffer copies. Introduce a BytesReference abstraction, allowing to more easily slice and "read/write references" from streams. This is the foundation for later using it to create smarter buffers on top of composite netty channels for example (which http now produces) as well as reducing buffer copies when sending transport/rest responses.
Allow the use of "doc" as the update source when a script is not
specified. New fields are added, existing fields are overwritten, and
maps are merged recursively.
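A sketch of such a partial update; the index, type and field names are illustrative:
```
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
    "doc" : {
        "name" : "new_name"
    }
}'
```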
When a node restarts, it might be canceling one recovery of a shard id only to get another one in the next cycle. We should detect this case and handle it properly.
This is a fix to the annoying message seen by users: suspect illegal state: trying to move shard from primary mode to replica mode.
Two main changes:
Improve cluster resiliency to disconnected sub clusters. If a node pings a master and that node is no longer registered with the master, improve the rejoin process of that node to the cluster. Also, if a master receives a message from another master, pick one to force to rejoin the cluster (based on cluster state versioning).
On quick rolling restart, without waiting for shard allocation, the shard allocation logic can mess up its counts, causing strange logic in allocating shards, or validation failures on routing table allocation.
Compressing the stored fields file (the .fdt file) directly allows for better compression of the index size, specifically when indexing (and storing) small documents. The compression will be considerably more effective compared to compressing each doc on its own (as when setting compress on the _source mapper). The downside is that more data needs to be uncompressed when loading documents.
The setting to control it is `index.store.compress.stored_fields`, set to `true` (it defaults to `false`), and it can be changed dynamically using the update settings API. This allows enabling compression at a later stage (i.e. on old time based indices), and then optimizing the index to make sure it gets compressed.
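A sketch of enabling it dynamically on an existing (e.g. old time based) index via the update settings API; the index name is illustrative:
```
curl -XPUT 'localhost:9200/logs-2012-01/_settings' -d '{
    "index.store.compress.stored_fields" : true
}'
```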