When we have many new fields keep being introduced, the immutable open map we used becomes more and more expensive because of its clone characteristics, and we use it in several places.
The usage semantics of it allows us to also use a CHM if we want to, but it would be nice to still maintain the concurrency aspects of volatile immutable map when the number of fields is sane.
Introduce a new map like data structure, that can switch internally to CHM when a certain threshold is met.
Also add a benchmark class to exploit the many new field mappings use case, which shows significant gains by using this change, to a level where mapping introduction is no longer a significant bottleneck.
closes#6707
This method tires to optimize boolean queries if there is only
one clause. Yet BooleanQuery already does that internally This
optimization is unneeded.
Closes#6743
If you try to remove a plugin in read only dir, you get a successful result:
```
$ bin/plugin --remove marvel
-> Removing marvel
Removed marvel
```
But actually the plugin has not been removed.
When installing, if fails properly:
```
$ bin/plugin -i elasticsearch/marvel/latest
-> Installing elasticsearch/marvel/latest...
Failed to install elasticsearch/marvel/latest, reason: plugin directory /usr/local/elasticsearch/plugins is read only
```
This change throw an exception when we don't succeed removing the plugin.
Closes#6546.
Adding code to test for unset plugin names to fail fast with descriptive error messages. Also simplified the series of `if` statements checking for the commands by using a `switch` (now that it's using Java 7), added tests, and updated random exceptions with the up-to-date flag names (e.g., "--verbose" instead of "-verbose").
Closes#5976.
Closes#6013.
This causes a NPE since XContentStructure checks if the query is null
and takes this as the condition to parse from the byte source which is
actually null in that case.
Closes#6722
In this test we only index a handful of docs so if we have more shards
than docs we might fail on the `assertSearchResult` since not all shards
are started but results are just fine.
Today, we take great care to try and share the same analyzer instances across shards and indices (global analyzer). The idea is to share the same analyzer so the thread local resource it has will not be allocated per analyzer instance per thread.
The problem is that AnalyzerWrapper keeps its resources on its own per thread storage, and with per field reuse strategy, it causes for per field per thread token stream components to be used. This is very evident with the StandardTokenizer that uses a buffer...
This came out of test with "many fields", where the majority of 1GB heap was consumed by StandardTokenizer instances...
closes#6714
After a node joins the clusters, it starts pinging the master to verify it's health. Before, the cluster join request was processed async and we had to give some time to complete. With #6480 we changed this to wait for the join process to complete on the master. We can therefore start pinging immediately for fast detection of failures. Similar change can be made to the Node fault detection from the master side.
Closes#6706
The change introduced in #6686 checks for ConnectionTransportException during pinging. However, transport layer wraps it in SendRequestTransportException
Currently regexes in Pattern Tokenizer docs are escaped (it seems according to Java rules). I think it is better not to escape them because JSON escaping should be automatic in client libraries, and string escaping depends on a client language used. The default pattern is `\W+`, not `\\W+`.
Closes#6615
Add `irish` analyzer
Add `sorani` analyzer (Kurdish)
Add `classic` tokenizer: specific to english text and tries to recognize hostnames, companies, acronyms, etc.
Add `thai` tokenizer: segments thai text into words.
Add `classic` tokenfilter: cleans up acronyms and possessives from classic tokenizer
Add `apostrophe` tokenfilter: removes text after apostrophe and the apostrophe itself
Add `german_normalization` tokenfilter: umlaut/sharp S normalization
Add `hindi_normalization` tokenfilter: accounts for hindi spelling differences
Add `indic_normalization` tokenfilter: accounts for different unicode representations in Indian languages
Add `sorani_normalization` tokenfilter: normalizes kurdish text
Add `scandinavian_normalization` tokenfilter: normalizes Norwegian, Danish, Swedish text
Add `scandinavian_folding` tokenfilter: much more aggressive form of `scandinavian_normalization`
Add additional languages to stemmer tokenfilter: `galician`, `minimal_galician`, `irish`, `sorani`, `light_nynorsk`, `minimal_nynorsk`
Add support access to default Thai stopword set "_thai_"
Fix some bugs and broken links in documentation.
Closes#5935
`-Dtests.filter` allows to pass filter expressions to the elasticsearch
tests. This allows to filter test annotaged with TestGroup annotations
like @Slow, @Nightly, @Backwards, @Integration with a boolean expresssion like:
* to run only backwards tests run:
`mvn -Dtests.bwc.version=X.Y.Z -Dtests.filter="@backwards"`
* to run all integration tests but skip slow tests run:
`mvn -Dtests.filter="@integration and not @slow"
* to take defaults into account ie run all test as well as backwards:
`mvn -Dtests.filter="default and @backwards"
This feature is a more powerful alternative to flags like
`-Dtests.nighly=true|false` etc.
Closes#6703
For the casual reader, the reference to "term queries" may be glossed over, yielding an unexpected result when using `regexp` queries.
This attempts to make that distinction more prominent.
Closes#6698
today we track both the index name and type for mapping updates in the shard bulk action, but we only work against on index in this level, so no need to track the index name itself
closes#6695