When an analysis plugin provides default index settings using `PreBuiltAnalyzerProviderFactory`, `PreBuiltTokenFilterFactoryFactory` or `PreBuiltTokenizerFactoryFactory`, it fails when upgrading to elasticsearch 0.90.5 or later.
Related issue: #4936
A fix is needed in core, but in the meantime analysis plugin developers can work around the issue by overriding the default prebuilt factories.
For example:
```java
public class StempelAnalyzerProviderFactory extends PreBuiltAnalyzerProviderFactory {

    private final PreBuiltAnalyzerProvider analyzerProvider;

    public StempelAnalyzerProviderFactory(String name, AnalyzerScope scope, Analyzer analyzer) {
        super(name, scope, analyzer);
        analyzerProvider = new PreBuiltAnalyzerProvider(name, scope, analyzer);
    }

    @Override
    public AnalyzerProvider create(String name, Settings settings) {
        return analyzerProvider;
    }

    public Analyzer analyzer() {
        return analyzerProvider.get();
    }
}
```
And instead of:
```java
@Inject
public PolishIndicesAnalysis(Settings settings, IndicesAnalysisService indicesAnalysisService) {
    super(settings);
    indicesAnalysisService.analyzerProviderFactories().put("polish",
            new PreBuiltAnalyzerProviderFactory("polish", AnalyzerScope.INDICES,
                    new PolishAnalyzer(Lucene.ANALYZER_VERSION)));
}
```
do:
```java
@Inject
public PolishIndicesAnalysis(Settings settings, IndicesAnalysisService indicesAnalysisService) {
    super(settings);
    indicesAnalysisService.analyzerProviderFactories().put("polish",
            new StempelAnalyzerProviderFactory("polish", AnalyzerScope.INDICES,
                    new PolishAnalyzer(Lucene.ANALYZER_VERSION)));
}
```
Closes #5030
The byte[] array that was used to store the term was owned by the BytesRefHash
which is used to compute counts. However, the BytesRefHash is released at some
point and its content may be recycled.
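As a rough, standalone illustration of the ownership problem (using Lucene's `BytesRefHash` and made-up names, not the actual aggregation code): a `BytesRef` read from the hash points into the hash's internal pool, so an owned copy has to be taken while the hash is still alive if the term needs to outlive it.
```java
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.BytesRefHash;

// Illustrative sketch only: the BytesRef returned by get() references the
// hash's internal byte pool, so it must be deep-copied before the hash is
// released if the term is kept around.
public class OwnedTermCopy {
    public static void main(String[] args) {
        BytesRefHash termsHash = new BytesRefHash();
        int id = termsHash.add(new BytesRef("elasticsearch"));

        BytesRef spare = new BytesRef();
        termsHash.get(id, spare);                    // points into the pool
        BytesRef owned = BytesRef.deepCopyOf(spare); // owned copy, safe to keep

        termsHash.close();                           // the pool may be recycled from here on
        System.out.println(owned.utf8ToString());    // still prints "elasticsearch"
    }
}
```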
MockPageCacheRecycler has been improved to expose this issue (putting random
content into the arrays upon release).
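A minimal sketch of that poisoning idea, with a hypothetical class rather than the real MockPageCacheRecycler: overwriting pages with random bytes on release makes any use-after-release show up as corrupted data in tests.
```java
import java.util.Random;

// Hypothetical sketch: code that keeps reading a page after releasing it will
// now observe garbage instead of silently working by accident.
public class PoisoningRecycler {
    private final Random random = new Random();

    public void release(byte[] page) {
        random.nextBytes(page); // fill the released page with random content
        // ... then hand the page back to the pool for reuse
    }
}
```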
The number of documents/terms has been increased in RandomTests to make sure
page recycling occurs.
Close #5021
`cross_fields` attempts to treat fields with the same analysis configuration
as a single field and either promotes the maximum score or combines the
scores, depending on the `use_dis_max` setting. By default scores are
combined. `cross_fields` can also search across fields of heterogeneous
types: for instance, if numbers can be part of the query, it makes sense to
also search numeric fields, provided an analyzer is given in the request.
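For illustration, such a query could be built roughly like this with the Java API (the field names are made up, and the `type(...)`/`useDisMax(...)` builder calls assume the 1.x-era `MultiMatchQueryBuilder`):
```java
import org.elasticsearch.index.query.MultiMatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

public class CrossFieldsExample {
    // Sketch only: "first_name" and "last_name" are hypothetical fields that
    // share the same analysis configuration.
    public static MultiMatchQueryBuilder crossFieldsQuery(String text) {
        return QueryBuilders
                .multiMatchQuery(text, "first_name", "last_name")
                .type(MultiMatchQueryBuilder.Type.CROSS_FIELDS) // treat the fields as one
                .useDisMax(false);                              // combine scores instead of taking the maximum
    }
}
```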
Relates to #2959
This commit removes FilterBytesValues, which is very trappy as the default
implementation forwards all method calls to the delegate. So if you make any
non-trivial modification to the terms or to their order, you need to remember
to override currentValueHash and copyShared, which is very error-prone.
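To make the trap concrete with made-up names: a wrapper that modifies the produced values but leaves hash computation forwarded to the delegate reports hashes for the original values, not the modified ones.
```java
// Made-up sketch of the delegation trap, not the actual FilterBytesValues code.
interface Values {
    String nextValue();
    int hashOfCurrentValue();
}

class UpperCaseValues implements Values {
    private final Values delegate;
    private String current;

    UpperCaseValues(Values delegate) {
        this.delegate = delegate;
    }

    @Override
    public String nextValue() {
        current = delegate.nextValue().toUpperCase();
        return current;
    }

    // The trap: forwarding to the delegate returns the hash of the original,
    // un-modified value instead of current.hashCode().
    @Override
    public int hashOfCurrentValue() {
        return delegate.hashOfCurrentValue();
    }
}
```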
FieldDataSource.WithScript.BytesValues and ScriptBytesValues now return correct
hash codes; future bugs here would be caught by the new assertion in
SortedUniqueBytesValues.
This bug was causing performance issues with scripts as all terms were assumed
to have the same hash code.
Close #5004
* Mostly minor things like typos and grammar fixes
* Some clarifications
* The note on the deprecation was ambiguous. I've removed the problematic part so that it now definitely says it's deprecated.
The last response body is now always stashed in the REST tests and can be retrieved via `$body`. This means that not only expected values but also actual values can be retrieved from the stash.
Added support for regular expressions in the `match` assertion through a new custom Hamcrest matcher (adopted in the do section as well), using the `Pattern.COMMENTS` flag for better readability. The functionality is exposed through a new feature called `regex` that needs to be mentioned in the skip sections whenever needed until all the runners support it.
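For reference, `Pattern.COMMENTS` is what allows the expected expression to be split across lines and annotated; a plain Java illustration with a made-up pattern:
```java
import java.util.regex.Pattern;

public class CommentedRegexExample {
    public static void main(String[] args) {
        // With Pattern.COMMENTS, whitespace in the pattern is ignored and '#'
        // starts a comment, so the expression can be laid out for readability.
        Pattern pattern = Pattern.compile(
                "^ \\d+ \\s+             # a count \n" +
                "  [a-zA-Z0-9_\\-\\.]+ $ # an index name \n",
                Pattern.COMMENTS);
        System.out.println(pattern.matcher("42 twitter").matches()); // true
    }
}
```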
Also added example tests for the cat count API.
Our yaml parser seems to be too tolerant in some cases: for example, it happily parses yaml suites that don't end with a line feed, whereas other clients' tests break because of that. We should try to behave the same way to keep consistency across runners.
Upgrading a 0.90.x `multi_field` type that has a `geo_point` or `completion` field type as its default field would otherwise fail.
It also makes sense to support these field types, because both support specifying the actual values as strings.