Each plugin is now loaded in its own classloader to prevent class
conflicts when loading different versions of the same library.
Isolation is enabled by default and is configurable through the
`plugins.isolation` settings. Additionally, each plugin can
change its own isolation through the `isolation` property in
`es-plugin.properties`; if not specified, the global setting in
ES applies.
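
The fallback described above can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and it assumes the `isolation` property holds a boolean value.

```java
import java.util.Properties;

// Hypothetical helper, not taken from the Elasticsearch code base.
final class IsolationSettings {
    private IsolationSettings() {}

    /**
     * @param pluginProps     contents of the plugin's es-plugin.properties
     * @param globalIsolation value of the global isolation setting (assumed boolean)
     * @return whether the plugin should be loaded in an isolated classloader
     */
    static boolean resolve(Properties pluginProps, boolean globalIsolation) {
        String value = pluginProps.getProperty("isolation");
        if (value == null || value.trim().isEmpty()) {
            return globalIsolation; // not specified: the global setting applies
        }
        return Boolean.parseBoolean(value.trim());
    }
}
```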
Closes #5261
The BigArrays utility class is useful to generate arrays of various sizes: small
arrays are allocated directly on the heap, while larger arrays are paged and
their pages are recycled through PageCacheRecycler. We already have tracking for
pages, but it is not triggered very often since paging only happens on large
amounts of data, while our tests work on small amounts of data in order to be
fast.
Tracking arrays directly helps make sure that we never forget to release them.
This pull request also improves testing by:
- putting random content in the arrays upon release: this makes sure that
consumers don't use these arrays anymore when they are released as their
content may be subject to use for another purpose since pages are recycled
- putting random content in the arrays upon creation and resize when
`clearOnResize` is `false`.
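
A minimal sketch of the tracking idea, assuming plain `byte[]` arrays. The class below is hypothetical and much simpler than the real `BigArrays` and `MockPageCacheRecycler` code. It shows tracking of acquired arrays and scrambling their content on release.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Random;
import java.util.Set;

// Hypothetical test helper: tracks arrays by identity and scrambles them on release.
final class TrackingByteArrays {
    private final Set<byte[]> acquired =
            Collections.newSetFromMap(new IdentityHashMap<byte[], Boolean>());
    private final Random random;

    TrackingByteArrays(long seed) {
        this.random = new Random(seed);
    }

    /** Allocates an array; fills it with random bytes unless a cleared array was requested. */
    synchronized byte[] newArray(int size, boolean clear) {
        byte[] array = new byte[size];
        if (!clear) {
            random.nextBytes(array); // callers must not rely on zeroed content
        }
        acquired.add(array);
        return array;
    }

    /** Releases an array and overwrites it so that use-after-release shows up in tests. */
    synchronized void release(byte[] array) {
        if (!acquired.remove(array)) {
            throw new IllegalStateException("array was not acquired here or was released twice");
        }
        random.nextBytes(array);
    }

    /** Meant to be called at the end of a test: fails if any array was never released. */
    synchronized void ensureAllReleased() {
        if (!acquired.isEmpty()) {
            throw new AssertionError(acquired.size() + " array(s) were not released");
        }
    }
}
```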
The major difference with `master` is that the `BigArrays` class is now
instantiable, injected via Guice and usually available through the
`SearchContext`. This way, it can be mocked for tests.
It was being invoked once per reader and parent type combination resulting in more memory being reported to the circuit breaker than actually being used in field data.
Lucene 4.7 supports a setter for the `filler_token` that is
inserted if there are gaps in the token stream. This change exposes
this setting.
Closes #4307
The `ShardOperationFailedException` is now created within `TransportIndexReplicationAction` passing in the current shard id as a constructor argument.
Also replaced `AtomicReferenceArray<Object>` with `AtomicReferenceArray<ShardActionResult>`, where `ShardActionResult` wraps the `ShardResponse` or the failure, containing all the needed info.
seed from the main master seed. Removed shared cluster's seed entirely.
The problem here is that if you don't give the cluster's seed then test times
fluctuate oddly, even for a fixed -Dtests.seed=... This shouldn't be the
case -- ideally, a test run with the same master seed should reproduce
with pretty much the same execution time (and internal logic, obviously).
From the code point of view "global" variables are indeed a problem
because JUnit has no notion of before-suite hooks. And RandomizedRunner
doesn't support context accesses in static class initializers (this is
intentional because there is no way to determine when such initializers
will be executed). A workaround is to move such static global variables to
lazily-initialized methods and invoke them (once) in @BeforeClass hooks.
The thread-local recycler requires obtain and recycle to be called on the same thread, while other recyclers do not. It can also create heavy recycle usage, since that depends on the threads it is being used on. The concurrent / pinned-thread-based one is by far better than the pure thread-local one (and is the default), since it more easily bounds the elements recycled while still allowing obtain and recycle to be mixed across threads.
We will end up using the paged recyclers more and more, for example in our networking output buffer, where obtaining will happen on one thread while recycling can potentially occur on another thread (the callback thread). Since binding the two calls to a single thread is not really needed, and our best implementation supports crossing threads, there is no real need to impose this restriction.
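
To illustrate what a recycler without the same-thread restriction looks like, here is a minimal sketch built on a bounded concurrent queue. The class name and sizes are hypothetical; it is not one of the actual recycler implementations.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical bounded recycler: obtain() and recycle() may be called from different threads.
final class ConcurrentPageRecycler {
    private final BlockingQueue<byte[]> pool;
    private final int pageSize;

    ConcurrentPageRecycler(int maxPooledPages, int pageSize) {
        this.pool = new ArrayBlockingQueue<byte[]>(maxPooledPages);
        this.pageSize = pageSize;
    }

    /** Returns a pooled page if one is available, otherwise allocates a new one. */
    byte[] obtain() {
        byte[] page = pool.poll(); // non-blocking; null when the pool is empty
        return page != null ? page : new byte[pageSize];
    }

    /** Returns true if the page was kept for reuse, false if the pool was full and it was dropped. */
    boolean recycle(byte[] page) {
        return pool.offer(page); // the queue capacity bounds the number of recycled pages
    }
}
```

The queue capacity is what bounds the number of retained pages, independently of how many threads call `recycle`.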
Some of the highlighters require term extraction to be implemented in
order to work. BlendedTermQuery doesn't implement the trivial extraction.
Closes #5246
- introduce additional destroy() callback that allows better control
over internals of recycled data
- introduced AbstractRecyclerC as superclass for Recycler factories
(Adrien) with empty destroy() by default
- added tests for destroy()
- cleaned up Recycler tests (reduce copy&paste)
A Field instance can map to multiple actual fields when using wildcard expressions. Each actual field should use the proper highlighter depending on the available data structure (e.g. term_vectors), while we currently select the highlighter for the first field and we keep using the same for all the fields that match the wildcard expression.
Also modified how the PercolateContext sets the forceSource option: it is now set globally rather than per field.
Closes #5175
When starting elasticsearch as the wrong Linux user, it could generate a `NullPointerException` when `PluginsService` tries to list available plugins in the `./plugins` dir.
To reproduce:
* create a plugins directory with `rwx` rights for root user only
* launch elasticsearch from another account (elasticsearch for example)
It was supposed to be fixed with #4186, but sadly it's not :-(
Closes #5195.
In #4052 we added support for highlighting multi term queries using the postings highlighter. That worked only for top-level queries though, and not for multi term queries that are nested for instance within a bool query, or filtered query, or a constant score query.
The way we make this work is by walking the query structure and temporarily overriding the query rewrite method with a method that allows for multi terms extraction.
Closes #5102
Fixes #5128
Remove java 7 specific Locale functions, add "coming[1.1.0]" to documentation
add LocaleUtils utility class for dealing with Locale functions
When running tests for site plugins, it could happen that the REST service is not fully started and not immediately ready to serve HTTP requests.
In that case it returns a `503 Service Unavailable` error.
This patch allows up to 5 seconds before failing the test.
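
The waiting logic can be sketched as a simple retry loop. This is a hedged illustration only: `HttpRetry`, `awaitNot503` and the pause between attempts are hypothetical, and the request is abstracted as a supplier of the HTTP status code.

```java
import java.util.function.IntSupplier;

// Hypothetical helper: retries a request while it answers 503, up to a deadline.
final class HttpRetry {
    private HttpRetry() {}

    /** Returns the last status code seen; 503 is returned only if the timeout elapsed. */
    static int awaitNot503(IntSupplier request, long timeoutMillis, long pauseMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        int status = request.getAsInt();
        while (status == 503 && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                return status;
            }
            status = request.getAsInt();
        }
        return status;
    }
}
```

A test would call something like `HttpRetry.awaitNot503(request, 5000L, 100L)` and fail only if the result is still `503`.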
Adds support for storing mustache based query templates that can later be filled
with query parameter values at execution time. Templates may be quoted,
non-quoted, or references to templates stored in config/scripts/*.mustache by
file name.
See docs/reference/query-dsl/queries/template-query.asciidoc for templating
examples.
Implementation detail: mustache itself is being shaded as it depends directly on
guava - so having it marked optional but included in the final distribution
raises chances of version conflicts downstream.
Fixes #4879
`_exists_` and `_missing_` miss field name expansion that `exists` and
`missing` have, which allows these filters to work on `object` fields.
Close #5142
If there are initialization errors and no tests to execute at the same time, it is better to return the initialization errors. Their check should therefore come first, as the "no tests to execute" condition might be caused by the initialization errors.
- Renamed IndexMetaData#removerAlias to removeAlias
- Removed IndexTemplateMetaData#fromXContentStandalone unused method (relates to #4511)
- MetaDataIndexAliasesService fix typo in comment
- Alias removed unused constructor that accepts both alias name and filter
In 0.90.x I was able to delete all my indices from the java api by calling

    client.admin().indices().prepareDelete(new String[] {}).execute().actionGet();

However this fails in 1.0.0 with

    org.elasticsearch.action.ActionRequestValidationException: Validation Failed: 1: index / indices is missing;
        at org.elasticsearch.action.ValidateActions.addValidationError(ValidateActions.java:29)
        at org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest.validate(DeleteIndexRequest.java:72)
    *snip long stacktrace*

which points me to

    public ActionRequestValidationException validate() {
        ActionRequestValidationException validationException = null;
        if (indices == null || indices.length == 0) {
            validationException = addValidationError("index / indices is missing", validationException);
        }
        return validationException;
    }

So that's what now throws the error, however the documentation still says:

    /**
     * Deletes an index based on the index name.
     *
     * @param indices The indices to delete. Empty array to delete all indices.
     */
    DeleteIndexRequestBuilder prepareDelete(String... indices);

Closes #5164.
Closes #5167.
Closes #5168.
The old post installation script on debian set all data to
644 inside of /etc/elasticsearch, which does not work when
there are subdirectories.
Closes #3820
It is now possible to specify aliases during index creation:

    curl -XPUT 'http://localhost:9200/test' -d '
    {
        "aliases" : {
            "alias1" : {},
            "alias2" : {
                "filter" : { "term" : {"field":"value"}}
            }
        }
    }'

Closes #4920
Nodes that receive the cluster state and have several of those pending can optimize by trying to process potentially only one of them.
closes #5139
more assertAcked, checked that the `discovery.zen.publish_timeout` has been changed in `DiscoverySettings`, removed restriction on number of nodes, doesn't seem needed.
`custom_boost_factor` and `custom_score` were deprecated in `0.90.5`
and their documentation was removed already in `1.0`. This commit
removes all support for those queries since they are superseded by
`function_score`.
`discovery.zen.publish_timeout` controls how long the master node is going to wait for all the nodes to respond to a cluster state publish before going ahead with the following updates in the queue (default 30s). Up until now, changing the setting required restarting each node. The setting is now dynamic and can be changed through the cluster update settings api.
Closes #5063
In order to be consistent (and because in 1.0 we switched from
parameter driven information to specifying the metrics as part of the URI)
this patch moves from 'plugin' to 'plugins' in the Nodes Info API.
AndDocIdSet#IteratorBasedIterator was potentially initialized with
NO_MORE_DOCS which violates the initial state of DocIdSetIterator and
could lead to undefined behavior when used in a search context.
Closes #5049
Multiple nodes are now started when running REST tests against the `TestCluster` (default randomized settings are now used instead of the hardcoded `1`)
Also added randomized round-robin based on all available nodes, and the ability to provide multiple addresses when running tests against an external cluster to have the same behaviour.
When an analysis plugin provides default index settings using `PreBuiltAnalyzerProviderFactory`, `PreBuiltTokenFilterFactoryFactory` or `PreBuiltTokenizerFactoryFactory`, it fails when upgrading it with elasticsearch 0.90.5 or later.
Related issue: #4936
A fix is needed in core. In the meantime, analysis plugin developers can work around the issue by extending the default prebuilt factories and overriding `create`.
For example:
```java
public class StempelAnalyzerProviderFactory extends PreBuiltAnalyzerProviderFactory {

    private final PreBuiltAnalyzerProvider analyzerProvider;

    public StempelAnalyzerProviderFactory(String name, AnalyzerScope scope, Analyzer analyzer) {
        super(name, scope, analyzer);
        analyzerProvider = new PreBuiltAnalyzerProvider(name, scope, analyzer);
    }

    @Override
    public AnalyzerProvider create(String name, Settings settings) {
        return analyzerProvider;
    }

    public Analyzer analyzer() {
        return analyzerProvider.get();
    }
}
```
And instead of:
```java
@Inject
public PolishIndicesAnalysis(Settings settings, IndicesAnalysisService indicesAnalysisService) {
    super(settings);
    indicesAnalysisService.analyzerProviderFactories().put("polish",
            new PreBuiltAnalyzerProviderFactory("polish", AnalyzerScope.INDICES, new PolishAnalyzer(Lucene.ANALYZER_VERSION)));
}
```
do
```java
@Inject
public PolishIndicesAnalysis(Settings settings, IndicesAnalysisService indicesAnalysisService) {
    super(settings);
    indicesAnalysisService.analyzerProviderFactories().put("polish",
            new StempelAnalyzerProviderFactory("polish", AnalyzerScope.INDICES, new PolishAnalyzer(Lucene.ANALYZER_VERSION)));
}
```
Closes #5030
The byte[] array that was used to store the term was owned by the BytesRefHash
which is used to compute counts. However, the BytesRefHash is released at some
point and its content may be recycled.
MockPageCacheRecycler has been improved to expose this issue (putting random
content into the arrays upon release).
The number of documents/terms has been increased in RandomTests to make sure
page recycling occurs.
Close #5021
`cross_fields` attempts to treat fields with the same analysis
configuration as a single field and uses either maximum score promotion
or a combination of the scores, depending on the `use_dis_max` setting.
By default scores are combined. `cross_fields` can also search across
fields of heterogeneous types; for instance, if numbers can be part of
the query, it makes sense to also search on numeric fields if an analyzer
is provided in the request.
Relates to #2959
This commit removes FilterBytesValues which is very trappy as the default
implementation forwards all method calls to the delegate. So if you do any
non-trivial modification to the terms or to the order of the terms, you need
to remember to override currentValueHash, copyShared, and this is very
error-prone.
FieldDataSource.WithScript.BytesValues and ScriptBytesValues now return correct
hash codes; future bugs here would be caught by the new assertion in
SortedUniqueBytesValues.
This bug was causing performance issues with scripts as all terms were assumed
to have the same hash code.
Close #5004
The last response body is now always stashed in the REST tests and can be retrieved via `$body`. This implies that not only expected values can be retrieved from the stashed values, but actual values as well.
Added support for regular expressions to `match` assertion, using `Pattern.COMMENTS` flag for better readability through new custom hamcrest matcher (adopted in do section as well). Functionality added through new feature called `regex` that needs to be mentioned in the skip sections whenever needed till all the runners support it.
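
For reference, this is how `Pattern.COMMENTS` affects matching in plain Java: whitespace in the pattern is ignored and `#` starts a comment that runs to the end of the line. The helper class below is hypothetical and is not the custom hamcrest matcher mentioned above.

```java
import java.util.regex.Pattern;

// Hypothetical helper showing a full-string match with the COMMENTS flag enabled.
final class RegexMatch {
    private RegexMatch() {}

    static boolean matches(String regex, String actual) {
        return Pattern.compile(regex, Pattern.COMMENTS).matcher(actual).matches();
    }
}
```

With this flag a pattern can be spread over several lines and annotated, for example `"\\d+   # count\n \\s+ \n [a-z]+  # name\n"`, which behaves like `\d+\s+[a-z]+`.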
Added also example tests for cat count api
Our yaml parser seems to be too tolerant in some cases, e.g. it properly parses yaml suites when they don't end with a line feed, whereas other clients' tests break due to that. We should try to do the same to keep consistency across runners.
Upgrading 0.90.x `multi_field` type that has a `geo_point` or `completion` field type as default field would otherwise fail.
It also makes sense to support these field types, because both support specifying the actual values as strings.
Currently, boosting on `copy_to` is misleading and does not work as originally specified in #4520. Instead of boosting just the terms from the origin field, it boosts the whole destination field. If two fields copy_to a third field, one with a boost of 2 and another with a boost of 3, all the terms in the third field end up with a boost of 6. This was not the intention.
The alternative, storing the boost in a payload for every term, results in poor performance and inflexibility. Instead, users should either (1) query the common field AND the field that requires boosting, or (2) use the multi_match query, which will soon be able to perform term-centric cross-field matching that will allow per-field boosting at query time (coming in 1.1).
As we have different runners for the REST tests, we need a mechanism that allows us to add features to any of them without breaking all other builds.
The idea is to name a feature and temporarily use skip sections that mention the required new features, so that runners that don't support it will skip the test.
Added support for `features` field in skip section.
Added `Features` class that contains a static list of the features supported by the runner. If a feature mentioned in a skip section is not listed here, the test will be skipped.
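
The skip decision can be sketched as follows. This is a simplified illustration with a hypothetical class name; the supported list is assumed to contain only `regex`, and the real `Features` class may differ.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a static list of features supported by a runner.
final class FeaturesSketch {
    // Assumption: this runner only supports the `regex` feature.
    private static final List<String> SUPPORTED =
            Collections.unmodifiableList(Arrays.asList("regex"));

    private FeaturesSketch() {}

    /** A test is skipped as soon as it requires a feature that the runner does not list. */
    static boolean shouldSkip(List<String> requiredFeatures) {
        for (String feature : requiredFeatures) {
            if (!SUPPORTED.contains(feature)) {
                return true;
            }
        }
        return false;
    }
}
```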
When fixing #4738, a small issue leaked into the implementation.
The equals check in the RestAction only applied when the master node
returned the rest request, otherwise the object equality would not hold
due to being transferred over the wire and being deserialized into
another object (from an equality point of view) than the
FieldMappingMetaData.NULL object - this could result in serialization
exceptions as an empty length bytes reference is used in toXContent.
By default active, rejected and queue thread statistics are included for the index, bulk and search thread pool.
Statistics for other thread pools can be included via the `h` query string parameter.
Closes #4907
In recent changes, we added missing support for `source` parameter in some REST APIs:
* #4892 : mget
* #4900 : mpercolate
* #4901 : msearch
* #4902 : mtermvectors
* #4903 : percolate
```java
BytesReference content = null;
if (request.hasContent()) {
    content = request.content();
} else {
    String source = request.param("source");
    if (source != null) {
        content = new BytesArray(source);
    }
}
```
It's definitely better to have:
```java
BytesReference content = request.content();
if (!request.hasContent()) {
    String source = request.param("source");
    if (source != null) {
        content = new BytesArray(source);
    }
}
```
That said, it could be nice to have a single method to manage it for various REST actions.
Closes #4924.
We currently use the number of hot threads that we are
interested in as the bound for iterating over the actual
hot threads, which can lead to an ArrayIndexOutOfBoundsException
if the actual number of threads is less than the given number.
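
The usual way to avoid this kind of error is to bound the iteration by the smaller of the two numbers. A minimal sketch, with hypothetical names and threads represented as strings:

```java
import java.util.Arrays;

// Hypothetical helper: picks at most `requested` entries from an already sorted array.
final class HotThreadsSelection {
    private HotThreadsSelection() {}

    static String[] top(String[] sortedThreads, int requested) {
        // Never read past the end of the array, and tolerate negative requests.
        int count = Math.max(0, Math.min(requested, sortedThreads.length));
        return Arrays.copyOf(sortedThreads, count);
    }
}
```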
Closes #4927
- add javadocs
- remove Iterable from all multi-bucket aggregations
- all single-bucket aggregations should have getDocCount() and getAggregations()
- all multi-bucket aggregations should have getBuckets() that returns Collection
- every multi-bucket aggregation should have these methods:
  - getBuckets() : Collection
  - getBucketByKey(String) : Bucket
  - getBucketByKey(Number) : Bucket (only for numeric buckets)
  - getBucketByKey(DateTime) : Bucket (only for date buckets)
  - getBucketByKey(GeoPoint) : Bucket (only for geohash_grid)
- every bucket in all multi-bucket aggregations should have these methods:
  - getKey() : String
  - getKeyAsText() : Text
  - getKeyAsNumber() : Number (if the key can be a numeric value, e.g. range & histograms)
  - getKeyAsGeoPoint() : GeoPoint (in case of the geohash_grid agg)
Closes #4922
This upgrade includes a fix for RAM estimation on IndexReader
that makes it possible to expose the amount of used bytes per segment
as a setting in Elasticsearch. (LUCENE-5373)
Additionally, this bugfix release contained a small fix for highlighting
that was already ported to Elasticsearch when reported. (LUCENE-5361)
Closes #4897
If a get field mapping request is issued, and all but the field can be
found, the response should return an empty JSON object instead of a 404.
Closes #4738
In order to make sure that only the requested data is returned to the client,
a couple of fixes have been applied in the ClusterState.toXContent() method.
Some tests were also added to the yaml test suite.
Closes #4885
If a preparsing of the source is needed (due to mapping configuration,
which extracts the routing/id value from the source) and the source is not
valid JSON, then the whole bulk request is failed instead of a single
BulkRequest.
This commit ensures that a broken JSON request is not forwarded to the
destination shard and creates an appropriate BulkItemResponse, which
includes a failure.
This also implied changing the BulkItemResponse serialization, because one
cannot be sure anymore whether a response includes an ID, in case it was not
specified and could not be extracted from the JSON.
Closes #4745
After copying the index files (which are throttled), we currently throttle the translog as well. The translog phase3 part is performed under a lock, so it's better not to throttle it at all, and move it as fast as possible.
We currently always run with a SecurityManager installed. To make sure we
also work without one, we should randomly swap it out, i.e. run without the
security manager.
Tests fail once in a while because of a ClassCastException at the mvel level.
We suspect that this happens because a script that is JIT-ed on a specific
data type cannot later be used with another one, but we didn't manage to
reproduce in our development environments, so let's try to change the field
names to see if this error keeps occurring.
We rely on the `cluster.name` setting to be the same across all nodes
and transport clients etc. If a node setting contains `cluster.name`
the test might not work if a random transport client is swapped
in. Passing such a configuration should result in an exception since
it's clearly an illegal state.
If one asks for `http://es:9200/_plugin/PLUGIN_NAME/` and the plugin's _site directory contains an index.html file, it will be correctly served.
This is not the case for sub directories: a _site/folder/index.html is not served when requesting `http://es:9200/_plugin/PLUGIN_NAME/folder/`; instead one gets a 403 Forbidden response, as if trying to browse the folder.
Closes #4845.
Detects if rescores arrive as an array instead of a plain object. If so
then parse each element of the array as a separate rescore to be executed
one after another. It looks like this:

    "rescore" : [ {
      "window_size" : 100,
      "query" : {
        "rescore_query" : {
          "match" : {
            "field1" : {
              "query" : "the quick brown",
              "type" : "phrase",
              "slop" : 2
            }
          }
        },
        "query_weight" : 0.7,
        "rescore_query_weight" : 1.2
      }
    }, {
      "window_size" : 10,
      "query" : {
        "score_mode": "multiply",
        "rescore_query" : {
          "function_score" : {
            "script_score": {
              "script": "log10(doc['numeric'].value + 2)"
            }
          }
        }
      }
    } ]

Rescores as a single object are still supported.
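
The "array or single object" handling can be sketched on already parsed JSON, represented here as plain Java collections. The class is hypothetical and does not use the Elasticsearch parsing API; it only illustrates normalizing both shapes into an ordered list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: accepts a single rescore object or an array of them.
final class RescoreNormalizer {
    private RescoreNormalizer() {}

    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> toList(Object rescore) {
        if (rescore instanceof Map) {
            // Single object: still supported, treated as a one-element list.
            return Collections.singletonList((Map<String, Object>) rescore);
        }
        if (rescore instanceof List) {
            List<Map<String, Object>> result = new ArrayList<Map<String, Object>>();
            for (Object element : (List<?>) rescore) {
                if (!(element instanceof Map)) {
                    throw new IllegalArgumentException("each rescore element must be an object");
                }
                result.add((Map<String, Object>) element); // order is the execution order
            }
            return result;
        }
        throw new IllegalArgumentException("rescore must be an object or an array of objects");
    }
}
```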
Closes #4748
It happens to be the case that the iteration order of a HashMap's
key set might be different across runs. This can cause nondeterministic
results in shard balancing if weights are identical and multiple shards
of the same index are eligible for relocation. This commit adds
a tie-breaker based on the shard ID to prioritise the lowest shard
ID. This also makes `AddIncrementallyTests#testAddNodesAndIndices`
reproducible.
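
A tie-breaker of this kind can be expressed as a comparator. The sketch below uses a hypothetical `ShardCandidate` type, and ordering by ascending weight is an assumption made for illustration; only the "lowest shard ID wins on equal weight" part comes from the description above.

```java
import java.util.Comparator;

// Hypothetical candidate for relocation: a shard ID and its computed weight.
final class ShardCandidate {
    final int shardId;
    final float weight;

    ShardCandidate(int shardId, float weight) {
        this.shardId = shardId;
        this.weight = weight;
    }

    // Orders by weight first (ascending, assumed); equal weights fall back to the lowest shard ID,
    // so the result no longer depends on HashMap iteration order.
    static final Comparator<ShardCandidate> BY_WEIGHT_THEN_ID = new Comparator<ShardCandidate>() {
        @Override
        public int compare(ShardCandidate a, ShardCandidate b) {
            int cmp = Float.compare(a.weight, b.weight);
            return cmp != 0 ? cmp : Integer.compare(a.shardId, b.shardId);
        }
    };
}
```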
Closes #4867
We need to remove the reproduce info printer after the suite
returns, otherwise it might print a bogus line if a subsequent
non-rest test fails. The `RunNotifier` is used across suites in
the same JVM and the listener sticks to it.