All settings should be passed as settings, and the environment should not
influence the test cluster settings. The settings we care about, i.e.
`es.node.mode` and `es.logger.level`, should be passed via settings.
This allows tests to override these settings if they, for instance, need
the `network` transport to operate at all.
Closes#6663
Today `index.version.created` depends on the version of the master
node in the cluster. This potentially causes new features to be
expected on shards that didn't exist when the index was created.
There is no notion of `where was the shard allocated first`, so
`index.version.created` can't be reliably used as a feature flag.
With this change, `index.version.created` can be reliably used to
determine the smallest node version at the point in time when the index
was created. This means we can safely use certain features that would,
for instance, require reindexing and/or would not work unless the
entire index (all shards and segments) has been created with a certain
version or newer.
Closes#6660
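For illustration, a minimal sketch of the kind of feature gate this enables; the `Version` helpers exist in the codebase, but the surrounding wiring here is assumed:
```java
import org.elasticsearch.Version;
import org.elasticsearch.common.settings.Settings;

class CreatedVersionGate {
    // True only when every shard and segment of the index was created on
    // 1.3.0+ nodes, which this change makes a reliable guarantee.
    static boolean createdOnOrAfter130(Settings indexSettings) {
        Version created = Version.fromId(indexSettings.getAsInt("index.version.created", Version.V_1_0_0.id));
        return created.onOrAfter(Version.V_1_3_0);
    }
}
```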
When three threads try to write checksums at the same time, it's possible for all three to obtain the same checksum file name A. The first thread enters the synchronized section, creates the file with name A, and exits. The second thread enters the synchronized section, sees that A exists, creates file A+1, and exits the critical section; it then proceeds to clean up and deletes all checksum files, including A. If that happens before the third thread enters the synchronized section, the third thread may check for A and, since it no longer exists, create the checksum file A a second time, which triggers a "file _checksums-XXXXXXXXXXXXX was already written to" exception in MockDirectoryWrapper and fails recovery.
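A condensed sketch of the racy pattern described above (class and method names are illustrative, not the actual recovery code):
```java
import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;

class ChecksumWriter {
    private final Object writeLock = new Object();

    void writeChecksums(Directory dir, long generation) throws IOException {
        String name = "_checksums-" + generation;         // threads 1-3 may all pick "A"
        synchronized (writeLock) {
            while (dir.fileExists(name)) {                // thread 2 sees "A", writes "A+1"
                name = "_checksums-" + (++generation);
            }
            dir.createOutput(name, IOContext.DEFAULT).close();
        }
        // Cleanup runs outside the lock: thread 2 deletes "A" here. If thread 3
        // enters the synchronized block only now, "A" no longer exists, so it is
        // written a second time and MockDirectoryWrapper fails recovery.
        for (String file : dir.listAll()) {
            if (file.startsWith("_checksums-") && !file.equals(name)) {
                dir.deleteFile(file);
            }
        }
    }
}
```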
They were due to a combination of mapping propagation delays and the behavior
of MapperService.smartName(String), so mappings are now configured up-front.
Makes it possible to delete snapshots that are missing some of their metadata files. This can happen if snapshot creation failed because the repository drive ran out of disk space.
Closes#6383
The delayed mapping introduction tests exposed a bug: if a new mapping is introduced but not yet updated on the master, and a full restart occurs, replay of the transaction log will not cause the new mapping to be re-introduced.
closes#6659
add comment on the method
Today, when a new mapping is introduced, the mapping is rebuilt (refreshSource) on the thread that performs the indexing request. This can become heavier and heavier if new mappings keep being introduced, so we can move this process to another thread that is responsible for refreshing the source and then sending the update mapping request to the master (note, this doesn't change the semantics of new mapping introduction, since they are async anyhow).
When doing so, the thread can also try to batch as many updates as possible, which is handy especially when multiple shards of the same index exist on the same node. An internal setting that controls the time to wait for batches is also added (defaults to 0).
Testing-wise, a new support method, ElasticsearchIntegrationTest#waitForConcreteMappingsOnAll, is added to allow waiting for the concrete manifestation of mappings on all relevant nodes. Some tests mistakenly rely on the fact that there are no more pending tasks to mean mappings have been updated, so if we see timing-related failures later on (all tests currently pass), those will need to be fixed to either awaitBusy on the master for the new mapping or, in the rare case, wait for the concrete mapping on all the nodes using the new method.
closes#6648
allow to change the additional time window dynamically
better sorting on mappers when refreshing source
also, no need to call nodes info in test, we already have the node names
clean calls to mapping update to provide doc mapper and UUID always
also use the internal cluster support method to get the list of nodes an index is on
reverse the order to pick the latest change first
remove unused field
and fix constructor param
move to start/stop on mapping update action
randomize INDICES_MAPPING_ADDITIONAL_MAPPING_CHANGE_TIME
Try to push our system to a state where there is only a single worker, to expose potential deadlocks when we mistakenly execute blocking operations on the worker thread
closes#6635
only change recovery throttling to slow down recoveries. The recovery file chunk size updates are not picked up by ongoing recoveries, which causes recovery to take too long even after the default settings are restored.
Also, change document creation to reuse field names in order to speed up the test.
We clone RateLimitedIndexOutput from Lucene just to collect pausing
statistics; we can do this in a more straightforward way with a delegating
RateLimiter.
Closes#6625
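A minimal sketch of the delegating approach, assuming Lucene 4.x's `RateLimiter` API (`setMbPerSec`/`getMbPerSec`/`pause`); the class name is illustrative:
```java
import java.util.concurrent.atomic.AtomicLong;
import org.apache.lucene.store.RateLimiter;

class PauseTrackingRateLimiter extends RateLimiter {
    private final RateLimiter delegate;
    private final AtomicLong totalPausedNanos = new AtomicLong();

    PauseTrackingRateLimiter(RateLimiter delegate) {
        this.delegate = delegate;
    }

    @Override
    public void setMbPerSec(double mbPerSec) {
        delegate.setMbPerSec(mbPerSec);
    }

    @Override
    public double getMbPerSec() {
        return delegate.getMbPerSec();
    }

    @Override
    public long pause(long bytes) {
        long paused = delegate.pause(bytes); // nanos actually slept
        totalPausedNanos.addAndGet(paused);  // the statistic we previously cloned a class for
        return paused;
    }

    long totalPausedNanos() {
        return totalPausedNanos.get();
    }
}
```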
Thread rejection should return the 429 Too Many Requests status code, and not 503, which is also used to indicate that the cluster is not available
relates to #6627, but only for rejections for now
closes#6629
We want to make sure recycling will not fail for any reason while trying to send back a response that is caused by a failure. For example, if we have a circuit breaker on it (at one point), sending an error back will not be affected by it.
closes#6631
the test failed but couldn't be reproduced (yet). At the very least, make sure we include the exception message as the reason; that can help to track down the failure when it happens again
If the match query with cutoff_frequency encounters stacked tokens,
like synonyms in the same position, it returns a boolean query instead
of a common terms query. However, if the original operator was set
to "and", it was ignored and the operator reset to "or".
In fact, if the operator is "and" then there is little benefit in using
a common terms query, as a must query is already
executed efficiently.
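A hedged example of the affected combination using the Java API of the time (builder methods assumed from the query DSL parameters):
```java
import org.elasticsearch.index.query.MatchQueryBuilder;
import static org.elasticsearch.index.query.QueryBuilders.matchQuery;

class CutoffFrequencyExample {
    static MatchQueryBuilder andWithCutoff() {
        return matchQuery("body", "quick brown fox")
                .cutoffFrequency(0.01f)                     // terms above this doc frequency are "common"
                .operator(MatchQueryBuilder.Operator.AND);  // previously silently reset to OR on stacked tokens
    }
}
```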
Waiting for ongoing recoveries was not good enough, as it can run before the master has finished processing the started events of primary shards, causing the recovery response to be erroneously empty
Made AckedClusterStateUpdateTask an abstract class instead of an interface, which contains the common methods.
Also introduced the AckedRequest interface to mark both AcknowledgedRequest & ClusterStateUpdateRequest so that the different ways of updating the cluster state (with or without a MetaData*Service) can share the same code.
Removed ClusterStateUpdateListener as we can just use its base class ActionListener instead.
Closes#6559
Sandboxes the Groovy scripting language with multiple configurable
whitelists:
* `script.groovy.sandbox.receiver_whitelist`: comma-separated list of
classes for objects that may have methods invoked.
* `script.groovy.sandbox.package_whitelist`: comma-separated list of
packages under which new objects may be constructed.
* `script.groovy.sandbox.class_whitelist`: comma-separated list of classes
that are allowed to be constructed.
As well as a method blacklist:
* `script.groovy.sandbox.method_blacklist`: comma-separated list of
methods that are never allowed to be invoked, regardless of the target
object.
The sandbox can be entirely disabled by setting:
`script.groovy.sandbox.enabled: false`
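A sketch of configuring the sandbox programmatically; the setting keys are the ones listed above, while the whitelist/blacklist values are purely illustrative:
```java
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;

class SandboxSettingsExample {
    static Settings sandboxSettings() {
        return ImmutableSettings.settingsBuilder()
                .put("script.groovy.sandbox.enabled", true)
                .put("script.groovy.sandbox.package_whitelist", "java.util")        // example value
                .put("script.groovy.sandbox.method_blacklist", "getClass,wait")     // example value
                .build();
    }
}
```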
Introduced the use of FilterClient in all of the REST actions; it delegates all operations to the internal Client but makes sure that headers are properly copied, when needed, from REST requests to TransportRequest(s) when executing them.
Added a new abstract handleRequest method to BaseRestHandler with an additional Client argument, and made the client instance member private (it was protected before) to force the use of the client received as an argument.
The list of headers to be copied over is by default empty but can be extended via plugins.
Closes#6513
After #6517 we ended up registering all of the actions (including admin ones) with the NodeClient.
Made sure that only the proper type of Action instances are registered with each client type.
Also fixed some compiler warnings: unused members, imports and non-matching generic types.
Closes#6563
Mid-term we should switch from `BytesValues` to Lucene's doc values APIs, in
particular the `SortedSetDocValues` class. While `BytesValues.WithOrdinals` and
`SortedSetDocValues` expose the same functionality, `BytesValues.WithOrdinals`
exposes its ordinals via a separate `Ordinals.Docs` object while
`SortedSetDocValues` exposes them on the same object as the one that holds the
values. This commit merges ordinals into `BytesValues.WithOrdinals` in order to
bring the two classes even closer.
Global ordinals were a bit tricky to migrate so I just changed them to use
Lucene's OrdinalMap that will soon (LUCENE-5767, scheduled for 4.9) have the
same optimizations as our global ordinals.
Close#6524
The `exists` and `missing` filters need to merge postings lists of all existing
terms, which can be very costly, especially on high-cardinality fields. This
commit indexes the field names of a document under `_field_names` and reuses it
to speed up the `exists` and `missing` filters.
This is only enabled for indices that are created on or after Elasticsearch
1.3.0.
Close#5659
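A usage example of the filters that benefit; on 1.3.0+ indices this should be answered from `_field_names` instead of merging per-term postings:
```java
import org.elasticsearch.index.query.FilterBuilders;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

class ExistsMissingExample {
    // Matches only documents that have a value for "user".
    static QueryBuilder usersOnly() {
        return QueryBuilders.filteredQuery(
                QueryBuilders.matchAllQuery(),
                FilterBuilders.existsFilter("user"));
    }
}
```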
VersionFieldMapper.defaultDocValuesFormat claims that the default is `disk`.
This is not used to choose the DV format in the index but for mappings
serialization in order to know when the _version doc values format is
different from the default format. This made it impossible to use the `disk`
doc values format since mappings would never retain that information at
serialization time.
Close#6523
The TTL, size, timestamp and index meta properties could be lost on an
update of a single field mapping due to a wrong comparison in the
merge method (which was caused by a wrong initialization that marked
an update as explicitly disabled instead of unset).
Closes#5053
Added the http.jsonp.enable option to allow disabling JSONP responses, as those
might pose a security risk and can be disabled if unused.
This also fixes bugs in NettyHttpChannel:
* JSONP responses were never setting application/javascript as the content-type
* The content-type and content-length headers were being overwritten even if they were set before
Closes#6164
Commit fbd7c9aa5d introduced a regression that caused
the min_doc_count to be equal to the number of documents in the
background set. As a result no buckets were built when the
response for significant terms was created.
This only affected the final XContent response.
closes#6535
Currently we send relocation & flush actions based on all assigned ShardRoutings. During the final stage of relocation, we may fail to refresh/flush a shard if the coordinating node has not yet processed the cluster state update indicating that a relocation is completed *and* the relocation target node has already processed it (i.e., started the shard and accepted new indexing requests).
Closes#6545
The percentile ranks aggregation is the reverse of the percentiles aggregation. It determines the percentile rank (the proportion of values less than a given value) for each of the provided values.
Closes#6386
Nested documents were indexed as separate documents, but it was never checked
whether the hits represent nested documents or not. Therefore, nested objects could
match non-nested queries and nested queries could also match non-nested documents.
Examples are in issue #6540.
closes#6540 closes#6544
Client, ClusterAdminClient and IndicesAdminClient had corresponding
intermediate `internal` interfaces that are unnecessary and cause
a lot of casting. This commit removes the intermediate interfaces
and uses the super interfaces directly.
This commit also adds Releasable to `Node` and `Client` so they can
be used with utilities like try-with-resources.
Closes#4355 Closes#6517
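A hedged usage sketch, assuming `Releasable` participates in `AutoCloseable` as the try-with-resources mention implies:
```java
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;

class ReleasableNodeExample {
    public static void main(String[] args) {
        // The node is released when the block exits, even on exceptions.
        try (Node node = NodeBuilder.nodeBuilder().local(true).node()) {
            node.client().admin().cluster().prepareHealth().execute().actionGet();
        }
    }
}
```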
This commit renames `TestCluster` -> `InternalTestCluster` and
`ImmutableTestCluster` to `TestCluster` for consistency. This also
makes `ExternalTestCluster` and `InternalTestCluster` consistent
with respect to their execution environment.
Closes#6510
This commit adds a basic infrastructure as well as primitive tests
to ensure version backwards compatibility between the current
development trunk and an arbitrary previous version. The compatibility
tests are simple unit tests derived from a base class that starts
and manages nodes from a provided elasticsearch release package.
The following command line executes all backwards compatibility tests
in isolation:
```
mvn test -Dtests.bwc=true -Dtests.bwc.version=1.2.1 -Dtests.class=org.elasticsearch.bwcompat.*
```
These tests run basic checks like rolling upgrades and
routing/searching/get etc. against the specified version. The version
must be present in the `./backwards` folder as
`./backwards/elasticsearch-x.y.z`
The alias -> (index -> alias) map, and specifically the inner index -> alias map, typically holds just one entry, yet we eagerly initialize it to the number of indices. When there are many indices, each with many aliases, this is a very large overhead per alias...
closes#6504
Our field data currently exposes hashes of the bytes values. That takes roughly
4 bytes per unique value, which is definitely not negligible on high-cardinality
fields.
These hashes have been used for 3 different purposes:
- term-based aggregations,
- parent/child queries,
- the percolator _id -> Query cache.
Both aggregations and parent/child queries have been moved to ordinals, which
provide a greater speedup and lower memory usage. In the case of the percolator,
hashes are used in conjunction with HashedBytesRef to avoid recomputing the hash
value when resolving a query given its ID. However, removing this has no
impact on PercolatorStressBenchmark.
Close#6500
This was how terms aggregations managed to not be too slow initially by caching
reads into the terms dictionary using ordinals. However, this doesn't behave
nicely on high-cardinality fields since the reads into the terms dict are
random and this execution mode loads all unique terms into memory.
The `global_ordinals` execution mode (default since 1.2) is expected to be
better in all cases.
Close#6499
Under some rare circumstances:
- local transport,
- the range aggregation has both a parent and a child aggregation,
- the range aggregation got no documents on at least one shard and several
documents on at least one other shard,
the range aggregation could return incorrect counts and sub-aggregations.
The root cause is that since the reduce happens in-place and since the range
aggregation uses the same instance for all sub-aggregation in case of an
empty bucket, sometimes non-empty buckets would have been reduced into this
shared instance.
In order to avoid similar bugs in the future, aggregations have been updated
to return a new instance when reducing instead of doing it in-place.
Close#6435
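An illustrative (not actual) sketch of the new reduce contract that prevents this class of bug:
```java
final class Bucket {
    static final Bucket EMPTY = new Bucket(0);   // shared across empty shard results
    final long docCount;

    Bucket(long docCount) {
        this.docCount = docCount;
    }

    Bucket reduce(Bucket other) {
        // Reducing into a shared instance in place is what corrupted non-empty
        // buckets; returning a new instance cannot touch EMPTY.
        return new Bucket(this.docCount + other.docCount);
    }
}
```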
Moved BulkProcessor tests from BulkTests to the newly added BulkProcessorTests class.
Strengthened BulkProcessorTests by adding randomizations to existing tests and new tests for concurrent requests and exceptions.
Also made sure that afterBulk is called only once per request if concurrentRequests==0.
Closes#5038
When a node sends a join request to the master, only send back the response after the node has been added to the master's cluster state and that state has been published.
This fixes the rare cases where, today, a join request can return while the master, being under load, has not yet added the node to its cluster state; the node that joined then starts fault detection against the master and fails, since it is not part of the cluster state.
Since the join request now takes longer, also increase the default join request timeout.
closes#6480
Previously, one MLT query per field was created for each item. One issue with
this method is that the maximum number of selected terms was equal to the
number of items times `max_query_terms`. Instead, users should have direct control
over the maximum number of selected terms allowed, regardless of the number of
queried items.
Another issue with the previous method is that it could lead to the
selection of rather uninteresting terms, just because they were found in a
particular queried item. Instead, this new procedure enforces the selection of
interesting terms across ALL items, not within each item. This can lead to
search results where the best matching items share commonalities amongst the
best characteristics of all the items.
Closes#6404
This commit adds checks for nocommit and tabs in the source code.
The task is executed during the validate phase and can be disabled via
`-Dvalidate.skip`
* `english` returned the slow snowball English stemmer
* `porter2` returned the snowball Porter stemmer (v1)
* `portuguese` was used twice, preventing the second version from working
Changes:
* `english` now returns the fast PorterStemmer (for indices created from v1.3.0 onwards)
* `porter2` now returns the snowball English stemmer (for indices created from v1.3.0 onwards)
* `light_english` now returns the `kstem` stemmer (`kstem` still works)
* `portuguese_rslp` returns the PortugueseStemmer
* `dutch_kp` is a synonym for `kp`
Tests and docs updated
Fixes#6345 Fixes#6213 Fixes#6330
Bugs:
* "groups" and "types" were being ignored
* "completion_fields" as wildcards were not being resolved to fieldnames
Enhancements:
* Made "groups" and "types" support wildcards
* Added missing tests
Closes#6390
In order to return more information to the client in case a TransportClient
cannot connect to the cluster, this commit adds logging and also returns the
configured nodes in the NoNodeAvailableException.
Also, a minor bug has been fixed which propagated exceptions incorrectly, so that an
invalid request was actually retried on every node if a regular connection failure
had happened on the first node.
Closes#6376
Our REST backwards compatibility tests need to be able to disable client nodes within the TestCluster when running older tests that assume client nodes are not around.
Previously, if the user provided a non-conforming string, it would blow up with
`java.lang.StringIndexOutOfBoundsException: String index out of range: -1`,
which is not a *helpful* error message.
Also updated the documentation to make the possible setting values more clear.
Close#5752
A new "breadth_first" results collection mode allows upper branches of aggregation tree to be calculated and then pruned
to a smaller selection before advancing into executing collection on child branches.
Closes#6128
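A hedged Java API example (builder method and enum names assumed from the feature): compute the top actor buckets first, prune to ten, then run the expensive sub-aggregation only on the survivors:
```java
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregator.SubAggCollectionMode;
import org.elasticsearch.search.aggregations.bucket.terms.TermsBuilder;

class BreadthFirstExample {
    static TermsBuilder topActorsThenCostars() {
        return AggregationBuilders.terms("actors")
                .field("actors")
                .size(10)
                .collectMode(SubAggCollectionMode.BREADTH_FIRST)   // prune before descending
                .subAggregation(AggregationBuilders.terms("costars").field("actors"));
    }
}
```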
The put index template API supports the create parameter (defaults to false), which tells whether the template can replace an existing one with the same name or not. Unified its behaviour between the PUT and POST methods, whereas POST would previously force create to true.
Added create parameter to the rest spec (was missing before) and a REST test for create true scenario.
There is a pretty nasty bug in the lock factory we use that can cause
nodes to use the same data dir, wiping each other's data. Luckily this is
unlikely to happen if the nodes are running in different JVMs, which they
do unless they are embedded.
See LUCENE-5738
Closes#6424
The _uid field wasn't available in a script despite the fact that it is always stored. Made it available, and also made available the _id and _type fields that are derived from it.
Closes#6406
The versionMap holds all versions (keyed by _uid) for recently indexed
documents. Previously we only cleared it during flush, which can be
infrequent if the translog flush thresholds are high, and can cause
excessive heap usage especially for small documents.
Now we clear it during refresh which is usually more frequent
(e.g. once per second by default).
Closes#6379
The GeoBounds aggregation is a new single bucket aggregation which outputs the coordinates of a bounding box containing all the points from all the documents passed to the aggregation, as well as the doc count. The GeoBounds aggregation also takes a wrap_longitude parameter which specifies whether the resulting bounding box is permitted to overlap the international date line. This option defaults to true.
This aggregation introduces the idea of MetricsAggregations, which do not return double values and cannot be used for sorting. The existing MetricsAggregation has been renamed to NumericMetricsAggregation and is a subclass of MetricsAggregation. MetricsAggregations do not store doc counts and do not support child aggregations.
Closes#5634
This commit reverts the commit for issue #5900 introduced
in `1.2.0`. The unlimited translog size can cause memory pressure
on ES instances with low memory and high indexing load.
Closes#6377
Routing was inadvertently changed in #5562, resulting in documents going to
different shards in 1.2. This is a terrible bug because an indexing request
would not necessarily go to the same shard anymore, potentially leading to
duplicates.
Close#6391
Previously, More Like This would create a new mlt query for each value of a
multi-value field. This could result in all the values of the field being
selected, which defeats the purpose of More Like This. Instead, the correct
behavior is to generate only one mlt query for all the values of the field.
This commit provides the correct behavior for More Like This DSL. The fix for
More Like This API will be coming in another commit.
Closes#6310
The BigArrays limit is currently shared by the translog, netty, http and some
queries/aggregations. If any of these consumers starts taking a lot of memory,
then other ones might fail to allocate memory, which could have bad
consequences, eg. if ping requests can't be sent. The plan is to come up with
a better solution in 1.3.
Close#6332
If a polygon is constructed which overlaps the date line but has a hole which lies entirely on one side of the date line, JTS raises an error saying that the hole is not within the bounds of the polygon, because the code which splits the polygon on either side of the date line does not add the hole to the correct component of the final set of polygons. The fix ensures this selection happens correctly.
Closes#6179
We perform some management operations that require the cluster to be
consistent with respect to the number of nodes in the cluster state
/ visible to the master in order to rely on the ack mechanism. This
only applies to the test infrastructure when nodes are not explicitly
started / stopped as well as while tearing down the cluster and wiping
indices after the tests.
Reusing Lucene's TermsEnum for _uid/version lookups gives a small
indexing (updates) speedup and brings us closer to not having
to spend RAM on bloom filters.
Closes#6212
Randomize rewrite methods instead of trying them all when highlighting multi-term queries with the postings highlighter.
Rely on search type randomization and remove all the explicit setSearchType calls, as they are not needed anymore.
Remove explicit `.from`, `.size` and `.explain` calls; they are not needed and might slow tests down (especially explain).
Added support for min_children and max_children parameters to
the has_child query and filter. A parent document will only
be considered a match if the number of matching children
falls between the min/max bounds.
Closes#6019
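A hedged example using the Java API (setter names assumed from the new parameters):
```java
import org.elasticsearch.index.query.HasChildQueryBuilder;
import static org.elasticsearch.index.query.QueryBuilders.hasChildQuery;
import static org.elasticsearch.index.query.QueryBuilders.termQuery;

class MinMaxChildrenExample {
    // Parents match only when they have between 2 and 10 approved comments.
    static HasChildQueryBuilder parentsWithSomeChildren() {
        return hasChildQuery("comment", termQuery("state", "approved"))
                .minChildren(2)
                .maxChildren(10);
    }
}
```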
Using ping.timeout (which defaults to 3s) as the timeout value on the join request a node makes to the master once it is discovered can be too small, specifically when there is a large cluster state involved (and, by definition, all the buffers and such on the nio layer will be "cold"). Introduce a dedicated join.timeout setting that defaults to 10x the ping.timeout (so 30s by default).
closes#6342
The top_hits aggregation returned an empty InternalTopHits instance with no fields set when there were no results, causing reduce and serialization errors down the road. This is fixed by setting all required fields when there are no results.
Closes#6346
This default type has been inherited from its ancestor, the (non-paged) recycler, whose memory
usage was unbounded and required soft references to make sure it could release memory eventually.
On the contrary, the page cache recycler's memory usage is bounded, so we can remove soft
references in order to reduce load on the garbage collector.
Note: the cache type is already randomized in integration tests.
Close#6320
Mustache extracts the key/value pairs for parameter substitution from
objects and maps, but the extraction strategy is decided on the first execution. We need to
make sure that if the params are null we pass an empty map, to ensure we
bind the map-based extractor.
Closes#6318
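A minimal sketch of the guard with mustache.java (the surrounding template handling is assumed):
```java
import com.github.mustachejava.DefaultMustacheFactory;
import com.github.mustachejava.Mustache;

import java.io.StringReader;
import java.io.StringWriter;
import java.util.Collections;
import java.util.Map;

class MustacheParamsExample {
    // Passing an empty map (never null) makes Mustache pick its map-based
    // value extractor on the first execution.
    static String render(Mustache template, Map<String, Object> params) {
        Map<String, Object> vars = params == null ? Collections.<String, Object>emptyMap() : params;
        StringWriter writer = new StringWriter();
        template.execute(writer, vars);
        return writer.toString();
    }

    public static void main(String[] args) {
        Mustache template = new DefaultMustacheFactory()
                .compile(new StringReader("{{query}}"), "test");
        System.out.println(render(template, null)); // renders "" instead of binding a wrong extractor
    }
}
```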
At the moment the plain highlighter only uses the analyzer defined on the type
level. However, during the indexing stage it is possible to define the analyzer on a
per-document level, for example by mapping '_analyzer' to another field containing the
required name. This commit attempts to make sure that highlighting works
correctly in this scenario.
Closes#5497
Guava's caches have overflow issues around 32GB with our default segment
count of 16 and weight of 1 unit per byte. We give them 100MB of headroom
so 31.9GB.
This limits the sizes of both the field data and filter caches, the two
large guava caches.
Closes#6268
Because JSON objects are unordered, this also adds an explicit order syntax
that looks like:
```
"highlight": {
    "fields": [
        {"title":{ /*params*/ }},
        {"text":{ /*params*/ }}
    ]
}
```
This is not useful for any of the builtin highlighters but will be useful
in plugins.
Closes#4649
When multiple date formats are specified using the || syntax in the field mappings, the date_histogram aggregation breaks. This is because we are getting a parser, rather than a printer, from the date formatter for the object we use to convert the DateTime values back into strings. The simple fix is to get the printer from the date format, with a test to back it up.
Closes#6239
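For illustration with Joda-Time (which Elasticsearch uses for date formats): bucket keys have to go through a printer, since a parser-only formatter cannot turn values back into strings:
```java
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

class PrinterVsParserExample {
    public static void main(String[] args) {
        // A formatter with print capability converts a bucket key back to a string.
        DateTimeFormatter printer = DateTimeFormat.forPattern("yyyy-MM-dd");
        DateTime key = new DateTime(2014, 6, 1, 0, 0, DateTimeZone.UTC);
        System.out.println(printer.print(key)); // 2014-06-01
    }
}
```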
An assertion was triggered when percolating documents with a nested object
in the mapping if the document did not actually contain a nested object.
Reason:
MultiDocumentPercolatorIndex checks if the number of documents is
actually >1. Instead we can just use the SingleDocumentPercolatorIndex
in this case.
closes#6263
The `FieldDataWeighter` allowed a concrete subclass of the cache's
generic type to be used, which causes a ClassCastException and also trips the
CircuitBreaker so that it is not decremented appropriately.
This was tripped by the settings randomization that is also part of this commit.
Closes#6260
Today we check if the DocIdSet we filter by is `fast`, but the check fails
if the DocIdSet is wrapped in an `ApplyAcceptedDocsFilter`, which is always
the case if the index has deleted documents. This commit unwraps
the original DocIdSet in the case of deleted documents.
Closes#6247
AllTokenStream, used to index the _all field, adds some overhead, but
it's not necessary when no fields were boosted or when positions are
not indexed on the _all field.
Closes#6187 Closes#6219
This adds support for sending a list of benchmark names and/or wildcard
patterns when submitting an abort request. Standard wildcards such as:
"*", "*xxx", and "xxx*" are supported. Multiple benchmark names and
wildcard patterns can be submitted together as a comma-separated list
from the REST API or as a string array from the Java API.
Closes#6185
This change adds a new snapshot state that waits for the replication of a shard to finish before starting the snapshotting process. Because this change adds a new snapshot state, pre-1.2.0 nodes will not be able to join a 1.2.0 cluster that is currently running a snapshot/restore operation.
Closes#5531
This is yet another safety guard to make sure we don't delete data if the local copy is the only one (even if it's not part of the cluster state any more)
Closes#6191
In some places we want to delay the start of a shard recovery because the source node is not ready to receive it. At the moment the retry logic ignores the time delay parameter (`retryAfter`), causing a busy-waiting-like scenario. This is fixed in this commit.
Closes#6226
This commit upgrades to the latest Lucene 4.8.1 release including the
following bugfixes:
* An IndexThrottle now kicks in when merges start falling behind,
limiting index threads to 1 until merges have caught up. Closes#6066
* RateLimiter now kicks in at the configured rate where previously
the limiter was limiting at ~8MB/sec almost all the time. Closes#6018
Just follow "lowercase" token filter example and register "uppercase" token filter as an exposed token filter. This will not, by itself, test whether ES is correctly handling "uppercase" TF; this is more of a "code as documentation" fix.
This commit adds `dummy docs` to `ElasticsearchIntegrationTest#indexRandom`.
It indexes documents with an empty body into the indices specified by the docs
and deletes them after all docs have been indexed. This produces gaps in
the segments and enforces usage of accept docs on lower levels to ensure
the features work with deleted documents as well.
We currently ask `MatchDocIdSet#matchDoc(int)` before consulting
the accept docs. This can also have a negative performance impact
since `matchDoc(int)` calls might be way more expensive than
acceptDocs calls.
Closes#6234
FilterableTermsEnum allows filtering stats by supplying per-segment
bits. Today, if all docs are filtered out, the term is still reported as
live, but it shouldn't be.
Relates to #6211
because the shard that the get request for the 'test' index, 'type1' type and id 1 executes on may not be in a started state.
Also added more logging.
The `testCompareMoreLikeThisDSLWithAPI` test compares results from the query
and the API, which might query different shards. Those shards might use
different doc IDs internally to disambiguate. This commit re-sorts the
results and compares them after stable disambiguation.
These calls were introduced in PR #6149 as a backward compatibility layer for the previous value of `Versions.MATCH_ANY`. This is not needed, as the translog never contains these values. On top of that, the calls are not effective, as the stream the translog uses is effectively not versioned (versioning is done on an item-by-item basis).
The syntax to specify one or more items is the same as for the Multi GET API.
If only one document is specified, the results returned are the same as when
using the More Like This API.
Relates #4075 Closes#5857
The retry mechanism in the transport layer might cause the delete snapshot request to be executed twice if the cluster master is closed while the request is being executed. The first time, the delete snapshot request is successfully executed on the old master; it is then retried on the newly elected master. When the new master tries to delete the snapshot, the snapshot no longer exists (since it was successfully deleted by the old master) and a SnapshotMissingException is returned.
Currently, even if some shards of the snapshot are not snapshotted successfully, the snapshot is still marked as "SUCCESS". Users may miss the fact that there are shard failures present in the snapshot and think that the snapshot completed. This change adds a new snapshot state "PARTIAL" that provides a quick indication that the snapshot was only partially successful.
Closes#5792
Until now all version types have officially required the version to be a positive long number. Despite this being documented, ES versions <=1.0 did not enforce it when using the `external` version type. As a result, people have successfully indexed documents with 0 as a version. In 1.1 we introduced validation checks on incoming version values, causing indexing requests to fail if the version was set to 0. While this is strictly speaking OK, we effectively have a situation where data already indexed does not match the version invariant.
To be lenient and adhere to the spirit of our data backward compatibility policy, we have decided to allow 0 as a valid external version. This is somewhat complicated, as 0 is also the internal value of `MATCH_ANY`, which indicates that requests should succeed regardless of the current doc version. To keep things simple, this commit changes the internal value of `MATCH_ANY` to `-3` for all version types.
Since we're doing this in a minor release (and because versions are stored in the transaction log), the default `internal` version type still accepts 0 as a `MATCH_ANY` value. This is not a problem for the other version types, as `MATCH_ANY` doesn't make sense in that context.
Closes#5662
We try to reuse character arrays and UTF8 writers with SoftReferences.
SoftReferences have a negative impact on GC and should be avoided in
general. Yet in this case they can simply be replaced with a per-stream
Bytes/CharsRef that is thread local and has the same lifetime as the
stream.
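A minimal sketch of the replacement pattern (hypothetical names): a scratch buffer owned by the stream instead of a SoftReference cache:
```java
class StreamScratch {
    // before: static SoftReference<char[]> cachedChars, reclaimed under GC pressure
    private char[] chars = new char[64]; // lives exactly as long as the stream

    // Returns a buffer of at least minSize chars, growing it when needed.
    char[] charScratch(int minSize) {
        if (chars.length < minSize) {
            chars = new char[Math.max(minSize, chars.length << 1)];
        }
        return chars;
    }
}
```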
Today the recovery ID is taken from a static atomic long, which
is essentially a per-JVM ID. We run the tests within the same
JVM, which means we don't really simulate what happens in
production environments. Instead we should use a per-node generated
ID.
* If the plugin does not provide a `lucene` property, we consider that the plugin is compatible.
* If the plugin provides a `lucene` property, we try to load the related org.apache.lucene.util.Version enum constant. If this fails, it means that the node is too "old" compared to the Lucene version the plugin was built for.
* We then compare the first two digits of the current node's Lucene version against the first two digits of the plugin's Lucene version. If they are not equal, it means that the plugin is too "old" for the current node.
Plugin developers who want to enable the plugin check only have to add a `lucene` property to the `es-plugin.properties` file. If you are using maven to build your plugin, you can do it like this:
In `pom.xml`:
```xml
<properties>
<lucene.version>4.6.0</lucene.version>
</properties>
<build>
<resources>
<resource>
<directory>src/main/resources</directory>
<filtering>true</filtering>
</resource>
</resources>
</build>
```
In `es-plugin.properties`, add:
```properties
lucene=${lucene.version}
```
BTW, if you don't already have it, you can add the plugin version as well:
```properties
version=${project.version}
```
You can disable that check using `plugins.check_lucene: false`.
This commit adds randomization for:
* `index.merge.scheduler.max_thread_count`
* `index.merge.scheduler.max_merge_count`
This commit also moves to using
EsExecutors#boundedNumberOfProcessors(Settings) to configure
the default `max_thread_count`, for better reproducibility.
Closes#6194
Added a new internal flag to IndicesOptions that tells whether aliases can be resolved to multiple indices or not.
Cut over to the new metaData#concreteIndices(IndicesOptions, String...) for all the APIs previously using MetaData#concreteIndices(String[], IndicesOptions) and removed the old method; deprecation is not needed as it doesn't break client code.
Introduced constants for the flags in IndicesOptions for more readability.
Renamed MetaData#concreteIndex to concreteSingleIndex; the method is left as a shortcut, although it calls the common concreteIndices that accepts IndicesOptions and multiple indices.
We configure the threadpools according to the number of processors, which is
different on every machine. Yet we had some test failures related to this
and #6174 that only happened reproducibly on a node with 1 available processor.
This commit does the following:
* sometimes randomizes the number of available processors
* if we don't randomize, sets the actual number of available processors
in the settings on the test node
* always prints out the number of processors when a test fails, to make sure we can
reproduce the thread pool settings with the reproduce info line
Closes#6176
On small hardware, the BENCH thread pool can be set to size 1. This is
problematic as it means that while a benchmark is active, there are no
threads available to service administrative tasks such as listing and
aborting. This change fixes that by executing list and abort operations
on the GENERIC thread pool.
Closes#6174
I think Chuck Norris is required to fix this at this point until we have an API
that can for instance pause a Benchmark. We basically wait for a query to be executed
and that query syncs on a latch with the test in a script :)
This commit also adds some more testing for benchmarks that run into errors.
When you have a nested document and want to sort on its fields, it's perfectly doable on regular fields but not on "generated" sub fields.
Here is a SENSE recreation:
```
DELETE /tmp
PUT /tmp
PUT /tmp/doc/_mapping
{
"properties": {
"flat": {
"type": "string",
"index": "analyzed",
"fields": {
"sub": {
"type": "string",
"index": "not_analyzed"
}
}
},
"nested": {
"type": "nested",
"properties": {
"foo": {
"type": "string",
"index": "analyzed",
"fields": {
"sub": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
PUT /tmp/doc/1
{
"flat":"bar",
"nested":{
"foo":"bar"
}
}
```
When sorting on the `flat.sub` sub field, everything is fine:
```
GET /tmp/doc/_search
{
"sort": [
{
"flat.sub": {
"order": "desc"
}
}
]
}
```
When sorting on the `nested.foo` field, everything is fine:
```
GET /tmp/doc/_search
{
"sort": [
{
"nested.foo": {
"order": "desc"
}
}
]
}
```
But when sorting on the `nested.foo.sub` field, sorting is incorrect:
```
GET /tmp/doc/_search
{
"sort": [
{
"nested.foo.sub": {
"order": "desc"
}
}
]
}
```
Closes#6150
Every class using these parameters has its own members where these four
are stored. This clutters the code. Because they are mostly needed together,
it might make sense to group them.
it might make sense to group them.