This commit fixes several NPEs caused by implicitly performing a get request for a document that exists with its _source disabled and then trying to access the source. Instead of causing an NPE the following queries will throw an exception with a "source disabled" message (similar behavior as if the document does not exist).:
- GeoShape query for pre-indexed shape (throws IllegalArgumentException)
- Percolate query for an existing document (throws IllegalArgumentException)
A Terms query with a lookup will ignore the document if the source does not exist (same as if the document does not exist).
GET and HEAD requests for the document _source will return a 404 if the source is disabled (even if the document exists).
If we don't care about scoring then for certain candidate matches we can be certain, that if they are a candidate match,
then they will always match. So verifying these queries with the MemoryIndex can be skipped.
Currently we don't throw an error when there is more than one query clause
specified in a must/must_not/should/filter object of the bool query without
using array notation, e.g.:
{ "bool" : { "must" : { "match" : { ... }, "match": { ... }}}}
In these cases, only the first query will be parsed and further behaviour is
unspecified, possibly leading to silently ignoring the rest of the query.
Instead we should throw a ParsingException if we don't encounter an END_OBJECT
token after having parsed the query clause.
Global cluster blocks were not checked if an empty set of indices were passed as argument to the block checking method. This would lead to issues where some operations are already executed on a cluster before it has recovered its cluster state.
This commit upgrades JNA from version 4.1.0 to 4.2.2. Additionally, this
dependency is now non-optional as JNA is dual-licensed with Apache
License 2.0 since JNA 4.0.0.
Relates #19045
This is the same as what Lucene does for its analysis factories, and we hawe
tests that make sure that the elasticsearch factories are in sync with
Lucene's. This is a first step to move forward on #9978 and #18064.
I suspect recent failures are due to the fact that the cache disables itself
when there is contention. This runs assertions in an assertBusy block since
they should eventually succeed.
This commit moves template support out of the Search API to its own dedicated Search Template API in the lang-mustache module. It provides a new SearchTemplateAction that can be used to render templates before it gets delegated to the usual Search API. The current REST endpoint are identical, but the Render Search Template endpoint now uses the same Search Template API with a new "simulate" option. When this option is enabled, the Search Template API only renders template and returns immediatly, without executing the search.
Closes#17906
There are secondary issues with async shard fetch going out to nodes before they have a cluster state published to them that need to be solved first. For example:
- async fetch uses transport node action that resolves nodes based on the cluster state (but it's not yet exposed by ClusterService since we inline the reroute)
- after disruption nodes will respond with an allocated shard (they didn't clean up their shards yet) which throws of decisions master side.
- nodes deed the index meta data in question but they may not have if they didn't recieve the latest CS
If a plugin declares `onModule(SomethingThatIsntAModule)` then refuse
to start. Before this commit we just logged a warning that flies by in
the console and is easy to miss. You can't miss refusing to start!
This commit removes duplicated methods for reading byte arrays in
StreamInput. One method would read a byte array by repeatedly calling
StreamInput#readByte in a loop, and the other would just call
StreamInput#readBytes. In this commit, we remove the former.
Relates #19023
`stored_fields` parameter will no longer try to retrieve fields from the _source but will only return stored fields.
`fields` will throw an exception if the user uses it.
Add `docvalue_fields` as an adjunct to `fielddata_fields` which is deprecated. `docvalue_fields` will try to load the value from the docvalue and fallback to fielddata cache if docvalues are not enabled on that field.
Closes#18943
This changes adds a MapperPlugin interface which allows pull style
retrieval of mappers and metadata mappers added by plugins. For now, I
have kept the MapperRegistry, but this should be removed in the future
as it is just a silly container for 2 maps which could themselves be
passed around.
This commit refactors InternalEngine#innerIndex and
InternalEngine#innerDelete to collapse some common logic into a single
method. This has the advantage that it shrinks the bytecode size of
InternalEngine#innerIndex so that it can be inlined.
Makes ScriptModule just a plain class that manages building the
ScriptSettings and ScriptService from plugins. When we *need*
to bind ScriptService with guice we bind it in a lambda.
- fix it so that processors with the `ignore_failure` option do not
record their exception in the response
- add more tests to make empty `on_failure`. This now throws an
exception
Today we only throw random exceptions on the translog writer. This commit
extends it to also throw exceptions during checkpoint writing etc to test
if the correct flags are provided to open method etc.
This makes this sequence:
```
curl -XDELETE localhost:9200/source,dest?pretty
for i in $( seq 1 100 ); do
curl -XPOST localhost:9200/source/test -d'{"test": "test"}'; echo
done
curl localhost:9200/_refresh?pretty
curl -XPOST 'localhost:9200/_reindex?pretty&wait_for_completion=false' -d'{
"source": {
"index": "source"
},
"dest": {
"index": "dest"
}
}'
curl 'localhost:9200/_tasks/Jsyd6d9wSRW-O-NiiKbPcQ:237?wait_for_completion&pretty'
```
Return task *AND* the response to the user.
This also renames "result" to "response" in the persisted task info
to line it up with how we name the objects in Elasticsearch.
This commit modifies reading the Elasticsearch jar manifest via the URL
instead of converting the URL to an NIO path for increased portability.
Relates #18999
The default similarity was set to `classic` which refers to TFIDF and has not been moved after the upgrade to Lucene 6.
Though moving to BM25 could have some downside for queries that relies on coordination factor (match_query, multi_match_query) ?
relates #18944
```
> Throwable #1: java.lang.RuntimeException: file handle leaks: [SeekableByteChannel(/var/lib/jenkins/workspace/elastic+elasticsearch+master+g1gc/core/build/testrun/integTest/J0/temp/org.elasticsearch.search.suggest.CompletionSuggestSearch2xIT_518545A20D129C8C-001/tempDir-001/data/nodes/1/indices/4sTECv6WSJOJsw9L4CGamg/0/index/segments_1), SeekableByteChannel(/var/lib/jenkins/workspace/elastic+elasticsearch+master+g1gc/core/build/testrun/integTest/J0/temp/org.elasticsearch.search.suggest.CompletionSuggestSearch2xIT_518545A20D129C8C-001/tempDir-001/data/nodes/1/indices/4sTECv6WSJOJsw9L4CGamg/0/index/segments_1)]
> at __randomizedtesting.SeedInfo.seed([518545A20D129C8C]:0)
> at org.apache.lucene.mockfile.LeakFS.onClose(LeakFS.java:63)
> at org.apache.lucene.mockfile.FilterFileSystem.close(FilterFileSystem.java:77)
> at org.apache.lucene.mockfile.FilterFileSystem.close(FilterFileSystem.java:78)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception
> at org.apache.lucene.mockfile.LeakFS.onOpen(LeakFS.java:46)
> at org.apache.lucene.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:81)
> at org.apache.lucene.mockfile.HandleTrackingFS.newByteChannel(HandleTrackingFS.java:271)
> at org.apache.lucene.mockfile.FilterFileSystemProvider.newByteChannel(FilterFileSystemProvider.java:212)
> at org.apache.lucene.mockfile.HandleTrackingFS.newByteChannel(HandleTrackingFS.java:240)
> at java.nio.file.Files.newByteChannel(Files.java:361)
> at java.nio.file.Files.newByteChannel(Files.java:407)
> at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:77)
> at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:94)
> at org.apache.lucene.util.LuceneTestCase.slowFileExists(LuceneTestCase.java:2695)
> at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:737)
> at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:94)
> at org.elasticsearch.common.lucene.Lucene$1.doBody(Lucene.java:237)
> at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:685)
> at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:637)
> at org.elasticsearch.common.lucene.Lucene.checkSegmentInfoIntegrity(Lucene.java:242)
> at org.elasticsearch.index.store.Store$MetadataSnapshot.loadMetadata(Store.java:847)
> at org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:740)
> at org.elasticsearch.index.store.Store.getMetadata(Store.java:260)
> at org.elasticsearch.index.store.Store.getMetadata(Store.java:240)
> at org.elasticsearch.index.shard.IndexShard.doCheckIndex(IndexShard.java:1310)
> at org.elasticsearch.common.util.CancellableThreads.executeIO(CancellableThreads.java:102)
> at org.elasticsearch.index.shard.IndexShard.checkIndex(IndexShard.java:1288)
> at org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:921)
> at org.elasticsearch.index.shard.IndexShard.skipTranslogRecovery(IndexShard.java:964)
> at org.elasticsearch.indices.recovery.RecoveryTarget.prepareForTranslogOperations(RecoveryTarget.java:297)
> at
```
The MMapDirectory has a switch that allows the content of files to be loaded
into the filesystem cache upon opening. This commit exposes it with the new
`index.store.pre_load` setting.
Today we only emit that the setting wasn't found unless we have
some DYM suggestions. Yet, if a setting is not found at all and there
are no suggestions due to typos it's likely a removed setting or the plugin
that is supposed to be configured is not installed.
This commit adds some info text to the exception to help the user debugging
the problem before opening bugreports.
Instead of emitting:
`unknown setting [foo.bar]`
we now emit:
`unknown setting [foo.bar] please check the migration guide for removed settings and ensure that the plugin you are configuring is installed`
Relates to #18663
If someone sets `index.shard.check_on_startup`, indexing start up time can be slow (by design, it diligently goes and checks all data). If for some reason the shard is closed in that time, the store ref is kept around and prevents a new shard copy to be allocated to this node via the shard level locks. This is especially tricky if the shard is close due to a cancelled recovery which may re-restart soon.
This commit adds a cancellable threads instance to each IndexShard and perform index checking underneath it, so it can be cancelled on close.
We pretended to be able to ackt like a different version node for so long it's
time to be honest and remove this ability. It's just confusing and where needed
and tested we should build dedicated extension points.
This commit introduce unit testing infrastructure to test replication operations using real index shards. This is infra is complementary to the full integration tests and unit testing of ReplicationOperation we already have. The new ESIndexLevelReplicationTestCase base makes it easier to test and simulate failure mode that require real shards and but do not need the full blow stack of a complete node.
The commit also add a simple "nothing is wrong" test plus a test that checks we don't drop docs during the various stages of recovery.
For now, only single doc indexing is supported but this can be easily extended in the future.
This class was forked in 0.20 to remove a volatile keyword. While there
is no issue attached to the commit, no evidence of the criticality of the
change nor does it seem to be correct since we set this value internally as well
I think this class should be used as is from joda time even if we have to pay
the price of volatile reads. We can't do 3rd party optimization in our codebase that
way it just not maintainable.
This was added in 2280915d3c
This commit removes the search preference _only_node as the same
functionality can be obtained by using the search preference
_only_nodes. This commit also adds a test that ensures that _only_nodes
will continue to support specifying node IDs.
Relates #18875
In the past, we had the semantics where the very first cluster state a node processed after joining could not contain shard assignment to it. This was to make sure the node cleans up local / stale shard copies before receiving new ones that might confuse it. Since then a lot of work in this area, most notably the introduction of allocation ids and #17270 . This means we don't have to be careful and just reroute in the same cluster state change where we process the join, keeping things simple and following the same pattern we have in other places.
This change removes some unnecessary dependencies from ClusterService
and cleans up ClusterName creation. ClusterService is now not created
by guice anymore.
The NodeJoinController is responsible for processing joins from nodes, both normally and during master election. For both use cases, the class processes incoming joins in batches in order to be efficient and to accumulated enough joins (i.e., >= min_master_nodes) to seal an election and ensure the new cluster state can be committed. Since the class was written, we introduced a new infrastructure to support batch changes to the cluster state at the `ClusterService` level. This commit rewrites NodeJoinController to use that infra and be simpler.
The PR also introduces a new concept to ClusterService allowing to submit tasks in batches, guaranteeing that all tasks submitted in a batch will be processed together (potentially with more tasks). On top of that I added some extra safety checks to the ClusterService, around potential double submission of task objects into the queue.
This is done in preparation to revive #17811
Today we have a push model for registering basically anything. All our extension points
are defined on modules which we pass in to plugins. This is harder to maintain and adds
unnecessary dependencies on the modules itself. This change moves towards a pull model
where the plugin offers a getter kind of method to get the extensions. This will also
help in the future if we need to pass dependencies to the extension points which can
easily be defined on the method as arguments if a pull model is used.
Currently the error messages for failing tests in the TimeZoneRoundingTests test
suite are hard to read because they usually report the actual end expected date
in milliseconds utc (e.g. "Expected: <1414270860000L> but: was <1414270800000L>".
This makes failing tests hard to read.
This change introduces a new Matcher that can be used for equality checks for
long dates but reports the error both as a formated date string according to
some time zone and also as the actual long values, so you get messages like
"Expected: 2014-10-26T00:01:00.000+03:00 [1414270860000] but: was
"2014-10-26T00:00:00.000+03:00 [1414270800000]".
Also clean cleaning up some helper methods and generally simplifying a few test
cases. Otherwise this change shouldn't affect either the scope of the test or
anything about the rounding implementation itself.
They have been implemented in https://issues.apache.org/jira/browse/LUCENE-7289.
Ranges are implemented so that the accuracy loss only occurs at index time,
which means that if you are searching for values between A and B, the query will
match exactly all documents whose value rounded to the closest half-float point
is between A and B.
Registering a script engine or native scripts still uses Guice today
and is much more complicated than needed. This change moves to a pull
based model where script plugins have to implement a dedicated interface
`ScriptPlugin` and defines simple getter returning instances rather than
classes.