This test is periodically failing. As I suspect that the GCDisruption scheme is somehow making the wrong node block on
its cluster state update thread, I've added some more logging and a thread dump once the given assertion triggers
again.
Adds an explicit recoverySource field to ShardRouting that characterizes the type of recovery to perform:
- fresh empty shard copy
- existing local shard copy
- recover from peer (primary)
- recover from snapshot
- recover from other local shards on same node (shrink index action)
Objects hierarchy must be tracked when entering/leaving an object so that it better knows if the "newField" has been inserted into an arbitrary holding object.
Can be reproduced with gradle :core:test -Dtests.seed=760F8BD0F7E46D45 -Dtests.class=org.elasticsearch.index.query.MoreLikeThisQueryBuilderTests -Dtests.method="testUnknownObjectException" -Dtests.security.manager=true -Dtests.locale=ko -Dtests.timezone=Etc/Zulu
When need to check the whole hierarchy of objects to know if the newly inserted "newField" object is part of an arbitrary holding object or not.
Reproduced with `gradle :modules:percolator:test -Dtests.seed=736B0B67DA7A3632 -Dtests.class=org.elasticsearch.percolator.PercolateQueryBuilderTests -Dtests.method="testUnknownObjectException" -Dtests.security.manager=true -Dtests.locale=es-ES -Dtests.timezone=ART`
This method fails when a randomized string value contains a double-quote. This commit changes the method so that it is not based on string concatenation anymore. It now use XContentGenerator & XContentParser to mutate the valid queries.
Related #19864
This change adds a special field named _none_ that allows to disable the retrieval of the stored fields in a search request or in a TopHitsAggregation.
To completely disable stored fields retrieval (including disabling metadata fields retrieval such as _id or _type) use _none_ like this:
````
POST _search
{
"stored_fields": "_none_"
}
````
Deprecates the optimize_bbox parameter on geodistance queries. This has no longer been needed since version 2.2 because lucene geo distance queries (postings and LatLonPoint) already optimize by bounding box.
This change converts AllocationDecider registration from push based on
ClusterModule to implementing with a new ClusterPlugin interface.
AllocationDecider instances are allowed to use only Settings and
ClusterSettings.
Adds a class that records changes made to RoutingAllocation, so that at the end of the allocation round other values can be more easily derived based on these changes. Most notably, it:
- replaces the explicit boolean flag that is passed around everywhere to denote changes to the routing table. The boolean flag is automatically updated now when changes actually occur, preventing issues where it got out of sync with actual changes to the routing table.
- records actual changes made to RoutingNodes so that primary term and in-sync allocation ids, which are part of index metadata, can be efficiently updated just by looking at the shards that were actually changed.
In addition to be an allocation decider, DiskThresholdDecider also
monitors the used disk in order to trigger a reroute when the thresholds
are crossed. This change splits out the settings for disk thresholds
into DiskThresholdSettings, and moves the monitoring to a new
DiskThresholdMonitor. DiskThresholdDecider is then in line with other
allocation deciders, needing only Settings and ClusterSettings for
construction, which will allow deguicing allocation deciders.
`LobObtainFailedException` should be reserved for on-disk locks that
Lucene attempts (like `write.lock`). This switches our in-memory
semaphore locks for shards to use a different exception. Additionally,
ShardLockObtainFailedException no longer subclasses IOException, since
no IO is being done is this case.
Resolves#19978
As the most complicated `FetchSubPhase` highlighting gets its own package
(`o.e.seach.fetch.subphase.highlight`. No other `FetchSubPhase`s get their
own package. Instead they all reside together in `o.e.search.fetch.subphase`.
Add package descriptions to `o.e.search.fetch` and subpackages.
This commit defaults the max local storage nodes to one. The motivation
for this change is that a default value greather than one is dangerous
as users sometimes end up unknowingly starting a second node and start
thinking that they have encountered data loss.
Relates #19964
This commit separates the description of the links in the network that are to be disrupted from the failure that is to be applied to the links (disconnect/unresponsive/delay). Previously we had subclasses for the various kind of network disruption schemes combining on one hand failure mode (disconnect/unresponsive/delay) as well as the network links to cut (two partitions / bridge partitioning) into a single class.
I also reduced the visibility of a couple classes and renamed/consolidated some
test classes for consistency, eg. removing the `Simple` prefix or using the
`<Type>FieldMapperTests` convention for testing field mappers.
testUnknownObjectException used to generate malformed json objects in some cases, due to the existence of arrays as it was not closing the injected object correctly. That is why the test was catching JsonParseException among the exception that are expected to be thrown. That is fixed by tracking where the new object is placed and placing its end object marker to the right level rather than always at the end.
Also introduced a mechanism to explicitly declare objects that won't cause any exception when they get additional objects injected, so that there is no need to override the method anymore as that caused copy pasting of the whole test method. This also makes sure that changes are reflected in tests, as those inner objects are not skipped but we actually check that what is declared is true (no exceptions get thrown when an additional object is added within them.
When compiling many dynamically changing scripts, parameterized
scripts (<https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-using.html#prefer-params>)
should be preferred. This enforces a limit to the number of scripts that
can be compiled within a minute. A new dynamic setting is added -
`script.max_compilations_per_minute`, which defaults to 15.
If more dynamic scripts are sent, a user will get the following
exception:
```json
{
"error" : {
"root_cause" : [
{
"type" : "circuit_breaking_exception",
"reason" : "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead",
"bytes_wanted" : 0,
"bytes_limit" : 0
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "i",
"node" : "a5V1eXcZRYiIk8lecjZ4Jw",
"reason" : {
"type" : "general_script_exception",
"reason" : "Failed to compile inline script [\"aaaaaaaaaaaaaaaa\"] using lang [painless]",
"caused_by" : {
"type" : "circuit_breaking_exception",
"reason" : "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead",
"bytes_wanted" : 0,
"bytes_limit" : 0
}
}
}
],
"caused_by" : {
"type" : "general_script_exception",
"reason" : "Failed to compile inline script [\"aaaaaaaaaaaaaaaa\"] using lang [painless]",
"caused_by" : {
"type" : "circuit_breaking_exception",
"reason" : "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead",
"bytes_wanted" : 0,
"bytes_limit" : 0
}
}
},
"status" : 500
}
```
This also fixes a bug in `ScriptService` where requests being executed
concurrently on a single node could cause a script to be compiled
multiple times (many in the case of a powerful node with many shards)
due to no synchronization between checking the cache and compiling the
script. There is now synchronization so that a script being compiled
will only be compiled once regardless of the number of concurrent
searches on a node.
Relates to #19396
Some random operations were conditionally performed in the before test, which made tests not repeatable. For instance take the seed chain to repeat a specific iteration and try to reproduce it, this conditional code would get executed in both cases when trying to isolate the failure, but not among the different iterations (as only the first method/iteration executes it), hence the failure will not reproduce.
Moved the random operations to beforeClass and left the non random part in the before method, which is needed as it depends on some method that can be overridden by subclasses.
This commit cleans up indices in a snapshot repository when all
snapshots containing the index are all deleted. Previously, empty
indices folders would lay around after all snapshots containing
them were deleted.
Adds `warnings` syntax to the yaml test that allows you to expect
a `Warning` header that looks like:
```
- do:
warnings:
- '[index] is deprecated'
- quotes are not required because yaml
- but this argument is always a list, never a single string
- no matter how many warnings you expect
get:
index: test
type: test
id: 1
```
These are accessible from the docs with:
```
// TEST[warning:some warning]
```
This should help to force you to update the docs if you deprecate
something. You *must* add the warnings marker to the docs or the build
will fail. While you are there you *should* update the docs to add
deprecation warnings visible in the rendered results.
AbstractQueryTestCase parses the main version of the query in strict mode, meaning that it will fail if any deprecated syntax is used. It should do the same for alternate versions (e.g. short versions). This is the way it is because the two alternate versions for ids query are both deprecated. Moved testing for those to a specific test method that isolates the deprecations and actually tests that the two are deprecated.
Our parsing code accepted up until now queries in the following form (note that the query starts with `[`:
```
{
"bool" : [
{
"must" : []
}
]
}
```
This would lead to a null pointer exception as most parsers assume that the field name ("must" in this example) is the first thing that can be found in a query if its json is valid, hence always non null while parsing. Truth is that the additional array layer doesn't make the json invalid, hence the following code fragment would cause NPE within ParseField, because null gets passed to `parseContext.isDeprecatedSetting`:
```
if (token == XContentParser.Token.FIELD_NAME) {
currentFieldName = parser.currentName();
} else if (parseContext.isDeprecatedSetting(currentFieldName)) {
// skip
} else if (token == XContentParser.Token.START_OBJECT) {
```
We could add null checks in each of our parsers in lots of places, but we rely on `currentFieldName` being non null in all of our parsers, and we should consider it a bug when these unexpected situations are not caught explicitly. It would be best to find a way to prevent such queries altogether without changing all of our parsers.
The reason why such a query goes through is that we've been allowing a query to start with either `[` or `{`. The only reason I found is that we accept `match_all : []`. This seems like an undocumented corner case that we could drop support for. Then we can be stricter and accept only `{` as start token of a query. That way the only next token that the parser can encounter if the json is valid (otherwise the json parser would barf earlier) is actually a field_name, hence the assumption that all our parser makes hold.
The downside of this is simply dropping support for `match_all : []`
Relates to #12887
Old:
```
> Throwable #1: java.lang.AssertionError: expected [2xx] status code but api [reindex] returned [400 Bad Request] [{"error":{"root_cause":[{"type":"parsing_exception","reason":"[reindex] failed to parse field [dest]","line":1,"col":25}],"type":"parsing_exception","reason":"[reindex] failed to parse field [dest]","line":1,"col":25,"caused_by":{"type":"illegal_argument_exception","reason":"[dest] unknown field [asdfadf], parser not found"}},"status":400}]
> at __randomizedtesting.SeedInfo.seed([9325F8C5C6F227DD:1B71C71F680E4A25]:0)
> at org.elasticsearch.test.rest.yaml.section.DoSection.execute(DoSection.java:119)
> at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:309)
> at java.lang.Thread.run(Thread.java:745)
```
New:
```
> Throwable #1: java.lang.AssertionError: Failure at [reindex/10_basic:12]: expected [2xx] status code but api [reindex] returned [400 Bad Request] [{"error":{"root_cause":[{"type":"parsing_exception","reason":"[reindex] failed to parse field [dest]","line":1,"col":25}],"type":"parsing_exception","reason":"[reindex] failed to parse field [dest]","line":1,"col":25,"caused_by":{"type":"illegal_argument_exception","reason":"[dest] unknown field [asdfadf], parser not found"}},"status":400}]
> at __randomizedtesting.SeedInfo.seed([444DEEAF47322306:CC19D175E9CE4EFE]:0)
> at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:329)
> at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:309)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.AssertionError: expected [2xx] status code but api [reindex] returned [400 Bad Request] [{"error":{"root_cause":[{"type":"parsing_exception","reason":"[reindex] failed to parse field [dest]","line":1,"col":25}],"type":"parsing_exception","reason":"[reindex] failed to parse field [dest]","line":1,"col":25,"caused_by":{"type":"illegal_argument_exception","reason":"[dest] unknown field [asdfadf], parser not found"}},"status":400}]
> at org.elasticsearch.test.rest.yaml.section.DoSection.execute(DoSection.java:119)
> at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:325)
> ... 37 more
```
Sorry for the longer stack trace, but I wanted to be sure I didn't throw
anything away by accident.
Currently any code that wants to added NamedWriteables to the
NamedWriteableRegistry can do so via guice injection of the registry,
and registering at construction time. However, this makes the registry
complex: it has both get and register methods synchronized, and there is
likely contention on the read side from multiple threads. The
registration has mostly already been contained to guice modules at node
construction time.
This change makes the registry immutable, taking all of the
NamedWriteable readers at construction time. It also allows plugins to
added arbitrary named writables that it may use in its own transport
actions.
conform with the requirements of the writeBlob method by
throwing a FileAlreadyExistsException if attempting to write
to a blob that already exists. This change means implementations
of BlobContainer should never overwrite blobs - to overwrite a
blob, it must first be deleted and then can be written again.
Closes#15579
After #13834 many tests that used Groovy scripts (for good or bad reason) in their tests have been moved in the lang-groovy module and the issue #13837 has been created to track these messy tests in order to clean them up.
The work started with #19280, #19302 and #19336 and this PR moves the remaining messy tests back in core, removes the dependency on Groovy, changes the scripts in order to use the mocked script engine, and change the tests to integration tests.
It also moves IndexLookupIT test back (even if it has good chance to be removed soon) and fixes its tests.
It also changes AbstractQueryTestCase to use custom script plugins in tests.
closes#13837
* Rename operation to result and reworking responses
* Rename DocWriteResponse.Operation enum to DocWriteResponse.Result
These are just easier to interpret names.
Closes#19664
This is cleanup work from #19566, where @nik9000 suggested trying to nuke the isCreated and isFound methods. I've combined nuking the two methods with removing UpdateHelper.Operation in favor of DocWriteResponse.Operation here.
Closes#19631.
This removes two packages, consolidating them into their parent package
and adds `package-info.java` files to describe all of the packages under
`org.elasticsearch.test.rest`.
The tests for authentication extend ESIntegTestCase and use a mock
authentication plugin. This way the clients don't have to worry about
running it. Sadly, that means we don't really have good coverage on the
REST portion of the authentication.
This also adds ElasticsearchStatusException, and exception on which you
can set an explicit status. The nice thing about it is that you can
set the RestStatus that it returns to whatever arbitrary status you like
based on the status that comes back from the remote system.
reindex-from-remote then uses it to wrap all remote failures, preserving
the status from the remote Elasticsearch or whatever proxy is between us
and the remove Elasticsearch.
This makes it obvious that these tests are for running the client yaml
suites. Now that there are other ways of running tests using the REST
client against a running cluster we can't go on calling the shared
client yaml tests "REST tests". They are rest tests, but they aren't
**the** rest tests.
This adds a header that looks like `Location: /test/test/1` to the
response for the index/create/update API. The requirement for the header
comes from https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.htmlhttps://tools.ietf.org/html/rfc7231#section-7.1.2 claims that relative
URIs are OK. So we use an absolute path which should resolve to the
appropriate location.
Closes#19079
This makes large changes to our rest test infrastructure, allowing us
to write junit tests that test a running cluster via the rest client.
It does this by splitting ESRestTestCase into two classes:
* ESRestTestCase is the superclass of all tests that use the rest client
to interact with a running cluster.
* ESClientYamlSuiteTestCase is the superclass of all tests that use the
rest client to run the yaml tests. These tests are shared across all
official clients, thus the `ClientYamlSuite` part of the name.
After #13834 many tests that used Groovy scripts (for good or bad reason) in their tests have been moved in the lang-groovy module and the issue #13837 has been created to track these messy tests in order to clean them up.
This commit moves more tests back in core, removes the dependency on Groovy, changes the scripts in order to use the mocked script engine, and change the tests to integration tests.
With #19140 we started persisting the node ID across node restarts. Now that we have a "stable" anchor, we can use it to generate a stable default node name and make it easier to track nodes over a restarts. Sadly, this means we will not have those random fun Marvel characters but we feel this is the right tradeoff.
On the implementation side, this requires a bit of juggling because we now need to read the node id from disk before we can log as the node node is part of each log message. The PR move the initialization of NodeEnvironment as high up in the starting sequence as possible, with only one logging message before it to indicate we are initializing. Things look now like this:
```
[2016-07-15 19:38:39,742][INFO ][node ] [_unset_] initializing ...
[2016-07-15 19:38:39,826][INFO ][node ] [aAmiW40] node name set to [aAmiW40] by default. set the [node.name] settings to change it
[2016-07-15 19:38:39,829][INFO ][env ] [aAmiW40] using [1] data paths, mounts [[ /(/dev/disk1)]], net usable_space [5.5gb], net total_space [232.6gb], spins? [unknown], types [hfs]
[2016-07-15 19:38:39,830][INFO ][env ] [aAmiW40] heap size [1.9gb], compressed ordinary object pointers [true]
[2016-07-15 19:38:39,837][INFO ][node ] [aAmiW40] version[5.0.0-alpha5-SNAPSHOT], pid[46048], build[473d3c0/2016-07-15T17:38:06.771Z], OS[Mac OS X/10.11.5/x86_64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_51/25.51-b03]
[2016-07-15 19:38:40,980][INFO ][plugins ] [aAmiW40] modules [percolator, lang-mustache, lang-painless, reindex, aggs-matrix-stats, lang-expression, ingest-common, lang-groovy, transport-netty], plugins []
[2016-07-15 19:38:43,218][INFO ][node ] [aAmiW40] initialized
```
Needless to say, settings `node.name` explicitly still works as before.
The commit also contains some clean ups to the relationship between Environment, Settings and Plugins. The previous code suggested the path related settings could be changed after the initial Environment was changed. This did not have any effect as the security manager already locked things down.
#19096 introduced a generic TCPTransport base class so we can have multiple TCP based transport implementation. These implementations can vary in how they respond internally to situations where we concurrently send, receive and handle disconnects and can have different exceptions. However, disconnects are important events for the rest of the code base and should be distinguished from other errors (for example, it signals TransportMasterAction that it needs to retry and wait for the a (new) master to come back). Therefore, we should make sure that all the implementations do the proper translation from their internal exceptions into ConnectTransportException which is used externally.
Similarly we should make sure that the transport implementation properly recognize errors that were caused by a disconnect as such and deal with them correctly. This was, for example, the source of a build failure at https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-intake/1080 , where a concurrency issue cause SocketException to bubble out of MockTcpTransport.
This PR adds a tests which concurrently simulates connects, disconnects, sending and receiving and makes sure the above holds. It also fixes anything (not much!) that was found it.
* rethrow script compilation exceptions into ingest configuration exceptions
* update readProcessor to rethrow any exception as an ElasticsearchException
We throw IOException, which is the exception that is going to be thrown in 99% of the cases. A more generic exception can happen, and if it is a runtime one we just let it bubble up as is, otherwise we wrap it into runtime one so that we don't require to catch Exception everywhere, which seems odd.
Also adjusted javadocs for all performRequest methods
With the introduction of the async client, ResponseException doesn't eagerly read the response body anymore into a string. That is better, but raised a problem in our REST tests infra: we were reading the response body twice, while it can only be consumed once. Introduced a RestTestResponseException that wraps a ResponseException and exposes the body which now gets read only once.
The new method accepts the usual parameters (method, endpoint, params, entity and headers) plus a response listener and an async response consumer. Shortcut methods are also added that don't require params, entity and the async response consumer optional.
There are a few relevant api changes as a consequence of the move to async client that affect sync methods:
- Response doesn't implement Closeable anymore, responses don't need to be closed
- performRequest throws Exception rather than just IOException, as that is the the exception that we get from the FutureCallback#failed method in the async http client
- ssl configuration is a bit simpler, one only needs to call setSSLStrategy from a custom HttpClientConfigCallback, that doesn't end up overridng any other default around connection pooling (it used to happen with the sync client and make ssl configuration more complex)
Relates to #19055
We used to mutate it as part of building the aggregation. That
caused assertVersionSerializable to fail because it assumes that
requests aren't mutated after they are sent.
Closes#19481
The `client/transport` project adds a new jar build project that
pulls in all dependencies and configures all required modules.
Preinstalled modules are:
* transport-netty
* lang-mustache
* reindex
* percolator
The `TransportClient` classes are still in core
while `TransportClient.Builder` has only a protected construcutor
such that users are redirected to use the new `TransportClientBuilder`
from the new jar.
Closes#19412
* Removed `Template` class and unified script & template parsing logic. Templates are scripts, so they should be defined as a script. Unless there will be separate template infrastructure, templates should share as much code as possible with scripts.
* Removed ScriptParseException in favour for ElasticsearchParseException
* Moved TemplateQueryBuilder to lang-mustache module because this query is hard coded to work with mustache only
Before returning, index creation now waits for the configured number
of shard copies to be started. In the past, a client would create an
index and then potentially have to check the cluster health to wait
to execute write operations. With the cluster health semantics changing
so that index creation does not cause the cluster health to go RED,
this change enables waiting for the desired number of active shards
to be active before returning from index creation.
Relates #9126
Also introduced a `Processor.Parameters` class that is holder for several services processors rely on,
the IngestPlugin#getProcessors(...) method has been changed to accept `Processor.Parameters` instead
of each service seperately.
This commit renames the Netty 3 transport module from transport-netty to
transport-netty3. This is to make room for a Netty 4 transport module,
transport-netty4.
Relates #19439
Today `node.mode` and `node.local` serve almost the same purpose, they
are a shortcut for `discovery.type` and `transport.type`. If `node.local: true`
or `node.mode: local` is set elasticsearch will start in _local_ mode which means
only nodes within the same JVM are discovered and a non-network based transport
is used. The _local_ mode it only really used in tests or if nodes are embedded.
For both, embedding and tests explicit configuration via `discovery.type` and `transport.type`
should be preferred.
This change removes all the usage of these settings and by-default doesn't
configure a default transport implemenation since netty is now a module. Yet, to make
the user expericence flawless, plugins or modules can set a `http.type.default` and
`transport.type.default`. Plugins set this via `PluginService#additionalSettings()`
which enforces _set-once_ which prevents node startup if set multiple times. This means
that our distributions will just startup with netty transport since it's packaged as a
module unless `transport.type` or `http.transport.type` is explicitly set.
This change also found a bunch of bugs since several NamedWriteables were not registered if a
transport client is used. Now that we don't rely on the `node.mode` leniency which is inherited
instead of using explicit settings, `TransportClient` uses `AssertingLocalTransport` which detects these problems since it serializes all messages.
Closes#16234
This moves all netty related code into modules/transport-netty the module is build as a zip file as well as a JAR to serve as a dependency for transport client. For the time being this is required otherwise we have no network based impl. for transport client users. This might be subject to change given that we move forward http client.
Some tests still start http implicitly or miss configuring the transport clients correctly.
This commit fixes all remaining tests and adds a depdenceny to `transport-netty` from
`qa/smoke-test-http` and `modules/reindex` since they need an http server running on the nodes.
This also moves all required permissions for netty into it's module and out of core.
That exception is currently serialized as its current base class IllegalStateException which confuses code supposed to deal with the stepping down of a master. This is an important exception and we should be able to serialize it correctly. This commit fixes it by moving the exception to inherit from ElasticsearchException and properly register it.
As a bonus I adapted CapturingTransport to properly simulate serialized exceptions.
The callback replaces the ability to fully replace the http client instance. By doing that, one used to lose any default that the RestClient had set for the underlying http client. Given that you'd usually override one or two things only, like a couple of timeout values, the ssl factory or the default credentials providers, it is not uder friendly if by doing that users end up replacing the whole http client instance and lose any default set by us.
Switches most search behavior extensions from push (`onModule(SearchModule)`)
to pull (`implements SearchPlugin`). This effort in general gives plugin
authors a much cleaner view of how to extend Elasticsearch and starts to
set up portions of Elasticsearch as "the plugin API". This commit in
particular does that for search-time behavior like customized suggesters,
highlighters, score functions, and significance heuristics.
It also switches most such customization to being done at search module
construction time which is much, much easier to reason about from a testing
perspective. It also helps significantly in the process of de-guice-ing
Elasticsearch's startup.
There are at least two major search time extensions that aren't covered in
this commit that will simply have to wait for the next commit on the topic
because this one has already grown large: custom aggregations and custom
queries. These will likely live in the same SearchPlugin interface as well.
This change adds a createComponents() method to Plugin implementations
which they can use to return already constructed componenents/services.
Eventually this should be just services ("components" don't really do
anything), but for now it allows any object so that preconstructed
instances by plugins can still be bound to guice. Over time we should
add basic services as arguments to this method, but for now I have left
it empty so as to not presume what is a necessary service.
If the allocation decision for a primary shard was NO, this should
cause the cluster health for the shard to go RED, even if the shard
belongs to a newly created index or is part of cluster recovery.
Relates #9126
Previously, index creation would momentarily cause the cluster health to
go RED, because the primaries were still being assigned and activated.
This commit ensures that when an index is created or an index is being
recovered during cluster recovery and it does not have any active
allocation ids, then the cluster health status will not go RED, but
instead be YELLOW.
Relates #9126
this commit moves the most of the http related integ tests out into it's own
`qa/smoke-test-http` project where most of the test can run against the external cluster.
Today we have a bunch of tests that use netty transport for several reasons
these tests use it because they need to run some tcp based transport. Yet, this
couples our tests tightly to the netty implementation which should be tested on it's own.
This change adds a plain socket based blocking TcpTransport implementation that is used by
default in tests if local transport is suppressed or if network is selected.
It also adds another tcp network implementation as a showcase how the interface works.
Today when a thread encounters a fatal unrecoverable error that
threatens the stability of the JVM, Elasticsearch marches on. This
includes out of memory errors, stack overflow errors and other errors
that leave the JVM in a questionable state. Instead, the Elasticsearch
JVM should die when these errors are encountered. This commit causes
this to be the case.
Relates #19272
This commit moves back some messy tests that have been placed in lang-groovy module in https://github.com/elastic/elasticsearch/pull/13834. It removes the dependency on Groovy plugin as well as change back the tests to integration tests (IT suffix).
It also changes the current MockScriptEngine and MockScriptPlugin to make it easier to use.
* master: (192 commits)
[TEST] Fix rare OBOE in AbstractBytesReferenceTestCase
Reindex from remote
Rename writeThrowable to writeException
Start transport client round-robin randomly
Reword Refresh API reference (#19270)
Update fielddata.asciidoc
Fix stored_fields message
Add missing footer notes in mapper size docs
Remote BucketStreams
Add doc values support to the _size field in the mapper-size plugin
Bump version to 5.0.0-alpha5.
Update refresh.asciidoc
Update shrink-index.asciidoc
Change Debian repository for Vagrant debian-8 box
[TEST] fix test to account for internal empyt reference optimization
Upgrade to netty 3.10.6.Final (#19235)
[TEST] fix histogram test when extended bounds overlaps data
Remove redundant modifier
Simplify TcpTransport interface by reducing send code to a single send method (#19223)
Fix style violation in InstallPluginCommand.java
...
This adds a remote option to reindex that looks like
```
curl -POST 'localhost:9200/_reindex?pretty' -d'{
"source": {
"remote": {
"host": "http://otherhost:9200"
},
"index": "target",
"query": {
"match": {
"foo": "bar"
}
}
},
"dest": {
"index": "target"
}
}'
```
This reindex has all of the features of local reindex:
* Using queries to filter what is copied
* Retry on rejection
* Throttle/rethottle
The big advantage of this version is that it goes over the HTTP API
which can be made backwards compatible.
Some things are different:
The query field is sent directly to the other node rather than parsed
on the coordinating node. This should allow it to support constructs
that are invalid on the coordinating node but are valid on the target
node. Mostly, that means old syntax.
This commit renames writeThrowable to writeException. The situation here
stems from the fact that the StreamOutput method for serializing
Exceptions needs to accept Throwables too as Throwables can be the cause
of serialized Exceptions. Yet, we do not serialize Throwables in the
Error sub-hierarchy in a way that they can be deserialized into their
initial type. This leads to an asymmetry in the StreamOutput method for
serializing Exceptions and the StreamInput method for writing
Excpetions. Namely, the former will accept Throwables but the latter
will only return Exceptions. A goal with the stream methods has always
been symmetry in the method names so that serialization/deserialization
routines appear symmetrical in code. It is this asymmetry on the
input/output types for Exceptions on StreamOutput/StreamInput that
clashes with the desired symmetry of naming. Despite this, we should
favor symmetry in the naming of the methods. This commit renames
StreamOutput#writeThrowable to StreamOutput#writeException which leaves
us with Exception StreamInput#readException and void
StreamOutput#writeException(Throwable).
Node IDs are currently randomly generated during node startup. That means they change every time the node is restarted. While this doesn't matter for ES proper, it makes it hard for external services to track nodes. Another, more minor, side effect is that indexing the output of, say, the node stats API results in creating new fields due to node ID being used as keys.
The first approach I considered was to use the node's published address as the base for the id. We already [treat nodes with the same address as the same](https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/discovery/zen/NodeJoinController.java#L387) so this is a simple change (see [here](https://github.com/elastic/elasticsearch/compare/master...bleskes:node_persistent_id_based_on_address)). While this is simple and it works for probably most cases, it is not perfect. For example, if after a node restart, the node is not able to bind to the same port (because it's not yet freed by the OS), it will cause the node to still change identity. Also in environments where the host IP can change due to a host restart, identity will not be the same.
Due to those limitation, I opted to go with a different approach where the node id will be persisted in the node's data folder. This has the upside of connecting the id to the nodes data. It also means that the host can be adapted in any way (replace network cards, attach storage to a new VM). I
It does however also have downsides - we now run the risk of two nodes having the same id, if someone copies clones a data folder from one node to another. To mitigate this I changed the semantics of the protection against multiple nodes with the same address to be stricter - it will now reject the incoming join if a node exists with the same id but a different address. Note that if the existing node doesn't respond to pings (i.e., it's not alive) it will be removed and the new node will be accepted when it tries another join.
Last, and most importantly, this change requires that *all* nodes persist data to disk. This is a change from current behavior where only data & master nodes store local files. This is the main reason for marking this PR as breaking.
Other less important notes:
- DummyTransportAddress is removed as we need a unique network address per node. Use `LocalTransportAddress.buildUnique()` instead.
- I renamed `node.add_lid_to_custom_path` to `node.add_lock_id_to_custom_path` to avoid confusion with the node ID which is now part of the `NodeEnvironment` logic.
- I removed the `version` paramater from `MetaDataStateFormat#write` , it wasn't really used and was just in the way :)
- TribeNodes are special in the sense that they do start multiple sub-nodes (previously known as client nodes). Those sub-nodes do not store local files but derive their ID from the parent node id, so they are generated consistently.
Today throughout the codebase, catch throwable is used with reckless
abandon. This is dangerous because the throwable could be a fatal
virtual machine error resulting from an internal error in the JVM, or an
out of memory error or a stack overflow error that leaves the virtual
machine in an unstable and unpredictable state. This commit removes
catch throwable from the codebase and removes the temptation to use it
by modifying listener APIs to receive instances of Exception instead of
the top-level Throwable.
Relates #19231
The only reason for LifecycleComponent taking a generic type was so that
it could return that type on its start and stop methods. However, this
chaining has no practical necessity. Instead, start and stop can be
void, and a whole bunch of confusing generics disappear.
This allowes embedding stash keys in string like `t${key}est`. This
allows simple string concatenation like acitons.
The test for this is in `ObjectPathTests` because `Stash` doesn't seem
to have a test on its own and it is simple enough to test embedded
stashes this way. And this is a way I expect them to be used eventually.
BytesReference should be a really simple interface, yet it has a gazillion
ways to achieve the same this. Methods like `#hasArray`, `#toBytesArray`, `#copyBytesArray`
`#toBytesRef` `#bytes` are all really duplicates. This change simplifies the interface
dramatically and makes implementations of it much simpler. All array access has been removed
and is streamlined through a single `#toBytesRef` method. Utility methods to materialize a
compact byte array has been added too for convenience.
When we introduced docs testing we added a special case for $body in Stash, so that the last stashed body could be evaluated, and expressions like "$body.took" could be extracted out of it. We can instead do that for any object in the stash, by simply wrapping the internal map in an ObjectPath instance. We can then drop the special stashResponse method and go back to using the ordinary stashValue too.
The downside of this change is that it adds a feature that may not be supported by other REST test runners, namely the evaluation of compouned paths from the stash. If we have "object" stashed as an object, it is now possible to extract directly each subobject of it as well e.g. "object.subobject.field1". None of the current REST tests rely on this, but our docs snippets tests do.
No need to match against yaml responses via regexes in REST tests, yaml responses can be properly parsed via ObjectPath instead. Few REST tests need to be updated accordingly.
We are going to parse the body anyways whenever it's in json format as it is going to be stashed. It is not useful to lazily parse it anymore. Also this allows us to not rely on automatic detection of the xcontent type based on the content of the response, but rather read the content type from the response headers.
ObjectPath used a Map up until now for the internal representation of its navigable object. That works in most of the cases, but there could also be an array as root object, in which case a List needs to be used instead of a Map. This commit changes the internal representation of the object to Object which can either be a List or a Map. The change is minimal as ObjectPath already had the checks in place to verify the type of the object in the current position and navigate through it.
Note: The new test added to ObjectPathTest uses yaml format explicitly as auto-detection of json format works only for a json object that starts with '{', not if the root object is actually an array and starts with '['.
The internal representation of the object that JsonPath gives access to is a map. That is independent of the initial input format, which is json but could also be yaml etc.
This commit renames JsonPath to ObjectPath and adds a static method to create an ObjectPath from an XContent
We introduced a special response_body assertion to test our docs snippets. The match assertion does the same job though and can be reused and adapted where needed. ResponseBodyAssertion contains provides much better and accurate errors though, which can be now utilized in MatchAssertion so that many more REST tests can benefit from readable error messages.
Each response body gets always stashed and can be retrieved for later evaluations already. Instead of providing the response body as strings that get parsed to json objects separately, then converted to maps as ResponseBodyAssertion did, we parse everything once, the json is part of the yaml test, which is supported. The only downside is that json comments cannot be used, rather yaml comments should be used (// C style vs # ). There were only two docs tests that were using comments in ingest-node.asciidoc where I went ahead and remove the comments which didn't seem that useful anyways.
Raise IOException on deleteBlob if the blob doesn't exist
This commit raises an IOException on BlobContainer#deleteBlob
if the blob does not exist, in conformance with the BlobContainer
interface contract. Each implementation of BlobContainer now
conforms to this contract (file system, S3, Azure, HDFS). This
commit also contains blob container tests for each of the
repository implementations.
Closes#18530
We'll migrate to NamedWriteable so we can share code with the rest
of the system. So we can work on this in multiple pull requests without
breaking Elasticsearch in between the commits this change supports
*both* old style `InternalAggregations.stream` serialization and
`NamedWriteable` style serialization. As such it creates about a
half dozen `// NORELEASE` comments that will have to be removed
once the migration is complete.
This also introduces a boolean `transportClient` flag to `SearchModule`
which is used to skip inappropriate registrations for for the
transport client while still registering the things it needs. In
this case that means that the `InternalAggregation` subclasses are
registered with the `NamedWriteableRegistry` but the `AggregationBuilder`
subclasses are not.
Finally, this moves aggregation registration from guice configuration
time to `SearchModule` construction time. This will make it simpler to
work with in the future as we further clean up Elasticsearch's
extension points.
We have long worked to capture different partitioning scenarios in our testing infra. This PR adds a new variant, inspired by the Jepsen blogs, which was forgotten far - namely a partition where one node can still see and be seen by all other nodes. It also updates the resiliency page to better reflect all the work that was done in this area.
This commits adds support for a `teardown` section that can be defined in REST tests to
clean up any items that may have been created by the test and are not cleaned up by
deletion of indices and templates.
Today we have a ton of logic inside the NettyTransport* codebase. The footprint
of the code that has a direct netty dependency is large and alternative implementations
are pretty hard today since they need to know all about our proticol etc.
This change moves most of the code into TCPTransport* baseclasses and moves all
the protocol send code together. The base classes now contain the majority of the logic
while NettyTransport* classes remain to implement the glue code, configuration and optimization.
The factory for ingest processor is generic, but that is only for the
return type of the create mehtod. However, the actual consumer of the
factories only cares about Processor, so generics are not needed.
This change removes the generic type from the factory. It also removes
AbstractProcessorFactory which only existed in order pull the optional
tag from config. This functionality is moved to the caller of the
factories in ConfigurationUtil, and the create method now takes the tag.
This allows the covariant return of the implementation to work with
tests not needing casts.
The ChannelBuffer interface today leaks into the BytesReference abstraction
which causes a hard dependency on Netty across the board. This chance moves
this dependency and all BytesReference -> ChannelBuffer conversion into
NettyUtlis and removes the abstraction leak on BytesReference.
This change also removes unused methods on the BytesReference interface
and simplifies access to internal pages.
The plan for persistent node ids ( #17811 ) is to tie the node identity to a file stored in it's data folders. As such it becomes important that nodes in our testing infra have better affinity with their data folders and that their data folders are not cleaned underneath them. The first is important because we fix the random seed used for node id generation (for reproducibility) and allowing the same node to use two different data folders causes two separate nodes to have the same id, which prevents the cluster from forming. The second is important, for example, where a full cluster restart / single node restart need to maintain node identity and wiping the data folders at the wrong moment prevents this.
Concretely this commit does the following:
1) Remove previous attempts to have data folder per role using a prefix. This wasn't effective as it was using the data paths settings which are only used for part of the runs. An attempt to completely separate the paths via the home dir failed due to assumptions made by index custom path about node data folder ordinal uniqueness (see #19076)
2) Change full cluster restarts to start up nodes in the same order their were first created in, only randomly swapping nodes with the same roles.
3) Change test cluster reset methods to first shutdown the unneeded nodes and then re-start the shared nodes that were shut down, so they'll reclaim their data folders.
4) Improve data folder wiping logic and make sure it wipes only folders of "offline" nodes.
5) Add some very basic tests
This commit modifies TimeValue parsing to keep the input time unit. This
enables round-trip parsing from instances of String to instances of
TimeValue and vice-versa. With this, this commit removes support for the
unit "w" representing weeks, and also removes support for fractional
values of units (e.g., 0.5s).
Relates #19102
#18938 has changed the timing in which we send out to nodes to fetch their shard stores. Instead of doing this after the cluster state resulting of the node's join was published, #18938 made it be sent concurrently to the publishing processes. This revealed a couple of points where the shard store fetching is dependent of the current state of affairs of the cluster state, both on the master and the data nodes. The problem discovered were already present without #18938 but required a failure/extreme situations to make them happen.This PR tries to remove as much as possible of these dependencies making shard store fetching simpler and make the way to re-introduce #18938 which was reverted.
These are the notable changes:
1) Allow TransportNodesAction (of which shard store fetching is derived) callers to supply concrete disco nodes, so it won't need the cluster state to resolve them. This was a problem because the cluster state containing the needed nodes was not yet made available through ClusterService. Note that long term we can expect the rest layer to resolve node ids to concrete nodes, making this mode the only one needed.
2) The data node relied on the cluster state to have the relevant index meta data so it can find data when custom paths are used. We now fall back to read the meta data from disk if needed.
3) The data node was relying on it's own IndexService state to indicate whether the data it has corresponds to an existing allocation. This is of course something it can not know until it got (and processed) the new cluster state from the master. This flag in the response is now removed. This is not a problem because we used that flag to protect against double assigning of a shard to the same node, but we are already protected from it by the allocation deciders.
4) I removed the redundant filterNodeIds method in TransportNodesAction - if people want to filter they can override resolveRequest.
Instead of plugins calling `registerTokenizer` to extend the analyzer
they now instead have to implement `AnalysisPlugin` and override
`getTokenizer`. This lines up extending plugins in with extending
scripts. This allows `AnalysisModule` to construct the `AnalysisRegistry`
immediately as part of its constructor which makes testing anslysis
much simpler.
This also moves the default analysis configuration into `AnalysisModule`
which is how search is setup.
Like `ScriptModule`, `AnalysisModule` no longer extends `AbstractModule`.
Instead it is only responsible for building `AnslysisRegistry`. We still
bind `AnalysisRegistry` but we only do so in `Node`. This is means it
is available at module construction time so we slowly remove the need to
bind it in guice.
* master: (416 commits)
docs: removed obsolete information, percolator queries are not longer loaded into jvm heap memory.
Upgrade JNA to 4.2.2 and remove optionality
[TEST] Increase timeouts for Rest test client (#19042)
Update migrate_5_0.asciidoc
Add ThreadLeakLingering option to Rest client tests
Add a MultiTermAwareComponent marker interface to analysis factories. #19028
Attempt at fixing IndexStatsIT.testFilterCacheStats.
Fix docs build.
Move templates out of the Search API, into lang-mustache module
revert - Inline reroute with process of node join/master election (#18938)
Build valid slices in SearchSourceBuilderTests
Docs: Convert aggs/misc to CONSOLE
Docs: migration notes for _timestamp and _ttl
Group client projects under :client
[TEST] Add client-test module and make client tests use randomized runner directly
Move upgrade test to upgrade from version 2.3.3
Tasks: Add completed to the mapping
Fail to start if plugin tries broken onModule
Remove duplicated read byte array methods
Rename `fields` to `stored_fields` and add `docvalue_fields`
...
This is the same as what Lucene does for its analysis factories, and we hawe
tests that make sure that the elasticsearch factories are in sync with
Lucene's. This is a first step to move forward on #9978 and #18064.
This commit moves template support out of the Search API to its own dedicated Search Template API in the lang-mustache module. It provides a new SearchTemplateAction that can be used to render templates before it gets delegated to the usual Search API. The current REST endpoint are identical, but the Render Search Template endpoint now uses the same Search Template API with a new "simulate" option. When this option is enabled, the Search Template API only renders template and returns immediatly, without executing the search.
Closes#17906
:client ---------> :client:rest
:client-sniffer -> :client:sniffer
:client-test ----> :client:test
This lines the client up with how we do things like modules and
plugins.
This changes adds a MapperPlugin interface which allows pull style
retrieval of mappers and metadata mappers added by plugins. For now, I
have kept the MapperRegistry, but this should be removed in the future
as it is just a silly container for 2 maps which could themselves be
passed around.
Makes ScriptModule just a plain class that manages building the
ScriptSettings and ScriptService from plugins. When we *need*
to bind ScriptService with guice we bind it in a lambda.
Significantly quiets the logging of the docs tests by:
1. Switching two log statements to debug level.
2. Only calling ESTestCase#afterIfFailed if the test failure wasn't
just assumptions being violated.
We pretended to be able to ackt like a different version node for so long it's
time to be honest and remove this ability. It's just confusing and where needed
and tested we should build dedicated extension points.
This commit introduce unit testing infrastructure to test replication operations using real index shards. This is infra is complementary to the full integration tests and unit testing of ReplicationOperation we already have. The new ESIndexLevelReplicationTestCase base makes it easier to test and simulate failure mode that require real shards and but do not need the full blow stack of a complete node.
The commit also add a simple "nothing is wrong" test plus a test that checks we don't drop docs during the various stages of recovery.
For now, only single doc indexing is supported but this can be easily extended in the future.
This change removes some unnecessary dependencies from ClusterService
and cleans up ClusterName creation. ClusterService is now not created
by guice anymore.
Today we have a push model for registering basically anything. All our extension points
are defined on modules which we pass in to plugins. This is harder to maintain and adds
unnecessary dependencies on the modules itself. This change moves towards a pull model
where the plugin offers a getter kind of method to get the extensions. This will also
help in the future if we need to pass dependencies to the extension points which can
easily be defined on the method as arguments if a pull model is used.
Registering a script engine or native scripts still uses Guice today
and is much more complicated than needed. This change moves to a pull
based model where script plugins have to implement a dedicated interface
`ScriptPlugin` and defines simple getter returning instances rather than
classes.
In 2.0 we added plugin descriptors which require defining a name and
description for the plugin. However, we still have name() and
description() which must be overriden from the Plugin class. This still
exists for classpath plugins. But classpath plugins are mainly for
tests, and even then, referring to classpath plugins with their class is
a better idea. This change removes name() and description(), replacing
the name for classpath plugins with the full class name.
With this commit we exclude certain HTTP requests that are needed to inspect the cluster
from HTTP request limiting to ensure these commands are processed even in critical
memory conditions.
Relates #17951, relates #18145, closes#18833
This will let things that don't depend on :test:framework like the
client use it.
Also skip initializing the classes we check because we don't care
about their initialization behavior because we're not executing them.
This makes the naming conventions check pretty close to instant
from a "human eye" perspective.
* master: (51 commits)
Switch QueryBuilders to new MatchPhraseQueryBuilder
Added method to allow creation of new methods on-the-fly.
more cleanups
Remove cluster name from data path
Remove explicit parallel new GC flag
rehash the docvalues in DocValuesSliceQuery using BitMixer.mix instead of the naive Long.hashCode.
switch FunctionRef over to methodhandles
ingest: Move processors from core to ingest-common module.
Fix some typos (#18746)
Fix ut
convert FunctionRef/Def usage to methodhandles.
Add the ability to partition a scroll in multiple slices. API:
use painless types in FunctionRef
Update ingest-node.asciidoc
compute functional interface stuff in Definition
Use method name in bootstrap check might fork test
Make checkstyle happy (add Lookup import, line length)
Don't hide LambdaConversionException and behave like real javac compiled code when a conversion fails. This works anyways, because fallback is allowed to throw any Throwable
Pass through the lookup given by invokedynamic to the LambdaMetaFactory. Without it real lambdas won't work, as their implementations are private to script class
checkstyle have your upper L
...
Folded grok processor into ingest-common module.
The rest tests have been moved to ingest-common module as well, because these tests don't run in the rest-api-spec module but in the distribution:integ-test-zip module
and adding a test plugin there felt just wrong to me. I think this is ok. I left a tiny ingest rest test behind in that tests with an empty pipeline.
Removed messy tests, these tests were already covered in the rest tests
Added ingest test plugin in test infra so that each module testing integration with ingest doesn't need write its own plugin
Moved reindex ingest tests to qa module
Closes#18490
This commit refactors the handling of thread pool settings so that the
individual settings can be registered rather than registering the top
level group. With this refactoring, individual plugins must now register
their own settings for custom thread pools that they need, but a
dedicated API is provided for this in the thread pool module. This
commit also renames the prefix on the thread pool settings from
"threadpool" to "thread_pool". This enables a hard break on the settings
so that:
- some of the settings can be given more sensible names (e.g., the max
number of threads in a scaling thread pool is now named "max" instead
of "size")
- change the soft limit on the number of threads in the bulk and
indexing thread pools to a hard limit
- the settings names for custom plugins for thread pools can be
prefixed (e.g., "xpack.watcher.thread_pool.size")
- remove dynamic thread pool settings
Relates #18674
* master: (184 commits)
Add back pending deletes (#18698)
refactor matrix agg documentation from modules to main agg section
Implement ctx.op = "delete" on _update_by_query and _reindex
Close SearchContext if query rewrite failed
Wrap lines at 140 characters (:qa projects)
Remove log file
painless: Add support for the new Java 9 MethodHandles#arrayLength() factory (see https://bugs.openjdk.java.net/browse/JDK-8156915)
More complete exception message in settings tests
Use java from path if JAVA_HOME is not set
Fix uncaught checked exception in AzureTestUtils
[TEST] wait for yellow after setup doc tests (#18726)
Fix recovery throttling to properly handle relocating non-primary shards (#18701)
Fix merge stats rendering in RestIndicesAction (#18720)
[TEST] mute RandomAllocationDeciderTests.testRandomDecisions
Reworked docs for index-shrink API (#18705)
Improve painless compile-time exceptions
Adds UUIDs to snapshots
Add test rethrottle test case for delete-by-query
Do not start scheduled pings until transport start
Adressing review comments
...
Global checkpoints are update by the primary and represent the common part of history across shard copies, as know at a given time. The primary is also in charge of periodically broadcast this information to the replicas. See #10708 for more details.
A RestClient instance is now created whenever EsIntegTestCase#getRestClient is invoked for the first time. It is then kept until the cluster is cleared (depending on the cluster scope of the test).
Renamed other two restClient methods to createRestClient, as that instance needs to be closed and managed in the tests.
We still have a wrapper called RestTestClient that is very specific to Rest tests, as well as RestTestResponse etc. but all the low level bits around http connections etc. are now handled by RestClient.
This commit adds a UUID for each snapshot, in addition to the already
existing repository and snapshot name. The addition of UUIDs will enable
more robust handling of the deletion of previous snapshots and lingering
files from partially failed delete operations, on top of being able to
uniquely track each snapshot.
Closes#18228
Relates #18156
Currently we support empty query clauses like the filter in
"constant_score" : { "filter" : { } }
How these clauses are handled depends on the surrounding query.
They later are either ignored, converted to match all or no documents or
passed up further in the query hierarchy. During parsing these claues are
currently represented as EmptyQueryBuilders. When not handled anywhere else,
these special cases need to be checked for on the shard when building the
lucene query.
This is trappy, so this PR changes the parsing of compound queries. Instead
of returning QueryBuilder, the core query parsing method
QueryShardContext#parseInnerQueryBuilder() now return an Optional which can
be empty in the case of empty query clauses. This has the advantage of forcing
callers to deal with this sooner or later. When encountering empty Optionals,
compound query builders now have the choice to ignore them, pass them on or
rewrite to a different query, depending on context.
This commit clarifies the behavior that must be adhered to by any
implementors of the BlobContainer interface. This is done through
expanded Javadocs.
Closes#18157Closes#15580
The page cache recycler has a dependency on thread pool that was there
for historical reasons but is no longer needed. This commit removes this
now unneeded dependency.
Relates #18664
This adds a low level primitive operations to shrink an existing
index into a new index with a single shard. This primitive expects
all shards of the source index to allocated on a single node. Once the target index is initializing on the shrink node it takes a snapshot of the source index shards and copies all files into the target indices data folder. An [optimization](https://issues.apache.org/jira/browse/LUCENE-7300) coming in Lucene 6.1 will also allow for optional constant time copy if hard-links are supported by the filesystem. All mappings are merged into the new indexes metadata once the snapshots have been taken on the merge node.
To shrink an existing index all shards must be moved to a single node (one instance of each shard) and the index must be read-only:
```BASH
$ curl -XPUT 'http://localhost:9200/logs/_settings' -d '{
"settings" : {
"index.routing.allocation.require._name" : "shrink_node_name",
"index.blocks.write" : true
}
}
```
once all shards are started on the shrink node. the new index can be created via:
```BASH
$ curl -XPUT 'http://localhost:9200/logs/_shrink/logs_single_shard' -d '{
"settings" : {
"index.codec" : "best_compression",
"index.number_of_replicas" : 1
}
}'
```
This API will perform all needed check before the new index is created and selects the shrink node based on the allocation of the source index. This call returns immediately, to monitor shrink progress the recovery API should be used since all copy operations are reflected in the recovery API with byte copy progress etc.
The shrink operation does not modify the source index, if a shrink operation should
be canceled or if the shrink failed, the target index can simply be deleted and
all resources are released.
This PR changes the InternalTestCluster to support dedicated master nodes. The creation of dedicated master nodes can be controlled using a new `supportsMasterNodes` parameter to the ClusterScope annotation. If set to true (the default), dedicated master nodes will randomly be used. If set to false, no master nodes will be created and data nodes will also be allowed to become masters. If active, test runs will either have 1 or 3 masternodes
This commit simplifies the delayed shard allocation implementation by assigning clear responsibilities to the various components that are affected by delayed shard allocation:
- UnassignedInfo gets a boolean flag delayed which determines whether assignment of the shard should be delayed. The flag gets persisted in the cluster state and is thus available across nodes, i.e. each node knows whether a shard was delayed-unassigned in a specific cluster state. Before, nodes other than the current master were unaware of that information.
- This flag is initially set as true if the shard becomes unassigned due to a node leaving and the index setting index.unassigned.node_left.delayed_timeout being strictly positive. From then on, unassigned shards can only transition from delayed to non-delayed, never in the other direction.
- The reroute step is in charge of removing the delay marker (comparing timestamp when node left to current timestamp).
- A dedicated service DelayedAllocationService, reacting to cluster change events, has the responsibility to schedule reroutes to remove the delay marker.
Closes#18293
Failing the build on deprecation warnings was removed in
19b3ec88af. This commit removes the
suppressed deprecation warnings so that their use is surfaced in the
build now.
Relates #18582
This is a leftover spot that wasn't changed. It was breaking
ClusterSettingsIT#ClusterSettingsIT because that test expected
the test's log level to default to the default logger level for
the nodes.
Significant changes:
* AbstractQueryTestCase has moved to the test framework module, in order for query builder tests in modules and plugins
* Added support to AbstractQueryTestCase to register plugins
* Lift the restriction that only one percolator could be added per index. This validation existed in MapperService, but because the percolator moved to a module it could no longer exist there. Instead of bringing it back it was removed. This validation existed since the percolator cache only supported one percolator query per document, since the percolator cache has been removed this restriction could removed as well.
* While moving percolator tests to the new module, also removed a couple of tests for the deprecated percolate and mpercolate api. These APIs are now sugar APIs for bwc and rediect to the searvh and msearvh APIs. Some tests were still testing as if percolate and mpercolate API did the percolation, but this no longer the case and these tests could be removed.
This now requires that system properties passed to Gradle must be in the form of "-Dtests.es.*" instead of
"-Des.*". It then chops off "tests.es." and passes that as a "-E" property to Elasticsearch.
Also changed system properties:
- `tests.logger.level` became `tests.es.logger.level`
- `node.mode` became `tests.es.node.mode`
- `node.local` became `tests.es.node.local`
The assertBusy method currently has both a Runnable and Callable
version. This has caused confusion with type inference and lambdas
sometimes, in particular with java 9. This change removes the callable
version as nothing was actually using it.
Rather than having one win against the other, reject duplicated apis. Also enforce the convention that see the api name have the same name as the name of the rest spec file that defines it.
* master: (158 commits)
Document the hack
Refactor property placeholder use of env. vars
Force java9 log4j hack in testing
Fix log4j buggy java version detection
Make java9 work again
Don't mkdir directly in deb init script
Fix env. var placeholder test so it's reproducible
Remove ScriptMode class in favor of boolean true/false
[rest api spec] fix doc urls
Netty request/response tracer should wait for send
Filter client/server VM options from jvm.options
[rest api spec] fix url for reindex api docs
Remove use of a Fields class in snapshot responses that contains x-content keys, in favor of declaring/using the keys directly.
Limit retries of failed allocations per index (#18467)
Proxy box method to use valueOf.
Use the build-in valueOf method instead of the custom one.
Fixed tests and added a comment to the box method.
Fix boxing.
Do not decode path when sending error
Fix race condition in snapshot initialization
...
This change makes ES compile with java9 again, build 118.
* There are a handful of changes due to failure to determine types during compile.
* The attachment plugins which use tika needed to have tika upgraded in order to pickup fixes there for java 9.
* azure discovery and s3 repository indirectly depend on jaxb, which is no longer in the default modules. They now add a jaxb dependency externally, and make JarHell allow for this package.
This removes the ScriptMode class entirely, which was an enum with two
options (ON and OFF) which essentially boiled down to true and false.
Now the boolean values are used instead.
Before 5.0 for it was required that the percolator queries were cached in jvm heap as Lucene queries for two reasons:
1) Performance. The percolator evaluated all percolator queries all the time. There was no pre-selecting queries that are likely to match like we have today.
2) Updates made to percolator queries were visible in realtime, Today these changes are visible in near realtime. So updating no longer requires the percolator to have the queries in jvm heap.
So having the percolator queries in jvm heap via the percolator cache is now less attractive. Especially when there are many percolator queries then these queries can consume many GBs of jvm heap.
Removing the percolator cache does make the percolate query slower compared to how the execution time in 5.0.0-alpha1 and alpha2, but it is still faster compared to 2.x and before.
Today when parsing settings during bootstrap, we add a system property
for every Elasticsearch setting. Additionally, settings can be set via
system properties. This commit simplifies this situation.
- settings are no longer propogated to system properties
- system properties can not be used to set settings
- the "es." prefix on settings is no longer required (nor permitted)
- test logging has a dedicated system property (tests.logger.level)
Relates #18198
We often require a random joda DateTimeZone in our tests. Currently
there are a few options for generating such a random DateTimeZone
from the set of available ids. Currently most random picks are not
really reproducable across different jvms because they rely on order
in the ids set implementation. The helper in DateProcessorFactoryTests
thus performs a sort on the set of ids before random picking from
the result, so I moved this to ESTestCase to make it publicly
available and changed all other tests to use that method.
This commit adds a variety of real disk metrics for the block devices
that back Elasticsearch data paths. A collection of statistics are read
from /proc/diskstats and are used to report the raw metrics for
operations and read/write bytes.
Relates #15915
This change does the following:
- Queries that are currently unsupported such as prefix queries on numeric
fields or term queries on geo fields now throw an error rather than returning
a query that does not match anything.
- Fuzzy queries on numeric, date and ip fields are now unsupported: they used
to create range queries, we now expect users to use range queries directly.
Fuzzy, regexp and prefix queries are now only supported on text/keyword
fields (including `_all`).
- The `_uid` and `_id` fields do not support prefix or range queries anymore as
it would prevent us to store them more efficiently in the future, eg. by
using a binary encoding.
Note that it is still possible to ignore these errors by using the `lenient`
option of the `match` or `query_string` queries.
* master: (904 commits)
Removes unused methods in the o/e/common/Strings class.
Add note regarding thread stack size on Windows
painless: restore accidentally removed test
Documented fuzzy_transpositions in match query
Add not-null precondition check in BulkRequest
Build: Make run task you full zip distribution
Build: More pom generation improvements
Add test for wrong array index
Take return type from "after" field.
painless: build descriptor of array and field load/store in code; fix array index to adapt type not DEF
Build: Add developer info to generated pom files
painless: improve exception stacktraces
painless: Rename the dynamic call site factory to DefBootstrap and make the inner class very short (PIC = Polymorphic Inline Cache)
Remove dead code.
Avoid race while retiring executors
Allow only a single extension for a scripting engine
Adding REST tests to ensure key_as_string behavior stays consistent
[test] Set logging to 11 on reindex test
[TEST] increase logger level until we know what is going on
Don't allow `fuzziness` for `multi_match` types cross_fields, phrase and phrase_prefix
...
Previously multiple extensions could be provided, however, this can lead
to confusion with on-disk scripts (ie, "foo.js" and "foo.javascript")
having different content. Only a single extension is now supported.
The only language currently supporting multiple extensions was the
Javascript engine ("js" and "javascript"). It now only supports the
`.js` extension.
Relates to #10598
This removes all the mentions of the sandbox from the script engine
services and permissions model. This means that the following settings
are no longer supported:
```yaml
script.inline: sandbox
script.stored: sandbox
```
Instead, only a `true` or `false` value can be specified.
Since this would otherwise break the default-allow parameter for
languages like expressions, painless, and mustache, all script engines
have been updated to have individual settings, for instance:
```yaml
script.engine.groovy.inline: true
```
Would enable all inline scripts for groovy. (they can still be
overridden on a per-operation basis).
Expressions, Painless, and Mustache all default to `true` for inline,
file, and stored scripts to preserve the old scripting behavior.
Resolves#17114
This makes it much easier to apply to other projects.
Fixes to doc tests infrastructure:
* Fix comparing lists. Was totally broken.
* Fix order of actual vs expected parameters.
* Allow multiple `// TESTRESPONSE` lines with substitutions to join
into one big list of subtitutions. This makes lets the docs look
tidier.
* Exclude build from snippet scanning
* Allow subclasses of ESRestTestCase access to the admin execution context
Most of the current implementations of BaseNodesResponse (plural Nodes) ignore FailedNodeExceptions.
- This adds a helper function to do the grouping to TransportNodesAction
- Requires a non-null array of FailedNodeExceptions within the BaseNodesResponse constructor
- Reads/writes the array to output
- Also adds StreamInput and StreamOutput methods for generically reading and writing arrays
The `ip` field uses a binary representation internally. This breaks when
rendering sort values in search responses since elasticsearch tries to write a
binary byte[] as an utf8 json string. This commit extends the `DocValueFormat`
API in order to give fields a chance to choose how to render values.
Closes#6077
Adds infrastructure so `gradle :docs:check` will extract tests from
snippets in the documentation and execute the tests. This is included
in `gradle check` so it should happen on CI and during a normal build.
By default each `// AUTOSENSE` snippet creates a unique REST test. These
tests are executed in a random order and the cluster is wiped between
each one. If multiple snippets chain together into a test you can annotate
all snippets after the first with `// TEST[continued]` to have the
generated tests for both snippets joined.
Snippets marked as `// TESTRESPONSE` are checked against the response
of the last action.
See docs/README.asciidoc for lots more.
Closes#12583. That issue is about catching bugs in the docs during build.
This catches *some* bugs in the docs during build which is a good start.
This commit introduces a handshake when initiating a light
connection. During this handshake, node information, cluster name, and
version are received from the target node of the connection. This
information can be used to immediately validate that the target node is
a member of the same cluster, and used to set the version on the
stream. This will allow us to extend APIs that are used during initial
cluster recovery without a major version change.
Relates #15971
This commit removes the method Strings#splitStringToArray and replaces
the call sites with invocations to String#split. There are only two
explanations for the existence of this method. The first is that
String#split is slightly tricky in that it accepts a regular expression
rather than a character to split on. This means that if s is a string,
s.split(".") does not split on the character '.', but rather splits on
the regular expression '.' which splits on every character (of course,
this is easily fixed by invoking s.split("\\.") instead). The second
possible explanation is that (again) String#split accepts a regular
expression. This means that there could be a performance concern
compared to just splitting on a single character. However, it turns out
that String#split has a fast path for the case of splitting on a single
character and microbenchmarks show that String#split has 1.5x--2x the
throughput of Strings#splitStringToArray. There is a slight behavior
difference between Strings#splitStringToArray and String#split: namely,
the former would return an empty array in cases when the input string
was null or empty but String#split will just NPE at the call site on
null and return a one-element array containing the empty string when the
input string is empty. There was only one place relying on this behavior
and the call site has been modified accordingly.
With this commit we compress HTTP responses provided the client
supports it (as indicated by the HTTP header 'Accept-Encoding').
We're also able to process compressed HTTP requests if needed.
The default compression level is lowered from 6 to 3 as benchmarks
have indicated that this reduces query latency with a negligible
increase in network traffic.
Closes#7309
Previously, we would determine index deletes in the cluster state by
comparing the index metadatas between the current cluster state and the
previous cluster state and decipher which ones were missing (the missing
ones are deleted indices). This led to a situation where a node that
went offline and rejoined the cluster could potentially cause dangling
indices to be imported which should have been deleted, because when a node
rejoins, its previous cluster state does not contain reliable state.
This commit introduces the notion of index tombstones in the cluster
state, where we are explicit about which indices have been deleted.
In the case where the previous cluster state is not useful for index
metadata comparisons, a node now determines which indices are to be
deleted based on these tombstones in the cluster state. There is also
functionality to purge the tombstones after exceeding a certain amount.
Closes#17265Closes#16358Closes#17435
This commit actually bounds the size of the generic thread pool. The
generic thread pool was of type cached, a thread pool with an unbounded
number of workers and an unbounded work queue. With this commit, the
generic thread pool is now of type scaling. As such, the cached thread
pool type has been removed. By default, the generic thread pool is
constructed with a core pool size of four, a max pool size of 128 and
idle workers can be reaped after a keep-alive time of thirty seconds
expires. The work queue for this thread pool remains unbounded.
When I pulled on the thread that is "Remove PROTOTYPEs from
SignificanceHeuristics" I ended up removing SignificanceHeuristicStreams
and replacing it with readNamedWriteable. That seems like a lot at once
but it made sense at the time. And it is what we want in the end, I think.
Anyway, this also converts registration of SignificanceHeuristics to
use ParseFieldRegistry to make them consistent with Queries, Aggregations
and lots of other stuff.
Adds a new and wonderous hack to support serialization checking of
NamedWriteables registered by plugins!
Related to #17085
This commit contains the following improvements/fixes:
1. Renaming method names and variables to better reflect the purpose
of the method and the semantics of the variable.
2. For deleting indexes, replace the closed parameter passed to the
delete index/store methods with obtaining the index's state from the
IndexSettings that is already passed in.
3. Added tests to the IndexWithShadowReplicaIT suite, some of which
show issues in the shadow replica delete process that are captured in
Github issue 17695.
Closes#17638
With this commit we limit the size of all in-flight requests on
HTTP level. The size is guarded by the same circuit breaker that
is also used on transport level. Similarly, the size that is used
is HTTP content length.
Relates #16011
With this commit we limit the size of all in-flight requests on
transport level. The size is guarded by a circuit breaker and is
based on the content size of each request.
By default we use 100% of available heap meaning that the parent
circuit breaker will limit the maximum available size. This value
can be changed by adjusting the setting
network.breaker.inflight_requests.limit
Relates #16011
We have a couple places in the code base that assume that search is always done
on the inverted index. However with the new points API in Lucene 6, this is not
true anymore. This commit makes MappedFieldType.indexedValueForSearch protected
and fixes call sites to keep working for field types that use the inverted
index and either work differently ar throw an exception otherwise. For instance,
it will still be possible to run cross_fields multi match queries on numeric
fields, but the score contributions will not be blended as well as before, and
significant terms aggregations on long terms will not be possible anymore since
points do not record document frequencies.
CBOR is natively supported in Elasticsearch and allows for byte arrays.
This means, that by using CBOR the user can prevent base64 conversions
for the data being sent back and forth.
This PR adds support to extract data from a byte array in addition to
a string. This also required to add a ByteArrayValueSource class.
Sometimes we get a test failure caused by search contexts left open.
The tests include a stack trace of the call that opened the context
but nothing else about the context. This adds more information about
the context that has been left open like what query it was running,
what shard it targeted, and whether or not it was a scroll.
Relates to #17582
We have both `Settings.settingsBuilder` and `Settings.builder` that do exactly
the same thing, so we should keep only one. I kept `Settings.builder` since it
has my preference but also it is the one that we use in examples of the Java API.
This removes the inconsistent output of IP addresses. The format was parsing-unfriendly and it makes it hard
to reason about API responses, such as to _nodes.
With this change in place, it will never print the hostname as part of the default format, which has the
added benefit that it can be used consistently for URIs, which was not the case when the hostname might
appear at the front with "hostname/ip:port".
Aggregations need to perform instanceof calls on MappedFieldType instances in
order to know how they should be parsed or formatted. Instead, we should let
the field types provide a formatter/parser that can can be used.
This change makes the root (/) rest api delegate to a transport action to get the
data for the response. This aligns this rest api with all of the other apis, which
delegate to one or more actions.
In doing this, unit tests were added to provide coverage of the RestMainAction
and the associated classes.
* master: (156 commits)
Make JNA calls optional
Added RPM metadata
Remove PROTOTYPE from MLT.Item
Remove PROTOTYPE from VersionType
Fix mistake in TopHits change
Remove PROTOTYPEs from highlighting
Clean up some log messages
Command line arguments with comma must be quoted on windows
Cluster Health should run on applied states, even if waitFor=0 #17440
ingest: make concrete processor impl final, like all other processor concrete impls.
Improve some test method comments.
Document task id's as string in the rest spec
Replace FieldStatsProvider with a method on MappedFieldType. #17334
cleanup test
Remove MathUtils. #17454
Addressing review comments
fix javadocs
Make TranslogConfig immutable and pass TranslogGeneration as a ctor arg to Translog
[reindex] Don't get rejected
Remove redundant commit - #openTranslog() already commits in that case
...
Move translog recover outside of the engine
We changed the way we manage engine memory buffers to an
open model where each shard can essentially has infinite memory.
The indexing memory controller is responsible for moving memory to disk
when it's needed. Yet, this doesn't work today when we recover from store/translog
since the engine is not fully initialized such that IMC has no access to the engine,
neither to it's memory buffer nor can it move data to disk.
The biggest issue here is that translog recovery happends inside the Engine constructor
which is problematic by itself since it might take minutes and uses a not yet fully
initialzied engine to perform write operations on.
This change detaches the translog recovery and makes it the responsibility of the caller
to run it once the engine is fully constructed or skip it if not necessary.
Currently our testing of parsing query builders is limited to the
default order of the parameters that each builders toXContent()
method produces. To better test real queries where the order of
parameters can be different, this change adds a helper
method to ESTestCase that takes a XContentBuilder and randomly
shuffles the order of the fields inside an object. This is
used in AbstractQueryTestCase, but it can be used in other similar
places in the future.
We changed the way we manage engine memory buffers to an
open model where each shard can essentially has infinite memory.
The indexing memory controller is responsible for moving memory to disk
when it's needed. Yet, this doesn't work today when we recover from store/translog
since the engine is not fully initialized such that IMC has no access to the engine,
neither to it's memory buffer nor can it move data to disk.
The biggest issue here is that translog recovery happends inside the Engine constructor
which is problematic by itself since it might take minutes and uses a not yet fully
initialzied engine to perform write operations on.
This change detaches the translog recovery and makes it the responsibility of the caller
to run it once the engine is fully constructed or skip it if not necessary.
* master: (25 commits)
Replication operation that try to perform the primary phase on a replica should be retried
split long line in ConvertProcessorTests
add type conversion support to ConvertProcessor
percolator: Make explain use the two phase iterator
test: make sure we don't flush during indexing the percolator queries
Added experimental annotation to the update-by-query and reindex docs
Fixed bad YAML in reindex REST test: 50_routing.yaml
Update-by-query rest tests: fixed bad yaml and deleted a client-dependent test
Prevents exception being raised when ordering by an aggregation which wasn't collected
The reindex body is now required, which changes the exception thrown by the REST test
Docs: Included Nodes Task API and tidied reindex/update-by-query
Rename update-by-query REST tests to update_by_query
REST: The body is required in the reindex API
The source parameter should not be defined in the delete-by-query REST spec
Renamed update-by-query REST spec to update_by_query
Fix test bug in TypeQueryBuilderTests.
Add comment why it is safe to check the number of nested fields in MapperService.merge.
Automatically add a sub keyword field to string dynamic mappings. #17188
Type filters should not have a performance impact when there is a single type. #17350
Add API to explain why a shard is or isn't assigned
...
If a terms aggregation was ordered by a metric nested in a single bucket aggregator which did not collect any documents (e.g. a filters aggregation which did not match in that term bucket) an ArrayOutOfBoundsException would be thrown when the ordering code tried to retrieve the value for the metric. This fix fixes all numeric metric aggregators so they return their default value when a bucket ordinal is requested which was not collected.
Closes#17225
This adds a new `/_cluster/allocation/explain` API that explains why a
shard can or cannot be allocated to nodes in the cluster. Additionally,
it will show where the master *desires* to put the shard, according to
the `ShardsAllocator`.
It looks like this:
```
GET /_cluster/allocation/explain?pretty
{
"index": "only-foo",
"shard": 0,
"primary": false
}
```
Though, you can optionally send an empty body, which means "explain the
allocation for the first unassigned shard you find".
The output when a shard is unassigned looks like this:
```
{
"shard" : {
"index" : "only-foo",
"index_uuid" : "KnW0-zELRs6PK84l0r38ZA",
"id" : 0,
"primary" : false
},
"assigned" : false,
"unassigned_info" : {
"reason" : "INDEX_CREATED",
"at" : "2016-03-22T20:04:23.620Z"
},
"nodes" : {
"V-Spi0AyRZ6ZvKbaI3691w" : {
"node_name" : "Susan Storm",
"node_attributes" : {
"bar" : "baz"
},
"final_decision" : "NO",
"weight" : 0.06666675,
"decisions" : [ {
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index include filters [foo:\"bar\"]"
} ]
},
"Qc6VL8c5RWaw1qXZ0Rg57g" : {
"node_name" : "Slipstream",
"node_attributes" : {
"bar" : "baz",
"foo" : "bar"
},
"final_decision" : "NO",
"weight" : -1.3833332,
"decisions" : [ {
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated on the same node id [Qc6VL8c5RWaw1qXZ0Rg57g] on which it already exists"
} ]
},
"PzdyMZGXQdGhqTJHF_hGgA" : {
"node_name" : "The Symbiote",
"node_attributes" : { },
"final_decision" : "NO",
"weight" : 2.3166666,
"decisions" : [ {
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index include filters [foo:\"bar\"]"
} ]
}
}
}
```
And when the shard *is* assigned, the output looks like:
```
{
"shard" : {
"index" : "only-foo",
"index_uuid" : "KnW0-zELRs6PK84l0r38ZA",
"id" : 0,
"primary" : true
},
"assigned" : true,
"assigned_node_id" : "Qc6VL8c5RWaw1qXZ0Rg57g",
"nodes" : {
"V-Spi0AyRZ6ZvKbaI3691w" : {
"node_name" : "Susan Storm",
"node_attributes" : {
"bar" : "baz"
},
"final_decision" : "NO",
"weight" : 1.4499999,
"decisions" : [ {
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index include filters [foo:\"bar\"]"
} ]
},
"Qc6VL8c5RWaw1qXZ0Rg57g" : {
"node_name" : "Slipstream",
"node_attributes" : {
"bar" : "baz",
"foo" : "bar"
},
"final_decision" : "CURRENTLY_ASSIGNED",
"weight" : 0.0,
"decisions" : [ {
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated on the same node id [Qc6VL8c5RWaw1qXZ0Rg57g] on which it already exists"
} ]
},
"PzdyMZGXQdGhqTJHF_hGgA" : {
"node_name" : "The Symbiote",
"node_attributes" : { },
"final_decision" : "NO",
"weight" : 3.6999998,
"decisions" : [ {
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index include filters [foo:\"bar\"]"
} ]
}
}
}
```
Only "NO" decisions are returned by default, but all decisions can be
shown by specifying the `?include_yes_decisions=true` parameter in the
request.
Resolves#14593
* master: (419 commits)
Remove PROTOTYPE from ShapeBuilders
Take filterNodeIds into consideration while sending tasks actions requests to nodes
test: cleanup imports and method rename
Remove PROTOTYPE from SortBuilders
percolator: Add query extract support for the blended term query and the common terms query.
Don't iterate over shard routing if it's null
[TEST] Reduce size of random shapes
Add some debug logging to testPrimaryRelocationWhileIndexing
Order methods in IndicesClusterStateService according to execution
Tidied up percolator doc annotations
In cat.snapshots, repository is required
Do not retrieve all indices stats when checking for cache resets
Enforce `discovery.zen.minimum_master_nodes` is set when bound to a public ip #17288
Port Primary Terms to master #17044
Revert "Add debug logging for Vagrant upgrade test"
Ownership for data, logs, and configs for packages
add on_failure exception metadata to ingest document for verbose simulate
Revert "Merge pull request #16843 from xuzha/s3-encryption"
Update Format, add new settings into the setting test
Update and rebase the init implementation.
...
Node roles are now serialized as well, they are not part of the node attributes anymore. DiscoveryNodeService takes care of dividing settings into attributes and roles. DiscoveryNode always requires to pass in attributes and roles separately.
Primary terms is a way to make sure that operations replicated from stale primary are rejected by shards following a newly elected primary.
Original PRs adding this to the seq# feature branch #14062 , #14651 . Unlike those PR, here we take a different approach (based on newer code in master) where the primary terms are stored in the meta data only (and not in `ShardRouting` objects).
Relates to #17038Closes#17044
Change version, required a minor fix in the RPM building.
In case of a alpha/beta version, the release will contain alpha/beta
as the RPM version cannot contains dashes/tildes.
The fielddata settings in mappings have been refatored so that:
- text and string have a `fielddata` (boolean) setting that tells whether it
is ok to load in-memory fielddata. It is true by default for now but the
plan is to make it default to false for text fields.
- text and string have a `fielddata_frequency_filter` which contains the same
thing as `fielddata.filter.frequency` used to (but validated at parsing time
instead of being unchecked settings)
- regex fielddata filtering is not supported anymore and will be dropped from
mappings automatically on upgrade.
- text, string and _parent fields have an `eager_global_ordinals` (boolean)
setting that tells whether to load global ordinals eagerly on refresh.
- in-memory fielddata is not supported on keyword fields anymore at all.
- the `fielddata` setting is not supported on other fields that text and string
and will be dropped when upgrading if specified.
this is the last step to remove node level service from IndexShard.
This means that tests can now more easily create an IndexShard instance
without starting a node and removes the dependency between IndexShard and Client/ScriptService
We current have a ClusterService interface, implemented by InternalClusterService and a couple of test classes. Since the decoupling of the transport service and the cluster service, one can construct a ClusterService fairly easily, so we don't need this extra indirection.
Closes#17183
Also replaced the PercolatorQueryRegistry with the new PercolatorQueryCache.
The PercolatorFieldMapper stores the rewritten form of each percolator query's xcontext
in a binary doc values field. This make sure that the query rewrite happens only during
indexing (some queries for example fetch shapes, terms in remote indices) and
the speed up the loading of the queries in the percolator query cache.
Because the percolator now works inside the search infrastructure a number of features
(sorting fields, pagination, fetch features) are available out of the box.
The following feature requests are automatically implemented via this refactoring:
Closes#10741Closes#7297Closes#13176Closes#13978Closes#11264Closes#10741Closes#4317
Today we allow to set all kinds of index level settings on the node level which
is error prone and difficult to get right in a consistent manner.
For instance if some analyzers are setup in a yaml config file some nodes might
not have these analyzers and then index creation fails.
Nevertheless, this change allows some selected settings to be specified on a node level
for instance:
* `index.codec` which is used in a hot/cold node architecture and it's value is really per node or per index
* `index.store.fs.fs_lock` which is also dependent on the filesystem a node uses
All other index level setting must be specified on the index level. For existing clusters the index must be closed
and all settings must be updated via the API on each of the indices.
Closes#16799
Without this commit fetching the status of a reindex from a node that isn't
coordinating the reindex will fail. This commit properly registers reindex's
status so this doesn't happen. To do so it moves all task status registration
into NetworkModule and creates a method to register other statuses which the
reindex plugin calls.
Today, certain bootstrap properties are set and read via system
properties. This action-at-distance way of managing these properties is
rather confusing, and completely unnecessary. But another problem exists
with setting these as system properties. Namely, these system properties
are interpreted as Elasticsearch settings, not all of which are
registered. This leads to Elasticsearch failing to startup if any of
these special properties are set. Instead, these properties should be
kept as local as possible, and passed around as method parameters where
needed. This eliminates the action-at-distance way of handling these
properties, and eliminates the need to register these non-setting
properties. This commit does exactly that.
Additionally, today we use the "-D" command line flag to set the
properties, but this is confusing because "-D" is a special flag to the
JVM for setting system properties. This creates confusion because some
"-D" properties should be passed via arguments to the JVM (so via
ES_JAVA_OPTS), and some should be passed as arguments to
Elasticsearch. This commit changes the "-D" flag for Elasticsearch
settings to "-E".
The ingest stats include the following statistics:
* `ingest.total.count`- The total number of document ingested during the lifetime of this node
* `ingest.total.time_in_millis` - The total time spent on ingest preprocessing documents during the lifetime of this node
* `ingest.total.current` - The total number of documents currently being ingested.
* `ingest.total.failed` - The total number ingest preprocessing operations failed during the lifetime of this node
Also these stats are returned on a per pipeline basis.
Currently, the cluster service is tightly coupled to the transport service by both managing node connections and requiring the bound address in order to create the local disco node. This commit introduces a new NodeConnectionsService which is in charge of node connection management and makes it possible to remove all network related calls from the cluster service. The local DiscoNode is now created by DiscoveryNodeService and is set both the cluster service and the transport service during node start up.
Closes#16788Closes#16872
Use index UUID to lookup indices on IndicesService
Today we use the index name to lookup index instances on the IndicesService
which applied to search requests but also to index deletion etc. This commit
moves the interface to expect an Index instance which is a tuple
and looks up the index by uuid rather than by name. This prevents accidental modification
of the wrong index if and index is recreated or searching from the wrong index in such a case.
Accessing an index that has the same name but different UUID will now result in an IndexNotFoundException.
Today we use the index name to lookup index instances on the IndicesService
which applied to search reqeusts but also to index deletion etc. This commit
moves the interface to expcet and `Index` instance which is a <name, uuid> tuple
and looks up the index by uuid rather than by name. This prevents accidential modificaiton
of the wrong index if and index is recreated or searching from the _wrong_ index in such a case.
Accessing an index that has the same name but different UUID will now result in an IndexNotFoundException.
Closes#17001
_wait_for_completion defaults to false. If set to true then the API will
wait for all the tasks that it finds to stop running before returning. You
can use the timeout parameter to prevent it from waiting forever. If you
don't set a timeout parameter it'll default to 30 seconds.
Also adds a log message to rest tests if any tasks overrun the test. This
is just a log (instead of failing the test) because lots of tasks are run
by the cluster on its own and they shouldn't cause the test to fail. Things
like fetching disk usage from the other nodes, for example.
Switches the request to getter/setter style methods as we're going that
way in the Elasticsearch code base. Reindex is all getter/setter style.
Closes#16906
* master: (350 commits)
Note to configuration docs on number of threads
Reduce maximum number of threads in boostrap check
Limit generic thread pool
Remove NodeService injection to Discovery
Prevent closing index during snapshot restore
[TEST] Fix newline issue in PluginCliTests on Windows
ParseFieldMatcher should log when using deprecated settings. #16988
fix checkstyle error
Add test for the index_options on a keyword field. #16990
Analysis : Allow string explain param in JSON
Analysis : Allow string explain param in JSON
fix typo
Remove SNAPSHOT from versions in plugin descriptors
Add support for alpha versions
Enable unmap hack for java 9
Simplify mock scripts
Adding `time_zone` parameter to daterange-aggregation docs
Adding tests for `time_zone` parameter for date range aggregation
Added ingest info to node info API, which contains a list of available processors.
Remove bw compat from size mapper
...
Closes#16964
Squashed commit of the following:
commit a23f9d2d29220991aa498214530753d7a5a148c6
Merge: eec9c4e 0b0a251
Author: Robert Muir <rmuir@apache.org>
Date: Mon Mar 7 04:12:02 2016 -0500
Merge branch 'master' into lucene6
commit eec9c4e5cd11e9c3e0b426f04894bb2a6dae4f21
Merge: bc67205 675d940
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 13:45:00 2016 -0500
Merge branch 'master' into lucene6
commit bc67205bdfe1526eae277ab7856fc050ecbdb7b2
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 09:56:31 2016 -0500
fix test bug
commit a60723b007ff12d97b1810cef473bd7b553a0327
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 15:35:35 2016 +0100
Fix SimpleValidateQueryIT to put braces around boosted terms
commit ae3a49d7ba7ced448d2a5262e5d8ec98671a9090
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 15:27:25 2016 +0100
fix multimatchquery
commit ae23fdb88a8f6d3fb7ba60fd1aaf3fd72d899aa5
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 15:20:49 2016 +0100
Rewrite DecayFunctionScoreIT to be independent of the similarity used
This test relied a lot on the term scoring and compared scores
that are dependent on the similarity. This commit changes the base query
to be a predictable constant score query.
commit 366c2d518c35d31251033f1b6f6a93f6e2ae327d
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 14:06:14 2016 +0100
Fix scoring in tests due to changes to idf calculation.
Lucene 6 uses a different default similarity as well as a different
way to calculate IDF. In contrast to older version lucene 6 uses docCount per field
to calculate the IDF not the # of docs in the index to overcome the sparse field
cases.
commit dac99fd64ac2fa71b8d8d106fe68825e574c49f8
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 08:21:57 2016 -0500
don't hardcoded expected termquery score
commit 6e9f340ba49ab10eed512df86d52a121aa775b0f
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 08:04:45 2016 -0500
suppress deprecation warning until migrated to points
commit 3ac8908424b3fdad44a90a4f7bdb3eff7efd077d
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 07:21:43 2016 -0500
Remove invalid test: all commits have IDs, and its illegal to do this.
commit c12976288124ad1a26467e7e848fb810548e7eab
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 07:06:14 2016 -0500
don't test with unsupported back compat
commit 18bbfe76128570bc70883bf91ff4c44c82d27817
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 07:02:18 2016 -0500
remove now invalid lucene 4 backcompat test
commit 7e730e572886f0ef2d3faba712e4256216ff01ec
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 06:58:52 2016 -0500
remove now invalid lucene 4 backwards test
commit 244d2ab6868ba5ac9e0bcde3c2833743751a25ec
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 06:47:23 2016 -0500
use 6.0 codec
commit 5f64d4a431a6fdaa1234adca23f154c2a1de8284
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 06:43:08 2016 -0500
compile, javadocs, forbidden-apis, etc
commit 1f273cd62a7fe9ca8f8944acbbfc5cbdd3d81ccb
Merge: cd33921 29e3443
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 10:45:29 2016 +0100
Merge branch 'master' into lucene6
commit cd33921ac742ef9fb351012eff35f3c7dbda7264
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:58:37 2016 -0500
fix hunspell dictionary loading
commit c7fdbd837b01f7defe9cb1c24e2ec65604b0dc96
Merge: 4d4190f d8948ba
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:41:53 2016 -0500
Merge branch 'master' into lucene6
commit 4d4190fd82601aaafac6b8254ccb3edf218faa34
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:39:14 2016 -0500
remove nocommit
commit 77ca69e288b1a41aa9595c921ed166c272a00ea8
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:38:24 2016 -0500
clean up numericutils vs legacynumericutils
commit a466d696fbaad04b647ffbc0857a9439b583d0bf
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:32:43 2016 -0500
upgrade spatial4j
commit 5412c747a8cfe638bacedbc8233163cb75cc3dc5
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:19:28 2016 -0500
move to 6.0.0-snapshot-8eada27
commit b32bfe924626b87e540692375ece09e7c2edb189
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 11:30:09 2016 +0100
Fix some test compile errors.
commit 6ccde35e9840b03c68d1a2cd47c7923a06edf64a
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 11:25:51 2016 +0100
Current Lucene version is 6.0.0.
commit f62e1015d931b4cc04c778298a8fa1ba65e97ad9
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 11:20:48 2016 +0100
Fix compile errors in NGramTokenFilterFactory.
commit 6837c6eabf96075f743649da9b9b52dd39611c58
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:50:59 2016 +0100
Fix the edge ngram tokenizer/filter.
commit ccd7f070de5efcdfbeb34b9555c65c4990bf1ba6
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:42:44 2016 +0100
The missing value is now accessible through a getter.
commit bd3b77f9b28e5b05daa3d49683a9922a6baf2963
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:41:51 2016 +0100
Remove IndexCacheableQuery.
commit 05f3091c347aeae80eeb16349ac51d2b53cf86f7
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:39:43 2016 +0100
Fix compilation of function_score queries.
commit 81cda79a2431ac78f56b0cc5a5765387f662d801
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:35:02 2016 +0100
Fix compile errors in BlendedTermQuery.
commit 70994ce8dd1eca0b995870974a38e20f26f96a7b
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 23:33:03 2016 -0500
add bug ID
commit 29d4f1a71f36f646b5a6060bed3db019564a279d
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 21:02:32 2016 -0500
easy .store changes
commit 5e1a1e6fd665fa455e88d3a8987362fad5f44bb1
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 20:47:24 2016 -0500
cleanups mostly around boosting
commit 333a669ec6c305ada5645d13ed1da0e19ec1d053
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 20:27:56 2016 -0500
more simple fixes
commit bd5cd98a1e089c866b6b4a5e159400b110140ce6
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 19:49:38 2016 -0500
more easy fixes and removal of ancient cruft
commit a68f419ee47da5f9c9ce5b372f01d707e902474c
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 19:35:02 2016 -0500
cutover numerics
commit 4ca5dc1fa47dd5892db00899032133318fff3116
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 18:34:18 2016 -0500
fix some constants
commit 88710a17817086e477c6c021ec346d0534b7fb88
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 18:14:25 2016 -0500
Add spatial-extras jar as a core dependency
commit c8cd6726583e5ce3f546ed355d4eca037164a30d
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 18:03:33 2016 -0500
update to lucene 6 jars
This commit simplifies and consolidates the two different
implementations of terminals used in tests. There is now a single
MockTerminal which captures output, and allows accessing as one large
string (with unix style \n as newlines), as well as configuring
input.
No changes were really needed in our test infra as it didn't use `node.client`. Yet it didn't take into account ingest nodes, what we used to call client nodes in InternalTestCluster were actually ingest only nodes, which now become coordinating only nodes.
Also renamed some method to get rid of the node client terminology as much as possible in favour or coordinating only node.
As discussed in #16565, the node.client setting is an unnecessary shortcut to node.data: false and node.master: false. We have places where we treat nodes with node.client set to true differently compared to master false and data false, which is not correct. Also, with the addition of node.ingest or potentially new roles, it becomes confusing to figure out if a node client should support ingestion or not.
This commit removes the node.client setting in favour being explicit using node.master, node.data and node.ingest instead.
This commit modifies TransportBulkAction to use relative time instead of
absolute time when measuring how long a bulk request took to be
processed, and adds tests for this functionality.
Closes#16916
The big win here is catching tests that are incorrectly named and will
be skipped by gradle, providing a false sense of security.
The whole thing takes about 10 seconds on my Macbook Air, not counting
compiling the test classes, which seems worth it. Because this runs as
a gradle task with propery UP-TO-DATE handling it can be skipped if the
tests haven't been changed which should save some time.
I chose to keep this in test:framework rather than a new subproject of
buildSrc because ESIntegTestCase and doesn't inroduce any additional
dependencies.
DiscoveryService was a bridge into the discovery universe. This is unneeded and we can just access discovery directly or do things in a different way.
One of those different ways, is not having a dedicated discovery implementation for each our dicovery plugins but rather reuse ZenDiscovery.
UnicastHostProviders are now classified by discovery type, removing unneeded checks on plugins.
Closes#16821
Instead of modifying methods each time we need to add a new behavior for settings, we can simply pass `SettingsProperty... properties` instead.
`SettingsProperty` could be defined then:
```
public enum SettingsProperty {
Filtered,
Dynamic,
ClusterScope,
NodeScope,
IndexScope
// HereGoesYours;
}
```
Then in setting code, it become much more flexible.
TODO: Note that we need to validate SettingsProperty which are added to a Setting as some of them might be mutually exclusive.
This commit works around an issue with hostname verification in HttpClient when using IPv6
addresses in URLs. When an IPv6 address is used in a URL it is typically wrapped with square
brackets. The hostname verifier for HttpClient does not recognize these as valid IPv6 addresses
and instead treats them as a DNS name. We wrap the strict hostname verifier for this version
of HttpClient and strip brackets if we need to.
The corresponding issue in HttpClient is https://issues.apache.org/jira/browse/HTTPCLIENT-1698
but the fix has not been released yet in a stable version.
Currently dynamic mappings propgate through call semantics, where deeper
dynamic mappings are merged into higher level mappings through
return values of recursive method calls. This makese it tricky
to handle multiple updates in the same method, for example when
trying to create parent object mappers dynamically for a field name
that contains dots.
This change makes the api for adding mappers a simple list
of new mappers, and moves construction of the root level mapping
update to the end of doc parsing.
Today we might start a node and some of the paths might not have the
required permissions. This commit goes through all data directories as
well as index, shard and state directories and ensures we have write access.
To make this work across all OS etc. we are trying to write a real file
and remove it again in each of those directories
Today we have the notion of a snapshot inside Version.java which makes
releasing complicated since to do a release Version.java must be changed.
This commit removes all notions of snapshot from the code and allows to
switch between snapshot and release build by specifying a system property on
the build. For instance running:
```
gradle run -Dbuild.snapshot=false
```
will build and package a release build while the default always
builds snapshots. Calls to the main rest action will still get the snapshot
information rendered out with the response.
This commit moves IndicesRequestCache into o.e.indics and makes all API in this
class package private. All references to SearchReqeust, SearchContext etc. have been factored
out and relevant glue code has been added to IndicesService. The IndicesRequestCache is not a
simple class without any hard dependencies on ThreadPool nor SearchService or IndexShard. This now
allows to add unittests.
This commit also removes two settings `indices.requests.cache.clean_interval` and `indices.fielddata.cache.clean_interval`
in favor of `indices.cache.clean_interval` which cleans both caches.
this is a minor cleanup that detaches `IndicesRequestCache` and `IndicesQueryCache`
from guice and moves it into `IndicesService`. It also decouples the `IndexShard` and `IndexService`
from these caches which are unnecessary dependencies.
This PR renames the following three variables to fix a typo `settting` into `setting`.
* Rename a static class member:
INDEX_TRANSLOG_FLUSH_THRESHOLD_SIZE_SETTTING -> INDEX_TRANSLOG_FLUSH_THRESHOLD_SIZE_SETTING
* Rename a parameter: aSettting --> aSetting
* Rename a local variable: indexSetttings -> indexSettings
IndexShard currently holds an arbitraritly used `getQueryShardContext` that comes
out of a ThreadLocal. It's usage is undefined and arbitraty since there is also
such a method with different semantics on `IndexService` This commit removes the threadLocal on
IndexShard as well as on the context itself. It's types are now a member and the QueryShardContext
lifecycle is managed byt SearchContext which passes the types on from the SearchRequest.
There is no need for IndicesWarmer to be a global accessible class. All it needs
access to is inside IndexService. It also doesn't need to be mutable once it's not a per node
instance. This commit move IndicesWarmer to IndexWarmer and makes the default impls like field data and
norms warming an impl detail. Also the IndexShard doesn't depend on this class anymore, instead it accepts
an Engine.Warmer as a ctor argument which delegates to the actual warmer from the index.
Indices level field data cacheing belongs into IndicesService and doesn't need to be
wired by guice. This commit also moves the async cache refresh out of the class into
IndicesService such that threadpool dependencies are removed and testing / creation becomes
simpler.
This change documents the Terminal abstraction that cli tools use, as
well as simplifies the api to be a minimal set of methods to interact
with a terminal.
This commit adds a convenience method for producing random positive time
values which can be useful for places where non-negative and non-zero
time values are expected.
This commit enableds strict settings validation on node startup. All settings
passed to elasticsearch either through system properties, yaml files or any other
way to pass settings must be registered and valid. Settings that are unknown ie. due to
typos or due to deprecation or removal will cause the node to NOT start up. Plugins
have to declare all their settings on the `SettingsModule#registerSetting` and settings for
plugins that are not installed must be removed.
This commit also removes the ability to specify the nodes name via `-Des.name` or just `name` in the
configuration files. The node name must be prefixed with the node prexif like `node.name: Boom`. Left over
usage of `name` will also cause startup to fail.
When primary relocation completes, a cluster state is propagated that deactivates the old primary and marks the new primary as active.
As cluster state changes are not applied synchronously on all nodes, there can be a time interval where the relocation target has processed
the cluster state and believes to be the active primary and the relocation source has not yet processed the cluster state update and
still believes itself to be the active primary. This commit ensures that, before completing the relocation, the reloction source deactivates
writing to its store and delegates requests to the relocation target.
Closes#15900
The plugin cli currently is extremely lenient, allowing most errors to
simply be logged. This can lead to either corrupt installations (eg
partially installed plugins), or confused users.
This change rewrites the plugin cli to have almost no leniency.
Unfortunately it was not possible to remove all leniency, due in
particular to how config files are handled.
The following functionality was simplified:
* The format of the name argument to install a plugin is now an official
plugin name, maven coordinates, or a URL.
* Checksum files are required, and only checked, for official plugins
and maven plugins. Checksums are also only SHA1.
* Downloading no longer uses a separate thread, and no longer has a timeout.
* Installation, and removal, attempts to be atomic. This only truly works
when no config or bin files exist.
* config and bin directories are verified before copying is attempted.
* Permissions and user/group are no longer set on config and bin files.
We rely on the users umask.
* config and bin directories must only contain files, no subdirectories.
* The code is reorganized so each command is a separate class. These
classes already existed, but were embedded in the plugin cli class, as
an extra layer between the cli code and the code running for each command.
With the recent change to allow ESSingleNodeTestCase to specify plugins
and Version for the node, node creation became lazy, where it now
happens before the first test to run. However, there were many static
methods on ESSingleNodeTestCase that still try to access the node. This
change makes those methods non-static.
This commit removes and forbids the use of lowercase ells ('l') in long
literals because they are often hard to distinguish from the digit
representing one ('1').
Closes#16329
We are leaking all kinds of resources if something during Node#close() barfs.
This commit cuts over to a list of closeables to release resources that
also closed remaining services if one or more services fail to close.
Closes#13685
In junit5, a neat assertion method is added which makes testing expected
failures a little more straightforward. The block of code that is
expected to throw is passed in with a lambda expression, and the caught
exception returned for inspection.
This change adds an implementation of expectThrows to ESTestCase. When
junit5 is eventually releassed and we switch to it, we can remove.
Now MasterNodeOperations, ReplicationAllShards, ReplicationSingleShard, BroadcastReplication and BroadcastByNode actions keep track of their parent tasks.
Note for many of the settings we had a netty specific override to a generic transport setting. I have removed those as they are not needed imo (ex. "transport.connections_per_node.recovery" was overriden by "transport.netty.connections_per_node.recovery", which I removed). The netty prefix was kept for settings that are tied to the netty universe - ex. "transport.netty.max_cumulation_buffer_capacity"
Closes#16200
In the early days Elasticsearch used to use the index name as the index identity. Around 1.0.0 we introduced a unique index uuid which is stored in the index setting. Since then we used that uuid in a few places but it is by far not the main identifier when working with indices, partially because it's not always readily available in all places.
This PR start to make a move in the direction of using uuids instead of name by making sure that the uuid is available on the Index class (currently just a wrapper around the name) and as such also available via ShardRouting and ShardId.
Note that this is by no means an attempt to do the right thing with the uuid in all places. In almost all places it falls back to the name based comparison that was done before. It is meant as a first step towards slowly improving the situation.
Closes#16217
This commit limits the `index.translog.sync_interval` to a value not less than `100ms` and
removes the support for fsync on every operation which used to be enabled if `index.translog.sync_interval` was set to `0s`
Now this pr also only schedules an async fsync if the durability is set to `async`. By default not async task is scheduled.
Closes#16152
This commit method renames the ScriptEngineService interface methods
types, extensions, and sandboxed to getTypes, getExtensions, and
isSandboxed, respectively.
This commit converts the script mode settings to the new settings
infrastructure. This is a major refactoring of the handling of script
mode settings. This refactoring is necessary because these settings are
determined at runtime based on the registered script engines and the
registered script contexts.
The search_after parameter provides a way to efficiently paginate from one page to the next. This parameter accepts an array of sort values, those values are then used by the searcher to sort the top hits from the first document that is greater to the sort values.
This parameter must be used in conjunction with the sort parameter, it must contain exactly the same number of values than the number of fields to sort on.
NOTE: A field with one unique value per document should be used as the last element of the sort specification. Otherwise the sort order for documents that have the same sort values would be undefined. The recommended way is to use the field `_uuid` which is certain to contain one unique value for each document.
Fixes#8192
When a Version is passed to `Settings#put(String, Version)` it's id is used as
an integer which should also be used to deserialize it on the consumer end.
Today AssertinLocalTransport expects Version#toString() to be used which can lead
to subtile bugs in tests.
This commit migrates all the settings under network service to the new settings infra.
It also adds some chaining utils to make fall back settings slightly less verbose.
Breaking (but I think acceptable) - network.tcp.no_delay and network.tcp.keep_alive used to accept the value `default` which make us not set them at all on netty. Our default was true so we weren't using this feature. I removed it and now we only accept a true boolean.
* `test.cluster.node.seed` was only used in one place where it wasn't adding value and was now replaced with a constant
* `tests.portsfile` is now an official setting and has been renamed to `node.portsfile`
this commit also convert `search.default_keep_alive` and `search.keep_alive_interval` to the new settings infrastrucutre
This commit allows an integ test to wrap all clients that are exposed by the
InternalTestCluster which is useful for request interception or to add general headers
to request or to intercept client to server call even if the client calls are done from within
the test framework.
This commit fixes an issue in the handling of TransportExceptions in
ShardStateAction. There were two cases not being handled correctly.
- when the local node is shutting down, handlers will be notified with
a TransportException with a message starting "transport stopped"
- when the remote node disconnects, handlers will be notified with a
NodeDisconnectedException
In both of these cases, the cause of the exception will be null and this
was incorrectly being handled. The first case can passed to the listener
like any other critical non-channel failure, and the second case can be
handled by modifying the logic for detecting master channel exceptions.
There was a third case of NodeNotConnectedException that was not being
treated as a master channel exception but should be.
This commit adds an integration test that simulates the handling of a
shard failure request during a network partition. By isolating the
master from the cluster while a shard failed request is in flight, this
test simulates that we wait until a new master is elected and then retry
sending that shard failed request to the newly elected master.
This commit adds methods to CapturingTransport to separate local and
remote transport exceptions. The motivation for this change is that
local transport exceptions are delivered to listeners (usually, but not
always) wrapped in SendRequestTransportException while remote transport
exceptions are delivered to listeners wrapped in
RemoteTransportException. By making this distinction clear in the
CapturingTransport, this makes it less likely that tests will make
incorrect assumptions about the exceptions coming out of the transport
layer to listeners.
Closes#16057
The rest test framework, because it used to be tightly integrated with
ESIntegTestCase, currently expects the addresses for the test cluster to
be passed using the transport protocol port. However, it only uses this
to then find the http address.
This change makes ESRestTestCase extend from ESTestCase instead of
ESIntegTestCase, and changes the sysprop used to tests.rest.cluster,
which now takes the http address.
closes#15459
We have two similar tests with the same name, ContextAndHeaderTransportTests.
They shared lots of common code so I extracted much of it into
ActionRecordingPlugin, a plugin which records all action requests for later
inspection.
I also removed all the warnings from both tests. That made lang-mustache
compile cleanly without any custom -Xlint so I removed those. To remove
the warnings I had to add type parameters to ActionFilter which seemed
like a good idea anyway.
The old infa has been removed in this commit such that nothing uses `DynamicSettings` anymore
and all index-scoped settings require to be registered before the node has fully started up.
Site plugins used to be used for things like kibana and marvel, but
there is no longer a need since kibana (and marvel as a kibana plugin)
uses node.js. This change removes site plugins, as well as the flag for
jvm plugins. Now all plugins are jvm plugins.
This undocumented setting was mainly used for testing and a safety net for
a new flushOnClose feature back in 1.x. We can now safely remove this setting.
The test usage now uses a mock plugin setting to achive the same.
Today we maintain a lot of settings on the shard level which are all index level settings.
In order to cut over to the new settings API where we register update listener we have to move
all of them on to the index level otherwise we need a way to un-register listeners which is error-prone
and requires additional handling when shards are closed. It's simpler and also more accurate to handle all of
them on the index level where we can trash the entire registry for update listener once the index goes out of scope.
ContextAndHeaders has a massive impact on the core infrastructure since it has to
be manually passed on to all relevant places across threads/network calls etc. For the same reason
it's also very error prone and easily forgotten on potentially relevant APIs.
The new ThreadContext is associated with a ThreadPool (node or transport client) and ensures that
headers and context registered on a current thread are inherited to new threads spawned, send across
the network to be deserialized on the receiver end as well as restored on the response handling thread
once the response is received.
This commit adds convenience methods to o.e.t.t.CapturingTransport
that enables capturing requests and clearing the captured requests
with a single method. This is to simplify a common pattern in tests of
capturing requests, and then clearing the captured requests.
1. Uses forbidden patterns to prevent things from referencing
java.io.Serializable or from mentioning serialVersionUID.
2. Uses -Xlint:-serial so we don't have to hear from javac that we aren't
declaring serialVersionUID on any classes that we make that happen to extend
Serializable.
3. Remove Serializable and serialVersionUID declarations.
I didn't use forbidden apis because it doesn't look like it has a way to ban
explicitly implementing Serializable. If you try to ban Serializable with
forbidden apis you end up banning all Exceptions and all Strings.
Closes#15847
This commit removes and now forbids use of
org.apache.lucene.index.IndexWriter#isLocked as this method was
deprecated in LUCENE-6508. The deprecation is due to the fact that
checking if a lock is held before acquiring that lock is subject to a
time-of-check-to-time-of-use race condition. There were three uses of
IndexWriter#isLocked in the code base:
- a logging statement in o.e.i.e.InternalEngine where we are already in
an exceptional condition that the lock was held; in this case,
logging whether or not the directory is locked is superfluous
- in o.e.c.l.u.VersionsTests where we were verifying that a write lock
is released upon closing an IndexWriter; in this case, the check is
not needed as successfully closing an IndexWriter releases its
write lock
- in o.e.t.s.MockFSDirectoryService where we were verifying that a
directory is not write-locked before (implicitly) trying to obtain
such a write lock in org.apache.lucene.index.CheckIndex#<init> (this
is the exact type of a situation that is subject to a race
condition); in this case we can proceed by just (implicitly) trying
to obtain the write lock and failing if we encounter a
LockObtainFailedException
* Added percolator field mapper that extracts the query terms and indexes these terms with the percolator query.
* At percolate time these extracted terms are used to query percolator queries that are like to be evaluated. This can significantly cut down the time it takes to percolate. Whereas before all percolator queries were evaluated if they matches with the document being percolated.
* Changes made to percolator queries are no longer immediately visible, a refresh needs to happen before the changes are visible.
* By default the percolate api only returns upto 10 matches instead of returning all matching percolator queries.
* Made percolate more modular, so that it is easier to add unit tests.
* Added unit tests for the percolator.
Closes#12664Closes#13646
Adds task manager class and enables all activities to register with the task manager. Currently, the immutable Transport*Activity class represents activity itself shared across all requests. This PR adds and an additional structure Task that keeps track of currently running requests and can be used to communicate with these requests using TransportTaskAction.
Related to #15117
When waiting indefinitely for a new cluster state in a test,
TestClusterService#add will throw a NullPointerException if the timeout
is null. Instead, TestClusterService#add should guard against a null
timeout and not even attempt to add a notification for the timeout
expiring. Note that the usage of null is the agreed upon contract for
specifying an indefinite wait from ClusterStateObserver.
DedicatedClusterSnapshotRestoreIT#testRestoreIndexWithMissingShards took ~1.5 min to finish
due to timeouts that are applied if not all shards are allocated. Now that the index that has
unallocated shareds is not refreshed the test is more reasonable and runs in 15 sec
Today we throttle recoveries only for incoming recoveries. Nodes that have a lot
of primaries can get overloaded due to too many recoveries. To still keep that at bay
we limit the number of threads that are sending files to the target to overcome this problem.
The right solution here is to also throttle the outgoing recoveries that are today unbounded on
the master and don't start the recovery until we have enough resources on both source and target nodes.
The concurrency aspects of the recovery source also added a lot of complexity and additional threadpools
that are hard to configure. This commit removes the concurrent streamns notion completely and sends files
in the thread that drives the recovery simplifying the recovery code considerably.
Outgoing recoveries are not throttled on the master via a allocation decider.
Today the logic to async - commit the translog is in every translog instance
itself. While the setting is a per index setting we manageing it per shard. This
polluts the translog code and can more easily be managed in IndexService.
Today we have two variants of translogs for indexing. We only recommend the buffered
one which also has a 20% advantage in indexing speed. This commit removes the option and defaults
to the buffered case. It also hard-wires the translog buffer to 8kb instead of 64kb. We used to
adjust that buffer based on if the shard is active or not, this code has also been removed and
instead we just keep an 8kb buffer arround.
This commit removes `index.translog.flush_threshold_ops` and `index.translog.disable_flush`
in favor of `index.translog.flush_threshold_size`. The number of operations is meaningless by itself and
can easily be turned into a size value with knowledge of the data. Disabling the flush is only useful in
tests and we can set the size value to a really high value. If users really need to do this they can
also apply a very high value like `1PB`.
This change adds a Fixture class for use by gradle. A Fixture is an
external process that integration tests will use. It can be added as a
dependsOn for integTest, and will automatically be shutdown upon success
or failure, as well as relevant information dumped on failure. There is
also an example fixture in this change.