The current logic for doing recovery from a source to a target shourd is tightly coupled with the underlying network pipes. This changes decouple the two, making it easier to add unit tests for shard recovery that doesn't involve the node and network environment.
On top that, RecoveryTarget is renamed to RecoveryTargetService leaving space to renaming RecoveryStatus to RecoveryTarget (and thus avoid the confusion we have today with RecoveryState).
Correspondingly RecoverySource is renamed to RecoverySourceService.
Closes#16605
All we do is check the cancelled flag and stop the request at a few key
points.
Adds the cancellation cause to the status so any request that is cancelled
but doesn't die can be seen in the task list.
The `keyword` field is intended to replace `not_analyzed` string fields. It is
indexed and has doc values by default, and doesn't support enabling term
vectors.
Although it doesn't support setting an analyzer for now, there are plans for
it to support basic normalization in the future such as case folding.
2.x has show so far that running with security manager is the way to go.
This commit make this non-optional. Users that need to pass their own rules
can still do this via the system configuration for the security manager. They
can even opt out of all security that way.
This commit moves IndicesRequestCache into o.e.indics and makes all API in this
class package private. All references to SearchReqeust, SearchContext etc. have been factored
out and relevant glue code has been added to IndicesService. The IndicesRequestCache is not a
simple class without any hard dependencies on ThreadPool nor SearchService or IndexShard. This now
allows to add unittests.
This commit also removes two settings `indices.requests.cache.clean_interval` and `indices.fielddata.cache.clean_interval`
in favor of `indices.cache.clean_interval` which cleans both caches.
Some bw incompatible setting changes:
http.netty.http.blocking_server -> http.tcp.blocking_server
http.netty.host (removed, we just have http.host)
http.netty.bind_host (removed, we just have http.bind_host)
http.netty.publish_host (removed, we just have http.publish_host)
http.netty.tcp_no_delay -> http.tcp.no_delay
http.netty.tcp_keep_alive -> http.tcp.keep_alive
http.netty.reuse_address -> http.txp.reuse_address
http.netty.tcp_send_buffer_size -> http.tcp.send_buffer_size
http.netty.tcp_receive_buffer_size -> http.tcp.receive_buffer_size
Closes#16531
this is a minor cleanup that detaches `IndicesRequestCache` and `IndicesQueryCache`
from guice and moves it into `IndicesService`. It also decouples the `IndexShard` and `IndexService`
from these caches which are unnecessary dependencies.
QueryBuilders today do all their heavy lifting in toQuery() which
can be too late for several operations. For instance if we want to fetch geo shapes
on the coordinating node we need to do all this before we create the actual lucene query
which happens on the shard itself. Also optimizations for request caching need to be done
to the query builder rather than the query which then in-turn needs to be serialized again.
This commit adds the basic infrastructure for query rewriting and moves the heavy lifting into
the rewrite method for the following queries:
* `WrapperQueryBuilder`
* `GeoShapeQueryBuilder`
* `TermsQueryBuilder`
* `TemplateQueryBuilder`
Other queries like `MoreLikeThisQueryBuilder` still need to be fixed / converted. The nice
sideeffect of this is that queries like template queries will now also match the request cache
if their non-template equivalent has been cached befoore. In the future this will allow to
add optimizataion like rewriting time-based queries into primitives like `match_all_docs` or `match_no_docs`
based on the currents shards bounds. This is especially appealing for indices that are read-only ie. never change.
In the testCanFetchIndexStatus the task check can occur before the indexing process is started making the test to fail. This commit adds an additional lock to make sure we check tasks only after at least one of the tasks is registered.
This commit removes bootstrap support for Java Service Wrapper. The
implementation of this has been moved to its own repository where it was
deprecated, does not work with Elasticsearch 2.x, and is untested and
therefore unmaintained.
Closes#16580
This commit handles the scenario where a replication action fails on a
replica shard, the primary shard attempts to fail the replica shard
but the primary shard is notified of demotion by the master. In this
scenario, the demoted primary shard must be failed, and then the
request rerouted again to the new primary shard.
Closes#16415, closes#14252
Adds to GeoDistanceSortBuilder:
* equals
* hashcode
* writeto/readfrom
* moves xcontent parsing logic over
* adds roundtrip tests
* fixes roundtrip test for xcontent by keeping points just as geopoints not geohashes internally
* fixes xcontent parsing of ignore_malformed if coerce is set/unset
* adds exception to sortMode setter to avoid setting invalid sort modes
Relates to #15178
Today put mapping operations only update metadata of the type that is being
modified, which is not enough since some modifications may have side-effects
on other types.
Closes#16239
Only tasks that extend CancellableTask can be cancelled using this mechanism. If a cancellable task has children it can elect to cancel all child tasks as well. In this case a special ban parent request is sent to all nodes. This request does two things: 1) it prevents any tasks with the banned parent task from being started, and 2) it cancels all currently running tasks that have the banned task as a parent. The ban is lifted as soon as the coordinating node notifies all other nodes that the cancelled task has finished executing. If the coordinating node leaves the cluster before it has a chance to lift its bans, all bans set by this coordinating node are automatically removed.
As an option a task can elect to automatically cancel all child tasks if their parent task was running on a node that just left the cluster. This option makes sense for cancellable heavy tasks that have no side-effects and only return results to the coordinating node. With the coordinating node gone, it doesn't make sense to run such tasks any longer since their results will be most likely discarded.
That is like some kind of cardinal sin or something, right?
We had two violations though they weren't super likely to be keys in a hashmap
any time soon.
This is a simple port of the mapper attachment plugin to the ingest
functionality, no new features. The only option is to limit
the number of chars to prevent indexing of huge documents.
Fields can be selected in the processor as well.
Close#16303
This PR renames the following three variables to fix a typo `settting` into `setting`.
* Rename a static class member:
INDEX_TRANSLOG_FLUSH_THRESHOLD_SIZE_SETTTING -> INDEX_TRANSLOG_FLUSH_THRESHOLD_SIZE_SETTING
* Rename a parameter: aSettting --> aSetting
* Rename a local variable: indexSetttings -> indexSettings
This commit registers bootstrap settings used on startup. Without
registration, setting any of these settings causes node startup to
fail. By registering these settings (rather than clearing) after use, we
enable them to be visible in any APIs that show all settings.
Closes#16513
The purpose of this commit is to speed up the runtime of
MessageDigestTests#testToHexString. As written, the test contains a loop
that creates 1024 test cases leading to a test runtime on the order of a
few seconds. Given build infrastructure, a single test case should
suffice. Therefore, this commit removes this loop so that the test can
execute on the order of a couple hundred milliseconds.
This commit includes a few minor cleanups to o/e/b/JavaVersion.java:
- Stronger argument checking in JavaVersion#parse
- Use JDK 8 string joiner
- Keep an immutable copy of the version sequence
IndexShard currently holds an arbitraritly used `getQueryShardContext` that comes
out of a ThreadLocal. It's usage is undefined and arbitraty since there is also
such a method with different semantics on `IndexService` This commit removes the threadLocal on
IndexShard as well as on the context itself. It's types are now a member and the QueryShardContext
lifecycle is managed byt SearchContext which passes the types on from the SearchRequest.
Recovery from store fails to correctly set the translog recovery stats. This fixes it and tightens up the logic bringing it all to IndexShard (previously it was set by the recovery logic).
Closes#15974Closes#16493
This commit modifies the MessageDigests message digest provider to
return a thread local instance of MessageDigest instances instead of
using clone since some providers do not support clone.
Closes#16479
There is no need for IndicesWarmer to be a global accessible class. All it needs
access to is inside IndexService. It also doesn't need to be mutable once it's not a per node
instance. This commit move IndicesWarmer to IndexWarmer and makes the default impls like field data and
norms warming an impl detail. Also the IndexShard doesn't depend on this class anymore, instead it accepts
an Engine.Warmer as a ctor argument which delegates to the actual warmer from the index.
The cat API previously used the Content-Type header field for
determining the media type of the response. This is in opposition to the
HTTP spec which specifies the Accept header field for this purpose. This
commit replaces the use of the Content-Type header field with the Accept
header field in the cat API.
Closes#14421
One of our tests leaked a system property here since we failed after appling some
system properties in BootstrapCLIParser. This is not a huge deal in production since
we exit the JVM if we fail on that. Yet for correctnes we should only apply them if
we manage to parse them all.
This also caused a test failure lately on CI but on an unrelated test:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+periodic/314/console
Indices level field data cacheing belongs into IndicesService and doesn't need to be
wired by guice. This commit also moves the async cache refresh out of the class into
IndicesService such that threadpool dependencies are removed and testing / creation becomes
simpler.
This processor is useful when all elements of a json array need to be processed in the same way.
This avoids that a processor needs to be defined for each element in an array.
Also it is very likely that it is unknown how many elements are inside an json array.
Retrieving distributed DF for TermVectors is beside it's esotheric justification
a very slow process and can cause serious load on the cluster. We also don't have nearly
enough testing for this stuff and given the complexity we should remove it rather than carrying it
around.
During initial cluster forming, when a master is elected, it reaches out to all other masters nodes and ask the last cluster state they persisted. To make sure we select the right state, we must successfully read from a `min_master_nodes` nodes. The gateway currently have specific settings to override this behavior, but I don't think they are ever used. We can drop them and reach out to the discovery layer, the single source of truth for the min master nodes settings.
Closes#16446
This change documents the Terminal abstraction that cli tools use, as
well as simplifies the api to be a minimal set of methods to interact
with a terminal.
Uses a refactored version of Netty's CORS implementation to provide more
robust cross-origin resource request functionality. The CORS specific
Elasticsearch parameters remain the same, just the underlying
implementation has changed.
It has also been refactored in a way that allows dropping in Netty's
CORS handler as a replacement once Elasticsearch is upgraded to Netty 4.
This change rewrites the entire settings filtering mechanism to be immutable.
All filters must be registered up-front in the SettingsModule. Filters that are comma-sparated are
not allowed anymore and check on registration.
This commit also adds settings filtering to the default settings recently added to ensure we don't render
filtered settings.
Mostly just wrapping the exception list.
Also:
* Reworks the docs on ElasticsearchExceptionHandle
* Removes long lines from ExceptionSerializationTests
* Switches one method from arrow shaped to early returns
* Adds line breaks
This splits the geo distance and geo distance sorting tests marked messy:
Test cases that don't really need Groovy support are moved back to the
core test suite closer to the code they actually test.
Relates to #15178
This commit modifies the string representation of a shard state action
request. The issue being addressed is that the previous logging would
log "failure: [Unknown]" for shard started actions but this just leads
to confusion that there is a failure but its cause is unknown.
Closes#16396
Today, shard failure requests are blindly handled on the master without
any validation that the request is a legal request. A legal request is a
shard failure request for which the shard requesting the failure is
either the local allocation or the primary allocation. This is because
shard failure requests are classified into only two sets: requests that
correspond to shards that exist, and requests that correspond to shards
that do not exist. Requests that correspond to shards that do not exist
are immediately marked as successful (there is nothing to do), and
requests that correspond to shards that do exist are sent to the
allocation service for handling the failure.
This pull request adds a third classification for shard failure requests
to separate out illegal shard failure requests and enables the master to
validate shard failure requests. The master communicates the illegality
of a shard failure request via a new exception:
NoLongerPrimaryShardException. This exception can be used by shard
failure listeners to discover when they've sent a shard failure request
that they were not allowed to send (e.g., if they are no longer the
primary allocation for the shard).
Closes#16275
Identifying when a plugin id is maven coordinates is currently done by
checking if the plugin id contains 2 colons. However, a valid url could
have 2 colons, for example when a port is specified. This change adds
another check, ensuring the plugin id with maven coordinates does not
contain a slash, which only a url would have.
closes#16376
This commit marks OldIndexBackwardsCompatibilityIT#testOldIndexes as
awaiting a Lucene snapshot upgrade to reflect the fact that
Elasticsearch 2.2.0 is built against Lucene 5.4.1 but the current Lucene
snapshot in master/2.x does not contain the Lucene version 5.4.1 field.
Relates #16373
This commit fixes a test bug in
JvmGcMonitorServiceSettingsTests#testMissingSetting. The purpose of the
test is to test that if settings are provided for a collector for at
least one of warn, info, and debug then it is provided for all of warn,
info, and debug. However, for a collector setting to be valid it must be
a positive time value but the randomization in the test construction
could produce zero time values.
Closes#16369
This removes the deprecation for the geohash based setter to quickly fix the failure here: http://build-us-00.elastic.co/job/es_core_master_suse/3312
Reintroducing postponed until related test in groovy module is fixed. Need to figure out what went wrong when I ran the build locally w/o failure before.
This commit enableds strict settings validation on node startup. All settings
passed to elasticsearch either through system properties, yaml files or any other
way to pass settings must be registered and valid. Settings that are unknown ie. due to
typos or due to deprecation or removal will cause the node to NOT start up. Plugins
have to declare all their settings on the `SettingsModule#registerSetting` and settings for
plugins that are not installed must be removed.
This commit also removes the ability to specify the nodes name via `-Des.name` or just `name` in the
configuration files. The node name must be prefixed with the node prexif like `node.name: Boom`. Left over
usage of `name` will also cause startup to fail.
Adds to GeoDistanceSortBuilder:
* equals
* hashcode
* writeto/readfrom
* moves xcontent parsing logic over
* adds roundtrip tests
* fixes roundtrip test for xcontent by keeping points just as geopoints not geohashes internally
* fixes xcontent parsing of ignore_malformed if coerce is set/unset
* adds exception to sortMode setter to avoid setting invalid sort modes
Relates to #15178
When primary relocation completes, a cluster state is propagated that deactivates the old primary and marks the new primary as active.
As cluster state changes are not applied synchronously on all nodes, there can be a time interval where the relocation target has processed
the cluster state and believes to be the active primary and the relocation source has not yet processed the cluster state update and
still believes itself to be the active primary. This commit ensures that, before completing the relocation, the reloction source deactivates
writing to its store and delegates requests to the relocation target.
Closes#15900
Adds a container that represents a resource with reference counting capabilities. Provides operations to suspend acquisition of new references. Useful for resource management when resources are intermittently unavailable.
Closes#15956
Its what we say our maximum line length is in CONTRIBUTING.md and it'd be
nice to have a check on that. Unfortunately, we don't actually wrap all lines
at 140.
Cli tools currently catch all exceptions, and only print the exception
message, except when a special system property is set. Even with this
flag set, certain exceptions, like IOException, are captured and their
stack trace is always lost.
This change adds a UserError class, which can be used a cli tools to
specify a message to the user, as well as an exit status. All other
exceptions are propagated out of main, so java will exit with non-zero
and print the stack trace.
This commit fixes an inconsistency between ShardId#equals(Object),
ShardId#hashCode, and ShardId#compareTo(ShardId). In particular,
ShardId#equals(Object) compared only the numerical shard ID and the
index name, but did not compare the index UUID; a similar situation
applies to ShardId#compareTo(ShardId). However, ShardId#hashCode
incorporated the indexUUID into its calculation. This can lead to
situations where two ShardIds compare as equal yet have different hash
codes.
Closes#16319
The following settings were moved to Setting contents in HttpTransportSettings:
* http.cors.allow-origin
* http.port
* http.publish_port
* http.detailed_errors.enabled
* http.max_content_length
* http.max_chunk_size
* http.max_header_size
* http.max_initial_line_length
The following settings were removed:
* http.port
* http.netty.port
* http.netty.publish_port
* http.netty.max_content_length
* http.netty.max_chunk_size
* http.netty.max_header_size
* http.netty.max_initial_line_length
As a prerequisite for refactoring the whole PhraseSuggestionBuilder
to be able to be parsed and streamed from the coordinating node, the
DirectCandidateGenerator must implement Writeable, be able to parse
a new instance (fromXContent()) and later when transported to the
shard to generate a PhraseSuggestionContext.DirectCandidateGenerator.
Also adding equals/hashCode and tests and moving DirectCandidateGenerator
to its own DirectCandidateGeneratorBuilder class.
Long ago (#7834) the owner ship of the local disco node was centralized to the cluster service. LocalDiscovery is still created it's own disco node, which is not used by the cluster service and thus creating confusion (two nodes same name but different ids).
This commit also removes and optimization where when joining a new master we would first copy the master's metadata and only then pull in the rest of the cluster state (and it's nodes).
Closes#16317
The plugin cli currently is extremely lenient, allowing most errors to
simply be logged. This can lead to either corrupt installations (eg
partially installed plugins), or confused users.
This change rewrites the plugin cli to have almost no leniency.
Unfortunately it was not possible to remove all leniency, due in
particular to how config files are handled.
The following functionality was simplified:
* The format of the name argument to install a plugin is now an official
plugin name, maven coordinates, or a URL.
* Checksum files are required, and only checked, for official plugins
and maven plugins. Checksums are also only SHA1.
* Downloading no longer uses a separate thread, and no longer has a timeout.
* Installation, and removal, attempts to be atomic. This only truly works
when no config or bin files exist.
* config and bin directories are verified before copying is attempted.
* Permissions and user/group are no longer set on config and bin files.
We rely on the users umask.
* config and bin directories must only contain files, no subdirectories.
* The code is reorganized so each command is a separate class. These
classes already existed, but were embedded in the plugin cli class, as
an extra layer between the cli code and the code running for each command.
With the recent change to allow ESSingleNodeTestCase to specify plugins
and Version for the node, node creation became lazy, where it now
happens before the first test to run. However, there were many static
methods on ESSingleNodeTestCase that still try to access the node. This
change makes those methods non-static.
This commit removes and forbids the use of lowercase ells ('l') in long
literals because they are often hard to distinguish from the digit
representing one ('1').
Closes#16329
Previously it's possible to get errors like:
```
Caused by: NotSerializableExceptionWrapper[d:\shared_data\afs-issue-1-1\23]
```
Which doesn't tell us what the underlying exception type was that could
not be serialized.
This changes the message to prepend the
`ElasticsearchException.getExceptionName()` of the exception (which is a
underscore case of the class with the leading 'Elasticsearch' removed)
This commit amends a logging statement in TransportReplicationAction to
log the full shard ID instead of just the numerical shard ID (without
the index name).
When there is an exception thrown during pipeline creation within
Rest calls (in put pipeline, and simulate) We now return a structured
error response to the user with details around which processor's
configuration is the cause of the issue, or which configuration property
is misconfigured, etc.
Does so by introducing TaskListener which is just like ActionListener but
gets the Task as each parameter. Unlike ActionListener which is used
_everywhere_ you can only use TaskListener directly with TransportAction.
TransportAction under the covers uses an ActionListener implemetation that
closes over the task to call the TaskListener.
We are leaking all kinds of resources if something during Node#close() barfs.
This commit cuts over to a list of closeables to release resources that
also closed remaining services if one or more services fail to close.
Closes#13685
This is an issue due to some refactoring we did along the lines of Index.java etc.
We pass the index object to Set#contains which should actually be only it's name.
Closes#16299
This can cause premature closing of the underlying file descriptor immediately
if at the same time the thread is blocked on IO. The file descriptor will remain closed
and subsequent access to NIOFSDirectory will throw a ClosedChannelException.
Adds a method that emits a WordScorerFactory to all of the
three SmoothingModel implementatins that will be needed when
we switch to parsing the PhraseSuggestion on the coordinating
node and need to delay creating the WordScorer on the shards.
Now MasterNodeOperations, ReplicationAllShards, ReplicationSingleShard, BroadcastReplication and BroadcastByNode actions keep track of their parent tasks.