Failing an initializing primary when shadow replicas are enabled for the index can leave the primary unassigned with replicas being active. Instead, a replica should be promoted to primary, which is fixed by this commit.
This commit makes two changes to how the in-sync allocations set is updated:
- the set is only trimmed when it grows. This prevents trimming too eagerly when the number of replicas was decreased while shards were unassigned.
- the allocation id of an active primary that failed is only removed from the in-sync set if another replica gets promoted to primary. This prevents the situation where the only available shard copy in the cluster gets removed the in-sync set.
Closes#21719
We had tests for the regular factories, but not for the pre-built ones, that
ship by default without requiring users to define them in the analysis settings.
Currently we expose the internal representation that we use for ip addresses,
which are the ipv6 bytes. However, this is not really usable, exposes internal
implementation details and also does not work fine with other APIs that expect
that the values can be `toString`'d.
Closes#21977
`_update_by_query` supports specifying the `pipeline` to process the
documents as a url parameter but `_reindex` doesn't. It doesn't because
everything about the `_reindex` request that has to do with writing
the documents is grouped under the `dest` object in the request body.
This changes the response parameter from
`request [_reindex] contains unrecognized parameter: [pipeline]` to
`_reindex doesn't support [pipeline] as a query parmaeter. Specify it in the [dest] object instead.`
When you submit a _termvectors request for an artificial document and
specify the 'preference' parameter to send the request to a particular
shard, the request sometimes hits NPE. Fix this case by ignoring the
auto-generated artificial document ID and pick a shard per the
preference parameter, or a random shard.
This closes#21928
* Include unindexed field in FieldStats response
This change adds non-searchable fields to the FieldStats response. These fields do not have min/max informations but they can be aggregatable. Fields that are only stored in _source (store:no, index:no, doc_values:no) will still be missing since they do not have any useful information to show. Indices and clients must be at least on V_5_2_0 to see this change.
Since the removal of local discovery of #https://github.com/elastic/elasticsearch/pull/20960 we rely on minimum master nodes to be set in our test cluster. The settings is automatically managed by the cluster (by default) but current management doesn't work with concurrent single node async starting. On the other hand, with `MockZenPing` and the `discovery.initial_state_timeout` set to `0s` node starting and joining is very fast making async starting an unneeded complexity. Test that still need async starting could, in theory, still do so themselves via background threads.
Note that this change also removes the usage of `INITIAL_STATE_TIMEOUT_SETTINGS` as the starting of nodes is done concurrently (but building them is sequential)
With this commit we document that ingest processors need to be
thread-safe. Previously this could be inferred from reading the
source code but we got several user questions about this so
it is stated explicitly in the Javadocs of Processor now.
Action filters currently have the ability to filter both the request and
response. But the response side was not actually used. This change
removes support for filtering responses with action filters.
Changes the default socket and connection timeouts for the rest
client from 10 seconds to the more generous 30 seconds.
Defaults reindex-from-remote to those timeouts and make the
timeouts configurable like so:
```
POST _reindex
{
"source": {
"remote": {
"host": "http://otherhost:9200",
"socket_timeout": "1m",
"connect_timeout": "10s"
},
"index": "source",
"query": {
"match": {
"test": "data"
}
}
},
"dest": {
"index": "dest"
}
}
```
Closes#21707
The RecoveryFailedException's output prints the source and
target nodes for the recovery. However, sometimes there is
no source node for the recovery, only a target node (such as
when recovering a primary shard from disk). In this case,
we don't want to display the source node. This commit fixes
this by displaying "Recovery failed on target node.." instead
of "Recovery failed from null to target node" which is what the
output currently displays.
This commit increases the test logging on the unicast zeng ping test of
simple pings to gather more info for chasing a race condition that is
happening in this test.
Sadly, the timeouts here need to be increased to reduce the likelihood
of spurious test failures (test hosts under load are especially prone to
this). This does slow down this test suite a bit, but it's still not as
slow as it was before this endeavor of lowering these timeouts started.
This commit cleans up the unicast zen ping unknown hosts cached test:
- send pings from the same node to more clearly indicate DNS lookups
are not cached (within the same UnicastZenPing instance)
- increase ping and wait timeout to 500ms to address race conditions
(on a test host under load, the timeout was too short for the
connect/handshake/ping cycle to complete)
The port limit test is a simple test that fakes that resolving an
address with a port range results the correct address collection. This
test is subject to a race condition where the timeout on the resolve
request can fire before the resolve code finishes executing (this race
is exceptionally rare, because there are not actually any DNS lookups
being done here since we are just resolving addresses). This commit
increases the timeout here to significantly reduce the chance of a
losing race causing a spurious test failure. This increased timeout
should not increase the runtime of the test, just make failures less
likely.
The unknown hosts test is a simple test that fakes that resolving an
address results in an unknown host exception. The main purpose of this
test is to ensure that we log (and do not silently drop) when a host
fails to resolve. This test is subject to a race condition where the
timeout on the resolve request can fire before the resolve code finishes
executing (this race is exceptionally rare, because there are not
actually any DNS lookups being done here, just a mock resolve
implementation that throws an exception and that's where losing the race
can arise). This commit increases the timeout here to significantly
reduce the chance of a losing race causing a spurious test failure. This
increased timeout should not increase the runtime of the test, just make
failures less likely.
This commit renames InternalEngine#loadSeqNoStatsLucene to
InternalEngine#loadSeqNoStatsFromLucene to make this name consistent
with the method InternalEngine#loadSeqNoStatsFromLuceneAndTranslog.
* Plugins: Replace Rest filters with RestHandler wrapper
RestFilters are a complex way of allowing plugins to add extra code
before rest actions are executed. This change removes rest filters, and
replaces with a wrapper which a single plugin may provide.
Today when starting a new engine, we read the global checkpoint from the
translog only if we are opening an existing translog. This commit
clarifies this situation by distinguishing the three cases of engine
creation in the constructor leading to clearer code.
Relates #21934
Today if system call filters fail to install on startup, we log a
message but otherwise march on. This might leave users without system
call filters installed not knowing that they have implicitly accepted
the additional risk. We should not be lenient like this, instead clearly
informing the user that they have to either fix their configuration or
accept the risk of not having system call filters installed. This commit
adds a bootstrap check that if system call filters are enabled, they
must successfully install.
Relates #21940
In #21828, serialization of the host string was added to preserve this information when
a TransportAddress gets serialized. However, there is still a case where this did not always
work. In UnicastZenPings, DiscoveryNode instances are created for the ping hosts with the
minimum compatibility version, which is currently less than the version required to preserve
the host information. This means that when a node is received from a PingResponse that the
host information is no longer set correctly on the InetSocketAddress contained in the
DiscoveryNode.
This commit adds a workaround for this situation by allowing the host string to be passed
into the TransportAddress constructor that takes a StreamInput and using that as the host
for the InetAddress that is created during deserialization.
I added an assertion to Netty4/Netty3Transport in 5.x that is not in
master yet. This commit port the assert to ensure we consumed all connection
in `connectToChannels`
If you ask for the term vectors of an artificial document with
term_statistics=true, but a shard does not have any terms of the doc's
field(s), it returns the doc's term vectors values as the shard-level
term statistics. This commit fixes that to return 0 for `ttf` and also
field-level aggregated statistics.
Closes#21906
We don't use the test infra nor do we run the tests. They might all be
entirely out of date. We also have a different BWC test infra in-place.
This change removes all of the legacy infra.
This commit adds a skip for the missing types REST test. There was a bug
for an unclosed XContent object on version prior to version 5.0.3 which
is going to lead to different responses on vresions prior to 5.0.3 and
versions on or after version 5.0.3.
The reference for the jvm.options docs recently changed from
es-java-opts to jvm-options. This commit fixes a broken reference that
arose as a result of this change.
When there are no indexes, get mapping has a series of special cases.
Two of those expect the response object already started, and the other
two respond with an exception. Those two cases (types passed in but no
indexes and vice versa) would fail in their error response generation
because it did not expect an object to already be started in the json
generator. This change moves the object start to where it is needed for
the empty responses.
closes#21916
This commit fixes the handling of spaces in Windows paths. The current
mechanism works fine in a path that contains a single space, but fails
on a path that contains multiple spaces. With this commit, that is no
longer the case.
Relates #21921
Elasticsearch can be run in a few different ways:
- from the command line on Linux and Windows
- as a service on Linux and Windows
on both 32-bit client and 64-bit server VMs. We strive for a great
out-of-the-box experience any of these combinations but today it is
lacking on 32-bit client JVMs and on the Windows service. There are two
deficiencies that arise:
- on any 32-bit client JVM we fail to start out of the box because we
force the server JVM in jvm.options
- when installing the Windows service, the thread stack size must be
specified in jvm.options
This commit attempts to address these deficiencies.
We should continue to force the server JVM because there are systems
where the server JVM is not active by default (e.g., the 32-bit JDK on
Windows). This does mean that if a user tries to run with a client JVM
they will see a failure message at startup but this is the best that we
can do if we want to continue to force the server JVM. Thus, this commit
at least documents this situation.
To improve the situation with installing the Windows service, this
commit adds a default setting for the thread stack size. This default is
chosen based on the default thread stack size across all 64-bit server
JVMs. This means that if a user tries to run with a 32-bit JVM they
could otherwise see significantly higher memory usage (this situation is
complicated, it's really only on Windows where the extra memory usage is
egregious, but cutting into the 32-bit address space on any system is
bad). So this commit makes it so that the out-of-the-box experience is
improved for the Windows service on 64-bit server JVMs and we document
the need to adjust this setting on 32-bit JVMs.
Again, we are focusing on the out-of-the-box experience here and this
means optimizing for the best experience on any 64-bit server JVM as
this covers the vast majority of the user base. The users that are on
32-bit JVMs will suffer a little bit but at least now any user on any
64-bit server JVM can start Elasticsearch out of the box.
Finally, we fix some references to the jvm.options documentation.
Relates #21920
During package install on systemd-based systems, we try to set
vm.max_map_count. On some systems (e.g., containers), users do not have
the ability to tune these parameters from within the container. This
commit provides an option for these users to skip setting such kernel
parameters.
Relates #21899
Today we can easily join a cluster that holds an index we don't support since
we currently allow rolling upgrades from 5.x to 6.x. Along the same lines we don't check if we can support an index based on the nodes in the cluster when we open, restore or metadata-upgrade and index. This commit adds
additional safety that fails cluster state validation, open, restore and /or upgrade if there is an open index with an incompatible index version created in the cluster.
Realtes to #21670
Timeouts are global today across all connections this commit allows to specify
a connection timeout per node such that depending on the context connections can
be established with different timeouts.
Relates to #19719