There may be cases where the XContentBuilder is not used and therefore never gets closed, which can leak bytes. This change moves the creation of the builder into a try-with-resources block and adds an assertion to verify that we always consume the bytes in our code; the try-with-resources block also protects against memory leaks caused by plugins, which do not test for this.
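A minimal sketch of the pattern (the handler context, the `channel` variable, and the field values are illustrative, not the actual change):
```
try (XContentBuilder builder = XContentFactory.jsonBuilder()) {
    builder.startObject();
    builder.field("acknowledged", true);
    builder.endObject();
    // the bytes are consumed before the builder is closed; if anything above
    // throws, try-with-resources still releases the builder's buffer
    channel.sendResponse(new BytesRestResponse(RestStatus.OK, builder));
}
```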
Refactored ScriptType to clean up some of the variable and method names. Added more documentation. Deprecated the 'in' ParseField in favor of 'stored' to reflect indexed scripts being replaced by stored scripts.
We already throw this exception in some cases where the shard is closed, so we have to be consistent here. Otherwise we get logs like:
```
1> [2016-10-30T21:06:22,529][WARN ][o.e.i.IndexService ] [node_s_0] [test] failed to run task refresh - suppressing re-occurring exceptions unless the exception changes
1> org.elasticsearch.index.shard.IndexShardClosedException: CurrentState[CLOSED] operation only allowed when not closed
1> at org.elasticsearch.index.shard.IndexShard.verifyNotClosed(IndexShard.java:1147) ~[main/:?]
1> at org.elasticsearch.index.shard.IndexShard.verifyNotClosed(IndexShard.java:1141) ~[main/:?]
```
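For context, the check the stack trace points at looks roughly like the following (a simplified sketch; the real method lives in IndexShard):
```
private void verifyNotClosed() {
    final IndexShardState state = this.state;
    if (state == IndexShardState.CLOSED) {
        // a closed shard is consistently signalled with IndexShardClosedException
        throw new IndexShardClosedException(shardId);
    }
}
```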
Before publishing a cluster state, the master connects to the nodes that are added in that cluster state. When publishing fails, however, it does not disconnect from these nodes, leaving NodeConnectionsService out of sync with the currently applied cluster state.
The assertion assertMaster checks that all nodes have each other in their cluster state and have the correct master set. It is usually called after a disruption has been healed and ensureStableCluster has been called. Given the low publish timeout of 1s in this test class, publishing might not be fully complete even after ensureStableCluster returns. This commit adds an assertBusy to assertMaster so that a node has a bit more time to apply the cluster state from the master, even if it's a bit slow.
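A rough sketch of the shape of the change (the per-node cluster state helper and the exact assertions are hypothetical):
```
private void assertMaster(String expectedMaster, List<String> nodeNames) throws Exception {
    assertBusy(() -> {
        for (String nodeName : nodeNames) {
            // hypothetical helper that fetches the locally applied cluster state of a node
            ClusterState state = clusterStateOnNode(nodeName);
            assertEquals(nodeNames.size(), state.nodes().getSize());
            assertEquals(expectedMaster, state.nodes().getMasterNode().getName());
        }
    });
}
```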
A replication request may arrive at a replica before the replica's node has processed a required mapping update. In these cases the TransportReplicationAction will retry the request once a new cluster state arrives. Sadly, that retry logic failed to call `ReplicationRequest#onRetry`, causing duplicates in the append-only use case.
This commit fixes this and also the test which missed the check. I also added an assertion which would have helped find the source of the duplicates.
This was discovered by https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=opensuse/174/
Relates #20211
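A condensed sketch of the fix (the retry plumbing is simplified and the re-run helper is hypothetical; only `ReplicationRequest#onRetry` is the real API involved):
```
void retryOnNewClusterState(ReplicationRequest<?> request) {
    // previously missing: marks the request as a retry so the engine can
    // detect possibly duplicated append-only operations
    request.onRetry();
    // stand-in for the cluster state observer logic that re-runs the action
    waitForNextClusterStateAndRerun(request);
}
```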
Lucene 6.3 is expected to be released in the coming weeks, so it'd be good to give it some integration testing. I had to upgrade randomized-testing too so that both Lucene and Elasticsearch are on the same version.
This fixes our cluster formation task to run REST tests against a mixed version cluster.
Yet, due to some limitations in our test framework, `indices.rollover` tests are currently disabled for the BWC case since they select the current master as the merge node, which happens to be a BWC node, and we can't relocate all shards to it because the primaries are on a higher-version node. This will be fixed in a follow-up.
Closes #21142
Note: This has been cherry-picked from 5.0 and fixes several REST tests as well as a BWC break in `OsStats.java`.
The network disruption type "network delay" continues delaying existing requests even after the disruption has been cleared. This commit ensures that delayed requests are executed as soon as the delay rule is cleared.
This test failed when the node that was shutting down was not yet removed from the cluster state on the master.
The cluster allocation explain API will not see any unassigned shards until the node shutting down is removed from the
cluster state.
Previously, if a node left the cluster (for example, due to a long GC),
during a snapshot, the master node would mark the snapshot as failed, but
the node itself could continue snapshotting the data on its shards to the
repository. If the node rejoins the cluster, the master may assign it to
hold the replica shard (where it held the primary before getting kicked off
the cluster). The initialization of the replica shard would repeatedly fail
with a ShardLockObtainFailedException until the snapshot thread finally
finishes and relinquishes the lock on the Store.
This commit resolves the situation by ensuring that when a shard is removed
from a node (such as when a node rejoins the cluster and realizes it no longer
holds the active shard copy), any snapshotting of the removed shards is aborted.
In the scenario above, when the node rejoins the cluster, it will see in the cluster
state that the node no longer holds the primary shard, so IndicesClusterStateService
will remove the shard, thereby causing any snapshots of that shard to be aborted.
Closes #20876
The cluster state on a node is updated either
- by incoming cluster states that are received from the active master or
- by the node itself when it notices that the master has gone.
In the second case, the node adds the NO_MASTER_BLOCK and removes the current master as active master from its cluster state. In one particular case, it would also update the list of nodes, removing the master node that just failed. In the future, we want a clear separation between actions that can be executed by a master publishing a cluster state and a node locally updating its cluster state when no active master is around.
This commit fixes responses to HEAD requests so that the value of the Content-Length header is correct per the HTTP spec. Namely, the value of this header should be equal to the Content-Length that would be returned if the request were not a HEAD request.
This commit also fixes a memory leak on HEAD requests to the main action that arose from the bytes on a builder not being released; they were dropped on the floor to ensure that the response to the main action did not have a body.
Relates #21123
This commit adds preformatted tags to the Javadoc for
OsProbe#readSysFsCgroupCpuAcctCpuStat to render the form of the cpu.stat
file in a fixed-width font.
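The rendered Javadoc then looks roughly like this (the values shown are illustrative):
```
/**
 * Returns the lines of {@code cpu.stat} for the cpu control group, which have the form:
 *
 * <pre>
 * nr_periods 13717
 * nr_throttled 470
 * throttled_time 229000000000
 * </pre>
 */
```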
When acquiring cgroup stats, we check whether such stats are available by invoking a method areCgroupStatsAvailable. This method checks availability by looking for the existence of some virtual files in /proc/self/cgroup and /sys/fs/cgroup. If these stats are not available, the getCgroup method returns null. OsProbeTests#testCgroupProbe did not account for this: on some systems where tests run, the cgroup stats might not be available, yet this test method expected them to be (we mock the relevant virtual file reads). This commit handles the execution of this test on such systems by overriding the behavior of OsProbe#areCgroupStatsAvailable. We test both the possibility of this method returning true and of it returning false.
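A sketch of the test shape (the scaffolding and accessors here are assumptions, not the exact test code):
```
final boolean available = randomBoolean();
final OsProbe probe = new OsProbe() {
    @Override
    boolean areCgroupStatsAvailable() {
        return available;   // take cgroup availability out of the host's hands
    }
    // the reads of the relevant virtual files are mocked in the real test
};
final OsStats.Cgroup cgroup = probe.osStats().getCgroup();
if (available) {
    assertNotNull(cgroup);
} else {
    assertNull(cgroup);
}
```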
On some systems, cgroups will be available but not configured. And in
some cases, cgroups will be configured, but not for the subsystems that
we are expecting (e.g., cpu and cpuacct). This commit strengthens the
handling of cgroup stats on such systems.
Relates #21094
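A sketch of the kind of check involved (the helper name is hypothetical): lines in /proc/self/cgroup have the form hierarchy-id:controller-list:path, and cgroup stats should only be reported when the controllers we read from are actually present.
```
static boolean hasCpuAndCpuAcctControllers(List<String> procSelfCgroupLines) {
    final Set<String> controllers = new HashSet<>();
    for (String line : procSelfCgroupLines) {
        final String[] fields = line.split(":", 3);   // e.g. "7:cpuacct,cpu:/"
        Collections.addAll(controllers, fields[1].split(","));
    }
    return controllers.contains("cpu") && controllers.contains("cpuacct");
}
```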
When the system starts, it creates a temporary file named .es_temp_file to ensure the data directories are writable.
If the system crashes after creating .es_temp_file but before deleting it, the next start will fail because the Files.createFile(resolve) call throws an exception if the file already exists.
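One way to tolerate a stale marker file (a sketch of the idea, not necessarily the actual fix; `dataPath` stands in for the data directory being checked):
```
Path tempFile = dataPath.resolve(".es_temp_file");
Files.deleteIfExists(tempFile);   // a file left over from a crash would otherwise fail createFile
Files.createFile(tempFile);
Files.delete(tempFile);
```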