3303 Commits

Author SHA1 Message Date
David Turner
a09389c511 AwaitsFix GatewayIndexStateIT#testJustMasterNode
Relates #44416.
2019-07-16 11:02:32 +01:00
David Turner
8d68d1f54d Cluster health should await events plus other things (#44348)
Today a cluster health request can wait on a selection of conditions, but it
does not guarantee that all of these conditions have ever held simultaneously
when it returns. More specifically, if a request sets `waitForEvents()` along
with some other conditions then Elasticsearch will respond when the master has
processed all the expected pending tasks _and then_ the cluster satisfied the
other conditions, but it may be that at the time the cluster satisfied the
other conditions there were undesired pending tasks again.

This commit adjusts the behaviour of `waitForEvents()` to wait for all the
required events to be processed and then, if the resulting cluster state does
not satisfy the other conditions, it will wait until there is a cluster state
that does and then retry the wait-for-events too.
2019-07-16 06:34:02 +01:00
Ryan Ernst
e0b82e92f3
Convert BaseNode(s) Request/Response classes to Writeable (#44301) (#44358)
This commit converts all BaseNodeResponse and BaseNodesResponse
subclasses to implement Writeable.Reader instead of Streamable.

relates #34389
2019-07-15 18:07:52 -07:00
David Turner
86ee8eab3f Allow RerouteService to reroute at lower priority (#44338)
Today the `BatchedRerouteService` submits its delayed reroute task at `HIGH`
priority, but in some cases a lower priority would be more appropriate. This
commit adds the facility to submit delayed reroute tasks at different
priorities, such that each submitted reroute task runs at a priority no lower
than the one requested. It does not change the fact that all delayed reroute
tasks are submitted at `HIGH` priority, but at least it makes this explicit.
2019-07-15 17:41:39 +01:00
Ryan Ernst
59658daef9
Separate streamable based master node actions (#44313)
This commit creates new base classes for master node actions whose
response types still implement Streamable. This simplifies both finding
remaining classes to convert, as well as creating new master node
actions that use Writeable for their responses.

relates #34389
2019-07-15 09:20:20 -07:00
David Turner
e3d2af64c4 Throw TranslogCorruptedException in more cases (#44217)
Today we do not throw a `TranslogCorruptedException` in certain cases of
translog corruption, such as for a corrupted checkpoint file or when an
expected file (either checkpoint or translog) is completely missing. This means
that `elasticsearch-shard` will not truncate the translog in those cases.

This commit strengthens the translog corruption tests to corrupt and/or delete
both translog and checkpoint files, and ensures that a
`TranslogCorruptedException` is thrown in all cases. It also sometimes
simulates a recovery after a crash while rolling the translog generation,
including cases where the rolled checkpoint contains incorrect data.

It also adjusts (and renames) `RemoveCorruptedShardDataCommandIT.getDirs()` to
return only a single path, since in practice this was the only thing that could
happen and yet we were relying on its callers to verify this and not all
callers were doing so.
2019-07-15 15:20:33 +01:00
Armin Braun
eb1106c465
Stronger Cleanup Shard Snapshot Directory on Delete (#44257) (#44337)
* Stronger Cleanup Shard Snapshot Directory on Delete

* Use `RepositoryData` to clean up unreferenced `snap-${uuid}.dat` blobs
from shard directories (and index-N) and as a result also clean up data
blobs that are only referenced by them
* Stop cleaning up anything but index-N on shard snapshot creation to
align behavior of shard index-N handling with root path index-N
handling
2019-07-15 12:59:38 +02:00
Christoph Büscher
22dc125dad AnalyzeAction.Response doesn't need to call super.readFrom() (#44331)
The responses super.writeTo() method was removed in #44092, so the corresponding
contructor that reads from a stream shouldn't call super itself, even though its
implementation is currently empty.
2019-07-15 11:53:25 +02:00
Armin Braun
7f5d40d235
Avoid Needless Set Instantiation in InboundMessage (#44318) (#44329)
* Avoid Needless Set Instantiation in InboundMessage

* When `features` is empty (when there's no xpack) we constantly and needless instantiated a few objects here for the empty set on every message
2019-07-15 10:59:51 +02:00
Armin Braun
0cc94a457d
Remove non-SMILE Serialization from ChecksumBlobStoreFormat (#44278) (#44326)
* At least all the way back to 6.x we never use anything but `SMILE` in
production code with this class so I removed the more general
constructor and removed the format leniency from the deserialization
2019-07-15 10:59:33 +02:00
Tanguy Leroux
76a96c3774 Remove ReusePeerRecoverySharedTest class (#44275) 2019-07-15 10:29:29 +02:00
Armin Braun
d73e2f9c56
HLRC: Fix '+' Not Correctly Encoded in GET Req. (#33164) (#44324)
* HLRC: Fix '+' Not Correctly Encoded in GET Req.

* Encode `+` correctly as `%2B` in URL paths
* Keep encoding `+` as space in URL parameters
* Closes #33077
2019-07-15 10:21:54 +02:00
Nhat Nguyen
2203d447aa Fail engine if hit document failure on replicas (#43523)
An indexing on a replica should never fail after it was successfully
indexed on a primary. Hence, we should fail an engine if we hit any
failure (document level or tragic failure) when processing an indexing
on a replica.

Relates #43228
Closes #40435
2019-07-14 19:29:16 -04:00
Christoph Büscher
835b7a120d Fix AnalyzeAction response serialization (#44284)
Currently we loose information about whether a token list in an AnalyzeAction
response is null or an empty list, because we write a 0 value to the stream in
both cases and deserialize to a null value on the receiving side. This change
fixes this so we write an additional flag indicating whether the value is null
or not, followed by the size of the list and its content.

Closes #44078
2019-07-14 10:35:11 +02:00
Ryan Ernst
1dcf53465c Reorder HandledTransportAction ctor args (#44291)
This commit moves the Supplier variant of HandledTransportAction to have
a different ordering than the Writeable.Reader variant. The Supplier
version is used for the legacy Streamable, and currently having the
location of the Writeable.Reader vs Supplier in the same place forces
using casts of Writeable.Reader to select the correct super constructor.
This change in ordering allows easier migration to Writeable.Reader.

relates #34389
2019-07-12 13:45:09 -07:00
Nikita Glashenko
d187fcb9de Support WKT point conversion to geo_point type (#44107)
This PR adds support for parsing geo_point values from WKT POINT format.
Also, a few minor bugs in geo_point parsing were fixed.

Closes #41821
2019-07-12 14:31:07 -04:00
Przemyslaw Gomulka
e23ecc5838
JSON logging refactoring and X-Opaque-ID support backport(#41354) (#44178)
This is a refactor to current JSON logging to make it more open for extensions
and support for custom ES log messages used inDeprecationLogger IndexingSlowLog , SearchSLowLog
We want to include x-opaque-id in deprecation logs. The easiest way to have this as an additional JSON field instead of part of the message is to create a custom DeprecatedMessage (extends ESLogMEssage)

These messages are regular log4j messages with a text, but also carry a map of fields which can then populate the log pattern. The logic for this lives in ESJsonLayout and ESMessageFieldConverter.

Similar approach can be used to refactor IndexingSlowLog and SearchSlowLog JSON logs to contain fields previously only present as escaped JSON string in a message field.

closes #41350
 backport #41354
2019-07-12 16:53:27 +02:00
Armin Braun
9b4f50b40a
Remove Redundant GetAllSnapshots Method from RepositoryData (#44259) (#44271)
* With the removal of the incompatible snapshots list in RepositoryData
the get snapshots and get all snapshots methods are equivalent so I
removed one of them
2019-07-12 15:03:09 +02:00
Yannick Welsch
068286ca4b Remove RemoteClusterConnection.ConnectedNodes (#44235)
This instead exposes the set of connected nodes on ConnectionManager.
2019-07-12 14:54:21 +02:00
Armin Braun
6c02cf0241
Fix InternalTestCluster StopRandomNode Assertion (#44258) (#44265)
* The assertion added in #44214 is tripped by tests running dedicated
test clusters per test needlessly.This breaks existing tests like the one in #44245.
* Closes #44245
2019-07-12 13:18:55 +02:00
Armin Braun
ad6dce16f4
Safer Shard Snapshot Delete (#44165) (#44244)
* Safer Shard Snapshot Delete

* We shouldn't delete the snapshot meta file before we update the index
in the shard folder. If we fail to update the index-N after deleting the
existing index-N is broken because the snap- blob it references is gone.
2019-07-12 12:45:06 +02:00
David Turner
735c897ec6
Avoid counting votes from master-ineligible nodes (#43688)
Today if a master-eligible node is converted to a master-ineligible node it may
remain in the voting configuration, meaning that the master node may count its
publish responses as an indication that it has properly persisted the cluster
state. However master-ineligible nodes do not properly persist the cluster
state, so it is not safe to count these votes.

This change adjusts `CoordinationState` to take account of this from a safety
point of view, and also adjusts the `Coordinator` to prevent such nodes from
joining the cluster. Instead, it triggers a reconfiguration to remove from the
voting configuration a node that now appears to be master-ineligible before
processing its join.

Backport of #43688, see #44260.
2019-07-12 11:30:52 +01:00
Armin Braun
9e920f9612
Make Timestamps Returned by Snapshot APIs Consistent (#43148) (#44261)
* We don't have to calculate the start and end times form the shards for the status API, we have the start time available from the CS or the `SnapshotInfo` in the repo and can either take the end time form the `SnapshotInfo` or
take the most recent time from the shard stats for in progress snapshots
* Closes #43074
2019-07-12 12:05:35 +02:00
Mark Vieira
3cd9606566
Mute failing test 2019-07-11 13:32:49 -07:00
Armin Braun
0dd06cf7a5
Remove Dead Code Around Snapshots (#44109) (#44236)
* Just some random spots that have become unused with recent cleanups
2019-07-11 21:56:36 +02:00
Christoph Büscher
31725ef390 [Tests] Increase SimpleQueryStringIT allowed maxClauseCount (#44215)
For this test, we randomize the CLUSTER_MAX_CLAUSE_COUNT on test setup
(@BeforeClass) between 50 and 100. Some queries in the test generate 56 clauses
which hasn't been an issue before LUCENE-8811, but we slightly need to increase
the minimal possible clause count now.

Closes #44192
2019-07-11 20:16:20 +02:00
Yannick Welsch
ae8f625d73 Report usages old child breakers when breaking on real memory (#44221)
This will help in investigations where the real memory circuit breaker is tripped to better understand
on what the actual memory is used, i.e. whether it's a temporary thing (e.g. requests) in contrast to
more permanently allocated memory (e.g. accounting).
2019-07-11 19:52:12 +02:00
Armin Braun
2768662822
Cleanup Stale Root Level Blobs in Sn. Repository (#43542) (#44226)
* Cleans up all root level temp., snap-%s.dat, meta-%s.dat blobs that aren't referenced by any snapshot to deal with dangling blobs left behind by delete and snapshot finalization failures
   * The scenario that get's us here is a snapshot failing before it was finalized or a delete failing right after it wrote the updated index-(N+1) that doesn't reference a snapshot anymore but then fails to remove that snapshot
   * Not deleting other dangling blobs since that don't follow the snap-, meta- or tempfile naming schemes to not accidentally delete blobs not created by the snapshot logic
* Follow up to #42189
  * Same safety logic, get list of all blobs before writing index-N blobs, delete things after index-N blobs was written
2019-07-11 19:35:15 +02:00
Andrei Stefan
e9f9f00940
SQL: add pretty printing to JSON format (#43756) (#44220)
(cherry picked from commit cbd9d4c259bf5a541bc49f65f7973174a36df449)
2019-07-11 20:02:24 +03:00
Christos Soulios
c091b6c004
Migrating tests from AvgIT integration test to AvgAggregatorTests (#44076) (#44225)
This PR migrates most tests from AvgIT integration test to AvgAggregatorTests, as described in #42893
2019-07-11 19:20:13 +03:00
Armin Braun
5f22370b6b
Fix ShrinkIndexIT (#44214) (#44223)
* Fix ShrinkIndexIT

* Move this test suit to cluster scope. Currently, `testShrinkThenSplitWithFailedNode` stops a random node which randomly turns out to be the only shared master node so the cluster reset fails on account of the fact that no shared master node survived.
* Closes #44164
2019-07-11 17:58:00 +02:00
Igor Motov
1636701d69 CI: Disable SimpleQueryStringIT.testDocWithAllTypes
Tracked by #44192
2019-07-11 09:22:18 -05:00
Nick Knize
374030a53f
Upgrade to lucene-8.2.0-snapshot-860e0be5378 (#44171) (#44184)
Upgrades lucene library to lucene-8.2.0-snapshot-860e0be5378
2019-07-11 09:17:22 -05:00
Igor Motov
66a9b721f5 Add Map to XContentParser Wrapper (#44036)
In some cases we need to parse some XContent that is already parsed into
a map. This is currently happening in handling source in SQL and ingest
processors as well as parsing null_value values in geo mappings. To avoid
re-serializing and parsing the value again or writing another map-based
parser this commit adds an iterator that iterates over a map as if it was
XContent. This makes reusing existing XContent parser on maps possible.

Relates to #43554
2019-07-11 09:38:31 -04:00
Yannick Welsch
ea5513f2cf Make NodeConnectionsService non-blocking (#44211)
With connection management now being non-blocking, we can make NodeConnectionsService
avoid the use of MANAGEMENT threads that are blocked during the connection attempts.

I had to fiddle a bit with the tests as testPeriodicReconnection was using both the mock Threadpool
from the DeterministicTaskQueue as well as the real ThreadPool initialized at the test class level,
which resulted in races.
2019-07-11 14:08:07 +02:00
Armin Braun
51f0e941d3
Reduce Number of List Calls During Snapshot Create and Delete (#44088) (#44209)
* Reduce Number of List Calls During Snapshot Create and Delete

Some obvious cleanups I found when investigation the API call count
metering:
* No need to get the latest generation id after loading latest
repository data
   * Loading RepositoryData already requires fetching the latest
generation so we can reuse it
* Also, reuse list of all root blobs when fetching latest repo
generation during snapshot delete like we do for shard folders
* Lastly, don't try and load `index--1` (N = -1) repository data, it
doesn't exist -> just return the empty repo data initially
2019-07-11 13:52:36 +02:00
Armin Braun
8ce8c627dd
Some Cleanup in o.e.i.shard (#44097) (#44208)
* Some Cleanup in o.e.i.shard

* Extract one duplicated method
* Cleanup obviously unused code
2019-07-11 13:52:06 +02:00
Yannick Welsch
2ee07f1ff4 Simplify port usage in transport tests (#44157)
Simplifies AbstractSimpleTransportTestCase to use JVM-local ports  and also adds an assertion so
that cases like #44134 can be more easily debugged. The likely reason for that one is that a test,
which was repeated again and again while always spawning a fresh Gradle worker (due to Gradle
daemon) kept increasing Gradle worker IDs, causing an overflow at some point.
2019-07-11 13:35:37 +02:00
Armin Braun
c0ed64bb92
Improve Repository Consistency Check in Tests (#44204)
* Improve Repository Consistency Check in Tests (#44099)

* Check that index metadata as well as snapshot metadata always exists
when referenced by other metadata

* Fix SnapshotResiliencyTests on ExtraFS (#44113)

* As a result of #44099 we're now checking more directories and have to
ignore the `extraN` folders for those like we do for indices already
* Closes #44112
2019-07-11 11:14:37 +02:00
Armin Braun
8a554f9737
Remove IncompatibleSnapshots Logic from Codebase (#44096) (#44183)
* The incompatible snapshots logic was created to track 1.x snapshots that
became incompatible with 2.x
   * It serves no purpose at this point
   * It adds an additional GET request to every loading of
RepositoryData (from loading the incompatible snapshots blob)
2019-07-11 07:15:51 +02:00
Igor Motov
df2e1fb43e Geo: add validator that only checks altitude (#43893)
By default, we don't check ranges while indexing geo_shapes. As a
result, it is possible to index geoshapes that contain contain
coordinates outside of -90 +90 and -180 +180 ranges. Such geoshapes
will currently break SQL and ML retrieval mechanism. This commit removes
these restriction from the validator is used in SQL and ML retrieval.
2019-07-10 16:55:03 -04:00
Ryan Ernst
8fda49a834 Remove unused import in TransportShardBulkAction
Accidentally left from backporting #44092
2019-07-10 13:43:47 -07:00
Christoph Büscher
cbb19032df [Test] Additional logging for RemoteClusterClientTests (#44124) 2019-07-10 22:41:54 +02:00
Ryan Ernst
c6efb9be2a Convert ReplicationResponse to Writeable (#43953)
This commit convers ReplicationResponse and all its subclasses to
support Writeable.Reader as a constructor.

relates #34389
2019-07-10 12:45:10 -07:00
Ryan Ernst
fb77d8f461 Removed writeTo from TransportResponse and ActionResponse (#44092)
The base classes for transport requests and responses currently
implement Streamable and Writeable. The writeTo method on these base
classes is implemented with an empty implementation. Not only does this
complicate subclasses to think they need to call super.writeTo, but it
also can lead to not implementing writeTo when it should have been
implemented, or extendiong one of these classes when not necessary,
since there is nothing to actually implement.

This commit removes the empty writeTo from these base classes, and fixes
subclasses to not call super and in some cases implement an empty
writeTo themselves.

relates #34389
2019-07-10 12:42:04 -07:00
Zachary Tong
92ad588275
Remove generic on AggregatorFactory (#43664) (#44079)
AggregatorFactory was generic over itself, but it doesn't appear we
use this functionality anywhere (e.g. to allow the super class
to declare arguments/return types generically for subclasses to
override).  Most places use a wildcard constraint, and even when a
concrete type is specified it wasn't used.

But since AggFactories are widely used, this led to
the generic touching many pieces of code and making type signatures
fairly complex
2019-07-10 13:20:28 -04:00
Nhat Nguyen
b158919542 Do not use mock engine in PrimaryAllocationIT (#44083)
PrimaryAllocationIT#testForceStaleReplicaToBePromotedToPrimary 
relies on the flushing when a shard is no long assigned. This behavior,
however, can be randomly disabled in MockInternalEngine.

Closes #44049
2019-07-10 12:26:34 -04:00
David Turner
d0f1a756d9 Comment on the extra reroute after failing shards (#44152)
The `ShardFailedClusterStateTaskExecutor` fails some shards, which performs a
reroute, but then sometimes schedules a followup reroute. It's not clear from
the code why this followup is necessary, so this commit adds a short comment
describing why it's necessary.
2019-07-10 13:24:21 +01:00
David Roberts
cad804df92 [TEST] Mute ShrinkIndexIT
Due to https://github.com/elastic/elasticsearch/issues/44164
2019-07-10 13:22:25 +01:00
Martijn van Groningen
913b6a64e8
Replace Streamable w/ Writable for MultiSearchRequest (#44057)
This commit replaces usages of Streamable with Writeable for the
MultiSearchRequest class.

I ran into this when developing a custom action that reuses
MultiSearchRequest in the enrich branch.

Relates to #34389
2019-07-10 11:13:28 +02:00