Commit Graph

3153 Commits

Author SHA1 Message Date
David Turner e4fd0ce730 Reduce TestLogging usage in DisruptionIT tests (#43411)
Removes `@TestLogging` annotations in `*DisruptionIT` tests, so that the only
tests with annotations are those with open issues. Also adds links to the open
issues in the remaining cases.

Relates #43403
2019-06-21 15:01:03 +01:00
Christoph Büscher 4fe650c9e5 Fix DefaultShardOperationFailedException subclass xcontent serialization (#43435)
The current toXContent implementation can fail when the superclasses toXContent
is called (see #43423). This change makes sure that 
DefaultShardOperationFailedException#toXContent is final and implementations
need to add special fields in #innerToXContent. All implementations should write
to self-contained xContent objects. Also adding a test for xContent deserialization 
to CloseIndexResponseTests.

Closes #43423
2019-06-21 14:31:19 +02:00
Yu c88f2f23a5 Make Recovery API support `detailed` params (#29076)
Properly forwards the `detailed` parameter to show the recovery stats details.

Closes #28910
2019-06-21 09:05:33 +02:00
Andrei Stefan 90e151edeb
Mute MultiSearchRequestTests.java tests (#43467) 2019-06-21 08:38:21 +03:00
Jim Ferenczi cc6c114cb8 Fix round up of date range without rounding (#43303)
Today when searching for an exclusive range the java date math parser rounds up the value
with the granularity of the operation. So when searching for values that are greater than
"now-2M" the parser rounds up the operation to "now-1M". This behavior was introduced when
we migrated to java date but it looks like a bug since the joda math parser rounds up values
but only when a rounding is used. So "now/M" is rounded to "now-1ms" (minus 1ms to get the largest inclusive value)
in the joda parser if the result should be exclusive but no rounding is applied if the input
is a simple operation like "now-1M". This change restores the joda behavior in order to have
a consistent parsing in all versions.

Closes #43277
2019-06-20 23:59:08 +02:00
Tim Brooks 827f8fcbd5
Move reindex request parsing into request (#43450)
Currently the fromXContent logic for reindex requests is implemented in
the rest action. This is inconsistent with other requests where the
logic is implemented in the request. Additionally, it requires access to
the rest action in order to parse the request. This commit moves the
logic and tests into the ReindexRequest.
2019-06-20 17:49:11 -04:00
sandmannn cf610b5e81 Added parsing of erroneous field value (#42321) 2019-06-20 15:24:04 -04:00
Jake Landis 2f2d0a198f
add version 6.8.2 2019-06-20 12:07:55 -05:00
Zachary Tong a8a81200d0 Better support for unmapped fields in AggregatorTestCase (#43405)
AggregatorTestCase will NPE if only a single, null MappedFieldType
is provided (which is required to simulate an unmapped field).  While
it's possible to test unmapped fields by supplying other, non-related
field types... that's clunky and unnecessary.  AggregatorTestCase
just needs to filter out null field types when setting up.
2019-06-20 11:31:49 -04:00
Yannick Welsch 8c856d6d91 Adapt local checkpoint assertion
With async durability, it does not hold true anymore after #43205. This is fine.
2019-06-20 17:29:53 +02:00
Armin Braun 99a44a04f7
Fix Infinite Loops in ExceptionsHelper#unwrap (#42716) (#43421)
* Fix Infinite Loops in ExceptionsHelper#unwrap

* Keep track of all seen exceptions and break out on loops
* Closes #42340
2019-06-20 16:38:28 +02:00
Armin Braun 39fef8379b
Fix FsRepositoryTests.testSnapshotAndRestore (#42925) (#43420)
* The commit generation can be 3 or 2 here -> fixed by checking the actual generation on the second commit instead of hard coding 2
* Closes #42905
2019-06-20 16:36:40 +02:00
synical b4c4018d00 Remove Confusing Comment (#43400) 2019-06-20 15:02:37 +01:00
David Turner c8eb09f158 Fail connection attempts earlier in tests (#43320)
Today the `DisruptibleMockTransport` always allows a connection to a node to be
established, and then fails requests sent to that node such as the subsequent
handshake. Since #42342, we log handshake failures on an open connection as a
warning, and this makes the test logs rather noisy. This change fails the
connection attempt first, avoiding these unrealistic warnings.
2019-06-20 14:45:24 +01:00
Yannick Welsch e04a2258fc Fix testGlobalCheckpointSync
The test needed adaption after #43205, as the ReplicationTracker now distinguishes between the
knowledge of the persisted global checkpoint and the computed global checkpoint on the primary

Follow-up to #43205
2019-06-20 14:00:00 +02:00
Yannick Welsch a76c034866 Reduce shard started failure logging (#43330)
If the master is stepping or shutting down, the error-level logging can cause quite a bit of noise.
2019-06-20 13:23:05 +02:00
Yannick Welsch 7f8e1454ab Advance checkpoints only after persisting ops (#43205)
Local and global checkpoints currently do not correctly reflect what's persisted to disk. The issue is
that the local checkpoint is adapted as soon as an operation is processed (but not fsynced yet). This
leaves room for the history below the global checkpoint to still change in case of a crash. As we rely
on global checkpoints for CCR as well as operation-based recoveries, this has the risk of shard
copies / follower clusters going out of sync.

This commit required changing some core classes in the system:

- The LocalCheckpointTracker keeps track now not only of the information whether an operation has
been processed, but also whether that operation has been persisted to disk.
- TranslogWriter now keeps track of the sequence numbers that have not been fsynced yet. Once
they are fsynced, TranslogWriter notifies LocalCheckpointTracker of this.
- ReplicationTracker now keeps track of the persisted local and persisted global checkpoints of all
shard copies when in primary mode. The computed global checkpoint (which represents the
minimum of all persisted local checkpoints of all in-sync shard copies), which was previously stored
in the checkpoint entry for the local shard copy, has been moved to an extra field.
- The periodic global checkpoint sync now also takes async durability into account, where the local
checkpoints on shards only advance when the translog is asynchronously fsynced. This means that
the previous condition to detect inactivity (max sequence number is equal to global checkpoint) is
not sufficient anymore.
- The new index closing API does not work when combined with async durability. The shard
verification step is now requires an additional pre-flight step to fsync the translog, so that the main
verify shard step has the most up-to-date global checkpoint at disposition.
2019-06-20 11:12:38 +02:00
Tanguy Leroux 24cfca53fa Reconnect remote cluster when seeds are changed (#43379)
The RemoteClusterService should close the current 
RemoteClusterConnection and should build it again if 
the seeds are changed, similarly to what is done when 
the ping interval or the compression settings are changed.

Closes #37799
2019-06-20 10:30:02 +02:00
Luca Cavanna 94a4bc9933 SearchPhaseContext to not extend ActionListener (#43269)
The fact that SearchPhaseContext extends ActionListener makes it hard
to reason about when the original listener is notified and to trace
those calls. Also, the corresponding onFailure and onResponse were
only needed in two places, one each, where they can be replaced by a
more intuitive call, like sendSearchResponse for onResponse.
2019-06-20 10:21:24 +02:00
Jim Ferenczi c33d62adbc Reduce the number of docvalues iterator created in the global ordinals fielddata (#43091)
Today the fielddata for global ordinals re-creates docvalues readers of each segment
when building the iterator of a single segment. This is required because the lookup of
global ordinals needs to access the docvalues's TermsEnum of each segment to retrieve
the original terms. This also means that we need to create NxN (where N is the number of segment in the index) docvalues iterators
each time we want to collect global ordinal values. This wasn't an issue in previous versions since docvalues readers are stateless
before 6.0 so they are reused on each segment but now that docvalues are iterators we need to create a new instance each time
we want to access the values. In order to avoid creating too many iterators this change splits
the global ordinals fielddata in two classes, one that is used to cache a single instance per directory reader and one
that is created from the cached instance that can be used by a single consumer. The latter creates the TermsEnum of each segment
once and reuse them to create the segment's iterator. This prevents the creation of all TermsEnums each time we want to access
the value of a single segment, hence reducing the number of docvalues iterator to create to Nx2 (one iterator and one lookup per segment).
2019-06-20 08:44:07 +02:00
Jason Tedor 1f1a035def
Remove stale test logging annotations (#43403)
This commit removes some very old test logging annotations that appeared
to be added to investigate test failures that are long since closed. If
these are needed, they can be added back on a case-by-case basis with a
comment associating them to a test failure.
2019-06-19 22:58:22 -04:00
Lee Hinman 6b084e55c5
[7.x] Prevent NullPointerException in TransportRolloverAction (#43353) (#43397)
It's possible for the passed in `IndexMetaData` to be null (for
instance, cluster state passed in does not have the index in its
metadata) which in turn can cause a `NullPointerException` when
evaluating the conditions for an index. This commit adds null protection
and unit tests for this case.

Resolves #43296
2019-06-19 16:07:28 -06:00
Jim Ferenczi b957aa46ce Allocate memory lazily in BestBucketsDeferringCollector (#43339)
While investigating memory consumption of deeply nested aggregations for #43091
the memory used to keep track of the doc ids and buckets in the BestBucketsDeferringCollector
showed up as one of the main contributor. In my tests half of the memory held in the
 BestBucketsDeferringCollector is associated to segments that don't have matching docs
 in the selected buckets. This is expected on fields that have a big cardinality since each
 bucket can appear in very few segments. By allocating the builders lazily this change
 reduces the memory consumption by a factor 2 (from 1GB to 512MB), hence reducing the
impact on gcs for these volatile allocations. This commit also switches the PackedLongValues.Builder
with a RoaringDocIdSet in order to handle very sparse buckets more efficiently.

I ran all my tests on the `geoname` rally track with the following query:

````
{
    "size": 0,
    "aggs": {
        "country_population": {
            "terms": {
                "size": 100,
                "field": "country_code.raw"
            },
            "aggs": {
                "admin1_code": {
                    "terms": {
                        "size": 100,
                        "field": "admin1_code.raw"
                    },
                    "aggs": {
                        "admin2_code": {
                            "terms": {
                                "size": 100,
                                "field": "admin2_code.raw"
                            },
                            "aggs": {
                                "sum_population": {
                                    "sum": {
                                        "field": "population"
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
````
2019-06-19 22:10:59 +02:00
Christos Soulios d1637ca476
Backport: Refactor aggregation base classes to remove doEquals() and doHashCode() (#43363)
This PR is a backport a of #43214 from v8.0.0

A number of the aggregation base classes have an abstract doEquals() and doHashCode() (e.g. InternalAggregation.java, AbstractPipelineAggregationBuilder.java).

Theoretically this is so the sub-classes can add to the equals/hashCode and don't need to worry about calling super.equals(). In practice, it's mostly just confusing/inconsistent. And if there are more than two levels, we end up with situations like InternalMappedSignificantTerms which has to call super.doEquals() which defeats the point of having these overridable methods.

This PR removes the do versions and just use equals/hashCode ensuring the super when necessary.
2019-06-19 22:31:06 +03:00
Armin Braun be42b2c70c
Fix NetworkUtilsTests (#43295) (#43378)
* Follow up to #42109:
   * Adjust test to only check that interface lookup by name works not actually lookup IPs which is brittle since virtual interfaces can be destroyed/created by Docker while the tests are running

Co-authored-by:  Jason Tedor <jason@tedor.me>
2019-06-19 21:23:09 +02:00
Lee Hinman d81ce9a647 Return 0 for negative "free" and "total" memory reported by the OS (#42725)
* Return 0 for negative "free" and "total" memory reported by the OS

We've had a situation where the MX bean reported negative values for the
free memory of the OS, in those rare cases we want to return a value of
0 rather than blowing up later down the pipeline.

In the event that there is a serialization or creation error with regard
to memory use, this adds asserts so the failure will occur as soon as
possible and give us a better location for investigation.

Resolves #42157

* Fix test passing in invalid memory value

* Fix another test passing in invalid memory value

* Also change mem check in MachineLearning.machineMemoryFromStats

* Add background documentation for why we prevent negative return values

* Clarify comment a bit more
2019-06-19 10:35:48 -06:00
Nhat Nguyen b5c8b32cab Do not use soft-deletes to resolve indexing strategy (#43336)
This PR reverts #35230.

Previously, we reply on soft-deletes to fill the mismatch between the
version map and the Lucene index. This is no longer needed after #43202
where we rebuild the version map when opening an engine. Moreover,
PrunePostingsMergePolicy can prune _id of soft-deleted documents out of
order; thus the lookup result including soft-deletes sometimes does not
return the latest version (although it's okay as we only use a valid
result in an engine).

With this change, we use only live documents in Lucene to resolve the
indexing strategy. This is perfectly safe since we keep all deleted
documents after the local checkpoint in the version map.

Closes #42979
2019-06-19 10:40:24 -04:00
Martijn van Groningen a4c45b5d70
Replace Streamable w/ Writeable in SingleShardRequest and subclasses (#43222) (#43364)
Backport of: https://github.com/elastic/elasticsearch/pull/43222

This commit replaces usages of Streamable with Writeable for the
SingleShardRequest / TransportSingleShardAction classes and subclasses of
these classes.

Note that where possible response fields were made final and default
constructors were removed.

Relates to #34389
2019-06-19 16:15:09 +02:00
Paul Sanwald 8578aba654
[backport] Adds a minimum interval to `auto_date_histogram`. (#42814) (#43285)
Backports minimum interval to date histogram
2019-06-19 07:06:45 -04:00
Igor Motov 9f7d1ff2de Geo: Add coerce support to libs/geo WKT parser (#43273)
Adds support for coercing not closed polygons and ignoring Z value
to libs/geo WKT parser.

Closes #43173
2019-06-18 14:41:01 -04:00
Jim Ferenczi de1a685cce Fix sporadic failures in QueryStringQueryTests#testToQueryFuzzyQueryAutoFuziness (#43322)
This commit ensures that the test does not use reserved keyword (OR, AND, NOT)
when generating the random query strings.

Closes #43318
2019-06-18 20:18:09 +02:00
David Turner 90a8589294 Local node is discovered when cluster fails (#43316)
Today the `ClusterFormationFailureHelper` does not include the local node in
the list of nodes it claims to have discovered. This means that it sometimes
reports that it has not discovered a quorum when in fact it has. This commit
adds the local node to the set of discovered nodes.
2019-06-18 12:23:23 +01:00
David Turner 2e064e0d13 Allow election of nodes outside voting config (#43243)
Today we suppress election attempts on master-eligible nodes that are not in
the voting configuration. In fact this restriction is not necessary: any
master-eligible node can safely become master as long as it has a fresh enough
cluster state and can gather a quorum of votes. Moreover, this restriction is
sometimes undesirable: there may be a reason why we do not want any of the
nodes in the voting configuration to become master.

The reason for this restriction is as follows. If you want to shut the master
down then you might first exclude it from the voting configuration. When this
exclusion succeeds you might reasonably expect that a new master has been
elected, since the voting config exclusion is almost always a step towards
shutting the node down. If we allow nodes outside the voting configuration to
be the master then the excluded node will continue to be master, which is
confusing.

This commit adjusts the logic to allow master-eligible nodes to attempt an
election even if they are not in the voting configuration. If such a master is
successfully elected then it adds itself to the voting configuration. This
commit also adjusts the logic that causes master nodes to abdicate when they
are excluded from the voting configuration, to avoid the confusion described
above.

Relates #37712, #37802.
2019-06-18 12:10:48 +01:00
Nhat Nguyen 0c5086d2f3 Rebuild version map when opening internal engine (#43202)
With this change, we will rebuild the live version map and local
checkpoint using documents (including soft-deleted) from the safe commit
when opening an internal engine. This allows us to safely prune away _id
of all soft-deleted documents as the version map is always in-sync with
the Lucene index.

Relates #40741
Supersedes #42979
2019-06-17 18:08:09 -04:00
David Turner 2d9b3a69e8 Relocation targets are assigned shards too (#43276)
Adds relocation targets to the output of
`IndexShardRoutingTable#assignedShards`.
2019-06-17 17:14:09 +01:00
Henning Andersen ba15d08e14
Allow cluster access during node restart (#42946) (#43272)
This commit modifies InternalTestCluster to allow using client() and
other operations inside a RestartCallback (onStoppedNode typically).
Restarting nodes are now removed from the map and thus all
methods now return the state as if the restarting node does not exist.

This avoids various exceptions stemming from accessing the stopped
node(s).
2019-06-17 15:04:17 +02:00
David Turner 4b58827beb Make DiscoveryNodeRole into a value object (#43257)
Adds `equals()` and `hashcode()` methods to `DiscoveryNodeRole` to compare
these objects' values for equality, and adds a field to allow us to distinguish
unknown roles from known ones with the same name and abbreviation, for clearer
test failures.

Relates #43175
2019-06-17 10:23:29 +01:00
Alpar Torok a8bf18184a
Refactor Version class to make version bumps easier (#42668) (#43215)
With this change we only have to add one line to add a new version.
The intent is to make it less error prone and easier to write a script
to automate the process.
2019-06-17 10:49:20 +03:00
Nhat Nguyen 4b643c50fa Account soft deletes in committed segments (#43126)
This change fixes the delete count issue in segment stats where we don't
account soft-deleted documents from committed segments.

Relates #43103
2019-06-16 22:56:24 -04:00
Jay Modi c3f1e6a542 Ensure threads running before closing node (#43240)
There are a few tests within NodeTests that submit items to the
threadpool and then close the node. The tests are designed to check
how running tasks are affected during node close. These tests can cause
CI failures since the submitted tasks may not be running when the node
is closed and then execute after the thread context is closed, which
triggers an unexpected exception. This change ensures the threads are
running so we avoid the unexpected exception and can test these cases.

The test of task submittal while a node is closing is also important so
an additional but muted test has been added that tests the case where a
task may be getting submitted while the node is closing and ensuring we
do not trigger anything unexpected in these cases.

Relates #42774
Relates #42577
2019-06-14 12:35:43 -06:00
Julie Tibshirani 4b1d8e4433 Allow big integers and decimals to be mapped dynamically. (#42827)
This PR proposes to model big integers as longs (and big decimals as doubles)
in the context of dynamic mappings.

Previously, the dynamic mapping logic did not recognize big integers or
decimals, and would an error of the form "No matching token for number_type
[BIG_INTEGER]" when a dynamic big integer was encountered. It now accepts these
numeric types and interprets them as 'long' and 'double' respectively. This
allows `dynamic_templates` to accept and and remap them as another type such as
`keyword` or `scaled_float`.

Addresses #37846.
2019-06-14 10:05:11 -07:00
Yannick Welsch be9f27bb16 Properly use cancellable threads to stop UnicastZenPing (#42844)
Fixes a backport issue with #42884 where Zen1 was not properly taken into account.
2019-06-14 13:32:44 +02:00
David Turner 221d23de9f
Fix DiscoveryNodeRoleIT (#43225)
The test fails if querying the roles via a transport client, since the
transport client does not have the plugin necessary to interpret the additional
role correctly. This commit adds this plugin to the transport client used.

Relates #43175
Fixes #43223
2019-06-14 12:27:01 +01:00
Christoph Büscher 7af23324e3 SimpleQ.S.B and QueryStringQ.S.B tests should avoid `now` in query (#43199)
Currently the randomization of the q.b. in these tests can create query strings
that can cause caching to be disabled for this query if we query all fields and
there is a date field present. This is pretty much an anomaly that we shouldn't
generally test for in the "testToQuery" tests where cache policies are checked.

This change makes sure we don't create offending query strings so the cache
checks never hit these cases and adds a special test method to check this edge
case.

Closes #43112
2019-06-14 11:21:48 +02:00
Przemyslaw Gomulka 4c8e77e092
Disable DiscoveryNodeRoleIT test due to failures (#43224)
relates #43223
2019-06-14 10:57:22 +02:00
Przemysław Witek 65a584b6fb
[7.x] Report timing stats as part of the Job stats response (#42709) (#43193) 2019-06-14 09:03:14 +02:00
Przemyslaw Gomulka d27c0fd50d
Fix roundUp parsing with composite patterns backport(#43080) (#43191)
roundUp parsers were losing the composite pattern information when new
JavaDateFormatter was created from methods withLocale or withZone.

The roundUp parser should be preserved when calling these methods. This is the same approach in withLocale/Zone methods as in daa2ec8a60/server/src/main/java/org/elasticsearch/common/time/JavaDateFormatter.java

closes #42835
2019-06-14 08:56:26 +02:00
Jason Tedor 2bcc49424d
Register possible node roles in transport client
The transport client needs to be told about the possible node
roles. This commit does that.
2019-06-13 16:46:38 -04:00
Jason Tedor 55dba6ffad
Fix JDK-version dependent exception message parsing
This commit fixes some JDK-version dependent exception message checking
in the discovery node role tests.
2019-06-13 15:46:53 -04:00
Jason Tedor 5bc3b7f741
Enable node roles to be pluggable (#43175)
This commit introduces the possibility for a plugin to introduce
additional node roles.
2019-06-13 15:15:48 -04:00