Commit Graph

3336 Commits

Author SHA1 Message Date
Tanguy Leroux 24cfca53fa Reconnect remote cluster when seeds are changed (#43379)
The RemoteClusterService should close the current 
RemoteClusterConnection and should build it again if 
the seeds are changed, similarly to what is done when 
the ping interval or the compression settings are changed.

Closes #37799
2019-06-20 10:30:02 +02:00
Luca Cavanna 94a4bc9933 SearchPhaseContext to not extend ActionListener (#43269)
The fact that SearchPhaseContext extends ActionListener makes it hard
to reason about when the original listener is notified and to trace
those calls. Also, the corresponding onFailure and onResponse were
only needed in two places, one each, where they can be replaced by a
more intuitive call, like sendSearchResponse for onResponse.
2019-06-20 10:21:24 +02:00
Jim Ferenczi c33d62adbc Reduce the number of docvalues iterator created in the global ordinals fielddata (#43091)
Today the fielddata for global ordinals re-creates docvalues readers of each segment
when building the iterator of a single segment. This is required because the lookup of
global ordinals needs to access the docvalues's TermsEnum of each segment to retrieve
the original terms. This also means that we need to create NxN (where N is the number of segment in the index) docvalues iterators
each time we want to collect global ordinal values. This wasn't an issue in previous versions since docvalues readers are stateless
before 6.0 so they are reused on each segment but now that docvalues are iterators we need to create a new instance each time
we want to access the values. In order to avoid creating too many iterators this change splits
the global ordinals fielddata in two classes, one that is used to cache a single instance per directory reader and one
that is created from the cached instance that can be used by a single consumer. The latter creates the TermsEnum of each segment
once and reuse them to create the segment's iterator. This prevents the creation of all TermsEnums each time we want to access
the value of a single segment, hence reducing the number of docvalues iterator to create to Nx2 (one iterator and one lookup per segment).
2019-06-20 08:44:07 +02:00
Jason Tedor 1f1a035def
Remove stale test logging annotations (#43403)
This commit removes some very old test logging annotations that appeared
to be added to investigate test failures that are long since closed. If
these are needed, they can be added back on a case-by-case basis with a
comment associating them to a test failure.
2019-06-19 22:58:22 -04:00
Lee Hinman 6b084e55c5
[7.x] Prevent NullPointerException in TransportRolloverAction (#43353) (#43397)
It's possible for the passed in `IndexMetaData` to be null (for
instance, cluster state passed in does not have the index in its
metadata) which in turn can cause a `NullPointerException` when
evaluating the conditions for an index. This commit adds null protection
and unit tests for this case.

Resolves #43296
2019-06-19 16:07:28 -06:00
Jim Ferenczi b957aa46ce Allocate memory lazily in BestBucketsDeferringCollector (#43339)
While investigating memory consumption of deeply nested aggregations for #43091
the memory used to keep track of the doc ids and buckets in the BestBucketsDeferringCollector
showed up as one of the main contributor. In my tests half of the memory held in the
 BestBucketsDeferringCollector is associated to segments that don't have matching docs
 in the selected buckets. This is expected on fields that have a big cardinality since each
 bucket can appear in very few segments. By allocating the builders lazily this change
 reduces the memory consumption by a factor 2 (from 1GB to 512MB), hence reducing the
impact on gcs for these volatile allocations. This commit also switches the PackedLongValues.Builder
with a RoaringDocIdSet in order to handle very sparse buckets more efficiently.

I ran all my tests on the `geoname` rally track with the following query:

````
{
    "size": 0,
    "aggs": {
        "country_population": {
            "terms": {
                "size": 100,
                "field": "country_code.raw"
            },
            "aggs": {
                "admin1_code": {
                    "terms": {
                        "size": 100,
                        "field": "admin1_code.raw"
                    },
                    "aggs": {
                        "admin2_code": {
                            "terms": {
                                "size": 100,
                                "field": "admin2_code.raw"
                            },
                            "aggs": {
                                "sum_population": {
                                    "sum": {
                                        "field": "population"
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
````
2019-06-19 22:10:59 +02:00
Christos Soulios d1637ca476
Backport: Refactor aggregation base classes to remove doEquals() and doHashCode() (#43363)
This PR is a backport a of #43214 from v8.0.0

A number of the aggregation base classes have an abstract doEquals() and doHashCode() (e.g. InternalAggregation.java, AbstractPipelineAggregationBuilder.java).

Theoretically this is so the sub-classes can add to the equals/hashCode and don't need to worry about calling super.equals(). In practice, it's mostly just confusing/inconsistent. And if there are more than two levels, we end up with situations like InternalMappedSignificantTerms which has to call super.doEquals() which defeats the point of having these overridable methods.

This PR removes the do versions and just use equals/hashCode ensuring the super when necessary.
2019-06-19 22:31:06 +03:00
Armin Braun be42b2c70c
Fix NetworkUtilsTests (#43295) (#43378)
* Follow up to #42109:
   * Adjust test to only check that interface lookup by name works not actually lookup IPs which is brittle since virtual interfaces can be destroyed/created by Docker while the tests are running

Co-authored-by:  Jason Tedor <jason@tedor.me>
2019-06-19 21:23:09 +02:00
Lee Hinman d81ce9a647 Return 0 for negative "free" and "total" memory reported by the OS (#42725)
* Return 0 for negative "free" and "total" memory reported by the OS

We've had a situation where the MX bean reported negative values for the
free memory of the OS, in those rare cases we want to return a value of
0 rather than blowing up later down the pipeline.

In the event that there is a serialization or creation error with regard
to memory use, this adds asserts so the failure will occur as soon as
possible and give us a better location for investigation.

Resolves #42157

* Fix test passing in invalid memory value

* Fix another test passing in invalid memory value

* Also change mem check in MachineLearning.machineMemoryFromStats

* Add background documentation for why we prevent negative return values

* Clarify comment a bit more
2019-06-19 10:35:48 -06:00
Nhat Nguyen b5c8b32cab Do not use soft-deletes to resolve indexing strategy (#43336)
This PR reverts #35230.

Previously, we reply on soft-deletes to fill the mismatch between the
version map and the Lucene index. This is no longer needed after #43202
where we rebuild the version map when opening an engine. Moreover,
PrunePostingsMergePolicy can prune _id of soft-deleted documents out of
order; thus the lookup result including soft-deletes sometimes does not
return the latest version (although it's okay as we only use a valid
result in an engine).

With this change, we use only live documents in Lucene to resolve the
indexing strategy. This is perfectly safe since we keep all deleted
documents after the local checkpoint in the version map.

Closes #42979
2019-06-19 10:40:24 -04:00
Martijn van Groningen a4c45b5d70
Replace Streamable w/ Writeable in SingleShardRequest and subclasses (#43222) (#43364)
Backport of: https://github.com/elastic/elasticsearch/pull/43222

This commit replaces usages of Streamable with Writeable for the
SingleShardRequest / TransportSingleShardAction classes and subclasses of
these classes.

Note that where possible response fields were made final and default
constructors were removed.

Relates to #34389
2019-06-19 16:15:09 +02:00
Paul Sanwald 8578aba654
[backport] Adds a minimum interval to `auto_date_histogram`. (#42814) (#43285)
Backports minimum interval to date histogram
2019-06-19 07:06:45 -04:00
Igor Motov 9f7d1ff2de Geo: Add coerce support to libs/geo WKT parser (#43273)
Adds support for coercing not closed polygons and ignoring Z value
to libs/geo WKT parser.

Closes #43173
2019-06-18 14:41:01 -04:00
Jim Ferenczi de1a685cce Fix sporadic failures in QueryStringQueryTests#testToQueryFuzzyQueryAutoFuziness (#43322)
This commit ensures that the test does not use reserved keyword (OR, AND, NOT)
when generating the random query strings.

Closes #43318
2019-06-18 20:18:09 +02:00
David Turner 90a8589294 Local node is discovered when cluster fails (#43316)
Today the `ClusterFormationFailureHelper` does not include the local node in
the list of nodes it claims to have discovered. This means that it sometimes
reports that it has not discovered a quorum when in fact it has. This commit
adds the local node to the set of discovered nodes.
2019-06-18 12:23:23 +01:00
David Turner 2e064e0d13 Allow election of nodes outside voting config (#43243)
Today we suppress election attempts on master-eligible nodes that are not in
the voting configuration. In fact this restriction is not necessary: any
master-eligible node can safely become master as long as it has a fresh enough
cluster state and can gather a quorum of votes. Moreover, this restriction is
sometimes undesirable: there may be a reason why we do not want any of the
nodes in the voting configuration to become master.

The reason for this restriction is as follows. If you want to shut the master
down then you might first exclude it from the voting configuration. When this
exclusion succeeds you might reasonably expect that a new master has been
elected, since the voting config exclusion is almost always a step towards
shutting the node down. If we allow nodes outside the voting configuration to
be the master then the excluded node will continue to be master, which is
confusing.

This commit adjusts the logic to allow master-eligible nodes to attempt an
election even if they are not in the voting configuration. If such a master is
successfully elected then it adds itself to the voting configuration. This
commit also adjusts the logic that causes master nodes to abdicate when they
are excluded from the voting configuration, to avoid the confusion described
above.

Relates #37712, #37802.
2019-06-18 12:10:48 +01:00
Nhat Nguyen 0c5086d2f3 Rebuild version map when opening internal engine (#43202)
With this change, we will rebuild the live version map and local
checkpoint using documents (including soft-deleted) from the safe commit
when opening an internal engine. This allows us to safely prune away _id
of all soft-deleted documents as the version map is always in-sync with
the Lucene index.

Relates #40741
Supersedes #42979
2019-06-17 18:08:09 -04:00
David Turner 2d9b3a69e8 Relocation targets are assigned shards too (#43276)
Adds relocation targets to the output of
`IndexShardRoutingTable#assignedShards`.
2019-06-17 17:14:09 +01:00
Henning Andersen ba15d08e14
Allow cluster access during node restart (#42946) (#43272)
This commit modifies InternalTestCluster to allow using client() and
other operations inside a RestartCallback (onStoppedNode typically).
Restarting nodes are now removed from the map and thus all
methods now return the state as if the restarting node does not exist.

This avoids various exceptions stemming from accessing the stopped
node(s).
2019-06-17 15:04:17 +02:00
David Turner 4b58827beb Make DiscoveryNodeRole into a value object (#43257)
Adds `equals()` and `hashcode()` methods to `DiscoveryNodeRole` to compare
these objects' values for equality, and adds a field to allow us to distinguish
unknown roles from known ones with the same name and abbreviation, for clearer
test failures.

Relates #43175
2019-06-17 10:23:29 +01:00
Alpar Torok a8bf18184a
Refactor Version class to make version bumps easier (#42668) (#43215)
With this change we only have to add one line to add a new version.
The intent is to make it less error prone and easier to write a script
to automate the process.
2019-06-17 10:49:20 +03:00
Nhat Nguyen 4b643c50fa Account soft deletes in committed segments (#43126)
This change fixes the delete count issue in segment stats where we don't
account soft-deleted documents from committed segments.

Relates #43103
2019-06-16 22:56:24 -04:00
Jay Modi c3f1e6a542 Ensure threads running before closing node (#43240)
There are a few tests within NodeTests that submit items to the
threadpool and then close the node. The tests are designed to check
how running tasks are affected during node close. These tests can cause
CI failures since the submitted tasks may not be running when the node
is closed and then execute after the thread context is closed, which
triggers an unexpected exception. This change ensures the threads are
running so we avoid the unexpected exception and can test these cases.

The test of task submittal while a node is closing is also important so
an additional but muted test has been added that tests the case where a
task may be getting submitted while the node is closing and ensuring we
do not trigger anything unexpected in these cases.

Relates #42774
Relates #42577
2019-06-14 12:35:43 -06:00
Julie Tibshirani 4b1d8e4433 Allow big integers and decimals to be mapped dynamically. (#42827)
This PR proposes to model big integers as longs (and big decimals as doubles)
in the context of dynamic mappings.

Previously, the dynamic mapping logic did not recognize big integers or
decimals, and would an error of the form "No matching token for number_type
[BIG_INTEGER]" when a dynamic big integer was encountered. It now accepts these
numeric types and interprets them as 'long' and 'double' respectively. This
allows `dynamic_templates` to accept and and remap them as another type such as
`keyword` or `scaled_float`.

Addresses #37846.
2019-06-14 10:05:11 -07:00
Yannick Welsch be9f27bb16 Properly use cancellable threads to stop UnicastZenPing (#42844)
Fixes a backport issue with #42884 where Zen1 was not properly taken into account.
2019-06-14 13:32:44 +02:00
David Turner 221d23de9f
Fix DiscoveryNodeRoleIT (#43225)
The test fails if querying the roles via a transport client, since the
transport client does not have the plugin necessary to interpret the additional
role correctly. This commit adds this plugin to the transport client used.

Relates #43175
Fixes #43223
2019-06-14 12:27:01 +01:00
Christoph Büscher 7af23324e3 SimpleQ.S.B and QueryStringQ.S.B tests should avoid `now` in query (#43199)
Currently the randomization of the q.b. in these tests can create query strings
that can cause caching to be disabled for this query if we query all fields and
there is a date field present. This is pretty much an anomaly that we shouldn't
generally test for in the "testToQuery" tests where cache policies are checked.

This change makes sure we don't create offending query strings so the cache
checks never hit these cases and adds a special test method to check this edge
case.

Closes #43112
2019-06-14 11:21:48 +02:00
Przemyslaw Gomulka 4c8e77e092
Disable DiscoveryNodeRoleIT test due to failures (#43224)
relates #43223
2019-06-14 10:57:22 +02:00
Przemysław Witek 65a584b6fb
[7.x] Report timing stats as part of the Job stats response (#42709) (#43193) 2019-06-14 09:03:14 +02:00
Przemyslaw Gomulka d27c0fd50d
Fix roundUp parsing with composite patterns backport(#43080) (#43191)
roundUp parsers were losing the composite pattern information when new
JavaDateFormatter was created from methods withLocale or withZone.

The roundUp parser should be preserved when calling these methods. This is the same approach in withLocale/Zone methods as in daa2ec8a60/server/src/main/java/org/elasticsearch/common/time/JavaDateFormatter.java

closes #42835
2019-06-14 08:56:26 +02:00
Jason Tedor 2bcc49424d
Register possible node roles in transport client
The transport client needs to be told about the possible node
roles. This commit does that.
2019-06-13 16:46:38 -04:00
Jason Tedor 55dba6ffad
Fix JDK-version dependent exception message parsing
This commit fixes some JDK-version dependent exception message checking
in the discovery node role tests.
2019-06-13 15:46:53 -04:00
Jason Tedor 5bc3b7f741
Enable node roles to be pluggable (#43175)
This commit introduces the possibility for a plugin to introduce
additional node roles.
2019-06-13 15:15:48 -04:00
Simon Willnauer f70141c862 Only load FST off heap if we are actually using mmaps for the term dictionary (#43158)
Given the significant performance impact that NIOFS has when term dicts are
loaded off-heap this change enforces FstLoadMode#AUTO that loads term dicts
off heap only if the underlying index input indicates a memory map.

Relates to #43150
2019-06-13 07:54:02 +02:00
Tal Levy 20031fb13f
Introduce unit tests for ValuesSourceType (#43174) (#43176)
As the ValuesSourceType evolves, it is important to be
confident that new enum constants do not break
backwards-compatibility on the stream. Having dedicated
unit tests for this class will help be sure of that.
2019-06-12 18:17:23 -07:00
Jim Ferenczi 6cfed7ec72 Also mmap terms index (`.tip`) files for hybridfs (#43150)
This change adds the terms index (`.tip`) to the list of extensions
that are memory-mapped by hybridfs. These files used to be accessed
only once to load the terms index on-heap but since #42838 they can
now be used to read the binary FST directly so it is benefical to
memory-map them instead of accessing them via NIO.
2019-06-12 20:54:09 +02:00
Yannick Welsch 8711a092bf Stop SeedHostsResolver on shutdown (#42844)
Fixes an issue where tests would sometimes hang for 5 seconds when restarting a node. The reason
is that the SeedHostsResolver is blockingly waiting on a result for the full 5 seconds when the
corresponding threadpool is shut down.
2019-06-12 19:36:10 +02:00
Simon Willnauer 9d2adfb41e Remove usage of FileSwitchDirectory (#42937)
We are still using `FileSwitchDirectory` in the case a user configures file based pre-load of mmaps. This is trappy for multiple reasons if the both directories used by `FileSwitchDirectory` point to the same filesystem directory. One issue is LUCENE-8835 that cause issues like #37111 - unless LUCENE-8835 isn't fixed we should not use it in elasticsearch. Instead we use a similar trick as we use for HybridFS and subclass mmap directory directly.
2019-06-12 19:35:27 +02:00
Alan Woodward 9de1c69c28 IndexAnalyzers doesn't need to extend AbstractIndexComponent (#43149)
AIC doesn't add anything here, and it removes the need to pass index settings
to the constructor.
2019-06-12 17:48:31 +01:00
Jim Ferenczi 79614aeb2d SearchRequest#allowPartialSearchResults does not handle successful retries (#43095)
When set to false, allowPartialSearchResults option does not check if the
shard failures have been reseted to null. The atomic array, that is used to record
shard failures, is filled with a null value if a successful request on a shard happens
after a failure on a shard of another replica. In this case the atomic array is not empty
but contains only null values so this shouldn't be considered as a failure since all
shards are successful (some replicas have failed but the retries on another replica succeeded).
This change fixes this bug by checking the content of the atomic array and fails the request only
if allowPartialSearchResults is set to false and at least one shard failure is not null.

Closes #40743
2019-06-12 16:27:10 +02:00
Christoph Büscher 7f690e8606 Fix suggestions for empty indices (#42927)
Currently suggesters return null values on empty shards. Usually this gets replaced
by results from other non-epmty shards, but if the index is completely epmty (e.g. after
creation) the search responses "suggest" is also "null" and we don't render a corresponding
output in the REST response. This is an irritating edge case that requires special handling on
the user side (see #42473) and should be fixed.

This change makes sure every suggester type (completion, terms, phrase) returns at least an
empty skeleton suggestion output, even for empty shards. This way, even if we don't find
any suggestions anywhere, we still return and output the empty suggestion.

Closes #42473
2019-06-12 15:42:23 +02:00
Alexander Reelsen 6f95038001 Upgrade HPPC to version 0.8.1 (#43025) 2019-06-12 13:14:16 +02:00
Luca Cavanna afeda1a7b9 Split search in two when made against throttled and non throttled searches (#42510)
When a search on some indices takes a long time, it may cause problems to other indices that are being searched as part of the same search request and being written to as well, because their search context needs to stay open for a long time. This is especially a problem when searching against throttled and non-throttled indices as part of the same request. The problem can be generalized though: this may happen whenever read-only indices are searched together with indices that are being written to. Search contexts staying open for a long time is only an issue for indices that are being written to, in practice.

This commit splits the search in two sub-searches: one for read-only indices, and one for ordinary indices. This way the two don't interfere with each other. The split is done only when size is greater than 0, no scroll is provided and query_then_fetch is used as search type. Otherwise, the search executes like before. Note that the returned num_reduce_phases reflect the number of reduction phases that were run. If the search is split in two, there are three reductions: one non-final for each search, and a final one that merges the results of the previous two.

Closes #40900
2019-06-12 11:25:03 +02:00
Luca Cavanna 31e8bff2ac Rename SearchRequest#crossClusterSearch (#42363)
The SearchRequest#crossClusterSearch method is currently used only as
part of cross cluster search request, when minimizing roundtrips.
It will soon be used also when splitting a search into two: one for
throttled and one for non throttled indices. It will probably be used
for other usecases as well in the future, hence it makes sense to generalize its name to subSearchRequest.
2019-06-12 11:25:03 +02:00
Henning Andersen 30d8085d96 scheduleAtFixedRate would hang (#42993)
Though not in use in elasticsearch currently, it seems surprising that
ThreadPool.scheduler().scheduleAtFixedRate would hang. A recurring
scheduled task is never completed (except on failure) and we test for
exceptions using RunnableFuture.get(), which hangs for periodic tasks.
Fixed by checking that task is done before calling .get().
2019-06-11 19:46:37 +02:00
David Turner 04cde1d6e2 Defer reroute when nodes join (#42855)
Today the master eagerly reroutes the cluster as part of processing node joins.
However, it is not necessary to do this reroute straight away, and it is
sometimes preferable to defer it until later. For instance, when the master
wins its election it processes joins and performs a reroute, but it would be
better to defer the reroute until after the master has become properly
established.

This change defers this reroute into a separate task, and batches multiple such
tasks together.
2019-06-11 14:00:18 +01:00
Henning Andersen 1c7cd09375 Enable TRACE for testRecoverBrokenIndexMetadata (#43081)
Relates to #43034
2019-06-11 12:38:48 +02:00
Jim Ferenczi 900eb4f882 Handle empty terms index in TermsSliceQuery (#43078)
#40741 introduced a merge policy that can drop the postings for the `_id`
field on soft deleted documents. The TermsSliceQuery assumes that every document
has has an entry in the postings for that field so it doesn't check if the terms
index exists or not. This change fixes this bug by checking if the terms index for
the `_id` field is null and ignore the segment entirely if it's the case. This should
be harmless since segments without an `_id` terms index should only contain soft deleted
documents.

Closes #42996
2019-06-11 12:01:53 +02:00
Henning Andersen 6a77dde5ea Better test diag output on OOM (#42989)
If linearizability checking fails with OOM (or other exception), we did
not get the serialized history written into the log, making it difficult
to debug in cases where the problem is hard to reproduce. Fixed to
always attempt dumping the serialized history.

Related to #42244
2019-06-11 09:48:52 +02:00
Alan Woodward 8e23e4518a Move construction of custom analyzers into AnalysisRegistry (#42940)
Both TransportAnalyzeAction and CategorizationAnalyzer have logic to build
custom analyzers for index-independent analysis. A lot of this code is duplicated,
and it requires the AnalysisRegistry to expose a number of internal provider
classes, as well as making some assumptions about when analysis components are
constructed.

This commit moves the build logic directly into AnalysisRegistry, reducing the
registry's API surface considerably.
2019-06-10 14:33:25 +01:00
Jim Ferenczi 39cb1abc9d Fix auto fuzziness in query_string query (#42897)
Setting `auto` after the fuzzy operator (e.g. `"query": "foo~auto"`) in the `query_string`
does not take the length of the term into account when computing the distance and always use
a max distance of 1. This change fixes this disrepancy by ensuring that the term is passed when
the fuzziness is computed.
2019-06-10 10:13:16 +02:00
Vigya Sharma 25218733e6 Allow routing commands with ?retry_failed=true (#42658)
We respect allocation deciders, including the `MaxRetryAllocationDecider`, when
executing reroute commands. If you specify `?retry_failed=true` then the retry
counter is reset, but today this does not happen until after trying to execute
the reroute commands. This means that if an allocation has repeatedly failed,
but you want to take control and assign a shard to a particular node to work
around the repeated failures, you cannot execute the routing command in the
same call to `POST /_cluster/reroute` as the one that resets the failure
counter.

This commit fixes this by resetting the failure counter first, meaning that you
can now explicitly allocate a repeatedly-failed shard like this:

```
POST /_cluster/reroute?retry_failed=true
{
  "commands": [
    {
      "allocate_replica": {
        "index": "blahblah",
        "shard": 2,
        "node": "node-4"
      }
    }
  ]
}
```

Fixes #39546
2019-06-10 08:31:05 +01:00
Jason Tedor 63bad28005
Do not allow modify aliases on followers (#43017)
Now that aliases are replicated by a follower from its leader, this
commit prevents directly modifying aliases on follower indices.
2019-06-09 22:53:54 -04:00
Nhat Nguyen 0ebcb21d2c Unmuted testRecoverBrokenIndexMetadata
These tests should be okay as we flush at the end of peer recovery.

Closes #40867
2019-06-09 10:26:57 -04:00
Nhat Nguyen afe65b5988 Fix assertion in ReadOnlyEngine (#43010)
We should execute the assertion before throwing an exception;
otherwise, it's a noop.
2019-06-09 10:26:56 -04:00
Jason Tedor 915d2f2daa
Refactor put mapping request validation for reuse (#43005)
This commit refactors put mapping request validation for reuse. The
concrete case that we are after here is the ability to apply effectively
the same framework to indices aliases requests. This commit refactors
the put mapping request validation framework to allow for that.
2019-06-09 10:19:04 -04:00
Nhat Nguyen 0a982fc57f Mute testLookupSeqNoByIdInLucene
Tracked at #42979
2019-06-08 00:30:12 -04:00
Jason Tedor b580677412
Fix put mapping request validators random test
This commit fixes a test bug in the request validators random test. In
particular, an assertion was not properly nested in a guard that would
ensure that was at least one failure.

Relates #43000
2019-06-07 17:47:51 -04:00
Jason Tedor d6fe4b648d
Fix possible NPE in put mapping validators (#43000)
When applying put mapping validators, we apply all the validators in the
collection. If a failure occurs, we collect that as a top-level
exception, and suppress any additional failures into the top-level
exception. However, if a request passes the validator after a top-level
exception has been collected, we would try to suppress a null exception
into the top-level exception. This is a violation of the
Throwable#addSuppressed API. This commit addresses this, and adds test
to cover the logic of collecting the failures when validating a put
mapping request.
2019-06-07 16:24:12 -04:00
David Turner 5bc0dfce94
Improve translog corruption detection (#42980)
Today we test for translog corruption by incrementing a byte by 1 somewhere in
a file, and verify that this leads to a `TranslogCorruptionException`.
However, we rely on _all_ corruptions leading to this exception in the
`RemoveCorruptedShardDataCommand`: this command fails if a translog file
corruption leads to a different kind of exception, and `EOFException` and
`NegativeArraySizeException` are both possible. This commit strengthens the
translog corruption detection tests by simulating the following:

- a random value is written
- the file is truncated

It also makes sure that we return a `TranslogCorruptionException` in all such
cases.

Fixes #42661
Backport of #42744
2019-06-07 20:28:02 +01:00
Jason Tedor 479a1eeff6
Drop dead code for socket permissions for transport (#42990)
This code has not been needed since the removal of tribe nodes, it was
left behind when those were dropped (note that regular transport
permissions are handled through transport profiles, even if they are not
explicitly in use).
2019-06-07 15:22:10 -04:00
markharwood 0719779a48
Search - enable low_level_cancellation by default. (#42291) (#42857)
Benchmarking on worst-case queries (max agg on match_all or popular-term query with large index) was not noticeably slower.

Closes #26258
2019-06-07 14:53:17 +01:00
Henning Andersen dea935ac31
Reindex max_docs parameter name (#42942)
Previously, a reindex request had two different size specifications in the body:
* Outer level, determining the maximum documents to process
* Inside the source element, determining the scroll/batch size.

The outer level size has now been renamed to max_docs to
avoid confusion and clarify its semantics, with backwards compatibility and
deprecation warnings for using size.
Similarly, the size parameter has been renamed to max_docs for
update/delete-by-query to keep the 3 interfaces consistent.

Finally, all 3 endpoints now support max_docs in both body and URL.

Relates #24344
2019-06-07 12:16:36 +02:00
David Turner 5929803413 Relax timeout in NodeConnectionsServiceTests (#42934)
Today we assert that the connection thread is blocked by the time the test gets
to the barrier, but in fact this is not a valid assertion. The following
`Thread.sleep()` will cause the test to fail reasonably often.

```diff
diff --git a/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java b/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
index 193cde3180d..0e57211cec4 100644
--- a/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
+++ b/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
@@ -364,6 +364,7 @@ public class NodeConnectionsServiceTests extends ESTestCase {
             final CheckedRunnable<Exception> connectionBlock = nodeConnectionBlocks.get(node);
             if (connectionBlock != null) {
                 try {
+                    Thread.sleep(50);
                     connectionBlock.run();
                 } catch (Exception e) {
                     throw new AssertionError(e);
```

This change relaxes the test to allow some time for the connection thread to
hit the barrier.

Fixes #40170
2019-06-07 10:38:56 +01:00
henryptung 61b62125b8 Wire query cache into sorting nested-filter computation (#42906)
Don't use Lucene's default query cache when filtering in sort.

Closes #42813
2019-06-06 21:16:58 +02:00
Henning Andersen ca5dbf93a5 Fix concurrent search and index delete (#42621)
Changed order of listener invocation so that we notify before
registering search context and notify after unregistering same.

This ensures that count up/down like what we do in ShardSearchStats
works. Otherwise, we risk notifying onFreeScrollContext before notifying
onNewScrollContext (same for onFreeContext/onNewContext, but we
currently have no assertions failing in those).

Closes #28053
2019-06-06 20:10:43 +02:00
Simon Willnauer 7fcca55a3c [TEST] Remove unnecessary log line 2019-06-06 14:17:44 +02:00
Simon Willnauer 2582e1e8ad Fix `InternalEngineTests#testPruneAwayDeletedButRetainedIds`
The test failed because we had only a single document in the index
that got deleted such that some assertions that expected at least
one live doc failed.

Relates to: #40741
2019-06-06 14:16:24 +02:00
Yannick Welsch 9f7be70f7a Fix testPendingTasks (#42922)
Fixes a race in the test which can be reliably reproduced by adding Thread.sleep(100) to the end of
IndicesService.processPendingDeletes

Closes #18747
2019-06-06 14:15:48 +02:00
Yannick Welsch 72735be673 Fix NPE when rejecting bulk updates (#42923)
Single updates use a different internal code path than updates that are wrapped in a bulk request.
While working on a refactoring to bring both closer together I've noticed that bulk updates were
failing some of the tests that single updates passed. In particular, bulk updates cause
NullPointerExceptions to be thrown and listeners not being properly notified when being rejected
from the thread pool.
2019-06-06 14:15:48 +02:00
Simon Willnauer 2c3bd32aff Add a merge policy that prunes ID postings for soft-deleted but retained documents (#40741)
This change adds a merge policy that drops all _id postings for documents that
are marked as soft-deleted but retained across merges. This is usually unnecessary
unless soft-deletes are used with a retention policy since otherwise a merge would
remove deleted documents anyway.

Yet, this merge policy prevents extreme cases where a very large number of soft-deleted
documents are retained and are impacting update performance.
Note, using this merge policy will remove all lookup by ID capabilities for soft-deleted documents.
2019-06-06 13:41:46 +02:00
Gordon Brown 6eb4600e93
Add custom metadata to snapshots (#41281)
Adds a metadata field to snapshots which can be used to store arbitrary
key-value information. This may be useful for attaching a description of
why a snapshot was taken, tagging snapshots to make categorization
easier, or identifying the source of automatically-created snapshots.
2019-06-05 17:30:31 -06:00
Mark Vieira 1f4ff97d7d
Mute failing test
(cherry picked from commit 4952d4facf5949abdb9aae47dbe1ee18cf7eef99)
2019-06-05 13:47:18 -07:00
Przemyslaw Gomulka ab5bc83597
Deprecation info for joda-java migration on 7.x (#42659)
Some clusters might have been already migrated to version 7 without being warned about the joda-java migration changes.
Deprecation api on that version will give them guidance on what patterns need to be changed.
relates. This change is using the same logic like in 6.8 that is: verifying the pattern is from the incompatible set ('y'-Y', 'C', 'Z' etc), not from predifined set, not prefixed with 8. AND was also created in 6.x. Mappings created in 7.x are considered migrated and should not generate warnings

There is no pipeline check (present on 6.8) as it is impossible to verify when the pipeline was created, and therefore to make sure the format is depracated or not
#42010
2019-06-05 19:50:04 +02:00
Simon Willnauer d3524fdd06 Add back import after backport 2019-06-05 11:25:19 +02:00
Simon Willnauer 4dfaeb9046 Remove post Java 9 API usage after backport 2019-06-05 11:24:58 +02:00
Jim Ferenczi de0ea4bbf7 Deduplicate alias and concrete fields in query field expansion (#42328)
The full-text query parsers accept field pattern that are expanded using the mapping.
Alias field are also detected during the expansion but they are not deduplicated with the
concrete fields that are found from other patterns (or the same). This change ensures
that we deduplicate the target fields of the full-text query parsers in order to avoid
adding the same clause multiple times. Boolean queries are already able to deduplicate
clauses during rewrite but since we also use DisjunctionMaxQuery it is preferable to detect
 these duplicates early on.
2019-06-05 11:05:40 +02:00
Simon Willnauer 41a9f3ae3b Use reader attributes to control term dict memory useage (#42838)
This change makes use of the reader attributes added in LUCENE-8671
to ensure that `_id` fields are always on-heap for best update performance
and term dicts are generally off-heap on Read-Only engines.

Closes #38390
2019-06-05 11:01:06 +02:00
David Turner 955aee8a07 More logging in testRerouteOccursOnDiskPassingHighWatermark (#42864)
This test is failing because recoveries of these empty shards are not
completing in a reasonable time, but the reason for this is still obscure. This
commit adds yet more logging.

Relates #40174, #42424
2019-06-05 09:05:44 +01:00
Jason Tedor 78be3dde25
Enable testing against JDK 13 EA builds (#40829)
This commit adds JDK 13 to the CI rotation for testing. For now, we will
be testing against JDK 13 EA builds.
2019-06-04 20:54:24 -04:00
Jason Tedor 117df87b2b
Replicate aliases in cross-cluster replication (#42875)
This commit adds functionality so that aliases that are manipulated on
leader indices are replicated by the shard follow tasks to the follower
indices. Note that we ignore write indices. This is due to the fact that
follower indices do not receive direct writes so the concept is not
useful.

Relates #41815
2019-06-04 20:36:24 -04:00
Mark Vieira e44b8b1e2e
[Backport] Remove dependency substitutions 7.x (#42866)
* Remove unnecessary usage of Gradle dependency substitution rules (#42773)

(cherry picked from commit 12d583dbf6f7d44f00aa365e34fc7e937c3c61f7)
2019-06-04 13:50:23 -07:00
Andrey Ershov 6391f90616 Fix testNoMasterActionsWriteMasterBlock (#42798)
This commit performs the proper restore of network disruption.
Previously disruptionScheme.stopDisrupting() was called that does not
ensure that connectivity between cluster nodes is restored. The test
was checking that the cluster has green status, but it was not checking
that connectivity between nodes is restored.
Here we switch to internalCluster().clearDisruptionScheme(true) which
performs both checks before returning.

Closes #39688

(cherry picked from commit c8988d5cf5a85f9b28ce148dbf100aaa6682a757)
2019-06-04 17:24:03 +02:00
Alan Woodward df124f32db Refactor control flow in TransportAnalyzeAction (#42801)
The control flow in TransportAnalyzeAction is currently spread across two large
methods, and is quite difficult to follow. This commit tidies things up a bit, to make
it clearer when we use pre-defined analyzers and when we use custom built ones.
2019-06-04 14:52:46 +01:00
Yu 428beabc49 Remove "template" field in IndexTemplateMetaData (#42099)
Remove "template" field from XContent parsing in IndexTemplateMetaData
2019-06-03 12:43:11 -05:00
Armin Braun 00db9c1a2f
Make Connection Future Err. Handling more Resilient (#42781) (#42804)
* There were a number of possible (runtime-) exceptions that could be raised in the adjusted code and prevent resolving the listener
* Relates #42350
2019-06-03 19:29:36 +02:00
David Turner df0f0b3d40
Rename autoMinMasterNodes to autoManageMasterNodes (#42789)
Renames the `ClusterScope` attribute `autoMinMasterNodes` to reflect its
broader meaning since 7.0.

Backport of the relevant part of #42700 to `7.x`.
2019-06-03 12:12:07 +01:00
Alan Woodward 2129d06643 Create client-only AnalyzeRequest/AnalyzeResponse classes (#42197)
This commit clones the existing AnalyzeRequest/AnalyzeResponse classes
to the high-level rest client, and adjusts request converters to use these new
classes.

This is a prerequisite to removing the Streamable interface from the internal
server version of these classes.
2019-06-03 09:46:36 +01:00
Alan Woodward d0da30e5f4 Return NO_INTERVALS rather than null from empty TokenStream (#42750)
IntervalBuilder#analyzeText will currently return null if it is passed an
empty TokenStream, which can lead to a confusing NullPointerException
later on during querying. This commit changes the code to return
NO_INTERVALS instead.

Fixes #42587
2019-05-31 17:45:57 +01:00
Jason Tedor 61c6a26b31
Remove locale-dependent string checking
We were checking if an exception was caused by a specific reason "Not a
directory". Alas, this reason is locale-dependent and can fail on
systems that are not set to en_US.UTF-8. This commit addresses this by
deriving what the locale-dependent error message would be and using that
for comparison with the actual exception thrown.

Relates #41689
2019-05-31 12:08:38 -04:00
Jason Tedor 371cb9a8ce
Remove Log4j 1.2 API as a dependency (#42702)
We had this as a dependency for legacy dependencies that still needed
the Log4j 1.2 API. This appears to no longer be necessary, so this
commit removes this artifact as a dependency.

To remove this dependency, we had to fix a few places where we were
accidentally relying on Log4j 1.2 instead of Log4j 2 (easy to do, since
both APIs were on the compile-time classpath).

Finally, we can remove our custom Netty logger factory. This was needed
when we were on Log4j 1.2 and handled logging in our own unique
way. When we migrated to Log4j 2 we could have dropped this
dependency. However, even then Netty would still pick up Log4j 1.2 since
it was on the classpath, thus the advantage to removing this as a
dependency now.
2019-05-30 16:08:07 -04:00
Mark Vieira c1816354ed
[Backport] Improve build configuration time (#42674) 2019-05-30 10:29:42 -07:00
David Turner d14799f0a5 Prevent merging nodes' data paths (#42665)
Today Elasticsearch does not prevent you from reconfiguring a node's
`path.data` to point to data paths that previously belonged to more than one
node. There's no good reason to be able to do this, and the consequences can be
quietly disastrous. Furthermore, #42489 might result in a user trying to split
up a previously-shared collection of data paths by hand and there's definitely
scope for mixing the paths up across nodes when doing this.

This change adds a check during startup to ensure that each data path belongs
to the same node.
2019-05-30 18:08:55 +01:00
Marios Trivyzas ce30afcd01
Deprecate CommonTermsQuery and cutoff_frequency (#42619) (#42691)
Since the max_score optimization landed in Elasticsearch 7,
the CommonTermsQuery is redundant and slower. Moreover the
cutoff_frequency parameter for MatchQuery and MultiMatchQuery
is redundant.

Relates to #27096

(cherry picked from commit 04b74497314eeec076753a33b3b6cc11549646e8)
2019-05-30 18:04:47 +02:00
David Turner 86b1a07887 Log leader and handshake failures by default (#42342)
Today the `LeaderChecker` and `HandshakingTransportAddressConnector` do not log
anything above `DEBUG` level. However there are some situations where it is
appropriate for them to log at a higher level:

- if the low-level handshake succeeds but the high-level one fails then this
  indicates a config error that the user should resolve, and the exception
  will help them to do so.

- if leader checks fail repeatedly then we restart discovery, and the exception
  will help to determine what went wrong.

Resolves #42153
2019-05-30 08:14:19 +01:00
Igor Motov d2f9ccbe18 Geo: Refactor libs/geo parsers (#42549)
Refactors the WKT and GeoJSON parsers from an utility class into an
instantiatable objects. This is a preliminary step in
preparation for moving out coordinate validators from Geometry
constructors. This should allow us to make validators plugable.
2019-05-29 20:07:27 -04:00
Henning Andersen 53f5d313cd Use correct global checkpoint sync interval (#42642)
A disruption test case need to use a lower checkpoint sync interval
since they verify sequence numbers after the test waiting max 10 seconds
for it to stabilize.

Closes #42637
2019-05-29 08:15:53 +02:00
Adrien Grand 38f9e24411
Add 7.1.2 version constant. (#42648)
Relates to #42635
2019-05-28 23:14:10 +02:00
Jim Ferenczi 267e5a1110 fix javadoc of SearchRequestBuilder#setTrackTotalHits (#42219) 2019-05-28 22:12:16 +02:00
Armin Braun 6166fed6f1
Fix BulkProcessorRetryIT (#41700) (#42618)
* Now that we process the bulk requests themselves on the WRITE threadpool, they can run out of retries too like the item requests even when backoff is active
* Fixes #41324 by using the same logic that checks failed item requests for their retry status for the top level bulk requests as well
2019-05-28 17:58:00 +02:00
Vigya Sharma 130c832e10 Validate routing commands using updated routing state (#42066)
When multiple commands are called in sequence, fetch shards
from mutable, up-to-date routing nodes to ensure each command's
changes are visible to subsequent commands.

This addresses an issue uncovered during work on #41050.
2019-05-28 17:01:14 +02:00
David Turner c21745c8ab Avoid loading retention leases while writing them (#42620)
Resolves #41430.
2019-05-28 15:27:06 +01:00
Yannick Welsch 1e0b0f640b Fix compilation
Follow-up to 5598647922
2019-05-28 13:56:36 +02:00
Yannick Welsch 5598647922 Reset state recovery after successful recovery (#42576)
The problem this commit addresses is that state recovery is not reset on a node that then becomes
master with a cluster state that has a state not recovered flag in it. The situation that was observed
in a failed test run of MinimumMasterNodesIT.testThreeNodesNoMasterBlock (see below) is that we
have 3 master nodes (node_t0, node_t1, node_t2), two of them are shut down (node_t2 remains),
when the first one comes back (renamed to node_t4) it becomes leader in term 2 and sends state
(with state_not_recovered_block) to node_t2, which accepts. node_t2 becomes leader in term 3, and
as it was previously leader in term1 and successfully completed state recovery, does never retry
state recovery in term 3.

Closes #39172
2019-05-28 13:46:10 +02:00
David Turner 746a2f41fd
Remove PRE_60_NODE_CHECKPOINT (#42531)
This commit removes the obsolete `PRE_60_NODE_CHECKPOINT` constant for dealing
with 5.x nodes' lack of sequence number support.

Backport of #42527
2019-05-28 12:25:53 +01:00
Armin Braun 00d665540a
Make unwrapCorrupt Check Suppressed Ex. (#41889) (#42605)
* Make unwrapCorrupt Check Suppressed Ex. (#41889)
* As discussed in #24800 we want to check for suppressed corruption
indicating exceptions here as well to more reliably categorize
corruption related exceptions
* Closes #24800, 41201
2019-05-28 12:44:40 +02:00
Daniel Mitterdorfer adb3574af8
Mute NodeTests (#42615)
Relates #42577
Relates #42614
2019-05-28 12:25:18 +02:00
Armin Braun 116b050cc6
Cleanup Bulk Delete Exception Logging (#41693) (#42606)
* Cleanup Bulk Delete Exception Logging

* Follow up to #41368
* Collect all failed blob deletes and add them to the exception message
* Remove logging of blob name list from caller exception logging
2019-05-28 11:00:28 +02:00
Nhat Nguyen de6be819d6 Allocate to data-only nodes in ReopenWhileClosingIT (#42560)
If all primary shards are allocated on the master node, then the
verifying before close step will never interact with mock transport
service. This change prefers to allocate shards on data-only nodes.

Closes #39757
2019-05-27 17:32:06 -04:00
Armin Braun a94d24ae5a
Fix RareClusterStateIT (#42430) (#42580)
* It looks like we might be cancelling a previous publication instead of
the one triggered by the given request with a very low likelihood.
   * Fixed by adding a wait for no in-progress publications
   * Also added debug logging that would've identified this problem
* Closes #36813
2019-05-27 13:57:17 +02:00
Armin Braun c4f44024af
Remove Delete Method from BlobStore (#41619) (#42574)
* Remove Delete Method from BlobStore (#41619)
* The delete method on the blob store was used almost nowhere and just duplicates the delete method on the blob containers
  * The fact that it provided for some recursive delete logic (that did not behave the same way on all implementations) was not used and not properly tested either
2019-05-27 12:24:20 +02:00
Armin Braun bb7e8eb2fd
Introduce ShardState Enum + Slight Cleanup SnapshotsInProgress (#41940) (#42573)
* Added separate enum for the state of each shard, it was really
confusing that we used the same enum for the state of the snapshot
overall and the state of each individual shard
   * relates https://github.com/elastic/elasticsearch/pull/40943#issuecomment-488664150
* Shortened some obvious spots in equals method and saved a few lines
via `computeIfAbsent` to make up for adding 50 new lines to this class
2019-05-27 12:08:45 +02:00
Armin Braun 7b4d1ac352
Remove Obsolete BwC Logic from BlobStoreRepository (#42193) (#42571)
* Remove Obsolete BwC Logic from BlobStoreRepository

* We can't restore 1.3.3 files anyway -> no point in doing the dance of computing a hash here
* Some other minor+obvious cleanups
2019-05-27 11:47:04 +02:00
Armin Braun c7448b12e1
Cleanup Redundant BlobStoreFormat Class (#42195) (#42570)
* No need to have an abstract class here when there's only a single impl.
2019-05-27 11:28:50 +02:00
Armin Braun 49767fc1e9
Some Cleanup in o.e.gateway Package (#42108) (#42568)
* Removing obvious dead code
* Removing redundant listener interface
2019-05-27 11:28:12 +02:00
Armin Braun a5ca20a250
Some Cleanup in o.e.i.engine (#42278) (#42566)
* Some Cleanup in o.e.i.engine

* Remove dead code and parameters
* Reduce visibility in some obvious spots
* Add missing `assert`s (not that important here since the methods
themselves will probably be dead-code eliminated) but still
2019-05-27 11:04:54 +02:00
Martijn van Groningen e591d30918
fixed test compile issue 2019-05-27 10:17:00 +02:00
Martijn van Groningen 48a71459c0
Improve how internal representation of pipelines are updated (#42257)
If a single pipeline is updated then the internal representation of
all pipelines was updated. With this change, only the internal representation
of the pipelines that have been modified will be updated.

Prior to this change the IngestMetadata of the previous and current cluster
was used to determine whether the internal representation of pipelines
should be updated. If applying the previous cluster state change failed then
subsequent cluster state changes that have no changes to IngestMetadata
will not attempt to update the internal representation of the pipelines.

This commit, changes how the IngestService updates the internal representation
by keeping track of the underlying configuration and use that to detect
against the new IngestMetadata whether a pipeline configuration has been
changed and if so, then the internal pipeline representation will be updated.
2019-05-27 10:01:15 +02:00
Nhat Nguyen 85e60850af Add debug log for retention leases (#42557)
We need more information to understand why CcrRetentionLeaseIT is
failing. This commit adds some debug log to retention leases and enables
them in CcrRetentionLeaseIT.
2019-05-26 16:04:47 -04:00
Tanguy Leroux 6bec876682 Improve Close Index Response (#39687)
This changes the `CloseIndexResponse` so that it reports closing result
for each index. Shard failures or exception are also reported per index,
and the global acknowledgment flag is computed from the index results
only.

The response looks like:
```
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "indices" : {
    "docs" : {
      "closed" : true
    }
  }
}
```

The response reports shard failures like:
```
{
  "acknowledged" : false,
  "shards_acknowledged" : false,
  "indices" : {
    "docs-1" : {
      "closed" : true
    },
    "docs-2" : {
      "closed" : false,
      "shards" : {
        "1" : {
          "failures" : [
            {
              "shard" : 1,
              "index" : "docs-2",
              "status" : "BAD_REQUEST",
              "reason" : {
                "type" : "index_closed_exception",
                "reason" : "closed",
                "index_uuid" : "JFmQwr_aSPiZbkAH_KEF7A",
                "index" : "docs-2"
              }
            }
          ]
        }
      }
    },
    "docs-3" : {
      "closed" : true
    }
  }
}
```

Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>
2019-05-24 21:57:55 -04:00
Julie Tibshirani 3a6c2525ca
Deprecate support for chained multi-fields. (#42330)
This PR contains a straight backport of #41926, and also updates the
migration documentation and deprecation info API for 7.x.
2019-05-24 15:55:06 -07:00
Jason Tedor f2cfd09289
Remove renewal in retention lease recovery test (#42536)
This commit removes the act of renewing some retention leases during a
retention lease recovery test. Having renewal does not add anything
extra to this test, but does allow for some situations where the test
can fail spuriously (i.e., in a way that does not indicate that
production code is broken).
2019-05-24 17:40:59 -05:00
Nhat Nguyen 74d771d8f6 Adjust load SplitIndexIT#testSplitIndexPrimaryTerm (#42477)
SplitIndexIT#testSplitIndexPrimaryTerm sometimes timeout due to
relocating many shards. This change adjusts loads and increases
the timeout.
2019-05-24 15:47:29 -04:00
Nhat Nguyen 02739d038c Mute accounting circuit breaker check after test (#42448)
If we close an engine while a refresh is happening, then we might leak
refCount of some SegmentReaders. We need to skip the ram accounting
circuit breaker check until we have a new Lucene snapshot which includes
the fix for LUCENE-8809.

This also adds a test to the engine but left it muted so we won't forget
to reenable this check.

Closes #30290
2019-05-24 15:42:12 -04:00
Nhat Nguyen 329d1307a5 Add test to verify force primary allocation on closed indices (#42458)
This change adds a test verifying that we can force primary allocation on closed indices.
2019-05-24 17:23:58 +02:00
Henning Andersen 075fd2a0ac Shard CLI tool always check shards (#41480)
The shard CLI tool would not do anything if a corruption marker was not
present. But a corruption marker is only added if a corruption is
detected during indexing/writing, not if a search or other read fails.

Changed the tool to always check shards regardless of corruption marker
presence.

Related to #41298
2019-05-24 16:49:37 +02:00
Marios Trivyzas 523b5bfdb5 Fix sorting on nested field with unmapped (#42451)
Previously sorting on a missing nested field would fail with an
Exception:
`[nested_field] failed to find nested object under path [nested_path]`
despite `unmapped_type` being set on the query.

Fixes: #33644

(cherry picked from commit 631142d5dd088a10de8dcd939b50a14301173283)
2019-05-24 15:47:41 +02:00
Christoph Büscher 12d5642e93 Small internal AnalysisRegistry changes (#42500)
Some internal refactorings to the AnalysisRegistry, spin-off from #40782.
2019-05-24 15:27:35 +02:00
David Turner a5b6ed8d1e Remove AwaitsFix of #41967 following #42504 2019-05-24 14:26:49 +01:00
David Turner 4d02ca1633 Drain master task queue when stabilising (#42504)
Today the default stabilisation time is calculated on the assumption that the
elected master has no pending tasks to process when it is elected, but this is
not a safe assumption to make. This can result in a cluster reaching the end of
its stabilisation time without having stabilised. Furthermore in #36943 we
increased the probability that each step in `runRandomly()` enqueues another
task, vastly increasing the chance that we hit such a situation.

This change extends the stabilisation process to allow time for all pending
tasks, plus a task that might currently be in flight.

Fixes #41967, in which the master entered the stabilisation phase with over 800
tasks to process.
2019-05-24 14:18:02 +01:00
weizijun 40348ab726 Use accurate total hits in IndexPrimaryRelocationIT
By default, we track total hits up to 10k but we might index more than
10k documents `testPrimaryRelocationWhileIndexing`. With this change, we
always request for the accurate total hits in the test.

> java.lang.AssertionError: Count is 10000+ hits but 11684 was expected.
2019-05-24 12:47:21 +02:00
Simon Willnauer 46ccfba808 Remove IndexStore and DirectoryService (#42446)
Both of these classes are basically a bloated wrapper around a simple
construct that can simply be a DirectoryFactory interface. This change
removes both classes and replaces them with a simple stateless interface
that creates a new `Directory` per shard. The concept of `index.store` is preserved
since it makes sense from a configuration perspective.
2019-05-24 12:14:56 +02:00
David Turner f864f6a740 Cluster state from API should always have a master (#42454)
Today the `TransportClusterStateAction` ignores the state passed by the
`TransportMasterNodeAction` and obtains its state from the cluster applier.
This might be inconsistent, showing a different node as the master or maybe
even having no master.

This change adjusts the action to use the passed-in state directly, and adds
tests showing that the state returned is consistent with our expectations even
if there is a concurrent master failover.

Fixes #38331
Relates #38432
2019-05-24 08:45:22 +01:00
David Turner 528f8cc073 Add stack traces to RetentionLeasesIT failures (#42425)
Today `RetentionLeaseIT` calls `fail(e.toString())` on some exceptions, losing
the stack trace that came with the exception. This commit adjusts this to
re-throw the exception wrapped in an `AssertionError` so we can see more
details about failures such as #41430.
2019-05-24 08:37:51 +01:00
David Turner c0974a9813 Add more logging to MockDiskUsagesIT (#42424)
This commit adds a log message containing the routing table, emitted on each
iteration of the failing assertBusy() in #40174. It also modernizes the code a
bit.
2019-05-24 08:28:10 +01:00
Jack Conradson 167f391cfd Bug fix to allow access to top level params in reduce script (#42096) 2019-05-23 16:00:39 -07:00
Ryan Ernst a49bafc194
Split document and metadata fields in GetResult (#38373) (#42456)
This commit makes creators of GetField split the fields into document fields and metadata fields. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in #24422.
2019-05-23 14:01:07 -07:00
Jake Landis 2b22ceac04
Bulk processor concurrent requests (#41451) (#42438)
`org.elasticsearch.action.bulk.BulkProcessor` is a threadsafe class that
allows for simple semantics to deal with sending bulk requests. Once a
bulk reaches it's pre-defined size, documents, or flush interval it will
execute sending the bulk. One configurable option is the number of concurrent
outstanding bulk requests. That concurrency is implemented in
`org.elasticsearch.action.bulk.BulkRequestHandler` via a semaphore. However,
the only code that currently calls into this code is blocked by `synchronized`
methods. This results in the in-ability for the BulkProcessor to behave concurrently
despite supporting configurable amounts of concurrent requests.

This change removes the `synchronized` method in favor an explicit
lock around the non-thread safe parts of the method. The call into
`org.elasticsearch.action.bulk.BulkRequestHandler` is no longer blocking, which
allows `org.elasticsearch.action.bulk.BulkRequestHandler` to handle it's own concurrency.
2019-05-23 14:22:16 -05:00
Simon Willnauer 5a884dac03 Unguice Snapshot / Restore services (#42357)
This removes the @Inject annotations from the Snapshot/Restore infrastructure
classes and registers them manually in Node.java
2019-05-23 17:09:26 +02:00
Jim Ferenczi a497603219 Disable max score optimization for queries with unbounded max scores (#41361)
Lucene 8 has the ability to skip blocks of non-competitive documents.
However some queries don't track their maximum score (`script_score`, `span`, ...)
so they always return Float.POSITIVE_INFINITY as maximum score. This can slow down
some boolean queries if other clauses have bounded max scores. This commit disables
the max score optimization when we detect a mandatory scoring clause with unbounded
 max scores. Optional clauses are not checked since they can still skip documents
 when the unbounded clause is after the current document.
2019-05-23 16:53:57 +02:00
Yannick Welsch f57fdc57e9
Deprecate max_local_storage_nodes (#42426)
Allows this setting to be removed in 8.0, see #42428
2019-05-23 15:59:55 +02:00
Christoph Büscher 85ff9543b7 Prevent normalizer from not being closed on exception (#42375)
Currently AnalysisRegistry#processNormalizerFactory creates a normalizer and
only later checks whether it should be added to the normalizer map passed in. In
case we throw an exception it isn't closed. This can be prevented by moving the
check that throws the exception earlier.
2019-05-23 15:53:55 +02:00
markharwood c2c8d0e637 Test fix - results equality failed because of subtle scoring differences between replicas. (#42366)
Diverging merge policies means the segments and therefore scores are not the same.
Fixed the test by ensuring there are zero replicas.

Closes #32492
2019-05-23 12:00:57 +01:00
Jim Ferenczi b88e80ab89 Upgrade to Lucene 8.1.0 (#42214)
This commit upgrades to the GA release of Lucene 8.1.0
2019-05-23 11:46:45 +02:00
Jim Ferenczi 4ca5649a0d Upgrade to lucene 8.1.0-snapshot-e460356abe (#40952) 2019-05-23 11:45:33 +02:00
Marios Trivyzas 0777223bab
Allow `fields` to be set to `*` (#42301)
Allow for SimpleQueryString, QueryString and MultiMatchQuery
to set the `fields` parameter to the wildcard `*`. If so, set
the leniency to `true`, to achieve the same behaviour as from the
`"default_field" : "*" setting.

Furthermore,  check if `*` is in the list of the `default_field` but
not necessarily as the 1st element.

Closes: #39577
(cherry picked from commit e75ff0c748e6b68232c2b08e19ac4a4934918264)
2019-05-23 10:10:48 +02:00
Yannick Welsch a71d19e92a Ensure testAckedIndexing uses disruption index settings
AbstractDisruptionTestCase set a lower global checkpoint sync interval setting, but this was ignored by
testAckedIndexing, which has led to spurious test failures

Relates #41068, #38931
2019-05-22 19:13:14 +02:00
Jake Landis 496fee3333
bump to 7.3 (#42365) 2019-05-22 11:57:07 -05:00
Luca Cavanna c2af62455f Cut over SearchResponse and SearchTemplateResponse to Writeable (#41855)
Relates to #34389
2019-05-22 18:47:54 +02:00
Luca Cavanna 29c9bb9181 Clean up ShardId usage of Streamable (#41843)
ShardId already implements Writeable so there is no need for it to implement Streamable too. Also the readShardId static method can be
easily replaced with direct usages of the constructor that takes a
StreamInput as argument.
2019-05-22 18:47:54 +02:00
Luca Cavanna 96ba0b13e0 Cut over MultiSearchResponse to Writeable (#41844)
Relates to #34389
2019-05-22 18:47:54 +02:00
Luca Cavanna 1ded45b0a2 Cut over SearchPhaseResult to Writeable (#41853)
Relates to #34389
2019-05-22 18:47:54 +02:00
Luca Cavanna c85f285298 Move InternalAggregations to Writeable (#41841)
Relates to #34389
2019-05-22 18:47:54 +02:00
Luca Cavanna 39d4c7c26f Skip explain fetch sub phase when request holds only suggestions (#41739)
In case a search request holds only the suggest section, the query phase
is skipped and only the suggest phase is executed instead. There will
never be hits returned, and in case the explain flag is set to true, the
 explain sub phase throws a null pointer exception as the query is null.
 Usually a null query is replaced with a match all query as part of SearchContext#preProcess which is though skipped as well with suggest
 only searches. To address this, we skip the explain sub fetch phase
 for search requests that only requested suggestions.

Closes #31260
2019-05-22 18:47:54 +02:00
Luca Cavanna 3416cda8b1 Cut over ClusterSearchShardsGroup to Writeable (#41788) 2019-05-22 18:47:54 +02:00
Guillaume Darmont 3e231bbad6 StackOverflowError when calling BulkRequest#add (#41672)
Removing of payload in BulkRequest (#39843) had a side effect of making
`BulkRequest.add(DocWriteRequest<?>...)` (with varargs) recursive, thus
leading to StackOverflowError. This PR adds a small change in
RequestConvertersTests to show the error and the corresponding fix in
`BulkRequest`.

Fixes #41668
2019-05-22 11:22:14 -05:00
mushao999 d4b5933225 Fix alpha version error message (#40406) 2019-05-22 09:06:10 -07:00
Yannick Welsch eae58c477c Remove testNodeFailuresAreProcessedOnce
This test was not checking the thing it was supposed to anyway.
2019-05-22 14:52:01 +02:00
Yannick Welsch 250973af1d Fix testCannotJoinIfMasterLostDataFolder
Relates to #41047
2019-05-22 14:37:31 +02:00
Simon Willnauer a79cd77e5c Remove IndexShard dependency from Repository (#42213)
* Remove IndexShard dependency from Repository

In order to simplify repository testing especially for BlobStoreRepository
it's important to remove the dependency on IndexShard and reduce it to
Store and MapperService (in the snapshot case). This significantly reduces
the dependcy footprint for Repository and allows unittesting without starting
nodes or instantiate entire shard instances. This change deprecates the old
method signatures and adds a unittest for FileRepository to show the advantage
of this change.
In addition, the unittesting surfaced a bug where the internal file names that
are private to the repository were used in the recovery stats instead of the
target file names which makes it impossible to relate to the actual lucene files
in the recovery stats.

* don't delegate deprecated methods

* apply comments

* test
2019-05-22 14:27:11 +02:00
Ignacio Vera 3a20ff7e86
Fix TopHitsAggregationBuilder adding duplicate _score sort clauses (#42179) (#42343)
When using High Level Rest Client Java API to produce search query, using AggregationBuilders.topHits("th").sort("_score", SortOrder.DESC)
caused query to contain duplicate sort clauses.
2019-05-22 14:02:52 +02:00
Yannick Welsch f338005179 Revert "Mute MinimumMasterNodesIT.testThreeNodesNoMasterBlock()"
This reverts commit 448fc8444559be3145e4a7f65dec794ebbff7b81.
2019-05-22 13:22:09 +02:00
Yannick Welsch 0c7322ebf2 Avoid bubbling up failures from a shard that is recovering (#42287)
A shard that is undergoing peer recovery is subject to logging warnings of the form

org.elasticsearch.action.FailedNodeException: Failed node [XYZ]
...
Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in ...

These failures are actually harmless, and expected to happen while a peer recovery is ongoing (i.e.
there is an IndexShard instance, but no proper IndexCommit just yet).
As these failures are currently bubbled up to the master, they cause unnecessary reroutes and
confusion amongst users due to being logged as warnings.

Closes  #40107
2019-05-22 12:26:15 +02:00
Yannick Welsch 770d8e9e39 Remove usage of max_local_storage_nodes in test infrastructure (#41652)
Moves the test infrastructure away from using node.max_local_storage_nodes, allowing us in a
follow-up PR to deprecate this setting in 7.x and to remove it in 8.0.

This also changes the behavior of InternalTestCluster so that starting up nodes will not automatically
reuse data folders of previously stopped nodes. If this behavior is desired, it needs to be explicitly
done by passing the data path from the stopped node to the new node that is started.
2019-05-22 11:04:55 +02:00
Yannick Welsch c9dedf180b Use comparator for Reconfigurator (#42283)
Simplifies the voting configuration reconfiguration logic by switching to an explicit Comparator for
the priorities. Does not make changes to the behavior of the component.
2019-05-22 10:04:51 +02:00
Nhat Nguyen bcbf1aff6b Peer recovery should flush at the end (#41660)
Flushing at the end of a peer recovery (if needed) can bring these
benefits:

1. Closing an index won't end up with the red state for a recovering
replica should always be ready for closing whether it performs the
verifying-before-close step or not.

2. Good opportunities to compact store (i.e., flushing and merging
Lucene, and trimming translog)

Closes #40024
Closes #39588
2019-05-21 22:45:17 -04:00
Nhat Nguyen 84df48ccb3 Recovery with syncId should verify seqno infos (#41265)
This change verifies and aborts recovery if source and target have the
same syncId but different sequenceId. This commit also adds an upgrade
test to ensure that we always utilize syncId.
2019-05-21 22:44:17 -04:00
Nhat Nguyen 3573b1d0ce Skip global checkpoint sync for closed indices (#41874)
The verifying-before-close step ensures the global checkpoints on all
shard copies are in sync; thus, we don' t need to sync global
checkpoints for closed indices.

Relate #33888
2019-05-21 19:55:21 -04:00
Nhat Nguyen 4d55e9e070 Estimate num history ops should always use translog (#42211)
Currently, we ignore soft-deletes in peer recovery, thus
estimateNumberOfHistoryOperations should always use translog.

Relates #38904
2019-05-21 19:53:31 -04:00
Jason Tedor b510402b67
Fix off-by-one error in an index shard test
There is an off-by-one error in this test. It leads to the recovery
thread never being started, and that means joining on it will wait
indefinitely. This commit addresses that by fixing the off-by-one error.

Relates #42325
2019-05-21 19:20:29 -04:00
Nhat Nguyen 6808951e6f Mute testDelayedOperationsBeforeAndAfterRelocated
Tracked at #42325
2019-05-21 17:08:43 -04:00
Jason Tedor dd7a65fdf2
Fix compilation in IndexShardTests
I forgot to git add these before pushing, sorry. This commit fixes
compilation in IndexShardTests, they are needed here and not in master
due to differences in how Java infers types in generics between JDK 8
and JDK 11.
2019-05-21 16:12:27 -04:00
Jason Tedor f7ff0aff79
Execute actions under permit in primary mode only (#42241)
Today when executing an action on a primary shard under permit, we do
not enforce that the shard is in primary mode before executing the
action. This commit addresses this by wrapping actions to be executed
under permit in a check that the shard is in primary mode before
executing the action.
2019-05-21 15:54:31 -04:00
Jason Tedor 32b70ed34c
Avoid unnecessary persistence of retention leases (#42299)
Today we are persisting the retention leases at least every thirty
seconds by a scheduled background sync. This sync causes an fsync to
disk and when there are a large number of shards allocated to slow
disks, these fsyncs can pile up and can severely impact the system. This
commit addresses this by only persisting and fsyncing the retention
leases if they have changed since the last time that we persisted and
fsynced the retention leases.
2019-05-21 14:00:48 -04:00
Armin Braun ecd033bea6
Cleanup Various Uses of ActionListener (#40126) (#42274)
* Cleanup Various Uses of ActionListener

* Use shorter `map`, `runAfter` or `wrap` where functionally equivalent to anonymous class
* Use ActionRunnable where functionally equivalent
2019-05-21 17:20:52 +02:00
Henning Andersen 75425ae167 Remove 7.0.2 (#42282)
7.0.2 removed, since it will never be, fixing branch consistency check.
2019-05-21 15:52:58 +02:00
David Turner 7abeaba8bb Prevent in-place downgrades and invalid upgrades (#41731)
Downgrading an Elasticsearch node to an earlier version is unsupported, because
we do not make any attempt to guarantee that a node can read any of the on-disk
data written by a future version. Yet today we do not actively prevent
downgrades, and sometimes users will attempt to roll back a failed upgrade with
an in-place downgrade and get into an unrecoverable state.

This change adds the current version of the node to the node metadata file, and
checks the version found in this file against the current version at startup.
If the node cannot be sure of its ability to read the on-disk data then it
refuses to start, preserving any on-disk data in its upgraded state.

This change also adds a command-line tool to overwrite the node metadata file
without performing any version checks, to unsafely bypass these checks and
recover the historical and lenient behaviour.
2019-05-21 08:04:30 +01:00
Jake Landis b0a25c3170
add 7.1.1 and 6.8.1 versions (#42251) 2019-05-20 17:58:24 -05:00
Ryan Ernst be515d7ce0 Validate non-secure settings are not in keystore (#42209)
Secure settings currently error if they exist inside elasticsearch.yml.
This commit adds validation that non-secure settings do not exist inside
the keystore.

closes #41831
2019-05-20 11:35:53 -07:00
Zachary Tong 6ae6f57d39
[7.x Backport] Force selection of calendar or fixed intervals (#41906)
The date_histogram accepts an interval which can be either a calendar
interval (DST-aware, leap seconds, arbitrary length of months, etc) or
fixed interval (strict multiples of SI units). Unfortunately this is inferred
by first trying to parse as a calendar interval, then falling back to fixed
if that fails.

This leads to confusing arrangement where `1d` == calendar, but
`2d` == fixed.  And if you want a day of fixed time, you have to
specify `24h` (e.g. the next smallest unit).  This arrangement is very
error-prone for users.

This PR adds `calendar_interval` and `fixed_interval` parameters to any
code that uses intervals (date_histogram, rollup, composite, datafeed, etc).
Calendar only accepts calendar intervals, fixed accepts any combination of
units (meaning `1d` can be used to specify `24h` in fixed time), and both
are mutually exclusive.

The old interval behavior is deprecated and will throw a deprecation warning.
It is also mutually exclusive with the two new parameters. In the future the
old dual-purpose interval will be removed.

The change applies to both REST and java clients.
2019-05-20 12:07:29 -04:00
Alexander Reelsen c72c76b5ea Update to joda time 2.10.2 (#42199) 2019-05-20 16:58:54 +02:00
Zachary Tong 072a9bdf55 Fix FiltersAggregation NPE when `filters` is empty (#41459)
If `keyedFilters` is null it assumes there are unkeyed filters...which
will NPE if the unkeyed filters was actually empty.

This refactors to simplify the filter assignment a bit, adds an empty
check and tidies up some formatting.
2019-05-20 10:04:21 -04:00
Jim Ferenczi b7599472ac Fix random failure in SearchRequestTests#testRandomVersionSerialization (#42069)
This commit fixes a test bug that ends up comparing the result of two consecutive calls to System.currentTimeMillis that can be different
on slow CIs.

Closes #42064
2019-05-20 10:14:05 +02:00
Nhat Nguyen 0ec7986049 Enable debug log in testRetentionLeasesSyncOnRecovery
Relates #39105
2019-05-19 22:07:25 -04:00
Nhat Nguyen 6ffc6ea42e Don't verify evictions in testFilterCacheStats (#42091)
If a background merge and refresh happens after a search but before a
stats query, then evictions will be non-zero.

Closes #32506
2019-05-15 18:17:53 -04:00
Nhat Nguyen a75e916078 Adjust load and timeout in testShrinkIndexPrimaryTerm (#42098)
This test can create and shuffle 2*(3*5*7) = 210 shards which is quite
heavy for our CI. This commit reduces the load, so we don't timeout on
CI.

Closes #28153
2019-05-15 18:17:46 -04:00
Igor Motov 70ea3cf847
SQL: Add initial geo support (#42031) (#42135)
Adds an initial limited implementations of geo features to SQL. This implementation is based on the [OpenGIS® Implementation Standard for Geographic information - Simple feature access](http://www.opengeospatial.org/standards/sfs), which is the current standard for GIS system implementation. This effort is concentrate on SQL option AKA ISO 19125-2. 

Queries that are supported as a result of this initial implementation

Metadata commands

- `DESCRIBE table`  - returns the correct column types `GEOMETRY` for geo shapes and geo points.
- `SHOW FUNCTIONS` - returns a list that includes supported `ST_` functions
- `SYS TYPES` and `SYS COLUMNS` display correct types `GEO_SHAPE` and `GEO_POINT` for geo shapes and geo points accordingly. 

Returning geoshapes and geopoints from elasticsearch

- `SELECT geom FROM table` - returns the geoshapes and geo_points as libs/geo objects in JDBC or as WKT strings in console.
- `SELECT ST_AsWKT(geom) FROM table;` and `SELECT ST_AsText(geom) FROM table;`- returns the geoshapes ang geopoints in their WKT representation;

Using geopoints to elasticsearch

- The following functions will be supported for geopoints in queries, sorting and aggregations: `ST_GeomFromText`, `ST_X`, `ST_Y`, `ST_Z`, `ST_GeometryType`, and `ST_Distance`. In most cases when used in queries, sorting and aggregations, these function are translated into script. These functions can be used in the SELECT clause for both geopoints and geoshapes. 
- `SELECT * FROM table WHERE ST_Distance(ST_GeomFromText(POINT(1 2), point) < 10;` - returns all records for which `point` is located within 10m from the `POINT(1 2)`. In this case the WHERE clause is translated into a range query.

Limitations:

Geoshapes cannot be used in queries, sorting and aggregations as part of this initial effort. In order to fully take advantage of geoshapes we would need to have access to geoshape doc values, which is coming in #37206. `ST_Z` cannot be used on geopoints in queries, sorting and aggregations since we don't store altitude in geo_point doc values.

Relates to #29872
Backport of #42031
2019-05-14 18:57:12 -05:00
Jay Modi 327f44e051
Concurrent tests wait for threads to be ready (#42083)
This change updates tests that use a CountDownLatch to synchronize the
running of threads when testing concurrent operations so that we ensure
the thread has been fully created and run by the scheduler. Previously,
these tests used a latch with a value of 1 and the test thread counted
down while the threads performing concurrent operations just waited.
This change updates the value of the latch to be 1 + the number of
threads. Each thread counts down and then waits. This means that each
thread has been constructed and has started running. All threads will
have a common start point now.
2019-05-14 16:29:52 -04:00
David Turner 367e027962 Log cluster UUID when committed (#42065)
Today we do not expose the cluster UUID in any logs by default, but it would be
useful to see it. For instance if a user starts multiple nodes as separate
clusters then they will silently remain as separate clusters even if they are
subsequently reconfigured to look like a single cluster. This change logs the
committed cluster UUID the first time the node encounters it.
2019-05-14 05:35:14 -04:00
Yogesh Gaikwad 90dce0864a
Increase the sample space for random inner hits name generator (#42057) (#42072)
This commits changes the minimum length for inner hits
name to avoid name collision which sometimes failed the
test.
2019-05-12 10:32:02 +10:00
Andrei Stefan 912c6bdbff
Prevent order being lost for _nodes API filters (#42045) (#42089)
* Switch to using a list instead of a Set for the filters, so that the
order of these filters is kept.

(cherry picked from commit 74a743829799b64971e0ac5ae265f43f6c14e074)
2019-05-11 01:58:03 +03:00
Nhat Nguyen c19ea0a6f1 Remove global checkpoint assertion in peer recovery (#41987)
If remote recovery copies an index commit which has gaps in sequence
numbers to a follower; then these assertions (introduced in #40823)
don't hold for follower replicas.

Closes #41037
2019-05-10 14:38:35 -04:00
Christoph Büscher 3e59c31a12 Change IndexAnalyzers default analyzer access (#42011)
Currently IndexAnalyzers keeps the three default as separate class members
although they should refer to the same analyzers held in the additional
analyzers map under the default names. This assumption should be made more
explicit by keeping all analyzers in the map. This change adapts the constructor
to check all the default entries are there and the getters to reach into the map
with the default names when needed.
2019-05-10 18:08:51 +02:00
Jay Modi 80432a3552
Remove close method in PageCacheRecycler/Recycler (#41917)
The changes in #39317 brought to light some concurrency issues in the
close method of Recyclers as we do not wait for threads running in the
threadpool to be finished prior to the closing of the PageCacheRecycler
and the Recyclers that are used internally. #41695 was opened to
address the concurrent close issues but upon review, the closing of
these classes is not really needed as the instances should be become
available for garbage collection once there is no longer a reference to
the closed node.

Closes #41683
2019-05-10 08:56:05 -06:00
Alan Woodward 44c3418531 Simplify handling of keyword field normalizers (#42002)
We have a number of places in analysis-handling code where we check
if a field type is a keyword field, and if so then extract the normalizer rather
than pulling the index-time analyzer. However, a keyword normalizer is
really just a special case of an analyzer, so we should be able to simplify this
by setting the normalizer as the index-time analyzer at construction time.
2019-05-10 14:38:46 +01:00
Nhat Nguyen 809ed3b721 shouldRollGeneration should execute under read lock (#41696)
Translog#shouldRollGeneration should execute under the read lock since
it accesses the current writer.
2019-05-10 09:28:33 -04:00
David Turner 2a8a64d3f1 Remove extra `ms` from log message (#42068)
This log message logs a `TimeValue` which includes units, but also logs an
extra `ms`. This commit removes the extra `ms`.
2019-05-10 14:03:37 +01:00
Armin Braun ea7db2bb6a
Fix testCloseOrDeleteIndexDuringSnapshot (#42007)
* This test was resulting in a `PARTIAL` instead of a `SUCCESS` state for
the case of closing an index during snapshotting on 7.x
  * The reason for this is the changed default behaviour regarding
waiting for active shards between 8.0 and 7.x
  * Fixed by adjusting the waiting behaviour on the close index request
in the test
* Closes #39828
2019-05-10 11:59:20 +02:00
Armin Braun dc444cef49
Fix Race in Closing IndicesService.CacheCleaner (#42016) (#42052)
* When close becomes true while the management pool is shut down, we run
into an unhandled `EsRejectedExecutionException` that fails tests
* Found this while trying to reproduce #32506
   * Running the IndexStatsIT in a loop is a way of reproducing this
2019-05-10 09:29:27 +02:00
Tal Levy 5640197632
Refactor TransportSingleShardAction to serialize Writeable responses (#41985) (#42040)
Previously, TransportSingleShardAction required constructing a new
empty response object. This response object's Streamable readFrom
was used. As part of the migration to Writeable, the interface here
was updated to leverage Writeable.Reader.

relates to #34389.
2019-05-09 22:08:31 -07:00
Jay Modi 2998c107fb
Fix node close stopwatch usage (#41918)
The close method in Node uses a StopWatch to time to closing of
various services. However, the call to log the timing was made before
any of the services had been closed and therefore no timing would be
printed out. This change moves the timing log call to be a closeable
that is the last item closed.
2019-05-09 09:41:42 -06:00
Jay Modi f3bcc4fc22
Default seed address tests account for no IPv6 (#41971)
This change makes the default seed address tests account for the lack
of an IPv6 network. By default docker containers only run with IPv4 and
these tests fail in a vanilla installation of elasticsearch-ci. To
resolve this we only expect IPv6 seed addresses if IPv6 is available.

Relates #41404
2019-05-09 08:19:46 -06:00
David Kyle 256588d773 Mute IndexStatsIT#testFilterCacheStats
See https://github.com/elastic/elasticsearch/issues/32506
2019-05-09 13:49:47 +01:00
Jim Ferenczi b7c7ca8f09 Fix IAE on cross_fields query introduced in 7.0.1 (#41938)
If the max doc in the index is greater than the minimum total term frequency
among the requested fields we need to adjust max doc to be equal to the min ttf.
This was removed by mistake when fixing #41125.

Closes #41934
2019-05-09 14:25:46 +02:00
Alan Woodward 309e4a11b5 Cut AnalyzeResponse over to Writeable (#41915)
This commit makes AnalyzeResponse and its various helper classes implement
Writeable. The classes are also now immutable.

Relates to #34389
2019-05-09 13:09:23 +01:00
Jim Ferenczi a329aaec90 Fix assertion error when caching the result of a search in a read-only index (#41900)
The ReadOnlyEngine wraps its reader with a SoftDeletesDirectoryReaderWrapper if soft deletes
are enabled. However the wrapping is done on top of the ElasticsearchDirectoryReader and that
trips assertion later on since the cache key of these directories are different. This commit
changes the order of the wrapping to put the ElasticsearchDirectoryReader first in order to
ensure that it is always retrieved first when we unwrap the directory.

Closes #41795
2019-05-09 08:59:52 +02:00
Benjamin Trent edd6438e34
mute test related to #41967 (#41968) 2019-05-08 15:03:28 -05:00
William Brafford a2b7871f9f
Allow unknown task time in QueueResizingEsTPE (#41957)
* Allow unknown task time in QueueResizingEsTPE

The afterExecute method previously asserted that a TimedRunnable task
must have a positive execution time. However, the code in TimedRunnable
returns a value of -1 when a task time is unknown. Here, we expand the
logic in the assertion to allow for that possibility, and we don't
update our task time average if the value is negative.

* Add a failure flag to TimedRunnable

In order to be sure that a task has an execution time of -1 because of
a failure, I'm adding a failure flag boolean to the TimedRunnable class.
If execution time is negative for some other reason, an assertion will
fail.

Backport of #41810
Fixes #41448
2019-05-08 14:15:22 -04:00
David Roberts 452ee55cdb Make ISO8601 date parser accept timezone when time does not have seconds (#41896)
Prior to this change the ISO8601 date parser would only
parse an optional timezone if seconds were specified.
This change moves the timezone to the same level of
optional components as hour, so that timestamps without
minutes or seconds may optionally contain a timezone.
It also adds a unit test to cover all the supported
formats.
2019-05-08 13:50:53 +01:00
Yannick Welsch 957046dad0 Allow IDEA test runner to control number of test iterations (#41653)
Allows configuring the number of test iterations via IntelliJ's config dialog, instead of having to add it
manually via the tests.iters system property.
2019-05-08 13:57:29 +02:00
Armin Braun 5c824f3993
Reenable testCloseOrDeleteIndexDuringSnapshot (#41892)
* Relates #39828
2019-05-08 13:10:19 +02:00
Jim Ferenczi ca3d881716 Always set terminated_early if terminate_after is set in the search request (#40839)
* terminated_early should always be set in the response with terminate_after

Today we set `terminated_early` to true in the response if the query terminated
early due to `terminate_after`. However if `terminate_after` is smaller than
the number of documents in a shard we don't set the flag in the response indicating
that the query was exhaustive. This change fixes this disprepancy by setting
terminated_early to false in the response if the number of documents that match
the query is smaller than the provided `terminate_after` value.

Closes #33949
2019-05-08 12:26:38 +02:00
David Turner 4c909e93bb
Reject port ranges in `discovery.seed_hosts` (#41905)
Today Elasticsearch accepts, but silently ignores, port ranges in the
`discovery.seed_hosts` setting:

```
discovery.seed_hosts: 10.1.2.3:9300-9400
```

Silently ignoring part of a setting like this is trappy. With this change we
reject seed host addresses of this form.

Closes #40786
Backport of #41404
2019-05-08 08:34:32 +01:00
David Turner 935f70c05e Handle serialization exceptions during publication (#41781)
Today if an exception is thrown when serializing a cluster state during
publication then the master enters a poisoned state where it cannot publish any
more cluster states, but nor does it stand down as master, yielding repeated
exceptions of the following form:

```
failed to commit cluster state version [12345]
org.elasticsearch.cluster.coordination.FailedToCommitClusterStateException: publishing failed
        at org.elasticsearch.cluster.coordination.Coordinator.publish(Coordinator.java:1045) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.cluster.service.MasterService.publish(MasterService.java:252) [elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:238) [elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:142) [elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.0.0.jar:7.0.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: cannot start publishing next value before accepting previous one
        at org.elasticsearch.cluster.coordination.CoordinationState.handleClientValue(CoordinationState.java:280) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.cluster.coordination.Coordinator.publish(Coordinator.java:1030) ~[elasticsearch-7.0.0.jar:7.0.0]
        ... 11 more
```

This is because it already created the publication request using
`CoordinationState#handleClientValue()` but then it fails before accepting it.
This commit addresses this by performing the serialization before calling
`handleClientValue()`.

Relates #41090, which was the source of such a serialization exception.
2019-05-07 17:53:12 +01:00
Alan Woodward 4cca1e8fff Correct spelling of MockLogAppender.PatternSeenEventExpectation (#41893)
The class was called PatternSeenEventExcpectation. This commit
is a straight class rename to correct the spelling.
2019-05-07 17:28:51 +01:00
Ryan Ernst e9e4bae683 Fix fractional seconds for strict_date_optional_time (#41871)
The fractional seconds portion of strict_date_optional_time was
accidentally copied from the printer, which always prints at least 3
fractional digits. This commit fixes the formatter to allow 1 or 2
fractional seconds.

closes #41633
2019-05-07 09:09:30 -07:00
Henning Andersen f068a22f5f SeqNo CAS linearizability (#38561)
Add a test that stresses concurrent writes using ifSeqno/ifPrimaryTerm to do CAS style updates. Use linearizability checker to verify linearizability. Linearizability of successful CAS'es is guaranteed.

Changed linearizability checker to allow collecting history concurrently.

Changed unresponsive network simulation to wake up immediately when network disruption is cleared to ensure tests proceed in a timely manner (and this also seems more likely to provoke issues).
2019-05-07 14:04:38 +02:00
Jim Ferenczi 70bf432fa8 Fix full text queries test that start with now (#41854)
Full text queries that start with now are not cacheable if they target a date field.
However we assume in the query builder tests that all queries are cacheable and this assumption
fails when the random generated query string starts with "now". This fails twice in several years
since the probability that a random string starts with "now" is low but this commit ensures that
 isCacheable is correctly checked for full text queries that fall into this edge case.

 Closes #41847
2019-05-06 19:08:30 +02:00
Przemyslaw Gomulka 79b7ce8697
Fix javadoc in WrapperQueryBuilder backport(41641) #41849
missing brackets in javadoc
backports #41641
2019-05-06 17:55:11 +02:00
Henning Andersen 227d5e15fb ReadOnlyEngine assertion fix (#41842)
Fixed the assertion that maxSeqNo == globalCheckpoint to actually check
against the global checkpoint.
2019-05-06 16:11:38 +02:00
Hicham Mallah 4a88da70c5 Add index name to cluster block exception (#41489)
Updates the error message to reveal the index name that is causing it.

Closes #40870
2019-05-04 19:11:59 -04:00
Nhat Nguyen c7924014fa
Verify consistency of version and source in disruption tests (#41614) (#41661)
With this change, we will verify the consistency of version and source
(besides id, seq_no, and term) of live documents between shard copies
at the end of disruption tests.
2019-05-03 18:47:14 -04:00
Nhat Nguyen e61469aae6 Noop peer recoveries on closed index (#41400)
If users close an index to change some non-dynamic index settings, then the current implementation forces replicas of that closed index to copy over segment files from the primary. With this change, we make peer recoveries of closed index skip both phases.

Relates #33888

Co-authored-by: Yannick Welsch <yannick@welsch.lu>
2019-05-03 12:07:37 -04:00
Issam EL-ATIF 23706d4cdf Update error message for allowed characters in aggregation names (#41573)
Exception message thrown when specifying illegal characters did
no accurately described the allowed characters.  This updates the 
error message to reflect reality (any character except [, ] and >)
2019-05-03 11:55:09 -04:00
Jason Tedor 03c959f188
Upgrade keystore on package install (#41755)
When Elasticsearch is run from a package installation, the running
process does not have permissions to write to the keystore. This is
because of the root:root ownership of /etc/elasticsearch. This is why we
create the keystore if it does not exist during package installation. If
the keystore needs to be upgraded, that is currently done by the running
Elasticsearch process. Yet, as just mentioned, the Elasticsearch process
would not have permissions to do that during runtime. Instead, this
needs to be done during package upgrade. This commit adds an upgrade
command to the keystore CLI for this purpose, and that is invoked during
package upgrade if the keystore already exists. This ensures that we are
always on the latest keystore format before the Elasticsearch process is
invoked, and therefore no upgrade would be needed then. While this bug
has always existed, we have not heard of reports of it in practice. Yet,
this bug becomes a lot more likely with a recent change to the format of
the keystore to remove the distinction between file and string entries.
2019-05-03 10:34:30 -04:00
David Turner 873d0020a5 Reject null customs at build time (#41782)
Today you can add a null `Custom` to the cluster state or its metadata, but
attempting to publish such a cluster state will fail. Unfortunately, the
publication-time failure gives very little information about the source of the
problem. This change causes the failure to manifest earlier and adds
information about which `Custom` was null in order to simplify the
investigation.

Relates #41090.
2019-05-03 14:52:32 +02:00
Jack Conradson 025619bbf1 Improve error message for ln/log with negative results in function score
This changes the error message for a negative result in a function score when 
using the ln modifier to suggest using ln1p or ln2p when a negative result 
occurs in a function score and for the log modifier to suggest using log1p or 
log2p.

This relates to #41509
2019-05-02 16:31:25 -07:00
Jason Tedor d0f071236a
Simplify filtering addresses on interfaces (#41758)
This commit is a refactoring of how we filter addresses on
interfaces. In particular, we refactor all of these methods into a
common private method. We also change the order of logic to first check
if an address matches our filter and then check if the interface is
up. This is to possibly avoid problems we are seeing where devices are
flapping up and down while we are checking for loopback addresses. We do
not expect the loopback device to flap up and down so by reversing the
logic here we avoid that problem on CI machines. Finally, we expand the
error message when this does occur so that we know which device is
flapping.
2019-05-02 16:36:27 -04:00
Colin Goodheart-Smithe ab9154005b
Adds version 6.7.3 2019-05-02 17:36:23 +01:00
Tim Brooks b4bcbf9f64
Support http read timeouts for transport-nio (#41466)
This is related to #27260. Currently there is a setting
http.read_timeout that allows users to define a read timeout for the
http transport. This commit implements support for this functionality
with the transport-nio plugin. The behavior here is that a repeating
task will be scheduled for the interval defined. If there have been
no requests received since the last run and there are no inflight
requests, the channel will be closed.
2019-05-02 09:48:52 -06:00
David Turner b189596631 Add details to BulkShardRequest#getDescription() (#41711)
Today a bulk shard request appears as follows in the detailed task list:

    requests[42], index[my_index]

This change adds the shard index and refresh policy too:

    requests[42], index[my_index][2], refresh[IMMEDIATE]
2019-05-02 08:29:25 +02:00
Andy Bristol b9e44288d3 mute NodeTests#testCloseOnInterruptibleTask
For #41448
2019-05-01 13:24:22 -07:00
Jason Tedor 39b0b5809d
Fix minimum compatible version after 6.8
This commit fixes the minimum compatible version after the introduction
of 6.8.
2019-05-01 16:21:13 -04:00
Jay Modi 7f7eb7b679 Add version 7.0.2 to 7.x branch (#41715) 2019-05-01 15:23:53 -04:00
Jason Tedor f08ac103ee
Add 6.8 version constant
This commit adds the 6.8 version constant to the 7.x branch.
2019-05-01 13:38:58 -04:00
Jason Tedor 7f3ab4524f
Bump 7.x branch to version 7.2.0
This commit adds the 7.2.0 version constant to the 7.x branch, and bumps
BWC logic accordingly.
2019-05-01 13:38:57 -04:00
Henning Andersen c6abe74dd6
Close and acquire commit during reset engine fix (#41584) (#41709)
If closing a shard while resetting engine,
IndexEventListener.afterIndexShardClosed would be called while there is
still an active IndexWriter on the shard. For integration tests, this
leads to an exception during check index called from MockFSIndexStore
.Listener. Fixed.

Relates to #38561
2019-05-01 15:22:24 +02:00
Jason Tedor 26c72c96bd
Fix imports in KeyStoreWrapperTests
This commit addresses a checkstyle violation in KeyStoreWrapperTests,
removing a leftover import.
2019-05-01 07:21:23 -04:00
Jason Tedor 0b46a62f6b
Drop distinction in entries for keystore (#41701)
Today we allow adding entries from a file or from a string, yet we
internally maintain this distinction such that if you try to add a value
from a file for a setting that expects a string or add a value from a
string for a setting that expects a file, you will have a bad time. This
causes a pain for operators such that for each setting they need to know
this difference. Yet, we do not need to maintain this distinction
internally as they are bytes after all. This commit removes that
distinction and includes logic to upgrade legacy keystores.
2019-05-01 07:02:04 -04:00
Nhat Nguyen 887f3f2c83 Simplify initialization of max_seq_no of updates (#41161)
Today we choose to initialize max_seq_no_of_updates on primaries only so
we can deal with a situation where a primary is on an old node (before
6.5) which does not have MUS while replicas on new nodes (6.5+).
However, this strategy is quite complex and can lead to bugs (for
example #40249) since we have to assign a correct value (not too low) to
MSU in all possible situations (before recovering from translog,
restoring history on promotion, and handing off relocation).

Fortunately, we don't have to deal with this BWC in 7.0+ since all nodes
in the cluster should have MSU. This change simplifies the
initialization of MSU by always assigning it a correct value in the
constructor of Engine regardless of whether it's a replica or primary.

Relates #33842
2019-04-30 15:14:52 -04:00
Igor Motov 10ab838106
Geo: Add GeoJson parser to libs/geo classes (#41575) (#41657)
Adds GeoJson parser for Geometry classes defined in libs/geo.

Relates #40908 and #29872
2019-04-29 19:43:31 -04:00
Alan Woodward a01f451ef7 Limit complexity of IntervalQueryBuilderTests#testRandomSource() (#41538)
IntervalsSources can throw IllegalArgumentExceptions if they would produce
too many disjunctions.  To mitigate against this when building random
sources, we limit the depth of the randomly generated source to four
nested sources

Fixes #41402
2019-04-29 13:31:19 +01:00
Dan Hermann b23709b178
Applies the same naming restrictions to repositories as to snapshots except that leading underscores and uppercase characters are permitted. (#41585)
Fixes #40817.
2019-04-29 07:31:01 -05:00
Armin Braun 6e51b6f96d
Add Repository Consistency Assertion to SnapshotResiliencyTests (#41631)
* Add Repository Consistency Assertion to SnapshotResiliencyTests (#40857)

* Add Repository Consistency Assertion to SnapshotResiliencyTests

* Add some quick validation on not leaving behind any dangling metadata or dangling indices to the snapshot resiliency tests
   * Added todo about expanding this assertion further

* Fix SnapshotResiliencyTest Repo Consistency Check (#41332)

* Fix SnapshotResiliencyTest Repo Consistency Check

* Due to the random creation of an empty `extra0` file by the Lucene mockFS we see broken tests because we use the existence of an index folder in assertions and the index deletion doesn't go through if there are extra files in an index folder
  * Fixed by removing the `extra0` file and resulting empty directory trees before asserting repo consistency
* Closes #41326

* Reenable SnapshotResiliency Test (#41437)

This was fixed in https://github.com/elastic/elasticsearch/pull/41332
but I forgot to reenable the test.

* fix compile on java8
2019-04-29 12:01:58 +02:00
Nhat Nguyen 615a0211f0 Recovery should not indefinitely retry on mapping error (#41099)
A stuck peer recovery in #40913 reveals that we indefinitely retry on
new cluster states if indexing translog operations hits a mapper
exception. We should not wait and retry if the mapping on the target is
as recent as the mapping that the primary used to index the replaying
operations.

Relates #40913
2019-04-27 10:55:08 -04:00
Michael Morello 75283294f5 Fix multi-node parsing in voting config exclusions REST API (#41588)
Fixes an issue where multiple nodes where not properly parsed in the voting config exclusions REST API.

Closes #41587
2019-04-27 12:20:03 +02:00
Nick Knize 113b24be4b Refactor GeoHashUtils (#40869)
This commit refactors GeoHashUtils class into a new Geohash utility class located in the ES geo library. The intent is to not only better control what geo methods are whitelisted for painless scripting but to clean up the geo utility API in general.
2019-04-26 10:06:36 -05:00
Armin Braun aad33121d8
Async Snapshot Repository Deletes (#40144) (#41571)
Motivated by slow snapshot deletes reported in e.g. #39656 and the fact that these likely are a contributing factor to repositories accumulating stale files over time when deletes fail to finish in time and are interrupted before they can complete.

* Makes snapshot deletion async and parallelizes some steps of the delete process that can be safely run concurrently via the snapshot thread poll
   * I did not take the biggest potential speedup step here and parallelize the shard file deletion because that's probably better handled by moving to bulk deletes where possible (and can still be parallelized via the snapshot pool where it isn't). Also, I wanted to keep the size of the PR manageable.
* See https://github.com/elastic/elasticsearch/pull/39656#issuecomment-470492106
* Also, as a side effect this gives the `SnapshotResiliencyTests` a little more coverage for master failover scenarios (since parallel access to a blob store repository during deletes is now possible since a delete isn't a single task anymore).
* By adding a `ThreadPool` reference to the repository this also lays the groundwork to parallelizing shard snapshot uploads to improve the situation reported in #39657
2019-04-26 15:36:09 +02:00
Armin Braun 7824f60a34
Simplify Snapshot Resiliency Test (#40930) (#41565)
* Thanks to #39793 dynamic mapping updates don't contain blocking operations anymore so we don't have to manually put the mapping in this test and can keep it a little simpler
2019-04-26 10:59:09 +02:00
Christoph Büscher 078936b8f5 Remove search analyzers from DocumentFieldMappers (#41484)
These references seem to be unused except for tests and should be removed to
keep the places we store analyzers limited.
2019-04-26 09:48:48 +02:00