Commit Graph

3366 Commits

Author SHA1 Message Date
Simon Willnauer 2c3bd32aff Add a merge policy that prunes ID postings for soft-deleted but retained documents (#40741)
This change adds a merge policy that drops all _id postings for documents that
are marked as soft-deleted but retained across merges. This is usually unnecessary
unless soft-deletes are used with a retention policy since otherwise a merge would
remove deleted documents anyway.

Yet, this merge policy prevents extreme cases where a very large number of soft-deleted
documents are retained and are impacting update performance.
Note, using this merge policy will remove all lookup by ID capabilities for soft-deleted documents.
2019-06-06 13:41:46 +02:00
Gordon Brown 6eb4600e93
Add custom metadata to snapshots (#41281)
Adds a metadata field to snapshots which can be used to store arbitrary
key-value information. This may be useful for attaching a description of
why a snapshot was taken, tagging snapshots to make categorization
easier, or identifying the source of automatically-created snapshots.
2019-06-05 17:30:31 -06:00
Mark Vieira 1f4ff97d7d
Mute failing test
(cherry picked from commit 4952d4facf5949abdb9aae47dbe1ee18cf7eef99)
2019-06-05 13:47:18 -07:00
Przemyslaw Gomulka ab5bc83597
Deprecation info for joda-java migration on 7.x (#42659)
Some clusters might have been already migrated to version 7 without being warned about the joda-java migration changes.
Deprecation api on that version will give them guidance on what patterns need to be changed.
relates. This change is using the same logic like in 6.8 that is: verifying the pattern is from the incompatible set ('y'-Y', 'C', 'Z' etc), not from predifined set, not prefixed with 8. AND was also created in 6.x. Mappings created in 7.x are considered migrated and should not generate warnings

There is no pipeline check (present on 6.8) as it is impossible to verify when the pipeline was created, and therefore to make sure the format is depracated or not
#42010
2019-06-05 19:50:04 +02:00
Simon Willnauer d3524fdd06 Add back import after backport 2019-06-05 11:25:19 +02:00
Simon Willnauer 4dfaeb9046 Remove post Java 9 API usage after backport 2019-06-05 11:24:58 +02:00
Jim Ferenczi de0ea4bbf7 Deduplicate alias and concrete fields in query field expansion (#42328)
The full-text query parsers accept field pattern that are expanded using the mapping.
Alias field are also detected during the expansion but they are not deduplicated with the
concrete fields that are found from other patterns (or the same). This change ensures
that we deduplicate the target fields of the full-text query parsers in order to avoid
adding the same clause multiple times. Boolean queries are already able to deduplicate
clauses during rewrite but since we also use DisjunctionMaxQuery it is preferable to detect
 these duplicates early on.
2019-06-05 11:05:40 +02:00
Simon Willnauer 41a9f3ae3b Use reader attributes to control term dict memory useage (#42838)
This change makes use of the reader attributes added in LUCENE-8671
to ensure that `_id` fields are always on-heap for best update performance
and term dicts are generally off-heap on Read-Only engines.

Closes #38390
2019-06-05 11:01:06 +02:00
David Turner 955aee8a07 More logging in testRerouteOccursOnDiskPassingHighWatermark (#42864)
This test is failing because recoveries of these empty shards are not
completing in a reasonable time, but the reason for this is still obscure. This
commit adds yet more logging.

Relates #40174, #42424
2019-06-05 09:05:44 +01:00
Jason Tedor 78be3dde25
Enable testing against JDK 13 EA builds (#40829)
This commit adds JDK 13 to the CI rotation for testing. For now, we will
be testing against JDK 13 EA builds.
2019-06-04 20:54:24 -04:00
Jason Tedor 117df87b2b
Replicate aliases in cross-cluster replication (#42875)
This commit adds functionality so that aliases that are manipulated on
leader indices are replicated by the shard follow tasks to the follower
indices. Note that we ignore write indices. This is due to the fact that
follower indices do not receive direct writes so the concept is not
useful.

Relates #41815
2019-06-04 20:36:24 -04:00
Mark Vieira e44b8b1e2e
[Backport] Remove dependency substitutions 7.x (#42866)
* Remove unnecessary usage of Gradle dependency substitution rules (#42773)

(cherry picked from commit 12d583dbf6f7d44f00aa365e34fc7e937c3c61f7)
2019-06-04 13:50:23 -07:00
Andrey Ershov 6391f90616 Fix testNoMasterActionsWriteMasterBlock (#42798)
This commit performs the proper restore of network disruption.
Previously disruptionScheme.stopDisrupting() was called that does not
ensure that connectivity between cluster nodes is restored. The test
was checking that the cluster has green status, but it was not checking
that connectivity between nodes is restored.
Here we switch to internalCluster().clearDisruptionScheme(true) which
performs both checks before returning.

Closes #39688

(cherry picked from commit c8988d5cf5a85f9b28ce148dbf100aaa6682a757)
2019-06-04 17:24:03 +02:00
Alan Woodward df124f32db Refactor control flow in TransportAnalyzeAction (#42801)
The control flow in TransportAnalyzeAction is currently spread across two large
methods, and is quite difficult to follow. This commit tidies things up a bit, to make
it clearer when we use pre-defined analyzers and when we use custom built ones.
2019-06-04 14:52:46 +01:00
Yu 428beabc49 Remove "template" field in IndexTemplateMetaData (#42099)
Remove "template" field from XContent parsing in IndexTemplateMetaData
2019-06-03 12:43:11 -05:00
Armin Braun 00db9c1a2f
Make Connection Future Err. Handling more Resilient (#42781) (#42804)
* There were a number of possible (runtime-) exceptions that could be raised in the adjusted code and prevent resolving the listener
* Relates #42350
2019-06-03 19:29:36 +02:00
David Turner df0f0b3d40
Rename autoMinMasterNodes to autoManageMasterNodes (#42789)
Renames the `ClusterScope` attribute `autoMinMasterNodes` to reflect its
broader meaning since 7.0.

Backport of the relevant part of #42700 to `7.x`.
2019-06-03 12:12:07 +01:00
Alan Woodward 2129d06643 Create client-only AnalyzeRequest/AnalyzeResponse classes (#42197)
This commit clones the existing AnalyzeRequest/AnalyzeResponse classes
to the high-level rest client, and adjusts request converters to use these new
classes.

This is a prerequisite to removing the Streamable interface from the internal
server version of these classes.
2019-06-03 09:46:36 +01:00
Alan Woodward d0da30e5f4 Return NO_INTERVALS rather than null from empty TokenStream (#42750)
IntervalBuilder#analyzeText will currently return null if it is passed an
empty TokenStream, which can lead to a confusing NullPointerException
later on during querying. This commit changes the code to return
NO_INTERVALS instead.

Fixes #42587
2019-05-31 17:45:57 +01:00
Jason Tedor 61c6a26b31
Remove locale-dependent string checking
We were checking if an exception was caused by a specific reason "Not a
directory". Alas, this reason is locale-dependent and can fail on
systems that are not set to en_US.UTF-8. This commit addresses this by
deriving what the locale-dependent error message would be and using that
for comparison with the actual exception thrown.

Relates #41689
2019-05-31 12:08:38 -04:00
Jason Tedor 371cb9a8ce
Remove Log4j 1.2 API as a dependency (#42702)
We had this as a dependency for legacy dependencies that still needed
the Log4j 1.2 API. This appears to no longer be necessary, so this
commit removes this artifact as a dependency.

To remove this dependency, we had to fix a few places where we were
accidentally relying on Log4j 1.2 instead of Log4j 2 (easy to do, since
both APIs were on the compile-time classpath).

Finally, we can remove our custom Netty logger factory. This was needed
when we were on Log4j 1.2 and handled logging in our own unique
way. When we migrated to Log4j 2 we could have dropped this
dependency. However, even then Netty would still pick up Log4j 1.2 since
it was on the classpath, thus the advantage to removing this as a
dependency now.
2019-05-30 16:08:07 -04:00
Mark Vieira c1816354ed
[Backport] Improve build configuration time (#42674) 2019-05-30 10:29:42 -07:00
David Turner d14799f0a5 Prevent merging nodes' data paths (#42665)
Today Elasticsearch does not prevent you from reconfiguring a node's
`path.data` to point to data paths that previously belonged to more than one
node. There's no good reason to be able to do this, and the consequences can be
quietly disastrous. Furthermore, #42489 might result in a user trying to split
up a previously-shared collection of data paths by hand and there's definitely
scope for mixing the paths up across nodes when doing this.

This change adds a check during startup to ensure that each data path belongs
to the same node.
2019-05-30 18:08:55 +01:00
Marios Trivyzas ce30afcd01
Deprecate CommonTermsQuery and cutoff_frequency (#42619) (#42691)
Since the max_score optimization landed in Elasticsearch 7,
the CommonTermsQuery is redundant and slower. Moreover the
cutoff_frequency parameter for MatchQuery and MultiMatchQuery
is redundant.

Relates to #27096

(cherry picked from commit 04b74497314eeec076753a33b3b6cc11549646e8)
2019-05-30 18:04:47 +02:00
David Turner 86b1a07887 Log leader and handshake failures by default (#42342)
Today the `LeaderChecker` and `HandshakingTransportAddressConnector` do not log
anything above `DEBUG` level. However there are some situations where it is
appropriate for them to log at a higher level:

- if the low-level handshake succeeds but the high-level one fails then this
  indicates a config error that the user should resolve, and the exception
  will help them to do so.

- if leader checks fail repeatedly then we restart discovery, and the exception
  will help to determine what went wrong.

Resolves #42153
2019-05-30 08:14:19 +01:00
Igor Motov d2f9ccbe18 Geo: Refactor libs/geo parsers (#42549)
Refactors the WKT and GeoJSON parsers from an utility class into an
instantiatable objects. This is a preliminary step in
preparation for moving out coordinate validators from Geometry
constructors. This should allow us to make validators plugable.
2019-05-29 20:07:27 -04:00
Henning Andersen 53f5d313cd Use correct global checkpoint sync interval (#42642)
A disruption test case need to use a lower checkpoint sync interval
since they verify sequence numbers after the test waiting max 10 seconds
for it to stabilize.

Closes #42637
2019-05-29 08:15:53 +02:00
Adrien Grand 38f9e24411
Add 7.1.2 version constant. (#42648)
Relates to #42635
2019-05-28 23:14:10 +02:00
Jim Ferenczi 267e5a1110 fix javadoc of SearchRequestBuilder#setTrackTotalHits (#42219) 2019-05-28 22:12:16 +02:00
Armin Braun 6166fed6f1
Fix BulkProcessorRetryIT (#41700) (#42618)
* Now that we process the bulk requests themselves on the WRITE threadpool, they can run out of retries too like the item requests even when backoff is active
* Fixes #41324 by using the same logic that checks failed item requests for their retry status for the top level bulk requests as well
2019-05-28 17:58:00 +02:00
Vigya Sharma 130c832e10 Validate routing commands using updated routing state (#42066)
When multiple commands are called in sequence, fetch shards
from mutable, up-to-date routing nodes to ensure each command's
changes are visible to subsequent commands.

This addresses an issue uncovered during work on #41050.
2019-05-28 17:01:14 +02:00
David Turner c21745c8ab Avoid loading retention leases while writing them (#42620)
Resolves #41430.
2019-05-28 15:27:06 +01:00
Yannick Welsch 1e0b0f640b Fix compilation
Follow-up to 5598647922
2019-05-28 13:56:36 +02:00
Yannick Welsch 5598647922 Reset state recovery after successful recovery (#42576)
The problem this commit addresses is that state recovery is not reset on a node that then becomes
master with a cluster state that has a state not recovered flag in it. The situation that was observed
in a failed test run of MinimumMasterNodesIT.testThreeNodesNoMasterBlock (see below) is that we
have 3 master nodes (node_t0, node_t1, node_t2), two of them are shut down (node_t2 remains),
when the first one comes back (renamed to node_t4) it becomes leader in term 2 and sends state
(with state_not_recovered_block) to node_t2, which accepts. node_t2 becomes leader in term 3, and
as it was previously leader in term1 and successfully completed state recovery, does never retry
state recovery in term 3.

Closes #39172
2019-05-28 13:46:10 +02:00
David Turner 746a2f41fd
Remove PRE_60_NODE_CHECKPOINT (#42531)
This commit removes the obsolete `PRE_60_NODE_CHECKPOINT` constant for dealing
with 5.x nodes' lack of sequence number support.

Backport of #42527
2019-05-28 12:25:53 +01:00
Armin Braun 00d665540a
Make unwrapCorrupt Check Suppressed Ex. (#41889) (#42605)
* Make unwrapCorrupt Check Suppressed Ex. (#41889)
* As discussed in #24800 we want to check for suppressed corruption
indicating exceptions here as well to more reliably categorize
corruption related exceptions
* Closes #24800, 41201
2019-05-28 12:44:40 +02:00
Daniel Mitterdorfer adb3574af8
Mute NodeTests (#42615)
Relates #42577
Relates #42614
2019-05-28 12:25:18 +02:00
Armin Braun 116b050cc6
Cleanup Bulk Delete Exception Logging (#41693) (#42606)
* Cleanup Bulk Delete Exception Logging

* Follow up to #41368
* Collect all failed blob deletes and add them to the exception message
* Remove logging of blob name list from caller exception logging
2019-05-28 11:00:28 +02:00
Nhat Nguyen de6be819d6 Allocate to data-only nodes in ReopenWhileClosingIT (#42560)
If all primary shards are allocated on the master node, then the
verifying before close step will never interact with mock transport
service. This change prefers to allocate shards on data-only nodes.

Closes #39757
2019-05-27 17:32:06 -04:00
Armin Braun a94d24ae5a
Fix RareClusterStateIT (#42430) (#42580)
* It looks like we might be cancelling a previous publication instead of
the one triggered by the given request with a very low likelihood.
   * Fixed by adding a wait for no in-progress publications
   * Also added debug logging that would've identified this problem
* Closes #36813
2019-05-27 13:57:17 +02:00
Armin Braun c4f44024af
Remove Delete Method from BlobStore (#41619) (#42574)
* Remove Delete Method from BlobStore (#41619)
* The delete method on the blob store was used almost nowhere and just duplicates the delete method on the blob containers
  * The fact that it provided for some recursive delete logic (that did not behave the same way on all implementations) was not used and not properly tested either
2019-05-27 12:24:20 +02:00
Armin Braun bb7e8eb2fd
Introduce ShardState Enum + Slight Cleanup SnapshotsInProgress (#41940) (#42573)
* Added separate enum for the state of each shard, it was really
confusing that we used the same enum for the state of the snapshot
overall and the state of each individual shard
   * relates https://github.com/elastic/elasticsearch/pull/40943#issuecomment-488664150
* Shortened some obvious spots in equals method and saved a few lines
via `computeIfAbsent` to make up for adding 50 new lines to this class
2019-05-27 12:08:45 +02:00
Armin Braun 7b4d1ac352
Remove Obsolete BwC Logic from BlobStoreRepository (#42193) (#42571)
* Remove Obsolete BwC Logic from BlobStoreRepository

* We can't restore 1.3.3 files anyway -> no point in doing the dance of computing a hash here
* Some other minor+obvious cleanups
2019-05-27 11:47:04 +02:00
Armin Braun c7448b12e1
Cleanup Redundant BlobStoreFormat Class (#42195) (#42570)
* No need to have an abstract class here when there's only a single impl.
2019-05-27 11:28:50 +02:00
Armin Braun 49767fc1e9
Some Cleanup in o.e.gateway Package (#42108) (#42568)
* Removing obvious dead code
* Removing redundant listener interface
2019-05-27 11:28:12 +02:00
Armin Braun a5ca20a250
Some Cleanup in o.e.i.engine (#42278) (#42566)
* Some Cleanup in o.e.i.engine

* Remove dead code and parameters
* Reduce visibility in some obvious spots
* Add missing `assert`s (not that important here since the methods
themselves will probably be dead-code eliminated) but still
2019-05-27 11:04:54 +02:00
Martijn van Groningen e591d30918
fixed test compile issue 2019-05-27 10:17:00 +02:00
Martijn van Groningen 48a71459c0
Improve how internal representation of pipelines are updated (#42257)
If a single pipeline is updated then the internal representation of
all pipelines was updated. With this change, only the internal representation
of the pipelines that have been modified will be updated.

Prior to this change the IngestMetadata of the previous and current cluster
was used to determine whether the internal representation of pipelines
should be updated. If applying the previous cluster state change failed then
subsequent cluster state changes that have no changes to IngestMetadata
will not attempt to update the internal representation of the pipelines.

This commit, changes how the IngestService updates the internal representation
by keeping track of the underlying configuration and use that to detect
against the new IngestMetadata whether a pipeline configuration has been
changed and if so, then the internal pipeline representation will be updated.
2019-05-27 10:01:15 +02:00
Nhat Nguyen 85e60850af Add debug log for retention leases (#42557)
We need more information to understand why CcrRetentionLeaseIT is
failing. This commit adds some debug log to retention leases and enables
them in CcrRetentionLeaseIT.
2019-05-26 16:04:47 -04:00
Tanguy Leroux 6bec876682 Improve Close Index Response (#39687)
This changes the `CloseIndexResponse` so that it reports closing result
for each index. Shard failures or exception are also reported per index,
and the global acknowledgment flag is computed from the index results
only.

The response looks like:
```
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "indices" : {
    "docs" : {
      "closed" : true
    }
  }
}
```

The response reports shard failures like:
```
{
  "acknowledged" : false,
  "shards_acknowledged" : false,
  "indices" : {
    "docs-1" : {
      "closed" : true
    },
    "docs-2" : {
      "closed" : false,
      "shards" : {
        "1" : {
          "failures" : [
            {
              "shard" : 1,
              "index" : "docs-2",
              "status" : "BAD_REQUEST",
              "reason" : {
                "type" : "index_closed_exception",
                "reason" : "closed",
                "index_uuid" : "JFmQwr_aSPiZbkAH_KEF7A",
                "index" : "docs-2"
              }
            }
          ]
        }
      }
    },
    "docs-3" : {
      "closed" : true
    }
  }
}
```

Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>
2019-05-24 21:57:55 -04:00
Julie Tibshirani 3a6c2525ca
Deprecate support for chained multi-fields. (#42330)
This PR contains a straight backport of #41926, and also updates the
migration documentation and deprecation info API for 7.x.
2019-05-24 15:55:06 -07:00
Jason Tedor f2cfd09289
Remove renewal in retention lease recovery test (#42536)
This commit removes the act of renewing some retention leases during a
retention lease recovery test. Having renewal does not add anything
extra to this test, but does allow for some situations where the test
can fail spuriously (i.e., in a way that does not indicate that
production code is broken).
2019-05-24 17:40:59 -05:00
Nhat Nguyen 74d771d8f6 Adjust load SplitIndexIT#testSplitIndexPrimaryTerm (#42477)
SplitIndexIT#testSplitIndexPrimaryTerm sometimes timeout due to
relocating many shards. This change adjusts loads and increases
the timeout.
2019-05-24 15:47:29 -04:00
Nhat Nguyen 02739d038c Mute accounting circuit breaker check after test (#42448)
If we close an engine while a refresh is happening, then we might leak
refCount of some SegmentReaders. We need to skip the ram accounting
circuit breaker check until we have a new Lucene snapshot which includes
the fix for LUCENE-8809.

This also adds a test to the engine but left it muted so we won't forget
to reenable this check.

Closes #30290
2019-05-24 15:42:12 -04:00
Nhat Nguyen 329d1307a5 Add test to verify force primary allocation on closed indices (#42458)
This change adds a test verifying that we can force primary allocation on closed indices.
2019-05-24 17:23:58 +02:00
Henning Andersen 075fd2a0ac Shard CLI tool always check shards (#41480)
The shard CLI tool would not do anything if a corruption marker was not
present. But a corruption marker is only added if a corruption is
detected during indexing/writing, not if a search or other read fails.

Changed the tool to always check shards regardless of corruption marker
presence.

Related to #41298
2019-05-24 16:49:37 +02:00
Marios Trivyzas 523b5bfdb5 Fix sorting on nested field with unmapped (#42451)
Previously sorting on a missing nested field would fail with an
Exception:
`[nested_field] failed to find nested object under path [nested_path]`
despite `unmapped_type` being set on the query.

Fixes: #33644

(cherry picked from commit 631142d5dd088a10de8dcd939b50a14301173283)
2019-05-24 15:47:41 +02:00
Christoph Büscher 12d5642e93 Small internal AnalysisRegistry changes (#42500)
Some internal refactorings to the AnalysisRegistry, spin-off from #40782.
2019-05-24 15:27:35 +02:00
David Turner a5b6ed8d1e Remove AwaitsFix of #41967 following #42504 2019-05-24 14:26:49 +01:00
David Turner 4d02ca1633 Drain master task queue when stabilising (#42504)
Today the default stabilisation time is calculated on the assumption that the
elected master has no pending tasks to process when it is elected, but this is
not a safe assumption to make. This can result in a cluster reaching the end of
its stabilisation time without having stabilised. Furthermore in #36943 we
increased the probability that each step in `runRandomly()` enqueues another
task, vastly increasing the chance that we hit such a situation.

This change extends the stabilisation process to allow time for all pending
tasks, plus a task that might currently be in flight.

Fixes #41967, in which the master entered the stabilisation phase with over 800
tasks to process.
2019-05-24 14:18:02 +01:00
weizijun 40348ab726 Use accurate total hits in IndexPrimaryRelocationIT
By default, we track total hits up to 10k but we might index more than
10k documents `testPrimaryRelocationWhileIndexing`. With this change, we
always request for the accurate total hits in the test.

> java.lang.AssertionError: Count is 10000+ hits but 11684 was expected.
2019-05-24 12:47:21 +02:00
Simon Willnauer 46ccfba808 Remove IndexStore and DirectoryService (#42446)
Both of these classes are basically a bloated wrapper around a simple
construct that can simply be a DirectoryFactory interface. This change
removes both classes and replaces them with a simple stateless interface
that creates a new `Directory` per shard. The concept of `index.store` is preserved
since it makes sense from a configuration perspective.
2019-05-24 12:14:56 +02:00
David Turner f864f6a740 Cluster state from API should always have a master (#42454)
Today the `TransportClusterStateAction` ignores the state passed by the
`TransportMasterNodeAction` and obtains its state from the cluster applier.
This might be inconsistent, showing a different node as the master or maybe
even having no master.

This change adjusts the action to use the passed-in state directly, and adds
tests showing that the state returned is consistent with our expectations even
if there is a concurrent master failover.

Fixes #38331
Relates #38432
2019-05-24 08:45:22 +01:00
David Turner 528f8cc073 Add stack traces to RetentionLeasesIT failures (#42425)
Today `RetentionLeaseIT` calls `fail(e.toString())` on some exceptions, losing
the stack trace that came with the exception. This commit adjusts this to
re-throw the exception wrapped in an `AssertionError` so we can see more
details about failures such as #41430.
2019-05-24 08:37:51 +01:00
David Turner c0974a9813 Add more logging to MockDiskUsagesIT (#42424)
This commit adds a log message containing the routing table, emitted on each
iteration of the failing assertBusy() in #40174. It also modernizes the code a
bit.
2019-05-24 08:28:10 +01:00
Jack Conradson 167f391cfd Bug fix to allow access to top level params in reduce script (#42096) 2019-05-23 16:00:39 -07:00
Ryan Ernst a49bafc194
Split document and metadata fields in GetResult (#38373) (#42456)
This commit makes creators of GetField split the fields into document fields and metadata fields. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in #24422.
2019-05-23 14:01:07 -07:00
Jake Landis 2b22ceac04
Bulk processor concurrent requests (#41451) (#42438)
`org.elasticsearch.action.bulk.BulkProcessor` is a threadsafe class that
allows for simple semantics to deal with sending bulk requests. Once a
bulk reaches it's pre-defined size, documents, or flush interval it will
execute sending the bulk. One configurable option is the number of concurrent
outstanding bulk requests. That concurrency is implemented in
`org.elasticsearch.action.bulk.BulkRequestHandler` via a semaphore. However,
the only code that currently calls into this code is blocked by `synchronized`
methods. This results in the in-ability for the BulkProcessor to behave concurrently
despite supporting configurable amounts of concurrent requests.

This change removes the `synchronized` method in favor an explicit
lock around the non-thread safe parts of the method. The call into
`org.elasticsearch.action.bulk.BulkRequestHandler` is no longer blocking, which
allows `org.elasticsearch.action.bulk.BulkRequestHandler` to handle it's own concurrency.
2019-05-23 14:22:16 -05:00
Simon Willnauer 5a884dac03 Unguice Snapshot / Restore services (#42357)
This removes the @Inject annotations from the Snapshot/Restore infrastructure
classes and registers them manually in Node.java
2019-05-23 17:09:26 +02:00
Jim Ferenczi a497603219 Disable max score optimization for queries with unbounded max scores (#41361)
Lucene 8 has the ability to skip blocks of non-competitive documents.
However some queries don't track their maximum score (`script_score`, `span`, ...)
so they always return Float.POSITIVE_INFINITY as maximum score. This can slow down
some boolean queries if other clauses have bounded max scores. This commit disables
the max score optimization when we detect a mandatory scoring clause with unbounded
 max scores. Optional clauses are not checked since they can still skip documents
 when the unbounded clause is after the current document.
2019-05-23 16:53:57 +02:00
Yannick Welsch f57fdc57e9
Deprecate max_local_storage_nodes (#42426)
Allows this setting to be removed in 8.0, see #42428
2019-05-23 15:59:55 +02:00
Christoph Büscher 85ff9543b7 Prevent normalizer from not being closed on exception (#42375)
Currently AnalysisRegistry#processNormalizerFactory creates a normalizer and
only later checks whether it should be added to the normalizer map passed in. In
case we throw an exception it isn't closed. This can be prevented by moving the
check that throws the exception earlier.
2019-05-23 15:53:55 +02:00
markharwood c2c8d0e637 Test fix - results equality failed because of subtle scoring differences between replicas. (#42366)
Diverging merge policies means the segments and therefore scores are not the same.
Fixed the test by ensuring there are zero replicas.

Closes #32492
2019-05-23 12:00:57 +01:00
Jim Ferenczi b88e80ab89 Upgrade to Lucene 8.1.0 (#42214)
This commit upgrades to the GA release of Lucene 8.1.0
2019-05-23 11:46:45 +02:00
Jim Ferenczi 4ca5649a0d Upgrade to lucene 8.1.0-snapshot-e460356abe (#40952) 2019-05-23 11:45:33 +02:00
Marios Trivyzas 0777223bab
Allow `fields` to be set to `*` (#42301)
Allow for SimpleQueryString, QueryString and MultiMatchQuery
to set the `fields` parameter to the wildcard `*`. If so, set
the leniency to `true`, to achieve the same behaviour as from the
`"default_field" : "*" setting.

Furthermore,  check if `*` is in the list of the `default_field` but
not necessarily as the 1st element.

Closes: #39577
(cherry picked from commit e75ff0c748e6b68232c2b08e19ac4a4934918264)
2019-05-23 10:10:48 +02:00
Yannick Welsch a71d19e92a Ensure testAckedIndexing uses disruption index settings
AbstractDisruptionTestCase set a lower global checkpoint sync interval setting, but this was ignored by
testAckedIndexing, which has led to spurious test failures

Relates #41068, #38931
2019-05-22 19:13:14 +02:00
Jake Landis 496fee3333
bump to 7.3 (#42365) 2019-05-22 11:57:07 -05:00
Luca Cavanna c2af62455f Cut over SearchResponse and SearchTemplateResponse to Writeable (#41855)
Relates to #34389
2019-05-22 18:47:54 +02:00
Luca Cavanna 29c9bb9181 Clean up ShardId usage of Streamable (#41843)
ShardId already implements Writeable so there is no need for it to implement Streamable too. Also the readShardId static method can be
easily replaced with direct usages of the constructor that takes a
StreamInput as argument.
2019-05-22 18:47:54 +02:00
Luca Cavanna 96ba0b13e0 Cut over MultiSearchResponse to Writeable (#41844)
Relates to #34389
2019-05-22 18:47:54 +02:00
Luca Cavanna 1ded45b0a2 Cut over SearchPhaseResult to Writeable (#41853)
Relates to #34389
2019-05-22 18:47:54 +02:00
Luca Cavanna c85f285298 Move InternalAggregations to Writeable (#41841)
Relates to #34389
2019-05-22 18:47:54 +02:00
Luca Cavanna 39d4c7c26f Skip explain fetch sub phase when request holds only suggestions (#41739)
In case a search request holds only the suggest section, the query phase
is skipped and only the suggest phase is executed instead. There will
never be hits returned, and in case the explain flag is set to true, the
 explain sub phase throws a null pointer exception as the query is null.
 Usually a null query is replaced with a match all query as part of SearchContext#preProcess which is though skipped as well with suggest
 only searches. To address this, we skip the explain sub fetch phase
 for search requests that only requested suggestions.

Closes #31260
2019-05-22 18:47:54 +02:00
Luca Cavanna 3416cda8b1 Cut over ClusterSearchShardsGroup to Writeable (#41788) 2019-05-22 18:47:54 +02:00
Guillaume Darmont 3e231bbad6 StackOverflowError when calling BulkRequest#add (#41672)
Removing of payload in BulkRequest (#39843) had a side effect of making
`BulkRequest.add(DocWriteRequest<?>...)` (with varargs) recursive, thus
leading to StackOverflowError. This PR adds a small change in
RequestConvertersTests to show the error and the corresponding fix in
`BulkRequest`.

Fixes #41668
2019-05-22 11:22:14 -05:00
mushao999 d4b5933225 Fix alpha version error message (#40406) 2019-05-22 09:06:10 -07:00
Yannick Welsch eae58c477c Remove testNodeFailuresAreProcessedOnce
This test was not checking the thing it was supposed to anyway.
2019-05-22 14:52:01 +02:00
Yannick Welsch 250973af1d Fix testCannotJoinIfMasterLostDataFolder
Relates to #41047
2019-05-22 14:37:31 +02:00
Simon Willnauer a79cd77e5c Remove IndexShard dependency from Repository (#42213)
* Remove IndexShard dependency from Repository

In order to simplify repository testing especially for BlobStoreRepository
it's important to remove the dependency on IndexShard and reduce it to
Store and MapperService (in the snapshot case). This significantly reduces
the dependcy footprint for Repository and allows unittesting without starting
nodes or instantiate entire shard instances. This change deprecates the old
method signatures and adds a unittest for FileRepository to show the advantage
of this change.
In addition, the unittesting surfaced a bug where the internal file names that
are private to the repository were used in the recovery stats instead of the
target file names which makes it impossible to relate to the actual lucene files
in the recovery stats.

* don't delegate deprecated methods

* apply comments

* test
2019-05-22 14:27:11 +02:00
Ignacio Vera 3a20ff7e86
Fix TopHitsAggregationBuilder adding duplicate _score sort clauses (#42179) (#42343)
When using High Level Rest Client Java API to produce search query, using AggregationBuilders.topHits("th").sort("_score", SortOrder.DESC)
caused query to contain duplicate sort clauses.
2019-05-22 14:02:52 +02:00
Yannick Welsch f338005179 Revert "Mute MinimumMasterNodesIT.testThreeNodesNoMasterBlock()"
This reverts commit 448fc8444559be3145e4a7f65dec794ebbff7b81.
2019-05-22 13:22:09 +02:00
Yannick Welsch 0c7322ebf2 Avoid bubbling up failures from a shard that is recovering (#42287)
A shard that is undergoing peer recovery is subject to logging warnings of the form

org.elasticsearch.action.FailedNodeException: Failed node [XYZ]
...
Caused by: org.apache.lucene.index.IndexNotFoundException: no segments* file found in ...

These failures are actually harmless, and expected to happen while a peer recovery is ongoing (i.e.
there is an IndexShard instance, but no proper IndexCommit just yet).
As these failures are currently bubbled up to the master, they cause unnecessary reroutes and
confusion amongst users due to being logged as warnings.

Closes  #40107
2019-05-22 12:26:15 +02:00
Yannick Welsch 770d8e9e39 Remove usage of max_local_storage_nodes in test infrastructure (#41652)
Moves the test infrastructure away from using node.max_local_storage_nodes, allowing us in a
follow-up PR to deprecate this setting in 7.x and to remove it in 8.0.

This also changes the behavior of InternalTestCluster so that starting up nodes will not automatically
reuse data folders of previously stopped nodes. If this behavior is desired, it needs to be explicitly
done by passing the data path from the stopped node to the new node that is started.
2019-05-22 11:04:55 +02:00
Yannick Welsch c9dedf180b Use comparator for Reconfigurator (#42283)
Simplifies the voting configuration reconfiguration logic by switching to an explicit Comparator for
the priorities. Does not make changes to the behavior of the component.
2019-05-22 10:04:51 +02:00
Nhat Nguyen bcbf1aff6b Peer recovery should flush at the end (#41660)
Flushing at the end of a peer recovery (if needed) can bring these
benefits:

1. Closing an index won't end up with the red state for a recovering
replica should always be ready for closing whether it performs the
verifying-before-close step or not.

2. Good opportunities to compact store (i.e., flushing and merging
Lucene, and trimming translog)

Closes #40024
Closes #39588
2019-05-21 22:45:17 -04:00
Nhat Nguyen 84df48ccb3 Recovery with syncId should verify seqno infos (#41265)
This change verifies and aborts recovery if source and target have the
same syncId but different sequenceId. This commit also adds an upgrade
test to ensure that we always utilize syncId.
2019-05-21 22:44:17 -04:00
Nhat Nguyen 3573b1d0ce Skip global checkpoint sync for closed indices (#41874)
The verifying-before-close step ensures the global checkpoints on all
shard copies are in sync; thus, we don' t need to sync global
checkpoints for closed indices.

Relate #33888
2019-05-21 19:55:21 -04:00
Nhat Nguyen 4d55e9e070 Estimate num history ops should always use translog (#42211)
Currently, we ignore soft-deletes in peer recovery, thus
estimateNumberOfHistoryOperations should always use translog.

Relates #38904
2019-05-21 19:53:31 -04:00
Jason Tedor b510402b67
Fix off-by-one error in an index shard test
There is an off-by-one error in this test. It leads to the recovery
thread never being started, and that means joining on it will wait
indefinitely. This commit addresses that by fixing the off-by-one error.

Relates #42325
2019-05-21 19:20:29 -04:00
Nhat Nguyen 6808951e6f Mute testDelayedOperationsBeforeAndAfterRelocated
Tracked at #42325
2019-05-21 17:08:43 -04:00
Jason Tedor dd7a65fdf2
Fix compilation in IndexShardTests
I forgot to git add these before pushing, sorry. This commit fixes
compilation in IndexShardTests, they are needed here and not in master
due to differences in how Java infers types in generics between JDK 8
and JDK 11.
2019-05-21 16:12:27 -04:00
Jason Tedor f7ff0aff79
Execute actions under permit in primary mode only (#42241)
Today when executing an action on a primary shard under permit, we do
not enforce that the shard is in primary mode before executing the
action. This commit addresses this by wrapping actions to be executed
under permit in a check that the shard is in primary mode before
executing the action.
2019-05-21 15:54:31 -04:00
Jason Tedor 32b70ed34c
Avoid unnecessary persistence of retention leases (#42299)
Today we are persisting the retention leases at least every thirty
seconds by a scheduled background sync. This sync causes an fsync to
disk and when there are a large number of shards allocated to slow
disks, these fsyncs can pile up and can severely impact the system. This
commit addresses this by only persisting and fsyncing the retention
leases if they have changed since the last time that we persisted and
fsynced the retention leases.
2019-05-21 14:00:48 -04:00
Armin Braun ecd033bea6
Cleanup Various Uses of ActionListener (#40126) (#42274)
* Cleanup Various Uses of ActionListener

* Use shorter `map`, `runAfter` or `wrap` where functionally equivalent to anonymous class
* Use ActionRunnable where functionally equivalent
2019-05-21 17:20:52 +02:00
Henning Andersen 75425ae167 Remove 7.0.2 (#42282)
7.0.2 removed, since it will never be, fixing branch consistency check.
2019-05-21 15:52:58 +02:00
David Turner 7abeaba8bb Prevent in-place downgrades and invalid upgrades (#41731)
Downgrading an Elasticsearch node to an earlier version is unsupported, because
we do not make any attempt to guarantee that a node can read any of the on-disk
data written by a future version. Yet today we do not actively prevent
downgrades, and sometimes users will attempt to roll back a failed upgrade with
an in-place downgrade and get into an unrecoverable state.

This change adds the current version of the node to the node metadata file, and
checks the version found in this file against the current version at startup.
If the node cannot be sure of its ability to read the on-disk data then it
refuses to start, preserving any on-disk data in its upgraded state.

This change also adds a command-line tool to overwrite the node metadata file
without performing any version checks, to unsafely bypass these checks and
recover the historical and lenient behaviour.
2019-05-21 08:04:30 +01:00
Jake Landis b0a25c3170
add 7.1.1 and 6.8.1 versions (#42251) 2019-05-20 17:58:24 -05:00
Ryan Ernst be515d7ce0 Validate non-secure settings are not in keystore (#42209)
Secure settings currently error if they exist inside elasticsearch.yml.
This commit adds validation that non-secure settings do not exist inside
the keystore.

closes #41831
2019-05-20 11:35:53 -07:00
Zachary Tong 6ae6f57d39
[7.x Backport] Force selection of calendar or fixed intervals (#41906)
The date_histogram accepts an interval which can be either a calendar
interval (DST-aware, leap seconds, arbitrary length of months, etc) or
fixed interval (strict multiples of SI units). Unfortunately this is inferred
by first trying to parse as a calendar interval, then falling back to fixed
if that fails.

This leads to confusing arrangement where `1d` == calendar, but
`2d` == fixed.  And if you want a day of fixed time, you have to
specify `24h` (e.g. the next smallest unit).  This arrangement is very
error-prone for users.

This PR adds `calendar_interval` and `fixed_interval` parameters to any
code that uses intervals (date_histogram, rollup, composite, datafeed, etc).
Calendar only accepts calendar intervals, fixed accepts any combination of
units (meaning `1d` can be used to specify `24h` in fixed time), and both
are mutually exclusive.

The old interval behavior is deprecated and will throw a deprecation warning.
It is also mutually exclusive with the two new parameters. In the future the
old dual-purpose interval will be removed.

The change applies to both REST and java clients.
2019-05-20 12:07:29 -04:00
Alexander Reelsen c72c76b5ea Update to joda time 2.10.2 (#42199) 2019-05-20 16:58:54 +02:00
Zachary Tong 072a9bdf55 Fix FiltersAggregation NPE when `filters` is empty (#41459)
If `keyedFilters` is null it assumes there are unkeyed filters...which
will NPE if the unkeyed filters was actually empty.

This refactors to simplify the filter assignment a bit, adds an empty
check and tidies up some formatting.
2019-05-20 10:04:21 -04:00
Jim Ferenczi b7599472ac Fix random failure in SearchRequestTests#testRandomVersionSerialization (#42069)
This commit fixes a test bug that ends up comparing the result of two consecutive calls to System.currentTimeMillis that can be different
on slow CIs.

Closes #42064
2019-05-20 10:14:05 +02:00
Nhat Nguyen 0ec7986049 Enable debug log in testRetentionLeasesSyncOnRecovery
Relates #39105
2019-05-19 22:07:25 -04:00
Nhat Nguyen 6ffc6ea42e Don't verify evictions in testFilterCacheStats (#42091)
If a background merge and refresh happens after a search but before a
stats query, then evictions will be non-zero.

Closes #32506
2019-05-15 18:17:53 -04:00
Nhat Nguyen a75e916078 Adjust load and timeout in testShrinkIndexPrimaryTerm (#42098)
This test can create and shuffle 2*(3*5*7) = 210 shards which is quite
heavy for our CI. This commit reduces the load, so we don't timeout on
CI.

Closes #28153
2019-05-15 18:17:46 -04:00
Igor Motov 70ea3cf847
SQL: Add initial geo support (#42031) (#42135)
Adds an initial limited implementations of geo features to SQL. This implementation is based on the [OpenGIS® Implementation Standard for Geographic information - Simple feature access](http://www.opengeospatial.org/standards/sfs), which is the current standard for GIS system implementation. This effort is concentrate on SQL option AKA ISO 19125-2. 

Queries that are supported as a result of this initial implementation

Metadata commands

- `DESCRIBE table`  - returns the correct column types `GEOMETRY` for geo shapes and geo points.
- `SHOW FUNCTIONS` - returns a list that includes supported `ST_` functions
- `SYS TYPES` and `SYS COLUMNS` display correct types `GEO_SHAPE` and `GEO_POINT` for geo shapes and geo points accordingly. 

Returning geoshapes and geopoints from elasticsearch

- `SELECT geom FROM table` - returns the geoshapes and geo_points as libs/geo objects in JDBC or as WKT strings in console.
- `SELECT ST_AsWKT(geom) FROM table;` and `SELECT ST_AsText(geom) FROM table;`- returns the geoshapes ang geopoints in their WKT representation;

Using geopoints to elasticsearch

- The following functions will be supported for geopoints in queries, sorting and aggregations: `ST_GeomFromText`, `ST_X`, `ST_Y`, `ST_Z`, `ST_GeometryType`, and `ST_Distance`. In most cases when used in queries, sorting and aggregations, these function are translated into script. These functions can be used in the SELECT clause for both geopoints and geoshapes. 
- `SELECT * FROM table WHERE ST_Distance(ST_GeomFromText(POINT(1 2), point) < 10;` - returns all records for which `point` is located within 10m from the `POINT(1 2)`. In this case the WHERE clause is translated into a range query.

Limitations:

Geoshapes cannot be used in queries, sorting and aggregations as part of this initial effort. In order to fully take advantage of geoshapes we would need to have access to geoshape doc values, which is coming in #37206. `ST_Z` cannot be used on geopoints in queries, sorting and aggregations since we don't store altitude in geo_point doc values.

Relates to #29872
Backport of #42031
2019-05-14 18:57:12 -05:00
Jay Modi 327f44e051
Concurrent tests wait for threads to be ready (#42083)
This change updates tests that use a CountDownLatch to synchronize the
running of threads when testing concurrent operations so that we ensure
the thread has been fully created and run by the scheduler. Previously,
these tests used a latch with a value of 1 and the test thread counted
down while the threads performing concurrent operations just waited.
This change updates the value of the latch to be 1 + the number of
threads. Each thread counts down and then waits. This means that each
thread has been constructed and has started running. All threads will
have a common start point now.
2019-05-14 16:29:52 -04:00
David Turner 367e027962 Log cluster UUID when committed (#42065)
Today we do not expose the cluster UUID in any logs by default, but it would be
useful to see it. For instance if a user starts multiple nodes as separate
clusters then they will silently remain as separate clusters even if they are
subsequently reconfigured to look like a single cluster. This change logs the
committed cluster UUID the first time the node encounters it.
2019-05-14 05:35:14 -04:00
Yogesh Gaikwad 90dce0864a
Increase the sample space for random inner hits name generator (#42057) (#42072)
This commits changes the minimum length for inner hits
name to avoid name collision which sometimes failed the
test.
2019-05-12 10:32:02 +10:00
Andrei Stefan 912c6bdbff
Prevent order being lost for _nodes API filters (#42045) (#42089)
* Switch to using a list instead of a Set for the filters, so that the
order of these filters is kept.

(cherry picked from commit 74a743829799b64971e0ac5ae265f43f6c14e074)
2019-05-11 01:58:03 +03:00
Nhat Nguyen c19ea0a6f1 Remove global checkpoint assertion in peer recovery (#41987)
If remote recovery copies an index commit which has gaps in sequence
numbers to a follower; then these assertions (introduced in #40823)
don't hold for follower replicas.

Closes #41037
2019-05-10 14:38:35 -04:00
Christoph Büscher 3e59c31a12 Change IndexAnalyzers default analyzer access (#42011)
Currently IndexAnalyzers keeps the three default as separate class members
although they should refer to the same analyzers held in the additional
analyzers map under the default names. This assumption should be made more
explicit by keeping all analyzers in the map. This change adapts the constructor
to check all the default entries are there and the getters to reach into the map
with the default names when needed.
2019-05-10 18:08:51 +02:00
Jay Modi 80432a3552
Remove close method in PageCacheRecycler/Recycler (#41917)
The changes in #39317 brought to light some concurrency issues in the
close method of Recyclers as we do not wait for threads running in the
threadpool to be finished prior to the closing of the PageCacheRecycler
and the Recyclers that are used internally. #41695 was opened to
address the concurrent close issues but upon review, the closing of
these classes is not really needed as the instances should be become
available for garbage collection once there is no longer a reference to
the closed node.

Closes #41683
2019-05-10 08:56:05 -06:00
Alan Woodward 44c3418531 Simplify handling of keyword field normalizers (#42002)
We have a number of places in analysis-handling code where we check
if a field type is a keyword field, and if so then extract the normalizer rather
than pulling the index-time analyzer. However, a keyword normalizer is
really just a special case of an analyzer, so we should be able to simplify this
by setting the normalizer as the index-time analyzer at construction time.
2019-05-10 14:38:46 +01:00
Nhat Nguyen 809ed3b721 shouldRollGeneration should execute under read lock (#41696)
Translog#shouldRollGeneration should execute under the read lock since
it accesses the current writer.
2019-05-10 09:28:33 -04:00
David Turner 2a8a64d3f1 Remove extra `ms` from log message (#42068)
This log message logs a `TimeValue` which includes units, but also logs an
extra `ms`. This commit removes the extra `ms`.
2019-05-10 14:03:37 +01:00
Armin Braun ea7db2bb6a
Fix testCloseOrDeleteIndexDuringSnapshot (#42007)
* This test was resulting in a `PARTIAL` instead of a `SUCCESS` state for
the case of closing an index during snapshotting on 7.x
  * The reason for this is the changed default behaviour regarding
waiting for active shards between 8.0 and 7.x
  * Fixed by adjusting the waiting behaviour on the close index request
in the test
* Closes #39828
2019-05-10 11:59:20 +02:00
Armin Braun dc444cef49
Fix Race in Closing IndicesService.CacheCleaner (#42016) (#42052)
* When close becomes true while the management pool is shut down, we run
into an unhandled `EsRejectedExecutionException` that fails tests
* Found this while trying to reproduce #32506
   * Running the IndexStatsIT in a loop is a way of reproducing this
2019-05-10 09:29:27 +02:00
Tal Levy 5640197632
Refactor TransportSingleShardAction to serialize Writeable responses (#41985) (#42040)
Previously, TransportSingleShardAction required constructing a new
empty response object. This response object's Streamable readFrom
was used. As part of the migration to Writeable, the interface here
was updated to leverage Writeable.Reader.

relates to #34389.
2019-05-09 22:08:31 -07:00
Jay Modi 2998c107fb
Fix node close stopwatch usage (#41918)
The close method in Node uses a StopWatch to time to closing of
various services. However, the call to log the timing was made before
any of the services had been closed and therefore no timing would be
printed out. This change moves the timing log call to be a closeable
that is the last item closed.
2019-05-09 09:41:42 -06:00
Jay Modi f3bcc4fc22
Default seed address tests account for no IPv6 (#41971)
This change makes the default seed address tests account for the lack
of an IPv6 network. By default docker containers only run with IPv4 and
these tests fail in a vanilla installation of elasticsearch-ci. To
resolve this we only expect IPv6 seed addresses if IPv6 is available.

Relates #41404
2019-05-09 08:19:46 -06:00
David Kyle 256588d773 Mute IndexStatsIT#testFilterCacheStats
See https://github.com/elastic/elasticsearch/issues/32506
2019-05-09 13:49:47 +01:00
Jim Ferenczi b7c7ca8f09 Fix IAE on cross_fields query introduced in 7.0.1 (#41938)
If the max doc in the index is greater than the minimum total term frequency
among the requested fields we need to adjust max doc to be equal to the min ttf.
This was removed by mistake when fixing #41125.

Closes #41934
2019-05-09 14:25:46 +02:00
Alan Woodward 309e4a11b5 Cut AnalyzeResponse over to Writeable (#41915)
This commit makes AnalyzeResponse and its various helper classes implement
Writeable. The classes are also now immutable.

Relates to #34389
2019-05-09 13:09:23 +01:00
Jim Ferenczi a329aaec90 Fix assertion error when caching the result of a search in a read-only index (#41900)
The ReadOnlyEngine wraps its reader with a SoftDeletesDirectoryReaderWrapper if soft deletes
are enabled. However the wrapping is done on top of the ElasticsearchDirectoryReader and that
trips assertion later on since the cache key of these directories are different. This commit
changes the order of the wrapping to put the ElasticsearchDirectoryReader first in order to
ensure that it is always retrieved first when we unwrap the directory.

Closes #41795
2019-05-09 08:59:52 +02:00
Benjamin Trent edd6438e34
mute test related to #41967 (#41968) 2019-05-08 15:03:28 -05:00
William Brafford a2b7871f9f
Allow unknown task time in QueueResizingEsTPE (#41957)
* Allow unknown task time in QueueResizingEsTPE

The afterExecute method previously asserted that a TimedRunnable task
must have a positive execution time. However, the code in TimedRunnable
returns a value of -1 when a task time is unknown. Here, we expand the
logic in the assertion to allow for that possibility, and we don't
update our task time average if the value is negative.

* Add a failure flag to TimedRunnable

In order to be sure that a task has an execution time of -1 because of
a failure, I'm adding a failure flag boolean to the TimedRunnable class.
If execution time is negative for some other reason, an assertion will
fail.

Backport of #41810
Fixes #41448
2019-05-08 14:15:22 -04:00
David Roberts 452ee55cdb Make ISO8601 date parser accept timezone when time does not have seconds (#41896)
Prior to this change the ISO8601 date parser would only
parse an optional timezone if seconds were specified.
This change moves the timezone to the same level of
optional components as hour, so that timestamps without
minutes or seconds may optionally contain a timezone.
It also adds a unit test to cover all the supported
formats.
2019-05-08 13:50:53 +01:00
Yannick Welsch 957046dad0 Allow IDEA test runner to control number of test iterations (#41653)
Allows configuring the number of test iterations via IntelliJ's config dialog, instead of having to add it
manually via the tests.iters system property.
2019-05-08 13:57:29 +02:00
Armin Braun 5c824f3993
Reenable testCloseOrDeleteIndexDuringSnapshot (#41892)
* Relates #39828
2019-05-08 13:10:19 +02:00
Jim Ferenczi ca3d881716 Always set terminated_early if terminate_after is set in the search request (#40839)
* terminated_early should always be set in the response with terminate_after

Today we set `terminated_early` to true in the response if the query terminated
early due to `terminate_after`. However if `terminate_after` is smaller than
the number of documents in a shard we don't set the flag in the response indicating
that the query was exhaustive. This change fixes this disprepancy by setting
terminated_early to false in the response if the number of documents that match
the query is smaller than the provided `terminate_after` value.

Closes #33949
2019-05-08 12:26:38 +02:00
David Turner 4c909e93bb
Reject port ranges in `discovery.seed_hosts` (#41905)
Today Elasticsearch accepts, but silently ignores, port ranges in the
`discovery.seed_hosts` setting:

```
discovery.seed_hosts: 10.1.2.3:9300-9400
```

Silently ignoring part of a setting like this is trappy. With this change we
reject seed host addresses of this form.

Closes #40786
Backport of #41404
2019-05-08 08:34:32 +01:00
David Turner 935f70c05e Handle serialization exceptions during publication (#41781)
Today if an exception is thrown when serializing a cluster state during
publication then the master enters a poisoned state where it cannot publish any
more cluster states, but nor does it stand down as master, yielding repeated
exceptions of the following form:

```
failed to commit cluster state version [12345]
org.elasticsearch.cluster.coordination.FailedToCommitClusterStateException: publishing failed
        at org.elasticsearch.cluster.coordination.Coordinator.publish(Coordinator.java:1045) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.cluster.service.MasterService.publish(MasterService.java:252) [elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:238) [elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:142) [elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.0.0.jar:7.0.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: cannot start publishing next value before accepting previous one
        at org.elasticsearch.cluster.coordination.CoordinationState.handleClientValue(CoordinationState.java:280) ~[elasticsearch-7.0.0.jar:7.0.0]
        at org.elasticsearch.cluster.coordination.Coordinator.publish(Coordinator.java:1030) ~[elasticsearch-7.0.0.jar:7.0.0]
        ... 11 more
```

This is because it already created the publication request using
`CoordinationState#handleClientValue()` but then it fails before accepting it.
This commit addresses this by performing the serialization before calling
`handleClientValue()`.

Relates #41090, which was the source of such a serialization exception.
2019-05-07 17:53:12 +01:00
Alan Woodward 4cca1e8fff Correct spelling of MockLogAppender.PatternSeenEventExpectation (#41893)
The class was called PatternSeenEventExcpectation. This commit
is a straight class rename to correct the spelling.
2019-05-07 17:28:51 +01:00
Ryan Ernst e9e4bae683 Fix fractional seconds for strict_date_optional_time (#41871)
The fractional seconds portion of strict_date_optional_time was
accidentally copied from the printer, which always prints at least 3
fractional digits. This commit fixes the formatter to allow 1 or 2
fractional seconds.

closes #41633
2019-05-07 09:09:30 -07:00
Henning Andersen f068a22f5f SeqNo CAS linearizability (#38561)
Add a test that stresses concurrent writes using ifSeqno/ifPrimaryTerm to do CAS style updates. Use linearizability checker to verify linearizability. Linearizability of successful CAS'es is guaranteed.

Changed linearizability checker to allow collecting history concurrently.

Changed unresponsive network simulation to wake up immediately when network disruption is cleared to ensure tests proceed in a timely manner (and this also seems more likely to provoke issues).
2019-05-07 14:04:38 +02:00
Jim Ferenczi 70bf432fa8 Fix full text queries test that start with now (#41854)
Full text queries that start with now are not cacheable if they target a date field.
However we assume in the query builder tests that all queries are cacheable and this assumption
fails when the random generated query string starts with "now". This fails twice in several years
since the probability that a random string starts with "now" is low but this commit ensures that
 isCacheable is correctly checked for full text queries that fall into this edge case.

 Closes #41847
2019-05-06 19:08:30 +02:00
Przemyslaw Gomulka 79b7ce8697
Fix javadoc in WrapperQueryBuilder backport(41641) #41849
missing brackets in javadoc
backports #41641
2019-05-06 17:55:11 +02:00
Henning Andersen 227d5e15fb ReadOnlyEngine assertion fix (#41842)
Fixed the assertion that maxSeqNo == globalCheckpoint to actually check
against the global checkpoint.
2019-05-06 16:11:38 +02:00
Hicham Mallah 4a88da70c5 Add index name to cluster block exception (#41489)
Updates the error message to reveal the index name that is causing it.

Closes #40870
2019-05-04 19:11:59 -04:00
Nhat Nguyen c7924014fa
Verify consistency of version and source in disruption tests (#41614) (#41661)
With this change, we will verify the consistency of version and source
(besides id, seq_no, and term) of live documents between shard copies
at the end of disruption tests.
2019-05-03 18:47:14 -04:00
Nhat Nguyen e61469aae6 Noop peer recoveries on closed index (#41400)
If users close an index to change some non-dynamic index settings, then the current implementation forces replicas of that closed index to copy over segment files from the primary. With this change, we make peer recoveries of closed index skip both phases.

Relates #33888

Co-authored-by: Yannick Welsch <yannick@welsch.lu>
2019-05-03 12:07:37 -04:00
Issam EL-ATIF 23706d4cdf Update error message for allowed characters in aggregation names (#41573)
Exception message thrown when specifying illegal characters did
no accurately described the allowed characters.  This updates the 
error message to reflect reality (any character except [, ] and >)
2019-05-03 11:55:09 -04:00
Jason Tedor 03c959f188
Upgrade keystore on package install (#41755)
When Elasticsearch is run from a package installation, the running
process does not have permissions to write to the keystore. This is
because of the root:root ownership of /etc/elasticsearch. This is why we
create the keystore if it does not exist during package installation. If
the keystore needs to be upgraded, that is currently done by the running
Elasticsearch process. Yet, as just mentioned, the Elasticsearch process
would not have permissions to do that during runtime. Instead, this
needs to be done during package upgrade. This commit adds an upgrade
command to the keystore CLI for this purpose, and that is invoked during
package upgrade if the keystore already exists. This ensures that we are
always on the latest keystore format before the Elasticsearch process is
invoked, and therefore no upgrade would be needed then. While this bug
has always existed, we have not heard of reports of it in practice. Yet,
this bug becomes a lot more likely with a recent change to the format of
the keystore to remove the distinction between file and string entries.
2019-05-03 10:34:30 -04:00
David Turner 873d0020a5 Reject null customs at build time (#41782)
Today you can add a null `Custom` to the cluster state or its metadata, but
attempting to publish such a cluster state will fail. Unfortunately, the
publication-time failure gives very little information about the source of the
problem. This change causes the failure to manifest earlier and adds
information about which `Custom` was null in order to simplify the
investigation.

Relates #41090.
2019-05-03 14:52:32 +02:00
Jack Conradson 025619bbf1 Improve error message for ln/log with negative results in function score
This changes the error message for a negative result in a function score when 
using the ln modifier to suggest using ln1p or ln2p when a negative result 
occurs in a function score and for the log modifier to suggest using log1p or 
log2p.

This relates to #41509
2019-05-02 16:31:25 -07:00
Jason Tedor d0f071236a
Simplify filtering addresses on interfaces (#41758)
This commit is a refactoring of how we filter addresses on
interfaces. In particular, we refactor all of these methods into a
common private method. We also change the order of logic to first check
if an address matches our filter and then check if the interface is
up. This is to possibly avoid problems we are seeing where devices are
flapping up and down while we are checking for loopback addresses. We do
not expect the loopback device to flap up and down so by reversing the
logic here we avoid that problem on CI machines. Finally, we expand the
error message when this does occur so that we know which device is
flapping.
2019-05-02 16:36:27 -04:00
Colin Goodheart-Smithe ab9154005b
Adds version 6.7.3 2019-05-02 17:36:23 +01:00
Tim Brooks b4bcbf9f64
Support http read timeouts for transport-nio (#41466)
This is related to #27260. Currently there is a setting
http.read_timeout that allows users to define a read timeout for the
http transport. This commit implements support for this functionality
with the transport-nio plugin. The behavior here is that a repeating
task will be scheduled for the interval defined. If there have been
no requests received since the last run and there are no inflight
requests, the channel will be closed.
2019-05-02 09:48:52 -06:00
David Turner b189596631 Add details to BulkShardRequest#getDescription() (#41711)
Today a bulk shard request appears as follows in the detailed task list:

    requests[42], index[my_index]

This change adds the shard index and refresh policy too:

    requests[42], index[my_index][2], refresh[IMMEDIATE]
2019-05-02 08:29:25 +02:00
Andy Bristol b9e44288d3 mute NodeTests#testCloseOnInterruptibleTask
For #41448
2019-05-01 13:24:22 -07:00
Jason Tedor 39b0b5809d
Fix minimum compatible version after 6.8
This commit fixes the minimum compatible version after the introduction
of 6.8.
2019-05-01 16:21:13 -04:00
Jay Modi 7f7eb7b679 Add version 7.0.2 to 7.x branch (#41715) 2019-05-01 15:23:53 -04:00
Jason Tedor f08ac103ee
Add 6.8 version constant
This commit adds the 6.8 version constant to the 7.x branch.
2019-05-01 13:38:58 -04:00
Jason Tedor 7f3ab4524f
Bump 7.x branch to version 7.2.0
This commit adds the 7.2.0 version constant to the 7.x branch, and bumps
BWC logic accordingly.
2019-05-01 13:38:57 -04:00
Henning Andersen c6abe74dd6
Close and acquire commit during reset engine fix (#41584) (#41709)
If closing a shard while resetting engine,
IndexEventListener.afterIndexShardClosed would be called while there is
still an active IndexWriter on the shard. For integration tests, this
leads to an exception during check index called from MockFSIndexStore
.Listener. Fixed.

Relates to #38561
2019-05-01 15:22:24 +02:00
Jason Tedor 26c72c96bd
Fix imports in KeyStoreWrapperTests
This commit addresses a checkstyle violation in KeyStoreWrapperTests,
removing a leftover import.
2019-05-01 07:21:23 -04:00
Jason Tedor 0b46a62f6b
Drop distinction in entries for keystore (#41701)
Today we allow adding entries from a file or from a string, yet we
internally maintain this distinction such that if you try to add a value
from a file for a setting that expects a string or add a value from a
string for a setting that expects a file, you will have a bad time. This
causes a pain for operators such that for each setting they need to know
this difference. Yet, we do not need to maintain this distinction
internally as they are bytes after all. This commit removes that
distinction and includes logic to upgrade legacy keystores.
2019-05-01 07:02:04 -04:00
Nhat Nguyen 887f3f2c83 Simplify initialization of max_seq_no of updates (#41161)
Today we choose to initialize max_seq_no_of_updates on primaries only so
we can deal with a situation where a primary is on an old node (before
6.5) which does not have MUS while replicas on new nodes (6.5+).
However, this strategy is quite complex and can lead to bugs (for
example #40249) since we have to assign a correct value (not too low) to
MSU in all possible situations (before recovering from translog,
restoring history on promotion, and handing off relocation).

Fortunately, we don't have to deal with this BWC in 7.0+ since all nodes
in the cluster should have MSU. This change simplifies the
initialization of MSU by always assigning it a correct value in the
constructor of Engine regardless of whether it's a replica or primary.

Relates #33842
2019-04-30 15:14:52 -04:00
Igor Motov 10ab838106
Geo: Add GeoJson parser to libs/geo classes (#41575) (#41657)
Adds GeoJson parser for Geometry classes defined in libs/geo.

Relates #40908 and #29872
2019-04-29 19:43:31 -04:00
Alan Woodward a01f451ef7 Limit complexity of IntervalQueryBuilderTests#testRandomSource() (#41538)
IntervalsSources can throw IllegalArgumentExceptions if they would produce
too many disjunctions.  To mitigate against this when building random
sources, we limit the depth of the randomly generated source to four
nested sources

Fixes #41402
2019-04-29 13:31:19 +01:00
Dan Hermann b23709b178
Applies the same naming restrictions to repositories as to snapshots except that leading underscores and uppercase characters are permitted. (#41585)
Fixes #40817.
2019-04-29 07:31:01 -05:00
Armin Braun 6e51b6f96d
Add Repository Consistency Assertion to SnapshotResiliencyTests (#41631)
* Add Repository Consistency Assertion to SnapshotResiliencyTests (#40857)

* Add Repository Consistency Assertion to SnapshotResiliencyTests

* Add some quick validation on not leaving behind any dangling metadata or dangling indices to the snapshot resiliency tests
   * Added todo about expanding this assertion further

* Fix SnapshotResiliencyTest Repo Consistency Check (#41332)

* Fix SnapshotResiliencyTest Repo Consistency Check

* Due to the random creation of an empty `extra0` file by the Lucene mockFS we see broken tests because we use the existence of an index folder in assertions and the index deletion doesn't go through if there are extra files in an index folder
  * Fixed by removing the `extra0` file and resulting empty directory trees before asserting repo consistency
* Closes #41326

* Reenable SnapshotResiliency Test (#41437)

This was fixed in https://github.com/elastic/elasticsearch/pull/41332
but I forgot to reenable the test.

* fix compile on java8
2019-04-29 12:01:58 +02:00
Nhat Nguyen 615a0211f0 Recovery should not indefinitely retry on mapping error (#41099)
A stuck peer recovery in #40913 reveals that we indefinitely retry on
new cluster states if indexing translog operations hits a mapper
exception. We should not wait and retry if the mapping on the target is
as recent as the mapping that the primary used to index the replaying
operations.

Relates #40913
2019-04-27 10:55:08 -04:00
Michael Morello 75283294f5 Fix multi-node parsing in voting config exclusions REST API (#41588)
Fixes an issue where multiple nodes where not properly parsed in the voting config exclusions REST API.

Closes #41587
2019-04-27 12:20:03 +02:00
Nick Knize 113b24be4b Refactor GeoHashUtils (#40869)
This commit refactors GeoHashUtils class into a new Geohash utility class located in the ES geo library. The intent is to not only better control what geo methods are whitelisted for painless scripting but to clean up the geo utility API in general.
2019-04-26 10:06:36 -05:00
Armin Braun aad33121d8
Async Snapshot Repository Deletes (#40144) (#41571)
Motivated by slow snapshot deletes reported in e.g. #39656 and the fact that these likely are a contributing factor to repositories accumulating stale files over time when deletes fail to finish in time and are interrupted before they can complete.

* Makes snapshot deletion async and parallelizes some steps of the delete process that can be safely run concurrently via the snapshot thread poll
   * I did not take the biggest potential speedup step here and parallelize the shard file deletion because that's probably better handled by moving to bulk deletes where possible (and can still be parallelized via the snapshot pool where it isn't). Also, I wanted to keep the size of the PR manageable.
* See https://github.com/elastic/elasticsearch/pull/39656#issuecomment-470492106
* Also, as a side effect this gives the `SnapshotResiliencyTests` a little more coverage for master failover scenarios (since parallel access to a blob store repository during deletes is now possible since a delete isn't a single task anymore).
* By adding a `ThreadPool` reference to the repository this also lays the groundwork to parallelizing shard snapshot uploads to improve the situation reported in #39657
2019-04-26 15:36:09 +02:00
Armin Braun 7824f60a34
Simplify Snapshot Resiliency Test (#40930) (#41565)
* Thanks to #39793 dynamic mapping updates don't contain blocking operations anymore so we don't have to manually put the mapping in this test and can keep it a little simpler
2019-04-26 10:59:09 +02:00
Christoph Büscher 078936b8f5 Remove search analyzers from DocumentFieldMappers (#41484)
These references seem to be unused except for tests and should be removed to
keep the places we store analyzers limited.
2019-04-26 09:48:48 +02:00
Armin Braun 6a24fd3f26
Add Restore Operation to SnapshotResiliencyTests (#40634) (#41546)
* Add Restore Operation to SnapshotResiliencyTests

* Expand the successful snapshot test case to also include restoring the snapshop
  * Add indexing of documents as well to be able to meaningfully verify the restore
* This is part of the larger effort to test eventually consistent blob stores in #39504
2019-04-26 09:04:34 +02:00
Christoph Büscher 52495843cc [Docs] Fix common word repetitions (#39703) 2019-04-25 20:47:47 +02:00
Armin Braun 23b3741618
Remove Exists Check from S3 Repository Deletes (#40931) (#41534)
* The check doesn't add much if anything practically, since the S3 repository is eventually consistent and we only log the non-existence of a blob anyway
  * We don't do the check on writes for this very reason and documented it as such
  * Removing the check saves one API call per single delete speeding up the deletion process and lowering costs
2019-04-25 18:25:03 +02:00
Jim Ferenczi 6184efaff6
Handle unmapped fields in _field_caps API (#34071) (#41426)
Today the `_field_caps` API returns the list of indices where a field
is present only if this field has different types within the requested indices.
However if the request is an index pattern (or an alias, or both...) there
is no way to infer the indices if the response contains only fields that have
the same type in all indices. This commit changes the response to always return
the list of indices in the response. It also adds a way to retrieve unmapped field
in a specific section per field called `unmapped`. This section is created for each field
that is present in some indices but not all if the parameter `include_unmapped` is set to
true in the request (defaults to false).
2019-04-25 18:13:48 +02:00
Armin Braun 40aef2b8aa
Introduce Delegating ActionListener Wrappers (#40129) (#41527)
* Introduce Delegating ActionListener Wrappers
* Dry up use cases of ActionListener that simply pass through the response or exception to another listener
2019-04-25 16:05:04 +02:00
Ignacio Vera d119abdf96
Improve accuracy for Geo Centroid Aggregation (#41514)
keeps the partial results as doubles and uses Kahan summation to help reduce floating point errors.
2019-04-25 15:25:48 +02:00
Armin Braun cd830b53e2
Name Snapshot Data Blobs by UUID (#40652) (#41523)
* Name Snapshot Data Blobs by UUID

* There is no functional reason why we need incremental naming for these files but
  * As explained in #38941 it is a possible source of corrupting the repository
  * It wastes API calls for the list operation
  * Is just needless complication
* Since we store the exact names of the data blobs in all the metadata anyway, we can make this change without any BwC considerations
  * Even on the worst case scenario of a downgrade the functionality would continue working since the incremental names wouldn't conflict with the uuids and the number parsing for finding the next incremental name suppresses the exception when encountring a non-numeric value after the double underscore prefix
2019-04-25 13:18:03 +02:00
Luca Cavanna 8a0e5f7b87
Deprecate support for first line empty in msearch API (#41442)
In order to support empty action metadata in the first msearch item,
we need to remove support for prepending msearch request body with an
empty line, which prevents us from parsing the empty line as action
metadata for the first search item.

Relates to #41011
2019-04-25 12:45:18 +02:00
Przemyslaw Gomulka 906f88029b
Remove the test which is testing java and joda api backport(#41493) #41518
The test is testing the java time API and fails in case it hits daylight saving time changes.
Java time has the right implementation and we don't need to test this.
more details on how the test was affected by the DST change on this comment
closes #39617
backport(#41493)
2019-04-25 12:21:01 +02:00
Armin Braun 7c819fd2aa
Fix BulkRejectionIT (#41446) (#41500)
* Due to #40866 one of the two parallel bulk requests can randomly be
rejected outright when the write queue is full already, we can catch
this situation and ignore it since we can still have the rejection for
the dynamic mapping udate for the other reuqest and it's somewhat rare
to run into this anyway
* Closes #41363
2019-04-24 20:46:21 +02:00
Zachary Tong ec5dd0594f Disallow null/empty or duplicate composite sources (#41359)
Adds some validation to prevent duplicate source names from being
used in the composite agg.

Also refactored to use a ConstructingObjectParser and removed the
private ctor and setter for sources, making it mandatory.
2019-04-24 13:23:31 -04:00
Armin Braun 1db9166ea0
Fix Broken Index Shard Snapshot File Preventing Snapshot Creation (#41310) (#41473)
* The problem here is that if we run into a corrupted index-N file, instead of generating a new index-(N+1) file, we instead set the newest index generation to -1 and thus tried to create `index-0`
   * If `index-0` is corrupt, this prevents us from ever creating a new snapshot using the broken shard, because we are unable to create `index-0` since it already exists
   * Fixed by still using the index generation for naming the next index file, even if it was a broken index file
* Added test that makes sure restoring as well as snapshotting on top of the broken shard index file work as expected
* closes #41304
2019-04-24 18:39:17 +02:00
Armin Braun 381b8e2ece
Fix BulkProcessor Retry ITs (#41338) (#41472)
* The test fails for the retry backoff enabled case because the retry handler in the bulk processor hasn't been adjusted to account for #40866 which now might lead to an outright rejection of the request instead of its items individually
   * Fixed by adding retry functionality to the top level request as well
* Also fixed the duplicate test for the HLRC that wasn't handling the non-backoff case yet the same way the non-client IT did
* closes #41324
2019-04-24 13:46:32 +02:00
Jason Tedor 65af47eb31
Introduce aliases version (#41397)
This commit introduces aliases versions to index metadata. This will be
useful in CCR when we replicate aliases.
2019-04-23 12:19:11 -04:00
David Roberts 7e2aec022d [TEST] Mute BulkRejectionIT.testBulkRejectionAfterDynamicMappingUpdate
Due to https://github.com/elastic/elasticsearch/issues/41363
2019-04-23 15:58:38 +01:00
David Roberts d8a2970fa4 [TEST] Mute RemoteClusterServiceTests.testCollectNodes
Due to https://github.com/elastic/elasticsearch/issues/41067
2019-04-23 15:13:01 +01:00
David Turner 0bb15d3dac Allow ops to be blocked after primary promotion (#41360)
Today we assert that there are no operations in flight in this test. However we
will sometimes be in a situation where the operations are blocked, and we
distinguish these cases since #41271 causing the assertion to fail. This commit
addresses this by allowing operations to be blocked sometimes after a primary
promotion.

Fixes #41333.
2019-04-19 07:48:43 +01:00
Jim Ferenczi 8f73e1e883 Fix unmapped field handling in the composite aggregation (#41280)
The `composite` aggregation maps unknown fields as numerics, this means that
any `after` value that is set on a query with an unmapped field on some indices
will fail if the provided value is not numeric. This commit changes the default
value source to use keyword instead in order to be able to parse any type of after
values.
2019-04-18 23:08:13 +02:00
Jim Ferenczi 754037b71e Unified highlighter should ignore terms that targets the _id field (#41275)
The `_id` field uses a binary encoding to index terms that is not compatible with
the utf8 automaton that the unified highlighter creates to reanalyze the input.
For these reason this commit ignores terms that target the `_id` field when
`require_field_match` is set to false.

Closes #37525
2019-04-18 22:31:23 +02:00
Jim Ferenczi 068f8ba223 more_like_this query to throw an error if the like fields is not provided (#40632)
With the removal of the `_all` field the `mlt` query cannot infer a field name
to use to analyze the provided (un)like text if the `fields` parameter is not
explicitly set in the query and the `index.query.default_field` is not changed
in the index settings (by default it is set to `*`). For this reason the like text
is ignored and queries are only built from the provided document ids.
This change fixes this bug by throwing an error if the fields option is not set
and the `index.query.default_field` is equals to `*`. The error is thrown only
if like or unlike texts are provided in the query.
2019-04-18 22:30:22 +02:00
Simon Willnauer 11dc9fe249 Mark searcher as accessed in acquireSearcher (#41335)
This fixes an issue where every N seconds a slow search request is triggered
since the searcher access time is not set unless the shard is idle. This change
moves to a more pro-active approach setting the searcher as accessed all the time.
2019-04-18 19:14:50 +02:00
Adrien Grand a699cb76a5
Fix javadoc tag. (#41330)
s/returns/return/
2019-04-18 14:41:09 +02:00
Armin Braun 389a13b68e
Mute BulkProcessorRetryIT#testBulkRejectionLoadWithBackoff (#41325) (#41331)
* For #41324
2019-04-18 11:55:28 +02:00
Alpar Torok a4a4259cac Mute failing test
Tracking #41326
2019-04-18 09:26:20 +03:00
Armin Braun c77e10b16b
Handle Bulk Requests on Write Threadpool (#40866) (#41315)
* Bulk requests can be thousands of items large and take more than O(10ms) time to handle => we should not handle them on the transport threadpool to not block select loops
* relates #39128
* relates #39658
2019-04-18 07:10:23 +02:00
David Turner 946baf87d3 Assert TransportReplicationActions acquire permits (#41271)
Today we do not distinguish "no operations in flight" from "operations are
blocked", since both return `0` from `IndexShard#getActiveOperationsCount()`.
We therefore cannot assert that every `TransportReplicationAction` performs its
actions under permit(s). This commit fixes this by returning
`IndexShard#OPERATIONS_BLOCKED` if operations are blocked, allowing these two
cases to be distinguished.
2019-04-17 23:05:03 +02:00
Zachary Tong 7e62ff2823 [Rollup] Validate timezones based on rules not string comparision (#36237)
The date_histogram internally converts obsolete timezones (such as
"Canada/Mountain") into their modern equivalent ("America/Edmonton").
But rollup just stored the TZ as provided by the user.

When checking the TZ for query validation we used a string comparison,
which would fail due to the date_histo's upgrading behavior.

Instead, we should convert both to a TimeZone object and check if their
rules are compatible.
2019-04-17 13:46:44 -04:00
Christoph Büscher 4d964194db Fix error applying `ignore_malformed` to boolean values (#41261)
The `ignore_malformed` option currently works on numeric fields only when the
bad value isn't a string value but not if it is a boolean. In this case we get a
parsing error from the xContent parser which we need to catch in addition to the
field mapper.

Closes #11498
2019-04-17 18:44:57 +02:00
David Turner 2670ed2f8f Assert the stability of custom search preferences (#41150)
Today the `?preference=custom_string_value` search preference will only change
its choice of a shard copy if something changes the `IndexShardRoutingTable`
for that specific shard. Users can use this behaviour to route searches to a
consistent set of shard copies, which means they can reliably hit copies with
hot caches, and use the other copies only for redundancy in case of failure.
However we do not assert this property anywhere, so we might break it in
future.

This commit adds a test that shows that searches are routed consistently even
if other indices are created/rebalanced/deleted.

Relates https://discuss.elastic.co/t/176598, #41115, #26791
2019-04-17 17:47:44 +02:00
Nhat Nguyen 2ee87c99d9 Fix bwc version of sanity check of read only engine
Relates #41041
2019-04-17 10:25:47 -04:00
Nhat Nguyen aa0c957a4a Do not trim unsafe commits when open readonly engine (#41041)
Today we always trim unsafe commits (whose max_seq_no >= global
checkpoint) before starting a read-write or read-only engine. This is
mandatory for read-write engines because they must start with the safe
commit. This is also fine for read-only engines since most of the cases
we should have exactly one commit after closing an index (trimming is a
noop). However, this is dangerous for following indices which might have
more than one commits when they are being closed.

With this change, we move the trimming logic to the ctor of InternalEngine
so we won't trim anything if we are going to open a read-only engine.
2019-04-17 10:16:12 -04:00
Adrien Grand f7e590ce0d
ProfileScorer should propagate `setMinCompetitiveScore`. (#40958) (#41302)
Currently enabling profiling disables top-hits optimizations, which is
unfortunate: it would be nice to be able to notice the difference in method
counts and timings depending on whether total hit counts are requested.
2019-04-17 16:11:14 +02:00
Adrien Grand 9fd5237fd4
Clean up Node#close. (#39317) (#41301)
`Node#close` is pretty hard to rely on today:
 - it might swallow exceptions
 - it waits for 10 seconds for threads to terminate but doesn't signal anything
   if threads are still not terminated after 10 seconds

This commit makes `IOException`s propagated and splits `Node#close` into
`Node#close` and `Node#awaitClose` so that the decision what to do if a node
takes too long to close can be done on top of `Node#close`.

It also adds synchronization to lifecycle transitions to make them atomic. I
don't think it is a source of problems today, but it makes things easier to
reason about.
2019-04-17 16:10:53 +02:00
Jason Tedor 6566979c18
Always check for archiving broken index settings (#41209)
Today we check if an index has broken settings when checking if an index
needs to be upgraded. However, it can be the case that an index setting
became broken even if an index is already upgraded to the current
version if the user removed a plugin (or downgraded from the default
distribution to the non-default distribution) while on the same version
of Elasticsearch. In this case, some registered settings would go
missing and the index would now be broken. Yet, we miss this check and
instead of archiving the settings, the index becomes unassigned due to
the missing settings. This commit addresses this by checking for broken
settings whether or not the index is upgraded.
2019-04-17 07:00:23 -04:00
Christoph Büscher badb7a22e0 Some cleanups in NoisyChannelSpellChecker (#40949)
One of the two #getCorrections methods is only used in tests, so we can move
it and any of the required helper methods to that test. Also reducing the
visibility of several methods to package private since the class isn't used
elsewhere outside the package.
2019-04-17 10:22:12 +02:00
David Turner bfa06d963e Do not create missing directories in readonly repo (#41249)
Today we erroneously look for a node setting called `readonly` when deciding
whether or not to create a missing directory in a filesystem repository. This
change fixes this by using the repository setting instead.

Closes #41009
Relates #26909
2019-04-17 09:43:14 +02:00
Yogesh Gaikwad 6a552c05fe
Use alias name from rollover request to query indices stats (#40774) (#41284)
In `TransportRolloverAction` before doing rollover we resolve
source index name (write index) from the alias in the rollover request.
Before evaluating the conditions and executing rollover action, we
retrieve stats, but to do so we used the source index name
resolved from the alias instead of alias from the index.
This fails when the user is assigned a role with index privilege on the
alias instead of the concrete index. This commit fixes this by using
the alias from the request.
After this change, verified that when we retrieve all the stats (including write + read indexes)
we are considering only source index.

Closes #40771
2019-04-17 14:15:05 +10:00
Jim Ferenczi 043c1f5d42 Unified highlighter should respect no_match_size with number_of_fragments set to 0 (#41069)
The unified highlighter returns the first sentence of the text when number_of_fragments
is set to 0 (full highlighting). This is a legacy of the removed postings highlighter
that was based on sentence break only. This commit changes this behavior in order
to respect the provided no_match_size value when number_of_fragments is set to 0.
This means that the behavior will be consistent for any value of the number_of_fragments option.

Closes #41066
2019-04-16 19:25:25 +02:00
Armin Braun c4e84e2b34
Add Bulk Delete Api to BlobStore (#40322) (#41253)
* Adds Bulk delete API to blob container
* Implement bulk delete API for S3
* Adjust S3Fixture to accept both path styles for bulk deletes since the S3 SDK uses both during our ITs
* Closes #40250
2019-04-16 17:19:05 +02:00
Jim Ferenczi c22a2cea12 BlendedTermQuery should ignore fields that don't exists in the index (#41125)
Today the blended term query detects if a term exists in a field by looking at the term statistics in the index.
However the value to indicate that a term has no occurence in a field have changed in Lucene. A non-existing term now returns
a doc and total term frequency of 0. Because of this disrepancy the blended term query picks 0 as the minimum frequency for a term
even if other fields have documents for this terms. This confuses the term queries that the blending creates since some of them
contain a custom state that indicates a frequency of 0 even though the term has some occurence in the field. For these terms an exception
is thrown because the term query always checks that the term state's frequency is greater than 0 if there are documents associate to it.
This change fixes this bug by ignoring terms with a doc freq of 0 when the blended term query picks the minimum term frequency among the
requested fields.

Closes #41118
2019-04-16 16:25:42 +02:00
David Turner 8577bbd73b Inline TransportReplAct#createReplicatedOperation (#41197)
`TransportReplicationAction.AsyncPrimaryAction#createReplicatedOperation`
exists so it can be overridden in tests. This commit re-works these tests to
use a real `ReplicationOperation` and inlines the now-unnecessary method.

Relates #40706.
2019-04-16 13:36:29 +01:00
David Turner 10e58210a0
Validate cluster UUID when joining Zen1 cluster (#41063)
Today we fail to join a Zen2 cluster if the cluster UUID does not match our
own, but we do not perform the same validation when joining a Zen1 cluster.
This means that a Zen2 node will pass join validation and be added to a Zen1
cluster but will reject all cluster states from the master.

Relates #37775
2019-04-16 12:49:47 +01:00
Nhat Nguyen 8ee84f2268 Correct flush parameters in engine test
Since #40213, we forbid a combination of flush parameters: force=true
and wait_if_ongoing=false.

Closes #41236
2019-04-16 05:04:31 -04:00
Christoph Büscher f8161ffa88 Fix some `range` query edge cases (#41160)
Currently we throw an error when a range querys minimum value exceeds the
maximum value due to the fact that they are neighbouring values and both upper
and lower value are excluded from the interval.

Since this is a condition that the user usually doesn't specify conciously (at
least in the case of float and double values its difficult to see which values
are adjacent) we should ignore those "wrong" intervals and create a
MatchNoDocsQuery in those cases.

We should still throw errors with an actionable message if the user specifies
the query interval in a way that min value > max value. This PR adds those
checks and tests for those cases.

Closes #40937
2019-04-16 10:56:13 +02:00
Tim Brooks ad3b7abaa3
Deprecate old transport settings (#41229)
This is related to #36652. We intend to remove a number of old transport
settings in 8.0. This commit deprecates those settings for 7.x.
2019-04-15 21:43:09 -06:00
Tim Brooks 56c00eecbc
Remove string usages of old transport settings (#41207)
This is related to #36652. We intend to deprecate a number of transport
settings in 7.x and remove them in 8.0. This commit removes the string
usages of these settings.
2019-04-15 16:54:24 -06:00
Zachary Tong f19b052e03 Better error messages when pipelines reference incompatible aggs (#40068)
Pipelines require single-valued agg or a numeric to be returned.
If they don't get that, they throw an exception.  Unfortunately, this
exception text is very confusing to users because it usually arises
from pathing "through" multiple terms aggs.  The final target is a numeric,
but it's the intermediary aggs that cause the problem.

This commit adds the current agg name to the exception message
so the user knows which "level" is the issue.
2019-04-15 10:35:53 -04:00
Jim Ferenczi d30fec4914 Full text queries should not always ignore unmapped fields (#41062)
Full text queries ignore unmapped fields since https://github.com/elastic/elasticsearch/issues/41022
even if all fields in the query are unmapped.
This change makes sure that we ignore unmapped fields only if they are mixed
with mapped fields and returns a MatchNoDocsQuery otherwise.

Closes #41022
2019-04-15 12:16:50 +02:00
Christoph Büscher 2980a6c70f Clarify some ToXContent implementations behaviour (#41000)
This change adds either ToXContentObject or ToXContentFragment to classes
directly implementing ToXContent currently. This helps in reasoning about
whether those implementations output full xcontent object or just fragments.

Relates to #16347
2019-04-15 09:42:08 +02:00
Yogesh Gaikwad e7375368d6
Remove nested loop in IndicesStatsResponse (#40988) (#41138)
This commit removes nested loop in `getIndices`.
2019-04-13 04:36:29 +10:00
Ignacio Vera 8af930c468
Improve error message when polygons contains twice the same point in no-consecutive position (#41051) (#41133)
When a polygon contains a self-intersection due to have twice the same point in no-consecutive position, the polygon builder tries to split the polygon. During the split one of the polygons become invalid as it is not closed and an error is thrown which is not related to the real issue.

We detect this situation now and throw a more meaningful error.
2019-04-12 09:16:33 +02:00
Nhat Nguyen e9999dfa1d
Init global checkpoint after copy commit in peer recovery (#40823)
Today a new replica of a closed index does not have a safe commit
invariant when its engine is opened because we won't initialize the
global checkpoint on a recovering replica until the finalize step. With
this change, we can achieve that property by creating a new translog
with the global checkpoint from the primary at the end of phase 1.
2019-04-11 22:18:31 -04:00
Antonio Matarrese 79c7a57737 Use the breadth first collection mode for significant terms aggs. (#29042)
This helps avoid memory issues when computing deep sub-aggregations. Because it
should be rare to use sub-aggregations with significant terms, we opted to always
choose breadth first as opposed to exposing a `collect_mode` option.

Closes #28652.
2019-04-11 15:56:02 -07:00
Nhat Nguyen 0f496842fd Fix msu assertion in restore shard history test
Since #40249, we always reinitialize max_seq_no_of_updates to max_seq_no
when a promoting primary restores history regardless of whether it did
rollback previously or not.

Closes #40929
2019-04-11 18:44:13 -04:00
Ryan Ernst 5cdd87deb7 Remove settings members from Node (#40811)
This commit removes the settings member variable from Node.
This member made it confusing which settings should actually be looked
at. Now all settings are accessed through the final environment.
2019-04-11 13:59:54 -07:00
David Turner b522de975d Move primary term from replicas proxy to repl op (#41119)
A small refactoring that removes the primaryTerm field from ReplicasProxy and
instead passes it directly in to the methods that need it. Relates #40706.
2019-04-11 21:19:27 +01:00
Armin Braun 233df6b73b
Make Transport Shard Bulk Action Async (#39793) (#41112)
This is a dependency of #39504

Motivation:
By refactoring `TransportShardBulkAction#shardOperationOnPrimary` to async, we enable using `DeterministicTaskQueue` based tests to run indexing operations. This was previously impossible since we were blocking on the `write` thread until the `update` thread finished the mapping update.
With this change, the mapping update will trigger a new task in the `write` queue instead.
This change significantly enhances the amount of coverage we get from `SnapshotResiliencyTests` (and other potential future tests) when it comes to tracking down concurrency issues with distributed state machines.

The logical change is effectively all in `TransportShardBulkAction`, the rest of the changes is then simply mechanically moving the caller code and tests to being async and passing the `ActionListener` down.

Since the move to async would've added more parameters to the `private static` steps in this logic, I decided to inline and dry up (between delete and update) the logic as much as I could instead of passing the listener + wait-consumer down through all of them.
2019-04-11 16:01:52 +02:00
Jason Tedor 24446ceae0
Add packaging to cluster stats response (#41048)
This commit adds a packaging_types field to the cluster stats response
that outlines the build flavors and types present in a cluster.
2019-04-10 13:47:19 -04:00
Zachary Tong e611334b2b Add 7.0.1 version constant 2019-04-10 11:32:53 -04:00
Dimitrios Liappis 799541e068
Mute DateTimeUnitTests.testConversion (#40738)
Due to #39617

Backport of  #40086
2019-04-10 16:37:16 +03:00
Jim Ferenczi 4263a28039 Fix rewrite of inner queries in DisMaxQueryBuilder (#40956)
This commit implements missing rewrite for the DisMaxQueryBuilder.

Closes #40953
2019-04-10 11:38:16 +02:00
Jason Tedor 3aae98f922
Add debug logging for leases sync on recovery test
This commit adds some debug logging for a retention leases sync on
recovery test.
2019-04-09 22:59:22 -04:00
Julie Tibshirani d38214060e Mute ClusterDisruptionIT#testCannotJoinIfMasterLostDataFolder.
Tracked in #41047.
2019-04-09 17:36:21 -07:00
Julie Tibshirani a417905098 Mute RareClusterStateIT#testDelayedMappingPropagationOnPrimary as we await a fix.
Tracked in #41030.
2019-04-09 13:41:53 -07:00
Julie Tibshirani a0fc2461d7 Mute DedicatedClusterSnapshotRestoreIT#testSnapshotWithStuckNode as we await a fix. 2019-04-09 12:03:33 -07:00
Mark Vieira 1287c7d91f
[Backport] Replace usages RandomizedTestingTask with built-in Gradle Test (#40978) (#40993)
* Replace usages RandomizedTestingTask with built-in Gradle Test (#40978)

This commit replaces the existing RandomizedTestingTask and supporting code with Gradle's built-in JUnit support via the Test task type. Additionally, the previous workaround to disable all tasks named "test" and create new unit testing tasks named "unitTest" has been removed such that the "test" task now runs unit tests as per the normal Gradle Java plugin conventions.

(cherry picked from commit 323f312bbc829a63056a79ebe45adced5099f6e6)

* Fix forking JVM runner

* Don't bump shadow plugin version
2019-04-09 11:52:50 -07:00
Jason Tedor 321f93c4f9
Wait for all listeners in checkpoint listeners test
It could be that we try to shutdown the executor pool before all the
listeners have been invoked. It can happen that one was not invoked if
it timed out and was in the process of being notified that it timed out
on the executor. If we do this shutdown then, a listener will be met
with rejected execution exception. To address this, we first wait until
all listeners have been notified (or timed out) before proceeding with
shutting down the executor.

Relates #40970
2019-04-09 14:27:09 -04:00
Henning Andersen c5a77e5d8c Node repurpose tool docs (#40525)
Added documentation for node repurpose tool and included documentation on how to repurpose nodes safely. Adjusted order of tools in `elasticsearch-node` tool since the repurpose tool is most likely to be used.

Co-Authored-By: David Turner <david.turner@elastic.co>
2019-04-09 15:07:37 +02:00
David Turner 08ecdfe20e Short-circuit rebalancing when disabled (#40966)
Today if `cluster.routing.rebalance.enable: none` then rebalancing is disabled,
but we still execute `balanceByWeights()` and perform some rather expensive
calculations before discovering that we cannot rebalance any shards. In a large
cluster this can make cluster state updates occur rather slowly. With this
change we check earlier whether rebalancing is globally disabled and, if so,
avoid the rebalancing process entirely.

Relates #40942 which was reverted because of egregiously faulty tests.
2019-04-09 07:59:52 +01:00
Nhat Nguyen 69421612e5 Mute testRecoverMissingAnalyzer
Tracked at #40867
2019-04-08 22:47:20 -04:00
Nhat Nguyen 713e5c987b Adjust init map size of user data of index commit (#40965)
The number of user data attributes of an index commit has increased 
from 6 to 8, but we forgot to adjust. This change increases the initial 
size of that map to avoid resizing.
2019-04-08 22:47:20 -04:00
Christoph Büscher 335955b874 Some internal refactorings in AnalysisRegistry (#40609)
Reducing some methods scope and marking them as static where possible. Removing
"alias" support from AnalysisRegistry#produceAnalyze and changing that method to
return a NamedAnalyzer instead of having a side effect on the analyzer map passed in.
Also, CustomAnalyzerProvider doesn't seem to need the `environment` field.
2019-04-08 20:48:34 +02:00
David Turner 8eef92fafd Revert "Short-circuit rebalancing when disabled (#40942)"
This reverts commit f78e6ef73b.
2019-04-08 15:58:56 +01:00
David Turner f78e6ef73b Short-circuit rebalancing when disabled (#40942)
Today if `cluster.routing.rebalance.enable: none` then rebalancing is disabled,
but we still execute `balanceByWeights()` and perform some rather expensive
calculations before discovering that we cannot rebalance any shards. In a large
cluster this can make cluster state updates occur rather slowly. With this
change we check earlier whether rebalancing is globally disabled and, if so,
avoid the rebalancing process entirely.
2019-04-08 14:57:29 +01:00
Jim Ferenczi bc0fe7d64d Handle min_doc_freq in phrase suggester (#40840)
The phrase suggesters have an option to remove terms that have
a frequency lower than a provided min_doc_freq. However this value is
overwritten by the frequency of the original term in the popular mode.
This change ensures that we keep the maximum value between the provided
min_doc_value and the original term frequency as a threshold to select
candidates.

Fixes #16764
2019-04-08 12:23:54 +02:00
Jason Tedor 4163e59768
Mute failing IndexShard local history test
This test fails reliably with, so this commit mutes that test until a
fix is available.
2019-04-07 10:17:46 -04:00
Jason Tedor 6900399144
Be lenient when parsing build flavor and type on the wire (#40734)
Today we are strict when parsing build flavor and types off the
wire. This means that if a later version introduces a new build flavor
or type, an older version would not be able to parse what that new
version is sending. For a practical example of this, we recently added
the build type "docker", and this means that in a rolling upgrade
scenario older nodes would not be able to understand the build type that
the newer node is sending. This breaks clusters and is bad. We do not
normally think of adding a new enumeration value as being a
serialization breaking change, it is just not a lesson that we have
learned before. We should be lenient here though, so that we can add
future changes without running the risk of breaking ourselves
horribly. It is either that, or we have super-strict testing
infrastructure here yet still I fear the possibility of mistakes. This
commit changes the parsing of build flavor and build type so that we are
still strict at startup, yet we are lenient with values coming across
the wire. This will help avoid us breaking rolling upgrades, or clients
that are on an older version.
2019-04-06 17:24:16 -04:00
Jason Tedor e44e84ab42
Suppress lease background sync failures if stopping (#40902)
If the transport service is stopped, likely because we are shutting
down, and a retention lease background sync fires the logs will display
a warn message and stacktrace. Yet, this situaton is harmless and can
happen as a normal course of business when shutting down. This commit
suppresses the log messages in this case.
2019-04-06 10:18:52 -04:00
David Turner 2ff19bc1b7
Use Writeable for TransportReplAction derivatives (#40905)
Relates #34389, backport of #40894.
2019-04-05 19:10:10 +01:00
Colin Goodheart-Smithe 4452e8e10f Mutes GatewayIndexStateIT.testRecoverBrokenIndexMetadata 2019-04-05 10:53:52 -04:00
David Turner 922a70ce32 Remove unused import
Relates #40863
2019-04-05 09:21:34 +01:00
David Turner d8956d2601 Remove test-only customisation from TransReplAct (#40863)
The `getIndexShard()` and `sendReplicaRequest()` methods in
TransportReplicationAction are effectively only used to customise some
behaviour in tests. However there are other ways to do this that do not cause
such an obstacle to separating the TransportReplicationAction into its two
halves (see #40706).

This commit removes these customisation points and injects the test-only
behaviour using other techniques.
2019-04-05 08:54:41 +01:00
Martijn van Groningen 809a5f13a4
Make -try xlint warning disabled by default. (#40833)
Many gradle projects specifically use the -try exclude flag, because
there are many cases where auto-closeable resource ignore is never
referenced in body of corresponding try statement. Suppressing this
warning specifically in each case that it happens using
`@SuppressWarnings("try")` would be very verbose.

This change removes `-try` from any gradle project and adds it to the
build plugin. Also this change removes exclude flags from gradle projects
that is already specified in build plugin (for example -deprecation).

Relates to #40366
2019-04-05 08:02:26 +02:00
Nhat Nguyen 5a2eb07c0e Primary replica resync should not send ops without seqno (#40433)
Primary-replica resync in a mixed-cluster between 6.x and 5.6 can send
operations without sequence number to a replica which already processed
operations with sequence number. This leads to the failure of that
replica for we trip the sequence number assertion when writing resync
operations without sequence number to translog.
2019-04-04 21:54:31 -04:00
Colin Goodheart-Smithe 402f312c5e
Adds version 6.7.2 2019-04-04 16:35:39 +01:00
Nhat Nguyen 2756a3936b Reject illegal flush parameters (#40213)
This change rejects an illegal combination of flush parameters where
force is true, but wait_if_ongoing is false. This combination is trappy
and should be forbidden.

Closes #36342
2019-04-04 09:02:31 -04:00
Nhat Nguyen c4960ad736 Ensure flush happen before closing an index (#40184)
If there's an ongoing flush triggered by the translog flush threshold,
we may fail to execute a flush because waitIfOngoing is false by
default.

Relates to #36342
2019-04-04 09:02:31 -04:00
Nhat Nguyen e716b9ceee Ensure no scheduled refresh in testPendingRefreshWithIntervalChange
If a refresh, which is scheduled by the setting change, executes after
the index-2 operation and win the refresh race (i.e., maybeRefresh) with
the scheduledRefresh that we are going to check, then the latter will
return false.

Closes #39565
Relates #39462

PR #40387
2019-04-04 09:02:31 -04:00
Adrien Grand 670e76669c
Fix alias resolution runtime complexity. (#40263) (#40788)
A user reported that the same query that takes ~900ms when querying an index
pattern only takes ~50ms when only querying indices that have matches. The
query is a date range query and we confirmed that the `can_match` phase works
as expected. I was able to reproduce this issue locally with a single node: with
900 1-shard indices, a query to an index pattern that matches all indices runs
in ~90ms while a query to the only index that has matches runs in 0-1ms.

This ended up not being related to the `can_match` phase but to the cost of
resolving aliases when querying an index pattern that matches lots of indices.
In that case, we first resolve the index pattern to a list of concrete indices
and then for each concrete index, we check whether it was matched through an
alias, meaning we might have to apply alias filters. Unfortunately this second
per-index operation runs in linear time with the number of matched concrete
indices, which means that alias resolution runs in O(num_indices^2) overall.
So queries get exponentially slower as an index pattern matches more indices.

I reorganized alias resolution into a one-step operation that runs in linear
time with the number of matches indices, and then a per-index operation that
runs in linear time with the number of aliases of this index. This makes alias
resolution run is O(num_indices * num_aliases_per_index) overall instead. When
testing the scenario described above, the `took` went down from ~90ms to ~10ms.
It is still more than the 0-1ms latency that one gets when only querying the
single index that has data, but still much better than what we had before.

Closes #40248
2019-04-04 11:40:42 +02:00
Adrien Grand f5f5c3e429
Add unit test for MetaDataMappingService with typeless put mapping. (#40578) (#40720)
This is currently only tested via REST tests.

Closes #37450
2019-04-04 10:07:55 +02:00
Ryan Ernst a28d5f35d9 Fix geo points missing test (#40704)
This commit initializes the geo points for the missing doc values test.

fixes #40684
2019-04-03 16:48:09 -07:00
Mayya Sharipova a94e9500ac Correct bug in ScriptDocValues (#40488)
If a field `field_name` was missing in a document,
doc['field_name'].get(0) incorrectly retrieved
a value of the previously accessed document.
This happened because `get(int index)` function
was just accessing `values[index]` without
checking the number of values - `count`.

This PR fixes this.
2019-04-03 16:47:59 -07:00
Yannick Welsch 6ae7d593ea Avoid background sync on relocated primary (#40800)
There were some test failures caused by the background retention lease sync running on a relocated
primary. This commit fixes the situation that triggered the assertion and reactivates the failing test.

Closes #40731
2019-04-03 20:28:48 +02:00
Christoph Büscher 89389197b3 Help Eclipse infering lambda parameter types (#40747)
The Eclipse compiler (4.10, Photon) cannot build this test because it cannot
correctly infer the type arguments of the functions. Explicitely adding them
helps in this case.
2019-04-03 17:51:22 +02:00
Christoph Büscher 09ba3ec677 Small refactorings to analysis components (#40745)
This change adds the following internal refactorings:

* wraps input analyzers into an unmodifiable map in IndexAnalyzers ctor
* removes duplicated indexSetting in IndexAnalyzers
* removes references to IndexAnalyzers from DocumentMapperParser and TypeParser.ParserContext.
  It can always be retrieve it from MapperService directly in those cases
2019-04-03 14:22:16 +02:00
David Turner 1d2bc85586 Inline TransportReplAction#registerRequestHandlers (#40762)
It is important that resync actions are not rejected on the primary even if its
`write` threadpool is overloaded. Today we do this by exposing
`registerRequestHandlers` to subclasses and overriding it in
`TransportResyncReplicationAction`. This isn't ideal because it obscures the
difference between this action and other replication actions, and also might
allow subclasses to try and use some state before they are properly
initialised. This change replaces this override with a constructor parameter to
solve these issues.

Relates #40706
2019-04-03 12:12:26 +01:00
Jason Tedor df65e46d10
Deprecate versions of Java prior to Java 11 (#40756)
This commit deprecates versions of Java prior to Java 11. This commit
will cause a warning to be printed to standard error when any command
line tool is invoked, or when Elasticsearch is started. Additionally, we
log a deprecation message when Elasticsearch is started.
2019-04-03 06:39:40 -04:00
David Turner e64524c46f Remove some abstractions from `TransportReplicationAction` (#40706)
`TransportReplicationAction` is a rather complex beast, and some of its
concrete implementations do not need all of its features. More specifically, it
(a) chases a primary around the cluster until it manages to pin it down and
then (b) executes an action on that primary and all its replicas. There are
some actions that are coordinated by the primary itself, meaning that there is
no need for the chase-the-primary phases, and in the case of peer recovery
retention leases and primary/replica resync it is important to bypass these
first phases.

This commit is a step towards separating the `TransportReplicationAction` into
these two parts. It is a mostly mechanical sequence of steps to remove some
abstractions that are no longer in use.
2019-04-03 09:08:29 +01:00
Simon Willnauer dd624c31b0 Don't mark shard as refreshPending on stats fetching (#40458)
Completion and DocStats are pulled from internal readers
instead of external since #33835 and #33847 which doesn't require
us to refresh after a stats call since refreshes will happen internally
anyhow and that will cause updated stats on ongoing indexing.
2019-04-02 16:15:30 +02:00
David Turner 6f00952abd Use TAR instead of DOCKER build type before 6.7.0 (#40723)
In 6.7.0 (#39378) we added a build type of DOCKER for the docker images, but
unfortunately earlier versions do not understand this and will reject any
transport messages that mention this build type.

This commit fixes this by reporting TAR instead of DOCKER when talking to older
nodes.

Relates (but does not fix) #40511
Relates #39378
2019-04-02 13:17:50 +01:00
Alexander Reelsen c644fbfc6e Allow single digit milliseconds in strict date parsing (#40676)
In order to remain compatible with the existing joda based
implementation the parsing of milliseconds should support parsing single
digits instead of relying on three, even with strict formats.

This adds a few tests to duel against the existing joda based
implementation in order to ensure the parsing behaviour is the same.

Closes #40403
2019-04-02 10:27:50 +02:00
Andrey Ershov 287e334ef3 Do not perform cleanup if Manifest write fails with dirty exception (#40519)
Currently, if Manifest write is unsuccessful (i.e. WriteStateException
is thrown) we perform cleanup of newly created metadata files.
However, this is wrong.
Consider the following sequence (caught by CI here
https://github.com/elastic/elasticsearch/issues/39077):

- cluster global data is written **successful**
- the associated manifest write **fails** (during the fsync, ie files
have been written)
- deleting (revert) the manifest files, **fails**, metadata is
therefore persisted
- deleting (revert) the cluster global data is **successful**

In this case, when trying to load metadata (after node restart
because of dirty WriteStateException),  the following exception will
happen
```
java.io.IOException: failed to find global metadata [generation: 0]
```
because the manifest file is referencing missing global metadata file.

This commit checks if thrown WriteStateException is dirty and if its
we don't perform any cleanup, because new Manifest file might be
created, but its deletion has failed.
In the future, we might add more fine-grained check - perform the
clean up if WriteStateException is dirty, but Manifest deletion is
successful.

Closes https://github.com/elastic/elasticsearch/issues/39077

(cherry picked from commit 1fac56916bb3c4f3333c639e59188dbe743e385b)
2019-04-01 12:52:32 +03:00
Jim Ferenczi 7cc79123df Fix merging of text field mapper (#40627)
On mapping updates the `text` field mapper does not update
the field types for the underlying prefix and phrase fields.
In practice this shouldn't be considered as a bug but we have
an assert in the code that check that field types in the mapper service
are identical to the ones present in field mappers.
2019-04-01 08:41:42 +02:00
Jason Tedor cebe509460
Fix bug in detecting use of bundled JDK on macOS
This commit fixes a bug in detecting the use of the bundled JDK on
macOS. This bug arose because the path of Java home is different on
macOS.
2019-03-31 19:43:17 -04:00
Henning Andersen 92d07e9377 Geo Point parse error fix (#40447)
When geo point parsing threw a parse exception, it did not consume
remaining tokens from the parser. This in turn meant that
indexing documents with malformed geo points into mappings with
ignore_malformed=true would fail in some cases, since DocumentParser
expects geo_point parsing to end on the END_OBJECT token.

Related to #17617
2019-03-29 17:39:12 +01:00
Luca Cavanna a0b02ce6ef Move top-level pipeline aggs out of QuerySearchResult (#40319)
As part of #40177 we have added top-level pipeline aggs to
`InternalAggregations`. Given that `QuerySearchResult` holds an
`InternalAggregations` instance, there is no need to keep on setting
top-level pipeline aggs separately. Top-level pipeline aggs can then
always be transported through `InternalAggregations`. Such change is
made in a backwards compatible manner.
2019-03-29 17:01:14 +01:00
Oghenovo Usiwoma 444b4c4136 Improve error message for absence of indices (#39789)
"no indices exist" has been added to the error message for absence of indices
2019-03-29 17:01:14 +01:00
Luca Cavanna 48b0deef4f Remove throws IOException from PipelineAggregationBuilder#create (#40222)
IOException are never thrown in any of the existing pipeline aggregation
builders. Removing the throws IOException from the create method allows
to remove it also from a couple of other methods which ends up simplifying
 AggregationPhase (one less catch).
2019-03-29 17:01:14 +01:00
Jason Tedor 585f38787c
Add usage indicators for the bundled JDK (#40616)
This commit adds indications whether or not a distribution is from the
bundled JDK, and whether or not we are using the bundled JDK.
2019-03-29 08:25:32 -04:00
Martijn van Groningen be31800154
Update ingest jdocs that a null return value will drop the current document. (#40359) 2019-03-29 09:46:54 +01:00
Jason Tedor 7255562afd
Add start and stop time to cat recovery API (#40378)
The cat recovery API is incredibly useful. Yet it is missing the start
and stop time as an option from the output. This commit adds these as
options to the cat recovery API. We elect to make these not visible by
default to avoid breaking the output that users might rely on.
2019-03-28 16:23:37 -04:00
Mayya Sharipova 24755209b4 Add randomScore function in script_score query (#40186)
To make script_score query to have the same features
as function_score query, we need to add randomScore
function.

This function produces different
random scores on different index shards.
It is also able to produce random scores
based on the internal Lucene Document Ids.
2019-03-28 13:23:47 -04:00
David Turner 1a3916a8de Optimise rejection of out-of-range `long` values (#40325)
Today if you try and insert a very large number like `1e9999999` into a long
field we first construct this number as a `BigDecimal`, convert this to a
`BigInteger` and then reject it because it is out of range. Unfortunately
making such a large `BigInteger` is rather expensive.

We can avoid this expense by performing a (weaker) range check on the
`BigDecimal` representation of incoming `long`s too.

Relates #26137
Closes #40323
2019-03-28 12:27:34 +00:00
David Turner 073b13f5b0 Add docs for cluster.remote.*.proxy setting (#40281)
In #33062 we introduced the `cluster.remote.*.proxy` setting for proxied
connections to remote clusters, but left it deliberately undocumented since it
needed followup work so that it could work with SNI. However, since #32517 is
now closed we can add this documentation and remove the comment about its lack
of documentation.
2019-03-28 12:11:24 +00:00
jimczi 8775e37d03 Fix SearchResponseMerger#testMergeSearchHits
This commit fixes an edge case in tests where search hits are empty
after the merge but some shards returned hits. This can happen if
the total number of merged hits is less than the provided `from`.

Closes #40553
2019-03-28 09:57:21 +01:00
Adrien Grand 65a35c985c
Remove type from VersionConflictEngineException. (#37490) (#40514)
It initially mentioned the type in the exception because the type used to be
required to uniquely identify a document. This is not necessary anymore given
that indices have at most one type.
2019-03-28 09:32:09 +01:00
Adrien Grand 2326a3dccb
Remove String interning from `o.e.index.Index`. (#40350) (#40517)
`Index` interns its name and uuid. My guess is that the main goal is to avoid
having duplicate strings in the representation of the cluster state. However
I doubt it helps much given that we have many other objects in the cluster state
that we don't try to reuse, and interning has some cost. When looking into
 #40263 my profiler pointed to string interning because of the `Index` object
that is created in `QueryShardContext` as one of the bottlenecks of the
`can_match` phase.
2019-03-28 09:31:42 +01:00
Andy Bristol 23395a9b9f
search as you type fieldmapper (#35600)
Adds the search_as_you_type field type that acts like a text field optimized
for as-you-type search completion. It creates a couple subfields that analyze
the indexed terms as shingles, against which full terms are queried, and a
prefix subfield that analyze terms as the largest shingle size used and
edge-ngrams, against which partial terms are queried

Adds a match_bool_prefix query type that creates a boolean clause of a term
query for each term except the last, for which a boolean clause with a prefix
query is created.

The match_bool_prefix query is the recommended way of querying a search as you
type field, which will boil down to term queries for each shingle of the input
text on the appropriate shingle field, and the final (possibly partial) term
as a term query on the prefix field. This field type also supports phrase and
phrase prefix queries however
2019-03-27 13:29:13 -07:00
Benjamin Trent 6563dc7ed9
Muting test for #40553 (#40555) 2019-03-27 14:52:12 -05:00
Tim Brooks ab44f5fd5d
Add InboundHandler for inbound message handling (#40430)
This commit adds an InboundHandler to handle inbound message processing.
With this commit, this code is moved out of the TcpTransport.
Additionally, finer grained unit tests are added to ensure that the
inbound processing works as expected
2019-03-27 12:33:26 -06:00
Yannick Welsch 64b31f44af No mapper service and index caches for replicated closed indices (#40423)
Replicated closed indices can't be indexed into or searched, and therefore don't need a shard with
full indexing and search capabilities allocated. We can save on a lot of heap memory for those
indices by not allocating a mapper service and caching infrastructure (which preallocates a constant
amount per instance). Before this change, a 1GB ES instance could host 250 replicated closed
metricbeat indices (each index with one shard). After this change, the same instance can host 7300
replicated closed metricbeat instances (not that this would be a recommended configuration). Most
of the remaining memory is in the cluster state and the IndexSettings object.
2019-03-27 19:04:24 +01:00
Yannick Welsch 8f7c5732f1 Use default discovery implementation for single-node discovery (#40036)
Switches "discovery.type: single-node" from using a separate implementation for single-node discovery to using the existing standard discovery implementation, with two small adaptions:

-  auto-bootstrapping, but requiring initial_master_nodes not to be set.
- not actively pinging other nodes using the Peerfinder
- not allowing other nodes to join its single-node cluster (if they have e.g. been set up using regular discovery and connect to the single-disco node).
2019-03-27 19:04:24 +01:00
Tim Brooks 3860ddd1a4
Move outbound message handling to OutboundHandler (#40336)
Currently there are some components of message serializer and sending
that still occur in TcpTransport. This commit makes it possible to
send a message without the TcpTransport by moving all of the remaining
application logic to the OutboundHandler. Additionally, it adds unit
tests to ensure that this logic works as expected.
2019-03-27 11:47:36 -06:00
David Turner 707d40ce06 Stabilise testStaleMasterNotHijackingMajority (#40253)
This test inadvertently asserts that the election occurs after a master failure
is clean. However, messy elections are a fact of life so we should not fail on
a messy election.

This change moves this test away from an `AbstractDisruptionTestCase` since it
does not need the fault detector to be so enthusiastic, and weakens the
assertions to merely say that we ignore states published by the old master
without saying anything about the cleanliness of the election.

Closes #36556
2019-03-27 16:00:14 +00:00
Tim Brooks 760cfffe4b
Move TransportMessageListener to TransportService (#40474)
Currently the TransportMessageListener is applied and used in the
Transport class. However, local requests and responses never make it to
this class. This PR moves the listener add/remove methods to the
TransportService. After this change the Transport can only have one
listener set with it. This one listener is the TransportService, which
will then propogate the events to the external listeners.

Additionally this commit back ports #40237

Remove Tracer from MockTransportService

Currently the TransportMessageListener is applied and used in the
Transport class. However, local requests and responses never make it to
this class. This PR moves the listener add/remove methods to the
TransportService. After this change the Transport can only have one
listener set with it. This one listener is the TransportService, which
will then propogate the events to the external listeners.
2019-03-27 09:24:20 -06:00
Przemyslaw Gomulka 65f01277ed
Parse composite patterns using ClassicFormat.parseObject backport(#40100) (#40501)
Java-time fails parsing composite patterns when first pattern matches only the prefix of the input. It expects pattern in longest to shortest order. Because of this constructing just one DateTimeFormatter with appendOptional is not sufficient. Parsers have to be iterated and if the parsing fails, the next one in order should be used. In order to not degrade performance parsing should not be throw exceptions on failure. Format.parseObject was used as it only returns null when parsing failed and allows to check if full input was read.
 
closes #39916
backport #40100
2019-03-27 13:51:44 +01:00
Yannick Welsch b4b17e16e0 Remove TransportSingleItemBulkWriteAction as replication action (#40424)
The implementation of TransportIndexAction and TransportDeleteAction as
TransportReplicationAction existed for interoperability with older 5.x nodes, as these older nodes
coordinated single index / deletes as replication requests. This BWC layer is no longer needed in 7.x,
where these single actions are now mapped to bulk requests. Completely removing the deprecated
transport actions is not possible yet if we want to keep BWC with a 6.x transport client. The best
way here is to wait for the transport client to go away and then just remove the actions.
2019-03-27 13:16:58 +01:00
Jim Ferenczi fe05a4d511 Fix random failures in SearchResponseMerger#testMergeSearchHits (#40223)
This commit fixes the expectation in the test when the search hits are empty.

Closes #40214
2019-03-27 11:17:10 +01:00
alex101101 fb8ad0cf30 Add a soft limit to the field name length (#40309)
Adds an optional limit to the length of field names, throws an IllegalArgumentException if the limit is breached. 
Closes #33651
2019-03-26 17:58:32 +01:00
Daniel Mitterdorfer f2b5960f90 Add version 6.7.1 2019-03-26 17:38:14 +01:00
Yannick Welsch bf7b167bba Remove timeout task after completing cluster state publication (#40411)
Each cluster state publication schedules a cancellation task with the provided publication timeout
(30s by default). This scheduled cancellation keeps a reference to the publication, and therefore the
full cluster state that was published. In case of frequently updating a large cluster state, this results
in a large number of cancellation tasks keeping references to all previously published cluster states.
2019-03-26 17:13:57 +01:00
Henning Andersen bf444b9f02 Store Pending Deletions Fix (#40345)
FilterDirectory.getPendingDeletions does not delegate, fixed
temporarily by overriding in StoreDirectory.

This in turn caused duplicate file name use after a trimUnsafeCommits
had been done, since a new IndexWriter would not consider the pending
deletes in IndexFileDeleter. This should only happen on windows (AFAIK).

Reenabled doing index updates for all tests using
IndexShardTests.indexOnReplicaWithGaps (which could fail due to above
when using mocked WindowsFS).

Added getPendingDeletions delegation to all elasticsearch
FilterDirectory subclasses that were not trivial test-only overrides to
minimize the risk of hitting this issue in another case.
2019-03-26 15:30:44 +01:00
Alan Woodward 12634850d6 IntervalQueryBuilderTests#testNonIndexedFields test fix (#40418)
This test checks that interval queries constructed against a field with no indexed
positions will throw exceptions. It uses a randomly-build IntervalsSourceProvider
against a fixed set of fields; however, the random source builder can occasionally
provide a source with a fixed field, meaning that even if the top-level query asks
for a set of intervals over a non-indexed field, the source will delegate to another
field, and no exception will be thrown.

This commit changes the test to always use a simple Match provider.

Fixes #40436
2019-03-26 08:33:42 +00:00
Nhat Nguyen 495dc11c9c Mute testPendingRefreshWithIntervalChange
Tracked at #39565
2019-03-25 11:47:08 -04:00
Armin Braun 3968d46a17
Remove Redundant Request Wrappers from RepositoryService (#40192) (#40404) 2019-03-25 16:36:02 +01:00
Armin Braun dc5ff0fffc
Log Warning on Failed Blob Deletes in BlobStoreRepository (#40188) (#40340)
* Log Warning on Failed Blob Deletes in BlobStoreRepository
* We should not just debug log these spots, they all can and will lead to leaked files when snapshot deletion fails
2019-03-25 08:52:09 +01:00
Nhat Nguyen b9f96a8e1f
Expose external refreshes through the stats API (#38643)
Right now, the stats API only provides refresh metrics regarding
internal refreshes. This isn't very useful and somewhat misleading for
cluster administrators since the internal refreshes are not indicative
of documents being available for search.

In this PR I added a new metric for collecting external refreshes as
they occur and exposing them through the stats API. Now, calling an
endpoint for stats will yield external refresh metrics as well.

Relates #36712
2019-03-24 22:21:00 -04:00
Armin Braun 13d76239a0
Use Netty ByteBuf Bulk Operations for Faster Deserialization (#40158) (#40339)
* Use bulk methods to read numbers faster from byte buffers
2019-03-24 19:08:51 +01:00
Jason Tedor 10bbb082a4
Only run retention lease actions on active primary (#40386)
In some cases, a request to perform a retention lease action can arrive
on a primary shard before it is active. In this case, the primary shard
would not yet be in primary mode, tripping an assertion in the
replication tracker. Instead, we should not attempt to perform such
actions on an initializing shard. This commit addresses this by not
returning the primary shard in the single shard iterator if the primary
shard is not yet active.
2019-03-23 09:39:39 -04:00
Zachary Tong 78f737dad3
Map value field to double in MovavgIT (#40230)
We were accidentally not mapping the index, which meant dynamic mapping
was choosing floats for the values.  This led to enough loss of precision
for the aggregated values to differ slightly from the test doubles,
which accumulated into large differences in the holt output.

This test fix adds an explicit mapping.
2019-03-21 14:03:14 -04:00
Jason Tedor 1e6941b138
Reduce retention lease sync intervals (#40302)
This commit adjusts the frequency with which CCR renews retention leases
and with which primaries sync retention leases to replicas. This helps
Lucene reclaim soft-deleted documents more aggressively, which we have
found in some use-cases can help improve performance, and either way
will help keep disk space under more control.
2019-03-21 07:37:44 -04:00
Alan Woodward 83d2870308 Add `use_field` option to intervals query (#40157)
This is the equivalent of the `field_masking_span` query, allowing users to
merge intervals from multiple fields - for example, to search for stemmed tokens
near unstemmed tokens.
2019-03-20 16:26:04 +00:00
Like 6f64267626 Make setting index.translog.sync_interval be dynamic (#37382)
Currently, we cannot update index setting index.translog.sync_interval if index is open, because it's
not dynamic which can be updated for closed index only.

Closes #32763
2019-03-20 17:12:45 +01:00
Yannick Welsch a5fb7fb17c Fix snapshot restore logging on fresh restore (#40252)
A recent refactoring (#37130) where imports got mixed up (changing Lucene's
IndexNotFoundException to Elasticsearch's IndexNotFoundException) led to many warnings being
logged in case of restoring a fresh snapshot.
2019-03-20 16:51:44 +01:00
Jim Ferenczi 3400483af4
Add date and date_nanos conversion to the numeric_type sort option (#40199) (#40224)
This change adds an option to convert a `date` field to nanoseconds resolution
 and a `date_nanos` field to millisecond resolution when sorting.
The resolution of the sort can be set using the `numeric_type` option of the
field sort builder. The conversion is done at the shard level and is restricted
to dates from 1970 to 2262 for the nanoseconds resolution in order to avoid
numeric overflow.
2019-03-20 16:50:28 +01:00
Nhat Nguyen efaf95628b Use separate translog dir in testDeleteWithFatalError
This test currently opens a new engine but shares the same translog
directory of the previous opening engine.
2019-03-20 10:22:27 -04:00
Mayya Sharipova 49a7c6e0e8
Expose proximity boosting (#39385) (#40251)
Expose DistanceFeatureQuery for geo, date and date_nanos types

Closes #33382
2019-03-20 09:24:41 -04:00
Henning Andersen 4c2a8638ca Cascading primary failure lead to MSU too low (#40249)
If a replica were first reset due to one primary failover and then
promoted (before resync completes), its MSU would not include changes
since global checkpoint, leading to errors during translog replay.

Fixed by re-initializing MSU before restoring local history.
2019-03-20 14:00:43 +01:00
Simon Willnauer 235f57989f Return cached segments stats if `include_unloaded_segments` is true (#39698)
Today we don't return segments stats for closed indices which makes it
hard to tell how much memory such an index would require. With this change
we return the statistics if requested by setting `include_unloaded_segments` to
true on the rest request.

Relates to #39512
2019-03-20 12:08:41 +01:00
Jason Tedor 9ce740a2eb
Modfiy casing in JVM home log message
This makes the log message consistent with the following line that shows
the JVM arguments.
2019-03-20 00:06:16 -04:00
Zachary Tong 69f5869707 Mute SearchResponseMergerTests#testMergeSearchHits
Tracking issue: https://github.com/elastic/elasticsearch/issues/40214
2019-03-19 13:40:38 -04:00
David Turner 33d8738c68 Fix RareClusterStateIT on MacOS (#40203)
Today RareClusterStateIT#testAssignmentWithJustAddedNodes fails on my Mac
because it waits for the default connection timeout of 30 seconds to connect to
a fake node with IP address 0.0.0.0. This connection attempt fails much more
quickly on Linux so the test passes.

This commit fixes this by reducing the connection timeout for this test.
2019-03-19 17:33:21 +00:00
Nhat Nguyen a13b4bc8c5 Always fail engine if delete operation fails (#40117)
Unlike index operations which can fail at the document level to
analyzing errors, delete operations should never fail at the document
level whether soft-deletes is enabled or not. With this change, we will
always fail the engine if we fail to apply a delete operation to Lucene.

Closes #33256
2019-03-19 13:09:23 -04:00
Luca Cavanna d14e79e849 Serialize top-level pipeline aggs as part of InternalAggregations (#40177)
We currently convert pipeline aggregators to their corresponding
InternalAggregation instance as part of the final reduction phase.
They arrive to the coordinating node as part of QuerySearchResult
objects fom the shards and, despite we may incrementally reduce
aggs (hence we may have some non-final reduce and the final
one later) all the reduction phases happen on the same node.

With CCS minimizing roundtrips though, each cluster performs its
own non-final reduction, and then serializes the results back to
the CCS coordinating node which will perform the final coordination.
This breaks the assumptions made up until now around reductions
happening all on the same node.

With #40101 we have made sure that top-level pipeline aggs are not
reduced as part of the non-final reduction. The next step is to make
sure that they don't get lost, meaning that each coordinating node
needs to send them back to the CCS coordinating node as part of
the top-level `InternalAggregations` object.

Closes #40059
2019-03-19 14:43:39 +01:00
Luca Cavanna 803ec46331 Skip sibling pipeline aggregators reduction during non-final reduce (#40101)
Today a coordinating node forces a final reduction of sibling pipeline aggregators whenever reducing aggs, unless it is reducing aggs incrementally. This works well for incremental reduction of aggs, but breaks CCS when minimizing roundtrips as each cluster ends up reducing its own pipeline aggregators locally while that should only be done by the CCS coordinating node later. This causes issues as after their reduction,  pipeline aggs cannot be further reduced, which is what happens with CCS causing errors like "java.lang.UnsupportedOperationException: Not supported" being returned.

Each coordinating node should rather honour the reduce context flag that
indicates whether we are executing a final reduce or not. If not, it should leave the sibling pipeline aggregations alone.

Note that his bug affects only pipeline aggs that don't have a parent in
the aggs tree, while all the others work well.

Relates to #40059 but does not fix it yet, as the CCS coordinating node also needs to be adapted to recreate sibling pipeline aggregators from the request.
2019-03-19 14:43:39 +01:00
Luca Cavanna 83f12a3d9c CCS: skip empty search hits when minimizing round-trips (#40098)
When minimizing round-trips, each cluster returns its own independent
search response. In case sort by field and/or field collapsing were
requested, when one cluster has no results to return, the information
about the field that sorting was based on (SortField array) as well as
the field (and the values) that collapsing was performed on are missing
in the search response. That causes problems as we can't build the
proper `TopDocs` instance which would need to be either `TopFieldDocs`
or `CollapseTopFieldDocs`. The merge routine expects that all the top
docs are of the same exact type which can't be guaranteed. Given that
the problematic results are empty, hence have no impact on the final
results, we can simply skip them.

Relates to #32125
Closes #40067
2019-03-19 14:43:39 +01:00
Luca Cavanna 9c38fa6468 [TEST] Update TransportSearchActionTests#testShouldMinimizeRoundtrips
Relates to #40044

Closes #40051
2019-03-19 14:43:38 +01:00
Luca Cavanna 07bfb4c7f7 CCS: Disable minimizing round-trips when dfs is requested (#40044)
When using DFS_QUERY_THEN_FETCH search type, the dfs phase is run and
its results are used in the query phase to make scoring accurate.
When using CCS, depending on whether the DFS phase runs in the CCS
coordinating node (like if all shards were local) or in each remote
cluster (when minimizing round-trips), scoring will differ.

This commit disables minimizing round-trips whenever DFS is requested,
as it is not currently  possible to ensure that scoring is accurate in
that case.

Relates to #32125
2019-03-19 14:43:38 +01:00
Nhat Nguyen 8dc6862b17 Unmute and trace testPendingRefreshWithIntervalChange
Tracked at #39565
2019-03-19 09:07:54 -04:00
Henning Andersen dde41cc2dd Node repurpose tool (#39403)
When a node is repurposed to master/no-data or no-master/no-data, v7.x
will not start (see #37748 and #37347). The `elasticsearch repurpose`
tool can fix this by cleaning up the problematic data.
2019-03-19 11:52:02 +01:00
Dimitris Athanasiou 95f660d577
Mute NoMasterNodeIT.testNoMasterActionsWriteMasterBlock test (#39689)
Relates #39688
2019-03-18 15:04:26 -06:00
Henning Andersen 0b214c1bfb Linearizability checker memory reduction (#40149)
The cache used in linearizability checker now uses approximately 6x less
memory by changing the cache from a set of (bits, state) tuples into a
map from bits -> { state }.

Each combination of states is kept once only, building on the
assumption that the number of state permutations is small compared to
the number of bits permutations. For those histories that are difficult
to check we will have many bits combinations that use the same state
permutations.

We end up now using approximately 15 bytes per entry compared to 101
bytes before, ie. a 6x improvement, allowing us to linearizability check
significantly longer histories.

Re-enabled linearizability checker in CoordinatorTests, hoping above
ensures we no longer run out of memory.

Resolves #39437
2019-03-18 21:16:59 +01:00
Nhat Nguyen 38e9522218 Remove wait for cluster state step in peer recovery (#40004)
We introduced WAIT_CLUSTERSTATE action in #19287 (5.0), but then stopped
using it since #25692 (6.0). This change removes that action and related
code in 7.x and 8.0.

Relates #19287
Relates #25692
2019-03-18 15:17:21 -04:00
Nhat Nguyen d720a64b9e Ensure sendBatch not called recursively (#39988)
This PR introduces AsyncRecoveryTarget which executes remote calls of
peer recovery asynchronously. In this change, we also add a new
assertion to ensure that method sendBatch, which sends a batch of
history operations in phase2, is never called recursively on the same
thread. This new assertion will also be used in method sendFileChunks.
2019-03-18 15:17:21 -04:00
Jim Ferenczi eb540125ea
Fix IndexSearcherWrapper visibility (#39071) (#40145)
This change adds a wrapper for IndexSearcher that makes IndexSearcher#search(List, Weight, Collector) visible by
sub-classes. The wrapper is used by the ContextIndexSearcher to call this protected method on a searcher created by a plugin.
This ensures that an override of the protected method in an IndexSearcherWrapper plugin is called when a search is executed.

Closes #30758
2019-03-18 11:33:54 +01:00
Jim Ferenczi 5b73a1bc7d
Add an option to force the numeric type of a field sort (#38095) (#40084)
This change adds an option to the `FieldSortBuilder` that allows to transform the type
of a numeric field into another. Possible values for this option are `long` that transforms
the source field into an integer and `double` that transforms the source field into a floating point.
This new option is useful for cross-index search when the sort field is mapped differently on some
indices. For instance if a field is mapped as a floating point in one index and as an integer in another
it is possible to align the type for both indices using the `numeric_type` option:

```
{
   "sort": {
    "field": "my_field",
    "numeric_type": "double" <1>
   }
}
```

<1> Ensure that values for this field are transformed to a floating point if needed.
2019-03-18 09:32:45 +01:00
Albert Zaharovits 1b75ee0bd7 AuditTrail correctly handle ReplicatedWriteRequest (#39925)
This fix deduplicates index names in `BulkShardRequests` and only audits
the specific resolved index for every comprising `BulkItemRequest`.
2019-03-17 13:05:26 +02:00
Jason Tedor 86d1d03c37
Remove cluster state size (#40109)
This commit removes the cluster state size field from the cluster state
response, and drops the backwards compatibility layer added in 6.7.0 to
continue to support this field. As calculation of this field was
expensive and had dubious value, we have elected to remove this field.
2019-03-15 17:16:25 -04:00
Tim Brooks 0b50a670a4
Remove transport name from tcp channel (#40074)
Currently, we maintain a transport name ("mock-nio", "nio", "netty")
that is passed to a `TcpTransportChannel` when a request is received.
The value of this name is to associate with the task when we register a
task with the task manager. However, it is only possible to run ES with
one transport, so having an implementation specific name is unnecessary.
This commit removes the name and replaces it with the generic
"transport".
2019-03-15 12:04:13 -06:00
Zachary Tong c72feedd74 Do not allow Sampler to allocate more than maxDoc size, better CB accounting (#39381)
The `sampler` agg creates a BestDocsDeferringCollector, which internally
initializes a priority queue of size `shardSize`.  This queue is
populated with empty `Object` sentinels, which is roughly 16b per
object.

Similarly, the Diversified samplers create a DiversifiedTopDocsCollectors
which internally track PQ slots with ScoreDocKeys, weighing in around
28kb

If the user sets a very abusive `shard_size`, this could easily OOM
a node or cluster since these PQ are allocated up-front without
any checks.

This commit makes sure that when we create the collector, it
cannot be larger than the maxDoc so that we don't accidentally blow
up the node.  We ensure the size is not greater than the overall
index maxDoc. A similar treatment is done for `maxDocsPerValue`
parameter of the diversified samplers

For good measure, this also adds in some CB accounting to try and track
memory usage.

Finally, a redundant array creation is removed to reduce a bit of
temporary memory.
2019-03-15 13:19:55 -04:00