Commit Graph

4099 Commits

Author SHA1 Message Date
Nhat Nguyen 3f67cbe974 Suppress warning from background sync on relocated primary (#46247)
If a primary is being relocated, then the global checkpoint and
retention lease background sync can emit unnecessary warning logs.
This side effect was introduced in #42241.

Relates #40800
Relates #42241
2019-09-03 18:44:15 -04:00
Nhat Nguyen 5924df1764 Mute testRecoveryFromFailureOnTrimming
Tracked at #46267
2019-09-03 18:44:08 -04:00
Lee Hinman 57f322f85e Move MockRepository into test framework (#46298)
This moves the `MockRepository` class into `test/framework/src/main` so
it can be used across all modules and plugins in tests.
2019-09-03 16:21:10 -06:00
Jason Tedor b8c51ff894
Multi-get requests should wait for search active (#46283)
When a shard has fallen search idle, and a non-realtime multi-get
request is executed, today such requests do not wait for the shard to
become search active and therefore such requests do not wait for a
refresh to see the latest changes to the index. This also prevents such
requests from triggering the shard as non-search idle, influencing the
behavior of scheduled refreshes. This commit addresses this by attaching
a listener to the shard search active state for multi-get requests. In
this way, when the next scheduled refresh is executed, the multi-get
request will then proceed.
2019-09-03 14:31:37 -04:00
Henning Andersen 2383acaa89 Fix testSyncFailsIfOperationIsInFlight (#46269)
testSyncFailsIfOperationIsInFlight could fail due to the index request
spawning a GCP sync (new since 7.4). Test now waits for it to finish
before testing that flushed sync fails.
2019-09-03 17:30:00 +02:00
dengweisysu 416419e4c9 Sync translog without lock when trim unreferenced readers (#46203)
With this change, we can avoid blocking writing threads when trimming
unreferenced readers; hence improving the translog writing performance
in async durability mode.

Close #46201
2019-09-02 21:55:06 -04:00
Anup e01ec802e7 Remove duplicate line in SearchAfterBuilder (#45994) 2019-09-03 01:30:01 +02:00
Armin Braun 2662c1b417
Wait for all Rec. to Stop on Node Close (#46178) (#46237)
* Wait for all Rec. to Stop on Node Close

* This issue is in the `RecoverySourceHandler#acquireStore`. If we submit the store release to the generic threadpool while it is getting shut down, we never complete the future we wait on (in the generic pool as well) and potentially never release the store.
* Fixed by waiting for all recoveries to end on node close so that we always have a healthy thread pool here
* Closes #45956
2019-09-02 18:04:37 +02:00
Martijn van Groningen 555b630160
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-09-02 09:16:55 +02:00
Martijn van Groningen 5747badaa8
Allow ingest processors access to node client. (#46077)
This is the first PR that merges changes made to server module from
the enrich branch (see #32789) into the master branch.

The plan is to merge changes made to the server module separately from
the pr that will merge enrich into master, so that these changes can
be reviewed in isolation.
2019-09-02 08:24:26 +02:00
Nhat Nguyen db949847e5 Fix translog stats in testPrepareIndexForPeerRecovery (#46137)
When recovering a shard locally, we use a translog snapshot from
newSnapshotFromGen which consists of all readers from a certain
generation. In the test, we use newSnapshotFromMinSeqNo for the
expectation. The snapshot of this method includes only readers
containing operations in the requesting range.

Closes #46022
2019-08-30 08:53:27 -04:00
Andrey Ershov 152ce62c58 Enhanced logging when transport is misconfigured to talk to HTTP port (#45964)
If a node is misconfigured to talk to a remote node's HTTP port (instead of
the transport port), it will eventually receive an HTTP response from the
remote node on the transport port (this happens when a node accidentally
sends a line-terminating byte in a transport request).
Today this results in an unfriendly log message and a
long stack trace.
This commit adds a check for whether a malformed response is an HTTP
response. In that case, a concise log message is emitted instead.

(cherry picked from commit 911d02b7a9c3ce7fe316360c127a935ca4b11f37)
2019-08-30 13:02:08 +02:00
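
A minimal sketch of the kind of check described in 152ce62c58 above, assuming the detection amounts to looking for the ASCII prefix `HTTP` at the start of the malformed bytes. The class name, method names, and log wording are hypothetical and are not taken from the Elasticsearch source.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decides whether bytes received on the transport port
// look like the start of an HTTP response (for example "HTTP/1.1 400 ...").
final class HttpOnTransportSketch {
    private static final byte[] HTTP_PREFIX = "HTTP".getBytes(StandardCharsets.US_ASCII);

    static boolean looksLikeHttpResponse(byte[] header) {
        if (header.length < HTTP_PREFIX.length) {
            return false; // too short to carry the prefix
        }
        for (int i = 0; i < HTTP_PREFIX.length; i++) {
            if (header[i] != HTTP_PREFIX[i]) {
                return false;
            }
        }
        return true;
    }

    // Assumed wording: a concise message instead of a long stack trace.
    static String describe(byte[] header) {
        return looksLikeHttpResponse(header)
            ? "received HTTP response on transport port; is the remote address an HTTP port?"
            : "received malformed transport message";
    }
}
```

Only the first four bytes are inspected here, so the check stays cheap on the error path.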
Paul Sanwald 8bdbc7d9bf
Bump version from 7.4 to 7.5 (#46142) 2019-08-29 15:03:26 -04:00
Julie Tibshirani b5d8b364bb
Ensure top docs optimization is fully disabled for queries with unbounded max scores. (#46105) (#46139)
When a query contains a mandatory clause that doesn't track the max score per
block, we disable the max score optimization. Previously, we were doing this by
wrapping the collector with a FilterCollector that always returned
ScoreMode.COMPLETE.

However we weren't adjusting totalHitsThreshold, so the collector could still
call Scorer#setMinCompetitiveScore. It is against the method contract to call
setMinCompetitiveScore when the score mode is COMPLETE, and some scorers like
ReqOptSumScorer throw an error in this case.

This commit tries to disable the optimization by always setting
totalHitsThreshold to max int, as opposed to wrapping the collector.
2019-08-29 10:56:53 -07:00
Simon Willnauer 9b2ea07b17
Flush engine after big merge (#46066) (#46111)
Today we might carry on a big merge uncommitted and therefore
occupy a significant amount of diskspace for quite a long time
if for instance indexing load goes down and we are not quickly
reaching the translog size threshold. This change will cause a
flush if we hit a significant merge (512MB by default) which
frees diskspace sooner.
2019-08-29 17:54:15 +02:00
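
The decision described in 9b2ea07b17 above can be sketched roughly as follows. Only the 512MB default comes from the commit message. The class, constant, and method names are hypothetical, and treating the threshold as inclusive is an assumption.

```java
// Hypothetical sketch: decide whether a finished merge is large enough to
// warrant a flush, so that disk space held by the uncommitted merge is freed.
final class FlushAfterMergeSketch {
    // 512MB default, as stated in the commit message.
    static final long DEFAULT_FLUSH_AFTER_MERGE_BYTES = 512L * 1024 * 1024;

    // Assumption: a merge exactly at the threshold also triggers a flush.
    static boolean shouldFlushAfterMerge(long mergedBytes, long thresholdBytes) {
        return mergedBytes >= thresholdBytes;
    }
}
```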
Nhat Nguyen bb49124690 Only verify global checkpoint if translog sync occurred (#45980)
We only sync translog if the given offset hasn't synced yet. We can't
verify the global checkpoint from the latest translog checkpoint unless
a sync has occurred.

Closes #46065
Relates #45634
2019-08-29 09:44:40 -04:00
David Turner d340530a47 Avoid overshooting watermarks during relocation (#46079)
Today the `DiskThresholdDecider` attempts to account for already-relocating
shards when deciding how to allocate or relocate a shard. Its goal is to stop
relocating shards onto a node before that node exceeds the low watermark, and
to stop relocating shards away from a node as soon as the node drops below the
high watermark.

The decider handles multiple data paths by only accounting for relocating
shards that affect the appropriate data path. However, this mechanism does not
correctly account for _new_ relocating shards, which are unwittingly ignored.
This means that we may evict far too many shards from a node above the high
watermark, and may relocate far too many shards onto a node causing it to blow
right past the low watermark and potentially other watermarks too.

There are in fact two distinct issues that this PR fixes. New incoming shards
have an unknown data path until the `ClusterInfoService` refreshes its
statistics. New outgoing shards have a known data path, but we fail to account
for the change of the corresponding `ShardRouting` from `STARTED` to
`RELOCATING`, meaning that we fail to find the correct data path and treat the
path as unknown here too.

This PR also reworks the `MockDiskUsagesIT` test to avoid using fake data paths
for all shards. With the changes here, the data paths are handled in tests as
they are in production, except that their sizes are fake.

Fixes #45177
2019-08-29 12:40:55 +01:00
Jason Tedor 9bc4a24118
Handle delete document level failures (#46100)
Today we assume that document failures can not occur for deletes. This
assumption is bogus, as they can fail for a variety of reasons such as
the Lucene index having reached the document limit. Because of this
assumption, we were asserting that such a document-level failure would
never happen. When this bogus assertion is violated, we fail the node, a
catastrophe. Instead, we need to treat this as a fatal engine exception.
2019-08-28 22:17:16 -04:00
Tal Levy a356bcff41
Add Circle Processor (#43851) (#46097)
add circle-processor that translates circles to polygons
2019-08-28 14:44:08 -07:00
Jason Tedor 1249e6ba5d
Handle no-op document level failures (#46083)
Today we assume that document failures can not occur for no-ops. This
assumption is bogus, as they can fail for a variety of reasons such as
the Lucene index having reached the document limit. Because of this
assumption, we were asserting that such a document-level failure would
never happen. When this bogus assertion is violated, we fail the node, a
catastrophe. Instead, we need to treat this as a fatal engine exception.
2019-08-28 13:57:24 -04:00
Tanguy Leroux 9e14ffa8be Few clean ups in ESBlobStoreRepositoryIntegTestCase (#46068) 2019-08-28 16:29:46 +02:00
Mark Tozzi aec125faff
Support Range Fields in Histogram and Date Histogram (#46012)
Backport of 1a0dddf4ad24b3f2c751a1fe0e024fdbf8754f94 (AKA #445395)

     * Add support for a Range field ValuesSource, including decode logic for range doc values and exposing RangeType as a first class enum
     * Provide hooks in ValuesSourceConfig for aggregations to control ValuesSource class selection on missing & script values
     * Branch aggregator creation in Histogram and DateHistogram based on ValuesSource class, to enable specialization based on type.  This is similar to how Terms aggregator works.
     * Prioritize field type when available for selecting the ValuesSource class type to use for an aggregation
2019-08-28 09:06:09 -04:00
Martijn van Groningen 1157224a6b
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-28 10:14:07 +02:00
Henning Andersen 300e717e42 Disallow partial results when shard unavailable (#45739)
Searching with `allowPartialSearchResults=false` could still return
partial search results during recovery. If a shard copy fails
with a "shard not available" exception, the failure would be ignored and
a partial result returned. The one case where this is known to happen
is when a shard copy is recovering when searching, since
`IllegalIndexShardStateException` is considered a "shard not available"
exception.

Relates to #42612
2019-08-27 17:01:23 +02:00
Nhat Nguyen 146e23a8a9 Relax translog assertion in testRestoreLocalHistoryFromTranslog (#45943)
Since #45473, we trim translog below the local checkpoint of the safe
commit immediately if soft-deletes are enabled. In
testRestoreLocalHistoryFromTranslog, we should have a safe commit after
recoverFromTranslog is called; then we will trim translog files which
contain only operations that are at most the global checkpoint.

With this change, we relax the assertion to ensure that we don't put
operations to translog while recovering history from the local translog.
2019-08-26 17:19:19 -04:00
Nhat Nguyen c66bae39c3 Update translog checkpoint after marking ops as persisted (#45634)
If two translog syncs happen concurrently, then one can return before
its operations are marked as persisted. In general, this should not be
an issue; however, peer recoveries currently rely on this assumption.

Closes #29161
2019-08-26 17:18:52 -04:00
Nhat Nguyen f2e8b17696 Do not create engine under IndexShard#mutex (#45263)
Today we create new engines under IndexShard#mutex. This is not ideal
because it can block the cluster state updates which also execute under
the same mutex. We can avoid this problem by creating new engines under
a separate mutex.

Closes #43699
2019-08-26 17:18:29 -04:00
Jason Tedor 3d64605075
Remove node settings from blob store repositories (#45991)
This commit starts from the simple premise that the use of node settings
in blob store repositories is a mistake. Here we see that the node
settings are used to get default settings for store and restore throttle
rates. Yet, since there are not any node settings registered to this
effect, there can never be a default setting to fall back to there, and
so we always end up falling back to the default rate. Since this was the
only use of node settings in blob store repository, we move them. From
this, several places fall out where we were chaining settings through
only to get them to the blob store repository, so we clean these up as
well. That leaves us with the changeset in this commit.
2019-08-26 16:26:13 -04:00
Zachary Tong 943a016bb2
Add Cumulative Cardinality agg (and Data Science plugin) (#45990)
This adds a pipeline aggregation that calculates the cumulative
cardinality of a field.  It does this by iteratively merging in the
HLL sketch from consecutive buckets and emitting the cardinality up
to that point.

This is useful for things like finding the total "new" users that have
visited a website (as opposed to "repeat" visitors).

This is a Basic+ aggregation and adds a new Data Science plugin
to house it and future advanced analytics/data science aggregations.
2019-08-26 16:19:55 -04:00
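
To illustrate the iterative merge described in 943a016bb2 above, here is a small sketch that swaps the HLL sketch for an exact `HashSet`. That substitution is an assumption made for readability: a real HyperLogLog sketch is approximate and far more compact. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the pipeline aggregation: each bucket contributes
// a set of values, and the running union plays the role of the merged sketch.
final class CumulativeCardinalitySketch {
    static List<Long> cumulativeCardinality(List<Set<String>> buckets) {
        Set<String> merged = new HashSet<>();      // stand-in for the merged HLL sketch
        List<Long> result = new ArrayList<>();
        for (Set<String> bucket : buckets) {
            merged.addAll(bucket);                 // merge this bucket into the running state
            result.add((long) merged.size());      // cardinality up to and including this bucket
        }
        return result;
    }
}
```

The difference between consecutive outputs gives the number of values first seen in that bucket, which corresponds to the "new" users in the commit's example.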
James Baiera 5535ff0a44
Fix IngestService to respect original document content type (#45799) (#45984)
Backport of #45799

This PR modifies the logic in IngestService to preserve the original content type 
on the IndexRequest, such that when a document with a content type like SMILE 
is submitted to a pipeline, the resulting document that is persisted will remain in 
the original content type (SMILE in this case).
2019-08-26 14:33:33 -04:00
Armin Braun af2bd75def
Fix Broken HTTP Request Breaking Channel Closing (#45958) (#45973)
This is essentially the same issue fixed in #43362 but for http request
version instead of the request method. We have to deal with the
case of not being able to parse the request version, otherwise
channel closing fails.

Fixes #43850
2019-08-26 16:20:58 +02:00
Armin Braun 5a17987e19
Fix SnapshotStatusApisIT (#45929) (#45971)
The snapshot status when blocking can still be INIT in rare cases when
the new cluster state that has the snapshot in `STARTED` hasn't yet
become visible.
Fixes #45917
2019-08-26 15:59:02 +02:00
Andrey Ershov d96469ddff Better logging for TLS message on non-secure transport channel (#45835)
This commit enhances logging for 2 cases:

1. If a non-TLS-enabled node receives a transport message from a TLS-enabled
node on the transport port.
2. If a non-TLS-enabled node receives an HTTPS request on the transport port.

(cherry picked from commit 4f52ebd32eb58526b4c8022f8863210bf88fc9be)
2019-08-26 15:07:13 +02:00
Jason Tedor 599bf2d68b
Deprecate the pidfile setting (#45938)
This commit deprecates the pidfile setting in favor of node.pidfile.
2019-08-23 21:31:35 -04:00
Mayya Sharipova 3bc1494d38 Correct warning testScalingThreadPoolConfiguration
Correct expected warning

Closes #45907
2019-08-23 10:30:36 -04:00
Henning Andersen 46d9a575db Fix RemoteClusterConnection close race (#45898)
Closing a `RemoteClusterConnection` concurrently with trying to connect
could result in double invoking the listener.

This fixes
RemoteClusterConnectionTest#testCloseWhileConcurrentlyConnecting

Closes #45845
2019-08-23 14:26:02 +02:00
Tanguy Leroux 8e66df9925 Move testRetentionLeasesClearedOnRestore (#45896) 2019-08-23 13:43:40 +02:00
Martijn van Groningen 837cfa2640
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-23 11:22:27 +02:00
Alexander Reelsen ecafe4f4ad Update joda to 2.10.3 (#45495) 2019-08-23 10:39:39 +02:00
Armin Braun ba6d72ea9f
Fix TransportSnapshotsStatusAction ThreadPool Use (#45824) (#45883)
In case of an in-progress snapshot this endpoint was broken because
it tried to execute repository operations in the callback on a
transport thread which is not allowed (only generic or snapshot
pool are allowed here).
2019-08-23 06:17:50 +02:00
Jason Tedor de6b6fd338
Add node.processors setting in favor of processors (#45885)
This commit namespaces the existing processors setting under the "node"
namespace. In doing so, we deprecate the existing processors setting in
favor of node.processors.
2019-08-22 22:18:37 -04:00
Nhat Nguyen 3393f9599e
Ignore translog retention policy if soft-deletes enabled (#45473)
Since #45136, we use soft-deletes instead of translog in peer recovery.
There's no need to retain extra translog to increase a chance of
operation-based recoveries. This commit ignores the translog retention
policy if soft-deletes is enabled so we can discard translog more
quickly.

Backport of #45473
Relates #45136
2019-08-22 16:40:06 -04:00
dengweisysu 72c6302d12 Fsync translog without writeLock before rolling (#45765)
Today, when rolling a new translog generation, we block all write
threads until a new generation is created. This choice is perfectly 
fine except in a highly concurrent environment with the translog 
async setting. We can reduce the blocking time by pre-syncing the 
current generation without writeLock before rolling. The new step 
would fsync most of the data of the current generation without 
blocking write threads.

Close #45371
2019-08-22 16:18:42 -04:00
William Brafford f82c0f56a6
Mute flaky RemoteClusterConnection test (#45850) 2019-08-22 15:00:43 -04:00
Jake Landis c60399c77f
introduce 7.3.2 version to 7.x (#45864) 2019-08-22 12:24:19 -05:00
Andrey Ershov ed8307c198 Deprecate es.http.cname_in_publish_address setting (#45616)
Follow up on #32806.

The system property es.http.cname_in_publish_address is deprecated
starting from 7.0.0, and a deprecation warning should be added if the
property is specified.
This PR will go to 7.x and master.
A follow-up PR to remove the es.http.cname_in_publish_address property
completely will go to master.

(cherry picked from commit a5ceca7715818f47ec87dd5f17f8812c584b592b)
2019-08-22 12:09:35 +02:00
Armin Braun 88acae48ce
Remove index-N Rebuild in Shard Snapshot Updates (#45740) (#45778)
* There is no point in listing out every shard over and over when the `index-N` blob in the shard contains a list of all the files
   * Rebuilding the `index-N` from the `snap-${uuid}.dat` blobs does not provide any material benefit. It only would in the corner case of a corrupted `index-N` but otherwise uncorrupted blobs since we neither check the correctness of the content of all segment blobs nor do we do a similar recovery at the root of the repository.
   * Also, at least in version `6.x` we only mark a shard snapshot as successful after writing out the updated `index-N` blob so all snapshots that would work with `7.x` and newer must have correct `index-N` blobs

=> Removed the rebuilding of the `index-N` content from `snap-${uuid}.dat` files and moved to only listing `index-N` when taking a snapshot instead of listing all files
=> Removed check of file existence against physical blob listing
=> Kept full listing on the delete side to retain full cleanup of blobs that aren't referenced by the `index-N`
2019-08-22 11:32:45 +02:00
Luca Cavanna b95ca9c3bb Fix compile errors in HttpChannelTaskHandler
Relates to #43332
2019-08-22 11:13:26 +02:00
Luca Cavanna a47ade3e64 Cancel search task on connection close (#43332)
This PR introduces a mechanism to cancel a search task when its corresponding connection gets closed. That would relieve users from having to manually deal with tasks and cancel them if needed. In particular, finding the task_id requires calling get tasks, which needs to call every node in the cluster.

The implementation is based on associating each http channel with its currently running search task, and cancelling the task when the previously registered close listener gets called.
2019-08-22 10:43:20 +02:00
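
The association described in a47ade3e64 above can be sketched as a map from channel to running task plus a close callback. This is a simplified illustration under assumptions: channels are identified by a string, tasks by a `long`, and all names are hypothetical rather than the actual `HttpChannelTaskHandler` API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical tracker: remembers which search task is running on which
// HTTP channel and cancels it when the channel's close listener fires.
final class ChannelTaskSketch {
    private final Map<String, Long> taskByChannel = new ConcurrentHashMap<>();
    private final Consumer<Long> cancelTask;

    ChannelTaskSketch(Consumer<Long> cancelTask) {
        this.cancelTask = cancelTask;
    }

    void taskStarted(String channelId, long taskId) {
        taskByChannel.put(channelId, taskId);
    }

    // Called when the search completes normally: nothing left to cancel.
    void taskFinished(String channelId) {
        taskByChannel.remove(channelId);
    }

    // Called from the close listener registered on the channel.
    void channelClosed(String channelId) {
        Long taskId = taskByChannel.remove(channelId);
        if (taskId != null) {
            cancelTask.accept(taskId);
        }
    }
}
```

Because the task id is already known locally, no cluster-wide task lookup is needed to cancel it.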
Nhat Nguyen 3029887451 Never release store using CancellableThreads (#45409)
Today we can release a Store using CancellableThreads. If we are holding
the last reference, then we will verify the node lock before deleting
the store. Checking node lock performs some I/O on FileChannel. If the
current thread is interrupted, then the channel will be closed and the
node lock will also be invalid.

Closes #45237
2019-08-21 21:24:31 -04:00
Tal Levy 9b14b7298b
[7.x] Add is_write_index column to cat.aliases (#45798)
* Add is_write_index column to cat.aliases (#44772)

Aliases have had the option to set `is_write_index` since 6.4,
but the cat.aliases action was never updated.

* correct version bounds to 7.4
2019-08-21 14:15:49 -07:00
William Brafford 2b549e7342
CLI tools: write errors to stderr instead of stdout (#45586)
Most of our CLI tools use the Terminal class, which previously did not provide methods for writing to standard error. When all output goes to standard out, there are two basic problems. First, errors and warnings are "swallowed" in pipelines, making it hard for a user to know when something's gone wrong. Second, errors and warnings are intermingled with legitimate output, making it difficult to pass the results of interactive scripts to other tools.

This commit adds a second set of print commands to Terminal for printing to standard error, with errorPrint corresponding to print and errorPrintln corresponding to println. This leaves it to developers to decide which output should go where. It also adjusts existing commands to send errors and warnings to stderr.

Usage is printed to standard output when it's correctly requested (e.g., bin/elasticsearch-keystore --help) but goes to standard error when a command is invoked incorrectly (e.g. bin/elasticsearch-keystore list-with-a-typo | sort).
2019-08-21 14:46:07 -04:00
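
A rough sketch of the split described in 2b549e7342 above. The method names `println` and `errorPrintln` come from the commit message. The class name, constructor, and use of `PrintStream` are assumptions for illustration and do not reproduce the real Terminal class.

```java
import java.io.PrintStream;

// Hypothetical terminal with separate streams for regular output and errors.
final class TerminalSketch {
    private final PrintStream out;
    private final PrintStream err;

    TerminalSketch(PrintStream out, PrintStream err) {
        this.out = out;
        this.err = err;
    }

    // Legitimate output: safe to pipe into other tools.
    void println(String message) {
        out.println(message);
    }

    // Errors and warnings: stay visible even when stdout is piped.
    void errorPrintln(String message) {
        err.println(message);
    }
}
```

Constructed as `new TerminalSketch(System.out, System.err)`, a mistyped command piped into `sort` would still show its error on the console, since only standard output enters the pipe.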
Armin Braun 790765d3f9
Remove Dep. on SnapshotsService in SnapshotShardsService (#45776) (#45791)
SnapshotShardsService depends on the RepositoriesService
not the SnapshotsService, no need to have this indirection.
2019-08-21 19:26:19 +02:00
Armin Braun 6aaee8aa0a
Repository Cleanup Endpoint (#43900) (#45780)
* Repository Cleanup Endpoint (#43900)

* Snapshot cleanup functionality via transport/REST endpoint.
* Added all the infrastructure for this with the HLRC and node client
* Made use of it in tests and resolved relevant TODO
* Added new `Custom` CS element that tracks the cleanup logic.
Kept it similar to the delete and in progress classes and gave it
some (for now) redundant way of handling multiple cleanups but only allow one
* Use the exact same mechanism used by deletes to have the combination
of CS entry and increment in repository state ID provide some
concurrency safety (the initial approach of just an entry in the CS
was not enough, we must increment the repository state ID to be safe
against concurrent modifications, otherwise we run the risk of "cleaning up"
blobs that just got created without noticing)
* Isolated the logic to the transport action class as much as I could.
It's not ideal, but we don't need to keep any state and do the same
for other repository operations
(like getting the detailed snapshot shard status)
2019-08-21 17:59:49 +02:00
Jim Ferenczi fe2a7523ec Add support for inlined user dictionary in the Kuromoji plugin (#45489)
This change adds a new option called user_dictionary_rules to
Kuromoji's tokenizer. It can be used to set additional tokenization rules
to the Japanese tokenizer directly in the settings (instead of using a file).
This commit also adds a check that no rules are duplicated since this is not allowed
in the UserDictionary.

Closes #25343
2019-08-21 16:28:30 +02:00
Martijn van Groningen 2677ac14d2
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-21 14:28:17 +02:00
Christos Soulios 2a0c7c40e5
[7.x] Implement AvgAggregatorTests#testDontCacheScripts and remove AvgIT #45746
Backports PR #45737:

    Similar to PR #45030 integration test testDontCacheScripts() was moved to unit test AvgAggregatorTests#testDontCacheScripts.

    AvgIT class was removed.
2019-08-20 20:19:51 +03:00
Christos Soulios 96a40acd82
[7.x] Migrate tests from MaxIT to MaxAggregatorTests (#45030) #45742
Backports PR #45030 to 7.x:

    This PR migrates tests from MaxIT integration test to MaxAggregatorTests, as described in #42893
2019-08-20 18:58:47 +03:00
Nhat Nguyen e9759b2b33 Wait for background refresh in testAutomaticRefresh (#45661)
If the background refresh is running, but not finished yet then the
document might not be visible to the next search. Thus, if
scheduledRefresh returns false, we need to wait until the background
refresh is done.

Closes #45571
2019-08-20 10:40:12 -04:00
Rory Hunter 47b3dccbc4
Always check that cgroup data is present (#45647)
`OsProbe` fetches cgroup data from the filesystem, and has asserts that
check for missing values. This PR changes most of these asserts into
runtime checks, since at least one user has reported an NPE where
a piece of cgroup data was missing.

Backport of #45606 to 7.x.
2019-08-19 10:29:41 +01:00
Nhat Nguyen 6f5d944fbd Ensure AsyncTask#isScheduled remain false after close (#45687)
If a scheduled task of an AbstractAsyncTask starts after it was closed,
then isScheduledOrRunning can remain true forever although no task is
running or scheduled.

Closes #45576
2019-08-17 13:48:50 -04:00
Vega 6f2daa85e3 Allow uppercase in keystore setting names (#45222)
The elasticsearch keystore was originally backed by a PKCS#12 keystore, which had several limitations. To overcome some of these limitations in encoding, the setting names existing within the keystore were limited to lowercase alphanumeric (with underscore). Now that the keystore is backed by an encrypted blob, this restriction is no longer relevant. This commit relaxes that restriction by allowing uppercase ASCII characters as well.

closes #43835
2019-08-16 17:50:08 -07:00
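
The relaxation in 6f2daa85e3 above can be illustrated with two patterns. This is a sketch that assumes the allowed alphabet is exactly what the commit message describes (letters, digits, underscore). The real validation may permit further characters, and the names below are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical before/after validation of keystore setting names.
final class KeystoreNameSketch {
    // Assumed old rule: lowercase alphanumeric plus underscore.
    static final Pattern BEFORE = Pattern.compile("[a-z0-9_]+");
    // Assumed new rule: uppercase ASCII letters are accepted as well.
    static final Pattern AFTER = Pattern.compile("[A-Za-z0-9_]+");

    static boolean isValid(Pattern rule, String settingName) {
        return rule.matcher(settingName).matches();
    }
}
```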
Igor Motov 98c850c08b
Geo: Change order of parameter in Geometries to lon, lat 7.x (#45618)
Changes the order of parameters in Geometries from lat, lon to lon, lat
and moves all Geometry classes to the
org.elasticsearch.geometry package.

Backport of #45332

Closes #45048
2019-08-16 14:42:02 -04:00
Ryan Ernst 742213d710 Improve error message when index settings are not a map (#45588)
This commit adds an explicit error message when a create index request
contains a settings key that is not a json object. Prior to this change
the user would be given a ClassCastException with no explanation of what
went wrong.

closes #45126
2019-08-16 11:39:26 -07:00
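
The idea behind 742213d710 above is to test the type before casting so the user sees a descriptive error rather than a bare ClassCastException. A minimal sketch, with a hypothetical class name and an assumed error message:

```java
import java.util.Map;

// Hypothetical parser step: validate that the "settings" value is an object.
final class SettingsTypeCheckSketch {
    @SuppressWarnings("unchecked")
    static Map<String, Object> settingsAsMap(Object value) {
        if (!(value instanceof Map)) {
            // Assumed wording; the point is that the message names the offending key.
            throw new IllegalArgumentException("key [settings] must be an object");
        }
        return (Map<String, Object>) value;
    }
}
```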
Zachary Tong 50c65d05ba Move bucket reduction from Bucket to the InternalAgg (#45566)
The current idiom is to have the InternalAggregator find all the
buckets sharing the same key, put them in a list, get the first bucket
and ask that bucket to reduce all the buckets (including itself).

This is a somewhat confusing workflow, and feels like the aggregator should
be reducing the buckets (since the aggregator owns the buckets), rather
than asking one bucket to do all the reductions.

This commit basically moves the `Bucket.reduce()` method to the
InternalAgg and renames it `reduceBucket()`.  It also moves the
`createBucket()` (or equivalent) method from the bucket to the
InternalAgg as well.
2019-08-16 13:59:00 -04:00
Andrey Ershov dbc90653dc transport.publish_address should contain CNAME (#45626)
This commit adds CNAME reporting for transport.publish_address the same way
it's done for http.publish_address.

Relates #32806
Relates #39970

(cherry picked from commit e0a2558a4c3a6b6fbfc6cd17ed34a6f6ef7b15a9)
2019-08-16 17:42:00 +02:00
Armin Braun d6a9edea16
Lower Limit for Maximum Message Size in TcpTransport (#44496) (#45635)
* Since we're buffering network reads to the heap and then deserializing them it makes no sense to buffer a message that is 90% of the heap size since we couldn't deserialize it anyway
* I think `30%` is a more reasonable guess here given that we can reasonably assume that the deserialized message will be larger than the serialized message itself and processing it will take additional heap as well
2019-08-16 12:27:54 +02:00
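
The arithmetic behind d6a9edea16 above, as a small sketch. The 90% and 30% figures come from the commit message. The class and method names are hypothetical, and integer percentages are used here only to avoid floating-point rounding.

```java
// Hypothetical helper: cap the size of an inbound transport message at a
// percentage of the maximum heap.
final class MaxMessageSizeSketch {
    static long limitBytes(long maxHeapBytes, int percentOfHeap) {
        return maxHeapBytes / 100 * percentOfHeap;
    }

    static boolean exceedsLimit(long messageBytes, long maxHeapBytes, int percentOfHeap) {
        return messageBytes > limitBytes(maxHeapBytes, percentOfHeap);
    }
}
```

With a 1000-byte heap for illustration, a 500-byte message passes the old 90% limit (900 bytes) but is rejected under the new 30% limit (300 bytes). In a running JVM the heap size would typically come from `Runtime.getRuntime().maxMemory()`.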
Martijn van Groningen 5ea0985711
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-16 09:47:11 +02:00
Armin Braun a48242c371
Cleanup Redundant TransportLogger Instantiation (#43265) (#45629)
* This class' methods are all effectively `static` => make them `static` and stop instantiating it needlessly
2019-08-15 21:16:56 +02:00
Zachary Tong cd441f6906 Catch AllocatedTask registration failures (#45300)
When a persistent task attempts to register an allocated task locally,
this creates the Task object and starts tracking it locally.  If there
is a failure while initializing the task, this is handled by a catch
and subsequent error handling (canceling, unregistering, etc).

But if the task fails to be created because an exception is thrown
in the task's constructor, this is uncaught and fails the cluster update
thread.  The ramification is that a persistent task remains in the
cluster state, but is unable to create the allocated task, and the
exception prevents other tasks "after" the poisoned task from starting
too.

Because the allocated task is never created, the cancellation tools
are not able to remove the persistent task and it is stuck as a
zombie in the CS.

This commit adds exception handling around the task creation,
and attempts to notify the master if there is a failure (so the
persistent task can be removed).  Even if this notification fails,
the exception handling means the rest of the uninitialized tasks
can proceed as normal.
2019-08-15 15:14:19 -04:00
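
The exception handling described in cd441f6906 above can be sketched as a loop that isolates each task's creation. This is an illustration under assumptions: task creation is modelled as a `Supplier`, the master notification as a `Consumer`, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch: one failing task constructor must not prevent the
// tasks "after" it from starting.
final class TaskStartSketch {
    static <T> List<T> startAll(List<Supplier<T>> factories, Consumer<RuntimeException> notifyMaster) {
        List<T> started = new ArrayList<>();
        for (Supplier<T> factory : factories) {
            T task;
            try {
                task = factory.get();          // may throw from the task's constructor
            } catch (RuntimeException e) {
                try {
                    notifyMaster.accept(e);    // let the master remove the persistent task
                } catch (RuntimeException notifyFailure) {
                    // Even if notification fails, keep going with the remaining tasks.
                }
                continue;
            }
            started.add(task);
        }
        return started;
    }
}
```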
Armin Braun de58353722
Lower Painless Static Memory Footprint (#45487) (#45619)
* Painless generates a ton of duplicate strings and empty `HashMap` instances wrapped as unmodifiable
* This change brings down the static footprint of Painless on an idle node by 20MB (after running the PMC benchmark against said node)
   * Since we were looking into ways of optimizing for smaller node sizes I think this is a worthwhile optimization
2019-08-15 19:41:45 +02:00
Alpar Torok 03a1645bc6 Use dynamic port ranges for ExternalTestCluster (#45601)
Moves methods added in #44213 and uses them to configure the port range
for `ExternalTestCluster` too.
These were still using `9300-9400` (the default) and running into
races.
2019-08-15 16:40:12 +03:00
Armin Braun 1beea3588b
Make BlobStoreRepository Validation Read master.dat (#45546) (#45578)
* Fixing this for two reasons:
   1. Why not verify that the seed we wrote is actually there when we can
   2. The AWS S3 SDK started to log a bunch of WARN messages about not fully reading the stream now that we started to abuse the read blob as an `exists` check after removing that method from the blob container
2019-08-15 07:07:52 +02:00
Nick Knize 647a8308c3
[SPATIAL] Backport new ShapeFieldMapper and ShapeQueryBuilder to 7x (#45363)
* Introduce Spatial Plugin (#44389)

Introduce a skeleton Spatial plugin that holds new licensed features coming to 
Geo/Spatial land!

* [GEO] Refactor DeprecatedParameters in AbstractGeometryFieldMapper (#44923)

Refactor DeprecatedParameters specific to legacy geo_shape out of
AbstractGeometryFieldMapper.TypeParser#parse.

* [SPATIAL] New ShapeFieldMapper for indexing cartesian geometries (#44980)

Add a new ShapeFieldMapper to the xpack spatial module for
indexing arbitrary cartesian geometries using a new field type called shape.
The indexing approach leverages lucene's new XYShape field type which is
backed by BKD in the same manner as LatLonShape but without the WGS84
latitude longitude restrictions. The new field mapper builds on and
extends the refactoring effort in AbstractGeometryFieldMapper and accepts
shapes in either GeoJSON or WKT format (both of which support non geospatial
geometries).

Tests are provided in the ShapeFieldMapperTest class in the same manner
as GeoShapeFieldMapperTests and LegacyGeoShapeFieldMapperTests.
Documentation for how to use the new field type and what parameters are
accepted is included. The QueryBuilder for searching indexed shapes is
provided in a separate commit.

* [SPATIAL] New ShapeQueryBuilder for querying indexed cartesian geometry (#45108)

Add a new ShapeQueryBuilder to the xpack spatial module for
querying arbitrary Cartesian geometries indexed using the new shape field
type.

The query builder extends AbstractGeometryQueryBuilder and leverages the
ShapeQueryProcessor added in the previous field mapper commit.

Tests are provided in ShapeQueryTests in the same manner as
GeoShapeQueryTests and docs are updated to explain how the query works.
2019-08-14 16:35:10 -05:00
Armin Braun e0d84e7178
Clean up Callback Chains and Duplicate in SnapshotResiliencyTests (#45398) (#45563)
* It's in the title, follow up to #45233
* Flatten more listeners into `StepListener`
* Remove duplication from repo and index bootstrap and asserting that the steps execute successfully
2019-08-14 21:53:07 +02:00
Armin Braun 5f6bc6fc2d
Prevent Leaking Search Tasks on Exceptions in FetchSearchPhase and DfsQueryPhase (#45500) (#45540)
* If `counter.onResult` throws an exception we might leak a transport task because the failure is not handled as a phase failure (instead it bubbles up in the transport service eventually hitting the `onFailure` callback again and counting down the `counter` twice).

Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
2019-08-14 14:49:38 +02:00
Armin Braun 00e4fba2fb
Simplify and Optimize RestController Slightly (#45419) (#45485)
* Simplify the path iterator to generate less garbage
* `dispatchRequest` always terminates, adjust code accordingly
2019-08-13 10:43:30 +02:00
Martijn van Groningen 1951cdf1cb
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-13 09:12:31 +02:00
Julie Tibshirani dc1856ca53 Make sure to validate the type before attempting to merge a new mapping. (#45157)
Currently, when adding a new mapping, we attempt to parse + merge it before
checking whether its top-level document type matches the existing type. So when
a user attempts to introduce a new mapping type, we may give a confusing error
message around merging instead of complaining that it's not possible to add
more than one type ("Rejecting mapping update to [my-index] as the final
mapping would have more than 1 type...").

This PR moves the type validation to the start of
`MetaDataMappingService#applyRequest` so that we make sure the type matches
before performing any mapper merging.

We already partially addressed this issue in #29316, but the tests there
focused on `MapperService` and did not catch this problem with end-to-end
mapping updates.

Addresses #43012.
2019-08-12 14:28:03 -07:00
Zachary Tong 4d97d2c50f Revert "Only execute one final reduction in InternalAutoDateHistogram (#45359)"
This reverts commit c0ea8a867e.
2019-08-12 17:17:17 -04:00
Julie Tibshirani 8c4394d5d7 Fix a bug where mappings are dropped from rollover requests. (#45411)
We accidentally introduced this bug when adding a typeless version of the
rollover request. The bug is not present if include_type_name is set to true.
2019-08-12 12:46:27 -07:00
Michael Basnight a521e4c86f Retrieve processors instead of checking existence (#45354)
The previous hasProcessors method would validate if a processor was
present within a pipeline, but would not return the contents of the
processors. This does not allow a consumer to inspect the processor for
specific metadata. The method now returns the list of processors based
on the class of the processor passed in.
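
A rough sketch of the lookup described above, written in Python rather than the project's Java. The class names and the `inner` attribute are hypothetical stand-ins, and the unwrapping of wrapped processors is an assumption about how such a lookup could treat inner processors:

```python
from dataclasses import dataclass
from typing import List, Optional, Type


@dataclass
class Processor:
    tag: str


@dataclass
class SetProcessor(Processor):
    pass


@dataclass
class WrappingProcessor(Processor):
    # Hypothetical stand-in for a processor that holds an inner processor.
    inner: Optional[Processor] = None


def get_processors(pipeline: List[Processor], cls: Type[Processor]) -> List[Processor]:
    """Return every processor of type `cls`, following inner processors too."""
    found = []
    for processor in pipeline:
        current = processor
        while current is not None:
            if isinstance(current, cls):
                found.append(current)
            # Only wrapping processors carry an `inner` attribute.
            current = getattr(current, "inner", None)
    return found
```

Returning the matching processors instead of a boolean lets the caller read their metadata, for example `[p.tag for p in get_processors(pipeline, SetProcessor)]`.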
2019-08-12 13:48:17 -05:00
Zachary Tong 472f6ef41a Mute InternalAutoDateHistogramTests#testReduceRandom() 2019-08-12 14:45:08 -04:00
Zachary Tong c0ea8a867e Only execute one final reduction in InternalAutoDateHistogram (#45359)
Because auto-date-histo can perform multiple reductions while
merging buckets, we need to ensure that the intermediate reductions
are done with a `finalReduce` set to false to prevent Pipeline aggs
from generating their output.

Once all the buckets have been merged and the output is stable,
a mostly-noop reduction can be performed which will allow pipelines
to generate their output.
2019-08-12 14:07:38 -04:00
Albert Zaharovits 2cb172f079
CreateIndex and PutIndexTemplate with typeless mapping (#45120)
This commit makes sure that mapping parameters to `CreateIndex` and
`PutIndexTemplate` are keyed by the type name. 

`IndexCreationTask` expects mappings to be keyed by the type name.
It asserts this for template mappings but not for the mappings in the request.
The `CreateIndexRequest` and `RestCreateIndexAction` mostly make sure
that the mapping is keyed by a type name, but not always.
When building the create-index request outside of the REST handler, there are
a few methods to set the mapping for the request. Some of them add the type
name; some of them do not.
For example, `CreateIndexRequest#mapping(String type, Map<String, ?> source)`
adds the type name, but
`CreateIndexRequest#mapping(String type, XContentBuilder source)` does not.
This PR asserts the type name in the request mapping inside `IndexCreationTask`
and makes all `CreateIndexRequest#mapping` methods add the type name.
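
The normalization can be pictured with a small Python sketch. The helper name `key_by_type` is hypothetical and the check is a simplification of whatever the Java methods do:

```python
from typing import Any, Dict


def key_by_type(type_name: str, source: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap `source` under `type_name` unless it is already keyed that way."""
    if len(source) == 1 and type_name in source:
        return source
    return {type_name: source}
```

With this shape, a mapping passed as `{"properties": {...}}` and one passed as `{"_doc": {"properties": {...}}}` both end up keyed by the type name.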
2019-08-12 08:05:07 +03:00
Armin Braun a9e1402189
Remove Settings from BaseRestRequest Constructor (#45418) (#45429)
* Resolving the todo, cleaning up the unused `settings` parameter
* Cleaning up some other minor dead code in affected classes
2019-08-12 05:14:45 +02:00
Nhat Nguyen cf9a73b5ac Call afterWriteOperation after trim translog in peer recovery (#45182)
testShouldFlushAfterPeerRecovery was added in #28350 to make sure the
flushing loop triggered by afterWriteOperation eventually terminates.
This test relies on the fact that we call afterWriteOperation after
making changes in translog. In #44756, we roll a new generation in
RecoveryTarget#finalizeRecovery but do not call afterWriteOperation.

Relates #28350
Relates #45073
2019-08-10 22:59:02 -04:00
Nhat Nguyen 25c6102101 Trim local translog in peer recovery (#44756)
Today, if an operation-based peer recovery occurs, we won't trim
translog but leave it as is. Some unacknowledged operations existing in
translog of that replica might suddenly reappear when it gets promoted.
With this change, we ensure trimming translog above the starting
sequence number of phase 2. This change can allow us to read translog
forward.
2019-08-10 22:59:02 -04:00
Armin Braun 1cd464d675
Isolate Request in Call-Chain for REST Request Handling (#45130) (#45417)
* Follow up to #44949
* Stop using a special code path for multi-line JSON and instead handle its detection like that of other XContent types when creating the request
* Only leave a single path that holds a reference to the full REST request
   * In the next step we can move the copying of request content to happen before the actual request handling and make it conditional on the handler in question to stop copying bulk requests as suggested in #44564
2019-08-10 10:21:01 +02:00
Armin Braun d1ed9bdbfd
Use StepListener to Simplify SnapshotResiliencyTests (#45233) (#45386)
* Reduces complicated callback relations in `testSuccessfulSnapshotAndRestore` to flat steps of sequential actions
* Will refactor the other tests in this suite as a follow up
   * This format certainly makes it easier to create more complicated tests that involve multiple subsequent snapshots as it would allow adding loops
2019-08-09 18:19:48 +02:00
Yannick Welsch 9e6d874a41
Show BWC version in ClusterFormationFailureHelper (#45352)
When having a cluster state from 6.x, display the metadata version as the cluster state version.
Avoids confusion where a cluster state from 6.x is displayed as version 0 even if it has some actual
content.
2019-08-09 16:23:38 +02:00
Yannick Welsch 5ddeb488a6 Allow _update on write alias (#45318)
Using the document update API on aliases with a write index does not work.

Follow-up to #31520
2019-08-09 11:44:24 +02:00
Martijn van Groningen f1ee29f22e
Added a custom api to perform the msearch more efficiently for enrich processor (#43965)
Currently the msearch api is used to execute buffered search requests;
however the msearch api doesn't deal with search requests in an intelligent way.
It basically executes each search separately in a concurrent manner.

This api reuses the msearch request and response classes and executes
the searches as one request in the node holding the enrich index shard.
Things like engine.searcher and query shard context are only created once.
Also there are fewer layers than executing a regular msearch request. This
results in an interesting speedup.

Without this change, in a single node cluster, enriching documents
with a bulk size of 5000 items, the ingest time in each bulk response
varied from 174ms to 822ms. With this change the ingest time in each
bulk response varied from 54ms to 109ms.

I think we should add a change like this based on this improvement in ingest time.

However I do wonder if instead of doing this change, we should improve
the msearch api to execute more efficiently. That would be more complicated
than this change, because in this change the custom api can only search
enrich index shards and these are special because they always have a single
primary shard. If msearch api is to be improved then that should work for
any search request to any indices. Making the same optimization for
indices with more than 1 primary shard requires much more work.

The current change is isolated in the enrich plugin and LOC / complexity
is small. So this is good enough for now.
2019-08-09 09:11:04 +02:00
Tal Levy 2a99eaa7c2 Revert "removes the CellIdSource abstraction from geo-grid aggs (#45307) (#45353)"
This reverts commit 7b0a8040de.
2019-08-08 17:40:03 -07:00
Armin Braun 12ed6dc999
Only retain reasonable history for peer recoveries (#45208) (#45355)
Today if a shard is not fully allocated we maintain a retention lease for a
lost peer for up to 12 hours, retaining all operations that occur in that time
period so that we can recover this replica using an operations-based recovery
if it returns. However it is not always reasonable to perform an
operations-based recovery on such a replica: if the replica is a very long way
behind the rest of the replication group then it can be much quicker to perform
a file-based recovery instead.

This commit introduces a notion of "reasonable" recoveries. If an
operations-based recovery would involve copying only a small number of
operations, but the index is large, then an operations-based recovery is
reasonable; on the other hand if there are many operations to copy across and
the index itself is relatively small then it makes more sense to perform a
file-based recovery. We measure the size of the index by computing its number
of documents (including deleted documents) in all segments belonging to the
current safe commit, and compare this to the number of operations a lease is
retaining below the local checkpoint of the safe commit. We consider an
operations-based recovery to be reasonable iff it would involve replaying at
most 10% of the documents in the index.

The mechanism for this feature is to expire peer-recovery retention leases
early if they are retaining so much history that an operations-based recovery
using that lease would be unreasonable.
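
The 10% rule above can be written down as a small Python sketch. The function and parameter names are hypothetical, and integer arithmetic is used only to keep the comparison exact:

```python
def is_reasonable_ops_recovery(retained_ops: int, doc_count: int,
                               max_percent: int = 10) -> bool:
    """True if replaying `retained_ops` touches at most `max_percent` of the index.

    `doc_count` stands for the number of documents (including deleted ones)
    in the segments of the current safe commit; `retained_ops` stands for the
    operations a lease retains below that commit's local checkpoint.
    """
    return retained_ops * 100 <= max_percent * doc_count
```

Under this sketch, a lease for which the check returns `False` is a candidate for early expiry, which pushes the returning replica towards a file-based recovery.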

Relates #41536
2019-08-09 01:56:32 +02:00
Tal Levy 7b0a8040de
removes the CellIdSource abstraction from geo-grid aggs (#45307) (#45353)
CellIdSource is a helper ValuesSource that encodes GeoPoint
into a long-encoded representation of the grid bucket the point
is associated with. This complicates things as usage evolves to
support shapes that are associated with more than one bucket ordinal.
2019-08-08 16:33:16 -07:00
Armin Braun b19de55095
Add missing wait to testAutomaticReleaseOfIndexBlock (#45342) (#45351)
Today the test waits for one of the shards to be blocked, but this does not
mean that the block has been applied on all nodes, so a subsequent indexing
operation may still go through.

Fixes #45338
2019-08-08 22:39:22 +02:00
Henning Andersen d139896b66
Reindex share retry between hit sources (#44203) (#45348)
The client and remote hit sources each had their own retry mechanism,
which would do the same. Supporting resiliency we would have to expand
on the retry mechanisms and as a preparation for that, the retry
mechanism is now shared such that each sub class is only responsible for
sending requests and converting responses/failures to common format.

Part of #42612
2019-08-08 22:01:29 +02:00
Christoph Büscher a552b33276 Fix occasional SuggestSearchIT failure (#45330)
Refreshes happening during indexing can result in different segment counts and
slightly skewed term statistics, which in turn has the potential to change
suggestion output slightly. In order to prevent this, disable refresh for the
affected tests.

Closes #43261
2019-08-08 21:06:32 +02:00
Martijn van Groningen bb429d3b5c
required changes after merge 2019-08-08 17:04:18 +02:00
Dimitris Athanasiou e53bb050db Mute testAutomaticReleaseOfIndexBlock
Relates #45338
2019-08-08 17:56:41 +03:00
Martijn van Groningen 708f856940
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-08 16:52:45 +02:00
Andrey Ershov 07c656fba9 Mute testCustomDataPaths on Windows
See #45333

(cherry picked from commit 671e1ad1068aee4b593ad0c8ab13ff60b4f125b8)
2019-08-08 16:26:56 +02:00
Zachary Tong 86d6597890 Use newIndexSearcher() instead of newSearcher() (#45248)
`newSearcher()` from lucene can randomly choose index readers which
are not compatible with our tests, like ParallelCompositeReader.
The `newIndexSearcher()` method on AggregatorTestCase is a wrapper
similar to newSearcher but compatible with our tests
2019-08-08 09:34:38 -04:00
Martijn van Groningen e066133016
Change the ingest simulate api to not include dropped documents (#44161)
If documents are dropped by the `drop` processor then
these documents are returned as a `null` value in the response.

=== Example

Create pipeline:

```
PUT _ingest/pipeline/droppipeline
{
    "processors": [
        {
            "set": {
                "field": "bla",
                "value": "val"
            }
        },
        {
            "drop": {}
        }
    ]
}
```

Simulate request:

```
POST _ingest/pipeline/droppipeline/_simulate
{
    "docs": [
        {
            "_source": {
                "message": "text"
            }
        }
    ]
}
```

Response:

```
{
    "docs": [
        null
    ]
}
```

Response if verbose is enabled:

```
{
    "docs": [
        {
            "processor_results": [
                {
                    "doc": {
                        "_index": "_index",
                        "_type": "_doc",
                        "_id": "_id",
                        "_source": {
                            "message": "text",
                            "bla": "val"
                        },
                        "_ingest": {
                            "timestamp": "2019-07-10T11:07:10.758315Z"
                        }
                    }
                },
                null
            ]
        }
    ]
}
```

Closes #36150

* Abort pipeline simulation in verbose mode when document has been dropped
by drop processor
2019-08-08 13:04:33 +02:00
Martijn van Groningen fb959d188c
Backport: Add description to force-merge tasks (#41365) (#45191)
* Add description to force-merge tasks (#41365)

This is static information that is part of the force merge request.

Relates to #15975
2019-08-08 08:15:09 +02:00
Michael Basnight 89861d0884 Add ingest processor existence helper method (#45156)
This commit adds a helper method to the ingest service allowing it to
inspect a pipeline by id and verify the existence of a processor in the
pipeline. This work exposed a potential bug in that some processors
contain inner processors that are passed in at instantiation. These
processors needed a common way to expose their inner processors, so the
WrappingProcessor was created in order to expose the inner processor.
2019-08-07 11:19:04 -05:00
Bukhtawar cd304c4def Auto-release flood-stage write block (#42559)
If a node exceeds the flood-stage disk watermark then we add a block to all of
its indices to prevent further writes as a last-ditch attempt to prevent the
node completely exhausting its disk space. However today this block remains in
place until manually removed, and this block is a source of confusion for users
who currently have ample disk space and did not even realise they nearly ran out
at some point in the past.

This commit changes our behaviour to automatically remove this block when a
node drops below the high watermark again. The expectation is that the high
watermark is some distance below the flood-stage watermark and therefore the
disk space problem is truly resolved.
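
The resulting behaviour is a simple hysteresis, sketched here in Python. The function name is hypothetical and the 90% / 95% thresholds are illustrative values, not taken from this commit:

```python
def update_write_block(blocked: bool, disk_used_percent: float,
                       high_watermark: float = 90.0,
                       flood_stage: float = 95.0) -> bool:
    """Return whether the write block should be in place after this check."""
    if disk_used_percent > flood_stage:
        # Above flood stage: add (or keep) the block.
        return True
    if blocked and disk_used_percent < high_watermark:
        # Back below the high watermark: release the block automatically.
        return False
    # Between the two watermarks: leave the block as it is.
    return blocked
```

The gap between the two thresholds is what keeps the block from flapping when disk usage hovers near the flood-stage watermark.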

Fixes #39334
2019-08-07 11:03:53 +01:00
Tanguy Leroux a869342910 Restore DefaultShardOperationFailedException's reason after deserialization (#45203)
The reason field of DefaultShardOperationFailedException is lost during serialization. 
This is sad because this field is checked for nullity during xcontent generation and it 
means that the cause won't be included in the generated xcontent and won't be 
printed in two REST API responses (Close Index API and Indices Shard Stores API).

This commit simply restores the reason from the cause during deserialization.
2019-08-07 10:37:15 +02:00
Jason Tedor bd59ee6c72
Fix clock used in update requests (#45262)
We accidentally switched to using the relative time provider here. This
commit fixes this by switching to the appropriate absolute clock.
2019-08-06 21:15:21 -04:00
David Turner f5d1381e01 Remove always-true param from IndicesService#stats (#45231)
Parameter `includePrevious` is always true, so this commit inlines it.
2019-08-06 17:22:11 +01:00
David Turner 355713b9ca
Improve slow logging in MasterService (#45241)
Adds a tighter threshold for logging a warning about slowness in the
`MasterService` instead of relying on the cluster service's 30-second warning
threshold. This new threshold applies to the computation of the cluster state
update in isolation, so we get a warning if computing a new cluster state
update takes longer than 10 seconds even if it is subsequently applied quickly.
It also applies independently to the length of time it takes to notify the
cluster state tasks on completion of publication, in case any of these
notifications holds up the master thread for too long.

Relates #45007
Backport of #45086
2019-08-06 17:01:49 +01:00
Tanguy Leroux 772ce1f599
Add deprecation warning for Force Merge API (#44903)
This commit adds a deprecation warning in 7.x for the Force Merge API 
when both only_expunge_deletes and max_num_segments are set in a request.

Relates #44761
2019-08-06 16:04:24 +02:00
Jason Tedor 5b1b146099
Normalize environment paths (#45179)
This commit applies a normalization process to environment paths, both
in how they are stored internally and in their settings values. This
normalization is done via two means:
 - we make the paths absolute
 - we remove redundant name elements from the path (what Java calls
   "normalization")

This change ensures that when we compare and refer to these paths within
the system, we are using a common ground. For example, prior to the
change if the data path was relative, we would not compare it correctly
to paths from disk usage. This is because the paths in disk usage were
being made absolute.
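
The two steps can be illustrated with a Python analogue; the commit itself concerns Java paths, and the helper name here is hypothetical:

```python
import os


def normalize_path(path: str) -> str:
    """Make `path` absolute, then drop redundant elements such as '.' and '..'."""
    return os.path.normpath(os.path.abspath(path))
```

After normalization, a relative path such as `data/./nodes/..` and the absolute path of the `data` directory under the working directory compare equal as plain strings.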
2019-08-06 06:04:30 -04:00
Yannick Welsch 7aeb2fe73c Add per-socket keepalive options (#44055)
Uses JDK 11's per-socket configuration of TCP keepalive (supported on Linux and Mac), see
https://bugs.openjdk.java.net/browse/JDK-8194298, and exposes these as transport settings.
By default, these options are disabled for now (i.e. fall-back to OS behavior), but we would like
to explore whether we can enable them by default, in particular to force keepalive configurations
that are better tuned for running ES.
2019-08-06 10:45:44 +02:00
Igor Motov b5f88120b5 Geo: add Geometry-based query builders to QueryBuilders (#45058)
Add Geometry-based method for creation of query builders in
QueryBuilders

Relates to #44715
2019-08-05 13:34:48 -04:00
Zachary Tong 3df1c76f9b Allow pipeline aggs to select specific buckets from multi-bucket aggs (#44179)
This adjusts the `buckets_path` parser so that pipeline aggs can
select specific buckets (via their bucket keys) instead of fetching
the entire set of buckets.  This is useful for bucket_script in
particular, which might want specific buckets for calculations.

It's possible to work around this with `filter` aggs, but the workaround
is hacky and probably less performant.

- Adjusts documentation
- Adds a barebones AggregatorTestCase for bucket_script
- Tweaks AggTestCase to use getMockScriptService() for reductions and
pipelines.  Previously pipelines could just pass in a script service
for testing, but this didn't work for regular aggs.  The new
getMockScriptService() method fixes that issue, but needs to be used
for pipelines too.  This had a knock-on effect of touching MovFn,
AvgBucket and ScriptedMetric
2019-08-05 12:18:40 -04:00
Zachary Tong e5079ac288
[7.x backport] Add more flexibility to MovingFunction window alignment (#45159)
Introduce shift field to MovingFunction aggregation.

By default, shift = 0. Behavior, in this case, is the same as before.
Increasing shift by 1 moves starting window position by 1 to the right.

    To simply include the current bucket in the window, use shift = 1
    For center alignment (n/2 values before and after the current bucket), use shift = window / 2
    For right alignment (n values after the current bucket), use shift = window.
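
A minimal Python sketch of the alignment rule, assuming the window for the bucket at position `index` covers the half-open range `[index - window + shift, index + shift)` clipped to the available values. The function name is hypothetical:

```python
from typing import List, Sequence


def window_values(values: Sequence[float], index: int, window: int,
                  shift: int = 0) -> List[float]:
    """Return the values a moving function would see for bucket `index`."""
    # Clamp both ends so the window never reaches outside the series.
    start = max(0, min(index - window + shift, len(values)))
    end = max(0, min(index + shift, len(values)))
    return list(values[start:end])
```

With `shift = 0` the current bucket is excluded, as before; each increment of `shift` slides the same-sized window one bucket to the right.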
2019-08-05 11:56:52 -04:00
Nhat Nguyen 56083ba1ff Remove assertion after locally recover replica (#45181)
If the disk becomes broken after we have locally recovered shard up to
the global checkpoint, then the assertion won't hold.
2019-08-05 10:48:02 -04:00
David Turner 13a167051f
Remove fileBasedRecovery flag (#45146)
Today `RecoveryTarget#prepareForTranslogOperations` takes a boolean flag
indicating whether the recovery is file-based or not. This was used in 6.x to
bootstrap some commit data that were missing in indices created in 5.x:

b506955f8d/server/src/main/java/org/elasticsearch/indices/recovery/RecoveryTarget.java (L298-L300)

This flag no longer has any effect, so this commit removes it.

Backport of #45131 to 7.x.
2019-08-05 08:17:40 +01:00
Armin Braun 41815ed614
Optimize StreamInput#readString (#44930) (#45180)
* Resolve TODO in `readString` by moving to reading chunks of `byte[]` instead of going byte by byte
* Motivated by `readString` showing up as a significant user of CPU time on the IO thread in Rally PMC benchmark
* Benchmarking this:
  * Could not reproduce a slowdown in the potential worst case (one or two non-ascii chars) since in this case the cost of creating the string itself exceeds the read times anyway
  * Speedup of 50%+ for reading 200 char ascii strings from `ByteBuf` or pages bytes backed streams
  * Longer strings obviously get bigger speedups
  * More ascii chars -> more speedup
2019-08-05 07:22:42 +02:00
Jason Tedor d78ecd9c09
Use the full hash in build info (#45163)
This commit switches to using the full hash to build into the JAR
manifest, which is used in node startup and the REST main action to
display the build hash.
2019-08-03 11:27:53 -04:00
Tim Brooks 984ba82251
Move nio channel initialization to event loop (#45155)
Currently in the transport-nio work we connect and bind channels on a
thread before the channel is registered with a selector. Additionally,
it is at this point that we set all the socket options. This commit
moves these operations onto the event-loop after the channel has been
registered with a selector. It attempts to set the socket options for a
non-server channel at registration time. If that fails, it will attempt
to set the options after the channel is connected. This should fix
#41071.
2019-08-02 17:31:31 -04:00
Zachary Tong ffbe047c32 Revert "Add more flexibility to MovingFunction window alignment (#44360)"
This reverts commit 1a58a487f0.
2019-08-02 15:16:04 -04:00
Nikita Glashenko 1a58a487f0 Add more flexibility to MovingFunction window alignment (#44360)
Introduce shift field to MovingFunction aggregation.

By default, shift = 0. Behavior, in this case, is the same as before.
Increasing shift by 1 moves starting window position by 1 to the right.

    To simply include the current bucket in the window, use shift = 1
    For center alignment (n/2 values before and after the current bucket), use shift = window / 2
    For right alignment (n values after the current bucket), use shift = window.
2019-08-02 15:10:21 -04:00
David Turner 9ff320d967
Use index for peer recovery instead of translog (#45137)
Today we recover a replica by copying operations from the primary's translog.
However we also retain some historical operations in the index itself, as long
as soft-deletes are enabled. This commit adjusts peer recovery to use the
operations in the index for recovery rather than those in the translog, and
ensures that the replication group retains enough history for use in peer
recovery by means of retention leases.

Reverts #38904 and #42211
Relates #41536
Backport of #45136 to 7.x.
2019-08-02 15:00:43 +01:00
Armin Braun 9450505d5b
Stop Passing Around REST Request in Multiple Spots (#44949) (#45109)
* Stop Passing Around REST Request in Multiple Spots

* Motivated by #44564
  * We are currently passing the REST request object around to a large number of places. This works fine since we simply copy the full request content before we handle the rest itself which is needlessly hard on GC and heap.
  * This PR removes a number of spots where the request is passed around needlessly. There are many more spots to optimize in follow-ups to this, but this one would already enable bypassing the request copying for some error paths in a follow up.
2019-08-02 07:31:38 +02:00
Jim Ferenczi 3f94e2ea43 Sparse role queries can throw an NPE (#45053)
Sparse role queries are executed differently than other queries in order
to account for the fact that most of the documents are filtered from search.
However this special execution does not set the scorer for the query so any
collector that needs to access the score of a document fails with an NPE.
This change fixes this bug by setting the scorer before collecting any hits
when intersecting the main query and the sparse role.
2019-08-01 20:21:53 +02:00
William Brafford 5f50da947a
Fix bug in the Settings#processSetting method (#45095)
The Settings#processSetting method is intended to take a setting map and add a
setting to it, adjusting the keys as it goes in case of "conflicts" where the
new setting implies an object where there is currently a string, or vice
versa. processSetting was failing in two cases: adding a setting two levels
under a string, and adding a setting two levels under a string and four levels
under a map. This commit fixes the bug and adds test coverage for the
previously faulty edge cases.

* fix issue #43791 about settings
* add unit test in testProcessSetting()
2019-08-01 13:27:08 -04:00
Yannick Welsch 917510d3e4 Always use primary term of operation in InternalEngine (#45083)
We keep adding the current primary term to operations for which we do not assign a sequence
number. This does not make sense anymore as all operations which we care about have
sequence numbers now. The goal of this commit is to clean things up in InternalEngine and
reduce the complexity.
2019-08-01 17:30:00 +02:00
Armin Braun 48dc53f8d2
Make PathTrieIterator a Little more Memory Efficient (#44951) (#45070)
* There's no need to have the trie iterator hold another reference to the request object (which could be huge, see #44564)
* Also removed unused boolean field from trie node
2019-08-01 17:26:08 +02:00
Nhat Nguyen 3a487379c3 Tighten no pending scheduled refresh check (#45025)
Previously, we used ThreadPoolStats to ensure that the scheduledRefresh
triggered by the internal refresh setting update is executed before we
index a new document. With that change (#40387), this test did not fail for 
the last 3 months. However, using ThreadPoolStats is not entirely watertight
as both "active" and "queue" count can be 0 in a very small interval
when ThreadPoolExecutor pulls a task from the queue but before marking
the corresponding worker as active (i.e., lock it).

Closes #39565
2019-08-01 09:06:22 -04:00
David Turner c088bafbbc Wait for events in waitForRelocation (#45074)
Adds a `waitForEvents(Priority.LANGUID)` to the cluster health request in
`ESIntegTestCase#waitForRelocation()` to deal with the case that this health
request returns successfully despite the fact that there is a pending reroute task which
will relocate another shard.

Relates #44433
Fixes #45003
2019-08-01 13:47:39 +01:00
David Turner 532ade7816 More logging for slow cluster state application (#45007)
Today the lag detector may remove nodes from the cluster if they fail to apply
a cluster state within a reasonable timeframe, but it is rather unclear from
the default logging that this has occurred and there is very little extra
information beyond the fact that the removed node was lagging. Moreover the
only forewarning that the lag detector might be invoked is a message indicating
that cluster state publication took unreasonably long, which does not contain
enough information to investigate the problem further.

This commit adds a good deal more detail to make the issues of slow nodes more
prominent:

- after 10 seconds (by default) we log an INFO message indicating that a
  publication is still waiting for responses from some nodes, including the
  identities of the problematic nodes.

- when the publication times out after 30 seconds (by default) we log a WARN
  message identifying the nodes that are still pending.

- the lag detector logs a more detailed warning when a fatally-lagging node is
  detected.

- if applying a cluster state takes too long then the cluster applier service
  logs a breakdown of all the tasks it ran as part of that process.
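
The two publication thresholds from the list above can be sketched as follows, in Python with hypothetical names. The 10 and 30 second values are the defaults mentioned in this message:

```python
from typing import Optional


def publication_log_level(elapsed_seconds: float,
                          info_after: float = 10.0,
                          timeout_after: float = 30.0) -> Optional[str]:
    """Return the log level for a publication that is still pending, if any."""
    if elapsed_seconds >= timeout_after:
        # Publication timed out: warn and name the nodes still pending.
        return "WARN"
    if elapsed_seconds >= info_after:
        # Still waiting on some nodes: report them at INFO.
        return "INFO"
    return None
```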
2019-08-01 13:20:46 +01:00
Hendrik Muhs b3be8f75f0 Fix version logic after 7.3 release (BWC) (#45077)
removes unreleased version 7.2.2 after release of 7.3.0 as it breaks the version verifier, add documentation that explains the logic
2019-08-01 12:43:23 +02:00
Christoph Büscher a669efd2a4
Remove left-over AwaitsFix in RateClusterStateIT (#45043)
Issues are closed and fixes in #42580 and #42430 seem to be merged to 7.x at
least.
2019-08-01 12:03:29 +02:00
Martijn van Groningen 39f280364b
required change after merging in 7.x branch 2019-08-01 13:44:42 +07:00
Martijn van Groningen aae2f0cff2
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-08-01 13:38:03 +07:00
Tim Brooks aff66e3ac5
Add Cors integration tests (#44361)
This commit adds integration tests to ensure that the basic cors
functionality works for the netty and nio transports.
2019-07-31 14:24:23 -06:00
Armin Braun 8d63bd1d1e
Cleanup Various Action- Listener and Runnable Usages (#42273) (#45052)
* Dry up code for creating simple `ActionRunnable` a little
* Shorten some other code around `ActionListener` usage, in particular
when wrapping it in a `TransportResponseListener`
2019-07-31 18:55:31 +02:00
Armin Braun ee663dc9ac
Reenable Parallel Restore Test on Windows (#45037) (#45050)
* As a result of #44096 this test shouldn't fail anymore on `master` and `7.4`+ so we should reenable it there
  * For older versions we won't backport that change so the tests should stay disabled there
* Closes #44671
2019-07-31 18:35:34 +02:00
Christoph Büscher 35291ae175
Remove muted AckIT and AckClusterUpdateSettingsIT (#45044)
Reading up on #33673 it looks like parts of these tests have been reworked and
there is no intention to fix the remains on 7.x, so I think we can remove the
entire test.
2019-07-31 17:17:21 +02:00
Luca Cavanna 8cc3c0dd93 Remove task null check in TransportAction (#45014)
The task that TaskManager#register returns cannot be null. The method
enforces that it is not null after calling request#createTask. It is
then needless to check for null in the listener later. Also, added the
call to the delegate listener in a finally block, just to make sure.
2019-07-31 17:16:41 +02:00
Christoph Büscher e85b53a955
Remove left-over AwaitsFix in DedicatedClusterSnapshotRestoreIT (#45042)
The issue mentioned (#38845) seems to have been closed with #38891 so the test
can be re-activated.
2019-07-31 17:15:41 +02:00
Armin Braun c7d7230524
Stop Recreating Wrapped Handlers in RestController (#44964) (#45040)
* We shouldn't be recreating wrapped REST handlers over and over for every request. We only use this hook in x-pack and the wrapper there does not have any per request state.
  This is inefficient and could lead to some very unexpected memory behavior
   => I made the logic create the wrapper on handler registration and adjusted the x-pack wrapper implementation to correctly forward the circuit breaker and content stream flags
2019-07-31 17:11:34 +02:00
Zachary Tong c25f3dd5d0
Introduce 7.3.1 version (#45046) 2019-07-31 10:53:55 -04:00
Andrey Ershov c27ac3d24c Unmute testClusterJoinDespiteOfPublishingIssues and testElectMasterWithLatestVersion (#38555)
See my comments for #37539 and #37685

(cherry picked from commit 038d4ab2940340eca942e32b54044f183b7804d9)
2019-07-31 14:55:02 +02:00
David Roberts 5e3010a606 Use system context for looking up connected nodes (#43991)
When finding nodes in a connected cluster for cross cluster
search the requests to get cluster state on the connected
cluster should be made in the system context because
logically they are equivalent to checking a single detail
in the local cluster state and should not require that the
user who made the request that is using this method in its
implementation is authorized to view the entire cluster
state.

Fixes #43974
2019-07-31 09:09:56 +01:00
Igor Motov 1a1bb4707d Geo: move indexShape to AbstractGeometryFieldMapper.Indexer (#44979)
Move indexShape functionality into AbstractGeometryFieldMapper to make
it more unit testable.

Relates to #43644
2019-07-30 14:50:23 -04:00
Mayya Sharipova a154b73d99 Assure index ops are successful for SimpleNestedIT (#44815)
relates to #44486
2019-07-30 14:24:28 -04:00
Nhat Nguyen 979d0a71c7 Remove leniency during replay translog in peer recovery (#44989)
This change removes leniency in InternalEngine during replaying translog
in peer recovery.
2019-07-30 13:25:15 -04:00
Jake Landis 41a99c9e4a introduce 7.2.2 as a version (#44371)
* introduce 7.2.2 as a version
2019-07-30 18:52:34 +02:00
Jake Landis 03fea1c503 introduce 6.8.3 as a version (#44708) 2019-07-30 18:48:41 +02:00
David Kyle 78aa6143a6 Mute FilteringAllocationIT testTransientSettingsStillApplied
Relates to https://github.com/elastic/elasticsearch/issues/45003
2019-07-30 14:10:50 +01:00
Yannick Welsch c1b569ed4b Revert "Mute Zen1IT#testMixedClusterDisruption"
This reverts commit cf78ca58e3.
2019-07-30 13:10:14 +02:00
David Turner 55f1dd8da6 Close nodes properly in Coordinator tests (#44967)
Today closing a `ClusterNode` in an `AbstractCoordinatorTestCase` uses
`onNode()` so has no effect if the node is not in the current list of nodes.
It also discards the `Runnable` it creates without having run it, so has no
effect anyway.

This commit makes these tests much stricter about properly closing the nodes
started during `Coordinator` tests, by tracking the persisted states that are
opened, and adds an assertion to catch the trappy requirement that the closing
node still belongs to the cluster.
2019-07-30 11:47:36 +01:00
David Kyle cf78ca58e3 Mute Zen1IT#testMixedClusterDisruption 2019-07-30 11:33:39 +01:00
Jim Ferenczi 43bd8f2ba0 Fix aggregators early termination with breadth-first mode (#44963)
This commit fixes a bug when a deferred aggregator tries to terminate the collection early. In such a case the CollectionTerminatedException is not caught and
the search fails on the shard. This change makes sure that we catch the exception in order to continue the deferred collection on the next leaf.

Fixes #44909
2019-07-30 11:26:40 +02:00
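The fix described in #44963 above can be illustrated with a minimal Python sketch. This is not the Elasticsearch Java code: `CollectionTerminated`, `replay_deferred`, and `make_leaf_collector` are hypothetical names chosen for illustration. It only shows the idea of catching the early-termination signal per leaf so that collection continues on the next leaf.

```python
class CollectionTerminated(Exception):
    """Hypothetical stand-in for an early-termination signal raised by a collector."""


def replay_deferred(leaves, make_leaf_collector):
    """Replay documents leaf by leaf.

    If the collector for one leaf terminates early, skip the rest of that
    leaf and carry on with the next one instead of failing the whole replay.
    """
    for leaf in leaves:
        collect = make_leaf_collector(leaf)
        try:
            for doc in leaf:
                collect(doc)
        except CollectionTerminated:
            # Early termination only ends the current leaf.
            continue
```

Without the `try`/`except`, the first leaf that terminates early would abort the loop and no later leaf would be visited.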
Andrey Ershov 5a0bd696fc
Snapshot tool S3 cleanup 7.x backport (#44575)
Backport of #44551
2019-07-30 11:02:08 +02:00
Nhat Nguyen 4813728783 Remove leniency in reset engine from translog (#44711)
Replaying operations from the local translog must never fail as those
operations were processed successfully on the primary before and the
mapping is up to date already. This change removes leniency during
resetting engine from translog in IndexShard and InternalEngine.
2019-07-29 16:31:45 -04:00
Jack Conradson 1a21682ed0 Fix JodaCompatibleZonedDateTime casts in Painless (#44874)
This is a temporary fix during the Joda to Java datetime transition. This will 
implicitly cast a JodaCompatibleZonedDateTime to a ZonedDateTime for 
both def and static types. This is necessary to insulate users from needing 
to know about JodaCompatibleZonedDateTime explicitly.
2019-07-29 12:05:26 -07:00
Igor Motov b6cef227a5 Geo: fix geo query decomposition (#44924)
The recent refactoring introduced an issue where queries were not
going through the decomposition processing.

Fixes #44891
2019-07-29 11:48:24 -04:00
Luca Cavanna a3cc32da64 TaskListener#onFailure to accept Exception instead of Throwable (#44946)
Today TaskListener accepts a Throwable in its onFailure method. However,
looking at where it is called (TransportAction), it can never be
notified of a Throwable.

This commit changes the signature of TaskListener#onFailure so that it
accepts an `Exception` rather than a `Throwable` as second argument.
2019-07-29 16:47:19 +02:00
Michał Perlak 245c9b7914 Optimize Min and Max BKD optimizations (#44315)
MinAggregator - skip BKD optimization when no result found after 1024 lookups.
MaxAggregator - skip unnecessary conversions.
2019-07-29 10:04:39 -04:00
Yannick Welsch 24873dd3e3 Do not block transport thread on startup (#44939)
We currently block the transport thread on startup, which has caused test failures. I think this is
some kind of deadlock situation. I don't think we should even block a transport thread, and
there's also no need to do so. We can just reject requests as long as we're not fully set up. Note
that the HTTP layer is only started much later (after we've completed full start up of the
transport layer), so that one should be completely unaffected by this.

Closes #41745
2019-07-29 11:35:17 +02:00
Armin Braun f5efafd4d6
Cleanup Deadcode o.e.indices (#44931) (#44938)
* none of this is used anywhere
2019-07-29 10:38:35 +02:00
Martijn van Groningen db49cb505e
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-07-29 14:45:10 +07:00
Igor Motov cfc8d17bb4 Geo: refactor geo mapper and query builder (#44884)
Refactors out the indexing and query generation logic out of the
mapper and query builder into a separate unit-testable classes.
2019-07-26 16:48:31 -04:00
Yannick Welsch 1561ab5420 Guard open connection call in RemoteClusterConnection (#44921)
Fixes an issue where a call to openConnection was not properly guarded, allowing an exception
to bubble up to the uncaught exception handler, causing test failures.

Closes #44912
2019-07-26 22:27:45 +02:00
Tanguy Leroux e1b626b947 Ensure index is green in SimpleClusterStateIT.testIndicesOptions() (#44893)
SimpleClusterStateIT testIndicesOptions failed in #44817 because it tries to close 
an index at the beginning of the test. With random index settings, it is possible that 
the index has a high number of shards (10) and replicas (1), which means that on 
CI this index can take time to be fully allocated.

The close index request can fail in the case where replicas are still recovering operations. 
This commit adds a simple ensureGreen() at the beginning of the test to be sure that all
replicas are started before trying to close the index.

closes #44817
2019-07-26 17:07:53 +02:00
Armin Braun 1340ff19bc
Fix Test Failure in ScalingThreadPoolTests (#44898) (#44901)
* Due to #44894 some constellations log a deprecation warning here now
* Fixed by checking for that
2019-07-26 17:05:50 +02:00
Tanguy Leroux 8848fcfb22 Ensure cluster is stable in ShrinkIndexIT.testShrinkThenSplitWithFailedNode (#44860)
The test ShrinkIndexIT.testShrinkThenSplitWithFailedNode sometimes fails 
because the resize operation is not acknowledged (see #44736). This resize 
operation creates a new index "splitagain" and it results in a cluster state 
update (TransportResizeAction uses MetaDataCreateIndexService.createIndex() 
to create the resized index). This cluster state update is expected to be 
acknowledged by all nodes (see IndexCreationTask.onAllNodesAcked()) but 
this is not always true: the data node that was just stopped in the test before 
executing the resize operation might still be considered as a "faulty" node
 (and not yet removed from the cluster nodes) by the FollowersChecker. The 
cluster state is then acked on all nodes but one, and it results in a non 
acknowledged resize operation.

This commit adds an ensureStableCluster() check after stopping the node in 
the test. The goal is to ensure that the data node has been correctly removed 
from the cluster and that all nodes are fully connected to each other before moving
forward with the resize operation.

Closes #44736
2019-07-26 10:14:27 +02:00
Jason Tedor 6ea2b5dec0
Deprecate setting processors to more than available (#44889)
Today the processors setting is permitted to be set to more than the
number of processors available to the JVM. The processors setting
directly sizes the number of threads in the various thread pools, with
most of these sizes being a linear function in the number of
processors. It doesn't make any sense to set processors very high as the
overhead from context switching amongst all the threads will overwhelm,
and changing the setting does not control how many physical CPU
resources there are on which to schedule the additional threads. We have
to draw a line somewhere and this commit deprecates setting processors
to more than the number of available processors. This is the right place
to draw the line given the linear growth as a function of processors in
most of the thread pools, and that some are capped at the number of
available processors already.
2019-07-26 17:06:44 +09:00
Ignacio Vera 821f6f893b
Upgrade to Lucene 8.2.0 release (#44859) (#44892) 2019-07-26 08:14:59 +02:00
Nhat Nguyen d128188c28 Return seq_no and primary_term in noop update (#44603)
With this change, we will return primary_term and seq_no of the current
document if an update is detected as a noop. We already return the
version; hence we should also return seq_no and primary_term.

Relates #42497
2019-07-25 19:16:56 -04:00
Yannick Welsch bd8470e738 Asynchronously connect to remote clusters (#44825)
Refactors RemoteClusterConnection so that it no longer blockingly connects to remote clusters.

Relates to #40150
2019-07-25 22:59:59 +02:00
Yannick Welsch 0ce841915c Add Clone Index API (#44267)
Adds an API to clone an index. This is similar to the index split and shrink APIs, just with the
difference that the number of primary shards is kept the same. In cases where the filesystem
provides hard-linking capabilities, this is a very cheap operation.

Index cloning can be done by running `POST my_source_index/_clone/my_target_index` and it
supports the same options as the split and shrink APIs.

Closes #44128
2019-07-25 22:02:28 +02:00
Ryan Ernst 03dd22b56c Add missing ZonedDateTime methods for joda compat layer (#44829)
While joda no longer exists in the apis for 7.x, the compatibility layer
still exists with helper methods mimicking the behavior of joda for
ZonedDateTime objects returned for date fields in scripts. This layer
was originally intended to be removed in 7.0, but is now likely to exist
for the lifetime of 7.x.

This commit adds missing methods from ChronoZonedDateTime to the compat
class. These methods were not part of joda, but are needed to act like a
real ZonedDateTime.

relates #44411
2019-07-25 11:45:57 -07:00
Julie Tibshirani acb7f599a3 Fix an NPE when requesting inner hits and _source is disabled. (#44836)
This PR makes two changes to FetchSourceSubPhase when _source is disabled and
we're in a nested context:
* If no source filters are provided, return early to avoid an NPE.
* If there are source filters, make sure to throw an exception.

The behavior was chosen to match what currently happens in a non-nested context.
2019-07-25 10:38:00 -07:00
James Baiera c5528a25e6 Merge branch '7.x' into enrich-7.x 2019-07-25 13:12:56 -04:00
Nicholas Knize 48757da6e1 [GEO] Fix GeoShapeQueryBuilder to check for valid spatial relations
Refactor left out the spatial strategy check in GeoShapeQueryBuilder.relation
setter method. This commit adds that check back in.
2019-07-25 11:32:13 -05:00
Nick Knize 133f848e9f [Geo] Refactor GeoShapeQueryBuilder to derive from AbstractGeometryQueryBuilder (#44780)
Refactors GeoShapeQueryBuilder to derive from a new AbstractGeometryQueryBuilder that provides common parsing and build logic for spatial geometries. This will allow development of custom geometry queries by extending AbstractGeometryQueryBuilder preventing duplication of common spatial query logic.
2019-07-25 11:32:13 -05:00
Armin Braun 383d7b7713
Cleanup Dead Code in Index Creation (#44784) (#44822)
* Cleanup Dead Code in Index Creation
* This is all unused and the state of a create request is always `OPEN`
2019-07-25 10:50:04 +02:00
Yannick Welsch e0d4544ef6 Close connection manager on current thread in RemoteClusterConnection (#44805)
The problem is that RemoteClusterConnection closes the connection manager asynchronously, which races with the threadpool being shutdown at the end of the test.

Closes #44339
Closes #44610
2019-07-25 09:34:41 +02:00
Igor Motov f9943a3e53 Geo: deprecate ShapeBuilder in QueryBuilders (#44715)
Removes now-unnecessary timeline decompositions from shape builders
and deprecates ShapeBuilders in QueryBuilder in favor of libs/geo
shapes.

Relates to #40908
2019-07-24 14:27:58 -04:00
David Turner 4cfd2fc6b2 Fix testFirstListElementsToCommaDelimitedStringReportsFirstElementsIfLong (#44785)
This test can fail (super-rarely) if it generates a list of length 11
containing a duplicate, because the `.distinct()` reduces the list length to 10
and then it is not abbreviated any more. This change generalises the test to
cover lists of any random length.
2019-07-24 16:10:41 +01:00
Tanguy Leroux a8905ef142
[7.x] Add CloseIndexResponse to HLRC (#44349) (#44788)
The CloseIndexResponse was improved in #39687; this commit
exposes it in the HLRC.

Backport of #44349 to 7.x.
2019-07-24 15:51:01 +02:00
Dimitris Athanasiou 5453188cef [TEST] Mute SharedClusterSnapshotRestoreIT.testParallelRestoreOperationsFromSingleSnapshot
This was supposed to be muted in #44675 and its backports but that PR accidentally muted
another test.

Relates #44671
2019-07-24 14:28:09 +03:00
Armin Braun 4a3218551c
Fix ConnectionManagerTests (#44769) (#44789)
* In both fake connection validators we were potentially executing the listener twice. This led to the situation that the locking via `connectionLock`, which ensures that each listener is only ever executed once,
would fail and the listener would run twice (in which case the listeners for that node are already `null` and we get an NPE)
* The fact that two different tests fail is due to the fact that we weren't safely shutting down the threadpool, which meant that the task that trips the assertion (on the generic pool) would leak into the next test and fail it
* Closes #44758
2019-07-24 13:12:57 +02:00
Jason Tedor 4c77d5e2c7
Remove stale permissions from untrusted policy (#44783)
We have some old permissions lying around, granted to untrusted code
from the days of yore when we supported Groovy and Javascript
scripting. This commit removes these stale permissions.
2019-07-24 15:59:16 +09:00
Jason Tedor 659ebf6cfb
Notify systemd when Elasticsearch is ready (#44673)
Today our systemd service defaults to a service type of simple. This
means that systemd assumes Elasticsearch is ready as soon as the
ExecStart (bin/elasticsearch) process is forked off. This means that the
service appears ready long before it actually is, so before it is ready
to receive requests. It also means that services that want to depend on
Elasticsearch being ready to start can not as there is not a reliable
mechanism to determine this. This commit changes the service type to
notify. This requires that Elasticsearch sends a notification message
via libsystemd sd_notify method. This commit does that by using JNA to
invoke this native method. Additionally, we use this integration to also
notify systemd when we are stopping.
2019-07-24 14:04:36 +09:00
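As a rough illustration of the notify mechanism described in #44673 above: the commit calls libsystemd's `sd_notify` through JNA, whereas the Python sketch below swaps in a different technique and speaks the underlying datagram protocol directly (send a state string such as `READY=1` or `STOPPING=1` to the socket named by the `NOTIFY_SOCKET` environment variable). The function name `notify_systemd` is hypothetical, and the sketch assumes a platform with Unix domain sockets.

```python
import os
import socket


def notify_systemd(state: str) -> bool:
    """Send a state string to the service manager, if one is listening.

    Returns False when NOTIFY_SOCKET is not set (for example when the
    process was not started by systemd with Type=notify).
    """
    address = os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False
    raw = os.fsencode(address)
    if raw.startswith(b"@"):
        # A leading "@" denotes a Linux abstract-namespace socket,
        # which is addressed with a leading NUL byte.
        raw = b"\0" + raw[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), raw)
    return True
```

A service would call `notify_systemd("READY=1")` once it can accept requests and `notify_systemd("STOPPING=1")` when shutdown begins.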
Armin Braun 818103ff1e
Fix testRetentionLeasesClearedOnRestore (#44754) (#44766)
* Fix this test randomly failing when running into async translog persistence edge case and failing to successfully close index
* Also, slightly improve debug logging on close failure
* Closes #44681
2019-07-23 21:29:07 +02:00
Igor Motov 9338fc8536 GEO: Switch to using GeoTestUtil to generate random geo shapes (#44635)
Switches to a more robust way of generating random test geometries by
reusing lucene's GeoTestUtil. Removes duplicate random geometry
generators by moving them to the test framework.

Closes #37278
2019-07-23 14:30:41 -04:00
Armin Braun e5bd3ad0e9
Remove some Dead Code in o.e.transport (#44653) (#44734)
* None of this is used
2019-07-23 10:52:37 +02:00
David Turner ee23968f05 Ignore unknown fields if overriding node metadata (#44689)
The `elasticsearch-node override-version` command fails if it cannot read the
existing node metadata file. However, it reads this file strictly and fails if
there are any unknown fields, which means it will not be useful if we add
another field in future.

This commit adds leniency to this command, allowing it to ignore any unknown
fields and proceed with the downgrade. A downgrade is already unsafe, and the
user is already copiously warned about this, so being lenient in this case does
not make things much worse.
2019-07-23 08:54:58 +01:00
Jason Tedor 6928a315c4
Check shard limit after applying index templates (#44619)
Today when creating an index and checking cluster shard limits, we check
the number of shards before applying index templates. At this point, we
do not know the actual number of shards that will be used to create the
index. In a case when the defaults are used and a template would
override, we could be grossly underestimating the number of shards that
would be created, and thus incorrectly applying the limits. This commit
addresses this by checking the shard limits after applying index
templates.
2019-07-23 16:50:42 +09:00
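To make the ordering issue in #44619 above concrete, here is a small Python sketch. It is not the Elasticsearch implementation: `resolve_settings`, `check_shard_limit`, the setting names, and the defaults of one shard and one replica are assumptions made for illustration. The point is only that the limit check must run on the settings that result after templates are applied.

```python
# Assumed defaults for illustration only.
DEFAULTS = {"number_of_shards": 1, "number_of_replicas": 1}


def resolve_settings(request_settings, matching_templates):
    """Merge defaults, then templates (lowest precedence first), then the request."""
    resolved = dict(DEFAULTS)
    for template in matching_templates:
        resolved.update(template)
    resolved.update(request_settings)
    return resolved


def check_shard_limit(settings, current_shards, max_shards):
    """Return the number of shards the new index adds, or raise if over the limit."""
    requested = settings["number_of_shards"] * (1 + settings["number_of_replicas"])
    if current_shards + requested > max_shards:
        raise ValueError(
            f"this action would add [{requested}] shards, but the cluster has "
            f"[{current_shards}]/[{max_shards}] shards open"
        )
    return requested
```

With 960 shards open and a limit of 1000, checking the bare defaults counts only 2 new shards and passes, while a template that sets `number_of_shards` to 50 actually adds 100 and should be rejected.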
Ignacio Vera 05ec970723
Support BucketScript paths of type string and array. (#44694) (#44731) 2019-07-23 09:05:47 +02:00
Ioannis Kakavas 3714cb63da Allow parsing the value of java.version sysprop (#44017)
We often start testing with early access versions of new Java
versions and this has caused minor issues in our tests
(i.e. #43141) because the version string that the JVM reports
cannot be parsed as it ends with the string -ea.

This commit changes how we parse and compare Java versions to
allow correct parsing and comparison of the output of java.version
system property that might include an additional alphanumeric
part after the version numbers
 (see [JEP 223](https://openjdk.java.net/jeps/223)). In short, it
handles a version number part, like before, but additionally a 
PRE part that matches ([a-zA-Z0-9]+).

It also changes a number of tests that would attempt to parse
java.specification.version in order to get the full version
of Java. java.specification.version only contains the major
version and is thus inappropriate when trying to compare against
a version that might contain a minor, patch or an early access
part. We now parse java.version, which can be consistently
parsed.

Resolves #43141
2019-07-22 20:14:56 +03:00
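The parsing approach described in #44017 above (a numeric version part plus an optional PRE part matching `[a-zA-Z0-9]+`) can be sketched in Python as follows. This is not the Java code from the commit: the names `parse_java_version` and `compare_java_versions` are hypothetical, and the ordering rule that a pre-release sorts before the release with the same numbers is an assumption of this sketch. Legacy strings such as `1.8.0_212` are deliberately out of scope and are rejected.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Dot-separated numbers, optionally followed by "-" and an alphanumeric PRE part.
_VERSION = re.compile(r"(\d+(?:\.\d+)*)(?:-([a-zA-Z0-9]+))?")


class ParsedVersion(NamedTuple):
    numbers: Tuple[int, ...]
    pre: Optional[str]


def parse_java_version(value: str) -> ParsedVersion:
    match = _VERSION.fullmatch(value)
    if match is None:
        raise ValueError(f"cannot parse java version [{value}]")
    numbers = tuple(int(part) for part in match.group(1).split("."))
    return ParsedVersion(numbers, match.group(2))


def compare_java_versions(left: str, right: str) -> int:
    """Return -1, 0, or 1. Missing trailing numbers count as zero."""
    a, b = parse_java_version(left), parse_java_version(right)
    width = max(len(a.numbers), len(b.numbers))
    na = a.numbers + (0,) * (width - len(a.numbers))
    nb = b.numbers + (0,) * (width - len(b.numbers))
    if na != nb:
        return -1 if na < nb else 1
    if a.pre == b.pre:
        return 0
    # Assumption: a pre-release (e.g. "13-ea") is older than the release ("13").
    if a.pre is None:
        return 1
    if b.pre is None:
        return -1
    return -1 if a.pre < b.pre else 1
```

For example, `compare_java_versions("13-ea", "13")` treats the early access build as older, which is the kind of comparison a test guard on a minimum Java version needs.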
Tanguy Leroux bcb3563dcf Remove AllocationService.reroute(ClusterState, String, boolean) (#44629)
This commit removes the method AllocationService.reroute(ClusterState, String, boolean) 
in favor of AllocationService.reroute(ClusterState, String).

Motivations are:
    there are already 3 other reroute methods in this class
    this method is always called with the debug parameter set to false
    almost all tests use the method reroute(ClusterState, String)
2019-07-22 17:12:21 +02:00
Evgenia Badiyanova 5273a548a4 Unmute PendingTasksBlocksIT tests 2019-07-22 10:59:21 -04:00
Armin Braun 6ceae5d586
Document Type of Collections Returned by StreamInput (#44686) (#44688)
* As a result of #44665 the collections returned by the deserialization methods on `StreamInput` may be either mutable or immutable now,
this PR adds documentation for that fact
2019-07-22 16:06:34 +02:00
Evgenia Badiyanova 8ee4c4d5ba Mute some tests in PendingTasksBlocksIT
Tracked in #44695.
2019-07-22 09:55:07 -04:00
David Turner dcb3b2c18a Fix testPendingTasksWithClusterNotRecoveredBlock
In 7.x we cannot start a new master-eligible node before the cluster has formed
since we first try and update minimum_master_nodes and this is blocked. This
commit changes the test to start a data-only node so that no such adjustment is
necessary.

Relates #44685
2019-07-22 14:42:20 +01:00
Mayya Sharipova 972a49312c Fix testQuotedQueryStringWithBoost test (#43385)
Add more logging to indexRandom

Seems that asynchronous indexing from indexRandom sometimes indexes
the same document twice, which will mess up the expected score calculations.

For example, indexing:
{ "index" : {"_id" : "1" } }
{"important" :"phrase match", "less_important": "nothing important"}
{ "index" : {"_id" : "2" } }
{"important" :"nothing important", "less_important" :"phrase match"}
Produces the expected scores: 13.8 for doc1, and 1.38 for doc2

indexing:
{ "index" : {"_id" : "1" } }
{"important" :"phrase match", "less_important": "nothing important"}
{ "index" : {"_id" : "2" } }
{"important" :"nothing important", "less_important" :"phrase match"}
{ "index" : {"_id" : "3" } }
{"important" :"phrase match", "less_important": "nothing important"}
Produces scores: 9.4 for doc1, and 1.96 for doc2 which are found in the
error logs.

Relates to #43144
2019-07-22 08:44:31 -04:00
Przemyslaw Gomulka a154f49b94
Fix stats in slow logs to be a escaped JSON backport(#44642) #44687
Fields in JSON logs should be escaped JSON fields. At the moment the value is broken JSON:
"stats": "["group1", "group2"]", -> "stats": "[\"group1\", \"group2\"]",
This should later be refactored into a JSON array of strings (the same as types in 7.x)
2019-07-22 14:28:39 +02:00
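The escaping problem in the slow log commit above can be reproduced with a short Python sketch. The helper name `render_stats_field` is hypothetical and this is not the log4j layout code: it only shows why embedding a JSON array inside a JSON string requires escaping the inner quotes.

```python
import json


def render_stats_field(groups):
    """Render the groups as a JSON array, then encode that text as a JSON string."""
    array_text = json.dumps(groups)
    # The second dumps call escapes the quotes inside the array text.
    return '"stats": ' + json.dumps(array_text)


def is_valid_json(text):
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

Wrapping the rendered field in braces yields a parseable log line, whereas the unescaped form quoted in the commit message does not parse.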
David Turner 0ce3114779 Allow pending tasks before state recovery (#44685)
Today we block access to the pending tasks API before the cluster has recovered
its state. There's no real need to do so, and the master does meaningful work
even before performing state recovery so it might sometimes be useful to allow
access to this API. This commit changes this API to ignore all cluster blocks.

Fixes #44652
2019-07-22 13:15:10 +01:00
Przemyslaw Gomulka 09e9c4cb59
Fix types field in JSON Search Slow Logs (#44641)
The field has to be defined in log4j2.properties and should be an
escaped JSON for now (it is a broken JSON at the moment). This should later be refactored into a JSON array
of strings.
2019-07-22 12:02:20 +02:00
Przemyslaw Gomulka fe20e217a4
Deprecation messages with the same key but different x-opaque-id are allowed backport(#44587) #44682
Deprecation logger was filtering log entries by key, which means that if two log messages with the same key are logged by different users, then the second log message will be filtered.
This change allows deprecation messages with the same key to be logged by different users.

relates #41354
backport #44587
2019-07-22 11:38:11 +02:00
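The deduplication change described in the deprecation logger commit above can be sketched in Python. `DeprecationDeduplicator` and `should_log` are hypothetical names, and the unbounded set is a simplification of this sketch. The idea shown is keying the "already logged" check on the pair of x-opaque-id and message key instead of the key alone.

```python
class DeprecationDeduplicator:
    """Remember which (x-opaque-id, key) pairs have already been logged."""

    def __init__(self):
        self._seen = set()

    def should_log(self, key, x_opaque_id=None):
        composite = (x_opaque_id, key)
        if composite in self._seen:
            return False
        self._seen.add(composite)
        return True
```

Two clients sending different x-opaque-id values each get the deprecation message once, while a repeat from the same client is still filtered.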
Armin Braun a6adcecd20 Fix Trying to Mutate Immutable Collections
Fixes two spots where #44665 caused a previously mutable collection to now be read as an immutable one, leading to errors
2019-07-22 11:04:05 +02:00
Armin Braun b9067ba1ba
Remove Needless Synchronization in FollowersChecker (#44631) (#44680)
* It seems redundant to synchronize here and check that the map hasn't changed via the `isRunning` under the mutex
* The map won't change if under the mutex that locks on all the updates to it
* Without the mutex it's very unlikely to change inside the method call relative to the likelihood of changing until the generic pool where we check for `isRunning` again anyway

-> just remove the synchronization (it's on the IO loop) and check since we do check the running state on the generic pool under the mutex anyway when we actually fail it
2019-07-22 10:57:30 +02:00
Jason Tedor ff76b0af8b
Copy field names in stored fields context
We have to copy the field names otherwise we either have a handle of a
list that a caller might mutate or we might mutate when they aren't
expecting it, or worse, a handle of a list that is not mutable (and we
end up mutating the list).

Relates #44665
2019-07-22 17:40:07 +09:00
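The reasoning in the stored fields commit above (copy the incoming list so that neither side can surprise the other, and so that an immutable input is never mutated) can be shown with a small Python sketch. The names `read_string_list` and `StoredFieldsContextSketch` are hypothetical, and a tuple stands in for an immutable shared empty collection.

```python
_EMPTY = ()  # shared immutable value returned for empty input


def read_string_list(raw):
    """Return a shared immutable empty value for empty input, else a fresh list."""
    if len(raw) == 0:
        return _EMPTY
    return list(raw)


class StoredFieldsContextSketch:
    def __init__(self, field_names):
        # Defensive copy: works for mutable and immutable inputs alike.
        self._field_names = list(field_names)

    def add_field_name(self, name):
        self._field_names.append(name)

    @property
    def field_names(self):
        return tuple(self._field_names)
```

Because the constructor copies, adding a field name never touches the shared empty value, and later changes to the caller's list do not leak into the context.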
Alpar Torok b34ac66d96
Mute multiple tests on Windows (7.x) (#44676)
* Mute failing test

tracked in #44552

* mute EvilSecurityTests

tracking in #44558

* Fix line endings in ESJsonLayoutTests

* Mute failing ForecastIT  test on windows

Tracking in #44609

* mute BasicRenormalizationIT.testDefaultRenormalization

tracked in #44613

* fix mute testDefaultRenormalization

* Increase busyWait timeout windows is slow

* Mute failure unconfigured node name

* mute x-pack internal cluster test windows

tracking #44610

* Mute JvmErgonomicsTests on windows

Tracking #44669

* mute SharedClusterSnapshotRestoreIT testParallelRestoreOperationsFromSingleSnapshot

Tracking #44671

* Mute NodeTests on Windows

Tracking #44256
2019-07-22 11:32:29 +03:00
Armin Braun 0e2e83f591
More Efficient Deserialization of Empty Collections in StreamInput (#44665) (#44674)
* We only had the `size == 0` optimization in some but not all spots of deserializing collections in this class, fixed the remaining spots.
* Also fixed a similar spot when deserializing `ThreadContextStruct` that could now be simplified (it was apparently doing its own version of this optimization for the first map it deserialized before ... but not for the second map -> made it not instantiate anything if both maps are empty since it's always the same object here anyway)
2019-07-22 09:31:12 +02:00
Armin Braun 0ac137a9a1
Optimize some StreamOutput Operations (#44660) (#44668)
* Optimize some StreamOutput Operations

* Writing numbers byte by byte adds a lot of unnecessary bounds checks to serialization
* Serializing to a threadlocal `byte[]` instead and bulk writing gives about a 50% speedup on `long` and `vlong` (for large numbers) writes and 30% for `int`, `vint` on Linux on an i9
* Using a threadlocal of the maximum string buffer size we used to allocate before also removes allocations when writing strings in general since we now never have to allocate a `byte[]` for that
   * And don't have to GC one either resolving the TODO removed here
2019-07-22 07:09:32 +02:00
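The technique described in #44660 above (encode a number into a thread-local scratch buffer, then hand the bytes to the stream in one bulk write) can be sketched in Python. This is an illustration only, not the `StreamOutput` code: `write_vlong` and the helper names are hypothetical, and the encoding shown is the common variable-length scheme of seven payload bits per byte with the high bit as a continuation flag, restricted here to non-negative values below 2**63.

```python
import io
import threading

_local = threading.local()


def _scratch() -> bytearray:
    """Return this thread's reusable scratch buffer, creating it on first use."""
    buf = getattr(_local, "buf", None)
    if buf is None:
        buf = _local.buf = bytearray(9)  # 63 bits at 7 bits per byte
    return buf


def write_vlong(out, value: int) -> int:
    """Encode value into the scratch buffer and write it with one call."""
    if value < 0 or value >= 2 ** 63:
        raise ValueError("value out of range for this sketch")
    buf = _scratch()
    i = 0
    while value > 0x7F:
        buf[i] = (value & 0x7F) | 0x80  # low 7 bits plus continuation flag
        value >>= 7
        i += 1
    buf[i] = value
    out.write(memoryview(buf)[: i + 1])  # single bulk write
    return i + 1
```

The stream sees one `write` per number instead of one per byte, and no per-call buffer is allocated after the first use on a thread.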
Tal Levy 1a9cfe9110
Removal Streamable (#44647) (#44655)
This commit ends the grand adventure that was the
refactoring effort to migrate all usages of
Streamable to Writeable.

Closes #34389.
2019-07-20 19:10:49 -07:00
Ryan Ernst 4c05d25ec7
Convert Transport Request/Response to Writeable (#44636) (#44654)
This commit converts all remaining TransportRequest and
TransportResponse classes to implement Writeable, and disallows
Streamable implementations.

relates #34389
2019-07-20 11:25:58 -07:00
Ryan Ernst f4ee2e9e91
Convert direct implementations of Streamable to Writeable (#44605) (#44646)
This commit converts Streamable to Writeable for direct implementations.

relates #34389
2019-07-20 08:32:29 -07:00
Tal Levy 7c84636029
Remove StreamOutput #writeOptionalStreamable and #writeStreamableList (#44602) (#44643)
remove usages of writeOptionalStreamable and writeStreamableList

relates #34389.
2019-07-19 15:55:53 -07:00
Ryan Ernst f193d14764
Convert remaining Action Response/Request to writeable.reader (#44528) (#44607)
This commit converts readFrom to ctor with StreamInput on the remaining
ActionResponse and ActionRequest classes.

relates #34389
2019-07-19 13:33:38 -07:00
Armin Braun f028ab43ad
Don't Swallow Interrupt in TransportService#onRequestReceived (#44622) (#44627)
* We shouldn't just swallow the interrupt here quietly and keep going on the IO thread
   * Currently the interrupt continues here just the same way an invocation of `acceptIncomingRequests` would have made things continue
* Relates #44610
2019-07-19 20:35:29 +02:00
Christoph Büscher eafe54c81c Fix AnalysisMode propagation in NamedAnalyzer (#44626)
NamedAnalyzer should return the same AnalysisMode as any custom analyzer it
wraps, otherwise AnalysisMode.ALL. This used to be only CustomAnalyzer in the
past, but with the introduction of the ReloadableCustomAnalyzer this needs to be
added as an option where the analysis mode gets propagated.

Closes #44625
2019-07-19 18:18:43 +02:00
Nikita Glashenko 804476c35d Remove support for old translog checkpoint formats (#44280)
This commit removes support for the translog checkpoint format from versions
before 6.0.0 since 7.x versions are incompatible with indices from these
versions.

Relates #44720
Fixes #44210
2019-07-19 16:01:47 +01:00
Przemyslaw Gomulka 597d2dfaf5
Add types field to slow logs in 7.x (#44592)
The types field was removed from slow logs in 7.x by mistake. Types are
still present in that version, so this has to be present as a JSON
field
relates #41354
backport that was causing this #44178
2019-07-19 08:31:00 +02:00
Ryan Ernst 60785a9fa8
Convert several direct uses of Streamable to Writeable (#44586) (#44604)
This commit converts several utility classes that implement Streamable
to have StreamInput constructors. It also adds a default version of
readFrom to Streamable so that overriding to throw UOE is not necessary.

relates #34389
2019-07-18 21:25:44 -07:00
Julie Tibshirani 336364fefe
Convert more classes in 'server' to Writeable. (#44600)
* Convert GetTask*.
* Convert RemoteInfo*.
* Convert GetFieldMappings*.
* Convert ValidateQueryRequest*.
* Convert MainResponse*.
* Convert MultiGet*.
* Convert Update*.
* Add a missing call to parent constructors.

Relates to #34389.
2019-07-18 18:45:10 -07:00
Ryan Ernst 13f46aa801
Convert index and persistent actions/response to writeable (#44582) (#44601)
This commit converts several more classes from streamable to writeable
in server, mostly within the o.e.index and o.e.persistent packages.

relates #34389
2019-07-18 18:32:09 -07:00
Tal Levy 03f5084ac7
remove usages of #readOptionalStreamable, #readStreamableList. (#44578) (#44598)
This commit removes references to Streamable from StreamInput.

This is all a part of the effort to remove Streamable usage.

relates #34389.
2019-07-18 16:19:02 -07:00
Ryan Ernst af093a4095
Convert ShardOperationFailedException to Writeable (#44532) (#44580)
This commit converts subclasses of ShardOperationFailedException to
implement ctors with StreamInput instead of readFrom. It also simplifies
IndicesShardStoresResponse.Failure to serialize its shardId after the
super data.

relates #34389
2019-07-18 13:29:19 -07:00
Armin Braun 3b5038b837
Implement Eventually Consistent Mock Repository for SnapshotResiliencyTests (#40893) (#44570)
* Add eventually consistent mock repository for reproducing and testing AWS S3 blob store behavior
* Relates #38941
2019-07-18 17:54:54 +02:00
Andrey Ershov ef6ddd15c6 Revert "Snapshot tool: S3 orphaned files cleanup (#44551)"

This reverts commit 09edeeb3
2019-07-18 17:21:45 +02:00
Andrey Ershov 09edeeb38e Snapshot tool: S3 orphaned files cleanup (#44551)
A tool to work with snapshots.
Co-authored by @original-brownbear.
This commit adds the snapshot tool and its single command, cleanup, which
cleans up orphaned files for S3.
Snapshot tool lives in x-pack/snapshot-tool.

(cherry picked from commit fc4aed44dd975d83229561090f957a95cc76b287)
2019-07-18 16:38:00 +02:00
David Turner 452f7f67a0
Defer reroute when starting shards (#44539)
Today we reroute the cluster as part of the process of starting a shard, which
runs at `URGENT` priority. In large clusters, rerouting may take some time to
complete, and this means that a mere trickle of shard-started events can cause
starvation for other, lower-priority, tasks that are pending on the master.

However, it isn't really necessary to perform a reroute when starting a shard,
as long as one occurs eventually. This commit removes the inline reroute from
the process of starting a shard and replaces it with a deferred one that runs
at `NORMAL` priority, avoiding starvation of higher-priority tasks.

Backport of #44433 and #44543.
2019-07-18 14:10:40 +01:00
Alan Woodward ec0a0a41db Remove type parameter from ParserContext (#44478)
ParserContext.getType() is never called, so we can remove it and tidy up
the callers as well.
2019-07-18 11:07:46 +01:00
Luca Cavanna a8a16e6b08 Associate sub-requests to their parent task in multi search API (#44492)
Multi search accepts multiple search requests and runs them as
independent requests, each one as part of their own search task. Today
they don't get associated though with their parent multi search task,
which would be useful to monitor which msearch a certain search was part
of, if any, and also to cancel all of the sub-requests in case the
parent msearch gets cancelled (though this will also require making 
the multi search task cancellable as a follow-up).
2019-07-18 11:58:30 +02:00
David Turner 7598e0186a
Harmonise indentation of cluster settings (#44540)
Today the long list of `BUILT_IN_CLUSTER_SETTINGS` is indented differently
between `master` and `7.x`. This sometimes makes backporting painful. This
commit adjusts the indentation of earlier branches to match that in `master`.
2019-07-18 09:50:53 +01:00
Armin Braun 6565825a13
Avoid CharsRef Allocations in StreamInput (#44488) (#44519)
* Many messages deserialized from a `StreamInput` only contain short strings, some use-cases of instantiating a `StreamInput` don't deserialize any strings
  * Don't allocate `CharsRef` for small strings to save some allocations (especially on the IO threads)
  * Lazily allocate a larger `CharsRef` if needed for larger strings like we did before and have it live as long as the `StreamInput` like before as well
2019-07-18 08:52:37 +02:00
Tal Levy 38d2ada84f
deprecate Supplier<Response> constructors in HandledTransportAction (#44456) (#44533)
This commit deprecates all constructors of HandledTransportAction
that take in a Supplier instead of a Writeable.Reader for response
objects.

in addition to the deprecation, the following modules were updated to
leverage Writeable

- modules:ingest-common
- modules:lang-mustache

relates #34389.
2019-07-17 22:47:09 -07:00
Tal Levy 075a3f0e99
remove usage of ActionType#(String) (#44459) (#44526)
this commit removes usage of the deprecated
constructor with a single argument and no Writeable.Reader.

The purpose of this is to reduce the boilerplate necessary for
properly implementing a new action, as well as reducing the
chances of using the incorrect super constructor while classes
are being migrated to Writeable

relates #34389.
2019-07-17 20:28:11 -07:00
Nhat Nguyen 51180af91d Make peer recovery send file chunks async (#44468)
Relates #44040
Relates #36195
2019-07-17 22:25:43 -04:00
Nhat Nguyen 458f24c46a Reenable accounting circuit breaker (#44495)
We have a new Lucene 8.2 snapshot on master and 7.x; hence we can
re-enable the accounting on these branches.

Relates #30290
2019-07-17 22:25:43 -04:00
Julie Tibshirani 34c6067018
Convert several classes in 'server' to Writeable. (#44527)
* Convert FieldCapabilities*.
* Convert MultiTermVectors*.
* Convert SyncedFlush*.
* Convert SearchTemplateRequest.
* Convert MultiSearchTemplateRequest.
* Convert GrokProcessorGet*.
* Remove a stray reference to SearchTemplateRequest#readFrom.

Relates to #34389.
2019-07-17 19:04:21 -07:00
Ryan Ernst 2a2686e6e7
Convert remaining ActionTypes to writeable in xpack core (#44467) (#44525)
This commit converts all remaining ActionType response classes to
writeable in xpack core. It also converts a few from server which were
used by xpack core.

relates #34389
2019-07-17 18:01:45 -07:00
Ryan Ernst 17c4b2b839
Convert MasterNodeRequest to implement Writeable.Reader (#44452) (#44513)
This commit converts all MasterNodeRequest subclasses to fulfill
Writeable.Reader constructors.

relates #34389
2019-07-17 18:01:29 -07:00
Paul Sanwald 7114fe786b
Fix incorrect calculation of how many buckets will result from a merge operation. (#44461) (#44515) 2019-07-17 19:14:16 -04:00
Julie Tibshirani 8841779de8
Convert ClearScroll* to Writeable. (#44511)
This PR converts `ClearScrollRequest` and `ClearScrollResponse` to
`Writeable`.

Relates to #34389.
2019-07-17 15:49:38 -07:00
Jason Tedor 39c5f98de7
Introduce test issue logging (#44477)
Today we have an annotation for controlling logging levels in
tests. This annotation serves two purposes, one is to control the
logging level used in tests, when such control is needed to impact and
assert the behavior of loggers in tests. The other use is when a test is
failing and additional logging is needed. This commit separates these
two concerns into separate annotations.

The primary motivation for this is that we have a history of leaving
behind the annotation for the purpose of investigating test failures
long after the test failure is resolved. The accumulation of these stale
logging annotations has led to excessive disk consumption. Having
recently cleaned this up, we would like to avoid falling into this state
again. To do this, we are adding a link to the test failure under
investigation to the annotation when used for the purpose of
investigating test failures. We will add tooling to inspect these
annotations, in the same way that we have tooling on awaits fix
annotations. This will enable us to report on the use of these
annotations, and report when stale uses of the annotation exist.
2019-07-18 05:33:33 +09:00
Ryan Ernst 0755a13c9f
Convert AcknowledgedRequest to Writeable.Reader (#44412) (#44454)
This commit adds constructors to AcknowledgedRequest subclasses to
implement Writeable.Reader, and ensures all future subclasses implement
the same.

relates #34389
2019-07-17 11:17:36 -07:00
Yannick Welsch c8b66c549d Ignore failures to set socket options on Mac (#44355)
Brings some temporary relief for test failures until #41071 is addressed.
2019-07-17 18:51:25 +02:00
Yannick Welsch f78e64e3e2 Terminate linearizability check early on large histories (#44444)
Large histories can be problematic and have the linearizability checker occasionally run OOM. As it's
very difficult to bound the size of the histories just right, this PR will let it instead run for 10 seconds
on large histories and then abort.

Closes #44429
2019-07-17 18:51:25 +02:00
Igor Motov d3cb7bbc8f Geo: fix GeoWKTShapeParserTests (#44448)
Changes in #44187 introduced some optimization in the way shapes are
generated. These changes were not captured in GeoWKTShapeParserTests.

Relates #44187
2019-07-17 12:09:38 -04:00
Igor Motov cd5a334864 Geo: extract dateline handling logic from ShapeBuilders (#44187)
Extracts dateline decomposition logic from ShapeBuilder into a separate
utility class that is used on the indexing side. The search side
will be handled as part of another PR, at which time we will remove
the decomposition logic from ShapeBuilders as well. This PR also doesn't
change any existing logic including bugs.

Relates to #40908
2019-07-17 12:09:38 -04:00
Alan Woodward b6a0f098e6 Don't use index_phrases on graph queries (#44340)
Due to https://issues.apache.org/jira/browse/LUCENE-8916, when you
try to use a synonym filter with the index_phrases option on a text field,
you can end up with null values in a Phrase query, leading to weird
exceptions further down the querying chain. As a workaround, this commit
disables the index_phrases optimization for queries that produce token
graphs.

Fixes #43976
2019-07-17 16:46:00 +01:00
Yannick Welsch ddd740162e
Do not use CancellableThreads for Zen1 (#44430)
Zen 1 stops pinging threads in ZenDiscovery by calling Thread.interrupt(). This is incompatible with
the CancellableThreads that only allow threads to be interrupted through cancellation. The use of
CancellableThreads was introduced in #42844 and added to UnicastZenPing as part of the
backport, as both Zen1 and Zen2 share the same SeedHostsResolver implementation. This commit
effectively undoes the change in the backport while still allowing to share same implementation.

Closes #44425
2019-07-17 17:32:47 +02:00
Zachary Tong 103ba976fd Convert BucketScript to static parser (#44385)
BucketScript was using the old-style parser and could easily be
converted over to the newer static parser.

Also adds a test for GapPolicy enum serialization
2019-07-17 10:22:42 -04:00
David Turner 377a6a47ac Improve handshake failure messages (#44485)
Today we report an exception on a handshake failure (e.g. cluster name
mismatch) but the message does not include all the details of the mismatch. If
the mismatch is something subtle like `my-cluster` instead of `my_cluster` then
we cannot diagnose this from the message alone. This commit adds the details of
the local cluster to the message, along with the details of the remote cluster,
improving the utility of the exception message if reported in isolation.
2019-07-17 13:33:28 +01:00
Armin Braun 91673e373a
Fix Incorrect Uncompressed Error Handling in InboundMessage (#44317) (#44483)
* Fix Incorrect Uncompressed Error Handling in InboundMessage

* CompressorFactory.compressor does not throw uncompressed exception on uncompressed bytes, it merely returns `null` in this case if the bytes are at least XContent so the current catch and re-throw logic is dead code
* Made it work again by throwing on a `null` return so we get a real error message instead of an NPE
2019-07-17 14:31:46 +02:00
Ignacio Vera eb348d2593
Upgrade to lucene-8.2.0-snapshot-6413aae226 (#44480) 2019-07-17 13:28:28 +02:00
Armin Braun c8db0e9b7e
Remove blobExists Method from BlobContainer (#44472) (#44475)
* We only use this method in one place in production code and can replace that with a read -> remove it to simplify the interface
   * Keep it as an implementation detail in the Azure repository
2019-07-17 11:56:02 +02:00
Tanguy Leroux e423b7341a Log non-acknowledged close index response in ReplicaToPrimaryPromotionIT
Relates #44479
2019-07-17 10:32:44 +02:00
David Turner dca8a918f3 Use applied cluster state in cluster health (#44426)
In #44348 we changed the cluster health action so that it sometimes uses the
cluster state directly from the master service rather than from the cluster
applier. If the state is not recovered then this is inappropriate, because
prior to state recovery the state available to the cluster applier contains no
indices. This commit moves us back to using the state from the applier.

Fixes #44416.
2019-07-17 08:36:13 +01:00
David Turner 0fd33b089f Report shard state changes better (#44419)
Today when the cluster health changes the `AllocationService` reports at most
ten shards that were started or failed, and always ends its message with `...`
suggesting that the list is truncated. This commit adjusts these messages to be
clearer about whether the list is truncated or not. When debug logging is
enabled the list is not truncated; if the list is truncated then its length is
logged, and if it is not truncated then no `...` is included in the message.
2019-07-17 08:36:06 +01:00
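The truncation rule described in the commit above (#44419) can be sketched as follows. This is only an illustration: the class `ShardListMessage`, the method `describe`, the message wording, and the `MAX_SHOWN` constant are hypothetical and are not taken from `AllocationService`.

```java
import java.util.List;

// Hypothetical helper illustrating the rule from the commit message;
// it is not the code used by AllocationService.
final class ShardListMessage {

    // The commit message mentions a limit of ten shards.
    static final int MAX_SHOWN = 10;

    // Builds a message that contains "..." only when the list is really truncated.
    static String describe(List<String> shards, boolean debugEnabled) {
        if (debugEnabled || shards.size() <= MAX_SHOWN) {
            // Full list: no ellipsis, so the reader knows nothing was cut.
            return "shards started " + shards;
        }
        // Truncated list: show the first entries and report the total length.
        return "shards started " + shards.subList(0, MAX_SHOWN)
            + " ... (" + shards.size() + " in total)";
    }
}
```

With debug logging enabled the full list is always printed, which matches the behaviour the commit message describes.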
Ryan Ernst 6e50bafa8f
Convert Broadcast request and response to use writeable.reader (#44386) (#44453)
This commit converts the request and response classes for broadcast
actions to implement ctors for Writeable.Reader and forces all future
implementations to implement the same.

relates #34389
2019-07-16 23:24:02 -07:00
Tim Brooks 6b1a769638
Move CORS Config into :server package (#43779)
This commit moves the config that stores Cors options into the server
package. Currently both nio and netty modules must have a copy of this
config. Moving it into server allows one copy and the tests to be in a
common location.
2019-07-16 17:50:42 -06:00
Julie Tibshirani cc0ff3aa71 Ensure field caps doesn't error on rank feature fields. (#44370)
The contract for MappedFieldType#fielddataBuilder is to throw an
IllegalArgumentException if fielddata is not supported. The rank feature mappers
were instead throwing an UnsupportedOperationException, which caused
MappedFieldType#isAggregatable to fail.
2019-07-16 15:56:50 -07:00
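The contract mentioned in the commit above (#44370) explains why the exception type matters. A minimal sketch, assuming a caller that probes for fielddata support by catching `IllegalArgumentException`; the interface `FieldType` and the class `FielddataContractSketch` are hypothetical stand-ins, not the real `MappedFieldType`.

```java
// Hypothetical stand-in for the contract described in the commit message.
final class FielddataContractSketch {

    interface FieldType {
        // Contract: throw IllegalArgumentException if fielddata is not supported.
        Object fielddataBuilder();
    }

    // Probes for fielddata support. Only IllegalArgumentException is treated as
    // "not supported"; any other exception type propagates to the caller.
    static boolean isAggregatable(FieldType fieldType) {
        try {
            fieldType.fielddataBuilder();
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

A field type that throws `UnsupportedOperationException` instead would make the probe itself fail rather than return `false`, which is the failure mode the commit fixes.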
Ryan Ernst c26edb4c43
Ensure replication response/requests implement writeable (#44392) (#44446)
This commit cleans up replication response and request so that the base
class does not allow subclasses to implement Streamable.

relates #34389
2019-07-16 12:53:08 -07:00
Przemysław Witek 9613700a63
[7.x] Implement MlConfigIndexMappingsFullClusterRestartIT test which verifies that .ml-config index mappings are properly updated during cluster upgrade (#44341) (#44366) 2019-07-16 21:22:40 +02:00
Nhat Nguyen 301c8daf4c Revert "Make peer recovery send file chunks async (#44040)"
This reverts commit a2b4687d89.
2019-07-16 14:18:35 -04:00
Christoph Büscher 67ec0a4e9b
Unmute SpecificMasterNodesIT test (#44436)
The underlying issue is closed and the fix in #42454 seems to have been
backported to 7.x and 7.3 so we can reactivate the test.
2019-07-16 19:41:59 +02:00
Yu 563a78829f Do not allow version in Rest Update API (#43516)
The versioning of the Update API no longer relies on the version number (it
relies on the sequence number instead). But at the REST API level we ignored the
"version" and "version_type" parameters, so the server could not raise an
exception when they were set.

This PR restores "version" and "version_type" parsing in Update Rest API
so that we can get the appropriate errors.

Relates to #42497
2019-07-16 13:19:07 -04:00
Nhat Nguyen a2b4687d89 Make peer recovery send file chunks async (#44040) 2019-07-16 10:43:46 -04:00
Lee Hinman fb0461ac76
[7.x] Add Snapshot Lifecycle Management (#44382)
* Add Snapshot Lifecycle Management (#43934)

* Add SnapshotLifecycleService and related CRUD APIs

This commit adds `SnapshotLifecycleService` as a new service under the ilm
plugin. This service handles snapshot lifecycle policies by scheduling them based on
each policy's defined schedule.

This also includes the get, put, and delete APIs for these policies

Relates to #38461

* Make scheduledJobIds return an immutable set

* Use Object.equals for SnapshotLifecyclePolicy

* Remove unneeded TODO

* Implement ToXContentFragment on SnapshotLifecyclePolicyItem

* Copy contents of the scheduledJobIds

* Handle snapshot lifecycle policy updates and deletions (#40062)

(Note this is a PR against the `snapshot-lifecycle-management` feature branch)

This adds logic to `SnapshotLifecycleService` to handle updates and deletes for
snapshot policies. Policies with incremented versions have the old policy
cancelled and the new one scheduled. Deleted policies have their schedules
cancelled when they are no longer present in the cluster state metadata.

Relates to #38461

* Take a snapshot for the policy when the SLM policy is triggered (#40383)

(This is a PR for the `snapshot-lifecycle-management` branch)

This commit fills in `SnapshotLifecycleTask` to actually perform the
snapshotting when the policy is triggered. Currently there is no handling of the
results (other than logging) as that will be added in subsequent work.

This also adds unit tests and an integration test that schedules a policy and
ensures that a snapshot is correctly taken.

Relates to #38461

* Record most recent snapshot policy success/failure (#40619)

Keeping a record of the results of the successes and failures will aid
troubleshooting of policies and make users more confident that their
snapshots are being taken as expected.

This is the first step toward writing history in a more permanent
fashion.

* Validate snapshot lifecycle policies (#40654)

(This is a PR against the `snapshot-lifecycle-management` branch)

With this commit, we now validate the content of snapshot lifecycle policies when
the policy is being created or updated. This checks for the validity of the id,
name, schedule, and repository. Additionally, cluster state is checked to ensure
that the repository exists prior to the lifecycle being added to the cluster
state.

Part of #38461

* Hook SLM into ILM's start and stop APIs (#40871)

(This pull request is for the `snapshot-lifecycle-management` branch)

This change allows the existing `/_ilm/stop` and `/_ilm/start` APIs to also
manage snapshot lifecycle scheduling. When ILM is stopped all scheduled jobs are
cancelled.

Relates to #38461

* Add tests for SnapshotLifecyclePolicyItem (#40912)

Adds serialization tests for SnapshotLifecyclePolicyItem.

* Fix improper import in build.gradle after master merge

* Add human readable version of modified date for snapshot lifecycle policy (#41035)

* Add human readable version of modified date for snapshot lifecycle policy

This small change changes it from:

```
...
"modified_date": 1554843903242,
...
```

To

```
...
"modified_date" : "2019-04-09T21:05:03.242Z",
"modified_date_millis" : 1554843903242,
...
```

The `"modified_date"` field is included when the `?human` parameter is used.

Relates to #38461

* Fix test

* Add API to execute SLM policy on demand (#41038)

This commit adds the ability to perform a snapshot on demand for a policy. This
can be useful to take a snapshot immediately prior to performing some sort of
maintenance.

```json
PUT /_ilm/snapshot/<policy>/_execute
```

And it returns the response with the generated snapshot name:

```json
{
  "snapshot_name" : "production-snap-2019.04.09-rfyv3j9qreixkdbnfuw0ug"
}
```

Note that this does not allow waiting for the snapshot, and the snapshot could
still fail. It *does* record this information into the cluster state similar to
a regularly triggered SLM job.

Relates to #38461

* Add next_execution to SLM policy metadata (#41221)

* Add next_execution to SLM policy metadata

This adds the next time a snapshot lifecycle policy will be executed when
retrieving a policy's metadata, for example:

```json
GET /_ilm/snapshot?human
{
  "production" : {
    "version" : 1,
    "modified_date" : "2019-04-15T21:16:21.865Z",
    "modified_date_millis" : 1555362981865,
    "policy" : {
      "name" : "<production-snap-{now/d}>",
      "schedule" : "*/30 * * * * ?",
      "repository" : "repo",
      "config" : {
        "indices" : [
          "foo-*",
          "important"
        ],
        "ignore_unavailable" : true,
        "include_global_state" : false
      }
    },
    "next_execution" : "2019-04-15T21:16:30.000Z",
    "next_execution_millis" : 1555362990000
  },
  "other" : {
    "version" : 1,
    "modified_date" : "2019-04-15T21:12:19.959Z",
    "modified_date_millis" : 1555362739959,
    "policy" : {
      "name" : "<other-snap-{now/d}>",
      "schedule" : "0 30 2 * * ?",
      "repository" : "repo",
      "config" : {
        "indices" : [
          "other"
        ],
        "ignore_unavailable" : false,
        "include_global_state" : true
      }
    },
    "next_execution" : "2019-04-16T02:30:00.000Z",
    "next_execution_millis" : 1555381800000
  }
}
```

Relates to #38461

* Fix and enhance tests

* Figured out how to Cron

* Change SLM endpoint from /_ilm/* to /_slm/* (#41320)

This commit changes the endpoint for snapshot lifecycle management from:

```
GET /_ilm/snapshot/<policy>
```

to:

```
GET /_slm/policy/<policy>
```

It mimics the ILM path only using `slm` instead of `ilm`.

Relates to #38461

* Add initial documentation for SLM (#41510)

* Add initial documentation for SLM

This adds the initial documentation for snapshot lifecycle management.

It also includes the REST spec API json files since they're sort of
documentation.

Relates to #38461

* Add `manage_slm` and `read_slm` roles (#41607)

* Add `manage_slm` and `read_slm` roles

This adds two more built in roles -

`manage_slm` which has permission to perform any of the SLM actions, as well as
stopping, starting, and retrieving the operation status of ILM.

`read_slm` which has permission to retrieve snapshot lifecycle policies as well
as retrieving the operation status of ILM.

Relates to #38461

* Add execute to the test

* Fix ilm -> slm typo in test

* Record SLM history into an index (#41707)

It is useful to have a record of the actions that Snapshot Lifecycle
Management takes, especially for the purposes of alerting when a
snapshot fails or has not been taken successfully for a certain amount of
time.

This adds the infrastructure to record SLM actions into an index that
can be queried at leisure, along with a lifecycle policy so that this
history does not grow without bound.

Additionally,
SLM automatically setting up an index + lifecycle policy leads to
`index_lifecycle` custom metadata in the cluster state, which some of
the ML tests don't know how to deal with due to setting up custom
`NamedXContentRegistry`s.  Watcher would cause the same problem, but it
is already disabled (for the same reason).

* High Level Rest Client support for SLM (#41767)

* High Level Rest Client support for SLM

This commit adds HLRC support for SLM.

Relates to #38461

* Fill out documentation tests with tags

* Add more callouts and asciidoc for HLRC

* Update javadoc links to real locations

* Add security test testing SLM cluster privileges (#42678)

* Add security test testing SLM cluster privileges

This adds a test to `PermissionsIT` that uses the `manage_slm` and `read_slm`
cluster privileges.

Relates to #38461

* Don't redefine vars

* Add Getting Started Guide for SLM (#42878)

This commit adds a basic Getting Started Guide for SLM.

* Include SLM policy name in Snapshot metadata (#43132)

Keep track of the originating SLM policy in the metadata field of the snapshots
taken by SLM. This allows users to more easily understand where the
snapshot came from, and will enable future SLM features such as
retention policies.

* Fix compilation after master merge

* [TEST] Move exception wrapping for devious exception throwing

Fixes an issue where an exception was created from one line and thrown in another.

* Fix SLM for the change to AcknowledgedResponse

* Add Snapshot Lifecycle Management Package Docs (#43535)

* Fix compilation for transport actions now that task is required

* Add a note mentioning the privileges needed for SLM (#43708)

* Add a note mentioning the privileges needed for SLM

This adds a note to the top of the "getting started with SLM"
documentation mentioning that there are two built-in privileges to
assist with creating roles for SLM users and administrators.

Relates to #38461

* Mention that you can create snapshots for indices you can't read

* Fix REST tests for new number of cluster privileges

* Mute testThatNonExistingTemplatesAreAddedImmediately (#43951)

* Fix SnapshotHistoryStoreTests after merge

* Remove overridden newResponse functions that have been removed

* Fix compilation for backport

* Fix get snapshot output parsing in test

* [DOCS] Add redirects for removed autogen anchors (#44380)

* Switch <tt>...</tt> in javadocs for {@code ...}
2019-07-16 07:37:13 -06:00
Armin Braun 4a79ccd324
Cleaner Exception Handling on Shard Delete (#44384) (#44407)
* Follow up to #44165
* We should just catch all exceptions here and not return errors after the index-N update went through, since a subsequent delete attempt by the user would fail with SnapshotMissingException because the snapshot now appears deleted. Also, it seems `SnapshotException` isn't even thrown in the changed spot in the first place, and it is certainly not the only exception possible.
2019-07-16 12:20:52 +02:00
David Turner a09389c511 AwaitsFix GatewayIndexStateIT#testJustMasterNode
Relates #44416.
2019-07-16 11:02:32 +01:00
David Turner 8d68d1f54d Cluster health should await events plus other things (#44348)
Today a cluster health request can wait on a selection of conditions, but it
does not guarantee that all of these conditions have ever held simultaneously
when it returns. More specifically, if a request sets `waitForEvents()` along
with some other conditions then Elasticsearch will respond when the master has
processed all the expected pending tasks _and then_ the cluster satisfied the
other conditions, but it may be that at the time the cluster satisfied the
other conditions there were undesired pending tasks again.

This commit adjusts the behaviour of `waitForEvents()` to wait for all the
required events to be processed and then, if the resulting cluster state does
not satisfy the other conditions, it will wait until there is a cluster state
that does and then retry the wait-for-events too.
2019-07-16 06:34:02 +01:00
Ryan Ernst e0b82e92f3
Convert BaseNode(s) Request/Response classes to Writeable (#44301) (#44358)
This commit converts all BaseNodeResponse and BaseNodesResponse
subclasses to implement Writeable.Reader instead of Streamable.

relates #34389
2019-07-15 18:07:52 -07:00
David Turner 86ee8eab3f Allow RerouteService to reroute at lower priority (#44338)
Today the `BatchedRerouteService` submits its delayed reroute task at `HIGH`
priority, but in some cases a lower priority would be more appropriate. This
commit adds the facility to submit delayed reroute tasks at different
priorities, such that each submitted reroute task runs at a priority no lower
than the one requested. It does not change the fact that all delayed reroute
tasks are submitted at `HIGH` priority, but at least it makes this explicit.
2019-07-15 17:41:39 +01:00
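The rule from the commit above (#44338), that a batched reroute runs at a priority no lower than any request folded into it, can be sketched like this. The class `BatchedRerouteSketch`, its methods, and its local `Priority` enum are hypothetical; the enum ordering here (later constants are more urgent) is an assumption of the sketch and may not match the real `Priority` type.

```java
// Hypothetical sketch of priority handling for a single batched task;
// this is not the BatchedRerouteService implementation.
final class BatchedRerouteSketch {

    // Assumption: later constants are more urgent.
    enum Priority { LOW, NORMAL, HIGH, URGENT }

    // Priority of the queued reroute, or null when none is queued.
    private Priority pending;

    // Folds a request into the pending reroute and returns the priority it will
    // run at, which is never lower than any priority requested so far.
    synchronized Priority request(Priority requested) {
        if (pending == null || requested.compareTo(pending) > 0) {
            pending = requested;
        }
        return pending;
    }

    // Runs the pending reroute (simulated) and clears the queue.
    synchronized Priority runPending() {
        Priority ran = pending;
        pending = null;
        return ran;
    }
}
```

Raising the pending priority instead of queuing a second task keeps the batching while still honouring the most urgent request.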
Ryan Ernst 59658daef9
Separate streamable based master node actions (#44313)
This commit creates new base classes for master node actions whose
response types still implement Streamable. This simplifies both finding
remaining classes to convert, as well as creating new master node
actions that use Writeable for their responses.

relates #34389
2019-07-15 09:20:20 -07:00
David Turner e3d2af64c4 Throw TranslogCorruptedException in more cases (#44217)
Today we do not throw a `TranslogCorruptedException` in certain cases of
translog corruption, such as for a corrupted checkpoint file or when an
expected file (either checkpoint or translog) is completely missing. This means
that `elasticsearch-shard` will not truncate the translog in those cases.

This commit strengthens the translog corruption tests to corrupt and/or delete
both translog and checkpoint files, and ensures that a
`TranslogCorruptedException` is thrown in all cases. It also sometimes
simulates a recovery after a crash while rolling the translog generation,
including cases where the rolled checkpoint contains incorrect data.

It also adjusts (and renames) `RemoveCorruptedShardDataCommandIT.getDirs()` to
return only a single path, since in practice this was the only thing that could
happen and yet we were relying on its callers to verify this and not all
callers were doing so.
2019-07-15 15:20:33 +01:00
Armin Braun eb1106c465
Stronger Cleanup Shard Snapshot Directory on Delete (#44257) (#44337)
* Stronger Cleanup Shard Snapshot Directory on Delete

* Use `RepositoryData` to clean up unreferenced `snap-${uuid}.dat` blobs
from shard directories (and index-N) and as a result also clean up data
blobs that are only referenced by them
* Stop cleaning up anything but index-N on shard snapshot creation to
align behavior of shard index-N handling with root path index-N
handling
2019-07-15 12:59:38 +02:00
Christoph Büscher 22dc125dad AnalyzeAction.Response doesn't need to call super.readFrom() (#44331)
The response's super.writeTo() method was removed in #44092, so the corresponding
constructor that reads from a stream shouldn't call super itself, even though its
implementation is currently empty.
2019-07-15 11:53:25 +02:00
Armin Braun 7f5d40d235
Avoid Needless Set Instantiation in InboundMessage (#44318) (#44329)
* Avoid Needless Set Instantiation in InboundMessage

* When `features` is empty (when there's no xpack) we constantly and needlessly instantiated a few objects here for the empty set on every message
2019-07-15 10:59:51 +02:00
Armin Braun 0cc94a457d
Remove non-SMILE Serialization from ChecksumBlobStoreFormat (#44278) (#44326)
* At least all the way back to 6.x we never use anything but `SMILE` in
production code with this class so I removed the more general
constructor and removed the format leniency from the deserialization
2019-07-15 10:59:33 +02:00
Tanguy Leroux 76a96c3774 Remove ReusePeerRecoverySharedTest class (#44275) 2019-07-15 10:29:29 +02:00
Armin Braun d73e2f9c56
HLRC: Fix '+' Not Correctly Encoded in GET Req. (#33164) (#44324)
* HLRC: Fix '+' Not Correctly Encoded in GET Req.

* Encode `+` correctly as `%2B` in URL paths
* Keep encoding `+` as space in URL parameters
* Closes #33077
2019-07-15 10:21:54 +02:00
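The path-encoding rule from the commit above (#33164) can be illustrated with the JDK alone. This is a sketch under stated assumptions, not the client's actual code: the class `PathEncodingSketch` and the method `encodePathSegment` are hypothetical, and the `URLEncoder.encode(String, Charset)` overload requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not the encoding code used by the high-level REST client.
final class PathEncodingSketch {

    // Encodes a single URL path segment so that a literal '+' survives as %2B.
    static String encodePathSegment(String segment) {
        // URLEncoder performs form encoding: '+' becomes "%2B" and ' ' becomes "+".
        String formEncoded = URLEncoder.encode(segment, StandardCharsets.UTF_8);
        // Inside a path a space must be "%20". Every '+' still present at this
        // point stands for a space, because literal '+' is already "%2B".
        return formEncoded.replace("+", "%20");
    }
}
```

For example, `encodePathSegment("a+b c")` yields `a%2Bb%20c`. Query parameters are a different case: per the commit message, `+` there keeps its meaning as a space.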
Nhat Nguyen 2203d447aa Fail engine if hit document failure on replicas (#43523)
An indexing on a replica should never fail after it was successfully
indexed on a primary. Hence, we should fail an engine if we hit any
failure (document level or tragic failure) when processing an indexing
on a replica.

Relates #43228
Closes #40435
2019-07-14 19:29:16 -04:00
Christoph Büscher 835b7a120d Fix AnalyzeAction response serialization (#44284)
Currently we lose information about whether a token list in an AnalyzeAction
response is null or an empty list, because we write a 0 value to the stream in
both cases and deserialize to a null value on the receiving side. This change
fixes this so we write an additional flag indicating whether the value is null
or not, followed by the size of the list and its content.

Closes #44078
2019-07-14 10:35:11 +02:00
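The wire format described in the commit above (#44284), a null flag followed by the list size and contents, can be sketched with plain JDK streams. The class `NullableListCodec` is hypothetical and uses `DataOutputStream`/`DataInputStream` as stand-ins; it is not the Elasticsearch `StreamOutput`/`StreamInput` API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical codec showing how a presence flag keeps null and empty apart.
final class NullableListCodec {

    // Writes a presence flag, then the size and elements if the list is non-null.
    static byte[] write(List<String> tokens) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeBoolean(tokens != null);
            if (tokens != null) {
                out.writeInt(tokens.size());
                for (String token : tokens) {
                    out.writeUTF(token);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the flag first, so a null list and an empty list decode differently.
    static List<String> read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            if (!in.readBoolean()) {
                return null;
            }
            int size = in.readInt();
            List<String> tokens = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                tokens.add(in.readUTF());
            }
            return tokens;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without the flag, writing only a size of 0 for both cases forces the reader to pick one interpretation, which is the information loss the commit describes.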
Ryan Ernst 1dcf53465c Reorder HandledTransportAction ctor args (#44291)
This commit moves the Supplier variant of HandledTransportAction to have
a different ordering than the Writeable.Reader variant. The Supplier
version is used for the legacy Streamable, and currently having the
location of the Writeable.Reader vs Supplier in the same place forces
using casts of Writeable.Reader to select the correct super constructor.
This change in ordering allows easier migration to Writeable.Reader.

relates #34389
2019-07-12 13:45:09 -07:00
Nikita Glashenko d187fcb9de Support WKT point conversion to geo_point type (#44107)
This PR adds support for parsing geo_point values from WKT POINT format.
Also, a few minor bugs in geo_point parsing were fixed.

Closes #41821
2019-07-12 14:31:07 -04:00
Przemyslaw Gomulka e23ecc5838
JSON logging refactoring and X-Opaque-ID support backport (#41354) (#44178)
This is a refactor of the current JSON logging to make it more open for extensions
and to support custom ES log messages used in DeprecationLogger, IndexingSlowLog, and SearchSlowLog.
We want to include x-opaque-id in deprecation logs. The easiest way to have this as an additional JSON field instead of part of the message is to create a custom DeprecatedMessage (extends ESLogMessage).

These messages are regular log4j messages with a text, but also carry a map of fields which can then populate the log pattern. The logic for this lives in ESJsonLayout and ESMessageFieldConverter.

A similar approach can be used to refactor IndexingSlowLog and SearchSlowLog JSON logs to contain fields previously only present as an escaped JSON string in a message field.

closes #41350
backport #41354
2019-07-12 16:53:27 +02:00
Armin Braun 9b4f50b40a
Remove Redundant GetAllSnapshots Method from RepositoryData (#44259) (#44271)
* With the removal of the incompatible snapshots list in RepositoryData
the get snapshots and get all snapshots methods are equivalent so I
removed one of them
2019-07-12 15:03:09 +02:00
Yannick Welsch 068286ca4b Remove RemoteClusterConnection.ConnectedNodes (#44235)
This instead exposes the set of connected nodes on ConnectionManager.
2019-07-12 14:54:21 +02:00
Armin Braun 6c02cf0241
Fix InternalTestCluster StopRandomNode Assertion (#44258) (#44265)
* The assertion added in #44214 is tripped by tests running dedicated
test clusters per test needlessly. This breaks existing tests like the one in #44245.
* Closes #44245
2019-07-12 13:18:55 +02:00
Armin Braun ad6dce16f4
Safer Shard Snapshot Delete (#44165) (#44244)
* Safer Shard Snapshot Delete

* We shouldn't delete the snapshot meta file before we update the index
in the shard folder. If we fail to update the index-N after deleting it, the
existing index-N is broken because the snap- blob it references is gone.
2019-07-12 12:45:06 +02:00
David Turner 735c897ec6
Avoid counting votes from master-ineligible nodes (#43688)
Today if a master-eligible node is converted to a master-ineligible node it may
remain in the voting configuration, meaning that the master node may count its
publish responses as an indication that it has properly persisted the cluster
state. However master-ineligible nodes do not properly persist the cluster
state, so it is not safe to count these votes.

This change adjusts `CoordinationState` to take account of this from a safety
point of view, and also adjusts the `Coordinator` to prevent such nodes from
joining the cluster. Instead, it triggers a reconfiguration to remove from the
voting configuration a node that now appears to be master-ineligible before
processing its join.

Backport of #43688, see #44260.
2019-07-12 11:30:52 +01:00
Armin Braun 9e920f9612
Make Timestamps Returned by Snapshot APIs Consistent (#43148) (#44261)
* We don't have to calculate the start and end times from the shards for the status API, we have the start time available from the CS or the `SnapshotInfo` in the repo and can either take the end time from the `SnapshotInfo` or
take the most recent time from the shard stats for in progress snapshots
* Closes #43074
2019-07-12 12:05:35 +02:00
Mark Vieira 3cd9606566
Mute failing test 2019-07-11 13:32:49 -07:00
Armin Braun 0dd06cf7a5
Remove Dead Code Around Snapshots (#44109) (#44236)
* Just some random spots that have become unused with recent cleanups
2019-07-11 21:56:36 +02:00
Christoph Büscher 31725ef390 [Tests] Increase SimpleQueryStringIT allowed maxClauseCount (#44215)
For this test, we randomize the CLUSTER_MAX_CLAUSE_COUNT on test setup
(@BeforeClass) between 50 and 100. Some queries in the test generate 56 clauses,
which wasn't an issue before LUCENE-8811, but we now need to slightly increase
the minimum possible clause count.

Closes #44192
2019-07-11 20:16:20 +02:00
Yannick Welsch ae8f625d73 Report usages of child breakers when breaking on real memory (#44221)
This will help in investigations where the real memory circuit breaker is tripped, to better understand
what the memory is actually used for, i.e. whether it's a temporary thing (e.g. requests) in contrast to
more permanently allocated memory (e.g. accounting).
2019-07-11 19:52:12 +02:00
Armin Braun 2768662822
Cleanup Stale Root Level Blobs in Sn. Repository (#43542) (#44226)
* Cleans up all root level temp., snap-%s.dat, meta-%s.dat blobs that aren't referenced by any snapshot to deal with dangling blobs left behind by delete and snapshot finalization failures
   * The scenario that gets us here is a snapshot failing before it was finalized or a delete failing right after it wrote the updated index-(N+1) that doesn't reference a snapshot anymore but then fails to remove that snapshot
   * Not deleting other dangling blobs that don't follow the snap-, meta- or tempfile naming schemes, so as not to accidentally delete blobs not created by the snapshot logic
* Follow up to #42189
  * Same safety logic, get list of all blobs before writing index-N blobs, delete things after index-N blobs were written
2019-07-11 19:35:15 +02:00
Andrei Stefan e9f9f00940
SQL: add pretty printing to JSON format (#43756) (#44220)
(cherry picked from commit cbd9d4c259bf5a541bc49f65f7973174a36df449)
2019-07-11 20:02:24 +03:00
Christos Soulios c091b6c004
Migrating tests from AvgIT integration test to AvgAggregatorTests (#44076) (#44225)
This PR migrates most tests from AvgIT integration test to AvgAggregatorTests, as described in #42893
2019-07-11 19:20:13 +03:00
Armin Braun 5f22370b6b
Fix ShrinkIndexIT (#44214) (#44223)
* Fix ShrinkIndexIT

* Move this test suite to cluster scope. Currently, `testShrinkThenSplitWithFailedNode` stops a random node which randomly turns out to be the only shared master node, so the cluster reset fails because no shared master node survived.
* Closes #44164
2019-07-11 17:58:00 +02:00
Igor Motov 1636701d69 CI: Disable SimpleQueryStringIT.testDocWithAllTypes
Tracked by #44192
2019-07-11 09:22:18 -05:00
Nick Knize 374030a53f
Upgrade to lucene-8.2.0-snapshot-860e0be5378 (#44171) (#44184)
Upgrades lucene library to lucene-8.2.0-snapshot-860e0be5378
2019-07-11 09:17:22 -05:00
Igor Motov 66a9b721f5 Add Map to XContentParser Wrapper (#44036)
In some cases we need to parse some XContent that is already parsed into
a map. This is currently happening in handling source in SQL and ingest
processors as well as parsing null_value values in geo mappings. To avoid
re-serializing and parsing the value again or writing another map-based
parser this commit adds an iterator that iterates over a map as if it was
XContent. This makes reusing existing XContent parser on maps possible.
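
The idea of walking an already-parsed map as if it were a token stream can be sketched as follows. This is a simplified illustration, not the wrapper added by this commit: the class `MapTokenSketch`, its method names, and the string token labels are all hypothetical, and it builds the token list eagerly rather than iterating lazily.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: flattens a parsed map into parser-style tokens. */
final class MapTokenSketch {

    /** Returns the tokens a streaming parser would emit for the given value. */
    static List<String> tokens(Object value) {
        List<String> out = new ArrayList<>();
        emit(value, out);
        return out;
    }

    private static void emit(Object value, List<String> out) {
        if (value instanceof Map) {
            // An object: start marker, then each field name followed by its value
            out.add("START_OBJECT");
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                out.add("FIELD_NAME:" + entry.getKey());
                emit(entry.getValue(), out);
            }
            out.add("END_OBJECT");
        } else if (value instanceof List) {
            // An array: start marker, then each element in order
            out.add("START_ARRAY");
            for (Object item : (List<?>) value) {
                emit(item, out);
            }
            out.add("END_ARRAY");
        } else if (value == null) {
            out.add("VALUE_NULL");
        } else {
            out.add("VALUE:" + value);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("name", "point");
        source.put("coords", Arrays.asList(1, 2));
        System.out.println(tokens(source));
    }
}
```

No serialization step is involved: the nested maps and lists are visited directly, which is the saving the commit message describes.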

Relates to #43554
2019-07-11 09:38:31 -04:00
Yannick Welsch ea5513f2cf Make NodeConnectionsService non-blocking (#44211)
With connection management now being non-blocking, we can make NodeConnectionsService
avoid the use of MANAGEMENT threads that are blocked during the connection attempts.

I had to fiddle a bit with the tests as testPeriodicReconnection was using both the mock Threadpool
from the DeterministicTaskQueue as well as the real ThreadPool initialized at the test class level,
which resulted in races.
2019-07-11 14:08:07 +02:00
Armin Braun 51f0e941d3
Reduce Number of List Calls During Snapshot Create and Delete (#44088) (#44209)
* Reduce Number of List Calls During Snapshot Create and Delete

Some obvious cleanups I found when investigating the API call count
metering:
* No need to get the latest generation id after loading latest
repository data
   * Loading RepositoryData already requires fetching the latest
generation so we can reuse it
* Also, reuse list of all root blobs when fetching latest repo
generation during snapshot delete like we do for shard folders
* Lastly, don't try and load `index--1` (N = -1) repository data, it
doesn't exist -> just return the empty repo data initially
2019-07-11 13:52:36 +02:00
Armin Braun 8ce8c627dd
Some Cleanup in o.e.i.shard (#44097) (#44208)
* Some Cleanup in o.e.i.shard

* Extract one duplicated method
* Cleanup obviously unused code
2019-07-11 13:52:06 +02:00
Yannick Welsch 2ee07f1ff4 Simplify port usage in transport tests (#44157)
Simplifies AbstractSimpleTransportTestCase to use JVM-local ports  and also adds an assertion so
that cases like #44134 can be more easily debugged. The likely reason for that one is that a test,
which was repeated again and again while always spawning a fresh Gradle worker (due to Gradle
daemon) kept increasing Gradle worker IDs, causing an overflow at some point.
2019-07-11 13:35:37 +02:00
Armin Braun c0ed64bb92
Improve Repository Consistency Check in Tests (#44204)
* Improve Repository Consistency Check in Tests (#44099)

* Check that index metadata as well as snapshot metadata always exists
when referenced by other metadata

* Fix SnapshotResiliencyTests on ExtraFS (#44113)

* As a result of #44099 we're now checking more directories and have to
ignore the `extraN` folders for those like we do for indices already
* Closes #44112
2019-07-11 11:14:37 +02:00
Armin Braun 8a554f9737
Remove IncompatibleSnapshots Logic from Codebase (#44096) (#44183)
* The incompatible snapshots logic was created to track 1.x snapshots that
became incompatible with 2.x
   * It serves no purpose at this point
   * It adds an additional GET request to every loading of
RepositoryData (from loading the incompatible snapshots blob)
2019-07-11 07:15:51 +02:00
Igor Motov df2e1fb43e Geo: add validator that only checks altitude (#43893)
By default, we don't check ranges while indexing geo_shapes. As a
result, it is possible to index geoshapes that contain
coordinates outside of -90 +90 and -180 +180 ranges. Such geoshapes
will currently break SQL and ML retrieval mechanisms. This commit removes
these restrictions from the validator that is used in SQL and ML retrieval.
2019-07-10 16:55:03 -04:00
Ryan Ernst 8fda49a834 Remove unused import in TransportShardBulkAction
Accidentally left from backporting #44092
2019-07-10 13:43:47 -07:00
Christoph Büscher cbb19032df [Test] Additional logging for RemoteClusterClientTests (#44124) 2019-07-10 22:41:54 +02:00
Ryan Ernst c6efb9be2a Convert ReplicationResponse to Writeable (#43953)
This commit converts ReplicationResponse and all its subclasses to
support Writeable.Reader as a constructor.

relates #34389
2019-07-10 12:45:10 -07:00
Ryan Ernst fb77d8f461 Removed writeTo from TransportResponse and ActionResponse (#44092)
The base classes for transport requests and responses currently
implement Streamable and Writeable. The writeTo method on these base
classes is implemented with an empty implementation. Not only does this
mislead subclasses into thinking they need to call super.writeTo, but it
also can lead to not implementing writeTo when it should have been
implemented, or extending one of these classes when not necessary,
since there is nothing to actually implement.

This commit removes the empty writeTo from these base classes, and fixes
subclasses to not call super and in some cases implement an empty
writeTo themselves.

relates #34389
2019-07-10 12:42:04 -07:00
Zachary Tong 92ad588275
Remove generic on AggregatorFactory (#43664) (#44079)
AggregatorFactory was generic over itself, but it doesn't appear we
use this functionality anywhere (e.g. to allow the super class
to declare arguments/return types generically for subclasses to
override).  Most places use a wildcard constraint, and even when a
concrete type is specified it wasn't used.

But since AggFactories are widely used, this led to
the generic touching many pieces of code and making type signatures
fairly complex
2019-07-10 13:20:28 -04:00
Nhat Nguyen b158919542 Do not use mock engine in PrimaryAllocationIT (#44083)
PrimaryAllocationIT#testForceStaleReplicaToBePromotedToPrimary
relies on the flushing when a shard is no longer assigned. This behavior,
however, can be randomly disabled in MockInternalEngine.

Closes #44049
2019-07-10 12:26:34 -04:00
David Turner d0f1a756d9 Comment on the extra reroute after failing shards (#44152)
The `ShardFailedClusterStateTaskExecutor` fails some shards, which performs a
reroute, but then sometimes schedules a followup reroute. It's not clear from
the code why this followup is necessary, so this commit adds a short comment
describing why it's necessary.
2019-07-10 13:24:21 +01:00
David Roberts cad804df92 [TEST] Mute ShrinkIndexIT
Due to https://github.com/elastic/elasticsearch/issues/44164
2019-07-10 13:22:25 +01:00
Martijn van Groningen 913b6a64e8
Replace Streamable w/ Writeable for MultiSearchRequest (#44057)
This commit replaces usages of Streamable with Writeable for the
MultiSearchRequest class.

I ran into this when developing a custom action that reuses
MultiSearchRequest in the enrich branch.

Relates to #34389
2019-07-10 11:13:28 +02:00
Armin Braun a23d1ed00d
Mute SearchWithRandomExceptionsIT (#44147) (#44149)
* This is failing quite often and we can reproduce it now, so we don't
need additional test logging on CI
* Relates #40435
2019-07-10 08:12:26 +02:00
David Turner aec44fecbc Decouple DiskThresholdMonitor & ClusterInfoService (#44105)
Today the `ClusterInfoService` requires the `DiskThresholdMonitor` at
construction time so that it can notify it when nodes report changes in their
disk usage, but this is awkward to construct: the `DiskThresholdMonitor`
requires a `RerouteService` which requires an `AllocationService` which comes
from the `ClusterModule` which requires the `ClusterInfoService`.

Today we break the cycle with a `LazilyInitializedRerouteService` which is
itself a little ugly. This commit replaces this with a more traditional
subject/observer relationship between the `ClusterInfoService` and the
`DiskThresholdMonitor`.
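
A subject/observer relationship of this kind can be sketched roughly as below. The class names `InfoService`, `ThresholdMonitor`, and `DiskUsageInfo` are hypothetical stand-ins, not the real Elasticsearch types; the point is only that the subject can be constructed without its observer, which removes the construction cycle.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical sketch of the subject/observer wiring; not Elasticsearch code. */
final class ObserverSketch {

    /** Stand-in for whatever the real service reports about disk usage. */
    static final class DiskUsageInfo {
        final long freeBytes;

        DiskUsageInfo(long freeBytes) {
            this.freeBytes = freeBytes;
        }
    }

    /** The subject: knows nothing about its observers at construction time. */
    static final class InfoService {
        private final List<Consumer<DiskUsageInfo>> listeners = new CopyOnWriteArrayList<>();

        void addListener(Consumer<DiskUsageInfo> listener) {
            listeners.add(listener);
        }

        void publish(DiskUsageInfo info) {
            for (Consumer<DiskUsageInfo> listener : listeners) {
                listener.accept(info);
            }
        }
    }

    /** The observer: may be built later, after its own dependencies exist. */
    static final class ThresholdMonitor {
        long lastSeenFreeBytes = -1;

        void onNewInfo(DiskUsageInfo info) {
            lastSeenFreeBytes = info.freeBytes;
        }
    }

    public static void main(String[] args) {
        InfoService service = new InfoService();           // built first, no monitor needed
        ThresholdMonitor monitor = new ThresholdMonitor(); // built later
        service.addListener(monitor::onNewInfo);           // wired up after construction
        service.publish(new DiskUsageInfo(1024L));
        System.out.println(monitor.lastSeenFreeBytes);
    }
}
```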
2019-07-09 18:43:32 +01:00
David Turner e70cad4c52 Remove node conn block after connection barrier (#44114)
Today `testOnlyBlocksOnConnectionsToNewNodes` fails (extremely rarely) if the
last attempt to connect to `node0` is delayed for so long that the test runs
`nodeConnectionsBlocks.clear()` before the connection attempt obtains the
expected connection block. We can turn this into a reliable failure with this
delay:

```diff
diff --git a/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java b/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java
index f48413824d3..9a1d0336bcd 100644
--- a/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java
+++ b/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java
@@ -300,6 +300,13 @@ public class NodeConnectionsService extends AbstractLifecycleComponent {
         private final Runnable connectActivity = () -> threadPool.executor(ThreadPool.Names.MANAGEMENT).execute(new AbstractRunnable() {
             @Override
             protected void doRun() {
+
+                try {
+                    Thread.sleep(500);
+                } catch (InterruptedException e) {
+                    throw new AssertionError("unexpected", e);
+                }
+
                 assert Thread.holdsLock(mutex) == false : "mutex unexpectedly held";
                 transportService.connectToNode(discoveryNode);
                 consecutiveFailureCount.set(0);
```

This commit reverts the extra logging introduced in #43979 and fixes this
failure by waiting for the connection attempt to hit the barrier before
removing it.

Fixes #40170
2019-07-09 17:03:26 +01:00
David Turner 268971db03 Wait for blackholed connection before discovery (#44077)
Since #42636 we no longer treat connections specially when simulating a
blackholed connection. This means that at the end of the safety phase we may
have just started a connection attempt which will time out, but the default
timeout is 30 seconds, much longer than the 2 seconds we normally allow for
post-safety-phase discovery. This commit adds time for such a connection
attempt to time out.

It also fixes some spurious logging of `this` that now refers to an object with
an unhelpful `toString()` implementation introduced in #42636.

Fixes #44073
2019-07-09 10:59:53 +01:00
Henning Andersen 748a10866d Reindex ScrollableHitSource pump data out (#43864)
Refactor ScrollableHitSource to pump data out and have a simplified
interface (callers should no longer call startNextScroll, instead they
simply mark that they are done with the previous result, triggering a
new batch of data). This eases making reindex resilient, since we will
sometimes need to rerun search during retries.

Relates #43187 and #42612
2019-07-09 11:50:09 +02:00
David Turner fd9eebae81 Only apply initial recovery filter to shrunk shard (#44054)
Today the `index.routing.allocation.initial_recovery._id` setting can only be
set on indices that are the result of a shrink, but the filtered allocation
decider also applies this filter to shards with a recovery source of
`EMPTY_STORE`. The only way to have this setting set while the recovery source
is `EMPTY_STORE` is to force-allocate an empty primary, but such a forced
allocation ignores this allocation decider.

This commit simplifies the allocation decider so that the `initial_recovery`
setting only applies to shards with a recovery source of `LOCAL_SHARDS`.
2019-07-09 08:42:18 +01:00
Armin Braun 9eac5ceb1b
Dry up inputstream to bytesreference (#43675) (#44094)
* Dry up Reading InputStream to BytesReference
* Dry up spots where we use the same pattern to get from an InputStream to a BytesReference
Armin Braun dc8f8e40eb
Fix DedicatedClusterSnapshotRestoreIT testSnapshotWithStuckNode (#43537) (#44082)
* Fix DedicatedClusterSnapshotRestoreIT testSnapshotWithStuckNode

* See comment in the test: The problem is that when the snapshot delete works out partially on master failover and the retry fails on `SnapshotMissingException` no repository cleanup is run => we still failed even with repo cleanup logic in the delete path now
   * Fixed the test by rerunning a create snapshot and delete loop to clean up the repo before verifying file counts
* Closes #39852
2019-07-09 06:32:08 +02:00
Armin Braun 03332b5aeb
Don't Consistency Check Broken Repository in Test (#43499) (#44071)
* Missed this one in #42189, and it randomly runs into a situation where the mock repo is broken such that we can't get to a consistent end state via a delete
* Closes #43498
2019-07-08 17:21:40 +02:00
Tanguy Leroux 251287f89d Check again on-going snapshots/restores of indices before closing (#43873)
Today we prevent any index that is actively being snapshotted or restored from being closed.
This verification is done during the execution of the first phase of index closing
(i.e. before blocking the indices).

We should also do this verification again in the last phase of index closing
(i.e. after the shard sanity checks and right before actually changing the index
state and the routing table) because a snapshot/restore could sneak in while
the shards are verified-before-close.
2019-07-08 17:07:04 +02:00
Mark Tozzi 299a52c17d
Enable validating user-supplied missing values on unmapped fields (#43718) (#43940)
Provides a hook for aggregations to introspect the `ValuesSourceType` for a user supplied Missing value on an unmapped field, when the type would otherwise be `ANY`.  Mapped field behavior is unchanged, and still applies the `ValuesSourceType` of the field.  This PR just provides the hook for doing this, no existing aggregations have their behavior changed.
2019-07-08 10:46:23 -04:00
Armin Braun 2918363e90
Simplify BlobStoreRepository (Flatten Nested Classes) (#42833) (#44060)
* In the current codebase it is hardly obvious what code operates on a shard and is run by a data node and what code operates on the global metadata and is run on master
   * Fixed by adjusting the method names accordingly
* The nested context classes don't add much if any value, they simply spread out the parameters that go into a shard snapshot create or delete all over the place since their
constructors can be inlined in all spots
   * Fixed by flattening the nested classes into BlobStoreRepository
* Also:
  * Inlined the other single use inner classes
2019-07-08 14:57:27 +02:00
Armin Braun afe81fd625
Some Cleanup in Test Framework (#44039) (#44059)
* Remove some obvious dead code
* Move assert methods that were only used in a single test class to the child they belong to
* Inline some redundant methods
2019-07-08 14:15:31 +02:00
David Turner 3f3bcb23c2 AwaitsFix testForceStaleReplicaToBePromotedToPrimary
Relates #44049
2019-07-08 11:26:57 +01:00
David Turner 3129f5b42e Do not copy initial recovery filter during split (#44053)
If an index is the result of a shrink then it will have a value set for
`index.routing.allocation.initial_recovery._id`. If this index is subsequently
split then this value will be copied over, forcing the initial allocation of
the split shards to occur on the node on which the shrink took place. Moreover
if this node no longer exists then the split will fail.  This commit suppresses
the copying of this setting when splitting an index.
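
As a rough sketch of the behaviour described, assuming settings are modelled as a plain string map (the class `SplitSettingsSketch` and its method are hypothetical and much simpler than the real resize logic):

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; the real resize code does not look like this. */
final class SplitSettingsSketch {

    static final String INITIAL_RECOVERY_ID = "index.routing.allocation.initial_recovery._id";

    /** Copies the source index settings but leaves out the initial recovery filter. */
    static Map<String, String> settingsForSplit(Map<String, String> sourceSettings) {
        Map<String, String> target = new LinkedHashMap<>(sourceSettings);
        // The split shards must not be pinned to the node used by an earlier shrink
        target.remove(INITIAL_RECOVERY_ID);
        return target;
    }

    public static void main(String[] args) {
        Map<String, String> shrunk = new LinkedHashMap<>();
        shrunk.put("index.number_of_shards", "1");
        shrunk.put(INITIAL_RECOVERY_ID, "node-that-may-be-gone");
        System.out.println(settingsForSplit(shrunk));
    }
}
```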

Fixes #43955
2019-07-08 10:32:05 +01:00
Armin Braun af9b98e81c
Recursively Delete Unreferenced Index Directories (#42189) (#44051)
* Use ability to list child "folders" in the blob store to implement recursive delete on all stale index folders when cleaning up instead of using the diff between two `RepositoryData` instances to cover aborted deletes
* Runs after every delete operation
* Relates #13159 (fixing most of the issues caused by unreferenced indices, leaving some meta files to be cleaned up only)
2019-07-08 10:55:39 +02:00
Przemyslaw Gomulka 247f2dabad
Fix decimal point parsing for date_optional_time backport(#43859) #44050
Joda allowed the decimal point in date_optional_time and strict_date_optional_time to be either a dot (.) or a comma (,).
Our java.time implementation should allow the same, including for strict_date_optional_time_nanos.
The approach to fix this is the same as in the iso8601 parser.
closes #43730
2019-07-08 09:56:01 +02:00
Armin Braun f6efc55556
Fix SnapshotResiliencyTest (#44015) (#44041)
* Closes #43989
2019-07-07 19:59:16 +02:00
Armin Braun 990ac4ca83
Some Cleanup in BlobStoreRepository (#43323) (#44043)
* Some Cleanup in BlobStoreRepository

* Extracted from #42833:
  * Dry up index and shard path handling
  * Shorten XContent handling
2019-07-07 19:50:46 +02:00
Nhat Nguyen 9089820d8f Enable indexing optimization using sequence numbers on replicas (#43616)
This PR enables the indexing optimization using sequence numbers on
replicas. With this optimization, indexing on replicas should be faster
and use less memory as it can forgo the version lookup when possible.
This change also deactivates the append-only optimization on replicas.

Relates #34099
2019-07-05 22:12:08 -04:00
Yannick Welsch 504a43d43a Move ConnectionManager to async APIs (#42636)
This commit converts the ConnectionManager's openConnection and connectToNode methods to
async-style. This will allow us to not block threads anymore when opening connections. This PR also
adapts the cluster coordination subsystem to make use of the new async APIs, allowing us to remove
some hacks in the test infrastructure that had to account for the previous synchronous nature of the
connection APIs.
2019-07-05 20:40:22 +02:00
Yannick Welsch 88783927d1 Weaken assertion in PublicationTransportHandler (#44014)
These assertions do not hold true when a master fails during publication and quickly becomes
master again, publishing a new cluster state in a higher term which races against the previous
cluster state publication to self (which does not matter anyway).

Relates #43994

Closes #44012
2019-07-05 18:27:42 +02:00
Yannick Welsch 1220ff5b6d Publish to self through transport (#43994)
This commit ensures that cluster state publications to self also go through the transport layer. This
allows voting-only nodes to intercept the publication to self.

Fixes an issue discovered by a test failure where a voting-only node, which was the only
bootstrapped node, would not step down as master after state transfer because publishing to self
would succeed.

Closes #43631
2019-07-05 13:00:52 +02:00
Yannick Welsch 5cdf3ff3fa Revert "[TEST] Mute RemoteClusterServiceTests.testCollectNodes"
This reverts commit d8a2970fa4.
2019-07-05 11:02:42 +02:00
David Turner 06df0c0a4c Improve RetentionLease(Bgrd)SyncAction#toString() (#43987)
Today `RetentionLeaseSyncAction.Request` and
`RetentionLeaseBackgroundSyncAction.Request` both describe themselves as
`Request{...}` in the value returned from their respective `toString()`
methods. This commit adds the name of the owning class to both so we have
something a bit easier to search for and so we can distinguish foreground from
background syncs in logs and test failures and so on.
2019-07-05 09:58:35 +01:00
David Turner 435a83f3fd Add more logging to testOnlyBlocksOnConnectionsToNewNodes (#43979)
Some more output from this occasionally-failing test tracked in #40170.
2019-07-05 09:54:48 +01:00
Jim Ferenczi cdf55cb5c5 Refactor index engines to manage readers instead of searchers (#43860)
This commit changes the way we manage refreshes in the index engines.
Instead of relying on a SearcherManager, this change uses a ReaderManager that
creates ElasticsearchDirectoryReader when needed. Searchers are now created on-demand
(when acquireSearcher is called) from the current ElasticsearchDirectoryReader.
It also slightly changes the Engine.Searcher to extend IndexSearcher in order
to simplify the usage in the consumer.
2019-07-04 22:49:43 +02:00
Christoph Büscher aeb3c1fd1b Prevent types deprecation warning for indices.exists requests (#43963)
Currently we log a deprecation warning about the types removal in
RestGetIndicesAction even if the REST method is HEAD, which is used by the
indices.exists API. Since the body is empty in this case we should not need to
show the deprecation warning.

Closes #43905
2019-07-04 17:20:43 +02:00
Tanguy Leroux b037aeaa6e
Fix IndexShardIT.testIndexCanChangeCustomDataPath() (#43978)
The test IndexShardIT.testIndexCanChangeCustomDataPath() fails
 on 7.x and 7.3 because the translog cannot be recovered.

While I can't reproduce the issue, I think it has been introduced in #43752 
which changed ReadOnlyEngine so that it opens the translog in its 
constructor in order to load the translog stats. This opening writes a 
new checkpoint file, but because 7.x/7.3 does not wait for shards to be 
started after being closed, the test immediately starts to copy shard files
 to a new directory and possibly does not copy all the required translog files.

By waiting for the shards to be started after being closed, we ensure 
that the shards (and engines) have been correctly initialized and that
 the translog checkpoint file is not currently being written.

closes #43964
2019-07-04 17:06:37 +02:00
Martijn van Groningen 653f1436a0
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-07-04 13:05:10 +02:00
Alan Woodward 4b99255fed Add name() method to TokenizerFactory (#43909)
This brings TokenizerFactory into line with CharFilterFactory and TokenFilterFactory,
and removes the need to pass around tokenizer names when building custom analyzers.

As this means that TokenizerFactory is no longer a functional interface, the commit also
adds a factory method to TokenizerFactory to make construction simpler.
2019-07-04 11:28:55 +01:00
Jim Ferenczi 2cc0a56fe6 Fix wrong logic in `match_phrase` query with multi-word synonyms (#43941)
Disjunction over two individual terms in a phrase query with multi-word synonyms
wrongly applies a prefix query to each of these terms. This change fixes this bug
by inverting the logic to use prefixes on `phrase_prefix` queries only.

Closes #43308
2019-07-04 09:39:39 +02:00
Henning Andersen cacc3f7ff8 Async IO Processor release before notify (#43682)
This commit changes async IO processor to release the promiseSemaphore
before notifying consumers. This ensures that a bad consumer that
sometimes does blocking (or otherwise slow) operations does not halt the
processor. This should slightly increase the concurrency for shard
fsync, but primarily improves safety so that one bad piece of code has
less effect on overall system performance.
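
The ordering change can be illustrated with a small sketch. This is not the actual `AsyncIOProcessor` code: the class `ReleaseBeforeNotifySketch` and its methods are hypothetical, and only the field name `promiseSemaphore` is taken from the message above. The permit is given back before any consumer runs, so a slow consumer cannot keep the next batch from being processed.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch of "release before notify"; not the real processor. */
final class ReleaseBeforeNotifySketch {

    private final Semaphore promiseSemaphore = new Semaphore(1);

    /** Runs the write while holding the permit, then notifies without it. */
    void processBatch(Runnable write, List<Runnable> listeners) {
        promiseSemaphore.acquireUninterruptibly();
        try {
            write.run();
        } finally {
            // Release first: listeners below may block or be slow
            promiseSemaphore.release();
        }
        for (Runnable listener : listeners) {
            listener.run();
        }
    }

    int availablePermits() {
        return promiseSemaphore.availablePermits();
    }

    public static void main(String[] args) {
        ReleaseBeforeNotifySketch processor = new ReleaseBeforeNotifySketch();
        Runnable write = () -> System.out.println("writing batch");
        Runnable listener = () ->
            System.out.println("permits visible to listener: " + processor.availablePermits());
        processor.processBatch(write, Collections.singletonList(listener));
    }
}
```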
2019-07-04 06:33:38 +02:00
Igor Motov c593085104 Geo: Refactors libs/geo parser to provide serialization logic as well (#43717)
Enables libs/geo parser to return a geometry format object that can
perform both serialization and deserialization functions. This can
be useful for ingest nodes that are trying to modify an existing
geometry in the source.

Relates to #43554
2019-07-03 19:31:44 -04:00
Adrien Grand 680edbe3f1
Bump current version to 7.4. (#43927) 2019-07-03 20:32:04 +02:00
Armin Braun be20fb80e4
Recursive Delete on BlobContainer (#43281) (#43920)
This is a prerequisite of #42189:

* Add directory delete method to blob container specific to each implementation:
  * Some notes on the implementations:
       * AWS + GCS: We can simply exploit the fact that both AWS and GCS return blobs lexicographically ordered which allows us to simply delete in the same order that we receive the blobs from the listing request. For AWS this simply required listing without the delimiter setting (so we get a deep listing) and for GCS the same behavior is achieved by not using the directory mode on the listing invocation. The nice thing about this is, that even for very large numbers of blobs the memory requirements are now capped nicely since we go page by page when deleting.
       * For Azure I extended the parallelization to the listing calls as well and made it work recursively. I verified that this works with thread count `1` since we only block once in the initial thread and then fan out to a "graph" of child listeners that never block.
       * HDFS and FS are trivial since we have directory delete methods available for them
* Enhances third party tests to ensure the new functionality works (I manually ran them for all cloud providers)
2019-07-03 17:14:57 +02:00
Alan Woodward 49d69bf987 Actually close IndexAnalyzers contents (#43914)
IndexAnalyzers has a close() method that should iterate through all its wrapped
analyzers and close each one in turn. However, instead of delegating to the
analyzers' close() methods, it instead wraps them in a Closeable interface,
which just returns a list of the analyzers. In addition, whitespace normalizers are
ignored entirely.
2019-07-03 16:06:58 +01:00
David Turner 9cecc31cdc Shortcut simple patterns ending in `*` (#43904)
When profiling a call to `AllocationService#reroute()` in a large cluster
containing allocation filters of the form `node-name-*` I observed a nontrivial
amount of time spent in `Regex#simpleMatch` due to these allocation filters.
Patterns ending in a wildcard are not uncommon, and this change treats them as
a special case in `Regex#simpleMatch` in order to shave a bit of time off this
calculation. It also uses `String#regionMatches()` to avoid an allocation in
the case that the pattern's only wildcard is at the start.
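
The two special cases can be sketched as follows. This is an illustration under assumptions, not the actual `Regex#simpleMatch` implementation: the class `WildcardShortcutSketch` is hypothetical and deliberately rejects patterns outside the two shortcut cases.

```java
/** Hypothetical sketch of the single-wildcard shortcuts; not Elasticsearch code. */
final class WildcardShortcutSketch {

    /**
     * Matches str against a pattern that has no wildcard, or exactly one '*'
     * placed at the very start or the very end of the pattern.
     */
    static boolean simpleMatch(String pattern, String str) {
        int first = pattern.indexOf('*');
        if (first == -1) {
            return pattern.equals(str);
        }
        if (first != pattern.lastIndexOf('*')) {
            throw new IllegalArgumentException("sketch handles a single wildcard only");
        }
        if (first == pattern.length() - 1) {
            // "abc*": compare the prefix in place, no substring is allocated
            return str.regionMatches(0, pattern, 0, pattern.length() - 1);
        }
        if (first == 0) {
            // "*abc": compare the suffix in place; a negative offset yields false
            int suffixLength = pattern.length() - 1;
            return str.regionMatches(str.length() - suffixLength, pattern, 1, suffixLength);
        }
        throw new IllegalArgumentException("sketch handles a leading or trailing wildcard only");
    }

    public static void main(String[] args) {
        System.out.println(simpleMatch("node-name-*", "node-name-17"));
        System.out.println(simpleMatch("*-17", "node-name-17"));
        System.out.println(simpleMatch("node-name-*", "other-3"));
    }
}
```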

Microbenchmark results before this change:

    Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
      1113.839 ±(99.9%) 6.338 ns/op [Average]
      (min, avg, max) = (1102.388, 1113.839, 1135.783), stdev = 9.486
      CI (99.9%): [1107.502, 1120.177] (assumes normal distribution)

Microbenchmark results with this change applied:

    Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
      433.190 ±(99.9%) 0.644 ns/op [Average]
      (min, avg, max) = (431.518, 433.190, 435.456), stdev = 0.964
      CI (99.9%): [432.546, 433.833] (assumes normal distribution)

The microbenchmark in question was:

    @Fork(3)
    @Warmup(iterations = 10)
    @Measurement(iterations = 10)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    @State(Scope.Benchmark)
    @SuppressWarnings("unused") //invoked by benchmarking framework
    public class RegexStartsWithBenchmark {

        private static final String testString = "abcdefghijklmnopqrstuvwxyz";
        private static final String[] patterns;

        static {
            patterns = new String[testString.length() + 1];
            for (int i = 0; i <= testString.length(); i++) {
                patterns[i] = testString.substring(0, i) + "*";
            }
        }

        @Benchmark
        public void performSimpleMatch() {
            for (int i = 0; i < patterns.length; i++) {
                Regex.simpleMatch(patterns[i], testString);
            }
        }
    }
2019-07-03 14:15:27 +01:00
paulward24 cff027499a Ensure to access RecoveryState#fileDetails under lock
Closes #43840
2019-07-03 07:39:58 -04:00
Armin Braun 7059224668
Optimize Snapshot Finalization (#42723) (#43908)
* Optimize Snapshot Finalization

* Delete index-N blobs and segment blobs in one single bulk delete instead of in separate ones to save RPC calls on implementations that have bulk deletes implemented
* Don't fail snapshot because deleting old index-N failed, this results in needlessly logging finalization failures and makes analysis of failures harder going forward as well as incorrect index.latest blobs
2019-07-03 13:26:35 +02:00
Armin Braun 455b12a4fb
Add Ability to List Child Containers to BlobContainer (#42653) (#43903)
* Add Ability to List Child Containers to BlobContainer (#42653)

* Add Ability to List Child Containers to BlobContainer
* This is a prerequisite of #42189
2019-07-03 11:30:49 +02:00
Henning Andersen cd2972239c AsyncIOProcessor preserve thread context (#43729)
AsyncIOProcessor now preserves thread context, ensuring that deprecation
warnings are not duplicated to other concurrent operations on the same
shard.
2019-07-03 10:22:20 +02:00
Jim Ferenczi 05c0cff1b6 Fix index_prefix sub field name on nested text fields (#43862)
This change fixes the name of the index_prefix sub field when the `index_prefix`
option is set on a text field that is nested under an object or a multi-field.
We don't use the full path of the parent field to set the index_prefix field name
so the field is registered under the wrong name. This doesn't break queries since
we always retrieve the prefix field through its parent field but this breaks other
APIs like _field_caps which tries to find the parent of the `index_prefix` field
in the mapping but fails.

Closes #43741
2019-07-03 09:50:52 +02:00
Armin Braun 826f38cd70
Enable Parallel Deletes in Azure Repository (#42783) (#43886)
* Parallel deletes via private thread pool
2019-07-03 09:28:39 +02:00
Tanguy Leroux 365dfe88ca Refresh translog stats after translog trimming in NoOpEngine (#43825)
This commit changes NoOpEngine so that it refreshes its translog 
stats once translog is trimmed.

Relates #43156
2019-07-03 08:49:14 +02:00
Jake Landis 2dc056b0a0
Read the default pipeline for bulk upsert through an alias (#41963) (#42802)
This commit allows bulk upserts to correctly read the default pipeline
for the concrete index that belongs to an alias.

Bulk upserts are modeled differently from normal index requests such that
the index request is a request inside of the update request. The update
request (outer) contains the index or alias name, which is not part of the (inner)
index request. This commit adds a secondary check against the update request
(outer) if the index request (inner) does not find an alias.
2019-07-02 20:44:33 -05:00
Christoph Büscher 31cf96e7bf Return reloaded analyzers in _reload_search_analyzer response (#43813)
Currently the response of the "_reload_search_analyzer" endpoint contains the
index names and nodeIds of the indices where analyzer reloading was triggered. This
change adds the names of the search-time analyzers that were reloaded.

Closes #43804
2019-07-02 18:51:15 +02:00
Nhat Nguyen 697cd494bf Remove sort by primary term when reading soft-deletes (#43845)
With Lucene rollback (#33473), we should never have more than one
primary term for each sequence number. Therefore we don't have to sort
by the primary term when reading soft-deletes.
2019-07-02 10:54:32 -04:00
Tanguy Leroux b977f019b8
Expose translog stats in ReadOnlyEngine (#43752) (#43823)
Backport of #43752 for 7.x.
2019-07-02 13:39:00 +02:00
David Turner 1e8e85797d Rename and refactor RoutingService (#43827)
The `RoutingService` has a confusing name, since it doesn't really have
anything to do with routing. Its responsibility is submitting reroute commands
to the master.

This commit renames this class to `BatchedRerouteService`, and extracts the
`RerouteService` interface to avoid passing `BiConsumer`s everywhere. It also
removes that `BatchedRerouteService extends AbstractLifecycleComponent` since
this service has no meaningful lifecycle. Finally, it introduces a small
wrapper class to allow for lazy initialization to deal with the dependency loop
when constructing a `Node`.
2019-07-02 07:04:18 +01:00
Christoph Büscher fe3f9f0c6b Yet another `the the` cleanup (#43815) 2019-07-01 20:22:19 +02:00
Yogesh Gaikwad 031d5e96ac
HLRC changes for kerberos grant type (#43642) (#43822)
The TODO from the last PR for the kerberos grant type was missed.
This commit adds the changes for kerberos grant type in HLRC.
2019-07-02 00:55:02 +10:00
Zachary Tong 1e47ea5f18 Update rare_term version skips, fix SetBackedScalingCuckooFilter javadoc 2019-07-01 10:52:06 -04:00
Zachary Tong ea1794832f Add RareTerms aggregation (#35718)
This adds a `rare_terms` aggregation.  It is an aggregation designed
to identify the long-tail of keywords, e.g. terms that are "rare" or
have low doc counts.

This aggregation is designed to be more memory efficient than the
alternative, which is setting a terms aggregation to size: LONG_MAX
(or worse, ordering a terms agg by count ascending, which has
unbounded error).

This aggregation works by maintaining a map of terms that have
been seen. A counter associated with each value is incremented
when we see the term again.  If the counter surpasses a predefined
threshold, the term is removed from the map and inserted into a cuckoo
filter.  If a future term is found in the cuckoo filter we assume it
was previously removed from the map and is "common".

The map keys are the "rare" terms after collection is done.
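
The collection logic described above can be sketched roughly as follows. This is an illustration only, not the code from the commit: the class name, the `maxDocCount` threshold name, and the use of an exact `HashSet` in place of the cuckoo filter are all assumptions made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only; not the Elasticsearch implementation.
final class RareTermsSketch {
    private final long maxDocCount; // hypothetical name for the predefined threshold
    private final Map<String, Long> candidates = new HashMap<>();
    // Stand-in for the cuckoo filter: an exact set, so it never gives false positives.
    private final Set<String> common = new HashSet<>();

    RareTermsSketch(long maxDocCount) {
        this.maxDocCount = maxDocCount;
    }

    void collect(String term) {
        if (common.contains(term)) {
            return; // previously evicted from the map, so treated as "common"
        }
        long count = candidates.merge(term, 1L, Long::sum);
        if (count > maxDocCount) {
            // The counter surpassed the threshold: move the term out of the map.
            candidates.remove(term);
            common.add(term);
        }
    }

    // The keys left in the map once collection is done are the "rare" terms.
    Set<String> rareTerms() {
        return new TreeSet<>(candidates.keySet());
    }
}
```

A real cuckoo filter is probabilistic and can report false positives, so unlike this sketch it may occasionally classify a rare term as common in exchange for a much smaller memory footprint.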
2019-07-01 10:30:02 -04:00
Nhat Nguyen 598e00a689 Make peer recovery send file info step async (#43792)
Relates #36195
2019-07-01 08:40:45 -04:00
Julie Tibshirani ffa5919d7c
Add support for 'flattened object' fields. (#43762)
This commit merges the `object-fields` feature branch. The new 'flattened
object' field type allows an entire JSON object to be indexed into a field, and
provides limited search functionality over the field's contents.
2019-07-01 12:08:50 +03:00
Martijn van Groningen 8f3387e7cb
fixed compile errors after cherry-picking 2019-07-01 08:31:31 +02:00
Martijn van Groningen 237f2bd60a
Make ingest executing non blocking (#43361)
Added an additional method to the Processor interface to allow a
processor implementation to make a non blocking call.

Also added a semaphore in order to prevent search thread pools from rejecting
search requests originating from the match processor. This is a temporary workaround.
2019-07-01 08:01:46 +02:00
Ryan Ernst 3a2c698ce0
Rename Action to ActionType (#43778)
Action is a class that encapsulates meta information about an action
that allows it to be called remotely, specifically the action name and
response type. With recent refactoring, the action class can now be
constructed as a static constant, instead of needing to create a
subclass. This makes the old pattern of creating a singleton INSTANCE
both misnamed and lacking a common placement.

This commit renames Action to ActionType, thus allowing the old INSTANCE
naming pattern to be TYPE on the transport action itself. ActionType
also conveys that this class is also not the action itself, although
this change does not rename any concrete classes as those will be
removed organically as they are converted to TYPE constants.

relates #34389
2019-06-30 22:00:17 -07:00
Martijn van Groningen eb8e03bc8b
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-06-30 21:32:51 +02:00
David Turner fca7a19713 Avoid parallel reroutes in DiskThresholdMonitor (#43381)
Today the `DiskThresholdMonitor` limits the frequency with which it submits
reroute tasks, but it might still submit these tasks faster than the master can
process them if, for instance, each reroute takes over 60 seconds. This causes
a problem since the reroute task runs with priority `IMMEDIATE` and is always
scheduled when there is a node over the high watermark, so this can starve any
other pending tasks on the master.

This change avoids further updates from the monitor while its last task(s) are
still in progress, and it measures the time of each update from the completion
time of the reroute task rather than its start time, to allow a larger window
for other tasks to run.

It also now makes use of the `RoutingService` to submit the reroute task, in
order to batch this task with any other pending reroutes. It enhances the
`RoutingService` to notify its listeners on completion.

Fixes #40174
Relates #42559
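
A minimal sketch of the throttling idea described above, assuming a hypothetical `ThrottledRerouteSketch` class, an injected millisecond clock, and a caller-supplied `submitter` that runs the given callback when the reroute finishes. None of these names come from the commit.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Illustrative sketch only; class and method names are hypothetical.
final class ThrottledRerouteSketch {
    private final LongSupplier clockMillis;
    private final long minIntervalMillis;
    private final AtomicBoolean inProgress = new AtomicBoolean();
    private volatile boolean completedOnce;
    private volatile long lastCompletedMillis;

    ThrottledRerouteSketch(LongSupplier clockMillis, long minIntervalMillis) {
        this.clockMillis = clockMillis;
        this.minIntervalMillis = minIntervalMillis;
    }

    // Hands a completion callback to `submitter` and returns true, or returns false
    // if a reroute is still running or the last one finished too recently.
    boolean maybeSubmit(Consumer<Runnable> submitter) {
        if (!inProgress.compareAndSet(false, true)) {
            return false; // the previous reroute has not completed yet
        }
        if (completedOnce && clockMillis.getAsLong() - lastCompletedMillis < minIntervalMillis) {
            inProgress.set(false);
            return false; // interval is measured from completion, not from start
        }
        submitter.accept(() -> {
            lastCompletedMillis = clockMillis.getAsLong();
            completedOnce = true;
            inProgress.set(false);
        });
        return true;
    }
}
```

Taking the in-progress flag before checking the interval means a completion cannot slip in between the two checks, so a slow reroute never causes a second one to be queued behind it.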
2019-06-30 16:54:16 +01:00
Nhat Nguyen 55b3ec8d7b Make peer recovery clean files step async (#43787)
Relates #36195
2019-06-29 18:30:51 -04:00
Albert Zaharovits 5e17bc5dcc
Consistent Secure Settings #40416
Introduces a new `ConsistentSecureSettingsValidatorService` service that exposes
a single public method, namely `allSecureSettingsConsistent`. The method returns
`true` if the local node's secure settings (inside the keystore) are equal to the
master's, and `false` otherwise. Technically, the local node has to have exactly
the same secure settings - setting names should not be missing or in surplus -
for all `SecureSetting` instances that are flagged with the newly introduced
`Property.Consistent`. It is worth highlighting that the `allSecureSettingsConsistent`
is not a consensus view across the cluster, but rather the local node's perspective
in relation to the master.
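
The comparison this describes might look roughly like the sketch below. It assumes each node can produce a map from setting name to some digest of the value for the settings flagged as consistent; the class name, the method signature, and the map representation are assumptions for illustration, not the service's actual API.

```java
import java.security.MessageDigest;
import java.util.Map;

// Illustrative sketch only; not the ConsistentSecureSettingsValidatorService API.
final class ConsistentSettingsSketch {
    // Returns true only if both sides have exactly the same setting names
    // (none missing, none in surplus) and matching digests for each name.
    static boolean allConsistent(Map<String, byte[]> local, Map<String, byte[]> master) {
        if (!local.keySet().equals(master.keySet())) {
            return false;
        }
        for (Map.Entry<String, byte[]> entry : local.entrySet()) {
            if (!MessageDigest.isEqual(entry.getValue(), master.get(entry.getKey()))) {
                return false;
            }
        }
        return true;
    }
}
```

The result reflects only how the local map compares to the master's, which matches the note above that this is the local node's perspective rather than a cluster-wide consensus.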
2019-06-29 23:26:17 +03:00
Ryan Ernst 28ab77a023
Add StreamableResponseAction to aid in deprecation of Streamable (#43770)
The Action base class currently works for both Streamable and Writeable
response types. This commit introduces StreamableResponseAction, for
which only the legacy Action implementations which provide newResponse()
will extend. This eliminates the need for overriding newResponse() with
an UnsupportedOperationException.

relates #34389
2019-06-28 21:40:00 -07:00
Tanguy Leroux f02cbe9e40 Trim translog for closed indices (#43156)
Today when an index is closed all its shards are forced flushed
but the translog files are left around. As explained in #42445
we'd like to trim the translog for closed indices in order to
consume less disk space. This commit reuses the existing
AsyncTrimTranslogTask task and reenables it for closed indices.

At the time the task is executed, we should have the guarantee
that nothing holds the translog files that are going to be removed.
It also leaves a short period of time (10 min) during which translog
files of a recently closed index are still present on disk. This could
also help in some cases where the closed index is reopened
shortly after being closed (in order to update an index setting
for example).

Relates to #42445
2019-06-28 16:58:39 +02:00
Jim Ferenczi 7ca69db83f Refactor IndexSearcherWrapper to disallow the wrapping of IndexSearcher (#43645)
This change removes the ability to wrap an IndexSearcher in plugins. The IndexSearcherWrapper is replaced by an IndexReaderWrapper and allows to wrap the DirectoryReader only. This simplifies the creation of the context IndexSearcher that is used on a per request basis. This change also moves the optimization that was implemented in the security index searcher wrapper to the ContextIndexSearcher that now checks the live docs to determine how the search should be executed. If the underlying live docs is a sparse bit set the searcher will compute the intersection
between the query and the live docs instead of checking the live docs on every document that matches the query.
2019-06-28 16:28:02 +02:00
weizijun 377c4cfdc0 Fix threshold spelling errors (#43326)
Substitutes treshold by threshold
2019-06-28 15:47:57 +02:00
Alan Woodward 81dbcfb268 Wildcard intervals (#43691)
This commit adds a wildcard intervals source, similar to the prefix. It
also changes the term parameter in prefix to read prefix, to bring it
in to line with the pattern parameter in wildcard.

Closes #43198
2019-06-28 14:04:03 +01:00
Christoph Büscher 2cc7f5a744
Allow reloading of search time analyzers (#43313)
Currently changing resources (like dictionaries, synonym files etc...) of search
time analyzers is only possible by closing an index, changing the underlying
resource (e.g. synonym files) and then re-opening the index for the change to
take effect.

This PR adds a new API endpoint that allows triggering reloading of certain
analysis resources (currently token filters) that will then pick up changes in
underlying file resources. To achieve this we introduce a new type of custom
analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows
swapping out analysis components. Custom analyzers that contain filters that are
marked as "updateable" will automatically choose this implementation. This PR
also adds this capability to `synonym` token filters for use in search time
analyzers.

Relates to #29051
2019-06-28 09:55:40 +02:00
Alan Woodward 51b230f6ab
Fix PreConfiguredTokenFilters getSynonymFilter() implementations (#38839) (#43678)
When we added support for TokenFilterFactories to specialise how they were used when parsing
synonym files, PreConfiguredTokenFilters were set up to either apply themselves, or be ignored.
This behaviour is a leftover from an earlier iteration, and also has an incorrect default.

This commit makes preconfigured token filters usable in synonym file parsing by default, and brings
those filters that should not be used into line with index-specific filter factories; in indexes created
before version 7 we emit a deprecation warning, and we throw an error in indexes created after.

Fixes #38793
2019-06-28 08:19:00 +01:00
Ryan Ernst 5b4089e57e
Remove nodeId from BaseNodeRequest (#43658)
TransportNodesAction provides a mechanism to easily broadcast a request
to many nodes, and collect the responses into a high level response. Each
node has its own request type, with a base class of BaseNodeRequest.
This base request requires passing the nodeId to which the request will
be sent. However, that nodeId is not used anywhere. It is private to the
base class, yet serialized to each node, where the node could just as
easily find the nodeId of the node it is on locally.

This commit removes passing the nodeId through to the node request
creation, and guards its serialization so that we can remove the base
request class altogether in the future.
2019-06-27 18:45:14 -07:00
Igor Motov 3607876a71 Geo: Makes coordinate validator in libs/geo pluggable (#43657)
Moves coordinate validation from Geometry constructors into
parser.

Relates #43644
2019-06-27 19:53:41 -04:00
Nhat Nguyen ce8771feb7 Do not use MockInternalEngine in GatewayIndexStateIT (#43716)
GatewayIndexStateIT#testRecoverBrokenIndexMetadata relies on the
flushing on shutdown. This behaviour, however, can be randomly disabled
in MockInternalEngine.

Closes #43034
2019-06-27 18:28:04 -04:00
Yannick Welsch 6744344ef2 Handle situation where only voting-only nodes are bootstrapped (#43628)
Adds support for the situation where only voting-only nodes are bootstrapped. In that case, they will
still try to become elected and bring full master nodes into the cluster.
2019-06-27 18:10:15 +02:00
Jim Ferenczi df4b30fd8b Fix propagation of enablePositionIncrements in QueryStringQueryBuilder (#43578)
This change fixes the propagation of the enablePositionIncrements option
to the underlying QueryBuilder.

Closes #43574
2019-06-27 17:01:01 +02:00
Jim Ferenczi 329d05f61e Fix UOE on search requests that match a sparse role query (#43668)
Search requests executed through the SecurityIndexSearcherWrapper throw
an UnsupportedOperationException if they match a sparse role query.
When low level cancellation is activated (which is the default since #42857),
the context index searcher creates a weight that doesn't handle #scorer.
This change fixes this bug and adds a test to ensure that we check this case.
2019-06-27 16:56:56 +02:00
Christoph Büscher 36360358b2 Move query builder caching check to dedicated tests (#43238)
Currently `AbstractQueryTestCase#testToQuery` checks the search context cachable
flag. This is a bit fragile due to the high randomization of query builders
performed by this general test. Also we might only rarely check the
"interesting" cases because they rarely get generated when fully randomizing the
query builder.

This change moves the general checks out of #testToQuery and instead adds
dedicated cache tests for those query builders that exhibit something other than
the default behaviour.

Closes #43200
2019-06-27 14:56:29 +02:00
Alan Woodward 8ff5519b11 Use preconfigured filters correctly in Analyze API (#43568)
When a named token filter or char filter is passed as part of an Analyze API
request with no index, we currently try and build the relevant filter using no
index settings. However, this can miss cases where there is a pre-configured
filter defined in the analysis registry. One example here is the elision filter, which
has a pre-configured version built with the french elision set; when used as part
of normal analysis, this preconfigured set is used, but when used as part of the
Analyze API we end up with NPEs because it tries to instantiate the filter with
no index settings.

This commit changes the Analyze API to check for pre-configured filters in the case
that the request has no index defined, and is using a name rather than a custom
definition for a filter.

It also changes the pre-configured `word_delimiter_graph` filter and `edge_ngram`
tokenizer to make their settings consistent with the defaults used when creating
them with no settings.

Closes #43002
Closes #43621
Closes #43582
2019-06-27 09:07:01 +01:00
Martijn van Groningen 683e116601
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-06-27 08:35:37 +02:00
Yannick Welsch 05b945d010 Avoid AssertionError when closing engine (#43638)
Lucene throwing an AlreadyClosedException when closing the engine is fine, and should not trigger
an AssertionError.

Closes #43626
2019-06-26 17:40:52 +02:00
Alan Woodward 76d0edd1a4 Add prefix intervals source (#43635)
This commit adds a prefix intervals source, allowing you to search
for intervals that contain terms starting with a given prefix. The source
can make use of the index_prefixes mapping option.

Relates to #43198
2019-06-26 16:22:12 +01:00
Tim Brooks 2fa6bc5e12
Properly serialize remote query in ReindexRequest (#43596)
This commit modifies the RemoteInfo to clarify that a search query
must always be serialized as JSON. Additionally, it adds an assertion
to ensure that this is the case. This fixes #43406.

Additionally, this PR implements AbstractXContentTestCase for the
reindex request. This is related to #43456.
2019-06-26 10:50:14 -04:00
David Kyle 531efb3fe5 Remove unreleased 7.1.2 version constant (#43629)
This was breaking BWC tests as the presence of the constant implied 7.1.2 was released
2019-06-26 13:53:05 +01:00
David Kyle 58d0d5c51b Mute DiskDisruptionIT#testGlobalCheckpointIsSafe
Relates to #43626
2019-06-26 10:13:41 +01:00
Yannick Welsch 2049f715b3 Add voting-only master node (#43410)
A voting-only master-eligible node is a node that can participate in master elections but will not act
as a master in the cluster. In particular, a voting-only node can help elect another master-eligible
node as master, and can serve as a tiebreaker in elections. High availability (HA) clusters require at
least three master-eligible nodes, so that if one of the three nodes is down, then the remaining two
can still elect a master amongst themselves. This only requires one of the two remaining nodes to
have the capability to act as master, but both need to have voting powers. This means that one of
the three master-eligible nodes can be made voting-only. If this voting-only node is a dedicated
master, a less powerful machine or a smaller heap-size can be chosen for this node. Alternatively, a
voting-only non-dedicated master node can play the role of the third master-eligible node, which
allows running an HA cluster with only two dedicated master nodes.

Closes #14340

Co-authored-by: David Turner <david.turner@elastic.co>
2019-06-26 08:07:56 +02:00
David Turner 11f41c4e7d Omit non-masters in ClusterFormationFailureHelper (#41344)
Today the `ClusterFormationFailureHelper` says `... discovery will continue
using ... from last-known cluster state` and lists all the nodes in the
last-known cluster state. In fact we ignore the master-ineligible nodes in the
last-known cluster state during discovery. This commit fixes this by listing
only the master-eligible nodes from the cluster state in this message.
2019-06-26 08:07:56 +02:00
Nhat Nguyen 05e1f55a88 Ensure relocation target still tracked when start handoff (#42201)
If the master removes the relocating shard, but recovery isn't aware of
it, then we can enter an invalid state where ReplicationTracker does not
include the local shard.
2019-06-25 23:19:59 -04:00
Jake Landis 9a3c86d422
include 7.2.1 as a version (#43584) 2019-06-25 16:02:48 -05:00
David Turner e738f0e6d2 Allow extra time for a warning to be logged (#43597)
Today we assert that a warning is logged after no more than
`discovery.cluster_formation_warning_timeout`, but the deterministic scheduler
adds a small amount of extra randomness to the timing of future events, causing
the following build to fail:

    ./gradlew :server:test --tests "org.elasticsearch.cluster.coordination.CoordinatorTests.testLogsWarningPeriodicallyIfClusterNotFormed" -Dtests.seed=DF35C28D4FA9EE2D

This commit adds an allowance for this extra time.
2019-06-25 20:04:56 +01:00
Tanguy Leroux 0dc1c12f13
Fix indices shown in _cat/indices (#43286)
After two recent changes (#38824 and #33888), the _cat/indices API
no longer reports information for active recovering indices and
non-replicated closed indices. It also misreports replicated closed
indices that are potentially not authorized for the user.

This commit changes how the cat action works by first using the
Get Settings API in order to resolve authorized indices. It then uses
the Cluster State, Cluster Health and Indices Stats APIs to retrieve
information about the indices.

Closes #39933
2019-06-25 20:02:34 +02:00
James Baiera 1b902aa746 Make enrich processor use search action through a client (#43311)
Add client to processor parameters in the ingest service.
Remove the search provider function from the processor parameters.
ExactMatchProcessor and Factory converted to use client.
Remove test cases that are no longer applicable from processor.
2019-06-25 13:09:08 -04:00
Zachary Tong 63fef5a31e Add scripting support to AggregatorTestCase (#43494)
This refactors AggregatorTestCase to allow testing mock scripts.
The main change is to QueryShardContext.  This was previously mocked,
but to get the ScriptService you have to invoke a final method
which can't be mocked.

Instead, we just create a mostly-empty QueryShardContext and populate
the fields that are needed for testing.  It also introduces a few
new helper methods that can be overridden to change the default
behavior a bit.

Most tests should be able to override getMockScriptService() to supply
a ScriptService to the context, which is later used by the aggs.
More complicated tests can override queryShardContextMock() as before.

Adds a test to MaxAggregatorTests to test out the new functionality.
2019-06-25 11:52:12 -04:00
Przemysław Witek c702cd7415
[7.x] Implement XContentParser.genericMap and XContentParser.genericMapOrdered methods (#42059) (#43575) 2019-06-25 16:04:54 +02:00
Armin Braun 62a28921e8
Cleanup IndicesService#CacheCleaner Scheduling (#42060) (#43528)
* Follow up to #42016
2019-06-25 13:04:04 +02:00
Yannick Welsch 3d5e4577aa Fix testPostOperationGlobalCheckpointSync
The conditions in this test do not hold true anymore after #43205.

Relates to #43205
2019-06-25 12:49:29 +02:00
Martijn van Groningen f587519f17
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-06-25 10:09:51 +02:00
Nhat Nguyen 01205432fe Unmute testOpenCloseApiWildcards
Relates #39578
2019-06-24 17:12:57 -04:00
Jim Ferenczi ae31ca5f7e Fix score mode of the MinimumScoreCollector (#43527)
This change fixes the score mode of the minimum score collector to
be set based on the score mode of the child collector (top docs).

Closes #43497
2019-06-24 21:32:33 +02:00
Yannick Welsch d45f12799c Sync global checkpoint on pending in-sync shards (#43526)
At the end of a peer recovery the primary wants to mark the replica as in-sync. For that the
persisted local checkpoint of the replica needs to have caught up with the global checkpoint on the
primary. If translog durability is set to ASYNC, this means that information about the persisted local
checkpoint can lag on the primary and might need to be explicitly fetched through a global
checkpoint sync action. Unfortunately, that action will only be triggered after 30 seconds, and, even
worse, will only run based on what the in-sync shard copies say (see
IndexShard.maybeSyncGlobalCheckpoint). As the replica has not been marked as in-sync yet, it is
not taken into consideration, and the primary might have its global checkpoint equal to the max seq
no, so it thinks nothing needs to be done.

Closes #43486
2019-06-24 18:35:57 +02:00
Zachary Tong eaa9ee1f16 Set document on script when using Bytes.WithScript (#43390)
Long and Double ValuesSource set the current document on the script
before executing, but Bytes was missing this method call.  That meant
it was possible to generate an OutOfBoundsException when using
a "value" script (field + script) on keyword or other bytes
fields.

This adds in the method call, and a few yaml tests to verify correct
behavior.
2019-06-24 12:20:28 -04:00
Andrey Ershov 98d7d231bb Fix testNoMasterActions (#43471)
This commit performs the proper restore of network disruption.
Previously disruptionScheme.stopDisrupting() was called that does not
ensure that connectivity between cluster nodes is restored. The test
was checking that the cluster has green status, but it was not checking
that connectivity between nodes is restored.
Here we switch to internalCluster().clearDisruptionScheme(true) which
performs both checks before returning.
Similar to #42798
Closes #42051

(cherry picked from commit cd1ed662f847a0055ede7dfbd325e214ec4d1490)
2019-06-24 18:53:58 +03:00
Martijn van Groningen 101cf384ba
Replace Streamable w/ Writeable in AcknowledgedResponse and subclasses (backport 7.x) (#43525)
This commit replaces usages of Streamable with Writeable for the
AcknowledgedResponse and its subclasses, plus associated actions.

Note that where possible response fields were made final and default
constructors were removed.

This is a large PR, but the change is mostly mechanical.

Relates to #34389
Backport of #43414
2019-06-24 13:47:37 +02:00
Tanguy Leroux 41ebaf57b5 Do not hang on unsupported HTTP methods (#43362)
Unsupported HTTP methods are detected during requests dispatching
which generates an appropriate error response. Sadly, this error is
never sent back to the client because the method of the original
request is checked again in DefaultRestChannel which throws again
an IllegalArgumentException that is never handled.

This pull request changes the DefaultRestChannel so that the latest
exception is swallowed, allowing the error message to be sent back
to the client. It also eagerly adds the objects to close to the toClose
list so that resources are more likely to be released if something
goes wrong during the response creation and sending.
2019-06-24 13:16:29 +02:00
Yannick Welsch 19520d4640 Add additional logging for #43034
It's unclear why sometimes the shard is not flushed on closing
2019-06-24 12:30:22 +02:00
Yannick Welsch 127a608147 Assert that NOOPs must succeed (#43483)
We currently assert that adding deletion tombstones to Lucene must always succeed if it's not a
tragic exception, and the same should also hold true for NOOP tombstones. We rely on this
assumption, as without this, we have the risk of creating gaps in the history, which will break
operation-based recoveries and CCR.
2019-06-24 11:38:34 +02:00
Nhat Nguyen 04bc754d8d Cleanup legacy logic in CombinedDeletionPolicy (#43484)
This change removes the support for pre-v6 index commits which 
do not have sequence numbers.
2019-06-23 11:30:04 -04:00
Martijn van Groningen df9f06213d
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-06-21 19:58:04 +02:00
Luca Cavanna 186c3122be
[TEST] Embed msearch samples in MultiSearchRequestTests (#43482)
Depending on git configuration, line feed on checked out files may be
platform dependent, which causes problems to some msearch tests as the
line separator must always be `\n`. With this change we move two files
to the test code so that we control exactly what line separator is used,
given that the corresponding tests fail on windows.

Closes #43464
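
As a rough illustration of why embedding the samples in code helps, the sketch below builds a newline-delimited request body with an explicit `\n` after every line, so the result does not depend on how git checked out a resource file. The class and method names are hypothetical and are not taken from the test.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; not the helper used in MultiSearchRequestTests.
final class MsearchBodySketch {
    // Joins header and body lines with an explicit '\n', never the platform separator.
    static byte[] body(String... lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n'); // each line, including the last, ends in \n
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```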
2019-06-21 19:05:53 +02:00
David Turner e4fd0ce730 Reduce TestLogging usage in DisruptionIT tests (#43411)
Removes `@TestLogging` annotations in `*DisruptionIT` tests, so that the only
tests with annotations are those with open issues. Also adds links to the open
issues in the remaining cases.

Relates #43403
2019-06-21 15:01:03 +01:00
Christoph Büscher 4fe650c9e5 Fix DefaultShardOperationFailedException subclass xcontent serialization (#43435)
The current toXContent implementation can fail when the superclasses toXContent
is called (see #43423). This change makes sure that 
DefaultShardOperationFailedException#toXContent is final and implementations
need to add special fields in #innerToXContent. All implementations should write
to self-contained xContent objects. Also adding a test for xContent deserialization 
to CloseIndexResponseTests.

Closes #43423
2019-06-21 14:31:19 +02:00
Yu c88f2f23a5 Make Recovery API support `detailed` params (#29076)
Properly forwards the `detailed` parameter to show the recovery stats details.

Closes #28910
2019-06-21 09:05:33 +02:00
Andrei Stefan 90e151edeb
Mute MultiSearchRequestTests.java tests (#43467) 2019-06-21 08:38:21 +03:00
Jim Ferenczi cc6c114cb8 Fix round up of date range without rounding (#43303)
Today when searching for an exclusive range the java date math parser rounds up the value
with the granularity of the operation. So when searching for values that are greater than
"now-2M" the parser rounds up the operation to "now-1M". This behavior was introduced when
we migrated to java date but it looks like a bug since the joda math parser rounds up values
but only when a rounding is used. So "now/M" is rounded to "now-1ms" (minus 1ms to get the largest inclusive value)
in the joda parser if the result should be exclusive but no rounding is applied if the input
is a simple operation like "now-1M". This change restores the joda behavior in order to have
a consistent parsing in all versions.

Closes #43277
2019-06-20 23:59:08 +02:00
Tim Brooks 827f8fcbd5
Move reindex request parsing into request (#43450)
Currently the fromXContent logic for reindex requests is implemented in
the rest action. This is inconsistent with other requests where the
logic is implemented in the request. Additionally, it requires access to
the rest action in order to parse the request. This commit moves the
logic and tests into the ReindexRequest.
2019-06-20 17:49:11 -04:00
sandmannn cf610b5e81 Added parsing of erroneous field value (#42321) 2019-06-20 15:24:04 -04:00
Jake Landis 2f2d0a198f
add version 6.8.2 2019-06-20 12:07:55 -05:00
Zachary Tong a8a81200d0 Better support for unmapped fields in AggregatorTestCase (#43405)
AggregatorTestCase will NPE if only a single, null MappedFieldType
is provided (which is required to simulate an unmapped field).  While
it's possible to test unmapped fields by supplying other, non-related
field types... that's clunky and unnecessary.  AggregatorTestCase
just needs to filter out null field types when setting up.
2019-06-20 11:31:49 -04:00
Yannick Welsch 8c856d6d91 Adapt local checkpoint assertion
With async durability, it does not hold true anymore after #43205. This is fine.
2019-06-20 17:29:53 +02:00
Armin Braun 99a44a04f7
Fix Infinite Loops in ExceptionsHelper#unwrap (#42716) (#43421)
* Fix Infinite Loops in ExceptionsHelper#unwrap

* Keep track of all seen exceptions and break out on loops
* Closes #42340
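
One way to break out of a cyclic cause chain is sketched below, assuming a simplified single-type `unwrap` signature and a hypothetical `UnwrapSketch` class rather than the actual `ExceptionsHelper` method. Visited throwables are tracked by identity, and the walk stops as soon as one is seen twice.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Illustrative sketch only; the real ExceptionsHelper#unwrap signature differs.
final class UnwrapSketch {
    // Walks the cause chain looking for an instance of `type`; returns null if none
    // is found. Remembering visited throwables by identity ends the walk on a cycle.
    static <T extends Throwable> T unwrap(Throwable t, Class<T> type) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        while (t != null && seen.add(t)) {
            if (type.isInstance(t)) {
                return type.cast(t);
            }
            t = t.getCause();
        }
        return null;
    }
}
```

An identity-based set is used so that the check does not depend on how any exception class implements `equals` or `hashCode`.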
2019-06-20 16:38:28 +02:00
Armin Braun 39fef8379b
Fix FsRepositoryTests.testSnapshotAndRestore (#42925) (#43420)
* The commit generation can be 3 or 2 here -> fixed by checking the actual generation on the second commit instead of hard coding 2
* Closes #42905
2019-06-20 16:36:40 +02:00
synical b4c4018d00 Remove Confusing Comment (#43400) 2019-06-20 15:02:37 +01:00
David Turner c8eb09f158 Fail connection attempts earlier in tests (#43320)
Today the `DisruptibleMockTransport` always allows a connection to a node to be
established, and then fails requests sent to that node such as the subsequent
handshake. Since #42342, we log handshake failures on an open connection as a
warning, and this makes the test logs rather noisy. This change fails the
connection attempt first, avoiding these unrealistic warnings.
2019-06-20 14:45:24 +01:00
Yannick Welsch e04a2258fc Fix testGlobalCheckpointSync
The test needed adaption after #43205, as the ReplicationTracker now distinguishes between the
knowledge of the persisted global checkpoint and the computed global checkpoint on the primary

Follow-up to #43205
2019-06-20 14:00:00 +02:00
Yannick Welsch a76c034866 Reduce shard started failure logging (#43330)
If the master is stepping down or shutting down, the error-level logging can cause quite a bit of noise.
2019-06-20 13:23:05 +02:00
Yannick Welsch 7f8e1454ab Advance checkpoints only after persisting ops (#43205)
Local and global checkpoints currently do not correctly reflect what's persisted to disk. The issue is
that the local checkpoint is adapted as soon as an operation is processed (but not fsynced yet). This
leaves room for the history below the global checkpoint to still change in case of a crash. As we rely
on global checkpoints for CCR as well as operation-based recoveries, this has the risk of shard
copies / follower clusters going out of sync.

This commit required changing some core classes in the system:

- The LocalCheckpointTracker keeps track now not only of the information whether an operation has
been processed, but also whether that operation has been persisted to disk.
- TranslogWriter now keeps track of the sequence numbers that have not been fsynced yet. Once
they are fsynced, TranslogWriter notifies LocalCheckpointTracker of this.
- ReplicationTracker now keeps track of the persisted local and persisted global checkpoints of all
shard copies when in primary mode. The computed global checkpoint (which represents the
minimum of all persisted local checkpoints of all in-sync shard copies), which was previously stored
in the checkpoint entry for the local shard copy, has been moved to an extra field.
- The periodic global checkpoint sync now also takes async durability into account, where the local
checkpoints on shards only advance when the translog is asynchronously fsynced. This means that
the previous condition to detect inactivity (max sequence number is equal to global checkpoint) is
not sufficient anymore.
- The new index closing API does not work when combined with async durability. The shard
verification step now requires an additional pre-flight step to fsync the translog, so that the main
verify shard step has the most up-to-date global checkpoint at disposition.
2019-06-20 11:12:38 +02:00
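The distinction that #43205 above draws between processed and persisted checkpoints can be sketched with a toy tracker. Everything below is a simplified assumption: the class and method names are invented, -1 is used to mean "no operations yet", and the real LocalCheckpointTracker and ReplicationTracker are considerably more involved.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

final class CheckpointSketch {
    private final Set<Long> processed = new HashSet<>();
    private final Set<Long> persisted = new HashSet<>();
    private long processedCheckpoint = -1; // highest seqNo with no gaps below it, processed
    private long persistedCheckpoint = -1; // highest seqNo with no gaps below it, fsynced

    // Called as soon as an operation has been applied (not yet fsynced).
    void markProcessed(long seqNo) {
        processed.add(seqNo);
        while (processed.remove(processedCheckpoint + 1)) {
            processedCheckpoint++;
        }
    }

    // Called only once the translog has fsynced the operation.
    void markPersisted(long seqNo) {
        persisted.add(seqNo);
        while (persisted.remove(persistedCheckpoint + 1)) {
            persistedCheckpoint++;
        }
    }

    long processedCheckpoint() {
        return processedCheckpoint;
    }

    long persistedCheckpoint() {
        return persistedCheckpoint;
    }

    // Computed global checkpoint: the minimum of the persisted local
    // checkpoints of all in-sync shard copies.
    static long globalCheckpoint(Collection<Long> persistedLocalCheckpoints) {
        return persistedLocalCheckpoints.stream().mapToLong(Long::longValue).min().orElse(-1L);
    }

    public static void main(String[] args) {
        CheckpointSketch tracker = new CheckpointSketch();
        tracker.markProcessed(0);
        tracker.markProcessed(1);
        // Processed has advanced, persisted has not: nothing was fsynced yet.
        System.out.println(tracker.processedCheckpoint() + " " + tracker.persistedCheckpoint());
    }
}
```

With async durability the persisted checkpoint can lag the processed one for a while, which is why "max sequence number equals global checkpoint" no longer suffices as an inactivity test.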
Tanguy Leroux 24cfca53fa Reconnect remote cluster when seeds are changed (#43379)
The RemoteClusterService should close the current 
RemoteClusterConnection and should build it again if 
the seeds are changed, similarly to what is done when 
the ping interval or the compression settings are changed.

Closes #37799
2019-06-20 10:30:02 +02:00
Luca Cavanna 94a4bc9933 SearchPhaseContext to not extend ActionListener (#43269)
The fact that SearchPhaseContext extends ActionListener makes it hard
to reason about when the original listener is notified and to trace
those calls. Also, the corresponding onFailure and onResponse were
only needed in two places, one each, where they can be replaced by a
more intuitive call, like sendSearchResponse for onResponse.
2019-06-20 10:21:24 +02:00
Martijn van Groningen 9de4e878f7
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-06-20 09:44:31 +02:00
Jim Ferenczi c33d62adbc Reduce the number of docvalues iterator created in the global ordinals fielddata (#43091)
Today the fielddata for global ordinals re-creates docvalues readers of each segment
when building the iterator of a single segment. This is required because the lookup of
global ordinals needs to access the docvalues's TermsEnum of each segment to retrieve
the original terms. This also means that we need to create NxN (where N is the number of segments in the index) docvalues iterators
each time we want to collect global ordinal values. This wasn't an issue in previous versions since docvalues readers were stateless
before 6.0, so they were reused on each segment, but now that docvalues are iterators we need to create a new instance each time
we want to access the values. In order to avoid creating too many iterators this change splits
the global ordinals fielddata in two classes, one that is used to cache a single instance per directory reader and one
that is created from the cached instance that can be used by a single consumer. The latter creates the TermsEnum of each segment
once and reuses them to create the segment's iterator. This prevents the creation of all TermsEnums each time we want to access
the value of a single segment, hence reducing the number of docvalues iterators to create to Nx2 (one iterator and one lookup per segment).
2019-06-20 08:44:07 +02:00
Jason Tedor 1f1a035def
Remove stale test logging annotations (#43403)
This commit removes some very old test logging annotations that appeared
to be added to investigate test failures that are long since closed. If
these are needed, they can be added back on a case-by-case basis with a
comment associating them to a test failure.
2019-06-19 22:58:22 -04:00
Lee Hinman 6b084e55c5
[7.x] Prevent NullPointerException in TransportRolloverAction (#43353) (#43397)
It's possible for the passed in `IndexMetaData` to be null (for
instance, cluster state passed in does not have the index in its
metadata) which in turn can cause a `NullPointerException` when
evaluating the conditions for an index. This commit adds null protection
and unit tests for this case.

Resolves #43296
2019-06-19 16:07:28 -06:00
Jim Ferenczi b957aa46ce Allocate memory lazily in BestBucketsDeferringCollector (#43339)
While investigating memory consumption of deeply nested aggregations for #43091
the memory used to keep track of the doc ids and buckets in the BestBucketsDeferringCollector
showed up as one of the main contributors. In my tests half of the memory held in the
BestBucketsDeferringCollector is associated to segments that don't have matching docs
in the selected buckets. This is expected on fields that have a big cardinality since each
bucket can appear in very few segments. By allocating the builders lazily this change
reduces the memory consumption by a factor of 2 (from 1GB to 512MB), hence reducing the
impact on GCs for these volatile allocations. This commit also switches the PackedLongValues.Builder
with a RoaringDocIdSet in order to handle very sparse buckets more efficiently.

I ran all my tests on the `geoname` rally track with the following query:

````
{
    "size": 0,
    "aggs": {
        "country_population": {
            "terms": {
                "size": 100,
                "field": "country_code.raw"
            },
            "aggs": {
                "admin1_code": {
                    "terms": {
                        "size": 100,
                        "field": "admin1_code.raw"
                    },
                    "aggs": {
                        "admin2_code": {
                            "terms": {
                                "size": 100,
                                "field": "admin2_code.raw"
                            },
                            "aggs": {
                                "sum_population": {
                                    "sum": {
                                        "field": "population"
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
````
2019-06-19 22:10:59 +02:00
Christos Soulios d1637ca476
Backport: Refactor aggregation base classes to remove doEquals() and doHashCode() (#43363)
This PR is a backport of #43214 from v8.0.0

A number of the aggregation base classes have an abstract doEquals() and doHashCode() (e.g. InternalAggregation.java, AbstractPipelineAggregationBuilder.java).

Theoretically this is so the sub-classes can add to the equals/hashCode and don't need to worry about calling super.equals(). In practice, it's mostly just confusing/inconsistent. And if there are more than two levels, we end up with situations like InternalMappedSignificantTerms which has to call super.doEquals() which defeats the point of having these overridable methods.

This PR removes the do versions and just uses equals/hashCode, calling super when necessary.
2019-06-19 22:31:06 +03:00
Armin Braun be42b2c70c
Fix NetworkUtilsTests (#43295) (#43378)
* Follow up to #42109:
   * Adjust test to only check that interface lookup by name works not actually lookup IPs which is brittle since virtual interfaces can be destroyed/created by Docker while the tests are running

Co-authored-by:  Jason Tedor <jason@tedor.me>
2019-06-19 21:23:09 +02:00
Lee Hinman d81ce9a647 Return 0 for negative "free" and "total" memory reported by the OS (#42725)
* Return 0 for negative "free" and "total" memory reported by the OS

We've had a situation where the MX bean reported negative values for the
free memory of the OS, in those rare cases we want to return a value of
0 rather than blowing up later down the pipeline.

In the event that there is a serialization or creation error with regard
to memory use, this adds asserts so the failure will occur as soon as
possible and give us a better location for investigation.

Resolves #42157

* Fix test passing in invalid memory value

* Fix another test passing in invalid memory value

* Also change mem check in MachineLearning.machineMemoryFromStats

* Add background documentation for why we prevent negative return values

* Clarify comment a bit more
2019-06-19 10:35:48 -06:00
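The clamping behaviour described in #42725 above amounts to something like the sketch below. The class and method names are made up for illustration; the real change touches the OS probe and stats classes, which are not reproduced here.

```java
final class MemoryClampSketch {
    // Returns 0 when the OS (via the MX bean) reports a negative byte count,
    // so later stages never see an impossible value.
    static long nonNegativeBytes(long reportedBytes) {
        return Math.max(0L, reportedBytes);
    }

    public static void main(String[] args) {
        System.out.println(nonNegativeBytes(-4096L));
        System.out.println(nonNegativeBytes(8_589_934_592L));
    }
}
```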
Nhat Nguyen b5c8b32cab Do not use soft-deletes to resolve indexing strategy (#43336)
This PR reverts #35230.

Previously, we relied on soft-deletes to fill the mismatch between the
version map and the Lucene index. This is no longer needed after #43202
where we rebuild the version map when opening an engine. Moreover,
PrunePostingsMergePolicy can prune _id of soft-deleted documents out of
order; thus the lookup result including soft-deletes sometimes does not
return the latest version (although it's okay as we only use a valid
result in an engine).

With this change, we use only live documents in Lucene to resolve the
indexing strategy. This is perfectly safe since we keep all deleted
documents after the local checkpoint in the version map.

Closes #42979
2019-06-19 10:40:24 -04:00
Martijn van Groningen a4c45b5d70
Replace Streamable w/ Writeable in SingleShardRequest and subclasses (#43222) (#43364)
Backport of: https://github.com/elastic/elasticsearch/pull/43222

This commit replaces usages of Streamable with Writeable for the
SingleShardRequest / TransportSingleShardAction classes and subclasses of
these classes.

Note that where possible response fields were made final and default
constructors were removed.

Relates to #34389
2019-06-19 16:15:09 +02:00
Paul Sanwald 8578aba654
[backport] Adds a minimum interval to `auto_date_histogram`. (#42814) (#43285)
Backports minimum interval to date histogram
2019-06-19 07:06:45 -04:00
Igor Motov 9f7d1ff2de Geo: Add coerce support to libs/geo WKT parser (#43273)
Adds support for coercing unclosed polygons and ignoring Z value
to libs/geo WKT parser.

Closes #43173
2019-06-18 14:41:01 -04:00
Jim Ferenczi de1a685cce Fix sporadic failures in QueryStringQueryTests#testToQueryFuzzyQueryAutoFuziness (#43322)
This commit ensures that the test does not use reserved keywords (OR, AND, NOT)
when generating the random query strings.

Closes #43318
2019-06-18 20:18:09 +02:00
David Turner 90a8589294 Local node is discovered when cluster fails (#43316)
Today the `ClusterFormationFailureHelper` does not include the local node in
the list of nodes it claims to have discovered. This means that it sometimes
reports that it has not discovered a quorum when in fact it has. This commit
adds the local node to the set of discovered nodes.
2019-06-18 12:23:23 +01:00
David Turner 2e064e0d13 Allow election of nodes outside voting config (#43243)
Today we suppress election attempts on master-eligible nodes that are not in
the voting configuration. In fact this restriction is not necessary: any
master-eligible node can safely become master as long as it has a fresh enough
cluster state and can gather a quorum of votes. Moreover, this restriction is
sometimes undesirable: there may be a reason why we do not want any of the
nodes in the voting configuration to become master.

The reason for this restriction is as follows. If you want to shut the master
down then you might first exclude it from the voting configuration. When this
exclusion succeeds you might reasonably expect that a new master has been
elected, since the voting config exclusion is almost always a step towards
shutting the node down. If we allow nodes outside the voting configuration to
be the master then the excluded node will continue to be master, which is
confusing.

This commit adjusts the logic to allow master-eligible nodes to attempt an
election even if they are not in the voting configuration. If such a master is
successfully elected then it adds itself to the voting configuration. This
commit also adjusts the logic that causes master nodes to abdicate when they
are excluded from the voting configuration, to avoid the confusion described
above.

Relates #37712, #37802.
2019-06-18 12:10:48 +01:00
Nhat Nguyen 0c5086d2f3 Rebuild version map when opening internal engine (#43202)
With this change, we will rebuild the live version map and local
checkpoint using documents (including soft-deleted) from the safe commit
when opening an internal engine. This allows us to safely prune away _id
of all soft-deleted documents as the version map is always in-sync with
the Lucene index.

Relates #40741
Supersedes #42979
2019-06-17 18:08:09 -04:00
David Turner 2d9b3a69e8 Relocation targets are assigned shards too (#43276)
Adds relocation targets to the output of
`IndexShardRoutingTable#assignedShards`.
2019-06-17 17:14:09 +01:00
Henning Andersen ba15d08e14
Allow cluster access during node restart (#42946) (#43272)
This commit modifies InternalTestCluster to allow using client() and
other operations inside a RestartCallback (onStoppedNode typically).
Restarting nodes are now removed from the map and thus all
methods now return the state as if the restarting node does not exist.

This avoids various exceptions stemming from accessing the stopped
node(s).
2019-06-17 15:04:17 +02:00
David Turner 4b58827beb Make DiscoveryNodeRole into a value object (#43257)
Adds `equals()` and `hashcode()` methods to `DiscoveryNodeRole` to compare
these objects' values for equality, and adds a field to allow us to distinguish
unknown roles from known ones with the same name and abbreviation, for clearer
test failures.

Relates #43175
2019-06-17 10:23:29 +01:00
Alpar Torok a8bf18184a
Refactor Version class to make version bumps easier (#42668) (#43215)
With this change we only have to add one line to add a new version.
The intent is to make it less error prone and easier to write a script
to automate the process.
2019-06-17 10:49:20 +03:00
Nhat Nguyen 4b643c50fa Account soft deletes in committed segments (#43126)
This change fixes the delete count issue in segment stats where we don't
account soft-deleted documents from committed segments.

Relates #43103
2019-06-16 22:56:24 -04:00
Jay Modi c3f1e6a542 Ensure threads running before closing node (#43240)
There are a few tests within NodeTests that submit items to the
threadpool and then close the node. The tests are designed to check
how running tasks are affected during node close. These tests can cause
CI failures since the submitted tasks may not be running when the node
is closed and then execute after the thread context is closed, which
triggers an unexpected exception. This change ensures the threads are
running so we avoid the unexpected exception and can test these cases.

The test of task submittal while a node is closing is also important so
an additional but muted test has been added that tests the case where a
task may be getting submitted while the node is closing and ensures we
do not trigger anything unexpected in these cases.

Relates #42774
Relates #42577
2019-06-14 12:35:43 -06:00
Julie Tibshirani 4b1d8e4433 Allow big integers and decimals to be mapped dynamically. (#42827)
This PR proposes to model big integers as longs (and big decimals as doubles)
in the context of dynamic mappings.

Previously, the dynamic mapping logic did not recognize big integers or
decimals, and would fail with an error of the form "No matching token for number_type
[BIG_INTEGER]" when a dynamic big integer was encountered. It now accepts these
numeric types and interprets them as 'long' and 'double' respectively. This
allows `dynamic_templates` to accept and remap them as another type such as
`keyword` or `scaled_float`.

Addresses #37846.
2019-06-14 10:05:11 -07:00
Yannick Welsch be9f27bb16 Properly use cancellable threads to stop UnicastZenPing (#42844)
Fixes a backport issue with #42884 where Zen1 was not properly taken into account.
2019-06-14 13:32:44 +02:00
David Turner 221d23de9f
Fix DiscoveryNodeRoleIT (#43225)
The test fails if querying the roles via a transport client, since the
transport client does not have the plugin necessary to interpret the additional
role correctly. This commit adds this plugin to the transport client used.

Relates #43175
Fixes #43223
2019-06-14 12:27:01 +01:00
Christoph Büscher 7af23324e3 SimpleQ.S.B and QueryStringQ.S.B tests should avoid `now` in query (#43199)
Currently the randomization of the q.b. in these tests can create query strings
that can cause caching to be disabled for this query if we query all fields and
there is a date field present. This is pretty much an anomaly that we shouldn't
generally test for in the "testToQuery" tests where cache policies are checked.

This change makes sure we don't create offending query strings so the cache
checks never hit these cases and adds a special test method to check this edge
case.

Closes #43112
2019-06-14 11:21:48 +02:00
Przemyslaw Gomulka 4c8e77e092
Disable DiscoveryNodeRoleIT test due to failures (#43224)
relates #43223
2019-06-14 10:57:22 +02:00
Przemysław Witek 65a584b6fb
[7.x] Report timing stats as part of the Job stats response (#42709) (#43193) 2019-06-14 09:03:14 +02:00
Przemyslaw Gomulka d27c0fd50d
Fix roundUp parsing with composite patterns backport(#43080) (#43191)
roundUp parsers were losing the composite pattern information when new
JavaDateFormatter was created from methods withLocale or withZone.

The roundUp parser should be preserved when calling these methods. This is the same approach in withLocale/Zone methods as in daa2ec8a60/server/src/main/java/org/elasticsearch/common/time/JavaDateFormatter.java

closes #42835
2019-06-14 08:56:26 +02:00
Jason Tedor 2bcc49424d
Register possible node roles in transport client
The transport client needs to be told about the possible node
roles. This commit does that.
2019-06-13 16:46:38 -04:00
Jason Tedor 55dba6ffad
Fix JDK-version dependent exception message parsing
This commit fixes some JDK-version dependent exception message checking
in the discovery node role tests.
2019-06-13 15:46:53 -04:00
Jason Tedor 5bc3b7f741
Enable node roles to be pluggable (#43175)
This commit introduces the possibility for a plugin to introduce
additional node roles.
2019-06-13 15:15:48 -04:00
Martijn van Groningen 1f3db7eb3e
Merge remote-tracking branch 'es/7.x' into enrich-7.x 2019-06-13 16:49:38 +02:00
Simon Willnauer f70141c862 Only load FST off heap if we are actually using mmaps for the term dictionary (#43158)
Given the significant performance impact that NIOFS has when term dicts are
loaded off-heap this change enforces FstLoadMode#AUTO that loads term dicts
off heap only if the underlying index input indicates a memory map.

Relates to #43150
2019-06-13 07:54:02 +02:00
Tal Levy 20031fb13f
Introduce unit tests for ValuesSourceType (#43174) (#43176)
As the ValuesSourceType evolves, it is important to be
confident that new enum constants do not break
backwards-compatibility on the stream. Having dedicated
unit tests for this class will help be sure of that.
2019-06-12 18:17:23 -07:00
Jim Ferenczi 6cfed7ec72 Also mmap terms index (`.tip`) files for hybridfs (#43150)
This change adds the terms index (`.tip`) to the list of extensions
that are memory-mapped by hybridfs. These files used to be accessed
only once to load the terms index on-heap but since #42838 they can
now be used to read the binary FST directly so it is beneficial to
memory-map them instead of accessing them via NIO.
2019-06-12 20:54:09 +02:00
Yannick Welsch 8711a092bf Stop SeedHostsResolver on shutdown (#42844)
Fixes an issue where tests would sometimes hang for 5 seconds when restarting a node. The reason
is that the SeedHostsResolver is blockingly waiting on a result for the full 5 seconds when the
corresponding threadpool is shut down.
2019-06-12 19:36:10 +02:00
Simon Willnauer 9d2adfb41e Remove usage of FileSwitchDirectory (#42937)
We are still using `FileSwitchDirectory` in the case a user configures file based pre-load of mmaps. This is trappy for multiple reasons if both directories used by `FileSwitchDirectory` point to the same filesystem directory. One issue is LUCENE-8835, which causes issues like #37111; until LUCENE-8835 is fixed we should not use it in elasticsearch. Instead we use a similar trick as we use for HybridFS and subclass mmap directory directly.
2019-06-12 19:35:27 +02:00
Alan Woodward 9de1c69c28 IndexAnalyzers doesn't need to extend AbstractIndexComponent (#43149)
AIC doesn't add anything here, and it removes the need to pass index settings
to the constructor.
2019-06-12 17:48:31 +01:00
Jim Ferenczi 79614aeb2d SearchRequest#allowPartialSearchResults does not handle successful retries (#43095)
When set to false, allowPartialSearchResults option does not check if the
shard failures have been reset to null. The atomic array, that is used to record
shard failures, is filled with a null value if a successful request on a shard happens
after a failure on a shard of another replica. In this case the atomic array is not empty
but contains only null values so this shouldn't be considered as a failure since all
shards are successful (some replicas have failed but the retries on another replica succeeded).
This change fixes this bug by checking the content of the atomic array and fails the request only
if allowPartialSearchResults is set to false and at least one shard failure is not null.

Closes #40743
2019-06-12 16:27:10 +02:00
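The check described in #43095 above (fail only if the array holds at least one non-null failure) can be sketched like this. The names `hasRealFailure` and `shouldFailRequest` are hypothetical, and `Exception` stands in for the real shard failure type.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

final class ShardFailureSketch {
    // True only if some slot still holds a failure. Slots that were reset to
    // null after a successful retry on another replica do not count.
    static boolean hasRealFailure(AtomicReferenceArray<Exception> shardFailures) {
        if (shardFailures == null) {
            return false;
        }
        for (int i = 0; i < shardFailures.length(); i++) {
            if (shardFailures.get(i) != null) {
                return true;
            }
        }
        return false;
    }

    // The request fails only when partial results are disallowed AND a real failure remains.
    static boolean shouldFailRequest(boolean allowPartialSearchResults, AtomicReferenceArray<Exception> shardFailures) {
        return allowPartialSearchResults == false && hasRealFailure(shardFailures);
    }

    public static void main(String[] args) {
        AtomicReferenceArray<Exception> failures = new AtomicReferenceArray<>(3);
        failures.set(1, new RuntimeException("replica failed"));
        failures.set(1, null); // retry on another replica succeeded
        System.out.println(shouldFailRequest(false, failures));
    }
}
```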
Christoph Büscher 7f690e8606 Fix suggestions for empty indices (#42927)
Currently suggesters return null values on empty shards. Usually this gets replaced
by results from other non-empty shards, but if the index is completely empty (e.g. after
creation) the search response's "suggest" is also "null" and we don't render a corresponding
output in the REST response. This is an irritating edge case that requires special handling on
the user side (see #42473) and should be fixed.

This change makes sure every suggester type (completion, terms, phrase) returns at least an
empty skeleton suggestion output, even for empty shards. This way, even if we don't find
any suggestions anywhere, we still return and output the empty suggestion.

Closes #42473
2019-06-12 15:42:23 +02:00
Alexander Reelsen 6f95038001 Upgrade HPPC to version 0.8.1 (#43025) 2019-06-12 13:14:16 +02:00
Luca Cavanna afeda1a7b9 Split search in two when made against throttled and non throttled searches (#42510)
When a search on some indices takes a long time, it may cause problems to other indices that are being searched as part of the same search request and being written to as well, because their search context needs to stay open for a long time. This is especially a problem when searching against throttled and non-throttled indices as part of the same request. The problem can be generalized though: this may happen whenever read-only indices are searched together with indices that are being written to. Search contexts staying open for a long time is only an issue for indices that are being written to, in practice.

This commit splits the search in two sub-searches: one for read-only indices, and one for ordinary indices. This way the two don't interfere with each other. The split is done only when size is greater than 0, no scroll is provided and query_then_fetch is used as search type. Otherwise, the search executes like before. Note that the returned num_reduce_phases reflects the number of reduction phases that were run. If the search is split in two, there are three reductions: one non-final for each search, and a final one that merges the results of the previous two.

Closes #40900
2019-06-12 11:25:03 +02:00
Luca Cavanna 31e8bff2ac Rename SearchRequest#crossClusterSearch (#42363)
The SearchRequest#crossClusterSearch method is currently used only as
part of cross cluster search request, when minimizing roundtrips.
It will soon be used also when splitting a search into two: one for
throttled and one for non throttled indices. It will probably be used
for other usecases as well in the future, hence it makes sense to generalize its name to subSearchRequest.
2019-06-12 11:25:03 +02:00
Henning Andersen 30d8085d96 scheduleAtFixedRate would hang (#42993)
Though not in use in elasticsearch currently, it seems surprising that
ThreadPool.scheduler().scheduleAtFixedRate would hang. A recurring
scheduled task is never completed (except on failure) and we test for
exceptions using RunnableFuture.get(), which hangs for periodic tasks.
Fixed by checking that task is done before calling .get().
2019-06-11 19:46:37 +02:00
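The fix described in #42993 above, checking that a task is done before calling `get()`, can be sketched in isolation. The helper `failureOf` is an invented name; the real code lives in the thread pool's exception handling and is not reproduced here.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

final class FutureFailureSketch {
    // Extracts the failure of a finished task without ever blocking.
    // A periodic task never completes normally, so calling get() on it
    // unconditionally would hang; isDone() guards against that.
    static Throwable failureOf(Future<?> future) {
        if (future.isDone() == false) {
            return null; // still scheduled or running: nothing to report
        }
        try {
            future.get();
            return null;
        } catch (ExecutionException e) {
            return e.getCause();
        } catch (CancellationException e) {
            return e;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return e;
        }
    }

    public static void main(String[] args) {
        FutureTask<Object> pending = new FutureTask<Object>(() -> null);
        // Never run, so not done: returns immediately instead of blocking.
        System.out.println(failureOf(pending));
    }
}
```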
David Turner 04cde1d6e2 Defer reroute when nodes join (#42855)
Today the master eagerly reroutes the cluster as part of processing node joins.
However, it is not necessary to do this reroute straight away, and it is
sometimes preferable to defer it until later. For instance, when the master
wins its election it processes joins and performs a reroute, but it would be
better to defer the reroute until after the master has become properly
established.

This change defers this reroute into a separate task, and batches multiple such
tasks together.
2019-06-11 14:00:18 +01:00
Henning Andersen 1c7cd09375 Enable TRACE for testRecoverBrokenIndexMetadata (#43081)
Relates to #43034
2019-06-11 12:38:48 +02:00
Jim Ferenczi 900eb4f882 Handle empty terms index in TermsSliceQuery (#43078)
#40741 introduced a merge policy that can drop the postings for the `_id`
field on soft deleted documents. The TermsSliceQuery assumes that every document
has an entry in the postings for that field so it doesn't check if the terms
index exists or not. This change fixes this bug by checking if the terms index for
the `_id` field is null and ignoring the segment entirely if it's the case. This should
be harmless since segments without an `_id` terms index should only contain soft deleted
documents.

Closes #42996
2019-06-11 12:01:53 +02:00
Henning Andersen 6a77dde5ea Better test diag output on OOM (#42989)
If linearizability checking fails with OOM (or other exception), we did
not get the serialized history written into the log, making it difficult
to debug in cases where the problem is hard to reproduce. Fixed to
always attempt dumping the serialized history.

Related to #42244
2019-06-11 09:48:52 +02:00
Alan Woodward 8e23e4518a Move construction of custom analyzers into AnalysisRegistry (#42940)
Both TransportAnalyzeAction and CategorizationAnalyzer have logic to build
custom analyzers for index-independent analysis. A lot of this code is duplicated,
and it requires the AnalysisRegistry to expose a number of internal provider
classes, as well as making some assumptions about when analysis components are
constructed.

This commit moves the build logic directly into AnalysisRegistry, reducing the
registry's API surface considerably.
2019-06-10 14:33:25 +01:00
Jim Ferenczi 39cb1abc9d Fix auto fuzziness in query_string query (#42897)
Setting `auto` after the fuzzy operator (e.g. `"query": "foo~auto"`) in the `query_string`
does not take the length of the term into account when computing the distance and always uses
a max distance of 1. This change fixes this discrepancy by ensuring that the term is passed when
the fuzziness is computed.
2019-06-10 10:13:16 +02:00
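For context on #42897 above, length-dependent `auto` fuzziness can be sketched as below. The thresholds (fewer than 3 characters: exact match, 3 to 5: one edit, more than 5: two edits) are the documented defaults for `AUTO`; the class and method names are invented, and counting code points rather than chars is an assumption of this sketch.

```java
final class AutoFuzzinessSketch {
    // Maximum edit distance allowed for a term under default AUTO fuzziness.
    static int maxEdits(String term) {
        int length = term.codePointCount(0, term.length());
        if (length < 3) {
            return 0; // very short terms must match exactly
        }
        if (length < 6) {
            return 1;
        }
        return 2;
    }

    public static void main(String[] args) {
        // The term length now matters, rather than every term getting distance 1.
        System.out.println(maxEdits("foo"));
        System.out.println(maxEdits("elasticsearch"));
    }
}
```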
Vigya Sharma 25218733e6 Allow routing commands with ?retry_failed=true (#42658)
We respect allocation deciders, including the `MaxRetryAllocationDecider`, when
executing reroute commands. If you specify `?retry_failed=true` then the retry
counter is reset, but today this does not happen until after trying to execute
the reroute commands. This means that if an allocation has repeatedly failed,
but you want to take control and assign a shard to a particular node to work
around the repeated failures, you cannot execute the routing command in the
same call to `POST /_cluster/reroute` as the one that resets the failure
counter.

This commit fixes this by resetting the failure counter first, meaning that you
can now explicitly allocate a repeatedly-failed shard like this:

```
POST /_cluster/reroute?retry_failed=true
{
  "commands": [
    {
      "allocate_replica": {
        "index": "blahblah",
        "shard": 2,
        "node": "node-4"
      }
    }
  ]
}
```

Fixes #39546
2019-06-10 08:31:05 +01:00
Jason Tedor 63bad28005
Do not allow modify aliases on followers (#43017)
Now that aliases are replicated by a follower from its leader, this
commit prevents directly modifying aliases on follower indices.
2019-06-09 22:53:54 -04:00
Nhat Nguyen 0ebcb21d2c Unmuted testRecoverBrokenIndexMetadata
These tests should be okay as we flush at the end of peer recovery.

Closes #40867
2019-06-09 10:26:57 -04:00
Nhat Nguyen afe65b5988 Fix assertion in ReadOnlyEngine (#43010)
We should execute the assertion before throwing an exception;
otherwise, it's a noop.
2019-06-09 10:26:56 -04:00
Jason Tedor 915d2f2daa
Refactor put mapping request validation for reuse (#43005)
This commit refactors put mapping request validation for reuse. The
concrete case that we are after here is the ability to apply effectively
the same framework to indices aliases requests. This commit refactors
the put mapping request validation framework to allow for that.
2019-06-09 10:19:04 -04:00
Nhat Nguyen 0a982fc57f Mute testLookupSeqNoByIdInLucene
Tracked at #42979
2019-06-08 00:30:12 -04:00
Jason Tedor b580677412
Fix put mapping request validators random test
This commit fixes a test bug in the request validators random test. In
particular, an assertion was not properly nested in a guard that would
ensure that there was at least one failure.

Relates #43000
2019-06-07 17:47:51 -04:00
Jason Tedor d6fe4b648d
Fix possible NPE in put mapping validators (#43000)
When applying put mapping validators, we apply all the validators in the
collection. If a failure occurs, we collect that as a top-level
exception, and suppress any additional failures into the top-level
exception. However, if a request passes the validator after a top-level
exception has been collected, we would try to suppress a null exception
into the top-level exception. This is a violation of the
Throwable#addSuppressed API. This commit addresses this, and adds tests
to cover the logic of collecting the failures when validating a put
mapping request.
2019-06-07 16:24:12 -04:00
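The failure-collection logic described in #43000 above can be sketched as follows. The validator shape (a function returning an exception, or null on success) and all names are assumptions made for illustration. The point is the null guard: `Throwable#addSuppressed` throws a NullPointerException when handed null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class ValidatorSketch {
    // Runs every validator. The first failure becomes the top-level exception
    // and later failures are attached to it as suppressed exceptions.
    static <R> Exception validate(R request, List<Function<R, Exception>> validators) {
        Exception first = null;
        for (Function<R, Exception> validator : validators) {
            Exception failure = validator.apply(request);
            if (failure == null) {
                continue; // validator passed: never pass null to addSuppressed
            }
            if (first == null) {
                first = failure;
            } else {
                first.addSuppressed(failure);
            }
        }
        return first;
    }

    public static void main(String[] args) {
        List<Function<String, Exception>> validators = new ArrayList<>();
        validators.add(r -> new IllegalArgumentException("first failure"));
        validators.add(r -> null); // passes after a failure was already collected
        validators.add(r -> new IllegalStateException("second failure"));
        Exception result = validate("put-mapping", validators);
        System.out.println(result.getMessage() + " / suppressed: " + result.getSuppressed().length);
    }
}
```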
David Turner 5bc0dfce94
Improve translog corruption detection (#42980)
Today we test for translog corruption by incrementing a byte by 1 somewhere in
a file, and verify that this leads to a `TranslogCorruptionException`.
However, we rely on _all_ corruptions leading to this exception in the
`RemoveCorruptedShardDataCommand`: this command fails if a translog file
corruption leads to a different kind of exception, and `EOFException` and
`NegativeArraySizeException` are both possible. This commit strengthens the
translog corruption detection tests by simulating the following:

- a random value is written
- the file is truncated

It also makes sure that we return a `TranslogCorruptionException` in all such
cases.

Fixes #42661
Backport of #42744
2019-06-07 20:28:02 +01:00
Jason Tedor 479a1eeff6
Drop dead code for socket permissions for transport (#42990)
This code has not been needed since the removal of tribe nodes, it was
left behind when those were dropped (note that regular transport
permissions are handled through transport profiles, even if they are not
explicitly in use).
2019-06-07 15:22:10 -04:00
markharwood 0719779a48
Search - enable low_level_cancellation by default. (#42291) (#42857)
Benchmarking on worst-case queries (max agg on match_all or popular-term query with large index) was not noticeably slower.

Closes #26258
2019-06-07 14:53:17 +01:00
Henning Andersen dea935ac31
Reindex max_docs parameter name (#42942)
Previously, a reindex request had two different size specifications in the body:
* Outer level, determining the maximum documents to process
* Inside the source element, determining the scroll/batch size.

The outer level size has now been renamed to max_docs to
avoid confusion and clarify its semantics, with backwards compatibility and
deprecation warnings for using size.
Similarly, the size parameter has been renamed to max_docs for
update/delete-by-query to keep the 3 interfaces consistent.

Finally, all 3 endpoints now support max_docs in both body and URL.

Relates #24344
2019-06-07 12:16:36 +02:00
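One way to picture the backwards-compatible handling of the renamed parameter is the sketch below. It is an assumption-laden illustration, not the real request parser: `MaxDocsSketch`, `parseMaxDocs`, the `WARNINGS` list standing in for a deprecation logger, the warning text, and the choice to reject two conflicting values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical parser for the outer-level document limit of a request body.
final class MaxDocsSketch {

    /** Stand-in for a deprecation logger: collects warnings emitted while parsing. */
    static final List<String> WARNINGS = new ArrayList<>();

    /**
     * Reads the document limit from the top level of a request body.
     * Returns -1 when neither key is present, meaning "no limit".
     */
    static int parseMaxDocs(Map<String, Integer> body) {
        Integer maxDocs = body.get("max_docs");
        Integer size = body.get("size");
        if (size != null) {
            // The old name still works but produces a deprecation warning.
            WARNINGS.add("[size] is deprecated, use [max_docs] instead");
            // Assumption: two different limits are rejected rather than silently resolved.
            if (maxDocs != null && !maxDocs.equals(size)) {
                throw new IllegalArgumentException("[max_docs] and [size] set to different values");
            }
            return size;
        }
        return maxDocs == null ? -1 : maxDocs;
    }
}
```

Under these assumptions, `parseMaxDocs(Map.of("size", 5))` returns 5 and records one warning, while `parseMaxDocs(Map.of("max_docs", 10))` returns 10 silently.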
David Turner 5929803413 Relax timeout in NodeConnectionsServiceTests (#42934)
Today we assert that the connection thread is blocked by the time the test gets
to the barrier, but in fact this is not a valid assertion. The following
`Thread.sleep()` will cause the test to fail reasonably often.

```diff
diff --git a/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java b/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
index 193cde3180d..0e57211cec4 100644
--- a/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
+++ b/server/src/test/java/org/elasticsearch/cluster/NodeConnectionsServiceTests.java
@@ -364,6 +364,7 @@ public class NodeConnectionsServiceTests extends ESTestCase {
             final CheckedRunnable<Exception> connectionBlock = nodeConnectionBlocks.get(node);
             if (connectionBlock != null) {
                 try {
+                    Thread.sleep(50);
                     connectionBlock.run();
                 } catch (Exception e) {
                     throw new AssertionError(e);
```

This change relaxes the test to allow some time for the connection thread to
hit the barrier.

Fixes #40170
2019-06-07 10:38:56 +01:00
henryptung 61b62125b8 Wire query cache into sorting nested-filter computation (#42906)
Don't use Lucene's default query cache when filtering in sort.

Closes #42813
2019-06-06 21:16:58 +02:00
Henning Andersen ca5dbf93a5 Fix concurrent search and index delete (#42621)
Changed order of listener invocation so that we notify before
registering search context and notify after unregistering same.

This ensures that count up/down like what we do in ShardSearchStats
works. Otherwise, we risk notifying onFreeScrollContext before notifying
onNewScrollContext (same for onFreeContext/onNewContext, but we
currently have no assertions failing in those).

Closes #28053
2019-06-06 20:10:43 +02:00
Simon Willnauer 7fcca55a3c [TEST] Remove unnecessary log line 2019-06-06 14:17:44 +02:00
Simon Willnauer 2582e1e8ad Fix `InternalEngineTests#testPruneAwayDeletedButRetainedIds`
The test failed because we had only a single document in the index
that got deleted such that some assertions that expected at least
one live doc failed.

Relates to: #40741
2019-06-06 14:16:24 +02:00
Yannick Welsch 9f7be70f7a Fix testPendingTasks (#42922)
Fixes a race in the test which can be reliably reproduced by adding Thread.sleep(100) to the end of
IndicesService.processPendingDeletes

Closes #18747
2019-06-06 14:15:48 +02:00
Yannick Welsch 72735be673 Fix NPE when rejecting bulk updates (#42923)
Single updates use a different internal code path than updates that are wrapped in a bulk request.
While working on a refactoring to bring both closer together I noticed that bulk updates were
failing some of the tests that single updates passed. In particular, bulk updates cause
NullPointerExceptions to be thrown and listeners not to be properly notified when being rejected
from the thread pool.
2019-06-06 14:15:48 +02:00
Simon Willnauer 2c3bd32aff Add a merge policy that prunes ID postings for soft-deleted but retained documents (#40741)
This change adds a merge policy that drops all _id postings for documents that
are marked as soft-deleted but retained across merges. This is usually unnecessary
unless soft-deletes are used with a retention policy since otherwise a merge would
remove deleted documents anyway.

Yet, this merge policy prevents extreme cases where a very large number of soft-deleted
documents are retained and are impacting update performance.
Note, using this merge policy will remove all lookup by ID capabilities for soft-deleted documents.
2019-06-06 13:41:46 +02:00
Gordon Brown 6eb4600e93
Add custom metadata to snapshots (#41281)
Adds a metadata field to snapshots which can be used to store arbitrary
key-value information. This may be useful for attaching a description of
why a snapshot was taken, tagging snapshots to make categorization
easier, or identifying the source of automatically-created snapshots.
2019-06-05 17:30:31 -06:00
Mark Vieira 1f4ff97d7d
Mute failing test
(cherry picked from commit 4952d4facf5949abdb9aae47dbe1ee18cf7eef99)
2019-06-05 13:47:18 -07:00
Przemyslaw Gomulka ab5bc83597
Deprecation info for joda-java migration on 7.x (#42659)
Some clusters might already have been migrated to version 7 without being warned about the joda-java migration changes.
The deprecation API on that version will give them guidance on which patterns need to be changed.
This change uses the same logic as 6.8, that is: verifying that the pattern is from the incompatible set ('y'-'Y', 'C', 'Z', etc.), is not from the predefined set, is not prefixed with 8, and was created in 6.x. Mappings created in 7.x are considered migrated and should not generate warnings.

There is no pipeline check (present in 6.8), as it is impossible to verify when the pipeline was created and therefore whether the format is deprecated.

Relates #42010
2019-06-05 19:50:04 +02:00
Simon Willnauer d3524fdd06 Add back import after backport 2019-06-05 11:25:19 +02:00
Simon Willnauer 4dfaeb9046 Remove post Java 9 API usage after backport 2019-06-05 11:24:58 +02:00
Jim Ferenczi de0ea4bbf7 Deduplicate alias and concrete fields in query field expansion (#42328)
The full-text query parsers accept field patterns that are expanded using the mapping.
Alias fields are also detected during the expansion but they are not deduplicated with the
concrete fields that are found from other patterns (or the same). This change ensures
that we deduplicate the target fields of the full-text query parsers in order to avoid
adding the same clause multiple times. Boolean queries are already able to deduplicate
clauses during rewrite but since we also use DisjunctionMaxQuery it is preferable to detect
these duplicates early on.
2019-06-05 11:05:40 +02:00
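The deduplication described in the commit above amounts to resolving every expanded name to its concrete target before building clauses. A minimal sketch, assuming a plain alias-to-concrete map in place of the real mapping lookup (`FieldDedup`, `resolveTargets`, and `aliasToConcrete` are hypothetical names):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, named for illustration only.
final class FieldDedup {

    /**
     * Maps each expanded field name to its concrete target and keeps each
     * target once, preserving first-seen order.
     * aliasToConcrete is a stand-in for the mapping's alias resolution.
     */
    static Set<String> resolveTargets(List<String> expandedFields, Map<String, String> aliasToConcrete) {
        Set<String> targets = new LinkedHashSet<>();
        for (String field : expandedFields) {
            // A concrete field resolves to itself; an alias resolves to its target.
            targets.add(aliasToConcrete.getOrDefault(field, field));
        }
        return targets;
    }

    public static void main(String[] args) {
        // "headline" is an alias for "title", so only two targets remain.
        Set<String> targets = resolveTargets(
            List.of("title", "headline", "body"),
            Map.of("headline", "title"));
        System.out.println(targets);
    }
}
```

Building one clause per element of the returned set avoids adding the same clause twice when an alias and its concrete field both match a pattern.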
Simon Willnauer 41a9f3ae3b Use reader attributes to control term dict memory usage (#42838)
This change makes use of the reader attributes added in LUCENE-8671
to ensure that `_id` fields are always on-heap for best update performance
and term dicts are generally off-heap on Read-Only engines.

Closes #38390
2019-06-05 11:01:06 +02:00
David Turner 955aee8a07 More logging in testRerouteOccursOnDiskPassingHighWatermark (#42864)
This test is failing because recoveries of these empty shards are not
completing in a reasonable time, but the reason for this is still obscure. This
commit adds yet more logging.

Relates #40174, #42424
2019-06-05 09:05:44 +01:00
Jason Tedor 78be3dde25
Enable testing against JDK 13 EA builds (#40829)
This commit adds JDK 13 to the CI rotation for testing. For now, we will
be testing against JDK 13 EA builds.
2019-06-04 20:54:24 -04:00
Jason Tedor 117df87b2b
Replicate aliases in cross-cluster replication (#42875)
This commit adds functionality so that aliases that are manipulated on
leader indices are replicated by the shard follow tasks to the follower
indices. Note that we ignore write indices. This is due to the fact that
follower indices do not receive direct writes so the concept is not
useful.

Relates #41815
2019-06-04 20:36:24 -04:00
Mark Vieira e44b8b1e2e
[Backport] Remove dependency substitutions 7.x (#42866)
* Remove unnecessary usage of Gradle dependency substitution rules (#42773)

(cherry picked from commit 12d583dbf6f7d44f00aa365e34fc7e937c3c61f7)
2019-06-04 13:50:23 -07:00
Andrey Ershov 6391f90616 Fix testNoMasterActionsWriteMasterBlock (#42798)
This commit performs the proper restore of network disruption.
Previously disruptionScheme.stopDisrupting() was called, which does not
ensure that connectivity between cluster nodes is restored. The test
was checking that the cluster has green status, but it was not checking
that connectivity between nodes is restored.
Here we switch to internalCluster().clearDisruptionScheme(true) which
performs both checks before returning.

Closes #39688

(cherry picked from commit c8988d5cf5a85f9b28ce148dbf100aaa6682a757)
2019-06-04 17:24:03 +02:00
Alan Woodward df124f32db Refactor control flow in TransportAnalyzeAction (#42801)
The control flow in TransportAnalyzeAction is currently spread across two large
methods, and is quite difficult to follow. This commit tidies things up a bit, to make
it clearer when we use pre-defined analyzers and when we use custom built ones.
2019-06-04 14:52:46 +01:00
Yu 428beabc49 Remove "template" field in IndexTemplateMetaData (#42099)
Remove "template" field from XContent parsing in IndexTemplateMetaData
2019-06-03 12:43:11 -05:00
Armin Braun 00db9c1a2f
Make Connection Future Err. Handling more Resilient (#42781) (#42804)
* There were a number of possible (runtime-) exceptions that could be raised in the adjusted code and prevent resolving the listener
* Relates #42350
2019-06-03 19:29:36 +02:00
David Turner df0f0b3d40
Rename autoMinMasterNodes to autoManageMasterNodes (#42789)
Renames the `ClusterScope` attribute `autoMinMasterNodes` to reflect its
broader meaning since 7.0.

Backport of the relevant part of #42700 to `7.x`.
2019-06-03 12:12:07 +01:00
Alan Woodward 2129d06643 Create client-only AnalyzeRequest/AnalyzeResponse classes (#42197)
This commit clones the existing AnalyzeRequest/AnalyzeResponse classes
to the high-level rest client, and adjusts request converters to use these new
classes.

This is a prerequisite to removing the Streamable interface from the internal
server version of these classes.
2019-06-03 09:46:36 +01:00
Alan Woodward d0da30e5f4 Return NO_INTERVALS rather than null from empty TokenStream (#42750)
IntervalBuilder#analyzeText will currently return null if it is passed an
empty TokenStream, which can lead to a confusing NullPointerException
later on during querying. This commit changes the code to return
NO_INTERVALS instead.

Fixes #42587
2019-05-31 17:45:57 +01:00
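The idea behind returning `NO_INTERVALS` instead of `null` is the familiar sentinel (null object) pattern. The sketch below illustrates it with a deliberately simplified `Source` interface. Apart from the names `NO_INTERVALS` and `analyzeText`, which come from the commit message, everything here is hypothetical and unrelated to the real Lucene and Elasticsearch types.

```java
import java.util.List;

// Simplified illustration, not the real IntervalBuilder.
final class IntervalsSketch {

    /** Simplified stand-in for an intervals source: reports how many tokens it covers. */
    interface Source {
        int size();
    }

    /** Shared sentinel meaning "matches nothing"; returned instead of null. */
    static final Source NO_INTERVALS = () -> 0;

    /** Builds a source from analyzed tokens; an empty token list yields the sentinel. */
    static Source analyzeText(List<String> tokens) {
        if (tokens.isEmpty()) {
            return NO_INTERVALS; // callers can use the result without a null check
        }
        final int count = tokens.size();
        return () -> count;
    }
}
```

Because callers always receive a usable object, an empty token stream produces a source that matches nothing rather than a `NullPointerException` later during querying.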
Jason Tedor 61c6a26b31
Remove locale-dependent string checking
We were checking if an exception was caused by a specific reason "Not a
directory". Alas, this reason is locale-dependent and can fail on
systems that are not set to en_US.UTF-8. This commit addresses this by
deriving what the locale-dependent error message would be and using that
for comparison with the actual exception thrown.

Relates #41689
2019-05-31 12:08:38 -04:00
James Baiera 215170b6c3 Merge branch '7.x' into enrich-7.x 2019-05-30 16:13:06 -04:00
Jason Tedor 371cb9a8ce
Remove Log4j 1.2 API as a dependency (#42702)
We had this as a dependency for legacy dependencies that still needed
the Log4j 1.2 API. This appears to no longer be necessary, so this
commit removes this artifact as a dependency.

To remove this dependency, we had to fix a few places where we were
accidentally relying on Log4j 1.2 instead of Log4j 2 (easy to do, since
both APIs were on the compile-time classpath).

Finally, we can remove our custom Netty logger factory. This was needed
when we were on Log4j 1.2 and handled logging in our own unique
way. When we migrated to Log4j 2 we could have dropped this
dependency. However, even then Netty would still pick up Log4j 1.2 since
it was on the classpath, thus the advantage to removing this as a
dependency now.
2019-05-30 16:08:07 -04:00
Mark Vieira c1816354ed
[Backport] Improve build configuration time (#42674) 2019-05-30 10:29:42 -07:00
David Turner d14799f0a5 Prevent merging nodes' data paths (#42665)
Today Elasticsearch does not prevent you from reconfiguring a node's
`path.data` to point to data paths that previously belonged to more than one
node. There's no good reason to be able to do this, and the consequences can be
quietly disastrous. Furthermore, #42489 might result in a user trying to split
up a previously-shared collection of data paths by hand and there's definitely
scope for mixing the paths up across nodes when doing this.

This change adds a check during startup to ensure that each data path belongs
to the same node.
2019-05-30 18:08:55 +01:00
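A startup check of the kind described above could look roughly like this. It is a sketch under stated assumptions: the node ID recorded in each data path is modelled as a plain map from path to ID (with `null` for a path that has no metadata yet), and `DataPathCheck` and `requireSingleOwner` are hypothetical names rather than the actual implementation.

```java
import java.util.Map;

// Hypothetical startup check, named for illustration only.
final class DataPathCheck {

    /**
     * Verifies that all configured data paths were written by the same node.
     * nodeIdByPath maps each data path to the node ID found in it, or to null
     * for a fresh path with no node metadata. Returns the shared node ID, or
     * null if every path is fresh.
     */
    static String requireSingleOwner(Map<String, String> nodeIdByPath) {
        String ownerId = null;
        String ownerPath = null;
        for (Map.Entry<String, String> entry : nodeIdByPath.entrySet()) {
            String nodeId = entry.getValue();
            if (nodeId == null) {
                continue; // fresh path: it cannot conflict with anything
            }
            if (ownerId == null) {
                ownerId = nodeId;
                ownerPath = entry.getKey();
            } else if (!ownerId.equals(nodeId)) {
                // Two paths that previously belonged to different nodes: refuse to start.
                throw new IllegalStateException("data paths " + ownerPath + " and "
                    + entry.getKey() + " belong to different nodes");
            }
        }
        return ownerId;
    }
}
```

Failing fast at startup is the point of the design: mixing paths from different nodes is reported immediately instead of leading to quiet data problems later.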
Marios Trivyzas ce30afcd01
Deprecate CommonTermsQuery and cutoff_frequency (#42619) (#42691)
Since the max_score optimization landed in Elasticsearch 7,
the CommonTermsQuery is redundant and slower. Moreover the
cutoff_frequency parameter for MatchQuery and MultiMatchQuery
is redundant.

Relates to #27096

(cherry picked from commit 04b74497314eeec076753a33b3b6cc11549646e8)
2019-05-30 18:04:47 +02:00
David Turner 86b1a07887 Log leader and handshake failures by default (#42342)
Today the `LeaderChecker` and `HandshakingTransportAddressConnector` do not log
anything above `DEBUG` level. However there are some situations where it is
appropriate for them to log at a higher level:

- if the low-level handshake succeeds but the high-level one fails then this
  indicates a config error that the user should resolve, and the exception
  will help them to do so.

- if leader checks fail repeatedly then we restart discovery, and the exception
  will help to determine what went wrong.

Resolves #42153
2019-05-30 08:14:19 +01:00
Igor Motov d2f9ccbe18 Geo: Refactor libs/geo parsers (#42549)
Refactors the WKT and GeoJSON parsers from a utility class into
instantiable objects. This is a preliminary step in
preparation for moving out coordinate validators from Geometry
constructors. This should allow us to make validators pluggable.
2019-05-29 20:07:27 -04:00
Henning Andersen 53f5d313cd Use correct global checkpoint sync interval (#42642)
Disruption test cases need to use a lower checkpoint sync interval
since they verify sequence numbers after the test, waiting at most 10 seconds
for them to stabilize.

Closes #42637
2019-05-29 08:15:53 +02:00
Michael Basnight be60125a4e Merge branch '7.x' into enrich-7.x 2019-05-28 18:32:18 -05:00
Adrien Grand 38f9e24411
Add 7.1.2 version constant. (#42648)
Relates to #42635
2019-05-28 23:14:10 +02:00
Jim Ferenczi 267e5a1110 fix javadoc of SearchRequestBuilder#setTrackTotalHits (#42219) 2019-05-28 22:12:16 +02:00