Commit Graph

3815 Commits

Author SHA1 Message Date
Nhat Nguyen 2a863ac8ff Fix testCleanUpCommitsWhenGlobalCheckpointAdvanced
Relates #48559
2019-10-29 10:39:16 -04:00
Nhat Nguyen b08cd058bc Greedily advance safe commit on new global checkpoint (#48559)
Today we won't advance the safe commit on a new global checkpoint unless 
the last commit can become safe. This is not great if we have more than
two commits as we can have a new safe commit earlier.

Closes #4853
2019-10-29 10:39:16 -04:00
Jim Ferenczi aa70ff5ea4 Fix failures in ShuffleForcedMergePolicyTests#testDiagnostics (#48627)
This commit fixes intermittent failures in ShuffleForcedMergePolicyTests#testDiagnostics
by setting a more restricted merge policy that ensures that extra merging will not happen
before the forced merge.
2019-10-29 13:46:55 +01:00
Jim Ferenczi c6abe58f63 Fix expectations in SearchAfter integration tests (#48372)
This commit fixes the expectations of SearchAfterIT#shouldFail regarding the inner exceptions that should be thrown
when testing failures. The exception is sometimes wrapped in a QueryShardException so this change only checks that
the toString representation contains the expected message.

Closes #43143
2019-10-29 12:37:22 +01:00
Yannick Welsch 6af3ce58f8 Filter on node id in AllocationIdIT (#48623)
Makes the assertions more targeted.

Relates #48529
2019-10-29 12:10:48 +01:00
Jim Ferenczi 028084ce23 Add a new merge policy that interleaves old and new segments on force merge (#48533)
This change adds a new merge policy that interleaves eldest and newest segments picked by MergePolicy#findForcedMerges
and MergePolicy#findForcedDeletesMerges. This allows time-based indices, that usually have the eldest documents
first, to be efficient at finding the most recent documents too. Although we wrap this merge policy for all indices
even though it is mostly useful for time-based but there should be no overhead for other type of indices so it's simpler
than adding a setting to enable it. This change is needed in order to ensure that the optimizations that we are working
on in # remain efficient even after running a force merge.

Relates #37043
2019-10-29 10:44:56 +01:00
Armin Braun 53a22b8a8a
Fix Validity of RepositoryDataTests Randomness (#48564) (#48566)
Trivial point, but we were only testing shard generations
for a single shard here, accidentally, and not testing the
`null` generation case at all.
2019-10-28 11:04:57 +01:00
Nhat Nguyen 1ef87c9a68 Refresh should not acquire readLock (#48414)
Today, we hold the engine readLock while refreshing. Although this 
choice simplifies the correctness reasoning, it can block IndexShard 
from closing if warming an external reader takes time. The current
implementation of refresh does not need to hold readLock as
ReferenceManager can handle errors correctly if the engine is closed in
midway.

This PR is a prerequisite that we need to solve #47186.
2019-10-25 17:32:35 -04:00
Dan Hermann 2e3db518c9
Do not reference values for filtered settings (#48066) (#48518) 2019-10-25 16:22:11 -05:00
Tim Brooks f5f1072824
Multiple remote connection strategy support (#48496)
* Extract remote "sniffing" to connection strategy (#47253)

Currently the connection strategy used by the remote cluster service is
implemented as a multi-step sniffing process in the
RemoteClusterConnection. We intend to introduce a new connection strategy
that will operate in a different manner. This commit extracts the
sniffing logic to a dedicated strategy class. Additionally, it implements
dedicated tests for this class.

Additionally, in previous commits we moved away from a world where the
remote cluster connection was mutable. Instead, when setting updates are
made, the connection is torn down and rebuilt. We still had methods and
tests hanging around for the mutable behavior. This commit removes those.

* Introduce simple remote connection strategy (#47480)

This commit introduces a simple remote connection strategy which will
open remote connections to a configurable list of user supplied
addresses. These addresses can be remote Elasticsearch nodes or
intermediate proxies. We will perform normal clustername and version
validation, but otherwise rely on the remote cluster to route requests
to the appropriate remote node.

* Make remote setting updates support diff strategies (#47891)

Currently the entire remote cluster settings infrastructure is designed
around the sniff strategy. As we introduce an additional conneciton
strategy this infrastructure needs to be modified to support it. This
commit modifies the code so that the strategy implementations will tell
the service if the connection needs to be torn down and rebuilt.

As part of this commit, we will wait 10 seconds for new clusters to
connect when they are added through the "update" settings
infrastructure.

* Make remote setting updates support diff strategies (#47891)

Currently the entire remote cluster settings infrastructure is designed
around the sniff strategy. As we introduce an additional conneciton
strategy this infrastructure needs to be modified to support it. This
commit modifies the code so that the strategy implementations will tell
the service if the connection needs to be torn down and rebuilt.

As part of this commit, we will wait 10 seconds for new clusters to
connect when they are added through the "update" settings
infrastructure.
2019-10-25 09:29:41 -06:00
Luca Cavanna d6d2edf324 Fix .tasks index strict mapping: parent_id should be parent_task_id (#48393)
* Fix .tasks index strict mapping: parent_id should be parent_task_id

The .tasks index has mappings that's strictly defined. `parent_task_id`
was defined as `parent_id` though which would cause an exception in case
a task is persisted that has a parent task id set.

While at it, a couple of compiler warnings were addressed and a test
request builder was removed in favour of using its corresponding request.

* increment version
2019-10-25 17:00:06 +02:00
Luca Cavanna 9c48ed12bc Remove response search phase from ExpandSearchPhase (#48401)
The expand phase is always created providing a function that builds
the next phase to be run, which has a single purpose: sending the
response back. Such small search phase is not necessary and causes some
issues when reporting search progress and counting the search phases
that need to be executed and that are already executed. We can simply
rather send back the response, without creating a specific phase for that.
2019-10-25 17:00:06 +02:00
Armin Braun 84a47a9632
Remove Outdated AwaitsFix (#48513) (#48522)
This `AwaitsFix` was accidentally added after the test
was already fixed in #46594 => we can remove it.
2019-10-25 16:08:56 +02:00
Armin Braun edab3748e9
Remove Incorrect Assertion from SnapshotsInProgress (#47458) (#48514)
This relates to the effort towards #46250. We added
tracking of the shard generation for successful
snapshots to `8.0`.
This assertion isn't correct though. While an `8.0`
master won't create an entry with sucess state and
a null shard generation it may still (on e.g. master
failover) send a success entry created by a 7.x master
with a `null` generation over the wire.

Closes #47406
2019-10-25 15:03:23 +02:00
Christoph Büscher 3fb3397c12 BlendedTermQuery's equals method should consider boosts (#48193)
This changes the queries equals() method so that the boost factors for each term
are considered for the equality calculation. This means queries are only equal
if both their terms and associated boosts match. The ordering of the terms
doesn't matter as before, which is why we internally need to sort the terms and
boost for comparison on the first equals() call like before. Boosts that are
`null` are considered equal to boosts of 1.0f because topLevelQuery() will only
wrap into BoostQuery if boost is not null and different from 1f.

Closes #48184
2019-10-25 13:35:14 +02:00
Yannick Welsch 486794f24d Show task ID in source of persistent task state update (#48483)
Relates #48395
2019-10-25 10:29:16 +02:00
Tim Brooks c0b545f325
Make BytesReference an interface (#48486)
BytesReference is currently an abstract class which is extended by
various implementations. This makes it very difficult to use the
delegation pattern. The implication of this is that our releasable
BytesReference is a PagedBytesReference type and cannot be used as a
generic releasable bytes reference that delegates to any reference type.
This commit makes BytesReference an interface and introduces an
AbstractBytesReference for common functionality.
2019-10-24 15:39:30 -06:00
Yannick Welsch acf6d34d69 Always use last properly persisted metadata as previous state (#47779)
On data-only nodes we were not using the last persisted cluster state as base point to compute
what needed storage, but the last applied cluster state (but not necessarily properly persisted)
instead.
2019-10-24 13:30:59 +02:00
David Turner 50518359fe Fix relocating shards size calculation (#48421)
In #48392 we added a second computation of the sizes of the relocating shards
in `canRemain()` but passed the wrong value for `subtractLeavingShards`. This
fixes that. It also removes some unnecessary logging in a test case added in
the same commit.
2019-10-24 08:58:50 +01:00
Jim Ferenczi dc5c31d67a
Add a deprecation warning regarding allocation awareness in search request (#48351)
This is a follow up of https://github.com/elastic/elasticsearch/issues/43453 where we added
a system property to disallow allocation awareness in search requests. Since search requests
will no longer check the allocation awareness attributes for routing in the next major version,
this change adds a deprecation warning on any setup that uses these attributes.

Relates #43453
2019-10-24 09:25:50 +02:00
Mayya Sharipova 9e9533f717 Correct syntax from backport
User older format of map

Relates to #48425
2019-10-23 17:19:15 -04:00
Mayya Sharipova 975dbecfa9 Correct rewritting of script_score query (#48425)
Previously there was a bug when an query inside script_score query
was rewritten. If min_score was not set and was equal to null,
we were converting it to float value which resulted to NPE.
This commit corrects this.

Closes #48081
2019-10-23 17:01:51 -04:00
Igor Motov bdbc353dea Geo: improve handling of out of bounds points in linestrings (#47939)
Brings handling of out of bounds points in linestrings in line with
points. Now points with latitude above 90 and below -90 are handled
the same way as for points by adjusting the longitude by moving it by
180 degrees.

Relates to #43916
2019-10-23 14:17:44 -04:00
Jim Ferenczi 41116eb7ea Do not throw errors on unknown types in SearchAfterBuilder (#48147)
* Do not throw errors on unknown types in SearchAfterBuilder

The support for BigInteger and BigDecimal was added for XContent in
https://github.com/elastic/elasticsearch/pull/32888. However the SearchAfterBuilder
xcontent parser doesn't expect them to be present so it throws an AssertionError.
This change fixes this discrepancy by changing the AssertionError into an
IllegalArgumentException that will not cause the node to die when thrown.

Closes #48074
2019-10-23 20:02:14 +02:00
Tom Callahan 892264a97a Add versions 7.4.2 and 6.8.5 2019-10-23 13:32:51 -04:00
David Turner c783a20560 Handle negative free disk space in deciders (#48392)
Today it is possible that the total size of all relocating shards exceeds the
total amount of free disk space. For instance, this may be caused by another
user of the same disk increasing their disk usage, or may be due to how
Elasticsearch double-counts relocations that are nearly complete particularly
if there are many concurrent relocations in progress.

The `DiskThresholdDecider` treats negative free space similarly to zero free
space, but it then fails when rendering the messages that explain its decision.
This commit fixes its handling of negative free space.

Fixes #48380
2019-10-23 18:16:41 +01:00
Adrien Grand 81ef72d3ef
Lucene#asSequentialBits gets the leadCost backwards. (#48335) (#48403)
The comment says it needs random-access, but it passes `Long#MAX_VALUE` as the
lead cost, which forces sequential access, it should pass `0` instead. I took
advantage of this fix to improve the logic to leverage an estimation of the
number of times that `Bits#get` gets called to make better decisions.
2019-10-23 17:48:17 +02:00
Przemyslaw Gomulka aaa6209be6
[7.x] [Java.time] Calculate week of a year with ISO rules BACKPORT(#48209) (#48349)
Reverting the change introducing IsoLocal.ROOT and introducing IsoCalendarDataProvider that defaults start of the week to Monday and requires minimum 4 days in first week of a year. This extension is using java SPI mechanism and defaults for Locale.ROOT only.
It require jvm property java.locale.providers to be set with SPI,COMPAT

closes #41670
backport #48209
2019-10-23 17:39:38 +02:00
Armin Braun 7215201406
Track Shard-Snapshot Index Generation at Repository Root (#48371)
This change adds a new field `"shards"` to `RepositoryData` that contains a mapping of `IndexId` to a `String[]`. This string array can be accessed by shard id to get the generation of a shard's shard folder (i.e. the `N` in the name of the currently valid `/indices/${indexId}/${shardId}/index-${N}` for the shard in question).

This allows for creating a new snapshot in the shard without doing any LIST operations on the shard's folder. In the case of AWS S3, this saves about 1/3 of the cost for updating an empty shard (see #45736) and removes one out of two remaining potential issues with eventually consistent blob stores (see #38941 ... now only the root `index-${N}` is determined by listing).

Also and equally if not more important, a number of possible failure modes on eventually consistent blob stores like AWS S3 are eliminated by moving all delete operations to the `master` node and moving from incremental naming of shard level index-N to uuid suffixes for these blobs.

This change moves the deleting of the previous shard level `index-${uuid}` blob to the master node instead of the data node allowing for a safe and consistent update of the shard's generation in the `RepositoryData` by first updating `RepositoryData` and then deleting the now unreferenced `index-${newUUID}` blob.
__No deletes are executed on the data nodes at all for any operation with this change.__

Note also: Previous issues with hanging data nodes interfering with master nodes are completely impossible, even on S3 (see next section for details).

This change changes the naming of the shard level `index-${N}` blobs to a uuid suffix `index-${UUID}`. The reason for this is the fact that writing a new shard-level `index-` generation blob is not atomic anymore in its effect. Not only does the blob have to be written to have an effect, it must also be referenced by the root level `index-N` (`RepositoryData`) to become an effective part of the snapshot repository.
This leads to a problem if we were to use incrementing names like we did before. If a blob `index-${N+1}` is written but due to the node/network/cluster/... crashes the root level `RepositoryData` has not been updated then a future operation will determine the shard's generation to be `N` and try to write a new `index-${N+1}` to the already existing path. Updates like that are problematic on S3 for consistency reasons, but also create numerous issues when thinking about stuck data nodes.
Previously stuck data nodes that were tasked to write `index-${N+1}` but got stuck and tried to do so after some other node had already written `index-${N+1}` were prevented form doing so (except for on S3) by us not allowing overwrites for that blob and thus no corruption could occur.
Were we to continue using incrementing names, we could not do this. The stuck node scenario would either allow for overwriting the `N+1` generation or force us to continue using a `LIST` operation to figure out the next `N` (which would make this change pointless).
With uuid naming and moving all deletes to `master` this becomes a non-issue. Data nodes write updated shard generation `index-${uuid}` and `master` makes those `index-${uuid}` part of the `RepositoryData` that it deems correct and cleans up all those `index-` that are unused.

Co-authored-by: Yannick Welsch <yannick@welsch.lu>
Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>
2019-10-23 10:58:26 +01:00
Jim Ferenczi 50f565b158 SearchSlowLog uses a non thread-safe object to escape json (#48363)
This commit fixes the usage of JsonStringEncoder#quoteAsUTF8 in the SearchSlowLog.
JsonStringEncoder#getInstance should always be called to get a thread local object
but this assumption was broken by #44642. This means that any slow log can throw
an AIOOBE since it uses the same byte array concurrently.

Closes #48358
2019-10-23 10:23:06 +02:00
Armin Braun 8a02a5fc7d
Simplify Shard Snapshot Upload Code (#48155) (#48345)
The code here was needlessly complicated when it
enqueued all file uploads up-front. Instead, we can
go with a cleaner worker + queue pattern here by taking
the max-parallelism from the threadpool info.

Also, I slightly simplified the rethrow and
listener (step listener is pointless when you add the callback in the next line)
handling it since I noticed that we were needlessly rethrowing in the same
code and that wasn't worth a separate PR.
2019-10-22 17:17:09 +01:00
Nhat Nguyen d0a4bad95b Use MultiFileTransfer in CCR remote recovery (#44514)
Relates #44468
2019-10-21 23:30:52 -04:00
Armin Braun e65c60915a
Cleanup FileRestoreContext Abstractions (#48173) (#48300)
This class is only used by the blob store repository
and CCR and the abstractions didn't really make sense
with CCR ignoring the concrete `restoreFiles` method
completely and having a method used only by the blobstore
overriden as unsupported.
=> Moved to a more fitting set of abstractions
=> Dried up the stream wrapping in `BlobStoreRepository` a little
now that the `restoreFile` method could be simplified

Relates #48110 as it makes changing the API of `FileRestoreContext`
to what is needed for async restores simpler
2019-10-21 17:30:35 +02:00
Armin Braun dc08feadc6
Remove Redundant Version Param from Repository APIs (#48231) (#48298)
This parameter isn't used by any implementation
2019-10-21 16:20:45 +02:00
David Turner 672b2a92ca Fix compile error from previous commit (#48230)
The previous commit, 3a6fa0bbdb introduces a
compile error that was fixed locally but not committed. This commit adds the
missing change.
2019-10-21 08:54:04 +01:00
David Turner 3a6fa0bbdb Close query cache on index service creation failure (#48230)
Today it is possible that we create the `QueryCache` and then fail to create
the owning `IndexService` and this means we do not close the `QueryCache`
again. This commit addresses that leak.

Fixes #48186
2019-10-21 08:46:53 +01:00
Ignacio Vera b1224fca8c
upgrade to Lucene-8.3.0-snapshot-25968e3b75e (#48227) 2019-10-21 08:21:09 +02:00
Takuya Kajiwara a56daeae2d [DOCS] Fix typos in InternalEngine.java comments (#46861) 2019-10-18 10:36:58 -04:00
David Turner a8bcbbc38a Quieter logging from the DiskThresholdMonitor (#48115)
Today if an Elasticsearch node reaches a disk watermark then it will repeatedly
emit logging about it, which implies that some action needs to be taken by the
administrator. This is misleading. Elasticsearch strives to keep nodes under
the high watermark, but it is normal to have a few nodes occasionally exceed
this level. Nodes may be over the low watermark for an extended period without
any ill effects.

This commit enhances the logging emitted by the `DiskThresholdMonitor` to be
less misleading. The expected case of hitting the high watermark and
immediately relocating one or more shards that to bring the node back under the
watermark again is reduced in severity to `INFO`. Additionally, `INFO` messages
are not emitted repeatedly.

Fixes #48038
2019-10-18 15:00:14 +01:00
Armin Braun 1157775074
Remove Support for pre-5.x Indices in Restore (#48181) (#48199)
The logic for handling empty segment files has been
unnecessary ever since #24021 which removes the support
for these files in 6.x -> we can safely remove the
support for restoring these from 7.x+ to simplify the code.
2019-10-18 09:45:07 +02:00
Przemyslaw Gomulka 02d18f5c1e
[7.x] Slow log must use separate underlying logger for each index BACKPORT(#47234) (#48176)
* Slow log must use separate underlying logger for each index (#47234)

SlowLog instances should not share the same underlying logger, as it would cause different indexes override each other levels. When creating underlying logger, unique per index identifier should be used. Name + IndexSettings.UUID

Closes #42432
2019-10-17 20:04:57 +02:00
Armin Braun 04e3316408
Stop Resolving Fallback IndexId (#48141) (#48204)
There is no reason to still resolve the
fallback `IndexId` here. It only applies to
`2.x` repos and those we can't read anymore
anyway because they use an `/index` instead of
an `/index-N` blob at the repo root for which
at least 7.x+ does not contain the logic to find
it.
2019-10-17 19:27:49 +02:00
Stuart Tettemer 356eef00c8
Scripting: get context names REST API (#48026) (#48168)
Adds `GET /_script_context`, returning a `contexts` object with each
available context as a key whose value is an empty object. eg.
```
{
  "contexts": {
    "aggregation_selector": {},
    "aggs": {},
    "aggs_combine": {},
...
  }
}
```

refs: #47411
2019-10-17 09:08:55 -06:00
Armin Braun 0ca7cc1848
Safely Close Repositories on Node Shutdown (#48020) (#48107)
We were not closing repositories on Node shutdown.
In production, this has little effect but in tests
shutting down a node using `MockRepository` and is
currently stuck in a simulated blocked-IO situation
will only unblock when the node's threadpool is
interrupted. This might in some edge cases (many
snapshot threads and some CI slowness) result
in the execution taking longer than 5s to release
all the shard stores and thus we fail the assertion
about unreleased shard stores in the internal test cluster.

Regardless of tests, I think we should close repositories
and release resources associated with them when closing
a node and not just when removing a repository from the CS
with running nodes as this behavior is really unexpected.

Fixes #47689
2019-10-17 07:55:05 +02:00
Armin Braun f1bc3a0753
Remove TestLogging for #46701 (#48156) (#48160)
This hasn't failed in 5 weeks now. Removing
the test logging and closing the issue.

Closes #46701
2019-10-17 07:54:20 +02:00
Jack Conradson fa99721295 Drop stored scripts with the old style-id (#48078)
This PR fixes (#47593). Stored scripts with the old-style id of lang#id are 
saved through the upgrade process but are no longer accessible in recent 
versions. This fix will drop those scripts altogether since there is no way for 
a user to access them.
2019-10-16 16:10:31 -07:00
jimczi b2dc98562b Bump version to 7.6 2019-10-16 15:57:12 +02:00
Klemen Košir 8243e99134 Fix typo in QueryBuilders Javadoc. (#47362)
This PR fixes a typo in the Javadoc for terms queries in QueryBuilders.
2019-10-15 16:16:21 -07:00
Martijn van Groningen aff0c9babc
This commits merges (#48040) the enrich-7.x feature branch,
which is backport merge and adds a new ingest processor, named enrich processor,
that allows document being ingested to be enriched with data from other indices.

Besides a new enrich processor, this PR adds several APIs to manage an enrich policy.
An enrich policy is in charge of making the data from other indices available to the enrich processor in an efficient manner.

Related to #32789
2019-10-15 17:31:45 +02:00
jimczi b858e19bcc Revert #46598 that breaks the cachability of the sub search contexts. 2019-10-15 09:40:59 +02:00