Commit Graph

2893 Commits

Author SHA1 Message Date
Armin Braun 6e51b6f96d
Add Repository Consistency Assertion to SnapshotResiliencyTests (#41631)
* Add Repository Consistency Assertion to SnapshotResiliencyTests (#40857)

* Add Repository Consistency Assertion to SnapshotResiliencyTests

* Add some quick validation on not leaving behind any dangling metadata or dangling indices to the snapshot resiliency tests
   * Added todo about expanding this assertion further

* Fix SnapshotResiliencyTest Repo Consistency Check (#41332)

* Fix SnapshotResiliencyTest Repo Consistency Check

* Due to the random creation of an empty `extra0` file by the Lucene mockFS we see broken tests because we use the existence of an index folder in assertions and the index deletion doesn't go through if there are extra files in an index folder
  * Fixed by removing the `extra0` file and resulting empty directory trees before asserting repo consistency
* Closes #41326

* Reenable SnapshotResiliency Test (#41437)

This was fixed in https://github.com/elastic/elasticsearch/pull/41332
but I forgot to reenable the test.

* fix compile on java8
2019-04-29 12:01:58 +02:00
Nhat Nguyen 615a0211f0 Recovery should not indefinitely retry on mapping error (#41099)
A stuck peer recovery in #40913 reveals that we indefinitely retry on
new cluster states if indexing translog operations hits a mapper
exception. We should not wait and retry if the mapping on the target is
as recent as the mapping that the primary used to index the replaying
operations.

Relates #40913
2019-04-27 10:55:08 -04:00
Michael Morello 75283294f5 Fix multi-node parsing in voting config exclusions REST API (#41588)
Fixes an issue where multiple nodes where not properly parsed in the voting config exclusions REST API.

Closes #41587
2019-04-27 12:20:03 +02:00
Nick Knize 113b24be4b Refactor GeoHashUtils (#40869)
This commit refactors GeoHashUtils class into a new Geohash utility class located in the ES geo library. The intent is to not only better control what geo methods are whitelisted for painless scripting but to clean up the geo utility API in general.
2019-04-26 10:06:36 -05:00
Armin Braun aad33121d8
Async Snapshot Repository Deletes (#40144) (#41571)
Motivated by slow snapshot deletes reported in e.g. #39656 and the fact that these likely are a contributing factor to repositories accumulating stale files over time when deletes fail to finish in time and are interrupted before they can complete.

* Makes snapshot deletion async and parallelizes some steps of the delete process that can be safely run concurrently via the snapshot thread poll
   * I did not take the biggest potential speedup step here and parallelize the shard file deletion because that's probably better handled by moving to bulk deletes where possible (and can still be parallelized via the snapshot pool where it isn't). Also, I wanted to keep the size of the PR manageable.
* See https://github.com/elastic/elasticsearch/pull/39656#issuecomment-470492106
* Also, as a side effect this gives the `SnapshotResiliencyTests` a little more coverage for master failover scenarios (since parallel access to a blob store repository during deletes is now possible since a delete isn't a single task anymore).
* By adding a `ThreadPool` reference to the repository this also lays the groundwork to parallelizing shard snapshot uploads to improve the situation reported in #39657
2019-04-26 15:36:09 +02:00
Armin Braun 7824f60a34
Simplify Snapshot Resiliency Test (#40930) (#41565)
* Thanks to #39793 dynamic mapping updates don't contain blocking operations anymore so we don't have to manually put the mapping in this test and can keep it a little simpler
2019-04-26 10:59:09 +02:00
Christoph Büscher 078936b8f5 Remove search analyzers from DocumentFieldMappers (#41484)
These references seem to be unused except for tests and should be removed to
keep the places we store analyzers limited.
2019-04-26 09:48:48 +02:00
Armin Braun 6a24fd3f26
Add Restore Operation to SnapshotResiliencyTests (#40634) (#41546)
* Add Restore Operation to SnapshotResiliencyTests

* Expand the successful snapshot test case to also include restoring the snapshop
  * Add indexing of documents as well to be able to meaningfully verify the restore
* This is part of the larger effort to test eventually consistent blob stores in #39504
2019-04-26 09:04:34 +02:00
Christoph Büscher 52495843cc [Docs] Fix common word repetitions (#39703) 2019-04-25 20:47:47 +02:00
Armin Braun 23b3741618
Remove Exists Check from S3 Repository Deletes (#40931) (#41534)
* The check doesn't add much if anything practically, since the S3 repository is eventually consistent and we only log the non-existence of a blob anyway
  * We don't do the check on writes for this very reason and documented it as such
  * Removing the check saves one API call per single delete speeding up the deletion process and lowering costs
2019-04-25 18:25:03 +02:00
Jim Ferenczi 6184efaff6
Handle unmapped fields in _field_caps API (#34071) (#41426)
Today the `_field_caps` API returns the list of indices where a field
is present only if this field has different types within the requested indices.
However if the request is an index pattern (or an alias, or both...) there
is no way to infer the indices if the response contains only fields that have
the same type in all indices. This commit changes the response to always return
the list of indices in the response. It also adds a way to retrieve unmapped field
in a specific section per field called `unmapped`. This section is created for each field
that is present in some indices but not all if the parameter `include_unmapped` is set to
true in the request (defaults to false).
2019-04-25 18:13:48 +02:00
Armin Braun 40aef2b8aa
Introduce Delegating ActionListener Wrappers (#40129) (#41527)
* Introduce Delegating ActionListener Wrappers
* Dry up use cases of ActionListener that simply pass through the response or exception to another listener
2019-04-25 16:05:04 +02:00
Ignacio Vera d119abdf96
Improve accuracy for Geo Centroid Aggregation (#41514)
keeps the partial results as doubles and uses Kahan summation to help reduce floating point errors.
2019-04-25 15:25:48 +02:00
Armin Braun cd830b53e2
Name Snapshot Data Blobs by UUID (#40652) (#41523)
* Name Snapshot Data Blobs by UUID

* There is no functional reason why we need incremental naming for these files but
  * As explained in #38941 it is a possible source of corrupting the repository
  * It wastes API calls for the list operation
  * Is just needless complication
* Since we store the exact names of the data blobs in all the metadata anyway, we can make this change without any BwC considerations
  * Even on the worst case scenario of a downgrade the functionality would continue working since the incremental names wouldn't conflict with the uuids and the number parsing for finding the next incremental name suppresses the exception when encountring a non-numeric value after the double underscore prefix
2019-04-25 13:18:03 +02:00
Luca Cavanna 8a0e5f7b87
Deprecate support for first line empty in msearch API (#41442)
In order to support empty action metadata in the first msearch item,
we need to remove support for prepending msearch request body with an
empty line, which prevents us from parsing the empty line as action
metadata for the first search item.

Relates to #41011
2019-04-25 12:45:18 +02:00
Przemyslaw Gomulka 906f88029b
Remove the test which is testing java and joda api backport(#41493) #41518
The test is testing the java time API and fails in case it hits daylight saving time changes.
Java time has the right implementation and we don't need to test this.
more details on how the test was affected by the DST change on this comment
closes #39617
backport(#41493)
2019-04-25 12:21:01 +02:00
Armin Braun 7c819fd2aa
Fix BulkRejectionIT (#41446) (#41500)
* Due to #40866 one of the two parallel bulk requests can randomly be
rejected outright when the write queue is full already, we can catch
this situation and ignore it since we can still have the rejection for
the dynamic mapping udate for the other reuqest and it's somewhat rare
to run into this anyway
* Closes #41363
2019-04-24 20:46:21 +02:00
Zachary Tong ec5dd0594f Disallow null/empty or duplicate composite sources (#41359)
Adds some validation to prevent duplicate source names from being
used in the composite agg.

Also refactored to use a ConstructingObjectParser and removed the
private ctor and setter for sources, making it mandatory.
2019-04-24 13:23:31 -04:00
Armin Braun 1db9166ea0
Fix Broken Index Shard Snapshot File Preventing Snapshot Creation (#41310) (#41473)
* The problem here is that if we run into a corrupted index-N file, instead of generating a new index-(N+1) file, we instead set the newest index generation to -1 and thus tried to create `index-0`
   * If `index-0` is corrupt, this prevents us from ever creating a new snapshot using the broken shard, because we are unable to create `index-0` since it already exists
   * Fixed by still using the index generation for naming the next index file, even if it was a broken index file
* Added test that makes sure restoring as well as snapshotting on top of the broken shard index file work as expected
* closes #41304
2019-04-24 18:39:17 +02:00
Armin Braun 381b8e2ece
Fix BulkProcessor Retry ITs (#41338) (#41472)
* The test fails for the retry backoff enabled case because the retry handler in the bulk processor hasn't been adjusted to account for #40866 which now might lead to an outright rejection of the request instead of its items individually
   * Fixed by adding retry functionality to the top level request as well
* Also fixed the duplicate test for the HLRC that wasn't handling the non-backoff case yet the same way the non-client IT did
* closes #41324
2019-04-24 13:46:32 +02:00
Jason Tedor 65af47eb31
Introduce aliases version (#41397)
This commit introduces aliases versions to index metadata. This will be
useful in CCR when we replicate aliases.
2019-04-23 12:19:11 -04:00
David Roberts 7e2aec022d [TEST] Mute BulkRejectionIT.testBulkRejectionAfterDynamicMappingUpdate
Due to https://github.com/elastic/elasticsearch/issues/41363
2019-04-23 15:58:38 +01:00
David Roberts d8a2970fa4 [TEST] Mute RemoteClusterServiceTests.testCollectNodes
Due to https://github.com/elastic/elasticsearch/issues/41067
2019-04-23 15:13:01 +01:00
David Turner 0bb15d3dac Allow ops to be blocked after primary promotion (#41360)
Today we assert that there are no operations in flight in this test. However we
will sometimes be in a situation where the operations are blocked, and we
distinguish these cases since #41271 causing the assertion to fail. This commit
addresses this by allowing operations to be blocked sometimes after a primary
promotion.

Fixes #41333.
2019-04-19 07:48:43 +01:00
Jim Ferenczi 8f73e1e883 Fix unmapped field handling in the composite aggregation (#41280)
The `composite` aggregation maps unknown fields as numerics, this means that
any `after` value that is set on a query with an unmapped field on some indices
will fail if the provided value is not numeric. This commit changes the default
value source to use keyword instead in order to be able to parse any type of after
values.
2019-04-18 23:08:13 +02:00
Jim Ferenczi 754037b71e Unified highlighter should ignore terms that targets the _id field (#41275)
The `_id` field uses a binary encoding to index terms that is not compatible with
the utf8 automaton that the unified highlighter creates to reanalyze the input.
For these reason this commit ignores terms that target the `_id` field when
`require_field_match` is set to false.

Closes #37525
2019-04-18 22:31:23 +02:00
Jim Ferenczi 068f8ba223 more_like_this query to throw an error if the like fields is not provided (#40632)
With the removal of the `_all` field the `mlt` query cannot infer a field name
to use to analyze the provided (un)like text if the `fields` parameter is not
explicitly set in the query and the `index.query.default_field` is not changed
in the index settings (by default it is set to `*`). For this reason the like text
is ignored and queries are only built from the provided document ids.
This change fixes this bug by throwing an error if the fields option is not set
and the `index.query.default_field` is equals to `*`. The error is thrown only
if like or unlike texts are provided in the query.
2019-04-18 22:30:22 +02:00
Simon Willnauer 11dc9fe249 Mark searcher as accessed in acquireSearcher (#41335)
This fixes an issue where every N seconds a slow search request is triggered
since the searcher access time is not set unless the shard is idle. This change
moves to a more pro-active approach setting the searcher as accessed all the time.
2019-04-18 19:14:50 +02:00
Adrien Grand a699cb76a5
Fix javadoc tag. (#41330)
s/returns/return/
2019-04-18 14:41:09 +02:00
Armin Braun 389a13b68e
Mute BulkProcessorRetryIT#testBulkRejectionLoadWithBackoff (#41325) (#41331)
* For #41324
2019-04-18 11:55:28 +02:00
Alpar Torok a4a4259cac Mute failing test
Tracking #41326
2019-04-18 09:26:20 +03:00
Armin Braun c77e10b16b
Handle Bulk Requests on Write Threadpool (#40866) (#41315)
* Bulk requests can be thousands of items large and take more than O(10ms) time to handle => we should not handle them on the transport threadpool to not block select loops
* relates #39128
* relates #39658
2019-04-18 07:10:23 +02:00
David Turner 946baf87d3 Assert TransportReplicationActions acquire permits (#41271)
Today we do not distinguish "no operations in flight" from "operations are
blocked", since both return `0` from `IndexShard#getActiveOperationsCount()`.
We therefore cannot assert that every `TransportReplicationAction` performs its
actions under permit(s). This commit fixes this by returning
`IndexShard#OPERATIONS_BLOCKED` if operations are blocked, allowing these two
cases to be distinguished.
2019-04-17 23:05:03 +02:00
Zachary Tong 7e62ff2823 [Rollup] Validate timezones based on rules not string comparision (#36237)
The date_histogram internally converts obsolete timezones (such as
"Canada/Mountain") into their modern equivalent ("America/Edmonton").
But rollup just stored the TZ as provided by the user.

When checking the TZ for query validation we used a string comparison,
which would fail due to the date_histo's upgrading behavior.

Instead, we should convert both to a TimeZone object and check if their
rules are compatible.
2019-04-17 13:46:44 -04:00
Christoph Büscher 4d964194db Fix error applying `ignore_malformed` to boolean values (#41261)
The `ignore_malformed` option currently works on numeric fields only when the
bad value isn't a string value but not if it is a boolean. In this case we get a
parsing error from the xContent parser which we need to catch in addition to the
field mapper.

Closes #11498
2019-04-17 18:44:57 +02:00
David Turner 2670ed2f8f Assert the stability of custom search preferences (#41150)
Today the `?preference=custom_string_value` search preference will only change
its choice of a shard copy if something changes the `IndexShardRoutingTable`
for that specific shard. Users can use this behaviour to route searches to a
consistent set of shard copies, which means they can reliably hit copies with
hot caches, and use the other copies only for redundancy in case of failure.
However we do not assert this property anywhere, so we might break it in
future.

This commit adds a test that shows that searches are routed consistently even
if other indices are created/rebalanced/deleted.

Relates https://discuss.elastic.co/t/176598, #41115, #26791
2019-04-17 17:47:44 +02:00
Nhat Nguyen 2ee87c99d9 Fix bwc version of sanity check of read only engine
Relates #41041
2019-04-17 10:25:47 -04:00
Nhat Nguyen aa0c957a4a Do not trim unsafe commits when open readonly engine (#41041)
Today we always trim unsafe commits (whose max_seq_no >= global
checkpoint) before starting a read-write or read-only engine. This is
mandatory for read-write engines because they must start with the safe
commit. This is also fine for read-only engines since most of the cases
we should have exactly one commit after closing an index (trimming is a
noop). However, this is dangerous for following indices which might have
more than one commits when they are being closed.

With this change, we move the trimming logic to the ctor of InternalEngine
so we won't trim anything if we are going to open a read-only engine.
2019-04-17 10:16:12 -04:00
Adrien Grand f7e590ce0d
ProfileScorer should propagate `setMinCompetitiveScore`. (#40958) (#41302)
Currently enabling profiling disables top-hits optimizations, which is
unfortunate: it would be nice to be able to notice the difference in method
counts and timings depending on whether total hit counts are requested.
2019-04-17 16:11:14 +02:00
Adrien Grand 9fd5237fd4
Clean up Node#close. (#39317) (#41301)
`Node#close` is pretty hard to rely on today:
 - it might swallow exceptions
 - it waits for 10 seconds for threads to terminate but doesn't signal anything
   if threads are still not terminated after 10 seconds

This commit makes `IOException`s propagated and splits `Node#close` into
`Node#close` and `Node#awaitClose` so that the decision what to do if a node
takes too long to close can be done on top of `Node#close`.

It also adds synchronization to lifecycle transitions to make them atomic. I
don't think it is a source of problems today, but it makes things easier to
reason about.
2019-04-17 16:10:53 +02:00
Jason Tedor 6566979c18
Always check for archiving broken index settings (#41209)
Today we check if an index has broken settings when checking if an index
needs to be upgraded. However, it can be the case that an index setting
became broken even if an index is already upgraded to the current
version if the user removed a plugin (or downgraded from the default
distribution to the non-default distribution) while on the same version
of Elasticsearch. In this case, some registered settings would go
missing and the index would now be broken. Yet, we miss this check and
instead of archiving the settings, the index becomes unassigned due to
the missing settings. This commit addresses this by checking for broken
settings whether or not the index is upgraded.
2019-04-17 07:00:23 -04:00
Christoph Büscher badb7a22e0 Some cleanups in NoisyChannelSpellChecker (#40949)
One of the two #getCorrections methods is only used in tests, so we can move
it and any of the required helper methods to that test. Also reducing the
visibility of several methods to package private since the class isn't used
elsewhere outside the package.
2019-04-17 10:22:12 +02:00
David Turner bfa06d963e Do not create missing directories in readonly repo (#41249)
Today we erroneously look for a node setting called `readonly` when deciding
whether or not to create a missing directory in a filesystem repository. This
change fixes this by using the repository setting instead.

Closes #41009
Relates #26909
2019-04-17 09:43:14 +02:00
Yogesh Gaikwad 6a552c05fe
Use alias name from rollover request to query indices stats (#40774) (#41284)
In `TransportRolloverAction` before doing rollover we resolve
source index name (write index) from the alias in the rollover request.
Before evaluating the conditions and executing rollover action, we
retrieve stats, but to do so we used the source index name
resolved from the alias instead of alias from the index.
This fails when the user is assigned a role with index privilege on the
alias instead of the concrete index. This commit fixes this by using
the alias from the request.
After this change, verified that when we retrieve all the stats (including write + read indexes)
we are considering only source index.

Closes #40771
2019-04-17 14:15:05 +10:00
Jim Ferenczi 043c1f5d42 Unified highlighter should respect no_match_size with number_of_fragments set to 0 (#41069)
The unified highlighter returns the first sentence of the text when number_of_fragments
is set to 0 (full highlighting). This is a legacy of the removed postings highlighter
that was based on sentence break only. This commit changes this behavior in order
to respect the provided no_match_size value when number_of_fragments is set to 0.
This means that the behavior will be consistent for any value of the number_of_fragments option.

Closes #41066
2019-04-16 19:25:25 +02:00
Armin Braun c4e84e2b34
Add Bulk Delete Api to BlobStore (#40322) (#41253)
* Adds Bulk delete API to blob container
* Implement bulk delete API for S3
* Adjust S3Fixture to accept both path styles for bulk deletes since the S3 SDK uses both during our ITs
* Closes #40250
2019-04-16 17:19:05 +02:00
Jim Ferenczi c22a2cea12 BlendedTermQuery should ignore fields that don't exists in the index (#41125)
Today the blended term query detects if a term exists in a field by looking at the term statistics in the index.
However the value to indicate that a term has no occurence in a field have changed in Lucene. A non-existing term now returns
a doc and total term frequency of 0. Because of this disrepancy the blended term query picks 0 as the minimum frequency for a term
even if other fields have documents for this terms. This confuses the term queries that the blending creates since some of them
contain a custom state that indicates a frequency of 0 even though the term has some occurence in the field. For these terms an exception
is thrown because the term query always checks that the term state's frequency is greater than 0 if there are documents associate to it.
This change fixes this bug by ignoring terms with a doc freq of 0 when the blended term query picks the minimum term frequency among the
requested fields.

Closes #41118
2019-04-16 16:25:42 +02:00
David Turner 8577bbd73b Inline TransportReplAct#createReplicatedOperation (#41197)
`TransportReplicationAction.AsyncPrimaryAction#createReplicatedOperation`
exists so it can be overridden in tests. This commit re-works these tests to
use a real `ReplicationOperation` and inlines the now-unnecessary method.

Relates #40706.
2019-04-16 13:36:29 +01:00
David Turner 10e58210a0
Validate cluster UUID when joining Zen1 cluster (#41063)
Today we fail to join a Zen2 cluster if the cluster UUID does not match our
own, but we do not perform the same validation when joining a Zen1 cluster.
This means that a Zen2 node will pass join validation and be added to a Zen1
cluster but will reject all cluster states from the master.

Relates #37775
2019-04-16 12:49:47 +01:00
Nhat Nguyen 8ee84f2268 Correct flush parameters in engine test
Since #40213, we forbid a combination of flush parameters: force=true
and wait_if_ongoing=false.

Closes #41236
2019-04-16 05:04:31 -04:00