Commit Graph

5508 Commits

Author SHA1 Message Date
Nik Everett 0c47d49784
Make sure non-collecting aggs include sub-aggs (backport of #64214) (#64247)
Now that we're consistently using `can_match` to filter which shards we
run on we can get this confusing case:
1. You have a search with, say, a range and a sub-agg.
2. That search has a query that `can_match` can recognize will match no
   docs. On *any* shard.
3. So we dutifully run it on a single shard so it can produce the
   "empty" aggs.
4. The shard we pick happens to not have the target of the range mapped.
5. This kicks in the special range aggregator that doesn't collect any
   documents.
6. Before this commit, that range aggregator *also* never produced any
   sub-aggs.

So, without this change, it was quite possible for a search that
happened to match no documents to "throw away" the sub-aggs of a range
and a few other aggs.

We've had this problem for a long, long time but it is more confusing
now because `can_match` is really kicking in and causing us to see cases
where it looks like you are targeting a lot of shards but you really are
only targeting a couple. It used to be that to get the "no sub-aggs"
behavior you had to explicitly target only shards that didn't map the
target field of the `range` agg. And, like, in that case it isn't too
bad because you targeted a sort of degenerate shard. But now that
`can_match` is doing its thing you can end up with the confusing steps
above. It took me several hours to track down what was happening, and I know
how the individual pieces of all of this work. It took four hours to
figure out how they fit together in this case....

Anyway! This replaces all the aggregator implementations that throw out
the sub-aggregators with ones that keep them. I think this'll be less
confusing in the future.
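
As a rough illustration of the behavior change (a hypothetical Python model, not Elasticsearch code; the function and field names are made up for this sketch), an "empty" range result can either drop or keep its sub-aggregations:

```python
# Hypothetical model of a non-collecting range aggregator's "empty" result.
# None of these names come from Elasticsearch; they only illustrate the idea.
def build_empty_range_result(ranges, sub_agg_names, keep_sub_aggs):
    """Build the result returned when no documents are ever collected."""
    buckets = []
    for key in ranges:
        bucket = {"key": key, "doc_count": 0}
        if keep_sub_aggs:
            # After the fix: every empty bucket still carries empty sub-agg results.
            bucket["aggs"] = {name: {"value": None} for name in sub_agg_names}
        buckets.append(bucket)
    return buckets


before = build_empty_range_result(["0-10", "10-20"], ["avg_price"], keep_sub_aggs=False)
after = build_empty_range_result(["0-10", "10-20"], ["avg_price"], keep_sub_aggs=True)
```

In this model `before` loses the `avg_price` entry entirely, while `after` reports it with an empty value, which is the shape a caller would expect regardless of which shard produced the response.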

Closes #64142
2020-10-28 08:38:05 -04:00
Jason Tedor 78c741ab32
Log whether or not we are using the bundled JDK (#64255)
This commit adds logging to indicate whether or not we are using the
bundled JDK. We distinguish between using a distribution that bundles
the JDK versus using a distribution that does not bundle the JDK.
2020-10-28 07:10:47 -04:00
Armin Braun 2983584ef6
Fix #invariant Assertion in CacheFile (#64180) (#64264)
Fix #invariant Assertion in CacheFile

closes #64141
2020-10-28 10:22:47 +01:00
Armin Braun a697d5edae
Don't Generate an Index Setting History UUID unless it's Supported (#64164) (#64213)
In 7.x we can't just by default generate this setting as it might not be
supported by data nodes that are assigned shards for an older version in mixed version
clusters.

Closes #64152
2020-10-28 09:03:09 +01:00
Jason Tedor dfc8ae48cc
Fix using bundled JDK detection on macOS (#64236)
This commit fixes an issue with the detection on macOS for whether or
not the bundled JDK is being used. The logic between macOS and non-macOS
is different because the JDK has a different directory structure on
macOS versus non-macOS. However, due to notarization issues, we changed
the top-level directory from jdk to jdk.app, yet never updated this
detection logic to account for that.
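
A minimal sketch of the kind of check described above, written in Python for illustration only. The function names are hypothetical, and the `Contents/Home` sub-path on macOS is an assumption about the JDK bundle layout rather than something stated in this message:

```python
from pathlib import PurePosixPath


def bundled_jdk_home(es_home: str, is_macos: bool) -> PurePosixPath:
    """Return where the bundled JDK would live under the installation root."""
    root = PurePosixPath(es_home)
    # Assumption: on macOS the JDK home sits under Contents/Home inside the
    # top-level directory, which was renamed from jdk to jdk.app.
    return root / "jdk.app" / "Contents" / "Home" if is_macos else root / "jdk"


def is_using_bundled_jdk(es_home: str, java_home: str, is_macos: bool) -> bool:
    """Compare the JDK in use against the expected bundled location."""
    return PurePosixPath(java_home) == bundled_jdk_home(es_home, is_macos)
```

With the old `jdk` directory name still hard-coded for macOS, a check like this would report `False` even when the bundled JDK is in use.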

Ideally, we would have a packaging test that asserts that we have the
behavior here correct, and that it is maintained over time. Alas, we do not
currently have packaging tests on macOS.
2020-10-27 16:47:02 -04:00
Nhat Nguyen 566d1fd459 Return the same point in time in search response (#64188)
With this change, we will always return the same point in time in a
search response as its input until we implement the retry mechanism
for the point in times.
2020-10-27 10:17:44 -04:00
Jim Ferenczi e34014eb6a Fix sorted query when date_nanos is used as the numeric_type (#64183)
The formatting of the global bottom value does not take the resolution of the provided
numeric_type into account. This change fixes this bug by providing the resolution
directly in the doc value format if the numeric_type is provided as `date_nanos`.
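
To see why the resolution matters, here is a small Python sketch (hypothetical helper, not the Elasticsearch formatter): the same instant is a different integer in milliseconds and in nanoseconds, so the formatter has to know which one it was given.

```python
from datetime import datetime, timezone


def format_sort_value(value: int, resolution: str) -> str:
    """Format an epoch value; `resolution` is "millis" or "nanos" (illustrative names)."""
    units_per_second = 1_000 if resolution == "millis" else 1_000_000_000
    seconds = value // units_per_second
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


as_millis = format_sort_value(1_603_756_800_000, "millis")
as_nanos = format_sort_value(1_603_756_800_000_000_000, "nanos")
```

Both calls describe the same instant only because each value is paired with its own resolution; formatting the nanosecond value with the millisecond resolution would be off by a factor of one million.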

Closes #63719
2020-10-27 11:00:23 +01:00
Armin Braun e02561476e
Fix Broken Clone Snapshot CS Update (#64116) (#64159)
We must not remove the snapshot from the initializing set
in the `timeout` getter. This was a plain oversight/mistake
and went unnoticed. It can lead to the removal of a valid
snapshot clone from the cluster state in rare circumstances
(e.g. when a node concurrently joins the cluster or a routing
change happens as it did in the linked test failure).

Closes #64115
2020-10-26 14:32:42 +01:00
Armin Braun 96407268a0
Fix Background Merge Breaking Snapshot Restore Test (#63579) (#64129)
If we run into a background merge between creating the snapshot and closing the index
then with compound files we could be in a situation where we get zero file reuse
on restore.
Force merging before the snapshot gives us a single segment that won't change down the line
so the restore always sees file reuse from the closed index.

Closes #63476
2020-10-26 09:34:43 +01:00
Armin Braun bdea16301d
Fix testMasterFailoverDuringCloneStep1 (#63580) (#64127)
Assuming the clone failed when the request failed is not sufficient.
There are failure modes where the request fails but the clone still works out
because the data node resent the request after the first clone had already been
failed and removed from the cluster state when master was restarted.

Closes #63473
2020-10-26 09:30:09 +01:00
Marios Trivyzas 9b8ea63cd2
[7.10] Bump version after 7.9.3 release (#63818) 2020-10-22 17:49:21 +02:00
Przemyslaw Gomulka bab426be2c
[7.10] add 6.8.14 version (#63824)
adding 6.8.14 after version 6.8.13 release
2020-10-22 16:51:01 +02:00
Armin Braun e0f73c96f7
Fix testStartCloneWithSuccessfulShardSnapshotPendingFinalization (#63966) (#64000)
We have to wait for no more operations here, not for `1`. This mostly worked
because the test thread would add the listener quickly enough so that it sees the
state where either the snapshot or clone but not both have already finished
but randomly the test thread would be slow and time out on a state without snapshots in it.
2020-10-21 15:33:12 +02:00
markharwood b933bd9f45
Search - make term/prefix/wildcard/regex query parsing more lenient (#63926)
Remove errors when case_insensitive flag set to false

Closes #63893
2020-10-21 13:33:19 +01:00
Henning Andersen ddd897f747 Fix test timeout for health on master failover (#63455)
testHealthOnMasterFailover could timeout on some of the health requests
in the case where an index is added, since the recovery leads to
extended test run time.

Closes #62690
2020-10-21 14:31:53 +02:00
Nik Everett 8d30766a7d
Fix scripted metric BWC serialization (backport of #63821) (#63897)
We had an error when serializing fully reduced scripted metrics.
Small typo and severe lack of tests..... Anyway, this fixes the one
character typo and adds a bunch more tests.
2020-10-20 13:15:26 -04:00
Ignacio Vera d0f5066310
Upgrade to lucene-8.7.0-snapshot-72d8528c3a6 (#63912) (#63928) (#63933) 2020-10-20 15:08:06 +02:00
Tanguy Leroux b2e07076a0
Add snapshot shard size based test in DiskThresholdDeciderTests (#63913)
This commit adds a test in DiskThresholdDeciderTests that verifies
the allocation of a snapshot recovery source based shard in the
situation where the snapshot shard size was successfully provided
by the SnapshotInfoService introduced in #61906 and when the
service failed to provide the size.

Relates #61906
2020-10-20 14:59:00 +02:00
Jim Ferenczi 3423f214dd Composite aggregation must check live docs when the index is sorted (#63864)
This change ensures that the live docs are checked in the composite aggregator
when the index is sorted.
2020-10-20 11:40:28 +02:00
Armin Braun 1880bcdc09
Add REST Test for Snapshot Clone API (#63863) (#63881)
Adds snapshot clone REST tests and HLRC support for the API.
2020-10-20 09:48:03 +02:00
Nik Everett 5583db5a73
Fix broken parent and child aggregator (backport #63811) (#63892)
In #57892 I broke *some* sub-aggregations inside of the `parent` and
`child` aggregator, specifically any sub-aggregations that do work in
the `postCollect` phase. This fixes it by delaying the post collect
phase of aggs under `parent` and `child` until `beforeBuildingBuckets`
because, well, we haven't done *any* collection until after that phase.
2020-10-19 13:05:22 -04:00
Mayya Sharipova c0c1a7a9a6 Apply boost only once for distance_feature query (#63767)
Currently if distance_feature query contains boost,
it incorrectly gets applied twice: in AbstractQueryBuilder::toQuery and
we also pass this boost to Lucene's LongPoint.newDistanceFeatureQuery.
As a result we get incorrect scores.

This fixes this error to ensure that boost is applied only once.
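
The arithmetic can be sketched in a few lines of Python. This is a toy model with made-up function names; which of the two application points the actual fix removed is not stated here, so the "fixed" variant below simply assumes the inner query is built with a neutral boost:

```python
def score_with_double_boost(raw_score: float, boost: float) -> float:
    inner = raw_score * boost   # boost handed to the inner distance query
    return inner * boost        # and applied again by the wrapping builder


def score_with_single_boost(raw_score: float, boost: float) -> float:
    inner = raw_score * 1.0     # assumption: inner query gets a neutral boost
    return inner * boost        # boost applied exactly once
```

For a raw score of 0.5 and a boost of 2.0 the first variant yields 2.0 while the second yields 1.0, so the error grows with the square of the boost.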

Closes #63691
2020-10-16 10:02:55 -04:00
Ioannis Kakavas 364511395d
[7.10] Move RestRequestFilter to core (#63507)
Move RestRequestFilter to core so that Rest requests outside xpack can use 
it to filter fields and expand its usage.

Backport of #63507
2020-10-16 13:57:52 +03:00
Tanguy Leroux 7ea44d20c3
Try to fix DiskThresholdDeciderIT (#63614) (#63721)
This is another attempt to fix #62326 as my previous 
attempts failed (#63112, #63385).
2020-10-16 09:20:54 +02:00
Jay Modi 822fea9889
Fix threadpool setting test for system_write (#63706)
This commit fixes the UpdateThreadPoolSettingsTests to be aware of the
hard limit on the maximum size of the system_write executor. This
executor has a hard limit that matches the write executor, which is
the number of allocated processors.

Closes #63131
Backport #63700
2020-10-14 14:57:43 -06:00
James Rodewig ac2b668016
[DOCS] Fix AbstractDiffable typo (#59034) (#63668)
Co-authored-by: Howard <danielhuang@tencent.com>
2020-10-14 09:56:56 -04:00
Armin Braun 424b313784
Adapt Shard Generation Assertion for 7.x (#63625) (#63642)
In 7.x we can have `null` generations so we need to adjust the `assert`
accordingly.
See e.g. failure https://gradle-enterprise.elastic.co/s/dgypleytdotfu/tests/:server:internalClusterTest/org.elasticsearch.snapshots.ConcurrentSnapshotsIT/testConcurrentSnapshotWorksWithOldVersionRepo
2020-10-14 06:57:25 +02:00
Nhat Nguyen 9015b50e1b
Check docs limit before indexing on primary (#63273)
Today indexing to a shard with 2147483519 documents will fail that
shard. We should check the number of documents and reject the write
requests instead.
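
A minimal sketch of a pre-indexing check along these lines, in Python with hypothetical names. The limit is the figure quoted above, which equals 2^31 - 1 - 128; whether the real check rejects at or just above the limit is an assumption here:

```python
MAX_DOCS = 2_147_483_519  # figure quoted above: 2**31 - 1 - 128


class TooManyDocumentsError(Exception):
    """Raised to reject a write instead of failing the whole shard."""


def check_docs_limit(current_docs: int, incoming_docs: int) -> None:
    # Reject up front so the write request fails cleanly and the shard stays healthy.
    if current_docs + incoming_docs > MAX_DOCS:
        raise TooManyDocumentsError(
            f"number of documents would exceed the limit of {MAX_DOCS}"
        )
```

Calling `check_docs_limit` before the operation reaches the index turns a shard failure into an ordinary rejected request.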

Closes #51136
2020-10-13 17:39:08 -04:00
Lee Hinman 7371e51583
[7.10] Add DiscoveryNodeRole compatibility role for bwc tier serialization (#63581) (#63613)
Backports the following commits to 7.10:

    Add DiscoveryNodeRole compatibility role for bwc tier serialization (#63581)
2020-10-13 09:17:15 -06:00
Armin Braun f70391c6cc
Fix Broken Snapshot State Machine in Corner Case (#63534) (#63608)
This fixes a gap in testing and a bug that can occur in various forms:
When we would start a snapshot or clone related to a shard that was done
snapshotting/cloning but its overall operation was not yet finalized
at the time of starting the operation, we would base the operation off of
the wrong generation. This would not cause a corrupted repo, but would
cause the operation to be `PARTIAL`.
This commit fixes the state machine to take into account the correct generation
in this case.

Closes #63498
2020-10-13 16:05:34 +02:00
James Rodewig 845ccc2264
[DOCS] Fix dup word in ShardRouting hashcode method. (#63452) (#63583)
Co-authored-by: Howard <danielhuang@tencent.com>
2020-10-13 09:05:19 -04:00
Tanguy Leroux 8499924e51
InternalSnapshotsInfoService should also remove failed snapshot shard size infos (#63492) (#63592)
Relates #61906
2020-10-13 10:42:38 +02:00
Julie Tibshirani 9e52513c7b
Add support for missing value fetchers. (#63585)
This PR implements value fetching for the following field types:
* `text` phrase and prefix subfields
* `search_as_you_type`, plus its subfields
* `token_count`, which is implemented by fetching doc values

Supporting these types helps ensure that retrieving all fields through
`"fields": ["*"]` doesn't fail because of unsupported value fetchers.
2020-10-12 17:34:21 -07:00
Tim Brooks 56092b1a9f
Flush translog writer before adding new operation (#63505)
Currently we flush the Translog buffer when a new operation causes the
buffer to breach 1MB. This introduces a scenario where an exception is
thrown AFTER the writer has accepted the operation. To avoid this, this
commit flushes the Translog in an #add call before adding a new
operation.
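
The ordering change can be sketched as follows. This is a simplified Python model with invented names, not the Translog implementation: the point is only that a flush failure surfaces before the new operation has been accepted.

```python
class TranslogWriterSketch:
    """Toy buffered writer that flushes BEFORE accepting an overflowing operation."""

    def __init__(self, flush_fn, limit_bytes=1024 * 1024):
        self._flush_fn = flush_fn      # hypothetical callback that persists bytes
        self._limit = limit_bytes
        self.buffer = []
        self.buffered_bytes = 0

    def add(self, op: bytes) -> None:
        if self.buffered_bytes and self.buffered_bytes + len(op) > self._limit:
            self.flush()               # may raise; `op` has not been accepted yet
        self.buffer.append(op)
        self.buffered_bytes += len(op)

    def flush(self) -> None:
        self._flush_fn(b"".join(self.buffer))
        self.buffer.clear()
        self.buffered_bytes = 0
```

If `flush_fn` throws, the caller sees the exception while the buffer still holds only the previously accepted operations, so there is no operation that was accepted and then reported as failed.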

This fixes #63299.
2020-10-09 10:02:55 -06:00
Julie Tibshirani ae2fc4118d Add factory methods for common value fetchers. (#63438)
This PR adds factory methods for the most common implementations:
* `SourceValueFetcher.identity` to pass through the source value untouched.
* `SourceValueFetcher.toString` to simply convert the source value to a string.
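
As a language-neutral illustration of the factory-method pattern described in the list above, here is a Python sketch. The class and its `fetch` behavior are hypothetical and only mirror the two factories by name:

```python
class SourceValueFetcherSketch:
    """Toy value fetcher: reads a field from a source dict and transforms each value."""

    def __init__(self, field, transform):
        self.field = field
        self.transform = transform

    def fetch(self, source):
        value = source.get(self.field)
        values = value if isinstance(value, list) else [value]
        return [self.transform(v) for v in values if v is not None]

    @classmethod
    def identity(cls, field):
        # Pass the source value through untouched.
        return cls(field, lambda v: v)

    @classmethod
    def to_string(cls, field):
        # Convert each source value to a string.
        return cls(field, str)
```

Callers then write `SourceValueFetcherSketch.identity("n")` instead of repeating the same small subclass for every field type.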
2020-10-08 12:14:53 -07:00
Julie Tibshirani c6b915c8e6 Make TextFieldMapper.FAST_PHRASE_SUFFIX private. 2020-10-08 11:45:53 -07:00
Tanguy Leroux 943fcaf970
Simplify reroute counting in InternalSnapshotsInfoServiceTests (#63416) (#63491)
Closes #63352
2020-10-08 18:20:07 +02:00
Dan Hermann 85886e71c2
Handle error conditions when simulating ingest pipelines with verbosity enabled (#63327) (#63484) 2020-10-08 09:21:05 -05:00
Przemyslaw Gomulka d7391bc040
[7.10] Fix incorrect use of Format.equals instead of matches backport#63462 #63463
closes #63459
backports #63462
2020-10-08 15:35:13 +02:00
Christoph Büscher 517d3e4336 Mute DiskThresholdDeciderIT.testHighWatermarkNotExceeded 2020-10-08 15:14:50 +02:00
Mayya Sharipova e022b78198
Upgrade to lucene-8.7.0-snapshot-5c4168d (#63466)
This disables sort optim on _doc, which may still be unstable.
Backport for #63444
2020-10-08 08:20:43 -04:00
Christoph Büscher 564823b00f Muting parts of JavaJodaTimeDuellingTests 2020-10-08 11:50:47 +02:00
Alan Woodward c4726a2cec Don't emit separate warnings for type filters (#63391)
#63214 made TypeFieldType a constant field, and fixed things so that it always
emits deprecation warnings whenever it is referenced in a query or aggregation.
However, it also emits warnings when it is used to build a type filter through
the search context; this is unnecessary, as warnings are already emitted by
the REST layer when types are specified as part of the URL, and it is causing
failures in some BWC tests.

This commit adds a specialised typeFilter method to TypeFieldType to handle
this case without emitting any extra warnings. It also removes an unused duplicate
TypeFieldType class that resulted from a backport merge error.

Fixes #63366
2020-10-07 15:56:39 +01:00
Mayya Sharipova e236ea43e9 Upgrade to lucene-8.7.0-snapshot-e914862 (#63401)
Backport for: #63395
2020-10-07 09:45:14 -04:00
Alan Woodward 88b45dfa61
Convert TextFieldMapper to parametrized form (#63269) (#63392)
As a result of this, we can remove a chunk of code from TypeParsers as well. Tests
for search/index mode analyzers have moved into their own file. This commit also
rationalises the serialization checks for parameters into a single SerializerCheck
interface that takes the values includeDefaults, isConfigured and the value
itself.

Relates to #62988
2020-10-07 13:26:25 +01:00
Przemyslaw Gomulka 5534a60fa0
strict_date_optional_time_nanos with width 1 on nanos part (#63117) (#63387)
This formatter should allow parsing fraction of a second with minimum
width of 1. The same is allowed for strict_date_optional_time
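
The intended leniency can be pictured with a regular expression. This Python sketch models only the time-of-day and fraction part, and assumes an upper bound of nine fractional digits for nanosecond precision; it is not the actual formatter:

```python
import re

# Optional fraction of a second with a minimum width of 1 digit.
# Assumption: at most 9 digits, matching nanosecond precision.
TIME_WITH_FRACTION = re.compile(r"^\d{2}:\d{2}:\d{2}(\.\d{1,9})?$")


def accepts(text: str) -> bool:
    return TIME_WITH_FRACTION.match(text) is not None
```

Under this pattern `10:15:30.1` and `10:15:30.123456789` are both accepted, while a bare trailing dot is not.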
closes #61357
2020-10-07 14:12:04 +02:00
Armin Braun 244f1a60f9
Selectively Add ClusterState Listeners Depending on Node Roles (#63223) (#63396)
We were not consistent in checking for node roles before adding listeners.
In some cases we did check the necessity of a CS listener and in others we did not.
This commit fixes a number of cases of redundant listeners that don't apply to all node roles.
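
A small Python sketch of role-based registration. The role names and listener names are invented for illustration and do not correspond to specific Elasticsearch listeners:

```python
def listeners_to_register(node_roles, candidates):
    """Return the listeners whose required roles overlap this node's roles."""
    return [name for name, required in candidates if required & node_roles]


# Hypothetical listeners paired with the roles on which they are useful.
candidates = [
    ("snapshot_cleanup", {"master"}),
    ("shard_state_tracker", {"data"}),
    ("cluster_info_logger", {"master", "data"}),
]
data_only = listeners_to_register({"data"}, candidates)
```

A data-only node then skips the master-only listener instead of being notified of every cluster state update for no benefit.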
2020-10-07 14:11:43 +02:00
Tanguy Leroux eac99dd594
SnapshotShardSizeInfo should prefer default value when provided (#63390) (#63394)
In #61906 we agreed on always providing the default value 
ShardRouting.UNAVAILABLE_EXPECTED_SHARD_SIZE 
when the SnapshotInfoService failed to retrieve the exact
size for a given snapshot shard. The motivation was to 
allow the shard allocation to move forward in case of 
failures (so that the unassigned shard does not get stuck 
in an unassigned state for too long) while relying on the 
fallback values for shard sizes.

Sadly a bug in the 
SnapshotShardSizeInfo#getShardSize(ShardRouting, long) 
causes the default value to be ignored when the snapshot
shard size retrieval previously failed, returning 
ShardRouting.UNAVAILABLE_EXPECTED_SHARD_SIZE 
instead of the provided default value. With DiskThresholdDecider 
also not relying on the provided default value this triggers 
some assertion like in #63376 which helped us to spot the bug.
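
The bug pattern can be sketched in Python. The function names, the dictionary-based storage, and the sentinel value of -1 are assumptions made for this illustration:

```python
UNAVAILABLE_EXPECTED_SHARD_SIZE = -1  # assumed sentinel value for this sketch


def get_shard_size_buggy(known_sizes, shard, default_value):
    size = known_sizes.get(shard)
    if size is None:
        return default_value
    # Bug: a failed retrieval is stored as the sentinel and returned as-is,
    # so the caller's default is ignored.
    return size


def get_shard_size_fixed(known_sizes, shard, default_value):
    size = known_sizes.get(shard)
    if size is None or size == UNAVAILABLE_EXPECTED_SHARD_SIZE:
        return default_value
    return size


sizes = {"shard-0": 4096, "shard-1": UNAVAILABLE_EXPECTED_SHARD_SIZE}
```

With `sizes` as above, asking for `shard-1` with a default of 1024 returns the sentinel in the buggy variant and 1024 in the fixed one.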

Closes #63376
2020-10-07 13:53:05 +02:00
Tanguy Leroux 581490d83c
Fix DiskThresholdDeciderIT.testHighWatermarkNotExceeded (#63112) (#63385)
The first refreshDiskUsage() refreshes the ClusterInfo update which in turn 
calls listeners like DiskThreshMonitor. This one triggers a reroute as 
expected and sets an internal checkInProgress flag before submitting
a cluster state update to relocate shards (the internal flag is toggled 
again once the cluster state update is processed).

In the test I suspect that the second refreshDiskUsage() may complete 
before DiskThreshMonitor's internal flag is set back to its initial state, 
resulting in the second ClusterInfo update being ignored and a message
like "[node_t0] skipping monitor as a check is already in progress" being
logged. Adding another wait for languid events to be processed
before executing the second refreshDiskUsage() should help here.
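
The suspected race can be modeled with a tiny Python class. This is a single-threaded toy with hypothetical method names; it only shows how an update arriving while the flag is set gets dropped:

```python
class DiskMonitorSketch:
    """Toy monitor that ignores updates while a previous check is in progress."""

    def __init__(self):
        self.check_in_progress = False
        self.log = []

    def on_new_info(self, info) -> bool:
        if self.check_in_progress:
            self.log.append("skipping monitor as a check is already in progress")
            return False
        self.check_in_progress = True   # cleared once the state update is processed
        return True

    def on_cluster_state_processed(self) -> None:
        self.check_in_progress = False


monitor = DiskMonitorSketch()
first = monitor.on_new_info("refresh-1")
second = monitor.on_new_info("refresh-2")   # arrives too early and is skipped
monitor.on_cluster_state_processed()        # what the extra wait allows to happen
third = monitor.on_new_info("refresh-3")
```

Waiting for the pending cluster state update before the second refresh corresponds to calling `on_cluster_state_processed` before `on_new_info` in this model.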

Closes #62326
2020-10-07 11:27:25 +02:00
Przemyslaw Gomulka eadd69e1e4
Deprecate week_year in favour of weekyear date format backport(63307) (#63308)
week_year is misleading as the formatter only has a weekyear field,
corresponding to 'Y'. 'weekyear' should be used instead.
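
For readers unfamiliar with the distinction, a week-based year can differ from the calendar year near year boundaries. The Python sketch below uses ISO-8601 week rules as an assumption; the Java 'Y' pattern letter may follow locale-specific week definitions instead:

```python
from datetime import date

d = date(2019, 12, 30)
calendar_year = d.year                 # the usual year of the date
week_based_year = d.isocalendar()[0]   # year of the ISO week containing the date
```

Here the calendar year is 2019, but the date falls in the first ISO week of 2020, so the week-based year is 2020.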

relates #60707
backports https://github.com/elastic/elasticsearch/pull/63307
2020-10-07 09:16:27 +02:00