54675 Commits

Author SHA1 Message Date
Tanguy Leroux 16fae5d66d
Also reroute after shard snapshot size fetch failure (#66008)
In #61906 we added the possibility for the master node to fetch
the size of a shard snapshot before allocating the shard to a
data node with enough disk space to host it. When merging
this change we agreed that any failure during size fetching
should not prevent the shard from being allocated.

Sadly it does not work as expected: the service only triggers
reroutes when fetching the size succeeds but never when it
fails. It means that a shard might stay unassigned until
another cluster state update triggers a new allocation
(as in #64372). More sadly, the test I wrote was wrong as
it explicitly triggered a reroute.

This commit changes the InternalSnapshotsInfoService
so that it also triggers a reroute when fetching the snapshot
shard size failed, ensuring that the allocation can move
forward by using an UNAVAILABLE_EXPECTED_SHARD_SIZE
shard size. This unknown shard size is kept around in the
snapshot info service until no corresponding unassigned
shards need the information.

Backport of #65436
2020-12-08 12:10:37 +01:00
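
The reroute-on-failure behaviour described in #66008 can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual InternalSnapshotsInfoService code: the class name `SnapshotSizeTracker`, its methods, and the sentinel value `-1` are hypothetical; only the name UNAVAILABLE_EXPECTED_SHARD_SIZE comes from the commit message.

```python
from typing import Callable, Dict, Set

# Hypothetical sentinel value; the commit message only gives the constant's name.
UNAVAILABLE_EXPECTED_SHARD_SIZE = -1


class SnapshotSizeTracker:
    """Toy model: records shard snapshot sizes and requests a reroute afterwards."""

    def __init__(self, fetch_size: Callable[[str], int], reroute: Callable[[str], None]):
        self._fetch_size = fetch_size
        self._reroute = reroute
        self.known_sizes: Dict[str, int] = {}

    def fetch(self, shard_id: str) -> None:
        try:
            self.known_sizes[shard_id] = self._fetch_size(shard_id)
        except Exception:
            # On failure, remember an "unknown" size so allocation can still proceed.
            self.known_sizes[shard_id] = UNAVAILABLE_EXPECTED_SHARD_SIZE
        # Request a reroute on both outcomes, not only on success.
        self._reroute(shard_id)

    def prune(self, unassigned: Set[str]) -> None:
        # Drop entries once no unassigned shard needs them any more.
        self.known_sizes = {k: v for k, v in self.known_sizes.items() if k in unassigned}
```

With a fetcher that always raises, `fetch("s0")` still calls the reroute callback and stores the sentinel for `"s0"`.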
István Zoltán Szabó 063db03d17
[7.10] [DOCS] Adds Working with transforms at scale to docs (#65726) (#65966) 2020-12-08 07:52:32 +01:00
Przemko Robakowski eaab5c65e0
Allow more legit cases in Metadata.Builder.validateDataStreams (#65791) (#65938)
This change simplifies logic and allows more legit cases in Metadata.Builder.validateDataStreams.
It will only show a conflict on names of the form .ds-<data stream name>-<[0-9]+> and will allow any names like .ds-<data stream name>-something-else-<[0-9]+>.
This fixes a problem with rollover when you have 2 data streams with names like a and a-b: currently, if a-b has a generation greater than a, you won't be able to roll over a anymore.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-12-07 19:54:46 +01:00
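
The naming rule from #65791 can be sketched with a regular expression. This is an illustration only: the helper `is_backing_index_name` is hypothetical and the real Java validation may differ in detail; the pattern is taken from the form quoted in the commit message.

```python
import re


def is_backing_index_name(index_name: str, data_stream: str) -> bool:
    """Return True only for names of the form .ds-<data stream name>-<digits>."""
    pattern = r"\.ds-" + re.escape(data_stream) + r"-[0-9]+"
    # fullmatch ensures nothing may sit between the stream name and the digits.
    return re.fullmatch(pattern, index_name) is not None
```

Under this rule, `.ds-a-b-000002` counts as a backing index of `a-b` but no longer conflicts with data stream `a`.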
James Rodewig 53b839fada
[DOCS] Fix wording for HTTP settings (#65964) (#65968) 2020-12-07 12:33:22 -05:00
David Turner 29a50357e1 Expand docs on disk-based shard allocation (#65668)
Today we document the settings used to control rebalancing and
disk-based shard allocation but there isn't really any discussion around
what these processes do so it's hard to know what, if any, adjustments
to make.

This commit adds some words to help folk understand this area better.
2020-12-07 14:56:55 +00:00
James Rodewig 4676cb30d0
[DOCS] Make data stream names consistent (#65920) (#65944) 2020-12-07 09:14:03 -05:00
James Rodewig 24b48230cc
[DOCS] EQL: Add diagrams for sequence matching (#65898) (#65940) 2020-12-07 08:39:41 -05:00
Przemysław Witek d562caf9b2
Fix compile errors in QuerierTests (#65935) 2020-12-07 13:27:36 +01:00
Bogdan Pintea 2ec53ea7c4 Abort sorting in case of local agg sort queue overflow (#65687)
In case the local agg sorter queue gets full and no limit has been provided,
the local sorter will currently erroneously call the failure callback for every
single row in the original rowset that's left over the local queue limit
(instead of just the first one).  The failure response is dispatched in any
case, so this is relatively harmless.  The sorter continues iterating on the
original response fetching subsequent pages. In case of correct Elasticsearch
behaviour, this is also harmless, it'll just trigger a number of internal
exceptions. However, in case of a pagination defect in Elasticsearch (like
GH#65685, where the same search_after is returned), this will result in an
effective spin loop, potentially rendering the node unresponsive eventually.

This PR simply breaks out of both the inner loop iterating over the current unsorted
rowset and the outer one iterating over the remaining pages.

It also fixes an outdated documentation limitation.

(cherry picked from commit 638402c387faf79bba38fcc95f371a73146efc0b)
2020-12-07 11:32:41 +01:00
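
The control flow described in #65687 (stop at the first overflow, then stop fetching pages) can be sketched as follows. The function `collect_rows` and its parameters are hypothetical and only model the two nested loops; this is not the actual SQL sorter code.

```python
from typing import Iterable, List, Tuple


def collect_rows(pages: Iterable[List[int]], queue_limit: int) -> Tuple[List[int], int]:
    """Fill a bounded queue; on overflow, signal failure once and stop all iteration."""
    queue: List[int] = []
    failures = 0
    overflowed = False
    for page in pages:              # outer loop: one iteration per fetched page
        for row in page:            # inner loop: rows of the current page
            if len(queue) >= queue_limit:
                failures += 1       # failure is signalled once, not once per row
                overflowed = True
                break               # leave the inner loop
            queue.append(row)
        if overflowed:
            break                   # leave the outer loop, so no more pages are fetched
    return queue, failures
```

Because the outer loop also terminates, a lazily evaluated page source is not asked for further pages after the overflow.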
James Rodewig 1ed5a5633f
[DOCS] Fix typo (#65912) (#65914)
Co-authored-by: Toast <mrtoastcheng@gmail.com>
2020-12-05 10:21:02 -05:00
James Rodewig 8b8154594d
[DOCS] Correct the default value of `wait_for_completion` query param (#65800) (#65903)
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>

Co-authored-by: bellengao <gbl_long@163.com>
2020-12-04 17:14:50 -05:00
lcawl 5320083bed [DOCS] Add coming tag to release notes 2020-12-04 10:59:10 -08:00
James Rodewig 793eb48502
[DOCS] EQL: Document how sequence queries handle matches (#65794) (#65887)
Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com>
2020-12-04 09:57:08 -05:00
Francisco Fernández Castaño a5e65beab2
Add release notes for 7.10.1 (#65439)
* Add release notes for 7.10.1

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2020-12-04 12:18:21 +01:00
Jay Modi 55e79dd286
Mute IndicesClientIT.testDataStreams (#65859)
This commit mutes IndicesClientIT.testDataStreams as this test is
failing in CI intermittently.

Relates #60746
Relates #60461
2020-12-03 14:36:35 -07:00
Seth Michael Larson 5a89835025
Mark Task APIs as experimental in rest-api-spec 2020-12-03 15:11:21 -06:00
Nhat Nguyen 26d67c1662 Ensure notify when proxy connections disconnect (#65697)
TransportService doesn't respond to the pending requests of proxy
connections when the underlying connections get disconnected because
proxy connections do not override the getCacheKey method. Some CCS
requests would never be completed because of this bug.
2020-12-03 14:53:17 -05:00
Mike Barretta 4d6a501cec
Update inference-bucket-aggregation.asciidoc (#65833)
Minor tweak to line up a code example and add a missing word
2020-12-03 11:49:24 -05:00
James Rodewig 873f4995c5
[DOCS] Fix typo in histogram agg docs (#65822) (#65828) 2020-12-03 10:53:19 -05:00
leonseng 57dca5f44a
Add missing comma in sample payload for the watcher's pagerduty action 2020-12-03 08:29:23 -06:00
Christoph Büscher dc2444c631 Fix lucene_version_path in docs (#65809)
The `lucene_version_path` variable is used in urls pointing to the Lucene docs.
They use underscore separators for the major and minor versions there.
Correcting this since this has been lost in the latest update on the 7.x branch.
2020-12-03 14:51:55 +01:00
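
As a small illustration of the separator convention mentioned in #65809, the path form can be derived from a dotted version string. The helper name and the sample version are assumptions made for this sketch; the docs build defines the variable directly rather than computing it this way.

```python
def lucene_version_path(lucene_version: str) -> str:
    """Convert a dotted version such as '8.7.0' to the underscore form used in URLs."""
    return lucene_version.replace(".", "_")
```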
Mark Vieira 49d062efb0 Always use "elasticsearch" as Gradle project name (#65709) 2020-12-02 09:29:36 -08:00
Bhavya Gupta dcff07f717
Updated painless-walkthrough documentation (#65530)
Update documentation about regex note.
2020-12-02 09:23:12 -08:00
James Rodewig ff368304a5
[DOCS] Add cluster get settings API example (#65754) (#65758) 2020-12-02 10:56:51 -05:00
James Rodewig d56d2a399b
[DOCS] Remove inert component template file (#65749) (#65752) 2020-12-02 09:59:00 -05:00
James Rodewig 6ae0370dd7
[DOCS] Fix EQL syntax formatting (#65711) (#65744)
Co-authored-by: Howard <danielhuang@tencent.com>
2020-12-02 09:40:52 -05:00
Thiago Souza 5b422dd19d
[DOCS] Remove erroneous `flat_settings` query param (#65670) 2020-12-02 09:15:15 -05:00
István Zoltán Szabó df16045753
[DOCS] Changes wording of pivot parameter in PUT transforms API docs. (#65731) (#65737) 2020-12-02 14:57:25 +01:00
James Rodewig b8291764da
[DOCS] Update reference documentation that mentions CMS (#50542) (#65734)
Relates to https://github.com/elastic/elasticsearch/issues/46973

Co-authored-by: Evgenia Badyanova <evgenia.badiyanova@elastic.co>
2020-12-02 08:11:03 -05:00
Jim Ferenczi 1c34507e66 Create async search index if necessary on updates and deletes (#64606)
This change ensures that we create the async search index with the right mappings and settings when updating or deleting a document. Users can delete the async search index at any time so we have to re-create it internally if necessary before applying any new operation.
2020-12-02 09:04:28 +01:00
James Rodewig bd159d8c17
[DOCS] EQL: Flatten EQL syntax headings (#65693) (#65696) 2020-12-01 13:43:46 -05:00
James Rodewig eca0401d56
[DOCS] EQL: Remove outdated wildcard ref (#65684) (#65691) 2020-12-01 11:56:12 -05:00
Armin Braun 745f527fea
Deduplicate Index Meta Generations when Deserializing (#65619) (#65666)
These strings are quite long individually and will be repeated
potentially up to the number of snapshots in the repository times.
Since these make up more than half of the size of the repository metadata
and are likely the same for all snapshots the savings from deduplicating them
can make up for more than half the size of `RepositoryData` easily in most real-world
cases.
2020-12-01 12:34:35 +01:00
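
The deduplication idea in #65619 can be sketched generically: while reading values, keep one canonical instance per distinct string and reuse it. The function below is a hypothetical illustration, not the `RepositoryData` parsing code.

```python
from typing import Dict, List


def deduplicate(values: List[str]) -> List[str]:
    """Replace equal strings with a single shared instance."""
    canonical: Dict[str, str] = {}
    # setdefault returns the first instance seen for each distinct value.
    return [canonical.setdefault(v, v) for v in values]
```

After this pass, equal entries refer to the same object, so memory use grows with the number of distinct values rather than the number of snapshots.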
Armin Braun f8f08ba3a7
Fix NPE in ClusterInfoService (#65654) (#65659)
Store stats can be `null` if e.g. the shard was already closed
when the stats were retrieved. Don't record those shards in the
sizes map to fix an NPE in this case.
2020-12-01 10:33:36 +01:00
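
The guard described in #65654 amounts to skipping shards whose store stats are missing. A minimal sketch, assuming a hypothetical dictionary layout with a `size_in_bytes` key (the real code works on Java stats objects):

```python
from typing import Dict, Optional


def build_shard_sizes(shard_stats: Dict[str, Optional[dict]]) -> Dict[str, int]:
    """Build the sizes map, leaving out shards that reported no store stats."""
    sizes: Dict[str, int] = {}
    for shard_id, store_stats in shard_stats.items():
        if store_stats is None:
            # e.g. the shard was already closed when the stats were retrieved
            continue
        sizes[shard_id] = store_stats["size_in_bytes"]
    return sizes
```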
Armin Braun 16642f1c74
Handle RejectedExecutionException in ShardFollowTasksExecutor (#65648) (#65653)
Follow-up to #65415. We can't have this exception bubble up in an exception
handler any longer due to the new assertion so we must handle it here.
2020-12-01 06:51:05 +01:00
Armin Braun 6bbeedc932
Reset Deflater/Inflater after Use in DeflateCompressor (#65617) (#65646)
We should reset after use, not before reuse. Otherwise we keep the input buffers
on these objects around for a long time and they can grow to O(MB).
2020-12-01 02:44:36 +01:00
Przemko Robakowski bb0fcb150b Fix TranslogTests.testTotalTests when n=0 (#65632)
When n=0 in TranslogTests.testTotalTests we never update earliestLastModifiedAge so it fails comparison with default value of total.getEarliestLastModifiedAge() which is 0.
In this change we always check this special case and then select n>0

Closes #65629
2020-11-30 18:35:55 -05:00
Howard 0137c1679b Fix the earliest last modified age of translog stats (#64753)
Currently translog's `earliest_last_modified_age` field is always 0 in `_nodes/stats` response.
2020-11-30 17:34:55 -05:00
James Rodewig 154c579e9b
[DOCS] Correct restore snapshot API request example (#65525) (#65628)
Co-authored-by: bellengao <gbl_long@163.com>
2020-11-30 14:21:36 -05:00
Dody Suria Wijaya d14dcfc473
[DOCS] Fix type exists API request example (#65574) 2020-11-30 13:45:46 -05:00
Henning Andersen 02ed90b54a Searchable snapshot terminology (#65549)
We chose to use searchable snapshot index over snapshot-backed index, so
changed terminology towards this in a couple places.
2020-11-30 17:17:04 +01:00
Henning Andersen 9564a8b1e0 Cold tier time-range should not be specified (#65546)
Whether the cold tier can handle years depends a lot on the use case and
for instance our BWC guarantees. This would need to be part of a
specific sizing exercise, so in the spirit of not over-promising, the
description of the cold tier has been changed to not mention years.
2020-11-30 17:10:05 +01:00
David Turner aa8ebeb918 Clarify snapshot incrementality (#65587)
Today we describe snapshots as "incremental" but their incrementality is
a rather different beast from e.g. incremental filesystem backups. With
traditional backups you take a large and relatively infrequent "full"
backup and then a sequence of smaller "incremental" ones, and this whole
sequence of backups is required for a restore so it must be kept around
until at least the next full backup. In contrast, Elasticsearch
snapshots are logically independent and each can be deleted without
affecting the integrity of the others.

This distinction frequently causes confusion amongst newer users, so
this commit clarifies what we mean by "incremental" in the docs.
2020-11-30 14:58:26 +00:00
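
The distinction drawn in #65587 can be illustrated with a toy repository in which each snapshot references the files it needs and a file is stored only once. Everything here (`ToyRepository`, its methods, the file names) is hypothetical and greatly simplified compared with a real snapshot repository.

```python
from typing import Dict, Set


class ToyRepository:
    """Toy model: snapshots share stored files but are logically independent."""

    def __init__(self) -> None:
        self.blobs: Dict[str, str] = {}           # file name -> content
        self.snapshots: Dict[str, Set[str]] = {}  # snapshot name -> referenced files

    def create_snapshot(self, name: str, files: Dict[str, str]) -> int:
        uploaded = 0
        for file_name, content in files.items():
            if file_name not in self.blobs:       # copy only files not yet stored
                self.blobs[file_name] = content
                uploaded += 1
        self.snapshots[name] = set(files)
        return uploaded

    def delete_snapshot(self, name: str) -> None:
        del self.snapshots[name]
        still_referenced = set().union(*self.snapshots.values())
        for file_name in list(self.blobs):
            if file_name not in still_referenced:  # remove only unreferenced files
                del self.blobs[file_name]

    def restore(self, name: str) -> Dict[str, str]:
        return {f: self.blobs[f] for f in self.snapshots[name]}
```

In this model the second snapshot uploads only new files, yet deleting the first snapshot leaves the second fully restorable, unlike a chain of incremental backups.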
James Rodewig a122f10742
[DOCS] Add `require_alias` query param to reindex API (#65608) (#65610) 2020-11-30 09:48:29 -05:00
James Rodewig c13d6082fe
[DOCS] Add missing "with" in remote reindex doc (#65532) (#65603)
Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co>
2020-11-30 09:34:41 -05:00
Alan Woodward fb84b6710d
Restore use of default search and search_quote analyzers (#65491) (#65562)
In the refactoring of TextFieldMapper, we lost the ability to define
a default search or search_quote analyzer in index settings. This
commit restores that ability, and adds some more comprehensive
testing.

Fixes #65434
2020-11-26 18:34:59 +00:00
Ioannis Kakavas f6921af885 Revert "Gracefully handle exceptions from Security Providers (#65464) (#65554)"
This reverts commit 12ba9e3e16. This
commit was mechanically backported to 7.10 while it shouldn't have
been.
2020-11-26 17:11:34 +02:00
Ioannis Kakavas 12ba9e3e16
Gracefully handle exceptions from Security Providers (#65464) (#65554)
In certain situations, such as when configured in FIPS 140 mode,
the Java security provider in use might throw a subclass of
java.lang.Error. We currently do not catch these and as a result
the JVM exits, shutting down elasticsearch.

This commit attempts to address this by catching subclasses of Error
that might be thrown for instance when a PBKDF2 implementation
is used from a Security Provider in FIPS 140 mode, with the password
input being less than 14 bytes (112 bits).

- In our PBKDF2 family of hashers, we catch the Error and
throw an ElasticsearchException while creating or verifying the
hash. We throw on verification instead of simply returning false
on purpose so that the message bubbles up and the cause becomes
obvious (otherwise it would be indistinguishable from a wrong
password).
- In KeyStoreWrapper, we catch the Error in order to wrap and re-throw 
a GeneralSecurityException with a helpful message. This can happen when 
using any of the keystore CLI commands, when the node starts or when we 
attempt to reload secure settings.
- In the `elasticsearch-users` tool, we catch the ElasticsearchException that
the Hasher class re-throws and throw an appropriate UserException.

Tests are missing because it's not trivial to set CI in fips approved mode
right now, and thus any tests would need to be muted. There is a parallel
effort in #64024 to enable that and tests will be added in a followup.
2020-11-26 17:04:34 +02:00
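
The first bullet of #65464 (catch the provider's Error and rethrow something descriptive instead of returning false) can be sketched by analogy. All names below are hypothetical: `ProviderFailure` stands in for a java.lang.Error subclass, `HashingError` for ElasticsearchException, and `strict_pbkdf2` simulates a provider that rejects passwords shorter than 14 bytes.

```python
import hashlib
import hmac


class ProviderFailure(BaseException):
    """Stand-in for an Error thrown by a security provider (not a normal Exception)."""


class HashingError(Exception):
    """Stand-in for the exception surfaced to callers with a helpful message."""


def strict_pbkdf2(password: bytes, salt: bytes) -> bytes:
    # Simulated strict provider: refuses passwords under 14 bytes (112 bits).
    if len(password) < 14:
        raise ProviderFailure("password must be at least 112 bits")
    return hashlib.pbkdf2_hmac("sha256", password, salt, 1000)


def verify(password: bytes, salt: bytes, expected: bytes) -> bool:
    try:
        return hmac.compare_digest(strict_pbkdf2(password, salt), expected)
    except ProviderFailure as e:
        # Raise rather than return False, so the cause is not mistaken
        # for a wrong password.
        raise HashingError("cannot verify hash: " + str(e)) from e
```

A wrong but sufficiently long password yields `False`, while a too-short password raises `HashingError` carrying the provider's message.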
Henning Andersen 9f35b3d402 Clarify searchable snapshot cost trade-offs (#65384)
Clarify that searchable snapshots only result in cost savings for less
frequently accessed data and that the savings do not apply to the entire
cluster.
2020-11-26 13:45:47 +01:00
Ioannis Kakavas b4b4483e24
Do not interpret SecurityException in KeystoreAwareCommand (#65366) (#65486)
KeyStoreAwareCommand attempted to deduce whether an error occurred
because of a wrong password by checking the cause of the
SecurityException that KeyStoreWrapper.decrypt() throws. Checking
for AEADBadTagException was wrong because that exception could be
(and usually is) wrapped in an IOException. Furthermore, since we
are doing the check already in KeyStoreWrapper, we can just return
the message of the SecurityException to the user directly, as we do
in other places.
2020-11-26 13:12:18 +02:00