Commit Graph

650 Commits

Author SHA1 Message Date
Tim Brooks dc9e364ff2
Count coordinating and primary bytes as write bytes (#58984)
This is a follow-up to #57573. This commit combines coordinating and
primary bytes under the same "write" bucket. Double accounting is
prevented by only accounting the bytes at either the reroute phase or
the primary phase. TransportBulkAction calls execute directly, so the
operations handler is skipped and the bytes are not double accounted.
2020-07-02 19:48:19 -06:00
Tim Brooks 1ef2cd7f1a
Add memory tracking to queued write operations (#58957)
Currently we do not track the memory consuming by in-process write
operations.

This commit adds a mechanism to track write operation memory usage.
2020-07-02 14:14:57 -06:00
Nhat Nguyen f63cbad629
Ensure CCR partial reads never overuse buffer (#58620)
When the documents are large, a follower can receive a partial response 
because the requesting range of operations is capped by
max_read_request_size instead of max_read_request_operation_count. In
this case, the follower will continue reading the subsequent ranges
without checking the remaining size of the buffer. The buffer then can
use more memory than max_write_buffer_size and even causes OOM.

Backport of #58620
2020-07-01 13:23:28 -04:00
Ryan Ernst c23613e05a
Split license allowed checks into two types (#58704) (#58797)
The checks on the license state have a singular method, isAllowed, that
returns whether the given feature is allowed by the current license.
However, there are two classes of usages, one which intends to actually
use a feature, and another that intends to return in telemetry whether
the feature is allowed. When feature usage tracking is added, the latter
case should not count as a "usage", so this commit reworks the calls to
isAllowed into 2 methods, checkFeature, which will (eventually) both
check whether a feature is allowed, and keep track of the last usage
time, and isAllowed, which simply determines whether the feature is
allowed.

Note that I considered having a boolean flag on the current method, but
wanted the additional clarity that a different method name provides,
versus a boolean flag which is more easily copied without realizing what
the flag means since it is nameless in call sites.
2020-07-01 07:11:05 -07:00
Yannick Welsch 15c85b29fd
Account for recovery throttling when restoring snapshot (#58658) (#58811)
Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account
(i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository
setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a
per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to
configure throttling in a single place.

The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to
`40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`). This means that no behavioral change
will be observed by clusters where the recovery and restore settings were not adapted.

Relates https://github.com/elastic/elasticsearch/issues/57023

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-07-01 12:19:29 +02:00
David Turner 3a234d2669
Account for remaining recovery in disk allocator (#58800)
Today the disk-based shard allocator accounts for incoming shards by
subtracting the estimated size of the incoming shard from the free space on the
node. This is an overly conservative estimate if the incoming shard has almost
finished its recovery since in that case it is already consuming most of the
disk space it needs.

This change adds to the shard stats a measure of how much larger each store is
expected to grow, computed from the ongoing recovery, and uses this to account
for the disk usage of incoming shards more accurately.

Backport of #58029 to 7.x

* Picky picky

* Missing type
2020-07-01 10:12:44 +01:00
Rene Groeschke d952b101e6
Replace compile configuration usage with api (7.x backport) (#58721)
* Replace compile configuration usage with api (#58451)

- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
  - required as java library will by default not have build jar file
  - jar file is now explicit input of the task and gradle will ensure its properly build

* Fix compile usages in 7.x branch
2020-06-30 15:57:41 +02:00
Jason Tedor 52ad5842a9
Introduce node.roles setting (#58512)
Today we have individual settings for configuring node roles such as
node.data and node.master. Additionally, roles are pluggable and we have
used this to introduce roles such as node.ml and node.voting_only. As
the number of roles is growing, managing these becomes harder for the
user. For example, to create a master-only node, today a user has to
configure:
 - node.data: false
 - node.ingest: false
 - node.remote_cluster_client: false
 - node.ml: false

at a minimum if they are relying on defaults, but also add:
 - node.master: true
 - node.transform: false
 - node.voting_only: false

If they want to be explicit. This is also challenging in cases where a
user wants to have configure a coordinating-only node which requires
disabling all roles, a list which we are adding to, requiring the user
to keep checking whether a node has acquired any of these roles.

This commit addresses this by adding a list setting node.roles for which
a user has explicit control over the list of roles that a node has. If
the setting is configured, the node has exactly the roles in the list,
and not any additional roles. This means to configure a master-only
node, the setting is merely 'node.roles: [master]', and to configure a
coordinating-only node, the setting is merely: 'node.roles: []'.

With this change we deprecate the existing 'node.*' settings such as
'node.data'.
2020-06-25 14:14:51 -04:00
Przemko Robakowski a44dad9fbb
[7.x] Add support for snapshot and restore to data streams (#57675) (#58371)
* Add support for snapshot and restore to data streams (#57675)

This change adds support for including data streams in snapshots.
Names are provided in indices field (the same way as in other APIs), wildcards are supported.
If rename pattern is specified it renames both data streams and backing indices.
It also adds test to make sure SLM works correctly.

Closes #57127

Relates to #53100

* version fix

* compilation fix

* compilation fix

* remove unused changes

* compilation fix

* test fix
2020-06-19 22:41:51 +02:00
Stuart Tettemer 20abba8433
Scripting: Deprecate general cache settings (#55753) (#58283)
Backport: ef543b0
2020-06-18 11:54:23 -06:00
Jason Tedor be08268562
Allow follower indices to override leader settings (#58103)
Today when creating a follower index via the put follow API, or via an
auto-follow pattern, it is not possible to specify settings overrides
for the follower index. Instead, we copy all of the leader index
settings to the follower. Yet, there are cases where a user would want
some different settings on the follower index such as the number of
replicas, or allocation settings. This commit addresses this by allowing
the user to specify settings overrides when creating follower index via
manual put follower calls, or via auto-follow patterns. Note that not
all settings can be overrode (e.g., index.number_of_shards) so we also
have detection that prevents attempting to override settings that must
be equal between the leader and follow index. Note that we do not even
allow specifying such settings in the overrides, even if they are
specified to be equal between the leader and the follower
index. Instead, the must be implicitly copied from the leader index, not
explicitly set by the user.
2020-06-18 11:56:06 -04:00
Rene Groeschke abc72c1a27
Unify dependency licenses task configuration (#58116) (#58274)
- Remove duplicate dependency configuration
- Use task avoidance api accross the build
- Remove redundant licensesCheck config
2020-06-18 08:15:50 +02:00
Stuart Tettemer 01795d1925
Revert "Scripting: Deprecate general cache settings (#55753)" (#58201)
This reverts commit 88e8b34fc2.
2020-06-16 14:58:18 -06:00
Stuart Tettemer 88e8b34fc2
Scripting: Deprecate general cache settings (#55753)
Backport: ef543b0
2020-06-16 13:06:59 -06:00
Yannick Welsch 1e235a7f55 Fix off-by-one on CCR lease (#58158)
The leases issued by CCR keep one extra operation around on the leader shards. This is not
harmful to the leader cluster, but means that there's potentially one delete that can't be
cleaned up.
2020-06-16 14:04:58 +02:00
Rene Groeschke 01e9126588
Remove deprecated usage of testCompile configuration (#57921) (#58083)
* Remove usage of deprecated testCompile configuration
* Replace testCompile usage by testImplementation
* Make testImplementation non transitive by default (as we did for testCompile)
* Update CONTRIBUTING about using testImplementation for test dependencies
* Fail on testCompile configuration usage
2020-06-14 22:30:44 +02:00
Armin Braun 9fa60f7367
Add History UUID Index Setting (#56930) (#57104)
Pre-requesite for #50278 to be able to uniquely identify index metadata by
its version fields and UUIDs when restoring into closed indices.
2020-05-25 11:26:03 +02:00
Ioannis Kakavas 6c90727166
Fix custom policy in plugins in FIPS 140 (#52046) (#57049)
Our FIPS 140 testing depends on setting the appropriate java policy
in order to configure the JVM in FIPS mode. Some tests (
discovery-ec2 and ccr qa ) also needed to set a custom policy file
to grant a specific permission, which overwrote the FIPS related
policy and tests would fail. This change ensures that when a
custom policy needs to be set in these tests, the permissions that
are necessary for FIPS are also set.

Resolves: #51685, #52034
2020-05-21 19:26:56 +03:00
Alan Woodward 18bfbeda29 Move merge compatibility logic from MappedFieldType to FieldMapper (#56915)
Merging logic is currently split between FieldMapper, with its merge() method, and
MappedFieldType, which checks for merging compatibility. The compatibility checks
are called from a third class, MappingMergeValidator. This makes it difficult to reason
about what is or is not compatible in updates, and even what is in fact updateable - we
have a number of tests that check compatibility on changes in mapping configuration
that are not in fact possible.

This commit refactors the compatibility logic so that it all sits on FieldMapper, and
makes it called at merge time. It adds a new FieldMapperTestCase base class that
FieldMapper tests can extend, and moves the compatibility testing machinery from
FieldTypeTestCase to here.

Relates to #56814
2020-05-20 09:43:13 +01:00
Yannick Welsch f296c08021 Increase timeout for assertLongBusy in AutoFollowIT (#56910)
Closes #56891
2020-05-18 16:20:46 +02:00
Ignacio Vera b4521d5183
upgrade to Lucene 8.6.0 snapshot (#56661) 2020-05-13 14:25:16 +02:00
Ryan Ernst 902fc546bd
Migrate remaining ESIntegTestCases to internalClusterTest (#56479) (#56563)
This commit migrates the ESIntegTestCase tests in x-pack to the
internalClusterTest source set.
2020-05-11 21:06:04 -07:00
Lee Hinman 1337b35572
Remove prefer_v2_templates query string parameter (#56545)
This commit removes the `prefer_v2_templates` flag and setting. This was a brief setting that
allowed specifying whether V1 or V2 template should be used when an index is created. It has been
removed in favor of V2 templates always having priority.

Relates to #53101
Resolves #56528

This is not a breaking change because this flag was never in a released version.
2020-05-11 14:56:42 -06:00
Martijn van Groningen 9ae09570d8
Allow a number of broadcast transport actions to resolve data streams (#55726) (#56502)
Change TransportBroadcastByNodeAction and TransportBroadcastReplicationAction
to be able to resolve data streams by default. Implementations can change this ability.

This change allows to following APIs to resolve data streams: flush,
refresh (already supported data streams), force merge, clear indices cache,
indices stats (already supported data streams), segments, upgrade stats, 
upgrade, validate query, searchable snapshots stats, clear searchable snapshots cache and
reload analyzers APIs.

Relates to #53100
2020-05-11 12:48:35 +02:00
Armin Braun e01b999ef0
Add Functionality to Consistently Read RepositoryData For CS Updates (#55773) (#56091)
Using optimistic locking, add the ability to run a repository state
update task with a consistent view of the current repository data.
Allows for a follow-up to remove the snapshot INIT state.
2020-05-04 08:13:14 +02:00
Armin Braun 3a64ecb6bf
Allow Deleting Multiple Snapshots at Once (#55474) (#56083)
* Allow Deleting Multiple Snapshots at Once (#55474)

Adds deleting multiple snapshots in one go without significantly changing the mechanics of snapshot deletes otherwise.
This change does not yet allow mixing snapshot delete and abort. Abort is still only allowed for a single snapshot delete by exact name.
2020-05-03 20:30:58 +02:00
William Brafford d53c941c41
Make xpack.monitoring.enabled setting a no-op (#55617) (#56061)
* Make xpack.monitoring.enabled setting a no-op

This commit turns xpack.monitoring.enabled into a no-op. Mostly, this involved
removing the setting from the setup for integration tests. Monitoring may
introduce some complexity for test setup and teardown, so we should keep an eye
out for turbulence and failures

* Docs for making deprecated setting a no-op
2020-05-01 16:42:11 -04:00
Ryan Ernst 52b9d8d15e
Convert remaining license methods to isAllowed (#55908) (#55991)
This commit converts the remaining isXXXAllowed methods to instead of
use isAllowed with a Feature value. There are a couple other methods
that are static, as well as some licensed features that check the
license directly, but those will be dealt with in other followups.
2020-04-30 15:52:22 -07:00
Jason Tedor 22a8b60187
Reduce code duplication in CCR non-compliance tests
This commit removes some code duplication in the CCR non-compliance
tests by refactoring an assertion method so that it can be used in both
tests that are present there.
2020-04-24 13:24:56 -04:00
Armin Braun db7eb8e8ff
Remove Redundant CS Update on Snapshot Finalization (#55276) (#55528)
This change folds the removal of the in-progress snapshot entry
into setting the safe repository generation. Outside of removing
an unnecessary cluster state update, this also has the advantage
of removing a somewhat inconsistent cluster state where the safe
repository generation points at `RepositoryData` that contains a
finished snapshot while it is still in-progress in the cluster
state, making it easier to reason about the state machine of
upcoming concurrent snapshot operations.
2020-04-21 15:33:17 +02:00
Nhat Nguyen 3cc4e0dd09 Retry follow task when remote connection queue full (#55314)
If more than 100 shard-follow tasks are trying to connect to the remote 
cluster, then some of them will abort with "connect listener queue is 
full". This is because we retry on ESRejectedExecutionException, but not
on RejectedExecutionException.
2020-04-20 22:43:05 -04:00
Lee Hinman 9eddd2bcc9
[7.x] Add prefer_v2_templates flag and index setting (#55411) (#55476)
This commit adds a new querystring parameter on the following APIs:
- Index
- Update
- Bulk
- Create Index
- Rollover

These APIs now support a `?prefer_v2_templates=true|false` flag. This flag changes the preference
creation to use either V2 index templates or V1 templates. This flag defaults to `false` and will be
changed to `true` for 8.0+ in subsequent work.

Additionally, setting this flag internally sets the `index.prefer_v2_templates` index-level setting.
This setting is used so that actions that automatically create a new index (things like rollover
initiated by ILM) will inherit the preference from the original index. This setting is dynamic so
that a transition from v1 to v2 templates can occur for long-running indices grouped by an alias
performing periodic rollover.

This also adds support for sending this parameter to the High Level Rest Client.

Relates to #53101
2020-04-20 12:05:42 -06:00
Armin Braun a0763d958d
Make RepositoryData Less Memory Heavy (#55293) (#55468)
We don't really need `LinkedHashSet` here. We can assume that all the
entries are unique and just use a list and use the list utilities to
create the cheapest possible version of the list.
Also, this fixes a bug in `addSnapshot` which would mutate the existing
linked hash set on the current instance (fortunately this never caused a real world bug)
and brings the collection in line with the java docs on its getter that claim immutability.
2020-04-20 18:28:06 +02:00
William Brafford 49e30b15a2
Deprecate disabling basic-license features (#54816) (#55405)
We believe there's no longer a need to be able to disable basic-license
features completely using the "xpack.*.enabled" settings. If users don't
want to use those features, they simply don't need to use them. Having
such features always available lets us build more complex features that
assume basic-license features are present.

This commit deprecates settings of the form "xpack.*.enabled" for
basic-license features, excluding "security", which is a special case.
It also removes deprecated settings from integration tests and unit
tests where they're not directly relevant; e.g. monitoring and ILM are
no longer disabled in many integration tests.
2020-04-17 15:04:17 -04:00
David Turner 7941f4a47e Add RepositoriesService to createComponents() args (#54814)
Today we pass the `RepositoriesService` to the searchable snapshots plugin
during the initialization of the `RepositoryModule`, forcing the plugin to be a
`RepositoryPlugin` even though it does not implement any repositories.

After discussion we decided it best for now to pass this in via
`Plugin#createComponents` instead, pending some future work in which plugins
can depend on services more dynamically.
2020-04-16 16:27:36 +01:00
Henning Andersen b3eb57a094 CCR: Test follow on top of closed index (#54956)
Added testing of following on top of a closed index.
This could for instance be the old leader index in
cases where leader and follower clusters have been
swapped.
2020-04-15 20:13:32 +02:00
Armin Braun 2f91e2aab7
Fix Race in Snapshot Abort (#54873) (#55233)
We can be a little more efficient when aborting a snapshot. Since we know the new repository
data after finalizing the aborted snapshot when can pass it down to the snapshot completion listeners.
This way, we don't have to fork off to the snapshot threadpool to get the repository data when the listener completes and can directly submit the delete task with high priority straight from the cluster state thread.
2020-04-15 15:42:15 +02:00
Ryan Ernst ae14d1661e
Replace license check isAuthAllowed with isSecurityEnabled (#54547) (#55082)
The isAuthAllowed() method for license checking is used by code that
wants to ensure security is both enabled and available. The enabled
state is dynamic and provided by isSecurityEnabled(). But since security
is available with all license types, an check on the license level is
not necessary. Thus, this change replaces isAuthAllowed() with calling
isSecurityEnabled().
2020-04-13 12:26:39 -07:00
Nhat Nguyen c9f8fb2dd0 Clear recent errors when auto-follow successfully (#54997)
Today, we do not clear the recent errors in AutoFollowCoordinator when 
we successfully auto-follow indices. This can lead to confusion for the
operators.
2020-04-09 14:35:16 -04:00
Nhat Nguyen 73d24203e7 Handle no such remote cluster exception in ccr (#53415)
A remote client can throw a NoSuchRemoteClusterException while fetching
the cluster state from the leader cluster. We also need to handle that
exception when retrying to add a retention lease to the leader shard.

Closes #53225
2020-04-04 13:55:06 -04:00
Jason Tedor 5fcda57b37
Rename MetaData to Metadata in all of the places (#54519)
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
2020-03-31 17:24:38 -04:00
Yannick Welsch 8126ad0ab1 Increase timeout on testUpdateAnalysisLeaderIndexSettings
Closes #54204
2020-03-27 13:41:47 +01:00
Jason Tedor c547fabb2b
Put CCR tasks on (data && remote cluster clients) (#54146)
Today we assign CCR persistent tasks to nodes with the data role. It
could be that the data node is not capable of connecting to remote
clusters, in which case the task will fail since it can not connect to
the remote cluster with the leader shard. Instead, we need to assign
such tasks to nodes that are capable of connecting to remote
clusters. This commit addresses this by enabling such persistent tasks
to only be assigned to nodes that have the data role, and also have the
remote cluster client role.
2020-03-26 23:50:16 -04:00
Gordon Brown 0d30b48613
Disallow negative TimeValues (#53913)
This commit causes negative TimeValues, other than -1 which is sometimes used as
a sentinel value, to be rejected during parsing.

Also introduces a hack to allow ILM to load policies which were written to the
cluster state with a negative min_age, treating those values as 0, which should
match the behavior of prior versions.
2020-03-26 13:30:35 -06:00
Armin Braun 5b9864db2c
Better Incrementality for Snapshots of Unchanged Shards (#52182) (#53984)
Use sequence numbers and force merge UUID to determine whether a shard has changed or not instead before falling back to comparing files to get incremental snapshots on primary fail-over.
2020-03-23 16:43:41 +01:00
Jake Landis db3420d757
[7.x] Optimize which Rest resources are used by the Rest tests… (#53766)
This should help with Gradle's incremental compile such that projects
only depend upon the resources they use.

related #52114
2020-03-19 12:28:59 -05:00
Stuart Tettemer cdbee32f55
Scripting: Per-context script cache, default off (#52855) (#53756)
* Adds per context settings:
  `script.context.${CONTEXT}.cache_max_size` ~
  `script.cache.max_size`

  `script.context.${CONTEXT}.cache_expire` ~
  `script.cache.expire`

  `script.context.${CONTEXT}.max_compilations_rate` ~
  `script.max_compilations_rate`

* Context cache is used if:
  `script.max_compilations_rate=use-context`.  This
  value is dynamically updatable, so users can
  switch back to the general cache if desired.

* Settings for context caches take the first value
  that applies:
  1) Context specific settings if set, eg
     `script.context.ingest.cache_max_size`
  2) Correlated general setting is set to the non-default
     value, eg `script.cache.max_size`
  3) Context default

The reason for 2's inclusion is to allow an easy
transition for users who've customized their general
cache settings.

Using the general cache settings for the context caches
results in higher effective settings, since they are
multiplied across the number of contexts.  So a general
cache max size of 200 will become 200 * # of contexts.
However, this behavior it will avoid users snapping to a
value that is too low for them.

Backport of: #52855
Refs: #50152
2020-03-18 14:44:04 -06:00
Jay Modi af36665b08
Deprecate the logstash enabled setting (#53487)
The setting, `xpack.logstash.enabled`, exists to enable or disable the
logstash extensions found within x-pack. In practice, this setting had
no effect on the functionality of the extension. Given this, the
setting is now deprecated in preparation for removal.

Backport of #53367
2020-03-12 10:18:39 -06:00
Nhat Nguyen cad02d4a31 Increase timeout testFollowIndexMaxOperationSizeInBytes (#53014)
Replicating 1000 documents one by one (as we cap the request size at 
1 byte) can take more than 10 seconds on a slow CI.

Closes #52812
2020-03-10 21:51:33 -04:00
Jason Tedor 8ad0080a59
Fork CCR checkpoint listeners on CCR thread pool (#53265)
This commit moves the global checkpoint listeners used in CCR to the CCR
thread pool. This removes the last use of the listener thread pool in
the codebase.
2020-03-09 08:56:30 -04:00