OpenSearch

mirror of https://github.com/honeymoose/OpenSearch.git synced 2025-03-09 14:34:43 +00:00

Author	SHA1	Message	Date
Nhat Nguyen	87c889f9c9	CCR should retry on CircuitBreakingException (#62013 ) CCR shard follow task can hit CircuitBreakingException on the leader cluster (read changes requests) or the follower cluster (bulk requests). CCR should retry on CircuitBreakingException as it's a transient error.	2020-09-10 17:23:47 -04:00
Nhat Nguyen	e37ce561c7	Set timeout of auto put-follow request to unbounded (#61679 ) If the master node of the follower cluster is busy, then the auto-follower will fail to initialize the following process. This also occurs when an auto-follow pattern matches multiple indices. We should set the timeout of put-follow requests issued by the auto-follower to unbounded to avoid this problem. Closes #56891	2020-08-31 09:58:19 -04:00
Lee Hinman	1bfebd54ea	[7.x] Allocate newly created indices on data_hot tier nodes (#61342 ) (#61650 ) This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by default when they are created. This does not break existing behavior, as nodes with the `data` role are considered to be part of the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`, `data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by default. This change is a little more complicated than changing the default value for `index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to have a plugin inject a setting into the builder for a newly created index. This has the benefit of allowing this setting to be visible as part of the settings when retrieving the index, for example: ``` // Create an index PUT /eggplant // Get an index GET /eggplant?flat_settings ``` Returns the default settings now of: ```json { "eggplant" : { "aliases" : { }, "mappings" : { }, "settings" : { "index.creation_date" : "1597855465598", "index.number_of_replicas" : "1", "index.number_of_shards" : "1", "index.provided_name" : "eggplant", "index.routing.allocation.include._tier" : "data_hot", "index.uuid" : "6ySG78s9RWGystRipoBFCA", "index.version.created" : "8000099" } } } ``` After the initial setting of this setting, it can be treated like any other index level setting. This new setting is not set on a new index if any of the following is true: - The index is created with an `index.routing.allocation.include.<anything>` setting - The index is created with an `index.routing.allocation.exclude.<anything>` setting - The index is created with an `index.routing.allocation.require.<anything>` setting - The index is created with a null `index.routing.allocation.include._tier` value - The index was created from an existing source metadata (shrink, clone, split, etc) Relates to #60848	2020-08-27 13:41:12 -06:00
Armin Braun	f22ddf822e	Some Optimizations around BytesArray (#61183 ) (#61511 ) * Faster `equals` for `BytesArray` which is nice since with this change we use it for the search cache * Lighter `StreamInput` for `BytesArray` that should save memory and some indirection relative to the one on the abstract bytes reference * Lighter `writeTo` implementation * Build a `BytesArray` instead of a PagedBytesReference whenever possible to save indirection and memory	2020-08-25 07:13:39 +02:00
Nhat Nguyen	328c86a4ec	Increase timeout in PrimaryFollowerAllocationIT A slow CI can take more than 10 seconds to relocate shards on the follower.	2020-08-13 14:41:32 -04:00
Nhat Nguyen	ceaa28e97b	Increase timeout in testFollowIndexWithConcurrentMappingChanges (#60534 ) The test failed because the leader was taking a lot of CPUs to process many mapping updates. This commit reduces the mapping updates, increases timeout, and adds more debug info. Closes #59832	2020-08-11 17:03:22 -04:00
Nhat Nguyen	bf7eecf1dc	Fix synchronization in ShardFollowNodeTask (#60490 ) The leader mapping, settings, and aliases versions in a shard follow-task are updated without proper synchronization and can go backward.	2020-08-11 14:52:52 -04:00
Nhat Nguyen	9d4a64e749	Allow CCR on nodes with legacy roles only (#60093 ) CCR will stop functioning if the master node is on 7.8, but data nodes are before that version because the master node considers that all data nodes do not have the remote cluster client role. This commit allows CCR work on data nodes with legacy roles only. Relates #54146 Relates #59375	2020-07-29 10:57:31 -04:00
Nhat Nguyen	416e51980c	Relax ShardFollowTasksExecutor validation (#60054 ) If a primary shard of a follower index is being relocated, then we will fail to create a follow-task. This validation is too restricted. We should ensure that all primaries of the follower index are active instead. Closes #59625	2020-07-28 13:46:49 -04:00
Nhat Nguyen	6ece629ec3	Set timeout of master requests on follower to unbounded (#60070 ) Today, a follow task will fail if the master node of the follower cluster is temporarily overloaded and unable to process master node requests (such as update mapping, setting, or alias) from a follow-task within the default timeout. This error is transient, and follow-tasks should not abort. We can avoid this problem by setting the timeout of master node requests on the follower cluster to unbounded. Closes #56891	2020-07-28 13:46:49 -04:00
Nhat Nguyen	bc65b3a590	Increase timeout in AutoFollowIT (#60004 ) It can take more than 10 seconds to auto-follow and create a follow-task on a slow CI. This commit increases timeout in AutoFollowIT by replacing assertBusy with assertLongBusy. Closes #59952	2020-07-23 16:36:53 -04:00
Nhat Nguyen	0fe4d5df67	Increase timeout testFollowIndexWithConcurrentMappingChanges Fixes #59273	2020-07-23 16:22:58 -04:00
Yannick Welsch	07784a0b16	CCR recoveries using wrong setting for chunk sizes (#59597 ) The default chunk size for CCR file-based recoveries was wrongly set to 40MB instead of 1MB.	2020-07-21 13:56:06 +02:00
Nhat Nguyen	b599f7a9c0	Fix estimate size of translog operations (#59206 ) Make sure that the estimateSize method includes all fields of translog operations.	2020-07-16 00:19:30 -04:00
Armin Braun	2dd086445c	Enable Fully Concurrent Snapshot Operations (#56911 ) (#59578 ) Enables fully concurrent snapshot operations: * Snapshot create- and delete operations can be started in any order * Delete operations wait for snapshot finalization to finish, are batched as much as possible to improve efficiency and once enqueued in the cluster state prevent new snapshots from starting on data nodes until executed * We could be even more concurrent here in a follow-up by interleaving deletes and snapshots on a per-shard level. I decided not to do this for now since it seemed not worth the added complexity yet. Due to batching+deduplicating of deletes the pain of having a delete stuck behind a long -running snapshot seemed manageable (dropped client connections + resulting retries don't cause issues due to deduplication of delete jobs, batching of deletes allows enqueuing more and more deletes even if a snapshot blocks for a long time that will all be executed in essentially constant time (due to bulk snapshot deletion, deleting multiple snapshots is mostly about as fast as deleting a single one)) * Snapshot creation is completely concurrent across shards, but per shard snapshots are linearized for each repository as are snapshot finalizations See updated JavaDoc and added test cases for more details and illustration on the functionality. Some notes: The queuing of snapshot finalizations and deletes and the related locking/synchronization is a little awkward in this version but can be much simplified with some refactoring. The problem is that snapshot finalizations resolve their listeners on the `SNAPSHOT` pool while deletes resolve the listener on the master update thread. With some refactoring both of these could be moved to the master update thread, effectively removing the need for any synchronization around the `SnapshotService` state. I didn't do this refactoring here because it's a fairly large change and not necessary for the functionality but plan to do so in a follow-up. This change allows for completely removing any trickery around synchronizing deletes and snapshots from SLM and 100% does away with SLM errors from collisions between deletes and snapshots. Snapshotting a single index in parallel to a long running full backup will execute without having to wait for the long running backup as required by the ILM/SLM use case of moving indices to "snapshot tier". Finalizations are linearized but ordered according to which snapshot saw all of its shards complete first	2020-07-15 03:42:31 +02:00
Armin Braun	e1014038e9	Simplify Repository.finalizeSnapshot Signature (#58834 ) (#59574 ) Many of the parameters we pass into this method were only used to build the `SnapshotInfo` instance to write. This change simplifies the signature. Also, it seems less error prone to build `SnapshotInfo` in `SnapshotsService` isntead of relying on the fact that each repository implementation will build the correct `SnapshotInfo`.	2020-07-15 00:14:28 +02:00
Armin Braun	d456f7870a	Deduplicate Index Metadata in BlobStore (#50278 ) (#59514 ) This PR introduces two new fields in to `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and uuids. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices` so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot. This saves one write per index in the common case of unchanged metadata thus saving cost and making snapshot finalization drastically faster if many indices are being snapshotted at the same time. The implementation is mostly analogous to that for shard generations in #46250 and piggy backs on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`). Relates to #45736 as it improves the efficiency of snapshotting unchanged indices Relates to #49800 as it has the potential of loading the index metadata for multiple snapshots of the same index concurrently much more efficient speeding up future concurrent snapshot delete	2020-07-14 22:18:42 +02:00
Tim Brooks	408a07f96a	Separate coordinating and primary bytes in stats (#59487 ) Currently we combine coordinating and primary bytes into a single bucket for indexing pressure stats. This makes sense for rejection logic. However, for metrics it would be useful to separate them.	2020-07-14 12:37:06 -06:00
Nhat Nguyen	4d7c59bedb	Assign follower primary to nodes with remote cluster client role (#59375 ) The primary shards of follower indices during the bootstrap need to be on nodes with the remote cluster client role as those nodes reach out to the corresponding leader shards on the remote cluster to copy Lucene segment files and renew the retention leases. This commit introduces a new allocation decider that ensures bootstrapping follower primaries are allocated to nodes with the remote cluster client role. Co-authored-by: Jason Tedor <jason@tedor.me>	2020-07-14 11:23:55 -04:00
Tim Brooks	623df95a32	Adding indexing pressure stats to node stats API (#59467 ) We have recently added internal metrics to monitor the amount of indexing occurring on a node. These metrics introduce back pressure to indexing when memory utilization is too high. This commit exposes these stats through the node stats API.	2020-07-13 17:23:42 -06:00
Daniel Mitterdorfer	daa48329ec	[TEST] Mute FollowerFailOverIT.testFailOverOnFollower (#58659 ) (#59286 ) Relates #58534 Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co>	2020-07-09 12:38:36 +02:00
Nhat Nguyen	ef5c397c0f	Sending operations concurrently in peer recovery (#58018 ) Today, we send operations in phase2 of peer recoveries batch by batch sequentially. Normally that's okay as we should have a fairly small of operations in phase 2 due to the file-based threshold. However, if phase1 takes a lot of time and we are actively indexing, then phase2 can have a lot of operations to replay. With this change, we will send multiple batches concurrently (defaults to 1) to reduce the recovery time. Backport of #58018	2020-07-07 22:03:31 -04:00
Dan Hermann	c1781bc7e7	[7.x] Add include_data_streams flag for authorization (#59008 )	2020-07-03 12:58:39 -05:00
Tim Brooks	dc9e364ff2	Count coordinating and primary bytes as write bytes (#58984 ) This is a follow-up to #57573. This commit combines coordinating and primary bytes under the same "write" bucket. Double accounting is prevented by only accounting the bytes at either the reroute phase or the primary phase. TransportBulkAction calls execute directly, so the operations handler is skipped and the bytes are not double accounted.	2020-07-02 19:48:19 -06:00
Tim Brooks	1ef2cd7f1a	Add memory tracking to queued write operations (#58957 ) Currently we do not track the memory consuming by in-process write operations. This commit adds a mechanism to track write operation memory usage.	2020-07-02 14:14:57 -06:00
Nhat Nguyen	f63cbad629	Ensure CCR partial reads never overuse buffer (#58620 ) When the documents are large, a follower can receive a partial response because the requesting range of operations is capped by max_read_request_size instead of max_read_request_operation_count. In this case, the follower will continue reading the subsequent ranges without checking the remaining size of the buffer. The buffer then can use more memory than max_write_buffer_size and even causes OOM. Backport of #58620	2020-07-01 13:23:28 -04:00
Ryan Ernst	c23613e05a	Split license allowed checks into two types (#58704 ) (#58797 ) The checks on the license state have a singular method, isAllowed, that returns whether the given feature is allowed by the current license. However, there are two classes of usages, one which intends to actually use a feature, and another that intends to return in telemetry whether the feature is allowed. When feature usage tracking is added, the latter case should not count as a "usage", so this commit reworks the calls to isAllowed into 2 methods, checkFeature, which will (eventually) both check whether a feature is allowed, and keep track of the last usage time, and isAllowed, which simply determines whether the feature is allowed. Note that I considered having a boolean flag on the current method, but wanted the additional clarity that a different method name provides, versus a boolean flag which is more easily copied without realizing what the flag means since it is nameless in call sites.	2020-07-01 07:11:05 -07:00
Yannick Welsch	15c85b29fd	Account for recovery throttling when restoring snapshot (#58658 ) (#58811 ) Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account (i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to configure throttling in a single place. The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to `40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`). This means that no behavioral change will be observed by clusters where the recovery and restore settings were not adapted. Relates https://github.com/elastic/elasticsearch/issues/57023 Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-07-01 12:19:29 +02:00
David Turner	3a234d2669	Account for remaining recovery in disk allocator (#58800 ) Today the disk-based shard allocator accounts for incoming shards by subtracting the estimated size of the incoming shard from the free space on the node. This is an overly conservative estimate if the incoming shard has almost finished its recovery since in that case it is already consuming most of the disk space it needs. This change adds to the shard stats a measure of how much larger each store is expected to grow, computed from the ongoing recovery, and uses this to account for the disk usage of incoming shards more accurately. Backport of #58029 to 7.x * Picky picky * Missing type	2020-07-01 10:12:44 +01:00
Jason Tedor	52ad5842a9	Introduce node.roles setting (#58512 ) Today we have individual settings for configuring node roles such as node.data and node.master. Additionally, roles are pluggable and we have used this to introduce roles such as node.ml and node.voting_only. As the number of roles is growing, managing these becomes harder for the user. For example, to create a master-only node, today a user has to configure: - node.data: false - node.ingest: false - node.remote_cluster_client: false - node.ml: false at a minimum if they are relying on defaults, but also add: - node.master: true - node.transform: false - node.voting_only: false If they want to be explicit. This is also challenging in cases where a user wants to have configure a coordinating-only node which requires disabling all roles, a list which we are adding to, requiring the user to keep checking whether a node has acquired any of these roles. This commit addresses this by adding a list setting node.roles for which a user has explicit control over the list of roles that a node has. If the setting is configured, the node has exactly the roles in the list, and not any additional roles. This means to configure a master-only node, the setting is merely 'node.roles: [master]', and to configure a coordinating-only node, the setting is merely: 'node.roles: []'. With this change we deprecate the existing 'node.*' settings such as 'node.data'.	2020-06-25 14:14:51 -04:00
Przemko Robakowski	a44dad9fbb	[7.x] Add support for snapshot and restore to data streams (#57675 ) (#58371 ) * Add support for snapshot and restore to data streams (#57675) This change adds support for including data streams in snapshots. Names are provided in indices field (the same way as in other APIs), wildcards are supported. If rename pattern is specified it renames both data streams and backing indices. It also adds test to make sure SLM works correctly. Closes #57127 Relates to #53100 * version fix * compilation fix * compilation fix * remove unused changes * compilation fix * test fix	2020-06-19 22:41:51 +02:00
Stuart Tettemer	20abba8433	Scripting: Deprecate general cache settings (#55753 ) (#58283 ) Backport: ef543b0	2020-06-18 11:54:23 -06:00
Jason Tedor	be08268562	Allow follower indices to override leader settings (#58103 ) Today when creating a follower index via the put follow API, or via an auto-follow pattern, it is not possible to specify settings overrides for the follower index. Instead, we copy all of the leader index settings to the follower. Yet, there are cases where a user would want some different settings on the follower index such as the number of replicas, or allocation settings. This commit addresses this by allowing the user to specify settings overrides when creating follower index via manual put follower calls, or via auto-follow patterns. Note that not all settings can be overrode (e.g., index.number_of_shards) so we also have detection that prevents attempting to override settings that must be equal between the leader and follow index. Note that we do not even allow specifying such settings in the overrides, even if they are specified to be equal between the leader and the follower index. Instead, the must be implicitly copied from the leader index, not explicitly set by the user.	2020-06-18 11:56:06 -04:00
Stuart Tettemer	01795d1925	Revert "Scripting: Deprecate general cache settings (#55753 )" (#58201 ) This reverts commit 88e8b34fc2d672060a82979cb782b8cf491a3985.	2020-06-16 14:58:18 -06:00
Stuart Tettemer	88e8b34fc2	Scripting: Deprecate general cache settings (#55753 ) Backport: ef543b0	2020-06-16 13:06:59 -06:00
Yannick Welsch	1e235a7f55	Fix off-by-one on CCR lease (#58158 ) The leases issued by CCR keep one extra operation around on the leader shards. This is not harmful to the leader cluster, but means that there's potentially one delete that can't be cleaned up.	2020-06-16 14:04:58 +02:00
Armin Braun	9fa60f7367	Add History UUID Index Setting (#56930 ) (#57104 ) Pre-requesite for #50278 to be able to uniquely identify index metadata by its version fields and UUIDs when restoring into closed indices.	2020-05-25 11:26:03 +02:00
Alan Woodward	18bfbeda29	Move merge compatibility logic from MappedFieldType to FieldMapper (#56915 ) Merging logic is currently split between FieldMapper, with its merge() method, and MappedFieldType, which checks for merging compatibility. The compatibility checks are called from a third class, MappingMergeValidator. This makes it difficult to reason about what is or is not compatible in updates, and even what is in fact updateable - we have a number of tests that check compatibility on changes in mapping configuration that are not in fact possible. This commit refactors the compatibility logic so that it all sits on FieldMapper, and makes it called at merge time. It adds a new FieldMapperTestCase base class that FieldMapper tests can extend, and moves the compatibility testing machinery from FieldTypeTestCase to here. Relates to #56814	2020-05-20 09:43:13 +01:00
Yannick Welsch	f296c08021	Increase timeout for assertLongBusy in AutoFollowIT (#56910 ) Closes #56891	2020-05-18 16:20:46 +02:00
Ignacio Vera	b4521d5183	upgrade to Lucene 8.6.0 snapshot (#56661 )	2020-05-13 14:25:16 +02:00
Ryan Ernst	902fc546bd	Migrate remaining ESIntegTestCases to internalClusterTest (#56479 ) (#56563 ) This commit migrates the ESIntegTestCase tests in x-pack to the internalClusterTest source set.	2020-05-11 21:06:04 -07:00
Lee Hinman	1337b35572	Remove prefer_v2_templates query string parameter (#56545 ) This commit removes the `prefer_v2_templates` flag and setting. This was a brief setting that allowed specifying whether V1 or V2 template should be used when an index is created. It has been removed in favor of V2 templates always having priority. Relates to #53101 Resolves #56528 This is not a breaking change because this flag was never in a released version.	2020-05-11 14:56:42 -06:00
Martijn van Groningen	9ae09570d8	Allow a number of broadcast transport actions to resolve data streams (#55726 ) (#56502 ) Change TransportBroadcastByNodeAction and TransportBroadcastReplicationAction to be able to resolve data streams by default. Implementations can change this ability. This change allows to following APIs to resolve data streams: flush, refresh (already supported data streams), force merge, clear indices cache, indices stats (already supported data streams), segments, upgrade stats, upgrade, validate query, searchable snapshots stats, clear searchable snapshots cache and reload analyzers APIs. Relates to #53100	2020-05-11 12:48:35 +02:00
Armin Braun	e01b999ef0	Add Functionality to Consistently Read RepositoryData For CS Updates (#55773 ) (#56091 ) Using optimistic locking, add the ability to run a repository state update task with a consistent view of the current repository data. Allows for a follow-up to remove the snapshot INIT state.	2020-05-04 08:13:14 +02:00
Armin Braun	3a64ecb6bf	Allow Deleting Multiple Snapshots at Once (#55474 ) (#56083 ) * Allow Deleting Multiple Snapshots at Once (#55474) Adds deleting multiple snapshots in one go without significantly changing the mechanics of snapshot deletes otherwise. This change does not yet allow mixing snapshot delete and abort. Abort is still only allowed for a single snapshot delete by exact name.	2020-05-03 20:30:58 +02:00
William Brafford	d53c941c41	Make xpack.monitoring.enabled setting a no-op (#55617 ) (#56061 ) * Make xpack.monitoring.enabled setting a no-op This commit turns xpack.monitoring.enabled into a no-op. Mostly, this involved removing the setting from the setup for integration tests. Monitoring may introduce some complexity for test setup and teardown, so we should keep an eye out for turbulence and failures * Docs for making deprecated setting a no-op	2020-05-01 16:42:11 -04:00
Ryan Ernst	52b9d8d15e	Convert remaining license methods to isAllowed (#55908 ) (#55991 ) This commit converts the remaining isXXXAllowed methods to instead of use isAllowed with a Feature value. There are a couple other methods that are static, as well as some licensed features that check the license directly, but those will be dealt with in other followups.	2020-04-30 15:52:22 -07:00
Armin Braun	db7eb8e8ff	Remove Redundant CS Update on Snapshot Finalization (#55276 ) (#55528 ) This change folds the removal of the in-progress snapshot entry into setting the safe repository generation. Outside of removing an unnecessary cluster state update, this also has the advantage of removing a somewhat inconsistent cluster state where the safe repository generation points at `RepositoryData` that contains a finished snapshot while it is still in-progress in the cluster state, making it easier to reason about the state machine of upcoming concurrent snapshot operations.	2020-04-21 15:33:17 +02:00
Nhat Nguyen	3cc4e0dd09	Retry follow task when remote connection queue full (#55314 ) If more than 100 shard-follow tasks are trying to connect to the remote cluster, then some of them will abort with "connect listener queue is full". This is because we retry on ESRejectedExecutionException, but not on RejectedExecutionException.	2020-04-20 22:43:05 -04:00
Lee Hinman	9eddd2bcc9	[7.x] Add prefer_v2_templates flag and index setting (#55411 ) (#55476 ) This commit adds a new querystring parameter on the following APIs: - Index - Update - Bulk - Create Index - Rollover These APIs now support a `?prefer_v2_templates=true\|false` flag. This flag changes the preference creation to use either V2 index templates or V1 templates. This flag defaults to `false` and will be changed to `true` for 8.0+ in subsequent work. Additionally, setting this flag internally sets the `index.prefer_v2_templates` index-level setting. This setting is used so that actions that automatically create a new index (things like rollover initiated by ILM) will inherit the preference from the original index. This setting is dynamic so that a transition from v1 to v2 templates can occur for long-running indices grouped by an alias performing periodic rollover. This also adds support for sending this parameter to the High Level Rest Client. Relates to #53101	2020-04-20 12:05:42 -06:00

1 2 3 4 5 ...

605 Commits