OpenSearch

Commit Graph

Author	SHA1	Message	Date
David Turner	5df74cc888	Replace Math.toIntExact with toIntBytes (#61604 ) We convert longs to ints using `Math.toIntExact` in places where we're sure there will be no overflow, but this doesn't explain the intent of these conversions very well. This commit introduces a dedicated method for these conversions, and adds an assertion that we never overflow.	2020-08-27 08:28:54 +01:00
David Turner	e14d9c9514	Introduce cache index for searchable snapshots (#61595 ) If a searchable snapshot shard fails (e.g. its node leaves the cluster) we want to be able to start it up again on a different node as quickly as possible to avoid unnecessarily blocking or failing searches. It isn't feasible to fully restore such shards in an acceptably short time. In particular we would like to be able to deal with the `can_match` phase of a search ASAP so that we can skip unnecessary waiting on shards that may still be warming up but which are not required for the search. This commit solves this problem by introducing a system index that holds much of the data required to start a shard. Today(*) this means it holds the contents of every file with size <8kB, and the first 4kB of every other file in the shard. This system index acts as a second-level cache, behind the first-level node-local disk cache but in front of the blob store itself. Reading chunks from the index is slower than reading them directly from disk, but faster than reading them from the blob store, and is also replicated and accessible to all nodes in the cluster. (*) the exact heuristics for what we should put into the system index are still under investigation and may change in future. This second-level cache is populated when we attempt to read a chunk which is missing from both levels of cache and must therefore be read from the blob store. We also introduce `SearchableSnapshotsBlobStoreCacheIntegTests` which verify that we do not hit the blob store more than necessary when starting up a shard that we've seen before, whether due to a node restart or because a snapshot was mounted multiple times. Backport of #60522 Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>	2020-08-27 06:38:32 +01:00
Francisco Fernández Castaño	89a7f32100	Fix SearchableSnapshotDirectoryTests#testRecoveryStateIsKeptOpenAfterPreWarmFailure (#61343 ) The test didn't take into account the case where 0 documents are indexed into the shard, meaning that files aren't loaded during the pre-warm phase. The test injects FileSystem failures, if the snapshot doesn't contain any files, pre-warm doesn't read any files and the recovery completes normally. Closes #61295 Backport of #61317	2020-08-19 19:28:47 +02:00
David Turner	dd7410d8c2	Disable rebalancing in searchable snapshots tests (#61068 ) Fixes a test failure in which we allocated some shards and then relocated them elsewhere, invalidating an assertion about the recovery statistics which assumed that the shards stayed where they were originally allocated. Closes #61067.	2020-08-13 09:08:27 +01:00
Francisco Fernández Castaño	d544528c7b	Increase information on assertRecoveryStats assertion (#60960 ) Backport of #60952	2020-08-11 15:30:59 +02:00
Francisco Fernández Castaño	2a4fd8329b	Avoid a race condition while waiting for pre warm to finish on SearchableSnapshotDirectoryTests (#60906 ) Backport of #60885. Closes #60813	2020-08-10 17:29:16 +02:00
Francisco Fernández Castaño	b4044004aa	Add recovery state tracking for Searchable Snapshots (#60751 ) This pull request adds recovery state tracking for Searchable Snapshots. In order to track recoveries for searchable snapshot backed indices, this pull request adds a new type of RecoveryState. This new RecoveryState instance is able to deal with the small differences that arise during Searchable snapshots recoveries. Those differences can be summarized as follows: - The Directory implementation that's provided by SearchableSnapshots marks the snapshot files as reused during recovery. In order to keep track of the recovery process as the cache is pre-warmed, those files shouldn't be marked as reused. - Once the shard is created, the cache starts its pre-warming phase, meaning that we should keep track of those downloads during that process and tie the recovery to this pre-warming phase. The shard is considered recovered once this pre-warming phase has finished. Backport of #60505	2020-08-05 17:41:49 +02:00
Yannick Welsch	9f6f66f156	Fail searchable snapshot shards on invalid license (#60722 ) Implements license degradation behavior for searchable snapshots. Snapshot-backed shards are failed when the license becomes invalid, and shards won't be reallocated. After a valid license is put in place again, shards are allocated again.	2020-08-05 13:14:15 +02:00
Francisco Fernández Castaño	b500b3d55a	Decrease restore rate limit value to enforce its usage on SearchableSnapshotsIntegTests#testMaxRestoreBytesPerSecIsUsed (#60650 ) Fixes #59287. Backport of #59592	2020-08-04 17:44:47 +02:00
Rene Groeschke	bdd7347bbf	Merge test runner task into RestIntegTest (7.x backport) (#60600 ) * Merge test runner task into RestIntegTest (#60261) * Merge test runner task into RestIntegTest * Reorganizing Standalone runner and RestIntegTest task * Rework general test task configuration and extension * Fix merge issues * use former 7.x common test configuration	2020-08-04 14:46:32 +02:00
Jake Landis	bcb9d06bb6	[7.x] Cleanup xpack build.gradle (#60554 ) (#60603 ) This commit does three things: * Removes all Copyright/license headers for the build.gradle files under x-pack. (implicit Apache license) * Removes evaluationDependsOn(xpackModule('core')) from build.gradle files under x-pack * Removes a place holder test in favor of disabling the test task (in the async plugin)	2020-08-03 13:11:43 -05:00
Yannick Welsch	9e24a54382	Clean existing index folder when loading searchable snapshot (#60122 ) Closing a regular index and mounting a snapshot-backed index into that existing index does not clean the existing index folders of those preexisting shards. This PR removes the existing Lucene / translog files once the searchable snapshot shard is starting up. Future PRs will make reuse of the existing index files to populate the cache.	2020-08-03 13:19:11 +02:00
Rene Groeschke	ed4b70190b	Replace immediate task creations by using task avoidance api (#60071 ) (#60504 ) - Replace immediate task creations by using task avoidance api - One step closer to #56610 - Still many tasks are created during configuration phase. Tackled in separate steps	2020-07-31 13:09:04 +02:00
Armin Braun	ebb6677815	Formalize and Streamline Buffer Sizes used by Repositories (#59771 ) (#60051 ) Due to complicated access checks (reads and writes execute in their own access context) on some repositories (GCS, Azure, HDFS), using a hard coded buffer size of 4k for restores was needlessly inefficient. By the same token, the use of stream copying with the default 8k buffer size for blob writes was inefficient as well. We also had dedicated, undocumented buffer size settings for HDFS and FS repositories. For these two we would use a 100k buffer by default. We did not have such a setting for e.g. GCS though, which would only use an 8k read buffer which is needlessly small for reading from a raw `URLConnection`. This commit adds an undocumented setting that sets the default buffer size to `128k` for all repositories. It removes wasteful allocation of such a large buffer for small writes and reads in case of HDFS and FS repositories (i.e. still using the smaller buffer to write metadata) but uses a large buffer for doing restores and uploading segment blobs. This should speed up Azure and GCS restores and snapshots in a non-trivial way as well as save some memory when reading small blobs on FS and HDFS repositories.	2020-07-22 21:06:31 +02:00
Tanguy Leroux	4827fec1cf	Revert "Mute AzureSearchableSnapshotsIT (#58775 )" (#59749 ) This reverts commit `74a78b3a7b`.	2020-07-17 10:02:46 +02:00
David Turner	691759fb1f	Validate snapshot UUID during restore (#59601 ) Today when mounting a searchable snapshot we obtain the snapshot/index UUIDs and then assume that these are the UUIDs used during the subsequent restore. If you concurrently delete the snapshot and replace it with one with the same name then this assumption is violated, with chaotic consequences. This commit introduces a check that ensures that the snapshot UUID does not change during the mount process. If the snapshot remains in place then the index UUID necessarily does not change either. Relates #50999	2020-07-15 16:23:20 +01:00
Tanguy Leroux	604f22db79	Use a dedicated thread pool for searchable snapshot cache prewarming (#59313 ) (#59590 ) Since #58728 writing operations on searchable snapshot directory cache files are executed in an asynchronous manner using a dedicated thread pool. The thread pool used is searchable_snapshots which has been created to execute prewarming tasks. Reusing the same thread pool wasn't a good idea as it can lead to deadlock situations. One of these situations arose in a test failure where the thread pool was full of prewarming tasks, all waiting for a cache file to be accessible, while the cache file was being evicted by the cache service. But such an eviction can only be processed when all read/write operations on the cache file are completed and in this case the deadlock occurred because the cache file was actively being read by a concurrent search which also won the privilege to write the range of bytes in cache... and this writing operation could never have been completed because of the prewarming tasks making no progress and filling up the thread pool. This commit renames the searchable_snapshots thread pool to searchable_snapshots_cache_fetch_async. Assertions are added to assert that cache writes are executed using this thread pool and to assert that reads on cached index inputs are executed using a different thread pool to avoid potential deadlock situations. This commit also adds a searchable_snapshots_cache_prewarming that is used to execute prewarming tasks. It also converts the existing cache prewarming test into a more complete integration test that creates multiple searchable snapshot indices concurrently with randomized thread pool sizes, and verifies that all files have been correctly prewarmed.	2020-07-15 11:45:52 +02:00
Daniel Mitterdorfer	10ef4d2140	Mute testMaxRestoreBytesPerSecIsUsed (#59289 ) Relates #59287	2020-07-09 12:52:17 +02:00
David Turner	6ffdb19a2a	Clean searchable snapshots cache on startup (#59009 ) Today we empty the searchable snapshots cache when cleanly closing a shard, but leak cache files in some cases involving an unclean shutdown. Such leaks are not permanent, they are cleaned up on shard relocation or deletion, but they still might last for arbitrarily long until that happens. This commit introduces a cleanup process that runs during node startup to catch such leaks sooner. Also, today we permit searchable snapshots to be held on custom data paths, and store the corresponding cache files within the custom location. Supporting this feature would make the cleanup process significantly more complicated since it would require each node to parse the index metadata for the shards it held before shutdown. Yet, this feature is undocumented and offers minimal benefits to searchable snapshots. Therefore with this commit we forbid custom data paths for searchable snapshot shards.	2020-07-08 15:17:52 +01:00
Armin Braun	9268b25789	Add Check for Metadata Existence in BlobStoreRepository (#59141 ) (#59216 ) In order to ensure that we do not write a broken piece of `RepositoryData` because the physical repository generation was moved ahead more than one step by erroneous concurrent writing to a repository we must check whether or not the current assumed repository generation exists in the repository physically. Without this check we run the risk of writing on top of stale cached repository data. Relates #56911	2020-07-08 14:25:01 +02:00
Yannick Welsch	0b9eb210b8	Add basic searchable snapshots usage information (#58828 ) (#59160 ) Adds super basic usage information for searchable snapshots, to be extended later. Backport of #58828	2020-07-08 13:09:29 +02:00
Francisco Fernández Castaño	0752a86fe5	Enforce higher priority for RepositoriesService ClusterStateApplier (#59040 ) * Enforce higher priority for RepositoriesService ClusterStateApplier This avoids shards allocation failures when the repository instance comes in the same ClusterState update as the shard allocation. Backport of #58808	2020-07-07 09:51:08 +02:00
Jake Landis	604c6dd528	7.x - Create plugin for yamlTest task (#56841 ) (#59090 ) This commit creates a new Gradle plugin to provide a separate task name and source set for running YAML based REST tests. The only project converted to use the new plugin in this PR is distribution/archives/integ-test-zip. For which the testing has been moved to :rest-api-spec since it makes the most sense and it avoids a small but awkward change to the distribution plugin. The remaining cases in modules, plugins, and x-pack will be handled in followups. This plugin is distinctly different from the plugin introduced in #55896 since the YAML REST tests are intended to be black box tests over HTTP. As such they should not (by default) have access to the classpath for that which they are testing. The YAML based REST tests will be moved to separate source sets (yamlRestTest). Which source set is the target for the test resources depends on whether this new plugin is applied. If it is not applied, it will default to the test source set. Further, this introduces a breaking change for plugin developers that use the YAML testing framework. They will now need to either use the new source set and matching task, or configure the rest resources to use the old "test" source set that matches the old integTest task. (The former should be preferred). As part of this change (which is also breaking for plugin developers) the rest resources plugin has been removed from the build plugin and now requires either explicit application or application via the new YAML REST test plugin. Plugin developers should be able to fix the breaking changes to the YAML tests by adding apply plugin: 'elasticsearch.yaml-rest-test' and moving the YAML tests under a yamlRestTest folder (instead of test)	2020-07-06 14:16:26 -05:00
Tanguy Leroux	49f4227837	Check acknowledged responses in FsSearchableSnapshotsIT (#59021 ) Despite all my attempts I did not manage to reproduce issues like the ones described in #58961. My guess is that the _mount request got retried at some point but I wasn't able to validate this assumption. Still, the FsSearchableSnapshotsIT can be pretty disk heavy if a small random chunk size and a large number of documents is picked up in the tests. The parent class also does not verify the acknowledged status of some requests. This commit lowers the chunk size and number of docs in tests (this is extensively tested in unit tests) and also adds assertions on acknowledged responses. Relates #58961	2020-07-05 10:50:31 +02:00
Tanguy Leroux	6aa669c8bb	Fix SearchableSnapshotDirectoryStatsTests (#58912 ) Similar to #58847 but in a different test. The failure never reproduced locally but occurs from time to time on CI.	2020-07-02 16:39:26 +02:00
David Kyle	d6643bfc7f	Revert "Mute FsSearchableSnapshotsIT testClearCache (#58902 )" The test was fixed in #58847 This reverts commit `bb96c910a5`.	2020-07-02 13:21:05 +01:00
David Kyle	bb96c910a5	Mute FsSearchableSnapshotsIT testClearCache (#58902 ) For #58901	2020-07-02 12:58:28 +01:00
Mark Vieira	1fcaec7dfc	Ignore test seed used in test system properties (#58789 )	2020-07-01 11:52:22 -07:00
Tanguy Leroux	ec4843f4df	Fix AbstractSearchableSnapshotsRestTestCase.testClearCache (#58847 ) Since #58728 part of searchable snapshot shard files are written in cache in an asynchronous manner in a dedicated thread pool. It means that even if a search query is successful and returns, there are still more bytes to write in the cached files on disk. On CI this can be slow; if we want to check that the cached_bytes_written has changed we need to check multiple times to give some time for the cached data to be effectively written.	2020-07-01 18:01:00 +02:00
Tanguy Leroux	d35e8f45da	Allow read operations to be executed without waiting for full range to be written in cache (#58728 ) (#58829 ) This commit changes CacheFile and CachedBlobContainerIndexInput so that the read operations made by these classes are now progressively executed and do not wait for full range to be written in cache. It relies on the change introduced in #58477 and it is the last change extracted from #58164. Relates #58164	2020-07-01 15:38:17 +02:00
Yannick Welsch	15c85b29fd	Account for recovery throttling when restoring snapshot (#58658 ) (#58811 ) Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account (i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to configure throttling in a single place. The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to `40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`. This means that no behavioral change will be observed by clusters where the recovery and restore settings were not adapted. Relates https://github.com/elastic/elasticsearch/issues/57023 Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-07-01 12:19:29 +02:00
Lee Hinman	74a78b3a7b	Mute AzureSearchableSnapshotsIT (#58775 ) Relates to #58260	2020-06-30 13:30:51 -06:00
Rene Groeschke	d952b101e6	Replace compile configuration usage with api (7.x backport) (#58721 ) * Replace compile configuration usage with api (#58451) - Use java-library instead of plugin to allow api configuration usage - Remove explicit references to runtime configurations in dependency declarations - Make test runtime classpath input for testing convention - required as java library will by default not have build jar file - jar file is now explicit input of the task and gradle will ensure its properly build * Fix compile usages in 7.x branch	2020-06-30 15:57:41 +02:00
Tanguy Leroux	4e03633a66	Differentiate base paths for searchable snapshots QA tests (#58664 ) (#58714 ) This commit adds the BuildParams.testSeed to the repository base paths used in searchable snapshots QA tests. For S3 and GCS the test seed is added for coherency sake with other integration tests while it's required for Azure as Azure 3rd party tests are executed on CI simultaneously for regular and SAS token accounts. Closes #58260	2020-06-30 10:18:33 +02:00
Tanguy Leroux	73adcf4d44	SparseFileTracker.Gap should keep a reference to the corresponding Range (#58587 ) (#58665 ) SparseFileTracker.Gap can keep a reference to the corresponding range it is about to fill, it does not need to resolve the range each time onSuccess/onProgress/onFailure are called. Relates #58477	2020-06-29 15:24:19 +02:00
Tanguy Leroux	775fb5d4cf	Allows SparseFileTracker to progressively execute listeners during Gap processing (#58477 ) (#58584 ) Today SparseFileTracker allows to wait for a range to become available before executing a given listener. In the case of searchable snapshot, we'd like to be able to wait for a large range to be filled (ie, downloaded and written to disk) while being able to execute the listener as soon as a smaller range is available. This pull request is an extract from #58164 which introduces a ProgressListenableActionFuture that is used internally by SparseFileTracker. The progressive listenable future allows to register listeners attached to SparseFileTracker.Gap so that they are executed once the Gap is completed (with success or failure) or as soon as the Gap progress reaches a given progress value. This progress value is defined when the tracker.waitForRange() method is called; this method has been modified to accept a range and another listener's range to operate on. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-06-26 18:26:20 +02:00
Tanguy Leroux	f3b6e41f02	Do not wrap CacheFile reentrant r/w locks with ReleasableLock (#58244 ) Today the read/write locks used internally by CacheFile object are wrapped into a ReleasableLock. This is not strictly required and also prevents usage of the tryLock() methods which we would like to use for early releasing of read operations (#58164).	2020-06-18 11:01:53 +02:00
David Turner	423697f414	Default to zero replicas for searchable snapshots (#57802 ) Today a mounted searchable snapshot defaults to having the same replica configuration as the index that was snapshotted. This commit changes this behaviour so that we default to zero replicas on these indices, but allow the user to override this in the mount request. Relates #50999	2020-06-16 10:12:23 +01:00
Rene Groeschke	01e9126588	Remove deprecated usage of testCompile configuration (#57921 ) (#58083 ) * Remove usage of deprecated testCompile configuration * Replace testCompile usage by testImplementation * Make testImplementation non transitive by default (as we did for testCompile) * Update CONTRIBUTING about using testImplementation for test dependencies * Fail on testCompile configuration usage	2020-06-14 22:30:44 +02:00
Tanguy Leroux	0e57528d5d	Remove more //NORELEASE (#57517 ) We agreed on removing the following //NORELEASE tags.	2020-06-05 15:34:06 +02:00
Tanguy Leroux	b4a2cd810a	Use 3rd party task to run integration tests on external service (#56588 ) Backport of #56587 for 7.x	2020-06-02 11:26:58 +02:00
Ignacio Vera	b4521d5183	upgrade to Lucene 8.6.0 snapshot (#56661 )	2020-05-13 14:25:16 +02:00
David Turner	c10b4ae15a	Support cloning of searchable snapshot indices (#56595 ) Today you can convert a searchable snapshot index back into a regular index by restoring the underlying snapshot, but this is somewhat wasteful if the shards are already in cache since it copies the whole index from the repository again. Instead, we can make use of the locally-cached data by using the clone API to copy the contents of the cache into the layout expected by a regular shard. This commit marks the searchable snapshot's private index settings as `NotCopyableOnResize` so that they are removed by resize operations such as cloning. Cloning a regular index typically hard-links the underlying files rather than copying them, but this is tricky to support in the case of a searchable snapshot so this commit takes the simpler approach of always copying the underlying files.	2020-05-13 11:05:14 +01:00
Tanguy Leroux	8e9b69bfd7	Use snapshot information to build searchable snapshot store MetadataSnapshot (#56289 ) (#56403 ) While investigating possible optimizations to speed up searchable snapshots shard restores, we noticed that Elasticsearch builds the list of shard files on local disk in order to compare it with the list of files contained in the snapshot to restore. This list of files is materialized with a MetadataSnapshot object whose construction involves reading the footer checksum of every file of the shard using the Store.checksumFromLuceneFile() method. Further investigation shows that a MetadataSnapshot object is also created for other types of operations like building the list of files to recover in a peer recovery (and primary shard relocation) or in order to assign a shard to a node. These operations use the Store.getMetadata(IndexCommit) method to build the list of files and checksums. In the case of searchable snapshots building the MetadataSnapshot object can potentially trigger cache misses, which in turn can cause the download and the writing in cache of the last range of the file in order to check the 16 bytes footer. This in turn can cause more evictions. Since searchable snapshots already contain the footer information of every file in BlobStoreIndexShardSnapshot, the directory can read the checksum directly from it and avoid using the cache at all to create a MetadataSnapshot for the operations mentioned above. This commit adds a shortcut to the SearchableSnapshotDirectory.openInput() method - similarly to what already exists for segment infos - so that it creates a specific IndexInput for checksum reading operation.	2020-05-08 14:16:19 +02:00
Tanguy Leroux	6233e32ab3	Fix SearchableSnapshotDirectoryTests.testIndexSearcher() (#56275 ) Closes #56233	2020-05-07 11:12:35 +02:00
Tanguy Leroux	65a061e33a	Fix SearchableSnapshotDirectoryTests.testClearCache (#56277 ) This test sometimes fails when prewarming is enabled because it's possible that some files are cached in background while the test tries to clear the cache. This commit disables prewarming for this test.	2020-05-07 10:59:33 +02:00
Tanguy Leroux	07ad742b60	Enable prewarming by default for searchable snapshots (#56201 ) Now searchable snapshots directories respect the repository rate limitations (#55952) we can enable prewarming by default for shards.	2020-05-06 10:18:34 +02:00
Tanguy Leroux	131a3911eb	Replace BlobContainerWrapper by FilterBlobContainer (#56200 ) A FilterBlobContainer class was introduced in #55952 and it delegates its behavior to a given BlobContainer while allowing to override only necessary methods. This commit replaces the existing BlobContainerWrapper class from the test framework with the new FilterBlobContainer from core.	2020-05-06 10:05:43 +02:00
Julie Tibshirani	793f265451	Mute SearchableSnapshotDirectoryTests.testIndexSearcher.	2020-05-05 12:29:05 -07:00
Tanguy Leroux	b9636713b1	Searchable Snapshots should respect max_restore_bytes_per_sec (#55952 ) (#56199 ) This commit changes searchable snapshots so that it now respects the repository's max_restore_bytes_per_sec setting when it downloads blobs. Backport of #55952 for 7.x	2020-05-05 15:43:06 +02:00

71 Commits