If a searchable snapshot shard fails (e.g. its node leaves the cluster)
we want to be able to start it up again on a different node as quickly
as possible to avoid unnecessarily blocking or failing searches. It
isn't feasible to fully restore such shards in an acceptably short time.
In particular we would like to be able to deal with the `can_match`
phase of a search ASAP so that we can skip unnecessary waiting on shards
that may still be warming up but which are not required for the search.
This commit solves this problem by introducing a system index that holds
much of the data required to start a shard. Today(*) this means it holds
the contents of every file with size <8kB, and the first 4kB of every
other file in the shard. This system index acts as a second-level cache,
behind the first-level node-local disk cache but in front of the blob
store itself. Reading chunks from the index is slower than reading them
directly from disk, but faster than reading them from the blob store,
and is also replicated and accessible to all nodes in the cluster.
(*) the exact heuristics for what we should put into the system index
are still under investigation and may change in future.
This second-level cache is populated when we attempt to read a chunk
which is missing from both levels of cache and must therefore be read
from the blob store.
We also introduce `SearchableSnapshotsBlobStoreCacheIntegTests` which
verify that we do not hit the blob store more than necessary when
starting up a shard that we've seen before, whether due to a node
restart or because a snapshot was mounted multiple times.
Backport of #60522
Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>