Commit Graph

5052 Commits

Author SHA1 Message Date
Rory Hunter 2e05ce5f88 Bump version to 7.10.0 2020-07-15 11:56:45 +01:00
Ignacio Vera f8037abf47
upgrade to lucene-8.6.0 release (#59596) (#59599) 2020-07-15 12:40:57 +02:00
David Turner 0c2510dc68 Don't request cluster metadata in _cat/shards impl (#59548)
Today `GET _cat/shards` requests the nodes, routing table, and metadata
from the cluster state, but it does not use any information from the
metadata portion of the response. Metadata includes things like mappings
and templates that may be substantial in size.

This commit drops the unnecessary metadata portion of this cluster state
request.
2020-07-15 10:14:48 +01:00
Francisco Fernández Castaño 66ef1cdad7
Add the possibility to inject a custom RecoveryState factory to IndexStorePlugin implementations (#59124)
Add a custom factory for recovery state into IndexStorePlugin that
allows different implementors to provide its own RecoveryState
implementation.

Backport of #59038
2020-07-15 11:11:07 +02:00
Armin Braun 96f52a028f
Fix Snapshot not Starting in Partial Snapshot Corner Case (#59428) (#59584)
We were not handling the case where during a partial snapshot all
shards would enter a failed state right off the bat.

Closes #59384
2020-07-15 07:59:22 +02:00
Armin Braun 2dd086445c
Enable Fully Concurrent Snapshot Operations (#56911) (#59578)
Enables fully concurrent snapshot operations:
* Snapshot create- and delete operations can be started in any order
* Delete operations wait for snapshot finalization to finish, are batched as much as possible to improve efficiency and once enqueued in the cluster state prevent new snapshots from starting on data nodes until executed
   * We could be even more concurrent here in a follow-up by interleaving deletes and snapshots on a per-shard level. I decided not to do this for now since it seemed not worth the added complexity yet. Due to batching+deduplicating of deletes the pain of having a delete stuck behind a long -running snapshot seemed manageable (dropped client connections + resulting retries don't cause issues due to deduplication of delete jobs, batching of deletes allows enqueuing more and more deletes even if a snapshot blocks for a long time that will all be executed in essentially constant time (due to bulk snapshot deletion, deleting multiple snapshots is mostly about as fast as deleting a single one))
* Snapshot creation is completely concurrent across shards, but per shard snapshots are linearized for each repository as are snapshot finalizations

See updated JavaDoc and added test cases for more details and illustration on the functionality.

Some notes:

The queuing of snapshot finalizations and deletes and the related locking/synchronization is a little awkward in this version but can be much simplified with some refactoring.  The problem is that snapshot finalizations resolve their listeners on the `SNAPSHOT` pool while deletes resolve the listener on the master update thread. With some refactoring both of these could be moved to the master update thread, effectively removing the need for any synchronization around the `SnapshotService` state. I didn't do this refactoring here because it's a fairly large change and not necessary for the functionality but plan to do so in a follow-up.

This change allows for completely removing any trickery around synchronizing deletes and snapshots from SLM and 100% does away with SLM errors from collisions between deletes and snapshots.

Snapshotting a single index in parallel to a long running full backup will execute without having to wait for the long running backup as required by the ILM/SLM use case of moving indices to "snapshot tier". Finalizations are linearized but ordered according to which snapshot saw all of its shards complete first
2020-07-15 03:42:31 +02:00
Armin Braun 06d94cbb2a
Fix TODO about Spurious FAILED Snapshots (#58994) (#59576)
There is no point in writing out snapshots that contain no data that can be restored
whatsoever. It may have made sense to do so in the past when there was an `INIT` snapshot
step that wrote data to the repository that would've other become unreferenced, but in the
current day state machine without the `INIT` step there is no point in doing so.
2020-07-15 00:54:30 +02:00
Armin Braun e1014038e9
Simplify Repository.finalizeSnapshot Signature (#58834) (#59574)
Many of the parameters we pass into this method were only used to
build the `SnapshotInfo` instance to write.
This change simplifies the signature. Also, it seems less error prone to build
`SnapshotInfo` in `SnapshotsService` isntead of relying on the fact that each repository
implementation will build the correct `SnapshotInfo`.
2020-07-15 00:14:28 +02:00
Armin Braun 16a47e0d08
Simplify SnapshotsInProgress Construction (#58893) (#59573)
With parallel snapshots incoming (but also in isolation) it makes sense to clean up
`SnapshotsInProgress` construction.
We don't need to pre-compute the waiting shards for every entry. We rarely use this information
(only on routing changes) and in the one spot we did we now simply spent the extra cycles for looping
over all shards instead of just the waiting ones once per routing change tops instead of on every change
to `SnapshotsInProgress` (moreover, we would burn the cycles for looping on all nodes even though only the
current master cares about the information).
In addition to that change I removed some dead code constructors and slighly optimized deserialization.
2020-07-15 00:00:53 +02:00
Martijn van Groningen 35ae3d19db
Remove data stream feature flag (#59572)
so that it can used in the next minor release (7.9.0).

Backport of #59504 to 7.x branch.
Closes #53100
2020-07-14 23:50:41 +02:00
Armin Braun 68a199f75f
Minor Cleanup Dead Code Snapshotting (#57716) (#59569)
* Use consistent cluster state instead in state update
* Remove dead loop in tests
* Remove some dead exception ctors

Just three trivial/random things I found.
2020-07-14 23:13:14 +02:00
James Baiera 5f7e7e9410
[7.x] Data Stream Stats API (#58707) (#59566)
This API reports on statistics important for data streams, including the number of data
streams, the number of backing indices for those streams, the disk usage for each data
stream, and the maximum timestamp for each data stream
2020-07-14 16:57:46 -04:00
Mark Tozzi ed2c29f102
If no perBucketSample has been allocated for the parent bucket return a doc count of 0 (#59360) (#59567)
Co-authored-by: Fabio Corneti <info@corneti.com>
2020-07-14 16:56:29 -04:00
Armin Braun d456f7870a
Deduplicate Index Metadata in BlobStore (#50278) (#59514)
This PR introduces two new fields in to `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and uuids. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices` so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot.
This saves one write per index in the common case of unchanged metadata thus saving cost and making snapshot finalization drastically faster if many indices are being snapshotted at the same time.

The implementation is mostly analogous to that for shard generations in #46250 and piggy backs on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`).

Relates to #45736 as it improves the efficiency of snapshotting unchanged indices
Relates to #49800 as it has the potential of loading the index metadata for multiple snapshots of the same index concurrently much more efficient speeding up future concurrent snapshot delete
2020-07-14 22:18:42 +02:00
Tim Brooks 408a07f96a
Separate coordinating and primary bytes in stats (#59487)
Currently we combine coordinating and primary bytes into a single bucket
for indexing pressure stats. This makes sense for rejection logic.
However, for metrics it would be useful to separate them.
2020-07-14 12:37:06 -06:00
Tim Brooks a46e5e0f04
Increase default write queue size (#59464)
This commit increases the default write queue size to 10000. This is to
allow a greater number of pending indexing requests. This work is safe
as we have added additional memory limits. Relates to #59263.
2020-07-14 10:35:25 -06:00
Tim Brooks 1a24916fef
Enable replication retries on 7.9+ (#59546)
Currently the work to support replication retries is present on 7.9.
This commit enables these retries by setting the replication timeout to
60s.
2020-07-14 10:35:05 -06:00
Dan Hermann e54b4a729f
[7.x] Adds write_index_only option to put mapping API (#59539) 2020-07-14 10:34:08 -05:00
Luca Cavanna af2f85be15
Consolidate script parsing from object (7.x) (#59509)
The update by query action parses a script from an object (map or string). We will need to do the same for runtime fields as they are parsed as part of mappings (#59391).

This commit moves the existing parsing of a script from an object from RestUpdateByQueryAction to the Script class. It also adds tests and adjusts some error messages that are incorrect. Also, options were not parsed before and they are now. And unsupported fields trigger now a deprecation warning.
2020-07-14 17:08:29 +02:00
Mark Tozzi b357c1b77a
[7.x] Fix NPE when building exception messages for aggregations (#59156) (#59334) 2020-07-14 09:37:44 -04:00
Andrei Dan 7dcdaeae49
Default to @timestamp in composable template datastream definition (#59317) (#59516)
This makes the data_stream timestamp field specification optional when
defining a composable template.
When there isn't one specified it will default to `@timestamp`.

(cherry picked from commit 5609353c5d164e15a636c22019c9c17fa98aac30)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-14 12:36:54 +01:00
Andrei Dan 4180333bbc
[7.x] Composable templates: add a default mapping for @timestamp (#59244) (#59510)
This adds a low precendece mapping for the `@timestamp` field with
type `date`.
This will aid with the bootstrapping of data streams as a timestamp
mapping can be omitted when nanos precision is not needed.

(cherry picked from commit 4e72f43d62edfe52a934367ce9809b5efbcdb531)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-14 11:29:33 +01:00
Armin Braun 0e3d87ab54
Add Assertions on CS Application in Snapshot Logic (#58681) (#59511)
Relates to #58680. Bugs like that should not only show up in logs
but ideally also get caught in tests. We expect to never see exceptions
in these two spots.
2020-07-14 12:16:42 +02:00
Armin Braun 81e96954d0
Improve Efficiency of SnapshotsService CS Apply (#56874) (#59508)
This change removes the redundant submitting of two separate cluster state updates
for the node configuration changes and routing changes that affect snapshots.
Since we submitted the task to deal with node configuration changes every time on master
fail-over we could also move the BwC cleanup loop that removes `INIT` state snapshots as well
as snapshots that have all their shards completed into this cluster state update task.

Aside from improving efficiency overall this change has the fortunate side effect of moving
all snapshot finalization to the CS update thread. This is helpful for concurrent snapshots
since it makes it very natural and straight forward to order snapshot finalizations by exploiting
that they are all initiated on the same thread.
2020-07-14 11:49:09 +02:00
Tim Brooks 623df95a32
Adding indexing pressure stats to node stats API (#59467)
We have recently added internal metrics to monitor the amount of
indexing occurring on a node. These metrics introduce back pressure to
indexing when memory utilization is too high. This commit exposes these
stats through the node stats API.
2020-07-13 17:23:42 -06:00
Tim Brooks 68d56fa7db
Implement rejections in `WriteMemoryLimits` (#59451)
This commit adds rejections when the indexing memory limits are
exceeded for primary or coordinating operations. The amount of bytes
allow for indexing is controlled by a new setting
`indexing_limits.memory.limit`.
2020-07-13 14:34:50 -06:00
Mark Tozzi eb0b28dd1d
Move getPointReaderOrNull into AggregatorBase (#58769) (#59455) 2020-07-13 16:31:33 -04:00
Armin Braun 64c5f70a2d
Remove Needless Context Switches on Loading RepositoryData (#56935) (#59452)
We don't need to switch to the generic or snapshot pool for loading
cached repository data (i.e. most of the time in normal operation).

This makes `executeConsistentStateUpdate` less heavy if it has to retry
and lowers the chance of having to retry in the first place.
Also, this change allowed simplifying a few other spots in the codebase
where we would fork off to another pool just to load repository data.
2020-07-13 21:38:29 +02:00
Armin Braun bde92fc5fc
Remove Needless Context Switch From Snapshot Finalization (#56871) (#59443)
No need to do any switch to the `SNAPSHOT` pool here, the blob store
repo handles all its writes async on the `SNAPSHOT` pool so we're just
needlessly context-switching to enqueue those tasks there.
Also cleaned up the source only repository (the only override to `finalizeSnapshot`)
to make it clear that no IO is happening there and we don't need to run it on the
`SNAPSHOT` pool either.
2020-07-13 20:11:07 +02:00
Armin Braun 31be3a3645
More Efficient Snapshot State Handling (#56669) (#59430)
Follow up to #56365. Instead of redundantly checking snapshots for completion
over and over, just track the completed snapshots in the CS updates that complete
them instead of looping over the smae snapshot entries over and over.
Also, in the batched snapshot shard status updates, only check for completion
of a snapshot entry if it isn't already finalizing.
2020-07-13 18:58:04 +02:00
Christos Soulios 3868bcc7b8
[7.x] Histogram integration on Histogram field type (#59431)
Backports #58930 to 7.x
Implements histogram aggregation over histogram fields as requested in #53285.
2020-07-13 19:36:33 +03:00
Henning Andersen adf6083dd0
Enhance real memory circuit breaker with G1 GC (#58674) (#59394)
Using G1 GC, Elasticsearch can rarely trigger that heap usage goes above
the real memory circuit breaker limit and stays there for an extended
period. This situation will persist until the next young GC. The circuit
breaking itself hinders that from occurring in a timely manner since it
breaks all request before real work is done.

This commit gently nudges G1 to do a young GC and then double checks
that heap usage is still above the real memory circuit breaker limit
before throwing the circuit breaker exception.

Related to #57202
2020-07-13 17:41:09 +02:00
Martijn van Groningen b1b7bf3912
Make data streams a basic licensed feature. (#59392)
Backport of #59293 to 7.x branch.

* Create new data-stream xpack module.
* Move TimestampFieldMapper to the new module,
  this results in storing a composable index template
  with data stream definition only to work with default
  distribution. This way data streams can only be used
  with default distribution, since a data stream can
  currently only be created if a matching composable index
  template exists with a data stream definition.
* Renamed `_timestamp` meta field mapper
   to `_data_stream_timestamp` meta field mapper.
* Add logic to put composable index template api
  to fail if `_data_stream_timestamp` meta field mapper
  isn't registered. So that a more understandable
  error is returned when attempting to store a template
  with data stream definition via the oss distribution.

In a follow up the data stream transport and
rest actions can be moved to the xpack data-stream module.
2020-07-13 17:26:46 +02:00
Alan Woodward bd01fd107c Revert "Migrate CompletionFieldMapper to parametrized format (#59291)"
This reverts commit 19ba6c39d2.
2020-07-13 14:16:09 +01:00
Armin Braun 4e574a7136
Remove Dead Code from Closed Index Snapshot Logic (#56764) (#59398)
The code path for closed indices is dead code here ever since #39644
because `shards(currentState, indexIds, ...)` does not set
`MISSING` on a closed index's shard that is assigned any longer. Before that change it would always set `MISSING` for a closed index's shard even it was assigned.
=> simplified the code accordingly.
2020-07-13 14:49:16 +02:00
David Turner 3fb9dccc22 Fix FSHealthServiceTests on Windows (#59387)
In #52680 we introduced a new health check mechanism. This commit fixes
up some related test failures on Windows caused by erroneously assuming
that all paths begin with `/`.

Closes #59380
2020-07-13 12:43:45 +01:00
Alan Woodward 19ba6c39d2 Migrate CompletionFieldMapper to parametrized format (#59291)
This adds some optional extra configuration to Parameter:

* custom serialization (to handle analyzers)
* deprecated parameter names
* parameter validation
2020-07-13 12:43:15 +01:00
Armin Braun 08b54feaaf
Remove Snapshot INIT Step (#55918) (#59374)
With #55773 the snapshot INIT state step has become obsolete. We can set up the snapshot directly in one single step to simplify the state machine.

This is a big help for building concurrent snapshots because it allows us to establish a deterministic order of operations between snapshot create and delete operations since all of their entries now contain a repository generation. With this change simple queuing up of snapshot operations can and will be added in a follow-up.
2020-07-13 13:41:09 +02:00
Alan Woodward c810a4a12e Continue to accept unused 'universal' params in <8.0 indexes (#59381)
We have a number of parameters which are universally parsed by almost all
mappers, whether or not they make sense. Migrating the binary and boolean
mappers to the new style of declaring their parameters explicitly has meant
that these universal parameters stopped being accepted, which would break
existing mappings.

This commit adds some extra logic to ParametrizedFieldMapper that checks
for the existence of these universal parameters, and issues a warning on
7x indexes if it finds them. Indexes created in 8.0 and beyond will throw an
error.

Fixes #59359
2020-07-13 11:15:56 +01:00
David Kyle 7dcd943e1d Mute FsHealthServiceTests testFailsHealthOnIOException (#59382)
For #59380
2020-07-13 09:48:07 +01:00
Armin Braun 483386136d
Move all Snapshot Master Node Steps to SnapshotsService (#56365) (#59373)
This refactoring has three motivations:

1. Separate all master node steps during snapshot operations from all data node steps in code.
2. Set up next steps in concurrent repository operations and general improvements by centralizing tracking of each shard's state in the repository in `SnapshotsService` so that operations for each shard can be linearized efficiently (i.e. without having to inspect the full snapshot state for all shards on every cluster state update, allowing us to track more in memory and only fall back to inspecting the full CS on master failover like we do in the snapshot shards service).
    * This PR already contains some best effort examples of this, but obviously this could be way improved upon still (just did not want to do it in this PR for complexity reasons)
3. Make the `SnapshotsService` less expensive on the CS thread for large snapshots
2020-07-12 22:19:07 +02:00
Dan Hermann e01d73c737
[7.x] Data stream admin actions are now index-level actions 2020-07-10 14:36:18 -05:00
Stuart Tettemer 4c04fd1e05
Scripting: Unlimited compilation rate for ingest (#59268)
* `ingest` and `processor_conditional` default to unlimited compilation rate

Refs: #50152
2020-07-09 16:34:47 -05:00
Stuart Tettemer 94e213dd5f
Scripting: Per context stats in `script` in _nodes/stats (#59266)
Updated `_nodes/stats`:
 * Update `script` in `_node/stats` to include stats per context:

```
      "script": {
        "compilations": 1,
        "cache_evictions": 0,
        "compilation_limit_triggered": 0,
        "contexts":[
          {
            "context": "aggregation_selector",
            "compilations": 0,
            "cache_evictions": 0,
            "compilation_limit_triggered": 0
          },

```

Refs: #50152
Backport: #59625
2020-07-09 15:30:50 -05:00
Alan Woodward f4caadd239 MappedFieldType no longer requires equals/hashCode/clone (#59212)
With the removal of mapping types and the immutability of FieldTypeLookup in #58162, we no longer
have any cause to compare MappedFieldType instances. This means that we can remove all equals
and hashCode implementations, and in addition we no longer need the clone implementations which
were required for equals/hashcode testing. This greatly simplifies implementing new MappedFieldTypes,
which will be particularly useful for the runtime fields project.
2020-07-09 21:05:10 +01:00
Dan Hermann c26d2b5fa5
Data stream support for indices shard stores API 2020-07-09 13:11:45 -05:00
Nik Everett 28ef997953
Improve vwh's distant bucket handling (#59094) (#59248)
This modifies the `variable_width_histogram`'s distant bucket handling
to:
1. Properly handle integer overflows
2. Recalculate the average distance when new buckets are added on the
   ends. This should slow down the rate at which we build extra buckets
   as we build more of them.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-09 12:14:46 -04:00
Przemko Robakowski c870d6e570
[7.x] Restart tests with data streams (#58330) (#59303)
* Restart tests with data streams (#58330)
2020-07-09 17:52:20 +02:00
David Turner d56fc72ee5 Fix node health-check-related test failures (#59277)
In #52680 we introduced a new health check mechanism. This commit fixes
up some sporadic related test failures, and improves the behaviour of
the `FollowersChecker` slightly in the case that no retries are
configured.

Closes #59252
Closes #59172
2020-07-09 12:46:12 +01:00
David Turner c80a9e2ec2 Skip unnecessary directory iteration (#59007)
Today `NodeEnvironment#findAllShardIds` enumerates the index directories
in each data path in order to find one with a specific name. Since we
already know the name of the folder we seek we can construct the path
directly and avoid this directory listing. This commit does that.
2020-07-09 11:56:41 +01:00