Backport of #58898.
Part of #48366. Now that there is a dedicated API for dangling indices, the auto-import
behaviour can default to off. Also add a note to the breaking changes for 7.9.0.
Today when mounting a searchable snapshot we obtain the snapshot/index
UUIDs and then assume that these are the UUIDs used during the
subsequent restore. If you concurrently delete the snapshot and replace
it with one with the same name then this assumption is violated, with
chaotic consequences.
This commit introduces a check that ensures that the snapshot UUID does
not change during the mount process. If the snapshot remains in place
then the index UUID necessarily does not change either.
Relates #50999
Backport of #59525 to 7.x branch.
* Actions are moved to xpack core.
* Transport and rest actions are moved to the data-streams module.
* Removed data streams methods from Client interface.
* Adjusted tests to use client.execute(...) instead of data stream specific methods.
* only attempt to delete all data streams if xpack is installed in rest tests
* Now that data stream APIs are in xpack and ESIntegTestCase
no longer deletes all data streams, do that in the MlNativeIntegTestCase
class for ML tests.
Today `GET _cat/shards` requests the nodes, routing table, and metadata
from the cluster state, but it does not use any information from the
metadata portion of the response. Metadata includes things like mappings
and templates that may be substantial in size.
This commit drops the unnecessary metadata portion of this cluster state
request.
Add a custom factory for recovery state into IndexStorePlugin that
allows different implementors to provide their own RecoveryState
implementation.
Backport of #59038
Enables fully concurrent snapshot operations:
* Snapshot create- and delete operations can be started in any order
* Delete operations wait for snapshot finalization to finish, are batched as much as possible to improve efficiency and, once enqueued in the cluster state, prevent new snapshots from starting on data nodes until executed
* We could be even more concurrent here in a follow-up by interleaving deletes and snapshots on a per-shard level. I decided not to do this for now since it did not yet seem worth the added complexity. Due to batching and deduplication of deletes, the pain of having a delete stuck behind a long-running snapshot seems manageable: dropped client connections and the resulting retries don't cause issues because delete jobs are deduplicated, and batching allows enqueuing more and more deletes even if a snapshot blocks for a long time; they will all be executed in essentially constant time, because bulk snapshot deletion makes deleting multiple snapshots about as fast as deleting a single one
* Snapshot creation is completely concurrent across shards, but per shard snapshots are linearized for each repository as are snapshot finalizations
See updated JavaDoc and added test cases for more details and illustration on the functionality.
Some notes:
The queuing of snapshot finalizations and deletes and the related locking/synchronization is a little awkward in this version but can be much simplified with some refactoring. The problem is that snapshot finalizations resolve their listeners on the `SNAPSHOT` pool while deletes resolve the listener on the master update thread. With some refactoring both of these could be moved to the master update thread, effectively removing the need for any synchronization around the `SnapshotService` state. I didn't do this refactoring here because it's a fairly large change and not necessary for the functionality but plan to do so in a follow-up.
This change allows for completely removing any trickery around synchronizing deletes and snapshots from SLM and 100% does away with SLM errors from collisions between deletes and snapshots.
Snapshotting a single index in parallel to a long running full backup will execute without having to wait for the long running backup as required by the ILM/SLM use case of moving indices to "snapshot tier". Finalizations are linearized but ordered according to which snapshot saw all of its shards complete first
There is no point in writing out snapshots that contain no data that can be restored
whatsoever. It may have made sense to do so in the past when there was an `INIT` snapshot
step that wrote data to the repository that would otherwise have become unreferenced, but in the
current state machine without the `INIT` step there is no point in doing so.
Many of the parameters we pass into this method were only used to
build the `SnapshotInfo` instance to write.
This change simplifies the signature. Also, it seems less error-prone to build
`SnapshotInfo` in `SnapshotsService` instead of relying on the fact that each repository
implementation will build the correct `SnapshotInfo`.
With parallel snapshots incoming (but also in isolation) it makes sense to clean up
`SnapshotsInProgress` construction.
We don't need to pre-compute the waiting shards for every entry. We rarely use this information
(only on routing changes), and in the one spot where we did we now simply spend the extra cycles to loop
over all shards instead of just the waiting ones, at most once per routing change instead of on every change
to `SnapshotsInProgress` (moreover, we previously burned those cycles on all nodes even though only the
current master cares about the information).
In addition to that change I removed some dead-code constructors and slightly optimized deserialization.
* Use consistent cluster state instead in state update
* Remove dead loop in tests
* Remove some dead exception ctors
Just three trivial/random things I found.
This API reports on statistics important for data streams, including the number of data
streams, the number of backing indices for those streams, the disk usage for each data
stream, and the maximum timestamp for each data stream
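A minimal usage sketch, assuming the endpoint follows the naming of the other data stream APIs (the stream name is hypothetical):
```
GET /_data_stream/_stats

GET /_data_stream/my-data-stream/_stats
```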
This PR introduces two new fields into `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and uuids. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices`) so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot.
This saves one write per index in the common case of unchanged metadata thus saving cost and making snapshot finalization drastically faster if many indices are being snapshotted at the same time.
The implementation is mostly analogous to that for shard generations in #46250 and piggybacks on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`).
Relates to #45736 as it improves the efficiency of snapshotting unchanged indices
Relates to #49800 as it has the potential of making the loading of index metadata for multiple snapshots of the same index concurrently much more efficient, speeding up future concurrent snapshot deletes
Currently we combine coordinating and primary bytes into a single bucket
for indexing pressure stats. This makes sense for rejection logic.
However, for metrics it would be useful to separate them.
This commit increases the default write queue size to 10000. This is to
allow a greater number of pending indexing requests. This work is safe
as we have added additional memory limits. Relates to #59263.
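As a quick way to check the effective queue size, the cat thread pool API can display it (column names assumed):
```
GET /_cat/thread_pool/write?v&h=node_name,name,size,queue_size
```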
The update by query action parses a script from an object (map or string). We will need to do the same for runtime fields as they are parsed as part of mappings (#59391).
This commit moves the existing parsing of a script from an object from RestUpdateByQueryAction to the Script class. It also adds tests and adjusts some error messages that are incorrect. Also, options were not parsed before; they are now. And unsupported fields now trigger a deprecation warning.
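For illustration, both of the following request shapes are accepted (index name hypothetical); parsing of the object form is what now lives in the `Script` class:
```
POST /my-index/_update_by_query
{
  "script": "ctx._source.count++"
}

POST /my-index/_update_by_query
{
  "script": {
    "source": "ctx._source.count += params.step",
    "lang": "painless",
    "params": { "step": 1 }
  }
}
```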
This makes the data_stream timestamp field specification optional when
defining a composable template.
When there isn't one specified it will default to `@timestamp`.
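For example, a composable template along these lines (names hypothetical) no longer needs to spell out the timestamp field:
```
PUT /_index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "data_stream": {}
}
```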
(cherry picked from commit 5609353c5d164e15a636c22019c9c17fa98aac30)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
This adds a low precedence mapping for the `@timestamp` field with
type `date`.
This will aid with the bootstrapping of data streams as a timestamp
mapping can be omitted when nanos precision is not needed.
(cherry picked from commit 4e72f43d62edfe52a934367ce9809b5efbcdb531)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Relates to #58680. Bugs like that should not only show up in logs
but ideally also get caught in tests. We expect to never see exceptions
in these two spots.
This change removes the redundant submitting of two separate cluster state updates
for the node configuration changes and routing changes that affect snapshots.
Since we submitted the task to deal with node configuration changes every time on master
fail-over we could also move the BwC cleanup loop that removes `INIT` state snapshots as well
as snapshots that have all their shards completed into this cluster state update task.
Aside from improving efficiency overall this change has the fortunate side effect of moving
all snapshot finalization to the CS update thread. This is helpful for concurrent snapshots
since it makes it very natural and straightforward to order snapshot finalizations by exploiting
that they are all initiated on the same thread.
We have recently added internal metrics to monitor the amount of
indexing occurring on a node. These metrics introduce back pressure to
indexing when memory utilization is too high. This commit exposes these
stats through the node stats API.
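A sketch of retrieving the new stats, assuming they are exposed as an `indexing_pressure` section that can be addressed as a node stats metric:
```
GET /_nodes/stats/indexing_pressure
```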
This commit adds rejections when the indexing memory limits are
exceeded for primary or coordinating operations. The amount of bytes
allowed for indexing is controlled by a new setting
`indexing_limits.memory.limit`.
We don't need to switch to the generic or snapshot pool for loading
cached repository data (i.e. most of the time in normal operation).
This makes `executeConsistentStateUpdate` less heavy if it has to retry
and lowers the chance of having to retry in the first place.
Also, this change allowed simplifying a few other spots in the codebase
where we would fork off to another pool just to load repository data.
No need to do any switch to the `SNAPSHOT` pool here, the blob store
repo handles all its writes async on the `SNAPSHOT` pool so we're just
needlessly context-switching to enqueue those tasks there.
Also cleaned up the source only repository (the only override to `finalizeSnapshot`)
to make it clear that no IO is happening there and we don't need to run it on the
`SNAPSHOT` pool either.
Follow up to #56365. Instead of redundantly checking snapshots for completion
over and over, just track the completed snapshots in the CS updates that complete
them instead of looping over the same snapshot entries over and over.
Also, in the batched snapshot shard status updates, only check for completion
of a snapshot entry if it isn't already finalizing.
Using G1 GC, Elasticsearch can in rare cases see heap usage go above
the real memory circuit breaker limit and stay there for an extended
period. This situation will persist until the next young GC. The circuit
breaking itself hinders that from occurring in a timely manner since it
breaks all requests before real work is done.
This commit gently nudges G1 to do a young GC and then double checks
that heap usage is still above the real memory circuit breaker limit
before throwing the circuit breaker exception.
Related to #57202
Backport of #59293 to 7.x branch.
* Create new data-stream xpack module.
* Move TimestampFieldMapper to the new module;
this means that storing a composable index template
with a data stream definition only works with the default
distribution. This way data streams can only be used
with the default distribution, since a data stream can
currently only be created if a matching composable index
template with a data stream definition exists.
* Renamed `_timestamp` meta field mapper
to `_data_stream_timestamp` meta field mapper.
* Add logic to the put composable index template API
to fail if the `_data_stream_timestamp` meta field mapper
isn't registered, so that a more understandable
error is returned when attempting to store a template
with a data stream definition via the OSS distribution.
In a follow up the data stream transport and
rest actions can be moved to the xpack data-stream module.
The code path for closed indices is dead code here ever since #39644,
because `shards(currentState, indexIds, ...)` no longer sets
`MISSING` on a closed index's shard that is assigned. Before that change it would always set `MISSING` for a closed index's shard even if it was assigned.
=> simplified the code accordingly.
In #52680 we introduced a new health check mechanism. This commit fixes
up some related test failures on Windows caused by erroneously assuming
that all paths begin with `/`.
Closes #59380
With #55773 the snapshot INIT state step has become obsolete. We can set up the snapshot directly in one single step to simplify the state machine.
This is a big help for building concurrent snapshots because it allows us to establish a deterministic order of operations between snapshot create and delete operations since all of their entries now contain a repository generation. With this change simple queuing up of snapshot operations can and will be added in a follow-up.
We have a number of parameters which are universally parsed by almost all
mappers, whether or not they make sense. Migrating the binary and boolean
mappers to the new style of declaring their parameters explicitly has meant
that these universal parameters stopped being accepted, which would break
existing mappings.
This commit adds some extra logic to ParametrizedFieldMapper that checks
for the existence of these universal parameters, and issues a warning on
7.x indices if it finds them. Indices created in 8.0 and beyond will throw an
error.
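A purely illustrative sketch, assuming `index_options` is one of the universally parsed parameters in question: a mapping such as the following would now be accepted on a 7.x index with a deprecation warning, while an 8.0 index would reject it.
```
PUT /my-7x-index
{
  "mappings": {
    "properties": {
      "flag": {
        "type": "boolean",
        "index_options": "freqs"
      }
    }
  }
}
```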
Fixes #59359
This refactoring has three motivations:
1. Separate all master node steps during snapshot operations from all data node steps in code.
2. Set up next steps in concurrent repository operations and general improvements by centralizing tracking of each shard's state in the repository in `SnapshotsService` so that operations for each shard can be linearized efficiently (i.e. without having to inspect the full snapshot state for all shards on every cluster state update, allowing us to track more in memory and only fall back to inspecting the full CS on master failover like we do in the snapshot shards service).
* This PR already contains some best effort examples of this, but obviously this could be way improved upon still (just did not want to do it in this PR for complexity reasons)
3. Make the `SnapshotsService` less expensive on the CS thread for large snapshots
With the removal of mapping types and the immutability of FieldTypeLookup in #58162, we no longer
have any cause to compare MappedFieldType instances. This means that we can remove all equals
and hashCode implementations, and in addition we no longer need the clone implementations which
were required for equals/hashcode testing. This greatly simplifies implementing new MappedFieldTypes,
which will be particularly useful for the runtime fields project.
This modifies the `variable_width_histogram`'s distant bucket handling
to:
1. Properly handle integer overflows
2. Recalculate the average distance when new buckets are added on the
ends. This should slow down the rate at which we build extra buckets
as we build more of them.
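For reference, the aggregation in question is requested roughly like this (field name and bucket count hypothetical):
```
GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "prices": {
      "variable_width_histogram": {
        "field": "price",
        "buckets": 10
      }
    }
  }
}
```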
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
In #52680 we introduced a new health check mechanism. This commit fixes
up some sporadic related test failures, and improves the behaviour of
the `FollowersChecker` slightly in the case that no retries are
configured.
Closes #59252
Closes #59172
Today `NodeEnvironment#findAllShardIds` enumerates the index directories
in each data path in order to find one with a specific name. Since we
already know the name of the folder we seek we can construct the path
directly and avoid this directory listing. This commit does that.
The FieldMapper infrastructure currently has a bunch of shared parameters, many of which
are only applicable to a subset of the 41 mapper implementations we ship with. Merging,
parsing and serialization of these parameters are spread around the class hierarchy, with
much repetitive boilerplate code required. It would be much easier to reason about these
things if we could declare the parameter set of each FieldMapper directly in the implementing
class, and share the parsing, merging and serialization logic instead.
This commit is a first effort at introducing a declarative parameter style. It adds a new FieldMapper
subclass, ParametrizedFieldMapper, and refactors two mappers, Boolean and Binary, to use it.
Parameters are declared on Builder classes, with the declaration including the parameter name,
whether or not it is updateable, a default value, how to parse it from mappings, and how to
extract it from another mapper at merge time. Builders have a getParameters method, which
returns a list of the declared parameters; this is then used for parsing, merging and serialization.
Merging is achieved by constructing a new Builder from the existing Mapper, and merging in
values from the merging Mapper; conflicts are all caught at this point, and if none exist then a new,
merged, Mapper can be built from the Builder. This allows all values on the Mapper to be final.
Other mappers can be gradually migrated to this new style, and once they have all been refactored
we can merge ParametrizedFieldMapper and FieldMapper entirely.
We are leaking a FileChannel in #39585 if we release a safe commit with
CancellableThreads. Although it is a bug in Lucene where we do not close
a FileChannel if we failed to create a NIOFSIndexInput, I think it's
safer if we release a safe commit using the generic thread pool instead.
Closes #39585
Relates #45409
Backport of #59076 to 7.x branch.
The commit makes the following changes:
* The timestamp field of a data stream definition in a composable
index template can only be set to '@timestamp'.
* Removed the custom data stream timestamp field validation and reuse the validation from `TimestampFieldMapper`;
instead only check that the _timestamp field mapping has been defined on a backing index of a data stream.
* Moved code that injects the _timestamp meta field mapping from the `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template(...)` method
to the `MetadataIndexTemplateService#collectMappings(...)` method.
* Fixed a bug (#58956) that caused timestamp field validation to be performed
for each template instead of for the final mappings that are created.
* Only apply the _timestamp meta field if an index is created as part of a data stream or a data stream rollover;
this fixes a docs test where a regular index creation matched (logs-*) against a template with a data stream definition.
Relates to #58642
Relates to #53100
Closes #58956
Closes #58583
This makes a `parentCardinality` available to every `Aggregator`'s ctor
so it can make intelligent choices about how it collects bucket values.
This replaces `collectsFromSingleBucket` and is similar to it but:
1. It supports `NONE`, `ONE`, and `MANY` values and is generally
extensible if we decide we can use more precise counts.
2. It is more accurate. `collectsFromSingleBucket` assumed that all
sub-aggregations live under multi-bucket aggregations. This is
normally true but `parentCardinality` is properly carried forward
for single bucket aggregations like `filter` and for multi-bucket
aggregations configured in single-bucket form, like `range` with a
single range.
While I was touching every aggregation I renamed `doCreateInternal` to
`createMapped` because that seemed like a much better name and it was
right there, next to the change I was already making.
Relates to #56487
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
In order to ensure that we do not write a broken piece of `RepositoryData`
because the physical repository generation was moved ahead more than one step
by erroneous concurrent writing to a repository we must check whether or not
the current assumed repository generation exists in the repository physically.
Without this check we run the risk of writing on top of stale cached repository data.
Relates #56911
Currently we assert that the reason we fail collecting nodes in this
test is due to the fact that no seeds are available or no connections
could be established to cluster_2. However, the collection could fail if
we cannot establish connections to cluster_1. This commit adds that as
an acceptable assertion.
Today, we send operations in phase2 of peer recoveries batch by batch
sequentially. Normally that's okay as we should have a fairly small number of
operations in phase 2 due to the file-based threshold. However, if
phase1 takes a lot of time and we are actively indexing, then phase2 can
have a lot of operations to replay.
With this change, we will send multiple batches concurrently (defaults
to 1) to reduce the recovery time.
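A sketch of raising the concurrency, assuming the new knob is exposed as the dynamic cluster setting `indices.recovery.max_concurrent_operations`:
```
PUT /_cluster/settings
{
  "transient": {
    "indices.recovery.max_concurrent_operations": 2
  }
}
```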
Backport of #58018
This commit adds validation that, when a composable index template is updated, the number
of unreferenced data streams does not increase. While it is still possible to have data streams
without a backing template (through snapshot restoration), this reduces the chance of getting
into that scenario.
Relates to #53100
Currently in the recovery request tracker tests we place the futures
into the future map on the GENERIC thread. It is possible that the test
has already advanced past the point where we block on these futures
before they are placed in the map. This introduces other potential
failures as we expect all futures to have been completed. This commit fixes
the test by placing the futures in the map prior to dispatching.
The test would try to prepare a `Rounding` even when there aren't any
buckets. This would fail because there is no range over which to prepare
the rounding. It turns out that we don't need the rounding in that case
so we just use `null` then.
Closes #59131
* GET data stream API returns additional information (#59128)
This adds the data stream's index template, the configured ILM policy
(if any) and the health status of the data stream to the GET _data_stream
response.
Restoring a data stream from a snapshot could install a data stream that
doesn't match any composable templates. This also makes the `template`
field in the `GET _data_stream` response optional.
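Roughly, the enriched response would look like the sketch below (names and values hypothetical); `template` may be absent for a restored data stream that matches no composable template:
```
GET /_data_stream/my-data-stream

{
  "data_streams": [
    {
      "name": "my-data-stream",
      "timestamp_field": { "name": "@timestamp" },
      "indices": [ { "index_name": ".ds-my-data-stream-000001" } ],
      "generation": 1,
      "status": "GREEN",
      "template": "my-template",
      "ilm_policy": "my-lifecycle-policy"
    }
  ]
}
```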
(cherry picked from commit 0d9c98a82353b088c782b6a04c44844e66137054)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
If the recovery source is on an old node (before 7.2), then the recovery
target won't have the safe commit after phase1 because the recovery
source does not send the global checkpoint in the clean_files step. And
if the recovery fails and retries, then the recovery stage won't
transition properly. If a sync_id is used in peer recovery, then the
clean_files step won't be executed to move the stage to TRANSLOG.
Relates ##7187
Closes #57708
This request:
```
POST /_search
{
  "aggs": {
    "a": {
      "adjacency_matrix": {
        "filters": {
          "1": {
            "terms": { "t": { "index": "lookup", "id": "1", "path": "t" } }
          }
        }
      }
    }
  }
}
```
Would fail with a 500 error and a message like:
```
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_state_exception",
        "reason": "async actions are left after rewrite"
      }
    ]
  }
}
```
This fixes that by moving the query rewrite phase from a synchronous
call on the data nodes into the standard aggregation rewrite phase which
can properly handle the asynchronous actions.
Today we do not allow a node to start if its filesystem is readonly, but
it is possible for a filesystem to become readonly while the node is
running. We don't currently have any infrastructure in place to make
sure that Elasticsearch behaves well if this happens. A node that cannot
write to disk may be poisonous to the rest of the cluster.
With this commit we periodically verify that nodes' filesystems are
writable. If a node fails these writability checks then it is removed
from the cluster and prevented from re-joining until the checks start
passing again.
Closes #45286
Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com>
For #58994 it would be useful to be able to share test infrastructure.
This PR shares `AbstractSnapshotIntegTestCase` for that purpose, dries up SLM tests
accordingly and adds a shared and efficient (compared to the previous implementations)
way of waiting for no running snapshot operations to the test infrastructure to dry things up further.
* Enforce higher priority for RepositoriesService ClusterStateApplier
This avoids shard allocation failures when the repository instance
comes in the same ClusterState update as the shard allocation.
Backport of #58808
This allows pipeline aggregations to participate in the up-front rewrite
phase for searches, in particular, it allows them to load data that they
need asynchronously.
Relates to #58193
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
If the global checkpoint equals max_seq_no, then we won't reset an engine
(as all operations are safe), and max_seqno_of_updates_or_deletes
won't advance to max_seq_no.
Closes #58163
Part of the original PR was merged by #59028
(cherry picked from commit 2598327726124d8a86333f79cdc45bf6a4297dbc)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Backport of #58582 to 7.x branch.
This commit adds a new metadata field mapper that validates
that a document has exactly a single timestamp value in the data stream timestamp field and
that the timestamp field mapping only has `type`, `meta` or `format` attributes configured.
Other attributes can affect the guarantee that an index with this meta field mapper has a
usable timestamp field.
The MetadataCreateIndexService inserts a data stream timestamp field mapper whenever
a new backing index of a data stream is created.
Relates to #53100
Dry up tests that use a disruption that isolates the master from all other nodes.
Also, turn disruption types that have neither parameters nor state into constants
to make things a little clearer.
Working through a heap dump for an unrelated issue I found that we can easily rack up
tens of MBs of duplicate empty instances in some cases.
I moved to a static constructor to guard against that in all cases.
This is a follow-up to #57573. This commit combines coordinating and
primary bytes under the same "write" bucket. Double accounting is
prevented by only accounting the bytes at either the reroute phase or
the primary phase. TransportBulkAction calls execute directly, so the
operations handler is skipped and the bytes are not double accounted.
Since #55785, exists queries rewrite to MatchNoneQueryBuilder when the field is unmapped.
This change also introduced a bug in the `query_string` query: using `_exists_:foo`
throws an exception if the field is unmapped. This commit avoids the
exception if the query is built outside of an `ExistsQueryBuilder`.
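For instance, a query along these lines (field and index names hypothetical) used to fail when `foo` was unmapped and now simply matches no documents:
```
GET /my-index/_search
{
  "query": {
    "query_string": {
      "query": "_exists_:foo"
    }
  }
}
```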
Closes #58737
An old translog header does not have a checksum. If we flip the header
version of an empty translog to the older version, then we won't detect
that corruption, and translog will be considered clean as before.
Closes #58671
A regression in the mapping code led to geo_shape no longer supporting
array-valued fields. This commit fixes this support and adds an integration
test to make sure this problem does not return!