OpenSearch

Commit Graph

Author	SHA1	Message	Date
Nik Everett	28ef997953	Improve vwh's distant bucket handling (#59094 ) (#59248 ) This modifies the `variable_width_histogram`'s distant bucket handling to: 1. Properly handle integer overflows 2. Recalculate the average distance when new buckets are added on the ends. This should slow down the rate at which we build extra buckets as we build more of them. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-07-09 12:14:46 -04:00
Przemko Robakowski	c870d6e570	[7.x] Restart tests with data streams (#58330 ) (#59303 ) * Restart tests with data streams (#58330)	2020-07-09 17:52:20 +02:00
David Turner	d56fc72ee5	Fix node health-check-related test failures (#59277 ) In #52680 we introduced a new health check mechanism. This commit fixes up some sporadic related test failures, and improves the behaviour of the `FollowersChecker` slightly in the case that no retries are configured. Closes #59252 Closes #59172	2020-07-09 12:46:12 +01:00
David Turner	c80a9e2ec2	Skip unnecessary directory iteration (#59007 ) Today `NodeEnvironment#findAllShardIds` enumerates the index directories in each data path in order to find one with a specific name. Since we already know the name of the folder we seek we can construct the path directly and avoid this directory listing. This commit does that.	2020-07-09 11:56:41 +01:00
Alan Woodward	67a27e2b9d	Add declarative parameters to FieldMappers (#58663 ) The FieldMapper infrastructure currently has a bunch of shared parameters, many of which are only applicable to a subset of the 41 mapper implementations we ship with. Merging, parsing and serialization of these parameters are spread around the class hierarchy, with much repetitive boilerplate code required. It would be much easier to reason about these things if we could declare the parameter set of each FieldMapper directly in the implementing class, and share the parsing, merging and serialization logic instead. This commit is a first effort at introducing a declarative parameter style. It adds a new FieldMapper subclass, ParametrizedFieldMapper, and refactors two mappers, Boolean and Binary, to use it. Parameters are declared on Builder classes, with the declaration including the parameter name, whether or not it is updateable, a default value, how to parse it from mappings, and how to extract it from another mapper at merge time. Builders have a getParameters method, which returns a list of the declared parameters; this is then used for parsing, merging and serialization. Merging is achieved by constructing a new Builder from the existing Mapper, and merging in values from the merging Mapper; conflicts are all caught at this point, and if none exist then a new, merged, Mapper can be built from the Builder. This allows all values on the Mapper to be final. Other mappers can be gradually migrated to this new style, and once they have all been refactored we can merge ParametrizedFieldMapper and FieldMapper entirely.	2020-07-09 11:43:21 +01:00
Ignacio Vera	1ad00d1ceb	Add Support in geo_match enrichment policy for any type of geometry (#59276 ) geo_match enrichment works currently only with points. This change adds the ability to use any type of geometry.	2020-07-09 11:41:41 +02:00
Nhat Nguyen	6a0f7411e2	Do not release safe commit with CancellableThreads (#59182 ) We are leaking a FileChannel in #39585 if we release a safe commit with CancellableThreads. Although it is a bug in Lucene where we do not close a FileChannel if we failed to create a NIOFSIndexInput, I think it's safer if we release a safe commit using the generic thread pool instead. Closes #39585 Relates #45409	2020-07-08 13:51:48 -04:00
Nhat Nguyen	00c859bfca	Fix testSendSnapshotSendsOps We need to use a concurrent collection to keep track of the shipped operations as they can arrive concurrently since #58018. Relates #58018	2020-07-08 12:25:33 -04:00
Martijn van Groningen	17bd559253	Fix the timestamp field of a data stream to @timestamp (#59210 ) Backport of #59076 to 7.x branch. The commit makes the following changes: * The timestamp field of a data stream definition in a composable index template can only be set to '@timestamp'. * Removed custom data stream timestamp field validation and reuse the validation from `TimestampFieldMapper` and instead only check that the _timestamp field mapping has been defined on a backing index of a data stream. * Moved code that injects _timestamp meta field mapping from `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template58956(...)` method to `MetadataIndexTemplateService#collectMappings(...)` method. * Fixed a bug (#58956) that causes timestamp field validation to be performed for each template instead of for the final mappings that are created. * only apply _timestamp meta field if index is created as part of a data stream or data stream rollover, this fixes a docs test, where a regular index creation matches (logs-*) with a template with a data stream definition. Relates to #58642 Relates to #53100 Closes #58956 Closes #58583	2020-07-08 17:30:46 +02:00
Nik Everett	a29d3515a2	Improve cardinality measure used to build aggs (#56533 ) (#59107 ) This makes a `parentCardinality` available to every `Aggregator`'s ctor so it can make intelligent choices about how it collects bucket values. This replaces `collectsFromSingleBucket` and is similar to it but: 1. It supports `NONE`, `ONE`, and `MANY` values and is generally extensible if we decide we can use more precise counts. 2. It is more accurate. `collectsFromSingleBucket` assumed that all sub-aggregations live under multi-bucket aggregations. This is normally true but `parentCardinality` is properly carried forward for single bucket aggregations like `filter` and for multi-bucket aggregations configured in single-bucket form, like `range` with a single range. While I was touching every aggregation I renamed `doCreateInternal` to `createMapped` because that seemed like a much better name and it was right there, next to the change I was already making. Relates to #56487 Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-07-08 08:42:23 -04:00
Dan Hermann	90c8d3fc9d	IndexNameExpressionResolver::dataStreamNames should support exclusions	2020-07-08 07:35:52 -05:00
Armin Braun	9268b25789	Add Check for Metadata Existence in BlobStoreRepository (#59141 ) (#59216 ) In order to ensure that we do not write a broken piece of `RepositoryData` because the physical repository generation was moved ahead more than one step by erroneous concurrent writing to a repository we must check whether or not the current assumed repository generation exists in the repository physically. Without this check we run the risk of writing on top of stale cached repository data. Relates #56911	2020-07-08 14:25:01 +02:00
Tim Brooks	3700bd1c08	Fix assertion in testCollectNodes test (#58948 ) Currently we assert that the reason we fail collecting nodes in this test is due to the fact that no seeds are available or no connections could be established to cluster_2. However, the collection could fail if we cannot establish connections to cluster_1. This commit adds that as an acceptable assertion.	2020-07-07 21:37:10 -06:00
Nhat Nguyen	ef5c397c0f	Sending operations concurrently in peer recovery (#58018 ) Today, we send operations in phase2 of peer recoveries batch by batch sequentially. Normally that's okay as we should have a fairly small number of operations in phase 2 due to the file-based threshold. However, if phase1 takes a lot of time and we are actively indexing, then phase2 can have a lot of operations to replay. With this change, we will send multiple batches concurrently (defaults to 1) to reduce the recovery time. Backport of #58018	2020-07-07 22:03:31 -04:00
Lee Hinman	b832fe30ab	[7.x] Validate Data Streams reference a template on composable template update (#59106 ) (#59193 ) This commit adds validation that when a composable index template is updated, the number of unreferenced data streams does not increase. While it is still possible to have data streams without a backing template (through snapshot restoration), this reduces the chance of getting into that scenario. Relates to #53100	2020-07-07 15:38:27 -06:00
Tim Brooks	b1c3ad8f59	Fix race in RecoveryRequestTrackerTests (#59187 ) Currently in the recovery request tracker tests we place the futures into the future map on the GENERIC thread. It is possible that the test has already advanced past the point where we block on these futures before they are placed in the map. This introduces other potential failures as we expect all futures have been completed. This commit fixes the test by placing the futures in the map prior to dispatching.	2020-07-07 15:10:31 -06:00
Nik Everett	d536854879	Fix test bug in auto_date_histo The test would try to prepare a `Rounding` even when there aren't any buckets. This would fail because there is no range over which to prepare the rounding. It turns out that we don't need the rounding in that case so we just use `null` then. Closes #59131	2020-07-07 15:39:48 -04:00
Andrei Dan	24c6a30e2b	[7.9] GET data stream API returns additional information (#59128 ) (#59177 ) * GET data stream API returns additional information (#59128) This adds the data stream's index template, the configured ILM policy (if any) and the health status of the data stream to the GET _data_stream response. Restoring a data stream from a snapshot could install a data stream that doesn't match any composable templates. This also makes the `template` field in the `GET _data_stream` response optional. (cherry picked from commit 0d9c98a82353b088c782b6a04c44844e66137054) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-07-07 20:30:09 +01:00
Nhat Nguyen	de6ac6aea6	Fix recovery stage transition with sync_id (#57754 ) If the recovery source is on an old node (before 7.2), then the recovery target won't have the safe commit after phase1 because the recovery source does not send the global checkpoint in the clean_files step. And if the recovery fails and retries, then the recovery stage won't transition properly. If a sync_id is used in peer recovery, then the clean_files step won't be executed to move the stage to TRANSLOG. Relates ##7187 Closes #57708	2020-07-07 12:00:37 -04:00
Rene Groeschke	a896df53ac	Remove misc dependency related deprecation warnings (7.x backport) (#59122 ) * Fix dependency related deprecations (#58892) * Fix classpath setup for forbiddenapi usage	2020-07-07 17:10:31 +02:00
Nik Everett	eb169ae226	Fix lookup support in adjacency matrix (backport of #59099 ) (#59108 ) This request: ``` POST /_search { "aggs": { "a": { "adjacency_matrix": { "filters": { "1": { "terms": { "t": { "index": "lookup", "id": "1", "path": "t" } } } } } } } } ``` Would fail with a 500 error and a message like: ``` { "error": { "root_cause": [ { "type": "illegal_state_exception", "reason":"async actions are left after rewrite" } ] } } ``` This fixes that by moving the query rewrite phase from a synchronous call on the data nodes into the standard aggregation rewrite phase which can properly handle the asynchronous actions.	2020-07-07 10:28:20 -04:00
David Turner	46c8d00852	Remove nodes with read-only filesystems (#52680 ) (#59138 ) Today we do not allow a node to start if its filesystem is readonly, but it is possible for a filesystem to become readonly while the node is running. We don't currently have any infrastructure in place to make sure that Elasticsearch behaves well if this happens. A node that cannot write to disk may be poisonous to the rest of the cluster. With this commit we periodically verify that nodes' filesystems are writable. If a node fails these writability checks then it is removed from the cluster and prevented from re-joining until the checks start passing again. Closes #45286 Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com>	2020-07-07 14:00:02 +01:00
Francisco Fernández Castaño	1ced3f0eb3	Extract recovery files details to its own class (#59121 ) Backport of #59039	2020-07-07 12:35:57 +02:00
Ignacio Vera	5cc6457ed8	upgrade to lucene-8.6.0-snapshot-6a715e2ecc3 (#59091 ) (#59120 )	2020-07-07 12:07:41 +02:00
Armin Braun	d6d6df16bb	Share IT Infrastructure between Core Snapshot and SLM ITs (#59082 ) (#59119 ) For #58994 it would be useful to be able to share test infrastructure. This PR shares `AbstractSnapshotIntegTestCase` for that purpose, dries up SLM tests accordingly and adds a shared and efficient (compared to the previous implementations) way of waiting for no running snapshot operations to the test infrastructure to dry things up further.	2020-07-07 12:04:41 +02:00
David Turner	ef2f0d1f67	Inline no-op IndicesModule#getEngineFactories (#59051 ) This method was introduced in #31183 but it has no effect and is never overridden so this commit removes it.	2020-07-07 09:15:20 +01:00
Francisco Fernández Castaño	0752a86fe5	Enforce higher priority for RepositoriesService ClusterStateApplier (#59040 ) * Enforce higher priority for RepositoriesService ClusterStateApplier This avoids shards allocation failures when the repository instance comes in the same ClusterState update as the shard allocation. Backport of #58808	2020-07-07 09:51:08 +02:00
Howard	00ed31d000	Remove IndexShardRoutingTable#primaryAsList (#59044 )	2020-07-07 07:34:32 +01:00
Nik Everett	be13dea113	Drop a TODO from the terms aggregator (#59100 ) We did it in #56487.	2020-07-06 17:46:06 -04:00
Nik Everett	eff5f4d234	Add pipeline aggregations to the rewrite phase (backport #58878 ) (#59081 ) This allows pipeline aggregations to participate in the up-front rewrite phase for searches, in particular, it allows them to load data that they need asynchronously. Relates to #58193 Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-07-06 15:13:45 -04:00
Nhat Nguyen	e827d2ed92	Fix testRestoreLocalHistoryFromTranslogOnPromotion (#58745 ) If the global checkpoint equals max_seq_no, then we won't reset an engine (as all operations are safe), and max_seqno_of_updates_or_deletes won't advance to max_seq_no. Closes #58163	2020-07-06 12:19:45 -04:00
Andrei Dan	2d516d7bcc	[7.x] Search all (_all, *) resolves data streams too (#58869 ) (#59058 ) Part of the original PR was merged by #59028 (cherry picked from commit 2598327726124d8a86333f79cdc45bf6a4297dbc) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-07-06 14:19:15 +01:00
Dan Hermann	550dcb0ca6	[7.x] Delete data stream API accepts multiple names (#59064 )	2020-07-06 08:06:10 -05:00
Armin Braun	722d94688b	Fix MinimumMasterNodesIT Test (#59054 ) (#59057 ) Tiny oversight in dee9e048bdcc5ba59f20d2554e989015463df05a caused the `otherNodes` collection to incorrectly contain `master` here.	2020-07-06 13:00:15 +02:00
Armin Braun	62eabdac6e	Dry up Snapshot ITs further (#59035 ) (#59052 ) Some more obvious cleaning up of the snapshot ITs. follow up to #58818	2020-07-06 12:26:42 +02:00
Martijn van Groningen	f0dd9b4ace	Add data stream timestamp validation via metadata field mapper (#59002 ) Backport of #58582 to 7.x branch. This commit adds a new metadata field mapper that validates that a document has exactly a single timestamp value in the data stream timestamp field and that the timestamp field mapping only has `type`, `meta` or `format` attributes configured. Other attributes can affect the guarantee that an index with this meta field mapper has a useable timestamp field. The MetadataCreateIndexService inserts a data stream timestamp field mapper whenever a new backing index of a data stream is created. Relates to #53100	2020-07-06 11:32:33 +02:00
Armin Braun	49857cc35d	Dry up Master Disconnect Disruption Tests (#58953 ) (#59050 ) Dry up tests that use a disruption that isolates the master from all other nodes. Also, turn disruption types that have neither parameters nor state into constants to make things a little clearer.	2020-07-06 11:04:24 +02:00
Nhat Nguyen	62763b177d	Implement toString for BulkByScrollTask (#59042 ) We should implement "toString" of BulkByScrollTask.StatusOrException to have a meaningful log message when a reindex task completes.	2020-07-05 22:06:56 -04:00
Armin Braun	071d8b2c1c	Deduplicate Empty InternalAggregations (#58386 ) (#59032 ) Working through a heap dump for an unrelated issue I found that we can easily rack up tens of MBs of duplicate empty instances in some cases. I moved to a static constructor to guard against that in all cases.	2020-07-04 14:02:16 +02:00
Dan Hermann	7c43cbca82	[7.x] Ignore matching data streams if include_data_streams is false (#59028 )	2020-07-03 14:51:32 -05:00
Dan Hermann	c1781bc7e7	[7.x] Add include_data_streams flag for authorization (#59008 )	2020-07-03 12:58:39 -05:00
Dan Hermann	5e7746d3bd	[7.x] Mirror privileges over data streams to their backing indices (#58991 )	2020-07-03 06:33:38 -05:00
Armin Braun	d22dd437f1	Fix Two Common Zero Len Array Instantiations (#58944 ) (#58993 ) Two spots I found in which we commonly instantiate a non-trivial number of zero length arrays.	2020-07-03 09:18:14 +02:00
Nhat Nguyen	65645217bc	Handle IOException while checking translog corruption We can hit an IOException while reading a translog header after corrupting it. Relates #58866	2020-07-02 22:38:05 -04:00
Tim Brooks	dc9e364ff2	Count coordinating and primary bytes as write bytes (#58984 ) This is a follow-up to #57573. This commit combines coordinating and primary bytes under the same "write" bucket. Double accounting is prevented by only accounting the bytes at either the reroute phase or the primary phase. TransportBulkAction calls execute directly, so the operations handler is skipped and the bytes are not double accounted.	2020-07-02 19:48:19 -06:00
Mark Vieira	8fca312a3a	Mute WriteMemoryLimitsIT.testWriteBytesAreIncremented	2020-07-02 16:58:23 -07:00
Tim Brooks	9d1bf383d0	Add test assertions to ensure write bytes released (#58970 ) This is a follow-up to #57573. This commit ensures that the bytes marked in WriteMemoryLimits are released by any test using an internal test cluster.	2020-07-02 17:38:23 -06:00
Tim Brooks	1ef2cd7f1a	Add memory tracking to queued write operations (#58957 ) Currently we do not track the memory consumed by in-process write operations. This commit adds a mechanism to track write operation memory usage.	2020-07-02 14:14:57 -06:00
Jim Ferenczi	a4e08acdd1	Fix exists query on unmapped field in query_string (#58804 ) Since #55785, exists queries rewrite to MatchNoneQueryBuilder when the field is unmapped. This change also introduced a bug in the `query_string` query: using an unmapped field like `_exists_:foo` throws an exception. This commit avoids the exception if the query is built outside of an `ExistsQueryBuilder`. Closes #58737	2020-07-02 21:52:03 +02:00
Nhat Nguyen	be804b765d	Avoid flipping translog header version (#58866 ) An old translog header does not have a checksum. If we flip the header version of an empty translog to the older version, then we won't detect that corruption, and translog will be considered clean as before. Closes #58671	2020-07-02 14:34:19 -04:00

5006 Commits