OpenSearch

Commit Graph

Author	SHA1	Message	Date
Armin Braun	7215201406	Track Shard-Snapshot Index Generation at Repository Root (#48371) This change adds a new field `"shards"` to `RepositoryData` that contains a mapping of `IndexId` to a `String[]`. This string array can be accessed by shard id to get the generation of a shard's shard folder (i.e. the `N` in the name of the currently valid `/indices/${indexId}/${shardId}/index-${N}` for the shard in question). This allows for creating a new snapshot in the shard without doing any LIST operations on the shard's folder. In the case of AWS S3, this saves about 1/3 of the cost for updating an empty shard (see #45736) and removes one out of two remaining potential issues with eventually consistent blob stores (see #38941 ... now only the root `index-${N}` is determined by listing). Also, and equally if not more importantly, a number of possible failure modes on eventually consistent blob stores like AWS S3 are eliminated by moving all delete operations to the `master` node and moving from incremental naming of shard-level index-N to uuid suffixes for these blobs. This change moves the deletion of the previous shard-level `index-${uuid}` blob from the data node to the master node, allowing for a safe and consistent update of the shard's generation in the `RepositoryData` by first updating `RepositoryData` and then deleting the now-unreferenced previous `index-${uuid}` blob. __No deletes are executed on the data nodes at all for any operation with this change.__ Note also: previous issues with hanging data nodes interfering with master nodes are now completely impossible, even on S3 (see next section for details). This change renames the shard-level `index-${N}` blobs to use a uuid suffix, `index-${UUID}`. The reason for this is that writing a new shard-level `index-` generation blob is no longer atomic in its effect.
Not only does the blob have to be written to have an effect, it must also be referenced by the root-level `index-N` (`RepositoryData`) to become an effective part of the snapshot repository. This leads to a problem if we were to use incrementing names like we did before. If a blob `index-${N+1}` is written but, due to a node/network/cluster/... crash, the root-level `RepositoryData` has not been updated, then a future operation will determine the shard's generation to be `N` and try to write a new `index-${N+1}` to the already existing path. Updates like that are problematic on S3 for consistency reasons, but also create numerous issues when thinking about stuck data nodes. Previously, data nodes that were tasked to write `index-${N+1}` but got stuck and tried to do so after some other node had already written `index-${N+1}` were prevented from doing so (except on S3) by us not allowing overwrites for that blob, and thus no corruption could occur. Were we to continue using incrementing names, we could not do this. The stuck-node scenario would either allow for overwriting the `N+1` generation or force us to continue using a `LIST` operation to figure out the next `N` (which would make this change pointless). With uuid naming and moving all deletes to `master`, this becomes a non-issue. Data nodes write updated shard-generation `index-${uuid}` blobs, and `master` makes those `index-${uuid}` blobs part of the `RepositoryData` that it deems correct and cleans up all `index-` blobs that are unused. Co-authored-by: Yannick Welsch <yannick@welsch.lu> Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>	2019-10-23 10:58:26 +01:00
Jim Ferenczi	50f565b158	SearchSlowLog uses a non thread-safe object to escape json (#48363) This commit fixes the usage of JsonStringEncoder#quoteAsUTF8 in the SearchSlowLog. JsonStringEncoder#getInstance should always be called to get a thread-local object, but this assumption was broken by #44642. This means that any slow log can throw an ArrayIndexOutOfBoundsException (AIOOBE), since it uses the same byte array concurrently. Closes #48358	2019-10-23 10:23:06 +02:00
Armin Braun	8a02a5fc7d	Simplify Shard Snapshot Upload Code (#48155) (#48345) The code here was needlessly complicated when it enqueued all file uploads up-front. Instead, we can go with a cleaner worker + queue pattern here by taking the max-parallelism from the threadpool info. Also, I slightly simplified the rethrow and listener handling (a step listener is pointless when you add the callback on the next line), since I noticed that we were needlessly rethrowing in the same code and that wasn't worth a separate PR.	2019-10-22 17:17:09 +01:00
Nhat Nguyen	d0a4bad95b	Use MultiFileTransfer in CCR remote recovery (#44514) Relates #44468	2019-10-21 23:30:52 -04:00
Armin Braun	e65c60915a	Cleanup FileRestoreContext Abstractions (#48173) (#48300) This class is only used by the blob store repository and CCR, and the abstractions didn't really make sense, with CCR ignoring the concrete `restoreFiles` method completely and having a method used only by the blob store overridden as unsupported. => Moved to a more fitting set of abstractions => DRYed up the stream wrapping in `BlobStoreRepository` a little now that the `restoreFile` method could be simplified Relates #48110 as it makes changing the API of `FileRestoreContext` to what is needed for async restores simpler	2019-10-21 17:30:35 +02:00
Armin Braun	dc08feadc6	Remove Redundant Version Param from Repository APIs (#48231) (#48298) This parameter isn't used by any implementation	2019-10-21 16:20:45 +02:00
David Turner	672b2a92ca	Fix compile error from previous commit (#48230) The previous commit, `3a6fa0bbdb`, introduced a compile error that was fixed locally but not committed. This commit adds the missing change.	2019-10-21 08:54:04 +01:00
David Turner	3a6fa0bbdb	Close query cache on index service creation failure (#48230) Today it is possible that we create the `QueryCache` and then fail to create the owning `IndexService` and this means we do not close the `QueryCache` again. This commit addresses that leak. Fixes #48186	2019-10-21 08:46:53 +01:00
Ignacio Vera	b1224fca8c	upgrade to Lucene-8.3.0-snapshot-25968e3b75e (#48227)	2019-10-21 08:21:09 +02:00
Takuya Kajiwara	a56daeae2d	[DOCS] Fix typos in InternalEngine.java comments (#46861)	2019-10-18 10:36:58 -04:00
David Turner	a8bcbbc38a	Quieter logging from the DiskThresholdMonitor (#48115) Today if an Elasticsearch node reaches a disk watermark then it will repeatedly emit logging about it, which implies that some action needs to be taken by the administrator. This is misleading. Elasticsearch strives to keep nodes under the high watermark, but it is normal to have a few nodes occasionally exceed this level. Nodes may be over the low watermark for an extended period without any ill effects. This commit enhances the logging emitted by the `DiskThresholdMonitor` to be less misleading. The expected case of hitting the high watermark and immediately relocating one or more shards to bring the node back under the watermark again is reduced in severity to `INFO`. Additionally, `INFO` messages are not emitted repeatedly. Fixes #48038	2019-10-18 15:00:14 +01:00
Armin Braun	1157775074	Remove Support for pre-5.x Indices in Restore (#48181) (#48199) The logic for handling empty segment files has been unnecessary ever since #24021, which removed the support for these files in 6.x, so we can safely remove the support for restoring these from 7.x+ to simplify the code.	2019-10-18 09:45:07 +02:00
Przemyslaw Gomulka	02d18f5c1e	[7.x] Slow log must use separate underlying logger for each index BACKPORT(#47234) (#48176) * Slow log must use separate underlying logger for each index (#47234) SlowLog instances should not share the same underlying logger, as that would cause different indexes to override each other's log levels. When creating the underlying logger, a unique per-index identifier should be used: Name + IndexSettings.UUID. Closes #42432	2019-10-17 20:04:57 +02:00
Armin Braun	04e3316408	Stop Resolving Fallback IndexId (#48141) (#48204) There is no reason to still resolve the fallback `IndexId` here. It only applies to `2.x` repos and those we can't read anymore anyway because they use an `/index` instead of an `/index-N` blob at the repo root for which at least 7.x+ does not contain the logic to find it.	2019-10-17 19:27:49 +02:00
Stuart Tettemer	356eef00c8	Scripting: get context names REST API (#48026) (#48168) Adds `GET /_script_context`, returning a `contexts` object with each available context as a key whose value is an empty object, e.g. ``` { "contexts": { "aggregation_selector": {}, "aggs": {}, "aggs_combine": {}, ... } } ``` refs: #47411	2019-10-17 09:08:55 -06:00
Armin Braun	0ca7cc1848	Safely Close Repositories on Node Shutdown (#48020) (#48107) We were not closing repositories on node shutdown. In production, this has little effect, but in tests, shutting down a node that uses `MockRepository` and is currently stuck in a simulated blocked-IO situation will only unblock when the node's threadpool is interrupted. In some edge cases (many snapshot threads and some CI slowness), this might result in the execution taking longer than 5s to release all the shard stores, and thus we fail the assertion about unreleased shard stores in the internal test cluster. Regardless of tests, I think we should close repositories and release resources associated with them when closing a node, and not just when removing a repository from the cluster state (CS) with running nodes, as this behavior is really unexpected. Fixes #47689	2019-10-17 07:55:05 +02:00
Armin Braun	f1bc3a0753	Remove TestLogging for #46701 (#48156) (#48160) This hasn't failed in 5 weeks now. Removing the test logging and closing the issue. Closes #46701	2019-10-17 07:54:20 +02:00
Jack Conradson	fa99721295	Drop stored scripts with the old-style id (#48078) This PR fixes #47593. Stored scripts with the old-style id of lang#id are preserved through the upgrade process but are no longer accessible in recent versions. This fix drops those scripts altogether, since there is no way for a user to access them.	2019-10-16 16:10:31 -07:00
jimczi	b2dc98562b	Bump version to 7.6	2019-10-16 15:57:12 +02:00
Klemen Košir	8243e99134	Fix typo in QueryBuilders Javadoc. (#47362) This PR fixes a typo in the Javadoc for terms queries in QueryBuilders.	2019-10-15 16:16:21 -07:00
Martijn van Groningen	aff0c9babc	This commit merges (#48040) the enrich-7.x feature branch, which is a backport merge, and adds a new ingest processor, named the enrich processor, that allows documents being ingested to be enriched with data from other indices. Besides the new enrich processor, this PR adds several APIs to manage an enrich policy. An enrich policy is in charge of making the data from other indices available to the enrich processor in an efficient manner. Related to #32789	2019-10-15 17:31:45 +02:00
jimczi	b858e19bcc	Revert #46598 that breaks the cacheability of the sub search contexts.	2019-10-15 09:40:59 +02:00
Martijn van Groningen	cc4b6c43b3	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-10-15 07:23:47 +02:00
Jim Ferenczi	ef02a736ca	Don't apply the plugin's reader wrapper in can_match phase (#47816) This change modifies the local execution of the `can_match` phase to not apply the plugin's reader wrapper (if it is configured) when acquiring the searcher. We must ensure that the phase runs quickly, and since we don't know the cost of applying the wrapper, it is preferable to avoid it entirely. The can_match phase can afford false positives, so it is also safe for the built-in plugins that use this functionality. Closes #46817	2019-10-14 13:07:05 +02:00
Martijn van Groningen	d4901a71d7	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-10-14 10:27:17 +02:00
Nhat Nguyen	8180cf1e68	Mute testDoNotInfinitelyWaitForMapping Tracked at #47974	2019-10-13 22:06:50 -04:00
Nhat Nguyen	2995d4a9c0	Sequence number based replica allocation (#46959) With this change, shard allocation prefers allocating replicas on a node that already has a copy of the shard that is as close as possible to the primary, so that it is as cheap as possible to bring the new replica in sync with the primary. Furthermore, if we find a copy that is identical to the primary, then we cancel an ongoing recovery, because the new copy, which is identical to the primary, needs no work to recover as a replica. With this improvement, we no longer need to perform a synced flush before performing a rolling upgrade or full cluster restart. Closes #46318	2019-10-13 22:06:50 -04:00
Nhat Nguyen	4f06225928	Avoid unneeded refresh with concurrent realtime gets (#47895) This change should reduce refreshes for a use-case where we perform multiple realtime gets at the same time on an active index. Currently, we call refresh whenever the index operation is still in the versionMap. However, by the time we call refresh, that operation might already be, or is about to be, included in the latest reader. Hence, we do not need to refresh. Adding another lock here is not an issue, as the refresh is already sequential.	2019-10-13 20:08:21 -04:00
Nhat Nguyen	4c1bb210cb	Force flush in translog retention policy test (#47879) If we roll translog but do not index, then a flush without force is a noop. In this case, the number of retained translog files will be higher than the value specified by the retention policy. Closes #4741	2019-10-13 20:08:21 -04:00
Przemyslaw Gomulka	6ab58de7ef	[7.x] Enable ResolverStyle.STRICT for java formatters backport(#46675) (#47913) Joda was using ResolverStyle.STRICT when parsing. This means that dates will be validated to have a correct year, month-of-year and day-of-month. However, we also want to make it work with year-of-era as Joda used to, hence the custom TemporalQuery for LocalDate in DateFormatters.from. Within DateFormatters we use the correct uuuu year instead of yyyy (year-of-era). Worth noting: if yyyy (without an era) is used in code, the parsing result will be a TemporalAccessor that will fail to be converted into a LocalDate. We mostly use DateFormatters.from, so this takes care of it. If possible, the uuuu format should be used.	2019-10-11 21:19:56 +02:00
Christoph Büscher	2ef12c37f5	Add builder for distance_feature to QueryBuilders (#47846) The QueryBuilders convenience class is currently missing a shortcut to construct a DistanceFeatureQueryBuilder, which is added here. Closes #47767	2019-10-11 18:20:01 +02:00
Alan Woodward	ec9198d0e2	Adjust Version.V_6_8_4 to refer to Lucene 7.7.2 (#47926) 6.8.4 will ship with Lucene 7.7.2, so we need to change our version settings to reflect this. Relates #47901	2019-10-11 17:01:42 +01:00
David Turner	ba62eb3dce	Allow truncation of clean translog (#47866) Today the `elasticsearch-shard remove-corrupted-data` tool will only truncate a translog it determines to be corrupt. However there may be other cases in which it is desirable to truncate the translog, for instance if an operation in the translog cannot be replayed for some reason other than corruption. This commit adds a `--truncate-clean-translog` option to skip the corruption check on the translog and blindly truncate it.	2019-10-11 15:48:12 +01:00
Henning Andersen	a0d0866f59	Shrink should not touch max_retries (#47719) Shrink would set `max_retries=1` in order to avoid retrying. This however sticks to the shrunk index afterwards, causing issues when a shard copy later fails to allocate just once. Avoiding a retry of a shrink makes sense, since there is no new node to allocate to and a retry will likely fail again. However, the downside of having max_retries=1 afterwards outweighs the benefit of not retrying the failed shrink a few times. This change ensures shrink no longer sets max_retries and also makes all resize operations (shrink, clone, split) leave the setting at its default value rather than copy it from the source.	2019-10-11 14:22:56 +02:00
Przemyslaw Gomulka	0c439fe495	[7.x] Allow partial parsing dates (#47872) backport(#46814) Enable partial parsing of the date part. This makes the behaviour of the java.time implementation the same as with Joda. 2018, 2018-01 and 2018-01-01 are all valid dates for date_optional_time or strict_date_optional_time closes #45284 closes #47473	2019-10-11 11:17:19 +02:00
Zachary Tong	2de3411c9c	Make sibling pipeline agg ctors protected (#42808) SiblingPipelineAggregator is public, but its ctor was package-private. It should be protected so that plugin authors can extend it and implement their own sibling pipeline aggs.	2019-10-10 12:31:14 -04:00
Martijn van Groningen	102016d571	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-10-10 14:44:05 +02:00
Jim Ferenczi	bd6e2592a7	Remove the SearchContext from the highlighter context (#47733) Today built-in highlighters and plugins have access to the SearchContext through the highlighter context. However, most of the information exposed in the SearchContext is not needed, and a QueryShardContext would be enough to perform highlighting. This change replaces the SearchContext with the information that is absolutely required by highlighters: a QueryShardContext and the SearchContextHighlight. This change reduces the exposure of the complex SearchContext and removes the need to clone it in the percolator sub phase. Relates #47198 Relates #46523	2019-10-10 10:34:10 +02:00
Jim Ferenczi	3d334a262b	Ensure that we don't call listener twice when detecting a partial failure in _search (#47694) This change fixes a bug that can occur when a shard failure is detected while we build the search response and accepting partial failures is set to false. In this case we currently call onFailure on the provided listener but also continue the search as if the failure didn't occur. This can lead to the listener being called twice, once with onFailure and once with onSuccess, which is forbidden by design.	2019-10-10 09:59:49 +02:00
dengweisysu	dc4224fbdf	Sync translog without lock before trim unreferenced readers (#47790) This commit is similar to the optimization made in #45765. With this change, we fsync most of the data of the current generation without holding writeLock when trimming unreferenced readers. Relates #45765	2019-10-09 17:56:30 -04:00
Armin Braun	302e09decf	Simplify some Common ActionRunnable Uses (#47799) (#47828) Especially in the snapshot code there's a lot of logic chaining `ActionRunnables` in tricky ways now and the code is getting hard to follow. This change introduces two convenience methods that make it clear that a wrapped listener is invoked with certainty in some trickier spots and shortens the code a bit.	2019-10-09 23:29:50 +02:00
Igor Motov	12e4e7ef54	Geo: implement proper handling of out of bounds geo points (#47734) This is the first iteration in improving the handling of out-of-bounds geo points with a latitude outside of the -90 to +90 range or a longitude outside of the -180 to +180 range. Relates to #43916	2019-10-09 20:30:59 +04:00
Igor Motov	f8b8afdc70	Geo: Fixes indexing of linestrings that go around the globe (#47471) LINESTRING (0 0, 720 20) is now decomposed into 3 linestrings: multilinestring ( (0.0 0.0, 180.0 5.0), (-180.0 5.0, 180.0 15.0), (-180.0 15.0, 0.0 20.0) ) It also fixes issues with linestrings that intersect the antimeridian more than 5 times. Fixes #43837 Fixes #43826	2019-10-09 20:30:59 +04:00
Tim Brooks	d18ff24dbe	Fix BulkByScrollResponseTests exception assertions (#45519) Currently, in the XContent serialization tests we compare the exception messages that are serialized. These exception messages are not equivalent because the exception often changes when serialized to XContent. This commit removes this assertion.	2019-10-09 10:15:58 -06:00
Tim Brooks	02622c1ef9	Fix issues with serializing BulkByScrollResponse (#45357) Currently there are two issues with serializing BulkByScrollResponse. First, when deserializing from XContent, indexing exceptions and search exceptions are switched. Second, search exceptions do not retain the appropriate RestStatus code, so you must evaluate the status code from the exception. However, the exception class is not always correctly retained when serialized. This commit adds tests for the failure case. Additionally, it fixes the swapping of failure types and adds the rest status code to the search failure.	2019-10-09 10:12:14 -06:00
Martijn van Groningen	da1e2ea461	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-10-09 09:06:13 +02:00
Armin Braun	96b36b5a8c	Make loadShardSnapshot Exceptions Consistent (#47728) (#47735) Similar to #47507. We are throwing `SnapshotException` when you (and SLM tests) would expect a `SnapshotMissingException` for concurrent snapshot status and snapshot delete operations with a very low probability. Fixed the exception type and added a test for this scenario.	2019-10-08 21:04:51 +02:00
Armin Braun	5cef4752f7	Fix Ex. Handling in SnapshotsService#snapshots (#47507) (#47727) We're needlessly wrapping a `SnapshotMissingException` which itself is a `SnapshotException` when trying to load a missing snapshot. This leads to failure #47442 which expects a `SnapshotMissingException` in this case. Closes #47442	2019-10-08 17:01:54 +02:00
Henning Andersen	ce91ba7c25	Dangling indices strip aliases (#47581) Importing dangling indices with aliases risks breaking functionality that uses those aliases. For instance, writing to an alias may break if there is no is_write_index indication on the existing alias and the dangling index import adds a second index to the alias. Or an application could assume that the alias only ever points to one index, and suddenly seeing the alias also linked to an old index could break it. With this change, we strip the aliases from the index metadata found before importing a dangling index.	2019-10-08 12:09:30 +02:00
David Turner	bb5f750ab4	Deprecate include_relocations setting (#47443) Setting `cluster.routing.allocation.disk.include_relocations` to `false` is a bad idea since it will lead to the kinds of overshoot that were otherwise fixed in #46079. This commit deprecates this setting so it can be removed in the next major release.	2019-10-08 08:19:04 +01:00


3787 Commits
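Illustrative sketch for #48371 (Track Shard-Snapshot Index Generation at Repository Root). This is a minimal in-memory model, assuming a plain `String` stands in for `IndexId`; the class and methods (`ShardGenerations`, `blobNameFor`, `newGeneration`, `publish`) are hypothetical and are not the `RepositoryData` API. It shows the two properties the commit message relies on: the current shard-level blob name is resolved through the root-level mapping without a LIST call, and a UUID-named generation never reuses an existing path, so only the master, after switching the reference, reports the previous blob as safe to delete.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical, simplified model of the per-shard generations kept at the repository root.
final class ShardGenerations {
    // Index id (simplified to a String) -> generation per shard id.
    private final Map<String, String[]> shards = new HashMap<>();

    void registerIndex(String indexId, int shardCount) {
        shards.put(indexId, new String[shardCount]);
    }

    // Resolves the currently valid shard-level blob without listing the shard folder.
    String blobNameFor(String indexId, int shardId) {
        String generation = shards.get(indexId)[shardId];
        return generation == null ? null : "index-" + generation;
    }

    // Data node side: a fresh UUID can never collide with a blob that a stuck node
    // or an earlier, unpublished attempt has already written.
    static String newGeneration() {
        return UUID.randomUUID().toString();
    }

    // Master side: reference the new generation first, then hand back the blob name
    // that is now unreferenced so the caller can delete it (null if there was none).
    String publish(String indexId, int shardId, String newGeneration) {
        String[] generations = shards.get(indexId);
        String previous = generations[shardId];
        generations[shardId] = newGeneration;
        return previous == null ? null : "index-" + previous;
    }
}
```

Ordering is the point of the design: because the reference is updated before anything is deleted, a crash between the two steps leaves at worst an unreferenced blob, never a dangling reference.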
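Illustrative sketch for #48363 (SearchSlowLog uses a non thread-safe object to escape json). The commit describes an encoder that must be fetched per use through `getInstance()` because it reuses an internal byte array. The `BufferedEscaper` class below is a hypothetical stand-in, not Jackson's `JsonStringEncoder`: it keeps a growable buffer that makes a shared instance unsafe, and hands out one instance per thread via `ThreadLocal`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical escaper that reuses an internal buffer between calls, so a single
// instance must never be used by two threads at the same time.
final class BufferedEscaper {
    private byte[] buffer = new byte[16];

    // Escapes double quotes and backslashes JSON-style and returns UTF-8 bytes.
    byte[] quoteAsUtf8(String text) {
        int length = 0;
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (b == '"' || b == '\\') {
                length = append(length, (byte) '\\');
            }
            length = append(length, b);
        }
        return Arrays.copyOf(buffer, length);
    }

    private int append(int length, byte b) {
        if (length == buffer.length) {
            buffer = Arrays.copyOf(buffer, buffer.length * 2); // grow the shared scratch space
        }
        buffer[length] = b;
        return length + 1;
    }

    // One instance per thread; callers should fetch it on every use rather than
    // caching it in a field that several threads can reach.
    private static final ThreadLocal<BufferedEscaper> INSTANCE =
        ThreadLocal.withInitial(BufferedEscaper::new);

    static BufferedEscaper getInstance() {
        return INSTANCE.get();
    }
}
```

Caching the result of `getInstance()` in a static or shared field is exactly the pattern to avoid: every thread would then write into the same `buffer`.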
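Illustrative sketch for #48345 (Simplify Shard Snapshot Upload Code), showing the general worker + queue pattern the message names. Instead of submitting every file upload up front, it starts at most `maxParallelism` workers that each drain a shared queue. `QueueUploader`, `uploadAll` and the `upload` callback are hypothetical; `maxParallelism` stands in for the max size read from the thread pool info.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class QueueUploader {
    // Uploads every file with at most maxParallelism concurrent workers. Only the
    // workers are submitted to the executor; individual uploads stay in the queue.
    static void uploadAll(Collection<String> files, int maxParallelism, Consumer<String> upload) {
        Queue<String> queue = new ConcurrentLinkedQueue<>(files);
        int workers = Math.max(1, Math.min(maxParallelism, files.size()));
        ExecutorService executor = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> running = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                running.add(executor.submit(() -> {
                    // Each worker keeps polling until the shared queue is empty.
                    String file;
                    while ((file = queue.poll()) != null) {
                        upload.accept(file);
                    }
                }));
            }
            for (Future<?> worker : running) {
                try {
                    worker.get(); // surfaces this worker's failure, if any
                } catch (ExecutionException e) {
                    throw new IllegalStateException("upload failed", e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                }
            }
        } finally {
            executor.shutdown();
        }
    }
}
```

The number of in-flight uploads is bounded by the worker count rather than by how many tasks were enqueued, which is what makes the pattern simpler to reason about than queuing everything at once.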
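Illustrative sketch for #48230 (Close query cache on index service creation failure). The leak arises when a resource is created before its owner, and the owner's construction then fails. The nested `QueryCache` and `IndexService` classes and `create` are hypothetical stand-ins; the sketch shows the common success-flag pattern that closes the orphaned resource before the exception propagates.

```java
import java.io.Closeable;

final class IndexServiceFactory {
    // Hypothetical stand-in for the cache; records whether it was closed.
    static final class QueryCache implements Closeable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    // Hypothetical owner of the cache; construction can fail.
    static final class IndexService {
        final QueryCache cache;

        IndexService(QueryCache cache, boolean fail) {
            if (fail) {
                throw new IllegalStateException("simulated creation failure");
            }
            this.cache = cache;
        }
    }

    // If the owner cannot be created, nobody else will ever close the cache,
    // so it is closed here before the exception leaves this method.
    static IndexService create(QueryCache cache, boolean fail) {
        boolean success = false;
        try {
            IndexService service = new IndexService(cache, fail);
            success = true;
            return service;
        } finally {
            if (!success) {
                cache.close();
            }
        }
    }
}
```

On success the cache is left open because ownership has passed to the `IndexService`, which is then responsible for closing it.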
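Illustrative sketch for #48115 (Quieter logging from the DiskThresholdMonitor). The message says `INFO` messages are no longer emitted repeatedly. One way to get that behaviour, assumed here rather than taken from the actual `DiskThresholdMonitor` code, is to log only when a node first crosses the high watermark and re-arm once it is back below. `WatermarkLogger`, `onCheck` and the message text are hypothetical, and the `messages` list stands in for a real logger.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class WatermarkLogger {
    // Stand-in for a real logger; the message text is invented for this sketch.
    final List<String> messages = new ArrayList<>();
    private final Set<String> nodesAboveHigh = new HashSet<>();

    // Called on every disk check. Logs once when a node goes above the high
    // watermark and stays quiet on later checks until it drops back below it.
    void onCheck(String nodeId, double usedDiskPercent, double highWatermarkPercent) {
        if (usedDiskPercent > highWatermarkPercent) {
            if (nodesAboveHigh.add(nodeId)) {
                messages.add("INFO high disk watermark exceeded on " + nodeId + ", relocating shards");
            }
        } else {
            nodesAboveHigh.remove(nodeId); // re-arm so that the next crossing is logged again
        }
    }
}
```

Tracking state per node keeps one noisy node from silencing or duplicating messages about another.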
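Illustrative sketch for #47234 / #48176 (Slow log must use separate underlying logger for each index). The commit uses the index name plus `IndexSettings.UUID` as the logger identity. The sketch below substitutes `java.util.logging` for the real logging framework and uses hypothetical method names. It demonstrates why a shared name is a problem: `Logger.getLogger` returns the same instance for the same name, so setting a level for one index silently changes it for every other index.

```java
import java.util.logging.Logger;

final class SlowLogLoggers {
    // Shared name: every index resolves to the same Logger instance, so the last
    // level set by any index wins for all of them.
    static Logger sharedLoggerFor(String indexUuid) {
        return Logger.getLogger("index.search.slowlog");
    }

    // Unique per index: the name is suffixed with the index UUID, so each index
    // gets its own Logger and keeps its own level.
    static Logger perIndexLoggerFor(String indexUuid) {
        return Logger.getLogger("index.search.slowlog." + indexUuid);
    }
}
```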
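Illustrative sketch for #47895 (Avoid unneeded refresh with concurrent realtime gets). It shows a double-check under a refresh lock: a get that still sees its operation as pending takes the lock and checks again, because a concurrent get may have refreshed in the meantime. `RealtimeGetRefresher`, the `pending` set (a simplified stand-in for "still in the versionMap") and `refreshCount` are hypothetical. For brevity the model ignores indexing that races with a refresh.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

final class RealtimeGetRefresher {
    // Ids of indexed documents not yet visible to the latest reader.
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private final ReentrantLock refreshLock = new ReentrantLock();
    final AtomicInteger refreshCount = new AtomicInteger();

    void index(String id) {
        pending.add(id);
    }

    void getRealtime(String id) {
        if (pending.contains(id)) {
            refreshLock.lock(); // refreshes are sequential anyway, so this adds little contention
            try {
                // Re-check under the lock: another get may already have refreshed,
                // in which case the operation is visible and no refresh is needed.
                if (pending.contains(id)) {
                    refresh();
                }
            } finally {
                refreshLock.unlock();
            }
        }
        // ... read the document from the latest reader ...
    }

    private void refresh() {
        refreshCount.incrementAndGet();
        pending.clear(); // simplified: everything indexed so far becomes visible
    }
}
```

With several concurrent gets for the same freshly indexed document, only the first one to acquire the lock refreshes; the others find the operation already visible.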
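Illustrative sketch for #47913 (Enable ResolverStyle.STRICT for java formatters), using only `java.time`. It shows the two behaviours the message describes. With `uuuu` and `ResolverStyle.STRICT`, an impossible day is rejected. With `yyyy` (year-of-era) and no era field, STRICT resolution cannot produce a `LocalDate`. The `toLocalDate` fallback is a hypothetical query in the spirit of the custom one in `DateFormatters.from`, assuming year-of-era always means the common era; it is not that code.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.time.temporal.ChronoField;
import java.time.temporal.TemporalAccessor;
import java.util.Locale;

final class StrictDates {
    // "uuuu" is the proleptic year, which STRICT resolution turns into a LocalDate.
    static final DateTimeFormatter YEAR_FORMAT =
        DateTimeFormatter.ofPattern("uuuu-MM-dd", Locale.ROOT).withResolverStyle(ResolverStyle.STRICT);

    // "yyyy" is year-of-era; without an era field STRICT mode leaves it unresolved.
    static final DateTimeFormatter YEAR_OF_ERA_FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd", Locale.ROOT).withResolverStyle(ResolverStyle.STRICT);

    // Hypothetical fallback: use year-of-era (assumed CE) when no proleptic year was resolved.
    static LocalDate toLocalDate(TemporalAccessor parsed) {
        int year = parsed.isSupported(ChronoField.YEAR)
            ? parsed.get(ChronoField.YEAR)
            : parsed.get(ChronoField.YEAR_OF_ERA);
        return LocalDate.of(year, parsed.get(ChronoField.MONTH_OF_YEAR), parsed.get(ChronoField.DAY_OF_MONTH));
    }

    // True if the text converts directly into a LocalDate with the given formatter.
    static boolean parses(DateTimeFormatter formatter, String text) {
        try {
            LocalDate.parse(text, formatter);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```

For example, `parses(YEAR_FORMAT, "2019-02-30")` is false because February 30 does not exist, while `parses(YEAR_OF_ERA_FORMAT, "2019-02-28")` is also false even though the date is valid. That second case is the `TemporalAccessor` conversion failure the commit warns about.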
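Illustrative sketch for #47872 (Allow partial parsing dates). The message says 2018, 2018-01 and 2018-01-01 must all be accepted. A common `java.time` technique for that is nested optional sections plus `parseDefaulting` for the missing fields. `PartialDates` and `OPTIONAL_DATE` are hypothetical and cover only the date part; this is a sketch, not the actual `date_optional_time` formatter definition.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.ResolverStyle;
import java.time.temporal.ChronoField;
import java.util.Locale;

final class PartialDates {
    // Year is required; "-MM" and then "-dd" are optional, and missing month or
    // day default to 1 so that a full LocalDate can still be resolved.
    static final DateTimeFormatter OPTIONAL_DATE = new DateTimeFormatterBuilder()
        .appendValue(ChronoField.YEAR, 4)
        .optionalStart()
        .appendLiteral('-')
        .appendValue(ChronoField.MONTH_OF_YEAR, 2)
        .optionalStart()
        .appendLiteral('-')
        .appendValue(ChronoField.DAY_OF_MONTH, 2)
        .optionalEnd()
        .optionalEnd()
        .parseDefaulting(ChronoField.MONTH_OF_YEAR, 1)
        .parseDefaulting(ChronoField.DAY_OF_MONTH, 1)
        .toFormatter(Locale.ROOT)
        .withResolverStyle(ResolverStyle.STRICT);

    static LocalDate parse(String text) {
        return LocalDate.parse(text, OPTIONAL_DATE);
    }
}
```

Using `ChronoField.YEAR` (the `uuuu` field) rather than year-of-era keeps STRICT resolution working, consistent with the advice in #47913.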
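Illustrative sketch for #47694 (Ensure that we don't call listener twice when detecting a partial failure in _search). The commit fixes the control flow so that the search does not continue after `onFailure`. A complementary, commonly used guard is a wrapper that delivers only the first completion. `ResponseListener` and `OnceListener` are hypothetical types written for this sketch, not the project's listener classes.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical listener with two mutually exclusive completion callbacks.
interface ResponseListener<T> {
    void onResponse(T response);

    void onFailure(Exception e);
}

// Delivers only the first completion to the delegate; any later call is dropped.
final class OnceListener<T> implements ResponseListener<T> {
    private final ResponseListener<T> delegate;
    private final AtomicBoolean completed = new AtomicBoolean();

    OnceListener(ResponseListener<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void onResponse(T response) {
        if (completed.compareAndSet(false, true)) {
            delegate.onResponse(response);
        }
    }

    @Override
    public void onFailure(Exception e) {
        if (completed.compareAndSet(false, true)) {
            delegate.onFailure(e);
        }
    }
}
```

`compareAndSet` makes the guard safe even if the two callbacks race on different threads. Silently dropping the second call can also hide bugs, though, which is why fixing the caller, as the commit does, remains the primary remedy.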
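Illustrative sketch for #47471 (Geo: Fixes indexing of linestrings that go around the globe). It reproduces the decomposition from the commit's example, where LINESTRING (0 0, 720 20) becomes three pieces, for a single segment. The simplifying assumptions are mine, not the indexer's: the segment runs eastward, longitudes are unnormalized, and latitude is interpolated linearly in (lon, lat) space. `DatelineSplitter` and `split` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class DatelineSplitter {
    // Splits one eastward segment (lon1 >= lon0) into pieces that each lie within
    // [-180, 180], cutting at every antimeridian crossing (180 + 360k).
    // Each piece is {startLon, startLat, endLon, endLat}.
    static List<double[]> split(double lon0, double lat0, double lon1, double lat1) {
        List<double[]> pieces = new ArrayList<>();
        double startLon = lon0;
        double startLat = lat0;
        // First antimeridian strictly east of the start point.
        double boundary = 180.0 + 360.0 * Math.floor((lon0 + 180.0) / 360.0);
        while (boundary < lon1) {
            double t = (boundary - lon0) / (lon1 - lon0);
            double lat = lat0 + t * (lat1 - lat0); // latitude where the segment crosses
            double shift = boundary - 180.0;       // maps this piece into [-180, 180]
            pieces.add(new double[] {startLon - shift, startLat, 180.0, lat});
            startLon = boundary;
            startLat = lat;
            boundary += 360.0;
        }
        double shift = boundary - 180.0;
        pieces.add(new double[] {startLon - shift, startLat, lon1 - shift, lat1});
        return pieces;
    }
}
```

For (0 0, 720 20) the crossings are at longitudes 180 and 540, where the interpolated latitudes are 5 and 15. That yields the pieces (0 0, 180 5), (-180 5, 180 15) and (-180 15, 0 20) listed in the commit message.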