OpenSearch

Commit Graph

Author	SHA1	Message	Date
Ignacio Vera	b1224fca8c	upgrade to Lucene-8.3.0-snapshot-25968e3b75e (#48227 )	2019-10-21 08:21:09 +02:00
Takuya Kajiwara	a56daeae2d	[DOCS] Fix typos in InternalEngine.java comments (#46861 )	2019-10-18 10:36:58 -04:00
David Turner	a8bcbbc38a	Quieter logging from the DiskThresholdMonitor (#48115 ) Today if an Elasticsearch node reaches a disk watermark then it will repeatedly emit logging about it, which implies that some action needs to be taken by the administrator. This is misleading. Elasticsearch strives to keep nodes under the high watermark, but it is normal to have a few nodes occasionally exceed this level. Nodes may be over the low watermark for an extended period without any ill effects. This commit enhances the logging emitted by the `DiskThresholdMonitor` to be less misleading. The expected case of hitting the high watermark and immediately relocating one or more shards to bring the node back under the watermark again is reduced in severity to `INFO`. Additionally, `INFO` messages are not emitted repeatedly. Fixes #48038	2019-10-18 15:00:14 +01:00
Armin Braun	1157775074	Remove Support for pre-5.x Indices in Restore (#48181 ) (#48199 ) The logic for handling empty segment files has been unnecessary ever since #24021 which removes the support for these files in 6.x -> we can safely remove the support for restoring these from 7.x+ to simplify the code.	2019-10-18 09:45:07 +02:00
Przemyslaw Gomulka	02d18f5c1e	[7.x] Slow log must use separate underlying logger for each index BACKPORT(#47234 ) (#48176 ) * Slow log must use separate underlying logger for each index (#47234) SlowLog instances should not share the same underlying logger, as it would cause different indexes to override each other's levels. When creating the underlying logger, a unique per-index identifier should be used: Name + IndexSettings.UUID Closes #42432	2019-10-17 20:04:57 +02:00
Armin Braun	04e3316408	Stop Resolving Fallback IndexId (#48141 ) (#48204 ) There is no reason to still resolve the fallback `IndexId` here. It only applies to `2.x` repos and those we can't read anymore anyway because they use an `/index` instead of an `/index-N` blob at the repo root for which at least 7.x+ does not contain the logic to find it.	2019-10-17 19:27:49 +02:00
Stuart Tettemer	356eef00c8	Scripting: get context names REST API (#48026 ) (#48168 ) Adds `GET /_script_context`, returning a `contexts` object with each available context as a key whose value is an empty object. eg. ``` { "contexts": { "aggregation_selector": {}, "aggs": {}, "aggs_combine": {}, ... } } ``` refs: #47411	2019-10-17 09:08:55 -06:00
Armin Braun	0ca7cc1848	Safely Close Repositories on Node Shutdown (#48020 ) (#48107 ) We were not closing repositories on Node shutdown. In production, this has little effect, but in tests a node that uses `MockRepository` and is stuck in a simulated blocked-IO situation when it shuts down will only unblock when the node's threadpool is interrupted. This might in some edge cases (many snapshot threads and some CI slowness) result in the execution taking longer than 5s to release all the shard stores and thus we fail the assertion about unreleased shard stores in the internal test cluster. Regardless of tests, I think we should close repositories and release resources associated with them when closing a node and not just when removing a repository from the CS with running nodes as this behavior is really unexpected. Fixes #47689	2019-10-17 07:55:05 +02:00
Armin Braun	f1bc3a0753	Remove TestLogging for #46701 (#48156 ) (#48160 ) This hasn't failed in 5 weeks now. Removing the test logging and closing the issue. Closes #46701	2019-10-17 07:54:20 +02:00
Jack Conradson	fa99721295	Drop stored scripts with the old style-id (#48078 ) This PR fixes (#47593). Stored scripts with the old-style id of lang#id are saved through the upgrade process but are no longer accessible in recent versions. This fix will drop those scripts altogether since there is no way for a user to access them.	2019-10-16 16:10:31 -07:00
jimczi	b2dc98562b	Bump version to 7.6	2019-10-16 15:57:12 +02:00
Klemen Košir	8243e99134	Fix typo in QueryBuilders Javadoc. (#47362 ) This PR fixes a typo in the Javadoc for terms queries in QueryBuilders.	2019-10-15 16:16:21 -07:00
Martijn van Groningen	aff0c9babc	This commit merges (#48040 ) the enrich-7.x feature branch, which is a backport merge and adds a new ingest processor, named enrich processor, that allows documents being ingested to be enriched with data from other indices. Besides a new enrich processor, this PR adds several APIs to manage an enrich policy. An enrich policy is in charge of making the data from other indices available to the enrich processor in an efficient manner. Related to #32789	2019-10-15 17:31:45 +02:00
jimczi	b858e19bcc	Revert #46598 that breaks the cachability of the sub search contexts.	2019-10-15 09:40:59 +02:00
Martijn van Groningen	cc4b6c43b3	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-10-15 07:23:47 +02:00
Jim Ferenczi	ef02a736ca	Don't apply the plugin's reader wrapper in can_match phase (#47816 ) This change modifies the local execution of the `can_match` phase to not apply the plugin's reader wrapper (if it is configured) when acquiring the searcher. We must ensure that the phase runs quickly and since we don't know the cost of applying the wrapper it is preferable to avoid it entirely. The can_match phase can afford false positives so it is also safe for the builtin plugins that use this functionality. Closes #46817	2019-10-14 13:07:05 +02:00
Martijn van Groningen	d4901a71d7	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-10-14 10:27:17 +02:00
Nhat Nguyen	8180cf1e68	Mute testDoNotInfinitelyWaitForMapping Tracked at #47974	2019-10-13 22:06:50 -04:00
Nhat Nguyen	2995d4a9c0	Sequence number based replica allocation (#46959 ) With this change, shard allocation prefers allocating replicas on a node that already has a copy of the shard that is as close as possible to the primary, so that it is as cheap as possible to bring the new replica in sync with the primary. Furthermore, if we find a copy that is identical to the primary then we cancel an ongoing recovery because the new copy which is identical to the primary needs no work to recover as a replica. We no longer need to perform a synced flush before performing a rolling upgrade or full cluster start with this improvement. Closes #46318	2019-10-13 22:06:50 -04:00
Nhat Nguyen	4f06225928	Avoid unneeded refresh with concurrent realtime gets (#47895 ) This change should reduce refreshes for a use-case where we perform multiple realtime gets at the same time on an active index. Currently, we only call refresh if the index operation is still on the versionMap. However, at the time we call refresh, that operation might be already or will be included in the latest reader. Hence, we do not need to refresh. Adding another lock here is not an issue as the refresh is already sequential.	2019-10-13 20:08:21 -04:00
Nhat Nguyen	4c1bb210cb	Force flush in translog retention policy test (#47879 ) If we roll translog but do not index, then a flush without force is a noop. In this case, the number of retained translog files will be higher than the value specified by the retention policy. Closes #4741	2019-10-13 20:08:21 -04:00
Przemyslaw Gomulka	6ab58de7ef	[7.x] Enable ResolverStyle.STRICT for java formatters backport(#46675 ) (#47913 ) Joda was using ResolverStyle.STRICT when parsing. This means that a date will be validated to have a correct year, year-of-month, day-of-month. However, we also want to make it work with Year-Of-Era as Joda used to, hence the custom temporalquery.localdate in DateFormatters.from. Within DateFormatters we use the correct uuuu year instead of yyyy year of era. Worth noting: if yyyy (without an era) is used in code, the parsing result will be a TemporalAccessor which will fail to be converted into LocalDate. We mostly use DateFormatters.from so this takes care of this. If possible the uuuu format should be used.	2019-10-11 21:19:56 +02:00
Christoph Büscher	2ef12c37f5	Add builder for distance_feature to QueryBuilders (#47846 ) The QueryBuilders convenience class is currently missing a shortcut to construct a DistanceFeatureQueryBuilder, which is added here. Closes #47767	2019-10-11 18:20:01 +02:00
Alan Woodward	ec9198d0e2	Adjust Version.V_6_8_4 to refer to Lucene 7.7.2 (#47926 ) 6.8.4 will ship with Lucene 7.7.2, so we need to change our version settings to reflect this. Relates #47901	2019-10-11 17:01:42 +01:00
David Turner	ba62eb3dce	Allow truncation of clean translog (#47866 ) Today the `elasticsearch-shard remove-corrupted-data` tool will only truncate a translog it determines to be corrupt. However there may be other cases in which it is desirable to truncate the translog, for instance if an operation in the translog cannot be replayed for some reason other than corruption. This commit adds a `--truncate-clean-translog` option to skip the corruption check on the translog and blindly truncate it.	2019-10-11 15:48:12 +01:00
Henning Andersen	a0d0866f59	Shrink should not touch max_retries (#47719 ) Shrink would set `max_retries=1` in order to avoid retrying. This however sticks to the shrunk index afterwards, causing issues when a shard copy later fails to allocate just once. Avoiding a retry of a shrink makes sense since there is no new node to allocate to and a retry will likely fail again. However, the downside of having max_retries=1 afterwards outweighs the benefit of not retrying the failed shrink a few times. This change ensures shrink no longer sets max_retries and also makes all resize operations (shrink, clone, split) leave the setting at default value rather than copy it from source.	2019-10-11 14:22:56 +02:00
Przemyslaw Gomulka	0c439fe495	[7.x] Allow partial parsing dates (#47872 ) backport(#46814 ) Enable partial parsing of date part. This is making the behaviour in java.time implementation the same as with joda. 2018, 2018-01 and 2018-01-01 are all valid dates for date_optional_time or strict_date_optional_time closes #45284 closes #47473	2019-10-11 11:17:19 +02:00
Zachary Tong	2de3411c9c	Make sibling pipeline agg ctor's protected (#42808 ) SiblingPipelineAggregator is a public interface, but the ctor was package-private. These should be protected so that plugin authors can extend and implement their own sibling pipeline agg.	2019-10-10 12:31:14 -04:00
Martijn van Groningen	102016d571	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-10-10 14:44:05 +02:00
Jim Ferenczi	bd6e2592a7	Remove the SearchContext from the highlighter context (#47733 ) Today built-in highlighters and plugins have access to the SearchContext through the highlighter context. However most of the information exposed in the SearchContext is not needed and a QueryShardContext would be enough to perform highlighting. This change replaces the SearchContext with the information that is absolutely required by highlighters: a QueryShardContext and the SearchContextHighlight. This change reduces the exposure of the complex SearchContext and removes the need to clone it in the percolator sub phase. Relates #47198 Relates #46523	2019-10-10 10:34:10 +02:00
Jim Ferenczi	3d334a262b	Ensure that we don't call listener twice when detecting a partial failure in _search (#47694 ) This change fixes a bug that can occur when a shard failure is detected while we build the search response and accept partial failures is set to false. In this case we currently call onFailure on the provided listener but also continue the search as if the failure didn't occur. This can lead to a listener being called twice, once with onFailure and once with onSuccess, which is forbidden by design.	2019-10-10 09:59:49 +02:00
dengweisysu	dc4224fbdf	Sync translog without lock before trim unreferenced readers (#47790 ) This commit is similar to the optimization made in #45765. With this change, we fsync most of the data of the current generation without holding writeLock when trimming unreferenced readers. Relates #45765	2019-10-09 17:56:30 -04:00
Armin Braun	302e09decf	Simplify some Common ActionRunnable Uses (#47799 ) (#47828 ) Especially in the snapshot code there's a lot of logic chaining `ActionRunnables` in tricky ways now and the code is getting hard to follow. This change introduces two convenience methods that make it clear that a wrapped listener is invoked with certainty in some trickier spots and shortens the code a bit.	2019-10-09 23:29:50 +02:00
Igor Motov	12e4e7ef54	Geo: implement proper handling of out of bounds geo points (#47734 ) This is the first iteration in improving the handling of out of bounds geopoints with a latitude outside of the -90 - +90 range and a longitude outside of the -180 - +180 range. Relates to #43916	2019-10-09 20:30:59 +04:00
Igor Motov	f8b8afdc70	Geo: Fixes indexing of linestrings that go around the globe (#47471 ) LINESTRING (0 0, 720 20) is now decomposed into 3 strings: multilinestring ( (0.0 0.0, 180.0 5.0), (-180.0 5.0, 180 15), (-180.0 15.0, 0 20) ) It also fixes issues with linestrings that intersect antimeridian more than 5 times. Fixes #43837 Fixes #43826	2019-10-09 20:30:59 +04:00
Tim Brooks	d18ff24dbe	Fix BulkByScrollResponseTests exception assertions (#45519 ) Currently in the x content serialization tests we compare the exception messages that are serialized. These exceptions messages are not equivalent because the exception often changes when serialized to x content. This commit removes this assertion.	2019-10-09 10:15:58 -06:00
Tim Brooks	02622c1ef9	Fix issues with serializing BulkByScrollResponse (#45357 ) Currently there are two issues with serializing BulkByScrollResponse. First, when deserializing from XContent, indexing exceptions and search exceptions are switched. Additionally, search exceptions do not retain the appropriate RestStatus code, so you must evaluate the status code from the exception. However, the exception class is not always correctly retained when serialized. This commit adds tests in the failure case. Additionally, fixes the swapping of failure types and adds the rest status code to the search failure.	2019-10-09 10:12:14 -06:00
Martijn van Groningen	da1e2ea461	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-10-09 09:06:13 +02:00
Armin Braun	96b36b5a8c	Make loadShardSnapshot Exceptions Consistent (#47728 ) (#47735 ) Similar to #47507. We are throwing `SnapshotException` when you (and SLM tests) would expect a `SnapshotMissingException` for concurrent snapshot status and snapshot delete operations with a very low probability. Fixed the exception type and added a test for this scenario.	2019-10-08 21:04:51 +02:00
Armin Braun	5cef4752f7	Fix Ex. Handling in SnapshotsService#snapshots (#47507 ) (#47727 ) We're needlessly wrapping a `SnapshotMissingException` which itself is a `SnapshotException` when trying to load a missing snapshot. This leads to failure #47442 which expects a `SnapshotMissingException` in this case. Closes #47442	2019-10-08 17:01:54 +02:00
Henning Andersen	ce91ba7c25	Dangling indices strip aliases (#47581 ) Importing dangling indices with aliases risks breaking functionalities using those aliases. For instance, writing to an alias may break if there is no is_write_index indication on the existing alias and the dangling index import adds a second index to the alias. Or an application could have an assumption about the alias only ever pointing to one index and suddenly seeing the alias also linked to an old index could break it. With this change we strip aliases of the index meta data found before importing a dangling index.	2019-10-08 12:09:30 +02:00
David Turner	bb5f750ab4	Deprecate include_relocations setting (#47443 ) Setting `cluster.routing.allocation.disk.include_relocations` to `false` is a bad idea since it will lead to the kinds of overshoot that were otherwise fixed in #46079. This commit deprecates this setting so it can be removed in the next major release.	2019-10-08 08:19:04 +01:00
Tal Levy	a17f394e27	Geo-Match Enrich Processor (#47243 ) (#47701 ) this commit introduces a geo-match enrich processor that looks up a specific `geo_point` field in the enrich-index for all entries that have a geo_shape match field that meets some specific relation criteria with the input field. For example, the enrich index may contain documents with zipcodes and their respective geo_shape. Ingesting documents with a geo_point field can be enriched with which zipcode they associate according to which shape they are contained within. this commit also refactors some of the MatchProcessor by moving a lot of the shared code to AbstractEnrichProcessor. Closes #42639.	2019-10-07 15:03:46 -07:00
Armin Braun	b669b8f046	Simplify Snapshot Delete Further (#47626 ) (#47644 ) This change removes the special path for deleting the index metadata blobs and moves deleting them to the bulk delete of unreferenced blobs at the end of the snapshot delete process. This saves N RPC calls for a snapshot containing N indices and simplifies the code. Also, this change moves the unreferenced data cleanup up the stack to make it more obvious that any exceptions during this phase will be ignored and not fail the delete request. Lastly, this change removes the needless chaining of first deleting unreferenced data from the snapshot delete and then running the stale data cleanup (that would also run from the cleanup endpoint) and simply fires off the cleanup right after updating the repository data (index-N) in parallel to the other delete operations to speed up the delete some more.	2019-10-07 14:18:41 +02:00
Armin Braun	1359ef73a3	Add IT for Snapshot Issue in 47552 (#47627 ) (#47634 ) * Add IT for Snapshot Issue in 47552 (#47627) Adding a specific integration test that reproduces the problem fixed in #47552. The issue fixed otherwise only reproduces in the snapshot resiliency tests, which are not available in 6.8 where the fix is being backported to as well.	2019-10-07 10:38:19 +02:00
Armin Braun	6bd033931b	Add Consistency Assertion to SnapshotsInProgress (#47598 ) (#47633 ) Assert given input shards and indices are consistent. Also, fixed the equality check for SnapshotsInProgress. Before this change the tests never had more than a single waiting shard per index so they never failed as a result of the waiting shards list not being ordered. Follow up to #47552	2019-10-07 10:37:56 +02:00
Luca Cavanna	736fceb18b	Fold InitialSearchPhase into AbstractSearchAsyncAction (#47182 ) Historically, we have two base classes for search actions that generally need to fan out to multiple shards and then move on to the following phase: InitialSearchPhase and AbstractSearchAsyncAction that extends it. Practically, every search action extends the latter, and there are no direct subclasses of InitialSearchPhase in our codebase. This commit folds InitialSearchPhase into AbstractSearchAsyncAction in the attempt of simplifying things and making the search code running on the coordinating node easier to reason about.	2019-10-07 10:10:04 +02:00
Martijn van Groningen	f2f2304c75	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-10-07 10:07:56 +02:00
Armin Braun	22679c7932	Fix Snapshot Corruption in Edge Case (#47552 ) (#47620 ) This fixes a failure to mark shard snapshots as failures when multiple data-nodes are lost during the snapshot process or shard snapshot failures have occurred before a node left the cluster. The problem was that we were simply not adding any shard entries for completed shards on node-left events. This has no effect for a successful shard, but for a failed shard would lead to that shard not being marked as failed during snapshot finalization. Fixed by correctly keeping track of all previous completed shard states as well in this case. Also, added an assertion that without this fix would trip on almost every run of the resiliency tests and adjusted the serialization of SnapshotsInProgress.Entry so we have a proper assertion message. Closes #47550	2019-10-05 15:01:06 +02:00
Armin Braun	f2d2ca21e2	Cleaner Handling of Store Refcount in BlobStoreRepository (#47560 ) (#47594 ) If a shard gets closed we properly abort its snapshot before closing it. We should in this case make sure to not throw a confusing exception about trying to increment the reference on an already closed shard in the async tasks if the snapshot is already aborted. Also, added an assertion to make sure that aborts are in fact the only situation in which we run into a concurrently closed store.	2019-10-05 09:45:10 +02:00
Gordon Brown	e47bdf760e	Fix Rollover error when alias has closed indices (#47148 ) (#47539 ) Rollover previously requested index stats for all indices in the provided alias, which causes an exception when there is a closed index with that alias. This commit adjusts the IndicesOptions used on the index stats request so that closed indices are ignored, rather than throwing an exception.	2019-10-04 17:40:05 -06:00
Jason Tedor	35ca3d68d7	Validating monitoring hosts setting while parsing (#47571 ) This commit lifts the validation of the monitoring hosts setting into the setting itself, rather than when the setting is used. This prevents a scenario where an invalid value for the setting is accepted, but then later fails while applying a cluster state with the invalid setting.	2019-10-04 17:32:49 -04:00
Mark Tozzi	e404f7ea80	DocValueFormat implementation for date range fields (#47472 ) (#47605 )	2019-10-04 17:21:17 -04:00
Lee Hinman	79376b7219	Set default SLM retention invocation time (#47604 ) This adds a default for the `slm.retention_schedule` setting, setting it to `0 30 1 * * ?` which is 1:30am every day. Having retention unset meant that it would never be invoked and clean up snapshots. We determined it would be better to have a default than never to be run. When coming to a decision, we weighed the option of an absolute time (such as 1:30am) versus a periodic invocation (like every 12 hours). In the end we decided on the absolute time because it has better predictability and consistency than a periodic invocation, which would rely on when the master node were elected or restarted. Relates to #43663	2019-10-04 15:00:20 -06:00
Armin Braun	c1be7a802c	Simplify Snapshot Delete Process (#47439 ) (#47533 ) We don't need to read the SnapshotInfo for a snapshot to determine the indices that need to be updated when it is deleted as the `RepositoryData` contains that information already. This PR makes it so the `RepositoryData` is used to determine which indices to update and also removes the special handling for deleting snapshot metadata and the CS snapshot blob and has those simply be deleted as part of the deleting of other unreferenced blobs in the last step of the delete. This makes the snapshot delete a little faster and more resilient by removing two RPC calls (the separate delete and the get). Also, this shortens the diff with #46250 as a side-effect.	2019-10-04 13:55:16 +02:00
David Roberts	defc97a300	Remove fallback for controller location (#47104 ) This change removes the temporary controller location fallback introduced in #47013. Relates elastic/ml-cpp#593	2019-10-04 09:50:26 +01:00
Ryan Ernst	f32692208e	Add explanations to script score queries (#46693 ) (#47548 ) While function scores using scripts do allow explanations, they are only creatable with an expert plugin. This commit improves the situation for the newer script score query by adding the ability to set the explanation from the script itself. To set the explanation, a user would check for `explanation != null` to indicate an explanation is needed, and then call `explanation.set("some description")`.	2019-10-03 21:05:05 -07:00
Nhat Nguyen	5e4732f2bb	Limit number of retaining translog files for peer recovery (#47414 ) Today we control the extra translog (when soft-deletes is disabled) for peer recoveries by size and age. If users manually (force) flush many times within a short period, we can keep many small (or empty) translog files as neither the size nor the age condition is reached. We can protect the cluster from running out of the file descriptors in such a situation by limiting the number of retaining translog files.	2019-10-03 20:45:29 -04:00
Armin Braun	bac119f672	Fix getSnapshotIndexMetaData Exception Behavior (#47488 ) (#47496 ) If we fail to read the global metadata in a snapshot we would throw `SnapshotMissingException` but wouldn't do so for the index metadata. This is breaking SLM tests at a low rate because they use `SnapshotMissingException` thrown from snapshot status APIs to wait for a snapshot being gone. Also, we should be consistent here in general and not leak the `NoSuchFileException` to the transport layer for index meta. Closes #46508	2019-10-03 12:47:50 +02:00
Armin Braun	7549be4489	Fix es.http.cname_in_publish_address Deprecation Logging (#47451 ) Since the property defaulted to `true` this deprecation logging runs every time unless it's set to `false` manually (in which case it should've also logged but didn't). I didn't add a test and removed the tests we had in `7.x` that covered this logging. I did move the check out of the `if (InetAddresses.isInetAddress(hostString) == false) {` condition so this is sort-of covered by the REST tests. IMO, any unit-test of this would be somewhat redundant and would've forced adding a field that just indicates that the deprecated property was used to every instance which seemed pointless. Closes #47436	2019-10-03 11:10:48 +02:00
Alpar Torok	0a14bb174f	Remove eclipse conditionals (#44075 ) * Remove eclipse conditionals We used to have some meta projects with a `-test` prefix because historically eclipse could not distinguish between test and main source-sets and could only use a single classpath. This is no longer the case for the past few Eclipse versions. This PR adds the necessary configuration to correctly categorize source folders and libraries. With this change eclipse can import projects, and the visibility rules are correct, e.g. auto-complete doesn't offer classes from test code or `testCompile` dependencies when editing classes in `main`. Unfortunately the cyclic dependency detection in Eclipse doesn't seem to take the difference between test and non test source sets into account, but since we are checking this in Gradle anyhow, it's safe to set to `warning` in the settings. Unfortunately there is no setting to ignore it. This might cause problems when building since Eclipse will probably not know the right order to build things in so more work might be necessary.	2019-10-03 11:55:00 +03:00
Armin Braun	0beb5263b4	Fix Snapshot Finalization not Waiting for Index Metadata (#47445 ) (#47459 ) * Fix Snapshot Finalization not Waiting for Index Metadata We were mixing up the listeners here which led to the final listener that should be called after all the metadata has been written to be called before that. I fixed this by removing the one redundant listener and flattening the logic out. * Closes #47425	2019-10-02 23:26:18 +02:00
Jason Tedor	52b97ec539	Allow setting validation against arbitrary types (#47264 ) Today when settings validate, they can only validate against settings that are of the same type. While this strong-type is convenient from a development perspective, it is too limiting in that some settings need to validate against settings of a different type. For example, the list setting xpack.monitoring.exporters.<namespace>.host wants to validate that it is non-empty if and only if the string setting xpack.monitoring.exporters.<namespace>.type is "http". Today this is impossible since the settings validation framework only allows that setting to validate against other list settings. This commit increases the flexibility here to validate against settings of arbitrary type, at the expense of losing strong-typing during development.	2019-10-02 16:31:06 -04:00
Jim Ferenczi	c340814b34	Fix highlighting of overlapping terms in the unified highlighter (#47227 ) The passage formatter that the unified highlighter uses doesn't handle terms with overlapping offsets. For tokenizers that provide multiple segmentations of the same terms (edge ngram for instance) the formatter should select the largest span in order to highlight the term only once. This change implements this logic.	2019-10-02 16:34:12 +02:00
Yannick Welsch	f7980e9745	Adapt version constants after backport (#47353 )	2019-10-02 14:26:23 +02:00
Yannick Welsch	99d2fe295d	Use optype CREATE for single auto-id index requests (#47353 ) Changes auto-id index requests to use optype CREATE, making it compliant with our docs. This will also make these auto-id index requests compatible with the new "create-doc" index privilege (which is based on the optype), the default optype is changed to create, just as it is already documented.	2019-10-02 14:16:52 +02:00
Yannick Welsch	0024695dd8	Disallow externally generated autoGeneratedTimestamp (#47341 ) The autoGeneratedTimestamp field is internally used to speed up indexing of operations with auto-ids, as we can rule out duplicates. Setting this field externally can make the index inconsistent, resulting in duplicate documents with same id.	2019-10-02 14:16:52 +02:00
Yannick Welsch	8c11fe610e	Use standard semantics for retried auto-id requests (#47311 ) Adds support for handling auto-id requests with optype CREATE. Also simplifies the code handling this by using the standard indexing path when dealing with possible retry conflicts. Relates #47169	2019-10-02 14:16:52 +02:00
Yannick Welsch	7b2613db55	Allow optype CREATE for append-only indexing operations (#47169 ) Bulk requests currently do not allow adding "create" actions with auto-generated IDs. This commit allows using the optype CREATE for append-only indexing operations. This is mainly the user facing aspect of it.	2019-10-02 14:16:52 +02:00
Jim Ferenczi	42c5054e52	Fix alias field resolution in match query (#47369 ) Synonym queries (when two tokens/paths start at the same position) use the alias field instead of the concrete field to build Lucene queries. This commit fixes this bug by resolving the alias field upfront in order to provide the concrete field to the actual query parser.	2019-10-02 11:45:43 +02:00
Nhat Nguyen	5cfcd7c458	Re-fetch shard info of primary when new node joins (#47035 ) Today, we don't clear the shard info of the primary shard when a new node joins, so we risk making replica allocation decisions based on stale information about the primary. The serious problem is that we can cancel the current recovery which is more advanced than the copy on the new node due to the old info we have from the primary. With this change, we ensure the shard info from the primary is not older than any node when allocating replicas. Relates #46959 This work was done by Henning in #42518. Co-authored-by: Henning Andersen <henning.andersen@elastic.co>	2019-10-01 22:16:26 -04:00
Gordon Brown	ba6ee2d40d	[7.x] Adjust randomization in cluster shard limit tests (#47254 ) This commit adjusts randomization for the cluster shard limit tests so that there is often more of a gap left between the limit and the size of the first index. This allows the same randomization to be used for all tests, and alleviates flakiness in `testIndexCreationOverLimitFromTemplate`.	2019-10-01 14:53:10 -06:00
David Turner	99b25d3740	Keep nodes above watermark in testAutomaticReleaseOfIndexBlock (#47387 ) Today the comment boldly claims that this line of code keeps nodes above the 10-byte low watermark when in fact this is not true at all. This change fixes this so that it really does keep nodes above the low watermark. Fixes #45338. Again.	2019-10-01 19:58:23 +01:00
Armin Braun	3d6ef6a90e	Speed up and Reorder Snapshot Delete Operations (#47293 ) (#47350 ) This is a preliminary of #46250 making the snapshot delete work by doing all the metadata updates first and then bulk deleting all of the now unreferenced blobs. Before this change, the metadata updates for each shard and subsequent deletion of the blobs that have become unreferenced due to the delete would happen sequentially shard-by-shard parallelising only over all the indices in the snapshot. This change makes it so that all the metadata updates happen in parallel on a shard level first. Once all of the updates of shard-level metadata have finished, all the now unreferenced blobs are deleted in bulk. This has two benefits (outside of making #46250 a smaller change): * We have a lower likelihood of failing to update shard level metadata because it happens with priority and a higher degree of parallelism * Deleting of unreferenced data in the shards should go much faster in many cases (rolling indices, large number of indices with many unchanged shards) as well because a number of small bulk deletions (just two blobs for `index-N` and `snap-` for each unchanged shard) are grouped into larger bulk deletes of `100-1000` blobs depending on Cloud provider (even though the final bulk deletes are happening sequentially this should be much faster in almost all cases as you'd need a parallelism of 50 (GCS) to 500 (S3) snapshot threads to achieve the same delete rates when deleting from unchanged shards).	2019-10-01 19:05:43 +02:00
Colin Goodheart-Smithe	c93b39c65b	Adds version 7.4.1	2019-10-01 16:03:11 +01:00
Howard	a9cd42c05d	Cancel recoveries even if all shards assigned (#46520 ) We cancel ongoing peer recoveries if a node joins the cluster with a completely up-to-date copy of a shard, because we can use such a copy to recover a replica instantly. However, today we only look for recoveries to cancel while there are unassigned shards in the cluster. This means that we do not contemplate the cancellation of the last few recoveries since recovering shards are not unassigned. It might take much longer for these recoveries to complete than would be necessary if they were cancelled. This commit fixes this by checking for cancellable recoveries even if all shards are assigned.	2019-10-01 10:55:32 +01:00
Ignacio Vera	03d717dc32	Provide better error when updating geo_shape field mapper settings (#47281 ) (#47338 )	2019-10-01 10:52:39 +02:00
Yannick Welsch	dd0af2e425	Fix CloseIndexIT.testRelocatedClosedIndexIssue (#47169 ) Closes #47330	2019-10-01 08:34:27 +02:00
Armin Braun	3d23cb44a3	Speed up Snapshot Finalization (#47283 ) (#47309 ) As a result of #45689 snapshot finalization started to take significantly longer than before. This may be a little unfortunate since it increases the likelihood of failing to finalize after having written out all the segment blobs. This change parallelizes all the metadata writes that can safely run in parallel in the finalization step to speed the finalization step up again. Also, this will generally speed up the snapshot process overall in case of large number of indices. This is also a nice to have for #46250 since we add yet another step (deleting of old index- blobs in the shards) to the finalization.	2019-09-30 23:28:59 +02:00
Jason Tedor	890951113f	Make Setting#getRaw have private access (#47287 ) The method Setting#getRaw leaks implementation details about settings, namely that they are backed by strings. We do not want code to rely upon this, so this commit makes Setting#getRaw private as a first step towards hiding the implementation details of settings from the rest of the codebase.	2019-09-30 14:14:30 -04:00
David Turner	72b63635de	Remove unused pluggable metadata upgraders (#47277 ) Today plugins may provide upgraders for custom metadata and index metadata, but these upgraders are bypassed during a rolling restart. Fortunately this extension mechanism is unused by all known plugins. This commit removes these extension points. Relates #47297	2019-09-30 16:58:29 +01:00
Gaurav614	052c523d41	Fail allocation of new primaries in empty cluster (#43284 ) Today if you create an index in a cluster without any data nodes then it will report yellow health because it never attempts to assign any shards if there are no data nodes, so the new shards remain at `AllocationStatus.NO_ATTEMPT`. This commit moves the new primaries to `AllocationStatus.DECIDERS_NO` in this situation, causing the cluster health to move to red. Fixes #41073	2019-09-30 14:27:12 +01:00
Yannick Welsch	467596871a	Omit writing index metadata for non-replicated closed indices on data-only node (#47285 ) Fixes a bug related to how "closed replicated indices" (introduced in 7.2) interact with the index metadata storage mechanism, which has special handling for closed indices (but incorrectly handles replicated closed indices). On non-master-eligible data nodes, it's possible for the node's manifest file (which tracks the relevant metadata state that the node should persist) to become out of sync with what's actually stored on disk, leading to an inconsistency that is then detected at startup, refusing for the node to start up. Closes #47276	2019-09-30 13:56:52 +02:00
Przemyslaw Gomulka	d9a7bcef21	Support optional parsers in any order with DateMathParser Backport(46654) (#47217 ) Currently DateMathParser with roundUp = true is relying on the DateFormatter built with combined optional sub parsers with defaulted fields (depending on the formatter). That means that for yyyy-MM-dd'T'HH:mm:ss\|\|yyyy-MM-dd'T'HH:mm:ss.SSS Java.time implementation expects optional parsers in order from most specific to least specific (reverse in the example above). It is causing a problem because the first parsing succeeds but does not consume the full input. The second parser should be used. We can work around this by keeping a list of RoundUpParsers and iterating over them, choosing the one that parsed the full input. The same approach we used for regular (non date math) in relates #40100 The jdk is not considering this to be a bug https://bugs.openjdk.java.net/browse/JDK-8188771 Those below will expect this change first relates #46242 relates #45284 backport #46654	2019-09-30 13:54:52 +02:00
Yannick Welsch	9dc90e41fc	Remove "force" version type (#47228 ) It's been deprecated long ago and can be removed. Relates to #20377 Closes #19769	2019-09-30 11:58:34 +02:00
Martijn van Groningen	66f72bcdbc	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-09-30 08:12:28 +02:00
Rory Hunter	53a4d2176f	Convert most awaitBusy calls to assertBusy (#45794 ) (#47112 ) Backport of #45794 to 7.x. Convert most `awaitBusy` calls to `assertBusy`, and use asserts where possible. Follows on from #28548 by @liketic. There were a small number of places where it didn't make sense to me to call `assertBusy`, so I kept the existing calls but renamed the method to `waitUntil`. This was partly to better reflect its usage, and partly so that anyone trying to add a new call to awaitBusy wouldn't be able to find it. I also didn't change the usage in `TransportStopRollupAction` as the comments state that the local awaitBusy method is a temporary copy-and-paste. Other changes: * Rework `waitForDocs` to scale its timeout. Instead of calling `assertBusy` in a loop, work out a reasonable overall timeout and await just once. * Some tests failed after switching to `assertBusy` and had to be fixed. * Correct the expect templates in AbstractUpgradeTestCase. The ES Security team confirmed that they don't use templates any more, so remove this from the expected templates. Also rewrite how the setup code checks for templates, in order to give more information. * Remove an expected ML template from XPackRestTestConstants The ML team advised that the ML tests shouldn't be waiting for any `.ml-notifications` templates, since such checks should happen in the production code instead. Also rework the template checking code in `XPackRestTestHelper` to give more helpful failure messages. * Fix issue in `DataFrameSurvivesUpgradeIT` when upgrading from < 7.4	2019-09-29 12:21:46 +01:00
Jason Tedor	98989f7b37	Use fallback settings in throttling decider (#47261 ) This commit replaces some uses of Setting#getRaw in the throttling allocation decider settings. Instead, these settings should be using fallback settings.	2019-09-28 08:06:24 -04:00
Jason Tedor	bd603b0a7b	Remove dead leniency in allow rebalance setting use (#47259 ) This commit removes some leniency that exists in getting the allow rebalance setting. Fortunately, that leniency is dead code, this can never happen. The reason this can never happen is because the settings infrastructure will not allow setting an invalid value for this setting. If you try to set this in the elasticsearch.yml, then the node will fail to start, since parsing the setting will fail. If you try to set this via an update settings API call, then parsing the setting will fail and the settings update will be rejected. Therefore, this leniency can never be activated, so we remove it. This commit is the first of a few in an attempt to remove the public uses of Setting#getRaw.	2019-09-28 08:05:37 -04:00
Martijn van Groningen	76b66634e9	put provided argument on the previous line just like in master branch, that way this doesn't show in the final pr.	2019-09-27 15:22:09 +02:00
David Roberts	e943e27954	Spawn controller processes from a different directory on macOS (#47013 ) This is the Java side of https://github.com/elastic/ml-cpp/pull/593 with a fallback so that ml-cpp bundles with either the new or old directory structure work for the time being. A few days after merging the C++ changes a followup to this change will be made that removes the fallback.	2019-09-27 14:02:40 +01:00
Martijn van Groningen	7ffe2e7e63	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-09-27 14:42:11 +02:00
Yannick Welsch	6fd3b4723f	Remove write lock for Translog.getGeneration (#47036 ) No need for the write lock, and currentFileGeneration is already protected by the read lock. Also removes the unused method "isCurrent".	2019-09-27 13:58:07 +02:00
Jim Ferenczi	73a09b34b8	Replace SearchContextException with SearchException (#47046 ) This commit removes the SearchContextException in favor of a simpler SearchException that doesn't leak the SearchContext. Relates #46523	2019-09-26 14:21:23 +02:00
Tanguy Leroux	95e2ca741e	Remove unused private methods and fields (#47154 ) This commit removes a bunch of unused private fields and unused private methods from the code base. Backport of (#47115)	2019-09-26 12:49:21 +02:00
jimczi	97d977f381	#47046 Fix serialization version check after backport	2019-09-26 09:56:24 +02:00
Jim Ferenczi	04972baffa	Merge ShardSearchTransportRequest and ShardSearchLocalRequest (#46996 ) (#47081 ) This change merges the `ShardSearchTransportRequest` and `ShardSearchLocalRequest` into a single `ShardSearchRequest` that can be used to create a SearchContext. Relates #46523	2019-09-26 09:20:53 +02:00
Martijn van Groningen	429f23ea2f	Allow ingest processors to execute in a non blocking manner. (#47122 ) Backport of #46241 This PR changes the ingest executing to be non blocking by adding an additional method to the Processor interface that accepts a BiConsumer as handler and changing IngestService#executeBulkRequest(...) to ingest document in a non blocking fashion iff a processor executes in a non blocking fashion. This is the second PR that merges changes made to server module from the enrich branch (see #32789) into the master branch. The plan is to merge changes made to the server module separately from the pr that will merge enrich into master, so that these changes can be reviewed in isolation. This change originates from the enrich branch and was introduced there in #43361.	2019-09-26 08:55:28 +02:00
David Turner	45c7783018	Warn on slow metadata persistence (#47130 ) Today if metadata persistence is excessively slow on a master-ineligible node then the `ClusterApplierService` emits a warning indicating that the `GatewayMetaState` applier was slow, but gives no further details. If it is excessively slow on a master-eligible node then we do not see any warning at all, although we might see other consequences such as a lagging node or a master failure. With this commit we emit a warning if metadata persistence takes longer than a configurable threshold, which defaults to `10s`. We also emit statistics that record how much index metadata was persisted and how much was skipped since this can help distinguish cases where IO was slow from cases where there are simply too many indices involved. Backport of #47005.	2019-09-26 07:40:54 +01:00
Tim Brooks	4f47e1f169	Extract proxy connection logic to specialized class (#47138 ) Currently the logic to check if a connection to a remote discovery node exists and otherwise create a proxy connection is mixed with the collect nodes, cluster connection lifecycle, and other RemoteClusterConnection logic. This commit introduces a specialized RemoteConnectionManager class which handles the open connections. Additionally, it reworks the "round-robin" proxy logic to create the list of potential connections at connection open/close time, opposed to each time a connection is requested.	2019-09-25 15:58:18 -06:00
Nhat Nguyen	7c5a088aa5	Increase ensureGreen timeout for testReplicaCorruption (#47136 ) We can have a large number of shard copies in this test. For example, the two recent failures have 24 and 27 copies respectively and all replicas have to copy segment files as their stores are corrupted. Our CI needs more than 30 seconds to start all these copies. Note that in two recent failures, the cluster was green just after the cluster health timed out. Closes #41899	2019-09-25 17:04:08 -04:00
Lee Hinman	a267df30fa	Wait for snapshot completion in SLM snapshot invocation (#47051 ) * Wait for snapshot completion in SLM snapshot invocation This changes the snapshots internally invoked by SLM to wait for completion. This allows us to capture more snapshotting failure scenarios. For example, previously a snapshot would be created and then registered as a "success", however, the snapshot may have been aborted, or it may have had a subset of its shards fail. These cases are now handled by inspecting the response to the `CreateSnapshotRequest` and ensuring that there are no failures. If any failures are present, the history store now stores the action as a failure instead of a success. Relates to #38461 and #43663	2019-09-25 14:25:22 -06:00
Armin Braun	93fcd23da8	Fail Snapshot on Corrupted Metadata Blob (#47009 ) (#47096 ) We should not be quietly ignoring a corrupted shard-level index-N blob. Simply creating a new empty shard-level index-N and moving on means that all snapshots of that shard show `SUCESS` as their state at the repository root but are in fact broken. This change at least makes it visible to the user that they can't snapshot the given shard any more and forces the user to move on to a new repository since the current one is broken and will not allow snapshotting the inconsistent shard again. Also, this change stops the delete action for shards with broken index-N blobs instead of simply deleting all blobs in the path containing the broken index-N. This prevents a temporarily broken/missing index-N blob from corrupting all snapshots of that shard.	2019-09-25 15:55:33 +02:00
Nhat Nguyen	22575bd7e6	Remove isRecovering method from Engine (#47039 ) We already prevent flushing in Engine if it's recovering. Hence, we can remove the protection in IndexShard.	2019-09-25 08:58:08 -04:00
Armin Braun	c4a166fc9a	Simplify SnapshotResiliencyTests (#46961 ) (#47108 ) Simplify `SnapshotResiliencyTests` to more closely match the structure of `AbstractCoordinatorTestCase` and allow for future drying up between the two classes: * Make the test cluster nodes a nested-class in the test cluster itself * Remove the needless custom network disruption implementation and simply track disconnected node ids like `AbstractCoordinatorTestCase` does	2019-09-25 14:53:11 +02:00
Yannick Welsch	81cbd3fba4	Mute ClusterShardLimitIT.testIndexCreationOverLimitFromTemplate Relates #47107	2019-09-25 14:03:08 +02:00
David Turner	ac920e8e64	Assert no exceptions during state application (#47090 ) Today we log and swallow exceptions during cluster state application, but such an exception should not occur. This commit adds assertions of this fact, and updates the Javadocs to explain it. Relates #47038	2019-09-25 12:32:51 +01:00
Martijn van Groningen	eef1ba3fad	Make ingest pipeline resolution logic unit testable (#47026 ) Extracted ingest pipeline resolution logic into a static method and added unit tests for pipeline resolution logic. Followup from #46847	2019-09-25 11:35:00 +02:00
Daniel Mitterdorfer	48df560593	Emit log message when parent circuit breaker trips (#47000 ) (#47073 ) We emit a debug log message whenever a child circuit breaker trips (in `ChildMemoryCircuitBreaker#circuitBreak(String, long)`) but we never emit a log message when the parent circuit breaker trips. As this is more likely to happen with the real memory circuit breaker it is not possible to detect this in the logs. With this commit we add a log message on the same log level (debug) when the parent circuit breaker trips.	2019-09-25 10:22:46 +02:00
Julie Tibshirani	41ee8aa6fc	Reject regexp queries on the _index field. (#46945 ) We speculatively added support for `regexp` queries on the `_index` field in #34089 (this functionality was not actually requested by a user). Supporting regex logic adds complexity to the `_index` field for not much gain, so we would like to remove it. From an end-to-end test it turns out this functionality never even worked in the first place because of an error in how regex flags were interpreted! For this reason, we can remove support for `regexp` on `_index` without a deprecation period. Relates to #46640.	2019-09-24 12:17:00 -07:00
Tim Brooks	71ec0707cf	Remove locking around connection attempts (#46845 ) Currently in the ConnectionManager we lock around the node id. This is odd because we key connections by the ephemeral id. Upon further investigation it appears to me that we do not need the locking. Using the concurrent map, we can ensure that only one connection attempt completes. There is a very small chance that a new connection attempt will proceed right as another connection attempt is completing. However, since the whole process is asynchronous and event oriented (lightweight), that does not seem to be an issue.	2019-09-24 11:05:42 -06:00
Tim Brooks	f02582de4b	Reduce a bind failure to trace logging (#46891 ) Due to recent changes in the nio transport, a failure to bind the server channel has started to be logged at an error level. This exception leads to an automatic retry on a different port, so it should only be logged at a trace level.	2019-09-24 10:32:18 -06:00
David Turner	9135e2f9e3	Improve LeaderCheck rejection messages (#46998 ) Today the `LeaderChecker` rejects checks from nodes that are not in the current cluster with the exception message `leader check from unknown node` which offers no information about why the node is unknown. In fact the node must have been in the cluster in the recent past, so it might help guide the user to a more useful log message if we describe it as a `removed node` instead of an `unknown node`. This commit changes the exception message like this, and also tidies up a few other loose ends in the `LeaderChecker`.	2019-09-24 13:41:37 +01:00
David Turner	6943a3101f	Cut PersistedState interface from GatewayMetaState (#46655 ) Today `GatewayMetaState` implements `PersistedState` but it's an error to use it as a `PersistedState` before it's been started, or if the node is master-ineligible. It also holds some fields that are meaningless on nodes that do not persist their states. Finally, it takes responsibility for both loading the original cluster state and some of the high-level logic for writing the cluster state back to disk. This commit addresses these concerns by introducing a more specific `PersistedState` implementation for use on master-eligible nodes which is only instantiated if and when it's appropriate. It also moves the fields and high-level persistence logic into a new `IncrementalClusterStateWriter` with a more appropriate lifecycle. Follow-up to #46326 and #46532 Relates #47001	2019-09-24 12:31:13 +01:00
Julie Tibshirani	9124c94a6c	Add support for aliases in queries on _index. (#46944 ) Previously, queries on the _index field were not able to specify index aliases. This was a regression in functionality compared to the 'indices' query that was deprecated and removed in 6.0. Now queries on _index can specify an alias, which is resolved to the concrete index names when we check whether an index matches. To match a remote shard target, the pattern needs to be of the form 'cluster:index' to match the fully-qualified index name. Index aliases can be specified in the following query types: term, terms, prefix, and wildcard.	2019-09-23 13:21:37 -07:00
Jim Ferenczi	08f28e642b	Replace SearchContext with QueryShardContext in query builder tests (#46978 ) This commit replaces the SearchContext used in AbstractQueryTestCase with a QueryShardContext in order to reduce the visibility of search contexts. Relates #46523	2019-09-23 20:24:02 +02:00
Eray	199fff8a55	Allow max_children only in top level nested sort (#46731 ) This commit restricts the usage of max_children to the top level nested sort since it is ignored on the other levels.	2019-09-23 18:53:50 +02:00
Armin Braun	2da040601b	Fix Bug in Snapshot Status Response Timestamps (#46919 ) (#46970 ) Fixing a corner case where snapshot total time calculation was off when getting the `SnapshotStatus` of an in-progress snapshot. Closes #46913	2019-09-23 15:01:47 +02:00
David Turner	7bc86f23ec	Wait longer for leader failure in logs test (#46958 ) `testLogsWarningPeriodicallyIfClusterNotFormed` simulates a leader failure and waits for long enough that a failing leader check is scheduled. However it does not wait for the failing check to actually fail, which requires another two actions and therefore might take up to 200ms more. Unlucky timing would result in this test failing, for instance: ./gradle ':server:test' \ --tests "org.elasticsearch.cluster.coordination.CoordinatorTests.testLogsWarningPeriodicallyIfClusterNotFormed" \ -Dtests.jvm.argline="-Dhppc.bitmixer=DETERMINISTIC" \ -Dtests.seed=F18CDD0EBEB5653:E9BC1A8B062E697A This commit adds the extra delay needed for the leader failure to complete as expected. Fixes #46920	2019-09-23 10:52:13 +01:00
Martijn van Groningen	33bbc4798b	fixed compile errors after merging	2019-09-23 09:46:14 +02:00
Martijn van Groningen	0cfddca61d	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-09-23 09:46:05 +02:00
Armin Braun	ee4e6b1382	Add TestLogging for #46701 (#46939 ) (#46949 ) This fails at a very low rate and with the force merge in place before checking the cache size it's not clear why the cache is not of size `0` -> seems something else must be happening here that is unexpected. -> add debug logging to this test to find out Relates #46701	2019-09-21 15:24:58 +02:00
Armin Braun	938648fcff	Remove Duplicate Shard Snapshot State Updates (#46862 ) (#46906 ) We were repeatedly trying to send shard state updates for aborted snapshots on every cluster state update. This is simply dead-code since those updates are already safely sent in the callbacks passed to `SnapshotShardsService#snapshot`. On master failover, we ensure that the status update is resent via `SnapshotShardsService#syncShardStatsOnNewMaster`. => there is no need for trying to send updates here over and over and this logic can safely be removed	2019-09-20 14:30:03 +02:00
Jason Tedor	97acf353fa	Move pipelines resolved assertion (#46892 ) This assertion was added during the development of required pipelines. In the initial version of that work, the notion of whether or not a request was forwarded from the coordinating node to an ingest node was introduced. It was realized later that instead we needed to track whether or not the pipeline for the request was resolved. When that change was made, this assertion, while not incorrect, was left behind and only applied if the coordinating node was forwarding the request. Instead, the assertion applies whether or not the request is being forwarded. This commit addresses that by moving the assertion and renaming some variables.	2019-09-20 07:27:56 -04:00
Jason Tedor	2425fd1a50	Removed unused import from RequiredPipelineIT.java This commit removes an unused import that was left behind after cleaning up a backport. Sorry.	2019-09-19 16:46:27 -04:00
Jason Tedor	bd77626177	Add the ability to require an ingest pipeline (#46847 ) This commit adds the ability to require an ingest pipeline on an index. Today we can have a default pipeline, but that could be overridden by a request pipeline parameter. This commit introduces a new index setting index.required_pipeline that acts similarly to index.default_pipeline, except that it can not be overridden by a request pipeline parameter. Additionally, a default pipeline and a request pipeline can not both be set. The required pipeline can be set to _none to ensure that no pipeline ever runs for index requests on that index.	2019-09-19 16:37:45 -04:00
Armin Braun	c922743c5d	Remove Bogus Test: testDeleteOrphanSnapshot (#46835 ) (#46874 ) This test is broken with a very low failure rate after recent changes. Particularly after #45689 which does not check for duplicate snapshot uuids during snapshot finalization any more. The check for duplicate uuids during finalization was removed consciously since it led to problems during master failover. This test fails because it increments the repository state id in an unexpected manner now, starting from the impossible situation of having the same snapshot UUID for two different repository state ids. This situation can't normally be reached, but we manually crafted it here. This test didn't do anything before though, because the manually crafted cluster state would simply result in an error during finalization before and nothing but a normal snapshot delete would be tested. => removing this test here, it doesn't test anything. Closes #46843	2019-09-19 18:52:35 +02:00
Yannick Welsch	9638ca20b0	Allow dropping documents with auto-generated ID (#46773 ) When using auto-generated IDs + the ingest drop processor (which looks to be used by filebeat as well) + coordinating nodes that do not have the ingest processor functionality, this can lead to a NullPointerException. The issue is that markCurrentItemAsDropped() is creating an UpdateResponse with no id when the request contains auto-generated IDs. The response serialization is lenient for our REST/XContent format (i.e. we will send "id" : null) but the internal transport format (used for communication between nodes) assumes for this field to be non-null, which means that it can't be serialized between nodes. Bulk requests with ingest functionality are processed on the coordinating node if the node has the ingest capability, and only otherwise sent to a different node. This means that, in order to reproduce this, one needs two nodes, with the coordinating node not having the ingest functionality. Closes #46678	2019-09-19 16:46:33 +02:00
Armin Braun	a087553009	Rearrange BlobStoreRepository to Prepare #46250 (#46824 ) (#46853 ) In #46250 we will have to move to two different delete paths for BwC. Both paths will share the initial reads from the repository though so I extracted/separated the delete logic away from the initial reads to significantly simplify the diff against #46250 here. Also, I added some JavaDoc from #46250 here as well which makes the code a little easier to follow even ignoring #46250 I think.	2019-09-19 13:07:00 +02:00
Tanguy Leroux	3ae51f25dd	Move testSnapshotWithLargeSegmentFiles to ESMockAPIBasedRepositoryIntegTestCase (#46802 ) This commit moves the common test testSnapshotWithLargeSegmentFiles to the ESMockAPIBasedRepositoryIntegTestCase base class.	2019-09-18 15:41:30 +02:00
Christos Soulios	0076083b35	Implement rounding optimization for fixed offset timezones (#46809 ) Fixes #45702 with date_histogram aggregation when using fixed_interval. Optimization has been implemented for both fixed and calendar intervals	2019-09-18 15:56:34 +03:00
Armin Braun	142b10604e	Fix testHistoryRetention (#46799 ) (#46805 ) Suppress the reasonable-history check in this test to guarantee we're always getting ops based recovery even after a background sync. Closes #45953 Co-Authored-By: David Turner <david.turner@elastic.co>	2019-09-18 13:22:55 +02:00
Martijn van Groningen	ac4e990924	Add ingest cluster state listeners (#46650 ) In the case that an ingest processor factory relies on other configuration in the cluster state in order to construct a processor instance then it is currently undetermined if a processor factory can be notified about a change if multiple cluster state updates are bundled together and if a processor implements the `ClusterStateApplier` interface. (IngestService implements this interface too) The idea with ingest cluster state listener is that it is guaranteed to update the processor factory first before the ingest service creates a pipeline with their respective processor instances. Currently this concept is used in the enrich branch: https://github.com/elastic/elasticsearch/blob/enrich/x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichProcessorFactory.java#L21 In this case a processor factory is interested in enrich indices' _meta mapping fields. This is the third PR that merges changes made to server module from the enrich branch (see #32789) into the master branch. Changes to the server module are merged separately from the pr that will merge enrich into master, so that these changes can be reviewed in isolation.	2019-09-18 09:13:16 +02:00
Armin Braun	2c70d403fc	Reenable+Fix testMasterShutdownDuringFailedSnapshot (#46303 ) (#46747 ) Reenable this test since it was fixed by #45689 in production code (specifically, the fact that we write the `snap-` blobs without overwrite checks now). Only required adding the assumed blocking on index file writes to test code to properly work again. * Closes #25281	2019-09-17 18:09:48 +02:00
Armin Braun	b0f09b279f	Make Snapshot Logic Write Metadata after Segments (#45689 ) (#46764 ) * Write metadata during snapshot finalization after segment files to prevent outdated metadata in case of dynamic mapping updates as explained in #41581 * Keep the old behavior of writing the metadata beforehand in the case of mixed version clusters for BwC reasons * Still overwrite the metadata in the end, so even a mixed version cluster is fixed by this change if a newer version master does the finalization * Fixes #41581	2019-09-17 13:09:39 +02:00
Armin Braun	c045bc7f54	Minor Rearrangements in Snapshot Code (#46652 ) (#46752 ) Inlining one trivial single-use method and extracting the stale shard path blob calculation to make the diff with #46250 more manageable.	2019-09-17 09:23:00 +02:00
Armin Braun	20cb95ca5e	Fix testSnapshotRelocatingPrimary to Actually Run Relocations (#46594 ) (#46620 ) Without replicas we won't actually get any relocations going when removing the node constraints in this test. Adjusted the code to force relocations by forbidding nodes that hold primaries instead. Also, fixed the timeouts and asserted that we actually get relocations. Fixes #46276	2019-09-16 15:15:33 +02:00
Andrei Dan	c57cca98b2	[ILM] Add date setting to calculate index age (#46561 ) (#46697 ) * [ILM] Add date setting to calculate index age Add the `index.lifecycle.origination_date` to allow users to configure a custom date that'll be used to calculate the index age for the phase transitions (as opposed to the default index creation date). This could be useful for users to create an index with an "older" origination date when indexing old data. Relates to #42449. * [ILM] Don't override creation date on policy init The initial approach we took was to override the lifecycle creation date if the `index.lifecycle.origination_date` setting was set. This had the disadvantage of the user not being able to update the `origination_date` anymore once set. This commit changes the way we make use of the `index.lifecycle.origination_date` setting by checking its value when we calculate the index age (ie. at "read time") and, in case it's not set, default to the index creation date. * Make origination date setting index scope dynamic * Document origination date setting in ilm settings (cherry picked from commit d5bd2bb77ee28c1978ab6679f941d7c02e389d32) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2019-09-16 08:50:28 +01:00
Armin Braun	2b85dcb201	Parallelize Repository Cleanup Actions (#46647 ) (#46714 ) * Parallelize Repository Cleanup Actions Deleting root blobs and unreferenced indices can safely happen in parallel, no need to have both operations run sequentially when they preclude all other repository operations.	2019-09-16 07:52:03 +02:00
David Turner	272b0ecbdd	Remove docs for proxy mode (#46677 ) We added docs for proxy mode in #40281 but on reflection we should not be documenting this setting since it does not play well with all proxies and we can't recommend its use. This commit removes those docs and expands its Javadoc instead.	2019-09-13 22:20:11 +01:00
Nhat Nguyen	cabff5a7cd	Handle lower retaining seqno retention lease error (#46420 ) We renew the CCR retention lease at a fixed interval, therefore it's possible to have more than one in-flight renewal request at the same time. If requests arrive out of order, then the assertion is violated. Closes #46416 Closes #46013	2019-09-13 08:50:19 -04:00
Nhat Nguyen	e1a33c6283	Fix false positive out of sync warning in synced-flush (#46576 ) Synced-flush consists of three steps: (1) force-flush on every active copy; (2) check for ongoing indexing operations; (3) seal copies if there's no change since step 1. If some indexing operations are completed on the primary but not replicas, then Lucene commits from step 1 on replicas won't be the same as the primary's. And step 2 would pass if it's executed when all pending operations are done. Once step 2 passes, we will incorrectly emit the "out of sync" warning message although nothing is wrong here. Relates #28464 Relates #30244	2019-09-12 16:34:33 -04:00
Nhat Nguyen	5465c8d095	Increase timeout for relocation tests (#46554 ) There's nothing wrong in the logs from these failures. I think 30 seconds might not be enough to relocate shards with many documents as CI is quite slow. This change increases the timeout to 60 seconds for these relocation tests. It also dumps the hot threads in case of a timeout. Closes #46526 Closes #46439	2019-09-12 16:34:01 -04:00
Zachary Tong	e1e06c2589	Add version constant for 7.3.3	2019-09-12 13:50:40 -04:00
Jim Ferenczi	4407f3af1b	Delay the creation of SubSearchContext to the FetchSubPhase (#46598 ) This change delays the creation of the SubSearchContext for nested and parent/child inner_hits to the fetch sub phase in order to ensure that a SearchContext can be built entirely from a QueryShardContext. This commit also adds a validation step to the inner hits builder that ensures that we fail the request early if the inner hits path is invalid. Relates #46523	2019-09-12 14:52:15 +02:00
Igor Motov	35cb93248d	Geo: fix indexing of west to east linestrings crossing the antimeridian (#46601 ) Fixes the way linestrings that cross the antimeridian are indexed. Due to a normalization bug, these lines were decomposed into a line segment that stretched across the entire globe. Fixes #43775	2019-09-11 17:43:17 -04:00
Zachary Tong	6dc8ed5d57	[7.x Backport] Refactor AllocatedPersistentTask#init(), move rollup ctor logic (#46406 ) This makes the AllocatedPersistentTask#init() method protected so that implementing classes can perform their initialization logic there, instead of the constructor. Rollup's task is adjusted to use this init method. It also slightly refactors the methods to use a static logger in the AllocatedTask instead of passing it in via an argument. This is simpler, logged messages come from the task instead of the service, and is easier for tests	2019-09-11 17:00:28 -04:00
Ryan Ernst	fa9327cdb9	Add more meaningful keystore version mismatch errors (#46291 ) This commit changes the version bounds of keystore reading to give better error messages when a user has a too new or too old format. relates #44624	2019-09-11 09:55:19 -07:00
Jim Ferenczi	23bf310c84	Replace the SearchContext with QueryShardContext when building aggregator factories (#46527 ) This commit replaces the `SearchContext` with the `QueryShardContext` when building aggregator factories. Aggregator factories are part of the `SearchContext` so they shouldn't require a `SearchContext` to create them. The main changes here are the signatures of `AggregationBuilder#build` that now takes a `QueryShardContext` and `AggregatorFactory#createInternal` that passes the `SearchContext` to build the `Aggregator`. Relates #46523	2019-09-11 16:43:30 +02:00
Armin Braun	27c15f137e	Remove Unused Method from BlobStoreRepository (#46204 ) (#46593 ) This method isn't used anymore and I forgot to delete it.	2019-09-11 16:34:24 +02:00
William Brafford	8c9f15db44	Fix Path comparisons for Windows tests (#46503 ) (#46566 ) * Fix Path comparisons for Windows tests The test NodeEnvironmentTests#testCustonDataPaths worked just fine on Darwin and Linux, but the comparison was breaking in Windows because one path had the "C:\" prefix and the other one didn't. The simple fix is to compare absolute paths rather than potentially relative ones.	2019-09-11 09:33:00 -04:00
Christoph Büscher	aa0c586b73	Deprecate `_field_names` disabling (#42854 ) Currently we allow `_field_names` fields to be disabled explicitly, but since the overhead is negligible now we decided to keep it turned on by default and deprecate the `enable` option on the field type. This change adds a deprecation warning whenever this setting is used, going forward we want to ignore and finally remove it. Closes #27239	2019-09-11 14:58:08 +02:00
Armin Braun	41633cb9b5	More Efficient Ordering of Shard Upload Execution (#42791 ) (#46588 ) * More Efficient Ordering of Shard Upload Execution (#42791) * Change the upload order of snapshots to work file by file in parallel on the snapshot pool instead of merely shard-by-shard * Inspired by #39657 * Cleanup BlobStoreRepository Abort and Failure Handling (#46208)	2019-09-11 13:59:20 +02:00
Jim Ferenczi	80bb08fbda	Replace the SearchContext with QueryShardContext when building collapsing context (#46543 ) This commit replaces the `SearchContext` with the `QueryShardContext` when building collapsing context. Collapse context is part of the `SearchContext` so it shouldn't require a `SearchContext` to create one. Relates #46523	2019-09-11 12:25:38 +02:00
Jim Ferenczi	425b1a77e8	Add more context to QueryShardContext (#46584 ) This change adds an IndexSearcher and the node's BigArrays in the QueryShardContext. It's a spin off of #46527 as this change is required to allow aggregation builder to solely use the query shard context. Relates #46523	2019-09-11 12:24:51 +02:00
Armin Braun	f8d5145472	Fix SnapshotStatusApisIT (#46563 ) (#46582 ) Obviously we have to run the status request again to busy wait for the `STARTED` state, just busy waiting on an existing response won't do anything. Closes #45917	2019-09-11 11:58:42 +02:00
Lee Hinman	cdc3a260af	Add retention to Snapshot Lifecycle Management (backport of #4… (#46506 ) * Add retention to Snapshot Lifecycle Management (#46407) This commit adds retention to the existing Snapshot Lifecycle Management feature (#38461) as described in #43663. This allows a user to configure SLM to automatically delete older snapshots based on a number of criteria. An example policy would look like: ``` PUT /_slm/policy/snapshot-every-day { "schedule": "0 30 2 * * ?", "name": "<production-snap-{now/d}>", "repository": "my-s3-repository", "config": { "indices": ["foo-", "important"] }, // Newly configured retention options "retention": { // Snapshots should be deleted after 14 days "expire_after": "14d", // Keep a maximum of thirty snapshots "max_count": 30, // Keep a minimum of the four most recent snapshots "min_count": 4 } } ``` SLM Retention is run on a schedule configurable with the `slm.retention_schedule` setting, which supports cron expressions. Deletions are run for a configurable time bounded by the `slm.retention_duration` setting, which defaults to 1 hour. Included in this work is a new SLM stats API endpoint available through ``` json GET /_slm/stats ``` That returns statistics about snapshots taken and deleted, as well as successful retention runs, failures, and the time spent deleting snapshots. #45362 has more information as well as an example of the output. These stats are also included when retrieving SLM policies via the API. Add base framework for snapshot retention (#43605) * Add base framework for snapshot retention This adds a basic `SnapshotRetentionService` and `SnapshotRetentionTask` to start as the basis for SLM's retention implementation. 
Relates to #38461 * Remove extraneous 'public' * Use a local var instead of reading class var repeatedly * Add SnapshotRetentionConfiguration for retention configuration (#43777) * Add SnapshotRetentionConfiguration for retention configuration This commit adds the `SnapshotRetentionConfiguration` class and its HLRC counterpart to encapsulate the configuration for SLM retention. Currently only a single parameter is supported as an example (we still need to discuss the different options we want to support and their names) to keep the size of the PR down. It also does not yet include version serialization checks since the original SLM branch has not yet been merged. Relates to #43663 * Fix REST tests * Fix more documentation * Use Objects.equals to avoid NPE * Put `randomSnapshotLifecyclePolicy` in only one place * Occasionally return retention with no configuration * Implement SnapshotRetentionTask's snapshot filtering and delet… (#44764) * Implement SnapshotRetentionTask's snapshot filtering and deletion This commit implements the snapshot filtering and deletion for `SnapshotRetentionTask`. Currently only the expire-after age is used for determining whether a snapshot is eligible for deletion. Relates to #43663 * Fix deletes running on the wrong thread * Handle missing or null policy in snap metadata differently * Convert Tuple<String, List<SnapshotInfo>> to Map<String, List<SnapshotInfo>> * Use the `OriginSettingClient` to work with security, enhance logging * Prevent NPE in test by mocking Client * Allow empty/missing SLM retention configuration (#45018) Semi-related to #44465, this allows the `"retention"` configuration map to be missing. Relates to #43663 * Add min_count and max_count as SLM retention predicates (#44926) This adds the configuration options for `min_count` and `max_count` as well as the logic for determining whether a snapshot meets this criteria to SLM's retention feature. 
These options are optional and one, two, or all three can be specified in an SLM policy. Relates to #43663 * Time-bound deletion of snapshots in retention delete function (#45065) * Time-bound deletion of snapshots in retention delete function With a cluster that has a large number of snapshots, it's possible that snapshot deletion can take a very long time (especially since deletes currently have to happen in a serial fashion). To prevent snapshot deletion from taking forever in a cluster and blocking other operations, this commit adds a setting to allow configuring a maximum time to spend deleting snapshots during retention. This dynamic setting defaults to 1 hour and is best-effort, meaning that it doesn't hard stop a deletion at an hour mark, but ensures that once the time has passed, all subsequent deletions are deferred until the next retention cycle. Relates to #43663 * Wow snapshots suuuure can take a long time. * Use a LongSupplier instead of actually sleeping * Remove TestLogging annotation * Remove rate limiting * Add SLM metrics gathering and endpoint (#45362) * Add SLM metrics gathering and endpoint This commit adds the infrastructure to gather metrics about the different SLM actions that a cluster takes. These actions are stored in `SnapshotLifecycleStats` and perpetuated in cluster state. The stats stored include the number of snapshots taken, failed, deleted, the number of retention runs, as well as per-policy counts for snapshots taken, failed, and deleted. It also includes the amount of time spent deleting snapshots from SLM retention. 
This commit also adds an endpoint for retrieving all stats (further commits will expose this in the SLM get-policy API) that looks like: ``` GET /_slm/stats { "retention_runs" : 13, "retention_failed" : 0, "retention_timed_out" : 0, "retention_deletion_time" : "1.4s", "retention_deletion_time_millis" : 1404, "policy_metrics" : { "daily-snapshots2" : { "snapshots_taken" : 7, "snapshots_failed" : 0, "snapshots_deleted" : 6, "snapshot_deletion_failures" : 0 }, "daily-snapshots" : { "snapshots_taken" : 12, "snapshots_failed" : 0, "snapshots_deleted" : 12, "snapshot_deletion_failures" : 6 } }, "total_snapshots_taken" : 19, "total_snapshots_failed" : 0, "total_snapshots_deleted" : 18, "total_snapshot_deletion_failures" : 6 } ``` This does not yet include HLRC for this, as this commit is quite large on its own. That will be added in a subsequent commit. Relates to #43663 * Version qualify serialization * Initialize counters outside constructor * Use computeIfAbsent instead of being too verbose * Move part of XContent generation into subclass * Fix REST action for master merge * Unused import * Record history of SLM retention actions (#45513) This commit records the deletion of snapshots by the retention component of SLM into the SLM history index for the purposes of reviewing operations taken by SLM and alerting. * Retry SLM retention after currently running snapshot completes (#45802) * Retry SLM retention after currently running snapshot completes This commit adds a ClusterStateObserver to wait until the currently running snapshot is complete before proceeding with snapshot deletion. SLM retention waits for the maximum allowed deletion time for the snapshot to complete, however, the waiting time is not factored into the limit on actual deletions. 
Relates to #43663 * Increase timeout waiting for snapshot completion * Apply patch From `2374316f0d`.patch * Rename test variables * [TEST] Be less strict for stats checking * Skip SLM retention if ILM is STOPPING or STOPPED (#45869) This adds a check to ensure we take no action during SLM retention if ILM is currently stopped or in the process of stopping. Relates to #43663 * Check all actions preventing snapshot delete during retention (#45992) * Check all actions preventing snapshot delete during retention run Previously we only checked to see if a snapshot was currently running, but it turns out that more things can block snapshot deletion. This changes the check to be a check for: - a snapshot currently running - a deletion already in progress - a repo cleanup in progress - a restore currently running This was found by CI where a third party delete in a test caused SLM retention deletion to throw an exception. Relates to #43663 * Add unit test for okayToDeleteSnapshots * Fix bug where SLM retention task would be scheduled on every node * Enhance test logging * Ignore if snapshot is already deleted * Missing import * Fix SnapshotRetentionServiceTests * Expose SLM policy stats in get SLM policy API (#45989) This also adds support for the SLM stats endpoint to the high level rest client. 
Retrieving a policy now looks like: ```json { "daily-snapshots" : { "version": 1, "modified_date": "2019-04-23T01:30:00.000Z", "modified_date_millis": 1556048137314, "policy" : { "schedule": "0 30 1 * * ?", "name": "<daily-snap-{now/d}>", "repository": "my_repository", "config": { "indices": ["data-", "important"], "ignore_unavailable": false, "include_global_state": false }, "retention": {} }, "stats": { "snapshots_taken": 0, "snapshots_failed": 0, "snapshots_deleted": 0, "snapshot_deletion_failures": 0 }, "next_execution": "2019-04-24T01:30:00.000Z", "next_execution_millis": 1556048160000 } } ``` Relates to #43663 Rewrite SnapshotLifecycleIT as an ESIntegTestCase (#46356) * Rewrite SnapshotLifecycleIT as an ESIntegTestCase This commit splits `SnapshotLifecycleIT` into two different tests. `SnapshotLifecycleRestIT` which includes the tests that do not require slow repositories, and `SLMSnapshotBlockingIntegTests` which is now an integration test using `MockRepository` to simulate a snapshot being in progress. Relates to #43663 Resolves #46205 * Add error logging when exceptions are thrown * Update serialization versions * Fix type inference * Use non-Cancellable HLRC return value * Fix Client mocking in test * Fix SLMSnapshotBlockingIntegTests for 7.x branch * Update SnapshotRetentionTask for non-multi-repo snapshot retrieval * Add serialization guards for SnapshotLifecyclePolicy	2019-09-10 09:08:09 -06:00
Mayya Sharipova	2c5f9b558b	Fix highlighting for script_score query (#46507 )	2019-09-10 08:26:47 -04:00
David Turner	6c67b53932	Load metadata at start time not construction time (#46326 ) Today we load the metadata from disk while constructing the node. However there is no real need to do so, and this commit moves that code to run later while the node is starting instead.	2019-09-10 11:15:10 +01:00
Henning Andersen	9fce5a99d8	Rest Controller wildcard registration (#46487 ) Registering two different http methods on the same path using different wildcard names would result in the last wildcard name being active only. Now throw an exception instead. Closes #46482	2019-09-09 21:49:18 +02:00
Zachary Tong	8d17527050	[TEST] create larger cuckoo filters for tests (#46457 ) The cuckoo filters could be randomly created with too small a capacity or precision, which means that they can only absorb a few values before collisions start to make all filters look identical. This increases the size of the filters we generate (capacity much larger than the test cases) and lowers the fpp rate.	2019-09-09 10:18:51 -04:00
David Turner	8428f8e6e8	Remove trailing comma from nodes lists (#46484 ) Today when the membership of the cluster changes we log messages that describe the change like this: added {{node-1}{OPdaTIGmSxaEXXOyg3o96w}{127.0.0.1}{127.0.0.1:9301}{di},} The trailing comma suggests there is some missing string that might contain extra information, but in fact it's an artefact of how these messages are constructed. This commit removes the trailing comma from these lists.	2019-09-09 14:47:32 +01:00
Armin Braun	ee3396735c	Execute SnapshotsService Error Callback on Generic Thread (#46277 ) (#46480 ) I couldn't find a test for this, as it seems we only get into this error handler on a bug. Regardless, we are executing the snapshot finalization on the master update thread here which shouldn't happen and will make debugging a production issue resulting from this trickier than it has to be (because we probably also get a cluster state apply is slow warning in addition to the original bug). Used the generic pool here instead of the snapshot pool because we're resolving the user callback here as well and the generic pool seemed like the safer bet for that.	2019-09-09 14:38:11 +02:00
Martijn van Groningen	c057fce978	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-09-09 08:40:54 +02:00
Nhat Nguyen	24c3a1de3c	Ignore replication for noop updates (#46458 ) Previously, we ignored replication for noop updates because they did not have sequence numbers. Since #44603, we started assigning sequence numbers to noop updates, leading them to be replicated to replicas. This bug occurs only on 8.0 because it requires #41065 and #44603. Closes #46366	2019-09-07 11:32:01 -04:00
markharwood	323ec022be	Deprecate the "index.max_adjacency_matrix_filters" index setting (#46394 ) Following performance optimisations to the adjacency_matrix aggregation we no longer require this setting. Marked as deprecated and due for removal in 8.0 Related #46324	2019-09-06 13:59:47 +01:00
Yunfeng,Wu	7582af27b0	Resolve the incorrect scroll_current when deleting or closing an index (#45226 ) Resolve the incorrect current scroll count for a deleted or closed index	2019-09-06 09:45:53 +02:00
Jim Ferenczi	f2a6c88f83	Add a system property to ignore awareness attributes (#46375 ) This is a follow up of #19191 for 7.x. This change adds a system property called "es.routing.search_ignore_awareness_attributes" that when set to true will effectively ignore allocation awareness attributes when routing search and get requests. This is now the default in 8.x so this commit adds a way to opt-in to this new behavior in a minor version of 7.x. Relates #45735	2019-09-06 09:29:27 +02:00
Paul Sanwald	758680c549	version bump to 6.8.4 (#46409 )	2019-09-05 15:14:36 -04:00
Jason Tedor	92866f977a	Clarify error message on keystore write permissions (#46321 ) When the Elasticsearch process does not have write permissions to upgrade the Elasticsearch keystore, we bail with an error message that indicates there is a filesystem permissions problem. This commit clarifies that error message by pointing out the directory where write permissions are required, or that the user can also run the elasticsearch-keystore upgrade command manually before starting the Elasticsearch process. In this case, the upgrade would not be needed at runtime, so the permissions would not be needed then.	2019-09-05 15:11:54 -04:00
Benjamin Trent	d912a49c6f	[7.x] Support geotile_grid aggregation in composite agg sources (#45810 ) (#46399 ) * Support geotile_grid aggregation in composite agg sources (#45810) Adds support for `geotile_grid` as a source in composite aggs. Part of this change includes adding a new docFormat of `GEOTILE` that formats a hashed `long` value into a geotile formatting string `zoom/x/y`.	2019-09-05 13:22:57 -05:00
Armin Braun	7a9af874ad	Enable Debug Logging for Master and Coordination Packages (#46363 ) (#46374 ) In order to track down #46091: * Enables debug logging in REST tests for `master` and `coordination` packages since we suspect that issues are caused by failed and then retried publications	2019-09-05 14:03:38 +02:00
Yannick Welsch	7e4c633ce3	Quiet down shard lock failures (#46368 ) These were actually never intended to be logged at the warning level but made visible by a refactoring in #19991, which introduced a new exception type but forgot to adapt some of the consumers of the exception.	2019-09-05 13:08:11 +02:00
Nhat Nguyen	03ed18a010	Unmute testRecoveryFromFailureOnTrimming Tracked at #46267	2019-09-04 22:33:17 -04:00
Julie Tibshirani	40c3225d26	First round of optimizations for vector functions. (#46294 ) This PR merges the `vectors-optimize-brute-force` feature branch, which makes the following changes to how vector functions are computed: * Precompute the L2 norm of each vector at indexing time. (#45390) * Switch to ByteBuffer for vector encoding. (#45936) * Decode vectors while computing the vector function. (#46103) * Use an array instead of a List for the query vector. (#46155) * Precompute the normalized query vector when using cosine similarity. (#46190) Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>	2019-09-04 14:45:57 -07:00
Nhat Nguyen	a16cb89956	Revert "Sync translog without lock when trim unreferenced readers (#46203 )" Unfortunately, with this change, we won't clean up all unreferenced generations when reopening. We assume that there's at most one unreferenced generation when reopening translog. The previous implementation guarantees this assumption by syncing translog every time after we remove a translog reader. This change, however, only syncs translog once after we have removed all unreferenced readers (can be more than one) and breaks the assumption. Closes #46267 This reverts commit fd8183ee51d7cf08d9def58a2ae027714beb60de.	2019-09-04 17:09:39 -04:00
Jason Tedor	3cbdd84b89	Add test that get triggers shard search active (#46317 ) This commit is a follow-up to a change that fixed that multi-get was not triggering a shard to become search active. In that change, we added a test that multi-get properly triggers a shard to become search active. This commit is a follow-up to that change which adds a test for the get case. While get is already handled correctly in production code, there was not a test for it. This commit adds one. Additionally, we factor all the search idle tests from IndexShardIT into a separate test class, as an effort to keep related tests together instead of a single large test class containing a jumble of tests, and also to keep test classes smaller for better parallelization.	2019-09-04 11:53:32 -04:00
markharwood	408b58dd9d	Adjacency_matrix aggregation optimisation. (#46257 ) (#46315 ) Avoid pre-allocating ((N * N) - N) / 2 “BitsIntersector” objects given N filters. Most adjacency matrices will be sparse and we typically don’t need to allocate all of these objects - can save a lot of allocations when the number of filters is high. Closes #46212	2019-09-04 16:45:32 +01:00
Nhat Nguyen	eb56d23421	Do not send recovery requests with CancellableThreads (#46287 ) Previously, we sent recovery requests using CancellableThreads because we sent requests and waited for responses in a blocking manner. With async recovery, we no longer need to do so. Moreover, if we fail to submit a request, then we can release the Store using an interruptible thread which can risk invalidating the node lock. This PR is the first step to avoid forking when releasing the Store. Relates #45409 Relates #46178	2019-09-04 11:26:11 -04:00
Henning Andersen	5066835569	Fix SearchService.createContext exception handling (#46258 ) An exception from the DefaultSearchContext constructor could leak a searcher, causing future issues like shard lock obtained exceptions. The underlying cause of the exception in the constructor has been fixed, but as a safety precaution we also fix the exception handling in createContext. Closes #45378	2019-09-04 14:46:30 +02:00
Nhat Nguyen	3f67cbe974	Suppress warning from background sync on relocated primary (#46247 ) If a primary is being relocated, then the global checkpoint and retention lease background sync can emit unnecessary warning logs. This side effect was introduced in #42241. Relates #40800 Relates #42241	2019-09-03 18:44:15 -04:00
Nhat Nguyen	5924df1764	Mute testRecoveryFromFailureOnTrimming Tracked at #46267	2019-09-03 18:44:08 -04:00
Lee Hinman	57f322f85e	Move MockRepository into test framework (#46298 ) This moves the `MockRepository` class into `test/framework/src/main` so it can be used across all modules and plugins in tests.	2019-09-03 16:21:10 -06:00
Jason Tedor	b8c51ff894	Multi-get requests should wait for search active (#46283 ) When a shard has fallen search idle, and a non-realtime multi-get request is executed, today such requests do not wait for the shard to become search active and therefore such requests do not wait for a refresh to see the latest changes to the index. This also prevents such requests from triggering the shard as non-search idle, influencing the behavior of scheduled refreshes. This commit addresses this by attaching a listener to the shard search active state for multi-get requests. In this way, when the next scheduled refresh is executed, the multi-get request will then proceed.	2019-09-03 14:31:37 -04:00
Henning Andersen	2383acaa89	Fix testSyncFailsIfOperationIsInFlight (#46269 ) testSyncFailsIfOperationIsInFlight could fail due to the index request spawning a GCP sync (new since 7.4). Test now waits for it to finish before testing that flushed sync fails.	2019-09-03 17:30:00 +02:00
dengweisysu	416419e4c9	Sync translog without lock when trim unreferenced readers (#46203 ) With this change, we can avoid blocking writing threads when trimming unreferenced readers; hence improving the translog writing performance in async durability mode. Close #46201	2019-09-02 21:55:06 -04:00
Anup	e01ec802e7	Remove duplicate line in SearchAfterBuilder (#45994 )	2019-09-03 01:30:01 +02:00
Armin Braun	2662c1b417	Wait for all Rec. to Stop on Node Close (#46178 ) (#46237 ) * Wait for all Rec. to Stop on Node Close * This issue is in the `RecoverySourceHandler#acquireStore`. If we submit the store release to the generic threadpool while it is getting shut down we never complete the future we wait on (in the generic pool as well) and fail to ever release the store potentially. * Fixed by waiting for all recoveries to end on node close so that we always have a healthy thread pool here * Closes #45956	2019-09-02 18:04:37 +02:00
Martijn van Groningen	555b630160	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-09-02 09:16:55 +02:00
Martijn van Groningen	5747badaa8	Allow ingest processors access to node client. (#46077 ) This is the first PR that merges changes made to server module from the enrich branch (see #32789) into the master branch. The plan is to merge changes made to the server module separately from the pr that will merge enrich into master, so that these changes can be reviewed in isolation.	2019-09-02 08:24:26 +02:00
Nhat Nguyen	db949847e5	Fix translog stats in testPrepareIndexForPeerRecovery (#46137 ) When recovering a shard locally, we use a translog snapshot from newSnapshotFromGen which consists of all readers from a certain generation. In the test, we use newSnapshotFromMinSeqNo for the expectation. The snapshot of this method includes only readers containing operations in the requesting range. Closes #46022	2019-08-30 08:53:27 -04:00
Andrey Ershov	152ce62c58	Enhanced logging when transport is misconfigured to talk to HTTP port (#45964 ) If a node is misconfigured to talk to a remote node's HTTP port (instead of the transport port), eventually it will receive an HTTP response from the remote node on the transport port (this happens when a node accidentally sends a line-terminating byte in a transport request). If this happens today it results in a non-friendly log message and a long stack trace. This commit adds a check for whether a malformed response is an HTTP response. In this case, a concise log message would appear. (cherry picked from commit 911d02b7a9c3ce7fe316360c127a935ca4b11f37)	2019-08-30 13:02:08 +02:00
Paul Sanwald	8bdbc7d9bf	Bump version from 7.4 to 7.5 (#46142 )	2019-08-29 15:03:26 -04:00
Julie Tibshirani	b5d8b364bb	Ensure top docs optimization is fully disabled for queries with unbounded max scores. (#46105 ) (#46139 ) When a query contains a mandatory clause that doesn't track the max score per block, we disable the max score optimization. Previously, we were doing this by wrapping the collector with a FilterCollector that always returned ScoreMode.COMPLETE. However we weren't adjusting totalHitsThreshold, so the collector could still call Scorer#setMinCompetitiveScore. It is against the method contract to call setMinCompetitiveScore when the score mode is COMPLETE, and some scorers like ReqOptSumScorer throw an error in this case. This commit tries to disable the optimization by always setting totalHitsThreshold to max int, as opposed to wrapping the collector.	2019-08-29 10:56:53 -07:00
Simon Willnauer	9b2ea07b17	Flush engine after big merge (#46066 ) (#46111 ) Today we might carry on a big merge uncommitted and therefore occupy a significant amount of diskspace for quite a long time if for instance indexing load goes down and we are not quickly reaching the translog size threshold. This change will cause a flush if we hit a significant merge (512MB by default) which frees diskspace sooner.	2019-08-29 17:54:15 +02:00
Nhat Nguyen	bb49124690	Only verify global checkpoint if translog sync occurred (#45980 ) We only sync translog if the given offset hasn't synced yet. We can't verify the global checkpoint from the latest translog checkpoint unless a sync has occurred. Closes #46065 Relates #45634	2019-08-29 09:44:40 -04:00
David Turner	d340530a47	Avoid overshooting watermarks during relocation (#46079 ) Today the `DiskThresholdDecider` attempts to account for already-relocating shards when deciding how to allocate or relocate a shard. Its goal is to stop relocating shards onto a node before that node exceeds the low watermark, and to stop relocating shards away from a node as soon as the node drops below the high watermark. The decider handles multiple data paths by only accounting for relocating shards that affect the appropriate data path. However, this mechanism does not correctly account for _new_ relocating shards, which are unwittingly ignored. This means that we may evict far too many shards from a node above the high watermark, and may relocate far too many shards onto a node causing it to blow right past the low watermark and potentially other watermarks too. There are in fact two distinct issues that this PR fixes. New incoming shards have an unknown data path until the `ClusterInfoService` refreshes its statistics. New outgoing shards have a known data path, but we fail to account for the change of the corresponding `ShardRouting` from `STARTED` to `RELOCATING`, meaning that we fail to find the correct data path and treat the path as unknown here too. This PR also reworks the `MockDiskUsagesIT` test to avoid using fake data paths for all shards. With the changes here, the data paths are handled in tests as they are in production, except that their sizes are fake. Fixes #45177	2019-08-29 12:40:55 +01:00
Jason Tedor	9bc4a24118	Handle delete document level failures (#46100 ) Today we assume that document failures can not occur for deletes. This assumption is bogus, as they can fail for a variety of reasons such as the Lucene index having reached the document limit. Because of this assumption, we were asserting that such a document-level failure would never happen. When this bogus assertion is violated, we fail the node, a catastrophe. Instead, we need to treat this as a fatal engine exception.	2019-08-28 22:17:16 -04:00
Tal Levy	a356bcff41	Add Circle Processor (#43851 ) (#46097 ) add circle-processor that translates circles to polygons	2019-08-28 14:44:08 -07:00
Jason Tedor	1249e6ba5d	Handle no-op document level failures (#46083 ) Today we assume that document failures can not occur for no-ops. This assumption is bogus, as they can fail for a variety of reasons such as the Lucene index having reached the document limit. Because of this assumption, we were asserting that such a document-level failure would never happen. When this bogus assertion is violated, we fail the node, a catastrophe. Instead, we need to treat this as a fatal engine exception.	2019-08-28 13:57:24 -04:00
Tanguy Leroux	9e14ffa8be	Few clean ups in ESBlobStoreRepositoryIntegTestCase (#46068 )	2019-08-28 16:29:46 +02:00
Mark Tozzi	aec125faff	Support Range Fields in Histogram and Date Histogram (#46012 ) Backport of 1a0dddf4ad24b3f2c751a1fe0e024fdbf8754f94 (AKA #445395) * Add support for a Range field ValuesSource, including decode logic for range doc values and exposing RangeType as a first class enum * Provide hooks in ValuesSourceConfig for aggregations to control ValuesSource class selection on missing & script values * Branch aggregator creation in Histogram and DateHistogram based on ValuesSource class, to enable specialization based on type. This is similar to how Terms aggregator works. * Prioritize field type when available for selecting the ValuesSource class type to use for an aggregation	2019-08-28 09:06:09 -04:00
Martijn van Groningen	1157224a6b	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-08-28 10:14:07 +02:00
Henning Andersen	300e717e42	Disallow partial results when shard unavailable (#45739 ) Searching with `allowPartialSearchResults=false` could still return partial search results during recovery. If a shard copy fails with a "shard not available" exception, the failure would be ignored and a partial result returned. The one case where this is known to happen is when a shard copy is recovering when searching, since `IllegalIndexShardStateException` is considered a "shard not available" exception. Relates to #42612	2019-08-27 17:01:23 +02:00
Nhat Nguyen	146e23a8a9	Relax translog assertion in testRestoreLocalHistoryFromTranslog (#45943 ) Since #45473, we trim translog below the local checkpoint of the safe commit immediately if soft-deletes are enabled. In testRestoreLocalHistoryFromTranslog, we should have a safe commit after recoverFromTranslog is called; then we will trim translog files which contain only operations that are at most the global checkpoint. With this change, we relax the assertion to ensure that we don't put operations to translog while recovering history from the local translog.	2019-08-26 17:19:19 -04:00
Nhat Nguyen	c66bae39c3	Update translog checkpoint after marking ops as persisted (#45634 ) If two translog syncs happen concurrently, then one can return before its operations are marked as persisted. In general, this should not be an issue; however, peer recoveries currently rely on this assumption. Closes #29161	2019-08-26 17:18:52 -04:00
Nhat Nguyen	f2e8b17696	Do not create engine under IndexShard#mutex (#45263 ) Today we create new engines under IndexShard#mutex. This is not ideal because it can block the cluster state updates which also execute under the same mutex. We can avoid this problem by creating new engines under a separate mutex. Closes #43699	2019-08-26 17:18:29 -04:00
Jason Tedor	3d64605075	Remove node settings from blob store repositories (#45991 ) This commit starts from the simple premise that the use of node settings in blob store repositories is a mistake. Here we see that the node settings are used to get default settings for store and restore throttle rates. Yet, since there are not any node settings registered to this effect, there can never be a default setting to fall back to there, and so we always end up falling back to the default rate. Since this was the only use of node settings in blob store repository, we move them. From this, several places fall out where we were chaining settings through only to get them to the blob store repository, so we clean these up as well. That leaves us with the changeset in this commit.	2019-08-26 16:26:13 -04:00
Zachary Tong	943a016bb2	Add Cumulative Cardinality agg (and Data Science plugin) (#45990 ) This adds a pipeline aggregation that calculates the cumulative cardinality of a field. It does this by iteratively merging in the HLL sketch from consecutive buckets and emitting the cardinality up to that point. This is useful for things like finding the total "new" users that have visited a website (as opposed to "repeat" visitors). This is a Basic+ aggregation and adds a new Data Science plugin to house it and future advanced analytics/data science aggregations.	2019-08-26 16:19:55 -04:00
James Baiera	5535ff0a44	Fix IngestService to respect original document content type (#45799 ) (#45984 ) Backport of #45799 This PR modifies the logic in IngestService to preserve the original content type on the IndexRequest, such that when a document with a content type like SMILE is submitted to a pipeline, the resulting document that is persisted will remain in the original content type (SMILE in this case).	2019-08-26 14:33:33 -04:00
Armin Braun	af2bd75def	Fix Broken HTTP Request Breaking Channel Closing (#45958 ) (#45973 ) This is essentially the same issue fixed in #43362 but for http request version instead of the request method. We have to deal with the case of not being able to parse the request version, otherwise channel closing fails. Fixes #43850	2019-08-26 16:20:58 +02:00
Armin Braun	5a17987e19	Fix SnapshotStatusApisIT (#45929 ) (#45971 ) The snapshot status when blocking can still be INIT in rare cases when the new cluster state that has the snapshot in `STARTED` hasn't yet become visible. Fixes #45917	2019-08-26 15:59:02 +02:00
Andrey Ershov	d96469ddff	Better logging for TLS message on non-secure transport channel (#45835 ) This commit enhances logging for 2 cases: 1. If non-TLS enabled node receives transport message from TLS enabled node on transport port. 2. If non-TLS enabled node receives HTTPs request on transport port. (cherry picked from commit 4f52ebd32eb58526b4c8022f8863210bf88fc9be)	2019-08-26 15:07:13 +02:00
Jason Tedor	599bf2d68b	Deprecate the pidfile setting (#45938 ) This commit deprecates the pidfile setting in favor of node.pidfile.	2019-08-23 21:31:35 -04:00
Mayya Sharipova	3bc1494d38	Correct warning testScalingThreadPoolConfiguration Correct expected warning Closes #45907	2019-08-23 10:30:36 -04:00
Henning Andersen	46d9a575db	Fix RemoteClusterConnection close race (#45898 ) Closing a `RemoteClusterConnection` concurrently with trying to connect could result in double invoking the listener. This fixes RemoteClusterConnectionTest#testCloseWhileConcurrentlyConnecting Closes #45845	2019-08-23 14:26:02 +02:00
Tanguy Leroux	8e66df9925	Move testRetentionLeasesClearedOnRestore (#45896 )	2019-08-23 13:43:40 +02:00
Martijn van Groningen	837cfa2640	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-08-23 11:22:27 +02:00
Alexander Reelsen	ecafe4f4ad	Update joda to 2.10.3 (#45495 )	2019-08-23 10:39:39 +02:00
Armin Braun	ba6d72ea9f	Fix TransportSnapshotsStatusAction ThreadPool Use (#45824 ) (#45883 ) In case of an in-progress snapshot this endpoint was broken because it tried to execute repository operations in the callback on a transport thread which is not allowed (only generic or snapshot pool are allowed here).	2019-08-23 06:17:50 +02:00
Jason Tedor	de6b6fd338	Add node.processors setting in favor of processors (#45885 ) This commit namespaces the existing processors setting under the "node" namespace. In doing so, we deprecate the existing processors setting in favor of node.processors.	2019-08-22 22:18:37 -04:00
Nhat Nguyen	3393f9599e	Ignore translog retention policy if soft-deletes enabled (#45473 ) Since #45136, we use soft-deletes instead of translog in peer recovery. There's no need to retain extra translog to increase a chance of operation-based recoveries. This commit ignores the translog retention policy if soft-deletes is enabled so we can discard translog more quickly. Backport of #45473 Relates #45136	2019-08-22 16:40:06 -04:00
dengweisysu	72c6302d12	Fsync translog without writeLock before rolling (#45765 ) Today, when rolling a new translog generation, we block all write threads until a new generation is created. This choice is perfectly fine except in a highly concurrent environment with the translog async setting. We can reduce the blocking time by pre-syncing the current generation without writeLock before rolling. The new step would fsync most of the data of the current generation without blocking write threads. Close #45371	2019-08-22 16:18:42 -04:00
William Brafford	f82c0f56a6	Mute flaky RemoteClusterConnection test (#45850 )	2019-08-22 15:00:43 -04:00
Jake Landis	c60399c77f	introduce 7.3.2 version to 7.x (#45864 )	2019-08-22 12:24:19 -05:00
Andrey Ershov	ed8307c198	Deprecate es.http.cname_in_publish_address setting (#45616 ) Follow up on #32806. The system property es.http.cname_in_publish_address is deprecated starting from 7.0.0 and deprecation warning should be added if the property is specified. This PR will go to 7.x and master. Follow-up PR to remove es.http.cname_in_publish_address property completely will go to the master. (cherry picked from commit a5ceca7715818f47ec87dd5f17f8812c584b592b)	2019-08-22 12:09:35 +02:00
Armin Braun	88acae48ce	Remove index-N Rebuild in Shard Snapshot Updates (#45740 ) (#45778 ) * There is no point in listing out every shard over and over when the `index-N` blob in the shard contains a list of all the files * Rebuilding the `index-N` from the `snap-${uuid}.dat` blobs does not provide any material benefit. It only would in the corner case of a corrupted `index-N` but otherwise uncorrupted blobs since we neither check the correctness of the content of all segment blobs nor do we do a similar recovery at the root of the repository. * Also, at least in version `6.x` we only mark a shard snapshot as successful after writing out the updated `index-N` blob so all snapshots that would work with `7.x` and newer must have correct `index-N` blobs => Removed the rebuilding of the `index-N` content from `snap-${uuid}.dat` files and moved to only listing `index-N` when taking a snapshot instead of listing all files => Removed check of file existence against physical blob listing => Kept full listing on the delete side to retain full cleanup of blobs that aren't referenced by the `index-N`	2019-08-22 11:32:45 +02:00
Luca Cavanna	b95ca9c3bb	Fix compile errors in HttpChannelTaskHandler Relates to #43332	2019-08-22 11:13:26 +02:00
Luca Cavanna	a47ade3e64	Cancel search task on connection close (#43332 ) This PR introduces a mechanism to cancel a search task when its corresponding connection gets closed. That would relieve users from having to manually deal with tasks and cancel them if needed. In particular, finding the task_id requires calling get tasks, which needs to call every node in the cluster. The implementation is based on associating each http channel with its currently running search task, and cancelling the task when the previously registered close listener gets called.	2019-08-22 10:43:20 +02:00
Nhat Nguyen	3029887451	Never release store using CancellableThreads (#45409 ) Today we can release a Store using CancellableThreads. If we are holding the last reference, then we will verify the node lock before deleting the store. Checking node lock performs some I/O on FileChannel. If the current thread is interrupted, then the channel will be closed and the node lock will also be invalid. Closes #45237	2019-08-21 21:24:31 -04:00
Tal Levy	9b14b7298b	[7.x] Add is_write_index column to cat.aliases (#45798 ) * Add is_write_index column to cat.aliases (#44772) Aliases have had the option to set `is_write_index` since 6.4, but the cat.aliases action was never updated. * correct version bounds to 7.4	2019-08-21 14:15:49 -07:00
William Brafford	2b549e7342	CLI tools: write errors to stderr instead of stdout (#45586 ) Most of our CLI tools use the Terminal class, which previously did not provide methods for writing to standard error. When all output goes to standard out, there are two basic problems. First, errors and warnings are "swallowed" in pipelines, making it hard for a user to know when something's gone wrong. Second, errors and warnings are intermingled with legitimate output, making it difficult to pass the results of interactive scripts to other tools. This commit adds a second set of print commands to Terminal for printing to standard error, with errorPrint corresponding to print and errorPrintln corresponding to println. This leaves it to developers to decide which output should go where. It also adjusts existing commands to send errors and warnings to stderr. Usage is printed to standard output when it's correctly requested (e.g., bin/elasticsearch-keystore --help) but goes to standard error when a command is invoked incorrectly (e.g. bin/elasticsearch-keystore list-with-a-typo | sort).	2019-08-21 14:46:07 -04:00
Armin Braun	790765d3f9	Remove Dep. on SnapshotsService in SnapshotShardsService (#45776 ) (#45791 ) SnapshotShardsService depends on the RepositoriesService not the SnapshotsService, no need to have this indirection.	2019-08-21 19:26:19 +02:00
Armin Braun	6aaee8aa0a	Repository Cleanup Endpoint (#43900 ) (#45780 ) * Repository Cleanup Endpoint (#43900) * Snapshot cleanup functionality via transport/REST endpoint. * Added all the infrastructure for this with the HLRC and node client * Made use of it in tests and resolved relevant TODO * Added new `Custom` CS element that tracks the cleanup logic. Kept it similar to the delete and in progress classes and gave it some (for now) redundant way of handling multiple cleanups but only allow one * Use the exact same mechanism used by deletes to have the combination of CS entry and increment in repository state ID provide some concurrency safety (the initial approach of just an entry in the CS was not enough, we must increment the repository state ID to be safe against concurrent modifications, otherwise we run the risk of "cleaning up" blobs that just got created without noticing) * Isolated the logic to the transport action class as much as I could. It's not ideal, but we don't need to keep any state and do the same for other repository operations (like getting the detailed snapshot shard status)	2019-08-21 17:59:49 +02:00
Jim Ferenczi	fe2a7523ec	Add support for inlined user dictionary in the Kuromoji plugin (#45489 ) This change adds a new option called user_dictionary_rules to Kuromoji's tokenizer. It can be used to set additional tokenization rules to the Japanese tokenizer directly in the settings (instead of using a file). This commit also adds a check that no rules are duplicated since this is not allowed in the UserDictionary. Closes #25343	2019-08-21 16:28:30 +02:00
Martijn van Groningen	2677ac14d2	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-08-21 14:28:17 +02:00
Christos Soulios	2a0c7c40e5	[7.x] Implement AvgAggregatorTests#testDontCacheScripts and remove AvgIT #45746 Backports PR #45737: Similar to PR #45030 integration test testDontCacheScripts() was moved to unit test AvgAggregatorTests#testDontCacheScripts. AvgIT class was removed.	2019-08-20 20:19:51 +03:00
Christos Soulios	96a40acd82	[7.x] Migrate tests from MaxIT to MaxAggregatorTests (#45030 ) #45742 Backports PR #45030 to 7.x: This PR migrates tests from MaxIT integration test to MaxAggregatorTests, as described in #42893	2019-08-20 18:58:47 +03:00
Nhat Nguyen	e9759b2b33	Wait for background refresh in testAutomaticRefresh (#45661 ) If the background refresh is running, but not finished yet then the document might not be visible to the next search. Thus, if scheduledRefresh returns false, we need to wait until the background refresh is done. Closes #45571	2019-08-20 10:40:12 -04:00
Rory Hunter	47b3dccbc4	Always check that cgroup data is present (#45647 ) `OsProbe` fetches cgroup data from the filesystem, and has asserts that check for missing values. This PR changes most of these asserts into runtime checks, since at least one user has reported an NPE where a piece of cgroup data was missing. Backport of #45606 to 7.x.	2019-08-19 10:29:41 +01:00
Nhat Nguyen	6f5d944fbd	Ensure AsyncTask#isScheduled remain false after close (#45687 ) If a scheduled task of an AbstractAsyncTask starts after it was closed, then isScheduledOrRunning can remain true forever although no task is running or scheduled. Closes #45576	2019-08-17 13:48:50 -04:00
Vega	6f2daa85e3	Allow uppercase in keystore setting names (#45222 ) The elasticsearch keystore was originally backed by a PKCS#12 keystore, which had several limitations. To overcome some of these limitations in encoding, the setting names existing within the keystore were limited to lowercase alphanumeric (with underscore). Now that the keystore is backed by an encrypted blob, this restriction is no longer relevant. This commit relaxes that restriction by allowing uppercase ASCII characters as well. closes #43835	2019-08-16 17:50:08 -07:00
Igor Motov	98c850c08b	Geo: Change order of parameter in Geometries to lon, lat 7.x (#45618 ) Changes the order of parameters in Geometries from lat, lon to lon, lat and moves all Geometry classes to the org.elasticsearch.geometry package. Backport of #45332 Closes #45048	2019-08-16 14:42:02 -04:00
Ryan Ernst	742213d710	Improve error message when index settings are not a map (#45588 ) This commit adds an explicit error message when a create index request contains a settings key that is not a json object. Prior to this change the user would be given a ClassCastException with no explanation of what went wrong. closes #45126	2019-08-16 11:39:26 -07:00
Zachary Tong	50c65d05ba	Move bucket reduction from Bucket to the InternalAgg (#45566 ) The current idiom is to have the InternalAggregator find all the buckets sharing the same key, put them in a list, get the first bucket and ask that bucket to reduce all the buckets (including itself). This is a somewhat confusing workflow, and it feels like the aggregator should be reducing the buckets (since the aggregator owns the buckets), rather than asking one bucket to do all the reductions. This commit basically moves the `Bucket.reduce()` method to the InternalAgg and renames it `reduceBucket()`. It also moves the `createBucket()` (or equivalent) method from the bucket to the InternalAgg as well.	2019-08-16 13:59:00 -04:00
Andrey Ershov	dbc90653dc	transport.publish_address should contain CNAME (#45626 ) This commit adds CNAME reporting for transport.publish_address same way it's done for http.publish_address. Relates #32806 Relates #39970 (cherry picked from commit e0a2558a4c3a6b6fbfc6cd17ed34a6f6ef7b15a9)	2019-08-16 17:42:00 +02:00
Armin Braun	d6a9edea16	Lower Limit for Maximum Message Size in TcpTransport (#44496 ) (#45635 ) * Since we're buffering network reads to the heap and then deserializing them it makes no sense to buffer a message that is 90% of the heap size since we couldn't deserialize it anyway * I think `30%` is a more reasonable guess here given that we can reasonably assume that the deserialized message will be larger than the serialized message itself and processing it will take additional heap as well	2019-08-16 12:27:54 +02:00
Martijn van Groningen	5ea0985711	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-08-16 09:47:11 +02:00
Armin Braun	a48242c371	Cleanup Redundant TransportLogger Instantiation (#43265 ) (#45629 ) * This class' methods are all effectively `static` => make them `static` and stop instantiating it needlessly	2019-08-15 21:16:56 +02:00
Zachary Tong	cd441f6906	Catch AllocatedTask registration failures (#45300 ) When a persistent task attempts to register an allocated task locally, this creates the Task object and starts tracking it locally. If there is a failure while initializing the task, this is handled by a catch and subsequent error handling (canceling, unregistering, etc). But if the task fails to be created because an exception is thrown in the tasks ctor, this is uncaught and fails the cluster update thread. The ramification is that a persistent task remains in the cluster state, but is unable to create the allocated task, and the exception prevents other tasks "after" the poisoned task from starting too. Because the allocated task is never created, the cancellation tools are not able to remove the persistent task and it is stuck as a zombie in the CS. This commit adds exception handling around the task creation, and attempts to notify the master if there is a failure (so the persistent task can be removed). Even if this notification fails, the exception handling means the rest of the uninitialized tasks can proceed as normal.	2019-08-15 15:14:19 -04:00
Armin Braun	de58353722	Lower Painless Static Memory Footprint (#45487 ) (#45619 ) * Painless generates a ton of duplicate strings and empty `HashMap` instances wrapped as unmodifiable * This change brings down the static footprint of Painless on an idle node by 20MB (after running the PMC benchmark against said node) * Since we were looking into ways of optimizing for smaller node sizes I think this is a worthwhile optimization	2019-08-15 19:41:45 +02:00
Alpar Torok	03a1645bc6	Use dynamic port ranges for ExternalTestCluster (#45601 ) Moves methods added in #44213 and uses them to configure the port range for `ExternalTestCluster` too. These were still using `9300-9400` (the default) and running into races.	2019-08-15 16:40:12 +03:00
Armin Braun	1beea3588b	Make BlobStoreRepository Validation Read master.dat (#45546 ) (#45578 ) * Fixing this for two reasons: 1. Why not verify that the seed we wrote is actually there when we can 2. The AWS S3 SDK started to log a bunch of WARN messages about not fully reading the stream now that we started to abuse the read blob as an `exists` check after removing that method from the blob container	2019-08-15 07:07:52 +02:00
Nick Knize	647a8308c3	[SPATIAL] Backport new ShapeFieldMapper and ShapeQueryBuilder to 7x (#45363 ) * Introduce Spatial Plugin (#44389) Introduce a skeleton Spatial plugin that holds new licensed features coming to Geo/Spatial land! * [GEO] Refactor DeprecatedParameters in AbstractGeometryFieldMapper (#44923) Refactor DeprecatedParameters specific to legacy geo_shape out of AbstractGeometryFieldMapper.TypeParser#parse. * [SPATIAL] New ShapeFieldMapper for indexing cartesian geometries (#44980) Add a new ShapeFieldMapper to the xpack spatial module for indexing arbitrary cartesian geometries using a new field type called shape. The indexing approach leverages lucene's new XYShape field type which is backed by BKD in the same manner as LatLonShape but without the WGS84 latitude longitude restrictions. The new field mapper builds on and extends the refactoring effort in AbstractGeometryFieldMapper and accepts shapes in either GeoJSON or WKT format (both of which support non geospatial geometries). Tests are provided in the ShapeFieldMapperTest class in the same manner as GeoShapeFieldMapperTests and LegacyGeoShapeFieldMapperTests. Documentation for how to use the new field type and what parameters are accepted is included. The QueryBuilder for searching indexed shapes is provided in a separate commit. * [SPATIAL] New ShapeQueryBuilder for querying indexed cartesian geometry (#45108) Add a new ShapeQueryBuilder to the xpack spatial module for querying arbitrary Cartesian geometries indexed using the new shape field type. The query builder extends AbstractGeometryQueryBuilder and leverages the ShapeQueryProcessor added in the previous field mapper commit. Tests are provided in ShapeQueryTests in the same manner as GeoShapeQueryTests and docs are updated to explain how the query works.	2019-08-14 16:35:10 -05:00
Armin Braun	e0d84e7178	Clean up Callback Chains and Duplicate in SnapshotResiliencyTests (#45398 ) (#45563 ) * It's in the title, follow up to #45233 * Flatten more listeners into `StepListener` * Remove duplication from repo and index bootstrap and asserting that the steps execute successfully	2019-08-14 21:53:07 +02:00
Armin Braun	5f6bc6fc2d	Prevent Leaking Search Tasks on Exceptions in FetchSearchPhase and DfsQueryPhase (#45500 ) (#45540 ) * If `counter.onResult` throws an exception we might leak a transport task because the failure is not handled as a phase failure (instead it bubbles up in the transport service eventually hitting the `onFailure` callback again and counting down the `counter` twice). Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>	2019-08-14 14:49:38 +02:00
Armin Braun	00e4fba2fb	Simplify and Optimize RestController Slightly (#45419 ) (#45485 ) * Simplify the path iterator to generate less garbage * `dispatchRequest` always terminates, adjust code accordingly	2019-08-13 10:43:30 +02:00
Martijn van Groningen	1951cdf1cb	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-08-13 09:12:31 +02:00
Julie Tibshirani	dc1856ca53	Make sure to validate the type before attempting to merge a new mapping. (#45157 ) Currently, when adding a new mapping, we attempt to parse + merge it before checking whether its top-level document type matches the existing type. So when a user attempts to introduce a new mapping type, we may give a confusing error message around merging instead of complaining that it's not possible to add more than one type ("Rejecting mapping update to [my-index] as the final mapping would have more than 1 type..."). This PR moves the type validation to the start of `MetaDataMappingService#applyRequest` so that we make sure the type matches before performing any mapper merging. We already partially addressed this issue in #29316, but the tests there focused on `MapperService` and did not catch this problem with end-to-end mapping updates. Addresses #43012.	2019-08-12 14:28:03 -07:00
Zachary Tong	4d97d2c50f	Revert "Only execute one final reduction in InternalAutoDateHistogram (#45359 )" This reverts commit `c0ea8a867e`.	2019-08-12 17:17:17 -04:00
Julie Tibshirani	8c4394d5d7	Fix a bug where mappings are dropped from rollover requests. (#45411 ) We accidentally introduced this bug when adding a typeless version of the rollover request. The bug is not present if include_type_name is set to true.	2019-08-12 12:46:27 -07:00
Michael Basnight	a521e4c86f	Retrieve processors instead of checking existence (#45354 ) The previous hasProcessors method would validate if a processor was present within a pipeline, but would not return the contents of the processors. This does not allow a consumer to inspect the processor for specific metadata. The method now returns the list of processors based on the class of the processor passed in.	2019-08-12 13:48:17 -05:00
Zachary Tong	472f6ef41a	Mute InternalAutoDateHistogramTests#testReduceRandom()	2019-08-12 14:45:08 -04:00
Zachary Tong	c0ea8a867e	Only execute one final reduction in InternalAutoDateHistogram (#45359 ) Because auto-date-histo can perform multiple reductions while merging buckets, we need to ensure that the intermediate reductions are done with a `finalReduce` set to false to prevent Pipeline aggs from generating their output. Once all the buckets have been merged and the output is stable, a mostly-noop reduction can be performed which will allow pipelines to generate their output.	2019-08-12 14:07:38 -04:00
Albert Zaharovits	2cb172f079	CreateIndex and PutIndexTemplate with typeless mapping (#45120 ) This commit makes sure that mapping parameters to `CreateIndex` and `PutIndexTemplate` are keyed by the type name. `IndexCreationTask` expects mappings to be keyed by the type name. It asserts this for template mappings but not for the mappings in the request. The `CreateIndexRequest` and `RestCreateIndexAction` mostly make sure that the mapping is keyed by a type name, but not always. When building the create-index request outside of the REST handler, there are a few methods to set the mapping for the request. Some of them add the type name, some of them do not. For example, `CreateIndexRequest#mapping(String type, Map<String, ?> source)` adds the type name, but `CreateIndexRequest#mapping(String type, XContentBuilder source)` does not. This PR asserts the type name in the request mapping inside `IndexCreationTask` and makes all `CreateIndexRequest#mapping` methods add the type name.	2019-08-12 08:05:07 +03:00
Armin Braun	a9e1402189	Remove Settings from BaseRestRequest Constructor (#45418 ) (#45429 ) * Resolving the todo, cleaning up the unused `settings` parameter * Cleaning up some other minor dead code in affected classes	2019-08-12 05:14:45 +02:00
Nhat Nguyen	cf9a73b5ac	Call afterWriteOperation after trim translog in peer recovery (#45182 ) testShouldFlushAfterPeerRecovery was added in #28350 to make sure the flushing loop triggered by afterWriteOperation eventually terminates. This test relies on the fact that we call afterWriteOperation after making changes in translog. In #44756, we roll a new generation in RecoveryTarget#finalizeRecovery but do not call afterWriteOperation. Relates #28350 Relates #45073	2019-08-10 22:59:02 -04:00
Nhat Nguyen	25c6102101	Trim local translog in peer recovery (#44756 ) Today, if an operation-based peer recovery occurs, we won't trim translog but leave it as is. Some unacknowledged operations existing in translog of that replica might suddenly reappear when it gets promoted. With this change, we ensure trimming translog above the starting sequence number of phase 2. This change can allow us to read translog forward.	2019-08-10 22:59:02 -04:00
Armin Braun	1cd464d675	Isolate Request in Call-Chain for REST Request Handling (#45130 ) (#45417 ) * Follow up to #44949 * Stop using a special code path for multi-line JSON and instead handle its detection like that of other XContent types when creating the request * Only leave a single path that holds a reference to the full REST request * In the next step we can move the copying of request content to happen before the actual request handling and make it conditional on the handler in question to stop copying bulk requests as suggested in #44564	2019-08-10 10:21:01 +02:00
Armin Braun	d1ed9bdbfd	Use StepListener to Simplify SnapshotResiliencyTests (#45233 ) (#45386 ) * Reduces complicated callback relations in `testSuccessfulSnapshotAndRestore` to flat steps of sequential actions * Will refactor the other tests in this suite as a follow up * This format certainly makes it easier to create more complicated tests that involve multiple subsequent snapshots as it would allow adding loops	2019-08-09 18:19:48 +02:00
Yannick Welsch	9e6d874a41	Show BWC version in ClusterFormationFailureHelper (#45352 ) When having a cluster state from 6.x, display the metadata version as the cluster state version. Avoids confusion where a cluster state from 6.x is displayed as version 0 even if it has some actual content.	2019-08-09 16:23:38 +02:00
Yannick Welsch	5ddeb488a6	Allow _update on write alias (#45318 ) Using the document update API on aliases with a write index does not work. Follow-up to #31520	2019-08-09 11:44:24 +02:00
Martijn van Groningen	f1ee29f22e	Added a custom api to perform the msearch more efficiently for enrich processor (#43965 ) Currently the msearch api is used to execute buffered search requests; however the msearch api doesn't deal with search requests in an intelligent way. It basically executes each search separately in a concurrent manner. This api reuses the msearch request and response classes and executes the searches as one request in the node holding the enrich index shard. Things like engine.searcher and query shard context are only created once. Also there are fewer layers than executing a regular msearch request. This results in an interesting speedup. Without this change, in a single node cluster, enriching documents with a bulk size of 5000 items, the ingest time in each bulk response varied from 174ms to 822ms. With this change the ingest time in each bulk response varied from 54ms to 109ms. I think we should add a change like this based on this improvement in ingest time. However I do wonder if instead of doing this change, we should improve the msearch api to execute more efficiently. That would be more complicated than this change, because in this change the custom api can only search enrich index shards and these are special because they always have a single primary shard. If msearch api is to be improved then that should work for any search request to any indices. Making the same optimization for indices with more than 1 primary shard requires much more work. The current change is isolated in the enrich plugin and LOC / complexity is small. So this is good enough for now.	2019-08-09 09:11:04 +02:00
Tal Levy	2a99eaa7c2	Revert "removes the CellIdSource abstraction from geo-grid aggs (#45307 ) (#45353 )" This reverts commit `7b0a8040de`.	2019-08-08 17:40:03 -07:00
Armin Braun	12ed6dc999	Only retain reasonable history for peer recoveries (#45208 ) (#45355 ) Today if a shard is not fully allocated we maintain a retention lease for a lost peer for up to 12 hours, retaining all operations that occur in that time period so that we can recover this replica using an operations-based recovery if it returns. However it is not always reasonable to perform an operations-based recovery on such a replica: if the replica is a very long way behind the rest of the replication group then it can be much quicker to perform a file-based recovery instead. This commit introduces a notion of "reasonable" recoveries. If an operations-based recovery would involve copying only a small number of operations, but the index is large, then an operations-based recovery is reasonable; on the other hand if there are many operations to copy across and the index itself is relatively small then it makes more sense to perform a file-based recovery. We measure the size of the index by computing its number of documents (including deleted documents) in all segments belonging to the current safe commit, and compare this to the number of operations a lease is retaining below the local checkpoint of the safe commit. We consider an operations-based recovery to be reasonable iff it would involve replaying at most 10% of the documents in the index. The mechanism for this feature is to expire peer-recovery retention leases early if they are retaining so much history that an operations-based recovery using that lease would be unreasonable. Relates #41536	2019-08-09 01:56:32 +02:00
Tal Levy	7b0a8040de	removes the CellIdSource abstraction from geo-grid aggs (#45307 ) (#45353 ) CellIdSource is a helper ValuesSource that encodes GeoPoint into a long-encoded representation of the grid bucket the point is associated with. This complicates things as usage evolves to support shapes that are associated with more than one bucket ordinal.	2019-08-08 16:33:16 -07:00
Armin Braun	b19de55095	Add missing wait to testAutomaticReleaseOfIndexBlock (#45342 ) (#45351 ) Today the test waits for one of the shards to be blocked, but this does not mean that the block has been applied on all nodes, so a subsequent indexing operation may still go through. Fixes #45338	2019-08-08 22:39:22 +02:00
Henning Andersen	d139896b66	Reindex share retry between hit sources (#44203 ) (#45348 ) The client and remote hit sources each had their own retry mechanism, which would do the same. Supporting resiliency we would have to expand on the retry mechanisms and as a preparation for that, the retry mechanism is now shared such that each sub class is only responsible for sending requests and converting responses/failures to a common format. Part of #42612	2019-08-08 22:01:29 +02:00
Christoph Büscher	a552b33276	Fix occasional SuggestSearchIT failure (#45330 ) Refreshes happening during indexing can result in different segment counts and slightly skewed term statistics, which in turn has the potential to change suggestion output slightly. In order to prevent this, disable refresh for the affected tests. Closes #43261	2019-08-08 21:06:32 +02:00
Martijn van Groningen	bb429d3b5c	required changes after merge	2019-08-08 17:04:18 +02:00
Dimitris Athanasiou	e53bb050db	Mute testAutomaticReleaseOfIndexBlock Relates #45338	2019-08-08 17:56:41 +03:00
Martijn van Groningen	708f856940	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-08-08 16:52:45 +02:00
Andrey Ershov	07c656fba9	Mute testCustomDataPaths on Windows See #45333 (cherry picked from commit 671e1ad1068aee4b593ad0c8ab13ff60b4f125b8)	2019-08-08 16:26:56 +02:00
Zachary Tong	86d6597890	Use newIndexSearcher() instead of newSearcher() (#45248 ) `newSearcher()` from lucene can randomly choose index readers which are not compatible with our tests, like ParallelCompositeReader. The `newIndexSearcher()` method on AggregatorTestCase is a wrapper similar to newSearcher but compatible with our tests	2019-08-08 09:34:38 -04:00
Martijn van Groningen	e066133016	Change the ingest simulate api to not include dropped documents (#44161 ) If documents are dropped by the `drop` processor then these documents are returned as a `null` value in the response. === Example Create pipeline: ``` PUT _ingest/pipeline/droppipeline { "processors": [ { "set": { "field": "bla", "value": "val" } }, { "drop": {} } ] } ``` Simulate request: POST _ingest/pipeline/droppipeline/_simulate { "docs": [ { "_source": { "message": "text" } } ] } Response: ``` { "docs": [ null ] } ``` Response if verbose is enabled: ``` { "docs": [ { "processor_results": [ { "doc": { "_index": "_index", "_type": "_doc", "_id": "_id", "_source": { "message": "text", "bla": "val" }, "_ingest": { "timestamp": "2019-07-10T11:07:10.758315Z" } } }, null ] } ] } ``` Closes #36150 * Abort pipeline simulation in verbose mode when document has been dropped by drop processor	2019-08-08 13:04:33 +02:00
Martijn van Groningen	fb959d188c	Backport: Add description to force-merge tasks (#41365 ) (#45191 ) * Add description to force-merge tasks (#41365) This is static information that is part of the force merge request. Relates to #15975	2019-08-08 08:15:09 +02:00
Michael Basnight	89861d0884	Add ingest processor existence helper method (#45156 ) This commit adds a helper method to the ingest service allowing it to inspect a pipeline by id and verify the existence of a processor in the pipeline. This work exposed a potential bug in that some processors contain inner processors that are passed in at instantiation. These processors needed a common way to expose their inner processors, so the WrappingProcessor was created in order to expose the inner processor.	2019-08-07 11:19:04 -05:00
Bukhtawar	cd304c4def	Auto-release flood-stage write block (#42559 ) If a node exceeds the flood-stage disk watermark then we add a block to all of its indices to prevent further writes as a last-ditch attempt to prevent the node completely exhausting its disk space. However today this block remains in place until manually removed, and this block is a source of confusion for users who currently have ample disk space and did not even realise they nearly ran out at some point in the past. This commit changes our behaviour to automatically remove this block when a node drops below the high watermark again. The expectation is that the high watermark is some distance below the flood-stage watermark and therefore the disk space problem is truly resolved. Fixes #39334	2019-08-07 11:03:53 +01:00
Tanguy Leroux	a869342910	Restore DefaultShardOperationFailedException's reason after deserialization (#45203 ) The reason field of DefaultShardOperationFailedException is lost during serialization. This is sad because this field is checked for nullity during xcontent generation and it means that the cause won't be included in the generated xcontent and won't be printed in two REST API responses (Close Index API and Indices Shard Stores API). This commit simply restores the reason from the cause during deserialization.	2019-08-07 10:37:15 +02:00
Jason Tedor	bd59ee6c72	Fix clock used in update requests (#45262 ) We accidentally switched to using the relative time provider here. This commit fixes this by switching to the appropriate absolute clock.	2019-08-06 21:15:21 -04:00
David Turner	f5d1381e01	Remove always-true param from IndicesService#stats (#45231 ) Parameter `includePrevious` is always true, so this commit inlines it.	2019-08-06 17:22:11 +01:00
David Turner	355713b9ca	Improve slow logging in MasterService (#45241 ) Adds a tighter threshold for logging a warning about slowness in the `MasterService` instead of relying on the cluster service's 30-second warning threshold. This new threshold applies to the computation of the cluster state update in isolation, so we get a warning if computing a new cluster state update takes longer than 10 seconds even if it is subsequently applied quickly. It also applies independently to the length of time it takes to notify the cluster state tasks on completion of publication, in case any of these notifications holds up the master thread for too long. Relates #45007 Backport of #45086	2019-08-06 17:01:49 +01:00
Tanguy Leroux	772ce1f599	Add deprecation warning for Force Merge API (#44903 ) This commit adds a deprecation warning in 7.x for the Force Merge API when both only_expunge_deletes and max_num_segments are set in a request. Relates #44761	2019-08-06 16:04:24 +02:00
Jason Tedor	5b1b146099	Normalize environment paths (#45179 ) This commit applies a normalization process to environment paths, both in how they are stored internally and in their settings values. This normalization is done via two means: - we make the paths absolute - we remove redundant name elements from the path (what Java calls "normalization") This change ensures that when we compare and refer to these paths within the system, we are using a common ground. For example, prior to the change if the data path was relative, we would not compare it correctly to paths from disk usage. This is because the paths in disk usage were being made absolute.	2019-08-06 06:04:30 -04:00
Yannick Welsch	7aeb2fe73c	Add per-socket keepalive options (#44055 ) Uses JDK 11's per-socket configuration of TCP keepalive (supported on Linux and Mac), see https://bugs.openjdk.java.net/browse/JDK-8194298, and exposes these as transport settings. By default, these options are disabled for now (i.e. fall-back to OS behavior), but we would like to explore whether we can enable them by default, in particular to force keepalive configurations that are better tuned for running ES.	2019-08-06 10:45:44 +02:00
Igor Motov	b5f88120b5	Geo: add Geometry-based query builders to QueryBuilders (#45058 ) Add Geometry-based method for creation of query builders in QueryBuilder Relates to #44715	2019-08-05 13:34:48 -04:00
Zachary Tong	3df1c76f9b	Allow pipeline aggs to select specific buckets from multi-bucket aggs (#44179 ) This adjusts the `buckets_path` parser so that pipeline aggs can select specific buckets (via their bucket keys) instead of fetching the entire set of buckets. This is useful for bucket_script in particular, which might want specific buckets for calculations. It's possible to work around this with `filter` aggs, but the workaround is hacky and probably less performant. - Adjusts documentation - Adds a barebones AggregatorTestCase for bucket_script - Tweaks AggTestCase to use getMockScriptService() for reductions and pipelines. Previously pipelines could just pass in a script service for testing, but this didn't work for regular aggs. The new getMockScriptService() method fixes that issue, but needs to be used for pipelines too. This had a knock-on effect of touching MovFn, AvgBucket and ScriptedMetric	2019-08-05 12:18:40 -04:00
Zachary Tong	e5079ac288	[7.x backport] Add more flexibility to MovingFunction window alignment (#45159 ) Introduce shift field to MovingFunction aggregation. By default, shift = 0. Behavior, in this case, is the same as before. Increasing shift by 1 moves starting window position by 1 to the right. To simply include the current bucket in the window, use shift = 1. For center alignment (n/2 values before and after the current bucket), use shift = window / 2. For right alignment (n values after the current bucket), use shift = window.	2019-08-05 11:56:52 -04:00
Nhat Nguyen	56083ba1ff	Remove assertion after locally recover replica (#45181 ) If the disk becomes broken after we have locally recovered shard up to the global checkpoint, then the assertion won't hold.	2019-08-05 10:48:02 -04:00
David Turner	13a167051f	Remove fileBasedRecovery flag (#45146 ) Today `RecoveryTarget#prepareForTranslogOperations` takes a boolean flag indicating whether the recovery is file-based or not. This was used in 6.x to bootstrap some commit data that were missing in indices created in 5.x: `b506955f8d/server/src/main/java/org/elasticsearch/indices/recovery/RecoveryTarget.java (L298-L300)` This flag no longer has any effect, so this commit removes it. Backport of #45131 to 7.x.	2019-08-05 08:17:40 +01:00
Armin Braun	41815ed614	Optimize StreamInput#readString (#44930 ) (#45180 ) * Resolve TODO in `readString` by moving to reading chunks of `byte[]` instead of going byte by byte * Motivated by `readString` showing up as a significant user of CPU time on the IO thread in Rally PMC benchmark * Benchmarking this: * Could not reproduce a slowdown in the potential worst case (one or two non-ascii chars) since in this case the cost of creating the string itself exceeds the read times anyway * Speedup for 50%+ for reading 200 char ascii strings from `ByteBuf` or pages bytes backed streams * Longer strings obviously get bigger speedups * More ascii chars -> more speedup	2019-08-05 07:22:42 +02:00
Jason Tedor	d78ecd9c09	Use the full hash in build info (#45163 ) This commit switches to using the full hash to build into the JAR manifest, which is used in node startup and the REST main action to display the build hash.	2019-08-03 11:27:53 -04:00
Tim Brooks	984ba82251	Move nio channel initialization to event loop (#45155 ) Currently in the transport-nio work we connect and bind channels on a thread before the channel is registered with a selector. Additionally, it is at this point that we set all the socket options. This commit moves these operations onto the event-loop after the channel has been registered with a selector. It attempts to set the socket options for a non-server channel at registration time. If that fails, it will attempt to set the options after the channel is connected. This should fix #41071.	2019-08-02 17:31:31 -04:00
Zachary Tong	ffbe047c32	Revert "Add more flexibility to MovingFunction window alignment (#44360 )" This reverts commit `1a58a487f0`.	2019-08-02 15:16:04 -04:00
Nikita Glashenko	1a58a487f0	Add more flexibility to MovingFunction window alignment (#44360 ) Introduce shift field to MovingFunction aggregation. By default, shift = 0. Behavior, in this case, is the same as before. Increasing shift by 1 moves starting window position by 1 to the right. To simply include the current bucket in the window, use shift = 1. For center alignment (n/2 values before and after the current bucket), use shift = window / 2. For right alignment (n values after the current bucket), use shift = window.	2019-08-02 15:10:21 -04:00
David Turner	9ff320d967	Use index for peer recovery instead of translog (#45137 ) Today we recover a replica by copying operations from the primary's translog. However we also retain some historical operations in the index itself, as long as soft-deletes are enabled. This commit adjusts peer recovery to use the operations in the index for recovery rather than those in the translog, and ensures that the replication group retains enough history for use in peer recovery by means of retention leases. Reverts #38904 and #42211 Relates #41536 Backport of #45136 to 7.x.	2019-08-02 15:00:43 +01:00
Armin Braun	9450505d5b	Stop Passing Around REST Request in Multiple Spots (#44949 ) (#45109 ) * Stop Passing Around REST Request in Multiple Spots * Motivated by #44564 * We are currently passing the REST request object around to a large number of places. This works fine since we simply copy the full request content before we handle the rest itself which is needlessly hard on GC and heap. * This PR removes a number of spots where the request is passed around needlessly. There are many more spots to optimize in follow-ups to this, but this one would already enable bypassing the request copying for some error paths in a follow up.	2019-08-02 07:31:38 +02:00
Jim Ferenczi	3f94e2ea43	Sparse role queries can throw an NPE (#45053 ) Sparse role queries are executed differently than other queries in order to account for the fact that most of the documents are filtered from search. However this special execution does not set the scorer for the query so any collector that needs to access the score of a document fails with an NPE. This change fixes this bug by setting the scorer before collecting any hits when intersecting the main query and the sparse role.	2019-08-01 20:21:53 +02:00
William Brafford	5f50da947a	Fix bug in the Settings#processSetting method (#45095 ) The Settings#processSetting method is intended to take a setting map and add a setting to it, adjusting the keys as it goes in case of "conflicts" where the new setting implies an object where there is currently a string, or vice versa. processSetting was failing in two cases: adding a setting two levels under a string, and adding a setting two levels under a string and four levels under a map. This commit fixes the bug and adds test coverage for the previously faulty edge cases. * fix issue #43791 about settings * add unit test in testProcessSetting()	2019-08-01 13:27:08 -04:00
Yannick Welsch	917510d3e4	Always use primary term of operation in InternalEngine (#45083 ) We keep adding the current primary term to operations for which we do not assign a sequence number. This does not make sense anymore as all operations which we care about have sequence numbers now. The goal of this commit is to clean things up in InternalEngine and reduce the complexity.	2019-08-01 17:30:00 +02:00
Armin Braun	48dc53f8d2	Make PathTrieIterator a Little more Memory Efficient (#44951 ) (#45070 ) * There's no need to have the trie iterator hold another reference to the request object (which could be huge, see #44564) * Also removed unused boolean field from trie node	2019-08-01 17:26:08 +02:00
Nhat Nguyen	3a487379c3	Tighten no pending scheduled refresh check (#45025 ) Previously, we used ThreadPoolStats to ensure that the scheduledRefresh triggered by the internal refresh setting update is executed before we index a new document. With that change (#40387), this test did not fail for the last 3 months. However, using ThreadPoolStats is not entirely watertight as both "active" and "queue" count can be 0 in a very small interval when ThreadPoolExecutor pulls a task from the queue but before marking the corresponding worker as active (i.e., lock it). Closes #39565	2019-08-01 09:06:22 -04:00
David Turner	c088bafbbc	Wait for events in waitForRelocation (#45074 ) Adds a `waitForEvents(Priority.LANGUID)` to the cluster health request in `ESIntegTestCase#waitForRelocation()` to deal with the case that this health request returns successfully despite the fact that there is a pending reroute task which will relocate another shard. Relates #44433 Fixes #45003	2019-08-01 13:47:39 +01:00
David Turner	532ade7816	More logging for slow cluster state application (#45007 ) Today the lag detector may remove nodes from the cluster if they fail to apply a cluster state within a reasonable timeframe, but it is rather unclear from the default logging that this has occurred and there is very little extra information beyond the fact that the removed node was lagging. Moreover the only forewarning that the lag detector might be invoked is a message indicating that cluster state publication took unreasonably long, which does not contain enough information to investigate the problem further. This commit adds a good deal more detail to make the issues of slow nodes more prominent: - after 10 seconds (by default) we log an INFO message indicating that a publication is still waiting for responses from some nodes, including the identities of the problematic nodes. - when the publication times out after 30 seconds (by default) we log a WARN message identifying the nodes that are still pending. - the lag detector logs a more detailed warning when a fatally-lagging node is detected. - if applying a cluster state takes too long then the cluster applier service logs a breakdown of all the tasks it ran as part of that process.	2019-08-01 13:20:46 +01:00
Hendrik Muhs	b3be8f75f0	Fix version logic after 7.3 release (BWC) (#45077 ) removes unreleased version 7.2.2 after release of 7.3.0 as it breaks the version verifier, add documentation that explains the logic	2019-08-01 12:43:23 +02:00
Christoph Büscher	a669efd2a4	Remove left-over AwaitsFix in RateClusterStateIT (#45043 ) Issues are closed and fixes in #42580 and #42430 seem to be merged to 7.x at least.	2019-08-01 12:03:29 +02:00
Martijn van Groningen	39f280364b	required change after merging in 7.x branch	2019-08-01 13:44:42 +07:00
Martijn van Groningen	aae2f0cff2	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-08-01 13:38:03 +07:00
Tim Brooks	aff66e3ac5	Add Cors integration tests (#44361 ) This commit adds integration tests to ensure that the basic cors functionality works for the netty and nio transports.	2019-07-31 14:24:23 -06:00
Armin Braun	8d63bd1d1e	Cleanup Various Action- Listener and Runnable Usages (#42273 ) (#45052 ) * Dry up code for creating simple `ActionRunnable` a little * Shorten some other code around `ActionListener` usage, in particular when wrapping it in a `TransportResponseListener`	2019-07-31 18:55:31 +02:00
Armin Braun	ee663dc9ac	Reenable Parallel Restore Test on Windows (#45037 ) (#45050 ) * As a result of #44096 this test shouldn't fail anymore on `master` and `7.4`+ so we should reenable it there * For older versions we won't backport that change so the tests should stay disabled there * Closes #44671	2019-07-31 18:35:34 +02:00
Christoph Büscher	35291ae175	Remove muted AckIT and AckClusterUpdateSettingsIT (#45044 ) Reading up on #33673 it looks like parts of these tests have been reworked and there is no intention to fix the remains on 7.x, so I think we can remove the entire test.	2019-07-31 17:17:21 +02:00
Luca Cavanna	8cc3c0dd93	Remove task null check in TransportAction (#45014 ) The task that TaskManager#register returns cannot be null. The method enforces that it is not null after calling request#createTask. It is then needless to check for null in the listener later. Also, added the call to the delegate listener in a finally block, just to make sure.	2019-07-31 17:16:41 +02:00
Christoph Büscher	e85b53a955	Remove left-over AwaitsFix in DedicatedClusterSnapshotRestoreIT (#45042 ) The issue mentioned (#38845) seems to have been closed with #38891 so the test can be re-activated.	2019-07-31 17:15:41 +02:00
Armin Braun	c7d7230524	Stop Recreating Wrapped Handlers in RestController (#44964 ) (#45040 ) * We shouldn't be recreating wrapped REST handlers over and over for every request. We only use this hook in x-pack and the wrapper there does not have any per request state. This is inefficient and could lead to some very unexpected memory behavior => I made the logic create the wrapper on handler registration and adjusted the x-pack wrapper implementation to correctly forward the circuit breaker and content stream flags	2019-07-31 17:11:34 +02:00
Zachary Tong	c25f3dd5d0	Introduce 7.3.1 version (#45046 )	2019-07-31 10:53:55 -04:00
Andrey Ershov	c27ac3d24c	Unmute testClusterJoinDespiteOfPublishingIssues and testElectMasterWithLatestVersion (#38555 ) See my comments for #37539 and #37685 (cherry picked from commit 038d4ab2940340eca942e32b54044f183b7804d9)	2019-07-31 14:55:02 +02:00
David Roberts	5e3010a606	Use system context for looking up connected nodes (#43991 ) When finding nodes in a connected cluster for cross cluster search the requests to get cluster state on the connected cluster should be made in the system context because logically they are equivalent to checking a single detail in the local cluster state and should not require that the user who made the request that is using this method in its implementation is authorized to view the entire cluster state. Fixes #43974	2019-07-31 09:09:56 +01:00
Igor Motov	1a1bb4707d	Geo: move indexShape to AbstractGeometryFieldMapper.Indexer (#44979 ) Move indexShape functionality into AbstractGeometryFieldMapper to make it more unit testable. Relates to #43644	2019-07-30 14:50:23 -04:00
Mayya Sharipova	a154b73d99	Assure index ops are successful for SimpleNestedIT (#44815 ) relates to #44486	2019-07-30 14:24:28 -04:00
Nhat Nguyen	979d0a71c7	Remove leniency during replay translog in peer recovery (#44989 ) This change removes leniency in InternalEngine during replaying translog in peer recovery.	2019-07-30 13:25:15 -04:00
Jake Landis	41a99c9e4a	introduce 7.2.2 as a version (#44371 ) * introduce 7.2.2 as a version	2019-07-30 18:52:34 +02:00
Jake Landis	03fea1c503	introduce 6.8.3 as a version (#44708 )	2019-07-30 18:48:41 +02:00
David Kyle	78aa6143a6	Mute FilteringAllocationIT testTransientSettingsStillApplied Relates to https://github.com/elastic/elasticsearch/issues/45003	2019-07-30 14:10:50 +01:00
Yannick Welsch	c1b569ed4b	Revert "Mute Zen1IT#testMixedClusterDisruption" This reverts commit `cf78ca58e3`.	2019-07-30 13:10:14 +02:00
David Turner	55f1dd8da6	Close nodes properly in Coordinator tests (#44967 ) Today closing a `ClusterNode` in an `AbstractCoordinatorTestCase` uses `onNode()` so has no effect if the node is not in the current list of nodes. It also discards the `Runnable` it creates without having run it, so has no effect anyway. This commit makes these tests much stricter about properly closing the nodes started during `Coordinator` tests, by tracking the persisted states that are opened, and adds an assertion to catch the trappy requirement that the closing node still belongs to the cluster.	2019-07-30 11:47:36 +01:00
David Kyle	cf78ca58e3	Mute Zen1IT#testMixedClusterDisruption	2019-07-30 11:33:39 +01:00
Jim Ferenczi	43bd8f2ba0	Fix aggregators early termination with breadth-first mode (#44963 ) This commit fixes a bug when a deferred aggregator tries to early terminate the collection. In such case the CollectionTerminatedException is not caught and the search fails on the shard. This change makes sure that we catch the exception in order to continue the deferred collection on the next leaf. Fixes #44909	2019-07-30 11:26:40 +02:00
Andrey Ershov	5a0bd696fc	Snapshot tool S3 cleanup 7.x backport (#44575 ) Backport of #44551	2019-07-30 11:02:08 +02:00
Nhat Nguyen	4813728783	Remove leniency in reset engine from translog (#44711 ) Replaying operations from the local translog must never fail as those operations were processed successfully on the primary before and the mapping is up to date already. This change removes leniency during resetting engine from translog in IndexShard and InternalEngine.	2019-07-29 16:31:45 -04:00
Jack Conradson	1a21682ed0	Fix JodaCompatibleZonedDateTime casts in Painless (#44874 ) This is a temporary fix during the Joda to Java datetime transition. This will implicitly cast a JodaCompatibleZonedDateTime to a ZonedDateTime for both def and static types. This is necessary to insulate users from needing to know about JodaCompatibleZonedDateTime explicitly.	2019-07-29 12:05:26 -07:00
Igor Motov	b6cef227a5	Geo: fix geo query decomposition (#44924 ) The recent refactoring introduced an issue where queries were not going through the decomposition processing. Fixes #44891	2019-07-29 11:48:24 -04:00
Luca Cavanna	a3cc32da64	TaskListener#onFailure to accept Exception instead of Throwable (#44946 ) Today TaskListener accepts a Throwable in its onFailure method. Though looking at where it is called (TransportAction), it can never be notified of a Throwable. This commit changes the signature of TaskListener#onFailure so that it accepts an `Exception` rather than a `Throwable` as second argument.	2019-07-29 16:47:19 +02:00
Michał Perlak	245c9b7914	Optimize Min and Max BKD optimizations (#44315 ) MinAggregator - skip BKD optimization when no result found after 1024 lookups. MaxAggregator - skip unnecessary conversions.	2019-07-29 10:04:39 -04:00
Yannick Welsch	24873dd3e3	Do not block transport thread on startup (#44939 ) We currently block the transport thread on startup, which has caused test failures. I think this is some kind of deadlock situation. I don't think we should even block a transport thread, and there's also no need to do so. We can just reject requests as long as we're not fully set up. Note that the HTTP layer is only started much later (after we've completed full start up of the transport layer), so that one should be completely unaffected by this. Closes #41745	2019-07-29 11:35:17 +02:00
Armin Braun	f5efafd4d6	Cleanup Deadcode o.e.indices (#44931 ) (#44938 ) * none of this is used anywhere	2019-07-29 10:38:35 +02:00
Martijn van Groningen	db49cb505e	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-07-29 14:45:10 +07:00
Igor Motov	cfc8d17bb4	Geo: refactor geo mapper and query builder (#44884 ) Refactors the indexing and query generation logic out of the mapper and query builder into separate unit-testable classes.	2019-07-26 16:48:31 -04:00
Yannick Welsch	1561ab5420	Guard open connection call in RemoteClusterConnection (#44921 ) Fixes an issue where a call to openConnection was not properly guarded, allowing an exception to bubble up to the uncaught exception handler, causing test failures. Closes #44912	2019-07-26 22:27:45 +02:00
Tanguy Leroux	e1b626b947	Ensure index is green in SimpleClusterStateIT.testIndicesOptions() (#44893 ) SimpleClusterStateIT testIndicesOptions failed in #44817 because it tries to close an index at the beginning of the test. With random index settings, it is possible that the index has a high number of shards (10) and replicas (1), which means that on CI this index can take time to be fully allocated. The close index request can fail in the case where replicas are still recovering operations. This commit adds a simple ensureGreen() at the beginning of the test to be sure that all replicas are started before trying to close the index. closes #44817	2019-07-26 17:07:53 +02:00
Armin Braun	1340ff19bc	Fix Test Failure in ScalingThreadPoolTests (#44898 ) (#44901 ) * Due to #44894 some constellations log a deprecation warning here now * Fixed by checking for that	2019-07-26 17:05:50 +02:00
Tanguy Leroux	8848fcfb22	Ensure cluster is stable in ShrinkIndexIT.testShrinkThenSplitWithFailedNode (#44860 ) The test ShrinkIndexIT.testShrinkThenSplitWithFailedNode sometimes fails because the resize operation is not acknowledged (see #44736). This resize operation creates a new index "splitagain" and it results in a cluster state update (TransportResizeAction uses MetaDataCreateIndexService.createIndex() to create the resized index). This cluster state update is expected to be acknowledged by all nodes (see IndexCreationTask.onAllNodesAcked()) but this is not always true: the data node that was just stopped in the test before executing the resize operation might still be considered as a "faulty" node (and not yet removed from the cluster nodes) by the FollowersChecker. The cluster state is then acked on all nodes but one, and it results in a non acknowledged resize operation. This commit adds an ensureStableCluster() check after stopping the node in the test. The goal is to ensure that the data node has been correctly removed from the cluster and that all nodes are fully connected to each other before moving forward with the resize operation. Closes #44736	2019-07-26 10:14:27 +02:00
Jason Tedor	6ea2b5dec0	Deprecate setting processors to more than available (#44889 ) Today the processors setting is permitted to be set to more than the number of processors available to the JVM. The processors setting directly sizes the number of threads in the various thread pools, with most of these sizes being a linear function in the number of processors. It doesn't make any sense to set processors very high as the overhead from context switching amongst all the threads will overwhelm, and changing the setting does not control how many physical CPU resources there are on which to schedule the additional threads. We have to draw a line somewhere and this commit deprecates setting processors to more than the number of available processors. This is the right place to draw the line given the linear growth as a function of processors in most of the thread pools, and that some are capped at the number of available processors already.	2019-07-26 17:06:44 +09:00
Ignacio Vera	821f6f893b	Upgrade to Lucene 8.2.0 release (#44859 ) (#44892 )	2019-07-26 08:14:59 +02:00
Nhat Nguyen	d128188c28	Return seq_no and primary_term in noop update (#44603 ) With this change, we will return primary_term and seq_no of the current document if an update is detected as a noop. We already return the version; hence we should also return seq_no and primary_term. Relates #42497	2019-07-25 19:16:56 -04:00
Yannick Welsch	bd8470e738	Asynchronously connect to remote clusters (#44825 ) Refactors RemoteClusterConnection so that it no longer blockingly connects to remote clusters. Relates to #40150	2019-07-25 22:59:59 +02:00
Yannick Welsch	0ce841915c	Add Clone Index API (#44267 ) Adds an API to clone an index. This is similar to the index split and shrink APIs, just with the difference that the number of primary shards is kept the same. In cases where the filesystem provides hard-linking capabilities, this is a very cheap operation. Index cloning can be done by running `POST my_source_index/_clone/my_target_index` and it supports the same options as the split and shrink APIs. Closes #44128	2019-07-25 22:02:28 +02:00
Ryan Ernst	03dd22b56c	Add missing ZonedDateTime methods for joda compat layer (#44829 ) While joda no longer exists in the apis for 7.x, the compatibility layer still exists with helper methods mimicking the behavior of joda for ZonedDateTime objects returned for date fields in scripts. This layer was originally intended to be removed in 7.0, but is now likely to exist for the lifetime of 7.x. This commit adds missing methods from ChronoZonedDateTime to the compat class. These methods were not part of joda, but are needed to act like a real ZonedDateTime. relates #44411	2019-07-25 11:45:57 -07:00
Julie Tibshirani	acb7f599a3	Fix an NPE when requesting inner hits and _source is disabled. (#44836 ) This PR makes two changes to FetchSourceSubPhase when _source is disabled and we're in a nested context: * If no source filters are provided, return early to avoid an NPE. * If there are source filters, make sure to throw an exception. The behavior was chosen to match what currently happens in a non-nested context.	2019-07-25 10:38:00 -07:00
James Baiera	c5528a25e6	Merge branch '7.x' into enrich-7.x	2019-07-25 13:12:56 -04:00
Nicholas Knize	48757da6e1	[GEO] Fix GeoShapeQueryBuilder to check for valid spatial relations Refactor left out the spatial strategy check in GeoShapeQueryBuilder.relation setter method. This commit adds that check back in.	2019-07-25 11:32:13 -05:00
Nick Knize	133f848e9f	[Geo] Refactor GeoShapeQueryBuilder to derive from AbstractGeometryQueryBuilder (#44780 ) Refactors GeoShapeQueryBuilder to derive from a new AbstractGeometryQueryBuilder that provides common parsing and build logic for spatial geometries. This will allow development of custom geometry queries by extending AbstractGeometryQueryBuilder preventing duplication of common spatial query logic.	2019-07-25 11:32:13 -05:00
Armin Braun	383d7b7713	Cleanup Dead Code in Index Creation (#44784 ) (#44822 ) * Cleanup Dead Code in Index Creation * This is all unused and the state of a create request is always `OPEN`	2019-07-25 10:50:04 +02:00
Yannick Welsch	e0d4544ef6	Close connection manager on current thread in RemoteClusterConnection (#44805 ) The problem is that RemoteClusterConnection closes the connection manager asynchronously, which races with the threadpool being shutdown at the end of the test. Closes #44339 Closes #44610	2019-07-25 09:34:41 +02:00
Igor Motov	f9943a3e53	Geo: deprecate ShapeBuilder in QueryBuilders (#44715 ) Removes now-unnecessary timeline decompositions from shape builders and deprecates ShapeBuilders in QueryBuilder in favor of libs/geo shapes. Relates to #40908	2019-07-24 14:27:58 -04:00
David Turner	4cfd2fc6b2	Fix testFirstListElementsToCommaDelimitedStringReportsFirstElementsIfLong (#44785 ) This test can fail (super-rarely) if it generates a list of length 11 containing a duplicate, because the `.distinct()` reduces the list length to 10 and then it is not abbreviated any more. This change generalises the test to cover lists of any random length.	2019-07-24 16:10:41 +01:00
Tanguy Leroux	a8905ef142	[7.x] Add CloseIndexResponse to HLRC (#44349 ) (#44788 ) The CloseIndexResponse was improved in #39687; this commit exposes it in the HLRC. Backport of #44349 to 7.x.	2019-07-24 15:51:01 +02:00
Dimitris Athanasiou	5453188cef	[TEST] Mute SharedClusterSnapshotRestoreIT.testParallelRestoreOperationsFromSingleSnapshot This was supposed to be muted in #44675 and its backports but that PR accidentally muted another test. Relates #44671	2019-07-24 14:28:09 +03:00
Armin Braun	4a3218551c	Fix ConnectionManagerTests (#44769 ) (#44789 ) * In both fake connection validators we were potentially executing the listener twice. This led to the situation that the locking via `connectionLock` that ensures that each listener is only executed once ever would fail and the listener would run twice (in which case the listeners for that node are already `null` and we get an NPE) * The fact that two different tests fail is due to the fact that we weren't safely shutting down the threadpool which meant that the task that trips the assertion (on the generic pool) would leak into the next test and fail it * Closes #44758	2019-07-24 13:12:57 +02:00
Jason Tedor	4c77d5e2c7	Remove stale permissions from untrusted policy (#44783 ) We have some old permissions lying around, granted to untrusted code from the days of yore when we supported Groovy and Javascript scripting. This commit removes these stale permissions.	2019-07-24 15:59:16 +09:00
Jason Tedor	659ebf6cfb	Notify systemd when Elasticsearch is ready (#44673 ) Today our systemd service defaults to a service type of simple. This means that systemd assumes Elasticsearch is ready as soon as the ExecStart (bin/elasticsearch) process is forked off. This means that the service appears ready long before it actually is, so before it is ready to receive requests. It also means that services that want to depend on Elasticsearch being ready to start can not as there is not a reliable mechanism to determine this. This commit changes the service type to notify. This requires that Elasticsearch sends a notification message via libsystemd sd_notify method. This commit does that by using JNA to invoke this native method. Additionally, we use this integration to also notify systemd when we are stopping.	2019-07-24 14:04:36 +09:00
Armin Braun	818103ff1e	Fix testRetentionLeasesClearedOnRestore (#44754 ) (#44766 ) * Fix this test randomly failing when running into async translog persistence edge case and failing to successfully close index * Also, slightly improve debug logging on close failure * Closes #44681	2019-07-23 21:29:07 +02:00
Igor Motov	9338fc8536	GEO: Switch to using GeoTestUtil to generate random geo shapes (#44635 ) Switches to more robust way of generating random test geometries by reusing lucene's GeoTestUtil. Removes duplicate random geometry generators by moving them to the test framework. Closes #37278	2019-07-23 14:30:41 -04:00
Armin Braun	e5bd3ad0e9	Remove some Dead Code in o.e.transport (#44653 ) (#44734 ) * None of this is used	2019-07-23 10:52:37 +02:00
David Turner	ee23968f05	Ignore unknown fields if overriding node metadata (#44689 ) The `elasticsearch-node override-version` command fails if it cannot read the existing node metadata file. However, it reads this file strictly and fails if there are any unknown fields, which means it will not be useful if we add another field in future. This commit adds leniency to this command, allowing it to ignore any unknown fields and proceed with the downgrade. A downgrade is already unsafe, and the user is already copiously warned about this, so being lenient in this case does not make things much worse.	2019-07-23 08:54:58 +01:00
Jason Tedor	6928a315c4	Check shard limit after applying index templates (#44619 ) Today when creating an index and checking cluster shard limits, we check the number of shards before applying index templates. At this point, we do not know the actual number of shards that will be used to create the index. In a case when the defaults are used and a template would override, we could be grossly underestimating the number of shards that would be created, and thus incorrectly applying the limits. This commit addresses this by checking the shard limits after applying index templates.	2019-07-23 16:50:42 +09:00
Ignacio Vera	05ec970723	Support BucketScript paths of type string and array. (#44694 ) (#44731 )	2019-07-23 09:05:47 +02:00
Ioannis Kakavas	3714cb63da	Allow parsing the value of java.version sysprop (#44017 ) We often start testing with early access versions of new Java versions and this has caused minor issues in our tests (i.e. #43141) because the version string that the JVM reports cannot be parsed as it ends with the string -ea. This commit changes how we parse and compare Java versions to allow correct parsing and comparison of the output of java.version system property that might include an additional alphanumeric part after the version numbers (see [JEP 223](https://openjdk.java.net/jeps/223)). In short it handles a version number part, like before, but additionally a PRE part that matches ([a-zA-Z0-9]+). It also changes a number of tests that would attempt to parse java.specification.version in order to get the full version of Java. java.specification.version only contains the major version and is thus inappropriate when trying to compare against a version that might contain a minor, patch or an early access part. We now parse java.version that can be consistently parsed. Resolves #43141	2019-07-22 20:14:56 +03:00
Tanguy Leroux	bcb3563dcf	Remove AllocationService.reroute(ClusterState, String, boolean) (#44629 ) This commit removes the method AllocationService.reroute(ClusterState, String, boolean) in favor of AllocationService.reroute(ClusterState, String). Motivations are: there are already 3 other reroute methods in this class; this method is always called with the debug parameter set to false; almost all tests use the method reroute(ClusterState, String).	2019-07-22 17:12:21 +02:00
Evgenia Badiyanova	5273a548a4	Unmute PendingTasksBlocksIT tests	2019-07-22 10:59:21 -04:00
Armin Braun	6ceae5d586	Document Type of Collections Returned by StreamInput (#44686 ) (#44688 ) * As a result of #44665 the collections returned by the deserialization methods on `StreamInput` may be either mutable or immutable now, this PR adds documentation for that fact	2019-07-22 16:06:34 +02:00
Evgenia Badiyanova	8ee4c4d5ba	Mute some tests in PendingTasksBlocksIT Tracked in #44695.	2019-07-22 09:55:07 -04:00
David Turner	dcb3b2c18a	Fix testPendingTasksWithClusterNotRecoveredBlock In 7.x we cannot start a new master-eligible node before the cluster has formed since we first try and update minimum_master_nodes and this is blocked. This commit changes the test to start a data-only node so that no such adjustment is necessary. Relates #44685	2019-07-22 14:42:20 +01:00
Mayya Sharipova	972a49312c	Fix testQuotedQueryStringWithBoost test (#43385 ) Add more logging to indexRandom Seems that asynchronous indexing from indexRandom sometimes indexes the same document twice, which will mess up the expected score calculations. For example, indexing: { "index" : {"_id" : "1" } } {"important" :"phrase match", "less_important": "nothing important"} { "index" : {"_id" : "2" } } {"important" :"nothing important", "less_important" :"phrase match"} Produces the expected scores: 13.8 for doc1, and 1.38 for doc2 indexing: { "index" : {"_id" : "1" } } {"important" :"phrase match", "less_important": "nothing important"} { "index" : {"_id" : "2" } } {"important" :"nothing important", "less_important" :"phrase match"} { "index" : {"_id" : "3" } } {"important" :"phrase match", "less_important": "nothing important"} Produces scores: 9.4 for doc1, and 1.96 for doc2 which are found in the error logs. Relates to #43144	2019-07-22 08:44:31 -04:00
Przemyslaw Gomulka	a154f49b94	Fix stats in slow logs to be an escaped JSON backport(#44642 ) #44687 Fields in JSON logs should be escaped JSON fields. It is a broken JSON value at the moment "stats": "["group1", "group2"]", -> "stats": "[\"group1\", \"group2\"]", This should later be refactored into a JSON array of strings (the same as types in 7.x)	2019-07-22 14:28:39 +02:00
David Turner	0ce3114779	Allow pending tasks before state recovery (#44685 ) Today we block access to the pending tasks API before the cluster has recovered its state. There's no real need to do so, and the master does meaningful work even before performing state recovery so it might sometimes be useful to allow access to this API. This commit changes this API to ignore all cluster blocks. Fixes #44652	2019-07-22 13:15:10 +01:00
Przemyslaw Gomulka	09e9c4cb59	Fix types field in JSON Search Slow Logs (#44641 ) The field has to be defined in log4j2.properties and should be an escaped JSON for now (it is a broken JSON at the moment). This should later be refactored into a JSON array of strings.	2019-07-22 12:02:20 +02:00
Przemyslaw Gomulka	fe20e217a4	Deprecation messages with the same key but different x-opaque-id are allowed backport(#44587 ) #44682 Deprecation logger was filtering log entries by key, which means that if two log messages with the same key are logged from different users, then the second log message will be filtered. This change allows deprecation messages with the same key to be logged by different users. relates #41354 backport #44587	2019-07-22 11:38:11 +02:00
Armin Braun	a6adcecd20	Fix Trying to Mutate Immutable Collections Fixes two spots where #44665 caused a previously mutable collection to now be read as an immutable one, leading to errors	2019-07-22 11:04:05 +02:00
Armin Braun	b9067ba1ba	Remove Needless Synchronization in FollowersChecker (#44631 ) (#44680 ) * It seems redundant to synchronize here and check that the map hasn't changed via the `isRunning` under the mutex * The map won't change if under the mutex that locks on all the updates to it * Without the mutex it's very unlikely to change inside the method call relative to the likelihood of changing until the generic pool where we check for `isRunning` again anyway -> just remove the synchronization (it's on the IO loop) and check since we do check the running state on the generic pool under the mutex anyway when we actually fail it	2019-07-22 10:57:30 +02:00
Jason Tedor	ff76b0af8b	Copy field names in stored fields context We have to copy the field names otherwise we either have a handle of a list that a caller might mutate or we might mutate when they aren't expecting it, or worse, a handle of a list that is not mutable (and we end up mutating the list). Relates #44665	2019-07-22 17:40:07 +09:00
Alpar Torok	b34ac66d96	Mute multiple tests on Windows (7.x) (#44676 ) * Mute failing test tracked in #44552 * mute EvilSecurityTests tracking in #44558 * Fix line endings in ESJsonLayoutTests * Mute failing ForecastIT test on windows Tracking in #44609 * mute BasicRenormalizationIT.testDefaultRenormalization tracked in #44613 * fix mute testDefaultRenormalization * Increase busyWait timeout windows is slow * Mute failure unconfigured node name * mute x-pack internal cluster test windows tracking #44610 * Mute JvmErgonomicsTests on windows Tracking #44669 * mute SharedClusterSnapshotRestoreIT testParallelRestoreOperationsFromSingleSnapshot Tracking #44671 * Mute NodeTests on Windows Tracking #44256	2019-07-22 11:32:29 +03:00
Armin Braun	0e2e83f591	More Efficient Deserialization of Empty Collections in StreamInput (#44665 ) (#44674 ) * We only had the `size == 0` optimization in some but not all spots of deserializing collections in this class, fixed the remaining spots. * Also fixed a similar spot when deserializing `ThreadContextStruct` that could now be simplified (it was apparently doing its own version of this optimization for the first map it deserialized before ... but not for the second map -> made it not instantiate anything if both maps are empty since it's always the same object here anyway)	2019-07-22 09:31:12 +02:00
Armin Braun	0ac137a9a1	Optimize some StreamOutput Operations (#44660 ) (#44668 ) * Optimize some StreamOutput Operations * Writing numbers byte by byte adds a lot of unnecessary bounds checks to serialization * Serializing to a threadlocal `byte[]` instead and bulk writing gives about a 50% speedup on `long` and `vlong` (for large numbers) writes and 30% for `int`, `vint` on Linux on an i9 * Using a threadlocal of the maximum string buffer size we used to allocate before also removes allocations when writing strings in general since we now never have to allocate a `byte[]` for that * And don't have to GC one either resolving the TODO removed here	2019-07-22 07:09:32 +02:00
Tal Levy	1a9cfe9110	Removal Streamable (#44647 ) (#44655 ) This commit ends the grand adventure that was the refactoring effort to migrate all usages of Streamable to Writeable. Closes #34389.	2019-07-20 19:10:49 -07:00
Ryan Ernst	4c05d25ec7	Convert Transport Request/Response to Writeable (#44636 ) (#44654 ) This commit converts all remaining TransportRequest and TransportResponse classes to implement Writeable, and disallows Streamable implementations. relates #34389	2019-07-20 11:25:58 -07:00
Ryan Ernst	f4ee2e9e91	Convert direct implementations of Streamable to Writeable (#44605 ) (#44646 ) This commit converts Streamable to Writeable for direct implementations. relates #34389	2019-07-20 08:32:29 -07:00
Tal Levy	7c84636029	Remove StreamOutput #writeOptionalStreamable and #writeStreamableList (#44602 ) (#44643 ) remove usages of writeOptionalStreamable and writeStreamableList relates #34389.	2019-07-19 15:55:53 -07:00
Ryan Ernst	f193d14764	Convert remaining Action Response/Request to writeable.reader (#44528 ) (#44607 ) This commit converts readFrom to ctor with StreamInput on the remaining ActionResponse and ActionRequest classes. relates #34389	2019-07-19 13:33:38 -07:00
Armin Braun	f028ab43ad	Don't Swallow Interrupt in TransportService#onRequestReceived (#44622 ) (#44627 ) * We shouldn't just swallow the interrupt here quietly and keep going on the IO thread * Currently interrupt continues here just the same way an invocation of `acceptIncomingRequests` would have made things continue * Relates #44610	2019-07-19 20:35:29 +02:00
Christoph Büscher	eafe54c81c	Fix AnalysisMode propagation in NamedAnalyzer (#44626 ) NamedAnalyzer should return the same AnalysisMode as any custom analyzer it wraps, otherwise AnalysisMode.ALL. This used to be only CustomAnalyzer in the past, but with the introduction of the ReloadableCustomAnalyzer this needs to be added as an option where the analysis mode gets propagated. Closes #44625	2019-07-19 18:18:43 +02:00
Nikita Glashenko	804476c35d	Remove support for old translog checkpoint formats (#44280 ) This commit removes support for the translog checkpoint format from versions before 6.0.0 since 7.x versions are incompatible with indices from these versions. Relates #44720 Fixes #44210	2019-07-19 16:01:47 +01:00
Przemyslaw Gomulka	597d2dfaf5	Add types field to slow logs in 7.x (#44592 ) By mistake in 7.x types field was removed from slow logs. Types are still present in that version, so this has to be present as a JSON field relates #41354 backport that was causing this #44178	2019-07-19 08:31:00 +02:00
Ryan Ernst	60785a9fa8	Convert several direct uses of Streamable to Writeable (#44586 ) (#44604 ) This commit converts several utility classes that implement Streamable to have StreamInput constructors. It also adds a default version of readFrom to Streamable so that overriding to throw UOE is not necessary. relates #34389	2019-07-18 21:25:44 -07:00
Julie Tibshirani	336364fefe	Convert more classes in 'server' to Writeable. (#44600 ) * Convert GetTask. Convert RemoteInfo. Convert GetFieldMappings. Convert ValidateQueryRequest. Convert MainResponse. Convert MultiGet. Convert Update. Add a missing call to parent constructors. Relates to #34389.	2019-07-18 18:45:10 -07:00
Ryan Ernst	13f46aa801	Convert index and persistent actions/response to writeable (#44582 ) (#44601 ) This commit converts several more classes from streamable to writeable in server, mostly within the o.e.index and o.e.persistent packages. relates #34389	2019-07-18 18:32:09 -07:00
Tal Levy	03f5084ac7	remove usages of #readOptionalStreamable, #readStreamableList. (#44578 ) (#44598 ) This commit removes references to Streamable from StreamInput. This is all a part of the effort to remove Streamable usage. relates #34389.	2019-07-18 16:19:02 -07:00
Ryan Ernst	af093a4095	Convert ShardOperationFailedException to Writeable (#44532 ) (#44580 ) This commit converts subclasses of ShardOperationFailedException to implement ctors with StreamInput instead of readFrom. It also simplifies IndicesShardStoresResponse.Failure to serialize its shardId after the super data. relates #34389	2019-07-18 13:29:19 -07:00
Armin Braun	3b5038b837	Implement Eventually Consistent Mock Repository for SnapshotResiliencyTests (#40893 ) (#44570 ) * Add eventually consistent mock repository for reproducing and testing AWS S3 blob store behavior * Relates #38941	2019-07-18 17:54:54 +02:00
Andrey Ershov	ef6ddd15c6	Revert "Snapshot tool: S3 orphaned files cleanup (#44551)" This reverts commit `09edeeb3`	2019-07-18 17:21:45 +02:00
Andrey Ershov	09edeeb38e	Snapshot tool: S3 orphaned files cleanup (#44551 ) A tool to work with snapshots. Co-authored by @original-brownbear. This commit adds snapshot tool and the single command cleanup, that cleans up orphaned files for S3. Snapshot tool lives in x-pack/snapshot-tool. (cherry picked from commit fc4aed44dd975d83229561090f957a95cc76b287)	2019-07-18 16:38:00 +02:00
David Turner	452f7f67a0	Defer reroute when starting shards (#44539 ) Today we reroute the cluster as part of the process of starting a shard, which runs at `URGENT` priority. In large clusters, rerouting may take some time to complete, and this means that a mere trickle of shard-started events can cause starvation for other, lower-priority, tasks that are pending on the master. However, it isn't really necessary to perform a reroute when starting a shard, as long as one occurs eventually. This commit removes the inline reroute from the process of starting a shard and replaces it with a deferred one that runs at `NORMAL` priority, avoiding starvation of higher-priority tasks. Backport of #44433 and #44543.	2019-07-18 14:10:40 +01:00
Alan Woodward	ec0a0a41db	Remove type parameter from ParserContext (#44478 ) ParserContext.getType() is never called, so we can remove it and tidy up the callers as well.	2019-07-18 11:07:46 +01:00
Luca Cavanna	a8a16e6b08	Associate sub-requests to their parent task in multi search API (#44492 ) Multi search accepts multiple search requests and runs them as independent requests, each one as part of their own search task. Today they don't get associated though with their parent multi search task, which would be useful to monitor which msearch a certain search was part of, if any, and also to cancel all of the sub-requests in case the parent msearch gets cancelled (though this will also require making the multi search task cancellable as a follow-up).	2019-07-18 11:58:30 +02:00
David Turner	7598e0186a	Harmonise indentation of cluster settings (#44540 ) Today the long list of `BUILT_IN_CLUSTER_SETTINGS` is indented differently between `master` and `7.x`. This sometimes makes backporting painful. This commit adjusts the indentation of earlier branches to match that in `master`.	2019-07-18 09:50:53 +01:00
Armin Braun	6565825a13	Avoid CharsRef Allocations in StreamInput (#44488 ) (#44519 ) * Many messages deserialized from a `StreamInput` only contain short strings, some use-cases of instantiating a `StreamInput` don't deserialize any strings * Don't allocate `CharsRef` for small strings to save some allocations (especially on the IO threads) * Lazily allocate a larger `CharsRef` if needed for larger strings like we did before and have it live as long as the `StreamInput` like before as well	2019-07-18 08:52:37 +02:00
Tal Levy	38d2ada84f	deprecate Supplier<Response> constructors in HandledTransportAction (#44456 ) (#44533 ) This commit deprecates all constructors of HandledTransportAction that take in a Supplier instead of a Writeable.Reader for response objects. in addition to the deprecation, the following modules were updated to leverage Writeable - modules:ingest-common - modules:lang-mustache relates #34389.	2019-07-17 22:47:09 -07:00
Tal Levy	075a3f0e99	remove usage of ActionType#(String) (#44459 ) (#44526 ) this commit removes usage of the deprecated constructor with a single argument and no Writeable.Reader. The purpose of this is to reduce the boilerplate necessary for properly implementing a new action, as well as reducing the chances of using the incorrect super constructor while classes are being migrated to Writeable relates #34389.	2019-07-17 20:28:11 -07:00
Nhat Nguyen	51180af91d	Make peer recovery send file chunks async (#44468 ) Relates #44040 Relates #36195	2019-07-17 22:25:43 -04:00
Nhat Nguyen	458f24c46a	Reenable accounting circuit breaker (#44495 ) We have a new Lucene 8.2 snapshot on master and 7.x; hence we can re-enable the accounting on these branches. Relates #30290	2019-07-17 22:25:43 -04:00
Julie Tibshirani	34c6067018	Convert several classes in 'server' to Writeable. (#44527 ) * Convert FieldCapabilities. Convert MultiTermVectors. Convert SyncedFlush. Convert SearchTemplateRequest. * Convert MultiSearchTemplateRequest. * Convert GrokProcessorGet. Remove a stray reference to SearchTemplateRequest#readFrom. Relates to #34389.	2019-07-17 19:04:21 -07:00
Ryan Ernst	2a2686e6e7	Convert remaining ActionTypes to writeable in xpack core (#44467 ) (#44525 ) This commit converts all remaining ActionType response classes to writeable in xpack core. It also converts a few from server which were used by xpack core. relates #34389	2019-07-17 18:01:45 -07:00
Ryan Ernst	17c4b2b839	Convert MasterNodeRequest to implement Writeable.Reader (#44452 ) (#44513 ) This commit converts all MasterNodeRequest subclasses to fulfill Writeable.Reader constructors. relates #34389	2019-07-17 18:01:29 -07:00
Paul Sanwald	7114fe786b	Fix incorrect calculation of how many buckets will result from a merge operation. (#44461 ) (#44515 )	2019-07-17 19:14:16 -04:00
Julie Tibshirani	8841779de8	Convert ClearScroll* to Writeable. (#44511 ) This PR converts `ClearScrollRequest` and `ClearScrollResponse` to `Writeable`. Relates to #34389.	2019-07-17 15:49:38 -07:00
Jason Tedor	39c5f98de7	Introduce test issue logging (#44477 ) Today we have an annotation for controlling logging levels in tests. This annotation serves two purposes, one is to control the logging level used in tests, when such control is needed to impact and assert the behavior of loggers in tests. The other use is when a test is failing and additional logging is needed. This commit separates these two concerns into separate annotations. The primary motivation for this is that we have a history of leaving behind the annotation for the purpose of investigating test failures long after the test failure is resolved. The accumulation of these stale logging annotations has led to excessive disk consumption. Having recently cleaned this up, we would like to avoid falling into this state again. To do this, we are adding a link to the test failure under investigation to the annotation when used for the purpose of investigating test failures. We will add tooling to inspect these annotations, in the same way that we have tooling on awaits fix annotations. This will enable us to report on the use of these annotations, and report when stale uses of the annotation exist.	2019-07-18 05:33:33 +09:00
Ryan Ernst	0755a13c9f	Convert AcknowledgedRequest to Writeable.Reader (#44412 ) (#44454 ) This commit adds constructors to AcknowledgedRequest subclasses to implement Writeable.Reader, and ensures all future subclasses implement the same. relates #34389	2019-07-17 11:17:36 -07:00
Yannick Welsch	c8b66c549d	Ignore failures to set socket options on Mac (#44355 ) Brings some temporary relief for test failures until #41071 is addressed.	2019-07-17 18:51:25 +02:00
Yannick Welsch	f78e64e3e2	Terminate linearizability check early on large histories (#44444 ) Large histories can be problematic and have the linearizability checker occasionally run OOM. As it's very difficult to bound the size of the histories just right, this PR will let it instead run for 10 seconds on large histories and then abort. Closes #44429	2019-07-17 18:51:25 +02:00
Igor Motov	d3cb7bbc8f	Geo: fix GeoWKTShapeParserTests (#44448 ) Changes in #44187 introduced some optimization in the way shapes are generated. These changes were not captured in GeoWKTShapeParserTests. Relates #44187	2019-07-17 12:09:38 -04:00
Igor Motov	cd5a334864	Geo: extract dateline handling logic from ShapeBuilders (#44187 ) Extracts dateline decomposition logic from ShapeBuilder into a separate utility class that is used on the indexing side. The search side will be handled as part of another PR, at which time we will remove the decomposition logic from ShapeBuilders as well. This PR also doesn't change any existing logic including bugs. Relates to #40908	2019-07-17 12:09:38 -04:00
Alan Woodward	b6a0f098e6	Don't use index_phrases on graph queries (#44340 ) Due to https://issues.apache.org/jira/browse/LUCENE-8916, when you try to use a synonym filter with the index_phrases option on a text field, you can end up with null values in a Phrase query, leading to weird exceptions further down the querying chain. As a workaround, this commit disables the index_phrases optimization for queries that produce token graphs. Fixes #43976	2019-07-17 16:46:00 +01:00
Yannick Welsch	ddd740162e	Do not use CancellableThreads for Zen1 (#44430 ) Zen 1 stops pinging threads in ZenDiscovery by calling Thread.interrupt(). This is incompatible with the CancellableThreads that only allow threads to be interrupted through cancellation. The use of CancellableThreads was introduced in #42844 and added to UnicastZenPing as part of the backport, as both Zen1 and Zen2 share the same SeedHostsResolver implementation. This commit effectively undoes the change in the backport while still allowing to share same implementation. Closes #44425	2019-07-17 17:32:47 +02:00
Zachary Tong	103ba976fd	Convert BucketScript to static parser (#44385 ) BucketScript was using the old-style parser and could easily be converted over to the newer static parser. Also adds a test for GapPolicy enum serialization	2019-07-17 10:22:42 -04:00
David Turner	377a6a47ac	Improve handshake failure messages (#44485 ) Today we report an exception on a handshake failure (e.g. cluster name mismatch) but the message does not include all the details of the mismatch. If the mismatch is something subtle like `my-cluster` instead of `my_cluster` then we cannot diagnose this from the message alone. This commit adds the details of the local cluster to the message, along with the details of the remote cluster, improving the utility of the exception message if reported in isolation.	2019-07-17 13:33:28 +01:00
Armin Braun	91673e373a	Fix Incorrect Uncompressed Error Handling in InboundMessage (#44317 ) (#44483 ) * Fix Incorrect Uncompressed Error Handling in InboundMessage * CompressorFactory.compressor does not throw uncompressed exception on uncompressed bytes, it merely returns `null` in this case if the bytes are at least XContent so the current catch and re-throw logic is dead code * Made it work again by throwing on a `null` return so we get a real error message instead of an NPE	2019-07-17 14:31:46 +02:00
Ignacio Vera	eb348d2593	Upgrade to lucene-8.2.0-snapshot-6413aae226 (#44480 )	2019-07-17 13:28:28 +02:00
Armin Braun	c8db0e9b7e	Remove blobExists Method from BlobContainer (#44472 ) (#44475 ) * We only use this method in one place in production code and can replace that with a read -> remove it to simplify the interface * Keep it as an implementation detail in the Azure repository	2019-07-17 11:56:02 +02:00
Tanguy Leroux	e423b7341a	Log non-acknowledged close index response in ReplicaToPrimaryPromotionIT Relates #44479	2019-07-17 10:32:44 +02:00
David Turner	dca8a918f3	Use applied cluster state in cluster health (#44426 ) In #44348 we changed the cluster health action so that it sometimes uses the cluster state directly from the master service rather than from the cluster applier. If the state is not recovered then this is inappropriate, because prior to state recovery the state available to the cluster applier contains no indices. This commit moves us back to using the state from the applier. Fixes #44416.	2019-07-17 08:36:13 +01:00
David Turner	0fd33b089f	Report shard state changes better (#44419 ) Today when the cluster health changes the `AllocationService` reports at most ten shards that were started or failed, and always ends its message with `...` suggesting that the list is truncated. This commit adjusts these messages to be clearer about whether the list is truncated or not. When debug logging is enabled the list is not truncated; if the list is truncated then its length is logged, and if it is not truncated then no `...` is included in the message.	2019-07-17 08:36:06 +01:00
Ryan Ernst	6e50bafa8f	Convert Broadcast request and response to use writeable.reader (#44386 ) (#44453 ) This commit converts the request and response classes for broadcast actions to implement ctors for Writeable.Reader and forces all future implementations to implement the same. relates #34389	2019-07-16 23:24:02 -07:00
Tim Brooks	6b1a769638	Move CORS Config into :server package (#43779 ) This commit moves the config that stores Cors options into the server package. Currently both nio and netty modules must have a copy of this config. Moving it into server allows one copy and the tests to be in a common location.	2019-07-16 17:50:42 -06:00
Julie Tibshirani	cc0ff3aa71	Ensure field caps doesn't error on rank feature fields. (#44370 ) The contract for MappedFieldType#fielddataBuilder is to throw an IllegalArgumentException if fielddata is not supported. The rank feature mappers were instead throwing an UnsupportedOperationException, which caused MappedFieldType#isAggregatable to fail.	2019-07-16 15:56:50 -07:00
Ryan Ernst	c26edb4c43	Ensure replication response/requests implement writeable (#44392 ) (#44446 ) This commit cleans up replication response and request so that the base class does not allow subclasses to implement Streamable. relates #34389	2019-07-16 12:53:08 -07:00
Przemysław Witek	9613700a63	[7.x] Implement MlConfigIndexMappingsFullClusterRestartIT test which verifies that .ml-config index mappings are properly updated during cluster upgrade (#44341 ) (#44366 )	2019-07-16 21:22:40 +02:00
Nhat Nguyen	301c8daf4c	Revert "Make peer recovery send file chunks async (#44040 )" This reverts commit `a2b4687d89`.	2019-07-16 14:18:35 -04:00
Christoph Büscher	67ec0a4e9b	Unmute SpecificMasterNodesIT test (#44436 ) The underlying issue is closed and the fix in #42454 seems to have been backported to 7.x and 7.3 so we can reactivate the test.	2019-07-16 19:41:59 +02:00
Yu	563a78829f	Do not allow version in Rest Update API (#43516 ) The versioning of Update API doesn't rely on version number anymore (and rather on sequence number). But at the REST API level we ignored the "version" and "version_type" parameters, so the server could not raise an exception when they were set. This PR restores "version" and "version_type" parsing in Update Rest API so that we can get the appropriate errors. Relates to #42497	2019-07-16 13:19:07 -04:00
Nhat Nguyen	a2b4687d89	Make peer recovery send file chunks async (#44040 )	2019-07-16 10:43:46 -04:00
Lee Hinman	fb0461ac76	[7.x] Add Snapshot Lifecycle Management (#44382 ) * Add Snapshot Lifecycle Management (#43934) * Add SnapshotLifecycleService and related CRUD APIs This commit adds `SnapshotLifecycleService` as a new service under the ilm plugin. This service handles snapshot lifecycle policies by scheduling based on the policies defined schedule. This also includes the get, put, and delete APIs for these policies Relates to #38461 * Make scheduledJobIds return an immutable set * Use Object.equals for SnapshotLifecyclePolicy * Remove unneeded TODO * Implement ToXContentFragment on SnapshotLifecyclePolicyItem * Copy contents of the scheduledJobIds * Handle snapshot lifecycle policy updates and deletions (#40062) (Note this is a PR against the `snapshot-lifecycle-management` feature branch) This adds logic to `SnapshotLifecycleService` to handle updates and deletes for snapshot policies. Policies with incremented versions have the old policy cancelled and the new one scheduled. Deleted policies have their schedules cancelled when they are no longer present in the cluster state metadata. Relates to #38461 * Take a snapshot for the policy when the SLM policy is triggered (#40383) (This is a PR for the `snapshot-lifecycle-management` branch) This commit fills in `SnapshotLifecycleTask` to actually perform the snapshotting when the policy is triggered. Currently there is no handling of the results (other than logging) as that will be added in subsequent work. This also adds unit tests and an integration test that schedules a policy and ensures that a snapshot is correctly taken. Relates to #38461 * Record most recent snapshot policy success/failure (#40619) Keeping a record of the results of the successes and failures will aid troubleshooting of policies and make users more confident that their snapshots are being taken as expected. This is the first step toward writing history in a more permanent fashion. 
* Validate snapshot lifecycle policies (#40654) (This is a PR against the `snapshot-lifecycle-management` branch) With this commit, we now validate the content of snapshot lifecycle policies when the policy is being created or updated. This checks for the validity of the id, name, schedule, and repository. Additionally, cluster state is checked to ensure that the repository exists prior to the lifecycle being added to the cluster state. Part of #38461 * Hook SLM into ILM's start and stop APIs (#40871) (This pull request is for the `snapshot-lifecycle-management` branch) This change allows the existing `/_ilm/stop` and `/_ilm/start` APIs to also manage snapshot lifecycle scheduling. When ILM is stopped all scheduled jobs are cancelled. Relates to #38461 * Add tests for SnapshotLifecyclePolicyItem (#40912) Adds serialization tests for SnapshotLifecyclePolicyItem. * Fix improper import in build.gradle after master merge * Add human readable version of modified date for snapshot lifecycle policy (#41035) * Add human readable version of modified date for snapshot lifecycle policy This small change changes it from: ``` ... "modified_date": 1554843903242, ... ``` To ``` ... "modified_date" : "2019-04-09T21:05:03.242Z", "modified_date_millis" : 1554843903242, ... ``` Including the `"modified_date"` field when the `?human` field is used. Relates to #38461 * Fix test * Add API to execute SLM policy on demand (#41038) This commit adds the ability to perform a snapshot on demand for a policy. This can be useful to take a snapshot immediately prior to performing some sort of maintenance. ```json PUT /_ilm/snapshot/<policy>/_execute ``` And it returns the response with the generated snapshot name: ```json { "snapshot_name" : "production-snap-2019.04.09-rfyv3j9qreixkdbnfuw0ug" } ``` Note that this does not allow waiting for the snapshot, and the snapshot could still fail. It does record this information into the cluster state similar to a regularly triggered SLM job. 
Relates to #38461 * Add next_execution to SLM policy metadata (#41221) * Add next_execution to SLM policy metadata This adds the next time a snapshot lifecycle policy will be executed when retrieving a policy's metadata, for example: ```json GET /_ilm/snapshot?human { "production" : { "version" : 1, "modified_date" : "2019-04-15T21:16:21.865Z", "modified_date_millis" : 1555362981865, "policy" : { "name" : "<production-snap-{now/d}>", "schedule" : "/30 * * * ?", "repository" : "repo", "config" : { "indices" : [ "foo-", "important" ], "ignore_unavailable" : true, "include_global_state" : false } }, "next_execution" : "2019-04-15T21:16:30.000Z", "next_execution_millis" : 1555362990000 }, "other" : { "version" : 1, "modified_date" : "2019-04-15T21:12:19.959Z", "modified_date_millis" : 1555362739959, "policy" : { "name" : "<other-snap-{now/d}>", "schedule" : "0 30 2 * ?", "repository" : "repo", "config" : { "indices" : [ "other" ], "ignore_unavailable" : false, "include_global_state" : true } }, "next_execution" : "2019-04-16T02:30:00.000Z", "next_execution_millis" : 1555381800000 } } ``` Relates to #38461 * Fix and enhance tests * Figured out how to Cron * Change SLM endpoint from /_ilm/* to /_slm/* (#41320) This commit changes the endpoint for snapshot lifecycle management from: ``` GET /_ilm/snapshot/<policy> ``` to: ``` GET /_slm/policy/<policy> ``` It mimics the ILM path only using `slm` instead of `ilm`. Relates to #38461 * Add initial documentation for SLM (#41510) * Add initial documentation for SLM This adds the initial documentation for snapshot lifecycle management. It also includes the REST spec API json files since they're sort of documentation. Relates to #38461 * Add `manage_slm` and `read_slm` roles (#41607) * Add `manage_slm` and `read_slm` roles This adds two more built in roles - `manage_slm` which has permission to perform any of the SLM actions, as well as stopping, starting, and retrieving the operation status of ILM. 
`read_slm` which has permission to retrieve snapshot lifecycle policies as well as retrieving the operation status of ILM. Relates to #38461 * Add execute to the test * Fix ilm -> slm typo in test * Record SLM history into an index (#41707) It is useful to have a record of the actions that Snapshot Lifecycle Management takes, especially for the purposes of alerting when a snapshot fails or has not been taken successfully for a certain amount of time. This adds the infrastructure to record SLM actions into an index that can be queried at leisure, along with a lifecycle policy so that this history does not grow without bound. Additionally, SLM automatically setting up an index + lifecycle policy leads to `index_lifecycle` custom metadata in the cluster state, which some of the ML tests don't know how to deal with due to setting up custom `NamedXContentRegistry`s. Watcher would cause the same problem, but it is already disabled (for the same reason). * High Level Rest Client support for SLM (#41767) * High Level Rest Client support for SLM This commit adds HLRC support for SLM. Relates to #38461 * Fill out documentation tests with tags * Add more callouts and asciidoc for HLRC * Update javadoc links to real locations * Add security test testing SLM cluster privileges (#42678) * Add security test testing SLM cluster privileges This adds a test to `PermissionsIT` that uses the `manage_slm` and `read_slm` cluster privileges. Relates to #38461 * Don't redefine vars * Add Getting Started Guide for SLM (#42878) This commit adds a basic Getting Started Guide for SLM. * Include SLM policy name in Snapshot metadata (#43132) Keep track of which SLM policy in the metadata field of the Snapshots taken by SLM. This allows users to more easily understand where the snapshot came from, and will enable future SLM features such as retention policies. 
* Fix compilation after master merge * [TEST] Move exception wrapping for devious exception throwing Fixes an issue where an exception was created from one line and thrown in another. * Fix SLM for the change to AcknowledgedResponse * Add Snapshot Lifecycle Management Package Docs (#43535) * Fix compilation for transport actions now that task is required * Add a note mentioning the privileges needed for SLM (#43708) * Add a note mentioning the privileges needed for SLM This adds a note to the top of the "getting started with SLM" documentation mentioning that there are two built-in privileges to assist with creating roles for SLM users and administrators. Relates to #38461 * Mention that you can create snapshots for indices you can't read * Fix REST tests for new number of cluster privileges * Mute testThatNonExistingTemplatesAreAddedImmediately (#43951) * Fix SnapshotHistoryStoreTests after merge * Remove overridden newResponse functions that have been removed * Fix compilation for backport * Fix get snapshot output parsing in test * [DOCS] Add redirects for removed autogen anchors (#44380) * Switch <tt>...</tt> in javadocs for {@code ...}	2019-07-16 07:37:13 -06:00
Armin Braun	4a79ccd324	Cleaner Exception Handling on Shard Delete (#44384 ) (#44407 ) * Follow up to #44165 * We should just catch all exceptions here and not return errors after the index-N update went through since a subsequent delete attempt by the user would fail with SnapshotMissingException since the snapshot now appears deleted. Also, `SnapshotException` isn't even thrown in the changed spot it seems in the first place and certainly not the only exception possible.	2019-07-16 12:20:52 +02:00
David Turner	a09389c511	AwaitsFix GatewayIndexStateIT#testJustMasterNode Relates #44416.	2019-07-16 11:02:32 +01:00
David Turner	8d68d1f54d	Cluster health should await events plus other things (#44348 ) Today a cluster health request can wait on a selection of conditions, but it does not guarantee that all of these conditions have ever held simultaneously when it returns. More specifically, if a request sets `waitForEvents()` along with some other conditions then Elasticsearch will respond when the master has processed all the expected pending tasks _and then_ the cluster satisfied the other conditions, but it may be that at the time the cluster satisfied the other conditions there were undesired pending tasks again. This commit adjusts the behaviour of `waitForEvents()` to wait for all the required events to be processed and then, if the resulting cluster state does not satisfy the other conditions, it will wait until there is a cluster state that does and then retry the wait-for-events too.	2019-07-16 06:34:02 +01:00
Ryan Ernst	e0b82e92f3	Convert BaseNode(s) Request/Response classes to Writeable (#44301 ) (#44358 ) This commit converts all BaseNodeResponse and BaseNodesResponse subclasses to implement Writeable.Reader instead of Streamable. relates #34389	2019-07-15 18:07:52 -07:00
David Turner	86ee8eab3f	Allow RerouteService to reroute at lower priority (#44338 ) Today the `BatchedRerouteService` submits its delayed reroute task at `HIGH` priority, but in some cases a lower priority would be more appropriate. This commit adds the facility to submit delayed reroute tasks at different priorities, such that each submitted reroute task runs at a priority no lower than the one requested. It does not change the fact that all delayed reroute tasks are submitted at `HIGH` priority, but at least it makes this explicit.	2019-07-15 17:41:39 +01:00
Ryan Ernst	59658daef9	Separate streamable based master node actions (#44313 ) This commit creates new base classes for master node actions whose response types still implement Streamable. This simplifies both finding remaining classes to convert, as well as creating new master node actions that use Writeable for their responses. relates #34389	2019-07-15 09:20:20 -07:00
David Turner	e3d2af64c4	Throw TranslogCorruptedException in more cases (#44217 ) Today we do not throw a `TranslogCorruptedException` in certain cases of translog corruption, such as for a corrupted checkpoint file or when an expected file (either checkpoint or translog) is completely missing. This means that `elasticsearch-shard` will not truncate the translog in those cases. This commit strengthens the translog corruption tests to corrupt and/or delete both translog and checkpoint files, and ensures that a `TranslogCorruptedException` is thrown in all cases. It also sometimes simulates a recovery after a crash while rolling the translog generation, including cases where the rolled checkpoint contains incorrect data. It also adjusts (and renames) `RemoveCorruptedShardDataCommandIT.getDirs()` to return only a single path, since in practice this was the only thing that could happen and yet we were relying on its callers to verify this and not all callers were doing so.	2019-07-15 15:20:33 +01:00
Armin Braun	eb1106c465	Stronger Cleanup Shard Snapshot Directory on Delete (#44257 ) (#44337 ) * Stronger Cleanup Shard Snapshot Directory on Delete * Use `RepositoryData` to clean up unreferenced `snap-${uuid}.dat` blobs from shard directories (and index-N) and as a result also clean up data blobs that are only referenced by them * Stop cleaning up anything but index-N on shard snapshot creation to align behavior of shard index-N handling with root path index-N handling	2019-07-15 12:59:38 +02:00
Christoph Büscher	22dc125dad	AnalyzeAction.Response doesn't need to call super.readFrom() (#44331 ) The response's super.writeTo() method was removed in #44092, so the corresponding constructor that reads from a stream shouldn't call super itself, even though its implementation is currently empty.	2019-07-15 11:53:25 +02:00
Armin Braun	7f5d40d235	Avoid Needless Set Instantiation in InboundMessage (#44318 ) (#44329 ) * Avoid Needless Set Instantiation in InboundMessage * When `features` is empty (when there's no xpack) we constantly and needlessly instantiated a few objects here for the empty set on every message	2019-07-15 10:59:51 +02:00
Armin Braun	0cc94a457d	Remove non-SMILE Serialization from ChecksumBlobStoreFormat (#44278 ) (#44326 ) * At least all the way back to 6.x we never use anything but `SMILE` in production code with this class so I removed the more general constructor and removed the format leniency from the deserialization	2019-07-15 10:59:33 +02:00
Tanguy Leroux	76a96c3774	Remove ReusePeerRecoverySharedTest class (#44275 )	2019-07-15 10:29:29 +02:00
Armin Braun	d73e2f9c56	HLRC: Fix '+' Not Correctly Encoded in GET Req. (#33164 ) (#44324 ) * HLRC: Fix '+' Not Correctly Encoded in GET Req. * Encode `+` correctly as `%2B` in URL paths * Keep encoding `+` as space in URL parameters * Closes #33077	2019-07-15 10:21:54 +02:00
Nhat Nguyen	2203d447aa	Fail engine if hit document failure on replicas (#43523 ) An indexing on a replica should never fail after it was successfully indexed on a primary. Hence, we should fail an engine if we hit any failure (document level or tragic failure) when processing an indexing on a replica. Relates #43228 Closes #40435	2019-07-14 19:29:16 -04:00
Christoph Büscher	835b7a120d	Fix AnalyzeAction response serialization (#44284 ) Currently we lose information about whether a token list in an AnalyzeAction response is null or an empty list, because we write a 0 value to the stream in both cases and deserialize to a null value on the receiving side. This change fixes this so we write an additional flag indicating whether the value is null or not, followed by the size of the list and its content. Closes #44078	2019-07-14 10:35:11 +02:00
Ryan Ernst	1dcf53465c	Reorder HandledTransportAction ctor args (#44291 ) This commit moves the Supplier variant of HandledTransportAction to have a different ordering than the Writeable.Reader variant. The Supplier version is used for the legacy Streamable, and currently having the location of the Writeable.Reader vs Supplier in the same place forces using casts of Writeable.Reader to select the correct super constructor. This change in ordering allows easier migration to Writeable.Reader. relates #34389	2019-07-12 13:45:09 -07:00
Nikita Glashenko	d187fcb9de	Support WKT point conversion to geo_point type (#44107 ) This PR adds support for parsing geo_point values from WKT POINT format. Also, a few minor bugs in geo_point parsing were fixed. Closes #41821	2019-07-12 14:31:07 -04:00
Przemyslaw Gomulka	e23ecc5838	JSON logging refactoring and X-Opaque-ID support backport(#41354 ) (#44178 ) This is a refactor of the current JSON logging to make it more open to extensions and to support custom ES log messages used in DeprecationLogger, IndexingSlowLog, and SearchSlowLog. We want to include x-opaque-id in deprecation logs. The easiest way to have this as an additional JSON field instead of part of the message is to create a custom DeprecatedMessage (extends ESLogMessage). These messages are regular log4j messages with a text, but also carry a map of fields which can then populate the log pattern. The logic for this lives in ESJsonLayout and ESMessageFieldConverter. A similar approach can be used to refactor IndexingSlowLog and SearchSlowLog JSON logs to contain fields previously only present as escaped JSON string in a message field. closes #41350 backport #41354	2019-07-12 16:53:27 +02:00
Armin Braun	9b4f50b40a	Remove Redundant GetAllSnapshots Method from RepositoryData (#44259 ) (#44271 ) * With the removal of the incompatible snapshots list in RepositoryData the get snapshots and get all snapshots methods are equivalent so I removed one of them	2019-07-12 15:03:09 +02:00
Yannick Welsch	068286ca4b	Remove RemoteClusterConnection.ConnectedNodes (#44235 ) This instead exposes the set of connected nodes on ConnectionManager.	2019-07-12 14:54:21 +02:00
Armin Braun	6c02cf0241	Fix InternalTestCluster StopRandomNode Assertion (#44258 ) (#44265 ) * The assertion added in #44214 is tripped by tests running dedicated test clusters per test needlessly. This breaks existing tests like the one in #44245. * Closes #44245	2019-07-12 13:18:55 +02:00
Armin Braun	ad6dce16f4	Safer Shard Snapshot Delete (#44165 ) (#44244 ) * Safer Shard Snapshot Delete * We shouldn't delete the snapshot meta file before we update the index in the shard folder. If we fail to update the index-N after deleting, the existing index-N is broken because the snap- blob it references is gone.	2019-07-12 12:45:06 +02:00
David Turner	735c897ec6	Avoid counting votes from master-ineligible nodes (#43688 ) Today if a master-eligible node is converted to a master-ineligible node it may remain in the voting configuration, meaning that the master node may count its publish responses as an indication that it has properly persisted the cluster state. However master-ineligible nodes do not properly persist the cluster state, so it is not safe to count these votes. This change adjusts `CoordinationState` to take account of this from a safety point of view, and also adjusts the `Coordinator` to prevent such nodes from joining the cluster. Instead, it triggers a reconfiguration to remove from the voting configuration a node that now appears to be master-ineligible before processing its join. Backport of #43688, see #44260.	2019-07-12 11:30:52 +01:00
Armin Braun	9e920f9612	Make Timestamps Returned by Snapshot APIs Consistent (#43148 ) (#44261 ) * We don't have to calculate the start and end times from the shards for the status API, we have the start time available from the CS or the `SnapshotInfo` in the repo and can either take the end time from the `SnapshotInfo` or take the most recent time from the shard stats for in progress snapshots * Closes #43074	2019-07-12 12:05:35 +02:00
Mark Vieira	3cd9606566	Mute failing test	2019-07-11 13:32:49 -07:00
Armin Braun	0dd06cf7a5	Remove Dead Code Around Snapshots (#44109 ) (#44236 ) * Just some random spots that have become unused with recent cleanups	2019-07-11 21:56:36 +02:00
Christoph Büscher	31725ef390	[Tests] Increase SimpleQueryStringIT allowed maxClauseCount (#44215 ) For this test, we randomize the CLUSTER_MAX_CLAUSE_COUNT on test setup (@BeforeClass) between 50 and 100. Some queries in the test generate 56 clauses which hasn't been an issue before LUCENE-8811, but we slightly need to increase the minimal possible clause count now. Closes #44192	2019-07-11 20:16:20 +02:00
Yannick Welsch	ae8f625d73	Report usages of child breakers when breaking on real memory (#44221 ) This will help in investigations where the real memory circuit breaker is tripped to better understand what the memory is actually used for, i.e. whether it's a temporary thing (e.g. requests) in contrast to more permanently allocated memory (e.g. accounting).	2019-07-11 19:52:12 +02:00
Armin Braun	2768662822	Cleanup Stale Root Level Blobs in Sn. Repository (#43542 ) (#44226 ) * Cleans up all root level temp., snap-%s.dat, meta-%s.dat blobs that aren't referenced by any snapshot to deal with dangling blobs left behind by delete and snapshot finalization failures * The scenario that gets us here is a snapshot failing before it was finalized or a delete failing right after it wrote the updated index-(N+1) that doesn't reference a snapshot anymore but then fails to remove that snapshot * Not deleting other dangling blobs since they don't follow the snap-, meta- or tempfile naming schemes to not accidentally delete blobs not created by the snapshot logic * Follow up to #42189 * Same safety logic, get list of all blobs before writing index-N blobs, delete things after the index-N blob was written	2019-07-11 19:35:15 +02:00
Andrei Stefan	e9f9f00940	SQL: add pretty printing to JSON format (#43756 ) (#44220 ) (cherry picked from commit cbd9d4c259bf5a541bc49f65f7973174a36df449)	2019-07-11 20:02:24 +03:00
Christos Soulios	c091b6c004	Migrating tests from AvgIT integration test to AvgAggregatorTests (#44076 ) (#44225 ) This PR migrates most tests from AvgIT integration test to AvgAggregatorTests, as described in #42893	2019-07-11 19:20:13 +03:00
Armin Braun	5f22370b6b	Fix ShrinkIndexIT (#44214 ) (#44223 ) * Fix ShrinkIndexIT * Move this test suite to cluster scope. Currently, `testShrinkThenSplitWithFailedNode` stops a random node which randomly turns out to be the only shared master node so the cluster reset fails on account of the fact that no shared master node survived. * Closes #44164	2019-07-11 17:58:00 +02:00
Igor Motov	1636701d69	CI: Disable SimpleQueryStringIT.testDocWithAllTypes Tracked by #44192	2019-07-11 09:22:18 -05:00
Nick Knize	374030a53f	Upgrade to lucene-8.2.0-snapshot-860e0be5378 (#44171 ) (#44184 ) Upgrades lucene library to lucene-8.2.0-snapshot-860e0be5378	2019-07-11 09:17:22 -05:00
Igor Motov	66a9b721f5	Add Map to XContentParser Wrapper (#44036 ) In some cases we need to parse some XContent that is already parsed into a map. This is currently happening in handling source in SQL and ingest processors as well as parsing null_value values in geo mappings. To avoid re-serializing and parsing the value again or writing another map-based parser this commit adds an iterator that iterates over a map as if it was XContent. This makes reusing existing XContent parser on maps possible. Relates to #43554	2019-07-11 09:38:31 -04:00
Yannick Welsch	ea5513f2cf	Make NodeConnectionsService non-blocking (#44211 ) With connection management now being non-blocking, we can make NodeConnectionsService avoid the use of MANAGEMENT threads that are blocked during the connection attempts. I had to fiddle a bit with the tests as testPeriodicReconnection was using both the mock Threadpool from the DeterministicTaskQueue as well as the real ThreadPool initialized at the test class level, which resulted in races.	2019-07-11 14:08:07 +02:00
Armin Braun	51f0e941d3	Reduce Number of List Calls During Snapshot Create and Delete (#44088 ) (#44209 ) * Reduce Number of List Calls During Snapshot Create and Delete Some obvious cleanups I found when investigating the API call count metering: * No need to get the latest generation id after loading latest repository data * Loading RepositoryData already requires fetching the latest generation so we can reuse it * Also, reuse list of all root blobs when fetching latest repo generation during snapshot delete like we do for shard folders * Lastly, don't try and load `index--1` (N = -1) repository data, it doesn't exist -> just return the empty repo data initially	2019-07-11 13:52:36 +02:00
Armin Braun	8ce8c627dd	Some Cleanup in o.e.i.shard (#44097 ) (#44208 ) * Some Cleanup in o.e.i.shard * Extract one duplicated method * Cleanup obviously unused code	2019-07-11 13:52:06 +02:00
Yannick Welsch	2ee07f1ff4	Simplify port usage in transport tests (#44157 ) Simplifies AbstractSimpleTransportTestCase to use JVM-local ports and also adds an assertion so that cases like #44134 can be more easily debugged. The likely reason for that one is that a test, which was repeated again and again while always spawning a fresh Gradle worker (due to Gradle daemon) kept increasing Gradle worker IDs, causing an overflow at some point.	2019-07-11 13:35:37 +02:00
Armin Braun	c0ed64bb92	Improve Repository Consistency Check in Tests (#44204 ) * Improve Repository Consistency Check in Tests (#44099) * Check that index metadata as well as snapshot metadata always exists when referenced by other metadata * Fix SnapshotResiliencyTests on ExtraFS (#44113) * As a result of #44099 we're now checking more directories and have to ignore the `extraN` folders for those like we do for indices already * Closes #44112	2019-07-11 11:14:37 +02:00
Armin Braun	8a554f9737	Remove IncompatibleSnapshots Logic from Codebase (#44096 ) (#44183 ) * The incompatible snapshots logic was created to track 1.x snapshots that became incompatible with 2.x * It serves no purpose at this point * It adds an additional GET request to every loading of RepositoryData (from loading the incompatible snapshots blob)	2019-07-11 07:15:51 +02:00
Igor Motov	df2e1fb43e	Geo: add validator that only checks altitude (#43893 ) By default, we don't check ranges while indexing geo_shapes. As a result, it is possible to index geoshapes that contain coordinates outside of -90 +90 and -180 +180 ranges. Such geoshapes will currently break SQL and ML retrieval mechanism. This commit removes this restriction from the validator that is used in SQL and ML retrieval.	2019-07-10 16:55:03 -04:00
Ryan Ernst	8fda49a834	Remove unused import in TransportShardBulkAction Accidentally left from backporting #44092	2019-07-10 13:43:47 -07:00
Christoph Büscher	cbb19032df	[Test] Additional logging for RemoteClusterClientTests (#44124 )	2019-07-10 22:41:54 +02:00
Ryan Ernst	c6efb9be2a	Convert ReplicationResponse to Writeable (#43953 ) This commit converts ReplicationResponse and all its subclasses to support Writeable.Reader as a constructor. relates #34389	2019-07-10 12:45:10 -07:00
Ryan Ernst	fb77d8f461	Removed writeTo from TransportResponse and ActionResponse (#44092 ) The base classes for transport requests and responses currently implement Streamable and Writeable. The writeTo method on these base classes is implemented with an empty implementation. Not only does this complicate subclasses to think they need to call super.writeTo, but it also can lead to not implementing writeTo when it should have been implemented, or extending one of these classes when not necessary, since there is nothing to actually implement. This commit removes the empty writeTo from these base classes, and fixes subclasses to not call super and in some cases implement an empty writeTo themselves. relates #34389	2019-07-10 12:42:04 -07:00
Zachary Tong	92ad588275	Remove generic on AggregatorFactory (#43664 ) (#44079 ) AggregatorFactory was generic over itself, but it doesn't appear we use this functionality anywhere (e.g. to allow the super class to declare arguments/return types generically for subclasses to override). Most places use a wildcard constraint, and even when a concrete type is specified it wasn't used. But since AggFactories are widely used, this led to the generic touching many pieces of code and making type signatures fairly complex	2019-07-10 13:20:28 -04:00
Nhat Nguyen	b158919542	Do not use mock engine in PrimaryAllocationIT (#44083 ) PrimaryAllocationIT#testForceStaleReplicaToBePromotedToPrimary relies on the flushing when a shard is no longer assigned. This behavior, however, can be randomly disabled in MockInternalEngine. Closes #44049	2019-07-10 12:26:34 -04:00
David Turner	d0f1a756d9	Comment on the extra reroute after failing shards (#44152 ) The `ShardFailedClusterStateTaskExecutor` fails some shards, which performs a reroute, but then sometimes schedules a followup reroute. It's not clear from the code why this followup is necessary, so this commit adds a short comment describing why it's necessary.	2019-07-10 13:24:21 +01:00
David Roberts	cad804df92	[TEST] Mute ShrinkIndexIT Due to https://github.com/elastic/elasticsearch/issues/44164	2019-07-10 13:22:25 +01:00
Martijn van Groningen	913b6a64e8	Replace Streamable w/ Writable for MultiSearchRequest (#44057 ) This commit replaces usages of Streamable with Writeable for the MultiSearchRequest class. I ran into this when developing a custom action that reuses MultiSearchRequest in the enrich branch. Relates to #34389	2019-07-10 11:13:28 +02:00
Armin Braun	a23d1ed00d	Mute SearchWithRandomExceptionsIT (#44147 ) (#44149 ) * This is failing quite often and we can reproduce it now so we don't need additional test logging on CI * Relates #40435	2019-07-10 08:12:26 +02:00
David Turner	aec44fecbc	Decouple DiskThresholdMonitor & ClusterInfoService (#44105 ) Today the `ClusterInfoService` requires the `DiskThresholdMonitor` at construction time so that it can notify it when nodes report changes in their disk usage, but this is awkward to construct: the `DiskThresholdMonitor` requires a `RerouteService` which requires an `AllocationService` which comes from the `ClusterModule` which requires the `ClusterInfoService`. Today we break the cycle with a `LazilyInitializedRerouteService` which is itself a little ugly. This commit replaces this with a more traditional subject/observer relationship between the `ClusterInfoService` and the `DiskThresholdMonitor`.	2019-07-09 18:43:32 +01:00
David Turner	e70cad4c52	Remove node conn block after connection barrier (#44114 ) Today `testOnlyBlocksOnConnectionsToNewNodes` fails (extremely rarely) if the last attempt to connect to `node0` is delayed for so long that the test runs `nodeConnectionsBlocks.clear()` before the connection attempt obtains the expected connection block. We can turn this into a reliable failure with this delay: ```diff diff --git a/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java b/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java index f48413824d3..9a1d0336bcd 100644 --- a/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java +++ b/server/src/main/java/org/elasticsearch/cluster/NodeConnectionsService.java @@ -300,6 +300,13 @@ public class NodeConnectionsService extends AbstractLifecycleComponent { private final Runnable connectActivity = () -> threadPool.executor(ThreadPool.Names.MANAGEMENT).execute(new AbstractRunnable() { @Override protected void doRun() { + + try { + Thread.sleep(500); + } catch (InterruptedException e) { + throw new AssertionError("unexpected", e); + } + assert Thread.holdsLock(mutex) == false : "mutex unexpectedly held"; transportService.connectToNode(discoveryNode); consecutiveFailureCount.set(0); ``` This commit reverts the extra logging introduced in #43979 and fixes this failure by waiting for the connection attempt to hit the barrier before removing it. Fixes #40170	2019-07-09 17:03:26 +01:00
David Turner	268971db03	Wait for blackholed connection before discovery (#44077 ) Since #42636 we no longer treat connections specially when simulating a blackholed connection. This means that at the end of the safety phase we may have just started a connection attempt which will time out, but the default timeout is 30 seconds, much longer than the 2 seconds we normally allow for post-safety-phase discovery. This commit adds time for such a connection attempt to time out. It also fixes some spurious logging of `this` that now refers to an object with an unhelpful `toString()` implementation introduced in #42636. Fixes #44073	2019-07-09 10:59:53 +01:00
Henning Andersen	748a10866d	Reindex ScrollableHitSource pump data out (#43864 ) Refactor ScrollableHitSource to pump data out and have a simplified interface (callers should no longer call startNextScroll, instead they simply mark that they are done with the previous result, triggering a new batch of data). This eases making reindex resilient, since we will sometimes need to rerun search during retries. Relates #43187 and #42612	2019-07-09 11:50:09 +02:00
David Turner	fd9eebae81	Only apply initial recovery filter to shrunk shard (#44054 ) Today the `index.routing.allocation.initial_recovery._id` setting can only be set on indices that are the result of a shrink, but the filtered allocation decider also applies this filter to shards with a recovery source of `EMPTY_STORE`. The only way to have this setting set while the recovery source is `EMPTY_STORE` is to force-allocate an empty primary, but such a forced allocation ignores this allocation decider. This commit simplifies the allocation decider so that the `initial_recovery` setting only applies to shards with a recovery source of `LOCAL_SHARDS`.	2019-07-09 08:42:18 +01:00
Armin Braun	9eac5ceb1b	Dry up inputstream to bytesreference (#43675 ) (#44094 ) * Dry up Reading InputStream to BytesReference * Dry up spots where we use the same pattern to get from an InputStream to a BytesReferences	2019-07-09 09:18:25 +02:00
Armin Braun	dc8f8e40eb	Fix DedicatedClusterSnapshotRestoreIT testSnapshotWithStuckNode (#43537 ) (#44082 ) * Fix DedicatedClusterSnapshotRestoreIT testSnapshotWithStuckNode * See comment in the test: The problem is that when the snapshot delete works out partially on master failover and the retry fails on `SnapshotMissingException` no repository cleanup is run => we still failed even with repo cleanup logic in the delete path now * Fixed the test by rerunning a create snapshot and delete loop to clean up the repo before verifying file counts * Closes #39852	2019-07-09 06:32:08 +02:00
Armin Braun	03332b5aeb	Don't Consistency Check Broken Repository in Test (#43499 ) (#44071 ) * Missed this one in #42189 and it randomly runs into a situation where the broken mock repo is broken such that we can't get to a consistent end state via a delete * Closes #43498	2019-07-08 17:21:40 +02:00
Tanguy Leroux	251287f89d	Check again on-going snapshots/restores of indices before closing (#43873 ) Today we prevent any index that is actively snapshotted or restored to be closed. This verification is done during the execution of the first phase of index closing (ie before blocking the indices). We should also do this verification again in the last phase of index closing (ie after the shard sanity checks and right before actually changing the index state and the routing table) because a snapshot/restore could sneak in while the shards are verified-before-close.	2019-07-08 17:07:04 +02:00
Mark Tozzi	299a52c17d	Enable validating user-supplied missing values on unmapped fields (#43718 ) (#43940 ) Provides a hook for aggregations to introspect the `ValuesSourceType` for a user supplied Missing value on an unmapped field, when the type would otherwise be `ANY`. Mapped field behavior is unchanged, and still applies the `ValuesSourceType` of the field. This PR just provides the hook for doing this, no existing aggregations have their behavior changed.	2019-07-08 10:46:23 -04:00
Armin Braun	2918363e90	Simplify BlobStoreRepository (Flatten Nested Classes) (#42833 ) (#44060 ) * In the current codebase it is hardly obvious what code operates on a shard and is run by a datanode and what code operates on the global metadata and is run on master * Fixed by adjusting the method names accordingly * The nested context classes don't add much if any value, they simply spread out the parameters that go into a shard snapshot create or delete all over the place since their constructors can be inlined in all spots * Fixed by flattening the nested classes into BlobStoreRepository * Also: * Inlined the other single use inner classes	2019-07-08 14:57:27 +02:00
Armin Braun	afe81fd625	Some Cleanup in Test Framework (#44039 ) (#44059 ) * Remove some obvious dead code * Move assert methods that were only used in a single test class to the child they belong to * Inline some redundant methods	2019-07-08 14:15:31 +02:00
David Turner	3f3bcb23c2	AwaitsFix testForceStaleReplicaToBePromotedToPrimary Relates #44049	2019-07-08 11:26:57 +01:00
David Turner	3129f5b42e	Do not copy initial recovery filter during split (#44053 ) If an index is the result of a shrink then it will have a value set for `index.routing.allocation.initial_recovery._id`. If this index is subsequently split then this value will be copied over, forcing the initial allocation of the split shards to occur on the node on which the shrink took place. Moreover if this node no longer exists then the split will fail. This commit suppresses the copying of this setting when splitting an index. Fixes #43955	2019-07-08 10:32:05 +01:00
Armin Braun	af9b98e81c	Recursively Delete Unreferenced Index Directories (#42189 ) (#44051 ) * Use ability to list child "folders" in the blob store to implement recursive delete on all stale index folders when cleaning up instead of using the diff between two `RepositoryData` instances to cover aborted deletes * Runs after every delete operation * Relates #13159 (fixing most of the issues caused by unreferenced indices, leaving some meta files to be cleaned up only)	2019-07-08 10:55:39 +02:00
Przemyslaw Gomulka	247f2dabad	Fix decimal point parsing for date_optional_time backport(#43859 ) #44050 Joda allowed the decimal point in date_optional_time and strict_date_optional_time to be either a dot (.) or a comma (,). Our java.time implementation should allow the same, including for strict_date_optional_time-nanos. The approach to fix this is the same as in the iso8601 parser. closes #43730	2019-07-08 09:56:01 +02:00
Armin Braun	f6efc55556	Fix SnapshotResiliencyTest (#44015 ) (#44041 ) * Closes #43989	2019-07-07 19:59:16 +02:00
Armin Braun	990ac4ca83	Some Cleanup in BlobStoreRepository (#43323 ) (#44043 ) * Some Cleanup in BlobStoreRepository * Extracted from #42833: * Dry up index and shard path handling * Shorten XContent handling	2019-07-07 19:50:46 +02:00
Nhat Nguyen	9089820d8f	Enable indexing optimization using sequence numbers on replicas (#43616 ) This PR enables the indexing optimization using sequence numbers on replicas. With this optimization, indexing on replicas should be faster and use less memory as it can forgo the version lookup when possible. This change also deactivates the append-only optimization on replicas. Relates #34099	2019-07-05 22:12:08 -04:00
Yannick Welsch	504a43d43a	Move ConnectionManager to async APIs (#42636 ) This commit converts the ConnectionManager's openConnection and connectToNode methods to async-style. This will allow us to not block threads anymore when opening connections. This PR also adapts the cluster coordination subsystem to make use of the new async APIs, allowing to remove some hacks in the test infrastructure that had to account for the previous synchronous nature of the connection APIs.	2019-07-05 20:40:22 +02:00
Yannick Welsch	88783927d1	Weaken assertion in PublicationTransportHandler (#44014 ) These assertions do not hold true when a master fails during publication and quickly becomes master again, publishing a new cluster state in a higher term which races against the previous cluster state publication to self (which does not matter anyway). Relates #43994 Closes #44012	2019-07-05 18:27:42 +02:00
Yannick Welsch	1220ff5b6d	Publish to self through transport (#43994 ) This commit ensures that cluster state publications to self also go through the transport layer. This allows voting-only nodes to intercept the publication to self. Fixes an issue discovered by a test failure where a voting-only node, which was the only bootstrapped node, would not step down as master after state transfer because publishing to self would succeed. Closes #43631	2019-07-05 13:00:52 +02:00
Yannick Welsch	5cdf3ff3fa	Revert "[TEST] Mute RemoteClusterServiceTests.testCollectNodes" This reverts commit `d8a2970fa4`.	2019-07-05 11:02:42 +02:00
David Turner	06df0c0a4c	Improve RetentionLease(Bgrd)SyncAction#toString() (#43987 ) Today `RetentionLeaseSyncAction.Request` and `RetentionLeaseBackgroundSyncAction.Request` both describe themselves as `Request{...}` in the value returned from their respective `toString()` methods. This commit adds the name of the owning class to both so we have something a bit easier to search for and so we can distinguish foreground from background syncs in logs and test failures and so on.	2019-07-05 09:58:35 +01:00
David Turner	435a83f3fd	Add more logging to testOnlyBlocksOnConnectionsToNewNodes (#43979 ) Some more output from this occasionally-failing test tracked in #40170.	2019-07-05 09:54:48 +01:00
Jim Ferenczi	cdf55cb5c5	Refactor index engines to manage readers instead of searchers (#43860 ) This commit changes the way we manage refreshes in the index engines. Instead of relying on a SearcherManager, this change uses a ReaderManager that creates ElasticsearchDirectoryReader when needed. Searchers are now created on-demand (when acquireSearcher is called) from the current ElasticsearchDirectoryReader. It also slightly changes the Engine.Searcher to extend IndexSearcher in order to simplify the usage in the consumer.	2019-07-04 22:49:43 +02:00
Christoph Büscher	aeb3c1fd1b	Prevent types deprecation warning for indices.exists requests (#43963 ) Currently we log a deprecation warning to the types removal in RestGetIndicesAction even if the REST method is HEAD, which is used by the indices.exists API. Since the body is empty in this case we should not need to show the deprecation warning. Closes #43905	2019-07-04 17:20:43 +02:00
Tanguy Leroux	b037aeaa6e	Fix IndexShardIT.testIndexCanChangeCustomDataPath() (#43978 ) The test IndexShardIT.testIndexCanChangeCustomDataPath() fails on 7.x and 7.3 because the translog cannot be recovered. While I can't reproduce the issue, I think it has been introduced in #43752 which changed ReadOnlyEngine so that it opens the translog in its constructor in order to load the translog stats. This opening writes a new checkpoint file, but because 7.x/7.3 does not wait for shards to be started after being closed, the test immediately starts to copy shard files to a new directory and possibly does not copy all the required translog files. By waiting for the shards to be started after being closed, we ensure that the shards (and engines) have been correctly initialized and that the translog checkpoint file is not currently being written. closes #43964	2019-07-04 17:06:37 +02:00
Martijn van Groningen	653f1436a0	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-07-04 13:05:10 +02:00
Alan Woodward	4b99255fed	Add name() method to TokenizerFactory (#43909 ) This brings TokenizerFactory into line with CharFilterFactory and TokenFilterFactory, and removes the need to pass around tokenizer names when building custom analyzers. As this means that TokenizerFactory is no longer a functional interface, the commit also adds a factory method to TokenizerFactory to make construction simpler.	2019-07-04 11:28:55 +01:00
Jim Ferenczi	2cc0a56fe6	Fix wrong logic in `match_phrase` query with multi-word synonyms (#43941 ) Disjunction over two individual terms in a phrase query with multi-word synonyms wrongly applies a prefix query to each of these terms. This change fixes this bug by inversing the logic to use prefixes on `phrase_prefix` queries only. Closes #43308	2019-07-04 09:39:39 +02:00
Henning Andersen	cacc3f7ff8	Async IO Processor release before notify (#43682 ) This commit changes async IO processor to release the promiseSemaphore before notifying consumers. This ensures that a bad consumer that sometimes does blocking (or otherwise slow) operations does not halt the processor. This should slightly increase the concurrency for shard fsync, but primarily improves safety so that one bad piece of code has less effect on overall system performance.	2019-07-04 06:33:38 +02:00
Igor Motov	c593085104	Geo: Refactors libs/geo parser to provide serialization logic as well (#43717 ) Enables libs/geo parser to return a geometry format object that can perform both serialization and deserialization functions. This can be useful for ingest nodes that are trying to modify an existing geometry in the source. Relates to #43554	2019-07-03 19:31:44 -04:00
Adrien Grand	680edbe3f1	Bump current version to 7.4. (#43927 )	2019-07-03 20:32:04 +02:00
Armin Braun	be20fb80e4	Recursive Delete on BlobContainer (#43281 ) (#43920 ) This is a prerequisite of #42189: * Add directory delete method to blob container specific to each implementation: * Some notes on the implementations: * AWS + GCS: We can simply exploit the fact that both AWS and GCS return blobs lexicographically ordered which allows us to simply delete in the same order that we receive the blobs from the listing request. For AWS this simply required listing without the delimiter setting (so we get a deep listing) and for GCS the same behavior is achieved by not using the directory mode on the listing invocation. The nice thing about this is, that even for very large numbers of blobs the memory requirements are now capped nicely since we go page by page when deleting. * For Azure I extended the parallelization to the listing calls as well and made it work recursively. I verified that this works with thread count `1` since we only block once in the initial thread and then fan out to a "graph" of child listeners that never block. * HDFS and FS are trivial since we have directory delete methods available for them * Enhances third party tests to ensure the new functionality works (I manually ran them for all cloud providers)	2019-07-03 17:14:57 +02:00
Alan Woodward	49d69bf987	Actually close IndexAnalyzers contents (#43914 ) IndexAnalyzers has a close() method that should iterate through all its wrapped analyzers and close each one in turn. However, instead of delegating to the analyzers' close() methods, it instead wraps them in a Closeable interface, which just returns a list of the analyzers. In addition, whitespace normalizers are ignored entirely.	2019-07-03 16:06:58 +01:00
David Turner	9cecc31cdc	Shortcut simple patterns ending in `*` (#43904 ) When profiling a call to `AllocationService#reroute()` in a large cluster containing allocation filters of the form `node-name-*` I observed a nontrivial amount of time spent in `Regex#simpleMatch` due to these allocation filters. Patterns ending in a wildcard are not uncommon, and this change treats them as a special case in `Regex#simpleMatch` in order to shave a bit of time off this calculation. It also uses `String#regionMatches()` to avoid an allocation in the case that the pattern's only wildcard is at the start. Microbenchmark results before this change: Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch": 1113.839 ±(99.9%) 6.338 ns/op [Average] (min, avg, max) = (1102.388, 1113.839, 1135.783), stdev = 9.486 CI (99.9%): [1107.502, 1120.177] (assumes normal distribution) Microbenchmark results with this change applied: Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch": 433.190 ±(99.9%) 0.644 ns/op [Average] (min, avg, max) = (431.518, 433.190, 435.456), stdev = 0.964 CI (99.9%): [432.546, 433.833] (assumes normal distribution) The microbenchmark in question was: @Fork(3) @Warmup(iterations = 10) @Measurement(iterations = 10) @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) @SuppressWarnings("unused") //invoked by benchmarking framework public class RegexStartsWithBenchmark { private static final String testString = "abcdefghijklmnopqrstuvwxyz"; private static final String[] patterns; static { patterns = new String[testString.length() + 1]; for (int i = 0; i <= testString.length(); i++) { patterns[i] = testString.substring(0, i) + "*"; } } @Benchmark public void performSimpleMatch() { for (int i = 0; i < patterns.length; i++) { Regex.simpleMatch(patterns[i], testString); } } }	2019-07-03 14:15:27 +01:00
paulward24	cff027499a	Ensure to access RecoveryState#fileDetails under lock Closes #43840	2019-07-03 07:39:58 -04:00
Armin Braun	7059224668	Optimize Snapshot Finalization (#42723 ) (#43908 ) * Optimize Snapshot Finalization * Delete index-N blobs and segment blobs in one single bulk delete instead of in separate ones to save RPC calls on implementations that have bulk deletes implemented * Don't fail snapshot because deleting old index-N failed, this results in needlessly logging finalization failures and makes analysis of failures harder going forward as well as incorrect index.latest blobs	2019-07-03 13:26:35 +02:00
Armin Braun	455b12a4fb	Add Ability to List Child Containers to BlobContainer (#42653 ) (#43903 ) * Add Ability to List Child Containers to BlobContainer (#42653) * Add Ability to List Child Containers to BlobContainer * This is a prerequisite of #42189	2019-07-03 11:30:49 +02:00
Henning Andersen	cd2972239c	AsyncIOProcessor preserve thread context (#43729 ) AsyncIOProcessor now preserves thread context, ensuring that deprecation warnings are not duplicated to other concurrent operations on the same shard.	2019-07-03 10:22:20 +02:00
Jim Ferenczi	05c0cff1b6	Fix index_prefix sub field name on nested text fields (#43862 ) This change fixes the name of the index_prefix sub field when the `index_prefix` option is set on a text field that is nested under an object or a multi-field. We don't use the full path of the parent field to set the index_prefix field name so the field is registered under the wrong name. This doesn't break queries since we always retrieve the prefix field through its parent field but this breaks other APIs like _field_caps which tries to find the parent of the `index_prefix` field in the mapping but fails. Closes #43741	2019-07-03 09:50:52 +02:00
Armin Braun	826f38cd70	Enable Parallel Deletes in Azure Repository (#42783 ) (#43886 ) * Parallel deletes via private thread pool	2019-07-03 09:28:39 +02:00
Tanguy Leroux	365dfe88ca	Refresh translog stats after translog trimming in NoOpEngine (#43825 ) This commit changes NoOpEngine so that it refreshes its translog stats once translog is trimmed. Relates #43156	2019-07-03 08:49:14 +02:00
Jake Landis	2dc056b0a0	Read the default pipeline for bulk upsert through an alias (#41963 ) (#42802 ) This commit allows bulk upserts to correctly read the default pipeline for the concrete index that belongs to an alias. Bulk upserts are modeled differently from normal index requests such that the index request is a request inside of the update request. The update request (outer) contains the index or alias name, which is not part of the (inner) index request. This commit adds a secondary check against the update request (outer) if the index request (inner) does not find an alias.	2019-07-02 20:44:33 -05:00
Christoph Büscher	31cf96e7bf	Return reloaded analyzers in _reload_search_analyzer response (#43813 ) Currently the response of the "_reload_search_analyzer" endpoint contains the index names and nodeIds of indices where analyzer reloading was triggered. This change adds the names of the search-time analyzers that were reloaded. Closes #43804	2019-07-02 18:51:15 +02:00
Nhat Nguyen	697cd494bf	Remove sort by primary term when reading soft-deletes (#43845 ) With Lucene rollback (#33473), we should never have more than one primary term for each sequence number. Therefore we don't have to sort by the primary term when reading soft-deletes.	2019-07-02 10:54:32 -04:00
Tanguy Leroux	b977f019b8	Expose translog stats in ReadOnlyEngine (#43752 ) (#43823 ) Backport of #43752 for 7.x.	2019-07-02 13:39:00 +02:00
David Turner	1e8e85797d	Rename and refactor RoutingService (#43827 ) The `RoutingService` has a confusing name, since it doesn't really have anything to do with routing. Its responsibility is submitting reroute commands to the master. This commit renames this class to `BatchedRerouteService`, and extracts the `RerouteService` interface to avoid passing `BiConsumer`s everywhere. It also removes that `BatchedRerouteService extends AbstractLifecycleComponent` since this service has no meaningful lifecycle. Finally, it introduces a small wrapper class to allow for lazy initialization to deal with the dependency loop when constructing a `Node`.	2019-07-02 07:04:18 +01:00
Christoph Büscher	fe3f9f0c6b	Yet another `the the` cleanup (#43815 )	2019-07-01 20:22:19 +02:00
Yogesh Gaikwad	031d5e96ac	HLRC changes for kerberos grant type (#43642 ) (#43822 ) The TODO from the last PR for the kerberos grant type was missed. This commit adds the changes for kerberos grant type in HLRC.	2019-07-02 00:55:02 +10:00
Zachary Tong	1e47ea5f18	Update rare_term version skips, fix SetBackedScalingCuckooFilter javadoc	2019-07-01 10:52:06 -04:00
Zachary Tong	ea1794832f	Add RareTerms aggregation (#35718 ) This adds a `rare_terms` aggregation. It is an aggregation designed to identify the long-tail of keywords, e.g. terms that are "rare" or have low doc counts. This aggregation is designed to be more memory efficient than the alternative, which is setting a terms aggregation to size: LONG_MAX (or worse, ordering a terms agg by count ascending, which has unbounded error). This aggregation works by maintaining a map of terms that have been seen. A counter associated with each value is incremented when we see the term again. If the counter surpasses a predefined threshold, the term is removed from the map and inserted into a cuckoo filter. If a future term is found in the cuckoo filter we assume it was previously removed from the map and is "common". The map keys are the "rare" terms after collection is done.	2019-07-01 10:30:02 -04:00
Nhat Nguyen	598e00a689	Make peer recovery send file info step async (#43792 ) Relates #36195	2019-07-01 08:40:45 -04:00
Julie Tibshirani	ffa5919d7c	Add support for 'flattened object' fields. (#43762 ) This commit merges the `object-fields` feature branch. The new 'flattened object' field type allows an entire JSON object to be indexed into a field, and provides limited search functionality over the field's contents.	2019-07-01 12:08:50 +03:00
Martijn van Groningen	8f3387e7cb	fixed compile errors after cherry-picking	2019-07-01 08:31:31 +02:00
Martijn van Groningen	237f2bd60a	Make ingest executing non blocking (#43361 ) Added an additional method to the Processor interface to allow a processor implementation to make a non blocking call. Also added semaphore in order to avoid search thread pools from rejecting search requests originating from the match processor. This is a temporary workaround.	2019-07-01 08:01:46 +02:00
Ryan Ernst	3a2c698ce0	Rename Action to ActionType (#43778 ) Action is a class that encapsulates meta information about an action that allows it to be called remotely, specifically the action name and response type. With recent refactoring, the action class can now be constructed as a static constant, instead of needing to create a subclass. This makes the old pattern of creating a singleton INSTANCE both misnamed and lacking a common placement. This commit renames Action to ActionType, thus allowing the old INSTANCE naming pattern to be TYPE on the transport action itself. ActionType also conveys that this class is also not the action itself, although this change does not rename any concrete classes as those will be removed organically as they are converted to TYPE constants. relates #34389	2019-06-30 22:00:17 -07:00
Martijn van Groningen	eb8e03bc8b	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-06-30 21:32:51 +02:00
David Turner	fca7a19713	Avoid parallel reroutes in DiskThresholdMonitor (#43381 ) Today the `DiskThresholdMonitor` limits the frequency with which it submits reroute tasks, but it might still submit these tasks faster than the master can process them if, for instance, each reroute takes over 60 seconds. This causes a problem since the reroute task runs with priority `IMMEDIATE` and is always scheduled when there is a node over the high watermark, so this can starve any other pending tasks on the master. This change avoids further updates from the monitor while its last task(s) are still in progress, and it measures the time of each update from the completion time of the reroute task rather than its start time, to allow a larger window for other tasks to run. It also now makes use of the `RoutingService` to submit the reroute task, in order to batch this task with any other pending reroutes. It enhances the `RoutingService` to notify its listeners on completion. Fixes #40174 Relates #42559	2019-06-30 16:54:16 +01:00
Nhat Nguyen	55b3ec8d7b	Make peer recovery clean files step async (#43787 ) Relates #36195	2019-06-29 18:30:51 -04:00
Albert Zaharovits	5e17bc5dcc	Consistent Secure Settings #40416 Introduces a new `ConsistentSecureSettingsValidatorService` service that exposes a single public method, namely `allSecureSettingsConsistent`. The method returns `true` if the local node's secure settings (inside the keystore) are equal to the master's, and `false` otherwise. Technically, the local node has to have exactly the same secure settings - setting names should not be missing or in surplus - for all `SecureSetting` instances that are flagged with the newly introduced `Property.Consistent`. It is worth highlighting that the `allSecureSettingsConsistent` is not a consensus view across the cluster, but rather the local node's perspective in relation to the master.	2019-06-29 23:26:17 +03:00
Ryan Ernst	28ab77a023	Add StreamableResponseAction to aid in deprecation of Streamable (#43770 ) The Action base class currently works for both Streamable and Writeable response types. This commit introduces StreamableResponseAction, for which only the legacy Action implementations which provide newResponse() will extend. This eliminates the need for overriding newResponse() with an UnsupportedOperationException. relates #34389	2019-06-28 21:40:00 -07:00
Tanguy Leroux	f02cbe9e40	Trim translog for closed indices (#43156 ) Today when an index is closed all its shards are forced flushed but the translog files are left around. As explained in #42445 we'd like to trim the translog for closed indices in order to consume less disk space. This commit reuses the existing AsyncTrimTranslogTask task and reenables it for closed indices. At the time the task is executed, we should have the guarantee that nothing holds the translog files that are going to be removed. It also leaves a short period of time (10 min) during which translog files of a recently closed index are still present on disk. This could also help in some cases where the closed index is reopened shortly after being closed (in order to update an index setting for example). Relates to #42445	2019-06-28 16:58:39 +02:00
Jim Ferenczi	7ca69db83f	Refactor IndexSearcherWrapper to disallow the wrapping of IndexSearcher (#43645 ) This change removes the ability to wrap an IndexSearcher in plugins. The IndexSearcherWrapper is replaced by an IndexReaderWrapper and allows to wrap the DirectoryReader only. This simplifies the creation of the context IndexSearcher that is used on a per request basis. This change also moves the optimization that was implemented in the security index searcher wrapper to the ContextIndexSearcher that now checks the live docs to determine how the search should be executed. If the underlying live docs is a sparse bit set the searcher will compute the intersection between the query and the live docs instead of checking the live docs on every document that matches the query.	2019-06-28 16:28:02 +02:00
weizijun	377c4cfdc0	Fix threshold spelling errors (#43326 ) Substitutes treshold by threshold	2019-06-28 15:47:57 +02:00
Alan Woodward	81dbcfb268	Wildcard intervals (#43691 ) This commit adds a wildcard intervals source, similar to the prefix. It also changes the term parameter in prefix to read prefix, to bring it in to line with the pattern parameter in wildcard. Closes #43198	2019-06-28 14:04:03 +01:00
Christoph Büscher	2cc7f5a744	Allow reloading of search time analyzers (#43313 ) Currently changing resources (like dictionaries, synonym files etc...) of search time analyzers is only possible by closing an index, changing the underlying resource (e.g. synonym files) and then re-opening the index for the change to take effect. This PR adds a new API endpoint that allows triggering reloading of certain analysis resources (currently token filters) that will then pick up changes in underlying file resources. To achieve this we introduce a new type of custom analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows swapping out analysis components. Custom analyzers that contain filters that are marked as "updateable" will automatically choose this implementation. This PR also adds this capability to `synonym` token filters for use in search time analyzers. Relates to #29051	2019-06-28 09:55:40 +02:00
Alan Woodward	51b230f6ab	Fix PreConfiguredTokenFilters getSynonymFilter() implementations (#38839 ) (#43678 ) When we added support for TokenFilterFactories to specialise how they were used when parsing synonym files, PreConfiguredTokenFilters were set up to either apply themselves, or be ignored. This behaviour is a leftover from an earlier iteration, and also has an incorrect default. This commit makes preconfigured token filters usable in synonym file parsing by default, and brings those filters that should not be used into line with index-specific filter factories; in indexes created before version 7 we emit a deprecation warning, and we throw an error in indexes created after. Fixes #38793	2019-06-28 08:19:00 +01:00
Ryan Ernst	5b4089e57e	Remove nodeId from BaseNodeRequest (#43658 ) TransportNodesAction provides a mechanism to easily broadcast a request to many nodes, and collect the responses into a high level response. Each node has its own request type, with a base class of BaseNodeRequest. This base request requires passing the nodeId to which the request will be sent. However, that nodeId is not used anywhere. It is private to the base class, yet serialized to each node, where the node could just as easily find the nodeId of the node it is on locally. This commit removes passing the nodeId through to the node request creation, and guards its serialization so that we can remove the base request class altogether in the future.	2019-06-27 18:45:14 -07:00
Igor Motov	3607876a71	Geo: Makes coordinate validator in libs/geo plugable (#43657 ) Moves coordinate validation from Geometry constructors into parser. Relates #43644	2019-06-27 19:53:41 -04:00
Nhat Nguyen	ce8771feb7	Do not use MockInternalEngine in GatewayIndexStateIT (#43716 ) GatewayIndexStateIT#testRecoverBrokenIndexMetadata relies on flushing on shutdown. This behaviour, however, can be randomly disabled in MockInternalEngine. Closes #43034	2019-06-27 18:28:04 -04:00
Yannick Welsch	6744344ef2	Handle situation where only voting-only nodes are bootstrapped (#43628 ) Adds support for the situation where only voting-only nodes are bootstrapped. In that case, they will still try to become elected and bring full master nodes into the cluster.	2019-06-27 18:10:15 +02:00
Jim Ferenczi	df4b30fd8b	Fix propagation of enablePositionIncrements in QueryStringQueryBuilder (#43578 ) This change fixes the propagation of the enablePositionIncrements option to the underlying QueryBuilder. Closes #43574	2019-06-27 17:01:01 +02:00
Jim Ferenczi	329d05f61e	Fix UOE on search requests that match a sparse role query (#43668 ) Search requests executed through the SecurityIndexSearcherWrapper throw an UnsupportedOperationException if they match a sparse role query. When low level cancellation is activated (which is the default since #42857), the context index searcher creates a weight that doesn't handle #scorer. This change fixes this bug and adds a test to ensure that we check this case.	2019-06-27 16:56:56 +02:00
Christoph Büscher	36360358b2	Move query builder caching check to dedicated tests (#43238 ) Currently `AbstractQueryTestCase#testToQuery` checks the search context cachable flag. This is a bit fragile due to the high randomization of query builders performed by this general test. Also we might only rarely check the "interesting" cases because they rarely get generated when fully randomizing the query builder. This change moves the general checks out of #testToQuery and instead adds dedicated cache tests for those query builders that exhibit something other than the default behaviour. Closes #43200	2019-06-27 14:56:29 +02:00
Alan Woodward	8ff5519b11	Use preconfigured filters correctly in Analyze API (#43568 ) When a named token filter or char filter is passed as part of an Analyze API request with no index, we currently try and build the relevant filter using no index settings. However, this can miss cases where there is a pre-configured filter defined in the analysis registry. One example here is the elision filter, which has a pre-configured version built with the french elision set; when used as part of normal analysis, this preconfigured set is used, but when used as part of the Analyze API we end up with NPEs because it tries to instantiate the filter with no index settings. This commit changes the Analyze API to check for pre-configured filters in the case that the request has no index defined, and is using a name rather than a custom definition for a filter. It also changes the pre-configured `word_delimiter_graph` filter and `edge_ngram` tokenizer to make their settings consistent with the defaults used when creating them with no settings Closes #43002 Closes #43621 Closes #43582	2019-06-27 09:07:01 +01:00
Martijn van Groningen	683e116601	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-06-27 08:35:37 +02:00
Yannick Welsch	05b945d010	Avoid AssertionError when closing engine (#43638 ) Lucene throwing an AlreadyClosedException when closing the engine is fine, and should not trigger an AssertionError. Closes #43626	2019-06-26 17:40:52 +02:00
Alan Woodward	76d0edd1a4	Add prefix intervals source (#43635 ) This commit adds a prefix intervals source, allowing you to search for intervals that contain terms starting with a given prefix. The source can make use of the index_prefixes mapping option. Relates to #43198	2019-06-26 16:22:12 +01:00
Tim Brooks	2fa6bc5e12	Properly serialize remote query in ReindexRequest (#43596 ) This commit modifies the RemoteInfo to clarify that a search query must always be serialized as JSON. Additionally, it adds an assertion to ensure that this is the case. This fixes #43406. Additionally, this PR implements AbstractXContentTestCase for the reindex request. This is related to #43456.	2019-06-26 10:50:14 -04:00
David Kyle	531efb3fe5	Remove unreleased 7.1.2 version constant (#43629 ) This was breaking BWC tests as the presence of the constant implied 7.1.2 was released	2019-06-26 13:53:05 +01:00
David Kyle	58d0d5c51b	Mute DiskDisruptionIT#testGlobalCheckpointIsSafe Relates to #43626	2019-06-26 10:13:41 +01:00
Yannick Welsch	2049f715b3	Add voting-only master node (#43410 ) A voting-only master-eligible node is a node that can participate in master elections but will not act as a master in the cluster. In particular, a voting-only node can help elect another master-eligible node as master, and can serve as a tiebreaker in elections. High availability (HA) clusters require at least three master-eligible nodes, so that if one of the three nodes is down, then the remaining two can still elect a master amongst themselves. This only requires one of the two remaining nodes to have the capability to act as master, but both need to have voting powers. This means that one of the three master-eligible nodes can be made voting-only. If this voting-only node is a dedicated master, a less powerful machine or a smaller heap-size can be chosen for this node. Alternatively, a voting-only non-dedicated master node can play the role of the third master-eligible node, which allows running an HA cluster with only two dedicated master nodes. Closes #14340 Co-authored-by: David Turner <david.turner@elastic.co>	2019-06-26 08:07:56 +02:00
David Turner	11f41c4e7d	Omit non-masters in ClusterFormationFailureHelper (#41344 ) Today the `ClusterFormationFailureHelper` says `... discovery will continue using ... from last-known cluster state` and lists all the nodes in the last-known cluster state. In fact we ignore the master-ineligible nodes in the last-known cluster state during discovery. This commit fixes this by listing only the master-eligible nodes from the cluster state in this message.	2019-06-26 08:07:56 +02:00
Nhat Nguyen	05e1f55a88	Ensure relocation target still tracked when start handoff (#42201 ) If the master removes the relocating shard, but recovery isn't aware of it, then we can enter an invalid state where ReplicationTracker does not include the local shard.	2019-06-25 23:19:59 -04:00
Jake Landis	9a3c86d422	include 7.2.1 as a version (#43584 )	2019-06-25 16:02:48 -05:00
David Turner	e738f0e6d2	Allow extra time for a warning to be logged (#43597 ) Today we assert that a warning is logged after no more than `discovery.cluster_formation_warning_timeout`, but the deterministic scheduler adds a small amount of extra randomness to the timing of future events, causing the following build to fail: ./gradlew :server:test --tests "org.elasticsearch.cluster.coordination.CoordinatorTests.testLogsWarningPeriodicallyIfClusterNotFormed" -Dtests.seed=DF35C28D4FA9EE2D This commit adds an allowance for this extra time.	2019-06-25 20:04:56 +01:00
Tanguy Leroux	0dc1c12f13	Fix indices shown in _cat/indices (#43286 ) After two recent changes (#38824 and #33888), the _cat/indices API no longer reports information for active recovering indices and non-replicated closed indices. It also misreports replicated closed indices that are potentially not authorized for the user. This commit changes how the cat action works by first using the Get Settings API in order to resolve authorized indices. It then uses the Cluster State, Cluster Health and Indices Stats APIs to retrieve information about the indices. Closes #39933	2019-06-25 20:02:34 +02:00
James Baiera	1b902aa746	Make enrich processor use search action through a client (#43311 ) Add client to processor parameters in the ingest service. Remove the search provider function from the processor parameters. ExactMatchProcessor and Factory converted to use client. Remove test cases that are no longer applicable from processor.	2019-06-25 13:09:08 -04:00
Zachary Tong	63fef5a31e	Add scripting support to AggregatorTestCase (#43494 ) This refactors AggregatorTestCase to allow testing mock scripts. The main change is to QueryShardContext. This was previously mocked, but to get the ScriptService you have to invoke a final method which can't be mocked. Instead, we just create a mostly-empty QueryShardContext and populate the fields that are needed for testing. It also introduces a few new helper methods that can be overridden to change the default behavior a bit. Most tests should be able to override getMockScriptService() to supply a ScriptService to the context, which is later used by the aggs. More complicated tests can override queryShardContextMock() as before. Adds a test to MaxAggregatorTests to test out the new functionality.	2019-06-25 11:52:12 -04:00
Przemysław Witek	c702cd7415	[7.x] Implement XContentParser.genericMap and XContentParser.genericMapOrdered methods (#42059 ) (#43575 )	2019-06-25 16:04:54 +02:00
Armin Braun	62a28921e8	Cleanup IndicesService#CacheCleaner Scheduling (#42060 ) (#43528 ) * Follow up to #42016	2019-06-25 13:04:04 +02:00
Yannick Welsch	3d5e4577aa	Fix testPostOperationGlobalCheckpointSync The conditions in this test do not hold true anymore after #43205. Relates to #43205	2019-06-25 12:49:29 +02:00
Martijn van Groningen	f587519f17	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-06-25 10:09:51 +02:00
Nhat Nguyen	01205432fe	Unmute testOpenCloseApiWildcards Relates #39578	2019-06-24 17:12:57 -04:00
Jim Ferenczi	ae31ca5f7e	Fix score mode of the MinimumScoreCollector (#43527 ) This change fixes the score mode of the minimum score collector to be set based on the score mode of the child collector (top docs). Closes #43497	2019-06-24 21:32:33 +02:00
Yannick Welsch	d45f12799c	Sync global checkpoint on pending in-sync shards (#43526 ) At the end of a peer recovery the primary wants to mark the replica as in-sync. For that the persisted local checkpoint of the replica needs to have caught up with the global checkpoint on the primary. If translog durability is set to ASYNC, this means that information about the persisted local checkpoint can lag on the primary and might need to be explicitly fetched through a global checkpoint sync action. Unfortunately, that action will only be triggered after 30 seconds, and, even worse, will only run based on what the in-sync shard copies say (see IndexShard.maybeSyncGlobalCheckpoint). As the replica has not been marked as in-sync yet, it is not taken into consideration, and the primary might have its global checkpoint equal to the max seq no, so it thinks nothing needs to be done. Closes #43486	2019-06-24 18:35:57 +02:00
Zachary Tong	eaa9ee1f16	Set document on script when using Bytes.WithScript (#43390 ) Long and Double ValuesSource set the current document on the script before executing, but Bytes was missing this method call. That meant it was possible to generate an OutOfBoundsException when using a "value" script (field + script) on keyword or other bytes fields. This adds in the method call, and a few yaml tests to verify correct behavior.	2019-06-24 12:20:28 -04:00
Andrey Ershov	98d7d231bb	Fix testNoMasterActions (#43471 ) This commit performs the proper restore of network disruption. Previously disruptionScheme.stopDisrupting() was called that does not ensure that connectivity between cluster nodes is restored. The test was checking that the cluster has green status, but it was not checking that connectivity between nodes is restored. Here we switch to internalCluster().clearDisruptionScheme(true) which performs both checks before returning. Similar to #42798 Closes #42051 (cherry picked from commit cd1ed662f847a0055ede7dfbd325e214ec4d1490)	2019-06-24 18:53:58 +03:00
Martijn van Groningen	101cf384ba	Replace Streamable w/ Writable in AcknowledgedResponse and subclasses (backport 7.x) (#43525 ) This commit replaces usages of Streamable with Writeable for the AcknowledgedResponse and its subclasses, plus associated actions. Note that where possible response fields were made final and default constructors were removed. This is a large PR, but the change is mostly mechanical. Relates to #34389 Backport of #43414	2019-06-24 13:47:37 +02:00
Tanguy Leroux	41ebaf57b5	Do not hang on unsupported HTTP methods (#43362 ) Unsupported HTTP methods are detected during requests dispatching which generates an appropriate error response. Sadly, this error is never sent back to the client because the method of the original request is checked again in DefaultRestChannel which throws again an IllegalArgumentException that is never handled. This pull request changes the DefaultRestChannel so that the latest exception is swallowed, allowing the error message to be sent back to the client. It also eagerly adds the objects to close to the toClose list so that resources are more likely to be released if something goes wrong during the response creation and sending.	2019-06-24 13:16:29 +02:00
Yannick Welsch	19520d4640	Add additional logging for #43034 It's unclear why sometimes the shard is not flushed on closing	2019-06-24 12:30:22 +02:00
Yannick Welsch	127a608147	Assert that NOOPs must succeed (#43483 ) We currently assert that adding deletion tombstones to Lucene must always succeed if it's not a tragic exception, and the same should also hold true for NOOP tombstones. We rely on this assumption, as without this, we have the risk of creating gaps in the history, which will break operation-based recoveries and CCR.	2019-06-24 11:38:34 +02:00
Nhat Nguyen	04bc754d8d	Cleanup legacy logic in CombinedDeletionPolicy (#43484 ) This change removes the support for pre-v6 index commits which do not have sequence numbers.	2019-06-23 11:30:04 -04:00
Martijn van Groningen	df9f06213d	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-06-21 19:58:04 +02:00
Luca Cavanna	186c3122be	[TEST] Embed msearch samples in MultiSearchRequestTests (#43482 ) Depending on git configuration, line feed on checked out files may be platform dependent, which causes problems for some msearch tests as the line separator must always be `\n`. With this change we move two files to the test code so that we control exactly what line separator is used, given that the corresponding tests fail on Windows. Closes #43464	2019-06-21 19:05:53 +02:00
David Turner	e4fd0ce730	Reduce TestLogging usage in DisruptionIT tests (#43411 ) Removes `@TestLogging` annotations in `*DisruptionIT` tests, so that the only tests with annotations are those with open issues. Also adds links to the open issues in the remaining cases. Relates #43403	2019-06-21 15:01:03 +01:00
Christoph Büscher	4fe650c9e5	Fix DefaultShardOperationFailedException subclass xcontent serialization (#43435 ) The current toXContent implementation can fail when the superclasses toXContent is called (see #43423). This change makes sure that DefaultShardOperationFailedException#toXContent is final and implementations need to add special fields in #innerToXContent. All implementations should write to self-contained xContent objects. Also adding a test for xContent deserialization to CloseIndexResponseTests. Closes #43423	2019-06-21 14:31:19 +02:00
Yu	c88f2f23a5	Make Recovery API support `detailed` params (#29076 ) Properly forwards the `detailed` parameter to show the recovery stats details. Closes #28910	2019-06-21 09:05:33 +02:00
Andrei Stefan	90e151edeb	Mute MultiSearchRequestTests.java tests (#43467 )	2019-06-21 08:38:21 +03:00
Jim Ferenczi	cc6c114cb8	Fix round up of date range without rounding (#43303 ) Today when searching for an exclusive range the java date math parser rounds up the value with the granularity of the operation. So when searching for values that are greater than "now-2M" the parser rounds up the operation to "now-1M". This behavior was introduced when we migrated to java date but it looks like a bug since the joda math parser rounds up values but only when a rounding is used. So "now/M" is rounded to "now-1ms" (minus 1ms to get the largest inclusive value) in the joda parser if the result should be exclusive but no rounding is applied if the input is a simple operation like "now-1M". This change restores the joda behavior in order to have a consistent parsing in all versions. Closes #43277	2019-06-20 23:59:08 +02:00
Tim Brooks	827f8fcbd5	Move reindex request parsing into request (#43450 ) Currently the fromXContent logic for reindex requests is implemented in the rest action. This is inconsistent with other requests where the logic is implemented in the request. Additionally, it requires access to the rest action in order to parse the request. This commit moves the logic and tests into the ReindexRequest.	2019-06-20 17:49:11 -04:00
sandmannn	cf610b5e81	Added parsing of erroneous field value (#42321 )	2019-06-20 15:24:04 -04:00
Jake Landis	2f2d0a198f	add version 6.8.2	2019-06-20 12:07:55 -05:00
Zachary Tong	a8a81200d0	Better support for unmapped fields in AggregatorTestCase (#43405 ) AggregatorTestCase will NPE if only a single, null MappedFieldType is provided (which is required to simulate an unmapped field). While it's possible to test unmapped fields by supplying other, non-related field types... that's clunky and unnecessary. AggregatorTestCase just needs to filter out null field types when setting up.	2019-06-20 11:31:49 -04:00
Yannick Welsch	8c856d6d91	Adapt local checkpoint assertion With async durability, it does not hold true anymore after #43205. This is fine.	2019-06-20 17:29:53 +02:00
Armin Braun	99a44a04f7	Fix Infinite Loops in ExceptionsHelper#unwrap (#42716 ) (#43421 ) * Fix Infinite Loops in ExceptionsHelper#unwrap * Keep track of all seen exceptions and break out on loops * Closes #42340	2019-06-20 16:38:28 +02:00
Armin Braun	39fef8379b	Fix FsRepositoryTests.testSnapshotAndRestore (#42925 ) (#43420 ) * The commit generation can be 3 or 2 here -> fixed by checking the actual generation on the second commit instead of hard coding 2 * Closes #42905	2019-06-20 16:36:40 +02:00
synical	b4c4018d00	Remove Confusing Comment (#43400 )	2019-06-20 15:02:37 +01:00
David Turner	c8eb09f158	Fail connection attempts earlier in tests (#43320 ) Today the `DisruptibleMockTransport` always allows a connection to a node to be established, and then fails requests sent to that node such as the subsequent handshake. Since #42342, we log handshake failures on an open connection as a warning, and this makes the test logs rather noisy. This change fails the connection attempt first, avoiding these unrealistic warnings.	2019-06-20 14:45:24 +01:00
Yannick Welsch	e04a2258fc	Fix testGlobalCheckpointSync The test needed adaption after #43205, as the ReplicationTracker now distinguishes between the knowledge of the persisted global checkpoint and the computed global checkpoint on the primary Follow-up to #43205	2019-06-20 14:00:00 +02:00
Yannick Welsch	a76c034866	Reduce shard started failure logging (#43330 ) If the master is stepping down or shutting down, the error-level logging can cause quite a bit of noise.	2019-06-20 13:23:05 +02:00
Yannick Welsch	7f8e1454ab	Advance checkpoints only after persisting ops (#43205 ) Local and global checkpoints currently do not correctly reflect what's persisted to disk. The issue is that the local checkpoint is adapted as soon as an operation is processed (but not fsynced yet). This leaves room for the history below the global checkpoint to still change in case of a crash. As we rely on global checkpoints for CCR as well as operation-based recoveries, this has the risk of shard copies / follower clusters going out of sync. This commit required changing some core classes in the system: - The LocalCheckpointTracker keeps track now not only of the information whether an operation has been processed, but also whether that operation has been persisted to disk. - TranslogWriter now keeps track of the sequence numbers that have not been fsynced yet. Once they are fsynced, TranslogWriter notifies LocalCheckpointTracker of this. - ReplicationTracker now keeps track of the persisted local and persisted global checkpoints of all shard copies when in primary mode. The computed global checkpoint (which represents the minimum of all persisted local checkpoints of all in-sync shard copies), which was previously stored in the checkpoint entry for the local shard copy, has been moved to an extra field. - The periodic global checkpoint sync now also takes async durability into account, where the local checkpoints on shards only advance when the translog is asynchronously fsynced. This means that the previous condition to detect inactivity (max sequence number is equal to global checkpoint) is not sufficient anymore. - The new index closing API does not work when combined with async durability. The shard verification step now requires an additional pre-flight step to fsync the translog, so that the main verify shard step has the most up-to-date global checkpoint at disposition.	2019-06-20 11:12:38 +02:00
Tanguy Leroux	24cfca53fa	Reconnect remote cluster when seeds are changed (#43379 ) The RemoteClusterService should close the current RemoteClusterConnection and should build it again if the seeds are changed, similarly to what is done when the ping interval or the compression settings are changed. Closes #37799	2019-06-20 10:30:02 +02:00
Luca Cavanna	94a4bc9933	SearchPhaseContext to not extend ActionListener (#43269 ) The fact that SearchPhaseContext extends ActionListener makes it hard to reason about when the original listener is notified and to trace those calls. Also, the corresponding onFailure and onResponse were only needed in two places, one each, where they can be replaced by a more intuitive call, like sendSearchResponse for onResponse.	2019-06-20 10:21:24 +02:00
Martijn van Groningen	9de4e878f7	Merge remote-tracking branch 'es/7.x' into enrich-7.x	2019-06-20 09:44:31 +02:00
Jim Ferenczi	c33d62adbc	Reduce the number of docvalues iterator created in the global ordinals fielddata (#43091 ) Today the fielddata for global ordinals re-creates docvalues readers of each segment when building the iterator of a single segment. This is required because the lookup of global ordinals needs to access the docvalues's TermsEnum of each segment to retrieve the original terms. This also means that we need to create NxN (where N is the number of segments in the index) docvalues iterators each time we want to collect global ordinal values. This wasn't an issue in previous versions since docvalues readers are stateless before 6.0 so they are reused on each segment but now that docvalues are iterators we need to create a new instance each time we want to access the values. In order to avoid creating too many iterators this change splits the global ordinals fielddata in two classes, one that is used to cache a single instance per directory reader and one that is created from the cached instance that can be used by a single consumer. The latter creates the TermsEnum of each segment once and reuses them to create the segment's iterator. This prevents the creation of all TermsEnums each time we want to access the value of a single segment, hence reducing the number of docvalues iterators to create to Nx2 (one iterator and one lookup per segment).	2019-06-20 08:44:07 +02:00
Jason Tedor	1f1a035def	Remove stale test logging annotations (#43403 ) This commit removes some very old test logging annotations that appeared to be added to investigate test failures that are long since closed. If these are needed, they can be added back on a case-by-case basis with a comment associating them to a test failure.	2019-06-19 22:58:22 -04:00
Lee Hinman	6b084e55c5	[7.x] Prevent NullPointerException in TransportRolloverAction (#43353 ) (#43397 ) It's possible for the passed in `IndexMetaData` to be null (for instance, cluster state passed in does not have the index in its metadata) which in turn can cause a `NullPointerException` when evaluating the conditions for an index. This commit adds null protection and unit tests for this case. Resolves #43296	2019-06-19 16:07:28 -06:00
Jim Ferenczi	b957aa46ce	Allocate memory lazily in BestBucketsDeferringCollector (#43339 ) While investigating memory consumption of deeply nested aggregations for #43091 the memory used to keep track of the doc ids and buckets in the BestBucketsDeferringCollector showed up as one of the main contributors. In my tests half of the memory held in the BestBucketsDeferringCollector is associated with segments that don't have matching docs in the selected buckets. This is expected on fields that have a big cardinality since each bucket can appear in very few segments. By allocating the builders lazily this change reduces the memory consumption by a factor of 2 (from 1GB to 512MB), hence reducing the impact on gcs for these volatile allocations. This commit also switches the PackedLongValues.Builder with a RoaringDocIdSet in order to handle very sparse buckets more efficiently. I ran all my tests on the `geoname` rally track with the following query: ```` { "size": 0, "aggs": { "country_population": { "terms": { "size": 100, "field": "country_code.raw" }, "aggs": { "admin1_code": { "terms": { "size": 100, "field": "admin1_code.raw" }, "aggs": { "admin2_code": { "terms": { "size": 100, "field": "admin2_code.raw" }, "aggs": { "sum_population": { "sum": { "field": "population" } } } } } } } } } } ````	2019-06-19 22:10:59 +02:00
Christos Soulios	d1637ca476	Backport: Refactor aggregation base classes to remove doEquals() and doHashCode() (#43363 ) This PR is a backport of #43214 from v8.0.0 A number of the aggregation base classes have an abstract doEquals() and doHashCode() (e.g. InternalAggregation.java, AbstractPipelineAggregationBuilder.java). Theoretically this is so the sub-classes can add to the equals/hashCode and don't need to worry about calling super.equals(). In practice, it's mostly just confusing/inconsistent. And if there are more than two levels, we end up with situations like InternalMappedSignificantTerms which has to call super.doEquals() which defeats the point of having these overridable methods. This PR removes the do versions and just uses equals/hashCode, calling super when necessary.	2019-06-19 22:31:06 +03:00
Armin Braun	be42b2c70c	Fix NetworkUtilsTests (#43295 ) (#43378 ) * Follow up to #42109: * Adjust test to only check that interface lookup by name works not actually lookup IPs which is brittle since virtual interfaces can be destroyed/created by Docker while the tests are running Co-authored-by: Jason Tedor <jason@tedor.me>	2019-06-19 21:23:09 +02:00
Lee Hinman	d81ce9a647	Return 0 for negative "free" and "total" memory reported by the OS (#42725 ) * Return 0 for negative "free" and "total" memory reported by the OS We've had a situation where the MX bean reported negative values for the free memory of the OS, in those rare cases we want to return a value of 0 rather than blowing up later down the pipeline. In the event that there is a serialization or creation error with regard to memory use, this adds asserts so the failure will occur as soon as possible and give us a better location for investigation. Resolves #42157 * Fix test passing in invalid memory value * Fix another test passing in invalid memory value * Also change mem check in MachineLearning.machineMemoryFromStats * Add background documentation for why we prevent negative return values * Clarify comment a bit more	2019-06-19 10:35:48 -06:00
Nhat Nguyen	b5c8b32cab	Do not use soft-deletes to resolve indexing strategy (#43336 ) This PR reverts #35230. Previously, we relied on soft-deletes to fill the mismatch between the version map and the Lucene index. This is no longer needed after #43202 where we rebuild the version map when opening an engine. Moreover, PrunePostingsMergePolicy can prune _id of soft-deleted documents out of order; thus the lookup result including soft-deletes sometimes does not return the latest version (although it's okay as we only use a valid result in an engine). With this change, we use only live documents in Lucene to resolve the indexing strategy. This is perfectly safe since we keep all deleted documents after the local checkpoint in the version map. Closes #42979	2019-06-19 10:40:24 -04:00
Martijn van Groningen	a4c45b5d70	Replace Streamable w/ Writeable in SingleShardRequest and subclasses (#43222 ) (#43364 ) Backport of: https://github.com/elastic/elasticsearch/pull/43222 This commit replaces usages of Streamable with Writeable for the SingleShardRequest / TransportSingleShardAction classes and subclasses of these classes. Note that where possible response fields were made final and default constructors were removed. Relates to #34389	2019-06-19 16:15:09 +02:00
Paul Sanwald	8578aba654	[backport] Adds a minimum interval to `auto_date_histogram`. (#42814 ) (#43285 ) Backports minimum interval to date histogram	2019-06-19 07:06:45 -04:00
Igor Motov	9f7d1ff2de	Geo: Add coerce support to libs/geo WKT parser (#43273 ) Adds support for coercing not closed polygons and ignoring Z value to libs/geo WKT parser. Closes #43173	2019-06-18 14:41:01 -04:00
Jim Ferenczi	de1a685cce	Fix sporadic failures in QueryStringQueryTests#testToQueryFuzzyQueryAutoFuziness (#43322 ) This commit ensures that the test does not use reserved keywords (OR, AND, NOT) when generating the random query strings. Closes #43318	2019-06-18 20:18:09 +02:00
David Turner	90a8589294	Local node is discovered when cluster fails (#43316 ) Today the `ClusterFormationFailureHelper` does not include the local node in the list of nodes it claims to have discovered. This means that it sometimes reports that it has not discovered a quorum when in fact it has. This commit adds the local node to the set of discovered nodes.	2019-06-18 12:23:23 +01:00
David Turner	2e064e0d13	Allow election of nodes outside voting config (#43243 ) Today we suppress election attempts on master-eligible nodes that are not in the voting configuration. In fact this restriction is not necessary: any master-eligible node can safely become master as long as it has a fresh enough cluster state and can gather a quorum of votes. Moreover, this restriction is sometimes undesirable: there may be a reason why we do not want any of the nodes in the voting configuration to become master. The reason for this restriction is as follows. If you want to shut the master down then you might first exclude it from the voting configuration. When this exclusion succeeds you might reasonably expect that a new master has been elected, since the voting config exclusion is almost always a step towards shutting the node down. If we allow nodes outside the voting configuration to be the master then the excluded node will continue to be master, which is confusing. This commit adjusts the logic to allow master-eligible nodes to attempt an election even if they are not in the voting configuration. If such a master is successfully elected then it adds itself to the voting configuration. This commit also adjusts the logic that causes master nodes to abdicate when they are excluded from the voting configuration, to avoid the confusion described above. Relates #37712, #37802.	2019-06-18 12:10:48 +01:00
Nhat Nguyen	0c5086d2f3	Rebuild version map when opening internal engine (#43202 ) With this change, we will rebuild the live version map and local checkpoint using documents (including soft-deleted) from the safe commit when opening an internal engine. This allows us to safely prune away _id of all soft-deleted documents as the version map is always in-sync with the Lucene index. Relates #40741 Supersedes #42979	2019-06-17 18:08:09 -04:00
David Turner	2d9b3a69e8	Relocation targets are assigned shards too (#43276 ) Adds relocation targets to the output of `IndexShardRoutingTable#assignedShards`.	2019-06-17 17:14:09 +01:00
Henning Andersen	ba15d08e14	Allow cluster access during node restart (#42946 ) (#43272 ) This commit modifies InternalTestCluster to allow using client() and other operations inside a RestartCallback (onStoppedNode typically). Restarting nodes are now removed from the map and thus all methods now return the state as if the restarting node does not exist. This avoids various exceptions stemming from accessing the stopped node(s).	2019-06-17 15:04:17 +02:00
David Turner	4b58827beb	Make DiscoveryNodeRole into a value object (#43257 ) Adds `equals()` and `hashCode()` methods to `DiscoveryNodeRole` to compare these objects' values for equality, and adds a field to allow us to distinguish unknown roles from known ones with the same name and abbreviation, for clearer test failures. Relates #43175	2019-06-17 10:23:29 +01:00
Alpar Torok	a8bf18184a	Refactor Version class to make version bumps easier (#42668 ) (#43215 ) With this change we only have to add one line to add a new version. The intent is to make it less error prone and easier to write a script to automate the process.	2019-06-17 10:49:20 +03:00
Nhat Nguyen	4b643c50fa	Account soft deletes in committed segments (#43126 ) This change fixes the delete count issue in segment stats where we don't account soft-deleted documents from committed segments. Relates #43103	2019-06-16 22:56:24 -04:00

4379 Commits