OpenSearch

Commit Graph

Author	SHA1	Message	Date
Yannick Welsch	22ba759e1f	Move metadata storage to Lucene (#50928 ) * Move metadata storage to Lucene (#50907) Today we split the on-disk cluster metadata across many files: one file for the metadata of each index, plus one file for the global metadata and another for the manifest. Most metadata updates only touch a few of these files, but some must write them all. If a node holds a large number of indices then it's possible its disks are not fast enough to process a complete metadata update before timing out. In severe cases affecting master-eligible nodes this can prevent an election from succeeding. This commit uses Lucene as metadata storage for the cluster state, and is a squashed version of the following PRs that were targeting a feature branch: * Introduce Lucene-based metadata persistence (#48733) This commit introduces `LucenePersistedState` which master-eligible nodes can use to persist the cluster metadata in a Lucene index rather than in many separate files. Relates #48701 * Remove per-index metadata without assigned shards (#49234) Today on master-eligible nodes we maintain per-index metadata files for every index. However, we also keep this metadata in the `LucenePersistedState`, and only use the per-index metadata files for importing dangling indices. There is no point in importing a dangling index without any shard data, so we do not need to maintain these extra files any more. This commit removes per-index metadata files from nodes which do not hold any shards of those indices. Relates #48701 * Use Lucene exclusively for metadata storage (#50144) This moves metadata persistence to Lucene for all node types. It also re-enables BWC and adds an interoperability layer for upgrades from prior versions. This commit disables a number of tests related to dangling indices and command-line tools. Those will be addressed in follow-ups. 
Relates #48701 * Add command-line tool support for Lucene-based metadata storage (#50179) Adds command-line tool support (unsafe-bootstrap, detach-cluster, repurpose, & shard commands) for the Lucene-based metadata storage. Relates #48701 * Use single directory for metadata (#50639) Earlier PRs for #48701 introduced a separate directory for the cluster state. This is not needed though, and introduces an additional unnecessary cognitive burden on users. Co-Authored-By: David Turner <david.turner@elastic.co> * Add async dangling indices support (#50642) Adds support for writing out dangling indices in an asynchronous way. Also provides an option to avoid writing out dangling indices at all. Relates #48701 * Fold node metadata into new node storage (#50741) Moves node metadata to use the new storage mechanism (see #48701) as the authoritative source. * Write CS asynchronously on data-only nodes (#50782) Writes cluster states out asynchronously on data-only nodes. The main reason for writing out the cluster state at all is so that data-only nodes can snap into a cluster, can do a bit of bootstrap validation, and so that the shard recovery tools work. Cluster states that are written asynchronously have their voting configuration adapted to a non-existent configuration so that these nodes cannot mistakenly become master even if their node role is changed back and forth. Relates #48701 * Remove persistent cluster settings tool (#50694) Adds the elasticsearch-node remove-settings tool to remove persistent settings from the on-disk cluster state in cases where it contains incompatible settings that prevent the cluster from forming. Relates #48701 * Make cluster state writer resilient to disk issues (#50805) Adds handling to make the cluster state writer resilient to disk issues. 
Relates to #48701 * Omit writing global metadata if no change (#50901) Uses the same optimization for the new cluster state storage layer as the old one, writing global metadata only when changed. Avoids writing out the global metadata if none of the persistent fields changed. Speeds up server:integTest by ~10%. Relates #48701 * DanglingIndicesIT should ensure node removed first (#50896) These tests occasionally failed because the deletion was submitted before the restarting node was removed from the cluster, causing the deletion not to be fully acked. This commit fixes this by checking the restarting node has been removed from the cluster. Co-authored-by: David Turner <david.turner@elastic.co> * fix tests Co-authored-by: David Turner <david.turner@elastic.co>	2020-01-14 09:35:43 +01:00
Tim Brooks	50cb770315	Use default profile for remote connections (#50947 ) Currently, the connection manager is configured with a default profile for both the sniff and proxy connection strategies. This profile correctly reflects the expected number of connections (6 for sniff, 18 for proxy). This commit removes the proxy strategy usages of the per-connection-attempt profile configuration. Additionally, it refactors other unnecessary code around the connection manager. The connection manager can now always be built inside the remote connection.	2020-01-13 21:46:23 -06:00
Tim Brooks	27c2eb744e	Fix open/close race in ConnectionManagerTests (#50621 ) Currently we reuse the same test connection for all connection attempts in the testConcurrentConnectsAndDisconnects test. This means that if the connection fails due to a pre-existing connection, the connection will be closed, impacting the state of all connection attempts. This commit fixes the test by returning a unique connection for each attempt. Fixes #49903.	2020-01-13 18:43:18 -07:00
Nhat Nguyen	fb32a55dd5	Deprecate synced flush (#50835 ) A normal flush has the same effect as a synced flush on Elasticsearch 7.6 or later. It's deprecated in 7.6 and will be removed in 8.0. Relates #50776	2020-01-13 19:54:38 -05:00
Nhat Nguyen	05f97d5e1b	Revert "Deprecate synced flush (#50835 )" This reverts commit `1a32d7142a`.	2020-01-13 11:41:03 -05:00
Nhat Nguyen	1a32d7142a	Deprecate synced flush (#50835 ) A normal flush has the same effect as a synced flush on Elasticsearch 7.6 or later. It's deprecated in 7.6 and will be removed in 8.0. Relates #50776	2020-01-13 10:58:29 -05:00
Armin Braun	609b015e3c	Prevent Old Version Clusters From Corrupting Snapshot Repositories (#50853 ) (#50913 ) Follow-up to #50692 that starts writing a `min_version` field to the `RepositoryData` so that pre-7.6 ES versions cannot read it (and potentially corrupt it if they attempt to modify the repo contents) after the repository moved to the new metadata format.	2020-01-13 15:02:53 +01:00
Christoph Büscher	c31a21c3d8	Fix time zone issue in Rounding serialization (#50845 ) When deserializing time zones in the Rounding classes we used to include a tiny normalization step via `DateUtils.of(in.readString())` that was lost in #50609. It's at least necessary for some tests, e.g. the cause of #50827 is that when sending the default time zone ZoneOffset.UTC to a pre-7.0 stream we convert it to a "UTC" string id via `DateUtils.zoneIdToDateTimeZone`. This then gets read back as a UTC ZoneRegion, which should behave the same but fails the equality tests in our serialization tests. This change reverts to the previous behaviour with an additional normalization step on 7.x. Co-authored-by: Nik Everett <nik9000@gmail.com> Closes #50827	2020-01-13 10:10:15 +01:00
David Turner	456de59698	Fix non-corruption in testCurrentHeaderVersion (#50883 ) Today we make multiple attempts to corrupt the translog header in `TranslogHeaderTests#testCurrentHeaderVersion`, but if we are extraordinarily unlucky then this sequence of corruptions may restore the file to its original state. This change adjusts the test to only corrupt the file once, which is certain not to leave the file in its original state.	2020-01-12 12:38:37 +00:00
Henning Andersen	2e5e5fd483	Fix testSkipRefreshIfShardIsRefreshingAlready (#50856 ) The test checked queue size and active count; however, ThreadPoolExecutor pulls the request out of the queue before marking the worker active, risking that we think all tasks are done when they are not. The test now checks the completed-tasks metric instead, which is guaranteed to be monotonic. Relates #50769	2020-01-11 11:21:05 -05:00
Nhat Nguyen	f4aabdcd89	Do not force refresh when write indexing buffer (#50769 ) Today we periodically check the indexing buffer memory every 5 seconds or after we have used 1/30 of the configured memory. If the total used memory is over the threshold, then we refresh the "largest" shards. If refreshing takes longer than these intervals (i.e., 5s or 1/30 of the buffer), then we continue to enqueue refreshes to these shards. This leads to two issues: - The refresh thread pool can be exhausted and other shards can't refresh - Too many refreshes are executed for the "largest" shards With this change, we only refresh the largest shards if they are not refreshing. Here we rely on the periodic check to trigger another refresh if needed. We can harden this by making the ongoing refresh trigger the memory check when it completes. I opted not to do this in this PR for simplicity. See: https://discuss.elastic.co/t/write-queue-continue-to-rise/213652/	2020-01-11 11:21:05 -05:00
Nik Everett	e6d0f7df01	Fix format problem in composite of unmapped (#50869 ) (#50875 ) When a composite aggregation is reduced using the results from an index that has one of the fields unmapped, we were throwing away the formatter. This is mildly annoying, except in the case of IP addresses, which were coming out as non-UTF-8 characters and tripping assertions. This carefully preserves the formatter from the working bucket. Closes #50600	2020-01-10 16:18:11 -05:00
Jim Ferenczi	60308cf0b3	Fix upgrade of custom similarity (#50851 ) This change fixes the upgrade of index metadata that contains a custom similarity with options that are not compatible with BM25. The upgrade doesn't need a real similarity service, so we fake one that resolves all custom similarities to BM25, but this logic fails because the BM25 provider checks that all options are compatible. This commit removes the verification step as it is not needed during the upgrade (the verification is done when the index is restored/opened). Closes #50763	2020-01-10 18:43:13 +01:00
Armin Braun	7e68989dae	Fix Snapshot Shard Status Request Deduplication (#50788 ) (#50840 ) * Fix Snapshot Shard Status Request Deduplication The request deduplication didn't actually work for these requests since they had no `equals` and `hashCode` so the deduplicator wouldn't actually recognize equal requests.	2020-01-10 11:49:52 +01:00
Christoph Büscher	75cb4e0b69	Muting InternalAggregationsTests.testSerialization	2020-01-10 09:24:09 +01:00
Nik Everett	d021071ab9	Move scripted metric to ObjectParser (#50708 ) (#50811 ) This replaces the hand rolled parsing code for scripted metric with `ObjectParser` which is simpler to work with because it is declarative.	2020-01-09 16:09:21 -05:00
Nik Everett	ae40e22452	Drop "funny" functions building parsers (#50715 ) (#50814 ) Replaces the "funny" `Function<String, ConstructingObjectParser<T, Void>>` with a much simpler `ConstructingObjectParser<T, String>`. This makes pretty much all of our object parsers static.	2020-01-09 15:53:03 -05:00
Armin Braun	f70e8f6ab5	Fix Snapshot Repository Corruption in Downgrade Scenarios (#50692 ) (#50797 ) * Fix Snapshot Repository Corruption in Downgrade Scenarios (#50692) This PR introduces test infrastructure for downgrading a cluster while interacting with a given repository. It fixes the fact that repository metadata in the new format could be written while there are still older snapshots in the repository that require the old-format metadata to be restorable.	2020-01-09 21:21:13 +01:00
Jake Landis	de6f132887	[7.x] Foreach processor - fork recursive call (#50514 ) (#50773 ) A very large number of recursive calls can cause a stack overflow exception. This commit forks the recursive calls for non-async processors. Once forked, each thread will handle at most 10 recursive calls to help keep the stack size and thread count down to a reasonable size.	2020-01-09 13:21:18 -06:00
Nik Everett	1d8e51f89d	Support offset in composite aggs (#50609 ) (#50808 ) Adds support for the `offset` parameter to the `date_histogram` source of composite aggs. The `offset` parameter is supported by the normal `date_histogram` aggregation and is useful for folks that need to measure things from, say, 6am one day to 6am the next day. This is implemented by creating a new `Rounding` that knows how to handle offsets and delegates to other rounding implementations. That implementation doesn't fully implement the `Rounding` contract, namely `nextRoundingValue`. That method isn't used by composite aggs so I can't be sure that any implementation that I add will be correct. I propose to leave it throwing `UnsupportedOperationException` until I need it. Closes #48757	2020-01-09 14:11:24 -05:00
Julie Tibshirani	a299aba2f8	Ensure that field collapsing works with field aliases. (#50766 ) Previously, the following situation would throw an error: * A search contains a `collapse` on a particular field. * The search spans multiple indices, and in one index the field is mapped as a concrete field, but in another it is a field alias. The error occurs when we attempt to merge `CollapseTopFieldDocs` across shards. When merging, we validate that the name of the collapse field is the same across shards. But the name has already been resolved to the concrete field name, so it will be different on shards where the field was mapped as an alias vs. shards where it was a concrete field. This PR updates the collapse field name in `CollapseTopFieldDocs` to the originally requested field, so that it will always be consistent across shards. Note that in #32648, we already made a fix around collapsing on field aliases. However, we didn't test this specific scenario where the field was mapped as an alias in only one of the indices being searched.	2020-01-08 14:51:15 -08:00
Christoph Büscher	b1b4282273	Make Multiplexer inherit filter chains analysis mode (#50662 ) Currently, if an updateable synonym filter is included in a multiplexer filter, it is not reloaded via the _reload_search_analyzers API because the multiplexer itself doesn't pass on the analysis mode of the filters it contains, so it's not recognized as "updateable" in itself. Instead, we can check and merge the AnalysisMode settings of all filters in the multiplexer and use the resulting mode (e.g. search-time only) for the multiplexer itself, thus making any synonym filters contained in it reloadable. This, of course, will also make the analyzers using the multiplexer usable at search time only. Closes #50554	2020-01-08 22:12:01 +01:00
Przemyslaw Gomulka	e95b0c447f	Allow parsing timezone without fully provided time backport(#50178 ) (#50740 ) strict_date_optional_time now has an optional minute part. It already allowed optional second and fraction-of-second parts. This allows parsing 2018-01-01T00+01, 2018-01-01T00:00+01, 2018-01-01T00:00:00+01, and 2018-01-01T00:00:00.000+01. It won't allow parsing a timezone without an hour part, as this is not allowed by the ISO 8601 spec. Closes #49351	2020-01-08 20:04:57 +01:00
Henning Andersen	125feecabc	Guess root cause support unwrap (#50525 ) (#50742 ) ElasticsearchException.guessRootCauses would return a wrapper exception if the inner exception was not an ElasticsearchException. Fixed to never return wrapper exceptions. At least the following APIs change root_cause.0.type as a result: _update with bad script, _index with bad pipeline. Relates #50417	2020-01-08 19:09:14 +01:00
Adrien Grand	4f2299c714	Upgrade to Lucene 8.4.0. (#50518 ) (#50750 )	2020-01-08 18:53:59 +01:00
Adrien Grand	31158ab3d5	Add per-field metadata. (#50333 ) This PR adds per-field metadata that can be set in the mappings and is later returned by the field capabilities API. This metadata is completely opaque to Elasticsearch but may be used by tools that index data in Elasticsearch to communicate metadata about fields with tools that then search this data. A typical example that has been requested in the past is the ability to attach a unit to a numeric field. In order to not bloat the cluster state, Elasticsearch requires that this metadata be small: - keys can't be longer than 20 chars, - values can only be numbers or strings of no more than 50 chars, - no inner arrays or objects, - the metadata can't have more than 5 keys in total. Given that metadata is opaque to Elasticsearch, field capabilities don't try to do anything smart when merging metadata about multiple indices; the union of all field metadata is returned. Here is how the meta might look in mappings: ```json { "properties": { "latency": { "type": "long", "meta": { "unit": "ms" } } } } ``` And then in the field capabilities response: ```json { "latency": { "long": { "searchable": true, "aggregatable": true, "meta": { "unit": [ "ms" ] } } } } ``` When there are no conflicts, values are arrays of size 1, but when there are conflicts, Elasticsearch includes all unique values in this array, without any way to know which index has which metadata value: ```json { "latency": { "long": { "searchable": true, "aggregatable": true, "meta": { "unit": [ "ms", "ns" ] } } } } ``` Closes #33267	2020-01-08 16:21:18 +01:00
Yannick Welsch	f203c2b39d	Import replicated closed dangling indices (#50649 ) Dangling replicated closed indices are not imported properly (they miss their routing table when imported).	2020-01-08 13:39:20 +01:00
Rory Hunter	b1ff74f652	New setting to prevent automatically importing dangling indices (#49174 ) Introduce a new static setting, `gateway.auto_import_dangling_indices`, which prevents dangling indices from being automatically imported. Part of #48366.	2020-01-08 13:39:20 +01:00
Tim Vernum	293661d62c	Security should not reload files that haven't changed (#50724 ) In security we currently monitor a set of files for changes: - config/role_mapping.yml (or alternative configured path) - config/roles.yml - config/users - config/users_roles This commit prevents unnecessary reloading when the file change actually doesn't change the internal structure. Backport of: #50207 Co-authored-by: Anton Shuvaev <anton.shuvaev91@gmail.com>	2020-01-08 15:13:47 +11:00
Nik Everett	deb0991667	Teach ObjectParser a happy pattern (#50691 ) (#50710 ) We very commonly have objects with ctors like: ``` public Foo(String name) ``` And then declare a bunch of setters on the object. Every aggregation works like this, for example. This change teaches `ObjectParser` how to build these aggregations all on its own, without any help. This'll make it much cleaner to parse aggs, and, probably, a bunch of other things. It'll let us remove lots of wrapping. I've used this new power for the `avg` aggregation just to prove that it works outside of a unit test.	2020-01-07 11:57:41 -05:00
Nhat Nguyen	c3d207f437	Disable auto refresh in testSegmentsStats (#50689 ) If an auto-refresh happens, then version_map_memory is reset to 0. By default, the auto-refresh occurs every second during the first 30 seconds, until search becomes idle. Closes #50362	2020-01-07 10:44:30 -05:00
Hendrik Muhs	98ca9500e8	implement a workaround for remote cluster validation (#50460 ) In 7.x an internal API used for validating remote clusters does not throw; see #50420 for the details. This change implements a workaround for remote cluster validation, only for 7.x branches. fixes #50420	2020-01-07 13:51:51 +01:00
Yannick Welsch	a2ef0e8830	Check allocation id when failing shard on recovery (#50656 ) A failure of a recovering shard can race with a new allocation of the shard, and cause the new allocation to be failed as well. This can result in a shard being marked as initializing in the cluster state, but not exist on the node anymore. Closes #50508	2020-01-07 09:41:28 +01:00
Jay Modi	e5191e77e3	Remove unused IndicesOptions#fromByte method (#50683 ) This change removes a no longer used method, `fromByte`, in IndicesOptions. This method was necessary for backwards compatibility with versions prior to 6.4.0 and was used when talking to those versions. However, the minimum wire compatibility version has changed and we no longer use this code. Backport of #50665	2020-01-06 14:57:10 -07:00
Nik Everett	76bb661023	Replace AggParseContext with a String (backport of #50625 ) (#50679 ) We used to have a ton of stuff in the `AggParseContext` but now we parse aggs entirely with named xcontent. So we don't need the context any more.	2020-01-06 14:32:03 -05:00
Nhat Nguyen	926c0aa74c	Fix testRelocationEstablishedPeerRecoveryRetentionLeases (#50673 ) The redNodes are calculated incorrectly. Closes #50660	2020-01-06 13:32:04 -05:00
Nik Everett	f576aefd0f	Replace bespoke parser for significance heuristics (#50623 ) (#50659 ) This replaces the hand-written xcontent parsers for significance heuristics with `ObjectParser` and parsing named xcontent. As a happy accident, this was the last user of `ParseFieldRegistry` so this PR entirely removes that class. Closes #25519	2020-01-06 12:57:43 -05:00
Tim Brooks	fa57813c6d	Remove races in ProxyConnectionStrategyTests (#50620 ) Currently, we use delayed address resolution in the proxy strategy tests to allow tests to connect to different addresses. Unfortunately, this has the potential to introduce races as the address is resolved on each connection attempt. The number of connection attempts can vary based on when connections are opening and closing. This commit modifies the tests by allowing them to specifically control which address is used. Related to #50618	2020-01-06 10:20:53 -07:00
Martijn van Groningen	7be43e9f6d	Fix ingest stats test bug. (#50653 ) This test code fixes a serialization test bug: https://gradle-enterprise.elastic.co/s/7x2ct6yywkw3o Rarely stats for the same processor are generated and the production code then sums up these stats. However the test code wasn't summing up in that case, which caused inconsistencies between the actual and expected results. Closes #50507	2020-01-06 15:37:47 +01:00
Nhat Nguyen	b71490b06b	Deprecate indices without soft-deletes (#50502 ) (#50634 ) Soft-deletes will be enabled for all indices in 8.0. Hence, we should deprecate new indices without soft-deletes in 7.x. Backport of #50502	2020-01-06 08:44:30 -05:00
Henning Andersen	ec0ec61881	Deleted docs disregarded for if_seq_no check (#50526 ) Previously, as long as a deleted version value was kept as a tombstone, another index or delete operation against the same id would leak that the doc had existed (through seq_no info) or would allow the operation if the client forged the seq_no. Fixed to disregard info on deleted docs when doing seq_no based optimistic concurrency check.	2020-01-06 13:54:36 +01:00
Nikita Glashenko	5533e1172c	Add tests for remaining IntervalsSourceProvider implementations (#50326 ) This PR adds unit tests for wire and xContent serialization of remaining IntervalsSourceProvider implementations. Closes #50150	2020-01-06 13:18:53 +01:00
David Turner	66c690922c	Collect shard sizes for closed indices (#50645 ) Today the `InternalClusterInfoService` collects information on the sizes of shards of open indices, but does not consider closed indices. This means that shards of closed indices are treated as having zero size when they are being allocated. This commit fixes this, obtaining the sizes of all shards. Relates #33888	2020-01-06 11:44:19 +00:00
Henning Andersen	312bf44601	Workaround for JDK 14 EA FileChannel.map issue (#50523 ) FileChannel.map provokes static initialization of ExtendedMapMode in JDK14 EA, which needs elevated privileges. Relates #50512	2020-01-06 12:18:49 +01:00
Nik Everett	2362c430cd	Clean up wire test case a bit (#50627 ) (#50632 ) * Adds JavaDoc to `AbstractWireTestCase` and `AbstractWireSerializingTestCase` so it is more obvious you should prefer the latter if you have a choice * Moves the `instanceReader` method out of `AbstractWireTestCase` because it is no longer used. * Marks a bunch of methods final so it is more obvious which classes are for what. * Cleans up the side effects of the above.	2020-01-05 16:20:38 -05:00
Nik Everett	4d58656065	Declare remaining parsers `final` (#50571 ) (#50615 ) We have about 800 `ObjectParsers` in Elasticsearch, about 700 of which are final. This is probably the right way to declare them because in practice we never mutate them after they are built. And we certainly don't change the static reference. Anyway, this adds `final` to these parsers. I found the non-final parsers with this: ``` diff \ <(find . -type f -name '*.java' -exec grep -iHe 'static.*PARSER\s*=' {} \+ | sort) \ <(find . -type f -name '*.java' -exec grep -iHe 'static.*final.*PARSER\s*=' {} \+ | sort) \ 2>&1 | grep '^<' ```	2020-01-03 11:48:11 -05:00
Andrei Dan	856607b5a6	Guard against null geoBoundingBox (#50506 ) (#50608 ) A geo box with a top value of Double.NEGATIVE_INFINITY will yield an empty xContent which translates to a null `geoBoundingBox`. This commit marks the field as `Nullable` and guards against null when retrieving the `topLeft` and `bottomRight` fields. Fixes https://github.com/elastic/elasticsearch/issues/50505 (cherry picked from commit 051718f9b1e1ca957229b01e80d7b79d7e727e14) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-01-03 18:04:26 +02:00
Nik Everett	1abecad21b	Mark some constants in decay functions final (#50569 ) (#50575 ) This marks a couple of constants in the `DecayFunctionBuilder` as final. They are written in CONSTANT_CASE and used as constants but not final which is a little confusing and might lead to sneaky bugs.	2020-01-03 10:58:15 -05:00
Henning Andersen	218bd19034	Improve FutureUtils.get exception handling (#50339 ) (#50417 ) FutureUtils.get() would unwrap ElasticsearchWrapperExceptions. This is trappy, since nearly all usages of FutureUtils.get() only expected not to have to deal with checked exceptions. In particular, StepListener builds upon ListenableFuture which uses FutureUtils.get to be informed about the exception passed to onFailure. This had the bad consequence of masking away any exception that was an ElasticsearchWrapperException like RemoteTransportException. Specifically for recovery, this made CircuitBreakerExceptions happening on the target node look like they originated from the source node. The only usage that expected that behaviour was AdapterActionFuture. The unwrap behaviour has been moved to that class.	2020-01-03 15:28:47 +01:00
kkewwei	5655d6a1c1	Log index name when updating index settings (#49969 ) Today we log changes to index settings like this: updating [index.setting.blah] from [A] to [B] The identity of the index whose settings were updated is conspicuously absent from this message. This commit addresses this by adding the index name to these messages. Fixes #49818.	2020-01-03 11:26:29 +00:00
Alan Woodward	8b362c657b	Add fuzzy intervals source (#49762 ) This intervals source will return terms that are similar to an input term, up to an edit distance defined by fuzziness, similar to FuzzyQuery. Closes #49595	2020-01-03 09:59:19 +00:00
Henning Andersen	e19585b47f	Enhance TransportReplicationAction assertions (#49081 ) Include failure into assertion error when replication action discovers that it has been double triggered.	2020-01-02 19:23:10 +01:00
Oleg	7539fbb30f	Deprecate the 'local' parameter of /_cat/nodes (#50499 ) The cat nodes API performs a `ClusterStateAction` then a `NodesInfoAction`. Today it accepts the `?local` parameter and passes this to the `ClusterStateAction` but this parameter has no effect on the `NodesInfoAction`. This is surprising, because `GET _cat/nodes?local` looks like it might be a completely local call but in fact it still depends on every node in the cluster. This commit deprecates the `?local` parameter on this API so that it can be removed in 8.0. Relates #50088	2020-01-02 14:53:56 +00:00
Nhat Nguyen	e7c15a5c6e	Ensure relocating shards establish peer recovery retention leases (#50486 ) We forgot to establish peer recovery retention leases for relocating primaries without soft-deletes. Relates #50351	2019-12-26 13:51:35 -05:00
Nhat Nguyen	7713221733	Fix testCancelRecoveryDuringPhase1 (#50449 ) testCancelRecoveryDuringPhase1 uses a mock of IndexShard, which can't create retention leases. We need to stub method createRetentionLease. Relates #50351 Closes #50424	2019-12-26 09:48:58 -05:00
Yannick Welsch	f57569bf5c	Mute RecoverySourceHandlerTests.testCancelRecoveryDuringPhase1 Relates #50424	2019-12-24 12:13:31 -05:00
Martijn van Groningen	10ed1ae1d2	Add remote info to the HLRC (#50483 ) The additional change to the original PR (#49657) is that `org.elasticsearch.client.cluster.RemoteConnectionInfo` now parses the initial_connect_timeout field as a string instead of a TimeValue instance. This is needed because the initial_connect_timeout field in the remote connection API is serialized for human consumption, not for parsing. Therefore the HLRC can't parse it correctly (which caused test failures in CI, but not in the PR CI :( ). The way this field is serialized needs to be changed in the remote connection API, but that is a breaking change. We should wait to make this change until REST API versioning is introduced. Co-Authored-By: j-bean <anton.shuvaev91@gmail.com> Co-authored-by: j-bean <anton.shuvaev91@gmail.com>	2019-12-24 15:11:58 +01:00
Nhat Nguyen	33204c2055	Use peer recovery retention leases for indices without soft-deletes (#50351 ) Today, the replica allocator uses peer recovery retention leases to select the best-matched copies when allocating replicas of indices with soft-deletes. We can employ this mechanism for indices without soft-deletes because the retaining sequence number of a PRRL is the persisted global checkpoint (plus one) of that copy. If the primary and replica have the same retaining sequence number, then we should be able to perform a noop recovery. The reason is that we must be retaining translog up to the local checkpoint of the safe commit, which is at most the global checkpoint of either copy. The only limitation is that we might not cancel ongoing file-based recoveries with PRRLs for noop recoveries. We can't make the translog retention policy comply with PRRLs. We also have this problem with soft-deletes if a PRRL is about to expire. Relates #45136 Relates #46959	2019-12-23 22:04:07 -05:00
Tal Levy	bed121efaf	[7.x-backport] Centralize BoundingBox logic to a dedicated class (#50469 ) Both geo_bounding_box query and geo_bounds aggregation have a very similar definition of a "bounding box". A lot of this logic (serialization, xcontent-parsing, etc) can be centralized instead of having separated efforts to do the same things	2019-12-23 11:21:39 -08:00
Aleksandr Maus	d5cec7faa1	Improve SearchHit "equals" implementation for null fields cases (#50327 ) (#50448 ) * Improve SearchHit "equals" implementation for null fields cases	2019-12-23 09:59:07 -05:00
Igor Motov	339d10c16f	Geo: Switch generated GeoJson type names to camel case (#50400 ) Switches generated GeoJson type names to camel case to conform to the standard. Closes #49568	2019-12-20 15:37:22 -05:00
Andrei Dan	a3cdbda7c6	Make the TransportRolloverAction execute in one cluster state update (#50388 ) (#50442 ) This commit makes the TransportRolloverAction more resilient by having it execute only one cluster state update that creates the new (rollover) index, rolls over the alias from the source to the target index, and sets the RolloverInfo on the source index. Before, these 3 steps were represented as 3 chained cluster state updates, which would have required manual user intervention if, say, the alias rollover cluster state update (second in the chain) failed but the update creating the rollover index (first in the chain) succeeded. * Rename innerExecute to applyAliasActions (cherry picked from commit 1ba4339a0c73ef3354b8c8b44b628fc55f1dbc78) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2019-12-20 18:01:03 +00:00
Nhat Nguyen	975cc99516	Close engine before reset log appender (#50390 ) Merge threads can run and access the mock appender after we have stopped it. Closes #50315	2019-12-20 12:19:03 -05:00
Jim Ferenczi	2acafd4b15	Optimize composite aggregation based on index sorting (#48399 ) (#50272 ) Co-authored-by: Daniel Huang <danielhuang@tencent.com> This is a spinoff of #48130 that generalizes the proposal to allow early termination with the composite aggregation when leading sources match a prefix or the entire index sort specification. In such cases the composite aggregation can use the index sort natural order to early terminate the collection when it reaches a composite key that is greater than the bottom of the queue. The optimization is also applicable when a query other than match_all is provided. However, the optimization is deactivated for sources that match the index sort in the following cases: * Multi-valued sources, in which case early termination is not possible. * missing_bucket is set to true	2019-12-20 12:32:37 +01:00
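The early-termination rule described in the composite aggregation commit can be sketched in isolation. This is a minimal, hypothetical illustration (the class `EarlyTerminationSketch` and its method are not the Elasticsearch `CompositeAggregator`): it assumes single-valued `long` keys that already arrive in ascending index-sort order, keeps the smallest `size` distinct keys in a max-heap, and stops as soon as a key is greater than the queue's bottom. The multi-valued case is excluded precisely because it breaks the "keys arrive sorted" assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of index-sort-based early termination; not the Elasticsearch implementation.
public final class EarlyTerminationSketch {

    /** Keys retained by the queue plus how many documents were visited before stopping. */
    public static final class Result {
        public final List<Long> keys;
        public final int docsVisited;

        Result(List<Long> keys, int docsVisited) {
            this.keys = keys;
            this.docsVisited = docsVisited;
        }
    }

    /**
     * Collects the {@code size} smallest distinct keys from single-valued keys that arrive in
     * ascending order, as they would when the leading source matches the index sort.
     */
    public static Result collect(long[] keysInIndexOrder, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        // Max-heap, so peek() returns the "bottom" of the queue: the largest retained key.
        PriorityQueue<Long> queue = new PriorityQueue<>((a, b) -> Long.compare(b, a));
        int visited = 0;
        for (long key : keysInIndexOrder) {
            visited++;
            if (queue.size() == size && key > queue.peek()) {
                // Every later key is at least this large, so none of them can enter the queue.
                break;
            }
            if (!queue.contains(key)) { // one bucket per distinct key
                queue.add(key);
                if (queue.size() > size) {
                    queue.poll(); // evict the current bottom
                }
            }
        }
        List<Long> keys = new ArrayList<>(queue);
        keys.sort(null); // natural (ascending) order
        return new Result(keys, visited);
    }
}
```

With keys `{1, 1, 2, 3, 3, 4, 5, 6}` and `size = 3`, the loop stops at the first `4`, so only six of the eight documents are visited.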
Yannick Welsch	4f805deb0c	Only auto-expand replicas with allocation filtering when all nodes upgraded (#50361 ) Follow-up to #48974 that ensures that replicas are only auto-expanded according to allocation filtering rules once all nodes are upgraded to a version that supports this. Helps with orchestrating cluster upgrades.	2019-12-20 11:50:00 +01:00
Yannick Welsch	5f37f1f401	Revert "Only auto-expand replicas with allocation filtering when all nodes upgraded (#50361 )" This reverts commit `df4fe73b84`.	2019-12-20 11:07:30 +01:00
Hendrik Muhs	de14092ad2	[Transform] refactor source and dest validation to support CCS (#50018 ) refactors source and dest validation, adds support for CCS, makes resolve work like reindex/search, allows aliased dest index with a single write index. fixes #49988 fixes #49851 relates #43201	2019-12-20 10:49:53 +01:00
Alan Woodward	3cdc23ec9c	Fix meta version of task index mapping (#50363 ) The built-in task index mapping has a version field in its metadata, so that the TaskResultsService can check to see if it needs to update mappings when a new task result is stored. #48393 updated this version in TaskResultsService but did not change the version in the mapping itself, so a mapping update is applied every time a new task result is stored. This commit updates the mapping version so that it corresponds to the version in TaskResultsService.	2019-12-20 09:44:47 +00:00
Yannick Welsch	df4fe73b84	Only auto-expand replicas with allocation filtering when all nodes upgraded (#50361 ) Follow-up to #48974 that ensures that replicas are only auto-expanded according to allocation filtering rules once all nodes are upgraded to a version that supports this. Helps with orchestrating cluster upgrades.	2019-12-20 10:22:44 +01:00
Tim Brooks	cb73fb0f9b	Backport remote proxy mode stats and naming (#50402 ) * Update remote cluster stats to support simple mode (#49961) Remote cluster stats API currently only returns useful information if the strategy in use is the SNIFF mode. This PR modifies the API to provide relevant information if the user is in the SIMPLE mode. This information is the configured addresses, max socket connections, and open socket connections. * Send hostname in SNI header in simple remote mode (#50247) Currently an intermediate proxy must route connections to the appropriate remote cluster when using simple mode. This commit offers an additional mechanism for the proxy to route the connections by including the hostname in the TLS SNI header. * Rename the remote connection mode simple to proxy (#50291) This commit renames the simple connection mode to the proxy connection mode for remote cluster connections. In order to do this, the mode-specific settings which were namespaced by their mode (ex: sniff.seed and proxy.addresses) have been reverted. * Modify proxy mode to support a single address (#50391) Currently, the remote proxy connection mode uses a list setting for the proxy address. This commit modifies this so that the setting is proxy_address and only supports a single remote proxy address.	2019-12-19 18:02:48 -07:00
Stuart Tettemer	689df1f28f	Scripting: ScriptFactory not required by compile (#50344 ) (#50392 ) Avoid backwards incompatible changes for 8.x and 7.6 by removing type restriction on compile and Factory. Factories may optionally implement ScriptFactory. If so, then they can indicate determinism and thus cacheability. Backport Relates: #49466	2019-12-19 12:50:25 -07:00
Alan Woodward	1a2e931d6e	Reduce the max depth of randomly generated interval queries (#50317 ) We randomly generate intervals sources to test serialization and query generation in IntervalQueryBuilderTests. However, we can occasionally generate a query that has too many nested disjunctions, resulting in query rewrites running afoul of the maximum boolean clause limit. This commit reduces the maximum depth of the randomly generated intervals source to make running into this limit much less likely.	2019-12-19 15:12:12 +00:00
Andrei Dan	1e11d23051	Extract a create index method that only manipulates the ClusterState (#50240 ) (#50328 ) * Extract IndexCreationTask execute into applyCreateIndexRequest This is the first step in preparation for separating the index creation into a few steps that only deal with the cluster state mutation and removing the IndexCreationTask altogether. * Split applyCreateIndexRequest This breaks down the logic in applyCreateIndexRequest into multiple steps that will hopefully make the service more readable and unit testable. The service creation process now goes through a few well defined steps, namely find the templates that possibly match the new index, parse the requested and template matching mappings, process the index and template matching settings, validate the wait for active shards request and create the `IndexService`, update the mappings in the `MapperService` (which is grouped together with creating the sort order for validation purposes), validate the requested and templated matching aliases and finally update the `ClusterState` to reflect the requested changes. This also removes the `IndexCreationTask` as it was a shallow indirection and migrates the tests from `IndexCreationTaskTests` to `MetaDataCreateIndexServiceTests` (making them "real" unit tests operating on the `ClusterState` rather than mocks). * Add more unit tests. * Add IT to verify we cleanup in case of failure (cherry picked from commit 57e6269f750471f05a1a79539ca45361b9e3c2b5) Signed-off-by: Andrei Dan <andrei.dan@elastic.co> # Conflicts: # server/src/main/java/org/elasticsearch/cluster/metadata/MetaDataCreateIndexService.java # server/src/test/java/org/elasticsearch/action/admin/indices/create/CreateIndexIT.java # server/src/test/java/org/elasticsearch/cluster/metadata/IndexCreationTaskTests.java	2019-12-19 12:37:07 +00:00
Igor Motov	c77ca98928	Geo: Switch generated WKT to upper case (#50285 ) Switches generated WKT to upper case to conform to the standard recommendation. Relates #49568	2019-12-18 17:29:08 -05:00
Stuart Tettemer	9cdbcbd121	[TEST] Exclude name on ScriptContextInfo mutate (#50332 ) (#50337 ) ScriptContextInfoSerializingTests:testEqualsAndHashcode was failing because the mutation was generating the same name. Backport Fixes: #50331	2019-12-18 14:23:21 -07:00
Stuart Tettemer	06a24f09cf	Scripting: Cache script results if deterministic (#50106 ) (#50329 ) Cache results from queries that use scripts if they use only deterministic API calls. Nondeterministic API calls are marked in the whitelist with the `@nondeterministic` annotation. Examples are `Math.random()` and `new Date()`. Refs: #49466	2019-12-18 13:00:42 -07:00
Adrien Grand	35a88a5dbb	Add 7.5.2 version.	2019-12-18 19:50:00 +01:00
Ryan Ernst	8439b2779b	Add version 6.8.7 constant	2019-12-18 09:38:07 -08:00
Nikita Glashenko	ef54a9c23c	Add tests for IntervalsSourceProvider.Wildcard and Prefix (#50306 ) This PR adds unit tests for wire and xContent serialization of `IntervalsSourceProvider.Wildcard` and `IntervalsSourceProvider.Prefix`. Relates #50150	2019-12-18 17:41:48 +01:00
Yannick Welsch	37b8c139b3	Omit loading IndexMetaData when inspecting shards (#50214 ) Loading shard state information during shard allocation sometimes runs into a situation where a data node does not know yet how to look up the shard on disk if custom data paths are used. The current implementation loads the index metadata from disk to determine what the custom data path looks like. This PR removes this dependency, simplifying the lookup. Relates #48701	2019-12-17 14:33:02 +01:00
Martijn van Groningen	2079f1cbeb	Backport: Fix ingest simulate response document order if processor executes async (#50269 ) Backport #50244 to 7.x branch. If a processor executes asynchronously and the ingest simulate api simulates with multiple documents then the order of the documents in the response may not match the order of the documents in the request. Alexander Reelsen discovered this issue with the enrich processor with the following reproduction: ``` PUT cities/_doc/munich {"zip":"80331","city":"Munich"} PUT cities/_doc/berlin {"zip":"10965","city":"Berlin"} PUT /_enrich/policy/zip-policy { "match": { "indices": "cities", "match_field": "zip", "enrich_fields": [ "city" ] } } POST /_enrich/policy/zip-policy/_execute GET _cat/indices/.enrich-* POST /_ingest/pipeline/_simulate { "pipeline": { "processors" : [ { "enrich" : { "policy_name": "zip-policy", "field" : "zip", "target_field": "city", "max_matches": "1" } } ] }, "docs": [ { "_id": "first", "_source" : { "zip" : "80331" } } , { "_id": "second", "_source" : { "zip" : "50667" } } ] } ``` * fixed test compile error	2019-12-17 12:27:07 +01:00
Armin Braun	4f24739fbe	Fix Index Deletion During Partial Snapshot Create (#50234 ) (#50266 ) We can simply filter out shard generation updates for indices that were removed from the cluster state concurrently to fix index deletes during partial snapshots as that completely removes any reference to those shards from the snapshot. Follow up to #50202 Closes #50200	2019-12-17 10:58:15 +01:00
Armin Braun	2e7b1ab375	Use ClusterState as Consistency Source for Snapshot Repositories (#49060 ) (#50267 ) Follow up to #49729 This change removes falling back to listing out the repository contents to find the latest `index-N` in write-mounted blob store repositories. This saves 2-3 list operations on each snapshot create and delete operation. Also it makes all the snapshot status APIs cheaper (and faster) by saving one list operation there as well in many cases. This removes the resiliency to concurrent modifications of the repository as a result and puts a repository in a `corrupted` state in case loading `RepositoryData` failed from the assumed generation.	2019-12-17 10:55:13 +01:00
Henning Andersen	8391b974c5	Recovery buffer size 16B smaller (#50100 ) G1GC will use humongous allocations when an allocation exceeds half the chosen region size, which is at least 1MB. By reducing the recovery buffer size by 16 bytes we ensure that the recovery buffer is never allocated as a humongous allocation.	2019-12-16 22:00:22 +01:00
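The 16-byte reduction is easiest to see as arithmetic. The sketch below is hypothetical and rests on two assumptions that are not stated in the commit: a starting buffer of 512 KiB and a 16-byte `byte[]` object header (typical for 64-bit HotSpot with compressed class pointers). Under those assumptions, a 512 KiB payload plus its header is just over half of the smallest (1 MiB) G1 region, while a payload 16 bytes smaller lands exactly on half and therefore does not exceed it. Larger regions only raise the threshold, so the adjusted size stays non-humongous for every region size.

```java
// Hypothetical arithmetic sketch; the 512 KiB size and 16-byte header are assumptions.
public final class HumongousCheck {
    public static final long MIN_REGION_BYTES = 1L << 20; // smallest G1 region size: 1 MiB
    public static final long ARRAY_HEADER_BYTES = 16;     // assumed byte[] header size

    /** Per the commit: an allocation is humongous when it exceeds half the region size. */
    public static boolean isHumongous(long payloadBytes, long regionBytes) {
        return payloadBytes + ARRAY_HEADER_BYTES > regionBytes / 2;
    }

    public static void main(String[] args) {
        long naive = 512 * 1024;         // header pushes the object past 512 KiB
        long adjusted = 512 * 1024 - 16; // header brings the object to exactly 512 KiB
        System.out.println(isHumongous(naive, MIN_REGION_BYTES));    // true
        System.out.println(isHumongous(adjusted, MIN_REGION_BYTES)); // false
    }
}
```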
Nhat Nguyen	731bfa6614	Account trimAboveSeqNo in committed translog generation (#50205 ) Today we do not consider trimAboveSeqNo when calculating the translog generation of an index commit. If there is no new indexing after the primary promotion, then we won't be able to clean up the translog.	2019-12-16 11:40:16 -05:00
Zachary Tong	be78d5cc74	Migrate MinAggregator integration tests to AggregatorTestCase (#50053 ) Also renames MinTests to MinAggregationBuilderTests	2019-12-16 11:15:50 -05:00
Rory Hunter	2bd3a05892	Refactor environment variable processing for Docker (#50221 ) Backport of #49612. The current Docker entrypoint script picks up environment variables and translates them into -E command line arguments. However, since any tool executed via `docker exec` doesn't run the entrypoint, it results in a poorer user experience. Therefore, refactor the env var handling so that the -E options are generated in `elasticsearch-env`. These have to be appended to any existing command arguments, since some CLI tools have subcommands and -E arguments must come after the subcommand. Also extract the support for `_FILE` env vars into a separate script, so that it can be called from more than one place (the behaviour is idempotent). Finally, add noop -E handling to CronEvalTool for parity, and support `-E` in MultiCommand before subcommands.	2019-12-16 15:39:28 +00:00
Armin Braun	afcdc27c02	Fix Index Deletion during Snapshot Finalization (#50202 ) (#50227 ) With #45689 making it so that index metadata is written after all shards have been snapshotted, we can no longer delete indices that are part of the upcoming snapshot finalization, and it is not sufficient to check if all shards of an index have been snapshotted before deciding that it is safe to delete it. This change forbids deleting any index that is in the process of being snapshotted to avoid issues during snapshot finalization. Relates #50200 (doesn't fully fix it yet because we're not fixing the `partial=true` snapshot case here)	2019-12-16 13:30:05 +01:00
Henning Andersen	4ced237a7f	Disk threshold decider is enabled by default (#50222 ) An old comment had survived after the default was flipped. Relates #6204	2019-12-16 12:43:34 +01:00
Armin Braun	761d6e8e4b	Remove BlobContainer Tests against Mocks (#50194 ) (#50220 ) * Remove BlobContainer Tests against Mocks Removing all these weird mocks as asked for by #30424. All these tests are now part of real repository ITs and otherwise left unchanged if they had independent tests that didn't call the `createBlobStore` method previously. The HDFS tests also get added coverage as a side-effect because they did not have an implementation of the abstract repository ITs. Closes #30424	2019-12-16 11:37:09 +01:00
Ignacio Vera	3717c733ff	"CONTAINS" support for BKD-backed geo_shape and shape fields (#50141 ) (#50213 ) Lucene 8.4 added support for "CONTAINS", therefore in this commit those changes are integrated in Elasticsearch. This commit contains as well a bug fix when querying with a geometry collection with "DISJOINT" relation.	2019-12-16 09:17:51 +01:00
Nhat Nguyen	6f1098cceb	Fix version in testTurnOffTranslogRetentionAfterAllShardStarted Soft-deletes require 6.5 or later.	2019-12-15 12:58:28 -05:00
Nhat Nguyen	df46848fb0	Migrate peer recovery from translog to retention lease (#49448 ) Since 7.4, we have used Lucene instead of the translog as the source of history for peer recoveries. However, this reduces the likelihood of operation-based recoveries when performing a full cluster restart from pre-7.4 because existing copies do not have PRRLs. To remedy this issue, we fall back to using the translog in peer recoveries if the recovering replica does not have a peer recovery retention lease and the replication group hasn't fully migrated to PRRLs. Relates #45136	2019-12-15 10:24:39 -05:00
Nhat Nguyen	c151a75dfe	Use retention lease in peer recovery of closed indices (#48430 ) Today we do not use retention leases in peer recovery for closed indices because we can't sync retention leases on closed indices. This change adds that ability and adjusts peer recovery to use retention leases for all indices with soft-deletes enabled. Relates #45136 Co-authored-by: David Turner <david.turner@elastic.co>	2019-12-15 10:24:34 -05:00
Christoph Büscher	c0216f9a06	Improve DateFieldMapper `ignore_malformed` handling (#50090 ) A recent change around date parsing (#46675) made it stricter, so we should now also catch DateTimeExceptions in DateFieldMapper and ignore those when the `ignore_malformed` option is set. Closes #50081	2019-12-13 10:00:10 +01:00
Zachary Tong	521933aa11	SingleBucket aggs need to reduce their bucket's pipelines first (#50103 ) When decoupling the pipeline reduction from regular agg reduction, MultiBucket aggs were modified to reduce their bucket's pipeline aggs first before reducing the sibling aggs. This modification was missed on SingleBucket aggs, meaning any SingleBucket would fail to reduce any pipeline sub-aggs	2019-12-12 09:07:33 -05:00
Ignacio Vera	b5ec227de8	upgrade to lucene 8.4.0-snapshot-08b8d116f8f (#50129 ) (#50132 )	2019-12-12 13:13:37 +01:00
Adrien Grand	0bba7ccedd	Remove information about the latest PostingsFormat/DocValuesFormat. (#50118 ) (#50127 ) This information is outdated and unused.	2019-12-12 11:46:37 +01:00
Armin Braun	6eee41e253	Remove Unused Single Delete in BlobStoreRepository (#50024 ) (#50123 ) * Remove Unused Single Delete in BlobStoreRepository There are no more production uses of the non-bulk delete or the delete that throws on missing so this commit removes both these methods. Only the bulk delete logic remains. Where the bulk delete was derived from single deletes, the single delete code was inlined into the bulk delete method. Where single delete was used in tests it was replaced by bulk deleting.	2019-12-12 11:17:46 +01:00
Armin Braun	0fae4065ef	Better Logging GCS Blobstore Mock (#50102 ) (#50124 ) * Better Logging GCS Blobstore Mock Two things: 1. We should just throw a descriptive assertion error and figure out why we're not reading a multi-part instead of returning a `400` and failing the tests that way here since we can't reproduce these 400s locally. 2. We were missing logging the exception on a cleanup delete failure that coincides with the `400` issue in tests. Relates #49429	2019-12-12 11:17:22 +01:00
Ryan Ernst	cbff63685a	Ensure meta and document field maps are never null in GetResult (#50112 ) This commit ensures that deserializing a GetResult from StreamInput does not leave metaFields and documentFields null. Null maps could cause an NPE in situations where the upsert response for a document that did not exist is passed back to a node that forwarded the upsert request. closes #48215	2019-12-11 22:21:55 -08:00
Tim Brooks	38b67f719e	Add int indicating size of transport header (#50085 ) Currently we do not know the size of the transport header (map of request response headers, features array, and action name). This means that we must read the entire transport message to dependably act on the headers. This commit adds an int indicating the size of the transport headers. With this addition we can act upon the headers prior to reading the entire message.	2019-12-11 16:24:19 -07:00
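The idea of prefixing the headers with their size can be illustrated with a simple length-prefixed frame. This is a hypothetical sketch, not the Elasticsearch transport wire format: the layout `[int headerSize][header bytes][body bytes]` and the use of a UTF-8 action name as the only header are assumptions. It shows the property the commit describes: a reader can decode the header section, or locate where the body begins, without reading the body.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical framing sketch; not the Elasticsearch transport protocol.
public final class HeaderFraming {

    /** Layout: [int headerSize][header bytes][body bytes]; the header is a UTF-8 action name. */
    public static byte[] encode(String action, byte[] body) {
        byte[] header = action.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + header.length + body.length);
        buf.putInt(header.length); // lets readers find the end of the header section
        buf.put(header);
        buf.put(body);
        return buf.array();
    }

    /** Decodes only the header section; the body bytes are never read. */
    public static String readAction(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        int headerSize = buf.getInt();
        byte[] header = new byte[headerSize];
        buf.get(header);
        return new String(header, StandardCharsets.UTF_8);
    }

    /** Offset of the first body byte, computed from the size prefix alone. */
    public static int bodyOffset(byte[] message) {
        return Integer.BYTES + ByteBuffer.wrap(message).getInt();
    }
}
```

Without the size prefix, a reader would have to parse every header field in order before it could know where the body starts.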
Adrien Grand	adf5c92f8c	Address UUIDTests#testCompression failures. (#50101 ) Those were due to codec randomization. Closes #50048	2019-12-11 22:13:58 +01:00
David Turner	285eacd267	Use more specific loggers in subclasses of TMNA (#50076 ) Adjusts the subclasses of `TransportMasterNodeAction` to use their own loggers instead of the one for the base class. Relates #50056. Partial backport of #46431 to 7.x.	2019-12-11 15:07:47 +00:00
Adrien Grand	87e72156ce	Upgrade to lucene 8.4.0-snapshot-662c455. (#50016 ) (#50039 ) Lucene 8.4 is about to be released so we should check it doesn't cause problems with Elasticsearch.	2019-12-10 18:04:58 +01:00
James Rodewig	3f5678ca79	[DOCS] Remove shadow replica reference (#50029 ) Removes a reference to shadow replicas from the cat shards API docs and a comment in cluster/routing/UnassignedInfo.java. Shadow replicas were removed with #23906.	2019-12-10 09:30:51 -05:00
Armin Braun	ee4a8a08dd	Improve Snapshot Finalization Ex. Handling (#49995 ) (#50017 ) * Improve Snapshot Finalization Ex. Handling Like in #49989 we can get into a situation where the setting of the repository generation (during snapshot finalization) in the cluster state fails due to master failing over. In this case we should not try to execute the next cluster state update that will remove the snapshot from the cluster state. Closes #49989	2019-12-10 13:01:51 +01:00
Yannick Welsch	a16abf921f	Make elasticsearch-node tools custom metadata-aware (#48390 ) The elasticsearch-node tools allow manipulating the on-disk cluster state. The tool is currently unaware of plugins and will therefore drop custom metadata from the cluster state once the state is written out again (as it skips over the custom metadata that it can't read). This commit preserves unknown customs when editing on-disk metadata through the elasticsearch-node command-line tools.	2019-12-10 09:58:11 +01:00
shiwenjie12	dd441962bb	Modify notes (#48331 ) Modify notes	2019-12-09 13:03:40 -05:00
Jason Tedor	bfb2dc1353	Enable dependent settings values to be validated (#49942 ) Today settings can declare dependencies on another setting. This declaration is implemented so that if the declared setting is not set when the declaring setting is, settings validation fails. Yet, in some cases we want not only that the setting is set, but that it also has a specific value. For example, with the monitoring exporter settings, if xpack.monitoring.exporters.my_exporter.host is set, we not only want that xpack.monitoring.exporters.my_exporter.type is set, but that it is also set to local. This commit extends the settings infrastructure so that this declaration is possible. The use of this in the monitoring exporter settings will be implemented in a follow-up.	2019-12-09 12:45:50 -05:00
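The extended dependency check can be sketched as a small validator. This is a hypothetical illustration and not the Elasticsearch `Setting` API: the `DependentSettings` class, the `Dependency` holder, and the setting names used in the usage note are invented for the example. A rule fires only when the declaring setting is present; it then requires the dependency to be set and, when a required value is given, to hold exactly that value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the Elasticsearch settings infrastructure.
public final class DependentSettings {

    /** "If {@code setting} is set, {@code dependency} must be set (to {@code requiredValue} when non-null)." */
    public static final class Dependency {
        final String setting;
        final String dependency;
        final String requiredValue;

        public Dependency(String setting, String dependency, String requiredValue) {
            this.setting = setting;
            this.dependency = dependency;
            this.requiredValue = requiredValue;
        }
    }

    /** Returns one error message per violated rule; an empty list means the settings are valid. */
    public static List<String> validate(Map<String, String> settings, List<Dependency> deps) {
        List<String> errors = new ArrayList<>();
        for (Dependency d : deps) {
            if (!settings.containsKey(d.setting)) {
                continue; // the rule only applies when the declaring setting is present
            }
            String actual = settings.get(d.dependency);
            if (actual == null) {
                errors.add("missing required setting [" + d.dependency + "] for setting [" + d.setting + "]");
            } else if (d.requiredValue != null && !d.requiredValue.equals(actual)) {
                errors.add("setting [" + d.dependency + "] must be [" + d.requiredValue
                        + "] when [" + d.setting + "] is set, but was [" + actual + "]");
            }
        }
        return errors;
    }
}
```

For example, a hypothetical rule `new Dependency("example.feature.url", "example.feature.mode", "remote")` accepts `{url=x, mode=remote}`, rejects both `{url=x, mode=local}` and `{url=x}`, and ignores settings that do not include `example.feature.url` at all.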
Vishnu Chilamakuru	056c698540	Add Validation for maxQueryTerms to be greater than 0 for MoreLikeThisQuery (#49966 ) Adds validation for maxQueryTerms to be greater than 0 for MoreLikeThisQuery and MoreLikeThisQueryBuilder. Closes #49927	2019-12-09 15:01:10 +01:00
Armin Braun	62e128f02d	Cleanup Old index-N Blobs in Repository Cleanup (#49862 ) (#49902 ) * Cleanup Old index-N Blobs in Repository Cleanup Repository cleanup didn't deal with old index-N, this change adds cleaning up all old index-N found in the repository.	2019-12-09 12:05:55 +01:00
Armin Braun	ac2774c9fa	Use Cluster State to Track Repository Generation (#49729 ) (#49976 ) Step on the road to #49060. This commit adds the logic to keep track of a repository's generation across repository operations. See changes to package level Javadoc for the concrete changes in the distributed state machine. It updates the write side of new repository generations to be fully consistent via the cluster state. With this change, no `index-N` will be overwritten for the same repository ever. So eventual consistency issues around conflicting updates to the same `index-N` are not a possibility any longer. With this change the read side will still use listing of repository contents instead of relying solely on the cluster state contents. The logic for that will be introduced in #49060. This retains the ability to externally delete the contents of a repository and continue using it afterwards for the time being. In #49060 the use of listing to determine the repository generation will be removed in all cases (except for full-cluster restart) as the last step in this effort.	2019-12-09 09:02:57 +01:00
Yannick Welsch	7a2e35caa0	Properly fake corrupted translog (#49918 ) The fake translog corruption in the test sometimes generates invalid translog files where some assertions do not hold (e.g. minSeqNo <= maxSeqNo or minTranslogGen <= translogGen) Closes #49909	2019-12-09 08:33:40 +01:00
Yannick Welsch	01d36afa4b	Randomly run CCR tests with _source disabled (#49922 ) Makes sure that CCR also properly works with _source disabled. Changes one exception in LuceneChangesSnapshot as the case of missing _recovery_source because of a missing lease was not properly bubbled up to CCR (testIndexFallBehind was failing).	2019-12-09 08:33:40 +01:00
Armin Braun	f768f8ddab	Fix TimedRunnable Executing onAfter Twice (#49910 ) (#49930 ) If we have a nested `AbstractRunnable` inside of `TimedRunnable` it's executed twice on `run` (once when its own `run` method is invoked and once when the `onAfter` in the `TimedRunnable` is executed). Simply removing the `onAfter` override in `TimedRunnable` makes sure that the `onAfter` is only called once by the `run` on the nested `AbstractRunnable` itself. Same was done for `onFailure` as it was double-triggering as well on exceptions in the inner `onFailure`.	2019-12-08 17:36:05 +01:00
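The double invocation described for `TimedRunnable` follows from a template-method pattern. The sketch below uses simplified, hypothetical stand-ins (`SimpleAbstractRunnable` and `Wrapper` are not the real Elasticsearch `AbstractRunnable`/`TimedRunnable`): `run()` always calls `onAfter()` after `doRun()`. When the wrapper's `doRun()` calls `inner.run()`, the inner callback already fires once; if the wrapper also forwards its own `onAfter()` to the inner runnable, the callback fires a second time. Dropping that override, as the commit does, leaves exactly one call.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-ins; not the Elasticsearch AbstractRunnable/TimedRunnable.
public final class DoubleOnAfterDemo {

    /** Template: run() calls doRun(), then always calls onAfter(). */
    abstract static class SimpleAbstractRunnable implements Runnable {
        @Override
        public final void run() {
            try {
                doRun();
            } finally {
                onAfter();
            }
        }

        protected abstract void doRun();

        protected void onAfter() {}
    }

    /** Wrapper that delegates run(); optionally also forwards onAfter() (the buggy variant). */
    static final class Wrapper extends SimpleAbstractRunnable {
        private final SimpleAbstractRunnable inner;
        private final boolean forwardOnAfter;

        Wrapper(SimpleAbstractRunnable inner, boolean forwardOnAfter) {
            this.inner = inner;
            this.forwardOnAfter = forwardOnAfter;
        }

        @Override
        protected void doRun() {
            inner.run(); // inner.run() already invokes inner.onAfter()
        }

        @Override
        protected void onAfter() {
            if (forwardOnAfter) {
                inner.onAfter(); // second invocation of the inner callback
            }
        }
    }

    /** Runs a wrapped task once and reports how many times its onAfter() was called. */
    public static int countOnAfter(boolean forwardOnAfter) {
        AtomicInteger calls = new AtomicInteger();
        SimpleAbstractRunnable task = new SimpleAbstractRunnable() {
            @Override
            protected void doRun() {}

            @Override
            protected void onAfter() {
                calls.incrementAndGet();
            }
        };
        new Wrapper(task, forwardOnAfter).run();
        return calls.get();
    }
}
```

`countOnAfter(true)` models the behaviour before the fix (two calls) and `countOnAfter(false)` models it after the fix (one call). The same reasoning applies to the `onFailure` forwarding mentioned in the commit.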
Armin Braun	8ae11e176a	Cleanup some in o.e.transport (#49901 ) (#49971 ) Cleaning up some obvious compile warnings and dead code.	2019-12-08 16:14:20 +01:00
Stuart Tettemer	17cda5b2c0	Scripting: Groundwork for caching script results (#49895 ) (#49944 ) In order to cache script results in the query shard cache, we need to check if scripts are deterministic. This change adds a default method to the script factories, `isResultDeterministic() -> false` which is used by the `QueryShardContext`. Script results were never cached and that does not change here. Future changes will implement this method based on whether the results of the scripts are deterministic or not and therefore cacheable. Refs: #49466 Backport	2019-12-06 15:08:05 -07:00
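The groundwork commit names a default method `isResultDeterministic()` returning `false` and says it is consulted by `QueryShardContext`. The sketch below shows that shape with hypothetical stand-ins (`ScriptCacheability`, its nested `ScriptFactory`, and `RequestContext` are illustrations, not the Elasticsearch classes): every factory is treated as nondeterministic unless it opts in, and a single nondeterministic script is enough to mark the request as uncacheable.

```java
// Hypothetical sketch; only the method name isResultDeterministic() comes from the commit.
public final class ScriptCacheability {

    /** Stand-in for a script factory; the default is the conservative "not deterministic". */
    public interface ScriptFactory {
        default boolean isResultDeterministic() {
            return false; // never cache unless a factory explicitly opts in
        }
    }

    /** Stand-in for a per-request context that tracks whether results may be cached. */
    public static final class RequestContext {
        private boolean cacheable = true;

        public <F extends ScriptFactory> F compile(F factory) {
            if (!factory.isResultDeterministic()) {
                cacheable = false; // a single nondeterministic script disables caching
            }
            return factory;
        }

        public boolean isCacheable() {
            return cacheable;
        }
    }
}
```

Making the default `false` means existing factories remain uncacheable without any change, which is consistent with the commit's statement that script results were not cached before and are still not cached after this change.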
David Roberts	17fa9d5844	[TEST] Mute ConnectionManagerTests.testConcurrentConnectsAndDisconnects Due to https://github.com/elastic/elasticsearch/issues/49903	2019-12-06 17:06:34 +00:00
Alexander Reelsen	d299bf5760	Add tests for ingesting CBOR data attachments (#49715 ) Our docs specifically mention that CBOR is supported when ingesting attachments. However, this is not tested anywhere. This adds a test that specifically uses the CBOR format in its IndexRequest and another one that behaves like CBOR in the ingest attachment unit tests.	2019-12-06 14:33:39 +01:00
Orhan Toy	0f02e02d77	Consistent case in CLI option descriptions (#49635 ) This commit improves the casing of messages in the CLI help descriptions.	2019-12-05 13:36:11 -08:00
Zachary Tong	fec882a457	Decouple pipeline reductions from final agg reduction (#45796 ) Historically only two things happened in the final reduction: empty buckets were filled, and pipeline aggs were reduced (since it was the final reduction, this was safe). Usage of the final reduction is growing, however. Auto-date-histo might need to perform many reductions on final-reduce to merge down buckets, CCS may need to side-step the final reduction if sending to a different cluster, etc. Having pipelines generate their output in the final reduce was convenient, but is becoming increasingly difficult to manage as the rest of the agg framework advances. This commit decouples pipeline aggs from the final reduction by introducing a new "top level" reduce, which should be called at the beginning of the reduce cycle (e.g. from the SearchPhaseController). This will only reduce pipeline aggs on the final reduce after the non-pipeline agg tree has been fully reduced. By separating pipeline reduction into its own set of methods, aggregations are free to use the final reduction for whatever purpose without worrying about generating pipeline results which are non-reducible.	2019-12-05 16:11:54 -05:00
Tim Brooks	b281d64e89	Ensure remote strategy settings can be updated (#49812 ) This is related to #49067. As part of this work a new sniff number of node connections setting, a simple addresses setting, and a simple number of sockets setting have been added. This commit ensures that these settings are properly hooked up to support dynamic updates.	2019-12-05 10:39:57 -07:00
Jim Ferenczi	495762486d	Fix concurrent issue in SearchPhaseController (#49829 ) The list used by the search progress listener can be nullified by another thread that reports a query result. This change replaces the usage of this list with a new array that is synchronously modified. Closes #49778	2019-12-05 13:09:25 +01:00
Stuart Tettemer	426c7a5e8f	Scripting: add available languages & contexts API (#49652 ) (#49815 ) Adds `GET /_script_language` to support Kibana dynamic scripting language selection. Response contains whether `inline` and/or `stored` scripts are enabled as determined by the `script.allowed_types` settings. For each scripting language registered, such as `painless`, `expression`, `mustache` or custom, available contexts for the language are included as determined by the `script.allowed_contexts` setting. Response format: ``` { "types_allowed": [ "inline", "stored" ], "language_contexts": [ { "language": "expression", "contexts": [ "aggregation_selector", "aggs" ... ] }, { "language": "painless", "contexts": [ "aggregation_selector", "aggs", "aggs_combine", ... ] } ... ] } ``` Fixes: #49463 Backport	2019-12-04 16:18:22 -07:00
Alan Woodward	aa443c6362	[CI] Interval queries cannot be cached if they use scripts (#49824 ) not adjust testCacheability(), which now fails occasionally when given a random interval source containing a script. This commit overrides testCacheability() to explicitly test sources with and without script filters. Fixes #49821	2019-12-04 12:18:04 +00:00
Alan Woodward	312190266e	Improve coverage of equals/hashCode tests for IntervalQueryBuilder (#49820 ) By default, AbstractQueryTestCase only changes name and boost in its mutateInstance method, used when checking equals and hashcode implementations. This commit adds a mutateInstance method to IntervalQueryBuilderTests that will check hashcode and equality when the field or intervals source are changed.	2019-12-04 11:33:24 +00:00
jimczi	53d801c0d7	\#49566 Fix non-deterministic sort order in testHighlightingWithKeywordIgnoreBoundaryScanner	2019-12-04 12:23:43 +01:00
jimczi	1d522c6605	add missing change after backport of #49566	2019-12-04 11:25:47 +01:00
Jim Ferenczi	691421f287	Fix invalid break iterator highlighting on keyword field (#49566 ) By default the unified highlighter splits the input into passages using a sentence break iterator. However, we don't check whether the field is tokenized, so `keyword` fields also apply the break iterator even though they can only match on the entire content. This means that by default we'll split the content of a `keyword` field on sentence breaks if the requested number of fragments is set to a value different than 0 (defaults to 5). This commit changes this behavior to ignore the break iterator on non-tokenized fields (keyword) in order to always highlight the entire values. The number of requested fragments controls how many matched values are returned, but the boundary_scanner_type is now ignored. Note that this is the behavior in 6.x, but some refactoring of Lucene's highlighter exposed this bug in Elasticsearch 7.x.	2019-12-04 11:14:44 +01:00
Alan Woodward	408f25e016	Fixes a bug in interval filter serialization (#49793 ) There is a possible NPE in IntervalFilter xcontent serialization when scripts are used, and `equals` and `hashCode` are also incorrectly implemented for script filters. This commit fixes both.	2019-12-04 08:48:22 +00:00
Armin Braun	996cddd98b	Stop Copying Every Http Request in Message Handler (#44564 ) (#49809 ) * Copying the request is not necessary here. We can simply release it once the response has been generated and save a lot of `Unpooled` allocations that way * Relates #32228 * I think the issue that prevented that PR from being merged was solved by #39634, which moved the bulk index marker search to ByteBuf bulk access, so the composite buffer shouldn't require many additional bounds checks (I'd argue the bounds checks we add, we save when copying the composite buffer) * I couldn't necessarily reproduce much of a speedup from this change, but I could reproduce a very measurable reduction in GC time with e.g. Rally's PMC (4g heap node and bulk requests of size 5k saw a reduction in young GC time by ~10% for me)	2019-12-04 08:41:42 +01:00
Yannick Welsch	fbb92f527a	Replicate write actions before fsyncing them (#49746 ) This commit fixes a number of issues with data replication: - Local and global checkpoints are not updated after the new operations have been fsynced, but might capture a state before the fsync. The reason why this probably went undetected for so long is that AsyncIOProcessor is synchronous if you index one item at a time, and hence working as intended unless you have a high enough level of concurrent indexing. As we rely in other places on the assumption that we have an up-to-date local checkpoint in case of synchronous translog durability, there's a risk for the local and global checkpoints not to be up-to-date after replication completes, and that this won't be corrected by the periodic global checkpoint sync. - AsyncIOProcessor also has another "bad" side effect here: if you index one bulk at a time, the bulk is always first fsynced on the primary before being sent to the replica. Further, if one thread is tasked by AsyncIOProcessor to drain the processing queue and fsync, other threads can easily pile more bulk requests on top of that thread. Things are not very fair here, and the thread might continue doing a lot more fsyncs before returning (as the other threads pile more and more on top), which blocks it from returning as a replication request (e.g. if this thread is on the primary, it blocks the replication requests to the replicas from going out, delaying checkpoint advancement). This commit fixes all these issues, and also simplifies the code that coordinates all the after write actions.	2019-12-03 12:22:46 +01:00
Mayya Sharipova	3bbaa01764	Disable sort optimization when index is sorted (#49727 ) Don't run long sort optimization when index is already sorted on the same field as the sort query parameter. Relates to #37043, follow up for #48804	2019-12-02 17:05:21 -05:00
Mayya Sharipova	ad274dd797	Mute testIndexHasDuplicateData (#49779 ) Related to #49703	2019-12-02 17:05:01 -05:00
jimczi	3eae180b8b	add new version 7.5.1	2019-12-02 20:14:47 +01:00
Armin Braun	5f766a66fb	Make Snapshot Metadata Javadocs Clearer (#49697 ) (#49771 ) We always use the snapshot name at the shard level; let's make that crystal clear in the docs.	2019-12-02 19:14:34 +01:00
Ignacio Vera	ff00174b61	Add CoreValuesSourceTypeTests for histogram (#49751 ) (#49765 )	2019-12-02 16:21:56 +01:00
Christoph Büscher	04ace7a6da	Add note how to run locale sensitive unit test (#49491 ) Some unit tests checking locale-sensitive functionality require the -Djava.locale.providers=SPI,COMPAT flag to be set. When running tests through Gradle we already pass this to the BuildPlugin, but when running from the IDE it might need to be set manually. Adding a note explaining this to the CONTRIBUTING.md doc and leaving a note in the test comment of SearchQueryIT.testRangeQueryWithLocaleMapping, which is a test we know suffers from this issue.	2019-12-02 11:21:56 +01:00
Henning Andersen	5adb33ec17	Deprecate sorting in reindex (#49458 ) (#49738 ) Reindex sort never gave a guarantee about the order of documents being indexed into the destination, though it could give a sense of locality of source data. It prevents us from doing resilient reindex and other optimizations and it has therefore been deprecated. Related to #47567	2019-12-01 19:24:27 +01:00
Mayya Sharipova	62a891bfa3	Add bulkScorer to script score query (#46336 ) (#49734 ) Some queries return bulk scorers that can be significantly faster than iterating naively over the scorer. By giving script_score a BulkScorer that would delegate to the wrapped query, we could make it faster in some cases. Closes #40837	2019-11-29 16:51:50 -05:00
Henning Andersen	1d745f1e5c	Revert "Deprecate sorting in reindex (#49458 )" This reverts commit `27d45c9f1f`.	2019-11-29 22:08:19 +01:00
Mayya Sharipova	7cf170830c	Optimize sort on numeric long and date fields. (#49732 ) This rewrites long sort as a `DistanceFeatureQuery`, which can efficiently skip non-competitive blocks and segments of documents. Depending on the dataset, the speedups can be 2 - 10 times. The optimization can be disabled by setting the system property `es.search.rewrite_sort` to `false`. Optimization is skipped when an index has 50% or more data with the same value. Optimization is done through: 1. Rewriting sort as `DistanceFeatureQuery`, which can efficiently skip non-competitive blocks and segments of documents. 2. Sorting segments according to the primary numeric sort field (#44021). This allows skipping non-competitive segments. 3. Using a collector manager. When we optimize sort, we sort segments by their min/max value. As a collector expects to have segments in order, we cannot use a single collector for sorted segments. We use a collectorManager, where a dedicated collector is created for every segment. 4. Using Lucene's shared TopFieldCollector manager. This collector manager is able to exchange the minimum competitive score between collectors, which allows us to efficiently skip whole segments that don't contain competitive scores. 5. When an index is force merged to a single segment, #48533 interleaving old and new segments allows for this optimization as well, as blocks with non-competitive docs can be skipped. Backport of #48804 Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>	2019-11-29 15:37:40 -05:00
Henning Andersen	27d45c9f1f	Deprecate sorting in reindex (#49458 ) Reindex sort never gave a guarantee about the order of documents being indexed into the destination, though it could give a sense of locality of source data. It prevents us from doing resilient reindex and other optimizations and it has therefore been deprecated. Related to #47567	2019-11-29 21:35:11 +01:00
Yannick Welsch	c2d316a22f	Remove obsolete resolving logic from TRA (#49685 ) This stems from a time where index requests were directly forwarded to TransportReplicationAction. Nowadays they are wrapped in a BulkShardRequest, and this logic is obsolete. In contrast to the prior PR (#49647), this PR also fixes (see b3697cc) a situation where the previous index expression logic had an interesting side effect. For bulk requests (which had resolveIndex = false), the reroute phase was waiting for the index to appear in case it was not present, and for all other replication requests (resolveIndex = true) it would immediately throw an IndexNotFoundException while resolving the name and exit. With #49647, every replication request was now waiting for the index to appear, which was problematic when the given index had just been deleted (e.g. deleting a follower index while it's still receiving requests from the leader, where these requests would now wait up to a minute for the index to appear). This PR now adds b3697cc on top of that prior PR to reestablish some of the prior behavior where the reroute phase waits for the bulk request for the index to appear. That logic was in place to ensure that when an index was created and not all nodes had learned about it yet, the bulk would not fail somewhere in the reroute phase. This is now restricted to the situation where the current node has an older cluster state than the one that coordinated the bulk request (which checks that the index is present). This also means that when an index is deleted, we will no longer unnecessarily wait up to the timeout for the index to appear, and instead fail the request. Closes #20279	2019-11-29 15:24:07 +01:00
Armin Braun	813b49adb4	Make BlobStoreRepository Aware of ClusterState (#49639 ) (#49711 ) * Make BlobStoreRepository Aware of ClusterState (#49639) This is a preliminary step for #49060. It does not introduce any substantial behavior change to how the blob store repository operates. What it does is add all the infrastructure changes around passing the cluster service to the blob store, associated test changes and a best effort approach to tracking the latest repository generation on all nodes from cluster state updates. This brings a slight improvement to the consistency with which non-master nodes (or master directly after a failover) will be able to determine the latest repository generation. To keep this change simple, it does not, however, do any tricky checks for the situation after a repository operation (create, delete or cleanup) that could theoretically be used to get even greater accuracy. This change does not in any way alter the behavior of the blobstore repository other than adding a better "guess" for the value of the latest repo generation and is mainly intended to isolate the actual logical change to how the repository operates in #49060	2019-11-29 14:57:47 +01:00
Jim Ferenczi	496bb9e2ee	Add a listener to track the progress of a search request locally (#49471 ) (#49691 ) This commit adds a function in NodeClient that allows tracking the progress of a search request locally. Progress is tracked through a SearchProgressListener that exposes query and fetch responses as well as partial and final reduces. This new method can be used by modules/plugins inside a node in order to track the progress of a local search request. Relates #49091	2019-11-28 18:23:09 +01:00
Mayya Sharipova	2dafecc398	Upgrade lucene to 8.4.0-snapshot-e648d601efb (#49641 )	2019-11-28 11:59:58 -05:00
Adrien Grand	1824a2fa58	Pure disjunctions should rewrite to a MatchNoneQueryBuilder (#48557 ) (#49673 ) Closes #48475	2019-11-28 15:54:32 +01:00
Ignacio Vera	326fe7566e	New Histogram field mapper that supports percentiles aggregations. (#48580 ) (#49683 ) This commit adds a new histogram field mapper that consists in a pre-aggregated format of numerical data to be used in percentiles aggregations.	2019-11-28 15:06:26 +01:00
Yannick Welsch	04e9cbd6eb	Revert "Remove obsolete resolving logic from TRA (#49647 )" This reverts commit `0827ea2175`.	2019-11-28 13:12:07 +01:00
Yannick Welsch	0827ea2175	Remove obsolete resolving logic from TRA (#49647 ) This stems from a time where index requests were directly forwarded to TransportReplicationAction. Nowadays they are wrapped in a BulkShardRequest, and this logic is obsolete. Closes #20279	2019-11-28 12:11:27 +01:00
Jim Ferenczi	d6445fae4b	Add a cluster setting to disallow loading fielddata on _id field (#49166 ) This change adds a dynamic cluster setting named `indices.id_field_data.enabled`. When set to `false` any attempt to load the fielddata for the `_id` field will fail with an exception. The default value in this change is set to `false` in order to prevent fielddata usage on this field for future versions but it will be set to `true` when backporting to 7x. When the setting is set to true (manually or by default in 7x) the loading will also issue a deprecation warning since we want to disallow fielddata entirely when https://github.com/elastic/elasticsearch/issues/26472 is implemented. Closes #43599	2019-11-28 09:35:28 +01:00
Christos Soulios	d66795fdf0	Fix typo when assigning null_value in GeoPointFieldMapper (#49655 ) Backport of #49645 to 7.x This PR fixes a trivial typo error that affects assigning null_value in the GeoPointFieldMapper	2019-11-27 20:50:27 +02:00
Martijn van Groningen	0a42395dfa	Backport: add templating support to pipeline processor (#49643 ) Backport of #49030 This commit adds templating support to the pipeline processor's `name` option. Closes #39955	2019-11-27 15:53:40 +01:00
Przemyslaw Gomulka	502873b144	[Java.time] Retain prefixed date pattern in formatter (#48703 ) JavaDateFormatter should keep the pattern with the prefixed 8 as it will be used for serialisation. The stripped pattern should be used for the enclosed formatters. closes #48698	2019-11-27 12:29:18 +01:00
Yannick Welsch	0a73ba05de	Do not mutate request on scripted upsert (#49578 ) Fixes a bug where a scripted upsert that causes a dynamic mapping update is retried (because mapping update is still in-flight), and the request is mutated multiple times. Closes #48670	2019-11-27 09:25:36 +01:00
Martijn van Groningen	09c4269097	Add templating support to enrich processor (#49093 ) Adds support for templating to `field` and `target_field` options.	2019-11-27 08:53:11 +01:00
Martijn van Groningen	90850f4ea0	Backport: Introduce on_failure_pipeline ingest metadata inside on_failure block (#49596 ) Backport of #49076. In case an exception occurs inside a pipeline processor, the pipeline stack is kept around as a header in the exception. Then in the on_failure processor, the id of the pipeline in which the exception occurred is made accessible via the `on_failure_pipeline` ingest metadata. Closes #44920	2019-11-27 07:52:08 +01:00
Armin Braun	996cdebfb4	Make BlobStoreRepository#writeIndexGen API Async (#49584 ) (#49610 ) A preliminary step to shorten the diff of #49060. In #49060 we execute cluster state updates while writing a new index gen, so it must be an async API.	2019-11-26 22:37:31 +01:00
Armin Braun	3862400270	Remove Redundant EsBlobStoreTestCase (#49603 ) (#49605 ) All the implementations of `EsBlobStoreTestCase` use the exact same bootstrap code that is also used by their implementation of `EsBlobStoreContainerTestCase`. This means all tests might as well live under `EsBlobStoreContainerTestCase`, saving a lot of code duplication. Also, there was no HDFS implementation for `EsBlobStoreTestCase`, which is now automatically resolved by moving the tests over, since there is an HDFS implementation for the container tests.	2019-11-26 20:57:19 +01:00
Alan Woodward	fe2c65185e	Annotated text type should extend TextFieldType (#49555 ) The annotated text mapper has a field type that currently extends StringFieldType, which means that all the positional-related query factory methods need to be copied over from TextFieldType. In addition, MappedFieldType.intervals() hasn't been overridden, so you can't use intervals queries with annotated text - a major drawback, since one of the purposes of annotated text is to be able to run positional queries against annotations. This commit changes the annotated text field type to extend TextFieldType instead, adding tests to ensure that position queries work correctly. Closes #49289	2019-11-26 16:52:21 +00:00
Armin Braun	495b543e63	Improve Stability of GCS Mock API (#49592 ) (#49597 ) Pretty much the same as #49518, but for GCS. Fixing a few more spots where the input stream can get closed without being fully drained and adding assertions to make sure it's always drained. Moved the no-close stream wrapper to production code utilities since there are a number of spots in production code where it's also useful (will reuse it there in a follow-up).	2019-11-26 16:53:51 +01:00
Rory Hunter	cf5f013033	Return 400 when handling invalid JSON (#49558 ) Backport of #49552. Closes #49428. The code that works out an HTTP code for an exception didn't consider the JsonParseException case, which meant that an invalid JSON request could result in a 500 Internal Server Error. Now it returns 400 Bad Request.	2019-11-26 12:36:56 +00:00
Tim Brooks	416178c7c8	Enable simple remote connection strategy (#49561 ) This commit backports three commits related to enabling the simple connection strategy. Allow simple connection strategy to be configured (#49066) Currently the simple connection strategy only exists in the code and cannot be configured. This commit moves in the direction of allowing it to be configured. It introduces settings for the addresses and socket count. Additionally, it introduces new settings for the sniff strategy so that the more generic number of connections and seed node settings can be deprecated. The simple settings are not yet registered, as the registration depends on follow-up work to validate the settings. Ensure at least 1 seed configured in remote test (#49389) This fixes #49384. Currently when we select a random subset of seed nodes from a list, it is possible for 0 seeds to be selected. This test depends on at least 1 seed being selected. Add the simple strategy to cluster settings (#49414) This is related to #49067. This commit adds the simple connection strategy settings and strategy mode setting to the cluster settings registry. With these changes, the simple connection mode can be used. Additionally, it adds validation to ensure that settings cannot be misconfigured.	2019-11-25 16:53:07 -07:00
Zachary Tong	99e313695f	Reuse CompensatedSum object in agg collect loops (#49548 ) The new CompensatedSum is a nice DRY refactor, but had the unanticipated side effect of creating a lot of object allocation in the aggregation hot collection loop: one object per visited document, per aggregator. In some places it created two per-doc-per-agg (weighted avg, geo centroids, etc) since there were multiple compensations being maintained. This PR moves the object creation out of the hot loop so that it is now created once per segment, and resets the internal state each time through the loop	2019-11-25 16:46:48 -05:00
Armin Braun	2502ff39a0	Enhance SnapshotResiliencyTests (#49514 ) (#49541 ) A few enhancements to `SnapshotResiliencyTests`: 1. Test running requests from random nodes in more spots to enhance coverage (this is particularly motivated by #49060 where the additional number of cluster state updates makes it more interesting to fully cover all kinds of network failures) 2. Fix issue with restarting only the master node in one test (doing so breaks the test at an incredibly low frequency, which becomes not so low in #49060 with the additional cluster state updates between request and response) 3. Improved cluster formation checks (now properly checks the term as well when forming the cluster) + makes sure all nodes are connected to all other nodes (previously the data nodes would at times not be connected to other data nodes, which was shaken out now by adding the `client()` method) 4. Make sure the cluster left behind by the test makes sense by running the repo cleanup action on it (this also obviously increases coverage of the repository cleanup action and lays the basis for making it part of more resiliency tests)	2019-11-25 13:31:28 +01:00
Jared Tan	1d2bfd1af6	Include id to the error msg when it's too long (#49433 )	2019-11-24 13:08:26 -05:00
Jason Tedor	69f570ea5f	Adjust version on final pipeline serialization This commit adjusts the version on final pipeline serialization after it was backported to the 7.5 branch.	2019-11-22 14:56:56 -05:00
Jay Modi	4fd5fb5297	Stop NodeTests from timing out in certain cases (#49202 ) (#49503 ) The NodeTests class contains tests that check behavior when shutting down a node. This involves starting a node, performing some operation, stopping the node, and then awaiting the close of the node. Part of closing a node is the termination of the node's ThreadPool. ThreadPool termination semantics can be deceiving. The ThreadPool#terminate method takes a timeout value, and the first oddity is that it can take up to twice the timeout value before returning. Internally this method acts on the ExecutorService instances that are held by the ThreadPool. First, an orderly shutdown is attempted and pending tasks are allowed to execute while waiting for the timeout value. If any of the ExecutorService instances have not terminated, a call is made to attempt to stop all active tasks (usually using interrupts), and the method then waits up to the timeout value a second time for the termination of the ExecutorService instances. This means that if we use a large value when waiting for a node to close, we may not attempt to interrupt any threads that are in a blocking call before the test times out. In order to avoid causing these tests to time out, this change reduces the timeout passed to Node#awaitClose to 10 seconds from 1 day. This will allow blocked threads to be interrupted before the test suite fails due to the timeout. Closes #44256 Closes #42350 Closes #44435	2019-11-22 12:41:52 -07:00
Jason Tedor	71bcfbf1e3	Replace required pipeline with final pipeline (#49470 ) This commit enhances the required pipeline functionality by changing it so that default/request pipelines can also be executed, but the required pipeline is always executed last. This gives users the flexibility to execute their own indexing pipelines, but also ensure that any required pipelines are also executed. Since such pipelines are executed last, we change the name of required pipelines to final pipelines.	2019-11-22 14:37:36 -05:00
Armin Braun	97c7ea60b9	Add Missing Nullable Assertions in SnapshotsService (#49465 ) (#49492 ) Just realized we were missing some annotations here which was somewhat confusing since other methods/parameters have the `Nullable` annotation wherever a `null` can be passed.	2019-11-22 17:27:27 +01:00
Rory Hunter	4fae2bb3b1	Don't close stderr under `--quiet` (#49431 ) Backport of #47208. Closes #46900. When running ES with `--quiet`, if ES then exits abnormally, a user has to go hunting in the logs for the error. Instead, never close System.err, and print more information to it if ES encounters a fatal error e.g. config validation, or some fatal runtime exception. This is useful when running under e.g. systemd, since the error will go into the journal. Note that stderr is still closed in daemon (`-d`) mode.	2019-11-22 14:58:17 +00:00
Jim Ferenczi	ed4eecc00e	Pre-sort shards based on the max/min value of the primary sort field (#49092 ) This change automatically pre-sorts search shards on search requests that use a primary sort based on the value of a field. When possible, the can_match phase will extract the min/max (depending on the provided sort order) values of each shard and use them to pre-sort the shards prior to running the subsequent phases. This feature can be useful to ensure that shards that contain recent data are executed first so that intermediate merges have a better chance of containing contiguous data (think of date_histogram, for instance), but it could also be used in a follow-up to early-terminate sorted top-hits queries that don't require the total hit count. The latter could significantly speed up the retrieval of the most/least recent documents from time-based indices. Relates #49091	2019-11-22 11:02:12 +01:00
Igor Motov	e8971ff367	Geo: Fix handling of circles in legacy geo_shape queries (#49410 ) Brings back support for circles in legacy geo_shape queries that was accidentally lost during query refactoring. Fixes #49296	2019-11-21 14:03:31 -05:00
Christoph Büscher	138d16ab9e	Fix ClusterHealthResponsesTests condition (#49360 ) Currently the condition that is supposed to test creation of test instances with multiple indices is never true because it compares Strings with an enum. This changes it so the condition uses the enum constants instead.	2019-11-21 17:14:23 +01:00
Alan Woodward	d1eb7e749e	Fix test for index phrases shortcut with multi-term synonyms (#49366 ) Lucene 8.3 included a root fix for #43976, which was temporarily fixed in elasticsearch by #44340. Since we have upgraded to 8.3 we no longer need this workaround. This commit fixes the test that was added to check the workaround, and instead checks that fields with index_phrases enabled correctly build queries when used with multi-term synonyms. Closes #47777	2019-11-21 09:49:58 +00:00
Yannick Welsch	d72bd3a171	Verify translog checksum before UUID check (#49394 ) When opening a translog file, we check whether the UUID matches what we expect (the UUID from the latest commit). The UUID check can in certain cases fail when the translog is corrupted. This commit changes the ordering of the checks so that corruption is detected first.	2019-11-21 10:12:49 +01:00
Yannick Welsch	8ee70fa9c6	Fix testPeerRecoveryTrimsLocalTranslog (#49385 ) 7.x uses the transport client, which, when being closed, can throw an IllegalStateException Closes #49375	2019-11-21 10:03:25 +01:00
Nhat Nguyen	37a9cd677b	Ignore Lucene index in peer recovery if translog corrupted (#49114 ) If the translog on a replica is corrupt, we should not perform an operation-based recovery or utilize sync_id as we won't be able to open an engine in the next step. This change adds an extra validation that ensures translog is okay when preparing a peer recovery request.	2019-11-20 16:04:09 -05:00
jaymode	d9fd4cc351	Add version 6.8.6	2019-11-20 11:01:57 -07:00
Jim Ferenczi	81548df2d9	Disable caching when queries are profiled (#48195 ) This change disables the query and request cache when profile is set to true in the request. This means that profiled queries will not check caches to execute the query and the result will never be added in the cache either. Closes #33298	2019-11-20 16:02:59 +01:00
Armin Braun	1cde4a6364	Make SnapshotsService#getRepositoryData Async (#49322 ) (#49358 ) * Make SnapshotsService#getRepositoryData Async (#49322) Follow up to #49299 removing the blocking step for the snapshot status APIs as well.	2019-11-20 15:22:10 +01:00
Alan Woodward	c6b31162ba	Refactor percolator's QueryAnalyzer to use QueryVisitors Lucene now allows us to explore the structure of a query using QueryVisitors, delegating the knowledge of how to recurse through and collect terms to the query implementations themselves. The percolator currently has a home-grown external version of this API to construct sets of matching terms that must be present in a document in order for it to possibly match the query. This commit removes the home-grown implementation in favour of one using QueryVisitor. This has the added benefit of making interval queries available for percolator pre-filtering. Due to a bug in multi-term intervals (LUCENE-9050) it also includes a clone of some of the lucene intervals logic, that can be removed once upstream has been fixed. Closes #45639	2019-11-20 09:21:01 +00:00
Mark Tozzi	17358b5af7	(refactor) Extract Empty/Script/Missing ValuesSource behavior to an interface (#48320 ) (#49330 ) This is a pure code rearrangement refactor. Logic for what specific ValuesSource instance to use for a given type (e.g. script or field) moved out of ValuesSourceConfig and into CoreValuesSourceType (previously just ValueSourceType; we extract an interface for future extensibility). ValueSourceConfig still selects which case to use, and then the ValuesSourceType instance knows how to construct the ValuesSource for that case.	2019-11-19 16:44:29 -05:00
Jay Modi	eed4cd25eb	ThreadPool and ThreadContext are not closeable (#43249 ) (#49273 ) This commit changes the ThreadContext to just use a regular ThreadLocal over the lucene CloseableThreadLocal. The CloseableThreadLocal solves issues with ThreadLocals that are no longer needed during runtime but in the case of the ThreadContext, we need it for the runtime of the node and it is typically not closed until the node closes, so we miss out on the benefits that this class provides. Additionally by removing the close logic, we simplify code in other places that deal with exceptions and tracking to see if it happens when the node is closing. Closes #42577	2019-11-19 13:15:16 -07:00
Jack Conradson	14d2e795ae	make dim files mmapped (#49272 ) This change mmaps dim files in HybridDirectory to take advantage of off-heap BKD trees. This is based off of (#48509) via (https://issues.apache.org/jira/browse/LUCENE-8932).	2019-11-19 10:22:30 -08:00
Armin Braun	0acba44a2e	Make Repository.getRepositoryData an Async API (#49299 ) (#49312 ) This API call in most implementations is fairly IO-heavy and slow, so it is more natural for it to be async in the first place. Concretely, though, this change is a prerequisite for #49060, since determining the repository generation from the cluster state introduces situations where this call would have to wait for other operations to finish. Doing so in a blocking manner would break `SnapshotResiliencyTests` and waste a thread. Also, this sets up the possibility of making use of async IO in the future where the underlying Repository implementation provides it. In a follow-up, `SnapshotsService#getRepositoryData` will be made async as well (not done here, since that is another huge change). Note: This change for now does not alter the threading behaviour in any way (since `Repository#getRepositoryData` isn't forking) and is purely mechanical.	2019-11-19 16:49:12 +01:00
Armin Braun	9c00648314	Make Snapshot Delete Concurrency Exception Consistent (#49266 ) (#49281 ) We shouldn't be throwing `RepositoryException` when the repository wasn't concurrently modified in an unexpected fashion (i.e. on the blob/file level). When we know that the known repo gen moved higher in terms of the generation tracked in master memory we should throw the concurrent snapshot exception. This change makes concurrent snapshot create and delete always throw the same exception, prevents unnecessary listings when the generation is known to be off and prevents future test failures in SLM tests that assume the concurrent snapshot exception is always thrown here. Without this change, the newly added test randomly fails the `instanceOf` assertion by running into a `RepositoryException`.	2019-11-19 09:50:52 +01:00
Henning Andersen	2ac38fd315	Reindex and friends fail on RED shards (#45830 ) Reindex, update by query and delete by query would silently disregard RED/unavailable shards, thus not copying, updating or deleting matching data in those shards. These operations now use `allow_partial_search_results=false` to ensure they fail if the search crosses an unavailable shard. Added the option to explicitly specify `allow_partial_search_results=true` for reindex only (it seemed too strange for update/delete by query). Relates #45739 and #42612	2019-11-18 21:23:08 +01:00
Benjamin Trent	eefe7688ce	[7.x][ML] ML Model Inference Ingest Processor (#49052 ) (#49257 ) * [ML] ML Model Inference Ingest Processor (#49052) * [ML][Inference] adds lazy model loader and inference (#47410) This adds a couple of things: - A model loader service that is accessible via transport calls. This service will load in models and cache them. They will stay loaded until a processor no longer references them - A Model class and its first sub-class LocalModel. Used to cache model information and run inference. - Transport action and handler for requests to infer against a local model Related Feature PRs: * [ML][Inference] Adjust inference configuration option API (#47812) * [ML][Inference] adds logistic_regression output aggregator (#48075) * [ML][Inference] Adding read/del trained models (#47882) * [ML][Inference] Adding inference ingest processor (#47859) * [ML][Inference] fixing classification inference for ensemble (#48463) * [ML][Inference] Adding model memory estimations (#48323) * [ML][Inference] adding more options to inference processor (#48545) * [ML][Inference] handle string values better in feature extraction (#48584) * [ML][Inference] Adding _stats endpoint for inference (#48492) * [ML][Inference] add inference processors and trained models to usage (#47869) * [ML][Inference] add new flag for optionally including model definition (#48718) * [ML][Inference] adding license checks (#49056) * [ML][Inference] Adding memory and compute estimates to inference (#48955) * fixing version of indexed docs for model inference	2019-11-18 13:19:17 -05:00
gpaimla	7d20b50f45	Implement Lucene EstonianAnalyzer, Stemmer (#49149 ) This PR adds a new analyzer and stemmer for the Estonian language. Closes #48895	2019-11-18 17:24:21 +01:00
Armin Braun	25cc8e3663	Fix RepoCleanup not Removed on Master-Failover (#49217 ) (#49239 ) The logic for `cleanupInProgress()` was backwards everywhere (the method itself and all but one caller). Also, we weren't checking it when removing a repository. This led to a bug (in the one spot that didn't use the method backwards) that prevented the cleanup cluster state entry from ever being removed from the cluster state if master failed over during the cleanup process. This change corrects the backwards logic, adds a test that makes sure the cleanup is always removed, and adds a check to the repositories service that prevents repository removal during cleanup. Also, the failure handling logic in the cleanup action was broken. Repeated invocation would lead to the cleanup being removed from the cluster state even if it was in progress. Fixed by adding a flag that indicates whether or not any removal of the cleanup task from the cluster state must be executed. Sorry for mixing this in here, but I had to fix it in the same PR, as the first test (for master-failover) would otherwise often just delete the blocked cleanup action as a result of a transport master action retry.	2019-11-18 16:44:09 +01:00
Armin Braun	f7d9e7bdc4	Better Exceptions on Concurrent Snapshot Operations (#49220 ) (#49237 ) * Better Exceptions on Concurrent Snapshot Operations It is somewhat tricky to debug test failures from concurrent operations without having the exact knowledge of what ran concurrently so I added it to these exceptions in all spots.	2019-11-18 14:12:55 +01:00
Armin Braun	42268f0b0e	Fix Broken Network Disruption in SnapshotResiliencyTests (#49216 ) (#49231 ) The network disruption was acting on node ids and node names which made reconnects not work. Moved all usages to node names to fix this. Since the map of all nodes in the test is indexed by name this was easier to work with.	2019-11-18 12:02:27 +01:00
Yannick Welsch	af797a77a1	Auto-expand indices according to allocation filtering rules (#48974 ) Honours allocation filtering rules when auto-expanding indices.	2019-11-18 12:01:56 +01:00
Armin Braun	2886d4c6dd	Make FsBlobContainer Listing Resilient to Concurrent Modifications (#49142 ) (#49176 ) * Make FsBlobContainer Listing Resilient to Concurrent Modifications If we list out files in a folder via the lazily computed directory stream, we have to deal with concurrent deletes when reading the file attributes since we don't have a lock on the directory in any way. Closes #37581	2019-11-15 21:14:53 +01:00
Mark Tozzi	dad68c59fe	Avoid precision loss in DocValueFormat.RAW#parseLong (#49063 ) (#49169 )	2019-11-15 12:32:26 -05:00
markharwood	c3745b03ee	Search optimisation - add canMatch early aborts for queries on "_index" field (#49158 ) Make queries on the “_index” field fast-fail if the target shard is an index that doesn’t match the query expression. Part of the “canMatch” phase optimisations. Closes #48473	2019-11-15 16:50:32 +00:00
Jason Tedor	36dc544819	Adjust version on ingest processor exception The dedicated ingest processor exception was backported to 7.5. This commit updates the version in the 7.x branch.	2019-11-15 09:35:12 -05:00
Armin Braun	fc505aaa76	Track Repository Gen. in BlobStoreRepository (#48944 ) (#49116 ) This is intended as a stop-gap solution/improvement to #38941 that prevents repo modifications without an intermittent master failover from causing inconsistent (outdated due to inconsistent listing of index-N blobs) `RepositoryData` to be written. Tracking the latest repository generation will move to the cluster state in a separate pull request. This is intended as a low-risk change to be backported as far as possible and motivated by the recently increased chance of #38941 causing trouble via SLM (see https://github.com/elastic/elasticsearch/issues/47520). Closes #47834 Closes #49048	2019-11-15 09:54:53 +01:00
Tal Levy	5cd6f64f15	Introduce faster approximate sinh/atan math functions (#49009 ) (#49110 ) This commit introduces a new class called ESSloppyMath that is meant to reflect the purpose of Lucene's SloppyMath, but add additional unimplemented faster alternatives to math functions. The two that are used by geotile-grid a lot are sinh/atan. In a quick elasticsearch rally benchmark for geotile-grid on Switzerland data points, this shows a (1.22x) 22% speed-up over using Math's functions. closes #41166.	2019-11-14 14:15:34 -08:00
bellengao	6ce04429c6	Fix `_analyze` API to correctly use normalizers when specified (#48866 ) Currently the `_analyze` endpoint doesn't correctly use normalizers specified in the request. This change fixes that by returning the resolved normalizer from TransportAnalyzeAction#getAnalyzer and updates test to be able to catch this in the future. Closes #48650	2019-11-14 19:51:11 +01:00
Jason Tedor	2bcdcb17cd	Introduce dedicated ingest processor exception (#48810 ) Today we wrap exceptions that occur while executing an ingest processor in an ElasticsearchException. Today, in ExceptionsHelper#unwrapCause we only unwrap causes for exceptions that implement ElasticsearchWrapperException, which the top-level ElasticsearchException does not. Ultimately, this means that any exception that occurs during processor execution does not have its cause unwrapped, and so its status is blanket treated as a 500. This means that while executing a bulk request with an ingest pipeline, document-level failures that occur during a processor will cause the status for that document to be treated as 500. Since that does not give the client any indication that they made a mistake, it means some clients will enter infinite retries, thinking that there is some server-side problem that merely needs to clear. This commit addresses this by introducing a dedicated ingest processor exception, so that its causes can be unwrapped. While we could consider a broader change to unwrap causes for more than just ElasticsearchWrapperExceptions, that is a broad change with unclear implications. Since the problem of reporting 500s on client errors is a user-facing bug, we take the conservative approach for now, and we can revisit the unwrapping in a future change.	2019-11-14 11:04:53 -05:00
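The status problem described above can be shown with a small, self-contained sketch. Causes are only unwrapped through exceptions that carry a marker interface, so a plain wrapper hides a client error behind a 500, while a dedicated wrapper exception lets the 400 through. `WrapperMarker`, `ProcessorFailure` and `StatusMapping` are hypothetical stand-ins for `ElasticsearchWrapperException`, the new ingest processor exception and `ExceptionsHelper`. The 10-level depth cap and the use of `IllegalArgumentException` as the only client error are also assumptions.

```java
// Marker for exceptions whose cause should decide the response status
interface WrapperMarker {}

// Dedicated wrapper for processor failures: opts in to cause unwrapping
class ProcessorFailure extends RuntimeException implements WrapperMarker {
    ProcessorFailure(Throwable cause) {
        super(cause);
    }
}

class StatusMapping {
    // Follows the cause chain only through WrapperMarker exceptions, with a depth cap
    static Throwable unwrapCause(Throwable t) {
        Throwable result = t;
        for (int depth = 0; depth < 10; depth++) {
            if (result instanceof WrapperMarker && result.getCause() != null) {
                result = result.getCause();
            } else {
                break;
            }
        }
        return result;
    }

    // Client errors (modelled here as IllegalArgumentException) map to 400, anything else to 500
    static int status(Throwable t) {
        return unwrapCause(t) instanceof IllegalArgumentException ? 400 : 500;
    }
}
```

A plain `RuntimeException` wrapping an `IllegalArgumentException` still maps to 500 here. This is the blanket-500 behaviour the commit describes.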
Christoph Büscher	6c5644335f	Simplify TransportMultiSearchActionTests (#48523 ) The test doesn't seem to need the threadpool that is created and destroyed in setup and teardown any longer, so it can be removed.	2019-11-14 14:48:16 +01:00
Rory Hunter	c46a0e8708	Apply 2-space indent to all gradle scripts (#49071 ) Backport of #48849. Update `.editorconfig` to make the Java settings the default for all files, and then apply a 2-space indent to all `*.gradle` files. Then reformat all the files.	2019-11-14 11:01:23 +00:00
Henning Andersen	66f0c8900f	Fix Transport Stopped Exception (#48930 ) (#49035 ) When a node shuts down, `TransportService` moves to stopped state and then closes connections. If a request is done in between, an exception was thrown that was not retried in replication actions. Now throw a wrapped `NodeClosedException` exception instead, which is correctly handled in replication action. Fixed other usages too. Relates #42612	2019-11-13 18:48:05 +01:00
Yannick Welsch	2dfa0133d5	Always use primary term from primary to index docs on replica (#47583 ) Ensures that we always use the primary term established by the primary to index docs on the replica. Makes the logic around replication less brittle by always using the operation primary term on the replica that is coming from the primary.	2019-11-13 12:13:45 +01:00
Igor Motov	40776eedaf	Fix ignoring missing values in min/max aggregations (#48970 ) Fixes the issue when the missing values can be ignored in min/max due to BKD optimization. Fixes #48905	2019-11-12 19:57:28 -05:00
Armin Braun	0e1035241d	Fix Broken Snapshots in Mixed Clusters (#48993 ) (#48995 ) Reverts #48947 and fixes the issue originally addressed by removing the assertion. It turns out we can't simply pass empty shard generations to the snapshot finalization in the BwC case as that results in no indices being added to the meta for the given snapshot since we take the indices from the shard generations (even in the BwC case the `null` generations work fine for this). Closes #48983	2019-11-12 21:35:41 +01:00
David Turner	9baea80853	Ignore metadata of deleted indices at start (#48918 ) Today in 6.x it is possible to add an index tombstone to the graveyard without deleting the corresponding index metadata, because the deletion is slightly deferred. If you shut down the node and upgrade to 7.x when in this state then the node will fail to apply any cluster states, reporting java.lang.IllegalStateException: Cannot delete index [...], it is still part of the cluster state. This commit addresses this situation by skipping over any index metadata with a corresponding tombstone, allowing this metadata to be cleaned up by the 7.x node.	2019-11-12 11:16:54 +00:00
David Turner	dc441588b6	Remove support for ancient corrupted markers (#48858 ) Today we still support reading store corruption markers of versions that haven't been written since 1.7. This commit removes this legacy support.	2019-11-12 11:10:46 +00:00
Yannick Welsch	ab15bce4e7	Auto-expand replicated closed indices (#48973 ) Fixes a bug where replicated closed indices were not being auto-expanded.	2019-11-12 12:00:05 +01:00
Tim Brooks	0645ee88e2	Send cluster name and discovery node in handshake (#48916 ) This commit sends the cluster name and discovery node in the transport-level handshake response. This will allow us to stop sending the transport service level handshake request in the 8.0-8.x release cycle. It is necessary to start sending this in 7.x so that 8.0 is guaranteed to be communicating with a version that sends the required information.	2019-11-11 18:42:02 -05:00
Jake Landis	c320b499a0	Prevent deadlock by using separate schedulers (#48697 ) (#48964 ) Currently the BulkProcessor class uses a single scheduler to schedule flushes and retries. Functionally these are very different concerns, but sharing one scheduler can result in a deadlock. Specifically, the single shared scheduler can kick off a flush task, which only finishes its task when the bulk that is being flushed finishes. If (for whatever reason) any items in that bulk fail, it will (by default) schedule a retry. However, that retry will never run its task, since the flush task is consuming the one and only thread available from the shared scheduler. Since the BulkProcessor is mostly client-based code, the client can provide their own scheduler. As-is, the scheduler would require at minimum 2 worker threads to avoid the potential deadlock. Since the number of threads is a configuration option in the scheduler, the code cannot enforce this 2-worker rule until runtime. For this reason this commit splits the single task scheduler into 2 schedulers. This eliminates the potential for the flush task to block the retry task and removes this deadlock scenario. This commit also deprecates the Java APIs that presume a single scheduler, and updates any internal code to no longer use those APIs. Fixes #47599 Note - #41451 fixed the general case where a bulk fails and is retried that can result in a deadlock. This fix should address that case as well as the case when a bulk failure from the flush needs to be retried.	2019-11-11 16:31:21 -06:00
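The starvation described above can be reproduced with plain `java.util.concurrent`. A "flush" task schedules a "retry" and blocks waiting for it. On a shared single-thread scheduler the retry can never run, whereas with separate schedulers it completes. This is a generic illustration, not the BulkProcessor API. `FlushRetryDemo` and `flushCompletes` are hypothetical names, and a timeout replaces the indefinite wait so the demo cannot hang.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class FlushRetryDemo {
    // Runs a "flush" on flushScheduler that schedules a "retry" on retryScheduler and
    // blocks until the retry has run or the timeout expires. Returns whether it ran.
    static boolean flushCompletes(ScheduledExecutorService flushScheduler,
                                  ScheduledExecutorService retryScheduler,
                                  long timeoutMillis) throws InterruptedException {
        Future<Boolean> flush = flushScheduler.submit(() -> {
            ScheduledFuture<String> retry =
                    retryScheduler.schedule(() -> "retried", 10, TimeUnit.MILLISECONDS);
            try {
                retry.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                retry.cancel(false);
                return false; // the flush holds the only thread, so the retry never ran
            }
        });
        try {
            return flush.get();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Passing the same `Executors.newSingleThreadScheduledExecutor()` as both arguments returns `false`. Two separate single-thread schedulers return `true`. This mirrors the reasoning for splitting the scheduler rather than requiring callers to configure at least two threads.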
Mark Tozzi	d9e569278f	Refactor and DRY up Kahan Sum algorithm (#48558 ) (#48959 )	2019-11-11 15:09:19 -05:00
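The commit title refers to the Kahan (compensated) summation algorithm. Below is a minimal sketch of the classic algorithm. It keeps a running compensation term holding the low-order bits that plain `double` addition would discard. The class name `KahanSum` is hypothetical, and this is not the refactored Elasticsearch class.

```java
class KahanSum {
    private double sum;
    private double compensation; // low-order error left over from the previous addition

    void add(double value) {
        double y = value - compensation; // re-apply the error carried from last time
        double t = sum + y;              // low-order bits of y may be lost here
        compensation = (t - sum) - y;    // recover what was lost (negated)
        sum = t;
    }

    double value() {
        return sum;
    }
}
```

Add `1e-16` to `1.0` ten times. A naive `double` sum stays at exactly `1.0`, because each increment is below half an ulp of `1.0`. The compensated sum ends up close to `1.000000000000001`.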
Armin Braun	c45470f84f	Fix ShardGenerations in RepositoryData in BwC Case (#48920 ) (#48947 ) We were tripping the assertion that makes sure we only have empty `ShardGenerations` in `RepositoryData` in the BwC case because shard generations were passed to the `Repository` in the BwC case. Fixed by only generating empty shard gen for BwC snapshots in `SnapshotsService`.	2019-11-11 18:02:53 +01:00
Rory Hunter	014e1b1090	Improve resiliency to auto-formatting in server (#48940 ) Backport of #48450. Make a number of changes so that code in the `server` directory is more resilient to automatic formatting. This covers: * Reformatting multiline JSON to embed whitespace in the strings * Move some comments around so they aren't auto-formatted to a strange place. This also required moving some `&&` and `||` operators from the end-of-line to start-of-line. * Add helper method `reformatJson()`, to strip whitespace from a JSON document using XContent methods. This is sometimes necessary where a test is comparing some machine-generated JSON with an expected value. Also, `HyperLogLogPlusPlus.java` is now excluded from formatting because it contains large data tables that don't reformat well with the current settings, and changing the settings would be worse for the rest of the codebase.	2019-11-11 14:33:04 +00:00
Yannick Welsch	87862868c6	Allow realtime get to read from translog (#48843 ) The realtime GET API currently has erratic performance in case where a document is accessed that has just been indexed but not refreshed yet, as the implementation will currently force an internal refresh in that case. Refreshing can be an expensive operation, and also will block the thread that executes the GET operation, blocking other GETs to be processed. In case of frequent access of recently indexed documents, this can lead to a refresh storm and terrible GET performance. While older versions of Elasticsearch (2.x and older) did not trigger refreshes and instead opted to read from the translog in case of realtime GET API or update API, this was removed in 5.0 (#20102) to avoid inconsistencies between values that were returned from the translog and those returned by the index. This was partially reverted in 6.3 (#29264) to allow _update and upsert to read from the translog again as it was easier to guarantee consistency for these, and also brought back more predictable performance characteristics of this API. Calls to the realtime GET API, however, would still always do a refresh if necessary to return consistent results. This means that users that were calling realtime GET APIs to coordinate updates on client side (realtime GET + CAS for conditional index of updated doc) would still see very erratic performance. This PR (together with #48707) resolves the inconsistencies between reading from translog and index. In particular it fixes the inconsistencies that happen when requesting stored fields, which were not available when reading from translog. In case where stored fields are requested, this PR will reparse the _source from the translog and derive the stored fields to be returned. With this, it changes the realtime GET API to allow reading from the translog again, avoid refresh storms and blocking the GET threadpool, and provide overall much better and predictable performance for this API.	2019-11-09 17:47:50 +01:00
Nhat Nguyen	ff6c121eb9	Closed shard should never open new engine (#47186 ) We should not open new engines if a shard is closed. We broke this assumption in #45263, where we stopped verifying the shard state before creating an engine and only verified it before swapping the engine reference. We can fail to snapshot the store metadata or checkIndex a closed shard if there's some IndexWriter holding the index lock. Closes #47060	2019-11-08 23:40:34 -05:00
Nhat Nguyen	9a42e71dd9	Do not cancel recovery for copy on broken node (#48265 ) This change fixes a poisonous situation where an ongoing recovery was canceled because a better copy was found on a node that the cluster had previously tried allocating the shard to but failed. The solution is to keep track of the set of nodes that an allocation was failed on so that we can avoid canceling the current recovery for a copy on failed nodes. Closes #47974	2019-11-08 23:10:47 -05:00
Adrien Grand	3b9ce0a4f3	Elasticsearch 7.5 is on Lucene 8.3. (#48831 )	2019-11-06 10:13:09 -05:00
David Turner	bd5c6c4779	Add preflight check to dynamic mapping updates (#48867 ) Today if the primary discovers that an indexing request needs a mapping update then it will send it to the master for validation and processing. If, however, the put-mapping request is invalid then the master still processes it as a (no-op) cluster state update. When there are a large number of indexing operations that result in invalid mapping updates this can overwhelm the master. However, the primary already has a reasonably up-to-date mapping against which it can check the (approximate) validity of the put-mapping request before sending it to the master. For instance it is not possible to remove fields in a mapping update, so if the primary detects that a mapping update will exceed the fields limit then it can reject it itself and avoid bothering the master. This commit adds a pre-flight check to the mapping update path so that the primary can discard obviously-invalid put-mapping requests itself. Fixes #35564 Backport of #48817	2019-11-05 18:08:22 +01:00
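The commit's argument is that fields can only be added, never removed, so a fields-limit violation can be detected on the primary. The following sketch shows that reasoning under simplifying assumptions: fields are flat names, and the primary's local copy of the mapping stands in for the master's (which is why the check is only approximate). `MappingPreflight` is a hypothetical class, and the error text is illustrative, not Elasticsearch's actual message.

```java
import java.util.HashSet;
import java.util.Set;

class MappingPreflight {
    // A mapping update can only add fields, so after merging there are at least as many
    // fields as in the union of the existing and requested ones. If that union already
    // exceeds the limit, reject locally instead of sending the update to the master.
    static void check(Set<String> existingFields, Set<String> updateFields, int totalFieldsLimit) {
        Set<String> merged = new HashSet<>(existingFields);
        merged.addAll(updateFields);
        if (merged.size() > totalFieldsLimit) {
            throw new IllegalArgumentException("mapping update would have " + merged.size()
                    + " fields, exceeding the limit of " + totalFieldsLimit);
        }
    }
}
```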
Nhat Nguyen	0887cbc964	Fix testForceMergeWithSoftDeletesRetentionAndRecoverySource (#48766 ) This test failure manifests the limitation of the recovery source merge policy explained in #41628. If we already merge down to a single segment then subsequent force merges will be noop although they can prune recovery source. We need to adjust this test until we have a fix for the merge policy. Relates #41628 Closes #48735	2019-11-02 21:14:12 -04:00
Armin Braun	3c20541823	Cleanup Concurrent RepositoryData Loading (#48329 ) (#48834 ) The loading of `RepositoryData` is not an atomic operation. It uses a list + get combination of calls. This led to accidentally returning an empty repository data for generations >=0, which should always exist unless the repository is corrupted. In the test #48122 (and other SLM tests) there was a low chance of running into this concurrent modification scenario and the repository actually moving two index generations between listing out the index-N and loading the latest version of it. Since we only keep two index-N around at a time, this led to unexpectedly absent snapshots in status APIs. Fixing the behavior to be more resilient is non-trivial but in the works. For now I think we should simply throw in this scenario. This will also help prevent corruption in the unlikely but possible event of running into this issue in a snapshot create or delete operation on master failover on a repository like S3, which doesn't have the "no overwrites" protection on writing a new index-N. Fixes #48122	2019-11-02 20:42:29 +01:00
Armin Braun	a22f6fbe3c	Cleanup Redundant Futures in Recovery Code (#48805 ) (#48832 ) Follow up to #48110 cleaning up the redundant future uses that were left over from that change.	2019-11-02 17:28:12 +01:00
Jason Tedor	c82ecb664c	Do not wrap ingest processor exception with IAE (#48816 ) The problem with wrapping here is that it converts any exception into an IAE, which we treat as a client error (400 status) whereas the exception being wrapped here could be a server error (e.g., NPE). This commit stops wrapping all ingest processor exceptions as IAEs.	2019-11-01 15:11:35 -04:00
Mark Vieira	6ab4645f4e	[7.x] Introduce type-safe and consistent pattern for handling build globals (#48818 ) This commit introduces a consistent and type-safe manner for handling global build parameters throughout our build logic. Primarily this replaces the existing usages of extra properties with static accessors. It also introduces an explicit API for initialization and mutation of any such parameters, as well as better error handling for uninitialized or eager access of parameter values. Closes #42042	2019-11-01 11:33:11 -07:00
Tal Levy	4be54402de	[7.x] Add ingest info to Cluster Stats (#48485 ) (#48661 ) * Add ingest info to Cluster Stats (#48485) This commit enhances the ClusterStatsNodes response to include global processor usage stats on a per-processor basis. example output: ``` ... "processor_stats": { "gsub": { "count": 0, "failed": 0, "current": 0, "time_in_millis": 0 }, "script": { "count": 0, "failed": 0, "current": 0, "time_in_millis": 0 } } ... ``` The purpose for this enhancement is to make it easier to collect stats on how specific processors are being used across the cluster beyond the current per-node usage statistics that currently exist in node stats. Closes #46146. * fix BWC of ingest stats The introduction of processor types into IngestStats had a bug. It was set to `null` and set as the key to the map. This would throw an NPE. This commit resolves this by setting all the processor types from previous versions that are not serializing it out to `_NOT_AVAILABLE`.	2019-10-31 14:36:54 -07:00
Ioannis Kakavas	99aedc844d	Copy http headers to ThreadContext strictly (#45945 ) (#48675 ) Previously, copying HTTP headers to the ThreadContext would allow multiple HTTP headers with the same name, handling only the first occurrence and disregarding the rest of the values. This can be confusing when dealing with multiple Headers as it is not obvious which value is read and which ones are silently dropped. According to RFC-7230, a client must not send multiple header fields with the same field name in an HTTP message, unless the entire field value for this header is defined as a comma separated list or this specific header is a well-known exception. This commit changes the behavior in order to be more compliant to the aforementioned RFC by requiring the classes that implement ActionPlugin to declare if a header can be multi-valued or not when registering this header to be copied over to the ThreadContext in ActionPlugin#getRestHeaders. If the header is allowed to be multivalued, then all such headers are read from the HTTP request and their values get concatenated in a comma-separated string. If the header is not allowed to be multivalued, and the HTTP request contains multiple such Headers with different values, the request is rejected with a 400 status.	2019-10-31 23:05:12 +02:00
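The rule described above can be sketched as a function that resolves all occurrences of one header. Multi-valued headers are joined with commas. A single-valued header with conflicting values is rejected, while repeated identical values are tolerated, since only "different values" trigger the 400. `RestHeaderResolver` and its nested exception are hypothetical, and the exact separator (`,` with no space) is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class RestHeaderResolver {
    // Thrown when a single-valued header appears with conflicting values (maps to HTTP 400)
    static final class BadRequestException extends RuntimeException {
        BadRequestException(String message) {
            super(message);
        }
    }

    // Resolves all occurrences of one HTTP header into the single value copied to the
    // thread context, or null if the header is absent.
    static String resolve(String name, List<String> values, boolean multiValueAllowed) {
        if (values.isEmpty()) {
            return null;
        }
        if (multiValueAllowed) {
            return String.join(",", values); // keep every occurrence
        }
        Set<String> distinct = new LinkedHashSet<>(values);
        if (distinct.size() > 1) {
            throw new BadRequestException("multiple values for single-valued header [" + name + "]");
        }
        return values.get(0); // repeated identical values are accepted
    }
}
```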
Zachary Tong	34c2375417	Add v7.4.3 version constant	2019-10-31 13:21:25 -04:00
Alexander Reelsen	4ecf234617	Upgrade to joda 2.10.4 (#47805 )	2019-10-31 14:49:50 +01:00
Stéphane Campinas	7ea74918e1	[DOCS] Fix typo in IndexFieldData.java comments (#48743 )	2019-10-31 09:40:35 -04:00
kkewwei	0366c4d4a9	Faster access to INITIALIZING/RELOCATING shards (#47817 ) Today a couple of allocation deciders iterate through all the shards on a node to find the `INITIALIZING` or `RELOCATING` ones, and this can slow down cluster state updates in clusters with very high-density nodes holding many thousands of shards even if those shards belong to closed or frozen indices. This commit pre-computes the sets of `INITIALIZING` and `RELOCATING` shards to speed up this search. Closes #46941 Relates #48579 Co-authored-by: "hongju.xhj" <hongju.xhj@alibaba-inc.com>	2019-10-31 10:55:59 +00:00
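The idea above is to avoid scanning every shard on a node to find the `INITIALIZING` or `RELOCATING` ones. A minimal sketch keeps those sets up to date as shard states change, so lookups cost proportional to the result, not to the node's shard count. `NodeShardIndex` and the three-value `ShardState` enum are hypothetical simplifications, not the actual `RoutingNode` code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

enum ShardState { INITIALIZING, STARTED, RELOCATING }

class NodeShardIndex {
    private final Map<String, ShardState> shards = new HashMap<>();
    // Maintained incrementally so deciders need not iterate over every shard on the node
    private final Set<String> initializing = new LinkedHashSet<>();
    private final Set<String> relocating = new LinkedHashSet<>();

    private Set<String> setFor(ShardState state) {
        switch (state) {
            case INITIALIZING: return initializing;
            case RELOCATING: return relocating;
            default: return null; // STARTED shards are not tracked separately
        }
    }

    // Adds a shard or changes its state, moving it between the tracked sets
    void put(String shardId, ShardState state) {
        ShardState previous = shards.put(shardId, state);
        if (previous != null && setFor(previous) != null) {
            setFor(previous).remove(shardId);
        }
        Set<String> target = setFor(state);
        if (target != null) {
            target.add(shardId);
        }
    }

    void remove(String shardId) {
        ShardState previous = shards.remove(shardId);
        if (previous != null && setFor(previous) != null) {
            setFor(previous).remove(shardId);
        }
    }

    Set<String> initializingShards() {
        return Collections.unmodifiableSet(initializing);
    }

    int numberOfRelocating() {
        return relocating.size();
    }
}
```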
Rory Hunter	d96976e2b1	Improve resiliency to formatting JSON in server (#48706 ) Backport of #48553. Make a number of changes so that JSON in the server directory is more resilient to automatic formatting. This covers: * Reformatting multiline JSON to embed whitespace in the strings * Add helper method `stripWhitespace()`, to strip whitespace from a JSON document using XContent methods. This is sometimes necessary where a test is comparing some machine-generated JSON with an expected value.	2019-10-31 10:48:55 +00:00
Arvind Ramachandran	eefa84bc94	Ignore dangling indices created in newer versions (#48652 ) Today it is possible that we import a dangling index that was created in a newer version than one or more of the nodes in the cluster. Such an index would prevent the older node(s) from rejoining the cluster if they were to briefly leave it for some reason. This commit prevents the import of such dangling indices. Fixes #34264	2019-10-31 10:12:42 +00:00
Yannick Welsch	fe8901b00b	Return consistent source in updates (#48707 )	2019-10-31 10:00:40 +01:00
Ignacio Vera	5bea3898a9	Add IndexOrDocValuesQuery to GeoPolygonQueryBuilder (#48449 ) (#48731 )	2019-10-31 08:46:57 +01:00
Nhat Nguyen	f8ef402027	Do not warm up searcher in engine constructor (#48605 ) With this change, we won't warm up searchers until we externally refresh an engine. We explicitly refresh before allowing reading from a shard (i.e., move to post_recovery state) and during resetting. This guarantees that we have warmed up the engine before exposing the external searcher. Another prerequisite for #47186.	2019-10-30 14:22:59 -04:00
Armin Braun	36039706b5	Fix SnapshotShardStatus Reporting for Failed Shard (#48556 ) (#48687 ) Fixes the shard snapshot status reporting for failed shards in the corner case of failing the shard because of an exception thrown in `SnapshotShardsService` and not the repository. We were missing the update on the `snapshotStatus` instance in this case which made the transport APIs using this field report back an incorrect status. Fixed by moving the failure handling to the `SnapshotShardsService` for all cases (which also simplifies the code, the ex. wrapping in the repository was pointless as we only used the ex. trace upstream anyway). Also, added an assertion to another test that explicitly checks this failure situation (ex. in the `SnapshotShardsService`) already. Closes #48526	2019-10-30 15:43:41 +01:00
Armin Braun	52e5ceb321	Restore from Individual Shard Snapshot Files in Parallel (#48110 ) (#48686 ) Make restoring shard snapshots run in parallel on the `SNAPSHOT` thread-pool.	2019-10-30 14:36:30 +01:00
Armin Braun	01e326d2e3	Fix ref count handling in Engine.failEngine (#48639 ) (#48646 ) We can run into an already closed store here and hence throw on trying to increment the ref count => moving to the guarded ref count increment closes #48625	2019-10-30 10:10:48 +01:00
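The fix above relies on the difference between an unguarded and a guarded reference-count increment. The sketch below assumes an `AtomicInteger` compare-and-set loop. An unguarded `incRef` throws on an already-closed object, whereas `tryIncRef` reports failure so the caller can skip the work. `SketchRefCounted` is a hypothetical class, not the actual `Store` implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SketchRefCounted {
    private final AtomicInteger refCount = new AtomicInteger(1); // starts with one owner reference

    // Guarded increment: returns false instead of throwing if already closed
    boolean tryIncRef() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                return false; // already closed; never resurrect a released object
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Unguarded increment: throws on a closed instance
    void incRef() {
        if (!tryIncRef()) {
            throw new IllegalStateException("already closed");
        }
    }

    // Returns true when this call released the last reference
    boolean decRef() {
        return refCount.decrementAndGet() == 0;
    }
}
```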
Julie Tibshirani	89c65752dc	Update the signature of vector script functions. (#48653 ) Previously the functions accepted a doc values reference, whereas they now accept the name of the vector field. Here's an example of how a vector function was called before and after the change. ``` Before: cosineSimilarity(params.query_vector, doc['field']) After: cosineSimilarity(params.query_vector, 'field') ``` This seems more intuitive, since we don't allow direct access to vector doc values and the meaning of `doc['field']` is unclear. The PR makes the following changes (broken into distinct commits): * Add new function signatures of the form `function(params.query_vector, 'field')` and deprecates the old ones. Because Painless doesn't allow two methods with the same name and number of arguments, we allow a generic `Object` to be passed in to the function and decide on the behavior through an `instanceof` check. * Refactor the class bindings so that the document field is passed to the constructor instead of the instance method. This allows us to avoid retrieving the vector doc values on every function invocation, which gives a tiny speed-up in benchmarks. Note that this PR adds new signatures for the sparse vector functions too, even though sparse vectors are deprecated. It seemed simplest to understand (for both us and users) to keep everything symmetric between dense and sparse vectors.	2019-10-29 15:46:05 -07:00
Stuart Tettemer	55d00cf2b1	Scripting: fill in get contexts REST API (#48319 ) (#48602 ) Updates response for `GET /_script_context`, returning a `contexts` object with a list of context description objects. The description includes the context name and a list of methods available. The methods list has the signature for the `execute` method and any getters. e.g. ``` { "contexts": [ { "name" : "moving-function", "methods" : [ { "name" : "execute", "return_type" : "double", "params" : [ { "type" : "java.util.Map", "name" : "params" }, { "type" : "double[]", "name" : "values" } ] } ] }, { "name" : "number_sort", "methods" : [ { "name" : "execute", "return_type" : "double", "params" : [ ] }, { "name" : "getDoc", "return_type" : "java.util.Map", "params" : [ ] }, { "name" : "getParams", "return_type" : "java.util.Map", "params" : [ ] }, { "name" : "get_score", "return_type" : "double", "params" : [ ] } ] }, ... ] } ``` fixes: #47411	2019-10-29 14:41:15 -06:00
Nhat Nguyen	2a863ac8ff	Fix testCleanUpCommitsWhenGlobalCheckpointAdvanced Relates #48559	2019-10-29 10:39:16 -04:00
Nhat Nguyen	b08cd058bc	Greedily advance safe commit on new global checkpoint (#48559 ) Today we won't advance the safe commit on a new global checkpoint unless the last commit can become safe. This is not great if we have more than two commits as we can have a new safe commit earlier. Closes #4853	2019-10-29 10:39:16 -04:00
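The change above picks the newest commit that has become safe, rather than only checking whether the last commit is safe. A minimal sketch follows. It assumes each commit is summarized by its maximum sequence number, that commits are ordered oldest to newest, and that there is at least one commit. The fallback to the oldest commit when none qualifies is also an assumption. `SafeCommitFinder` is a hypothetical name.

```java
class SafeCommitFinder {
    // maxSeqNos holds each commit's max sequence number, ordered oldest to newest
    // (at least one commit assumed). A commit is safe once all its operations are at or
    // below the global checkpoint; return the newest such commit, else the oldest one.
    static int findSafeCommit(long[] maxSeqNos, long globalCheckpoint) {
        for (int i = maxSeqNos.length - 1; i >= 0; i--) {
            if (maxSeqNos[i] <= globalCheckpoint) {
                return i;
            }
        }
        return 0;
    }
}
```

Take three commits with max sequence numbers `10, 20, 30` and a global checkpoint of `25`. The last commit is not yet safe, but the middle one is, so the safe commit can advance to index `1` instead of waiting.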
Jim Ferenczi	aa70ff5ea4	Fix failures in ShuffleForcedMergePolicyTests#testDiagnostics (#48627 ) This commit fixes intermittent failures in ShuffleForcedMergePolicyTests#testDiagnostics by setting a more restricted merge policy that ensures that extra merging will not happen before the forced merge.	2019-10-29 13:46:55 +01:00
Jim Ferenczi	c6abe58f63	Fix expectations in SearchAfter integration tests (#48372 ) This commit fixes the expectations of SearchAfterIT#shouldFail regarding the inner exceptions that should be thrown when testing failures. The exception is sometimes wrapped in a QueryShardException so this change only checks that the toString representation contains the expected message. Closes #43143	2019-10-29 12:37:22 +01:00
Yannick Welsch	6af3ce58f8	Filter on node id in AllocationIdIT (#48623 ) Makes the assertions more targeted. Relates #48529	2019-10-29 12:10:48 +01:00
Jim Ferenczi	028084ce23	Add a new merge policy that interleaves old and new segments on force merge (#48533 ) This change adds a new merge policy that interleaves eldest and newest segments picked by MergePolicy#findForcedMerges and MergePolicy#findForcedDeletesMerges. This allows time-based indices, that usually have the eldest documents first, to be efficient at finding the most recent documents too. We wrap this merge policy for all indices even though it is mostly useful for time-based indices, but there should be no overhead for other types of indices, so this is simpler than adding a setting to enable it. This change is needed in order to ensure that the optimizations that we are working on in # remain efficient even after running a force merge. Relates #37043	2019-10-29 10:44:56 +01:00
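The interleaving described above can be sketched on a plain list of segments ordered oldest to newest. The result alternates from both ends, so old and recent documents end up mixed in the merged segment. The exact order produced (oldest, newest, second-oldest, second-newest, ...) is an assumption about what "interleaves" means here. `SegmentInterleaver` is a hypothetical name, not the actual merge policy class.

```java
import java.util.ArrayList;
import java.util.List;

class SegmentInterleaver {
    // Given segments ordered oldest to newest, returns them as
    // oldest, newest, 2nd oldest, 2nd newest, ... alternating from both ends.
    static <T> List<T> interleave(List<T> oldestFirst) {
        List<T> out = new ArrayList<>(oldestFirst.size());
        int lo = 0;
        int hi = oldestFirst.size() - 1;
        while (lo <= hi) {
            out.add(oldestFirst.get(lo++));
            if (lo <= hi) {
                out.add(oldestFirst.get(hi--));
            }
        }
        return out;
    }
}
```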

4259 Commits