Commit Graph

4039 Commits

Author SHA1 Message Date
Julie Tibshirani a299aba2f8
Ensure that field collapsing works with field aliases. (#50766)
Previously, the following situation would throw an error:
* A search contains a `collapse` on a particular field.
* The search spans multiple indices, and in one index the field is mapped as a
  concrete field, but in another it is a field alias.

The error occurs when we attempt to merge `CollapseTopFieldDocs` across shards.
When merging, we validate that the name of the collapse field is the same across
shards. But the name has already been resolved to the concrete field name, so it
will be different on shards where the field was mapped as an alias vs. shards
where it was a concrete field.

This PR updates the collapse field name in `CollapseTopFieldDocs` to the
original requested field, so that it will always be consistent across shards.

Note that in #32648, we already made a fix around collapsing on field aliases.
However, we didn't test this specific scenario where the field was mapped as an
alias in only one of the indices being searched.
2020-01-08 14:51:15 -08:00
Christoph Büscher b1b4282273 Make Multiplexer inherit filter chains analysis mode (#50662)
Currently, if an updateable synonym filter is included in a multiplexer filter,
it is not reloaded via the _reload_search_analyzers because the multiplexer
itself doesn't pass on the analysis mode of the filters it contains, so its not
recognized as "updateable" in itself. Instead we can check and merge the
AnalysisMode settings of all filters in the multiplexer and use the resulting
mode (e.g. search-time only) for the multiplexer itself, thus making any synonym
filters contained in it reloadable.  This, of course, will also make the
analyzers using the multiplexer be usable at search-time only.

Closes #50554
2020-01-08 22:12:01 +01:00
Przemyslaw Gomulka e95b0c447f
Allow parsing timezone without fully provided time backport(#50178) (#50740)
strict_date_optional_time changes to have optional minute part.
It already allowed optional second and fraction of second part.
This allows parsing 2018-01-01T00+01 , 2018-01-01T00:00+01 , 2018-01-01T00:00:00+01 , 2018-01-01T00:00:00.000+01
It won't allow parsing a timezone without an hour part as this is not allowed by iso8601 spec

closes #49351
2020-01-08 20:04:57 +01:00
Henning Andersen 125feecabc
Guess root cause support unwrap (#50525) (#50742)
ElasticsearchException.guessRootCauses would return wrapper exception if
inner exception was not an ElasticsearchException. Fixed to never return
wrapper exceptions.

At least following APIs change root_cause.0.type as a result:

_update with bad script
_index with bad pipeline

Relates #50417
2020-01-08 19:09:14 +01:00
Adrien Grand 4f2299c714
Upgrade to Lucene 8.4.0. (#50518) (#50750) 2020-01-08 18:53:59 +01:00
Adrien Grand 31158ab3d5
Add per-field metadata. (#50333)
This PR adds per-field metadata that can be set in the mappings and is later
returned by the field capabilities API. This metadata is completely opaque to
Elasticsearch but may be used by tools that index data in Elasticsearch to
communicate metadata about fields with tools that then search this data. A
typical example that has been requested in the past is the ability to attach
a unit to a numeric field.

In order to not bloat the cluster state, Elasticsearch requires that this
metadata be small:
 - keys can't be longer than 20 chars,
 - values can only be numbers or strings of no more than 50 chars - no inner
   arrays or objects,
 - the metadata can't have more than 5 keys in total.

Given that metadata is opaque to Elasticsearch, field capabilities don't try to
do anything smart when merging metadata about multiple indices, the union of
all field metadatas is returned.

Here is how the meta might look like in mappings:

```json
{
  "properties": {
    "latency": {
      "type": "long",
      "meta": {
        "unit": "ms"
      }
    }
  }
}
```

And then in the field capabilities response:

```json
{
  "latency": {
    "long": {
      "searchable": true,
      "aggreggatable": true,
      "meta": {
        "unit": [ "ms" ]
      }
    }
  }
}
```

When there are no conflicts, values are arrays of size 1, but when there are
conflicts, Elasticsearch includes all unique values in this array, without
giving ways to know which index has which metadata value:

```json
{
  "latency": {
    "long": {
      "searchable": true,
      "aggreggatable": true,
      "meta": {
        "unit": [ "ms", "ns" ]
      }
    }
  }
}
```

Closes #33267
2020-01-08 16:21:18 +01:00
Yannick Welsch f203c2b39d Import replicated closed dangling indices (#50649)
Dangling replicated closed indices are not imported properly (they miss their routing table when imported).
2020-01-08 13:39:20 +01:00
Rory Hunter b1ff74f652 New setting to prevent automatically importing dangling indices (#49174)
Introduce a new static setting, `gateway.auto_import_dangling_indices`, which prevents dangling indices from being automatically imported. Part of #48366.
2020-01-08 13:39:20 +01:00
Tim Vernum 293661d62c
Security should not reload files that haven't changed (#50724)
In security we currently monitor a set of files for changes:

- config/role_mapping.yml (or alternative configured path)
- config/roles.yml
- config/users
- config/users_roles

This commit prevents unnecessary reloading when the file change actually doesn't change the internal structure.

Backport of: #50207

Co-authored-by: Anton Shuvaev <anton.shuvaev91@gmail.com>
2020-01-08 15:13:47 +11:00
Nik Everett deb0991667
Teach ObjectParser a happy pattern (#50691) (#50710)
We *very* commonly have object with ctors like:
```
public Foo(String name)
```

And then declare a bunch of setters on the object. Every aggregation
works like this, for example. This change teaches `ObjectParser` how to
build these aggregations all on its own, without any help. This'll make
it much cleaner to parse aggs, and, probably, a bunch of other things.
It'll let us remove lots of wrapping. I've used this new power for the
`avg` aggregation just to prove that it works outside of a unit test.
2020-01-07 11:57:41 -05:00
Nhat Nguyen c3d207f437 Disable auto refresh in testSegmentsStats (#50689)
If an auto-refresh happens, then version_map_memory is reset to 0. By 
default, the auto-refresh occurs for every second in the first 30
seconds until search becomes idle.

Closes #50362
2020-01-07 10:44:30 -05:00
Hendrik Muhs 98ca9500e8
implement a workaround for remote cluster validation (#50460)
In 7.x an internal API used for validating remote cluster does not throw, see #50420 for the 
details. This change implements a workaround for remote cluster validation, only for 7.x branches.

fixes #50420
2020-01-07 13:51:51 +01:00
Yannick Welsch a2ef0e8830 Check allocation id when failing shard on recovery (#50656)
A failure of a recovering shard can race with a new allocation of the shard, and cause the new
allocation to be failed as well. This can result in a shard being marked as initializing in the cluster
state, but not exist on the node anymore.

Closes #50508
2020-01-07 09:41:28 +01:00
Jay Modi e5191e77e3
Remove unused IndicesOptions#fromByte method (#50683)
This change removes a no longer used method, `fromByte`, in
IndicesOptions. This method was necessary for backwards compatibility
with versions prior to 6.4.0 and was used when talking to those
versions. However, the minimum wire compatibility version has changed
and we no longer use this code.

Backport of #50665
2020-01-06 14:57:10 -07:00
Nik Everett 76bb661023
Replace AggParseContext with a String (backport of #50625) (#50679)
We used to have a *ton* off stuff in the `AggParseContext` but now we
parse aggs entirely with named xcontent. So we don't need the context
any more.
2020-01-06 14:32:03 -05:00
Nhat Nguyen 926c0aa74c Fix testRelocationEstablishedPeerRecoveryRetentionLeases (#50673)
The redNodes are calculated incorrectly.

Closes #50660
2020-01-06 13:32:04 -05:00
Nik Everett f576aefd0f
Replace bespoke parser for significance heuristics (#50623) (#50659)
This replaces the hand written xcontent parsers for significance
heristics with `ObjectParser` and parsing named xcontent.

As a happy accident, this was the last user of `ParseFieldRegistry` so
this PR entirely removes that class.

Closes #25519
2020-01-06 12:57:43 -05:00
Tim Brooks fa57813c6d
Remove races in ProxyConnectionStrategyTests (#50620)
Currently, we use delayed address resolution in the proxy strategy tests
to allow tests to connect to different addresses. Unfortunately, this
has the potential to introduce races as the address is resolved each
connection attempt. The number of connection attempts can vary based on
when connections are opening and closing. This commit modifies the test
be allowing them to specifically control which address is used.

Related to #50618
2020-01-06 10:20:53 -07:00
Martijn van Groningen 7be43e9f6d
Fix ingest stats test bug. (#50653)
This test code fixes a serialization test bug:
https://gradle-enterprise.elastic.co/s/7x2ct6yywkw3o

Rarely stats for the same processor are generated and
the production code then sums up these stats. However
the test code wasn't summing up in that case,
which caused inconsistencies between the actual and expected results.

Closes #50507
2020-01-06 15:37:47 +01:00
Nhat Nguyen b71490b06b
Deprecate indices without soft-deletes (#50502) (#50634)
Soft-deletes will be enabled for all indices in 8.0. Hence, we should
deprecate new indices without soft-deletes in 7.x.

Backport of #50502
2020-01-06 08:44:30 -05:00
Henning Andersen ec0ec61881 Deleted docs disregarded for if_seq_no check (#50526)
Previously, as long as a deleted version value was kept as a tombstone,
another index or delete operation against the same id would leak that
the doc had existed (through seq_no info) or would allow the operation
if the client forged the seq_no. Fixed to disregard info on deleted docs
when doing seq_no based optimistic concurrency check.
2020-01-06 13:54:36 +01:00
Nikita Glashenko 5533e1172c Add tests for remaining IntervalsSourceProvider implementations (#50326)
This PR adds unit tests for wire and xContent serialization of remaining IntervalsSourceProvider
implementations.

Closes #50150
2020-01-06 13:18:53 +01:00
David Turner 66c690922c Collect shard sizes for closed indices (#50645)
Today the `InternalClusterInfoService` collects information on the sizes of
shards of open indices, but does not consider closed indices. This means that
shards of closed indices are treated as having zero size when they are being
allocated. This commit fixes this, obtaining the sizes of all shards.

Relates #33888
2020-01-06 11:44:19 +00:00
Henning Andersen 312bf44601 Workaround for JDK 14 EA FileChannel.map issue (#50523)
FileChannel.map provokes static initialization of ExtendedMapMode in
JDK14 EA, which needs elevated privileges.

Relates #50512
2020-01-06 12:18:49 +01:00
Nik Everett 2362c430cd
Clean up wire test case a bit (#50627) (#50632)
* Adds JavaDoc to `AbstractWireTestCase` and
`AbstractWireSerializingTestCase` so it is more obvious you should prefer
the latter if you have a choice
* Moves the `instanceReader` method out of `AbstractWireTestCase` becaue
it is no longer used.
* Marks a bunch of methods final so it is more obvious which classes are
for what.
* Cleans up the side effects of the above.
2020-01-05 16:20:38 -05:00
Nik Everett 4d58656065
Declare remaining parsers `final` (#50571) (#50615)
We have about 800 `ObjectParsers` in Elasticsearch, about 700 of which
are final. This is *probably* the right way to declare them because in
practice we never mutate them after they are built. And we certainly
don't change the static reference. Anyway, this adds `final` to these
parsers.

I found the non-final parsers with this:
```
diff \
  <(find . -type f -name '*.java' -exec grep -iHe 'static.*PARSER\s*=' {} \+ | sort) \
  <(find . -type f -name '*.java' -exec grep -iHe 'static.*final.*PARSER\s*=' {} \+ | sort) \
  2>&1 | grep '^<'
```
2020-01-03 11:48:11 -05:00
Andrei Dan 856607b5a6
Guard against null geoBoundingBox (#50506) (#50608)
A geo box with a top value of Double.NEGATIVE_INFINITY will yield an empty
xContent which translates to a null `geoBoundingBox`. This commit marks the
field as `Nullable` and guards against null when retrieving the `topLeft`
and `bottomRight` fields.

Fixes https://github.com/elastic/elasticsearch/issues/50505

(cherry picked from commit 051718f9b1e1ca957229b01e80d7b79d7e727e14)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-01-03 18:04:26 +02:00
Nik Everett 1abecad21b
Mark some constants in decay functions final (#50569) (#50575)
This marks a couple of constants in the `DecayFunctionBuilder` as final.
They are written in CONSTANT_CASE and used as constants but not final
which is a little confusing and might lead to sneaky bugs.
2020-01-03 10:58:15 -05:00
Henning Andersen 218bd19034
Improve FutureUtils.get exception handling (#50339) (#50417)
FutureUtils.get() would unwrap ElasticsearchWrapperExceptions. This
is trappy, since nearly all usages of FutureUtils.get() expected only to
not have to deal with checked exceptions.

In particular, StepListener builds upon ListenableFuture which uses
FutureUtils.get to be informed about the exception passed to onFailure.
This had the bad consequence of masking away any exception that was an
ElasticsearchWrapperException like RemoteTransportException.
Specifically for recovery, this made CircuitBreakerExceptions happening
on the target node look like they originated from the source node.

The only usage that expected that behaviour was AdapterActionFuture.
The unwrap behaviour has been moved to that class.
2020-01-03 15:28:47 +01:00
kkewwei 5655d6a1c1 Log index name when updating index settings (#49969)
Today we log changes to index settings like this:

    updating [index.setting.blah] from [A] to [B]

The identity of the index whose settings were updated is conspicuously absent
from this message. This commit addresses this by adding the index name to these
messages.

Fixes #49818.
2020-01-03 11:26:29 +00:00
Alan Woodward 8b362c657b Add fuzzy intervals source (#49762)
This intervals source will return terms that are similar to an input term, up to
an edit distance defined by fuzziness, similar to FuzzyQuery.

Closes #49595
2020-01-03 09:59:19 +00:00
Henning Andersen e19585b47f Enhance TransportReplicationAction assertions (#49081)
Include failure into assertion error when replication action discovers
that it has been double triggered.
2020-01-02 19:23:10 +01:00
Oleg 7539fbb30f Deprecate the 'local' parameter of /_cat/nodes (#50499)
The cat nodes API performs a `ClusterStateAction` then a `NodesInfoAction`.
Today it accepts the `?local` parameter and passes this to the
`ClusterStateAction` but this parameter has no effect on the `NodesInfoAction`.
This is surprising, because `GET _cat/nodes?local` looks like it might be a
completely local call but in fact it still depends on every node in the
cluster.

This commit deprecates the `?local` parameter on this API so that it can be
removed in 8.0.

Relates #50088
2020-01-02 14:53:56 +00:00
Nhat Nguyen e7c15a5c6e Ensure relocating shards establish peer recovery retention leases (#50486)
We forgot to establish peer recovery retention leases for relocating primaries
without soft-deletes.

Relates #50351
2019-12-26 13:51:35 -05:00
Nhat Nguyen 7713221733 Fix testCancelRecoveryDuringPhase1 (#50449)
testCancelRecoveryDuringPhase1 uses a mock of IndexShard, which can't
create retention leases. We need to stub method createRetentionLease.

Relates #50351 
Closes #50424
2019-12-26 09:48:58 -05:00
Yannick Welsch f57569bf5c Mute RecoverySourceHandlerTests.testCancelRecoveryDuringPhase1
Relates #50424
2019-12-24 12:13:31 -05:00
Martijn van Groningen 10ed1ae1d2
Add remote info to the HLRC (#50483)
The additional change to the original PR (#49657), is that `org.elasticsearch.client.cluster.RemoteConnectionInfo` now parses the initial_connect_timeout field as a string instead of a TimeValue instance.

The reason that this is needed is because that the initial_connect_timeout field in the remote connection api is serialized for human consumption, but not for parsing purposes.
Therefore the HLRC can't parse it correctly (which caused test failures in CI, but not in the PR CI
:( ). The way this field is serialized needs to be changed in the remote connection api, but that is a breaking change. We should wait making this change until rest api versioning is introduced.

Co-Authored-By: j-bean <anton.shuvaev91@gmail.com>

Co-authored-by: j-bean <anton.shuvaev91@gmail.com>
2019-12-24 15:11:58 +01:00
Nhat Nguyen 33204c2055 Use peer recovery retention leases for indices without soft-deletes (#50351)
Today, the replica allocator uses peer recovery retention leases to
select the best-matched copies when allocating replicas of indices with
soft-deletes. We can employ this mechanism for indices without
soft-deletes because the retaining sequence number of a PRRL is the
persisted global checkpoint (plus one) of that copy. If the primary and
replica have the same retaining sequence number, then we should be able
to perform a noop recovery. The reason is that we must be retaining
translog up to the local checkpoint of the safe commit, which is at most
the global checkpoint of either copy). The only limitation is that we
might not cancel ongoing file-based recoveries with PRRLs for noop
recoveries. We can't make the translog retention policy comply with
PRRLs. We also have this problem with soft-deletes if a PRRL is about to
expire.

Relates #45136
Relates #46959
2019-12-23 22:04:07 -05:00
Tal Levy bed121efaf
[7.x-backport] Centralize BoundingBox logic to a dedicated class (#50469)
Both geo_bounding_box query and geo_bounds aggregation have
a very similar definition of a "bounding box". A lot of this
logic (serialization, xcontent-parsing, etc) can be centralized
instead of having separated efforts to do the same things
2019-12-23 11:21:39 -08:00
Aleksandr Maus d5cec7faa1
Improve SearchHit "equals" implementation for null fields cases (#50327) (#50448)
* Improve SearchHit "equals" implementation for null fields cases
2019-12-23 09:59:07 -05:00
Igor Motov 339d10c16f Geo: Switch generated GeoJson type names to camel case (#50400)
Switches generated GeoJson type names to camel case
to conform to the standard.

Closes #49568
2019-12-20 15:37:22 -05:00
Andrei Dan a3cdbda7c6
Make the TransportRolloverAction execute in one cluster state update (#50388) (#50442)
This commit makes the TransportRolloverAction more resilient, by having it execute
only one cluster state update that creates the new (rollover index), rolls over
the alias from the source to the target index and set the RolloverInfo on the
source index. Before these 3 steps were represented as 3 chained cluster state
updates, which would've seen the user manually intervene if, say, the alias
rollover cluster state update (second in the chain) failed but the creation of
the rollover index (first in the chain) update succeeded

* Rename innerExecute to applyAliasActions

(cherry picked from commit 1ba4339a0c73ef3354b8c8b44b628fc55f1dbc78)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2019-12-20 18:01:03 +00:00
Nhat Nguyen 975cc99516 Close engine before reset log appender (#50390)
Merge threads can run and access the mock appender after we have stopped it.

Closes #50315
2019-12-20 12:19:03 -05:00
Jim Ferenczi 2acafd4b15
Optimize composite aggregation based on index sorting (#48399) (#50272)
Co-authored-by: Daniel Huang <danielhuang@tencent.com>

This is a spinoff of #48130 that generalizes the proposal to allow early termination with the composite aggregation when leading sources match a prefix or the entire index sort specification.
In such case the composite aggregation can use the index sort natural order to early terminate the collection when it reaches a composite key that is greater than the bottom of the queue.
The optimization is also applicable when a query other than match_all is provided. However the optimization is deactivated for sources that match the index sort in the following cases:
  * Multi-valued source, in such case early termination is not possible.
  * missing_bucket is set to true
2019-12-20 12:32:37 +01:00
Yannick Welsch 4f805deb0c Only auto-expand replicas with allocation filtering when all nodes upgraded (#50361)
Follow-up to #48974 that ensures that replicas are only auto-expanded according to allocation
filtering rules once all nodes are upgraded to a version that supports this. Helps with
orchestrating cluster upgrades.
2019-12-20 11:50:00 +01:00
Yannick Welsch 5f37f1f401 Revert "Only auto-expand replicas with allocation filtering when all nodes upgraded (#50361)"
This reverts commit df4fe73b84.
2019-12-20 11:07:30 +01:00
Hendrik Muhs de14092ad2 [Transform] refactor source and dest validation to support CCS (#50018)
refactors source and dest validation, adds support for CCS, makes resolve work like reindex/search, allow aliased dest index with a single write index.

fixes #49988
fixes #49851
relates #43201
2019-12-20 10:49:53 +01:00
Alan Woodward 3cdc23ec9c
Fix meta version of task index mapping (#50363)
The built-in task index mapping has a version field in its metadata, so that
the TaskResultsService can check to see if it needs to update mappings when
a new task result is stored. #48393 updated this version in TaskResultsService
but omitted to change the version in the mapping itself, so a mapping update
is applied every time a new task result is stored.

This commit updates the mapping version so that it corresponds to the version
in TaskResultsService.
2019-12-20 09:44:47 +00:00
Yannick Welsch df4fe73b84 Only auto-expand replicas with allocation filtering when all nodes upgraded (#50361)
Follow-up to #48974 that ensures that replicas are only auto-expanded according to allocation
filtering rules once all nodes are upgraded to a version that supports this. Helps with
orchestrating cluster upgrades.
2019-12-20 10:22:44 +01:00
Tim Brooks cb73fb0f9b
Backport remote proxy mode stats and naming (#50402)
* Update remote cluster stats to support simple mode (#49961)

Remote cluster stats API currently only returns useful information if
the strategy in use is the SNIFF mode. This PR modifies the API to
provide relevant information if the user is in the SIMPLE mode. This
information is the configured addresses, max socket connections, and
open socket connections.

* Send hostname in SNI header in simple remote mode (#50247)

Currently an intermediate proxy must route conncctions to the
appropriate remote cluster when using simple mode. This commit offers
a additional mechanism for the proxy to route the connections by
including the hostname in the TLS SNI header.

* Rename the remote connection mode simple to proxy (#50291)

This commit renames the simple connection mode to the proxy connection
mode for remote cluster connections. In order to do this, the mode specific
settings which we namespaced by their mode (ex: sniff.seed and
proxy.addresses) have been reverted.

* Modify proxy mode to support a single address (#50391)

Currently, the remote proxy connection mode uses a list setting for the
proxy address. This commit modifies this so that the setting is
proxy_address and only supports a single remote proxy address.
2019-12-19 18:02:48 -07:00