Commit Graph

1073 Commits

Author SHA1 Message Date
Nhat Nguyen 6dd0aa54f6
Integrates soft-deletes into Elasticsearch (#33222)
This PR integrates Lucene soft-deletes(LUCENE-8200) into Elasticsearch.
Highlight works in this PR include:

- Replace hard-deletes by soft-deletes in InternalEngine
- Use _recovery_source if _source is disabled or modified (#31106)
- Soft-deletes retention policy based on the global checkpoint (#30335)
- Read operation history from Lucene instead of translog (#30120)
- Use Lucene history in peer-recovery (#30522)

Relates #30086
Closes #29530

---
These works have been done by the whole team; however, these individuals
(lexical order) have significant contribution in coding and reviewing:

Co-authored-by: Adrien Grand jpountz@gmail.com
Co-authored-by: Boaz Leskes b.leskes@gmail.com
Co-authored-by: Jason Tedor jason@tedor.me
Co-authored-by: Martijn van Groningen martijn.v.groningen@gmail.com
Co-authored-by: Nhat Nguyen nhat.nguyen@elastic.co
Co-authored-by: Simon Willnauer simonw@apache.org
2018-08-30 22:11:23 -04:00
Lee Hinman 8a2d154bad Update serialization versions for custom IndexMetaData backport 2018-08-30 15:56:53 -06:00
Igor Motov 001b78f704 Replace IndexMetaData.Custom with Map-based custom metadata (#32749)
This PR removes the deprecated `Custom` class in `IndexMetaData`, in favor
of a `Map<String, DiffableStringMap>` that is used to store custom index
metadata. As part of this, there is now no way to set this metadata in a
template or create index request (since it's only set by plugins, or dedicated
REST endpoints).

The `Map<String, DiffableStringMap>` is intended to be a namespaced `Map<String,
String>` (`DiffableStringMap` implements `Map<String, String>`, so the signature
is more like `Map<String, Map<String, String>>`). This is so we can do things
like:

``` java
Map<String, String> ccrMeta = indexMetaData.getCustom("ccr");
```

And then have complete control over the metadata. This also means any
plugin/feature that uses this has to manage its own BWC, as the map is just
serialized as a map. It also means that if metadata is put in the map that isn't
used (for instance, if a plugin were removed), it causes no failures the way
an unregistered `Setting` would.

The reason I use a custom `DiffableStringMap` here rather than a plain
`Map<String, String>` is so the map can be diffed with previous cluster state
updates for serialization.

Supersedes #32683
2018-08-30 13:57:00 -06:00
Simon Willnauer af2eaf2a6c
Remove usage of `index.shrink.source.*` in 7.x (#33271)
We cut over to `index.resize.source.*` but still have these constants
being public in `IndexMetaData`. Those Settings and constants are not needed
in 7.x while we still need to keep the keys known to private settings since
they might be part of the index settings of old indices. We can remove that
in 8.0. Yet, we should remove the settings to make sure they are not used again.
2018-08-30 21:08:35 +02:00
Jim Ferenczi d0630093cd
Fix serialization of empty field capabilities response (#33263)
Fix serialization of empty field capabilities response

When no response are required (no indices match the requested patterns) the
empty response throws an NPE in the transport serialization (writeTo).
2018-08-30 18:07:58 +02:00
Jim Ferenczi 1404dd2a42
Fix nested _source retrieval with includes/excludes (#33180)
If an exclude or an include clause removes an entry to a nested field in the original source at query time,
the creation of nested hits fails with an NPE. This change fixes this exception and replaces the nested document
source with an empty map.

Closes #33163
Closes #33170
2018-08-30 15:15:50 +02:00
David Turner 47859e56ac
Move file-based discovery to core (#33241)
Today we support a static list of seed hosts in core Elasticsearch, and allow a
dynamic list of seed hosts to be provided via a file using the `discovery-file`
plugin. In fact the ability to provide a dynamic list of seed hosts is
increasingly useful, so this change moves this functionality to core
Elasticsearch to avoid the need for a plugin.

Furthermore, in order to start up nodes in integration tests we currently
assign a known port to each node before startup, which unfortunately sometimes
fails if another process grabs the selected port in the meantime. By moving the
`discovery-file` functionality into the core product we can use it to avoid
this race.

This change also moves the expected path to the file from
`$ES_PATH_CONF/discovery-file/unicast_hosts.txt` to
`$ES_PATH_CONF/unicast_hosts.txt`. An example of this file is not included in
distributions.

For BWC purposes the plugin still exists, but does nothing more than create the
example file in the old location, and issue a warning when it is used. We also
continue to support the old location for the file, but warn about its
deprecation.

Relates #29244
Closes #33030
2018-08-30 06:43:04 +01:00
Armin Braun cc4d7059bf
Ingest: Add conditional per processor (#32398)
* Ingest: Add conditional per processor
* closes #21248
2018-08-30 03:46:39 +02:00
Jason Tedor 0f22dbb1cc
Apply settings filter to get cluster settings API (#33247)
Some settings have filters applied to them and we use this in logs and
the get nodes info API. For consistency, we should apply this in the get
cluster settings API too.
2018-08-29 15:56:13 -04:00
Simon Willnauer 6a0d4b4a77
Remote 6.x transport BWC Layer for `_shrink` (#33236)
The shrink action was renamed to `_resize` with the addition
or split. This bwc layer is unnecessary on 7.x since 6.latest will
always use the resize action.
2018-08-29 16:43:13 +02:00
Luca Cavanna 49109187e2
Remove unsupported group_shard_failures parameter (#33208)
We have had support for the `group_shard_failures` parameter in our code for a while, since we introduced failures grouping. When we introduced validation of parameters at REST, we seem to have forgotten to expose such parameter. Given that the parameter is effectively not supported for many months now, that no user has complained about that and that grouping is the expected behaviour, this commit removes support for the parameter.
2018-08-29 14:05:41 +02:00
Luca Cavanna 034fdbca28
Update BucketUtils#suggestShardSideQueueSize signature (#33210)
`BucketUtils#suggestShardSideQueueSize` used to calculate the shard_size based on the number of shards. It returns now a different value only based on whether we are querying a single shard or multiple shards. This commit replaces the numberOfShards argument with a boolean that tells whether we are querying a single shard or not.
2018-08-29 13:51:54 +02:00
Armin Braun f690b492e7
INGEST: Add Pipeline Processor (#32473)
* INGEST: Add Pipeline Processor

* Adds Processor capable of invoking other pipelines
* Closes #31842
2018-08-29 11:03:10 +02:00
Alexander Reelsen 48b388ce82
Core: Add java time xcontent serializers (#33120)
This ensures that the java time class exposed by painless have proper
serialization/string representations.

Closes #31853
2018-08-29 10:00:16 +02:00
Alpar Torok f29f0af7bc
Consider multi release jars when running third party audit (#33206)
Exclude classes meant for newer versions than what we are auditing against, those classes won't be found. There's no reason to exclude JDK classes from newer versions, with this PR, we will not extract them in the first place.
2018-08-29 09:53:04 +03:00
Mark Tozzi 84b61d0738
Scroll queries asking for rescore are considered invalid (#32918)
This PR changes our behavior from silently ignoring rescore in a scroll query to instead report to the user that such a query is invalid.

Closes #31775
2018-08-28 15:48:23 -04:00
Sohaib Iftikhar 7f5e29ddb2 HLREST: add reindex API (#32679)
Adds the reindex API to the high level REST client.
2018-08-28 13:02:23 -04:00
Jonathan Little 9d92a87ae6 Remove support for deprecated params._agg/_aggs for scripted metric aggregations (#32979) 2018-08-28 09:27:43 +01:00
Alpar Torok 2cc611604f
Run Third party audit with forbidden APIs CLI (part3/3) (#33052)
The new implementation is functional equivalent with the old, ant based one.
It parses task standard error to get the missing classes and violations in the same way.
I considered re-using ForbiddenApisCliTask but Gradle makes it hard to build inheritance with tasks that have task actions , since the order of the task actions can't be controlled.
This inheritance isn't dully desired either as the third party audit task is much more opinionated and we don't want to expose some of the configuration.
We could probably extract a common base class without any task actions, but probably more trouble than it's worth.

Closes #31715
2018-08-28 10:03:30 +03:00
Nhat Nguyen 014b3236dc
Ensure to generate identical NoOp for the same failure (#33141)
We generate slightly different NoOps in InternalEngine and
TransportShardBulkAction for the same failure.

1. InternalEngine uses Exception#getFailure to generate a message
without the class name: newOp [NoOp{seqNo=1, primaryTerm=1,
reason='Contexts are mandatory in context enabled completion field
[suggest_context]'}].

2. TransportShardBulkAction uses Exception#toString to generate a
message with the class name: NoOp{seqNo=1, primaryTerm=1,
reason='java.lang.IllegalArgumentException: Contexts are mandatory in
context enabled completion field [suggest_context]'}.

If a write operation fails while a replica is recovering, that replica
will possibly receive two different NoOps: one from recovery and one
from replication. These two different NoOps will trip
TranslogWriter#assertNoSeqNumberConflict assertion.

This commit ensures that we generate the same Noop for the same failure.

Closes #32986
2018-08-27 15:59:42 -04:00
Luca Cavanna ed0571e16c
ShardSearchFailure#readFrom to set index and shardId (#33161)
As part of recent changes made to `ShardOperationFailedException` we introduced `index` and `shardId` members to the base class, but the subclasses are entirely responsible for the serialization of such fields. In the case of `ShardSearchFailure`, we have an additional `SearchShardTarget` instance member which also holds the index and the shardId, hence they get serialized as part of `SearchShardTarget` itself. When de-serializing a `ShardSearchFailure` though, we need to remember to also set the parent class `index` and `shardId` fields otherwise they get lost

Relates to #32640
2018-08-27 20:31:27 +02:00
Jason Tedor 318df2a107
Adjust BWC version on mapping version
The introduction of mapping version on index metadata has been
backported to 6.x. This commit adjusts the BWC version around mapping
version to account for this backport.
2018-08-27 13:17:15 -04:00
Jason Tedor 2aef7e0900
Introduce mapping version to index metadata (#33147)
This commit introduces mapping version to index metadata. This value is
monotonically increasing and is updated on mapping updates. This will be
useful in cross-cluster replication so that we can request mapping
updates from the leader only when there is a mapping update as opposed
to the strategy we employ today which is to request a mapping update any
time there is an index metadata update. As index metadata updates can
occur for many reasons other than mapping updates, this leads to some
unnecessary requests and work in cross-cluster replication.
2018-08-27 12:21:11 -04:00
Mikita Karaliou f1f6d4ed33 Support only string `format` in date, root object & date range (#28117)
Limit date `format` attribute to String values only.

Closes #23650
2018-08-27 12:24:51 +02:00
Daniel Mitterdorfer 06c0055c0f
Have circuit breaker succeed on unknown mem usage
With this commit we implement a workaround for
https://bugs.openjdk.java.net/browse/JDK-8207200 which is a race
condition in the JVM that results in `IllegalArgumentException` to be
thrown in rare cases when we determine memory usage via `MemoryMXBean`.
As we do not want to fail requests in those cases we always return zero
memory usage.

Relates #31767
Relates #33125
2018-08-27 07:09:27 +02:00
Jason Tedor 143cd9bbaa
Do not lose default mapper on metadata updates (#33153)
When applying index metadata updates we run through the mappings
updating them if needed. Today if there is not an update to the default
mapper, we can lose the default mapping. This means that, for example,
if we apply a settings update to an index we will lose the default
mapper. This happens because we were not guarding updating the default
mapping with a check that the default mapping was updated in the
metadata update. When there is no update in the metadata update, we need
to continue to preserve the previous default mapping. This commit
achieves this by moving the updating of the default mapping under the
same guard that we use for updating the default mapping source. We add a
test that fails before putting the update under a guard and now passes
after moving the update under the guard.
2018-08-26 15:57:52 -04:00
Jason Tedor f8b07a0d84
Fix a mappings update test (#33146)
This commit fixes a mappings update test. The test is broken in the
sense that it passes, but for the wrong reason. The test here is testing
that if we make a mapping update but do not commit that mapping update
then the mapper service still maintains the previous document
mapper. This was not the case long, long ago when a mapping update would
update the in-memory state before the cluster state update was
committed. This test was passing, but it was passing because the mapping
update was never even updated. It was never even updated because it was
encountering a null pointer exception. Of course the in-memory state is
not going to be updated in that case, we are simply going to end up with
a failed cluster state update. Fixing that leads to another issue which
is that the mapping source does not even parse so again we would, of
course, end up with the in-memory state not being modified. We fix these
issues, assert that the result cluster state task completed
successfully, and finally that the in-memory state was not updated since
we never committed the resulting cluster state.
2018-08-26 09:36:17 -04:00
Simon Willnauer 3376922e8b
Add proxy support to RemoteClusterConnection (#33062)
This adds support for connecting to a remote cluster through
a tcp proxy. A remote cluster can configured with an additional
`search.remote.$clustername.proxy` setting. This proxy will be used
to connect to remote nodes for every node connection established.
We still try to sniff the remote clsuter and connect to nodes directly
through the proxy which has to support some kind of routing to these nodes.
Yet, this routing mechanism requires the handshake request to include some
kind of information where to route to which is not yet implemented. The effort
to use the hostname and an optional node attribute for routing is tracked
in #32517

Closes #31840
2018-08-25 20:41:32 +02:00
Nhat Nguyen 9dad82ece8
TEST: Skip assertSeqNos for closed shards (#33130)
If a shard was closed, we return null for SeqNoStats. Therefore the
assertion assertSeqNos will hit NPE when it verifies a closed shard.

This commit skips closed shards in assertSeqNos and enables this
assertion in AbstractDisruptionTestCase.
2018-08-24 21:02:13 -04:00
Nik Everett a023e64801 Checkstyle!
Catching your unused imports since 2001.
2018-08-24 14:13:13 -04:00
Jim Ferenczi 70030c18f1 [Test] Fix sporadic failure in MembershipActionTests
Rewrite test that require Version.V_5 constants.
2018-08-24 18:40:04 +02:00
Mayya Sharipova 6f1ee76443 Revert "Do NOT allow termvectors on nested fields (#32728)"
This reverts commit fdff8f3db0.
2018-08-24 10:12:16 -04:00
Jim Ferenczi f4e9729d64
Remove unsupported Version.V_5_* (#32937)
This change removes the es 5x version constants and their usages.
2018-08-24 09:51:21 +02:00
Michael Basnight 8f16696fe1 Add versions 5.6.12 and 6.4.1 2018-08-23 15:49:14 -05:00
Mayya Sharipova fdff8f3db0
Do NOT allow termvectors on nested fields (#32728)
Requesting _termvectors on a nested field or any sub-fields of a nested field
returns empty results.

Closes #21625
2018-08-23 16:46:47 -04:00
Simon Willnauer f3cfd4504f Use `addIfAbsent` instead of checking if an element is contained
Relates to #32988
2018-08-23 13:43:23 +02:00
Ignacio Vera d7219c05a2
Search: Support of wildcard on docvalue_fields (#32980)
* Search: Support of wildcard on docvalue_fields

For consistency with stored_fields, docvalue_fields should support the use of wildcards. 
Documentation of doc values fields is updated accordingly.

See also: #26390

Closes #26299
2018-08-23 10:04:00 +02:00
Jim Ferenczi ffe895e16e
Change query field expansion (#33020)
This commit changes the query field expansion for query parsers
to not rely on an hardcoded list of field types. Instead we rely on
the type of exception that is thrown by MappedFieldType#termQuery to
include/exclude an expanded field.

Supersedes #31655

Closes #31798
2018-08-23 09:52:48 +02:00
Armin Braun 46247ff1f9
INGEST: Cleanup Redundant Put Method (#33034) 2018-08-23 07:43:36 +02:00
Luca Cavanna 393eec1482
Set maxScore for empty TopDocs to Nan rather than 0 (#32938)
We used to set `maxScore` to `0` within `TopDocs` in situations where there is really no score as the size was set to `0` and scores were not even tracked. In such scenarios, `Float.Nan` is more appropriate, which gets converted to `max_score: null` on the REST layer. That's also more consistent with lucene which set `maxScore` to `Float.Nan` when merging empty `TopDocs` (see `TopDocs#merge`).
2018-08-22 17:23:54 +02:00
Jason Tedor 67bfb765ee
Refactor Netty4Utils#maybeDie (#33021)
In our Netty layer we have had to take extra precautions against Netty
catching throwables which prevents them from reaching the uncaught
exception handler. This code has taken on additional uses in NIO layer
and now in the scheduler engine because there are other components in
stack traces that could catch throwables and suppress them from reaching
the uncaught exception handler. This commit is a simple cleanup of the
iterative evolution of this code to refactor all uses into a single
method in ExceptionsHelper.
2018-08-22 10:18:07 -04:00
Simon Willnauer ead198bf2e
Add settings updater for 2 affix settings (#33050)
Today we can only have non-affix settings updated and consumed _together_.
Yet, there are use-cases where two affix settings depend on each other which
makes using the hard without consuming updates together. Unfortunately, there is
not straight forward way to have N settings updated together in a type-safe way
having 2 still serves a large portion of use-cases.
2018-08-22 14:13:27 +02:00
Nhat Nguyen 262d3c0783
Allow engine to recover from translog upto a seqno (#33032)
This change allows an engine to recover from its local translog up to
the given seqno. The extended API can be used in these use cases:

When a replica starts following a new primary, it resets its index to
the safe commit, then replays its local translog up to the current
global checkpoint (see #32867).

When a replica starts a peer-recovery, it can initialize the
start_sequence_number to the persisted global checkpoint instead of the
local checkpoint of the safe commit. A replica will then replay its
local translog up to that global checkpoint before accepting remote
translog from the primary. This change will increase the chance of
operation-based recovery. I will make this in a follow-up.

Relates #32867
2018-08-22 07:57:44 -04:00
Simon Willnauer ffb1a5d5b7
Expose `max_concurrent_shard_requests` in `_msearch` (#33016)
Today `_msearch` doesn't allow modifying the `max_concurrent_shard_requests`
per sub search request. This change adds support for setting this parameter on
all sub-search requests in an `_msearch`.

Relates to #31877
2018-08-22 08:45:08 +02:00
Julie Tibshirani 67b5a83a9a
Ensure that _exists queries on keyword fields use norms when they're available. (#33006) 2018-08-21 16:33:42 -07:00
Jim Ferenczi 767c69593c
Fix quoted _exists_ query (#33019)
This change in the `query_string` query fixes the detection of the special
`_exists_` field when it is used with a quoted term.

Closes #28922
2018-08-21 22:15:09 +02:00
Jim Ferenczi 8b43e21521
Fix multi fields empty query (#33017)
This change fixes empty query removal when all fields remove the search term in
`simple_query_string`, `multi_match` and `query_string`.

Closes #33009
2018-08-21 22:12:53 +02:00
Igor Motov 3973bb4028
Fix north pole overflow error in GeoHashUtils.bbox() (#32891)
Fixes an overflow error in  GeoHashUtils.bbox() calculation of a
bounding box for geohashes with maximum precision located next to the
north pole.
2018-08-21 14:59:37 -04:00
Jason Tedor bdfcc326d7
Enable avoiding mmap bootstrap check (#32421)
The maximum map count boostrap check can be a hindrance to users that do
not own the underlying platform on which they are executing
Elasticsearch. This is because addressing it requires tuning the kernel
and a platform provider might now allow this, especially on shared
infrastructure. However, this bootstrap check is not needed if mmapfs is
not in use. Today we do not have a way for the user to communicate that
they are not going to use mmapfs. This commit therefore adds a setting
that enables the user to disallow mmapfs. When mmapfs is disallowed, the
maximum map count bootstrap check is not enforced. Additionally, we
fallback to a different default index store and prevent the explicit use
of mmapfs for an index.
2018-08-21 11:02:25 -04:00
Simon Willnauer 92076497e5
Use a dedicated ConnectionManger for RemoteClusterConnection (#32988)
This change introduces a dedicated ConnectionManager for every RemoteClusterConnection
such that there is not state shared with the TransportService internal ConnectionManager.
All connections to a remote cluster are isolated from the TransportService but still uses
the TransportService and it's internal properties like the Transport, tracing and internal
listener actions on disconnects etc.
This allows a remote cluster connection to have a different lifecycle than a local cluster connection,
also local discovery code doesn't get notified if there is a disconnect on from a remote cluster and
each connection can use it's own dedicated connection profile which allows to have a reduced set of
connections per cluster without conflicting with the local cluster.

Closes #31835
2018-08-21 12:43:25 +02:00