Commit Graph

5755 Commits

Author SHA1 Message Date
Nhat Nguyen 0127b71901 Adjust keep alive assertion in ShardSearchRequest (#62582)
Relates #62184
2020-09-17 16:09:54 -04:00
Lee Hinman 9bb7ce0b22
[7.x] Allocate new indices on "hot" or "content" tier depending on data stream inclusion (#62338) (#62557)
Backports the following commits to 7.x:

    Allocate new indices on "hot" or "content" tier depending on data stream inclusion (#62338)
2020-09-17 13:29:23 -06:00
Martijn van Groningen 5f643433c6
Prohibit the usage of create index api in namespaces managed by data stream templates (#62574)
Backport of #62527 to 7.x branch.

This commit adds validation that prohibits the creation of regular indices
in the namespace of templates with data streams enabled.

It shouldn't be possible to create ordinary indices when the name of the index
matches with a composable index template that enables data streams. Auto creation
has logic that creates data streams instead of regular indices. However validation
logic for the create index api was missing.
2020-09-17 20:10:42 +02:00
Jim Ferenczi df93b31b15
Faster sequential access for stored fields (#62509) (#62573)
Faster sequential access for stored fields

Spinoff of #61806
Today retrieving stored fields at search time is optimized for random access.
So we make no effort to keep state in order to not decompress the same data
multiple times because two documents might be in the same compressed block.
This strategy is acceptable when retrieving a top N sorted by score since
there is no guarantee that documents will be on the same block.
However, we have some use cases where the document to retrieve might be
completely sequential:

Scrolls or normal search sorted by document id.
Queries on Runtime fields that extract from _source.
This commit exposes a sequential stored fields reader in the
custom leaf reader that we use at search time.
That allows to leverage the merge instances of stored fields readers that
are optimized for sequential access.
This change focuses on the fetch phase for now and leverages the merge instances
for stored fields only if all documents to retrieve are adjacent.
Applying the same logic in the source lookup of runtime fields should
be trivial but will be done in a follow up.

The speedup on queries sorted by doc id is significant.
I played with the scroll task of the http_logs rally track
on my laptop and had the following result:

|                                                        Metric |   Task |    Baseline |   Contender |     Diff |    Unit |
|--------------------------------------------------------------:|-------:|------------:|------------:|---------:|--------:|
|                                            Total Young Gen GC |        |       0.199 |       0.231 |    0.032 |       s |
|                                              Total Old Gen GC |        |           0 |           0 |        0 |       s |
|                                                    Store size |        |     17.9704 |     17.9704 |        0 |      GB |
|                                                 Translog size |        | 2.04891e-06 | 2.04891e-06 |        0 |      GB |
|                                        Heap used for segments |        |    0.820332 |    0.820332 |        0 |      MB |
|                                      Heap used for doc values |        |    0.113979 |    0.113979 |        0 |      MB |
|                                           Heap used for terms |        |     0.37973 |     0.37973 |        0 |      MB |
|                                           Heap used for norms |        |     0.03302 |     0.03302 |        0 |      MB |
|                                          Heap used for points |        |           0 |           0 |        0 |      MB |
|                                   Heap used for stored fields |        |    0.293602 |    0.293602 |        0 |      MB |
|                                                 Segment count |        |         541 |         541 |        0 |         |
|                                                Min Throughput | scroll |     12.7872 |     12.8747 |  0.08758 | pages/s |
|                                             Median Throughput | scroll |     12.9679 |     13.0556 |  0.08776 | pages/s |
|                                                Max Throughput | scroll |     13.4001 |     13.5705 |  0.17046 | pages/s |
|                                       50th percentile latency | scroll |     524.966 |     251.396 |  -273.57 |      ms |
|                                       90th percentile latency | scroll |     577.593 |     271.066 | -306.527 |      ms |
|                                      100th percentile latency | scroll |      664.73 |     272.734 | -391.997 |      ms |
|                                  50th percentile service time | scroll |     522.387 |     248.776 | -273.612 |      ms |
|                                  90th percentile service time | scroll |     573.118 |      267.79 | -305.328 |      ms |
|                                 100th percentile service time | scroll |     660.642 |     268.963 | -391.678 |      ms |
|                                                    error rate | scroll |           0 |           0 |        0 |       % |
Closes #62024
2020-09-17 19:58:18 +02:00
Alan Woodward 5421a743a7 Move SearchLookup into FetchContext (#62549)
FetchSubPhase#getProcessor currently takes a SearchLookup parameter. This
however is only needed by a couple of subphases, and will almost certainly change in
future as we want to simplify how fetch phases retrieve values for individual hits.

To future-proof against further signature changes, this commit moves the SearchLookup
reference into FetchContext instead.
2020-09-17 17:39:02 +01:00
Alan Woodward e3e3aef3d8 Load version metadata even when stored fields are disabled (#62533)
Currently we throw an error if stored fields are disabled, but hit version metadata is
requested on a search. This doesn't make much sense, as the version information
is stored in docvalues and so has no connection with stored fields.

This commit removes the link between the two, allowing version metadata to be loaded
even when stored fields are disabled in a request.

Fixes #62456
2020-09-17 17:39:02 +01:00
Alan Woodward 91e2330529 Warn on badly-formed null values for date and IP field mappers (#62487)
In #57666 we changed when null_value was parsed for ip and date fields. Previously,
the null value was stored as a string, and parsed into a date or InetAddress whenever
a document containing a null value was encountered. Now, the values are parsed when
the mappings are built, which means that bad values are detected up front; if you try and
add a mapping with a badly-parsed ip or date for a null_value, the mapping will be
rejected.

This causes problems for upgrades in the case when you have a badly-formed null_value
in a pre-7.9 cluster. This commit fixes the upgrade case by changing the logic to only
logging a warning on the badly formed value, replicating the earlier behaviour.

Fixes #62363
2020-09-17 16:38:08 +01:00
Ignacio Vera 901000891a
Fix test error in InternalCardinalityTests#testEqualsAndHashcode (#62542) (#62554)
Make sure the the new HLL++ is different to the original one
2020-09-17 17:09:13 +02:00
Alan Woodward 63afc61b08 Introduce FetchContext (#62357)
We currently pass a SearchContext around to share configuration among
FetchSubPhases. With the introduction of runtime fields, it would be useful
to start storing some state on this context to be shared between different
subphases (for example, stored fields or search lookups can be loaded lazily
but referred to by many different subphases). However, SearchContext is a
very large and unwieldy class, and adding more methods or state here feels
like a bridge too far.

This commit introduces a new FetchContext class that exposes only those
methods on SearchContext that are required for fetch phases. This reduces
the API surface area for fetch phases considerably, and should give us some
leeway to add further state.
2020-09-17 09:57:43 +01:00
Adrien Grand e0a4a94985
Speed up merging when source is disabled. (#62443) (#62474)
The CodecReader wrapper we use to remove the `_recovery_source` field
doesn't override `StoredFieldsreader#getMergeInstance`, which has the
undesired side-effect of preventing the wrapped stored fields reader
from optimizing merging.
2020-09-17 10:53:31 +02:00
David Turner 62dcc5b1ae Suppress stack in VersionConflictEngineException (#62433)
`VersionConflictEngineException` is thrown on the hot path for updates,
but stack traces are expensive to compute and transport and rarely
useful for this kind of exception. This commit avoids computing the
stack trace for these exceptions.
2020-09-17 09:40:07 +01:00
Adrien Grand 9a8225bbc1
Upgrade to lucene-8.7.0-snapshot-9cd3af50f80. (#62450) (#62476)
This new snapshot contains the following JIRAs that we're interested in:
 - [LUCENE-9525](https://issues.apache.org/jira/browse/LUCENE-9525)
Better handling of small documents. This should improve retrieval times
when documents are less than ~1kB.
 - [LUCENE-9510](https://issues.apache.org/jira/browse/LUCENE-9510)
Faster flushes when index sorting is enabled by not compressing the
temporary files that store stored fields and term vectors.
2020-09-17 10:28:20 +02:00
Armin Braun 5112c17319
Add WARN Logging on Slow Transport Message Handling (#62444) (#62521)
Add simple WARN logging on slow inbound TCP messages.
2020-09-17 10:12:20 +02:00
David Turner 14aec44cd8 Log if recovery affected by disconnect (#62437)
Today we only emit `DEBUG` logs if the source disconnects from the
target during a recovery. This deserves to be noisier by default since
it should be rare and may help users identify other problems with their
network or with their shard movements.

This commit promotes this message to `INFO`. There's no need for `WARN`
since these days we will normally resume the recovery where it left off.
2020-09-17 08:22:40 +01:00
Ignacio Vera 2d3ca9c155
Introduce a sparse HyperLogLogPlusPlus class for cloning and serializing low cardinality buckets (#62480) (#62520)
Reduces the memory footprint of an HLL++ structure that uses Linear counting when cloning or deserialising the data structure.
2020-09-17 08:54:50 +02:00
Julie Tibshirani e1da558206 Remove unused test search context for significant_terms. 2020-09-16 14:27:11 -07:00
Jay Modi 5da922064f
LocalNodeMasterListener is a regular listener (#62485)
This commit makes the LocalNodeMasterListener interface extend the
ClusterStateListener interface and use a default implementation for
detecting whether the local node master status changed.

Backport of #62422
2020-09-16 11:42:53 -06:00
Tanguy Leroux 8a2e9e66d4
Wait for relocations and disk threshold monitor in DiskThresholdDeciderIT (#62358) (#62467)
Closes #62326
2020-09-16 17:40:20 +02:00
Armin Braun f6a8599cf8
Don't Start Redundant ConsistentSettingsService (#62283) (#62428)
The consistent settings service is only used in tests so far. No need to start it
unless it's actually used.
2020-09-16 09:43:04 +02:00
Ignacio Vera f3ed641fc7
Adds bucketOrd back to cardinality algorithms (#62389) (#62427) 2020-09-16 08:41:57 +02:00
Nik Everett 24a24d050a
Implement fields fetch for runtime fields (backport of #61995) (#62416)
This implements the `fields` API in `_search` for runtime fields using
doc values. Most of that implementation is stolen from the
`docvalue_fields` fetch sub-phase, just moved into the same API that the
`fields` API uses. At this point the `docvalue_fields` fetch phase looks
like a special case of the `fields` API.

While I was at it I moved the "which doc values sub-implementation
should I use for fetching?" question from a bunch of `instanceof`s to a
method on `LeafFieldData` so we can be much more flexible with what is
returned and we're not forced to extend certain classes just to make the
fetch phase happy.

Relates to #59332
2020-09-15 20:24:10 -04:00
Nik Everett 0a7f335215
Speed up writeVInt (backport of #62345) (#62419)
This speeds up `StreamOutput#writeVInt` quite a bit which is nice
because it is *very* commonly called when serializing aggregations. Well,
when serializing anything. All "collections" serialize their size as a
vint. Anyway, I was examining the serialization speeds of `StringTerms`
and this saves about 30% of the write time for that. I expect it'll be
useful other places.
2020-09-15 17:14:08 -04:00
Nik Everett 771a8893a6
Add more debugging information for cardinality agg (#62317) (#62397)
This adds two extra bits of info to the profiler:
1. Count of the number of different types of collectors. This lets us figure
   out if we're using the optimization for segment ordinals. It adds a few
   more similar counters just for good measure.
2. Profiles the `getLeafCollector` and `postCollection` methods. These are
   non-trivial for some aggregations, like cardinality.
2020-09-15 13:21:11 -04:00
Armin Braun ffbc64bd10
Log WARN on Response Deserialization Failure (#62368) (#62388)
We never see this exception in the logs even though it's pretty severe.
All we might see is an exception about a transport message not having been read fully
from the logic that follows this code.
Technically we should probably bubble up the exception but that's a bigger change
and needs some carefully reasoning, this change for the time being at least simplifies
tracking down deserialization issues in responses.
2020-09-15 18:27:39 +02:00
Adrien Grand 6db8afefc2
Upgrade to lucene-8.7.0-snapshot-cdfdc1e0851. (#62376)
Upgrade to a new Lucene snapshot that (at least partially) addresses the
indexing rate regression when index sorting is enabled.

Backport of #62334.
2020-09-15 17:48:07 +02:00
Alan Woodward f89fa421e2 Remove unnecessary IndexSearcher field on HitContext (#62378)
FastVectorHighlighter uses the top-level reader to rewrite queries against, which
it gets via an IndexSearcher field on HitContext. However, we can already access
this top-level reader via HitContext's existing LeafReaderContext field.

This commit removes the unnecessary field and constructor parameter, and
changes the implementation of topLevelReader to go via ReaderUtils and
the leaf reader context.
2020-09-15 15:46:14 +01:00
Christoph Büscher 0ca9829867 Muting CoordinatorTests#testLogsMessagesIfPublicationDelayed 2020-09-15 15:40:51 +02:00
Albert Zaharovits aeed1c05b0
Ensure authz operation overrides transient authz headers (#61621)
AuthorizationService#authorize uses the thread context to carry the result of the
authorisation as transient headers. The listener argument to the `authorize` method
must necessarily observe the header values. This PR makes it so that
the authorisation transient headers (`_indices_permissions` and `_authz_info`, but
NOT `_originating_action_name`) of the child action override the ones of the parent action.

Co-authored-by: Tim Vernum tim@adjective.org
2020-09-15 16:37:38 +03:00
Armin Braun eae6a3b18e
Fix testMappingVersionAfterDynamicMappingUpdate (#62352) (#62360)
There is a race in this test where the index request will return
once the dynamic mapping update has been observed by the cluster
state observer internally used by the indexing but not hit all
state appliers and thus isn't showing up as the applied state returned
by `clusterService.state()` yet.
2020-09-15 11:59:22 +02:00
Alan Woodward a68f7077c7 Rationalise fetch phase exceptions (#62230)
We have a special FetchPhaseExecutionException which contains some useful
information about which shard and doc a fetch phase has failed in. However, this
is not used in many places - currently only the ExplainPhase and the highlighters
throw one, and the FetchPhase itself catches IOExceptions and just passes them
to the ExceptionsHelper with no extra context.

This commit changes FetchPhase to throw FetchPhaseExecutionException if it
encounters problems in any of its subphases, and removes the special handling
from the explain and highlight phases. It also removes the need to pass shard ids
around when building HitContext objects.
2020-09-15 09:28:19 +01:00
Alan Woodward 8089210815 Some small cleanups in TermVectorsService (#62292)
We removed the use of aggregated stats from term vectors back in #16452, but there is
a bunch of dead code left here which can be stripped out.
2020-09-15 09:01:49 +01:00
Ignacio Vera 3536f7f7c2
Initialize BitArray storage as number of bits (#62327) (#62354) 2020-09-15 08:34:22 +02:00
Armin Braun c81a076f5a
Improve Efficiency of ClusterApplierService Iteration (#62282) (#62350)
The complexity of removing a timeout listener was `O(n)` which
means that in case of many queued up CS update tasks (such as in the
case of an avalanche of dynamic mapping updates) we're dealing with
quadratic complexity for timing out N tasks which was observed to be
an issue in practice.

This PR makes the complexity of timing out a task `O(1)` and generally
simplifies the iteration logic of listeners and applies to be a little
more efficient and inline better.
2020-09-15 05:59:48 +02:00
Julie Tibshirani f56ce4f39b
Fix failure in InnerHitBuilderTests around 'fields' option. (#62344)
The case InnerHitBuilderTests#testEqualsAndHashcode creates a copy of the object
by serializing + deserializing it, then applies a modification. If the 'fields'
list is empty, then deserializing it results in Collections.emptyList. Because
this is immutable, then modifying it can throw an UnsupportedOperationException.

This PR takes the same approach as for docvalue_fields, where we create a new
list instead of trying to add to an empty one.
2020-09-14 15:39:03 -07:00
Julie Tibshirani 4a19bdb2ea
Support the 'fields' option in inner_hits and top_hits. (#62337)
This PR adds support for the 'fields' option in the following places:
* Anytime `inner_hits` is used, for both fetching nested/ child docs and field collapsing
* The `top_hits` aggregation

Addresses #61949.
2020-09-14 11:51:45 -07:00
David Turner 9acd2fd1fd Minor cleanups to BytesReferenceStreamInput (#62302)
Followup to #61681:

- reuse the current iterator in `reset()` if possible
- simply some integer-overflow-avoidance in `skip()`
- clarify some comments
- address some IntelliJ warnings
2020-09-14 17:02:27 +01:00
Christoph Büscher e2eada2498
Fix disabling `allow_leading_wildcard` (#62300) (#62318)
Disabling the `query_string` queries `allow_leading_wildcard` parameter didn't
work after a change probably introduced in #60959 because the various field types
`wildcardQuery` don't check the leading characters like
QueryParserBase#getWildcardQuery does. This PR adds the missing check also
before calling the field types wildcard generating method.

Closes #62267
2020-09-14 17:13:17 +02:00
Alan Woodward 5358cee29c Cut over more mapping tests to MapperServiceTestCase (#62312)
Shaves a few more seconds off the build.
2020-09-14 16:00:37 +01:00
Armin Braun 95766da345
Save Some Allocations when Working with ClusterState (#62060) (#62303)
Just a number of obvious spots where we were allocating
duplicate empty structures or otherwise inefficient that I
found while investigating snapshot cluster state update performance.
2020-09-14 15:09:54 +02:00
Armin Braun 875af1c976
Remove Dead Variable in BlobStoreIndexShardSnapshots. (#62285) (#62295)
This was never used.

Co-authored-by: Howard <danielhuang@tencent.com>
2020-09-14 13:40:39 +02:00
Luca Cavanna 53bf057a53 [TEST] avoid double null check in TransportSearchActionTests 2020-09-11 10:10:09 +02:00
Nhat Nguyen aafb2cb812 Support point in time cross cluster search (#61827)
This commit integrates point in time into cross cluster search.

Relates #61062
Closes #61790
2020-09-10 19:25:48 -04:00
Nhat Nguyen 808c8689ac Always include the matching node when resolving point in time (#61658)
If shards are relocated to new nodes, then searches with a point in time
will fail, although a pit keeps search contexts open. This commit solves
this problem by reducing info used by SearchShardIterator and always
including the matching nodes when resolving a point in time.

Closes #61627
2020-09-10 19:25:48 -04:00
Nhat Nguyen 035f0638f4 Support point in time in async_search (#61560)
This commit integrates point in time into async search and
ensures that it works correctly with security enabled.

Relates #61062
2020-09-10 19:25:48 -04:00
Nhat Nguyen 063a6d047c Release search context when scroll keep_alive is too large (#62179)
Previously, we close related search contexts if the keep_alive of a scroll is too large. 
But we accidentally change this behavior in #62061.
2020-09-10 19:25:48 -04:00
Nhat Nguyen 2eb1e8bc84 Make keep alive of point in time optional in search (#62184)
A search request should not be required to extend the keep_alive of a point in time. 
This change makes that parameter optional.
2020-09-10 19:25:48 -04:00
Jim Ferenczi 3fc35aa76e Shard Search Scroll failures consistency (#62061)
Today some uncaught shard failures such as RejectedExecutionException skips the release of shard context
and let subsequent scroll requests access the same shard context again. Depending on how the other shards advanced,
this behavior can lead to missing data since scrolls always move forward.
In order to avoid hidden data loss, this commit ensures that we always release the context of shard search scroll requests whenever a failure
occurs locally. The shard search context will no longer exist in subsequent scroll requests which will lead to consistent shard failures
in the responses.
This change also modifies the retry tests of the reindex feature. Reindex retries scroll search request that contains a shard failure and
move on whenever the failure disappears. That is not compatible with how scrolls work and can lead to missing data as explained above.
That means that reindex will now report scroll failures when search rejection happen during the operation instead of skipping document
silently.
Finally this change removes an old TODO that was fulfilled with #61062.
2020-09-10 19:25:48 -04:00
Jim Ferenczi 4d528e91a1 Ensure validation of the reader context is executed first (#61831)
This change makes sure that reader context is validated (`SearchOperationListener#validateReaderContext)
before any other operation and that it is correctly recycled or removed at the end of the operation.
This commit also fixes a race condition bug that would allocate the security reader for scrolls more than once.

Relates #61446

Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
2020-09-10 19:25:48 -04:00
Luca Cavanna 44bd4a6004 Fix point in time toXContent impl (#62080)
PointInTimeBuilder is a ToXContentObject yet it does not print out a whole object (it is rather a fragment). Also, when it is printed out as part of SearchSourceBuilder, an error is thrown because pit should be wrapped into its own object.

This commit fixes this and adds tests for it.
2020-09-10 19:25:47 -04:00
Nhat Nguyen 3d69b5c41e Introduce point in time APIs in x-pack basic (#61062)
This commit introduces a new API that manages point-in-times in x-pack
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.

A point in time must be opened before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.

```
POST /my_index/_pit?keep_alive=1m
```

The response from the above request includes a `id`, which should be
passed to the `id` of the `pit` parameter of search requests.

```
POST /_search
{
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "pit": {
            "id":  "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
            "keep_alive": "1m"
    }
}
```

Point-in-times are automatically closed when the `keep_alive` is
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.

```
DELETE /_pit
{
    "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```

#### Notable works in this change:

- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966

Relates #46523
Relates #26472

Co-authored-by: Jim Ferenczi <jimczi@apache.org>
2020-09-10 19:25:47 -04:00
Armin Braun e0a81f7d14
Speed up Version Checks (#62216) (#62253)
The `fromId` method would show up in profiling and JIT analysis as not-inlinable because it's too large
in the contexts it's used in in many cases and was consuming a surprising amount of cycles for computing the
min compat versions.

-> extract cold path from `fromId` to make JIT happy and cache minimumg compatible versions to fields.
2020-09-10 22:57:06 +02:00
Armin Braun 25db5acb0d
Simplify TimeValue Serialization (#62023) (#62248)
This can be done without map lookups => less code and much smaller methods => better inlining potentially.
2020-09-10 20:16:21 +02:00
Armin Braun 7b941a18e9
Optimize Snapshot Shard Status Update Handling (#62070) (#62219)
Avoiding a number of noop updates that were observed to cause trouble (as in needless noop CS publishing) which can become an issue when working with a large number of concurrent snapshot operations.
Also this sets up some simplifications made in the clone snapshot branch.
2020-09-10 16:29:16 +02:00
Ignacio Vera c8981ea93d
upgrade to lucene-8.7.0-snapshot-b313618cc1d (#62213) (#62222) 2020-09-10 16:23:18 +02:00
Igor Motov b6bff56a56
Fix hard_bounds interval handling (#62129) (#62188)
The hard bounds were incorrectly scaled for intervals, which was
causing incorrect buckets to show up or no buckets at all for
interval other than 1.

Closes #62126
2020-09-09 15:42:12 -04:00
Nik Everett 1104d65465
Fix bug with terms' min_doc_count (#62130) (#62177)
The `global_ordinals` implementation of `terms` had a bug when
`min_doc_count: 0` that'd cause sub-aggregations to have array index out
of bounds exceptions. Ooops. My fault. This fixes the bug by assigning
ordinals to those buckets.

Closes #62084
2020-09-09 13:04:51 -04:00
Armin Braun 6710104673
Fix Creating NOOP Tasks on SNAPSHOT Pool (#62152) (#62157)
Fixing a few spots where NOOP tasks on the snapshot pool were created needlessly.
Especially when it comes to mixed master+data nodes and concurrent snapshots these
hurt delete operation performance needlessly.
2020-09-09 14:05:17 +02:00
Luca Cavanna fbf0967e20 QueryPhaseResultConsumer to call notifyPartialReduce (#62083)
As part of #60275 QueryPhaseResultConsumer ended up calling SearchProgressListener#onPartialReduce directly instead of notifyPartialReduce. That means we don't catch exceptions that may occur while executing the progress listener callback.

This commit fixes the call and adds a test for this scenario.
2020-09-09 13:44:07 +02:00
Luca Cavanna ad83261348 Print out search request as part of async search task description (#62057)
Currently, the async search task is the task that will be running through the whole execution of an async search. While the submit async search task prints out the search as part of its description, async search task doesn't while it should.

With this commit we address that while also making sure that the description highlights that the task is originated from an async search.

Also, we streamline the way the description is printed out by SearchTask so that it does not get forgotten in the future.
2020-09-09 13:44:07 +02:00
Rory Hunter b7fd7cf154
Write deprecation logs to a data stream (#61966)
Backport of #58924.

Closes #46106. Introduce a mechanism for writing deprecation logs to a data stream
as well as to disk.
2020-09-09 12:16:28 +01:00
Armin Braun ed4984a32e
Remove Redundant Stream Wrapping from Compression (#62017) (#62132)
In many cases we don't need a `StreamInput` or `StreamOutput`
wrapper around these streams so I this commit adjusts the API
to just normal streams and adds the wrapping where necessary.
2020-09-09 03:27:38 +02:00
Nik Everett b8e9a7125f
Speed up empty highlighting many fields (backport of #61860) (#62122)
Kibana often highlights *everything* like this:
```
POST /_search
{
  "query": ...,
  "size": 500,
  "highlight": {
    "fields": {
      "*": { ... }
    }
  }
}
```

This can get slow when there are hundreds of mapped fields. I tested
this locally and unscientifically and it took a request from 20ms to
150ms when there are 100 fields. I've seen clusters with 2000 fields
where simple search go from 500ms to 1500ms just by turning on this sort
of highlighting. Even when the query is just a `range` that and the
fields are all numbers and stuff so it won't highlight anything.

This speeds up the `unified` highlighter in this case in a few ways:
1. Build the highlighting infrastructure once field rather than once pre
   document per field. This cuts out a *ton* of work analyzing the query
   over and over and over again.
2. Bail out of the highlighter before loading values if we can't produce
   any results.

Combined these take that local 150ms case down to 65ms. This is unlikely
to be really useful when there are only a few fetched docs and only a
few fields, but we often end up having many fields with many fetched
docs.
2020-09-08 15:49:50 -04:00
Alan Woodward 28fd4a2ae8 Convert RangeFieldMapper to parametrized form (#62058)
This also adds the ability to define a serialization check on Parameters, used
in this case to only serialize format and locale parameters if the mapper is a
date range.
2020-09-08 18:44:13 +01:00
Alan Woodward 5f05eef7e3 Convert some more mapping tests to MapperServiceTestCase (#62089)
We don't need to extend ESSingleNodeTestCase for all these tests.
2020-09-08 17:51:40 +01:00
Tim Brooks 075271758e
Keep checkpoint file channel open across fsyncs (#61744)
Currently we open and close the checkpoint file channel for every fsync.
This file channel can be kept open for the lifecycle of a translog
writer. This avoids the overhead of opening the file, checking file
permissions, and closing the file on every fsync.
2020-09-08 08:54:53 -06:00
Francisco Fernández Castaño 2bb5716b3d
Add repositories metering API (#62088)
This pull request adds a new set of APIs that allows tracking the number of requests performed
by the different registered repositories.

In order to avoid losing data, the repository statistics are archived after the repository is closed for
a configurable retention period `repositories.stats.archive.retention_period`. The API exposes the
statistics for the active repositories as well as the modified/closed repositories.

Backport of #60371
2020-09-08 14:01:04 +02:00
Armin Braun ebd1569028
Fix testMasterFailOverWithQueuedDeletes (#62062) (#62078)
Fixing very rare corner case where the delete retry is slow.

Closes #62031
2020-09-08 10:35:06 +02:00
Nhat Nguyen bb0a583990
Allow enabling soft-deletes on restore from snapshot (#62018)
Closes #61969
2020-09-07 09:45:36 -04:00
Alan Woodward cbc9578cbd Remove SearchPhase interface (#62050)
The interface is never used as an abstraction - implementations are are called directly,
and most of them don't need to implement the preProcess method.
2020-09-07 13:45:43 +01:00
David Turner 3389d5ccb2 Introduce integ tests for high disk watermark (#60460)
An important goal of the disk threshold decider is to ensure that nodes
use less disk space than the high watermark, and to take action if a
node ever exceeds this watermark. Today we do not have any
integration-style tests of this high-level behaviour. This commit
introduces a small test harness that can adjust the apparent size of the
disk and verify that the disk threshold decider moves shards around in
response.

Co-authored-by: Yannick Welsch <yannick@welsch.lu>
2020-09-07 14:39:39 +02:00
Armin Braun 395538f508
Improve Snapshot State Machine Performance (#62000) (#62049)
Just a few random things to optimize motivated by somewhat sub-standard performance
for large snapshot cluster states with many concurrent snapshots observed in production.
2020-09-07 13:25:40 +02:00
Jim Ferenczi fa8e76abb1
Improve reduction of terms aggregations (#61779) (#62028)
Today, the terms aggregation reduces multiple aggregations at once using a map
to group same buckets together. This operation can be costly since it requires
to lookup every bucket in a global map with no particular order.
This commit changes how term buckets are sorted by shards and partial reduces in
order to be able to reduce results using a merge-sort strategy.
For bwc, results are merged with the legacy code if any of the aggregations use
a different sort (if it was returned by a node in prior versions).

Relates #51857
2020-09-07 13:13:20 +02:00
Alan Woodward a295b0aa86 Fix null_value parsing for data_nanos field mapper (#61994)
The null_value parameter for date fields is always parsed using DateFormatter.parseMillis,
which is incorrect for nanosecond resolution fields. This commit changes the parsing logic
to always use DateFieldType.parse() to parse the null value.
2020-09-07 10:58:54 +01:00
Alan Woodward 1799c0c583 Convert completion, binary, boolean tests to MapperTestCase (#62004)
Also fixes a metadata serialization bug in CompletionFieldMapper.
2020-09-07 10:48:20 +01:00
Luca Cavanna 0c8b438577
Add support for runtime fields (#61776)
This commit includes the work that has been done on the runtime fields feature branch until now. The high level tasks are listed in #59332. The tasks that have not yet been completed can be worked on after merging the feature branch.

We are adding a new x-pack plugin called runtime-fields that plugs in a custom mapper which allows to define runtime fields based on a script.
The changes included in this commit that were made outside of the x-pack/plugin/runtime-fields directory are minimal and revolve around 1) making the ScriptService available while parsing index mappings so that the scripts associated to runtime fields can be compiled 2) sharing code to manipulate ranges etc. as it can be reused in runtime fields.

Co-authored-by: Nik Everett <nik9000@gmail.com>
2020-09-07 09:14:53 +02:00
Howard b26584dff8 Remove unused deciders in BalancedShardsAllocator (#62026) 2020-09-07 00:04:16 -04:00
Armin Braun 1e3edbbe74
Simplify BytesReference StreamInput (#61681) (#62014)
Flattening both streams into a single stream here saves a few objects and some indirection.
Also, removed the redundant `offset` field which added nothing but complexity by forcing the
incrementation of two counters on every read.
2020-09-05 10:45:52 +02:00
Ryan Ernst 6d3b691048
Add snapshot only test modules (#61954)
This commit adds external test modules. These are modules meant for
external systems to test edge cases in elasticsearch, but only within
snapshots. They are not meant to be used in production, so protections
are also added from their accidental inclusion in release builds.

Note that this commit does not actually add any new modules, it only
adds the infrastructure for the new modules, under
`test/external-modules`.
2020-09-04 16:35:18 -07:00
Yannick Welsch 6d08b55d4e Simplify searchable snapshot shard allocation (#61911)
Simplifies allocation for snapshot-backed shards by always making the recovery source "from snapshot" for those
snapshot-backed shards (instead of "recover from local or from empty store"). Also let's the balancer pick a node which
to allocate the snapshot-backed shard to (which takes number of shards on each node into account unlike the current
implementation which just picks whatever node we are allowed to allocate to, with no notion of "balancing" at all).
2020-09-04 15:45:00 +02:00
Alan Woodward 66bb1eea98 Improve error messages on bad [format] and [null_value] params for date mapper (#61932)
Currently, if an incorrectly formatted date is passed as a null_value for a date field mapper
configuration, you get a vague error:

Failed to parse mapping [_doc]: cannot parse empty date
Similarly, if you pass an incorrect format, you get the error:

Failed to parse mapping [_doc]: Invalid format [...]
This commit improves both these errors by including the mapper name and parameter that
are misconfigured.

Fixes #61712
2020-09-04 14:13:28 +01:00
Ignacio Vera 31c026f25c
upgrade to Lucene-8.7.0-snapshot-61ea26a (#61957) (#61974) 2020-09-04 13:46:20 +02:00
Nik Everett 3d23dcd742
Use standard bit set impl in cardinality (#61816) (#61930)
This replaces a specialized bit set implementation used in cardinality
with our standard `BitArray` which works exactly the same way. Its also
tracked by `BigArrays` which is great!
2020-09-03 12:37:30 -04:00
Nik Everett 3934e14bc0
Fixup vwhisto test (#60936) (#61928)
This test assumed some random bounds that turned out not to hold in some
cases.

Closes #60673
2020-09-03 12:37:17 -04:00
Alan Woodward 48870c60c7 Don't spin up a whole node to unit test some data structures (#61923)
BytesRefHashTests and LongObjectHashMapTests currently extend ESSingleNodeTestCase,
which builds an entire node just to run some unit tests over entirely in-memory data
structures. This commit converts them both to extend ESTestCase.
2020-09-03 17:19:42 +01:00
Alan Woodward 3a1e0edf0a Convert DateFieldMapperTests to MapperTestCase (#61920) 2020-09-03 16:04:02 +01:00
Martijn Laarman cfa54c08bd [7.x] Version bump 7.9.1 release 2020-09-03 16:41:58 +02:00
Alan Woodward e2f006eeb4
Merge FetchSubPhase hitsExecute and hitExecute methods (#60907) (#61893)
FetchSubPhase has two 'execute' methods, one which takes all hits to be examined,
and one which takes a single HitContext. It's not obvious which one should be implemented
by a given sub-phase, or if implementing both is a possibility; nor is it obvious that we first
run the hitExecute methods of all subphases, and then subsequently call all the
hitsExecute methods.

This commit reworks FetchSubPhase to replace these two variants with a processor class,
`FetchSubPhaseProcessor`, that is returned from a single `getProcessor` method.  This
processor class has two methods, `setNextReader()` and `process`.  FetchPhase collects
processors from all its subphases (if a subphase does not need to execute on the current
search context, it can return `null` from `getProcessor`).  It then sorts its hits by docid, and
groups them by lucene leaf reader.  For each reader group, it calls `setNextReader()` on
all non-null processors, and then passes each doc id to `process()`.

Implementations of fetch sub phases can divide their concerns into per-request, per-reader
and per-document sections, and no longer need to worry about sorting docs or dealing with
reader slices.

FetchSubPhase now provides a FetchSubPhaseExecutor that exposes two methods,
setNextReader(LeafReaderContext) and execute(HitContext). The parent FetchPhase collects all
these executors together (if a phase should not be executed, then it returns null here); then
it sorts hits, and groups them by reader; for each reader it calls setNextReader, and then
execute for each hit in turn. Individual sub phases no longer need to concern themselves with
sorting docs or keeping track of readers; global structures can be built in
getExecutor(SearchContext), per-reader structures in setNextReader and per-doc in execute.
2020-09-03 12:20:55 +01:00
Alan Woodward af01ccee93
Add specific test for serializing all mapping parameter values (#61844) (#61877)
This commit adds a test to MapperTestCase that explicitly checks that a mapper can
serialize all its default values, and that this serialization can then be re-parsed. Note that
the test is disabled for non-parametrized mappers as their serialization may in some cases
output parameters that are not accepted. Gradually moving all mappers to parametrized
form will address this.

The commit also contains a fix to keyword mappers, which were not correctly serializing
the similarity parameter; this partially addresses #61563. It also enables `null` as a
value for `null_value` on `scaled_float`, as a follow-up to #61798
2020-09-03 09:20:26 +01:00
Nik Everett c19f67ce30
Support longs in BitArray (backport of #61867) (#61871)
We frequently use `long`s with `BitArray` in aggs and right now we have
to assert that the `long` fits in an `int`. This adds support for `long`
to `BitArray` so we don't need those assertions.
2020-09-02 17:24:31 -04:00
Henning Andersen 867d5f1c68
Search memory leak (#61788) (#61862)
Search could leak memory if global ordinals were calculated as part of
a search with low level cancellation enabled. QueryPhase registers a
cancellation on the reader that is never removed, which ends up being
referenced from the global ordinals cache entry. This keeps an indirect
reference to the search context. A significant leak can occur when a
heavy aggregation (cardinality for instance) is used and a failure occurs
during search, in particular if the pages backing the hyperlog++ structure
are not recycled when it is closed.

This commit also fixes an issue with an unclosed resource and request
breaker adjustment in the cardinality aggregation.
2020-09-02 18:51:14 +02:00
Jim Ferenczi a0e4331c49 Cleanup usages of QueryPhaseResultConsumer (#61713)
This commit generalizes how QueryPhaseResultConsumer is initialized.
The query phase always uses this consumer so it doesn't need to be hidden behind
an abstract class.
2020-09-02 14:41:02 +02:00
Alan Woodward d59343b4ba
Allow [null] values in [null_value] (#61798) (#61807)
Several field mappers have a null_value parameter, that allows you to specify a placeholder
value to insert into a document if the incoming value for that field is null. The default value
for this is always null, meaning "add no placeholder". However, we explicitly bar users from
setting this parameter directly to null (done in #7978, in order to fix an NPE).

This exclusion means that if a mapper is serialized with include_defaults, then we either need
to special-case null_value to ensure that it is not output when it holds the default value, or
we find that the resulting serialized form cannot be used to create a mapping. This stops us
doing some useful generic testing of mappers.

This commit permits null as a parameter value for null_value, and changes the tests to check
that it is a) permissible and b) applied without throwing errors. As part of the testing changes,
a new base class MapperServiceTestCase is refactored from MapperTestCase, holding
the various helper methods related to building mappings but not the single-mapper specific
abstract methods.

Closes #58823
2020-09-02 10:42:19 +01:00
Igor Motov 48e53cca94
Fix wrong NaN comparison (#61795) (#61811)
Fixes wrong NaN comparison in error message generator in GeoPolygonDecomposer and PolygonBuilder.

Supersedes #48207

Co-authored-by: Pedro Luiz Cabral Salomon Prado <pedroprado010@users.noreply.github.com>
2020-09-01 15:50:38 -04:00
Tim Brooks e573fa9abc
Add data.path fast path for FilePermission (#61302)
The recursive data.path FilePermission check is an extremely hot
codepath in Elasticsearch. Unfortunately the FilePermission check in
Java is extremely allocation heavy. As it iterates through different
file permissions, it allocates byte arrays for each Path component that
must be compared. This PR improves the situation by adding the recursive
data.path FilePermission it its own PermissionsCollection object which
is checked first.
2020-09-01 12:03:22 -06:00
Armin Braun 28710c985d
Dry up Settings from Map Construction (#61778) (#61803)
We used the same hack all over the place. At least drying it up to a single place.

Co-authored-by: Jay Modi <jaymode@users.noreply.github.com>
2020-09-01 19:46:10 +02:00
Tanguy Leroux 6e944d9e21
Throws IndexNotFoundException in TransportGetAction for unknown System indices (#61785) (#61791)
The change #57936 introduced a dedicated thread pool for reads in system indices. 
It also introduced a potential NPE in the case the index to read in not yet present in 
the cluster state. This commit fixes that bug by using the getIndexSafe() instead of 
just index() method when retrieving the index's metadata so that an INFE is thrown 
if the index does not exist.
2020-09-01 17:41:57 +02:00
Dan Hermann 88a448f1cd
Fix wrong result when executing bulk requests with and without pipeline (#60818) (#61777) 2020-09-01 07:05:25 -05:00
Armin Braun 3fd25bfa87
Fix Concurrent Snapshot Create+Delete + Delete Index (#61770) (#61773)
We had a bug here were we put a `null` value into the shard
assignment mapping when reassigning work after a snapshot delete
had gone through. This only affects partial snaphots but essentially
dead-locks the snapshot process.

Closes #61762
2020-09-01 13:20:25 +02:00
Tanguy Leroux 787dfda4c1
Prevent snapshots to be mounted as system indices (#61517) (#61727)
System indices can be snapshotted and are therefore potential candidates 
to be mounted as searchable snapshot indices. As of today nothing 
prevents a snapshot to be mounted under an index name starting with . 
and this can lead to conflicting situations because searchable snapshot 
indices are read-only and Elasticsearch expects some system indices 
to be writable; because searchable snapshot indices will soon use an 
internal system index (#60522) to speed up recoveries and we should
prevent the system index to be itself a searchable snapshot index 
(leading to some deadlock situation for recovery).

This commit introduces a changes to prevent snapshots to be mounted 
as a system index.
2020-09-01 11:13:28 +02:00
Boice Huang 8fdd3d158b Remove redundant symbol in msearch tests (#61353) 2020-09-01 10:58:22 +02:00
Nik Everett fb84c1f73e
Calculate precise cardinality upper bounds (#61529) (#61754)
This reworks `CardinalityUpperBound` to support precise estimates while
maintaining most of the public API. This will allow us to make more
informed choices about the data structures that we use in aggregations.
None of those interesting choices come as part of this change, but they
are more possible with it.
2020-08-31 15:10:02 -04:00
Dan Hermann 2858e1efc4
Document new stats in _cat/nodes (#60445) (#61742) 2020-08-31 12:40:21 -05:00
Adam Locke 5723b928d7
Remove Outdated Snapshot Docs (#61684) (#61728)
Removing some now outdated statements that refer to a time
when snapshot operations could not run concurrently.

Closes #61680
2020-08-31 12:04:27 -04:00
Jason Tedor 43cb7c48bd
Adjust Lucene versions for 7.9.1
This commit adjusts the Lucene versions for 7.9.1 after the backporting
of upgrading the 7.9 branch to Lucene 8.6.2.
2020-08-31 10:30:39 -04:00
Jason Tedor 64cd229b35
Upgrade to Lucene 8.6.2 (#61688)
This commit upgrades the Lucene dependencies to 8.6.2.
2020-08-31 09:54:07 -04:00
Rory Hunter ff6c071275
Implement deprecation logging using log4j (#61629)
Backport of #61474.

Part of #46106. Simplify the implementation of deprecation logging by
relying of log4j more completely, and implementing additional behaviour
through custom appenders and filters.
2020-08-31 12:42:04 +01:00
Armin Braun 5c86b216e8
Fix Race in testGetSnapshotsRequest (#61694) (#61700)
The fact that the data node is already blocked on writing
data files did not guarantee that the cluster state that made
the data node start snapshotting is already applied on master.
This could lead to races where the get snapshots action still
runs based on a state without the snapshot in it, tripping the assertion.
Much safer to handle this by waiting on the non-blocking snapshot create
to return, which guarantees that the CS has been applied on master.

Closes #61541
2020-08-31 11:06:51 +02:00
Armin Braun 22e4d759c3
Speed up Reading Enum Set from Stream (#61678) (#61687)
No need in adding enum values to a normal set and then copying, the `EnumSet` is directly mutable just fine.
2020-08-30 20:49:51 +02:00
Jake Landis d2e5f2f532
[7.x] Enhance the ingest node simulate verbose output (#60433) (#60678)
This commit enhances the verbose output for the
`_ingest/pipeline/_simulate?verbose` api. Specifically
this adds the following:
* the pipeline processor is now included in the output
* the conditional (if) and result is now included in the output iff it was defined
* a status field is always displayed. the possible values of status are
  * `success` - if the processor ran with out errors
  * `error` - if the processor ran but threw an error that was not ingored
  * `error_ignored` - if the processor ran but threw an error that was ingored
  * `skipped` - if the process did not run (currently only possible if the if condition evaluates to false)
  * `dropped` - if the the `drop` processor ran and dropped the document
* a `processor_type` field for the type of processor (e.g. set, rename, etc.)
* throw a better error if trying to simulate with a pipeline that does not exist

closes #56004
2020-08-27 16:53:09 -05:00
Lee Hinman 1bfebd54ea
[7.x] Allocate newly created indices on data_hot tier nodes (#61342) (#61650)
This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by
default when they are created.

This does not break existing behavior, as nodes with the `data` role are considered to be part of
the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`,
`data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by
default.

This change is a little more complicated than changing the default value for
`index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to
have a plugin inject a setting into the builder for a newly created index. This has the benefit of
allowing this setting to be visible as part of the settings when retrieving the index, for example:

```
// Create an index
PUT /eggplant

// Get an index
GET /eggplant?flat_settings
```

Returns the default settings now of:

```json
{
  "eggplant" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index.creation_date" : "1597855465598",
      "index.number_of_replicas" : "1",
      "index.number_of_shards" : "1",
      "index.provided_name" : "eggplant",
      "index.routing.allocation.include._tier" : "data_hot",
      "index.uuid" : "6ySG78s9RWGystRipoBFCA",
      "index.version.created" : "8000099"
    }
  }
}
```

After the initial setting of this setting, it can be treated like any other index level setting.

This new setting is *not* set on a new index if any of the following is true:

- The index is created with an `index.routing.allocation.include.<anything>` setting
- The index is created with an `index.routing.allocation.exclude.<anything>` setting
- The index is created with an `index.routing.allocation.require.<anything>` setting
- The index is created with a null `index.routing.allocation.include._tier` value
- The index was created from an existing source metadata (shrink, clone, split, etc)

Relates to #60848
2020-08-27 13:41:12 -06:00
Luca Cavanna f769821bc8
Pass SearchLookup supplier through to fielddataBuilder (#61430) (#61638)
Runtime fields need to have a SearchLookup available, when building their fielddata implementations, so that they can look up other fields, runtime or not.

To achieve that, we add a Supplier<SearchLookup> argument to the existing MappedFieldType#fielddataBuilder method.

As we introduce the ability to look up other fields while building fielddata for mapped fields, we implicitly add the ability for a field to require other fields. This requires some protection mechanism that detects dependency cycles to prevent stack overflow errors.

With this commit we also introduce detection for cycles, as well as a limit on the depth of the references for a runtime field. Note that we also plan on introducing cycles detection at compile time, so the runtime cycles detection is a last resort to prevent stack overflow errors but we hope that we can reject runtime fields from being registered in the mappings when they create a cycle in their definition.

Note that this commit does not introduce any production implementation of runtime fields, but is rather a pre-requisite to merge the runtime fields feature branch.

This is a breaking change for MapperPlugins that plug in a mapper, as the signature of MappedFieldType#fielddataBuilder changes from taking a single argument (the index name), to also accept a Supplier<SearchLookup>.

Relates to #59332

Co-authored-by: Nik Everett <nik9000@gmail.com>
2020-08-27 18:09:56 +02:00
Alan Woodward b6cb590685 Log more information when mappings fail on index creation (#61577)
Errors from bad mappings at index creation are currently logged at DEBUG level, which
can make it difficult to work out what's going on if the index is being auto-created. This
commit ups the log level to INFO for auto-created indices, and includes some more
information in the log message.
2020-08-27 15:08:51 +01:00
David Turner 411965d392 Allow background cluster state update in tests (#61455)
Today the `CoordinatorTests` run the publication process as a single
atomic action; however in production it appears possible that another
master may be elected, publish its state, then fail, then we win another
election, all in between the time we sampled our previous cluster state
and started to publish the one we first thought of.

This violates the `assertClusterStateConsistency()` assertion that
verifies the cluster state update event matches the states we actually
published and applied.

This commit adjusts the tests to run the publication process more
asynchronously so as to allow time for this behaviour to occur. This
should eventually result in a reproduction of the failure in #61437 that
will let us analyse what's really going on there and help us fix it.
2020-08-27 11:22:58 +01:00
David Turner b866aaf81c Use int for number of parts in blob store (#61618)
Today we use `long` to represent the number of parts of a blob. There's
no need for this extra range, it forces us to do some casting elsewhere,
and indeed when snapshotting we iterate over the parts using an `int`
which would be an infinite loop in case of overflow anyway:

    for (int i = 0; i < fileInfo.numberOfParts(); i++) {

This commit changes the representation of the number of parts of a blob
to an `int`.
2020-08-27 10:54:03 +01:00
David Turner 5df74cc888 Replace Math.toIntExact with toIntBytes (#61604)
We convert longs to ints using `Math.toIntExact` in places where we're
sure there will be no overflow, but this doesn't explain the intent of
these conversions very well. This commit introduces a dedicated method
for these conversions, and adds an assertion that we never overflow.
2020-08-27 08:28:54 +01:00
Jay Modi 34c4fc3b91
Remove tasks module to define tasks system index (#61588)
This commit removes the tasks module that only existed to define the
tasks result index, `.tasks`,  as a system index. The definition for
the tasks results system index descriptor is moved to the
`SystemIndices` class with a check that no other plugin or module
attempts to define an entry with the same source.

Additionally, this change also makes the pattern for the tasks result
index a wildcard pattern since we will need this when the index is
upgraded (reindex to new name and then alias that to .tasks).

Backport of #61540
2020-08-26 09:48:23 -06:00
David Turner f2dc664228 Remove dead code in EsExecutors (#61574)
Removes a couple of unused methods.
2020-08-26 16:08:36 +01:00
Przemyslaw Gomulka 9f566644af
Do not create two loggers for DeprecationLogger backport(#58435) (#61530)
DeprecationLogger's constructor should not create two loggers. It was
taking parent logger instance, changing its name with a .deprecation
prefix and creating a new logger.
Most of the time parent logger was not needed. It was causing Log4j to
unnecessarily cache the unused parent logger instance.

depends on #61515
backports #58435
2020-08-26 16:04:02 +02:00
Igor Motov f70a59971a
[7.x] Add rate aggregation (#61369) (#61554)
Adds a new rate aggregation that can calculate a document rate for buckets
of a date_histogram.

Closes #60674
2020-08-25 17:39:00 -04:00
Nik Everett 87cf81e179
Migrate some more mapper test cases (#61507) (#61552)
Migrate some more mapper test cases from `ESSingleNodeTestCase` to
`MapperTestCase`.
2020-08-25 15:27:26 -04:00
markharwood 8b56441d2b
Search - add case insensitive support for regex queries. (#59441) (#61532)
Backport to add case insensitive support for regex queries. 
Forks a copy of Lucene’s RegexpQuery and RegExp from Lucene master.
This can be removed when 8.7 Lucene is released.

Closes #59235
2020-08-25 17:18:59 +01:00
Przemyslaw Gomulka f3f7d25316
Header warning logging refactoring backport(#55941) (#61515)
Splitting DeprecationLogger into two. HeaderWarningLogger - responsible for adding a response warning headers and ThrottlingLogger - responsible for limiting the duplicated log entries for the same key (previously deprecateAndMaybeLog).
Introducing A ThrottlingAndHeaderWarningLogger which is a base for other common logging usages where both response warning header and logging throttling was needed.

relates #55699
relates #52369
backports #55941
2020-08-25 16:35:54 +02:00
Armin Braun f22ddf822e
Some Optimizations around BytesArray (#61183) (#61511)
* Faster `equals` for `BytesArray` which is nice since with this change we use it for the search cache
* Lighter `StreamInput` for `BytesArray` that should save memory and some indirection relative to the one on the abstract bytes reference
* Lighter `writeTo` implementation
* Build a `BytesArray` instead of a PagedBytesReference whenever possible to save indirection and memory
2020-08-25 07:13:39 +02:00
Armin Braun 806dfcfcf7
Speed up Compression Logic by Pooling Resources (#61358) (#61495)
This is mostly motivated by the performance issues we are seeing around the GET mappings
REST API which (in case of a large number of indices) will create decompressing streams in a hot loop
which takes a significant amount of time for the system calls involved in instantiating deflaters
and inflaters.
Also, this fixes a leaked deflater when deserializing cached repository data.
2020-08-25 04:01:55 +02:00
Armin Braun 16b932c1dc
Remove Potentially Expensive Use of BytesReference.toBytesRef (#61415) (#61503)
This method might have materialize all the bytes in a reference into a fresh `byte[]`.
Using the stream is much safer and only trivially more expensive + in most cases we now run the fast path via `BytesArray` anyway.
2020-08-24 23:58:21 +02:00
Nhat Nguyen d47bbbafe0 Cancel multisearch when http connection closed (#61399)
Relates #61337
2020-08-24 15:12:54 -04:00
Nhat Nguyen 23a0f8b617 Detect and optimize noop of update index settings (#61348)
This optimization is more relevant in the context of CCR. When a node in
the follower cluster leaves, we reallocate the shard-follow tasks on 
that node to other nodes. The new tasks will overwhelm the follower
cluster with many put-mapping, update-settings requests, although most
of them are noop. This change detects and optimizes the noop
update-settings requests.
2020-08-24 15:08:53 -04:00
Nik Everett f3b6d49ae1
Migrate server mapper tests to new MapperTestCase (#61378) (#61490)
This continues #61301, migrating all of the mappers in `server` to the
new `MapperTestCase` which is nicer than `FieldMapperTestCase` because
it doesn't depend on all of Elasticsearch.
2020-08-24 13:33:35 -04:00
Armin Braun bb4d97073c
Remove Favicon Special Path in RestController (#61460) (#61487)
It's unnecessary (and adds one string comparison to every request) to special
case the favicon so I added it as a normal REST handler to simplify the code.
2020-08-24 18:36:23 +02:00
Armin Braun af2e2782eb
Stop Needlessly Copying Bytes in XContent Parsing (#61447) (#61469)
Wrapping a `BytesArray` in a `StreamInput` for deserialization is inefficient.
This forces Jackson to internally buffer (i.e. copy) all bytes from the `BytesArray`
before deserializing, adding overhead for copying the bytes and managing the buffers.

This commit fixes a number of spots where `BytesArray` is the most common type of
`BytesReference` to special case this type and parse it more efficiently.
Also improves parsing `String`s to use the more efficient direct `String` parsing APIs.
2020-08-24 15:49:15 +02:00
Dan Hermann c53731a0cd
[7.x] Fix wrong pipeline name in debug log (#58817) (#61233) 2020-08-21 11:14:01 -05:00
David Turner 078e8717ee Stop opening PING conns to remote clusters (#61408)
Today a remote cluster connection comprises a `PING` and a `REG`
channel. The `PING` channel is only used for health checks between the
elected master and the members of its own cluster, so is unused in a
remote cluster connection. This commit removes this unused connection.
2020-08-21 12:21:57 +01:00
Armin Braun e09058df1a
Serialize Get Mappings Response on Generic ThreadPool (#57937) (#61401)
For large responses to the get mappings request, the serialization
to XContent can be extremely slow (serializing mappings is expensive since
we have to decompress and deserialize the mapping source).
To not introduce instability on the IO thread handling the get mappings response
we should move the serialization to the management pool.
The trade-off of introducing one or two new context switches for responses that are
small enough to not cause trouble on the transport thread to prevent instability
in case of a large number of mappings in the cluster seems worth it.
2020-08-21 08:06:30 +02:00
Armin Braun 22509c95f8
Fix Blackholed Connection Behavior in DisruptableMockTransport (#61310) (#61381)
It is not realistic to drop messages without eventually failing.
To retain the coverage of long pauses this PR adjusts the blackholed
behavior to fail a send after 24h (which is assumed to be longer than any
timeout in the system) instead of never.

Closes #61034
2020-08-21 07:54:56 +02:00
Julie Tibshirani 997c73ec17
Correct how field retrieval handles multifields and copy_to. (#61391)
Before when a value was copied to a field through a parent field or `copy_to`,
we parsed it using the `FieldMapper` from the source field. Instead we should
parse it using the target `FieldMapper`. This ensures that we apply the
appropriate mapping type and options to the copied value.

To implement the fix cleanly, this PR refactors the value parsing strategy. Now
instead of looking up values directly, field mappers produce a helper object
`ValueFetcher`. The value fetchers are responsible for almost all aspects of
fetching, including looking up the right paths in the _source.

The PR is fairly big but each commit can be reviewed individually.

Fixes #61033.
2020-08-20 15:53:35 -07:00
Julie Tibshirani 85ad328df7
Ensure fetch fields aren't dropped when rewriting search. (#61390)
Previously we didn't retain the requested fields when performing a shallow copy
of the search source. This meant that when a search was rewritten, we could drop
the requested fields and fail to return them in the response.
2020-08-20 14:58:58 -07:00
Armin Braun 08dbd6d989
Optimize a few Spots on IO Loop (#60865) (#61380)
Saving some cycles here and there on the IO loop:

* Don't instantiate new `Runnable` to execute on `SAME` in a few spots
* Don't instantiate complicated wrapped stream for empty messages
* Stop instantiating almost never used `ClusterStateObserver` in two spots
* Some minor cleanup and preventing pointless `Predicate<>` instantiation in transport master node action
2020-08-20 20:22:49 +02:00
Alan Woodward a3a0c63ccf
Convert NumberFieldMapper to parametrized form (#61092) (#61376)
In addition, this commit converts ScaledFloatFieldMapper as it was relying
on a number of static values taken from NumberFieldMapper that had changed
or been removed.
2020-08-20 16:43:26 +01:00
Nhat Nguyen a3906dcef3 Enable cancellation for msearch requests (#61337)
Today multi-search requests are not cancellable because we create
regular tasks instead of cancellable ones for them.
2020-08-19 16:59:17 -04:00
Nik Everett 9789e6d154
Migrate some field mapper tests to ESTestCase (#61301) (#61346)
This switches a few tests for field mappers from `ESSingleNodeTestCase`
to `ESTestCase` because, in general, we prefer to avoid
`ESSingleNodeTestCase` when we can because it is slow and "big". "Big"
here means that it pulls in an entire node, making it difficult to
reason about what you are testing.
2020-08-19 15:43:49 -04:00
Armin Braun 4a53ae203e
Fix SharedClusterSnapshotRestoreIT.testThrottling (#61323) (#61328)
We have to set the recovery setting to `0` if we don't want throttling
from recoveries. Otherwise the randomized value used for this setting in
tests can lead to throttling unexpectedly.

Closes #61311
2020-08-19 15:26:32 +02:00
Nik Everett 70128e022b fix stats aggregator tests
With #60683 we stopped forcing aggregating all docs using a single
Aggregator which made some of our accuracy assumptions about the stats
aggregator incorrect. This adds a test that does the forcing and asserts
the old accuracy and adds a test without the forcing with much looser
accuracy guarantees.

Closes #61132
2020-08-19 08:55:43 -04:00
Alan Woodward b1aa0d8731
Fix fieldnames field type for pre-6.1 indexes (#61322)
The FieldNamesFieldMapper field has different behaviour for indexes created in
clusters earlier than v6.1, and the code to deal with this was still using the vestigial
FieldType field of FieldMapper in its indexing path. This meant that documents
added after an upgrade were not correctly indexing their field names field. This
commit corrects the parseCreateField method to use the default field type.

Fixes #61305
2020-08-19 12:59:09 +01:00
David Turner 389f7779e7 Report more details of unobtainable ShardLock (#61255)
Today a common reason for a `ShardLockObtainFailedException` is when a
shard is removed from a node and then assigned straight back to it again
before the node has had a chance to shut the previous shard instance
down. For instance, this can happen if a node briefly leaves the cluster
holding a primary with no in-sync replicas.

The message in this case is typically as follows:

    obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]

This is pretty hard to interpret, and doesn't raise the important
question: "why didn't the shard shut down sooner?"

With this change we reword the message a bit, report the age of the
shard lock, and adjust the details to report that the lock is held by a
closing shard:

    obtaining shard lock for [starting shard] timed out after [5000ms], lock already held for [closing shard] with age [12345ms]

Relates #38807
2020-08-19 06:36:28 +01:00
Nhat Nguyen 08b0e78ef4 Log more info when search ops higher than expected (#61108)
We have seen a situation where the total search operations are higher 
than expected. Unfortunately, we did not have enough info to figure it
out. This commit adds the failures to the error to provide more context
and adjusts the log level in case of failure to debug.
2020-08-18 15:20:41 -04:00
Rory Hunter bd7236cd65 Version bump for 7.9.0 release 2020-08-18 16:07:43 +01:00
Dimitrios Liappis c870640cbd
[7.x] Introduce 6.8.13 as a version (#61198)
Introduce version 6.8.13 to branch 7.x
2020-08-18 17:07:16 +03:00
Armin Braun 58d07b2ffc
Remove Unused ByteBufferReference (#61116) (#61250)
We only work with heap byte buffers at this point and those we can and do unwrap the
`byte[]` ourselves and use `BytesArray` instead of a needless level of indirection via `ByteBuffer`.
2020-08-18 10:53:40 +02:00
Armin Braun 6ffa7f0737
Fix testConcurrentSnapshotDeleteAndDeleteIndex (#61228) (#61249)
There is a corner case here in which during partial snapshot the index is
deleted right between starting the snapshot in the CS and the data node getting to work
on it, causing the data node the fail that shard snapshot and making the snapshot `PARTIAL`.

Closes #61208
2020-08-18 10:45:30 +02:00
Mark Tozzi db1df6cc30
[7.x] Remove a bunch of type boilerplate from Aggs (#60852) (#61031) 2020-08-17 12:13:05 -04:00
Nik Everett 1b7bbafd81
Add method to make random DateFormatter pattern (backport of #60613) (#61213)
Adds a method to make a random date `DateFormatter` pattern. We expect
this'll be useful for runtime fields to compate their formatting with
the standard date field.
2020-08-17 10:57:52 -04:00
Christoph Büscher 6866396e1d Improve 'ignore_malformed' handling for dates (#60211)
Currently we occasionally can get ArithmeticException from parsing bad input
values on 'date' fields that are passed on even if 'ignore_malformed' is set.
This change adds this exception to the ones we already catch for malformed
values.

Closes #52634
2020-08-17 16:18:08 +02:00
David Turner b21cb7f466 Reduce allocations when persisting cluster state (#61159)
Today we allocate a new `byte[]` for each document written to the
cluster state. Some of these documents may be quite large. We need a
buffer that's at least as large as the largest document, but there's no
need to use a fresh buffer for each document.

With this commit we re-use the same `byte[]` much more, only allocating
it afresh if we need a larger one, and using the buffer needed for one
round of persistence as a hint for the size needed for the next one.
2020-08-17 13:45:31 +01:00
David Turner f3e0c60896
Restrict testing of legacy discovery to tests (#61178)
The 7.x branch preserves the legacy discovery mechanism from 6.x purely
for running internal cluster tests; this mechanism is otherwise
completely untested and unsupported. However it is still technically
possible to use it outside of the test suite if you dig through the
source code to work out what settings need to be set. With this change
we make it impossible to use this mechanism in production.

Closes #61177
2020-08-17 11:05:27 +01:00
Ryan Ernst 9cb45dafab
Unwrap transport exception when using transport client (#60801)
The ReloadSecureSettingsIT makes requests to the reload settings apis.
In 7.x, the client used from the integ test infrastructure may be a
transport client. In that case, the expected exception type, and causes
the test to fail (though it will hang indefinitely due to not counting
down the latch, see
https://github.com/elastic/elasticsearch/pull/60800). This commit adds
unwrapping of the remote exception to get the underlying expected
exception.

closes #51546
2020-08-13 10:24:04 -07:00
Ryan Ernst c73ab0b16f
Ensure hotthreads do not produce node failures (#61073)
This commit adds an assertion that no sub-nodes requests within
hot threads failed.

relates #58842
2020-08-13 10:22:19 -07:00
Armin Braun 3143b5ea47
Stabilize testSnapshotDeleteRelocatingPrimaryIndex (#61088) (#61096)
Use transport blocking to make relocation take forever instead of relying on the relocation to take long enough to clash with the snapshot.

Closes #61069
2020-08-13 16:26:56 +02:00
Yannick Welsch 8e775394ac Fix testNoMasterActionsMetadataWriteMasterBlock (#60605)
We can't assert on the specific exception, unfortunately.
2020-08-13 10:48:16 +02:00
David Turner c6276ae177 Fail invalid incremental cluster state writes (#61030)
It is disastrous if we commit an incremental cluster state update
without having written the full state first. We assert that this doesn't
happen, but it is hard to fully test the myriad ways that things might
fail in a messy production environment. Given the disastrous
consequences it is worth erring on the side of caution in this area.
This commit fails invalid writes even if assertions are disabled.
2020-08-12 19:46:19 +01:00
Lee Hinman e3df64a429
[7.x] Add data tiers (hot, warm, cold, frozen) as custom node roles (#60994) (#61045)
This commit adds the `data_hot`, `data_warm`, `data_cold`, and `data_frozen` node roles to the
x-pack plugin. These roles are intended to be the base for the formalization of data tiers in
Elasticsearch.

These roles all act as data nodes (meaning shards can be allocated to them). Nodes with the existing
`data` role acts as though they have all of the roles configured (it is a hot, warm, cold, and
frozen node).

This also includes a custom `AllocationDecider` that allows the user to configure the following
settings on a cluster level:
- `cluster.routing.allocation.require._tier`
- `cluster.routing.allocation.include._tier`
- `cluster.routing.allocation.exclude._tier`

And in index settings:
- `index.routing.allocation.require._tier`
- `index.routing.allocation.include._tier`
- `index.routing.allocation.exclude._tier`

Relates to #60848
2020-08-12 11:06:23 -06:00
Alan Woodward 5b3c10c379
Fix serialization of AllFieldMapper (#61044)
Converting AllFieldMapper to parametrized form ended up not being run through BWC
testing, resulting in an incorrect implementation being committed. This commit fixes
the serialization, and adds unit tests as well as unmuting the BWC test that uncovered
the bug.

Fixes #60986
2020-08-12 17:32:55 +01:00
Yannick Welsch 8c488de576 Gracefully handle null in checkSettingsForTerminalDeprecation
Fixes a test failure after backport to 7.x
2020-08-12 18:03:52 +02:00
Yannick Welsch 25404cbe3d Provide option to allow writes when master is down (#60605)
Elasticsearch currently blocks writes by default when a master is unavailable. The cluster.no_master_block setting allows
a user to change this behavior to also block reads when a master is unavailable. This PR introduces a way to now also still
allow writes when a master is offline. Writes will continue to work as long as routing table changes are not needed (as
those require the master for consistency), or if dynamic mapping updates are not required (as again, these require the
master for consistency).

Eventually we should switch the default of cluster.no_master_block to this new mode.
2020-08-12 16:56:45 +02:00
Yannick Welsch 6644f2283d Do not access snapshot repo on dedicated voting-only master node (#61016)
Today a snapshot repository verification ensures that all master-eligible and data nodes have write access to the
snapshot repository (and can see each other's data) since taking a snapshot requires data nodes and the currently
elected master to write to the repository. However, a dedicated voting-only master-eligible node is not a data node and
will never be the elected master so we should not require it to have write access to the repository.

Closes #59649
2020-08-12 16:56:45 +02:00
Yannick Welsch af519be9cb Ensure repo not in use for wildcard repo deletes (#60947)
Repositories can't be unregistered when they are actively being used for snapshots or restores. Wildcard repository
deletes could silently bypass the "repo in use" checks however, which is now fixed.
2020-08-12 16:38:06 +02:00
Dan Hermann 538c93c923
Adding Hit counts and Miss counts for QueryCache exposed through REST api. (#60114) (#60993) 2020-08-12 08:21:09 -05:00
Alan Woodward c81dc2b8b7 Convert KeywordFieldMapper to parametrized form (#60645)
This makes KeywordFieldMapper extend ParametrizedFieldMapper, with explicitly
defined parameters.

In addition, we add a new option to Parameter, restrictedStringParam, which
accepts a restricted set of string options.
2020-08-12 11:41:11 +01:00
markharwood 66098e0bf4
Search fix: query_string regex/wildcard searches not working on wildcard fields (#60959) (#61010)
The Query string parser was not delegating the construction of wildcard/regex queries to the underlying field type.
The wildcard field has special data structures and queries that operate on them so cannot rely on the basic regex/wildcard queries that were being used for other fields.

Closes #60957
2020-08-12 10:44:52 +01:00
Armin Braun 32423a486d
Simplify and Speed up some Compression Usage (#60953) (#61008)
Use thread-local buffers and deflater and inflater instances to speed up
compressing and decompressing from in-memory bytes.
Not manually invoking `end()` on these should be safe since their off-heap memory
will eventually be reclaimed by the finalizer thread which should not be an issue for thread-locals
that are not instantiated at a high frequency.
This significantly reduces the amount of byte copying and object creation relative to the previous approach
which had to create a fresh temporary buffer (that was then resized multiple times during operations), copied
bytes out of that buffer to a freshly allocated `byte[]`, used 4k stream buffers needlessly when working with
bytes that are already in arrays (`writeTo` handles efficient writing to the compression logic now) etc.

Relates #57284 which should be helped by this change to some degree.
Also, I expect this change to speed up mapping/template updates a little as those make heavy use of these
code paths.
2020-08-12 11:06:23 +02:00
Nik Everett ce9c5f0e46 Fix diversified sample tests
The test assumed that the aggregator only ran once but we turned that
off. This turns it back on.
2020-08-11 17:49:43 -04:00
Jay Modi 2fa6448a15
System index reads in separate threadpool (#60927)
This commit introduces a new thread pool, `system_read`, which is
intended for use by system indices for all read operations (get and
search). The `system_read` pool is a fixed thread pool with a maximum
number of threads equal to lesser of half of the available processors
or 5. Given the combination of both get and read operations in this
thread pool, the queue size has been set to 2000. The motivation for
this change is to allow system read operations to be serviced in spite
of the number of user searches.

In order to avoid a significant performance hit due to pattern matching
on all search requests, a new metadata flag is added to mark indices
as system or non-system. Previously created system indices will have
flag added to their metadata upon upgrade to a version with this
capability.

Additionally, this change also introduces a new class, `SystemIndices`,
which encapsulates logic around system indices. Currently, the class
provides a method to check if an index is a system index and a method
to find a matching index descriptor given the name of an index.

Relates #50251
Relates #37867
Backport of #57936
2020-08-11 12:16:34 -06:00
Julie Tibshirani a93be8d577
Handle nested arrays in field retrieval. (#60981)
We accept _source values with multiple levels of arrays, such as
`"field": [[[1, 2]]]`. This PR ensures that field retrieval can handle nested
arrays by unwrapping the arrays before parsing the values.
2020-08-11 10:22:16 -07:00
Mark Tozzi ab8518fb5b
[7.x] Extensibility for Composite Agg #59648 (#60842) 2020-08-11 09:14:33 -04:00
Alan Woodward 54279212cf
Make MetadataFieldMapper extend ParametrizedFieldMapper (#59847) (#60924)
This commit cuts over all metadata field mappers to parametrized format.
2020-08-11 09:02:28 +01:00
Armin Braun 3e2dfc6eac
Remove GCS Bucket Exists Check (#60899) (#60914)
Same as https://github.com/elastic/elasticsearch/pull/43288 for GCS.
We don't need to do the bucket exists check before using the repo, that just needlessly
increases the necessary permissions for using the GCS repository.
2020-08-11 09:54:27 +02:00
Julie Tibshirani d51eae6e9f
Prevent loading 'fields' with stored fields disabled. (#60938)
Because the 'fields' option loads from _source (which is a stored field), it is
not possible to retrieve 'fields' when stored_fields are disabled.

This also fixes #60912, where setting stored_fields: _none_ prevented the
_ignored fields from being loaded and caused a parsing exception.
2020-08-10 15:40:27 -07:00
Nik Everett 0286d0a769
Move distance_feature query building into MFT (#60614) (#60846)
This moves the `distance_feature` query building out of
`DistanceFeatureQueryBuilder` and into subclasses of `MappedFieldType`.
Without this we don't have a chance of supporting this for runtime
fields. In general I'm not sad to see the `instanceof`s go.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-08-10 16:05:17 -04:00
Julie Tibshirani b216340f50
Make `FetchPhase` logic more readable. (#60779)
* Factor out FieldsVisitor#postProcess call.
* Swap logical order for normal and nested documents.
* Extract the method createStoredFieldsVisitor.
2020-08-10 11:04:54 -07:00
Nik Everett dfd502f9ca
Rework checking if a year is a leap year (#60585) (#60790)
This way is faster, saving about 8% on the microbenchmark that rounds to
the nearest month. That is in the hot path for `date_histogram` which is
a very popular aggregation so it seems worth it to at least try and
speed it up a little.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-08-10 12:45:34 -04:00
Jim Ferenczi f30f1f04e2
Replace AggregatorTestCase#search with AggregatorTestCase#searchAndReduce (#60816)
This commit removes the ability to test the top level result of an aggregator
before it runs the final reduce. All aggregator tests that use AggregatorTestCase#search
are rewritten with AggregatorTestCase#searchAndReduce in order to ensure that we test
the final output (the one sent to the end user) rather than an intermediary result
that could be different.
This change also removes spurious commits triggered on top of a random index writer.
These commits slow down the tests and are redundant with the commits that the
random index writer performs.
2020-08-10 17:23:00 +02:00
David Turner f44c28b595
Deprecate and ignore join timeout (#60872)
There is no point in timing out a join attempt any more once a cluster
is entirely in 7.x. Timing out and retrying with the same master is
pointless, and an in-flight join attempt to one master no longer blocks
attempts to join other masters. This commit deprecates this unnecessary
setting and removes its effect from the joining process.

Relates #60873 which removes this setting in master.
2020-08-10 13:57:41 +01:00
Martijn van Groningen 64bb082f9b
Improve error message for non append-only writes that target data stream (#60874)
Backport of #60809 to 7.x branch.

Closes #60581
2020-08-10 13:18:59 +02:00
Alan Woodward e8d9185045 Cut over IPFieldMapper to parametrized form (#60602)
This commit makes IpFieldMapper extend ParametrizedFieldMapper. It also
updates the IpFieldMapper docs to add the ignore_malformed parameter,
which was not previously documented.
2020-08-10 11:01:10 +01:00
David Turner 1f49e0b9d0 Fix testRerouteOccursOnDiskPassingHighWatermark (#60869)
Sometimes this test would refresh the disk stats so quickly that it hit
the refresh rate limiter even though it was almost completely disabled.
This commit allows the rate limiter to be completely disabled.

Closes #60587
2020-08-10 09:39:44 +01:00
Ryan Ernst ddcfbec569
Add assert message for multiple lines in osprobe (#60796)
Several /proc files are expected to contain a single line. We assert on
this in tests, but the contents of the file are lost and the assertion
therefore lacks important information to debug why the file appeared to
have multiple lines. This commit dumps the contents of the file on
assertion failure.

relates #59284
2020-08-06 15:53:30 -07:00
Ryan Ernst fc38af363e
Ensure latch is counted down when assertion trips (#60800)
The ReloadSecureSettingsIT uses latches to ensure coordination across
requests to the underlying in memory cluster. However, in the case of an
expected failure, if the assertion fails, the latch will never be
counted down, and will cause the test to hang indefinitely. This commit
ensures the latch is always counted down with a try/finally.

relates #51546
2020-08-06 15:33:46 -07:00
Jim Ferenczi 98119578a1 Disable sort optimization on search collapsing (#60838)
Collapse search queries that sort by a field can throw
an ArrayStoreException due to a bug in the [sort optimization](https://github.com/elastic/elasticsearch/pull/51852)
introduced in 7.7.0. Search collapsing were not supposed to
be eligible for this sort optimization so this change explicitly
filters them from this new feature.
2020-08-06 21:37:12 +02:00
Jim Ferenczi 14980ff97e Fix AOOBE when setting min_doc_count to 0 in significant_terms (#60823)
This commit fixes the computation of the subset size on empty buckets (doc count of 0).
The aggregator test refactoring in #60683 revealed this bug.
2020-08-06 18:57:09 +02:00
David Turner 721198c29e Increase logging in testRerouteOccursOnDiskPassingHighWatermark (#60817)
Relates #60587
2020-08-06 14:08:09 +01:00
Armin Braun a2c7991e96
Fix CompressibleBytesOutputStreamTests (#60815) (#60822)
Since #60730 the `bytes` field can be `null`. This adds the missing `null` check to the test
override.

Closes #60814
2020-08-06 15:07:48 +02:00
David Turner 273a6f916d AwaitsFix for #60814 2020-08-06 12:56:28 +01:00
Tim Brooks 2f76c48ea7
Propagate forceExecution when acquiring permit (#60634)
Currently the transport replication action does not propagate the force
execution parameter when acquiring the indexing permit. The logic to
acquire the index permit supports force execution, so this parameter
should be propagate. Fixes #60359.
2020-08-05 15:57:40 -06:00
Francisco Fernández Castaño b4044004aa
Add recovery state tracking for Searchable Snapshots (#60751)
This pull request adds recovery state tracking for Searchable Snapshots.

In order to track recoveries for searchable snapshot backed indices, this pull
request adds a new type of RecoveryState.
This newRecoveryState instance is able to deal with the
small differences that arise during Searchable snapshots recoveries.

Those differences can be summarized as follows:

-  The Directory implementation that's provided by SearchableSnapshots mark the
    snapshot files as reused during recovery. In order to keep track of the
    recovery process as the cache is pre-warmed, those files shouldn't be marked
    as reused.
 - Once the shard is created, the cache starts its pre-warming phase, meaning that
    we should keep track of those downloads during that process and tie the recovery
    to this pre-warming phase. The shard is considered recovered once this pre-warming
    phase has finished.

Backport of #60505
2020-08-05 17:41:49 +02:00
Jake Landis f3752ba1d5
7.x suport new path for re-index java-api doc (#60319)
This commit uses the new location for the reindex java-api documentation.
Temporary files have been left behind to pacify the docs build.

related #60339
2020-08-05 09:05:07 -05:00
Armin Braun ebfb93ff26
Improve some BytesStreamOutput Usage (#60730) (#60736)
* Stop redundantly creating a `0` length `ByteArray` that is never used
* Add efficient way to get a minimal size copy of the bytes in a `BytesStreamOutput`
* Avoid multiple redundant `byte[]` copies in search cache key creation
2020-08-05 15:51:06 +02:00
Yannick Welsch 9f6f66f156 Fail searchable snapshot shards on invalid license (#60722)
Implements license degradation behavior for searchable snapshots. Snapshot-backed shards are failed when the license becomes invalid, and shards won't be reallocated. After valid license is put in place again, shards are allocated again.
2020-08-05 13:14:15 +02:00
Igor Motov 959690a64a
Refactor extendedBounds to use DoubleBounds (#60556) (#60681)
Refactors extendedBounds to use DoubleBounds instead
of 2 variables.

This is a follow up for #59175
2020-08-04 16:45:47 -04:00
Alan Woodward b3ae5d26bd
Move mapper validation to the mappers themselves (#60072) (#60649)
Currently, validation of mappers (checking that cross-references are correct, limits on
field name lengths and object depths, multiple definitions, etc) is performed by the
MapperService. This means that any mapper-specific validation, for example that done
on the CompletionFieldMapper, needs to be called specifically from core server code,
and so we can't add validation to mappers that live in plugins.

This commit reworks the validation framework so that mapper-specific validation is
done on the Mapper itself. Mapper gets a new `validate(MappingLookup)`
method (already present on `MetadataFieldMapper` and now pulled up to the parent
interface), which is called from a new `DocumentMapper.validate()` method. All
the validation code currently living on `MapperService` moves either to individual
mapper implementations (FieldAliasMapper, CompletionFieldMapper) or into
`MappingLookup`, an altered `DocumentFieldMappers` which now knows about
object fields and can check for duplicate definitions, or into DocumentMapper
which handles soft limit checks.
2020-08-04 14:39:20 +01:00
Armin Braun 212ce22d15
Optimize CS Persistence Stream Use (#60643) (#60647)
In the metadata persistence logic we failed to override the bulk write
method on the FilterOutputStream resulting in all the writes to it
running byte-by-byte in a loop adding a large number of bounds checks
needlessly.
2020-08-04 15:06:57 +02:00
Armin Braun 859ad761bb
Fix Broken Stream Close in writeRawValue (#60625) (#60644)
Small oversight in #56078 that only showed up during backporting where a stream copy was turned from a non-closing to a closing one. Enhanced part of a test in this PR to make it show up in master also even though we practically never use this method with stream targets that actually close.
2020-08-04 13:39:52 +02:00
Armin Braun 7ae9dc2092
Unify Stream Copy Buffer Usage (#56078) (#60608)
We have various ways of copying between two streams and handling thread-local
buffers throughout the codebase. This commit unifies a number of them and
removes buffer allocations in many spots.
2020-08-04 09:54:52 +02:00
Julie Tibshirani f99584c6f3
Avoid reloading _source for every inner hit. (#60632)
Previously if an inner_hits block required _ source, we would reload and parse
the root document's source for every hit. This PR adds a shared SourceLookup to
the inner hits context that allows inner hits to reuse parsed source if it's
already available. This matches our approach for sharing the root document ID.

Relates to #32818.
2020-08-03 17:12:27 -07:00
Julie Tibshirani fc63f8224f
Simplify class hierarchy for ordinals field data. (#60606)
This PR simplifies the hierarchy for ordinals field data classes:
* Remove `AbstractIndexFieldData`, since only `AbstractIndexOrdinalsFieldData`
inherits directly from it.
* Make `SortedSetOrdinalsIndexFieldData` extend
`AbstractIndexOrdinalsFieldData`. This lets us remove some redundant code.
2020-08-03 09:58:29 -07:00
Yannick Welsch 3409e019d2 Ignore shutdown when retrying recoveries (#60586)
Avoids failures when shutting down a node.
2020-08-03 15:14:38 +02:00
Nik Everett 2cde43b799
Allows nanosecond resolution in search_after (backport of #60328) (#60426)
Allows nanosecond resolution in search_after (#60328)

This fixes `search_after` to properly parse string formatted dates that
have nanosecond resolution.

Closes #52424
2020-08-03 08:17:48 -04:00
David Turner d2ddf8cd6a Improve deserialization failure logging (#60577)
Today when a node fails to properly deserialize a transport message with
a parent task we log the following relatively uninformative message:

    java.lang.IllegalStateException: Message not fully read (response) for requestId [9999], handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.transport.TransportService$6@abcdefgh], error [false]; resetting

In particular, the wrapping of the listener in the `TransportService`
obscures all clues as to the source of the problem, e.g. the action name
or the identity of the underlying listener. This commit exposes the
inner listener to the logs.

Also if the listener is wrapped with `ContextPreservingActionListener`
then its identity is similarly hidden. This commit also exposes the
wrapped listener in this case.

Relates #38939
2020-08-03 11:51:01 +01:00
Armin Braun 3270cb3088
More Efficient Writes for Snapshot Shard Generations (#60458) (#60575)
Same as #59905 but for shard level metadata. Since we wnat to retain
the ability to do safe+atomic writes for non-uuid shard generations
this PR has to create two separate write paths for both kinds of
shard generations.
2020-08-03 11:11:36 +02:00
Armin Braun 204efe9387
Add Repository Setting to Disable Writing index.latest (#60448) (#60576)
Writing the `index.latest` blob is unnecessary unless the contents of the repository
are to be used as a URL-repository. Also, in some edge cases, the fact that `index.latest` is the only
blob in the repository that regularly gets overwritten was causing compatibility issues with
some backing blobstores (Azure no-overwrite policy, Hitachy S3 equivalent).
=> this commit changes behavior to make snapshots not fail if writing `index.latest` fails
and adds a setting to disable writing `index.latest`.
2020-08-03 11:11:24 +02:00
Andrei Dan ac258f10d6
Data streams: throw ResourceAlreadyExists exception (#60518) (#60536)
For consistency reasons (and reducing the overload of IllegalArgumentException)
this changes the exception thrown when trying to create a data stream
that already exists.

(cherry picked from commit ac2184c4614bba0f3ee377da49aea0daed98bab4)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-08-01 16:31:09 +01:00
Julie Tibshirani f1d4fd8c3e Correct name of IndexFieldData#loadGlobalDirect. (#60492)
It seems 'localGlobalDirect' was just a typo.
2020-07-31 10:53:21 -07:00
Jim Ferenczi 8db896d290 Fix race condition in SearchPhaseControllerTests#testPartialMergeFailure (#60488)
This change ensures that we call the listener for partial merge failure **before**
calling the completion listener in order to avoid race condition in tests.

Closes #60446
2020-07-31 16:29:20 +02:00
Rene Groeschke ed4b70190b
Replace immediate task creations by using task avoidance api (#60071) (#60504)
- Replace immediate task creations by using task avoidance api
- One step closer to #56610
- Still many tasks are created during configuration phase. Tackled in separate steps
2020-07-31 13:09:04 +02:00
Julie Tibshirani 8ac81a3447 Remove IndexFieldData#clear since it is unused. (#60475)
This method was never called. It also seemed tricky that calling a method on
`IndexFieldData` could clear the contents of a shared cache.
2020-07-30 14:07:55 -07:00
Julie Tibshirani dfd7f226f0
Clarify SourceLookup sharing across fetch subphases. (#60484)
The `SourceLookup` class provides access to the _source for a particular
document, specified through `SourceLookup#setSegmentAndDocument`. Previously
the search context contained a single `SourceLookup` that was shared between
different fetch subphases. It was hard to reason about its state: is
`SourceLookup` set to the expected document? Is the _source already loaded and
available?

Instead of using a global source lookup, the fetch hit context now provides
access to a lookup that is set to load from the hit document.

This refactor closes #31000, since the same `SourceLookup` is no longer shared
between the 'fetch _source phase' and script execution.
2020-07-30 13:22:31 -07:00
Dan Hermann 5e5503ac28
Change severity of negative stats messages from WARN to DEBUG (#60375) (#60444) 2020-07-30 06:06:13 -05:00
Armin Braun 3bf4c01d8e
Don't Allocate Redundant Pages in BigArrays (#60201) (#60441)
The oversize algorithm was allocating more pages than necessary to accommodate `minTargetSize`.
An example would be that a 16k page size and 15k `minTargetSize` would result in a new size of 32k (2 pages).
The difference between the minimum number of necessary pages and the estimated size then keeps growing as sizes increase.

I don't think there is much value in preemptively allocating pages by over-sizing aggressively since the behavior of
the system is quite different from that of a single array where over-sizing avoids copying
once the minimum target size is more than a single page.

Relates #60173 which lead me to this when `BytesStreamOutput` would allocate a large number of never used
pages during serialization of repository metadata.
2020-07-30 11:09:58 +02:00
Armin Braun a2c49a4f02
Reduce Heap Use during Shard Snapshot (#60370) (#60440)
Instances of `BlobStoreIndexShardSnapshots` can be of non-trivial size. In case of snapshotting a larger
number of shards the previous execution order would lead to memory use proportional to the number of shards
for these objects. With this change, the number of these objects on heap is bounded by the size of the snapshot
pool (except for in the BwC format path).
This PR makes it so that they are written to the repository at the earliest possible point in time
so that they can be garbage collected.
If shard generations are used, we can safely write these right at the beginning of the shard snapshot.
If shard generations are not used we can only write them at the end of the shard snapshot after all
other blobs have been written.

Closes #60173
2020-07-30 10:45:00 +02:00
Igor Motov 00a1949852
Streamline GeoJSON to map serialization (#60413) (#60429)
Optimizes GeoJSON to map serialization when retrieving spatial data through
fields.

Closes #60259
2020-07-29 17:56:56 -04:00
Julie Tibshirani 5359417ec3
Minor clean-up around search highlight context. (#60422)
* Rename SearchContextHighlight -> SearchHighlightContext.
* Rename HighlighterContext to FieldHighlightContext.
* Make the search highlight context immutable.
* Avoid storing SearchHighlightContext on HighlighterContext.
2020-07-29 11:39:17 -07:00
Tim Brooks 85fdf959ad
Add configured indexing memory limit to node stats (#60414)
This commit adds the configured memory limit to the node stats API.
2020-07-29 12:28:21 -06:00
Nhat Nguyen 9d4a64e749
Allow CCR on nodes with legacy roles only (#60093)
CCR will stop functioning if the master node is on 7.8, but data nodes 
are before that version because the master node considers that all data
nodes do not have the remote cluster client role. This commit allows CCR
work on data nodes with legacy roles only.

Relates #54146
Relates #59375
2020-07-29 10:57:31 -04:00
Armin Braun 8429b4ace8
Fix Queued Snapshot Deletes After Finalization Failure (#60285) (#60379)
This fixes the behavior of the snapshot state machine in the following edge case:

1. Snapshot is running
2. Delete/abort for the snapshot is started
3. Snapshot fails to finalize

We were not removing the failed snapshot id from the list of snapshots to delete in the delete.
This lead to an error in the repository, which throws if we try to delete a non-existing snapshot.
This commmit updates the deletions in progress by removing the failed snapshot id.
The fact that this could lead to snapshot delete entries without any snapshot ids is not optimized
on purpose because it allows for another attempt at writing clean `RepositoryData` and will run basic
cleanup on the repository (root level blobs and stale indices) and thus bring the repository back into
a clean state after a failed finalization.

Closes #60274
2020-07-29 15:54:18 +02:00
Armin Braun 381cec2ba9
Fix ConcurrentSnapshotsIT.testMasterFailOverWithQueuedDeletes (#60307) (#60376)
The test assumed that the master fail-over would always work out as a single step.
This is not guaranteed however and we can randomly see master failing over twice,
in which case the transport listener will be failed on the node that stops being
leader and we have to catch an exception for the deletes as well just like we do
for the snapshot.

Closes #60262
2020-07-29 15:54:00 +02:00
Armin Braun 0778274b72
Fix IPV6 Scope Id in InetAddressesTests (#60368) (#60369)
Follow up to #60360, turns out at times the name of an interface that isn't loopback is not a valid scope id.
2020-07-29 13:16:12 +02:00
Armin Braun 1f6a3765e4
Fix NPE in SnapshotsInProgress Constructor (#60355)
Merge oversight between cleanups that removed `null` for `shards` and this corner case
spot of no indices in a snapshot.

Closes #60330
2020-07-29 10:47:28 +02:00
Armin Braun 4307a45153
Fix IPV6 Scope ID Test (#60360) (#60363)
Use real scope id from first available interface instead of `lo` which might
not exist on non-Linux platforms.

Closes #60332
2020-07-29 09:55:37 +02:00
Armin Braun 753fd4f6bc
Cleanup and optimize More Serialization Spots (#59959) (#60331)
Same as #59626 for a few more spots.
2020-07-29 07:20:44 +02:00
Zachary Tong e3d85feecd Mute testForStringIPv6WithScopeIdInput test
Tracking issue: https://github.com/elastic/elasticsearch/issues/60332
2020-07-28 15:05:19 -04:00
Igor Motov 0dd53b76bd
Add aggregation list to node info (#60074) (#60256)
Adds a full list of supported aggregations to the node info API. This list
will be used in transform tests and telemetry mapping tests that will be added
as follow-up PRs.

Fixes #59774
2020-07-28 14:06:12 -04:00
Julie Tibshirani c7bfb5de41
Add search `fields` parameter to support high-level field retrieval. (#60258)
This feature adds a new `fields` parameter to the search request, which
consults both the document `_source` and the mappings to fetch fields in a
consistent way. The PR merges the `field-retrieval` feature branch.

Addresses #49028 and #55363.
2020-07-28 10:58:20 -07:00
James Rodewig 025e7bee80
[DOCS] Fix allowed values for numeric sort types (#60176) (#60299)
Co-authored-by: Philippus Baalman <philippus@gmail.com>
2020-07-28 13:51:59 -04:00
Howard 11b86b3f88
Remove unused clusterService instance in ActionModule. (#59826) 2020-07-28 10:36:04 -07:00
jimczi 4e4ed6ee48 fix race condition in SearchPhaseControllerTests#consumerTestCase 2020-07-28 18:27:39 +02:00
David Turner 9450ea08b4 Log and track open/close of transport connections (#60297)
Transport connections between nodes remain in place until one or other
node shuts down or the connection is disrupted by a flaky network.
Today it is very difficult to demonstrate that transient failures and
cluster instability are caused by the network even though this is often
the case. In particular, transport connections open and close without
logging anything, even at `DEBUG` level, making it very hard to quantify
the scale of the problem or to correlate the networking problems with
external events.

This commit adds the missing `DEBUG`-level logging when transport
connections open and close, and also tracks the total number of
transport connections a node has opened as a measure of the stability of
the underlying network.
2020-07-28 17:08:04 +01:00
Armin Braun 9222070f22
Fix Test Failure in testCorrectCountsForDoneShards (#60254) (#60286)
* Fix Test Failure in testCorrectCountsForDoneShards

Fixing the freak edge case where the node shard status request returns before
the node was able to send the state update request to master and update the cluster state.
Without this change, the snapshot shard status would report as `DONE` once the data node
has finished updating the shard in the cluster state.
If the data node then drops out of the cluster before the state has been updated, then
the status will jump to "FAILURE" because the master updates the state once the data node
leaves the cluster.

Closes #60247
2020-07-28 15:46:18 +02:00
David Turner b78caa5c00 Add more useful toString on cluster state observers (#60277)
Today if a cluster state observer's listener takes a long time to
process a notification then we log the following rather useless warning
message:

    [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@12345678]] took [34567ms]

This commit adds a handful of simple `toString()` implementations in
order to identify the owner of the listener in question.
2020-07-28 12:56:58 +01:00
Jim Ferenczi 1144534093
Executes incremental reduce in the search thread pool (#58461) (#60275)
This change forks the execution of partial
reduces in the coordinating node to the search thread pool.
It also ensures that partial reduces are executed sequentially
and asynchronously in order to limit the memory and cpu that a
single search request can use but also to avoid blocking a
network thread.
If a partial reduce fails with an exception, the search
request is cancelled and the reporting of the error is
delayed to the start of the fetch phase (when the final
reduce is performed). This ensures that we cleanup the
in-flight search requests before returning an error to
the user.

Closes #53411
Relates #51857
2020-07-28 13:40:47 +02:00
Armin Braun d39622e17e
Stop Serializing RepositoryData Twice when Writing (#60107) (#60269)
We can save one round of serializing `RepositoryData` on the write path.
This also leads to somewhat better compression because we compress larger chunks
in one go potentially when compared to serializing and compressing in one go.
Also, fixed the double wrapping of collections when copying the repository
data instance via the `withGenId`.
2020-07-28 11:42:14 +02:00
Yannick Welsch a55c869aab Properly document keepalive and other tcp options (#60216)
Keepalive options are not well-documented (only in transport section, although also available at http and network level).

Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2020-07-28 11:10:04 +02:00
Yannick Welsch ffe114b890 Set specific keepalive options by default on supported platforms (#59278)
keepalives tell any intermediate devices that the connection remains alive, which helps with overzealous firewalls that are
killing idle connections. keepalives are enabled by default in Elasticsearch, but use system defaults for their
configuration, which often times do not have reasonable defaults (e.g. 7200s for TCP_KEEP_IDLE) in the context of
distributed systems such as Elasticsearch.

This PR sets the socket-level keep_alive options for network.tcp.{keep_idle,keep_interval} to 5 minutes on configurations
that support it (>= Java 11 & (MacOS || Linux)) and where the system defaults are set to something higher than 5
minutes. This helps keep the connections alive while not interfering with system defaults or user-specified settings
unless they are deemed to be set too high by providing better out-of-the-box defaults.
2020-07-28 11:10:04 +02:00
Armin Braun fac5953d13
Let `isInetAddress` utility understand the scope ID on ipv6 (#60172) (#60263)
Make `isInetAddress` utility method understand the scope ID on ipv6.

Fixes #60115

Co-authored-by: Yang Cheng <chengyang2048@163.com>
2020-07-28 09:37:39 +02:00
James Rodewig cb4c21fa7b
[DOCS] Fix typo in adapt auto expand replica comments (#60187) (#60239)
Co-authored-by: Howard <danielhuang@tencent.com>
2020-07-27 14:18:53 -04:00
weizijun 5df043d0e0 Fix wait_for_no_initializing_shards params (#58379) 2020-07-27 14:03:26 -04:00
Adrien Grand f1f275c91b Add 6.8.12 and 7.8.2 version constants. 2020-07-27 19:26:22 +02:00
Tim Brooks df0f68da23
Identify the operation type in rejected exception (#60138)
Currently, we do not categorize the operation type in the rejection
exception messsage when we reject an indexing operation for indexing
memory limits. This commit fixes this to ensure that it is identified as
coordinating, primary, or replica.
2020-07-27 10:09:46 -06:00
Tim Brooks 47922c9e4a
Fix indexing pressure replica rejections logic (#60150)
Currently the logic to rejection replica rejections is evaluate before
adding the additional bytes of the current operation. This means that
the first replica operation which should be rejected will be allowed to
proceed. This commit fixes this logic and adds unit level test to ensure
indexing pressure behavior is correct.
2020-07-27 10:00:01 -06:00
Nik Everett a451dd87aa
Reduce merge map memory overhead in the Variable Width Histogram Aggregation (#59366) (#60171)
When a document which is distant from existing buckets gets collected, the
`variable_width_histogram` will create a new bucket and then insert it
into the ordered list of buckets.

Currently, a new merge map array is created to move this bucket. This is
very expensive as there might be thousands of buckets.

This PR creates `mergeBuckets(UnaryOperator<Long> mergeMap)` methods in
`BucketsAggregator` and `MergingBucketsDefferingCollector`, and updates
the `variable_width_histogram`  to use them. This eliminates the need to
create an entire merge map array for each new bucket and reduces the
memory overhead of the algorithm.

Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>
2020-07-27 09:23:06 -04:00
Armin Braun 196ed6b90e
Remove Mostly Redundant Deleting in FsBlobContainer (#60117) (#60195)
In almost all cases we write uuid named files via this method.
Preemptively deleting just wastes IO ops, we can delete after a write failed
and retry the write to cover the few cases where we actually do an overwrite.
2020-07-27 14:05:41 +02:00
David Roberts 89466eefa5
Don't require separate privilege for internal detail of put pipeline (#60190)
Putting an ingest pipeline used to require that the user calling
it had permission to get nodes info as well as permission to
manage ingest.  This was due to an internal implementaton detail
that was not visible to the end user.

This change alters the behaviour so that a user with the
manage_pipeline cluster privilege can put an ingest pipeline
regardless of whether they have the separate privilege to get
nodes info.  The internal implementation detail now runs as
the internal _xpack user when security is enabled.

Backport of #60106
2020-07-27 10:44:48 +01:00
Armin Braun 25a75d05c0
Fix Test Failure in testConcurrentlyChangeRepositoryContentsInBwCMode (#60095)
There is a very unlikely but possible test failure in this test.
The `SnapshotsService` continues iterating over queued operations after
resolving the transport listener. This can lead to a situation where the moved
repository data is not picked up when running the delete (even though we
have the concurrent modifications BwC mode activated) concurrently.

I fixed this in the test so that the test still verifies that this setting works.
Technically speaking, one could add logic to the way we queue and execute repo operations
to address this special case. Since this case only comes about with the concurrent modifications
setting enabled (and the setting is gone in master already) I don't really see a reason to improve
the logic here since we should always fail queued up repo operations on concurrent modification for
safety reasons.
2020-07-27 09:33:38 +02:00
Nhat Nguyen 0031dea9cc Fix race in testSendSnapshotSendsOps (#59831)
There is a race between increase and get the global checkpoint in the 
test as indexTranslogOperations can be executed concurrently.

Closes #59492
2020-07-23 16:22:40 -04:00
Ignacio Vera db183c89ed
Refactor HyperLogLogPlusPlus to separate algorithms and internal data representation (#60104) (#60109) 2020-07-23 15:07:05 +02:00
David Turner bf7e53a91e Remove node-level canAllocate override (#59389)
Today there is a node-level `canAllocate` override which the balancer
uses to ignore certain nodes to which it is certain no more shards can
be allocated. In fact this override only ignores nodes which have hit
the rarely-used `cluster.routing.allocation.total_shards_per_node`
limit, so this optimization doesn't have a meaningful impact on real
clusters.

This commit removes this unnecessary fast path from the balancer, and
also removes all the machinery needed to support it.
2020-07-23 08:48:59 +01:00
Armin Braun 43a6ff5eb1
Optimize some Spots around Closing Resources (#60049) (#60096)
The single element `close` calls go through a very inefficient path that includes creating
a one element list.
`releaseOnce` is only with a single non-null input in production in two spots so no need for
varargs and any complexity here.
`ReleasableBytesStreamOutput` does not require any `releaseOnce` wrapping because we already have
that kind of logic implemented in `org.elasticsearch.common.util.AbstractArray` (which we were
wrapping here) already.
2020-07-23 08:49:06 +02:00
Julie Tibshirani aa57bbd422
Consolidate validation for 'docvalue_fields'. (#60065)
This improves modularity and also fixes some issues when `docvalues_fields` is
used within `inner_hits` or the `top_hits` agg:
* We previously didn't resolve wildcards in field names.
* We also forgot to enforce the limit `index.max_docvalue_fields_search`.
2020-07-22 17:26:58 -07:00
Armin Braun ebb6677815
Formalize and Streamline Buffer Sizes used by Repositories (#59771) (#60051)
Due to complicated access checks (reads and writes execute in their own access context) on some repositories (GCS, Azure, HDFS), using a hard coded buffer size of 4k for restores was needlessly inefficient.
By the same token, the use of stream copying with the default 8k buffer size  for blob writes was inefficient as well.

We also had dedicated, undocumented buffer size settings for HDFS and FS repositories. For these two we would use a 100k buffer by default. We did not have such a setting for e.g. GCS though, which would only use an 8k read buffer which is needlessly small for reading from a raw `URLConnection`.

This commit adds an undocumented setting that sets the default buffer size to `128k` for all repositories. It removes wasteful allocation of such a large buffer for small writes and reads in case of HDFS and FS repositories (i.e. still using the smaller buffer to write metadata) but uses a large buffer for doing restores and uploading segment blobs.

This should speed up Azure and GCS restores and snapshots in a non-trivial way as well as save some memory when reading small blobs on FS and HFDS repositories.
2020-07-22 21:06:31 +02:00
Tim Brooks ba01540d7e
Implement human readable indexing pressure stats (#60058)
The indexing pressure stats do not currently have human readable
variants. This commit add human readable variants and updates the
documentation.
2020-07-22 12:07:59 -06:00
Jay Modi c8ef2e18f7
Thread safe clean up of LocalNodeModeListeners (#60007)
This commit continues on the work in #59801 and makes other
implementors of the LocalNodeMasterListener interface thread safe in
that they will no longer allow the callbacks to run on different
threads and possibly race each other. This also helps address other
issues where these events could be queued to wait for execution while
the service keeps moving forward thinking it is the master even when
that is not the case.

In order to accomplish this, the LocalNodeMasterListener no longer has
the executorName() method to prevent future uses that could encounter
this surprising behavior.

Each use was inspected and if the class was also a
ClusterStateListener, the implementation of LocalNodeMasterListener
was removed in favor of a single listener that combined the logic. A
single listener is used and there is currently no guarantee on execution
order between ClusterStateListeners and LocalNodeMasterListeners,
so a future change there could cause undesired consequences. For other
classes, the implementations of the callbacks were inspected and if the
operations were lightweight, the overriden executorName method was
removed to use the default, which runs on the same thread.

Backport of #59932
2020-07-22 08:02:18 -06:00
Luca Cavanna 702c997819 ParametrizedFieldMapper to run validators against default value (#60042)
Sometimes there is the need to make a field required in the mappings, and validate that a value has been provided for it. This can be done through a validator when using ParametrizedFieldMapper, but validators need to run also when a value for a field has not been specified.

Relates to #59332
2020-07-22 14:12:38 +02:00
Armin Braun c06c9fb966
Fix BwC Snapshot INIT Path (#60006)
There were two subtle bugs here from backporting #56911 to 7.x.

1. We passed `null` for the `shards` map which isn't nullable any longer
when creating `SnapshotsInProgress.Entry`, fixed by just passing an empty map
like the `null` handling did in the past.
2. The removal of a failed `INIT` state snapshot from the cluster state tried
removing it from the finalization loop (the set of repository names that are
currently finalizing). This will trip an assertion since the snapshot failed
before its repository was put into the set. I made the logic ignore the set
in case we remove a failed `INIT` state snapshot to restore the old logic to
exactly as it was before the concurrent snapshots backport to be on the safe
side here.

Also, added tests that explicitly call the old code paths because as can be seen
from initially missing this, the BwC tests will only run in the configuration new
version master, old version nodes ever so often and having a deterministic test
for the old state machine seems the safest bet here.

Closes #59986
2020-07-22 10:09:55 +02:00
Jake Landis 55216dabb4
[7.x] Per processor description for verbose simulate (#58207) (#60008)
For ingest node processors a per processor description
was recently added. This commit displays that description
in the verbose output of the pipeline simulation.

related #57906
2020-07-21 17:32:45 -05:00
Nik Everett 49f365ddfd
Fix bug in deep pipeline agg serialization (#59984)
In #54716 I removed pipeline aggregators from the aggregation result
tree and caused us to read them from the request. This saves a bunch of
round trip bytes, which is neat. But there was a bug in the backwards
compatibility logic. You see, we still have to give the pipeline
aggregations to nodes older than 7.8 over the wire because that is how
they know what pipelines to run. They have the pipelines in the request
but they don't read them. They use the ones in the response tree.

Anyway, we had a bug where we were never sending pipelines defined two
levels down. So while you are upgrading the pipeline wouldn't run.
Sometimes. If the data node of the "first" result was post-7.8 and the
coordinating node was pre-7.8.

This fixes the bug.
2020-07-21 16:03:15 -04:00
David Turner dde568caf7 Fix scheduling of ClusterInfoService#refresh (#59880)
Today the `InternalClusterInfoService` uses the
`LocalNodeMasterListener` interface to start/stop its operations. Since
the `onMaster` and `offMaster` methods are called on the `MANAGEMENT`
threadpool, there's no guarantee that they run in the correct sequence,
which could result in an elected master failing to regularly update the
cluster info.

Since this service is also a `ClusterStateListener` we may as well drop
the usage of the `LocalNodeMasterListener` interface and simply update
the status of the local node on the applier thread in `clusterChanged`
to ensure consistency.

Additionally, today the `InternalClusterInfoService` uses a simple flag
to track whether the local node is the elected master or not. If the
node stops being the master and then starts again within a few seconds
then the scheduled updates from the old mastership might carry on
running in addition to the ones for the new mastership.

This commit addresses that by tracking the identity of the scheduled
update job and creating a new job for each mastership.
2020-07-21 17:14:49 +01:00
Alan Woodward a0ad1a196b Wrap up building parametrized TypeParsers (#59977)
The TypeParser implementations of all ParametrizedFieldMapper descendant classes are
essentially the same - stateless, requiring the construction of a Builder object, and calling
parse on it before returning it. We can make this easier (and less error-prone) to
implement by wrapping the logic up into a final class, which takes a function to produce
the Builder from a name and parser context.
2020-07-21 16:00:11 +01:00
Nik Everett 6f6076e208
Drop some params from IndexFieldData.Builder (backport of #59934) (#59972)
We never used the `IndexSettings` parameter and we only used the
`MappedFieldType` parameter to get the name of the field which we
already know everywhere where we build the `IFD.Builder`. This allows us
to drop a fair bit of ceremony from a couple of tests.
2020-07-21 10:28:59 -04:00
Luca Cavanna 5e17f00ecf Tweak toXContent implementation of ParametrizedFieldMapper (#59968)
ParametrizedFieldMapper overrides `toXContent` from `FieldMapper`, yet it could override `doXContentBody` and rely on the `toXContent` from the base class. Additionally, this allows to make `doXContentBody` final. Also, toXContent is still overridden only to make it final.
2020-07-21 16:01:51 +02:00
Przemyslaw Gomulka 19fe3e511f
Deprecate camel case date format backport(#59555) (#59948)
Camel case date formats are deprecated and snake case should be used
instead.
backports #59555
2020-07-21 15:56:44 +02:00
Armin Braun e37bfe8a5f
Stop Checking if Segment Data Blob Exists before Write (#59905) (#59971)
With uuid named segment data blobs there is no reason to ensure no overwrites are happening
for these blobs when writing. On the contrary, at least on Azure this check can conflict with
the SDK's retrying and cause upload failures randomly.
2020-07-21 15:23:42 +02:00
Yannick Welsch 07784a0b16 CCR recoveries using wrong setting for chunk sizes (#59597)
The default chunk size for CCR file-based recoveries was wrongly set to 40MB instead of 1MB.
2020-07-21 13:56:06 +02:00
Armin Braun cefaa17c52
Simplify CheckSumBlobStoreFormat and make it more Reusable (#59888) (#59950)
Refactored `CheckSumBlobStoreFormat` so it can more easily be reused in
other functionality (i.e. upcoming repair logic).
Simplified away constant `failIfAlreadyExists` parameter and removed the atomic
write method and its tests.
The atomic write method was only used in a single spot and that spot has now been adjusted to
work the same way writing root level metadata works.
2020-07-21 11:20:56 +02:00
Armin Braun 5b92596fad
Cleanup and Optimize Multiple Serialization Spots (#59626) (#59936)
Follow up to #59606 using some of the new infrastructure and making similar cleanups (and due to at times better handling of size hints and empty collections also optimizations in the stream utility methods this also means speedups) in various spots in the core codebase.
2020-07-21 10:06:56 +02:00
Julie Tibshirani 8647872a1e
Simplify structure for parsing points. (#59938)
Previously we constructed a GeometryFormat object and delegated point parsing to
it. This wasn't a good fit conceptually because each GeometryFormat instance
didn't represent a distinct point format.
2020-07-20 17:11:43 -07:00
Nik Everett b2ca19484a
Allocate slightly less per bucket (#59740) (#59873)
This replaces that data structure that we use to resolve bucket ids in
bucketing aggs that are inside other bucketing aggs. This replaces the
"legoed together" data structure with a purpose built `LongLongHash`
with semantics similar to `LongHash`, except that it has two `long`s
as keys instead of one.

The microbenchmarks show a fairly substantial performance gain on the
hot path, around 30%. Rally's higher level benchmarks show anywhere
from 0 to 7% speed improvements. Not as much as I'd hoped, but nothing
to sneeze at. And, after all, we all allocating slightly less data per
owningBucketOrd, which is always nice.
2020-07-20 10:43:11 -04:00
Stéphane Campinas bcebdfe5b1 fix handling of alias filter in SearchService#canMatch (#59368)
The check against the alias filter should be done after the request is rewritten.

Close #59367
2020-07-20 16:25:15 +02:00
David Turner b75207a09f Remove sporadic min/max usage estimates from stats (#59755)
Today `GET _nodes/stats/fs` includes `{least,most}_usage_estimate`
fields for some nodes. These fields have rather strange semantics. They
are only reported on the elected master and on nodes that have been the
elected master since they were last restarted; when a node stops being
the elected master these stats remain in place but we stop updating them
so they may become arbitrarily stale.

This means that these statistics are pretty meaningless and impossible
to use correctly. Even if they were kept up to date they're never
reported for data-only nodes anyway, despite the fact that data nodes
are the ones where we care most about disk usage. The information needed
to compute the path with the least/most available space is already
provided in the rest the stats output, so we can treat the inclusion of
these stats as a bug and fix it by simply removing them in this commit.
Since these stats were always optional and mostly omitted (for opaque
reasons) this is not considered a breaking change.
2020-07-20 15:22:04 +01:00
Lee Hinman 8c7d414a3b
[7.x] Fix retrieving data stream stats for a DS with multiple backing indices (#59806) (#59810)
Backports the following commits to 7.x:

    Fix retrieving data stream stats for a DS with multiple backing indices (#59806)
2020-07-17 16:56:07 -06:00
Nik Everett 514b2f3414
Clean up a few of vwh's rough edges (#59341) (#59807)
This cleans up a few rough edged in the `variable_width_histogram`,
mostly found by @wwang500:
1. Setting its tuning parameters in an unexpected order could cause the
   request to fail.
2. We checked that the maximum number of buckets was both less than
   50000 and MAX_BUCKETS. This drops the 50000.
3. Fixes a divide by 0 that can occur of the `shard_size` is 1.
4. Fixes a divide by 0 that can occur if the `shard_size * 3` overflows
   a signed int.
5. Requires `shard_size * 3 / 4` to be at least `buckets`. If it is less
   than `buckets` we will very consistently return fewer buckets than
   requested. For the most part we expect folks to leave it at the
   default. If they change it, we expect it to be much bigger than
   `buckets`.
6. Allocate a smaller `mergeMap` in when initially bucketing requests
   that don't use the entire `shard_size * 3 / 4`. Its just a waste.
7. Default `shard_size` to `10 * buckets` rather than `100`. It *looks*
   like that was our intention the whole time. And it feels like it'd
   keep the algorithm humming along more smoothly.
8. Default the `initial_buffer` to `min(10 * shard_size, 50000)` like
   we've documented it rather than `5000`. Like the point above, this
   feels like the right thing to do to keep the algorithm happy.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-17 15:16:09 -04:00
Lee Hinman f6b08a3115
[7.x] Allow simulating existing composable index template (#59733) (#59798)
Backports the following commits to 7.x:

    Allow simulating existing composable index template (#59733)
2020-07-17 13:10:07 -06:00
Nik Everett 95e6e4a452
Small cleanup for IndexFieldData (#59724) (#59800)
This drops `IndexComponent` from `IndexFieldData` because it wasn't
doing anything other than forcing us to perform a bunch of ceremony to
build them.
2020-07-17 13:38:15 -04:00
Tal Levy c9ab7bb651
Fix bug in circuit-breaker check for geoshape grid aggregations (#57962) (#59741)
There was a bug in the geoshape circuit-breaker check where the
hash values array was being allocated before its new size was
accounted for by the circuit breaker.

Fixes #57847.
2020-07-17 09:26:00 -07:00
Christoph Büscher f4ff5fe93b
Add `zero_terms_query` support to `match_phrase_prefix` (#58822) (#59784)
Currently `match_phrase_prefix` doesn't support `zero_terms_query` like the
other match-type queries. This change adds this support.

Closes #58468
2020-07-17 17:23:23 +02:00
Benjamin Trent b7f30fc929
[7.x] Adding new `require_alias` option to indexing requests (#58917) (#59769)
* Adding new `require_alias` option to indexing requests (#58917)

This commit adds the `require_alias` flag to requests that create new documents.

This flag, when `true` prevents the request from automatically creating an index. Instead, the destination of the request MUST be an alias.

When the flag is not set, or `false`, the behavior defaults to the `action.auto_create_index` settings.

This is useful when an alias is required instead of a concrete index.

closes https://github.com/elastic/elasticsearch/issues/55267
2020-07-17 10:24:58 -04:00
Alan Woodward 65f6fb8e94
Shortcut mapping update if the incoming mapping version is the same as the current mapping version (#59517) (#59772)
Currently, when we apply a cluster state change to a shard on a non-master node,
we check to see if the mappings need to be updated by comparing the decompressed
serialized mappings from the update against the serialized version of the shard's
existing mappings. However, we already have a much simpler way of checking this,
by comparing mapping versions on the index metadata of the old and new states.

This commit adds a shortcut to MapperService.updateMappings() that compares
these mapping versions, and ignores the merge if they are equal.
2020-07-17 14:53:09 +01:00
Alan Woodward b29d368b52
Convert DateFieldMapper to parametrized format (#59429) (#59759)
This commit makes DateFieldMapper extend ParametrizedFieldMapper,
declaring its parameters explicitly. As well as changes to DateFieldMapper
itself, there are some changes to dynamic mapping code to ensure that
dynamically detected date formats are passed through to new date mapper
builders.
2020-07-17 12:46:18 +01:00
Przemko Robakowski 790fbbcd87
[7.x] Fix handling of final pipelines when destination is changed (#59522) (#59746)
* Fix handling of final pipelines when destination is changed (#59522)

This change fixes final pipelines if destination index is changed during pipeline run:
-final pipelines can't change destination anymore, exception is thrown if they try to
-if request/default pipeline changes destination final pipeline from old index won't be executed
-if request/default pipeline changes destination and new index has final pipeline it will be executed
-default pipeline from new index won't be executed
Additionally TransportBulkAction.resolvePipelines was moved to IngestService as it's needed for resolving pipelines from new index. Tests were moved accordingly.

Closes #57968
2020-07-17 11:13:48 +02:00
Tim Brooks b6e6a8c090
Fix replication operation transient retry test (#58205)
After the work to retry transient replication failures, the local and
global checkpoint test metadata can be incremented on a different thread
than the test thread. This appears to introduce an extremely rare
scenario where this data is not visible for later test assertions. This
commit fixes the issue by using synchronized maps.
2020-07-16 16:01:47 -06:00
Martijn van Groningen 0096238df1
Replaced _data_stream_timestamp meta field's 'path' option with 'enabled' option (#59727)
Backport #59503 to 7.x

and adjusted exception messages.

Relates to #59076
2020-07-16 22:29:40 +02:00
Igor Motov 2408803fad
Adds hard_bounds to histogram aggregations (#59175) (#59656)
Adds a hard_bounds parameter to explicitly limit the buckets that a histogram
can generate. This is especially useful in case of open ended ranges that can
produce a very large number of buckets.
2020-07-16 15:31:53 -04:00
Alan Woodward 10be10c99b
Migrate CompletionFieldMapper to parametrized format (#59691)
This adds a number of new optional parameters to Parameter, including:

* custom serialization (to handle analyzers)
* deprecated parameter names
* parameter validation
* allowing default values to be based on the values of other parameters

We preserve the previous serialization format of CompletionFieldMapper,
always emitting most fields, in order to meet mapping checks in mixed
version clusters, where the mapper service will check that mappings have
been correctly parsed and updated by checking their serialized outputs.
2020-07-16 19:15:00 +01:00
Howard c0d429863c
remove unused cluster name in environment. (backport of #59605) (#59681)
removes an unused variable
2020-07-16 09:25:55 -04:00
Nik Everett 343053c0a7 Fix compilation in Eclipse (backport #59675)
Eclipse was confused by #59583. It can't see a the public inner
interface within the superclass. This time. Usually that is fine, but
the Eclipse gods don't like this particular code, I guess.
2020-07-16 08:25:12 -04:00
Alan Woodward 27067de699 Make MappedFieldType#meta final (#59383)
The MappedFieldType#updateMeta method was used for testing equality checks, but we
no longer need these after #59212 , so we can remove this method and make meta final.
2020-07-16 09:45:55 +01:00
Przemysław Witek df4fea79cb
Add a "verbose" option to the data frame analytics stats endpoint (#59589) (#59621) 2020-07-16 09:51:31 +02:00
Armin Braun 6db481f49e
Fix ConcurrentSnapshotsIT.testEquivalentDeletesAreDeduplicated (#59611) (#59653)
Trying to queue up snapshot deletes by blocking the delete of the latest
index-N doesn't work here. The first delete will block on the delete operation
but only do so after having already written the updated repository data.
Since that repository data will contain no snapshots, the subsequent deletes for
`*` will just fall through and complete instead of queue up.
=> Fixed by simply waiting on all files on master so that we block before updating
the repository data and get to test the queueing of equivalent operations

closes #59608
2020-07-16 09:28:36 +02:00
Nhat Nguyen b599f7a9c0
Fix estimate size of translog operations (#59206)
Make sure that the estimateSize method includes all fields of translog operations.
2020-07-16 00:19:30 -04:00
Julie Tibshirani 2b70758a05 Correct type parametrization in geo mappers. (#59583)
Previously the concrete type parameters for the MappedFieldType didn't always
match those for the FieldMapper. This PR updates the mappers so that the type
parameters always match, which makes the design easier to follow.
2020-07-15 14:10:47 -07:00
Boice Huang ef26c1739b fix typo in Exception Response in GeoJson (#59270) 2020-07-15 20:15:18 +01:00
Boice Huang 07a58d915d Fix typo in AggregationProfiler (#59269) 2020-07-15 20:14:19 +01:00
Armin Braun cc7093645c
Cleanup some Serialization Code around Snapshots (#59532) (#59606)
A number of obvious possible simplifications that also improve efficiency
in some cases (better empty collection handling and size hint use).
Also, added a shortcut for writing and reading immutable open maps that
can be used to dry up additional spots.
2020-07-15 20:40:43 +02:00
David Turner 67e7c3f60e Fix failing test introduced in #59601 2020-07-15 17:44:27 +01:00
Rory Hunter b8d73a1e7e
Default gateway.auto_import_dangling_indices to false (#59302)
Backport of #58898.

Part of #48366. Now that there is a dedicated API for dangling indices, the auto-import
behaviour can default to off. Also add a note to the breaking changes for 7.9.0.
2020-07-15 17:10:42 +01:00
David Turner 691759fb1f
Validate snapshot UUID during restore (#59601)
Today when mounting a searchable snapshot we obtain the snapshot/index
UUIDs and then assume that these are the UUIDs used during the
subsequent restore. If you concurrently delete the snapshot and replace
it with one with the same name then this assumption is violated, with
chaotic consequences.

This commit introduces a check that ensures that the snapshot UUID does
not change during the mount process. If the snapshot remains in place
then the index UUID necessarily does not change either.

Relates #50999
2020-07-15 16:23:20 +01:00
Martijn van Groningen 2a89e13e43
Move data stream transport and rest action to xpack (#59593)
Backport of #59525 to 7.x branch.

* Actions are moved to xpack core.
* Transport and rest actions are moved the data-streams module.
* Removed data streams methods from Client interface.
* Adjusted tests to use client.execute(...) instead of data stream specific methods.
* only attempt to delete all data streams if xpack is installed in rest tests
* Now that ds apis are in xpack and ESIntegTestCase
no longers deletes all ds, do that in the MlNativeIntegTestCase
class for ml tests.
2020-07-15 16:50:44 +02:00
Rory Hunter 2e05ce5f88 Bump version to 7.10.0 2020-07-15 11:56:45 +01:00
Ignacio Vera f8037abf47
upgrade to lucene-8.6.0 release (#59596) (#59599) 2020-07-15 12:40:57 +02:00
David Turner 0c2510dc68 Don't request cluster metadata in _cat/shards impl (#59548)
Today `GET _cat/shards` requests the nodes, routing table, and metadata
from the cluster state, but it does not use any information from the
metadata portion of the response. Metadata includes things like mappings
and templates that may be substantial in size.

This commit drops the unnecessary metadata portion of this cluster state
request.
2020-07-15 10:14:48 +01:00
Francisco Fernández Castaño 66ef1cdad7
Add the possibility to inject a custom RecoveryState factory to IndexStorePlugin implementations (#59124)
Add a custom factory for recovery state into IndexStorePlugin that
allows different implementors to provide its own RecoveryState
implementation.

Backport of #59038
2020-07-15 11:11:07 +02:00
Armin Braun 96f52a028f
Fix Snapshot not Starting in Partial Snapshot Corner Case (#59428) (#59584)
We were not handling the case where during a partial snapshot all
shards would enter a failed state right off the bat.

Closes #59384
2020-07-15 07:59:22 +02:00
Armin Braun 2dd086445c
Enable Fully Concurrent Snapshot Operations (#56911) (#59578)
Enables fully concurrent snapshot operations:
* Snapshot create- and delete operations can be started in any order
* Delete operations wait for snapshot finalization to finish, are batched as much as possible to improve efficiency and once enqueued in the cluster state prevent new snapshots from starting on data nodes until executed
   * We could be even more concurrent here in a follow-up by interleaving deletes and snapshots on a per-shard level. I decided not to do this for now since it seemed not worth the added complexity yet. Due to batching+deduplicating of deletes the pain of having a delete stuck behind a long -running snapshot seemed manageable (dropped client connections + resulting retries don't cause issues due to deduplication of delete jobs, batching of deletes allows enqueuing more and more deletes even if a snapshot blocks for a long time that will all be executed in essentially constant time (due to bulk snapshot deletion, deleting multiple snapshots is mostly about as fast as deleting a single one))
* Snapshot creation is completely concurrent across shards, but per shard snapshots are linearized for each repository as are snapshot finalizations

See updated JavaDoc and added test cases for more details and illustration on the functionality.

Some notes:

The queuing of snapshot finalizations and deletes and the related locking/synchronization is a little awkward in this version but can be much simplified with some refactoring.  The problem is that snapshot finalizations resolve their listeners on the `SNAPSHOT` pool while deletes resolve the listener on the master update thread. With some refactoring both of these could be moved to the master update thread, effectively removing the need for any synchronization around the `SnapshotService` state. I didn't do this refactoring here because it's a fairly large change and not necessary for the functionality but plan to do so in a follow-up.

This change allows for completely removing any trickery around synchronizing deletes and snapshots from SLM and 100% does away with SLM errors from collisions between deletes and snapshots.

Snapshotting a single index in parallel to a long running full backup will execute without having to wait for the long running backup as required by the ILM/SLM use case of moving indices to "snapshot tier". Finalizations are linearized but ordered according to which snapshot saw all of its shards complete first
2020-07-15 03:42:31 +02:00
Armin Braun 06d94cbb2a
Fix TODO about Spurious FAILED Snapshots (#58994) (#59576)
There is no point in writing out snapshots that contain no data that can be restored
whatsoever. It may have made sense to do so in the past when there was an `INIT` snapshot
step that wrote data to the repository that would've other become unreferenced, but in the
current day state machine without the `INIT` step there is no point in doing so.
2020-07-15 00:54:30 +02:00
Armin Braun e1014038e9
Simplify Repository.finalizeSnapshot Signature (#58834) (#59574)
Many of the parameters we pass into this method were only used to
build the `SnapshotInfo` instance to write.
This change simplifies the signature. Also, it seems less error prone to build
`SnapshotInfo` in `SnapshotsService` isntead of relying on the fact that each repository
implementation will build the correct `SnapshotInfo`.
2020-07-15 00:14:28 +02:00
Armin Braun 16a47e0d08
Simplify SnapshotsInProgress Construction (#58893) (#59573)
With parallel snapshots incoming (but also in isolation) it makes sense to clean up
`SnapshotsInProgress` construction.
We don't need to pre-compute the waiting shards for every entry. We rarely use this information
(only on routing changes) and in the one spot we did we now simply spent the extra cycles for looping
over all shards instead of just the waiting ones once per routing change tops instead of on every change
to `SnapshotsInProgress` (moreover, we would burn the cycles for looping on all nodes even though only the
current master cares about the information).
In addition to that change I removed some dead code constructors and slighly optimized deserialization.
2020-07-15 00:00:53 +02:00
Martijn van Groningen 35ae3d19db
Remove data stream feature flag (#59572)
so that it can used in the next minor release (7.9.0).

Backport of #59504 to 7.x branch.
Closes #53100
2020-07-14 23:50:41 +02:00
Armin Braun 68a199f75f
Minor Cleanup Dead Code Snapshotting (#57716) (#59569)
* Use consistent cluster state instead in state update
* Remove dead loop in tests
* Remove some dead exception ctors

Just three trivial/random things I found.
2020-07-14 23:13:14 +02:00
James Baiera 5f7e7e9410
[7.x] Data Stream Stats API (#58707) (#59566)
This API reports on statistics important for data streams, including the number of data
streams, the number of backing indices for those streams, the disk usage for each data
stream, and the maximum timestamp for each data stream
2020-07-14 16:57:46 -04:00
Mark Tozzi ed2c29f102
If no perBucketSample has been allocated for the parent bucket return a doc count of 0 (#59360) (#59567)
Co-authored-by: Fabio Corneti <info@corneti.com>
2020-07-14 16:56:29 -04:00
Armin Braun d456f7870a
Deduplicate Index Metadata in BlobStore (#50278) (#59514)
This PR introduces two new fields in to `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and uuids. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices` so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot.
This saves one write per index in the common case of unchanged metadata thus saving cost and making snapshot finalization drastically faster if many indices are being snapshotted at the same time.

The implementation is mostly analogous to that for shard generations in #46250 and piggy backs on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`).

Relates to #45736 as it improves the efficiency of snapshotting unchanged indices
Relates to #49800 as it has the potential of loading the index metadata for multiple snapshots of the same index concurrently much more efficient speeding up future concurrent snapshot delete
2020-07-14 22:18:42 +02:00
Tim Brooks 408a07f96a
Separate coordinating and primary bytes in stats (#59487)
Currently we combine coordinating and primary bytes into a single bucket
for indexing pressure stats. This makes sense for rejection logic.
However, for metrics it would be useful to separate them.
2020-07-14 12:37:06 -06:00
Tim Brooks a46e5e0f04
Increase default write queue size (#59464)
This commit increases the default write queue size to 10000. This is to
allow a greater number of pending indexing requests. This work is safe
as we have added additional memory limits. Relates to #59263.
2020-07-14 10:35:25 -06:00
Tim Brooks 1a24916fef
Enable replication retries on 7.9+ (#59546)
Currently the work to support replication retries is present on 7.9.
This commit enables these retries by setting the replication timeout to
60s.
2020-07-14 10:35:05 -06:00
Dan Hermann e54b4a729f
[7.x] Adds write_index_only option to put mapping API (#59539) 2020-07-14 10:34:08 -05:00
Luca Cavanna af2f85be15
Consolidate script parsing from object (7.x) (#59509)
The update by query action parses a script from an object (map or string). We will need to do the same for runtime fields as they are parsed as part of mappings (#59391).

This commit moves the existing parsing of a script from an object from RestUpdateByQueryAction to the Script class. It also adds tests and adjusts some error messages that are incorrect. Also, options were not parsed before and they are now. And unsupported fields trigger now a deprecation warning.
2020-07-14 17:08:29 +02:00
Mark Tozzi b357c1b77a
[7.x] Fix NPE when building exception messages for aggregations (#59156) (#59334) 2020-07-14 09:37:44 -04:00
Andrei Dan 7dcdaeae49
Default to @timestamp in composable template datastream definition (#59317) (#59516)
This makes the data_stream timestamp field specification optional when
defining a composable template.
When there isn't one specified it will default to `@timestamp`.

(cherry picked from commit 5609353c5d164e15a636c22019c9c17fa98aac30)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-14 12:36:54 +01:00
Andrei Dan 4180333bbc
[7.x] Composable templates: add a default mapping for @timestamp (#59244) (#59510)
This adds a low precendece mapping for the `@timestamp` field with
type `date`.
This will aid with the bootstrapping of data streams as a timestamp
mapping can be omitted when nanos precision is not needed.

(cherry picked from commit 4e72f43d62edfe52a934367ce9809b5efbcdb531)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-14 11:29:33 +01:00
Armin Braun 0e3d87ab54
Add Assertions on CS Application in Snapshot Logic (#58681) (#59511)
Relates to #58680. Bugs like that should not only show up in logs
but ideally also get caught in tests. We expect to never see exceptions
in these two spots.
2020-07-14 12:16:42 +02:00
Armin Braun 81e96954d0
Improve Efficiency of SnapshotsService CS Apply (#56874) (#59508)
This change removes the redundant submitting of two separate cluster state updates
for the node configuration changes and routing changes that affect snapshots.
Since we submitted the task to deal with node configuration changes every time on master
fail-over we could also move the BwC cleanup loop that removes `INIT` state snapshots as well
as snapshots that have all their shards completed into this cluster state update task.

Aside from improving efficiency overall this change has the fortunate side effect of moving
all snapshot finalization to the CS update thread. This is helpful for concurrent snapshots
since it makes it very natural and straight forward to order snapshot finalizations by exploiting
that they are all initiated on the same thread.
2020-07-14 11:49:09 +02:00
Tim Brooks 623df95a32
Adding indexing pressure stats to node stats API (#59467)
We have recently added internal metrics to monitor the amount of
indexing occurring on a node. These metrics introduce back pressure to
indexing when memory utilization is too high. This commit exposes these
stats through the node stats API.
2020-07-13 17:23:42 -06:00
Tim Brooks 68d56fa7db
Implement rejections in `WriteMemoryLimits` (#59451)
This commit adds rejections when the indexing memory limits are
exceeded for primary or coordinating operations. The amount of bytes
allow for indexing is controlled by a new setting
`indexing_limits.memory.limit`.
2020-07-13 14:34:50 -06:00
Mark Tozzi eb0b28dd1d
Move getPointReaderOrNull into AggregatorBase (#58769) (#59455) 2020-07-13 16:31:33 -04:00
Armin Braun 64c5f70a2d
Remove Needless Context Switches on Loading RepositoryData (#56935) (#59452)
We don't need to switch to the generic or snapshot pool for loading
cached repository data (i.e. most of the time in normal operation).

This makes `executeConsistentStateUpdate` less heavy if it has to retry
and lowers the chance of having to retry in the first place.
Also, this change allowed simplifying a few other spots in the codebase
where we would fork off to another pool just to load repository data.
2020-07-13 21:38:29 +02:00
Armin Braun bde92fc5fc
Remove Needless Context Switch From Snapshot Finalization (#56871) (#59443)
No need to do any switch to the `SNAPSHOT` pool here, the blob store
repo handles all its writes async on the `SNAPSHOT` pool so we're just
needlessly context-switching to enqueue those tasks there.
Also cleaned up the source only repository (the only override to `finalizeSnapshot`)
to make it clear that no IO is happening there and we don't need to run it on the
`SNAPSHOT` pool either.
2020-07-13 20:11:07 +02:00
Armin Braun 31be3a3645
More Efficient Snapshot State Handling (#56669) (#59430)
Follow up to #56365. Instead of redundantly checking snapshots for completion
over and over, just track the completed snapshots in the CS updates that complete
them instead of looping over the smae snapshot entries over and over.
Also, in the batched snapshot shard status updates, only check for completion
of a snapshot entry if it isn't already finalizing.
2020-07-13 18:58:04 +02:00
Christos Soulios 3868bcc7b8
[7.x] Histogram integration on Histogram field type (#59431)
Backports #58930 to 7.x
Implements histogram aggregation over histogram fields as requested in #53285.
2020-07-13 19:36:33 +03:00
Henning Andersen adf6083dd0
Enhance real memory circuit breaker with G1 GC (#58674) (#59394)
Using G1 GC, Elasticsearch can rarely trigger that heap usage goes above
the real memory circuit breaker limit and stays there for an extended
period. This situation will persist until the next young GC. The circuit
breaking itself hinders that from occurring in a timely manner since it
breaks all request before real work is done.

This commit gently nudges G1 to do a young GC and then double checks
that heap usage is still above the real memory circuit breaker limit
before throwing the circuit breaker exception.

Related to #57202
2020-07-13 17:41:09 +02:00
Martijn van Groningen b1b7bf3912
Make data streams a basic licensed feature. (#59392)
Backport of #59293 to 7.x branch.

* Create new data-stream xpack module.
* Move TimestampFieldMapper to the new module,
  this results in storing a composable index template
  with data stream definition only to work with default
  distribution. This way data streams can only be used
  with default distribution, since a data stream can
  currently only be created if a matching composable index
  template exists with a data stream definition.
* Renamed `_timestamp` meta field mapper
   to `_data_stream_timestamp` meta field mapper.
* Add logic to put composable index template api
  to fail if `_data_stream_timestamp` meta field mapper
  isn't registered. So that a more understandable
  error is returned when attempting to store a template
  with data stream definition via the oss distribution.

In a follow up the data stream transport and
rest actions can be moved to the xpack data-stream module.
2020-07-13 17:26:46 +02:00
Alan Woodward bd01fd107c Revert "Migrate CompletionFieldMapper to parametrized format (#59291)"
This reverts commit 19ba6c39d2.
2020-07-13 14:16:09 +01:00
Armin Braun 4e574a7136
Remove Dead Code from Closed Index Snapshot Logic (#56764) (#59398)
The code path for closed indices is dead code here ever since #39644
because `shards(currentState, indexIds, ...)` does not set
`MISSING` on a closed index's shard that is assigned any longer. Before that change it would always set `MISSING` for a closed index's shard even it was assigned.
=> simplified the code accordingly.
2020-07-13 14:49:16 +02:00
David Turner 3fb9dccc22 Fix FSHealthServiceTests on Windows (#59387)
In #52680 we introduced a new health check mechanism. This commit fixes
up some related test failures on Windows caused by erroneously assuming
that all paths begin with `/`.

Closes #59380
2020-07-13 12:43:45 +01:00
Alan Woodward 19ba6c39d2 Migrate CompletionFieldMapper to parametrized format (#59291)
This adds some optional extra configuration to Parameter:

* custom serialization (to handle analyzers)
* deprecated parameter names
* parameter validation
2020-07-13 12:43:15 +01:00
Armin Braun 08b54feaaf
Remove Snapshot INIT Step (#55918) (#59374)
With #55773 the snapshot INIT state step has become obsolete. We can set up the snapshot directly in one single step to simplify the state machine.

This is a big help for building concurrent snapshots because it allows us to establish a deterministic order of operations between snapshot create and delete operations since all of their entries now contain a repository generation. With this change simple queuing up of snapshot operations can and will be added in a follow-up.
2020-07-13 13:41:09 +02:00
Alan Woodward c810a4a12e Continue to accept unused 'universal' params in <8.0 indexes (#59381)
We have a number of parameters which are universally parsed by almost all
mappers, whether or not they make sense. Migrating the binary and boolean
mappers to the new style of declaring their parameters explicitly has meant
that these universal parameters stopped being accepted, which would break
existing mappings.

This commit adds some extra logic to ParametrizedFieldMapper that checks
for the existence of these universal parameters, and issues a warning on
7x indexes if it finds them. Indexes created in 8.0 and beyond will throw an
error.

Fixes #59359
2020-07-13 11:15:56 +01:00
David Kyle 7dcd943e1d Mute FsHealthServiceTests testFailsHealthOnIOException (#59382)
For #59380
2020-07-13 09:48:07 +01:00
Armin Braun 483386136d
Move all Snapshot Master Node Steps to SnapshotsService (#56365) (#59373)
This refactoring has three motivations:

1. Separate all master node steps during snapshot operations from all data node steps in code.
2. Set up next steps in concurrent repository operations and general improvements by centralizing tracking of each shard's state in the repository in `SnapshotsService` so that operations for each shard can be linearized efficiently (i.e. without having to inspect the full snapshot state for all shards on every cluster state update, allowing us to track more in memory and only fall back to inspecting the full CS on master failover like we do in the snapshot shards service).
    * This PR already contains some best effort examples of this, but obviously this could be way improved upon still (just did not want to do it in this PR for complexity reasons)
3. Make the `SnapshotsService` less expensive on the CS thread for large snapshots
2020-07-12 22:19:07 +02:00
Dan Hermann e01d73c737
[7.x] Data stream admin actions are now index-level actions 2020-07-10 14:36:18 -05:00
Stuart Tettemer 4c04fd1e05
Scripting: Unlimited compilation rate for ingest (#59268)
* `ingest` and `processor_conditional` default to unlimited compilation rate

Refs: #50152
2020-07-09 16:34:47 -05:00
Stuart Tettemer 94e213dd5f
Scripting: Per context stats in `script` in _nodes/stats (#59266)
Updated `_nodes/stats`:
 * Update `script` in `_node/stats` to include stats per context:

```
      "script": {
        "compilations": 1,
        "cache_evictions": 0,
        "compilation_limit_triggered": 0,
        "contexts":[
          {
            "context": "aggregation_selector",
            "compilations": 0,
            "cache_evictions": 0,
            "compilation_limit_triggered": 0
          },

```

Refs: #50152
Backport: #59625
2020-07-09 15:30:50 -05:00
Alan Woodward f4caadd239 MappedFieldType no longer requires equals/hashCode/clone (#59212)
With the removal of mapping types and the immutability of FieldTypeLookup in #58162, we no longer
have any cause to compare MappedFieldType instances. This means that we can remove all equals
and hashCode implementations, and in addition we no longer need the clone implementations which
were required for equals/hashcode testing. This greatly simplifies implementing new MappedFieldTypes,
which will be particularly useful for the runtime fields project.
2020-07-09 21:05:10 +01:00
Dan Hermann c26d2b5fa5
Data stream support for indices shard stores API 2020-07-09 13:11:45 -05:00
Nik Everett 28ef997953
Improve vwh's distant bucket handling (#59094) (#59248)
This modifies the `variable_width_histogram`'s distant bucket handling
to:
1. Properly handle integer overflows
2. Recalculate the average distance when new buckets are added on the
   ends. This should slow down the rate at which we build extra buckets
   as we build more of them.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-09 12:14:46 -04:00
Przemko Robakowski c870d6e570
[7.x] Restart tests with data streams (#58330) (#59303)
* Restart tests with data streams (#58330)
2020-07-09 17:52:20 +02:00
David Turner d56fc72ee5 Fix node health-check-related test failures (#59277)
In #52680 we introduced a new health check mechanism. This commit fixes
up some sporadic related test failures, and improves the behaviour of
the `FollowersChecker` slightly in the case that no retries are
configured.

Closes #59252
Closes #59172
2020-07-09 12:46:12 +01:00
David Turner c80a9e2ec2 Skip unnecessary directory iteration (#59007)
Today `NodeEnvironment#findAllShardIds` enumerates the index directories
in each data path in order to find one with a specific name. Since we
already know the name of the folder we seek we can construct the path
directly and avoid this directory listing. This commit does that.
2020-07-09 11:56:41 +01:00
Alan Woodward 67a27e2b9d Add declarative parameters to FieldMappers (#58663)
The FieldMapper infrastructure currently has a bunch of shared parameters, many of which
are only applicable to a subset of the 41 mapper implementations we ship with. Merging,
parsing and serialization of these parameters are spread around the class hierarchy, with
much repetitive boilerplate code required. It would be much easier to reason about these
things if we could declare the parameter set of each FieldMapper directly in the implementing
class, and share the parsing, merging and serialization logic instead.

This commit is a first effort at introducing a declarative parameter style. It adds a new FieldMapper
subclass, ParametrizedFieldMapper, and refactors two mappers, Boolean and Binary, to use it.
Parameters are declared on Builder classes, with the declaration including the parameter name,
whether or not it is updateable, a default value, how to parse it from mappings, and how to
extract it from another mapper at merge time. Builders have a getParameters method, which
returns a list of the declared parameters; this is then used for parsing, merging and serialization.
Merging is achieved by constructing a new Builder from the existing Mapper, and merging in
values from the merging Mapper; conflicts are all caught at this point, and if none exist then a new,
merged, Mapper can be built from the Builder. This allows all values on the Mapper to be final.

Other mappers can be gradually migrated to this new style, and once they have all been refactored
we can merge ParametrizedFieldMapper and FieldMapper entirely.
2020-07-09 11:43:21 +01:00
Ignacio Vera 1ad00d1ceb
Add Support in geo_match enrichment policy for any type of geometry (#59276)
geo_match enrichment works currently only with points. This change adds the ability to
use any type of geometry.
2020-07-09 11:41:41 +02:00
Nhat Nguyen 6a0f7411e2 Do not release safe commit with CancellableThreads (#59182)
We are leaking a FileChannel in #39585 if we release a safe commit with 
CancellableThreads. Although it is a bug in Lucene where we do not close
a FileChannel if we failed to create a NIOFSIndexInput, I think it's
safer if we release a safe commit using the generic thread pool instead.

Closes #39585
Relates #45409
2020-07-08 13:51:48 -04:00
Nhat Nguyen 00c859bfca Fix testSendSnapshotSendsOps
We need to use a concurrent collection to keep track of the shipped operations
as they can arrive concurrently since #58018.

Relates #58018
2020-07-08 12:25:33 -04:00
Martijn van Groningen 17bd559253
Fix the timestamp field of a data stream to @timestamp (#59210)
Backport of #59076 to 7.x branch.

The commit makes the following changes:
* The timestamp field of a data stream definition in a composable
  index template can only be set to '@timestamp'.
* Removed custom data stream timestamp field validation and reuse the validation from `TimestampFieldMapper` and
  instead only check that the _timestamp field mapping has been defined on a backing index of a data stream.
* Moved code that injects _timestamp meta field mapping from `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template58956(...)` method
  to `MetadataIndexTemplateService#collectMappings(...)` method.
* Fixed a bug (#58956) that cases timestamp field validation to be performed
  for each template and instead of the final mappings that is created.
* only apply _timestamp meta field if index is created as part of a data stream or data stream rollover,
this fixes a docs test, where a regular index creation matches (logs-*) with a template with a data stream definition.

Relates to #58642
Relates to #53100
Closes #58956
Closes #58583
2020-07-08 17:30:46 +02:00
Nik Everett a29d3515a2
Improve cardinality measure used to build aggs (#56533) (#59107)
This makes a `parentCardinality` available to every `Aggregator`'s ctor
so it can make intelligent choices about how it collects bucket values.
This replaces `collectsFromSingleBucket` and is similar to it but:
1. It supports `NONE`, `ONE`, and `MANY` values and is generally
   extensible if we decide we can use more precise counts.
2. It is more accurate. `collectsFromSingleBucket` assumed that all
   sub-aggregations live under multi-bucket aggregations. This is
   normally true but `parentCardinality` is properly carried forward
   for single bucket aggregations like `filter` and for multi-bucket
   aggregations configured in single-bucket for like `range` with a
   single range.

While I was touching every aggregation I renamed `doCreateInternal` to
`createMapped` because that seemed like a much better name and it was
right there, next to the change I was already making.

Relates to #56487

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-08 08:42:23 -04:00
Dan Hermann 90c8d3fc9d
IndexNameExpressionResolver::dataStreamNames should support exclusions 2020-07-08 07:35:52 -05:00
Armin Braun 9268b25789
Add Check for Metadata Existence in BlobStoreRepository (#59141) (#59216)
In order to ensure that we do not write a broken piece of `RepositoryData`
because the phyiscal repository generation was moved ahead more than one step
by erroneous concurrent writing to a repository we must check whether or not
the current assumed repository generation exists in the repository physically.
Without this check we run the risk of writing on top of stale cached repository data.

Relates #56911
2020-07-08 14:25:01 +02:00
Tim Brooks 3700bd1c08
Fix assertion in testCollectNodes test (#58948)
Currently we assert that the reason we fail collecting nodes in this
test is due to the fact that no seeds are available or no connections
could be established to cluster_2. However, the collection could fail if
we cannot establish connections to cluster_1. This commit adds that as
an acceptible assertion.
2020-07-07 21:37:10 -06:00
Nhat Nguyen ef5c397c0f
Sending operations concurrently in peer recovery (#58018)
Today, we send operations in phase2 of peer recoveries batch by batch
sequentially. Normally that's okay as we should have a fairly small of
operations in phase 2 due to the file-based threshold. However, if
phase1 takes a lot of time and we are actively indexing, then phase2 can
have a lot of operations to replay.

With this change, we will send multiple batches concurrently (defaults
to 1) to reduce the recovery time.

Backport of #58018
2020-07-07 22:03:31 -04:00
Lee Hinman b832fe30ab
[7.x] Validate Data Streams reference a template on composable template update (#59106) (#59193)
This commit adds validation that when a composable index template is updated, that the number
of unreferenced data streams does not increase. While it is still possible to have data streams
without a backing template (through snapshot restoration), this reduces the chance of getting
in to that scenario.

Relates to #53100
2020-07-07 15:38:27 -06:00
Tim Brooks b1c3ad8f59
Fix race in RecoveryRequestTrackerTests (#59187)
Currently in the recovery request tracker tests we place the futures
into the future map on the GENERIC thread. It is possible that the test
has already advanced past the point where we block on these futures
before they are placed in the map. This introduces other potential
failures as we expect all futures have been completed. This commit fixes
the test by places the futures in the map prior to dispatching.
2020-07-07 15:10:31 -06:00
Nik Everett d536854879 Fix test bug in auto_date_histo
The test would try to prepare a `Rounding` even when there aren't any
buckets. This would fail because there is no range over which to prepare
the rounding. It turns out that we don't need the rounding in that case
so we just use `null` then.

Closes #59131
2020-07-07 15:39:48 -04:00
Andrei Dan 24c6a30e2b
[7.9] GET data stream API returns additional information (#59128) (#59177)
* GET data stream API returns additional information (#59128)

This adds the data stream's index template, the configured ILM policy
(if any) and the health status of the data stream to the GET _data_stream
response.

Restoring a data stream from a snapshot could install a data stream that
doesn't match any composable templates. This also makes the `template`
field in the `GET _data_stream` response optional.

(cherry picked from commit 0d9c98a82353b088c782b6a04c44844e66137054)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-07 20:30:09 +01:00
Nhat Nguyen de6ac6aea6 Fix recovery stage transition with sync_id (#57754)
If the recovery source is on an old node (before 7.2), then the recovery
target won't have the safe commit after phase1 because the recovery
source does not send the global checkpoint in the clean_files step. And
if the recovery fails and retries, then the recovery stage won't
transition properly. If a sync_id is used in peer recovery, then the
clean_files step won't be executed to move the stage to TRANSLOG.

Relates ##7187
Closes #57708
2020-07-07 12:00:37 -04:00
Rene Groeschke a896df53ac
Remove misc dependency related deprecation warnings (7.x backport) (#59122)
* Fix dependency related deprecations (#58892)
* Fix classpath setup for forbiddenapi usage
2020-07-07 17:10:31 +02:00
Nik Everett eb169ae226
Fix lookup support in adjacency matrix (backport of #59099) (#59108)
This request:
```
POST /_search
{
  "aggs": {
    "a": {
      "adjacency_matrix": {
        "filters": {
          "1": {
            "terms": { "t": { "index": "lookup", "id": "1", "path": "t" } }
          }
        }
      }
    }
  }
}
```

Would fail with a 500 error and a message like:
```
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_state_exception",
        "reason":"async actions are left after rewrite"
      }
    ]
  }
}
```

This fixes that by moving the query rewrite phase from a synchronous
call on the data nodes into the standard aggregation rewrite phase which
can properly handle the asynchronous actions.
2020-07-07 10:28:20 -04:00
David Turner 46c8d00852
Remove nodes with read-only filesystems (#52680) (#59138)
Today we do not allow a node to start if its filesystem is readonly, but
it is possible for a filesystem to become readonly while the node is
running. We don't currently have any infrastructure in place to make
sure that Elasticsearch behaves well if this happens. A node that cannot
write to disk may be poisonous to the rest of the cluster.

With this commit we periodically verify that nodes' filesystems are
writable. If a node fails these writability checks then it is removed
from the cluster and prevented from re-joining until the checks start
passing again.

Closes #45286

Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com>
2020-07-07 14:00:02 +01:00
Francisco Fernández Castaño 1ced3f0eb3
Extract recovery files details to its own class (#59121)
Backport of #59039
2020-07-07 12:35:57 +02:00
Ignacio Vera 5cc6457ed8
upgrade to lucene-8.6.0-snapshot-6a715e2ecc3 (#59091) (#59120) 2020-07-07 12:07:41 +02:00
Armin Braun d6d6df16bb
Share IT Infrastructure between Core Snapshot and SLM ITs (#59082) (#59119)
For #58994 it would be useful to be able to share test infrastructure.
This PR shares `AbstractSnapshotIntegTestCase` for that purpose, dries up SLM tests
accordingly and adds a shared and efficient (compared to the previous implementations)
way of waiting for no running snapshot operations to the test infrastructure to dry things up further.
2020-07-07 12:04:41 +02:00
David Turner ef2f0d1f67 Inline no-op IndicesModule#getEngineFactories (#59051)
This method was introduced in #31183 but it has no effect and is never
overridden so this commit removes it.
2020-07-07 09:15:20 +01:00
Francisco Fernández Castaño 0752a86fe5
Enforce higher priority for RepositoriesService ClusterStateApplier (#59040)
* Enforce higher priority for RepositoriesService ClusterStateApplier

This avoids shards allocation failures when the repository instance
comes in the same ClusterState update as the shard allocation.

Backport of #58808
2020-07-07 09:51:08 +02:00
Howard 00ed31d000 Remove IndexShardRoutingTable#primaryAsList (#59044) 2020-07-07 07:34:32 +01:00
Nik Everett be13dea113 Drop a TODO from the terms aggregator (#59100)
We did it in #56487.
2020-07-06 17:46:06 -04:00
Nik Everett eff5f4d234
Add pipeline aggregations to the rewrite phase (backport #58878) (#59081)
This allows pipeline aggregations to participate in the up-front rewrite
phase for searches, in particular, it allows them to load data that they
need asynchronously.

Relates to #58193

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-06 15:13:45 -04:00
Nhat Nguyen e827d2ed92 Fix testRestoreLocalHistoryFromTranslogOnPromotion (#58745)
If the global checkpoint equals max_seq_no, then we won't reset an engine 
(as all operations are safe), and max_seqno_of_updates_or_deletes 
won't advance to max_seq_no.

Closes #58163
2020-07-06 12:19:45 -04:00
Andrei Dan 2d516d7bcc
[7.x] Search all (_all, *) resolves data streams too (#58869) (#59058)
Part of the original PR was merged by #59028

(cherry picked from commit 2598327726124d8a86333f79cdc45bf6a4297dbc)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-06 14:19:15 +01:00
Dan Hermann 550dcb0ca6
[7.x] Delete data stream API accepts multiple names (#59064) 2020-07-06 08:06:10 -05:00
Armin Braun 722d94688b
Fix MinimumMasterNodesIT Test (#59054) (#59057)
Tiny oversight in dee9e048bdcc5ba59f20d2554e989015463df05a caused
the `otherNodes` collection to incorrectly contain `master` here.
2020-07-06 13:00:15 +02:00
Armin Braun 62eabdac6e
Dry up Snapshot ITs further (#59035) (#59052)
Some more obvious cleaning up of the snapshot ITs.

follow up to #58818
2020-07-06 12:26:42 +02:00
Martijn van Groningen f0dd9b4ace
Add data stream timestamp validation via metadata field mapper (#59002)
Backport of #58582 to 7.x branch.

This commit adds a new metadata field mapper that validates,
that a document has exactly a single timestamp value in the data stream timestamp field and
that the timestamp field mapping only has `type`, `meta` or `format` attributes configured.
Other attributes can affect the guarantee that an index with this meta field mapper has a
useable timestamp field.

The MetadataCreateIndexService inserts a data stream timestamp field mapper whenever
a new backing index of a data stream is created.

Relates to #53100
2020-07-06 11:32:33 +02:00
Armin Braun 49857cc35d
Dry up Master Disconnect Disruption Tests (#58953) (#59050)
Dry up tests that use a disruption that isolates the master from all other nodes.
Also, turn disruption types that have neither parameters nor state into constants
to make things a little clearer.
2020-07-06 11:04:24 +02:00
Nhat Nguyen 62763b177d Implement toString for BulkByScrollTask (#59042)
We should implement "toString" of BulkByScrollTask.StatusOrException 
to have a meaningful log message when a reindex task completes.
2020-07-05 22:06:56 -04:00
Armin Braun 071d8b2c1c
Deduplicate Empty InternalAggregations (#58386) (#59032)
Working through a heap dump for an unrelated issue I found that we can easily rack up
tens of MBs of duplicate empty instances in some cases.
I moved to a static constructor to guard against that in all cases.
2020-07-04 14:02:16 +02:00
Dan Hermann 7c43cbca82
[7.x] Ignore matching data streams if include_data_streams is false (#59028) 2020-07-03 14:51:32 -05:00
Dan Hermann c1781bc7e7
[7.x] Add include_data_streams flag for authorization (#59008) 2020-07-03 12:58:39 -05:00
Dan Hermann 5e7746d3bd
[7.x] Mirror privileges over data streams to their backing indices (#58991) 2020-07-03 06:33:38 -05:00
Armin Braun d22dd437f1
Fix Two Common Zero Len Array Instantiations (#58944) (#58993)
Two spots I found in which we commonly instatiate a non-trivial number of zero length arrays.
2020-07-03 09:18:14 +02:00
Nhat Nguyen 65645217bc Handle IOException while checking translog corruption
We can hit an IOException while reading a translog header after corrupting it.

Relates #58866
2020-07-02 22:38:05 -04:00
Tim Brooks dc9e364ff2
Count coordinating and primary bytes as write bytes (#58984)
This is a follow-up to #57573. This commit combines coordinating and
primary bytes under the same "write" bucket. Double accounting is
prevented by only accounting the bytes at either the reroute phase or
the primary phase. TransportBulkAction calls execute directly, so the
operations handler is skipped and the bytes are not double accounted.
2020-07-02 19:48:19 -06:00
Mark Vieira 8fca312a3a
Mute WriteMemoryLimitsIT.testWriteBytesAreIncremented 2020-07-02 16:58:23 -07:00
Tim Brooks 9d1bf383d0
Add test assertions to ensure write bytes released (#58970)
This is a follow-up to #57573. This commit ensures that the bytes marked
in WriteMemoryLimits are released by any test using an internal test
cluster.
2020-07-02 17:38:23 -06:00
Tim Brooks 1ef2cd7f1a
Add memory tracking to queued write operations (#58957)
Currently we do not track the memory consuming by in-process write
operations.

This commit adds a mechanism to track write operation memory usage.
2020-07-02 14:14:57 -06:00
Jim Ferenczi a4e08acdd1 Fix exists query on unmapped field in query_string (#58804)
Since #55785, exists queries rewrite to MatchNoneQueryBuilder when the field is unmapped.
This change also introduced a bug in the `query_string` query, using an unmapped field
like `_exists_:foo` throws an exception if the field is unmapped. This commit avoids the
exception if the query is built outside of an `ExistsQueryBuilder`.

Closes #58737
2020-07-02 21:52:03 +02:00
Nhat Nguyen be804b765d
Avoid flipping translog header version (#58866)
An old translog header does not have a checksum. If we flip the header 
version of an empty translog to the older version, then we won't detect
that corruption, and translog will be considered clean as before.

Closes #58671
2020-07-02 14:34:19 -04:00
Tal Levy d516959774
Re-enable support for array-valued geo_shape fields. (#58786) (#58943)
A regression in the mapping code led to geo_shape no longer supporting
array-valued fields. This commit fixes this support and adds an integration
test to make sure this problem does not return!
2020-07-02 11:21:55 -07:00
Ryan Ernst d825d4352c
Eagerly compile condition script at processor creation (#58882)
Ingest script processors were changed to eagerly compile their scripts
when the ingest pipeline is saved, but conditional scripts were missed.
This commit adds eager compilation to ingest conditional scripts, which
will help surface errors before runtime, as well as adds tests for each
case we might encounter between inline and stored script compilation
failures.

closes #58864
2020-07-02 11:10:20 -07:00
Lee Hinman e32623ef52
[7.x] Add test for component templates updated after cluster restart (#58883) (#58914)
This commit adds an integration test that component templates used to form a composite template can
still be updated after a cluster restart.

In #58643 an issue arose where mappings were causing problems because of the way we unwrap `_doc` in
template mappings. This was also related to the mappings being merged manually rather than using the
`MapperService` to do the merging. #58643 was fixed in 7.9 and master with the #58521 change, since
mappings now are read and digested by the actual mapper service.

This test passes for 7.x and master, and I intend to open a separate PR including this test for
7.8.1 along with a bug fix for #58643. This test is to ensure we don't have any regression in the
future.
2020-07-02 08:23:34 -06:00
Armin Braun 62152852dc
Cleanup Duplication in Snapshot ITs (#58818) (#58915)
Just a few obvious static cleanups of duplication to push back against the ever increasing complexity of these tests.
2020-07-02 16:00:01 +02:00
Alan Woodward 0cd1dc3143 Percolator keyword fields should not store norms (#58899)
The refactoring in #57666 inadvertently enabled norms on two of the percolator subfields,
leading to an increase in memory usage. This commit disables norms on these fields again.
2020-07-02 13:59:28 +01:00
Nik Everett 5e49ee800e
Drop rewriting in date_histogram (backport of #57836) (#58875)
The `date_histogram` aggregation had an optimization where it'd rewrite
`time_zones` who's offset from UTC is fixed across the entire index.
This rewrite is no longer needed after #56371 because we can tell that a
time zone is fixed lower down in the aggregation. So this removes it.
2020-07-01 17:19:12 -04:00
Dan Hermann 98a62a6b2d
Make DataStream instances explicitly immutable (#58688) (#58839) 2020-07-01 11:14:01 -05:00
Lee Hinman d3d03fc1c6
[7.x] Add default composable templates for new indexing strategy (#57629) (#58757)
Backports the following commits to 7.x:

    Add default composable templates for new indexing strategy (#57629)
2020-07-01 09:32:32 -06:00
Andrei Dan f7dc09340b
Prohibit custom _routing for index requests targetting a data stream (#58749) (#58831)
This prohibits the use of a custom _routing when the index/bulk requests are targetting a data stream.
Using a custom _routing when targetting a backing index is still permitted.

Relates to #53100

(cherry picked from commit ece6b7a318a8bd3a010499189f31fc5e3a012d4f)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-01 14:54:18 +01:00
Alan Woodward 3ba16e0f39
Move MappedFieldType#getSearchAnalyzer and #getSearchQuoteAnalyzer to TextSearchInfo (#58830)
Analyzers are specific to text searching, and so should be in TextSearchInfo rather than on
the generic MappedFieldType.

Backport of #58639
2020-07-01 14:52:14 +01:00
Przemyslaw Gomulka 2c275913b9
[7.x] Week based parsing for ingest date processor (#58597) (#58802)
Date processor was incorrectly parsing week based dates because when a
weekbased year was provided ingest module was thinking year was not
on a date and was trying to applying the logic for dd/MM type of
dates.
Date Processor is also allowing users to specify locale parameter. It
should be taken into account when parsing dates - currently only used
for formatting. If someone specifies 'en-us' locale, then calendar data
rules for that locale should be used.
The exception is iso8601 format. If someone is using that format,
then locale should not override calendar data rules.
closes #58479
2020-07-01 15:15:56 +02:00
David Turner 822b7421ce Forbid read-only-allow-delete block in blocks API (#58727)
The read-only-allow-delete block is not really under the user's control
since Elasticsearch adds/removes it automatically. This commit removes
support for it from the new API for adding blocks to indices that was
introduced in #58094.
2020-07-01 13:18:26 +01:00
Martijn van Groningen a0df96befb
Add data stream support to put mapping and update index settings APIs. (#58758)
Backport of #58231 to 7.x branch.

Change update index setting and put mapping api
to execute on all backing indices if data stream is targeted.

Relates #53100
2020-07-01 13:32:21 +02:00
Yannick Welsch 15c85b29fd
Account for recovery throttling when restoring snapshot (#58658) (#58811)
Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account
(i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository
setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a
per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to
configure throttling in a single place.

The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to
`40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`). This means that no behavioral change
will be observed by clusters where the recovery and restore settings were not adapted.

Relates https://github.com/elastic/elasticsearch/issues/57023

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-07-01 12:19:29 +02:00
David Turner 3a234d2669
Account for remaining recovery in disk allocator (#58800)
Today the disk-based shard allocator accounts for incoming shards by
subtracting the estimated size of the incoming shard from the free space on the
node. This is an overly conservative estimate if the incoming shard has almost
finished its recovery since in that case it is already consuming most of the
disk space it needs.

This change adds to the shard stats a measure of how much larger each store is
expected to grow, computed from the ongoing recovery, and uses this to account
for the disk usage of incoming shards more accurately.

Backport of #58029 to 7.x

* Picky picky

* Missing type
2020-07-01 10:12:44 +01:00
Dan Hermann 1c2a726731
Data stream support for search shards API (#58486) (#58765) 2020-06-30 17:59:51 -05:00
Nik Everett 40850a780d
Fail variable_width_histogram that collects from many (#58619) (#58780)
Adds an explicit check to `variable_width_histogram` to stop it from
trying to collect from many buckets because it can't. I tried to make it
do so but that is more than an afternoon's project, sadly. So for now we
just disallow it.

Relates to #42035
2020-06-30 18:26:45 -04:00
Dan Hermann cae49b0fd7
[7.x] Add data stream support to open index API (#58767) 2020-06-30 14:30:32 -05:00
Dan Hermann a84ff81743
Data stream support for get field mappings API (#58488) (#58766) 2020-06-30 13:45:04 -05:00
Martijn van Groningen adcef93a6c
Introduce new put mapping action for dynamic mapping updates. (#58746)
Backport of #58419

Mapping updates that originate from indexing a document with unmapped fields will use this new action
instead of the current put mapping action. This way on the security side, authorization logic
can easily determine whether a mapping update is automatically generated or a mapping update originates
from the put mapping api.

The new auto put mapping action is only used if all nodes are on the version that supports it.
2020-06-30 18:02:31 +02:00
Boice Huang 8c93f4e154 Sort document by internal doc id in FetchPhase to better use LRU cache (#57273)
This change sorts the docIdsToLoad once instead of in each sub-phase.
2020-06-30 17:06:09 +02:00
Julie Tibshirani ab65a57d70
Merge mappings for composable index templates (#58709)
This PR implements recursive mapping merging for composable index templates.

When creating an index, we perform the following:
* Add each component template mapping in order, merging each one in after the
last.
* Merge in the index template mappings (if present).
* Merge in the mappings on the index request itself (if present).

Some principles:
* All 'structural' changes are disallowed (but everything else is fine). An
object mapper can never be changed between `type: object` and `type: nested`. A
field mapper can never be changed to an object mapper, and vice versa.
* Generally, each section is merged recursively. This includes `object`
mappings, as well as root options like `dynamic_templates` and `meta`. Once we
reach 'leaf components' like field definitions, they always overwrite an
existing one instead of being merged.

Relates to #53101.
2020-06-30 08:01:37 -07:00
Rene Groeschke d952b101e6
Replace compile configuration usage with api (7.x backport) (#58721)
* Replace compile configuration usage with api (#58451)

- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
  - required as java library will by default not have build jar file
  - jar file is now explicit input of the task and gradle will ensure its properly build

* Fix compile usages in 7.x branch
2020-06-30 15:57:41 +02:00
Armin Braun b52a764143
Fix NPE in SnapshotService CS Application (#58680) (#58735)
In the unlikely corner case of deleting a relocation (hence `WAITING`) primary shard's
index during a partial snapshot, we would throw an NPE when checking if there's any external
changes to process.
2020-06-30 15:20:49 +02:00
Yannick Welsch b885cbff1a
Add index block api (#58716)
Adds an API for putting an index block in place, which also ensures for write blocks that, once successfully returning to
the user, all shards of the index are properly accounting for the block, for example that all in-flight writes to an index have
been completed after adding the write block.

This API allows coordinating more complex workflows, where it is crucial that an index is no longer receiving writes after
the API completes, useful for example when marking an index as read-only during an upgrade in order to reindex its
documents.
2020-06-30 14:06:52 +02:00
Patrick Jiang(白泽) be20aacec3 Add `matchBoolPrefix` method to QueryBuilders (#58637) 2020-06-29 16:30:40 +02:00
Armin Braun 95d85f29f8
Fix Snapshots Capturing Incomplete Datastreams (#58630) (#58656)
Only snapshot datastreams that are recorded in `SnapshotInfo` and clean those
that aren't from the snapshotted metadata.
Do not restore all datastreams by default when restoring global metadata, use the same
mechanics used for indices here.

Closes #58544
2020-06-29 12:51:40 +02:00
Armin Braun 4f2f257b12
Fix DataStream Handling on Restore of Global Metadata (#58631) (#58649)
When restoring a global metadata snapshot we were overwriting the correctly
adjusted data streams in the metadata when looping over all custom values.

Closes #58496
2020-06-29 10:58:41 +02:00
Yang Wang 61fa7f4d22
Change privilege of enrich stats API to monitor (#52027) (#52196)
The remote_monitoring_user user needs to access the enrich stats API.
But the request is denied because the API is categorized under admin.
The correct privilege should be monitor.
2020-06-29 10:25:33 +10:00
Ryan Ernst 08e75abd4e
Always add Java-9 style file permissions (#46050) (#58628)
Java 9 removed pathname canonicalization, which means that we need to
add permissions for the path and also the real path when adding file
permissions. Since master requires a minimum runtime of JDK 11, we no
longer need conditional logic here to apply this pathname
canonicalization with our bares hands. This commit removes that
conditional pathname canonicalization.

Co-authored-by: Jason Tedor <jason@tedor.me>
2020-06-26 18:19:07 -07:00
Nik Everett 67e9d39932
Remove useless aggregation helper (#58571) (#58578)
`descendsFromBucketAggregator` was important before we removed
`asMultiBucketAggregator` but now that it is gone
`collectsFromSingleBucket` is good enough.

Relates to #56487

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-06-26 15:58:44 -04:00
Tanguy Leroux 775fb5d4cf
Allows SparseFileTracker to progressively execute listeners during Gap processing (#58477) (#58584)
Today SparseFileTracker allows to wait for a range to become available
before executing a given listener. In the case of searchable snapshot,
we'd like to be able to wait for a large range to be filled (ie, downloaded
and written to disk) while being able to execute the listener as soon as
a smaller range is available.

This pull request is an extract from #58164 which introduces a
ProgressListenableActionFuture that is used internally by
 SparseFileTracker. The progressive listenable future allows to register
listeners attached to SparseFileTracker.Gap so that they are executed
once the Gap is completed (with success or failure) or as soon as the
Gap progress reaches a given progress value. This progress value is
defined when the tracker.waitForRange() method is called; this method
has been modified to accept a range and another listener's range to
operate on.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-06-26 18:26:20 +02:00
Armin Braun 090211f768
Fix Incorrect Snapshot Shar Status for DONE Shards in Running Snapshots (#58390) (#58593)
Minor bugs/inconsistencies:

If a shard hasn't changed at all we were reporting `0` for total size and total file count
while it was ongoing.

If a data node restarts/drops out during snapshot creation the fallback logic did not load the correct statistic from the repository but just created a status with `0` counts from the snapshot state in the CS. Added a fallback to reading from the repository in this case.
2020-06-26 16:11:30 +02:00
Howard eaa60b7c54 [Docs] Fix return tuple element order (#58463) 2020-06-26 12:24:54 +02:00
Nik Everett 5f52bc4c9f
Fix two scripted_metric bugs (backport of #58547) (#58565)
Fixes two bugs introduced by #57627:
1. We were not properly letting go of memory from the request breaker
   when the aggregation finished.
2. We no longer supported totally arbitrary stuff produced by the init
   script because we *assumed* that it'd be ok to run the script once
   and clone its results. Sadly, cloning can't clone *anything* that the
   init script can make, like `String` arrays. This runs the init script
   once for every new bucket so we don't need to clone.
2020-06-25 16:16:10 -04:00
Armin Braun 468e559ff7
Fix Memory Leak From Master Failover During Snapshot (#58511) (#58560)
If we failed over while the data nodes were doing their work
we would never resolve the listener and leak it.
This change fails all listeners if master fails over.
2020-06-25 20:43:08 +02:00
Henning Andersen 38be2812b1
Enhance extensible plugin (#58542)
Rather than let ExtensiblePlugins know extending plugins' classloaders,
we now pass along an explicit ExtensionLoader that loads the extensions
asked for. Extensions constructed that way can optionally receive their
own Plugin instance in the constructor.
2020-06-25 20:37:56 +02:00
Jason Tedor 52ad5842a9
Introduce node.roles setting (#58512)
Today we have individual settings for configuring node roles such as
node.data and node.master. Additionally, roles are pluggable and we have
used this to introduce roles such as node.ml and node.voting_only. As
the number of roles is growing, managing these becomes harder for the
user. For example, to create a master-only node, today a user has to
configure:
 - node.data: false
 - node.ingest: false
 - node.remote_cluster_client: false
 - node.ml: false

at a minimum if they are relying on defaults, but also add:
 - node.master: true
 - node.transform: false
 - node.voting_only: false

If they want to be explicit. This is also challenging in cases where a
user wants to have configure a coordinating-only node which requires
disabling all roles, a list which we are adding to, requiring the user
to keep checking whether a node has acquired any of these roles.

This commit addresses this by adding a list setting node.roles for which
a user has explicit control over the list of roles that a node has. If
the setting is configured, the node has exactly the roles in the list,
and not any additional roles. This means to configure a master-only
node, the setting is merely 'node.roles: [master]', and to configure a
coordinating-only node, the setting is merely: 'node.roles: []'.

With this change we deprecate the existing 'node.*' settings such as
'node.data'.
2020-06-25 14:14:51 -04:00
Igor Motov 20af856abd
[7.x] EQL: Adds an ability to execute an asynchronous EQL search (#58192)
Adds async support to EQL searches

Closes #49638

Co-authored-by: James Rodewig james.rodewig@elastic.co
2020-06-25 14:11:57 -04:00
Jim Ferenczi 6451187e84 Filter empty fields in SearchHit#toXContent (#58418)
This commit restores the filtering of empty fields during the
xcontent serialization of SearchHit. The filtering was removed
unintentionally in #41656.
2020-06-25 17:49:03 +02:00
Nik Everett 03e6d1b535
Add Variable Width Histogram Aggregation (backport of #42035) (#58440)
Implements a new histogram aggregation called `variable_width_histogram` which
dynamically determines bucket intervals based on document groupings. These
groups are determined by running a one-pass clustering algorithm on each shard
and then reducing each shard's clusters using an agglomerative
clustering algorithm.

This PR addresses #9572.

The shard-level clustering is done in one pass to minimize memory overhead. The
algorithm was lightly inspired by
[this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches
a small number of documents to sample the data and determine initial clusters.
Subsequent documents are then placed into one of these clusters, or a new one
if they are an outlier. This algorithm is described in more details in the
aggregation's docs.

At reduce time, a
[hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering)
algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304)
continually merges the closest buckets from all shards (based on their
centroids) until the target number of buckets is reached.

The final values produced by this aggregation are approximate. Each bucket's
min value is used as its key in the histogram. Furthermore, buckets are merged
based on their centroids and not their bounds. So it is possible that adjacent
buckets will overlap after reduction. Because each bucket's key is its min,
this overlap is not shown in the final histogram. However, when such overlap
occurs, we set the key of the bucket with the larger centroid to the midpoint
between its minimum and the smaller bucket’s maximum:
`min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to
increases the accuracy of the clustering.

Nodes are unable to share centroids during the shard-level clustering phase. In
the future, resolving https://github.com/elastic/elasticsearch/issues/50863
would let us solve this issue.

It doesn’t make sense for this aggregation to support the `min_doc_count`
parameter, since clusters are determined dynamically. The `order` parameter is
not supported here to keep this large PR from becoming too complex.

Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>
2020-06-25 11:40:47 -04:00
Nik Everett c7726cc93e Fix janky test
Fixes a test that incorrectly assumed that a list of random values less
than or equal to `n` always contained `n`. Oops.

Closes #58353
2020-06-25 11:13:29 -04:00
Nik Everett 71adade73a
Return clear error message if aggregation type is invalid (#58255) (#58365)
The main changes are:

1. Catch the `NamedObjectNotFoundException` when parsing aggregation
   type, and then throw a `ParsingException` with clear error message with hint.
2. Add a unit test method: AggregatorFactoriesTests#testInvalidType().

Closes #58146.

Co-authored-by: bellengao <gbl_long@163.com>
2020-06-25 11:08:25 -04:00
David Roberts 1742b1c39e Cancel persistent task recheck when no longer master (#58539)
If a persistent task cannot be assigned on the first attempt
then the master node will schedule periodic rechecks to see
if the assignment requirements have been met.

These periodic rechecks should be cancelled if the node ceases
to be master.  Previously they weren't, leading to exceptions
being logged repeatedly.  This PR cancels the rechecks on
learning that the node is no longer the master.

Fixes #58531
2020-06-25 15:51:57 +01:00
Nik Everett 335505c4e1
Drop deprecated aggregator wrapper (backport of #58367) (#58448)
This drops the deprecated and now unused `asMultiBucketAggregator`. It
was too easy to use it to make inefficient `Aggregators`.

Relates to #56487
2020-06-25 09:31:19 -04:00
Julie Tibshirani 1f2e05c947
Simplify mapping validation for resizing indices. (#58514)
When creating a target index from a source index, we don't allow for target
mappings to be specified. This PR simplifies the check that the target mappings
are empty.

This refactor will help when implementing composable template merging, since we
no longer need to resolve + check the target mappings when creating an index
from a template.
2020-06-24 14:07:19 -07:00
Armin Braun 9e4c5d1dde
Cleaner Handling of Snapshot Related null Custom Values in CS (#58382) (#58501)
Add the ability to get a custom value while specifying a default and use it throughout the
codebase to get rid of the `null` edge case and shorten the code a little.
2020-06-24 17:24:44 +02:00
Benjamin Trent fa88e71532
[ML] unify usages of _all and wildcard <*> (#58460) (#58494) 2020-06-24 09:47:57 -04:00
markharwood d5ac3bb87f
Field capabilities - make `keyword` a family of field types (#58315) (#58483)
Introduces a new method on `MappedFieldType` to return a family type name which defaults to the field type.
Changes `wildcard` and `constant_keyword` field types to return `keyword` for field capabilities.

Relates to #53175
2020-06-24 12:32:14 +01:00
Jim Ferenczi ec8d5ec79c Fix handling of terminate_after when size is 0 (#58212)
`terminate_after` is ignored on search requests that don't return top hits (`size` set to 0)
and do not tracked the number of hits accurately (`track_total_hits`).
We use early termination when the number of hits to track is reached during collection
but this breaks the hard termination of `terminate_after` if it happens before we reached
the `terminate_after` value.
This change ensures that we continue to check `terminate_after` even if the tracking of total
hits has reached the provided value.

Closes #57624
2020-06-24 13:16:11 +02:00
David Turner 796cb9e9ca Reword INDEX_READ_ONLY_ALLOW_DELETE_BLOCK message (#58410)
Users are perennially confused by the message they get when writing to
an index is blocked due to excessive disk usage:

    TOO_MANY_REQUESTS/12/index read-only / allow delete (api)

Of course this is technically accurate but it is hard to join the dots
from this message to "your disk was too full" without some searching of
forums and documentation. Additionally in #50166 we changed the status
code to today's `429` from the previous `403` which changed the message
from the one that's widely documented elsewhere:

    FORBIDDEN/12/index read-only / allow delete (api)

Since #42559 we've considered this block to be under the sole control of
the disk-based shard allocator, and we have seen no evidence to suggest
that anyone is applying this block manually. Therefore this commit
adjusts this block's message to indicate that it's caused by a lack of
disk space.
2020-06-24 10:22:11 +01:00