Commit Graph

5340 Commits

Author SHA1 Message Date
Ignacio Vera db183c89ed
Refactor HyperLogLogPlusPlus to separate algorithms and internal data representation (#60104) (#60109) 2020-07-23 15:07:05 +02:00
David Turner bf7e53a91e Remove node-level canAllocate override (#59389)
Today there is a node-level `canAllocate` override which the balancer
uses to ignore certain nodes to which it is certain no more shards can
be allocated. In fact this override only ignores nodes which have hit
the rarely-used `cluster.routing.allocation.total_shards_per_node`
limit, so this optimization doesn't have a meaningful impact on real
clusters.

This commit removes this unnecessary fast path from the balancer, and
also removes all the machinery needed to support it.
2020-07-23 08:48:59 +01:00
Armin Braun 43a6ff5eb1
Optimize some Spots around Closing Resources (#60049) (#60096)
The single element `close` calls go through a very inefficient path that includes creating
a one element list.
`releaseOnce` is only called with a single non-null input in production in two spots so no need for
varargs and any complexity here.
`ReleasableBytesStreamOutput` does not require any `releaseOnce` wrapping because we already have
that kind of logic implemented in `org.elasticsearch.common.util.AbstractArray` (which we were
wrapping here).
2020-07-23 08:49:06 +02:00
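The single-input `releaseOnce` idea from the commit above can be sketched as follows. This is a minimal illustration, not the actual Elasticsearch code: `SimpleReleasable` and `ReleaseOnceSketch` are hypothetical names, and the use of an `AtomicReference` is an assumption about one reasonable way to guarantee a single release.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a releasable resource, not the real Elasticsearch interface.
interface SimpleReleasable {
    void close();
}

final class ReleaseOnceSketch {
    // Wraps a single non-null resource so that only the first close() reaches it.
    // No varargs and no intermediate one-element list are needed.
    static SimpleReleasable releaseOnce(SimpleReleasable delegate) {
        AtomicReference<SimpleReleasable> ref = new AtomicReference<>(delegate);
        return () -> {
            SimpleReleasable toClose = ref.getAndSet(null);
            if (toClose != null) {
                toClose.close();
            }
        };
    }
}
```

Calling `close()` on the returned wrapper any number of times closes the delegate exactly once.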
Julie Tibshirani aa57bbd422
Consolidate validation for 'docvalue_fields'. (#60065)
This improves modularity and also fixes some issues when `docvalue_fields` is
used within `inner_hits` or the `top_hits` agg:
* We previously didn't resolve wildcards in field names.
* We also forgot to enforce the limit `index.max_docvalue_fields_search`.
2020-07-22 17:26:58 -07:00
Armin Braun ebb6677815
Formalize and Streamline Buffer Sizes used by Repositories (#59771) (#60051)
Due to complicated access checks (reads and writes execute in their own access context) on some repositories (GCS, Azure, HDFS), using a hard coded buffer size of 4k for restores was needlessly inefficient.
By the same token, the use of stream copying with the default 8k buffer size for blob writes was inefficient as well.

We also had dedicated, undocumented buffer size settings for HDFS and FS repositories. For these two we would use a 100k buffer by default. We did not have such a setting for e.g. GCS though, which would only use an 8k read buffer which is needlessly small for reading from a raw `URLConnection`.

This commit adds an undocumented setting that sets the default buffer size to `128k` for all repositories. It removes wasteful allocation of such a large buffer for small writes and reads in case of HDFS and FS repositories (i.e. still using the smaller buffer to write metadata) but uses a large buffer for doing restores and uploading segment blobs.

This should speed up Azure and GCS restores and snapshots in a non-trivial way as well as save some memory when reading small blobs on FS and HDFS repositories.
2020-07-22 21:06:31 +02:00
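The buffer sizing described in the commit above might look roughly like this. It is a sketch under stated assumptions: `BufferSizeSketch`, `bufferSizeFor`, and `copy` are hypothetical names, and capping the buffer at the blob length is one possible way to avoid allocating a large buffer for small metadata blobs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class BufferSizeSketch {
    // The 128k default mentioned in the commit message.
    static final int DEFAULT_BUFFER_SIZE = 128 * 1024;

    // Small blobs (e.g. metadata) get a buffer no larger than the blob itself;
    // large blobs (e.g. segment data) get the full default buffer.
    static int bufferSizeFor(long blobLength) {
        return (int) Math.max(1, Math.min(blobLength, DEFAULT_BUFFER_SIZE));
    }

    // Copies the stream using a buffer sized for the blob and returns the bytes copied.
    static long copy(InputStream in, OutputStream out, long blobLength) throws IOException {
        byte[] buffer = new byte[bufferSizeFor(blobLength)];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```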
Tim Brooks ba01540d7e
Implement human readable indexing pressure stats (#60058)
The indexing pressure stats do not currently have human readable
variants. This commit adds human readable variants and updates the
documentation.
2020-07-22 12:07:59 -06:00
Jay Modi c8ef2e18f7
Thread safe clean up of LocalNodeMasterListeners (#60007)
This commit continues on the work in #59801 and makes other
implementors of the LocalNodeMasterListener interface thread safe in
that they will no longer allow the callbacks to run on different
threads and possibly race each other. This also helps address other
issues where these events could be queued to wait for execution while
the service keeps moving forward thinking it is the master even when
that is not the case.

In order to accomplish this, the LocalNodeMasterListener no longer has
the executorName() method to prevent future uses that could encounter
this surprising behavior.

Each use was inspected and if the class was also a
ClusterStateListener, the implementation of LocalNodeMasterListener
was removed in favor of a single listener that combined the logic. A
single listener is used and there is currently no guarantee on execution
order between ClusterStateListeners and LocalNodeMasterListeners,
so a future change there could cause undesired consequences. For other
classes, the implementations of the callbacks were inspected and if the
operations were lightweight, the overridden executorName method was
removed to use the default, which runs on the same thread.

Backport of #59932
2020-07-22 08:02:18 -06:00
Luca Cavanna 702c997819 ParametrizedFieldMapper to run validators against default value (#60042)
Sometimes there is the need to make a field required in the mappings, and validate that a value has been provided for it. This can be done through a validator when using ParametrizedFieldMapper, but validators also need to run when a value for a field has not been specified.

Relates to #59332
2020-07-22 14:12:38 +02:00
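To illustrate why running validators against the default value makes "required" parameters possible, here is a small sketch. `ParameterSketch` and its methods are hypothetical and only loosely modeled on the idea in the commit above; they are not the real `ParametrizedFieldMapper.Parameter` API.

```java
import java.util.function.Consumer;

final class ParameterSketch<T> {
    private final T defaultValue;
    private final Consumer<T> validator;
    private T value;
    private boolean isSet;

    ParameterSketch(T defaultValue, Consumer<T> validator) {
        this.defaultValue = defaultValue;
        this.validator = validator;
    }

    void setValue(T value) {
        this.value = value;
        this.isSet = true;
    }

    // Runs the validator on the configured value, or on the default
    // when nothing was configured, so a missing value is still checked.
    T getValidated() {
        T effective = isSet ? value : defaultValue;
        validator.accept(effective);
        return effective;
    }

    // A validator that rejects a missing (null) value.
    static Consumer<String> required(String name) {
        return v -> {
            if (v == null) {
                throw new IllegalArgumentException("Field [" + name + "] is required");
            }
        };
    }
}
```

With a `null` default and the `required` validator, an unset parameter fails validation instead of silently passing.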
Armin Braun c06c9fb966
Fix BwC Snapshot INIT Path (#60006)
There were two subtle bugs here from backporting #56911 to 7.x.

1. We passed `null` for the `shards` map which isn't nullable any longer
when creating `SnapshotsInProgress.Entry`, fixed by just passing an empty map
like the `null` handling did in the past.
2. The removal of a failed `INIT` state snapshot from the cluster state tried
removing it from the finalization loop (the set of repository names that are
currently finalizing). This will trip an assertion since the snapshot failed
before its repository was put into the set. I made the logic ignore the set
in case we remove a failed `INIT` state snapshot to restore the old logic to
exactly as it was before the concurrent snapshots backport to be on the safe
side here.

Also, added tests that explicitly call the old code paths because as can be seen
from initially missing this, the BwC tests will only run in the configuration new
version master, old version nodes every so often and having a deterministic test
for the old state machine seems the safest bet here.

Closes #59986
2020-07-22 10:09:55 +02:00
Jake Landis 55216dabb4
[7.x] Per processor description for verbose simulate (#58207) (#60008)
For ingest node processors a per processor description
was recently added. This commit displays that description
in the verbose output of the pipeline simulation.

related #57906
2020-07-21 17:32:45 -05:00
Nik Everett 49f365ddfd
Fix bug in deep pipeline agg serialization (#59984)
In #54716 I removed pipeline aggregators from the aggregation result
tree and caused us to read them from the request. This saves a bunch of
round trip bytes, which is neat. But there was a bug in the backwards
compatibility logic. You see, we still have to give the pipeline
aggregations to nodes older than 7.8 over the wire because that is how
they know what pipelines to run. They have the pipelines in the request
but they don't read them. They use the ones in the response tree.

Anyway, we had a bug where we were never sending pipelines defined two
levels down. So while you are upgrading the pipeline wouldn't run.
Sometimes. If the data node of the "first" result was post-7.8 and the
coordinating node was pre-7.8.

This fixes the bug.
2020-07-21 16:03:15 -04:00
David Turner dde568caf7 Fix scheduling of ClusterInfoService#refresh (#59880)
Today the `InternalClusterInfoService` uses the
`LocalNodeMasterListener` interface to start/stop its operations. Since
the `onMaster` and `offMaster` methods are called on the `MANAGEMENT`
threadpool, there's no guarantee that they run in the correct sequence,
which could result in an elected master failing to regularly update the
cluster info.

Since this service is also a `ClusterStateListener` we may as well drop
the usage of the `LocalNodeMasterListener` interface and simply update
the status of the local node on the applier thread in `clusterChanged`
to ensure consistency.

Additionally, today the `InternalClusterInfoService` uses a simple flag
to track whether the local node is the elected master or not. If the
node stops being the master and then starts again within a few seconds
then the scheduled updates from the old mastership might carry on
running in addition to the ones for the new mastership.

This commit addresses that by tracking the identity of the scheduled
update job and creating a new job for each mastership.
2020-07-21 17:14:49 +01:00
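The "identity of the scheduled update job" approach from the commit above can be sketched like this. The names `RefreshSchedulerSketch` and `RefreshJob` are hypothetical, and the real service does considerably more; the sketch only shows how a job from an old mastership can detect that it is stale.

```java
final class RefreshSchedulerSketch {
    // One instance per mastership; object identity distinguishes old jobs from the current one.
    final class RefreshJob {
        // A job only runs (and reschedules itself) while it is still the current job.
        boolean shouldRun() {
            return this == currentJob;
        }
    }

    // null while the local node is not the elected master
    private volatile RefreshJob currentJob;

    RefreshJob onBecomeMaster() {
        RefreshJob job = new RefreshJob();
        currentJob = job;
        return job;
    }

    void onStopBeingMaster() {
        currentJob = null;
    }
}
```

With a plain boolean flag, a job scheduled during the first mastership would see "is master" again after a quick re-election and keep running alongside the new job; comparing identities avoids that.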
Alan Woodward a0ad1a196b Wrap up building parametrized TypeParsers (#59977)
The TypeParser implementations of all ParametrizedFieldMapper descendant classes are
essentially the same - stateless, requiring the construction of a Builder object, and calling
parse on it before returning it. We can make this easier (and less error-prone) to
implement by wrapping the logic up into a final class, which takes a function to produce
the Builder from a name and parser context.
2020-07-21 16:00:11 +01:00
Nik Everett 6f6076e208
Drop some params from IndexFieldData.Builder (backport of #59934) (#59972)
We never used the `IndexSettings` parameter and we only used the
`MappedFieldType` parameter to get the name of the field which we
already know everywhere where we build the `IFD.Builder`. This allows us
to drop a fair bit of ceremony from a couple of tests.
2020-07-21 10:28:59 -04:00
Luca Cavanna 5e17f00ecf Tweak toXContent implementation of ParametrizedFieldMapper (#59968)
ParametrizedFieldMapper overrides `toXContent` from `FieldMapper`, yet it could override `doXContentBody` and rely on the `toXContent` from the base class. Additionally, this allows making `doXContentBody` final. `toXContent` is still overridden, but only to make it final.
2020-07-21 16:01:51 +02:00
Przemyslaw Gomulka 19fe3e511f
Deprecate camel case date format backport(#59555) (#59948)
Camel case date formats are deprecated and snake case should be used
instead.
backports #59555
2020-07-21 15:56:44 +02:00
Armin Braun e37bfe8a5f
Stop Checking if Segment Data Blob Exists before Write (#59905) (#59971)
With uuid named segment data blobs there is no reason to ensure no overwrites are happening
for these blobs when writing. On the contrary, at least on Azure this check can conflict with
the SDK's retrying and cause upload failures randomly.
2020-07-21 15:23:42 +02:00
Yannick Welsch 07784a0b16 CCR recoveries using wrong setting for chunk sizes (#59597)
The default chunk size for CCR file-based recoveries was wrongly set to 40MB instead of 1MB.
2020-07-21 13:56:06 +02:00
Armin Braun cefaa17c52
Simplify CheckSumBlobStoreFormat and make it more Reusable (#59888) (#59950)
Refactored `CheckSumBlobStoreFormat` so it can more easily be reused in
other functionality (i.e. upcoming repair logic).
Simplified away constant `failIfAlreadyExists` parameter and removed the atomic
write method and its tests.
The atomic write method was only used in a single spot and that spot has now been adjusted to
work the same way writing root level metadata works.
2020-07-21 11:20:56 +02:00
Armin Braun 5b92596fad
Cleanup and Optimize Multiple Serialization Spots (#59626) (#59936)
Follow up to #59606 using some of the new infrastructure and making similar cleanups in various spots in the core codebase. Because the stream utility methods at times handle size hints and empty collections better, this also means speedups.
2020-07-21 10:06:56 +02:00
Julie Tibshirani 8647872a1e
Simplify structure for parsing points. (#59938)
Previously we constructed a GeometryFormat object and delegated point parsing to
it. This wasn't a good fit conceptually because each GeometryFormat instance
didn't represent a distinct point format.
2020-07-20 17:11:43 -07:00
Nik Everett b2ca19484a
Allocate slightly less per bucket (#59740) (#59873)
This replaces the data structure that we use to resolve bucket ids in
bucketing aggs that are inside other bucketing aggs. This replaces the
"legoed together" data structure with a purpose built `LongLongHash`
with semantics similar to `LongHash`, except that it has two `long`s
as keys instead of one.

The microbenchmarks show a fairly substantial performance gain on the
hot path, around 30%. Rally's higher level benchmarks show anywhere
from 0 to 7% speed improvements. Not as much as I'd hoped, but nothing
to sneeze at. And, after all, we are allocating slightly less data per
owningBucketOrd, which is always nice.
2020-07-20 10:43:11 -04:00
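A hash keyed by two `long`s that hands out dense ids, as described in the commit above, could be sketched as below. This is not the real `LongLongHash`: it uses plain arrays and linear probing instead of Elasticsearch's big arrays, and the `-1 - id` return convention for existing keys is assumed to mirror `LongHash`.

```java
import java.util.Arrays;

final class LongLongHashSketch {
    private int[] slots = new int[16];   // 0 = empty, otherwise id + 1
    private long[] first = new long[8];  // first key, indexed by id
    private long[] second = new long[8]; // second key, indexed by id
    private int size;

    // Returns the new id, or -1 - existingId if the pair was already present.
    long add(long key1, long key2) {
        int mask = slots.length - 1;
        int slot = hash(key1, key2) & mask;
        while (slots[slot] != 0) {
            int id = slots[slot] - 1;
            if (first[id] == key1 && second[id] == key2) {
                return -1 - id;
            }
            slot = (slot + 1) & mask; // linear probing
        }
        if (size == first.length) {
            first = Arrays.copyOf(first, size * 2);
            second = Arrays.copyOf(second, size * 2);
        }
        first[size] = key1;
        second[size] = key2;
        slots[slot] = size + 1;
        size++;
        if (size * 2 > slots.length) {
            rehash(); // keep the load factor at or below 0.5
        }
        return size - 1;
    }

    long size() {
        return size;
    }

    private void rehash() {
        int[] bigger = new int[slots.length * 2];
        int mask = bigger.length - 1;
        for (int id = 0; id < size; id++) {
            int slot = hash(first[id], second[id]) & mask;
            while (bigger[slot] != 0) {
                slot = (slot + 1) & mask;
            }
            bigger[slot] = id + 1;
        }
        slots = bigger;
    }

    private static int hash(long key1, long key2) {
        long h = key1 * 0x9E3779B97F4A7C15L + key2;
        h ^= h >>> 32;
        return (int) h;
    }
}
```

In a nested bucketing aggregation the two keys would be the owning bucket ordinal and the bucket's own value, and the returned id is the dense bucket ordinal.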
Stéphane Campinas bcebdfe5b1 fix handling of alias filter in SearchService#canMatch (#59368)
The check against the alias filter should be done after the request is rewritten.

Close #59367
2020-07-20 16:25:15 +02:00
David Turner b75207a09f Remove sporadic min/max usage estimates from stats (#59755)
Today `GET _nodes/stats/fs` includes `{least,most}_usage_estimate`
fields for some nodes. These fields have rather strange semantics. They
are only reported on the elected master and on nodes that have been the
elected master since they were last restarted; when a node stops being
the elected master these stats remain in place but we stop updating them
so they may become arbitrarily stale.

This means that these statistics are pretty meaningless and impossible
to use correctly. Even if they were kept up to date they're never
reported for data-only nodes anyway, despite the fact that data nodes
are the ones where we care most about disk usage. The information needed
to compute the path with the least/most available space is already
provided in the rest of the stats output, so we can treat the inclusion of
these stats as a bug and fix it by simply removing them in this commit.
Since these stats were always optional and mostly omitted (for opaque
reasons) this is not considered a breaking change.
2020-07-20 15:22:04 +01:00
Lee Hinman 8c7d414a3b
[7.x] Fix retrieving data stream stats for a DS with multiple backing indices (#59806) (#59810)
Backports the following commits to 7.x:

    Fix retrieving data stream stats for a DS with multiple backing indices (#59806)
2020-07-17 16:56:07 -06:00
Nik Everett 514b2f3414
Clean up a few of vwh's rough edges (#59341) (#59807)
This cleans up a few rough edges in the `variable_width_histogram`,
mostly found by @wwang500:
1. Setting its tuning parameters in an unexpected order could cause the
   request to fail.
2. We checked that the maximum number of buckets was both less than
   50000 and MAX_BUCKETS. This drops the 50000.
3. Fixes a divide by 0 that can occur if the `shard_size` is 1.
4. Fixes a divide by 0 that can occur if the `shard_size * 3` overflows
   a signed int.
5. Requires `shard_size * 3 / 4` to be at least `buckets`. If it is less
   than `buckets` we will very consistently return fewer buckets than
   requested. For the most part we expect folks to leave it at the
   default. If they change it, we expect it to be much bigger than
   `buckets`.
6. Allocate a smaller `mergeMap` when initially bucketing requests
   that don't use the entire `shard_size * 3 / 4`. It's just a waste.
7. Default `shard_size` to `10 * buckets` rather than `100`. It *looks*
   like that was our intention the whole time. And it feels like it'd
   keep the algorithm humming along more smoothly.
8. Default the `initial_buffer` to `min(10 * shard_size, 50000)` like
   we've documented it rather than `5000`. Like the point above, this
   feels like the right thing to do to keep the algorithm happy.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-17 15:16:09 -04:00
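The defaults and checks listed in the commit above (points 4, 5, 7 and 8) can be sketched in a few lines. `VwhDefaultsSketch` and its method names are hypothetical; the arithmetic follows the commit message, and widening to `long` is an assumption about one way to avoid the `shard_size * 3` overflow.

```java
final class VwhDefaultsSketch {
    // Point 7: default shard_size to 10 * buckets.
    static int defaultShardSize(int buckets) {
        return (int) Math.min((long) buckets * 10, Integer.MAX_VALUE);
    }

    // Point 8: default initial_buffer to min(10 * shard_size, 50000).
    static int defaultInitialBuffer(int shardSize) {
        return (int) Math.min(10L * shardSize, 50_000L);
    }

    // shard_size * 3 / 4, computed in long arithmetic so that
    // shard_size * 3 cannot overflow a signed int (point 4).
    static int bucketingThreshold(int shardSize) {
        return (int) ((long) shardSize * 3 / 4);
    }

    // Point 5: require shard_size * 3 / 4 to be at least buckets.
    static void validate(int buckets, int shardSize) {
        if (bucketingThreshold(shardSize) < buckets) {
            throw new IllegalArgumentException(
                "3/4 of shard_size must be at least buckets but was ["
                    + bucketingThreshold(shardSize) + "] < [" + buckets + "]");
        }
    }
}
```

Note that `bucketingThreshold(1)` is 0, which is the value behind the divide by 0 in point 3; rejecting it in `validate` keeps it away from any later division.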
Lee Hinman f6b08a3115
[7.x] Allow simulating existing composable index template (#59733) (#59798)
Backports the following commits to 7.x:

    Allow simulating existing composable index template (#59733)
2020-07-17 13:10:07 -06:00
Nik Everett 95e6e4a452
Small cleanup for IndexFieldData (#59724) (#59800)
This drops `IndexComponent` from `IndexFieldData` because it wasn't
doing anything other than forcing us to perform a bunch of ceremony to
build them.
2020-07-17 13:38:15 -04:00
Tal Levy c9ab7bb651
Fix bug in circuit-breaker check for geoshape grid aggregations (#57962) (#59741)
There was a bug in the geoshape circuit-breaker check where the
hash values array was being allocated before its new size was
accounted for by the circuit breaker.

Fixes #57847.
2020-07-17 09:26:00 -07:00
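The ordering fix in the commit above, accounting for the new size before allocating, can be sketched as follows. `BreakerFirstSketch` is a hypothetical toy breaker, not the Elasticsearch `CircuitBreaker` API; it only demonstrates the "check, then allocate" order.

```java
import java.util.Arrays;

final class BreakerFirstSketch {
    private final long limitBytes;
    private long usedBytes;

    BreakerFirstSketch(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Grows the array, but only after the breaker has accepted the extra bytes.
    long[] grow(long[] values, int newSize) {
        long extraBytes = (long) (newSize - values.length) * Long.BYTES;
        if (usedBytes + extraBytes > limitBytes) {
            // Trip before allocating, so the oversized array never exists.
            throw new IllegalStateException(
                "[sketch] data too large, would use " + (usedBytes + extraBytes) + " bytes");
        }
        usedBytes += extraBytes;               // 1. account for the new size
        return Arrays.copyOf(values, newSize); // 2. only then allocate
    }

    long usedBytes() {
        return usedBytes;
    }
}
```

Allocating first and accounting afterwards, as in the bug, means the memory is already consumed by the time the breaker gets a chance to reject the request.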
Christoph Büscher f4ff5fe93b
Add `zero_terms_query` support to `match_phrase_prefix` (#58822) (#59784)
Currently `match_phrase_prefix` doesn't support `zero_terms_query` like the
other match-type queries. This change adds this support.

Closes #58468
2020-07-17 17:23:23 +02:00
Benjamin Trent b7f30fc929
[7.x] Adding new `require_alias` option to indexing requests (#58917) (#59769)
* Adding new `require_alias` option to indexing requests (#58917)

This commit adds the `require_alias` flag to requests that create new documents.

This flag, when `true` prevents the request from automatically creating an index. Instead, the destination of the request MUST be an alias.

When the flag is not set, or `false`, the behavior defaults to the `action.auto_create_index` settings.

This is useful when an alias is required instead of a concrete index.

closes https://github.com/elastic/elasticsearch/issues/55267
2020-07-17 10:24:58 -04:00
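The decision that `require_alias` adds, as described in the commit above, can be summarized in a small sketch. `RequireAliasSketch` and its `Outcome` enum are hypothetical, and the handling of an existing concrete index when the flag is `true` is an assumption based on the statement that the destination MUST be an alias.

```java
final class RequireAliasSketch {
    enum Outcome { WRITE_TO_EXISTING, AUTO_CREATE_INDEX, REJECT }

    static Outcome resolve(boolean targetExists, boolean targetIsAlias,
                           boolean requireAlias, boolean autoCreateAllowed) {
        if (requireAlias) {
            // The destination must be an alias; nothing is ever auto-created.
            return targetExists && targetIsAlias ? Outcome.WRITE_TO_EXISTING : Outcome.REJECT;
        }
        if (targetExists) {
            return Outcome.WRITE_TO_EXISTING;
        }
        // Flag unset or false: fall back to the action.auto_create_index behavior.
        return autoCreateAllowed ? Outcome.AUTO_CREATE_INDEX : Outcome.REJECT;
    }
}
```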
Alan Woodward 65f6fb8e94
Shortcut mapping update if the incoming mapping version is the same as the current mapping version (#59517) (#59772)
Currently, when we apply a cluster state change to a shard on a non-master node,
we check to see if the mappings need to be updated by comparing the decompressed
serialized mappings from the update against the serialized version of the shard's
existing mappings. However, we already have a much simpler way of checking this,
by comparing mapping versions on the index metadata of the old and new states.

This commit adds a shortcut to MapperService.updateMappings() that compares
these mapping versions, and ignores the merge if they are equal.
2020-07-17 14:53:09 +01:00
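The version-based shortcut from the commit above amounts to one cheap comparison before the expensive work. The sketch below uses a hypothetical `MappingUpdateSketch` class; the real `MapperService.updateMappings()` operates on index metadata and performs an actual mapping merge.

```java
final class MappingUpdateSketch {
    private long currentVersion;
    private String currentSource;

    MappingUpdateSketch(long version, String source) {
        this.currentVersion = version;
        this.currentSource = source;
    }

    // Returns true if a merge was performed, false if the shortcut applied.
    boolean updateMappings(long incomingVersion, String incomingSource) {
        if (incomingVersion == currentVersion) {
            // Same mapping version: skip decompressing and comparing serialized mappings.
            return false;
        }
        currentSource = incomingSource; // stand-in for the real merge
        currentVersion = incomingVersion;
        return true;
    }

    String source() {
        return currentSource;
    }
}
```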
Alan Woodward b29d368b52
Convert DateFieldMapper to parametrized format (#59429) (#59759)
This commit makes DateFieldMapper extend ParametrizedFieldMapper,
declaring its parameters explicitly. As well as changes to DateFieldMapper
itself, there are some changes to dynamic mapping code to ensure that
dynamically detected date formats are passed through to new date mapper
builders.
2020-07-17 12:46:18 +01:00
Przemko Robakowski 790fbbcd87
[7.x] Fix handling of final pipelines when destination is changed (#59522) (#59746)
* Fix handling of final pipelines when destination is changed (#59522)

This change fixes final pipelines if destination index is changed during pipeline run:
- final pipelines can't change destination anymore, exception is thrown if they try to
- if request/default pipeline changes destination final pipeline from old index won't be executed
- if request/default pipeline changes destination and new index has final pipeline it will be executed
- default pipeline from new index won't be executed
Additionally TransportBulkAction.resolvePipelines was moved to IngestService as it's needed for resolving pipelines from new index. Tests were moved accordingly.

Closes #57968
2020-07-17 11:13:48 +02:00
Tim Brooks b6e6a8c090
Fix replication operation transient retry test (#58205)
After the work to retry transient replication failures, the local and
global checkpoint test metadata can be incremented on a different thread
than the test thread. This appears to introduce an extremely rare
scenario where this data is not visible for later test assertions. This
commit fixes the issue by using synchronized maps.
2020-07-16 16:01:47 -06:00
Martijn van Groningen 0096238df1
Replaced _data_stream_timestamp meta field's 'path' option with 'enabled' option (#59727)
Backport #59503 to 7.x

and adjusted exception messages.

Relates to #59076
2020-07-16 22:29:40 +02:00
Igor Motov 2408803fad
Adds hard_bounds to histogram aggregations (#59175) (#59656)
Adds a hard_bounds parameter to explicitly limit the buckets that a histogram
can generate. This is especially useful in case of open ended ranges that can
produce a very large number of buckets.
2020-07-16 15:31:53 -04:00
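How `hard_bounds` limits bucket generation for an open-ended range can be illustrated with a simplified sketch. `HardBoundsSketch` is hypothetical and ignores offsets and `min_doc_count`; treating both bounds as inclusive is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class HardBoundsSketch {
    // Returns the bucket keys that a range value [from, to] produces
    // once it has been clipped to the hard bounds [hardMin, hardMax].
    static List<Double> bucketKeys(double from, double to, double interval,
                                   double hardMin, double hardMax) {
        double lo = Math.max(from, hardMin);
        double hi = Math.min(to, hardMax);
        List<Double> keys = new ArrayList<>();
        for (double key = Math.floor(lo / interval) * interval; key <= hi; key += interval) {
            keys.add(key);
        }
        return keys;
    }
}
```

Without the clipping, a range such as `[0, 1e9]` with an interval of 10 would produce a hundred million buckets; with hard bounds of `[0, 30]` it produces four.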
Alan Woodward 10be10c99b
Migrate CompletionFieldMapper to parametrized format (#59691)
This adds a number of new optional parameters to Parameter, including:

* custom serialization (to handle analyzers)
* deprecated parameter names
* parameter validation
* allowing default values to be based on the values of other parameters

We preserve the previous serialization format of CompletionFieldMapper,
always emitting most fields, in order to meet mapping checks in mixed
version clusters, where the mapper service will check that mappings have
been correctly parsed and updated by checking their serialized outputs.
2020-07-16 19:15:00 +01:00
Howard c0d429863c
remove unused cluster name in environment. (backport of #59605) (#59681)
removes an unused variable
2020-07-16 09:25:55 -04:00
Nik Everett 343053c0a7 Fix compilation in Eclipse (backport #59675)
Eclipse was confused by #59583. It can't see the public inner
interface within the superclass. This time. Usually that is fine, but
the Eclipse gods don't like this particular code, I guess.
2020-07-16 08:25:12 -04:00
Alan Woodward 27067de699 Make MappedFieldType#meta final (#59383)
The MappedFieldType#updateMeta method was used for testing equality checks, but we
no longer need these after #59212, so we can remove this method and make meta final.
2020-07-16 09:45:55 +01:00
Przemysław Witek df4fea79cb
Add a "verbose" option to the data frame analytics stats endpoint (#59589) (#59621) 2020-07-16 09:51:31 +02:00
Armin Braun 6db481f49e
Fix ConcurrentSnapshotsIT.testEquivalentDeletesAreDeduplicated (#59611) (#59653)
Trying to queue up snapshot deletes by blocking the delete of the latest
index-N doesn't work here. The first delete will block on the delete operation
but only do so after having already written the updated repository data.
Since that repository data will contain no snapshots, the subsequent deletes for
`*` will just fall through and complete instead of queue up.
=> Fixed by simply waiting on all files on master so that we block before updating
the repository data and get to test the queueing of equivalent operations

closes #59608
2020-07-16 09:28:36 +02:00
Nhat Nguyen b599f7a9c0
Fix estimate size of translog operations (#59206)
Make sure that the estimateSize method includes all fields of translog operations.
2020-07-16 00:19:30 -04:00
Julie Tibshirani 2b70758a05 Correct type parametrization in geo mappers. (#59583)
Previously the concrete type parameters for the MappedFieldType didn't always
match those for the FieldMapper. This PR updates the mappers so that the type
parameters always match, which makes the design easier to follow.
2020-07-15 14:10:47 -07:00
Boice Huang ef26c1739b fix typo in Exception Response in GeoJson (#59270) 2020-07-15 20:15:18 +01:00
Boice Huang 07a58d915d Fix typo in AggregationProfiler (#59269) 2020-07-15 20:14:19 +01:00
Armin Braun cc7093645c
Cleanup some Serialization Code around Snapshots (#59532) (#59606)
A number of obvious possible simplifications that also improve efficiency
in some cases (better empty collection handling and size hint use).
Also, added a shortcut for writing and reading immutable open maps that
can be used to dry up additional spots.
2020-07-15 20:40:43 +02:00
David Turner 67e7c3f60e Fix failing test introduced in #59601 2020-07-15 17:44:27 +01:00
Rory Hunter b8d73a1e7e
Default gateway.auto_import_dangling_indices to false (#59302)
Backport of #58898.

Part of #48366. Now that there is a dedicated API for dangling indices, the auto-import
behaviour can default to off. Also add a note to the breaking changes for 7.9.0.
2020-07-15 17:10:42 +01:00
David Turner 691759fb1f
Validate snapshot UUID during restore (#59601)
Today when mounting a searchable snapshot we obtain the snapshot/index
UUIDs and then assume that these are the UUIDs used during the
subsequent restore. If you concurrently delete the snapshot and replace
it with one with the same name then this assumption is violated, with
chaotic consequences.

This commit introduces a check that ensures that the snapshot UUID does
not change during the mount process. If the snapshot remains in place
then the index UUID necessarily does not change either.

Relates #50999
2020-07-15 16:23:20 +01:00
Martijn van Groningen 2a89e13e43
Move data stream transport and rest action to xpack (#59593)
Backport of #59525 to 7.x branch.

* Actions are moved to xpack core.
* Transport and rest actions are moved to the data-streams module.
* Removed data streams methods from Client interface.
* Adjusted tests to use client.execute(...) instead of data stream specific methods.
* only attempt to delete all data streams if xpack is installed in rest tests
* Now that ds apis are in xpack and ESIntegTestCase
no longer deletes all ds, do that in the MlNativeIntegTestCase
class for ml tests.
2020-07-15 16:50:44 +02:00
Rory Hunter 2e05ce5f88 Bump version to 7.10.0 2020-07-15 11:56:45 +01:00
David Turner 0c2510dc68 Don't request cluster metadata in _cat/shards impl (#59548)
Today `GET _cat/shards` requests the nodes, routing table, and metadata
from the cluster state, but it does not use any information from the
metadata portion of the response. Metadata includes things like mappings
and templates that may be substantial in size.

This commit drops the unnecessary metadata portion of this cluster state
request.
2020-07-15 10:14:48 +01:00
Francisco Fernández Castaño 66ef1cdad7
Add the possibility to inject a custom RecoveryState factory to IndexStorePlugin implementations (#59124)
Add a custom factory for recovery state into IndexStorePlugin that
allows different implementors to provide their own RecoveryState
implementation.

Backport of #59038
2020-07-15 11:11:07 +02:00
Armin Braun 96f52a028f
Fix Snapshot not Starting in Partial Snapshot Corner Case (#59428) (#59584)
We were not handling the case where during a partial snapshot all
shards would enter a failed state right off the bat.

Closes #59384
2020-07-15 07:59:22 +02:00
Armin Braun 2dd086445c
Enable Fully Concurrent Snapshot Operations (#56911) (#59578)
Enables fully concurrent snapshot operations:
* Snapshot create- and delete operations can be started in any order
* Delete operations wait for snapshot finalization to finish, are batched as much as possible to improve efficiency and once enqueued in the cluster state prevent new snapshots from starting on data nodes until executed
   * We could be even more concurrent here in a follow-up by interleaving deletes and snapshots on a per-shard level. I decided not to do this for now since it seemed not worth the added complexity yet. Due to batching+deduplicating of deletes the pain of having a delete stuck behind a long-running snapshot seemed manageable (dropped client connections + resulting retries don't cause issues due to deduplication of delete jobs, batching of deletes allows enqueuing more and more deletes even if a snapshot blocks for a long time that will all be executed in essentially constant time (due to bulk snapshot deletion, deleting multiple snapshots is mostly about as fast as deleting a single one))
* Snapshot creation is completely concurrent across shards, but per shard snapshots are linearized for each repository as are snapshot finalizations

See updated JavaDoc and added test cases for more details and illustration on the functionality.

Some notes:

The queuing of snapshot finalizations and deletes and the related locking/synchronization is a little awkward in this version but can be much simplified with some refactoring.  The problem is that snapshot finalizations resolve their listeners on the `SNAPSHOT` pool while deletes resolve the listener on the master update thread. With some refactoring both of these could be moved to the master update thread, effectively removing the need for any synchronization around the `SnapshotService` state. I didn't do this refactoring here because it's a fairly large change and not necessary for the functionality but plan to do so in a follow-up.

This change allows for completely removing any trickery around synchronizing deletes and snapshots from SLM and 100% does away with SLM errors from collisions between deletes and snapshots.

Snapshotting a single index in parallel to a long running full backup will execute without having to wait for the long running backup as required by the ILM/SLM use case of moving indices to "snapshot tier". Finalizations are linearized but ordered according to which snapshot saw all of its shards complete first
2020-07-15 03:42:31 +02:00
Armin Braun 06d94cbb2a
Fix TODO about Spurious FAILED Snapshots (#58994) (#59576)
There is no point in writing out snapshots that contain no data that can be restored
whatsoever. It may have made sense to do so in the past when there was an `INIT` snapshot
step that wrote data to the repository that would've otherwise become unreferenced, but in the
current day state machine without the `INIT` step there is no point in doing so.
2020-07-15 00:54:30 +02:00
Armin Braun e1014038e9
Simplify Repository.finalizeSnapshot Signature (#58834) (#59574)
Many of the parameters we pass into this method were only used to
build the `SnapshotInfo` instance to write.
This change simplifies the signature. Also, it seems less error prone to build
`SnapshotInfo` in `SnapshotsService` instead of relying on the fact that each repository
implementation will build the correct `SnapshotInfo`.
2020-07-15 00:14:28 +02:00
Armin Braun 16a47e0d08
Simplify SnapshotsInProgress Construction (#58893) (#59573)
With parallel snapshots incoming (but also in isolation) it makes sense to clean up
`SnapshotsInProgress` construction.
We don't need to pre-compute the waiting shards for every entry. We rarely use this information
(only on routing changes) and in the one spot we did we now simply spend the extra cycles for looping
over all shards instead of just the waiting ones once per routing change tops instead of on every change
to `SnapshotsInProgress` (moreover, we would burn the cycles for looping on all nodes even though only the
current master cares about the information).
In addition to that change I removed some dead code constructors and slightly optimized deserialization.
2020-07-15 00:00:53 +02:00
Martijn van Groningen 35ae3d19db
Remove data stream feature flag (#59572)
so that it can be used in the next minor release (7.9.0).

Backport of #59504 to 7.x branch.
Closes #53100
2020-07-14 23:50:41 +02:00
Armin Braun 68a199f75f
Minor Cleanup Dead Code Snapshotting (#57716) (#59569)
* Use consistent cluster state instead in state update
* Remove dead loop in tests
* Remove some dead exception ctors

Just three trivial/random things I found.
2020-07-14 23:13:14 +02:00
James Baiera 5f7e7e9410
[7.x] Data Stream Stats API (#58707) (#59566)
This API reports on statistics important for data streams, including the number of data
streams, the number of backing indices for those streams, the disk usage for each data
stream, and the maximum timestamp for each data stream
2020-07-14 16:57:46 -04:00
Mark Tozzi ed2c29f102
If no perBucketSample has been allocated for the parent bucket return a doc count of 0 (#59360) (#59567)
Co-authored-by: Fabio Corneti <info@corneti.com>
2020-07-14 16:56:29 -04:00
Armin Braun d456f7870a
Deduplicate Index Metadata in BlobStore (#50278) (#59514)
This PR introduces two new fields into `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and uuids. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices`) so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot.
This saves one write per index in the common case of unchanged metadata thus saving cost and making snapshot finalization drastically faster if many indices are being snapshotted at the same time.

The implementation is mostly analogous to that for shard generations in #46250 and piggy backs on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`).

Relates to #45736 as it improves the efficiency of snapshotting unchanged indices
Relates to #49800 as it has the potential to make loading the index metadata for multiple snapshots of the same index concurrently much more efficient, speeding up future concurrent snapshot deletes
2020-07-14 22:18:42 +02:00
Tim Brooks 408a07f96a
Separate coordinating and primary bytes in stats (#59487)
Currently we combine coordinating and primary bytes into a single bucket
for indexing pressure stats. This makes sense for rejection logic.
However, for metrics it would be useful to separate them.
2020-07-14 12:37:06 -06:00
Tim Brooks a46e5e0f04
Increase default write queue size (#59464)
This commit increases the default write queue size to 10000. This is to
allow a greater number of pending indexing requests. This work is safe
as we have added additional memory limits. Relates to #59263.
2020-07-14 10:35:25 -06:00
Tim Brooks 1a24916fef
Enable replication retries on 7.9+ (#59546)
Currently the work to support replication retries is present on 7.9.
This commit enables these retries by setting the replication timeout to
60s.
2020-07-14 10:35:05 -06:00
Dan Hermann e54b4a729f
[7.x] Adds write_index_only option to put mapping API (#59539) 2020-07-14 10:34:08 -05:00
Luca Cavanna af2f85be15
Consolidate script parsing from object (7.x) (#59509)
The update by query action parses a script from an object (map or string). We will need to do the same for runtime fields as they are parsed as part of mappings (#59391).

This commit moves the existing parsing of a script from an object from RestUpdateByQueryAction to the Script class. It also adds tests and adjusts some error messages that were incorrect. In addition, options were not parsed before and now they are, and unsupported fields now trigger a deprecation warning.
2020-07-14 17:08:29 +02:00
Mark Tozzi b357c1b77a
[7.x] Fix NPE when building exception messages for aggregations (#59156) (#59334) 2020-07-14 09:37:44 -04:00
Andrei Dan 7dcdaeae49
Default to @timestamp in composable template datastream definition (#59317) (#59516)
This makes the data_stream timestamp field specification optional when
defining a composable template.
When there isn't one specified it will default to `@timestamp`.

(cherry picked from commit 5609353c5d164e15a636c22019c9c17fa98aac30)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-14 12:36:54 +01:00
Andrei Dan 4180333bbc
[7.x] Composable templates: add a default mapping for @timestamp (#59244) (#59510)
This adds a low precedence mapping for the `@timestamp` field with
type `date`.
This will aid with the bootstrapping of data streams as a timestamp
mapping can be omitted when nanos precision is not needed.

(cherry picked from commit 4e72f43d62edfe52a934367ce9809b5efbcdb531)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-14 11:29:33 +01:00
Armin Braun 0e3d87ab54
Add Assertions on CS Application in Snapshot Logic (#58681) (#59511)
Relates to #58680. Bugs like that should not only show up in logs
but ideally also get caught in tests. We expect to never see exceptions
in these two spots.
2020-07-14 12:16:42 +02:00
Armin Braun 81e96954d0
Improve Efficiency of SnapshotsService CS Apply (#56874) (#59508)
This change removes the redundant submitting of two separate cluster state updates
for the node configuration changes and routing changes that affect snapshots.
Since we submitted the task to deal with node configuration changes every time on master
fail-over we could also move the BwC cleanup loop that removes `INIT` state snapshots as well
as snapshots that have all their shards completed into this cluster state update task.

Aside from improving efficiency overall this change has the fortunate side effect of moving
all snapshot finalization to the CS update thread. This is helpful for concurrent snapshots
since it makes it very natural and straight forward to order snapshot finalizations by exploiting
that they are all initiated on the same thread.
2020-07-14 11:49:09 +02:00
Tim Brooks 623df95a32
Adding indexing pressure stats to node stats API (#59467)
We have recently added internal metrics to monitor the amount of
indexing occurring on a node. These metrics introduce back pressure to
indexing when memory utilization is too high. This commit exposes these
stats through the node stats API.
2020-07-13 17:23:42 -06:00
Tim Brooks 68d56fa7db
Implement rejections in `WriteMemoryLimits` (#59451)
This commit adds rejections when the indexing memory limits are
exceeded for primary or coordinating operations. The amount of bytes
allowed for indexing is controlled by a new setting
`indexing_limits.memory.limit`.
2020-07-13 14:34:50 -06:00
Mark Tozzi eb0b28dd1d
Move getPointReaderOrNull into AggregatorBase (#58769) (#59455) 2020-07-13 16:31:33 -04:00
Armin Braun 64c5f70a2d
Remove Needless Context Switches on Loading RepositoryData (#56935) (#59452)
We don't need to switch to the generic or snapshot pool for loading
cached repository data (i.e. most of the time in normal operation).

This makes `executeConsistentStateUpdate` less heavy if it has to retry
and lowers the chance of having to retry in the first place.
Also, this change allowed simplifying a few other spots in the codebase
where we would fork off to another pool just to load repository data.
2020-07-13 21:38:29 +02:00
Armin Braun bde92fc5fc
Remove Needless Context Switch From Snapshot Finalization (#56871) (#59443)
No need to do any switch to the `SNAPSHOT` pool here, the blob store
repo handles all its writes async on the `SNAPSHOT` pool so we're just
needlessly context-switching to enqueue those tasks there.
Also cleaned up the source only repository (the only override to `finalizeSnapshot`)
to make it clear that no IO is happening there and we don't need to run it on the
`SNAPSHOT` pool either.
2020-07-13 20:11:07 +02:00
Armin Braun 31be3a3645
More Efficient Snapshot State Handling (#56669) (#59430)
Follow up to #56365. Instead of redundantly checking snapshots for completion
over and over, just track the completed snapshots in the CS updates that complete
them instead of looping over the same snapshot entries over and over.
Also, in the batched snapshot shard status updates, only check for completion
of a snapshot entry if it isn't already finalizing.
2020-07-13 18:58:04 +02:00
Christos Soulios 3868bcc7b8
[7.x] Histogram integration on Histogram field type (#59431)
Backports #58930 to 7.x
Implements histogram aggregation over histogram fields as requested in #53285.
2020-07-13 19:36:33 +03:00
Henning Andersen adf6083dd0
Enhance real memory circuit breaker with G1 GC (#58674) (#59394)
Using G1 GC, Elasticsearch can rarely trigger that heap usage goes above
the real memory circuit breaker limit and stays there for an extended
period. This situation will persist until the next young GC. The circuit
breaking itself hinders that from occurring in a timely manner since it
breaks all requests before real work is done.

This commit gently nudges G1 to do a young GC and then double checks
that heap usage is still above the real memory circuit breaker limit
before throwing the circuit breaker exception.
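
The "nudge, then double check" flow can be sketched as below. This is a Python illustration only; `check_real_memory`, `CircuitBreakingError`, and the callables passed in are hypothetical names, and how the real change triggers a young GC is not shown here.

```python
from typing import Callable


class CircuitBreakingError(Exception):
    """Raised when heap usage stays above the limit even after the nudge."""


def check_real_memory(
    heap_used: Callable[[], int],
    limit: int,
    nudge_young_gc: Callable[[], None],
) -> None:
    # Fast path: usage is within the limit, nothing to do.
    if heap_used() <= limit:
        return
    # Over the limit: give the collector a chance to reclaim memory first.
    nudge_young_gc()
    # Double check before rejecting the request.
    used = heap_used()
    if used > limit:
        raise CircuitBreakingError(f"heap usage {used} exceeds limit {limit}")
```

The request is rejected only if the second measurement is still above the limit, so a heap that merely needed a collection no longer trips the breaker.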

Related to #57202
2020-07-13 17:41:09 +02:00
Martijn van Groningen b1b7bf3912
Make data streams a basic licensed feature. (#59392)
Backport of #59293 to 7.x branch.

* Create new data-stream xpack module.
* Move TimestampFieldMapper to the new module,
  this results in storing a composable index template
  with data stream definition only to work with default
  distribution. This way data streams can only be used
  with default distribution, since a data stream can
  currently only be created if a matching composable index
  template exists with a data stream definition.
* Renamed `_timestamp` meta field mapper
   to `_data_stream_timestamp` meta field mapper.
* Add logic to put composable index template api
  to fail if `_data_stream_timestamp` meta field mapper
  isn't registered. So that a more understandable
  error is returned when attempting to store a template
  with data stream definition via the oss distribution.

In a follow up the data stream transport and
rest actions can be moved to the xpack data-stream module.
2020-07-13 17:26:46 +02:00
Alan Woodward bd01fd107c Revert "Migrate CompletionFieldMapper to parametrized format (#59291)"
This reverts commit 19ba6c39d2.
2020-07-13 14:16:09 +01:00
Armin Braun 4e574a7136
Remove Dead Code from Closed Index Snapshot Logic (#56764) (#59398)
The code path for closed indices is dead code here ever since #39644
because `shards(currentState, indexIds, ...)` does not set
`MISSING` on a closed index's shard that is assigned any longer. Before that change it would always set `MISSING` for a closed index's shard even if it was assigned.
=> simplified the code accordingly.
2020-07-13 14:49:16 +02:00
David Turner 3fb9dccc22 Fix FSHealthServiceTests on Windows (#59387)
In #52680 we introduced a new health check mechanism. This commit fixes
up some related test failures on Windows caused by erroneously assuming
that all paths begin with `/`.

Closes #59380
2020-07-13 12:43:45 +01:00
Alan Woodward 19ba6c39d2 Migrate CompletionFieldMapper to parametrized format (#59291)
This adds some optional extra configuration to Parameter:

* custom serialization (to handle analyzers)
* deprecated parameter names
* parameter validation
2020-07-13 12:43:15 +01:00
Armin Braun 08b54feaaf
Remove Snapshot INIT Step (#55918) (#59374)
With #55773 the snapshot INIT state step has become obsolete. We can set up the snapshot directly in one single step to simplify the state machine.

This is a big help for building concurrent snapshots because it allows us to establish a deterministic order of operations between snapshot create and delete operations since all of their entries now contain a repository generation. With this change simple queuing up of snapshot operations can and will be added in a follow-up.
2020-07-13 13:41:09 +02:00
Alan Woodward c810a4a12e Continue to accept unused 'universal' params in <8.0 indexes (#59381)
We have a number of parameters which are universally parsed by almost all
mappers, whether or not they make sense. Migrating the binary and boolean
mappers to the new style of declaring their parameters explicitly has meant
that these universal parameters stopped being accepted, which would break
existing mappings.

This commit adds some extra logic to ParametrizedFieldMapper that checks
for the existence of these universal parameters, and issues a warning on
7x indexes if it finds them. Indexes created in 8.0 and beyond will throw an
error.

Fixes #59359
2020-07-13 11:15:56 +01:00
David Kyle 7dcd943e1d Mute FsHealthServiceTests testFailsHealthOnIOException (#59382)
For #59380
2020-07-13 09:48:07 +01:00
Armin Braun 483386136d
Move all Snapshot Master Node Steps to SnapshotsService (#56365) (#59373)
This refactoring has three motivations:

1. Separate all master node steps during snapshot operations from all data node steps in code.
2. Set up next steps in concurrent repository operations and general improvements by centralizing tracking of each shard's state in the repository in `SnapshotsService` so that operations for each shard can be linearized efficiently (i.e. without having to inspect the full snapshot state for all shards on every cluster state update, allowing us to track more in memory and only fall back to inspecting the full CS on master failover like we do in the snapshot shards service).
    * This PR already contains some best effort examples of this, but obviously this could be way improved upon still (just did not want to do it in this PR for complexity reasons)
3. Make the `SnapshotsService` less expensive on the CS thread for large snapshots
2020-07-12 22:19:07 +02:00
Dan Hermann e01d73c737
[7.x] Data stream admin actions are now index-level actions 2020-07-10 14:36:18 -05:00
Stuart Tettemer 4c04fd1e05
Scripting: Unlimited compilation rate for ingest (#59268)
* `ingest` and `processor_conditional` default to unlimited compilation rate

Refs: #50152
2020-07-09 16:34:47 -05:00
Stuart Tettemer 94e213dd5f
Scripting: Per context stats in `script` in _nodes/stats (#59266)
Updated `_nodes/stats`:
 * Update `script` in `_nodes/stats` to include stats per context:

```
      "script": {
        "compilations": 1,
        "cache_evictions": 0,
        "compilation_limit_triggered": 0,
        "contexts":[
          {
            "context": "aggregation_selector",
            "compilations": 0,
            "cache_evictions": 0,
            "compilation_limit_triggered": 0
          },

```

Refs: #50152
Backport: #59625
2020-07-09 15:30:50 -05:00
Alan Woodward f4caadd239 MappedFieldType no longer requires equals/hashCode/clone (#59212)
With the removal of mapping types and the immutability of FieldTypeLookup in #58162, we no longer
have any cause to compare MappedFieldType instances. This means that we can remove all equals
and hashCode implementations, and in addition we no longer need the clone implementations which
were required for equals/hashcode testing. This greatly simplifies implementing new MappedFieldTypes,
which will be particularly useful for the runtime fields project.
2020-07-09 21:05:10 +01:00
Dan Hermann c26d2b5fa5
Data stream support for indices shard stores API 2020-07-09 13:11:45 -05:00
Nik Everett 28ef997953
Improve vwh's distant bucket handling (#59094) (#59248)
This modifies the `variable_width_histogram`'s distant bucket handling
to:
1. Properly handle integer overflows
2. Recalculate the average distance when new buckets are added on the
   ends. This should slow down the rate at which we build extra buckets
   as we build more of them.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-09 12:14:46 -04:00
Przemko Robakowski c870d6e570
[7.x] Restart tests with data streams (#58330) (#59303)
* Restart tests with data streams (#58330)
2020-07-09 17:52:20 +02:00
David Turner d56fc72ee5 Fix node health-check-related test failures (#59277)
In #52680 we introduced a new health check mechanism. This commit fixes
up some sporadic related test failures, and improves the behaviour of
the `FollowersChecker` slightly in the case that no retries are
configured.

Closes #59252
Closes #59172
2020-07-09 12:46:12 +01:00
David Turner c80a9e2ec2 Skip unnecessary directory iteration (#59007)
Today `NodeEnvironment#findAllShardIds` enumerates the index directories
in each data path in order to find one with a specific name. Since we
already know the name of the folder we seek we can construct the path
directly and avoid this directory listing. This commit does that.
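
A minimal sketch of the difference, in Python rather than the project's Java. The function names and the `indices/<folder>` layout are assumptions made for illustration and do not mirror `NodeEnvironment` itself.

```python
from pathlib import Path
from typing import Optional


def find_by_listing(data_path: Path, folder_name: str) -> Optional[Path]:
    """Old approach: enumerate every index directory to find one by name."""
    indices_dir = data_path / "indices"
    if not indices_dir.is_dir():
        return None
    for child in indices_dir.iterdir():
        if child.is_dir() and child.name == folder_name:
            return child
    return None


def find_directly(data_path: Path, folder_name: str) -> Optional[Path]:
    """New approach: the name is already known, so build the path and check it."""
    candidate = data_path / "indices" / folder_name
    return candidate if candidate.is_dir() else None
```

Both functions return the same result; the second avoids a listing whose cost grows with the number of index directories in the data path.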
2020-07-09 11:56:41 +01:00
Alan Woodward 67a27e2b9d Add declarative parameters to FieldMappers (#58663)
The FieldMapper infrastructure currently has a bunch of shared parameters, many of which
are only applicable to a subset of the 41 mapper implementations we ship with. Merging,
parsing and serialization of these parameters are spread around the class hierarchy, with
much repetitive boilerplate code required. It would be much easier to reason about these
things if we could declare the parameter set of each FieldMapper directly in the implementing
class, and share the parsing, merging and serialization logic instead.

This commit is a first effort at introducing a declarative parameter style. It adds a new FieldMapper
subclass, ParametrizedFieldMapper, and refactors two mappers, Boolean and Binary, to use it.
Parameters are declared on Builder classes, with the declaration including the parameter name,
whether or not it is updateable, a default value, how to parse it from mappings, and how to
extract it from another mapper at merge time. Builders have a getParameters method, which
returns a list of the declared parameters; this is then used for parsing, merging and serialization.
Merging is achieved by constructing a new Builder from the existing Mapper, and merging in
values from the merging Mapper; conflicts are all caught at this point, and if none exist then a new,
merged, Mapper can be built from the Builder. This allows all values on the Mapper to be final.

Other mappers can be gradually migrated to this new style, and once they have all been refactored
we can merge ParametrizedFieldMapper and FieldMapper entirely.
2020-07-09 11:43:21 +01:00
Ignacio Vera 1ad00d1ceb
Add Support in geo_match enrichment policy for any type of geometry (#59276)
geo_match enrichment works currently only with points. This change adds the ability to
use any type of geometry.
2020-07-09 11:41:41 +02:00
Nhat Nguyen 6a0f7411e2 Do not release safe commit with CancellableThreads (#59182)
We are leaking a FileChannel in #39585 if we release a safe commit with 
CancellableThreads. Although it is a bug in Lucene where we do not close
a FileChannel if we failed to create a NIOFSIndexInput, I think it's
safer if we release a safe commit using the generic thread pool instead.

Closes #39585
Relates #45409
2020-07-08 13:51:48 -04:00
Nhat Nguyen 00c859bfca Fix testSendSnapshotSendsOps
We need to use a concurrent collection to keep track of the shipped operations
as they can arrive concurrently since #58018.

Relates #58018
2020-07-08 12:25:33 -04:00
Martijn van Groningen 17bd559253
Fix the timestamp field of a data stream to @timestamp (#59210)
Backport of #59076 to 7.x branch.

The commit makes the following changes:
* The timestamp field of a data stream definition in a composable
  index template can only be set to '@timestamp'.
* Removed custom data stream timestamp field validation and reuse the validation from `TimestampFieldMapper` and
  instead only check that the _timestamp field mapping has been defined on a backing index of a data stream.
* Moved code that injects _timestamp meta field mapping from `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template58956(...)` method
  to `MetadataIndexTemplateService#collectMappings(...)` method.
* Fixed a bug (#58956) that causes timestamp field validation to be performed
  for each template and instead of the final mappings that is created.
* only apply _timestamp meta field if index is created as part of a data stream or data stream rollover,
this fixes a docs test, where a regular index creation matches (logs-*) with a template with a data stream definition.

Relates to #58642
Relates to #53100
Closes #58956
Closes #58583
2020-07-08 17:30:46 +02:00
Nik Everett a29d3515a2
Improve cardinality measure used to build aggs (#56533) (#59107)
This makes a `parentCardinality` available to every `Aggregator`'s ctor
so it can make intelligent choices about how it collects bucket values.
This replaces `collectsFromSingleBucket` and is similar to it but:
1. It supports `NONE`, `ONE`, and `MANY` values and is generally
   extensible if we decide we can use more precise counts.
2. It is more accurate. `collectsFromSingleBucket` assumed that all
   sub-aggregations live under multi-bucket aggregations. This is
   normally true but `parentCardinality` is properly carried forward
   for single bucket aggregations like `filter` and for multi-bucket
   aggregations configured in single-bucket form, like `range` with a
   single range.

While I was touching every aggregation I renamed `doCreateInternal` to
`createMapped` because that seemed like a much better name and it was
right there, next to the change I was already making.

Relates to #56487

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-08 08:42:23 -04:00
Dan Hermann 90c8d3fc9d
IndexNameExpressionResolver::dataStreamNames should support exclusions 2020-07-08 07:35:52 -05:00
Armin Braun 9268b25789
Add Check for Metadata Existence in BlobStoreRepository (#59141) (#59216)
In order to ensure that we do not write a broken piece of `RepositoryData`
because the physical repository generation was moved ahead more than one step
by erroneous concurrent writing to a repository we must check whether or not
the current assumed repository generation exists in the repository physically.
Without this check we run the risk of writing on top of stale cached repository data.

Relates #56911
2020-07-08 14:25:01 +02:00
Tim Brooks 3700bd1c08
Fix assertion in testCollectNodes test (#58948)
Currently we assert that the reason we fail collecting nodes in this
test is due to the fact that no seeds are available or no connections
could be established to cluster_2. However, the collection could fail if
we cannot establish connections to cluster_1. This commit adds that as
an acceptable assertion.
2020-07-07 21:37:10 -06:00
Nhat Nguyen ef5c397c0f
Sending operations concurrently in peer recovery (#58018)
Today, we send operations in phase2 of peer recoveries batch by batch
sequentially. Normally that's okay as we should have a fairly small number of
operations in phase 2 due to the file-based threshold. However, if
phase1 takes a lot of time and we are actively indexing, then phase2 can
have a lot of operations to replay.

With this change, we will send multiple batches concurrently (defaults
to 1) to reduce the recovery time.

Backport of #58018
2020-07-07 22:03:31 -04:00
Lee Hinman b832fe30ab
[7.x] Validate Data Streams reference a template on composable template update (#59106) (#59193)
This commit adds validation that when a composable index template is updated, that the number
of unreferenced data streams does not increase. While it is still possible to have data streams
without a backing template (through snapshot restoration), this reduces the chance of getting
in to that scenario.

Relates to #53100
2020-07-07 15:38:27 -06:00
Tim Brooks b1c3ad8f59
Fix race in RecoveryRequestTrackerTests (#59187)
Currently in the recovery request tracker tests we place the futures
into the future map on the GENERIC thread. It is possible that the test
has already advanced past the point where we block on these futures
before they are placed in the map. This introduces other potential
failures as we expect all futures have been completed. This commit fixes
the test by placing the futures in the map prior to dispatching.
2020-07-07 15:10:31 -06:00
Nik Everett d536854879 Fix test bug in auto_date_histo
The test would try to prepare a `Rounding` even when there aren't any
buckets. This would fail because there is no range over which to prepare
the rounding. It turns out that we don't need the rounding in that case
so we just use `null` then.

Closes #59131
2020-07-07 15:39:48 -04:00
Andrei Dan 24c6a30e2b
[7.9] GET data stream API returns additional information (#59128) (#59177)
* GET data stream API returns additional information (#59128)

This adds the data stream's index template, the configured ILM policy
(if any) and the health status of the data stream to the GET _data_stream
response.

Restoring a data stream from a snapshot could install a data stream that
doesn't match any composable templates. This also makes the `template`
field in the `GET _data_stream` response optional.

(cherry picked from commit 0d9c98a82353b088c782b6a04c44844e66137054)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-07 20:30:09 +01:00
Nhat Nguyen de6ac6aea6 Fix recovery stage transition with sync_id (#57754)
If the recovery source is on an old node (before 7.2), then the recovery
target won't have the safe commit after phase1 because the recovery
source does not send the global checkpoint in the clean_files step. And
if the recovery fails and retries, then the recovery stage won't
transition properly. If a sync_id is used in peer recovery, then the
clean_files step won't be executed to move the stage to TRANSLOG.

Relates ##7187
Closes #57708
2020-07-07 12:00:37 -04:00
Nik Everett eb169ae226
Fix lookup support in adjacency matrix (backport of #59099) (#59108)
This request:
```
POST /_search
{
  "aggs": {
    "a": {
      "adjacency_matrix": {
        "filters": {
          "1": {
            "terms": { "t": { "index": "lookup", "id": "1", "path": "t" } }
          }
        }
      }
    }
  }
}
```

Would fail with a 500 error and a message like:
```
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_state_exception",
        "reason":"async actions are left after rewrite"
      }
    ]
  }
}
```

This fixes that by moving the query rewrite phase from a synchronous
call on the data nodes into the standard aggregation rewrite phase which
can properly handle the asynchronous actions.
2020-07-07 10:28:20 -04:00
David Turner 46c8d00852
Remove nodes with read-only filesystems (#52680) (#59138)
Today we do not allow a node to start if its filesystem is readonly, but
it is possible for a filesystem to become readonly while the node is
running. We don't currently have any infrastructure in place to make
sure that Elasticsearch behaves well if this happens. A node that cannot
write to disk may be poisonous to the rest of the cluster.

With this commit we periodically verify that nodes' filesystems are
writable. If a node fails these writability checks then it is removed
from the cluster and prevented from re-joining until the checks start
passing again.
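
One way to picture a single writability check is the Python sketch below. It is not the implementation from this commit: `is_writable`, `node_is_healthy`, and the temporary file prefix are hypothetical, and the periodic scheduling and cluster-removal logic are left out.

```python
import os
import tempfile
from typing import Iterable


def is_writable(data_path: str) -> bool:
    """Try to create, write, and sync a temporary file under data_path."""
    try:
        fd, tmp_name = tempfile.mkstemp(prefix=".health_check_", dir=data_path)
    except OSError:
        # Read-only or missing path: the file could not even be created.
        return False
    try:
        os.write(fd, b"ok")
        os.fsync(fd)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
        try:
            os.remove(tmp_name)
        except OSError:
            pass


def node_is_healthy(data_paths: Iterable[str]) -> bool:
    """A node counts as healthy only if every data path passes the check."""
    return all(is_writable(p) for p in data_paths)
```

Running such a check on a timer, and reporting the node as unhealthy while it fails, matches the behaviour described above in spirit.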

Closes #45286

Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com>
2020-07-07 14:00:02 +01:00
Francisco Fernández Castaño 1ced3f0eb3
Extract recovery files details to its own class (#59121)
Backport of #59039
2020-07-07 12:35:57 +02:00
Armin Braun d6d6df16bb
Share IT Infrastructure between Core Snapshot and SLM ITs (#59082) (#59119)
For #58994 it would be useful to be able to share test infrastructure.
This PR shares `AbstractSnapshotIntegTestCase` for that purpose, dries up SLM tests
accordingly and adds a shared and efficient (compared to the previous implementations)
way of waiting for no running snapshot operations to the test infrastructure to dry things up further.
2020-07-07 12:04:41 +02:00
David Turner ef2f0d1f67 Inline no-op IndicesModule#getEngineFactories (#59051)
This method was introduced in #31183 but it has no effect and is never
overridden so this commit removes it.
2020-07-07 09:15:20 +01:00
Francisco Fernández Castaño 0752a86fe5
Enforce higher priority for RepositoriesService ClusterStateApplier (#59040)
* Enforce higher priority for RepositoriesService ClusterStateApplier

This avoids shards allocation failures when the repository instance
comes in the same ClusterState update as the shard allocation.

Backport of #58808
2020-07-07 09:51:08 +02:00
Howard 00ed31d000 Remove IndexShardRoutingTable#primaryAsList (#59044) 2020-07-07 07:34:32 +01:00
Nik Everett be13dea113 Drop a TODO from the terms aggregator (#59100)
We did it in #56487.
2020-07-06 17:46:06 -04:00
Nik Everett eff5f4d234
Add pipeline aggregations to the rewrite phase (backport #58878) (#59081)
This allows pipeline aggregations to participate in the up-front rewrite
phase for searches, in particular, it allows them to load data that they
need asynchronously.

Relates to #58193

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-06 15:13:45 -04:00
Nhat Nguyen e827d2ed92 Fix testRestoreLocalHistoryFromTranslogOnPromotion (#58745)
If the global checkpoint equals max_seq_no, then we won't reset an engine 
(as all operations are safe), and max_seqno_of_updates_or_deletes 
won't advance to max_seq_no.

Closes #58163
2020-07-06 12:19:45 -04:00
Andrei Dan 2d516d7bcc
[7.x] Search all (_all, *) resolves data streams too (#58869) (#59058)
Part of the original PR was merged by #59028

(cherry picked from commit 2598327726124d8a86333f79cdc45bf6a4297dbc)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-06 14:19:15 +01:00
Dan Hermann 550dcb0ca6
[7.x] Delete data stream API accepts multiple names (#59064) 2020-07-06 08:06:10 -05:00
Armin Braun 722d94688b
Fix MinimumMasterNodesIT Test (#59054) (#59057)
Tiny oversight in dee9e048bdcc5ba59f20d2554e989015463df05a caused
the `otherNodes` collection to incorrectly contain `master` here.
2020-07-06 13:00:15 +02:00
Armin Braun 62eabdac6e
Dry up Snapshot ITs further (#59035) (#59052)
Some more obvious cleaning up of the snapshot ITs.

follow up to #58818
2020-07-06 12:26:42 +02:00
Martijn van Groningen f0dd9b4ace
Add data stream timestamp validation via metadata field mapper (#59002)
Backport of #58582 to 7.x branch.

This commit adds a new metadata field mapper that validates,
that a document has exactly a single timestamp value in the data stream timestamp field and
that the timestamp field mapping only has `type`, `meta` or `format` attributes configured.
Other attributes can affect the guarantee that an index with this meta field mapper has a
useable timestamp field.

The MetadataCreateIndexService inserts a data stream timestamp field mapper whenever
a new backing index of a data stream is created.

Relates to #53100
2020-07-06 11:32:33 +02:00
Armin Braun 49857cc35d
Dry up Master Disconnect Disruption Tests (#58953) (#59050)
Dry up tests that use a disruption that isolates the master from all other nodes.
Also, turn disruption types that have neither parameters nor state into constants
to make things a little clearer.
2020-07-06 11:04:24 +02:00
Nhat Nguyen 62763b177d Implement toString for BulkByScrollTask (#59042)
We should implement "toString" of BulkByScrollTask.StatusOrException 
to have a meaningful log message when a reindex task completes.
2020-07-05 22:06:56 -04:00
Armin Braun 071d8b2c1c
Deduplicate Empty InternalAggregations (#58386) (#59032)
Working through a heap dump for an unrelated issue I found that we can easily rack up
tens of MBs of duplicate empty instances in some cases.
I moved to a static constructor to guard against that in all cases.
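
The pattern can be illustrated with a small Python sketch. The `Aggregations` class and its `of` factory are hypothetical stand-ins, not the actual `InternalAggregations` API: the factory hands back one shared instance whenever the input is empty.

```python
from typing import Iterable, Optional, Tuple


class Aggregations:
    """Hypothetical immutable container standing in for a list of results."""

    _EMPTY: Optional["Aggregations"] = None  # shared empty instance, created lazily

    def __init__(self, items: Tuple[str, ...]) -> None:
        self.items = items

    @classmethod
    def of(cls, items: Iterable[str]) -> "Aggregations":
        """Factory that avoids allocating a new object for the empty case."""
        materialized = tuple(items)
        if not materialized:
            if cls._EMPTY is None:
                cls._EMPTY = cls(())
            return cls._EMPTY
        return cls(materialized)
```

Because the instances are immutable, sharing the empty one is safe, and callers that go through the factory can no longer accumulate duplicate empty objects.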
2020-07-04 14:02:16 +02:00
Dan Hermann 7c43cbca82
[7.x] Ignore matching data streams if include_data_streams is false (#59028) 2020-07-03 14:51:32 -05:00
Dan Hermann c1781bc7e7
[7.x] Add include_data_streams flag for authorization (#59008) 2020-07-03 12:58:39 -05:00
Dan Hermann 5e7746d3bd
[7.x] Mirror privileges over data streams to their backing indices (#58991) 2020-07-03 06:33:38 -05:00
Armin Braun d22dd437f1
Fix Two Common Zero Len Array Instantiations (#58944) (#58993)
Two spots I found in which we commonly instantiate a non-trivial number of zero length arrays.
2020-07-03 09:18:14 +02:00
Nhat Nguyen 65645217bc Handle IOException while checking translog corruption
We can hit an IOException while reading a translog header after corrupting it.

Relates #58866
2020-07-02 22:38:05 -04:00
Tim Brooks dc9e364ff2
Count coordinating and primary bytes as write bytes (#58984)
This is a follow-up to #57573. This commit combines coordinating and
primary bytes under the same "write" bucket. Double accounting is
prevented by only accounting the bytes at either the reroute phase or
the primary phase. TransportBulkAction calls execute directly, so the
operations handler is skipped and the bytes are not double accounted.
2020-07-02 19:48:19 -06:00
Mark Vieira 8fca312a3a
Mute WriteMemoryLimitsIT.testWriteBytesAreIncremented 2020-07-02 16:58:23 -07:00
Tim Brooks 9d1bf383d0
Add test assertions to ensure write bytes released (#58970)
This is a follow-up to #57573. This commit ensures that the bytes marked
in WriteMemoryLimits are released by any test using an internal test
cluster.
2020-07-02 17:38:23 -06:00
Tim Brooks 1ef2cd7f1a
Add memory tracking to queued write operations (#58957)
Currently we do not track the memory consumed by in-process write
operations.

This commit adds a mechanism to track write operation memory usage.
2020-07-02 14:14:57 -06:00
Jim Ferenczi a4e08acdd1 Fix exists query on unmapped field in query_string (#58804)
Since #55785, exists queries rewrite to MatchNoneQueryBuilder when the field is unmapped.
This change also introduced a bug in the `query_string` query: using
`_exists_:foo` throws an exception if the field is unmapped. This commit avoids the
exception if the query is built outside of an `ExistsQueryBuilder`.

Closes #58737
2020-07-02 21:52:03 +02:00
Nhat Nguyen be804b765d
Avoid flipping translog header version (#58866)
An old translog header does not have a checksum. If we flip the header 
version of an empty translog to the older version, then we won't detect
that corruption, and translog will be considered clean as before.

Closes #58671
2020-07-02 14:34:19 -04:00
Tal Levy d516959774
Re-enable support for array-valued geo_shape fields. (#58786) (#58943)
A regression in the mapping code led to geo_shape no longer supporting
array-valued fields. This commit fixes this support and adds an integration
test to make sure this problem does not return!
2020-07-02 11:21:55 -07:00
Ryan Ernst d825d4352c
Eagerly compile condition script at processor creation (#58882)
Ingest script processors were changed to eagerly compile their scripts
when the ingest pipeline is saved, but conditional scripts were missed.
This commit adds eager compilation to ingest conditional scripts, which
will help surface errors before runtime, as well as adds tests for each
case we might encounter between inline and stored script compilation
failures.

closes #58864
2020-07-02 11:10:20 -07:00
Lee Hinman e32623ef52
[7.x] Add test for component templates updated after cluster restart (#58883) (#58914)
This commit adds an integration test that component templates used to form a composite template can
still be updated after a cluster restart.

In #58643 an issue arose where mappings were causing problems because of the way we unwrap `_doc` in
template mappings. This was also related to the mappings being merged manually rather than using the
`MapperService` to do the merging. #58643 was fixed in 7.9 and master with the #58521 change, since
mappings now are read and digested by the actual mapper service.

This test passes for 7.x and master, and I intend to open a separate PR including this test for
7.8.1 along with a bug fix for #58643. This test is to ensure we don't have any regression in the
future.
2020-07-02 08:23:34 -06:00
Armin Braun 62152852dc
Cleanup Duplication in Snapshot ITs (#58818) (#58915)
Just a few obvious static cleanups of duplication to push back against the ever increasing complexity of these tests.
2020-07-02 16:00:01 +02:00
Alan Woodward 0cd1dc3143 Percolator keyword fields should not store norms (#58899)
The refactoring in #57666 inadvertently enabled norms on two of the percolator subfields,
leading to an increase in memory usage. This commit disables norms on these fields again.
2020-07-02 13:59:28 +01:00
Nik Everett 5e49ee800e
Drop rewriting in date_histogram (backport of #57836) (#58875)
The `date_histogram` aggregation had an optimization where it'd rewrite
`time_zones` whose offset from UTC is fixed across the entire index.
This rewrite is no longer needed after #56371 because we can tell that a
time zone is fixed lower down in the aggregation. So this removes it.
2020-07-01 17:19:12 -04:00
Dan Hermann 98a62a6b2d
Make DataStream instances explicitly immutable (#58688) (#58839) 2020-07-01 11:14:01 -05:00
Lee Hinman d3d03fc1c6
[7.x] Add default composable templates for new indexing strategy (#57629) (#58757)
Backports the following commits to 7.x:

    Add default composable templates for new indexing strategy (#57629)
2020-07-01 09:32:32 -06:00
Andrei Dan f7dc09340b
Prohibit custom _routing for index requests targeting a data stream (#58749) (#58831)
This prohibits the use of a custom _routing when the index/bulk requests are targeting a data stream.
Using a custom _routing when targeting a backing index is still permitted.

Relates to #53100

(cherry picked from commit ece6b7a318a8bd3a010499189f31fc5e3a012d4f)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-01 14:54:18 +01:00
Alan Woodward 3ba16e0f39
Move MappedFieldType#getSearchAnalyzer and #getSearchQuoteAnalyzer to TextSearchInfo (#58830)
Analyzers are specific to text searching, and so should be in TextSearchInfo rather than on
the generic MappedFieldType.

Backport of #58639
2020-07-01 14:52:14 +01:00
Przemyslaw Gomulka 2c275913b9
[7.x] Week based parsing for ingest date processor (#58597) (#58802)
Date processor was incorrectly parsing week based dates because when a
week-based year was provided, the ingest module assumed that the date
had no year and tried to apply the logic for dd/MM type of
dates.
The date processor also lets users specify a locale parameter. It
should be taken into account when parsing dates; currently it is only used
for formatting. If someone specifies 'en-us' locale, then calendar data
rules for that locale should be used.
The exception is iso8601 format. If someone is using that format,
then locale should not override calendar data rules.
closes #58479
2020-07-01 15:15:56 +02:00
David Turner 822b7421ce Forbid read-only-allow-delete block in blocks API (#58727)
The read-only-allow-delete block is not really under the user's control
since Elasticsearch adds/removes it automatically. This commit removes
support for it from the new API for adding blocks to indices that was
introduced in #58094.
2020-07-01 13:18:26 +01:00
Martijn van Groningen a0df96befb
Add data stream support to put mapping and update index settings APIs. (#58758)
Backport of #58231 to 7.x branch.

Change update index setting and put mapping api
to execute on all backing indices if data stream is targeted.

Relates #53100
2020-07-01 13:32:21 +02:00
Yannick Welsch 15c85b29fd
Account for recovery throttling when restoring snapshot (#58658) (#58811)
Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account
(i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository
setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a
per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to
configure throttling in a single place.

The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to
`40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`. This means that no behavioral change
will be observed by clusters where the recovery and restore settings were not adapted.

Relates https://github.com/elastic/elasticsearch/issues/57023

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-07-01 12:19:29 +02:00
David Turner 3a234d2669
Account for remaining recovery in disk allocator (#58800)
Today the disk-based shard allocator accounts for incoming shards by
subtracting the estimated size of the incoming shard from the free space on the
node. This is an overly conservative estimate if the incoming shard has almost
finished its recovery since in that case it is already consuming most of the
disk space it needs.

This change adds to the shard stats a measure of how much larger each store is
expected to grow, computed from the ongoing recovery, and uses this to account
for the disk usage of incoming shards more accurately.
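
The difference between the two estimates can be illustrated with a minimal sketch. This is not the allocator's code; the function name and the dictionary keys (`expected_size`, `recovered`) are hypothetical and chosen only for illustration.

```python
def free_after_recoveries(free_bytes, incoming_shards, use_remaining=True):
    """Estimate the free space on a node once all incoming shards have recovered.

    Each incoming shard is a dict with two hypothetical keys:
      expected_size: estimated final size of the shard, in bytes
      recovered:     bytes the ongoing recovery has already written to disk
    """
    reserved = 0
    for shard in incoming_shards:
        if use_remaining:
            # Only the part of the shard not yet written still needs space.
            reserved += max(0, shard["expected_size"] - shard["recovered"])
        else:
            # Conservative estimate: reserve the whole shard size again.
            reserved += shard["expected_size"]
    return free_bytes - reserved


GB = 1024 ** 3
# A 50 GB shard whose recovery has already written 45 GB to disk.
incoming = [{"expected_size": 50 * GB, "recovered": 45 * GB}]
old_estimate = free_after_recoveries(100 * GB, incoming, use_remaining=False)
new_estimate = free_after_recoveries(100 * GB, incoming)
print(old_estimate // GB, new_estimate // GB)
```

In this example the whole-shard estimate leaves 50 GB of usable space, while accounting only for the remaining 5 GB leaves 95 GB.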

Backport of #58029 to 7.x

* Picky picky

* Missing type
2020-07-01 10:12:44 +01:00
Dan Hermann 1c2a726731
Data stream support for search shards API (#58486) (#58765) 2020-06-30 17:59:51 -05:00
Nik Everett 40850a780d
Fail variable_width_histogram that collects from many (#58619) (#58780)
Adds an explicit check to `variable_width_histogram` to stop it from
trying to collect from many buckets because it can't. I tried to make it
do so but that is more than an afternoon's project, sadly. So for now we
just disallow it.

Relates to #42035
2020-06-30 18:26:45 -04:00
Dan Hermann cae49b0fd7
[7.x] Add data stream support to open index API (#58767) 2020-06-30 14:30:32 -05:00
Dan Hermann a84ff81743
Data stream support for get field mappings API (#58488) (#58766) 2020-06-30 13:45:04 -05:00
Martijn van Groningen adcef93a6c
Introduce new put mapping action for dynamic mapping updates. (#58746)
Backport of #58419

Mapping updates that originate from indexing a document with unmapped fields will use this new action
instead of the current put mapping action. This way on the security side, authorization logic
can easily determine whether a mapping update is automatically generated or a mapping update originates
from the put mapping api.

The new auto put mapping action is only used if all nodes are on the version that supports it.
2020-06-30 18:02:31 +02:00
Boice Huang 8c93f4e154 Sort document by internal doc id in FetchPhase to better use LRU cache (#57273)
This change sorts the docIdsToLoad once instead of in each sub-phase.
2020-06-30 17:06:09 +02:00
Julie Tibshirani ab65a57d70
Merge mappings for composable index templates (#58709)
This PR implements recursive mapping merging for composable index templates.

When creating an index, we perform the following:
* Add each component template mapping in order, merging each one in after the
last.
* Merge in the index template mappings (if present).
* Merge in the mappings on the index request itself (if present).

Some principles:
* All 'structural' changes are disallowed (but everything else is fine). An
object mapper can never be changed between `type: object` and `type: nested`. A
field mapper can never be changed to an object mapper, and vice versa.
* Generally, each section is merged recursively. This includes `object`
mappings, as well as root options like `dynamic_templates` and `meta`. Once we
reach 'leaf components' like field definitions, they always overwrite an
existing one instead of being merged.
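
The merging principles above can be sketched roughly as follows. This is a simplified illustration over plain dictionaries, not the `MapperService` logic; the helper names are hypothetical and only the `properties` section is handled.

```python
def is_object(mapper):
    # Treat anything with sub-properties, or an explicit object/nested type, as an object mapper.
    return "properties" in mapper or mapper.get("type") in ("object", "nested")


def object_kind(mapper):
    # An object mapper without an explicit type is a plain `object`.
    return mapper.get("type", "object")


def merge_properties(base, update):
    """Merge the `properties` of `update` into a copy of `base`."""
    merged = dict(base)
    for name, new in update.items():
        old = merged.get(name)
        if old is None:
            merged[name] = new
        elif is_object(old) != is_object(new):
            # Structural change: field mapper <-> object mapper.
            raise ValueError(f"cannot change [{name}] between field and object")
        elif is_object(new):
            if object_kind(old) != object_kind(new):
                # Structural change: object <-> nested.
                raise ValueError(f"cannot change [{name}] between object and nested")
            combined = {**old, **new}
            combined["properties"] = merge_properties(
                old.get("properties", {}), new.get("properties", {})
            )
            merged[name] = combined
        else:
            # Leaf field definitions overwrite the existing one; they are not merged.
            merged[name] = new
    return merged


component = {
    "host": {"properties": {"name": {"type": "keyword"}}},
    "message": {"type": "text", "analyzer": "simple"},
}
template = {
    "host": {"properties": {"ip": {"type": "ip"}}},
    "message": {"type": "keyword"},
}
result = merge_properties(component, template)
print(result)
```

Here `host` ends up with both `name` and `ip`, while `message` is replaced wholesale, so the `analyzer` from the component template does not survive.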

Relates to #53101.
2020-06-30 08:01:37 -07:00
Armin Braun b52a764143
Fix NPE in SnapshotService CS Application (#58680) (#58735)
In the unlikely corner case of deleting a relocating (hence `WAITING`) primary shard's
index during a partial snapshot, we would throw an NPE when checking if there are any external
changes to process.
2020-06-30 15:20:49 +02:00
Yannick Welsch b885cbff1a
Add index block api (#58716)
Adds an API for putting an index block in place, which also ensures for write blocks that, once successfully returning to
the user, all shards of the index are properly accounting for the block, for example that all in-flight writes to an index have
been completed after adding the write block.

This API allows coordinating more complex workflows, where it is crucial that an index is no longer receiving writes after
the API completes, useful for example when marking an index as read-only during an upgrade in order to reindex its
documents.
2020-06-30 14:06:52 +02:00
Patrick Jiang(白泽) be20aacec3 Add `matchBoolPrefix` method to QueryBuilders (#58637) 2020-06-29 16:30:40 +02:00
Armin Braun 95d85f29f8
Fix Snapshots Capturing Incomplete Datastreams (#58630) (#58656)
Only snapshot datastreams that are recorded in `SnapshotInfo` and clean those
that aren't from the snapshotted metadata.
Do not restore all datastreams by default when restoring global metadata, use the same
mechanics used for indices here.

Closes #58544
2020-06-29 12:51:40 +02:00
Armin Braun 4f2f257b12
Fix DataStream Handling on Restore of Global Metadata (#58631) (#58649)
When restoring a global metadata snapshot we were overwriting the correctly
adjusted data streams in the metadata when looping over all custom values.

Closes #58496
2020-06-29 10:58:41 +02:00
Yang Wang 61fa7f4d22
Change privilege of enrich stats API to monitor (#52027) (#52196)
The remote_monitoring_user user needs to access the enrich stats API.
But the request is denied because the API is categorized under admin.
The correct privilege should be monitor.
2020-06-29 10:25:33 +10:00
Ryan Ernst 08e75abd4e
Always add Java-9 style file permissions (#46050) (#58628)
Java 9 removed pathname canonicalization, which means that we need to
add permissions for the path and also the real path when adding file
permissions. Since master requires a minimum runtime of JDK 11, we no
longer need conditional logic here to apply this pathname
canonicalization with our bare hands. This commit removes that
conditional pathname canonicalization.

Co-authored-by: Jason Tedor <jason@tedor.me>
2020-06-26 18:19:07 -07:00
Nik Everett 67e9d39932
Remove useless aggregation helper (#58571) (#58578)
`descendsFromBucketAggregator` was important before we removed
`asMultiBucketAggregator` but now that it is gone
`collectsFromSingleBucket` is good enough.

Relates to #56487

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-06-26 15:58:44 -04:00
Tanguy Leroux 775fb5d4cf
Allows SparseFileTracker to progressively execute listeners during Gap processing (#58477) (#58584)
Today SparseFileTracker allows waiting for a range to become available
before executing a given listener. In the case of searchable snapshot,
we'd like to be able to wait for a large range to be filled (ie, downloaded
and written to disk) while being able to execute the listener as soon as
a smaller range is available.

This pull request is an extract from #58164 which introduces a
ProgressListenableActionFuture that is used internally by
SparseFileTracker. The progressive listenable future allows registering
listeners attached to SparseFileTracker.Gap so that they are executed
once the Gap is completed (with success or failure) or as soon as the
Gap progress reaches a given progress value. This progress value is
defined when the tracker.waitForRange() method is called; this method
has been modified to accept a range and another listener's range to
operate on.
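
The idea of listeners that fire either on completion or once a progress value is reached can be sketched as below. This is a single-threaded toy, not the `ProgressListenableActionFuture` class; the class and method names are hypothetical, and failure handling is left out.

```python
class ProgressFuture:
    """Toy future over a byte range [start, end) that reports progress."""

    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.progress = start
        self.done = False
        self._waiting = []  # list of (threshold, callback)

    def add_listener(self, callback, threshold):
        # Fire immediately if the requested progress has already been reached.
        if self.done or self.progress >= threshold:
            callback(self.progress)
        else:
            self._waiting.append((threshold, callback))

    def on_progress(self, value):
        if self.done or not (self.start <= value <= self.end):
            raise ValueError("invalid progress update")
        self.progress = max(self.progress, value)
        self._fire(lambda threshold: threshold <= self.progress)

    def on_complete(self):
        self.done = True
        self.progress = self.end
        self._fire(lambda threshold: True)

    def _fire(self, ready):
        # Detach the list first so callbacks may register new listeners safely.
        pending, self._waiting = self._waiting, []
        for threshold, callback in pending:
            if ready(threshold):
                callback(self.progress)
            else:
                self._waiting.append((threshold, callback))


events = []
gap = ProgressFuture(0, 100)
gap.add_listener(lambda p: events.append(("small range", p)), threshold=10)
gap.add_listener(lambda p: events.append(("whole range", p)), threshold=100)
gap.on_progress(40)   # enough for the small range only
gap.on_complete()     # releases everything still waiting
print(events)
```

The listener that only needs the first 10 bytes runs as soon as progress passes that point, while the listener for the whole range waits for completion.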

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-06-26 18:26:20 +02:00
Armin Braun 090211f768
Fix Incorrect Snapshot Shard Status for DONE Shards in Running Snapshots (#58390) (#58593)
Minor bugs/inconsistencies:

If a shard hasn't changed at all we were reporting `0` for total size and total file count
while it was ongoing.

If a data node restarts/drops out during snapshot creation the fallback logic did not load the correct statistic from the repository but just created a status with `0` counts from the snapshot state in the CS. Added a fallback to reading from the repository in this case.
2020-06-26 16:11:30 +02:00
Howard eaa60b7c54 [Docs] Fix return tuple element order (#58463) 2020-06-26 12:24:54 +02:00
Nik Everett 5f52bc4c9f
Fix two scripted_metric bugs (backport of #58547) (#58565)
Fixes two bugs introduced by #57627:
1. We were not properly letting go of memory from the request breaker
   when the aggregation finished.
2. We no longer supported totally arbitrary stuff produced by the init
   script because we *assumed* that it'd be ok to run the script once
   and clone its results. Sadly, cloning can't clone *anything* that the
   init script can make, like `String` arrays. This runs the init script
   once for every new bucket so we don't need to clone.
2020-06-25 16:16:10 -04:00
Armin Braun 468e559ff7
Fix Memory Leak From Master Failover During Snapshot (#58511) (#58560)
If we failed over while the data nodes were doing their work
we would never resolve the listener and leak it.
This change fails all listeners if master fails over.
2020-06-25 20:43:08 +02:00
Henning Andersen 38be2812b1
Enhance extensible plugin (#58542)
Rather than let ExtensiblePlugins know extending plugins' classloaders,
we now pass along an explicit ExtensionLoader that loads the extensions
asked for. Extensions constructed that way can optionally receive their
own Plugin instance in the constructor.
2020-06-25 20:37:56 +02:00
Jason Tedor 52ad5842a9
Introduce node.roles setting (#58512)
Today we have individual settings for configuring node roles such as
node.data and node.master. Additionally, roles are pluggable and we have
used this to introduce roles such as node.ml and node.voting_only. As
the number of roles is growing, managing these becomes harder for the
user. For example, to create a master-only node, today a user has to
configure:
 - node.data: false
 - node.ingest: false
 - node.remote_cluster_client: false
 - node.ml: false

at a minimum if they are relying on defaults, but also add:
 - node.master: true
 - node.transform: false
 - node.voting_only: false

if they want to be explicit. This is also challenging in cases where a
user wants to configure a coordinating-only node which requires
disabling all roles, a list which we are adding to, requiring the user
to keep checking whether a node has acquired any of these roles.

This commit addresses this by adding a list setting node.roles for which
a user has explicit control over the list of roles that a node has. If
the setting is configured, the node has exactly the roles in the list,
and not any additional roles. This means to configure a master-only
node, the setting is merely 'node.roles: [master]', and to configure a
coordinating-only node, the setting is merely: 'node.roles: []'.
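
The contrast between the two styles can be sketched as follows. This is an illustration only, not the settings code: the function name is hypothetical, and the default table is a simplified assumption read off the example above rather than the actual per-version defaults.

```python
# Assumed defaults for the legacy boolean settings (simplified for illustration).
LEGACY_DEFAULTS = {
    "master": True,
    "data": True,
    "ingest": True,
    "remote_cluster_client": True,
    "ml": True,
    "transform": False,
    "voting_only": False,
}


def effective_roles(settings):
    """Return the set of roles a node would have under the given settings."""
    if "node.roles" in settings:
        # With node.roles, the node has exactly the listed roles and nothing else.
        return set(settings["node.roles"])
    # Legacy style: every role has its own boolean setting with its own default.
    return {
        role
        for role, default in LEGACY_DEFAULTS.items()
        if settings.get("node." + role, default)
    }


legacy_master_only = {
    "node.data": False,
    "node.ingest": False,
    "node.remote_cluster_client": False,
    "node.ml": False,
}
print(effective_roles(legacy_master_only))
print(effective_roles({"node.roles": ["master"]}))
print(effective_roles({"node.roles": []}))
```

With the list setting, adding a new pluggable role to the defaults table would not change the result for a node that sets `node.roles` explicitly.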

With this change we deprecate the existing 'node.*' settings such as
'node.data'.
2020-06-25 14:14:51 -04:00
Igor Motov 20af856abd
[7.x] EQL: Adds an ability to execute an asynchronous EQL search (#58192)
Adds async support to EQL searches

Closes #49638

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-06-25 14:11:57 -04:00
Jim Ferenczi 6451187e84 Filter empty fields in SearchHit#toXContent (#58418)
This commit restores the filtering of empty fields during the
xcontent serialization of SearchHit. The filtering was removed
unintentionally in #41656.
2020-06-25 17:49:03 +02:00
Nik Everett 03e6d1b535
Add Variable Width Histogram Aggregation (backport of #42035) (#58440)
Implements a new histogram aggregation called `variable_width_histogram` which
dynamically determines bucket intervals based on document groupings. These
groups are determined by running a one-pass clustering algorithm on each shard
and then reducing each shard's clusters using an agglomerative
clustering algorithm.

This PR addresses #9572.

The shard-level clustering is done in one pass to minimize memory overhead. The
algorithm was lightly inspired by
[this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches
a small number of documents to sample the data and determine initial clusters.
Subsequent documents are then placed into one of these clusters, or a new one
if they are an outlier. This algorithm is described in more detail in the
aggregation's docs.

At reduce time, a
[hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering)
algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304)
continually merges the closest buckets from all shards (based on their
centroids) until the target number of buckets is reached.
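
A rough sketch of this reduce step, assuming one-dimensional buckets described by a centroid, a document count, and min/max bounds. It is not the aggregation's implementation: the dictionary layout and function name are hypothetical, and the count-weighted centroid update is an assumption.

```python
def reduce_buckets(buckets, target):
    """Merge the two closest buckets (by centroid) until `target` buckets remain."""
    buckets = sorted(buckets, key=lambda b: b["centroid"])
    while len(buckets) > target:
        # In one dimension the closest pair is always adjacent in sorted order.
        i = min(
            range(len(buckets) - 1),
            key=lambda k: buckets[k + 1]["centroid"] - buckets[k]["centroid"],
        )
        a, b = buckets[i], buckets[i + 1]
        count = a["count"] + b["count"]
        merged = {
            # Count-weighted mean keeps the centroid between the two originals.
            "centroid": (a["centroid"] * a["count"] + b["centroid"] * b["count"]) / count,
            "count": count,
            "min": min(a["min"], b["min"]),
            "max": max(a["max"], b["max"]),
        }
        buckets[i:i + 2] = [merged]
    return buckets


shard_buckets = [
    {"centroid": 1.0, "count": 1, "min": 1.0, "max": 1.0},
    {"centroid": 10.0, "count": 1, "min": 10.0, "max": 10.0},
    {"centroid": 2.0, "count": 3, "min": 1.5, "max": 2.5},
    {"centroid": 11.0, "count": 1, "min": 11.0, "max": 11.0},
]
reduced = reduce_buckets(shard_buckets, target=2)
print(reduced)
```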

The final values produced by this aggregation are approximate. Each bucket's
min value is used as its key in the histogram. Furthermore, buckets are merged
based on their centroids and not their bounds. So it is possible that adjacent
buckets will overlap after reduction. Because each bucket's key is its min,
this overlap is not shown in the final histogram. However, when such overlap
occurs, we set the key of the bucket with the larger centroid to the midpoint
between its minimum and the smaller bucket’s maximum:
`min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to
increase the accuracy of the clustering.
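
As a small worked example of the key adjustment, assuming buckets are already sorted by centroid and carry hypothetical `min` and `max` fields:

```python
def bucket_keys(buckets):
    """Return the histogram key for each bucket, adjusting keys where neighbours overlap."""
    keys = []
    for i, bucket in enumerate(buckets):
        key = bucket["min"]
        if i > 0 and buckets[i - 1]["max"] > bucket["min"]:
            # Overlap: use the midpoint between this bucket's min and the smaller bucket's max.
            key = (bucket["min"] + buckets[i - 1]["max"]) / 2
        keys.append(key)
    return keys


keys = bucket_keys([
    {"min": 0, "max": 5},
    {"min": 4, "max": 9},    # overlaps the previous bucket on [4, 5]
    {"min": 12, "max": 15},  # no overlap, key stays at its min
])
print(keys)
```

The second bucket's key moves from 4 to 4.5; the other keys are unchanged.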

Nodes are unable to share centroids during the shard-level clustering phase. In
the future, resolving https://github.com/elastic/elasticsearch/issues/50863
would let us solve this issue.

It doesn’t make sense for this aggregation to support the `min_doc_count`
parameter, since clusters are determined dynamically. The `order` parameter is
not supported here to keep this large PR from becoming too complex.

Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>
2020-06-25 11:40:47 -04:00
Nik Everett c7726cc93e Fix janky test
Fixes a test that incorrectly assumed that a list of random values less
than or equal to `n` always contained `n`. Oops.

Closes #58353
2020-06-25 11:13:29 -04:00
Nik Everett 71adade73a
Return clear error message if aggregation type is invalid (#58255) (#58365)
The main changes are:

1. Catch the `NamedObjectNotFoundException` when parsing aggregation
   type, and then throw a `ParsingException` with a clear error message and a hint.
2. Add a unit test method: AggregatorFactoriesTests#testInvalidType().

Closes #58146.

Co-authored-by: bellengao <gbl_long@163.com>
2020-06-25 11:08:25 -04:00
David Roberts 1742b1c39e Cancel persistent task recheck when no longer master (#58539)
If a persistent task cannot be assigned on the first attempt
then the master node will schedule periodic rechecks to see
if the assignment requirements have been met.

These periodic rechecks should be cancelled if the node ceases
to be master.  Previously they weren't, leading to exceptions
being logged repeatedly.  This PR cancels the rechecks on
learning that the node is no longer the master.

Fixes #58531
2020-06-25 15:51:57 +01:00
Nik Everett 335505c4e1
Drop deprecated aggregator wrapper (backport of #58367) (#58448)
This drops the deprecated and now unused `asMultiBucketAggregator`. It
was too easy to use it to make inefficient `Aggregators`.

Relates to #56487
2020-06-25 09:31:19 -04:00
Julie Tibshirani 1f2e05c947
Simplify mapping validation for resizing indices. (#58514)
When creating a target index from a source index, we don't allow for target
mappings to be specified. This PR simplifies the check that the target mappings
are empty.

This refactor will help when implementing composable template merging, since we
no longer need to resolve + check the target mappings when creating an index
from a template.
2020-06-24 14:07:19 -07:00
Armin Braun 9e4c5d1dde
Cleaner Handling of Snapshot Related null Custom Values in CS (#58382) (#58501)
Add the ability to get a custom value while specifying a default and use it throughout the
codebase to get rid of the `null` edge case and shorten the code a little.
2020-06-24 17:24:44 +02:00
Benjamin Trent fa88e71532
[ML] unify usages of _all and wildcard <*> (#58460) (#58494) 2020-06-24 09:47:57 -04:00
markharwood d5ac3bb87f
Field capabilities - make `keyword` a family of field types (#58315) (#58483)
Introduces a new method on `MappedFieldType` to return a family type name which defaults to the field type.
Changes `wildcard` and `constant_keyword` field types to return `keyword` for field capabilities.

Relates to #53175
2020-06-24 12:32:14 +01:00
Jim Ferenczi ec8d5ec79c Fix handling of terminate_after when size is 0 (#58212)
`terminate_after` is ignored on search requests that don't return top hits (`size` set to 0)
and do not track the number of hits accurately (`track_total_hits`).
We use early termination when the number of hits to track is reached during collection
but this breaks the hard termination of `terminate_after` if it happens before we reached
the `terminate_after` value.
This change ensures that we continue to check `terminate_after` even if the tracking of total
hits has reached the provided value.

Closes #57624
2020-06-24 13:16:11 +02:00
David Turner 796cb9e9ca Reword INDEX_READ_ONLY_ALLOW_DELETE_BLOCK message (#58410)
Users are perennially confused by the message they get when writing to
an index is blocked due to excessive disk usage:

    TOO_MANY_REQUESTS/12/index read-only / allow delete (api)

Of course this is technically accurate but it is hard to join the dots
from this message to "your disk was too full" without some searching of
forums and documentation. Additionally in #50166 we changed the status
code to today's `429` from the previous `403` which changed the message
from the one that's widely documented elsewhere:

    FORBIDDEN/12/index read-only / allow delete (api)

Since #42559 we've considered this block to be under the sole control of
the disk-based shard allocator, and we have seen no evidence to suggest
that anyone is applying this block manually. Therefore this commit
adjusts this block's message to indicate that it's caused by a lack of
disk space.
2020-06-24 10:22:11 +01:00
Alan Woodward d251a482e9 Move MappedFieldType.similarity() to TextSearchInfo (#58439)
Similarities only apply to a few text-based field types, but are currently set directly on
the base MappedFieldType class. This commit moves similarity information into
TextSearchInfo, and removes any mentions of it from MappedFieldType or FieldMapper.

It was previously possible to include a similarity parameter on a number of field types
that would then ignore this information. To make it obvious that this has no effect, setting
this parameter on non-text field types now issues a deprecation warning.
2020-06-24 10:00:32 +01:00
Ryan Ernst 89c03e593c
Create utility for custom config setup in packaging tests (#58352)
This commit creates a shared withCustomConfig method that may be used by
any packaging test. The method will copy the config directory and
override the conf path appropriately depending on the distribution type.
2020-06-23 15:12:22 -07:00
Dan Hermann b40c27698f
Fix incorrect stats warning when swap is disabled 2020-06-23 14:34:27 -05:00
James Rodewig affc3954e6
[DOCS] Fix typo in RoutingNode comment (#58079) (#58454)
Co-authored-by: Howard <danielhuang@tencent.com>
2020-06-23 13:07:08 -04:00
Christoph Büscher 642b05a511
Fix test failure in RangeQueryBuilderTests.testToQuery (#58449)
Very rarely this test can fail if we draw a random TimeZone id that we cannot
parse with the legacy joda DateMathParser and get an IllegalArgumentException.
In addition to a "SystemV/*" time zone we also need an index "versionCreated"
before V_7_0_0 and no "format" setting in the query builder. Given how unlikely
this combination is, we should simply disallow those time zone ids when
generating the random query builder for RangeQueryBuilderTests.

Closes #58431
2020-06-23 17:44:18 +02:00
Mark Tozzi 52806a8f89
Small VS config cleanup (#58294) (#58442) 2020-06-23 10:53:06 -04:00
Alan Woodward 8ebd341710
Add text search information to MappedFieldType (#58230) (#58432)
Now that MappedFieldType no longer extends lucene's FieldType, we need to have a
way of getting the index information about a field necessary for building text queries,
building term vectors, highlighting, etc. This commit introduces a new TextSearchInfo
abstraction that holds this information, and a getTextSearchInfo() method to
MappedFieldType to make it available. Field types that do not support text search can
just return null here.

This allows us to remove the MapperService.getLuceneFieldType() shim method.
2020-06-23 14:37:26 +01:00
Nik Everett 519f41950a
Save memory when significant_text is not on top (#58145) (#58364)
This merges the aggregator for `significant_text` into
`significant_terms`, applying the optimization built in #55873 to save
memory when the aggregation is not on top. The `significant_text`
aggregation is pretty memory intensive all on its own and this doesn't
particularly help with that, but it'll help with the memory usage of any
sub-aggregations.
2020-06-23 09:19:05 -04:00
Dan Hermann 41e8f584c1
[7.x] Minimum node version check before creating data stream (#58424) 2020-06-23 07:45:27 -05:00
Armin Braun 943efb78fd
Save Shard ID Serializations in Bulk Requests (#56209) (#58414)
Just like #56094 but for the request side.
Removes a lot of redundant `ShardId` instances from bulk shard requests as well as stops serializing index names when they're not needed because they're not different from what is in the shard id.

Even ignoring the index name serialization savings here, this change saves one `ShardId` instance per bulk shard request at least. This means it saves approximately:

* 8 bytes for the `ShardId` object (itself + one field)
   * + another 4 bytes for the `int` in the `ShardId`
* 16 bytes (two fields + the instance itself + the padding) for the `Index` object
   * + 30 bytes for the `Index` uuid string
   * + all the bytes in the index name string

=> 60+ bytes per bulk request item saved on heap and over the wire
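
The arithmetic behind this estimate, as a small sketch. The per-object byte counts are the approximations listed above; counting the index name as one byte per UTF-8 byte is a simplifying assumption.

```python
def bytes_saved_per_item(index_name):
    """Approximate bytes saved per bulk item, using the estimates from the list above."""
    shard_id_object = 8    # ShardId object itself plus one field
    shard_id_int = 4       # the int inside ShardId
    index_object = 16      # Index object: two fields, the instance, padding
    index_uuid = 30        # Index uuid string
    # Assumption: count the name as its UTF-8 length, ignoring string overhead.
    index_name_bytes = len(index_name.encode("utf-8"))
    return shard_id_object + shard_id_int + index_object + index_uuid + index_name_bytes


print(bytes_saved_per_item("logs"))
```

The fixed parts add up to 58 bytes, so any index name of two or more characters brings the total to 60 or more.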
2020-06-23 12:35:52 +02:00
David Turner 256b660f0a
Remove anonymous PublicationContext implementation (#58412)
Today the `PublicationContext` interface has a single anonymous
implementation, and `PublicationTransportHandler` has various methods
that take the variables that this anonymous class captures. This commit
refactors this into a proper class with proper fields and moves the
relevant methods onto this class.

Backport of #58405 to 7.x.
2020-06-23 11:13:23 +01:00
Alan Woodward 519d1278e2
Make FieldTypeLookup immutable (#58162) (#58411)
FieldTypeLookup maps field names to their MappedFieldTypes. In the past, due to
the presence of multiple mapping types within a single index, this had to be updated
in-place because a mapping update might only affect one type. However, now that
we only have a single type per index, we can completely rebuild the FieldTypeLookup
on each update, removing lots of concurrency worries.
2020-06-23 10:51:32 +01:00
Martijn van Groningen 7dda9934f9
Keep track of timestamp_field mapping as part of a data stream (#58400)
Backporting #58096 to 7.x branch.
Relates to #53100

* use mapping source directly instead of using mapper service to extract the relevant mapping details
* moved assertion to TimestampField class and added helper method for tests
* Improved logic that inserts timestamp field mapping into a mapping.
If the timestamp field path consisted of object fields and
if the final mapping did not contain the parent field then an error
occurred, because the prior logic assumed that the object field existed.
2020-06-22 17:46:38 +02:00
Przemko Robakowski a44dad9fbb
[7.x] Add support for snapshot and restore to data streams (#57675) (#58371)
* Add support for snapshot and restore to data streams (#57675)

This change adds support for including data streams in snapshots.
Names are provided in indices field (the same way as in other APIs), wildcards are supported.
If rename pattern is specified it renames both data streams and backing indices.
It also adds test to make sure SLM works correctly.

Closes #57127

Relates to #53100

* version fix

* compilation fix

* compilation fix

* remove unused changes

* compilation fix

* test fix
2020-06-19 22:41:51 +02:00
William Brafford b3c99f06d6
Mute flaky test (#58356) 2020-06-18 15:30:11 -04:00
Andrei Dan 30e777856f
[7.x] Validate alias operations don't target data streams (#58327) (#58337)
This adds validation to make sure alias operations (add, remove, remove index)
don't target data streams or the backing indices.

(cherry picked from commit 816448990e464a02f3960f12f6f6644a8cce36a4)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-06-18 20:23:07 +01:00
Stuart Tettemer 20abba8433
Scripting: Deprecate general cache settings (#55753) (#58283)
Backport: ef543b0
2020-06-18 11:54:23 -06:00
Alan Woodward 4b8cf2af6a
Add serialization test for FieldMappers when include_defaults=true (#58235) (#58328)
Fixes a bug in TextFieldMapper serialization when index is false, and adds a
base-class test to ensure that all field mappers are tested against all variations
with defaults both included and excluded.

Fixes #58188
2020-06-18 15:46:04 +01:00
Alan Woodward ca2d12d039 Remove Settings parameter from FieldMapper base class (#58237)
This is currently used to set the indexVersionCreated parameter on FieldMapper.
However, this parameter is only actually used by two implementations, and clutters
the API considerably. We should just remove it, and use it directly in the
implementations that require it.
2020-06-18 12:53:54 +01:00
Rory Hunter 4da767bb3e Fix version 2020-06-18 12:29:47 +01:00
Rory Hunter a71f0cabdc Version bump for 7.8.0 release 2020-06-18 11:04:56 +01:00
Christoph Büscher ba0b046909 Fix test compilation issue 2020-06-18 11:36:11 +02:00
Christoph Büscher 31d8e03954 Prevent BigInteger serialization errors in term queries (#57987)
When a numeric value in e.g. a `term` query doesn't fit into a long, it
currently gets parsed to a BigInteger object, that the various term query
builders store untouched. This leads to serialization errors when these queries
are sent across the wire. Instead we can convert to a string representation
early on, since that is what we store e.g. when indexing big integers into
`keyword` fields anyway.
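
The early conversion described above can be sketched roughly as follows. This is a minimal illustration only: `TermValues` and `normalize` are hypothetical names, not the actual query builder code.

```java
import java.math.BigInteger;

// Hypothetical helper; the class and method names are assumptions for illustration.
final class TermValues {
    private TermValues() {}

    // Replace a BigInteger with its decimal string so that the stored value is a
    // plain type that is easy to serialize; every other value passes through unchanged.
    static Object normalize(Object value) {
        if (value instanceof BigInteger) {
            return value.toString();
        }
        return value;
    }
}
```

Values that fit into a long stay numeric, so only the out-of-range case changes representation.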

Closes #57917
2020-06-18 11:17:12 +02:00
Jim Ferenczi 82db0b575c
Allow index filtering in field capabilities API (#57276) (#58299)
This change allows using an `index_filter` in the
field capabilities API. Indices are filtered from
the response if the provided query rewrites to `match_none`
on every shard:

````
GET metrics-*
{
  "index_filter": {
    "bool": {
      "must": [
        {
          "range": {
            "@timestamp": {
              "gt": "2019"
            }
          }
        }
      ]
    }
  }
}
````

The filtering is done on a best-effort basis: it uses the can match phase
to rewrite queries to `match_none` instead of fully executing the request.
The first shard that can match the filter is used to create the field
capabilities response for the entire index.

Closes #56195
2020-06-18 10:23:26 +02:00
Yannick Welsch ffeff4090e Add new flag to check whether alias exists on remove (#58100)
This allows doing true CAS operations on aliases, making sure that an alias is actually properly
moved from a given source index onto a given target index. This is useful to ensure that an
alias is actually moved from a given index to another one, and not just added to another index.
2020-06-18 10:15:26 +02:00
Mark Vieira ef8899b130
Mute SpanMultiTermQueryBuilderTests.testToQueryInnerTermQuery 2020-06-17 16:27:18 -07:00
Julie Tibshirani b1161cba35 Rename SearchContext#smartNameFieldType. (#58203)
The concept of a 'smart name' doesn't make sense now that there are no mapping
types.
2020-06-17 10:38:32 -07:00
Tim Brooks 2074412d79
Retry failed replication due to transient errors (#56230)
Currently a failed replication action will fail an entire replica. This
includes when replication fails due to potentially short lived transient
issues such as network disruptions or circuit breaking errors.

This commit implements retries using the retryable action.
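
The general idea of retrying only transient failures can be sketched as below. This is a simplified, synchronous illustration under stated assumptions: `TransientRetry`, its parameters, and the doubling backoff are hypothetical and do not reproduce the retryable action used by this commit.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper, not the Elasticsearch RetryableAction.
final class TransientRetry {
    private TransientRetry() {}

    // Runs the action, retrying with a doubling delay while the failure is
    // classified as transient and attempts remain. Other failures are rethrown at once.
    static <T> T call(Supplier<T> action, Predicate<RuntimeException> isTransient,
                      int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts || !isTransient.test(e)) {
                    throw e;
                }
            }
            try {
                Thread.sleep(delay);
            } catch (InterruptedException ie) {
                // Preserve the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting to retry", ie);
            }
            delay *= 2;
        }
    }
}
```

The predicate keeps the classification of "transient" separate from the retry loop, so permanent failures still fail fast.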
2020-06-17 10:17:30 -06:00
Luca Cavanna 5ddea03de7 Remove needless termsQuery implementation from StringFieldType (#57609)
The base class `TermBasedFieldType` already implements exactly the same `termsQuery` method, hence there is no need to override it.
2020-06-17 18:04:49 +02:00
GeChenxin a96f526de1 Add index name to refresh mapping task (#57598) 2020-06-17 10:49:36 -04:00
Armin Braun 41af7f5455
Fix Typo in Snapshot Abort Test (#58238) (#58247)
Forgot the brackets here in #58214 so in the rare case where the
first update seen by the listener doesn't match it will still remove
itself and never be invoked again -> timeout.
2020-06-17 14:53:39 +02:00
Nik Everett ab2c6d9696
Save memory when auto_date_histogram is not on top (backport of #57304) (#58190)
This builds an `auto_date_histogram` aggregator that natively aggregates
from many buckets and uses it when the `auto_date_histogram` used to use
`asMultiBucketAggregator` which should save a significant amount of
memory in those cases. In particular, this happens when
`auto_date_histogram` is a sub-aggregator of a multi-bucketing aggregator
like `terms` or `histogram` or `filters`. For the most part we preserve
the original implementation when `auto_date_histogram` only collects from
a single bucket.

It isn't possible to "just port the aggregator" without taking a pretty
significant performance hit because we used to rewrite all of the
buckets every time we switched to a coarser and coarser rounding
configuration. Without some major surgery to how to delay sub-aggs
we'd end up rewriting the delay list zillions of times if there are many
buckets.

The multi-bucket version of the aggregator has a "budget" of "wasted"
buckets and only rewrites all of the buckets when we exceed that budget.
Now that we don't rebucket every time we increase the rounding we can no
longer get an accurate count of the number of buckets! So instead the
aggregator uses an estimate of the number of buckets to trigger switching
to a coarser rounding. This estimate is likely to be *terrible* when
buckets are far apart compared to the rounding. So it also uses the
difference between the first and last bucket to trigger switching to a
coarser rounding. Which covers for the shortcomings of the bucket
estimation technique pretty well. It also causes the aggregator to emit
fewer buckets in cases where they'd be reduced together on the
coordinating node. This is wonderful! But probably fairly rare.

All of that does buy us some speed improvements when the aggregator is
a child of multi-bucket aggregator:
Without metrics or time zone: 25% faster
With metrics: 15% faster
With time zone: 22% faster

Relates to #56487
2020-06-17 08:48:41 -04:00
Ignacio Vera b6585f2b51
Add new extensions for Lucene86 points codec to FsDirectoryFactory (#58226) (#58233) 2020-06-17 12:55:33 +02:00
Armin Braun 85be78b624
Fix Snapshot Abort Not Waiting for Data Nodes (#58214) (#58228)
This was a really subtle bug that we introduced a long time ago.
If a shard snapshot is in aborted state but hasn't started snapshotting on a node
we can only send the failed notification for it if the shard was actually supposed
to execute on the local node.
Without this fix, if shard snapshots were spread out across at least two data nodes
(so that each data node does not have all the primaries) the abort would actually
never wait on the data nodes. This isn't a big deal with uuid shard generations
but could lead to potential corruption on S3 when using numeric shard generations
(albeit very unlikely now that we have the 3 minute wait there).
Another negative side-effect of this bug was that master would receive a lot more
shard status update messages for aborted shards since each data node not assigned
a primary would send one message for that primary.
2020-06-17 11:39:50 +02:00
Armin Braun c2b416ee31
Fix DanglingIndicesIT Failures from MasterNotDiscoveredException (#58215) (#58221)
The dangling indices action is not a proper master node action so it does not
retry when executed while the cluster hasn't fully formed yet.
Since we use node restarts when setting up the dangling indices state we need
to manually ensure a fully formed cluster before moving on with the tests to avoid
failures.
2020-06-17 10:34:08 +02:00
Stuart Tettemer 01795d1925
Revert "Scripting: Deprecate general cache settings (#55753)" (#58201)
This reverts commit 88e8b34fc2.
2020-06-16 14:58:18 -06:00
Rory Hunter 03369e0980
Implement dangling indices API (#58176)
Backport of #50920. Part of #48366. Implement an API for listing,
importing and deleting dangling indices.

Co-authored-by: David Turner <david.turner@elastic.co>
2020-06-16 21:50:38 +01:00
Stuart Tettemer 88e8b34fc2
Scripting: Deprecate general cache settings (#55753)
Backport: ef543b0
2020-06-16 13:06:59 -06:00
Alan Woodward c6acc7c976 Correctly deal with aliases when retrieving lucene FieldType 2020-06-16 18:06:37 +01:00
Alan Woodward 12a3f6dfca
MappedFieldType should not extend FieldType (#58160)
MappedFieldType is a combination of two concerns:

* an extension of lucene's FieldType, defining how a field should be indexed
* a set of query factory methods, defining how a field should be searched

We want to break these two concerns apart. This commit is a first step to doing this, breaking
the inheritance relationship between MappedFieldType and FieldType. MappedFieldType
instead has a series of boolean flags defining whether or not the field is searchable or
aggregatable, and FieldMapper has a separate FieldType passed to its constructor defining
how indexing should be done.

Relates to #56814
2020-06-16 16:56:43 +01:00
Dan Hermann 911d46370e
Prohibit clone, shrink, and split on a data stream's write index 2020-06-16 10:53:20 -05:00
Lee Hinman 03ce0f8a4d
[7.x] Normalized prefix for rollover API (#57271) (69e1c066) (#58171)
* Normalized prefix for rollover API (#57271)

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Lee Hinman <lee@writequit.org>

It fixes the issue #53388
by normalizing prefix at index creation request itself

* Fix compilation for backport

Co-authored-by: Gaurav Chandani <chngau@amazon.com>
2020-06-16 09:22:10 -06:00
Francisco Fernández Castaño a5bc5ae030
Don't log on RetentionLeaseSync error handler (#58157)
After an index has been deleted it may take some time to cancel all the
maintenance tasks such as RetentionLeaseSync, it's possible that the
task is already executing before the cancellation. This commit just
avoids logging a warning message for those scenarios.

Closes #57864

Backport of (#58098)
2020-06-16 14:04:32 +02:00
Yannick Welsch e046b0a8fa Fix realtime get of numeric fields (#58121)
Using realtime get on numeric fields when reading from the translog would yield a ClassCastException.

Closes #57462
2020-06-16 09:16:26 +02:00
Tal Levy 69d5e044af
Add optional description parameter to ingest processors. (#57906) (#58152)
This commit adds an optional field, `description`, to all ingest processors
so that users can explain the purpose of the specific processor instance.

Closes #56000.
2020-06-15 19:27:57 -07:00
Stuart Tettemer 71a42dbde9
[7.x] Rely on the computeIfAbsent logic to prevent duplicated compilation of scripts (#55467) (#58123)
Instead of serializing compilation using a plain lock / mutex combined with a double check, rely on the computeIfAbsent logic to prevent duplicated compilation of scripts. Made checkCompilationLimit to be thread-safe and lock free.
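
The pattern can be illustrated with a small sketch built on `ConcurrentHashMap.computeIfAbsent`, which applies the mapping function at most once per key. `CompileCache` and its counter are hypothetical names used only for this example; the real script cache also handles eviction and compilation limits, which are left out here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache, for illustration only.
final class CompileCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final AtomicInteger compilations = new AtomicInteger();
    private final Function<K, V> compiler;

    CompileCache(Function<K, V> compiler) {
        this.compiler = compiler;
    }

    // computeIfAbsent runs the compiler at most once per key, even when several
    // threads ask for the same key concurrently, so no explicit lock is needed.
    V get(K source) {
        return cache.computeIfAbsent(source, key -> {
            compilations.incrementAndGet();
            return compiler.apply(key);
        });
    }

    // Number of times the compiler actually ran.
    int compilations() {
        return compilations.get();
    }
}
```

Compared with a lock plus double check, the map itself guarantees that concurrent callers for the same key wait for a single compilation.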

Backport: 865acad

Co-authored-by: Michael Bischoff <michael.bischoff@elastic.co>
2020-06-15 12:01:22 -06:00
markharwood 03dd73dc0d
Fix for wildcard fields that returned ByteRefs not Strings to scripts. (#58060) (#58109)
This needs some reorg of BinaryDV field data classes to allow specialisation of scripted doc values.
Moved common logic to a new abstract base class and added a new subclass to return string-based representations to scripts.

Closes #58044
2020-06-15 14:52:56 +01:00
Dan Hermann 8a910443c4
Add ignore_empty_value parameter in set ingest processor (#57030) (#58108) 2020-06-15 08:35:08 -05:00
Armin Braun 1a48983a56
Fix Running TranslogOps on CS Thread (#58056) (#58076)
We should fork off from the CS thread to run this even if it's a rare
condition.
2020-06-13 17:00:49 +02:00
Nik Everett a5571eb1a8
Save memory when rare_terms is not on top (backport of #57948) (#58069)
This uses the optimization that we started making in #55873 for
`rare_terms` to save a bit of memory when that aggregation is not on the
top level.
2020-06-12 17:47:10 -04:00
Dan Hermann 17f3318732
[7.x] Resolve index API (#58037) 2020-06-12 15:41:32 -05:00
Mayya Sharipova 8bd0147ba7
Correct how meta-field is defined for pre 7.8 hits (#57951)
We keep a static list of meta-fields: META_FIELDS_BEFORE_7_8
as it was before.
This is done to ensure backwards compatibility with pre 7.8 nodes.

Closes #57831
2020-06-12 09:39:53 -04:00
Armin Braun 5662281562
Fix ExtraFS Breaking SharedClusterSnapshotRestoreIT (#58026) (#58040)
If `ExtraFS` decides to put `extra0/0` into the indices folder
then the previous logic in this test would have interpreted the `0`
as shard `0` of index `extra0` and fail to list its contents (since it's a file
and not an actual shard directory).

=> simplified the logic to use actually referenced `IndexId` for iterating over indices
instead.
2020-06-12 15:27:48 +02:00
Martijn van Groningen 01d8bb8cfa
Enforce valid field mapping exists for timestamp_field in templates. (#58036)
Backport of #57741 to 7.x branch.

Relates to #53100
2020-06-12 15:24:42 +02:00
Armin Braun a5a251d8c0
Handle Rejections when Scheduling RetryableAction (#58033) (#58039)
Scheduling on the threadpool will throw if the scheduler is already
shut down. Handled by treating the rejection like any other non-retryable
exception.

Closes #58021
2020-06-12 15:23:02 +02:00
Nik Everett d6c8d9415d
Give significance lookups their own home (backport of #57903) (#57959)
This moves the code to look up significance heuristics information like
background frequency and superset size out of
`SignificantTermsAggregatorFactory` and into its own home so that it is
easier to pass around. This will:
1. Make us feel better about ourselves for not passing around the
   factory, which is really *supposed* to be a throw away thing.
2. Abstract the significance lookup logic so we can reuse it for the
   `significant_text` aggregation.
3. Make it very simple to cache the background frequencies which should
   speed up when the agg is a sub-agg. We had done this for numerics
   but not string-shaped significant terms.
2020-06-12 09:21:19 -04:00
Martijn van Groningen f4199f2ee0
Prohibit append-only writes targeting backing indices directly. (#58025)
Backport of #57788 to 7.x branch.

Append-only writes can only target the corresponding data stream.

Relates to #53100
2020-06-12 13:17:55 +02:00
Armin Braun db03e7c93b
Exclude WindowsFS from SharedClusterSnapshotRestoreIT (#58020) (#58023)
Same as #52488 but for a different test suite

Closes #58019
2020-06-12 10:49:03 +02:00
Mark Tozzi 36f551bdb4
Make ValuesSourceConfig behave like a config object (#57762) (#58012) 2020-06-11 17:23:55 -04:00
Igor Motov 5138c0c045
Fix missing null values for std_deviation_bounds in ext. stats aggs (#58000)
Adds missing null values for std_deviation_bounds in extended stats aggs and
improves null handling in parsed extended stats.
2020-06-11 16:23:20 -04:00
Lee Hinman ffc3c77f75
[7.x] Disallow deletion of composable template if in use by data stream (#57957) (#57994)
Backports the following commits to 7.x:

    Disallow deletion of composable template if in use by data stream (#57957)
2020-06-11 13:51:56 -06:00
Jim Ferenczi 4c6bfe32a7 Fix possible NPE on search phase failure (#57952)
When a search phase fails, we release the context of all successful shards.
Successful shards that rewrite the request to match none will not create any context
since #. This change ensures that we don't try to release a `null` context on these
successful shards.

Closes #57945
2020-06-11 18:54:16 +02:00
Yannick Welsch 85b0b540f0 Fix refresh behavior in MockDiskUsagesIT (#57926)
Ensures that InternalClusterInfoService's internally cached stats are refreshed whenever the
shard size or disk usage function (to mock out disk usage) are overridden.

Closes #57888
2020-06-11 17:38:12 +02:00
David Turner f950c121bb Hide AlreadyClosedException on IndexCommit release (#57986)
Today `InternalEngine#releaseIndexCommit` fails with an
`AlreadyClosedException` if the engine is closed before the index commit is
released. This can happen if, for example, a node leaves and rejoins the
cluster and acquires an index commit for replica shard allocation concurrently
with shutting the shard down.

There's no need to fail the operation like this: if the engine is shut down
then we will clean up the unreferenced files when it's restarted (or if it's
allocated elsewhere) so we can suppress an `AlreadyClosedException` in this
case. This commit does so.

Fixes #57797
2020-06-11 15:41:50 +01:00
Alan Woodward 16e230dcb8 Update to lucene snapshot e7c625430ed (#57981)
Includes LUCENE-9148 and LUCENE-9398, which splits the BKD metadata, index and data into separate files and keeps the index off-heap.
2020-06-11 14:51:53 +01:00
Yannick Welsch 34fc52dbf3 Fix PersistedClusterStateServiceTests.testSlowLogging (#57971)
The range in the last writeDurationMillis selection could be empty, as it could prior to the call be set to 1.
2020-06-11 15:47:34 +02:00
Igor Motov 947573f309
Added standard deviation / variance sampling to extended stats (#49782) (#57947)
Per 49554 I added standard deviation sampling and variance sampling to the extended stats interface.
 
Closes #49554

Co-authored-by: Igor Motov <igor@motovs.org>

Co-authored-by: andrewjohnson2 <aj114114@gmail.com>
2020-06-11 09:19:44 -04:00
Nik Everett da72a3a51d
Speed up reducing auto_date_histo with a time zone (backport of #57933) (#57958)
When reducing `auto_date_histogram` we were using `Rounding#round`
which is quite a bit more expensive than
```
Rounding.Prepared prepared = rounding.prepare(min, max);
long result = prepared.round(date);
```
when rounding to a non-fixed time zone like `America/New_York`. This
stops using the former and starts using the latter.

Relates to #56124
2020-06-11 09:15:12 -04:00
Albert Zaharovits c57ccd99f7
Just log 401 stacktraces (#55774)
Ensure stacktraces of 401 errors for unauthenticated users are logged
but not returned in the response body.
2020-06-10 20:39:32 +03:00
Armin Braun 85f5c4192b
Improve Test Coverage for Old Repository Metadata Formats (#57915) (#57922)
Use the hack used in `CorruptedBlobStoreRepositoryIT` in more snapshot
failure tests to verify that BwC repository metadata is handled properly
in these so far not-test-covered scenarios.
Also, some minor related dry-up of snapshot tests.

Relates #57798
2020-06-10 13:27:01 +02:00
Yannick Welsch 80f221e920
Use clean thread context for transport and applier service (#57792) (#57914)
Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and
also that thread contexts are not leaked). Moves the ClusterApplierService to use the system
context (same as we do for MasterService), which allows removing a hack from
TemplateUpgradeService and makes it clearer that applying CS updates is fully executing under
system context.
2020-06-10 10:30:28 +02:00
Armin Braun fe85bdbe6f
Fix Remote Recovery Being Retried for Removed Nodes (#57608) (#57913)
If a node is disconnected we retry. It does not make sense
to retry the recovery if the node is removed from the cluster though.
=> added a CS listener that cancels the recovery for removed nodes

Also, we were running the retry on the `SAME` pool which for each retry will
be the scheduler pool. Since the error path of the listener we use here
will do blocking operations when closing the resources used by the recovery
we can't use the `SAME` pool here since not all exceptions go to the `ActionListenerResponseHandler`
threading like e.g. `NodeNotConnectedException`.

Closes #57585
2020-06-10 09:41:52 +02:00
Armin Braun d579420452
Stop Serializing Exceptions in SnapshotInfo (#57866) (#57898)
In ff9e8c622427d42a2d87b4ceb298d043ae3c4e6a we changed the format
used when serializing snapshot failures in the cluster state and
`SnapshotInfo`. This turned them from a short string holding all the
nested exception messages into a multi kb stacktrace in many cases.
This is not great if you snapshot a large number of shards that all fail
for example and massively blows up the size of the GET snapshots response
if there are snapshots with failures in there.
This change reverts to the format used for exceptions before the above commit.

Also, this change short circuits logging and serialization of the failure
for an aborted snapshot where we don't care about the specific message at all
and aligns the message to "aborted" in all cases (currently, if we aborted before any IO,
it would have been "aborted" and an exception when aborting later during IO).
2020-06-10 08:41:03 +02:00
Gordon Brown aab6317260
[7.x] Include hidden indices in snapshots by default (#57325)
Previously, hidden indices were not included in snapshots by default, unless
specified using one of the usual methods for doing so: naming indices directly,
using index patterns starting with a ., or specifying expand_wildcards to
a value that includes hidden (e.g. all or hidden,open).

This commit changes the default expand_wildcards value to include hidden
indices.
2020-06-09 16:01:52 -06:00
Yannick Welsch 9eec819c5b Revert "Use clean thread context for transport and applier service (#57792)"
This reverts commit 259be236cf.
2020-06-09 22:24:54 +02:00
Yannick Welsch 8199956937 Revert "Assert on request headers only (#57792)"
This reverts commit b5d3565214.
2020-06-09 22:24:35 +02:00
Henning Andersen 1e8e115ae1 Rollover avoid heavy lifting in dry-run/validation (#57894)
Fixed two newly introduced issues with rollover:
1. Using auto-expand replicas, rollover could result in unexpected log
messages on future indexes.
2. It did a reroute and other heavy work on the network thread.

Closes #57706
Supersedes #57865
Relates #53965
2020-06-09 22:07:30 +02:00
Jake Landis fff0a106c9
[7.x] Support `if_seq_no` and `if_primary_term` for ingest (#55430) (#57768)
Allow for optimistic concurrency control during ingest by checking the
sequence number and primary term. This is accomplished by defining
_if_seq_no and _if_primary_term in the pipeline, similarly to _version
and _version_type.
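
The optimistic concurrency check itself can be sketched with a toy in-memory store. Everything here (`SeqNoStore`, `indexIf`, the fixed primary term) is a hypothetical illustration of the compare-then-write idea, not the ingest or indexing code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory store, for illustration only.
final class SeqNoStore {
    static final class Doc {
        final String source;
        final long seqNo;
        final long primaryTerm;

        Doc(String source, long seqNo, long primaryTerm) {
            this.source = source;
            this.seqNo = seqNo;
            this.primaryTerm = primaryTerm;
        }
    }

    private final Map<String, Doc> docs = new HashMap<>();
    private final long primaryTerm = 1; // assumed constant in this sketch
    private long nextSeqNo = 0;

    // Unconditional write: always succeeds and assigns the next sequence number.
    synchronized Doc index(String id, String source) {
        Doc doc = new Doc(source, nextSeqNo++, primaryTerm);
        docs.put(id, doc);
        return doc;
    }

    // Conditional write: succeeds only if the stored document still carries the
    // sequence number and primary term that the caller last observed.
    synchronized Doc indexIf(String id, String source, long ifSeqNo, long ifPrimaryTerm) {
        Doc current = docs.get(id);
        if (current == null || current.seqNo != ifSeqNo || current.primaryTerm != ifPrimaryTerm) {
            throw new IllegalStateException("version conflict for [" + id + "]");
        }
        return index(id, source);
    }
}
```

A writer that read a stale document gets a conflict instead of silently overwriting a newer update.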

Closes #41255
Co-authored-by: Maria Ralli <mariai.ralli@gmail.com>
2020-06-09 14:20:26 -05:00
Andrei Dan 3945712c72
[7.x] ILM add data stream support to the Shrink action (#57616) (#57884)
The shrink action creates a shrunken index with the target number of shards.
This makes the shrink action data stream aware. If the ILM managed index is
part of a data stream the shrink action will make sure to swap the original
managed index with the shrunken one as part of the data stream's backing
indices and then delete the original index.

(cherry picked from commit 99aeed6acf4ae7cbdd97a3bcfe54c5d37ab7a574)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-06-09 19:45:22 +01:00
Jim Ferenczi ea696198e9 Fix rounding composite aggs on sorted index (#57867)
This commit fixes a bug on the composite aggregation when the index
is sorted and the primary composite source needs to round values (date_histo).
In such case, we cannot take into account the subsequent sources even if they
match the index sort because the rounding of the primary sort value may break
the original index order.

Fixes #57849
2020-06-09 20:41:45 +02:00
Nik Everett 44a79d1739
Deprecate Rounding#round (#57845) (#57893)
This deprecates `Rounding#round` and `Rounding#nextRoundingValue` in
favor of calling
```
Rounding.Prepared prepared = rounding.prepare(min, max);
...
prepared.round(val)

```

because it is always going to be faster to prepare once. There
are going to be some cases where we won't know what to prepare *for*
and in those cases you can call `prepareForUnknown` and still be faster
than calling the deprecated method over and over and over again.

Ultimately, this is important because it doesn't look like there is an
easy way to cache `Rounding.Prepared` or any of its precursors like
`LocalTimeOffset.Lookup`. Instead, we can just build it at most once per
request.
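
Why preparing once helps can be shown with a self-contained sketch built on `java.time`. This is not the Elasticsearch `Rounding` class: `DayRounding`, `roundSlow`, and the day-only rounding are assumptions made for the example. The idea is that if the zone has no offset transition in the prepared range, rounding reduces to plain arithmetic.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;

// Hypothetical day rounding; this is not the Elasticsearch Rounding class.
final class DayRounding {
    static final long DAY_MILLIS = 24L * 60 * 60 * 1000;

    interface Prepared {
        long round(long utcMillis);
    }

    private DayRounding() {}

    // Per-call path: consults the zone rules on every invocation.
    static long roundSlow(ZoneId zone, long utcMillis) {
        return ZonedDateTime.ofInstant(Instant.ofEpochMilli(utcMillis), zone)
            .truncatedTo(ChronoUnit.DAYS)
            .toInstant()
            .toEpochMilli();
    }

    // Inspects the zone once. The result is only meant for values in
    // [minUtcMillis, maxUtcMillis].
    static Prepared prepare(ZoneId zone, long minUtcMillis, long maxUtcMillis) {
        ZoneRules rules = zone.getRules();
        // Start one day early because rounding down can land before min.
        Instant start = Instant.ofEpochMilli(minUtcMillis - DAY_MILLIS);
        ZoneOffsetTransition next = rules.nextTransition(start);
        if (next == null || next.getInstant().toEpochMilli() > maxUtcMillis) {
            // No transition in range: the offset is constant, so use arithmetic.
            long offset = rules.getOffset(start).getTotalSeconds() * 1000L;
            return utcMillis -> Math.floorDiv(utcMillis + offset, DAY_MILLIS) * DAY_MILLIS - offset;
        }
        // A transition falls inside the range: fall back to the per-call path.
        return utcMillis -> roundSlow(zone, utcMillis);
    }
}
```

In this sketch the cost of looking at the zone rules is paid once in `prepare`, and each later `round` call on the fast path is a few integer operations.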

Relates to #56124
2020-06-09 14:30:56 -04:00
Tim Brooks 2630c80b5d
Fix IndexRecoveryIT transient error test (#57826)
Currently it is possible for a transient network error to disrupt the
start recovery request from the remote to source node. This disruption
is racy with the recovery occurring on the source node. It is possible
for the source node to finish and clear its recovery. When this occurs,
the recovery cannot be reestablished and the "no two start" assertion
is tripped. This commit fixes this issue by allowing two starts if the
finalize request has been received.

Fixes #57416.
2020-06-09 10:49:38 -06:00
Tim Brooks 8119b96517
Fix stalled send translog ops request (#57859)
Currently, the translog ops request is reentrant when there is a mapping
update. The impact of this is that a translog ops ends up waiting on the
pre-existing listener and it is never completed. This commit fixes this
by introducing a new code path to avoid the idempotency logic.
2020-06-09 10:46:34 -06:00
Tim Brooks c17121428e
Fix translog ops action name in channel listener (#57854)
The action name is passed to the `ChannelListener` and is used for
logging purposes. Currently, we are using the incorrect action name for
the translog ops listener. This commit fixes the issue.
2020-06-09 10:38:58 -06:00
Lee Hinman cb2ce3736a
[7.x] Make noop template updates be cluster state noops (#57851) (#57880)
Backports the following commits to 7.x:

    Make noop template updates be cluster state noops (#57851)
2020-06-09 09:26:06 -06:00
Dan Hermann b501b282f8
Change default backing index naming scheme 2020-06-09 09:31:34 -05:00
Nik Everett e7cc2448d2
Save memory when string terms are not on top (#57758) (#57876)
This reworks string flavored implementations of the `terms` aggregation
to save memory when it is under another bucket by dropping the usage of
`asMultiBucketAggregator`.
2020-06-09 10:26:29 -04:00
Yannick Welsch b5d3565214 Assert on request headers only (#57792)
Only assert that actual request headers are empty, as default headers might still be there when stashing the context.
2020-06-09 14:08:25 +02:00
Yannick Welsch 259be236cf Use clean thread context for transport and applier service (#57792)
Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and
also that thread contexts are not leaked). Moves the ClusterApplierService to use the system
context (same as we do for MasterService), which allows removing a hack from
TemplateUpgradeService and makes it clearer that applying CS updates is fully executing under
system context.
2020-06-09 12:32:28 +02:00
Tim Brooks 9eaee3da8d
Fix exception check in RecoveryRequestTrackerTests (#57493)
Currently we check that exceptions are the same in the recovery request
tracker test. This is inconsistent because the future wraps the
exception in a new instance. This commit fixes the test by comparing a
random exception message.

Fixes #57199
2020-06-08 15:42:48 -06:00
Lee Hinman fe2eaf0d03
[7.x] Throw exception on duplicate mappings metadata fields (#57839)
In #57701 we changed mappings merging so that duplicate fields specified in mappings caused an
exception during validation. This change makes the same exception thrown when metadata fields are
duplicated. This will allow us to be strict currently with plans to make the merging more
fine-grained in a later release.
2020-06-08 14:21:18 -06:00
Tim Brooks 952cf770ed
Reestablish peer recovery after network errors (#57827)
Currently a network disruption will fail a peer recovery. This commit
adds network errors as retryable actions for the source node.
Additionally, it adds sequence numbers to the recovery request to
ensure that the requests are idempotent.

Additionally it adds a reestablish recovery action. The target node
will attempt to reestablish an existing recovery after a network
failure. This is necessary to ensure that the retries occurring on the
source node provide value in bidirectional failures.
2020-06-08 14:17:52 -06:00
Lee Hinman 6e8cf0973f
[7.x] Disallow merging existing mapping field definitions in templates (#57701) (#57822)
Backports the following commits to 7.x:

    Disallow merging existing mapping field definitions in templates (#57701)
2020-06-08 12:56:09 -06:00
Armin Braun 0987c0a5f3
Fix Broken Numeric Shard Generations in RepositoryData (#57813) (#57821)
Fix broken numeric shard generations when reading them from the wire
or physically from the physical repository.
This should be the cheapest way to clean up broken shard generations
in a BwC and safe-to-backport manner for now. We can potentially
further optimize this by also not doing the checks on the generations
based on the versions we see in the `RepositoryData` but I don't think
it matters much since we will read `RepositoryData` from cache in almost
all cases.

Closes #57798
2020-06-08 18:36:56 +02:00
Nik Everett ee0ce8ffaf
Fix a bug with missing fields in sig_terms (#57757)
When you run a `significant_terms` aggregation on a field and it *is*
mapped but there aren't any values for it then the count of the
documents that match the query on that shard still have to be added to
the overall doc count. I broke that in #57361. This fixes that.

Closes #57402
2020-06-08 10:07:14 -04:00
Mayya Sharipova 70e63a365a
Refactor how to determine if a field is metafield (#57378) (#57771)
Previously, the static MapperService method isMetadataField was used to determine
whether a field is a meta-field. This method relied on an outdated static list
of meta-fields.

This PR instead changes this method to an instance method that
is also aware of meta-fields in all registered plugins.

Related #38373, #41656
Closes #24422
2020-06-08 09:16:18 -04:00
Andrei Dan 1b84e93d83
[7.x] DataStream creation validation allows for prefixed indices (#57750) (#57799)
We want to validate the DataStreams on creation to make sure the future backing
indices would not clash with existing indices in the system (so we can
always rollover the data stream).
This changes the validation logic to allow for a DataStream to be created
with a backing index that has a prefix (eg. `shrink-foo-000001`) even if the
former backing index (`foo-000001`) exists in the system.
The new validation logic will look for potential index conflicts with indices
in the system that have the counter in the name greater than the data stream's
generation.

This ensures that the `DataStream`'s future rollovers are safe because for a
`DataStream` `foo` of generation 4, we will look for standalone indices in the
form of `foo-%06d` with the counter greater than 4 (ie. validation will fail if
`foo-000006` exists in the system), but will also allow replacing a
backing index with an index named by prefixing the backing index it replaces.
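
The conflict check described above can be sketched as follows. `DataStreamNameCheck` and `conflicts` are hypothetical names, and the regular expression is an assumption derived from the `foo-%06d` naming form mentioned in this message, not the actual validation code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class DataStreamNameCheck {
    private DataStreamNameCheck() {}

    // Returns existing indices named <dataStream>-<counter> whose counter is greater
    // than the current generation; those would clash with future rollovers.
    // Prefixed names such as shrink-foo-000001 do not match and are allowed.
    static List<String> conflicts(String dataStream, long generation, Collection<String> existingIndices) {
        // %06d yields at least six digits; the upper bound keeps parseLong in range.
        Pattern backing = Pattern.compile(Pattern.quote(dataStream) + "-(\\d{6,18})");
        List<String> result = new ArrayList<>();
        for (String index : existingIndices) {
            Matcher matcher = backing.matcher(index);
            if (matcher.matches() && Long.parseLong(matcher.group(1)) > generation) {
                result.add(index);
            }
        }
        return result;
    }
}
```

For a data stream `foo` of generation 4, an existing `foo-000006` is reported while `foo-000001` and `shrink-foo-000001` are not.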

(cherry picked from commit 695b242d69f0dc017e732b63737625adb01fe595)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-06-08 13:31:52 +01:00
Armin Braun 004eb8bd7e
Fix Bug With RepositoryData Caching (#57785) (#57800)
* Fix Bug With RepositoryData Caching

This fixes a really subtle bug with caching `RepositoryData`
that can corrupt a repository.
We were caching `RepositoryData` serialized in the newest
metadata format. This led to a confusing situation where
numeric shard generations would be cached in `ShardGenerations`
that were not written to the repository because the repository
or cluster did not yet support `ShardGenerations`.
In the case where shard generations are not actually supported yet,
these cached numeric generations are not safe and there are multiple
scenarios where they would be incorrect, leading to the repository
trying to read shard level metadata from index-N that don't exist.
This commit makes it so that cached metadata is always in the same
format as the metadata in the repository.

Relates #57798
2020-06-08 13:16:45 +02:00
Luca Cavanna 7a06a13d99 Add description to submit and get async search, as well as cancel tasks (#57745)
This makes it easier to debug where such tasks come from in case they are returned from the get tasks API.

Also renamed the last occurrence of waitForCompletion to waitForCompletionTimeout in get async search request.
2020-06-08 11:17:29 +02:00
Armin Braun 619e4f8c02
Make BackgroundIndexer more Efficient (#57781) (#57789)
Improve efficiency of background indexer by allowing an assertion
to be added for failures while they are produced, which prevents
queuing them up.
Also, add non-blocking stop to the background indexer so that when
stopping multiple indexers we don't needlessly continue indexing
on some indexers while stopping another one.

Closes #57766
2020-06-08 10:18:47 +02:00
Nik Everett 3b1dfa3b5d
Remove deprecated wrapped from scripted_metric (backport of #57627) (#57763)
This removes the deprecated `asMultiBucketAggregator` wrapper from
`scripted_metric`. Unlike most other such removals, this isn't likely to
save much memory. But it does make the internals of the aggregator
slightly less twisted.

Relates to #56487
2020-06-05 16:14:28 -04:00
Martijn van Groningen f170b52e64
Backing indices should use composable template matching with the corresponding data stream name (#57728)
Backport of #57640 to 7.x branch.

Composable templates with exact matches, can match with the data stream name, but not with the backing index name.
Also if the backing index naming scheme changes, then a composable template may never match with a backing index.

In that case mappings and settings may not get applied.
2020-06-05 18:38:22 +02:00
Dan Hermann 3fe93e24a6
[7.x] Prohibit closing the write index for a data stream (#57740) 2020-06-05 11:14:43 -05:00
Jake Landis 459ab9a0b2
[7.x] Ensure type exists for all monitoring configuration (#57399) (#57704)
#47711 and #47246 helped to validate that monitoring settings are
rejected at the time they are set. Otherwise an invalid
monitoring setting can find its way into the cluster state and result
in an exception thrown [1] on the cluster state application (thereby
causing significant issues). Some additional monitoring settings have
been identified that can result in invalid cluster state that also
result in exceptions thrown on cluster state application.

All settings require a type of either http or local to be
applicable. When a setting is changed, the exporters are automatically
updated with the new settings. However, if the old or new settings lack
a type setting an exception will be thrown (since exporters are
always of type 'http' or 'local'). Arguably we shouldn't blindly create
and destroy new exporters on each monitoring setting update, but the
lifecycle of the exporters is a bit outside the scope of what this PR is
trying to address.

This commit introduces a similar methodology to check for validity as
#47711 and #47246 but this time for ALL (including non-http) settings.
Monitoring settings are not useful unless there is an exporter with a type
defined. The type is used as a dependent setting, such that it must
exist to set the value. This ensures that when any monitoring setting
changes, it can only get added to cluster state if the type
exists. If the type exists (and the other validations pass) then the
exporters will get re-built and the cluster state remains valid.

Tests have been included to ensure that all dynamic monitoring settings
have the type as dependent settings.

[1]
org.elasticsearch.common.settings.SettingsException: missing exporter type for [found-user-defined] exporter
at org.elasticsearch.xpack.monitoring.exporter.Exporters.initExporters(Exporters.java:126) ~[?:?]
2020-06-05 10:47:11 -05:00
Tanguy Leroux 0e57528d5d Remove more //NORELEASE (#57517)
We agreed on removing the following //NORELEASE tags.
2020-06-05 15:34:06 +02:00
Gordon Brown 5a4e5a1e9d
Handle `cluster.max_shards_per_node` in YAML config (#57234)
Prior to this commit, `cluster.max_shards_per_node` is not correctly handled
when it is set via the YAML config file, only when it is set via the Cluster
Settings API.

This commit refactors how the limit is implemented, both to enable correctly
handling the setting in the YAML and to more effectively centralize the logic
used to enforce the limit. The logic used to apply the limit, as well as the
setting value, has been moved to the new `ShardLimitValidator`.
2020-06-04 14:02:21 -06:00
Nik Everett 98c379c507
Merge remaining sig_terms into terms (#57397) (#57687)
Merges the remaining implementation of `significant_terms` into `terms`
so that we can more easily make them work properly without
`asMultiBucketAggregator` which *should* save memory and speed them up.

Relates #56487
2020-06-04 14:32:32 -04:00
Mark Vieira 9b0f5a1589
Include vendored code notices in distribution notice files (#57017) (#57569)
(cherry picked from commit 627ef279fd29f8af63303bcaafd641aef0ffc586)
2020-06-04 10:34:24 -07:00
Armin Braun 80d1b12fa3
Restore ThreadContext after Serializing OutboundMessage (#57659) (#57681)
Stash the current context before restoring the stored context on the IO thread
so that its thread context does not get polluted.
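
The stash-then-restore pattern described above can be sketched with a plain `ThreadLocal`. This is only an illustration, not the actual `ThreadContext` code: the class `StashContextSketch`, the `StoredContext` interface as written here, and the methods `stashAndInstall` and `serializeWith` are hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class StashContextSketch {

    // Hypothetical per-thread context; stands in for the IO thread's context.
    private static final ThreadLocal<Map<String, String>> CONTEXT =
        ThreadLocal.withInitial(HashMap::new);

    /** Handle that puts the previously stashed context back when closed. */
    interface StoredContext extends AutoCloseable {
        @Override
        void close();
    }

    static Map<String, String> current() {
        return CONTEXT.get();
    }

    /** Stashes the caller's context, installs a copy of the stored one, and returns a restore handle. */
    static StoredContext stashAndInstall(Map<String, String> stored) {
        Map<String, String> previous = CONTEXT.get();
        CONTEXT.set(new HashMap<>(stored));
        return () -> CONTEXT.set(previous);
    }

    /** Runs the "serialization" under the message's context without polluting the calling thread. */
    static String serializeWith(Map<String, String> messageContext) {
        try (StoredContext ignored = stashAndInstall(messageContext)) {
            return "headers=" + current();
        }
    }

    public static void main(String[] args) {
        current().put("io", "1");
        System.out.println(serializeWith(Map.of("request", "42")));
        // The thread's own context is back in place afterwards.
        System.out.println(current());
    }
}
```

Using try-with-resources keeps the restore step on every exit path, including exceptions thrown while serializing.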

Closes #57554
2020-06-04 17:55:26 +02:00
David Turner fc4dd6d681 Timeout health API on busy master (#57587)
Today `GET _cluster/health?wait_for_events=...&timeout=...` will wait
indefinitely for the master to process the pending cluster health task,
ignoring the specified timeout. This could take a very long time if the master
is overloaded. This commit fixes this by adding a timeout to the pending
cluster health task.
2020-06-04 13:39:22 +01:00
William Brafford 7de6d97363
Version bump for 7.7.1 release (#57619) 2020-06-03 16:38:25 -04:00
Igor Motov 8d7f389f3a
Increase search.max_buckets to 65,535 (#57042)
Increases the default search.max_buckets limit to 65,535, and only counts
buckets during reduce phase.

Closes #51731
2020-06-03 15:35:41 -04:00
Julie Tibshirani e0a15e8dc4
Remove the 'array value parser' marker interface. (#57571) (#57622)
This PR replaces the marker interface with the method
FieldMapper#parsesArrayValue. I find this cleaner and it will help with the
fields retrieval work (#55363).

The refactor also ensures that only field mappers can declare they parse array
values. Previously other types like ObjectMapper could implement the marker
interface and be passed array values, which doesn't make sense.
2020-06-03 11:30:14 -07:00
Nik Everett 7fd94f7d0f Test: Protect auto_date_histo from 0 buckets
The test for `auto_date_histogram` was trying to round `Long.MAX_VALUE`
if there were 0 buckets. That doesn't work.

Also, this replaces all of the class variables created to make
consistent random result when testing `InternalAutoDateHistogram` with
the newer `randomResultsToReduce` which is a little simpler to
understand.
2020-06-03 12:51:22 -04:00
Christos Soulios 67abde326e
[7.x] Introduce v6.8.11 (#57600) 2020-06-03 19:10:16 +03:00
Nhat Nguyen 5097071230 Increase timeout for GlobalCheckpointSyncIT (#57567)
The test failed when it was running with 4 replicas and 3 indexing 
threads. The recovering replicas can prevent the global checkpoint from
advancing. This commit increases the timeout to 60 seconds for this
suite and the check for no inflight requests.

Closes #57204
2020-06-03 08:50:02 -04:00
Nik Everett 2a27c411fb
Save memory when geo aggregations are not on top (#57483) (#57551)
Saves memory when the `geotile_grid` and `geohash_grid` are not on the
top level by using the `LongKeyedBucketOrds` we built in #55873.
2020-06-02 16:21:50 -04:00
Zachary Tong 79ac69cfa3
[7.x Backport] Prevent SigTerms/SigText from running on fields they do not support (#57485)
SigTerms cannot run on fields that are not searchable, and SigText
cannot run on fields that do not have analyzers.  Both of these
situations fail today with an esoteric exception, so this just formalizes
the constraint by throwing an IllegalArgumentException up front.

In practice, the only affected field seems to be the `binary` field,
which is neither searchable nor has a default analyzer (e.g. even numeric
and keyword fields have a default analyzer despite not being tokenized)

Adds supported-type tests, and makes some changes to the test itself
to allow testing sigtext (indexing _source).

Also a few tweaks to the test to avoid bad randomization (negative
numbers, etc).
2020-06-02 16:03:37 -04:00
Nik Everett 97c06816a4
Fix an optimization in terms agg (backport #57438) (#57547)
When the `terms` agg runs against strings and uses global ordinals it
has an optimization when it collects segments that only ever have a
single value for the particular string. This is *very* common. But I
broke it in #57241. This fixes that optimization and adds `debug`
information that you can use to see how often we collect segments of
each type. And adds a test to make sure that I don't break the
optimization again.

We also had a specialization for when there isn't a filter on the terms
to aggregate. I had removed that specialization in #57241 which resulted
in some slow down as well. This adds it back but in a more clear way.
And, hopefully, a way that is marginally faster when there *is* a
filter.

Closes #57407
2020-06-02 14:57:45 -04:00
Mark Tozzi e50f514092
IndexFieldData should hold the ValuesSourceType (#57373) (#57532) 2020-06-02 12:16:53 -04:00
Armin Braun ba2d70d8eb
Serialize Outbound Messages on IO Threads (#56961) (#57080)
Almost every outbound message is serialized to buffers of 16k pagesize.
We were serializing these messages off the IO loop (and retaining the concrete message
instance as well) and would then enqueue it on the IO loop to be dealt with as soon as the
channel is ready.
1. This would cause buffers to be held onto for longer than necessary, causing less reuse on average.
2. If a channel was slow for some reason, not only would concrete message instances queue up for it, but also 16k of buffers would be reserved for each message until it would be written+flushed physically.

With this change, the serialization happens on the event loop which effectively limits the number of buffers that `N` IO-threads will ever use so long as messages are small and channels writable.
Also, this change dereferences the reference to the concrete outbound message as soon as it has been serialized to save some more on GC.

This reduces the GC time for a default PMC run by about 50% in experiments (3 nodes, 2G heap each, loopback ... obvious caveat is that GC isn't that heavy in the first place with recent changes but still a measurable gain).
I also expect it to be helpful for master node stability by causing less of a spike if master is e.g. hit by a large number of requests that are processed batched (e.g. shard snapshot status updates) and responded to in a short time frame all at once.

Obviously, the downside to this change is that it introduces more latency on the IO loop for the serialization. But since we read all of these messages on the IO loop as well I don't see it as much of a qualitative change really and the more predictable buffer use seems much more valuable relatively.
2020-06-02 16:15:18 +02:00
Armin Braun 9bc9d01b84
Do not Block Snapshot Thread Pool Fully During Restore or Snapshot (#57360) (#57511)
Allow for a fairer distribution of snapshot and restore operations
to enable parallel snapshots and improve behaviour for parallel snapshot + restore.

Closes #55803
2020-06-02 11:45:55 +02:00
Ryan Ernst 7aad4f6470
Store parsed mapping settings in IndexSettings (#57492)
There are several mapping settings that are currently re-parsed every
time they are read. This can be quite frequent, for example within every
document ingestion. This commit moves the parsed versions of these
mapping settings to be stored in IndexSettings, just as other index settings
are already.

closes #57395
2020-06-01 16:45:36 -07:00
Mark Tozzi 1f500583b1
Clean up Aggregator Supplier Boiler Plate (#57442) (#57452) 2020-06-01 14:21:07 -04:00
Nik Everett c6c0b1a968
Optimize `routingNodes` variable in AddIncrementallyTests (#57140) (#57447)
The `routingNodes` variable is unused. Replace `clusterState.getRoutingNodes()` with `routingNodes`.

Co-authored-by: Boice Huang <boicehuang@tencent.com>
2020-06-01 14:13:45 -04:00
Zachary Tong daaf5a3dcc
Fix assertion catching in aggregation supported type test (#56466) (#57382)
At some point, we changed the supported-type test to also catch
assertion errors.  This has the side effect of also catching the
`fail()` call inside the try-catch, which silently smothered some
failures.

This modifies the test to throw at the end of the try-catch
block to prevent it from accidentally catching itself.
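
The pitfall can be reproduced in a few lines. The sketch below is an illustration only, not the actual test code: `FailInsideTry`, `buggyCheck` and `fixedCheck` are hypothetical names, and a directly thrown `AssertionError` stands in for the `fail()` call.

```java
final class FailInsideTry {

    // Buggy shape: the failure marker is thrown inside the try block,
    // so the AssertionError catch swallows it and the check "passes".
    static boolean buggyCheck(Runnable unsupportedOperation) {
        try {
            unsupportedOperation.run();
            throw new AssertionError("expected an exception"); // stands in for fail()
        } catch (IllegalArgumentException | AssertionError e) {
            return true;
        }
    }

    // Fixed shape: only record what happened inside the try-catch,
    // and throw after the block where nothing can catch it by accident.
    static boolean fixedCheck(Runnable unsupportedOperation) {
        boolean threw = false;
        try {
            unsupportedOperation.run();
        } catch (IllegalArgumentException | AssertionError e) {
            threw = true;
        }
        if (threw == false) {
            throw new AssertionError("expected an exception");
        }
        return true;
    }

    public static void main(String[] args) {
        Runnable doesNotThrow = () -> {};
        System.out.println(buggyCheck(doesNotThrow)); // the missing exception goes unnoticed
        try {
            fixedCheck(doesNotThrow);
        } catch (AssertionError e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```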

Catching the AssertionError is convenient because there are other locations
that do throw an assertion in tests (due to hitting an assertion
before the exception is thrown) so I think we should keep it around.

Also includes a variety of fixes to other tests which were failing
but being silently smothered.
2020-06-01 12:10:05 -04:00
Armin Braun 59570eaa7d
Fix Local Translog Recovery not Updating Safe Commit in Edge Case (#57350) (#57380)
In case the local checkpoint in the latest commit is less
than the last processed local checkpoint we would recover
0 ops and hence not commit again.
This would lead to the logic in `IndexShard#recoverLocallyUpToGlobalCheckpoint`
not seeing the latest local checkpoint when it reloads the safe commit from the store
and thus cause inefficient recoveries because the recoveries would work from a
lower than possible local checkpoint.

Closes #57010
2020-05-30 09:28:50 +02:00
Nik Everett d6a3704932
Fold some of sig_terms into terms (backport of #57361) (#57386)
This merges the global-ordinals-based implementation for
`significant_terms` into the global-ordinals-based implementation of
`terms`, removing a bunch of copy and pasted code that is subtly
different across the two implementations and replacing it with an
explicit `ResultStrategy` with nice stuff like Javadoc.

The actual behavior is mostly unchanged, though I was able to remove a
redundant copy of bytes representing the string from the result
construction phase of `significant_terms`.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-05-29 22:51:11 -04:00
Nik Everett f52e779806
Fix casting of scaled_float in sorts (#57207) (#57385)
Previously we'd get a `ClassCastException` when you tried to use
`numeric_type` on `scaled_float`. Oops! This cleans up the CCE and moves
some code around so the casting actually works.
2020-05-29 18:06:04 -04:00
Nik Everett d5e86d7c4d
Small cleanups for terms aggregator (#57315) (#57381)
This includes a few small cleanups for the `TermsAggregatorFactory`:

1. Removes an unused `DeprecationLogger`
2. Moves the members to right above the ctor.
3. Merges all of the heuristics for picking `SubAggCollectionMode`
   into a single method.
2020-05-29 16:59:35 -04:00
Nik Everett 4263c25b2f
Save memory when histogram agg is not on top (backport of #57277) (#57377)
This saves some memory when the `histogram` aggregation is not a top
level aggregation by dropping `asMultiBucketAggregator` in favor of
natively implementing multi-bucket storage in the aggregator. For the
most part this just uses the `LongKeyedBucketOrds` that we built the
first time we did this.
2020-05-29 15:07:37 -04:00
Benjamin Trent 15aba60c02
[7.x] Add new circuitbreaker plugin and refactor CircuitBreakerService (#55695) (#57359)
* Add new circuitbreaker plugin and refactor CircuitBreakerService (#55695)

This commit lays the ground work for plugins supplying their own circuit breakers.

It adds a new interface: `CircuitBreakerPlugin`.

This interface provides methods for providing custom child CircuitBreaker objects. There are also facilities for allowing dynamic settings for the custom breakers.

With the refactor, circuit breakers are no longer replaced on setting changes. Instead, the two mutable settings themselves are `volatile`. Plugins that want to use their custom circuit breaker should keep a reference of their constructed breaker.
2020-05-29 12:13:46 -04:00
Mayya Sharipova aebb78bf5c Run sort optimization when from+size>0 (#57250) 2020-05-29 11:30:35 -04:00
Armin Braun e4fd78f866
Remove Overly Strict Safety Mechanism in Shard Snapshot Logic (#57227) (#57362)
Unfortunately, we cannot have a safety mechanism like this where we throw whenever we find unreadable data in a shard.
This breaks in the case of an older ES version (without shard generations enabled) having failed to snapshot a shard snapshot after writing some data to its path and having finalized it for example.
Another example of where we can't support this check is the test I added, if we snapshot an index with a name that already exists in the repository and more shards than the existing index, fail doing that and then retry snapshotting it we will also see unexpected data in the path.

We could technically do deeper inspections on the unexpected data but I don't think it's worth it really. In the end if we are unable to read the data here it's broken anyway. By moving to a new `index-` blob in the shard directory I don't see us ever
corrupting existing data and since we (by virtue of moving to an empty generation) won't do any incremental work on top of potentially corrupt data we also do not risk creating broken snapshots going forward.
=> Just logging a warning in this very unlikely case is the best we can do I think
2020-05-29 16:41:57 +02:00
Dan Hermann 6b0d707671
[7.x] Do not report negative values for swap sizes (#57353) 2020-05-29 08:11:47 -05:00
Henning Andersen 8427d677e9
Reindex and friends fail nicely when max_docs < slices (#54901) (#57348)
When the parameter `max_docs` is less than `slices` in the update_by_query,
delete_by_query or reindex API, `max_docs` is set to 0 and we throw an
action_request_validation_exception with a confusing error message:
"maxDocs should be greater than 0...".
This change checks whether `max_docs` is less than `slices` and
throws an illegal_argument_exception with a clear message.
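
A minimal sketch of such an up-front check follows. It is not the actual Elasticsearch implementation: `MaxDocsValidation`, `maxDocsPerSlice` and `validate` are hypothetical names, the message text is made up, and the per-slice split by integer division is an assumption used to show how a value of 0 can arise.

```java
final class MaxDocsValidation {

    // Assumed per-slice split: integer division, so 3 docs over 5 slices gives 0.
    static int maxDocsPerSlice(int maxDocs, int slices) {
        return maxDocs / slices;
    }

    // Hypothetical up-front check that fails fast with a clear message.
    static void validate(int maxDocs, int slices) {
        if (maxDocs < slices) {
            throw new IllegalArgumentException(
                "[max_docs] (" + maxDocs + ") must not be less than [slices] (" + slices + ")");
        }
    }

    public static void main(String[] args) {
        System.out.println(maxDocsPerSlice(3, 5)); // prints 0
        try {
            validate(3, 5);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Checking the two parameters against each other before splitting the work means the user sees the real cause instead of a complaint about a derived value.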

Relates to #52786.

Co-authored-by: bellengao <gbl_long@163.com>
2020-05-29 14:30:14 +02:00
Martijn van Groningen 04ef39da77
Change cluster info actions to be able to resolve data streams. (#57343)
Backport of #56878 to 7.x branch.

With this change the following APIs will be able to resolve data streams:
get index, get mappings and ilm explain APIs.

Relates to #53100
2020-05-29 12:17:53 +02:00
Ignacio Vera 75868ea915
Catch InputCoercionException thrown by Jackson parser (#57287) (#57330)
Jackson 2.10 library has added a new type of error that is thrown when a numeric value is out
of range. This error should be caught and handled properly in case the flag ignore_malformed
has been set to true.
2020-05-29 09:47:47 +02:00
Nik Everett b9fe10866e
Make global ords terms simpler to understand (backport of #57241) (#57311)
When the `terms` agg operates on non-numeric data it can collect it via
global ordinals. It actually has two separate collection strategies for this,
one "dense" and one "remapping". Each of *those* strategies has two
"iteration" strategies that it uses to build buckets, depending on
whether or not we need buckets with `0` docs in them. Previously this
was done with several `null` checks and never really explained. This
change replaces those checks with two `CollectionStrategy` classes which
have good stuff like documentation.
2020-05-28 16:52:35 -04:00
Julie Tibshirani 10e1dc199d Revert "Remove unused logic from FieldNamesFieldMapper. (#56834)"
This reverts commit 343fb699a4.
2020-05-28 10:54:10 -07:00
Martijn van Groningen 225ccd1cfa
Ensure template exists when creating data stream (#57275)
Backporting #56888 to 7.x branch.

Limit the creation of data streams only for namespaces that have a composable template with a data stream definition.

This way we ensure that mappings/settings have been specified and will be used at data stream creation and data stream rollover.

Also remove `timestamp_field` parameter from create data stream request and
let the create data stream api resolve the timestamp field
from the data stream definition snippet inside a composable template.

Relates to #53100
2020-05-28 15:08:25 +02:00
Nhat Nguyen 5b08eaf90c
Fix trimUnsafeCommits for indices created before 6.2 (#57187)
If an upgraded node is restarted multiple times without flushing a new
index commit, then we will wrongly exclude all commits from the starting
commits. This bug is reproducible with these minimal steps: (1) create
an empty index on 6.1.4 with translog retention disabled, (2) upgrade
the cluster to 7.7.0, (3) restart the upgraded cluster. The problem
is that the new translog policy can trim the translog without having a
new index commit, while the existing commit still refers to the previous
translog generation.

Closes #57091
2020-05-27 15:08:49 -04:00
Lee Hinman c0f732b9f6
[7.x] Rename template V2 classes to ComposableTemplate (#57183) (#57232)
Backports the following commits to 7.x:

    Rename template V2 classes to ComposableTemplate (#57183)
2020-05-27 11:01:59 -06:00
Nik Everett 4d5be7c817
Save memory on numeric sig terms when not top (backport of #56789) (#57221)
This saves memory when running numeric significant terms which are not
at the top level by merging its collection into numeric terms and relying
on the optimization that we made in #55873.
2020-05-27 12:03:28 -04:00
Przemyslaw Gomulka 0e34b2f42e
SlowLoggers using single logger (#56708)
Slow loggers should use a single shared logger, as otherwise when an index is
deleted the log4j logger will remain reachable (log4j is caching) and
will create a memory leak.

closes https://github.com/elastic/elasticsearch/issues/56171
2020-05-27 16:38:31 +02:00
Alan Woodward d6b79bcd95 Remove Mapper.updateFieldType() (#57151)
When we had multiple mapping types, an update to a field in one type had to be
propagated to the same field in all other types. This was done using the
Mapper.updateFieldType() method, called at the end of a merge. However, now
that we only have a single type per index, this method is unnecessary and can
be removed.

Relates to #41059
Backport of #56986
2020-05-27 09:21:24 +01:00
Julie Tibshirani 343fb699a4 Remove unused logic from FieldNamesFieldMapper. (#56834)
This logic is no longer used, now that each field mapper handles adding the
`_field_names` fields.
2020-05-26 17:40:36 -07:00
Nik Everett 0fce2b7713 Fix DateHistogramAggregatorTests.testAsSubAgg
Closes #57168 by using `AggregatorTestCase#newIndexSearcher` in the
`AggregatorTestCase#testCase`. Without that global ordinals will
*sometimes* fail to work.
2020-05-26 15:05:31 -04:00
Mark Vieira 92e127e90d
Mute DateHistogramAggregatorTests.testAsSubAgg
(cherry picked from commit 4d050a7a6438a7d102eeef9e03a7d79565bddab7)
2020-05-26 10:57:22 -07:00
Christoph Büscher 56625e35b7 Fix `bool` query behaviour on null value (#56817)
Until 7.7 we used to ignore `null` values for the `bool` query's `minimum_should_match`
parameter and also for the `must`, `must_not`, `should` and `filter` clauses.
An internal refactoring has changed this so now we get a parsing error. While `null`
should not be a common value here, we should restore the old behaviour for bwc for now.

Closes #56812
2020-05-26 16:23:40 +02:00
Armin Braun 184338ed61
Fix Snapshot Javadoc Issues (#57083) (#57122)
Fixing some incorrect JavaDoc and a typo.

Co-authored-by: jinwook han <jin942002@naver.com>
2020-05-25 18:05:01 +02:00
Dan Hermann c5f61fe24c
Handle exceptions when building _cat/indices response 2020-05-25 09:59:24 -05:00
Armin Braun dde75b0f64
Fix Confusing Exception on Shard Snapshot Abort (#57116) (#57117)
If a partial snapshot has some of its shards aborted because an index got deleted, this can lead to confusing `IllegalStateExceptions` when trying to increment the ref count of the already closed `Store`.
Refactored this a little to throw the same exception for aborted shards no matter the timing of the store close and assert that the concurrent store close can in fact only happen when the shard snapshot has already been aborted.
2020-05-25 16:50:11 +02:00
Nhat Nguyen 4511611802 Fix testTrackingChannelTask (#57061)
A task might not be canceled on disconnection if it is completed before the cancellation
is started. We need to relax the assertion in this test.

Closes #56746
2020-05-25 09:53:50 -04:00
Armin Braun 5569137ae3
Flatten ReleasableBytesReference Object Trees (#57092) (#57109)
When slicing a releasable bytes reference we would create a new counter
every time and pass the original reference chain to the new slice on every
slice invocation. This would lead to extremely deep reference chains and
needlessly uses a dedicated counter for every slice when all the slices
eventually just refer to the same underlying bytes and `Releasable`.
This commit tracks the ref count wrapper with its releasable in a separate
object that can be passed around on every slicing, making the slices' tree
as flat as the original releasable bytes reference.
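
The idea of one shared counter per underlying buffer can be sketched as below. This is a simplified illustration under stated assumptions, not the real `ReleasableBytesReference`: the classes `SharedRefCountSlices`, `RefCounted` and `Slice` and the method `retainedSlice` are hypothetical names for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class SharedRefCountSlices {

    /** Ref count bundled with its release action; a single instance is shared by all slices. */
    static final class RefCounted {
        private final AtomicInteger refs = new AtomicInteger(1);
        private final Runnable releasable;

        RefCounted(Runnable releasable) {
            this.releasable = releasable;
        }

        void incRef() {
            // Simplified: a real implementation would refuse to revive a count that reached 0.
            refs.incrementAndGet();
        }

        void decRef() {
            if (refs.decrementAndGet() == 0) {
                releasable.run();
            }
        }
    }

    /** A view over the shared bytes; every slice points directly at the same RefCounted. */
    static final class Slice implements AutoCloseable {
        final byte[] bytes;
        final int offset;
        final int length;
        private final RefCounted refCounted;

        private Slice(byte[] bytes, int offset, int length, RefCounted refCounted) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
            this.refCounted = refCounted;
        }

        static Slice of(byte[] bytes, Runnable releasable) {
            return new Slice(bytes, 0, bytes.length, new RefCounted(releasable));
        }

        /** Slicing a slice passes the same RefCounted along, so the structure stays flat. */
        Slice retainedSlice(int from, int len) {
            if (from < 0 || len < 0 || from + len > length) {
                throw new IndexOutOfBoundsException();
            }
            refCounted.incRef();
            return new Slice(bytes, offset + from, len, refCounted);
        }

        @Override
        public void close() {
            refCounted.decRef();
        }
    }
}
```

However many times a slice is sliced again, each new slice holds only the byte array and the shared `RefCounted`, and the release action runs once when the last holder is closed.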

Also, we were needlessly creating a redundant releasable bytes reference from
a releasable bytes-stream-output that we never actually used for releasing (all code
that uses it just releases the stream itself instead).
2020-05-25 13:00:37 +02:00