Commit Graph

5188 Commits

Author SHA1 Message Date
Igor Motov 00a1949852
Streamline GeoJSON to map serialization (#60413) (#60429)
Optimizes GeoJSON to map serialization when retrieving spatial data through
fields.

Closes #60259
2020-07-29 17:56:56 -04:00
Julie Tibshirani 5359417ec3
Minor clean-up around search highlight context. (#60422)
* Rename SearchContextHighlight -> SearchHighlightContext.
* Rename HighlighterContext to FieldHighlightContext.
* Make the search highlight context immutable.
* Avoid storing SearchHighlightContext on HighlighterContext.
2020-07-29 11:39:17 -07:00
Tim Brooks 85fdf959ad
Add configured indexing memory limit to node stats (#60414)
This commit adds the configured memory limit to the node stats API.
2020-07-29 12:28:21 -06:00
Nhat Nguyen 9d4a64e749
Allow CCR on nodes with legacy roles only (#60093)
CCR will stop functioning if the master node is on 7.8, but data nodes 
are before that version because the master node considers that all data
nodes do not have the remote cluster client role. This commit allows CCR
work on data nodes with legacy roles only.

Relates #54146
Relates #59375
2020-07-29 10:57:31 -04:00
Armin Braun 8429b4ace8
Fix Queued Snapshot Deletes After Finalization Failure (#60285) (#60379)
This fixes the behavior of the snapshot state machine in the following edge case:

1. Snapshot is running
2. Delete/abort for the snapshot is started
3. Snapshot fails to finalize

We were not removing the failed snapshot id from the list of snapshots to delete in the delete.
This lead to an error in the repository, which throws if we try to delete a non-existing snapshot.
This commmit updates the deletions in progress by removing the failed snapshot id.
The fact that this could lead to snapshot delete entries without any snapshot ids is not optimized
on purpose because it allows for another attempt at writing clean `RepositoryData` and will run basic
cleanup on the repository (root level blobs and stale indices) and thus bring the repository back into
a clean state after a failed finalization.

Closes #60274
2020-07-29 15:54:18 +02:00
Armin Braun 381cec2ba9
Fix ConcurrentSnapshotsIT.testMasterFailOverWithQueuedDeletes (#60307) (#60376)
The test assumed that the master fail-over would always work out as a single step.
This is not guaranteed however and we can randomly see master failing over twice,
in which case the transport listener will be failed on the node that stops being
leader and we have to catch an exception for the deletes as well just like we do
for the snapshot.

Closes #60262
2020-07-29 15:54:00 +02:00
Armin Braun 0778274b72
Fix IPV6 Scope Id in InetAddressesTests (#60368) (#60369)
Follow up to #60360, turns out at times the name of an interface that isn't loopback is not a valid scope id.
2020-07-29 13:16:12 +02:00
Armin Braun 1f6a3765e4
Fix NPE in SnapshotsInProgress Constructor (#60355)
Merge oversight between cleanups that removed `null` for `shards` and this corner case
spot of no indices in a snapshot.

Closes #60330
2020-07-29 10:47:28 +02:00
Armin Braun 4307a45153
Fix IPV6 Scope ID Test (#60360) (#60363)
Use real scope id from first available interface instead of `lo` which might
not exist on non-Linux platforms.

Closes #60332
2020-07-29 09:55:37 +02:00
Armin Braun 753fd4f6bc
Cleanup and optimize More Serialization Spots (#59959) (#60331)
Same as #59626 for a few more spots.
2020-07-29 07:20:44 +02:00
Zachary Tong e3d85feecd Mute testForStringIPv6WithScopeIdInput test
Tracking issue: https://github.com/elastic/elasticsearch/issues/60332
2020-07-28 15:05:19 -04:00
Igor Motov 0dd53b76bd
Add aggregation list to node info (#60074) (#60256)
Adds a full list of supported aggregations to the node info API. This list
will be used in transform tests and telemetry mapping tests that will be added
as follow-up PRs.

Fixes #59774
2020-07-28 14:06:12 -04:00
Julie Tibshirani c7bfb5de41
Add search `fields` parameter to support high-level field retrieval. (#60258)
This feature adds a new `fields` parameter to the search request, which
consults both the document `_source` and the mappings to fetch fields in a
consistent way. The PR merges the `field-retrieval` feature branch.

Addresses #49028 and #55363.
2020-07-28 10:58:20 -07:00
James Rodewig 025e7bee80
[DOCS] Fix allowed values for numeric sort types (#60176) (#60299)
Co-authored-by: Philippus Baalman <philippus@gmail.com>
2020-07-28 13:51:59 -04:00
Howard 11b86b3f88
Remove unused clusterService instance in ActionModule. (#59826) 2020-07-28 10:36:04 -07:00
jimczi 4e4ed6ee48 fix race condition in SearchPhaseControllerTests#consumerTestCase 2020-07-28 18:27:39 +02:00
David Turner 9450ea08b4 Log and track open/close of transport connections (#60297)
Transport connections between nodes remain in place until one or other
node shuts down or the connection is disrupted by a flaky network.
Today it is very difficult to demonstrate that transient failures and
cluster instability are caused by the network even though this is often
the case. In particular, transport connections open and close without
logging anything, even at `DEBUG` level, making it very hard to quantify
the scale of the problem or to correlate the networking problems with
external events.

This commit adds the missing `DEBUG`-level logging when transport
connections open and close, and also tracks the total number of
transport connections a node has opened as a measure of the stability of
the underlying network.
2020-07-28 17:08:04 +01:00
Armin Braun 9222070f22
Fix Test Failure in testCorrectCountsForDoneShards (#60254) (#60286)
* Fix Test Failure in testCorrectCountsForDoneShards

Fixing the freak edge case where the node shard status request returns before
the node was able to send the state update request to master and update the cluster state.
Without this change, the snapshot shard status would report as `DONE` once the data node
has finished updating the shard in the cluster state.
If the data node then drops out of the cluster before the state has been updated, then
the status will jump to "FAILURE" because the master updates the state once the data node
leaves the cluster.

Closes #60247
2020-07-28 15:46:18 +02:00
David Turner b78caa5c00 Add more useful toString on cluster state observers (#60277)
Today if a cluster state observer's listener takes a long time to
process a notification then we log the following rather useless warning
message:

    [notifying listener [org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener@12345678]] took [34567ms]

This commit adds a handful of simple `toString()` implementations in
order to identify the owner of the listener in question.
2020-07-28 12:56:58 +01:00
Jim Ferenczi 1144534093
Executes incremental reduce in the search thread pool (#58461) (#60275)
This change forks the execution of partial
reduces in the coordinating node to the search thread pool.
It also ensures that partial reduces are executed sequentially
and asynchronously in order to limit the memory and cpu that a
single search request can use but also to avoid blocking a
network thread.
If a partial reduce fails with an exception, the search
request is cancelled and the reporting of the error is
delayed to the start of the fetch phase (when the final
reduce is performed). This ensures that we cleanup the
in-flight search requests before returning an error to
the user.

Closes #53411
Relates #51857
2020-07-28 13:40:47 +02:00
Armin Braun d39622e17e
Stop Serializing RepositoryData Twice when Writing (#60107) (#60269)
We can save one round of serializing `RepositoryData` on the write path.
This also leads to somewhat better compression because we compress larger chunks
in one go potentially when compared to serializing and compressing in one go.
Also, fixed the double wrapping of collections when copying the repository
data instance via the `withGenId`.
2020-07-28 11:42:14 +02:00
Yannick Welsch a55c869aab Properly document keepalive and other tcp options (#60216)
Keepalive options are not well-documented (only in transport section, although also available at http and network level).

Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2020-07-28 11:10:04 +02:00
Yannick Welsch ffe114b890 Set specific keepalive options by default on supported platforms (#59278)
keepalives tell any intermediate devices that the connection remains alive, which helps with overzealous firewalls that are
killing idle connections. keepalives are enabled by default in Elasticsearch, but use system defaults for their
configuration, which often times do not have reasonable defaults (e.g. 7200s for TCP_KEEP_IDLE) in the context of
distributed systems such as Elasticsearch.

This PR sets the socket-level keep_alive options for network.tcp.{keep_idle,keep_interval} to 5 minutes on configurations
that support it (>= Java 11 & (MacOS || Linux)) and where the system defaults are set to something higher than 5
minutes. This helps keep the connections alive while not interfering with system defaults or user-specified settings
unless they are deemed to be set too high by providing better out-of-the-box defaults.
2020-07-28 11:10:04 +02:00
Armin Braun fac5953d13
Let `isInetAddress` utility understand the scope ID on ipv6 (#60172) (#60263)
Make `isInetAddress` utility method understand the scope ID on ipv6.

Fixes #60115

Co-authored-by: Yang Cheng <chengyang2048@163.com>
2020-07-28 09:37:39 +02:00
James Rodewig cb4c21fa7b
[DOCS] Fix typo in adapt auto expand replica comments (#60187) (#60239)
Co-authored-by: Howard <danielhuang@tencent.com>
2020-07-27 14:18:53 -04:00
weizijun 5df043d0e0 Fix wait_for_no_initializing_shards params (#58379) 2020-07-27 14:03:26 -04:00
Adrien Grand f1f275c91b Add 6.8.12 and 7.8.2 version constants. 2020-07-27 19:26:22 +02:00
Tim Brooks df0f68da23
Identify the operation type in rejected exception (#60138)
Currently, we do not categorize the operation type in the rejection
exception messsage when we reject an indexing operation for indexing
memory limits. This commit fixes this to ensure that it is identified as
coordinating, primary, or replica.
2020-07-27 10:09:46 -06:00
Tim Brooks 47922c9e4a
Fix indexing pressure replica rejections logic (#60150)
Currently the logic to rejection replica rejections is evaluate before
adding the additional bytes of the current operation. This means that
the first replica operation which should be rejected will be allowed to
proceed. This commit fixes this logic and adds unit level test to ensure
indexing pressure behavior is correct.
2020-07-27 10:00:01 -06:00
Nik Everett a451dd87aa
Reduce merge map memory overhead in the Variable Width Histogram Aggregation (#59366) (#60171)
When a document which is distant from existing buckets gets collected, the
`variable_width_histogram` will create a new bucket and then insert it
into the ordered list of buckets.

Currently, a new merge map array is created to move this bucket. This is
very expensive as there might be thousands of buckets.

This PR creates `mergeBuckets(UnaryOperator<Long> mergeMap)` methods in
`BucketsAggregator` and `MergingBucketsDefferingCollector`, and updates
the `variable_width_histogram`  to use them. This eliminates the need to
create an entire merge map array for each new bucket and reduces the
memory overhead of the algorithm.

Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>
2020-07-27 09:23:06 -04:00
Armin Braun 196ed6b90e
Remove Mostly Redundant Deleting in FsBlobContainer (#60117) (#60195)
In almost all cases we write uuid named files via this method.
Preemptively deleting just wastes IO ops, we can delete after a write failed
and retry the write to cover the few cases where we actually do an overwrite.
2020-07-27 14:05:41 +02:00
David Roberts 89466eefa5
Don't require separate privilege for internal detail of put pipeline (#60190)
Putting an ingest pipeline used to require that the user calling
it had permission to get nodes info as well as permission to
manage ingest.  This was due to an internal implementaton detail
that was not visible to the end user.

This change alters the behaviour so that a user with the
manage_pipeline cluster privilege can put an ingest pipeline
regardless of whether they have the separate privilege to get
nodes info.  The internal implementation detail now runs as
the internal _xpack user when security is enabled.

Backport of #60106
2020-07-27 10:44:48 +01:00
Armin Braun 25a75d05c0
Fix Test Failure in testConcurrentlyChangeRepositoryContentsInBwCMode (#60095)
There is a very unlikely but possible test failure in this test.
The `SnapshotsService` continues iterating over queued operations after
resolving the transport listener. This can lead to a situation where the moved
repository data is not picked up when running the delete (even though we
have the concurrent modifications BwC mode activated) concurrently.

I fixed this in the test so that the test still verifies that this setting works.
Technically speaking, one could add logic to the way we queue and execute repo operations
to address this special case. Since this case only comes about with the concurrent modifications
setting enabled (and the setting is gone in master already) I don't really see a reason to improve
the logic here since we should always fail queued up repo operations on concurrent modification for
safety reasons.
2020-07-27 09:33:38 +02:00
Nhat Nguyen 0031dea9cc Fix race in testSendSnapshotSendsOps (#59831)
There is a race between increase and get the global checkpoint in the 
test as indexTranslogOperations can be executed concurrently.

Closes #59492
2020-07-23 16:22:40 -04:00
Ignacio Vera db183c89ed
Refactor HyperLogLogPlusPlus to separate algorithms and internal data representation (#60104) (#60109) 2020-07-23 15:07:05 +02:00
David Turner bf7e53a91e Remove node-level canAllocate override (#59389)
Today there is a node-level `canAllocate` override which the balancer
uses to ignore certain nodes to which it is certain no more shards can
be allocated. In fact this override only ignores nodes which have hit
the rarely-used `cluster.routing.allocation.total_shards_per_node`
limit, so this optimization doesn't have a meaningful impact on real
clusters.

This commit removes this unnecessary fast path from the balancer, and
also removes all the machinery needed to support it.
2020-07-23 08:48:59 +01:00
Armin Braun 43a6ff5eb1
Optimize some Spots around Closing Resources (#60049) (#60096)
The single element `close` calls go through a very inefficient path that includes creating
a one element list.
`releaseOnce` is only with a single non-null input in production in two spots so no need for
varargs and any complexity here.
`ReleasableBytesStreamOutput` does not require any `releaseOnce` wrapping because we already have
that kind of logic implemented in `org.elasticsearch.common.util.AbstractArray` (which we were
wrapping here) already.
2020-07-23 08:49:06 +02:00
Julie Tibshirani aa57bbd422
Consolidate validation for 'docvalue_fields'. (#60065)
This improves modularity and also fixes some issues when `docvalues_fields` is
used within `inner_hits` or the `top_hits` agg:
* We previously didn't resolve wildcards in field names.
* We also forgot to enforce the limit `index.max_docvalue_fields_search`.
2020-07-22 17:26:58 -07:00
Armin Braun ebb6677815
Formalize and Streamline Buffer Sizes used by Repositories (#59771) (#60051)
Due to complicated access checks (reads and writes execute in their own access context) on some repositories (GCS, Azure, HDFS), using a hard coded buffer size of 4k for restores was needlessly inefficient.
By the same token, the use of stream copying with the default 8k buffer size  for blob writes was inefficient as well.

We also had dedicated, undocumented buffer size settings for HDFS and FS repositories. For these two we would use a 100k buffer by default. We did not have such a setting for e.g. GCS though, which would only use an 8k read buffer which is needlessly small for reading from a raw `URLConnection`.

This commit adds an undocumented setting that sets the default buffer size to `128k` for all repositories. It removes wasteful allocation of such a large buffer for small writes and reads in case of HDFS and FS repositories (i.e. still using the smaller buffer to write metadata) but uses a large buffer for doing restores and uploading segment blobs.

This should speed up Azure and GCS restores and snapshots in a non-trivial way as well as save some memory when reading small blobs on FS and HFDS repositories.
2020-07-22 21:06:31 +02:00
Tim Brooks ba01540d7e
Implement human readable indexing pressure stats (#60058)
The indexing pressure stats do not currently have human readable
variants. This commit add human readable variants and updates the
documentation.
2020-07-22 12:07:59 -06:00
Jay Modi c8ef2e18f7
Thread safe clean up of LocalNodeModeListeners (#60007)
This commit continues on the work in #59801 and makes other
implementors of the LocalNodeMasterListener interface thread safe in
that they will no longer allow the callbacks to run on different
threads and possibly race each other. This also helps address other
issues where these events could be queued to wait for execution while
the service keeps moving forward thinking it is the master even when
that is not the case.

In order to accomplish this, the LocalNodeMasterListener no longer has
the executorName() method to prevent future uses that could encounter
this surprising behavior.

Each use was inspected and if the class was also a
ClusterStateListener, the implementation of LocalNodeMasterListener
was removed in favor of a single listener that combined the logic. A
single listener is used and there is currently no guarantee on execution
order between ClusterStateListeners and LocalNodeMasterListeners,
so a future change there could cause undesired consequences. For other
classes, the implementations of the callbacks were inspected and if the
operations were lightweight, the overriden executorName method was
removed to use the default, which runs on the same thread.

Backport of #59932
2020-07-22 08:02:18 -06:00
Luca Cavanna 702c997819 ParametrizedFieldMapper to run validators against default value (#60042)
Sometimes there is the need to make a field required in the mappings, and validate that a value has been provided for it. This can be done through a validator when using ParametrizedFieldMapper, but validators need to run also when a value for a field has not been specified.

Relates to #59332
2020-07-22 14:12:38 +02:00
Armin Braun c06c9fb966
Fix BwC Snapshot INIT Path (#60006)
There were two subtle bugs here from backporting #56911 to 7.x.

1. We passed `null` for the `shards` map which isn't nullable any longer
when creating `SnapshotsInProgress.Entry`, fixed by just passing an empty map
like the `null` handling did in the past.
2. The removal of a failed `INIT` state snapshot from the cluster state tried
removing it from the finalization loop (the set of repository names that are
currently finalizing). This will trip an assertion since the snapshot failed
before its repository was put into the set. I made the logic ignore the set
in case we remove a failed `INIT` state snapshot to restore the old logic to
exactly as it was before the concurrent snapshots backport to be on the safe
side here.

Also, added tests that explicitly call the old code paths because as can be seen
from initially missing this, the BwC tests will only run in the configuration new
version master, old version nodes ever so often and having a deterministic test
for the old state machine seems the safest bet here.

Closes #59986
2020-07-22 10:09:55 +02:00
Jake Landis 55216dabb4
[7.x] Per processor description for verbose simulate (#58207) (#60008)
For ingest node processors a per processor description
was recently added. This commit displays that description
in the verbose output of the pipeline simulation.

related #57906
2020-07-21 17:32:45 -05:00
Nik Everett 49f365ddfd
Fix bug in deep pipeline agg serialization (#59984)
In #54716 I removed pipeline aggregators from the aggregation result
tree and caused us to read them from the request. This saves a bunch of
round trip bytes, which is neat. But there was a bug in the backwards
compatibility logic. You see, we still have to give the pipeline
aggregations to nodes older than 7.8 over the wire because that is how
they know what pipelines to run. They have the pipelines in the request
but they don't read them. They use the ones in the response tree.

Anyway, we had a bug where we were never sending pipelines defined two
levels down. So while you are upgrading the pipeline wouldn't run.
Sometimes. If the data node of the "first" result was post-7.8 and the
coordinating node was pre-7.8.

This fixes the bug.
2020-07-21 16:03:15 -04:00
David Turner dde568caf7 Fix scheduling of ClusterInfoService#refresh (#59880)
Today the `InternalClusterInfoService` uses the
`LocalNodeMasterListener` interface to start/stop its operations. Since
the `onMaster` and `offMaster` methods are called on the `MANAGEMENT`
threadpool, there's no guarantee that they run in the correct sequence,
which could result in an elected master failing to regularly update the
cluster info.

Since this service is also a `ClusterStateListener` we may as well drop
the usage of the `LocalNodeMasterListener` interface and simply update
the status of the local node on the applier thread in `clusterChanged`
to ensure consistency.

Additionally, today the `InternalClusterInfoService` uses a simple flag
to track whether the local node is the elected master or not. If the
node stops being the master and then starts again within a few seconds
then the scheduled updates from the old mastership might carry on
running in addition to the ones for the new mastership.

This commit addresses that by tracking the identity of the scheduled
update job and creating a new job for each mastership.
2020-07-21 17:14:49 +01:00
Alan Woodward a0ad1a196b Wrap up building parametrized TypeParsers (#59977)
The TypeParser implementations of all ParametrizedFieldMapper descendant classes are
essentially the same - stateless, requiring the construction of a Builder object, and calling
parse on it before returning it. We can make this easier (and less error-prone) to
implement by wrapping the logic up into a final class, which takes a function to produce
the Builder from a name and parser context.
2020-07-21 16:00:11 +01:00
Nik Everett 6f6076e208
Drop some params from IndexFieldData.Builder (backport of #59934) (#59972)
We never used the `IndexSettings` parameter and we only used the
`MappedFieldType` parameter to get the name of the field which we
already know everywhere where we build the `IFD.Builder`. This allows us
to drop a fair bit of ceremony from a couple of tests.
2020-07-21 10:28:59 -04:00
Luca Cavanna 5e17f00ecf Tweak toXContent implementation of ParametrizedFieldMapper (#59968)
ParametrizedFieldMapper overrides `toXContent` from `FieldMapper`, yet it could override `doXContentBody` and rely on the `toXContent` from the base class. Additionally, this allows to make `doXContentBody` final. Also, toXContent is still overridden only to make it final.
2020-07-21 16:01:51 +02:00
Przemyslaw Gomulka 19fe3e511f
Deprecate camel case date format backport(#59555) (#59948)
Camel case date formats are deprecated and snake case should be used
instead.
backports #59555
2020-07-21 15:56:44 +02:00
Armin Braun e37bfe8a5f
Stop Checking if Segment Data Blob Exists before Write (#59905) (#59971)
With uuid named segment data blobs there is no reason to ensure no overwrites are happening
for these blobs when writing. On the contrary, at least on Azure this check can conflict with
the SDK's retrying and cause upload failures randomly.
2020-07-21 15:23:42 +02:00
Yannick Welsch 07784a0b16 CCR recoveries using wrong setting for chunk sizes (#59597)
The default chunk size for CCR file-based recoveries was wrongly set to 40MB instead of 1MB.
2020-07-21 13:56:06 +02:00
Armin Braun cefaa17c52
Simplify CheckSumBlobStoreFormat and make it more Reusable (#59888) (#59950)
Refactored `CheckSumBlobStoreFormat` so it can more easily be reused in
other functionality (i.e. upcoming repair logic).
Simplified away constant `failIfAlreadyExists` parameter and removed the atomic
write method and its tests.
The atomic write method was only used in a single spot and that spot has now been adjusted to
work the same way writing root level metadata works.
2020-07-21 11:20:56 +02:00
Armin Braun 5b92596fad
Cleanup and Optimize Multiple Serialization Spots (#59626) (#59936)
Follow up to #59606 using some of the new infrastructure and making similar cleanups (and due to at times better handling of size hints and empty collections also optimizations in the stream utility methods this also means speedups) in various spots in the core codebase.
2020-07-21 10:06:56 +02:00
Julie Tibshirani 8647872a1e
Simplify structure for parsing points. (#59938)
Previously we constructed a GeometryFormat object and delegated point parsing to
it. This wasn't a good fit conceptually because each GeometryFormat instance
didn't represent a distinct point format.
2020-07-20 17:11:43 -07:00
Nik Everett b2ca19484a
Allocate slightly less per bucket (#59740) (#59873)
This replaces that data structure that we use to resolve bucket ids in
bucketing aggs that are inside other bucketing aggs. This replaces the
"legoed together" data structure with a purpose built `LongLongHash`
with semantics similar to `LongHash`, except that it has two `long`s
as keys instead of one.

The microbenchmarks show a fairly substantial performance gain on the
hot path, around 30%. Rally's higher level benchmarks show anywhere
from 0 to 7% speed improvements. Not as much as I'd hoped, but nothing
to sneeze at. And, after all, we all allocating slightly less data per
owningBucketOrd, which is always nice.
2020-07-20 10:43:11 -04:00
Stéphane Campinas bcebdfe5b1 fix handling of alias filter in SearchService#canMatch (#59368)
The check against the alias filter should be done after the request is rewritten.

Close #59367
2020-07-20 16:25:15 +02:00
David Turner b75207a09f Remove sporadic min/max usage estimates from stats (#59755)
Today `GET _nodes/stats/fs` includes `{least,most}_usage_estimate`
fields for some nodes. These fields have rather strange semantics. They
are only reported on the elected master and on nodes that have been the
elected master since they were last restarted; when a node stops being
the elected master these stats remain in place but we stop updating them
so they may become arbitrarily stale.

This means that these statistics are pretty meaningless and impossible
to use correctly. Even if they were kept up to date they're never
reported for data-only nodes anyway, despite the fact that data nodes
are the ones where we care most about disk usage. The information needed
to compute the path with the least/most available space is already
provided in the rest the stats output, so we can treat the inclusion of
these stats as a bug and fix it by simply removing them in this commit.
Since these stats were always optional and mostly omitted (for opaque
reasons) this is not considered a breaking change.
2020-07-20 15:22:04 +01:00
Lee Hinman 8c7d414a3b
[7.x] Fix retrieving data stream stats for a DS with multiple backing indices (#59806) (#59810)
Backports the following commits to 7.x:

    Fix retrieving data stream stats for a DS with multiple backing indices (#59806)
2020-07-17 16:56:07 -06:00
Nik Everett 514b2f3414
Clean up a few of vwh's rough edges (#59341) (#59807)
This cleans up a few rough edged in the `variable_width_histogram`,
mostly found by @wwang500:
1. Setting its tuning parameters in an unexpected order could cause the
   request to fail.
2. We checked that the maximum number of buckets was both less than
   50000 and MAX_BUCKETS. This drops the 50000.
3. Fixes a divide by 0 that can occur of the `shard_size` is 1.
4. Fixes a divide by 0 that can occur if the `shard_size * 3` overflows
   a signed int.
5. Requires `shard_size * 3 / 4` to be at least `buckets`. If it is less
   than `buckets` we will very consistently return fewer buckets than
   requested. For the most part we expect folks to leave it at the
   default. If they change it, we expect it to be much bigger than
   `buckets`.
6. Allocate a smaller `mergeMap` in when initially bucketing requests
   that don't use the entire `shard_size * 3 / 4`. Its just a waste.
7. Default `shard_size` to `10 * buckets` rather than `100`. It *looks*
   like that was our intention the whole time. And it feels like it'd
   keep the algorithm humming along more smoothly.
8. Default the `initial_buffer` to `min(10 * shard_size, 50000)` like
   we've documented it rather than `5000`. Like the point above, this
   feels like the right thing to do to keep the algorithm happy.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-17 15:16:09 -04:00
Lee Hinman f6b08a3115
[7.x] Allow simulating existing composable index template (#59733) (#59798)
Backports the following commits to 7.x:

    Allow simulating existing composable index template (#59733)
2020-07-17 13:10:07 -06:00
Nik Everett 95e6e4a452
Small cleanup for IndexFieldData (#59724) (#59800)
This drops `IndexComponent` from `IndexFieldData` because it wasn't
doing anything other than forcing us to perform a bunch of ceremony to
build them.
2020-07-17 13:38:15 -04:00
Tal Levy c9ab7bb651
Fix bug in circuit-breaker check for geoshape grid aggregations (#57962) (#59741)
There was a bug in the geoshape circuit-breaker check where the
hash values array was being allocated before its new size was
accounted for by the circuit breaker.

Fixes #57847.
2020-07-17 09:26:00 -07:00
Christoph Büscher f4ff5fe93b
Add `zero_terms_query` support to `match_phrase_prefix` (#58822) (#59784)
Currently `match_phrase_prefix` doesn't support `zero_terms_query` like the
other match-type queries. This change adds this support.

Closes #58468
2020-07-17 17:23:23 +02:00
Benjamin Trent b7f30fc929
[7.x] Adding new `require_alias` option to indexing requests (#58917) (#59769)
* Adding new `require_alias` option to indexing requests (#58917)

This commit adds the `require_alias` flag to requests that create new documents.

This flag, when `true` prevents the request from automatically creating an index. Instead, the destination of the request MUST be an alias.

When the flag is not set, or `false`, the behavior defaults to the `action.auto_create_index` settings.

This is useful when an alias is required instead of a concrete index.

closes https://github.com/elastic/elasticsearch/issues/55267
2020-07-17 10:24:58 -04:00
Alan Woodward 65f6fb8e94
Shortcut mapping update if the incoming mapping version is the same as the current mapping version (#59517) (#59772)
Currently, when we apply a cluster state change to a shard on a non-master node,
we check to see if the mappings need to be updated by comparing the decompressed
serialized mappings from the update against the serialized version of the shard's
existing mappings. However, we already have a much simpler way of checking this,
by comparing mapping versions on the index metadata of the old and new states.

This commit adds a shortcut to MapperService.updateMappings() that compares
these mapping versions, and ignores the merge if they are equal.
2020-07-17 14:53:09 +01:00
Alan Woodward b29d368b52
Convert DateFieldMapper to parametrized format (#59429) (#59759)
This commit makes DateFieldMapper extend ParametrizedFieldMapper,
declaring its parameters explicitly. As well as changes to DateFieldMapper
itself, there are some changes to dynamic mapping code to ensure that
dynamically detected date formats are passed through to new date mapper
builders.
2020-07-17 12:46:18 +01:00
Przemko Robakowski 790fbbcd87
[7.x] Fix handling of final pipelines when destination is changed (#59522) (#59746)
* Fix handling of final pipelines when destination is changed (#59522)

This change fixes final pipelines if destination index is changed during pipeline run:
-final pipelines can't change destination anymore, exception is thrown if they try to
-if request/default pipeline changes destination final pipeline from old index won't be executed
-if request/default pipeline changes destination and new index has final pipeline it will be executed
-default pipeline from new index won't be executed
Additionally TransportBulkAction.resolvePipelines was moved to IngestService as it's needed for resolving pipelines from new index. Tests were moved accordingly.

Closes #57968
2020-07-17 11:13:48 +02:00
Tim Brooks b6e6a8c090
Fix replication operation transient retry test (#58205)
After the work to retry transient replication failures, the local and
global checkpoint test metadata can be incremented on a different thread
than the test thread. This appears to introduce an extremely rare
scenario where this data is not visible for later test assertions. This
commit fixes the issue by using synchronized maps.
2020-07-16 16:01:47 -06:00
Martijn van Groningen 0096238df1
Replaced _data_stream_timestamp meta field's 'path' option with 'enabled' option (#59727)
Backport #59503 to 7.x

and adjusted exception messages.

Relates to #59076
2020-07-16 22:29:40 +02:00
Igor Motov 2408803fad
Adds hard_bounds to histogram aggregations (#59175) (#59656)
Adds a hard_bounds parameter to explicitly limit the buckets that a histogram
can generate. This is especially useful in case of open ended ranges that can
produce a very large number of buckets.
2020-07-16 15:31:53 -04:00
Alan Woodward 10be10c99b
Migrate CompletionFieldMapper to parametrized format (#59691)
This adds a number of new optional parameters to Parameter, including:

* custom serialization (to handle analyzers)
* deprecated parameter names
* parameter validation
* allowing default values to be based on the values of other parameters

We preserve the previous serialization format of CompletionFieldMapper,
always emitting most fields, in order to meet mapping checks in mixed
version clusters, where the mapper service will check that mappings have
been correctly parsed and updated by checking their serialized outputs.
2020-07-16 19:15:00 +01:00
Howard c0d429863c
remove unused cluster name in environment. (backport of #59605) (#59681)
removes an unused variable
2020-07-16 09:25:55 -04:00
Nik Everett 343053c0a7 Fix compilation in Eclipse (backport #59675)
Eclipse was confused by #59583. It can't see a the public inner
interface within the superclass. This time. Usually that is fine, but
the Eclipse gods don't like this particular code, I guess.
2020-07-16 08:25:12 -04:00
Alan Woodward 27067de699 Make MappedFieldType#meta final (#59383)
The MappedFieldType#updateMeta method was used for testing equality checks, but we
no longer need these after #59212 , so we can remove this method and make meta final.
2020-07-16 09:45:55 +01:00
Przemysław Witek df4fea79cb
Add a "verbose" option to the data frame analytics stats endpoint (#59589) (#59621) 2020-07-16 09:51:31 +02:00
Armin Braun 6db481f49e
Fix ConcurrentSnapshotsIT.testEquivalentDeletesAreDeduplicated (#59611) (#59653)
Trying to queue up snapshot deletes by blocking the delete of the latest
index-N doesn't work here. The first delete will block on the delete operation
but only do so after having already written the updated repository data.
Since that repository data will contain no snapshots, the subsequent deletes for
`*` will just fall through and complete instead of queue up.
=> Fixed by simply waiting on all files on master so that we block before updating
the repository data and get to test the queueing of equivalent operations

closes #59608
2020-07-16 09:28:36 +02:00
Nhat Nguyen b599f7a9c0
Fix estimate size of translog operations (#59206)
Make sure that the estimateSize method includes all fields of translog operations.
2020-07-16 00:19:30 -04:00
Julie Tibshirani 2b70758a05 Correct type parametrization in geo mappers. (#59583)
Previously the concrete type parameters for the MappedFieldType didn't always
match those for the FieldMapper. This PR updates the mappers so that the type
parameters always match, which makes the design easier to follow.
2020-07-15 14:10:47 -07:00
Boice Huang ef26c1739b fix typo in Exception Response in GeoJson (#59270) 2020-07-15 20:15:18 +01:00
Boice Huang 07a58d915d Fix typo in AggregationProfiler (#59269) 2020-07-15 20:14:19 +01:00
Armin Braun cc7093645c
Cleanup some Serialization Code around Snapshots (#59532) (#59606)
A number of obvious possible simplifications that also improve efficiency
in some cases (better empty collection handling and size hint use).
Also, added a shortcut for writing and reading immutable open maps that
can be used to dry up additional spots.
2020-07-15 20:40:43 +02:00
David Turner 67e7c3f60e Fix failing test introduced in #59601 2020-07-15 17:44:27 +01:00
Rory Hunter b8d73a1e7e
Default gateway.auto_import_dangling_indices to false (#59302)
Backport of #58898.

Part of #48366. Now that there is a dedicated API for dangling indices, the auto-import
behaviour can default to off. Also add a note to the breaking changes for 7.9.0.
2020-07-15 17:10:42 +01:00
David Turner 691759fb1f
Validate snapshot UUID during restore (#59601)
Today when mounting a searchable snapshot we obtain the snapshot/index
UUIDs and then assume that these are the UUIDs used during the
subsequent restore. If you concurrently delete the snapshot and replace
it with one with the same name then this assumption is violated, with
chaotic consequences.

This commit introduces a check that ensures that the snapshot UUID does
not change during the mount process. If the snapshot remains in place
then the index UUID necessarily does not change either.

Relates #50999
2020-07-15 16:23:20 +01:00
Martijn van Groningen 2a89e13e43
Move data stream transport and rest action to xpack (#59593)
Backport of #59525 to 7.x branch.

* Actions are moved to xpack core.
* Transport and rest actions are moved the data-streams module.
* Removed data streams methods from Client interface.
* Adjusted tests to use client.execute(...) instead of data stream specific methods.
* only attempt to delete all data streams if xpack is installed in rest tests
* Now that ds apis are in xpack and ESIntegTestCase
no longers deletes all ds, do that in the MlNativeIntegTestCase
class for ml tests.
2020-07-15 16:50:44 +02:00
Rory Hunter 2e05ce5f88 Bump version to 7.10.0 2020-07-15 11:56:45 +01:00
Ignacio Vera f8037abf47
upgrade to lucene-8.6.0 release (#59596) (#59599) 2020-07-15 12:40:57 +02:00
David Turner 0c2510dc68 Don't request cluster metadata in _cat/shards impl (#59548)
Today `GET _cat/shards` requests the nodes, routing table, and metadata
from the cluster state, but it does not use any information from the
metadata portion of the response. Metadata includes things like mappings
and templates that may be substantial in size.

This commit drops the unnecessary metadata portion of this cluster state
request.
2020-07-15 10:14:48 +01:00
Francisco Fernández Castaño 66ef1cdad7
Add the possibility to inject a custom RecoveryState factory to IndexStorePlugin implementations (#59124)
Add a custom factory for recovery state into IndexStorePlugin that
allows different implementors to provide its own RecoveryState
implementation.

Backport of #59038
2020-07-15 11:11:07 +02:00
Armin Braun 96f52a028f
Fix Snapshot not Starting in Partial Snapshot Corner Case (#59428) (#59584)
We were not handling the case where during a partial snapshot all
shards would enter a failed state right off the bat.

Closes #59384
2020-07-15 07:59:22 +02:00
Armin Braun 2dd086445c
Enable Fully Concurrent Snapshot Operations (#56911) (#59578)
Enables fully concurrent snapshot operations:
* Snapshot create- and delete operations can be started in any order
* Delete operations wait for snapshot finalization to finish, are batched as much as possible to improve efficiency and once enqueued in the cluster state prevent new snapshots from starting on data nodes until executed
   * We could be even more concurrent here in a follow-up by interleaving deletes and snapshots on a per-shard level. I decided not to do this for now since it seemed not worth the added complexity yet. Due to batching+deduplicating of deletes the pain of having a delete stuck behind a long -running snapshot seemed manageable (dropped client connections + resulting retries don't cause issues due to deduplication of delete jobs, batching of deletes allows enqueuing more and more deletes even if a snapshot blocks for a long time that will all be executed in essentially constant time (due to bulk snapshot deletion, deleting multiple snapshots is mostly about as fast as deleting a single one))
* Snapshot creation is completely concurrent across shards, but per shard snapshots are linearized for each repository as are snapshot finalizations

See updated JavaDoc and added test cases for more details and illustration on the functionality.

Some notes:

The queuing of snapshot finalizations and deletes and the related locking/synchronization is a little awkward in this version but can be much simplified with some refactoring.  The problem is that snapshot finalizations resolve their listeners on the `SNAPSHOT` pool while deletes resolve the listener on the master update thread. With some refactoring both of these could be moved to the master update thread, effectively removing the need for any synchronization around the `SnapshotService` state. I didn't do this refactoring here because it's a fairly large change and not necessary for the functionality but plan to do so in a follow-up.

This change allows for completely removing any trickery around synchronizing deletes and snapshots from SLM and 100% does away with SLM errors from collisions between deletes and snapshots.

Snapshotting a single index in parallel to a long running full backup will execute without having to wait for the long running backup as required by the ILM/SLM use case of moving indices to "snapshot tier". Finalizations are linearized but ordered according to which snapshot saw all of its shards complete first
2020-07-15 03:42:31 +02:00
Armin Braun 06d94cbb2a
Fix TODO about Spurious FAILED Snapshots (#58994) (#59576)
There is no point in writing out snapshots that contain no data that can be restored
whatsoever. It may have made sense to do so in the past when there was an `INIT` snapshot
step that wrote data to the repository that would've other become unreferenced, but in the
current day state machine without the `INIT` step there is no point in doing so.
2020-07-15 00:54:30 +02:00
Armin Braun e1014038e9
Simplify Repository.finalizeSnapshot Signature (#58834) (#59574)
Many of the parameters we pass into this method were only used to
build the `SnapshotInfo` instance to write.
This change simplifies the signature. Also, it seems less error prone to build
`SnapshotInfo` in `SnapshotsService` isntead of relying on the fact that each repository
implementation will build the correct `SnapshotInfo`.
2020-07-15 00:14:28 +02:00
Armin Braun 16a47e0d08
Simplify SnapshotsInProgress Construction (#58893) (#59573)
With parallel snapshots incoming (but also in isolation) it makes sense to clean up
`SnapshotsInProgress` construction.
We don't need to pre-compute the waiting shards for every entry. We rarely use this information
(only on routing changes) and in the one spot we did we now simply spent the extra cycles for looping
over all shards instead of just the waiting ones once per routing change tops instead of on every change
to `SnapshotsInProgress` (moreover, we would burn the cycles for looping on all nodes even though only the
current master cares about the information).
In addition to that change I removed some dead code constructors and slighly optimized deserialization.
2020-07-15 00:00:53 +02:00
Martijn van Groningen 35ae3d19db
Remove data stream feature flag (#59572)
so that it can used in the next minor release (7.9.0).

Backport of #59504 to 7.x branch.
Closes #53100
2020-07-14 23:50:41 +02:00
Armin Braun 68a199f75f
Minor Cleanup Dead Code Snapshotting (#57716) (#59569)
* Use consistent cluster state instead in state update
* Remove dead loop in tests
* Remove some dead exception ctors

Just three trivial/random things I found.
2020-07-14 23:13:14 +02:00
James Baiera 5f7e7e9410
[7.x] Data Stream Stats API (#58707) (#59566)
This API reports on statistics important for data streams, including the number of data
streams, the number of backing indices for those streams, the disk usage for each data
stream, and the maximum timestamp for each data stream
2020-07-14 16:57:46 -04:00
Mark Tozzi ed2c29f102
If no perBucketSample has been allocated for the parent bucket return a doc count of 0 (#59360) (#59567)
Co-authored-by: Fabio Corneti <info@corneti.com>
2020-07-14 16:56:29 -04:00
Armin Braun d456f7870a
Deduplicate Index Metadata in BlobStore (#50278) (#59514)
This PR introduces two new fields in to `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and uuids. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices` so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot.
This saves one write per index in the common case of unchanged metadata thus saving cost and making snapshot finalization drastically faster if many indices are being snapshotted at the same time.

The implementation is mostly analogous to that for shard generations in #46250 and piggy backs on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`).

Relates to #45736 as it improves the efficiency of snapshotting unchanged indices
Relates to #49800 as it has the potential of loading the index metadata for multiple snapshots of the same index concurrently much more efficient speeding up future concurrent snapshot delete
2020-07-14 22:18:42 +02:00