Commit Graph

52841 Commits

Author SHA1 Message Date
James Rodewig f8976505cb
[DOCS] Correct the default value of `ignore_throttled` param (#60036) (#60086)
Co-authored-by: bellengao <gbl_long@163.com>
2020-07-22 16:53:18 -04:00
James Rodewig 0c9791798d
[7.x] [DOCS] Reformat snippets to use two-space indents (#60080) 2020-07-22 15:57:49 -04:00
Armin Braun ebb6677815
Formalize and Streamline Buffer Sizes used by Repositories (#59771) (#60051)
Due to complicated access checks (reads and writes execute in their own access context) on some repositories (GCS, Azure, HDFS), using a hard coded buffer size of 4k for restores was needlessly inefficient.
By the same token, the use of stream copying with the default 8k buffer size for blob writes was inefficient as well.

We also had dedicated, undocumented buffer size settings for HDFS and FS repositories. For these two we would use a 100k buffer by default. We did not have such a setting for e.g. GCS though, which would only use an 8k read buffer which is needlessly small for reading from a raw `URLConnection`.

This commit adds an undocumented setting that sets the default buffer size to `128k` for all repositories. It removes wasteful allocation of such a large buffer for small writes and reads in case of HDFS and FS repositories (i.e. still using the smaller buffer to write metadata) but uses a large buffer for doing restores and uploading segment blobs.

This should speed up Azure and GCS restores and snapshots in a non-trivial way as well as save some memory when reading small blobs on FS and HDFS repositories.
2020-07-22 21:06:31 +02:00
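
The buffer-size argument in the commit above (#59771) can be illustrated with a small sketch. This is not the repository code: `copy_stream` and `DEFAULT_BUFFER_SIZE` are hypothetical names, and the only figures taken from the commit message are the 8k and 128k sizes. It simply counts how many read calls a chunked copy needs, which matters when each read carries a fixed overhead such as an access check.

```python
import io

# Hypothetical default, mirroring the 128k figure from the commit message.
DEFAULT_BUFFER_SIZE = 128 * 1024


def copy_stream(src, dst, buffer_size=DEFAULT_BUFFER_SIZE):
    """Copy src to dst in chunks; return (bytes_copied, read_calls)."""
    copied = 0
    reads = 0
    while True:
        chunk = src.read(buffer_size)
        if not chunk:
            break
        reads += 1
        dst.write(chunk)
        copied += len(chunk)
    return copied, reads


payload = b"x" * (1024 * 1024)  # a 1 MiB blob
small = copy_stream(io.BytesIO(payload), io.BytesIO(), buffer_size=8 * 1024)
large = copy_stream(io.BytesIO(payload), io.BytesIO())
print(small, large)
```

With an in-memory source the two runs copy the same number of bytes, but the 8k run issues sixteen times as many reads as the 128k run.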
Lisa Cawley 9ba017f699
[DOCS] Changes level offset of transform pages (#60066) (#60075) 2020-07-22 11:22:57 -07:00
Tim Brooks ba01540d7e
Implement human readable indexing pressure stats (#60058)
The indexing pressure stats do not currently have human readable
variants. This commit adds human readable variants and updates the
documentation.
2020-07-22 12:07:59 -06:00
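
As a rough illustration of what "human readable variants" means for byte-valued stats (#60058), the sketch below pairs each `*_in_bytes` value with a formatted string. The helper names and the stat key are hypothetical, and the formatting rules (1024-based units, one decimal place) are an assumption, not a copy of the Elasticsearch implementation.

```python
UNITS = ["b", "kb", "mb", "gb", "tb", "pb"]


def human_readable_bytes(num_bytes):
    """Format a byte count with 1024-based units and at most one decimal."""
    value = float(num_bytes)
    unit = 0
    while value >= 1024 and unit < len(UNITS) - 1:
        value /= 1024
        unit += 1
    # Drop a trailing ".0" so 2048 bytes renders as "2kb", not "2.0kb".
    text = f"{value:.1f}".rstrip("0").rstrip(".")
    return f"{text}{UNITS[unit]}"


def with_human_variants(stats):
    """Add a formatted sibling for every key that ends in `_in_bytes`."""
    out = {}
    for key, value in stats.items():
        if key.endswith("_in_bytes"):
            out[key[: -len("_in_bytes")]] = human_readable_bytes(value)
        out[key] = value
    return out


# Hypothetical stat name, used only for the example.
print(with_human_variants({"coordinating_in_bytes": 1536}))
```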
James Rodewig ed10d7407c
[DOCS] Fix shrink index API prereqs (#59985) (#60067) 2020-07-22 14:06:40 -04:00
James Rodewig 74a34777d1
[DOCS] Fix outdated Kibana UI refs and screenshots in security docs (#60023) (#60059) 2020-07-22 13:08:22 -04:00
Tim Brooks ceb54ed655
Add indexing pressure documentation (#59456)
This commit adds documentation about the new indexing pressure memory
limit setting and the exposure of these metrics in node stats.
2020-07-22 10:09:18 -06:00
Adam Locke 0a73225cd8
[DOCS] Adding new page for restore snapshot API (#59937) (#60055)
* Adding new page for restore snapshot API.

* Improving test cases, lots of edits, and streamlining content.

* Incorporating review suggestions and feedback.

* Specify `index alias` vs `alias`

* Change parameter order

* Provide clarity around regular expression

* Add link to SLM parameters

* Split sentences in example

* Adding link to master node page.
2020-07-22 12:08:55 -04:00
Lisa Cawley 46d33b1586
[DOCS] 7.9.0 release notes (#60053) 2020-07-22 08:40:59 -07:00
Larry Gregory a686ccc9b2
[Backport][7.x] Introduce reserved_ml_apm_user kibana privilege (#59854) (#60047) 2020-07-22 11:06:10 -04:00
Jay Modi c8ef2e18f7
Thread safe clean up of LocalNodeMasterListeners (#60007)
This commit continues on the work in #59801 and makes other
implementors of the LocalNodeMasterListener interface thread safe in
that they will no longer allow the callbacks to run on different
threads and possibly race each other. This also helps address other
issues where these events could be queued to wait for execution while
the service keeps moving forward thinking it is the master even when
that is not the case.

In order to accomplish this, the LocalNodeMasterListener no longer has
the executorName() method to prevent future uses that could encounter
this surprising behavior.

Each use was inspected and if the class was also a
ClusterStateListener, the implementation of LocalNodeMasterListener
was removed in favor of a single listener that combined the logic. A
single listener is used and there is currently no guarantee on execution
order between ClusterStateListeners and LocalNodeMasterListeners,
so a future change there could cause undesired consequences. For other
classes, the implementations of the callbacks were inspected and if the
operations were lightweight, the overridden executorName method was
removed to use the default, which runs on the same thread.

Backport of #59932
2020-07-22 08:02:18 -06:00
Luca Cavanna 702c997819 ParametrizedFieldMapper to run validators against default value (#60042)
Sometimes there is the need to make a field required in the mappings, and validate that a value has been provided for it. This can be done through a validator when using ParametrizedFieldMapper, but validators need to run also when a value for a field has not been specified.

Relates to #59332
2020-07-22 14:12:38 +02:00
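
The idea in the commit above (#60042), running validators even when a field was not specified, can be sketched as follows. `Parameter` and `required` are hypothetical stand-ins for illustration and do not reproduce the `ParametrizedFieldMapper` API.

```python
class Parameter:
    """Hypothetical mapping parameter with an optional validator."""

    def __init__(self, name, default=None, validator=None):
        self.name = name
        self.default = default
        self.validator = validator

    def parse(self, mapping):
        # Fall back to the default when the mapping omits the parameter.
        value = mapping.get(self.name, self.default)
        # The validator runs in both cases, so a missing value is caught too.
        if self.validator is not None:
            self.validator(value)
        return value


def required(name):
    """Build a validator that rejects a missing (None) value."""
    def check(value):
        if value is None:
            raise ValueError(f"field [{name}] is required")
    return check


path = Parameter("path", default=None, validator=required("path"))
print(path.parse({"path": "user.id"}))
```

Calling `path.parse({})` raises `ValueError`, because the validator also sees the default value. If validators only ran on explicitly provided values, the omission would pass silently.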
Dimitris Athanasiou 7e652ca873
[7.x][ML] Include same fields during test inference as in training (#… (#60034)
In #58877, when we switched test inference to run in Java, we just
used the doc's `_source` as features. However, this could be
missing out on features that were used during training,
e.g. alias fields, etc.

This commit addresses this by extracting fields to use as
features during inference the same way they are extracted
in `DataFrameDataExtractor` when they are used for training.

Backport of #59963
2020-07-22 12:54:13 +03:00
David Roberts 7358f9fb05 [ML] Mute ForecastIT.testOverflowToDisk in EAR builds (#60040)
Due to https://github.com/elastic/elasticsearch/issues/58806
2020-07-22 10:17:37 +01:00
Armin Braun c06c9fb966
Fix BwC Snapshot INIT Path (#60006)
There were two subtle bugs here from backporting #56911 to 7.x.

1. We passed `null` for the `shards` map which isn't nullable any longer
when creating `SnapshotsInProgress.Entry`, fixed by just passing an empty map
like the `null` handling did in the past.
2. The removal of a failed `INIT` state snapshot from the cluster state tried
removing it from the finalization loop (the set of repository names that are
currently finalizing). This will trip an assertion since the snapshot failed
before its repository was put into the set. I made the logic ignore the set
in case we remove a failed `INIT` state snapshot to restore the old logic to
exactly as it was before the concurrent snapshots backport to be on the safe
side here.

Also, added tests that explicitly call the old code paths because as can be seen
from initially missing this, the BwC tests will only run in the configuration new
version master, old version nodes every so often and having a deterministic test
for the old state machine seems the safest bet here.

Closes #59986
2020-07-22 10:09:55 +02:00
Rene Groeschke b210af8389
Update Gradle configurations section in CONTRIBUTING (#59906) 2020-07-22 09:15:32 +02:00
Rene Groeschke 3fe6635b92
Remove stale gradle plugin descriptor 2020-07-22 09:10:01 +02:00
James Baiera 1c1a4297e0
Track backing indices in data streams stats from cluster state (#59817) (#60015)
If shard level results are incomplete in the data streams stats call, it is possible to get inaccurate 
counts of the number of backing indices, despite this data being accurate and available in the 
cluster state.
2020-07-21 23:21:33 -04:00
Emily Li 5f27a95346 Fix grammar mistake in SQL data type docs. (#60028)
Remove an extra 'when'.
2020-07-21 16:15:06 -07:00
Jake Landis 55216dabb4
[7.x] Per processor description for verbose simulate (#58207) (#60008)
For ingest node processors a per processor description
was recently added. This commit displays that description
in the verbose output of the pipeline simulation.

related #57906
2020-07-21 17:32:45 -05:00
James Rodewig 293cb8d48c
[DOCS] Fix typo in thread pools docs (#59944) (#60019)
Fix typo where available processors should be allocated processors.

Co-authored-by: Leaf-Lin <39002973+Leaf-Lin@users.noreply.github.com>
2020-07-21 17:04:36 -04:00
James Rodewig 401e12dc2b
[DOCS] Fix data stream docs (#59818) (#60010) 2020-07-21 17:04:13 -04:00
James Rodewig 04c68ba740
[DOCS] Update search docs to use `my-index` dataset (#60005) (#60012) 2020-07-21 16:14:44 -04:00
Nik Everett 49f365ddfd
Fix bug in deep pipeline agg serialization (#59984)
In #54716 I removed pipeline aggregators from the aggregation result
tree and caused us to read them from the request. This saves a bunch of
round trip bytes, which is neat. But there was a bug in the backwards
compatibility logic. You see, we still have to give the pipeline
aggregations to nodes older than 7.8 over the wire because that is how
they know what pipelines to run. They have the pipelines in the request
but they don't read them. They use the ones in the response tree.

Anyway, we had a bug where we were never sending pipelines defined two
levels down. So while you are upgrading, the pipeline wouldn't run.
Sometimes. If the data node of the "first" result was post-7.8 and the
coordinating node was pre-7.8.

This fixes the bug.
2020-07-21 16:03:15 -04:00
James Baiera b3363cf8f9
[7.x] Remove unneeded rest params from Data Stream Stats (#59575) (#59661)
This PR removes the expand_wildcards and forbid_closed_indices parameters from the Data 
Streams Stats REST endpoint. These options are required for broadcast requests, but are not 
needed for anything in terms of resolving data streams. Instead, we just set a default set of 
IndicesOptions on the transport request.
2020-07-21 15:59:16 -04:00
James Rodewig b302b09b85
[DOCS] Reformat snippets to use two-space indents (#59973) (#59994) 2020-07-21 15:49:58 -04:00
David Roberts 606b7ea139 [DOCS] Adds extra ml-cpp PRs to release notes (#59967) 2020-07-21 11:47:36 -07:00
Tim Brooks ed315442ac
Update thread pool docs about WRITE queue size (#59643)
This commit updates the thread pool documentation to reflect the change
in the WRITE thread pool default queue size.
2020-07-21 12:38:03 -06:00
James Rodewig 32d7fa1541
[DOCS] Introduce basic ECS logs test (#59713) (#59997)
Adds a new `my-index-00001` REST test for docs snippets.

This test can serve as a lightweight replacement for
our existing `twitter` REST tests.

The new dataset is:

* Based on Apache logs, which is better aligned with Elastic use cases
* Compliant with ECS
* Similar to the existing `twitter` data set, containing the same field data types
* Lightweight, which should keep existing test runtimes roughly the same

Also updates the search API reference docs to use the new test.
2020-07-21 13:25:53 -04:00
Armin Braun 5613e4b00b
Increase Timeout in testSLMRetentionAfterRestore (#59979) (#59991)
This test failed by hitting the 10s default busy assert timeout.
Given how involved the retention run is (multiple disk reads, CS updates etc.)
we should have a higher timeout here.

Also, removed the pointless delete call for the snapshot that we just asserted is gone,
at the end of the test.

Closes #59956
2020-07-21 18:19:18 +02:00
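
For readers unfamiliar with the "busy assert" pattern mentioned in the commit above (#59979), here is a minimal sketch: retry a check until it passes or a timeout expires. The function name, the backoff policy, and the example check are assumptions for illustration, not the test framework's actual implementation.

```python
import time


def assert_busy(check, timeout=10.0, initial_delay=0.001):
    """Re-run `check` until it stops raising AssertionError or time runs out."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        try:
            check()
            return
        except AssertionError:
            if time.monotonic() >= deadline:
                raise  # give up and surface the last failure
            # Back off, but never sleep past the deadline.
            time.sleep(min(delay, max(0.0, deadline - time.monotonic())))
            delay *= 2


state = {"attempts": 0}


def snapshot_is_gone():
    # Hypothetical condition that only holds on the third attempt.
    state["attempts"] += 1
    assert state["attempts"] >= 3, "snapshot still present"


# A slow operation simply gets a larger timeout than the default.
assert_busy(snapshot_is_gone, timeout=30.0)
print(state["attempts"])
```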
David Turner dde568caf7 Fix scheduling of ClusterInfoService#refresh (#59880)
Today the `InternalClusterInfoService` uses the
`LocalNodeMasterListener` interface to start/stop its operations. Since
the `onMaster` and `offMaster` methods are called on the `MANAGEMENT`
threadpool, there's no guarantee that they run in the correct sequence,
which could result in an elected master failing to regularly update the
cluster info.

Since this service is also a `ClusterStateListener` we may as well drop
the usage of the `LocalNodeMasterListener` interface and simply update
the status of the local node on the applier thread in `clusterChanged`
to ensure consistency.

Additionally, today the `InternalClusterInfoService` uses a simple flag
to track whether the local node is the elected master or not. If the
node stops being the master and then starts again within a few seconds
then the scheduled updates from the old mastership might carry on
running in addition to the ones for the new mastership.

This commit addresses that by tracking the identity of the scheduled
update job and creating a new job for each mastership.
2020-07-21 17:14:49 +01:00
James Rodewig fb40ccf8a4
[DOCS] Mark data stream stats API as stable (#59978) (#59987)
Removes experimental admon from data stream stats API.
Relates to #59860.
2020-07-21 11:22:36 -04:00
malpani 0555fef799 Support ignore_keywords flag for word delimiter graph token filter (#59563)
This commit allows customizing the word delimiter token filters to skip processing 
tokens tagged as keyword through the `ignore_keywords` flag Lucene's 
WordDelimiterGraphFilter already exposes.

Fix for #59491
2020-07-21 16:11:55 +01:00
Alan Woodward a0ad1a196b Wrap up building parametrized TypeParsers (#59977)
The TypeParser implementations of all ParametrizedFieldMapper descendant classes are
essentially the same - stateless, requiring the construction of a Builder object, and calling
parse on it before returning it. We can make this easier (and less error-prone) to
implement by wrapping the logic up into a final class, which takes a function to produce
the Builder from a name and parser context.
2020-07-21 16:00:11 +01:00
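
The wrapping described in the commit above (#59977) amounts to one reusable parser that takes a builder-producing function. The sketch below uses hypothetical Python names (`TypeParser`, `KeywordBuilder`) and a made-up parameter; it mirrors the shape of the idea, not the actual Java classes.

```python
class TypeParser:
    """Stateless parser: build a Builder from (name, context), then parse."""

    def __init__(self, builder_factory):
        self._builder_factory = builder_factory

    def parse(self, name, node, parser_context):
        builder = self._builder_factory(name, parser_context)
        builder.parse(node)
        return builder


class KeywordBuilder:
    """Hypothetical builder for one field type."""

    def __init__(self, name, parser_context):
        self.name = name
        self.parser_context = parser_context
        self.params = {"ignore_above": None}

    def parse(self, node):
        # Copy over the parameters this builder knows about.
        for key in self.params:
            if key in node:
                self.params[key] = node[key]


# Each field type only supplies its builder function.
KEYWORD_PARSER = TypeParser(lambda name, ctx: KeywordBuilder(name, ctx))
builder = KEYWORD_PARSER.parse("tag", {"ignore_above": 256}, parser_context={})
print(builder.name, builder.params)
```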
James Rodewig 4d646ca819
[DOCS] Fix typo in LDAP config docs (#59953) (#59974)
Co-authored-by: AndyHunt66 <andrew.hunt@elastic.co>
2020-07-21 10:48:08 -04:00
Nik Everett 6f6076e208
Drop some params from IndexFieldData.Builder (backport of #59934) (#59972)
We never used the `IndexSettings` parameter and we only used the
`MappedFieldType` parameter to get the name of the field which we
already know everywhere where we build the `IFD.Builder`. This allows us
to drop a fair bit of ceremony from a couple of tests.
2020-07-21 10:28:59 -04:00
Howard 466e947b0e
[DOCS] Fix missing punctuation in agg docs (#59823) 2020-07-21 10:19:29 -04:00
Luca Cavanna 5e17f00ecf Tweak toXContent implementation of ParametrizedFieldMapper (#59968)
ParametrizedFieldMapper overrides `toXContent` from `FieldMapper`, yet it could override `doXContentBody` and rely on the `toXContent` from the base class. Additionally, this allows making `doXContentBody` final. Also, `toXContent` is still overridden only to make it final.
2020-07-21 16:01:51 +02:00
Przemyslaw Gomulka 19fe3e511f
Deprecate camel case date format backport (#59555) (#59948)
Camel case date formats are deprecated and snake case should be used
instead.
backports #59555
2020-07-21 15:56:44 +02:00
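
To make the camel case versus snake case distinction in the commit above (#59555) concrete, the sketch below converts a named format such as `strictDateOptionalTime` to `strict_date_optional_time` and emits a deprecation warning. The function names and the warning text are hypothetical, and the sketch applies only to built-in format names; custom patterns such as `yyyy-MM-dd` are out of scope.

```python
import re
import warnings


def to_snake_case(name):
    """Insert an underscore before each inner capital, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def resolve_format_name(name):
    """Return the snake case name, warning if a camel case name was used."""
    snake = to_snake_case(name)
    if snake != name:
        warnings.warn(
            f"camel case format name [{name}] is deprecated, use [{snake}]",
            DeprecationWarning,
        )
    return snake


print(to_snake_case("strictDateOptionalTime"))
```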
Armin Braun e37bfe8a5f
Stop Checking if Segment Data Blob Exists before Write (#59905) (#59971)
With uuid named segment data blobs there is no reason to ensure no overwrites are happening
for these blobs when writing. On the contrary, at least on Azure this check can conflict with
the SDK's retrying and cause upload failures randomly.
2020-07-21 15:23:42 +02:00
Przemysław Witek 283a1f605c
Rename binary_soft_classification evaluation to outlier_detection (#59951) (#59970) 2020-07-21 15:15:04 +02:00
Rene Groeschke c6d3af35b9
Simplify gradle test task error reporting (#59760) (#59964)
* Simplify test error reporting

- avoid using extra plugin
- avoid extra task listener (should be avoided; related to #57918)
- keep all logic in the listener
2020-07-21 14:13:35 +02:00
Yannick Welsch 07784a0b16 CCR recoveries using wrong setting for chunk sizes (#59597)
The default chunk size for CCR file-based recoveries was wrongly set to 40MB instead of 1MB.
2020-07-21 13:56:06 +02:00
Rene Groeschke 60d46f1d13
Remove superfluous enforce deprecation failure plugin (#59770) (#59898)
- With enforcing the build to fail on all gradle deprecated api usage we do not need
this tailored plugin anymore
2020-07-21 12:54:17 +02:00
Armin Braun cefaa17c52
Simplify CheckSumBlobStoreFormat and make it more Reusable (#59888) (#59950)
Refactored `CheckSumBlobStoreFormat` so it can more easily be reused in
other functionality (i.e. upcoming repair logic).
Simplified away constant `failIfAlreadyExists` parameter and removed the atomic
write method and its tests.
The atomic write method was only used in a single spot and that spot has now been adjusted to
work the same way writing root level metadata works.
2020-07-21 11:20:56 +02:00
Armin Braun 5b92596fad
Cleanup and Optimize Multiple Serialization Spots (#59626) (#59936)
Follow up to #59606 using some of the new infrastructure and making similar cleanups in various spots in the core codebase. Because the stream utility methods at times handle size hints and empty collections better, these cleanups are also optimizations and bring speedups.
2020-07-21 10:06:56 +02:00
Julie Tibshirani 8647872a1e
Simplify structure for parsing points. (#59938)
Previously we constructed a GeometryFormat object and delegated point parsing to
it. This wasn't a good fit conceptually because each GeometryFormat instance
didn't represent a distinct point format.
2020-07-20 17:11:43 -07:00
Lisa Cawley fb212269ce
[DOCS] Changes level offset of anomaly detection pages (#59911) (#59940) 2020-07-20 17:04:59 -07:00
Julie Tibshirani 8dc5880c3f Add 'point' to the top-level field type docs. (#59731)
Before it was missing from the list. This PR also renames the 'geo data types'
section to 'spatial data types' and consolidates the geo and cartesian types
into that section.
2020-07-20 16:30:12 -07:00