52150 Commits

Author SHA1 Message Date
Nik Everett 5056f2792d
Skip max_buckets test when it is flaky (#58038)
Before #57042 the max_buckets test would consistently pass because the
request would consistently fail. In particular, the request would fail on
the data node. After #57042 it only fails on the coordinating node. When
the max_buckets test is run in a mixed version cluster it consistently
fails on *either* the data node or the coordinating node. The exception is
when the coordinating node is missing #43095. In that case, if one data
node has #57042 and one does not, *and* the one that doesn't gets the
request first, it fails the request as expected, and then the coordinating
node retries the request on the node with #57042. When that happens the
request fails mysteriously with "partial shard failures" as the error
message but no partial failures reported. This is *exactly* the bug
fixed in #43095.

This updates the test to be skipped in mixed version clusters that lack
#43095, because they *sometimes* fail the test spuriously. The request
fails in those cases, just like we expect, but with a mysterious error
message.

Closes #57657
2020-06-12 15:06:56 -04:00
James Rodewig 0906709598 [DOCS] Reword prohibited ops for data stream write index 2020-06-12 14:00:24 -04:00
James Baiera e8e351769f
Make dependenciesInfo task filter out project dependencies (#58015) (#58057)
By default we filter projects out of the dependencies info task based on their
group names. We should instead filter based on whether the dependency is a
project dependency.
2020-06-12 13:41:29 -04:00
Ryan Ernst 3e504fb69a
Re-enable certgen cli tests on windows (#58017)
relates #50825
2020-06-12 10:22:02 -07:00
Matthias Gärtner 65ca575632 Off-by-ones on data sizes in documentation (#57735)
Fixes exponent off-by-ones in Painless documentation for int and long.
2020-06-12 09:59:15 -07:00
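The commit above (#57735) does not quote the corrected text, so the following is only a sketch of the bounds involved. It assumes the Painless `int` and `long` types have the same ranges as Java's 32-bit and 64-bit signed integers. The class and method names are hypothetical. The exponent is `bits - 1`, not `bits`, which is the kind of off-by-one the commit title describes.

```java
import java.math.BigInteger;

class IntegerRanges {
    // Smallest value of a signed two's-complement integer: -2^(bits-1).
    public static BigInteger min(int bits) {
        return BigInteger.valueOf(2).pow(bits - 1).negate();
    }

    // Largest value of a signed two's-complement integer: 2^(bits-1) - 1.
    public static BigInteger max(int bits) {
        return BigInteger.valueOf(2).pow(bits - 1).subtract(BigInteger.ONE);
    }

    public static void main(String[] args) {
        // Compare the formulas against the constants the JDK defines.
        System.out.println(min(32).equals(BigInteger.valueOf(Integer.MIN_VALUE)));
        System.out.println(max(32).equals(BigInteger.valueOf(Integer.MAX_VALUE)));
        System.out.println(min(64).equals(BigInteger.valueOf(Long.MIN_VALUE)));
        System.out.println(max(64).equals(BigInteger.valueOf(Long.MAX_VALUE)));
    }
}
```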
James Rodewig d0d8ef7270
[DOCS] Document reindex for data streams (#58013) (#58051)
Changes:

* Adds new 'Reindex with a data stream' section to 'Use a data stream'

* Makes the existing reindex API docs aware of data streams

* Rewrites the reindex glossary definition to include data streams
2020-06-12 11:21:31 -04:00
James Rodewig 82493615ce
[DOCS] Add data streams glossary def (#58014) (#58049) 2020-06-12 10:39:51 -04:00
James Rodewig 4539f15b90
[DOCS] Remove 7.8.0 coming tags (#58047) 2020-06-12 10:17:40 -04:00
Rory Hunter e840ffa300 Add release notes for 7.8.0 (#56340)
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-authored-by: Tim Vernum <tim@adjective.org>
Co-authored-by: lcawl <lcawley@elastic.co>
2020-06-12 10:03:41 -04:00
Mayya Sharipova 8bd0147ba7
Correct how meta-field is defined for pre 7.8 hits (#57951)
We keep a static list of meta-fields, META_FIELDS_BEFORE_7_8,
as it was before.
This is done to ensure backward compatibility with pre-7.8 nodes.

Closes #57831
2020-06-12 09:39:53 -04:00
Ignacio Vera c518670f83
Fix Geo grid aggregation circuit breaker tests (#58028) (#58042)
This commit makes sure we create index with only one shard.
2020-06-12 15:39:27 +02:00
Armin Braun 5662281562
Fix ExtraFS Breaking SharedClusterSnapshotRestoreIT (#58026) (#58040)
If `ExtraFS` decides to put `extra0/0` into the indices folder
then the previous logic in this test would have interpreted the `0`
as shard `0` of index `extra0` and fail to list its contents (since it's a file
and not an actual shard directory).

=> simplified the logic to use the actually referenced `IndexId`s for iterating over indices
instead.
2020-06-12 15:27:48 +02:00
Martijn van Groningen 01d8bb8cfa
Enforce valid field mapping exists for timestamp_field in templates. (#58036)
Backport of #57741 to 7.x branch.

Relates to #53100
2020-06-12 15:24:42 +02:00
Armin Braun a5a251d8c0
Handle Rejections when Scheduling RetryableAction (#58033) (#58039)
Scheduling on the threadpool will throw if the scheduler is already
shut down. Handled by treating the rejection like any other non-retryable
exception.

Closes #58021
2020-06-12 15:23:02 +02:00
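The behaviour described in the commit above (#58033) can be illustrated with the JDK's own scheduler. This is a minimal sketch, not the Elasticsearch `RetryableAction` code: the class `RetryScheduling`, the method `scheduleRetry`, and the `onFailure` callback are hypothetical names. It relies only on the documented fact that a `ScheduledExecutorService` created by `Executors` throws `RejectedExecutionException` from `schedule` once it has been shut down.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class RetryScheduling {
    // Hypothetical helper: try to schedule a retry. If the scheduler rejects
    // the task, hand the rejection to onFailure like any other final failure.
    public static boolean scheduleRetry(ScheduledExecutorService scheduler,
                                        Runnable retry,
                                        long delayMillis,
                                        Consumer<Exception> onFailure) {
        try {
            scheduler.schedule(retry, delayMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (RejectedExecutionException e) {
            // The scheduler is shut down, so no further retry can happen.
            onFailure.accept(e);
            return false;
        }
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.shutdown();
        boolean scheduled = scheduleRetry(scheduler, () -> {}, 10,
            e -> System.out.println("retry abandoned: " + e.getClass().getSimpleName()));
        System.out.println("scheduled: " + scheduled);
    }
}
```

Without the `catch`, the exception would escape from the code that tried to schedule the retry, and the caller waiting on the action would never be notified.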
Nik Everett d6c8d9415d
Give significance lookups their own home (backport of #57903) (#57959)
This moves the code to look up significance heuristics information like
background frequency and superset size out of
`SignificantTermsAggregatorFactory` and into its own home so that it is
easier to pass around. This will:
1. Make us feel better about ourselves for not passing around the
   factory, which is really *supposed* to be a throw away thing.
2. Abstract the significance lookup logic so we can reuse it for the
   `significant_text` aggregation.
3. Make it very simple to cache the background frequencies, which should
   speed things up when the agg is a sub-agg. We had done this for numerics
   but not string-shaped significant terms.
2020-06-12 09:21:19 -04:00
James Rodewig 922423c52c [DOCS] Remove unneeded comma in data stream docs 2020-06-12 09:14:56 -04:00
Rene Groeschke cb78c4d29d
Fix info message for RunTask in debug mode (#57974) (#58003)
Fixes #57860

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-06-12 13:32:03 +02:00
Martijn van Groningen f4199f2ee0
Prohibit append-only writes targeting backing indices directly. (#58025)
Backport of #57788 to 7.x branch.

Append-only writes can only target the corresponding data stream.

Relates to #53100
2020-06-12 13:17:55 +02:00
David Roberts 93b693527a
[7.x][ML] Add categorizer stats ML result type (#58001)
This type of result will store stats about how well categorization
is performing.  When per-partition categorization is in use, separate
documents will be written for every partition so that it is possible
to see if categorization is working well for some partitions but not
others.

This PR is a minimal implementation to allow the C++ side changes to
be made.  More Java side changes related to per-partition
categorization will be in followup PRs.  However, even in the long
term I do not see a major benefit in introducing dedicated APIs for
querying categorizer stats.  Like forecast request stats the
categorizer stats can be read directly from the job's results alias.

Backport of #57978
2020-06-12 12:08:07 +01:00
markharwood 2da8e57f59
Search - add range query support to wildcard field (#57881) (#57988)
Backport to add range query support to wildcard field

Closes #57816
2020-06-12 11:30:54 +01:00
Armin Braun db03e7c93b
Exclude WindowsFS from SharedClusterSnapshotRestoreIT (#58020) (#58023)
Same as #52488 but for a different test suite

Closes #58019
2020-06-12 10:49:03 +02:00
David Kyle 39020f3900
HLRC for delete expired data by job Id (#57722) (#57975)
High level rest client changes for #57337
2020-06-12 09:44:17 +01:00
Martijn van Groningen c8031c6f99
Add data stream support to the reindex api. (#57970)
Backport of #57870 to 7.x branch.

This change now also copies the op_type from the reindex request's destination index request to the actual index request being used in the bulk request.

For ensuring no document exists, the op_type create doesn't need to be copied, since Versions.MATCH_DELETED will be copied from the 'mainRequest.getDestination().version()'.
The `version()` method on IndexRequest only returns Versions.MATCH_DELETED if op_type=create and no specific version has been specified.

However in order to be able to index into a data stream, the op_type must be create. So in order to support that the op_type must be copied from the reindex request's destination index request to the actual index request being used in the bulk request.

Relates to #53100 and #57788
2020-06-12 09:54:37 +02:00
Rene Groeschke 5226fef321 Update Gradle wrapper to 6.5 (#57580) (#57653)
* Update Gradle wrapper to 6.5
* Fix groovy incompatibility issue after gradle update
* Fix Gstring String incompatibility
2020-06-12 08:38:16 +02:00
Ryan Ernst 3bc2601ba3
Re-enable packaging tests for windows (#58010)
This commit fixes the gc logfile name for windows on java 8, and re-enables windows testing of the archive tests.

closes #50825
2020-06-11 16:26:24 -07:00
James Rodewig bf90b6f221 [DOCS] Remove extra word from data stream docs 2020-06-11 17:44:59 -04:00
Mark Tozzi 36f551bdb4
Make ValuesSourceConfig behave like a config object (#57762) (#58012) 2020-06-11 17:23:55 -04:00
Igor Motov 5138c0c045
Fix missing null values for std_deviation_bounds in ext. stats aggs (#58000)
Adds missing null values for std_deviation_bounds in extended stats aggs and
improves null handling in parsed extended stats.
2020-06-11 16:23:20 -04:00
James Rodewig 1814b66a69 [DOCS] Fix typos in data stream docs 2020-06-11 16:21:09 -04:00
Benjamin Trent 2881995a45
[ML] adding new inference model size estimate handling from native process (#57930) (#57999)
Adds support for reading in `model_size_info` objects.

These objects contain numeric values indicating the model definition size and complexity.

Additionally, these objects are not stored or serialized to any other node. They are to be used for calculating and storing model metadata. They are much smaller on heap than the true model definition and should help prevent the analytics process from using too much memory.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-06-11 15:59:23 -04:00
Lee Hinman ffc3c77f75
[7.x] Disallow deletion of composable template if in use by data stream (#57957) (#57994)
Backports the following commits to 7.x:

    Disallow deletion of composable template if in use by data stream (#57957)
2020-06-11 13:51:56 -06:00
Mark Vieira d9e547dbd3
Revert "Re-enable windows archives packaging tests (#57955)"
This reverts commit 573c6279af.
2020-06-11 11:58:56 -07:00
Lisa Cawley 7442808869 [DOCS] Rename monitoring collection from internal to legacy (#56395) 2020-06-11 10:21:01 -07:00
Jim Ferenczi 4c6bfe32a7 Fix possible NPE on search phase failure (#57952)
When a search phase fails, we release the context of all successful shards.
Successful shards that rewrite the request to match none will not create any context
since #. This change ensures that we don't try to release a `null` context on these
successful shards.

Closes #57945
2020-06-11 18:54:16 +02:00
James Rodewig c36df27730
[DOCS] Reformat `pattern_replace` token filter (#57699) (#57995)
Changes:

* Rewrites description and adds Lucene link
* Adds analyze example
* Adds parameter definitions
* Adds custom analyzer example
2020-06-11 12:19:38 -04:00
Yannick Welsch 85b0b540f0 Fix refresh behavior in MockDiskUsagesIT (#57926)
Ensures that InternalClusterInfoService's internally cached stats are refreshed whenever the
shard size or disk usage function (to mock out disk usage) are overridden.

Closes #57888
2020-06-11 17:38:12 +02:00
James Rodewig 6fc8317f07
[DOCS] Reformat data streams intro and overview (#57954) (#57993)
Changes:

* Updates 'Data streams' intro page to focus on problem solution and
  benefits.

* Adds 'Data streams overview' page to cover conceptual information,
  based on existing content in the 'Data streams' intro.

* Adds diagrams for data streams and search/indexing request examples.

* Moves API jump list and API docs to a new 'Data streams APIs' section.
  Links to these APIs will be available through tutorials.

* Add xrefs to existing docs for concepts like generation, write index,
  and append-only.
2020-06-11 11:32:09 -04:00
James Rodewig 4e738f60f8 [DOCS] Fix typo in data stream docs 2020-06-11 11:30:00 -04:00
James Rodewig d534862d41
[DOCS] Move search API's `docvalue_fields` examples (#57760) (#57989)
Changes:

* Condenses and relocates the `docvalue_fields` example to the 'Run a search'
   page.
* Adds docs for the `docvalue_fields` request body parameter.
* Updates several related xrefs.

Co-authored-by: debadair <debadair@elastic.co>
2020-06-11 11:25:04 -04:00
David Turner f950c121bb Hide AlreadyClosedException on IndexCommit release (#57986)
Today `InternalEngine#releaseIndexCommit` fails with an
`AlreadyClosedException` if the engine is closed before the index commit is
released. This can happen if, for example, a node leaves and rejoins the
cluster and acquires an index commit for replica shard allocation concurrently
with shutting the shard down.

There's no need to fail the operation like this: if the engine is shut down
then we will clean up the unreferenced files when it's restarted (or if it's
allocated elsewhere) so we can suppress an `AlreadyClosedException` in this
case. This commit does so.

Fixes #57797
2020-06-11 15:41:50 +01:00
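The suppression described in the commit above (#57986) follows a common pattern: catch one specific, harmless exception during cleanup and let everything else propagate. The sketch below is not the `InternalEngine#releaseIndexCommit` code. `CommitRelease`, `releaseQuietly`, and the nested `AlreadyClosedException` are hypothetical stand-ins so that the example has no Lucene dependency.

```java
class CommitRelease {
    // Stand-in for Lucene's AlreadyClosedException (assumption: only the name is borrowed).
    static class AlreadyClosedException extends RuntimeException {
        AlreadyClosedException(String message) {
            super(message);
        }
    }

    // Runs the release action. Returns false if the resource was already
    // closed, in which case there is nothing left to release here.
    // Any other exception is deliberately not caught.
    public static boolean releaseQuietly(Runnable release) {
        try {
            release.run();
            return true;
        } catch (AlreadyClosedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(releaseQuietly(() -> {}));
        System.out.println(releaseQuietly(() -> {
            throw new AlreadyClosedException("engine is closed");
        }));
    }
}
```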
David Turner 9b52a250f8 Add admonition to cluster state instability note (#57985)
We document that the cluster state API is an internal representation which may
change, but apparently not emphatically enough. This commit adds a `NOTE:`
admonition to this paragraph.
2020-06-11 15:32:18 +01:00
Alan Woodward 16e230dcb8 Update to lucene snapshot e7c625430ed (#57981)
Includes LUCENE-9148 and LUCENE-9398, which splits the BKD metadata, index and data into separate files and keeps the index off-heap.
2020-06-11 14:51:53 +01:00
Yannick Welsch 34fc52dbf3 Fix PersistedClusterStateServiceTests.testSlowLogging (#57971)
The range in the last writeDurationMillis selection could be empty, as it could prior to the call be set to 1.
2020-06-11 15:47:34 +02:00
Igor Motov 947573f309
Added standard deviation / variance sampling to extended stats (#49782) (#57947)
Per #49554, I added standard deviation sampling and variance sampling to the extended stats interface.
 
Closes #49554

Co-authored-by: Igor Motov <igor@motovs.org>

Co-authored-by: andrewjohnson2 <aj114114@gmail.com>
2020-06-11 09:19:44 -04:00
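For readers unfamiliar with the distinction in the commit above (#49782), the sketch below shows the textbook definitions: population variance divides the sum of squared deviations by `n`, while sampling variance divides by `n - 1`. It is an illustration only. The class and method names are hypothetical, and the aggregation itself may compute these values differently (for example, from running sums).

```java
class VarianceSketch {
    // Sum of squared deviations from the mean.
    private static double sumOfSquaredDeviations(double[] values) {
        double mean = 0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;
        double sum = 0;
        for (double v : values) {
            sum += (v - mean) * (v - mean);
        }
        return sum;
    }

    // Population variance: divide by n.
    public static double populationVariance(double[] values) {
        return sumOfSquaredDeviations(values) / values.length;
    }

    // Sampling variance: divide by n - 1 (needs at least two values).
    public static double samplingVariance(double[] values) {
        return sumOfSquaredDeviations(values) / (values.length - 1);
    }

    public static void main(String[] args) {
        double[] values = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println("population variance: " + populationVariance(values));
        System.out.println("sampling variance:   " + samplingVariance(values));
        System.out.println("population std dev:  " + Math.sqrt(populationVariance(values)));
        System.out.println("sampling std dev:    " + Math.sqrt(samplingVariance(values)));
    }
}
```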
Nik Everett da72a3a51d
Speed up reducing auto_date_histo with a time zone (backport of #57933) (#57958)
When reducing `auto_date_histogram` we were using `Rounding#round`
which is quite a bit more expensive than
```
Rounding.Prepared prepared = rounding.prepare(min, max);
long result = prepared.round(date);
```
when rounding to a non-fixed time zone like `America/New_York`. This
stops using the former and starts using the latter.

Relates to #56124
2020-06-11 09:15:12 -04:00
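The prepare-then-round idea in the commit above (#57933) can be sketched with `java.time` alone. This is not the Elasticsearch `Rounding` implementation. `PreparedDayRounding` and its methods are hypothetical, and rounding down to the start of the local day is just one assumed rounding. The point is the shape of the optimisation: do the time zone work once for a known `[min, max]` range, then answer each lookup with a binary search.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PreparedDayRounding {
    // Unprepared path: consults the zone rules on every call.
    public static long roundNaive(ZoneId zone, long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(zone)
            .toLocalDate().atStartOfDay(zone).toInstant().toEpochMilli();
    }

    // Prepared path, step 1: compute every local day start in [min, max] once.
    public static long[] prepare(ZoneId zone, long min, long max) {
        List<Long> starts = new ArrayList<>();
        LocalDate date = Instant.ofEpochMilli(min).atZone(zone).toLocalDate();
        while (true) {
            long start = date.atStartOfDay(zone).toInstant().toEpochMilli();
            if (start > max) {
                break;
            }
            starts.add(start);
            date = date.plusDays(1);
        }
        return starts.stream().mapToLong(Long::longValue).toArray();
    }

    // Prepared path, step 2: binary search for the last day start <= epochMillis.
    // Assumes min <= epochMillis <= max for the range passed to prepare().
    public static long roundPrepared(long[] dayStarts, long epochMillis) {
        int i = Arrays.binarySearch(dayStarts, epochMillis);
        return i >= 0 ? dayStarts[i] : dayStarts[-i - 2];
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("America/New_York");
        long min = Instant.parse("2020-03-06T12:00:00Z").toEpochMilli();
        long max = Instant.parse("2020-03-10T12:00:00Z").toEpochMilli();
        long[] dayStarts = prepare(zone, min, max);
        long sample = Instant.parse("2020-03-08T15:30:00Z").toEpochMilli();
        System.out.println(roundPrepared(dayStarts, sample) == roundNaive(zone, sample));
    }
}
```

The example range deliberately spans a daylight saving transition in `America/New_York`, since non-fixed offsets are the case the commit message singles out as expensive.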
David Roberts 54d4f2a623 [ML] Refresh annotations index on job flush and close (#57979)
Now that annotations are part of the anomaly detection job results,
the annotations index should be refreshed on flushing and closing
the job so that flush and close continue to fulfil their contract:
immediately after they return, all results the job generated up
to that point are searchable.
2020-06-11 12:29:04 +01:00
David Kyle b87b147704
Add models for search to ModelLoadingService (#57592) (#57919)
ModelLoadingService only caches models if they are referenced by an 
ingest pipeline. For models used in search we want to always cache the
models and rely on TTL to evict them. Additionally when an ingest 
pipeline is deleted the model it references should not be evicted if 
it is used in search.
2020-06-11 10:48:37 +01:00
David Kyle 2905a2f623
Use Search After job iterators (#57875) (#57923)
Search after is a better choice for the delete expired data iterators,
where processing takes a long time, because unlike scroll it does not
require a context to be kept alive. Also changes the delete expired data
endpoint to return a 404 if the job is unknown.
2020-06-11 10:06:18 +01:00
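The difference the commit above (#57875) relies on is that search-after pagination is stateless: each request carries the sort value of the last hit seen, so nothing has to be held open between pages. The sketch below imitates that over an in-memory sorted set. It does not call any Elasticsearch API, and `SearchAfterSketch` and `page` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class SearchAfterSketch {
    // Returns up to `size` ids strictly greater than `after`, in ascending order.
    // Pass null for `after` to fetch the first page.
    public static List<Long> page(NavigableSet<Long> ids, Long after, int size) {
        Iterable<Long> candidates = after == null ? ids : ids.tailSet(after, false);
        List<Long> result = new ArrayList<>();
        for (Long id : candidates) {
            if (result.size() == size) {
                break;
            }
            result.add(id);
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableSet<Long> ids = new TreeSet<>(Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L, 7L));
        Long cursor = null;
        while (true) {
            List<Long> hits = page(ids, cursor, 3);
            if (hits.isEmpty()) {
                break;
            }
            System.out.println(hits);
            // Slow per-page processing could happen here; the only state
            // carried to the next request is the last sort value.
            cursor = hits.get(hits.size() - 1);
        }
    }
}
```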
Ryan Ernst 573c6279af
Re-enable windows archives packaging tests (#57955)
This commit re-enables windows testing for archives packaging tests.
These were disabled previously because of constant failure due to
windows file locks, but the failure does not occur outside of CI, so
they are being re-enabled to further investigate the failure.

relates #50825
2020-06-10 15:13:33 -07:00
Costin Leau ff0ea62cb8 EQL: Fix casing for tiebreaker field (#57943)
Use tiebreaker instead of tieBreaker

(cherry picked from commit 3c774948a5d5e10fac267cb9a54f5d0559a00c1d)
2020-06-11 00:10:19 +03:00