5337 Commits

Author SHA1 Message Date
Tanguy Leroux
8669766a81 Reduce contention in CacheFile.fileLock() method (#55662)
The CacheFile.fileLock() method is used to acquire a lock 
on a cache file so that the file can't be deleted (or its file 
handle closed) during the execution of a read or a write 
operation.

Today this lock is obtained by first acquiring the eviction 
lock (the write lock of the readwrite lock), then by checking 
if the cache file is evicted and the file channel still open, 
and finally by obtaining the file lock (the read lock of the 
readwrite lock). Acquiring the read lock while the eviction 
lock is held ensures that the cache file eviction cannot 
start in the meanwhile. But eviction starts (and terminations) 
also acquire the eviction lock; and this lock cannot be 
obtained while a read lock is held (the write lock of a 
readwrite lock is exclusive).

If we were acquiring a read lock and checking the eviction 
flag and file channel existence while holding the read lock 
we know that no eviction can start or finish until the 
read lock is released.
2020-04-23 14:40:27 +02:00
Rory Hunter
d66af46724
Always use deprecateAndMaybeLog for deprecation warnings (#55319)
Backport of #55115.

Replace calls to deprecate(String,Object...) with deprecateAndMaybeLog(...),
with an appropriate key, so that all messages can potentially be deduplicated.
2020-04-23 09:20:54 +01:00
David Roberts
87f4751eca [ML] Make find_file_structure recognize Kibana CSV report timestamps (#55609)
The Kibana CSV export feature uses a non-standard timestamp format.
This change adds it to the formats the find_file_structure endpoint
recognizes out-of-the-box, to make round-tripping data from Kibana
back to Kibana via CSV files easier.

Fixes #55586
2020-04-23 08:39:07 +01:00
Jake Landis
25ea6a74f0
[7.x] Validate REST specs against schema (#55117) (#55563)
A JSON schema was recently introduced for the REST API specification. #54252
This PR introduces a 3rd party validation tool to ensure that the
REST specification conforms to the schema.

The task is applied to the 3 projects that contain REST API specifications.
The plugin wires this task into the precommit commit task, and should be
considered as part of the public API for the build tools for any plugin
developer to contribute their plugin's specification.

An ignore parameter has been introduced for the task to allow specific
file to be ignored from the validation. The ignored files in this PR
will soon get issues logged and a link so they can be fixed.

Closes #54314
2020-04-22 14:14:03 -05:00
Albert Zaharovits
82ed0ab420
Update the audit logfile list of system users (#55578)
Out of the box "access granted" audit events are not logged
for system users. The list of system users was stale and included
only the _system and _xpack users. This commit expands this list
with _xpack_security and _async_search, effectively reducing the
auditing noise by not logging the audit events of these system
users out of the box.

Closes #37924
2020-04-22 21:59:31 +03:00
Tal Levy
c370b83bd7
Fix locale lowercase test issue in GenerateSnapshotNameStepTests (#55597) (#55605)
The testPerformAction test has been failing periodically due to
how Hamcrest's containsStringIgnoringCase does not lowercase using
the same Locale set in the test infrastructure.

This commit falls back to explicitly lowercasing using the root
locale
2020-04-22 11:29:57 -07:00
Tal Levy
f27ce69f0c
[backport] Add geo_bounds aggregation support for geo_shape (#55328) (#55600)
This commit adds a new GeoShapeBoundsAggregator to the spatial plugin and registers it with the GeoShapeValuesSourceType. This enables geo_bounds aggregations on geo_shape fields
2020-04-22 11:29:35 -07:00
Tal Levy
0844455505
Add geo_shape mapper supporting doc-values in Spatial Plugin (#55037) (#55500)
After #53562, the `geo_shape` field mapper is registered within
a module. This opens the door for introducing a new `geo_shape`
field mapper into the Spatial Plugin that has doc-values support.

This is very much an extension of server's GeoShapeFieldMapper,
but with the addition of the doc values implementation.
2020-04-22 08:12:54 -07:00
Dimitris Athanasiou
50a5afed15
[7.x][ML] Prepare parsing phase_progress from DFA process (#55580) (#55587)
Data frame analytics process currently reports progress as
an integer `progress_percent`. We parse that and report it
from the _stats API as the progress of the `analyzing` phase.
However, we want to allow the DFA process to report progress
for more than one phase. This commit prepares for this by
parsing `phase_progress` from the process, an object that
contains the `phase` name plus the `progress_percent` for that
phase.

Backport of #55580
2020-04-22 16:38:32 +03:00
Benjamin Trent
7c81cd7833
[ML] explicitly disallow partial results in datafeed extractors (#55537) (#55585)
Instead of doing our own checks against REST status, shard counts, and shard failures, this commit changes all our extractor search requests to set `.setAllowPartialSearchResults(false)`.

- Scrolls are automatically cleared when a search failure occurs with `.setAllowPartialSearchResults(false)` set.
- Code error handling is simplified

closes https://github.com/elastic/elasticsearch/issues/40793
2020-04-22 09:07:44 -04:00
David Roberts
810caf5ffe
[ML] Test that audit message is written when closing unassigned job (#55582)
Issue #55521 suggested that audit messages were not written when
closing an unassigned job.  This is not the case, but we didn't
have a test to prove it.

Backport of #55571
2020-04-22 13:23:43 +01:00
David Roberts
2dc5586afe
[ML] Add effective max model memory limit to ML info (#55581)
The ML info endpoint returns the max_model_memory_limit setting
if one is configured.  However, it is still possible to create
a job that cannot run anywhere in the current cluster because
no node in the cluster has enough memory to accommodate it.

This change adds an extra piece of information,
limits.effective_max_model_memory_limit, to the ML info
response that returns the biggest model memory limit that could
be run in the current cluster assuming no other jobs were
running.

The idea is that the ML UI will be able to warn users who try to
create jobs with higher model memory limits that their jobs will
not be able to start unless they add a bigger ML node to their
cluster.

Backport of #55529
2020-04-22 12:28:50 +01:00
David Roberts
da5aeb8be7
[ML] Return assigned node in start/open job/datafeed response (#55570)
Adds a "node" field to the response from the following endpoints:

1. Open anomaly detection job
2. Start datafeed
3. Start data frame analytics job

If the job or datafeed is assigned to a node immediately then
this field will return the ID of that node.

In the case where a job or datafeed is opened or started lazily
the node field will contain an empty string.  Clients that want
to test whether a job or datafeed was opened or started lazily
can therefore check for this.

Backport of #55473
2020-04-22 12:06:53 +01:00
David Kyle
e99ef3542c Mute ModelLoadingServiceTests::testMaxCachedLimitReached 2020-04-22 11:53:07 +01:00
Tim Vernum
8b566aea47
Fix use of password protected PKCS#8 keys for SSL (#55567)
PEMUtils would incorrectly fill the encryption password with zeros
(the '\0' character) after decrypting a PKCS#8 key.

Since PEMUtils did not take ownership of this password it should not
zero it out because it does not know whether the caller will use that
password array again. This is actually what PEMKeyConfig does - it
uses the key encryption password as the password for the ephemeral
keystore that it creates in order to build a KeyManager.

Backport of: #55457
2020-04-22 16:38:51 +10:00
Yang Wang
32e46bf552
Fix certutil http for empty password with JDK 11 and lower (#55437) (#55565)
Fix elasticseaerch-certutil http command so that it correctly accepts empty keystore password with JDK version 11 and lower.
2020-04-22 15:03:10 +10:00
David Kyle
8e8c6b4aee
Fix accounting in ModelLoadingServiceTests (#55307) (#55547)
In the test after the first load event is is not known which models are cached as 
loading a later one will evict an earlier one and the order is not known.
The models could have been loaded 1 or 2 times not exactly twice
2020-04-21 19:25:06 +01:00
Armin Braun
db7eb8e8ff
Remove Redundant CS Update on Snapshot Finalization (#55276) (#55528)
This change folds the removal of the in-progress snapshot entry
into setting the safe repository generation. Outside of removing
an unnecessary cluster state update, this also has the advantage
of removing a somewhat inconsistent cluster state where the safe
repository generation points at `RepositoryData` that contains a
finished snapshot while it is still in-progress in the cluster
state, making it easier to reason about the state machine of
upcoming concurrent snapshot operations.
2020-04-21 15:33:17 +02:00
David Turner
be60d50452 Allow searching of snapshot taken while indexing (#55511)
Today a read-only engine requires a complete history of operations, in the
sense that its local checkpoint must equal its maximum sequence number. This is
a valid check for read-only engines that were obtained by closing an index
since closing an index waits for all in-flight operations to complete. However
a snapshot may not have this property if it was taken while indexing was
ongoing, but that's ok.

This commit weakens the check for a complete history to exclude the case of a
searchable snapshot.

Relates #50999
2020-04-21 13:21:38 +01:00
Ignacio Vera
e4c65b4388
mute test SSLReloadDuringStartupIntegTests.testReloadDuringStartup (#55525) 2020-04-21 14:13:13 +02:00
Jim Ferenczi
0b3bdfcc3e Fix expiration time in async search response (#55435)
This change ensures that we return the latest expiration time
when retrieving the response from the index.
This commit also fixes a bug that stops the garbage collection of saved responses if the async search index is deleted.
2020-04-21 14:04:29 +02:00
Przemysław Witek
59d377462f
Apply default timeout in StopDataFrameAnalyticsAction.Request (#55512) (#55517) 2020-04-21 13:05:48 +02:00
Nhat Nguyen
3cc4e0dd09 Retry follow task when remote connection queue full (#55314)
If more than 100 shard-follow tasks are trying to connect to the remote 
cluster, then some of them will abort with "connect listener queue is 
full". This is because we retry on ESRejectedExecutionException, but not
on RejectedExecutionException.
2020-04-20 22:43:05 -04:00
Stuart Tettemer
93a2e9b0f9
Test: MockScoreScript can be cacheable. (#55499)
Backport: 0ed1eb5
2020-04-20 17:09:58 -06:00
Benjamin Trent
cabff65aec
[ML] Fixing inference stats race condition (#55163) (#55486)
`updateAndGet` could actually call the internal method more than once on contention.
If I read the JavaDocs, it says:
```* @param updateFunction a side-effect-free function```
So, it could be getting multiple updates on contention, thus having a race condition where stats are double counted.

To fix, I am going to use a `ReadWriteLock`. The `LongAdder` objects allows fast thread safe writes in high contention environments. These can be protected by the `ReadWriteLock::readLock`.

When stats are persisted, I need to call reset on all these adders. This is NOT thread safe if additions are taking place concurrently. So, I am going to protect with `ReadWriteLock::writeLock`.

This should prevent race conditions while allowing high (ish) throughput in the highly contention paths in inference.

I did some simple throughput tests and this change is not significantly slower and is simpler to grok (IMO).

closes  https://github.com/elastic/elasticsearch/issues/54786
2020-04-20 16:21:18 -04:00
Benjamin Trent
24d41eb695
[ML] partitions model definitions into chunks (#55260) (#55484)
This paves the data layer way so that exceptionally large models are partitioned across multiple documents.

This change means that nodes before 7.8.0 will not be able to use trained inference models created on nodes on or after 7.8.0.

I chose the definition document limit to be 100. This *SHOULD* be plenty for any large model. One of the largest models that I have created so far had the following stats:
~314MB of inflated JSON, ~66MB when compressed, ~177MB of heap.
With the chunking sizes of `16 * 1024 * 1024` its compressed string could be partitioned to 5 documents.
Supporting models 20 times this size (compressed) seems adequate for now.
2020-04-20 16:08:54 -04:00
Benjamin Trent
fa0373a19f
[7.x] [ML] Fix log spam and disable ILM/SLM history for native ML tests (#55475)
* [ML] fix native ML test log spam (#55459)

This adds a dependency to ingest common. This removes the log spam resulting from basic plugins being enabled that require the common ingest processors.

* removing unnecessary changes

* removing unused imports

* removing unnecessary java setting
2020-04-20 15:41:30 -04:00
Lee Hinman
9eddd2bcc9
[7.x] Add prefer_v2_templates flag and index setting (#55411) (#55476)
This commit adds a new querystring parameter on the following APIs:
- Index
- Update
- Bulk
- Create Index
- Rollover

These APIs now support a `?prefer_v2_templates=true|false` flag. This flag changes the preference
creation to use either V2 index templates or V1 templates. This flag defaults to `false` and will be
changed to `true` for 8.0+ in subsequent work.

Additionally, setting this flag internally sets the `index.prefer_v2_templates` index-level setting.
This setting is used so that actions that automatically create a new index (things like rollover
initiated by ILM) will inherit the preference from the original index. This setting is dynamic so
that a transition from v1 to v2 templates can occur for long-running indices grouped by an alias
performing periodic rollover.

This also adds support for sending this parameter to the High Level Rest Client.

Relates to #53101
2020-04-20 12:05:42 -06:00
Armin Braun
a0763d958d
Make RepositoryData Less Memory Heavy (#55293) (#55468)
We don't really need `LinkedHashSet` here. We can assume that all the
entries are unique and just use a list and use the list utilities to
create the cheapest possible version of the list.
Also, this fixes a bug in `addSnapshot` which would mutate the existing
linked hash set on the current instance (fortunately this never caused a real world bug)
and brings the collection in line with the java docs on its getter that claim immutability.
2020-04-20 18:28:06 +02:00
William Brafford
7817948926 Disable monitoring in ML multinode tests (#55461)
Removing the deprecated "xpack.monitoring.enabled" setting introduced
log spam and potentially some failures in ML tests. It's possible to use
a different, non-deprecated setting to disable monitoring, so we do that
here.
2020-04-20 10:51:16 -04:00
David Turner
0df329dde7 Use soft deletes for searchable snapshots tests (#55453)
This allows us to perform some dummy indexing including updates/deletes.
2020-04-20 14:37:51 +01:00
Przemysław Witek
7d5f74e964
Fix and unmute testSetUpgradeMode_ExistingTaskGetsUnassigned (#55368) (#55452) 2020-04-20 13:29:29 +02:00
Yannick Welsch
b9da307cd1 Add GCS support for searchable snapshots (#55403)
Adds ranged read support for GCS repositories in order to enable searchable snapshot support
for GCS.

As part of this PR, I've extracted some of the test infrastructure to make sure that
GoogleCloudStorageBlobContainerRetriesTests and S3BlobContainerRetriesTests are covering
similar test (as I saw those diverging in what they cover)
2020-04-20 13:02:59 +02:00
Jason Tedor
9ecb222bfa
Remove unneeded validation in feature set usage
This validation is not needed, as we have discovered the source of the
serialization error that was leading to some usage instances appearing
to not have a name.
2020-04-18 14:29:59 -04:00
Jason Tedor
23049391be
Upgrade feature aware check usage of ASM to 7.3.1 (#54577)
This commit upgrades the ASM dependency used in the feature aware check
to 7.3.1. This gives support for JDK 14. Additionally, now that Gradle
understands JDK 13, it means we can remove a restriction on running the
feature aware check to JDK 12 and lower.
2020-04-18 10:49:57 -04:00
Jay Modi
405ff0ce27
Handle TLS file updates during startup (#55330)
This change reworks the loading and monitoring of files that are used
for the construction of SSLContexts so that updates to these files are
not lost if the updates occur during startup. Previously, the
SSLService would parse the settings, build the SSLConfiguration
objects, and construct the SSLContexts prior to the
SSLConfigurationReloader starting to monitor these files for changes.
This allowed for a small window where updates to these files may never
be observed until the node restarted.

To remove the potential miss of a change to these files, the code now
parses the settings and builds SSLConfiguration instances prior to the
construction of the SSLService. The files back the SSLConfiguration
instances are then registered for monitoring and finally the SSLService
is constructed from the previously parse SSLConfiguration instances. As
the SSLService is not constructed when the code starts monitoring the
files for changes, a CompleteableFuture is used to obtain a reference
to the SSLService; this allows for construction of the SSLService to
complete and ensures that we do not miss any file updates during the
construction of the SSLService.

While working on this change, the SSLConfigurationReloader was also
refactored to reflect how it is currently used. When the
SSLConfigurationReloader was originally written the files that it
monitored could change during runtime. This is no longer the case as
we stopped the monitoring of files that back dynamic SSLContext
instances. In order to support the ability for items to change during
runtime, the class made use of concurrent data structures. The use of
these concurrent datastructures has been removed.

Closes #54867
Backport of #54999
2020-04-17 20:10:33 -06:00
Zachary Tong
f46b567563 Convert InternalAggTestCase to AbstractNamedWriteableTestCase (#55250)
Some aggregations, such as the Terms* family, will use an alternate
class to represent unmapped shard results (while the rest of the aggs
use the same object but with some form of "empty" or "nullish" values
to represent unmapped).

This was problematic with AbstractWireSerializingTestCase because it
expects the instanceReader to always match the original class.  Instead,
we need to use the NamedWriteable version so that the registry
can be consulted for the proper deserialization reader.
2020-04-17 16:39:38 -04:00
Ryan Ernst
66071b2f6e
Remove combo security and license helper from license state (#55366) (#55417)
Security features in the license state currently do a dynamic check on
whether security is enabled. This is because the license level can
change the default security enabled state. This commit splits out the
check on security being enabled, so that the combo method of security
enabled plus license allowed is no longer necessary.
2020-04-17 13:07:02 -07:00
William Brafford
49e30b15a2
Deprecate disabling basic-license features (#54816) (#55405)
We believe there's no longer a need to be able to disable basic-license
features completely using the "xpack.*.enabled" settings. If users don't
want to use those features, they simply don't need to use them. Having
such features always available lets us build more complex features that
assume basic-license features are present.

This commit deprecates settings of the form "xpack.*.enabled" for
basic-license features, excluding "security", which is a special case.
It also removes deprecated settings from integration tests and unit
tests where they're not directly relevant; e.g. monitoring and ILM are
no longer disabled in many integration tests.
2020-04-17 15:04:17 -04:00
Benjamin Trent
4be3663968
[7.x] [ML] fix bugs with prediction field value settings (#55333) (#55394)
* [ML] fix bugs with prediction field value settings (#55333)

This fixes two unreleased bugs:

1. Prediction value type of `number` might show unexpected classes

Analytics created models may have class labels like `1, 5, 10` (or some collection of discrete, whole numbers). These labels are passed to the inference model config in the `classification_labels` field.

When the predicted value format is `numeric` it should attempt to see if the classification labels are provided and are numeric. If so, use those. If not, use the underlying value.

2. When supplying an update overwrite, inference was losing the default prediction field value. This is because it was not copied over in the copy ctor in the ClassificationConfig.Builder class. 

closes #55332
2020-04-17 14:45:02 -04:00
Jake Landis
eb30cf5c89
[7.x] Move Watcher config out of RestResourcesPlugin (#55136) (#55336) 2020-04-17 12:38:01 -05:00
Benjamin Trent
8c581c3388
[ML] fixing and unmuting testHRDSplit test (#55349) (#55393)
This fixes the long muted testHRDSplit. Some minor adjustments for modern day elasticsearch changes :). 

The cause of the failure is that a new `by` field entering the model with an exceptionally high count does not cause an anomaly. We have since stopped combining the `rare` and `by` in this manner. New entries in a `by` field are not anomalous because we have no history on them yet. 

closes https://github.com/elastic/elasticsearch/issues/32966
2020-04-17 09:55:52 -04:00
Tanguy Leroux
eb52df6652 Mute GraphTests.testTimedoutQueryCrawl (#55397)
Relates #55396
Relates #53913
2020-04-17 15:31:48 +02:00
Benjamin Trent
65e0084120
[ML] do not start stopping tasks on reassignment (#55315) (#55388)
When a anomaly jobs, datafeeds, and analytics tasks are stopped, they enter an ephemeral state called `STOPPING`. 

If the node executing the task fails while this is occurring, they could be stuck in the limbo state of `STOPPING`. It is best to mark the tasks as completed if they get reassigned to a node.
2020-04-17 08:57:12 -04:00
Tanguy Leroux
290361c63b
Mute MlConfigIndexMappingsFullClusterRestartIT.testMlConfigIndexMappingsAfterMigration (#55389)
Relates #54415
2020-04-17 14:54:17 +02:00
Costin Leau
fc6261967b SQL: Streamline declaration of LeafAggs (#55380)
Avoid repetition of the aggregation builder setup

Relates #55241

(cherry picked from commit 6cfe130e5da4aac11bad64f187fecc411139f5e2)
2020-04-17 15:04:54 +03:00
markharwood
7761b01a33
Remove normalizer support from wildcard field while we decide on approach for handling case insensitvity (#55294) (#55375)
Closes #55288
2020-04-17 11:43:26 +01:00
Marios Trivyzas
f958e9abdc
SQL: Implement scripting inside aggs (#55241) (#55371)
Implement the use of scalar functions inside aggregate functions.
This allows for complex expressions inside aggregations, with or without
GROUBY as well as with or without a HAVING clause. e.g.:

```
SELECT MAX(CASE WHEN a IS NULL then -1 ELSE abs(a * 10) + 1 END) AS max, b
FROM test
GROUP BY b
HAVING MAX(CASE WHEN a IS NULL then -1 ELSE abs(a * 10) + 1 END) > 5
```

Scalar functions are still not allowed for `KURTOSIS` and `SKEWNESS` as
this is currently not implemented on the ElasticSearch side.

Fixes: #29980
Fixes: #36865
Fixes: #37271

(cherry picked from commit 506d1beea7abb2b45de793bba2e349090a78f2f9)
2020-04-17 12:41:22 +02:00
Tanguy Leroux
71855fbfe0 Mute testSupportedFieldTypes in HDRPreAggregatedPercentile tests (#55369)
Relates #55360
2020-04-17 10:49:43 +02:00
Martijn van Groningen
417d5f2009
Make data streams in APIs resolvable. (#55337)
Backport from: #54726

The INCLUDE_DATA_STREAMS indices option controls whether data streams can be resolved in an api for both concrete names and wildcard expressions. If data streams cannot be resolved then a 400 error is returned indicating that data streams cannot be used.

In this pr, the INCLUDE_DATA_STREAMS indices option is enabled in the following APIs: search, msearch, refresh, index (op_type create only) and bulk (index requests with op type create only). In a subsequent later change, we will determine which other APIs need to be able to resolve data streams and enable the INCLUDE_DATA_STREAMS indices option for these APIs.

Whether an api resolve all backing indices of a data stream or the latest index of a data stream (write index) depends on the IndexNameExpressionResolver.Context.isResolveToWriteIndex().
If isResolveToWriteIndex() returns true then data streams resolve to the latest index (for example: index api) and otherwise a data stream resolves to all backing indices of a data stream (for example: search api).

Relates to #53100
2020-04-17 08:33:37 +02:00