Commit Graph

5772 Commits

Author SHA1 Message Date
Costin Leau fe775a315f EQL: Obey size request parameter (#59014)
While at it, change the default size to 10 (to align it with the search
API defaults).

(cherry picked from commit 45795939b277e736a9e4f2f008d1c3f406239075)
2020-07-06 19:14:25 +03:00
Yang Wang 2a1635ad69
Create API key with TransportBulkAction directly (#59046) (#59060)
Use TransportBulkAction directly to create API keys instead of going through the proxy from IndexAction to BulkAction.
2020-07-06 23:32:07 +10:00
Yang Wang 66c0231895
Improve threadpool usage and error handling for API key validation (#58090) (#59047)
The PR introduces following two changes:

Move API key validation into a new separate threadpool. The new threadpool is created separately with half of the available processors and 1000 in queue size. We could combine it with the existing TokenService's threadpool. Technically it is straightforward, but I am not sure whether it could be a rushed optimization since I am not clear about potential impact on the token service.

On threadpoool saturation, it now fails with EsRejectedExecutionException which in turns gives back a 429, instead of 401 status code to users.
2020-07-06 21:21:07 +10:00
Przemysław Witek 4a791e835b
Simplify parser declarations when specialist types are stored in strings (#58996) (#59056) 2020-07-06 13:05:03 +02:00
Przemysław Witek f35ad0d4e1
Report peak model memory in ModelSizeStats (#59017) (#59055) 2020-07-06 12:55:12 +02:00
David Kyle c651135562
[ML] Make Inference processor field_map and inference_config optional (#59010)
Relaxes the requirement that the inference ingest processor must has a
field_map and inference_config defined even if they are empty.
2020-07-06 11:35:30 +01:00
David Kyle 0fc12194bf
[ML] Increase timeout in MlDistributedFailureIT (#58997) (#59013)
Doubles the timeout on the ensureStableClusterOnAllNodes method to 60s
to account for v slow ci
2020-07-06 11:30:41 +01:00
Martijn van Groningen f0dd9b4ace
Add data stream timestamp validation via metadata field mapper (#59002)
Backport of #58582 to 7.x branch.

This commit adds a new metadata field mapper that validates,
that a document has exactly a single timestamp value in the data stream timestamp field and
that the timestamp field mapping only has `type`, `meta` or `format` attributes configured.
Other attributes can affect the guarantee that an index with this meta field mapper has a
useable timestamp field.

The MetadataCreateIndexService inserts a data stream timestamp field mapper whenever
a new backing index of a data stream is created.

Relates to #53100
2020-07-06 11:32:33 +02:00
Armin Braun 49857cc35d
Dry up Master Disconnect Disruption Tests (#58953) (#59050)
Dry up tests that use a disruption that isolates the master from all other nodes.
Also, turn disruption types that have neither parameters nor state into constants
to make things a little clearer.
2020-07-06 11:04:24 +02:00
Yang Wang a9151db735
Map only specific type of OIDC Claims (#58524) (#59043)
This commit changes our behavior in 2 ways:

- When mapping claims to user properties ( principal, email, groups,
name), we only handle string and array of string type. Previously
we would fail to recognize an array of other types and that would
cause failures when trying to cast to String.
- When adding unmapped claims to the user metadata, we only handle
string, number, boolean and arrays of these. Previously, we would
fail to recognize an array of other types and that would cause
failures when attempting to process role mappings.

For user properties that are inherently single valued, like
principal(username) we continue to support arrays of strings where
we select the first one in case this is being depended on by users
but we plan on removing this leniency in the next major release.

Co-authored-by: Ioannis Kakavas <ioannis@elastic.co>
2020-07-06 11:36:41 +10:00
Tanguy Leroux 49f4227837 Check acknowledged responses in FsSearchableSnapshotsIT (#59021)
Despite all my attempts I did not manage to reproduce issues like the ones 
described in #58961. My guess is that the _mount request got retried at 
some point but I wasn't able to validate this assumption.

Still, the FsSearchableSnapshotsIT can be pretty disk heavy if a small 
random chunk size and a large number of documents is picked up in the 
tests. The parent class also does not verify the acknowledged status 
of some requests.

This commit lowers down the chunk size and number of docs in tests 
(this is extensively tests in unit tests) and also adds assertions on 
acknowledged responses.

Relates #58961
2020-07-05 10:50:31 +02:00
Armin Braun 071d8b2c1c
Deduplicate Empty InternalAggregations (#58386) (#59032)
Working through a heap dump for an unrelated issue I found that we can easily rack up
tens of MBs of duplicate empty instances in some cases.
I moved to a static constructor to guard against that in all cases.
2020-07-04 14:02:16 +02:00
Bogdan Pintea e88d71b187
[7.x] SQL: Redact credentials in connection exceptions (#58650) (#59025)
* SQL: Redact credentials in connection exceptions (#58650)

This commit adds the functionality to redact the credentials from the
exceptions generated when a connection attempt fails, preventing them
from leaking into logs, console history etc.

There are a few causes that can lead to failed connections. The most
challenging to deal with is a malformed connection string. The redaction
tries to get around it by modifying the URI to a parsable state, so that
the redaction can be applied reliably. If there's no reliability
guarantee, the redaction will bluntly replace the entire connection
string and the user informed about the option to modify it so that the
redaction won't apply. (This is done by using a caplitalized scheme,
which is legal, but otherwise never used in practice.)

The commit fixes a couple of other issues with the URI parser:
- it allows an empty hostname, or even entire connection string (as per
the existing documentation);
- it reduces the editing of the connection string in the exception
messages (so that the user easier recognize their input);
- it uses the default URI as source for the scheme and hostname.

(cherry picked from commit a0bd5929d0658c4fed44404e0c4d78eac88222fd)

* Implement String#repeat(), unavailable in Java8

Implement a client.StringUtils#repeatString() as a replacement for
String#repeat(), unavailable in Java8.
2020-07-04 11:29:06 +02:00
Benjamin Trent b9d9964d10
[ML] add exponent output aggregator to inference (#58933) (#59016)
* [ML] add exponent output aggregator to inference

* fixing docs

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-03 14:51:00 -04:00
Lisa Cawley 5c19464a2f [DOCS] Clarifies number of file and native realms (#58949) 2020-07-03 11:00:28 -07:00
Dan Hermann c1781bc7e7
[7.x] Add include_data_streams flag for authorization (#59008) 2020-07-03 12:58:39 -05:00
Bogdan Pintea 3d96d91efb
[7.x] SQL: fix handling of escaped chars in JDBC connection string (#58429) (#58977)
SQL: fix handling of escaped chars in JDBC connection string (#58429)

This commit fixes an issue emerging when the connection string URI
contains escaped characters.

The original URI is pre-parsed in order to re-assemble a new URI having
the optional elements filled in with defaults. The new URI has been
using however the unescaped query and fragment parts. So if these
contained any escaped `&` or `=` (such as in the password option value),
the unescaping would reveal them and make them later interfere with the
options parsing.

The commit changes that, so that the new URI be built from the unescaped
"raw" parts of the original URI.

(cherry picked from commit 94eb5a05e79c6e203de548d05b13e00295bd4489)
2020-07-03 17:03:00 +02:00
Luca Cavanna e3fc1638d8 Improve error handling in async search code (#57925)
- The exception that we caught when failing to schedule a thread was incorrect.
- We may have failures when reducing the response before returning it, which were not handled correctly and may have caused get or submit async search task to not be properly unregistered from the task manager
- when the completion listener onFailure method is invoked, the search task has to be unregistered. Not doing so may cause the search task to be stuck in the task manager although it has completed.

Closes #58995
2020-07-03 16:07:26 +02:00
Hendrik Muhs ca3da7af85 [ML] handle broken setup with state alias being an index (#58999)
.ml-state-write is supposed to be an index alias, however by accident it can become an index. If
.ml-state-write is a concrete index instead of an alias ML stops working. This change improves error
handling by setting the job to failed and properly log and audit the problem. The user still has to
manually fix the problem. This change should lead to a quicker resolution of the problem.

fixes #58482
2020-07-03 15:26:59 +02:00
Ignacio Vera 2c2486d3d4
Fix GeoHash grid aggregation circuit breaker tests (#58218) (#59001) 2020-07-03 13:46:35 +02:00
Dan Hermann 5e7746d3bd
[7.x] Mirror privileges over data streams to their backing indices (#58991) 2020-07-03 06:33:38 -05:00
Luca Cavanna 4f86f6fb38 Submit async search to not require read privilege (#58942)
When we execute search against remote indices, the remote indices are authorized on the remote cluster and not on the CCS cluster. When we introduced submit async search we added a check that requires that the user running it has the privilege to execute it on some index. That prevents users from executing async searches against remote indices unless they also have read access on the CCS cluster, which is common when the CCS cluster holds no data.

The solution is to let the submit async search go through as we already do for get and delete async search. Note that the inner search action will still check that the user can access local indices, and remote indices on the remote cluster, like search always does.
2020-07-03 12:18:07 +02:00
David Kyle f6a0c2c59d
[7.x] Pipeline Inference Aggregation (#58965)
Adds a pipeline aggregation that loads a model and performs inference on the
input aggregation results.
2020-07-03 09:29:04 +01:00
Tim Vernum 1133c29ce9
Treat roles as a SortedSet (#58988)
The Saml SP document stored the role mapping in a Set, but this made
the order in XContent inconsistent. This switched it to use a TreeSet.

Resolves: #54733
Backport of: #55201
2020-07-03 13:40:58 +10:00
Tim Brooks dc9e364ff2
Count coordinating and primary bytes as write bytes (#58984)
This is a follow-up to #57573. This commit combines coordinating and
primary bytes under the same "write" bucket. Double accounting is
prevented by only accounting the bytes at either the reroute phase or
the primary phase. TransportBulkAction calls execute directly, so the
operations handler is skipped and the bytes are not double accounted.
2020-07-02 19:48:19 -06:00
Benjamin Trent bd9b3b6116
[ML] fix inference ml-stats-write alias creation (#58947) (#58959)
The check for potentially creating the .ml-stats-write alias should verify that the indices actually exist.

closes #58662
2020-07-02 16:16:42 -04:00
Tim Brooks 1ef2cd7f1a
Add memory tracking to queued write operations (#58957)
Currently we do not track the memory consuming by in-process write
operations.

This commit adds a mechanism to track write operation memory usage.
2020-07-02 14:14:57 -06:00
Tal Levy d516959774
Re-enable support for array-valued geo_shape fields. (#58786) (#58943)
A regression in the mapping code led to geo_shape no longer supporting
array-valued fields. This commit fixes this support and adds an integration
test to make sure this problem does not return!
2020-07-02 11:21:55 -07:00
David Roberts 2c04685b81
[ML] Ensure config index mappings are up-to-date before updating configs (#58938)
We already had code to ensure the config index mappings were
up-to-date before creating a new config.  However, it's also
possible that an update to a config could add the latest
settings that require the latest mappings to index correctly.

This change checks that the latest config index mappings are
in place in the 3 update actions in the same way as the checks
are done in the 3 put actions.

Backport of #58916
2020-07-02 18:55:19 +01:00
Robin Clarke 567720d970 [DOCS] Added caveat about the number of file realms (#58369) 2020-07-02 10:27:36 -07:00
Dan Hermann c988afdc15
Data stream support for migrations deprecations info API 2020-07-02 11:16:22 -05:00
Przemysław Witek 751e84e4c8
Rename regression evaluation metrics to make the names consistent with loss functions (#58887) (#58927) 2020-07-02 17:35:55 +02:00
Tanguy Leroux 6aa669c8bb Fix SearchableSnapshotDirectoryStatsTests (#58912)
Similar to #58847 but in a different tests. The failure never 
reproduced locally but occurs from time to time on CI.
2020-07-02 16:39:26 +02:00
Dan Hermann b78bfa01f6
[7.x] Data stream support for graph explore API 2020-07-02 08:19:03 -05:00
David Kyle d6643bfc7f Revert "Mute FsSearchableSnapshotsIT testClearCache (#58902)"
The test was fixed in #58847

This reverts commit bb96c910a5.
2020-07-02 13:21:05 +01:00
David Kyle bb96c910a5 Mute FsSearchableSnapshotsIT testClearCache (#58902)
For #58901
2020-07-02 12:58:28 +01:00
Costin Leau 965f77fa44 EQL: Introduce sequence internal paging (#58859)
Refactor sequence matching classes in order to decouple querying from
results consumption (and matching).
Rename some classes to better convey their intent.

Introduce internal pagination of sequence algorithm, that is getting the
data in slices and, if needed, moving forward in order to find more
matches until either the dataset is consumer or the number of results
desired is found.

(cherry picked from commit bcf2c1141302f3f98c85e82d2c501aa02c8540e9)
2020-07-02 13:44:21 +03:00
Przemysław Witek 8e074c4495
Rename "error" field to "value" for consistency between metrics (#58726) (#58870) 2020-07-02 09:08:56 +02:00
Yang Wang a5a8b4ae1d
Add cache for application privileges (#55836) (#58798)
Add caching support for application privileges to reduce number of round-trips to security index when building application privilege descriptors.

Privilege retrieving in NativePrivilegeStore is changed to always fetching all privilege documents for a given application. The caching is applied to all places including "get privilege", "has privileges" APIs and CompositeRolesStore (for authentication).
2020-07-02 11:50:03 +10:00
Benjamin Trent c64e283dbf
[7.x] [ML] handles compressed model stream from native process (#58009) (#58836)
* [ML] handles compressed model stream from native process (#58009)

This moves model storage from handling the fully parsed JSON string to handling two separate types of documents.

1. ModelSizeInfo which contains model size information 
2. TrainedModelDefinitionChunk which contains a particular chunk of the compressed model definition string.

`model_size_info` is assumed to be handled first. This will generate the model_id and store the initial trained model config object. Then each chunk is assumed to be in correct order for concatenating the chunks to get a compressed definition.


Native side change: https://github.com/elastic/ml-cpp/pull/1349
2020-07-01 15:14:31 -04:00
Mark Vieira 1fcaec7dfc
Ignore test seed used in test system properties (#58789) 2020-07-01 11:52:22 -07:00
James Rodewig a966513eae
[DOCS] Remove problematic terms (#58832) (#58851) 2020-07-01 13:47:14 -04:00
Nhat Nguyen f63cbad629
Ensure CCR partial reads never overuse buffer (#58620)
When the documents are large, a follower can receive a partial response 
because the requesting range of operations is capped by
max_read_request_size instead of max_read_request_operation_count. In
this case, the follower will continue reading the subsequent ranges
without checking the remaining size of the buffer. The buffer then can
use more memory than max_write_buffer_size and even causes OOM.

Backport of #58620
2020-07-01 13:23:28 -04:00
Tanguy Leroux ec4843f4df Fix AbstractSearchableSnapshotsRestTestCase.testClearCache (#58847)
Since #58728 part of searchable snapshot shard files are written in cache 
in an asynchronous manner in a dedicated thread pool. It means that even 
if a search query is successful and returns, there are still more bytes to 
write in the cached files on disk.

On CI this can be slow; if we want to check that the cached_bytes_written 
has changed we need to check multiple times to give some time for the 
cached data to be effectively written.
2020-07-01 18:01:00 +02:00
Benjamin Trent c768467155
Muting flakey test (#58855) (#58856) 2020-07-01 11:54:43 -04:00
Lee Hinman d3d03fc1c6
[7.x] Add default composable templates for new indexing strategy (#57629) (#58757)
Backports the following commits to 7.x:

    Add default composable templates for new indexing strategy (#57629)
2020-07-01 09:32:32 -06:00
Ryan Ernst c23613e05a
Split license allowed checks into two types (#58704) (#58797)
The checks on the license state have a singular method, isAllowed, that
returns whether the given feature is allowed by the current license.
However, there are two classes of usages, one which intends to actually
use a feature, and another that intends to return in telemetry whether
the feature is allowed. When feature usage tracking is added, the latter
case should not count as a "usage", so this commit reworks the calls to
isAllowed into 2 methods, checkFeature, which will (eventually) both
check whether a feature is allowed, and keep track of the last usage
time, and isAllowed, which simply determines whether the feature is
allowed.

Note that I considered having a boolean flag on the current method, but
wanted the additional clarity that a different method name provides,
versus a boolean flag which is more easily copied without realizing what
the flag means since it is nameless in call sites.
2020-07-01 07:11:05 -07:00
Alan Woodward 3ba16e0f39
Move MappedFieldType#getSearchAnalyzer and #getSearchQuoteAnalyzer to TextSearchInfo (#58830)
Analyzers are specific to text searching, and so should be in TextSearchInfo rather than on
the generic MappedFieldType.

Backport of #58639
2020-07-01 14:52:14 +01:00
Tanguy Leroux d35e8f45da
Allow read operations to be executed without waiting for full range to be written in cache (#58728) (#58829)
This commit changes CacheFile and CachedBlobContainerIndexInput so that
 the read operations made by these classes are now progressively executed 
and do not wait for full range to be written in cache. It relies on the change 
introduced in #58477 and it is the last change extracted from #58164.

Relates #58164
2020-07-01 15:38:17 +02:00
Przemysław Witek 909649dd15
[7.x] Implement pseudo Huber loss (PseudoHuber) evaluation metric for regression analysis (#58734) (#58825) 2020-07-01 14:52:06 +02:00