This commit makes the names of fetch subphases more consistent:
* Now the names end in just 'Phase', whereas before some ended in
'FetchSubPhase'. This matches the query subphases like AggregationPhase.
* Some names include 'fetch' like FetchScorePhase to avoid ambiguity about what
they do.
This is to support the ML categorization wizard.
Currently cluster:admin/analyze is only provided with the
"manage" cluster privilege, which is an excessive privilege
level to provide access to this single feature. It means
that the ML categorization wizard only works for extremely
highly privileged users.
Following this change the Kibana system user will be
permitted to run the _analyze endpoint on supplied strings
(not on an index). The ML UI will then call the _analyze
endpoint as the Kibana system user after first checking
that the logged-in user is permitted to create an ML job.
This will mean that users with the more reasonable
"manage_ml" cluster privilege will be permitted to use
the ML categorization wizard.
(This is also consistent with the way the ML UI will access
_all_ Elasticsearch functionality when the "ML in Spaces"
project is completed.)
Closes#51391
Relates elastic/kibana#57375
This adds a builder and parsed results for the `string_stats`
aggregation directly to the high level rest client. Without this the
HLRC can't access the `string_stats` API without the elastic licensed
`analytics` module.
While I'm in there this adds a few of our usual unit tests and
modernizes the parsing.
This commit removes the need for DeprecatedRoute and ReplacedRoute to
have an instance of a DeprecationLogger. Instead the RestController now
has a DeprecationLogger that will be used for all deprecated and
replaced route messages.
Relates #51950
Backport of #52278
Add a new cluster setting `search.allow_expensive_queries` which by
default is `true`. If set to `false`, certain queries that have
usually slow performance cannot be executed and an error message
is returned.
- Queries that need to do linear scans to identify matches:
- Script queries
- Queries that have a high up-front cost:
- Fuzzy queries
- Regexp queries
- Prefix queries (without index_prefixes enabled
- Wildcard queries
- Range queries on text and keyword fields
- Joining queries
- HasParent queries
- HasChild queries
- ParentId queries
- Nested queries
- Queries on deprecated 6.x geo shapes (using PrefixTree implementation)
- Queries that may have a high per-document cost:
- Script score queries
- Percolate queries
Closes: #29050
(cherry picked from commit a8b39ed842c7770bd9275958c9f747502fd9a3ea)
* Add more checks around parameter conversions
This commit adds two necessary verifications on received parameters:
- it checks the validity of the parameter's data type: if the declared
data type is resolved to an ES or Java type;
- it checks if the returned converter is non-null (i.e. a conversion is
possible) and generates an appropriate exception otherwise.
(cherry picked from commit eda30ac9c69383165324328c599ace39ac064342)
* Extract common optimizer tests (#52169)
(cherry picked from commit e5ad72bc22e9ec0686ab582195f0032efcb880bf)
* Hook in the optimizer rules (#52172)
(cherry picked from commit 1f90d8cc56052fbf2af604e72f9f5ca73f5e75d5)
Previously, in the in-memory sorting module
`LocalAggregationSorterListener` only the aggregate functions where used
(grabbed by the `sortingColumns`). As a consequence, if the ORDER BY
was also using columns of the GROUP BY clause, (especially in the case
of higher priority - before the aggregate functions) wrong results were
produced. E.g.:
```
SELECT gender, MAX(salary) AS max FROM test_emp
GROUP BY gender
ORDER BY gender, max
```
Add all columns of the ORDER BY to the `sortingColumns` so that the
`LocalAggregationSorterListener` can use the correct comparators in
the underlying PriorityQueue used to implement the in-memory sorting.
Fixes: #50355
(cherry picked from commit be680af11c823292c2d115bff01658f7b75abd76)
add a list of unsupported aggs in transforms and create a test that fails if a new aggregation is
added. Limitation: works only if a new agg is added to either the core or a known plugin
(Analytics, MatrixAggregation).
During a bug hunt, I caught a handful of things (unrelated to the bug) that could be potential issues:
1. Needlessly wrapping in exception handling (minor cleanup)
2. Potential of notifying listeners of a failure multiple times + even trying to notify of a success after a failure notification
Modifies SLM's and ILM's history indices to be hidden indices for added
protection against accidental querying and deletion, and improves
IndexTemplateRegistry to handle upgrading index templates.
Also modifies the REST test cleanup to delete hidden indices.
This commit adds a new security origin, and an associated reserved user
and role, named `_async_search`, which can be used by internal clients to
manage the `.async-search-*` restricted index namespace.
7.x backport of #52201
Provides a path to set register the EQL feature flag in release builds.
This enables EQL in release builds so that release docs tests pass.
Release docs tests do not have infrastructure in place to only register
snippets from included portions of the docs, they instead include all
docs snippets.
Since EQL can not be enabled in release builds, this meant that the EQL
snippets fail in the release docs tests.
This adds the ability to enable EQL in the release docs tests. This
system property will be removed when EQL is ready for release.
disallow to specify percentile out of range [0,100]. This also fixes a problem in transform by failing
validation if an invalid percentile configuration is used.
In #51146 a rudimentary check for poor categorization was added to
7.6.
This change replaces that warning based on a Java-side check with
a new one based on the categorization_status field that the ML C++
sets. categorization_status was added in 7.7 and above by #51879,
so this new warning based on more advanced conditions will also be
in 7.7 and above.
Closes#50749
Changes the misleading error message when attempting to open
a job while the "cluster.persistent_tasks.allocation.enable"
setting is set to "none" to a clearer message that names the
setting.
Closes#51956
Previously, when the specified (or default) fetchSize led to
subsequent HTTP requests and the usage of cursors, those subsequent
were no longer using the client timezone specified in the initial
SQL query. As a consequence, Even though the query is executed once
(with the correct timezone) the processing of the query results by
the HitExtractors in the next pages was done using the default
timezone Z. This could lead to incorrect results.
Fix the issue by correctly using the initially specified timezone,
which is found in the deserialisation of the cursor string.
Fixes: #51258
(cherry picked from commit 8f7afbdeb9295999b48a6c36db5b31cbe0cee432)
The changes add more granularity for identiying the data ingestion user.
The ingest pipeline can now be configure to record authentication realm and
type. It can also record API key name and ID when one is in use.
This improves traceability when data are being ingested from multiple agents
and will become more relevant with the incoming support of required
pipelines (#46847)
Resolves: #49106
This change extracts the code that previously existed in the
"Authentication" class that was responsible for reading and writing
authentication objects to/from the ThreadContext.
This is needed to support multiple authentication objects under
separate keys.
This refactoring highlighted that there were a large number of places
where we extracted the Authentication/User objects from the thread
context, in a variety of ways. These have been consolidated to rely on
the SecurityContext object.
Backport of: #52032
Refactors `DataFrameAnalyticsTask` to hold a `StatsHolder` object.
That just has a `ProgressTracker` for now but this is paving the
way to add additional stats like memory usage, analysis stats, etc.
Backport #52134
Employs `ResultsPersisterService` from `DataFrameRowsJoiner` in order
to add retries when a data frame analytics job is persisting the results
to the destination data frame.
Backport of #52048
Make the parsing of date more lenient
- as an escaped literal: `{d '2020-02-10[[T| ]10:20[:30][.123456789][tz]]'}`
- cast a string to a date: `CAST(2020-02-10[[T| ]10:20[:30][.123456789][tz]]' AS DATE)`
Closes: #49379
(cherry picked from commit 5863b27500d5e7f6cdd8c6c62b09b84e53ca724a)
Currently, the logic for looking up `flattened` field types lives in the
top-level `FieldTypeLookup`. This PR moves it into a dedicated class
`DynamicKeyFieldTypeLookup`.
This fixes:
- the parsing of milliseconds in intervals: everything past the . used to be converted as-is to milliseconds, with no normalisation of the unit; thus, a value of .23 ended up as 23 millis in the interval, instead of 230.
- the printing of a trailing .0, in case the interval lacks the fractional part;
- tests generating a random millisecond value used to simply print it in the string about to be evaluated without a necessary front-filling of 0[s], where the amount was below 100/10.
(The combination of first and last issues above, plus statistical "luck" made the incorrect handling pass the tests.)
(cherry picked from commit 4de8c64f63ee37c1bcfdb9b9d3a07d09be243222)
* Allow forcemerge in the hot phase for ILM policies
This commit changes the `forcemerge` action to also be allowed in the `hot` phase for policies. The
forcemerge will occur after a rollover, and allows users to take advantage of higher disk speeds for
performing the force merge (on a separate node type, for example).
On caveat with this is that a `forcemerge` in the `hot` phase *MUST* be accompanied by a `rollover`
action. ILM validates policies to ensure this is the case.
Resolves#43165
* Use anyMatch instead of findAny in validation
* Make randomTimeseriesLifecyclePolicy single-pass
It's perfectly fine if a bulk request on the follower hits
IndexShardClosedException in some CCR tests because we sometimes
close some follower shards while the follow-task is replicating operations.
Instead of failing the test immediately, this commit bubbles up that
failure to the shard follow task.
Closes#52052
Allow also whitespace ` ` (together with `T`) as a separator between
date and time parts of the timestamp string. E.g.:
```
{ts '2020-02-08 12.10.45'}
```
or
```
{ts '2020-02-08T12.10.45'}
```
Fixes: #46069
(cherry picked from commit 07c977023fb8ceab5991c359a6cbfe07beaad9bb)
This change adds support for the following new model_size_stats
fields:
- categorized_doc_count
- total_category_count
- frequent_category_count
- rare_category_count
- dead_category_count
- categorization_status
Backport of #51879
This commit changes how RestHandlers are registered with the
RestController so that a RestHandler no longer needs to register itself
with the RestController. Instead the RestHandler interface has new
methods which when called provide information about the routes
(method and path combinations) that are handled by the handler
including any deprecated and/or replaced combinations.
This change also makes the publication of RestHandlers safe since they
no longer publish a reference to themselves within their constructors.
Closes#51622
Co-authored-by: Jason Tedor <jason@tedor.me>
Backport of #51950
Now that the FIPS 140 security provider is simply a test dependency
we don't need the thirdPartyAudit exceptions, but plugin-cli and
transport-netty4 do need jarHell disabled as they use the non fips
BouncyCastle security provider as a test dependency too.
Some parts of the User class (e.g. equals/hashCode) assumed that
principal could never be null, but the constructor didn't enforce
that.
This adds a null check into the constructor and fixes a few tests that
relied on being able to pass in null usernames.
Backport of: #51988
The REST tests for autoscaling either need to be skipped in a
non-snapshot build, or alternatively, the feature flag registered so
that autoscaling can be enabled. We prefer the latter approach, as it
allows us to also test autoscaling in non-snapshot builds incrementally,
instead of at the end of development as autoscaling prepares for
release. This commit registers the autoscaling feature flag in REST
tests for non-snapshot builds.
We can just put the `IndexId` instead of just the index name into the recovery soruce and
save one load of `RepositoryData` on each shard restore that way.
Add some more tests where more than one literal is selected,
unaliased and aliased.
Follows: #42121
(cherry picked from commit 405271d408a233e697eb2e9ded3005a71f4df5e7)
This commit provides a path to set register the autoscaling feature flag
in release builds, and therefore enabling autoscaling in release
builds. The primary reason that we add this is so that our release docs
tests can pass. Our release docs tests do not have infrastructure in
place to only register snippets from included portions of the docs, they
instead include all docs snippets. Since autoscaling can not be enabled
in release builds, this meant that the autoscaling snippets would fail
in the release docs tests. To address then, we need the ability to
enable autoscaling in the release docs tests which we can now do with
the system property added here. This system property will be removed
when autoscaling is ready for release.