Backport of #64454
- Add LongRareTerms and StringRareTerms to the DefaultNamedXContents,
ensure that the response of RareTerms aggregation can be parsed
correctly.
- Add testSearchWithRareTermsAgg method to test the response of
RareTerms aggregation can be parsed correctly.
- Add some test code to ensure the AggregationsTests can execute
successfully.
Co-authored-by: bellengao <gbl_long@163.com>
Introduce option for specifying whether the results are returned from
the tail (end) of the stream or the head (beginning).
Improve sequencing algorithm by significantly eliminating the number
of in-flight sequences for spare datasets.
Refactor the sequence class by eliminating some of the redundant code.
Change matching behavior for tail sequences.
Return results based on their first entry ordinal instead of
insertion order (which was ordered on the last match ordinal).
Randomize results position inside test suite.
Close#58646
(cherry picked from commit e85d9d1bbee13ad408e789fd62efb30bc8d223f2)
(cherry picked from commit 452c674a10cdc16dced3cde7babf5d5a9d64a6d9)
Executing a MultiGetRequest with no items will fail. This makes sure there
is always at least one item in the request.
(cherry picked from commit bd4703250fe331296b8613b277ea25c8bef1dcd9)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
* Allow all indices options variants
Irrespective of allow_no_indices value, throw VerificationException when
there is no index validated
Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
Getting the API key document form the security index is the most time consuing part
of the API Key authentication flow (>60% if index is local and >90% if index is remote).
This traffic is now avoided by caching added with this PR.
Additionally, we add a cache invalidator registry so that clearing of different caches will
be managed in a single place (requires follow-up PRs).
Make EQL case sensitive by default and adapt some of the string functions
Remove the case sensitive option from Between string function
Add case_insensitive option to term and wildcard queries usage
(cherry picked from commit 7550e0664c8c2f1f13519036c759b1e76345551f)
We only ever use this with `XContentParser` no need to make it inline
worse by forcing the lambda and hence dynamic callsite here.
=> Extraced the exception formatting code path that is likely very cold
to a separate method and removed the lambda usage in hot loops by simplifying
the signature here.
this adds the new field `feature_importance_baseline` and allows it to be optionally be included in the model's metadata.
Related to: https://github.com/elastic/ml-cpp/pull/1522
Ignore by default unavailable indices (same as ES) and verify that
allowNoIndices is set to false since at least one index is required
for validating the query.
Fix#62986
(cherry picked from commit fd75ac27223cd1b699b8d9c311dc401a39f9e0c8)
* [ML] renames */inference* apis to */trained_models* (#63097)
This commit renames all `inference` CRUD APIs to `trained_models`.
This aligns with internal terminology, documentation, and use-cases.
Remove Count class and related artifacts since that functionality is not
(yet) available.
Update parser name for better error reporting.
Fix#62131
(cherry picked from commit 060f500346788c4c5d0b3b9c045facec5d677d3d)
Classification feature importance supports various types in the class name:
- string
- boolean
- numerical
The xcontent parsing on the server side and the HLRC side should support and test these types.
As we have decided top level importance for classification is not useful,
it has been removed from the results from the training job. This commit
also removes them from inference.
Backport of #62486
In case of more than 500 transforms, get and stats return paged results which can be requested using
page parameters. For >500 transforms count wasn't parsed out of the server response but taken from
size of the list of transforms.
The change also adds client/server hlrc tests and fixes a wrong type for count in get.
fixes#56245
Since `=` is rarely used and is undocumented we its support for
equality comparisons keeping `==` as the only option. `=` is now only
used for assignments like in `maxspan=10m`.
Closes: #62650
(cherry picked from commit ad5ae4d887b5c2feca2d0e874d7bdf738e3fd54e)
* [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922)
Adds new flag include to the get trained models API
The flag initially has two valid values: definition, total_feature_importance.
Consequently, the old include_model_definition flag is now deprecated.
When total_feature_importance is included, the total_feature_importance field is included in the model metadata object.
Including definition is the same as previously setting include_model_definition=true.
* fixing test
* Update x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ml/action/GetTrainedModelsRequestTests.java
The purpose for this change is to allow validation of queries without
having to actually execute them. The optimizer already picks up this
case.
Fix#62494
(cherry picked from commit 675889559b2f96a0c1faa6fc84fd537148ba2cce)
This adds ILM support for automatically migrating the managed
indices between data tiers.
This proposal makes use of a MigrateAction that is injected
(similar to how the Unfollow action is injected) in phases that
don't define index allocation rules using the AllocateAction or
don't explicitly define the MigrateAction itself (regardless if it's
enabled or disabled).
(cherry picked from commit c1746afffd61048d0c12d3a77e6d8191a804ed49)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
* [ML] adds new n_gram_encoding custom processor (#61578)
This adds a new `n_gram_encoding` feature processor for analytics and inference.
The focus of this processor is simple ngram encodings that allow:
- multiple ngrams [1..5]
- Prefix, infix, suffix
DeprecationLogger's constructor should not create two loggers. It was
taking parent logger instance, changing its name with a .deprecation
prefix and creating a new logger.
Most of the time parent logger was not needed. It was causing Log4j to
unnecessarily cache the unused parent logger instance.
depends on #61515
backports #58435
The building block of the eql response is currently the SearchHit. This
is a problem since it is tied to an actual search, and thus has scoring,
highlighting, shard information and a lot of other things that are not
relevant for EQL.
This becomes a problem when doing sequence queries since the response is
not generated from one search query and thus there are no SearchHits to
speak of.
Emulating one is not just conceptually incorrect but also problematic
since most of the data is missed or made-up.
As such this PR introduces a simple class, Event, that maps nicely to
the terminology while hiding the ES internals (the use of SearchHit or
GetResult/GetResponse depending on the API used).
Fix#59764Fix#59779
Co-authored-by: Igor Motov <igor@motovs.org>
(cherry picked from commit 997376fbe6ef2894038968842f5e0635731ede65)
* [ML] adding docs + hlrc for data frame analysis feature_processors (#61149)
Adds HLRC and some docs for the new feature_processors field in Data frame analytics.
Co-authored-by: Przemysław Witek <przemyslaw.witek@elastic.co>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
We only ever support `JSON` for the query source format in practice.
The reason this test worked before is a bug in xcontent parsing that parses
empty maps out of streams of the wrong format.
Closes#61483
(cherry picked from commit 291f5bd1b2e889e9447d660e5407f3120cffb1a5)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Co-authored-by: Tuan Le <23419763+tumile@users.noreply.github.com>
Use thread-local buffers and deflater and inflater instances to speed up
compressing and decompressing from in-memory bytes.
Not manually invoking `end()` on these should be safe since their off-heap memory
will eventually be reclaimed by the finalizer thread which should not be an issue for thread-locals
that are not instantiated at a high frequency.
This significantly reduces the amount of byte copying and object creation relative to the previous approach
which had to create a fresh temporary buffer (that was then resized multiple times during operations), copied
bytes out of that buffer to a freshly allocated `byte[]`, used 4k stream buffers needlessly when working with
bytes that are already in arrays (`writeTo` handles efficient writing to the compression logic now) etc.
Relates #57284 which should be helped by this change to some degree.
Also, I expect this change to speed up mapping/template updates a little as those make heavy use of these
code paths.
Backport of #60746
to wait for at least a single shard to be allocated for a backing index of a data stream,
so that total store size is larger than zero (which is what the tests expects).
Closes#60461
implements a test suite for testing continuous transform with randomization in terms of mappings,
index settings, transform configuration. Add a test case for terms and date histogram. The test
covers:
- continuous mode with several checkpoints created
- correctness of results
- optimizations (minimal necessary writes)
- permutations of features (index settings, aggs, data types, index or data stream)
* Merge test runner task into RestIntegTest (#60261)
* Merge test runner task into RestIntegTest
* Reorganizing Standalone runner and RestIntegTest task
* Rework general test task configuration and extension
* Fix merge issues
* use former 7.x common test configuration