This test accesses system indices for 2 reasons.
First, it creates a filter that has a different type. This was done
to assert that filter is not returned from the APIs. However,
now that access to the `.ml-meta` index is restricted,
it is not really a concern.
Second, it creates a `.ml-meta` index without mappings to test
the get API does not fail due to lack of mappings on a sorted field,
namely the `filter_id`. Once again, this test is less useful once
system indices have restricted access.
Relates #62501
Backport of #63111
This fixes a bug introduced when moving from mget to ids query. While
mget returns all the ids given, id query is a search query and thus by
default returns only 10 documents.
The fix correctly sets the expected size so all the information is
returned inside the response.
Fix#63030
(cherry picked from commit 09ba85548a0142a1fe8376efea9cc4e7764a207c)
It extracts the query capabilities from AbstractGeometryFieldType into two new interfaces, GeoshapeQueryable and ShapeQueryable. Those interfaces are implemented by the final mappers.
Bwc tests can consume much time to build and to run so it's nice to be
able to skip them when running the `check` task on the SQL module.
Introduce a new task `checkNoBwc` so one can use:
```
./gradlew -p x-pack/plugin/sql checkNoBwc
```
to skip them.
(cherry picked from commit a52e1846f338f6869273181c6f248579581fa68c)
Previously, backquote couldn't not be used inside an escaped identifier,
e.g.:
```
`my`identifier` = "some_value"
```
was not allowed. Introduce escaping of the backquote by using a
double backquote:
```
`my``identifier` = "some_value"
```
(cherry picked from commit 49514121486f42a58674b3e5901de4021fda5c15)
When an opened anomaly detection job is updated with a detection
rule that references a filter, apart from updating the c++ process
with the rule, we also need to update it with the referenced filter.
This commit fixes a bug which led to the job not applying such updates
on-the-fly.
Fixes#62948
Backport of #63057
Remove Count class and related artifacts since that functionality is not
(yet) available.
Update parser name for better error reporting.
Fix#62131
(cherry picked from commit 060f500346788c4c5d0b3b9c045facec5d677d3d)
Test testSortDifferentFormatsShouldFail was occasionally failing for
2 reasons:
1) Documents on "idx2" were not available for search before a
search request started
2) Running a test multiple times was causing
occasional ResourceAlreadyExistsException for idx2,
as idx2 was not deleted for a test.
This patch makes the following fixes:
1) Sets up immediate refresh policy for docs in the index"idx2"
2) Creates an index idx2 only once per cluster
Closes: #62997
This is a very minor optimization but trivial to implement, so might as well.
```
Benchmark (nGramStrs) Mode Cnt Score Error Units
NGramProcessorBenchmark.ngramInnerLoop 1,2,3 avgt 20 4415092.443 ± 31302.115 ns/op
NGramProcessorBenchmark.ngramOuterLoop 1,2,3 avgt 20 4235550.340 ± 103393.465 ns/op
```
This measurement is in nanoseconds, consequently, the overall performance of inference is dominated by other factors (i.e. map#put). But, this optimization adds up overtime and is simple.
This commit adds bwc testing for data frame analytics.
The bwc tests only go back to the 7.9.0.
Meaning, initially only rolling upgrades from 7.9.x -> 7.10.0 are tested.
Since the feature was experimental in < 7.9.0, this is acceptable.
Data frame analytics results format changed in version `7.10.0`.
If existing jobs that were not completed are restarted, it is
possible the destination index had already been created. That index's
mappings are not suitable for the new results format.
This commit checks the version of the destination index and deletes
it when the version is outdated. The job will then continue by
recreating the destination index and reindexing.
Backport of #62960
This PR adds timeouts to the named pipe connections of the
autodetect, normalize and data_frame_analyzer processes.
This argument requires the changes of elastic/ml-cpp#1514 in
order to work, so that PR will be merged before this one.
(The controller process already had a different mechanism,
tied to the ES JVM lifetime.)
Backport of #62993
Fix query creation inside sequences with any queries due to lacking a
clause to combine, which lead to an invalid request being created.
Fix#62967
(cherry picked from commit ff59d8823919a6e70928816e5c3687308ebde33f)
Classification feature importance supports various types in the class name:
- string
- boolean
- numerical
The xcontent parsing on the server side and the HLRC side should support and test these types.
* Use compilation as validation for painless role template (#62845)
Role template validation now performs only compilation if the script is painless.
It no longer attempts to execute the script with empty input which is problematic.
The compliation process will catch things like invalid syntax, undefined variables,
which still provide certain level of protection against ill-defined role templates.
Behaviour for Mustache script is unchanged.
* Checkstyle
* [ML] fixing testTwoJobsWithSameRandomizeSeedUseSameTrainingSet tests (#62976)
This fixes the two test failures.
The shard failure seems to be due to the .ml-stats index being in the middle of being created.
50_script_values/Script query fails sometimes
as resulting hits will be ordered differently from expected.
This patch ensures consisten ordering of hits.
Closes#62975
This refactors the loading of monitoring templates slightly so that they aren't loaded over and
over again (from disk) on CS updates. This isn't an important optimization in production for obvious
reasons since it only affects the install stage, but this turned out to cause some slow CS applies
in tests.
Relates #62853
If the transform grouping is a script then exclude the field from the source index
mappings fields caps request. A null object caused an NPE in the serialisation of
FieldCapabilitiesIndexRequest.
As we have decided top level importance for classification is not useful,
it has been removed from the results from the training job. This commit
also removes them from inference.
Backport of #62486
Passing FieldMappers to point parsing functions makes trying to build source-only
fields from MappedFieldTypes more complicated. This small refactoring changes
things so that the relevant parsing and factory functions from
AbstractGeometryFieldMapper are instead passed as lambdas to the PointParser
constructor.
Currently we duplicate our specialized cors logic in all transport
plugins. This is unnecessary as it could be implemented in a single
place. This commit moves the logic to server. Additionally it fixes a
but where we are incorrectly closing http channels on early Cors
responses.
Introduce 64-bit unsigned long field type
This field type supports
- indexing of integer values from [0, 18446744073709551615]
- precise queries (term, range)
- precise sort and terms aggregations
- other aggregations are based on conversion of long values
to double and can be imprecise for large values.
Backport for #60050Closes#32434
This commit adds a mechanism to MapperTestCase that allows implementing
test classes to check that their parameters can be updated, or throw conflict
errors as advertised. Child classes override the registerParameters method
and tell the passed-in UpdateChecker class about their parameters. Simple
conflicts can be checked, using the existing minimal mappings as a base to
compare against, or alternatively a particular initial mapping can be provided
to check edge cases (eg, norms can be updated from true to false, but not
vice versa). Updates are registered with a predicate that checks that the update
has in fact been applied to the resulting FieldMapper.
Fixes#61631
Same as in the normal Netty tests we have to disable the runtime proc
setting in the normal tests task just like we do for the internal cluster tests.
Closes#61919Closes#62298
This commit allows coordinating node to account the memory used to perform partial and final reduce of
aggregations in the request circuit breaker. The search coordinator adds the memory that it used to save
and reduce the results of shard aggregations in the request circuit breaker. Before any partial or final
reduce, the memory needed to reduce the aggregations is estimated and a CircuitBreakingException} is thrown
if exceeds the maximum memory allowed in this breaker.
This size is estimated as roughly 1.5 times the size of the serialized aggregations that need to be reduced.
This estimation can be completely off for some aggregations but it is corrected with the real size after
the reduce completes.
If the reduce is successful, we update the circuit breaker to remove the size of the source aggregations
and replace the estimation with the serialized size of the newly reduced result.
As a follow up we could trigger partial reduces based on the memory accounted in the circuit breaker instead
of relying on a static number of shard responses. A simpler follow up that could be done in the mean time is
to [reduce the default batch reduce size](https://github.com/elastic/elasticsearch/issues/51857) of blocking
search request to a more sane number.
Closes#37182
for get trained models include_model_definition is now deprecated.
This commit writes a deprecation warning if that parameter is used and suggests the caller to utilize the replacement
Backport #62825 to 7.x branch.
Today if a data stream is auto created, but an index with same name as the
first backing index already exists then internally that error is ignored,
which then result that later in the execution of a bulk request, the
bulk item fails due to that the data stream hasn't been auto created.
This situation can only occur if an index with same is created that
will be the backing index of a data stream prior to the creation
of the data stream.
Co-authored-by: Dan Hermann <danhermann@users.noreply.github.com>
The `migrate` action will now configure the
`index.routing.allocation.include._tier_preference` setting to the corresponding
tiers. For the HOT phase it will configure `data_hot`, for the WARM phase it will
configure `data_warm,data_hot` and for the COLD phase
`data_cold,data_warm,data_cold`.
(cherry picked from commit 9dbf0e6f0c267e40c5bcfb568bb2254da103ae40)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Replace common Like and RLike queries that match all characters with
IsNotNull (exists) queries
Fix#62585
(cherry picked from commit 4c23fad0468a9edd7325b06c6a96f7af37625dbf)
In case of more than 500 transforms, get and stats return paged results which can be requested using
page parameters. For >500 transforms count wasn't parsed out of the server response but taken from
size of the list of transforms.
The change also adds client/server hlrc tests and fixes a wrong type for count in get.
fixes#56245
Backport #62766 to 7.x branch.
The bulk api cache the resolved concrete indices when resolving the user provided
index name into the actual index name. The validation that prevents write ops other
than create from being executed in a data stream was only performed if the result
wasn't cached. In case of cached resolvings, the validation never occurs.
The validation would be skipped for all bulk items for a data stream after a create
operation for that same data stream. This commit ensures that the validation is always
performed for all bulk items (whether the concrete index resolution has been cached or
not cached).
Closes#62762
When state persistence was first implemented for data frame analytics
we had the assumption that state would always fit in a single document.
However this is not the case any more.
This commit adds handling of state that spreads over multiple documents.
Backport of #62564
This fixes reindexing progress in the scenario when a DFA job that had not finished
reindexing is resumed (either because the user called stop and start or because the
job was reassigned in the middle of reindexing). Before the fix reindexing progress
stays to the value it had reached before until it surpasses that value.
When we resume a data frame analytics job we want to preserve reindexing progress
and reset all other phases. Except for when reindexing was not completed.
In that case we are deleting the destination index and starting reindexing
from scratch. Thus we need to reset reindexing progress too.
Backport of #62772
To better align the plugin naming with other mapper plugins under x-pack (e.g.
mapper-flattened) this PR changes the plugin name and the containing directory
to "mapper-version"
This change adds support for the recently introduced case insensitivity flag for
wildcard and prefix queries. Since version field values are encoded differently we
need to adapt our own AutomatonQuery variation to add both cases if case insensitivity
is turned on.
Most of our field types have the same implementation for their `existsQuery` method which relies on doc_values if present, otherwise it queries norms if available or uses a term query against the _field_names meta field. This standard implementation is repeated in many different mappers.
There are field types that only query doc_values, because they always have them, and field types that always query _field_names, because they never have norms nor doc_values. We could apply the same standard logic to all of these field types as `MappedFieldType` has the knowledge about what data structures are available.
This commit introduces a standard implementation that does the right thing depending on the data structure that is available. With that only field types that require a different behaviour need to override the existsQuery method.
At the same time, this no longer forces subclasses to override `existsQuery`, which could be forgotten when needed. To address this we introduced a new test method in `MapperTestCase` that verifies the `existsQuery` being generated and its consistency with the available data structures.
Eclipse was confused for two reasons:
1. `:x-pack:plugin` depended on itself.
2. `ql`, `sql`, and `eql` couldn't see some methods.
I fixed problem 1 by only adding the "depends on itself" configuration
outside of eclipse. I fixed problem 2 by making a `test` sub-project in
`ql` that contains test utilities and depending on those where possible.
This commit adds a dedicated threadpool for system index write
operations. The dedicated resources for system index writes serves as
a means to ensure that user activity does not block important system
operations from occurring such as the management of users and roles.
Backport of #61655
* [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls (#62694)
* [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls
global parameters, outside of the global index, are ignored for internal callers in certain cases.
If the interal caller is adding requests via the following methods:
```
- BulkRequest#add(IndexRequest)
- BulkRequest#add(UpdateRequest)
- BulkRequest#add(DocWriteRequest)
- BulkRequest#add(DocWriteRequest[])
```
It is better to specifically set the desired parameters on the requests before they are added
to the bulk request object.
This commit addresses this issue for the ML plugin
* unmuting test
Since `=` is rarely used and is undocumented we its support for
equality comparisons keeping `==` as the only option. `=` is now only
used for assignments like in `maxspan=10m`.
Closes: #62650
(cherry picked from commit ad5ae4d887b5c2feca2d0e874d7bdf738e3fd54e)
This reworks the code around grok's built-in patterns to name things
more like the rest of the code. Its not a big deal, but I'm just more
used to having `public static final` constants in SHOUTING_SNAKE_CASE.
The dense vector field is not aggregatable although it produces fielddata through its BinaryDocValuesField. It should pass up hasDocValues set to true to its parent class in its constructor, and return isAggregatable false. Same for the sparse vector field (only in 7.x).
This may not have consequences today, but it will be important once we try to share the same exists query implementation throughout all of the mappers with #57607.
When target indices are remote only, CCS does not require user to have privileges on the local cluster. This PR ensure Point-In-Time reader follows the same pattern.
Relates: #61827
CCS with remote indices only does not require any privileges on the local cluster.
This PR ensures that search with scroll follow the permission model.
This allows the `check-migration` step to move past the allocation check
if the tier routing settings are manually unset.
This helps a user unblock ILM in case a tier is removed (ie. if the warm tier
is decommissioned this will allow users to resume the ILM policies stuck in
`check-migration` waiting for the warm nodes to become available and the managed
index to allocate. this allows the index to allocate on the other available tiers)
(cherry picked from commit d7a1eaa7f51d0972d10c0df1d3cd77d6b755dd41)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Implement FORMAT according to the SQL Server spec: https://docs.microsoft.com/en-us/sql/t-sql/functions/format-transact-sql?view=sql-server-ver15#ExampleD by translating to the java.time patterns used in DATETIME_FORMAT.
Closes: #54965
Co-authored-by: Marios Trivyzas <matriv@users.noreply.github.com>
Co-authored-by: Bogdan Pintea <bogdan.pintea@elastic.co>
Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
(cherry picked from commit da511f4e033db6e8a6aa2a54b23e906b5e026845)
As part of the conversion, adds the ability to customize merge validation - in this case, we
allow an update to the constant value if it is currently set to null, but refuse further
updates once it has been set once.
This commit also converts ParametrizedMapperTests to use MapperServiceTestCase.
This PR adds a new 'version' field type that allows indexing string values
representing software versions similar to the ones defined in the Semantic
Versioning definition (semver.org). The field behaves very similar to a
'keyword' field but allows efficient sorting and range queries that take into
accound the special ordering needed for version strings. For example, the main
version parts are sorted numerically (ie 2.0.0 < 11.0.0) whereas this wouldn't
be possible with 'keyword' fields today.
Valid version values are similar to the Semantic Versioning definition, with the
notable exception that in addition to the "main" version consiting of
major.minor.patch, we allow less or more than three numeric identifiers, i.e.
"1.2" or "1.4.6.123.12" are treated as valid too.
Relates to #48878
This commit adds a test that verifies that snapshots incrementality
is respected when a snapshot-backed index is snapshotted. This
test mounts a snapshot as a snapshot-backed index, creates a
new snapshot from it and then verifies that no new data blobs
were added to the repository.
The autoscaling decision API now returns an absolute capacity,
and leaves the actual decision of whether a scale up or down
is needed to the orchestration system.
The decision API now returns both a tier and node level required
and current capacity as wells as a decider level breakdown of the
same though with in particular current memory still not populated.
This commit adds the `index.routing.allocation.prefer._tier` setting to the
`DataTierAllocationDecider`. This special-purpose allocation setting lets a user specify a
preference-based list of tiers for an index to be assigned to. For example, if the setting were set
to:
```
"index.routing.allocation.prefer._tier": "data_hot,data_warm,data_content"
```
If the cluster contains any nodes with the `data_hot` role, the decider will only allow them to be
allocated on the `data_hot` node(s). If there are no `data_hot` nodes, but there are `data_warm` and
`data_content` nodes, then the index will be allowed to be allocated on `data_warm` nodes.
This allows us to specify an index's preference for tier(s) without causing the index to be
unassigned if no nodes of a preferred tier are available.
Subsequent work will change the ILM migration to make additional use of this setting.
Relates to #60848
Backports #61590 to 7.x
So far we don't allow metadata fields in the document _source. However, in the case of the _doc_count field mapper (#58339) we want to be able to set
This PR adds a method to the metadata field parsers that exposes if the field can be included in the document source or not.
This way each metadata field can configure if it can be included in the document _source
This commit adjusts the following APIs so now they not only support an `_all` case, but wildcard patterned Ids as well.
- `GET _ml/calendars/<calendar_id>/events`
- `GET _ml/calendars/<calendar_id>`
- `GET _ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>`
- `DELETE _ml/anomaly_detectors/<job_id>/_forecast/<forecast_id>`
* [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922)
Adds new flag include to the get trained models API
The flag initially has two valid values: definition, total_feature_importance.
Consequently, the old include_model_definition flag is now deprecated.
When total_feature_importance is included, the total_feature_importance field is included in the model metadata object.
Including definition is the same as previously setting include_model_definition=true.
* fixing test
* Update x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ml/action/GetTrainedModelsRequestTests.java
Async search tests can take more than one minute due to the excessive trace logs.
And the point in time in the tests can be expired the midway.
Closes#62451
Expressions like `1 = 2 = 3 = 4` or `1 < 2 = 3 >= 4` were treated with
leftmost priority: ((1 = 2) = 3) = 4 which can lead to confusing
results. Since such expressions don't make so much change for EQL
filters we disallow them in the parser to prevent unexpected results
from their bad usage.
Major DBs like PostgreSQL and Oracle also disallow them in their SQL
syntax. (counter example would be MySQL which interprets them as we did
before with leftmost priority).
Fixes: #61654
(cherry picked from commit 8f94981bb093f104228d267b532e0a3d5b7f6a38)
The purpose for this change is to allow validation of queries without
having to actually execute them. The optimizer already picks up this
case.
Fix#62494
(cherry picked from commit 675889559b2f96a0c1faa6fc84fd537148ba2cce)