This fixes reindexing progress in the scenario when a DFA job that had not finished
reindexing is resumed (either because the user called stop and start or because the
job was reassigned in the middle of reindexing). Before the fix, reindexing progress
stayed at the value it had previously reached until the new run surpassed that value.
When we resume a data frame analytics job we want to preserve reindexing progress
and reset all other phases, except when reindexing had not completed.
In that case we delete the destination index and start reindexing
from scratch, so we need to reset reindexing progress too.
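The rule above can be sketched as follows. This is a minimal illustration only: `ProgressOnResume`, `PhaseProgress`, and the phase names are hypothetical stand-ins, not the classes the ML plugin actually uses, and "completed" is assumed to mean 100 percent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from the real ML plugin.
final class ProgressOnResume {

    static final String REINDEXING = "reindexing";

    static final class PhaseProgress {
        final String phase;
        final int percent;

        PhaseProgress(String phase, int percent) {
            this.phase = phase;
            this.percent = percent;
        }
    }

    /**
     * Returns the progress a resumed job should start from.
     * Reindexing progress is kept only if reindexing had completed (assumed: 100%);
     * every other phase always restarts at zero.
     */
    static List<PhaseProgress> resetForResume(List<PhaseProgress> stored) {
        boolean reindexingDone = false;
        for (PhaseProgress p : stored) {
            if (REINDEXING.equals(p.phase) && p.percent >= 100) {
                reindexingDone = true;
            }
        }
        List<PhaseProgress> result = new ArrayList<>();
        for (PhaseProgress p : stored) {
            boolean keep = REINDEXING.equals(p.phase) && reindexingDone;
            result.add(new PhaseProgress(p.phase, keep ? p.percent : 0));
        }
        return result;
    }
}
```

With stored progress of 40% reindexing and 10% in a later phase, both come back as zero; with 100% reindexing, only the later phase is reset.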
Backport of #62772
To better align the plugin naming with other mapper plugins under x-pack (e.g.
mapper-flattened) this PR changes the plugin name and the containing directory
to "mapper-version".
This change adds support for the recently introduced case insensitivity flag for
wildcard and prefix queries. Since version field values are encoded differently we
need to adapt our own AutomatonQuery variation to add both cases if case insensitivity
is turned on.
Most of our field types have the same implementation for their `existsQuery` method which relies on doc_values if present, otherwise it queries norms if available or uses a term query against the _field_names meta field. This standard implementation is repeated in many different mappers.
There are field types that only query doc_values, because they always have them, and field types that always query _field_names, because they never have norms nor doc_values. We could apply the same standard logic to all of these field types as `MappedFieldType` has the knowledge about what data structures are available.
This commit introduces a standard implementation that does the right thing depending on the data structure that is available. With that only field types that require a different behaviour need to override the existsQuery method.
At the same time, this no longer forces subclasses to override `existsQuery`, which could be forgotten when needed. To address this we introduced a new test method in `MapperTestCase` that verifies the `existsQuery` being generated and its consistency with the available data structures.
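The fallback order described above (doc_values, then norms, then `_field_names`) can be sketched without any Lucene types. `ExistsQuerySketch` and its `Strategy` enum are hypothetical names used only for illustration; the real implementation lives in `MappedFieldType` and builds actual queries.

```java
// Hypothetical sketch of the decision order only; it does not build Lucene queries.
final class ExistsQuerySketch {

    enum Strategy { DOC_VALUES, NORMS, FIELD_NAMES }

    /** Picks the data structure an exists query would rely on, in the order the commit describes. */
    static Strategy choose(boolean hasDocValues, boolean hasNorms) {
        if (hasDocValues) {
            return Strategy.DOC_VALUES;   // preferred whenever doc_values are present
        }
        if (hasNorms) {
            return Strategy.NORMS;        // otherwise fall back to norms if available
        }
        return Strategy.FIELD_NAMES;      // last resort: term query on the _field_names meta field
    }
}
```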
* Make for each processor resistant to field modification (#62791)
This change provides a consistent view of the field that the foreach processor is iterating over. That prevents it from going into an infinite loop and putting great pressure on the cluster.
Closes #62790
* fix compilation
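One way to get the consistent view mentioned in the foreach change is to iterate over a copy of the values, so that modifications made while processing cannot extend the loop. The sketch below assumes that approach; `ForEachSnapshot` is a hypothetical helper, not the processor's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: illustrates snapshot iteration, not the real foreach processor.
final class ForEachSnapshot {

    /**
     * Applies the action to each element of a copy taken up front.
     * Elements that the action adds to the original list are not visited.
     * Returns the number of elements processed.
     */
    static <T> int forEachOnSnapshot(List<T> values, Consumer<T> action) {
        List<T> snapshot = new ArrayList<>(values);
        int processed = 0;
        for (T value : snapshot) {
            action.accept(value);
            processed++;
        }
        return processed;
    }
}
```

If the action appends to the list it was given, the loop still ends after the original elements have been visited.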
This extracts the configuration for extracting values from a groked
string when building the grok expression to do two things:
1. Create a method exposing that configuration on `Grok` itself which
will be used by `grok` flavored runtime fields.
2. Marginally speed up extracting grok values by skipping a little
string manipulation.
Eclipse was confused for two reasons:
1. `:x-pack:plugin` depended on itself.
2. `ql`, `sql`, and `eql` couldn't see some methods.
I fixed problem 1 by only adding the "depends on itself" configuration
outside of eclipse. I fixed problem 2 by making a `test` sub-project in
`ql` that contains test utilities and depending on those where possible.
This commit adds a dedicated threadpool for system index write
operations. The dedicated resources for system index writes serve as
a means to ensure that user activity does not block important system
operations from occurring such as the management of users and roles.
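The isolation idea can be shown with plain `java.util.concurrent` executors. This is a sketch under assumptions: the pool sizes and the `DedicatedPoolSketch` name are made up, and Elasticsearch's own `ThreadPool` is configured differently.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: two separate pools stand in for the user and system write pools.
final class DedicatedPoolSketch {

    /** Returns true if a system write completes while the user write pool is fully blocked. */
    static boolean systemWriteCompletesWhileUserPoolBlocked() {
        ExecutorService userWrites = Executors.newFixedThreadPool(1);
        ExecutorService systemWrites = Executors.newFixedThreadPool(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Occupy the only user write thread until the latch is released.
            userWrites.submit(() -> { release.await(); return null; });
            // The system write runs on its own pool, so it is not queued behind the user task.
            Future<String> system = systemWrites.submit(() -> "role updated");
            return "role updated".equals(system.get(5, TimeUnit.SECONDS));
        } catch (Exception e) {
            return false;
        } finally {
            release.countDown();
            userWrites.shutdown();
            systemWrites.shutdown();
        }
    }
}
```

Had both tasks shared the single-threaded user pool, the system write would have waited until the blocked user task finished.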
Backport of #61655
Java 15 requires at least glibc 2.14, but we support older Linux OSs that ship with older versions. Rather than continue to ship Java 14, which is now EOL and therefore unsupported, ES will detect this situation and print a helpful message, instead of the cryptic error that would otherwise be printed. Users on older OSs will have to set JAVA_HOME instead of using the bundled JVM.
This doesn't affect v8.0.0 because these older Linux OSs will not be supported, and all the supported ones have glibc 2.14.
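The version comparison behind such a check has to be numeric per component, since a plain string comparison would rank "2.5" above "2.14". The sketch below shows only that comparison; `GlibcVersionCheck` is a hypothetical name, and how the glibc version string is obtained (and where ES performs the check) is not covered here.

```java
// Hypothetical sketch: compares a "major.minor" glibc version string against a minimum.
final class GlibcVersionCheck {

    /** Returns true if the given version is at least requiredMajor.requiredMinor. */
    static boolean isAtLeast(String version, int requiredMajor, int requiredMinor) {
        String[] parts = version.trim().split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        if (major != requiredMajor) {
            return major > requiredMajor;
        }
        return minor >= requiredMinor;
    }
}
```

For example, `isAtLeast("2.12", 2, 14)` is false while `isAtLeast("2.17", 2, 14)` is true.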
* [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls (#62694)
* [ML] changing to not use global bulk indexing parameters in conjunction with add(object) calls
Global parameters, outside of the global index, are ignored for internal callers in certain cases.
If the internal caller is adding requests via the following methods:
```
- BulkRequest#add(IndexRequest)
- BulkRequest#add(UpdateRequest)
- BulkRequest#add(DocWriteRequest)
- BulkRequest#add(DocWriteRequest[])
```
It is better to specifically set the desired parameters on the requests before they are added
to the bulk request object.
This commit addresses this issue for the ML plugin.
* unmuting test
Since `=` is rarely used and is undocumented, we remove its support for
equality comparisons, keeping `==` as the only option. `=` is now only
used for assignments like in `maxspan=10m`.
Closes: #62650
(cherry picked from commit ad5ae4d887b5c2feca2d0e874d7bdf738e3fd54e)
This reworks the code around grok's built-in patterns to name things
more like the rest of the code. It's not a big deal, but I'm just more
used to having `public static final` constants in SHOUTING_SNAKE_CASE.
Closes #61660. When ordering shards for recovery, ensure system index shards are
ordered first so that their recovery will be started first.
Note that I rewrote PriorityComparatorTests to use IndexMetadata instead of its
local IndexMeta POJO.
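The "system index shards first" rule can be sketched as a comparator. The `Shard` class here is a hypothetical stand-in for the real shard routing and `IndexMetadata` types, and the sketch ignores any other ordering criteria the real comparator applies.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: orders shards so that those belonging to system indices come first.
final class SystemFirstOrdering {

    static final class Shard {
        final String index;
        final boolean systemIndex;

        Shard(String index, boolean systemIndex) {
            this.index = index;
            this.systemIndex = systemIndex;
        }
    }

    // Boolean.compare ranks false before true, so the arguments are swapped to put system shards first.
    static final Comparator<Shard> SYSTEM_FIRST =
        (a, b) -> Boolean.compare(b.systemIndex, a.systemIndex);

    /** Returns index names in recovery order; the sort is stable, so ties keep their input order. */
    static List<String> order(List<Shard> shards) {
        List<Shard> sorted = new ArrayList<>(shards);
        sorted.sort(SYSTEM_FIRST);
        List<String> names = new ArrayList<>();
        for (Shard shard : sorted) {
            names.add(shard.index);
        }
        return names;
    }
}
```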
In #52680 we introduced a mechanism that will allow nodes to remove
themselves from the cluster if they locally determine themselves to be
unhealthy. The only check today is that their data paths are all
empirically writeable. This commit extends this check to consider a
failure of `NodeEnvironment#assertEnvIsLocked()` to be an indication of
unhealthiness.
Closes #58373
`RepositoriesService#doClose` was never called which led to
mock repositories not unblocking until the `ThreadPool` interrupts
all threads. Thus stopping a node that is blocked on a mock repository operation wastes `10s`
in each test that does it (which is quite a few as it turns out).
There are possible retries here that work out if both the snapshot and the delete
operation are retried when the master shuts down and we hit the unlikely case of the retried delete
executing before the retried snapshot, making both operations pass.
Closes #62686
The dense vector field is not aggregatable although it produces fielddata through its BinaryDocValuesField. It should pass up hasDocValues set to true to its parent class in its constructor, and return isAggregatable false. Same for the sparse vector field (only in 7.x).
This may not have consequences today, but it will be important once we try to share the same exists query implementation throughout all of the mappers with #57607.
When target indices are remote only, CCS does not require the user to have privileges on the local cluster. This PR ensures that the Point-In-Time reader follows the same pattern.
Relates: #61827
Currently we log the NettyAllocator description when the netty plugin is
created. Unfortunately, this hits certain static fields in Netty, which
triggers the setting of the number of CPU processors. This conflicts
with our Elasticsearch behavior of overriding this based on a setting.
This commit resolves the issue by logging after the processors have been
set.
CCS with remote indices only does not require any privileges on the local cluster.
This PR ensures that search with scroll follows the same permission model.
* [DOCS] EQL: Improve regsvr32 misuse explanation (#62722)
Expands the introduction to better explain what regsvr32 misuse is and
how it works at a high level.
* [DOCS] EQL: Style fixes
Currently the netty pool chunk size defaults to 16MB. The number does
not play well with the G1GC which causes this to consume entire regions.
Additionally, we normally allocate arrays of size 64KB or less. This
means that Elasticsearch could handle a smaller pool chunk size to play
nicer with the G1GC.
This commit ensures that the final order of the terms aggregations
is registered correctly after the final reduce.
This bug was introduced in #62028 which is not released yet so this PR is marked
as a non-issue.
This issue was discovered when running a terms aggregation under an auto-date
histogram. In such a case, the auto-date histogram may run multiple final reduces
to merge buckets together. This change makes sure that running multiple final reduces
doesn't create duplicates but it doesn't fix the fact that the final reduce may prune
the list of terms prematurely. This other bug is tracked separately in #62731.
This assertion does not always hold because there can be a race between
`putReaderContext` and `afterIndexRemoved` when an index is deleted.
Closes #62624
This allows the `check-migration` step to move past the allocation check
if the tier routing settings are manually unset.
This helps a user unblock ILM in case a tier is removed (i.e. if the warm tier
is decommissioned, this will allow users to resume the ILM policies stuck in
`check-migration` waiting for the warm nodes to become available and the managed
index to allocate; the index can then allocate on the other available tiers).
(cherry picked from commit d7a1eaa7f51d0972d10c0df1d3cd77d6b755dd41)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
The distro tests rely on two jdks, pulled in by the jdk download plugin.
The move to artifact transforms results in the path to the extracted
jdks existing under the gradle cache dir, which is outside the vagrant
mount of the elasticsearch project. This commit creates a local copy
within the `qa:os` project that the packaging tests use.
Closes #61138
Implement FORMAT according to the SQL Server spec: https://docs.microsoft.com/en-us/sql/t-sql/functions/format-transact-sql?view=sql-server-ver15#ExampleD by translating to the java.time patterns used in DATETIME_FORMAT.
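As a rough illustration of translating a SQL Server style pattern into a java.time pattern, the sketch below handles just two specifiers that differ between the two syntaxes: fractional seconds (`f` becomes `S`) and the AM/PM designator (`tt` becomes `a`). The translation table used by this PR is not reproduced here; `FormatPatternSketch` is a hypothetical name, and the naive replacement ignores quoted literals and every other specifier.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical sketch: a deliberately naive two-rule pattern translation.
final class FormatPatternSketch {

    /** Rewrites "tt" to "a" and 'f' to 'S'; all other characters are passed through unchanged. */
    static String toJavaTimePattern(String sqlServerPattern) {
        return sqlServerPattern.replace("tt", "a").replace('f', 'S');
    }

    /** Formats the value using the translated pattern. */
    static String format(LocalDateTime value, String sqlServerPattern) {
        DateTimeFormatter formatter =
            DateTimeFormatter.ofPattern(toJavaTimePattern(sqlServerPattern), Locale.ROOT);
        return formatter.format(value);
    }
}
```

For instance, the pattern `dd/MM/yyyy HH:mm:ss.fff` is rewritten to `dd/MM/yyyy HH:mm:ss.SSS` before being handed to `DateTimeFormatter`.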
Closes: #54965
Co-authored-by: Marios Trivyzas <matriv@users.noreply.github.com>
Co-authored-by: Bogdan Pintea <bogdan.pintea@elastic.co>
Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
(cherry picked from commit da511f4e033db6e8a6aa2a54b23e906b5e026845)
This is a follow up of #62480 where we are oversizing one array when initialising. In addition it prevents a possible CircuitBreaker leak during initialisation.
Make serializing `RepositoryData` a little faster and split up/document the code for it a little
as well given how massive this method has gotten at this point.