* Add ComponentTemplate to MetaData (#53290)
* Add ComponentTemplate to MetaData
This adds a `ComponentTemplate` datastructure that will be used as part of #53101 (Index Templates
v2) to the `MetaData` class. Currently there are no APIs for interacting with this class, so it will
always be an empty map (other than in tests). This infrastructure will be built upon to add APIs in
a subsequent commit.
A `ComponentTemplate` is made up of a `Template`, a version, and a MetaData.Custom class. The
`Template` contains similar information to an `IndexTemplateMetaData` object— settings, mappings,
and alias configuration.
* Update minimal supported version constant
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This changes the `top_metrics` aggregation to return metrics in their
original type. Since it only supports numerics, that means that dates,
longs, and doubles will come back as stored, with their appropriate
formatter applied.
This change removes the Lucene's experimental flag from the documentations of the following
tokenizer/filters:
* Simple Pattern Split Tokenizer
* Simple Pattern tokenizer
* Flatten Graph Token Filter
* Word Delimiter Graph Token Filter
The flag is still present in Lucene codebase but we're fully supporting these tokenizers/filters
in ES for a long time now so the docs flag is misleading.
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
The spirit of StreamInput/StreamOutput is that common I/O patterns should
be handled by these classes so that the persistence methods in
application classes can be kept short, which facilitates easy visual
comparison between read and write methods, and reduces risks of having
serialization issues due to mismatched implementations.
To this end, this change adds readOptionalVLong and writeOptionalVLong
methods to these classes as we have started to build up cases where
that conditional/null logic has been implemented directly in the read &
write methods.
Co-authored-by: Tim Vernum <tim.vernum@elastic.co>
The setting, `xpack.logstash.enabled`, exists to enable or disable the
logstash extensions found within x-pack. In practice, this setting had
no effect on the functionality of the extension. Given this, the
setting is now deprecated in preparation for removal.
Backport of #53367
The tests in this class had been failing for a while, but went unnoticed as not tested by CI (see #53442).
The reason the tests fail is that the can-match phase is smarter now, and filters out access to a non-existing field.
Closes#53442
Restructures the 'Update an enrich policy' section to:
* Migrate the content to the section. It was previously stored in the
Put Enrich Policy API docs.
* Remove the warning tag admonition from the section content.
* Replace a reused section earlier in the "Set up an enrich processor"
page with a link.
No substantive changes were made to the content.
If an index was created in version 6 and contain a date field with a joda-style pattern it should still be allowed to search and insert document into it.
Those created in 6 but date pattern starts with 8, should be considered as java style.
The jdk and distribution download plugins create fake ivy repositories,
and use group based repository filtering to ensure no other artifacts
try to resolve against the fake repos. Currently this works by adding a
blanket exclude to all repositories for the given group name. This
commit changes to using the new exclusiveContent feature in
Gradle to do the exclusion.
Today we do not have any infrastructure for adding a deprecation check
for settings that are removed. This commit enables this by adding such
infrastructure. Note that this infrastructure is unused in this commit,
which is deliberate. However, the primary target for this commit is 7.x
where this infrastructue will be used, in a follow-up.
Adds a new `default_field_map` field to trained model config objects.
This allows the model creator to supply field map if it knows that there should be some map for inference to work directly against the training data.
The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.
It looks like `date_nanos` fields weren't likely to work properly in
composite aggs because composites iterate field values using points and
we weren't converting the points into milliseconds. Because the doc
values were coming back in milliseconds we ended up geting very confused
and just never collecting sub-aggregations.
This fixes that by adding a method to `DateFieldMapper.Resolution` to
`parsePointAsMillis` which is similarly in name and function to
`NumberFieldMapper.NumberType`'s `parsePoint` except that it normalizes
to milliseconds which is what aggs need at the moment.
Closes#53168
Using a Long alone is not strong enough for the id of search contexts
because we reset the id generator whenever a data node is restarted.
This can lead to two issues:
1. Fetch phase can fetch documents from another index
2. A scroll search can return documents from another index
This commit avoids these issues by adding a UUID to SearchContexId.
Makes the following changes to the `word_delimiter` token filter docs:
* Adds a warning admonition recommending the `word_delimiter_graph`
filter instead. This warning includes a link to the deprecated Lucene
`WordDelimiterFilter`.
* Updates the description
* Adds detailed analyze snippet
* Adds custom analyzer and custom filter snippets
* Reorganizes and updates parameter documentation
This is a partial implementation of an endpoint for anomaly
detector model memory estimation.
It is not complete, lacking docs, HLRC and sensible numbers
for many anomaly detector configurations. These will be
added in a followup PR in time for 7.7 feature freeze.
A skeleton endpoint is useful now because it allows work on
the UI side of the change to commence. The skeleton endpoint
handles the same cases that the old UI code used to handle,
and produces very similar estimates for these cases.
Backport of #53333
In rare circumstances it is possible for an isolated node to have a greater
term than the currently-elected leader. Today such a node will attempt to join
the cluster but will not offer a vote to the leader and will reject its cluster
state publications due to their stale term. This situation persists since there
is no mechanism for the joining node to inform the leader that its term is
stale and a new election is required.
This commit adds the current term of the joining node to the join request. Once
the join has been validated, the leader will perform another election to
increase its term far enough to allow the isolated node to join properly.
Fixes#53271
In a tip admonition, we recommend using the `keyword` tokenizer with the
`word_delimiter_graph` token filter. However, we only use the
`whitespace` tokenizer in the example snippets. This updates those
snippets to use the `keyword` tokenizer instead.
Also corrects several spacing issues for arrays in these docs.
* Record Force Merges in live commit data
Prerequisite of #52182. Record force merges in the live commit data
so two shard states with the same sequence number that differ only in whether
or not they have been force merged can be distinguished when creating snapshots.
We can't always have the same segment stats and doc stats between
InternalEngine and ReadOnlyEngine if there are some fully deleted
segments. ReadOnlyEngine always filters out them. InternalEngine,
however, will keep them if peer recovery retention leases exist or the
number of the retaining operations is non-zero.
This change reverts the fix in #51331 and uses the wrapped reader to
calculate the segment stats and doc stats. For the test, we need to
disable the extra retaining soft-deletes operations.
Closes#51303
This avoids NPE when executing SLM policy when no config was provided.
Related to #44465Closes#53171
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Prior to this commit, rollover did not propagate the `is_hidden` alias
property when rollover over an index. This commit ensures that an alias
that's rollover over will remain hidden.
This removes the `instanceof`s from `SiblingPipelineAggregator` by
adding a `rewriteBuckets` method to `InternalAggregation` that can be
called to, well, rewrite the buckets. The default implementation of
`rewriteBuckets` throws the same exception that was thrown when you
attempted to run a `SiblingPipelineAggregator` on an aggregation without
buckets. It is overridden by `InternalSingleBucketAggregation` and
`InternalMultiBucketAggregation` to correctly rewrite their buckets.
When an composite aggregation is run against an index with a sort that
*starts* with the "source" fields from the composite but has additional
fields it'd blow up in while trying to decide if it could use the sort.
This changes it to decide that it *can* use the sort.
Closes#52480
This change optimizes the merge of terms aggregations by removing
the priority queue that was used to collect all the buckets during
a non-final reduction. We don't need to keep the result sorted since
the merge of buckets in a subsequent reduce can modify the order.
I wrote a small micro-benchmark to test the change and the speed ups
are significative for small merge buffer sizes:
````
########## Master:
Benchmark (bufferSize) (cardinality) (numShards) (topNSize) Mode Cnt Score Error Units
TermsReduceBenchmark.reduceTopHits 5 10000 1000 1000 avgt 10 2459,690 ± 198,682 ms/op
TermsReduceBenchmark.reduceTopHits 16 10000 1000 1000 avgt 10 1030,620 ± 91,544 ms/op
TermsReduceBenchmark.reduceTopHits 32 10000 1000 1000 avgt 10 558,608 ± 44,915 ms/op
TermsReduceBenchmark.reduceTopHits 128 10000 1000 1000 avgt 10 287,333 ± 8,342 ms/op
TermsReduceBenchmark.reduceTopHits 512 10000 1000 1000 avgt 10 257,325 ± 54,515 ms/op
########## Patch:
Benchmark (bufferSize) (cardinality) (numShards) (topNSize) Mode Cnt Score Error Units
TermsReduceBenchmark.reduceTopHits 5 10000 1000 1000 avgt 10 805,611 ± 14,630 ms/op
TermsReduceBenchmark.reduceTopHits 16 10000 1000 1000 avgt 10 378,851 ± 17,929 ms/op
TermsReduceBenchmark.reduceTopHits 32 10000 1000 1000 avgt 10 261,094 ± 10,176 ms/op
TermsReduceBenchmark.reduceTopHits 128 10000 1000 1000 avgt 10 241,051 ± 19,558 ms/op
TermsReduceBenchmark.reduceTopHits 512 10000 1000 1000 avgt 10 231,643 ± 6,170 ms/op
````
The code for the benchmark can be found [here](). It seems to be up to 3x faster for terms aggregations
that return 10,000 unique terms (1000 terms per shard). For a cardinality of 100,000 terms, this patch is up to 5x faster:
````
########## Patch:
Benchmark (bufferSize) (cardinality) (numShards) (topNSize) Mode Cnt Score Error Units
TermsReduceBenchmark.reduceTopHits 5 100000 1000 1000 avgt 10 12791,083 ± 397,128 ms/op
TermsReduceBenchmark.reduceTopHits 16 100000 1000 1000 avgt 10 3974,939 ± 324,617 ms/op
TermsReduceBenchmark.reduceTopHits 32 100000 1000 1000 avgt 10 2186,285 ± 267,124 ms/op
TermsReduceBenchmark.reduceTopHits 128 100000 1000 1000 avgt 10 914,657 ± 160,784 ms/op
TermsReduceBenchmark.reduceTopHits 512 100000 1000 1000 avgt 10 604,198 ± 145,457 ms/op
########## Master:
Benchmark (bufferSize) (cardinality) (numShards) (topNSize) Mode Cnt Score Error Units
TermsReduceBenchmark.reduceTopHits 5 100000 1000 1000 avgt 10 60696,107 ± 929,944 ms/op
TermsReduceBenchmark.reduceTopHits 16 100000 1000 1000 avgt 10 16292,894 ± 783,398 ms/op
TermsReduceBenchmark.reduceTopHits 32 100000 1000 1000 avgt 10 7705,444 ± 77,588 ms/op
TermsReduceBenchmark.reduceTopHits 128 100000 1000 1000 avgt 10 2156,685 ± 88,795 ms/op
TermsReduceBenchmark.reduceTopHits 512 100000 1000 1000 avgt 10 760,273 ± 53,738 ms/op
````
The merge of buckets can also be optimized. Currently we use an hash map to merge buckets coming from different shards so this can be costly if the number of unique terms is high. Instead, we could always sort the shard terms result by key and perform a merge sort to reduce the results. This would save memory and make the merge more linear in terms
of complexity in the coordinating node at the expense of an additional sort in the shards. I plan to test this possible optimization in a follow up.
Relates #51857
It doesn't make a whole lot of sense for `BitArray#clear` to grow the
underlying storage array just to clear the bit. We *already* treat
indices outside of the storage array as unset. This turns such
operations into a noop.
A previous change (#53029) is causing analysis jobs to wait for certain indices to be made available. While this it is good for jobs to wait, they could fail early on _start.
This change will cause the persistent task to continually retry node assignment when the failure is due to shards not being available.
If the shards are not available by the time `timeout` is reached by the predicate, it is treated as a _start failure and the task is canceled.
For tasks seeking a new assignment after a node failure, that behavior is unchanged.
closes#53188
Updates the SVG for a token graph to make the layout consistent with
other graphs. This means moving the text directly above the edge lines.
Previously, the text was above the edge line.