By default we filter out projects from dependencies info task based on their
group names. We should be filtering based on if the dependency is a project
dependency.
Changes:
* Adds new 'Reindex with a data stream' section to 'Use a data stream'
* Makes the existing reindex API docs aware of data streams
* Rewrites the reindex glossary definition to include data streams
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-authored-by: Tim Vernum <tim@adjective.org>
Co-authored-by: lcawl <lcawley@elastic.co>
We keep a static list of meta-fields: META_FIELDS_BEFORE_7_8
as it was before.
This is done to ensure the backwards compatability with pre 7.8 nodes.
Closes#57831
If `ExtraFS` decides to put `extra0/0` into the indices folder
then the previous logic in this test would have interpreted the `0`
as shard `0` of index `extra0` and fail to list its contents (since it's a file
and not an actual shard directory).
=> simplified the logic to use actually referenced `IndexId` for iterating over indices
instead.
Scheduling on the threadpool will throw if the scheduler is already
shut down. Handled by treating the rejection like any other non-retryable
exception.
Closes#58021
This moves the code to look up significance heuristics information like
background frequency and superset size out of
`SignificantTermsAggregatorFactory` and into its own home so that it is
easier to pass around. This will:
1. Make us feel better about ourselves for not passing around the
factory, which is really *supposed* to be a throw away thing.
2. Abstract the significance lookup logic so we can reuse it for the
`significant_text` aggregation.
3. Make if very simple to cache the background frequencies which should
speed up when the agg is a sub-agg. We had done this for numerics
but not string-shaped significant terms.
This type of result will store stats about how well categorization
is performing. When per-partition categorization is in use, separate
documents will be written for every partition so that it is possible
to see if categorization is working well for some partitions but not
others.
This PR is a minimal implementation to allow the C++ side changes to
be made. More Java side changes related to per-partition
categorization will be in followup PRs. However, even in the long
term I do not see a major benefit in introducing dedicated APIs for
querying categorizer stats. Like forecast request stats the
categorizer stats can be read directly from the job's results alias.
Backport of #57978
Backport of #57870 to 7.x branch.
This change now also copies the op_type from the reindex request's destination index request to the actual index request being used in the bulk request.
For ensuring no document exists, the op_type create doesn't need to be copied, since Versions.MATCH_DELETED will copied from the 'mainRequest.getDestination().version()'.
The `version()` method on IndexRequest only returns Versions.MATCH_DELETED if op_type=create and no specific version has been specified.
However in order to be able to index into a data stream, the op_type must be create. So in order to support that the op_type must be copied from the reindex request's destination index request to the actual index request being used in the bulk request.
Relates to #53100 and #57788
Adds support for reading in `model_size_info` objects.
These objects contain numeric values indicating the model definition size and complexity.
Additionally, these objects are not stored or serialized to any other node. They are to be used for calculating and storing model metadata. They are much smaller on heap than the true model definition and should help prevent the analytics process from using too much memory.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
When a search phase fails, we release the context of all successful shards.
Successful shards that rewrite the request to match none will not create any context
since #. This change ensures that we don't try to release a `null` context on these
successful shards.
Closes#57945
Ensures that InternalClusterInfoService's internally cached stats are refreshed whenever the
shard size or disk usage function (to mock out disk usage) are overridden.
Closes#57888
Changes:
* Updates 'Data streams' intro page to focus on problem solution and
benefits.
* Adds 'Data streams overview' page to cover conceptual information,
based on existing content in the 'Data streams' intro.
* Adds diagrams for data streams and search/indexing request examples.
* Moves API jump list and API docs to a new 'Data streams APIs' section.
Links to these APIs will be available through tutorials.
* Add xrefs to existing docs for concepts like generation, write index,
and append-only.
Changes:
* Condenses and relocates the `docvalue_fields` example to the 'Run a search'
page.
* Adds docs for the `docvalue_fields` request body parameter.
* Updates several related xrefs.
Co-authored-by: debadair <debadair@elastic.co>
Today `InternalEngine#releaseIndexCommit` fails with an
`AlreadyClosedException` if the engine is closed before the index commit is
released. This can happen if, for example, a node leaves and rejoins the
cluster and acquires an index commit for replica shard allocation concurrently
with shutting the shard down.
There's no need to fail the operation like this: if the engine is shut down
then we will clean up the unreferenced files when it's restarted (or if it's
allocated elsewhere) so we can suppress an `AlreadyClosedException` in this
case. This commit does so.
Fixes#57797
We document that the cluster state API is an internal representation which may
change, but apparently not emphatically enough. This commit adds a `NOTE:`
admonition to this paragraph.
Per 49554 I added standard deviation sampling and variance sampling to the extended stats interface.
Closes#49554
Co-authored-by: Igor Motov <igor@motovs.org>
Co-authored-by: andrewjohnson2 <aj114114@gmail.com>
When reducing `auto_date_histogram` we were using `Rounding#round`
which is quite a bit more expensive than
```
Rounding.Prepared prepared = rounding.prepare(min, max);
long result = prepared.round(date);
```
when rounding to a non-fixed time zone like `America/New_York`. This
stops using the former and starts using the latter.
Relates to #56124
Now that annotations are part of the anomaly detection job results
the annotations index should be refreshed on flushing and closing
the job so that flush and close continue to fulfil their contracts
that immediately after returning all results the job generated up
to that point are searchable.
ModelLoadingService only caches models if they are referenced by an
ingest pipeline. For models used in search we want to always cache the
models and rely on TTL to evict them. Additionally when an ingest
pipeline is deleted the model it references should not be evicted if
it is used in search.
Search after is a better choice for the delete expired data iterators
where processing takes a long time as unlike scroll a context does not
have to be kept alive. Also changes the delete expired data endpoint to
404 if the job is unknown
This commit re-enables windows testing for archives packaging tests.
These were disabled previously because of constant failure due to
windows file locks, but the failure does not occur outside of CI, so
they are being re-enabled to further investigate the failure.
relates #50825
Reworks the `parent` and `child` aggregation are not at the top level
using the optimization from #55873. Instead of wrapping all
non-top-level `parent` and `child` aggregators we now handle being a
child aggregator in the aggregator, specifically by adding recording
which global ordinals show up in the parent and then checking if they
match the child.