Introduce 64-bit unsigned long field type
This field type supports
- indexing of integer values from [0, 18446744073709551615]
- precise queries (term, range)
- precise sort and terms aggregations
- other aggregations are based on conversion of long values
to double and can be imprecise for large values.
Backport for #60050Closes#32434
If `track_total_hits=true` is used, the exact value of the number of hits is returned - i.e. the value is effectively limitless, and not the default value of 10,000
Co-authored-by: AndyHunt66 <andrew.hunt@elastic.co>
This commit adds a dedicated threadpool for system index write
operations. The dedicated resources for system index writes serves as
a means to ensure that user activity does not block important system
operations from occurring such as the management of users and roles.
Backport of #61655
* [DOCS] EQL: Improve regsvr32 misuse explanation (#62722)
Expands the introduction to better explain what regsvr32 misuse is and
how it works at a high level.
* [DOCS] EQL: Style fixes
Implement FORMAT according to the SQL Server spec: https://docs.microsoft.com/en-us/sql/t-sql/functions/format-transact-sql?view=sql-server-ver15#ExampleD by translating to the java.time patterns used in DATETIME_FORMAT.
Closes: #54965
Co-authored-by: Marios Trivyzas <matriv@users.noreply.github.com>
Co-authored-by: Bogdan Pintea <bogdan.pintea@elastic.co>
Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
(cherry picked from commit da511f4e033db6e8a6aa2a54b23e906b5e026845)
This PR adds a new 'version' field type that allows indexing string values
representing software versions similar to the ones defined in the Semantic
Versioning definition (semver.org). The field behaves very similar to a
'keyword' field but allows efficient sorting and range queries that take into
accound the special ordering needed for version strings. For example, the main
version parts are sorted numerically (ie 2.0.0 < 11.0.0) whereas this wouldn't
be possible with 'keyword' fields today.
Valid version values are similar to the Semantic Versioning definition, with the
notable exception that in addition to the "main" version consiting of
major.minor.patch, we allow less or more than three numeric identifiers, i.e.
"1.2" or "1.4.6.123.12" are treated as valid too.
Relates to #48878
* Add Maven Central as a JDBC repository
Document Maven Central as a JDBC repository.
(cherry picked from commit 2bc4d7eb19a26bf21b11214c4351470b677e1598)
The autoscaling decision API now returns an absolute capacity,
and leaves the actual decision of whether a scale up or down
is needed to the orchestration system.
The decision API now returns both a tier and node level required
and current capacity as wells as a decider level breakdown of the
same though with in particular current memory still not populated.
This commit adds the `index.routing.allocation.prefer._tier` setting to the
`DataTierAllocationDecider`. This special-purpose allocation setting lets a user specify a
preference-based list of tiers for an index to be assigned to. For example, if the setting were set
to:
```
"index.routing.allocation.prefer._tier": "data_hot,data_warm,data_content"
```
If the cluster contains any nodes with the `data_hot` role, the decider will only allow them to be
allocated on the `data_hot` node(s). If there are no `data_hot` nodes, but there are `data_warm` and
`data_content` nodes, then the index will be allowed to be allocated on `data_warm` nodes.
This allows us to specify an index's preference for tier(s) without causing the index to be
unassigned if no nodes of a preferred tier are available.
Subsequent work will change the ILM migration to make additional use of this setting.
Relates to #60848
This commit adjusts the following APIs so now they not only support an `_all` case, but wildcard patterned Ids as well.
- `GET _ml/calendars/<calendar_id>/events`
- `GET _ml/calendars/<calendar_id>`
- `GET _ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>`
- `DELETE _ml/anomaly_detectors/<job_id>/_forecast/<forecast_id>`
We removed index-time boosting back in 5x, and we no longer document the 'boost'
parameter on any of our mapping types. However, it is still possible to define an
index-time boost on a field mapper for a surprisingly large number of field types, and
they even have an effect (sometimes, on some queries).
As a first step in finally removing all traces of index time boosting, this comment emits
a deprecation warning whenever a boost parameter is found on a mapping definition.
* [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922)
Adds new flag include to the get trained models API
The flag initially has two valid values: definition, total_feature_importance.
Consequently, the old include_model_definition flag is now deprecated.
When total_feature_importance is included, the total_feature_importance field is included in the model metadata object.
Including definition is the same as previously setting include_model_definition=true.
* fixing test
* Update x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ml/action/GetTrainedModelsRequestTests.java
The underlying issue was fixed a while ago in Lucene:
https://issues.apache.org/jira/browse/LUCENE-9517
and went away when lucene snapshot version was upgraded.
Also the name of the index to rollover had to be slightly changed,
so that it doesn't collide with data stream template's namespace.
(a regular index can't be created in the namespace that is managed
by a template that creates data streams)
Closes#62043
This adds ILM support for automatically migrating the managed
indices between data tiers.
This proposal makes use of a MigrateAction that is injected
(similar to how the Unfollow action is injected) in phases that
don't define index allocation rules using the AllocateAction or
don't explicitly define the MigrateAction itself (regardless if it's
enabled or disabled).
(cherry picked from commit c1746afffd61048d0c12d3a77e6d8191a804ed49)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
This adds two extra bits of info to the profiler:
1. Count of the number of different types of collectors. This lets us figure
out if we're using the optimization for segment ordinals. It adds a few
more similar counters just for good measure.
2. Profiles the `getLeafCollector` and `postCollection` methods. These are
non-trivial for some aggregations, like cardinality.
* Add "synthetics-*-*" templates for synthetics fleet data
For the Elastic Agent we currently have `logs` and `metrics`, however, synthetic data doesn't belong
with those and thus we should have a place for it to live. This would be data reported from
heartbeat and under the 'monitoring' category.
This commit adds a composable index template for `synthetics-*-*` indices similar to the work in
#56709 and #57629.
Resolves#61665
This PR adds support for the 'fields' option in the following places:
* Anytime `inner_hits` is used, for both fetching nested/ child docs and field collapsing
* The `top_hits` aggregation
Addresses #61949.
This commit deprecates the Repository Stats API added in 7.8.0 as
an experimental API behind a feature flag. The goal is to deprecate
this API in 7.10.0 and remove it in a follow up PR in 8.0.0.
This API is now superseded by the Repositories Metering API.
This commit introduces a new API that manages point-in-times in x-pack
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.
A point in time must be opened before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.
```
POST /my_index/_pit?keep_alive=1m
```
The response from the above request includes a `id`, which should be
passed to the `id` of the `pit` parameter of search requests.
```
POST /_search
{
"query": {
"match" : {
"title" : "elasticsearch"
}
},
"pit": {
"id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
"keep_alive": "1m"
}
}
```
Point-in-times are automatically closed when the `keep_alive` is
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.
```
DELETE /_pit
{
"id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```
#### Notable works in this change:
- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966
Relates #46523
Relates #26472
Co-authored-by: Jim Ferenczi <jimczi@apache.org>
This commit removes the documentation for some specific Searchable Snapshot REST APIs:
- clear cache
- searchable snapshot stats
- repository stats
These APIs are low-level and are useful to investigate the behavior of snapshot
backed indices but we expect them to be removed in the future or to appear in
a different form.
Previously the "mappings" field of the response from the
find_file_structure endpoint was not a drop-in for the
mappings format of the create index endpoint - the
"properties" layer was missing. The reason for omitting
it initially was that the assumption was that the
find_file_structure endpoint would only ever return very
simple mappings without any nested objects. However,
this will not be true in the future, as we will improve
mappings detection for complex JSON objects. As a first
step it makes sense to move the returned mappings closer
to the standard format.
This is a small building block towards fixing #55616
This pull request adds a new set of APIs that allows tracking the number of requests performed
by the different registered repositories.
In order to avoid losing data, the repository statistics are archived after the repository is closed for
a configurable retention period `repositories.stats.archive.retention_period`. The API exposes the
statistics for the active repositories as well as the modified/closed repositories.
Backport of #60371
There are currently half a dozen ways to add plugins and modules for
test clusters to use. All of them require the calling project to peek
into the plugin or module they want to use to grab its bundlePlugin
task, and then both depend on that task, as well as extract the archive
path the task will produce. This creates cross project dependencies that
are difficult to detect, and if the dependent plugin/module has not yet
been configured, the build will fail because the task does not yet
exist.
This commit makes the plugin and module methods for testclusters
symmetetric, and simply adding a file provider directly, or a project
path that will produce the plugin/module zip. Internally this new
variant uses normal configuration/dependencies across projects to get
the zip artifact. It also has the added benefit of no longer needing the
caller to add to the test task a dependsOn for bundlePlugin task.
We now link to the top-level keyword type family page instead of its individual
subsections. This better fits the page format, where each type name is a link.
This commit enhances the verbose output for the
`_ingest/pipeline/_simulate?verbose` api. Specifically
this adds the following:
* the pipeline processor is now included in the output
* the conditional (if) and result is now included in the output iff it was defined
* a status field is always displayed. the possible values of status are
* `success` - if the processor ran with out errors
* `error` - if the processor ran but threw an error that was not ingored
* `error_ignored` - if the processor ran but threw an error that was ingored
* `skipped` - if the process did not run (currently only possible if the if condition evaluates to false)
* `dropped` - if the the `drop` processor ran and dropped the document
* a `processor_type` field for the type of processor (e.g. set, rename, etc.)
* throw a better error if trying to simulate with a pipeline that does not exist
closes#56004
This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by
default when they are created.
This does not break existing behavior, as nodes with the `data` role are considered to be part of
the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`,
`data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by
default.
This change is a little more complicated than changing the default value for
`index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to
have a plugin inject a setting into the builder for a newly created index. This has the benefit of
allowing this setting to be visible as part of the settings when retrieving the index, for example:
```
// Create an index
PUT /eggplant
// Get an index
GET /eggplant?flat_settings
```
Returns the default settings now of:
```json
{
"eggplant" : {
"aliases" : { },
"mappings" : { },
"settings" : {
"index.creation_date" : "1597855465598",
"index.number_of_replicas" : "1",
"index.number_of_shards" : "1",
"index.provided_name" : "eggplant",
"index.routing.allocation.include._tier" : "data_hot",
"index.uuid" : "6ySG78s9RWGystRipoBFCA",
"index.version.created" : "8000099"
}
}
}
```
After the initial setting of this setting, it can be treated like any other index level setting.
This new setting is *not* set on a new index if any of the following is true:
- The index is created with an `index.routing.allocation.include.<anything>` setting
- The index is created with an `index.routing.allocation.exclude.<anything>` setting
- The index is created with an `index.routing.allocation.require.<anything>` setting
- The index is created with a null `index.routing.allocation.include._tier` value
- The index was created from an existing source metadata (shrink, clone, split, etc)
Relates to #60848
* updated shard limit doc
As the documentation was not so clear. I have updated saying this limit includes open indices with unassigned primaries and replicas count towards the limit.
* [DOCS] Incorporated edits.
Co-authored-by: Deb Adair <debadair@elastic.co>
Co-authored-by: gadekishore <50092970+gadekishore@users.noreply.github.com>
Backport to add case insensitive support for regex queries.
Forks a copy of Lucene’s RegexpQuery and RegExp from Lucene master.
This can be removed when 8.7 Lucene is released.
Closes#59235
The building block of the eql response is currently the SearchHit. This
is a problem since it is tied to an actual search, and thus has scoring,
highlighting, shard information and a lot of other things that are not
relevant for EQL.
This becomes a problem when doing sequence queries since the response is
not generated from one search query and thus there are no SearchHits to
speak of.
Emulating one is not just conceptually incorrect but also problematic
since most of the data is missed or made-up.
As such this PR introduces a simple class, Event, that maps nicely to
the terminology while hiding the ES internals (the use of SearchHit or
GetResult/GetResponse depending on the API used).
Fix#59764Fix#59779
Co-authored-by: Igor Motov <igor@motovs.org>
(cherry picked from commit 997376fbe6ef2894038968842f5e0635731ede65)
No-op changes to:
* Move `Search your data` source files into the same directory
* Rename `Search your data` source files based on page ID
* Remove unneeded includes
* Remove the `Request` dir
* [ML] adding docs + hlrc for data frame analysis feature_processors (#61149)
Adds HLRC and some docs for the new feature_processors field in Data frame analytics.
Co-authored-by: Przemysław Witek <przemyslaw.witek@elastic.co>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>