[[release-highlights]]
== What's new in {minor-version}
coming[{minor-version}]
Here are the highlights of what's new and improved in {es} {minor-version}!
ifeval::["{release-state}"!="unreleased"]
For detailed information about this release, see the
<<release-notes-{elasticsearch_version},Release notes>> and
<<breaking-changes-{minor-version}, Breaking changes>>.
endif::[]
// Add previous release to the list
Other versions:
{ref-bare}/7.9/release-highlights.html[7.9]
| {ref-bare}/7.8/release-highlights.html[7.8]
| {ref-bare}/7.7/release-highlights.html[7.7]
| {ref-bare}/7.6/release-highlights-7.6.0.html[7.6]
| {ref-bare}/7.5/release-highlights-7.5.0.html[7.5]
| {ref-bare}/7.4/release-highlights-7.4.0.html[7.4]
| {ref-bare}/7.3/release-highlights-7.3.0.html[7.3]
| {ref-bare}/7.2/release-highlights-7.2.0.html[7.2]
| {ref-bare}/7.1/release-highlights-7.1.0.html[7.1]
| {ref-bare}/7.0/release-highlights-7.0.0.html[7.0]
// tag::notable-highlights[]
[discrete]
[[indexing-speed-improvement]]
=== Indexing speed improvement
{es} 7.10 improves indexing speed by up to 20%. We've reduced the coordination
needed to add entries to the {ref}/index-modules-translog.html[transaction log].
This change allows for more concurrency and also increases the transaction
log buffer size from `8KB` to `1MB`. However, gains are smaller for
analysis-intensive use cases: the heavier the indexing chain, the smaller the
gain, so workloads that involve many fields, ingest pipelines, or full-text
indexing will see less improvement.
[discrete]
[[more-space-efficient-indices]]
=== More space-efficient indices
{es} 7.10 depends on Apache Lucene 8.7, which introduces higher compression of
stored fields, the part of the index that notably stores the
{ref}/mapping-source-field.html[`_source`]. On the various data sets that we
benchmark against, we noticed space reductions between 0% and 10%. This change
especially helps data sets that have lots of redundant data across documents.
This is typically the case for documents produced by our Observability
solutions, which repeat metadata about the host that produced the data on every
document.

{es} offers the ability to configure the
{ref}/index-modules.html#index-codec[`index.codec`] setting to tell
{es} how aggressively to compress stored fields. Both supported values,
`default` and `best_compression`, will get better compression with this change.
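
As a minimal sketch of how this setting might be applied, the request below
creates an index that uses `best_compression`. The index name is only a
placeholder, and the example assumes the codec is chosen at index creation time.

[source,console]
----
PUT /my-index-000001
{
  "settings": {
    "index": {
      "codec": "best_compression"
    }
  }
}
----
// TEST[skip:illustrative example]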
[discrete]
[[data-tier-formalization]]
=== Data tiers
{es} 7.10 introduces the concept of formalized data tiers.
{ref}/data-tiers.html[Data tiers] are a simple, integrated approach that gives
users control over optimizing for cost, performance, and breadth/depth of data.
Prior to this formalization, many users configured their own tier topology using
custom node attributes and used {ilm-init} to manage the lifecycle and
location of data within a cluster.
With this formalization, data tiers (content, hot, warm, and cold) can be
explicitly configured using {ref}/modules-node.html#node-roles[node roles], and
indices can be configured to be allocated within a specific tier using
{ref}/data-tier-shard-filtering.html[index-level data tier allocation filtering].
{ilm-init} will make use of these tiers to
{ref}/ilm-migrate.html[automatically migrate] data between nodes as an index
goes through the phases of its lifecycle.
Newly created indices abstracted by a {ref}/data-streams.html[data stream] will
be allocated to the `data_hot` tier automatically, while standalone indices will
be allocated to the `data_content` tier automatically. Nodes with the
pre-existing `data` role are considered to be part of all tiers.
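
As a rough illustration of index-level tier filtering, the request below asks
{es} to allocate an existing index to nodes in the warm tier. It is a sketch
only: the index name is a placeholder, and it assumes at least one node in the
cluster has the `data_warm` role.

[source,console]
----
PUT /my-index-000001/_settings
{
  "index.routing.allocation.include._tier": "data_warm"
}
----
// TEST[skip:illustrative example]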
[discrete]
[[auc-roc-eval-class]]
=== AUC ROC evaluation metrics for classification analysis
{ml-docs}/ml-dfanalytics-evaluate.html#ml-dfanalytics-class-aucroc[Area under the curve of receiver operating characteristic (AUC ROC)]
is an evaluation metric that has been available for {oldetection} since 7.3 and
is now available for {classification} analysis. AUC ROC represents the
performance of the {classification} process at different predicted probability
thresholds. The true positive rate for a specific class is compared against the
rate of all the other classes combined at the different threshold levels to
create the curve.
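
A request for this metric might look like the following sketch, which uses the
evaluate {dfanalytics} API. The index name, the `animal_class` field, and the
`dog` class are hypothetical, and the example assumes the destination index
already contains {classification} results.

[source,console]
----
POST _ml/data_frame/_evaluate
{
  "index": "animal_classification",
  "evaluation": {
    "classification": {
      "actual_field": "animal_class",
      "metrics": {
        "auc_roc": {
          "class_name": "dog"
        }
      }
    }
  }
}
----
// TEST[skip:illustrative example]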
[discrete]
[[custom-feature-processor-dfa]]
=== Custom feature processors in {dfanalytics}
Feature processors enable you to extract features from document fields.
You can use these features in model training and model deployment. Custom
feature processors provide a mechanism to create features that can be used at
search and ingest time, and they don't take up space in the index.
This process more tightly couples feature generation with the resulting model.
The result is simplified model management, as both the features and the model
can easily follow the same life cycle.
[discrete]
[[points-in-time-for-search]]
=== Points in time (PITs) for search
In 7.10, we're introducing points in time (PITs), a lightweight way to preserve
index state over searches. PITs improve end-user experience by making UIs more
reactive.
By default, a search request waits for complete results before returning a
response. For example, a search that retrieves top hits and aggregations returns
a response only after both top hits and aggregations are computed. However,
aggregations are usually slower and more expensive to compute than top hits.
Instead of sending a combined request, you can send two separate requests: one
for top hits and another one for aggregations. With separate search requests, a
UI can display top hits as soon as they're available and display aggregation
data after the slower aggregation request completes. You can use a PIT to ensure
both search requests run on the same data and index state.
To use a PIT in a search, you must first explicitly create the PIT using the new
{ref}/point-in-time-api.html[open PIT API]. PITs get automatically garbage-collected
after `keep_alive` if no follow-up request extends their duration.
[source,console]
----
POST /my-index-000001/_pit?keep_alive=1m
----
// TEST[setup:my_index]
The API returns a PIT ID you can use in search requests. You can also
extend the PIT's lifespan using the search request's `keep_alive` parameter.
[source,console]
----
POST /_search
{
  "size": 100,
  "query": {
    "match" : {
      "title" : "elasticsearch"
    }
  },
  "pit": {
    "id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
    "keep_alive": "1m"
  }
}
----
// TEST[catch:missing]
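
To follow the two-request pattern described earlier, a second request could
reuse the same PIT ID to compute only aggregations over the same index state.
The sketch below is illustrative: the `top_tags` aggregation and the `tags`
field are hypothetical and do not come from the release itself.

[source,console]
----
POST /_search
{
  "size": 0,
  "query": {
    "match" : {
      "title" : "elasticsearch"
    }
  },
  "aggs": {
    "top_tags": {
      "terms": {
        "field": "tags"
      }
    }
  },
  "pit": {
    "id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
    "keep_alive": "1m"
  }
}
----
// TEST[skip:illustrative example]

Setting `size` to `0` skips top hits, so this request returns only the
aggregation results.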
PITs automatically close when their `keep_alive` period ends. You can
also manually close PITs you no longer need using the
{ref}/point-in-time-api.html[close PIT API]. Closing a PIT releases the
resources needed to maintain the PIT's index state.
[source,console]
----
DELETE /_pit
{
  "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
----
// TEST[catch:missing]
For more information about using PITs in search, see
{ref}/paginate-search-results.html#search-after[Paginate search results with
`search_after`] or the {ref}/point-in-time-api.html[PIT API documentation].
[discrete]
[[support-for-request-level-circuit-breakers]]
=== Request-level circuit breakers on coordinating nodes
Coordinating nodes now account for the memory used to perform partial and final
reduces of aggregations in the request circuit breaker. The search coordinator
adds the memory it uses to save and reduce the results of shard aggregations to
the request circuit breaker. Before any partial or final reduce, the memory
needed to reduce the aggregations is estimated, and a `CircuitBreakingException`
is thrown if the estimate exceeds the maximum memory allowed in this breaker.

This size is estimated as roughly 1.5 times the size of the serialized
aggregations that need to be reduced. The estimate can be far off for some
aggregations, but it is corrected with the real size after the reduce completes.
If the reduce is successful, the circuit breaker is updated to remove the size of
the source aggregations and replace the estimate with the serialized size of the
newly reduced result.
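
For example, reducing 100MB of serialized shard aggregations would reserve
roughly 150MB in the breaker until the reduce completes. To see how close a
node is to its limit, one option is to inspect the breaker statistics, and the
limit itself can be adjusted through the cluster settings API. The sketch below
assumes the `50%` value purely for illustration. It is not a recommendation.

[source,console]
----
GET /_nodes/stats/breaker

PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.request.limit": "50%"
  }
}
----
// TEST[skip:illustrative example]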
[discrete]
[[eql-case-sensitivity-operator]]
=== EQL: Case-sensitivity and the `:` operator
In 7.10, we made most EQL operators and functions case-sensitive by default.
We've also added `:`, a new case-insensitive equal operator. The `:` operator
is designed for security use cases: you can use it to search for strings in
Windows event logs and other event data containing a mix of letter cases.
[source,console]
----
GET /my-index-000001/_eql/search
{
  "query": """
    process where process.executable : "c:\\\\windows\\\\system32\\\\cmd.exe"
  """
}
----
// TEST[setup:sec_logs]
For more information, see the {ref}/eql-syntax.html[EQL
syntax documentation].
[discrete]
[[deprecate-rest-api-access-to-system-indices]]
=== REST API access to system indices is deprecated
We are deprecating REST API access to system indices. Most REST API
requests that attempt to access system indices will return the following
deprecation warning:
[source,text]
----
this request accesses system indices: [.system_index_name], but in a future
major version, direct access to system indices will be prevented by default
----
The following REST API endpoints access system indices as part of their
implementation and will not return the deprecation warning:
* `GET _cluster/health`
* `GET {index}/_recovery`
* `GET _cluster/allocation/explain`
* `GET _cluster/state`
* `POST _cluster/reroute`
* `GET {index}/_stats`
* `GET {index}/_segments`
* `GET {index}/_shard_stores`
* `GET _cat/[indices,aliases,health,recovery,shards,segments]`
We are also adding a new metadata flag to track indices. {es} will automatically
add this flag to any existing system indices during upgrade.
[discrete]
[[add-system-read-thread-pool]]
=== New thread pools for system indices
We've added two new thread pools for system indices: `system_read` and
`system_write`. These thread pools ensure system indices critical to the Elastic
Stack, such as those used by security or Kibana, remain responsive when
a cluster is under heavy query or indexing load.
`system_read` is a `fixed` thread pool used to manage resources for
read operations targeting system indices. Similarly, `system_write` is a
`fixed` thread pool used to manage resources for write operations targeting
system indices. Both have a maximum number of threads equal to `5`
or half of the available processors, whichever is smaller.
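
Under this rule, a node with 4 available processors would get at most 2 threads
in each pool, while a node with 16 processors would get at most 5. As a sketch
of how you might check the pools on a running cluster, the cat thread pool API
can be filtered by pool name. The column list here is just one possible
selection.

[source,console]
----
GET /_cat/thread_pool/system_read,system_write?v=true&h=node_name,name,type,size,active,queue,rejected
----
// TEST[skip:illustrative example]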
// end::notable-highlights[]