[[release-highlights-7.0.0]]
== 7.0.0 release highlights
++++
<titleabbrev>7.0.0</titleabbrev>
++++
//NOTE: The notable-highlights tagged regions are re-used in the
//Installation and Upgrade Guide
//tag::notable-highlights[]
[float]
=== Adaptive replica selection enabled by default
In Elasticsearch 6.x and prior, a series of search requests to the same shard
would be forwarded to the primary and each replica in round-robin fashion. This
could prove problematic if one node starts a long garbage collection---search
requests could still be forwarded to the slow node regardless and would have an
impact on search latency.
In 6.1, we added an experimental
{ref}/search.html#search-adaptive-replica[adaptive replica selection] feature.
Each node tracks and compares how long search requests to
other nodes take, and uses this information to adjust how frequently to send
requests to shards on particular nodes. In our benchmarks, this results in an
overall improvement in search throughput and reduced 99th percentile latencies.
This option was disabled by default throughout 6.x, but we've heard feedback
from users who have found the setting very beneficial, so we've
turned it on by default starting in Elasticsearch 7.0.0.
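
As a rough illustration of the idea (not the ranking formula Elasticsearch actually uses), the sketch below keeps a moving average of observed response times per node and sends the next request to the node that has been fastest recently. The class name, node names, and smoothing factor are assumptions made for this example.

```python
class ReplicaSelector:
    """Toy adaptive selector: prefer the node with the lowest average latency."""

    def __init__(self, nodes, alpha=0.3):
        self.alpha = alpha  # weight of the newest observation (assumed value)
        self.avg_ms = {node: 0.0 for node in nodes}

    def record(self, node, took_ms):
        # Exponentially weighted moving average of response times.
        prev = self.avg_ms[node]
        self.avg_ms[node] = (1 - self.alpha) * prev + self.alpha * took_ms

    def pick(self):
        # Choose the node that has been responding fastest recently.
        return min(self.avg_ms, key=self.avg_ms.get)


selector = ReplicaSelector(["node-a", "node-b", "node-c"])
for node, took in [("node-a", 20), ("node-b", 900), ("node-c", 25)]:
    selector.record(node, took)
chosen = selector.pick()
```

Here `node-b` stands in for a node stuck in a long garbage collection, so it is passed over. In a real cluster the behavior is controlled by the `cluster.routing.use_adaptive_replica_selection` setting.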
//end::notable-highlights[]
//tag::notable-highlights[]
[float]
=== Skip shard refreshes if a shard is "search idle"
Elasticsearch 6.x and prior {ref}/indices-refresh.html[refreshed] indices
automatically in the background, by default every second. This provides the
“near real-time” search capabilities Elasticsearch is known for: results are
available for search requests within one second of being indexed, by
default. However, this behavior has a significant impact on indexing performance
if the refreshes are not needed (e.g., if Elasticsearch isn't servicing any
active searches).
Elasticsearch 7.0 is much smarter about this behavior by introducing the
notion of a shard being "search idle". A shard now transitions to being search
idle after it hasn't had any searches for
{ref}/index-modules.html#dynamic-index-settings[thirty seconds], by default.
Once a shard is search idle, all scheduled refreshes will
be skipped until a search comes through, which will trigger the next scheduled
refresh. We expect this to significantly increase indexing
throughput for many users. The new behavior is only applied if there is no
explicit {ref}/index-modules.html#dynamic-index-settings[refresh interval set],
so set the refresh interval explicitly for any indices on which you prefer the
old behavior.
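
The decision described above can be sketched as a small function. This is a simplified model under stated assumptions, not the Elasticsearch implementation: the function and parameter names are invented for illustration, and times are plain seconds.

```python
def should_refresh(now_s, last_search_s, idle_after_s=30.0, explicit_interval=False):
    """Return True if a scheduled refresh should run for this shard."""
    if explicit_interval:
        # An explicitly set refresh interval keeps the pre-7.0 behavior.
        return True
    # Skip the refresh once the shard has seen no searches for idle_after_s.
    return (now_s - last_search_s) < idle_after_s


active = should_refresh(now_s=100.0, last_search_s=90.0)
idle = should_refresh(now_s=100.0, last_search_s=10.0)
pinned = should_refresh(now_s=100.0, last_search_s=10.0, explicit_interval=True)
```

The shard searched ten seconds ago keeps refreshing, the one last searched ninety seconds ago skips refreshes, and an index with an explicit refresh interval refreshes regardless.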
//end::notable-highlights[]
//tag::notable-highlights[]
[float]
=== Default to one shard
One of the biggest sources of trouble we've seen over the years from our users
has been over-sharding, and defaults play a big role in that. In Elasticsearch
6.x and prior, each index defaulted to five shards. If you had one
daily index for ten different applications and each had the default of five
shards, you were creating fifty shards per day and it wasn't long before you had
thousands of shards even if you were only indexing a few gigabytes of data per
day. Index Lifecycle Management was a first step to help with this: providing
native rollover functions to create indexes by size instead of (just) by day and
built-in shrink functionality to shrink the number of shards per
index. Defaulting indices to one shard is the next step in helping to reduce
over-sharding. Of course, if you have another preferred primary shard count, you
can set it via the index settings.
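
To make the arithmetic concrete, the sketch below counts primary shards for the daily-index scenario above over an assumed 30-day window, and shows a create-index body that overrides the default. The helper name and the shard count of 3 are illustrative choices only.

```python
def total_shards(indices_per_day, shards_per_index, days):
    """Primary shards created by daily indices over a number of days."""
    return indices_per_day * shards_per_index * days


old_default = total_shards(indices_per_day=10, shards_per_index=5, days=30)
new_default = total_shards(indices_per_day=10, shards_per_index=1, days=30)

# Body for a create-index request that asks for a non-default shard count.
create_index_body = {"settings": {"index": {"number_of_shards": 3}}}
```

With the old default, ten daily indices add up to 1,500 primary shards in a month; with the new default, the same workload creates 300.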
//end::notable-highlights[]
//tag::notable-highlights[]
[float]
=== Lucene 8
As with every major release, we look to support the latest major version of
Lucene, along with all the goodness that comes with it. That includes all the
developments that we contributed to the new Lucene version. Elasticsearch 7.0
bundles Lucene 8, which is the latest version of Lucene. Lucene version 8 serves
as the foundation for many functional improvements in the rest of Elasticsearch,
including improved search performance for top-k queries and better ways to
combine relevance signals for your searches while still maintaining speed.
//end::notable-highlights[]
//tag::notable-highlights[]
[float]
=== Introduce the ability to minimize round-trips in {ccs}
In Elasticsearch 5.3, we released a feature called
{ref}/modules-cross-cluster-search.html[{ccs}] for users to query across multiple
clusters. We've since improved on the {ccs} framework, adding features to
ultimately use it to deprecate and replace tribe nodes as a way to federate
queries. In Elasticsearch 7.0, we're adding a new execution mode for {ccs}: one
that avoids round-trips when they aren't necessary. This mode
(`ccs_minimize_roundtrips`) can result in faster searches when the {ccs} query
spans high-latency links (e.g., across a WAN).
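
As a minimal sketch, the helper below builds the path of a search request that sets `ccs_minimize_roundtrips` as a query parameter. The cluster aliases and index pattern are hypothetical, and the helper itself is not part of any client library.

```python
from urllib.parse import urlencode


def ccs_search_path(index_patterns, minimize_roundtrips=True):
    """Build a _search path that targets indices on remote clusters."""
    target = ",".join(index_patterns)
    query = urlencode({"ccs_minimize_roundtrips": str(minimize_roundtrips).lower()})
    return f"/{target}/_search?{query}"


# "cluster_one" and "cluster_two" are assumed remote cluster aliases.
path = ccs_search_path(["cluster_one:logs-*", "cluster_two:logs-*"])
```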
//end::notable-highlights[]
//tag::notable-highlights[]
[float]
=== New cluster coordination implementation
Since the beginning, we focused on making Elasticsearch easy to scale and
resilient to catastrophic failures. To support these requirements, we created a
pluggable cluster coordination system, with the default implementation known as
Zen Discovery. Zen Discovery was meant to be effortless, and give our users
peace of mind (as the name implies). The meteoric rise in Elasticsearch usage
has taught us a great deal. For instance, Zen's `minimum_master_nodes` setting
was often misconfigured, which put clusters at a greater risk of split brains
and losing data. Maintaining this setting across large and dynamically resizing
clusters was also difficult.
In Elasticsearch 7.0, we have completely rethought and rebuilt the cluster
coordination layer. The new implementation gives safe sub-second master election
times, where Zen may have taken several seconds to elect a new master, valuable
time for a mission-critical deployment. With the `minimum_master_nodes` setting
removed, growing and shrinking clusters becomes safer and easier, with
much less room to misconfigure the system. Most importantly, the new cluster
coordination layer gives us strong building blocks for the future of
Elasticsearch, ensuring we can build functionality for even more advanced
use-cases to come.
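
For context on why the old setting was easy to get wrong, the sketch below shows the majority rule that `minimum_master_nodes` had to follow by hand in Zen Discovery, and how a stale value becomes unsafe when a cluster grows. The function names are made up for this illustration.

```python
def quorum(master_eligible_nodes):
    """Smallest majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def is_safe(minimum_master_nodes, master_eligible_nodes):
    """A value below the quorum allows two halves to each elect a master."""
    return minimum_master_nodes >= quorum(master_eligible_nodes)


# A value of 1 is fine for a single node but unsafe once the cluster grows to three.
single_node_ok = is_safe(minimum_master_nodes=1, master_eligible_nodes=1)
grown_cluster_ok = is_safe(minimum_master_nodes=1, master_eligible_nodes=3)
```

In 7.0 this bookkeeping is no longer left to the operator, which is what removes this class of misconfiguration.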
//end::notable-highlights[]
//tag::notable-highlights[]
[float]
=== Better support for small heaps (the real-memory circuit breaker)
Elasticsearch 7.0 adds an all-new {ref}/circuit-breaker.html[circuit breaker]
that keeps track of the total memory used by the JVM and will reject requests if
they would cause the reserved plus actual heap usage to exceed 95%. We've also
changed the default maximum number of buckets returned by an aggregation
(`search.max_buckets`) to 10,000; it was unbounded by default in 6.x and
prior. Together, these two changes should significantly improve the out-of-memory
protection of Elasticsearch in 7.x, helping you keep your cluster alive even in
the face of adversarial or novice users running large queries and aggregations.
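
The check described above can be sketched as follows. This is a simplified model of the rule stated in the text (reserved plus actual usage compared against 95% of the heap), with invented names and example sizes; it is not the breaker's real code.

```python
MIB = 1024 * 1024


def would_trip(used_bytes, reserved_bytes, max_heap_bytes, limit=0.95):
    """True if actual usage plus the new reservation exceeds the limit."""
    return used_bytes + reserved_bytes > limit * max_heap_bytes


heap = 1024 * MIB  # a small 1 GiB heap, chosen for the example
small_request = would_trip(used_bytes=900 * MIB, reserved_bytes=50 * MIB, max_heap_bytes=heap)
large_request = would_trip(used_bytes=900 * MIB, reserved_bytes=100 * MIB, max_heap_bytes=heap)
```

The 50 MiB request fits under the 95% line and proceeds, while the 100 MiB request would be rejected instead of risking an out-of-memory error.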
//end::notable-highlights[]
//tag::notable-highlights[]
[float]
=== {ccr-cap} is production-ready
We introduced {ccr-cap} as a beta feature in Elasticsearch
6.5. {ccr-cap} was one of the most heavily requested features for Elasticsearch. We're
excited to announce {ccr-cap} is now generally available and ready for production use
in Elasticsearch 6.7 and 7.0! {ccr-cap} has a variety of use cases, including
cross-datacenter and cross-region replication, replicating data to get closer to
the application server and user, and maintaining a centralized reporting cluster
replicated from a large number of smaller clusters.
In addition to maturing to a GA feature, there were a number of important
technical advancements in CCR for 6.7 and 7.0. Previous versions of {ccr-cap} required
replication to start on new indices only: existing indices could not be
replicated. {ccr-cap} can now start replicating existing indices that have soft
deletes enabled in 6.7 and 7.0, and new indices default to having soft deletes
enabled. We also introduced new technology to prevent a follower index from
falling fatally far behind its leader index. We've added a management UI in
Kibana for configuring remote clusters, indices to replicate, and index naming
patterns for automatic replication (e.g. for replicating `metricbeat-*`
indices). We've also added a monitoring UI for insight into {ccr} progress and
alerting on errors. Check out the Getting started with {ccr}
guide, or visit the reference documentation to learn more.
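
The same automatic replication can also be configured through the API. The sketch below builds a request body for an auto-follow pattern that covers `metricbeat-*` indices; the pattern name, remote cluster alias, and follower naming scheme are assumptions for illustration.

```python
import json

# Body for an auto-follow pattern; "remote_monitoring" is a hypothetical
# remote cluster alias that would have to be configured beforehand.
auto_follow_body = {
    "remote_cluster": "remote_monitoring",
    "leader_index_patterns": ["metricbeat-*"],
    "follow_index_pattern": "{{leader_index}}-copy",
}

request_line = "PUT /_ccr/auto_follow/metricbeat"  # pattern name is an example
payload = json.dumps(auto_follow_body, indent=2)
```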
//end::notable-highlights[]
//tag::notable-highlights[]
[float]
=== {ilm-cap} is production-ready
Index Lifecycle Management (ILM) was
https://www.elastic.co/blog/elastic-stack-6-6-0-released[released] as a beta
feature in Elasticsearch 6.6. We've officially moved ILM out of beta and into
GA, ready for production usage! ILM makes it easy to manage the lifecycle of
data in Elasticsearch, including how data progresses between
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/ilm-policy-definition.html[hot, warm, cold, and deletion phases].
Specific rules regarding how data moves through these phases can be created via
APIs in Elasticsearch, or a beautiful management UI in Kibana.
In Elasticsearch 6.7 and 7.0, ILM can now manage frozen indices. Frozen indices
are valuable for long term data storage in Elasticsearch, and require a smaller
amount of memory (heap) in relation to the amount of data managed by a node. In
6.7 and 7.0, indices can now be
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/_actions.html[frozen]
as part of the cold phase in ILM. In addition, ILM now works
directly with Cross-Cluster Replication (CCR), which also became GA in the
Elasticsearch 6.7 and 7.0 releases. The potential actions available in each ILM
phase can be found in the
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/_actions.html[documentation].
ILM is free to use and part of the default distribution of Elasticsearch.
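
As a minimal sketch of what such a policy might look like, the body below rolls over in the hot phase, freezes in the cold phase, and deletes at the end. The thresholds and ages are arbitrary example values, not recommendations.

```python
# Example body for a lifecycle policy (sent with PUT /_ilm/policy/<name>).
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {"actions": {"rollover": {"max_size": "50gb", "max_age": "1d"}}},
            "cold": {"min_age": "30d", "actions": {"freeze": {}}},
            "delete": {"min_age": "90d", "actions": {"delete": {}}},
        }
    }
}

phase_names = list(ilm_policy["policy"]["phases"])
```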
//end::notable-highlights[]
//tag::notable-highlights[]
[float]
=== SQL is production-ready
The SQL interface to Elasticsearch is now GA.
https://www.elastic.co/blog/elasticsearch-6-3-0-released[Introduced in 6.3] as
an alpha release, the SQL interface allows developers and data scientists
familiar with SQL to use the speed, scalability, and full-text power of
Elasticsearch that others know and love. It also allows BI tools using SQL to
easily access data in Elasticsearch. In addition to approving SQL access as a GA
feature in Elasticsearch, we've designated our
https://www.elastic.co/downloads/jdbc-client[JDBC] and
https://www.elastic.co/downloads/odbc-client[ODBC] drivers as GA. There are four
methods to access Elasticsearch SQL: through the
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/sql-rest.html[Elasticsearch
REST endpoints], the
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/sql-cli.html[Elasticsearch
SQL command line interface], the
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/sql-jdbc.html[JDBC
driver], and the
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/sql-odbc.html[ODBC
driver].
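
For the REST endpoint, a request can be sketched as below. The helper is hypothetical and only assembles the method, path, and JSON body; the `library` index and its `author` field are assumed for the example.

```python
import json


def sql_request(query, fmt="txt"):
    """Assemble method, path, and body for an Elasticsearch SQL REST call."""
    request_path = f"/_sql?format={fmt}"
    body = json.dumps({"query": query})
    return "POST", request_path, body


method, request_path, body = sql_request(
    "SELECT author, COUNT(*) FROM library GROUP BY author"
)
```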
//end::notable-highlights[]
//tag::notable-highlights[]
[float]
=== High-level REST client is feature-complete
If you've been following our
https://www.elastic.co/blog/the-elasticsearch-java-high-level-rest-client-is-out[blog]
or our https://github.com/elastic/elasticsearch/issues/27205[GitHub repository],
you may be aware of a task we've been working on for quite a while now: creating
a next-generation Java client for accessing an Elasticsearch cluster. We
started off by working on the most commonly-used features like search and
aggregations, and have been working our way through administrative and
monitoring APIs. Many of you who use Java are already using this new client,
but for those that are still using the TransportClient, now is a great time to
upgrade to our High Level REST Client, or HLRC.
As of 7.0.0, the HLRC has all the API checkboxes checked to call it
“complete”, so those of you still using the TransportClient should be able to
migrate. We'll of course continue to develop our REST APIs and will add them to
this client as we go. For a list of all of the APIs that are available, have a
look at our
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.0/java-rest-high.html[HLRC
documentation]. To get started, have a look at the
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.0/java-rest-high-getting-started.html[getting
started with the HLRC] section of our docs and if you need help migrating from
the TransportClient, have a look at our
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.0/java-rest-high-level-migration.html[migration
guide].
//end::notable-highlights[]
//tag::notable-highlights[]
[float]
=== Support nanosecond timestamps
Up until 7.0, Elasticsearch could only store timestamps with millisecond
precision. If you wanted to process events that occur at a higher rate -- for
example if you want to store and analyze tracing or network packet data in
Elasticsearch -- you may want higher precision. Historically, we have used the
https://www.joda.org/joda-time/[Joda time library] to handle dates and times,
and Joda lacked support for such high precision timestamps.
With JDK 8, an official Java time API was introduced that can also handle
nanosecond-precision timestamps. Over the past year, we've been working to
migrate our Joda time usage to the native Java time API while maintaining
backwards compatibility. As of 7.0.0, you can make use of these nanosecond
timestamps via a dedicated
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/date_nanos.html[date_nanos
field mapper]. Note that aggregations are still on a millisecond resolution
with this field to avoid having an explosion of buckets.
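
The sketch below shows a mapping that uses the new field type, plus a small calculation of why millisecond precision is not enough for closely spaced events. The field name and the two sample timestamps are invented for the example.

```python
# Mapping with a nanosecond-precision date field ("timestamp" is an example name).
mapping = {"mappings": {"properties": {"timestamp": {"type": "date_nanos"}}}}


def truncate_to_millis(epoch_nanos):
    """Drop everything below millisecond resolution."""
    return epoch_nanos // 1_000_000


# Two events 333 nanoseconds apart, expressed as nanoseconds since the epoch.
first = 1_554_000_000_000_000_123
second = 1_554_000_000_000_000_456
distinct_in_nanos = first != second
same_in_millis = truncate_to_millis(first) == truncate_to_millis(second)
```

At millisecond precision the two events collapse onto the same timestamp, which is the limitation the `date_nanos` type removes for storage and sorting.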
//end::notable-highlights[]
//tag::notable-highlights[]
[float]
=== Faster retrieval of top hits
When it comes to search, query performance is a key feature. We have achieved a
significant improvement to search performance in Elasticsearch 7.0 for
situations in which the exact hit count is not needed and it is sufficient to
set a lower bound on the number of results. For example, if your users
typically just look at the first page of results on your site and don't care
about exactly how many documents matched, you may be able to show them “more
than 10,000 hits” and then provide them with paginated results. It's quite
common to have users enter frequently-occurring terms like “the” and “a” in
their queries, which has historically forced Elasticsearch to score a lot of
documents even when those frequent terms couldn't possibly add much to the
score.
In these conditions Elasticsearch can now skip calculating scores for records
that are identified at an early stage as records that will not be ranked at the
top of the result set. This can significantly improve the query speed. The
actual number of top results that are scored is
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/search-request-track-total-hits.html[configurable],
but the default is 10,000. The behavior of queries whose result set
is smaller than this threshold will not change: the result count is
accurate, but there is no performance improvement for queries that match a small
number of documents. Because the improvement is based on skipping low-ranking
records, it does not apply to aggregations. You can read more about this
powerful algorithmic development in our blog post
https://www.elastic.co/blog/faster-retrieval-of-top-hits-in-elasticsearch-with-block-max-wand[Magic
WAND: Faster Retrieval of Top Hits in Elasticsearch].
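
On the client side, this shows up as a total that is either exact or a lower bound. The sketch below sets the threshold in a search body and turns the reported total into display text; the query, field name, and helper function are assumptions for illustration.

```python
# Search body that tracks hit counts accurately only up to 10,000.
search_body = {
    "track_total_hits": 10000,
    "query": {"match": {"body": "the elk"}},
}


def describe_total(total):
    """Format the hits.total object of a 7.0 search response for display."""
    count = f"{total['value']:,}"
    if total["relation"] == "gte":
        # The count is a lower bound, not an exact number.
        return f"more than {count} hits"
    return f"{count} hits"


capped = describe_total({"value": 10000, "relation": "gte"})
exact = describe_total({"value": 42, "relation": "eq"})
```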
//end::notable-highlights[]
//tag::notable-highlights[]
[float]
=== Support for TLS 1.3
Elasticsearch has supported encrypted communications for a long time. However,
we recently started https://www.elastic.co/support/matrix#matrix_jvm[supporting
JDK 11], which gives us new capabilities. JDK 11 has TLSv1.3 support, so
starting with 7.0, we're supporting TLSv1.3 within Elasticsearch for those
of you running JDK 11. To keep new users from inadvertently running
with low security, we've also dropped TLSv1.0 from our defaults. For those
running older versions of Java, we have default options of TLSv1.2 and
TLSv1.1. Have a look at our
https://www.elastic.co/guide/en/elastic-stack-overview/7.0/ssl-tls.html[TLS
setup instructions] if you need help getting started.
//end::notable-highlights[]
//tag::notable-highlights[]
[float]
=== Bundle JDK in Elasticsearch distribution
One of the more prominent "getting started hurdles" we've seen users run into
has been not knowing that Elasticsearch is a Java application and that they need
to install one of the supported JDKs first. With 7.0, we're now bundling a
distribution of OpenJDK to help users get started with Elasticsearch even
faster. We understand that some users have preferred JDK distributions, so you
can still bring your own JDK by
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/setup.html#jvm-version[setting
JAVA_HOME] before starting Elasticsearch.
//end::notable-highlights[]
//tag::notable-highlights[]
[float]
=== Rank features
Elasticsearch 7.0 has several new field types to get the most out of your data.
Two to help with core search use cases are
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/rank-feature.html[`rank_feature`]
and
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/rank-features.html[`rank_features`].
These can be used to boost documents based on numeric or categorical values
while still maintaining the performance of the new fast top hits query
capabilities. For more information on these fields and how to use them, read our
https://www.elastic.co/blog/easier-relevance-tuning-elasticsearch-7-0[blog
post].
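
A minimal sketch of how the two field types might be used together is shown below. The field names (`pagerank`, `topics`, `content`) and the `topics.sports` feature are examples, not part of any required schema.

```python
# Mapping with one single-valued feature and one sparse feature map.
mapping = {
    "mappings": {
        "properties": {
            "pagerank": {"type": "rank_feature"},
            "topics": {"type": "rank_features"},
        }
    }
}

# Text match that must hold, with feature-based boosts in "should" clauses.
query = {
    "query": {
        "bool": {
            "must": [{"match": {"content": "elasticsearch"}}],
            "should": [
                {"rank_feature": {"field": "pagerank"}},
                {"rank_feature": {"field": "topics.sports"}},
            ],
        }
    }
}

boost_fields = [clause["rank_feature"]["field"] for clause in query["query"]["bool"]["should"]]
```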
//end::notable-highlights[]
//tag::notable-highlights[]
[float]
=== JSON logging
JSON logging is now enabled in Elasticsearch in addition to plaintext
logs. Starting in 7.0, you will find new files with `.json` extensions in your
log directory. This means you can now use filtering tools like
https://stedolan.github.io/jq/[`jq`] to pretty print and process your logs in a
much more structured manner. You can also expect to find additional information
like `node.id`, `cluster.uuid`, `type` (and more) in each log line. The `type`
field in each JSON log line lets you distinguish log streams when
running on Docker.
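
Besides `jq`, any JSON-aware tool can process these files. The sketch below separates log streams by the `type` field; the two sample lines are invented and only include the fields named above plus an assumed `level` and `message`.

```python
import json

# Hypothetical log lines; real lines carry more fields.
sample_lines = [
    '{"type": "server", "level": "INFO", "node.id": "n1", "cluster.uuid": "c1", "message": "started"}',
    '{"type": "deprecation", "level": "WARN", "node.id": "n1", "cluster.uuid": "c1", "message": "old setting"}',
]


def filter_by_type(lines, log_type):
    """Parse one JSON object per line and keep records of the given type."""
    records = (json.loads(line) for line in lines)
    return [record for record in records if record.get("type") == log_type]


server_logs = filter_by_type(sample_lines, "server")
```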
//end::notable-highlights[]
//tag::notable-highlights[]
[float]
=== Script score query (aka function score 2.0)
With 7.0, we are introducing the
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/query-dsl-script-score-query.html[next
generation of our function score capability]. The new `script_score` query
provides a simpler and more flexible way to generate a ranking score per
record. The `script_score` query is constructed of a set of functions, including
arithmetic and distance functions, which the user can mix and match to construct
arbitrary function score calculations. The modular structure is simpler to use
and will open this important functionality to additional users.
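
As a minimal sketch, the request body below wraps a match query in `script_score` and scales the relevance score by a numeric field. The `message` and `likes` fields and the particular formula are assumptions; the small Python function only mirrors the script's arithmetic for local experimentation.

```python
import math

# Request body: score = text relevance scaled by the log of a "likes" field.
script_score_body = {
    "query": {
        "script_score": {
            "query": {"match": {"message": "elasticsearch"}},
            "script": {"source": "_score * Math.log(2 + doc['likes'].value)"},
        }
    }
}


def local_score(base_score, likes):
    """Same arithmetic as the script above, evaluated in Python."""
    return base_score * math.log(2 + likes)


boosted = local_score(base_score=1.5, likes=98)
```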
//end::notable-highlights[]