From edb0d42b41f21838a0d142dac4f64b91e4055b88 Mon Sep 17 00:00:00 2001
From: lcawl
Date: Mon, 1 Apr 2019 10:34:51 -0700
Subject: [PATCH] [DOCS] Add notable release highlights for 7.0

---
 .../release-notes/highlights-7.0.0.asciidoc | 366 +++++++++++++++++-
 1 file changed, 365 insertions(+), 1 deletion(-)

diff --git a/docs/reference/release-notes/highlights-7.0.0.asciidoc b/docs/reference/release-notes/highlights-7.0.0.asciidoc
index d01d543c825..c48bf85c655 100644
--- a/docs/reference/release-notes/highlights-7.0.0.asciidoc
+++ b/docs/reference/release-notes/highlights-7.0.0.asciidoc
@@ -6,4 +6,368 @@ coming[7.0.0]
-See also <> and <>.
+//NOTE: The notable-highlights tagged regions are re-used in the
+//Installation and Upgrade Guide
+
+//tag::notable-highlights[]
+[float]
+=== Adaptive replica selection enabled by default
+
+In Elasticsearch 6.x and prior, a series of search requests to the same shard
+would be forwarded to the primary and each replica in round-robin fashion. This
+could prove problematic if one node started a long garbage collection: search
+requests would still be forwarded to the slow node regardless, which increased
+search latency.
+
+In 6.1, we added an experimental
+{ref}/search.html#search-adaptive-replica[adaptive replica selection] feature.
+Each node tracks and compares how long search requests to
+other nodes take, and uses this information to adjust how frequently to send
+requests to shards on particular nodes. In our benchmarks, this results in an
+overall improvement in search throughput and reduced 99th percentile latencies.
+
+This option was disabled by default throughout 6.x, but we’ve heard feedback
+from users who have found the setting to be very beneficial, so we’ve
+turned it on by default starting in Elasticsearch 7.0.0.
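The routing idea can be illustrated with a toy selector. This is a simplified sketch, not Elasticsearch's implementation: the real feature ranks shard copies with a formula adapted from the C3 replica-ranking algorithm, which also weighs queue sizes and service times. The hypothetical `ReplicaSelector` below only keeps an exponentially weighted moving average (EWMA) of each node's response times and prefers the lowest. The smoothing factor `alpha` is an arbitrary assumption.

```python
from typing import Dict, Iterable


class ReplicaSelector:
    """Toy adaptive replica selector (hypothetical; not Elasticsearch's code).

    Keeps an EWMA of response times per node and prefers the node with the
    lowest average instead of rotating round-robin.
    """

    def __init__(self, alpha: float = 0.3) -> None:
        self.alpha = alpha  # weight given to the newest observation
        self.ewma_ms: Dict[str, float] = {}

    def record(self, node: str, response_ms: float) -> None:
        # Blend the new observation into the running average for this node.
        previous = self.ewma_ms.get(node)
        if previous is None:
            self.ewma_ms[node] = response_ms
        else:
            self.ewma_ms[node] = self.alpha * response_ms + (1 - self.alpha) * previous

    def choose(self, candidates: Iterable[str]) -> str:
        # Nodes with no observations score 0 so they get tried at least once.
        return min(candidates, key=lambda node: self.ewma_ms.get(node, 0.0))


selector = ReplicaSelector()
for _ in range(5):
    selector.record("node-a", 12.0)   # healthy copy
    selector.record("node-b", 450.0)  # copy stuck in a long GC pause
print(selector.choose(["node-a", "node-b"]))  # node-a
```

With round-robin routing, half of these requests would still land on `node-b`. If the old behavior is ever needed, the dynamic cluster setting `cluster.routing.use_adaptive_replica_selection` can be set to `false`.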
+//end::notable-highlights[]
+
+//tag::notable-highlights[]
+[float]
+=== Skip shard refreshes if a shard is "search idle"
+
+Elasticsearch 6.x and prior {ref}/indices-refresh.html[refreshed] indices
+automatically in the background, by default every second. This provides the
+“near real-time” search capabilities Elasticsearch is known for: results are
+available for search requests within one second after they'd been added, by
+default. However, this behavior has a significant impact on indexing performance
+if the refreshes are not needed (e.g., if Elasticsearch isn’t servicing any
+active searches).
+
+Elasticsearch 7.0 handles this more intelligently by introducing the
+notion of a shard being "search idle". A shard now transitions to being search
+idle after it hasn't had any searches for
+{ref}/index-modules.html#dynamic-index-settings[thirty seconds], by default.
+Once a shard is search idle, all scheduled refreshes are
+skipped until a search comes through, which triggers the next scheduled
+refresh. We expect this to significantly increase the indexing
+throughput for many users. The new behavior is only applied if there is no
+explicit {ref}/index-modules.html#dynamic-index-settings[refresh interval set],
+so set the refresh
+interval explicitly for any indices on which you prefer the old behavior.
+//end::notable-highlights[]
+
+//tag::notable-highlights[]
+[float]
+=== Default to one shard
+
+One of the biggest sources of trouble we’ve seen over the years from our users
+has been over-sharding, and defaults play a big role in that. In Elasticsearch
+6.x and prior, we defaulted to five primary shards per index. If you had one
+daily index for ten different applications and each had the default of five
+shards, you were creating fifty shards per day and it wasn't long before you had
+thousands of shards even if you were only indexing a few gigabytes of data per
+day.
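The arithmetic behind this over-sharding scenario is easy to make concrete. The sketch counts primary shards only (each primary also gets one replica by default, which would double every total), and the one-year retention period is an assumption added for illustration; `primary_shards_created` is a hypothetical helper.

```python
def primary_shards_created(applications: int, shards_per_index: int, days: int) -> int:
    """Primary shards created when each application writes to one new index per day."""
    return applications * shards_per_index * days


# Ten applications with one daily index each, as in the scenario above.
per_day_6x = primary_shards_created(applications=10, shards_per_index=5, days=1)
per_day_7x = primary_shards_created(applications=10, shards_per_index=1, days=1)
print(per_day_6x, per_day_7x)  # 50 10

# Keeping a year of daily indices (hypothetical retention period).
print(primary_shards_created(10, 5, 365), primary_shards_created(10, 1, 365))  # 18250 3650
```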
+
+Index Lifecycle Management was a first step to help with this: it provides
+native rollover functions to create indices by size instead of (just) by day and
+built-in shrink functionality to reduce the number of shards per
+index. Defaulting indices to one shard is the next step in helping to reduce
+over-sharding. Of course, if you have another preferred primary shard count, you
+can set it via the index settings.
+//end::notable-highlights[]
+
+//tag::notable-highlights[]
+[float]
+=== Lucene 8
+
+As with every major release, we look to support the latest major version of
+Lucene, along with all the goodness that comes with it. That includes all the
+developments that we contributed to the new Lucene version. Elasticsearch 7.0
+bundles Lucene 8, which is the latest version of Lucene. Lucene version 8 serves
+as the foundation for many functional improvements in the rest of Elasticsearch,
+including improved search performance for top-k queries and better ways to
+combine relevance signals for your searches while still maintaining speed.
+//end::notable-highlights[]
+
+//tag::notable-highlights[]
+[float]
+=== Introduce the ability to minimize round-trips in {ccs}
+
+In Elasticsearch 5.3, we released a feature called
+{ref}/modules-cross-cluster-search.html[{ccs}] for users to query across multiple
+clusters. We’ve since improved on the {ccs} framework, adding features to
+ultimately use it to deprecate and replace tribe nodes as a way to federate
+queries. In Elasticsearch 7.0, we’re adding a new execution mode for {ccs}: one
+that avoids unnecessary round-trips. This mode
+(`ccs_minimize_roundtrips`) can result in faster searches when the {ccs} query
+spans high-latency links (e.g., across a WAN).
+//end::notable-highlights[]
+
+//tag::notable-highlights[]
+[float]
+=== New cluster coordination implementation
+
+Since the beginning, we have focused on making Elasticsearch easy to scale and
+resilient to catastrophic failures.
+To support these requirements, we created a
+pluggable cluster coordination system, with the default implementation known as
+Zen Discovery. Zen Discovery was meant to be effortless and to give our users
+peace of mind (as the name implies). The meteoric rise in Elasticsearch usage
+has taught us a great deal. For instance, Zen's `minimum_master_nodes` setting
+was often misconfigured, which put clusters at a greater risk of split brains
+and losing data. Maintaining this setting across large and dynamically resizing
+clusters was also difficult.
+
+In Elasticsearch 7.0, we have completely rethought and rebuilt the cluster
+coordination layer. The new implementation provides safe sub-second master
+elections, whereas Zen could take several seconds to elect a new master:
+valuable time for a mission-critical deployment. With the `minimum_master_nodes`
+setting removed, growing and shrinking clusters is safer and easier, and there
+is much less room to misconfigure the system. Most importantly, the new cluster
+coordination layer gives us strong building blocks for the future of
+Elasticsearch, ensuring we can build functionality for even more advanced
+use cases to come.
+//end::notable-highlights[]
+
+//tag::notable-highlights[]
+[float]
+=== Better support for small heaps (the real-memory circuit breaker)
+
+Elasticsearch 7.0 adds an all-new {ref}/circuit-breaker.html[circuit breaker]
+that keeps track of the total memory used by the JVM and rejects requests if
+they would cause the reserved plus actual heap usage to exceed 95%. We've also
+changed the default maximum number of buckets an aggregation can return
+(`search.max_buckets`) to 10,000; it was unbounded by default in 6.x and
+prior. Together, these two changes should significantly improve the out-of-memory
+protection of Elasticsearch in 7.x, helping you keep your cluster alive even in
+the face of adversarial or novice users running large queries and aggregations.
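The two limits can be sketched as simple admission checks. The functions below are hypothetical and only mirror the numbers named above (a 95% real-memory threshold and a 10,000-bucket cap). The real breaker runs inside the JVM, measures heap usage directly, and takes its threshold from the `indices.breaker.total.limit` setting.

```python
def would_trip(current_heap_bytes: int, reserved_bytes: int, max_heap_bytes: int,
               limit_ratio: float = 0.95) -> bool:
    """True if admitting `reserved_bytes` more would push heap usage past the limit."""
    return current_heap_bytes + reserved_bytes > limit_ratio * max_heap_bytes


def check_buckets(requested_buckets: int, max_buckets: int = 10_000) -> None:
    """Reject aggregations that would create more buckets than allowed.

    Elasticsearch fails such a search with an error; ValueError stands in for it.
    """
    if requested_buckets > max_buckets:
        raise ValueError(f"too many buckets: {requested_buckets} > {max_buckets}")


MIB = 1024 ** 2
heap = 1024 * MIB  # a small 1 GiB heap
print(would_trip(900 * MIB, 50 * MIB, heap))  # False: 950 MiB is under 972.8 MiB
print(would_trip(950 * MIB, 80 * MIB, heap))  # True: 1030 MiB is over 972.8 MiB
```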
+//end::notable-highlights[]
+
+//tag::notable-highlights[]
+[float]
+=== {ccr-cap} is production-ready
+
+We introduced {ccr-cap} as a beta feature in Elasticsearch
+6.5. {ccr-cap} was one of the most heavily requested features for Elasticsearch. We're
+excited to announce {ccr-cap} is now generally available and ready for production use
+in Elasticsearch 6.7 and 7.0! {ccr-cap} has a variety of use cases, including
+cross-datacenter and cross-region replication, replicating data to get closer to
+the application server and user, and maintaining a centralized reporting cluster
+replicated from a large number of smaller clusters.
+
+In addition to maturing to a GA feature, {ccr} gained a number of important
+technical advancements in 6.7 and 7.0. Previous versions of {ccr-cap} required
+replication to start on new indices only: existing indices could not be
+replicated. In 6.7 and 7.0, {ccr} can also replicate existing indices that have
+soft deletes enabled, and new indices default to having soft deletes
+enabled. We also introduced new technology to prevent a follower index from
+falling fatally far behind its leader index. We’ve added a management UI in
+Kibana for configuring remote clusters, indices to replicate, and index naming
+patterns for automatic replication (e.g., for replicating `metricbeat-*`
+indices). We've also added a monitoring UI for insight into {ccr} progress and
+alerting on errors. Check out the Getting started with {ccr}
+guide, or visit the reference documentation to learn more.
+//end::notable-highlights[]
+
+//tag::notable-highlights[]
+[float]
+=== {ilm-cap} is production-ready
+
+Index Lifecycle Management (ILM) was
+https://www.elastic.co/blog/elastic-stack-6-6-0-released[released] as a beta
+feature in Elasticsearch 6.6. We’ve officially moved ILM out of beta and into
+GA, ready for production usage!
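ILM policies are JSON documents stored through the ILM APIs. The sketch below builds a body for `PUT _ilm/policy/<name>` with one action per phase. The helper names, the `50gb` rollover size and the 7/30/90-day ages are illustrative assumptions, and the cold phase uses the `freeze` action.

```python
import json


def days(age: str) -> int:
    """Parse ages like '30d' (only the day unit is handled in this sketch)."""
    if not age.endswith("d"):
        raise ValueError(f"unsupported age {age!r}")
    return int(age[:-1])


def lifecycle_policy(rollover_size: str, warm_after: str, cold_after: str,
                     delete_after: str) -> dict:
    """Build a hot/warm/cold/delete policy body (all thresholds are examples)."""
    ages = [days(a) for a in (warm_after, cold_after, delete_after)]
    if ages != sorted(ages):
        raise ValueError("phase ages must not decrease from warm to delete")
    return {
        "policy": {
            "phases": {
                "hot": {"actions": {"rollover": {"max_size": rollover_size}}},
                "warm": {"min_age": warm_after,
                         "actions": {"shrink": {"number_of_shards": 1}}},
                "cold": {"min_age": cold_after, "actions": {"freeze": {}}},
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


body = lifecycle_policy("50gb", "7d", "30d", "90d")
print(json.dumps(body, indent=2))
```

For the `rollover` action to work, indices governed by the policy also need a rollover alias configured through `index.lifecycle.rollover_alias`.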
+
+ILM makes it easy to manage the lifecycle of
+data in Elasticsearch, including how data progresses between
+https://www.elastic.co/guide/en/elasticsearch/reference/7.0/ilm-policy-definition.html[hot, warm, cold, and deletion phases].
+Specific rules regarding how data moves through these phases can be created via
+Elasticsearch APIs or a beautiful management UI in Kibana.
+
+In Elasticsearch 6.7 and 7.0, ILM can now manage frozen indices. Frozen indices
+are valuable for long-term data storage in Elasticsearch, and require a smaller
+amount of memory (heap) in relation to the amount of data managed by a node.
+Indices can now be
+https://www.elastic.co/guide/en/elasticsearch/reference/7.0/_actions.html[frozen]
+as part of the cold phase in ILM. In addition, ILM now works
+directly with Cross-Cluster Replication (CCR), which also GA’d in the
+Elasticsearch 6.7 and 7.0 releases. The potential actions available in each ILM
+phase can be found in the
+https://www.elastic.co/guide/en/elasticsearch/reference/7.0/_actions.html[documentation].
+ILM is free to use and part of the default distribution of Elasticsearch.
+//end::notable-highlights[]
+
+//tag::notable-highlights[]
+[float]
+=== SQL is production-ready
+
+The SQL interface to Elasticsearch is now GA.
+https://www.elastic.co/blog/elasticsearch-6-3-0-released[Introduced in 6.3] as
+an alpha release, the SQL interface allows developers and data scientists
+familiar with SQL to use the speed, scalability, and full-text power of
+Elasticsearch that others know and love. It also allows BI tools using SQL to
+easily access data in Elasticsearch. In addition to making SQL access a GA
+feature in Elasticsearch, we’ve designated our
+https://www.elastic.co/downloads/jdbc-client[JDBC] and
+https://www.elastic.co/downloads/odbc-client[ODBC] drivers as GA.
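Besides the drivers, SQL queries can be sent over plain HTTP. A minimal sketch using only Python's standard library prepares a `POST /_sql?format=txt` request without sending it. The cluster address, the `library` index and its fields are assumptions, and security (TLS, credentials) is left out.

```python
import json
import urllib.request


def sql_request(base_url: str, query: str, fmt: str = "txt") -> urllib.request.Request:
    """Prepare (but do not send) a POST to the _sql endpoint."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_sql?format={fmt}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = sql_request(
    "http://localhost:9200",  # assumed local, unsecured cluster
    "SELECT author, page_count FROM library WHERE page_count > 500 "
    "ORDER BY page_count DESC LIMIT 5",
)
print(req.get_method(), req.full_url)  # POST http://localhost:9200/_sql?format=txt

# Against a running cluster, the tabular text result could be read with:
# print(urllib.request.urlopen(req).read().decode("utf-8"))
```

`format=txt` returns a human-readable table; `json` and `csv` are among the other supported formats.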
+
+There are four ways to access Elasticsearch SQL: the
+https://www.elastic.co/guide/en/elasticsearch/reference/7.0/sql-rest.html[Elasticsearch
+REST endpoints], the
+https://www.elastic.co/guide/en/elasticsearch/reference/7.0/sql-cli.html[Elasticsearch
+SQL command line interface], the
+https://www.elastic.co/guide/en/elasticsearch/reference/7.0/sql-jdbc.html[JDBC
+driver], and the
+https://www.elastic.co/guide/en/elasticsearch/reference/7.0/sql-odbc.html[ODBC
+driver].
+//end::notable-highlights[]
+
+//tag::notable-highlights[]
+[float]
+=== High-level REST client is feature-complete
+
+If you’ve been following our
+https://www.elastic.co/blog/the-elasticsearch-java-high-level-rest-client-is-out[blog]
+or our https://github.com/elastic/elasticsearch/issues/27205[GitHub repository],
+you may be aware of a task we’ve been working on for quite a while now: creating
+a next-generation Java client for accessing an Elasticsearch cluster. We
+started off by working on the most commonly used features like search and
+aggregations, and have been working our way through administrative and
+monitoring APIs. Many of you who use Java are already using this new client,
+but for those who are still using the TransportClient, now is a great time to
+upgrade to our High Level REST Client, or HLRC.
+
+As of 7.0.0, the HLRC has all the API checkboxes checked to call it
+“complete”, so those of you still using the TransportClient should be able to
+migrate. We’ll of course continue to develop our REST APIs and will add them to
+this client as we go. For a list of all of the APIs that are available, have a
+look at our
+https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.0/java-rest-high.html[HLRC
+documentation].
+To get started, have a look at the
+https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.0/java-rest-high-getting-started.html[getting
+started with the HLRC] section of our docs, and if you need help migrating from
+the TransportClient, have a look at our
+https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.0/java-rest-high-level-migration.html[migration
+guide].
+//end::notable-highlights[]
+
+//tag::notable-highlights[]
+[float]
+=== Support nanosecond timestamps
+
+Up until 7.0, Elasticsearch could only store timestamps with millisecond
+precision. Events that occur at a higher rate, for example tracing or network
+packet data that you want to store and analyze in Elasticsearch, can require
+higher precision. Historically, we have used the
+https://www.joda.org/joda-time/[Joda time library] to handle dates and times,
+and Joda lacked support for such high-precision timestamps.
+
+JDK 8 introduced an official Java time API that can also handle
+nanosecond-precision timestamps. Over the past year, we’ve been working to
+migrate our Joda time usage to the native Java time API while trying to maintain
+backwards compatibility. As of 7.0.0, you can make use of these nanosecond
+timestamps via a dedicated
+https://www.elastic.co/guide/en/elasticsearch/reference/7.0/date_nanos.html[`date_nanos`
+field type]. Note that aggregations on this field still use millisecond
+resolution to avoid an explosion of buckets.
+//end::notable-highlights[]
+
+//tag::notable-highlights[]
+[float]
+=== Faster retrieval of top hits
+
+When it comes to search, query performance is a key feature. Elasticsearch 7.0
+significantly improves search performance in situations where the exact hit
+count is not needed and a lower bound on the number of results is sufficient.
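In a 7.x search response, `hits.total` is an object whose `relation` field is `eq` when the count is exact and `gte` when counting stopped at the threshold. A small sketch of how a results page might use that to show a lower bound; the `title` field and both helper names are hypothetical.

```python
from typing import Union


def search_body(text: str, track_total_hits: Union[bool, int] = 10_000) -> dict:
    """Search body; pass True to request an exact hit count instead of a bound."""
    return {"query": {"match": {"title": text}}, "track_total_hits": track_total_hits}


def describe_total(total: dict) -> str:
    """Turn hits.total from a 7.x response into a label for a results page."""
    if total["relation"] == "gte":
        return f"at least {total['value']:,} hits"
    return f"{total['value']:,} hits"


print(describe_total({"value": 10000, "relation": "gte"}))  # at least 10,000 hits
print(describe_total({"value": 42, "relation": "eq"}))      # 42 hits
```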
+
+For example, if your users
+typically just look at the first page of results on your site and don’t care
+about exactly how many documents matched, you may be able to show them “more
+than 10,000 hits” and then provide them with paginated results. It’s quite
+common to have users enter frequently occurring terms like “the” and “a” in
+their queries, which has historically forced Elasticsearch to score a lot of
+documents even when those frequent terms couldn’t possibly add much to the
+score.
+
+In these conditions, Elasticsearch can now skip calculating scores for records
+that are identified at an early stage as records that will not be ranked at the
+top of the result set. This can significantly improve the query speed. The
+number of hits that are counted accurately is
+https://www.elastic.co/guide/en/elasticsearch/reference/7.0/search-request-track-total-hits.html[configurable]
+through the `track_total_hits` parameter, and defaults to 10,000. Queries that
+match fewer documents than this threshold behave as before: the hit count is
+accurate, but there is no performance improvement. Because the improvement is
+based on skipping low-ranking records, it does not apply to aggregations. You
+can read more about this powerful algorithmic development in our blog post
+https://www.elastic.co/blog/faster-retrieval-of-top-hits-in-elasticsearch-with-block-max-wand[Magic
+WAND: Faster Retrieval of Top Hits in Elasticsearch].
+//end::notable-highlights[]
+
+//tag::notable-highlights[]
+[float]
+=== Support for TLS 1.3
+
+Elasticsearch has supported encrypted communications for a long time; however,
+we recently started https://www.elastic.co/support/matrix#matrix_jvm[supporting
+JDK 11], which gives us new capabilities. JDK 11 supports TLSv1.3, so
+starting with 7.0, Elasticsearch supports TLSv1.3 for those
+of you running JDK 11.
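On the client side, a Python application connecting to a 7.0 cluster over HTTPS can use the standard `ssl` module to refuse old protocol versions while still allowing TLS 1.3 whenever both ends support it. This is a sketch under stated assumptions: it is stricter than the server defaults described here (it also drops TLS 1.1), the helper name is hypothetical, and TLS 1.3 is only negotiated if the local OpenSSL build and the server's JDK both support it. On the server side, the protocol list is controlled by settings such as `xpack.security.http.ssl.supported_protocols`.

```python
import ssl
from typing import Optional


def cluster_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Hypothetical helper: TLS settings for an HTTPS client of a 7.0 cluster."""
    # Verifies certificates and hostnames; ca_file can point at the cluster's CA.
    ctx = ssl.create_default_context(cafile=ca_file)
    # Never fall back to TLS 1.0 or 1.1; the upper bound stays at the highest
    # version this Python/OpenSSL build supports (TLS 1.3 where available).
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = cluster_client_context()
print("minimum:", ctx.minimum_version.name)  # minimum: TLSv1_2
print("TLS 1.3 available locally:", ssl.HAS_TLSv1_3)
```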
+
+To help prevent new users from inadvertently running
+with low security, we’ve also dropped TLSv1.0 from our defaults. For those
+running older versions of Java, the default protocols are TLSv1.2 and
+TLSv1.1. Have a look at our
+https://www.elastic.co/guide/en/elastic-stack-overview/7.0/ssl-tls.html[TLS
+setup instructions] if you need help getting started.
+//end::notable-highlights[]
+
+//tag::notable-highlights[]
+[float]
+=== Bundle JDK in Elasticsearch distribution
+
+One of the more prominent "getting started hurdles" we’ve seen users run into
+has been not knowing that Elasticsearch is a Java application and that they need
+to install one of the supported JDKs first. With 7.0, we’re now bundling a
+distribution of OpenJDK to help users get started with Elasticsearch even
+faster. We understand that some users prefer particular JDK distributions, so
+you can still bring your own JDK by
+https://www.elastic.co/guide/en/elasticsearch/reference/7.0/setup.html#jvm-version[setting
+`JAVA_HOME`] before starting Elasticsearch.
+//end::notable-highlights[]
+
+//tag::notable-highlights[]
+[float]
+=== Rank features
+
+Elasticsearch 7.0 has several new field types to get the most out of your data.
+Two that help with core search use cases are
+https://www.elastic.co/guide/en/elasticsearch/reference/7.0/rank-feature.html[`rank_feature`]
+and
+https://www.elastic.co/guide/en/elasticsearch/reference/7.0/rank-features.html[`rank_features`].
+These can be used to boost documents based on numeric or categorical values
+while still maintaining the performance of the new fast top hits query
+capabilities. For more information on these fields and how to use them, read our
+https://www.elastic.co/blog/easier-relevance-tuning-elasticsearch-7-0[blog
+post].
+//end::notable-highlights[]
+
+//tag::notable-highlights[]
+[float]
+=== JSON logging
+
+JSON logging is now enabled in Elasticsearch in addition to plaintext logs.
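Because every line of the new logs is a self-contained JSON object, the logs can also be processed programmatically. The sketch below parses a few illustrative lines that were invented for this example: `type`, `node.id` and `cluster.uuid` are fields this section names, while `level`, `timestamp` and `message` are assumed field names. It then filters the lines by stream and level.

```python
import json
from collections import Counter
from typing import Dict, Iterator

# Illustrative lines only; level, timestamp and message are assumed field names.
SAMPLE_LOG = "\n".join([
    '{"type": "server", "timestamp": "2019-04-01T10:00:00,000Z", "level": "INFO", '
    '"node.id": "node-1", "cluster.uuid": "cluster-1", "message": "started"}',
    '{"type": "deprecation", "timestamp": "2019-04-01T10:00:05,000Z", "level": "WARN", '
    '"node.id": "node-1", "cluster.uuid": "cluster-1", "message": "deprecated setting used"}',
    '{"type": "server", "timestamp": "2019-04-01T10:00:09,000Z", "level": "WARN", '
    '"node.id": "node-1", "cluster.uuid": "cluster-1", "message": "disk watermark exceeded"}',
])


def parse_lines(text: str) -> Iterator[Dict]:
    """Yield one dict per non-empty JSON log line."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


records = list(parse_lines(SAMPLE_LOG))
streams = Counter(record["type"] for record in records)

# Roughly what `jq 'select(.type == "server" and .level == "WARN") | .message'` does.
server_warnings = [r["message"] for r in records
                   if r["type"] == "server" and r["level"] == "WARN"]
print(server_warnings)  # ['disk watermark exceeded']
```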
+
+Starting in 7.0, you will find new files with `.json` extensions in your
+log directory. This means you can now use filtering tools like
+https://stedolan.github.io/jq/[`jq`] to pretty-print and process your logs in a
+much more structured manner. You can also expect to find additional information
+like `node.id`, `cluster.uuid`, `type` (and more) in each log line. The `type`
+field in each JSON log line lets you distinguish log streams when
+running on Docker.
+//end::notable-highlights[]
+
+//tag::notable-highlights[]
+[float]
+=== Script score query (aka function score 2.0)
+
+With 7.0, we are introducing the
+https://www.elastic.co/guide/en/elasticsearch/reference/7.0/query-dsl-script-score-query.html[next
+generation of our function score capability]. This new `script_score` query
+provides a new, simpler, and more flexible way to generate a ranking score per
+record. The `script_score` query provides a set of functions, including
+arithmetic and distance functions, which the user can mix and match to construct
+arbitrary function score calculations. The modular structure is simpler to use
+and will open this important functionality to additional users.
+//end::notable-highlights[]
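A minimal illustration of the idea: the request body below wraps a `match` query in `script_score` and multiplies the relevance score by a logarithm of a popularity field. The same formula is then evaluated locally in Python to show its effect on ranking. The `title` and `likes` fields and all scores are hypothetical. Elasticsearch evaluates the Painless `source` itself, so the Python function is only a stand-in.

```python
import math
from typing import List, Tuple


def script_score_query(text: str) -> dict:
    """Body for a 7.0 script_score search (field names are hypothetical)."""
    return {
        "query": {
            "script_score": {
                "query": {"match": {"title": text}},
                "script": {
                    # Painless; adding 2 keeps the multiplier above zero for 0 likes.
                    "source": "_score * Math.log(2 + doc['likes'].value)"
                },
            }
        }
    }


def local_score(base_score: float, likes: int) -> float:
    """Python stand-in for the Painless formula above."""
    return base_score * math.log(2 + likes)


# (doc id, relevance score from the match query, likes); invented numbers.
docs: List[Tuple[str, float, int]] = [("a", 1.2, 0), ("b", 1.0, 50)]
ranked = sorted(docs, key=lambda d: local_score(d[1], d[2]), reverse=True)
print([doc_id for doc_id, _, _ in ranked])  # ['b', 'a']
```

Document `a` matches the text slightly better, but the popularity multiplier (about 0.69 versus 3.95) lets `b` overtake it.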