[[release-highlights-7.8.0]]
== 7.8.0 release highlights
++++
<titleabbrev>7.8.0</titleabbrev>
++++
//NOTE: The notable-highlights tagged regions are re-used in the
//Installation and Upgrade Guide
// tag::notable-highlights[]
[float]
=== Geo improvements
We have made several improvements to geo support in {es} 7.8.

- You can now run an aggregation that finds the bounding box (top-left and
bottom-right points) containing all shapes that match a query. A shape is
anything that is defined by multiple points. See
{ref}/search-aggregations-metrics-geobounds-aggregation.html[Geo Bounds Aggregations]
and the request sketch after this list.
- {ref}/search-aggregations-bucket-geohashgrid-aggregation.html[GeoHash grid aggregations]
and {ref}/search-aggregations-bucket-geotilegrid-aggregation.html[map tile grid aggregations]
allow you to group geo_points into buckets.
- {ref}/search-aggregations-metrics-geocentroid-aggregation.html[Geo centroid aggregations]
allow you to compute the weighted https://en.wikipedia.org/wiki/Centroid[centroid]
from all coordinate values for a geo_point field.
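
For example, a minimal `geo_bounds` request might look like the following
sketch; the `places` index and `location` field are illustrative and not part
of this release:

[source, console]
----
GET /places/_search?size=0
{
  "aggs": {
    "viewport": {
      "geo_bounds": {
        "field": "location" <1>
      }
    }
  }
}
----
<1> An illustrative `geo_point` or `geo_shape` field name; substitute your own.

The response contains the `top_left` and `bottom_right` coordinates of the
bounding box that encloses all matching documents.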
// end::notable-highlights[]
// tag::notable-highlights[]
[float]
=== Add support for t-test aggregations
{es} now supports a `t_test` metrics
aggregation, which performs a statistical hypothesis test in which the test
statistic follows a
https://en.wikipedia.org/wiki/Student%27s_t-distribution[Student's
t-distribution] under the null hypothesis on numeric values extracted from
the aggregated documents or generated by provided scripts. In practice,
this tells you whether the difference between two population means is
statistically significant and did not occur by chance alone. See
{ref}/search-aggregations-metrics-ttest-aggregation.html[T-Test Aggregation].
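
For example, a paired `t_test` comparing two numeric fields might look like
this sketch (the index and field names are illustrative):

[source, console]
----
GET /node_upgrade/_search?size=0
{
  "aggs": {
    "startup_time_ttest": {
      "t_test": {
        "a": { "field": "startup_time_before" }, <1>
        "b": { "field": "startup_time_after" },
        "type": "paired"
      }
    }
  }
}
----
<1> `a` and `b` point at the two populations to compare; the field names here
are only examples.

The aggregation returns the probability (p-value) of obtaining the observed
difference by chance under the null hypothesis that the means are equal.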
// end::notable-highlights[]
// tag::notable-highlights[]
[float]
=== Expose aggregation usage in feature usage API
It is now possible to fetch a count of aggregations that have been executed
via the {ref}/cluster-nodes-usage.html[nodes usage API]. This is broken down per
combination of aggregation and data type, per shard on each node, from the
last restart until the time when the counts are fetched. When trying to
analyze how {es} is being used in practice, it is useful to know
the usage distribution across aggregations and field types. For example,
you might be able to conclude that a certain part of an index is not used a
lot and could perhaps be eliminated.
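
For example, you can retrieve these counts with the nodes usage API; the
`aggregations` section of the response lists, per node, how often each
aggregation has run against each field type:

[source, console]
----
GET /_nodes/usage
----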
// end::notable-highlights[]
// tag::notable-highlights[]
[float]
=== Support `value_count` and `avg` aggregations over histogram fields
{es} now implements `value_count` and `avg` aggregations over histogram
fields.
When the `value_count` aggregation is computed on {ref}/histogram.html[histogram
fields], the result of the aggregation is the sum of all numbers in the
`counts` array of the histogram.
When the average is computed on histogram fields, the result of the
aggregation is the weighted average of all elements in the `values` array,
taking into account the count in the same position in the `counts`
array.
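
For example, running both aggregations over an illustrative `latency` field of
type histogram might look like this sketch. If the field holds `values` of
`[0.1, 0.2, 0.3]` with `counts` of `[3, 7, 23]`, the `value_count` result is
`33` and the `avg` result is `(0.1 * 3 + 0.2 * 7 + 0.3 * 23) / 33`, roughly
`0.26`.

[source, console]
----
GET /metrics/_search?size=0
{
  "aggs": {
    "latency_count": { "value_count": { "field": "latency" } },
    "latency_avg": { "avg": { "field": "latency" } }
  }
}
----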
// end::notable-highlights[]
// tag::notable-highlights[]
[float]
=== Reduce aggregation memory consumption
{es} now attempts to save memory on the coordinating node by delaying
deserialization of the shard results for an aggregation until the last
second. This is helpful as it makes the per-shard aggregation results
short-lived garbage. It should also shrink the memory usage of aggregations
while they are waiting to be merged.
Additionally, when the search is in batched reduce mode, {es} will force
the results to be serialized between batch reduces in an attempt to keep
the memory usage as low as possible between reductions.
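
Batched reduce mode is controlled by the `batched_reduce_size` request
parameter (512 by default), which caps how many shard results the coordinating
node buffers before performing a partial reduction. As a sketch, an
aggregation-heavy search over many shards might lower it like this (the index
pattern and field are illustrative):

[source, console]
----
GET /logs-*/_search?batched_reduce_size=256
{
  "size": 0,
  "aggs": {
    "hosts": { "terms": { "field": "host.name" } }
  }
}
----

Lowering `batched_reduce_size` trades some coordination overhead for a smaller
memory footprint on the coordinating node.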
// end::notable-highlights[]
// tag::notable-highlights[]
[float]
=== Scalar functions now supported in SQL aggregations
When querying {es} using SQL, it is now possible to use scalar functions
inside aggregations. This allows for more complex expressions, including
within `GROUP BY` or `HAVING` clauses. For example:

[source, sql]
----
SELECT
  MAX(CASE WHEN a IS NULL THEN -1 ELSE ABS(a * 10) + 1 END) AS max,
  b
FROM test
GROUP BY b
HAVING
  MAX(CASE WHEN a IS NULL THEN -1 ELSE ABS(a * 10) + 1 END) > 5
----
// end::notable-highlights[]
// tag::notable-highlights[]
[float]
[[release-highlights-7.8.0-throttling]]
=== Increase the performance and scalability of {transforms} with throttling
{transforms-cap} achieved GA status in 7.7 and now in 7.8 they are even better
with the introduction of
{ref}/transform-overview.html#transform-performance[throttling]. You can spread
out the impact of the {transforms} on your cluster by defining the rate at which
they perform search and index requests. Set the `docs_per_second` limit when you
create or update your {transform}.
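
For example, a {transform} created with a `docs_per_second` limit might look
like this sketch (the source index, destination index, and field names are
illustrative):

[source, console]
----
PUT _transform/orders_by_customer
{
  "source": { "index": "orders" },
  "dest": { "index": "orders_by_customer" },
  "pivot": {
    "group_by": {
      "customer_id": { "terms": { "field": "customer_id" } }
    },
    "aggregations": {
      "total_spent": { "sum": { "field": "order_total" } }
    }
  },
  "settings": { "docs_per_second": 500 } <1>
}
----
<1> Caps the number of input documents processed per second; omit it for
unthrottled operation.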
// end::notable-highlights[]
// tag::notable-highlights[]
[float]
[[release-highlights-7.8.0-mml]]
=== Better estimates for {ml} model memory usage
For 7.8, we introduce dynamic estimation of the model memory limit for jobs in
{ml-docs}/ootb-ml-jobs.html[ML solution modules]. The estimate is generated
during job creation. It uses a calculation based on the specific detectors
of the job and the cardinality of the partitioning and influencer fields. This
means the job setup has better default values that depend on the size of the
data being analyzed.
// end::notable-highlights[]
// tag::notable-highlights[]
[float]
[[release-highlights-7.8.0-loss-functions]]
=== Additional loss functions for {regression}
{ml-docs}/dfa-regression.html#dfa-regression-lossfunction[Loss functions]
measure how well a {ml} model fits a specific data set. In 7.8, we added two new
loss functions for {regression} analysis. In addition to the existing mean
squared error function, there are now mean squared logarithmic error and
Pseudo-Huber loss functions. These additions enable you to choose the
loss function that best fits your data set.
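
For example, a {regression} job that uses the Pseudo-Huber loss might be
configured like this sketch (the index and field names are illustrative):

[source, console]
----
PUT _ml/data_frame/analytics/house_price_regression
{
  "source": { "index": "houses" },
  "dest": { "index": "house_price_predictions" },
  "analysis": {
    "regression": {
      "dependent_variable": "price",
      "loss_function": "huber", <1>
      "loss_function_parameter": 1.0
    }
  }
}
----
<1> One of `mse` (the default), `msle`, or `huber`.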
// end::notable-highlights[]
// tag::notable-highlights[]
[float]
[[release-highlights-7.8.0-data-visualizer]]
=== Extended upload limit and explanations for Data Visualizer
You can now upload files up to 1 GB in Data Visualizer. The file structure
finder functionality of the Data Visualizer provides more detailed explanations
after both successful and unsuccessful analysis, which makes it easier to
diagnose file upload issues.
// end::notable-highlights[]