[[release-notes-7.8.1]]
== {es} version 7.8.1
Also see <<breaking-changes-7.8,Breaking changes in 7.8>>.
[[breaking-7.8.1]]
[discrete]
=== Breaking changes
License::
* Display enterprise license as platinum in /_xpack {es-pull}58217[#58217]
[[feature-7.8.1]]
[discrete]
=== New features
SQL::
* Implement SUM, MIN, MAX, AVG over literals {es-pull}56786[#56786] (issues: {es-issue}41412[#41412], {es-issue}55569[#55569])
[[enhancement-7.8.1]]
[discrete]
=== Enhancements
Authorization::
* Add read privileges for observability-annotations for apm_user {es-pull}58530[#58530] (issues: {es-issue}64796[#64796], {es-issue}69642[#69642], {es-issue}69881[#69881])
Features/Indices APIs::
* Change "apply create index" log level to DEBUG {es-pull}56947[#56947]
* Make noop template updates be cluster state noops {es-pull}57851[#57851] (issue: {es-issue}57662[#57662])
* Rename template V2 classes to ComposableTemplate {es-pull}57183[#57183] (issue: {es-issue}56609[#56609])
Machine Learning::
* Add exponent output aggregator to inference {es-pull}58933[#58933]
Snapshot/Restore::
* Allow read operations to be executed without waiting for full range to be written in cache {es-pull}58728[#58728] (issues: {es-issue}58164[#58164], {es-issue}58477[#58477])
* Allows SparseFileTracker to progressively execute listeners during Gap processing {es-pull}58477[#58477] (issue: {es-issue}58164[#58164])
* Do not wrap CacheFile reentrant r/w locks with ReleasableLock {es-pull}58244[#58244] (issue: {es-issue}58164[#58164])
* Use snapshot information to build searchable snapshot store MetadataSnapshot {es-pull}56289[#56289]
Transform::
* Improve update API {es-pull}57648[#57648] (issue: {es-issue}56499[#56499])
[[bug-7.8.1]]
[discrete]
=== Bug fixes
CCR::
* Ensure CCR partial reads never overuse buffer {es-pull}58620[#58620]
Cluster Coordination::
* Suppress cluster UUID logs in 6.8/7.x upgrade {es-pull}56835[#56835]
* Timeout health API on busy master {es-pull}57587[#57587]
Engine::
* Fix realtime get of numeric fields from translog {es-pull}58121[#58121] (issue: {es-issue}57462[#57462])
Features/ILM+SLM::
* Fix negative limiting with fewer PARTIAL snapshots than minimum required {es-pull}58563[#58563] (issue: {es-issue}58515[#58515])
Features/Indices APIs::
* Fix issue reading template mappings after cluster restart {es-pull}58964[#58964] (issues: {es-issue}58521[#58521], {es-issue}58643[#58643], {es-issue}58883[#58883])
* ITV2: disallow duplicate dynamic templates {es-pull}56291[#56291] (issues: {es-issue}28988[#28988], {es-issue}53101[#53101], {es-issue}53326[#53326])
Features/Stats::
* Fix unnecessary stats warning when swap is disabled {es-pull}57983[#57983]
Geo::
* Fix max-int limit for number of points reduced in geo_centroid {es-pull}56370[#56370] (issue: {es-issue}55992[#55992])
* Re-enable support for array-valued geo_shape fields. {es-pull}58786[#58786]
Infra/Core::
* Week based parsing for ingest date processor {es-pull}58597[#58597] (issue: {es-issue}58479[#58479])
Machine Learning::
* Allow unran/incomplete forecasts to be deleted for stopped/failed jobs {es-pull}57152[#57152] (issue: {es-issue}56419[#56419])
* Fix inference .ml-stats-write alias creation {es-pull}58947[#58947] (issue: {es-issue}58662[#58662])
* Fix race condition when force stopping data frame analytics job {es-pull}57680[#57680]
* Handle broken setup with state alias being an index {es-pull}58999[#58999] (issue: {es-issue}58482[#58482])
* Mark forecasts for force closed/failed jobs as failed {es-pull}57143[#57143] (issue: {es-issue}56419[#56419])
* Better interrupt handling during named pipe connection {ml-pull}1311[#1311]
* Trap potential cause of SIGFPE {ml-pull}1351[#1351] (issue: {ml-issue}1348[#1348])
* Correct inference model definition for MSLE regression models {ml-pull}1375[#1375]
* Fix cause of SIGSEGV of classification and regression {ml-pull}1379[#1379]
* Fix restoration of change detectors after seasonality change {ml-pull}1391[#1391]
* Fix potential SIGSEGV when forecasting {ml-pull}1402[#1402] (issue: {ml-issue}1401[#1401])
Network::
* Close channel on handshake error with old version {es-pull}56989[#56989] (issue: {es-issue}54337[#54337])
Percolator::
* Fix nested document support in percolator query {es-pull}58149[#58149] (issue: {es-issue}52850[#52850])
Recovery::
* Fix recovery stage transition with sync_id {es-pull}57754[#57754] (issues: {es-issue}57187[#57187], {es-issue}57708[#57708])
SQL::
* Fix behaviour of COUNT(DISTINCT <literal>) {es-pull}56869[#56869]
* Fix bug in resolving aliases against filters {es-pull}58399[#58399] (issues: {es-issue}57270[#57270], {es-issue}57417[#57417])
* Fix handling of escaped chars in JDBC connection string {es-pull}58429[#58429] (issue: {es-issue}57927[#57927])
* Handle MIN and MAX functions on dates in Painless scripts {es-pull}57605[#57605] (issue: {es-issue}57581[#57581])
Search::
* Ensure search contexts are removed on index delete {es-pull}56335[#56335]
* Filter empty fields in SearchHit#toXContent {es-pull}58418[#58418] (issue: {es-issue}41656[#41656])
* Fix exists query on unmapped field in query_string {es-pull}58804[#58804] (issues: {es-issue}55785[#55785], {es-issue}58737[#58737])
* Fix handling of terminate_after when size is 0 {es-pull}58212[#58212] (issue: {es-issue}57624[#57624])
* Fix possible NPE on search phase failure {es-pull}57952[#57952] (issues: {es-issue}51708[#51708], {es-issue}57945[#57945])
* Handle failures with no explicit cause in async search {es-pull}58319[#58319] (issues: {es-issue}57925[#57925], {es-issue}58311[#58311])
* Improve error handling in async search code {es-pull}57925[#57925] (issue: {es-issue}58995[#58995])
* Prevent BigInteger serialization errors in term queries {es-pull}57987[#57987] (issue: {es-issue}57917[#57917])
* Submit async search to not require read privilege {es-pull}58942[#58942]
Snapshot/Restore::
* Fix Incorrect Snapshot Shard Status for DONE Shards in Running Snapshots {es-pull}58390[#58390]
* Fix Memory Leak From Master Failover During Snapshot {es-pull}58511[#58511] (issue: {es-issue}56911[#56911])
* Fix NPE in SnapshotService CS Application {es-pull}58680[#58680]
* Fix Snapshot Abort Not Waiting for Data Nodes {es-pull}58214[#58214]
* Remove Overly Strict Safety Mechanism in Shard Snapshot Logic {es-pull}57227[#57227] (issue: {es-issue}57198[#57198])
Task Management::
* Cancel persistent task recheck when no longer master {es-pull}58539[#58539] (issue: {es-issue}58531[#58531])
* Ensure child node is unregistered if task registration fails {es-pull}56254[#56254] (issues: {es-issue}54312[#54312], {es-issue}55875[#55875])
Transform::
* Fix page size return in cat transform, add dps {es-pull}57871[#57871] (issues: {es-issue}56007[#56007], {es-issue}56498[#56498])
[[upgrade-7.8.1]]
[discrete]
=== Upgrades
Infra/Core::
* Upgrade to JNA 5.5.0 {es-pull}58183[#58183]
[[release-notes-7.8.0]]
== {es} version 7.8.0
Also see <<breaking-changes-7.8,Breaking changes in 7.8>>.
[[breaking-7.8.0]]
[discrete]
=== Breaking changes
Aggregations::
* `value_count` aggregation optimization {es-pull}54854[#54854]
Features/Indices APIs::
* Add auto create action {es-pull}55858[#55858]
Mapping::
* Disallow changing 'enabled' on the root mapper {es-pull}54463[#54463] (issue: {es-issue}33933[#33933])
* Fix updating include_in_parent/include_in_root of nested field {es-pull}54386[#54386] (issue: {es-issue}53792[#53792])
[[deprecation-7.8.0]]
[discrete]
=== Deprecations
Authentication::
* Deprecate the `kibana` reserved user; introduce `kibana_system` user {es-pull}54967[#54967]
Cluster Coordination::
* Voting config exclusions should work with absent nodes {es-pull}50836[#50836] (issue: {es-issue}47990[#47990])
Features/Features::
* Add node local storage deprecation check {es-pull}54383[#54383] (issue: {es-issue}54374[#54374])
Features/Indices APIs::
* Deprecate local parameter for get field mapping request {es-pull}55014[#55014]
Infra/Core::
* Deprecate node local storage setting {es-pull}54374[#54374]
Infra/Plugins::
* Add xpack setting deprecations to deprecation API {es-pull}56290[#56290] (issue: {es-issue}54745[#54745])
* Deprecate disabling basic-license features {es-pull}54816[#54816] (issue: {es-issue}54745[#54745])
* Deprecated xpack "enable" settings should be no-ops {es-pull}55416[#55416] (issues: {es-issue}54745[#54745], {es-issue}54816[#54816])
* Make xpack.ilm.enabled setting a no-op {es-pull}55592[#55592] (issues: {es-issue}54745[#54745], {es-issue}54816[#54816], {es-issue}55416[#55416])
* Make xpack.monitoring.enabled setting a no-op {es-pull}55617[#55617] (issues: {es-issue}54745[#54745], {es-issue}54816[#54816], {es-issue}55416[#55416], {es-issue}55461[#55461], {es-issue}55592[#55592])
* Restore xpack.ilm.enabled and xpack.slm.enabled settings {es-pull}57383[#57383] (issues: {es-issue}54745[#54745], {es-issue}55416[#55416], {es-issue}55592[#55592])
[[feature-7.8.0]]
[discrete]
=== New features
Aggregations::
* Add Student's t-test aggregation support {es-pull}54469[#54469] (issue: {es-issue}53692[#53692])
* Add support for filters to t-test aggregation {es-pull}54980[#54980] (issue: {es-issue}53692[#53692])
* Histogram field type support for Sum aggregation {es-pull}55681[#55681] (issue: {es-issue}53285[#53285])
* Histogram field type support for ValueCount and Avg aggregations {es-pull}55933[#55933] (issue: {es-issue}53285[#53285])
Features/Indices APIs::
* Add simulate template composition API _index_template/_simulate_index/{name} {es-pull}55686[#55686] (issue: {es-issue}53101[#53101])
Geo::
* Add geo_bounds aggregation support for geo_shape {es-pull}55328[#55328]
* Add geo_shape support for geotile_grid and geohash_grid {es-pull}55966[#55966]
* Add geo_shape support for the geo_centroid aggregation {es-pull}55602[#55602]
* Add new point field {es-pull}53804[#53804]
SQL::
* Implement DATETIME_FORMAT function for date/time formatting {es-pull}54832[#54832] (issue: {es-issue}53714[#53714])
* Implement DATETIME_PARSE function for parsing strings {es-pull}54960[#54960] (issue: {es-issue}53714[#53714])
* Implement scripting inside aggs {es-pull}55241[#55241] (issues: {es-issue}29980[#29980], {es-issue}36865[#36865], {es-issue}37271[#37271])
[[enhancement-7.8.0]]
[discrete]
=== Enhancements
Aggregations::
* Aggs must specify a `field` or `script` (or both) {es-pull}52226[#52226]
* Expose aggregation usage in Feature Usage API {es-pull}55732[#55732] (issue: {es-issue}53746[#53746])
* Reduce memory for big aggregations run against many shards {es-pull}54758[#54758]
* Save memory on aggs in async search {es-pull}55683[#55683]
Allocation::
* Disk decider respect watermarks for single data node {es-pull}55805[#55805]
* Improve same-shard allocation explanations {es-pull}56010[#56010]
Analysis::
* Add preserve_original setting in ngram token filter {es-pull}55432[#55432]
* Add preserve_original setting in edge ngram token filter {es-pull}55766[#55766] (issue: {es-issue}55767[#55767])
* Add pre-configured “lowercase” normalizer {es-pull}53882[#53882] (issue: {es-issue}53872[#53872])
Audit::
* Update the audit logfile list of system users {es-pull}55578[#55578] (issue: {es-issue}37924[#37924])
Authentication::
* Let realms gracefully terminate the authN chain {es-pull}55623[#55623]
Authorization::
* Add reserved_ml_user and reserved_ml_admin kibana privileges {es-pull}54713[#54713]
Autoscaling::
* Rollover: refactor out cluster state update {es-pull}53965[#53965]
CRUD::
* Avoid holding onto bulk items until all completed {es-pull}54407[#54407]
Cluster Coordination::
* Add voting config exclusion add and clear API spec and integration test cases {es-pull}55760[#55760] (issue: {es-issue}48131[#48131])
Features/CAT APIs::
* Add support for V2 index templates to /_cat/templates {es-pull}55829[#55829] (issue: {es-issue}53101[#53101])
Features/Indices APIs::
* Add HLRC support for simulate index template api {es-pull}55936[#55936] (issue: {es-issue}53101[#53101])
* Add prefer_v2_templates flag and index setting {es-pull}55411[#55411] (issue: {es-issue}53101[#53101])
* Add warnings/errors when V2 templates would match same indices as V1 {es-pull}54367[#54367] (issue: {es-issue}53101[#53101])
* Disallow merging existing mapping field definitions in templates {es-pull}57701[#57701] (issues: {es-issue}55607[#55607], {es-issue}55982[#55982], {es-issue}57393[#57393])
* Emit deprecation warning if multiple v1 templates match with a new index {es-pull}55558[#55558] (issue: {es-issue}53101[#53101])
* Guard adding the index.prefer_v2_templates settings for pre-7.8 nodes {es-pull}55546[#55546] (issues: {es-issue}53101[#53101], {es-issue}55411[#55411], {es-issue}55539[#55539])
* Handle merging dotted object names when merging V2 template mappings {es-pull}55982[#55982] (issue: {es-issue}53101[#53101])
* Throw exception on duplicate mappings metadata fields when merging templates {es-pull}57835[#57835] (issue: {es-issue}57701[#57701])
* Update template v2 api rest spec {es-pull}55948[#55948] (issue: {es-issue}53101[#53101])
* Use V2 index templates during index creation {es-pull}54669[#54669] (issue: {es-issue}53101[#53101])
* Use V2 templates when reading duplicate aliases and ingest pipelines {es-pull}54902[#54902] (issue: {es-issue}53101[#53101])
* Validate V2 templates more strictly {es-pull}56170[#56170] (issues: {es-issue}43737[#43737], {es-issue}46045[#46045], {es-issue}53101[#53101], {es-issue}53970[#53970])
Features/Java High Level REST Client::
* Enable support for decompression of compressed response within RestHighLevelClient {es-pull}53533[#53533]
Features/Stats::
* Fix available / total disk cluster stats {es-pull}32480[#32480] (issue: {es-issue}32478[#32478])
Features/Watcher::
* Delay warning about missing x-pack {es-pull}54265[#54265] (issue: {es-issue}40898[#40898])
Geo::
* Add geo_shape mapper supporting doc-values in Spatial Plugin {es-pull}55037[#55037] (issue: {es-issue}53562[#53562])
Infra/Core::
* Decouple Environment from DiscoveryNode {es-pull}54373[#54373]
* Ensure that the output of node roles are sorted {es-pull}54376[#54376] (issue: {es-issue}54370[#54370])
* Reintroduce system index APIs for Kibana {es-pull}54858[#54858] (issues: {es-issue}52385[#52385], {es-issue}53912[#53912])
* Schedule commands in current thread context {es-pull}54187[#54187] (issue: {es-issue}17143[#17143])
* Start resource watcher service early {es-pull}54993[#54993] (issue: {es-issue}54867[#54867])
Infra/Packaging::
* Make Windows JAVA_HOME handling consistent with Linux {es-pull}55261[#55261] (issue: {es-issue}55134[#55134])
Infra/REST API::
* Add validation to the usage service {es-pull}54617[#54617]
Infra/Scripting::
* Scripting: stats per context in nodes stats {es-pull}54008[#54008] (issue: {es-issue}50152[#50152])
Machine Learning::
* Add effective max model memory limit to ML info {es-pull}55529[#55529] (issue: {es-issue}63942[#63942])
* Add loss_function to regression {es-pull}56118[#56118]
* Add new inference_config field to trained model config {es-pull}54421[#54421]
* Add failed_category_count to model_size_stats {es-pull}55716[#55716] (issue: {es-issue}1130[#1130])
* Add prediction_field_type to inference config {es-pull}55128[#55128]
* Allow a certain number of ill-formatted rows when delimited format is specified {es-pull}55735[#55735] (issue: {es-issue}38890[#38890])
* Apply default timeout in StopDataFrameAnalyticsAction.Request {es-pull}55512[#55512]
* Create an annotation when a model snapshot is stored {es-pull}53783[#53783] (issue: {es-issue}52149[#52149])
* Do not execute ML CRUD actions when upgrade mode is enabled {es-pull}54437[#54437] (issue: {es-issue}54326[#54326])
* Make find_file_structure recognize Kibana CSV report timestamps {es-pull}55609[#55609] (issue: {es-issue}55586[#55586])
* More advanced model snapshot retention options {es-pull}56125[#56125] (issue: {es-issue}52150[#52150])
* Return assigned node in start/open job/datafeed response {es-pull}55473[#55473] (issue: {es-issue}54067[#54067])
* Skip daily maintenance activity if upgrade mode is enabled {es-pull}54565[#54565] (issue: {es-issue}54326[#54326])
* Start gathering and storing inference stats {es-pull}53429[#53429]
* Unassign data frame analytics tasks in SetUpgradeModeAction {es-pull}54523[#54523] (issue: {es-issue}54326[#54326])
* Speed up anomaly detection for the lat_long function {ml-pull}1102[#1102]
* Reduce CPU scheduling priority of native analysis processes to favor the ES
JVM when CPU is constrained. This change is implemented only for Linux and macOS,
not for Windows {ml-pull}1109[#1109]
* Take `training_percent` into account when estimating memory usage for
classification and regression {ml-pull}1111[#1111]
* Support maximize minimum recall when assigning class labels for multiclass
classification {ml-pull}1113[#1113]
* Improve robustness of anomaly detection to bad input data {ml-pull}1114[#1114]
* Add new `num_matches` and `preferred_to_categories` fields to category output
{ml-pull}1062[#1062]
* Add mean squared logarithmic error (MSLE) for regression {ml-pull}1101[#1101]
* Add pseudo-Huber loss for regression {ml-pull}1168[#1168]
* Reduce peak memory usage and memory estimates for classification and regression
{ml-pull}1125[#1125]
* Reduce variability of classification and regression results across our target
operating systems {ml-pull}1127[#1127]
* Switch data frame analytics model memory estimates from kilobytes to
megabytes {ml-pull}1126[#1126] (issue: {es-issue}54506[#54506])
* Add a {ml} native code build for Linux on AArch64 {ml-pull}1132[#1132],
{ml-pull}1135[#1135]
* Improve data frame analytics runtime by optimising memory alignment for intrinsic
operations {ml-pull}1142[#1142]
* Fix spurious anomalies for count and sum functions after no data are received
for long periods of time {ml-pull}1158[#1158]
* Improve false positive rates from periodicity test for time series anomaly
detection {ml-pull}1177[#1177]
* Break progress reporting of data frame analyses into multiple phases {ml-pull}1179[#1179]
* Really centre the data before training for classification and regression begins. This
means we can choose more optimal smoothing bias and should reduce the number of trees
{ml-pull}1192[#1192]
Mapping::
* Merge V2 index/component template mappings in specific manner {es-pull}55607[#55607] (issue: {es-issue}53101[#53101])
Recovery::
* Avoid copying file chunks in peer recovery {es-pull}56072[#56072] (issue: {es-issue}55353[#55353])
* Retry failed peer recovery due to transient errors {es-pull}55353[#55353]
SQL::
* Add BigDecimal support to JDBC {es-pull}56015[#56015] (issue: {es-issue}43806[#43806])
* Drop BASE TABLE type in favour of just TABLE {es-pull}54836[#54836]
* Relax version lock between server and clients {es-pull}56148[#56148]
value_count Aggregation optimization (backport of #54854) (#55076)

We found some problems during testing.

Data: 200 million docs, 1 shard, 0 replicas

hits        | avg     | sum     | value_count |
----------- | ------- | ------- | ----------- |
20,000      | .038s   | .033s   | .063s       |
200,000     | .127s   | .125s   | .334s       |
2,000,000   | .789s   | .729s   | 3.176s      |
20,000,000  | 4.200s  | 3.239s  | 22.787s     |
200,000,000 | 21.000s | 22.000s | 154.917s    |

The performance of `avg`, `sum`, and similar aggregations is very close, but the performance of `value_count` has always been poor, at times not even in the same order of magnitude. Intuitively, `value_count` and `sum` are similar operations and should take about the same time, so we investigated the `value_count` aggregation.

Counting in Elasticsearch works by traversing the field of each document. If the field holds a single value, the count is increased by 1. If it holds an array, the count is increased by n. The problem lies in traversing each document and extracting the field, which converts it from its on-disk form into a Java object. We summarize the current problems as:

- Numbers are cast to strings, which adds conversion overhead and causes GC pressure from the large number of strings created.
- After a number is converted to a string, sorting and other unnecessary operations are performed.

Here is the proof of the type conversion overhead.

```
// Java long to string source code, getChars is very time-consuming.
public static String toString(long i) {
    int size = stringSize(i);
    if (COMPACT_STRINGS) {
        byte[] buf = new byte[size];
        getChars(i, size, buf);
        return new String(buf, LATIN1);
    } else {
        byte[] buf = new byte[size * 2];
        StringUTF16.getChars(i, size, buf);
        return new String(buf, UTF16);
    }
}
```

test type    | average | min  | max      | sum   |
------------ | ------- | ---- | -------- | ----- |
double->long | 32.2ns  | 28ns | 0.024ms  | 3.22s |
long->double | 31.9ns  | 28ns | 0.036ms  | 3.19s |
long->String | 163.8ns | 93ns | 1921 ms  | 16.3s |

The `long->String` conversion overhead is particularly serious.

Our optimization is actually very simple: handle the different types separately instead of uniformly converting everything to strings. We added type identification in ValueCountAggregator and special handling for number and geo_point types so that their type conversion is skipped. Because far fewer strings and string constants are created, the improvement is very noticeable.

hits        | avg     | sum     | value_count double (before) | value_count double (after) | value_count keyword (before) | value_count keyword (after) | value_count geo_point (before) | value_count geo_point (after) |
----------- | ------- | ------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
20,000      | .038s   | .033s   | .063s    | .026s   | .030s   | .030s   | .038s    | .015s  |
200,000     | .127s   | .125s   | .334s    | .078s   | .116s   | .099s   | .278s    | .031s  |
2,000,000   | .789s   | .729s   | 3.176s   | .439s   | .348s   | .386s   | 3.365s   | .178s  |
20,000,000  | 4.200s  | 3.239s  | 22.787s  | 2.700s  | 2.500s  | 2.600s  | 25.192s  | 1.278s |
200,000,000 | 21.000s | 22.000s | 154.917s | 18.990s | 19.000s | 20.000s | 168.971s | 9.093s |

- The results are more in line with common sense. `value_count` now takes about the same time as `avg`, `sum`, etc., or even less. Previously, `value_count` took much longer than `avg` and `sum`, and was not even in the same order of magnitude when the amount of data was large.
- When calculating numeric types such as `double` and `long`, performance improves by about 8 to 9 times; when calculating the `geo_point` type, performance improves by 18 to 20 times.
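
The difference between the two counting strategies can be illustrated with a minimal sketch. This is not the actual ValueCountAggregator code: the class `ValueCountSketch`, the methods `countViaStrings` and `countDirect`, and the `long[][]` representation of per-document field values are all hypothetical, chosen only to show why skipping the string conversion leaves the result unchanged while avoiding one allocation per value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names and data layout are assumptions,
// not Elasticsearch APIs.
public class ValueCountSketch {

    // "Before": every numeric value is converted to a String just to be counted,
    // so each value costs a Long.toString call and a short-lived object.
    static long countViaStrings(long[][] docs) {
        long count = 0;
        for (long[] values : docs) {
            List<String> asStrings = new ArrayList<>(values.length);
            for (long v : values) {
                asStrings.add(Long.toString(v));
            }
            count += asStrings.size();
        }
        return count;
    }

    // "After": numeric values are counted directly. A single value adds 1,
    // an array of n values adds n, and no strings are created.
    static long countDirect(long[][] docs) {
        long count = 0;
        for (long[] values : docs) {
            count += values.length;
        }
        return count;
    }

    public static void main(String[] args) {
        // Three documents: one single-valued, one with three values, one empty.
        long[][] docs = { {1L}, {2L, 3L, 4L}, {} };
        System.out.println(countViaStrings(docs)); // prints 4
        System.out.println(countDirect(docs));     // prints 4
    }
}
```

Both methods return the same count; only the amount of work per value differs, which is the effect the timings above attribute to the type-specific handling.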
Search::
* Consolidate DelayableWriteable {es-pull}55932[#55932]
* Rewrite exists queries to MatchNoneQueryBuilder when the field is unmapped {es-pull}54857[#54857]
* Rewrite wrapper queries to match_none if possible {es-pull}55271[#55271]
* SearchService#canMatch takes into consideration the alias filter {es-pull}55120[#55120] (issue: {es-issue}55090[#55090])
Snapshot/Restore::
* Add GCS support for searchable snapshots {es-pull}55403[#55403]
* Allocate searchable snapshots with the balancer {es-pull}54889[#54889] (issues: {es-issue}50999[#50999], {es-issue}54729[#54729])
* Allow bulk snapshot deletes to abort {es-pull}56009[#56009] (issue: {es-issue}55773[#55773])
* Allow deleting multiple snapshots at once {es-pull}55474[#55474]
* Allow searching of snapshot taken while indexing {es-pull}55511[#55511] (issue: {es-issue}50999[#50999])
* Allow prewarming the cache for searchable snapshot shards {es-pull}55322[#55322]
* Enable prewarming by default for searchable snapshots {es-pull}56201[#56201] (issue: {es-issue}55952[#55952])
* Permit searches to be concurrent to prewarming {es-pull}55795[#55795]
* Reduce contention in CacheFile.fileLock() method {es-pull}55662[#55662]
* Require soft deletes for searchable snapshots {es-pull}55453[#55453]
* Searchable Snapshots should respect max_restore_bytes_per_sec {es-pull}55952[#55952]
* Update the HDFS version used by HDFS Repo {es-pull}53693[#53693]
* Use streaming reads for GCS {es-pull}55506[#55506] (issue: {es-issue}55505[#55505])
* Use workers to warm cache parts {es-pull}55793[#55793] (issue: {es-issue}55322[#55322])
Task Management::
* Add indexName in update-settings task name {es-pull}55714[#55714]
* Add scroll info to search task description {es-pull}54606[#54606]
* Broadcast cancellation only to nodes that have outstanding child tasks {es-pull}54312[#54312] (issues: {es-issue}50990[#50990], {es-issue}51157[#51157])
* Support hierarchical task cancellation {es-pull}54757[#54757] (issue: {es-issue}50990[#50990])
Transform::
* Add throttling {es-pull}56007[#56007] (issue: {es-issue}54862[#54862])
[[bug-7.8.0]]
[discrete]
=== Bug fixes
Aggregations::
* Add analytics plugin usage stats to _xpack/usage {es-pull}54911[#54911] (issue: {es-issue}54847[#54847])
* Aggregation support for Value Scripts that change types {es-pull}54830[#54830] (issue: {es-issue}54655[#54655])
* Allow terms agg to default to depth first {es-pull}54845[#54845]
* Clean up how pipeline aggs check for multi-bucket {es-pull}54161[#54161] (issue: {es-issue}53215[#53215])
* Fix auto_date_histogram serialization bug {es-pull}54447[#54447] (issues: {es-issue}54382[#54382], {es-issue}54429[#54429])
* Fix error message for unknown value type {es-pull}55821[#55821] (issue: {es-issue}55727[#55727])
* Fix scripted metric in CCS {es-pull}54776[#54776] (issue: {es-issue}54758[#54758])
* Use Decimal formatter for Numeric ValuesSourceTypes {es-pull}54366[#54366] (issue: {es-issue}54365[#54365])
Allocation::
* Fix Broken ExistingStoreRecoverySource Deserialization {es-pull}55657[#55657] (issue: {es-issue}55513[#55513])
Features/ILM+SLM::
* ILM stop step execution if writeIndex is false {es-pull}54805[#54805]
Features/Indices APIs::
* Fix NPE in MetadataIndexTemplateService#findV2Template {es-pull}54945[#54945]
* Fix failure when creating a filtered alias using `now` in a date_nanos range query {es-pull}54785[#54785] (issue: {es-issue}54315[#54315])
* Fix simulating index templates without specified index {es-pull}56295[#56295] (issues: {es-issue}53101[#53101], {es-issue}56255[#56255])
* Validate non-negative priorities for V2 index templates {es-pull}56139[#56139] (issue: {es-issue}53101[#53101])
Features/Watcher::
* Ensure watcher email action message ids are always unique {es-pull}56574[#56574]
Infra/Core::
* Add generic Set support to streams {es-pull}54769[#54769] (issue: {es-issue}54708[#54708])
Machine Learning::
* Fix GET _ml/inference so size param is respected {es-pull}57303[#57303] (issue: {es-issue}57298[#57298])
* Fix file structure finder multiline merge max for delimited formats {es-pull}56023[#56023]
* Validate at least one feature is available for DF analytics {es-pull}55876[#55876] (issue: {es-issue}55593[#55593])
* Trap and fail if insufficient features are supplied to data frame analyses.
Otherwise, classification and regression got stuck at zero analyzing progress
{ml-pull}1160[#1160] (issue: {es-issue}55593[#55593])
* Make categorization respect the model_memory_limit {ml-pull}1167[#1167]
(issue: {ml-issue}1130[#1130])
* Respect user overrides for max_trees for classification and regression
{ml-pull}1185[#1185]
* Reset memory status from soft_limit to ok when pruning is no longer required
{ml-pull}1193[#1193] (issue: {ml-issue}1131[#1131])
* Fix restore from training state for classification and regression
{ml-pull}1197[#1197]
* Improve the initialization of seasonal components for anomaly detection
{ml-pull}1201[#1201] (issue: {ml-issue}1178[#1178])
Network::
* Fix issue with pipeline releasing bytes early {es-pull}54458[#54458]
* Handle TLS file updates during startup {es-pull}54999[#54999] (issue: {es-issue}54867[#54867])
SQL::
* Fix DATETIME_PARSE behaviour regarding timezones {es-pull}56158[#56158] (issue: {es-issue}54960[#54960])
Search::
* Don't expand default_field in query_string before required {es-pull}55158[#55158] (issue: {es-issue}53789[#53789])
* Fix `time_zone` for `query_string` and date fields {es-pull}55881[#55881] (issue: {es-issue}55813[#55813])
Security::
* Fix certutil http for empty password with JDK 11 and lower {es-pull}55437[#55437] (issue: {es-issue}55386[#55386])
Transform::
* Fix count when matching exact ids {es-pull}56544[#56544] (issue: {es-issue}56196[#56196])
* Fix http status code when bad scripts are provided {es-pull}56117[#56117] (issue: {es-issue}55994[#55994])
[[regression-7.8.0]]
[discrete]
=== Regressions
Infra/Scripting::
* Don't double-wrap expression values {es-pull}54432[#54432] (issue: {es-issue}53661[#53661])