Commit Graph

36 Commits

Author SHA1 Message Date
Lisa Cawley b5542e0480
[DOCS] Adds ml-cpp PRs to release notes (#57444) 2020-06-01 09:56:32 -07:00
Nik Everett b99a50bcb9
value_count Aggregation optimization (backport of #54854) (#55076)
We found a performance problem with `value_count` during testing.

Data: 200 million docs, 1 shard, 0 replicas

    hits    |   avg   |   sum   | value_count |
----------- | ------- | ------- | ----------- |
     20,000 |   .038s |   .033s |       .063s |
    200,000 |   .127s |   .125s |       .334s |
  2,000,000 |   .789s |   .729s |      3.176s |
 20,000,000 |  4.200s |  3.239s |     22.787s |
200,000,000 | 21.000s | 22.000s |    154.917s |

The performance of `avg`, `sum`, and other metrics is very close, but
`value_count` has always been much slower: at 20 million and 200 million
hits it takes about seven times as long as `sum`. Intuitively,
`value_count` and `sum` are similar operations and should take about the
same time, so we investigated the `value_count` aggregation.

To count values, Elasticsearch iterates over the field of each
document. A single value increases the count by 1; an array of n values
increases it by n. The problem lies in how each document's field is
read: the values are loaded from disk and turned into Java objects. We
summarize the current problems as:

- Numbers are cast to strings, which adds overhead, and the large
  number of strings created causes GC pressure
- After numbers are converted to strings, sorting and other unnecessary
  operations are performed on them
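The counting rule described above, together with the string conversion it currently goes through, can be modeled in a few lines. This is a hypothetical illustration, not Elasticsearch code: each document's field is assumed to be a `long[]` (one element for a single value, n elements for an array, none if missing), and the sort only mimics the unnecessary work done on the string form.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the generic (string-based) value_count path.
// Illustration only, not Elasticsearch code.
class GenericValueCount {

    // Each inner array holds one document's values for the field.
    static long count(long[][] docs) {
        long count = 0;
        for (long[] values : docs) {
            // Every value becomes a new String: this is the allocation
            // and GC pressure described above.
            List<String> asStrings = new ArrayList<>(values.length);
            for (long v : values) {
                asStrings.add(Long.toString(v));
            }
            // Mimics the unnecessary sort applied to the string form.
            Collections.sort(asStrings);
            // A single value adds 1; an array of n values adds n.
            count += asStrings.size();
        }
        return count;
    }

    public static void main(String[] args) {
        long[][] docs = {
            {42L},        // single value: +1
            {3L, 1L, 2L}, // array of three values: +3
            {}            // field missing: +0
        };
        System.out.println(count(docs)); // prints 4
    }
}
```

The count itself only needs the number of values per document; everything else in the inner loop is work that a type-aware path can skip.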

Here is evidence of the type conversion overhead.

```java
// JDK source of Long.toString(long). getChars is time-consuming, and every
// call also allocates a new byte[] and a new String.
public static String toString(long i) {
    int size = stringSize(i);
    if (COMPACT_STRINGS) {
        byte[] buf = new byte[size];
        getChars(i, size, buf);
        return new String(buf, LATIN1);
    } else {
        byte[] buf = new byte[size * 2];
        StringUTF16.getChars(i, size, buf);
        return new String(buf, UTF16);
    }
}
```

  test type  | average |  min |     max     |   sum
------------ | ------- | ---- | ----------- | -------
double->long |  32.2ns | 28ns |     0.024ms |  3.22s
long->double |  31.9ns | 28ns |     0.036ms |  3.19s
long->String | 163.8ns | 93ns |  1921    ms | 16.3s

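The `long`-to-`String` conversion is particularly serious: its average
cost is about five times that of the numeric casts.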
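A rough harness for comparing the two kinds of conversion on your own machine might look like the following. It is a hypothetical sketch (class and method names are made up) and is not the harness that produced the table above. Naive `System.nanoTime()` loops are easily distorted by JIT warm-up and dead-code elimination, so a tool such as JMH is preferable for real measurements; each loop here returns a checksum so the conversions cannot simply be optimized away.

```java
// Hypothetical, rough timing harness; single-shot timings like these are
// unreliable and only indicate the order of magnitude.
class ConversionCost {

    // long -> String: allocates a new String per value. Returns the total
    // string length so the JIT cannot discard the conversions.
    static long longToString(int n) {
        long totalLength = 0;
        for (int i = 0; i < n; i++) {
            totalLength += Long.toString(i).length();
        }
        return totalLength;
    }

    // double -> long: a primitive cast with no allocation.
    static long doubleToLong(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            double d = i;
            sum += (long) d;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 10_000_000;

        long start = System.nanoTime();
        long castChecksum = doubleToLong(n);
        long castNanos = System.nanoTime() - start;

        start = System.nanoTime();
        long stringChecksum = longToString(n);
        long stringNanos = System.nanoTime() - start;

        System.out.printf("double->long: %.1f ns/op (checksum %d)%n",
                (double) castNanos / n, castChecksum);
        System.out.printf("long->String: %.1f ns/op (checksum %d)%n",
                (double) stringNanos / n, stringChecksum);
    }
}
```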

Our optimization is actually very simple: handle each type separately
instead of converting every value to a string and processing it
uniformly. We added type detection to `ValueCountAggregator` and
special-cased numeric and `geo_point` fields so that their values are no
longer converted. Because far fewer strings are created, the improvement
is substantial.
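The type-specific approach can be sketched under simplifying assumptions. For a numeric field the aggregator does not need the values themselves, only how many each document has, so it can add that number directly. `NumericValuesSketch`, `ArrayNumericValues`, and `TypedValueCount` are hypothetical names; the two interface methods loosely mirror `advanceExact(int)` and `docValueCount()` on Lucene's `SortedNumericDocValues`, but this is a simplified stand-in (no `IOException` handling), not the actual `ValueCountAggregator` change. The same idea applies to `geo_point` values.

```java
// Hypothetical, simplified stand-in for per-segment numeric doc values.
// Not the Lucene or Elasticsearch API.
interface NumericValuesSketch {
    // Positions on doc; returns false if the doc has no value for the field.
    boolean advanceExact(int doc);

    // Number of values the current doc has.
    int docValueCount();
}

// Hypothetical in-memory implementation backed by one long[] per document.
class ArrayNumericValues implements NumericValuesSketch {
    private final long[][] docs;
    private int current = -1;

    ArrayNumericValues(long[][] docs) {
        this.docs = docs;
    }

    public boolean advanceExact(int doc) {
        current = doc;
        return docs[doc].length > 0;
    }

    public int docValueCount() {
        return docs[current].length;
    }
}

class TypedValueCount {
    // Numeric path: add each document's value count directly, without
    // materializing, converting, or sorting any value.
    static long countNumeric(NumericValuesSketch values, int maxDoc) {
        long count = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (values.advanceExact(doc)) {
                count += values.docValueCount();
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[][] docs = {{42L}, {3L, 1L, 2L}, {}};
        System.out.println(countNumeric(new ArrayNumericValues(docs), docs.length)); // prints 4
    }
}
```

The result matches the string-based count, but the inner loop no longer allocates anything per value.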

    hits    |   avg   |   sum   | value_count | value_count | value_count | value_count | value_count | value_count |
            |         |         |    double   |    double   |   keyword   |   keyword   |  geo_point  |  geo_point  |
            |         |         |   before    |    after    |   before    |    after    |   before    |    after    |
----------- | ------- | ------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
     20,000 |   .038s |   .033s |       .063s |       .026s |       .030s |       .030s |       .038s |       .015s |
    200,000 |   .127s |   .125s |       .334s |       .078s |       .116s |       .099s |       .278s |       .031s |
  2,000,000 |   .789s |   .729s |      3.176s |       .439s |       .348s |       .386s |      3.365s |       .178s |
 20,000,000 |  4.200s |  3.239s |     22.787s |      2.700s |      2.500s |      2.600s |     25.192s |      1.278s |
200,000,000 | 21.000s | 22.000s |    154.917s |     18.990s |     19.000s |     20.000s |    168.971s |      9.093s |

- The results now match common sense: `value_count` takes about the same
  time as `avg`, `sum`, etc., or even less. Previously, `value_count` was
  far slower than `avg` and `sum`, and the gap widened as the amount of
  data grew.
- On the larger data sets, counting numeric types such as `double` and
  `long` is about 8 to 9 times faster, and counting `geo_point` is 18 to
  20 times faster.
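The speedups quoted above can be checked against the `value_count` columns of the table. This small sketch (class name hypothetical) recomputes the before/after ratios for the two largest data sets, where the stated ranges apply; at smaller sizes the ratios are lower (for example, .334s / .078s is about 4.3 for `double` at 200,000 hits).

```java
// Recomputes before/after speedups from the value_count table above.
class Speedup {

    static double ratio(double beforeSeconds, double afterSeconds) {
        return beforeSeconds / afterSeconds;
    }

    public static void main(String[] args) {
        // Columns: hits, double before, double after, geo_point before, geo_point after
        double[][] rows = {
            {20_000_000, 22.787, 2.700, 25.192, 1.278},
            {200_000_000, 154.917, 18.990, 168.971, 9.093},
        };
        for (double[] r : rows) {
            System.out.printf("%.0f hits: double %.1fx, geo_point %.1fx%n",
                    r[0], ratio(r[1], r[2]), ratio(r[3], r[4]));
        }
    }
}
```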
2020-04-10 13:16:39 -04:00
James Rodewig 7401191019
[DOCS] Include 7.7.0 release notes (#54529)
Includes the 7.7.0 release notes so they render in the HTML docs.

Also removes a few legacy `coming[7.6.0]` tags.
2020-03-31 16:23:49 -04:00
Jim Ferenczi 55f2e8bff0 [DOCS] Add 7.6.2 release notes (#53720)
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Co-authored-by: lcawl <lcawley@elastic.co>
2020-03-24 22:42:25 +01:00
Yannick Welsch d1e7951e00 [DOCS] Add 7.6.1. release notes (#52874)
Adds the release notes for 7.6.1.
2020-03-04 15:47:54 +01:00
James Rodewig 2353fe47fc [DOCS] Adds placeholder for 7.5.2 release notes (#51124)
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
2020-01-16 14:42:24 -05:00
Zachary Tong 8f48c8d312 Add 7.6.0 release notes 2020-01-15 14:10:37 -05:00
taku333 65af0a0f0a [DOCS] Add 7.5.1 link to release notes overview (#51022) 2020-01-15 11:53:26 -05:00
jimczi 0e82b5f59b add release notes for 7.5.0 2019-11-12 09:59:14 +01:00
debadair a876760848 [DOCS] Add placeholder for 7.4.2 release notes (#48724) 2019-10-30 16:09:29 -07:00
James Rodewig e931fcd331 [DOCS] Add placeholder for 7.4.1 release notes (#48316) 2019-10-22 07:53:28 -05:00
Lisa Cawley dea472c6fb [DOCS] Adds machine learning PRs to release notes (#46564) 2019-09-11 08:30:59 -07:00
Colin Goodheart-Smithe 57ac4f391b
Adds release notes for 7.4.0 2019-08-30 09:08:42 +01:00
James Rodewig 5dcc00a8b3 [DOCS] Add placeholder for 7.3.1 release notes (#45710) 2019-08-19 17:04:02 -04:00
Lisa Cawley 6c7f7d4a10 [DOCS] Adds ml-cpp PRs to release notes (#44354) 2019-07-15 09:22:36 -07:00
Adrien Grand 64ff895a32 Add 7.3 release notes. (#44010) 2019-07-10 09:36:51 +02:00
Jake Landis 51161a4b0e
add 7.2.0 release notes 2019-06-26 08:50:11 -05:00
Christian Kotzbauer 929215c0d5
Update release-notes.asciidoc (#42779) 2019-06-01 08:18:00 -04:00
Jake Landis 87bff89500
7.1.0 release notes forward port (#42252)
Forward port of #42208
2019-05-20 14:39:17 -04:00
James Rodewig 732ef15f0d [DOCS] Adds placeholder for 7.1.0 release notes (#42024) 2019-05-09 13:17:04 -04:00
lcawl f4348843ba [DOCS] Adds placeholder for 7.0.0 release notes 2019-04-05 14:27:05 -07:00
lcawl 7aa3cf5445 [DOCS] Adds placeholder for 7.0.0-rc2 release notes 2019-04-01 12:02:13 -07:00
lcawl 15a1e65e48 [DOCS] Adds placeholder for 7.0.0-rc1 release notes 2019-03-22 11:34:50 -07:00
debadair d9c255dbbf [DOCS] Added include and reference to beta1 RNs (#38905) 2019-02-14 07:43:54 -08:00
lcawl 55743aac47 [DOCS] Adds placeholder for alpha2 release notes 2018-12-11 14:26:41 -08:00
Lisa Cawley 949e4e9d1a
[DOCS] Synchronizes captialization in top-level titles (#33605) 2018-09-27 08:36:18 -07:00
Lee Hinman cfad6688b0 Migrate migration docs from 6.0 to 7.0 (#26227)
* Migrate migration docs from 6.0 to 7.0

Since we only keep one version of migration docs and master is now on 7.0, we
should migrate these so breaking changes can be added in the right place.

* Remove release notes as well

They link to the migration guides, so they have to go.

* Add placeholder notes for 7.0 so doc build is happy
2017-08-16 13:12:44 -06:00
Clinton Gormley 8b9c201224 Added release notes for 6.0.0-alpha2 2017-06-06 11:52:18 +02:00
Clinton Gormley 0174119296 Added release notes for 6.0.0-alpha1 2017-05-05 12:39:50 +02:00
Clinton Gormley 1f11a5d93c Remove links to release notes 2016-09-08 18:07:39 +02:00
Clinton Gormley 3922392218 Added release notes for 5.0.0-alpha5 2016-07-29 14:43:31 +02:00
Clinton Gormley e1ab3f16fd Add link to alpha4 release notes 2016-06-30 18:32:15 +02:00
Clinton Gormley 85bf48b4c1 Added release notes for 5.0.0-alpha3 2016-05-31 11:51:10 +02:00
Clinton Gormley 9fee8c76af Added release notes for 5.0.0-alpha2 2016-05-02 14:21:59 +02:00
Clinton Gormley e07b6a2641 Docs: Added 5.0.0-alpha1 release notes 2016-03-18 14:51:49 +01:00
Clinton Gormley 9e0ca4a795 Updated the release-notes script to produce AsciiDoc and added placeholders 2015-11-20 20:05:53 +01:00