35 Commits

Author SHA1 Message Date
Nik Everett
b99a50bcb9
value_count Aggregation optimization (backport of #54854) (#55076)
We found a performance problem during testing.

Data: 200 million docs, 1 shard, 0 replicas

    hits    |   avg   |   sum   | value_count |
----------- | ------- | ------- | ----------- |
     20,000 |   .038s |   .033s |       .063s |
    200,000 |   .127s |   .125s |       .334s |
  2,000,000 |   .789s |   .729s |      3.176s |
 20,000,000 |  4.200s |  3.239s |     22.787s |
200,000,000 | 21.000s | 22.000s |    154.917s |

The performance of `avg`, `sum`, and the other metric aggregations is
very close, but `value_count` has always been much slower, not even
within the same order of magnitude. Since `value_count` and `sum` are
similar operations, we expected them to take about the same time, so we
took a closer look at the `value_count` aggregation.

Elasticsearch counts by traversing the field of each document. If the
field holds a single value, the count is increased by 1. If it holds an
array, the count is increased by n, the number of values in the array.
The problem lies in loading the field for each document, which turns
on-disk data into Java objects. We summarize the current problems in
Elasticsearch as:

- Overhead from converting numbers to strings, plus GC pressure caused
  by the large number of strings created
- After a number is converted to a string, sorting and other
  unnecessary operations are performed
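
The counting rule and the conversion cost described above can be sketched as follows. This is a minimal illustration, not the Elasticsearch code: the class `ValueCountSketch`, its method names, and the use of `long[]` as a stand-in for a document's field values are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class ValueCountSketch {
    // Slow path (illustrative): every value is turned into a String
    // before it is counted, so each value costs one allocation.
    public static long countViaStrings(List<long[]> docs) {
        long count = 0;
        for (long[] values : docs) {
            for (long v : values) {
                String s = Long.toString(v); // discarded right away, becomes garbage
                count++;
            }
        }
        return count;
    }

    // Direct path (illustrative): only the number of values per
    // document matters, so no conversion is needed at all.
    public static long countDirect(List<long[]> docs) {
        long count = 0;
        for (long[] values : docs) {
            count += values.length; // 1 for a single value, n for an array
        }
        return count;
    }

    public static void main(String[] args) {
        List<long[]> docs = new ArrayList<>();
        docs.add(new long[] {7});        // single value: +1
        docs.add(new long[] {1, 2, 3});  // array: +3
        docs.add(new long[] {});         // missing field: +0
        System.out.println(countViaStrings(docs));
        System.out.println(countDirect(docs));
    }
}
```

Both methods return the same count; the difference is only the per-value string allocation in the first one.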

Here is evidence of the type conversion overhead.

```
// Java long to string source code, getChars is very time-consuming.
public static String toString(long i) {
    int size = stringSize(i);
    if (COMPACT_STRINGS) {
        byte[] buf = new byte[size];
        getChars(i, size, buf);
        return new String(buf, LATIN1);
    } else {
        byte[] buf = new byte[size * 2];
        StringUTF16.getChars(i, size, buf);
        return new String(buf, UTF16);
    }
}
```

  test type  | average |  min |   max   |  sum
------------ | ------- | ---- | ------- | ------
double->long |  32.2ns | 28ns | 0.024ms |  3.22s
long->double |  31.9ns | 28ns | 0.036ms |  3.19s
long->String | 163.8ns | 93ns |  1921ms | 16.3s

The overhead of the `long->String` conversion is particularly serious.

Our optimization is actually very simple: handle the different types
separately instead of converting everything to strings for unified
processing. We added type detection to ValueCountAggregator and
special handling for the number and geopoint types that skips their
type conversion. Because far fewer strings and string constants are
created, the improvement is very noticeable.
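
A minimal sketch of the idea of per-type handling, assuming a hypothetical `ValueCounter` interface with one implementation per field type. The names `TypedValueCount`, `ValueCounter`, `forNumbers`, `forGeoPoints`, and `forKeywords`, and the array-based data layout, are invented for illustration and are not the types used in ValueCountAggregator.

```java
public class TypedValueCount {
    /** Hypothetical abstraction: how many values a document holds for the field. */
    public interface ValueCounter {
        int valuesIn(int doc);
    }

    // Numbers are counted as numbers; nothing is converted to a String.
    public static ValueCounter forNumbers(double[][] perDoc) {
        return doc -> perDoc[doc].length;
    }

    // Geo points (lat, lon pairs) are counted as points, again without conversion.
    public static ValueCounter forGeoPoints(double[][][] perDoc) {
        return doc -> perDoc[doc].length;
    }

    // Other types, such as keywords, stay on the string-based path.
    public static ValueCounter forKeywords(String[][] perDoc) {
        return doc -> perDoc[doc].length;
    }

    // The aggregation loop is the same for every type; only the counter differs.
    public static long count(ValueCounter counter, int docCount) {
        long total = 0;
        for (int doc = 0; doc < docCount; doc++) {
            total += counter.valuesIn(doc);
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] prices = {{1.5}, {2.0, 3.0}, {}};
        double[][][] locations = {{{48.1, 11.5}}, {{40.7, -74.0}, {51.5, -0.1}}};
        System.out.println(count(forNumbers(prices), prices.length));
        System.out.println(count(forGeoPoints(locations), locations.length));
    }
}
```

The type is chosen once, when the counter is created, so the per-document loop never inspects or converts individual values.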

    hits    |   avg   |   sum   | value_count | value_count | value_count | value_count | value_count | value_count |
            |         |         |    double   |    double   |   keyword   |   keyword   |  geo_point  |  geo_point  |
            |         |         |   before    |    after    |   before    |    after    |   before    |    after    |
----------- | ------- | ------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
     20,000 |   .038s |   .033s |       .063s |       .026s |       .030s |       .030s |       .038s |       .015s |
    200,000 |   .127s |   .125s |       .334s |       .078s |       .116s |       .099s |       .278s |       .031s |
  2,000,000 |   .789s |   .729s |      3.176s |       .439s |       .348s |       .386s |      3.365s |       .178s |
 20,000,000 |  4.200s |  3.239s |     22.787s |      2.700s |      2.500s |      2.600s |     25.192s |      1.278s |
200,000,000 | 21.000s | 22.000s |    154.917s |     18.990s |     19.000s |     20.000s |    168.971s |      9.093s |

- The results are more in line with common sense. `value_count` now
  takes about the same time as `avg`, `sum`, etc., or even less.
  Previously, `value_count` took much longer than `avg` and `sum`, and
  was not even within the same order of magnitude when the amount of
  data was large.
- When counting numeric types such as `double` and `long`, the
  performance is improved by about 8 to 9 times; when counting the
  `geo_point` type, the performance is improved by 18 to 20 times.
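
The speedup figures can be checked against the two largest rows of the table above by dividing each "before" time by the matching "after" time. The class `SpeedupCheck` below is a hypothetical helper written only for this check. The ratios come out at about 8.4 and 19.7 for 20,000,000 hits, and about 8.2 and 18.6 for 200,000,000 hits.

```java
import java.util.Locale;

public class SpeedupCheck {
    // Ratio of the "before" time to the "after" time, both in seconds.
    public static double speedup(double beforeSeconds, double afterSeconds) {
        return beforeSeconds / afterSeconds;
    }

    public static void main(String[] args) {
        // Timings copied from the 20,000,000 and 200,000,000 hit rows of the table.
        System.out.println(String.format(Locale.ROOT,
                "double, 20M hits:     %.1fx", speedup(22.787, 2.700)));
        System.out.println(String.format(Locale.ROOT,
                "double, 200M hits:    %.1fx", speedup(154.917, 18.990)));
        System.out.println(String.format(Locale.ROOT,
                "geo_point, 20M hits:  %.1fx", speedup(25.192, 1.278)));
        System.out.println(String.format(Locale.ROOT,
                "geo_point, 200M hits: %.1fx", speedup(168.971, 9.093)));
    }
}
```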
2020-04-10 13:16:39 -04:00
James Rodewig
7401191019
[DOCS] Include 7.7.0 release notes (#54529)
Includes the 7.7.0 release notes so they render in the HTML docs.

Also removes a few legacy `coming[7.6.0]` tags.
2020-03-31 16:23:49 -04:00
Jim Ferenczi
55f2e8bff0 [DOCS] Add 7.6.2 release notes (#53720)
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Co-authored-by: lcawl <lcawley@elastic.co>
2020-03-24 22:42:25 +01:00
Yannick Welsch
d1e7951e00 [DOCS] Add 7.6.1. release notes (#52874)
Adds the release notes for 7.6.1.
2020-03-04 15:47:54 +01:00
James Rodewig
2353fe47fc [DOCS] Adds placeholder for 7.5.2 release notes (#51124)
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
2020-01-16 14:42:24 -05:00
Zachary Tong
8f48c8d312 Add 7.6.0 release notes 2020-01-15 14:10:37 -05:00
taku333
65af0a0f0a [DOCS] Add 7.5.1 link to release notes overview (#51022) 2020-01-15 11:53:26 -05:00
jimczi
0e82b5f59b add release notes for 7.5.0 2019-11-12 09:59:14 +01:00
debadair
a876760848 [DOCS] Add placeholder for 7.4.2 release notes (#48724) 2019-10-30 16:09:29 -07:00
James Rodewig
e931fcd331 [DOCS] Add placeholder for 7.4.1 release notes (#48316) 2019-10-22 07:53:28 -05:00
Lisa Cawley
dea472c6fb [DOCS] Adds machine learning PRs to release notes (#46564) 2019-09-11 08:30:59 -07:00
Colin Goodheart-Smithe
57ac4f391b
Adds release notes for 7.4.0 2019-08-30 09:08:42 +01:00
James Rodewig
5dcc00a8b3 [DOCS] Add placeholder for 7.3.1 release notes (#45710) 2019-08-19 17:04:02 -04:00
Lisa Cawley
6c7f7d4a10 [DOCS] Adds ml-cpp PRs to release notes (#44354) 2019-07-15 09:22:36 -07:00
Adrien Grand
64ff895a32 Add 7.3 release notes. (#44010) 2019-07-10 09:36:51 +02:00
Jake Landis
51161a4b0e
add 7.2.0 release notes 2019-06-26 08:50:11 -05:00
Christian Kotzbauer
929215c0d5
Update release-notes.asciidoc (#42779) 2019-06-01 08:18:00 -04:00
Jake Landis
87bff89500
7.1.0 release notes forward port (#42252)
Forward port of #42208
2019-05-20 14:39:17 -04:00
James Rodewig
732ef15f0d [DOCS] Adds placeholder for 7.1.0 release notes (#42024) 2019-05-09 13:17:04 -04:00
lcawl
f4348843ba [DOCS] Adds placeholder for 7.0.0 release notes 2019-04-05 14:27:05 -07:00
lcawl
7aa3cf5445 [DOCS] Adds placeholder for 7.0.0-rc2 release notes 2019-04-01 12:02:13 -07:00
lcawl
15a1e65e48 [DOCS] Adds placeholder for 7.0.0-rc1 release notes 2019-03-22 11:34:50 -07:00
debadair
d9c255dbbf [DOCS] Added include and reference to beta1 RNs (#38905) 2019-02-14 07:43:54 -08:00
lcawl
55743aac47 [DOCS] Adds placeholder for alpha2 release notes 2018-12-11 14:26:41 -08:00
Lisa Cawley
949e4e9d1a
[DOCS] Synchronizes captialization in top-level titles (#33605) 2018-09-27 08:36:18 -07:00
Lee Hinman
cfad6688b0 Migrate migration docs from 6.0 to 7.0 (#26227)
* Migrate migration docs from 6.0 to 7.0

Since we only keep one version of migration docs and master is now on 7.0, we
should migrate these so breaking changes can be added in the right place.

* Remove release notes as well

They link to the migration guides, so they have to go.

* Add placeholder notes for 7.0 so doc build is happy
2017-08-16 13:12:44 -06:00
Clinton Gormley
8b9c201224 Added release notes for 6.0.0-alpha2 2017-06-06 11:52:18 +02:00
Clinton Gormley
0174119296 Added release notes for 6.0.0-alpha1 2017-05-05 12:39:50 +02:00
Clinton Gormley
1f11a5d93c Remove links to release notes 2016-09-08 18:07:39 +02:00
Clinton Gormley
3922392218 Added release notes for 5.0.0-alpha5 2016-07-29 14:43:31 +02:00
Clinton Gormley
e1ab3f16fd Add link to alpha4 release notes 2016-06-30 18:32:15 +02:00
Clinton Gormley
85bf48b4c1 Added release notes for 5.0.0-alpha3 2016-05-31 11:51:10 +02:00
Clinton Gormley
9fee8c76af Added release notes for 5.0.0-alpha2 2016-05-02 14:21:59 +02:00
Clinton Gormley
e07b6a2641 Docs: Added 5.0.0-alpha1 release notes 2016-03-18 14:51:49 +01:00
Clinton Gormley
9e0ca4a795 Updated the release-notes script to produce AsciiDoc and added placeholders 2015-11-20 20:05:53 +01:00