Commit Graph

51240 Commits

Author SHA1 Message Date
Nik Everett c00811f3a3
Make some agg tests easier to read (#54954) (#55079)
We added a fancy method to provide random realistic test data to the
reduction tests in #54910. This uses that to remove some of the more
esoteric machinations in the agg tests. This will marginally increase
the coverage of the serialiation tests and, more importantly, remove
some mysterious value generation code that only really made sense for
random reduction tests but was used all over the place. It doesn't, on
the other hand, make the tests shorter. Just *hopefully* more clear.

I only cleaned up a few tests this way. If we like this it'd probably be
worth grabbing others.
2020-04-10 14:15:30 -04:00
Nik Everett b99a50bcb9
value_count Aggregation optimization (backport of #54854) (#55076)
We found some problems during the test.

Data: 200Million docs, 1 shard, 0 replica

    hits    |   avg   |   sum   | value_count |
----------- | ------- | ------- | ----------- |
     20,000 |   .038s |   .033s |       .063s |
    200,000 |   .127s |   .125s |       .334s |
  2,000,000 |   .789s |   .729s |      3.176s |
 20,000,000 |  4.200s |  3.239s |     22.787s |
200,000,000 | 21.000s | 22.000s |    154.917s |

The performance of `avg`, `sum` and other is very close when performing
statistics, but the performance of `value_count` has always been poor,
even not on an order of magnitude. Based on some common-sense knowledge,
we think that `value_count` and sum are similar operations, and the time
consumed should be the same. Therefore, we have discussed the agg
of `value_count`.

The principle of counting in es is to traverse the field of each
document. If the field is an ordinary value, the count value is
increased by 1. If it is an array type, the count value is increased
by n. However, the problem lies in traversing each document and taking
out the field, which changes from disk to an object in the Java
language. We summarize its current problems with Elasticsearch as:

- Number cast to string overhead, and GC problems caused by a large
  number of strings
- After the number type is converted to string, sorting and other
  unnecessary operations are performed

Here is the proof of type conversion overhead.

```
// Java long to string source code, getChars is very time-consuming.
public static String toString(long i) {
        int size = stringSize(i);
        if (COMPACT_STRINGS) {
            byte[] buf = new byte[size];
            getChars(i, size, buf);
            return new String(buf, LATIN1);
        } else {
            byte[] buf = new byte[size * 2];
            StringUTF16.getChars(i, size, buf);
            return new String(buf, UTF16);
        }
}
```

  test type  | average |  min |     max     |   sum
------------ | ------- | ---- | ----------- | -------
double->long |  32.2ns | 28ns |     0.024ms |  3.22s
long->double |  31.9ns | 28ns |     0.036ms |  3.19s
long->String | 163.8ns | 93ns |  1921    ms | 16.3s

particularly serious.

Our optimization code is actually very simple. It is to manage different
types separately, instead of uniformly converting to string unified
processing. We added type identification in ValueCountAggregator, and
made special treatment for number and geopoint types to cancel their
type conversion. Because the string type is reduced and the string
constant is reduced, the improvement effect is very obvious.

    hits    |   avg   |   sum   | value_count | value_count | value_count | value_count | value_count | value_count |
            |         |         |    double   |    double   |   keyword   |   keyword   |  geo_point  |  geo_point  |
            |         |         |   before    |    after    |   before    |    after    |   before    |    after    |
----------- | ------- | ------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
     20,000 |     38s |   .033s |       .063s |       .026s |       .030s |       .030s |       .038s |       .015s |
    200,000 |    127s |   .125s |       .334s |       .078s |       .116s |       .099s |       .278s |       .031s |
  2,000,000 |    789s |   .729s |      3.176s |       .439s |       .348s |       .386s |      3.365s |       .178s |
 20,000,000 |  4.200s |  3.239s |     22.787s |      2.700s |      2.500s |      2.600s |     25.192s |      1.278s |
200,000,000 | 21.000s | 22.000s |    154.917s |     18.990s |     19.000s |     20.000s |    168.971s |      9.093s |

- The results are more in line with common sense. `value_count` is about
  the same as `avg`, `sum`, etc., or even lower than these. Previously,
  `value_count` was much larger than avg and sum, and it was not even an
  order of magnitude when the amount of data was large.
- When calculating numeric types such as `double` and `long`, the
  performance is improved by about 8 to 9 times; when calculating the
  `geo_point` type, the performance is improved by 18 to 20 times.
2020-04-10 13:16:39 -04:00
Mark Vieira 38590c83f0
Update opensuse 15.1 os identifier 2020-04-10 10:04:53 -07:00
Luca Cavanna 93c39ad4e7 Async search: create internal index only before storing initial response (#54619)
We currently create the .async-search index if necessary before performing any action (index, update or delete). Truth is that this is needed only before storing the initial response. The other operations are either update or delete, which will anyways not find the document to update/delete even if the index gets created when missing. This also caused `testCancellation` failures as we were trying to delete the document twice from the .async-search index, once from `TransportDeleteAsyncSearchAction` and once as a consequence of the search task being completed. The latter may be called after the test is completed, but before the cluster is shut down and causing problems to the after test checks, for instance if it happens after all the indices have been cleaned up. It is totally fine to try to delete a response that is no longer found, but not quite so if such call will also trigger an index creation.

With this commit we remove all the calls to createIndexIfNecessary from the update/delete operation, and we leave one call only from storeInitialResponse which is where the index is expected to be created.

Closes #54180
2020-04-10 18:24:05 +02:00
Tim Brooks 98fba92022
Fail sniff process if no connections opened (#54934)
Currently the remote cluster sniff connection process can succeed even
if no connections are opened. This commit fixes this by failing the
connection process if no connections are successfully opened.
2020-04-10 10:06:45 -06:00
Ross Wolf 96a903b17f
EQL: Add string function (#54470)
* EQL: Add string() function
* EQL: Reorder queryfolder_tests
* EQL: Add test queries
* EQL: Fix InternalEqlScriptUtils.string and test case
* EQL: Fix testStringFunctionWithText error message
* EQL: Flatten ToStringFunctionPipe.equals
* EQL: Reorder painless whitelist
* EQL: Address feedback and remove string(null) handling
* EQL: Move string(pid) test over
* EQL: Rename source -> value
2020-04-10 09:48:29 -06:00
Jim Ferenczi d14ed34577 Explicitly test rewrite of date histogram's time zones on date_nanos (#54402)
This commit adds an explicit test of time zone rewrite on date nanos
field. Today this is working but we need tests to ensure that we don't
break it unintentionally.
2020-04-10 17:37:59 +02:00
Igor Motov da976d247f
Improve robustness of Query Result serializations (#54692) (#55028)
Makes query result serialization more robust by propagating possible
IOExceptions that can occur during shard level result serialization to the
caller instead of throwing AssertionError that is not intercepted.

Fixes #54665
2020-04-10 10:29:01 -04:00
Przemysław Witek 17101d86d9
[7.x] Do not execute ML CRUD actions when upgrade mode is enabled (#54437) (#55049) 2020-04-10 16:07:11 +02:00
James Rodewig c440754784 [DOCS] EQL: Document `wildcard` function (#54086) 2020-04-10 09:18:29 -04:00
oneoneonepig 356cc94889 [DOCS] Fix double quote typo in 7.0 breaking changes (#55040) 2020-04-10 09:11:51 -04:00
Dimitrios Liappis b062535e27
Mute testSearchableSnapshotAction in TimeSeriesLifecycleActions tests (#55055)
Backport of #55052
Details in #55050
2020-04-10 16:03:09 +03:00
Jason Tedor a370668fcc
Clean up even more instances of "metaData"
We recently cleaned up the use of the word "metadata" across the
codebase. Even more additional uses have trickled in, likely from
in-progress work. This commit cleans up these last few additional
instances.

Relates #54519
2020-04-10 08:52:37 -04:00
Jason Tedor 9eeae59a83
Clarify available processors (#54907)
The use of available processors, the terminology, and the settings
around it have evolved over time. This commit cleans up some places in
the codes and in the docs to adjust to the current terminology.
2020-04-10 08:48:27 -04:00
James Rodewig 51326432be [DOCS] Add query reference docs template (#52292) 2020-04-10 08:47:54 -04:00
James Rodewig d5a609a2e5 [DOCS] Add token filter reference docs template (#52290)
Creates a reusable template for token filter reference documentation.

Contributors can make a copy of this template and customize it when
documenting new token filters.
2020-04-10 08:45:10 -04:00
Przemko Robakowski 35c195b224
Prevent putting V2 index template when overlapping with existing template (#54933) (#55042)
* Prevent putting V2 index template when overlapping with existing template

This change prevents putting V2 index template when it would overlap with existing V2 template
of the same priority

Relates to #53101
2020-04-10 10:31:37 +02:00
Costin Leau a7e4f79e8f EQL: Deprecate lenient sequence declaration (#55032)
Deprecate alternative sequence parameter declaration (with then by)
Disallow lack of time units inside maxspan

Fix #55023
Relate #54680

(cherry picked from commit 201adafba9def1de4bf843760defb9def3394f63)
2020-04-10 10:30:07 +03:00
Marios Trivyzas bf0cadb602
SQL: Implement DATETIME_PARSE function for parsing strings (#54960) (#55035)
Implement DATETIME_PARSE(<datetime_str>, <pattern_str>) function
which allows to parse a datetime string according to the specified
pattern into a datetime object. The patterns allowed are those of
java.time.format.DateTimeFormatter.

Relates to #53714

(cherry picked from commit 3febcd8f3cdf9fdda4faf01f23a5f139f38b57e0)
2020-04-10 01:16:29 +02:00
Vishal Patel 51cb0c5c7b [DOCS] Collapse nested objects in cluster reroute docs (#54851) 2020-04-09 15:29:22 -04:00
Mark Vieira 12f056b833
Update IDE integration to reflect Java 14 requirement (#54990) 2020-04-09 12:27:57 -07:00
Nik Everett 62d6bc31bf
Reduce memory for big aggs run against many shards (#54758) (#55024)
This changes the behavior of aggregations when search is performed
against enough shards to enable "batch reduce" mode. In this case we
force always store aggregations in serialized form rather than a
traditional java reference. This should shrink the memory usage of large
aggregations at the cost of slightly slowing down aggregations where the
coordinating node is also a data node. Because we're only doing this
when there are many shards this is likely to be fairly rare.

As a side effect this lets us add logs for the memory usage of the aggs
buffer:
```
[2020-04-03T17:03:57,052][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [1320->448] max [1320]
[2020-04-03T17:03:57,089][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [1328->448] max [1328]
[2020-04-03T17:03:57,102][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [1328->448] max [1328]
[2020-04-03T17:03:57,103][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [1328->448] max [1328]
[2020-04-03T17:03:57,105][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs final reduction [888] max [1328]
```

These are useful, but you need to keep some things in mind before
trusting them:
1. The buffers are oversized ala Lucene's ArrayUtils. This means that we
   are using more space than we need, but probably not much more.
2. Before they are merged the aggregations are inflated into their
   traditional Java objects which *probably* take up a lot more space
   than the serialized form. That is, after all, the reason why we store
   them in serialized form in the first place.

And, just because I can, here is another example of the log:
```
[2020-04-03T17:06:18,731][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [147528->49176] max [147528]
[2020-04-03T17:06:18,750][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [147528->49176] max [147528]
[2020-04-03T17:06:18,809][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [147528->49176] max [147528]
[2020-04-03T17:06:18,827][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs partial reduction [147528->49176] max [147528]
[2020-04-03T17:06:18,829][TRACE][o.e.a.s.SearchPhaseController] [runTask-0] aggs final reduction [98352] max [147528]
```

I got that last one by building a ten shard index with a million docs in
it and running a `sum` in three layers of `terms` aggregations, all on
`long` fields, and with a `batched_reduce_size` of `3`.
2020-04-09 14:58:42 -04:00
Julie Tibshirani 850ea7c0be Correct the name of the docvalues_fields object parser. 2020-04-09 11:36:28 -07:00
Nhat Nguyen c9f8fb2dd0 Clear recent errors when auto-follow successfully (#54997)
Today, we do not clear the recent errors in AutoFollowCoordinator when 
we successfully auto-follow indices. This can lead to confusion for the
operators.
2020-04-09 14:35:16 -04:00
István Zoltán Szabó 374f633b6e [DOCS] Adds link points to the data frame analytics supported fields (#55004)
Co-authored-by: lcawl <lcawley@elastic.co>
2020-04-09 11:27:57 -07:00
Nik Everett 83c328f125
Deprecate serializing PipelineAggregators (#54926) (#55025)
`PipelineAggregator`s are only sent across the wire for backwards
compatibility with 7.7.0. `PipelineAggregator` needs to continue to
implement `NamedWriteable` for backwards compatibility but pipeline
aggregations created after 7.7.0 need not implement any of the methods
in that interface because we'll never attempt to call them. So this
creates implementations in `PipelineAggregator` (the base class) that
just throw exceptions.
2020-04-09 14:13:47 -04:00
Martijn van Groningen 7f38b146b3
Temporarily preserve data streams after each yaml rest test has executed. (#54959) (#55007)
Instead delete the data streams manually, until client yaml test runners
have been updated to also delete all data streams after each yaml test.

Relates to #53100
2020-04-09 14:44:57 +02:00
Albert Zaharovits f55a361b64
Preserve Task Id for ML Datafeed (#54943)
This change preserves the task id for internal requests for the `StartDatafeedPersistentTask`.

Task ids are a way to express a relationship between related internal requests.
In this particular case, the task ids are used for debugging and (soon) security auditing,
but not for task cancellation, because there is already a graceful-shutdown of child
internal requests (given a task id) in place.
2020-04-09 13:22:29 +03:00
Przemko Robakowski adc6e880cf
Fix NPE in MetadataIndexTemplateService#findV2Template (#54945) (#55001)
This commit fixes potential NPE when there's V2 template with `null` priority.
This is done by using `null`-safe comparator.
2020-04-09 11:34:20 +02:00
Armin Braun f6bdd30165
Fix S3 Blob Container Retries Test Range Handling (#55000) (#55002)
The ranges in HTTP headers are using inclusive values for start and end of the range.
The math we used was off in so far that start equals end for the range resulted in length `0`
instead of the correct value of `1`.
Closes #54981
Closes #54995
2020-04-09 10:58:42 +02:00
Hendrik Muhs 223fbb2ae7 [Transform] fix sporadic test failure due to unavailable notif… (#54939)
move no initializing shards check before dumping audit messages

fixes #54810
2020-04-09 08:04:42 +02:00
Przemko Robakowski afa3467957
[7.x] HLRC support for Index Templates V2 (#54838) (#54932)
* HLRC support for Index Templates V2 (#54838)

* HLRC support for Index Templates V2

This change adds High Level Rest Client support for Index Templates V2.

Relates to #53101

* fixed compilation error

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-04-09 07:43:13 +02:00
Mark Vieira dd73a14d11
Improve total build configuration time (#54611) (#54994)
This commit includes a number of changes to reduce overall build
configuration time. These optimizations include:

- Removing the usage of the 'nebula.info-scm' plugin. This plugin
   leverages jgit to load read various pieces of VCS information. This
   is mostly overkill and we have our own minimal implementation for
   determining the current commit id.
- Removing unnecessary build dependencies such as perforce and jgit
   now that we don't need them. This reduces our classpath considerably.
- Expanding the usage lazy task creation, particularly in our
   distribution projects. The archives and packages projects create
   lots of tasks with very complex configuration. Avoiding the creation
   of these tasks at configuration time gives us a nice boost.
2020-04-08 16:47:02 -07:00
Mark Vieira ac6d1f7b24
Mute S3BlobContainerRetriesTests.testReadRangeBlobWithRetries 2020-04-08 16:45:38 -07:00
Andrei Stefan 85f129a50a
EQL: indexOf function implementation (#54543) (#54989)
(cherry picked from commit a4b1d6e52d9ba22d541dd86d69861b1efee83604)
2020-04-09 02:41:01 +03:00
Lee Hinman 1f17df13c1 Bump minimum version for component template CRUD test (#54992)
These tests do CRUD for component templates, however, for 7.7 some changes weren't backported in the
`_doc` wrapping/unwrapping done for the APIs, this can cause test failures.

This bumps the minimum version for these tests to 7.8, which is okay because component templates are
hidden behind a flag and have no compatibility guarantees for 7.7.

Relates to #53101
2020-04-08 16:39:46 -06:00
Mark Vieira 1552f2fa3e
Enable searchable snapshots for release tests (#54987) 2020-04-08 14:41:03 -07:00
Jake Landis 2b970e2a8d
[7.x] Allow different source sets from forbiddenApis (#54731) (#54983)
ForbiddenApis task via the precommit task currently makes an assumption
that only the test and main source sets are present for any given project.
This commit removes that assumption and allows for any project source set's
compileClasspath class path to be added to the forbiddenApis classpath
configuration.
2020-04-08 16:31:04 -05:00
James Rodewig c6cd8ca7c0
[DOCS] Update upgrade docs for 7.7 (#54978) 2020-04-08 16:23:08 -04:00
Dan Hermann c7f9a27d2d
Delete backing indices with data stream (#54693) (#54976) 2020-04-08 15:18:12 -05:00
Mark Vieira 264bfaca56
Mute S3BlobContainerRetriesTests.testReadBlobWithPrematureConnectionClose 2020-04-08 13:05:35 -07:00
Mark Vieira 0fa8a14bcb
Mute SamlServiceProviderDocumentTests.testStreamRoundTripWithAllFields 2020-04-08 12:56:36 -07:00
Lee Hinman 3b879b0821
[7.x] Use V2 templates when reading duplicate aliases and inge… (#54973)
When a new index is rolled over, we check to see whether there are any duplicate alias
configurations in the index template configuration. Additionally, when a new index is created from a
bulk action, we check the templates to see if there are any ingest pipelines that need to be applied
to the index that will be newly created.

Both of these actions previously checked the v1 templates for their settings, they now also check
the v2 index templates, with the v2 index templates taking precendence similar to the way they do
when creating an index.

Relates to #53101
2020-04-08 13:33:14 -06:00
Armin Braun 411dc2f607
Fix Broken Math in S3 Retries Tests (#54952) (#54972)
If we run into `length == 0` we trip an assertion in `randomIntBetween(0, length -1)`.
2020-04-08 20:32:21 +02:00
James Rodewig 964cf565c9
[DOCS] EQL: Document `between` function (#54950) 2020-04-08 13:49:15 -04:00
Lee Hinman c2c0707174
[7.x] Add allowed warnings to index template composition tests… (#54961)
We occasionally add a global template for our YAML tests, and this can cause warnings for these
template tests. This commit adds these warnings so they don't cause test failures.

Resolves #54822

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-04-08 10:17:32 -06:00
Jason Tedor a5391c915e
Set JAVA14_HOME in CI (#54955)
This commit updates the CI defaults so that JAVA14_HOME is set.
2020-04-08 11:09:50 -04:00
Jay Modi 3600c9862f
Reintroduce system index APIs for Kibana (#54935)
This change reintroduces the system index APIs for Kibana without the
changes made for marking what system indices could be accessed using
these APIs. In essence, this is a partial revert of #53912. The changes
for marking what system indices should be allowed access will be
handled in a separate change.

The APIs introduced here are wrapped versions of the existing REST
endpoints. A new setting is also introduced since the Kibana system
indices' names are allowed to be changed by a user in case multiple
instances of Kibana use the same instance of Elasticsearch.

Relates #52385
Backport of #54858
2020-04-08 09:08:49 -06:00
Bogdan Pintea 8d6d7b88d8
SQL: drop BASE TABLE type in favour for just TABLE (#54836) (#54951)
* Drop BASE TABLE type in favour for just TABLE

This commit drops the table type 'BASE TABLE' and replaces all
occurences with just 'TABLE', since his type is wider-used and
friendlier to the client applications that query for certain table types
in their discovery mode.

The 'TABLE' type is also explicitely mentioned by the JDBC and ODBC
standards and although other data source-specific types are permitted,
older apps will not work well with them.

* Refactor table type constants out of IndexType

Move SQL_TABLE/_ALIAS out of IndexType, so that they can also be used in
that Enum definition.

(cherry picked from commit 70241b52697ac2cf71004040042123c1ec050299)
2020-04-08 16:02:12 +02:00
Jason Tedor 6853d73e88
Require JDK 14 for compilation (#54696)
This commit bumps the minimum JDK required for compilation to JDK 14.
2020-04-08 09:26:29 -04:00