Commit Graph

8257 Commits

Author SHA1 Message Date
Andrei Dan ef338ee3d4
ILM DOCS: mention forcemerge is best effort (#54794) (#55401)
(cherry picked from commit 3fd05435c52dd265dbe1a40104e7dc7a335d50ae)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-04-17 15:42:23 +01:00
James Rodewig f87a3f0c48 [DOCS] Document analysis/mapping response for cluster stats API (#55054)
PR #51260 moved usage counts about mapping field types and analysis to
the `_cluster/stats` API.

This documents those stats in the response section of the cluster stats
API docs.
2020-04-17 08:44:10 -04:00
Adrien Grand 0cb6a1f089
Document the index corruption bug that gets fixed via Lucene 8.5.1. (#55232)
Using soft deletes on shrunk indices may cause corruption.
2020-04-17 13:37:37 +02:00
markharwood 7761b01a33
Remove normalizer support from wildcard field while we decide on approach for handling case insensitvity (#55294) (#55375)
Closes #55288
2020-04-17 11:43:26 +01:00
Marios Trivyzas f958e9abdc
SQL: Implement scripting inside aggs (#55241) (#55371)
Implement the use of scalar functions inside aggregate functions.
This allows for complex expressions inside aggregations, with or without
GROUBY as well as with or without a HAVING clause. e.g.:

```
SELECT MAX(CASE WHEN a IS NULL then -1 ELSE abs(a * 10) + 1 END) AS max, b
FROM test
GROUP BY b
HAVING MAX(CASE WHEN a IS NULL then -1 ELSE abs(a * 10) + 1 END) > 5
```

Scalar functions are still not allowed for `KURTOSIS` and `SKEWNESS` as
this is currently not implemented on the ElasticSearch side.

Fixes: #29980
Fixes: #36865
Fixes: #37271

(cherry picked from commit 506d1beea7abb2b45de793bba2e349090a78f2f9)
2020-04-17 12:41:22 +02:00
Lisa Cawley c7cf6e621d [DOCS] Remove text fields from classification dependent variables (#54849) 2020-04-16 13:40:28 -07:00
Lisa Cawley cf5278f771 [DOCS] Add ml-cpp PRs to 7.7 release notes (#55264)
Co-Authored-By: David Roberts <dave.roberts@elastic.co>
2020-04-16 11:28:34 -07:00
Julie Tibshirani d7cded8d7a
Fix updating include_in_parent/include_in_root of nested field. (#55326)
The main changes are:
1. Throw an error when updating `include_in_parent` or `include_in_root` attribute of nested field dynamically by the PUT mapping API.
2. Add a test for the change.

Closes #53792

Co-authored-by: bellengao <gbl_long@163.com>
2020-04-16 11:17:12 -07:00
James Rodewig f0b9be8b1b [DOCS] Reformat `flatten_graph` token filter (#54268)
* [DOCS] Reformat `flatten_graph` token filter

Makes the following changes to the `flatten_graph` token filter docs:

* Rewrites description and adds Lucene link
* Adds detailed analyze example
* Adds analyzer example
2020-04-16 08:35:08 -04:00
Bogdan Pintea b88dd47de3 Docs: add the change log for 7.7 (#55019)
* Add the change log for 7.7

Add the change log for 7.7

* Update rel. notes to latest state (BC5)

Update the release notes to current state (i.e. BC5).

* Update docs/reference/release-notes/7.7.asciidoc

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2020-04-15 15:25:08 -04:00
Lisa Cawley f0b9578684 [DOCS] Removes transform performance note (#55177) 2020-04-15 10:42:52 -07:00
Ignacio Vera a677b63daa
Upgrade to lucene 8.5.1 release (#55229) (#55235)
Upgrade to lucene 8.5.1 release that contains a bug fix for a bug that might introduce index corruption when deleting data from an index that was previously shrunk.
2020-04-15 17:35:42 +02:00
James Rodewig 4f2ab96f38 [DOCS] EQL: Document `indexOf` function (#55071) 2020-04-15 11:29:50 -04:00
James Rodewig 8d6f0f6a76 [DOCS] Document `max_concurrent_searches` default (#55116) 2020-04-15 10:04:23 -04:00
Benjamin Trent 8ff2cbf1a3
[7.x] [ML] adding prediction_field_type to inference config (#55128) (#55230)
* [ML] adding prediction_field_type to inference config (#55128)

Data frame analytics dynamically determines the classification field type. This field type then dictates the encoded JSON that is written to Elasticsearch. 

Inference needs to know about this field type so that it may provide the EXACT SAME predicted values as analytics. 

Here is added a new field `prediction_field_type` which indicates the desired type. Options are: `string` (DEFAULT), `number`, `boolean` (where close_to(1.0) == true, false otherwise). 

Analytics provides the default `prediction_field_type` when the model is created from the process.
2020-04-15 09:45:22 -04:00
Jake Landis 85139fad7e
[7.x] Advise a simpler curator migration (#54457) (#55188)
Advice for migrating from Curator should simply be to phase out curator managed indices, 
since curator will ignore ILM indices
https://www.elastic.co/guide/en/elasticsearch/client/curator/5.7/ilm-and-curator.html#ilm-and-curator.

Co-authored-by: Jay Greenberg <PhaedrusTheGreek@users.noreply.github.com>
2020-04-15 07:55:31 -05:00
Lisa Cawley 2910d01179
[DOCS] Removes unshared sections from ml-shared.asciidoc (#55192) 2020-04-14 18:47:09 -07:00
Yang Wang f49354b7d7
Add migration notes for deprecating local parameter of get field mapping API (#55194)
This is a follow-up for #55099 to add migration notes about the deprecation of local parameter for get field mappings API.
2020-04-15 11:38:05 +10:00
Igor Motov 1754e50cbd
[7.x] Add analytics plugin usage stats to _xpack/usage (#54911) (#55162)
Adds analytics plugin usage stats to _xpack/usage.

Closes #54847
2020-04-14 17:03:14 -04:00
James Rodewig 12130843ca
[DOCS] Add maintenance releases to upgrade table (#55012)
Updates the supported upgrade path table in [Upgrade Elasticsearch][0]
to include a new row for maintenance releases. For example, this row
covers upgrading from 7.6.0 to 7.6.2.

The new table row only displays for releases greater than n.x.0. For
example, the new row will display for the 7.7.1 release but not the
7.7.0 release.

[0]: https://www.elastic.co/guide/en/elasticsearch/reference/master/setup-upgrade.html
2020-04-14 11:28:55 -04:00
James Rodewig 3fbd8b371f [DOCS] Use consistent line breaks in EQL function docs 2020-04-14 10:17:45 -04:00
Yannick Welsch a610513ec7 Provide repository-level stats for searchable snapshots (#55051)
Provides basic repository-level stats that will allow us to get some insight into how many
requests are actually being made by the underlying SDK. Currently only tracks GET and LIST
calls for S3 repositories. Most of the code is unfortunately boiler plate to add a new endpoint
that will help us better understand some of the low-level dynamics of searchable snapshots.
2020-04-14 14:34:08 +02:00
lcawl fcd96db006 [DOCS] Edits create data frame analytics job API (#54751) 2020-04-13 10:43:52 -07:00
Nhat Nguyen 96bb1164f0 Support hierarchical task cancellation (#54757)
With this change, when a task is canceled, the task manager will cancel
not only its direct child tasks but all also its descendant tasks.

Closes #50990
2020-04-13 12:35:21 -04:00
Igor Motov 51c6f69e02
[7.x] Add support for filters to T-Test aggregation (#54980) (#55066)
Adds support for filters to T-Test aggregation. The filters can be used to
select populations based on some criteria and use values from the same or
different fields.

Closes #53692
2020-04-13 12:28:58 -04:00
Jake Landis a2fafa6af4
[7.x] Lazy test cluster module and plugins (#54852) (#55087)
This change converts the module and plugin parameters
for testClusters to be lazy. Meaning that the values
are not resolved until they are actually used. This
removes the requirement to use project.afterEvaluate to
be able to resolve the bundle artifact.

Note - this does not completely remove the need for afterEvaluate
since it is still needed for the custom resource extension.
2020-04-13 10:53:35 -05:00
James Rodewig 57d6493e29 [DOCS] EQL: Document `string` function (#55086) 2020-04-13 11:23:45 -04:00
Peter Dyson f0b6cf4c11 [DOCS] Note where ILM policies are stored and backup caveats (#54859) 2020-04-13 09:11:16 -06:00
Vishal Patel 16921ebbd8 [DOCS] Collapse nested objects in Explore API docs (#55067)
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-04-13 09:27:03 -04:00
Ioannis Kakavas 7a8a66d9ae
[7.x] Fix ReloadSecureSettings API to consume password (#54771) (#55059)
The secure_settings_password was never taken into consideration in
the ReloadSecureSettings API. This commit fixes that and adds
necessary REST layer testing. Doing so, it also:

- Allows TestClusters to have a password protected keystore
so that it can be set for tests.
- Adds a parameter to the run task so that elastisearch can
be run with a password protected keystore from source.
2020-04-13 09:50:55 +03:00
Yang Wang 862799956c
Deprecate local parameter for get field mapping request (#55014) (#55099)
The usage of local parameter for GetFieldMappingRequest has been removed from the underlying transport action since v2.0.

This PR deprecates the parameter from rest layer. It will be removed in next major version.
2020-04-12 13:48:47 +10:00
James Rodewig 2655dfa2fe [DOCS] EQL: Reword field support for EQL functions (#55074)
Changes boilerplate sentence of "If using a field as the argument, this
parameter only supports..." to "...this parameter supports only...".

The latter is a bit more clear and readable.
2020-04-10 15:33:29 -04:00
Jason Tedor d1137ebdaa
Passthrough special characters in thread pool docs (#55080)
Some of these characters are special to Asciidoctor and they ruin the
rendering on this page. Instead, we use a macro to passthrough these
characters without Asciidoctor applying any subtitutions to them. This
commit then addresses some rendering issues in the thread pool docs.

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-04-10 15:11:19 -04:00
Nik Everett b99a50bcb9
value_count Aggregation optimization (backport of #54854) (#55076)
We found some problems during the test.

Data: 200Million docs, 1 shard, 0 replica

    hits    |   avg   |   sum   | value_count |
----------- | ------- | ------- | ----------- |
     20,000 |   .038s |   .033s |       .063s |
    200,000 |   .127s |   .125s |       .334s |
  2,000,000 |   .789s |   .729s |      3.176s |
 20,000,000 |  4.200s |  3.239s |     22.787s |
200,000,000 | 21.000s | 22.000s |    154.917s |

The performance of `avg`, `sum` and other is very close when performing
statistics, but the performance of `value_count` has always been poor,
even not on an order of magnitude. Based on some common-sense knowledge,
we think that `value_count` and sum are similar operations, and the time
consumed should be the same. Therefore, we have discussed the agg
of `value_count`.

The principle of counting in es is to traverse the field of each
document. If the field is an ordinary value, the count value is
increased by 1. If it is an array type, the count value is increased
by n. However, the problem lies in traversing each document and taking
out the field, which changes from disk to an object in the Java
language. We summarize its current problems with Elasticsearch as:

- Number cast to string overhead, and GC problems caused by a large
  number of strings
- After the number type is converted to string, sorting and other
  unnecessary operations are performed

Here is the proof of type conversion overhead.

```
// Java long to string source code, getChars is very time-consuming.
public static String toString(long i) {
        int size = stringSize(i);
        if (COMPACT_STRINGS) {
            byte[] buf = new byte[size];
            getChars(i, size, buf);
            return new String(buf, LATIN1);
        } else {
            byte[] buf = new byte[size * 2];
            StringUTF16.getChars(i, size, buf);
            return new String(buf, UTF16);
        }
}
```

  test type  | average |  min |     max     |   sum
------------ | ------- | ---- | ----------- | -------
double->long |  32.2ns | 28ns |     0.024ms |  3.22s
long->double |  31.9ns | 28ns |     0.036ms |  3.19s
long->String | 163.8ns | 93ns |  1921    ms | 16.3s

particularly serious.

Our optimization code is actually very simple. It is to manage different
types separately, instead of uniformly converting to string unified
processing. We added type identification in ValueCountAggregator, and
made special treatment for number and geopoint types to cancel their
type conversion. Because the string type is reduced and the string
constant is reduced, the improvement effect is very obvious.

    hits    |   avg   |   sum   | value_count | value_count | value_count | value_count | value_count | value_count |
            |         |         |    double   |    double   |   keyword   |   keyword   |  geo_point  |  geo_point  |
            |         |         |   before    |    after    |   before    |    after    |   before    |    after    |
----------- | ------- | ------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
     20,000 |     38s |   .033s |       .063s |       .026s |       .030s |       .030s |       .038s |       .015s |
    200,000 |    127s |   .125s |       .334s |       .078s |       .116s |       .099s |       .278s |       .031s |
  2,000,000 |    789s |   .729s |      3.176s |       .439s |       .348s |       .386s |      3.365s |       .178s |
 20,000,000 |  4.200s |  3.239s |     22.787s |      2.700s |      2.500s |      2.600s |     25.192s |      1.278s |
200,000,000 | 21.000s | 22.000s |    154.917s |     18.990s |     19.000s |     20.000s |    168.971s |      9.093s |

- The results are more in line with common sense. `value_count` is about
  the same as `avg`, `sum`, etc., or even lower than these. Previously,
  `value_count` was much larger than avg and sum, and it was not even an
  order of magnitude when the amount of data was large.
- When calculating numeric types such as `double` and `long`, the
  performance is improved by about 8 to 9 times; when calculating the
  `geo_point` type, the performance is improved by 18 to 20 times.
2020-04-10 13:16:39 -04:00
James Rodewig c440754784 [DOCS] EQL: Document `wildcard` function (#54086) 2020-04-10 09:18:29 -04:00
oneoneonepig 356cc94889 [DOCS] Fix double quote typo in 7.0 breaking changes (#55040) 2020-04-10 09:11:51 -04:00
Jason Tedor 9eeae59a83
Clarify available processors (#54907)
The use of available processors, the terminology, and the settings
around it have evolved over time. This commit cleans up some places in
the codes and in the docs to adjust to the current terminology.
2020-04-10 08:48:27 -04:00
James Rodewig 51326432be [DOCS] Add query reference docs template (#52292) 2020-04-10 08:47:54 -04:00
James Rodewig d5a609a2e5 [DOCS] Add token filter reference docs template (#52290)
Creates a reusable template for token filter reference documentation.

Contributors can make a copy of this template and customize it when
documenting new token filters.
2020-04-10 08:45:10 -04:00
Marios Trivyzas bf0cadb602
SQL: Implement DATETIME_PARSE function for parsing strings (#54960) (#55035)
Implement DATETIME_PARSE(<datetime_str>, <pattern_str>) function
which allows to parse a datetime string according to the specified
pattern into a datetime object. The patterns allowed are those of
java.time.format.DateTimeFormatter.

Relates to #53714

(cherry picked from commit 3febcd8f3cdf9fdda4faf01f23a5f139f38b57e0)
2020-04-10 01:16:29 +02:00
Vishal Patel 51cb0c5c7b [DOCS] Collapse nested objects in cluster reroute docs (#54851) 2020-04-09 15:29:22 -04:00
István Zoltán Szabó 374f633b6e [DOCS] Adds link points to the data frame analytics supported fields (#55004)
Co-authored-by: lcawl <lcawley@elastic.co>
2020-04-09 11:27:57 -07:00
Mark Vieira dd73a14d11
Improve total build configuration time (#54611) (#54994)
This commit includes a number of changes to reduce overall build
configuration time. These optimizations include:

- Removing the usage of the 'nebula.info-scm' plugin. This plugin
   leverages jgit to load read various pieces of VCS information. This
   is mostly overkill and we have our own minimal implementation for
   determining the current commit id.
- Removing unnecessary build dependencies such as perforce and jgit
   now that we don't need them. This reduces our classpath considerably.
- Expanding the usage lazy task creation, particularly in our
   distribution projects. The archives and packages projects create
   lots of tasks with very complex configuration. Avoiding the creation
   of these tasks at configuration time gives us a nice boost.
2020-04-08 16:47:02 -07:00
James Rodewig c6cd8ca7c0
[DOCS] Update upgrade docs for 7.7 (#54978) 2020-04-08 16:23:08 -04:00
James Rodewig 964cf565c9
[DOCS] EQL: Document `between` function (#54950) 2020-04-08 13:49:15 -04:00
Théophile Helleboid - chtitux a8aa36d427 [DOCS] Fix typo in SLM retention docs (#54797) 2020-04-08 08:56:45 -04:00
Marios Trivyzas 6afd60b082
SQL: Implement DATETIME_FORMAT function for date/time formatting (#54832) (#54942)
Implement DATETIME_FORMAT(<date/datetime/time>, ) function
which allows for formatting a timestamp to the specified format. The
patterns allowed as those of java.time.format.DateTimeFormatter.

Related to #53714

(cherry picked from commit 72be0b54a9299e87e785469cdc9aafac2a48c046)
2020-04-08 13:45:47 +02:00
István Zoltán Szabó 3a3effedc2 [DOCS] Reworks some parts of EMM API docs (#54872)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-04-08 10:20:34 +02:00
Julie Tibshirani 475b210eec
Improve guidance on removing default mappings. (#54915)
In 7.x, an index template will fail to apply if it contains a `_default_`
mapping. Several users have expressed confusion over the fact that loading the
template doesn't show any default mappings. This docs change clarifies that in
order to see all mappings in the template, you must pass `include_type_name`.
2020-04-07 15:18:13 -07:00
James Rodewig 9569a8eb13 [DOCS] Add example to "avoid scripts" advice (#54719)
Adds a detailed example to the "Avoid scripts" section of the "Tune
for search speed" docs. The detail outlines how a script used to
transform indexed data can be moved to ingest.

The update also removes an outdated reference to supported script
languages.
2020-04-07 15:25:10 -04:00