8292 Commits

Author SHA1 Message Date
James Rodewig 386fb16409 [DOCS] SQL: Update link for supported regex in `RLIKE` docs (#55830)
The `RLIKE` function docs point users to [Java’s Pattern class doc][0]
for regular expression syntax. However, these docs include shorthand
character classes, such as `[\d]`, `[\s]`, and `[\w]`. These character
classes are not supported in Elasticsearch, which may confuse users.

This updates the SQL `RLIKE` docs to refer to the ES [regular expression
syntax docs][1], which document only supported syntax.

[0]: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html
[1]: https://www.elastic.co/guide/en/elasticsearch/reference/master/regexp-syntax.html

Relates to #55231
2020-04-28 09:25:51 -04:00
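As a rough illustration of the problem described in the commit above (386fb16409), the sketch below scans a pattern for the shorthand classes the message names (`\d`, `\s`, `\w`). The function name is hypothetical, and the check covers only those three classes; it is not a validator for the full Elasticsearch regular expression syntax.

```python
from typing import List

# Shorthand classes named in the commit message as unsupported.
SHORTHAND_LETTERS = "dsw"

def find_shorthand_classes(pattern: str) -> List[str]:
    """Return each \\d, \\s, or \\w escape found in the pattern, in order."""
    found = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            # A backslash escapes the next character; record it if it is
            # one of the shorthand letters, then skip past the pair.
            if pattern[i + 1] in SHORTHAND_LETTERS:
                found.append(pattern[i:i + 2])
            i += 2
        else:
            i += 1
    return found
```

A pattern such as `[0-9]+` returns an empty list, while `[\d]+` is flagged. An escaped backslash followed by `d` is not flagged, because the scan consumes escape pairs together.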
James Rodewig 452be22a4d [DOCS] Warn about searching across all fields wt. `query_string` (#55853)
Warn about potential performance impact when a large number of fields
is used with query string query and no default field.

Re-adds content from #35570.
That content was erroneously removed in #45296.

Co-authored-by: Peter Dyson <peter.dyson@geekpete.com>
2020-04-28 09:20:21 -04:00
Adrien Grand 58c3bb5ae1
Repurpose `ignore_throttled` to be only about frozen indices. (#55047) (#55852)
This has no practical impact on users since frozen indices are the only
throttled indices today. However this has an impact on upcoming features
that would use search throttling.

Filtering out throttled indices made sense a couple of years ago, but now
that we're improving support for slow requests with `_async_search` and
exploring ways to reduce storage costs, this feature has most likely
become a trap that we'd like to avoid with upcoming features that
would use search throttling.

Relates #54058
2020-04-28 14:31:54 +02:00
Amit Khandelwal 126e4acca8 Expose `preserve_original` in `edge_ngram` token filter (#55766)
The Lucene `preserve_original` setting is currently not supported in the `edge_ngram`
token filter. This change adds it with a default value of `false`.

Closes #55767
2020-04-28 10:24:27 +02:00
István Zoltán Szabó a5cf4712e5 [DOCS] Changes feature importance links to point to the new page (#55531)
* [DOCS] Changes feature importance links to point to the new page.

* [DOCS] Fixes line breaks.
2020-04-28 09:03:43 +02:00
James Rodewig c16b1edae0 [DOCS] EQL: Fix whitespace in `stringContains` docs 2020-04-27 15:53:59 -04:00
James Rodewig 8df5cff9c1 [DOCS] Correct stemmer token filters anchor 2020-04-27 14:57:59 -04:00
James Rodewig 5b8a18c756 [DOCS] Correct stemmer token filter anchor 2020-04-27 14:51:51 -04:00
David Roberts 3ba44a5af8
[ML] Adding failed_category_count to model_size_stats (#55761)
The failed_category_count statistic records the number of times
categorization wanted to create a new category but couldn't
because the job had reached its model_memory_limit.

Backport of #55716
2020-04-25 10:36:49 +01:00
James Rodewig c1b0548db0
[DOCS] Document EQL search REST API (#52384) 2020-04-24 15:36:01 -04:00
James Rodewig 5981412bf7
[DOCS] EQL: Document `stringContains` function (#54968) 2020-04-24 15:09:05 -04:00
James Rodewig e4ebe55d04
[DOCS] EQL: Document `cidrMatch` function (#54216) (#55739) 2020-04-24 14:01:11 -04:00
James Rodewig e0a8adb5b2
[DOCS] Reformat `stemmer` token filter (#55693)
Makes the following changes to the `stemmer` token filter docs:

* Adds detailed analyze example
* Rewrites parameter definitions
* Adds custom analyzer example
* Adds a `language` value for the `estonian` stemmer
* Reorders the `language` values to show recommended algorithms first,
  followed by other values alphabetically
2020-04-24 11:25:01 -04:00
James Rodewig 96285b90c1
[DOCS] Add stemming concept docs (#55156)
Adds conceptual documentation for stemming, including:

* An overview of why stemming is helpful in search
* Algorithmic vs. dictionary stemming
* Token filters used to control stemming, such as `stemmer_override`, `keyword_marker`, and `conditional`
2020-04-24 11:01:28 -04:00
Christoph Büscher f95a741ad3
[Docs] Fix fuzziness example in match-query.asciidoc (#55715)
The example looks the same as the one in the previous section, although it should
use the "fuzziness" parameter. It appears to be correct on 6.8 and master, so the
fix was probably just never ported to the 7.x branches.
2020-04-24 16:21:40 +02:00
Zachary Tong 715c90bf7d Aggs must specify a `field` or `script` (or both) (#52226)
This adds a validation to VSParserHelper to ensure that a field or
script or both are specified by the user. This is technically
required today already, but the exception is thrown much deeper
in the agg framework, produces a very unintuitive error for the user,
and consumes more resources than failing early would.
2020-04-23 19:23:41 -04:00
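The fail-early idea in the commit above (715c90bf7d) can be sketched in a few lines. This is not the VSParserHelper code: the function name, the dictionary shape, and the error message are all hypothetical and only illustrate checking for `field` or `script` before any deeper work is done.

```python
from typing import Any, Mapping

def validate_values_source(agg_name: str, agg_body: Mapping[str, Any]) -> None:
    """Raise early if an aggregation specifies neither a field nor a script."""
    has_field = "field" in agg_body
    has_script = "script" in agg_body
    if not (has_field or has_script):
        # Hypothetical message; the real Elasticsearch wording may differ.
        raise ValueError(
            f"aggregation [{agg_name}] must specify a [field] or a [script] (or both)"
        )
```

Specifying both keys passes the check, which matches the "or both" in the commit title.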
James Rodewig e74fdacabd
[DOCS] Add admonition for EQL exact matches on text fields (#53402) (#55670)
Adds an important admonition to the EQL syntax page noting that
the equal (`==`) operator should not be used to match `text` field
values.

Relates to #52709 and #53020
2020-04-23 10:59:50 -04:00
István Zoltán Szabó 5813dfdcc7
[7.x][DOCS] Adds ML related items to release highlights (#55652) 2020-04-23 11:58:32 +02:00
Lisa Cawley 314ca78e31
[7.x][DOCS] Update example and nesting in get data frame analytics job stats API (#55612) 2020-04-22 10:58:26 -07:00
James Rodewig 8d05d7dace
[DOCS] Add collapsible sections to 7.x breaking changes (#55334)
Adds collapsible sections and new format to the 7.x breaking changes.

Relates to #53229.
2020-04-22 10:56:38 -04:00
James Rodewig 6f9513915d
[DOCS] Add 'how to' doc about avoiding oversharding (#55480)
Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>
2020-04-22 10:44:16 -04:00
James Rodewig 414f9c98f3
[DOCS] Document missing bulk API response parameters (#55414)
Documents several parameters missing from the bulk API's response body
docs. Also moves several response-related chunks of text to the response
body section.

Relates to #55237
2020-04-22 09:48:03 -04:00
David Roberts 2dc5586afe
[ML] Add effective max model memory limit to ML info (#55581)
The ML info endpoint returns the max_model_memory_limit setting
if one is configured.  However, it is still possible to create
a job that cannot run anywhere in the current cluster because
no node in the cluster has enough memory to accommodate it.

This change adds an extra piece of information,
limits.effective_max_model_memory_limit, to the ML info
response that returns the biggest model memory limit that could
be run in the current cluster assuming no other jobs were
running.

The idea is that the ML UI will be able to warn users who try to
create jobs with higher model memory limits that their jobs will
not be able to start unless they add a bigger ML node to their
cluster.

Backport of #55529
2020-04-22 12:28:50 +01:00
David Roberts da5aeb8be7
[ML] Return assigned node in start/open job/datafeed response (#55570)
Adds a "node" field to the response from the following endpoints:

1. Open anomaly detection job
2. Start datafeed
3. Start data frame analytics job

If the job or datafeed is assigned to a node immediately then
this field will return the ID of that node.

In the case where a job or datafeed is opened or started lazily
the node field will contain an empty string.  Clients that want
to test whether a job or datafeed was opened or started lazily
can therefore check for this.

Backport of #55473
2020-04-22 12:06:53 +01:00
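A client-side check for the lazy case described in the commit above (da5aeb8be7) might look like the following sketch. It assumes the response body has already been parsed into a dictionary; only the `node` field comes from the commit message, and the helper name is hypothetical.

```python
from typing import Any, Mapping

def assigned_lazily(response: Mapping[str, Any]) -> bool:
    """Return True if the "node" field is present and is an empty string."""
    # An empty string means the job or datafeed was opened or started lazily.
    # A missing field (for example, from an older node) is not treated as lazy.
    return response.get("node") == ""
```

Treating a missing `node` field as "not lazy" is a design choice made for this sketch, not something the commit specifies.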
István Zoltán Szabó 0ce3406033 [DOCS] Provides further details on aggregations in datafeeds (#55462)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-04-22 08:54:52 +02:00
James Rodewig 777ffd5801
[DOCS] Add bulk API example with failures (#55412)
Adds an example for bulk API requests that include failures.
Also documents guidance on using the `filter_path` parameter
to narrow the bulk API response for errors.

Closes #55237
2020-04-21 16:22:23 -04:00
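To illustrate what handling failures in a bulk response involves (commit 777ffd5801 above), here is a minimal sketch that collects the failed items from a parsed response body. It assumes the usual response shape, an `items` list whose entries map an action name to a result object that carries an `error` key on failure; the helper name is hypothetical.

```python
from typing import Any, List, Mapping, Tuple

def failed_items(bulk_response: Mapping[str, Any]) -> List[Tuple[str, Any, Any]]:
    """Return (action, document id, error) for every failed bulk item."""
    failures = []
    for item in bulk_response.get("items", []):
        # Each item is expected to hold one action key, such as "index" or "delete".
        for action, result in item.items():
            if "error" in result:
                failures.append((action, result.get("_id"), result["error"]))
    return failures
```

The helper does not rely on the top-level `errors` flag, so it also works on a response that has been narrowed to the error entries.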
James Baiera 2a5f1f49a9
Add enrich metricset from 7.5 (#54791) (#55356)
Co-authored-by: Julien Guay <guay_j@yahoo.fr>
2020-04-21 12:39:08 -04:00
James Rodewig b9dfd12e7e
[DOCS] Remove 'Testing' chapter (#55270) (#55532)
Removes the 'Testing' chapter from the Elasticsearch Reference guide.

This chapter was originally written so that users of the Java HLRC client could
use the same test classes when testing Elasticsearch in their own applications.
However, this is no longer the case, nor is it recommended.

Closes #55257.
2020-04-21 10:29:58 -04:00
Paul Sanwald 0f7917b94b
add release notes for 7.5.2 (#51259)
Adds release notes for 7.5.2
2020-04-21 08:19:46 -04:00
Benjamin Trent 24d41eb695
[ML] partitions model definitions into chunks (#55260) (#55484)
This paves the way in the data layer for exceptionally large models to be partitioned across multiple documents.

This change means that nodes before 7.8.0 will not be able to use trained inference models created on nodes on or after 7.8.0.

I chose the definition document limit to be 100. This *SHOULD* be plenty for any large model. One of the largest models that I have created so far had the following stats:
~314MB of inflated JSON, ~66MB when compressed, ~177MB of heap.
With a chunk size of `16 * 1024 * 1024`, its compressed string could be partitioned into 5 documents.
Supporting models 20 times this size (compressed) seems adequate for now.
2020-04-20 16:08:54 -04:00
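The chunk arithmetic in the commit above (24d41eb695) can be checked with a short sketch. It assumes the "~66MB when compressed" figure corresponds to roughly 66 * 1024 * 1024 characters; the function and constant names are illustrative and are not taken from the Elasticsearch source.

```python
import math

# Values quoted in the commit message.
CHUNK_SIZE = 16 * 1024 * 1024   # characters per definition document
MAX_DEFINITION_DOCS = 100       # chosen definition document limit

def chunks_needed(compressed_length: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of documents needed to hold a compressed definition string."""
    return math.ceil(compressed_length / chunk_size)

# Assumption: the ~66MB compressed model is about 66 * 1024 * 1024 characters.
example_length = 66 * 1024 * 1024
docs = chunks_needed(example_length)      # 66 / 16 = 4.125, rounded up to 5
headroom = MAX_DEFINITION_DOCS // docs    # 100 / 5 = 20 times the example size
```

This is where the "20 times this size" figure comes from: a limit of 100 documents divided by the 5 documents the example model needs.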
David Turner 8e618fdf10 Adjust docs for voting config exclusions API (#55006)
In #50836 we deprecated the existing voting config exclusions API and added a
new one. This commit adjusts the docs to match.
2020-04-20 19:47:33 +01:00
Lee Hinman 9eddd2bcc9
[7.x] Add prefer_v2_templates flag and index setting (#55411) (#55476)
This commit adds a new querystring parameter on the following APIs:
- Index
- Update
- Bulk
- Create Index
- Rollover

These APIs now support a `?prefer_v2_templates=true|false` flag. This flag sets whether index
creation prefers V2 index templates or V1 templates. It defaults to `false` and will be
changed to `true` for 8.0+ in subsequent work.

Additionally, setting this flag internally sets the `index.prefer_v2_templates` index-level setting.
This setting is used so that actions that automatically create a new index (things like rollover
initiated by ILM) will inherit the preference from the original index. This setting is dynamic so
that a transition from v1 to v2 templates can occur for long-running indices grouped by an alias
performing periodic rollover.

This also adds support for sending this parameter to the High Level Rest Client.

Relates to #53101
2020-04-20 12:05:42 -06:00
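As a small illustration of the querystring flag from the commit above (9eddd2bcc9), the sketch below appends `prefer_v2_templates` to a request path. Only the parameter name and its `false` default come from the commit; the helper name and the example paths are hypothetical.

```python
from urllib.parse import urlencode

def with_template_preference(path: str, prefer_v2: bool = False) -> str:
    """Append the prefer_v2_templates flag to a request path."""
    flag = "true" if prefer_v2 else "false"
    query = urlencode({"prefer_v2_templates": flag})
    # Reuse an existing querystring if the path already has one.
    separator = "&" if "?" in path else "?"
    return f"{path}{separator}{query}"
```

For example, `with_template_preference("/my-index/_doc", True)` yields `/my-index/_doc?prefer_v2_templates=true`.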
jmceniery 99409e8c95 [DOCS] Remove Wikipedia link from `SUM_OF_SQUARES` SQL function docs (#52398)
Removed the link to Wikipedia as the function is not calculating the sum of squares in this way. More details can be found in this issue:

https://github.com/elastic/elasticsearch/issues/50416
2020-04-20 09:59:59 -04:00
Ben Skelker 74f55ec6fa [DOCS] Add `ip_range` datatype to core datatypes range list (#55446) 2020-04-20 08:55:09 -04:00
William Brafford 49e30b15a2
Deprecate disabling basic-license features (#54816) (#55405)
We believe there's no longer a need to be able to disable basic-license
features completely using the "xpack.*.enabled" settings. If users don't
want to use those features, they simply don't need to use them. Having
such features always available lets us build more complex features that
assume basic-license features are present.

This commit deprecates settings of the form "xpack.*.enabled" for
basic-license features, excluding "security", which is a special case.
It also removes deprecated settings from integration tests and unit
tests where they're not directly relevant; e.g. monitoring and ILM are
no longer disabled in many integration tests.
2020-04-17 15:04:17 -04:00
Andrei Dan ef338ee3d4
ILM DOCS: mention forcemerge is best effort (#54794) (#55401)
(cherry picked from commit 3fd05435c52dd265dbe1a40104e7dc7a335d50ae)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-04-17 15:42:23 +01:00
James Rodewig f87a3f0c48 [DOCS] Document analysis/mapping response for cluster stats API (#55054)
PR #51260 moved usage counts about mapping field types and analysis to
the `_cluster/stats` API.

This documents those stats in the response section of the cluster stats
API docs.
2020-04-17 08:44:10 -04:00
Adrien Grand 0cb6a1f089
Document the index corruption bug that gets fixed via Lucene 8.5.1. (#55232)
Using soft deletes on shrunk indices may cause corruption.
2020-04-17 13:37:37 +02:00
markharwood 7761b01a33
Remove normalizer support from wildcard field while we decide on approach for handling case insensitivity (#55294) (#55375)
Closes #55288
2020-04-17 11:43:26 +01:00
Marios Trivyzas f958e9abdc
SQL: Implement scripting inside aggs (#55241) (#55371)
Implement the use of scalar functions inside aggregate functions.
This allows for complex expressions inside aggregations, with or without
`GROUP BY`, as well as with or without a `HAVING` clause, e.g.:

```
SELECT MAX(CASE WHEN a IS NULL then -1 ELSE abs(a * 10) + 1 END) AS max, b
FROM test
GROUP BY b
HAVING MAX(CASE WHEN a IS NULL then -1 ELSE abs(a * 10) + 1 END) > 5
```

Scalar functions are still not allowed for `KURTOSIS` and `SKEWNESS` as
this is currently not implemented on the Elasticsearch side.

Fixes: #29980
Fixes: #36865
Fixes: #37271

(cherry picked from commit 506d1beea7abb2b45de793bba2e349090a78f2f9)
2020-04-17 12:41:22 +02:00
Lisa Cawley c7cf6e621d [DOCS] Remove text fields from classification dependent variables (#54849) 2020-04-16 13:40:28 -07:00
Lisa Cawley cf5278f771 [DOCS] Add ml-cpp PRs to 7.7 release notes (#55264)
Co-Authored-By: David Roberts <dave.roberts@elastic.co>
2020-04-16 11:28:34 -07:00
Julie Tibshirani d7cded8d7a
Fix updating include_in_parent/include_in_root of nested field. (#55326)
The main changes are:
1. Throw an error when updating the `include_in_parent` or `include_in_root` attribute of a nested field dynamically via the PUT mapping API.
2. Add a test for the change.

Closes #53792

Co-authored-by: bellengao <gbl_long@163.com>
2020-04-16 11:17:12 -07:00
James Rodewig f0b9be8b1b [DOCS] Reformat `flatten_graph` token filter (#54268)
* [DOCS] Reformat `flatten_graph` token filter

Makes the following changes to the `flatten_graph` token filter docs:

* Rewrites description and adds Lucene link
* Adds detailed analyze example
* Adds analyzer example
2020-04-16 08:35:08 -04:00
Bogdan Pintea b88dd47de3 Docs: add the change log for 7.7 (#55019)
* Add the change log for 7.7

Add the change log for 7.7

* Update rel. notes to latest state (BC5)

Update the release notes to current state (i.e. BC5).

* Update docs/reference/release-notes/7.7.asciidoc

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2020-04-15 15:25:08 -04:00
Lisa Cawley f0b9578684 [DOCS] Removes transform performance note (#55177) 2020-04-15 10:42:52 -07:00
Ignacio Vera a677b63daa
Upgrade to lucene 8.5.1 release (#55229) (#55235)
Upgrade to the Lucene 8.5.1 release, which contains a fix for a bug that might introduce index corruption when deleting data from an index that was previously shrunk.
2020-04-15 17:35:42 +02:00
James Rodewig 4f2ab96f38 [DOCS] EQL: Document `indexOf` function (#55071) 2020-04-15 11:29:50 -04:00
James Rodewig 8d6f0f6a76 [DOCS] Document `max_concurrent_searches` default (#55116) 2020-04-15 10:04:23 -04:00
Benjamin Trent 8ff2cbf1a3
[7.x] [ML] adding prediction_field_type to inference config (#55128) (#55230)
* [ML] adding prediction_field_type to inference config (#55128)

Data frame analytics dynamically determines the classification field type. This field type then dictates the encoded JSON that is written to Elasticsearch. 

Inference needs to know about this field type so that it may provide the EXACT SAME predicted values as analytics. 

This adds a new field, `prediction_field_type`, which indicates the desired type. Options are: `string` (DEFAULT), `number`, and `boolean` (where close_to(1.0) == true, false otherwise).

Analytics provides the default `prediction_field_type` when the model is created from the process.
2020-04-15 09:45:22 -04:00
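The three `prediction_field_type` options in the commit above (8ff2cbf1a3) can be sketched as a simple coercion. The function name is hypothetical, and the tolerance used for `close_to(1.0)` is an assumption (Python's `math.isclose` default); the commit does not say how closeness is measured in Elasticsearch.

```python
import math
from typing import Union

def coerce_prediction(
    value: Union[str, int, float],
    prediction_field_type: str = "string",  # the commit names `string` as the default
) -> Union[str, float, bool]:
    """Convert a predicted value to the requested prediction field type."""
    if prediction_field_type == "string":
        return str(value)
    if prediction_field_type == "number":
        return float(value)
    if prediction_field_type == "boolean":
        # Assumption: "close_to(1.0)" is approximated with math.isclose defaults.
        return math.isclose(float(value), 1.0)
    raise ValueError(f"unknown prediction_field_type: {prediction_field_type}")
```

Any value not close to 1.0 maps to `False`, following the "false otherwise" wording in the commit message.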