Commit Graph

6847 Commits

Author SHA1 Message Date
Christoph Büscher fe3f9f0c6b Yet another `the the` cleanup (#43815) 2019-07-01 20:22:19 +02:00
Zachary Tong ea1794832f Add RareTerms aggregation (#35718)
This adds a `rare_terms` aggregation. It is an aggregation designed
to identify the long tail of keywords, i.e. terms that are "rare" or
have low doc counts.

This aggregation is designed to be more memory efficient than the
alternative, which is setting a terms aggregation to size: LONG_MAX
(or worse, ordering a terms agg by count ascending, which has
unbounded error).

This aggregation works by maintaining a map of terms that have
been seen. A counter associated with each value is incremented
when we see the term again. If the counter surpasses a predefined
threshold, the term is removed from the map and inserted into a cuckoo
filter. If a future term is found in the cuckoo filter we assume it
was previously removed from the map and is "common".

The map keys are the "rare" terms after collection is done.
2019-07-01 10:30:02 -04:00
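The map-plus-filter bookkeeping described in the `rare_terms` commit above can be sketched roughly as follows. This is a minimal Python illustration, not the Elasticsearch implementation: a plain `set` stands in for the cuckoo filter (so, unlike a real cuckoo filter, it never reports false positives), and the function name `rare_terms` and the parameter `threshold` are hypothetical.

```python
def rare_terms(terms, threshold=1):
    """Return the terms seen at most `threshold` times, with their counts."""
    counts = {}     # candidate "rare" terms -> number of times seen so far
    common = set()  # assumption: exact stand-in for the probabilistic cuckoo filter

    for term in terms:
        if term in common:
            # The term was evicted from the map earlier, so treat it as "common".
            continue
        counts[term] = counts.get(term, 0) + 1
        if counts[term] > threshold:
            # The counter surpassed the threshold: move the term to the filter.
            del counts[term]
            common.add(term)

    # Whatever is still in the map after collection is "rare".
    return counts


print(rare_terms(["a", "b", "a", "c", "a", "b"]))  # prints {'c': 1}
```

The memory saving comes from the filter: once a term is known to be common, only a compact fingerprint needs to be kept rather than the term and its counter.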
Tanguy Leroux 7554420581 Update docs for Open/Close API (#43809)
Relates #43530
2019-07-01 15:20:36 +02:00
Dimitrios Liappis 9aa6f7c434 Update TLS configuration in Docker docs (#43816)
Following the removal of the `unzip` package from the Elasticsearch
Docker image in #39040, update setup instructions for TLS in Docker.

Also avoid cross-platform ownership and permission issues by not relying
on local bind mounts for storing generated certs, and do not require
`curl` to be installed locally.

Backport of #43748
2019-07-01 15:33:34 +03:00
David Turner 40d43e3f87 Avoid IP addresses for bootstrapping in setup docs (#43802)
Removes the suggestion to use IP addresses for `cluster.initial_master_nodes`
in the "important settings" discovery docs, leaving only the suggestion to use
node names.

Relates #41179, #41569
2019-07-01 12:39:54 +01:00
Julie Tibshirani ffa5919d7c Add support for 'flattened object' fields. (#43762)
This commit merges the `object-fields` feature branch. The new 'flattened
object' field type allows an entire JSON object to be indexed into a field, and
provides limited search functionality over the field's contents.
2019-07-01 12:08:50 +03:00
weizijun 28358fdbed show a full ingest example in the index page, to let user fast understand ingest node. (#43476) 2019-07-01 08:04:57 +02:00
James Rodewig d8fe0f5c13 [DOCS] Rewrite `terms_set` query (#43060) 2019-06-28 12:57:22 -04:00
Alan Woodward 81dbcfb268 Wildcard intervals (#43691)
This commit adds a wildcard intervals source, similar to the prefix source. It
also renames the `term` parameter in prefix to `prefix`, to bring it
into line with the `pattern` parameter in wildcard.

Closes #43198
2019-06-28 14:04:03 +01:00
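As a rough illustration of the parameter naming in the commit above, the two interval sources could be written as request bodies like this. The `prefix` and `pattern` parameter names come from the commit message; the surrounding `intervals` query envelope is assumed from the Query DSL, and the field name `my_text`, the search strings, and the helper `intervals_query` are hypothetical.

```python
import json


def intervals_query(field, source, **params):
    """Wrap a single intervals source in an `intervals` query body."""
    return {"query": {"intervals": {field: {source: params}}}}


# The prefix source takes a `prefix` parameter (previously `term`).
prefix_query = intervals_query("my_text", "prefix", prefix="out")

# The wildcard source takes a `pattern` parameter.
wildcard_query = intervals_query("my_text", "wildcard", pattern="out?ide")

print(json.dumps(prefix_query, indent=2))
print(json.dumps(wildcard_query, indent=2))
```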
James Rodewig 74dd6e49fc [DOCS] Rewrite boosting query (#43647) 2019-06-28 08:35:55 -04:00
Henning Andersen 632da7f2c8 Enabled cannot be updated (#43701)
Removed the invalid tip that enabled can be updated for existing fields
and clarified instead that it cannot.

Related to #33566 and #33933
2019-06-28 12:59:00 +02:00
Christoph Büscher 2cc7f5a744 Allow reloading of search time analyzers (#43313)
Currently, changing resources (such as dictionaries or synonym files) of search
time analyzers is only possible by closing an index, changing the underlying
resource (e.g. synonym files) and then re-opening the index for the change to
take effect.

This PR adds a new API endpoint that allows triggering reloading of certain
analysis resources (currently token filters) that will then pick up changes in
underlying file resources. To achieve this we introduce a new type of custom
analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows
swapping out analysis components. Custom analyzers that contain filters that are
marked as "updateable" will automatically choose this implementation. This PR
also adds this capability to `synonym` token filters for use in search time
analyzers.

Relates to #29051
2019-06-28 09:55:40 +02:00
Lisa Cawley 1b7bcdc3a0 [DOCS] Adds data frame API response codes for allow_no_match (#43666) 2019-06-27 15:17:58 -07:00
Lisa Cawley 42cb59f7b4 [DOCS] Updates ML APIs to use new API template (#43711) 2019-06-27 15:05:51 -07:00
lcawl d46e2bb26a [DOCS] Adds anchors and attributes to ML APIs 2019-06-27 09:44:56 -07:00
Alan Woodward 05a7333eca Require [articles] setting in elision filter (#43083)
We should throw an exception at construction time if a list of
articles is not provided, otherwise we can get random NPEs during
indexing.

Relates to #43002
2019-06-27 09:02:36 +01:00
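The fail-fast idea in the elision filter commit above (reject a missing `articles` list at construction time instead of hitting a null reference later during indexing) might look like this in a small Python sketch. The class `ElisionFilter` and its `apply` method are hypothetical stand-ins, not the Elasticsearch classes, and the token handling is deliberately simplified to ASCII apostrophes.

```python
class ElisionFilter:
    """Strips a leading elided article such as "l'" from a token."""

    def __init__(self, articles=None):
        if articles is None:
            # Fail at construction time rather than during indexing.
            raise ValueError("[articles] is required for the elision filter")
        self.articles = {article.lower() for article in articles}

    def apply(self, token):
        head, apostrophe, tail = token.partition("'")
        if apostrophe and head.lower() in self.articles:
            return tail
        return token


elision = ElisionFilter(articles=["l", "d", "qu"])
print(elision.apply("l'avion"))
```

With the check in the constructor, a misconfigured filter is reported once, at setup, with a message that names the missing setting.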
Julie Tibshirani bed7e68014 Make the ignore_above docs tests more robust. (#43349)
It is possible for internal ML indices like `.data-frame-notifications-1` to leak,
causing other docs tests to fail when they accidentally search over these
indices. This PR updates the ignore_above tests to only search a specific index.
2019-06-27 10:50:55 +03:00
Lisa Cawley ad84059db6 [DOCS] Updates data frame APIs to use API template (#43610) 2019-06-26 13:49:37 -07:00
James Rodewig 87566c9324 [DOCS] Change 'X-Pack APIs' section to 'REST APIs' (#43451) 2019-06-26 13:46:12 -04:00
Alan Woodward 76d0edd1a4 Add prefix intervals source (#43635)
This commit adds a prefix intervals source, allowing you to search
for intervals that contain terms starting with a given prefix. The source
can make use of the index_prefixes mapping option.

Relates to #43198
2019-06-26 16:22:12 +01:00
Benjamin Trent c121b00c98 [7.x] [ML][Data Frame] Add support for allow_no_match for endpoints (#43490) (#43637)
* [ML][Data Frame] Add support for allow_no_match for endpoints (#43490)

* [ML][Data Frame] Add support for allow_no_match parameter in endpoints

Adds support for:
* Get Transforms
* Get Transforms stats
* stop transforms

* Update DataFrameTransformDocumentationIT.java
2019-06-26 10:09:56 -05:00
Stuart Tettemer 500205e8c5 Add painless method getByPath, get value from nested collections with dotted path (#43170) (#43606)
Given a nested structure composed of Lists and Maps, getByPath will return the value
keyed by path. getByPath is a method on Lists and Maps.

The path consists of string Map keys and integer List indices separated by dots. An optional
second argument returns a default value if the path lookup fails due to a missing value.

For example:
['key0': ['a', 'b'], 'key1': ['c', 'd']].getByPath('key1') = ['c', 'd']
['key0': ['a', 'b'], 'key1': ['c', 'd']].getByPath('key1.0') = 'c'
['key0': ['a', 'b'], 'key1': ['c', 'd']].getByPath('key2', 'x') = 'x'
[['key0': 'value0'], ['key1': 'value1']].getByPath('1.key1') = 'value1'

Throws IllegalArgumentException if an item cannot be found and a default is not given.
Throws NumberFormatException if a path element operating on a List is not an integer.

Fixes #42769
2019-06-26 09:06:34 -06:00
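The lookup rules described in the getByPath commit above can be approximated in Python as follows. This is a sketch of the described behavior, not the Painless implementation: `get_by_path` is a hypothetical free function, `LookupError` stands in for IllegalArgumentException, and `ValueError` (raised by `int`) stands in for NumberFormatException. How the real method treats a path that runs into a non-collection value is not stated in the commit; here it is treated as a missing value.

```python
_NO_DEFAULT = object()  # sentinel, so that None can be passed as a real default


def get_by_path(container, path, default=_NO_DEFAULT):
    """Walk nested dicts and lists using a dot-separated path."""
    current = container
    for element in path.split("."):
        found = False
        if isinstance(current, dict):
            found = element in current
            if found:
                current = current[element]
        elif isinstance(current, list):
            index = int(element)  # raises ValueError if not an integer
            found = 0 <= index < len(current)
            if found:
                current = current[index]
        if not found:
            if default is _NO_DEFAULT:
                raise LookupError(f"could not resolve {element!r} in path {path!r}")
            return default
    return current


data = {"key0": ["a", "b"], "key1": ["c", "d"]}
print(get_by_path(data, "key1"))
print(get_by_path(data, "key1.0"))
print(get_by_path(data, "key2", "x"))
print(get_by_path([{"key0": "value0"}, {"key1": "value1"}], "1.key1"))
```

The four calls mirror the four examples in the commit message.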
Jake Landis 51161a4b0e add 7.2.0 release notes 2019-06-26 08:50:11 -05:00
Armin Braun 83067968ca Add SAS Token Authentication Support to Azure Repo Plugin (#42982) (#43618)
* Added setting for SAS token
* Added support for the token in tests
* Relates #42117
2019-06-26 13:43:32 +02:00
David Roberts 558e323c89 [ML] Introduce a setting for the process connect timeout (#43234)
This change introduces a new setting,
xpack.ml.process_connect_timeout, which allows
increasing the timeout for an external ML process
to connect to the ES JVM.

The timeout may need to be increased if many
processes are being started simultaneously on
the same machine. This is unlikely in clusters
with many ML nodes, as we balance the processes
across the ML nodes, but can happen in clusters
with a single ML node and a high value for
xpack.ml.node_concurrent_job_allocations.
2019-06-26 09:22:04 +01:00
Yannick Welsch 2049f715b3 Add voting-only master node (#43410)
A voting-only master-eligible node is a node that can participate in master elections but will not act
as a master in the cluster. In particular, a voting-only node can help elect another master-eligible
node as master, and can serve as a tiebreaker in elections. High availability (HA) clusters require at
least three master-eligible nodes, so that if one of the three nodes is down, then the remaining two
can still elect a master among themselves. This only requires one of the two remaining nodes to
have the capability to act as master, but both need to have voting powers. This means that one of
the three master-eligible nodes can be made voting-only. If this voting-only node is a dedicated
master, a less powerful machine or a smaller heap-size can be chosen for this node. Alternatively, a
voting-only non-dedicated master node can play the role of the third master-eligible node, which
allows running an HA cluster with only two dedicated master nodes.

Closes #14340

Co-authored-by: David Turner <david.turner@elastic.co>
2019-06-26 08:07:56 +02:00
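The availability argument in the voting-only commit above can be checked with a toy model. The sketch below assumes a deliberately simplified rule: an election succeeds when a strict majority of the master-eligible nodes is up and at least one of the surviving nodes is not voting-only. The function `can_elect_master` and the node names are hypothetical, and the real cluster coordination logic is more involved than this.

```python
def can_elect_master(master_eligible, down=()):
    """master_eligible maps node name -> True if the node is voting-only."""
    up = [name for name in master_eligible if name not in down]
    # A strict majority of all master-eligible nodes must be able to vote.
    has_quorum = len(up) > len(master_eligible) // 2
    # At least one surviving node must be able to act as master.
    has_candidate = any(not master_eligible[name] for name in up)
    return has_quorum and has_candidate


cluster = {"master-1": False, "master-2": False, "voter-1": True}

print(can_elect_master(cluster, down={"master-1"}))
print(can_elect_master(cluster, down={"voter-1"}))
print(can_elect_master(cluster, down={"master-1", "master-2"}))
```

Losing any single node leaves two voters, at least one of which can act as master, so an election still succeeds. Losing both full master nodes leaves only the voting-only node, which can neither form a majority nor become master.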
James Rodewig 50eac875e4 [DOCS] Rewrite `range` query (#43282) 2019-06-25 15:25:48 -04:00
Dimitris Athanasiou 126c2fd2d5 [7.x][ML] Machine learning data frame analytics (#43544) (#43592)
This merges the initial work that adds a framework for performing
machine learning analytics on data frames. The feature is currently experimental
and requires a platinum license. Note that the original commits can be
found in the `feature-ml-data-frame-analytics` branch.

A new set of APIs is added which allows the creation of data frame analytics
jobs. Configuration allows specifying different types of analysis to be performed
on a data frame. Initially, only outlier detection is supported.

The APIs are:

- PUT _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}/_stats
- POST _ml/data_frame/analysis/{id}/_start
- POST _ml/data_frame/analysis/{id}/_stop
- DELETE _ml/data_frame/analysis/{id}

When a data frame analytics job is started a persistent task is created and started.
The main steps of the task are:

1. reindex the source index into the dest index
2. analyze the data through the data_frame_analyzer C++ process
3. merge the results of the process back into the destination index

In addition, an evaluation API is added which packages commonly used metrics
that provide evaluation of various analyses:

- POST _ml/data_frame/_evaluate
2019-06-25 20:29:11 +03:00
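For orientation, the job lifecycle listed in the commit above can be laid out as a small Python helper that fills in the `{id}` placeholder. The HTTP methods and paths are copied verbatim from the commit message and may not match the endpoints of any released version; the helper name `analytics_endpoints`, the operation labels, and the job id `my-job` are hypothetical. No requests are sent.

```python
def analytics_endpoints(job_id):
    """Return (method, path) pairs for one data frame analytics job."""
    base = f"_ml/data_frame/analysis/{job_id}"
    return {
        "create": ("PUT", base),
        "get": ("GET", base),
        "stats": ("GET", f"{base}/_stats"),
        "start": ("POST", f"{base}/_start"),
        "stop": ("POST", f"{base}/_stop"),
        "delete": ("DELETE", base),
    }


# The evaluation API is not tied to a single job.
EVALUATE = ("POST", "_ml/data_frame/_evaluate")

for name, (method, path) in analytics_endpoints("my-job").items():
    print(f"{name:>6}: {method} {path}")
```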
James Rodewig b598701198 [DOCS] Add redirect for painless examples anchor 2019-06-25 12:34:18 -04:00
rbayet 66693c2706 Fixing backquote in fail_on_unsupported_field (#43572) 2019-06-25 16:34:38 +02:00
Ernesto Reig c594a956e2 Default number of shards is now 1 instead of 5 (#43573)
As specified in the [Breaking changes for 7.X](https://www.elastic.co/guide/en/elasticsearch/reference/7.1/breaking-changes-7.0.html#breaking_70_indices_changes), the default number of shards for an index is now `1` instead of `5`.
2019-06-25 14:51:07 +02:00
debadair df42fac9ac [DOCS] Edited title/subtitle. (#43552) 2019-06-24 15:31:19 -07:00
Lisa Cawley 8ffd9c6981 [DOCS] Adds administering section (#43493) 2019-06-24 10:15:25 -07:00
David Roberts 6728e63619 [DOCS] Rename "job" to "transform" in data frame transform docs (#43534) 2019-06-24 09:11:24 -07:00
Tanguy Leroux 9794409ca0 Fix broken link 2019-06-24 16:19:57 +02:00
Tanguy Leroux a4dfa7c29b Add release highlight for replicated closed indices on 7.2.0 (#43530) 2019-06-24 15:54:36 +02:00
Matthew Adams 0bcadbf846 Clarify storage location of ML Snapshots (#43437)
The existing language was misleading about the model snapshots and where they are located. Saying "to disk" sounds like files external to Elasticsearch, which raises obvious questions: where on disk, on which node, and is it in the Elasticsearch snapshot repo? The model snapshots are held in an internal index.
2019-06-24 09:14:12 +01:00
Igor Motov 6162471d2e Docs: Add description of the coerce parameter in geo_shape mapper (#43340)
Explains the effect of the coerce parameter on the geo_shape field.

Relates #35059
2019-06-21 12:30:20 -04:00
James Rodewig 014fd19abd [DOCS] Rewrite `constant_score` query (#43374) 2019-06-21 12:04:00 -04:00
James Rodewig 359b103f87 [DOCS] Rewrite term-level queries overview (#43337) 2019-06-21 11:55:02 -04:00
Luiz Guilherme Pais dos Santos eeb1812510 Example of how to set slow logs dynamically per-index (#42384)
* Example of how to set slow logs dynamically per-index

* Make _settings API example more explicit

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>

* Add TEST directive to fix CI

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-06-21 09:30:53 -04:00
David Kyle d1280339a8 specifies which index to search in docs for various queries (#43307) (#43428)
The geo-bounding-box and phrase-suggest docs were susceptible to
failing due to other indices in the cluster. This change restricts
the queries to the index that is set up for the test.

relates to #43271.
2019-06-21 10:15:51 +01:00
Yu c88f2f23a5 Make Recovery API support `detailed` params (#29076)
Properly forwards the `detailed` parameter to show the recovery stats details.

Closes #28910
2019-06-21 09:05:33 +02:00
Ryan Ernst 7b0a259b2c Clarify unsupported secure settings behavior (#43454)
This commit tweaks the docs for secure settings to ensure the user is
aware that adding non-secure settings to the keystore will result in
Elasticsearch not starting.

fixes #43328

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
2019-06-20 14:27:27 -07:00
Deb Adair 6b1e45b5b3 [DOCS] Updated the URL for starting in the cloud. 2019-06-20 13:09:21 -07:00
debadair 2319fe74c3 [DOCS] Fixed path to install directory. (#43443) 2019-06-20 10:36:28 -07:00
Lisa Cawley 5f8db95d60 [DOCS] Describe setup for monitoring logs (#42655) 2019-06-20 08:17:27 -07:00
debadair 7b740b4ea3 [DOCS] Add brew install instructions. Closes #42914 (#42915) 2019-06-20 07:56:49 -07:00
David Kyle 12bc38d9e6 Mute put-transform docs test
Relates to #43271
2019-06-20 15:54:24 +01:00
Christoph Büscher adab7eae71 [Docs] Remove boost parameter from intervals-query example (#43331)
The boost factor doesn't seem to be needed and can be removed.
2019-06-20 10:34:14 +02:00