This commit adds a prefix intervals source, allowing you to search
for intervals that contain terms starting with a given prefix. The source
can make use of the index_prefixes mapping option.
Relates to #43198
* [ML][Data Frame] Add support for allow_no_match for endpoints (#43490)
* [ML][Data Frame] Add support for allow_no_match parameter in endpoints
Adds support for:
* Get Transforms
* Get Transforms stats
* stop transforms
* Update DataFrameTransformDocumentationIT.java
Given a nested structure composed of Lists and Maps, getByPath will return the value
keyed by path. getByPath is a method on Lists and Maps.
The path is string Map keys and integer List indices separated by dot. An optional third
argument returns a default value if the path lookup fails due to a missing value.
Eg.
['key0': ['a', 'b'], 'key1': ['c', 'd']].getByPath('key1') = ['c', 'd']
['key0': ['a', 'b'], 'key1': ['c', 'd']].getByPath('key1.0') = 'c'
['key0': ['a', 'b'], 'key1': ['c', 'd']].getByPath('key2', 'x') = 'x'
[['key0': 'value0'], ['key1': 'value1']].getByPath('1.key1') = 'value1'
Throws IllegalArgumentException if an item cannot be found and a default is not given.
Throws NumberFormatException if a path element operating on a List is not an integer.
Fixes#42769
This change introduces a new setting,
xpack.ml.process_connect_timeout, to enable
the timeout for one of the external ML processes
to connect to the ES JVM to be increased.
The timeout may need to be increased if many
processes are being started simultaneously on
the same machine. This is unlikely in clusters
with many ML nodes, as we balance the processes
across the ML nodes, but can happen in clusters
with a single ML node and a high value for
xpack.ml.node_concurrent_job_allocations.
A voting-only master-eligible node is a node that can participate in master elections but will not act
as a master in the cluster. In particular, a voting-only node can help elect another master-eligible
node as master, and can serve as a tiebreaker in elections. High availability (HA) clusters require at
least three master-eligible nodes, so that if one of the three nodes is down, then the remaining two
can still elect a master amongst them-selves. This only requires one of the two remaining nodes to
have the capability to act as master, but both need to have voting powers. This means that one of
the three master-eligible nodes can be made as voting-only. If this voting-only node is a dedicated
master, a less powerful machine or a smaller heap-size can be chosen for this node. Alternatively, a
voting-only non-dedicated master node can play the role of the third master-eligible node, which
allows running an HA cluster with only two dedicated master nodes.
Closes#14340
Co-authored-by: David Turner <david.turner@elastic.co>
This merges the initial work that adds a framework for performing
machine learning analytics on data frames. The feature is currently experimental
and requires a platinum license. Note that the original commits can be
found in the `feature-ml-data-frame-analytics` branch.
A new set of APIs is added which allows the creation of data frame analytics
jobs. Configuration allows specifying different types of analysis to be performed
on a data frame. At first there is support for outlier detection.
The APIs are:
- PUT _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}/_stats
- POST _ml/data_frame/analysis/{id}/_start
- POST _ml/data_frame/analysis/{id}/_stop
- DELETE _ml/data_frame/analysis/{id}
When a data frame analytics job is started a persistent task is created and started.
The main steps of the task are:
1. reindex the source index into the dest index
2. analyze the data through the data_frame_analyzer c++ process
3. merge the results of the process back into the destination index
In addition, an evaluation API is added which packages commonly used metrics
that provide evaluation of various analysis:
- POST _ml/data_frame/_evaluate
The existing language was misleading about the model snapshots and where they are located. Saying "to disk" sounds like files external to Elasticsearch IMO. It raises the obvious question, where on disk? which node? Is it in the Elasticsearch snapshot repo? The model snapshots are held in an internal index.
* Example of how to set slow logs dynamically per-index
* Make _settings API example more explicit
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
* Add TEST directive to fix CI
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
the geo-bounding-box and phrase-suggest docs were susceptible to
failing due to other indices in the cluster. This change restricts
the queries to the index that is set up for the test.
relates to #43271.
This commit tweaks the docs for secure settings to ensure the user is
aware adding non secure settings to the keystore will result in
elasticsearch not starting.
fixes#43328
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
Together with types removal, any mention of "fields with the same name in the same index" doesn't make sense anymore.
(cherry picked from commit c5190106cbd4c007945156249cce462956933326)
* [ML][Data Frame] adds new pipeline field to dest config (#43124)
* [ML][Data Frame] adds new pipeline field to dest config
* Adding pipeline support to _preview
* removing unused import
* moving towards extracting _source from pipeline simulation
* fixing permission requirement, adding _index entry to doc
* adjusting for java 8 compatibility
* adjusting bwc serialization version to 7.3.0
These docs were misleading for package installations of
Elasticsearch. Instead, we should refer to $ES_CONFIG/ingest-geoip as
the path to place the custom database files. For non-package
installations, this is the same as $ES_HOME/config, but for package
installations this is not the case as the config directory for package
installations is /etc/elasticsearch, and is not relative to
$ES_HOME. This commit corrects the docs.
* [DOCS] Add introduction to Elasticsearch.
* [DOCS] Incorporated review comments.
* [DOCS] Minor edits to add an abbreviated title and cross refs.
* [DOCS] Added sizing tips & link to quantatative sizing video.
To be consistent with the `search.max_buckets` default setting,
set the hard limit of the PriorityQueue used for in memory sorting,
when sorting on an aggregate function, to 10000.
Fixes: #43168
(cherry picked from commit 079e012fdea68ea0a7daae078359495047e9c407)