Adds documentation for the `minimum_should_match` parameter to the `bool` query docs. Includes docs for the default values:
- `1` if the `bool` query includes at least one `should` clause and no `must` or `filter` clauses
- `0` otherwise
* Adds a title abbreviation
* Updates the description and adds a Lucene link
* Reformats the parameters section
* Adds analyze, custom analyzer, and custom filter snippets
Relates to #44726.
The documentation contained a small error, as bytes and duration was not
properly converted to a number and thus remained a string.
The documentation is now also properly tested by providing a full blown
simulate pipeline example.
* Recommend Metricbeat for 7.x
* Fix typo in link to configuring-metricbeat
* [DOCS] Fixes build error and some terminology
* Add to local exporter page per review feedback
* Creates a prerequisites section in the cross-cluster replication (CCR)
overview.
* Adds concise definitions for local and remote cluster in a CCR context.
* Documents that the ES version of the local cluster must be the same
or a newer compatible version as the remote cluster.
The "Restore any snapshots as required" step is a trap: it's somewhere between
tricky and impossible to restore multiple clusters into a single one.
Also add a note about configuring discovery during a rolling upgrade to
proscribe any rare cases where you might accidentally autobootstrap during the
upgrade.
When the enrich processor appends enrich data to an incoming document,
it adds a `target_field` to contain the enrich data.
This `target_field` contains both the `match_field` AND `enrich_fields`
specified in the enrich policy.
Previously, this was reflected in the documented example but not
explicitly stated. This adds several explicit statements to the docs.
Reindex sort never gave a guarantee about the order of documents being
indexed into the destination, though it could give a sense of locality
of source data.
It prevents us from doing resilient reindex and other optimizations and
it has therefore been deprecated.
Related to #47567
This rewrites long sort as a `DistanceFeatureQuery`, which can
efficiently skip non-competitive blocks and segments of documents.
Depending on the dataset, the speedups can be 2 - 10 times.
The optimization can be disabled with setting the system property
`es.search.rewrite_sort` to `false`.
Optimization is skipped when an index has 50% or more data with
the same value.
Optimization is done through:
1. Rewriting sort as `DistanceFeatureQuery` which can
efficiently skip non-competitive blocks and segments of documents.
2. Sorting segments according to the primary numeric sort field(#44021)
This allows to skip non-competitive segments.
3. Using collector manager.
When we optimize sort, we sort segments by their min/max value.
As a collector expects to have segments in order,
we can not use a single collector for sorted segments.
We use collectorManager, where for every segment a dedicated collector
will be created.
4. Using Lucene's shared TopFieldCollector manager
This collector manager is able to exchange minimum competitive
score between collectors, which allows us to efficiently skip
the whole segments that don't contain competitive scores.
5. When index is force merged to a single segment, #48533 interleaving
old and new segments allows for this optimization as well,
as blocks with non-competitive docs can be skipped.
Backport for #48804
Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
Reindex sort never gave a guarantee about the order of documents being
indexed into the destination, though it could give a sense of locality
of source data.
It prevents us from doing resilient reindex and other optimizations and
it has therefore been deprecated.
Related to #47567
This adds a `_source` setting under the `source` setting of a data
frame analytics config. The new `_source` is reusing the structure
of a `FetchSourceContext` like `analyzed_fields` does. Specifying
includes and excludes for source allows selecting which fields
will get reindexed and will be available in the destination index.
Closes#49531
Backport of #49690
- Improves HTTP client hostname verification failure messages
- Adds "DiagnosticTrustManager" which logs certificate information
when trust cannot be established (hostname failure, CA path failure,
etc)
These diagnostic messages are designed so that many common TLS
problems can be diagnosed based solely (or primarily) on the
elasticsearch logs.
These diagnostics can be disabled by setting
xpack.security.ssl.diagnose.trust: false
Backport of: #48911
Add a couple of pointers for the user to check the
overall cluster health and the version of ES running
on every node.
Fixes: #49670
(cherry picked from commit 8ca11f54cd839f41632c556601e94da67e91a3d1)
This change adds a dynamic cluster setting named `indices.id_field_data.enabled`.
When set to `false` any attempt to load the fielddata for the `_id` field will fail
with an exception. The default value in this change is set to `false` in order to prevent
fielddata usage on this field for future versions but it will be set to `true` when backporting
to 7x. When the setting is set to true (manually or by default in 7x) the loading will also issue
a deprecation warning since we want to disallow fielddata entirely when https://github.com/elastic/elasticsearch/issues/26472
is implemented.
Closes#43599
This commit clarifies how to override JAVA_HOME from the bundled jdk for
deb and rpm installs, which each have their own file that is sourced
upon service startup.
closes#49068
Fix reference about the uid:gid that Elasticsearch runs as inside
the Docker container and add a packaging test to ensure that bind
mounting a data dir with a random uid and gid:0 works as
expected.
Backport of #49529Closes#47929
Backport of #49076
In case an exception occurs inside a pipeline processor,
the pipeline stack is kept around as header in the exception.
Then in the on_failure processor the id of the pipeline the
exception occurred is made accessible via the `on_failure_pipeline`
ingest metadata.
Closes#44920
Add TRUNC as alias to already implemented TRUNCATE
numeric function which is the flavour supported by
Oracle and PostgreSQL.
Relates to: #41195
(cherry picked from commit f2aa7f0779bc5cce40cc0c1f5e5cf1a5bb7d84f0)
The default is set to Integer.MAX_VALUE but is reported to be `0` in the docs.
With the current implementation a value of 0 would mean all terms are filtered
out, which is the opposite of "unbounded".
Closes#49520
The breaking changes cover the removal of TLSv1 from the default
protocols, but assume that users who need to retain TLSv1 support will
understand all the places where they may used it.
This has proven not to be true, as it is easy to be unaware that (for
example) an LDAP server is using TLSv1.
This change explicitly lists all the places where TLS protocols may
need to be configured.
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
Co-Authored-By: Pius <pius@elastic.co>