It adds notes about:
- how preference can help optimize cache usage
- the fact that too many replicas can hurt search performance due to lower
utilization of the filesystem cache
- how index sorting can improve _source compression
- how always putting fields in the same order in documents can improve _source
compression
* Add documentation for the new parent-join field
This commit adds the docs for the new parent-join field.
It explains how to define, index and query this new field.
Relates #20257
* Upgrade icu4j for the ICU analysis plugin to 59.1
Lucene upgraded to 59.1 so we should use the same.
Closes#21425
* Add breaking change for the icu upgrade
This snapshot has faster range queries on range fields (LUCENE-7828), more
accurate norms (LUCENE-7730) and the ability to use fake term frequencies
(LUCENE-7854).
Expose the experimental simplepattern and
simplepatternsplit tokenizers in the common
analysis plugin. They provide tokenization based
on regular expressions, using Lucene's
deterministic regex implementation that is usually
faster than Java's and has protections against
creating too-deep stacks during matching.
Both have a not-very-useful default pattern of the
empty string because all tokenizer factories must
be able to be instantiated at index creation time.
They should always be configured by the user
in practice.
Get mappings HEAD requests incorrectly return a content-length header of
0. This commit addresses this by removing the special handling for get
mappings HEAD requests, and just relying on the general mechanism that
exists for handling HEAD requests in the REST layer.
Relates #23192
This commit adds back "id" as the key within a script to specify a
stored script (which with file scripts now gone is no longer ambiguous).
It also adds "source" as a replacement for "code". This is in an attempt
to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.
This change removes the `postings` highlighter. This highlighter has been removed from Lucene master (7.x) because it behaves
exactly like the `unified` highlighter when index_options is set to `offsets`:
https://issues.apache.org/jira/browse/LUCENE-7815
It also makes the `unified` highlighter the default choice for highlighting a field (if `type` is not provided).
The strategy used internally by this highlighter remain the same as before, it checks `term_vectors` first, then `postings` and ultimately it re-analyzes the text.
Ultimately it rewrites the docs so that the options that the `unified` highlighter cannot handle are clearly marked as such.
There are few features that the `unified` highlighter is not able to handle which is why the other highlighters (`plain` and `fvh`) are still available.
I'll open separate issues for these features and we'll deprecate the `fvh` and `plain` highlighters when full support for these features have been added to the `unified`.
This PR enables Ingest plugins to leverage processor-scoped REST
endpoints. First of which being the Grok endpoint that retrieves
Grok Patterns for users to retrieve all the built-in patterns.
Example usage: Kibana Grok Autocomplete!
This commit refactors the query phase in order to be able
to automatically detect queries that can be early terminated.
If the index sort matches the query sort, the top docs collection is early terminated
on each segment and the computing of the total number of hits that match the query is delegated to a simple TotalHitCountCollector.
This change also adds a new parameter to the search request called `track_total_hits`.
It indicates if the total number of hits that match the query should be tracked.
If false, queries sorted by the index sort will not try to compute this information and
and will limit the collection to the first N documents per segment.
Aggregations are not impacted and will continue to see every document
even when the index sort matches the query sort and `track_total_hits` is false.
Relates #6720
During package install on systemd-based systems, some sysctl settings
should be set (e.g. vm.max_map_count).
In some environments, changing sysctl settings plainly does not work;
previously a global environment variable named
ES_SKIP_SET_KERNEL_PARAMETERS was introduced to skip calling sysctl, but
this causes trouble for:
- configuration management systems, which usually cannot apply an env
var when running a package manager
- package upgrades, which will not have the env var set any more, and
thus leaving the package management system in a bad state (possibly
half-way upgraded, can be very hard to recover)
This removes the env var again and instead of calling systemd-sysctl
manually, tells systemd to restart the wrapper unit - which itself can
be masked by system administrators or management tools if it is known
that sysctl does not work in a given environment.
The restart is not silent on systems in their default configuration, but
is ignored if the unit is masked.
Relates #24234
The index parameter in the update-aliases, put-alias, and delete-alias APIs no longer accepts alias names. Instead, it accepts only index names (or wildcards which will expand to matching indices).
Closes#23960
This removes the parsing of things like `GET /idx/_aliases,_mappings`, instead,
a user must choose between retriving all index metadata with `GET /idx`, or only
a specific form such as `GET /idx/_settings`.
Relates to (and is a prerequisite of) #24437
Some response classes in the java api expose both `getTook()` which returns a `TimeValue` and `getTookInMillis` which returns a `long` value. `getTook()` is enough as one can do `getTook().millis()` to obtain the same result as `getTookInMillis()`, which can be removed.
* Adds nodes usage API to monitor usages of actions
The nodes usage API has 2 main endpoints
/_nodes/usage and /_nodes/{nodeIds}/usage return the usage statistics
for all nodes and the specified node(s) respectively.
At the moment only one type of usage statistics is available, the REST
actions usage. This records the number of times each REST action class is
called and when the nodes usage api is called will return a map of rest
action class name to long representing the number of times each of the action
classes has been called.
Still to do:
* [x] Create usage service to store usage statistics
* [x] Record usage in REST layer
* [x] Add Transport Actions
* [x] Add REST Actions
* [x] Tests
* [x] Documentation
* Rafactors UsageService so counts are done by the handlers
* Fixing up docs tests
* Adds a name to all rest actions
* Addresses review comments
This commit adds a new bg_count field to the REST response of
SignificantTerms aggregations. Similarly to the bg_count that already
exists in significant terms buckets, this new bg_count field is set at
the aggregation level and is populated with the superset size value.
Currently global ordinals are documented under `fielddata`. It moves them to
their own file since they also work with doc values and fielddata is on the way
out.
Closes#23101
This commit adds a `doc_count` field to the response body of Matrix
Stats aggregation. It exposes the number of documents involved in
the computation of statistics, a value that can already be retrieved using
the method MatrixStats.getDocCount() in the Java API.
* SignificantText aggregation - like significant_terms but doesn’t require fielddata=true, recommended used with `sampler` agg to limit expense of tokenizing docs and takes optional `filter_duplicate_text`:true setting to avoid stats skew from repeated sections of text in search results.
Closes#23674
Adds a "magic" key to the yaml testing stash mostly for use with
documentation tests. When unstashing an object, `$_path` is the
path into the current position in the object you are unstashing.
This means that in docs tests you can use
`// TESTRESPONSEs/somevalue/$body.${_path}/` to mean "replace
`somevalue` with whatever is the response in the same position."
Compare how you must carefully mock out all the numbers in the profile
response without this change:
```
// TESTRESPONSE[s/"id": "\[2aE02wS1R8q_QFnYu6vDVQ\]\[twitter\]\[1\]"/"id": $body.profile.shards.0.id/]
// TESTRESPONSE[s/"rewrite_time": 51443/"rewrite_time": $body.profile.shards.0.searches.0.rewrite_time/]
// TESTRESPONSE[s/"score": 51306/"score": $body.profile.shards.0.searches.0.query.0.breakdown.score/]
// TESTRESPONSE[s/"time_in_nanos": "1873811"/"time_in_nanos": $body.profile.shards.0.searches.0.query.0.time_in_nanos/]
// TESTRESPONSE[s/"build_scorer": 2935582/"build_scorer": $body.profile.shards.0.searches.0.query.0.breakdown.build_scorer/]
// TESTRESPONSE[s/"create_weight": 919297/"create_weight": $body.profile.shards.0.searches.0.query.0.breakdown.create_weight/]
// TESTRESPONSE[s/"next_doc": 53876/"next_doc": $body.profile.shards.0.searches.0.query.0.breakdown.next_doc/]
// TESTRESPONSE[s/"time_in_nanos": "391943"/"time_in_nanos": $body.profile.shards.0.searches.0.query.0.children.0.time_in_nanos/]
// TESTRESPONSE[s/"score": 28776/"score": $body.profile.shards.0.searches.0.query.0.children.0.breakdown.score/]
// TESTRESPONSE[s/"build_scorer": 784451/"build_scorer": $body.profile.shards.0.searches.0.query.0.children.0.breakdown.build_scorer/]
// TESTRESPONSE[s/"create_weight": 1669564/"create_weight": $body.profile.shards.0.searches.0.query.0.children.0.breakdown.create_weight/]
// TESTRESPONSE[s/"next_doc": 10111/"next_doc": $body.profile.shards.0.searches.0.query.0.children.0.breakdown.next_doc/]
// TESTRESPONSE[s/"time_in_nanos": "210682"/"time_in_nanos": $body.profile.shards.0.searches.0.query.0.children.1.time_in_nanos/]
// TESTRESPONSE[s/"score": 4552/"score": $body.profile.shards.0.searches.0.query.0.children.1.breakdown.score/]
// TESTRESPONSE[s/"build_scorer": 42602/"build_scorer": $body.profile.shards.0.searches.0.query.0.children.1.breakdown.build_scorer/]
// TESTRESPONSE[s/"create_weight": 89323/"create_weight": $body.profile.shards.0.searches.0.query.0.children.1.breakdown.create_weight/]
// TESTRESPONSE[s/"next_doc": 2852/"next_doc": $body.profile.shards.0.searches.0.query.0.children.1.breakdown.next_doc/]
// TESTRESPONSE[s/"time_in_nanos": "304311"/"time_in_nanos": $body.profile.shards.0.searches.0.collector.0.time_in_nanos/]
// TESTRESPONSE[s/"time_in_nanos": "32273"/"time_in_nanos": $body.profile.shards.0.searches.0.collector.0.children.0.time_in_nanos/]
```
To how you can cavalierly mock all the numbers at once with this change:
```
// TESTRESPONSE[s/(?<=[" ])\d+(\.\d+)?/$body.$_path/]
```
Currently a `delete document` request against a non-existing index actually **creates** this index.
With this change the `delete document` no longer creates the previously non-existing index and throws an `index_not_found` exception instead.
However as discussed in https://github.com/elastic/elasticsearch/pull/15451#issuecomment-165772026, if an external version is explicitly used, the current behavior is preserved and the index is still created and the document is marked for deletion.
Fixes#15425
Native scripts have been replaced in documentation by implementing
a ScriptEngine and they were deprecated in 5.5.0. This commit
removes the native script infrastructure for 6.0.
closes#19966
This PR adds a new thread pool type: `fixed_auto_queue_size`. This thread pool
behaves like a regular `fixed` threadpool, except that every
`auto_queue_frame_size` operations (default: 10,000) in the thread pool,
[Little's Law](https://en.wikipedia.org/wiki/Little's_law) is calculated and
used to adjust the pool's `queue_size` either up or down by 50. A minimum and
maximum is taken into account also. When the min and max are the same value, a
regular fixed executor is used instead.
The `SEARCH` threadpool is changed to use this new type of thread pool. However,
the min and max are both set to 1000, meaning auto adjustment is opt-in rather
than opt-out.
Resolves#3890
In scripts (at least some of the languages), the terms dictionary and
postings can be access with the special _index variable. This is for
very advanced use cases which want to do their own scoring. The problem
is segment level statistics must be recomputed for every document.
Additionally, this is not friendly to the terms index caching as the
order of looking up terms should be controlled by lucene.
This change removes _index from scripts. Anyone using it can and should
instead write a Similarity plugin, which is explicitly designed to allow
doing the calculations needed for a relevance score.
closes#19359
Today when an index is `read-only` the index is also blocked from
being deleted which sometimes is undesired since in-order to make
changes to a cluster indices must be deleted to free up space. This is
a likely scenario in a hosted environment when disk-space is limited to switch
indices read-only but allow deletions to free up space.
Our strong recommendation is disabling swap over any other alternative
to avoid the JVM from landing on disk. This commit clarifies the docs in
this regard.
This commit documents how to write a `ScriptEngine` in order to use
expert internal apis, such as using Lucene directly to find index term
statistics. These documents prepare the way to remove both native
scripts and IndexLookup.
The example java code is actually compiled and tested under a new gradle
subproject for example plugins. This change does not yet breakup
jvm-example into the new examples dir, which should be done separately.
relates #19359
relates #19966
This commit adds support for histogram and date_histogram agg compound order by refactoring and reusing terms agg order code. The major change is that the Terms.Order and Histogram.Order classes have been replaced/refactored into a new class BucketOrder. This is a breaking change for the Java Transport API. For backward compatibility with previous ES versions the (date)histogram compound order will use the first order. Also the _term and _time aggregation order keys have been deprecated; replaced by _key.
Relates to #20003: now that all these aggregations use the same order code, it should be easier to move validation to parse time (as a follow up PR).
Relates to #14771: histogram and date_histogram aggregation order will now be validated at reduce time.
Closes#23613: if a single BucketOrder that is not a tie-breaker is added with the Java Transport API, it will be converted into a CompoundOrder with a tie-breaker.
Currently, the get snapshots API (e.g. /_snapshot/{repositoryName}/_all)
provides information about snapshots in the repository, including the
snapshot state, number of shards snapshotted, failures, etc. In order
to provide information about each snapshot in the repository, the call
must read the snapshot metadata blob (`snap-{snapshot_uuid}.dat`) for
every snapshot. In cloud-based repositories, this can be expensive,
both from a cost and performance perspective. Sometimes, all the user
wants is to retrieve all the names/uuids of each snapshot, and the
indices that went into each snapshot, without any of the other status
information about the snapshot. This minimal information can be
retrieved from the repository index blob (`index-N`) without needing to
read each snapshot metadata blob.
This commit enhances the get snapshots API with an optional `verbose`
parameter. If `verbose` is set to false on the request, then the get
snapshots API will only retrieve the minimal information about each
snapshot (the name, uuid, and indices in the snapshot), and only read
this information from the repository index blob, thereby giving users
the option to retrieve the snapshots in a repository in a more
cost-effective and efficient manner.
Closes#24288
Now that indices have a single type by default, we can move to the next step
and identify documents using their `_id` rather than the `_uid`.
One notable change in this commit is that I made deletions implicitly create
types. This helps with the live version map in the case that documents are
deleted before the first type is introduced. Otherwise there would be no way
to differenciate `DELETE index/foo/1` followed by `PUT index/foo/1` from
`DELETE index/bar/1` followed by `PUT index/foo/1`, even though those are
different if versioning is involved.
This commit adds support for indexing and searching a new ip_range field type. Both IPv4 and IPv6 formats are supported. Tests are updated and docs are added.
`_search_shards`API today only returns aliases names if there is an alias
filter associated with one of them. Now it can be useful to see which aliases
have been expanded for an index given the index expressions. This change also includes non-filtering aliases even without a filtering alias being present.
Adds CONSOLE to cross-cluster-search docs but skips them for testing
because we don't have a second cluster set up. This gets us the
`VIEW IN CONSOLE` and `COPY AS CURL` links and makes sure that they
are valid yaml (not json, technically) but doesn't get testing.
Which is better than we had before.
Adds CONSOLE to the dynamic templates docs and ingest-node docs.
The ingest-node docs contain a *ton* of non-console snippets. We
might want to convert them to full examples later, but that can be
a separate thing.
Relates to #18160
Rewrites most of the snippets in the `innert_hits` docs to be
complete examples and enables `VIEW IN CONSOLE`, `COPY AS CURL`,
and automatic testing of the snippets.
Today we go to heroic lengths to workaround bugs in the JDK or around
issues like BSD jails to get information about the underlying file
store. For example, we went to lengths to work around a JDK bug where
the file store returned would incorrectly report whether or not a path
is writable in certain situations in Windows operating
systems. Another bug prevented getting file store information on
Windows on a virtual drive on Windows. We no longer need to work
around these bugs, we could simply try to write to disk and let an I/O
exception arise if we could not write to the disk or take advantage of
the fact that these bugs are fixed in recent releases of the JDK
(e.g., the file store bug is fixed since 8u72). Additionally, we
collected information about all file stores on the system which meant
that if the user had a stale NFS mount, Elasticsearch could hang and
fail on startup if that mount point was not available. Finally, we
collected information through Lucene about whether or not a disk was a
spinning disk versus an SSD, information that we do not need since we
assume SSDs by default. This commit takes into consideration that we
simply do not need this heroic effort, we do not need information
about all file stores, and we do not need information about whether or
not a disk spins to greatly simplfy file store handling.
Relates #24402
Open/Close index api have allow_no_indices set to false by default, while delete index has it set to true. The flag controls where a wildcard expression that matches no indices will be ignored or an error will be thrown instead. This commit aligns open/close default behaviour to that of delete index.
Add info about the base image used and the github repo of
elasticsearch-docker.
Clarify that setting `memlock=-1:-1` is only a requirement when
`bootstrap_memory_lock=true` and the alternatives we document
elsewhere in docs for disabling swap are valid for Docker as well.
Additionally, with latest versions of docker-ce shipping with
unlimited (or high enough) defaults for `nofile` and `nproc`, clarify
that explicitly setting those per ES container is not required, unless
they are not defined in the Docker daemon.
Finally simplify production `docker-compose.yml` example by removing
unneeded options.
Relates #24389
This change makes the request builder code-path same as `Client#execute`. The request builder used to return a `ListenableActionFuture` when calling execute, which allows to associate listeners with the returned future. For async execution though it is recommended to use the `execute` method that accepts an `ActionListener`, like users would do when using `Client#execute`.
Relates to #24412
Relates to #9201
The _cat/shards docs asserted that one of the columns looked like
a propery byte size but used a regex like `\d+\.\d+.*` which doesn't
match `0b` which is a possible value. Instead this uses
`\d(\.\d+)?[kmg]?b`.
The response is attempting to illustrate the sync_id marker, but in
the test the index is too "fresh" to have a sync marker. So the test
needs to execute a sync flush behind the scenes so that the marker
is present
Currently we don't write the count value to the geo_centroid aggregation rest response,
but it is provided via the java api and the count() method in the GeoCentroid interface.
We should add this parameter to the rest output and also provide it via the getProperty()
method.
The alias parameter was documented as a list in our rest-spec, yet only the first value out of a list was getting read and processed. This commit adds support for multiple aliases to _cat/aliases
Closes#23661
This adds the `index.mapping.single_type` setting, which enforces that indices
have at most one type when it is true. The default value is true for 6.0+ indices
and false for old indices.
Relates #15613
Docs: rewrite description of `bool`'s `should`
Rewrites the description of the `bool` query's `should`
clauses so it is (hopefully) more clear what the defaults
for `minimum_should_match` are.
There is still an `[IMPORTANT]` section about `minimum_should_match`
in a filter context. I think it is worth keeping because it is, well,
important.
Closes#23831
This commit adds a link to the minimum master nodes section of the
important settings docs from the Zen discovery docs to clarify the
meaning and importance of setting minimum master nodes to a quorum of
master-eligible nodes.
Relates #24311
The `count` value in the stats aggregation represents a simple doc count
that doesn't require a formatted version. We didn't render an "as_string"
version for count in the rest response, so the method should also be
removed in favour of just using String.valueOf(getCount()) if a string
version of the count is needed.
Closes#24287
I just spent ages debugging a script I wrote after following the documentation. It was not clear to me that _index is not defined when using painless; if it was mentioned on this page I would have saved myself a lot of time.
For the Windows service, JAVA_HOME should be set to the path to the
JDK. We should make this clear in the docs to help users avoid
frustrating startup problems.
Relates #24260
Add option "enable_position_increments" with default value true.
If option is set to false, indexed value is the number of tokens
(not position increments count)
This commit rewords the note on whitespace in Log4j settings to not
refer to only of the examples on the page, but instead be clear that the
note applies to all the examples on the page.
A confusing thing that can happen when configuring Log4j is that
extraneous whitespace throws off its configuration parsing yet the error
messages that arise give no indication that this is the problem. This
commit adds a note to the docs.
Relates #24198
This commit removes the deprecated cloud.aws.* settings. It also removes
backcompat for specifying `discovery.type: ec2`, and unused aws signer
code which was removed in a previous PR.
This change adds an index setting to define how the documents should be sorted inside each Segment.
It allows any numeric, date, boolean or keyword field inside a mapping to be used to sort the index on disk.
It is not allowed to use a `nested` fields inside an index that defines an index sorting since `nested` fields relies on the original sort of the index.
This change does not add early termination capabilities in the search layer. This will be added in a follow up.
Relates #6720
Elasticsearch runs as user elasticsearch with uid:gid 1000:1000 inside
the Docker container. Clarify that bind mounted local directories need
to be accessible by this user.
Relates #24092
We want to upgrade to Lucene 7 ahead of time in order to be able to check whether it causes any trouble to Elasticsearch before Lucene 7.0 gets released. From a user perspective, the main benefit of this upgrade is the enhanced support for sparse fields, whose resource consumption is now function of the number of docs that have a value rather than the total number of docs in the index.
Some notes about the change:
- it includes the deprecation of the `disable_coord` parameter of the `bool` and `common_terms` queries: Lucene has removed support for coord factors
- it includes the deprecation of the `index.similarity.base` expert setting, since it was only useful to configure coords and query norms, which have both been removed
- two tests have been marked with `@AwaitsFix` because of #23966, which we intend to address after the merge
The docs don't clearly explain that the deleted doc count also comes from lucene.
IMHO, it is worth highlighting this information separately, as a Note.
Apart from that, there should be an official recommended alternative as well.