It adds notes about:
- how preference can help optimize cache usage
- the fact that too many replicas can hurt search performance due to lower
utilization of the filesystem cache
- how index sorting can improve _source compression
- how always putting fields in the same order in documents can improve _source
compression
* Add documentation for the new parent-join field
This commit adds the docs for the new parent-join field.
It explains how to define, index and query this new field.
Relates #20257
UnicodeSetFilter was only allowed in the icu_folding token filter.
It seems useful to expose this setting in icu_normalizer token filter
and char filter.
* Upgrade icu4j for the ICU analysis plugin to 59.1
Lucene upgraded to 59.1 so we should use the same.
Closes#21425
* Add breaking change for the icu upgrade
This snapshot has faster range queries on range fields (LUCENE-7828), more
accurate norms (LUCENE-7730) and the ability to use fake term frequencies
(LUCENE-7854).
Expose the experimental simplepattern and
simplepatternsplit tokenizers in the common
analysis plugin. They provide tokenization based
on regular expressions, using Lucene's
deterministic regex implementation that is usually
faster than Java's and has protections against
creating too-deep stacks during matching.
Both have a not-very-useful default pattern of the
empty string because all tokenizer factories must
be able to be instantiated at index creation time.
They should always be configured by the user
in practice.
Get mappings HEAD requests incorrectly return a content-length header of
0. This commit addresses this by removing the special handling for get
mappings HEAD requests, and just relying on the general mechanism that
exists for handling HEAD requests in the REST layer.
Relates #23192
This commit adds back "id" as the key within a script to specify a
stored script (which with file scripts now gone is no longer ambiguous).
It also adds "source" as a replacement for "code". This is in an attempt
to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.
This change removes the `postings` highlighter. This highlighter has been removed from Lucene master (7.x) because it behaves
exactly like the `unified` highlighter when index_options is set to `offsets`:
https://issues.apache.org/jira/browse/LUCENE-7815
It also makes the `unified` highlighter the default choice for highlighting a field (if `type` is not provided).
The strategy used internally by this highlighter remain the same as before, it checks `term_vectors` first, then `postings` and ultimately it re-analyzes the text.
Ultimately it rewrites the docs so that the options that the `unified` highlighter cannot handle are clearly marked as such.
There are few features that the `unified` highlighter is not able to handle which is why the other highlighters (`plain` and `fvh`) are still available.
I'll open separate issues for these features and we'll deprecate the `fvh` and `plain` highlighters when full support for these features have been added to the `unified`.
This PR enables Ingest plugins to leverage processor-scoped REST
endpoints. First of which being the Grok endpoint that retrieves
Grok Patterns for users to retrieve all the built-in patterns.
Example usage: Kibana Grok Autocomplete!
The Log4j dependency is separated into two artifacts, the API and the
core implementation. This is to enable replacing Log4j on the backend
through the SLF4J bridge with another logging implementation. For this
reason, the dependencies are marked as optional. This causes confusion
amongst users as to use the bridge, the API should be non-optional since
it is needed for the bridge to function correctly. While they could pull
it into their application directly, it would be clearer if we simply
marked this depdendency as non-optional. Note that this does not mean
that users have to use Log4j for logging in their application, so we are
not marking core as required, it only clarifies what they need to be
able to plug in a different logging implementation.
Relates #25136
This commit refactors the query phase in order to be able
to automatically detect queries that can be early terminated.
If the index sort matches the query sort, the top docs collection is early terminated
on each segment and the computing of the total number of hits that match the query is delegated to a simple TotalHitCountCollector.
This change also adds a new parameter to the search request called `track_total_hits`.
It indicates if the total number of hits that match the query should be tracked.
If false, queries sorted by the index sort will not try to compute this information and
and will limit the collection to the first N documents per segment.
Aggregations are not impacted and will continue to see every document
even when the index sort matches the query sort and `track_total_hits` is false.
Relates #6720
During package install on systemd-based systems, some sysctl settings
should be set (e.g. vm.max_map_count).
In some environments, changing sysctl settings plainly does not work;
previously a global environment variable named
ES_SKIP_SET_KERNEL_PARAMETERS was introduced to skip calling sysctl, but
this causes trouble for:
- configuration management systems, which usually cannot apply an env
var when running a package manager
- package upgrades, which will not have the env var set any more, and
thus leaving the package management system in a bad state (possibly
half-way upgraded, can be very hard to recover)
This removes the env var again and instead of calling systemd-sysctl
manually, tells systemd to restart the wrapper unit - which itself can
be masked by system administrators or management tools if it is known
that sysctl does not work in a given environment.
The restart is not silent on systems in their default configuration, but
is ignored if the unit is masked.
Relates #24234
The index parameter in the update-aliases, put-alias, and delete-alias APIs no longer accepts alias names. Instead, it accepts only index names (or wildcards which will expand to matching indices).
Closes#23960
This removes the parsing of things like `GET /idx/_aliases,_mappings`, instead,
a user must choose between retriving all index metadata with `GET /idx`, or only
a specific form such as `GET /idx/_settings`.
Relates to (and is a prerequisite of) #24437
Some response classes in the java api expose both `getTook()` which returns a `TimeValue` and `getTookInMillis` which returns a `long` value. `getTook()` is enough as one can do `getTook().millis()` to obtain the same result as `getTookInMillis()`, which can be removed.
* Adds nodes usage API to monitor usages of actions
The nodes usage API has 2 main endpoints
/_nodes/usage and /_nodes/{nodeIds}/usage return the usage statistics
for all nodes and the specified node(s) respectively.
At the moment only one type of usage statistics is available, the REST
actions usage. This records the number of times each REST action class is
called and when the nodes usage api is called will return a map of rest
action class name to long representing the number of times each of the action
classes has been called.
Still to do:
* [x] Create usage service to store usage statistics
* [x] Record usage in REST layer
* [x] Add Transport Actions
* [x] Add REST Actions
* [x] Tests
* [x] Documentation
* Rafactors UsageService so counts are done by the handlers
* Fixing up docs tests
* Adds a name to all rest actions
* Addresses review comments