#25147 added the translog deletion policy but didn't enable it by default. This PR enables a default retention of 512MB (same maximum size of the current translog) and an age of 12 hours (i.e., after 12 hours all translog files will be deleted). This increases to chance to have an ops based recovery, even if the primary flushed or the replica was offline for a few hours.
In order to see which parts of the translog are committed into lucene the translog stats are extended to include information about uncommitted operations.
Views now include all translog ops and guarantee, as before, that those will not go away. Snapshotting a view allows to filter out generations that are not relevant based on a specific sequence number.
Relates to #10708
The `document_type` parameter is no longer required to be specified,
because by default from 6.0 only a single type is allowed. (`index.mapping.single_type` defaults to `true`)
* [Analysis] Parse synonyms with the same analysis chain
Synonym Token Filter / Synonym Graph Filter tokenize synonyms with whatever tokenizer and token filters appear before it in the chain.
Close#7199
* Add MSI installation to documentation
Move installation documentation for Windows with the .zip archive into the zip and tar installation documentation, and clearly indicate any differences for installing on macOS/Linux and Windows.
* Separate out installation with .zip on Windows
With #23997 we have introduced a new internal index option that allows to resolve index expressions only against concrete indices while ignoring aliases. Such index option was applied to IndicesAliasesRequest, so that the index part of alias actions would only be resolved against concrete indices.
Same is done in this commit with delete index request. Deleting aliases has always been confusing as some users expect it to only remove the alias from the index (which has its own specific API). Even worse, in case of filtered aliases, deleting an alias may leave users with the expectation that only the documents that match the filter are deleted, which was never the case. To address all this confusion, delete index api works now only against concrete indices. WIldcard expressions will be only resolved against concrete index, as if aliases didn't exist. If one tries to delete against an alias, an IndexNotFoundException will be thrown regardless of whether the alias exists or not, as a concrete index with such a name doesn't exist.
Closes#2318
It adds notes about:
- how preference can help optimize cache usage
- the fact that too many replicas can hurt search performance due to lower
utilization of the filesystem cache
- how index sorting can improve _source compression
- how always putting fields in the same order in documents can improve _source
compression
* Add documentation for the new parent-join field
This commit adds the docs for the new parent-join field.
It explains how to define, index and query this new field.
Relates #20257
* Upgrade icu4j for the ICU analysis plugin to 59.1
Lucene upgraded to 59.1 so we should use the same.
Closes#21425
* Add breaking change for the icu upgrade
This snapshot has faster range queries on range fields (LUCENE-7828), more
accurate norms (LUCENE-7730) and the ability to use fake term frequencies
(LUCENE-7854).
Expose the experimental simplepattern and
simplepatternsplit tokenizers in the common
analysis plugin. They provide tokenization based
on regular expressions, using Lucene's
deterministic regex implementation that is usually
faster than Java's and has protections against
creating too-deep stacks during matching.
Both have a not-very-useful default pattern of the
empty string because all tokenizer factories must
be able to be instantiated at index creation time.
They should always be configured by the user
in practice.
Get mappings HEAD requests incorrectly return a content-length header of
0. This commit addresses this by removing the special handling for get
mappings HEAD requests, and just relying on the general mechanism that
exists for handling HEAD requests in the REST layer.
Relates #23192
This commit adds back "id" as the key within a script to specify a
stored script (which with file scripts now gone is no longer ambiguous).
It also adds "source" as a replacement for "code". This is in an attempt
to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.
This change removes the `postings` highlighter. This highlighter has been removed from Lucene master (7.x) because it behaves
exactly like the `unified` highlighter when index_options is set to `offsets`:
https://issues.apache.org/jira/browse/LUCENE-7815
It also makes the `unified` highlighter the default choice for highlighting a field (if `type` is not provided).
The strategy used internally by this highlighter remain the same as before, it checks `term_vectors` first, then `postings` and ultimately it re-analyzes the text.
Ultimately it rewrites the docs so that the options that the `unified` highlighter cannot handle are clearly marked as such.
There are few features that the `unified` highlighter is not able to handle which is why the other highlighters (`plain` and `fvh`) are still available.
I'll open separate issues for these features and we'll deprecate the `fvh` and `plain` highlighters when full support for these features have been added to the `unified`.
This PR enables Ingest plugins to leverage processor-scoped REST
endpoints. First of which being the Grok endpoint that retrieves
Grok Patterns for users to retrieve all the built-in patterns.
Example usage: Kibana Grok Autocomplete!
This commit refactors the query phase in order to be able
to automatically detect queries that can be early terminated.
If the index sort matches the query sort, the top docs collection is early terminated
on each segment and the computing of the total number of hits that match the query is delegated to a simple TotalHitCountCollector.
This change also adds a new parameter to the search request called `track_total_hits`.
It indicates if the total number of hits that match the query should be tracked.
If false, queries sorted by the index sort will not try to compute this information and
and will limit the collection to the first N documents per segment.
Aggregations are not impacted and will continue to see every document
even when the index sort matches the query sort and `track_total_hits` is false.
Relates #6720
During package install on systemd-based systems, some sysctl settings
should be set (e.g. vm.max_map_count).
In some environments, changing sysctl settings plainly does not work;
previously a global environment variable named
ES_SKIP_SET_KERNEL_PARAMETERS was introduced to skip calling sysctl, but
this causes trouble for:
- configuration management systems, which usually cannot apply an env
var when running a package manager
- package upgrades, which will not have the env var set any more, and
thus leaving the package management system in a bad state (possibly
half-way upgraded, can be very hard to recover)
This removes the env var again and instead of calling systemd-sysctl
manually, tells systemd to restart the wrapper unit - which itself can
be masked by system administrators or management tools if it is known
that sysctl does not work in a given environment.
The restart is not silent on systems in their default configuration, but
is ignored if the unit is masked.
Relates #24234
The index parameter in the update-aliases, put-alias, and delete-alias APIs no longer accepts alias names. Instead, it accepts only index names (or wildcards which will expand to matching indices).
Closes#23960
This removes the parsing of things like `GET /idx/_aliases,_mappings`, instead,
a user must choose between retriving all index metadata with `GET /idx`, or only
a specific form such as `GET /idx/_settings`.
Relates to (and is a prerequisite of) #24437