* [DOCS] Add introduction to Elasticsearch.
* [DOCS] Incorporated review comments.
* [DOCS] Minor edits to add an abbreviated title and cross refs.
* [DOCS] Added sizing tips & link to quantatative sizing video.
To be consistent with the `search.max_buckets` default setting,
set the hard limit of the PriorityQueue used for in memory sorting,
when sorting on an aggregate function, to 10000.
Fixes: #43168
(cherry picked from commit 079e012fdea68ea0a7daae078359495047e9c407)
The machine learning feature of xpack has native binaries with a
different commit id than the rest of code. It is currently exposed in
the xpack info api. This commit adds that commit information to the ML
info api, so that it may be removed from the info api.
Now emphasises the test is for indexed values.
Previous documentation only mentioned the state of the input JSON doc (null values) but this is only one of several reasons why an indexed value may not exist.
Closes#24256
The description field of xpack featuresets is optionally part of the
xpack info api, when using the verbose flag. However, this information
is unnecessary, as it is better left for documentation (and the existing
descriptions describe anything meaningful). This commit removes the
description field from feature sets.
Rest docs page update
- have the section be on separate pages
- add an Overview page
- add other formats examples
(cherry picked from commit 309bd691ff3f8625f67ca09fc1dd8e265f7e6c92)
* [ML] Adding support for geo_shape, geo_centroid, geo_point in datafeeds
* only supporting doc_values for geo_point fields
* moving validation into GeoPointField ctor
Previously, a reindex request had two different size specifications in the body:
* Outer level, determining the maximum documents to process
* Inside the source element, determining the scroll/batch size.
The outer level size has now been renamed to max_docs to
avoid confusion and clarify its semantics, with backwards compatibility and
deprecation warnings for using size.
Similarly, the size parameter has been renamed to max_docs for
update/delete-by-query to keep the 3 interfaces consistent.
Finally, all 3 endpoints now support max_docs in both body and URL.
Relates #24344
This change adds the earliest and latest timestamps into
the field stats for fields of type "date" in the output of
the ML find_file_structure endpoint. This will enable the
cards for date fields in the file data visualizer in the UI
to be made to look more similar to the cards for date
fields in the index data visualizer in the UI.
Adds a metadata field to snapshots which can be used to store arbitrary
key-value information. This may be useful for attaching a description of
why a snapshot was taken, tagging snapshots to make categorization
easier, or identifying the source of automatically-created snapshots.
The `replace` option in the phonetic token filter can have suprising side
effects, e.g. such as described in #26921. This PR adds a note to be mindful
about such scenarios and offers alternatives to using the `replace` option.
Closes#26921
This change abstracts the specific types away from the different
representations of datetime as a datetime representation in code can be all
kinds of different things. This defines the three most common types of
datetimes as numeric, string, and complex while outlining the type most
typically used for these as long, String, and ZonedDateTime, respectively.
Documentation uses the definitions while examples use the types. This makes
the documentation easier to consume especially for people from a non-Java
background.
This commit adds functionality so that aliases that are manipulated on
leader indices are replicated by the shard follow tasks to the follower
indices. Note that we ignore write indices. This is due to the fact that
follower indices do not receive direct writes so the concept is not
useful.
Relates #41815
Adding notes to the existing docs about how using `preference` might increase
request cache utilization but also add warning about the downsides.
Closes#24278
This commit addresses a few more frequently-asked questions:
* clarifies that bootstrapping doesn't happen even after a full cluster
restart.
* removes the example that uses IP addresses, to try and further encourage the
use of node names for bootstrapping.
* clarifies that auto-bootstrapping might form different clusters on different
hosts, and gives a process for starting again if this wasn't what you wanted.
* adds the "do not stop half-or-more of the master-eligible nodes" slogan that
was notably absent.
* reformats one of the console examples to a narrower width
For `multi_match` query: link `boost` param to the generic reference
for query usage and `slop` to the `match_phrase` query where its usage
is documented.
Fixes: #40091
(cherry picked from commit 69993049a8bd9e7f042935729fe69a8266d95a0a)
Add an explanatory NOTE section to draw attention to the difference
between small and capital letters used for the index date patterns.
e.g.: HH vs hh, MM vs mm.
Closes: #22322
(cherry picked from commit c8125417dc33215651f9bb76c9b1ffaf25f41caf)
Fix a couple of wrong links because of the order of the anchor
and the usage of backquotes.
(cherry picked from commit 4e0c6525153b60a57202937c2ae57968c8e35285)
When analysing a semi-structured text file the
find_file_structure endpoint merges lines to form
multi-line messages using the assumption that the
first line in each message contains the timestamp.
However, if the timestamp is misdetected then this
can lead to excessive numbers of lines being merged
to form massive messages.
This commit adds a line_merge_size_limit setting
(default 10000 characters) that halts the analysis
if a message bigger than this is created. This
prevents significant CPU time being spent subsequently
trying to determine the internal structure of the
huge bogus messages.
Adding an example of how to re-implement the polish stempel analyzer
in case a user want to modify or extend it. In order for the analyzer to be
able to use polish stopwords, also registering a polish_stop filter for the
stempel plugin.
Closes#13150
This commit clones the existing AnalyzeRequest/AnalyzeResponse classes
to the high-level rest client, and adjusts request converters to use these new
classes.
This is a prerequisite to removing the Streamable interface from the internal
server version of these classes.
This PR updates the docs for `docvalue_fields` and `stored_fields` to clarify
that nested fields must be accessed through `inner_hits`. It also tweaks the
nested fields documentation to make this point more visible.
Addresses #23766.
In AsciiDoc, `subs="attributes,callouts,macros"` options were required
to render `include-tagged::` in a code block.
With elastic/docs#827, Elasticsearch Reference documentation migrated
from AsciiDoc to Asciidoctor.
In Asciidoctor, the `subs="attributes,callouts,macros"` options are no
longer needed to render `include-tagged::` in a code block. This commit
removes those unneeded options.
Resolves#41589
Several `ifdef::asciidoctor` conditionals were added so that AsciiDoc
and Asciidoctor doc builds rendered consistently.
With https://github.com/elastic/docs/pull/827, Elasticsearch Reference
documentation migrated completely to Asciidoctor. We no longer need to
support AsciiDoc so we can remove these conditionals.
Resolves#41722
Several `ifdef::asciidoctor` conditionals were added so that AsciiDoc
and Asciidoctor doc builds rendered consistently.
With https://github.com/elastic/docs/pull/827, Elasticsearch Reference
documentation migrated completely to Asciidoctor. We no longer need to
support AsciiDoc so we can remove these conditionals.
Resolves#41722
* Previously, we mentioned multiple times that each nested object was indexed as its own document. This is repetitive, and is also a bit confusing in the context of `index.mapping.nested_fields.limit`, as that applies to the number of distinct `nested` types in the mappings, not the number of nested objects. We now just describe the issue once at the beginning of the section, to illustrate why `nested` types can be expensive.
* Reference the ongoing example to clarify the meaning of the two settings.
Addresses #28363.
Since the max_score optimization landed in Elasticsearch 7,
the CommonTermsQuery is redundant and slower. Moreover the
cutoff_frequency parameter for MatchQuery and MultiMatchQuery
is redundant.
Relates to #27096
(cherry picked from commit 04b74497314eeec076753a33b3b6cc11549646e8)
Both of these classes are basically a bloated wrapper around a simple
construct that can simply be a DirectoryFactory interface. This change
removes both classes and replaces them with a simple stateless interface
that creates a new `Directory` per shard. The concept of `index.store` is preserved
since it makes sense from a configuration perspective.
This change contains a major refactoring of the timestamp
format determination code used by the ML find file structure
endpoint.
Previously timestamp format determination was done separately
for each piece of text supplied to the timestamp format finder.
This had the drawback that it was not possible to distinguish
dd/MM and MM/dd in the case where both numbers were 12 or less.
In order to do this sensibly it is best to look across all the
available timestamps and see if one of the numbers is greater
than 12 in any of them. This necessitates making the timestamp
format finder an instantiable class that can accumulate evidence
over time.
Another problem with the previous approach was that it was only
possible to override the timestamp format to one of a limited
set of timestamp formats. There was no way out if a file to be
analysed had a timestamp that was sane yet not in the supported
set. This is now changed to allow any timestamp format that can
be parsed by a combination of these Java date/time formats:
yy, yyyy, M, MM, MMM, MMMM, d, dd, EEE, EEEE, H, HH, h, mm, ss,
a, XX, XXX, zzz
Additionally S letter groups (fractional seconds) are supported
providing they occur after ss and separated from the ss by a dot,
comma or colon. Spacing and punctuation is also permitted with
the exception of the question mark, newline and carriage return
characters, together with literal text enclosed in single quotes.
The full list of changes/improvements in this refactor is:
- Make TimestampFormatFinder an instantiable class
- Overrides must be specified in Java date/time format - Joda
format is no longer accepted
- Joda timestamp formats in outputs are now derived from the
determined or overridden Java timestamp formats, not stored
separately
- Functionality for determining the "best" timestamp format in
a set of lines has been moved from TextLogFileStructureFinder
to TimestampFormatFinder, taking advantage of the fact that
TimestampFormatFinder is now an instantiable class with state
- The functionality to quickly rule out some possible Grok
patterns when looking for timestamp formats has been changed
from using simple regular expressions to the much faster
approach of using the Shift-And method of sub-string search,
but using an "alphabet" consisting of just 1 (representing any
digit) and 0 (representing non-digits)
- Timestamp format overrides are now much more flexible
- Timestamp format overrides that do not correspond to a built-in
Grok pattern are mapped to a %{CUSTOM_TIMESTAMP} Grok pattern
whose definition is included within the date processor in the
ingest pipeline
- Grok patterns that correspond to multiple Java date/time
patterns are now handled better - the Grok pattern is accepted
as matching broadly, and the required set of Java date/time
patterns is built up considering all observed samples
- As a result of the more flexible acceptance of Grok patterns,
when looking for the "best" timestamp in a set of lines
timestamps are considered different if they are preceded by
a different sequence of punctuation characters (to prevent
timestamps far into some lines being considered similar to
timestamps near the beginning of other lines)
- Out-of-the-box Grok patterns that are considered now include
%{DATE} and %{DATESTAMP}, which have indeterminate day/month
ordering
- The order of day/month in formats with indeterminate day/month
order is determined by considering all observed samples (plus
the server locale if the observed samples still do not suggest
an ordering)
Relates #38086Closes#35137Closes#35132