As a follow-up to #38540 we can use lambda functions and method
references where convenient in the low-level REST client.
Also, we need to update the docs to state that the minimum java version
required is 1.8.
This commit reworks and clarifies the docs for the `discovery-ec2` plugin:
- folds the tiny "Getting started with AWS" into the page on configuration
- spells out the name of each setting in full instead of noting the
`discovery.ec2` prefix at the top of the page.
- replaces each `(Secure)` marker with a sentence describing what that means in
situ
- notes some missing defaults
- clarifies the behaviour of `discovery.ec2.groups` (dependent on `.any_group`)
- clarifies what `discovery.ec2.host_type` is for
- adds `discovery.ec2.tag.TAGNAME` as a (meta-)setting rather than describing
it in a separate section
- notes that the tags mentioned in `discovery.ec2.tag.TAGNAME` cannot contain
colons (see #38406)
- clarifies the EC2-specific interface names and what they're for
- reorders and rewords the recommendations for storage
- expands on why you should not span a cluster across regions
- adds a suggestion on protecting instances against termination during scale-in
- reformat to 80 columns where possible
Fixes#38406
Downgrading an Elasticsearch node to an earlier version is unsupported, because
we do not make any attempt to guarantee that a node can read any of the on-disk
data written by a future version. Yet today we do not actively prevent
downgrades, and sometimes users will attempt to roll back a failed upgrade with
an in-place downgrade and get into an unrecoverable state.
This change adds the current version of the node to the node metadata file, and
checks the version found in this file against the current version at startup.
If the node cannot be sure of its ability to read the on-disk data then it
refuses to start, preserving any on-disk data in its upgraded state.
This change also adds a command-line tool to overwrite the node metadata file
without performing any version checks, to unsafely bypass these checks and
recover the historical and lenient behaviour.
The date_histogram accepts an interval which can be either a calendar
interval (DST-aware, leap seconds, arbitrary length of months, etc) or
fixed interval (strict multiples of SI units). Unfortunately this is inferred
by first trying to parse as a calendar interval, then falling back to fixed
if that fails.
This leads to confusing arrangement where `1d` == calendar, but
`2d` == fixed. And if you want a day of fixed time, you have to
specify `24h` (e.g. the next smallest unit). This arrangement is very
error-prone for users.
This PR adds `calendar_interval` and `fixed_interval` parameters to any
code that uses intervals (date_histogram, rollup, composite, datafeed, etc).
Calendar only accepts calendar intervals, fixed accepts any combination of
units (meaning `1d` can be used to specify `24h` in fixed time), and both
are mutually exclusive.
The old interval behavior is deprecated and will throw a deprecation warning.
It is also mutually exclusive with the two new parameters. In the future the
old dual-purpose interval will be removed.
The change applies to both REST and java clients.
This commit updates the default ciphers and TLS protocols that are used
when the runtime JDK supports them. New cipher support has been
introduced in JDK 11 and 12 along with performance fixes for AES GCM.
The ciphers are ordered with PFS ciphers being most preferred, then
AEAD ciphers, and finally those with mainstream hardware support. When
available stronger encryption is preferred for a given cipher.
This is a backport of #41385 and #41808. There are known JDK bugs with
TLSv1.3 that have been fixed in various versions. These are:
1. The JDK's bundled HttpsServer will endless loop under JDK11 and JDK
12.0 (Fixed in 12.0.1) based on the way the Apache HttpClient performs
a close (half close).
2. In all versions of JDK 11 and 12, the HttpsServer will endless loop
when certificates are not trusted or another handshake error occurs. An
email has been sent to the openjdk security-dev list and #38646 is open
to track this.
3. In JDK 11.0.2 and prior there is a race condition with session
resumption that leads to handshake errors when multiple concurrent
handshakes are going on between the same client and server. This bug
does not appear when client authentication is in use. This is
JDK-8213202, which was fixed in 11.0.3 and 12.0.
4. In JDK 11.0.2 and prior there is a bug where resumed TLS sessions do
not retain peer certificate information. This is JDK-8212885.
The way these issues are addressed is that the current java version is
checked and used to determine the supported protocols for tests that
provoke these issues.
Adds a note that restarting half-or-more of the master-eligible nodes means
you're no longer doing a rolling upgrade, and may need to upgrade all the
things before the cluster returns to health.
Configurations are stored in the .data-frame-internal-1
index, but users should not add configurations directly to
the index as additional information to enable access control
is added. This adds a warning against allowing access to the
internal index.
The migrate tool was added when the native realm was created, to aid
users in converting from file realms that were per node, into the
cluster managed native realm. While this tool was useful at the time,
users should now be using the native realm directly. This commit
deprecates the tool, to be removed in a followup for 8.0.
Adds an initial limited implementations of geo features to SQL. This implementation is based on the [OpenGIS® Implementation Standard for Geographic information - Simple feature access](http://www.opengeospatial.org/standards/sfs), which is the current standard for GIS system implementation. This effort is concentrate on SQL option AKA ISO 19125-2.
Queries that are supported as a result of this initial implementation
Metadata commands
- `DESCRIBE table` - returns the correct column types `GEOMETRY` for geo shapes and geo points.
- `SHOW FUNCTIONS` - returns a list that includes supported `ST_` functions
- `SYS TYPES` and `SYS COLUMNS` display correct types `GEO_SHAPE` and `GEO_POINT` for geo shapes and geo points accordingly.
Returning geoshapes and geopoints from elasticsearch
- `SELECT geom FROM table` - returns the geoshapes and geo_points as libs/geo objects in JDBC or as WKT strings in console.
- `SELECT ST_AsWKT(geom) FROM table;` and `SELECT ST_AsText(geom) FROM table;`- returns the geoshapes ang geopoints in their WKT representation;
Using geopoints to elasticsearch
- The following functions will be supported for geopoints in queries, sorting and aggregations: `ST_GeomFromText`, `ST_X`, `ST_Y`, `ST_Z`, `ST_GeometryType`, and `ST_Distance`. In most cases when used in queries, sorting and aggregations, these function are translated into script. These functions can be used in the SELECT clause for both geopoints and geoshapes.
- `SELECT * FROM table WHERE ST_Distance(ST_GeomFromText(POINT(1 2), point) < 10;` - returns all records for which `point` is located within 10m from the `POINT(1 2)`. In this case the WHERE clause is translated into a range query.
Limitations:
Geoshapes cannot be used in queries, sorting and aggregations as part of this initial effort. In order to fully take advantage of geoshapes we would need to have access to geoshape doc values, which is coming in #37206. `ST_Z` cannot be used on geopoints in queries, sorting and aggregations since we don't store altitude in geo_point doc values.
Relates to #29872
Backport of #42031
* [ML] adding pivot.size option for setting paging size
* Changing field name to address PR comments
* fixing ctor usage
* adjust hlrc for field name change
This commit slightly reworks the recommendations in the docs about setting the
heap size:
* the "rules of thumb" are actually instructions that should be followed
* the reason for setting `Xmx` to 50% of the heap size is more subtle than just
leaving space for the filesystem cache
* it is normal to see Elasticsearch using more memory than `Xmx`
* replace `cutoff` and `limit` with `threshold` since all three terms are used
interchangeably
* since we recommend setting `Xmx` equal to `Xms`, avoid talking about setting
`Xmx` in isolation
Relates #41954
This processor uses the lucene HTMLStripCharFilter class to remove HTML
entities from a field. This adds to the char filter, so that there is
possibility to store the stripped version as well.
Note, that the characeter filter replaces tags with a newline, so that
the produced HTML will look slightly different than the incoming HTML
with regards to newlines.
The `bulk` threadpool is now called `write`, but `bulk` is still
used in some examples. This commit fixes that.
Also, the only way `threadpool.bulk.write: 30` is a valid increase in the size
of this threadpool is if you have 29 processors, which is an odd number of
processors to have. This commit removes the "more threads" bit.
In cases where node names and transport addresses can be muddled, it is unclear
that `cluster.initial_master_nodes: master-a:9300` means to look for a node
called `master-a:9300` rather than a node called `master-a` with transport port
`9300`. This commit adds docs to that effect.
Today Elasticsearch accepts, but silently ignores, port ranges in the
`discovery.seed_hosts` setting:
```
discovery.seed_hosts: 10.1.2.3:9300-9400
```
Silently ignoring part of a setting like this is trappy. With this change we
reject seed host addresses of this form.
Closes#40786
Backport of #41404
The settings listed under the "Default values for TLS/SSL settings"
heading are not actual settings, rather they are common suffixes that
are used for settings that exist in a variety of contexts.
This commit changes the way they are presented to reduce this
confusion.
Backport of: #41779
The CircuitBreaker was introduced as means of preventing a
`StackOverflowException` during the build of the AST by the parser.
The ANTLR4 grammar causes a weird behaviour for a Parser Listener.
The `enterEveryRule()` method is often called with a different parsing
context than the respective `exitEveryRule()`. This makes it difficult
to keep track of the tree's depth, and a custom Map was used as an
attempt of matching the contextes as they are encounter during `enter`
and during `exit` of the rules.
This approach had 2 important drawbacks:
1. It's hard to maintain this custom Map as the grammar changes.
2. The CircuitBreaker could often lead to false positives which caused
valid queries to return an Exception and prevent them from executing.
So, this removes completely the CircuitBreaker which is replaced be
a simple handling of the `StackOverflowException`
Fixes: #41471
(cherry picked from commit 1559a8e2dbd729138b52e89b7e80264c9f4ad1e7)
The `path_match` and `path_unmatch` parameters in dynamic templates match on
object fields in addition to leaf fields. This is not obvious and can cause
surprising errors when a template is meant for a leaf field, but there are
object fields that match. This PR adds a note to the docs to describe the
current behavior.
We received some feedback that it is not completely clear why `_doc` is present
in the typeless document APIs:
> The new index APIs are PUT {index}/_doc/{id} in case of explicit ids and POST
{index}/_doc for auto-generated ids."_ Isn't this contradicting? Specifying
*types in requests is deprecated*, but we are supposed to still mention *_doc*
in write requests?
This PR updates the 'removal of types' documentation to try to clarify that
`_doc` now represents the endpoint name, as opposed to a type.
Add a TIP on how to use CASE to achieve custom bucketing
with GROUP BY.
Follows: #41349
(cherry picked from commit eb5f5d45533c5f81e57dd0221d902a73ec400098)
As negative scores will now cause an error, and it is easy to
accidentally produce negative scores with some of the built-in modifiers
(especially `ln` and `log`), this adjusts the documentation to more
strongly recommend the use of `ln1p` and `log1p` instead.
Also corrects some awkward formatting on the note sections following the
table.
Today's `docker-compose` docs are missing the `discovery.seed_nodes` config on
one of the nodes. With today's configuration the cluster can still form the
first time it is started, because `cluster.initial_master_nodes` requires both
nodes to bootstrap the cluster which ensures that each discover the other.
However if `es02` is elected master it will remove `es01` from the voting
configuration and then when restarted it will form a cluster on its own without
needing to do any discovery. Meanwhile `es01` doesn't know how to find `es02`
after a restart so will be unable to join this cluster.
This commit fixes this by adding the missing configuration.
Relates #41394, which fixes a different `docker-compose.yml` in the same way.
* Clarify that peer recovery settings apply to shard relocation
* Fix awkward wording of 1st sentence
* [DOCS] Remove snapshot recovery reference.
Call out link to [[cat-recovery]].
Separate expert settings.
Added integrations for a couple of frameworks.
Removed community clients where the last commit was more than three
years ago. Also added the official go client link and removed the
official groovy client, as it is outdated.
* [ML] Adds progress reporting for transforms
* fixing after master merge
* Addressing PR comments
* removing unused imports
* Adjusting afterKey handling and percentage to be 100*
* Making sure it is a linked hashmap for serialization
* removing unused import
* addressing PR comments
* removing unused import
* simplifying code, only storing total docs and decrementing
* adjusting for rewrite
* removing initial progress gathering from executor
Today the `_field_caps` API returns the list of indices where a field
is present only if this field has different types within the requested indices.
However if the request is an index pattern (or an alias, or both...) there
is no way to infer the indices if the response contains only fields that have
the same type in all indices. This commit changes the response to always return
the list of indices in the response. It also adds a way to retrieve unmapped field
in a specific section per field called `unmapped`. This section is created for each field
that is present in some indices but not all if the parameter `include_unmapped` is set to
true in the request (defaults to false).
We link to these migraiton docs but we don't specify the id. This
isn't great practice in general and is preventing us from migrating to
Asciidoctor because it generates ids in a slightly different way.
Adds some validation to prevent duplicate source names from being
used in the composite agg.
Also refactored to use a ConstructingObjectParser and removed the
private ctor and setter for sources, making it mandatory.
This adds a gradle task called generateContextDoc in the Painless module. The
task will start a cluster, issue commands against the context rest api for
Painless, and generate documentation for each API per context. Each context
has a first page of classes sorted by package first and class name second,
along with a page per package with each classes' constructors, methods, and
fields. A link is generated for each constructor, method, and field to a JavaDoc
page when possible.
Drops some inline callouts that snuck into 7.x. We're doings this in
preparation for switching the elasticsearch reference to asciidoctor
which doesn't support them.
Implement a more trivial case of the CASE expression which is
expressed as a traditional function with 2 or 3 arguments. e.g.:
IIF(a = 1, 'one', 'many')
IIF(a > 0, 'positive')
Closes: #40917
(cherry picked from commit add02f4f553ad472026dcc1eaa84245a0558a4b0)
The example to delete a remote cluster is missing the `skip_unavailable` setting which results in an error:
```
"type": "illegal_argument_exception",
"reason": "missing required setting [cluster.remote.tiny-test.seeds] for setting [cluster.remote.tiny-test.skip_unavailable]"
```
Implement the ANSI SQL CASE expression which provides the if/else
functionality common to most programming languages.
The CASE expression can have multiple WHEN branches and becomes a
powerful tool for SQL queries as it can be used in SELECT, WHERE,
GROUP BY, HAVING and ORDER BY clauses.
Closes: #36200
(cherry picked from commit 8b2577406f47ae60d15803058921d128390af0b6)
Today's `docker-compose` docs are missing the `discovery.seed_nodes` config on
one of the nodes. With today's configuration the cluster can still form the
first time it is started, because `cluster.initial_master_nodes` requires both
nodes to bootstrap the cluster which ensures that each discover the other.
However if `es02` is elected master it will remove `es01` from the voting
configuration and then when restarted it will form a cluster on its own without
needing to do any discovery. Meanwhile `es01` doesn't know how to find `es02`
after a restart so will be unable to join this cluster.
This commit fixes this by adding the missing configuration.
This change clarifies the documentation around the recommended JVM. The
recommended JVM is the bundled JVM. If a user does not use our
recommended JVM we suggest that they use a supported LTS version of the
JVM.
Closes#41132
Fix a deprecation warning that wasn't rendering correctly in
asciidoctor. This one needed to be explicitly marked as an inline macro
because it is on its own line and it needed to have its text escaped
because it contained a `,`. It also was missing explanitory text for
what the setting was.
Currently enabling profiling disables top-hits optimizations, which is
unfortunate: it would be nice to be able to notice the difference in method
counts and timings depending on whether total hit counts are requested.
This PR makes a few clarifications to the docs for the `enabled` setting:
- Replace references to 'mapping type' with 'mapping' or 'mapping definition'.
- In code examples, clarify that the disabled fields have type `object`.
- Add a section on how disabled fields can hold non-object data.
This section should be at the same sub-level as other sections in the
auto date-histogram docs, otherwise it is rendered on to another page
and is confusing for users to understand what it's in reference to.
The following phrase causes confusion:
> Alternatively the IP addresses or hostnames (if node name defaults to the
> host name) can be used.
This change clarifies the conditions under which you can use a hostname, and
adds an anchor to the note introduced in (#41137) so we can link directly to it
in conversations with users.
Fixes rendering the `version_qualified` attribute in the docs. This
attribute includes `-SNAPSHOT` but does not include `-alpha` or `-beta`
and represents the `version` field as returned by `GET /` or
`GET /_cat/plugins`. Without this change we just drop the entire line on
the floor rather than render it so the output looks bad.
This helps avoid memory issues when computing deep sub-aggregations. Because it
should be rare to use sub-aggregations with significant terms, we opted to always
choose breadth first as opposed to exposing a `collect_mode` option.
Closes#28652.
Added documentation for node repurpose tool and included documentation on how to repurpose nodes safely. Adjusted order of tools in `elasticsearch-node` tool since the repurpose tool is most likely to be used.
Co-Authored-By: David Turner <david.turner@elastic.co>
The explanation given in the completion suggester documentation why we use the
"simple" analyzer as the default is no longer valid. Since we still use "simple"
as the default, we should just delete the explanation that doesn't fit anymore.
Closes#36715
* [DOCS] Fix broken link to Elasticsearch Docker source code
* [DOCS] Link to Dockerfile in elastic/elasticsearch repo
* [DOCS] Link to Docker source files in elastic/elasticsearch repo
* [DOCS] Added settings page for ILM.
* [DOCS] Adding ILM settings file
* [DOCS] Moved the ILM settings to a separate section
* [DOCS] Linked to the rollover docs.
* [DOCS] Tweaked the "required" wording.
This commit is a correction of a doc bug in the docs for the ingest
date-index-name processor. The correct pattern is
yyyy-MM-dd'T'HH:mm:ss.SSSXX. This is due to the transition from Joda
time to Java time where Z does not mean the same thing between the two.
Command needs `?v` so user can see the column headers. Otherwise the instructions in the note about checking the init and relo columns don't make sense
This commit fixes a problem with BWC that was brought up in #40511. A
newer version of the code was emitting a new value for an enum to an
older version, and the older version could not handle that. It caused
the response to error. The MainResponse is now relaxed, and will accept
whatever values the server expose, and holds most of them as Strings
instead of complex objects.
Fixes#40511
We generate two pages with "funny" names:
* _changing_the_client_8217_s_initialization_code.html
* _changing_the_application_8217_s_code.html
The leading `_` comes from us not specifying the name of the page. The
`8217` comes about because of the single quote character. This is a
funny name, but it is the name that we have so we shouldn't change it
without putting in a redirect.
We're looking at switching these docs from being built with the
no-longer-maintained AsciiDoc project to being built with the
actively-maintained Asciidoctor project. Asciidoctor Doesn't include the
`8217`s in the generated ids. That is *better*, but we don't really want
to change the pages. Ultimately we'd prefer none of our pages start with
`_`, but that is a problem for a different time.
Anyway, this pins the ids to their "funny" id so it won't change when we
switch to Asciidoctor. We'll remove it later, when we have more fine
control of our redirects.
Drops the inline callouts from the painless reference book. These
callouts are incompatible with Asciidoctor and we'd very much like to
switch to Asciidoctor for building this book, partially because
Asciidoctor is actively developed and AsciiDoc is not, and partially
because it builds the book three times faster.
This change rejects an illegal combination of flush parameters where
force is true, but wait_if_ongoing is false. This combination is trappy
and should be forbidden.
Closes#36342
- Added square brackets for the optional argument of precision
- Fixed character to lower case after comma
(cherry picked from commit d2f6f3b9ce36875e2eb6145c50464b4d72f2b1df)
After `TIME` SQL data type is introduced, implement
`CURRENT_TIME/CURTIME` functions similarly to CURRENT_TIMESTAMP
that return the system's current time (only, without the date part).
Closes: #40468
(cherry picked from commit 9feede781409d0e264ce45951a25b28ff129b187)
* Add release notes for 7.0.0-rc1.
* [DOCS] Fixes broken link to breaking changes
* [DOCS] Removed old ML PRs; edited titles
* Remove superseded PR.
* Clean up Lucene upgrade PRs/issues.
A full format for a DATETIME would be:
`2019-03-30T10:20:30.123+10:00` which is 29 chars long.
For DATE a full format would be: `2019-03-30T00:00:00.000+10:00`
which is also 29 chars long.
(cherry picked from commit 6be83964ed025528778bca8d35692762e166983b)
Moves the id of the preface in the java-api so it is compatible with
both AsciiDoc and Asciidoctor. As it stands now we apply the id that we
want for the preface to the book itself which is strange and only works
with AsciiDoc.
Support ANSI SQL's TIME type by introductin a runtime-only
ES SQL time type.
Closes: #38174
(cherry picked from commit 046ccd4cf0a251b2a3ddff6b072ab539a6711900)
This updates the casting table to reflect the recent changes for casting consistency in Painless. This also adds a small section on explicitly casting a character to a String which has always been allowed but undocumented.
There is a single example in the Java API docs that contains an inline
callout that is incompatible with Asciidoctor:
```
client.prepareUpdate("ttl", "doc", "1")
.setScript(new Script(
"ctx._source.gender = \"male\"" <1> , ScriptService.ScriptType.INLINE, null, null))
.get();
```
This rewrites the example to use an Asciidoctor compatible end of line
callout. It also looks nicer to me because it fits better on the page.
```
client.prepareUpdate("ttl", "doc", "1")
.setScript(new Script(
"ctx._source.gender = \"male\"", <1>
ScriptService.ScriptType.INLINE, null, null))
.get();
```
We no longer need to mention soft deletes in the getting started guide
now that retention leases exist and default to 12h. This commit removes
mention of soft deletes from the getting started guide, to simplify that
content.
Currently, the docs correctly state that using `now` in range queries will not
be affected by the `time_zone` parameter. However, using date math roundings
like e.g. `now\d` will be affected by the `time_zone`. Adding this example
because it seems to be a frequently asked question and source of confusion.
Relates to #40581
Specifying an inline script in an "inline" field
was deprecated in 5.x. The new field name is
"source".
(Since 6.x still accepts "inline" I will only backport
this docs change as far as 7.0.)
I have been hitting the suite timeout on `DocsClientYamlTestSuiteIT`
As far as I can see, the docs tests are taking quite a while, I assume
it's because more and more docs snippets get added over time, which
means more tests. The current suite timeout is the default 20 minutes.
It takes me just a little less than 20 minutes to run these on my
laptop. On my CI, I end up hitting the suite timeout. Hereby I propose
that we increase the suite timeout to 30 minutes.
This commit changes the note in docs about required java version to note
the existence of the bundled jdk and how to bring your own java. It also
reorganizes the zip/targz docs as zip is no longer suitable on
Linux/MacOS.
The basic models `b, de, p` and the after effect `no`
are not available anymore in Lucene 8 but they are still
listed in the >7x documentation. This change removes these
references that should also be listed in the breaking change
of es 7.0.
Closes#40264
Improve the documentation of parameter --pass of elasticsearch-certutil
Backport of: #40137
Co-Authored-By: Diego Cardozo Sandrim <diegocsandrim@users.noreply.github.com>
Co-Authored-By: Vigneash Sundar <vikene@users.noreply.github.com>
To make script_score query to have the same features
as function_score query, we need to add randomScore
function.
This function produces different
random scores on different index shards.
It is also able to produce random scores
based on the internal Lucene Document Ids.