This commit adds classifiers to the distributions indicating the
OS (for archives) and platform. The current OSes are windows, darwin (i.e.
macOS) and linux. This change will allow future OS/architecture specific
changes to the distributions. Note the docs using distribution links
have been updated, but will be reworked in a followup to make OS
specific instructions for the archives.
Restricted indices (currently only .security-6 and .security) are special
internal indices that require setting the `allow_restricted_indices` flag
on every index permission that covers them. If this flag is `false`
(default) the permission will not cover these and actions against them
will not be authorized.
However, the monitoring APIs were the only exception to this rule.
This exception is herein forfeited and index monitoring privileges have to be
granted explicitly, using the `allow_restricted_indices` flag on the permission,
as is the case for any other index privilege.
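As a rough illustration of the rule above, the following Python sketch models how an index permission might be matched. The function name `permission_covers`, the wildcard matching via `fnmatch`, and the hard-coded set of restricted names are assumptions made for this example; this is not the actual authorization code.

```python
from fnmatch import fnmatchcase

# Restricted names taken from the text above; hard-coded here for illustration.
RESTRICTED_INDICES = {".security", ".security-6"}


def permission_covers(index, patterns, allow_restricted_indices=False):
    """Return True if an index permission with these patterns covers `index`.

    Hypothetical model: a restricted index is never covered unless the
    permission sets allow_restricted_indices, even if a wildcard matches it.
    """
    if index in RESTRICTED_INDICES and not allow_restricted_indices:
        return False
    return any(fnmatchcase(index, pattern) for pattern in patterns)
```

With this model, a monitoring privilege granted on `*` no longer reaches `.security` unless the flag is set on that permission.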
* Update the top-level 'getting started' guide.
* Remove custom types from the painless getting started documentation.
* Fix incorrect references to '_doc' in the cardinality query docs.
* Update the _update docs to use the typeless API format.
This commit modifies the put follow index action to use a
CcrRepository when creating a follower index. It routes
the logic through the snapshot/restore process. A
wait_for_active_shards parameter can be used to configure
how long to wait before returning the response.
The delete and update by query APIs both offer protection against overriding concurrent user changes to the documents they touch. They currently use internal versioning. This PR changes that to rely on sequence numbers and primary terms.
Relates #37639
Relates #36148
Relates #10708
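To make the idea of concurrency control based on sequence numbers and primary terms concrete, here is a minimal in-memory sketch. `ToyShard`, `VersionConflict`, and the `if_seq_no` / `if_primary_term` argument names are illustrative assumptions for this example and do not describe the real implementation.

```python
class VersionConflict(Exception):
    """Raised when the document changed since the caller last read it."""


class ToyShard:
    """Hypothetical single-shard store that assigns a seq_no to every write."""

    def __init__(self):
        self.primary_term = 1
        self.next_seq_no = 0
        self.docs = {}  # doc_id -> (source, seq_no, primary_term)

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self.docs.get(doc_id)
        if if_seq_no is not None:
            # Conditional write: only proceed if the stored copy still carries
            # the seq_no and primary term the caller observed earlier.
            observed = (if_seq_no, if_primary_term)
            if current is None or (current[1], current[2]) != observed:
                raise VersionConflict(doc_id)
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        self.docs[doc_id] = (source, seq_no, self.primary_term)
        return seq_no, self.primary_term
```

A caller reads a document, remembers its `(seq_no, primary_term)` pair, and passes the pair back on write; a concurrent change in between makes the second write fail instead of silently overriding it.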
The update request has lesser-known support for a one-off update of a known document version. This PR adds a seq# based alternative to power these operations.
Relates #36148
Relates #10708
Abdicates to another master-eligible node once the active master is reconfigured out of the voting
configuration, for example through the use of voting configuration exclusions.
Follow-up to #37712
In order to support JSON log format, a custom pattern layout was used and its configuration is enclosed in ESJsonLayout. Users are free to use their own patterns, but if smooth Beats integration is needed, they should use ESJsonLayout. EvilLoggerTests are left intact to make sure user's custom log patterns work fine.
To populate additional fields node.id and cluster.uuid which are not available at start time,
a cluster state update will have to be received and the values passed to log4j pattern converter.
A ClusterStateObserver.Listener is used to receive only one cluster state update. Once the update is received, the nodeId and clusterUUID are set in a static field in NodeAndClusterIdConverter.
The following fields are expected in JSON log lines: type, timestamp, level, component, cluster.name, node.name, node.id, cluster.uuid, message, stacktrace
See ESJsonLayout.java for more details and field descriptions.
The Docker log4j2 configuration is now almost the same as the one used for the ES binary.
The only difference is that docker is using console appenders, whereas ES is using file appenders.
relates: #32850
We inject an Unfollow action before Shrink because the Shrink action
cannot be safely used on a following index, as it may not be fully
caught up with the leader index before the "original" following index is
deleted and replaced with a non-following Shrunken index. The Unfollow
action will verify that 1) the index is marked as "complete", and 2) all
operations up to this point have been replicated from the leader to the
follower before explicitly disconnecting the follower from the leader.
Injecting an Unfollow action before the Rollover action is done mainly
as a convenience: This allows users to use the same lifecycle policy on
both the leader and follower cluster without having to explicitly modify
the policy to unfollow the index, while doing what we expect users to
want in most cases.
* ML: Add MlMetadata.upgrade_mode and API
* Adding tests
* Adding wait conditionals for the upgrade_mode call to return
* Adding tests
* adjusting format and tests
* Adjusting wait conditions for api return and msgs
* adjusting doc tests
* adding upgrade mode tests to black list
We changed the `action.auto_create_index` setting to be a dynamic cluster-level
setting in #20274 but today the reference manual indicates that it is still a
static node-level setting. This commit addresses this, and clarifies the
semantics of patterns that may both permit and forbid the creation of certain
indices.
Relates #7513
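The interaction between permitting and forbidding patterns can be sketched as follows. This Python model assumes that the setting is a comma-separated list of patterns optionally prefixed with `+` or `-`, that the first matching pattern decides, and that an index matching no pattern is not created; `should_auto_create` is a hypothetical helper, not the server code.

```python
from fnmatch import fnmatchcase


def should_auto_create(index, setting):
    """Decide whether `index` may be auto-created under `setting`.

    Assumed semantics: "true"/"false" apply globally; otherwise patterns
    are checked in order and the first match wins.
    """
    if setting in ("true", "false"):
        return setting == "true"
    for expression in setting.split(","):
        expression = expression.strip()
        if not expression:
            continue
        allow = not expression.startswith("-")
        pattern = expression[1:] if expression[0] in "+-" else expression
        if fnmatchcase(index, pattern):
            return allow
    return False  # assumed default when nothing matches
```

Under these assumptions the order of the patterns matters: `-logs-secret*,+logs-*` forbids `logs-secret-1`, while `+logs-*,-logs-secret*` permits it because the broader pattern matches first.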
The docs silently accept duplicate note markers (such as `<3>` here) but
format them in an unexpected way. This change removes this duplication so that
the rendered documentation looks as intended.
The ML file structure finder has always reported both Joda
and Java time format strings. This change makes the Java time
format strings the ones that are incorporated into mappings
and ingest pipeline definitions.
The BWC syntax of prepending "8" to these formats is used.
This will need to be removed once Java time format strings
become the default in Elasticsearch.
This commit also removes direct imports of Joda classes in the
structure finder unit tests. Instead the core Joda BWC class
is used.
This change adds support for handling `nested` fields in the `composite`
aggregation. A `nested` aggregation can be used as parent of a `composite`
aggregation in order to target `nested` fields in the `sources`.
Closes #28611
This commit changes the default for the `track_total_hits` option of the search request
to `10,000`. This means that by default search requests will accurately track the total hit count
up to `10,000` documents; requests that match more than this value will set `"total.relation"`
to `"gte"` (i.e. greater than or equal to) and `"total.value"` to `10,000` in the search response.
Scroll queries are not impacted; they will continue to count the total hits accurately.
The default is set back to `true` (accurate hit count) if `rest_total_hits_as_int` is set in the search request.
I chose `10,000` as the default because that's also the number we use to limit pagination. This means that users will be able to know how far they can jump (up to 10,000) even if the total number of hits is not accurate.
Closes #33028
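The resulting response semantics can be modelled with a few lines of Python. This is only a sketch of the behavior described above: `total_hits` is a hypothetical helper, and returning `None` when tracking is disabled is an assumption of this example.

```python
def total_hits(actual_hits, track_total_hits=10_000):
    """Model the `total` object of a search response.

    track_total_hits may be True (accurate count), False (no count),
    or an integer threshold up to which hits are counted accurately.
    """
    if track_total_hits is False:
        return None  # assumption: no total is reported at all
    if track_total_hits is True or actual_hits <= track_total_hits:
        return {"value": actual_hits, "relation": "eq"}
    # More matches than the threshold: only a lower bound is known.
    return {"value": track_total_hits, "relation": "gte"}
```

For example, `total_hits(250_000)` yields a value of 10,000 with relation `"gte"`, whereas passing `track_total_hits=True` restores the accurate count.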
The default value for ssl.supported_protocols no longer includes TLSv1
as this is an old protocol with known security issues.
Administrators can enable TLSv1.0 support by configuring the
appropriate `ssl.supported_protocols` setting, for example:
xpack.security.http.ssl.supported_protocols: ["TLSv1.2","TLSv1.1","TLSv1"]
Relates: #36021
Renaming as follows:
feature -> rank_feature
feature_vector -> rank_features
feature query -> rank_feature query
The renaming is done to distinguish these from other vector types.
Closes #36723
This deprecates the `xpack.watcher.history.cleaner_service.enabled` setting,
since all newly created `.watch-history` indices in 7.0 will use ILM to manage
their retention.
In 8.0 the setting itself and cleanup actions will be removed.
Resolves #32041
This commit removes the Index Audit Output type, following its deprecation
in 6.7 by 8765a31d4e6770. It also adds the migration notice (settings notice).
In general, the problem with the index audit output is that event indexing
can be slower than the rate with which audit events are generated,
especially during the daily rollovers or the rolling cluster upgrades.
In this situation audit events will be lost which is a terrible failure situation
for an audit system.
Besides the settings under the `xpack.security.audit.index` namespace, the
`xpack.security.audit.outputs` setting has also been deprecated and will be
removed in 7. Although explicitly configuring the logfile output does not touch
any deprecation bits, this setting is made redundant in 7 so this PR deprecates
it as well.
Relates #29881
* Use ILM for Watcher history deletion
This commit adds an index lifecycle policy for the `.watch-history-*` indices.
This policy is automatically used for all new watch history indices.
This does not yet remove the automatic cleanup that the monitoring plugin does
for the .watch-history indices, and it does not touch the
`xpack.watcher.history.cleaner_service.enabled` setting.
Relates to #32041
Users may require the sequence number and primary terms to perform optimistic concurrency control operations. Currently, you can get the sequence number via the `docvalues_fields` API but the primary term is not accessible because it is maintained by the `SeqNoFieldMapper` and the infrastructure can't find it.
This commit adds a dedicated sub fetch phase that returns both numbers and is controlled by a new `seq_no_primary_term` parameter.
With this commit we add a note to the API conventions documentation that
all date math expressions are resolved independently of any locale. This
behavior might be puzzling to users that try to specify a different
calendar than a Gregorian calendar.
Closes #37330
Relates #37663
Removes all sensitive settings (passwords, auth tokens, urls, etc...) for
watcher notifications accounts. These settings were deprecated (and
herein removed) in favor of their secure sibling that is set inside the
elasticsearch keystore. For example:
`xpack.notification.email.account.<id>.smtp.password`
is no longer a valid setting, and it is replaced by
`xpack.notification.email.account.<id>.smtp.secure_password`
This change adds the unfollow action for CCR follower indices.
This is needed for the shrink action in case an index is a follower index.
This will give the follower index the opportunity to fully catch up with
the leader index, pause index following and unfollow the leader index.
After this the shrink action can safely perform the ilm shrink.
The unfollow action needs to be added to the hot phase and acts as
barrier for going to the next phase (warm or delete phases), so that
follower indices are being unfollowed properly before indices are expected
to go in read-only mode. This allows the force merge action to execute
its steps safely.
The unfollow action has the following steps:
* `wait-for-indexing-complete` step: waits for the index in question
to have the `index.lifecycle.indexing_complete` setting set to `true`
* `wait-for-follow-shard-tasks` step: waits for all the shard follow tasks
for the index being handled to report that the leader shard global checkpoint
is equal to the follower shard global checkpoint.
* `pause-follower-index` step: Pauses index following, necessary to unfollow
* `close-follower-index` step: Closes the index, necessary to unfollow
* `unfollow-follower-index` step: Actually unfollows the index using
the CCR Unfollow API
* `open-follower-index` step: Reopens the index now that it is a normal index
* `wait-for-yellow` step: Waits for primary shards to be allocated after
reopening the index to ensure the index is ready for the next step
In the case of the last two steps, if the index being handled is
a regular index then the steps act as a no-op.
Relates to #34648
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Co-authored-by: Gordon Brown <gordon.brown@elastic.co>
* Add ccr follow info api
This api returns all follower indices and per follower index
the provided parameters at put follow / resume follow time and
whether index following is paused or active.
Closes #37127
* iter
* [DOCS] Edits the get follower info API
* [DOCS] Fixes link to remote cluster
* [DOCS] Clarifies descriptions for configured parameters
The "include_type_name" parameter was temporarily introduced in #37285 to facilitate
moving the default parameter setting to "false" in many places in the documentation
code snippets. Most of the places can simply be reverted without causing errors.
In this change I looked for asciidoc files that contained the
"include_type_name=true" addition when creating new indices but didn't look
likely they made use of the "_doc" type for mappings. This is mostly the case
e.g. in the analysis docs where index creation often only contains settings. I
manually corrected the use of types in some places where the docs still used an
explicit type name and not the dummy "_doc" type.
This commit adds a set_priority action to the hot, warm, and cold
phases for an ILM policy. This action sets the `index.priority`
on the managed index to allow different priorities between the
hot, warm, and cold recoveries.
This commit also includes the HLRC and documentation changes.
closes #36905
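For illustration, a policy that uses the new action in all three phases could be assembled like this. The helper name `policy_with_priorities` and the priority values 100/50/0 are assumptions chosen for the example; the resulting dictionary is meant to resemble a put-lifecycle request body and should be checked against the ILM documentation before use.

```python
import json


def policy_with_priorities(hot=100, warm=50, cold=0):
    """Build a lifecycle policy body with a set_priority action per phase."""
    phases = {}
    for name, priority in (("hot", hot), ("warm", warm), ("cold", cold)):
        # Each phase sets index.priority on the managed index via set_priority.
        phases[name] = {"actions": {"set_priority": {"priority": priority}}}
    return {"policy": {"phases": phases}}


print(json.dumps(policy_with_priorities(), indent=2))
```

Higher values are recovered first, so giving the hot phase the largest priority lets the most recent indices come back before warm and cold ones after a restart.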
* SQL: Rename SQL data type DATE to DATETIME
SQL data type DATE has only the date part (e.g.: 2019-01-14)
without any time information. Previously the SQL type DATE was
referring to the ES DATE which contains also the time part along
with TZ information. To conform with SQL data types the data type
`DATE` is renamed to `DATETIME`, since it includes also the time,
as a new runtime SQL `DATE` data type will be introduced down the road,
which only contains the date part and meets the SQL standard.
Closes: #36440
* Address comments
Some systems default to a nofile ulimit of 65535. To reduce the pain of
deploying Elasticsearch to such systems, this commit lowers the required
limit from 65536 to 65535.
In order to distinguish the ES-SQL type from the standard SQL type,
add a new ES-SQL column that makes this distinction clear,
e.g.: datetime vs TIMESTAMP
Fixes: #37519
The semantics of the API changed considerably since the documentation was written.
The main change is to remove references to memory reduction (this is related to refresh).
Instead, flush refers to recovery times. I also removed the references to trimming the translog
as the translog may be required for other purposes (operation history for ops based recovery
and complement ongoing file based recoveries).
Closes #32869
* Default include_type_name to false for get and put mappings.
* Default include_type_name to false for get field mappings.
* Add a constant for the default include_type_name value.
* Default include_type_name to false for get and put index templates.
* Default include_type_name to false for create index.
* Update create index calls in REST documentation to use include_type_name=true.
* Some minor clean-ups around the get index API.
* In REST tests, use include_type_name=true by default for index creation.
* Make sure to use 'expression == false'.
* Clarify the different IndexTemplateMetaData toXContent methods.
* Fix FullClusterRestartIT#testSnapshotRestore.
* Fix the ml_anomalies_default_mappings test.
* Fix GetFieldMappingsResponseTests and GetIndexTemplateResponseTests.
We make sure to specify include_type_name=true during xContent parsing,
so we continue to test the legacy typed responses. XContent generation
for the typeless responses is currently only covered by REST tests,
but we will be adding unit test coverage for these as we implement
each typeless API in the Java HLRC.
This commit also refactors GetMappingsResponse to follow the same approach
as the other mappings-related responses, where we read include_type_name
out of the xContent params, instead of creating a second toXContent method.
This gives better consistency in the response parsing code.
* Fix more REST tests.
* Improve some wording in the create index documentation.
* Add a note about types removal in the create index docs.
* Fix SmokeTestMonitoringWithSecurityIT#testHTTPExporterWithSSL.
* Make sure to mention include_type_name in the REST docs for affected APIs.
* Make sure to use 'expression == false' in FullClusterRestartIT.
* Mention include_type_name in the REST templates docs.
This commit removes the fallback for SSL settings. While this may be
seen as a non user friendly change, the intention behind this change
is to simplify the reasoning needed to understand what is actually
being used for a given SSL configuration. Each configuration now needs
to be explicitly specified as there is no global configuration or
fallback to some other configuration.
Closes #29797
Today file-chunks are sent sequentially one by one in peer-recovery. This is a
correct choice since the implementation is straightforward and recovery is
network-bound most of the time. However, if the connection is encrypted, we
might not be able to saturate the network pipe because encryption and decryption
are CPU-bound rather than network-bound.
With this commit, a source node can send multiple (2 by default) file-chunks
without waiting for the acknowledgments from the target.
Below are the benchmark results for PMC and NYC_taxis.
- PMC (20.2 GB)
| Transport | Baseline | chunks=1 | chunks=2 | chunks=3 | chunks=4 |
| ----------| ---------| -------- | -------- | -------- | -------- |
| Plain | 184s | 137s | 106s | 105s | 106s |
| TLS | 346s | 294s | 176s | 153s | 117s |
| Compress | 1556s | 1407s | 1193s | 1183s | 1211s |
- NYC_Taxis (38.6GB)
| Transport | Baseline | chunks=1 | chunks=2 | chunks=3 | chunks=4 |
| ----------| ---------| ---------| ---------| ---------| -------- |
| Plain | 321s | 249s | 191s | * | * |
| TLS | 618s | 539s | 323s | 290s | 213s |
| Compress | 2622s | 2421s | 2018s | 2029s | n/a |
Relates #33844
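The sending pattern described above amounts to a bounded window of unacknowledged chunks. The following Python simulation is a simplified sketch of that idea only: `send_file_chunks` and `max_in_flight` are hypothetical names, acknowledgments are assumed to arrive in order, and no real networking is involved.

```python
from collections import deque


def send_file_chunks(chunks, max_in_flight=2):
    """Simulate a sender that keeps at most `max_in_flight` unacked chunks.

    Returns the ordered event log and the peak number of chunks in flight.
    """
    in_flight = deque()
    events = []
    peak = 0
    for chunk in chunks:
        if len(in_flight) == max_in_flight:
            # Window is full: wait for the oldest outstanding chunk to be acked.
            events.append(("ack", in_flight.popleft()))
        in_flight.append(chunk)
        events.append(("send", chunk))
        peak = max(peak, len(in_flight))
    # Drain the remaining acknowledgments once everything has been sent.
    while in_flight:
        events.append(("ack", in_flight.popleft()))
    return events, peak
```

With `max_in_flight=1` the log degenerates to the old strictly sequential send/ack pattern; with 2, the second chunk is already being processed while the first acknowledgment is still outstanding.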
This adds a configurable whitelist to the HTTP client in watcher. By
default every URL is allowed to retain BWC. A dynamically configurable
setting named "xpack.http.whitelist" was added that allows configuring
an array of URLs, which can also contain simple regexes.
Closes #29937
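A whitelist check of this kind might look roughly like the sketch below. It assumes that the "simple regexes" are `*` wildcards and that the default whitelist is a single `*`; both are assumptions of this example, as are the helper names and the example URLs.

```python
import re


def _to_regex(pattern):
    """Translate a pattern with `*` wildcards into a compiled regex."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def is_whitelisted(url, whitelist=("*",)):
    """Return True if `url` fully matches at least one whitelist entry."""
    return any(_to_regex(pattern).fullmatch(url) for pattern in whitelist)
```

With the default whitelist every URL is allowed, which preserves the previous behavior; restricting it to `["https://api.example.org/*"]` would reject requests to any other host.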
`+` for index name inclusions is no longer supported for 6.x+. This
commit removes references to the `+` from the documentation. An additional
example for system indices is also included.
fixes #37237
Previously these were only linked in a circuitous way rather than being
available from the top level API documentation and "Put Lifecycle" API docs.
This makes them slightly easier to find for a user.
* provide overridden `hashCode` and `toString` methods to account for `DISTINCT`
* change the analyzer for scenarios where `COUNT <field_name>` and `COUNT DISTINCT` have different paths
* defined a new `filter` aggregation encapsulating an `exists` query to filter out null or missing values
Upgrading the Elastic Stack perfectly documents the process to
upgrade ES from 5 to 6 when internal indices are present. However,
the rolling upgrade docs do not mention anything about internal indices.
This adds a warning in the rolling upgrade procedure, highlighting that
internal indices should be upgraded before the rolling upgrade procedure
can be started.
This change adds support for the 'include_type_name' parameter for the
indices.get API. This parameter, which defaults to `false` starting in 7.0,
changes the response to not include the indices type names any longer.
If the parameter is set in the request, we additionally emit a deprecation
warning since using the parameter should be only temporarily necessary while
adapting to the new response format and we will remove it with the next major
version.
* [Analysis] Deprecate Standard Html Strip Analyzer
Deprecate only Standard Html Strip Analyzer
If a user creates an index with this analyzer in 7.0 or later, ES throws an exception.
If the index was created before 7.0, ES issues a deprecation log.
We will remove it in 8.0.
Related #4704
In Lucene 8 searches can skip non-competitive hits if the total hit count is not requested.
It is also possible to track the number of hits up to a certain threshold. This is a trade-off that speeds up searches while still providing a lower bound of the total hit count. This change adds the ability to set this threshold directly in the track_total_hits search option. A boolean value (true, false) indicates whether the total hit count should be tracked in the response. When set to an integer, this option allows computing a lower bound of the total hits while preserving the ability to skip non-competitive hits when enough matches have been collected.
Relates #33028
Today it's very difficult to see which indices are frozen or rather
throttled via the commonly used monitoring APIs. This change adds
a cell to the `_cat/indices` API to render if an index is `search.throttled`
Relates to #34352
Adds an example on translating geohashes returned by geohashgrid
agg as bucket keys into geo bounding box filters in elasticsearch as well
as 3rd party applications.
Closes #36413
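For readers working outside Elasticsearch, the translation from a geohash bucket key to a bounding box can be sketched in Python using the standard geohash decoding scheme. The function name `geohash_to_bbox` and the `top_left` / `bottom_right` output layout are choices made for this example; this is not the code added to the documentation.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_to_bbox(geohash):
    """Decode a geohash into the bounding box of the cell it represents."""
    lat = [-90.0, 90.0]
    lon = [-180.0, 180.0]
    is_lon = True  # bits alternate between longitude and latitude, longitude first
    for char in geohash:
        value = _BASE32.index(char)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            rng = lon if is_lon else lat
            mid = (rng[0] + rng[1]) / 2
            # A 1 bit keeps the upper half of the range, a 0 bit the lower half.
            if bit:
                rng[0] = mid
            else:
                rng[1] = mid
            is_lon = not is_lon
    return {
        "top_left": {"lat": lat[1], "lon": lon[0]},
        "bottom_right": {"lat": lat[0], "lon": lon[1]},
    }
```

The returned corners can be placed under a geo field name inside a bounding box filter, or handed to a third-party mapping library that expects corner coordinates.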
Enhance error message for the case that the 2nd argument of PERCENTILE
and PERCENTILE_RANK is not a foldable, as it doesn't make sense to have
a dynamic value coming from a field.
Fixes: #36903
Types can be used both in the source and dest section of the body which will
be translated to search and index requests respectively. Adding a deprecation warning
for those cases and removing examples using more than one type in reindex since
support for this is going to be removed.
There are a handful of examples in the ILM documentation that could result in
rolling over indices more quickly than we might normally recommend,
contributing to over-sharding in cases where the examples are copied without
modification. This change makes some numbers bigger to try and avoid this.
With this commit we rename `node.store.allow_mmapfs` to
`node.store.allow_mmap`. Previously this setting has controlled whether
`mmapfs` could be used as a store type. With the introduction of
`hybridfs` which also relies on memory-mapping,
`node.store.allow_mmapfs` also applies to `hybridfs` and thus we rename
it in order to convey that it is actually used to allow memory-mapping
but not a specific store type.
Relates #36668
Relates #37070
When executing terms aggregations we set the shard_size, meaning the
number of buckets to collect on each shard, to a value that's higher than
the number of requested buckets, to guarantee some basic level of
precision. We have an optimization in place so that we leave shard_size
set to size whenever we are searching against a single shard, in which
case maximum precision is guaranteed by definition.
This optimization requires access to the total number of shards that
the search is executing against. In the context of cross-cluster search,
once we introduce multiple reduction steps (one per cluster), each
cluster will only know the number of local shards, which is problematic
as we should only optimize if we are searching against a single shard in a
single cluster. It could be that we are searching against one shard per cluster,
in which case the current code would optimize the number of terms, causing
a loss of precision.
While discussing how to address the CCS scenario, we decided that we do
not want to introduce further complexity caused by this single shard
optimization, as it benefits only a minority of cases, especially when
the benefits are not so great.
This commit removes the single shard optimization, meaning that the
heuristic for the number of buckets to collect on the shards is always
enabled, even when searching against a single shard.
This will cause more buckets to be collected when searching against a single
shard compared to before. If that becomes a problem for some users, they
can work around that by setting the shard_size equal to the size.
Relates to #32125
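The effect of the change can be illustrated with a small Python model. It assumes the heuristic is `size * 1.5 + 10`, which is the default `shard_size` given in the terms aggregation documentation; the function names are hypothetical and the real code may differ in detail.

```python
def suggest_shard_size(size):
    """Assumed heuristic: collect more buckets per shard than requested."""
    return int(size * 1.5 + 10)


def shard_size_before(size, number_of_shards):
    # Old behavior: skip the heuristic when only one shard is searched.
    return size if number_of_shards == 1 else suggest_shard_size(size)


def shard_size_after(size, number_of_shards):
    # New behavior: the heuristic always applies, regardless of shard count.
    return suggest_shard_size(size)
```

For a request with `size` 10 against a single shard, this model collects 25 buckets instead of 10 after the change; setting `shard_size` equal to `size` explicitly restores the previous number.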
With this commit we introduce a new store type `hybridfs` that is a
hybrid between `mmapfs` and `niofs`. This store type chooses different
strategies to read Lucene files based on the read access pattern (random
or linear) in order to optimize performance.
This store type has been available in earlier versions of Elasticsearch
as `default_fs`. We have chosen a different name now in order to convey
the intent of the store type instead of tying it to whether it
is the default choice.
Relates #36668
This commit fixes some cross-doc links from the old ingest plugins page
to the new ingest processor pages that arose after converting
ingest-geoip and ingest-user-agent to modules.
This commit adds a placeholder ingest-geoip plugin page as there are
other components in the Elastic Stack that still refer to these
pages. These docs would be broken without this placeholder page, forcing
teams responsible for those docs to scramble to fix the build over the
weekend before a holiday period. Instead, we add a placeholder page so
the docs build continues to function, and those teams can fix their docs
without the constraint of a broken build. We also clean up a few minor
docs issues that were missed during the initial changes to convert
ingest-geoip to a module.
This extra scenario describes the case where an updated
policy increases the current phase's `min_age`. Now, the
docs explicitly describe this scenario as to what is
expected -- old min_age is used.
Closes #35356.
* Added Limitations page
* Made the aggregations page follow the common template for functions
* Modified all tables to have the first row's cells content centered
* Polishing in other various sections
This is a follow-up to some discussions around #36399. Currently we have
relatively confusing compression behavior where compression can be
configured for requests based on transport.compress or a specific
setting for a remote cluster. However, we can only compress responses
based on transport.compress as we do not know where a request is
coming from (currently).
This commit modifies the behavior to NEVER compress responses based on
settings. Instead, a response will only be compressed if the request was
compressed. This commit also updates the documentation to more clearly
describe transport level compression.
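The difference between the old and new decision is small enough to state as two functions. This is only a schematic Python model with hypothetical names; the real decision is made inside the transport layer.

```python
def compress_response_before(request_compressed, transport_compress):
    # Old behavior: responses followed the node-wide transport.compress setting.
    return transport_compress


def compress_response_after(request_compressed, transport_compress):
    # New behavior: a response is compressed only if its request was compressed.
    return request_compressed
```

In this model a node with `transport.compress` enabled no longer compresses its response to an uncompressed request, and a node with it disabled still answers a compressed request with a compressed response.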
Allow scripts to correctly reference grouping functions
Fix bug in translation of date/time functions mixed with histograms.
Enhance Verifier to prevent histograms being nested inside other
functions inside GROUP BY (as it implies double grouping)
Extend Histogram docs
This commit breaks the single ingest docs file into multiple files,
factoring out the processor docs into a documentation file per
processor. This will help make this content easier to maintain.
This commit overhauls the documentation of discovery and cluster coordination,
removing mention of the Zen Discovery module and replacing it with docs for the
new cluster coordination mechanism introduced in 7.0.
Relates #32006
Leaving `index.lifecycle.indexing_complete` in place when removing the
lifecycle policy from an index can cause confusion: if a new policy
is associated with the index, rollover will be silently skipped.
Removing that setting when removing the policy from an index makes
associating a new policy with the index more involved, but allows ILM to
fail loudly, rather than silently skipping operations which the user may
assume are being performed.
* Adjust order of checks in WaitForRolloverReadyStep
This allows ILM to error out properly for indices that have a valid
alias, but are not the write index, while still handling
`indexing_complete` on old-style aliases and rollover (that is, those
which only point to a single index at a time with no explicit write
index)
This is related to #36652. In 7.0 we plan to deprecate a number of
settings that make reference to the concept of a tcp transport. We
mostly just have a single transport type now (based on tcp). Settings
should only reference tcp if they are referring to socket options. This
commit updates the settings in the docs and removes string usages of
the old settings. Additionally, it adds a missing remote compress setting
to the docs.
* [ML] Job and datafeed mappings with index template (#32719)
Index mappings for the configuration documents
* [ML] Job config document CRUD operations (#32738)
* [ML] Datafeed config CRUD operations (#32854)
* [ML] Change JobManager to work with Job config in index (#33064)
* [ML] Change Datafeed actions to read config from the config index (#33273)
* [ML] Allocate jobs based on JobParams rather than cluster state config (#33994)
* [ML] Return missing job error when .ml-config does not exist (#34177)
* [ML] Close job in index (#34217)
* [ML] Adjust finalize job action to work with documents (#34226)
* [ML] Job in index: Datafeed node selector (#34218)
* [ML] Job in Index: Stop and preview datafeed (#34605)
* [ML] Delete job document (#34595)
* [ML] Convert job data remover to work with index configs (#34532)
* [ML] Job in index: Get datafeed and job stats from index (#34645)
* [ML] Job in Index: Convert get calendar events to index docs (#34710)
* [ML] Job in index: delete filter action (#34642)
This changes the delete filter action to search
for jobs using the filter to be deleted in the index
rather than the cluster state.
* [ML] Job in Index: Enable integ tests (#34851)
Enables the ML integration tests, excluding the rolling upgrade tests, and includes a lot of fixes to
make the tests pass again.
* [ML] Reimplement established model memory (#35500)
This is the 7.0 implementation of a master node service to
keep track of the native process memory requirement of each ML
job with an associated native process.
The new ML memory tracker service works when the whole cluster
is upgraded to at least version 6.6. For mixed version clusters
the old mechanism of established model memory stored on the job
in cluster state was used. This means that the old (and complex)
code to keep established model memory up to date on the job object
has been removed in 7.0.
Forward port of #35263
* [ML] Need to wait for shards to replicate in distributed test (#35541)
Because the cluster was expanded from 1 node to 3, indices would
initially start off with 0 replicas. If the original node was
killed before auto-expansion to 1 replica was complete then
the test would fail because the indices would be unavailable.
* [ML] DelayedDataCheckConfig index mappings (#35646)
* [ML] JIndex: Restore finalize job action (#35939)
* [ML] Replace Version.CURRENT in streaming functions (#36118)
* [ML] Use 'anomaly-detector' in job config doc name (#36254)
* [ML] Job In Index: Migrate config from the clusterstate (#35834)
Migrate ML configuration from clusterstate to index for closed jobs
only once all nodes are v6.6.0 or higher
* [ML] Check groups against job Ids on update (#36317)
* [ML] Adapt to periodic persistent task refresh (#36633)
* [ML] Adapt to periodic persistent task refresh
If https://github.com/elastic/elasticsearch/pull/36069/files is
merged then the approach for reallocating ML persistent tasks
after refreshing job memory requirements can be simplified.
This change begins the simplification process.
* Remove AwaitsFix and implement TODO
* [ML] Default search size for configs
* Fix TooManyJobsIT.testMultipleNodes
Two problems:
1. Stack overflow during async iteration when lots of
jobs on same machine
2. Not effectively setting search size in all cases
* Use execute() instead of submit() in MlMemoryTracker
We don't need a Future to wait for completion
* [ML][TEST] Fix NPE in JobManagerTests
* [ML] JIindex: Limit the size of bulk migrations (#36481)
* [ML] Prevent updates and upgrade tests (#36649)
* [FEATURE][ML] Add cluster setting that enables/disables config migration (#36700)
This commit adds a cluster setting called `xpack.ml.enable_config_migration`.
The setting is `true` by default. When set to `false`, no config migration will
be attempted and non-migrated resources (e.g. jobs, datafeeds) will be able
to be updated normally.
Relates #32905
* [ML] Snapshot ml configs before migrating (#36645)
* [FEATURE][ML] Split in batches and migrate all jobs and datafeeds (#36716)
Relates #32905
* SQL: Fix translation of LIKE/RLIKE keywords (#36672)
* SQL: Fix translation of LIKE/RLIKE keywords
Refactor Like/RLike functions to simplify internals and improve query
translation when chained or within a script context.
Fix #36039
Fix #36584
* Fixing line length for EnvironmentTests and RecoveryTests (#36657)
Relates #34884
* Add back one line removed by mistake regarding java version check and
COMPAT jvm parameter existence
* Do not resolve addresses in remote connection info (#36671)
The remote connection info API leads to resolving addresses of seed
nodes when invoked. This is problematic because if a hostname fails to
resolve, we would not display any remote connection info. Yet, a
hostname not resolving can happen across remote clusters, especially in
the modern world of cloud services with dynamically changing
IPs. Instead, the remote connection info API should be providing the
configured seed nodes. This commit changes the remote connection info to
display the configured seed nodes, avoiding a hostname resolution. Note
that care was taken to preserve backwards compatibility with previous
versions that expect the remote connection info to serialize a transport
address instead of a string representing the hostname.
* [Painless] Add boxed type to boxed type casts for method/return (#36571)
This adds implicit boxed type to boxed types casts for non-def types to create asymmetric casting relative to the def type when calling methods or returning values. This means that a user calling a method taking an Integer can call it with a Byte, Short, etc. legally which matches the way def works. This creates consistency in the casting model that did not previously exist.
* SNAPSHOTS: Adjust BwC Versions in Restore Logic (#36718)
* Re-enables bwc tests with adjusted version conditions now that #36397 enables concurrent snapshots in 6.6+
* ingest: fix on_failure with Drop processor (#36686)
This commit allows a document to be dropped when a Drop processor
is used in the on_failure fork of the processor chain.
Fixes #36151
* Initialize startup `CcrRepositories` (#36730)
Currently, the CcrRepositoryManager only listens for settings updates
and installs new repositories. It does not install the repositories that
are in the initial settings. This commit modifies the manager to
install the initial repositories. Additionally, it modifies the ccr
integration test to configure the remote leader node at startup, instead
of using a settings update.
* [TEST] fix float comparison in RandomObjects#getExpectedParsedValue
This commit fixes a test bug introduced with #36597. This caused some
test failures, as stored field value comparisons would not work when the CBOR
xcontent type was used.
Closes #29080
* [Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (#35320)
This commit exposes lucene's LatLonShape field as the
default type in GeoShapeFieldMapper. To use the new
indexing approach, simply set "type" : "geo_shape" in
the mappings without setting any of the strategy, precision,
tree_levels, or distance_error_pct parameters. Note the
following when using the new indexing approach:
* geo_shape query does not support querying by
MULTIPOINT.
* LINESTRING and MULTILINESTRING queries do not
yet support WITHIN relation.
* CONTAINS relation is not yet supported.
The tree, precision, tree_levels, distance_error_pct,
and points_only parameters are deprecated.
* TESTS:Debug Log. IndexStatsIT#testFilterCacheStats
* ingest: support default pipelines + bulk upserts (#36618)
This commit adds support to enable bulk upserts to use an index's
default pipeline. Bulk upsert, doc_as_upsert, and script_as_upsert
are all supported.
However, bulk script_as_upsert has slightly surprising behavior since
the pipeline is executed _before_ the script is evaluated. This means
that the pipeline only has access to the data found in the upsert field
of the script_as_upsert. The non-bulk script_as_upsert (existing behavior)
runs the pipeline _after_ the script is executed. This commit
does _not_ attempt to consolidate the bulk and non-bulk behavior for
script_as_upsert.
This commit also adds additional testing for the non-bulk behavior,
which remains unchanged with this commit.
fixes #36219
* Fix duplicate phrase in shrink/split error message (#36734)
This commit removes a duplicate "must be a" from the shrink/split error
messages.
* Deprecate types in get_source and exist_source (#36426)
This change adds a new untyped endpoint `{index}/_source/{id}` for both the
GET and the HEAD methods to get the source of a document or check for its
existence. It also adds deprecation warnings to RestGetSourceAction that emit
a warning when the old deprecated "type" parameter is still used. Also updating
documentation and tests where appropriate.
Relates to #35190
* Revert "[Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (#35320)"
This reverts commit 5bc7822562.
* Enhance Invalidate Token API (#35388)
This change:
- Adds functionality to invalidate all (refresh+access) tokens for all users of a realm
- Adds functionality to invalidate all (refresh+access) tokens for a user in all realms
- Adds functionality to invalidate all (refresh+access) tokens for a user in a specific realm
- Changes the response format for the invalidate token API to contain information about the
number of the invalidated tokens and possible errors that were encountered.
- Updates the API Documentation
After back-porting to 6.x, the `created` field will be removed from master as a field in the
response
Resolves: #35115
Relates: #34556
* Add raw sort values to SearchSortValues transport serialization (#36617)
In order for CCS alternate execution mode (see #32125) to be able to do the final reduction step on the CCS coordinating node, we need to serialize additional info in the transport layer as part of each `SearchHit`. Sort values are already present but they are formatted according to the provided `DocValueFormat`. The CCS node needs to be able to reconstruct the lucene `FieldDoc` to include in the `TopFieldDocs` and `CollapseTopFieldDocs` which will feed the `mergeTopDocs` method used to reduce multiple search responses (one per cluster) into one.
This commit adds such information to the `SearchSortValues` and exposes it through a new getter method added to `SearchHit` for retrieval. This info is only serialized at transport and never printed out at REST.
* Watcher: Ensure all internal search requests count hits (#36697)
In previous commits only the stored toXContent version of a search
request was using the old format. However an executed search request was
already disabling hit counts. In 7.0 hit counts will stay enabled by
default to allow for proper migration.
Closes #36177
* [TEST] Ensure shard follow tasks have really stopped.
Relates to #36696
* Ensure MapperService#getAllMetaFields elements order is deterministic (#36739)
MapperService#getAllMetaFields returns an array, which is created out of
an `ObjectHashSet`. Such set does not guarantee deterministic hash
ordering. The array returned by its toArray may be sorted differently
at each run. This caused some repeatability issues in our tests (see #29080)
as we pick random fields from the array of possible metadata fields,
but that won't be repeatable if the input array is sorted differently at
every run. Once setting the tests seed, hppc picks that up and the sorting is
deterministic, but failures don't repeat with the seed that gets printed out
originally (as a seed was not originally set).
See also https://issues.carrot2.org/projects/HPPC/issues/HPPC-173.
With this commit, we simply create a static sorted array that is used for
`getAllMetaFields`. The change is in production code but really affects
only testing as the only production usage of this method was to iterate
through all values when parsing fields in the high-level REST client code.
Anyways, this seems like a good change as returning an array would imply
that it's deterministically sorted.
* Expose Sequence Number based Optimistic Concurrency Control in the rest layer (#36721)
Relates #36148
Relates #10708
* [ML] Mute MlDistributedFailureIT
* [Geo] Expose BKDBackedGeoShapes as new VECTOR strategy
This commit exposes lucene's LatLonShape field as a new
strategy in GeoShapeFieldMapper. To use the new indexing
approach, strategy should be set to "vector" in the
geo_shape field mapper. If the tree parameter is set
the mapper will throw an IAE. Note the following:
When using vector strategy:
* geo_shape query does not support querying by POINT,
MULTIPOINT, or GEOMETRYCOLLECTION.
* LINESTRING and MULTILINESTRING queries do not support
WITHIN relation.
* CONTAINS relation is not supported.
* The tree, precision, tree_levels, distance_error_pct,
and points_only parameters will not throw an exception
but they have no effect and will be marked as
deprecated.
All other features are supported.
* revert change to PercolatorFieldMapper
* fix ExistsQuery for geo_shape vector strategy
* add deprecation logging for tree, precision, tree_levels, distance_error_pct, and points_only
* initial update to geoshape docs, including mapping migration updates
* initial support for GeoCollection queries
* fix docs and javadoc errors
* clean up geocollection queries
* set deprecated mapping tests to NOTCONSOLE
* fix geo-shape mapper asciidoc mapping and test warnings
* add support for point queries using LatLonShapeBoundingBoxQuery
* update GeoShapeQueryBuilderTests to include POINT queries for VECTOR strategy. Other comment cleanups
* add lucene geometry build testing to ShapeBuilder tests
* remove deprecated prefix tree mapping from geo-shape.asciidoc
* refactor GeoShapeFieldMapper into LegacyGeoShapeFieldMapper and GeoShapeFieldMapper
Both classes derive from BaseGeoShapeFieldMapper that provides shared parameters:
coerce, ignoreMalformed, ignore_z_value, orientation.
* update docs to remove vector strategy
* fix GeometryCollectionBuilder#buildLucene to return the object created by the shape builder
* fix LineLength failure in GeoJsonShapeParserTests
* ShapeMapper refactor changes from PR feedback
* fix typo in geo-shape.asciidoc
* ignore circle test in docs
* update indexing-approach ref to geoshape-indexing-approach
* add warnings check for LegacyGeoShapeFieldMapper to AbstractBuilderTestCase
* fix deprecatedParameters setup
* update indexing approach
* fixing unexpected warnings failures
* move orientation back to field type
* remove if in LegacyGeoShapeFieldMapper#doXContent. Fix GeoShapeFieldMapper to work with double array as a point
* fix indexing-approach link in circle section of geoshape docs
* add strategy to deprecation warnings check
* fix test failures
* fix typo in QueryStringQueryBuilderTests
* fix total hits to totalHits().value
* fix version number
* add version check to BaseGeoShapeFieldMapper
* fix line length!
* revert version check in BaseGeoShapeFieldMapper
* Fix serialization of mappings of legacy shapes.
The first example given is missing the two single-token cases for "is" and "a".
The later usage example is slightly wrong in that custom analyzers should
go under `settings.analysis.analyzer`.
* Deprecate types in index API
- deprecate type-based constructors of IndexRequest
- update tests to use typeless IndexRequest constructors
- no yaml tests as they have been already added in #35790
Relates to #35190
This commit adds an adjust_offsets parameter to the word_delimiter_graph token filter, defaulting
to true. Most of the time you'd want sub-tokens emitted by this filter to have offsets that are
adjusted to their real position in the token stream; however, some token filters can change the
length or starting position of a token (eg trim) without changing their offset attributes, and this
can lead to word_delimiter_graph emitting illegal offsets. Setting adjust_offsets to false in these
cases will allow indexing again.
Fixes #34741, #33710
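As a rough illustration of the scenario described above, the following Python sketch builds an index settings body in which a `trim` filter runs before `word_delimiter_graph`, with `adjust_offsets` set to false. The analyzer and filter names (`my_analyzer`, `my_word_delimiter`) and the choice of the `keyword` tokenizer are assumptions made for the example, not part of this commit.

```python
import json

# Index settings sketch: a custom analyzer that trims tokens and then splits
# them with word_delimiter_graph, with offset adjustment disabled.
# "my_analyzer" and "my_word_delimiter" are hypothetical names.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "my_word_delimiter": {
                    "type": "word_delimiter_graph",
                    # Defaults to true; false avoids illegal offsets when an
                    # earlier filter changed the token without updating offsets.
                    "adjust_offsets": False,
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "keyword",
                    "filter": ["trim", "my_word_delimiter"],
                }
            },
        }
    }
}

# Serialize the body as it would be sent when creating the index.
print(json.dumps(index_settings, indent=2))
```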
This change adds a new untyped endpoint `{index}/_source/{id}` for both the
GET and the HEAD methods to get the source of a document or check for its
existence. It also adds deprecation warnings to RestGetSourceAction that emit
a warning when the old deprecated "type" parameter is still used. Also updating
documentation and tests where appropriate.
Relates to #35190
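A minimal sketch of how a client might build the typeless path described above. The helper name `source_path` is hypothetical; only the `{index}/_source/{id}` shape comes from this change.

```python
from urllib.parse import quote


def source_path(index: str, doc_id: str) -> str:
    """Build the typeless `{index}/_source/{id}` path.

    Both parts are percent-encoded so that ids containing '/' or spaces
    do not change the path structure.
    """
    return f"/{quote(index, safe='')}/_source/{quote(doc_id, safe='')}"


# GET on this path returns the document source; HEAD only checks existence.
print(source_path("twitter", "1"))
```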
This commit exposes lucene's LatLonShape field as the
default type in GeoShapeFieldMapper. To use the new
indexing approach, simply set "type" : "geo_shape" in
the mappings without setting any of the strategy, precision,
tree_levels, or distance_error_pct parameters. Note the
following when using the new indexing approach:
* geo_shape query does not support querying by
MULTIPOINT.
* LINESTRING and MULTILINESTRING queries do not
yet support WITHIN relation.
* CONTAINS relation is not yet supported.
The tree, precision, tree_levels, distance_error_pct,
and points_only parameters are deprecated.
This commit adds the sequence number and primary term of the last operation that
modified a document to `GetResult` and uses them to power the Update API.
Relates #36148
Relates #10708
For each remote cluster, the auto follow coordinator starts an auto
follower that checks the remote cluster state and determines whether an
index needs to be auto followed. The time since last auto follow is
reported per remote cluster and gives insight whether the auto follow
process is alive.
Relates to #33007
Originates from #35895
Introduce Histogram grouping function for bucketing/grouping data based
on a given range. Both date and numeric histograms are supported using
the appropriate range declaration (numbers vs intervals).
SELECT HISTOGRAM(number, 50) AS h FROM index GROUP BY h
SELECT HISTOGRAM(date, INTERVAL 1 YEAR) AS h FROM index GROUP BY h
In addition add multiply operator for Intervals
Add docs for intervals and histogram
Fix #36509
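To make the numeric case concrete, here is a small Python sketch of fixed-width bucketing. It assumes buckets start at multiples of the interval and are keyed by their lower bound, which is the usual histogram convention; the function name is hypothetical and the sketch does not cover date intervals.

```python
import math


def histogram_bucket(value: float, interval: float) -> float:
    """Return the lower bound of the fixed-width bucket containing `value`.

    Assumes buckets are aligned to multiples of `interval`.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return math.floor(value / interval) * interval


# Values 0..49 fall into bucket 0, 50..99 into bucket 50, and so on.
for v in (49, 50, 120):
    print(v, histogram_bucket(v, 50))
```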
* Add IntervalQueryBuilder with support for match and combine intervals
* Add relative intervals
* feedback
* YAML test - broken
* yaml test; begin to add block source
* Add block; make disjunction its own source
* WIP
* Extract IntervalBuilder and add tests for it
* Fix eq/hashcode in Disjunction
* New yaml test
* checkstyle
* license headers
* test fix
* YAML format
* YAML formatting again
* yaml tests; javadoc
* Add OR test -> requires fix from LUCENE-8586
* Add docs
* Re-do API
* Clint's API
* Delete bash script
* doc fixes
* imports
* docs
* test fix
* feedback
* comma
* docs fixes
* Tidy up doc references to old rule
Add CURRENT_TIMESTAMP as a keyword as well as a function, alongside NOW().
These return the current date/time for the given query, computed when
the statement reaches the server. For completeness, CURRENT_TIMESTAMP
also accepts precision as an optional parameter.
Fix #36534
* Adds deprecation logging to ScriptDocValues#getValues.
First commit addressing issue #22919.
`ScriptDocValues#getValues` was added for backwards compatibility but is no
longer needed. Scripts using the syntax `doc['foo'].values` when
`doc['foo']` is a list should be using `doc['foo']` instead.
* Fixes two build errors in #34279
* Removes unused import in ScriptDocValuesDatesTest
* Removes use of `.values` in example in diversified-sampler-aggregation.asciidoc
* Removes use of .values from painless test.
Part of #34279
* Updates tests to use `doc[foo]` syntax rather than `doc[foo].values`.
* Removes use of `getValues()` and replaces use of `doc[foo].values` with `doc[foo]`.
* Indentation fix.
* Remove unnecessary list construction at previous `getValues()` callsite in ScriptDocValues.GeoPoints.
* Update migration doc and add link to `getValue` in ScriptDocValues javadoc.
* Fix compile
* Fix javadoc issue
* Removes ScriptDocValues#getValues usage from painless whitelist.
* Enable parallel restore operations
* Add uuid to restore in progress entries to uniquely identify them
* Adjust restore in progress entries to be a map in cluster state
* Added tests for:
* Parallel restore from two different snapshots
* Parallel restore from a single snapshot to different indices to test uuid identifiers are correctly used by `RestoreService` and routing allocator
* Parallel restore with waiting for completion to test transport actions correctly use uuid identifiers
* Add guidance on using CCR with Logstash
This commit adds a note to the documentation regarding how to configure
Logstash indices in the context of being available as leader indices for
cross-cluster replication.
* Oh okay
* idk
* notconsole
This commit adds deprecation warnings when using format specifiers with
joda data formats that will change with java time. It also adds the "8"
prefix which may be used to force the new java time format parsing.
When a security manager is present, the JVM will cache positive hostname
lookups indefinitely. This can be problematic, especially in the modern
world with cloud services where DNS addresses can change, or
environments using Docker containers where IP addresses could be
considered ephemeral. This behavior impacts cluster discovery,
cross-cluster replication and cross-cluster search, reindex from remote,
snapshot repositories, webhooks in Watcher, external authentication
mechanisms, and the Elastic Stack Monitoring Service. The experience of
watching a DNS lookup change yet not be reflected within Elasticsearch
is a poor experience for users. The reason the JVM has this is to guard
against DNS cache poisoning attacks. Yet, there is already a defense in
the modern world against such attacks: TLS. With proper certificate
validation, even if a resolver falls prey to a DNS cache poisoning
attack, using TLS would neuter the attack. Therefore we have a policy
with dubious security value that significantly impacts usability. As
such we make the usability/security tradeoff towards usability, since
the security risks are very low. This commit introduces new system
properties that Elasticsearch observes to override the JVM DNS cache
policy.
Previously persistent task assignment was checked in the
following situations:
- Persistent tasks are changed
- A node joins or leaves the cluster
- The routing table is changed
- Custom metadata in the cluster state is changed
- A new master node is elected
However, there could be situations when a persistent
task that could not be assigned to a node could become
assignable due to some other change, such as memory
usage on the nodes.
This change adds a timed recheck of persistent task
assignment to account for such situations. The timer
is suspended while checks triggered by cluster state
changes are in-flight to avoid adding burden to an
already busy cluster.
Closes #35792
Redeprecates the `/_xpack/rollup` endpoints in favor of `/_rollup`.
When we clean up the rollup in a cluster containing 6.x nodes we need to
use `/_xpack/rollup` instead of `/_rollup` because the 6.x nodes don't
know about `/_rollup`. In those cases we must ignore the deprecation
warnings that the 7.0 node will return for the endpoint.
Closes #36044
* Renamed DAY_OF_WEEK and WEEK_OF_YEAR functions to their ISO version and
added the same functions with different functionality.
* Rewritten the datetime functions documentation to follow the format of the other
functions documentation pages.
Adds a setting that indicates that an index is done indexing, set by ILM
when the Rollover action completes. This indicates that the Rollover
action should be skipped in any future invocations, as long as the index
is no longer the write index for its alias.
This enables 1) an index with a policy that involves the Rollover action
to have the policy removed and switched to another one without use of
the move-to-step API, and 2) integrations with Beats and CCR.
* Lower fielddata circuit breaker default limit
Lower fielddata circuit breaker default limit from 60% to 40% as we have
moved to doc_values for most of the cases.
* merge master in
* update tests
* update docs
Bulk requests comprise many individual actions, and the responses for each
action come back in the same order (see e.g. `DocumentActionsIT#testBulk()`).
However the docs do not seem to explicitly state this vital fact. This commit
addresses that omission.
The following updates were made:
- Add a new untyped endpoint `{index}/_explain/{id}`.
- Add deprecation warnings to Rest*Action, plus tests in Rest*ActionTests.
- For each REST yml test, make sure there is one version without types, and another legacy version that retains types (called *_with_types.yml).
- Deprecate relevant methods on the Java HLRC requests/ responses.
- Update documentation (for both the REST API and Java HLRC).
This commit gets rid of the 'NONE' and 'INFO' severity levels for
deprecation issues.
'NONE' is unused and does not make much sense as a severity level.
'INFO' can be separated into two categories: Either 1) we can
definitively tell there will be a problem with the cluster/node/index
configuration that can be resolved prior to upgrade, in which case
the issue should be a WARNING, or 2) we can't, because any issues would
be at the application level, for which the user should review the
deprecation logs and/or response headers.
In real deployments it is important that clusters are properly configured to
avoid accidentally forming multiple independent clusters at cluster
bootstrapping time. However we also expect to be able to unpack Elasticsearch
and start up one or more nodes without any up-front configuration, and have
them do their best to find each other and form a cluster after a few seconds.
This change adds a delayed automatic bootstrapping process to nodes that start
up with no relevant settings set to support the desired out-of-the-box
experience without compromising safety in properly-configured deployments.
* Add deprecation warnings to `Rest*TermVectorsAction`, plus tests in `Rest*TermVectorsActionTests`.
* Deprecate relevant methods on the Java HLRC requests/ responses.
* Update documentation (for both the REST API and Java HLRC).
* For each REST yml test, create one version without types, and another legacy version that retains types (called *_with_types.yml).
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equal to `eq`)
or a lower bound of the total (in which case it is equal to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: false`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).
Relates #33028
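Client code that has to cope with both response shapes could normalize them along the following lines. This is a sketch only: the function name is hypothetical, and it assumes that a plain integer (as returned with `rest_total_hits_as_int`) is always an exact count.

```python
from typing import Optional, Tuple, Union


def read_total_hits(total: Union[int, dict, None]) -> Optional[Tuple[int, str]]:
    """Normalize `hits.total` to a (value, relation) pair.

    - object form: {"value": N, "relation": "eq" | "gte"}
    - integer form (rest_total_hits_as_int): treated as an exact count
    - missing (total hits not tracked): None
    """
    if total is None:
        return None
    if isinstance(total, dict):
        return total["value"], total["relation"]
    return total, "eq"


print(read_total_hits({"value": 10, "relation": "eq"}))
print(read_total_hits(10))
```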
The current response format is:
```
{
"pattern1": {
...
},
"pattern2": {
...
}
}
```
The new format is:
```
{
"patterns": [
{
"name": "pattern1",
"pattern": {
...
}
},
{
"name": "pattern2",
"pattern": {
...
}
}
]
}
```
This format is more structured and more friendly for parsing and generating specs.
This is a breaking change, but it is better to do this now while ccr
is still a beta feature than later.
Follow up from #36049
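For callers that still hold responses in the old shape, the mapping between the two formats shown above is mechanical. A minimal Python sketch (the function name is hypothetical):

```python
def to_new_format(old: dict) -> dict:
    """Convert {"pattern1": {...}, ...} into {"patterns": [{"name", "pattern"}, ...]}.

    Dict insertion order is preserved, so patterns keep their original order.
    """
    return {
        "patterns": [
            {"name": name, "pattern": body} for name, body in old.items()
        ]
    }


old_response = {"pattern1": {"a": 1}, "pattern2": {"b": 2}}
print(to_new_format(old_response))
```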
This change adds a soft limit to open scroll contexts that can be controlled with the dynamic cluster setting `search.max_open_scroll_context` (defaults to 500).
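Because the setting is dynamic, it can be changed through the cluster update settings API. The body below is a sketch; the value 1000 is an arbitrary illustration, and using `persistent` rather than `transient` is a choice made for the example.

```python
import json

# Body for a cluster settings update that raises the soft limit on open
# scroll contexts. 1000 is an arbitrary example value; the default is 500.
settings_body = {
    "persistent": {
        "search.max_open_scroll_context": 1000,
    }
}

print(json.dumps(settings_body))
```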
When building a query, Lucene distinguishes two cases: queries that need to produce a score and queries that only need to match. We cloned this mechanism in the QueryBuilders in order to be able to produce different queries based on whether they need to produce a score or not. However, the only case in es that requires this distinction is the BoolQueryBuilder, which sets a different minimum_should_match when a `bool` query is built in a filter context.
This behavior doesn't seem right because it makes the matching of `should` clauses different when the score is not required.
Closes #35293
The new limit on the number of open shards in a cluster may be
interpreted by users as a sizing recommendation, but it is not. This
clarifies in the documentation that this is a safety limit, not a
recommendation.
This commit documents how Index Lifecycle Management
interacts with snapshot/restore, and documents a workaround
for situations in which ILM should not immediately resume
managing an index after it is restored.
A number of tokenfilters can produce multiple tokens at the same position. This
is a problem when using token chains to parse synonym files, as the SynonymMap
requires that there are no stacked tokens in its input.
This commit ensures that when used to parse synonyms, these tokenfilters either produce
a single version of their input token, or that they throw an error when mappings are
generated. In indexes created in elasticsearch 6.x deprecation warnings are emitted in place
of the error.
* asciifolding and cjk_bigram produce only the folded or bigrammed token
* decompounders, synonyms and keyword_repeat are skipped
* n-grams, word-delimiter-filter, multiplexer, fingerprint and phonetic throw errors
Fixes #34298
Add a short extra sentence that explains that a missing query part in a search
request containing a "suggest" section will mean only suggestions are returned.
Closes #31640
Move classes under the same package to avoid internal classes being
exposed to the outside. Remove public visibility outside 3 classes:
EsDriver, EsDataSource and EsTypes.
The driver only has one package, namely org.elasticsearch.xpack.sql.jdbc
Use Es prefix for classes to ease name conflict and indicate their
destination
Fix #35437
* [Rollup] Add more diagnostic stats to job
To help debug future performance issues, this adds the
min/max/avg/count/total latencies (in milliseconds) for search
and bulk phase. This latency is the total service time including
transfer between nodes, not just the `took` time.
It also adds the count of search/bulk failures encountered during
runtime. This information is also in the log, but a runtime counter
will help expose problems faster
* review cleanup
* Remove dead ParseFields
`ScriptDocValues#getValues` was added for backwards compatibility but is no
longer needed. Scripts using the syntax `doc['foo'].values` when
`doc['foo']` is a list should be using `doc['foo']` instead.
Closes #22919
MultiSearchRequests issued through `_msearch` now validate all keys
in the metadata section. Previously, unknown keys were ignored;
now an exception is thrown.
Closes #35869
This commit adds documentation for authorization_realms
setting for the Kerberos realm and also corrects a typo in
existing documentation.
Co-authored-by: @A-Hall
This removes the option to run a cluster without enforcing the
cluster-wide shard limit, making strict enforcement the default and only
behavior. The limit can still be adjusted as desired using the cluster
settings API.
Add GREATEST(expr1, expr2, ... exprN) and LEAST(expr1, expr2, ... exprN)
functions which are in the family of CONDITIONAL functions.
Implementation follows PostgreSQL behaviour, so the functions return
`NULL` when all of their arguments evaluate to `NULL`.
Renamed `CoalescePipe` and `CoalesceProcessor` to `ConditionalPipe` and
`ConditionalProcessor` respectively, to be able to reuse them for
`Greatest` and `Least` evaluations. To achieve that `ConditionalOperation`
has been added to differentiate between the functionalities at execution
time.
Closes: #35878
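The NULL handling can be sketched in Python, with `None` standing in for `NULL`. The message above only states that the result is `NULL` when every argument is `NULL`; skipping `NULL` arguments otherwise is PostgreSQL's documented behaviour and is assumed here to carry over. The function names are illustrative, not the actual implementation.

```python
def greatest(*args):
    """Largest non-null argument, or None when all arguments are None."""
    values = [a for a in args if a is not None]
    return max(values) if values else None


def least(*args):
    """Smallest non-null argument, or None when all arguments are None."""
    values = [a for a in args if a is not None]
    return min(values) if values else None


print(greatest(1, None, 3))
print(least(None, None))
```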
This operator handles nulls in a different way than the normal `=`.
If one of the operands is `null` and the other is not, it returns `false`.
If both operands are `null` it returns `true`. Therefore, contrary to
`=`, which returns `null` if at least one of the operands is `null`, this one
never returns `null` as a result.
Closes: #35871
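The difference between the two comparisons described above can be written out as a small truth-table sketch in Python, with `None` standing in for `null`. The function names are hypothetical and only mirror the rules stated in this message.

```python
def sql_equals(a, b):
    """Regular `=`: the result is null (None) if either operand is null."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_equals(a, b):
    """Null-safe comparison: never returns null."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b


# Compare both operators over the interesting operand combinations.
for left, right in [(1, 1), (1, 2), (1, None), (None, None)]:
    print(left, right, sql_equals(left, right), null_safe_equals(left, right))
```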
This endpoint was not previously documented as it was not
particularly useful to end users. However, since the HLRC
will support the endpoint we need some documentation to
link to.
The purpose of the endpoint is to provide defaults and
limits used by ML. These are needed to fully understand
configurations that have missing values because the missing
value means the default should be used.
Relates #35777
* Forbid negative scores in function_score query
- Throw an exception when scores are negative in field_value_factor
function
- Throw an exception when scores are negative in script_score
function
Relates to #33309
This commit moves the documentation and examples for the `index_prefixes`
option on text fields to its own file, to bring it in line with other mapping
parameters, and expands a bit on both.
Recent Docker for Mac releases[1] have a different path to the tty for
accessing the console of the xhyve vm, required for altering the
`vm.max_map_count` sysctl.
Update instructions on how to enter the xhyve vm for altering the
`vm.max_map_count` sysctl setting on Docker for Mac.
Closes #34817
[1]
https://forums.docker.com/t/is-it-possible-to-ssh-to-the-xhyve-machine/17426/13
The ICU plugin provides the building blocks of an analysis chain, but doesn't actually have a prebuilt analyzer. It would be better for users if there was a simple analyzer that they could use out of the box, and also something we can point to from the CJK Analyzer docs as a superior alternative.
Relates to #34285
This commit adds a rest endpoint for freezing and unfreezing an index.
Among other cleanups mainly fixing an issue accessing package private APIs
from a plugin that got caught by integration tests this change also adds
documentation for frozen indices.
Note: frozen indices are marked as `beta` and available as a basic feature.
Relates to #34352
This adds an example to drive home the semantics of `min_age` and explain
how actions from one phase must complete before `min_age` is even tested.
Closes #34020.
RolloverStep previously had a name of "attempt_rollover", which was
inconsistent with all other step names due to its use of an underscore
instead of a dash.
* Deprecate types in count requests.
* Move RestCountAction to the 'search' package.
* Deprecate types in multi search requests.
* Add tests for types deprecation in the _search endpoint.
RolloverAction will now periodically check the rollover conditions using
the Rollover API with the dry_run option as an AsyncWaitStep, then run
the rollover itself by calling the Rollover API with no conditions,
which will always roll over, as an AsyncActionStep. This will resolve
race condition issues in policies using RolloverAction.
* ML: Adding missing datacheck to datafeedjob
* Adding client side and docs
* Making adjustments to validations
* Making values default to on, having more sensible limits
* Intermittent commit, still need to figure out interval
* Adjusting delayed data check interval
* updating docs
* Making parameter Boolean, so it is nullable
* bumping bwc to 7 before backport
* changing to version current
* moving delayed data check config its own object
* Separation of duties for delayed data detection
* fixing checkstyles
* fixing checkstyles
* Adjusting default behavior so that null windows are allowed
* Mentioning the default value
* Fixing comments, syncing up validations
The docs are not resilient to timing issues where the ILM metadata is not set on newly
created indices, so we shouldn't be so strict on the returned response
The documentation of `search_after` recommends to use the `_id`
field as a tiebreaker for the sort without warning against
the additional memory required. This change updates the recommendation
to use a copy of the `_id` field with doc_values enabled.
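A sketch of what the recommended request shape could look like when paging with `search_after`. The field names `date` and `tie_breaker_id` are hypothetical; the second is assumed to be a field with doc_values enabled that holds a copy of each document's `_id`, populated by the indexing client.

```python
from typing import Optional, Sequence


def next_page_body(last_sort_values: Optional[Sequence] = None, size: int = 10) -> dict:
    """Build a search body that pages with `search_after`.

    The sort ends with a doc_values-backed tiebreaker field instead of `_id`.
    """
    body = {
        "size": size,
        "sort": [{"date": "asc"}, {"tie_breaker_id": "asc"}],
    }
    if last_sort_values is not None:
        # Sort values of the last hit on the previous page.
        body["search_after"] = list(last_sort_values)
    return body


print(next_page_body())
print(next_page_body([1463538857, "654323"]))
```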
This adds a `wait_for_completion` flag which allows the user to block
the Stop API until the task has actually moved to a stopped state,
instead of returning immediately. If the flag is set, a `timeout` parameter
can be specified to determine how long (at max) to block the API
call. If unspecified, the timeout is 30s.
If the timeout is exceeded before the job moves to STOPPED, a
timeout exception is thrown. Note: this is just signifying that the API
call itself timed out. The job will remain in STOPPING and eventually
flip over to STOPPED in the background.
If the user asks the API to block, we move over to the generic
threadpool so that we don't hold up a networking thread.
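The blocking semantics described above can be illustrated with a generic polling loop. This is a sketch of the behaviour only, not of the server-side implementation: `get_state` is a hypothetical callable returning the job state, and the polling interval is an assumption.

```python
import time


def wait_until_stopped(get_state, timeout=30.0, poll_interval=0.1,
                       clock=time.monotonic, sleep=time.sleep):
    """Block until `get_state()` returns "STOPPED" or `timeout` seconds pass.

    Raises TimeoutError if the deadline is reached first. As in the
    description above, a timeout only means the wait gave up; the job may
    still reach STOPPED later in the background.
    """
    deadline = clock() + timeout
    while True:
        if get_state() == "STOPPED":
            return True
        if clock() >= deadline:
            raise TimeoutError("job did not reach STOPPED before the timeout")
        sleep(poll_interval)


# Example with a fake job that stops on the third check.
states = iter(["STOPPING", "STOPPING", "STOPPED"])
print(wait_until_stopped(lambda: next(states), timeout=1.0, poll_interval=0.0))
```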
* [ILM] Add documentation for error handling in ILM
This adds some initial documentation for error handling and retrying failed
steps for index lifecycle management
We changed the way realm settings are defined, and this affects custom
realms in SecurityExtensions. This change adds those details to the
breaking changes docs.
Relates: #30241
* [DOCS] ILM API Ref edits
* [DOCS] Fixed endpoint for DELETE policy.
* [DOCS] Removed comparison to setting index.lifecycle.name to null.
* [DOCS] Fixed xrefs to explain API.
Today our OS information returned in node stats only returns a
high-level name of the OS (e.g., "Linux"). Yet, for some uses this is
too high-level and knowing at a finer level of granularity the
underlying OS can be useful. This commit extracts the pretty name on
Linux from /etc/os-release. This pretty name usually includes the Linux
vendor and the Linux vendor version number (e.g., Fedora 28).
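As a rough illustration of the extraction, the pretty name can be read from
the `PRETTY_NAME` key of the os-release content. The helper name
`parse_pretty_name` and the sample text are made up for this sketch; the real
change is implemented in Java and may handle edge cases differently.

```python
import shlex


def parse_pretty_name(os_release_text):
    """Return the PRETTY_NAME value from os-release content, or None."""
    for line in os_release_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key == "PRETTY_NAME":
            # Values may be quoted shell-style; shlex removes the quotes.
            parts = shlex.split(value)
            return parts[0] if parts else ""
    return None


sample = (
    'NAME=Fedora\n'
    'VERSION="28 (Workstation Edition)"\n'
    'PRETTY_NAME="Fedora 28 (Workstation Edition)"\n'
)
print(parse_pretty_name(sample))
```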
In #26541 we introduced a hard limit of 1024 on the number of fields a query can
be expanded to. Instead of using a hard limit, we should make this
configurable. This change removes the hard limit check and uses the existing
`max_clause_count` setting instead.
Closes #34778
If the underlying mount point for the JNA temporary directory is mounted
noexec on Linux, then the JVM will not be able to map the native code in
as executable. This will prevent JNA from executing and will prevent
Elasticsearch from being able to execute some functions that rely on
native code (e.g., memory locking, and installing system call
filters). We do not want to get into the business of catching exceptions
and parsing messages towards this because these exception messages can
change on us. We also do not want to jump through a lot of hoops to
check the underlying mount point for noexec. Instead, we will rely on
documentation to address this problem. This commit adds a note to the important
system configuration section of the docs stating that the JNA temporary
directory must not be on a mount point with the noexec mount option.
This commit uses the index settings version so that a follower can
replicate index settings changes as needed from the leader.
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
Sometimes users are confused about whether they can use the Convert Processor
for changing an existing field's type to another type even if the existing one is already
ingested. This confusion comes from the first line of the description. This change rewords that line and also
adds some detail to the code snippet.
With this change, `Version` no longer carries information about the qualifier, but
we still need a way to show the "display version" that does have both
qualifier and snapshot. This is now stored by the build and read from `META-INF`.
We've decided that the bulk, delete, get, index, update, and search APIs should not
contain this request parameter, and we will instead accept both typed and typeless calls.
The remove-ilm-from-index API was using the DELETE http method
to signify that something is being removed. Although metadata
about ILM for the index is being deleted, no entity/resource
is being deleted during this operation. POST is more in line with
what this API is actually doing: it is modifying the metadata for
an index. As part of this change, `remove` is also appended to the path
to be more explicit about its actions.
This moves all Realm settings to an Affix definition.
However, because different realm types define different settings
(potentially conflicting settings) this requires that the realm type
become part of the setting key.
Thus, we now need to define realm settings as:
    xpack.security.authc.realms:
      file.file1:
        order: 0
      native.native1:
        order: 1
- This is a breaking change to realm config
- This is also a breaking change to custom security realms (SecurityExtension)
We have an example in `reindex`'s docs about copying from many indices
at once. It doesn't work at the moment because we only allow a single
type per index. We didn't notice it in the docs tests because those
tests didn't copy any documents. This change:
1. Adds documents to the docs tests to fully exercise the snippet.
2. Fixes the example by moving all copied documents to the same type.
3. Moves the note about id collisions and expands on it because it is
even more likely than before.
Closes #35150
This commit removes the Joda time usage from ILM and the HLRC components of ILM.
It also fixes an issue where using the `?human=true` flag could have caused the
parser not to work. These millisecond fields now follow the standard we use
elsewhere in the code, with additional fields added iff the `human` flag is
specified.
This is a breaking change for ILM, but since ILM has not yet been released, no
compatibility shim is needed.
This changes the current script.max_size_in_bytes to be dynamic so it can be
set through the cluster settings API. This setting is also applied to inline scripts
in the compile method of ScriptService to prevent excessively long inline
scripts from being compiled. The script length limit is removed from Painless as
this is no longer necessary with the protection in compile.
With this commit we differentiate between permanent circuit breaking
exceptions (which require intervention from an operator and should not
be automatically retried) and transient ones (which may heal themselves
eventually and should be retried). Furthermore, the parent circuit
breaker will categorize a circuit breaking exception as either transient
or permanent based on the categorization of memory usage of its child
circuit breakers.
Closes #31986
Relates #34460
When we connect to remote clusters, there may be a few more routers/firewalls in-between compared to when we connect to nodes in the same cluster. We've experienced cases where firewalls drop connections completely and keep-alives seem not to be enough, or they are not properly configured. With this commit we allow enabling application-level pings specifically from CCS nodes to the selected remote nodes through the new setting `cluster.remote.${clusterAlias}.transport.ping_schedule`. The new setting is similar to `transport.ping_schedule` but it does not affect intra-cluster communication: pings are only sent to a specific remote cluster when specifically enabled, as they are disabled by default.
Relates to #34405
This PR renames the CRUD APIS for ILM
GET _ilm/<policy>, _ilm -> _ilm/policy/<policy>, _ilm/policy
PUT _ilm/<policy> -> _ilm/policy/<policy>
DELETE _ilm/<policy> -> _ilm/policy/<policy>
Closes #34929.
The `random_score` function produces values between 0 (inclusive) and 1
(exclusive), and we documented it with fancy mathematical range notation. It
is so fancy I thought it was a typo. This changes the documentation to
use words.
Relates to #35084
This changes the RollupSearch endpoint to proactively resolve index
patterns. If the index pattern(s) match more than one rollup index,
an exception is thrown as before. But if the pattern only matches one
rollup index, execution is allowed to continue (unlike before where
it would assume all patterns were for raw data).
This also allows the search endpoint to resolve aliases that point to
a rollup index.
Also tweaks the documentation to make this clear.
Closes #34828
* Remove a tip about ignore_above that only makes sense with multiple types.
* Remove a line from the percolator documentation that refers to multiple types.
This commit adds a new single value metric aggregation that calculates
the statistic called median absolute deviation, which is a measure of
variability that works on more types of data than standard deviation.
Our calculation of MAD is approximated using t-digests. In the collect
phase, we collect each value visited into a t-digest. In the reduce
phase, we merge all value t-digests, then create a t-digest of
deviations using the first t-digest's median and centroids.
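For reference, the statistic being approximated can be computed exactly on a
small in-memory sample. The sketch below is the textbook definition (median of
the absolute deviations from the median), not the t-digest approximation used
by the aggregation; the function name is chosen for this example only.

```python
from statistics import median


def median_absolute_deviation(values):
    """Exact MAD: median of |x - median(values)| over all x."""
    values = list(values)
    if not values:
        raise ValueError("MAD is undefined for an empty sample")
    center = median(values)
    return median(abs(v - center) for v in values)


print(median_absolute_deviation([1, 1, 2, 2, 4, 6, 9]))
```

With the sample above the median is 2 and the absolute deviations are
0, 0, 1, 1, 2, 4, 7, so the result is 1.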
When combine_script and reduce_script were made into required
parameters for Scripted Metric aggregations in #33452, the docs were
not updated to reflect that. This marks those parameters as required
in the documentation.
Deprecates `_source_include` and `_source_exclude` url parameters
in favor of `_source_includes` and `_source_excludes` because those
are consistent with the rest of Elasticsearch's APIs.
Relates to #22792
This commit fixes two issues with the CCR API specification:
- remove the CCR stats endpoint, it is not currently implemented
- fix the documentation links
The file structure finder endpoint can find the NDJSON
(newline-delimited JSON) file format, but called it
`json`. This change renames the `format` for this file
structure to `ndjson`, which is more precise and will
hopefully avoid confusion.
* Changed the auto follow stats to also include follow stats.
* Renamed the auto follow stats api to stats api and changed its url path
from `/_ccr/auto_follow/stats` to `/_ccr/stats`.
* Removed `/_ccr/stats` url path for the follow stats api, which makes
the index parameter a required parameter.
* Fixed docs.
This commit is our first introduction to cross-cluster replication
docs. In this commit, we introduce the cross-cluster replication API
docs. We also add skeleton docs for additional content that will be added
in a series of follow-up commits.
Documents the new structured logfile format for auditing
that was introduced by #31931. Most changes herein
are for 6.x. In 7.0 the deprecated format is gone and a
follow-up PR is in order.
This change adds a section about the global search setting
`indices.query.bool.max_clause_count` that limits the number of boolean clauses
allowed in a Lucene BooleanQuery.
Closes #19858
In a future major version, we will be introducing a soft limit on the
number of shards in a cluster based on the number of nodes in the
cluster. This limit will be configurable, and checked on operations
which create or open shards and issue a warning if the operation would
take the cluster over the limit.
There is an option to enable strict enforcement of the limit, which
turns the warnings into errors. In a future release, the option will be
removed and strict enforcement will be the default (and only) behavior.
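The warn-versus-error behaviour can be sketched as follows. Everything here is
a hypothetical illustration: the function name, its parameters, and in
particular the assumption that the cluster-wide limit is the node count
multiplied by a per-node maximum are not taken from the commit.

```python
import warnings


def check_shard_limit(current_shards, new_shards, node_count,
                      max_shards_per_node, strict=False):
    """Warn (or raise, when strict) if an operation would exceed the limit."""
    # Assumption for this sketch: limit scales linearly with the node count.
    limit = node_count * max_shards_per_node
    total = current_shards + new_shards
    if total <= limit:
        return
    message = "operation would bring the cluster to %d shards, limit is %d" % (
        total, limit)
    if strict:
        # Strict enforcement turns the warning into an error.
        raise ValueError(message)
    warnings.warn(message)
```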
- Restrict visibility of Aggregators and Factories
- Move PipelineAggregatorBuilders up a level so it is consistent with
AggregatorBuilders
- Checkstyle line length fixes for a few classes
- Minor odds/ends (swapping to method references, formatting, etc)
We should delete a job by directly talking to the allocated
task and telling it to shutdown. Today we shut down a job
via the persistent task framework. This is not ideal because,
while the job has been removed from the persistent task
CS, the allocated task continues to live until it gets the
shutdown message.
This means a user can delete a job, immediately delete
the rollup index, and then see new documents appear in
the just-deleted index. This happens because the indexer
in the allocated task is still running and indexes a few
more documents before getting the shutdown command.
In this PR, the transport action is changed to a TransportTasksAction,
and we invoke onCancelled() directly on the matching job.
The race condition still exists after this PR (albeit less likely),
but this was a precursor to fixing the issue and a self-contained
chunk of code. A second PR will followup to fix the race itself.
Extend querying support on multiple indices from being strictly
identical to being just compatible.
Use FieldCapabilities API (extended through #33803) for mapping merging.
Closes #31837, #31611
Implement the functionality to translate the
`field IN (value1, value2,...)` expressions to proper Lucene queries
or painless script or local processors depending on the use case.
The `IN` expression can be used in SELECT, WHERE and HAVING clauses.
Closes: #32955
`CONVERT` works exactly like cast with slightly different syntax:
`CONVERT(<value>, <data_type>)` as opposed to `CAST(<value> AS <data_type>)`
Moreover it supports the format of the MS-SQL data types `SQL_<type>`,
e.g.: `SQL_INTEGER`
Closes: #34513
* Replace custom type names with _doc in REST examples.
* Avoid using two mapping types in the percolator docs.
* Rename doc -> _doc in the main repository README.
* Also replace some custom type names in the HLRC docs.
This commit switches to using a trial license in the docs tests that run
on the default distribution. This is needed so that docs tests can be
executed against non-basic features.
With remote clusters taking on a larger role, we have to make the
infrastructure more generic than being tied to cross-cluster search
(CCS). We want to refer to the remote clusters configuration in the
cross-cluster replication (CCR) docs. Yet, these docs are still tied to
CCS. This commit extracts the remote clusters docs from CCS (with some
wording changes to make them more general) so that we can refer to them
in the CCR docs.
When an envelope that crosses the dateline is specified as part of a
geo_shape query, it shouldn't have its left and right points
flipped when parsed.
Fixes #34418
Make SQL aware of missing and/or unmapped fields treating them as NULL
Make _all_ functions and operators null-safe aware, including when used
in filtering or sorting contexts
Add missing and null-safe doc value extractor
Modify dataset to have null fields spread around (in groups of 10)
Enforce missing last and unmapped_type inside sorting
Consolidate Predicate templating and declaration
Add support for Like/RLike in scripting
Generalize NULLS LAST/FIRST
Introduce early schema declaration for CSV spec tests: to keep the doc
snippets in place (introduce schema:: prefix for declaration)
upfront.
Fix #32079
The `term` and `phrase` suggesters have different options to filter candidates
based on their frequencies. The `popular` mode for instance filters candidate
terms that occur in fewer docs than the original term. However when we compute this threshold
we use the total term frequency of a term instead of the document frequency. This is not in line
with the actual filtering which is always based on the document frequency. This change fixes
this discrepancy and clarifies the meaning of the different frequencies in use in the suggesters.
It also ensures that the threshold doesn't overflow the maximum allowed value (Integer.MAX_VALUE).
Closes #34282
Add example for selectively clearing just the request, query or fielddata cache
and for selectively clearing the cache for specific fields.
Closes #34287
* Adding new xpack.ml.max_lazy_ml_nodes setting to docs
* Fixing docs, making it clearer what the setting does
* Adding note about external process need
We'd disabled them because we didn't have a way to clean up after each
test. I implemented #34342 which adds the clean ups so now we can
re-enable the tests.
In the `setup` sections we have to use `raw` requests instead of
`x-pack` requests because we don't have the json config for x-pack.
Closes #33319
This commit moves the definition of domainSplit into java and exposes it
as a painless whitelist extension. The method also no longer needs
params, and a version which ignores params is added and deprecated.
Tweak the upgrade instructions for moving from pre-6.3-with-x-pack to
post-6.3-default distribution. Specifically, you have to remove the
x-pack plugin before upgrading because 6.4 doesn't understand how to
remove it.
Relates to #34307
This change disallows negative query boosts. Negative scores are not allowed in Lucene 8 so
it is easier to just disallow negative boosts entirely. We should also deprecate negative boosts
in 6.x in order to ensure that users are aware of this when they upgrade to ES 7.
Relates #33309
The ingest pipeline that is produced is very simple. It
contains a grok processor if the format is semi-structured
text, a date processor if the format contains a timestamp,
and a remove processor if required to remove the interim
timestamp field parsed out of semi-structured text.
Eventually the UI should offer the option to customize the
pipeline with additional processors to perform other data
preparation steps before ingesting data to an index.
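The shape of the generated pipeline can be sketched like this. The helper and
its parameters are hypothetical, and the `message` source field for grok is an
assumption for the example; only the choice of grok, date and remove
processors follows the description above.

```python
def build_ingest_pipeline(grok_pattern=None, timestamp_field=None,
                          timestamp_formats=(), remove_interim_timestamp=False):
    """Assemble a minimal ingest pipeline definition as a plain dict."""
    processors = []
    if grok_pattern is not None:
        # Semi-structured text: extract fields with a grok processor.
        processors.append(
            {"grok": {"field": "message", "patterns": [grok_pattern]}})
    if timestamp_field is not None:
        # A timestamp was detected: parse it with a date processor.
        processors.append(
            {"date": {"field": timestamp_field,
                      "formats": list(timestamp_formats)}})
        if remove_interim_timestamp:
            # Drop the interim field once the date has been parsed.
            processors.append({"remove": {"field": timestamp_field}})
    return {"description": "sketch of a generated pipeline",
            "processors": processors}


pipeline = build_ingest_pipeline(
    grok_pattern="%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:rest}",
    timestamp_field="timestamp",
    timestamp_formats=["ISO8601"],
    remove_interim_timestamp=True,
)
```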
* New OCTET_LENGTH function
* Changed the way the FunctionRegistry stores functions, considering the alphabetic ordering by name
* Added documentation for the RANDOM function
The "lookupUser" method on a realm facilitates the "run-as" and
"authorization_realms" features.
This commit allows a realm to be used for "lookup only", in which
case the "authenticate" method (and associated token methods) are
disabled.
It does this through the introduction of a new
"authentication.enabled" setting, which defaults to true.
Building automatons can be costly. For the most part we cache things
that use automatons so the cost is limited.
However:
- We don't (currently) do that everywhere (e.g. we don't cache role
mappings)
- It is sometimes necessary to clear some of those caches which can
cause significant CPU overhead and processing delays.
This commit introduces a new cache in the Automatons class to avoid
unnecessarily recomputing automatons.
This changes the delete job API by adding
the choice to delete a job asynchronously.
The commit adds a `wait_for_completion` parameter
to the delete job request. When set to `false`,
the action returns immediately and the response
contains the task id.
This also changes the handling of subsequent
delete requests for a job that is already being
deleted. It now uses the task framework to check
if the job is being deleted instead of the cluster
state. This is beneficial because it will keep
working once the job configs are moved out of the
cluster state and into an index. Also, force delete
requests that are waiting for the job to be deleted
will not proceed with the deletion if the first task
fails. This will prevent overloading the cluster. Instead,
the failure is communicated better via notifications
so that the user may retry.
Finally, this makes the `deleting` property of the job
visible (also it was renamed from `deleted`). This allows
a client to render a deleting job differently.
Closes #32836
The `status` part of the tasks API reflects the internal status of a
running task. In general, we do not make backwards breaking changes to
the `status` but because it is internal we reserve the right to do so. I
suspect we will very rarely exercise that right but it is important
that we have it so we're not boxed into any particular implementation
for a request.
In some sense this is policy making by documentation change. In another
it is clarification of the way we've always thought of this field.
I also reflect the documentation change into the Javadoc in a few
places. There I acknowledge Kibana's "special relationship" with
Elasticsearch. Kibana parses `_reindex`'s `status` field and, because
we're friends with those folks, we should talk to them before we make
backwards breaking changes to it. We *want* to be friends with everyone
but there is only so much time in the day and we don't *want* to make
backwards breaking changes to `status` at all anyway. So we hope that
breaking changes documentation should be enough for other folks.
Relates to #34245.
We generate tests from our documentation, including assertions about the
responses returned by a particular API. But sometimes we *can't* assert
that the response is correct because of some deficiency in our tooling.
Previously we marked the response `// NOTCONSOLE` to skip it, but this
is kind of odd because `// NOTCONSOLE` is really to mark snippets that
are json but aren't requests or responses. This introduces a new
construct to skip response assertions:
```
// TESTRESPONSE[skip:reason we skipped this]
```
This enables Elasticsearch to use the JVM-wide configured
PKCS#11 token as a keystore or a truststore for its TLS configuration.
The JVM is assumed to be configured accordingly with the appropriate
Security Provider implementation that supports PKCS#11 tokens.
For the PKCS#11 token to be used as a keystore or a truststore for an
SSLConfiguration, the .keystore.type or .truststore.type must be
explicitly set to pkcs11 in the configuration.
The fact that the PKCS#11 token configuration is JVM wide implies that
there is only one available keystore and truststore that can be used by TLS
configurations in Elasticsearch.
The PIN for the PKCS#11 token can be set as a truststore parameter in
Elasticsearch or as a JVM parameter ( -Djavax.net.ssl.trustStorePassword).
The basic goal of enabling PKCS#11 token support is to allow PKCS#11-NSS in
FIPS mode to be used as a FIPS 140-2 enabled Security Provider.
* Make text message not required in constructor for slack
* Remove unnecessary comments in test file
* Throw exception when reduce or combine is not provided; update tests
* Update integration tests for scripted metrics to always include reduce and combine
* Remove some old changes from previous branches
* Rearrange script presence checks to be earlier in build
* Change null check order in script builder for aggregated metrics; correct test scripts in IT
* Add breaking change details to PR
As user-defined cluster metadata is accessible to anyone with access to
get the cluster settings, stored in the logs, and likely to be tracked
by monitoring solutions, it is useful to clarify in the documentation
that it should not be used to store secret information.
Previously, parsing an arithmetic expression with `*` and no spaces,
e.g.: `2*i` threw a parsing exception as the grammar rule for
tableIdentifier was clashing with the rule for arithmetic operator `*`.
This issue comes already in the lexer and the left part of the
expression (in our example `2*`) was recognised as a
TABLE_IDENTIFIER token.
The solution adopted is to allow the `*` wildcard in the table name
only if it's surrounded with double quotes, e.g.: `"my*index"`
Closes: #33957
This change fixes a potential deadlock problem in the unit
test introduced in #34117.
It also removes a piece of debug code and corrects a docs
formatting problem that were both added in that same PR.
#32281 adds elasticsearch-shard to provide bwc version of elasticsearch-translog for 6.x; have to remove elasticsearch-translog for 7.0
Relates to #31389
This can be used to restrict the amount of CPU a single
structure finder request can use.
The timeout is not implemented precisely, so requests
may run for slightly longer than the timeout before
aborting.
The default is 25 seconds, which is a little below
Kibana's default timeout of 30 seconds for calls to
Elasticsearch APIs.
Previously the timestamp_formats field in the response
from the find_file_structure endpoint contained Joda
timestamp formats. This change makes that clear by
renaming the field to joda_timestamp_formats, and also
adds a java_timestamp_formats field containing the
equivalent Java time format strings.
With this commit we remove a leftover in the docs about the `format`
field being updatable. This is not true since we removed support for
updates in #25285.
Closes #33986
Relates #25285
Relates #34006
* Changed the format of the String functions documentation page.
* Adopted the same format for Math functions, but completely changed the examples.
* Added missing documentation for Math functions.
Previously numeric values in the field_stats created by the
find_file_structure endpoint were always output with a
decimal point. This looked unfriendly and unnatural for
fields that clearly store integer values. This change
converts integer values to type Integer before output in
the file structure field stats.
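The effect on the output can be illustrated with a small helper that turns
whole-number floats into integers before serialization. This is a sketch of
the idea only; `tidy_number` is a made-up name and the real change is made in
the Java code that builds the field stats.

```python
import json


def tidy_number(value):
    """Return an int for whole-number floats, otherwise the value unchanged."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


stats = {"min_value": 1.0, "max_value": 250.0, "mean_value": 17.25}
tidied = {key: tidy_number(value) for key, value in stats.items()}
print(json.dumps(tidied))
```

Values with a fractional part, such as the mean in this sample, are left
untouched.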
* Added TRUNCATE function, modified ROUND to accept two parameters instead of one. Made the second parameter optional for both functions.
* Added documentation for both functions.
Changes the default of the `node.name` setting to the hostname of the
machine on which Elasticsearch is running. Previously it was the first 8
characters of the node id. This had the advantage of producing a unique
name even when the node name isn't configured but the disadvantage of
being unrecognizable and not being available until fairly late in the
startup process. Of particular interest is that it isn't available until
after logging is configured. This forces us to use a volatile read
whenever we add the node name to the log.
Using the hostname is available immediately on startup and is generally
recognizable but has the disadvantage of not being unique when run on
machines that don't set their hostname or when multiple elasticsearch
processes are run on the same host. I believe that, taken together, it
is better to default to the hostname.
1. Running multiple copies of Elasticsearch on the same node is a fairly
advanced feature. We do it all the time as part of the elasticsearch build
for testing but we make sure to set the node name then.
2. That the node.name defaults to some flavor of "localhost" on an
unconfigured box feels like it isn't going to come up too much in
production. I expect most production deployments to at least set the
hostname.
As a bonus, production deployments need no longer set the node name in
most cases. At least in my experience most folks set it to the hostname
anyway.
We currently special-case SynonymFilterFactory and SynonymGraphFilterFactory, which need to
know their predecessors in the analysis chain in order to correctly analyze their synonym lists. This
special-casing doesn't work with Referring filter factories, such as the Multiplexer or Conditional
filters. We also have a number of filters (eg the Multiplexer) that will break synonyms when they
appear before them in a chain, because they produce multiple tokens at the same position.
This commit adds two methods to the TokenFilterFactory interface.
* `getChainAwareTokenFilterFactory()` allows a filter factory to rewrite itself against its preceding
filter chain, or to resolve references to other filters. It replaces `ReferringFilterFactory` and
`CustomAnalyzerProvider.checkAndApplySynonymFilter`, and by default returns `this`.
* `getSynonymFilter()` defines whether or not a filter should be applied when building a synonym
list `Analyzer`. By default it returns `true`.
Fixes #33609
It is not obvious that a filesystem-level backup may capture an inconsistent
set of files that may fail on restore, or (worse) succeed having silently
discarded some data. This change spells this out, and reorganises the first page
or so of the snapshot/restore docs to make this warning fit more nicely.
The original statement "Runs a match_phrase query on each field and combines the _score from each field." for the phrase type is a bit misleading. The phrase type behaves like the best_fields type and does not combine the scores of each field.
In #33241 we moved the file-based discovery functionality to core
Elasticsearch, but preserved the `discovery-file` plugin, and support for the
existing location of the `unicast_hosts.txt` file, for BWC reasons. This commit
completes the removal of this plugin.
New plugin for annotated_text field type.
Largely a copy of `text` field type but adds ability to include markdown-like syntax in the text.
The “AnnotatedText” class parses text+markup and converts into plain text and AnnotationTokens.
The annotation token values are injected unchanged alongside the regular text tokens to provide a
form of additional indexed overlay useful in positional searches and highlighting.
Annotated_text fields do not support fielddata as we want to phase this out.
Also includes a new "annotated" highlighter type that retains annotations and merges in search
hits as additional annotation markup.
Closes #29467
* Implement xpack.monitoring.elasticsearch.collection.enabled setting
* Fixing line lengths
* Updating constructor calls in test
* Removing unused import
* Fixing line lengths in test classes
* Make monitoringService.isElasticsearchCollectionEnabled() return true for tests
* Remove wrong expectation
* Adding unit tests for new flag to be false
* Fixing line wrapping/indentation for better readability
* Adding docs
* Fixing logic in ClusterStatsCollector::shouldCollect
* Rebasing with master and resolving conflicts
* Simplifying implementation by gating scheduling
* Doc fixes / improvements
* Making methods package private
* Fixing wording
* Fixing method access
This commit switches the joda time backcompat in scripting to use
augmentation over ZonedDateTime. The augmentation methods provide
compatibility with the missing methods between joda's DateTime and
java's ZonedDateTime. Due to getDayOfWeek returning an enum in the java
API, ZonedDateTime is wrapped so that the method can return int like the
joda time does. The java time api version is renamed to
getDayOfWeekEnum, which will be kept through 7.x for compatibility while
users switch back to getDayOfWeek once joda compatibility is removed.
This change removes the wrapping of the created field in the put user
response. The created field was added as a top level field in #32332,
while also still being wrapped within the `user` object of the
response. Since the value is available in both formats in 6.x, we can
remove the wrapped version for 7.0.
The remote cluster settings search.remote.* have been renamed to
cluster.remote.* and are automatically upgraded in the cluster state on
gateway recovery, and on put. This commit adds a note to the migration
docs for these changes.
This change adds a `_source` only snapshot repository that allows to wrap
any existing repository as a _backend_ to snapshot only the `_source` part
including live docs markers. Snapshots taken with the `source` repository
won't include any indices, doc-values or points. The snapshot will be reduced in size and
functionality such that it requires full re-indexing after it's successfully restored.
The restore process will copy the `_source` data locally and start a special shard and engine
to allow `match_all` scrolls and searches. Any other query, or get call will fail with an unsupported operation exception. The restored index is also marked as read-only.
This feature aims mainly for disaster recovery use-cases where snapshot size is
a concern or where time to restore is less of an issue.
**NOTE**: The snapshot produced by this repository is still a valid lucene index. This change doesn't allow for any longer retention policies which is out of scope for this change.
This change clarifies the documentation of the context completion suggester
regarding filtering and boosting with contexts.
Unlike the suggester v1, filtering on multiple contexts
works as a disjunction, a suggestion matches if it contains at least one of the provided
context values and boosting selects the maximum score among the matching contexts.
This commit also adapts an old test that was written for the v1 suggester and commented out
for version 2 because the behavior changed.
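The documented semantics can be sketched in a few lines. This is a simplified
model, not the suggester implementation: contexts are plain strings, the query
is a mapping from context value to boost, and the function name is invented
for the example.

```python
def match_and_boost(suggestion_contexts, query_contexts):
    """Return the boost to apply, or None if the suggestion is filtered out.

    suggestion_contexts: set of context values stored with the suggestion.
    query_contexts: dict mapping requested context value -> boost.
    """
    # Disjunction: one matching context value is enough to keep the suggestion.
    boosts = [boost for value, boost in query_contexts.items()
              if value in suggestion_contexts]
    # Boosting picks the maximum among the matching contexts.
    return max(boosts) if boosts else None


print(match_and_boost({"cafe", "bar"}, {"cafe": 2, "bar": 5, "museum": 9}))
```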
This allows users to filter out tokens from a TokenStream using painless scripts,
instead of having to write specialised Java code and packaging it up into a plugin.
The commit also refactors the AnalysisPredicateScript.Token class so that it wraps
and makes read-only an AttributeSource.
This change collapses all metrics aggregations classes into a single package `org.elasticsearch.aggregations.metrics`.
It also restricts the visibility of some classes (aggregators and factories) that should not be used outside of the package.
Relates #22868
Split function section into multiple chapters
Add String functions
Add (small) section on Conversion/Cast functions
Add missing aggregation functions
Enable documentation testing (was disabled by accident). While at it,
fix failing tests
Improve spec tests to allow multi-line queries (useful for docs)
Add ability to ignore a spec test (name should end with -Ignore)
The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly.
Some comments about the change:
* Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309
Closes #32899
With features like CCR building on the CCS infrastructure, the settings
prefix search.remote makes less sense as the namespace for these remote
cluster settings than does a more general namespace like
cluster.remote. This commit replaces these settings with cluster.remote
with a fallback to the deprecated settings search.remote.
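The fallback behaviour can be pictured with a small lookup helper. This is a sketch under assumptions: the flat dict of settings and the function name are invented for the example, and the real settings infrastructure is not shown.

```python
NEW_PREFIX = "cluster.remote."
OLD_PREFIX = "search.remote."  # deprecated namespace


def get_remote_setting(settings, suffix):
    """Look up a remote cluster setting, falling back to the deprecated prefix.

    Returns (value, used_deprecated_key). `settings` is a plain dict here,
    which is an assumption made for illustration.
    """
    new_key = NEW_PREFIX + suffix
    if new_key in settings:
        return settings[new_key], False
    old_key = OLD_PREFIX + suffix
    if old_key in settings:
        return settings[old_key], True  # caller would emit a deprecation warning
    return None, False
```

When both keys are present, the sketch prefers the new `cluster.remote` key.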
The maximum number of fields per index is limited to 1000 by default by the
`index.mapping.total_fields.limit` setting to prevent accidental mapping
explosions due to too many fields. Currently all metadata fields also count
towards this limit, which can lead to some confusion when using lower limits.
It is not obvious for users that they cannot actually add as many fields as
are specified by the limit in this case.
This change takes the number of metadata fields out of the field count that we
check against the field limit. It also adds tests that check that we can add
fields up to the specified limit, but throw an exception for any additional field added.
Closes #24096
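The new counting rule can be sketched as below. The set of metadata field names is an illustrative subset and the error message is made up; only the rule itself (metadata fields do not count towards the limit) comes from the change description.

```python
# Illustrative subset of metadata fields; the real list is longer.
METADATA_FIELDS = {"_id", "_index", "_source", "_routing"}


def check_total_fields_limit(field_names, limit=1000):
    """Count only non-metadata fields and raise if the limit is exceeded."""
    user_fields = [name for name in field_names if name not in METADATA_FIELDS]
    if len(user_fields) > limit:
        # Hypothetical message, not the exact text Elasticsearch produces.
        raise ValueError(f"Limit of total fields [{limit}] has been exceeded")
    return len(user_fields)
```

With `limit=2`, a mapping containing `_id`, `_source`, `a` and `b` passes, while adding a third user field raises.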
This allows tokenfilters to be applied selectively, depending on the status of the current token in the tokenstream. The filter takes a scripted predicate, and only applies its subfilter when the predicate returns true.
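A minimal model of the conditional behaviour, using plain strings as tokens: the subfilter runs only on tokens for which the predicate holds, and all other tokens pass through unchanged. The function and argument names are hypothetical.

```python
def conditional_filter(tokens, predicate, subfilter):
    """Apply `subfilter` only to tokens for which `predicate` returns True."""
    for token in tokens:
        yield subfilter(token) if predicate(token) else token
```

For instance, `list(conditional_filter(["Quick", "FOX"], lambda t: len(t) > 3, str.lower))` lowercases only the longer token.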
Adds a place for users to store cluster-wide data they wish to associate
with the cluster via the Cluster Settings API. This is strictly for
user-defined data; Elasticsearch makes no other use of these
settings.
Extend SHOW TABLES, DESCRIBE and SHOW COLUMNS to support table
identifiers, not just SQL LIKE patterns.
This allows both Elasticsearch-style multi-index patterns and SQL LIKE.
To disambiguate between the two (as the " vs ' can be easy to miss),
the grammar now requires the LIKE keyword as a prefix for all LIKE-like
patterns.
Also added some docs comparing the two types of patterns.
Fix #33294
Global search timeouts and timeouts specified in the search request body use the
same internal mechanism as search cancellation. Therefore the same caveats
apply, mostly around the responsiveness of the timeout, which by default is only
checked by a running search on segment boundaries.
Closes #31263
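The responsiveness caveat can be illustrated with a toy loop in which the deadline is consulted only between segments, so a long-running segment can overshoot the timeout. The function, the list-of-lists segment model and the injectable clock are assumptions made for the sketch.

```python
def search_with_timeout(segments, deadline, clock):
    """Collect hits segment by segment, checking the deadline only at boundaries."""
    hits = []
    timed_out = False
    for segment in segments:
        if clock() >= deadline:  # the only place the timeout is checked
            timed_out = True
            break
        hits.extend(segment)  # the whole segment is processed without further checks
    return hits, timed_out
```

Hits collected before the deadline was noticed are still returned, together with a flag saying the search timed out.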
This change merges two sections in the "Tune for search speed" documentation
that recommend mapping numeric identifiers as keywords. Both sections contain
mostly the same advice, so they can be merged.
Closes #32733
This commit adds support for early termination of the collection of a leaf
in the aggregation framework. This change introduces a MultiBucketCollector which
handles CollectionTerminatedException exactly like the Lucene MultiCollector.
Any aggregator can now throw a CollectionTerminatedException without stopping
the collection of a sibling aggregator. This is useful for aggregators that
can infer their result without visiting all documents (e.g.: a min/max aggregation on a match_all query).
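The pattern can be modelled in a few lines for a single leaf: a wrapper drops any collector that signals termination and keeps feeding documents to the rest. The class names below are stand-ins for illustration and do not mirror the Java classes' actual signatures.

```python
class CollectionTerminated(Exception):
    """Signals that a collector needs no more documents from the current leaf."""


class FirstValueCollector:
    """Records the first document, then terminates (like a min over sorted data)."""
    def __init__(self):
        self.value = None

    def collect(self, doc):
        self.value = doc
        raise CollectionTerminated()


class CountCollector:
    """Needs to see every document."""
    def __init__(self):
        self.count = 0

    def collect(self, doc):
        self.count += 1


class MultiLeafCollector:
    """Feeds each document to the collectors that are still active on this leaf."""
    def __init__(self, collectors):
        self.active = list(collectors)

    def collect(self, doc):
        still_active = []
        for collector in self.active:
            try:
                collector.collect(doc)
                still_active.append(collector)
            except CollectionTerminated:
                pass  # drop only this collector; siblings keep collecting
        self.active = still_active
        if not self.active:
            raise CollectionTerminated()  # every collector is done with this leaf
```

Only when every wrapped collector has terminated does the wrapper itself terminate, which lets the caller skip the rest of the leaf.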
In #29623 we added `Request` object flavored requests to the low level
REST client and in #30315 we deprecated the old `performRequest`s. In a
long series of PRs I've changed all of the old style requests. This
drops the deprecated methods and will be released with 7.0.
* master:
Mute test watcher usage stats output
[Rollup] Fix FullClusterRestart test
Adjust soft-deletes version after backport into 6.5
completely drop `index.shard.check_on_startup: fix` for 7.0 (#33194)
Fix AwaitsFix issue number
Mute SmokeTestWatcherWithSecurityIT tests
drop `index.shard.check_on_startup: fix` (#32279)
tracked at
[DOCS] Moves ml folder from x-pack/docs to docs (#33248)
[DOCS] Move rollup APIs to docs (#31450)
[DOCS] Rename X-Pack Commands section (#33005)
TEST: Disable soft-deletes in ParentChildTestCase
Fixes SecurityIntegTestCase so it always adds at least one alias (#33296)
Fix pom for build-tools (#33300)
Lazy evaluate java9home (#33301)
SQL: test coverage for JdbcResultSet (#32813)
Work around to be able to generate eclipse projects (#33295)
Highlight that index_phrases only works if no slop is used (#33303)
Different handling for security specific errors in the CLI. Fix for https://github.com/elastic/elasticsearch/issues/33230 (#33255)
[ML] Refactor delimited file structure detection (#33233)
SQL: Support multi-index format as table identifier (#33278)
MINOR: Remove Dead Code from PathTrie (#33280)
Enable forbiddenapis server java9 (#33245)
This brings the name in line with everywhere else and means that the name
seen on the feature usage and `GET _xpack` APIs will match the plugin
name.
This change also removes `IndexLifecycle.NAME` since this was only used
to name the scheduler job and that can be done using
`XPackField.INDEX_LIFECYCLE` instead.
* master:
Integrates soft-deletes into Elasticsearch (#33222)
Revert "Integrates soft-deletes into Elasticsearch (#33222)"
Add support for "authorization_realms" (#33262)
Authorization Realms allow an authenticating realm to delegate the task
of constructing a User object (with name, roles, etc) to one or more
other realms.
E.g. A client could authenticate using PKI, but then delegate to an LDAP
realm. The LDAP realm performs a "lookup" by principal, and then does
regular role-mapping from the discovered user.
This commit includes:
- authorization_realm support in the pki, ldap, saml & kerberos realms
- docs for authorization_realms
- checks that there are no "authorization chains"
(whereby "realm-a" delegates to "realm-b", but "realm-b" delegates to "realm-c")
Authorization realms is a platinum feature.
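The "no authorization chains" check can be sketched as a single pass over the realm configuration. The dict shape (realm name mapped to its list of authorization realms) and the error text are assumptions for illustration, not the actual settings format.

```python
def check_no_authorization_chains(realms):
    """Reject configurations where a delegated-to realm itself delegates.

    realms: dict mapping realm name -> list of authorization realm names
    (an empty list means the realm does not delegate). Hypothetical shape.
    """
    for name, delegates in realms.items():
        for delegate in delegates:
            if realms.get(delegate):
                raise ValueError(
                    f"realm [{name}] delegates to [{delegate}], "
                    f"which itself delegates to {realms[delegate]}"
                )
```

A PKI realm delegating to an LDAP realm passes the check; `realm-a` delegating to `realm-b` which delegates to `realm-c` is rejected.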
Today we support a static list of seed hosts in core Elasticsearch, and allow a
dynamic list of seed hosts to be provided via a file using the `discovery-file`
plugin. In fact the ability to provide a dynamic list of seed hosts is
increasingly useful, so this change moves this functionality to core
Elasticsearch to avoid the need for a plugin.
Furthermore, in order to start up nodes in integration tests we currently
assign a known port to each node before startup, which unfortunately sometimes
fails if another process grabs the selected port in the meantime. By moving the
`discovery-file` functionality into the core product we can use it to avoid
this race.
This change also moves the expected path to the file from
`$ES_PATH_CONF/discovery-file/unicast_hosts.txt` to
`$ES_PATH_CONF/unicast_hosts.txt`. An example of this file is not included in
distributions.
For BWC purposes the plugin still exists, but does nothing more than create the
example file in the old location, and issue a warning when it is used. We also
continue to support the old location for the file, but warn about its
deprecation.
Relates #29244
Closes #33030
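The handling of the two file locations might look like the sketch below. The precedence (new location first) is an assumption, since the description only says both locations are supported; the function name and the injectable `exists` argument are also invented for the example.

```python
import os


def resolve_hosts_file(config_dir, exists=os.path.exists):
    """Return (path, deprecated) for the seed hosts file, or (None, False).

    Assumes the new location wins when both files exist.
    """
    new_path = os.path.join(config_dir, "unicast_hosts.txt")
    old_path = os.path.join(config_dir, "discovery-file", "unicast_hosts.txt")
    if exists(new_path):
        return new_path, False
    if exists(old_path):
        return old_path, True  # caller would log a deprecation warning
    return None, False
```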
* master:
Painless: Add Bindings (#33042)
Update version after client credentials backport
Fix forbidden apis on FIPS (#33202)
Remote 6.x transport BWC Layer for `_shrink` (#33236)
Test fix - Graph HLRC tests needed another field adding to randomisation exception list
HLRC: Add ML Get Records API (#33085)
[ML] Fix character set finder bug with unencodable charsets (#33234)
TESTS: Fix overly long lines (#33240)
Test fix - Graph HLRC test was missing field name to be excluded from randomisation logic
Remove unsupported group_shard_failures parameter (#33208)
Update BucketUtils#suggestShardSideQueueSize signature (#33210)
Parse PEM Key files leniently (#33173)
INGEST: Add Pipeline Processor (#32473)
Core: Add java time xcontent serializers (#33120)
Consider multi release jars when running third party audit (#33206)
Update MSI documentation (#31950)
HLRC: create base timed request class (#33216)
[DOCS] Fixes command page titles
HLRC: Move ML protocol classes into client ml package (#33203)
Scroll queries asking for rescore are considered invalid (#32918)
Painless: Fix Semicolon Regression (#33212)
ingest: minor - update test to include dissect (#33211)
Switch remaining LLREST usage to new style Requests (#33171)
HLREST: add reindex API (#32679)
* master:
[Rollup] Better error message when trying to set non-rollup index (#32965)
HLRC: Use Optional in validation logic (#33104)
Remove unused User class from protocol (#33137)
ingest: Introduce the dissect processor (#32884)
[Docs] Add link to es-kotlin-wrapper-client (#32618)
[Docs] Remove repeating words (#33087)
Minor spelling and grammar fix (#32931)
Remove support for deprecated params._agg/_aggs for scripted metric aggregations (#32979)
Watcher: Simplify finding next date in cron schedule (#33015)
Run Third party audit with forbidden APIs CLI (part3/3) (#33052)
Fix plugin build test on Windows (#33078)
HLRC+MINOR: Remove Unused Private Method (#33165)
Remove old unused test script files (#32970)
Build analysis-icu client JAR (#33184)
Ensure to generate identical NoOp for the same failure (#33141)
ShardSearchFailure#readFrom to set index and shardId (#33161)
* ingest: Introduce the dissect processor
The ingest node dissect processor is an alternative to Grok
to split a string based on a pattern. Dissect differs from
Grok in that regular expressions are not used to split the
string.
Dissect can be used to parse a source text field with a
simpler pattern, and is often faster than Grok for basic string
parsing. This processor uses the dissect library which
does most of the work.
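To show what splitting by pattern without regular expressions means, here is a toy dissect-style parser that uses only literal delimiter searches. It handles plain `%{key}` placeholders separated by literal text, assumes no two keys are adjacent, and is not the dissect library itself.

```python
def parse_pattern(pattern):
    """Split '%{a} - %{b}' into key names and the literal delimiters around them."""
    keys, delimiters = [], []
    pos = 0
    while True:
        start = pattern.find("%{", pos)
        if start == -1:
            delimiters.append(pattern[pos:])  # trailing literal (possibly empty)
            break
        delimiters.append(pattern[pos:start])
        end = pattern.index("}", start)
        keys.append(pattern[start + 2:end])
        pos = end + 1
    return keys, delimiters


def dissect(pattern, text):
    """Extract key/value pairs from text by scanning for literal delimiters."""
    keys, delimiters = parse_pattern(pattern)
    if not text.startswith(delimiters[0]):
        raise ValueError("text does not match pattern")
    pos = len(delimiters[0])
    result = {}
    for key, delimiter in zip(keys, delimiters[1:]):
        if delimiter == "":
            end = len(text)  # no delimiter follows: the key takes the rest
        else:
            end = text.find(delimiter, pos)
            if end == -1:
                raise ValueError("text does not match pattern")
        result[key] = text[pos:end]
        pos = end + len(delimiter)
    return result
```

For example, the pattern `%{ip} [%{ts}] %{msg}` applied to `1.2.3.4 [30/Apr/1998:22:00:52 +0000] GET /index.html` yields the three fields `ip`, `ts` and `msg`.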
The fix proposed in #31442 fails with the oss distro because the added
ellipsis (`...`) has nothing to match in the default oss output, while an
ellipsis expression requires matching at least one thread pool.
This change makes an ellipsis optional so the thread_pool list can match
both the oss distro (without ccr) and default distro (with ccr).
Relates #31442