* [Geo] Expose BKDBackedGeoShapes as new VECTOR strategy
This commit exposes lucene's LatLonShape field as a new
strategy in GeoShapeFieldMapper. To use the new indexing
approach, strategy should be set to "vector" in the
geo_shape field mapper. If the tree parameter is set
the mapper will throw an IAE. Note the following:
When using vector strategy:
* geo_shape query does not support querying by POINT,
MULTIPOINT, or GEOMETRYCOLLECTION.
* LINESTRING and MULTILINESTRING queries do not support
WITHIN relation.
* CONTAINS relation is not supported.
* The tree, precision, tree_levels, distance_error_pct,
and points_only parameters will not throw an exception
but they have no effect and will be marked as
deprecated..
All other features are supported.
* revert change to PercolatorFieldMapper
* fix ExistsQuery for geo_shape vector strategy
* add deprecation logging for tree, precision, tree_levels, distance_error_pct, and points_only
* initial update to geoshape docs, including mapping migration updates
* initial support for GeoCollection queries
* fix docs and javadoc errors
* clean up geocollection queries
* set deprecated mapping tests to NOTCONSOLE
* fix geo-shape mapper asciidoc mapping and test warnings
* add support for point queries using LatLonShapeBoundingBoxQuery
* update GeoShapeQueryBuilderTests to include POINT queries for VECTOR strategy. Other comment cleanups
* add lucene geometry build testing to ShapeBuilder tests
* remove deprecated prefix tree mapping from geo-shape.asciidoc
* refactor GeoShapeFieldMapper into LegacyGeoShapeFieldMapper and GeoShapeFieldMapper
Both classes derive from BaseGeoShapeFieldMapper that provides shared parameters:
coerce, ignoreMalformed, ignore_z_value, orientation.
* update docs to remove vector strategy
* fix GeometryCollectionBuilder#buildLucene to return the object created by the shape builder
* fix LineLength failure in GeoJsonShapeParserTests
* ShapeMapper refactor changes from PR feedback
* fix typo in geo-shape.asciidoc
* ignore circle test in docs
* update indexing-approach ref to geoshape-indexing-approach
* add warnings check for LegacyGeoShapeFieldMapper to AbstractBuilderTestCase
* fix deprecatedParameters setup
* update indexing approach
* fixing unexpected warnings failures
* move orientation back to field type
* remove if in LegacyGeoShapeFieldMapper#doXContent. Fix GeoShapeFieldMapper to work with double array as a point
* fix indexing-approach link in circle section of geoshape docs
* add strategy to deprecation warnings check
* fix test failures
* fix typo in QueryStringQueryBuilderTests
* fix total hits to totalHits().value
* fix version number
* add version check to BaseGeoShapeFieldMapper
* fix line length!
* revert version check in BaseGeoShapeFieldMapper
* Fix serialization of mappings of legacy shapes.
This commit exposes lucene's LatLonShape field as the
default type in GeoShapeFieldMapper. To use the new
indexing approach, simply set "type" : "geo_shape" in
the mappings without setting any of the strategy, precision,
tree_levels, or distance_error_pct parameters. Note the
following when using the new indexing approach:
* geo_shape query does not support querying by
MULTIPOINT.
* LINESTRING and MULTILINESTRING queries do not
yet support WITHIN relation.
* CONTAINS relation is not yet supported.
The tree, precision, tree_levels, distance_error_pct,
and points_only parameters are deprecated.
* Adds deprecation logging to ScriptDocValues#getValues.
First commit addressing issue #22919.
`ScriptDocValues#getValues` was added for backwards compatibility but no
longer needed. Scripts using the syntax `doc['foo'].values` when
`doc['foo']` is a list should be using `doc['foo']` instead.
* Fixes two build errors in #34279
* Removes unused import in ScriptDocValuesDatesTest
* Removes used of `.values` in example in diversified-sampler-aggregation.asciidoc
* Removes use of .values from painless test.
Part of #34279
* Updates tests to use `doc[foo]` syntax rather than `doc[foo].values`.
* Removes use of `getValues()` and replaces use of `doc[foo].values` with `doc[foo]`.
* Indentation fix.
* Remove unnecessary list construction at previous `getValues()` callsite in ScriptDocValues.GeoPoints.
* Update migration doc and add link to `getValue` in ScriptDocValues javadoc.
* Fix compile
* Fix javadoc issue
* Removes ScriptDocValues#getValues usage from painless whitelist.
* Lower fielddata circuit breaker default limit
Lower fielddata circuit breaker default limit from 60% to 40% as we have
moved to doc_values for most of the cases.
* merge master in
* update tests
* update docs
This commit gets rid of the 'NONE' and 'INFO' severity levels for
deprecation issues.
'NONE' is unused and does not make much sense as a severity level.
'INFO' can be separated into two categories: Either 1) we can
definitively tell there will be a problem with the cluster/node/index
configuration that can be resolved prior to upgrade, in which case
the issue should be a WARNING, or 2) we can't, because any issues would
be at the application level, for which the user should review the
deprecation logs and/or response headers.
In real deployments it is important that clusters are properly configured to
avoid accidentally forming multiple independent clusters at cluster
bootstrapping time. However we also expect to be able to unpack Elasticsearch
and start up one or more nodes without any up-front configuration, and have
them do their best to find each other and form a cluster after a few seconds.
This change adds a delayed automatic bootstrapping process to nodes that start
up with no relevant settings set to support the desired out-of-the-box
experience without compromising safety in properly-configured deployments.
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).
Relates #33028
When building a query Lucene distinguishes two cases, queries that require to produce a score and queries that only need to match. We cloned this mechanism in the QueryBuilders in order to be able to produce different queries based on whether they need to produce a score or not. However the only case in es that require this distinction is the BoolQueryBuilder that sets a different minimum_should_match when a `bool` query is built in a filter context..
This behavior doesn't seem right because it makes the matching of `should` clauses different when the score is not required.
Closes#35293
MultiSearchRequests issues through `_msearch` now validate all keys
in the metadata section. Previously unknown keys were ignored
while now an exception is thrown.
Closes#35869
* Forbid negative scores in functon_score query
- Throw an exception when scores are negative in field_value_factor
function
- Throw an exception when scores are negative in script_score
function
Relates to #33309
We changed the way realm settings are defined, and this affects custom
realms in SecurityExtensions. This change adds those details to the
breaking changes docs.
Relates: #30241
This moves all Realm settings to an Affix definition.
However, because different realm types define different settings
(potentially conflicting settings) this requires that the realm type
become part of the setting key.
Thus, we now need to define realm settings as:
xpack.security.authc.realms:
file.file1:
order: 0
native.native1:
order: 1
- This is a breaking change to realm config
- This is also a breaking change to custom security realms (SecurityExtension)
In a future major version, we will be introducing a soft limit on the
number of shards in a cluster based on the number of nodes in the
cluster. This limit will be configurable, and checked on operations
which create or open shards and issue a warning if the operation would
take the cluster over the limit.
There is an option to enable strict enforcement of the limit, which
turns the warnings into errors. In a future release, the option will be
removed and strict enforcement will be the default (and only) behavior.
- Restrict visibility of Aggregators and Factories
- Move PipelineAggregatorBuilders up a level so it is consistent with
AggregatorBuilders
- Checkstyle line length fixes for a few classes
- Minor odds/ends (swapping to method references, formatting, etc)
When a envelope that crosses the dateline is specified as a part of
geo_shape query is parsed it shouldn't have its left and right points
flipped.
Fixes#34418
The `term` and `phrase` suggesters have different options to filter candidates
based on their frequencies. The `popular` mode for instance filters candidate
terms that occur in less docs than the original term. However when we compute this threshold
we use the total term frequency of a term instead of the document frequency. This is not inline
with the actual filtering which is always based on the document frequency. This change fixes
this discrepancy and clarifies the meaning of the different frequencies in use in the suggesters.
It also ensures that the threshold doesn't overflow the maximum allowed value (Integer.MAX_VALUE).
Closes#34282
This change disallows negative query boosts. Negative scores are not allowed in Lucene 8 so
it is easier to just disallow negative boosts entirely. We should also deprecate negative boosts
in 6x in order to ensure that users are aware when they'll upgrade to ES 7.
Relates #33309
* Make text message not required in constructor for slack
* Remove unnecessary comments in test file
* Throw exception when reduce or combine is not provided; update tests
* Update integration tests for scripted metrics to always include reduce and combine
* Remove some old changes from previous branches
* Rearrange script presence checks to be earlier in build
* Change null check order in script builder for aggregated metrics; correct test scripts in IT
* Add breaking change details to PR
#32281 adds elasticsearch-shard to provide bwc version of elasticsearch-translog for 6.x; have to remove elasticsearch-translog for 7.0
Relates to #31389
Changes the default of the `node.name` setting to the hostname of the
machine on which Elasticsearch is running. Previously it was the first 8
characters of the node id. This had the advantage of producing a unique
name even when the node name isn't configured but the disadvantage of
being unrecognizable and not being available until fairly late in the
startup process. Of particular interest is that it isn't available until
after logging is configured. This forces us to use a volatile read
whenever we add the node name to the log.
Using the hostname is available immediately on startup and is generally
recognizable but has the disadvantage of not being unique when run on
machines that don't set their hostname or when multiple elasticsearch
processes are run on the same host. I believe that, taken together, it
is better to default to the hostname.
1. Running multiple copies of Elasticsearch on the same node is a fairly
advanced feature. We do it all the as part of the elasticsearch build
for testing but we make sure to set the node name then.
2. That the node.name defaults to some flavor of "localhost" on an
unconfigured box feels like it isn't going to come up too much in
production. I expect most production deployments to at least set the
hostname.
As a bonus, production deployments need no longer set the node name in
most cases. At least in my experience most folks set it to the hostname
anyway.
In #33241 we moved the file-based discovery functionality to core
Elasticsearch, but preserved the `discovery-file` plugin, and support for the
existing location of the `unicast_hosts.txt` file, for BWC reasons. This commit
completes the removal of this plugin.
This change removes the wrapping of the created field in the put user
response. The created field was added as a top level field in #32332,
while also still being wrapped within the `user` object of the
response. Since the value is available in both formats in 6.x, we can
remove the wrapped version for 7.0.
The remote cluster settings search.remote.* have been renamed to
cluster.remote.* and are automatically upgraded in the cluster state on
gateway recovery, and on put. This commit adds a note to the migration
docs for these changes.
This change collapses all metrics aggregations classes into a single package `org.elasticsearch.aggregations.metrics`.
It also restricts the visibility of some classes (aggregators and factories) that should not be used outside of the package.
Relates #22868
The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly.
Some comments about the change:
* Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309Closes#32899
In #29623 we added `Request` object flavored requests to the low level
REST client and in #30315 we deprecated the old `performRequest`s. In a
long series of PRs I've changed all of the old style requests. This
drops the deprecated methods and will be released with 7.0.
We used to set `maxScore` to `0` within `TopDocs` in situations where there is really no score as the size was set to `0` and scores were not even tracked. In such scenarios, `Float.Nan` is more appropriate, which gets converted to `max_score: null` on the REST layer. That's also more consistent with lucene which set `maxScore` to `Float.Nan` when merging empty `TopDocs` (see `TopDocs#merge`).
Currently, if geo context is represented by something other than
geo_point or an object with lat and lon fields, the parsing of it
as a geo context can result in ignoring the context altogether,
returning confusing errors such as number_format_exception or trying
to parse the number specifying as long-encoded hash code. It would also
fail if the geo_point was stored.
This commit makes the mapping parsing more strict and will fail during
mapping update or index creation if the geo context doesn't point to
a geo_point field.
Supersedes #32412Closes#32202
Resolving wildcards in aliases expression is challenging as we may end
up with no aliases to replace the original expression with, but if we
replace with an empty array that means _all which is quite the opposite.
Now that we support and serialize the original requested aliases,
whenever aliases are replaced we will be able to know what was
initially requested. `MetaData#findAliases` can then be updated to not
return anything in case it gets empty aliases, but the original aliases
were not empty. That means that empty aliases are interpreted as _all
only if they were originally requested that way.
Relates to #31516
Because this is a static method on a public API, and one that we encourage
plugin authors to use, the method with the typo is deprecated in 6.x
rather than just renamed.
With this commit we introduce a new circuit-breaking strategy to the parent
circuit breaker. Contrary to the current implementation which only accounts for
memory reserved via child circuit breakers, the new strategy measures real heap
memory usage at the time of reservation. This allows us to be much more
aggressive with the circuit breaker limit so we bump it to 95% by default. The
new strategy is turned on by default and can be controlled with the new cluster
setting `indices.breaker.total.userealmemory`.
Note that we turn it off for all integration tests with an internal test cluster
because it leads to spurious test failures which are of no value (we cannot
fully control heap memory usage in tests). All REST tests, however, will make
use of the real memory circuit breaker.
Relates #31767
Removes support for storing scripts without the usual json around the
script. So You can no longer do:
```
POST _scripts/<templatename>
{
"query": {
"match": {
"title": "{{query_string}}"
}
}
}
```
and must instead do:
```
POST _scripts/<templatename>
{
"script": {
"lang": "mustache",
"source": {
"query": {
"match": {
"title": "{{query_string}}"
}
}
}
}
}
```
This improves error reporting when you attempt to store a script but don't
quite get the syntax right. Before, there was a good chance that we'd
think of it as a "raw" template and just store it. Now we won't do that.
Nice.
So far the in-flight request circuit breaker has only accounted for the
on-the-wire representation of a request. However, we convert the raw
request into XContent internally which increases the overhead.
Therefore, we increase the value of the corresponding setting
`network.breaker.inflight_requests.overhead` from one to two. While this
value is still rather conservative (we assume that the representation as
structured objects has no overhead compared to the byte[]), it is closer
to reality than the current value.
Relates #31613
* Migrate scripted metric aggregation scripts to ScriptContext design #29328
* Rename new script context container class and add clarifying comments to remaining references to params._agg(s)
* Misc cleanup: make mock metric agg script inner classes static
* Move _score to an accessor rather than an arg for scripted metric agg scripts
This causes the score to be evaluated only when it's used.
* Documentation changes for params._agg -> agg
* Migration doc addition for scripted metric aggs _agg object change
* Rename "agg" Scripted Metric Aggregation script context variable to "state"
* Rename a private base class from ...Agg to ...State that I missed in my last commit
* Clean up imports after merge
With #29331 we added support for the cluster health API to the
high-level REST client. The transport client does not support the level
parameter, and it always returns all the info needed for shards level
rendering. We have maintained that behaviour when adding support for
cluster health to the high-level REST client, to ease migration, but the
correct thing to do is to default the high-level REST client to
`cluster` level, which is the same default as when going through the
Elasticsearch REST layer.
This commit removes all the API methods that accept a `Header` varargs
argument, in favour of the newly introduced API methods that accept a
`RequestOptions` argument.
Relates to #31069
With `max_concurrent_shard_requests` we used to throttle / limit
the number of concurrent shard requests a high level search request
can execute per node. This had several problems since it limited the
number on a global level based on the number of nodes. This change
now throttles the number of concurrent requests per node while still
allowing concurrency across multiple nodes.
Closes#31192
When `lenient=false`, attempts to create match phrase queries with custom analyzers against non-text fields will throw an IllegalArgumentException.
Also changes `*Match*QueryBuilderTests` so that it avoids this scenario
Fixes#31061
Currently failures to compile a script usually lead to a ScriptException, which
inherits the 500 INTERNAL_SERVER_ERROR from ElasticsearchException if it does
not contain another root cause. Instead, this should be a 400 Bad Request error.
This PR changes this more generally for script compilation errors by changing
ScriptException to return 400 (bad request) as status code.
Closes#12315
Include size of snapshot in snapshot metadata
Adds difference of number of files (and file sizes) between prev and current snapshot. Total number/size reflects total number/size of files in snapshot.
Closes#18543
Treats geohashes as grid cells instead of just points when the
geohashes are used to specify the edges in the geo_bounding_box
query. For example, if a geohash is used to specify the top_left
corner, the top left corner of the geohash cell will be used as the
corner of the bounding box.
Closes#25154
This commit reintroduces 31251c9 and 63a5799. These commits introduced a
memory leak and were reverted. This commit brings those commits back
and fixes the memory leak by removing unnecessary retain method calls.
This reverts commit 31251c9 introduced in #30695.
We suspect this commit is causing the OOME's reported in #30811 and we will use this PR to test this assertion.
This is related to #29500 and #28898. This commit removes the abilitiy
to disable http pipelining. After this commit, any elasticsearch node
will support pipelined requests from a client. Additionally, it extracts
some of the http pipelining work to the server module. This extracted
work is used to implement pipelining for the nio plugin.
The getDate() and getDates() existed prior to 5.x on long fields in
scripting. In 5.x, a new Date type for ScriptDocValues was added. The
getDate() and getDates() methods were left on long fields and added to date
fields to ease the transition. This commit removes those methods for
7.0.
This commit removes the http.enabled setting. While all real nodes (started with bin/elasticsearch) will always have an http binding, there are many tests that rely on the quickness of not actually needing to bind to 2 ports. For this case, the MockHttpTransport.TestPlugin provides a dummy http transport implementation which is used by default in ESIntegTestCase.
closes#12792
Systemd overrides should happen through /etc/systemd/system, not
directly editing the service file. This commit removes marking the
service file as configuration for rpm and deb packages.
Today when an index is created from shrinking or splitting an existing
index, the target index inherits almost none of the source index
settings. This is surprising and a hassle for operators managing such
indices. Given this is the default behavior, we can not simply change
it. Instead, we start by introducing the ability to copy settings. This
flag can be set on the REST API or on the transport layer and it has the
behavior that it copies all settings from the source except non-copyable
settings (a property of a setting introduced in this
change). Additionally, settings on the request will always override.
This change is the first step in our adventure:
- this flag is added here in 7.0.0 and immediately deprecated
- this flag will be backported to 6.4.0 and remain deprecated
- then, we will remove the ability to set this flag to false in 7.0.0
- finally, in 8.0.0 we will remove this flag and the only behavior will
be for settings to be copied
A previous change modified the output of the thread pool info contained
in the nodes info API. This commit adds a note to the migration docs for
this change.
This metric previously existed for backwards compatibility reasons
although the suggest stats were folded into search stats. This metric
was deprecated in 6.3.0 and this commit removes them for 7.0.0.
The name of the bulk thread pool was renamed to "write" with "bulk" as a
fallback name. This change was made in 6.x for BWC reasons yet in 7.0.0
we are removing this fallback. This commit removes this fallback for the
write thread pool.
Now that single-document indexing requests are executed on the bulk
thread pool the index thread pool is no longer needed. This commit
removes this thread pool from Elasticsearch.
CRUD: Parsing changes for UpdateRequest (#29293)
Use `ObjectParser` to parse `UpdateRequest` so we reject unknown fields
and drop support for the `_fields` parameter because it was deprecated
in 5.x.
This change validates that the `_search` request does not have trailing
tokens after the main object and fails the request with a parsing exception otherwise.
Closes#28995
Some features have been deprecated since `6.0` like the `_parent` field or the
ability to have multiple types per index. This allows to remove quite some
code, which in-turn will hopefully make it easier to proceed with the removal
of types.
From 7.0 on, using `delimited_payload_filter` should throw an error.
It was deprecated in 6.2 in favour of `delimited_payload` (#26625).
Relates to #27704
This improves the way similarities are plugged in in order to:
- reject the classic similarity on 7.x indices and emit a deprecation
warning otherwise
- reject unkwown parameters on 7.x indices and emit a deprecation
warning otherwise
Even though this breaks the plugin API, I'd like to backport to 7.x so
that users can get deprecation warnings when they are doing something
that will become unsupported in the future.
Closes#23208Closes#29035
This commit removes some parameters deprecated in 6.x (or 5.x):
`use_dismax`, `split_on_whitespace`, `all_fields` and `lowercase_expanded_terms`.
Closes#25551
This will reject mapping updates to the `_default_` mapping with 7.x indices
and still emit a deprecation warning with 6.x indices.
Relates #15613
Supersedes #28248
By the time the master branch is released the deprecated url
parameters in the `/_cache/clear` API will have been deprecated
for a couple of minor releases. Since master will be the next
major release we are fine with removing these parameters.
Increase the default limit of `index.highlight.max_analyzed_offset` to 1M instead of previous 10K.
Enhance an error message when offset increased to include field name, index name and doc_id.
Relates to https://github.com/elastic/kibana/issues/16764
* Reject regex search if regex string is too long (#28344)
* Add docs
* Introduce index level setting `index.max_regex_length`
to control the maximum length of the regular expression
Closes#28344
Similarly to what has been done for s3 and azure, this commit removes
the repository settings `application_name` and `connect/read_timeout`
in favor of client settings. It introduce a GoogleCloudStorageClientSettings
class (similar to S3ClientSettings) and a bunch of unit tests for that,
it aligns the documentation to be more coherent with the S3 one, it
documents the connect/read timeouts that were not documented at all and
also adds a new client setting that allows to define a custom endpoint.
This PR removes previously deprecated `isShardsAcked()` method in
favour of `isShardsAcknowledged()` on `CreateIndexResponse`, `CreateIndexClusterStateUpdateResponse` and `RolloverResponse`
Related to #27784
Follow-up of #27819
- Introduce index level settings to control the maximum number of terms
that can be used in a Terms Query
- Throw an error if a request exceeds this max number
Closes#18829
* Limit the analyzed text for highlighting
- Introduce index level settings to control the max number of character
to be analyzed for highlighting
- Throw an error if analysis is required on a larger text
Closes#27517
#27409 deprecated the incorrectly-spelled `levenstein` in favour of `levenshtein`.
#27526 deprecated the inconsistent `jarowinkler` in favour of `jaro_winkler`.
These changes were merged into 6.2, and this change removes them entirely in 7.0.
This commit adds a new dynamic cluster setting named `search.max_buckets` that can be used to limit the number of buckets created per shard or by the reduce phase. Each multi bucket aggregator can consume buckets during the final build of the aggregation at the shard level or during the reduce phase (final or not) in the coordinating node. When an aggregator consumes a bucket, a global count for the request is incremented and if this number is greater than the limit an exception is thrown (TooManyBuckets exception).
This change adds the ability for multi bucket aggregator to "consume" buckets in the global limit, the default is 10,000. It's an opt-in consumer so each multi-bucket aggregator must explicitly call the consumer when a bucket is added in the response.
Closes#27452#26012
Add an index level setting `index.analyze.max_token_count` to control
the number of generated tokens in the _analyze endpoint.
Defaults to 10000.
Throw an error if the number of generated tokens exceeds this limit.
Closes#27038
Today we refresh automatically in the background by default very second.
This default behavior has a significant impact on indexing performance
if the refreshes are not needed.
This change introduces a notion of a shard being `search idle` which a
shard transitions to after (default) `30s` without any access to an
external searcher. Once a shard is search idle all scheduled refreshes
will be skipped unless there are any refresh listeners registered.
If a search happens on a `serach idle` shard the search request _park_
on a refresh listener and will be executed once the next scheduled refresh
occurs. This will also turn the shard into the `non-idle` state immediately.
This behavior is only applied if there is no explicit refresh interval set.
The `delimited_payload_filter` is renamed to `delimited_payload`, the old name is
deprecated and should be replaced by `delimited_payload`.
Closes#21978
Today we require users to prepare their indices for split operations.
Yet, we can do this automatically when an index is created which would
make the split feature a much more appealing option since it doesn't have
any 3rd party prerequisites anymore.
This change automatically sets the number of routinng shards such that
an index is guaranteed to be able to split once into twice as many shards.
The number of routing shards is scaled towards the default shard limit per index
such that indices with a smaller amount of shards can be split more often than
larger ones. For instance an index with 1 or 2 shards can be split 10x
(until it approaches 1024 shards) while an index created with 128 shards can only
be split 3x by a factor of 2. Please note this is just a default value and users
can still prepare their indices with `index.number_of_routing_shards` for custom
splitting.
NOTE: this change has an impact on the document distribution since we are changing
the hash space. Documents are still uniformly distributed across all shards but since
we are artificually changing the number of buckets in the consistent hashign space
document might be hashed into different shards compared to previous versions.
This is a 7.0 only change.
Add an index level setting `index.mapping.nested_objects.limit` to control
the number of nested json objects that can be in a single document
across all fields. Defaults to 10000.
Throw an error if the number of created nested documents exceed this
limit during the parsing of a document.
Closes#26962
Stardardize underscore requirements in parameters across different type of
requests:
_index, _type, _source, _id keep their underscores
params like version and retry_on_conflict will be without underscores
Throw an error if older versions of parameters are used
BulkRequest, MultiGetRequest, TermVectorcRequest, MoreLikeThisQuery
were changed
Closes#26886
Queries that create a scroll context cannot use the cache.
They modify the search context during their execution so using the cache
can lead to duplicate result for the next scroll query.
This change fails the entire request if the request_cache option is explictely set
on a query that creates a scroll context (`scroll=1m`) and make sure internally that we never
use the cache for these queries when the option is not explicitely used.
For 6.x a deprecation log will be printed instead of failing the entire request and the request_cache hint
will be ignored (forced to false).
Today all these API calls have a sideeffect of making documents visible
to search requests. While this is sometimes desired it's an unnecessary sideeffect
and now that we have an internal (engine-private) index reader (#26972) we artificially
add a refresh call for bwc. This change removes this sideeffect in 7.0.
The shard preference _primary, _replica and its variants were useful
for the asynchronous replication. However, with the current impl, they
are no longer useful and should be removed.
Closes#26335
Numeric fields no longer support the index_options parameter. This changes the parameter
to be rejected in numeric field types after it was deprecated in 6.0.
Closes#21475
Follow up for #23405.
We remove azure deprecated settings in 7.0:
* The legacy azure settings which where starting with `cloud.azure.storage.` prefix have been removed.
This includes `account`, `key`, `default` and `timeout`.
You need to use settings which are starting with `azure.client.` prefix instead.
* Global timeout setting `cloud.azure.storage.timeout` has been removed.
You must set it per azure client instead. Like `azure.client.default.timeout: 10s` for example.
* Remove the _all metadata field
This change removes the `_all` metadata field. This field is deprecated in 6
and cannot be activated for indices created in 6 so it can be safely removed in
the next major version (e.g. 7).
We use `:` for cross-cluster search (eg `cluster:index`), therefore, we should
not allow the ambiguity when allowing cluster or index names.
Relates to #23892
* Migrate migration docs from 6.0 to 7.0
Since we only keep one version of migration docs and master is now on 7.0, we
should migrate these so breaking changes can be added in the right place.
* Remove release notes as well
They link to the migration guides, so they have to go.
* Add placeholder notes for 7.0 so doc build is happy
An array of values is required because there is no default (or
reasonable way to set a default). But validation for values
only happens if it is actually set. If the values param is omitted
entirely than the agg builder will NPE.
The environment variable CONF_DIR was previously inconsistently used in
our packaging to customize the location of Elasticsearch configuration
files. The importance of this environment variable has increased
starting in 6.0.0 as it's now used consistently to ensure Elasticsearch
and all secondary scripts (e.g., elasticsearch-keystore) all use the
same configuration. The name CONF_DIR is there for legacy reasons yet
it's too generic. This commit renames CONF_DIR to ES_PATH_CONF.
Relates #26197
With this commit we remove the following three previously unused
(and undocumented) Netty 4 related settings:
* transport.netty.max_cumulation_buffer_capacity,
* transport.netty.max_composite_buffer_components and
* http.netty.max_cumulation_buffer_capacity
from Elasticsearch.
This commit updates the docs for the config files to explain the new
mechanism for customizing the configuration directory via the
environment variable CONF_DIR.
Relates #25990
This commit changes the way we handle field expansion in `match`, `multi_match` and `query_string` query.
The main changes are:
- For exact field name, the new behavior is to rewrite to a matchnodocs query when the field name is not found in the mapping.
- For partial field names (with `*` suffix), the expansion is done only on `keyword`, `text`, `date`, `ip` and `number` field types. Other field types are simply ignored.
- For all fields (`*`), the expansion is done on accepted field types only (see above) and metadata fields are also filtered.
- The `*` notation can also be used to set `default_field` option on`query_string` query. This should replace the needs for the extra option `use_all_fields` which is deprecated in this change.
This commit also rewrites simple `*` query to matchalldocs query when all fields are requested (Fixes#25556).
The same change should be done on `simple_query_string` for completeness.
`use_all_fields` option in `query_string` is also deprecated in this change, `default_field` should be set to `*` instead.
Relates #25551
Also has updates to ScriptMetaData for allowing the old namespace format to be loaded all the way back through 5.0; however, it will throw an exception if two scripts share the same id but different languages.
Today we enable users to customize the environment through the use of
ES_INCLUDE. This made sense for legacy reasons when we did not have
nicities like jvm.options (so dumped JVM options in the default include
script) and somewhat duplicates some of the functionality that we will
need from a dedicated environment script. This commit removes support
for ES_INCLUDE as a first step towards a dedicated include script.
Relates #25804
This commit expands on the migration note regarding the removal of
default.path.data and default.path.logs to include a note that users
that were relying on the defaults (the common case for path.logs), and
they carry over their previous elasticsearch.yml configruation file,
then they must add explicit values for path.data and path.logs.
403 can be confused with security. If an API doesn't support working against closed indices and closed indices are referred to in a request, that is a bad request, hence 400 is more appropriate.
Currently the `to` and `from` parameter in the `date_range` aggregation is not
parsed with the correct date field format from the mappings or the aggregation
if the argument is numeric, but always treated as a long value specifying
`epoch_millis`. This leads to problems e.g. when the format is `epoch_second`,
but the `to` and `from` are currently treated as millis.
With this change, we interpret these parameters according to the `format` of the target field.
If the `format` in the mappings is not compatible with numeric input values,
a compatible `format` (e.g. `epoch_millis`, `epoch_second`) must be specified in
the `date_range` aggregation itself, otherwise an error is thrown.
#Closes #17920
This change removes the leniency of having a `null` index to fetch
terms from in 6.0 onwards. This feature will be deprecated in the 5.x series
and 6.0 nodes will require the index to be set.
Closes#25750
This change refactors the query_string query to analyze the query text around logical operators of the query string the same way than a match_query/multi_match_query.
It also adds a type parameter that can be used to change the way multi fields query are built the same way than a multi_match query does.
Now that these queries share the same behavior regarding text analysis, some parameters are obsolete and have been deprecated:
split_on_whitespace: This setting is now ignored with a deprecation notice
if it is used explicitely. With this PR The query_string always splits on logical operator.
It simplifies the understanding of the other parameters that can have different meanings
depending on the value of split_on_whitespace.
auto_generate_phrase_queries: This setting is now ignored with a deprecation notice
if it is used explicitely. This setting only makes sense when the parser splits on whitespace.
use_dismax: This setting is now ignored with a deprecation notice
if it is used explicitely. The tie_breaker parameter is sufficient to handle best_fields/most_fields.
Fixes#25574
This commit removes the environment variable ES_JVM_OPTIONS that allows
the jvm.options file to sit separately from the rest of the config
directory. Instead, we use the CONF_DIR environment variable for custom
configuration location just as we do for the other configuration files.
Relates #25679
Requests that execute a stored script will no longer be allowed to specify the lang of the script. This information is stored in the cluster state making only an id necessary to execute against. Putting a stored script will still require a lang.