I am not sure why we have this leniency for HTTP max content length, it
has been there since the beginning
(5ac51ee93f) with no explanation of its
source. That said, our philosophy today is different than the philosophy
of the past where Elasticsearch would be quite lenient in its handling
of settings and today we aim for predictability for both users and
us. This commit removes leniency in the parsing of
http.max_content_length.
Today this part of the documentation just says that Geo queries are not 100%
accurate, but in fact we can be more precise about which kinds of queries see
which kinds of error. This commit clarifies this point.
At time of writing, GeoJSON did not enforce a specific ordering of vertices in
a polygon, but it now does. We occasionally get reports of Elasticsearch
rejecting apparently-valid GeoJSON because of badly oriented polygons, and it's
helpful to be able to point at this bit of the documentation when responding.
As follow up to #28245 , this PR removes the logic for selecting the
right start commit from the Engine constructor in favor of explicitly
trimming them in the Store, before the engine is opened. This makes the
constructor in engine follow standard Lucene semantics and use the last
commit.
Relates #28245
Relates #29156
#28245 has introduced the utility class`EngineDiskUtils` with a set of methods to prepare/change
translog and lucene commit points. That util class bundled everything that's needed to create and
empty shard, bootstrap a shard from a lucene index that was just restored etc.
In order to safely do these manipulations, the util methods acquired the IndexWriter's lock. That
would sometime fail due to concurrent shard store fetching or other short activities that require the
files not to be changed while they read from them.
Since there is no way to wait on the index writer lock, the `Store` class has other locks to make
sure that once we try to acquire the IW lock, it will succeed. To side step this waiting problem, this
PR folds `EngineDiskUtils` into `Store`. Sadly this comes with a price - the store class doesn't and
shouldn't know about the translog. As such the logic is slightly less tight and callers have to do the
translog manipulations on their own.
This change refactors the composite aggregation to add an execution mode that visits documents in the order of the values
present in the leading source of the composite definition. This mode does not need to visit all documents since it can early terminate
the collection when the leading source value is greater than the lowest value in the queue.
Instead of collecting the documents in the order of their doc_id, this mode uses the inverted lists (or the bkd tree for numerics) to collect documents
in the order of the values present in the leading source.
For instance the following aggregation:
```
"composite" : {
"sources" : [
{ "value1": { "terms" : { "field": "timestamp", "order": "asc" } } }
],
"size": 10
}
```
... can use the field `timestamp` to collect the documents with the 10 lowest values for the field instead of visiting all documents.
For composite aggregation with more than one source the execution can early terminate as soon as one of the 10 lowest values produces enough
composite buckets. For instance if visiting the first two lowest timestamp created 10 composite buckets we can early terminate the collection since it
is guaranteed that the third lowest timestamp cannot create a composite key that compares lower than the one already visited.
This mode can execute iff:
* The leading source in the composite definition uses an indexed field of type `date` (works also with `date_histogram` source), `integer`, `long` or `keyword`.
* The query is a match_all query or a range query over the field that is used as the leading source in the composite definition.
* The sort order of the leading source is the natural order (ascending since postings and numerics are sorted in ascending order only).
If these conditions are not met this aggregation visits each document like any other agg.
The rank_eval documentation was missing an explanation of the parameter
`k` that controls the number of top hits that are used in the ranking evaluation.
Closes#29205
This enhancement adds Z value support (source only) to geo_shape fields. If vertices are provided with a third dimension, the third dimension is ignored for indexing but returned as part of source. Like beofre, any values greater than the 3rd dimension are ignored.
closes#23747
This commit removes some parameters deprecated in 6.x (or 5.x):
`use_dismax`, `split_on_whitespace`, `all_fields` and `lowercase_expanded_terms`.
Closes#25551
This commit adds a new setting `cluster.persistent_tasks.allocation.enable`
that can be used to enable or disable the allocation of persistent tasks.
The setting accepts the values `all` (default) or `none`. When set to
none, the persistent tasks that are created (or that must be reassigned)
won't be assigned to a node but will reside in the cluster state with
a no "executor node" and a reason describing why it is not assigned:
```
"assignment" : {
"executor_node" : null,
"explanation" : "persistent task [foo/bar] cannot be assigned [no
persistent task assignments are allowed due to cluster settings]"
}
```
This will reject mapping updates to the `_default_` mapping with 7.x indices
and still emit a deprecation warning with 6.x indices.
Relates #15613
Supersedes #28248
Update allocation awareness docs
Today, the docs imply that if multiple attributes are specified the the
whole combination of values is considered as a single entity when
performing allocation. In fact, each attribute is considered separately. This
change fixes this discrepancy.
It also replaces the use of the term "awareness zone" with "zone or domain", and
reformats some paragraphs to the right width.
Fixes#29105
This is a follow up to a previous change which set the error file path
for the package distributions. The observation here is that we always
set the working directory of Elasticsearch to the root of the
installation (i.e., Elasticsearch home). Therefore, we can specify the
error file path relative to this directory and default it to the logs
directory, similar to the package distributions.
This is a follow up to a previous change which set the heap dump path
for the package distributions. The observation here is that we always
set the working directory of Elasticsearch to to the root of
installation (i.e., Elasticsearch home). Therefore, we can specify the
heap dump path relative to this directory and default it to the data
directory, similar to the package distributions.
Adds support for triple quoted strings to the documentation test
generator. Kibana's CONSOLE tool has supported them for a year but we
were unable to use them in Elasticsearch's docs because the process that
converts example snippets into tests couldn't handle this. This change
adds code to convert them into standard JSON so we can pass them to
Elasticsearch.
I have seen this question a couple times already, most recently at
https://twitter.com/dimosr7/status/973872744965332993
I tried to keep the explanation as simple as I could, which is not always easy
as this is a matter of trade-offs.
By the time the master branch is released the deprecated url
parameters in the `/_cache/clear` API will have been deprecated
for a couple of minor releases. Since master will be the next
major release we are fine with removing these parameters.
Currently we have a fairly complicated logic in the engine constructor logic to deal with all the
various ways we want to mutate the lucene index and translog we're opening.
We can:
1) Create an empty index
2) Use the lucene but create a new translog
3) Use both
4) Force a new history uuid in all cases.
This leads complicated code flows which makes it harder and harder to make sure we cover all the
corner cases. This PR tries to take another approach. Constructing an InternalEngine always opens
things as they are and all needed modifications are done by static methods directly on the
directory, one at a time.
* Add a REST integration test that documents date_range support
Add a test case that exercises date_range aggregations using the missing
option.
Addresses #17597
* Test cleanup and correction
Adding a document with a null date to exercise `missing` option, update
test name to something reasonable.
* Update documentation to explain how the "missing" parameter works for
date_range aggregations.
* Wrap lines at 80 chars in docs.
* Change format of test to YAML for readability.
The current docs on [Indices APIs: PUT Mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html) suggests that a having number of different mapping types per index is still possible in elasticsearch versions > 6.0.0 although they have been [removed](https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html). The console code has already been updated accordingly but notes (2) and (3) on the console code still name the `user` mapping type.
This PR updates the list with notes after the console code, as well as the first sentence of the docs
to avoid confusion. Also, I have removed the second command from the console code as it no
longer holds any value if the docs are solely on the `_doc` mapping.