I have seen this question a couple times already, most recently at
https://twitter.com/dimosr7/status/973872744965332993
I tried to keep the explanation as simple as I could, which is not always easy
as this is a matter of trade-offs.
Additionally:
* Included the existing update by query java api docs in java-api docs.
(for some reason it was never included, it needed some tweaking and
then it was good to go)
* moved delete-by-query / update-by-query code samples to java file so
that we can verify that these samples at least compile.
Closes#24203
The Java API documentation for index administration currenty is wrong because
the PutMappingRequestBuilder#setSource(Object... source) an
CreateIndexRequestBuilder#addMapping(String type, Object... source) methods
delegate to methods that check that the input arguments are valid key/value
pairs. This changes the docs so the java api code examples are included from
documentation integration tests so we detect compile and runtime issues earlier.
Closes#28131
By the time the master branch is released the deprecated url
parameters in the `/_cache/clear` API will have been deprecated
for a couple of minor releases. Since master will be the next
major release we are fine with removing these parameters.
Currently we have a fairly complicated logic in the engine constructor logic to deal with all the
various ways we want to mutate the lucene index and translog we're opening.
We can:
1) Create an empty index
2) Use the lucene but create a new translog
3) Use both
4) Force a new history uuid in all cases.
This leads complicated code flows which makes it harder and harder to make sure we cover all the
corner cases. This PR tries to take another approach. Constructing an InternalEngine always opens
things as they are and all needed modifications are done by static methods directly on the
directory, one at a time.
We today support a global `indexed_chars` processor parameter. But in some cases, users would like to set this limit depending on the document itself.
It used to be supported in mapper-attachments plugin by extracting the limit value from a meta field in the document sent to indexation process.
We add an option which reads this limit value from the document itself
by adding a setting named `indexed_chars_field`.
Which allows running:
```
PUT _ingest/pipeline/attachment
{
"description" : "Extract attachment information. Used to parse pdf and office files",
"processors" : [
{
"attachment" : {
"field" : "data",
"indexed_chars_field" : "size"
}
}
]
}
```
Then index either:
```
PUT index/doc/1?pipeline=attachment
{
"data": "BASE64"
}
```
Which will use the default value (or the one defined by `indexed_chars`)
Or
```
PUT index/doc/2?pipeline=attachment
{
"data": "BASE64",
"size": 1000
}
```
Closes#28942
* Add a REST integration test that documents date_range support
Add a test case that exercises date_range aggregations using the missing
option.
Addresses #17597
* Test cleanup and correction
Adding a document with a null date to exercise `missing` option, update
test name to something reasonable.
* Update documentation to explain how the "missing" parameter works for
date_range aggregations.
* Wrap lines at 80 chars in docs.
* Change format of test to YAML for readability.
The current docs on [Indices APIs: PUT Mapping](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html) suggests that a having number of different mapping types per index is still possible in elasticsearch versions > 6.0.0 although they have been [removed](https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html). The console code has already been updated accordingly but notes (2) and (3) on the console code still name the `user` mapping type.
This PR updates the list with notes after the console code, as well as the first sentence of the docs
to avoid confusion. Also, I have removed the second command from the console code as it no
longer holds any value if the docs are solely on the `_doc` mapping.
The docs state that `_gce_` is recommended but the code sample states
that `_gce:hostname_` is recommended. This aligns the code sample with
the documentation. Also replace `type` with `zen.hosts_provider` as
discovery.type was removed in #25080.
With this commit we reduce heap usage of the ingest-geoip plugin by
memory-mapping the database files. Previously, we have stored these
files gzip-compressed but this has resulted that data are loaded on the
heap.
Closes#28782
The original example resulted in a 400 error due to the example being `-` separated instead of the default `.` separation.
```
failed to parse date field [2001-01-01] with format [YYYY.MM.dd]
```
Adds a usage example of the JLH score used in significant terms aggregation.
All other methods to calculate significance score have such an example
Closes#28513
Increase the default limit of `index.highlight.max_analyzed_offset` to 1M instead of previous 10K.
Enhance an error message when offset increased to include field name, index name and doc_id.
Relates to https://github.com/elastic/kibana/issues/16764
* Clarifies how the query_string splits textual part to build a query
Whitespaces are not considered as operators anymore in 6x but the documentation is not clear about it.
This commit changes the example in the documentation and adds a note regarding whitespaces and operators.
Closes#28719
Values for the network.host setting can often contain a colon which is a
character that is considered special by YAML (these arise in IPv6
addresses and some of the special tags like ":ipv4"). As such, these
values need to be quoted or a YAML parser will be unhappy with
them. This commit adds a note to the docs regarding this.
* Reject regex search if regex string is too long (#28344)
* Add docs
* Introduce index level setting `index.max_regex_length`
to control the maximum length of the regular expression
Closes#28344
Similarly to what has been done for s3 and azure, this commit removes
the repository settings `application_name` and `connect/read_timeout`
in favor of client settings. It introduce a GoogleCloudStorageClientSettings
class (similar to S3ClientSettings) and a bunch of unit tests for that,
it aligns the documentation to be more coherent with the S3 one, it
documents the connect/read timeouts that were not documented at all and
also adds a new client setting that allows to define a custom endpoint.
We previously specified the -server flag to force the JVM to use the
server JVM. This is the default on all the systems that we support when
using a 64-bit JVM (and we no longer support 32-bit JVMs). There was
some trouble with this flag for the Windows service since procrun did
not understand what to do with it; as such, we had to filter this flag
out in the service. When we migrated to parsing JVM options in Java (via
the JVM options parser) we simplified this situation and removed
specifying the -server flag. This commit removes a leftover statement
that we are forcing the server JVM.
Relates #28738
The node stats API enables filtlering the top-level stats for only
desired top-level stats. Yet, this was never enabled for adaptive
replica selection stats. This commit enables this. We also add setting
these stats on the request builder, and fix an inconsistent name in a
setter.
Relates #28721
The Windows service will use a private temporary directory under the
user that is performing the installation. In cases when the service will
run as a different user, operators need a method to set this temporary
directory elsewhere. We have such a mechanism, so this commit merely
adds a note to the documentation on how to utilize it.
Relates #28712
Elasticsearch 6.x indices do not allow multiple types index. Instead, they use "_doc" as default if created internally (Elasticsearch), or "doc" default if sent by Logstash.
Currently the Translog constructor is capable both of opening an existing translog and creating a
new one (deleting existing files). This PR separates these two into separate code paths. The
constructors opens files and a dedicated static methods creates an empty translog.
This directory was removed from plugins in #28589, but docs still
referenced it. This commit cleans up the plugin author docs to no longer
refer to it.
* Search option terminate_after does not handle post_filters and aggregations correctly
This change fixes the handling of the `terminate_after` option when post_filters (or min_score) are used.
`post_filter` should be applied before `terminate_after` in order to terminate the query when enough document are accepted
by the post_filters.
This commit also changes the type of exception thrown by `terminate_after` in order to ensure that multi collectors (aggregations)
do not try to continue the collection when enough documents have been collected.
Closes#28411
We do want to keep this functionality in the future and we provide support for it.
This change is a first step towards replacing the `synonym` token filter with `synonym_graph`.
Currently the callouts for this section are below all the examples, making it
harder to relate them to the snippets. Instead they should be moved closer
to the examples.
* [DOCS] expand examples on providing mappings for create index and put mapping
The create index API and put mappings API docs the for high-level Java REST client didn't have a lot of info on how to provide mappings. This commit adds some examples.
This commit splits the async execution documentation into 2 parts, one
for the async method itself and one for the action listener. This allows
to add more doc and to use CountDownLatches in doc tests to wait for
asynchronous operations to be completed before moving to the next test.
It also renames few files.
Related to #28457
Adds allow_partial_search_results flag to search requests with default setting = true.
When false, will error if search either timeouts, has partial errors or has missing shards rather
than returning partial search results. A cluster-level setting provides a default for search requests with no flag.
Closes#27435
The `terms` query is really designed for filtering and highlighting it might
cause performance issues if it wraps many terms, so I am documenting
highlighting these queries as a best-effort only.
Closes#28099
This change adds support for the new ranking evaluation API to the High Level Rest Client.
This mostly means adding support for parsing the various response objects back from the
REST representation. It includes one change to the response syntax where previously we didn't
print the type of the metric details section but we now need it to pick the right parser to
parse this section back.
Closes#28198
This adds the ability to index term prefixes into a hidden subfield, enabling prefix queries to be run without multitermquery rewrites. The subfield reuses the analysis chain of its parent text field, appending an EdgeNGramTokenFilter. It can be configured with minimum and maximum ngram lengths. Query terms with lengths outside this min-max range fall back to using prefix queries against the parent text field.
The mapping looks like this:
"my_text_field" : {
"type" : "text",
"analyzer" : "english",
"index_prefix" : { "min_chars" : 1, "max_chars" : 10 }
}
Relates to #27049
Cluster settings shouldn't leak into the next test.
I played with failing the test if it left over any settings but that
felt like it added more ceremony then it was worth. The advantage is
that any test that intentionally wants to leave settings in place after
the test would fail and require looking at but, so far as I can tell, we
don't have any such tests.
Clear the disk watermark after the snippet showing users how to set it.
Without this our tests will fail if the disks have less than 10GB free.
Closes#28325
This pull request replaces the jvm-example plugin (from the jvm/site plugins era) by two new plugins: a custom-settings that shows how to register and use custom settings (including secured settings) in a plugin, and rest-handler plugin that shows how to register a rest handler.
The two plugins now reside in the plugins/examples project. They can serve as sample plugins for users, a special attention has been put on documentation. The packaging tests have been adapted to use the custom-settings plugin.
The implementation maintains the order of the original requests yet this
functionality is not documented. This commit adds a note to the docs
regarding the ordering of responses to an multi-get request.
Relates #28356