This adds a low level primitive operations to shrink an existing
index into a new index with a single shard. This primitive expects
all shards of the source index to allocated on a single node. Once the target index is initializing on the shrink node it takes a snapshot of the source index shards and copies all files into the target indices data folder. An [optimization](https://issues.apache.org/jira/browse/LUCENE-7300) coming in Lucene 6.1 will also allow for optional constant time copy if hard-links are supported by the filesystem. All mappings are merged into the new indexes metadata once the snapshots have been taken on the merge node.
To shrink an existing index all shards must be moved to a single node (one instance of each shard) and the index must be read-only:
```BASH
$ curl -XPUT 'http://localhost:9200/logs/_settings' -d '{
"settings" : {
"index.routing.allocation.require._name" : "shrink_node_name",
"index.blocks.write" : true
}
}
```
once all shards are started on the shrink node. the new index can be created via:
```BASH
$ curl -XPUT 'http://localhost:9200/logs/_shrink/logs_single_shard' -d '{
"settings" : {
"index.codec" : "best_compression",
"index.number_of_replicas" : 1
}
}'
```
This API will perform all needed check before the new index is created and selects the shrink node based on the allocation of the source index. This call returns immediately, to monitor shrink progress the recovery API should be used since all copy operations are reflected in the recovery API with byte copy progress etc.
The shrink operation does not modify the source index, if a shrink operation should
be canceled or if the shrink failed, the target index can simply be deleted and
all resources are released.
Like on other places in the query dsl the full field name should be used.
Before this change this wasn't the case for nested inner hits when source filtering was used.
Highlighting has a workaround, which is now removed as the source of nested inner hits can only be refered by the full name.
Closes#16653
This commit adds documentation for the bootstrap checks and provides
either links or inline guidance for setting the necessary settings to
pass the bootstrap checks.
Relates #18605
* changes `throttle_retries` to `use_throttle_retries`
* removes registering of all individual repository settings when the plugin starts. Not needed
* adds more comment about deprecated method in AWS SDK we need to implement though in a Delegate class within our tests
This PR changes the InternalTestCluster to support dedicated master nodes. The creation of dedicated master nodes can be controlled using a new `supportsMasterNodes` parameter to the ClusterScope annotation. If set to true (the default), dedicated master nodes will randomly be used. If set to false, no master nodes will be created and data nodes will also be allowed to become masters. If active, test runs will either have 1 or 3 masternodes
This commit removes the ability to specify a custom plugins
path. Instead, the plugins path will always be a subdirectory called
"plugins" off of the home directory.
- now you can specify a list of grok patterns to match your field with
and the first one to successfully match wins.
- only non-null captures will be inserted into your matched document.
Fixes#17903.
`doc_values` for _type field are created but any attempt to load them throws an IAE.
This PR re-enables `doc_values` loading for _type, it also enables `fielddata` loading for indices created between 2.0 and 2.1 since doc_values were disabled during that period.
It also restores the old docs that gives example on how to sort or aggregate on _type field.
The syntax highlighter only supports [source,js].
Also adds a check to the rest test generator that runs during
the build that'll fail the build if it sees `[source,json]`.
Significant changes:
* AbstractQueryTestCase has moved to the test framework module, in order for query builder tests in modules and plugins
* Added support to AbstractQueryTestCase to register plugins
* Lift the restriction that only one percolator could be added per index. This validation existed in MapperService, but because the percolator moved to a module it could no longer exist there. Instead of bringing it back it was removed. This validation existed since the percolator cache only supported one percolator query per document, since the percolator cache has been removed this restriction could removed as well.
* While moving percolator tests to the new module, also removed a couple of tests for the deprecated percolate and mpercolate api. These APIs are now sugar APIs for bwc and rediect to the searvh and msearvh APIs. Some tests were still testing as if percolate and mpercolate API did the percolation, but this no longer the case and these tests could be removed.
Before the query extraction would have been aborted and the percolator query would be marked as unknown.
This resulted in a situation that these queries always need to be evaluated by the memory index at search time.
By adding support for this query many more percolator query candidate hits can skip the expensive memory index verification step. For example the `match` query parser returns a MatchNoDocsQuery if the query terms are removed by text analysis (lets query text only contained stop words).
Remove the arbitrary limit on epoch_millis and epoch_seconds of 13 and 10
characters, respectively. Instead allow any character combination that can
be converted to a Java Long.
Update the docs to reflect this change.
Today if a shard fails during initialization phase due to misconfiguration, broken disks,
missing analyzers, not installed plugins etc. elasticsaerch keeps on trying to initialize
or rather allocate that shard. Yet, in the worst case scenario this ends in an endless
allocation loop. To prevent this loop and all it's sideeffects like spamming log files over
and over again this commit adds an allocation decider that stops allocating a shard that
failed more than N times in a row to allocate. The number or retries can be configured via
`index.allocation.max_retry` and it's default is set to `5`. Once the setting is updated
shards with less failures than the number set per index will be allowed to allocate again.
Internally we maintain a counter on the UnassignedInfo that is reset to `0` once the shards
has been started.
Relates to #18417
Before 5.0 for it was required that the percolator queries were cached in jvm heap as Lucene queries for two reasons:
1) Performance. The percolator evaluated all percolator queries all the time. There was no pre-selecting queries that are likely to match like we have today.
2) Updates made to percolator queries were visible in realtime, Today these changes are visible in near realtime. So updating no longer requires the percolator to have the queries in jvm heap.
So having the percolator queries in jvm heap via the percolator cache is now less attractive. Especially when there are many percolator queries then these queries can consume many GBs of jvm heap.
Removing the percolator cache does make the percolate query slower compared to how the execution time in 5.0.0-alpha1 and alpha2, but it is still faster compared to 2.x and before.
Currently the query builders expose the clauses of the span
query as a modifiable list. Instead we should make the that
getter return an unmodifiable list. Also renaming the method
used to add a clause from `clause(spanQuery)` to
`addClause(spanQuery)`.
Today when parsing settings during bootstrap, we add a system property
for every Elasticsearch setting. Additionally, settings can be set via
system properties. This commit simplifies this situation.
- settings are no longer propogated to system properties
- system properties can not be used to set settings
- the "es." prefix on settings is no longer required (nor permitted)
- test logging has a dedicated system property (tests.logger.level)
Relates #18198
* Docs: Improved tokenizer docs
Added descriptions and runnable examples
* Addressed Nik's comments
* Added TESTRESPONSEs for all tokenizer examples
* Added TESTRESPONSEs for all analyzer examples too
* Added docs, examples, and TESTRESPONSES for character filters
* Skipping two tests:
One interprets "$1" as a stack variable - same problem exists with the REST tests
The other because the "took" value is always different
* Fixed tests with "took"
* Fixed failing tests and removed preserve_original from fingerprint analyzer
I initially wrongly put this setting under `cloud.aws.s3.` prefix which does not make sense. It should be placed at the same place as `max_retries`.
Also applied @tlrx comments. We should set this even if max_retries is not set (when using default values).
Also added some documentation about this setting.
* Register `indices.query.bool.max_clause_count` setting
This commit registers `indices.query.bool.max_clause_count` as a node
level setting and removes support for its synonym setting
`index.query.bool.max_clause_count`.
Closes#18336
Two of the snippets in validate weren't working properly so they are
marked as skip and linked to this:
https://github.com/elastic/elasticsearch/issues/18254
We didn't properly handle empty parameter values. We were sending
them as the literal string "null". Now we do better and send them
as the empty string.
This commit adds a variety of real disk metrics for the block devices
that back Elasticsearch data paths. A collection of statistics are read
from /proc/diskstats and are used to report the raw metrics for
operations and read/write bytes.
Relates #15915
This uses the same backoff policy we use for bulk and just retries until
the request isn't rejected.
Instead of `{"retries": 12}` in the response to count retries this now
looks like `{"retries": {"bulk": 12", "search": 1}`.
Closes#18059
Sorts an array of values in ascending or descending order. If all elements are numerics, they will be sorted numerically. If values are strings, or mixtures of strings/numbers, the elements will be sorted lexicographically.
This change does the following:
- Queries that are currently unsupported such as prefix queries on numeric
fields or term queries on geo fields now throw an error rather than returning
a query that does not match anything.
- Fuzzy queries on numeric, date and ip fields are now unsupported: they used
to create range queries, we now expect users to use range queries directly.
Fuzzy, regexp and prefix queries are now only supported on text/keyword
fields (including `_all`).
- The `_uid` and `_id` fields do not support prefix or range queries anymore as
it would prevent us to store them more efficiently in the future, eg. by
using a binary encoding.
Note that it is still possible to ignore these errors by using the `lenient`
option of the `match` or `query_string` queries.
This commit adds a note to the Windows service docs regarding the thread
stack size setting for the Windows service installer. As the Apache
Commons procrun daemon requires that this setting be explicitly set, we
need a value to be set when the service is installed. The right place
for this setting is the jvm.options file. We do not want to ship with a
hard-coded value here because we do not want to override the default
setting on other platforms, and the right default depends on whether or
not the end-user is on a 32-bit versus a 64-bit Windows system.
Relates #18324
Previously multiple extensions could be provided, however, this can lead
to confusion with on-disk scripts (ie, "foo.js" and "foo.javascript")
having different content. Only a single extension is now supported.
The only language currently supporting multiple extensions was the
Javascript engine ("js" and "javascript"). It now only supports the
`.js` extension.
Relates to #10598
Currently `fuzziness` is not supported for the `cross_fields` type
of the `multi_match` query since it complicates the logic that
blends the term queries that cross_fields uses internally. At the
moment using this combination is silently ignored, which can lead to
confusions. Instead we should throw an exception in this case.
The same is true for phrase and phrase_prefix type.
Closes#7764
This removes all the mentions of the sandbox from the script engine
services and permissions model. This means that the following settings
are no longer supported:
```yaml
script.inline: sandbox
script.stored: sandbox
```
Instead, only a `true` or `false` value can be specified.
Since this would otherwise break the default-allow parameter for
languages like expressions, painless, and mustache, all script engines
have been updated to have individual settings, for instance:
```yaml
script.engine.groovy.inline: true
```
Would enable all inline scripts for groovy. (they can still be
overridden on a per-operation basis).
Expressions, Painless, and Mustache all default to `true` for inline,
file, and stored scripts to preserve the old scripting behavior.
Resolves#17114
Rearranges the FingerprintAnalyzer so that AsciiFolding comes earlier in the chain (after lowercasing, before stop removal, for maximum deduping power)
Closes#18266
It will keep using the caching terms enum for keyword/text fields and falls back
to IndexSearcher.count for fields that do not use the inverted index for
searching (such as numbers and ip addresses). Note that this probably means that
significant terms aggregations on these fields will be less efficient than they
used to be. It should be ok under a sampler aggregation though.
This moves tests back to the state they were in before numbers started using
points, and also adds a new test that significant terms aggs fail if a field is
not indexed.
In the long term, we might want to follow the approach that Robert initially
proposed that consists in collecting all documents from the background filter in
order to compute frequencies using doc values. This would also mean that
significant terms aggregations do not require fields to be indexed anymore.
* Docs: First pass at improving analyzer docs
I've rewritten the intro to analyzers plus the docs
for all analyzers to provide working examples.
I've also removed:
* analyzer aliases (see #18244)
* analyzer versions (see #18267)
* snowball analyzer (see #8690)
Next steps will be tokenizers, token filters, char filters
* Fixed two typos
This commit adds a hard requirement to the RPM and Debian packages for
/bin/bash to be present, and adds a note regarding this to the migration
docs.
Relates #18259
Remove performance hack for accessing a document's fields, its not needed.
Add support for accessing is-getter methods like List.isEmpty() as .empty
Closes#18201
This makes it much easier to apply to other projects.
Fixes to doc tests infrastructure:
* Fix comparing lists. Was totally broken.
* Fix order of actual vs expected parameters.
* Allow multiple `// TESTRESPONSE` lines with substitutions to join
into one big list of subtitutions. This makes lets the docs look
tidier.
* Exclude build from snippet scanning
* Allow subclasses of ESRestTestCase access to the admin execution context
The plugin script parses command-line options looking for Java system
properties and extracts these arguments to pass to the java command when
starting the JVM. Since elasticsearch-plugin allows arbitrary user
arguments to the JVM via ES_JAVA_OPTS, this parsing is unnecessary. This
commit removes this unnecessary
Relates #18207
We should prevent highlighting if a field is anything but a text or keyword field.
However, someone might implement a custom field type that has text and still want to
highlight on that. We cannot know in advance if the highlighter will be able to
highlight such a field and so we do the following:
If the field is only highlighted because the field matches a wildcard we assume
it was a mistake and do not process it.
If the field was explicitly given we assume that whoever issued the query knew
what they were doing and try to highlight anyway.
closes#17537
The `ip` field uses a binary representation internally. This breaks when
rendering sort values in search responses since elasticsearch tries to write a
binary byte[] as an utf8 json string. This commit extends the `DocValueFormat`
API in order to give fields a chance to choose how to render values.
Closes#6077
QueryBuilder has generics, but those are never used: all call sites use
`QueryBuilder<?>`. Only `AbstractQueryBuilder` needs generics so that the base
class can contain a default implementation for setters that returns `this`.
This gives better coverage and consistency with the scripting APIs, by
whitelisting the primary search scripting API classes and using them instead
of only Map and List methods.
For example, accessing fields can now be done with `.value` instead of `.0`
because `getValue()` is whitelisted. For now, access to a document's fields in
this way (loads) are fast-pathed in the code, to avoid dynamic overhead.
Access to geo fields and geo distance functions is now supported.
TODO: date support (e.g. whitelist ReadableDateTime methods as a start)
TODO: improve docs (like expressions and groovy have for document's fields)
TODO: remove fast-path hack
Closes#18169
Squashed commit of the following:
commit ec9f24b2424891a7429bb4c0a03f9868cba0a213
Author: Robert Muir <rmuir@apache.org>
Date: Thu May 5 17:59:37 2016 -0400
cutover to <Def> instead of <Object> here
commit 9edb1550438acd209733bc36f0d2e0aecf190ecb
Author: Robert Muir <rmuir@apache.org>
Date: Thu May 5 17:03:02 2016 -0400
add fast-path for docvalues field loads
commit f8e38c0932fccc0cfa217516130ad61522e59fe5
Author: Robert Muir <rmuir@apache.org>
Date: Thu May 5 16:47:31 2016 -0400
Painless: add fielddata accessors (.value/.values/.distance()/etc)
The snippet in the docs creates and index and uses it with the
_analyze api. The trouble is that if the index hasn't been created
fully the _analyze API will fail. This adds a
GET _cluster/health?wait_for_status=yellow
which fixes the issue.
While this does make the docs more cluttered, it also makes the snippets
actually runnable.
Closes#18165
Adds infrastructure so `gradle :docs:check` will extract tests from
snippets in the documentation and execute the tests. This is included
in `gradle check` so it should happen on CI and during a normal build.
By default each `// AUTOSENSE` snippet creates a unique REST test. These
tests are executed in a random order and the cluster is wiped between
each one. If multiple snippets chain together into a test you can annotate
all snippets after the first with `// TEST[continued]` to have the
generated tests for both snippets joined.
Snippets marked as `// TESTRESPONSE` are checked against the response
of the last action.
See docs/README.asciidoc for lots more.
Closes#12583. That issue is about catching bugs in the docs during build.
This catches *some* bugs in the docs during build which is a good start.
All other values are errors.
Add java test for throttling. We had a REST test but it only ran against
one node so it didn't catch serialization errors.
Add Simple round trip test for rethrottle request
* Reorganize scripting documentation
* Further changes to tidy up scripting docs
Closes#18116
* Add note about .lat/lon potentially returning null
* Added .value to expressions example
* Fixed two bad ASCIIDOC links
With this commit we compress HTTP responses provided the client
supports it (as indicated by the HTTP header 'Accept-Encoding').
We're also able to process compressed HTTP requests if needed.
The default compression level is lowered from 6 to 3 as benchmarks
have indicated that this reduces query latency with a negligible
increase in network traffic.
Closes#7309
Fix a limitation that prevent from hierarchical inner hits be defined in query dsl.
Removed the nested_path, parent_child_type and query options from inner hits dsl. These options are only set by ES
upon parsing the has_child, has_parent and nested queries are using their respective query builders.
These options are still used internally, when these options are set a new private copy is created based on the
provided InnerHitBuilder and configuring either nested_path or parent_child_type and the inner query of the query builder
being used.
Closes#11118
* Add isSearchable and isAggregatable (collapsed to true if any of the instances of that field are searchable or aggregatable).
* Accept wildcards in field names.
* Add a section named conflicts for fields with the same name but with incompatible types (instead of throwing an exception).
Hi,
I've created a query builder DSL for Kotlin language that mimics the JSON query DSL.
This makes it easier to translate the documentation targeting the JSON api onto kotlin code.
Please consider adding it to the list of community clients.
Thanks,
Mike Buhot
This commit actually bounds the size of the generic thread pool. The
generic thread pool was of type cached, a thread pool with an unbounded
number of workers and an unbounded work queue. With this commit, the
generic thread pool is now of type scaling. As such, the cached thread
pool type has been removed. By default, the generic thread pool is
constructed with a core pool size of four, a max pool size of 128 and
idle workers can be reaped after a keep-alive time of thirty seconds
expires. The work queue for this thread pool remains unbounded.
The getting started docs use dynamic mappings. With the recent change to
string split into text and keyword, text lost the default ability to do
aggs. This was added back in #17188. This change updates the getting
started examples to use the keyword multi field added to dynamically
mapped text fields.
closes#17941
Lucene allows to create a ICUTokenizer with a special config argument
enabling the customization of the rule based iterator by providing
custom rules files.
This commit enable this feature. Users could provide a list of RBBI rule
files to ICU tokenizer.
closes#13146
* `rename` processor, renamed `to` to `target_field`
* `date` processor, renamed `match_field` to `field` and renamed `match_formats` to `formats`
* `geoip` processor, renamed `source_field` to `field` and renamed `fields` to `properties`
* `attachment` processor, renamed `source_field` to `field` and renamed `fields` to `properties`
Closes#17835
Adds a `fingerprint` token filter which uses Lucene's FingerprintFilter,
and a `fingerprint` analyzer that combines the Fingerprint filter with
lowercasing, stop word removal and asciifolding.
Closes#13325
In Elasticsearch 5.0.0, by default unquoted field names in JSON will be
rejected. This can cause issues, however, for documents that were
already indexed with unquoted field names. To alleviate this, a system
property has been added that can be enabled so migration can occur.
This system property will be removed in Elasticsearch 6.0.0
Resolves#17674
* Added an extra `field` parameter to the `percolator` query to indicate what percolator field should be used. This must be an existing field in the mapping of type `percolator`.
* The `.percolator` type is now forbidden. (just like any type that starts with a `.`)
This only applies for new indices created on 5.0 and later. Indices created on previous versions the .percolator type is still allowed to exist.
The new `percolator` field type isn't active in such indices and the `PercolatorQueryCache` knows how to load queries from these legacy indices.
The `PercolatorQueryBuilder` will not enforce that the `field` parameter is of type `percolator`.
We advertise in our documentation that byte units are like `kb`, `mb`... But we actually only support the simple notation `k` or `m`.
This commit adds support for the documented form and keeps the non documented options to avoid any breaking change.
It also adds support for `micros`, `nanos` and `d` as a time unit in `_cat` API.
Remove the support for `b` as a SizeValue unit. Actually, for numbers, when using raw numbers without unit, there is no text to add/parse after the number. For example, you don't write `10` as `10b`. We support option like `size=` in `_cat` API which means that we want to display raw data without unit (singles).
Documentation updated accordingly.
Add test for the empty size option.
Fix missing TimeValues options for some cat APIs
This makes all numeric fields including `date`, `ip` and `token_count` use
points instead of the inverted index as a lookup structure. This is expected
to perform worse for exact queries, but faster for range queries. It also
requires less storage.
Notes about how the change works:
- Numeric mappers have been split into a legacy version that is essentially
the current mapper, and a new version that uses points, eg.
LegacyDateFieldMapper and DateFieldMapper.
- Since new and old fields have the same names, the decision about which one
to use is made based on the index creation version.
- If you try to force using a legacy field on a new index or a field that uses
points on an old index, you will get an exception.
- IP addresses now support IPv6 via Lucene's InetAddressPoint and store them
in SORTED_SET doc values using the same encoding (fixed length of 16 bytes
and sortable).
- The internal MappedFieldType that is stored by the new mappers does not have
any of the points-related properties set. Instead, it keeps setting the index
options when parsing the `index` property of mappings and does
`if (fieldType.indexOptions() != IndexOptions.NONE) { // add point field }`
when parsing documents.
Known issues that won't fix:
- You can't use numeric fields in significant terms aggregations anymore since
this requires document frequencies, which points do not record.
- Term queries on numeric fields will now return constant scores instead of
giving better scores to the rare values.
Known issues that we could work around (in follow-up PRs, this one is too large
already):
- Range queries on `ip` addresses only work if both the lower and upper bounds
are inclusive (exclusive bounds are not exposed in Lucene). We could either
decide to implement it, or drop range support entirely and tell users to
query subnets using the CIDR notation instead.
- Since IP addresses now use a different representation for doc values,
aggregations will fail when running a terms aggregation on an ip field on a
list of indices that contains both pre-5.0 and 5.0 indices.
- The ip range aggregation does not work on the new ip field. We need to either
implement range aggs for SORTED_SET doc values or drop support for ip ranges
and tell users to use filters instead. #17700Closes#16751Closes#17007Closes#11513
The change adds a new option to the geo_* queries: ignore_unmapped. If this option is set to false, the toQuery method on the QueryBuilder will throw an exception if the field specified in the query is unmapped. If the option is set to true, the toQuery method on the QueryBuilder will return a MatchNoDocsQuery. The default value is false so the queries work how they do today (throwing an exception on unmapped field)
The change adds a new option to the `nested`, `has_parent`, `has_children` and `parent_id` queries: `ignore_unmapped`. If this option is set to false, the `toQuery` method on the QueryBuilder will throw an exception if the type/path specified in the query is unmapped. If the option is set to true, the `toQuery` method on the QueryBuilder will return a MatchNoDocsQuery. The default value is `false`so the queries work how they do today (throwing an exception on unmapped paths/types)
With this commit we limit the size of all in-flight requests on
transport level. The size is guarded by a circuit breaker and is
based on the content size of each request.
By default we use 100% of available heap meaning that the parent
circuit breaker will limit the maximum available size. This value
can be changed by adjusting the setting
network.breaker.inflight_requests.limit
Relates #16011
This commit adds a new configuration file jvm.options to centralize and
simplify management of JVM options. This separates the configuration of
the JVM from the packaging scripts (bin/elasticsearch*, bin/service.bat,
and init.d/elasticsearch) simplifying end-user operational management of
custom JVM options.
CBOR is natively supported in Elasticsearch and allows for byte arrays.
This means, that by using CBOR the user can prevent base64 conversions
for the data being sent back and forth.
This PR adds support to extract data from a byte array in addition to
a string. This also required to add a ByteArrayValueSource class.
The doc mentions match_path in one place but the correct syntax is path_match which is mentioned everywhere else. Using the wrong string leads to errors because the mapping becomes too greedy, and matches things it shouldn't.
Now the `match` query has been split out into `match`, `match_phrase` and `match_phrase_prefix` we need to update the docs to remove the deprecated syntax
Closes#17513
The current example in the documentation for Index Templates lacks any properties values. This is helpful to many devs that aren't sure how to take a regular Index Mapping and convert it to a template.
IMHO the original text here was incomplete. Adding the simple words 'in the index mapping' makes this sentence more clear. Perhaps a be more clear to make this a link.
* [TEST] check registered queries one by one in SearchModuleTests
* Switch to using ParseField to parse query names
If we have a deprecated query name, at the moment we don't have a way to log any deprecation warning nor fail when we are in strict mode. With this change we use ParseField, which will take care of the camel casing that we currently do manually (so that one day we can remove it more easily). This also means, that each query will have a unique preferred name, and all the other names are deprecated.
Terms query "in" synonym is now formally deprecated, as well as fuzzy_match, match_fuzzy, match_phrase and match_phrase_prefix for match query, mlt for more_like_this and geo_bbox for geo_bounding_box. All these will be removed in 6.0.
Every QueryParser holds now a ParseField constant called QUERY_NAME_FIELD that holds the name for it. The first name is the preferred one, all the others are deprecated. The first name is taken from the NAME constant already present in each query builder object, so that we somehow keep the serialization constant separated from ParseField. This change also allowed us to remove the names method from the QueryParser interface.
apart from locahost typo, the issue is that localhost is not 100% safe
for all distros with IPv6.
For example fedora23 defines localhost4 and localhost6 (among other
aliases) so `curl localhost:9200` doesn't work.
For this reason, I think it's safer to replace localhost with 127.0.0.1
By default, tasks are grouped by node. However, task execution in elasticsearch can be quite complex and an individual task that runs on a coordinating node can have many subtasks running on other nodes in the cluster. This commit makes it possible to list task grouped by common parents instead of by node. When this option is enabled all subtask are grouped under the coordinating node task that started all subtasks in the group. To group tasks by common parents, use the following syntax:
GET /tasks?group_by=parents
This commit adds the new `action.search.shard_count.limit` setting which
configures the maximum number of shards that can be queried in a single search
request. It has a default value of 1000.
Both top level and inline inner hits are now covered by InnerHitBuilder.
Although there are differences between top level and inline inner hits,
they now make use of the same builder logic.
The parsing of top level inner hits slightly changed to be more readable.
Before the nested path or parent/child type had to be specified as encapsuting
json object, now these settings are simple fields. Before this was required
to allow streaming parsing of inner hits without missing contextual information.
Once some issues are fixed with inline inner hits (around multi level hierachy of inner hits),
top level inner hits will be deprecated and removed in the next major version.
Today the basic node settings like `node.data` and `node.master` can't really be fully validated
since we allow to specify custom user attributes on the node level. We have to, in order to
support that, add a wildcard setting for `node.*` to let these setting pass validation.
Instead we should require a more contraint prefix like `node.attr.` that defines a namespace
that is reserved for user attributes.
This commit adds a new namespace for attributes in `node.attr`.
Closes#17280
This is to prevent mapping explosion when dynamic keys such as UUID are used as field names. index.mapping.total_fields.limit specifies the total number of fields an index can have. An exception will be thrown when the limit is reached. The default limit is 1000. Value 0 means no limit. This setting is runtime adjustable
Closes#11443
This adds a new `/_cluster/allocation/explain` API that explains why a
shard can or cannot be allocated to nodes in the cluster. Additionally,
it will show where the master *desires* to put the shard, according to
the `ShardsAllocator`.
It looks like this:
```
GET /_cluster/allocation/explain?pretty
{
"index": "only-foo",
"shard": 0,
"primary": false
}
```
Though, you can optionally send an empty body, which means "explain the
allocation for the first unassigned shard you find".
The output when a shard is unassigned looks like this:
```
{
"shard" : {
"index" : "only-foo",
"index_uuid" : "KnW0-zELRs6PK84l0r38ZA",
"id" : 0,
"primary" : false
},
"assigned" : false,
"unassigned_info" : {
"reason" : "INDEX_CREATED",
"at" : "2016-03-22T20:04:23.620Z"
},
"nodes" : {
"V-Spi0AyRZ6ZvKbaI3691w" : {
"node_name" : "Susan Storm",
"node_attributes" : {
"bar" : "baz"
},
"final_decision" : "NO",
"weight" : 0.06666675,
"decisions" : [ {
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index include filters [foo:\"bar\"]"
} ]
},
"Qc6VL8c5RWaw1qXZ0Rg57g" : {
"node_name" : "Slipstream",
"node_attributes" : {
"bar" : "baz",
"foo" : "bar"
},
"final_decision" : "NO",
"weight" : -1.3833332,
"decisions" : [ {
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated on the same node id [Qc6VL8c5RWaw1qXZ0Rg57g] on which it already exists"
} ]
},
"PzdyMZGXQdGhqTJHF_hGgA" : {
"node_name" : "The Symbiote",
"node_attributes" : { },
"final_decision" : "NO",
"weight" : 2.3166666,
"decisions" : [ {
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index include filters [foo:\"bar\"]"
} ]
}
}
}
```
And when the shard *is* assigned, the output looks like:
```
{
"shard" : {
"index" : "only-foo",
"index_uuid" : "KnW0-zELRs6PK84l0r38ZA",
"id" : 0,
"primary" : true
},
"assigned" : true,
"assigned_node_id" : "Qc6VL8c5RWaw1qXZ0Rg57g",
"nodes" : {
"V-Spi0AyRZ6ZvKbaI3691w" : {
"node_name" : "Susan Storm",
"node_attributes" : {
"bar" : "baz"
},
"final_decision" : "NO",
"weight" : 1.4499999,
"decisions" : [ {
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index include filters [foo:\"bar\"]"
} ]
},
"Qc6VL8c5RWaw1qXZ0Rg57g" : {
"node_name" : "Slipstream",
"node_attributes" : {
"bar" : "baz",
"foo" : "bar"
},
"final_decision" : "CURRENTLY_ASSIGNED",
"weight" : 0.0,
"decisions" : [ {
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated on the same node id [Qc6VL8c5RWaw1qXZ0Rg57g] on which it already exists"
} ]
},
"PzdyMZGXQdGhqTJHF_hGgA" : {
"node_name" : "The Symbiote",
"node_attributes" : { },
"final_decision" : "NO",
"weight" : 3.6999998,
"decisions" : [ {
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index include filters [foo:\"bar\"]"
} ]
}
}
}
```
Only "NO" decisions are returned by default, but all decisions can be
shown by specifying the `?include_yes_decisions=true` parameter in the
request.
Resolves#14593
https://github.com/elastic/elasticsearch/pull/17288 added a check to enforce that the `discovery.zen.minimum_master_nodes` configuration is set when nodes have the `host`, `port`, or `bind_host` set in either `transport` or general `network` configuration sections. This was documented incorrectly as "nodes that are bound to a non-loopback interface", which lead to confusion as I set `network.host: "localhost"` and the check was still failing.
This change updates the docs to detail the actual check. I think it also highlights how complex the check is and the need for a simpler solution.
discovery.zen.minimum_master_nodes is the single most important setting to set on a production cluster. We have no way of supplying a good default so it must be set by the user. Binding a node to a public IP (as opposed to the default local host) is a good enough indication that a node will be part of a production cluster cluster and thus it's a good tradeoff to enforce the settings. Note that nothing prevent users from setting it to 1 in a single node cluster.
Closes#17288
We currently have a `discovery.zen.master_election.filter_client` setting that control whether their ping responses are ignored for master election (which is the current default). With the push to treat client nodes as normal nodes (and promote the transport/rest clients for client work), this should be changed. This commit remove this setting and it's companion `discovery.zen.master_election.filter_data` setting (currently defaulting to false) in favor of singe `discovery.zen.master_election.ignore_non_master_pings` setting with more intuitive name (defaulting to false).
Resolves#17325Closes#17329
The available memory metric was always set to `0` since 2.0.beta1 (bug). was left behind but never set. Turns out the section wasn't that useful, as it would only output the total memory available throughout all nodes in the cluster. We decided to remove the section then.
In #17198, we removed suggest transport action, which
used the `suggest` threadpool to execute requests. Now
`suggest` threadpool is unused and suggest requests are
executed on the `search` threadpool.
We can be better at checking `buffer_size` and `chunk_size` for S3 repositories.
For example, we know that:
* `buffer_size` should be more than `5mb`
* `chunk_size` should be no more than `5tb`
* `buffer_size` should be lower than `chunk_size`
Otherwise, setting `buffer_size` is useless.
For the record:
`chunk_size` is a Snapshot setting whatever the implementation is.
`buffer_size` is an S3 implementation setting.
Let say that you are snapshotting a 500mb file. If you set `chunk_size` to `200mb`, then Snapshot service will call S3 repository to snapshot 3 files with the following sizes:
* `200mb`
* `200mb`
* `100mb`
If you set `buffer_size` to `100mb` (AWS maximum size recommendation), the first file of `200mb` will be uploaded on S3 using the multipart feature in 2 chunks and the workflow is basically the following:
* create the multipart request and get back an `id` from AWS S3 platform
* upload part1: `100mb`
* upload part2: `100mb`
* "commit" the full upload using the `id`.
Closes#17244.
Currently if you run an `exists` query on an object, it will resolve all sub
fields and create a disjunction for all those fields. However the `_field_names`
mapper indexes paths for objects so we could query object paths directly.
I also changed the query parser to reject `exists` queries if the `_field_names`
field is disabled since it would be a big performance trap.
In 5.0 we don't allow index settings to be specified on the node level ie.
in yaml files or via commandline argument. This can cause problems during
upgrade if this was used extensively. For instance if analyzers where
specified on a node level this might cause the index to be closed when
imported (see #17187). In such a case all indices relying on this
must be updated via `PUT /${index}/_settings`. Yet, this API has slightly
different semantics since it overrides existing settings. To make this less
painful this change adds a `preserve_existing` parameter on that API to ensure
we have the same semantics as if the setting was applied on the node level.
This change also adds a better error message and a change to the migration guide
to ensure upgrades are smooth if index settings are specified on the node level.
If a index setting is detected this change fails the node startup and prints a message
like this:
```
*************************************************************************************
Found index level settings on node level configuration.
Since elasticsearch 5.x index level settings can NOT be set on the nodes
configuration like the elasticsearch.yaml, in system properties or command line
arguments.In order to upgrade all indices the settings must be updated via the
/${index}/_settings API. Unless all settings are dynamic all indices must be closed
in order to apply the upgradeIndices created in the future should use index templates
to set default values.
Please ensure all required values are updated on all indices by executing:
curl -XPUT 'http://localhost:9200/_all/_settings?preserve_existing=true' -d '{
"index.number_of_shards" : "1",
"index.query.default_field" : "main_field",
"index.translog.durability" : "async",
"index.ttl.disable_purge" : "true"
}'
*************************************************************************************
```
Also replaced the PercolatorQueryRegistry with the new PercolatorQueryCache.
The PercolatorFieldMapper stores the rewritten form of each percolator query's xcontext
in a binary doc values field. This make sure that the query rewrite happens only during
indexing (some queries for example fetch shapes, terms in remote indices) and
the speed up the loading of the queries in the percolator query cache.
Because the percolator now works inside the search infrastructure a number of features
(sorting fields, pagination, fetch features) are available out of the box.
The following feature requests are automatically implemented via this refactoring:
Closes#10741Closes#7297Closes#13176Closes#13978Closes#11264Closes#10741Closes#4317
Today, certain bootstrap properties are set and read via system
properties. This action-at-distance way of managing these properties is
rather confusing, and completely unnecessary. But another problem exists
with setting these as system properties. Namely, these system properties
are interpreted as Elasticsearch settings, not all of which are
registered. This leads to Elasticsearch failing to startup if any of
these special properties are set. Instead, these properties should be
kept as local as possible, and passed around as method parameters where
needed. This eliminates the action-at-distance way of handling these
properties, and eliminates the need to register these non-setting
properties. This commit does exactly that.
Additionally, today we use the "-D" command line flag to set the
properties, but this is confusing because "-D" is a special flag to the
JVM for setting system properties. This creates confusion because some
"-D" properties should be passed via arguments to the JVM (so via
ES_JAVA_OPTS), and some should be passed as arguments to
Elasticsearch. This commit changes the "-D" flag for Elasticsearch
settings to "-E".
This commit adds fields bytes_recovered and files_recovered to the cat
recovery API. These fields, respectively, indicate the total number of
bytes and files recovered. Additionally, for consistency, some totals
fields and translog recovery fields have been renamed.
Closes#17064
Enables the touching of all memory pages used by the JVM heap spaces
during initialization of the HotSpot VM, which commits all memory pages
at initialization time. By default, pages are committed only as they are
needed.
The ingest stats include the following statistics:
* `ingest.total.count`- The total number of document ingested during the lifetime of this node
* `ingest.total.time_in_millis` - The total time spent on ingest preprocessing documents during the lifetime of this node
* `ingest.total.current` - The total number of documents currently being ingested.
* `ingest.total.failed` - The total number ingest preprocessing operations failed during the lifetime of this node
Also these stats are returned on a per pipeline basis.
This commit updates the documentation for GeoPointField by removing all references to the coerce and doc_values parameters. DocValues are enabled in lucene GeoPointField by default (required for boundary filtering). The QueryBuilders are updated to automatically normalize points (ignoring the coerce parameter) for any index created onOrAfter version 2.2.
Internally the put pipeline API uses this information in node info API to validate if all specified processors in a pipeline exist on all nodes in the cluster.
Closes#16964
Squashed commit of the following:
commit a23f9d2d29220991aa498214530753d7a5a148c6
Merge: eec9c4e 0b0a251
Author: Robert Muir <rmuir@apache.org>
Date: Mon Mar 7 04:12:02 2016 -0500
Merge branch 'master' into lucene6
commit eec9c4e5cd11e9c3e0b426f04894bb2a6dae4f21
Merge: bc67205 675d940
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 13:45:00 2016 -0500
Merge branch 'master' into lucene6
commit bc67205bdfe1526eae277ab7856fc050ecbdb7b2
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 09:56:31 2016 -0500
fix test bug
commit a60723b007ff12d97b1810cef473bd7b553a0327
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 15:35:35 2016 +0100
Fix SimpleValidateQueryIT to put braces around boosted terms
commit ae3a49d7ba7ced448d2a5262e5d8ec98671a9090
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 15:27:25 2016 +0100
fix multimatchquery
commit ae23fdb88a8f6d3fb7ba60fd1aaf3fd72d899aa5
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 15:20:49 2016 +0100
Rewrite DecayFunctionScoreIT to be independent of the similarity used
This test relied a lot on the term scoring and compared scores
that are dependent on the similarity. This commit changes the base query
to be a predictable constant score query.
commit 366c2d518c35d31251033f1b6f6a93f6e2ae327d
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 14:06:14 2016 +0100
Fix scoring in tests due to changes to idf calculation.
Lucene 6 uses a different default similarity as well as a different
way to calculate IDF. In contrast to older version lucene 6 uses docCount per field
to calculate the IDF not the # of docs in the index to overcome the sparse field
cases.
commit dac99fd64ac2fa71b8d8d106fe68825e574c49f8
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 08:21:57 2016 -0500
don't hardcoded expected termquery score
commit 6e9f340ba49ab10eed512df86d52a121aa775b0f
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 08:04:45 2016 -0500
suppress deprecation warning until migrated to points
commit 3ac8908424b3fdad44a90a4f7bdb3eff7efd077d
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 07:21:43 2016 -0500
Remove invalid test: all commits have IDs, and its illegal to do this.
commit c12976288124ad1a26467e7e848fb810548e7eab
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 07:06:14 2016 -0500
don't test with unsupported back compat
commit 18bbfe76128570bc70883bf91ff4c44c82d27817
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 07:02:18 2016 -0500
remove now invalid lucene 4 backcompat test
commit 7e730e572886f0ef2d3faba712e4256216ff01ec
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 06:58:52 2016 -0500
remove now invalid lucene 4 backwards test
commit 244d2ab6868ba5ac9e0bcde3c2833743751a25ec
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 06:47:23 2016 -0500
use 6.0 codec
commit 5f64d4a431a6fdaa1234adca23f154c2a1de8284
Author: Robert Muir <rmuir@apache.org>
Date: Fri Mar 4 06:43:08 2016 -0500
compile, javadocs, forbidden-apis, etc
commit 1f273cd62a7fe9ca8f8944acbbfc5cbdd3d81ccb
Merge: cd33921 29e3443
Author: Simon Willnauer <simonw@apache.org>
Date: Fri Mar 4 10:45:29 2016 +0100
Merge branch 'master' into lucene6
commit cd33921ac742ef9fb351012eff35f3c7dbda7264
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:58:37 2016 -0500
fix hunspell dictionary loading
commit c7fdbd837b01f7defe9cb1c24e2ec65604b0dc96
Merge: 4d4190f d8948ba
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:41:53 2016 -0500
Merge branch 'master' into lucene6
commit 4d4190fd82601aaafac6b8254ccb3edf218faa34
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:39:14 2016 -0500
remove nocommit
commit 77ca69e288b1a41aa9595c921ed166c272a00ea8
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:38:24 2016 -0500
clean up numericutils vs legacynumericutils
commit a466d696fbaad04b647ffbc0857a9439b583d0bf
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:32:43 2016 -0500
upgrade spatial4j
commit 5412c747a8cfe638bacedbc8233163cb75cc3dc5
Author: Robert Muir <rmuir@apache.org>
Date: Thu Mar 3 23:19:28 2016 -0500
move to 6.0.0-snapshot-8eada27
commit b32bfe924626b87e540692375ece09e7c2edb189
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 11:30:09 2016 +0100
Fix some test compile errors.
commit 6ccde35e9840b03c68d1a2cd47c7923a06edf64a
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 11:25:51 2016 +0100
Current Lucene version is 6.0.0.
commit f62e1015d931b4cc04c778298a8fa1ba65e97ad9
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 11:20:48 2016 +0100
Fix compile errors in NGramTokenFilterFactory.
commit 6837c6eabf96075f743649da9b9b52dd39611c58
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:50:59 2016 +0100
Fix the edge ngram tokenizer/filter.
commit ccd7f070de5efcdfbeb34b9555c65c4990bf1ba6
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:42:44 2016 +0100
The missing value is now accessible through a getter.
commit bd3b77f9b28e5b05daa3d49683a9922a6baf2963
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:41:51 2016 +0100
Remove IndexCacheableQuery.
commit 05f3091c347aeae80eeb16349ac51d2b53cf86f7
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:39:43 2016 +0100
Fix compilation of function_score queries.
commit 81cda79a2431ac78f56b0cc5a5765387f662d801
Author: Adrien Grand <jpountz@gmail.com>
Date: Thu Mar 3 10:35:02 2016 +0100
Fix compile errors in BlendedTermQuery.
commit 70994ce8dd1eca0b995870974a38e20f26f96a7b
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 23:33:03 2016 -0500
add bug ID
commit 29d4f1a71f36f646b5a6060bed3db019564a279d
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 21:02:32 2016 -0500
easy .store changes
commit 5e1a1e6fd665fa455e88d3a8987362fad5f44bb1
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 20:47:24 2016 -0500
cleanups mostly around boosting
commit 333a669ec6c305ada5645d13ed1da0e19ec1d053
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 20:27:56 2016 -0500
more simple fixes
commit bd5cd98a1e089c866b6b4a5e159400b110140ce6
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 19:49:38 2016 -0500
more easy fixes and removal of ancient cruft
commit a68f419ee47da5f9c9ce5b372f01d707e902474c
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 19:35:02 2016 -0500
cutover numerics
commit 4ca5dc1fa47dd5892db00899032133318fff3116
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 18:34:18 2016 -0500
fix some constants
commit 88710a17817086e477c6c021ec346d0534b7fb88
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 18:14:25 2016 -0500
Add spatial-extras jar as a core dependency
commit c8cd6726583e5ce3f546ed355d4eca037164a30d
Author: Robert Muir <rmuir@apache.org>
Date: Wed Mar 2 18:03:33 2016 -0500
update to lucene 6 jars
The cluster stats api now returns counts for each node role. The `master_data`, `master_only`, `data_only` and `client` fields have been removed from the response in favour of `master`, `data`, `ingest` and `coordinating_only`. The same node can have multiple roles, hence contribute to multiple roles counts. Every node is implicitly a coordinating node, so whenever a node has no explicit roles, it will be counted as coordinating only.
_cat/nodes used to return `c` for client node or `d` for data node as part of the node.role column. This commit changes it to return `m` for master eligible, `d` for data and/or `i` for ingest. A node with no explicit roles will be a coordinating only node and marked with `-`. A node can obviously have multiple roles. The master column has been adapted to return only whether a node is the current master (`*`) or not (`-`).