Start moving built in analysis components into the new analysis-common
module. The goal of this project is:
1. Remove core's dependency on lucene-analyzers-common.jar which should
shrink the dependencies for transport client and high level rest client.
2. Prove that analysis plugins can do all the "built in" things by moving all
"built in" behavior to a plugin.
3. Force tests not to depend on any oddball analyzer behavior. If tests
need anything more than the standard analyzer they can use the mock
analyzer provided by Lucene's test infrastructure.
This change adds an index setting to define how the documents should be sorted inside each Segment.
It allows any numeric, date, boolean or keyword field inside a mapping to be used to sort the index on disk.
It is not allowed to use a `nested` fields inside an index that defines an index sorting since `nested` fields relies on the original sort of the index.
This change does not add early termination capabilities in the search layer. This will be added in a follow up.
Relates #6720
We want to upgrade to Lucene 7 ahead of time in order to be able to check whether it causes any trouble to Elasticsearch before Lucene 7.0 gets released. From a user perspective, the main benefit of this upgrade is the enhanced support for sparse fields, whose resource consumption is now function of the number of docs that have a value rather than the total number of docs in the index.
Some notes about the change:
- it includes the deprecation of the `disable_coord` parameter of the `bool` and `common_terms` queries: Lucene has removed support for coord factors
- it includes the deprecation of the `index.similarity.base` expert setting, since it was only useful to configure coords and query norms, which have both been removed
- two tests have been marked with `@AwaitsFix` because of #23966, which we intend to address after the merge
When indexing a document via the bulk API where IDs can be explicitly
specified, we currently accept an empty ID. This is problematic because
such a document can not be obtained via the get API. Instead, we should
rejected these requets as accepting them could be a dangerous form of
leniency. Additionally, we already have a way of specifying
auto-generated IDs and that is to not explicitly specify an ID so we do
not need a second way. This commit rejects the individual requests where
ID is specified but empty.
Relates #24118
_field_stats has evolved quite a lot to become a multi purpose API capable of retrieving the field capabilities and the min/max value for a field.
In the mean time a more focused API called `_field_caps` has been added, this enpoint is a good replacement for _field_stats since he can
retrieve the field capabilities by just looking at the field mapping (no lookup in the index structures).
Also the recent improvement made to range queries makes the _field_stats API obsolete since this queries are now rewritten per shard based on the min/max found for the field.
This means that a range query that does not match any document in a shard can return quickly and can be cached efficiently.
For these reasons this change deprecates _field_stats. The deprecation should happen in 5.4 but we won't remove this API in 6.x yet which is why
this PR is made directly to 6.0.
The rest tests have also been adapted to not throw an error while this change is backported to 5.4.
This change introduces a new API called `_field_caps` that allows to retrieve the capabilities of specific fields.
Example:
````
GET t,s,v,w/_field_caps?fields=field1,field2
````
... returns:
````
{
"fields": {
"field1": {
"string": {
"searchable": true,
"aggregatable": true
}
},
"field2": {
"keyword": {
"searchable": false,
"aggregatable": true,
"non_searchable_indices": ["t"]
"indices": ["t", "s"]
},
"long": {
"searchable": true,
"aggregatable": false,
"non_aggregatable_indices": ["v"]
"indices": ["v", "w"]
}
}
}
}
````
In this example `field1` have the same type `text` across the requested indices `t`, `s`, `v`, `w`.
Conversely `field2` is defined with two conflicting types `keyword` and `long`.
Note that `_field_caps` does not treat this case as an error but rather return the list of unique types seen for this field.
Today, when parsing mget requests, we silently ignore keys in the top
level that do not match "docs" or "ids". This commit addresses this
situation by throwing an exception if any other key occurs here, and
providing the names of valid keys.
Relates #23746
This is especially useful when we rewrite the query because the result of the rewrite can be very different on different shards. See #18254 for example.
A previous attempt to address a race condition in this test set wait for
active shards to all. However, there might not be any replicas if the
test is only running with one node so we end up waiting
forever. Instead, to address the intial race condition, we just count
through the primary.
In #23638 we renamed `request_cache` to `request` in the
`_cache/clear` API. But it is only going to be committed back to
5.x so we can't test with the new name in a mixed version
cluster.
This test executes a bulk indexing operation with two documents. If this
test is running against multiple nodes, there are no guarantees that all
shards are green before we execute a search operation which might hit a
replica shard. This commit creates the index in advance, and waits for
all shards to be active before proceeding with the indexing request.
This test asserts that the took time exists by using the is_true
assertion. This assertion fails if the took time was zero as is_true
asserts that the field is not the empty string, not the string "false",
and not 0. If the search returns quickly, and took time is measured
using a high-precision monotonic clock, the took time can be zero. This
commit changes the assertion to use gte.
This commit adds the size of the cluster state to the response for the
get cluster state API call (GET /_cluster/state). The size that is
returned is the size of the full cluster state in bytes when compressed.
This is the same size of the full cluster state when serialized to
transmit over the network. Specifying the ?human flag displays the
compressed size in a more human friendly manner. Note that even if the
cluster state request filters items from the cluster state (so a subset
of the cluster state is returned), the size that is returned is the
compressed size of the entire cluster state.
Closes#3415
This test makes little sense when sent from the REST layer, as WrapperQueryBuilder is supposed to be used from the Java api. Also, providing the inner query as base64 string will work only for string formats and break for binary formats like SMILE and CBOR, whcih doesn't play well with randomizing content type in our REST tests
This commit removes an necessary test that ensures if
wait_for_active_shards cannot be fulfilled on index creation, that the
response returns shardsAcknowledged=false. However, this is already
tested in WaitForActiveShardsIT and it would improve the speed of the
test runs to get rid of any unnecessary tests, especially those that
depend on timeouts.
Previous changes aligned HEAD requests to be consistent with GET
requests to the same endpoint. This commit aligns the REST spec for the
impacted endpoints.
Relates #23313
Also expand testing on the different ways to provide index settings and remove dead code around ability to provide settings as query string parameters
Closes#23242
Both PRs below have been backported to 5.4 such that we can enable
BWC tests of this feature as well as remove version dependend serialization
for search request / responses.
Relates to #23288
Relates to #23253
In #23253 we added an the ability to incrementally reduce search results.
This change exposes the parameter to control the batch since and therefore
the memory consumption of a large search request.
A previous change aligned the handling of the GET document and HEAD
document APIs. This commit aligns the specification for these two APIs
as well, and fixes a failing test.
Relates #23196
This test uses index_patterns which has been introduced in 6.0 and does not exist in 5.4.0, making the Bwc test fails. Instead of using index templates, it now uses explicitly create the required indices. Also, it fixes unmapped aggregations tests.
This pull request reuses the typed_keys parameter added in #22965, but this time it applies it to suggesters. When set to true, the suggester names in the search response will be prefixed with a prefix that reflects their type.
This changes removes the SearchResponseListener that was used by the ExpandCollapseSearchResponseListener to expand collapsed hits.
The removal of SearchResponseListener is not a breaking change because it was never released.
This change also replace the blocking call in ExpandCollapseSearchResponseListener by a single asynchronous multi search request. The parallelism of the expand request can be set via CollapseBuilder#max_concurrent_group_searches
Closes#23048
This pull request adds a new parameter to the REST Search API named `typed_keys`. When set to true, the aggregation names in the search response will be prefixed with a prefix that reflects the internal type of the aggregation.
Here is a simple example:
```
GET /_search?typed_keys
{
"aggs": {
"tweets_per_user": {
"terms": {
"field": "user"
}
}
},
"size": 0
}
```
And the response:
```
{
"aggs": {
"sterms:tweets_per_user": {
...
}
}
}
```
This parameter is intended to make life easier for REST clients that could parse back the prefix and could detect the type of the aggregation to parse. It could also be implemented for suggesters.
Currently, stored scripts use a namespace of (lang, id) to be put, get, deleted, and executed. This is not necessary since the lang is stored with the stored script. A user should only have to specify an id to use a stored script. This change makes that possible while keeping backwards compatibility with the previous namespace of (lang, id). Anywhere the previous namespace is used will log deprecation warnings.
The new behavior is the following:
When a user specifies a stored script, that script will be stored under both the new namespace and old namespace.
Take for example script 'A' with lang 'L0' and data 'D0'. If we add script 'A' to the empty set, the scripts map will be ["A" -- D0, "A#L0" -- D0]. If a script 'A' with lang 'L1' and data 'D1' is then added, the scripts map will be ["A" -- D1, "A#L1" -- D1, "A#L0" -- D0].
When a user deletes a stored script, that script will be deleted from both the new namespace (if it exists) and the old namespace.
Take for example a scripts map with {"A" -- D1, "A#L1" -- D1, "A#L0" -- D0}. If a script is removed specified by an id 'A' and lang null then the scripts map will be {"A#L0" -- D0}. To remove the final script, the deprecated namespace must be used, so an id 'A' and lang 'L0' would need to be specified.
When a user gets/executes a stored script, if the new namespace is used then the script will be retrieved/executed using only 'id', and if the old namespace is used then the script will be retrieved/executed using 'id' and 'lang'
* Integrate UnifiedHighlighter
This change integrates the Lucene highlighter called "unified" in the list of supported highlighters for ES.
This highlighter can extract offsets from either postings, term vectors, or via re-analyzing text.
The best strategy is picked automatically at query time and depends on the field and the query to highlight.
This is related to #22116. URLRepository requires SocketPermission
connect. This commit introduces a new module called "repository-url"
where URLRepository will reside. With the new module, permissions can
be removed from core.
* Add top hits collapsing to search request
The field collapsing is done with a custom top docs collector that "collapse" search hits with same field value.
The distributed aspect is resolve using the two passes that the regular search uses. The first pass "collapse" the top hits, then the coordinating node merge/collapse the top hits from each shard.
```
GET _search
{
"collapse": {
"field": "category",
}
}
```
This change also adds an ExpandCollapseSearchResponseListener that intercepts the search response and expands collapsed hits using the CollapseBuilder#innerHit} options.
The retrieval of each inner_hits is done by sending a query to all shards filtered by the collapse key.
```
GET _search
{
"collapse": {
"field": "category",
"inner_hits": {
"size": 2
}
}
}
```
Similar to the Filters aggregation but only supports "keyed" filter buckets and automatically "ANDs" pairs of filters to produce a form of adjacency matrix.
The intersection of buckets "A" and "B" is named "A&B" (the choice of separator is configurable). Empty intersection buckets are removed from the final results.
Closes#22169
Those services validate their setting before submitting an AckedClusterStateUpdateTask to the cluster state service. An acked cluster state may be completed by a networking thread when the last acks as received. As such it needs special care to make sure that thread context headers are handled correctly.
This PR removes all leniency in the conversion of Strings to booleans: "true"
is converted to the boolean value `true`, "false" is converted to the boolean
value `false`. Everything else raises an error.
Relates to #22024
On top of documentation, the PR adds deprecation loggers and deals with the resulting warning headers.
The yaml test is set exclude versions up to 6.0. This is need to make sure bwc tests pass until this is backported to 5.2.0 . Once that's done, I will change the yaml test version limits
There are some parameters that are accepted by each and every api we expose. Those (pretty, source, error_trace and filter_path) are not explicitly listed in the spec of every api, rather whitelisted in clients test runners so that they are always accepted. The `human` flag has been treated up until now as a parameter that's accepted by only some stats and info api, but that doesn't reflect reality as es core treats it exactly like `pretty` (relevant especially now that we validate params and throw exception when we find one that is not supported). Furthermore, the human flag has effect on every api that outputs a date, time, percentage or byte size field. For instance the tasks api outputs a date field although they don't have the human flag explicitly listed in their spec. There are other similar cases. This commit removes the human flag from the rest spec and makes it an always accepted query_string param.
This change disables the _all meta field by default.
Now that we have the "all-fields" method of query execution, we can save both
indexing time and disk space by disabling it.
_all can no longer be configured for indices created after 6.0.
Relates to #20925 and #21341Resolves#19784
* Promote longs to doubles when a terms agg mixes decimal and non-decimal number
This change makes the terms aggregation work when the buckets coming from different indices are a mix of decimal numbers and non-decimal numbers. In this case non-decimal number (longs) are promoted to decimal (double) which can result in a loss of precision for big numbers.
Fixes#22232
This patch fixes the incorrect indentation in the REST tests, which makes tests in language runners (eg. Ruby, Python) to fail, since the skip clause is parsed as an empty value. Tha Java YAML parser is smarter/lenient about whitespace, so it doesn't catch this.
Right now closing a shard looks like it strands refresh listeners,
causing tests like
`delete/50_refresh/refresh=wait_for waits until changes are visible in search`
to fail. Here is a build that fails:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+multi_cluster_search+multijob-darwin-compatibility/4/console
This attempts to fix the problem by implements `Closeable` on
`RefreshListeners` and rejecting listeners when closed. More importantly
the act of closing the instance flushes all pending listeners
so we shouldn't have any stranded listeners on close.
Because it was needed for testing, this also adds the number of
pending listeners to the `CommonStats` object and all API to which
that flows: `_cat/nodes`, `_cat/indices`, `_cat/shards`, and
`_nodes/stats`.
Currently `geo_point` and `geo_shape` field are treated as `text` field by the field stats API and we
try to extract the min/max values with MultiFields.getTerms.
This is ok in master because a `geo_point` field is always a Point field but it can cause problem in 5.x (and 2.x) because the legacy
`geo_point` are indexed as terms.
As a result the min and max are extracted and then printed in the FieldStats output using BytesRef.utf8ToString
which can throw an IndexOutOfBoundException since it's not valid UTF8 strings.
This change ensure that we never try to extract min/max information from a `geo_point` field.
It does not add a new type for geo points in the fieldstats API so we'll continue to use `text` for this kind of field.
This PR is targeted to master even though we could only commit this change to 5.x. I think it's cleaner to have it in master too before we make any decision on
https://github.com/elastic/elasticsearch/pull/21947.
Fixes#22384
This PR completes the refactoring of the cluster allocation explain API and improves it in the following two high-level ways:
1. The explain API now uses the same allocators that the AllocationService uses to make shard allocation decisions. Prior to this PR, the explain API would run the deciders against each node for the shard in question, but this was not executed on the same code path as the allocators, and many of the scenarios in shard allocation were not captured due to not executing through the same code paths as the allocators.
2. The APIs have changed, both on the Java and JSON level, to accurately capture the decisions made by the system. The APIs also now report on shard moving and rebalancing decisions, whereas the previous API did not report decisions for moving shards which cannot remain on their current node or rebalancing shards to form a more balanced cluster.
Note: this change affects plugin developers who may have a custom implementation of the ShardsAllocator interface. The method weighShards has been removed and no longer has any utility. In order to support the new explain API, however, a custom implementation of ShardsAllocator must now implement ShardAllocationDecision decideShardAllocation(ShardRouting shard, RoutingAllocation allocation) which provides a decision and explanation for allocating a single shard. For implementations that do not support explaining a single shard allocation via the cluster allocation explain API, this method can simply return an UnsupportedOperationException.
`scaled_float` should be used as DOUBLE in aggregations but currently they are used as LONG.
This change fixes this issue and adds a simple it test for it.
Fixes#22350
Before, snapshot/restore would synchronize all operations on the cluster
state except for deleting snapshots. This meant that only one
snapshot/restore operation would be allowed in the cluster at any given
time, except for deletions - there could be two or more snapshot
deletions running at the same time, or a deletion could be running,
unbeknowest to the rest of the cluster, and thus a snapshot or restore
would be allowed at the same time as the snapshot deletion was still in
progress. This could cause any number of synchronization issues,
including the situation where a snapshot that was deleted could reappear
in the index-N file, even though its data was no longer present in the
repository.
This commit introduces a new custom type to the cluster state to
represent deletions in progress. Now, another deletion cannot start if
a deletion is currently in progress. Similarily, a snapshot or restore
cannot be started if a deletion is currently in progress. In each case,
if attempting to run another snapshot/restore operation while a deletion
is in progress, a ConcurrentSnapshotExecutionException will be thrown.
This is the same exception thrown if trying to snapshot while another
snapshot is in progress, or restore while a snapshot is in progress.
Closes#19957
Today we only expose `value_type` in scriptable aggregations, however it is
also useful with unmapped fields. I suspect we never noticed because
`value_type` was not documented (fixed) and most aggregations are scriptable.
Closes#20163
Sequence BWC logic consists of two elements:
1) Wire level BWC using stream versions.
2) A changed to the global checkpoint maintenance semantics.
For the sequence number infra to work with a mixed version clusters, we have to consider situation where the primary is on an old node and replicas are on new ones (i.e., the replicas will receive operations without seq#) and also the reverse (i.e., the primary sends operations to a replica but the replica can't process the seq# and respond with local checkpoint). An new primary with an old replica is a rare because we do not allow a replica to recover from a new primary. However, it can occur if the old primary failed and a new replica was promoted or during primary relocation where the source primary is treated as a replica until the master starts the target.
1) Old Primary & New Replica - this case is easy as is taken care of by the wire level BWC. All incoming requests will have their seq# set to `UNASSIGNED_SEQ_NO`, which doesn't confuse the local checkpoint logic (keeping it at `NO_OPS_PERFORMED`)
2) New Primary & Old replica - this one is trickier as the global checkpoint service currently takes all in sync replicas into consideration for the global checkpoint calculation. In order to deal with old replicas, we change the semantics to say all *new node* in sync replicas. That means the replicas on old nodes don't count for the global checkpointing. In this state the seq# infra is not fully operational (you can't search on it, because copies may miss it) but it is maintained on shards that can support it. The old replicas will have to go through a file based recovery at some point and will get the seq# information at that point. There is still an edge case where a new primary fails and an old replica takes over. I'lll discuss this one with @ywelsch as I prefer to avoid it completely.
This PR also re-enables the BWC tests which were disabled. As such it had to fix any BWC issue that had crept in. Most notably an issue with the removal of the `timestamp` field in #21670.
The commit also includes a fix for the default value of the seq number field in replicated write requests (it was 0 but should be -2), that surface some other minor bugs which are fixed as well.
Last - I added some debugging tools like more sane node names and forcing replication request to implement a `toString`
In #20305, _suggest endpoint was deprecated
in favour of using _search endpoint. This
commit removes the dedicated _suggest endpoint
entirely from master.
This commit adds a skip for the include segment file sizes REST tests on
nodes less than or equal to version 5.1.1 as the stats APIs did not
correctly account for this parameter prior to version 5.1.2.
Relates #21879
This commit addresses an issue in the stats APIs where
include_segment_file_sizes was not being consumed leading to requests
containing this parameter being rejected.
Relates #21879
* Replace _suggest endpoint to _search in docs
In 5.0, the _suggest endpoint is just sugar for _search
with suggestions specified. Users should move away from
using the _suggest endpoint, as it is marked as deprecated in 5.x and
will be removed in 6.0
* update docs to use _search endpoint instead of _suggest
* Add deprecation logging to RestSuggestAction
* Use search endpoint instead of suggest endpoint in rest tests
Add checks to RankEvalSpec to safe guard against missing parameters.
Fail early in case no metric is supplied, no rated requests are supplied or the search source builder is missing but no template is supplied neither.
Add stricter checks around rank eval request parsing: Fail if in a rated request we see both, a verbatim request as well as request
template parameters.
Relates to #21260
Queries must be rewritten before the query phase executes otherwise non-executable queries like `wrapper` query or `terms` will fail or queries that require resources like script service can't access these service unless rewritten.
Relates to #21303
This change allows specifying alias/wildcard expression in indices_boost.
And added another format for specifying indices_boost. It accepts array of index name and boost pair.
If an index is included in multiple aliases/wildcard expressions, the first match will be used.
With new format, old format is marked as deprecated.
Closes#4756
Improves the error message returned when looking up a task that
belongs to a node that is no longer part of the cluster. The new
error message tells the user that the node isn't part of the cluster.
This is useful because if you start a task and the node goes down
there isn't a record of the task at all. This hints to the user that
the task might have died with the node.
Relates to #22027
Fixes an issue where indexing requests with operation type "create" auto-convert external versioning to internal versioning and silently ignore the version number instead of failing with an error message.
This commit adds a skip for the missing types REST test. There was a bug
for an unclosed XContent object on version prior to version 5.0.3 which
is going to lead to different responses on vresions prior to 5.0.3 and
versions on or after version 5.0.3.
When there are no indexes, get mapping has a series of special cases.
Two of those expect the response object already started, and the other
two respond with an exception. Those two cases (types passed in but no
indexes and vice versa) would fail in their error response generation
because it did not expect an object to already be started in the json
generator. This change moves the object start to where it is needed for
the empty responses.
closes#21916
REST tests use the default OOTB low/high disk watermarks of 85%/90%, which can make some tests fail if run on a machine with a fuller disk. This commit changes the watermarks in the same way as in IntegTestCase so that they're essentially ignored.
With #21738 we added an indices section to the search shards api, that will return the concrete indices hit by the request, and eventually the corresponding alias filter.
The java API returns the AliasFilter object, which holds the filter itself and an array of aliases that pointed to the index in the original request. The REST layer doesn't print out the aliases array though. This commit adds the aliases array as well and tests for this.
Group, List and Affix settings generate a bogus diff that turns the actual
diff into a string containing a json structure for instance:
```
"action" : {
"search" : {
"remote" : {
"" : "{\"my_remote_cluster\":\"[::1]:60378\"}"
}
}
}
```
which make reading the setting impossible. This happens for instance
if a group or affix setting is rendered via `_cluster/settings?include_defaults=true`
This change fixes the issue as well as several minor issues with affix settings that
where not accepted as valid setting today.
* Scripting: Remove groovy scripting language
Groovy was deprecated in 5.0. This change removes it, along with the
legacy default language infrastructure in scripting.
Add indices and filter information to search shards api output
The search shards api returns info about which shards are going to be hit by executing a search with provided parameters: indices, routing, preference. Indices can also be aliases, which can also hold filters. The output includes an array of shards and a summary of all the nodes the shards are allocated on. This commit adds a new indices section to the search shards output that includes one entry per index, where each index can be associated with an optional filter in case the index was hit through a filtered alias.
This is relevant since we have moved parsing of alias filters to the coordinating node.
Relates to #20916
The `type` parameter has always been accepted by the search_shards api, probably to make the api and its urls the same as search. Truth is that the type never had any effect, it's been ignored from day one while accepting it may make users think that we actually do something with it.
This commit removes support for the type parameter from the REST layer and the Java API. Backwards compatibility is maintained on the transport layer though.
The new added serialization test also uncovered a bug in the java API where the `ClusterSearchShardsRequest` could be created with no arguments, but the indices were required to be not null otherwise the request couldn't be serialized as `writeTo` would throw NPE. Fixed by setting a default value (empty array) for indices.
This is a followup to #21689 where we removed a misplaced try catch for IndexMissingException and IndexClosedException which was related to #9047 (at least for the index closed case). The code block within the change was moved as part of #20890, which made the catch redundant. It was somehow used before (e.g. in 5.0) but it doesn't seem that this catch had any effect. Added tests to verify that. In fact a specific catch added to the search api only would defeat the purpose of having common indices options that work throughout all our APIs.
Relates to #21689
The tests rely on the allocation decider allowing allocation and
the disk threshold decider wasn't allowing it. This came up from
time to in CI but *all* the time on my local machine because it is
a small ssd.
* master: (22 commits)
Add proper toString() method to UpdateTask (#21582)
Fix `InternalEngine#isThrottled` to not always return `false`. (#21592)
add `ignore_missing` option to SplitProcessor (#20982)
fix trace_match behavior for when there is only one grok pattern (#21413)
Remove dead code from GetResponse.java
Fixes date range query using epoch with timezone (#21542)
Do not cache term queries. (#21566)
Updated dynamic mapper section
Docs: Clarify date_histogram bucket sizes for DST time zones
Handle release of 5.0.1
Fix skip reason for stats API parameters test
Reduce skip version for stats API parameter tests
Strict level parsing for indices stats
Remove cluster update task when task times out (#21578)
[DOCS] Mention "all-fields" mode doesn't search across nested documents
InternalTestCluster: when restarting a node we should validate the cluster is formed via the node we just restarted
Fixed bad asciidoc in boolean mapping docs
Fixed bad asciidoc ID in node stats
Be strict when parsing values searching for booleans (#21555)
Fix time zone rounding edge case for DST overlaps
...
Adds a version constant for it, bwc indices, and a vagrant upgrade-from
version. Also bumps the "upgrade from" version for the backwards-5.0
test and adds `skip`s for tests that don't fail against 5.0 so we skip
them during the backwards testing.
Finally, this skips the "Shrink index via API" test because it fails
consistently for me. Inconsistently for CI, but consistently for me.
I'll work on making it consistent tomorrow.
This commit reduces the skip version for the stats API unrecognized
parameter tests now that the logic for unrecognized parameters has been
backported to 5.x.
Today when parsing a stats request, Elasticsearch silently ignores
incorrect metrics. This commit removes lenient parsing of stats requests
for the nodes stats and indices stats APIs.
Relates #21417
This commit enables real BWC testing against a 5.1 snapshot. All
REST tests plus rolling upgrade test now run against a mixed version
cross major version cluster.
* master:
ShardActiveResponseHandler shouldn't hold to an entire cluster state
Ensures cleanup of temporary index-* generational blobs during snapshotting (#21469)
Remove (again) test uses of onModule (#21414)
[TEST] Add assertBusy when checking for pending operation counter after tests
Revert "Add trace logging when aquiring and releasing operation locks for replication requests"
Allows multiple patterns to be specified for index templates (#21009)
[TEST] fixes rebalance single shard check as it isn't guaranteed that a rebalance makes sense and the method only tests if rebalance is allowed
Document _reindex with random_score
* master: (516 commits)
Avoid angering Log4j in TransportNodesActionTests
Add trace logging when aquiring and releasing operation locks for replication requests
Fix handler name on message not fully read
Remove accidental import.
Improve log message in TransportNodesAction
Clean up of Script.
Update Joda Time to version 2.9.5 (#21468)
Remove unused ClusterService dependency from SearchPhaseController (#21421)
Remove max_local_storage_nodes from elasticsearch.yml (#21467)
Wait for all reindex subtasks before rethrottling
Correcting a typo-Maan to Man-in README.textile (#21466)
Fix InternalSearchHit#hasSource to return the proper boolean value (#21441)
Replace all index date-math examples with the URI encoded form
Fix typos (#21456)
Adapt ES_JVM_OPTIONS packaging test to ubuntu-1204
Add null check in InternalSearchHit#sourceRef to prevent NPE (#21431)
Add VirtualBox version check (#21370)
Export ES_JVM_OPTIONS for SysV init
Skip reindex rethrottle tests with workers
Make forbidden APIs be quieter about classpath warnings (#21443)
...
* Allows for an array of index template patterns to be provided to an
index template, and rename the field from 'template' to 'index_pattern'.
Closes#20690
Applied (almost) the same rules we use to validate index names
to new alias names. The only rule not applies it "must be lowercase".
We have tests that don't follow that rule and I except there are lots
of examples of camelCase alias names in the wild. We can add that
validation but I'm not sure it is worth it.
Closes#20748
Adds an alias that starts with `#` to the BWC index and validates
that you can read from it and remove it. Starting with `#` isn't
allowed after 5.1.0/6.0.0 so we don't create the alias or check it
after those versions.
Adds support for `?slices=N` to reindex which automatically
parallelizes the process using parallel scrolls on `_uid`. Performance
testing sees a 3x performance improvement for simple docs
on decent hardware, maybe 30% performance improvement
for more complex docs. Still compelling, especially because
clusters should be able to get closer to the 3x than the 30%
number.
Closes#20624
Today when writing an unbounded queue_size in the cat thread pool API,
we write null. This commit modifies this so that the output is -1 so
that the output is always present, and always a numeric value.
Relates #21342
Exist requests are supposed to never throw an exception, but rather return true or false depending on whether some resource exists or not. Indices exists does that for indices and accepts wildcard expressions too. The way the api works internally is by resolving indices and catching IndexNotFoundException: if an exception is thrown the index does not exist hence it returns false, otherwise it returns true. That works ok only if ignore_unavailable and allow_no_indices indices options are both set to false, meaning that they are strict and any missing index or wildcard expressions that resolves to no indices will lead to an exception that can be thrown and cause false to be returned.
Unfortunately the indices options have been configurable up until now for this request, meaning that one can set ignore_unavailable or allow_no_indices to true and have the indices exist request return true for indices that really don't exist, which makes very little sense in the context of this api.
This commit removes the indicesOptions setter from the IndicesExistsRequest and makes settable only expandWildcardsOpen and expandWildcardsClosed, hence a subset of the available indices options. This way we can guarantee more consistent behaviour of the indices exists api. We can then remove the ignore_unavailable and allow_no_indices option from indices exists api spec
This commit fixes an issue with the cat thread pool API spec. Namely,
the name of the variable for the thread pool patterns parameter was
misnamed in the spec.
Relates #21332
Today we validate the target index name late and therefore don't fail for instance
if the target index already exists and `dry_run=true` was specified. This change
validates the index name before we early terminate if dry_run is set.
Closes#21149
With #21099 we removed support for the ignored allow_no_indices parameter in indices upgrade API. Truth is that ignore_unavailable and expand_wildcards were also ignored, in indices upgrade as well as upgrade status API. Those parameters are though supported internally and settable through java API, hence they should be all supported on the REST layer too.
`FilterAggregationBuilder` today misses to rewrite queries which causes failures
if a query that uses a client for instance to lookup terms since it must be rewritten first.
This change also ensures that if a client is used from the rewrite context we mark the query as
non-cacheable.
Closes#21301
Since we now validate all consumed request parameter, users can't specify
`_cat/nodes?full_id=true|false` anymore since this parameter is consumed late.
This commit adds a test for this parameter and consumes it before request is processed.
Closes#21266
today the `_shrink` tests do relocate all shards to a single node in
the cluster. Yet, that is not always possible since the only node we can
safely identify in the cluster is the master and if the master is a BWC
node in such a cluster we won't be able to relocate shards that have a
primary on the newer version nodes since allocation deciders forbid this.
This change restricts allocation for that index when the index is created
to restrict allocation to the master that guarantees that all primaries
are on the same node which is sufficient for the `_shrink` API to run.
Lucene 6.2 introduces the new `Analyzer.normalize` API, which allows to apply
only character-level normalization such as lowercasing or accent folding, which
is exactly what is needed to process queries that operate on partial terms such
as `prefix`, `wildcard` or `fuzzy` queries. As a consequence, the
`lowercase_expanded_terms` option is not necessary anymore. Furthermore, the
`locale` option was only needed in order to know how to perform the lowercasing,
so this one can be removed as well.
Closes#9978
This adds support for templating in rank eval requests.
Relates to #20231
Problem: In it's current state the rank-eval request API forces the user to repeat complete queries for each test request. In most use cases the structure of the query to test will be stable with only parameters changing across requests, so this looks like lots of boilerplate json for something that could be expressed in a more concise way.
Uses templating/ ScriptServices to enable users to submit only one test request template and let them only specify template parameters on a per test request basis.
This fixes our cluster formation task to run REST tests against a mixed version cluster.
Yet, due to some limitations in our test framework `indices.rollover` tests are currently
disabled for the BWC case since they select the current master as the merge node which
happens to be a BWC node and we can't relocate all shards to it since the primaries are on
a higher version node. This will be fixed in a followup.
Closes#21142
Note: This has been cherry-picked from 5.0 and fixes several rest tests
as well as a BWC break in `OsStats.java`
When indices stats are requested via the node stats API, there is a
level parameter to request stats at the index, node, or shards
level. This parameter was not whitelisted when URL parsing was made
strict. This commit whitelists this parameter.
Additionally, there was some leniency in the parsing of this parameter
that has been removed.
Relates #21024
The create request now requires that an ID be present.
Currently the clients hard code a create method, but
we should just add a create REST spec so this method
can be autogenerated.
Today we don't parse alias filters on the coordinating node, we only forward
the alias patters to executing node and resolve it late. This has several problems
like requests that go through filtered aliases are never cached if they use date math,
since the parsing happens very late in the process even without rewriting. It also used
to be processed on every shard while we can only do it once per index on the coordinating node.
Another nice side-effect is that we are never prone to cluster-state updates that change an alias,
all nodes will execute the exact same alias filter since they are process based on the same
cluster state.
* Adding built-in sorting capability to _cat apis.
Closes#16975
* addressing pr comments
* changing value types back to original implementation and fixing cosmetic issues
* Changing compareTo, hashCode of value types to a better implementation
* Changed value compareTos to use Double.compare instead of if statements + fixed some failed unit tests
Today when parsing a request, Elasticsearch silently ignores incorrect
(including parameters with typos) or unused parameters. This is bad as
it leads to requests having unintended behavior (e.g., if a user hits
the _analyze API and misspell the "tokenizer" then Elasticsearch will
just use the standard analyzer, completely against intentions).
This commit removes lenient URL parameter parsing. The strategy is
simple: when a request is handled and a parameter is touched, we mark it
as such. Before the request is actually executed, we check to ensure
that all parameters have been consumed. If there are remaining
parameters yet to be consumed, we fail the request with a list of the
unconsumed parameters. An exception has to be made for parameters that
format the response (as opposed to controlling the request); for this
case, handlers are able to provide a list of parameters that should be
excluded from tripping the unconsumed parameters check because those
parameters will be used in formatting the response.
Additionally, some inconsistencies between the parameters in the code
and in the docs are corrected.
Relates #20722
* master: (1199 commits)
[DOCS] Remove non-valid link to mapping migration document
Revert "Default `include_in_all` for numeric-like types to false"
test: add a test with ipv6 address
docs: clearify that both ip4 and ip6 addresses are supported
Include complex settings in settings requests
Add production warning for pre-release builds
Clean up confusing error message on unhandled endpoint
[TEST] Increase logging level in testDelayShards()
change health from string to enum (#20661)
Provide error message when plugin id is missing
Document that sliced scroll works for reindex
Make reindex-from-remote ignore unknown fields
Remove NoopGatewayAllocator in favor of a more realistic mock (#20637)
Remove Marvel character reference from guide
Fix documentation for setting Java I/O temp dir
Update client benchmarks to log4j2
Changes the API of GatewayAllocator#applyStartedShards and (#20642)
Removes FailedRerouteAllocation and StartedRerouteAllocation
IndexRoutingTable.initializeEmpty shouldn't override supplied primary RecoverySource (#20638)
Smoke tester: Adjust to latest changes (#20611)
...
This commit changes the default behavior of `_flush` to block if other flushes are ongoing.
This also removes the use of `FlushNotAllowedException` and instead simply return immediately
by skipping the flush. Users should be aware if they set this option that the flush might or might
not flush everything to disk ie. no transactional behavior of some sort.
Closes#20569
Adds a cat api endpoint: /_cat/templates and its more specific version, /_cat/templates/{name}.
It looks something like:
$ curl "localhost:9200/_cat/templates?v"
name template order version
sushi_california_roll *avocado* 1 1
pizza_hawaiian *pineapples* 1
pizza_pepperoni *pepperoni* 1
The specified version (only allows * globs) looks like:
$ curl "localhost:9200/_cat/templates/pizza*"
name template order version
pizza_hawaiian *pineapples* 1
pizza_pepperoni *pepperoni* 1
Partially specified columns:
$ curl "localhost:9200/_cat/templates/pizza*?v=true&h=name,template"
name template
pizza_hawaiian *pineapples*
pizza_pepperoni *pepperoni*
The help text:
$ curl "localhost:9200/_cat/templates/pizza*?help"
name | n | template name
template | t | template pattern string
order | o | template application order number
version | v | version
Closes#20467
We can now run templates using `explain` and/or `profile` parameters.
Which is interesting when you have defined a complicated profile but want to debug it in an easier way than running the full query again.
You can use `explain` parameter when running a template:
```js
GET /_search/template
{
"file": "my_template",
"params": {
"status": [ "pending", "published" ]
},
"explain": true
}
```
You can use `profile` parameter when running a template:
```js
GET /_search/template
{
"file": "my_template",
"params": {
"status": [ "pending", "published" ]
},
"profile": true
}
```
The command-line arguments for Elasticsearch must now be specified using
-E. This commit fixes the usage of command-line arguments in the REST
API spec README.
The only repository we can be sure is safe to clean is `fs` so we clean
any snapshots in those repositories after each test. Other repositories
like url and azure tend to throw exceptions rather than let us fetch
their contents during the REST test. So we clean what we can....
Closes#18159
The refresh description should indicate that the affected shards are
refreshed as opposed to the entire index.
This was raised as a discrepancy on
discuss (https://discuss.elastic.co/t/refresh-parameter-of-index-api/59008/2)
on the .NET client that originates from code generated from the rest api
spec. The description has been updated in master but should be updated
for the 2.4.0 release.
We put the rest api spec into a jar for upload to maven, so that we can
use within external rest tests. This change adds making a pom for maven
(as well as producing sources and javadoc jars, even though they will be
empty, because maven central requires them).
This change replaces the fields parameter with stored_fields when it makes sense.
This is dictated by the renaming we made in #18943 for the search API.
The following list of endpoint has been changed to use `stored_fields` instead of `fields`:
* get
* mget
* explain
The documentation and the rest API spec has been updated to cope with the changes for the following APIs:
* delete_by_query
* get
* mget
* explain
The `fields` parameter has been deprecated for the following APIs (it is replaced by _source filtering):
* update: the fields are extracted from the _source directly.
* bulk: the fields parameter is used but fields are extracted from the source directly so it is allowed to have non-stored fields.
Some APIs still have the `fields` parameter for various reasons:
* cat.fielddata: the fields paramaters relates to the fielddata fields that should be printed.
* indices.clear_cache: used to indicate which fielddata fields should be cleared.
* indices.get_field_mapping: used to filter fields in the mapping.
* indices.stats: get stats on fields (stored or not stored).
* termvectors: fields are retrieved from the stored fields if possible and extracted from the _source otherwise.
* mtermvectors:
* nodes.stats: the fields parameter is used to concatenate completion_fields and fielddata_fields so it's not related to stored_fields at all.
Fixes#20155
This commit adds a health status parameter to the cat indices API for
filtering on indices that match the specified status (green|yellow|red).
Relates #20393
Add docs to template support for _msearch
Relates to #10885
Relates to #15674
* Reference those docs from the rest api spec for _msearch/template support.
memory are much less than the total memory, the percentage
returned could be 0%. The yaml tests check that the free/used
percentage are valid values by asserting `is_true`, but it
turns out that `is_true` returns false if the value is
assigned but it is 0 or even the string "0". This commit
changes the assertion in the yaml test to ensure the value
is greater than or equal to 0 instead.
This was an error-prone version type that allowed overriding previous
version semantics. It could cause primaries and replicas to be out of
sync however, so it has been removed.
Resolves#19769
This adds a version field to Templates, which is itself is unused by Elasticsearch, but exists for users to better manage their own templates. Like description, it's optional.
This was an error-prone version type that allowed overriding previous
version semantics. It could cause primaries and replicas to be out of
sync however, so it has been removed.
Resolves#19769
Currently it does not because our parsers do not support big integers/decimals
(on purpose) but we do not have to ask our parser for the number type, we can
just ask the jackson parser for a number representation of the value with the
right type.
Note that I did not add similar tests for big decimals because Jackson seems to
never return big decimals, even for decimal values that are out of the range of
values that can be represented by doubles.
Closes#11508
The mem section was buggy in cluster stats and removed. It is now added back with the same structure as in node stats, containing total memory, available memory, used memory and percentages. All the values are the sum of all the nodes across the cluster (or at least the ones that we were able to get the values from).
If elasticsearch controls the ID values as well as the documents
version we can optimize the code that adds / appends the documents
to the index. Essentially we an skip the version lookup for all
documents unless the same document is delivered more than once.
On the lucene level we can simply call IndexWriter#addDocument instead
of #updateDocument but on the Engine level we need to ensure that we deoptimize
the case once we see the same document more than once.
This is done as follows:
1. Mark every request with a timestamp. This is done once on the first node that
receives a request and is fixed for this request. This can be even the
machine local time (see why later). The important part is that retry
requests will have the same value as the original one.
2. In the engine we make sure we keep the highest seen time stamp of "retry" requests.
This is updated while the retry request has its doc id lock. Call this `maxUnsafeAutoIdTimestamp`
3. When the engine runs an "optimized" request comes, it compares it's timestamp with the
current `maxUnsafeAutoIdTimestamp` (but doesn't update it). If the the request
timestamp is higher it is safe to execute it as optimized (no retry request with the same
timestamp has been run before). If not we fall back to "non-optimzed" mode and run the request as a retry one
and update the `maxUnsafeAutoIdTimestamp` unless it's been updated already to a higher value
Relates to #19813
* Params improvements to Cluster Health API wait for shards
Previously, the cluster health API used a strictly numeric value
for `wait_for_active_shards`. However, with the introduction of
ActiveShardCount and the removal of write consistency level for
replication operations, `wait_for_active_shards` is used for
write operations to represent values for ActiveShardCount. This
commit moves the cluster health API's usage of `wait_for_active_shards`
to be consistent with its usage in the write operation APIs.
This commit also changes `wait_for_relocating_shards` from a
numeric value to a simple boolean value `wait_for_no_relocating_shards`
to set whether the cluster health operation should wait for
all relocating shards to complete relocation.
* Addresses code review comments
* Don't be lenient if `wait_for_relocating_shards` is set
While removing an index isn't actually an alias action, if we add
an alias action that deletes an index then we can delete and index
and add an alias with the same name as the index atomically, in
the same cluster state update.
Closes#20064
This commit adds the support for exclusion filter to the response filtering (filter_path) feature. It changes the XContentBuilder APIs so that it now accepts two types of filters: inclusive and exclusive. Filters are no more String arrays but sets of String instead.
Adds an explicit recoverySource field to ShardRouting that characterizes the type of recovery to perform:
- fresh empty shard copy
- existing local shard copy
- recover from peer (primary)
- recover from snapshot
- recover from other local shards on same node (shrink index action)
This makes GET operations more consistent with `_search` operations which expect
`(stored_)fields` to work on stored fields and source filtering to work on the
`_source` field. This is now possible thanks to the fact that GET operations
do not read from the translog anymore (#20102) and also allows to get rid of
`FieldMapper#isGenerated`.
The `_termvectors` API (and thus more_like_this too) was relying on the fact
that GET operations would extract fields from either stored fields or the source
so the logic to do this that used to exist in `ShardGetService` has been moved
to `TermVectorsService`. It would be nice that term vectors do not rely on this,
but this does not seem to be a low hanging fruit.
The network types in use on a cluster can be useful information to have,
so this commit adds aggregate metrics for the network types in use in a
cluster to the cluster stats.
Relates #20144
This change adds a special field named _none_ that allows to disable the retrieval of the stored fields in a search request or in a TopHitsAggregation.
To completely disable stored fields retrieval (including disabling metadata fields retrieval such as _id or _type) use _none_ like this:
````
POST _search
{
"stored_fields": "_none_"
}
````
Today we do a lot of accounting inside the engine to maintain locations
of documents inside the transaction log. This is only needed to ensure
we can return the documents source from the engine if it hasn't been refreshed.
Aside of the added complexity to be able to read from the currently writing translog,
maintainance of pointers into the translog this also caused inconsistencies like different values
of the `_ttl` field if it was read from the tlog or not. TermVectors are totally different if
the document is fetched from the tranlog since copy fields are ignored etc.
This chance will simply call `refresh` if the documents latest version is not in the index. This
streamlines the semantics of the `_get` API and allows for more optimizations inside the engine
and on the transaction log. Note: `_refresh` is only called iff the requested document is not refreshed
yet but has recently been updated or added.
#Relates to #19787
Adds ignoreUnavailable to the snapshot status API to be consistent
with the get snapshots API which has a similar parameter. If
ignoreUnavailable is set to true, then the snapshot status request
will ignore any snapshots that were not found in the repository,
instead of throwing a SnapshotMissingException.
Closes#18522
Currently both `PUT` and `POST` can be used to create indices. This commit
removes support for `POST index_name` so that we can use it to index documents
with auto-generated ids once types are removed.
Relates #15613
that have analyzer aliases in their analysis settings will still work, but
any attempts to create an alias for analyzers in newly created indices
will result in an IllegalArgumentException.
As a result, the setting `index.analysis.analyzer.{analyzerName}.alias` is
no longer supported.
Closes#18244