* master:
Skip shard refreshes if shard is `search idle` (#27500)
Remove workaround in translog rest test (#27530)
inner_hits: Return an empty _source for nested inner hit when filtering on a field that doesn't exist.
percolator: Avoid TooManyClauses exception if number of terms / ranges is exactly equal to 1024
Dedup translog operations by reading in reverse (#27268)
Ensure logging is configured for CLI commands
Ensure `doc_stats` are changing even if refresh is disabled (#27505)
Fix classes that can exit
Revert "Adjust CombinedDeletionPolicy for multiple commits (#27456)"
Transpose expected and actual, and remove duplicate info from message. (#27515)
[DOCS] Fixed broken link in breaking changes
* es/master: (38 commits)
Backport wait_for_initializing_shards to cluster health API
Carry over version map size to prevent excessive resizing (#27516)
Fix scroll query with a sort that is a prefix of the index sort (#27498)
Delete shard store files before restoring a snapshot (#27476)
Replace `delimited_payload_filter` by `delimited_payload` (#26625)
CURRENT should not be a -SNAPSHOT version if build.snapshot is false (#27512)
Fix merging of _meta field (#27352)
Remove unused method (#27508)
unmuted test, this has been fixed by #27397
Consolidate version numbering semantics (#27397)
Add wait_for_no_initializing_shards to cluster health API (#27489)
[TEST] use routing partition size based on the max routing shards of the second split
Adjust CombinedDeletionPolicy for multiple commits (#27456)
Update composite-aggregation.asciidoc
Deprecate `levenstein` in favor of `levenshtein` (#27409)
Automatically prepare indices for splitting (#27451)
Validate `op_type` for `_create` (#27483)
Minor ShapeBuilder cleanup
muted test
Decouple nio constructs from the tcp transport (#27484)
...
Pull request #20220 added a change where the store files
that have the same name but are different from the ones in the
snapshot are deleted first before the snapshot is restored.
This logic was based on the `Store.RecoveryDiff.different`
set of files which works by computing a diff between an
existing store and a snapshot.
This works well when the files on the filesystem form a valid
shard store, i.e. there's a `segments` file and the store files
are not corrupted. Otherwise, the existing store's snapshot
metadata cannot be read (using Store#snapshotStoreMetadata())
and an exception is thrown
(CorruptIndexException, IndexFormatTooOldException, etc.) which
is later caught at the beginning of the restore process
(see RestoreContext#restore()) and is translated into
an empty store metadata (Store.MetadataSnapshot.EMPTY).
This makes the deletion of different files introduced
in #20220 useless, as the set of files will always be empty
even when store files exist on the filesystem. And if some
files are present within the store directory, restoring
a snapshot containing files with the same names will fail with a
FileAlreadyExistsException.
This is part of the #26865 issue.
There are various cases where some files could exist in the
store directory before a snapshot is restored. One that
Igor identified is a restore attempt that failed on a node
with only the first files restored; the shard is then allocated
again to the same node and the restore starts again (but fails
because of the existing files). Another one is when some files
of a closed index are corrupted / deleted and the index is
restored.
This commit adds a test that uses the infrastructure provided
by IndexShardTestCase in order to test that restoring a shard
succeeds even when files with the same names exist on the filesystem.
Related to #26865
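The failure mode described above can be sketched in a few lines of Python (all names are illustrative; the real logic lives in Store.RecoveryDiff and RestoreContext):

```python
def files_to_delete(local_files, snapshot_files, metadata_readable):
    """Mimic Store.RecoveryDiff.different: files that share a name with a
    snapshot file but have different content."""
    if not metadata_readable:
        # Corrupt store: snapshotStoreMetadata() throws and the exception is
        # translated into Store.MetadataSnapshot.EMPTY, so the diff is empty.
        local_files = {}
    return {name for name, checksum in local_files.items()
            if name in snapshot_files and snapshot_files[name] != checksum}

def restore(local_files, snapshot_files, metadata_readable):
    local_files = dict(local_files)
    for name in files_to_delete(local_files, snapshot_files, metadata_readable):
        del local_files[name]
    for name, checksum in snapshot_files.items():
        if name in local_files:
            # models the FileAlreadyExistsException in the real restore
            raise FileExistsError(name)
        local_files[name] = checksum
    return local_files

snapshot = {"_0.cfs": "b2"}
ok = restore({"_0.cfs": "a1"}, snapshot, metadata_readable=True)
# With metadata_readable=False nothing is deleted and the restore hits
# FileExistsError, reproducing the bug described above.
```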
This is related to #27260. Currently, basic nio constructs (nio
channels, the channel factories, selector event handlers, etc) implement
logic that is specific to the tcp transport. For example, NioChannel
implements the TcpChannel interface. These nio constructs at some point
will also need to support other protocols (ex: http).
This commit separates the TcpTransport logic from the nio building
blocks.
This change removes the module named aggs-composite and adds the `composite` aggregation
as a core aggregation. This allows other plugins to use this new aggregation
and simplifies the integration in the HL rest client.
This is related to #27260. Currently every nio channel has a profile
field. Profile is a concept that only relates to the tcp transport. Http
channels will not have profiles. This commit moves the profile from the
nio channel to the read context. The context is the level that protocol
specific features and logic should live.
* master: (31 commits)
[TEST] Fix `GeoShapeQueryTests#testPointsOnly` failure
Transition transport apis to use void listeners (#27440)
AwaitsFix GeoShapeQueryTests#testPointsOnly #27454
Bump test version after backport
Ensure nested documents have consistent version and seq_ids (#27455)
Tests: Add Fedora-27 to packaging tests
Delete some seemingly unused exceptions (#27439)
#26800: Fix docs rendering
Remove config prompting for secrets and text (#27216)
Move the CLI into its own subproject (#27114)
Correct usage of "an" to "a" in getting started docs
Avoid NPE when getting build information
Removes BWC snapshot status handler used in 6.x (#27443)
Remove manual tracking of registered channels (#27445)
Remove parameters on HandshakeResponseHandler (#27444)
[GEO] fix pointsOnly bug for MULTIPOINT
Standardize underscore requirements in parameters (#27414)
peanut butter hamburgers
Log primary-replica resync failures
Uses TransportMasterNodeAction to update shard snapshot status (#27165)
...
Currently we use ActionListener<TcpChannel> for connect, close, and send
message listeners in TcpTransport. However, all of the listeners have to
capture a reference to the channel anyway for the case where the exception
API is called. This commit changes these listeners to be of type <Void>, as passing
the channel to onResponse is not necessary. Additionally, this change
makes it easier to integrate with low-level transports (which use
different implementations of TcpChannel).
This commit removes the ability to use ${prompt.secret} and
${prompt.text} as valid config settings. Secure settings has obsoleted
the need for this, and it cleans up some of the code in Bootstrap.
Projects that depend on the CLI currently depend on core. This should not
always be the case. The EnvironmentAwareCommand will remain in :core,
but the rest of the CLI components have been moved into their own
subproject of :core, :core:cli.
This is related to #27260. Currently, every ESSelector keeps track of
all channels that are registered with it. ESSelector is just an
abstraction over a raw java nio selector. The java nio selector already
tracks its own selection keys. This commit removes our tracking and
relies on the java nio selector tracking.
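Python's `selectors` module illustrates the same idea: the selector's own key map already tracks every registration, so no parallel bookkeeping is needed (an analogy, not the actual ESSelector code):

```python
import selectors
import socket

sel = selectors.DefaultSelector()
a, b = socket.socketpair()
sel.register(a, selectors.EVENT_READ)
sel.register(b, selectors.EVENT_READ)

# No separate "registered channels" set: the selector's key map is the
# source of truth, just like SelectionKeys in a java.nio Selector.
registered = [key.fileobj for key in sel.get_map().values()]

# Shutdown can iterate the selector's keys instead of a manual list.
for key in list(sel.get_map().values()):
    sel.unregister(key.fileobj)
    key.fileobj.close()
remaining = len(sel.get_map())
```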
Logging every added header leads to harder-to-parse logs that look like this:
```
1> [2017-11-16T20:46:21,804][INFO ][o.e.t.r.y.ClientYamlTestClient] Adding header Content-Type
1> with value application/json
1> [2017-11-16T20:46:21,812][INFO ][o.e.t.r.y.ClientYamlTestClient] Adding header Content-Type
1> with value application/json
1> [2017-11-16T20:46:21,820][INFO ][o.e.t.r.y.ClientYamlTestClient] Adding header Content-Type
1> with value application/json
1> [2017-11-16T20:46:21,966][INFO ][o.e.t.r.y.ClientYamlTestClient] Adding header Content-Type
1> with value application/json
```
This is related to #27260. In the nio transport work we do not catch or
handle `Throwable`. There are a few places where we have exception
handlers that accept `Throwable`. This commit removes those cases.
This commit is a follow up to the work completed in #27132. Essentially
it transitions two more methods (sendMessage and getLocalAddress) from
Transport to TcpChannel. With this change, there is no longer a need for
TcpTransport to be aware of the specific type of channel a transport
returns. So that class is no longer parameterized by channel type.
This is a follow up to #27132. As that PR greatly simplified the
connection logic inside a low level transport implementation, much of
the functionality provided by the NioClient class is no longer
necessary. This commit removes that class.
* master:
Stop skipping REST test after backport of #27056
Fix default value of ignore_unavailable for snapshot REST API (#27056)
Add composite aggregator (#26800)
Fix `ShardSplittingQuery` to respect nested documents. (#27398)
[Docs] Restore section about multi-level parent/child relation in parent-join (#27392)
Add TcpChannel to unify Transport implementations (#27132)
Add note on plugin distributions in plugins folder
Remove implementations of `TransportChannel` (#27388)
Update Google SDK to version 1.23 (#27381)
Fix Gradle 4.3.1 compatibility for logging (#27382)
[Test] Change Elasticsearch startup timeout to 120s in packaging tests
Docs/windows installer (#27369)
* This change adds a module called `aggs-composite` that defines a new aggregation named `composite`.
The `composite` aggregation is a multi-buckets aggregation that creates composite buckets made of multiple sources.
The sources for each bucket can be defined as:
* A `terms` source, values are extracted from a field or a script.
* A `date_histogram` source, values are extracted from a date field and rounded to the provided interval.
This aggregation can be used to retrieve all buckets of a deeply nested aggregation by flattening the nested aggregation into composite buckets.
A composite bucket is composed of one value per source and is built for each document from the combination of values in the provided sources.
For instance the following aggregation:
````
"test_agg": {
  "terms": {
    "field": "field1"
  },
  "aggs": {
    "nested_test_agg": {
      "terms": {
        "field": "field2"
      }
    }
  }
}
````
... which retrieves the top N terms for `field1` and for each top term in `field1` the top N terms for `field2`, can be replaced by a `composite` aggregation in order to retrieve **all** the combinations of `field1`, `field2` in the matching documents:
````
"composite_agg": {
  "composite": {
    "sources": [
      {
        "field1": {
          "terms": {
            "field": "field1"
          }
        }
      },
      {
        "field2": {
          "terms": {
            "field": "field2"
          }
        }
      }
    ]
  }
}
````
The response of the aggregation looks like this:
````
"aggregations": {
"composite_agg": {
"buckets": [
{
"key": {
"field1": "alabama",
"field2": "almanach"
},
"doc_count": 100
},
{
"key": {
"field1": "alabama",
"field2": "calendar"
},
"doc_count": 1
},
{
"key": {
"field1": "arizona",
"field2": "calendar"
},
"doc_count": 1
}
]
}
}
````
By default this aggregation returns the first 10 buckets, sorted in ascending order of the composite key.
Pagination can be achieved by providing `after` values: the values of the composite key to aggregate after.
For instance the following aggregation will aggregate all composite keys that sort after `alabama, calendar`:
````
"composite_agg": {
  "composite": {
    "after": {"field1": "alabama", "field2": "calendar"},
    "size": 100,
    "sources": [
      {
        "field1": {
          "terms": {
            "field": "field1"
          }
        }
      },
      {
        "field2": {
          "terms": {
            "field": "field2"
          }
        }
      }
    ]
  }
}
````
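A toy Python model of this keyset pagination (illustrative only; the real aggregator streams buckets per segment):

```python
# Toy corpus: each doc holds one value for field1 and one for field2.
docs = [("alabama", "almanach"), ("alabama", "almanach"),
        ("alabama", "calendar"), ("arizona", "calendar")]

def composite(docs, size, after=None):
    """Return up to `size` composite buckets in ascending key order,
    starting strictly after the `after` key (None = from the start)."""
    counts = {}
    for key in docs:
        counts[key] = counts.get(key, 0) + 1
    keys = sorted(k for k in counts if after is None or k > after)
    return [{"key": k, "doc_count": counts[k]} for k in keys[:size]]

page1 = composite(docs, size=2)
# Next page: pass the last key of the previous page as `after`.
page2 = composite(docs, size=2, after=page1[-1]["key"])
```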
This aggregation is optimized for indices that set an index sorting that matches the composite source definition.
For instance the aggregation above could run faster on indices that define an index sorting like this:
````
"settings": {
"index.sort.field": ["field1", "field2"]
}
````
In this case the `composite` aggregation can early terminate on each segment.
This aggregation also accepts multi-valued fields, but disables early termination for these fields even if the index sorting matches the sources definition.
This is mandatory because index sorting picks only one value per document to perform the sort.
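A minimal sketch of why index sorting enables early termination: when documents arrive in key order, the collector can stop as soon as it has filled `size` buckets (a hypothetical helper, not the real collector):

```python
def first_buckets_sorted(sorted_keys, size):
    """Collect the first `size` composite buckets from a key-sorted segment,
    returning (buckets, docs_scanned). Because keys arrive in order, the
    collector stops once a (size+1)-th distinct key shows up."""
    buckets, scanned = [], 0
    for key in sorted_keys:
        if buckets and buckets[-1][0] == key:
            buckets[-1] = (key, buckets[-1][1] + 1)
        elif len(buckets) == size:
            break  # early termination: the rest of the segment is never read
        else:
            buckets.append((key, 1))
        scanned += 1
    return buckets, scanned
```

With a segment sorted as `["a", "a", "b", "b", "c"]` and `size=2`, only four documents are scanned before the collector stops.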
Right now our different transport implementations must duplicate
functionality in order to stay compliant with the requirements of
TcpTransport. They must all implement common logic to open channels,
close channels, keep track of channels for eventual shutdown, etc.
Additionally, there is a weird and complicated relationship between
Transport and TransportService. We eventually want to start merging
some of the functionality between these classes.
This commit starts moving towards a world where TransportService retains
all the application logic and channel state. Transport implementations
in this world will only be tasked with returning a channel when one is
requested, calling transport service when a channel is accepted from
a server, and starting / stopping itself.
Specifically this commit changes how channels are opened and closed. All
Transport implementations now return a channel type that must comply with
the new TcpChannel interface. This interface has the methods necessary
for TcpTransport to completely manage the lifecycle of a channel. This
includes setting the channel up, waiting for connection, adding close
listeners, and eventually closing.
* es/master:
Logging: Unify log rotation for index/search slow log (#27298)
wildcard query on _index (#27334)
REST spec: Validate that api name matches file name that contains it (#27366)
Revert "Reduce synchronization on field data cache"
Create new handlers for every new request in GoogleCloudStorageService (#27339)
Rest test fixes (#27354)
* es/master: (24 commits)
Reduce synchronization on field data cache
add json-processor support for non-map json types (#27335)
Properly format IndexGraveyard deletion date as date (#27362)
Upgrade AWS SDK Jackson Databind to 2.6.7.1
Stop responding to ping requests before master abdication (#27329)
[Test] Fix POI version in packaging tests
Allow affix settings to specify dependencies (#27161)
Tests: Improve size regex in documentation test (#26879)
reword comment
Remove unnecessary logger creation for doc values field data
[Geo] Decouple geojson parse logic from ShapeBuilders
[DOCS] Fixed link to docker content
Plugins: Add versionless alias to all security policy codebase properties (#26756)
[Test] #27342 Fix SearchRequests#testValidate
[DOCS] Move X-Pack-specific Docker content (#27333)
Fail queries with scroll that explicitely set request_cache (#27342)
[Test] Fix S3BlobStoreContainerTests.testNumberOfMultiparts()
Set minimum_master_nodes to all nodes for REST tests (#27344)
[Tests] Relax allowed delta in extended_stats aggregation (#27171)
Remove S3 output stream (#27280)
...
We use affix settings to group settings / values under a certain namespace.
In some cases, like login information, a setting is only valid if
one or more other settings are present. For instance `x.test.user` is only valid
if there is an `x.test.passwd` present, and vice versa. This change allows specifying
such a dependency to prevent settings updates that leave settings in an inconsistent
state.
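A hedged sketch of such a dependency check (names are made up; the real implementation lives in the affix settings infrastructure):

```python
def validate_affix_dependencies(settings, dependencies):
    """Reject settings maps where an affix setting is present without the
    settings it depends on in the same namespace."""
    for key in settings:
        namespace, _, suffix = key.rpartition(".")
        for required in dependencies.get(suffix, ()):
            if namespace + "." + required not in settings:
                raise ValueError(
                    "missing required setting [%s.%s] for [%s]"
                    % (namespace, required, key))

deps = {"user": ["passwd"], "passwd": ["user"]}
# Consistent: both settings present in the x.test namespace.
validate_affix_dependencies({"x.test.user": "u", "x.test.passwd": "p"}, deps)
```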
* master: (22 commits)
Update Tika version to 1.15
Aggregations: bucket_sort pipeline aggregation (#27152)
Introduce templating support to timezone/locale in DateProcessor (#27089)
Increase logging on qa:mixed-cluster tests
Update to AWS SDK 1.11.223 (#27278)
Improve error message for parse failures of completion fields (#27297)
Ensure external refreshes will also refresh internal searcher to minimize segment creation (#27253)
Remove optimisations to reuse objects when applying a new `ClusterState` (#27317)
Decouple `ChannelFactory` from Tcp classes (#27286)
Fix find remote when building BWC
Remove colons from task and configuration names
Add unreleased 5.6.5 version number
testCreateSplitIndexToN: do not set `routing_partition_size` to >= `number_of_routing_shards`
Snapshot/Restore: better handle incorrect chunk_size settings in FS repo (#26844)
Add limits for ngram and shingle settings (#27211) (#27318)
Correct comment in index shard test
Roll translog generation on primary promotion
ObjectParser: Replace IllegalStateException with ParsingException (#27302)
scripted_metric _agg parameter disappears if params are provided (#27159)
Update discovery-ec2.asciidoc
...
A previous refactoring exposed some protected methods. However, there is
currently no need to expose these methods so publicly, so we pull
one of them back to package-private and remove the other one.
Relates #27324
We cut over to internal and external IndexReader/IndexSearcher in #26972 which uses
two independent searcher managers. This has the downside that refreshes of the external
reader will never clear the internal version map, which in turn will trigger additional
and potentially unnecessary segment flushes since memory must be freed. Under heavy
indexing load with low refresh intervals this can cause excessive segment creation which
causes high GC activity and significantly increases the required segment merges.
This change adds a dedicated external reference manager that delegates refreshes to the
internal reference manager that then `steals` the refreshed reader from the internal
reference manager for external usage. This ensures that external and internal readers
are consistent on an external refresh. As a side effect this also releases old segments
referenced by the internal reference manager which can potentially hold on to already merged
away segments until it is refreshed due to a flush or indexing activity.
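The delegation can be modeled roughly like this (a Python toy, not the actual ReferenceManager API):

```python
class InternalManager:
    """Stand-in for the internal reference manager: refreshing it bumps the
    reader generation and frees the live version map."""
    def __init__(self):
        self.generation = 0
        self.version_map = {"doc-1": "buffered"}

    def refresh(self):
        self.generation += 1
        self.version_map.clear()  # refresh releases version-map memory
        return self.generation

class ExternalManager:
    """Delegates refresh to the internal manager and 'steals' the refreshed
    reader, so one external refresh keeps both views consistent."""
    def __init__(self, internal):
        self.internal = internal
        self.generation = internal.generation

    def refresh(self):
        self.generation = self.internal.refresh()

internal = InternalManager()
external = ExternalManager(internal)
external.refresh()  # one call refreshes both views and clears the map
```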
* Decouple `ChannelFactory` from Tcp classes
This is related to #27260. Currently `ChannelFactory` is tightly coupled
to classes related to the elasticsearch Tcp binary protocol. This commit
modifies the factory to be able to construct http or other protocol
channels.
* master: (25 commits)
Disable bwc tests in preparation of backporting #26931
TemplateUpgradeService should only run on the master (#27294)
Die with dignity while merging
Fix profiling naming issues (#27133)
Correctly encode warning headers
Fixed references to Multi Index Syntax (#27283)
Add an active Elasticsearch WordPress plugin link (#27279)
Setting url parts as required to reflect the code base (#27263)
keys in aggs percentiles need to be in quotes. (#26905)
Align routing param type with search.json (#26958)
Update to support bulk updates by query (#27172)
Remove duplicated SnapshotStatus (#27276)
add split index reference in indices.asciidoc
Add ability to split shards (#26931)
[Docs] Fix minor paragraph indentation error for multiple Indices params (#25535)
Upgrade to Jackson 2.8.10 (#27230)
Fix inconsistencies in the rest api specs for `tasks` (#27163)
Adjust RestHighLevelClient method modifiers (#27238)
Remove unused parameters in AnalysisRegistry (#27232)
Add more information on `_failed_to_convert_` exception (#27034)
...
If an out of memory error is thrown while merging, today we quietly
rewrap it into a merge exception and the out of memory error is
lost. Instead, we need to rethrow out of memory errors, and in fact any
fatal error here, and let those go uncaught so that the node is torn
down. This commit causes this to be the case.
Relates #27265
The warnings headers have a fairly limited set of valid characters
(cf. quoted-text in RFC 7230). While we have assertions that we adhere
to this set of valid characters ensuring that our warning messages do
not violate the specification, we were neglecting the possibility that
arbitrary user input would trickle into these warning headers. Thus,
missing here were tests for these situations and encoding of characters
that appear outside the set of valid characters. This commit addresses
this by encoding any characters in a deprecation message that are not
from the set of valid characters.
Relates #27269
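The encoding step can be illustrated as follows (the escape scheme shown is an assumption for illustration, not necessarily the one the commit uses):

```python
def encode_warning_value(message):
    """Keep characters allowed in RFC 7230 quoted-text (HTAB, SP, and
    visible ASCII except DQUOTE and backslash); escape everything else."""
    out = []
    for ch in message:
        point = ord(ch)
        if ch in ("\t", " ") or (0x21 <= point <= 0x7E and ch not in ('"', "\\")):
            out.append(ch)
        else:
            # percent-encode the UTF-8 bytes of the disallowed character
            out.append("".join("%%%02X" % b for b in ch.encode("utf-8")))
    return "".join(out)
```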
This change adds a new `_split` API that allows splitting indices into a new
index with a power-of-two multiple of the source index's shard count. This API works
alongside the `_shrink` API but doesn't require any shard relocation before
indices can be split.
The split operation is conceptually an inverse `_shrink` operation since we
initialize the index with a _synthetic_ number of routing shards that are used
for the consistent hashing at index time. Compared to indices created with
earlier versions this might produce slightly different shard distributions but
has no impact on the per-index backwards compatibility. For now, the user is
required to prepare an index to be splittable by setting the
`index.number_of_routing_shards` at index creation time. The setting allows the
user to prepare the index to be splittable in factors of
`index.number_of_routing_shards`, i.e. if the index is created with
`index.number_of_routing_shards: 16` and `index.number_of_shards: 2` it can be
split into `4, 8, 16` shards. This is an intermediate step until we can make
this the default. This also allows us to safely backport this change to 6.x.
The `_split` operation is implemented internally as a DeleteByQuery on the
lucene level that is executed while the primary shards execute their initial
recovery. Subsequent merges that are triggered due to this operation will not be
executed immediately. All merges will be deferred until the shards are started
and will then be throttled accordingly.
This change is intended for the 6.1 feature release but will not support splitting
pre-6.1 indices unless those indices have been shrunk before. In that case
they can be split back into their original number of shards.
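A small sketch of the split-factor rule described above (e.g. 2 shards with 16 routing shards can become 4, 8, or 16):

```python
def valid_split_targets(source_shards, routing_shards):
    """Shard counts an index can be split into: doubling from the source
    count while staying a divisor of index.number_of_routing_shards."""
    targets, n = [], source_shards * 2
    while n <= routing_shards:
        if routing_shards % n == 0:
            targets.append(n)
        n *= 2
    return targets
```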
While it's not possible to upgrade the Jackson dependencies
to their latest versions yet (see #27032 (comment) for more)
it's still possible to upgrade to the latest 2.8.x version.
We have a hidden setting called `index.queries.cache.term_queries` that disables caching of term queries in the query cache.
However, term queries have not been cached by Lucene's UsageTrackingQueryCachingPolicy since version 6.5.
This makes the ES policy useless, and it also makes it impossible to re-enable caching for term queries.
Since this change appeared in Lucene 6.5, the setting has been a no-op since version 5.4 of Elasticsearch.
This PR removes the setting and the custom policy.
Only tests should use the single-argument Environment constructor. To
enforce this, that constructor has been replaced with
a test framework factory method.
Production code (beyond initial Bootstrap) should always use the same
Environment object that Node.getEnvironment() returns. This Environment
is also available via dependency injection.
* es/master:
Fix snapshot getting stuck in INIT state (#27214)
Add an example of dynamic field names (#27255)
#26260 Allow ip_range to accept CIDR notation (#27192)
#27189 Fixed rounding of bounds in scaled float comparison (#27207)
Add support for Gradle 4.3 (#27249)
Fixes QueryStringQueryBuilderTests
build: Fix setting the incorrect bwc version in mixed cluster qa module
[Test] Fix QueryStringQueryBuilderTests.testExistsFieldQuery BWC
Adjust assertions for sequence numbers BWC tests
Do not create directories if repository is readonly (#26909)
[Test] Fix InternalStatsTests
[Test] Fix QueryStringQueryBuilderTests.testExistsFieldQuery
Uses norms for exists query if enabled (#27237)
Reinstate recommendation for ≥ 3 master-eligible nodes. (#27204)
For FsBlobStore and HdfsBlobStore, if the repository is read-only, the blob store should be aware of the readonly setting and not create directories if they don't exist.
Closes #21495
* master:
Lazy initialize checkpoint tracker bit sets
Remove checkpoint tracker bit sets setting
Fix stable BWC branch detection logic
Fix logic detecting unreleased versions
Enhances exists queries to reduce need for `_field_names` (#26930)
Added new terms_set query
Set request body to required to reflect the code base (#27188)
Update Docker docs for 6.0.0-rc2 (#27166)
Add version 6.0.0
Docs: restore now fails if it encounters incompatible settings (#26933)
Convert index blocks to cluster block exceptions (#27050)
[DOCS] Link remote info API in Cross Cluster Search docs page
Fix Laplace scorer to multiply by alpha (and not add) (#27125)
[DOCS] Clarify migrate guide and search request validation
Raise IllegalArgumentException if query validation failed (#26811)
prevent duplicate fields when mixing parent and root nested includes (#27072)
TopHitsAggregator must propagate calls to `setScorer`. (#27138)
When partitioning version constants into released and unreleased
versions, today we have a bug in finding the last unreleased
version. Namely, consider the following version constants on the 6.x
branch: ..., 5.6.3, 5.6.4, 6.0.0-alpha1, ..., 6.0.0-rc1, 6.0.0-rc2,
6.0.0, 6.1.0. In this case, our convention dictates that: 5.6.4, 6.0.0,
and 6.1.0 are unreleased. Today we correctly detect that 6.0.0 and 6.1.0
are unreleased, and then we say the previous patch version is unreleased
too. The problem is that the logic to remove that previous patch version is
broken: it does not skip alphas/betas/RCs, which have already been released. This
commit fixes this by skipping backwards over pre-release versions when
finding the previous patch version to remove.
Relates #27206
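The fixed lookup can be modeled like this (the version tuples and the `unreleased_tail` convention are illustrative):

```python
# Version constants as (major, minor, patch, prerelease); prerelease is
# None for GA releases, e.g. (6, 0, 0, "rc2") models 6.0.0-rc2.
VERSIONS = [(5, 6, 3, None), (5, 6, 4, None), (6, 0, 0, "alpha1"),
            (6, 0, 0, "rc1"), (6, 0, 0, "rc2"), (6, 0, 0, None), (6, 1, 0, None)]

def previous_unreleased_patch(versions, unreleased_tail):
    """Find the previous patch version to mark unreleased: start just below
    the trailing unreleased GA versions, then skip backwards over
    pre-release constants, which have already been released."""
    i = len(versions) - 1 - unreleased_tail
    while versions[i][3] is not None:
        i -= 1
    return versions[i]

# The broken logic stopped at the released 6.0.0-rc2; skipping the
# pre-releases lands on 5.6.4 as intended.
```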
* Enhances exists queries to reduce need for `_field_names`
Before this change we wrote the names of all the fields in a document to a `_field_names` field and then implemented exists queries as a term query on this field. The problem with this approach is that it bloats the index and also affects indexing performance.
This change adds a new method `existsQuery()` to `MappedFieldType` which is implemented by each sub-class. For most field types if doc values are available a `DocValuesFieldExistsQuery` is used, falling back to using `_field_names` if doc values are disabled. Note that only fields where no doc values are available are written to `_field_names`.
Closes #26770
* Addresses review comments
* Addresses more review comments
* implements existsQuery explicitly on every mapper
* Reinstates ability to perform term query on `_field_names`
* Added bwc depending on index created version
* Review Comments
* Skips tests that are not supported in 6.1.0
These values will need to be changed after backporting this PR to 6.x
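A sketch of the query selection described above (the field descriptors and query names here are simplified stand-ins for the mapper code):

```python
def exists_query(field):
    """Pick the implementation for an exists query: prefer a doc-values
    based query; fall back to a term query on _field_names when doc
    values are disabled for the field."""
    if field.get("doc_values", True):
        return ("DocValuesFieldExistsQuery", field["name"])
    # only fields without doc values are written to _field_names
    return ("TermQuery", "_field_names", field["name"])
```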
This is required in order to work correctly with bulk scorer implementations
that change the scorer during the collection process. Otherwise sub-collectors
might call `Scorer.score()` on the wrong scorer.
Closes #27131
* master:
Refactor internal engine
[Docs] #26541: add warning regarding the limit on the number of fields that can be queried at once in the multi_match query.
[Docs] Fix note in bucket_selector
[Docs] Fix indentation of examples (#27168)
[Docs] Clarify `span_not` query behavior for non-overlapping matches (#27150)
[Docs] Remove first person "I" from getting started (#27155)
This commit is a minor refactoring of internal engine to move hooks for
generating sequence numbers into the engine itself. As such, we refactor
tests that relied on this hook to use the new hook, and remove the hook
from the sequence number service itself.
Relates #27082
The headers passed to reindex were skipped except for the last one. This
commit fixes the copying of the headers and adds a base test
case for rest client builders to access the headers within the built
rest client.
relates #22976
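The bug pattern and its fix can be illustrated with a toy header builder (illustrative only; the real fix is in the rest client builder code):

```python
def build_headers_buggy(header_pairs):
    headers = {}
    for name, value in header_pairs:
        headers = {name: value}  # rebinds the dict: only the last pair survives
    return headers

def build_headers_fixed(header_pairs):
    headers = {}
    for name, value in header_pairs:
        headers[name] = value  # accumulate every header
    return headers

pairs = [("Authorization", "Basic xyz"), ("X-Trace-Id", "1")]
```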