The QueryStringQueryBuilder does not currently delegate to the field mapper's prefixQuery
method, so does not use indexed prefixes. This commit corrects this.
It also fixes a bug where a query a* would not match the word a if indexed prefixes were used with
a minchar setting of 2.
When executing terms aggregations we set the shard_size, meaning the
number of buckets to collect on each shard, to a value that's higher than
the number of requested buckets, to guarantee some basic level of
precision. We have an optimization in place so that we leave shard_size
set to size whenever we are searching against a single shard, in which
case maximum precision is guaranteed by definition.
Such optimization requires us access to the total number of shards that
the search is executing against. In the context of cross-cluster search,
once we will introduce multiple reduction steps (one per cluster) each
cluster will only know the number of local shards, which is problematic
as we should only optimize if we are searching against a single shard in a
single cluster. It could be that we are searching against one shard per cluster
in which case the current code would optimize number of terms causing
a loss of precision.
While discussing how to address the CCS scenario, we decided that we do
not want to introduce further complexity caused by this single shard
optimization, as it benefits only a minority of cases, especially when
the benefits are not so great.
This commit removes the single shard optimization, meaning that we will
always have heuristic enabled on how many number of buckets to collect
on the shards, even when searching against a single shard.
This will cause more buckets to be collected when searching against a single
shard compared to before. If that becomes a problem for some users, they
can work around that by setting the shard_size equal to the size.
Relates to #32125
With #36221 we introduced shards counting to address a rare failure.
This caused a worse problem in this test when replicas were allocated
and shards failures were randomly returned. The latch has to take into
account additional attempts caused by the shard failures, which means
that in order for run to be called, performPhaseOnShard will be called
(numShards + numFailures) times.
To address this, we need to decide upfront which shard is going to fail,
making sure that at least one shards is successful otherwise the whole
request fails.
Closes#37074
With this commit we introduce a new store type `hybridfs` that is a
hybrid between `mmapfs` and `niofs`. This store type chooses different
strategies to read Lucene files based on the read access pattern (random
or linear) in order to optimize performance.
This store type has been available in earlier versions of Elasticsearch
as `default_fs`. We have chosen a different name now in order to convey
the intent of the store type instead of tying it to the fact whether it
is the default choice.
Relates #36668
SearchAsyncActionTests may fail with RejectedExecutionException as InitialSearchPhase may try to execute a runnable after the test has successfully completed, and the corresponding executor was already shut down. The latch was located in getNextPhase that is almost correct, but does not cover for the last finishAndRunNext round that gets executed after onShardResult is invoked.
This commit moves the latch to count the number of shards, and allowing the test to count down later, after finishAndRunNext has been potentially forked. This way nothing else will be executed once the executor is shut down at the end of the tests.
Closes#36221Closes#33699
* Fixes the issue reproduced in the added tests:
* When having open index requests on a shard that are waiting for a refresh, relocating that shard
becomes blocked until that refresh happens (which could be never as in the test scenario).
With #36997 we added the ability to provide a cluster alias with a
SearchRequest.
The next step is to disable the final reduction whenever a cluster alias
is provided with the SearchRequest. A cluster alias will be provided
when executing a cross-cluster search request with alternate execution
mode, where each cluster does its own reduction locally. In order for
the CCS node to be able to later perform an additional reduction of the
results, we need to make sure that all the needed info stays available.
This means that terms aggregations can be reduced but not pruned, and
pipeline aggs should not be executed. The final reduction will happen
later in the CCS coordinating node.
Relates to #36997 & #32125
With the upcoming cross-cluster search alternate execution mode, the CCS
node will be able to split a CCS request into multiple search requests,
one per remote cluster involved. In order to do that, the CCS node has
to be able to signal to each remote cluster that such sub-requests are
part of a CCS request. Each cluster does not know about the other
clusters involved, and does not know either what alias it is given in
the CCS node, hence the CCS coordinating node needs to be able to provide
the alias as part of the search request so that it is used as index prefix
in the returned search hits.
The cluster alias is a notion that's already supported in the search shards
iterator and search shard target, but it is currently used in CCS as both
index prefix and connection lookup key when fanning out to all the shards.
With CCS alternate execution mode the provided cluster alias needs to be
used only as index prefix, as shards are local to each cluster hence no
cluster alias should be used for connection lookups.
The local cluster alias can be set to the SearchRequest at the transport layer
only, and its constructor/getter methods are package private.
Relates to #32125
Improves on #36449 which did not cover the situation where a node had bumped its term during
the election, and not when receiving the first follower check. This was uncovered while refactoring
NodeJoinTests so that they don't need to access to an internal field of Coordinator anymore (which
can now be made private).
* This speeds up the test from an average 25s down to 7s runtime
* There is no need for artificially slowing down the snapshot to reproduce the issue of an out of sync routing table in practice.
Over hundreds of test runs the test's snapshot shard service still runs in the index not found exception every time reproducing this issue.
* Relates #36294
Keys are compared in BucketSortPipelineAggregation so making key type (ArrayMap) implement Comparable. Maps are compared using the entry set's iterator so ordered maps order is maintain. For each entry first comparing key then value. Assuming all keys are strings. When comparing entries' values if type is not identical and\or type not implementing Comparable, throwing exception. Not implementing equals() and hashCode() functions as parent's ones are sufficient. Tests included.
Today the routing of a SourceToParse is assigned in a separate step
after the object is created. We can easily forget to set the routing.
With this commit, the routing must be provided in the constructor of
SourceToParse.
Relates #36921
We introduce a typeless API in #35790 where we translate the default
docType "_doc" to the user-defined docType. However, we do not rewrite
the SourceToParse with the resolved docType. This leads to a situation
where we have two translog operations for the same document with
different types:
- prvOp [Index{id='9LCpwGcBkJN7eZxaB54L', type='_doc', seqNo=1,
primaryTerm=1, version=1, autoGeneratedIdTimestamp=1545125562123}]
- newOp [Index{id='9LCpwGcBkJN7eZxaB54L', type='not_doc', seqNo=1,
primaryTerm=1, version=1, autoGeneratedIdTimestamp=-1}]
Closes#36769
In this test, we verify that the LocalCheckpointTracker is initialized
with the operations of the safe commit. And the test fails because
Engine#Index does not implement the equals method (should not
implement as it consists of a mutable ParsedDocument).
Closes#36470
This is a follow-up to some discussions around #36399. Currently we have
relatively confusing compression behavior where compression can be
configured for requests based on transport.compress or a specific
setting for a remote cluster. However, we can only compress responses
based on transport.compress as we do not know where a request is
coming from (currently).
This commit modifies the behavior to NEVER compress responses based on
settings. Instead, a response will only be compressed if the request was
compressed. This commit also updates the documentation to more clearly
described transport level compression.
When the script contexts were created in 6, the use of params.ctx was
deprecated. This commit cleans up that code and ensures that params.ctx
is null in both watcher script contexts.
Relates: #34059
This commit adds a RemoteClusterAwareRequest interface that allows a
request to specify which remote node it should be routed to. The remote
cluster aware client will attempt to route the request directly to this
node. Otherwise it will send it as a proxy action to eventually end up
on the requested node.
It implements the ccr clean_session action with this client.
* Adds term number and greppable phrase 'coordinator becoming' to Coordinator
mode changes
* Adds term and version to messages from the ClusterApplier about master
changes
* Reduces some LeaderChecker messages to TRACE level
This commit modifies ESSingleNodeTestCase and ESIntegTestCase and
several concrete test classes to use node names when bootstrapping the
cluster.
Today ClusterBootstrapService.INITIAL_MASTER_NODE_COUNT_SETTING
setting is used to bootstrap clusters in tests. Instead, we want to use
ClusterBootrstapService.INITIAL_MASTER_NODES_SETTING and get rid of
the former setting eventually.
There were two main problems when refactoring InternalTestCluster:
1. Nodes are created one-by-one in buildNode method. And node.name
is created in this method as well. It's not suitable for bootstrapping,
because we need to have the names of all master eligible nodes in
advance, before creating the node with bootstrapping configuration set.
We address this issue by separating buildNode into two methods:
getNodeSettings and buildNode. We first iterate over all nodes to
get nodes settings, then change the setting for the bootstrapping node
and then proceed with building the node.
2. If autoManageMinMasterNodes = false, there is no way for the test to
set the list of bootstrapping nodes because node names are not known in
advance. This problem is solved by adding updateNodesSettings method
to NodeConfigurationSource and ESIntegTestCase (which could be
overridden by concrete integration test class). Once we have the list
of settings for all nodes, the integration test class is allowed to
update it. In our case, we update the
ClusterBootrstapService.INITIAL_MASTER_NODES_SETTING setting.
... MlDistributedFailureIT.testLoseDedicatedMasterNode.
An intermittent failure has been observed in
`MlDistributedFailureIT. testLoseDedicatedMasterNode`.
The test launches a cluster comprised by a dedicated master node
and a data and ML node. It creates a job and datafeed and starts them.
It then shuts down and restarts the master node. Finally, the test asserts
that the two tasks have been reassigned within 10s.
The intermittent failure is due to the assertions that the tasks have been
reassigned failing. Investigating the failure revealed that the `assertBusy`
that performs that assertion times out. Furthermore, it appears that the
job task is not reassigned because the memory tracking info is stale.
Memory tracking info is refreshed asynchronously when a job is attempted
to be reassigned. Tasks are attempted to be reassigned either due to a relevant
cluster state change or periodically. The periodic interval is controlled by a cluster
setting called `cluster.persistent_tasks.allocation.recheck_interval` and defaults to 30s.
What seems to be happening in this test is that if all cluster state changes after the
master node is restarted come through before the async memory info refresh completes,
then the job might take up to 30s until it is attempted to reassigned. Thus the `assertBusy`
times out.
This commit changes the test to reduce the periodic check that reassigns persistent
tasks to `200ms`. If the above theory is correct, this should eradicate those failures.
Closes#36760
Negative timestamps are currently supported in joda time. These are
dates before epoch. However, it doesn't really make sense to have a
negative timestamp, since this is a modern format. Any dates before
epoch can be represented with normal date formats, like ISO8601.
Additionally, implementing negative epoch timestamp parsing in java time
has an edge case which would more than double the code required. This
commit deprecates use of negative epoch timestamps.
Currently the ByteBufferReference does not duplicate the buffer.
This means that any changes to the buffer's limit or position will
impact the reference. This can lead to unexpected behavior. This commit
uses the ByteBuffer#slice method to ensure that the reference retains
its own ByteBuffer.
This commit partially reverts #36447 by using the ability of Joda time's
DateTimeFormatterBuilder to append multiple parsers instead of using the
MergedDateFormatter. The MergedDateFormatter will be removed in a future
change, as it is not as performant due to creating potentially many
exceptions during heavy date parsing. This change is a stop-gap until
that followup is ready.
closes#36602
The test testClusterInfoServiceInformationClearOnError relies on timing behavior. It sets
InternalClusterInfoService.INTERNAL_CLUSTER_INFO_TIMEOUT_SETTING to 1s and relies on the
fact that the stats request completes within that timeframe (which our ever-so-slow CI seems to
violate at times). Unfortunately the logging has been misimplemented in InternalClusterInfoService,
so the corresponding log messages showing that the requests have timed out are missing for this.
The issue can be locally reproduced by reducing the timeout to something lower.
Closes#36554
If the master sends both its follower checks just before disconnection then
neither will receive a response, meaning it must wait for the checks to time
out and send another in order to detect the disconnection and stand down.
Closes#36788
Single-node discovery is not persisting cluster states, which was caused by a recent 7.0-only
refactoring. This commit ensures that the cluster state is properly persisted when using single-node
discovery and adds a corresponding test.
* Replace 0L with an UNASSIGNED_PRIMARY_TERM constant
0 is an illegal value for a primary term that is often used to indicate
the primary term isn't assigned or is irrelevant. This PR replaces the
usage of 0 with a constant, to improve readability and so it can be
tracked and if needed, replaced.
* feedback
The default index_prefix settings will index prefixes of between 2 and 5 characters in length.
Currently, if a prefix search falls outside of this range at either end we fall back to a standard prefix
expansion, which is still very expensive for single character prefixes. However, we have an option
here to use a wildcard expansion rather than a prefix expansion, so that a query of a* gets remapped
to a? against the _index_prefix field - likely to be a very small set of terms, and certain to be much
smaller than a* against the whole index.
This commit adds this extra level of mapping for any prefix term whose length is one less than
the min_chars parameter of the index_prefixes field.
Today we might corrupt a fully deleted segment which is then pruned
once a snapshot is taken. This causes random test failures in CorruptedFileIT.
This change hardens the selection of files to corrupt and removes some fragile
code preventing fully deleted segments to be taken into account.
Closes#36526
Commit #36786 updated docs and strings to reference transport.port instead of
transport.tcp.port. However, this breaks backwards compatibility tests
as the tests rely on string configurations and transport.port does not
exist prior to 6.6. This commit reverts the places were we reference
transport.tcp.port for tests. This work will need to be reintroduced in
a backwards compatible way.
Lucene 7.6 uses a smaller encoding for LatLonShape. This commit forks the LatLonShape classes to Elasticsearch's local lucene package. These classes will be removed on the release of Lucene 7.6.
This is related to #36652. In 7.0 we plan to deprecate a number of
settings that make reference to the concept of a tcp transport. We
mostly just have a single transport type now (based on tcp). Settings
should only reference tcp if they are referring to socket options. This
commit updates the settings in the docs. And removes string usages of
the old settings. Additionally it adds a missing remote compress setting
to the docs.
TransportWriteAction.WriteReplicaResult is not properly synchronized, which can lead to a data race
between the thread that calls respond and the AsyncAfterWriteAction that calls either onSuccess or
onFailure. This data race results in the response listener not being called, which ultimately results in
a stuck replication task on the replica.
This commit is related to #36127. It adds a CcrRestoreSourceService to
track Engine.IndexCommitRef need for in-process file restores. When a
follower starts restoring a shard through the CcrRepository it opens a
session with the leader through the PutCcrRestoreSessionAction. The
leader responds to the request by telling the follower what files it
needs to fetch for a restore. This is not yet implemented.
Once, the restore is complete, the follower closes the session with the
DeleteCcrRestoreSessionAction action.
This commit removes the originalSettings member from Node. It was only
needed to allows test clusters to recreate the node in certain
situations. Instead, the test cluster now keeps track of these settings.
The joda epoch parsing code currently supports passing epoch time as a
number in scientific notation. However, no systems appear to exist which
output timestamps in scientific notation. In java time, it is
particularly complex to implement scientific notation timestamp parsing
within a DateTimeFormatter. This commit adds a deprecation warning when
the epoch time parsers in joda parse scientific notation, so that it can
be removed when switching to java time.
joda are passed a time in scientific notation.
* [Geo] Expose BKDBackedGeoShapes as new VECTOR strategy
This commit exposes lucene's LatLonShape field as a new
strategy in GeoShapeFieldMapper. To use the new indexing
approach, strategy should be set to "vector" in the
geo_shape field mapper. If the tree parameter is set
the mapper will throw an IAE. Note the following:
When using vector strategy:
* geo_shape query does not support querying by POINT,
MULTIPOINT, or GEOMETRYCOLLECTION.
* LINESTRING and MULTILINESTRING queries do not support
WITHIN relation.
* CONTAINS relation is not supported.
* The tree, precision, tree_levels, distance_error_pct,
and points_only parameters will not throw an exception
but they have no effect and will be marked as
deprecated..
All other features are supported.
* revert change to PercolatorFieldMapper
* fix ExistsQuery for geo_shape vector strategy
* add deprecation logging for tree, precision, tree_levels, distance_error_pct, and points_only
* initial update to geoshape docs, including mapping migration updates
* initial support for GeoCollection queries
* fix docs and javadoc errors
* clean up geocollection queries
* set deprecated mapping tests to NOTCONSOLE
* fix geo-shape mapper asciidoc mapping and test warnings
* add support for point queries using LatLonShapeBoundingBoxQuery
* update GeoShapeQueryBuilderTests to include POINT queries for VECTOR strategy. Other comment cleanups
* add lucene geometry build testing to ShapeBuilder tests
* remove deprecated prefix tree mapping from geo-shape.asciidoc
* refactor GeoShapeFieldMapper into LegacyGeoShapeFieldMapper and GeoShapeFieldMapper
Both classes derive from BaseGeoShapeFieldMapper that provides shared parameters:
coerce, ignoreMalformed, ignore_z_value, orientation.
* update docs to remove vector strategy
* fix GeometryCollectionBuilder#buildLucene to return the object created by the shape builder
* fix LineLength failure in GeoJsonShapeParserTests
* ShapeMapper refactor changes from PR feedback
* fix typo in geo-shape.asciidoc
* ignore circle test in docs
* update indexing-approach ref to geoshape-indexing-approach
* add warnings check for LegacyGeoShapeFieldMapper to AbstractBuilderTestCase
* fix deprecatedParameters setup
* update indexing approach
* fixing unexpected warnings failures
* move orientation back to field type
* remove if in LegacyGeoShapeFieldMapper#doXContent. Fix GeoShapeFieldMapper to work with double array as a point
* fix indexing-approach link in circle section of geoshape docs
* add strategy to deprecation warnings check
* fix test failures
* fix typo in QueryStringQueryBuilderTests
* fix total hits to totalHits().value
* fix version number
* add version check to BaseGeoShapeFieldMapper
* fix line length!
* revert version check in BaseGeoShapeFieldMapper
* Fix serialization of mappings of legacy shapes.
* Deprecate types in index API
- deprecate type-based constructors of IndexRequest
- update tests to use typeless IndexRequest constructors
- no yaml tests as they have been already added in #35790
Relates to #35190
MapperService#getAllMetaFields returns an array, which is created out of
an `ObjectHashSet`. Such set does not guarantee deterministic hash
ordering. The array returned by its toArray may be sorted differently
at each run. This caused some repeatability issues in our tests (see #29080)
as we pick random fields from the array of possible metadata fields,
but that won't be repeatable if the input array is sorted differently at
every run. Once setting the tests seed, hppc picks that up and the sorting is
deterministic, but failures don't repeat with the seed that gets printed out
originally (as a seed was not originally set).
See also https://issues.carrot2.org/projects/HPPC/issues/HPPC-173.
With this commit, we simply create a static sorted array that is used for
`getAllMetaFields`. The change is in production code but really affects
only testing as the only production usage of this method was to iterate
through all values when parsing fields in the high-level REST client code.
Anyways, this seems like a good change as returning an array would imply
that it's deterministically sorted.
In order for CCS alternate execution mode (see #32125) to be able to do the final reduction step on the CCS coordinating node, we need to serialize additional info in the transport layer as part of each `SearchHit`. Sort values are already present but they are formatted according to the provided `DocValueFormat` provided. The CCS node needs to be able to reconstruct the lucene `FieldDoc` to include in the `TopFieldDocs` and `CollapseTopFieldDocs` which will feed the `mergeTopDocs` method used to reduce multiple search responses (one per cluster) into one.
This commit adds such information to the `SearchSortValues` and exposes it through a new getter method added to `SearchHit` for retrieval. This info is only serialized at transport and never printed out at REST.
This change adds a new untyped endpoint `{index}/_source/{id}` for both the
GET and the HEAD methods to get the source of a document or check for its
existance. It also adds deprecation warnings to RestGetSourceAction that emit
a warning when the old deprecated "type" parameter is still used. Also updating
documentation and tests where appropriate.
Relates to #35190
This commit adds support to enable bulk upserts to use an index's
default pipeline. Bulk upsert, doc_as_upsert, and script_as_upsert
are all supported.
However, bulk script_as_upsert has slightly surprising behavior since
the pipeline is executed _before_ the script is evaluated. This means
that the pipeline only has access the data found in the upsert field
of the script_as_upsert. The non-bulk script_as_upsert (existing behavior)
runs the pipeline _after_ the script is executed. This commit
does _not_ attempt to consolidate the bulk and non-bulk behavior for
script_as_upsert.
This commit also adds additional testing for the non-bulk behavior,
which remains unchanged with this commit.
fixes#36219
This commit exposes lucene's LatLonShape field as the
default type in GeoShapeFieldMapper. To use the new
indexing approach, simply set "type" : "geo_shape" in
the mappings without setting any of the strategy, precision,
tree_levels, or distance_error_pct parameters. Note the
following when using the new indexing approach:
* geo_shape query does not support querying by
MULTIPOINT.
* LINESTRING and MULTILINESTRING queries do not
yet support WITHIN relation.
* CONTAINS relation is not yet supported.
The tree, precision, tree_levels, distance_error_pct,
and points_only parameters are deprecated.
The remote connection info API leads to resolving addresses of seed
nodes when invoked. This is problematic because if a hostname fails to
resolve, we would not display any remote connection info. Yet, a
hostname not resolving can happen across remote clusters, especially in
the modern world of cloud services with dynamically chaning
IPs. Instead, the remote connection info API should be providing the
configured seed nodes. This commit changes the remote connection info to
display the configured seed nodes, avoiding a hostname resolution. Note
that care was taken to preserve backwards compatibility with previous
versions that expect the remote connection info to serialize a transport
address instead of a string representing the hostname.
This commit adds the last sequence number and primary term of the last operation that have
modified a document to `GetResult` and uses it to power the Update API.
Relates #36148
Relates #10708
Investigating #36556 was made a little trickier because the feedback from the
failing assertion wasn't very informative, and the messages attached to other
nearby assertions were misleading. This commit improves the feedback from these
assertions and tidies up a few other issues in the test suite.
* If removing half the nodes completely removes a shard from the cluster we can't count it in the assertion
* Also:
* Remove unused logger parameter
* Fix typo in var name
* Closes#35365
This commit add support for using sequence numbers to power [optimistic concurrency control](http://en.wikipedia.org/wiki/Optimistic_concurrency_control)
in the delete and index transport actions and requests. A follow up will come with adding sequence
numbers to the update and get results.
Relates #36148
Relates #10708
This commit updates our transport settings for 7.0. It generally takes a
few approaches. First, for normal transport settings, it usestransport.
instead of transport.tcp. Second, it uses transport.tcp, http.tcp,
or network.tcp for all settings that are proxies for OS level socket
settings. Third, it marks the network.tcp.connect_timeout setting for
removal. Network service level settings are only settings that apply to
both the http and transport modules. There is no connect timeout in
http. Fourth, it moves all the transport settings to a single class
TransportSettings similar to the HttpTransportSettings class.
This commit does not actually remove any settings. It just adds the new
renamed settings and adds todos for settings that will be deprecated.
The existing joda compat methods isEquals isAfter and isBefore all took
in a ZonedDateTime, but since all of the scripting is now using the new
JodaCompatZonedDateTime, these are changed to take that in instead.
The following updates were made:
* Add deprecation warnings to `RestUpdateAction`, plus a test in `RestUpdateActionTests`.
* Deprecate relevant methods on the Java HLRC requests/ responses.
* Add HLRC integration tests for the typed APIs.
* Update documentation (for both the REST API and Java HLRC).
* Fix failing integration tests.
Because of an earlier PR, the REST yml tests were already updated (one version without types, and another legacy version that retains types).
SearchSortValuesTests extends now `AbstractSerializingTestCase` which removes some code duplication and standardizes the way we test `fromXContent`, serialization and equals/hashcode.
Also, we were never creating `SearchSortValues` through their public constructor that accept an array of `DocValueFormat` together with the array of raw sort values. That is covered now, which involved some conversion from `BytesRef` to String in the test.
Also, the previous test was not using doing any equality check against the original and parsed versions in `testFromXContent` due to values being parsed with different types in some cases, which is now covered by converting those values using a new method added to `RandomObjects`. The code was already there as part of `randomStoredFieldValues`, but it is now exposed to be used in other scenarios.
For cross cluster search alternate execution mode (see #32125), we will need to take a search request that spans across multiple clusters (based on index prefixes e.g. cluster1:index, cluster2:index etc.) and split it into multiple search requests to be sent to each cluster. A copy constructor added to `SearchRequest` would make that easy and well maintainable in the future.
Something along the same lines already happens in `BulkByScrollParallelizationHelper`, but the corresponding code went outdated as some new fields were added to `SearchRequest` which were not added to the bulk by scroll code. A copy constructor helps making the task of copying a search request maintainable over time.
* Add IntervalQueryBuilder with support for match and combine intervals
* Add relative intervals
* feedback
* YAML test - broekn
* yaml test; begin to add block source
* Add block; make disjunction its own source
* WIP
* Extract IntervalBuilder and add tests for it
* Fix eq/hashcode in Disjunction
* New yaml test
* checkstyle
* license headers
* test fix
* YAML format
* YAML formatting again
* yaml tests; javadoc
* Add OR test -> requires fix from LUCENE-8586
* Add docs
* Re-do API
* Clint's API
* Delete bash script
* doc fixes
* imports
* docs
* test fix
* feedback
* comma
* docs fixes
* Tidy up doc references to old rule
Today we assert that the fake node ID is greater than the real node's ID. In
fact we want to assert that it's greater than _all_ proper UUIDs. This adds
assertions to that effect.
* Adds deprecation logging to ScriptDocValues#getValues.
First commit addressing issue #22919.
`ScriptDocValues#getValues` was added for backwards compatibility but no
longer needed. Scripts using the syntax `doc['foo'].values` when
`doc['foo']` is a list should be using `doc['foo']` instead.
* Fixes two build errors in #34279
* Removes unused import in ScriptDocValuesDatesTest
* Removes used of `.values` in example in diversified-sampler-aggregation.asciidoc
* Removes use of .values from painless test.
Part of #34279
* Updates tests to use `doc[foo]` syntax rather than `doc[foo].values`.
* Removes use of `getValues()` and replaces use of `doc[foo].values` with `doc[foo]`.
* Indentation fix.
* Remove unnecessary list construction at previous `getValues()` callsite in ScriptDocValues.GeoPoints.
* Update migration doc and add link to `getValue` in ScriptDocValues javadoc.
* Fix compile
* Fix javadoc issue
* Removes ScriptDocValues#getValues usage from painless whitelist.
Today, ResyncTask.Status is not registered, but appears as a task status
sometimes, leading to `Failed to deserialize response from handler` exceptions:
java.lang.IllegalArgumentException: Unknown NamedWriteable [org.elasticsearch.tasks.Task$Status][resync]
This commit adds the missing registration.
ConcurrentHashMap does not always behave correctly if removing elements and
concurrently checking for its emptyiness. Work around this by protecting all
usages with a mutex (there was only one usage unprotected by the mutex anyway)
and then we don't even need a ConcurrentHashMap at all.
In order for CCS alternate execution mode (see #32125) to be able to do the final reduction step on the CCS coordinating node, we need to serialize additional info in the transport layer as part of the `SearchHits`, specifically:
- lucene `SortField[]` which contains info about the fields that sorting was performed on and their type, which depends on mappings (that the CCS node does not know about)
- collapse field (`String`) that field collapsing was executed on, if requested
- collapse values (`Object[]`) that field collapsing was based on, if requested
This info is needed to be able to reconstruct the `TopFieldDocs` or `CollapseFieldTopDocs` in the CCS coordinating node to feed the `mergeTopDocs` method and reduce multiple search responses received (one per cluster) into one.
This commit adds such information to the `SearchHits` class. It's nullable info that is not serialized through the REST layer. `SearchPhaseController` sets such info at the end of the hits reduction phase.
* Enable parallel restore operations
* Add uuid to restore in progress entries to uniquely identify them
* Adjust restore in progress entries to be a map in cluster state
* Added tests for:
* Parallel restore from two different snapshots
* Parallel restore from a single snapshot to different indices to test uuid identifiers are correctly used by `RestoreService` and routing allocator
* Parallel restore with waiting for completion to test transport actions correctly use uuid identifiers
In this test, we keep track of a list of index commits then verify that
we reload exactly every operation from the safe commit. If a background
merge is triggered, then we might have a new index commit which is not
recorded in the tracking list. This change disables merges in the test.
Closes#36470
A previous fix of a similar problem in #35201 wasn't general enough, we also
need to catch cases where the randomly generated query string starts with some
version of "now" and hits a date field.
Closes#36595
This commit adds deprecation warnings when using format specifiers with
joda data formats that will change with java time. It also adds the "8"
prefix which may be used to force the new java time format parsing.
The getters and setters for useDisMax() have been deprecated since at least 6.0,
also there hasn't been any reference to the query parameter in the
documentation. Removing it from the builder and tests and replacing it with
`tieBreaker(1.0f)` where necessary.
The commit changes how indices are closed in the MetaDataIndexStateService.
It now uses a 3 steps process where writes are blocked on indices to be closed,
then some verifications are done on shards using the TransportVerifyShardBeforeCloseAction
added in #36249, and finally indices states are moved to CLOSE and their routing
tables removed.
The closing process also takes care of using the pre-7.0 way to close indices if the
cluster contains mixed version of nodes and a node does not support the TransportVerifyShardBeforeCloseAction. It also closes unassigned indices.
Related to #33888
This commit turns MultiValuesSourceFieldConfig into a proper
ToXContentObject for easy testing and verification of its
to/from XContent methods.
Closes#36474.
We are attempting to replace the usage of the `MockTcpTransport` with
the `MockNioTransport`. This commit replaces usages of
`MockTcpTransport` in two zen test cases.
When a security manager is present, the JVM will cache positive hostname
lookups indefinitely. This can be problematic, especially in the modern
world with cloud services where DNS addresses can change, or
environments using Docker containers where IP addresses could be
considered ephemeral. This behavior impacts cluster discovery,
cross-cluster replication and cross-cluster search, reindex from remote,
snapshot repositories, webhooks in Watcher, external authentication
mechanisms, and the Elastic Stack Monitoring Service. The experience of
watching a DNS lookup change yet not be reflected within Elasticsearch
is a poor experience for users. The reason the JVM has this is guard
against DNS cache posioning attacks. Yet, there is already a defense in
the modern world against such attacks: TLS. With proper certificate
validation, even if a resolver falls prey to a DNS cache poisoning
attack, using TLS would neuter the attack. Therefore we have a policy
with dubious security value that significantly impacts usability. As
such we make the usability/security tradeoff towards usability, since
the security risks are very low. This commit introduces new system
properties that Elasticsearch observes to override the JVM DNS cache
policy.
Previously persistent task assignment was checked in the
following situations:
- Persistent tasks are changed
- A node joins or leaves the cluster
- The routing table is changed
- Custom metadata in the cluster state is changed
- A new master node is elected
However, there could be situations when a persistent
task that could not be assigned to a node could become
assignable due to some other change, such as memory
usage on the nodes.
This change adds a timed recheck of persistent task
assignment to account for such situations. The timer
is suspended while checks triggered by cluster state
changes are in-flight to avoid adding burden to an
already busy cluster.
Closes#35792
* We should compare the target value with the to be applied value before interpreting the update as a change
* This speeds up the test failing in #36496 considerably by preventing state updates on noop setting updates
This commit add support to engine operations for resolving and verifying the sequence number and
primary term of the last modification to a document before performing an operation. This is
infrastructure to move our (optimistic concurrency control)[http://en.wikipedia.org/wiki/Optimistic_concurrency_control] API to use sequence numbers instead of internal versioning.
Relates #36148
Relates #10708
There are certain BootstrapCheck checks that may need access environment-specific
values. Watcher's EncryptSensitiveDataBootstrapCheck passes in the node's environment
via a constructor to bypass the shortcoming in BootstrapContext. This commit
pulls in the node's environment into BootstrapContext.
Another case is found in #36519, where it is useful to check the state of the
data-path. Since PathUtils.get and Paths.get are forbidden APIs, we rely on
the environment to retrieve references to things like node data paths.
This means that the BootstrapContext will have the same Settings used in the
Environment, which currently differs from the Node's settings.
data files under the cluster name subdirectory has been deprecated and was
meant to be removed in 6.0. This commit removes some leftover referrences to
these paths.
Includes the following:
* Reversion of doc-values changes in LUCENE-8374; we are interested in seeing if this
has an effect on benchmarks for node-stats and index-stats
* More improvements to docvalues updates
* Make sure to test conversion for both typed and typeless HLRC requests.
* Update a few more statements to deprecatedAndMaybeLog.
* Make sure Rest*SearchTemplateActionTests extend RestActionTestCase.
Currently TransportRequestOptions allows specific requests to request
compression. This commit removes this and always compresses based on the
settings. Additionally, it removes TransportResponseOptions as they
are unused.
This closes#36399.
* Fix CorruptFileIT to also take last DV generation into account
We currently only prune old .liv generations. With soft_deletes it's important
to also prune DV generations.
* Fix CorruptionUtils to skip the footer bytes after the checksum is read.
Today we read a broken checksum since we also checksum the 8 footer bytes that include
the checksum algorithm and the footer magic.
Closes#36526
Currently, the `BytesStreamOutput` always zeros out the underlying byte
pages when they are acquired. This should not be necessary as the stream
overwrites the underlying bytes as serialization occurs.
`PageCacheRecycler` is the class that creates and holds pages of arrays
for various uses. `BigArrays` is just one user of these pages. This
commit moves the constants that define the page sizes for the recycler
to be on the recycler class.
This commit moves the node field in the JoinRequest object to be a
private field, adding a dedicated accessor. This is a minor breaking
change in that it is no longer possible for all callers to overwrite
this field, but that is a feature.
This commit moves the MergedDateFormatter to a package private class and
reworks joda DateFormatter instances to use that instead of a single
DateTimeFormatter with multiple parsers. This will allow the java and
joda multi formats to share the same format parsing method in a
followup.
Currently our handshake requests do not include a version. This is
unfortunate as we cannot rely on the stream version since it is not the
sending node's version. Instead it is the minimum compatibility version.
The handshake request is currently empty and we do nothing with it. This
should allow us to add data to the request without breaking backwards
compatibility.
This commit adds the version to the handshake request. Additionally, it
allows "future data" to be added to the request. This allows nodes to craft
a version compatible response. And will properly handle additional data in
future handshake requests. The proper handling of "future data" is useful
as this is the only request where we do not know the other node's version.
Finally, it renames the TcpTransportHandshaker to
TransportHandshaker.
This commit contains a few minor changes to our search code:
- adjust the visibility of a couple of methods in our search code to package private from public or protected.
- make some of the `SearchPhaseController` methods static where possible
- rename one of the `SearchPhaseController#reducedQueryPhase` methods (used only for scroll requests) to `reducedScrollQueryPhase` without the `isScrollRequest` argument which was always set to `true`
- replace leniency in `SearchPhaseController#setShardIndex` with an assert to make sure that we never set the shard index twice
- remove two null checks where the checked field can never be null
- resolve an unchecked warning
- replace `List#toArray` invocation that creates an array providing the true size with array creation of length 0
- correct a couple of typos in comments
The different `DocValueFormat` implementors throw `UnsupportedOperationException` for methods that they don't support. That is perfectly fine, and quite common as not all implementors support all of the possible formats. This makes it hard though to trace back which implementors support which formats as they all implement the same methods.
This commit introduces default methods in the `DocValueFormat` interface so that all methods throw `UnsupportedOperationException` by default. This way implementors can override only the methods that they specifically support.
This commit modifies BigArrays to take a circuit breaker name and
the circuit breaking service. The default instance of BigArrays that
is passed around everywhere always uses the request breaker. At the
network level, we want to be using the inflight request breaker. So this
change will allow that.
Additionally, as this change moves away from a single instance of
BigArrays, the class is modified to not be a Releasable anymore.
Releasing big arrays was always dispatching to the PageCacheRecycler,
so this change makes the PageCacheRecycler the class that needs to be
managed and torn-down.
Finally, this commit closes#31435 be making the serialization of
transport messages use the inflight request breaker. With this change,
we no longer push the global BigArrays instnace to the network level.
Today the Zen2 coordinator only applies a write block when there is no known
elected master, ignoring the `discovery.zen.no_master_block` setting. This
commit resolves this, applying the correct block according to the configuration
instead.
Today we do not enforce soft-deletes when accessing the Lucene changes
snapshot. This might lead to internal errors because we assume
soft-deletes are enabled in that code path.
1. CCR tests work without any changes
2. `testDanglingIndices` require changes the source code (added TODO).
3. `testIndexDeletionWhenNodeRejoins` because it's using just two
nodes, adding the node to exclusions is needed on restart.
4. `testCorruptTranslogTruncationOfReplica` starts dedicated master
one, because otherwise, the cluster does not form, if nodes are stopped
and one node is started back.
5. `testResolvePath` needs TEST cluster, because all nodes are stopped
at the end of the test and it's not possible to perform checks needed
by SUITE cluster.
6. `SnapshotDisruptionIT`. Without changes, the test fails because Zen2
retries snapshot creation as soon as network partition heals. This
results into the race between creating snapshot and test cleanup logic
(deleting index). Zen1 on the
other hand, also schedules retry, but it takes some time after network
partition heals, so cleanup logic executes latter and test passes. The
check that snapshot is eventually created is added to
the end of the test.
Today we log a slightly cryptic "cluster bootstrapping is disabled on this
node" message if bootstrapping hasn't been configured. Since there is today
only one way to bootstrap the cluster it seems preferable to spell out exactly
which setting is missing.
* Lower fielddata circuit breaker default limit
Lower fielddata circuit breaker default limit from 60% to 40% as we have
moved to doc_values for most of the cases.
* merge master in
* update tests
* update docs
Deals with a situation where a follower becomes disconnected from the leader, but only for such a
short time where it becomes candidate and puts up a NO_MASTER_BLOCK, but then receives a
follower check from the leader. If the leader does not notice the node disconnecting, it is important
for the node not to be turned back into a follower but try and join the leader again.
We still should prefer the node into a follower on a follower check when this follower check triggers
a term bump as this can help during a leader election to quickly have a leader turn all other nodes
into followers, even before the leader has had the chance to transfer a possibly very large cluster
state.
Closes#36428
The following updates were made:
- Add a new untyped endpoint `{index}/_explain/{id}`.
- Add deprecation warnings to Rest*Action, plus tests in Rest*ActionTests.
- For each REST yml test, make sure there is one version without types, and another legacy version that retains types (called *_with_types.yml).
- Deprecate relevant methods on the Java HLRC requests/ responses.
- Update documentation (for both the REST API and Java HLRC).
This commit converts the watcher execution context to use the joda
compat java time objects. It also again removes the joda methods from
the painless whitelist.
For each API, the following updates were made:
- Add deprecation warnings to `Rest*Action`, plus tests in `Rest*ActionTests`.
- For each REST yml test, make sure there is one version without types, and another legacy version that retains types (called *_with_types.yml).
- Deprecate relevant methods on the Java HLRC requests/ responses.
- Update documentation (for both the REST API and Java HLRC).
Allow `auto_date_histogram` as a valid parent agg for derivative,
cumulative sum, moving average, moving function and serial
differencing pipeline aggregations.
Since all these aggs share the same requirement (sequentially
ordered parent aggs), this commit also refactors to share
the same validation code so that any newly added aggs won't
be forgotten.
Closes#35578
This change adds a way to provide the content type of the rest serialization
tests when creating random instances. This is used by SearchHitsTests to ensure
that the internal members of the class are created with the same xContentType
and that equals can be used to compare an instances created from an XContent
view.
Today the `GetDiscoveredNodesAction` waits, possibly indefinitely, to discover
enough nodes to bootstrap the cluster. However it is possible that the cluster
forms before a node has discovered the expected collection of nodes, in which
case the action will wait indefinitely despite the fact that it is no longer
required.
This commit changes the behaviour so that the action fails once a node receives
a cluster state with a nonempty configuration, indicating that the cluster has
been successfully bootstrapped and therefore the `GetDiscoveredNodesAction`
need wait no longer.
Relates #36380 and #36381; reverts 558f4ec278.
This commit creates JodaDateFormatter to replace
FormatDateTimeFormatter. It converts all uses of the old class
to DateFormatter to allow a future change to use JavaDateFormatter
when appropriate.
Moves all remaining (rolling-upgrade and mixed-version) REST tests to use Zen2. To avoid adding
extra configuration, it relies on Zen2 being set as the default discovery type. This required a few
smaller changes in other tests. I've removed AzureMinimumMasterNodesTests which tests Zen1
functionality and dates from a time where host providers were not configurable and each cloud
plugin had its own discovery.type, subclassing the ZenDiscovery class. I've also adapted a few tests
which were unnecessarily adding addTestZenDiscovery = false for the same legacy reasons. Finally,
this also moves the unconfigured-node-name REST test to Zen2, testing the auto-bootstrapping
functionality in development mode when no discovery configuration is provided.
Today we expose a new engine immediately during Lucene rollback. The new
engine is started with a safe commit which might not include all
acknowledged operation. With this change, we won't expose the new engine
until it has recovered from the local translog.
Note that this solution is not complete since it's able to reserve only
acknowledged operations before the global checkpoint. This is because we
replay translog up to the global checkpoint during rollback. A per-doc
Lucene rollback would solve this issue entirely.
Relates #32867
Today, we iterate the bitset of hardLiveDocs to calculate the number of
live docs. This calculation might be expensive if we enable soft-deletes
(by default) for old indices whose soft-deletes was disabled previously
and had hard-deletes.
Once soft-deletes is enabled, we no longer hard-update or hard-delete
documents directly. We have hard-deletes in two scenarios: (1) from old
segments where soft-deletes was disabled, (2) when IndexWriter hits
non-aborted exceptions. These two cases, IW flushes SegmentInfos before
exposing the hard-deletes; thus we can use the hard-delete count of
SegmentInfos.
Creating `CompositeBytesReference` has more overhead than a single
`ByteBufferReference`. Many of our messages will be contained to a
single `ByteBuffer`. This commit avoids creating composite instances
when there is 0 or 1 underlying `ByteBuffers`.
We support rolling upgrades from Zen1 by keeping the master as a Zen1 node
until there are no more Zen1 nodes in the cluster, using the following
principles:
- Zen1 nodes will never vote for Zen2 nodes
- Zen2 nodes will, while not bootstrapped, vote for Zen1 nodes
- Zen2 nodes that were previously part of a mixed cluster will automatically
(and unsafely) bootstrap themselves when the last Zen1 node leaves.
This commit makes FormatDateTimeFormatter and DateFormatter apis close
to each other, so that the former can be removed in favor of the latter.
This PR does not change the uses of FormatDateTimeFormatter yet, so that
that future change can be purely mechanical.
This is related to #35975. It implements a basic restore functionality
for the CcrRepository. When the restore process is kicked off, it
configures the new index as expected for a follower index. This means
that the index has a different uuid, the version is not incremented, and
the Ccr metadata is installed.
When the restore shard method is called, an empty shard is initialized.
This commit removes the parseDefaulting method from DateFormatter,
bringing it more inline with the joda equivalent
FormatDateTimeFormatter. This method was only needed for the java
time implementation of DateMathParser. Instead, a DateFormatter now
returns an implementation of DateMathParser for the given format,
allowing the java time implementation to construct the appropriate date
math parser internally.
* Add warning if cluster fails to form fast enough
Today if a leader is not discovered or elected then nodes are essentially
silent at INFO and above, and log copiously at DEBUG and below. A short delay
when electing a leader is not unusual, for instance if other nodes have not yet
started, but a persistent failure to elect a leader is a problem worthy of log
messages in the default configuration.
With this change, while there is no leader each node outputs a WARN-level log
message every 10 seconds (by default) indicating as such, describing the
current discovery state and the current quorum(s).
* Add note about whether the discovered nodes form a quorum or not
* Introduce separate ClusterFormationFailureHelper
... and back out the unnecessary changes elsewhere
* It can be volatile
In #34474, we added a new assertion to ensure that the
LocalCheckpointTracker is always consistent with Lucene index. However,
we reset LocalCheckpoinTracker in testDedupByPrimaryTerm cause this
assertion to be violated.
This commit removes resetCheckpoint from LocalCheckpointTracker and
rewrites testDedupByPrimaryTerm without resetting the local checkpoint.
Relates #34474
Add support for inlined user dictionary in Nori
This change adds a new option called `user_dictionary_rules` to the
Nori a tokenizer`. It can be used to set additional tokenization rules
to the Korean tokenizer directly in the settings (instead of using a file).
Closes#35842
In real deployments it is important that clusters are properly configured to
avoid accidentally forming multiple independent clusters at cluster
bootstrapping time. However we also expect to be able to unpack Elasticsearch
and start up one or more nodes without any up-front configuration, and have
them do their best to find each other and form a cluster after a few seconds.
This change adds a delayed automatic bootstrapping process to nodes that start
up with no relevant settings set to support the desired out-of-the-box
experience without compromising safety in properly-configured deployments.
The conversion of timezones in JodaCompatibleZonedDateTime from joda to
java time requires the use of the DateUtils class to cater for corner
cases.
Closes#36306
This pull request adds the TransportShardCloseAction which is a
transport replication action that acquires all index shard permits for
its execution. This action will be used in the future by the
MetaDataIndexStateService in a new index closing process, where
we need to execute some sanity checks before closing an index.
The action executes the following verifications on the primary and replicas:
* there is no other on going operation active on the shard
* the data node holding the shard knows that the index is blocked for writes
* the shard's max sequence number is equal to the global checkpoint
When the verifications are done and successful, the shard is flushed.
Relates #33888
Includes:
LUCENE-8594: DV update are broken for updates on new field
LUCENE-8590: Optimize DocValues update datastructures
LUCENE-8593: Specialize single value numeric DV updates
Relates #36286
The test testLookupSeqNoByIdInLucene fails because it expects if any
change should be visible after a flush. However, that flush might be
ignored if the waitIfOngoing parameter is false (the default value), and
there is an ongoing flush triggered after merge is running.
Closes#35823
This commit moves back to use explicit dependsOn for test tasks on
check. Not all tasks extending RandomizedTestingTask should be run by
check directly.
* Add deprecation warnings to `Rest*TermVectorsAction`, plus tests in `Rest*TermVectorsActionTests`.
* Deprecate relevant methods on the Java HLRC requests/ responses.
* Update documentation (for both the REST API and Java HLRC).
* For each REST yml test, create one version without types, and another legacy version that retains types (called *_with_types.yml).
* Merging the zen2 branch has made it impossible to reproduce the test failure here, likely as a result of the state now being written atomically
* Closes#36189
Some IDEs (specifically Eclipse 4.8.0) needs some explicit casting for type
inference. This was added in a recent change but we should also have comments so
we remember why this needs to be there (at least for now).
Currently, we only log that WriteStateException has occurred, in
GatewayMetaState.
This PR goal is to re-throw WriteStateException to upper layers.
If dirty flag is set to false, we wrap WriteStateException in
UncheckedIOException, we prefer unchecked exception not to add
explicit throws in the multitude of layers.
If dirty flag is set to true - the world is broken. And we need to
halt the JVM. Instead of explicit halting in GatewayMetaState, we
prefer to throw IOError, which will be subsequently handled by
ElasticsearchUncaughtExceptionHandler and JVM will be halted.
This PR also adds tests for WriteStateException.
The Eclipse IDE java compiler seems to need some special hints about what types
some functions used in the tests return. Correcting this for some test that were
newly merged to master.
This change removes the custom serialization of the total hits
and reuses the shard's serialization of Lucene#read/writeTopDocs
in the client code. This also removes the incorrect assertion that
trips randomly in bwc tests.
Closes#36284
Currently we use Math.random() in a few places in the tests which makes these
tests not reproducable with the random seed mechanism that comes with
ESTestCase. The change removes those instances.
This commit hides ClusterStates that have a STATE_NOT_RECOVERED_BLOCK from
ClusterStateAppliers. This is needed, because some appliers, such as IngestService, rely on
the fact, that cluster states with STATE_NOT_RECOVERED_BLOCK won't contain anything useful.
Once the state is recovered it's fully available for the appliers. This commit also switches many of
the remaining tests that require state persistence/recovery from Zen1 to Zen2.
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).
Relates #33028
Closes#19528
* Port a test Colin wrote for the TDigest library to validate TDigests storing over 2*MAXINT values. This appears to have been fixed in version 3.2 of TDigest, which Elasticsearch has been using for some time, so no changes were necessary to resolve this issue.
The shard deletion logic (triggered by IndicesStore), which also leads to index metadata deletion on
non-master-eligible data nodes, currently races against the new cluster state persistence logic
triggered by accepting cluster states. One thread is writing the index metadata while another one is
deleting the index metadata, leading to exceptions and assertions tripping (see below). The solution
proposed by this PR is to move the cluster state persistence of non-master-eligible nodes back to
the cluster applier service, just as it used to be for Zen1. This ensures that the index metadata
deletion logic, which is triggered by the shard deletion logic, runs on the same thread on which we
persist the cluster state.
Closes#35435
- make it easier to add additional testing tasks with the proper configuration and add some where they were missing.
- mute or fix failing tests
- add a check as part of testing conventions to find classes not included in any testing task.
This commit replaces usages of Streamable with Writeable for the
BaseTasksResponse / TransportTasksAction classes and subclasses of
these classes.
Note that where possible response fields were made final.
Relates to #34389
This commit adds an empty CcrRepository snapshot/restore repository.
When a new cluster is registered in the remote cluster settings, a new
CcrRepository is registered for that cluster.
This is implemented using a new concept of "internal repositories".
RepositoryPlugin now allows implementations to return factories for
"internal repositories". The "internal repositories" are different from
normal repositories in that they cannot be registered through the
external repository api. Additionally, "internal repositories" are local
to a node and are not stored in the cluster state.
The repository will be unregistered if the remote cluster is removed.
* Move `createRepository` call out of cluster state tasks
* Now only `RepositoriesService#applyClusterState` manipulates `this.repositories`
* Closes#9488
This commit makes `document`, `update`, `explain`, `termvectors` and `mapping`
typeless APIs work on indices that have a type whose name is not `_doc`.
Unfortunately, this needs to be a bit of a hack since I didn't want calls with
random type names to see documents with the type name that the user had chosen
upon type creation.
The `explain` and `termvectors` do not support being called without a type for
now so the test is just using `_doc` as a type for now, we will need to fix
tests later but this shouldn't require further changes server-side since passing
`_doc` as a type name is what typeless APIs do internally anyway.
Relates #35190
It is important that all shards of a given index have the same
`indexCreatedVersionMajor` to Lucene, or eg. merging those shards is going to
be considered illegal. At the moment, we use the latest Lucene version when
creating a shard, which could cause shards to have different created versions
eg. in case of forced allocation. This commit makes sure to reuse the
appropriate Lucene version in order to avoid such issues.
Closes#33826
Today we configure the soft-deletes field iff soft-deletes enabled.
Although this choice was correct, it prevents an engine with
soft-deletes disabled from opening a Lucene index with soft-deletes.
Moreover, this change should not have any side-effect if a Lucene index
does not have any soft-deletes.
Relates #36141
This commit implements proper metadata recovery for Zen2.
GatewayService is responsible for the recovery. In Zen1 GatewayService
creates an instance of Gateway, that is used to reach out to other cluster
nodes, get their state and calculate the most up-to-date state based on
versions. After that Gateway performs upgrade and archival of
ClusterSettings and closes bad indices. Then recovered state is passed to GatewayService.GatewayRecoveryListener that mixes up current state
and restored state, removes state not recovered block, creates the
routing table and performs re-routing.
In Zen2 we should perform this kind of logic on cluster startup, except
mixing state (because there is nothing to mix) and opening routing table.
This commit refactors out all `ClusterUpdate` functions in a separate class
`ClusterStateUpdaters`, which is used by `Gateway` and `GatewayService`
in case of Zen1, and by `GatewayMetaState` and `GatewayService` in case of
Zen2.
This commit also switches all integration tests that are already using Zen2 from
InMemoryPersistedState to GatewayMetaState.
This commit changes how an operation which requires all index shard
operations permits is executed when a primary term update is required:
the operation and the update are combined so that the operation is
executed after the primary term update under the same blocking
operation.
Closes#35850
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
This test suite can stop all the shared master-eligible nodes, which breaks the
cluster since any non-shared master-eligible nodes are stopped first in the
reset process between tests.
Since this test suite can leave the cluster in this somewhat broken state, it
seems best that it uses a new cluster for each test.
This change adds a soft limit to open scroll contexts that can be controlled with the dynamic cluster setting `search.max_open_scroll_context` (defaults to 500).
Today if a node `A` sends a peers request to another node `B` then `B` will
react by sending a peers request back to `A`. However if `A` is not
master-eligible then this reaction is pointless and fails with an exception
saying `non-master-eligible node found`, adding noise to the logs. This change
suppresses this response to non-master-eligible nodes.
Given that we check the max buckets limit on each shard when collecting the buckets, and that non final reduction cannot add buckets (see #35921), there is no point in counting and checking the number of buckets as part of non final reduction phases.
Such check is still needed though in the final reduction phases to make sure that the number of returned buckets is not above the allowed threshold.
Relates somehow to #32125 as we will make use of non final reduction phases in CCS alternate execution mode and that increases the chance that this check trips for nothing when reducing aggs in each remote cluster.
When building a query Lucene distinguishes two cases, queries that require to produce a score and queries that only need to match. We cloned this mechanism in the QueryBuilders in order to be able to produce different queries based on whether they need to produce a score or not. However the only case in es that require this distinction is the BoolQueryBuilder that sets a different minimum_should_match when a `bool` query is built in a filter context..
This behavior doesn't seem right because it makes the matching of `should` clauses different when the score is not required.
Closes#35293
* Moved method `canOpenIndex` is only used in tests -> moved to test CP
* Simplify `org.elasticsearch.index.store.Store#renameTempFilesSafe`
* Delete some dead methods
* Replace Streamable w/ Writeable in BaseTasksRequest and subclasses
This commit replaces usages of Streamable with Writeable for the
BaseTasksRequest / TransportTasksAction classes and subclasses of
these classes.
Relates to #34389
In #36033 we removed a catch block because we thought we were preventing
exceptions by avoiding concurrent elections, missing the obvious fact that some
joins are supposed to be failing.
As a quick fix the catch was reinstated in 3a5dab6d8e
but this change adds finesse by only catching exceptions from the joins that we
expect to fail. It also inlines an always-false parameter to `initialState()`.
Adds about a minute worth of backoffs and retries to saving task
results so it is *much* more likely that a busy cluster won't lose task
results. This isn't an ideal solution to losing task results, but it is
an incremental improvement. If all of the retries fail when still log
the task result, but that is far from ideal.
Closes#33764
Empty buckets don't need to be added when performing an incremental reduction step, they can be added later in the final reduction step. This will allow us to later remove the max buckets limit when performing non final reduction.
This is a follow-up to #35144. That commit made the underlying
connection opening process in TcpTransport asynchronous. However the
method still blocked on the process being complete before returning.
This commit moves the blocking to the ConnectionManager level. This is
another step towards the top-level TransportService api being async.
Prior to #35441 `ConnectionManager` had a `Lifecycle` object to support
the ping runnable. After that commit, the connection amanger only needs
the existing `AtomicBoolean` to indicate if it is running.
New that we test with min_doc_count set to 0 as well, we may end up generating a lot more buckets. This commit adjusts the min bound and max bound, as well as the offset for each randomly generated agg instance so that we don't end up hitting the 10.000 max buckets limit.
Relates to #36064
In this test we were randomizing different values but minDocCount was hardcoded to 1. It's important to test other values, especially `0` as it's the default. The test needed some adapting in the way buckets are randomly generated: all aggs need to share the same interval, minDocCount and emptyBucketInfo. Also assertions need to take into account that more (or less) buckets are expected depending on minDocCount.
CompositeBytesReference#slice has two bugs:
- One that makes it fail if the reference is empty and an empty slice is
created, this is #35950 and is fixed by special-casing empty-slices.
- One performance bug that makes it always create a composite slice when
creating a slice that ends on a boundary, this is fixed by computing `limit`
as the index of the sub reference that holds the last element rather than
the next element after the slice.
Closes#35950
org.elasticsearch.rest.RestController#hasContentType checks to see if the
RestHandler supports the `application/x-ndjson` Content-Type. DeprecationRestHandler
is a wrapper around the real RestHandler, and prior to this change
would always return `false` due to the interface's default supportsContentStream().
This prevents API's that use multi-line JSON from properly being deprecated
resulting in an HTTP 406 error.
This change ensures that the DeprecationRestHandler honors the
supportsContentStream() of the wrapped RestHandler.
Relates to #35958
The support for rest_total_hits_as_int has already been merged to 6x
in #35848 so this change adds this new option to master. The plan was
to add this new option as part of #35848 but we've decided to wait a few
days before merging this breaking change so this commit just handles
the new option as a noop exactly like 6x for now. This will allow
users to migrate to this parameter before #35848 is merged.
Relates #33028
This is related to #34405 and a follow-up to #34753. It makes a number
of changes to our current keepalive pings.
The ping interval configuration is moved to the ConnectionProfile.
The server channel now responds to pings. This makes the keepalive
pings bidirectional.
On the client-side, the pings can now be optimized away. What this
means is that if the channel has received a message or sent a message
since the last pinging round, the ping is not sent for this round.
The nested agg can defer the collection of children if it is nested
under another aggregation. In such case accessing the score in the children
aggregation throws an error because the scorer has already advanced to the next
parent. This change fixes this error by caching the score of the parent in the
nested aggregation. Children aggregations that work on nested documents will be
able to access the _score. Also note that the _score in this case is always the
parent's score, there is no way to retrieve the score of a nested docs in aggregations.
Closes#35985Closes#34555
* [Zen2] Implement Tombstone REST APIs
* Adds REST API for withdrawing votes and clearing vote withdrawls
* Tests added to Netty4 module since we need a real Network impl. for Http endpoints
Today the default for USE_ZEN2 is false and it is overridden in many places. By
defaulting it to true we can be sure that the only places in which Zen2 does
not work are those in which it is explicitly set to false.
Today we sometimes create a setup in which the node is a quorum on its own,
which allows it to win a pre-voting round and schedule an election essentially
at will, causing it to discard all the joins it just received and fail the
test. This change excludes this case, preventing stray elections from ruining
things.
Today any node can win an election. However, the whole point of
master-eligibility is that master-ineligible nodes should not be elected as the
leader; furthermore master-ineligible nodes do not have any outgoing STATE
channels so cannot publish cluster states, so their leadership is ineffective
and disruptive.
This change ensures that the elected leader is master-eligible by preventing
master-ineligible nodes from scheduling an election.
A number of tokenfilters can produce multiple tokens at the same position. This
is a problem when using token chains to parse synonym files, as the SynonymMap
requires that there are no stacked tokens in its input.
This commit ensures that when used to parse synonyms, these tokenfilters either produce
a single version of their input token, or that they throw an error when mappings are
generated. In indexes created in elasticsearch 6.x deprecation warnings are emitted in place
of the error.
* asciifolding and cjk_bigram produce only the folded or bigrammed token
* decompounders, synonyms and keyword_repeat are skipped
* n-grams, word-delimiter-filter, multiplexer, fingerprint and phonetic throw errors
Fixes#34298
The ActiveShardCount is used by cluster state observers to wait for a
given number of shards to be active before returning to the caller. The
current implementation does not work when an index is closed while an
observer is waiting on shards to be active. In this case, a NPE is thrown
and the observer is never notified that the shards won't become active.
This commit fixes the ActiveShardCount.enoughShardsActive() so that it
does not fail when an index is closed, similarly to what is done when an
index is deleted.
In `InternalHistogramTests` we were randomizing different values but `minDocCount` was hardcoded to `1`. It's important to test other values, especially `0` as it's the default. To make this possible, the test needed some adapting in the way buckets are randomly generated: all aggs need to share the same `interval`, `minDocCount` and `emptyBucketInfo`. Also assertions need to take into account that more (or less) buckets are expected depending on `minDocCount`.
This was originated by #35921 and its need to test adding empty buckets as part of the reduce phase.
Also relates to #26856 as one more key comparison needed to use `Double.compare` to properly handle `NaN` values, which was triggered by the increased test coverage.
Right now using the `GET /_tasks/<taskid>` API and causing a task to opt
in to saving its result after being completed requires permissions on
the `.tasks` index. When we built this we thought that that was fine,
but we've since moved towards not leaking details like "persisting task
results after the task is completed is done by saving them into an index
named `.tasks`." A more modern way of doing this would be to save the
tasks into the index "under the hood" and to have APIs to manage the
saved tasks. This is the first step down that road: it drops the
requirement to have permissions to interact with the `.tasks` index when
fetching task statuses and when persisting statuses beyond the lifetime
of the task.
In particular, this moves the concept of the "origin" of an action into
a more prominent place in the Elasticsearch server. The origin of an
action is ignored by the server, but the security plugin uses the origin
to make requests on behalf of a user in such a way that the user need
not have permissions to perform these actions. It *can* be made to be
fairly precise. More specifically, we can create an internal user just
for the tasks API that just has permission to interact with the `.tasks`
index. This change doesn't do that, instead, it uses the ubiquitus
"xpack" user which has most permissions because it is simpler. Adding
the tasks user is something I'd like to get to in a follow up change.
Instead, the majority of this change is about moving the "origin"
concept from the security portion of x-pack into the server. This should
allow any code to use the origin. To keep the change managable I've also
opted to deprecate rather than remove the "origin" helpers in the
security code. Removing them is almost entirely mechanical and I'd like
to that in a follow up as well.
Relates to #35573
Currently when a Fuzziness instance with custom AUTO distance values gets
written to XContent, the customized lower and upper distance values are ommited
and can consequently not be parsed back. This changes this to write the String
including the optional custom values when writing to XContent and fixes the
tests that should have caught this in the first place, e.g. by adding the custom
low and high distance values to the equality check.
This change removes the deprecated useDisMax() and useAllFields() methods from
the QueryStringQueryBuilder and related tests. The disMax parameter has already
been a no-op since 6.0 and also the useAllFields has been deprecated since 6.0
and there is a direct replacement via defaultField.
`ScriptDocValues#getValues` was added for backwards compatibility but no
longer needed. Scripts using the syntax `doc['foo'].values` when
`doc['foo']` is a list should be using `doc['foo']` instead.
Closes#22919
This PR adds deprecation warnings to the relevant `Rest*Action` classes, plus tests in `Rest*ActionTests`. No updates to REST tests, the Java HLRC, or documentation were necessary, since they didn't make use of types.
MultiSearchRequests issues through `_msearch` now validate all keys
in the metadata section. Previously unknown keys were ignored
while now an exception is thrown.
Closes#35869
This commit removes the dedicated `setSoLinger` method. This simplifies
the `TcpChannel` interface. This method has very little effect as the
SO_LINGER is not set prior to the channels being closed in the abstract
transport test case. We still will set SO_LINGER on the
`MockNioTransport`. However we can do this manually.
This pull request makes the `RestGetSourceAction` return a `ResourceNotFoundException` with a proper JSON response when source or document itself is missing (see issue #33384).
Here is below a sample JSON output:
```
{
"error": {
"root_cause": [
{
"type": "resource_not_found_exception",
"reason": "Source not found [index1]/[_doc]/[1]"
}
],
"type": "resource_not_found_exception",
"reason": "Source not found [index1]/[_doc]/[1]"
},
"status": 404
}
```
Today GatewayMetaState is capable of atomically storing MetaData to
disk. We've also moved fields that are needed to be persisted in Zen2
from ClusterState to ClusterState.MetaData.CoordinationMetaData.
This commit implements PersistedState interface.
version and currentTerm are persisted as a part of Manifest.
GatewayMetaState now implements both ClusterStateApplier and
PersistedState interfaces. We started with two descendants
Zen1GatewayMetaState and Zen2GatewayMetaState, but it turned
out to be not easy to glue it.
GatewayMetaState now constructs previousClusterState (including
MetaData) and previousManifest inside the constructor so that all
PersistedState methods are usable as soon as GatewayMetaState
instance is constructed. Also, loadMetaData is renamed to
getMetaData, because it just returns
previousClusterState.metaData().
Sadly, we don't have access to localNode (obtained from
TransportService in the constructor, so getLastAcceptedState
should be called, after setLocalNode method is invoked.
Currently, when deciding whether to write IndexMetaData to disk,
we're comparing current IndexMetaData version and received
IndexMetaData version. This is not safe in Zen2 if the term has changed.
So updateClusterState now accepts incremental write
method parameter. When it's set to false, we always write
IndexMetaData to disk.
Things that are not covered by GatewayMetaStateTests are covered
by GatewayMetaStatePersistedStateTests.
This commit also adds an option to use GatewayMetaState instead of
InMemoryPersistedState in TestZenDiscovery. However, by default
InMemoryPersistedState is used and only one test in PersistedStateIT
used GatewayMetaState. In order to use it for other tests, proper
state recovery should be implemented.
This removes the option to run a cluster without enforcing the
cluster-wide shard limit, making strict enforcement the default and only
behavior. The limit can still be adjusted as desired using the cluster
settings API.
This commit adds the support for exists query in the sorted execution mode
of the `composite` aggregation. We'll execute the aggregation from the sorted
points and use early termination if the main query is an `exists` query over the
first source of the `composite` aggregation.
Today we don't respect the indices options when they are passed
as request parameters to the `_msearch` endpoint. This is unintuitive
and doesn't cause any errors. This changes uses the top-level indices
options as the defaults for each sub search-request.
Closes#35851
More like this query allows to provide identifiers of documents to be retrieved as like/unlike items.
It can happen that at retrieval time an error is thrown, for instance caused by missing routing value when `_routing` is set required in the mapping.
Instead of ignoring such error and returning no documents for the query, the error should be re-thrown and returned to users. As part of this
change also mget and mtermvectors are unified in the way they throw such exception like it happens in other places, so that a `RoutingMissingException` is raised.
Closes#29678
This commit simplifies the throttling logic in InitialSearchPhase and removes some asserts from it. Also, a few formatting changes are applied to its code and surrounding classes.
A publication can succeed and complete before all nodes have applied the
published state and acknowledged it, thanks to the publication timeout; however
we need every node eventually either to apply the published state (or a later
state) or be removed from the cluster. This change introduces the LagDetector
which achieves this liveness property by removing any lagging nodes from the
cluster.
The `wait_for_metadata_version` parameter will instruct the cluster state
api to only return a cluster state until the metadata's version is equal or
greater than the version specified in `wait_for_metadata_version`. If
the specified `wait_for_timeout` has expired then a timed out response
is returned. (a response with no cluster state and wait for timed out flag set to true)
In the case metadata's version is equal or higher than `wait_for_metadata_version`
then the api will immediately return.
This feature is useful to avoid external components from constantly
polling the cluster state to whether somethings have changed in the
cluster state's metadata.
Code that operates on-top of the engine requires all readers returned to be
unwrapped into ElasticsearchDirectoryReader. The special reader
the FrozenEngine uses wasn't wrapped.
Today voting tombstones are stored in CoordinationMetaData as
Set<DiscoveryNode>.
DiscoveryNode is not a lightweight object and have a lot of fields.
It also has toXContent method, but no fromXContent method and the
output of toXContent is not enough to re-create DiscoveryNode
object.
And votingTombstone set should be persisted as a part of MetaData.
On the other hand, the only thing required from the tombstone is the
nodeId.
This PR adds VotingTombstone class for voting tombstones, which
consists of two fields for now - nodeId and nodeName. It could be
extended/shrank in the future if needed.
This PR also resolves TODO's related to the voting tombstones xcontent
story.
Example of CoordinationMetaData.toXContent with voting tombstones:
{
"term": 1,
"last_committed_config": [
"fkwLdOBvXSlgRTBfgNAL",
"tmQiPGHvUxXzPkkCDSJo",
"HhOmtQBZAThpHIGWhxpz",
"qZHWGpoDNPYRNIiqKsDl"
],
"last_accepted_config": [
"lhqacKmriwhHGFZcvqbx",
"MYysmBuROkvJRlDcusyd"
],
"voting_tombstones": [
{
"node_id": "McjbZbRkEz",
"node_name": "pdKIWeNJUO"
},
{
"node_id": "cpXkVibGwo",
"node_name": "UnCvFgdVsc"
},
{
"node_id": "EylRNOztbc",
"node_name": "ohOhkbMWZX"
}
]
}
Today when rolling a transog generation we copy the checkpoint from
`translog.ckp` to `translog-nnnn.ckp` using a simple `Files.copy()` followed by
appropriate `fsync()` calls. The copy operation is not atomic, so if we crash
at the wrong moment we can leave an incomplete checkpoint file on disk. In
practice the checkpoint is so small that it's either empty or fully written.
However, we do not correctly handle the case where it's empty when the node
restarts.
In contrast, in `recoverFromFiles()` we _do_ copy the checkpoint atomically.
This commit extracts the atomic copy operation from `recoverFromFiles()` and
re-uses it in `rollGeneration()`.
This pull request exposes two new methods in the IndexShard and
TransportReplicationAction classes in order to allow transport replication
actions to acquire all index shard operation permits for their execution.
It first adds the acquireAllPrimaryOperationPermits() and the
acquireAllReplicaOperationsPermits() methods to the IndexShard class
which allow to acquire all operations permits on a shard while exposing
a Releasable. It also refactors the TransportReplicationAction class to
expose two protected methods (acquirePrimaryOperationPermit() and
acquireReplicaOperationPermit()) that can be overridden when a transport
replication action requires the acquisition of all permits on primary and/or
replica shard during execution.
Finally, it adds a TransportReplicationAllPermitsAcquisitionTests which
illustrates how a transport replication action can grab all permits before
adding a cluster block in the cluster state, making subsequent operations
that requires a single permit to fail).
Related to elastic #33888
* Forbid negative scores in functon_score query
- Throw an exception when scores are negative in field_value_factor
function
- Throw an exception when scores are negative in script_score
function
Relates to #33309
After #35332 has been merged, we noticed some test failures like #35597
in which one or more replica shards failed to be promoted as primaries
because the primary replica re-synchronization never succeed.
After some digging it appeared that the execution of the resync action was
blocked because of the presence of a global cluster block in the cluster state
(in this case, the "no master" block), making the resync action to fail when
executed on the primary.
Until #35332 such failures never happened because the
TransportResyncReplicationAction is skipping the reroute phase, the only
place where blocks were checked. Now with #35332 blocks are checked
during reroute and also during the execution of the transport replication
action on the primary. After some internal discussion, we decided that the TransportResyncReplicationAction should never be blocked. This action is
part of the replica to primary promotion and makes sure that replicas are in
sync and should not be blocked when the cluster state has no master or
when the index is read only.
This commit changes the TransportResyncReplicationAction to make obvious
that it does not honor blocks. It also adds a simple test that fails if the resync
action is blocked during the primary action execution.
Closes#35597
`testIncompatibleDiffResendsFullState` sometimes makes a 2-node cluster and
then partitions one of the nodes from the leader, which makes the leader stand
down. Then when the partition is removed the cluster re-forms but does so by
sending full cluster states, not diffs, causing the test to fail.
Additionally `testDiffBasedPublishing` sometimes fails if a publication is
delivered out-of-order, wiping out a fresher last-received cluster state with a
less-fresh one. This is fixed here by passing the received cluster state to the
coordinator before recording it as the last-received one, relying on the
coordinator's freshness checks.
Adds an XContent sub parser class that can to wrap another
XContent parser at the beginning of an object and allow skiping
all children in case of the parsing failure. It also uses this
subparser to ignore the rest of the GeoJson shape if the
parsing fails and we need to ignore the geoshape due to the
ignore_malformed flag.
Supersedes #34498Closes#34047
* [GEO] Add support to ShapeBuilders for building Lucene geometry
This commit adds support for building lucene geometry from the ShapeBuilders.
This is needed for integrating LatLonShape as the primary indexing approach
for geo_shape field types. All unit and integration tests are updated to
add randomization for testing both jts/s4j shapes and lucene shapes.
This test is failing sometimes with Zen2 due to the lack of lag detection.
Zen1 does not have this problem as it only considers a join as valid if the
corresponding cluster state update is successfully published and committed
on the joining node.
Today we have a way to atomically persist global MetaData and
IndexMetaData to disk when new ClusterState is received. All other
ClusterState fields are not persisted.
However, there are other parts of ClusterState that should be
persisted, namely:
version
term
lastCommittedConfiguration
lastAcceptedConfiguration
votingTombstones
version is changed frequently, other fields are not. We decided
to group term, lastCommittedConfiguration,
lastAcceptedConfiguration and votingTombstones into
CoordinationMetaData class and make CoordinationMetaData a field
inside MetaData.
MetaData.toXContent and MetaData.fromXContent should take care of
CoordinationMetaData.
version stays as a top level field in ClusterState and will be
persisted as part of Manifest in a follow-up commit.
Also MetaData.isGlobalStateEquals should be extended to include
coordinationMetaData in comparison.
This commit favors exposing getters, such as getTerm directly in
ClusterState to avoid massive code changes.
An example of CoordinationMetaState.toXContent:
{
"term": 1,
"last_committed_config": [
"TiIuBcbBtpuXyDDVHXeD",
"ZIAoVbkjjLPLUuYLaTkw"
],
"last_accepted_config": [
"OwkXbXZNOZPJqccdFHdz",
"LouzsGYwmQzpeQMrboZe",
"fCKGRZdjLTqzXAqPUtGL",
"pLoxshjpJXwDhbgjfYJy",
"SjINLwFIlIEFZCbjrSFo",
"MDkVncJEVyZLJktopWje"
]
}
- Moves disruption tests to Zen2
- Registers a few missing settings
- Removes .put(TestZenDiscovery.USE_ZEN2.getKey(), true) from tests where Zen2 is now enabled
by default through the parent test class
- Moves QuorumGatewayIT back to Zen1, as it is not stable with Zen2 as it currently relies on
dangling indices due to the lack of proper CS persistence, which triggers secondary failures
Queries across multiple fields generate MatchNoDocsQuerys for fields that are
unmapped. In certain situation this can lead to erroneous behaviour,
for example when an umapped field is used in a query_string query across
several fields. If some of the tokens in the query string get eliminated by an
analyzer on the mapped fields, the same token will currently generate
MatchNoDocsQuerys combined into a disjunction, which in turn
leads to no matches in the overall query. Instead we should simply not add
MatchNoDocsQuerys to those disjunctions.
Closes#34708
This parameter in the `query_string` query was deprecated in 6.0 and ignored
since then. Its API methods and remaining uses can be removed in the upcoming
major version.
Relates to #35734
This commit adds a rest endpoint for freezing and unfreezing an index.
Among other cleanups mainly fixing an issue accessing package private APIs
from a plugin that got caught by integration tests this change also adds
documentation for frozen indices.
Note: frozen indices are marked as `beta` and available as a basic feature.
Relates to #34352
Zen2 is now feature-complete enough to run most ESIntegTestCase tests. The changes in this PR
are as follows:
- ClusterSettingsIT is adapted to not be Zen1 specific anymore (it was using Zen1 settings).
- Some of the integration tests require persistent storage of the cluster state, which is not fully
implemented yet (see #33958). These tests keep running with Zen1 for now but will be switched
over as soon as that is fully implemented.
- Some very few integration tests are not running yet with Zen2 for other reasons, depending on
some of the other open points in #32006.
Elasticsearch node is responsible for storing cluster metadata.
There are 2 types of metadata: global metadata and index metadata.
`GatewayMetaState` implements `ClusterStateApplier` and receives all
`ClusterStateChanged` events and is responsible for storing modified
metadata to disk.
When new `ClusterStateChanged` event is received, `GatewayMetaState`
checks if global metadata has changed and if it's the case writes new
global metadata to disk. After that `GatewayMetaState` checks if index
metadata has changed or there are new indices assigned to this node and
if it's the case writes new index metadata to disk. Atomicity of global
metadata and index metadata writes is ensured by `MetaDataStateFormat`
class.
Unfortunately, there is no atomicity when more than one metadata changes
(global and index, or metadata for two indices). And atomicity is
important for Zen2 correctness.
This commit adds atomicity by adding a notion of manifest file,
represented by `MetaState` class. `MetaState` contains pointers to
current metadata.
More precisely, it stores global state generation as long and map from
`Index` to index metadata generation as long. Atomicity of writes for
manifest file is ensured by `MetaStateFormat` class.
The algorithm of writing changes to the disk would be the following:
1. Write global metadata state file to disk and remember
it's generation.
2. For each new/changed index write state file to disk and remember
it's generation. For each not-changed index use generation from
previous manifest file. If index is removed or this node is no longer
responsible for this index - forget about the index.
3. Create `MetaState` object using previously remembered generations and
write it to disk.
4. Remove old state files for global metadata, indices metadata and
manifest.
Additonally new implementation relies on enhanced `MetaDataStateFormat`
failure semantics, `applyClusterState` throws IOException, whose
descendant `WriteStateException` could be (and should be in Zen2)
explicitly handled.
The list of official plugins accidentally included `qa` projects like,
well, `qa` and `amazon-ec2`. This changes the mechanism that we use to
build the list and adds a test to catch this.
Closes#35623
Randomize test assertion and test set size instead of asserting on an
exhaustive list of dates with fixed test set size. Also refactor common
objects used to avoid recreating them, avoid date to string conversion
and reduce duplicate test code
Closes#33181
Removed extending of AbstractComponent and changed logger usage to
explicit declaration. Abstract classes still have logger
declaration using this.getClass() in order to show implementation class
name in its logs.
See #34488
* Deprecate types in count requests.
* Move RestCountAction to the 'search' package.
* Deprecate types in multi search requests.
* Add tests for types deprecation in the _search endpoint.
This change fixes#35351. Users were no longer able to return types of numbers other than doubles for bucket aggregation scripts. This change reverts to the previous behavior of being able to return any type of number and having it converted to a double outside of the script.
This inserts newlines in order to reduce line lengths in the
o.e.action.admin.cluster package to 140 characters or less. This
also remves the checkstyle suppressions for affected files.
Relates #34884, #34923
The javadocs of the CharSequence interface state that not all of its
implementations define the general contracts of the Object#equals and
Object#hashCode methods, therefore it is dangerous to use different CharSequence
instances as elements in a set or as keys in a map. While we probably mostly use
Strings in sets, in some places this is not enforced. To prevent this from
accidentally happening, this change replaces all occurances of Set<CharSequence>
which are currently mostly used in the completion suggester code with the more
concrete usage of Set<String>.
This changes the test to not use a `CountDownlatch`, instead adding an assertion
for the final logging message and waiting until the `MockAppender` has seen it
before proceeding.
Resolves#23739
Today, the bootstrapping of a Zen2 cluster is driven externally, requiring
something else to wait for discovery to converge and then to inject the initial
configuration. This is hard to use in some situations, such as REST tests.
This change introduces the `ClusterBootstrapService` which brings the bootstrap
retry logic within each node and allows it to be controlled via an (unsafe)
node setting.
The `composite` aggregation can optimize its execution when the query
is a `match_all` or a `range` over the field that is used in the first source
of the aggregation. However we only check for instances of `PointRangeQuery` whereas
the range query builder creates an `IndexOrDocValuesQuery`. This means that
today the optimization does not apply to `range` query even if the code could handle it.
This change fixes this issue by extracting the index query inside `IndexOrDocValuesQuery`.
In #23175 we renamed `ThreadPool$EstimatedTimeThread` to
`ThreadPool$CachedTimeThread` but did not update the corresponding entry in
`HotThreads#isIdleThread`. This commit addresses this.
This pull request replaces some blocks of code that must be run once
and that are currently based on AtomicBoolean by the convient RunOnce
class added in #35489.
Today, the TransportReplicationAction checks the global level blocks and
the index level blocks before routing the operation to the primary, in the
ReroutePhase, and it happens at the very beginning of the transport
replication action execution. For the upcoming rework of the Close Index
API and in order to deal with primary relocation, we'll need to also check
for blocks before executing the operation on the primary (while holding a
permit) but before routing to the new primary.
This pull request change the AsyncPrimaryAction so that it checks for
replication action's blocks before executing the operation locally or before
routing the primary action to the newly primary shard. The check is done
while holding a PrimaryShardReference.
Related to #33888
The way ScoreAccessor implements `compareTo()` is problematic because it doesn't
completely follow the Comparable contract, specificaly symmetry (if x is a
ScoreAccessor and y any Number then x.comparTo(y) works, but y.compareTo(x)
generally does not even compile). Fortunately we don't seem to use the fact that
ScoreAccessor is a Comparable anywhere, so we can simply remove it.
Today the `PeerFinder` probes each address it obtains, identifies the node to
which it just connected, and then returns all such nodes. However, this can
lead to duplicates if a node manages to connect to another node via two
distinct addresses. This causes bootstrapping to fail since
`BootstrapConfiguration#resolve` forbids duplicates.
This change alters the behaviour of the `PeerFinder` to remove duplicates in
this situation.
If shutting down half or more of the master-eligible nodes, their votes must
first be explicitly withdrawn to ensure that the cluster doesn't lose its
quorum. This works via _voting tombstones_, stored in the cluster state, which
tell the reconfigurator to remove nodes from the voting configuration.
This change introduces voting tombstones to the cluster state, together with
transport APIs for adding and removing them, and makes use of these APIs in
`InternalTestCluster` to support tests which remove at least half of the
master-eligible nodes at once (e.g. shrinking from two master-eligible nodes to
one).
AbstractComponent was deprecated in #35140 and is looking like it will be
removed at some point by #34888. Today all it does is provide a logger. This
change removes the usages of AbstractComponent that live solely in the zen2
feature branch to avoid some future merge pain, and replaces it where necessary
with some directly-created loggers.
This change adds a special caching reader that caches all relevant
values for a range query to rewrite correctly in a can_match phase
without actually opening the underlying directory reader. This
allows frozen indices to be filtered with can_match and in-turn
searched with wildcards in a efficient way since it allows us to
exclude shards that won't match based on their date-ranges without
opening their directory readers.
Relates to #34352
Depends on #34357
The ParsedReverseNested implementation should implement the ReverseNested
interface and not the Nested interface. Although this is an empty marker
interface it is confusing and can lead to casting errors. Also adding a test to
check that both ParsedNested and ParsedReverseNested implement the correct
interface.
Closes#35449
Some very old ancient versions of Linux do not have /etc/os-release. For
example, old Red Hat-like OS. This commit adds a fallback for handling
pretty name for these OS.
This is a follow up to #35357. That commit failed to register the new
cluster.remote.cluster_name.transport.compress setting with
`ClusterSettings`. This commit fixes that.
Some OS (e.g., Oracle Linux Server 6.9) have a trailing space at the end
of the PRETTY_NAME line in /etc/os-release. This commit addresses this
by accounting for this trailing space when extracting the pretty name.
Implements serialization compatibility between Zen1 and Zen2 transport action, allowing a Zen1 node to join a fully formed Zen2 cluster and vice-versa.
The MockTcpTransport is not friendly in regards to memory usage. It must
allocate multiple byte arrays for every message. This improves the
memory situation by failing fast if the message is improperly formatted.
Additionally, it uses reusable big arrays for at least half of the
allocated byte arrays.
This change adds a logger for the query and fetch phases that prints all requests
before their execution at the trace level. This will help debugging cases where an issue
occurs during the execution since only completed queries are logged by the slow logs.
This is related to #34483. It introduces a namespaced setting for
compression that allows users to configure compression on a per remote
cluster basis. The transport.tcp.compress remains as a fallback
setting. If transport.tcp.compress is set to true, then all requests
and responses are compressed. If it is set to false, only requests to
clusters based on the cluster.remote.cluster_name.transport.compress
setting are compressed. However, after this change regardless of any
local settings, responses will be compressed if the request that is
received was compressed.
Today our OS information returned in node stats only returns a
high-level name of the OS (e.g., "Linux"). Yet, for some uses this is
too high-level and knowing at a finer level of granularity the
underlying OS can be useful. This commit extracts the pretty name on
Linux from /etc/os-release. This pretty name usually includes the Linux
vendor and the Linux vendor version number (e.g., Fedora 28).
Enables diff-based publishing, which is an optimization where only the changing parts of the cluster
state are published to the nodes in the cluster, falling back to full cluster state publishing if the
receiver does not have the previous cluster state.