This is important to allow any test to use RandomQueryBuilder#createQuery()
since some of the query builders that are used in this test test the length
of the types array and otherwise will thow NPE if the test is not a subclass
of AbstractQueryTestCase.
Today when logging an unknown or invalid setting, the log message does
not contain the source. This means that if we are archiving such a
setting, we do not specify where the setting is from (an index, and
which index, or a persistent or transient cluster setting). This commit
provides such logging for the end user can better understand the
consequences of the unknown or invalid setting.
Relates #20951
This commit fixes an issue with the configuration for the AwsSdkMetrics
logger; the issue is that the logging configuration had used underscores
instead of periods for the settings key (the perils of lenient settings
parsing).
Relates #20313
Now index and delete methods in index shard share code for
indexing stats. This commit collapses seperate methods for
index and delete operations into a generic execute method
for performing engine write operations. As an added benefit,
this commit cleans up the interface for indexing operation
listener making it more simple and concise to use.
Currently, we treat all write operation exceptions as equals, but in reality
every write operation can cause either an environment failure (i.e. a failure
that should fail the engine e.g. data corruption, lucene tragic events) or
operation failure (i.e. a failure that is transient w.r.t the operation e.g.
parsing exception).
This change bubbles up enironment failures from the engine, after failing the
engine but captures transient operation failures as part of the operation
to be processed appopriately at the transport level.
Today we don't parse alias filters on the coordinating node, we only forward
the alias patters to executing node and resolve it late. This has several problems
like requests that go through filtered aliases are never cached if they use date math,
since the parsing happens very late in the process even without rewriting. It also used
to be processed on every shard while we can only do it once per index on the coordinating node.
Another nice side-effect is that we are never prone to cluster-state updates that change an alias,
all nodes will execute the exact same alias filter since they are process based on the same
cluster state.
Today Elasticsearch limits the number of processors used in computing
thread counts to 32. This was from a time when Elasticsearch created
more threads than it does now and users would run into out of memory
errors. It appears the real cause of these out of memory errors was not
well understood (it's often due to ulimit settings) and so users were
left hitting these out of memory errors on boxes with high core
counts. Today Elasticsearch creates less threads (but still a lot) and
we have a bootstrap check in place to ensure that the relevant ulimit is
not too low.
There are some caveats still to having too many concurrent indexing
threads as it can lead to too many little segments, and it's not a
magical go faster knob if indexing is already bottlenecked by disk, but
this limitation is artificial and surprising to users and so it should
be removed.
This commit also increases the lower bound of the max processes ulimit,
to prepare for a world where Elasticsearch instances might be running
with more the previous cap of 32 processors. With the current settings,
Elasticsearch wants to create roughly 576 + 25 * p / 2 threads, where p
is the number of processors. Add in roughly 7 * p / 8 threads for the GC
threads and a fudge factor, and 4096 should cover us pretty well up to
256 cores.
Relates #20874
Some people apparently never run tests when they change this file.
Neither do they read comments right below the line they change that
they should do the change after all.
It is important that folks understand that snapshot/restore isn't
for archiving. It is appropriate for backup and disaster recovery
but not for archival over long periods of time because of version
incompatibility.
Closes#20866
`AbstractSearchAsyncAction` has only been tested in integration tests.
The infrastructure is rather critical and should be tested on a unit-test
level. This change takes the first step.
This changes the CacheBuilder methods that are used to set expiration times to accept a
TimeValue instead of long. Accepting a long can lead to issues where the incorrect value is
passed in as the time unit is not clearly identified. By using TimeValue the caller no longer
needs to worry about the time unit used by the cache or builder.
Before this change the `MultiMatchQuery` called the field types
`termQuery()` with a null context. This is not correct so this change
fixes this so the `MultiMatchQuery` now uses the `ShardQueryContext` it
stores as a field.
Relates to https://github.com/elastic/elasticsearch/pull/20796#pullrequestreview-3606305
Both netty3 and netty4 http implementation printed the default
toString representation of PortRange if ports couldn't be bound.
This commit adds a better default toString method to PortRange and
uses the string representation for the error message in the http
implementations.
The test testDataFileCorruptionDuringRestore expects failures to happen when accessing snapshot data. It would sometimes
fail however as MockRepository (by default) only simulates 100 failures.
Sequence number related data (maximum sequence number, local checkpoint,
and global checkpoint) gets stored in Lucene on each commit. The logical
place to store this data is on each Lucene commit's user commit data
structure (see IndexWriter#setCommitData and the new version
IndexWriter#setLiveCommitData). However, previously we did not store the
maximum sequence number in the commit data because the commit data got
copied over before the Lucene IndexWriter flushed the documents to segments
in the commit. This means that between the time that the commit data was
set on the IndexWriter and the time that the IndexWriter completes the commit,
documents with higher sequence numbers could have entered the commit.
Hence, we would use FieldStats on the _seq_no field in the documents to get
the maximum sequence number value, but this suffers the drawback that if the
last sequence number in the commit corresponded to a delete document action,
that sequence number would not show up in FieldStats as there would be no
corresponding document in Lucene.
In Lucene 6.2, the commit data was changed to take an Iterable interface, so
that the commit data can be calculated and retrieved *after* all documents
have been flushed, while the commit data itself is being set on the Lucene commit.
This commit changes max_seq_no so it is stored in the commit data instead of
being calculated from FieldStats, taking advantage of the deferred calculation
of the max_seq_no through passing an Iterable that dynamically sets the iterator
data.
* improvements to iterating over commit data (and better safety guarantees)
* Adds sequence number and checkpoint testing for document deletion
intertwined with document indexing.
* improve test code slightly
* Remove caching of max_seq_no in commit data iterator and inline logging
* Adds a test for concurrently indexing and committing segments
to Lucene, ensuring the sequence number related commit data
in each Lucene commit point matches the invariants of
localCheckpoint <= highest sequence number in commit <= maxSeqNo
* fix comments
* addresses code review
* adds clarification on checking commit data on recovery from translog
* remove unneeded method
Sometimes it's useful / needed to use unreleased Version constants but we should not add those to the Version.java class for several reasons ie. BWC tests and assertions along those lines. Yet, it's not really obvious how to do that so I added some comments and a simple test for this.
There was an issue with using fuzziness parameter in multi_match query that has
been reported in #18710 and was fixed in Lucene 6.2 that is now used on master.
In order to verify that fix and close the original issue this PR adds the test
from that issue as an integration test.