Today we always add no-ops to translog regardless of its origin, thus a
noop may appear in the translog multiple times. This is not a big deal
as noops are small and rare to appear.
This commit ensures to add a noop to translog only if its origin is not
from local translog. This restriction has been applied for index and
delete.
This metric previously existed for backwards compatibility reasons
although the suggest stats were folded into search stats. This metric
was deprecated in 6.3.0 and this commit removes them for 7.0.0.
This commit fixes two issues with the byte size value equals/hash code
test.
The first problem is due to a test failure when the original instance is
zero bytes and we pick the mutation branch where we preserve the size
but change the unit. The mutation should result in a different byte size
value but changing the unit on zero bytes still leaves us with zero
bytes.
During the course of fixing this test I discovered another problem. When
we need to randomize size, we could randomly select a size that would
lead to an overflow of Long.MAX_VALUE.
This commit fixes both of these issues.
This commit adds the distribution type to the startup scripts so that we
can discern from log output and the main response the type of the
distribution (deb/rpm/tar/zip).
This commit adds the distribution flavor (default versus oss) to the
build process which is passed through the startup scripts to
Elasticsearch. This change will be used to customize the message on
attempting to install/remove x-pack based on the distribution flavor.
This commit makes x-pack a module and adds it to the default
distrubtion. It also creates distributions for zip, tar, deb and rpm
which contain only oss code.
Adds a check in BlobstoreRepository.snapshot(...) that prevents duplicate snapshot names and fails
the snapshot before writing out the new index file. This ensures that you cannot end up in this
situation where the index file has duplicate names and cannot be read anymore .
Relates to #28906
The suggest stats were folded into the search stats as part of the
indices stats API in 5.0.0. However, the suggest metric remained as a
synonym for the search metric for BWC reasons. This commit deprecates
usage of the suggest metric on the indices stats API.
Similarly, due to the changes to fold the suggest stats into the search
stats, requesting the suggest index metric on the indices metric on the
nodes stats API has produced an empty object as the response since
5.0.0. This commit deprecates this index metric on the indices metric on
the nodes stats API.
This commit implements the ability to remove values from a Cache using
the values iterator. This brings the values iterator in line with the
keys iterator and adds support for removing items in the cache that are
not easily found by the key used for the cache.
Previously we did not put an indexing to a version map if that map does
not require safe access but removed the existing delete tombstone only
if assertion enabled. In #29585, we removed the side-effect caused by
assertion then this test started failing. This failure can be explained
as follows:
- Step 1: Index a doc then delete that doc
- Step 2: The version map can switch to unsafe mode because of
concurrent refreshes (implicitly called by flushes)
- Step 3: Index a document - the version map won't add this version
value and won't prune the tombstone (previously it did)
- Step 4: Delete a document - this will return NOT_FOUND instead of
DELETED because of the stale delete tombstone
This failure is actually fixed by #29619 in which we never leave stale
delete tombstones
Closes#29626
Today the VersionMap does not clean up a stale delete tombstone if it
does not require safe access. However, in a very rare situation due to
concurrent refreshes, the safe-access flag may be flipped over then an
engine accidentally consult that stale delete tombstone.
This commit ensures to never leave stale delete tombstones in a version
map by always pruning delete tombstones when putting a new index entry
regardless of the value of the safe-access flag.
This commit remove serializing of common stats flags via its enum
ordinal and uses an explicit index defined on the enum. This is to
enable us to remove an unused flag (Suggest) without ruining the
ordering and thus breaking serialization.
We removed catched throwable from the code base and left behind was a
comment about catching InternalError in MemoryManagementMXBean. We are
not going to catch InternalError here as we expect that to be
fatal. This commit removes that stale comment.
The name of the bulk thread pool was renamed to "write" with "bulk" as a
fallback name. This change was made in 6.x for BWC reasons yet in 7.0.0
we are removing this fallback. This commit removes this fallback for the
write thread pool.
Today when a version map does not require safe access, we will skip that
document. However, if the assertion is enabled, we remove the delete
tombstone of that document if existed. This side-effect may accidentally
hide bugs in which stale delete tombstone can be accessed.
This change ensures putAssertionMap not modify the tombstone maps.
The camel case name `htmlStip` should be removed in favour of `html_strip`, but
we need to deprecate it first. This change adds deprecation warnings for indices
with version starting with 6.3.0 and logs deprecation warnings in this cases.
This commit renames the bulk thread pool to the write thread pool. This
is to better reflect the fact that the underlying thread pool is used to
execute any document write request (single-document index/delete/update
requests, and bulk requests).
With this change, we add support for fallback settings
thread_pool.bulk.* which will be supported until 7.0.0.
We also add a system property so that the display name of the thread
pool remains as "bulk" if needed to avoid breaking users.
Now that single-document indexing requests are executed on the bulk
thread pool the index thread pool is no longer needed. This commit
removes this thread pool from Elasticsearch.
Binary doc values are retrieved during the DocValueFetchSubPhase through an instance of ScriptDocValues.
Since 6.0 ScriptDocValues instances are not allowed to reuse the object that they return (https://github.com/elastic/elasticsearch/issues/26775) but BinaryScriptDocValues doesn't follow this restriction and reuses instances of BytesRefBuilder among different documents.
This results in `field` values assigned to the wrong document in the response.
This commit fixes this issue by recreating the BytesRef for each value that needs to be returned.
Fixes#29565
When comparing doubles, fixed epsilons can fail because the absolute
difference in values may be quite large, even though the relative
difference is tiny (e.g. with two very large numbers).
Instead, we can scale epsilon by the absolute value of the expected
value. This means we are looking for a diff that is epsilon-percent
away from the value, rather than just epsilon.
This is basically checking the relative error using junit's assertEqual.
Closes#29456, unmutes the test
As part of adding support for new API to the high-level REST client,
we added support for the `flat_settings` parameter to some of our
request classes. We added documentation that such flag is only ever
read by the high-level REST client, but the truth is that it doesn't
do anything given that settings are always parsed back into a `Settings`
object, no matter whether they are returned in a flat format or not.
It was a mistake to add support for this flag in the context of the
high-level REST client, hence this commit removes it.
This refactors MapperService so that it wraps a single `DocumentMapper` rather
than a `Map<String, DocumentMapper>`. We will need follow-ups since I haven't
fixed most APIs that still expose collections of types of mappers, but this is
a start...
Today the translog of an engine is exposed and can be accessed directly.
While this exposure offers much flexibility, it also causes these troubles:
- Inconsistent behavior between translog method and engine method.
For example, rolling a translog generation via an engine also trims
unreferenced files, but translog's method does not.
- An engine does not get notified when critical errors happen in translog
as the access is direct.
This change isolates translog of an engine and enforces all accesses to
translog via the engine.
The index thread pool is no longer needed as its primary use-case for
single-document indexing requests has been relieved now that
single-document indexing requests are converted to bulk indexing
requests (with a single document payload).
We want to remove the index thread pool as it is no longer needed since
single-document indexing requests are executed as bulk requests
now. Analyze requests are also executed on the index thread pool though
and they need a thread pool to execute on. The bulk thread does not seem
like the right thread pool, let us keep that thread pool conceptually
for bulk requests and free for bulk requests. None of the existing
thread pools make sense for analyze requests either. The generic thread
pool would be a terrible choice since it has an unbounded queue and that
is a bad idea for user-facing APIs. This commit introduces a small by
default (size=1, queue_size=16) thread pool for analyze requests.
This commit add the `include_type_name` option to the `index`, `update`,
`delete`, `get`, `bulk` and `search` APIs. When set to `false`, the response
will omit the `_type` in the response. This option doesn't work if the endpoint
contains a type. For instance, the following call would succeed:
```
GET index/_doc/1?include_type_name=false
```
But the following one would fail:
```
GET index/some_type/1?include_type_name=false
```
Relates #15613
With the move long ago to execute all single-document indexing requests
as bulk indexing request, the method
PipelineExecutionService#executeIndexRequest is unused and will never be
used in production code. This commit removes this method and cuts over
all tests to use PipelineExecutionService#executeBulkRequest.
CRUD: Parsing changes for UpdateRequest (#29293)
Use `ObjectParser` to parse `UpdateRequest` so we reject unknown fields
and drop support for the `_fields` parameter because it was deprecated
in 5.x.
The default percentiles values and the default highlighter per- and
post-tags are currently publicly accessible and can be altered any time.
This change prevents this by restricting field access.
Today when reading an operation from the current generation fails
tragically we attempt to close the translog. However, by invoking close
before releasing the read lock we end up in self-deadlock because
closing tries to acquire the write lock and the read lock can not be
upgraded to a write lock. To avoid this, we move the close invocation
outside of the try-with-resources that acquired the read lock. As an
extra guard against this, we document the problem and add an assertion
that we are not trying to invoke close while holding the read lock.
This change adds a client that is connected to a remote cluster.
This allows plugins and internal structures to invoke actions on
remote clusters just like a if it's a local cluster. The remote
cluster must be configured via the cross cluster search infrastructure.
This adds 2 testcases that test if a shard goes idle
pending (uncommitted) segments are committed and unreferenced
files will be freed.
Relates to #29482
Control max size and count of warning headers
Add a static persistent cluster level setting
"http.max_warning_header_count" to control the maximum number of
warning headers in client HTTP responses.
Defaults to unbounded.
Add a static persistent cluster level setting
"http.max_warning_header_size" to control the maximum total size of
warning headers in client HTTP responses.
Defaults to unbounded.
With every warning header that exceeds these limits,
a message will be logged in the main ES log,
and any more warning headers for this response will be
ignored.
Unlike the `indices.create`, `indices.get_mapping` and `indices.put_mapping`
APIs, the index APIs do not need the `include_type_name` option, they can work
work with and without types withouth knowing whether types are being used.
Internally, `_doc` is used as a type if no type is provided, like for the
`indices.put_mapping` API.
This change adds the current primary term to the header of the current
translog file. Having a term in a translog header is a prerequisite step
that allows us to trim translog operations given the max valid seq# for
that term.
This commit also updates tests to conform the primary term invariant
which guarantees that all translog operations in a translog file have
its terms at most the term stored in the translog header.
This commit moves the `TimeValue` class into the elasticsearch-core project.
This allows us to use this class in many of our other projects without relying
on the entire `server` jar.
Relates to #28504
* Decouple TimeValue from Elasticsearch server classes
This commit decouples the `TimeValue` class from the other server classes. This
is in preperation to move `TimeValue` into the `elasticsearch-core` jar,
allowing us to use it from projects that cannot depend on the elasticsearch-core
library.
Relates to #28504
The skeleton of ElasticsearchMergePolicy is quite similar to
MergePolicyWrapper. This commit therefore makes ElasticsearchMergePolicy
inherited from MergePolicyWrapper instead of MergePolicy.
Currently, a flush stats contains only the total flush which is the sum
of manual flush (via API) and periodic flush (async triggered when the
uncommitted translog size is exceeded the flush threshold). Sometimes,
it's useful to know these two numbers independently. This commit tracks
and returns a periodic flush count in a flush stats.
This adds an `include_type_name` option to the `indices.create`,
`indices.get_mapping` and `indices.put_mapping` APIs, which defaults to `true`.
When set to `false`, then mappings will be returned directly in the body of
the `indices.get_mapping` API, without keying them by the type name, the
`indices.create` will expect mappings directly under the `mappings` key, and
the `indices.put_mapping` will use `_doc` as a type name and fail if a `type`
is provided explicitly.
Relates #15613
Today we expose a mutable list of documents in ParseContext via
ParseContext#docs(). This, on the one hand places knowledge how
to access nested documnts in multiple places and on the other
allows for potential illegal access to nested only docs after
the docs are reversed. This change restricts the access and
streamlines nested / non-root doc access.
This change validates that the `_search` request does not have trailing
tokens after the main object and fails the request with a parsing exception otherwise.
Closes#28995
Some features have been deprecated since `6.0` like the `_parent` field or the
ability to have multiple types per index. This allows to remove quite some
code, which in-turn will hopefully make it easier to proceed with the removal
of types.
Today when a user runs a CLI tool with standard input closed and no tty
attached, the result from reading is null and this usually leads to a
null pointer exception when we try to parse this input. This arises for
example when the user runs the plugin installer through a Docker
container without leaving standard input open and attaching a tty
(docker exec <container ID> bin/elasticsearch-plugin install). When we
try to read that the user accepts the plugin requiring additional
security permissions we will get back null. This commit addresses this
for all cases by throwing an illegal state exception. The solution for
the user is leave standard input open and attach a tty (or, for some
tools, use batch mode).
#29409 removed the nearlyEquals() double comparison snippet, which
makes these tests very flaky because they can generate very large or
very small doubles which don't work well with absolute error comparison.
We need to either refactor these tests to guarantee they stay in a small
range (which could be difficult due to holt/holt-winters) or re-implement
the more robust double comparison.
Tracking issue: #29456
This commit simplifies the exception handling in
TranslogWriter#closeWithTragicEvent. When invoking this method, the
inner close method could throw an exception which we always catch and
suppress into the exception that led us to tragically close. This commit
moves that repeated logic into closeWithTragicException and now callers
simply need to catch, invoke closeWithTragicException, and rethrow.
* Remove copy-pasted code
We had two instances of copy-pasted code with a bad license from
another website. The code was doing something rather simple, and
that functionality already exists within junit. This PR simply leverages
the junit functionality.
`BaseRandomBinaryDocValuesRangeQueryTestCase.testRandomBig` should only run with
nightly tests. It doesn't make sense to make it part of every test run.
`UUIDTests` had a slow test for compression, which I made a bit faster by
decreasing the number of indexed docs.
This commit fixes the formatting of the values in the composite
aggregation response. `date` fields should return timestamp as longs
when used in a `terms` source and `ip` fields should always be formatted as strings.
This commit also fixes the parsing of the `after` key for these field types.
Finally, this commit disables the index optimization for the `ip` field and any source that provides a `missing` value.
This commit simplifies the invocations to
Translog#closeOnTragicEvent. This method already catches all possible
exceptions and suppresses the non-AlreadyClosedExceptions into the
exception that triggered the invocation. Therefore, there is no need for
callers to do this same logic (which would never execute).
* Move Streams.copy into elasticsearch-core and make a multi-release jar
This moves the method `Streams.copy(InputStream in, OutputStream out)` into the
`elasticsearch-core` project (inside the `o.e.core.internal.io` package). It
also makes this class into a multi-release class where the Java 9 equivalent
uses `InputStream#transferTo`.
This is a followup from
https://github.com/elastic/elasticsearch/pull/29300#discussion_r178147495
* Move ObjectParser into the x-content lib
This moves `ObjectParser`, `AbstractObjectParser`, and
`ConstructingObjectParser` into the libs/x-content dependency. This decoupling
allows them to be used for parsing for projects that don't want to depend on the
entire Elasticsearch jar.
Relates to #28504
* Move Tuple into elasticsearch-core
This allows us to use Tuple from other projects that don't want to rely on the
entire Elasticsearch jar.
I have also added very simple tests, since there were none.
Relates tangentially to #28504
Today we close the translog write tragically if we experience any I/O
exception on a write. These tragic closes lead to use closing the
translog and failing the engine. Yet, there is one case that is missed
which is when we touch the write channel during a read (checking if
reading from the writer would put us past what has been flushed). This
commit addresses this by closing the writer tragically if we encounter
an I/O exception on the write channel while reading. This becomes
interesting when we consider that this method is invoked from the engine
through the translog as part of getting a document from the
translog. This means we have to consider closing the translog here as
well which will cascade up into us finally failing the engine.
Note that there is no semantic change to, for example, primary/replica
resync and recovery. These actions will take a snapshot of the translog
which syncs the translog to disk. If an I/O exception occurs during the
sync we already close the writer tragically and once we have synced we
do not ever read past the position that was synced while taking the
snapshot.
* Fixes query_string query equals timezone check
This change fixes a bug where two `QueryStringQueryBuilder`s were found
to be equal if they had the same timezone set even if the query string
in the builders were different
Closes#29403
* Adds mutate function to QueryStringQueryBuilderTests
* iter
Fixes instances of
- Equals methods without type check
- Equals methods where the field of `this` was compared to the same
field of `this` instead of the `that` object that is compared to
Since #26542 the NodeVersionAllocationDecider tries to explain its NO decisions
as follows:
... may not support codecs or postings formats for a newer Lucene version
However, this message often appears during a rolling upgrade, and experience
has shown that it seems to cause more confusion and worry than it needs to.
This change fixes that by removing the explanation again, reducing the message
to a statement of fact about the respective nodes' versions.
Additionally, the same wording was used for version incompatibilities when
allocating a primary (vs its previous location) and a replica (vs its primary).
This change separates these two cases so they can have separate, clearer
wording.
Fixes#29228
This change fixes the handling of the `quote_field_suffix` option on `query_string`
query. The expansion was not applied to default fields query.
Closes#29324
`action.master.force_local` was only ever used internally and never documented. It was one of those settings that were
automatically added to a tribe node, to make sure that cluster state read operations would work locally rather than failing when trying to forward the request to the master (as the tribe node never had a master).
Given that we recently removed the tribe node, we can also remove this setting.
Today when you input a byte size setting that is out of bounds for the
setting, you get an error message that indicates the maximum value of
the setting. The problem is that because we use ByteSize#toString, we
end up with a representation of the value that does not really tell you
what the bound is. For example, if the bound is 2^31 - 1 bytes, the
output would be 1.9gb which does not really tell you want the limit as
there are many byte size values that we format to the same 1.9gb with
ByteSize#toString. We have a method ByteSize#getStringRep that uses the
input units to the value as the output units for the string
representation, so we end up with no loss if we use this to report the
bound. This commit does this.
Before doing any kind of validation on a new mapping, we should first do the multi-type validation in
order to provide better error messages. For #29313, this means that the exception message will be
Rejecting mapping update to [range_index_new] as the final mapping would have more than 1 type:
[_doc, mytype]
instead of
[expected_attendees] is defined as an object in mapping [mytype] but this name is already used for
a field in other types
Today we report thread pool info using a common object. This means that
we use a shared set of terminology that is not consistent with the
terminology used to the configure thread pools. This holds in particular
for the minimum and maximum number of threads in the thread pool where
we use the following terminology:
thread pool info | fixed | scaling
min core size
max max size
A previous change addressed this for the nodes info API. This commit
changes the display of thread pool info in the cat thread pool API too
to be dependent on the type of the thread pool so that we can align the
terminology in the output of thread pool info with the terminology used
to configure a thread pool.