Currently, translog operations are read and processed one by one. This
may be a problem as stale operations in translogs may suddenly reappear
in recoveries. To make sure that stale operations won't be processed, we
read the translog files in a reverse order (eg. from the most recent
file to the oldest file) and only process an operation if its sequence
number was not seen before.
Relates to #10708
Any CLI commands that depend on core Elasticsearch might touch classes
(directly or indirectly) that depends on logging. If they do this and
logging is not configured, Log4j will dump status error messages to the
console. As such, we need to ensure that any such CLI command configures
logging (with a trivial configuration that dumps log messages to the
console). Previously we did this in the base CLI command but with the
refactoring of this class out of core Elasticsearch, we no longer
configure logging there (since we did not want this class to depend on
settings and logging). However, this meant for some CLI commands (like
the plugin CLI) we were no longer configuring logging. This commit adds
base classes between the low-level command and multi-command classes
that ensure that logging is configured. Any CLI command that depends on
core Elasticsearch should use this infrastructure to ensure logging is
configured. There is one exception to this: Elasticsearch itself because
it takes reponsibility into its own hands for configuring logging from
Elasticsearch settings and log4j2.properties. We preserve this special
status.
Relates #27523
Today if refresh is disabled the doc stats are not updated anymore.
In a bulk index scenario this might cause confusion since even if
we refresh internal readers etc. doc stats are never advancing.
This change cuts over to the internal reader that is refreshed outside
of the external readers refresh interval but always equally `fresh` or
`fresher` which will cause less confusion.
In a previous change, we locked down the classes that can exit by
specifying explicit classes rather than packages than can exit. Alas,
there was a bug in the sense that the class that we exit from in the
case of an uncaught exception is not
ElasticsearchUncaughtExceptionHandler but rather an anonymous nested
class of ElasticsearchUncaughtExceptionHandler. To address this, we
replace this anonymous class with a bonafide nested class
ElasticsearchUncaughtExceptionHandler$PrivilegedHaltAction. Note that if
we try to get this class name we have a $ in the middle of the string
which is a special regular expression character; as such, we have to
escape it.
Relates #27518
The commit looks harmless, unfortunately it can break the engine flush
scheduler and the translog rolling. Both `uncommittedOperations` and
`uncommittedSizeInBytes` are currently calculated based on the minimum
required generation for recovery rather than the translog generation of
the last index commit. This is not correct if other index commits are
reserved for snapshotting even though we are keeping the last index
commit only.
This reverts commit e95d18ec23.
Today we create a new concurrent hash map everytime we refresh
the internal reader. Under defaults this isn't much of a deal but
once the refresh interval is set to `-1` these maps grow quite large
and it can have a significant impact on indexing throughput. Under low
memory situations this can cause up to 2x slowdown. This change carries
over the map size as the initial capacity wich will be auto-adjusted once
indexing stops.
Closes#20498
During a scroll, if the search sort matches the index sort we use the sort values of the last doc returned by
the previous scroll to optimize the main query with a `SearchAfterSortedDocQuery`.
This query can "jump" directly to the first document that sorts after the provided sort values.
This optim is also applied if the search sort is a prefix of the index sort but this case throws an exception
because we use the index sort (instead of the search sort) to validate the sort values of the last document.
This change fixes this bug and adds a test for it.
Pull request #20220 added a change where the store files
that have the same name but are different from the ones in the
snapshot are deleted first before the snapshot is restored.
This logic was based on the `Store.RecoveryDiff.different`
set of files which works by computing a diff between an
existing store and a snapshot.
This works well when the files on the filesystem form valid
shard store, ie there's a `segments` file and store files
are not corrupted. Otherwise, the existing store's snapshot
metadata cannot be read (using Store#snapshotStoreMetadata())
and an exception is thrown
(CorruptIndexException, IndexFormatTooOldException etc) which
is later caught as the begining of the restore process
(see RestoreContext#restore()) and is translated into
an empty store metadata (Store.MetadataSnapshot.EMPTY).
This will make the deletion of different files introduced
in #20220 useless as the set of files will always be empty
even when store files exist on the filesystem. And if some
files are present within the store directory, then restoring
a snapshot with files with same names will fail with a
FileAlreadyExistException.
This is part of the #26865 issue.
There are various cases were some files could exist in the
store directory before a snapshot is restored. One that
Igor identified is a restore attempt that failed on a node
and only first files were restored, then the shard is allocated
again to the same node and the restore starts again (but fails
because of existing files). Another one is when some files
of a closed index are corrupted / deleted and the index is
restored.
This commit adds a test that uses the infrastructure provided
by IndexShardTestCase in order to test that restoring a shard
succeed even when files with same names exist on filesystem.
Related to #26865
The `delimited_payload_filter` is renamed to `delimited_payload`, the old name is
deprecated and should be replaced by `delimited_payload`.
Closes#21978
Today, we keep only the last index commit and use only it to calculate
the minimum required translog generation. This may no longer be correct
as we introduced a new deletion policy which keeps multiple index
commits. This change adjusts the CombinedDeletionPolicy so that it can
work correctly with a new index deletion policy.
Relates to #10708, #27367
Today we require users to prepare their indices for split operations.
Yet, we can do this automatically when an index is created which would
make the split feature a much more appealing option since it doesn't have
any 3rd party prerequisites anymore.
This change automatically sets the number of routinng shards such that
an index is guaranteed to be able to split once into twice as many shards.
The number of routing shards is scaled towards the default shard limit per index
such that indices with a smaller amount of shards can be split more often than
larger ones. For instance an index with 1 or 2 shards can be split 10x
(until it approaches 1024 shards) while an index created with 128 shards can only
be split 3x by a factor of 2. Please note this is just a default value and users
can still prepare their indices with `index.number_of_routing_shards` for custom
splitting.
NOTE: this change has an impact on the document distribution since we are changing
the hash space. Documents are still uniformly distributed across all shards but since
we are artificually changing the number of buckets in the consistent hashign space
document might be hashed into different shards compared to previous versions.
This is a 7.0 only change.
This is related to #27260. Currently, basic nio constructs (nio
channels, the channel factories, selector event handlers, etc) implement
logic that is specific to the tcp transport. For example, NioChannel
implements the TcpChannel interface. These nio constructs at some point
will also need to support other protocols (ex: http).
This commit separates the TcpTransport logic from the nio building
blocks.
Add an index level setting `index.mapping.nested_objects.limit` to control
the number of nested json objects that can be in a single document
across all fields. Defaults to 10000.
Throw an error if the number of created nested documents exceed this
limit during the parsing of a document.
Closes#26962
Today we allow exiting solely by being in certain packages. This commit
upgrades the securesm dependency to a new version that supports being
explicit about which classes can exit. We utilize that here to only
allow exiting from the uncaught exception handler and the base CLI
command class.
Relates #27482
Exclude "key" field from random modifications in tests, the composite agg uses
an array of object for bucket key and values are checked.
Relates #26800
When a field is not mapped, Elasticsearch tries to generate a mapping update
from the parsed document. Some documents can introduce corner-cases, for
instance in the event of a multi-valued field whose values would be mapped to
different field types if they were supplied on their own, see for instance:
```
PUT index/doc/1
{
"foo": ["2017-11-10T02:00:01.247Z","bar"]
}
```
In that case, dynamic mappings want to map the first value as a `date` field
and the second one as a `text` field. This currently throws an exception,
which is expected, but the wrong one since it throws a `class_cast_exception`
(which triggers a HTTP 5xx code) when it should throw an
`illegal_argument_exception` (HTTP 4xx).
This change stops indexing the `_primary_term` field for nested documents
to allow fast retrieval of parent documents. Today we create a docvalues
field for children to ensure we have a dense datastructure on disk. Yet,
since we only use the primary term to tie-break on when we see the same
seqID on indexing having a dense datastructure is less important. We can
use this now to improve the nested docs performance and it's memory footprint.
Relates to #24362
Removing the unnecessary RankEvalTestHelper, making use of the common test infra
in ESTestCase, also hardening a few of the classes by making more fields final.