IndexAnalyzers has a close() method that should iterate through all its wrapped
analyzers and close each one in turn. However, instead of delegating to the
analyzers' close() methods, it instead wraps them in a Closeable interface,
which just returns a list of the analyzers. In addition, whitespace normalizers are
ignored entirely.
When profiling a call to `AllocationService#reroute()` in a large cluster
containing allocation filters of the form `node-name-*` I observed a nontrivial
amount of time spent in `Regex#simpleMatch` due to these allocation filters.
Patterns ending in a wildcard are not uncommon, and this change treats them as
a special case in `Regex#simpleMatch` in order to shave a bit of time off this
calculation. It also uses `String#regionMatches()` to avoid an allocation in
the case that the pattern's only wildcard is at the start.
Microbenchmark results before this change:
Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
1113.839 ±(99.9%) 6.338 ns/op [Average]
(min, avg, max) = (1102.388, 1113.839, 1135.783), stdev = 9.486
CI (99.9%): [1107.502, 1120.177] (assumes normal distribution)
Microbenchmark results with this change applied:
Result "org.elasticsearch.common.regex.RegexStartsWithBenchmark.performSimpleMatch":
433.190 ±(99.9%) 0.644 ns/op [Average]
(min, avg, max) = (431.518, 433.190, 435.456), stdev = 0.964
CI (99.9%): [432.546, 433.833] (assumes normal distribution)
The microbenchmark in question was:
@Fork(3)
@Warmup(iterations = 10)
@Measurement(iterations = 10)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
@SuppressWarnings("unused") //invoked by benchmarking framework
public class RegexStartsWithBenchmark {
private static final String testString = "abcdefghijklmnopqrstuvwxyz";
private static final String[] patterns;
static {
patterns = new String[testString.length() + 1];
for (int i = 0; i <= testString.length(); i++) {
patterns[i] = testString.substring(0, i) + "*";
}
}
@Benchmark
public void performSimpleMatch() {
for (int i = 0; i < patterns.length; i++) {
Regex.simpleMatch(patterns[i], testString);
}
}
}
* Optimize Snapshot Finalization
* Delete index-N blobs and segement blobs in one single bulk delete instead of in separate ones to save RPC calls on implementations that have bulk deletes implemented
* Don't fail snapshot because deleting old index-N failed, this results in needlessly logging finalization failures and makes analysis of failures harder going forward as well as incorrect index.latest blobs
* Add Ability to List Child Containers to BlobContainer (#42653)
* Add Ability to List Child Containers to BlobContainer
* This is a prerequisite of #42189
This change fixes the name of the index_prefix sub field when the `index_prefix`
option is set on a text field that is nested under an object or a multi-field.
We don't use the full path of the parent field to set the index_prefix field name
so the field is registered under the wrong name. This doesn't break queries since
we always retrieve the prefix field through its parent field but this breaks other
APIs like _field_caps which tries to find the parent of the `index_prefix` field
in the mapping but fails.
Closes#43741
This commit allows bulk upserts to correctly read the default pipeline
for the concrete index that belongs to an alias.
Bulk upserts are modeled differently from normal index requests such that
the index request is a request inside of the update request. The update
request (outer) contains the index or alias name is not part of the (inner)
index request. This commit adds a secondary check against the update request
(outer) if the index request (inner) does not find an alias.
Currently the repsonse of the "_reload_search_analyzer" endpoint contains the
index names and nodeIds of indices were analyzers reloading was triggered. This
change add the names of the search-time analyzers that were reloaded.
Closes#43804
With Lucene rollback (#33473), we should never have more than one
primary term for each sequence number. Therefore we don't have to sort
by the primary term when reading soft-deletes.
The `RoutingService` has a confusing name, since it doesn't really have
anything to do with routing. Its responsibility is submitting reroute commands
to the master.
This commit renames this class to `BatchedRerouteService`, and extracts the
`RerouteService` interface to avoid passing `BiConsumer`s everywhere. It also
removes that `BatchedRerouteService extends AbstractLifecycleComponent` since
this service has no meaningful lifecycle. Finally, it introduces a small
wrapper class to allow for lazy initialization to deal with the dependency loop
when constructing a `Node`.
This adds a `rare_terms` aggregation. It is an aggregation designed
to identify the long-tail of keywords, e.g. terms that are "rare" or
have low doc counts.
This aggregation is designed to be more memory efficient than the
alternative, which is setting a terms aggregation to size: LONG_MAX
(or worse, ordering a terms agg by count ascending, which has
unbounded error).
This aggregation works by maintaining a map of terms that have
been seen. A counter associated with each value is incremented
when we see the term again. If the counter surpasses a predefined
threshold, the term is removed from the map and inserted into a cuckoo
filter. If a future term is found in the cuckoo filter we assume it
was previously removed from the map and is "common".
The map keys are the "rare" terms after collection is done.
This commit merges the `object-fields` feature branch. The new 'flattened
object' field type allows an entire JSON object to be indexed into a field, and
provides limited search functionality over the field's contents.
Action is a class that encapsulates meta information about an action
that allows it to be called remotely, specifically the action name and
response type. With recent refactoring, the action class can now be
constructed as a static constant, instead of needing to create a
subclass. This makes the old pattern of creating a singleton INSTANCE
both misnamed and lacking a common placement.
This commit renames Action to ActionType, thus allowing the old INSTANCE
naming pattern to be TYPE on the transport action itself. ActionType
also conveys that this class is also not the action itself, although
this change does not rename any concrete classes as those will be
removed organically as they are converted to TYPE constants.
relates #34389
Today the `DiskThresholdMonitor` limits the frequency with which it submits
reroute tasks, but it might still submit these tasks faster than the master can
process them if, for instance, each reroute takes over 60 seconds. This causes
a problem since the reroute task runs with priority `IMMEDIATE` and is always
scheduled when there is a node over the high watermark, so this can starve any
other pending tasks on the master.
This change avoids further updates from the monitor while its last task(s) are
still in progress, and it measures the time of each update from the completion
time of the reroute task rather than its start time, to allow a larger window
for other tasks to run.
It also now makes use of the `RoutingService` to submit the reroute task, in
order to batch this task with any other pending reroutes. It enhances the
`RoutingService` to notify its listeners on completion.
Fixes#40174
Relates #42559
Introduces a new `ConsistentSecureSettingsValidatorService` service that exposes
a single public method, namely `allSecureSettingsConsistent`. The method returns
`true` if the local node's secure settings (inside the keystore) are equal to the
master's, and `false` otherwise. Technically, the local node has to have exactly
the same secure settings - setting names should not be missing or in surplus -
for all `SecureSetting` instances that are flagged with the newly introduced
`Property.Consistent`. It is worth highlighting that the `allSecureSettingsConsistent`
is not a consensus view across the cluster, but rather the local node's perspective
in relation to the master.
The Action base class currently works for both Streamable and Writeable
response types. This commit intorduces StreamableResponseAction, for
which only the legacy Action implementions which provide newResponse()
will extend. This eliminates the need for overriding newResponse() with
an UnsupportedOperationException.
relates #34389
Today when an index is closed all its shards are forced flushed
but the translog files are left around. As explained in #42445
we'd like to trim the translog for closed indices in order to
consume less disk space. This commit reuses the existing
AsyncTrimTranslogTask task and reenables it for closed indices.
At the time the task is executed, we should have the guarantee
that nothing holds the translog files that are going to be removed.
It also leaves a short period of time (10 min) during which translog
files of a recently closed index are still present on disk. This could
also help in some cases where the closed index is reopened
shortly after being closed (in order to update an index setting
for example).
Relates to #42445
This change removes the ability to wrap an IndexSearcher in plugins. The IndexSearcherWrapper is replaced by an IndexReaderWrapper and allows to wrap the DirectoryReader only. This simplifies the creation of the context IndexSearcher that is used on a per request basis. This change also moves the optimization that was implemented in the security index searcher wrapper to the ContextIndexSearcher that now checks the live docs to determine how the search should be executed. If the underlying live docs is a sparse bit set the searcher will compute the intersection
betweeen the query and the live docs instead of checking the live docs on every document that match the query.
This commit adds a wildcard intervals source, similar to the prefix. It
also changes the term parameter in prefix to read prefix, to bring it
in to line with the pattern parameter in wildcard.
Closes#43198
Currently changing resources (like dictionaries, synonym files etc...) of search
time analyzers is only possible by closing an index, changing the underlying
resource (e.g. synonym files) and then re-opening the index for the change to
take effect.
This PR adds a new API endpoint that allows triggering reloading of certain
analysis resources (currently token filters) that will then pick up changes in
underlying file resources. To achieve this we introduce a new type of custom
analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows
swapping out analysis components. Custom analyzers that contain filters that are
markes as "updateable" will automatically choose this implementation. This PR
also adds this capability to `synonym` token filters for use in search time
analyzers.
Relates to #29051
When we added support for TokenFilterFactories to specialise how they were used when parsing
synonym files, PreConfiguredTokenFilters were set up to either apply themselves, or be ignored.
This behaviour is a leftover from an earlier iteration, and also has an incorrect default.
This commit makes preconfigured token filters usable in synonym file parsing by default, and brings
those filters that should not be used into line with index-specific filter factories; in indexes created
before version 7 we emit a deprecation warning, and we throw an error in indexes created after.
Fixes#38793
TransportNodesAction provides a mechanism to easily broadcast a request
to many nodes, and collect the respones into a high level response. Each
node has its own request type, with a base class of BaseNodeRequest.
This base request requires passing the nodeId to which the request will
be sent. However, that nodeId is not used anywhere. It is private to the
base class, yet serialized to each node, where the node could just as
easily find the nodeId of the node it is on locally.
This commit removes passing the nodeId through to the node request
creation, and guards its serialization so that we can remove the base
request class altogether in the future.
GatewayIndexStateIT#testRecoverBrokenIndexMetadata replies on the
flushing on shutdown. This behaviour, however, can be randomly disabled
in MockInternalEngine.
Closes#43034
Adds support for the situation where only voting-only nodes are bootstrapped. In that case, they will
still try to become elected and bring full master nodes into the cluster.
Search requests executed through the SecurityIndexSearcherWrapper throw
an UnsupportedOperationException if they match a sparse role query.
When low level cancellation is activated (which is the default since #42857),
the context index searcher creates a weight that doesn't handle #scorer.
This change fixes this bug and adds a test to ensure that we check this case.
Currently `AbstractQueryTestCase#testToQuery` checks the search context cachable
flag. This is a bit fragile due to the high randomization of query builders
performed by this general test. Also we might only rarely check the
"interesting" cases because they rarely get generated when fully randomizing the
query builder.
This change moved the general checks out ot #testToQuery and instead adds
dedicated cache tests for those query builders that exhibit something other than
the default behaviour.
Closes#43200
When a named token filter or char filter is passed as part of an Analyze API
request with no index, we currently try and build the relevant filter using no
index settings. However, this can miss cases where there is a pre-configured
filter defined in the analysis registry. One example here is the elision filter, which
has a pre-configured version built with the french elision set; when used as part
of normal analysis, this preconfigured set is used, but when used as part of the
Analyze API we end up with NPEs because it tries to instantiate the filter with
no index settings.
This commit changes the Analyze API to check for pre-configured filters in the case
that the request has no index defined, and is using a name rather than a custom
definition for a filter.
It also changes the pre-configured `word_delimiter_graph` filter and `edge_ngram`
tokenizer to make their settings consistent with the defaults used when creating
them with no settings
Closes#43002Closes#43621Closes#43582
This commit adds a prefix intervals source, allowing you to search
for intervals that contain terms starting with a given prefix. The source
can make use of the index_prefixes mapping option.
Relates to #43198
This commit modifies the RemoteInfo to clarify that a search query
must always be serialized as JSON. Additionally, it adds an assertion
to ensure that this is the case. This fixes#43406.
Additionally, this PR implements AbstractXContentTestCase for the
reindex request. This is related to #43456.
A voting-only master-eligible node is a node that can participate in master elections but will not act
as a master in the cluster. In particular, a voting-only node can help elect another master-eligible
node as master, and can serve as a tiebreaker in elections. High availability (HA) clusters require at
least three master-eligible nodes, so that if one of the three nodes is down, then the remaining two
can still elect a master amongst them-selves. This only requires one of the two remaining nodes to
have the capability to act as master, but both need to have voting powers. This means that one of
the three master-eligible nodes can be made as voting-only. If this voting-only node is a dedicated
master, a less powerful machine or a smaller heap-size can be chosen for this node. Alternatively, a
voting-only non-dedicated master node can play the role of the third master-eligible node, which
allows running an HA cluster with only two dedicated master nodes.
Closes#14340
Co-authored-by: David Turner <david.turner@elastic.co>
Today the `ClusterFormationFailureHelper` says `... discovery will continue
using ... from last-known cluster state` and lists all the nodes in the
last-known cluster state. In fact we ignore the master-ineligible nodes in the
last-known cluster state during discovery. This commit fixes this by listing
only the master-eligible nodes from the cluster state in this message.
If the master removes the relocating shard, but recovery isn't aware of
it, then we can enter an invalid state where ReplicationTracker does not
include the local shard.
Today we assert that a warning is logged after no more than
`discovery.cluster_formation_warning_timeout`, but the deterministic scheduler
adds a small amount of extra randomness to the timing of future events, causing
the following build to fail:
./gradlew :server:test --tests "org.elasticsearch.cluster.coordination.CoordinatorTests.testLogsWarningPeriodicallyIfClusterNotFormed" -Dtests.seed=DF35C28D4FA9EE2D
This commit adds an allowance for this extra time.
After two recent changes (#38824 and #33888), the _cat/indices API
no longer report information for active recovering indices and
non-replicated closed indices. It also misreport replicated closed
indices that are potentially not authorized for the user.
This commit changes how the cat action works by first using the
Get Settings API in order to resolve authorized indices. It then uses
the Cluster State, Cluster Health and Indices Stats APIs to retrieve
information about the indices.
Closes#39933