Today our OS information returned in node stats only returns a
high-level name of the OS (e.g., "Linux"). Yet, for some uses this is
too high-level and knowing at a finer level of granularity the
underlying OS can be useful. This commit extracts the pretty name on
Linux from /etc/os-release. This pretty name usually includes the Linux
vendor and the Linux vendor version number (e.g., Fedora 28).
Currently we introduced a hard limit of 1024 to the number of fields a query can
be expanded to in #26541. Instead of using a hard limit, we should make this
configurable. This change removes the hard limit check and uses the existing
`max_clause_count` setting instead.
Closes#34778
Currently when aggregating on an unmapped date field (e.g. using a
date_histogram) we don't preserve the aggregations `format` setting but instead
use the default format. This can lead to loosing the aggregations `format` when
aggregating over several indices where some of them contain unmapped date fields
and are encountered first in the reduce phase.
Related to #31760
Today the `composite` aggregation throws an error if a source targets an
unmapped field and `missing_bucket` is set to false. Documents without a
value for a source cannot produce any bucket if `missing_bucket` is not
activated so the error is a shortcut to say that the response will be empty.
However this is not consistent with the `terms` aggregation which accepts
unmapped field by default even if the response is also guaranteed to be empty.
This commit removes this restriction, if a source contains an unmapped field
we now return an empty response (no buckets).
Closes#35317
The current implementation of asyncBlockOperations() can be used to
execute some code once all indexing operations permits have been acquired,
then releases all permits immediately after the code execution. This
immediate release is not suitable for treatments that need to keep all
permits over multiple execution steps.
This commit adds a new asyncBlockOperations() that exposes a Releasable,
making it possible to acquire all permits and only release them all
when needed by closing the Releasable. The existing blockOperations()
method has been modified to delegate permit acquisition/releasing to this new
method.
Relates to #33888
This commit uses the index settings version so that a follower can
replicate index settings changes as needed from the leader.
Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
The lookup vars under params (namely _fields and _source) were
inadvertently removed when scoring scripts were converted to using
script contexts. This commit adds them back, along with deprecation
warnings for those that should not be used.
A CCR test failure shows that the approach in #34474 is flawed.
Restoring the LocalCheckpointTracker from an index commit can cause both
FollowingEngine and InternalEngine to incorrectly ignore some deletes.
Here is a small scenario illustrating the problem:
1. Delete doc with seq=1 => engine will add a delete tombstone to Lucene
2. Flush a commit consisting of only the delete tombstone
3. Index doc with seq=0 => engine will add that doc to Lucene but soft-deleted
4. Restart an engine with the commit (step 2); the engine will fill its
LocalCheckpointTracker with the delete tombstone in the commit
5. Replay the local translog in reverse order: index#0 then delete#1
6. When process index#0, an engine will add it into Lucene as a live doc
and advance the local checkpoint to 1 (seq#1 was restored from the
commit - step 4).
7. When process delete#1, an engine will skip it because seq_no=1 is
less than or equal to the local checkpoint.
We should have zero document after recovering from translog, but here we
have one.
Since all operations after the local checkpoint of the safe commit are
retained, we should find them if the look-up considers also soft-deleted
documents. This PR fills the disparity between the version map and the
local checkpoint tracker by taking soft-deleted documents into account
while resolving strategy for engine operations.
Relates #34474
Relates #33656
Today when a percolator query contains a date range then the query
analyzer extracts that range, so that at search time the `percolate` query
can exclude percolator queries efficiently that are never going to match.
The problem is that if 'now' is used it is evaluated at index time.
So the idea is to rewrite date ranges with 'now' to a match all query,
so that the query analyzer can't extract it and the `percolate` query
is then able to evaluate 'now' at query time.
This change adds a `frozen` engine that allows lazily open a directory reader
on a read-only shard. The engine wraps general purpose searchers in a LazyDirectoryReader
that also allows to release and reset the underlying index readers after any and before
secondary search phases.
Relates to #34352
Today we only apply `ingore_throttled` to expansions from wildcards,
date math expressions and aliases. Yet, this is tricky since we might
have resolved certain expressions in pre-filter steps like security.
It's more consistent to apply this logic to all expressions including
concrete indices.
Relates to #34354
With this change, `Version` no longer carries information about the qualifier,
we still need a way to show the "display version" that does have both
qualifier and snapshot. This is now stored by the build and red from `META-INF`.
This is related to #29023. Additionally at other points we have
discussed a preference for removing the need to unnecessarily block
threads for opening new node connections. This commit lays the groudwork
for this by opening connections asynchronously at the transport level.
We still block, however, this work will make it possible to eventually
remove all blocking on new connections out of the TransportService
and Transport.
We've decided that the bulk, delete, get, index, update, and search APIs should not
contain this request parameter, and we will instead accept both typed and typeless calls.
We have seen an improvement when we bumped the timeout from 1s to 5s, but there are still a few failures for this tests. With this commit we bump the timeout to 10 seconds hoping it will stop all the failures.
Today if a wildcard, date-math expression or alias expands/resolves
to an index that is search-throttled we still search it. This is likely
not the desired behavior since it can unexpectedly slow down searches
significantly.
This change adds a new indices option that allows `search`, `count`
and `msearch` to ignore throttled indices by default. Users can
force expansion to throttled indices by using `ignore_throttled=true`
on the rest request to expand also to throttled indices.
Relates to #34352
This test has a bug that got introduced during the refactoring of #32442. With 2 concurrent term increments,
we can only assert under the operation permit that we are in the correct operation term, not that there is
not already another term bump pending.
Closes#34862
If the random query string is "now" by accident _and_ we are also not setting
some field names to use explicitely, then we can hit the "mapped_date" field
from default test setup. This correctly leads to the query being was marked as
not cacheable, but we assume and check so later. This change fixes this rare
edge case by making sure we don't hit the "date" field in this rare cases.
Closes#35183
When the engine is asked for historical operations, we check if some of the requested operations
are not yet refreshed and if so we refresh before returning the operations. The refresh check is
based on capturing the local checkpoint before each refresh and comparing that value to the one
requested when `newChangesSnapshot` was called. If the requested range is above the captured
local checkpoint we issue a refresh.
This can currently cause unneeded extra refreshes if the method is called concurrently which may cause unwanted degradation in indexing performance. This is especially relevant for CCR where we always ask for a range below the global checkpoint. That range is guaranteed to be below the local
checkpoint of the shard and one refresh is enough to serve multiple changes requests.
This commit fixes this by introducing a dedicated mutex to make sure the test for whether a refresh
is needed actually wait for concurrents for concurrent refreshes that were caused by another
change refresh.
Note that this is not a big change in semantics as refreshes are serialized by lucene anyway. I also
opted not to keep the synchronization to the changes snapshot request only even if in theory we
can apply it to all refreshes, not matter where they come from.
This changes the current script.max_size_in_bytes to be dynamic so it can be
set through the cluster settings API. This setting is also applied to inline scripts
in the compile method of ScriptService to prevent excessively long inline
scripts from being compiled. The script length limit is removed from Painless as
this is no longer necessary with the protection in compile.
Currently we create a new netty event loop group for client connections
and all server profiles. Each new group creates new threads for io
processing. This means 2 * num of processors new threads for each group.
A single group should be able to handle all io processing (for the
transports). This also brings the netty module inline with what we do
for nio.
Additionally, this PR renames the worker threads to be the same for
netty and nio.
Currently, we assume that rollback always happens in the test
testRestoreLocalHistoryFromTranslogOnPromotion. However, if the global
checkpoint equals max_seq_no, we won't rollback. This causes the
max_seq_no_of_updates assertion failed because max_seq_no_of_updates
won't be advanced to the global checkpoint. With this commit, we assert
max_seq_no_of_updates in two different paths.
Today we always allocate a full buffer (1024 elements) in a
LuceneChangesSnapshot even though the requesting size is smaller.
With this change, we will use the requesting size as the buffer size if
it's smaller than the default batch size; otherwise uses the default
batch size.
With this commit we differentiate between permanent circuit breaking
exceptions (which require intervention from an operator and should not
be automatically retried) and transient ones (which may heal themselves
eventually and should be retried). Furthermore, the parent circuit
breaker will categorize a circuit breaking exception as either transient
or permanent based on the categorization of memory usage of its child
circuit breakers.
Closes#31986
Relates #34460
Stop passing `Settings` to `AbstractComponent`'s ctor. This allows us to
stop passing around `Settings` in a *ton* of places. While this change
touches many files, it touches them all in fairly small, mechanical
ways, doing a few things per file:
1. Drop the `super(settings);` line on everything that extends
`AbstractComponent`.
2. Drop the `settings` argument to the ctor if it is no longer used.
3. If the file doesn't use `logger` then drop `extends
AbstractComponent` from it.
4. Clean up all compilation failure caused by the `settings` removal
and drop any now unused `settings` isntances and method arguments.
I've intentionally *not* removed the `settings` argument from a few
files:
1. TransportAction
2. AbstractLifecycleComponent
3. BaseRestHandler
These files don't *need* `settings` either, but this change is large
enough as is.
Relates to #34488
* NETWORKING: MockTransportService Wait for Close
* Make `MockTransportService` wait `30s` for close listeners to run before failing the assertion
* Closes#34990
When we connect to remote clusters, there may be a few more routers/firewalls in-between compared to when we connect to nodes in the same cluster. We've experienced cases where firewalls drop connections completely and keep-alives seem not to be enough, or they are not properly configured. With this commit we allow to enable application-level pings specifically from CCS nodes to the selected remote nodes through the new setting `cluster.remote.${clusterAlias}.transport.ping_schedule`. The new setting is similar `transport.ping_schedule` but it does not affect intra-cluster communication, pings are only sent to specific remote cluster when specifically enabled, as they are disabled by default.
Relates to #34405
* DISCOVERY: Cleanup AbstractDisruptionTestCase
* Make the internal test cluster manage minimum master nodes where we used the default of (nodes / 2 + 1) before
* Remove use of the `NodeConfigurationSource` indirection
* Relates #33675
Drops the `Settings` member from `AbstractComponent`, moving it from the
base class on to the classes that use it. For the most part this is a
mechanical change that doesn't drop `Settings` accesses. The one
exception to this is naming threads where it switches from an invocation
that passes `Settings` and extracts the node name to one that explicitly
passes the node name.
This change doesn't drop the `Settings` argument from
`AbstractComponent`'s ctor because this change is big enough as is.
We'll do that in a follow up change.
This commit filters out usage of deprecated tzs by tests. These are
tested separately and should not require checking for warnings on any
test using random timezones.
closes#34188
This commit adds a new single value metric aggregation that calculates
the statistic called median absolute deviation, which is a measure of
variability that works on more types of data than standard deviation
Our calculation of MAD is approximated using t-digests. In the collect
phase, we collect each value visited into a t-digest. In the reduce
phase, we merge all value t-digests, then create a t-digest of
deviations using the first t-digest's median and centroids
Bulk Request in High level rest client should be consistent with what is
possible in Rest API, therefore should support global parameters. Global
parameters are passed in URL in Rest API.
Some parameters are mandatory - index, type - and would fail validation
if not provided before before the bulk is executed.
Optional parameters - routing, pipeline.
The usage of these should be consistent across sync/async execution,
bulk processor and BulkRequestBuilder
closes#26026
Deprecates `_source_include` and `_source_exclude` url parameters
in favor of `_source_inclues` and `_source_excludes` because those
are consistent with the rest of Elasticsearch's APIs.
Relates to #22792
Drops the `deprecationLogger` from `AbstractComponent`, moving it to
places where we need it. This saves us from building a bunch of
`DeprecationLogger`s that we don't need.
Relates to #34488
`AbstractComponent` is trouble because its name implies that
*everything* should extend from it. It *is* useful, but maybe too
broadly useful. The things it offers access too, the `Settings` instance
for the entire server and a logger are nice to have around, but not
really needed *everywhere*. The `Settings` instance especially adds a
fair bit of ceremony to testing without any value.
This removes the `nodeName` method from `AbstractComponent` so it is
more clear where we actually need the node name.
* Removed methods are just unused (the exceptions being isGeoPoint() and is
isFloatingPoint() but those could more efficiently be replaced by enum comparisons to simplify the code)
* Remove exceptions aren't thrown
In order to remove Streamable from the codebase, Response objects need
to be read using the Writeable.Reader interface which this change
enables. This change enables the use of Writeable.Reader by adding the
`Action#getResponseReader` method. The default implementation simply
uses the existing `newResponse` method and the readFrom method. As
responses are migrated to the Writeable.Reader interface, Action
classes can be updated to throw an UnsupportedOperationException when
`newResponse` is called and override the `getResponseReader` method.
Relates #34389
This commit adds a new ParentJoinAggregator that implements a join using global ordinals
in a way that can be reused by the `children` and the upcoming `parent` aggregation.
This new aggregator is a refactor of the existing ParentToChildrenAggregator with two main changes:
* It uses a dense bit array instead of a long array when the aggregation does not have any parent.
* It uses a single aggregator per bucket if it is nested under another aggregation.
For the latter case we use a `MultiBucketAggregatorWrapper` in the factory in order to ensure that each
instance of the aggregator handles a single bucket. This is more inlined with the strategy we use for other
aggregations like `terms` aggregation for instance since the number of buckets to handle should be low (thanks to the breadth_first strategy).
This change is also required for #34210 which adds the `parent` aggregation in the parent-join module.
Relates #34508
* Fix linelength suppressions in index.fielddata
* Some lines that were too long were dead code => Removed them and all code that became dead because of it
* Relates #34884
testClusterFormsByScanningPorts is flaky: sometimes in CI it's not possible to
bind to any of the ports we need to in order for the port scanning to work.
This change removes this test, and #34809 describes a better way to test this
behaviour.
After discussing on the team's FixItFriday, we concluded that
static final instance variables that are mutable should be lowercased.
Historically, DeprecationLogger was uppercased more frequently than lowercased.
This is related to #30876. The AbstractSimpleTransportTestCase initiates
many tcp connections. There are normally over 1,000 connections in
TIME_WAIT at the end of the test. This is because every test opens at
least two different transports that connect to each other with 13
channel connection profiles. This commit modifies the default
connection profile used by this test to 6. One connection for each
type, except for REG which gets 2 connections.
* Check self references in metric agg after last doc collection (#33593)
* Revert 0aff5a30c5dbad9f476be14f34b81e2d1991bb0f (#33593)
* Check self refs in metric agg only once in post collection hook (#33593)
* Remove unnecessary mocking (#33593)
The check for null argument is already done in `splitStringByCommaToArray`, hence it can be removed, which allows us to remove the whole splitScrollIds private method.
With this change, we apply the common test config automatically to all
newly created tasks instead of opting in specifically.
For plugin authors using the plugin externally this means that the
configuration will be applied to their RandomizedTestingTasks as well.
The purpose of the task is to simplify setup and make it easier to
change projects that use the `test` task but actually run integration
tests to use a task called `integTest` for clarity, but also because
we may want to configure and run them differently.
E.x. using different levels of concurrency.
Access to special variables _source and _fields were accidentally
removed in recent refactorings. This commit adds them back, along with a
test.
closes#33884
In a future major version, we will be introducing a soft limit on the
number of shards in a cluster based on the number of nodes in the
cluster. This limit will be configurable, and checked on operations
which create or open shards and issue a warning if the operation would
take the cluster over the limit.
There is an option to enable strict enforcement of the limit, which
turns the warnings into errors. In a future release, the option will be
removed and strict enforcement will be the default (and only) behavior.
- Restrict visibility of Aggregators and Factories
- Move PipelineAggregatorBuilders up a level so it is consistent with
AggregatorBuilders
- Checkstyle line length fixes for a few classes
- Minor odds/ends (swapping to method references, formatting, etc)
This commit introduces two corrections to the way simulate?verbose
handles conditionals on processors.
1) Prior to this change when executing simulate?verbose for
processors with conditionals that evaluate to false, that processor
would still be displayed in the result set. What was displayed was
correct, such that no changes to the document occurred. However, if the
conditional evaluates to false, the processor should not even be
displayed.
2) Prior to this change when executing simulate?verbose for
pipeline processors with conditionals, the individual steps would no
longer be displayed. Commit e37e5df addressed the issue, but
failed account for a conditional on the pipeline processor. Since
a pipeline processor can introduce cycles and is effectively a
single processor that encapsulates multiple other processors that
are potentially guarded by a single conditional, special handling is
needed to for pipeline and conditional pipeline processors.
We should delete a job by directly talking to the allocated
task and telling it to shutdown. Today we shut down a job
via the persistent task framework. This is not ideal because,
while the job has been removed from the persistent task
CS, the allocated task continues to live until it gets the
shutdown message.
This means a user can delete a job, immediately delete
the rollup index, and then see new documents appear in
the just-deleted index. This happens because the indexer
in the allocated task is still running and indexes a few
more documents before getting the shutdown command.
In this PR, the transport action is changed to a TransportTasksAction,
and we invoke onCancelled() directly on the matching job.
The race condition still exists after this PR (albeit less likely),
but this was a precursor to fixing the issue and a self-contained
chunk of code. A second PR will followup to fix the race itself.
This fixes a bug about aliases authorization.
That is, a user might see aliases which he is not authorized to see.
This manifests when the user is not authorized to see any aliases
and the `GetAlias` request is empty which normally is a marking
that all aliases are requested. In this case, no aliases should be
returned, but due to this bug, all aliases will have been returned.
The `ignore` set contains entries of type Class<?>, but the check is performed
on Path objects. This always returns false so is useless currently. Looking at
the first commit of this test that already shows this behaviour this never
excluded anything, so it can be removed.