The `_id` field uses a binary encoding to index terms that is not compatible with
the utf8 automaton that the unified highlighter creates to reanalyze the input.
For these reason this commit ignores terms that target the `_id` field when
`require_field_match` is set to false.
Closes#37525
With the removal of the `_all` field the `mlt` query cannot infer a field name
to use to analyze the provided (un)like text if the `fields` parameter is not
explicitly set in the query and the `index.query.default_field` is not changed
in the index settings (by default it is set to `*`). For this reason the like text
is ignored and queries are only built from the provided document ids.
This change fixes this bug by throwing an error if the fields option is not set
and the `index.query.default_field` is equals to `*`. The error is thrown only
if like or unlike texts are provided in the query.
This change clarifies the documentation around the recommended JVM. The
recommended JVM is the bundled JVM. If a user does not use our
recommended JVM we suggest that they use a supported LTS version of the
JVM.
Closes#41132
Fix bug in predicate subtraction that caused the evaluation to be
skipped on the first mismatch instead of evaluating the whole list. In
some cases this caused not only an incorrect result but one that kept on
growing causing the engine to bail
Fix#40835
(cherry picked from commit bd2b33d6eaca616a5acd846204e2d12f905854d4)
* Handle the scenario where assertLogs() is not called from a test method
but the audit rolling file rolls over.
* Use a local boolean variable instead of the static one to account for
assertBusy() code block possibly being called multiple times and having
different execution paths.
(cherry picked from commit 6f642196cbab90079c610097befc794746170df1)
This fixes an issue where every N seconds a slow search request is triggered
since the searcher access time is not set unless the shard is idle. This change
moves to a more pro-active approach setting the searcher as accessed all the time.
Fix a deprecation warning that wasn't rendering correctly in
asciidoctor. This one needed to be explicitly marked as an inline macro
because it is on its own line and it needed to have its text escaped
because it contained a `,`. It also was missing explanitory text for
what the setting was.
This PR adds additional cleanup when stopping the node.
The data dir is excepted because it gets reused in some tests.
Without this cleanup the number of working dir copies could grew to
exhaust all available disk space.
* Bulk requests can be thousands of items large and take more than O(10ms) time to handle => we should not handle them on the transport threadpool to not block select loops
* relates #39128
* relates #39658
Today we do not distinguish "no operations in flight" from "operations are
blocked", since both return `0` from `IndexShard#getActiveOperationsCount()`.
We therefore cannot assert that every `TransportReplicationAction` performs its
actions under permit(s). This commit fixes this by returning
`IndexShard#OPERATIONS_BLOCKED` if operations are blocked, allowing these two
cases to be distinguished.
This commit extracts the template management from Watcher into an
abstract class, so that templates and lifecycle policies can be managed
in the same way across multiple plugins. This will be useful for SLM, as
well as potentially ILM and any other plugins which need to manage index
templates.
The date_histogram internally converts obsolete timezones (such as
"Canada/Mountain") into their modern equivalent ("America/Edmonton").
But rollup just stored the TZ as provided by the user.
When checking the TZ for query validation we used a string comparison,
which would fail due to the date_histo's upgrading behavior.
Instead, we should convert both to a TimeZone object and check if their
rules are compatible.
The `ignore_malformed` option currently works on numeric fields only when the
bad value isn't a string value but not if it is a boolean. In this case we get a
parsing error from the xContent parser which we need to catch in addition to the
field mapper.
Closes#11498
Today the `?preference=custom_string_value` search preference will only change
its choice of a shard copy if something changes the `IndexShardRoutingTable`
for that specific shard. Users can use this behaviour to route searches to a
consistent set of shard copies, which means they can reliably hit copies with
hot caches, and use the other copies only for redundancy in case of failure.
However we do not assert this property anywhere, so we might break it in
future.
This commit adds a test that shows that searches are routed consistently even
if other indices are created/rebalanced/deleted.
Relates https://discuss.elastic.co/t/176598, #41115, #26791
Today we always trim unsafe commits (whose max_seq_no >= global
checkpoint) before starting a read-write or read-only engine. This is
mandatory for read-write engines because they must start with the safe
commit. This is also fine for read-only engines since most of the cases
we should have exactly one commit after closing an index (trimming is a
noop). However, this is dangerous for following indices which might have
more than one commits when they are being closed.
With this change, we move the trimming logic to the ctor of InternalEngine
so we won't trim anything if we are going to open a read-only engine.
Currently enabling profiling disables top-hits optimizations, which is
unfortunate: it would be nice to be able to notice the difference in method
counts and timings depending on whether total hit counts are requested.
`Node#close` is pretty hard to rely on today:
- it might swallow exceptions
- it waits for 10 seconds for threads to terminate but doesn't signal anything
if threads are still not terminated after 10 seconds
This commit makes `IOException`s propagated and splits `Node#close` into
`Node#close` and `Node#awaitClose` so that the decision what to do if a node
takes too long to close can be done on top of `Node#close`.
It also adds synchronization to lifecycle transitions to make them atomic. I
don't think it is a source of problems today, but it makes things easier to
reason about.
Today we check if an index has broken settings when checking if an index
needs to be upgraded. However, it can be the case that an index setting
became broken even if an index is already upgraded to the current
version if the user removed a plugin (or downgraded from the default
distribution to the non-default distribution) while on the same version
of Elasticsearch. In this case, some registered settings would go
missing and the index would now be broken. Yet, we miss this check and
instead of archiving the settings, the index becomes unassigned due to
the missing settings. This commit addresses this by checking for broken
settings whether or not the index is upgraded.
One of the two #getCorrections methods is only used in tests, so we can move
it and any of the required helper methods to that test. Also reducing the
visibility of several methods to package private since the class isn't used
elsewhere outside the package.
Today we erroneously look for a node setting called `readonly` when deciding
whether or not to create a missing directory in a filesystem repository. This
change fixes this by using the repository setting instead.
Closes#41009
Relates #26909
CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP can be used as SQL keywords
(without parentheses) and therefore there is a special rule in the
grammar to accommodate this.
Previously, this rule was also catching the parenthesised version of those functions too,
not allowing the {fn <functionName>()} to be used. E.g.:
{fn current_time(2)} or {fn current_timestamp()}
Now, the grammar rule catches only the keyword versions and all the parenthesised
versions go through the normal function resolution. As a consequence the validation
of the precision is moved from the parser lever (ExpressionBuilder) to the function
implementations.
Fixes: #41240
(cherry picked from commit bfbc9f140144b5a35aa29008b58bf58074419853)
* fix the packer cache script
This PR disabled the explicit pull since it seems this always tries to
work with a registry.
Functionality will not be affected since we will still build the images
on pull.
When the same alias points to multiple indices we can write to only one index
with `is_write_index` value `true`. The special handling in case of the put
mapping request(to resolve authorized indices) has a check on indices size
for a concrete index. If multiple indices existed then it marked the request
as unauthorized.
The check has been modified to consider write index flag and only when the
requested index matches with the one with write index alias, the alias is considered
for authorization.
Closes#40831
In `TransportRolloverAction` before doing rollover we resolve
source index name (write index) from the alias in the rollover request.
Before evaluating the conditions and executing rollover action, we
retrieve stats, but to do so we used the source index name
resolved from the alias instead of alias from the index.
This fails when the user is assigned a role with index privilege on the
alias instead of the concrete index. This commit fixes this by using
the alias from the request.
After this change, verified that when we retrieve all the stats (including write + read indexes)
we are considering only source index.
Closes#40771
When specifying a limit over an agg sorting, the limit will be pushed
down to the grouping which affects the custom sorting. This commit fixes
that and restricts the limit only to sorting.
Fix#40984
(cherry picked from commit da3726528d9011b05c0677ece6d11558994eccd9)
Although the translation rule was implemented in the `Optimizer`,
the rule was not added in the list of rules to be executed.
Relates to #41195
Follows #37936
(cherry picked from commit f426a339b77af6008d41cc000c9199fe384e9269)
The unified highlighter returns the first sentence of the text when number_of_fragments
is set to 0 (full highlighting). This is a legacy of the removed postings highlighter
that was based on sentence break only. This commit changes this behavior in order
to respect the provided no_match_size value when number_of_fragments is set to 0.
This means that the behavior will be consistent for any value of the number_of_fragments option.
Closes#41066
Yet another improvement to SYS TABLES on differentiating between table
types specified as '%' and '' while maintaining legacy support for null
Fix#40775
(cherry picked from commit 6dbca5edd335eb1da8e7825389a15e5fe45397d4)
* Adds Bulk delete API to blob container
* Implement bulk delete API for S3
* Adjust S3Fixture to accept both path styles for bulk deletes since the S3 SDK uses both during our ITs
* Closes#40250
Today the blended term query detects if a term exists in a field by looking at the term statistics in the index.
However the value to indicate that a term has no occurence in a field have changed in Lucene. A non-existing term now returns
a doc and total term frequency of 0. Because of this disrepancy the blended term query picks 0 as the minimum frequency for a term
even if other fields have documents for this terms. This confuses the term queries that the blending creates since some of them
contain a custom state that indicates a frequency of 0 even though the term has some occurence in the field. For these terms an exception
is thrown because the term query always checks that the term state's frequency is greater than 0 if there are documents associate to it.
This change fixes this bug by ignoring terms with a doc freq of 0 when the blended term query picks the minimum term frequency among the
requested fields.
Closes#41118