To recap, Attributes form the properties of a derived table.
Each LogicalPlan has Attributes as output since each one can be part of
a query and as such its result are sent to its consumer.
This change essentially removes the name id comparison so any changes
applied to existing expressions should work as long as the said
expressions are semantically equivalent.
This change enforces the hashCode and equals which has the side-effect
of using hashCode as identifiers for each expression.
By removing any property from an Attribute, the various components need
to look the original source for comparison which, while annoying, should
prevent a reference from getting out of sync with its source due to
optimizations.
Essentially going forward there are only 3 types of NamedExpressions:
Alias - user define (implicit or explicit) name
FieldAttribute - field backed by Elasticsearch
ReferenceAttribute - a reference to another source acting as an
Attribute. Typically the Attribute of an Alias.
* Remove the usage of NamedExpression as basis for all Expressions.
Instead, restrict their use only for named context, such as projections
by using Aliasing instead.
* Remove different types of Attributes and allow only FieldAttribute,
UnresolvedAttribute and ReferenceAttribute. To avoid issues with
rewrites, resolve the references inside the QueryContainer so the
information always stays on the source.
* Side-effect, simplify the rules as the state for InnerAggs doesn't
have to be contained anymore.
* Improve ResolveMissingRef rule to handle references to named
non-singular expression tree against the same expression used up the
tree.
#49693 backport to 7.x
(cherry picked from commit 5d095e2173bcbf120f534a6f2a584185a7879b57)
This commit changes the recommended repository file for rpm based
systems to be disabled by default. This is a safer practice so upgrades
of the system do no accidentally upgrade elasticsearch itself.
closes#30660
When testing wildfly with Elasticsearch, we currently dump the wildfly
log if the test fails. However, when starting wildfly we may fail to
find the port number wildfly started on, and fail with no output. This
change dumps the wildflog log when failing to find the http or
management ports.
relates #49374
* Allow list of IPs in geoip ingest processor
This change lets you use array of IPs in addition to string in geoip processor source field.
It will set array containing geoip data for each element in source, unless first_only parameter
option is enabled, then only first found will be returned.
Closes#46193
In order to cache script results in the query shard cache, we need to
check if scripts are deterministic. This change adds a default method
to the script factories, `isResultDeterministic() -> false` which is
used by the `QueryShardContext`.
Script results were never cached and that does not change here. Future
changes will implement this method based on whether the results of the
scripts are deterministic or not and therefore cacheable.
Refs: #49466
**Backport**
This commit refactors the `IndexLifecycleRunner` to split out and
consolidate the number of methods that change state from within ILM. It
adds a new class `IndexLifecycleTransition` that contains a number of
static methods used to modify ILM's state. These methods all return new
cluster states rather than making changes themselves (they can be
thought of as helpers for modifying ILM state).
Rather than having multiple ways to move an index to a particular step
(like `moveClusterStateToStep`, `moveClusterStateToNextStep`,
`moveClusterStateToPreviouslyFailedStep`, etc (there are others)) this
now consolidates those into three with (hopefully) useful names:
- `moveClusterStateToStep`
- `moveClusterStateToErrorStep`
- `moveClusterStateToPreviouslyFailedStep`
In the move, I was also able to consolidate duplicate or redundant
arguments to these functions. Prior to this commit there were many calls
that provided duplicate information (both `IndexMetaData` and
`LifecycleExecutionState` for example) where the duplicate argument
could be derived from a previous argument with no problems.
With this split, `IndexLifecycleRunner` now contains the methods used to
actually run steps as well as the methods that kick off cluster state
updates for state transitions. `IndexLifecycleTransition` contains only
the helpers for constructing new states from given scenarios.
This also adds Javadocs to all methods in both `IndexLifecycleRunner`
and `IndexLifecycleTransition` (this accounts for almost all of the
increase in code lines for this commit). It also makes all methods be as
restrictive in visibility, to limit the scope of where they are used.
This refactoring is part of work towards capturing actions and
transitions that ILM makes, by consolidating and simplifying the places
we make state changes, it will make adding operation auditing easier.
Our docs specifically mention that CBOR is supported when ingesting attachments. However this is not tested anywhere.
This adds a test, that uses specifically CBOR format in its IndexRequest and another one that behaves like CBOR in the ingest attachment unit tests.
Moved the deprecation warning to ReindexValidator to ensure it runs
early and works with resilient reindex. Also check that the warning
is reported back for wait_for_completion=false.
Follow-up to #49458
This PR adds 3 nodes to handle types defined by a front-end creating a
Painless AST. These types are decided with data immutability in mind -
hence the reason for more than a single node.
Move BuildParams class to 'minimumRuntime' source set to retain compatibility
with build-tools for builds using a Java 8 runtime.
Closes#49766
(cherry picked from commit 1059f823acdfa7a2f1f9bff21c7256dae4f3e23c)
Historically only two things happened in the final reduction:
empty buckets were filled, and pipeline aggs were reduced (since it
was the final reduction, this was safe). Usage of the final reduction
is growing however. Auto-date-histo might need to perform
many reductions on final-reduce to merge down buckets, CCS
may need to side-step the final reduction if sending to a
different cluster, etc
Having pipelines generate their output in the final reduce was
convenient, but is becoming increasingly difficult to manage
as the rest of the agg framework advances.
This commit decouples pipeline aggs from the final reduction by
introducing a new "top level" reduce, which should be called
at the beginning of the reduce cycle (e.g. from the SearchPhaseController).
This will only reduce pipeline aggs on the final reduce after
the non-pipeline agg tree has been fully reduced.
By separating pipeline reduction into their own set of methods,
aggregations are free to use the final reduction for whatever
purpose without worrying about generating pipeline results
which are non-reducible
When external plugin authors use build-tools, their integ tests depend
on the integ-test-zip artifact. However, this dependency was broken in
7.5.0 by accidentally removing the `@zip` qualifier on the maven
dependency, which works around the fact the pom for the integ-test-zip
claims the artifact is a pom instead of zip packaging. This commit
restores the workaround of using `@zip` until the pom can be fixed.
closes#49787
This cleans up two minor things.
- Cleans up style of == false
- Pulls maxLoopCounter into a member variable instead of accessing
CompilerSettings multiple times in the SFunction node
This is related to #49067. As part of this work a new sniff number of
node connections setting, a simple addresses setting, and a simple
number of sockets setting have been added. This commit ensures that
these settings are properly hooked up to support dynamic updates.
The list used by the search progress listener can be nullified
by another thread that reports a query result. This change replaces
the usage of this list with a new array that is synchronously modified.
Closes#49778
Adds `GET /_script_language` to support Kibana dynamic scripting
language selection.
Response contains whether `inline` and/or `stored` scripts are
enabled as determined by the `script.allowed_types` settings.
For each scripting language registered, such as `painless`,
`expression`, `mustache` or custom, available contexts for the language
are included as determined by the `script.allowed_contexts` setting.
Response format:
```
{
"types_allowed": [
"inline",
"stored"
],
"language_contexts": [
{
"language": "expression",
"contexts": [
"aggregation_selector",
"aggs"
...
]
},
{
"language": "painless",
"contexts": [
"aggregation_selector",
"aggs",
"aggs_combine",
...
]
}
...
]
}
```
Fixes: #49463
**Backport**
This removes the storeSettings pass where nodes in the AST could store
information they needed out of CompilerSettings for use during later
passes. CompilerSettings is part of ScriptRoot which is available during the
analysis pass making the storeSettings pass redundant.
The docker build task depends on the docker context being built, but it
was not explicitly setup as an input. This commit adds the task as an
input to the docker build.
relates #49613
* Stop Allocating Buffers in CopyBytesSocketChannel (#49825)
The way things currently work, we read up to 1M from the channel
and then potentially force all of it into the `ByteBuf` passed
by Netty. Since that `ByteBuf` tends to by default be `64k` in size,
large reads will force the buffer to grow, completely circumventing
the logic of `allocHandle`.
This seems like it could break
`io.netty.channel.RecvByteBufAllocator.Handle#continueReading`
since that method for the fixed-size allocator does check
whether the last read was equal to the attempted read size.
So if we set `64k` because that's what the buffer size is,
then wirte `1M` to the buffer we will stop reading on the IO loop,
even though the channel may still have bytes that we can read right away.
More imporatantly though, this can lead to running OOM quite easily
under IO pressure as we are forcing the heap buffers passed to the read
to `reallocate`.
Closes#49699
Adds documentation for the `minimum_should_match` parameter to the `bool` query docs. Includes docs for the default values:
- `1` if the `bool` query includes at least one `should` clause and no `must` or `filter` clauses
- `0` otherwise
* Adds a title abbreviation
* Updates the description and adds a Lucene link
* Reformats the parameters section
* Adds analyze, custom analyzer, and custom filter snippets
Relates to #44726.
IntelliJ IDEA moved their JUnit runner to a different package. While this does not break running
tests in IDEA, it leads to an ugly exception being thrown at the end of the tests:
Exception in thread "main" java.lang.SecurityException: java.lang.System#exit(0) calls are not
allowed
at org.elasticsearch.secure_sm.SecureSM$2.run(SecureSM.java:248)
at org.elasticsearch.secure_sm.SecureSM$2.run(SecureSM.java:215)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:310)
at org.elasticsearch.secure_sm.SecureSM.innerCheckExit(SecureSM.java:215)
at org.elasticsearch.secure_sm.SecureSM.checkExit(SecureSM.java:206)
at java.base/java.lang.Runtime.exit(Runtime.java:111)
at java.base/java.lang.System.exit(System.java:1781)
at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:59)
This commit adds support for newer IDEA versions in SecureSM.
not adjust testCacheability(), which how fails occasionally when given a random
interval source containing a script. This commit overrides testCacheability() to
explicitly sources with and without script filters.
Fixes#49821
By default, AbstractQueryTestCase only changes name and boost in its mutateInstance
method, used when checking equals and hashcode implementations. This commit adds
a mutateInstance method to InveralQueryBuilderTests that will check hashcode and
equality when the field or intervals source are changed.
By default the unified highlighter splits the input into passages using
a sentence break iterator. However we don't check if the field is tokenized
or not so `keyword` field also applies the break iterator even though they can
only match on the entire content. This means that by default we'll split the
content of a `keyword` field on sentence break if the requested number of fragments
is set to a value different than 0 (default to 5). This commit changes this behavior
to ignore the break iterator on non-tokenized fields (keyword) in order to always
highlight the entire values. The number of requested fragments control the number of
matched values are returned but the boundary_scanner_type is now ignored.
Note that this is the behavior in 6x but some refactoring of the Lucene's highlighter
exposed this bug in Elasticsearch 7x.
Backport of #49079. Reimplement a number of the tests from
elastic/elasticsearch-docker.
There is also one Docker image fix here, which is that two of the provided
config files had different file permissions to the rest. I've fixed this
with another RUN chmod while building the image, and adjusted the
corresponding packaging test.
There is a possible NPE in IntervalFilter xcontent serialization when scripts are
used, and `equals` and `hashCode` are also incorrectly implemented for script
filters. This commit fixes both.
* Copying the request is not necessary here. We can simply release it once the response has been generated and a lot of `Unpooled` allocations that way
* Relates #32228
* I think the issue that preventet that PR that PR from being merged was solved by #39634 that moved the bulk index marker search to ByteBuf bulk access so the composite buffer shouldn't require many additional bounds checks (I'd argue the bounds checks we add, we save when copying the composite buffer)
* I couldn't neccessarily reproduce much of a speedup from this change, but I could reproduce a very measureable reduction in GC time with e.g. Rally's PMC (4g heap node and bulk requests of size 5k saw a reduction in young GC time by ~10% for me)
When we are notifying systemd that we are fully started up, it can be
that we do not notify systemd before its default timeout of sixty
seconds elapses (e.g., if we are upgrading on-disk metadata). In this
case, we need to notify systemd to extend this timeout so that we are
not abruptly terminated. We do this by repeatedly sending
EXTEND_TIMEOUT_USEC to extend the timeout by thirty seconds; we do this
every fifteen seconds. This will prevent systemd from abruptly
terminating us during a long startup. We cancel the scheduled execution
of this notification after we have successfully started up.