Start moving built in analysis components into the new analysis-common
module. The goal of this project is:
1. Remove core's dependency on lucene-analyzers-common.jar which should
shrink the dependencies for transport client and high level rest client.
2. Prove that analysis plugins can do all the "built in" things by moving all
"built in" behavior to a plugin.
3. Force tests not to depend on any oddball analyzer behavior. If tests
need anything more than the standard analyzer they can use the mock
analyzer provided by Lucene's test infrastructure.
`script_stack` is super useful when debugging Painless scripts
because it skips all the "weird" stuff involved that obfuscates
where the actual error is. It skips Painless's internals and
call site bootstrapping.
It works fine, but it didn't have many tests. This converts a
test that we had for line numbers into a test for the
`script_stack`. The line numbers test was an indirect test
for `script_stack`.
This change simplifies how the rest test runner finds test files and
removes all leniency. Previously multiple prefixes and suffixes would
be tried, and tests could exist inside or outside of the classpath,
although outside of the classpath never quite worked. Now only classpath
tests are supported, and only one resource prefix is supported,
`/rest-api-spec/tests`.
closes#20240
We'd like to be able to support context-sensitive whitelists in
Painless but we can't now because the whitelist is a static thing.
This begins to de-static the whitelist, in particular removing
the static keyword from most of the methods on `Definition` and
plumbing the static instance into the appropriate spots as though
it weren't static. Once we de-static all the methods we should be
able to fairly simply build context-sensitive whitelists.
The only "fun" bit of this is that I added another layer in the
chain of methods that bootstraps `def` calls. Instead of running
`invokedynamic` directly on `DefBootstrap` we now `invokedynamic`
`$bootstrapDef` on the script itself loads the `Definition` that
the script was compiled against and then calls `DefBootstrap`.
I chose to put `Definition` into `Locals` so I didn't have to
change the signature of all the `analyze` methods. I could have
do it another way, but that seems ok for now.
We want to upgrade to Lucene 7 ahead of time in order to be able to check whether it causes any trouble to Elasticsearch before Lucene 7.0 gets released. From a user perspective, the main benefit of this upgrade is the enhanced support for sparse fields, whose resource consumption is now function of the number of docs that have a value rather than the total number of docs in the index.
Some notes about the change:
- it includes the deprecation of the `disable_coord` parameter of the `bool` and `common_terms` queries: Lucene has removed support for coord factors
- it includes the deprecation of the `index.similarity.base` expert setting, since it was only useful to configure coords and query norms, which have both been removed
- two tests have been marked with `@AwaitsFix` because of #23966, which we intend to address after the merge
The JVM caches `Integer` objects. This is known. A test in Painless
was relying on the JVM not caching the particular integer `1000`.
It turns out that when you provide `-XX:+AggressiveOpts` the JVM
*does* cache `1000`, causing the test to fail when that is
specified.
This replaces `1000` with a randomly selected integer that we test
to make sure *isn't* cached by the JVM. *Hopefully* this test is
good enough. It relies on the caching not changing in between when
we check that the value isn't cached and when we run the painless
code. The cache now is a simple array but there is nothing
preventing it from changing. If it does change in a way that thwarts
this test then the test fail fail again. At least when that happens
the next person can see the comment about how it is important
that the integer isn't cached and can follow that line of inquiry.
Closes#24041
When indexing a document via the bulk API where IDs can be explicitly
specified, we currently accept an empty ID. This is problematic because
such a document can not be obtained via the get API. Instead, we should
rejected these requets as accepting them could be a dangerous form of
leniency. Additionally, we already have a way of specifying
auto-generated IDs and that is to not explicitly specify an ID so we do
not need a second way. This commit rejects the individual requests where
ID is specified but empty.
Relates #24118
This commit makes closing a ReleasableBytesStreamOutput release the underlying BigArray so
that we can use try-with-resources with these streams and avoid leaking memory by not returning
the BigArray. As part of this change, the ReleasableBytesStreamOutput adds protection to only
release the BigArray once.
In order to make some of the changes cleaner, the ReleasableBytesStream interface has been
removed. The BytesStream interface is changed to a abstract class so that we can use it as a
useable return type for a new method, Streams#flushOnCloseStream. This new method wraps a
given stream and overrides the close method so that the stream is simply flushed and not closed.
This behavior is used in the TcpTransport when compression is used with a
ReleasableBytesStreamOutput as we need to close the compressed stream to ensure all of the data
is written from this stream. Closing the compressed stream will try to close the underlying stream
but we only want to flush so that all of the written bytes are available.
Additionally, an error message method added in the BytesRestResponse did not use a builder
provided by the channel and instead created its own JSON builder. This changes that method to use
the channel builder and in turn the bytes stream output that is managed by the channel.
Note, this commit differs from 6bfecdf921 in that it updates
ReleasableBytesStreamOutput to handle the case of the BigArray decreasing in size, which changes
the reference to the BigArray. When the reference is changed, the releasable needs to be updated
otherwise there could be a leak of bytes and corruption of data in unrelated streams.
This reverts commit afd45c1432, which reverted #23572.
This commit collapses the SyncBulkRequestHandler and
AsyncBulkRequestHandler into a single BulkRequestHandler. The new
handler executes a bulk request and awaits for the completion if the
BulkProcessor was configured with a concurrentRequests setting of 0.
Otherwise the execution happens asynchronously.
As part of this change the Retry class has been refactored.
withSyncBackoff and withAsyncBackoff have been replaced with two
versions of withBackoff. One method takes a listener that will be
called on completion. The other method returns a future that will been
complete on request completion.
This commit skips the two Painless tests
EqualsTests#testBranchEqualsDefAndPrimitive and
EqualsTests#testBranchNotEqualsDefAndPrimitive on Windows as the tests
are repeatedly failing there.
The getProperty method is an internal method needed to run pipeline aggregations and retrieve info by path from the aggs tree. It is not needed in the MultiBucketsAggregation.Bucket interface, which is returned to users running aggregations from the transport client. The method is moved to the InternalMultiBucketAggregation class as that's where it belongs.
reindex_from_remote was using `TimeValue#toString` to generate the
scroll timeout which is bad because that generates fractional
time values that are useful for people but bad for Elasticsearch
which doesn't like to parse them. This switches it to using
`TimeValue#getStringRep` which spits out whole time values.
Closes to #23945
Makes #23828 even more desirable
Before now ranges where forbidden, because the percolator query itself could get cached and then the percolator queries with now ranges that should no longer match, incorrectly will continue to match.
By disabling caching when the `percolator` is being used, the percolator can now correctly support range queries with now based ranges.
I think this is the right tradeoff. The percolator query is likely to not be the same between search requests and disabling range queries with now ranges really disabled people using the percolator for their use cases.
Also fixed an issue that existed in the percolator fieldmapper, it was unable to find forbidden queries inside `dismax` queries.
Closes#23859
This commit modifies the BulkProcessor to be decoupled from the
client implementation. Instead it just takes a
BiConsumer<BulkRequest, ActionListener<BulkResponse>> that executes
the BulkRequest.
This commit makes closing a ReleasableBytesStreamOutput release the underlying BigArray so
that we can use try-with-resources with these streams and avoid leaking memory by not returning
the BigArray. As part of this change, the ReleasableBytesStreamOutput adds protection to only release the BigArray once.
In order to make some of the changes cleaner, the ReleasableBytesStream interface has been
removed. The BytesStream interface is changed to a abstract class so that we can use it as a
useable return type for a new method, Streams#flushOnCloseStream. This new method wraps a
given stream and overrides the close method so that the stream is simply flushed and not closed.
This behavior is used in the TcpTransport when compression is used with a
ReleasableBytesStreamOutput as we need to close the compressed stream to ensure all of the data
is written from this stream. Closing the compressed stream will try to close the underlying stream
but we only want to flush so that all of the written bytes are available.
Additionally, an error message method added in the BytesRestResponse did not use a builder
provided by the channel and instead created its own JSON builder. This changes that method to use the channel builder and in turn the bytes stream output that is managed by the channel.
This commit renames the random ASCII helper methods in ESTestCase. This
is because this method ultimately uses the random ASCII methods from
randomized runner, but these methods actually only produce random
strings generated from [a-zA-Z].
Relates #23886
This commit changes the listener passed to sendMessage from a Runnable
to a ActionListener.
This change also removes IOException from the sendMessage signature.
That signature is misleading as it allows implementers to assume an
exception will be thrown in case of failure. That does not happen due
to Netty's async nature.
As the query of a search request defaults to match_all,
calling _delete_by_query without an explicit query may
result in deleting all data.
In order to protect users against falling into that
pitfall, this commit adds a check to require the explicit
setting of a query.
Closes#23629
The current rest backcompat tests, which run against a mixed cluster of
5.x and 6.0 nodes, depend on snapshot builds of 5.x. However, this has
the potential for inconsistency that results in CI failures, and happens
quite often, whenever some backcompat logic is added to 5.x, but the bwc
test on master fails because the 5.x code has not yet been published as
a snapshot.
This change creates a git clone of the 5.x branch,
builds the zip distribution, and ties that into gradle substitutions for
the 5.x version.
Removed `parse(String index, String type, String id, BytesReference source)` in DocumentMapper.java and replaced all of its use in Test files with `parse(SourceToParse source)`.
`parse(String index, String type, String id, BytesReference source)` was only used in test files and never in the main code so it was removed. All of the test files that used it was then modified to use `parse(SourceToParse source)` method that existing in DocumentMapper.java
Without this change, if write a script with multiple regexes
*sometimes* the lexer will decide to look at them like one
big regex and then some trailing garbage. Like this discuss post:
https://discuss.elastic.co/t/error-with-the-split-function-in-painless-script/79021
```
def val = /\\\\/.split(ctx._source.event_data.param17);
if (val[2] =~ /\\./) {
def val2 = /\\./.split(val[2]);
ctx._source['user_crash'] = val2[0]
} else {
ctx._source['user_crash'] = val[2]
}
```
The error message you get from the lexer is `lexer_no_viable_alt_exception`
right after the *second* regex.
With this change each regex is just a single regex like it ought to be.
As a bonus, while looking into this issue I found that the error
reporting for regexes wasn't very nice. If you specify an invalid
pattern then you get an error marker on the start of the pattern
with the JVM's regex error message which attempts to point you to the
location in the regex but is totally unreadable in the JSON response.
This change fixes the location to point to the appropriate spot
inside the pattern and removes the portion of the JVM's error message
that doesn't render well. It is no longer needed now that we point
users to the appropriate spot in the pattern.
Changes reindex and friends to wait until the entire request has
been "cleaned up" before responding. "Clean up" in this context
is clearing the scroll and (for reindex-from-remote) shutting
down the client. Failures to clean up are still only logged, not
returned to the user.
Closes#23653
This commit upgrades the Netty dependencies from version 4.1.8 to
version 4.1.9. This commit picks up a few bug fixes that impacted us:
- Netty was incorrectly ignoring interfaces with self-assigned MAC
addresses (e.g., instances running in Docker containers or on EC2)
- incorrect handling of the Expect: 100-continue header
Relates #23540
With this commit we change the default receive predictor size for Netty
from 32kB to 64kB as our testing has shown that this leads to less
allocations on smaller heaps like the default out of the box
configuration and this value also works reasonably well for larger
heaps.
Closes#23185
This commit mutes a ton of Painless lambda tests on JDK 9. This commit
did not attempt to discover exactly which tests are failing, but instead
just blanket muted all tests in LambdaTests, FunctionRefTests, and
AugmentationTests.
Relates #23473
Previously, the RestController would stash the context prior to copying headers. However, there could be deprecation
log messages logged and in turn warning headers being added to the context prior to the stashing of the context. These
headers in the context would then be removed from the request and also leaked back into the calling thread's context.
This change moves the stashing of the context to the HttpTransport so that the network threads' context isn't
accidentally populated with warning headers and to ensure the headers added early on in the RestController are not
excluded from the response.
Throw error when skip or do sections are malformed, such as they don't start with the proper token (START_OBJECT). That signals bad indentation, which would be ignored otherwise. Thanks (or due to) our pull parsing code, we were still able to properly parse the sections, yet other runners weren't able to.
Closes#21980
* [TEST] fix indentation in matrix_stats yaml tests
* [TEST] fix indentation in painless yaml test
* [TEST] fix indentation in analysis yaml tests
* [TEST] fix indentation in generated docs yaml tests
* [TEST] fix indentation in multi_cluster_search yaml tests
We have many version constants in master that have already been
released, but are still marked (by naming convention) as unreleased.
This commit renames those version constants.
The dependencyLicenses check has the ability to map multiple jar files
to the same license file. However, netty was not taking advantage of
this, and had duplicate copies of its license/notice files for each jar.
This commit reduces the copies to one and uses the mapping feature.
This commit sets the intial size of the pipeline handler queue small to
prevent waste if pipelined requests are never sent. Since the queue will
grow quickly if pipeline requests are indeed set, this should not be
problematic.
Relates #23335
When pipelined responses are sent to the pipeline handler for writing,
they are not necessarily written immediately. They must be held in a
priority queue until all responses preceding the given response are
written. This means that when write is invoked on the handler, the
promise that is attached to the write invocation will not necessarily be
the promise associated with the responses that are written while the
queue is drained. To address this, the promise associated with a
pipelined response must be held with the response and then used when the
channel context is actually written to. This was introduced when
ensuring that the releasing promise is always chained through on write
calls lest the releasing promise never be invoked. This leads to many
failing test cases, so no new test cases are needed here.
Relates #23317
Gradle's finalizedBy on tasks only ensures one task runs after another,
but not immediately after. This is problematic for our integration tests
since it allows multiple project's integ test clusters to be
simultaneously. While this has not been a problem thus far (gradle 2.13
happened to keep the finalizedBy tasks close enough that no clusters
were running in parallel), with gradle 3.3 the task graph generation has
changed, and numerous clusters may be running simultaneously, causing
memory pressure, and thus generally slower tests, or even failure if the
system has a limited amount of memory (eg in a vagrant host).
This commit reworks how integ tests are configured. It adds an
`integTestCluster` extension to gradle which is equivalent to the current
`integTest.cluster` and moves the rest test runner task to
`integTestRunner`. The `integTest` task is then just a dummy task,
which depends on the cluster runner task, as well as the cluster stop
task. This means running `integTest` in one project will both run the
rest tests, and shut down the cluster, before running `integTest` in
another project.
When sending a response to a client, we attach a releasing listener to
the channel promise. If the client disappears before the response is
sent, the releasing listener was never notified. The reason the
listeners were never notified was due to a mistaken invocation of write
and flush on the channel which has two overrides: one that takes an
existing promise, and one that does not and instead creates a new
promise. When the client disappears, it is this latter promise that is
notified, which does not contain the releasing listener. This commit
addreses this issue by invoking the override that passes our channel
promise through.
Relates #23310