This adds support to painless for decimal constants with trailing `d` or
`D` to make it compatible with Java. It already supported integer
constants with a trailing `d` or `D`; this change adds tests for that too.
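For example, all of these now parse (the values are illustrative; the syntax is valid in both Painless and Java):

```java
double a = 1.125d; // decimal constant with a trailing d: new in this change
double b = 1.125D; // a trailing D works too
double c = 2D;     // integer constant with a trailing D was already supported
```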
Closes #21116
In painless we prefer explicit types over implicit ones whereas
groovy is the other way around. Take this groovy code:
```
> 86400000.class
java.lang.Integer
> 864000000000.class
java.lang.Long
```
Painless accepts `86400000` just fine because that is a valid `int`
in the jvm. It rejects `864000000000` as an invalid `int` constant
because, in painless as in java, `long` constants always end in `L`
or `l`.
To ease the transition from groovy to painless, this changes the
compilation error returned from these invalid constants from:
```
Invalid int constant [864000000000].
```
to
```
Invalid int constant [864000000000]. If you want a long constant then change it to [864000000000L].
```
Inspired by #21313
* master: (516 commits)
Avoid angering Log4j in TransportNodesActionTests
Add trace logging when acquiring and releasing operation locks for replication requests
Fix handler name on message not fully read
Remove accidental import.
Improve log message in TransportNodesAction
Clean up of Script.
Update Joda Time to version 2.9.5 (#21468)
Remove unused ClusterService dependency from SearchPhaseController (#21421)
Remove max_local_storage_nodes from elasticsearch.yml (#21467)
Wait for all reindex subtasks before rethrottling
Correcting a typo (Maan to Man) in README.textile (#21466)
Fix InternalSearchHit#hasSource to return the proper boolean value (#21441)
Replace all index date-math examples with the URI encoded form
Fix typos (#21456)
Adapt ES_JVM_OPTIONS packaging test to ubuntu-1204
Add null check in InternalSearchHit#sourceRef to prevent NPE (#21431)
Add VirtualBox version check (#21370)
Export ES_JVM_OPTIONS for SysV init
Skip reindex rethrottle tests with workers
Make forbidden APIs be quieter about classpath warnings (#21443)
...
In the test for reindex and friends' rethrottling feature we were waiting
only for a single reindex sub task to start before rethrottling. This
mostly worked because starting tasks is fast. But it didn't *always* work,
and CI found that for us. This fixes the test to wait for all subtasks
to start before rethrottling.
I reproduced this locally semi-consistently with some fairly creative
`Thread.sleep` calls and this test fix fixes the issue even with the
sleeps so I'm fairly sure this will work consistently.
Closes #21446
The method used to be called `isSourceEmpty`, and was renamed to `hasSource`, but the return value never changed. Updated tests and users accordingly.
Closes #21419
Null safe dereferences make handling null or missing values shorter.
Compare without:
```
if (ctx._source.missing != null && ctx._source.missing.foo != null) {
  ctx._source.foo_length = ctx._source.missing.foo.length();
}
```
To with:
```
Integer length = ctx._source.missing?.foo?.length();
if (length != null) {
  ctx._source.foo_length = length;
}
```
Combining this with the as-yet-unimplemented elvis operator allows
for very concise defaults for nulls:
```
ctx._source.foo_length = ctx._source.missing?.foo?.length() ?: 0;
```
Since you have to start somewhere, we started with null safe dereferences.
Anyway, this is a feature borrowed from groovy. Groovy allows writing to
null values like:
```
def v = null
v?.field = 'cat'
```
And the writes are simply ignored. Painless doesn't support this at this
point because it'd be complex to implement and maybe not all that useful.
There is no runtime cost for this feature if it is not used. When it is
used we implement it fairly efficiently, adding a jump rather than a
temporary variable.
This should also work fairly well with doc values.
If you try to reindex with multiple slices against a node that
doesn't support it, we throw an `IllegalArgumentException` so that
`assertVersionSerializable` is OK with it and so that, if this happens
over REST, it comes back as a 400 error.
* Rest client: don't reuse that same HttpAsyncResponseConsumer across multiple retries
Turns out that AbstractAsyncResponseConsumer from the apache async http client is stateful and cannot be reused across multiple requests. The failover mechanism was mistakenly reusing that same instance, which can be provided by users, across retries in case nodes are down or return 5xx errors. The downside is that we have to change the signature of two public methods, as HttpAsyncResponseConsumer cannot be provided directly anymore; rather, its factory needs to be provided, which is then used to create one instance of the consumer per request attempt.
Up until now we tested our RestClient against multiple nodes only in a mock environment, where we don't really send http requests. In that scenario we can verify that retries etc. work properly, but the interaction with the http client library in a real scenario is different and can catch other problems. With this commit we also add an integration test that sends requests to multiple hosts, some of which may also get stopped in the meantime. The specific test for pathPrefix was removed as pathPrefix is now randomly applied by default, hence implicitly tested. Also moved a small test method that checked the validity of the path argument to the unit test RestClientSingleHostTests.
Also increase default buffer limit to 100MB and make it required in default consumer
The default buffer limit used to be 10MB but that proved not to be high enough for scroll requests (see reindex from remote). With this commit we increase the limit to 100MB and make it a bit more visible in the consumer factory.
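From the caller's side, the factory-based API looks something like the following sketch (based on the 5.x low-level client; the 100MB limit is per this change, everything else is illustrative):

```java
import java.util.Collections;

import org.apache.http.HttpHost;
import org.elasticsearch.client.HttpAsyncResponseConsumerFactory.HeapBufferedResponseConsumerFactory;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class ConsumerFactoryExample {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // a fresh consumer is created from the factory for every request attempt,
            // so retries against other nodes no longer share state
            HeapBufferedResponseConsumerFactory factory =
                new HeapBufferedResponseConsumerFactory(100 * 1024 * 1024); // the new 100MB buffer limit
            Response response = client.performRequest("GET", "/", Collections.emptyMap(), null, factory);
            System.out.println(response.getStatusLine());
        }
    }
}
```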
At one point in the past when moving out the rest tests from core to
their own subproject, we had multiple test classes which evenly split up
the tests to run. However, we simplified this and went back to a single
test runner to have better reproducibility in tests. This change
removes the remnants of that multiplexing support.
Adds support for `?slices=N` to reindex which automatically
parallelizes the process using parallel scrolls on `_uid`. Performance
testing shows a 3x performance improvement for simple docs
on decent hardware and maybe a 30% improvement
for more complex docs. Still compelling, especially because
clusters should be able to get closer to the 3x number than the 30%
number.
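For example, a sliced reindex request through the low-level REST client might look like this sketch (the index names and slice count are made up):

```java
import java.util.Collections;

import org.apache.http.HttpEntity;
import org.apache.http.entity.ContentType;
import org.apache.http.nio.entity.NStringEntity;

// `client` is an org.elasticsearch.client.RestClient
HttpEntity body = new NStringEntity(
    "{\"source\":{\"index\":\"source_index\"},\"dest\":{\"index\":\"dest_index\"}}",
    ContentType.APPLICATION_JSON);
client.performRequest("POST", "/_reindex", Collections.singletonMap("slices", "5"), body);
```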
Closes #20624
This adds checks for expected warning headers to the query builder test
infrastructure. Tests that are adding deprecation warnings to the response
headers need to check those; otherwise, the abstract base class for the test
class will complain at teardown.
The `IndexService#newQueryShardContext()` method creates a QueryShardContext on
shard `0`, with a `null` reader and that uses `System.currentTimeMillis()` to
resolve `now`. This may hide bugs, since the shard id is sometimes used for
query parsing (it is used to salt random score generation in `function_score`),
passing a `null` reader disables query rewriting and for some use-cases, it is
simply not ok to rely on the current timestamp (eg. percolation). So this pull
request removes this method and instead requires that all call sites provide
these parameters explicitly.
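A call site might now look roughly like this (a hedged sketch; the variable names are placeholders and the exact signature may differ):

```java
// A hedged sketch of a call site after this change; names are placeholders
QueryShardContext context = indexService.newQueryShardContext(
    shardId,           // a real shard id, e.g. to salt random score generation in function_score
    indexReader,       // a real reader, so query rewriting stays enabled
    () -> nowInMillis  // an explicit clock; percolation can pass a fixed timestamp
);
```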
It was 10MB and that was causing trouble when folks used
reindex-from-remote with large documents.
We also improve the error reporting so it tells folks to use a smaller
batch size if they hit a buffer size exception. Finally, adds some docs
to reindex-from-remote mentioning the buffer and giving an example of
lowering the size.
Closes #21185
Previously Elasticsearch would only use the package name for logging
levels, truncating the package prefix and the class name. This meant
that logger names for Netty were just prefixed by netty3 and netty. We
changed this for Elasticsearch so that it's the fully-qualified class
name now, but never corrected this for Netty. This commit fixes the
logger names for the Netty modules so that their levels are controlled
by the fully-qualified class name.
Relates #21223
Refactored ScriptType to clean up some of the variable and method names. Added more documentation. Deprecated the 'in' ParseField in favor of 'stored' to match the indexed scripts being replaced by stored scripts.
Java 9's exception message when lists have an out of bounds index
is much better than java 8 but the painless code asserted on the
java 8 message. Now it'll accept either.
I'm tempted to weaken the assertion but I like asserting that the
message is readable.
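A sketch of what a version-tolerant assertion can look like (the JDK message strings here are paraphrased):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import static org.hamcrest.Matchers.anyOf;
import static org.hamcrest.Matchers.containsString;

// inside an ESTestCase subclass, which provides expectThrows and assertThat
List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
Exception e = expectThrows(IndexOutOfBoundsException.class, () -> list.get(10));
assertThat(e.getMessage(), anyOf(
    containsString("Index: 10"),                // roughly the java 8 wording
    containsString("Index 10 out of bounds"))); // roughly the java 9 wording
```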
Adds support for indexing into lists and arrays with negative
indexes meaning "counting from the back". So if
`x = ["cat", "dog", "chicken"]` then `x[-1] == "chicken"`.
This adds an extra branch to every array and list access but
some performance testing makes it look like the branch predictor
successfully predicts the branch every time, so there isn't a change
in execution time for this feature when the index is positive.
When the index is negative performance testing showed the runtime
is the same as writing `x[x.length - 1]`, again, presumably thanks
to the branch predictor.
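A sketch of the extra branch (a hypothetical helper, not the actual generated bytecode):

```java
// A sketch of the extra branch described above
static int normalizeIndex(int index, int length) {
    // at any given call site the outcome is almost always the same,
    // which is why the branch predictor makes this essentially free
    return index >= 0 ? index : index + length;
}
// x[-1] then behaves like x[normalizeIndex(-1, x.length)], i.e. x[x.length - 1]
```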
Those performance metrics were calculated for lists and arrays but
`def`s get roughly the same treatment though instead of inlining
the test they need to make an invokedynamic call so we don't screw up
maps.
Closes #20870
Lucene 6.3 is expected to be released in the next few weeks, so it'd be good to give
it some integration testing. I had to upgrade randomized-testing too so that
both Lucene and Elasticsearch are on the same version.
This commit fixes responses to HEAD requests so that the value of the
Content-Length header is correct per the HTTP spec. Namely, the value of this
header should be equal to the Content-Length the response would have had
if the request were not a HEAD request.
This commit also fixes a memory leak on HEAD requests to the main action
that arose from the bytes on a builder not being released due to them
being dropped on the floor to ensure that the response to the main
action did not have a body.
Relates #21123
Versions before 2.0 needed to be told to return interesting fields
like `_parent`, `_routing`, `_ttl`, and `_timestamp`. And they come
back inside a `fields` block which we need to parse.
Closes #21044
This commit upgrades the transport-netty4 module dependency from Netty
version 4.1.5 to version 4.1.6. This is a bug fix release of Netty.
Relates #21051
This allows you to whitelist `localhost:*` or `127.0.10.*:9200`.
It explicitly checks for patterns like `*` in the whitelist and
refuses to start if the whitelist would match everything. Beyond
that the user is on their own designing a secure whitelist.
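A minimal sketch of the match-everything check (hypothetical code, not the actual implementation):

```java
import java.util.List;

// refuse to start when the whitelist trivially matches everything
static void checkWhitelist(List<String> whitelist) {
    for (String pattern : whitelist) {
        if ("*".equals(pattern) || "*:*".equals(pattern)) {
            throw new IllegalArgumentException(
                "Refusing to start because whitelist pattern [" + pattern + "] matches everything");
        }
    }
}
```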
When running `gradle run`, a developer usually intends to get a running
instance as if they had run elasticsearch from the command line. This is
different than the isolated environment we use for integration testing
plugins. This change switches the run task to use the zip distribution,
so that all modules included in the normal distribution are included.
* Scripting: Add support for booleans in scripts
Since 2.0, booleans have been represented as numeric fields (longs).
However, in scripts, this is odd, since you expect doing a comparison
against a boolean to work. While languages like groovy will auto convert
between booleans and longs, painless does not.
This changes the doc values accessor for boolean fields in scripts to
return Boolean objects instead of Long objects.
closes #20949
* Make Booleans final and remove wrapping of `this` for getValues()
This commit fixes an issue with the handling of the value "keep-alive"
on the Connection header in the Netty 4 HTTP implementation while
handling an HTTP 1.0 request. The issue was using the wrong equals
method to compare an AsciiString instance and a String instance (they
could never be equal). This commit fixes this to use the correct equals
method to compare for content equality.
This commit fixes an issue with the handling of the value "close" on the
Connection header in the Netty 4 HTTP implementation. The issue was
using the wrong equals method to compare an AsciiString instance and a
String instance (they could never be equal). This commit fixes this to
use the correct equals method to compare for content equality.
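An illustrative example of the difference, using Netty's `AsciiString`:

```java
import io.netty.util.AsciiString;

AsciiString header = new AsciiString("close");
boolean wrong = header.equals("close");        // false: an AsciiString never equals a String
boolean right = header.contentEquals("close"); // true: compares character content
```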
Relates #20956
Today Elasticsearch limits the number of processors used in computing
thread counts to 32. This was from a time when Elasticsearch created
more threads than it does now and users would run into out of memory
errors. It appears the real cause of these out of memory errors was not
well understood (it's often due to ulimit settings) and so users were
left hitting these out of memory errors on boxes with high core
counts. Today Elasticsearch creates fewer threads (but still a lot) and
we have a bootstrap check in place to ensure that the relevant ulimit is
not too low.
There are some caveats still to having too many concurrent indexing
threads as it can lead to too many little segments, and it's not a
magical go faster knob if indexing is already bottlenecked by disk, but
this limitation is artificial and surprising to users and so it should
be removed.
This commit also increases the lower bound of the max processes ulimit,
to prepare for a world where Elasticsearch instances might be running
with more than the previous cap of 32 processors. With the current settings,
Elasticsearch wants to create roughly 576 + 25 * p / 2 threads, where p
is the number of processors. Add in roughly 7 * p / 8 threads for the GC
threads and a fudge factor, and 4096 should cover us pretty well up to
256 cores.
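As a quick sanity check of that bound at p = 256:

```java
int p = 256;
int esThreads = 576 + 25 * p / 2;  // 3776
int gcThreads = 7 * p / 8;         // 224
int total = esThreads + gcThreads; // 4000, which a 4096 ulimit covers with headroom for the fudge factor
```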
Relates #20874
Both the netty3 and netty4 http implementations printed the default
toString representation of PortRange if ports couldn't be bound.
This commit adds a better default toString method to PortRange and
uses the string representation for the error message in the http
implementations.
Some objects like maps, iterables or arrays of objects can self-reference themselves. This is mostly due to a bug in code, but the XContentBuilder should be able to detect such situations and throw an IllegalArgumentException instead of building objects over and over until a stack overflow occurs.
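For example, a self-referencing map (a minimal sketch against the XContent API):

```java
import java.util.HashMap;
import java.util.Map;

import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

Map<String, Object> map = new HashMap<>();
map.put("self", map); // the map references itself

XContentBuilder builder = XContentFactory.jsonBuilder();
builder.map(map); // now throws IllegalArgumentException instead of overflowing the stack
```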
closes #20540, closes #19475
Update scripts might want to update the document's `_timestamp` but need a notion of `now()`.
Painless doesn't support any notion of now() since it would make scripts non-pure functions. Yet,
in the update case this is a valid value and we can pass it with the context together to allow the
script to record the timestamp the document was updated.
Relates to #17895
* Replace org.elasticsearch.common.lucene.search.MatchNoDocsQuery with its Lucene version (org.apache.lucene.search.MatchNoDocsQuery)
This change removes the ES version of the match no docs query and replaces it with the Lucene version.
relates #18030
* Add missing change
Today we throw an assertion error if we release an AbstractArray more than once.
Yet, it's recommended to implement close methods such that they can be invoked
more than once. Guaranteed single release calls are hard to implement and some
situations might not be tested, causing, for instance, the `CircuitBreaker` to operate on
corrupted memory stats.
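A minimal sketch of an idempotent release (a hypothetical class, not the actual `AbstractArray` code):

```java
import java.util.concurrent.atomic.AtomicBoolean;

class ReleasableArray implements AutoCloseable {
    private final AtomicBoolean released = new AtomicBoolean(false);

    @Override
    public void close() {
        // compareAndSet guarantees the release logic runs at most once, so a
        // double-close cannot corrupt e.g. circuit breaker memory accounting
        if (released.compareAndSet(false, true)) {
            // ... return memory / adjust breaker stats here, exactly once ...
        }
    }
}
```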
UpdateHelper, MetaDataIndexUpgradeService, and some recovery
stuff.
Move ClusterSettings to nullable ctor parameter of TransportService
so it isn't forgotten.
This change proposes the removal of all non-tcp transport implementations. The
mock transport can be used by default to run tests instead of the local transport; it has
roughly the same performance as TCP, or at least is not noticeably slower.
This is a master only change, deprecation notice in 5.x will be committed as a
separate change.
Today SearchContext exposes the current context as a thread local, which makes any kind of sane interface design very hard. This PR removes the thread local entirely and instead passes the relevant context anywhere it is needed. This simplifies state management dramatically and will allow for a much leaner SearchContext interface down the road.
Today when parsing a request, Elasticsearch silently ignores incorrect
(including parameters with typos) or unused parameters. This is bad as
it leads to requests having unintended behavior (e.g., if a user hits
the _analyze API and misspells "tokenizer", then Elasticsearch will
just use the standard analyzer, completely against their intentions).
This commit removes lenient URL parameter parsing. The strategy is
simple: when a request is handled and a parameter is touched, we mark it
as such. Before the request is actually executed, we check to ensure
that all parameters have been consumed. If there are remaining
parameters yet to be consumed, we fail the request with a list of the
unconsumed parameters. An exception has to be made for parameters that
format the response (as opposed to controlling the request); for this
case, handlers are able to provide a list of parameters that should be
excluded from tripping the unconsumed parameters check because those
parameters will be used in formatting the response.
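A minimal sketch of that bookkeeping (the names are hypothetical, not the actual implementation):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ParamTracker {
    private final Map<String, String> params;
    private final Set<String> consumed = new HashSet<>();

    ParamTracker(Map<String, String> params) {
        this.params = params;
    }

    String param(String key) {
        consumed.add(key); // touching a parameter marks it as consumed
        return params.get(key);
    }

    void ensureAllConsumed(Set<String> responseParams) {
        Set<String> unconsumed = new HashSet<>(params.keySet());
        unconsumed.removeAll(consumed);
        unconsumed.removeAll(responseParams); // response-formatting parameters are exempt
        if (unconsumed.isEmpty() == false) {
            throw new IllegalArgumentException("unrecognized parameters: " + unconsumed);
        }
    }
}
```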
Additionally, some inconsistencies between the parameters in the code
and in the docs are corrected.
Relates #20722
The invalid ingest configuration field name used to show itself,
even when it was null, in error messages. Sometimes this does not make
sense.
e.g.
```[null] Only one of [file], [id], or [inline] may be configure```
vs.
```Only one of [file], [id], or [inline] may be configure```
The above deals with three fields, so no single property is
responsible.
* master: (1199 commits)
[DOCS] Remove non-valid link to mapping migration document
Revert "Default `include_in_all` for numeric-like types to false"
test: add a test with ipv6 address
docs: clarify that both ip4 and ip6 addresses are supported
Include complex settings in settings requests
Add production warning for pre-release builds
Clean up confusing error message on unhandled endpoint
[TEST] Increase logging level in testDelayShards()
change health from string to enum (#20661)
Provide error message when plugin id is missing
Document that sliced scroll works for reindex
Make reindex-from-remote ignore unknown fields
Remove NoopGatewayAllocator in favor of a more realistic mock (#20637)
Remove Marvel character reference from guide
Fix documentation for setting Java I/O temp dir
Update client benchmarks to log4j2
Changes the API of GatewayAllocator#applyStartedShards and (#20642)
Removes FailedRerouteAllocation and StartedRerouteAllocation
IndexRoutingTable.initializeEmpty shouldn't override supplied primary RecoverySource (#20638)
Smoke tester: Adjust to latest changes (#20611)
...
reindex-from-remote should ignore unknown fields so it is mostly
future-compatible. This makes it ignore unknown fields by adding an
option to `ObjectParser` and `ConstructingObjectParser` that, if
enabled, causes them to ignore unknown fields.
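A hedged sketch of the option (the boolean constructor argument and the simplified context type parameter are assumptions about the exact API shape):

```java
import org.elasticsearch.common.ParseField;
import org.elasticsearch.common.xcontent.ObjectParser;

class Remote {
    String host;

    // the second argument (`true`) enables ignoring unknown fields instead of throwing;
    // the Void context type parameter is a simplification for this sketch
    static final ObjectParser<Remote, Void> PARSER = new ObjectParser<>("remote", true, Remote::new);
    static {
        PARSER.declareString((remote, host) -> remote.host = host, new ParseField("host"));
    }
}
```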
Closes #20504
Today we hold on to all possible tokenizers, tokenfilters etc. when we create
an index service on a node. This was mainly done to allow the `_analyze` API to
directly access all these primitives. We fixed this in #19827 and can now get rid of
the AnalysisService entirely and replace it with a simple map-like class. This
ensures we don't create a gazillion long-lived objects that are entirely useless since
they are never used in most of the indices. Also those objects might consume a considerable
amount of memory since they might load stopwords or synonyms etc.
Closes #19828
This commit removes `ByteSizeValue`'s methods that are duplicated (ex: `mbFrac()` and `getMbFrac()`) in order to only keep the `getN` form.
It also renames `mb()` -> `getMb()`, `kb()` -> `getKB()` in order to be more coherent with the `ByteSizeUnit` method names.
* Build: Remove old maven deploy support
This change removes the old maven deploy that we have in parallel to
maven-publish, and makes maven-publish fully work with publishing to
maven local. Using `gradle publishToMavenLocal` should be used to
publish to .m2.
Note that there is an unfortunate hack that means for
zip artifacts we must first create/publish a dummy pom file, and then
follow that with the real pom file. It would be nice to have the pom
file contain packaging=zip, but maven central then requires sources and
javadocs. But our zips are really just attached artifacts, so we already
set the packaging type to pom for our zip files. This change just works
around a limitation of the underlying maven publishing library which
silently skips attached artifacts when the packaging type is set to pom.
relates #20164, closes #20375
* Remove unnecessary extra spacing
This change removes all guice interaction from Transport, HttpServerTransport,
HttpServer and TransportService. All these classes as well as their subclasses
or extended version configured via plugins are now created by using plain old
bloody java constructors. YAY!
We can now run templates using the `explain` and/or `profile` parameters,
which is useful when you have defined a complicated template and want to debug it in an easier way than running the full query again.
You can use `explain` parameter when running a template:
```js
GET /_search/template
{
"file": "my_template",
"params": {
"status": [ "pending", "published" ]
},
"explain": true
}
```
You can use `profile` parameter when running a template:
```js
GET /_search/template
{
"file": "my_template",
"params": {
"status": [ "pending", "published" ]
},
"profile": true
}
```
TransportService is such a central part of the core server, replacing
its implementation is risky and can cause serious issues. This change removes the ability to
plug in TransportService but allows registering a TransportInterceptor that enables
plugins to intercept requests on both the sender and the receiver ends. This is a commonly used
and overridden piece of functionality, and the interceptor approach keeps the custom code contained.
This change replaces the fields parameter with stored_fields when it makes sense.
This is dictated by the renaming we made in #18943 for the search API.
The following endpoints have been changed to use `stored_fields` instead of `fields`:
* get
* mget
* explain
The documentation and the rest API spec have been updated to cope with the changes for the following APIs:
* delete_by_query
* get
* mget
* explain
The `fields` parameter has been deprecated for the following APIs (it is replaced by _source filtering):
* update: the fields are extracted from the _source directly.
* bulk: the fields parameter is used but fields are extracted from the source directly so it is allowed to have non-stored fields.
Some APIs still have the `fields` parameter for various reasons:
* cat.fielddata: the fields parameter relates to the fielddata fields that should be printed.
* indices.clear_cache: used to indicate which fielddata fields should be cleared.
* indices.get_field_mapping: used to filter fields in the mapping.
* indices.stats: get stats on fields (stored or not stored).
* termvectors: fields are retrieved from the stored fields if possible and extracted from the _source otherwise.
* mtermvectors:
* nodes.stats: the fields parameter is used to concatenate completion_fields and fielddata_fields so it's not related to stored_fields at all.
Fixes #20155
In 5.x we allowed this with a deprecation warning. This removes the code
added for that deprecation, requiring the cluster name to not be in the
data path.
Resolves #20391
* The default script language is now maintained in the `Script` class.
* Added the `script.legacy.default_lang` setting that controls the default language for scripts that are stored inside documents (for example percolator queries). This defaults to groovy.
** Added `QueryParseContext#getDefaultScriptLanguage()` that manages the default scripting language. It always returns `painless`, unless a query/search request is loaded in legacy mode, in which case it returns what is configured in the `script.legacy.default_lang` setting.
** In the aggregation parsing code, added a `ParserContext` that also holds the default scripting language, like `QueryParseContext`. Most parsers don't have access to `QueryParseContext`. This is for scripts in aggregations.
* The `lang` script field is always serialized (toXContent).
Closes #20122