In scripts (at least some of the languages), the terms dictionary and
postings can be access with the special _index variable. This is for
very advanced use cases which want to do their own scoring. The problem
is segment level statistics must be recomputed for every document.
Additionally, this is not friendly to the terms index caching as the
order of looking up terms should be controlled by lucene.
This change removes _index from scripts. Anyone using it can and should
instead write a Similarity plugin, which is explicitly designed to allow
doing the calculations needed for a relevance score.
closes#19359
This commit documents how to write a `ScriptEngine` in order to use
expert internal apis, such as using Lucene directly to find index term
statistics. These documents prepare the way to remove both native
scripts and IndexLookup.
The example java code is actually compiled and tested under a new gradle
subproject for example plugins. This change does not yet breakup
jvm-example into the new examples dir, which should be done separately.
relates #19359
relates #19966
I just spent ages debugging a script I wrote after following the documentation. It was not clear to me that _index is not defined when using painless; if it was mentioned on this page I would have saved myself a lot of time.
Drops any mention of non-sandboxed scripting languages other than a
brief "we don't support them and we shouldn't because A and B"
statement.
Relates to #23930
They needed to be updated now that Painless is the default and
the non-sandboxed scripting languages are going away or gone.
I dropped the entire section about customizing the classloader
whitelists. In master this barely does anything (exposes more
things to expressions).
Painless uses Ruby-like method dispatch (reciever type, method name,
and arity) rather than Java-like (reciever type, method name, and
argument compile time types) or Groovy-like method dispatch (receiver
type, method name, and argument run time types). We do this for
mostly good reasons but we never documented it.
Relates to #22720
Implemented by wrapping an array of reused `ModuleDateTime`s that
we grow when needed. The `ModuleDateTime`s are reused when we
move to the next document.
Also improves the error message returned when attempting to modify
the `ScriptdocValues`, removes a couple of allocations, and documents
that the date functions are available in Painless.
Relates to #22162
Currently, stored scripts use a namespace of (lang, id) to be put, get, deleted, and executed. This is not necessary since the lang is stored with the stored script. A user should only have to specify an id to use a stored script. This change makes that possible while keeping backwards compatibility with the previous namespace of (lang, id). Anywhere the previous namespace is used will log deprecation warnings.
The new behavior is the following:
When a user specifies a stored script, that script will be stored under both the new namespace and old namespace.
Take for example script 'A' with lang 'L0' and data 'D0'. If we add script 'A' to the empty set, the scripts map will be ["A" -- D0, "A#L0" -- D0]. If a script 'A' with lang 'L1' and data 'D1' is then added, the scripts map will be ["A" -- D1, "A#L1" -- D1, "A#L0" -- D0].
When a user deletes a stored script, that script will be deleted from both the new namespace (if it exists) and the old namespace.
Take for example a scripts map with {"A" -- D1, "A#L1" -- D1, "A#L0" -- D0}. If a script is removed specified by an id 'A' and lang null then the scripts map will be {"A#L0" -- D0}. To remove the final script, the deprecated namespace must be used, so an id 'A' and lang 'L0' would need to be specified.
When a user gets/executes a stored script, if the new namespace is used then the script will be retrieved/executed using only 'id', and if the old namespace is used then the script will be retrieved/executed using 'id' and 'lang'
Adds "Appending B. Painless API Reference", a reference of all classes
and methods available from Painless. Removes links to java packages
because they contain methods that we don't expose and don't contain
methods that we do expose (the ones in Augmentation). Instead this
generates a list of every class and every exposed method using the same
type information available to the
interpreter/compiler/whatever-we-call-it. From there you can jump to
the relevant docs.
Right now you build all the asciidoc files by running
```
gradle generatePainlessApi
```
These files are expected to be committed because we build the docs
without running `gradle`.
Also changes the output of `Debug.explain` so that it is easy to
search for the class in the generated reference documentation.
You can also run it in an IDE safely if you pass the path to the
directory in which to generate the docs as the first parameter. It'll
blow away the entire directory an recreate it from scratch so be careful.
And then you can build the docs by running something like:
```
../docs/build_docs.pl --out ../built_docs/ --doc docs/reference/index.asciidoc --open
```
That is, if you have checked out https://github.com/elastic/docs in
`../docs`. Wait a minute or two and your browser will pop open in with
all of Elasticsearch's reference documentation. If you go to
`http://localhost:8000/painless-api-reference.html` you can see this
list. Or you can get there by following the links to `Modules` and
`Scripting` and `Painless` and then clicking the link in the paragraphs
below titled `Appendix B. Painless API Reference`.
I like having these in asciidoc because we can deep link to them from the
rest of the guide with constructs like
`<<painless-api-reference-Object-hashCode-0>>` and
`<<painless-api-reference->>` and we get link checking. Then the only
brittle link maintenance bit is the link generation for javadoc. Which
sucks. But I think it is important that we link to the methods directly
so they are easy to find.
Relates to #22720
The `Script source settings` section currently states that `false` means scripting is ENABLED.
The other sections seem to indicate that `false` means scripting is DISABLED.
If the current documentation is correct, that would imply that `inline` and `stored` scripting are ENABLED by default, which seems to conflict with all the other sections in the document.
Sends the `error_trace` parameter with all requests sent by the
yaml test framework, including the doc snippet tests. This can be
overridden by settings `error_trace: false`. While this drift's
core's handling of the yaml tests from the client's slightly this
should only be a problem for tests that rely on the default value,
both of which I've fixed by setting the value explicitly.
This also escapes `\n` and `\t` in the `Stash dump on failure` so
the `stack_trace` is more readable.
Also fixes `RestUpdateSettingsAction` to not think of the `error_trace`
parameter as a setting.
Elasticsearch can be run in a few different ways:
- from the command line on Linux and Windows
- as a service on Linux and Windows
on both 32-bit client and 64-bit server VMs. We strive for a great
out-of-the-box experience any of these combinations but today it is
lacking on 32-bit client JVMs and on the Windows service. There are two
deficiencies that arise:
- on any 32-bit client JVM we fail to start out of the box because we
force the server JVM in jvm.options
- when installing the Windows service, the thread stack size must be
specified in jvm.options
This commit attempts to address these deficiencies.
We should continue to force the server JVM because there are systems
where the server JVM is not active by default (e.g., the 32-bit JDK on
Windows). This does mean that if a user tries to run with a client JVM
they will see a failure message at startup but this is the best that we
can do if we want to continue to force the server JVM. Thus, this commit
at least documents this situation.
To improve the situation with installing the Windows service, this
commit adds a default setting for the thread stack size. This default is
chosen based on the default thread stack size across all 64-bit server
JVMs. This means that if a user tries to run with a 32-bit JVM they
could otherwise see significantly higher memory usage (this situation is
complicated, it's really only on Windows where the extra memory usage is
egregious, but cutting into the 32-bit address space on any system is
bad). So this commit makes it so that the out-of-the-box experience is
improved for the Windows service on 64-bit server JVMs and we document
the need to adjust this setting on 32-bit JVMs.
Again, we are focusing on the out-of-the-box experience here and this
means optimizing for the best experience on any 64-bit server JVM as
this covers the vast majority of the user base. The users that are on
32-bit JVMs will suffer a little bit but at least now any user on any
64-bit server JVM can start Elasticsearch out of the box.
Finally, we fix some references to the jvm.options documentation.
Relates #21920
NOTE: The result of `?.` and `?:` can't be assigned to primitives. So
`int[] someArray = null; int l = someArray?.length` and
`int s = params.size ?: 100` don't work. Do
`def someArray = null; def l = someArray?.length` and
`def s = params.size ?: 100` instead.
Relates to #21748
* Scripting: Remove groovy scripting language
Groovy was deprecated in 5.0. This change removes it, along with the
legacy default language infrastructure in scripting.
You can use `Debug.explain(someObject)` in painless to throw an
`Error` that can't be caught by painless code and contains an
object's class. This is useful because painless's sandbox doesn't
allow you to call `someObject.getClass()`.
Closes#20263
Implements a null coalescing operator in painless that looks like `?:`. This form was chosen to emulate Groovy's `?:` operator. It is different in that it only coalesces null values, instead of Groovy's `?:` operator which coalesces all falsy values. I believe that makes it the same as Kotlin's `?:` operator. In other languages this operator looks like `??` (C#) and `COALESCE` (SQL) and `:-` (bash).
This operator is lazy, meaning the right hand side is only evaluated at all if the left hand side is null.
This change was reverted after it caused random test failures. This was
due to a copy/paste error in the original PR which caused the mock
version of ClusterInfoService to be used whenever the mock *ZenPing* was
used, and the real ClusterInfoService to be used when MockZenPing was
not used.
Null safe dereferences make handling null or missing values shorter.
Compare without:
```
if (ctx._source.missing != null && ctx._source.missing.foo != null) {
ctx._source.foo_length = ctx.source.missing.foo.length()
}
```
To with:
```
Integer length = ctx._source.missing?.foo?.length();
if (length != null) {
ctx._source.foo_length = length
}
```
Combining this with the as of yet unimplemented elvis operator allows
for very concise defaults for nulls:
```
ctx._source.foo_length = ctx._source.missing?.foo?.length() ?: 0;
```
Since you have to start somewhere, we started with null safe dereferenes.
Anyway, this is a feature borrowed from groovy. Groovy allows writing to
null values like:
```
def v = null
v?.field = 'cat'
```
And the writes are simply ignored. Painless doesn't support this at this
point because it'd be complex to implement and maybe not all that useful.
There is no runtime cost for this feature if it is not used. When it is
used we implement it fairly efficiently, adding a jump rather than a
temporary variable.
This should also work fairly well with doc values.
Refactored ScriptType to clean up some of the variable and method names. Added more documentation. Deprecated the 'in' ParseField in favor of 'stored' to match the indexed scripts being replaced by stored scripts.
Adds support for indexing into lists and arrays with negative
indexes meaning "counting from the back". So for if
`x = ["cat", "dog", "chicken"]` then `x[-1] == "chicken"`.
This adds an extra branch to every array and list access but
some performance testing makes it look like the branch predictor
successfully predicts the branch every time so there isn't a
in execution time for this feature when the index is positive.
When the index is negative performance testing showed the runtime
is the same as writing `x[x.length - 1]`, again, presumably thanks
to the branch predictor.
Those performance metrics were calculated for lists and arrays but
`def`s get roughly the same treatment though instead of inlining
the test they need to make a invoke dynamic so we don't screw up
maps.
Closes#20870
When compiling many dynamically changing scripts, parameterized
scripts (<https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-scripting-using.html#prefer-params>)
should be preferred. This enforces a limit to the number of scripts that
can be compiled within a minute. A new dynamic setting is added -
`script.max_compilations_per_minute`, which defaults to 15.
If more dynamic scripts are sent, a user will get the following
exception:
```json
{
"error" : {
"root_cause" : [
{
"type" : "circuit_breaking_exception",
"reason" : "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead",
"bytes_wanted" : 0,
"bytes_limit" : 0
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "i",
"node" : "a5V1eXcZRYiIk8lecjZ4Jw",
"reason" : {
"type" : "general_script_exception",
"reason" : "Failed to compile inline script [\"aaaaaaaaaaaaaaaa\"] using lang [painless]",
"caused_by" : {
"type" : "circuit_breaking_exception",
"reason" : "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead",
"bytes_wanted" : 0,
"bytes_limit" : 0
}
}
}
],
"caused_by" : {
"type" : "general_script_exception",
"reason" : "Failed to compile inline script [\"aaaaaaaaaaaaaaaa\"] using lang [painless]",
"caused_by" : {
"type" : "circuit_breaking_exception",
"reason" : "[script] Too many dynamic script compilations within one minute, max: [15/min]; please use on-disk, indexed, or scripts with parameters instead",
"bytes_wanted" : 0,
"bytes_limit" : 0
}
}
},
"status" : 500
}
```
This also fixes a bug in `ScriptService` where requests being executed
concurrently on a single node could cause a script to be compiled
multiple times (many in the case of a powerful node with many shards)
due to no synchronization between checking the cache and compiling the
script. There is now synchronization so that a script being compiled
will only be compiled once regardless of the number of concurrent
searches on a node.
Relates to #19396
GeoDistance is implemented using a crazy enum that causes issues with the scripting modules. This commit moves all distance calculations to arcDistance and planeDistance static methods in GeoUtils. It also removes unnecessary distance helper methods from ScriptDocValues.GeoPoints.