to specify a `targetField`. This results in some interesting behavior that was missed in the review.
This processor sorts in-place, so there is a side-effect in both the original field and the target field.
Another bug was that the targetField was not being set if the list being sorted was fewer than two elements.
The new behavior works like this: If targetField and fieldName are not the same, we copy the list.
This commit adds back "id" as the key within a script to specify a
stored script (which with file scripts now gone is no longer ambiguous).
It also adds "source" as a replacement for "code". This is in an attempt
to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.
This PR enables Ingest plugins to leverage processor-scoped REST
endpoints. First of which being the Grok endpoint that retrieves
Grok Patterns for users to retrieve all the built-in patterns.
Example usage: Kibana Grok Autocomplete!
Unknown patterns used to silently be ignored. This was a problem because users did not know they were providing an invalid pattern name, and maybe thought the rest of their regexes were invalid.
Fixes#22831.
REST handlers that require a body will throw an an ElasticsearchParseException "request body required".
REST handlers that require a body OR source param will throw an ElasticsearchParseException "request body or source param required".
Replaced asserts in BulkRequest parsing code with a more descriptive IllegalArgumentException if the line contains an empty object.
Updated bulk REST test to verify an empty action line is rejected properly.
Updated BulkRequestTests with randomized testing for an empty action line.
Used try-with-resouces for XContentParser in AbstractBulkByQueryRestHandler.
DateProcessor's DateFormat UNIX format parser resulted in
a floating point rounding error when parsing certain stringed
epoch times. Now Double.parseDouble is used, preserving the
intented input.
This commit renames the concept of the "compiled type" to a "factory
type", along with all implementations of this class to be named Factory.
This brings it inline with the classes purpose.
This is a simple refactoring to move the context definitions into the
type that they use. While we have multiple context names for the same
class at the moment, this will eventually become one ScriptContext per
instance type, so the pattern of a static member on the interface called
CONTEXT can be used. This commit also moves the consolidated list of
contexts provided by core ES into ScriptModule.
This commit modifies the compile method of ScriptService to be context
aware. The ScriptContext is now a generic class which contains both the
instance type and compiled type for a script. Instance type may be
stateful (for example, pre loading field information for the index a
script will execute on, like in expressions), while the compiled type is
stateless and used to construct instance type instances. This change is
only a first step to cutover ScriptService to the new paradigm. It only
converts callers to the script service, and has a small shim to wrap
compilation from the script engines to support the current two fixed
instance types, SearchScript and ExecutableScript.
As we work towards contexts implying the return type of compilation, we
first need ScriptContext to not be an enum. This commit removes the
Standard enum and Plugin subclass of ScriptContext.
This commit renames all rest test files to use the .yml extension
instead of .yaml. This way the extension used within all of
elasticsearch for yaml is consistent.
When constructing an array list, if we know the size of the list in
advance (because we are adding objects to it derived from another list),
we should size the array list to the appropriate capacity in advance (to
avoid resizing allocations). This commit does this in various places.
Relates #24439
The one argument ctor for `Script` creates a script with the
default language but most usages of are for testing and either
don't care about the language or are for use with
`MockScriptEngine`. This replaces most usages of the one argument
ctor on `Script` with calls to `ESTestCase#mockScript` to make
it clear that the tests don't need the default scripting language.
I've also factored out some copy and pasted script generation
code into a single place. I would have had to change that code
to use `mockScript` anyway, so it was easier to perform the
refactor.
Relates to #16314
ScriptService has two executable methods, one which takes a
CompiledScript, which is similar to search, and one that takes a raw
Script and both compiles and returns an ExecutableScript for it. The
latter is not needed, and the call sites which used one or the other
were mixed. This commit removes the extra executable method in favor of
callers first calling compile, then executable.
This change simplifies how the rest test runner finds test files and
removes all leniency. Previously multiple prefixes and suffixes would
be tried, and tests could exist inside or outside of the classpath,
although outside of the classpath never quite worked. Now only classpath
tests are supported, and only one resource prefix is supported,
`/rest-api-spec/tests`.
closes#20240
This commit renames the random ASCII helper methods in ESTestCase. This
is because this method ultimately uses the random ASCII methods from
randomized runner, but these methods actually only produce random
strings generated from [a-zA-Z].
Relates #23886
This commit upgrades the checkstyle configuration from version 5.9 to
version 7.5, the latest version as of today. The main enhancement
obtained via this upgrade is better detection of redundant modifiers.
Relates #22960
Currently, stored scripts use a namespace of (lang, id) to be put, get, deleted, and executed. This is not necessary since the lang is stored with the stored script. A user should only have to specify an id to use a stored script. This change makes that possible while keeping backwards compatibility with the previous namespace of (lang, id). Anywhere the previous namespace is used will log deprecation warnings.
The new behavior is the following:
When a user specifies a stored script, that script will be stored under both the new namespace and old namespace.
Take for example script 'A' with lang 'L0' and data 'D0'. If we add script 'A' to the empty set, the scripts map will be ["A" -- D0, "A#L0" -- D0]. If a script 'A' with lang 'L1' and data 'D1' is then added, the scripts map will be ["A" -- D1, "A#L1" -- D1, "A#L0" -- D0].
When a user deletes a stored script, that script will be deleted from both the new namespace (if it exists) and the old namespace.
Take for example a scripts map with {"A" -- D1, "A#L1" -- D1, "A#L0" -- D0}. If a script is removed specified by an id 'A' and lang null then the scripts map will be {"A#L0" -- D0}. To remove the final script, the deprecated namespace must be used, so an id 'A' and lang 'L0' would need to be specified.
When a user gets/executes a stored script, if the new namespace is used then the script will be retrieved/executed using only 'id', and if the old namespace is used then the script will be retrieved/executed using 'id' and 'lang'
Beforehand, the DateProcessor constructs its joda pattern formatter during processor
construction. This led to newly ingested documents being defaulted to
the year that the pipeline was constructed, not that of processing.
Fixes#22547.
* Remove a checked exception, replacing it with `ParsingException`.
* Remove all Parser classes for the yaml sections, replacing them with static methods.
* Remove `ClientYamlTestFragmentParser`. Isn't used any more.
* Remove `ClientYamlTestSuiteParseContext`, replacing it with some static utility methods.
I did not rewrite the parsers using `ObjectParser` because I don't think it is worth it right now.
Introduces `XContentParser#namedObject which works a little like
`StreamInput#readNamedWriteable`: on startup components register
parsers under names and a superclass. At runtime we look up the
parser and call it to parse the object.
Right now the parsers take a context object they use to help with
the parsing but I hope to be able to eliminate the need for this
context as most what it is used for at this point is to move
around parser registries which should be replaced by this method
eventually. I make no effort to do so in this PR because it is
big enough already. This is meant to the a start down a road that
allows us to remove classes like `QueryParseContext`,
`AggregatorParsers`, `IndicesQueriesRegistry`, and
`ParseFieldRegistry`.
The goal here is to reduce the amount of plumbing required to
allow parsing pluggable things. With this you don't have to pass
registries all over the place. Instead you must pass a super
registry to fewer places and use it to wrap the reader. This is
the same tradeoff that we use for NamedWriteable and it allows
much, much simpler binary serialization. We think we want that
same thing for xcontent serialization.
The only parsing actually converted to this method is parsing
`ScoreFunctions` inside of `FunctionScoreQuery`. I chose this
because it is relatively self contained.
The JSON processor has an optional field called "target_field".
If you don't specify target_field then target_field becomes what you specified as "field".
There isn't anyway to add the fields to the root of a document. By
setting `add_to_root`, now serialized fields will be inserted into the
top-level fields of the ingest document.
Closes#21898.
Inline scripts defined in Ingest Pipelines are now compiled at creation time to preemptively catch errors on initialization of the pipeline.
Fixes#21842.
Grok was originally ignoring potential matches to named-capture groups
larger than one. For example, If you had two patterns containing the
same named field, but only the second pattern matched, it would fail to
pick this up.
This PR fixes this by exploring all potential places where a
named-capture was used and chooses the first one that matched.
Fixes#22117.
For the record, I also had to remove the geo-hash cell and geo-distance range
queries to make the code compile. These queries already throw an exception in
all cases with 5.x indices, so that does not hurt any more.
I also had to rename all 2.x bwc indices from `index-${version}` to
`unsupported-${version}` to make `OldIndexBackwardCompatibilityIT`
happy.
There is an issue in the Grok Processor, where trace_match: true does not inject the _ingest._grok_match_index into the ingest-document when there is just one pattern provided. This is due to an optimization in the regex construction. This commit adds a check for when this is the case, and injects a static index value of "0", since there is only one pattern matched (at the first index into the patterns).
To make this clearer, more documentation was added to the grok-processor docs.
Fixes#21371.
At one point in the past when moving out the rest tests from core to
their own subproject, we had multiple test classes which evenly split up
the tests to run. However, we simplified this and went back to a single
test runner to have better reproduceability in tests. This change
removes the remnants of that multiplexing support.
The invalid ingest configuration field name used to show itself,
even when it was null, in error messages. Sometimes this does not make
sense.
e.g.
```[null] Only one of [file], [id], or [inline] may be configure```
vs.
```Only one of [file], [id], or [inline] may be configure```
The above deals with three fields, therefore this no one property
responsible.
Instead of get, set and remove we do get, remove and then set to avoid type conflicts in IngestDocument.
If the set still fails we try to restore the original field in ingest document.
Closes#19892