Commit Graph

154 Commits

Author SHA1 Message Date
Guillaume Le Floch 3f6d80aa66 Allow removing multiple fields in ingest processor (#24750)
* Allow removing multiple fields in ingest processor

* Iteration 2

* Few fixes
2017-06-08 13:17:44 -07:00
Tal Levy d6d0c13bd6 fix grok's pattern parsing to validate pattern names in expression (#25063)
Unknown patterns used to silently be ignored. This was a problem because users did not know they were providing an invalid pattern name, and maybe thought the rest of their regexes were invalid.

Fixes #22831.
2017-06-06 08:07:53 -07:00
Tal Levy e51246023a add `exclude_keys` option to KeyValueProcessor (#24876)
and modify data-structure of `include_keys` and `exclude_keys` to be
backed by a HashSet
2017-06-05 14:12:48 -07:00
Tal Levy 2a6e6866bd Fix floating-point error when DateProcessor parses UNIX (#24947)
DateProcessor's DateFormat UNIX format parser resulted in
a floating point rounding error when parsing certain stringed
epoch times. Now Double.parseDouble is used, preserving the
intented input.
2017-05-30 09:42:26 -07:00
Ryan Ernst 74e031e842 Scripting: Rename CompiledType to FactoryType in ScriptContext (#24897)
This commit renames the concept of the "compiled type" to a "factory
type", along with all implementations of this class to be named Factory.
This brings it inline with the classes purpose.
2017-05-26 00:02:54 -07:00
Ryan Ernst 8aaea51a0a Scripting: Move context definitions to instance type classes (#24883)
This is a simple refactoring to move the context definitions into the
type that they use. While we have multiple context names for the same
class at the moment, this will eventually become one ScriptContext per
instance type, so the pattern of a static member on the interface called
CONTEXT can be used. This commit also moves the consolidated list of
contexts provided by core ES into ScriptModule.
2017-05-25 12:18:45 -07:00
Ryan Ernst 1daacd97b0 Scripting: Add instance and compiled classes to script contexts (#24868)
This commit modifies the compile method of ScriptService to be context
aware. The ScriptContext is now a generic class which contains both the
instance type and compiled type for a script. Instance type may be
stateful (for example, pre loading field information for the index a
script will execute on, like in expressions), while the compiled type is
stateless and used to construct instance type instances. This change is
only a first step to cutover ScriptService to the new paradigm. It only
converts callers to the script service, and has a small shim to wrap
compilation from the script engines to support the current two fixed
instance types, SearchScript and ExecutableScript.
2017-05-24 14:29:02 -07:00
Ryan Ernst 52d504bb5f Scripting: Simplify ScriptContext (#24818)
As we work towards contexts implying the return type of compilation, we
first need ScriptContext to not be an enum. This commit removes the
Standard enum and Plugin subclass of ScriptContext.
2017-05-22 13:11:15 -07:00
Ryan Ernst 463fe2f4d4 Scripting: Remove file scripts (#24627)
This commit removes file scripts, which were deprecated in 5.5.

closes #21798
2017-05-17 14:42:25 -07:00
Koen De Groote 878ae8eb3c Size lists in advance when known
When constructing an array list, if we know the size of the list in
advance (because we are adding objects to it derived from another list),
we should size the array list to the appropriate capacity in advance (to
avoid resizing allocations). This commit does this in various places.

Relates #24439
2017-05-12 10:36:13 -04:00
javanna e875f7f72e remove duplicated import in AppendProcessor 2017-05-08 10:36:36 +02:00
Ryan Ernst 473e98981b Scripts: Remove unnecessary executable shortcut (#24264)
ScriptService has two executable methods, one which takes a
CompiledScript, which is similar to search, and one that takes a raw
Script and both compiles and returns an ExecutableScript for it. The
latter is not needed, and the call sites which used one or the other
were mixed. This commit removes the extra executable method in favor of
callers first calling compile, then executable.
2017-04-21 17:53:03 -07:00
Jason Tedor 9a0b216c36 Upgrade checkstyle to version 7.5
This commit upgrades the checkstyle configuration from version 5.9 to
version 7.5, the latest version as of today. The main enhancement
obtained via this upgrade is better detection of redundant modifiers.

Relates #22960
2017-02-03 09:46:44 -05:00
Jack Conradson 3d2626c4c6 Change Namespace for Stored Script to Only Use Id (#22206)
Currently, stored scripts use a namespace of (lang, id) to be put, get, deleted, and executed. This is not necessary since the lang is stored with the stored script. A user should only have to specify an id to use a stored script. This change makes that possible while keeping backwards compatibility with the previous namespace of (lang, id). Anywhere the previous namespace is used will log deprecation warnings.

The new behavior is the following:

When a user specifies a stored script, that script will be stored under both the new namespace and old namespace.

Take for example script 'A' with lang 'L0' and data 'D0'. If we add script 'A' to the empty set, the scripts map will be ["A" -- D0, "A#L0" -- D0]. If a script 'A' with lang 'L1' and data 'D1' is then added, the scripts map will be ["A" -- D1, "A#L1" -- D1, "A#L0" -- D0].

When a user deletes a stored script, that script will be deleted from both the new namespace (if it exists) and the old namespace.

Take for example a scripts map with {"A" -- D1, "A#L1" -- D1, "A#L0" -- D0}. If a script is removed specified by an id 'A' and lang null then the scripts map will be {"A#L0" -- D0}. To remove the final script, the deprecated namespace must be used, so an id 'A' and lang 'L0' would need to be specified.

When a user gets/executes a stored script, if the new namespace is used then the script will be retrieved/executed using only 'id', and if the old namespace is used then the script will be retrieved/executed using 'id' and 'lang'
2017-01-31 13:27:02 -08:00
Tal Levy e9a68b3287 fix date-processor to a new default year for every new pipeline execution. (#22601)
Beforehand, the DateProcessor constructs its joda pattern formatter during processor
construction. This led to newly ingested documents being defaulted to
the year that the pipeline was constructed, not that of processing.

Fixes #22547.
2017-01-25 15:09:07 -08:00
Tal Levy e6fb3a5d95 fix index out of bounds error in KV Processor (#22288)
- checks for index-out-of-bounds
- added unit tests for failed `field_split` and `value_split` scenarios

missed this test in #22272.
2016-12-27 10:57:11 -08:00
Tal Levy c53b2ee9cd introduce KV Processor in Ingest Node (#22272)
Now you can parse field values of the `key=value` variety and have
`key` be inserted as a field name in an ingest document.

Closes #22222.
2016-12-20 13:26:17 -08:00
Nik Everett a04dcfb95b Introduce XContentParser#namedObject (#22003)
Introduces `XContentParser#namedObject which works a little like
`StreamInput#readNamedWriteable`: on startup components register
parsers under names and a superclass. At runtime we look up the
parser and call it to parse the object.

Right now the parsers take a context object they use to help with
the parsing but I hope to be able to eliminate the need for this
context as most what it is used for at this point is to move
around parser registries which should be replaced by this method
eventually. I make no effort to do so in this PR because it is
big enough already. This is meant to the a start down a road that
allows us to remove classes like `QueryParseContext`,
`AggregatorParsers`, `IndicesQueriesRegistry`, and
`ParseFieldRegistry`.

The goal here is to reduce the amount of plumbing required to
allow parsing pluggable things. With this you don't have to pass
registries all over the place. Instead you must pass a super
registry to fewer places and use it to wrap the reader. This is
the same tradeoff that we use for NamedWriteable and it allows
much, much simpler binary serialization. We think we want that
same thing for xcontent serialization.

The only parsing actually converted to this method is parsing
`ScoreFunctions` inside of `FunctionScoreQuery`. I chose this
because it is relatively self contained.
2016-12-20 11:05:24 -05:00
Tal Levy bb37167946 Enables the ability to inject serialized json fields into root of document. (#22179)
The JSON processor has an optional field called "target_field".
If you don't specify target_field then target_field becomes what you specified as "field".
There isn't anyway to add the fields to the root of a document. By
setting `add_to_root`, now serialized fields will be inserted into the
top-level fields of the ingest document.

Closes #21898.
2016-12-16 10:17:27 -08:00
Tal Levy eaf82a6e7e compile ScriptProcessor inline scripts when creating ingest pipelines (#21858)
Inline scripts defined in Ingest Pipelines are now compiled at creation time to preemptively catch errors on initialization of the pipeline.

Fixes #21842.
2016-12-14 17:26:51 -08:00
Tal Levy f56097b57a Fixes GrokProcessor's ignorance of named-captures with same name. (#22131)
Grok was originally ignoring potential matches to named-capture groups
larger than one. For example, If you had two patterns containing the
same named field, but only the second pattern matched, it would fail to
pick this up.

This PR fixes this by exploring all potential places where a
named-capture was used and chooses the first one that matched.

Fixes #22117.
2016-12-13 13:19:55 -08:00
Tal Levy 6796464f16 add `ignore_missing` option to SplitProcessor (#20982)
Closes #20840.
2016-11-16 15:46:09 +02:00
Tal Levy 04b712bdc5 fix trace_match behavior for when there is only one grok pattern (#21413)
There is an issue in the Grok Processor, where trace_match: true does not inject the _ingest._grok_match_index into the ingest-document when there is just one pattern provided. This is due to an optimization in the regex construction. This commit adds a check for when this is the case, and injects a static index value of "0", since there is only one pattern matched (at the first index into the patterns).

To make this clearer, more documentation was added to the grok-processor docs.

Fixes #21371.
2016-11-16 15:41:54 +02:00
Jack Conradson aeb97ff412 Clean up of Script.
Closes #21321
2016-11-10 09:59:13 -08:00
Jack Conradson 512a77a633 Refactor ScriptType to be a top-level class. 2016-10-26 10:21:22 -07:00
Tal Levy 38c650f376 make painless the default scripting language for ScriptProcessor (#20981)
- fixes a bug in the docs that mentions `lang` as optional
- now `lang` defaults to "painless"
2016-10-18 16:22:01 -07:00
Martijn van Groningen 55dce523c2 docs: marked `foreach` processor as experimental
Closes #19602
2016-09-30 12:23:42 +02:00
Tal Levy 92ab44d35c [fix] JSON Processor was not properly added (#20613) 2016-09-28 23:04:22 +02:00
Tal Levy 9f1f5fdedc introduce the JSON Processor (#20128)
introduce the JSON Processor
2016-09-09 14:34:32 -07:00
Tal Levy dda32545bb add ignore_missing option to relevant processors (#20194) 2016-09-09 12:20:18 -07:00
Martijn van Groningen 6f6d17dc9c ingest: Add `dot_expander` processor that can turn fields with dots in the field name into object fields. 2016-09-05 07:28:38 +02:00
Martijn van Groningen 1925813e09 ingest: Fix rename processor change rename leaf fields into branch fields
Instead of get, set and remove we do get, remove and then set to avoid type conflicts in IngestDocument.
If the set still fails we try to restore the original field in ingest document.

Closes #19892
2016-08-30 07:38:01 +02:00
Martijn van Groningen 48926b4d66 ingest: don't render template twice for append processor 2016-08-26 18:07:32 +02:00
Igor Motov b36fbc4452 Add support for parameters to the script ingest processor
The script processor should support `params` to be consistent with all other script consumers.
2016-08-24 16:49:48 -04:00
Tal Levy 84bf24b1e9 remove ability to set field value in script-processor configuration (#19981) 2016-08-15 10:57:39 -07:00
Martijn van Groningen 7b36a72ccb fixed compiler warnining and removed unused imports 2016-07-29 15:06:49 +02:00
Martijn van Groningen 24d7fa6d54 ingest: Change the `foreach` processor to use the `_ingest._value` ingest metadata attribute to store the current array element being processed.
Closes #19592
2016-07-27 09:35:09 +02:00
Tal Levy f7cd86ef6d rethrow script compilation exceptions into ingest configuration exceptions (#19318)
* rethrow script compilation exceptions into ingest configuration exceptions
* update readProcessor to rethrow any exception as an ElasticsearchException
2016-07-20 10:37:56 -07:00
Martijn van Groningen d0069f0fbb Provide access to ThreadContext in ingest plugins
Also introduced a `Processor.Parameters` class that is holder for several services processors rely on,
the  IngestPlugin#getProcessors(...) method has been changed to accept `Processor.Parameters` instead
of each service seperately.
2016-07-15 08:16:15 +02:00
Tal Levy 8fd01554bc update foreach processor to only support one applied processor. (#19402)
Closes #19345.
2016-07-13 13:13:00 -07:00
Ryan Ernst 2fc41adeb5 Merge branch 'master' into ingest_plugin_api 2016-07-05 20:53:03 -07:00
Tanguy Leroux 0e7faf1005 Enable Checkstyle RedundantModifier 2016-07-04 15:22:12 +02:00
Ryan Ernst e5caadc4f3 Merge branch 'master' into ingest_plugin_api 2016-07-01 12:35:26 -07:00
Ryan Ernst 65c9b0b588 Merge branch 'master' into ingest_plugin_api 2016-07-01 09:26:17 -07:00
Tanguy Leroux 8c40b2b54e Fix order of modifiers 2016-07-01 16:57:14 +02:00
Ryan Ernst e4f265eb3a Ingest: Remove generics from Processor.Factory
The factory for ingest processor is generic, but that is only for the
return type of the create mehtod. However, the actual consumer of the
factories only cares about Processor, so generics are not needed.

This change removes the generic type from the factory. It also removes
AbstractProcessorFactory which only existed in order pull the optional
tag from config. This functionality is moved to the caller of the
factories in ConfigurationUtil, and the create method now takes the tag.
This allows the covariant return of the implementation to work with
tests not needing casts.
2016-06-30 02:33:54 -07:00
Ryan Ernst 08b3b6264e Tests pass, started removing generics from processor factory 2016-06-30 01:49:22 -07:00
Ryan Ernst f4519c44b7 Merge branch 'master' into ingest_plugin_api 2016-06-29 22:38:23 -07:00
Ryan Ernst ecf6101798 Scripts: Remove ClusterState from compile api
Stored scripts are pulled from the cluster state, and the current api
requires passing the ClusterState on each call to compile. However, this
means every user of the ScriptService needs to depend on the
ClusterService. Instead, this change makes the ScriptService a
ClusterStateListener. It also simplifies tests a lot, as they no longer
need to create fake cluster states (except when testing stored scripts).
2016-06-28 13:20:00 -07:00
Ryan Ernst 258c3e86ab Added IngestPlugin api, cutover common and geoip, changed ingest factory
api to take ProcessorsRegistry
2016-06-28 10:52:07 -07:00
Martijn van Groningen 82f7bfad98 ingest: merged o.e.ingest.core with o.e.ingest and in ingest-common module added o.e.ingest.common package
and moved all code to that package.
2016-06-21 09:24:00 +02:00
Ryan Ernst a4503c2aed Plugins: Remove name() and description() from api
In 2.0 we added plugin descriptors which require defining a name and
description for the plugin. However, we still have name() and
description() which must be overriden from the Plugin class. This still
exists for classpath plugins. But classpath plugins are mainly for
tests, and even then, referring to classpath plugins with their class is
a better idea. This change removes name() and description(), replacing
the name for classpath plugins with the full class name.
2016-06-15 17:12:22 -07:00
Tal Levy a26260fb72 new ScriptProcessor for Ingest (#18193)
add new ScriptProcessor for executing ES Scripts within pipelines
2016-06-15 14:57:18 -07:00
Martijn van Groningen f611f1c99e ingest: Move processors from core to ingest-common module.
Folded grok processor into ingest-common module.

The rest tests have been moved to ingest-common module as well, because these tests don't run in the rest-api-spec module but in the distribution:integ-test-zip module
and adding a test plugin there felt just wrong to me. I think this is ok. I left a tiny ingest rest test behind in that tests with an empty pipeline.

Removed messy tests, these tests were already covered in the rest tests

Added ingest test plugin in test infra so that each module testing integration with ingest doesn't need write its own plugin

Moved reindex ingest tests to qa module

Closes #18490
2016-06-07 17:32:52 +02:00