Commit Graph

295 Commits

Author SHA1 Message Date
Rene Groeschke d952b101e6
Replace compile configuration usage with api (7.x backport) (#58721)
* Replace compile configuration usage with api (#58451)

- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
  - required as java library will by default not have build jar file
  - jar file is now explicit input of the task and gradle will ensure its properly build

* Fix compile usages in 7.x branch
2020-06-30 15:57:41 +02:00
Tal Levy 69d5e044af
Add optional description parameter to ingest processors. (#57906) (#58152)
This commit adds an optional field, `description`, to all ingest processors
so that users can explain the purpose of the specific processor instance.

Closes #56000.
2020-06-15 19:27:57 -07:00
Tal Levy 499ad6fcc4
Pre-compile inline scripts in Ingest Script processors (#57960) (#58130)
This commit introduces an optimization for inline scripts.
It keeps the compiled ingest script that the ScriptProcessor.Factory
has been creating for validation purposes. Previously, the Script Service's
cache was leveraged because it was the best way to handle caching of both
stored and inline scripts. Since inline scripts are so widely used in
Ingest Node, it is probably best to ensure we are using the pre-compiled version
from the beginning.
2020-06-15 15:22:56 -07:00
Dan Hermann 8a910443c4
Add ignore_empty_value parameter in set ingest processor (#57030) (#58108) 2020-06-15 08:35:08 -05:00
Jake Landis a370d5eead
[7.x] Ensure Joni warning are logged at debug (#57302) (#57897)
When Joni, the regex engine that powers grok emits a warning it
does so by default to System.err. System.err logs are all bucketed
together in the server log at WARN level. When Joni emits a warning,
it can be extremely verbose, logging a message for each execution
again that pattern. For ingest node that means for every document
that is run that through Grok. Fortunately, Joni provides a call
back hook to push these warnings to a custom location.

This commit implements Joni's callback hook to push the Joni warning
to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor)
at debug level. Generally these warning indicate a possible issue with
the regular expression and upon creation of the Grok processor will
do a "test run" of the expression and log the result (if any) at WARN 
level. This WARN level log should only occur on pipeline creation which 
is a much lower frequency then every document. 

Additionally, the documentation is updated with instructions for how
to set the logger to debug level.
2020-06-09 17:06:29 -05:00
Jake Landis fff0a106c9
[7.x] Support `if_seq_no` and `if_primary_term` for ingest (#55430) (#57768)
Allow for optimistic concurrency control during ingest by checking the
sequence number and primary term. This is accomplished by defining
_if_seq_no and _if_primary_term in the pipeline, similarly to _version
and _version_type.

Closes #41255
Co-authored-by: Maria Ralli <mariai.ralli@gmail.com>
2020-06-09 14:20:26 -05:00
Jack Conradson 35c546b388
Backports for _source bug fix in scripting (#57068)
* Update DeprecationMap to DynamicMap (#56149)

This renames DeprecationMap to DynamicMap, and changes the deprecation
messages Map to accept a Map of String (keys) to Functions (updated values)
instead. This creates more flexibility in either logging or updating values from
params within a script. This change is required to fix (#52103) in a future PR.

* Fix Source Return Bug in Scripting (#56831)

This change ensures that when a user returns _source directly no matter where 
accessed within scripting, the value is a Map of the converted source as 
opposed to a SourceLookup.
2020-05-21 17:07:38 -07:00
Jake Landis a56fb6192e
[7.x] Fix ingest simulate verbose on failure with conditional (#56478) (#56635)
If a conditional is added to a processor, and that processor fails, and 
that processor has an on_failure handler, the full trace of all of the 
executed processors may not be displayed in simulate verbose. The 
information is correct, but misses displaying some of the steps used 
to get there.

This happens because a processor that is conditional processor is a 
wrapper around the real processor and a processor with an on_failure 
handler is also a wrapper around the processor(s). When decorating for 
simulation we treat compound processor specially, but if a compound processor
is wrapped by a conditional processor that compound processor's processors 
can be missed for decoration resulting in the missing displayed steps.

The fix to this is to treat the conditional processor specially and
explicitly seperate it from the processor it is wrapping. This requires
us to keep track of 2 processors a possible conditional processor and
the actual processor it may be wrapping.

related: #56004
2020-05-12 15:41:05 -05:00
Dan Hermann 2061652988
Ensure auto close of HTMLStripCharFilter in HtmlStripProcessor
The HtmlStripProcessor did not use a try-with resources block to ensure
that the used HTMLStripCharFilter is closed.
2020-05-01 17:31:53 -05:00
Przemko Robakowski bf0204ba06
Fix empty_value handling in CsvProcessor (#55649) (#55968)
* Fix empty_value handling in CsvProcessor

Due to bug in `CsvProcessor.Factory` it was impossible to specify `empty_value`.
This change fixes that and adds relevant test.

Closes #55643

* assert changed
2020-04-29 22:37:22 +02:00
Jake Landis a2fafa6af4
[7.x] Lazy test cluster module and plugins (#54852) (#55087)
This change converts the module and plugin parameters
for testClusters to be lazy. Meaning that the values
are not resolved until they are actually used. This
removes the requirement to use project.afterEvaluate to
be able to resolve the bundle artifact.

Note - this does not completely remove the need for afterEvaluate
since it is still needed for the custom resource extension.
2020-04-13 10:53:35 -05:00
William Brafford 958e9d1b78
Refactor nodes stats request builders to match requests (#54363) (#54604)
* Refactor nodes stats request builders to match requests (#54363)

* Remove hard-coded setters from NodesInfoRequestBuilder

* Remove hard-coded setters from NodesStatsRequest

* Use static imports to reduce clutter

* Remove uses of old info APIs
2020-04-01 17:03:04 -04:00
Jason Tedor 5fcda57b37
Rename MetaData to Metadata in all of the places (#54519)
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
2020-03-31 17:24:38 -04:00
Jake Landis db3420d757
[7.x] Optimize which Rest resources are used by the Rest tests… (#53766)
This should help with Gradle's incremental compile such that projects
only depend upon the resources they use.

related #52114
2020-03-19 12:28:59 -05:00
Dan Hermann 94ac979c66
Support array for all string ingest processors (#53694) 2020-03-18 07:07:49 -05:00
Ryan Ernst 5c472fcb47 Upgrade jackson to 2.10.3 and GeoIP to 2.13.1 (#53642)
Re-applies the change from #53523 along with test fixes.

closes #53626
closes #53624
closes #53622
closes #53625

Co-authored-by: Nik Everett <nik9000@gmail.com>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: Jake Landis <jake.landis@elastic.co>
2020-03-17 10:28:51 -07:00
Mark Vieira 2f0aca992b
Revert "Upgrade to Jackson 2.10.3 and GeoIP2 to 2.13.1 (#53576)"
This reverts commit b7dbadeea0.
2020-03-15 18:10:40 -07:00
Jason Tedor b7dbadeea0
Upgrade to Jackson 2.10.3 and GeoIP2 to 2.13.1 (#53576)
This commit upgrades our Jackson dependency to 2.10.3 and our GeoIP2
dependency to 2.13.1.

Relates #53523
2020-03-14 13:28:06 -04:00
Dan Hermann 3c8b46a8c1
[7.x] Handle errors when evaluating if conditions in processors (#52892) 2020-02-27 09:00:51 -06:00
Jay Modi 3edadfefd0 RestHandlers declare handled routes (#52123)
This commit changes how RestHandlers are registered with the
RestController so that a RestHandler no longer needs to register itself
with the RestController. Instead the RestHandler interface has new
methods which when called provide information about the routes
(method and path combinations) that are handled by the handler
including any deprecated and/or replaced combinations.

This change also makes the publication of RestHandlers safe since they
no longer publish a reference to themselves within their constructors.

Closes #51622

Co-authored-by: Jason Tedor <jason@tedor.me>

Backport of #51950
2020-02-09 22:48:32 -07:00
Przemko Robakowski c827f6f440
Avoid clash between source field and header field in CsvProcessorTests (#51962) (#52070)
This change fixes flakiness in `CsvProcessorTests` where source field
can be the same as one of the headers used by tests which messes up
asserts when we check that field is not present after processor run.

Closes #50209
2020-02-07 21:00:39 +01:00
Przemko Robakowski 6332de40b4
Add empty_value parameter to CSV processor (#51567) (#51966)
* Add empty_value parameter to CSV processor

This change adds `empty_value` parameter to the CSV processor.
This value is used to fill empty fields. Fields will be skipped
if this parameter is ommited. This behavior is the same for both
quoted and unquoted fields.

* docs updated

* Fix compilation problem

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-02-05 23:35:52 +01:00
Przemyslaw Gomulka a6d24d6a46
Fix ingest timezone logic backport(#51215) (#51802)
when a timezone is not provided Ingest logic should consider a time to be in a timezone provided as a parameter.
When a timezone is provided Ingest should recalculate a time to the timezone provided as a parameter

closes #51108
backport(#51215)
2020-02-03 14:17:43 +01:00
Przemko Robakowski a7f0c699cf
Fix ignore_missing in CsvProcessor (#51600) (#51609)
This change fixes inverted logic around ignore_missing in CsvProcessor
2020-01-29 14:58:23 +01:00
Przemko Robakowski 3fb7ad0e67
[7.x] Refactor ForEachProcessor to use iteration instead of recursion (#51104) (#51322)
* Refactor ForEachProcessor to use iteration instead of recursion (#51104)

* Refactor ForEachProcessor to use iteration instead of recursion

This change makes ForEachProcessor iterative and still non-blocking.
In case of non-async processors we use single for loop and no recursion at all.
In case of async processors we continue work on either current thread or thread
started by downstream processor, whichever is slower (usually processor thread).
Everything is synchronised by single atomic variable.

Relates #50514

* Update IngestCommonPlugin.java
2020-01-22 20:03:37 +01:00
Martijn van Groningen 02dfd71efa
Backport: Add pipeline name to ingest metadata (#51050)
Backport: #50467

This commit adds the name of the current pipeline to ingest metadata.
This pipeline name is accessible under the following key: '_ingest.pipeline'.

Example usage in pipeline:
PUT /_ingest/pipeline/2
{
    "processors": [
        {
            "set": {
                "field": "pipeline_name",
                "value": "{{_ingest.pipeline}}"
            }
        }
    ]
}

Closes #42106
2020-01-16 10:50:47 +01:00
Jake Landis de6f132887
[7.x] Foreach processor - fork recursive call (#50514) (#50773)
A very large number of recursive calls can cause a stack overflow
exception. This commit forks the recursive calls for non-async
processors. Once forked, each thread will handle at most 10
recursive calls to help keep the stack size and thread count
down to a reasonable size.
2020-01-09 13:21:18 -06:00
Henning Andersen 125feecabc
Guess root cause support unwrap (#50525) (#50742)
ElasticsearchException.guessRootCauses would return wrapper exception if
inner exception was not an ElasticsearchException. Fixed to never return
wrapper exceptions.

At least following APIs change root_cause.0.type as a result:

_update with bad script
_index with bad pipeline

Relates #50417
2020-01-08 19:09:14 +01:00
Alexander Reelsen 71054d269b Sync grok patterns with logstash patterns (#50381)
In order to ensure that logstash and Elasticsearch are able to understand
the same patterns, this commit adapts to changes in logstash, adds a few
patterns and changes a few.
2020-01-08 14:59:34 +01:00
Przemko Robakowski 0efb241b3c
Fix flakiness in CsvProcessorTests (#50254) (#50256)
There's flakiness in CsvProcesorTests, where tests fail if random document generator add field that should not be present. This change cleans generated document from these problematic fields.

Closes #50209
2019-12-17 01:15:15 +01:00
Przemko Robakowski 4619834b97
[7.x] CSV ingest processor (#49509) (#50083)
* CSV ingest processor (#49509)

This change adds new ingest processor that breaks line from CSV file into separate fields.
By default it conforms to RFC 4180 but can be tweaked.

Closes #49113
2019-12-11 23:06:05 +01:00
Martijn van Groningen 0a42395dfa
Backport: add templating support to pipeline processor (#49643)
Backport of #49030

This commit adds templating support to the pipeline processor's `name` option.

Closes #39955
2019-11-27 15:53:40 +01:00
Przemyslaw Gomulka 502873b144
[Java.time] Retain prefixed date pattern in formatter (#48703)
JavaDateFormatter should keep the pattern with the prefixed 8 as it will be used for serialisation. The stripped pattern should be used for the enclosed formatters.

closes #48698
2019-11-27 12:29:18 +01:00
Martijn van Groningen 90850f4ea0
Backport: Introduce on_failure_pipeline ingest metadata inside on_failure block (#49596)
Backport of #49076

In case an exception occurs inside a pipeline processor,
the pipeline stack is kept around as header in the exception.
Then in the on_failure processor the id of the pipeline the
exception occurred is made accessible via the `on_failure_pipeline`
ingest metadata.

Closes #44920
2019-11-27 07:52:08 +01:00
Jason Tedor 71bcfbf1e3
Replace required pipeline with final pipeline (#49470)
This commit enhances the required pipeline functionality by changing it
so that default/request pipelines can also be executed, but the required
pipeline is always executed last. This gives users the flexibility to
execute their own indexing pipelines, but also ensure that any required
pipelines are also executed. Since such pipelines are executed last, we
change the name of required pipelines to final pipelines.
2019-11-22 14:37:36 -05:00
Jason Tedor 2bcdcb17cd
Introduce dedicated ingest processor exception (#48810)
Today we wrap exceptions that occur while executing an ingest processor
in an ElasticsearchException. Today, in ExceptionsHelper#unwrapCause we
only unwrap causes for exceptions that implement
ElasticsearchWrapperException, which the top-level
ElasticsearchException does not. Ultimately, this means that any
exception that occurs during processor execution does not have its cause
unwrapped, and so its status is blanket treated as a 500. This means
that while executing a bulk request with an ingest pipeline,
document-level failures that occur during a processor will cause the
status for that document to be treated as 500. Since that does not give
the client any indication that they made a mistake, it means some
clients will enter infinite retries, thinking that there is some
server-side problem that merely needs to clear. This commit addresses
this by introducing a dedicated ingest processor exception, so that its
causes can be unwrapped. While we could consider a broader change to
unwrap causes for more than just ElasticsearchWrapperExceptions, that is
a broad change with unclear implications. Since the problem of reporting
500s on client errors is a user-facing bug, we take the conservative
approach for now, and we can revisit the unwrapping in a future change.
2019-11-14 11:04:53 -05:00
Rory Hunter c46a0e8708
Apply 2-space indent to all gradle scripts (#49071)
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
2019-11-14 11:01:23 +00:00
Henning Andersen 8835142ac9 Grok processor ignore case test (#48909)
Added test demonstrating that grok using ignore case works, since this
does a minimal test that the `joni` and `jcodings` libraries are
compatible.

Forward-port of test from #43334
2019-11-08 00:04:29 +01:00
Jason Tedor c82ecb664c
Do not wrap ingest processor exception with IAE (#48816)
The problem with wrapping here is that it converts any exception into an
IAE, which we treat as a client error (400 status) whereas the exception
being wrapped here could be a server error (e.g., NPE). This commit
stops wrapping all ingest processor exceptions as IAEs.
2019-11-01 15:11:35 -04:00
Dan Hermann dbc05cd808
Add option to split processor for preserving trailing empty fields (#48685) 2019-10-30 08:25:03 -05:00
Martijn van Groningen b034153df7
Change grok watch dog to be Matcher based instead of thread based. (#48346)
There is a watchdog in order to avoid long running (and expensive)
grok expressions. Currently the watchdog is thread based, threads
that run grok expressions are registered and after completion unregister.
If these threads stay registered for too long then the watch dog interrupts
these threads. Joni (the library that powers grok expressions) has a
mechanism that checks whether the current thread is interrupted and
if so abort the pattern matching.

Newer versions have an additional method to abort long running pattern
matching inside joni. Instead of checking the thread's interrupted flag,
joni now also checks a volatile field that can be set via a `Matcher`
instance. This is more efficient method for aborting long running matches.
(joni checks each 30k iterations whether interrupted flag is set vs.
just checking a volatile field)

Recently we upgraded to a recent joni version (#47374), and this PR
is a followup of that PR.

This change should also fix #43673, since it appears when unit tests
are ran the a test runner thread's interrupted flag may already have
been set, due to some thread reuse.
2019-10-24 15:34:01 +02:00
Przemyslaw Gomulka 6ab58de7ef
[7.x] Enable ResolverStyle.STRICT for java formatters backport(#46675) (#47913)
Joda was using ResolverStyle.STRICT when parsing. This means that date will be validated to be a correct year, year-of-month, day-of-month
However, we also want to make it works with Year-Of-Era as Joda used to, hence custom temporalquery.localdate in DateFormatters.from
Within DateFormatters we use the correct uuuu year instead of yyyy year of era

worth noting: if yyyy(without an era) is used in code, the parsing result will be a TemporalAccessor which will fail to be converted into LocalDate. We mostly use DateFormatters.from so this takes care of this. If possible the uuuu format should be used.
2019-10-11 21:19:56 +02:00
Martijn van Groningen 429f23ea2f
Allow ingest processors to execute in a non blocking manner. (#47122)
Backport of #46241

This PR changes the ingest executing to be non blocking
by adding an additional method to the Processor interface
that accepts a BiConsumer as handler and changing
IngestService#executeBulkRequest(...) to ingest document
in a non blocking fashion iff a processor executes
in a non blocking fashion.

This is the second PR that merges changes made to server module from
the enrich branch (see #32789) into the master branch.

The plan is to merge changes made to the server module separately from
the pr that will merge enrich into master, so that these changes can
be reviewed in isolation.

This change originates from the enrich branch and was introduced there
in #43361.
2019-09-26 08:55:28 +02:00
Jason Tedor bd77626177
Add the ability to require an ingest pipeline (#46847)
This commit adds the ability to require an ingest pipeline on an
index. Today we can have a default pipeline, but that could be
overridden by a request pipeline parameter. This commit introduces a new
index setting index.required_pipeline that acts similarly to
index.default_pipeline, except that it can not be overridden by a
request pipeline parameter. Additionally, a default pipeline and a
request pipeline can not both be set. The required pipeline can be set
to _none to ensure that no pipeline ever runs for index requests on that
index.
2019-09-19 16:37:45 -04:00
Armin Braun a9e1402189
Remove Settings from BaseRestRequest Constructor (#45418) (#45429)
* Resolving the todo, cleaning up the unused `settings` parameter
* Cleaning up some other minor dead code in affected classes
2019-08-12 05:14:45 +02:00
Michael Basnight 89861d0884 Add ingest processor existence helper method (#45156)
This commit adds a helper method to the ingest service allowing it to
inspect a pipeline by id and verify the existence of a processor in the
pipeline. This work exposed a potential bug in that some processors
contain inner processors that are passed in at instantiation. These
processors needed a common way to expose their inner processors, so the
WrappingProcessor was created in order to expose the inner processor.
2019-08-07 11:19:04 -05:00
Ryan Ernst f193d14764
Convert remaining Action Response/Request to writeable.reader (#44528) (#44607)
This commit converts readFrom to ctor with StreamInput on the remaining
ActionResponse and ActionRequest classes.

relates #34389
2019-07-19 13:33:38 -07:00
Tal Levy 38d2ada84f
deprecate Supplier<Response> constructors in HandledTransportAction (#44456) (#44533)
This commit deprecates all constructors of HandledTransportAction
that take in a Supplier instead of a Writeable.Reader for response
objects.

in addition to the deprecation, the following modules were updated to
leverage Writeable

- modules:ingest-common
- modules:lang-mustache

relates #34389.
2019-07-17 22:47:09 -07:00
Julie Tibshirani 34c6067018
Convert several classes in 'server' to Writeable. (#44527)
* Convert FieldCapabilities*.
* Convert MultiTermVectors*.
* Convert SyncedFlush*.
* Convert SearchTemplateRequest.
* Convert MultiSearchTemplateRequest.
* Convert GrokProcessorGet*.
* Remove a stray reference to SearchTemplateRequest#readFrom.

Relates to #34389.
2019-07-17 19:04:21 -07:00
Ryan Ernst 1dcf53465c Reorder HandledTransportAction ctor args (#44291)
This commit moves the Supplier variant of HandledTransportAction to have
a different ordering than the Writeable.Reader variant. The Supplier
version is used for the legacy Streamable, and currently having the
location of the Writeable.Reader vs Supplier in the same place forces
using casts of Writeable.Reader to select the correct super constructor.
This change in ordering allows easier migration to Writeable.Reader.

relates #34389
2019-07-12 13:45:09 -07:00
Ryan Ernst fb77d8f461 Removed writeTo from TransportResponse and ActionResponse (#44092)
The base classes for transport requests and responses currently
implement Streamable and Writeable. The writeTo method on these base
classes is implemented with an empty implementation. Not only does this
complicate subclasses to think they need to call super.writeTo, but it
also can lead to not implementing writeTo when it should have been
implemented, or extendiong one of these classes when not necessary,
since there is nothing to actually implement.

This commit removes the empty writeTo from these base classes, and fixes
subclasses to not call super and in some cases implement an empty
writeTo themselves.

relates #34389
2019-07-10 12:42:04 -07:00
Jake Landis 2dc056b0a0
Read the default pipeline for bulk upsert through an alias (#41963) (#42802)
This commit allows bulk upserts to correctly read the default pipeline
for the concrete index that belongs to an alias.

Bulk upserts are modeled differently from normal index requests such that
the index request is a request inside of the update request. The update
request (outer) contains the index or alias name is not part of the (inner)
index request. This commit adds a secondary check against the update request
(outer) if the index request (inner) does not find an alias.
2019-07-02 20:44:33 -05:00
Ryan Ernst 3a2c698ce0
Rename Action to ActionType (#43778)
Action is a class that encapsulates meta information about an action
that allows it to be called remotely, specifically the action name and
response type. With recent refactoring, the action class can now be
constructed as a static constant, instead of needing to create a
subclass. This makes the old pattern of creating a singleton INSTANCE
both misnamed and lacking a common placement.

This commit renames Action to ActionType, thus allowing the old INSTANCE
naming pattern to be TYPE on the transport action itself. ActionType
also conveys that this class is also not the action itself, although
this change does not rename any concrete classes as those will be
removed organically as they are converted to TYPE constants.

relates #34389
2019-06-30 22:00:17 -07:00
Ryan Ernst 28ab77a023
Add StreamableResponseAction to aid in deprecation of Streamable (#43770)
The Action base class currently works for both Streamable and Writeable
response types. This commit intorduces StreamableResponseAction, for
which only the legacy Action implementions which provide newResponse()
will extend. This eliminates the need for overriding newResponse() with
an UnsupportedOperationException.

relates #34389
2019-06-28 21:40:00 -07:00
Henning Andersen 437d2d6d9f Rename processor test fix (#43035)
If the source field name is a prefix of the target field name, the
source field still exists after rename processor has run. Adjusted test
case to handle that case.
2019-06-10 19:23:05 +02:00
Mark Vieira e44b8b1e2e
[Backport] Remove dependency substitutions 7.x (#42866)
* Remove unnecessary usage of Gradle dependency substitution rules (#42773)

(cherry picked from commit 12d583dbf6f7d44f00aa365e34fc7e937c3c61f7)
2019-06-04 13:50:23 -07:00
emasab a142e8cfd8 Build local year inside DateFormat lambda
bugfix for https://github.com/elastic/elasticsearch/issues/41797 (#42120)

This makes sure that the year can change between when the lambda is generated and when it is executed without causing the incorrect year to be used.

Resolves #41797
2019-05-23 10:36:11 -06:00
Yannick Welsch 770d8e9e39 Remove usage of max_local_storage_nodes in test infrastructure (#41652)
Moves the test infrastructure away from using node.max_local_storage_nodes, allowing us in a
follow-up PR to deprecate this setting in 7.x and to remove it in 8.0.

This also changes the behavior of InternalTestCluster so that starting up nodes will not automatically
reuse data folders of previously stopped nodes. If this behavior is desired, it needs to be explicitly
done by passing the data path from the stopped node to the new node that is started.
2019-05-22 11:04:55 +02:00
Alexander Reelsen 8e33a5292a Add HTML strip processor (#41888)
This processor uses the lucene HTMLStripCharFilter class to remove HTML
entities from a field. This adds to the char filter, so that there is
possibility to store the stripped version as well.

Note, that the characeter filter replaces tags with a newline, so that
the produced HTML will look slightly different than the incoming HTML
with regards to newlines.
2019-05-09 13:01:07 +02:00
Christoph Büscher 52495843cc [Docs] Fix common word repetitions (#39703) 2019-04-25 20:47:47 +02:00
Alpar Torok 25944c4317 convert modules to use testclusters (#40804)
* convert modules to use testclusters
* Eliminate PluginPropertiesTask and move logic in plugin where it belongs
2019-04-04 11:45:40 +03:00
Martijn van Groningen 89837eb918
Remove -Xlint exclusions in the ingest-common module. (#40505)
Fix the generics in processors extending AbstractStringProcessor and its factory.

Relates to #40366
2019-03-29 09:43:36 +01:00
Jake Landis 797d6b8a66
Execute ingest node pipeline before creating the index (#39607) (#39796)
Prior to this commit (and after 6.5.0), if an ingest node changes
the _index in a pipeline, the original target index would be created.
For daily indexes this could create an extra, empty index per day.

This commit changes the TransportBulkAction to execute the ingest node
pipeline before attempting to create the index. This ensures that the 
only index created is the original or one set by the ingest node pipeline. 
This was the execution order prior to 6.5.0 (#32786). 

The execution order was changed in 6.5 to better support default pipelines. 
Specifically the execution order was changed to be able to read the settings
from the index meta data. This commit also includes a change in logic such 
that if the target index does not exist when ingest node pipeline runs, it 
will now pull the default pipeline (if one exists) from the settings of the 
best matched of the index template. 

Relates #32786
Relates #32758 
Closes #36545
2019-03-07 13:31:41 -06:00
Martijn van Groningen b8659fcb83
No need to extend from StatusToXContentObject,
if RestToXContentListener is used instead of RestStatusToXContentListener
2019-03-04 13:29:10 +01:00
Martijn van Groningen 0550ead176
Cleanup GrokProcessorGetAction class (#39567)
* Removed request builder. From 7.0, request builders are no longer used.
* Use RestStatusToXContentListener instead of custom RestBuilderListener in the rest action.
* Changed a few public constructor's and constants' visibility from public to package protected.
  (these are only used internally, so no need to for public visibility)
2019-03-04 08:51:23 +01:00
Ioannis Kakavas ec2b64af63 Disable date parsing test in non english locale (#39052)
This ensures we do not attempt to parse non english locale dates
in FIPS mode. The error, originally assumed to affect only Joda,
affects Java time in the same manner and manifests only with the
version of BouncyCastle FIPS certified provider we use in tests.
The upstream issue https://github.com/bcgit/bc-java/issues/405
indicates that the behavior is resolved in later versions of the
BouncyCastle library and should be tested again when the new
versions become FIPS 140 certified
2019-02-20 09:02:37 +02:00
Alexander Reelsen 884b5063a4 Create ISO8601 joda compatible java time formatter (#38434)
The existing formatter being used was not on par with the joda formatter
as it was missing the ability to parse a comma as a separator between
seconds and milliseconds.

While a real iso8601 would be much more complex, this might be
sufficient for some more use-cases.

The ingest date formatter now also uses the iso8601 formatter by
default.

Closes #38345
2019-02-11 15:11:26 +01:00
Alexander Reelsen 56edc8e37f
Fix timezone fallback in ingest processor (#38407) (#38664)
If no timezone was specified in the date processor, then the conversion
would lead to wrong time, as UTC was assumed by default, leading to
incorrectly parsed dates.

This commit does not assume a default timezone and will thus not format
the dates in a wrong way.
2019-02-09 20:28:59 +01:00
Christoph Büscher 820029522b
Mute DateProcessorTests#testJodaPatternLocale (#38265)
Only fails on FIPS 8, muting this selectively.
2019-02-03 19:52:53 +01:00
Alexander Reelsen 6c5a7387af
Replace joda time in ingest-common module (#38088)
This commit fully replaces any remaining joda time time classes with
java time implementations.

Relates #27330
2019-02-01 10:15:18 +01:00
Henning Andersen 68ed72b923
Handle scheduler exceptions (#38014)
Scheduler.schedule(...) would previously assume that caller handles
exception by calling get() on the returned ScheduledFuture.
schedule() now returns a ScheduledCancellable that no longer gives
access to the exception. Instead, any exception thrown out of a
scheduled Runnable is logged as a warning.

This is a continuation of #28667, #36137 and also fixes #37708.
2019-01-31 17:51:45 +01:00
Tal Levy e0d5de33da fix DateIndexNameProcessorTests offset pattern (#38069)
`XX` was being used to represent an offset pattern,
it should be `ZZ`

Fixes #38067.
2019-01-31 08:57:56 +01:00
Alexander Reelsen b94acb608b
Speed up converting of temporal accessor to zoned date time (#37915)
The existing implementation was slow due to exceptions being thrown if
an accessor did not have a time zone. This implementation queries for
having a timezone, local time and local date and also checks for an
instant preventing to throw an exception and thus speeding up the conversion.

This removes the existing method and create a new one named
DateFormatters.from(TemporalAccessor accessor) to resemble the naming of
the java time ones.

Before this change an epoch millis parser using the toZonedDateTime
method took approximately 50x longer.

Relates #37826
2019-01-31 08:55:40 +01:00
Jason Tedor 89bffc25de
Mute failing date index name processor test
This test is repeatedly failing, so this commit mutes it.

Relates #38067
2019-01-30 20:37:52 -05:00
Colin Goodheart-Smithe 21e392e95e
Removes typed calls from YAML REST tests (#37611)
This PR attempts to remove all typed calls from our YAML REST tests. The PR adds include_type_name: false to create index requests that use a mapping and also to put mapping requests. It also removes _type from index requests where they haven't already been removed. The PR ignores tests named *_with_types.yml since this are specifically testing typed API behaviour.

The change also includes changing the test harness to add the type _doc to index, update, get and bulk requests that do not specify the document type when the test is running against a mixed 7.x/6.x cluster.
2019-01-30 16:32:58 +00:00
David Roberts 2f7776c8b7
Switch default time format for ingest from Joda to Java for v7 (#37934)
Date formats with and without the "8" prefix are now all treated
as Java time formats, so that ingest does the same as mappings
in this respect.
2019-01-30 16:26:28 +00:00
Alexander Reelsen 9e350d027e
Add BWC compatible processing to ingest date processors (#37407)
The ingest date processor is currently only able to parse joda formats.
However it is not using the existing elasticsearch classes but access
joda directly. This means that our existing BWC layer does not notify
the user about deprecated formats. This commit switches to use the
exising Elasticsearch Joda methods to acquire a date format, that
includes the BWC check and the ability to parse java 8 dates.

The date parsing in ingest has also another extra feature, that the
fallback year, when a date format without a year is used, is the current
year, and not 1970 like usual. This is currently not properly supported
in the DateFormatter class. As this is the only case for this feature
and java time can take care of this using the toZonedDateTime() method,
a workaround just for the joda time parser has been created, that can be
removed soon again from 7.0.
2019-01-25 13:50:19 +01:00
Jack Conradson de55b4dfd1
Add types deprecation to script contexts (#37554)
This adds deprecation to _type in the script contexts for ingest and update. 
This adds a DeprecationMap that wraps the ctx Map containing _type for these 
specific contexts.
2019-01-18 09:13:49 -08:00
Jake Landis 195873002b
ingest: compile mustache template only if field includes '{{'' (#37207)
* ingest: compile mustache template only if field includes '{{''

Prior to this change, any field in an ingest node processor that supports
script templates would be compiled as mustache template regardless if they
contain a template or not. Compiling normal text as mustache templates is
harmless. However, each compilation counts against the script compilation
circuit breaker. A large number of processors without any templates or scripts
could un-intuitively trip the too many script compilations circuit breaker.

This change simple checks for '{{' in the text before it attempts to compile.

fixes #37120
2019-01-09 14:47:47 -06:00
Jake Landis 384757deff
ingest: support default pipelines + bulk upserts (#36618)
This commit adds support to enable bulk upserts to use an index's
default pipeline. Bulk upsert, doc_as_upsert, and script_as_upsert
are all supported.

However, bulk script_as_upsert has slightly surprising behavior since
the pipeline is executed _before_ the script is evaluated. This means
that the pipeline only has access the data found in the upsert field
of the script_as_upsert. The non-bulk script_as_upsert (existing behavior)
runs the pipeline _after_ the script is executed. This commit
does _not_ attempt to consolidate the bulk and non-bulk behavior for
script_as_upsert.

This commit also adds additional testing for the non-bulk behavior,
which remains unchanged with this commit.

fixes #36219
2018-12-17 16:25:11 -06:00
Jake Landis 7bf822bbbb
ingest: fix on_failure with Drop processor (#36686)
This commit allows a document to be dropped when a Drop processor
is used in the on_failure fork of the processor chain.

Fixes #36151
2018-12-17 14:10:13 -06:00
Jake Landis 190ac8e9bf
ingest: support default pipeline through an alias (#36231)
This commit allows writes that go through an alias to use the default
pipeline defined on the backing index.

Fixes #35817
2018-12-05 16:25:50 -06:00
Jake Landis 9150a93b1c
ingest: dot_expander_processor prevent null add/append to source document (#35106)
* don't allow null values to be added or appended to the 
source document if the field does not exist.
2018-11-05 17:16:42 -06:00
Alexander Reelsen 409050e8de
Refactor: Remove settings from transport action CTOR (#35208)
As settings are not used in the transport action constructor, this
removes the passing of the settings in all the transport actions.
2018-11-05 13:08:18 +01:00
Alpar Torok 59536966c2
Add a new "contains" feature (#34738)
The contains syntax was added in #30874 but the skips were not properly
put in place.
The java runner has the feature so the tests will run as part of the
build, but language clients will be able to support it at their own
pace.
2018-10-25 08:50:50 +03:00
Jake Landis 89dc07bdd9
ingest: better support for conditionals with simulate?verbose (#34155)
This commit introduces two corrections to the way simulate?verbose
handles conditionals on processors.

1) Prior to this change when executing simulate?verbose for
processors with conditionals that evaluate to false, that processor
would still be displayed in the result set. What was displayed was
correct, such that no changes to the document occurred. However, if the
conditional evaluates to false, the processor should not even be
displayed.

2) Prior to this change when executing simulate?verbose for
pipeline processors with conditionals, the individual steps would no
longer be displayed. Commit e37e5df addressed the issue, but
failed account for a conditional on the pipeline processor. Since
a pipeline processor can introduce cycles and is effectively a
single processor that encapsulates multiple other processors that
are potentially guarded by a single conditional, special handling is
needed to for pipeline and conditional pipeline processors.
2018-10-23 11:33:48 -05:00
Armin Braun 8e155b8430
INGEST: Rename Pipeline Processor Param. (#34733)
* `name` is more readable/ergnomic than having `pipeline` twice
2018-10-23 13:43:26 +02:00
Ryan Ernst 8734540345
Ensure map keys cannot be self referencing (#34569)
This commit improves self reference checking to map keys, as well as
adds it to ingest script processing.
2018-10-17 15:16:13 -07:00
Ryan Ernst a2c941806b
Tests: Add support for custom contexts to mock scripts (#34100)
This commit adds the ability to plug in compilation of custom contexts
in mock script engine. This is needed for testing plugins which add
custom contexts like watcher.
2018-09-27 12:23:59 -07:00
Armin Braun 0ba1855740
INGEST: Tests for Drop Processor (#33430)
* INGEST: Tests for Drop Processor

* UT for behavior of dropped callback
and drop processor
   * Moved drop processor to `server`
project to enable this test
* Simple IT
* Relates #32278
2018-09-25 19:29:22 +02:00
Jake Landis e37e5dfc04
ingest: support simulate with verbose for pipeline processor (#33839)
* ingest: support simulate with verbose for pipeline processor

This change better supports the use of simulate?verbose with the
pipeline processor. Prior to this change any pipeline processors
executed with simulate?verbose would not show all intermediate 
processors for the inner pipelines.

This changes also moves the PipelineProcess and TrackingResultProcessor
classes to enable instance checks and to avoid overly public classes.
As well this updates the error message for when cycles are detected
in pipelines calling other pipelines.
2018-09-20 08:33:07 -05:00
Armin Braun ef1066d7f8
INGEST: Allow Repeated Invocation of Pipeline (#33419)
* Allows repeated, non-recursive invocation
of the same pipeline
2018-09-05 22:04:53 +02:00
Armin Braun 46774098d9
INGEST: Implement Drop Processor (#32278)
* INGEST: Implement Drop Processor
* Adjust Processor API
* Implement Drop Processor
* Closes #23726
2018-09-05 14:25:29 +02:00
Armin Braun cc4d7059bf
Ingest: Add conditional per processor (#32398)
* Ingest: Add conditional per processor
* closes #21248
2018-08-30 03:46:39 +02:00
Armin Braun f690b492e7
INGEST: Add Pipeline Processor (#32473)
* INGEST: Add Pipeline Processor

* Adds Processor capable of invoking other pipelines
* Closes #31842
2018-08-29 11:03:10 +02:00
Jake Landis e9b0807c67
ingest: minor - update test to include dissect (#33211)
This change also includes placing the bytes processor in the correct
order (helps to avoid merge conflict when back patching processors)
2018-08-28 11:55:04 -07:00
Jake Landis 79b507dbf5
ingest: Introduce the dissect processor (#32884)
* ingest: Introduce the dissect processor

The ingest node dissect processor is an alternative to Grok
to split a string based on a pattern. Dissect differs from
Grok such that regular expressions are not used to split the
string.

Dissect can be used to parse a source text field with a
simpler pattern, and is often faster the Grok for basic string
parsing. This processor uses the dissect library which
does most of the work.
2018-08-28 07:11:20 -07:00
Armin Braun 986c55b830
INGEST: Add Configuration Except. Data to Metdata (#32322)
* closes #27728
2018-08-15 19:02:19 +02:00
Armin Braun be31cc642b
INGEST: Enable default pipelines (#32286)
* INGEST: Enable default pipelines

* Add `default_pipeline` index setting
* `_none` is interpreted as no pipeline
* closes #21101
2018-08-02 17:11:12 +02:00
Armin Braun cf7489899a
INGEST: Clean up Java8 Stream Usage (#32059)
* GrokProcessor: Rationalize the loop over the map to save allocations and indirection
* IngestDocument: Rationalize way we append to `List`
2018-07-30 21:25:30 +02:00