Commit Graph

171 Commits

Author SHA1 Message Date
javanna 57d6971252 Streamline support for get/set/remove of metadata fields and ingest metadata fields
Unify metadata map and source, and also add support for the _ingest prefix. Depending on the prefix (_source, none, or _ingest), we figure out which map to use for value retrieval as well as for modifications.
2015-12-09 18:36:44 +01:00
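A minimal sketch of the prefix-based lookup this commit message describes, assuming flat keys only. The class `PrefixResolutionSketch` and its method names are hypothetical and are not taken from the plugin; the real IngestDocument also handles nested paths.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not the plugin's IngestDocument.
final class PrefixResolutionSketch {
    private static final String SOURCE_PREFIX = "_source.";
    private static final String INGEST_PREFIX = "_ingest.";

    // Source and metadata fields share one map; ingest metadata has its own map.
    private final Map<String, Object> sourceAndMetadata = new HashMap<>();
    private final Map<String, Object> ingestMetadata = new HashMap<>();

    Object get(String path) {
        return mapFor(path).get(keyFor(path));
    }

    void set(String path, Object value) {
        mapFor(path).put(keyFor(path), value);
    }

    void remove(String path) {
        mapFor(path).remove(keyFor(path));
    }

    // "_ingest." selects the ingest metadata map; "_source." or no prefix selects the unified map.
    private Map<String, Object> mapFor(String path) {
        return path.startsWith(INGEST_PREFIX) ? ingestMetadata : sourceAndMetadata;
    }

    // Strips the prefix, if any, so the remainder can be used as the key.
    private static String keyFor(String path) {
        if (path.startsWith(INGEST_PREFIX)) {
            return path.substring(INGEST_PREFIX.length());
        }
        if (path.startsWith(SOURCE_PREFIX)) {
            return path.substring(SOURCE_PREFIX.length());
        }
        return path;
    }
}
```

In this sketch `foo` and `_source.foo` address the same entry, while `_ingest.timestamp` is kept apart from both.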
javanna 744d2908a8 avoid null values in simulate serialization prototypes, use empty maps instead 2015-12-09 18:36:44 +01:00
javanna 6b7446beb9 Remove sourceModified flag from IngestDocument
If one is using the ingest plugin and providing a pipeline id with the request, the chance that the source is going to be modified is 99%. We shouldn't worry about keeping track of whether something changed. That seemed useful at first so we can save the resources for setting back the source (map to bytes) when not needed. Also, we are trying to unify metadata fields and source in the same map and that is going to complicate how we keep track of changes that happen in the source only. Best solution is to remove the flag.
2015-12-09 18:36:43 +01:00
javanna b0d7d604ff Add support for transient metadata to IngestDocument
IngestDocument now holds an additional map of transient metadata. The only field that gets added automatically is `timestamp`, which contains the timestamp of ingestion in ISO8601 format. In the future it will be possible to add or modify these fields, which will not get indexed but will be available via templates to all of the processors.

Transient metadata will be shown by the simulate api, although it will never get indexed. Moved WriteableIngestDocument to the simulate package as it's only used by simulate and is now modelled for that specific use case.

Also took the chance to remove one IngestDocument constructor used only for testing (accepting only a subset of es metadata fields). While doing that, introduced some more randomization into some existing processor tests.

Closes #15036
2015-12-09 18:36:01 +01:00
javanna 5bc1e46113 setFieldValue for list to replace when an index is specified
It used to do add instead, which is not consistent with the behaviour of set, which always replaces.
2015-12-09 18:28:07 +01:00
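The difference between replacing and adding at a list index can be sketched as follows. `ListSetSketch` and `setAtIndex` are hypothetical names, and the out-of-bounds check is an assumption rather than something this commit message states.

```java
import java.util.List;

// Hypothetical illustration of "set replaces" for a list index.
final class ListSetSketch {
    static void setAtIndex(List<Object> list, int index, Object value) {
        if (index < 0 || index >= list.size()) {
            // Assumed behaviour: reject indexes that do not point at an existing item.
            throw new IllegalArgumentException("index [" + index + "] is out of bounds for list of size " + list.size());
        }
        // List#set overwrites the item in place; List#add(index, value) would
        // instead insert a new item and shift the following ones to the right.
        list.set(index, value);
    }
}
```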
Martijn van Groningen 233de434a0 Merge pull request #15310 from martijnvg/ingest/stream_put_and_delete_responses
Streamline put & delete pipeline responses with index & delete responses
2015-12-09 11:30:57 +01:00
Martijn van Groningen a2cda4e3f2 Streamline the put and delete pipelines responses with the index and delete response in core. 2015-12-08 14:01:28 +01:00
Tal Levy 45f48ac126 update all processors to only operate on one field at a time when possible 2015-12-07 08:30:00 -08:00
javanna d7c3b51b9c [TEST] adapt to upstream changes 2015-12-04 16:35:53 +01:00
javanna 73986cc54f adapt to upstream changes 2015-12-04 14:17:07 +01:00
Tal Levy ffa8998f36 Merge pull request #15181 from martijnvg/ingest_geoip_only_read_mmdb_files
[Ingest] The geoip processor should only try to read *.mmdb files from the geoip config directory
2015-12-03 12:17:09 -08:00
Tal Levy 56da7b32ed add ability to define custom grok patterns within processor config 2015-12-03 08:24:07 -08:00
Tal Levy cf1c393d70 Merge pull request #15166 from talevy/remove_pattern_utils
move PatternUtils#loadBankFromStream into GrokProcessor.Factory
2015-12-03 08:08:00 -08:00
Martijn van Groningen 6acf8ec263 Replaced pipeline tests with simpler tests
The PipelineTests tried to test whether the configured map/list in the set processor was left unmodified while documents were ingested. Creating a pipeline programmatically created more noise than the test needed. The new tests in IngestDocumentTests have the same goal, but are much smaller and clearer because they test directly against IngestDocument.
2015-12-03 15:19:04 +01:00
Martijn van Groningen 9ab765b851 The geoip processor should only try to read *.mmdb files from the geoip config directory 2015-12-02 14:38:49 +01:00
Martijn van Groningen 270a3977bc Removed the lazy cache in DatabaseReaderService and eagerly build all available databases. 2015-12-02 11:16:02 +01:00
Tal Levy 767bd1d4d5 move PatternUtils#loadBankFromStream into GrokProcessor.Factory 2015-12-01 15:46:02 -08:00
javanna 6c0510b01d Make rename processor less error prone
Rename processor now checks whether the field to rename exists and throws an exception if it doesn't. It also checks that the new field to rename to doesn't exist yet, and throws an exception otherwise. Also we make sure that the rename operation is atomic, otherwise things may break between the remove and the set and we'd leave the document in an inconsistent state.

Note that the requirement for the new field name to not exist simplifies the use case for e.g. { "rename" : { "list.1": "list.2"} }, as such a rename wouldn't be accepted if list is actually a list, given that either list.2 already exists or the index is out of bounds for the existing list. If one really wants to replace an existing field, that field needs to be removed first through the remove processor and then rename can be used.
2015-12-01 19:58:24 +01:00
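A rough sketch of the checks described in this commit message, assuming a flat map of top-level fields. `RenameSketch` and `rename` are hypothetical names, and rolling back on failure is only one possible way to keep the operation atomic; the commit message does not say how the processor achieves it.

```java
import java.util.Map;

// Hypothetical illustration; not the plugin's rename processor.
final class RenameSketch {
    static void rename(Map<String, Object> doc, String from, String to) {
        if (!doc.containsKey(from)) {
            throw new IllegalArgumentException("field [" + from + "] doesn't exist");
        }
        if (doc.containsKey(to)) {
            throw new IllegalArgumentException("field [" + to + "] already exists");
        }
        Object value = doc.remove(from);
        try {
            doc.put(to, value);
        } catch (RuntimeException e) {
            // Restore the original field so the document is never left half-renamed.
            doc.put(from, value);
            throw e;
        }
    }
}
```

With a plain HashMap the `put` cannot realistically fail; the rollback matters once the target is a nested path whose resolution can throw.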
Martijn van Groningen 15b6708a5d and now make use of the lifecycle infrastructure 2015-12-01 18:20:25 +01:00
Tal Levy 8e4c288b5c Merge pull request #15132 from talevy/no_match_for_grok
[Ingest] No match for grok
2015-12-01 09:13:11 -08:00
Tal Levy 2c1effdd41 throw exception when grok processor does not match 2015-12-01 08:58:58 -08:00
Martijn van Groningen 9dd52ad7d3 Removed pollution from the Processor.Factory interface.
1) It no longer extends from Closeable.
2) Removed the config directory setter. Implementations that relied on it now get the location of the config dir via their constructors.
2015-12-01 17:32:37 +01:00
Martijn van Groningen fa9fcb3b11 geo processor should add a list of doubles instead of an array to the ingest document 2015-12-01 17:12:34 +01:00
Martijn van Groningen 99a4295330 If a list or map value gets set on ingest document a deep copy needs to be made.
If this is not done, it can lead to processor configuration being changed by a bulk or index request.
2015-12-01 16:02:04 +01:00
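A minimal sketch of the kind of deep copy this commit message refers to, assuming values are made of maps, lists and immutable leaves such as strings and numbers. `DeepCopySketch` and `deepCopy` are hypothetical names, not the plugin's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of copying a configured value before storing it in a document.
final class DeepCopySketch {
    static Object deepCopy(Object value) {
        if (value instanceof Map) {
            Map<?, ?> original = (Map<?, ?>) value;
            Map<Object, Object> copy = new HashMap<>(original.size());
            for (Map.Entry<?, ?> entry : original.entrySet()) {
                copy.put(entry.getKey(), deepCopy(entry.getValue()));
            }
            return copy;
        }
        if (value instanceof List) {
            List<?> original = (List<?>) value;
            List<Object> copy = new ArrayList<>(original.size());
            for (Object item : original) {
                copy.add(deepCopy(item));
            }
            return copy;
        }
        // Assumed to be an immutable leaf (String, Number, Boolean, null).
        return value;
    }
}
```

Without the copy, a later processor that appends to the list inside the document would also append to the list held in the processor configuration, so the change would leak into every subsequent request.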
Martijn van Groningen 4402da1af0 also change the tests to deal with Exception instead of IOException 2015-11-30 15:45:40 +01:00
Martijn van Groningen dde274d944 Replaced IOException with Exception on factory implementations' `Processor.Factory#create(Map)` method. 2015-11-30 15:37:16 +01:00
Martijn van Groningen fdf4543b8e Renamed `add` processor to `set` processor.
This name makes more sense, because if a field already exists it overwrites it.
2015-11-30 15:03:20 +01:00
javanna 43b861b076 IngestDocument to support accessing and modifying list items
When reading, through #getFieldValue and #hasField, and a list is encountered, the next element in the path is treated as the index of the item that the path points to (e.g. `list.0.key`). If the index is not a number or out of bounds, an exception gets thrown.

Added #appendFieldValue method that has the same behaviour as setFieldValue, but when a list is the last element in the path, instead of replacing the whole list it will simply add a new element to the existing list. This method is currently unused, we have to decide whether the set processor or a new processor should use it.

A few other changes made:
- Renamed hasFieldValue to hasField, as this method is not really about values but only keys. It will return true if a key is there but its value is null, while it returns false only when a field is not there at all.
- Changed null semantic in getFieldValue. null gets returned only when it was an actual value in the source, an exception is thrown otherwise when trying to access a non existing field, so that null != field not present.
- Made remove stricter about non existing fields. Throws error when trying to remove a non existing field. This is more consistent with the other methods in IngestDocument which are strict about fields that are not present.

Relates to #14324
2015-11-30 13:58:03 +01:00
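The read semantics described in this commit message can be sketched roughly as below, assuming dot-separated paths over nested maps and lists. `ListPathSketch` is a hypothetical class; only the method name `getFieldValue`, the example path `list.0.key` and the described behaviour come from the message, while the exception type and error texts are assumptions.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the plugin's IngestDocument.
final class ListPathSketch {
    static Object getFieldValue(Map<String, Object> root, String path) {
        Object current = root;
        for (String element : path.split("\\.")) {
            if (current instanceof Map) {
                Map<?, ?> map = (Map<?, ?>) current;
                if (!map.containsKey(element)) {
                    // A missing field is an error, so that null != field not present.
                    throw new IllegalArgumentException("field [" + element + "] not present in path [" + path + "]");
                }
                current = map.get(element);
            } else if (current instanceof List) {
                List<?> list = (List<?>) current;
                int index;
                try {
                    // Inside a list, the next path element is treated as an item index.
                    index = Integer.parseInt(element);
                } catch (NumberFormatException e) {
                    throw new IllegalArgumentException("[" + element + "] is not a list index in path [" + path + "]", e);
                }
                if (index < 0 || index >= list.size()) {
                    throw new IllegalArgumentException("index [" + index + "] is out of bounds in path [" + path + "]");
                }
                current = list.get(index);
            } else {
                throw new IllegalArgumentException("cannot resolve [" + element + "] in path [" + path + "]");
            }
        }
        // null is returned only when it is the actual stored value.
        return current;
    }
}
```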
Martijn van Groningen 0fe1b4eab1 PipelineStore no longer is a lifecycle component
Client in PipelineStore gets provided via a guice provider
Processor and Factory throw Exception instead of IOException
Replaced PipelineExecutionService.Listener with ActionListener
2015-11-26 18:12:15 +01:00
Martijn van Groningen 9aff8c6352 fix compile errors 2015-11-26 17:00:16 +01:00
Martijn van Groningen 9d1fa0d6da ingest: Add `meta` processor that allows modifying the metadata attributes of the document being processed 2015-11-26 15:46:32 +01:00
Martijn van Groningen afc9069c99 * Inlined PipelineStoreClient class into the PipelineStore class
* Moved PipelineReference to a top level class and named it PipelineDefinition
* Pulled some logic from the crud transport classes to the PipelineStore
* Use IOUtils#close(...) where appropriate
2015-11-26 14:42:39 +01:00
javanna 1a7391070f Simulate api improvements
Move ParsedSimulateRequest to SimulatePipelineRequest and remove Parser class in favor of static parse methods.
Simplified execute methods in SimulateExecutionService.
2015-11-26 13:24:45 +01:00
Martijn van Groningen 2890432421 changed updatePipelines() so that it is not prone to race conditions 2015-11-25 18:32:56 +01:00
javanna 5d510b59c8 use MetaData enum for metadata field names
Also rename getName to getFieldName in MetaData to prevent confusion with name() enum method.
2015-11-25 18:08:53 +01:00
javanna ec162c458e Replace property with field in IngestDocument
getPropertyValue => getFieldValue
hasPropertyValue => hasFieldValue
setPropertyValue => setFieldValue
removeProperty => removeField
2015-11-25 18:08:53 +01:00
Luca Cavanna e15fa99ee3 Merge pull request #15019 from javanna/enhancement/java8_date_parser
date formats: use a function instead of our own interface
2015-11-25 16:17:39 +01:00
javanna c4cf55c196 [TEST] generate random timezone out of the available ones in joda 2015-11-25 15:58:01 +01:00
javanna 5daa73b350 date formats: use a function instead of our own interface
Also turn the different date formats into an enum.
2015-11-25 15:37:35 +01:00
javanna e0fcee642e [TEST] fix locale comparison 2015-11-25 10:26:46 +01:00
javanna 388e637fa9 add a few more asserts to IngestActionFilterTests 2015-11-24 19:43:54 +01:00
javanna 49bfe6410e Rename Data leftovers 2015-11-24 15:49:25 +01:00
javanna 8f1f5d4da0 Split mutate processor into one processor per function 2015-11-24 14:31:53 +01:00
Martijn van Groningen 1e9d5c7b22 test: also test what happens if all index requests fail to be processed by the pipeline 2015-11-24 13:41:40 +01:00
Martijn van Groningen 8b1f117e51 Instead of failing the entire bulk request if the pipeline fails, only fail a bulk item. 2015-11-24 12:40:30 +01:00
Martijn van Groningen ecc8158b89 renamed Data to IngestDocument
moved all metadata related fields to a single metadata map
replaced specific metadata getters with a generic getMetadata()
2015-11-23 18:24:42 +01:00
javanna bb298ed27a add some javadocs to Data#getDocument 2015-11-20 18:01:27 +01:00
javanna be3e913349 set Data modified flag when a property is removed and improve behaviour when adding a field 2015-11-20 17:59:45 +01:00
javanna 404ae395ca add javadocs for Data#removeProperty 2015-11-18 18:00:25 +01:00
javanna 59868cd02e add support for removeProperty 2015-11-18 17:28:37 +01:00
javanna ab5b649184 accept null values and adapt hasPropertyValue return value 2015-11-18 10:46:03 +01:00
javanna ba8f8810ea rename getProperty, containsProperty and addField methods
For more consistency we now have getPropertyValue, hasPropertyValue, and setPropertyValue
2015-11-18 10:46:03 +01:00
javanna 126df4ca9a type safety in Data#getProperty and proper error when type found is not the expected one 2015-11-18 10:46:03 +01:00
javanna e616e8398a add missing javadocs 2015-11-18 10:46:03 +01:00
javanna 044a86d6c6 remove dependency on core and resolve generics compiler warning 2015-11-18 10:46:03 +01:00
javanna ba7e536e1d better error and tests for empty and null values in Data containsProperty addField and getProperty 2015-11-18 10:46:03 +01:00
javanna 9d7d5bd9bc Throw a proper error when add field fails due to existing field type mismatch
Instead of throwing ClassCastException whenever we try and add a field to a parent that is not a Map, we now throw a clearer error (IAE).
2015-11-18 10:46:02 +01:00
Luca Cavanna 68cefe1d81 Merge pull request #14759 from talevy/fix_null_mutate_report
adds tests and guards against null values in some mutate methods
2015-11-16 10:32:32 +01:00
javanna 5169d9d80f minor formatting changes 2015-11-16 10:28:29 +01:00
Tal Levy de33f5a911 adds tests and guards against null values in some mutate methods 2015-11-15 21:57:27 -08:00
javanna 446fa0c10b remove unnecessary line breaks 2015-11-13 19:37:06 +01:00
javanna 26569045ef remove leftover equals/hashcode 2015-11-13 19:36:18 +01:00
javanna 97f4f27b14 remove equals/hashcode as part of Pipeline and adapt tests
Only MutateProcessor implemented equals / hashcode, hence we would only use that one in our tests, since they relied on them. Better to not rely on equals/hashcode, drop them and mock processor/pipeline in the tests that need them. That also allows making the MutateProcessor constructor package private, like the other processors.
2015-11-13 19:35:09 +01:00
javanna d093600729 simplify serialization for simulate response depending on verbose flag
Removed equals and hashcode wherever they wouldn't be reliable because of exception comparison. At the end of the day we use them for testing, and we can simplify our tests without requiring equals and hashcode in prod code, which would also require more tests if maintained.

Add equals/hashcode test for Data/TransportData and randomize existing serialization tests
2015-11-13 16:22:24 +01:00
Tal Levy 20384aedf0 split out SimulateDocumentResult into subclasses, add tests for equalTo and streamable 2015-11-12 18:27:49 -08:00
Tal Levy af1de8e1cc updated with cosmetic changes 2015-11-12 13:43:05 -08:00
javanna c4951ef74f update get pipeline param names to id for consistency 2015-11-12 17:40:14 +01:00
javanna 979fa81618 make description optional as part of a Pipeline 2015-11-12 15:45:35 +01:00
javanna 75371b2381 restore initial simulate endpoint url, adapt get pipeline param name 2015-11-12 15:45:17 +01:00
javanna 66330539a3 Merge branch 'feature/ingest' into pr/14572 2015-11-12 15:39:22 +01:00
javanna 5bd4493ea2 use ConfigurationUtils to read string value from config 2015-11-12 15:38:36 +01:00
Tal Levy 674084973d moar updates 2015-11-11 21:51:45 -08:00
Tal Levy b40af1bcfd updates, moar verbose 2015-11-11 19:39:23 -08:00
Tal Levy c22c1e0f54 remove simulate executor service call and move to simple execution 2015-11-11 10:35:09 -08:00
Tal Levy 1f29fa4fe9 update rest status 2015-11-11 10:35:09 -08:00
Tal Levy bce7f6c7ad Add simulate endpoint 2015-11-11 10:35:09 -08:00
javanna c12c9e6e29 add equals and hashcode to GsubExpression 2015-11-11 11:10:26 +01:00
javanna 2b31f4fff7 Mutate processor improvements
Remove code duplications from ConfigurationUtils
Make sure that the mutate processor doesn't use Tuple as that would require to depend on core.
Also make sure that the MutateProcessor tests don't end up testing the factory as well.
Make processor getters package private as they are only needed in tests.
Add new tests to MutateProcessorFactoryTests
2015-11-11 11:10:26 +01:00
Martijn van Groningen 2bde384825 renamed yaml tests 2015-11-11 14:41:29 +07:00
Martijn van Groningen 347b8e600e specify all the dependencies of dependencies, because transitive dependencies have been disabled 2015-11-11 14:39:48 +07:00
Martijn van Groningen 4da05168f4 geoip: renamed `ip_field` option to `source_field`, because it can hold an ip or hostname.
geoip: add a `fields` option to control what fields are added by geoip processor
geoip: instead of adding all fields, only `country_code`, `city_name`, `location`, `continent_name` and `region_name` fields are added.
2015-11-10 11:50:12 +07:00
Tal Levy e9b72f5394 remove SimpleProcessor 2015-11-09 19:14:06 -08:00
Tal Levy 8fc5a3d032 introduce mutate processor.
fix forbiddenapis

update

clean up and add rest test

update mutate factory to use configuration utilities

compile gsub pattern

cleanup, update parseBooleans, null tests
2015-11-09 18:49:55 -08:00
javanna f657a7dbf3 Merge branch 'feature/ingest' into ingest/date 2015-11-09 13:42:07 +01:00
Martijn van Groningen 302621f16b geoip: don't store geoinfo if nothing can be resolved 2015-11-09 18:24:05 +07:00
javanna 1dfe6f6dcf make sure headers etc. are passed over to inner index and delete requests in put/delete pipeline 2015-11-06 19:21:30 +01:00
javanna d318990339 added REST test 2015-11-06 16:10:22 +01:00
javanna 682876f7d7 added factory tests 2015-11-06 15:44:20 +01:00
javanna dbf5c96876 Merge branch 'feature/ingest' into ingest/date 2015-11-06 13:04:51 +01:00
javanna 588de6ccff Add generic type to Processor.Factory and rename Geo*Tests to GeoIp*Tests 2015-11-06 12:53:20 +01:00
javanna 1d2e244bac Merge branch 'feature/ingest' into ingest/date 2015-11-06 12:08:14 +01:00
javanna 7798c6cd49 unified date parser tests in a single test class, added more tests for date processor 2015-11-06 12:05:30 +01:00
Martijn van Groningen e7f0f0ed4e Enforce strict pipeline configuration
Closes #14552
2015-11-05 23:38:29 +07:00
Martijn van Groningen 92452ff99a rename the `ingest` parameter to `pipeline_id`, because it better describes what the parameter should hold. 2015-11-05 14:35:49 +07:00
javanna b45815da36 adapt to changes upstream 2015-11-04 14:56:03 +01:00
javanna 363454ce4a Merge branch 'feature/ingest' into ingest/date 2015-11-04 14:48:24 +01:00
Martijn van Groningen 1a4b5bba2b Simplify processor creation from map of maps by folding the build and builder factory in one interface called Factory.
In tests, processors can be created from their constructors instead of builders.
In the IngestModule, register instances instead of class instances.
2015-11-03 13:25:58 +07:00
Tal Levy d935a3ab81 Make Grok-specific classes package-protected. 2015-11-02 21:30:09 -08:00
Martijn van Groningen 95e6b99d2b renamed test file 2015-11-03 12:17:10 +07:00
Martijn van Groningen 03387266ca ingest: Added new `geoip` processor that adds geographical information to documents based on an ip address.
The information is fetched from the Maxmind geolite2 database, which is embedded in the ingest plugin.
2015-11-03 12:06:29 +07:00