141 Commits

Author SHA1 Message Date
Jake Landis c0056cddd8
ingest: Introduction of a bytes processor ()
This processor allows human-readable byte values (e.g. 1kb) to be converted to a value in bytes (e.g. 1024). Internally this processor re-uses "ByteSizeValue.parseBytesSizeValue", which supports conversions up to Long.MAX_VALUE and the following units: "b", "kb", "mb", "gb", "tb", "pb".

This change also introduces a generic return type for the AbstractStringProcessor to allow for code reuse while supporting a String -> T conversion. (String -> Long in this case).
2018-07-03 10:40:56 -05:00
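
The conversion described in c0056cddd8 can be sketched as follows. This is an illustrative Python approximation, not the `ByteSizeValue.parseBytesSizeValue` code itself: the function name `parse_bytes`, the restriction to whole numbers, and the use of binary (1024-based) multipliers for every unit are assumptions made for the example.

```python
import re

# Binary multipliers for the units named in the commit message (assumed 1024-based).
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}
_LONG_MAX = 2**63 - 1  # Java's Long.MAX_VALUE


def parse_bytes(text: str) -> int:
    """Convert a human-readable size such as '1kb' into a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*(b|kb|mb|gb|tb|pb)\s*", text.lower())
    if match is None:
        raise ValueError(f"unparseable byte size: {text!r}")
    value = int(match.group(1)) * _UNITS[match.group(2)]
    if value > _LONG_MAX:
        raise ValueError(f"{text!r} exceeds Long.MAX_VALUE")
    return value
```

With these assumptions, `parse_bytes("1kb")` yields 1024, matching the example in the commit message.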
Armin Braun 13e1cf6191
ingest: Add ignore_missing property to foreach filter () () 2018-06-26 20:04:41 +02:00
Martijn van Groningen 6030d4be1e
[INGEST] Interrupt the current thread if evaluating grok expressions takes too long ()
This adds a thread interrupter that allows us to encapsulate calls to org.joni.Matcher#search()
This method can hang forever if the regex expression is too complex.

The thread interrupter in the background checks every 3 seconds whether there are threads
executing the org.joni.Matcher#search() method for longer than 5 seconds and
if so interrupts these threads.

Joni checks every 30k iterations whether the current thread is interrupted and
if so returns org.joni.Matcher#INTERRUPTED

Closes 
2018-06-12 07:49:03 +02:00
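
The bookkeeping behind the interrupter in 6030d4be1e might look roughly like the sketch below. It is a simplified Python model under stated assumptions, not the Elasticsearch or Joni code: the names `MatcherWatchdog` and `search` are hypothetical, time is passed in explicitly instead of being read from a clock, and the "interrupt" is a cooperative flag that the search loop polls every 30,000 iterations.

```python
class MatcherWatchdog:
    """Tracks running searches and flags those that exceed max_seconds."""

    def __init__(self, max_seconds: float = 5.0):
        self.max_seconds = max_seconds
        self._started = {}        # worker id -> start time
        self.interrupted = set()  # worker ids flagged as running too long

    def register(self, worker_id, now: float) -> None:
        self._started[worker_id] = now

    def unregister(self, worker_id) -> None:
        self._started.pop(worker_id, None)
        self.interrupted.discard(worker_id)

    def sweep(self, now: float) -> None:
        """Meant to be called periodically (every 3 seconds in the commit)."""
        for worker_id, start in self._started.items():
            if now - start > self.max_seconds:
                self.interrupted.add(worker_id)


def search(watchdog, worker_id, steps: int, check_every: int = 30_000) -> str:
    """Stand-in for a long match loop that polls the interrupt flag."""
    for i in range(1, steps + 1):
        if i % check_every == 0 and worker_id in watchdog.interrupted:
            return "INTERRUPTED"
    return "DONE"
```

Polling only every `check_every` iterations keeps the cost of the check negligible, at the price of a short delay before an interrupted search actually stops.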
Tanguy Leroux 42608881b0
[Docs] Remove mention of pattern files in Grok processor ()
Pattern files have been removed in 
16fa3e546e
2018-06-11 09:32:12 +02:00
rzmf 080cefec73 Fix missing comma in ingest-node.asciidoc () 2018-04-03 11:33:44 +01:00
Nik Everett 762226bee9
Docs: Support triple quotes ()
Adds support for triple quoted strings to the documentation test
generator. Kibana's CONSOLE tool has supported them for a year but we
were unable to use them in Elasticsearch's docs because the process that
converts example snippets into tests couldn't handle this. This change
adds code to convert them into standard JSON so we can pass them to
Elasticsearch.
2018-03-16 12:46:39 -04:00
Jiri Tyr c713d62f88 [Docs] Fix link to Grok patterns () 2018-03-16 14:13:17 +01:00
Devin Young e8a78df555 Fix markdown formatting () 2018-01-26 08:15:16 -07:00
Sian Lerk Lau 5e3ba8a88d Enable convert processor to support Long and Double. ()
Closes 
2018-01-03 11:27:55 +01:00
Sian Lerk Lau 47eefbe889 Enable grok processor to support long, double and boolean () 2017-12-20 11:19:49 -08:00
David Pilato 3ca39186d1
Fix missing comma in examples () 2017-12-19 18:28:39 +01:00
Adrien Grand 1b660821a2
Allow `_doc` as a type. ()
Allowing `_doc` as a type will enable users to make the transition to 7.0
smoother since the index APIs will be `PUT index/_doc/id` and `POST index/_doc`.
This also moves most of the documentation to `_doc` as a type name.

Closes 
Closes 
2017-12-14 17:47:53 +01:00
Tal Levy 5c34533761
add json-processor support for non-map json types ()
The Json Processor originally only supported parsing field values into Maps even
though the JSON spec specifies that strings, null-values, numbers, booleans, and arrays
are also valid JSON types. This commit enables parsing these values now.

response to .
2017-11-13 10:28:19 -08:00
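
To illustrate the point made in 5c34533761, the sketch below shows that a standard JSON parser accepts more than objects. It uses Python's `json` module purely as a stand-in; the helper name `parse_json_field` is hypothetical and is not part of the Json Processor.

```python
import json


def parse_json_field(raw: str):
    """Parse any valid JSON value, not only objects (maps)."""
    return json.loads(raw)


# Each of these is a valid JSON document on its own.
samples = ['"foo"', "null", "123", "true", "[1, 2]", '{"a": 1}']
parsed = [parse_json_field(s) for s in samples]
```

Here `parsed` holds a string, `None`, a number, a boolean, a list, and a dict, one per JSON type listed in the commit message.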
Tal Levy d22fd4ea58
Introduce templating support to timezone/locale in DateProcessor ()
Sometimes systems like Beats would want to extract the date's timezone and/or locale
from a value in a field of the document. This PR adds support for mustache templating
to extract these values.

Closes .
2017-11-09 09:45:32 -08:00
Loek van Gool 67e677f443
Add an example of dynamic field names () 2017-11-03 23:20:58 +01:00
Christoph Büscher c7c6443b10 [Docs] "The the" is a great band, but ... ()
Removing several occurrences of this typo in the docs and javadocs, seems to be
a common mistake. Corrections turn up once in a while in PRs, better to correct
some of this in one sweep.
2017-09-14 15:08:20 +02:00
Atothendrew c30d6ebcbb [Docs] Correct json example in ingest-node.asciidoc () 2017-08-21 11:07:44 +02:00
Tal Levy 872526cad3 add URL-Decode Processor to Ingest ()
closes 

Adds a URL Decoder Processor to Ingest

this will decode urls like:

https%3a%2f%2felastic.co%2 to https://elastic.co/
2017-08-07 10:26:11 -07:00
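
The decoding step described in 872526cad3 is ordinary percent-decoding. A minimal Python sketch, using the standard library rather than the processor's own code and a fully encoded form of the example URL, could be:

```python
from urllib.parse import unquote


def url_decode(value: str) -> str:
    """Replace %xx escapes with the characters they encode."""
    return unquote(value)


decoded = url_decode("https%3a%2f%2felastic.co%2f")
```

The name `url_decode` is chosen for the example only.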
Clinton Gormley ff4a2519f2 Update experimental labels in the docs ()
Relates https://github.com/elastic/elasticsearch/issues/19798

Removed experimental label from:
* Painless
* Diversified Sampler Agg
* Sampler Agg
* Significant Terms Agg
* Terms Agg document count error and execution_hint
* Cardinality Agg precision_threshold
* Pipeline Aggregations
* index.shard.check_on_startup
* index.store.type (added warning)
* Preloading data into the file system cache
* foreach ingest processor
* Field caps API
* Profile API

Added experimental label to:
* Moving Average Agg Prediction


Changed experimental to beta for:
* Adjacency matrix agg
* Normalizers
* Tasks API
* Index sorting

Labelled experimental in Lucene:
* ICU plugin custom rules file
* Flatten graph token filter
* Synonym graph token filter
* Word delimiter graph token filter
* Simple pattern tokenizer
* Simple pattern split tokenizer

Replaced experimental label with warning that details may change in the future:
* Analysis explain output format
* Segments verbose output format
* Percentile Agg compression and HDR Histogram
* Percentile Rank Agg HDR Histogram
2017-07-18 14:06:22 +02:00
Glen Smith e9dfb2a215 Fix another simulate example in ingest docs
When simulating an ingest pipeline against an existing pipeline, the
_source field is required to wrap each doc. This commit fixes another
example in the docs that is missing this.
    
Relates , relates e3a0c11239
2017-07-17 15:17:42 +09:00
Glen Smith e3a0c11239 Fix simulate example in ingest docs
When simulating an ingest pipeline against an existing pipeline, the
_source field is required to wrap each doc. This commit fixes an example
in the docs that is missing this.

Relates 
2017-07-17 14:17:41 +09:00
olcbean 2ba9fd2aec Remove deprecated created and found from index, delete and bulk ()
The created and found fields in index and delete responses became obsolete after the introduction of the result field in index, update and delete responses ().

After deprecating the created and found fields in 5.x (), now they are removed.

Fixes 
2017-07-07 13:58:46 -04:00
Clinton Gormley 0170e0e8d3 Remove usage of multi-types from the docs and added a page explaining type removal ()
Closes 
2017-07-05 12:30:19 +02:00
DeDe Morton 6442d1f75e [Docs] Add link to grok debugger docs () 2017-06-28 16:14:16 -07:00
Alexander Kazakov 53b74348ff Fix documentation for script processor () 2017-06-26 12:14:23 -07:00
Alexander Kazakov a7dafdaa05 Add target_field parameter to gsub, join, lowercase, sort, split, trim, uppercase ()
Closes  
2017-06-13 09:40:44 -07:00
Ryan Ernst a03b6c2fa5 Scripting: Change keys for inline/stored scripts to source/id ()
This commit adds back "id" as the key within a script to specify a
stored script (which with file scripts now gone is no longer ambiguous).
It also adds "source" as a replacement for "code". This is in an attempt
to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.
2017-06-09 08:29:25 -07:00
Tal Levy a771912a22 Add Ingest-Processor specific Rest Endpoints & Add Grok endpoint ()
This PR enables Ingest plugins to leverage processor-scoped REST
endpoints. The first of these is the Grok endpoint, which lets users
retrieve all the built-in Grok patterns.
Example usage: Kibana Grok Autocomplete!
2017-06-08 15:24:35 -07:00
Guillaume Le Floch 3f6d80aa66 Allow removing multiple fields in ingest processor ()
* Allow removing multiple fields in ingest processor

* Iteration 2

* Few fixes
2017-06-08 13:17:44 -07:00
Tal Levy e51246023a add `exclude_keys` option to KeyValueProcessor ()
and modify data-structure of `include_keys` and `exclude_keys` to be
backed by a HashSet
2017-06-05 14:12:48 -07:00
Tal Levy dfe2ecaa28 add docs example for Ingest scripts manipulating document metadata ()
It may not be clear to users that the Ingest ScriptProcessor context object `ctx` can 
manipulate document metadata like `_index` and `_type`.
2017-05-25 07:45:19 -07:00
Ryan Ernst 463fe2f4d4 Scripting: Remove file scripts ()
This commit removes file scripts, which were deprecated in 5.5.

closes 
2017-05-17 14:42:25 -07:00
Nik Everett a01f846226 CONSOLEify a few more docs
Adds CONSOLE to cross-cluster-search docs but skips them for testing
because we don't have a second cluster set up. This gets us the
`VIEW IN CONSOLE` and `COPY AS CURL` links and makes sure that they
are valid yaml (not json, technically) but doesn't get testing.
Which is better than we had before.

Adds CONSOLE to the dynamic templates docs and ingest-node docs.
The ingest-node docs contain a *ton* of non-console snippets. We
might want to convert them to full examples later, but that can be
a separate thing.

Relates to 
2017-05-04 21:01:14 -04:00
Jason Tedor 4796557a30 Add primary term to doc write response
This commit adds the primary term to the doc write response.

Relates 
2017-04-19 14:44:22 -04:00
Glen Smith 3ff014d07d ingest-node.asciidoc - Clarify json processor ()
Add examples for the json processor.
2017-04-18 23:27:26 -04:00
Clinton Gormley 5cf13f29bb Update ingest-node.asciidoc
Fixed docs typo
2017-03-22 10:44:11 +01:00
Gameldar e3eb363882 Link directly to the attachments in arrays section
The link should be made to the relevant section of the ingest attachments documentation, rather than the top of the page.
2016-12-22 20:52:08 +08:00
gameldar d404ee3533 Add ingest-attachment-with-arrays section to ingest attachments doc
Added a new section detailing how to use the attachment processor
within an array.

This reverts commit  and instead links to the foreach processor.
2016-12-22 00:18:33 +08:00
Tal Levy c53b2ee9cd introduce KV Processor in Ingest Node ()
Now you can parse field values of the `key=value` variety and have
`key` be inserted as a field name in an ingest document.

Closes .
2016-12-20 13:26:17 -08:00
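
The `key=value` parsing introduced in c53b2ee9cd can be approximated as below. This is a hedged Python sketch, not the KV Processor itself: the function name `parse_kv` and the defaults for `field_split` and `value_split` are assumptions for illustration.

```python
def parse_kv(message: str, field_split: str = " ", value_split: str = "=") -> dict:
    """Split a message into pairs, then each pair into a field name and value."""
    result = {}
    for pair in message.split(field_split):
        if value_split not in pair:
            continue  # skip fragments that are not key/value pairs
        key, value = pair.split(value_split, 1)
        result[key] = value
    return result


fields = parse_kv("ip=1.2.3.4 error=REFUSED")
```

Splitting each pair only once means a value may itself contain the separator, for example `query=a=b`.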
Tal Levy ad4b1ecdeb [docs] update ingest-node delete docs to mention wildcarding () 2016-12-20 10:52:17 -08:00
Tal Levy bb37167946 Enables the ability to inject serialized json fields into root of document. ()
The JSON processor has an optional field called "target_field".
If you don't specify target_field then target_field becomes what you specified as "field".
There isn't any way to add the fields to the root of a document. By
setting `add_to_root`, now serialized fields will be inserted into the
top-level fields of the ingest document.

Closes .
2016-12-16 10:17:27 -08:00
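
The three placements discussed in bb37167946 (the source field, an explicit target field, or the document root) can be sketched as follows. The Python function `apply_json` is hypothetical, and rejecting non-object values when merging into the root is an assumption of this sketch.

```python
import json


def apply_json(doc: dict, field: str, target_field=None, add_to_root: bool = False) -> dict:
    """Parse doc[field] as JSON and store the result per the chosen option."""
    parsed = json.loads(doc[field])
    if add_to_root:
        if not isinstance(parsed, dict):
            raise ValueError("only a JSON object can be merged into the root")
        doc.update(parsed)  # keys become top-level fields
    else:
        # Without target_field, the parsed value replaces the source field.
        doc[target_field or field] = parsed
    return doc
```

For instance, `apply_json({"payload": '{"a": 1}'}, "payload", add_to_root=True)` leaves `payload` untouched and adds a top-level `a` field.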
Pablo Musa 152efe95e6 Small typo fix in the docs. ()
There is a small typo in the convert processor code example.
2016-12-16 14:50:06 +01:00
Jason Tedor d06a8903fd Merge branch 'master' into feature/seq_no
* master: (22 commits)
  Add proper toString() method to UpdateTask ()
  Fix `InternalEngine#isThrottled` to not always return `false`. ()
  add `ignore_missing` option to SplitProcessor ()
  fix trace_match behavior for when there is only one grok pattern ()
  Remove dead code from GetResponse.java
  Fixes date range query using epoch with timezone ()
  Do not cache term queries. ()
  Updated dynamic mapper section
  Docs: Clarify date_histogram bucket sizes for DST time zones
  Handle release of 5.0.1
  Fix skip reason for stats API parameters test
  Reduce skip version for stats API parameter tests
  Strict level parsing for indices stats
  Remove cluster update task when task times out ()
  [DOCS] Mention "all-fields" mode doesn't search across nested documents
  InternalTestCluster: when restarting a node we should validate the cluster is formed via the node we just restarted
  Fixed bad asciidoc in boolean mapping docs
  Fixed bad asciidoc ID in node stats
  Be strict when parsing values searching for booleans ()
  Fix time zone rounding edge case for DST overlaps
  ...
2016-11-16 09:10:35 -05:00
Tal Levy 6796464f16 add `ignore_missing` option to SplitProcessor ()
Closes .
2016-11-16 15:46:09 +02:00
Tal Levy 04b712bdc5 fix trace_match behavior for when there is only one grok pattern ()
There is an issue in the Grok Processor, where trace_match: true does not inject the _ingest._grok_match_index into the ingest-document when there is just one pattern provided. This is due to an optimization in the regex construction. This commit adds a check for when this is the case, and injects a static index value of "0", since there is only one pattern matched (at the first index into the patterns).

To make this clearer, more documentation was added to the grok-processor docs.

Fixes .
2016-11-16 15:41:54 +02:00
Jason Tedor 33f7cd5a16 Remove shard ID from doc write response
This commit removes the shard ID from doc write response; this was
useful for debugging but its time has passed.

Relates 
2016-11-11 15:18:25 -05:00
Jason Tedor d3417fb022 Merge branch 'master' into feature/seq_no
* master: (516 commits)
  Avoid angering Log4j in TransportNodesActionTests
  Add trace logging when acquiring and releasing operation locks for replication requests
  Fix handler name on message not fully read
  Remove accidental import.
  Improve log message in TransportNodesAction
  Clean up of Script.
  Update Joda Time to version 2.9.5 ()
  Remove unused ClusterService dependency from SearchPhaseController ()
  Remove max_local_storage_nodes from elasticsearch.yml ()
  Wait for all reindex subtasks before rethrottling
  Correcting a typo-Maan to Man-in README.textile ()
  Fix InternalSearchHit#hasSource to return the proper boolean value ()
  Replace all index date-math examples with the URI encoded form
  Fix typos ()
  Adapt ES_JVM_OPTIONS packaging test to ubuntu-1204
  Add null check in InternalSearchHit#sourceRef to prevent NPE ()
  Add VirtualBox version check ()
  Export ES_JVM_OPTIONS for SysV init
  Skip reindex rethrottle tests with workers
  Make forbidden APIs be quieter about classpath warnings ()
  ...
2016-11-10 23:40:33 -05:00
Tal Levy 38c650f376 make painless the default scripting language for ScriptProcessor ()
- fixes a bug in the docs that mentions `lang` as optional
- now `lang` defaults to "painless"
2016-10-18 16:22:01 -07:00
Chris Earle 9cf7214380 [DOCS] Add "version" to template and pipeline docs ()
* [DOCS] Add "version" to template and pipeline docs

This adds details about the "version" to both the template and pipeline pages.
2016-10-18 11:56:18 -04:00
Pascal Borreli fcb01deb34 Fixed typos () 2016-10-10 14:51:47 -06:00
Boaz Leskes 27eab74510 merge from master 2016-09-30 17:19:30 +02:00
Martijn van Groningen 55dce523c2 docs: marked `foreach` processor as experimental
Closes 
2016-09-30 12:23:42 +02:00
Jason Tedor 8879360f66 Fix failing doc tests in feature/seq_no
This commit fixes failing doc tests in feature/seq_no after merging
master into this branch.
2016-09-29 03:58:02 +02:00
Tal Levy f3a5ee671b [docs] [fix] `field` is no longer an option for the script processor () 2016-09-29 03:04:43 +02:00
Tal Levy 550a0449bc [docs] [fix] `field` is no longer an option for the script processor () 2016-09-28 21:50:32 +02:00
Tal Levy 9f1f5fdedc introduce the JSON Processor ()
introduce the JSON Processor
2016-09-09 14:34:32 -07:00
Tal Levy dda32545bb add ignore_missing option to relevant processors () 2016-09-09 12:20:18 -07:00
Martijn van Groningen 6f6d17dc9c ingest: Add `dot_expander` processor that can turn fields with dots in the field name into object fields. 2016-09-05 07:28:38 +02:00
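
What 6f6d17dc9c means by turning a dotted field name into object fields can be shown with a short Python sketch. The function `expand_dots` is hypothetical and only handles the simple case where any existing parent fields are already objects.

```python
def expand_dots(doc: dict, field: str) -> dict:
    """Move doc['a.b.c'] to doc['a']['b']['c'], creating parents as needed."""
    value = doc.pop(field)
    *parents, leaf = field.split(".")
    node = doc
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value
    return doc


expanded = expand_dots({"foo.bar": "value"}, "foo.bar")
```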
Igor Motov b36fbc4452 Add support for parameters to the script ingest processor
The script processor should support `params` to be consistent with all other script consumers.
2016-08-24 16:49:48 -04:00
Tal Levy bf046f8f93 update ingest date index name processor with runnable CONSOLE examples () 2016-08-11 11:36:14 -07:00
Martijn van Groningen a91bb29585 ingest: Made the response format of the get pipeline api match with the response format of the index template api
Closes 
2016-07-29 17:58:30 +02:00
Martijn van Groningen 24d7fa6d54 ingest: Change the `foreach` processor to use the `_ingest._value` ingest metadata attribute to store the current array element being processed.
Closes 
2016-07-27 09:35:09 +02:00
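
The mechanism in 24d7fa6d54, where the current array element is exposed through `_ingest._value`, might be modelled like this. The Python below is a simplified sketch: the helper names are hypothetical, and writing each processed element back into the array is an assumption of the example.

```python
def foreach(doc: dict, field: str, processor) -> dict:
    """Run `processor` once per element, exposing it as doc['_ingest']['_value']."""
    ingest_meta = doc.setdefault("_ingest", {})
    results = []
    for element in doc[field]:
        ingest_meta["_value"] = element
        processor(doc)
        results.append(ingest_meta["_value"])
    ingest_meta.pop("_value", None)  # the attribute only exists during iteration
    doc[field] = results
    return doc


def uppercase_value(doc: dict) -> None:
    """Example inner processor that rewrites the current element."""
    doc["_ingest"]["_value"] = doc["_ingest"]["_value"].upper()


result = foreach({"values": ["foo", "bar"]}, "values", uppercase_value)
```

Because the inner processor addresses the element through a fixed metadata path, it does not need to know which field is being iterated.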
Martijn van Groningen 1bc12f5214 docs: fix broken link
Closes 
2016-07-14 11:12:47 +02:00
Tal Levy 8fd01554bc update foreach processor to only support one applied processor. ()
Closes .
2016-07-13 13:13:00 -07:00
javanna 62462f5d9b [TEST] replace ResponseBodyAssertion with existing MatchAssertion
We introduced a special response_body assertion to test our docs snippets. The match assertion does the same job though and can be reused and adapted where needed. ResponseBodyAssertion provides much better and more accurate errors though, which can now be utilized in MatchAssertion so that many more REST tests can benefit from readable error messages.

Each response body always gets stashed and can already be retrieved for later evaluations. Instead of providing the response body as strings that get parsed to json objects separately, then converted to maps as ResponseBodyAssertion did, we parse everything once, the json is part of the yaml test, which is supported. The only downside is that json comments cannot be used, rather yaml comments should be used (// C style vs # ). There were only two docs tests that were using comments in ingest-node.asciidoc where I went ahead and removed the comments, which didn't seem that useful anyway.
2016-07-01 11:13:10 +02:00
David Pilato 157645fe9e Merge pull request from elastic/doc/ingest-foreach
Wrong name for values field
2016-06-22 23:14:02 +02:00
Adrien Grand db9af54ec0 Remove `_timestamp` and `_ttl` on 5.x indices.
This removes the ability to use `_timestamp` and `_ttl` on indices created on
or after 5.0.

Closes 
2016-06-22 08:35:54 +02:00
David Pilato cb8073e990 Wrong name for values field
We wrote that the document is:

```json
{
  "value" : ["foo", "bar", "baz"]
}
```

But the processor is using a `values` field:

```json
{
  "foreach" : {
    "field" : "values",
    "processors" : [
      // ...
    ]
  }
}
```

It should be `values`.
2016-06-20 18:58:41 +02:00
Tal Levy a26260fb72 new ScriptProcessor for Ingest ()
add new ScriptProcessor for executing ES Scripts within pipelines
2016-06-15 14:57:18 -07:00
Martijn van Groningen a2ad5c0282 docs: fix typo
Closes 
2016-06-15 10:56:46 +02:00
Christoph Wurm d71894a226 Update ingest-node.asciidoc
Fixed forgotten rename from `match_formats` to `formats` in documentation (changed in dd2184ab25)
2016-06-07 15:47:38 +02:00
Martijn van Groningen 766789b0f0 ingest: added `ignore_failure` option to all processors
If this option is enabled on a processor it silently catches any processor related failure and continues executing the rest of the pipeline.

 Closes 
2016-06-01 10:29:12 +02:00
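
The behaviour of `ignore_failure` described in 766789b0f0 can be sketched as a small pipeline runner. This Python model is illustrative only: representing each processor as a dict with a `run` callable is an assumption, and `boom` and `tag` are made-up processors.

```python
def run_pipeline(doc: dict, processors: list) -> dict:
    """Run processors in order; swallow failures where ignore_failure is set."""
    for step in processors:
        try:
            step["run"](doc)
        except Exception:
            if not step.get("ignore_failure", False):
                raise  # default behaviour: the failure stops the pipeline
    return doc


def boom(doc: dict) -> None:
    raise RuntimeError("processor failed")


def tag(doc: dict) -> None:
    doc["tagged"] = True


out = run_pipeline({}, [{"run": boom, "ignore_failure": True}, {"run": tag}])
```

With the flag set on the failing step, the later `tag` step still runs; without it, the exception propagates to the caller.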
Tal Levy edfbdf2748 add ability to specify multiple grok patterns ()
- now you can specify a list of grok patterns to match your field with
and the first one to successfully match wins.
- only non-null captures will be inserted into your matched document.

Fixes .
2016-05-25 12:20:39 -07:00
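
The two rules in edfbdf2748 (the first matching pattern wins, and only non-null captures are kept) can be illustrated with plain regular expressions. This Python sketch uses named groups from the `re` module in place of real grok patterns; `match_first` is a hypothetical name, and returning the index of the winning pattern is an addition made for the example.

```python
import re


def match_first(patterns: list, text: str):
    """Return (pattern index, non-null captures) for the first pattern that matches."""
    for index, pattern in enumerate(patterns):
        match = re.search(pattern, text)
        if match:
            captures = {k: v for k, v in match.groupdict().items() if v is not None}
            return index, captures
    return None


patterns = [r"(?P<num>\d+)", r"(?P<word>[a-z]+)(?P<suffix>!)?"]
hit = match_first(patterns, "hello")
```

In this run the second pattern wins, and the optional `suffix` group, which did not participate in the match, is left out of the result.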
Zachary Tong 7c46b57ff2 Add a Sort ingest processor
Sorts an array of values in ascending or descending order. If all elements are numerics, they will be sorted numerically. If values are strings, or mixtures of strings/numbers, the elements will be sorted lexicographically.
2016-05-17 12:06:48 -04:00
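
The ordering rule stated in 7c46b57ff2 can be sketched in a few lines. This Python version is an approximation: comparing mixed values by their string form is an assumption about what "lexicographically" means here, and `sort_values` is a hypothetical name.

```python
def sort_values(values: list, order: str = "asc") -> list:
    """Sort numerically if every element is a number, otherwise by string form."""
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    key = None if all_numeric else str
    return sorted(values, key=key, reverse=(order == "desc"))


numeric = sort_values([10, 9, 1])
mixed = sort_values(["b", 10, "a", 9])
```

Note the difference: the all-numeric list sorts as 1, 9, 10, while in the mixed list 10 sorts before 9 because "10" precedes "9" as a string.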
Clinton Gormley 3f594089c2 Renamed all AUTOSENSE snippets to CONSOLE () 2016-05-09 15:42:23 +02:00
Nik Everett 4b1c116461 Generate and run tests from the docs
Adds infrastructure so `gradle :docs:check` will extract tests from
snippets in the documentation and execute the tests. This is included
in `gradle check` so it should happen on CI and during a normal build.

By default each `// AUTOSENSE` snippet creates a unique REST test. These
tests are executed in a random order and the cluster is wiped between
each one. If multiple snippets chain together into a test you can annotate
all snippets after the first with `// TEST[continued]` to have the
generated tests for both snippets joined.

Snippets marked as `// TESTRESPONSE` are checked against the response
of the last action.

See docs/README.asciidoc for lots more.

Closes . That issue is about catching bugs in the docs during build.
This catches *some* bugs in the docs during build which is a good start.
2016-05-05 13:58:03 -04:00
Martijn van Groningen 7aca1389e2 ingest: Add `date_index_name` processor.
Closes 
2016-04-29 17:20:48 +02:00
Tal Levy 6302fb65a3 add ability to disable overriding values of existing fields in set processor 2016-04-28 13:50:19 -07:00
Martijn van Groningen dd2184ab25 ingest: Streamline option naming for several processors:
* `rename` processor, renamed `to` to `target_field`
* `date` processor, renamed `match_field` to `field` and renamed `match_formats` to `formats`
* `geoip` processor, renamed `source_field` to `field` and renamed `fields` to `properties`
* `attachment` processor, renamed `source_field` to `field` and renamed `fields` to `properties`

Closes 
2016-04-21 13:40:43 +02:00
Clinton Gormley 102a398d9f Fixed split processor example 2016-04-19 14:11:45 +02:00
Clinton Gormley a2ab13ddd1 Update ingest-node.asciidoc
Documented `separator` in the `split` processor

Closes https://github.com/elastic/elasticsearch/issues/17831
2016-04-19 11:11:58 +02:00
Martijn van Groningen 16fa3e546e docs: remove mention of file based grok pattern 2016-04-13 22:51:12 +02:00
Martijn van Groningen ca5bd89581 docs: adjust grok processor docs to not mention pattern files as these no longer exist
Closes 
2016-04-13 12:37:50 +02:00
Tal Levy 2064fe3985 add type conversion support to ConvertProcessor 2016-03-29 07:56:53 -07:00
Clinton Gormley 432f0cc193 Docs: Added the ingest node to the modules/nodes page
Closes 
2016-03-15 19:03:18 +01:00
Martijn van Groningen 2fa33d5c47 Added ingest statistics to node stats API
The ingest stats include the following statistics:
* `ingest.total.count` - The total number of documents ingested during the lifetime of this node
* `ingest.total.time_in_millis` - The total time spent on ingest preprocessing documents during the lifetime of this node
* `ingest.total.current` - The total number of documents currently being ingested.
* `ingest.total.failed` - The total number of ingest preprocessing operations that failed during the lifetime of this node

Also these stats are returned on a per pipeline basis.
2016-03-10 13:21:43 +01:00
Martijn van Groningen 82d01e4315 Added ingest info to node info API, which contains a list of available processors.
Internally the put pipeline API uses this information in node info API to validate if all specified processors in a pipeline exist on all nodes in the cluster.
2016-03-07 14:44:50 +01:00
DeDe Morton 6b52b0bdc3 Ingest node edits 2016-03-03 22:29:27 -08:00
Martijn van Groningen ddc2b60c4b docs: fix incorrect grok pattern parameter names
Closes 
2016-02-26 06:38:25 -08:00
DeDe Morton d3e3f19a1f Change topic order 2016-02-12 15:00:07 -08:00
DeDe Morton 2734737d55 Add ingest docs to the build 2016-02-11 14:16:56 -08:00