Commit Graph

5394 Commits

Przemko Robakowski a7f0c699cf
Fix ignore_missing in CsvProcessor (#51600) (#51609)
This change fixes inverted logic around ignore_missing in CsvProcessor
2020-01-29 14:58:23 +01:00
Ioannis Kakavas ba3051a50f
Mute Netty4ClientYamlTestSuiteIT in FIPS 140 (#51536)
rest-api-spec/test/10_basic.yml checks that transport_types is
`netty4`, but we run the FIPS 140 tests with the default distribution, where
transport_types is `security4`
2020-01-29 08:16:47 +02:00
Gordon Brown 89c2834b24
Deprecate creation of dot-prefixed index names except for hidden and system indices (#49959)
This commit deprecates the creation of dot-prefixed index names (e.g.
.watches) unless they are either 1) a hidden index, or 2) registered by
a plugin that extends SystemIndexPlugin. This is the first step
towards more thorough protections for system indices.

This commit also modifies several plugins which use dot-prefixed indices
to register indices they own as system indices, and adds a plugin to
register .tasks as a system index.
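
As a rough illustration of the intended pattern (the index name is made up, and this assumes the `index.hidden` setting from the hidden indices work), creating a dot-prefixed index explicitly as a hidden index should not trigger the deprecation warning:
```
PUT /.my-dot-index
{
  "settings": {
    "index.hidden": true
  }
}
```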
2020-01-28 10:01:16 -07:00
Henning Andersen 9085024e1d Disable reindex against 0.90 on mac (#51449)
We still test remote reindex against version 0.90. This failed on Mac a
few times, and rather than spend time investigating it, we no longer
test remote reindex against 0.90 on Mac.

Closes #51202
2020-01-27 12:42:12 +01:00
Ioannis Kakavas ee202a642f
Enable tests in FIPS 140 in JDK 11 (#49485)
This change alters how we run our test suites in
JVMs configured in FIPS 140 approved mode. It does so by:

- Configuring any given runtime Java in FIPS mode with the bundled
policy and security properties files, setting the system
properties java.security.properties and java.security.policy
with the == operator that overrides the default JVM properties
and policy.

- When the runtime Java is 11 or higher, using the BouncyCastle FIPS
Cryptographic provider and BCJSSE in FIPS mode. These are
used as testRuntime dependencies for unit
tests and internal clusters, and the relevant jars are copied
explicitly to the lib directory of the testclusters used in REST tests.

- When the runtime Java is 8, using the BouncyCastle FIPS
Cryptographic provider and SunJSSE in FIPS mode.

Running the tests in FIPS 140 approved mode doesn't require any
additional configuration, either in CI workers or locally, and is
controlled by specifying -Dtests.fips.enabled=true
2020-01-27 11:14:52 +02:00
Przemko Robakowski 3fb7ad0e67
[7.x] Refactor ForEachProcessor to use iteration instead of recursion (#51104) (#51322)
* Refactor ForEachProcessor to use iteration instead of recursion (#51104)

* Refactor ForEachProcessor to use iteration instead of recursion

This change makes ForEachProcessor iterative while remaining non-blocking.
For non-async processors we use a single for loop and no recursion at all.
For async processors we continue the work on either the current thread or the thread
started by the downstream processor, whichever is slower (usually the processor thread).
Everything is synchronised by a single atomic variable.

Relates #50514

* Update IngestCommonPlugin.java
2020-01-22 20:03:37 +01:00
Stuart Tettemer 41c15b438d
Scripting: Add char position of script errors (#51069) (#51266)
Add the character position of a scripting error to error responses.

The contents of the `position` field are experimental and subject to
change. Currently, `offset` refers to the character location where the
error was encountered, while `start` and `end` define a range of characters
that contains the error.

e.g.
```
{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [
          "y = x;",
          "     ^---- HERE"
        ],
        "script": "def x = new ArrayList(); Map y = x;",
        "lang": "painless",
        "position": {
          "offset": 33,
          "start": 29,
          "end": 35
        }
      }
```

Refs: #50993
2020-01-21 13:45:59 -07:00
Nik Everett ca15a3f5a8
Add "did you mean" to unknown queries (#51177) (#51254)
This replaces the message we return for unknown queries with the standard
one that we use for unknown fields from `ObjectParser`. This is nice
because it includes "did you mean". One day we might convert parsing
queries to using object parser, but that looks complex. This change is
much smaller and seems useful.
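
For example, a misspelled query name might now come back with a response along these lines (the exception type and exact wording here are illustrative, not copied from the implementation):
```
{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "unknown query [match_phrse] did you mean [match_phrase]?"
      }
    ],
    "type": "parsing_exception",
    "reason": "unknown query [match_phrse] did you mean [match_phrase]?"
  },
  "status": 400
}
```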
2020-01-21 12:45:52 -05:00
Marios Trivyzas fda25ed04a
Fix caching for PreConfiguredTokenFilter (#50912) (#51091)
The PreConfiguredTokenFilter#singletonWithVersion uses the version
internally for the token filter factories, but it registers only one
instance in the cache and not one instance per version. This can lead
to exceptions like the one described in #50734, since the singleton is
created and cached using the creation version of the first index
that is processed.

Remove the singletonWithVersion() methods and use the
elasticsearchVersion() methods instead.

Fixes: #50734
(cherry picked from commit 24e1858)
2020-01-16 13:58:02 +01:00
Martijn van Groningen 02dfd71efa
Backport: Add pipeline name to ingest metadata (#51050)
Backport: #50467

This commit adds the name of the current pipeline to ingest metadata.
This pipeline name is accessible under the following key: '_ingest.pipeline'.

Example usage in pipeline:
PUT /_ingest/pipeline/2
{
    "processors": [
        {
            "set": {
                "field": "pipeline_name",
                "value": "{{_ingest.pipeline}}"
            }
        }
    ]
}

Closes #42106
2020-01-16 10:50:47 +01:00
Nik Everett fc5fde7950
Add "did you mean" to ObjectParser (#50938) (#50985)
Check it out:
```
$ curl -u elastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_update/foo?pretty -d'{
  "dac": {}
}'

{
  "error" : {
    "root_cause" : [
      {
        "type" : "x_content_parse_exception",
        "reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?"
      }
    ],
    "type" : "x_content_parse_exception",
    "reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?"
  },
  "status" : 400
}
```

The tricky thing about implementing this is that x-content doesn't
depend on Lucene. So this works by creating an extension point for the
error message using SPI. Elasticsearch's server module provides the
"spell checking" implementation.
2020-01-14 17:53:41 -05:00
Christoph Büscher 2f13751bad
Deprecate and remove camel-case nGram and edgeNGram tokenizers (#50862) (#50991)
We deprecated and removed the camel-case versions of the nGram and edgeNGram
filters a while ago, and we should do the same with the nGram and edgeNGram tokenizers.
This PR deprecates the use of these names in favour of ngram and edge_ngram on
version 7 indices. Usage will then be disallowed on new indices starting with version 8.
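
For reference, a sketch of an index definition using the preferred tokenizer name (index, tokenizer and analyzer names are placeholders):
```
PUT /my-index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_grams": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 3
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_grams"
        }
      }
    }
  }
}
```

Declaring the tokenizer type as `nGram` instead now emits a deprecation warning on version 7 indices.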
2020-01-14 21:42:34 +01:00
Alan Woodward 4974f56b25 Fix analysis BWC tests - warnings now emitted on index creation 2020-01-14 14:48:40 +00:00
Alan Woodward 8c16725a0d Check for deprecations when analyzers are built (#50908)
Generally speaking, deprecated analysis components in elasticsearch will issue deprecation
warnings when they are first used. However, this means that no warnings are emitted when
indexes are created with deprecated components, and users have to actually index a document
to see warnings. This makes it much harder to see these warnings and act on them at
appropriate times.

This is worse in the case where components throw exceptions on upgrade. In this case, users
will not be aware of a problem until a document is indexed, instead of at index creation time.

This commit adds a new check that pushes an empty string through all user-defined analyzers
and normalizers when an IndexAnalyzers object is built for each index; deprecation warnings
and exceptions are now emitted when indexes are created or opened.

Fixes #42349
2020-01-14 13:52:02 +00:00
Jake Landis de6f132887
[7.x] Foreach processor - fork recursive call (#50514) (#50773)
A very large number of recursive calls can cause a stack overflow
exception. This commit forks the recursive calls for non-async
processors. Once forked, each thread will handle at most 10
recursive calls to help keep the stack size and thread count
down to a reasonable size.
2020-01-09 13:21:18 -06:00
Christoph Büscher b1b4282273 Make Multiplexer inherit filter chains analysis mode (#50662)
Currently, if an updateable synonym filter is included in a multiplexer filter,
it is not reloaded via _reload_search_analyzers because the multiplexer
itself doesn't pass on the analysis mode of the filters it contains, so it's not
recognized as "updateable" in itself. Instead we can check and merge the
AnalysisMode settings of all filters in the multiplexer and use the resulting
mode (e.g. search-time only) for the multiplexer itself, thus making any synonym
filters contained in it reloadable. This, of course, will also make the
analyzers using the multiplexer usable at search time only.

Closes #50554
2020-01-08 22:12:01 +01:00
Henning Andersen 125feecabc
Guess root cause support unwrap (#50525) (#50742)
ElasticsearchException.guessRootCauses would return the wrapper exception if
the inner exception was not an ElasticsearchException. It is now fixed to never
return wrapper exceptions.

At least the following APIs change root_cause.0.type as a result:

_update with bad script
_index with bad pipeline

Relates #50417
2020-01-08 19:09:14 +01:00
Adrien Grand 4f2299c714
Upgrade to Lucene 8.4.0. (#50518) (#50750) 2020-01-08 18:53:59 +01:00
Adrien Grand 31158ab3d5
Add per-field metadata. (#50333)
This PR adds per-field metadata that can be set in the mappings and is later
returned by the field capabilities API. This metadata is completely opaque to
Elasticsearch but may be used by tools that index data in Elasticsearch to
communicate metadata about fields with tools that then search this data. A
typical example that has been requested in the past is the ability to attach
a unit to a numeric field.

In order to not bloat the cluster state, Elasticsearch requires that this
metadata be small:
 - keys can't be longer than 20 chars,
 - values can only be numbers or strings of no more than 50 chars - no inner
   arrays or objects,
 - the metadata can't have more than 5 keys in total.

Given that metadata is opaque to Elasticsearch, field capabilities don't try to
do anything smart when merging metadata about multiple indices; the union of
all field metadata is returned.

Here is how the meta might look in mappings:

```json
{
  "properties": {
    "latency": {
      "type": "long",
      "meta": {
        "unit": "ms"
      }
    }
  }
}
```

And then in the field capabilities response:

```json
{
  "latency": {
    "long": {
      "searchable": true,
      "aggreggatable": true,
      "meta": {
        "unit": [ "ms" ]
      }
    }
  }
}
```

When there are no conflicts, values are arrays of size 1, but when there are
conflicts, Elasticsearch includes all unique values in this array, without
giving a way to know which index has which metadata value:

```json
{
  "latency": {
    "long": {
      "searchable": true,
      "aggreggatable": true,
      "meta": {
        "unit": [ "ms", "ns" ]
      }
    }
  }
}
```

Closes #33267
2020-01-08 16:21:18 +01:00
Alexander Reelsen 71054d269b Sync grok patterns with logstash patterns (#50381)
In order to ensure that Logstash and Elasticsearch are able to understand
the same patterns, this commit adapts to changes in Logstash, adds a few
patterns, and changes a few others.
2020-01-08 14:59:34 +01:00
Mayya Sharipova 0b7309ec9c Fix NPE bug inner_hits (#50709)
When there are several subqueries on different relations of the join field,
and only one of the subqueries uses inner_hits, an NPE occurs.
This PR prevents the NPE.
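
A query shape along these lines illustrates the scenario the fix targets — only one of the two has_child clauses on the join field declares inner_hits (index name and relation names are made up):
```
GET /my-index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "has_child": {
            "type": "answer",
            "query": { "match_all": {} },
            "inner_hits": {}
          }
        },
        {
          "has_child": {
            "type": "comment",
            "query": { "match_all": {} }
          }
        }
      ]
    }
  }
}
```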

Closes #50539
2020-01-07 14:21:54 -05:00
Alan Woodward a3ab7eb95d Correctly handle MSM for nested disjunctions (#50669)
With the rewrite of the percolator's QueryAnalyzer to use Lucene's QueryVisitor API,
term queries that are direct children of a boolean query are handled separately from
other children. This works fine for conjunctions, but for disjunctions we need to
treat the extracted terms from these direct descendants along with extractions from
more deeply nested children to ensure that minimum-should-match requirements
are met correctly.

This commit changes the logic in QueryAnalyzer#getResult() to bundle child term
results with all other results before handling them.

Fixes #50305
2020-01-07 09:32:30 +00:00
Nik Everett 45663ac1a8
Use Void context on parsers where possible (#50573) (#50617)
*Most* of our parsing can be done without passing any extra context into
the parser that isn't already part of the xcontent stream. While I was
looking around at the places that *do* need a context I found a few
places that were declared to need a context but don't actually need it.
2020-01-03 13:28:55 -05:00
Nik Everett 4d58656065
Declare remaining parsers `final` (#50571) (#50615)
We have about 800 `ObjectParsers` in Elasticsearch, about 700 of which
are final. This is *probably* the right way to declare them because in
practice we never mutate them after they are built. And we certainly
don't change the static reference. Anyway, this adds `final` to these
parsers.

I found the non-final parsers with this:
```
diff \
  <(find . -type f -name '*.java' -exec grep -iHe 'static.*PARSER\s*=' {} \+ | sort) \
  <(find . -type f -name '*.java' -exec grep -iHe 'static.*final.*PARSER\s*=' {} \+ | sort) \
  2>&1 | grep '^<'
```
2020-01-03 11:48:11 -05:00
Nik Everett b36a8ab141
Make some ObjectParsers final (#50471) (#50556)
We have about 800 `ObjectParsers` in Elasticsearch, about 700 of which
are final. This is *probably* the right way to declare them because in
practice we never mutate them after they are built. And we certainly
don't change the static reference. Anyway, this adds `final` to a bunch
of these parsers, mostly the ones in xpack and their "paired" parsers in
the high level rest client. I picked these just to have somewhere to
break up the change so it wouldn't be huge.

I found the non-final parsers with this:
```
diff \
  <(find . -type f -name '*.java' -exec grep -iHe 'static.*PARSER\s*=' {} \+ | sort) \
  <(find . -type f -name '*.java' -exec grep -iHe 'static.*final.*PARSER\s*=' {} \+ | sort) \
  2>&1 | grep '^<'
```
2020-01-02 10:47:38 -05:00
Christoph Büscher 6258d25458
Log deprecation for nGram and edgeNGram custom filters (#50376) (#50445)
The camel-case `nGram` and `edgeNGram` filter names were deprecated in 6. We
currently throw errors on new indices when they are used. However, these errors
are currently only thrown for pre-configured filters; adding them as custom
filters doesn't trigger the warning and error. This change adds the appropriate
deprecation warnings for `nGram` and `edgeNGram` respectively on version 7
indices.
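
Concretely, defining a custom filter with the camel-case type, as sketched below, now emits the deprecation warning on version 7 indices just like the pre-configured filter does (index and filter names are placeholders):
```
PUT /my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_grams": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 3
        }
      }
    }
  }
}
```

Using `ngram` (or `edge_ngram`) as the type avoids the warning.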

Relates #50360
2019-12-20 22:00:08 +01:00
Stuart Tettemer f212994c16
[TEST] Unknown scripting annotations raise error (#50343) (#50346)
Ensure that unknown annotations, such as a typo'd `@nondeterministic`,
will raise an exception.
2019-12-19 16:22:22 -07:00
Stuart Tettemer 689df1f28f
Scripting: ScriptFactory not required by compile (#50344) (#50392)
Avoid backwards incompatible changes for 8.x and 7.6 by removing type
restriction on compile and Factory.  Factories may optionally implement
ScriptFactory.  If so, then they can indicate determinism and thus
cacheability.

**Backport**

Relates: #49466
2019-12-19 12:50:25 -07:00
Stuart Tettemer 06a24f09cf
Scripting: Cache script results if deterministic (#50106) (#50329)
Cache results from queries that use scripts if they use only
deterministic API calls.  Nondeterministic API calls are marked in the
whitelist with the `@nondeterministic` annotation.  Examples are
`Math.random()` and `new Date()`.
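
As a rough sketch, a filter script like the one below uses only deterministic calls, so its results become candidates for caching, whereas a script calling `Math.random()` or `new Date()` would not (index and field names are made up):
```
GET /my-index/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "lang": "painless",
            "source": "doc['price'].value > params.threshold",
            "params": { "threshold": 10 }
          }
        }
      }
    }
  }
}
```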

Refs: #49466
2019-12-18 13:00:42 -07:00
Przemko Robakowski 0efb241b3c
Fix flakiness in CsvProcessorTests (#50254) (#50256)
There's flakiness in CsvProcessorTests, where tests fail if the random document generator adds a field that should not be present. This change removes these problematic fields from the generated document.

Closes #50209
2019-12-17 01:15:15 +01:00
Ignacio Vera b5ec227de8
upgrade to lucene 8.4.0-snapshot-08b8d116f8f (#50129) (#50132) 2019-12-12 13:13:37 +01:00
Armin Braun 6eee41e253
Remove Unused Single Delete in BlobStoreRepository (#50024) (#50123)
* Remove Unused Single Delete in BlobStoreRepository

There are no more production uses of the non-bulk delete or the delete that throws
on missing so this commit removes both these methods.
Only the bulk delete logic remains. Where the bulk delete was derived from single deletes,
the single delete code was inlined into the bulk delete method.
Where single delete was used in tests it was replaced by bulk deleting.
2019-12-12 11:17:46 +01:00
Przemko Robakowski 4619834b97
[7.x] CSV ingest processor (#49509) (#50083)
* CSV ingest processor (#49509)

This change adds a new ingest processor that breaks a line from a CSV file into separate fields.
By default it conforms to RFC 4180 but can be tweaked.
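
A minimal example of the new processor might look like this (pipeline and field names are placeholders; the "tweaks" mentioned above include further options such as a custom separator and quote character):
```
PUT /_ingest/pipeline/parse-csv
{
  "processors": [
    {
      "csv": {
        "field": "message",
        "target_fields": ["user", "age", "city"],
        "ignore_missing": true
      }
    }
  ]
}
```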

Closes #49113
2019-12-11 23:06:05 +01:00
Jack Conradson eb20db8a1c Update Painless AST Catch Node (#50044)
This makes two changes to the catch node:

1. Use SDeclaration to replace independent variable usage.
2. Use a DType to set a "minimum" exception type - this allows us to require
users to continue using Exception as the "minimum" type for catch blocks, but
for us to internally catch Error/Throwable. This is a required step toward
removing custom try/catch blocks from SClass.
2019-12-10 12:56:34 -08:00
Adrien Grand 87e72156ce
Upgrade to lucene 8.4.0-snapshot-662c455. (#50016) (#50039)
Lucene 8.4 is about to be released so we should check it doesn't cause problems
with Elasticsearch.
2019-12-10 18:04:58 +01:00
Alan Woodward 3d8c2f9e18 Fix query analyzer logic for mixed conjunctions of terms and ranges (#49803)
When the query analyzer examines a conjunction containing both terms and ranges,
it should only include ranges in the minimum_should_match calculation if there are no
other range queries on that same field within the conjunction. This is because we cannot
build a selection query over disjoint ranges on the same field, and it is not easy to check
if two range queries have an overlap.

The current logic to calculate this just sets minimum_should_match to 1 or 0, dependent
on whether or not the current range is over a field that has already been seen. However, this
can be incorrect in the case that there are terms in the same match group which adjust the
minimum_should_match downwards. Instead, the logic should be changed to match the
terms extraction, whereby we adjust minimum_should_match downwards if we have already
seen a range field.

Fixes #49684
2019-12-10 11:01:52 +00:00
Przemko Robakowski d7083a84f4
Allow list of IPs in geoip ingest processor (#49573) (#49947)
* Allow list of IPs in geoip ingest processor

This change lets you use an array of IPs in addition to a string in the geoip processor source field.
It will set an array containing geoip data for each element in the source, unless the first_only parameter
is enabled, in which case only the first match found will be returned.
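
A sketch of the new usage, assuming a source field holding an array of IPs (pipeline and field names are made up):
```
PUT /_ingest/pipeline/geoip-list
{
  "processors": [
    {
      "geoip": {
        "field": "source_ips",
        "first_only": false
      }
    }
  ]
}
```

With first_only disabled, the target field holds one geoip entry per element of source_ips.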

Closes #46193
2019-12-07 00:19:09 +01:00
Stuart Tettemer 17cda5b2c0
Scripting: Groundwork for caching script results (#49895) (#49944)
In order to cache script results in the query shard cache, we need to
check if scripts are deterministic. This change adds a default method,
`isResultDeterministic() -> false`, to the script factories, which is
used by the `QueryShardContext`.

Script results were never cached and that does not change here.  Future
changes will implement this method based on whether the results of the
scripts are deterministic or not and therefore cacheable.

Refs: #49466

**Backport**
2019-12-06 15:08:05 -07:00
Jake Landis 1c5a139968
Update jackson-databind to 2.8.11.4 (#49347) (#49937) 2019-12-06 13:39:33 -06:00
Henning Andersen 1d3feaf18e
Reindex sort deprecation warning take 2 (#49855) (#49899)
Moved the deprecation warning to ReindexValidator to ensure it runs
early and works with resilient reindex. Also check that the warning
is reported back for wait_for_completion=false.

Follow-up to #49458
2019-12-06 09:44:36 +01:00
Jack Conradson cd3744c0b7 Add nodes to handle types (#49785)
This PR adds 3 nodes to handle types defined by a front-end creating a 
Painless AST. These types are decided with data immutability in mind - 
hence the reason for more than a single node.
2019-12-05 17:09:19 -08:00
Zachary Tong fec882a457 Decouple pipeline reductions from final agg reduction (#45796)
Historically only two things happened in the final reduction:
empty buckets were filled, and pipeline aggs were reduced (since it
was the final reduction, this was safe).  Usage of the final reduction
is growing, however. Auto-date-histo might need to perform
many reductions on final-reduce to merge down buckets, CCS
may need to side-step the final reduction if sending to a
different cluster, etc.

Having pipelines generate their output in the final reduce was
convenient, but is becoming increasingly difficult to manage
as the rest of the agg framework advances.

This commit decouples pipeline aggs from the final reduction by
introducing a new "top level" reduce, which should be called
at the beginning of the reduce cycle (e.g. from the SearchPhaseController).
This will only reduce pipeline aggs on the final reduce after
the non-pipeline agg tree has been fully reduced.

By separating pipeline reduction into its own set of methods,
aggregations are free to use the final reduction for whatever
purpose without worrying about generating pipeline results
which are non-reducible.
2019-12-05 16:11:54 -05:00
Jack Conradson 687c6648d9 Minor Painless Clean Up (#49844)
This cleans up two minor things.
- Cleans up style of == false
- Pulls maxLoopCounter into a member variable instead of accessing
CompilerSettings multiple times in the SFunction node
2019-12-05 12:20:07 -08:00
Stuart Tettemer 426c7a5e8f
Scripting: add available languages & contexts API (#49652) (#49815)
Adds `GET /_script_language` to support Kibana dynamic scripting
language selection.

Response contains whether `inline` and/or `stored` scripts are
enabled as determined by the `script.allowed_types` settings.

For each scripting language registered, such as `painless`,
`expression`, `mustache` or custom, available contexts for the language
are included as determined by the `script.allowed_contexts` setting.

Response format:
```
{
  "types_allowed": [
    "inline",
    "stored"
  ],
  "language_contexts": [
    {
      "language": "expression",
      "contexts": [
        "aggregation_selector",
        "aggs"
        ...
      ]
    },
    {
      "language": "painless",
      "contexts": [
        "aggregation_selector",
        "aggs",
        "aggs_combine",
        ...
      ]
    }
...
  ]
}
```

Fixes: #49463 

**Backport**
2019-12-04 16:18:22 -07:00
Jack Conradson dbf6183469 Remove extraneous pass (#49797)
This removes the storeSettings pass, where nodes in the AST could store
information they needed out of CompilerSettings for use during later
passes. CompilerSettings is part of ScriptRoot, which is available during the
analysis pass, making the storeSettings pass redundant.
2019-12-04 12:18:04 -08:00
Armin Braun 91ac87d75b
Stop Allocating Buffers in CopyBytesSocketChannel (#49825) (#49832)
* Stop Allocating Buffers in CopyBytesSocketChannel (#49825)

The way things currently work, we read up to 1M from the channel
and then potentially force all of it into the `ByteBuf` passed
by Netty. Since that `ByteBuf` tends to by default be `64k` in size,
large reads will force the buffer to grow, completely circumventing
the logic of `allocHandle`.

This seems like it could break
`io.netty.channel.RecvByteBufAllocator.Handle#continueReading`
since that method for the fixed-size allocator does check
whether the last read was equal to the attempted read size.
So if we set `64k` because that's what the buffer size is,
and then write `1M` to the buffer, we will stop reading on the IO loop,
even though the channel may still have bytes that we can read right away.

More importantly though, this can lead to running out of memory quite easily
under IO pressure, as we are forcing the heap buffers passed to the read
to `reallocate`.

Closes #49699
2019-12-04 19:36:52 +01:00
Armin Braun 996cddd98b
Stop Copying Every Http Request in Message Handler (#44564) (#49809)
* Copying the request is not necessary here. We can simply release it once the response has been generated, and save a lot of `Unpooled` allocations that way
* Relates #32228
   * I think the issue that prevented that PR from being merged was solved by #39634, which moved the bulk index marker search to ByteBuf bulk access, so the composite buffer shouldn't require many additional bounds checks (I'd argue the bounds checks we add, we save when copying the composite buffer)
* I couldn't necessarily reproduce much of a speedup from this change, but I could reproduce a very measurable reduction in GC time with e.g. Rally's PMC (a 4g heap node and bulk requests of size 5k saw a reduction in young GC time by ~10% for me)
2019-12-04 08:41:42 +01:00
Jason Tedor 0f27c0b702
Extend systemd timeout during startup (#49784)
When we are notifying systemd that we are fully started up, it can be
that we do not notify systemd before its default timeout of sixty
seconds elapses (e.g., if we are upgrading on-disk metadata). In this
case, we need to notify systemd to extend this timeout so that we are
not abruptly terminated. We do this by repeatedly sending
EXTEND_TIMEOUT_USEC to extend the timeout by thirty seconds; we do this
every fifteen seconds. This will prevent systemd from abruptly
terminating us during a long startup. We cancel the scheduled execution
of this notification after we have successfully started up.
2019-12-03 14:25:45 -05:00
Henning Andersen 5adb33ec17
Deprecate sorting in reindex (#49458) (#49738)
Reindex sort never gave a guarantee about the order of documents being
indexed into the destination, though it could give a sense of locality
of source data.

It prevents us from doing resilient reindex and other optimizations and
it has therefore been deprecated.
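
For reference, the now-deprecated usage is a sort inside the reindex source (index names are placeholders):
```
POST /_reindex
{
  "source": {
    "index": "source-index",
    "sort": { "@timestamp": "desc" }
  },
  "dest": {
    "index": "dest-index"
  }
}
```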

Related to #47567
2019-12-01 19:24:27 +01:00
Henning Andersen 1d745f1e5c Revert "Deprecate sorting in reindex (#49458)"
This reverts commit 27d45c9f1f.
2019-11-29 22:08:19 +01:00