Commit Graph

5365 Commits

Author SHA1 Message Date
Przemko Robakowski 0efb241b3c
Fix flakiness in CsvProcessorTests (#50254) (#50256)
CsvProcessorTests is flaky: tests fail if the random document generator adds a field that should not be present. This change removes these problematic fields from the generated document.

Closes #50209
2019-12-17 01:15:15 +01:00
Ignacio Vera b5ec227de8
upgrade to lucene 8.4.0-snapshot-08b8d116f8f (#50129) (#50132) 2019-12-12 13:13:37 +01:00
Armin Braun 6eee41e253
Remove Unused Single Delete in BlobStoreRepository (#50024) (#50123)
* Remove Unused Single Delete in BlobStoreRepository

There are no more production uses of the non-bulk delete or the delete that throws
on missing so this commit removes both these methods.
Only the bulk delete logic remains. Where the bulk delete was derived from single deletes,
the single delete code was inlined into the bulk delete method.
Where single delete was used in tests it was replaced by bulk deleting.
2019-12-12 11:17:46 +01:00
Przemko Robakowski 4619834b97
[7.x] CSV ingest processor (#49509) (#50083)
* CSV ingest processor (#49509)

This change adds a new ingest processor that breaks a line from a CSV file into separate fields.
By default it conforms to RFC 4180 but can be tweaked.

Closes #49113
2019-12-11 23:06:05 +01:00
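The line-splitting behaviour described in the CSV ingest processor commit above (4619834b97) can be illustrated with a minimal sketch. It uses Python's standard `csv` module as a stand-in for RFC 4180 parsing; it is not the processor's implementation, and the names `split_csv_line`, `target_fields`, `separator` and `quote` are hypothetical.

```python
import csv
import io


def split_csv_line(line, target_fields, separator=",", quote='"'):
    """Split one CSV line and map its values onto target_fields by position.

    Hypothetical helper: parameter names are illustrative only.
    """
    reader = csv.reader(io.StringIO(line), delimiter=separator, quotechar=quote)
    values = next(reader, [])
    # zip stops at the shorter sequence, so surplus values or surplus
    # field names are silently ignored in this sketch.
    return dict(zip(target_fields, values))


print(split_csv_line('a,"b,c",d', ["x", "y", "z"]))
print(split_csv_line("1;2", ["left", "right"], separator=";"))
```

Quoted values may contain the separator, and a doubled quote inside a quoted value stands for a literal quote, which is the RFC 4180 convention the `csv` module follows by default.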
Jack Conradson eb20db8a1c Update Painless AST Catch Node (#50044)
This makes two changes to the catch node:

1. Use SDeclaration to replace independent variable usage.
2. Use a DType to set a "minimum" exception type - this allows us to require 
users to continue using Exception as "minimum" type for catch blocks, but 
for us to internally catch Error/Throwable. This is a required step to 
removing custom try/catch blocks from SClass.
2019-12-10 12:56:34 -08:00
Adrien Grand 87e72156ce
Upgrade to lucene 8.4.0-snapshot-662c455. (#50016) (#50039)
Lucene 8.4 is about to be released so we should check it doesn't cause problems
with Elasticsearch.
2019-12-10 18:04:58 +01:00
Alan Woodward 3d8c2f9e18 Fix query analyzer logic for mixed conjunctions of terms and ranges (#49803)
When the query analyzer examines a conjunction containing both terms and ranges,
it should only include ranges in the minimum_should_match calculation if there are no
other range queries on that same field within the conjunction. This is because we cannot
build a selection query over disjoint ranges on the same field, and it is not easy to check
if two range queries have an overlap.

The current logic to calculate this just sets minimum_should_match to 1 or 0, dependent
on whether or not the current range is over a field that has already been seen. However, this
can be incorrect in the case that there are terms in the same match group which adjust the
minimum_should_match downwards. Instead, the logic should be changed to match the
terms extraction, whereby we adjust minimum_should_match downwards if we have already
seen a range field.

Fixes #49684
2019-12-10 11:01:52 +00:00
Przemko Robakowski d7083a84f4
Allow list of IPs in geoip ingest processor (#49573) (#49947)
* Allow list of IPs in geoip ingest processor

This change lets you use an array of IPs, in addition to a string, in the geoip processor's source field.
The processor sets an array containing geoip data for each element in the source, unless the first_only
option is enabled, in which case only the first result found is returned.

Closes #46193
2019-12-07 00:19:09 +01:00
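A rough sketch of the behaviour described in the geoip commit above (d7083a84f4), under stated assumptions: `lookup_all` and `lookup` are hypothetical names, the database is a plain dictionary with made-up entries, and representing an IP with no match as `None` is an assumption rather than something the commit message states.

```python
def lookup_all(source, lookup, first_only=False):
    """Resolve geo data for a single IP string or for a list of IPs.

    `lookup` is any callable that returns geo data for an IP, or None.
    """
    if isinstance(source, str):
        return lookup(source)
    if first_only:
        # Return the first IP that resolves and stop looking further.
        for ip in source:
            data = lookup(ip)
            if data is not None:
                return data
        return None
    # One result per source element, in source order.
    return [lookup(ip) for ip in source]


# Made-up database using documentation-range addresses.
fake_db = {"198.51.100.7": {"country": "Exampleland"}}
print(lookup_all(["192.0.2.1", "198.51.100.7"], fake_db.get))
print(lookup_all(["192.0.2.1", "198.51.100.7"], fake_db.get, first_only=True))
```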
Stuart Tettemer 17cda5b2c0
Scripting: Groundwork for caching script results (#49895) (#49944)
In order to cache script results in the query shard cache, we need to
check if scripts are deterministic.  This change adds a default method
to the script factories, `isResultDeterministic() -> false` which is
used by the `QueryShardContext`.

Script results were never cached and that does not change here.  Future
changes will implement this method based on whether the results of the
scripts are deterministic or not and therefore cacheable.

Refs: #49466

**Backport**
2019-12-06 15:08:05 -07:00
Jake Landis 1c5a139968
Update jackson-databind to 2.8.11.4 (#49347) (#49937) 2019-12-06 13:39:33 -06:00
Henning Andersen 1d3feaf18e
Reindex sort deprecation warning take 2 (#49855) (#49899)
Moved the deprecation warning to ReindexValidator to ensure it runs
early and works with resilient reindex. Also check that the warning
is reported back for wait_for_completion=false.

Follow-up to #49458
2019-12-06 09:44:36 +01:00
Jack Conradson cd3744c0b7 Add nodes to handle types (#49785)
This PR adds 3 nodes to handle types defined by a front-end creating a 
Painless AST. These types are decided with data immutability in mind - 
hence the reason for more than a single node.
2019-12-05 17:09:19 -08:00
Zachary Tong fec882a457 Decouple pipeline reductions from final agg reduction (#45796)
Historically only two things happened in the final reduction:
empty buckets were filled, and pipeline aggs were reduced (since it
was the final reduction, this was safe).  Usage of the final reduction
is growing however.  Auto-date-histo might need to perform
many reductions on final-reduce to merge down buckets, CCS
may need to side-step the final reduction if sending to a
different cluster, etc.

Having pipelines generate their output in the final reduce was
convenient, but is becoming increasingly difficult to manage
as the rest of the agg framework advances.

This commit decouples pipeline aggs from the final reduction by
introducing a new "top level" reduce, which should be called
at the beginning of the reduce cycle (e.g. from the SearchPhaseController).
This will only reduce pipeline aggs on the final reduce after
the non-pipeline agg tree has been fully reduced.

By separating pipeline reduction into their own set of methods,
aggregations are free to use the final reduction for whatever
purpose without worrying about generating pipeline results
which are non-reducible.
2019-12-05 16:11:54 -05:00
Jack Conradson 687c6648d9 Minor Painless Clean Up (#49844)
This cleans up two minor things.
- Cleans up style of == false
- Pulls maxLoopCounter into a member variable instead of accessing
CompilerSettings multiple times in the SFunction node
2019-12-05 12:20:07 -08:00
Stuart Tettemer 426c7a5e8f
Scripting: add available languages & contexts API (#49652) (#49815)
Adds `GET /_script_language` to support Kibana dynamic scripting
language selection.

Response contains whether `inline` and/or `stored` scripts are
enabled as determined by the `script.allowed_types` settings.

For each scripting language registered, such as `painless`,
`expression`, `mustache` or custom, available contexts for the language
are included as determined by the `script.allowed_contexts` setting.

Response format:
```
{
  "types_allowed": [
    "inline",
    "stored"
  ],
  "language_contexts": [
    {
      "language": "expression",
      "contexts": [
        "aggregation_selector",
        "aggs"
        ...
      ]
    },
    {
      "language": "painless",
      "contexts": [
        "aggregation_selector",
        "aggs",
        "aggs_combine",
        ...
      ]
    }
...
  ]
}
```

Fixes: #49463 

**Backport**
2019-12-04 16:18:22 -07:00
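A client consuming the response format shown in the commit above (426c7a5e8f) might read it roughly as follows. This is a sketch: the helper names `allows` and `contexts_for` are hypothetical, and the sample is trimmed to the values that appear in the commit message, with the elided entries left out rather than guessed.

```python
def allows(response, script_type):
    """True if the given script type ('inline' or 'stored') is enabled."""
    return script_type in response.get("types_allowed", [])


def contexts_for(response, language):
    """Return the contexts listed for a language, or [] if it is absent."""
    for entry in response.get("language_contexts", []):
        if entry.get("language") == language:
            return entry.get("contexts", [])
    return []


# Trimmed sample; the real response lists more contexts per language.
sample = {
    "types_allowed": ["inline", "stored"],
    "language_contexts": [
        {"language": "expression", "contexts": ["aggregation_selector", "aggs"]},
        {
            "language": "painless",
            "contexts": ["aggregation_selector", "aggs", "aggs_combine"],
        },
    ],
}

print(allows(sample, "stored"))
print(contexts_for(sample, "painless"))
```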
Jack Conradson dbf6183469 Remove extraneous pass (#49797)
This removes the storeSettings pass where nodes in the AST could store 
information they needed out of CompilerSettings for use during later 
passes. CompilerSettings is part of ScriptRoot which is available during the 
analysis pass making the storeSettings pass redundant.
2019-12-04 12:18:04 -08:00
Armin Braun 91ac87d75b
Stop Allocating Buffers in CopyBytesSocketChannel (#49825) (#49832)
* Stop Allocating Buffers in CopyBytesSocketChannel (#49825)

The way things currently work, we read up to 1M from the channel
and then potentially force all of it into the `ByteBuf` passed
by Netty. Since that `ByteBuf` tends to by default be `64k` in size,
large reads will force the buffer to grow, completely circumventing
the logic of `allocHandle`.

This seems like it could break
`io.netty.channel.RecvByteBufAllocator.Handle#continueReading`
since that method for the fixed-size allocator does check
whether the last read was equal to the attempted read size.
So if we set `64k` because that's what the buffer size is,
then write `1M` to the buffer we will stop reading on the IO loop,
even though the channel may still have bytes that we can read right away.

More importantly though, this can lead to running OOM quite easily
under IO pressure as we are forcing the heap buffers passed to the read
to `reallocate`.

Closes #49699
2019-12-04 19:36:52 +01:00
Armin Braun 996cddd98b
Stop Copying Every Http Request in Message Handler (#44564) (#49809)
* Copying the request is not necessary here. We can simply release it once the response has been generated and save a lot of `Unpooled` allocations that way
* Relates #32228
   * I think the issue that prevented that PR from being merged was solved by #39634, which moved the bulk index marker search to ByteBuf bulk access, so the composite buffer shouldn't require many additional bounds checks (I'd argue the bounds checks we add, we save when copying the composite buffer)
* I couldn't necessarily reproduce much of a speedup from this change, but I could reproduce a very measurable reduction in GC time with e.g. Rally's PMC (4g heap node and bulk requests of size 5k saw a reduction in young GC time by ~10% for me)
2019-12-04 08:41:42 +01:00
Jason Tedor 0f27c0b702
Extend systemd timeout during startup (#49784)
When we are notifying systemd that we are fully started up, it can be
that we do not notify systemd before its default timeout of sixty
seconds elapses (e.g., if we are upgrading on-disk metadata). In this
case, we need to notify systemd to extend this timeout so that we are
not abruptly terminated. We do this by repeatedly sending
EXTEND_TIMEOUT_USEC to extend the timeout by thirty seconds; we do this
every fifteen seconds. This will prevent systemd from abruptly
terminating us during a long startup. We cancel the scheduled execution
of this notification after we have successfully started up.
2019-12-03 14:25:45 -05:00
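The keep-alive scheme described in the systemd commit above (0f27c0b702) can be sketched as a small background task. This is an illustration, not the Elasticsearch implementation: `TimeoutExtender` and its `notify` callback are hypothetical, and waiting one full interval before the first notification is an assumption. The `EXTEND_TIMEOUT_USEC=<microseconds>` message is the standard systemd notification for extending a timeout.

```python
import threading


class TimeoutExtender:
    """Periodically asks the service manager for more startup time."""

    def __init__(self, notify, interval_s=15.0, extend_by_s=30.0):
        self.notify = notify            # callable that delivers the message
        self.interval_s = interval_s    # how often to send the extension
        self.extend_by_s = extend_by_s  # how much time each message requests
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Event.wait returns False on timeout and True once cancel() is called.
        while not self._stop.wait(self.interval_s):
            usec = int(self.extend_by_s * 1_000_000)
            self.notify(f"EXTEND_TIMEOUT_USEC={usec}")

    def cancel(self):
        """Stop sending extensions, e.g. after startup has completed."""
        self._stop.set()
        self._thread.join()
```

Because each message requests thirty seconds and is sent every fifteen, a single delayed or lost notification does not let the timeout lapse.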
Henning Andersen 5adb33ec17
Deprecate sorting in reindex (#49458) (#49738)
Reindex sort never gave a guarantee about the order of documents being
indexed into the destination, though it could give a sense of locality
of source data.

It prevents us from doing resilient reindex and other optimizations and
it has therefore been deprecated.

Related to #47567
2019-12-01 19:24:27 +01:00
Henning Andersen 1d745f1e5c Revert "Deprecate sorting in reindex (#49458)"
This reverts commit 27d45c9f1f.
2019-11-29 22:08:19 +01:00
Henning Andersen 27d45c9f1f Deprecate sorting in reindex (#49458)
Reindex sort never gave a guarantee about the order of documents being
indexed into the destination, though it could give a sense of locality
of source data.

It prevents us from doing resilient reindex and other optimizations and
it has therefore been deprecated.

Related to #47567
2019-11-29 21:35:11 +01:00
Armin Braun 813b49adb4
Make BlobStoreRepository Aware of ClusterState (#49639) (#49711)
* Make BlobStoreRepository Aware of ClusterState (#49639)

This is a preliminary to #49060.

It does not introduce any substantial behavior change to how the blob store repository
operates. What it does is to add all the infrastructure changes around passing the cluster service to the blob store, associated test changes and a best effort approach to tracking the latest repository generation on all nodes from cluster state updates. This brings a slight improvement to the consistency
by which non-master nodes (or master directly after a failover) will be able to determine the latest repository generation. It does not however do any tricky checks for the situation after a repository operation
(create, delete or cleanup) that could theoretically be used to get even greater accuracy to keep this change simple.
This change does not in any way alter the behavior of the blobstore repository other than adding a better "guess" for the value of the latest repo generation and is mainly intended to isolate the actual logical change to how the
repository operates in #49060
2019-11-29 14:57:47 +01:00
Mayya Sharipova 2dafecc398
Upgrade lucene to 8.4.0-snapshot-e648d601efb (#49641) 2019-11-28 11:59:58 -05:00
jimczi 35732504ba #49166 Fix spurious test failure 2019-11-28 11:08:15 +01:00
Jim Ferenczi d6445fae4b Add a cluster setting to disallow loading fielddata on _id field (#49166)
This change adds a dynamic cluster setting named `indices.id_field_data.enabled`.
When set to `false` any attempt to load the fielddata for the `_id` field will fail
with an exception. The default value in this change is set to `false` in order to prevent
fielddata usage on this field for future versions but it will be set to `true` when backporting
to 7x. When the setting is set to true (manually or by default in 7x) the loading will also issue
a deprecation warning since we want to disallow fielddata entirely when https://github.com/elastic/elasticsearch/issues/26472
is implemented.

Closes #43599
2019-11-28 09:35:28 +01:00
Martijn van Groningen 0a42395dfa
Backport: add templating support to pipeline processor (#49643)
Backport of #49030

This commit adds templating support to the pipeline processor's `name` option.

Closes #39955
2019-11-27 15:53:40 +01:00
Przemyslaw Gomulka 502873b144
[Java.time] Retain prefixed date pattern in formatter (#48703)
JavaDateFormatter should keep the pattern with the prefixed 8 as it will be used for serialisation. The stripped pattern should be used for the enclosed formatters.

closes #48698
2019-11-27 12:29:18 +01:00
Yannick Welsch bd007271cf Avoid double-wrapping allocator (#49534)
When using unpooled, the allocator is wrapped twice in a NoDirectBuffers.
2019-11-27 09:25:32 +01:00
Martijn van Groningen 90850f4ea0
Backport: Introduce on_failure_pipeline ingest metadata inside on_failure block (#49596)
Backport of #49076

In case an exception occurs inside a pipeline processor,
the pipeline stack is kept around as header in the exception.
Then in the on_failure processor the id of the pipeline the
exception occurred is made accessible via the `on_failure_pipeline`
ingest metadata.

Closes #44920
2019-11-27 07:52:08 +01:00
Jason Tedor 71bcfbf1e3
Replace required pipeline with final pipeline (#49470)
This commit enhances the required pipeline functionality by changing it
so that default/request pipelines can also be executed, but the required
pipeline is always executed last. This gives users the flexibility to
execute their own indexing pipelines, but also ensure that any required
pipelines are also executed. Since such pipelines are executed last, we
change the name of required pipelines to final pipelines.
2019-11-22 14:37:36 -05:00
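The execution order described in the final pipeline commit above (71bcfbf1e3) can be sketched as below. The function name `resolve_pipelines` is hypothetical, and the rule that a request pipeline takes precedence over the index default is an assumption of this sketch; the commit message only states that the final pipeline always runs last.

```python
def resolve_pipelines(request_pipeline, default_pipeline, final_pipeline):
    """Return the pipelines to run for one document, in execution order."""
    # Assumption: a pipeline named on the request overrides the index default.
    chosen = request_pipeline if request_pipeline is not None else default_pipeline
    # The final pipeline, when configured, is always appended last.
    return [p for p in (chosen, final_pipeline) if p is not None]


print(resolve_pipelines("my-request", "my-default", "my-final"))
print(resolve_pipelines(None, None, "my-final"))
```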
Henning Andersen 49bb5fb642 Netty4: switch to composite cumulator (#49478)
The default merge cumulator used in netty transport leads to additional
GC pressure and memory copying when a message that exceeds the chunk
size is handled. This is especially a problem on G1 GC, since we get
many "humongous" allocations and that can in theory cause real memory
circuit breaker to break unnecessarily.
2019-11-22 18:14:10 +01:00
Martijn van Groningen 2243743450
Update geolite2 database in ingest geoip plugin. (#49308)
Some tests were tweaked to deal with the updated database files.
2019-11-22 08:38:57 +01:00
Henning Andersen 0164de8579 Reindex search response fix again (#49423)
Fixed test case to more broadly accept all messages with "Partial
shards failure" in it, to hopefully catch all relevant search messages
now that reindex does not allow searching against red shards.

Closes #49295
2019-11-21 11:45:08 +01:00
Jack Conradson a780ec14f0 Painless: Upgrade ASM to 7.2 (#49263)
This upgrades Painless to use the latest ASM libraries providing support up 
to Java 14. Note the library is not published with the latest versions in an 
"all" package, so we pick up each lib independently that's required. There 
were some changes to the getType method that require descriptors to be 
used in place of internal class names.
2019-11-20 07:09:47 -08:00
Christoph Büscher 4ffa050735 Allow custom characters in token_chars of ngram tokenizers (#49250)
Currently the `token_chars` setting in both `edgeNGram` and `ngram` tokenizers
only allows for a list of predefined character classes, which might not fit
every use case. For example, including underscore "_" in a token would currently
require the `punctuation` class which comes with a lot of other characters.
This change adds an additional "custom" option to the `token_chars` setting,
which requires an additional `custom_token_chars` setting to be present and
which will be interpreted as a set of characters to include in a token.

Closes #25894
2019-11-20 10:37:12 +01:00
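To illustrate the `custom` option described in the commit above (4ffa050735), here is a simplified sketch of how a set of allowed character classes decides where tokens begin and end. It models only the `letter`, `digit` and `custom` classes, omits the n-gram step entirely, and `split_tokens` is a hypothetical name rather than anything from the tokenizer code.

```python
CLASS_TESTS = {"letter": str.isalpha, "digit": str.isdigit}


def split_tokens(text, token_chars, custom_token_chars=""):
    """Split text into runs of characters that belong to the allowed classes."""
    if "custom" in token_chars and not custom_token_chars:
        raise ValueError("'custom' requires custom_token_chars to be set")

    def keep(ch):
        for name in token_chars:
            if name == "custom":
                if ch in custom_token_chars:
                    return True
            elif CLASS_TESTS[name](ch):
                return True
        return False

    tokens, current = [], []
    for ch in text:
        if keep(ch):
            current.append(ch)
        elif current:
            # A disallowed character ends the current token.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(split_tokens("foo_bar baz", ["letter"]))
print(split_tokens("foo_bar baz", ["letter", "custom"], "_"))
```

With only `letter` allowed, the underscore splits `foo_bar` in two; adding `custom` with `_` keeps it as one token without pulling in the rest of the punctuation class.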
Alan Woodward c6b31162ba
Refactor percolator's QueryAnalyzer to use QueryVisitors
Lucene now allows us to explore the structure of a query using QueryVisitors,
delegating the knowledge of how to recurse through and collect terms to the
query implementations themselves. The percolator currently has a home-grown
external version of this API to construct sets of matching terms that must be
present in a document in order for it to possibly match the query.

This commit removes the home-grown implementation in favour of one using
QueryVisitor. This has the added benefit of making interval queries available
for percolator pre-filtering. Due to a bug in multi-term intervals (LUCENE-9050)
it also includes a clone of some of the lucene intervals logic, that can be removed
once upstream has been fixed.

Closes #45639
2019-11-20 09:21:01 +00:00
Mark Tozzi 17358b5af7
(refactor) Extract Empty/Script/Missing ValuesSource behavior to an interface (#48320) (#49330)
This is a pure code rearrangement refactor.  Logic for what specific ValuesSource instance to use for a given type (e.g. script or field) moved out of ValuesSourceConfig and into CoreValuesSourceType (previously just ValueSourceType; we extract an interface for future extensibility).  ValueSourceConfig still selects which case to use, and then the ValuesSourceType instance knows how to construct the ValuesSource for that case.
2019-11-19 16:44:29 -05:00
Ryan Ernst c6a8913c38 Fix java home validation usage by tasks (#49204)
Tasks intending to use a particular java home provided by JAVA<N>_HOME
use the getJavaHome method, which verifies the given java home is
available, or will be if the task will run. However, the verification
logic was broken, in addition to unnecessarily delaying retrieving the
java home until runtime. This commit fixes the verification logic to run
at either config time, delaying verification, or at runtime which
immediately checks if java home is available.

closes #49153
2019-11-19 10:30:19 -08:00
Henning Andersen bc29c9877a Reindex search response fix (#49301)
Fixed test case to also accept another error message, now that reindex
does not allow searching against red shards.

Closes #49295
2019-11-19 14:38:05 +01:00
Tanguy Leroux abed869ec6 Mute ReindexFailureTests.testResponseOnSearchFailure (#49298)
Relates #49295
2019-11-19 12:38:54 +01:00
Henning Andersen 2ac38fd315 Reindex and friends fail on RED shards (#45830)
Reindex, update by query and delete by query would silently disregard
RED/unavailable shards, thus not copying, updating or deleting matching
data in those shards. Now use `allow_partial_search_results=false` to
ensure these operations fail if the search crosses an unavailable shard.

Added the option to explicitly specify `allow_partial_search_results=true`
for reindex only (seemed too strange for update/delete by query).

Relates #45739 and #42612
2019-11-18 21:23:08 +01:00
gpaimla 7d20b50f45 Implement Lucene EstonianAnalyzer, Stemmer (#49149)
This PR adds a new analyzer and stemmer for the Estonian language.

Closes #48895
2019-11-18 17:24:21 +01:00
Jason Tedor 2bcdcb17cd
Introduce dedicated ingest processor exception (#48810)
Today we wrap exceptions that occur while executing an ingest processor
in an ElasticsearchException. Today, in ExceptionsHelper#unwrapCause we
only unwrap causes for exceptions that implement
ElasticsearchWrapperException, which the top-level
ElasticsearchException does not. Ultimately, this means that any
exception that occurs during processor execution does not have its cause
unwrapped, and so its status is blanket treated as a 500. This means
that while executing a bulk request with an ingest pipeline,
document-level failures that occur during a processor will cause the
status for that document to be treated as 500. Since that does not give
the client any indication that they made a mistake, it means some
clients will enter infinite retries, thinking that there is some
server-side problem that merely needs to clear. This commit addresses
this by introducing a dedicated ingest processor exception, so that its
causes can be unwrapped. While we could consider a broader change to
unwrap causes for more than just ElasticsearchWrapperExceptions, that is
a broad change with unclear implications. Since the problem of reporting
500s on client errors is a user-facing bug, we take the conservative
approach for now, and we can revisit the unwrapping in a future change.
2019-11-14 11:04:53 -05:00
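The unwrapping problem described in the commit above (2bcdcb17cd) can be modelled in a few lines. This is a Python analogy, not the Java code: all class and function names are hypothetical stand-ins, with `WrapperError` playing the role of the wrapper marker and `status` the HTTP status an exception maps to.

```python
class ServerError(Exception):
    """Generic top-level exception; not a wrapper, so it is never unwrapped."""
    status = 500


class WrapperError(ServerError):
    """Marker type: exceptions of this kind defer to their cause."""


class IngestProcessorError(WrapperError):
    """Dedicated exception for failures inside an ingest processor."""


class BadArgument(Exception):
    """A client mistake."""
    status = 400


def unwrap_cause(exc):
    # Only wrapper exceptions are peeled away; anything else is returned as is.
    while isinstance(exc, WrapperError) and exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


def status_of(exc):
    return getattr(unwrap_cause(exc), "status", 500)


def wrap(wrapper_type, cause):
    wrapped = wrapper_type("processor failed")
    wrapped.__cause__ = cause
    return wrapped


print(status_of(wrap(ServerError, BadArgument("bad field"))))
print(status_of(wrap(IngestProcessorError, BadArgument("bad field"))))
```

Wrapping in the generic type hides the client error behind a 500, while the dedicated wrapper type lets the cause's status surface.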
Rory Hunter c46a0e8708
Apply 2-space indent to all gradle scripts (#49071)
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
2019-11-14 11:01:23 +00:00
Henning Andersen 8835142ac9 Grok processor ignore case test (#48909)
Added test demonstrating that grok using ignore case works, since this
does a minimal test that the `joni` and `jcodings` libraries are
compatible.

Forward-port of test from #43334
2019-11-08 00:04:29 +01:00
Jason Tedor c82ecb664c
Do not wrap ingest processor exception with IAE (#48816)
The problem with wrapping here is that it converts any exception into an
IAE, which we treat as a client error (400 status) whereas the exception
being wrapped here could be a server error (e.g., NPE). This commit
stops wrapping all ingest processor exceptions as IAEs.
2019-11-01 15:11:35 -04:00
Mark Vieira 6ab4645f4e
[7.x] Introduce type-safe and consistent pattern for handling build globals (#48818)
This commit introduces a consistent and type-safe manner for handling
global build parameters throughout our build logic. Primarily this
replaces the existing usages of extra properties with static accessors.
It also introduces an explicit API for initialization and mutation of
any such parameters, as well as better error handling for uninitialized
or eager access of parameter values.

Closes #42042
2019-11-01 11:33:11 -07:00
Ioannis Kakavas 99aedc844d
Copy http headers to ThreadContext strictly (#45945) (#48675)
Previously, copying HTTP headers to the ThreadContext allowed
multiple HTTP headers with the same name, handling only the first
occurrence and disregarding the rest of the values. This can be
confusing when dealing with multiple headers, as it is not obvious
which value is read and which ones are silently dropped.

According to RFC-7230, a client must not send multiple header fields
with the same field name in a HTTP message, unless the entire field
value for this header is defined as a comma separated list or this
specific header is a well-known exception.

This commit changes the behavior in order to be more compliant with
the aforementioned RFC by requiring the classes that implement
ActionPlugin to declare if a header can be multi-valued or not when
registering this header to be copied over to the ThreadContext in
ActionPlugin#getRestHeaders.
If the header is allowed to be multivalued, then all such headers
are read from the HTTP request and their values get concatenated in
a comma-separated string.
If the header is not allowed to be multivalued, and the HTTP
request contains multiple such Headers with different values, the
request is rejected with a 400 status.
2019-10-31 23:05:12 +02:00
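The copying rules described in the commit above (99aedc844d) can be sketched as follows. The names `copy_headers`, `registered` and `BadRequest` are hypothetical, and header names are matched case-sensitively to keep the sketch short, which a real HTTP implementation would not do.

```python
class BadRequest(Exception):
    """Stands in for rejecting the request with a 400 status."""
    status = 400


def copy_headers(request_headers, registered):
    """Copy registered headers into a dict, applying the multi-value rule.

    request_headers: list of (name, value) pairs, duplicates allowed.
    registered: dict mapping header name -> True if it may be multi-valued.
    """
    collected = {}
    for name, value in request_headers:
        if name in registered:
            collected.setdefault(name, []).append(value)

    copied = {}
    for name, values in collected.items():
        if registered[name]:
            # Multi-valued header: concatenate all occurrences.
            copied[name] = ",".join(values)
        elif len(set(values)) > 1:
            # Single-valued header sent with conflicting values.
            raise BadRequest(f"multiple values for single-valued header [{name}]")
        else:
            copied[name] = values[0]
    return copied


rules = {"X-Tags": True, "X-Trace": False}
print(copy_headers([("X-Tags", "a"), ("X-Tags", "b"), ("X-Trace", "t1")], rules))
```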
Dan Hermann dbc05cd808
Add option to split processor for preserving trailing empty fields (#48685) 2019-10-30 08:25:03 -05:00