Commit Graph

5390 Commits

Author SHA1 Message Date
Jim Ferenczi d6445fae4b Add a cluster setting to disallow loading fielddata on _id field (#49166)
This change adds a dynamic cluster setting named `indices.id_field_data.enabled`.
When set to `false` any attempt to load the fielddata for the `_id` field will fail
with an exception. The default value in this change is set to `false` in order to prevent
fielddata usage on this field for future versions but it will be set to `true` when backporting
to 7x. When the setting is set to true (manually or by default in 7x) the loading will also issue
a deprecation warning since we want to disallow fielddata entirely when https://github.com/elastic/elasticsearch/issues/26472
is implemented.

Closes #43599
2019-11-28 09:35:28 +01:00
Martijn van Groningen 0a42395dfa
Backport: add templating support to pipeline processor (#49643)
Backport of #49030

This commit adds templating support to the pipeline processor's `name` option.

Closes #39955
2019-11-27 15:53:40 +01:00
Przemyslaw Gomulka 502873b144
[Java.time] Retain prefixed date pattern in formatter (#48703)
JavaDateFormatter should keep the pattern with the prefixed 8 as it will be used for serialisation. The stripped pattern should be used for the enclosed formatters.

closes #48698
2019-11-27 12:29:18 +01:00
Yannick Welsch bd007271cf Avoid double-wrapping allocator (#49534)
When using unpooled, the allocator is wrapped twice in a NoDirectBuffers.
2019-11-27 09:25:32 +01:00
Martijn van Groningen 90850f4ea0
Backport: Introduce on_failure_pipeline ingest metadata inside on_failure block (#49596)
Backport of #49076

In case an exception occurs inside a pipeline processor,
the pipeline stack is kept around as header in the exception.
Then in the on_failure processor the id of the pipeline the
exception occurred is made accessible via the `on_failure_pipeline`
ingest metadata.

Closes #44920
2019-11-27 07:52:08 +01:00
Jason Tedor 71bcfbf1e3
Replace required pipeline with final pipeline (#49470)
This commit enhances the required pipeline functionality by changing it
so that default/request pipelines can also be executed, but the required
pipeline is always executed last. This gives users the flexibility to
execute their own indexing pipelines, but also ensure that any required
pipelines are also executed. Since such pipelines are executed last, we
change the name of required pipelines to final pipelines.
2019-11-22 14:37:36 -05:00
Henning Andersen 49bb5fb642 Netty4: switch to composite cumulator (#49478)
The default merge cumulator used in netty transport leads to additional
GC pressure and memory copying when a message that exceeds the chunk
size is handled. This is especially a problem on G1 GC, since we get
many "humongous" allocations and that can in theory cause real memory
circuit breaker to break unnecessarily.
2019-11-22 18:14:10 +01:00
Martijn van Groningen 2243743450
Update geolite2 database in ingest geoip plugin. (#49308)
Some tests were tweaked to deal with the updated database files.
2019-11-22 08:38:57 +01:00
Henning Andersen 0164de8579 Reindex search response fix again (#49423)
Fixed test case to more broadly accept all messages with "Partial
shards failure" in it, to hopefully catch all relevant search messages
now that reindex does not allow searching against red shards.

Closes #49295
2019-11-21 11:45:08 +01:00
Jack Conradson a780ec14f0 Painless: Upgrade ASM to 7.2 (#49263)
This upgrades Painless to use the latest ASM libraries providing support up 
to Java 14. Note the library is not published with the latest versions in an 
"all" package, so we pick up each lib independently that's required. There 
were some changes to the getType method that require descriptors to be 
used in place of internal class names.
2019-11-20 07:09:47 -08:00
Christoph Büscher 4ffa050735 Allow custom characters in token_chars of ngram tokenizers (#49250)
Currently the `token_chars` setting in both `edgeNGram` and `ngram` tokenizers
only allows for a list of predefined character classes, which might not fit
every use case. For example, including underscore "_" in a token would currently
require the `punctuation` class which comes with a lot of other characters.
This change adds an additional "custom" option to the `token_chars` setting,
which requires an additional `custom_token_chars` setting to be present and
which will be interpreted as a set of characters to inlcude into a token.

Closes #25894
2019-11-20 10:37:12 +01:00
Alan Woodward c6b31162ba
Refactor percolator's QueryAnalyzer to use QueryVisitors
Lucene now allows us to explore the structure of a query using QueryVisitors,
delegating the knowledge of how to recurse through and collect terms to the
query implementations themselves. The percolator currently has a home-grown
external version of this API to construct sets of matching terms that must be
present in a document in order for it to possibly match the query.

This commit removes the home-grown implementation in favour of one using
QueryVisitor. This has the added benefit of making interval queries available
for percolator pre-filtering. Due to a bug in multi-term intervals (LUCENE-9050)
it also includes a clone of some of the lucene intervals logic, that can be removed
once upstream has been fixed.

Closes #45639
2019-11-20 09:21:01 +00:00
Mark Tozzi 17358b5af7
(refactor) Extract Empty/Script/Missing ValuesSource behavior to an interface (#48320) (#49330)
This is a pure code rearrangement refactor.  Logic for what specific ValuesSource instance to use for a given type (e.g. script or field) moved out of ValuesSourceConfig and into CoreValuesSourceType (previously just ValueSourceType; we extract an interface for future extensibility).  ValueSourceConfig still selects which case to use, and then the ValuesSourceType instance knows how to construct the ValuesSource for that case.
2019-11-19 16:44:29 -05:00
Ryan Ernst c6a8913c38 Fix java home validation usage by tasks (#49204)
Tasks intending to use a particular java home provided by JAVA<N>_HOME
use the getJavaHome method, which verifies the given java home is
available, or will be if the task will run. However, the verification
logic was broken, in addition to unnecessarily delaying retrieving the
java home until runtime. This commit fixes the verification logic to run
at either config time, delaying verification, or at runtime which
immediately checks if java home is available.

closes #49153
2019-11-19 10:30:19 -08:00
Henning Andersen bc29c9877a Reindex search response fix (#49301)
Fixed test case to also accept another error message, now that reindex
does not allow searching against red shards.

Closes #49295
2019-11-19 14:38:05 +01:00
Tanguy Leroux abed869ec6 Mute ReindexFailureTests.testResponseOnSearchFailure (#49298)
Relates #49295
2019-11-19 12:38:54 +01:00
Henning Andersen 2ac38fd315 Reindex and friends fail on RED shards (#45830)
Reindex, update by query and delete by query would silently disregard
RED/unavailable shards, thus not copying, updating or deleting matching
data in those shards. Now use `allow_partial_search_results=false` to
ensure these operations fail if the search crosses an unavailable chard.

Added the option to explicitly specify `allow_partial_search_results=true`
for reindex only (seemed too strange for update/delete by query).

Relates #45739 and #42612
2019-11-18 21:23:08 +01:00
gpaimla 7d20b50f45 Implement Lucene EstonianAnalyzer, Stemmer (#49149)
This PR adds a new analyzer and stemmer for the Estonian language.

Closes #48895
2019-11-18 17:24:21 +01:00
Jason Tedor 2bcdcb17cd
Introduce dedicated ingest processor exception (#48810)
Today we wrap exceptions that occur while executing an ingest processor
in an ElasticsearchException. Today, in ExceptionsHelper#unwrapCause we
only unwrap causes for exceptions that implement
ElasticsearchWrapperException, which the top-level
ElasticsearchException does not. Ultimately, this means that any
exception that occurs during processor execution does not have its cause
unwrapped, and so its status is blanket treated as a 500. This means
that while executing a bulk request with an ingest pipeline,
document-level failures that occur during a processor will cause the
status for that document to be treated as 500. Since that does not give
the client any indication that they made a mistake, it means some
clients will enter infinite retries, thinking that there is some
server-side problem that merely needs to clear. This commit addresses
this by introducing a dedicated ingest processor exception, so that its
causes can be unwrapped. While we could consider a broader change to
unwrap causes for more than just ElasticsearchWrapperExceptions, that is
a broad change with unclear implications. Since the problem of reporting
500s on client errors is a user-facing bug, we take the conservative
approach for now, and we can revisit the unwrapping in a future change.
2019-11-14 11:04:53 -05:00
Rory Hunter c46a0e8708
Apply 2-space indent to all gradle scripts (#49071)
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
2019-11-14 11:01:23 +00:00
Henning Andersen 8835142ac9 Grok processor ignore case test (#48909)
Added test demonstrating that grok using ignore case works, since this
does a minimal test that the `joni` and `jcodings` libraries are
compatible.

Forward-port of test from #43334
2019-11-08 00:04:29 +01:00
Jason Tedor c82ecb664c
Do not wrap ingest processor exception with IAE (#48816)
The problem with wrapping here is that it converts any exception into an
IAE, which we treat as a client error (400 status) whereas the exception
being wrapped here could be a server error (e.g., NPE). This commit
stops wrapping all ingest processor exceptions as IAEs.
2019-11-01 15:11:35 -04:00
Mark Vieira 6ab4645f4e
[7.x] Introduce type-safe and consistent pattern for handling build globals (#48818)
This commit introduces a consistent, and type-safe manner for handling
global build parameters through out our build logic. Primarily this
replaces the existing usages of extra properties with static accessors.
It also introduces and explicit API for initialization and mutation of
any such parameters, as well as better error handling for uninitialized
or eager access of parameter values.

Closes #42042
2019-11-01 11:33:11 -07:00
Ioannis Kakavas 99aedc844d
Copy http headers to ThreadContext strictly (#45945) (#48675)
Previous behavior while copying HTTP headers to the ThreadContext,
would allow multiple HTTP headers with the same name, handling only
the first occurrence and disregarding the rest of the values. This
can be confusing when dealing with multiple Headers as it is not
obvious which value is read and which ones are silently dropped.

According to RFC-7230, a client must not send multiple header fields
with the same field name in a HTTP message, unless the entire field
value for this header is defined as a comma separated list or this
specific header is a well-known exception.

This commits changes the behavior in order to be more compliant to
the aforementioned RFC by requiring the classes that implement
ActionPlugin to declare if a header can be multi-valued or not when
registering this header to be copied over to the ThreadContext in
ActionPlugin#getRestHeaders.
If the header is allowed to be multivalued, then all such headers
are read from the HTTP request and their values get concatenated in
a comma-separated string.
If the header is not allowed to be multivalued, and the HTTP
request contains multiple such Headers with different values, the
request is rejected with a 400 status.
2019-10-31 23:05:12 +02:00
Dan Hermann dbc05cd808
Add option to split processor for preserving trailing empty fields (#48685) 2019-10-30 08:25:03 -05:00
Yogesh Gaikwad 9ed7352a12
Add Sysprop to Adjust IO Buffer Size (#48267) (#48667)
The 1MB IO-buffer size per transport thread is causing trouble in
some tests, albeit at a low rate. Reducing the number of transport
threads was not enough to fully fix this situation.
Allowing to configure the size of the buffer and reducing it by
more than an order of magnitude should fix these tests.

Closes #46803
2019-10-30 14:19:54 +11:00
Christoph Büscher 09d68e7548
Support `search_type` in Rank Evaluation API (#48542) (#48631)
Adding support for the `search_type` request parameter to the Ranking Evaluation
API since this parameter can impact the ranking and the metric score and should
be choosen in the same way when evaluating the search as later in the real
search.

Closes #48503
2019-10-29 14:54:33 +01:00
Rory Hunter 3c77c50f5f
Improve resiliency to auto-formatting in libs, modules (#48619)
Backport of #48448. Make a number of changes so that code in the libs and
modules directories are more resilient to automatic formatting. This covers:

* Remove string concatenation where JSON fits on a single line
* Move some comments around to they aren't auto-formatted to a strange
  place
2019-10-29 10:39:34 +00:00
Tim Brooks 45e42f4e18
Upgrade to Netty 4.1.43 (#48484)
With this update we can remove the mitigation in our custom allocator
which forces heap buffer allocations.
2019-10-25 10:17:25 -06:00
Tim Brooks c0b545f325
Make BytesReference an interface (#48486)
BytesReference is currently an abstract class which is extended by
various implementations. This makes it very difficult to use the
delegation pattern. The implication of this is that our releasable
BytesReference is a PagedBytesReference type and cannot be used as a
generic releasable bytes reference that delegates to any reference type.
This commit makes BytesReference an interface and introduces an
AbstractBytesReference for common functionality.
2019-10-24 15:39:30 -06:00
Michael Basnight c19379ef31 Remove random when using HLRC sync and async calls (#48211)
This commit removes the randomization used by every execute call in the
high level rest tests. Previously every execute call, which can be many
calls per single test, would rely on a random boolean to determine if
they should use the sync or async methods provided to the execute
method. This commit runs the tests twice, using two different clusters,
both of them providing the value one time via a sysprop. This ensures
that the whole suite of tests is run using the sync and async code
paths.

Closes #39667
2019-10-24 09:06:17 -05:00
Martijn van Groningen b034153df7
Change grok watch dog to be Matcher based instead of thread based. (#48346)
There is a watchdog in order to avoid long running (and expensive)
grok expressions. Currently the watchdog is thread based, threads
that run grok expressions are registered and after completion unregister.
If these threads stay registered for too long then the watch dog interrupts
these threads. Joni (the library that powers grok expressions) has a
mechanism that checks whether the current thread is interrupted and
if so abort the pattern matching.

Newer versions have an additional method to abort long running pattern
matching inside joni. Instead of checking the thread's interrupted flag,
joni now also checks a volatile field that can be set via a `Matcher`
instance. This is more efficient method for aborting long running matches.
(joni checks each 30k iterations whether interrupted flag is set vs.
just checking a volatile field)

Recently we upgraded to a recent joni version (#47374), and this PR
is a followup of that PR.

This change should also fix #43673, since it appears when unit tests
are ran the a test runner thread's interrupted flag may already have
been set, due to some thread reuse.
2019-10-24 15:34:01 +02:00
Tim Brooks c1f6aff5bb
Remove default netty allocator empty assertions (#48356)
This commit removes a problematic assertion that the netty default
allocator is not used. This assertion is problematic because any other
test can cause this task to fail by touching the default allocator. We
assert that we are using heap buffers in the channel.
2019-10-22 20:22:32 -06:00
Tim Brooks 547e399dbf
Remove option to enable direct buffer pooling (#48310)
This commit removes the option to change the netty system properties to
reenable the direct buffer pooling. It also removes the need for us to
disable the buffer pooling in the system properties file. Instead, we
programmatically craete an allocator that is used by our networking
layer.

This commit does introduce an Elasticsearch property which allows the
user to fallback on the netty default allocator. If they choose this
option, they can configure the default allocator how they wish using the
standard netty properties.
2019-10-21 19:15:50 -06:00
Ignacio Vera b1224fca8c
upgrade to Lucene-8.3.0-snapshot-25968e3b75e (#48227) 2019-10-21 08:21:09 +02:00
Alexander Reelsen 66581d8158
update ingest-user-agent regexes.yml (#47807)
This new regexes are from:
154eba17f5/regexes.yaml
2019-10-18 16:26:48 +02:00
Jack Conradson 155ecd0a76 Change Painless regex node to use SField instead of Globals (#47944)
* Change Painless regex node to use SField instead of Globals

* Use reflection instead of ASM to specify modifiers

* Remove synthetic from SField
2019-10-15 07:47:16 -07:00
jimczi b858e19bcc Revert #46598 that breaks the cachability of the sub search contexts. 2019-10-15 09:40:59 +02:00
Tim Brooks 8814bf07f1
Upgrade to Netty 4.1.42 (#48015)
Upgrades the netty version.
2019-10-14 13:54:02 -06:00
Przemyslaw Gomulka 6ab58de7ef
[7.x] Enable ResolverStyle.STRICT for java formatters backport(#46675) (#47913)
Joda was using ResolverStyle.STRICT when parsing. This means that date will be validated to be a correct year, year-of-month, day-of-month
However, we also want to make it works with Year-Of-Era as Joda used to, hence custom temporalquery.localdate in DateFormatters.from
Within DateFormatters we use the correct uuuu year instead of yyyy year of era

worth noting: if yyyy(without an era) is used in code, the parsing result will be a TemporalAccessor which will fail to be converted into LocalDate. We mostly use DateFormatters.from so this takes care of this. If possible the uuuu format should be used.
2019-10-11 21:19:56 +02:00
Jim Ferenczi bd6e2592a7 Remove the SearchContext from the highlighter context (#47733)
Today built-in highlighter and plugins have access to the SearchContext through the
highlighter context. However most of the information exposed in the SearchContext are not needed and a QueryShardContext
would be enough to perform highlighting. This change replaces the SearchContext by the informations that are absolutely
required by highlighter: a QueryShardContext and the SearchContextHighlight. This change allows to reduce the exposure of the
complex SearchContext and remove the needs to clone it in the percolator sub phase.

Relates #47198
Relates #46523
2019-10-10 10:34:10 +02:00
Jack Conradson 076d3073b5 Move binding member field generation to Painless semantic pass (#47739)
This adds an SField node that operates similarly to SFunction as a top level 
node meant only for use in an SClass node. Member fields are generated 
for both class bindings and instance bindings using the new SField node 
during the semantic pass, and information is no longer passed through 
Globals for this during the write pass.
2019-10-09 10:24:53 -07:00
Tim Brooks 02622c1ef9
Fix issues with serializing BulkByScrollResponse (#45357)
Currently there are two issues with serializing BulkByScrollResponse.
First, when deserializing from XContent, indexing exceptions and search
exceptions are switched. Additionally, search exceptions do no retain
the appropriate RestStatus code, so you must evaluate the status code
from the exception. However, the exception class is not always correctly
retained when serialized.

This commit adds tests in the failure case. Additionally, fixes the
swapping of failure types and adds the rest status code to the search
failure.
2019-10-09 10:12:14 -06:00
Alpar Torok 36d018c909 Convert RunTask to use testclusers, remove ClusterFormationTasks (#47572)
* Convert RunTask to use testclusers, remove ClusterFormationTasks

This PR adds a new RunTask and a way for it to start a
testclusters cluster out of band and block on it to replace
the old RunTask that used ClusterFormationTasks.

With this we can now remove ClusterFormationTasks.
2019-10-08 14:43:29 +03:00
Jack Conradson 833ed30f0d Modify Painless AST to add synthetic functions during semantic pass (#47611)
This has ELambda and ENewArrayFunctionRef add their generated synthetic
methods to the SClass node during the semantic pass and removes this 
data from the write pass. This is the first step to remove "Globals" (mutable 
state) from the write pass.
2019-10-07 07:48:51 -07:00
Jack Conradson e3aab1295e Add a ScriptRoot to consolidate global data necessary for multiple passes (#47532)
This PR is to get plumbing in for a ScriptRoot class that will consolidate 
several pieces of state required by potentially multiple passes including 
PainlessLookup, CompilerSettings, FunctionTable, the root class node, and a 
synthetic counter. It's possible more may be added to this as we move 
forward and slowly make the the nodes have less mutable state.
2019-10-04 08:37:19 -07:00
Ryan Ernst f32692208e
Add explanations to script score queries (#46693) (#47548)
While function scores using scripts do allow explanations, they are only
creatable with an expert plugin. This commit improves the situation for
the newer script score query by adding the ability to set the
explanation from the script itself.

To set the explanation, a user would check for `explanation != null` to
indicate an explanation is needed, and then call
`explanation.set("some description")`.
2019-10-03 21:05:05 -07:00
Jim Ferenczi 5a3fa4a479
Add client jar for mapper-extras (#47430)
The rest high level client has a dependency on mapper-extras but the jar
is not published so this commit adds a client jar for this module.

Closes #47413
2019-10-03 01:23:45 +02:00
Jim Ferenczi c340814b34 Fix highlighting of overlapping terms in the unified highlighter (#47227)
The passage formatter that the unified highlighter use doesn't handle terms with overlapping offsets.
For tokenizer that provides multiple segmentation of the same terms (edge ngram for instance) the formatter
should select the largest span in order to highlight the term only once. This change implements this logic.
2019-10-02 16:34:12 +02:00
Alan Woodward 697c693ee7 Reset Token position on reuse in scripted analysis (#47424)
Most of the information in AnalysisPredicateScript.Token is pulled directly
from its underlying AttributeSource, but we also keep track of the token position,
and this state is held directly on the Token. This information needs to be reset when
the containing ScriptFilteringTokenFilter or ScriptedConditionTokenFilter is re-used.

Fixes #47197
2019-10-02 11:27:04 +01:00
Jack Conradson 8f1a80a43d Move Painless local methods to a dedicated FunctionTable (#46889)
This moves the way Painless maintains function headers for use
across compilation into its own class - FunctionTable. This
allows us to store a dedicated object for function lookup at
runtime for the def type instead of a loose Map of functions.
2019-09-30 09:06:40 -07:00
Rory Hunter 53a4d2176f
Convert most awaitBusy calls to assertBusy (#45794) (#47112)
Backport of #45794 to 7.x. Convert most `awaitBusy` calls to
`assertBusy`, and use asserts where possible. Follows on from #28548 by
@liketic.

There were a small number of places where it didn't make sense to me to
call `assertBusy`, so I kept the existing calls but renamed the method to
`waitUntil`. This was partly to better reflect its usage, and partly so
that anyone trying to add a new call to awaitBusy wouldn't be able to find
it.

I also didn't change the usage in `TransportStopRollupAction` as the
comments state that the local awaitBusy method is a temporary
copy-and-paste.

Other changes:

  * Rework `waitForDocs` to scale its timeout. Instead of calling
    `assertBusy` in a loop, work out a reasonable overall timeout and await
    just once.
  * Some tests failed after switching to `assertBusy` and had to be fixed.
  * Correct the expect templates in AbstractUpgradeTestCase.  The ES
    Security team confirmed that they don't use templates any more, so
    remove this from the expected templates. Also rewrite how the setup
    code checks for templates, in order to give more information.
  * Remove an expected ML template from XPackRestTestConstants The ML team
    advised that the ML tests shouldn't be waiting for any
    `.ml-notifications*` templates, since such checks should happen in the
    production code instead.
  * Also rework the template checking code in `XPackRestTestHelper` to give
    more helpful failure messages.
  * Fix issue in `DataFrameSurvivesUpgradeIT` when upgrading from < 7.4
2019-09-29 12:21:46 +01:00
Jack Conradson d09965a6dd Add ClassWriter to Painless writing pass (#47140)
This the first part of a series to allow nodes to write all of their appropriate
pieces to the class. Currently, nodes must add their bindings, constants, and
functions to main SClass node for delayed writing. This instead adds a
Painless version of ClassWriter to the write pass. The Painless ClassWriter
contains an appropriate ClassVisitor that can be accessed in any node
during the process along with access to the clinit method, and finally a
shortcut for creating new MethodWriter. The next step will be removing the
delayed writing in SClass, and instead, delegate all writing responsibilities to
the nodes.
2019-09-27 11:04:15 -07:00
Jack Conradson 9b4f377474 Change Painless function node to use a block instead of raw statements (#46884)
This change improves the node structure of SFunction. SFunction now uses 
an SBlock instead of a List of AStatments reducing code duplication and 
gives a future target for symbol table scoping.
2019-09-26 10:33:35 -07:00
Jim Ferenczi 73a09b34b8
Replace SearchContextException with SearchException (#47046)
This commit removes the SearchContextException in favor of a simpler
SearchException that doesn't leak the SearchContext.

Relates #46523
2019-09-26 14:21:23 +02:00
Martijn van Groningen 429f23ea2f
Allow ingest processors to execute in a non blocking manner. (#47122)
Backport of #46241

This PR changes the ingest executing to be non blocking
by adding an additional method to the Processor interface
that accepts a BiConsumer as handler and changing
IngestService#executeBulkRequest(...) to ingest document
in a non blocking fashion iff a processor executes
in a non blocking fashion.

This is the second PR that merges changes made to server module from
the enrich branch (see #32789) into the master branch.

The plan is to merge changes made to the server module separately from
the pr that will merge enrich into master, so that these changes can
be reviewed in isolation.

This change originates from the enrich branch and was introduced there
in #43361.
2019-09-26 08:55:28 +02:00
Tim Brooks 6720c56bdd
Set netty system properties in BuildPlugin (#45881)
Currently in production instances of Elasticsearch we set a couple of
system properties by default. We currently do not apply all of these
system properties in tests. This commit applies these properties in the
tests.
2019-09-24 10:49:36 -06:00
Jack Conradson a1af2fe96a Rename Painless node SSource to SClass (#46984)
Mechanical renaming of SSource node to SClass to better align with the names of what other nodes generate.
2019-09-24 07:35:10 -07:00
Jim Ferenczi 08f28e642b Replace SearchContext with QueryShardContext in query builder tests (#46978)
This commit replaces the SearchContext used in AbstractQueryTestCase with
a QueryShardContext in order to reduce the visibility of search contexts.

Relates #46523
2019-09-23 20:24:02 +02:00
Jason Tedor bd77626177
Add the ability to require an ingest pipeline (#46847)
This commit adds the ability to require an ingest pipeline on an
index. Today we can have a default pipeline, but that could be
overridden by a request pipeline parameter. This commit introduces a new
index setting index.required_pipeline that acts similarly to
index.default_pipeline, except that it can not be overridden by a
request pipeline parameter. Additionally, a default pipeline and a
request pipeline can not both be set. The required pipeline can be set
to _none to ensure that no pipeline ever runs for index requests on that
index.
2019-09-19 16:37:45 -04:00
Yannick Welsch 9638ca20b0 Allow dropping documents with auto-generated ID (#46773)
When using auto-generated IDs + the ingest drop processor (which looks to be used by filebeat
as well) + coordinating nodes that do not have the ingest processor functionality, this can lead
to a NullPointerException.

The issue is that markCurrentItemAsDropped() is creating an UpdateResponse with no id when
the request contains auto-generated IDs. The response serialization is lenient for our
REST/XContent format (i.e. we will send "id" : null) but the internal transport format (used for
communication between nodes) assumes for this field to be non-null, which means that it can't
be serialized between nodes. Bulk requests with ingest functionality are processed on the
coordinating node if the node has the ingest capability, and only otherwise sent to a different
node. This means that, in order to reproduce this, one needs two nodes, with the coordinating
node not having the ingest functionality.

Closes #46678
2019-09-19 16:46:33 +02:00
Alexander Reelsen 011496ed5f Expose cache setting in UserAgentPlugin (#46533)
The setting was not registered. Also documentation has been added.
2019-09-16 11:30:38 +02:00
Jim Ferenczi 4407f3af1b Delay the creation of SubSearchContext to the FetchSubPhase (#46598)
This change delays the creation of the SubSearchContext for nested and parent/child inner_hits
to the fetch sub phase in order to ensure that a SearchContext can built entirely from a
QueryShardContext. This commit also adds a validation step to the inner hits builder that ensures that we fail the request early if the inner hits path is invalid.

Relates #46523
2019-09-12 14:52:15 +02:00
Mark Vieira ccf656a9d0
Repository plugin test cacheability fixes (#46572) 2019-09-11 08:24:55 -07:00
Jim Ferenczi 23bf310c84 Replace the SearchContext with QueryShardContext when building aggregator factories (#46527)
This commit replaces the `SearchContext` with the `QueryShardContext` when building aggregator factories. Aggregator factories are part of the `SearchContext` so they shouldn't require a `SearchContext` to create them.
The main changes here are the signatures of `AggregationBuilder#build` that now takes a `QueryShardContext` and `AggregatorFactory#createInternal` that passes the `SearchContext` to build the `Aggregator`.

Relates #46523
2019-09-11 16:43:30 +02:00
Christoph Büscher aa0c586b73 Deprecate `_field_names` disabling (#42854)
Currently we allow `_field_names` fields to be disabled explicitely, but since
the overhead is negligible now we decided to keep it turned on by default and
deprecate the `enable` option on the field type. This change adds a deprecation
warning whenever this setting is used, going forward we want to ignore and finally
remove it.

Closes #27239
2019-09-11 14:58:08 +02:00
Jim Ferenczi 425b1a77e8 Add more context to QueryShardContext (#46584)
This change adds an IndexSearcher and the node's BigArrays in the QueryShardContext.
It's a spin off of #46527 as this change is required to allow aggregation builder to solely use the
query shard context.

Relates #46523
2019-09-11 12:24:51 +02:00
Mayya Sharipova 2c5f9b558b Fix highlighting for script_score query (#46507) 2019-09-10 08:26:47 -04:00
Alexander Reelsen 0915bd7c6a Update mustache dependency to 0.9.6 (#46243) 2019-09-09 13:42:03 +02:00
Ryan Ernst a078bb4b92 Add test tasks for unpooled and direct buffer pooling to netty (#46049)
Some netty behavior is controlled by system properties. While we want to
test with the defaults for Elasticsearch for most tests, within netty we
want to ensure these netty settings exhibit correct behavior. This
commit adds variants of test and integTest tasks for netty which set the
unpooled and direct buffer pooled allocators.

relates #45881
2019-08-30 11:37:45 -07:00
Igor Motov 28006fe19f Fix GeoIpProcessorFactoryTests on windows (#45668)
Switches windows build to use geoip database loaded on heap instead
of memory mapping it.

Closes #44552
2019-08-28 18:02:25 -04:00
Mark Tozzi aec125faff
Support Range Fields in Histogram and Date Histogram (#46012)
Backport of 1a0dddf4ad24b3f2c751a1fe0e024fdbf8754f94 (AKA #445395)

     * Add support for a Range field ValuesSource, including decode logic for range doc values and exposing RangeType as a first class enum
     * Provide hooks in ValuesSourceConfig for aggregations to control ValuesSource class selection on missing & script values
     * Branch aggregator creation in Histogram and DateHistogram based on ValuesSource class, to enable specialization based on type.  This is similar to how Terms aggregator works.
     * Prioritize field type when available for selecting the ValuesSource class type to use for an aggregation
2019-08-28 09:06:09 -04:00
Tim Brooks 956df7be92
Reindex task state initialized before reindex (#46043)
Currently the process to execute a reindex process is tightly coupled to
step of initializing the task state. This creates problems when this
process is asynchronous. It is possible that the task state has not been
initialized which prevents follow-up actions such as rethrottle. This
commit separates the task initialization so that it can be executed as a
first step in the persistent reindex process.
2019-08-27 15:28:04 -05:00
Tim Brooks 07f3ddb549
Extract reindexing logic from transport action (#46033)
This commit extracts the reindexing logic from the transport action so
that it can be incorporated into the persistent reindex work without
requiring the usage of the client.
2019-08-27 12:28:37 -05:00
Tim Brooks ad233e3e38
Add test for CopyBytesSocketChannel (#46031)
Currently we use a custom CopyBytesSocketChannel for interfacing with
netty. We have integration tests that use this channel, however we never
verify the read and write behavior in the face of potential partial
writes. This commit adds a test for this behavior.
2019-08-27 11:25:22 -05:00
Jason Tedor 3d64605075
Remove node settings from blob store repositories (#45991)
This commit starts from the simple premise that the use of node settings
in blob store repositories is a mistake. Here we see that the node
settings are used to get default settings for store and restore throttle
rates. Yet, since there are not any node settings registered to this
effect, there can never be a default setting to fall back to there, and
so we always end up falling back to the default rate. Since this was the
only use of node settings in blob store repository, we move them. From
this, several places fall out where we were chaining settings through
only to get them to the blob store repository, so we clean these up as
well. That leaves us with the changeset in this commit.
2019-08-26 16:26:13 -04:00
Jack Conradson 45ad01ab1c Fix bugs in Painless SCatch node (#45880)
This fixes two bugs:
- A recently introduced bug where an NPE will be thrown if a catch block is 
empty.
- A long-time bug where an NPE will be thrown if multiple catch blocks in a 
row are empty for the same try block.
2019-08-23 08:08:02 -07:00
Jason Tedor de6b6fd338
Add node.processors setting in favor of processors (#45885)
This commit namespaces the existing processors setting under the "node"
namespace. In doing so, we deprecate the existing processors setting in
favor of node.processors.
2019-08-22 22:18:37 -04:00
Henning Andersen 4afa413a01 Fix update-by-query script examples (#43907)
Two examples had swapped the order of lang and code when creating a
script.

Relates #43884
2019-08-22 22:03:54 +02:00
Jack Conradson a1b88ca009 Move regex error to node (#45813) 2019-08-22 07:12:54 -07:00
Armin Braun 6aaee8aa0a
Repository Cleanup Endpoint (#43900) (#45780)
* Repository Cleanup Endpoint (#43900)

* Snapshot cleanup functionality via transport/REST endpoint.
* Added all the infrastructure for this with the HLRC and node client
* Made use of it in tests and resolved relevant TODO
* Added new `Custom` CS element that tracks the cleanup logic.
Kept it similar to the delete and in progress classes and gave it
some (for now) redundant way of handling multiple cleanups but only allow one
* Use the exact same mechanism used by deletes to have the combination
of CS entry and increment in repository state ID provide some
concurrency safety (the initial approach of just an entry in the CS
was not enough, we must increment the repository state ID to be safe
against concurrent modifications, otherwise we run the risk of "cleaning up"
blobs that just got created without noticing)
* Isolated the logic to the transport action class as much as I could.
It's not ideal, but we don't need to keep any state and do the same
for other repository operations
(like getting the detailed snapshot shard status)
2019-08-21 17:59:49 +02:00
Andrey Ershov dbc90653dc transport.publish_address should contain CNAME (#45626)
This commit adds CNAME reporting for transport.publish_address same way
it's done for http.publish_address.

Relates #32806
Relates #39970

(cherry picked from commit e0a2558a4c3a6b6fbfc6cd17ed34a6f6ef7b15a9)
2019-08-16 17:42:00 +02:00
Luca Cavanna c31cddf27e
Update the schema for the REST API specification (#42346)
* Update the REST API specification

This patch updates the REST API spefication in JSON files to better encode deprecated entities,
to improve specification of URL paths, and to open up the schema for future extensions.

Notably, it changes the `paths` from a list of strings to a list of objects, where each
particular object encodes all the information for this particular path: the `parts` and the `methods`.

Among the benefits of this approach is eg. encoding the difference between using the `PUT` and `POST`
methods in the Index API, to either use a specific document ID, or let Elasticsearch generate one.

Also `documentation` becomes an object that supports an `url` and also a `description` which is a
new field.

* Adapt YAML runner to new REST API specification format

The logic for choosing the path to use when running tests has been
simplified, as a consequence of the path parts being listed under each
path in the spec. The special case for create and index has been removed.

Also the parsing code has been hardened so that errors are thrown earlier
when the structure of the spec differs from what expected, and their
error messages should be more helpful.
2019-08-16 14:40:00 +02:00
Armin Braun de58353722
Lower Painless Static Memory Footprint (#45487) (#45619)
* Painless generates a ton of duplicate strings and empty `Hashmap` instances wrapped as unmodifiable
* This change brings down the static footprint of Painless on an idle node by 20MB (after running the PMC benchmark against said node)
   * Since we were looking into ways of optimizing for smaller node sizes I think this is a worthwhile optimization
2019-08-15 19:41:45 +02:00
Jim Ferenczi 79a1390935 Add mapper-extras and the RankFeatureQuery in the hlrc (#43713)
This change adds the support for the RankFeatureQuery in the HLRC by
providing an extra dependency on mapper-extras-client. It also removes
the dependency on lang-painless in mapper-extras which is not needed
anymore since the move of the vector field into a dedicated module.

Closes #43634
2019-08-14 18:41:39 +02:00
Jack Conradson 7f550f2b29 Complete decoupling ANTLR AST from Painless AST (#45366)
This change removes the Reserved class used to track variables usages 
within the ANTLR grammar. That task is now performed by an existing pass 
"extractVariables" in the Painless AST. The Painless AST no longer has any 
dependencies on the ANTLR AST for state outside of the tree being built. 
This will simplify future refactoring and opens the possibility of alternate 
grammars.
2019-08-13 08:02:10 -07:00
Tim Brooks ae06a9399a
Fix bug in copying bytes for socket write (#45463)
Currently we take the array of nio buffers from the netty channel
outbound buffer and copy their bytes to a direct buffer. In the process
we mutate the nio buffer positions. It seems like netty will continue to
reuse these buffers. This means than any data that is not flushed in a
call is lost. This commit fixes this by incrementing the positions after
the flush has completed. This is similar to the behavior that
SocketChannel would have provided and netty relied upon.

Fixes #45444.
2019-08-12 15:59:26 -06:00
Armin Braun a9e1402189
Remove Settings from BaseRestRequest Constructor (#45418) (#45429)
* Resolving the todo, cleaning up the unused `settings` parameter
* Cleaning up some other minor dead code in affected classes
2019-08-12 05:14:45 +02:00
Armin Braun a501d68f23
Upgrade to Netty 4.1.38 (#45132) (#45364)
* A number of fixes to buffer handling in the .37 and .38 -> we should stay up to date
2019-08-09 03:38:14 +02:00
Tim Brooks af908efa41
Disable netty direct buffer pooling by default (#44837)
Elasticsearch does not grant Netty reflection access to get Unsafe. The
only mechanism that currently exists to free direct buffers in a timely
manner is to use Unsafe. This leads to the occasional scenario, under
heavy network load, that direct byte buffers can slowly build up without
being freed.

This commit disables Netty direct buffer pooling and moves to a strategy
of using a single thread-local direct buffer for interfacing with sockets.
This will reduce the memory usage from networking. Elasticsearch
currently derives very little value from direct buffer usage (TLS,
compression, Lucene, Elasticsearch handling, etc all use heap bytes). So
this seems like the correct trade-off until that changes.
2019-08-08 15:10:31 -06:00
Henning Andersen d139896b66
Reindex share retry between hit sources (#44203) (#45348)
The client and remote hit sources had each their own retry mechanism,
which would do the same. Supporting resiliency we would have to expand
on the retry mechanisms and as a preparation for that, the retry
mechanism is now shared such that each sub class is only responsible for
sending requests and converting responses/failures to common format.

Part of #42612
2019-08-08 22:01:29 +02:00
Jack Conradson b716b840d3 Remove loop counter from Reserved in Painless AST. (#45298)
This change adds a compiler pass to give each node the chance to store 
settings necessary for analysis and writing. This removes the need to pass 
this in a somewhat convoluted way through an additional class called 
Reserved, and also removes the need to have the Walker set values for 
settings on reserved. This is next step in decoupling the Painless grammar 
from the Painless AST.
2019-08-08 09:34:51 -07:00
Michael Basnight 89861d0884 Add ingest processor existence helper method (#45156)
This commit adds a helper method to the ingest service allowing it to
inspect a pipeline by id and verify the existence of a processor in the
pipeline. This work exposed a potential bug in that some processors
contain inner processors that are passed in at instantiation. These
processors needed a common way to expose their inner processors, so the
WrappingProcessor was created in order to expose the inner processor.
2019-08-07 11:19:04 -05:00
Jason Tedor bd59ee6c72
Fix clock used in update requests (#45262)
We accidentally switched to using the relative time provider here. This
commit fixes this by switching to the appropriate absolute clock.
2019-08-06 21:15:21 -04:00
Jack Conradson fc8a6fc9d0 Decouple Painless AST Lambda generation from the grammar (#45111)
This is the first step in decoupling the Painless AST from the grammar. The
Painless AST should be able to generate classes independently of how the
AST is generated from a grammar. (If I were to build a Painless AST by hand
in code this should be all that's necessary.) This change removes Lambda
name generation from the ANTLR grammar tree walker. It also removes
unnecessary node generation of new array function references from the
tree walker as well.
2019-08-06 10:08:19 -07:00
Yannick Welsch 7aeb2fe73c Add per-socket keepalive options (#44055)
Uses JDK 11's per-socket configuration of TCP keepalive (supported on Linux and Mac), see
https://bugs.openjdk.java.net/browse/JDK-8194298, and exposes these as transport settings.
By default, these options are disabled for now (i.e. fall-back to OS behavior), but we would like
to explore whether we can enable them by default, in particular to force keepalive configurations
that are better tuned for running ES.
2019-08-06 10:45:44 +02:00
Zachary Tong 3df1c76f9b Allow pipeline aggs to select specific buckets from multi-bucket aggs (#44179)
This adjusts the `buckets_path` parser so that pipeline aggs can
select specific buckets (via their bucket keys) instead of fetching
the entire set of buckets.  This is useful for bucket_script in
particular, which might want specific buckets for calculations.

It's possible to workaround this with `filter` aggs, but the workaround
is hacky and probably less performant.

- Adjusts documentation
- Adds a barebones AggregatorTestCase for bucket_script
- Tweaks AggTestCase to use getMockScriptService() for reductions and
pipelines.  Previously pipelines could just pass in a script service
for testing, but this didnt work for regular aggs.  The new
getMockScriptService() method fixes that issue, but needs to be used
for pipelines too.  This had a knock-on effect of touching MovFn,
AvgBucket and ScriptedMetric
2019-08-05 12:18:40 -04:00
Tim Brooks 984ba82251
Move nio channel initialization to event loop (#45155)
Currently in the transport-nio work we connect and bind channels on the
a thread before the channel is registered with a selector. Additionally,
it is at this point that we set all the socket options. This commit
moves these operations onto the event-loop after the channel has been
registered with a selector. It attempts to set the socket options for a
non-server channel at registration time. If that fails, it will attempt
to set the options after the channel is connected. This should fix
#41071.
2019-08-02 17:31:31 -04:00
Jack Conradson 54552edaf6 Whitelist randomUUID in Painless (#45148)
This whitelists randomUUID with the understanding that it's possible for 
/dev/random to cause blocking on *nix systems. Users that need 
randomUUID should switch their random generator source to /dev/urandom 
if this is a concern for them.
2019-08-02 11:53:56 -07:00
Christoph Büscher 3366726ad1 Enable reloading of synonym_graph filters (#45135)
Reloading of synonym_graph filter doesn't work currently because the search time
AnalysisMode doesn't get propagated to the TokenFilterFactory emitted by the
graph filters getChainAwareTokenFilterFactory() method. This change fixes that.

Closes #45127
2019-08-02 15:33:42 +02:00