Commit Graph

5293 Commits

Author SHA1 Message Date
Jim Ferenczi 5a3fa4a479
Add client jar for mapper-extras (#47430)
The rest high level client has a dependency on mapper-extras but the jar
is not published so this commit adds a client jar for this module.

Closes #47413
2019-10-03 01:23:45 +02:00
Jim Ferenczi c340814b34 Fix highlighting of overlapping terms in the unified highlighter (#47227)
The passage formatter that the unified highlighter use doesn't handle terms with overlapping offsets.
For tokenizer that provides multiple segmentation of the same terms (edge ngram for instance) the formatter
should select the largest span in order to highlight the term only once. This change implements this logic.
2019-10-02 16:34:12 +02:00
Alan Woodward 697c693ee7 Reset Token position on reuse in scripted analysis (#47424)
Most of the information in AnalysisPredicateScript.Token is pulled directly
from its underlying AttributeSource, but we also keep track of the token position,
and this state is held directly on the Token. This information needs to be reset when
the containing ScriptFilteringTokenFilter or ScriptedConditionTokenFilter is re-used.

Fixes #47197
2019-10-02 11:27:04 +01:00
Jack Conradson 8f1a80a43d Move Painless local methods to a dedicated FunctionTable (#46889)
This moves the way Painless maintains function headers for use
across compilation into its own class - FunctionTable. This
allows us to store a dedicated object for function lookup at
runtime for the def type instead of a loose Map of functions.
2019-09-30 09:06:40 -07:00
Rory Hunter 53a4d2176f
Convert most awaitBusy calls to assertBusy (#45794) (#47112)
Backport of #45794 to 7.x. Convert most `awaitBusy` calls to
`assertBusy`, and use asserts where possible. Follows on from #28548 by
@liketic.

There were a small number of places where it didn't make sense to me to
call `assertBusy`, so I kept the existing calls but renamed the method to
`waitUntil`. This was partly to better reflect its usage, and partly so
that anyone trying to add a new call to awaitBusy wouldn't be able to find
it.

I also didn't change the usage in `TransportStopRollupAction` as the
comments state that the local awaitBusy method is a temporary
copy-and-paste.

Other changes:

  * Rework `waitForDocs` to scale its timeout. Instead of calling
    `assertBusy` in a loop, work out a reasonable overall timeout and await
    just once.
  * Some tests failed after switching to `assertBusy` and had to be fixed.
  * Correct the expect templates in AbstractUpgradeTestCase.  The ES
    Security team confirmed that they don't use templates any more, so
    remove this from the expected templates. Also rewrite how the setup
    code checks for templates, in order to give more information.
  * Remove an expected ML template from XPackRestTestConstants The ML team
    advised that the ML tests shouldn't be waiting for any
    `.ml-notifications*` templates, since such checks should happen in the
    production code instead.
  * Also rework the template checking code in `XPackRestTestHelper` to give
    more helpful failure messages.
  * Fix issue in `DataFrameSurvivesUpgradeIT` when upgrading from < 7.4
2019-09-29 12:21:46 +01:00
Jack Conradson d09965a6dd Add ClassWriter to Painless writing pass (#47140)
This the first part of a series to allow nodes to write all of their appropriate
pieces to the class. Currently, nodes must add their bindings, constants, and
functions to main SClass node for delayed writing. This instead adds a
Painless version of ClassWriter to the write pass. The Painless ClassWriter
contains an appropriate ClassVisitor that can be accessed in any node
during the process along with access to the clinit method, and finally a
shortcut for creating new MethodWriter. The next step will be removing the
delayed writing in SClass, and instead, delegate all writing responsibilities to
the nodes.
2019-09-27 11:04:15 -07:00
Jack Conradson 9b4f377474 Change Painless function node to use a block instead of raw statements (#46884)
This change improves the node structure of SFunction. SFunction now uses 
an SBlock instead of a List of AStatments reducing code duplication and 
gives a future target for symbol table scoping.
2019-09-26 10:33:35 -07:00
Jim Ferenczi 73a09b34b8
Replace SearchContextException with SearchException (#47046)
This commit removes the SearchContextException in favor of a simpler
SearchException that doesn't leak the SearchContext.

Relates #46523
2019-09-26 14:21:23 +02:00
Martijn van Groningen 429f23ea2f
Allow ingest processors to execute in a non blocking manner. (#47122)
Backport of #46241

This PR changes the ingest executing to be non blocking
by adding an additional method to the Processor interface
that accepts a BiConsumer as handler and changing
IngestService#executeBulkRequest(...) to ingest document
in a non blocking fashion iff a processor executes
in a non blocking fashion.

This is the second PR that merges changes made to server module from
the enrich branch (see #32789) into the master branch.

The plan is to merge changes made to the server module separately from
the pr that will merge enrich into master, so that these changes can
be reviewed in isolation.

This change originates from the enrich branch and was introduced there
in #43361.
2019-09-26 08:55:28 +02:00
Tim Brooks 6720c56bdd
Set netty system properties in BuildPlugin (#45881)
Currently in production instances of Elasticsearch we set a couple of
system properties by default. We currently do not apply all of these
system properties in tests. This commit applies these properties in the
tests.
2019-09-24 10:49:36 -06:00
Jack Conradson a1af2fe96a Rename Painless node SSource to SClass (#46984)
Mechanical renaming of SSource node to SClass to better align with the names of what other nodes generate.
2019-09-24 07:35:10 -07:00
Jim Ferenczi 08f28e642b Replace SearchContext with QueryShardContext in query builder tests (#46978)
This commit replaces the SearchContext used in AbstractQueryTestCase with
a QueryShardContext in order to reduce the visibility of search contexts.

Relates #46523
2019-09-23 20:24:02 +02:00
Jason Tedor bd77626177
Add the ability to require an ingest pipeline (#46847)
This commit adds the ability to require an ingest pipeline on an
index. Today we can have a default pipeline, but that could be
overridden by a request pipeline parameter. This commit introduces a new
index setting index.required_pipeline that acts similarly to
index.default_pipeline, except that it can not be overridden by a
request pipeline parameter. Additionally, a default pipeline and a
request pipeline can not both be set. The required pipeline can be set
to _none to ensure that no pipeline ever runs for index requests on that
index.
2019-09-19 16:37:45 -04:00
Yannick Welsch 9638ca20b0 Allow dropping documents with auto-generated ID (#46773)
When using auto-generated IDs + the ingest drop processor (which looks to be used by filebeat
as well) + coordinating nodes that do not have the ingest processor functionality, this can lead
to a NullPointerException.

The issue is that markCurrentItemAsDropped() is creating an UpdateResponse with no id when
the request contains auto-generated IDs. The response serialization is lenient for our
REST/XContent format (i.e. we will send "id" : null) but the internal transport format (used for
communication between nodes) assumes for this field to be non-null, which means that it can't
be serialized between nodes. Bulk requests with ingest functionality are processed on the
coordinating node if the node has the ingest capability, and only otherwise sent to a different
node. This means that, in order to reproduce this, one needs two nodes, with the coordinating
node not having the ingest functionality.

Closes #46678
2019-09-19 16:46:33 +02:00
Alexander Reelsen 011496ed5f Expose cache setting in UserAgentPlugin (#46533)
The setting was not registered. Also documentation has been added.
2019-09-16 11:30:38 +02:00
Jim Ferenczi 4407f3af1b Delay the creation of SubSearchContext to the FetchSubPhase (#46598)
This change delays the creation of the SubSearchContext for nested and parent/child inner_hits
to the fetch sub phase in order to ensure that a SearchContext can built entirely from a
QueryShardContext. This commit also adds a validation step to the inner hits builder that ensures that we fail the request early if the inner hits path is invalid.

Relates #46523
2019-09-12 14:52:15 +02:00
Mark Vieira ccf656a9d0
Repository plugin test cacheability fixes (#46572) 2019-09-11 08:24:55 -07:00
Jim Ferenczi 23bf310c84 Replace the SearchContext with QueryShardContext when building aggregator factories (#46527)
This commit replaces the `SearchContext` with the `QueryShardContext` when building aggregator factories. Aggregator factories are part of the `SearchContext` so they shouldn't require a `SearchContext` to create them.
The main changes here are the signatures of `AggregationBuilder#build` that now takes a `QueryShardContext` and `AggregatorFactory#createInternal` that passes the `SearchContext` to build the `Aggregator`.

Relates #46523
2019-09-11 16:43:30 +02:00
Christoph Büscher aa0c586b73 Deprecate `_field_names` disabling (#42854)
Currently we allow `_field_names` fields to be disabled explicitely, but since
the overhead is negligible now we decided to keep it turned on by default and
deprecate the `enable` option on the field type. This change adds a deprecation
warning whenever this setting is used, going forward we want to ignore and finally
remove it.

Closes #27239
2019-09-11 14:58:08 +02:00
Jim Ferenczi 425b1a77e8 Add more context to QueryShardContext (#46584)
This change adds an IndexSearcher and the node's BigArrays in the QueryShardContext.
It's a spin off of #46527 as this change is required to allow aggregation builder to solely use the
query shard context.

Relates #46523
2019-09-11 12:24:51 +02:00
Mayya Sharipova 2c5f9b558b Fix highlighting for script_score query (#46507) 2019-09-10 08:26:47 -04:00
Alexander Reelsen 0915bd7c6a Update mustache dependency to 0.9.6 (#46243) 2019-09-09 13:42:03 +02:00
Ryan Ernst a078bb4b92 Add test tasks for unpooled and direct buffer pooling to netty (#46049)
Some netty behavior is controlled by system properties. While we want to
test with the defaults for Elasticsearch for most tests, within netty we
want to ensure these netty settings exhibit correct behavior. This
commit adds variants of test and integTest tasks for netty which set the
unpooled and direct buffer pooled allocators.

relates #45881
2019-08-30 11:37:45 -07:00
Igor Motov 28006fe19f Fix GeoIpProcessorFactoryTests on windows (#45668)
Switches windows build to use geoip database loaded on heap instead
of memory mapping it.

Closes #44552
2019-08-28 18:02:25 -04:00
Mark Tozzi aec125faff
Support Range Fields in Histogram and Date Histogram (#46012)
Backport of 1a0dddf4ad24b3f2c751a1fe0e024fdbf8754f94 (AKA #445395)

     * Add support for a Range field ValuesSource, including decode logic for range doc values and exposing RangeType as a first class enum
     * Provide hooks in ValuesSourceConfig for aggregations to control ValuesSource class selection on missing & script values
     * Branch aggregator creation in Histogram and DateHistogram based on ValuesSource class, to enable specialization based on type.  This is similar to how Terms aggregator works.
     * Prioritize field type when available for selecting the ValuesSource class type to use for an aggregation
2019-08-28 09:06:09 -04:00
Tim Brooks 956df7be92
Reindex task state initialized before reindex (#46043)
Currently the process to execute a reindex process is tightly coupled to
step of initializing the task state. This creates problems when this
process is asynchronous. It is possible that the task state has not been
initialized which prevents follow-up actions such as rethrottle. This
commit separates the task initialization so that it can be executed as a
first step in the persistent reindex process.
2019-08-27 15:28:04 -05:00
Tim Brooks 07f3ddb549
Extract reindexing logic from transport action (#46033)
This commit extracts the reindexing logic from the transport action so
that it can be incorporated into the persistent reindex work without
requiring the usage of the client.
2019-08-27 12:28:37 -05:00
Tim Brooks ad233e3e38
Add test for CopyBytesSocketChannel (#46031)
Currently we use a custom CopyBytesSocketChannel for interfacing with
netty. We have integration tests that use this channel, however we never
verify the read and write behavior in the face of potential partial
writes. This commit adds a test for this behavior.
2019-08-27 11:25:22 -05:00
Jason Tedor 3d64605075
Remove node settings from blob store repositories (#45991)
This commit starts from the simple premise that the use of node settings
in blob store repositories is a mistake. Here we see that the node
settings are used to get default settings for store and restore throttle
rates. Yet, since there are not any node settings registered to this
effect, there can never be a default setting to fall back to there, and
so we always end up falling back to the default rate. Since this was the
only use of node settings in blob store repository, we move them. From
this, several places fall out where we were chaining settings through
only to get them to the blob store repository, so we clean these up as
well. That leaves us with the changeset in this commit.
2019-08-26 16:26:13 -04:00
Jack Conradson 45ad01ab1c Fix bugs in Painless SCatch node (#45880)
This fixes two bugs:
- A recently introduced bug where an NPE will be thrown if a catch block is 
empty.
- A long-time bug where an NPE will be thrown if multiple catch blocks in a 
row are empty for the same try block.
2019-08-23 08:08:02 -07:00
Jason Tedor de6b6fd338
Add node.processors setting in favor of processors (#45885)
This commit namespaces the existing processors setting under the "node"
namespace. In doing so, we deprecate the existing processors setting in
favor of node.processors.
2019-08-22 22:18:37 -04:00
Henning Andersen 4afa413a01 Fix update-by-query script examples (#43907)
Two examples had swapped the order of lang and code when creating a
script.

Relates #43884
2019-08-22 22:03:54 +02:00
Jack Conradson a1b88ca009 Move regex error to node (#45813) 2019-08-22 07:12:54 -07:00
Armin Braun 6aaee8aa0a
Repository Cleanup Endpoint (#43900) (#45780)
* Repository Cleanup Endpoint (#43900)

* Snapshot cleanup functionality via transport/REST endpoint.
* Added all the infrastructure for this with the HLRC and node client
* Made use of it in tests and resolved relevant TODO
* Added new `Custom` CS element that tracks the cleanup logic.
Kept it similar to the delete and in progress classes and gave it
some (for now) redundant way of handling multiple cleanups but only allow one
* Use the exact same mechanism used by deletes to have the combination
of CS entry and increment in repository state ID provide some
concurrency safety (the initial approach of just an entry in the CS
was not enough, we must increment the repository state ID to be safe
against concurrent modifications, otherwise we run the risk of "cleaning up"
blobs that just got created without noticing)
* Isolated the logic to the transport action class as much as I could.
It's not ideal, but we don't need to keep any state and do the same
for other repository operations
(like getting the detailed snapshot shard status)
2019-08-21 17:59:49 +02:00
Andrey Ershov dbc90653dc transport.publish_address should contain CNAME (#45626)
This commit adds CNAME reporting for transport.publish_address same way
it's done for http.publish_address.

Relates #32806
Relates #39970

(cherry picked from commit e0a2558a4c3a6b6fbfc6cd17ed34a6f6ef7b15a9)
2019-08-16 17:42:00 +02:00
Luca Cavanna c31cddf27e
Update the schema for the REST API specification (#42346)
* Update the REST API specification

This patch updates the REST API spefication in JSON files to better encode deprecated entities,
to improve specification of URL paths, and to open up the schema for future extensions.

Notably, it changes the `paths` from a list of strings to a list of objects, where each
particular object encodes all the information for this particular path: the `parts` and the `methods`.

Among the benefits of this approach is eg. encoding the difference between using the `PUT` and `POST`
methods in the Index API, to either use a specific document ID, or let Elasticsearch generate one.

Also `documentation` becomes an object that supports an `url` and also a `description` which is a
new field.

* Adapt YAML runner to new REST API specification format

The logic for choosing the path to use when running tests has been
simplified, as a consequence of the path parts being listed under each
path in the spec. The special case for create and index has been removed.

Also the parsing code has been hardened so that errors are thrown earlier
when the structure of the spec differs from what expected, and their
error messages should be more helpful.
2019-08-16 14:40:00 +02:00
Armin Braun de58353722
Lower Painless Static Memory Footprint (#45487) (#45619)
* Painless generates a ton of duplicate strings and empty `Hashmap` instances wrapped as unmodifiable
* This change brings down the static footprint of Painless on an idle node by 20MB (after running the PMC benchmark against said node)
   * Since we were looking into ways of optimizing for smaller node sizes I think this is a worthwhile optimization
2019-08-15 19:41:45 +02:00
Jim Ferenczi 79a1390935 Add mapper-extras and the RankFeatureQuery in the hlrc (#43713)
This change adds the support for the RankFeatureQuery in the HLRC by
providing an extra dependency on mapper-extras-client. It also removes
the dependency on lang-painless in mapper-extras which is not needed
anymore since the move of the vector field into a dedicated module.

Closes #43634
2019-08-14 18:41:39 +02:00
Jack Conradson 7f550f2b29 Complete decoupling ANTLR AST from Painless AST (#45366)
This change removes the Reserved class used to track variables usages 
within the ANTLR grammar. That task is now performed by an existing pass 
"extractVariables" in the Painless AST. The Painless AST no longer has any 
dependencies on the ANTLR AST for state outside of the tree being built. 
This will simplify future refactoring and opens the possibility of alternate 
grammars.
2019-08-13 08:02:10 -07:00
Tim Brooks ae06a9399a
Fix bug in copying bytes for socket write (#45463)
Currently we take the array of nio buffers from the netty channel
outbound buffer and copy their bytes to a direct buffer. In the process
we mutate the nio buffer positions. It seems like netty will continue to
reuse these buffers. This means than any data that is not flushed in a
call is lost. This commit fixes this by incrementing the positions after
the flush has completed. This is similar to the behavior that
SocketChannel would have provided and netty relied upon.

Fixes #45444.
2019-08-12 15:59:26 -06:00
Armin Braun a9e1402189
Remove Settings from BaseRestRequest Constructor (#45418) (#45429)
* Resolving the todo, cleaning up the unused `settings` parameter
* Cleaning up some other minor dead code in affected classes
2019-08-12 05:14:45 +02:00
Armin Braun a501d68f23
Upgrade to Netty 4.1.38 (#45132) (#45364)
* A number of fixes to buffer handling in the .37 and .38 -> we should stay up to date
2019-08-09 03:38:14 +02:00
Tim Brooks af908efa41
Disable netty direct buffer pooling by default (#44837)
Elasticsearch does not grant Netty reflection access to get Unsafe. The
only mechanism that currently exists to free direct buffers in a timely
manner is to use Unsafe. This leads to the occasional scenario, under
heavy network load, that direct byte buffers can slowly build up without
being freed.

This commit disables Netty direct buffer pooling and moves to a strategy
of using a single thread-local direct buffer for interfacing with sockets.
This will reduce the memory usage from networking. Elasticsearch
currently derives very little value from direct buffer usage (TLS,
compression, Lucene, Elasticsearch handling, etc all use heap bytes). So
this seems like the correct trade-off until that changes.
2019-08-08 15:10:31 -06:00
Henning Andersen d139896b66
Reindex share retry between hit sources (#44203) (#45348)
The client and remote hit sources had each their own retry mechanism,
which would do the same. Supporting resiliency we would have to expand
on the retry mechanisms and as a preparation for that, the retry
mechanism is now shared such that each sub class is only responsible for
sending requests and converting responses/failures to common format.

Part of #42612
2019-08-08 22:01:29 +02:00
Jack Conradson b716b840d3 Remove loop counter from Reserved in Painless AST. (#45298)
This change adds a compiler pass to give each node the chance to store 
settings necessary for analysis and writing. This removes the need to pass 
this in a somewhat convoluted way through an additional class called 
Reserved, and also removes the need to have the Walker set values for 
settings on reserved. This is next step in decoupling the Painless grammar 
from the Painless AST.
2019-08-08 09:34:51 -07:00
Michael Basnight 89861d0884 Add ingest processor existence helper method (#45156)
This commit adds a helper method to the ingest service allowing it to
inspect a pipeline by id and verify the existence of a processor in the
pipeline. This work exposed a potential bug in that some processors
contain inner processors that are passed in at instantiation. These
processors needed a common way to expose their inner processors, so the
WrappingProcessor was created in order to expose the inner processor.
2019-08-07 11:19:04 -05:00
Jason Tedor bd59ee6c72
Fix clock used in update requests (#45262)
We accidentally switched to using the relative time provider here. This
commit fixes this by switching to the appropriate absolute clock.
2019-08-06 21:15:21 -04:00
Jack Conradson fc8a6fc9d0 Decouple Painless AST Lambda generation from the grammar (#45111)
This is the first step in decoupling the Painless AST from the grammar. The
Painless AST should be able to generate classes independently of how the
AST is generated from a grammar. (If I were to build a Painless AST by hand
in code this should be all that's necessary.) This change removes Lambda
name generation from the ANTLR grammar tree walker. It also removes
unnecessary node generation of new array function references from the
tree walker as well.
2019-08-06 10:08:19 -07:00
Yannick Welsch 7aeb2fe73c Add per-socket keepalive options (#44055)
Uses JDK 11's per-socket configuration of TCP keepalive (supported on Linux and Mac), see
https://bugs.openjdk.java.net/browse/JDK-8194298, and exposes these as transport settings.
By default, these options are disabled for now (i.e. fall-back to OS behavior), but we would like
to explore whether we can enable them by default, in particular to force keepalive configurations
that are better tuned for running ES.
2019-08-06 10:45:44 +02:00
Zachary Tong 3df1c76f9b Allow pipeline aggs to select specific buckets from multi-bucket aggs (#44179)
This adjusts the `buckets_path` parser so that pipeline aggs can
select specific buckets (via their bucket keys) instead of fetching
the entire set of buckets.  This is useful for bucket_script in
particular, which might want specific buckets for calculations.

It's possible to workaround this with `filter` aggs, but the workaround
is hacky and probably less performant.

- Adjusts documentation
- Adds a barebones AggregatorTestCase for bucket_script
- Tweaks AggTestCase to use getMockScriptService() for reductions and
pipelines.  Previously pipelines could just pass in a script service
for testing, but this didnt work for regular aggs.  The new
getMockScriptService() method fixes that issue, but needs to be used
for pipelines too.  This had a knock-on effect of touching MovFn,
AvgBucket and ScriptedMetric
2019-08-05 12:18:40 -04:00