Commit Graph

5620 Commits

Author SHA1 Message Date
Armin Braun 9268b25789
Add Check for Metadata Existence in BlobStoreRepository (#59141) (#59216)
In order to ensure that we do not write a broken piece of `RepositoryData`
because the phyiscal repository generation was moved ahead more than one step
by erroneous concurrent writing to a repository we must check whether or not
the current assumed repository generation exists in the repository physically.
Without this check we run the risk of writing on top of stale cached repository data.

Relates #56911
2020-07-08 14:25:01 +02:00
Rene Groeschke a896df53ac
Remove misc dependency related deprecation warnings (7.x backport) (#59122)
* Fix dependency related deprecations (#58892)
* Fix classpath setup for forbiddenapi usage
2020-07-07 17:10:31 +02:00
Ignacio Vera 5cc6457ed8
upgrade to lucene-8.6.0-snapshot-6a715e2ecc3 (#59091) (#59120) 2020-07-07 12:07:41 +02:00
Jake Landis 604c6dd528
7.x - Create plugin for yamlTest task (#56841) (#59090)
This commit creates a new Gradle plugin to provide a separate task name
and source set for running YAML based REST tests. The only project
converted to use the new plugin in this PR is distribution/archives/integ-test-zip.
For which the testing has been moved to :rest-api-spec since it makes the most
sense and it avoids a small but awkward change to the distribution plugin.

The remaining cases in modules, plugins, and x-pack will be handled in followups.

This plugin is distinctly different from the plugin introduced in #55896 since
the YAML REST tests are intended to be black box tests over HTTP. As such they
should not (by default) have access to the classpath for that which they are testing.

The YAML based REST tests will be moved to separate source sets (yamlRestTest).
The which source is the target for the test resources is dependent on if this
new plugin is applied. If it is not applied, it will default to the test source
set.

Further, this introduces a breaking change for plugin developers that
use the YAML testing framework. They will now need to either use the new source set
and matching task, or configure the rest resources to use the old "test" source set that
matches the old integTest task. (The former should be preferred).

As part of this change (which is also breaking for plugin developers) the
rest resources plugin has been removed from the build plugin and now requires
either explicit application or application via the new YAML REST test plugin.

Plugin developers should be able to fix the breaking changes to the YAML tests
by adding apply plugin: 'elasticsearch.yaml-rest-test' and moving the YAML tests
under a yamlRestTest folder (instead of test)
2020-07-06 14:16:26 -05:00
Nik Everett 2965c7fe12
Fix bug in parent and child aggregators when parent field not defined (#57089) (#59074)
Adding null check for ParentJoinFieldMapper in ChildrenAggregationBuilder.joinFieldResolveConfig

Closes #42997

Co-authored-by: ParthPunkster <parthjain.pj1994@gmail.com>
2020-07-06 10:59:47 -04:00
Martijn van Groningen f0dd9b4ace
Add data stream timestamp validation via metadata field mapper (#59002)
Backport of #58582 to 7.x branch.

This commit adds a new metadata field mapper that validates,
that a document has exactly a single timestamp value in the data stream timestamp field and
that the timestamp field mapping only has `type`, `meta` or `format` attributes configured.
Other attributes can affect the guarantee that an index with this meta field mapper has a
useable timestamp field.

The MetadataCreateIndexService inserts a data stream timestamp field mapper whenever
a new backing index of a data stream is created.

Relates to #53100
2020-07-06 11:32:33 +02:00
Dan Hermann c1781bc7e7
[7.x] Add include_data_streams flag for authorization (#59008) 2020-07-03 12:58:39 -05:00
Tim Brooks 605e24ed7c
Use `getPortRange` in http server tests (#58794)
Currently we are leaving the settings to default port range in the nio
and netty4 http server test. This has recently led to tests failing due
to what appears to be a port conflict with other processes. This commit
modifies these tests to use the test case helper method to generate port
ranges.

Fixes #58433 and #58296.
2020-07-02 13:21:45 -06:00
Dan Hermann 40655069e2
Data stream support for delete-by-query 2020-07-02 08:17:24 -05:00
Dan Hermann fba1047ad9
Data stream support for update by query API 2020-07-02 08:16:05 -05:00
Alan Woodward 0cd1dc3143 Percolator keyword fields should not store norms (#58899)
The refactoring in #57666 inadvertently enabled norms on two of the percolator subfields,
leading to an increase in memory usage. This commit disables norms on these fields again.
2020-07-02 13:59:28 +01:00
Rene Groeschke 70713a0a19
Remove deprecated AbstractArchiveTask Gradle API usages (#58657) (#58894)
* Fix deprecated ArchiveTask configurations
2020-07-02 13:08:34 +02:00
Alan Woodward 3ba16e0f39
Move MappedFieldType#getSearchAnalyzer and #getSearchQuoteAnalyzer to TextSearchInfo (#58830)
Analyzers are specific to text searching, and so should be in TextSearchInfo rather than on
the generic MappedFieldType.

Backport of #58639
2020-07-01 14:52:14 +01:00
Przemyslaw Gomulka 2c275913b9
[7.x] Week based parsing for ingest date processor (#58597) (#58802)
Date processor was incorrectly parsing week based dates because when a
weekbased year was provided ingest module was thinking year was not
on a date and was trying to applying the logic for dd/MM type of
dates.
Date Processor is also allowing users to specify locale parameter. It
should be taken into account when parsing dates - currently only used
for formatting. If someone specifies 'en-us' locale, then calendar data
rules for that locale should be used.
The exception is iso8601 format. If someone is using that format,
then locale should not override calendar data rules.
closes #58479
2020-07-01 15:15:56 +02:00
Yannick Welsch 15c85b29fd
Account for recovery throttling when restoring snapshot (#58658) (#58811)
Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account
(i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository
setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a
per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to
configure throttling in a single place.

The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to
`40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`). This means that no behavioral change
will be observed by clusters where the recovery and restore settings were not adapted.

Relates https://github.com/elastic/elasticsearch/issues/57023

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-07-01 12:19:29 +02:00
Rene Groeschke d952b101e6
Replace compile configuration usage with api (7.x backport) (#58721)
* Replace compile configuration usage with api (#58451)

- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
  - required as java library will by default not have build jar file
  - jar file is now explicit input of the task and gradle will ensure its properly build

* Fix compile usages in 7.x branch
2020-06-30 15:57:41 +02:00
Henning Andersen 38be2812b1
Enhance extensible plugin (#58542)
Rather than let ExtensiblePlugins know extending plugins' classloaders,
we now pass along an explicit ExtensionLoader that loads the extensions
asked for. Extensions constructed that way can optionally receive their
own Plugin instance in the constructor.
2020-06-25 20:37:56 +02:00
Jason Tedor 52ad5842a9
Introduce node.roles setting (#58512)
Today we have individual settings for configuring node roles such as
node.data and node.master. Additionally, roles are pluggable and we have
used this to introduce roles such as node.ml and node.voting_only. As
the number of roles is growing, managing these becomes harder for the
user. For example, to create a master-only node, today a user has to
configure:
 - node.data: false
 - node.ingest: false
 - node.remote_cluster_client: false
 - node.ml: false

at a minimum if they are relying on defaults, but also add:
 - node.master: true
 - node.transform: false
 - node.voting_only: false

If they want to be explicit. This is also challenging in cases where a
user wants to have configure a coordinating-only node which requires
disabling all roles, a list which we are adding to, requiring the user
to keep checking whether a node has acquired any of these roles.

This commit addresses this by adding a list setting node.roles for which
a user has explicit control over the list of roles that a node has. If
the setting is configured, the node has exactly the roles in the list,
and not any additional roles. This means to configure a master-only
node, the setting is merely 'node.roles: [master]', and to configure a
coordinating-only node, the setting is merely: 'node.roles: []'.

With this change we deprecate the existing 'node.*' settings such as
'node.data'.
2020-06-25 14:14:51 -04:00
Tim Brooks 5efec3a517
Add error logging when http test fails (#58505)
Netty4HttpServerTransportTests has started to fail intermittently. It
seems like unexpected successful responses are being received when the
test is simulating errors. This commit adds logging to the test to
provide additional information when there is an unexpected success. It
also adds the logging to the nio http test.
2020-06-24 11:02:20 -06:00
Luca Cavanna 7e2bb8d6a2 Mute Netty4HttpServerTransportTests#testCorsRequest (#58480)
Relates to #58433
2020-06-24 14:31:38 +02:00
Alan Woodward d251a482e9 Move MappedFieldType.similarity() to TextSearchInfo (#58439)
Similarities only apply to a few text-based field types, but are currently set directly on
the base MappedFieldType class. This commit moves similarity information into
TextSearchInfo, and removes any mentions of it from MappedFieldType or FieldMapper.

It was previously possible to include a similarity parameter on a number of field types
that would then ignore this information. To make it obvious that this has no effect, setting
this parameter on non-text field types now issues a deprecation warning.
2020-06-24 10:00:32 +01:00
Alan Woodward 8ebd341710
Add text search information to MappedFieldType (#58230) (#58432)
Now that MappedFieldType no longer extends lucene's FieldType, we need to have a
way of getting the index information about a field necessary for building text queries,
building term vectors, highlighting, etc. This commit introduces a new TextSearchInfo
abstraction that holds this information, and a getTextSearchInfo() method to
MappedFieldType to make it available. Field types that do not support text search can
just return null here.

This allows us to remove the MapperService.getLuceneFieldType() shim method.
2020-06-23 14:37:26 +01:00
Alan Woodward 4b8cf2af6a
Add serialization test for FieldMappers when include_defaults=true (#58235) (#58328)
Fixes a bug in TextFieldMapper serialization when index is false, and adds a
base-class test to ensure that all field mappers are tested against all variations
with defaults both included and excluded.

Fixes #58188
2020-06-18 15:46:04 +01:00
Alan Woodward ca2d12d039 Remove Settings parameter from FieldMapper base class (#58237)
This is currently used to set the indexVersionCreated parameter on FieldMapper.
However, this parameter is only actually used by two implementations, and clutters
the API considerably. We should just remove it, and use it directly in the
implementations that require it.
2020-06-18 12:53:54 +01:00
Rene Groeschke abc72c1a27
Unify dependency licenses task configuration (#58116) (#58274)
- Remove duplicate dependency configuration
- Use task avoidance api accross the build
- Remove redundant licensesCheck config
2020-06-18 08:15:50 +02:00
jimczi a7488ee16f Fix PercolatorMatchedSlotSubFetchPhaseTests#testHitsExecute 2020-06-17 23:04:17 +02:00
Jim Ferenczi a19213dcca Fix nested document support in percolator query (#58149)
This commit ensures that we filter out nested documents
when retrieving the document slots of a matching query.

Closes #52850
2020-06-17 22:32:54 +02:00
Alan Woodward 12a3f6dfca
MappedFieldType should not extend FieldType (#58160)
MappedFieldType is a combination of two concerns:

* an extension of lucene's FieldType, defining how a field should be indexed
* a set of query factory methods, defining how a field should be searched

We want to break these two concerns apart. This commit is a first step to doing this, breaking
the inheritance relationship between MappedFieldType and FieldType. MappedFieldType
instead has a series of boolean flags defining whether or not the field is searchable or
aggregatable, and FieldMapper has a separate FieldType passed to its constructor defining
how indexing should be done.

Relates to #56814
2020-06-16 16:56:43 +01:00
Tal Levy 69d5e044af
Add optional description parameter to ingest processors. (#57906) (#58152)
This commit adds an optional field, `description`, to all ingest processors
so that users can explain the purpose of the specific processor instance.

Closes #56000.
2020-06-15 19:27:57 -07:00
Tal Levy 499ad6fcc4
Pre-compile inline scripts in Ingest Script processors (#57960) (#58130)
This commit introduces an optimization for inline scripts.
It keeps the compiled ingest script that the ScriptProcessor.Factory
has been creating for validation purposes. Previously, the Script Service's
cache was leveraged because it was the best way to handle caching of both
stored and inline scripts. Since inline scripts are so widely used in
Ingest Node, it is probably best to ensure we are using the pre-compiled version
from the beginning.
2020-06-15 15:22:56 -07:00
Dan Hermann 8a910443c4
Add ignore_empty_value parameter in set ingest processor (#57030) (#58108) 2020-06-15 08:35:08 -05:00
Rene Groeschke 01e9126588
Remove deprecated usage of testCompile configuration (#57921) (#58083)
* Remove usage of deprecated testCompile configuration
* Replace testCompile usage by testImplementation
* Make testImplementation non transitive by default (as we did for testCompile)
* Update CONTRIBUTING about using testImplementation for test dependencies
* Fail on testCompile configuration usage
2020-06-14 22:30:44 +02:00
Martijn van Groningen c8031c6f99
Add data stream support to the reindex api. (#57970)
Backport of #57870 to 7.x branch.

This change now also copies the op_type from the reindex request's destination index request to the actual index request being used in the bulk request.

For ensuring no document exists, the op_type create doesn't need to be copied, since Versions.MATCH_DELETED will copied from the 'mainRequest.getDestination().version()'.
The `version()` method on IndexRequest only returns Versions.MATCH_DELETED if op_type=create and no specific version has been specified.

However in order to be able to index into a data stream, the op_type must be create. So in order to support that the op_type must be copied from the reindex request's destination index request to the actual index request being used in the bulk request.

Relates to #53100 and #57788
2020-06-12 09:54:37 +02:00
Mark Tozzi 36f551bdb4
Make ValuesSourceConfig behave like a config object (#57762) (#58012) 2020-06-11 17:23:55 -04:00
Alan Woodward 16e230dcb8 Update to lucene snapshot e7c625430ed (#57981)
Includes LUCENE-9148 and LUCENE-9398, which splits the BKD metadata, index and data into separate files and keeps the index off-heap.
2020-06-11 14:51:53 +01:00
Nik Everett 0a2bd10758
Save memory when parent and child are not on top (#57892) (#57944)
Reworks the `parent` and `child` aggregation are not at the top level
using the optimization from #55873. Instead of wrapping all
non-top-level `parent` and `child` aggregators we now handle being a
child aggregator in the aggregator, specifically by adding recording
which global ordinals show up in the parent and then checking if they
match the child.
2020-06-10 16:25:10 -04:00
Yannick Welsch 80f221e920
Use clean thread context for transport and applier service (#57792) (#57914)
Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and
also that thread contexts are not leaked). Moves the ClusterApplierService to use the system
context (same as we do for MasterService), which allows to remove a hack from
TemplateUgradeService and makes it clearer that applying CS updates is fully executing under
system context.
2020-06-10 10:30:28 +02:00
Jake Landis a370d5eead
[7.x] Ensure Joni warning are logged at debug (#57302) (#57897)
When Joni, the regex engine that powers grok emits a warning it
does so by default to System.err. System.err logs are all bucketed
together in the server log at WARN level. When Joni emits a warning,
it can be extremely verbose, logging a message for each execution
again that pattern. For ingest node that means for every document
that is run that through Grok. Fortunately, Joni provides a call
back hook to push these warnings to a custom location.

This commit implements Joni's callback hook to push the Joni warning
to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor)
at debug level. Generally these warning indicate a possible issue with
the regular expression and upon creation of the Grok processor will
do a "test run" of the expression and log the result (if any) at WARN 
level. This WARN level log should only occur on pipeline creation which 
is a much lower frequency then every document. 

Additionally, the documentation is updated with instructions for how
to set the logger to debug level.
2020-06-09 17:06:29 -05:00
Yannick Welsch 9eec819c5b Revert "Use clean thread context for transport and applier service (#57792)"
This reverts commit 259be236cf.
2020-06-09 22:24:54 +02:00
Jake Landis fff0a106c9
[7.x] Support `if_seq_no` and `if_primary_term` for ingest (#55430) (#57768)
Allow for optimistic concurrency control during ingest by checking the
sequence number and primary term. This is accomplished by defining
_if_seq_no and _if_primary_term in the pipeline, similarly to _version
and _version_type.

Closes #41255
Co-authored-by: Maria Ralli <mariai.ralli@gmail.com>
2020-06-09 14:20:26 -05:00
Yannick Welsch 259be236cf Use clean thread context for transport and applier service (#57792)
Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and
also that thread contexts are not leaked). Moves the ClusterApplierService to use the system
context (same as we do for MasterService), which allows to remove a hack from
TemplateUgradeService and makes it clearer that applying CS updates is fully executing under
system context.
2020-06-09 12:32:28 +02:00
Mayya Sharipova 70e63a365a
Refactor how to determine if a field is metafield (#57378) (#57771)
Before to determine if a field is meta-field, a static method of MapperService
isMetadataField was used. This method was using an outdated static list
of meta-fields.

This PR instead changes this method to the instance method that
is also aware of meta-fields in all registered plugins.

Related #38373, #41656
Closes #24422
2020-06-08 09:16:18 -04:00
Tanguy Leroux 0e57528d5d Remove more //NORELEASE (#57517)
We agreed on removing the following //NORELEASE tags.
2020-06-05 15:34:06 +02:00
Armin Braun 24779c80f9
Serialize Outbound Message on Flush (#57084) (#57682)
Follow up to #56961:

We can be a little more efficient than just serializing at the IO loop by serializing
only when we flush to a channel. This has the advantage that we don't serialize a long
queue of messages for a channel that isn't writable for a longer period of time (unstable network,
actually writing large volumes of data, etc.).
Also, this further reduces the time for which we hold on to the write buffer for a message,
making allocations because of an empty page cache recycler pool less likely.
2020-06-04 18:06:13 +02:00
Nik Everett 928794cd61
Make parent and child aggregator more obvious (#57490) (#57553)
Pulls the way that the `ParentJoinAggregator` collects global ordinals
into a strategy object so it is a little simpler to reason about and
it'll be simpler to save memory by removing `asMultiBucketAggregator` in
the future.

Relates to #56487
2020-06-02 16:22:38 -04:00
Mark Tozzi e50f514092
IndexFieldData should hold the ValuesSourceType (#57373) (#57532) 2020-06-02 12:16:53 -04:00
Armin Braun ba2d70d8eb
Serialize Outbound Messages on IO Threads (#56961) (#57080)
Almost every outbound message is serialized to buffers of 16k pagesize.
We were serializing these messages off the IO loop (and retaining the concrete message
instance as well) and would then enqueue it on the IO loop to be dealt with as soon as the
channel is ready.
1. This would cause buffers to be held onto for longer than necessary, causing less reuse on average.
2. If a channel was slow for some reason, not only would concrete message instances queue up for it, but also 16k of buffers would be reserved for each message until it would be written+flushed physically.

With this change, the serialization happens on the event loop which effectively limits the number of buffers that `N` IO-threads will ever use so long as messages are small and channels writable.
Also, this change dereferences the reference to the concrete outbound message as soon as it has been serialized to save some more on GC.

This reduces the GC time for a default PMC run by about 50% in experiments (3 nodes, 2G heap each, loopback ... obvious caveat is that GC isn't that heavy in the first place with recent changes but still a measurable gain).
I also expect it to be helpful for master node stability by causing less of a spike if master is e.g. hit by a large number of requests that are processed batched (e.g. shard snapshot status updates) and responded to in a short time frame all at once.

Obviously, the downside to this change is that it introduces more latency on the IO loop for the serialization. But since we read all of these messages on the IO loop as well I don't see it as much of a qualitative change really and the more predictable buffer use seems much more valuable relatively.
2020-06-02 16:15:18 +02:00
Nik Everett f52e779806
Fix casting of scaled_float in sorts (#57207) (#57385)
Previously we'd get a `ClassCastException` when you tried to use
`numeric_type` on `scaled_float`. Oops! This cleans up the CCE and moves
some code around so the casting actually works.
2020-05-29 18:06:04 -04:00
Tomasz Elendt a7c36c8af5 Support multiple tokens on LHS in stemmer_override rules (#56113) (#56484)
This commit adds support for rules with multiple tokens on LHS, also
known as "contraction rules", into stemmer override token
filter. Contraction rules are handy into translating multiple
inflected words into the same root form. One side effect of this change is
that it brings stemmer override rules format closer to synonym rules
format so that it makes it easier to translate one into another.

This change also makes stemmer override rules parser more strict so
that it should catch more errors which were previously accepted.

Closes #56113
2020-05-29 22:34:31 +02:00
Henning Andersen 8427d677e9
Reindex and friends fail nicely when max_docs < slices (#54901) (#57348)
When the parameter `max_docs` is less than `slices` in update_by_query,
delete_by_query or reindex API, `max_docs ` is set to 0 and we throw an
action_request_validation_exception with confused error message:
"maxDocs should be greater than 0...".
This change checks that whether `max_docs` is less than `slices` and
throw an illegal_argument_exception with clear message.

Relates to #52786.

Co-authored-by: bellengao <gbl_long@163.com>
2020-05-29 14:30:14 +02:00
Lee Hinman c0f732b9f6
[7.x] Rename template V2 classes to ComposableTemplate (#57183) (#57232)
Backports the following commits to 7.x:

    Rename template V2 classes to ComposableTemplate (#57183)
2020-05-27 11:01:59 -06:00
Alan Woodward d6b79bcd95 Remove Mapper.updateFieldType() (#57151)
When we had multiple mapping types, an update to a field in one type had to be
propagated to the same field in all other types. This was done using the
Mapper.updateFieldType() method, called at the end of a merge. However, now
that we only have a single type per index, this method is unnecessary and can
be removed.

Relates to #41059
Backport of #56986
2020-05-27 09:21:24 +01:00
Armin Braun 56401d3f66
Release HTTP Request Body Earlier (#57094) (#57110)
We don't need to hold on to the request body past the beginning of sending
the response. There is no need to keep a reference to it until after the response
has been sent fully and we can eagerly release it here.
Note, this can be optimized further to release the contents even earlier but for now
this is an easy increment to saving some memory on the IO pool.
2020-05-25 13:00:19 +02:00
Jack Conradson 35c546b388
Backports for _source bug fix in scripting (#57068)
* Update DeprecationMap to DynamicMap (#56149)

This renames DeprecationMap to DynamicMap, and changes the deprecation
messages Map to accept a Map of String (keys) to Functions (updated values)
instead. This creates more flexibility in either logging or updating values from
params within a script. This change is required to fix (#52103) in a future PR.

* Fix Source Return Bug in Scripting (#56831)

This change ensures that when a user returns _source directly no matter where 
accessed within scripting, the value is a Map of the converted source as 
opposed to a SourceLookup.
2020-05-21 17:07:38 -07:00
markharwood eb8cb31d46
Update Lucene version to 8.6.0-snapshot-9d6c738ffce (#57024)
Same version as master
2020-05-21 11:28:16 +01:00
Andrei Balici 19a336e8d3 Add `max_token_length` setting to the CharGroupTokenizer (#56860)
Adds `max_token_length` option to the CharGroupTokenizer.
Updates documentation as well to reflect the changes.

Closes #56676
2020-05-20 14:28:40 +02:00
Alan Woodward 18bfbeda29 Move merge compatibility logic from MappedFieldType to FieldMapper (#56915)
Merging logic is currently split between FieldMapper, with its merge() method, and
MappedFieldType, which checks for merging compatibility. The compatibility checks
are called from a third class, MappingMergeValidator. This makes it difficult to reason
about what is or is not compatible in updates, and even what is in fact updateable - we
have a number of tests that check compatibility on changes in mapping configuration
that are not in fact possible.

This commit refactors the compatibility logic so that it all sits on FieldMapper, and
makes it called at merge time. It adds a new FieldMapperTestCase base class that
FieldMapper tests can extend, and moves the compatibility testing machinery from
FieldTypeTestCase to here.

Relates to #56814
2020-05-20 09:43:13 +01:00
Tim Brooks 57c3a61535
Create HttpRequest earlier in pipeline (#56393)
Elasticsearch requires that a HttpRequest abstraction be implemented
by http modules before server processing. This abstraction controls when
underlying resources are released. This commit moves this abstraction to
be created immediately after content aggregation. This change will
enable follow-up work including moving Cors logic into the server
package and tracking bytes as they are aggregated from the network
level.
2020-05-18 14:54:01 -06:00
Armin Braun cac85a6f18
Shorter Path in Netty ByteBuf Unwrap (#56740) (#56857)
In most cases we are seeing a `PooledHeapByteBuf` here now. No need to
redundantly create an new `ByteBuffer` and single element array for it
here when we can just directly unwrap its internal `byte[]`.
2020-05-16 11:54:36 +02:00
Alan Woodward d33d13f2be Simplify generics on Mapper.Builder (#56747)
Mapper.Builder currently has some complex generics on it to allow fluent builder
construction. However, the second parameter, a return type from the build() method,
is unnecessary, as we can use covariant return types. This commit removes this second
generic parameter.
2020-05-15 12:14:49 +01:00
Ryan Ernst 9fb80d3827
Move publishing configuration to a separate plugin (#56727)
This is another part of the breakup of the massive BuildPlugin. This PR
moves the code for configuring publications to a separate plugin. Most
of the time these publications are jar files, but this also supports the
zip publication we have for integ tests.
2020-05-14 20:23:07 -07:00
Armin Braun 14a042fbe5
Make No. of Transport Threads == Available CPUs (#56488) (#56780)
We never do any file IO or other blocking work on the transport threads
so no tangible benefit can be derived from using more threads than CPUs
for IO.
There are however significant downsides to using more threads than necessary
with Netty in particular. Since we use the default setting for
`io.netty.allocator.useCacheForAllThreads` which is `true` we end up
using up to `16MB` of thread local buffer cache for each transport thread.
Meaning we potentially waste CPUs * 16MB of heap for unnecessary IO threads in addition to obvious inefficiencies of artificially adding extra context switches.
2020-05-14 21:33:46 +02:00
Mark Tozzi b718193a01
Clean up DocValuesIndexFieldData (#56372) (#56684) 2020-05-14 12:42:37 -04:00
Julie Tibshirani 1ad83c37c4
Use index sort range query when possible. (#56710)
This PR proposes to use `IndexSortSortedNumericDocValuesRangeQuery` when
possible to speed up certain range queries. Points-based queries are already
very efficient, the only time this query makes a difference is when the range
matches a large number of documents.

Relates to #48665.
2020-05-13 13:24:45 -07:00
Ignacio Vera b4521d5183
upgrade to Lucene 8.6.0 snapshot (#56661) 2020-05-13 14:25:16 +02:00
Jake Landis a56fb6192e
[7.x] Fix ingest simulate verbose on failure with conditional (#56478) (#56635)
If a conditional is added to a processor, and that processor fails, and 
that processor has an on_failure handler, the full trace of all of the 
executed processors may not be displayed in simulate verbose. The 
information is correct, but misses displaying some of the steps used 
to get there.

This happens because a processor that is conditional processor is a 
wrapper around the real processor and a processor with an on_failure 
handler is also a wrapper around the processor(s). When decorating for 
simulation we treat compound processor specially, but if a compound processor
is wrapped by a conditional processor that compound processor's processors 
can be missed for decoration resulting in the missing displayed steps.

The fix to this is to treat the conditional processor specially and
explicitly seperate it from the processor it is wrapping. This requires
us to keep track of 2 processors a possible conditional processor and
the actual processor it may be wrapping.

related: #56004
2020-05-12 15:41:05 -05:00
Armin Braun b449661b8f
Remove Unused ByteBufStreamInput (#56567) (#56601)
We're not using this one any more.
2020-05-12 16:04:58 +02:00
Tim Brooks 760ab726c2
Share netty event loops between transports (#56553)
Currently Elasticsearch creates independent event loop groups for each
transport (http and internal) transport type. This is unnecessary and
can lead to contention when different threads access shared resources
(ex: allocators). This commit moves to a model where, by default, the
event loops are shared between the transports. The previous behavior can
be attained by specifically setting the http worker count.
2020-05-11 15:43:43 -06:00
Nik Everett 2f38aeb5e2
Save memory when numeric terms agg is not top (#55873) (#56454)
Right now all implementations of the `terms` agg allocate a new
`Aggregator` per bucket. This uses a bunch of memory. Exactly how much
isn't clear but each `Aggregator` ends up making its own objects to read
doc values which have non-trivial buffers. And it forces all of it
sub-aggregations to do the same. We allocate a new `Aggregator` per
bucket for two reasons:

1. We didn't have an appropriate data structure to track the
   sub-ordinals of each parent bucket.
2. You can only make a single call to `runDeferredCollections(long...)`
   per `Aggregator` which was the only way to delay collection of
   sub-aggregations.

This change switches the method that builds aggregation results from
building them one at a time to building all of the results for the
entire aggregator at the same time.

It also adds a fairly simplistic data structure to track the sub-ordinals
for `long`-keyed buckets.

It uses both of those to power numeric `terms` aggregations and removes
the per-bucket allocation of their `Aggregator`. This fairly
substantially reduces memory consumption of numeric `terms` aggregations
that are not the "top level", especially when those aggregations contain
many sub-aggregations. It also is a pretty big speed up, especially when
the aggregation is under a non-selective aggregation like
the `date_histogram`.

I picked numeric `terms` aggregations because those have the simplest
implementation. At least, I could kind of fit it in my head. And I
haven't fully understood the "bytes"-based terms aggregations, but I
imagine I'll be able to make similar optimizations to them in follow up
changes.
2020-05-08 20:38:53 -04:00
Mark Vieira 0fb9bc5379
Always use archive base name as the pom artifact id (#56447) (#56467) 2020-05-08 16:11:19 -07:00
Jason Tedor 33669c0420
Upgrade to Jackson 2.10.4 (#56188)
Another Jackson release is available. There are some CVEs addressed,
none of which impact us, but since we can now bump Jackson easily, let
us move along with the train to avoid the false positives from security
scanners.
2020-05-06 17:20:23 -04:00
Julie Tibshirani e852bb29b7
Simplify signature of FieldMapper#parseCreateField. (#56144)
`FieldMapper#parseCreateField` accepts the parse context, plus a list of fields
as an output parameter. These fields are immediately added to the document
through `ParseContext#doc()`.

This commit simplifies the signature by removing the list of fields, and having
the mappers add the fields directly to `ParseContext#doc()`. I think this is
nicer for implementors, because previously fields could be added either through
the list, or the context (through `add`, `addWithKey`, etc.)
2020-05-06 11:12:09 -07:00
Nhat Nguyen c305cfbbb6
Fix CancelTests#testDeleteByQueryCancelWithWorkers (#56242)
We need to relax the assertion as a TaskCancelledException
can be suppressed instead.

Closes #55647
2020-05-06 09:55:40 -04:00
Tim Brooks 6a51017cb2
Upgrade netty to 4.1.49.Final (#56059) 2020-05-05 10:40:23 -06:00
Martijn van Groningen 2ac32db607
Move includeDataStream flag from IndicesOptions to IndexNameExpressionResolver.Context (#56151)
Backport of #56034.

Move includeDataStream flag from an IndicesOptions to IndexNameExpressionResolver.Context
as a dedicated field that callers to IndexNameExpressionResolver can set.

Also alter indices stats api to support data streams.
The rollover api uses this api and otherwise rolling over data stream does no longer work.

Relates to #53100
2020-05-04 22:38:33 +02:00
Armin Braun 75d4a4def4
Fix potential NPEin Netty4Transport.stopInternal (#56080) (#56129)
Closes #56068
2020-05-04 19:38:21 +02:00
markharwood e197b6c45b
Analysis enhancement - add preserve_original setting in ngram-token-filter (#55432) (#56100)
Authored-by: Amit Khandelwal <amitmbm87@gmail.com>
2020-05-04 11:31:28 +01:00
Dan Hermann 2061652988
Ensure auto close of HTMLStripCharFilter in HtmlStripProcessor
The HtmlStripProcessor did not use a try-with resources block to ensure
that the used HTMLStripCharFilter is closed.
2020-05-01 17:31:53 -05:00
Igor Motov d8f9df771d
Expose agg usage in Feature Usage API (#55732) (#56048)
Counts usage of the aggs and exposes them on the _nodes/usage/.

Closes #53746
2020-04-30 12:53:36 -04:00
Przemko Robakowski 797f63e743
[7.x] Emit deprecation warning if multiple v1 templates match with a new index (#55558) (#56038)
* Emit deprecation warning if multiple v1 templates match with a new index (#55558)

* Emit deprecation warning if multiple v1 templates match with a new index

* DEPRECATION_LOGGER rename
2020-04-30 17:36:17 +02:00
Przemko Robakowski bf0204ba06
Fix empty_value handling in CsvProcessor (#55649) (#55968)
* Fix empty_value handling in CsvProcessor

Due to bug in `CsvProcessor.Factory` it was impossible to specify `empty_value`.
This change fixes that and adds relevant test.

Closes #55643

* assert changed
2020-04-29 22:37:22 +02:00
Amit Khandelwal 126e4acca8 Expose `preserve_original` in `edge_ngram` token filter (#55766)
The Lucene `preserve_original` setting is currently not supported in the `edge_ngram`
token filter. This change adds it with a default value of `false`.

Closes #55767
2020-04-28 10:24:27 +02:00
Tim Brooks 80662f31a1
Introduce mechanism to stub request handling (#55832)
Currently there is a clear mechanism to stub sending a request through
the transport. However, this is limited to testing exceptions on the
sender side. This commit reworks our transport related testing
infrastructure to allow stubbing request handling on the receiving side.
2020-04-27 16:57:15 -06:00
Ryan Ernst 70b499b7aa
Simplify java home verification (#55635)
* Simplify java home verification

At one time, all uses of java home were found through the getJavaHome
utility method on BuildPlugin. However, that was changed many
refactorings ago, but the complex support for registering a java home
version needed that fails at configuration time still exists. The only
remaining use of grabbing java home is within bwc tests, and must be at
runtime since that is when we have the checkout and know what version is
needed.

This commit consolidates the java home finding method into a utility
unassociated with BuildPlugin.

* fix checkstyle

* address feedback
2020-04-27 12:43:32 -07:00
Jake Landis 7b4bacebb5
[7.x] fix the schema validation for scripts_painless_context (#55738) (#55751) 2020-04-27 08:39:56 -05:00
Rory Hunter d66af46724
Always use deprecateAndMaybeLog for deprecation warnings (#55319)
Backport of #55115.

Replace calls to deprecate(String,Object...) with deprecateAndMaybeLog(...),
with an appropriate key, so that all messages can potentially be deduplicated.
2020-04-23 09:20:54 +01:00
Jake Landis 25ea6a74f0
[7.x] Validate REST specs against schema (#55117) (#55563)
A JSON schema was recently introduced for the REST API specification. #54252
This PR introduces a 3rd party validation tool to ensure that the
REST specification conforms to the schema.

The task is applied to the 3 projects that contain REST API specifications.
The plugin wires this task into the precommit commit task, and should be
considered as part of the public API for the build tools for any plugin
developer to contribute their plugin's specification.

An ignore parameter has been introduced for the task to allow specific
file to be ignored from the validation. The ignored files in this PR
will soon get issues logged and a link so they can be fixed.

Closes #54314
2020-04-22 14:14:03 -05:00
Tal Levy 0844455505
Add geo_shape mapper supporting doc-values in Spatial Plugin (#55037) (#55500)
After #53562, the `geo_shape` field mapper is registered within
a module. This opens the door for introducing a new `geo_shape`
field mapper into the Spatial Plugin that has doc-values support.

This is very much an extension of server's GeoShapeFieldMapper,
but with the addition of the doc values implementation.
2020-04-22 08:12:54 -07:00
Jason Tedor 1553e7e620
Encapsulate systemd extender
The systemd extender is a scheduled execution that ensures we
repeatedly let systemd know during startup that we are still starting
up. We cancel this scheduled execution once the node has successfully
started up. This extender is wrapped in a set once, which we expose
directly. This commit addresses this by putting the extender behind a
getter, which hides the implementation detail that the extener is
wrapped in a set once. This cleans up some issues in tests, that
ensures we are not making assertions about the set once, but instead
about the extender.
2020-04-20 21:17:42 -04:00
Jason Tedor 80f18ad31a
Use set once for systemd extender (#55497)
When Elasticsearch is starting up, we schedule a thread to repeatedly
let systemd know that we are still in the process of starting up. Today
we use a non-final field for this. This commit changes this to be a set
once so we can mark the field as final, and get stronger guarantees when
reasoning about the state of execution here.
2020-04-20 21:15:04 -04:00
Zachary Tong f46b567563 Convert InternalAggTestCase to AbstractNamedWriteableTestCase (#55250)
Some aggregations, such as the Terms* family, will use an alternate
class to represent unmapped shard results (while the rest of the aggs
use the same object but with some form of "empty" or "nullish" values
to represent unmapped).

This was problematic with AbstractWireSerializingTestCase because it
expects the instanceReader to always match the original class.  Instead,
we need to use the NamedWriteable version so that the registry
can be consulted for the proper deserialization reader.
2020-04-17 16:39:38 -04:00
Martijn van Groningen 417d5f2009
Make data streams in APIs resolvable. (#55337)
Backport from: #54726

The INCLUDE_DATA_STREAMS indices option controls whether data streams can be resolved in an api for both concrete names and wildcard expressions. If data streams cannot be resolved then a 400 error is returned indicating that data streams cannot be used.

In this pr, the INCLUDE_DATA_STREAMS indices option is enabled in the following APIs: search, msearch, refresh, index (op_type create only) and bulk (index requests with op type create only). In a subsequent later change, we will determine which other APIs need to be able to resolve data streams and enable the INCLUDE_DATA_STREAMS indices option for these APIs.

Whether an api resolve all backing indices of a data stream or the latest index of a data stream (write index) depends on the IndexNameExpressionResolver.Context.isResolveToWriteIndex().
If isResolveToWriteIndex() returns true then data streams resolve to the latest index (for example: index api) and otherwise a data stream resolves to all backing indices of a data stream (for example: search api).

Relates to #53100
2020-04-17 08:33:37 +02:00
Mark Tozzi 22c55180c1
[7.x] Backport ValuesSourceRegistry and related work (#54922)
* Add ValuesSource Registry and associated logic (#54281)

* Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (#48638)

* ValuesSourceRegistry Prototype (#48758)

* Remove generics from ValuesSource related classes (#49606)

* fix percentile aggregation tests (#50712)

* Basic thread safety for ValuesSourceRegistry (#50340)

* Remove target value type from ValuesSourceAggregationBuilder (#49943)

* Cleanup default values source type (#50992)

* CoreValuesSourceType no longer implements Writable (#51276)

* Remove genereics & hard coded ValuesSource references from Matrix Stats (#51131)

* Put values source types on fields (#51503)

* Remove VST Any (#51539)

* Rewire terms agg to use new VS registry (#51182)

Also adds some basic AggTestCases for untested code
paths (and boilerplate for future tests once the IT are
converted over)

* Wire Cardinality aggregation to work with the ValuesSourceRegistry (#51337)

* Wire Percentiles aggregator into new VS framework (#51639)

This required a bit of a refactor to percentiles itself.  Before,
the Builder would switch on the chosen algo to generate an
algo-specific factory.  This doesn't work (or at least, would be
difficult) in the new VS framework.

This refactor consolidates both factories together and introduces
a PercentilesConfig object to act as a standardized way to pass
algo-specific parameters through the factory.  This object
is then used when deciding which kind of aggregator to create

Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will
be moved in a subsequent PR.

* Remove generics and target value type from MultiVSAB (#51647)

* fix checkstyle after merge (#52008)

* Plumb ValuesSourceRegistry through to QuerySearchContext (#51710)

* Convert RareTerms to new VS registry (#52166)

* Wire up Value Count (#52225)

* Wire up Max & Min aggregations (#52219)

* ValuesSource refactoring: Wire up Sum aggregation (#52571)

* ValuesSource refactoring: Wire up SigTerms aggregation (#52590)

* Soft immutability for VSConfig (#52729)

* Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (#52734)

Also fixes Percentiles which was incorrectly specified to only accept
numeric, but in fact also accepts Boolean and Date (because those are
numeric on master - thanks `testSupportedFieldTypes` for catching it!)

* VS refactoring: Wire up stats aggregation (#52891)

* ValuesSource refactoring: Wire up string_stats aggregation (#52875)

* VS refactoring: Wire up median (MAD) aggregation (#52945)

* fix valuesourcetype issue with constant_keyword field (#53041)x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupIndexer.java

this commit implements `getValuesSourceType` for
the ConstantKeyword field type.

master was merged into feature/extensible-values-source
introducing a new field type that was not implementing
`getValuesSourceType`.

* ValuesSource refactoring: Wire up Avg aggregation (#52752)

* Wire PercentileRanks aggregator into new VS framework  (#51693)

* Add a VSConfig resolver for aggregations not using the registry (#53038)

* Vs refactor wire up ranges and date ranges (#52918)

* Wire up geo_bounds aggregation to ValuesSourceRegistry (#53034)

This commit updates the geo_bounds aggregation to depend
on registering itself in the ValuesSourceRegistry

relates #42949.

* VS refactoring: convert Boxplot to new registry (#53132)

* Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (#53037)

This commit updates the geo*_grid aggregations to depend
on registering itself in the ValuesSourceRegistry

relates to the values-source refactoring meta issue #42949.

* Wire-up geo_centroid agg to ValuesSourceRegistry (#53040)

This commit updates the geo_centroid aggregation to depend
on registering itself in the ValuesSourceRegistry.

relates to the values-source refactoring meta issue #42949.

* Fix type tests for Missing aggregation (#53501)

* ValuesSource Refactor: move histo VSType into XPack module (#53298)

- Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core.
- This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept
- Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs

* Wire up DateHistogram to the ValuesSourceRegistry (#53484)

* Vs refactor parser cleanup (#53198)

Co-authored-by: Zachary Tong <polyfractal@elastic.co>
Co-authored-by: Zachary Tong <zach@elastic.co>
Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com>
Co-authored-by: Tal Levy <JubBoy333@gmail.com>

* First batch of easy fixes

* Remove List.of from ValuesSourceRegistry

Note that we intend to have a follow up PR dealing with the mutability
of the registry, so I didn't even try to address that here.

* More compiler fixes

* More compiler fixes

* More compiler fixes

* Precommit is happy and so am I

* Add new Core VSTs to tests

* Disabled supported type test on SigTerms until we can backport it's fix

* fix checkstyle

* Fix test failure from semantic merge issue

* Fix some metaData->metadata replacements that got lost

* Fix list of supported types for MinAggregator

* Fix list of supported types for Avg

* remove unused import

Co-authored-by: Zachary Tong <polyfractal@elastic.co>
Co-authored-by: Zachary Tong <zach@elastic.co>
Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com>
Co-authored-by: Tal Levy <JubBoy333@gmail.com>
2020-04-16 16:54:46 -04:00
David Turner 7941f4a47e Add RepositoriesService to createComponents() args (#54814)
Today we pass the `RepositoriesService` to the searchable snapshots plugin
during the initialization of the `RepositoryModule`, forcing the plugin to be a
`RepositoryPlugin` even though it does not implement any repositories.

After discussion we decided it best for now to pass this in via
`Plugin#createComponents` instead, pending some future work in which plugins
can depend on services more dynamically.
2020-04-16 16:27:36 +01:00
William Brafford 2ba3be9db6
Remove deprecated third-party methods from tests (#55255) (#55269)
I've noticed that a lot of our tests are using deprecated static methods
from the Hamcrest matchers. While this is not a big deal in any
objective sense, it seems like a small good thing to reduce compilation
warnings and be ready for a new release of the matcher library if we
need to upgrade. I've also switched a few other methods in tests that
have drop-in replacements.
2020-04-15 17:54:47 -04:00
Ignacio Vera a677b63daa
Upgrade to lucene 8.5.1 release (#55229) (#55235)
Upgrade to lucene 8.5.1 release that contains a bug fix for a bug that might introduce index corruption when deleting data from an index that was previously shrunk.
2020-04-15 17:35:42 +02:00
Mark Vieira ce85063653
[7.x] Re-add origin url information to publish POM files (#55173) 2020-04-14 13:24:15 -07:00
William Brafford 52bebec51f
NodeInfo response should use a collection rather than fields (#54460) (#55132)
This is a first cut at giving NodeInfo the ability to carry a flexible
list of heterogeneous info responses. The trick is to be able to
serialize and deserialize an arbitrary list of blocks of information. It
is convenient to be able to deserialize into usable Java objects so that
we can aggregate nodes stats for the cluster stats endpoint.

In order to provide a little bit of clarity about which objects can and
can't be used as info blocks, I've introduced a new interface called
"ReportingService."

I have removed the hard-coded getters (e.g., getOs()) in favor of a
flexible method that can return heterogeneous kinds of info blocks
(e.g., getInfo(OsInfo.class)). Taking a class as an argument removes the
need to cast in the client code.
2020-04-13 17:18:39 -04:00
Jake Landis a2fafa6af4
[7.x] Lazy test cluster module and plugins (#54852) (#55087)
This change converts the module and plugin parameters
for testClusters to be lazy. Meaning that the values
are not resolved until they are actually used. This
removes the requirement to use project.afterEvaluate to
be able to resolve the bundle artifact.

Note - this does not completely remove the need for afterEvaluate
since it is still needed for the custom resource extension.
2020-04-13 10:53:35 -05:00
Jason Tedor 9eeae59a83
Clarify available processors (#54907)
The use of available processors, the terminology, and the settings
around it have evolved over time. This commit cleans up some places in
the codes and in the docs to adjust to the current terminology.
2020-04-10 08:48:27 -04:00