OpenSearch

Commit Graph

Author	SHA1	Message	Date
Jake Landis	604c6dd528	7.x - Create plugin for yamlTest task (#56841) (#59090) This commit creates a new Gradle plugin to provide a separate task name and source set for running YAML based REST tests. The only project converted to use the new plugin in this PR is distribution/archives/integ-test-zip. Its testing has been moved to :rest-api-spec, since that makes the most sense and avoids a small but awkward change to the distribution plugin. The remaining cases in modules, plugins, and x-pack will be handled in followups. This plugin is distinctly different from the plugin introduced in #55896 since the YAML REST tests are intended to be black box tests over HTTP. As such they should not (by default) have access to the classpath for that which they are testing. The YAML based REST tests will be moved to separate source sets (yamlRestTest). Which source set is the target for the test resources depends on whether this new plugin is applied. If it is not applied, it will default to the test source set. Further, this introduces a breaking change for plugin developers that use the YAML testing framework. They will now need to either use the new source set and matching task, or configure the rest resources to use the old "test" source set that matches the old integTest task. (The former should be preferred.) As part of this change (which is also breaking for plugin developers) the rest resources plugin has been removed from the build plugin and now requires either explicit application or application via the new YAML REST test plugin. Plugin developers should be able to fix the breaking changes to the YAML tests by adding apply plugin: 'elasticsearch.yaml-rest-test' and moving the YAML tests under a yamlRestTest folder (instead of test)	2020-07-06 14:16:26 -05:00
Nik Everett	2965c7fe12	Fix bug in parent and child aggregators when parent field not defined (#57089 ) (#59074 ) Adding null check for ParentJoinFieldMapper in ChildrenAggregationBuilder.joinFieldResolveConfig Closes #42997 Co-authored-by: ParthPunkster <parthjain.pj1994@gmail.com>	2020-07-06 10:59:47 -04:00
Martijn van Groningen	f0dd9b4ace	Add data stream timestamp validation via metadata field mapper (#59002) Backport of #58582 to 7.x branch. This commit adds a new metadata field mapper that validates that a document has exactly a single timestamp value in the data stream timestamp field and that the timestamp field mapping only has `type`, `meta` or `format` attributes configured. Other attributes can affect the guarantee that an index with this meta field mapper has a usable timestamp field. The MetadataCreateIndexService inserts a data stream timestamp field mapper whenever a new backing index of a data stream is created. Relates to #53100	2020-07-06 11:32:33 +02:00
Dan Hermann	c1781bc7e7	[7.x] Add include_data_streams flag for authorization (#59008 )	2020-07-03 12:58:39 -05:00
Tim Brooks	605e24ed7c	Use `getPortRange` in http server tests (#58794) Currently we are leaving the settings to default port range in the nio and netty4 http server tests. This has recently led to tests failing due to what appears to be a port conflict with other processes. This commit modifies these tests to use the test case helper method to generate port ranges. Fixes #58433 and #58296.	2020-07-02 13:21:45 -06:00
Dan Hermann	40655069e2	Data stream support for delete-by-query	2020-07-02 08:17:24 -05:00
Dan Hermann	fba1047ad9	Data stream support for update by query API	2020-07-02 08:16:05 -05:00
Alan Woodward	0cd1dc3143	Percolator keyword fields should not store norms (#58899 ) The refactoring in #57666 inadvertently enabled norms on two of the percolator subfields, leading to an increase in memory usage. This commit disables norms on these fields again.	2020-07-02 13:59:28 +01:00
Rene Groeschke	70713a0a19	Remove deprecated AbstractArchiveTask Gradle API usages (#58657 ) (#58894 ) * Fix deprecated ArchiveTask configurations	2020-07-02 13:08:34 +02:00
Alan Woodward	3ba16e0f39	Move MappedFieldType#getSearchAnalyzer and #getSearchQuoteAnalyzer to TextSearchInfo (#58830 ) Analyzers are specific to text searching, and so should be in TextSearchInfo rather than on the generic MappedFieldType. Backport of #58639	2020-07-01 14:52:14 +01:00
Przemyslaw Gomulka	2c275913b9	[7.x] Week based parsing for ingest date processor (#58597) (#58802) The date processor was incorrectly parsing week-based dates because, when a week-based year was provided, the ingest module assumed the year was missing from the date and tried to apply the logic for dd/MM type of dates. The date processor also allows users to specify a locale parameter. It should be taken into account when parsing dates; currently it is only used for formatting. If someone specifies the 'en-us' locale, then the calendar data rules for that locale should be used. The exception is the iso8601 format. If someone is using that format, then the locale should not override calendar data rules. closes #58479	2020-07-01 15:15:56 +02:00
Yannick Welsch	15c85b29fd	Account for recovery throttling when restoring snapshot (#58658) (#58811) Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account (i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to configure throttling in a single place. The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to `40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`. This means that no behavioral change will be observed by clusters where the recovery and restore settings were not adapted. Relates https://github.com/elastic/elasticsearch/issues/57023 Co-authored-by: James Rodewig <james.rodewig@elastic.co>	2020-07-01 12:19:29 +02:00
Rene Groeschke	d952b101e6	Replace compile configuration usage with api (7.x backport) (#58721) * Replace compile configuration usage with api (#58451) - Use java-library instead of plugin to allow api configuration usage - Remove explicit references to runtime configurations in dependency declarations - Make test runtime classpath input for testing convention - required as java library will by default not have built jar file - jar file is now explicit input of the task and gradle will ensure it's properly built * Fix compile usages in 7.x branch	2020-06-30 15:57:41 +02:00
Henning Andersen	38be2812b1	Enhance extensible plugin (#58542 ) Rather than let ExtensiblePlugins know extending plugins' classloaders, we now pass along an explicit ExtensionLoader that loads the extensions asked for. Extensions constructed that way can optionally receive their own Plugin instance in the constructor.	2020-06-25 20:37:56 +02:00
Jason Tedor	52ad5842a9	Introduce node.roles setting (#58512) Today we have individual settings for configuring node roles such as node.data and node.master. Additionally, roles are pluggable and we have used this to introduce roles such as node.ml and node.voting_only. As the number of roles is growing, managing these becomes harder for the user. For example, to create a master-only node, today a user has to configure: - node.data: false - node.ingest: false - node.remote_cluster_client: false - node.ml: false at a minimum if they are relying on defaults, but also add: - node.master: true - node.transform: false - node.voting_only: false if they want to be explicit. This is also challenging in cases where a user wants to configure a coordinating-only node, which requires disabling all roles, a list which we are adding to, requiring the user to keep checking whether a node has acquired any of these roles. This commit addresses this by adding a list setting node.roles for which a user has explicit control over the list of roles that a node has. If the setting is configured, the node has exactly the roles in the list, and not any additional roles. This means to configure a master-only node, the setting is merely 'node.roles: [master]', and to configure a coordinating-only node, the setting is merely: 'node.roles: []'. With this change we deprecate the existing 'node.*' settings such as 'node.data'.	2020-06-25 14:14:51 -04:00
Tim Brooks	5efec3a517	Add error logging when http test fails (#58505 ) Netty4HttpServerTransportTests has started to fail intermittently. It seems like unexpected successful responses are being received when the test is simulating errors. This commit adds logging to the test to provide additional information when there is an unexpected success. It also adds the logging to the nio http test.	2020-06-24 11:02:20 -06:00
Luca Cavanna	7e2bb8d6a2	Mute Netty4HttpServerTransportTests#testCorsRequest (#58480 ) Relates to #58433	2020-06-24 14:31:38 +02:00
Alan Woodward	d251a482e9	Move MappedFieldType.similarity() to TextSearchInfo (#58439 ) Similarities only apply to a few text-based field types, but are currently set directly on the base MappedFieldType class. This commit moves similarity information into TextSearchInfo, and removes any mentions of it from MappedFieldType or FieldMapper. It was previously possible to include a similarity parameter on a number of field types that would then ignore this information. To make it obvious that this has no effect, setting this parameter on non-text field types now issues a deprecation warning.	2020-06-24 10:00:32 +01:00
Alan Woodward	8ebd341710	Add text search information to MappedFieldType (#58230 ) (#58432 ) Now that MappedFieldType no longer extends lucene's FieldType, we need to have a way of getting the index information about a field necessary for building text queries, building term vectors, highlighting, etc. This commit introduces a new TextSearchInfo abstraction that holds this information, and a getTextSearchInfo() method to MappedFieldType to make it available. Field types that do not support text search can just return null here. This allows us to remove the MapperService.getLuceneFieldType() shim method.	2020-06-23 14:37:26 +01:00
Alan Woodward	4b8cf2af6a	Add serialization test for FieldMappers when include_defaults=true (#58235 ) (#58328 ) Fixes a bug in TextFieldMapper serialization when index is false, and adds a base-class test to ensure that all field mappers are tested against all variations with defaults both included and excluded. Fixes #58188	2020-06-18 15:46:04 +01:00
Alan Woodward	ca2d12d039	Remove Settings parameter from FieldMapper base class (#58237 ) This is currently used to set the indexVersionCreated parameter on FieldMapper. However, this parameter is only actually used by two implementations, and clutters the API considerably. We should just remove it, and use it directly in the implementations that require it.	2020-06-18 12:53:54 +01:00
Rene Groeschke	abc72c1a27	Unify dependency licenses task configuration (#58116) (#58274) - Remove duplicate dependency configuration - Use task avoidance api across the build - Remove redundant licensesCheck config	2020-06-18 08:15:50 +02:00
jimczi	a7488ee16f	Fix PercolatorMatchedSlotSubFetchPhaseTests#testHitsExecute	2020-06-17 23:04:17 +02:00
Jim Ferenczi	a19213dcca	Fix nested document support in percolator query (#58149 ) This commit ensures that we filter out nested documents when retrieving the document slots of a matching query. Closes #52850	2020-06-17 22:32:54 +02:00
Alan Woodward	12a3f6dfca	MappedFieldType should not extend FieldType (#58160 ) MappedFieldType is a combination of two concerns: * an extension of lucene's FieldType, defining how a field should be indexed * a set of query factory methods, defining how a field should be searched We want to break these two concerns apart. This commit is a first step to doing this, breaking the inheritance relationship between MappedFieldType and FieldType. MappedFieldType instead has a series of boolean flags defining whether or not the field is searchable or aggregatable, and FieldMapper has a separate FieldType passed to its constructor defining how indexing should be done. Relates to #56814	2020-06-16 16:56:43 +01:00
Tal Levy	69d5e044af	Add optional description parameter to ingest processors. (#57906 ) (#58152 ) This commit adds an optional field, `description`, to all ingest processors so that users can explain the purpose of the specific processor instance. Closes #56000.	2020-06-15 19:27:57 -07:00
Tal Levy	499ad6fcc4	Pre-compile inline scripts in Ingest Script processors (#57960 ) (#58130 ) This commit introduces an optimization for inline scripts. It keeps the compiled ingest script that the ScriptProcessor.Factory has been creating for validation purposes. Previously, the Script Service's cache was leveraged because it was the best way to handle caching of both stored and inline scripts. Since inline scripts are so widely used in Ingest Node, it is probably best to ensure we are using the pre-compiled version from the beginning.	2020-06-15 15:22:56 -07:00
Dan Hermann	8a910443c4	Add ignore_empty_value parameter in set ingest processor (#57030 ) (#58108 )	2020-06-15 08:35:08 -05:00
Rene Groeschke	01e9126588	Remove deprecated usage of testCompile configuration (#57921 ) (#58083 ) * Remove usage of deprecated testCompile configuration * Replace testCompile usage by testImplementation * Make testImplementation non transitive by default (as we did for testCompile) * Update CONTRIBUTING about using testImplementation for test dependencies * Fail on testCompile configuration usage	2020-06-14 22:30:44 +02:00
Martijn van Groningen	c8031c6f99	Add data stream support to the reindex api. (#57970) Backport of #57870 to 7.x branch. This change now also copies the op_type from the reindex request's destination index request to the actual index request being used in the bulk request. For ensuring no document exists, the op_type create doesn't need to be copied, since Versions.MATCH_DELETED will be copied from the 'mainRequest.getDestination().version()'. The `version()` method on IndexRequest only returns Versions.MATCH_DELETED if op_type=create and no specific version has been specified. However in order to be able to index into a data stream, the op_type must be create. So in order to support that the op_type must be copied from the reindex request's destination index request to the actual index request being used in the bulk request. Relates to #53100 and #57788	2020-06-12 09:54:37 +02:00
Mark Tozzi	36f551bdb4	Make ValuesSourceConfig behave like a config object (#57762 ) (#58012 )	2020-06-11 17:23:55 -04:00
Alan Woodward	16e230dcb8	Update to lucene snapshot e7c625430ed (#57981 ) Includes LUCENE-9148 and LUCENE-9398, which splits the BKD metadata, index and data into separate files and keeps the index off-heap.	2020-06-11 14:51:53 +01:00
Nik Everett	0a2bd10758	Save memory when parent and child are not on top (#57892) (#57944) Reworks the `parent` and `child` aggregations when they are not at the top level, using the optimization from #55873. Instead of wrapping all non-top-level `parent` and `child` aggregators we now handle being a child aggregator in the aggregator, specifically by recording which global ordinals show up in the parent and then checking if they match the child.	2020-06-10 16:25:10 -04:00
Yannick Welsch	80f221e920	Use clean thread context for transport and applier service (#57792) (#57914) Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and also that thread contexts are not leaked). Moves the ClusterApplierService to use the system context (same as we do for MasterService), which allows us to remove a hack from TemplateUpgradeService and makes it clearer that applying CS updates is fully executing under system context.	2020-06-10 10:30:28 +02:00
Jake Landis	a370d5eead	[7.x] Ensure Joni warnings are logged at debug (#57302) (#57897) When Joni, the regex engine that powers grok, emits a warning, it does so by default to System.err. System.err logs are all bucketed together in the server log at WARN level. When Joni emits a warning, it can be extremely verbose, logging a message for each execution against that pattern. For ingest node that means for every document that is run through Grok. Fortunately, Joni provides a callback hook to push these warnings to a custom location. This commit implements Joni's callback hook to push the Joni warnings to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor) at debug level. Generally these warnings indicate a possible issue with the regular expression, and upon creation the Grok processor will do a "test run" of the expression and log the result (if any) at WARN level. This WARN level log should only occur on pipeline creation, which is a much lower frequency than every document. Additionally, the documentation is updated with instructions for how to set the logger to debug level.	2020-06-09 17:06:29 -05:00
Yannick Welsch	9eec819c5b	Revert "Use clean thread context for transport and applier service (#57792 )" This reverts commit `259be236cf`.	2020-06-09 22:24:54 +02:00
Jake Landis	fff0a106c9	[7.x] Support `if_seq_no` and `if_primary_term` for ingest (#55430 ) (#57768 ) Allow for optimistic concurrency control during ingest by checking the sequence number and primary term. This is accomplished by defining _if_seq_no and _if_primary_term in the pipeline, similarly to _version and _version_type. Closes #41255 Co-authored-by: Maria Ralli <mariai.ralli@gmail.com>	2020-06-09 14:20:26 -05:00
Yannick Welsch	259be236cf	Use clean thread context for transport and applier service (#57792) Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and also that thread contexts are not leaked). Moves the ClusterApplierService to use the system context (same as we do for MasterService), which allows us to remove a hack from TemplateUpgradeService and makes it clearer that applying CS updates is fully executing under system context.	2020-06-09 12:32:28 +02:00
Mayya Sharipova	70e63a365a	Refactor how to determine if a field is metafield (#57378) (#57771) Previously, a static method of MapperService, isMetadataField, was used to determine whether a field is a meta-field. This method was using an outdated static list of meta-fields. This PR instead changes this method to an instance method that is also aware of meta-fields in all registered plugins. Related #38373, #41656 Closes #24422	2020-06-08 09:16:18 -04:00
Tanguy Leroux	0e57528d5d	Remove more //NORELEASE (#57517 ) We agreed on removing the following //NORELEASE tags.	2020-06-05 15:34:06 +02:00
Armin Braun	24779c80f9	Serialize Outbound Message on Flush (#57084 ) (#57682 ) Follow up to #56961: We can be a little more efficient than just serializing at the IO loop by serializing only when we flush to a channel. This has the advantage that we don't serialize a long queue of messages for a channel that isn't writable for a longer period of time (unstable network, actually writing large volumes of data, etc.). Also, this further reduces the time for which we hold on to the write buffer for a message, making allocations because of an empty page cache recycler pool less likely.	2020-06-04 18:06:13 +02:00
Nik Everett	928794cd61	Make parent and child aggregator more obvious (#57490 ) (#57553 ) Pulls the way that the `ParentJoinAggregator` collects global ordinals into a strategy object so it is a little simpler to reason about and it'll be simpler to save memory by removing `asMultiBucketAggregator` in the future. Relates to #56487	2020-06-02 16:22:38 -04:00
Mark Tozzi	e50f514092	IndexFieldData should hold the ValuesSourceType (#57373 ) (#57532 )	2020-06-02 12:16:53 -04:00
Armin Braun	ba2d70d8eb	Serialize Outbound Messages on IO Threads (#56961 ) (#57080 ) Almost every outbound message is serialized to buffers of 16k pagesize. We were serializing these messages off the IO loop (and retaining the concrete message instance as well) and would then enqueue it on the IO loop to be dealt with as soon as the channel is ready. 1. This would cause buffers to be held onto for longer than necessary, causing less reuse on average. 2. If a channel was slow for some reason, not only would concrete message instances queue up for it, but also 16k of buffers would be reserved for each message until it would be written+flushed physically. With this change, the serialization happens on the event loop which effectively limits the number of buffers that `N` IO-threads will ever use so long as messages are small and channels writable. Also, this change dereferences the reference to the concrete outbound message as soon as it has been serialized to save some more on GC. This reduces the GC time for a default PMC run by about 50% in experiments (3 nodes, 2G heap each, loopback ... obvious caveat is that GC isn't that heavy in the first place with recent changes but still a measurable gain). I also expect it to be helpful for master node stability by causing less of a spike if master is e.g. hit by a large number of requests that are processed batched (e.g. shard snapshot status updates) and responded to in a short time frame all at once. Obviously, the downside to this change is that it introduces more latency on the IO loop for the serialization. But since we read all of these messages on the IO loop as well I don't see it as much of a qualitative change really and the more predictable buffer use seems much more valuable relatively.	2020-06-02 16:15:18 +02:00
Nik Everett	f52e779806	Fix casting of scaled_float in sorts (#57207 ) (#57385 ) Previously we'd get a `ClassCastException` when you tried to use `numeric_type` on `scaled_float`. Oops! This cleans up the CCE and moves some code around so the casting actually works.	2020-05-29 18:06:04 -04:00
Tomasz Elendt	a7c36c8af5	Support multiple tokens on LHS in stemmer_override rules (#56113) (#56484) This commit adds support for rules with multiple tokens on LHS, also known as "contraction rules", into stemmer override token filter. Contraction rules are handy for translating multiple inflected words into the same root form. One side effect of this change is that it brings stemmer override rules format closer to synonym rules format so that it makes it easier to translate one into another. This change also makes stemmer override rules parser more strict so that it should catch more errors which were previously accepted. Closes #56113	2020-05-29 22:34:31 +02:00
Henning Andersen	8427d677e9	Reindex and friends fail nicely when max_docs < slices (#54901) (#57348) When the parameter `max_docs` is less than `slices` in the update_by_query, delete_by_query or reindex API, `max_docs` is set to 0 and we throw an action_request_validation_exception with a confusing error message: "maxDocs should be greater than 0...". This change checks whether `max_docs` is less than `slices` and throws an illegal_argument_exception with a clear message. Relates to #52786. Co-authored-by: bellengao <gbl_long@163.com>	2020-05-29 14:30:14 +02:00
Lee Hinman	c0f732b9f6	[7.x] Rename template V2 classes to ComposableTemplate (#57183 ) (#57232 ) Backports the following commits to 7.x: Rename template V2 classes to ComposableTemplate (#57183)	2020-05-27 11:01:59 -06:00
Alan Woodward	d6b79bcd95	Remove Mapper.updateFieldType() (#57151 ) When we had multiple mapping types, an update to a field in one type had to be propagated to the same field in all other types. This was done using the Mapper.updateFieldType() method, called at the end of a merge. However, now that we only have a single type per index, this method is unnecessary and can be removed. Relates to #41059 Backport of #56986	2020-05-27 09:21:24 +01:00
Armin Braun	56401d3f66	Release HTTP Request Body Earlier (#57094 ) (#57110 ) We don't need to hold on to the request body past the beginning of sending the response. There is no need to keep a reference to it until after the response has been sent fully and we can eagerly release it here. Note, this can be optimized further to release the contents even earlier but for now this is an easy increment to saving some memory on the IO pool.	2020-05-25 13:00:19 +02:00
Jack Conradson	35c546b388	Backports for _source bug fix in scripting (#57068 ) * Update DeprecationMap to DynamicMap (#56149) This renames DeprecationMap to DynamicMap, and changes the deprecation messages Map to accept a Map of String (keys) to Functions (updated values) instead. This creates more flexibility in either logging or updating values from params within a script. This change is required to fix (#52103) in a future PR. * Fix Source Return Bug in Scripting (#56831) This change ensures that when a user returns _source directly no matter where accessed within scripting, the value is a Map of the converted source as opposed to a SourceLookup.	2020-05-21 17:07:38 -07:00
markharwood	eb8cb31d46	Update Lucene version to 8.6.0-snapshot-9d6c738ffce (#57024 ) Same version as master	2020-05-21 11:28:16 +01:00
Andrei Balici	19a336e8d3	Add `max_token_length` setting to the CharGroupTokenizer (#56860 ) Adds `max_token_length` option to the CharGroupTokenizer. Updates documentation as well to reflect the changes. Closes #56676	2020-05-20 14:28:40 +02:00
Alan Woodward	18bfbeda29	Move merge compatibility logic from MappedFieldType to FieldMapper (#56915 ) Merging logic is currently split between FieldMapper, with its merge() method, and MappedFieldType, which checks for merging compatibility. The compatibility checks are called from a third class, MappingMergeValidator. This makes it difficult to reason about what is or is not compatible in updates, and even what is in fact updateable - we have a number of tests that check compatibility on changes in mapping configuration that are not in fact possible. This commit refactors the compatibility logic so that it all sits on FieldMapper, and makes it called at merge time. It adds a new FieldMapperTestCase base class that FieldMapper tests can extend, and moves the compatibility testing machinery from FieldTypeTestCase to here. Relates to #56814	2020-05-20 09:43:13 +01:00
Tim Brooks	57c3a61535	Create HttpRequest earlier in pipeline (#56393 ) Elasticsearch requires that a HttpRequest abstraction be implemented by http modules before server processing. This abstraction controls when underlying resources are released. This commit moves this abstraction to be created immediately after content aggregation. This change will enable follow-up work including moving Cors logic into the server package and tracking bytes as they are aggregated from the network level.	2020-05-18 14:54:01 -06:00
Armin Braun	cac85a6f18	Shorter Path in Netty ByteBuf Unwrap (#56740) (#56857) In most cases we are seeing a `PooledHeapByteBuf` here now. No need to redundantly create a new `ByteBuffer` and single element array for it here when we can just directly unwrap its internal `byte[]`.	2020-05-16 11:54:36 +02:00
Alan Woodward	d33d13f2be	Simplify generics on Mapper.Builder (#56747 ) Mapper.Builder currently has some complex generics on it to allow fluent builder construction. However, the second parameter, a return type from the build() method, is unnecessary, as we can use covariant return types. This commit removes this second generic parameter.	2020-05-15 12:14:49 +01:00
Ryan Ernst	9fb80d3827	Move publishing configuration to a separate plugin (#56727 ) This is another part of the breakup of the massive BuildPlugin. This PR moves the code for configuring publications to a separate plugin. Most of the time these publications are jar files, but this also supports the zip publication we have for integ tests.	2020-05-14 20:23:07 -07:00
Armin Braun	14a042fbe5	Make No. of Transport Threads == Available CPUs (#56488 ) (#56780 ) We never do any file IO or other blocking work on the transport threads so no tangible benefit can be derived from using more threads than CPUs for IO. There are however significant downsides to using more threads than necessary with Netty in particular. Since we use the default setting for `io.netty.allocator.useCacheForAllThreads` which is `true` we end up using up to `16MB` of thread local buffer cache for each transport thread. Meaning we potentially waste CPUs * 16MB of heap for unnecessary IO threads in addition to obvious inefficiencies of artificially adding extra context switches.	2020-05-14 21:33:46 +02:00
Mark Tozzi	b718193a01	Clean up DocValuesIndexFieldData (#56372 ) (#56684 )	2020-05-14 12:42:37 -04:00
Julie Tibshirani	1ad83c37c4	Use index sort range query when possible. (#56710 ) This PR proposes to use `IndexSortSortedNumericDocValuesRangeQuery` when possible to speed up certain range queries. Points-based queries are already very efficient, the only time this query makes a difference is when the range matches a large number of documents. Relates to #48665.	2020-05-13 13:24:45 -07:00
Ignacio Vera	b4521d5183	upgrade to Lucene 8.6.0 snapshot (#56661 )	2020-05-13 14:25:16 +02:00
Jake Landis	a56fb6192e	[7.x] Fix ingest simulate verbose on failure with conditional (#56478) (#56635) If a conditional is added to a processor, and that processor fails, and that processor has an on_failure handler, the full trace of all of the executed processors may not be displayed in simulate verbose. The information is correct, but misses displaying some of the steps used to get there. This happens because a conditional processor is a wrapper around the real processor, and a processor with an on_failure handler is also a wrapper around the processor(s). When decorating for simulation we treat compound processors specially, but if a compound processor is wrapped by a conditional processor, that compound processor's processors can be missed for decoration, resulting in the missing displayed steps. The fix to this is to treat the conditional processor specially and explicitly separate it from the processor it is wrapping. This requires us to keep track of 2 processors: a possible conditional processor and the actual processor it may be wrapping. related: #56004	2020-05-12 15:41:05 -05:00
Armin Braun	b449661b8f	Remove Unused ByteBufStreamInput (#56567 ) (#56601 ) We're not using this one any more.	2020-05-12 16:04:58 +02:00
Tim Brooks	760ab726c2	Share netty event loops between transports (#56553) Currently Elasticsearch creates independent event loop groups for each transport type (http and internal). This is unnecessary and can lead to contention when different threads access shared resources (ex: allocators). This commit moves to a model where, by default, the event loops are shared between the transports. The previous behavior can be attained by specifically setting the http worker count.	2020-05-11 15:43:43 -06:00
Nik Everett	2f38aeb5e2	Save memory when numeric terms agg is not top (#55873) (#56454) Right now all implementations of the `terms` agg allocate a new `Aggregator` per bucket. This uses a bunch of memory. Exactly how much isn't clear but each `Aggregator` ends up making its own objects to read doc values which have non-trivial buffers. And it forces all of its sub-aggregations to do the same. We allocate a new `Aggregator` per bucket for two reasons: 1. We didn't have an appropriate data structure to track the sub-ordinals of each parent bucket. 2. You can only make a single call to `runDeferredCollections(long...)` per `Aggregator` which was the only way to delay collection of sub-aggregations. This change switches the method that builds aggregation results from building them one at a time to building all of the results for the entire aggregator at the same time. It also adds a fairly simplistic data structure to track the sub-ordinals for `long`-keyed buckets. It uses both of those to power numeric `terms` aggregations and removes the per-bucket allocation of their `Aggregator`. This fairly substantially reduces memory consumption of numeric `terms` aggregations that are not the "top level", especially when those aggregations contain many sub-aggregations. It also is a pretty big speed up, especially when the aggregation is under a non-selective aggregation like the `date_histogram`. I picked numeric `terms` aggregations because those have the simplest implementation. At least, I could kind of fit it in my head. And I haven't fully understood the "bytes"-based terms aggregations, but I imagine I'll be able to make similar optimizations to them in follow up changes.	2020-05-08 20:38:53 -04:00
Mark Vieira	0fb9bc5379	Always use archive base name as the pom artifact id (#56447 ) (#56467 )	2020-05-08 16:11:19 -07:00
Jason Tedor	33669c0420	Upgrade to Jackson 2.10.4 (#56188 ) Another Jackson release is available. There are some CVEs addressed, none of which impact us, but since we can now bump Jackson easily, let us move along with the train to avoid the false positives from security scanners.	2020-05-06 17:20:23 -04:00
Julie Tibshirani	e852bb29b7	Simplify signature of FieldMapper#parseCreateField. (#56144 ) `FieldMapper#parseCreateField` accepts the parse context, plus a list of fields as an output parameter. These fields are immediately added to the document through `ParseContext#doc()`. This commit simplifies the signature by removing the list of fields, and having the mappers add the fields directly to `ParseContext#doc()`. I think this is nicer for implementors, because previously fields could be added either through the list, or the context (through `add`, `addWithKey`, etc.)	2020-05-06 11:12:09 -07:00
Nhat Nguyen	c305cfbbb6	Fix CancelTests#testDeleteByQueryCancelWithWorkers (#56242 ) We need to relax the assertion as a TaskCancelledException can be suppressed instead. Closes #55647	2020-05-06 09:55:40 -04:00
Tim Brooks	6a51017cb2	Upgrade netty to 4.1.49.Final (#56059 )	2020-05-05 10:40:23 -06:00
Martijn van Groningen	2ac32db607	Move includeDataStream flag from IndicesOptions to IndexNameExpressionResolver.Context (#56151 ) Backport of #56034. Move includeDataStream flag from an IndicesOptions to IndexNameExpressionResolver.Context as a dedicated field that callers to IndexNameExpressionResolver can set. Also alter indices stats api to support data streams. The rollover api uses this api and otherwise rolling over a data stream no longer works. Relates to #53100	2020-05-04 22:38:33 +02:00
Armin Braun	75d4a4def4	Fix potential NPE in Netty4Transport.stopInternal (#56080 ) (#56129 ) Closes #56068	2020-05-04 19:38:21 +02:00
markharwood	e197b6c45b	Analysis enhancement - add preserve_original setting in ngram-token-filter (#55432 ) (#56100 ) Authored-by: Amit Khandelwal <amitmbm87@gmail.com>	2020-05-04 11:31:28 +01:00
Dan Hermann	2061652988	Ensure auto close of HTMLStripCharFilter in HtmlStripProcessor The HtmlStripProcessor did not use a try-with resources block to ensure that the used HTMLStripCharFilter is closed.	2020-05-01 17:31:53 -05:00
Igor Motov	d8f9df771d	Expose agg usage in Feature Usage API (#55732 ) (#56048 ) Counts usage of the aggs and exposes them on the _nodes/usage/. Closes #53746	2020-04-30 12:53:36 -04:00
Przemko Robakowski	797f63e743	[7.x] Emit deprecation warning if multiple v1 templates match with a new index (#55558 ) (#56038 ) * Emit deprecation warning if multiple v1 templates match with a new index (#55558) * Emit deprecation warning if multiple v1 templates match with a new index * DEPRECATION_LOGGER rename	2020-04-30 17:36:17 +02:00
Przemko Robakowski	bf0204ba06	Fix empty_value handling in CsvProcessor (#55649 ) (#55968 ) * Fix empty_value handling in CsvProcessor Due to bug in `CsvProcessor.Factory` it was impossible to specify `empty_value`. This change fixes that and adds relevant test. Closes #55643 * assert changed	2020-04-29 22:37:22 +02:00
Amit Khandelwal	126e4acca8	Expose `preserve_original` in `edge_ngram` token filter (#55766 ) The Lucene `preserve_original` setting is currently not supported in the `edge_ngram` token filter. This change adds it with a default value of `false`. Closes #55767	2020-04-28 10:24:27 +02:00
Tim Brooks	80662f31a1	Introduce mechanism to stub request handling (#55832 ) Currently there is a clear mechanism to stub sending a request through the transport. However, this is limited to testing exceptions on the sender side. This commit reworks our transport related testing infrastructure to allow stubbing request handling on the receiving side.	2020-04-27 16:57:15 -06:00
Ryan Ernst	70b499b7aa	Simplify java home verification (#55635 ) * Simplify java home verification At one time, all uses of java home were found through the getJavaHome utility method on BuildPlugin. However, that was changed many refactorings ago, but the complex support for registering a java home version needed that fails at configuration time still exists. The only remaining use of grabbing java home is within bwc tests, and must be at runtime since that is when we have the checkout and know what version is needed. This commit consolidates the java home finding method into a utility unassociated with BuildPlugin. * fix checkstyle * address feedback	2020-04-27 12:43:32 -07:00
Jake Landis	7b4bacebb5	[7.x] fix the schema validation for scripts_painless_context (#55738 ) (#55751 )	2020-04-27 08:39:56 -05:00
Rory Hunter	d66af46724	Always use deprecateAndMaybeLog for deprecation warnings (#55319 ) Backport of #55115. Replace calls to deprecate(String,Object...) with deprecateAndMaybeLog(...), with an appropriate key, so that all messages can potentially be deduplicated.	2020-04-23 09:20:54 +01:00
Jake Landis	25ea6a74f0	[7.x] Validate REST specs against schema (#55117 ) (#55563 ) A JSON schema was recently introduced for the REST API specification (#54252). This PR introduces a 3rd party validation tool to ensure that the REST specification conforms to the schema. The task is applied to the 3 projects that contain REST API specifications. The plugin wires this task into the precommit task, and should be considered as part of the public API for the build tools for any plugin developer to contribute their plugin's specification. An ignore parameter has been introduced for the task to allow specific files to be ignored from the validation. The ignored files in this PR will soon get issues logged and a link so they can be fixed. Closes #54314	2020-04-22 14:14:03 -05:00
Tal Levy	0844455505	Add geo_shape mapper supporting doc-values in Spatial Plugin (#55037 ) (#55500 ) After #53562, the `geo_shape` field mapper is registered within a module. This opens the door for introducing a new `geo_shape` field mapper into the Spatial Plugin that has doc-values support. This is very much an extension of server's GeoShapeFieldMapper, but with the addition of the doc values implementation.	2020-04-22 08:12:54 -07:00
Jason Tedor	1553e7e620	Encapsulate systemd extender The systemd extender is a scheduled execution that ensures we repeatedly let systemd know during startup that we are still starting up. We cancel this scheduled execution once the node has successfully started up. This extender is wrapped in a set once, which we expose directly. This commit addresses this by putting the extender behind a getter, which hides the implementation detail that the extender is wrapped in a set once. This cleans up some issues in tests, ensuring that we are not making assertions about the set once, but instead about the extender.	2020-04-20 21:17:42 -04:00
Jason Tedor	80f18ad31a	Use set once for systemd extender (#55497 ) When Elasticsearch is starting up, we schedule a thread to repeatedly let systemd know that we are still in the process of starting up. Today we use a non-final field for this. This commit changes this to be a set once so we can mark the field as final, and get stronger guarantees when reasoning about the state of execution here.	2020-04-20 21:15:04 -04:00
Zachary Tong	f46b567563	Convert InternalAggTestCase to AbstractNamedWriteableTestCase (#55250 ) Some aggregations, such as the Terms* family, will use an alternate class to represent unmapped shard results (while the rest of the aggs use the same object but with some form of "empty" or "nullish" values to represent unmapped). This was problematic with AbstractWireSerializingTestCase because it expects the instanceReader to always match the original class. Instead, we need to use the NamedWriteable version so that the registry can be consulted for the proper deserialization reader.	2020-04-17 16:39:38 -04:00
Martijn van Groningen	417d5f2009	Make data streams in APIs resolvable. (#55337 ) Backport from: #54726 The INCLUDE_DATA_STREAMS indices option controls whether data streams can be resolved in an api for both concrete names and wildcard expressions. If data streams cannot be resolved then a 400 error is returned indicating that data streams cannot be used. In this pr, the INCLUDE_DATA_STREAMS indices option is enabled in the following APIs: search, msearch, refresh, index (op_type create only) and bulk (index requests with op type create only). In a subsequent change, we will determine which other APIs need to be able to resolve data streams and enable the INCLUDE_DATA_STREAMS indices option for these APIs. Whether an api resolves all backing indices of a data stream or the latest index of a data stream (write index) depends on the IndexNameExpressionResolver.Context.isResolveToWriteIndex(). If isResolveToWriteIndex() returns true then data streams resolve to the latest index (for example: index api) and otherwise a data stream resolves to all backing indices of a data stream (for example: search api). Relates to #53100	2020-04-17 08:33:37 +02:00
Mark Tozzi	22c55180c1	[7.x] Backport ValuesSourceRegistry and related work (#54922 ) * Add ValuesSource Registry and associated logic (#54281) * Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (#48638) * ValuesSourceRegistry Prototype (#48758) * Remove generics from ValuesSource related classes (#49606) * fix percentile aggregation tests (#50712) * Basic thread safety for ValuesSourceRegistry (#50340) * Remove target value type from ValuesSourceAggregationBuilder (#49943) * Cleanup default values source type (#50992) * CoreValuesSourceType no longer implements Writable (#51276) * Remove genereics & hard coded ValuesSource references from Matrix Stats (#51131) * Put values source types on fields (#51503) * Remove VST Any (#51539) * Rewire terms agg to use new VS registry (#51182) Also adds some basic AggTestCases for untested code paths (and boilerplate for future tests once the IT are converted over) * Wire Cardinality aggregation to work with the ValuesSourceRegistry (#51337) * Wire Percentiles aggregator into new VS framework (#51639) This required a bit of a refactor to percentiles itself. Before, the Builder would switch on the chosen algo to generate an algo-specific factory. This doesn't work (or at least, would be difficult) in the new VS framework. This refactor consolidates both factories together and introduces a PercentilesConfig object to act as a standardized way to pass algo-specific parameters through the factory. This object is then used when deciding which kind of aggregator to create Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will be moved in a subsequent PR. 
* Remove generics and target value type from MultiVSAB (#51647) * fix checkstyle after merge (#52008) * Plumb ValuesSourceRegistry through to QuerySearchContext (#51710) * Convert RareTerms to new VS registry (#52166) * Wire up Value Count (#52225) * Wire up Max & Min aggregations (#52219) * ValuesSource refactoring: Wire up Sum aggregation (#52571) * ValuesSource refactoring: Wire up SigTerms aggregation (#52590) * Soft immutability for VSConfig (#52729) * Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (#52734) Also fixes Percentiles which was incorrectly specified to only accept numeric, but in fact also accepts Boolean and Date (because those are numeric on master - thanks `testSupportedFieldTypes` for catching it!) * VS refactoring: Wire up stats aggregation (#52891) * ValuesSource refactoring: Wire up string_stats aggregation (#52875) * VS refactoring: Wire up median (MAD) aggregation (#52945) * fix valuesourcetype issue with constant_keyword field (#53041)x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupIndexer.java this commit implements `getValuesSourceType` for the ConstantKeyword field type. master was merged into feature/extensible-values-source introducing a new field type that was not implementing `getValuesSourceType`. * ValuesSource refactoring: Wire up Avg aggregation (#52752) * Wire PercentileRanks aggregator into new VS framework (#51693) * Add a VSConfig resolver for aggregations not using the registry (#53038) * Vs refactor wire up ranges and date ranges (#52918) * Wire up geo_bounds aggregation to ValuesSourceRegistry (#53034) This commit updates the geo_bounds aggregation to depend on registering itself in the ValuesSourceRegistry relates #42949. 
* VS refactoring: convert Boxplot to new registry (#53132) * Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (#53037) This commit updates the geo_grid aggregations to depend on registering itself in the ValuesSourceRegistry relates to the values-source refactoring meta issue #42949. Wire-up geo_centroid agg to ValuesSourceRegistry (#53040) This commit updates the geo_centroid aggregation to depend on registering itself in the ValuesSourceRegistry. relates to the values-source refactoring meta issue #42949. * Fix type tests for Missing aggregation (#53501) * ValuesSource Refactor: move histo VSType into XPack module (#53298) - Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core. - This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept - Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs * Wire up DateHistogram to the ValuesSourceRegistry (#53484) * Vs refactor parser cleanup (#53198) Co-authored-by: Zachary Tong <polyfractal@elastic.co> Co-authored-by: Zachary Tong <zach@elastic.co> Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com> Co-authored-by: Tal Levy <JubBoy333@gmail.com> * First batch of easy fixes * Remove List.of from ValuesSourceRegistry Note that we intend to have a follow up PR dealing with the mutability of the registry, so I didn't even try to address that here. 
* More compiler fixes * More compiler fixes * More compiler fixes * Precommit is happy and so am I * Add new Core VSTs to tests * Disabled supported type test on SigTerms until we can backport its fix * fix checkstyle * Fix test failure from semantic merge issue * Fix some metaData->metadata replacements that got lost * Fix list of supported types for MinAggregator * Fix list of supported types for Avg * remove unused import Co-authored-by: Zachary Tong <polyfractal@elastic.co> Co-authored-by: Zachary Tong <zach@elastic.co> Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com> Co-authored-by: Tal Levy <JubBoy333@gmail.com>	2020-04-16 16:54:46 -04:00
David Turner	7941f4a47e	Add RepositoriesService to createComponents() args (#54814 ) Today we pass the `RepositoriesService` to the searchable snapshots plugin during the initialization of the `RepositoryModule`, forcing the plugin to be a `RepositoryPlugin` even though it does not implement any repositories. After discussion we decided it best for now to pass this in via `Plugin#createComponents` instead, pending some future work in which plugins can depend on services more dynamically.	2020-04-16 16:27:36 +01:00
William Brafford	2ba3be9db6	Remove deprecated third-party methods from tests (#55255 ) (#55269 ) I've noticed that a lot of our tests are using deprecated static methods from the Hamcrest matchers. While this is not a big deal in any objective sense, it seems like a small good thing to reduce compilation warnings and be ready for a new release of the matcher library if we need to upgrade. I've also switched a few other methods in tests that have drop-in replacements.	2020-04-15 17:54:47 -04:00
Ignacio Vera	a677b63daa	Upgrade to lucene 8.5.1 release (#55229 ) (#55235 ) Upgrade to lucene 8.5.1 release that contains a bug fix for a bug that might introduce index corruption when deleting data from an index that was previously shrunk.	2020-04-15 17:35:42 +02:00
Mark Vieira	ce85063653	[7.x] Re-add origin url information to publish POM files (#55173 )	2020-04-14 13:24:15 -07:00
William Brafford	52bebec51f	NodeInfo response should use a collection rather than fields (#54460 ) (#55132 ) This is a first cut at giving NodeInfo the ability to carry a flexible list of heterogeneous info responses. The trick is to be able to serialize and deserialize an arbitrary list of blocks of information. It is convenient to be able to deserialize into usable Java objects so that we can aggregate nodes stats for the cluster stats endpoint. In order to provide a little bit of clarity about which objects can and can't be used as info blocks, I've introduced a new interface called "ReportingService." I have removed the hard-coded getters (e.g., getOs()) in favor of a flexible method that can return heterogeneous kinds of info blocks (e.g., getInfo(OsInfo.class)). Taking a class as an argument removes the need to cast in the client code.	2020-04-13 17:18:39 -04:00
Jake Landis	a2fafa6af4	[7.x] Lazy test cluster module and plugins (#54852 ) (#55087 ) This change converts the module and plugin parameters for testClusters to be lazy. Meaning that the values are not resolved until they are actually used. This removes the requirement to use project.afterEvaluate to be able to resolve the bundle artifact. Note - this does not completely remove the need for afterEvaluate since it is still needed for the custom resource extension.	2020-04-13 10:53:35 -05:00
Jason Tedor	9eeae59a83	Clarify available processors (#54907 ) The use of available processors, the terminology, and the settings around it have evolved over time. This commit cleans up some places in the codes and in the docs to adjust to the current terminology.	2020-04-10 08:48:27 -04:00
Mark Vieira	dd73a14d11	Improve total build configuration time (#54611 ) (#54994 ) This commit includes a number of changes to reduce overall build configuration time. These optimizations include: - Removing the usage of the 'nebula.info-scm' plugin. This plugin leverages jgit to load various pieces of VCS information. This is mostly overkill and we have our own minimal implementation for determining the current commit id. - Removing unnecessary build dependencies such as perforce and jgit now that we don't need them. This reduces our classpath considerably. - Expanding the usage of lazy task creation, particularly in our distribution projects. The archives and packages projects create lots of tasks with very complex configuration. Avoiding the creation of these tasks at configuration time gives us a nice boost.	2020-04-08 16:47:02 -07:00
Jay Modi	3600c9862f	Reintroduce system index APIs for Kibana (#54935 ) This change reintroduces the system index APIs for Kibana without the changes made for marking what system indices could be accessed using these APIs. In essence, this is a partial revert of #53912. The changes for marking what system indices should be allowed access will be handled in a separate change. The APIs introduced here are wrapped versions of the existing REST endpoints. A new setting is also introduced since the Kibana system indices' names are allowed to be changed by a user in case multiple instances of Kibana use the same instance of Elasticsearch. Relates #52385 Backport of #54858	2020-04-08 09:08:49 -06:00
Tal Levy	254d1e3543	[7.x] Create new `geo` module and migrate geo_shape registration (#53562 ) (#54924 ) This commit introduces a new `geo` module that is intended to contain all the geo-spatial-specific features in server. As a first step, the responsibility of registering the geo_shape field mapper is moved to this module. Co-authored-by: Nicholas Knize <nknize@gmail.com>	2020-04-07 16:30:58 -07:00

5617 Commits