OpenSearch

Commit Graph

Author	SHA1	Message	Date
Alan Woodward	16e230dcb8	Update to lucene snapshot e7c625430ed (#57981 ) Includes LUCENE-9148 and LUCENE-9398, which splits the BKD metadata, index and data into separate files and keeps the index off-heap.	2020-06-11 14:51:53 +01:00
Nik Everett	0a2bd10758	Save memory when parent and child are not on top (#57892) (#57944) Reworks how the `parent` and `child` aggregations run when they are not at the top level, using the optimization from #55873. Instead of wrapping all non-top-level `parent` and `child` aggregators we now handle being a child aggregator in the aggregator, specifically by recording which global ordinals show up in the parent and then checking if they match the child.	2020-06-10 16:25:10 -04:00
Yannick Welsch	80f221e920	Use clean thread context for transport and applier service (#57792) (#57914) Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and also that thread contexts are not leaked). Moves the ClusterApplierService to use the system context (same as we do for MasterService), which allows removing a hack from TemplateUpgradeService and makes it clearer that applying CS updates is fully executing under system context.	2020-06-10 10:30:28 +02:00
Jake Landis	a370d5eead	[7.x] Ensure Joni warnings are logged at debug (#57302) (#57897) When Joni, the regex engine that powers grok, emits a warning it does so by default to System.err. System.err logs are all bucketed together in the server log at WARN level. When Joni emits a warning, it can be extremely verbose, logging a message for each execution against that pattern. For an ingest node that means for every document that is run through Grok. Fortunately, Joni provides a callback hook to push these warnings to a custom location. This commit implements Joni's callback hook to push the Joni warnings to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor) at debug level. Generally these warnings indicate a possible issue with the regular expression, and upon creation the Grok processor will do a "test run" of the expression and log the result (if any) at WARN level. This WARN level log should only occur on pipeline creation, which is a much lower frequency than every document. Additionally, the documentation is updated with instructions for how to set the logger to debug level.	2020-06-09 17:06:29 -05:00
Yannick Welsch	9eec819c5b	Revert "Use clean thread context for transport and applier service (#57792 )" This reverts commit `259be236cf`.	2020-06-09 22:24:54 +02:00
Jake Landis	fff0a106c9	[7.x] Support `if_seq_no` and `if_primary_term` for ingest (#55430 ) (#57768 ) Allow for optimistic concurrency control during ingest by checking the sequence number and primary term. This is accomplished by defining _if_seq_no and _if_primary_term in the pipeline, similarly to _version and _version_type. Closes #41255 Co-authored-by: Maria Ralli <mariai.ralli@gmail.com>	2020-06-09 14:20:26 -05:00
Yannick Welsch	259be236cf	Use clean thread context for transport and applier service (#57792) Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and also that thread contexts are not leaked). Moves the ClusterApplierService to use the system context (same as we do for MasterService), which allows removing a hack from TemplateUpgradeService and makes it clearer that applying CS updates is fully executing under system context.	2020-06-09 12:32:28 +02:00
Mayya Sharipova	70e63a365a	Refactor how to determine if a field is metafield (#57378) (#57771) Previously, the static method isMetadataField of MapperService was used to determine whether a field is a meta-field. This method relied on an outdated static list of meta-fields. This PR changes it to an instance method that is also aware of meta-fields in all registered plugins. Related #38373, #41656 Closes #24422	2020-06-08 09:16:18 -04:00
Tanguy Leroux	0e57528d5d	Remove more //NORELEASE (#57517 ) We agreed on removing the following //NORELEASE tags.	2020-06-05 15:34:06 +02:00
Armin Braun	24779c80f9	Serialize Outbound Message on Flush (#57084 ) (#57682 ) Follow up to #56961: We can be a little more efficient than just serializing at the IO loop by serializing only when we flush to a channel. This has the advantage that we don't serialize a long queue of messages for a channel that isn't writable for a longer period of time (unstable network, actually writing large volumes of data, etc.). Also, this further reduces the time for which we hold on to the write buffer for a message, making allocations because of an empty page cache recycler pool less likely.	2020-06-04 18:06:13 +02:00
Nik Everett	928794cd61	Make parent and child aggregator more obvious (#57490 ) (#57553 ) Pulls the way that the `ParentJoinAggregator` collects global ordinals into a strategy object so it is a little simpler to reason about and it'll be simpler to save memory by removing `asMultiBucketAggregator` in the future. Relates to #56487	2020-06-02 16:22:38 -04:00
Mark Tozzi	e50f514092	IndexFieldData should hold the ValuesSourceType (#57373 ) (#57532 )	2020-06-02 12:16:53 -04:00
Armin Braun	ba2d70d8eb	Serialize Outbound Messages on IO Threads (#56961 ) (#57080 ) Almost every outbound message is serialized to buffers of 16k pagesize. We were serializing these messages off the IO loop (and retaining the concrete message instance as well) and would then enqueue it on the IO loop to be dealt with as soon as the channel is ready. 1. This would cause buffers to be held onto for longer than necessary, causing less reuse on average. 2. If a channel was slow for some reason, not only would concrete message instances queue up for it, but also 16k of buffers would be reserved for each message until it would be written+flushed physically. With this change, the serialization happens on the event loop which effectively limits the number of buffers that `N` IO-threads will ever use so long as messages are small and channels writable. Also, this change dereferences the reference to the concrete outbound message as soon as it has been serialized to save some more on GC. This reduces the GC time for a default PMC run by about 50% in experiments (3 nodes, 2G heap each, loopback ... obvious caveat is that GC isn't that heavy in the first place with recent changes but still a measurable gain). I also expect it to be helpful for master node stability by causing less of a spike if master is e.g. hit by a large number of requests that are processed batched (e.g. shard snapshot status updates) and responded to in a short time frame all at once. Obviously, the downside to this change is that it introduces more latency on the IO loop for the serialization. But since we read all of these messages on the IO loop as well I don't see it as much of a qualitative change really and the more predictable buffer use seems much more valuable relatively.	2020-06-02 16:15:18 +02:00
Nik Everett	f52e779806	Fix casting of scaled_float in sorts (#57207 ) (#57385 ) Previously we'd get a `ClassCastException` when you tried to use `numeric_type` on `scaled_float`. Oops! This cleans up the CCE and moves some code around so the casting actually works.	2020-05-29 18:06:04 -04:00
Tomasz Elendt	a7c36c8af5	Support multiple tokens on LHS in stemmer_override rules (#56113) (#56484) This commit adds support for rules with multiple tokens on LHS, also known as "contraction rules", into stemmer override token filter. Contraction rules are handy for translating multiple inflected words into the same root form. One side effect of this change is that it brings stemmer override rules format closer to synonym rules format so that it makes it easier to translate one into another. This change also makes stemmer override rules parser more strict so that it should catch more errors which were previously accepted. Closes #56113	2020-05-29 22:34:31 +02:00
Henning Andersen	8427d677e9	Reindex and friends fail nicely when max_docs < slices (#54901) (#57348) When the parameter `max_docs` is less than `slices` in update_by_query, delete_by_query or reindex API, `max_docs` is set to 0 and we throw an action_request_validation_exception with a confusing error message: "maxDocs should be greater than 0...". This change checks whether `max_docs` is less than `slices` and throws an illegal_argument_exception with a clear message. Relates to #52786. Co-authored-by: bellengao <gbl_long@163.com>	2020-05-29 14:30:14 +02:00
Lee Hinman	c0f732b9f6	[7.x] Rename template V2 classes to ComposableTemplate (#57183 ) (#57232 ) Backports the following commits to 7.x: Rename template V2 classes to ComposableTemplate (#57183)	2020-05-27 11:01:59 -06:00
Alan Woodward	d6b79bcd95	Remove Mapper.updateFieldType() (#57151 ) When we had multiple mapping types, an update to a field in one type had to be propagated to the same field in all other types. This was done using the Mapper.updateFieldType() method, called at the end of a merge. However, now that we only have a single type per index, this method is unnecessary and can be removed. Relates to #41059 Backport of #56986	2020-05-27 09:21:24 +01:00
Armin Braun	56401d3f66	Release HTTP Request Body Earlier (#57094 ) (#57110 ) We don't need to hold on to the request body past the beginning of sending the response. There is no need to keep a reference to it until after the response has been sent fully and we can eagerly release it here. Note, this can be optimized further to release the contents even earlier but for now this is an easy increment to saving some memory on the IO pool.	2020-05-25 13:00:19 +02:00
Jack Conradson	35c546b388	Backports for _source bug fix in scripting (#57068 ) * Update DeprecationMap to DynamicMap (#56149) This renames DeprecationMap to DynamicMap, and changes the deprecation messages Map to accept a Map of String (keys) to Functions (updated values) instead. This creates more flexibility in either logging or updating values from params within a script. This change is required to fix (#52103) in a future PR. * Fix Source Return Bug in Scripting (#56831) This change ensures that when a user returns _source directly no matter where accessed within scripting, the value is a Map of the converted source as opposed to a SourceLookup.	2020-05-21 17:07:38 -07:00
markharwood	eb8cb31d46	Update Lucene version to 8.6.0-snapshot-9d6c738ffce (#57024 ) Same version as master	2020-05-21 11:28:16 +01:00
Andrei Balici	19a336e8d3	Add `max_token_length` setting to the CharGroupTokenizer (#56860 ) Adds `max_token_length` option to the CharGroupTokenizer. Updates documentation as well to reflect the changes. Closes #56676	2020-05-20 14:28:40 +02:00
Alan Woodward	18bfbeda29	Move merge compatibility logic from MappedFieldType to FieldMapper (#56915 ) Merging logic is currently split between FieldMapper, with its merge() method, and MappedFieldType, which checks for merging compatibility. The compatibility checks are called from a third class, MappingMergeValidator. This makes it difficult to reason about what is or is not compatible in updates, and even what is in fact updateable - we have a number of tests that check compatibility on changes in mapping configuration that are not in fact possible. This commit refactors the compatibility logic so that it all sits on FieldMapper, and makes it called at merge time. It adds a new FieldMapperTestCase base class that FieldMapper tests can extend, and moves the compatibility testing machinery from FieldTypeTestCase to here. Relates to #56814	2020-05-20 09:43:13 +01:00
Tim Brooks	57c3a61535	Create HttpRequest earlier in pipeline (#56393 ) Elasticsearch requires that a HttpRequest abstraction be implemented by http modules before server processing. This abstraction controls when underlying resources are released. This commit moves this abstraction to be created immediately after content aggregation. This change will enable follow-up work including moving Cors logic into the server package and tracking bytes as they are aggregated from the network level.	2020-05-18 14:54:01 -06:00
Armin Braun	cac85a6f18	Shorter Path in Netty ByteBuf Unwrap (#56740) (#56857) In most cases we are seeing a `PooledHeapByteBuf` here now. No need to redundantly create a new `ByteBuffer` and single element array for it here when we can just directly unwrap its internal `byte[]`.	2020-05-16 11:54:36 +02:00
Alan Woodward	d33d13f2be	Simplify generics on Mapper.Builder (#56747 ) Mapper.Builder currently has some complex generics on it to allow fluent builder construction. However, the second parameter, a return type from the build() method, is unnecessary, as we can use covariant return types. This commit removes this second generic parameter.	2020-05-15 12:14:49 +01:00
Ryan Ernst	9fb80d3827	Move publishing configuration to a separate plugin (#56727 ) This is another part of the breakup of the massive BuildPlugin. This PR moves the code for configuring publications to a separate plugin. Most of the time these publications are jar files, but this also supports the zip publication we have for integ tests.	2020-05-14 20:23:07 -07:00
Armin Braun	14a042fbe5	Make No. of Transport Threads == Available CPUs (#56488 ) (#56780 ) We never do any file IO or other blocking work on the transport threads so no tangible benefit can be derived from using more threads than CPUs for IO. There are however significant downsides to using more threads than necessary with Netty in particular. Since we use the default setting for `io.netty.allocator.useCacheForAllThreads` which is `true` we end up using up to `16MB` of thread local buffer cache for each transport thread. Meaning we potentially waste CPUs * 16MB of heap for unnecessary IO threads in addition to obvious inefficiencies of artificially adding extra context switches.	2020-05-14 21:33:46 +02:00
Mark Tozzi	b718193a01	Clean up DocValuesIndexFieldData (#56372 ) (#56684 )	2020-05-14 12:42:37 -04:00
Julie Tibshirani	1ad83c37c4	Use index sort range query when possible. (#56710 ) This PR proposes to use `IndexSortSortedNumericDocValuesRangeQuery` when possible to speed up certain range queries. Points-based queries are already very efficient, the only time this query makes a difference is when the range matches a large number of documents. Relates to #48665.	2020-05-13 13:24:45 -07:00
Ignacio Vera	b4521d5183	upgrade to Lucene 8.6.0 snapshot (#56661 )	2020-05-13 14:25:16 +02:00
Jake Landis	a56fb6192e	[7.x] Fix ingest simulate verbose on failure with conditional (#56478) (#56635) If a conditional is added to a processor, and that processor fails, and that processor has an on_failure handler, the full trace of all of the executed processors may not be displayed in simulate verbose. The information is correct, but misses displaying some of the steps used to get there. This happens because a conditional processor is a wrapper around the real processor, and a processor with an on_failure handler is also a wrapper around the processor(s). When decorating for simulation we treat compound processors specially, but if a compound processor is wrapped by a conditional processor, that compound processor's processors can be missed for decoration, resulting in the missing displayed steps. The fix is to treat the conditional processor specially and explicitly separate it from the processor it is wrapping. This requires us to keep track of 2 processors: a possible conditional processor and the actual processor it may be wrapping. related: #56004	2020-05-12 15:41:05 -05:00
Armin Braun	b449661b8f	Remove Unused ByteBufStreamInput (#56567 ) (#56601 ) We're not using this one any more.	2020-05-12 16:04:58 +02:00
Tim Brooks	760ab726c2	Share netty event loops between transports (#56553) Currently Elasticsearch creates independent event loop groups for each transport type (http and internal). This is unnecessary and can lead to contention when different threads access shared resources (ex: allocators). This commit moves to a model where, by default, the event loops are shared between the transports. The previous behavior can be attained by specifically setting the http worker count.	2020-05-11 15:43:43 -06:00
Nik Everett	2f38aeb5e2	Save memory when numeric terms agg is not top (#55873) (#56454) Right now all implementations of the `terms` agg allocate a new `Aggregator` per bucket. This uses a bunch of memory. Exactly how much isn't clear but each `Aggregator` ends up making its own objects to read doc values which have non-trivial buffers. And it forces all of its sub-aggregations to do the same. We allocate a new `Aggregator` per bucket for two reasons: 1. We didn't have an appropriate data structure to track the sub-ordinals of each parent bucket. 2. You can only make a single call to `runDeferredCollections(long...)` per `Aggregator` which was the only way to delay collection of sub-aggregations. This change switches the method that builds aggregation results from building them one at a time to building all of the results for the entire aggregator at the same time. It also adds a fairly simplistic data structure to track the sub-ordinals for `long`-keyed buckets. It uses both of those to power numeric `terms` aggregations and removes the per-bucket allocation of their `Aggregator`. This fairly substantially reduces memory consumption of numeric `terms` aggregations that are not the "top level", especially when those aggregations contain many sub-aggregations. It also is a pretty big speed up, especially when the aggregation is under a non-selective aggregation like the `date_histogram`. I picked numeric `terms` aggregations because those have the simplest implementation. At least, I could kind of fit it in my head. And I haven't fully understood the "bytes"-based terms aggregations, but I imagine I'll be able to make similar optimizations to them in follow up changes.	2020-05-08 20:38:53 -04:00
Mark Vieira	0fb9bc5379	Always use archive base name as the pom artifact id (#56447 ) (#56467 )	2020-05-08 16:11:19 -07:00
Jason Tedor	33669c0420	Upgrade to Jackson 2.10.4 (#56188 ) Another Jackson release is available. There are some CVEs addressed, none of which impact us, but since we can now bump Jackson easily, let us move along with the train to avoid the false positives from security scanners.	2020-05-06 17:20:23 -04:00
Julie Tibshirani	e852bb29b7	Simplify signature of FieldMapper#parseCreateField. (#56144 ) `FieldMapper#parseCreateField` accepts the parse context, plus a list of fields as an output parameter. These fields are immediately added to the document through `ParseContext#doc()`. This commit simplifies the signature by removing the list of fields, and having the mappers add the fields directly to `ParseContext#doc()`. I think this is nicer for implementors, because previously fields could be added either through the list, or the context (through `add`, `addWithKey`, etc.)	2020-05-06 11:12:09 -07:00
Nhat Nguyen	c305cfbbb6	Fix CancelTests#testDeleteByQueryCancelWithWorkers (#56242 ) We need to relax the assertion as a TaskCancelledException can be suppressed instead. Closes #55647	2020-05-06 09:55:40 -04:00
Tim Brooks	6a51017cb2	Upgrade netty to 4.1.49.Final (#56059 )	2020-05-05 10:40:23 -06:00
Martijn van Groningen	2ac32db607	Move includeDataStream flag from IndicesOptions to IndexNameExpressionResolver.Context (#56151) Backport of #56034. Move includeDataStream flag from an IndicesOptions to IndexNameExpressionResolver.Context as a dedicated field that callers to IndexNameExpressionResolver can set. Also alter indices stats api to support data streams. The rollover api uses this api; otherwise rolling over a data stream no longer works. Relates to #53100	2020-05-04 22:38:33 +02:00
Armin Braun	75d4a4def4	Fix potential NPE in Netty4Transport.stopInternal (#56080) (#56129) Closes #56068	2020-05-04 19:38:21 +02:00
markharwood	e197b6c45b	Analysis enhancement - add preserve_original setting in ngram-token-filter (#55432 ) (#56100 ) Authored-by: Amit Khandelwal <amitmbm87@gmail.com>	2020-05-04 11:31:28 +01:00
Dan Hermann	2061652988	Ensure auto close of HTMLStripCharFilter in HtmlStripProcessor The HtmlStripProcessor did not use a try-with-resources block to ensure that the used HTMLStripCharFilter is closed.	2020-05-01 17:31:53 -05:00
Igor Motov	d8f9df771d	Expose agg usage in Feature Usage API (#55732 ) (#56048 ) Counts usage of the aggs and exposes them on the _nodes/usage/. Closes #53746	2020-04-30 12:53:36 -04:00
Przemko Robakowski	797f63e743	[7.x] Emit deprecation warning if multiple v1 templates match with a new index (#55558 ) (#56038 ) * Emit deprecation warning if multiple v1 templates match with a new index (#55558) * Emit deprecation warning if multiple v1 templates match with a new index * DEPRECATION_LOGGER rename	2020-04-30 17:36:17 +02:00
Przemko Robakowski	bf0204ba06	Fix empty_value handling in CsvProcessor (#55649) (#55968) * Fix empty_value handling in CsvProcessor Due to a bug in `CsvProcessor.Factory` it was impossible to specify `empty_value`. This change fixes that and adds a relevant test. Closes #55643 * assert changed	2020-04-29 22:37:22 +02:00
Amit Khandelwal	126e4acca8	Expose `preserve_original` in `edge_ngram` token filter (#55766 ) The Lucene `preserve_original` setting is currently not supported in the `edge_ngram` token filter. This change adds it with a default value of `false`. Closes #55767	2020-04-28 10:24:27 +02:00
Tim Brooks	80662f31a1	Introduce mechanism to stub request handling (#55832 ) Currently there is a clear mechanism to stub sending a request through the transport. However, this is limited to testing exceptions on the sender side. This commit reworks our transport related testing infrastructure to allow stubbing request handling on the receiving side.	2020-04-27 16:57:15 -06:00
Ryan Ernst	70b499b7aa	Simplify java home verification (#55635 ) * Simplify java home verification At one time, all uses of java home were found through the getJavaHome utility method on BuildPlugin. However, that was changed many refactorings ago, but the complex support for registering a java home version needed that fails at configuration time still exists. The only remaining use of grabbing java home is within bwc tests, and must be at runtime since that is when we have the checkout and know what version is needed. This commit consolidates the java home finding method into a utility unassociated with BuildPlugin. * fix checkstyle * address feedback	2020-04-27 12:43:32 -07:00
Jake Landis	7b4bacebb5	[7.x] fix the schema validation for scripts_painless_context (#55738 ) (#55751 )	2020-04-27 08:39:56 -05:00
Rory Hunter	d66af46724	Always use deprecateAndMaybeLog for deprecation warnings (#55319 ) Backport of #55115. Replace calls to deprecate(String,Object...) with deprecateAndMaybeLog(...), with an appropriate key, so that all messages can potentially be deduplicated.	2020-04-23 09:20:54 +01:00
Jake Landis	25ea6a74f0	[7.x] Validate REST specs against schema (#55117) (#55563) A JSON schema was recently introduced for the REST API specification (#54252). This PR introduces a 3rd party validation tool to ensure that the REST specification conforms to the schema. The task is applied to the 3 projects that contain REST API specifications. The plugin wires this task into the precommit task, and should be considered as part of the public API for the build tools for any plugin developer to contribute their plugin's specification. An ignore parameter has been introduced for the task to allow specific files to be ignored from the validation. The ignored files in this PR will soon get issues logged and a link so they can be fixed. Closes #54314	2020-04-22 14:14:03 -05:00
Tal Levy	0844455505	Add geo_shape mapper supporting doc-values in Spatial Plugin (#55037 ) (#55500 ) After #53562, the `geo_shape` field mapper is registered within a module. This opens the door for introducing a new `geo_shape` field mapper into the Spatial Plugin that has doc-values support. This is very much an extension of server's GeoShapeFieldMapper, but with the addition of the doc values implementation.	2020-04-22 08:12:54 -07:00
Jason Tedor	1553e7e620	Encapsulate systemd extender The systemd extender is a scheduled execution that ensures we repeatedly let systemd know during startup that we are still starting up. We cancel this scheduled execution once the node has successfully started up. This extender is wrapped in a set once, which we expose directly. This commit addresses this by putting the extender behind a getter, which hides the implementation detail that the extender is wrapped in a set once. This cleans up some issues in tests, ensuring that we are not making assertions about the set once, but instead about the extender.	2020-04-20 21:17:42 -04:00
Jason Tedor	80f18ad31a	Use set once for systemd extender (#55497 ) When Elasticsearch is starting up, we schedule a thread to repeatedly let systemd know that we are still in the process of starting up. Today we use a non-final field for this. This commit changes this to be a set once so we can mark the field as final, and get stronger guarantees when reasoning about the state of execution here.	2020-04-20 21:15:04 -04:00
Zachary Tong	f46b567563	Convert InternalAggTestCase to AbstractNamedWriteableTestCase (#55250 ) Some aggregations, such as the Terms* family, will use an alternate class to represent unmapped shard results (while the rest of the aggs use the same object but with some form of "empty" or "nullish" values to represent unmapped). This was problematic with AbstractWireSerializingTestCase because it expects the instanceReader to always match the original class. Instead, we need to use the NamedWriteable version so that the registry can be consulted for the proper deserialization reader.	2020-04-17 16:39:38 -04:00
Martijn van Groningen	417d5f2009	Make data streams in APIs resolvable. (#55337) Backport from: #54726 The INCLUDE_DATA_STREAMS indices option controls whether data streams can be resolved in an api for both concrete names and wildcard expressions. If data streams cannot be resolved then a 400 error is returned indicating that data streams cannot be used. In this pr, the INCLUDE_DATA_STREAMS indices option is enabled in the following APIs: search, msearch, refresh, index (op_type create only) and bulk (index requests with op type create only). In a subsequent change, we will determine which other APIs need to be able to resolve data streams and enable the INCLUDE_DATA_STREAMS indices option for these APIs. Whether an api resolves all backing indices of a data stream or only the latest index (the write index) depends on IndexNameExpressionResolver.Context.isResolveToWriteIndex(). If isResolveToWriteIndex() returns true then data streams resolve to the latest index (for example: index api) and otherwise a data stream resolves to all backing indices of a data stream (for example: search api). Relates to #53100	2020-04-17 08:33:37 +02:00
Mark Tozzi	22c55180c1	[7.x] Backport ValuesSourceRegistry and related work (#54922 ) * Add ValuesSource Registry and associated logic (#54281) * Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (#48638) * ValuesSourceRegistry Prototype (#48758) * Remove generics from ValuesSource related classes (#49606) * fix percentile aggregation tests (#50712) * Basic thread safety for ValuesSourceRegistry (#50340) * Remove target value type from ValuesSourceAggregationBuilder (#49943) * Cleanup default values source type (#50992) * CoreValuesSourceType no longer implements Writable (#51276) * Remove genereics & hard coded ValuesSource references from Matrix Stats (#51131) * Put values source types on fields (#51503) * Remove VST Any (#51539) * Rewire terms agg to use new VS registry (#51182) Also adds some basic AggTestCases for untested code paths (and boilerplate for future tests once the IT are converted over) * Wire Cardinality aggregation to work with the ValuesSourceRegistry (#51337) * Wire Percentiles aggregator into new VS framework (#51639) This required a bit of a refactor to percentiles itself. Before, the Builder would switch on the chosen algo to generate an algo-specific factory. This doesn't work (or at least, would be difficult) in the new VS framework. This refactor consolidates both factories together and introduces a PercentilesConfig object to act as a standardized way to pass algo-specific parameters through the factory. This object is then used when deciding which kind of aggregator to create Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will be moved in a subsequent PR. 
* Remove generics and target value type from MultiVSAB (#51647) * fix checkstyle after merge (#52008) * Plumb ValuesSourceRegistry through to QuerySearchContext (#51710) * Convert RareTerms to new VS registry (#52166) * Wire up Value Count (#52225) * Wire up Max & Min aggregations (#52219) * ValuesSource refactoring: Wire up Sum aggregation (#52571) * ValuesSource refactoring: Wire up SigTerms aggregation (#52590) * Soft immutability for VSConfig (#52729) * Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (#52734) Also fixes Percentiles which was incorrectly specified to only accept numeric, but in fact also accepts Boolean and Date (because those are numeric on master - thanks `testSupportedFieldTypes` for catching it!) * VS refactoring: Wire up stats aggregation (#52891) * ValuesSource refactoring: Wire up string_stats aggregation (#52875) * VS refactoring: Wire up median (MAD) aggregation (#52945) * fix valuesourcetype issue with constant_keyword field (#53041)x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupIndexer.java this commit implements `getValuesSourceType` for the ConstantKeyword field type. master was merged into feature/extensible-values-source introducing a new field type that was not implementing `getValuesSourceType`. * ValuesSource refactoring: Wire up Avg aggregation (#52752) * Wire PercentileRanks aggregator into new VS framework (#51693) * Add a VSConfig resolver for aggregations not using the registry (#53038) * Vs refactor wire up ranges and date ranges (#52918) * Wire up geo_bounds aggregation to ValuesSourceRegistry (#53034) This commit updates the geo_bounds aggregation to depend on registering itself in the ValuesSourceRegistry relates #42949. 
* VS refactoring: convert Boxplot to new registry (#53132) * Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (#53037) This commit updates the geo_grid aggregations to depend on registering itself in the ValuesSourceRegistry relates to the values-source refactoring meta issue #42949. Wire-up geo_centroid agg to ValuesSourceRegistry (#53040) This commit updates the geo_centroid aggregation to depend on registering itself in the ValuesSourceRegistry. relates to the values-source refactoring meta issue #42949. * Fix type tests for Missing aggregation (#53501) * ValuesSource Refactor: move histo VSType into XPack module (#53298) - Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core. - This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept - Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs * Wire up DateHistogram to the ValuesSourceRegistry (#53484) * Vs refactor parser cleanup (#53198) Co-authored-by: Zachary Tong <polyfractal@elastic.co> Co-authored-by: Zachary Tong <zach@elastic.co> Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com> Co-authored-by: Tal Levy <JubBoy333@gmail.com> * First batch of easy fixes * Remove List.of from ValuesSourceRegistry Note that we intend to have a follow up PR dealing with the mutability of the registry, so I didn't even try to address that here. 
* More compiler fixes * More compiler fixes * More compiler fixes * Precommit is happy and so am I * Add new Core VSTs to tests * Disabled supported type test on SigTerms until we can backport its fix * fix checkstyle * Fix test failure from semantic merge issue * Fix some metaData->metadata replacements that got lost * Fix list of supported types for MinAggregator * Fix list of supported types for Avg * remove unused import Co-authored-by: Zachary Tong <polyfractal@elastic.co> Co-authored-by: Zachary Tong <zach@elastic.co> Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com> Co-authored-by: Tal Levy <JubBoy333@gmail.com>	2020-04-16 16:54:46 -04:00
David Turner	7941f4a47e	Add RepositoriesService to createComponents() args (#54814 ) Today we pass the `RepositoriesService` to the searchable snapshots plugin during the initialization of the `RepositoryModule`, forcing the plugin to be a `RepositoryPlugin` even though it does not implement any repositories. After discussion we decided it best for now to pass this in via `Plugin#createComponents` instead, pending some future work in which plugins can depend on services more dynamically.	2020-04-16 16:27:36 +01:00
William Brafford	2ba3be9db6	Remove deprecated third-party methods from tests (#55255 ) (#55269 ) I've noticed that a lot of our tests are using deprecated static methods from the Hamcrest matchers. While this is not a big deal in any objective sense, it seems like a small good thing to reduce compilation warnings and be ready for a new release of the matcher library if we need to upgrade. I've also switched a few other methods in tests that have drop-in replacements.	2020-04-15 17:54:47 -04:00
Ignacio Vera	a677b63daa	Upgrade to lucene 8.5.1 release (#55229 ) (#55235 ) Upgrade to lucene 8.5.1 release that contains a bug fix for a bug that might introduce index corruption when deleting data from an index that was previously shrunk.	2020-04-15 17:35:42 +02:00
Mark Vieira	ce85063653	[7.x] Re-add origin url information to publish POM files (#55173 )	2020-04-14 13:24:15 -07:00
William Brafford	52bebec51f	NodeInfo response should use a collection rather than fields (#54460 ) (#55132 ) This is a first cut at giving NodeInfo the ability to carry a flexible list of heterogeneous info responses. The trick is to be able to serialize and deserialize an arbitrary list of blocks of information. It is convenient to be able to deserialize into usable Java objects so that we can aggregate nodes stats for the cluster stats endpoint. In order to provide a little bit of clarity about which objects can and can't be used as info blocks, I've introduced a new interface called "ReportingService." I have removed the hard-coded getters (e.g., getOs()) in favor of a flexible method that can return heterogeneous kinds of info blocks (e.g., getInfo(OsInfo.class)). Taking a class as an argument removes the need to cast in the client code.	2020-04-13 17:18:39 -04:00
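The class-keyed lookup described in the NodeInfo commit above can be illustrated with a minimal Python sketch. This is not the Elasticsearch implementation: the `NodeInfo`, `OsInfo`, and `JvmInfo` classes below and the `get_info` method are hypothetical stand-ins, assumed only to show how passing a class as the argument lets the caller get back a correctly typed block without casting.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Type, TypeVar

T = TypeVar("T")


@dataclass
class OsInfo:
    """Hypothetical info block describing the operating system."""
    name: str


@dataclass
class JvmInfo:
    """Hypothetical info block describing the JVM."""
    version: str


class NodeInfo:
    """Hypothetical container holding heterogeneous info blocks by class."""

    def __init__(self, *blocks: object) -> None:
        # One block per concrete class; the class itself is the lookup key.
        self._blocks: Dict[type, object] = {type(b): b for b in blocks}

    def get_info(self, cls: Type[T]) -> Optional[T]:
        # The caller names the class it wants, so no cast is needed
        # at the call site; a missing block yields None.
        return self._blocks.get(cls)  # type: ignore[return-value]


info = NodeInfo(OsInfo("Linux"), JvmInfo("11"))
os_info = info.get_info(OsInfo)
```

Adding a new kind of info block in this sketch needs no new getter on the container, which is the flexibility the commit message is after.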
Jake Landis	a2fafa6af4	[7.x] Lazy test cluster module and plugins (#54852 ) (#55087 ) This change converts the module and plugin parameters for testClusters to be lazy, meaning that the values are not resolved until they are actually used. This removes the requirement to use project.afterEvaluate to be able to resolve the bundle artifact. Note - this does not completely remove the need for afterEvaluate since it is still needed for the custom resource extension.	2020-04-13 10:53:35 -05:00
Jason Tedor	9eeae59a83	Clarify available processors (#54907 ) The use of available processors, the terminology, and the settings around it have evolved over time. This commit cleans up some places in the code and in the docs to adjust to the current terminology.	2020-04-10 08:48:27 -04:00
Mark Vieira	dd73a14d11	Improve total build configuration time (#54611 ) (#54994 ) This commit includes a number of changes to reduce overall build configuration time. These optimizations include: - Removing the usage of the 'nebula.info-scm' plugin. This plugin leverages jgit to read various pieces of VCS information. This is mostly overkill and we have our own minimal implementation for determining the current commit id. - Removing unnecessary build dependencies such as perforce and jgit now that we don't need them. This reduces our classpath considerably. - Expanding the usage of lazy task creation, particularly in our distribution projects. The archives and packages projects create lots of tasks with very complex configuration. Avoiding the creation of these tasks at configuration time gives us a nice boost.	2020-04-08 16:47:02 -07:00
Jay Modi	3600c9862f	Reintroduce system index APIs for Kibana (#54935 ) This change reintroduces the system index APIs for Kibana without the changes made for marking what system indices could be accessed using these APIs. In essence, this is a partial revert of #53912. The changes for marking what system indices should be allowed access will be handled in a separate change. The APIs introduced here are wrapped versions of the existing REST endpoints. A new setting is also introduced since the Kibana system indices' names are allowed to be changed by a user in case multiple instances of Kibana use the same instance of Elasticsearch. Relates #52385 Backport of #54858	2020-04-08 09:08:49 -06:00
Tal Levy	254d1e3543	[7.x] Create new `geo` module and migrate geo_shape registration (#53562 ) (#54924 ) This commit introduces a new `geo` module that is intended to contain all the geo-spatial-specific features in server. As a first step, the responsibility of registering the geo_shape field mapper is moved to this module. Co-authored-by: Nicholas Knize <nknize@gmail.com>	2020-04-07 16:30:58 -07:00
Tim Brooks	619028c33e	Implement transport circuit breaking in aggregator (#54927 ) This commit moves the action name validation and circuit breaking into the InboundAggregator. This work is valuable because it lays the groundwork for incrementally circuit breaking as data is received. This PR includes the following behavioral change: Handshakes contribute to circuit breaking, but cannot be broken. They currently do not contribute nor are they broken.	2020-04-07 17:10:31 -06:00
Nik Everett	ce7ae4a7d1	Remove pipeline aggs from agg result tree (backport of #54716 ) (#54920 ) This removes pipeline aggregators from the aggregation result tree except for a single field used for backwards compatibility with pre-7.8 versions of Elasticsearch. That field isn't populated unless we are serializing to pre-7.8 Elasticsearch. So, good news! We no longer build pipeline aggregators on the data node. Most of the time.	2020-04-07 17:22:23 -04:00
Tim Brooks	9cf2406cf1	Move network stats marking into InboundPipeline (#54908 ) This is a follow-up to #48263. It moves the inbound stats tracking inside of the InboundPipeline.	2020-04-07 13:34:05 -06:00
Christoph Büscher	8c9ac14a98	Rename field name constants in AbstractBuilderTestCase (#53234 ) Some field name constants were not updated when we moved from "string" to "text" and "keyword" fields. Renaming them makes it easier and faster to know which field type is used in tests subclassing this base test case.	2020-04-03 17:28:22 +02:00
Nik Everett	54ea4f4f50	Begin to drop pipeline aggs from the result tree (backport of #54311 ) (#54659 ) Removes pipeline aggregations from the aggregation result tree as they are no longer used. This stops us from building the pipeline aggregators at all on data nodes except for backwards compatibility serialization. This will save a tiny bit of space in the aggregation tree which is lovely, but the biggest benefit is that it is a step towards simplifying pipeline aggregators. This only does about half of the work to remove the pipeline aggs from the tree. Removing all of it would, well, double the size of the change and make it harder to review.	2020-04-02 16:45:12 -04:00
William Brafford	958e9d1b78	Refactor nodes stats request builders to match requests (#54363 ) (#54604 ) * Refactor nodes stats request builders to match requests (#54363) * Remove hard-coded setters from NodesInfoRequestBuilder * Remove hard-coded setters from NodesStatsRequest * Use static imports to reduce clutter * Remove uses of old info APIs	2020-04-01 17:03:04 -04:00
Mayya Sharipova	bf4857d9e0	Search hit refactoring (#41656 ) (#54584 ) Refactor SearchHit to have separate document and meta fields. This is a part of bigger refactoring of issue #24422 to remove dependency on MapperService to check if a field is metafield. Relates to PR: #38373 Relates to issue #24422 Co-authored-by: sandmannn <bohdanpukalskyi@gmail.com>	2020-04-01 15:19:00 -04:00
Jason Tedor	5fcda57b37	Rename MetaData to Metadata in all of the places (#54519 ) This is a simple naming change PR, to fix the fact that "metadata" is a single English word, and for too long we have not followed general naming conventions for it. We are also not consistent about it, for example, METADATA instead of META_DATA if we were trying to be consistent with MetaData (although METADATA is correct when considered in the context of "metadata"). This was a simple find and replace across the code base, only taking a few minutes to fix this naming issue forever.	2020-03-31 17:24:38 -04:00
Zachary Tong	c9db2de41d	[7.x] Comprehensively test supported/unsupported field type:agg combinations (#54451 ) * Comprehensively test supported/unsupported field type:agg combinations (#52493) This adds a test to AggregatorTestCase that allows us to programmatically verify that an aggregator supports or does not support a particular field type. It fetches the list of registered field type parsers, creates a MappedFieldType from the parser and then attempts to run a basic agg against the field. A supplied list of supported VSTypes is then compared against the output (success or exception) and succeeds or fails the test accordingly. Co-Authored-By: Mark Tozzi <mark.tozzi@gmail.com> * Skip fields that are not aggregatable * Use newIndexSearcher() to avoid incompatible readers (#52723) Lucene's `newSearcher()` can generate readers like ParallelCompositeReader which we can't use. We need to instead use our helper `newIndexSearcher`	2020-03-31 14:35:03 -04:00
Alan Woodward	25a0addb17	Don't double-wrap values (#54432 ) After commit #53661 converted the lang-expressions module to using DoubleValuesSource, we've seen a performance regression for expressions that use geopoints. Some investigation suggests that this may be due to GeoLatitudeValueSource and GeoLongitudeValueSource wrapping their per-document values in a DoubleValues.withDefault() class. Values exposed via expressions already have a '0' default value, so this extra wrapping is unnecessary, and is directly on the hot path. This commit removes the extra wrapping.	2020-03-31 10:50:13 +01:00
Nik Everett	e58ad9fed3	Clean up how pipeline aggs check for multi-bucket (backport of #54161 ) (#54379 ) Pipeline aggregations like `stats_bucket`, `sum_bucket`, and `percentiles_bucket` only operate on aggregations that produce multiple buckets. This adds support for those aggregations to `geo_distance`, `ip_range`, `auto_date_histogram`, and `rare_terms`. This all happened because we used a marker interface to mark compatible aggs, `MultiBucketAggregationBuilder`, and it was fairly easy to forget to implement the interface. This replaces the marker interface with an abstract method in `AggregationBuilder`, `bucketCardinality`, which makes you return `NONE`, `ONE`, or `MANY`. The `bucket` aggregations can check for `MANY`. At this point `ONE` and `NONE` amount to about the same thing, but I suspect that'll be a useful distinction when validating bucket sorts. Closes #53215	2020-03-30 10:44:55 -04:00
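The design point in the commit above, replacing an optional marker interface with an abstract method that every builder must answer, can be sketched in Python. This is an illustrative sketch only: the class and function names below are hypothetical Python analogues of `AggregationBuilder` and `bucketCardinality`, not the Elasticsearch code.

```python
from abc import ABC, abstractmethod
from enum import Enum


class BucketCardinality(Enum):
    NONE = "none"   # metric aggs: no buckets at all
    ONE = "one"     # single-bucket aggs such as a filter
    MANY = "many"   # multi-bucket aggs such as terms or histograms


class AggregationBuilder(ABC):
    """Hypothetical base class: every subclass must declare its cardinality."""

    @abstractmethod
    def bucket_cardinality(self) -> BucketCardinality:
        ...


class TermsBuilder(AggregationBuilder):
    def bucket_cardinality(self) -> BucketCardinality:
        return BucketCardinality.MANY


class MaxBuilder(AggregationBuilder):
    def bucket_cardinality(self) -> BucketCardinality:
        return BucketCardinality.NONE


def validate_bucket_pipeline_parent(parent: AggregationBuilder) -> None:
    """Reject parents that cannot feed a *_bucket pipeline aggregation."""
    if parent.bucket_cardinality() is not BucketCardinality.MANY:
        raise ValueError("pipeline aggregation requires a multi-bucket parent")


class ForgetfulBuilder(AggregationBuilder):
    """Omits bucket_cardinality, so it cannot even be instantiated."""
```

With a marker interface, forgetting to opt in fails silently; with the abstract method, a builder that omits the answer (like `ForgetfulBuilder` here) is rejected as soon as it is constructed.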
Stuart Tettemer	30c56087fd	Docs: Use splitOnToken instead of custom function (#48408 ) (#54364 ) Painless ingest example uses a custom split function but new splitOnToken function was added in 7.2 Backport of: 0c52a92	2020-03-27 15:04:27 -06:00
Tim Brooks	2ccddbfa88	Move transport decoding and aggregation to server (#54360 ) Currently all of our transport protocol decoding and aggregation occurs in the individual transport modules. This means that each implementation (test, netty, nio) must implement this logic. Additionally, it means that the entire message has been read from the network before the server package receives it. This commit creates a pipeline in server which can be passed arbitrary bytes to handle. Internally, the pipeline will decode, decompress, and aggregate the messages. Additionally, this allows us to run many megabytes of bytes through the pipeline in tests to ensure that the logic works. This work will enable future work: Circuit breaking or backoff logic based on message type and byte in the content aggregator. Sharing bytes with the application layer using the ref counted releasable network bytes. Improved network monitoring based specifically on channels. Finally, this fixes the bug where we do not circuit break on the correct message size when compression is enabled.	2020-03-27 14:13:10 -06:00
Tim Brooks	f5b4020819	Remove netty BytesReference implementations (#54355 ) Elasticsearch has a number of different BytesReference implementations. These implementations can all implement the interface in different ways with subtly different behavior and performance characteristics. On the other hand, the JVM only represents bytes as an array or a direct byte buffer. This commit deletes the specialized Netty implementations and moves to using a generic ByteBuffer reference type. This will allow us to focus on standardizing performance and behavior around a smaller number of implementations that can be used by all components in Elasticsearch.	2020-03-27 11:01:33 -06:00
Henning Andersen	7ce7aff66e	Reindex negative TimeValue fix (#54057 ) Reindex would use timeValueNanos(System.nanoTime()). The intended use for TimeValue is as a duration, not as absolute time. In particular, this could result in negative TimeValue's, being unsupported in #53913. Modified to use the bare long nano-second value.	2020-03-24 22:29:09 +01:00
Alan Woodward	39d7d0dc10	Upgrade to lucene 8.5.0 release (#54077 ) Upgrades our lucene dependency to the released 8.5.0 version.	2020-03-24 13:45:50 +00:00
Ryan Ernst	960d1fb578	Revert "Introduce system index APIs for Kibana (#53035 )" (#53992 ) This reverts commit `c610e0893d`. backport of #53912	2020-03-23 10:29:35 -07:00
Alan Woodward	0c010e1bfc	lang-expressions should use DoubleValuesSource, not ValueSource (#53661 ) DoubleValuesSource is the type-safe replacement for ValueSource in the lucene core. Most of elasticsearch has moved to use these, but lang-expressions is still using the old version. This commit migrates lang-expressions as well.	2020-03-23 14:48:38 +00:00
Alan Woodward	d23112f441	Report parser name and location in XContent deprecation warnings (#53805 ) It's simple to deprecate a field used in an ObjectParser just by adding deprecation markers to the relevant ParseField objects. The warnings themselves don't currently have any context - they simply say that a deprecated field has been used, but not where in the input xcontent it appears. This commit adds the parent object parser name and XContentLocation to these deprecation messages. Note that the context is automatically stripped from warning messages when they are asserted on by integration tests and REST tests, because randomization of xcontent type during these tests means that the XContentLocation is not constant	2020-03-20 11:52:55 +00:00
Jake Landis	db3420d757	[7.x] Optimize which Rest resources are used by the Rest tests… (#53766 ) This should help with Gradle's incremental compile such that projects only depend upon the resources they use. related #52114	2020-03-19 12:28:59 -05:00
Jim Ferenczi	8e17322b3a	Shortcut query phase using the results of other shards (#51852 ) (#53659 ) This commit, built on top of #51708, allows modifying shard search requests based on information collected on other shards. It is intended to speed up sorted queries on time-based indices. For queries that are only interested in the top documents, this change will rewrite the shard queries to match none if the bottom sort value computed in prior shards is better than all values in the shard. For queries that mix top documents and aggregations, this change will reset the size of the top documents to 0 instead of rewriting to match none. This means that we don't need to keep a search context open for this shard since we know in advance that it doesn't contain any competitive hit.	2020-03-18 17:20:35 +01:00
Alan Woodward	d325899c54	Use QueryVisitor when extracting PercolatorQuery list for highlighting (#53728 ) The highlighting phase for percolator queries currently uses some custom query traversal logic to find all instances of PercolatorQuery in the query tree for the current search context. This commit converts things to instead use a QueryVisitor, which future-proofs us against new wrapper queries or queries from custom plugins that the percolator module doesn't know about.	2020-03-18 15:24:49 +00:00
Dan Hermann	94ac979c66	Support array for all string ingest processors (#53694 )	2020-03-18 07:07:49 -05:00
Ryan Ernst	5c472fcb47	Upgrade jackson to 2.10.3 and GeoIP to 2.13.1 (#53642 ) Re-applies the change from #53523 along with test fixes. closes #53626 closes #53624 closes #53622 closes #53625 Co-authored-by: Nik Everett <nik9000@gmail.com> Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com> Co-authored-by: Jake Landis <jake.landis@elastic.co>	2020-03-17 10:28:51 -07:00
Alan Woodward	71b703edd1	Rename AtomicFieldData to LeafFieldData (#53554 ) This conforms with lucene's LeafReader naming convention, and matches other per-segment structures in elasticsearch.	2020-03-17 12:30:12 +00:00
Nik Everett	f0beab4041	Stop using round-tripped PipelineAggregators (backport of #53423 ) (#53629 ) This begins to clean up how `PipelineAggregator`s are executed. Previously, we would create the `PipelineAggregator`s on the data nodes and embed them in the aggregation tree. When it came time to execute the pipeline aggregation we'd use the `PipelineAggregator`s that were on the first shard's results. This is inefficient because: 1. The data node needs to make the `PipelineAggregator` only to serialize it and then throw it away. 2. The coordinating node needs to deserialize all of the `PipelineAggregator`s even though it only needs one of them. 3. You end up with many `PipelineAggregator` instances when you only really need one per pipeline. 4. `PipelineAggregator` needs to implement serialization. This begins to undo these by building the `PipelineAggregator`s directly on the coordinating node and using those instead of the `PipelineAggregator`s in the aggregation tree. In a follow up change we'll stop serializing the `PipelineAggregator`s to node versions that support this behavior. And, one day, we'll be able to remove `PipelineAggregator` from the aggregation result tree entirely. Importantly, this doesn't change how pipeline aggregations are declared or parsed or requested. They are still part of the `AggregationBuilder` tree because that makes sense.	2020-03-16 16:15:23 -04:00
bellengao	e2effa9fab	Fix inaccurate total hit count in _search template api (#53155 ) When 'rest_track_total_hits_as_int' is set to true, the total hits count in the response should be accurate. So we should set trackTotalHits to true if needed when parsing the inline script of a search template request. Closes #52801	2020-03-16 11:48:43 +01:00
Mark Vieira	2f0aca992b	Revert "Upgrade to Jackson 2.10.3 and GeoIP2 to 2.13.1 (#53576 )" This reverts commit `b7dbadeea0`.	2020-03-15 18:10:40 -07:00
Jason Tedor	b7dbadeea0	Upgrade to Jackson 2.10.3 and GeoIP2 to 2.13.1 (#53576 ) This commit upgrades our Jackson dependency to 2.10.3 and our GeoIP2 dependency to 2.13.1. Relates #53523	2020-03-14 13:28:06 -04:00
Jason Tedor	32dd852210	Update jackson-databind to 2.8.11.6 (#53522 ) This commit upgrades the jackson-databind dependency to 2.8.11.6. Additionally, we revert a previous change that put ingest-geoip on the version of jackson-databind from the version properties file. This is because upgrading ingest-geoip to a later version of jackson-databind also requires an upgrade to the geoip2 dependency which is currently blocked. Therefore, if we can get to a point where we otherwise upgrade our Jackson dependencies, we do not want ingest-geoip to automatically come along with it.	2020-03-12 20:15:13 -04:00
Alan Woodward	5c861cfe6e	Upgrade to final lucene 8.5.0 snapshot (#53293 ) Lucene 8.5.0 release candidates are imminent. This commit upgrades master to use the latest snapshot to check that there are no last-minute bugs or regressions.	2020-03-10 09:32:59 +00:00
Nhat Nguyen	5476a49833	Revert "upgrade to lucene-snapshot-fa75139efea (#53150 ) (#53151 )" This reverts commit `058113aa42`.	2020-03-05 17:33:00 -05:00
Ignacio Vera	058113aa42	upgrade to lucene-snapshot-fa75139efea (#53150 ) (#53151 )	2020-03-05 10:04:05 +01:00
Jay Modi	c610e0893d	Introduce system index APIs for Kibana (#53035 ) This commit introduces a module for Kibana that exposes REST APIs that will be used by Kibana for access to its system indices. These APIs are wrapped versions of the existing REST endpoints. A new setting is also introduced since the Kibana system indices' names are allowed to be changed by a user in case multiple instances of Kibana use the same instance of Elasticsearch. Additionally, the ThreadContext has been extended to indicate that the use of system indices may be allowed in a request. This will be built upon in the future for the protection of system indices. Backport of #52385	2020-03-03 14:11:36 -07:00
Alan Woodward	3759063d34	Allow specifying an exclusive set of fields on ObjectParser (#52893 ) ObjectParser allows you to declare a set of required fields, such that at least one of the set must appear in an xcontent object for it to be valid. This commit adds the similar concept of a set of exclusive fields, such that at most one of the set must be present. It also enables required fields on ConstructingObjectParser, and re-implements PercolateQueryBuilder.fromXContent() to use object parsing as an example of how this works.	2020-03-03 10:56:20 +00:00
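The two validation rules named in the ObjectParser commit above ("at least one of" for required sets, "at most one of" for exclusive sets) can be illustrated with a small Python sketch. It is only an assumption-laden analogue: `validate_fields` and the field names `document`, `documents`, and `id` are hypothetical and do not reproduce the ObjectParser API.

```python
from typing import Iterable, Sequence


def validate_fields(
    present: Iterable[str],
    required_sets: Sequence[Sequence[str]] = (),
    exclusive_sets: Sequence[Sequence[str]] = (),
) -> None:
    """Check the fields seen in a parsed object against both kinds of sets."""
    seen = set(present)
    for group in required_sets:
        # Required: at least one field of the group must appear.
        if not seen & set(group):
            raise ValueError(f"one of {sorted(group)} is required")
    for group in exclusive_sets:
        # Exclusive: at most one field of the group may appear.
        clash = sorted(seen & set(group))
        if len(clash) > 1:
            raise ValueError(f"fields {clash} cannot be used together")


# Hypothetical field names, used for illustration only.
GROUP = ["document", "documents", "id"]
validate_fields(["document"], required_sets=[GROUP], exclusive_sets=[GROUP])
```

Declaring the same group as both required and exclusive, as in the usage line, yields an "exactly one of" constraint.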
Nhat Nguyen	e6755afeeb	Upgrade to Lucene 8.5.0-snapshot-c4475920b08 (#52950 ) (#52977 ) To give LUCENE-9228 more CI cycles	2020-02-29 09:29:16 -05:00
Nik Everett	1d1956ee93	Add size support to `top_metrics` (backport of #52662 ) (#52914 ) This adds support for returning the top "n" metrics instead of just the very top. Relates to #51813	2020-02-27 16:12:52 -05:00
Josh Devins	68ba571f70	Adds recall@k metric to rank eval API (#52889 ) This change adds the recall@k metric and refactors precision@k to match the new metric. Recall@k is an important metric to use for learning to rank (LTR) use-cases. Candidate generation or first ranking phase ranking functions are often optimized for high recall, in order to generate as many relevant candidates in the top-k as possible for a second phase of ranking. Adding this metric allows tuning that base query for LTR. See: https://github.com/elastic/elasticsearch/issues/51676 Backports: https://github.com/elastic/elasticsearch/pull/52577	2020-02-27 16:04:24 +01:00
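For readers unfamiliar with the two metrics named in the commit above, here is a minimal Python sketch of the textbook definitions of precision@k and recall@k. It assumes binary relevance and a known set of relevant documents; the function names are hypothetical, and the rank eval API's own handling of unrated documents and rating thresholds is not modeled here.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    top = ranked[:k]
    if not top:
        return 0.0
    hits = sum(1 for doc in top if doc in relevant)
    return hits / len(top)


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


ranked_ids = ["a", "b", "c", "d"]   # hypothetical ranking returned by a query
relevant_ids = {"a", "c", "e"}      # hypothetical ground-truth relevant set
```

The two metrics share the same numerator (relevant hits in the top k) and differ only in the denominator, which is why a first-phase query tuned for recall tends to use a larger k than one tuned for precision.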
Dan Hermann	3c8b46a8c1	[7.x] Handle errors when evaluating if conditions in processors (#52892 )	2020-02-27 09:00:51 -06:00
Adrien Grand	1807f86751	Generalize how queries on `_index` are handled at rewrite time (#52815 ) Generalize how queries on `_index` are handled at rewrite time (#52486) Since this change refactors rewrites, I also took it as an opportunity to address #49254: instead of returning the same queries you would get on a keyword field when a field is unmapped, queries get rewritten to a MatchNoDocsQueryBuilder. This change exposed a couple bugs, like the fact that the percolator doesn't rewrite queries at query time, or that the significant_terms aggregation doesn't rewrite its inner filter, which I fixed. Closes #49254	2020-02-26 15:37:43 +01:00
Alan Woodward	a76ec765e5	Ensure that percolator sorting also works (#52758 ) Commit #52748 fixed a bug where percolate queries wrapped in a constant score could report incorrect matches. This commit adds a test to check that it also fixes the case where a percolate query is sorted by something other than score. Closes #52618	2020-02-26 10:52:04 +00:00
Alan Woodward	638f3e4183	Use ByteBuffersDirectory rather than RAMDirectory (#52768 ) Lucene's RAMDirectory has been deprecated. This commit replaces all uses of RAMDirectory in elasticsearch with the newer ByteBuffersDirectory. Most uses are in tests, but the percolator and painless executor may get some small speedups.	2020-02-25 15:46:35 +00:00
Henning Andersen	3ad1783a41	Delete by query test on low free disk block (#52759 ) The block setup by the test could be released by the nodes cluster info thread before the disk threshold decider was disabled, now disable decider first.	2020-02-25 15:54:30 +01:00
Alan Woodward	18663b0a85	Don't index ranges including NOW in percolator (#52748 ) Currently, date ranges queries using NOW-based date math are rewritten to MatchAllDocs queries when being preprocessed for the percolator. However, since we added the verification step, this can result in incorrect matches when percolator queries are run without scores. This commit changes things to instead wrap date queries that use NOW with a new DateRangeIncludingNowQuery. This is a simple wrapper query that returns its delegate at rewrite time, but it can be detected by the percolator QueryAnalyzer and be dealt with accordingly. This also allows us to remove a method on QueryRewriteContext, and push all logic relating to NOW-based ranges into the DateFieldMapper. Fixes #52617	2020-02-25 12:18:16 +00:00
Mayya Sharipova	034b1c0ba3	Correct boost calculation in script_score query (#52478 ) (#52724 ) Previously, the boost in a script_score query was wrongly applied only to the subquery. This commit makes sure that the boost is applied to the whole score that comes out of the script. Closes #48465	2020-02-24 13:48:21 -05:00
bellengao	02cb5b6c0e	Return 429 status code on read_only_allow_delete index block (#50166 ) We consider index level read_only_allow_delete blocks temporary since the DiskThresholdMonitor can automatically release those when an index is no longer allocated on nodes above high threshold. The rest status has therefore been changed to 429 when encountering this index block to signal retryability to clients. Related to #49393	2020-02-22 16:24:25 +01:00
Jay Modi	f3f6ff97ee	Single instance of the IndexNameExpressionResolver (#52604 ) This commit modifies the codebase so that our production code uses a single instance of the IndexNameExpressionResolver class. This change is being made in preparation for allowing name expression resolution to be augmented by a plugin. In order to remove some instances of IndexNameExpressionResolver, the single instance is added as a parameter of Plugin#createComponents and PersistentTaskPlugin#getPersistentTasksExecutor. Backport of #52596	2020-02-21 07:50:02 -07:00
markharwood	96d603979b	Upgrade Lucene to 8.5.0-snapshot-b01d7cb (#52584 ) Upgrading 7x to same Lucene 8.5 version used in master	2020-02-21 10:25:03 +00:00
Rory Hunter	8fb9bed078	Fix compilation error	2020-02-20 14:39:44 +00:00
Maria Ralli	ba8d6d1fb5	Remove Xlint exclusions from gradle files Backport of #52542. This commit is part of issue #40366 to remove disabled Xlint warnings from gradle files. In particular, it removes the Xlint exclusions from the following files: - benchmarks/build.gradle - client/client-benchmark-noop-api-plugin/build.gradle - x-pack/qa/rolling-upgrade/build.gradle - x-pack/qa/third-party/active-directory/build.gradle - modules/transport-netty4/build.gradle For the first three files no code adjustments were needed. For x-pack/qa/third-party/active-directory move the suppression at the code level. For transport-netty4 replace the variable arguments with ArrayLists and remove any redundant casts.	2020-02-20 14:12:05 +00:00
Tim Brooks	e752221fc6	Upgrade netty to 4.1.45.Final (#51689 ) Upgrade netty.	2020-02-18 09:11:29 -07:00
Nik Everett	146def8caa	Implement top_metrics agg (#51155 ) (#52366 ) The `top_metrics` agg is kind of like `top_hits` but it only works on doc values so it should be faster. At this point it is fairly limited in that it only supports a single, numeric sort and a single, numeric metric. And it only fetches the "very topest" document worth of metric. We plan to support returning a configurable number of top metrics, requesting more than one metric and more than one sort. And, eventually, non-numeric sorts and metrics. The trick is doing those things fairly efficiently. Co-Authored by: Zachary Tong <zach@elastic.co>	2020-02-14 11:19:11 -05:00
Marios Trivyzas	ea6f0e39bc	[Tests] Update skip version for YAML tests (#52310 ) Update skip versions upper boundary to match the release or intended release version of the feature/fix.	2020-02-13 15:36:31 +01:00
Marios Trivyzas	dac720d7a1	Add a cluster setting to disallow expensive queries (#51385 ) (#52279 ) Add a new cluster setting `search.allow_expensive_queries` which by default is `true`. If set to `false`, certain queries that have usually slow performance cannot be executed and an error message is returned. - Queries that need to do linear scans to identify matches: - Script queries - Queries that have a high up-front cost: - Fuzzy queries - Regexp queries - Prefix queries (without index_prefixes enabled) - Wildcard queries - Range queries on text and keyword fields - Joining queries - HasParent queries - HasChild queries - ParentId queries - Nested queries - Queries on deprecated 6.x geo shapes (using PrefixTree implementation) - Queries that may have a high per-document cost: - Script score queries - Percolate queries Closes: #29050 (cherry picked from commit a8b39ed842c7770bd9275958c9f747502fd9a3ea)	2020-02-12 22:56:14 +01:00
Nhat Nguyen	257eb0212c	Mute ‘test user agent processor with non-ECS schema’ Tracked at #52266	2020-02-12 10:27:18 -05:00
Ignacio Vera	80e3c97210	Upgrade to lucene-8.5.0-snapshot-d62f6307658 (#52039 ) (#52130 )	2020-02-10 10:13:22 +01:00
Jay Modi	3edadfefd0	RestHandlers declare handled routes (#52123 ) This commit changes how RestHandlers are registered with the RestController so that a RestHandler no longer needs to register itself with the RestController. Instead the RestHandler interface has new methods which when called provide information about the routes (method and path combinations) that are handled by the handler including any deprecated and/or replaced combinations. This change also makes the publication of RestHandlers safe since they no longer publish a reference to themselves within their constructors. Closes #51622 Co-authored-by: Jason Tedor <jason@tedor.me> Backport of #51950	2020-02-09 22:48:32 -07:00
Ioannis Kakavas	8c0b49cd32	Adjust jarHell and 3rd party audit exclusions (#51733 ) (#51766 ) Now that the FIPS 140 security provider is simply a test dependency we don't need the thirdPartyAudit exceptions, but plugin-cli and transport-netty4 do need jarHell disabled as they use the non fips BouncyCastle security provider as a test dependency too.	2020-02-10 07:38:59 +02:00
Nhat Nguyen	6d0a0e1240	Revert "Mute ReindexFailureTests test" This reverts commit `16afbf91bb`. The issue was fixed in #52099	2020-02-09 22:13:36 -05:00
Przemko Robakowski	8cf47aca7e	[7.x] Improve Painless compilation performance for nested conditionals (#52056 ) (#52074 ) * Improve Painless compilation performance for nested conditionals (#52056) This PR changes how conditional expression is handled in `PainlessParser` in a way that avoids the need for backtracking, which led to exponential compilation times in case of nested conditionals. The added test ensures that we can compile deeply nested conditionals. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> * Fix Map.of in Java8 Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-02-07 21:13:25 +01:00
Przemko Robakowski	c827f6f440	Avoid clash between source field and header field in CsvProcessorTests (#51962 ) (#52070 ) This change fixes flakiness in `CsvProcessorTests` where source field can be the same as one of the headers used by tests which messes up asserts when we check that field is not present after processor run. Closes #50209	2020-02-07 21:00:39 +01:00
Julie Tibshirani	337d73a7c6	Rename MapperService#fullName to fieldType. The new name more accurately describes what the method returns.	2020-02-07 10:35:53 -08:00
Armin Braun	91e938ead8	Add Trace Logging of REST Requests (#51684 ) (#52015 ) Being able to trace log all REST requests to a node would make debugging a number of issues a lot easier.	2020-02-07 09:03:20 +01:00
Mark Vieira	16afbf91bb	Mute ReindexFailureTests test	2020-02-06 16:29:04 -08:00
Mark Vieira	bc7aff917e	Mute LangPainlessClientYamlTestSuiteIT context API tests (#51939 ) (#52011 )	2020-02-06 11:04:15 -08:00
Przemko Robakowski	6332de40b4	Add empty_value parameter to CSV processor (#51567 ) (#51966 ) * Add empty_value parameter to CSV processor This change adds the `empty_value` parameter to the CSV processor. This value is used to fill empty fields. Fields will be skipped if this parameter is omitted. This behavior is the same for both quoted and unquoted fields. * docs updated * Fix compilation problem Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-02-05 23:35:52 +01:00
Henning Andersen	d0865b963e	Disable reindex test against 0.90 on mac (#51884 ) Follow-up to #51449 to also disable the test on mac. Closes #51617	2020-02-05 16:45:51 +01:00
Adrien Grand	ad9d2f1922	Move analysis/mappings stats to cluster-stats. (#51875 ) Closes #51138	2020-02-05 11:02:25 +01:00
Maria Ralli	8d3e73b3a0	Add host address to BindTransportException message (#51269 ) When bind fails, show the host address in addition to the port. This helps debugging cases with wrong "network.host" values. Closes #48001	2020-02-04 17:13:19 +00:00
Przemyslaw Gomulka	a6d24d6a46	Fix ingest timezone logic backport(#51215 ) (#51802 ) When a timezone is not provided, ingest logic should consider a time to be in the timezone provided as a parameter. When a timezone is provided, ingest should recalculate the time to the timezone provided as a parameter. Closes #51108 Backport of #51215	2020-02-03 14:17:43 +01:00
Ryan Ernst	21224caeaf	Remove comparison to true for booleans (#51723 ) While we use `== false` as a more visible form of boolean negation (instead of `!`), the true case is implied and the true value does not need to explicitly checked. This commit converts cases that have slipped into the code checking for `== true`.	2020-01-31 16:35:43 -08:00
Mayya Sharipova	16ef6a5785	Mute testEs090 and testEs090WithFunnyThrottle Mute tests that reproducibly fail. Related issue #51617	2020-01-31 14:46:29 -05:00
Mayya Sharipova	42b885f050	Upgrade to lucene-8.5.0-snapshot-3333ce7da6d (#51749 ) Backport for #51327	2020-01-31 11:20:15 -05:00
Przemko Robakowski	a7f0c699cf	Fix ignore_missing in CsvProcessor (#51600 ) (#51609 ) This change fixes inverted logic around ignore_missing in CsvProcessor	2020-01-29 14:58:23 +01:00
Ioannis Kakavas	ba3051a50f	Mute Netty4ClientYamlTestSuiteIT in FIPS 140 (#51536 ) rest-api-spec/test/10_basic.yml would check that transport_types is `netty4` but we run FIPS 140 tests with default distribution and transport_types is `security4`	2020-01-29 08:16:47 +02:00
Gordon Brown	89c2834b24	Deprecate creation of dot-prefixed index names except for hidden and system indices (#49959 ) This commit deprecates the creation of dot-prefixed index names (e.g. .watches) unless they are either 1) a hidden index, or 2) registered by a plugin that extends SystemIndexPlugin. This is the first step towards more thorough protections for system indices. This commit also modifies several plugins which use dot-prefixed indices to register indices they own as system indices, and adds a plugin to register .tasks as a system index.	2020-01-28 10:01:16 -07:00
Henning Andersen	9085024e1d	Disable reindex against 0.90 on mac (#51449 ) We still test remote reindex against version 0.90. This failed on mac a few times and rather than spend time investigating this, we no longer test remote reindex against 0.90 on mac. Closes #51202	2020-01-27 12:42:12 +01:00
Ioannis Kakavas	ee202a642f	Enable tests in FIPS 140 in JDK 11 (#49485 ) This change changes the way to run our test suites in JVMs configured in FIPS 140 approved mode. It does so by: - Configuring any given runtime Java in FIPS mode with the bundled policy and security properties files, setting the system properties java.security.properties and java.security.policy with the == operator that overrides the default JVM properties and policy. - When runtime java is 11 and higher, using BouncyCastle FIPS Cryptographic provider and BCJSSE in FIPS mode. These are used as testRuntime dependencies for unit tests and internal clusters, and copied (relevant jars) explicitly to the lib directory for testclusters used in REST tests - When runtime java is 8, using BouncyCastle FIPS Cryptographic provider and SunJSSE in FIPS mode. Running the tests in FIPS 140 approved mode doesn't require an additional configuration either in CI workers or locally and is controlled by specifying -Dtests.fips.enabled=true	2020-01-27 11:14:52 +02:00
Przemko Robakowski	3fb7ad0e67	[7.x] Refactor ForEachProcessor to use iteration instead of recursion (#51104 ) (#51322 ) * Refactor ForEachProcessor to use iteration instead of recursion (#51104) * Refactor ForEachProcessor to use iteration instead of recursion This change makes ForEachProcessor iterative and still non-blocking. In case of non-async processors we use single for loop and no recursion at all. In case of async processors we continue work on either current thread or thread started by downstream processor, whichever is slower (usually processor thread). Everything is synchronised by single atomic variable. Relates #50514 * Update IngestCommonPlugin.java	2020-01-22 20:03:37 +01:00
Stuart Tettemer	41c15b438d	Scripting: Add char position of script errors (#51069 ) (#51266 ) Add the character position of a scripting error to error responses. The contents of the `position` field are experimental and subject to change. Currently, `offset` refers to the character location where the error was encountered, `start` and `end` define a range of characters that contain the error. eg. ``` { "error": { "root_cause": [ { "type": "script_exception", "reason": "runtime error", "script_stack": [ "y = x;", " ^---- HERE" ], "script": "def x = new ArrayList(); Map y = x;", "lang": "painless", "position": { "offset": 33, "start": 29, "end": 35 } } ``` Refs: #50993	2020-01-21 13:45:59 -07:00
Nik Everett	ca15a3f5a8	Add "did you mean" to unknown queries (#51177 ) (#51254 ) This replaces the message we return for unknown queries with the standard one that we use for unknown fields from `ObjectParser`. This is nice because it includes "did you mean". One day we might convert parsing queries to using object parser, but that looks complex. This change is much smaller and seems useful.	2020-01-21 12:45:52 -05:00
Marios Trivyzas	fda25ed04a	Fix caching for PreConfiguredTokenFilter (#50912 ) (#51091 ) The PreConfiguredTokenFilter#singletonWithVersion uses the version internally for the token filter factories but it registers only one instance in the cache and not one instance per version. This can lead to exceptions like the one described in #50734 since the singleton is created and cached using the version created of the first index that is processed. Remove the singletonWithVersion() methods and use the elasticsearchVersion() methods instead. Fixes: #50734 (cherry picked from commit 24e1858)	2020-01-16 13:58:02 +01:00
Martijn van Groningen	02dfd71efa	Backport: Add pipeline name to ingest metadata (#51050 ) Backport: #50467 This commit adds the name of the current pipeline to ingest metadata. This pipeline name is accessible under the following key: '_ingest.pipeline'. Example usage in pipeline: PUT /_ingest/pipeline/2 { "processors": [ { "set": { "field": "pipeline_name", "value": "{{_ingest.pipeline}}" } } ] } Closes #42106	2020-01-16 10:50:47 +01:00
Nik Everett	fc5fde7950	Add "did you mean" to ObjectParser (#50938 ) (#50985 ) Check it out: ``` $ curl -u elastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_update/foo?pretty -d'{ "dac": {} }' { "error" : { "root_cause" : [ { "type" : "x_content_parse_exception", "reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?" } ], "type" : "x_content_parse_exception", "reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?" }, "status" : 400 } ``` The tricky thing about implementing this is that x-content doesn't depend on Lucene. So this works by creating an extension point for the error message using SPI. Elasticsearch's server module provides the "spell checking" implementation.	2020-01-14 17:53:41 -05:00
Christoph Büscher	2f13751bad	Deprecate and remove camel-case nGram and edgeNGram tokenizers (#50862 ) (#50991 ) We deprecated and removed the camel-case versions of the nGram and edgeNGram filters a while ago and we should do the same with the nGram and edgeNGram tokenizers. This PR deprecates the use of these names in favour of ngram and edge_ngram in 7. Usage will be disallowed on new indices starting with 8 then.	2020-01-14 21:42:34 +01:00
Alan Woodward	4974f56b25	Fix analysis BWC tests - warnings now emitted on index creation	2020-01-14 14:48:40 +00:00
Alan Woodward	8c16725a0d	Check for deprecations when analyzers are built (#50908 ) Generally speaking, deprecated analysis components in elasticsearch will issue deprecation warnings when they are first used. However, this means that no warnings are emitted when indexes are created with deprecated components, and users have to actually index a document to see warnings. This makes it much harder to see these warnings and act on them at appropriate times. This is worse in the case where components throw exceptions on upgrade. In this case, users will not be aware of a problem until a document is indexed, instead of at index creation time. This commit adds a new check that pushes an empty string through all user-defined analyzers and normalizers when an IndexAnalyzers object is built for each index; deprecation warnings and exceptions are now emitted when indexes are created or opened. Fixes #42349	2020-01-14 13:52:02 +00:00
Jake Landis	de6f132887	[7.x] Foreach processor - fork recursive call (#50514 ) (#50773 ) A very large number of recursive calls can cause a stack overflow exception. This commit forks the recursive calls for non-async processors. Once forked, each thread will handle at most 10 recursive calls to help keep the stack size and thread count down to a reasonable size.	2020-01-09 13:21:18 -06:00
Christoph Büscher	b1b4282273	Make Multiplexer inherit filter chains analysis mode (#50662 ) Currently, if an updateable synonym filter is included in a multiplexer filter, it is not reloaded via the _reload_search_analyzers because the multiplexer itself doesn't pass on the analysis mode of the filters it contains, so it's not recognized as "updateable" in itself. Instead we can check and merge the AnalysisMode settings of all filters in the multiplexer and use the resulting mode (e.g. search-time only) for the multiplexer itself, thus making any synonym filters contained in it reloadable. This, of course, will also make the analyzers using the multiplexer be usable at search-time only. Closes #50554	2020-01-08 22:12:01 +01:00
Henning Andersen	125feecabc	Guess root cause support unwrap (#50525 ) (#50742 ) ElasticsearchException.guessRootCauses would return wrapper exception if inner exception was not an ElasticsearchException. Fixed to never return wrapper exceptions. At least following APIs change root_cause.0.type as a result: _update with bad script _index with bad pipeline Relates #50417	2020-01-08 19:09:14 +01:00
Adrien Grand	4f2299c714	Upgrade to Lucene 8.4.0. (#50518 ) (#50750 )	2020-01-08 18:53:59 +01:00
Adrien Grand	31158ab3d5	Add per-field metadata. (#50333 ) This PR adds per-field metadata that can be set in the mappings and is later returned by the field capabilities API. This metadata is completely opaque to Elasticsearch but may be used by tools that index data in Elasticsearch to communicate metadata about fields with tools that then search this data. A typical example that has been requested in the past is the ability to attach a unit to a numeric field. In order to not bloat the cluster state, Elasticsearch requires that this metadata be small: - keys can't be longer than 20 chars, - values can only be numbers or strings of no more than 50 chars - no inner arrays or objects, - the metadata can't have more than 5 keys in total. Given that metadata is opaque to Elasticsearch, field capabilities don't try to do anything smart when merging metadata about multiple indices, the union of all field metadatas is returned. Here is what the meta might look like in mappings: ```json { "properties": { "latency": { "type": "long", "meta": { "unit": "ms" } } } } ``` And then in the field capabilities response: ```json { "latency": { "long": { "searchable": true, "aggregatable": true, "meta": { "unit": [ "ms" ] } } } } ``` When there are no conflicts, values are arrays of size 1, but when there are conflicts, Elasticsearch includes all unique values in this array, without giving ways to know which index has which metadata value: ```json { "latency": { "long": { "searchable": true, "aggregatable": true, "meta": { "unit": [ "ms", "ns" ] } } } } ``` Closes #33267	2020-01-08 16:21:18 +01:00
Alexander Reelsen	71054d269b	Sync grok patterns with logstash patterns (#50381 ) In order to ensure that logstash and Elasticsearch are able to understand the same patterns, this commit adapts to changes in logstash, adds a few patterns and changes a few.	2020-01-08 14:59:34 +01:00
Mayya Sharipova	0b7309ec9c	Fix NPE bug inner_hits (#50709 ) When there are several subqueries on different relations of the join field, and only one of the subqueries is using inner_hits, an NPE occurs. This PR prevents the NPE. Closes #50539	2020-01-07 14:21:54 -05:00
Alan Woodward	a3ab7eb95d	Correctly handle MSM for nested disjunctions (#50669 ) With the rewrite of the percolator's QueryAnalyzer to use lucene's QueryVisitor API, term queries that are direct children of a boolean query are handled separately from other children. This works fine for conjunctions, but for disjunctions we need to treat the extracted terms from these direct descendants along with extractions from more deeply nested children to ensure that minimum-should-match requirements are met correctly. This commit changes the logic in QueryAnalyzer#getResult() to bundle child term results with all other results before handling them. Fixes #50305	2020-01-07 09:32:30 +00:00
Nik Everett	45663ac1a8	Use Void context on parsers where possible (#50573 ) (#50617 ) Most of our parsing can be done without passing any extra context into the parser that isn't already part of the xcontent stream. While I was looking around at the places that do need a context I found a few places that were declared to need a context but don't actually need it.	2020-01-03 13:28:55 -05:00
Nik Everett	4d58656065	Declare remaining parsers `final` (#50571 ) (#50615 ) We have about 800 `ObjectParsers` in Elasticsearch, about 700 of which are final. This is probably the right way to declare them because in practice we never mutate them after they are built. And we certainly don't change the static reference. Anyway, this adds `final` to these parsers. I found the non-final parsers with this: ``` diff \ <(find . -type f -name '*.java' -exec grep -iHe 'static.*PARSER\s*=' {} \+ | sort) \ <(find . -type f -name '*.java' -exec grep -iHe 'static.*final.*PARSER\s*=' {} \+ | sort) \ 2>&1 | grep '^<' ```	2020-01-03 11:48:11 -05:00
Nik Everett	b36a8ab141	Make some ObjectParsers final (#50471 ) (#50556 ) We have about 800 `ObjectParsers` in Elasticsearch, about 700 of which are final. This is probably the right way to declare them because in practice we never mutate them after they are built. And we certainly don't change the static reference. Anyway, this adds `final` to a bunch of these parsers, mostly the ones in xpack and their "paired" parsers in the high level rest client. I picked these just to have somewhere to break up the change so it wouldn't be huge. I found the non-final parsers with this: ``` diff \ <(find . -type f -name '*.java' -exec grep -iHe 'static.*PARSER\s*=' {} \+ | sort) \ <(find . -type f -name '*.java' -exec grep -iHe 'static.*final.*PARSER\s*=' {} \+ | sort) \ 2>&1 | grep '^<' ```	2020-01-02 10:47:38 -05:00
Christoph Büscher	6258d25458	Log deprecation for nGram and edgeNGram custom filters (#50376 ) (#50445 ) The camel-case `nGram` and `edgeNGram` filter names were deprecated in 6. We currently throw errors on new indices when they are used. However these errors are currently only thrown for pre-configured filters, adding them as custom filters doesn't trigger the warning and error. This change adds the appropriate deprecation warnings for `nGram` and `edgeNGram` respectively on version 7 indices. Relates #50360	2019-12-20 22:00:08 +01:00
Stuart Tettemer	f212994c16	[TEST] Unknown scripting annotations raise error (#50343 ) (#50346 ) Ensure that unknown annotations, such as typo'd `@nondeterministic`, will raise an exception.	2019-12-19 16:22:22 -07:00
Stuart Tettemer	689df1f28f	Scripting: ScriptFactory not required by compile (#50344 ) (#50392 ) Avoid backwards incompatible changes for 8.x and 7.6 by removing type restriction on compile and Factory. Factories may optionally implement ScriptFactory. If so, then they can indicate determinism and thus cacheability. Backport Relates: #49466	2019-12-19 12:50:25 -07:00
Stuart Tettemer	06a24f09cf	Scripting: Cache script results if deterministic (#50106 ) (#50329 ) Cache results from queries that use scripts if they use only deterministic API calls. Nondeterministic API calls are marked in the whitelist with the `@nondeterministic` annotation. Examples are `Math.random()` and `new Date()`. Refs: #49466	2019-12-18 13:00:42 -07:00
Przemko Robakowski	0efb241b3c	Fix flakiness in CsvProcessorTests (#50254 ) (#50256 ) There's flakiness in CsvProcessorTests, where tests fail if the random document generator adds a field that should not be present. This change cleans the generated document of these problematic fields. Closes #50209	2019-12-17 01:15:15 +01:00
Ignacio Vera	b5ec227de8	upgrade to lucene 8.4.0-snapshot-08b8d116f8f (#50129 ) (#50132 )	2019-12-12 13:13:37 +01:00
Armin Braun	6eee41e253	Remove Unused Single Delete in BlobStoreRepository (#50024 ) (#50123 ) * Remove Unused Single Delete in BlobStoreRepository There are no more production uses of the non-bulk delete or the delete that throws on missing so this commit removes both these methods. Only the bulk delete logic remains. Where the bulk delete was derived from single deletes, the single delete code was inlined into the bulk delete method. Where single delete was used in tests it was replaced by bulk deleting.	2019-12-12 11:17:46 +01:00
Przemko Robakowski	4619834b97	[7.x] CSV ingest processor (#49509 ) (#50083 ) * CSV ingest processor (#49509) This change adds a new ingest processor that breaks a line from a CSV file into separate fields. By default it conforms to RFC 4180 but can be tweaked. Closes #49113	2019-12-11 23:06:05 +01:00
Jack Conradson	eb20db8a1c	Update Painless AST Catch Node (#50044 ) This makes two changes to the catch node: 1. Use SDeclaration to replace independent variable usage. 2. Use a DType to set a "minimum" exception type - this allows us to require users to continue using Exception as "minimum" type for catch blocks, but for us to internally catch Error/Throwable. This is a required step to removing custom try/catch blocks from SClass.	2019-12-10 12:56:34 -08:00
Adrien Grand	87e72156ce	Upgrade to lucene 8.4.0-snapshot-662c455. (#50016 ) (#50039 ) Lucene 8.4 is about to be released so we should check it doesn't cause problems with Elasticsearch.	2019-12-10 18:04:58 +01:00
Alan Woodward	3d8c2f9e18	Fix query analyzer logic for mixed conjunctions of terms and ranges (#49803 ) When the query analyzer examines a conjunction containing both terms and ranges, it should only include ranges in the minimum_should_match calculation if there are no other range queries on that same field within the conjunction. This is because we cannot build a selection query over disjoint ranges on the same field, and it is not easy to check if two range queries have an overlap. The current logic to calculate this just sets minimum_should_match to 1 or 0, dependent on whether or not the current range is over a field that has already been seen. However, this can be incorrect in the case that there are terms in the same match group which adjust the minimum_should_match downwards. Instead, the logic should be changed to match the terms extraction, whereby we adjust minimum_should_match downwards if we have already seen a range field. Fixes #49684	2019-12-10 11:01:52 +00:00
Przemko Robakowski	d7083a84f4	Allow list of IPs in geoip ingest processor (#49573 ) (#49947 ) * Allow list of IPs in geoip ingest processor This change lets you use an array of IPs in addition to a string in the geoip processor source field. It will set an array containing geoip data for each element in the source, unless the first_only option is enabled, in which case only the first one found will be returned. Closes #46193	2019-12-07 00:19:09 +01:00
Stuart Tettemer	17cda5b2c0	Scripting: Groundwork for caching script results (#49895 ) (#49944 ) In order to cache script results in the query shard cache, we need to check if scripts are deterministic. This change adds a default method to the script factories, `isResultDeterministic() -> false` which is used by the `QueryShardContext`. Script results were never cached and that does not change here. Future changes will implement this method based on whether the results of the scripts are deterministic or not and therefore cacheable. Refs: #49466 Backport	2019-12-06 15:08:05 -07:00
Jake Landis	1c5a139968	Update jackson-databind to 2.8.11.4 (#49347 ) (#49937 )	2019-12-06 13:39:33 -06:00
Henning Andersen	1d3feaf18e	Reindex sort deprecation warning take 2 (#49855 ) (#49899 ) Moved the deprecation warning to ReindexValidator to ensure it runs early and works with resilient reindex. Also check that the warning is reported back for wait_for_completion=false. Follow-up to #49458	2019-12-06 09:44:36 +01:00
Jack Conradson	cd3744c0b7	Add nodes to handle types (#49785 ) This PR adds 3 nodes to handle types defined by a front-end creating a Painless AST. These types are decided with data immutability in mind - hence the reason for more than a single node.	2019-12-05 17:09:19 -08:00
Zachary Tong	fec882a457	Decouple pipeline reductions from final agg reduction (#45796 ) Historically only two things happened in the final reduction: empty buckets were filled, and pipeline aggs were reduced (since it was the final reduction, this was safe). Usage of the final reduction is growing however. Auto-date-histo might need to perform many reductions on final-reduce to merge down buckets, CCS may need to side-step the final reduction if sending to a different cluster, etc. Having pipelines generate their output in the final reduce was convenient, but is becoming increasingly difficult to manage as the rest of the agg framework advances. This commit decouples pipeline aggs from the final reduction by introducing a new "top level" reduce, which should be called at the beginning of the reduce cycle (e.g. from the SearchPhaseController). This will only reduce pipeline aggs on the final reduce after the non-pipeline agg tree has been fully reduced. By separating pipeline reduction into its own set of methods, aggregations are free to use the final reduction for whatever purpose without worrying about generating pipeline results which are non-reducible.	2019-12-05 16:11:54 -05:00
Jack Conradson	687c6648d9	Minor Painless Clean Up (#49844 ) This cleans up two minor things. - Cleans up style of == false - Pulls maxLoopCounter into a member variable instead of accessing CompilerSettings multiple times in the SFunction node	2019-12-05 12:20:07 -08:00
Stuart Tettemer	426c7a5e8f	Scripting: add available languages & contexts API (#49652 ) (#49815 ) Adds `GET /_script_language` to support Kibana dynamic scripting language selection. Response contains whether `inline` and/or `stored` scripts are enabled as determined by the `script.allowed_types` settings. For each scripting language registered, such as `painless`, `expression`, `mustache` or custom, available contexts for the language are included as determined by the `script.allowed_contexts` setting. Response format: ``` { "types_allowed": [ "inline", "stored" ], "language_contexts": [ { "language": "expression", "contexts": [ "aggregation_selector", "aggs" ... ] }, { "language": "painless", "contexts": [ "aggregation_selector", "aggs", "aggs_combine", ... ] } ... ] } ``` Fixes: #49463 Backport	2019-12-04 16:18:22 -07:00
Jack Conradson	dbf6183469	Remove extraneous pass (#49797 ) This removes the storeSettings pass where nodes in the AST could store information they needed out of CompilerSettings for use during later passes. CompilerSettings is part of ScriptRoot which is available during the analysis pass making the storeSettings pass redundant.	2019-12-04 12:18:04 -08:00
Armin Braun	91ac87d75b	Stop Allocating Buffers in CopyBytesSocketChannel (#49825 ) (#49832 ) * Stop Allocating Buffers in CopyBytesSocketChannel (#49825) The way things currently work, we read up to 1M from the channel and then potentially force all of it into the `ByteBuf` passed by Netty. Since that `ByteBuf` tends to by default be `64k` in size, large reads will force the buffer to grow, completely circumventing the logic of `allocHandle`. This seems like it could break `io.netty.channel.RecvByteBufAllocator.Handle#continueReading` since that method for the fixed-size allocator does check whether the last read was equal to the attempted read size. So if we set `64k` because that's what the buffer size is, then write `1M` to the buffer we will stop reading on the IO loop, even though the channel may still have bytes that we can read right away. More importantly though, this can lead to running OOM quite easily under IO pressure as we are forcing the heap buffers passed to the read to `reallocate`. Closes #49699	2019-12-04 19:36:52 +01:00
Armin Braun	996cddd98b	Stop Copying Every Http Request in Message Handler (#44564 ) (#49809 ) * Copying the request is not necessary here. We can simply release it once the response has been generated and save a lot of `Unpooled` allocations that way * Relates #32228 * I think the issue that prevented that PR from being merged was solved by #39634 that moved the bulk index marker search to ByteBuf bulk access so the composite buffer shouldn't require many additional bounds checks (I'd argue the bounds checks we add, we save when copying the composite buffer) * I couldn't necessarily reproduce much of a speedup from this change, but I could reproduce a very measurable reduction in GC time with e.g. Rally's PMC (4g heap node and bulk requests of size 5k saw a reduction in young GC time by ~10% for me)	2019-12-04 08:41:42 +01:00
Jason Tedor	0f27c0b702	Extend systemd timeout during startup (#49784 ) When we are notifying systemd that we are fully started up, it can be that we do not notify systemd before its default timeout of sixty seconds elapses (e.g., if we are upgrading on-disk metadata). In this case, we need to notify systemd to extend this timeout so that we are not abruptly terminated. We do this by repeatedly sending EXTEND_TIMEOUT_USEC to extend the timeout by thirty seconds; we do this every fifteen seconds. This will prevent systemd from abruptly terminating us during a long startup. We cancel the scheduled execution of this notification after we have successfully started up.	2019-12-03 14:25:45 -05:00
Henning Andersen	5adb33ec17	Deprecate sorting in reindex (#49458 ) (#49738 ) Reindex sort never gave a guarantee about the order of documents being indexed into the destination, though it could give a sense of locality of source data. It prevents us from doing resilient reindex and other optimizations and it has therefore been deprecated. Related to #47567	2019-12-01 19:24:27 +01:00
Henning Andersen	1d745f1e5c	Revert "Deprecate sorting in reindex (#49458 )" This reverts commit `27d45c9f1f`.	2019-11-29 22:08:19 +01:00
Henning Andersen	27d45c9f1f	Deprecate sorting in reindex (#49458 ) Reindex sort never gave a guarantee about the order of documents being indexed into the destination, though it could give a sense of locality of source data. It prevents us from doing resilient reindex and other optimizations and it has therefore been deprecated. Related to #47567	2019-11-29 21:35:11 +01:00
Armin Braun	813b49adb4	Make BlobStoreRepository Aware of ClusterState (#49639 ) (#49711 ) * Make BlobStoreRepository Aware of ClusterState (#49639) This is a preliminary to #49060. It does not introduce any substantial behavior change to how the blob store repository operates. What it does is to add all the infrastructure changes around passing the cluster service to the blob store, associated test changes and a best effort approach to tracking the latest repository generation on all nodes from cluster state updates. This brings a slight improvement to the consistency by which non-master nodes (or master directly after a failover) will be able to determine the latest repository generation. It does not however do any tricky checks for the situation after a repository operation (create, delete or cleanup) that could theoretically be used to get even greater accuracy to keep this change simple. This change does not in any way alter the behavior of the blobstore repository other than adding a better "guess" for the value of the latest repo generation and is mainly intended to isolate the actual logical change to how the repository operates in #49060	2019-11-29 14:57:47 +01:00
Mayya Sharipova	2dafecc398	Upgrade lucene to 8.4.0-snapshot-e648d601efb (#49641 )	2019-11-28 11:59:58 -05:00
jimczi	35732504ba	#49166 Fix spurious test failure	2019-11-28 11:08:15 +01:00
Jim Ferenczi	d6445fae4b	Add a cluster setting to disallow loading fielddata on _id field (#49166 ) This change adds a dynamic cluster setting named `indices.id_field_data.enabled`. When set to `false` any attempt to load the fielddata for the `_id` field will fail with an exception. The default value in this change is set to `false` in order to prevent fielddata usage on this field for future versions but it will be set to `true` when backporting to 7x. When the setting is set to true (manually or by default in 7x) the loading will also issue a deprecation warning since we want to disallow fielddata entirely when https://github.com/elastic/elasticsearch/issues/26472 is implemented. Closes #43599	2019-11-28 09:35:28 +01:00
Martijn van Groningen	0a42395dfa	Backport: add templating support to pipeline processor (#49643 ) Backport of #49030 This commit adds templating support to the pipeline processor's `name` option. Closes #39955	2019-11-27 15:53:40 +01:00
Przemyslaw Gomulka	502873b144	[Java.time] Retain prefixed date pattern in formatter (#48703 ) JavaDateFormatter should keep the pattern with the prefixed 8 as it will be used for serialisation. The stripped pattern should be used for the enclosed formatters. closes #48698	2019-11-27 12:29:18 +01:00
Yannick Welsch	bd007271cf	Avoid double-wrapping allocator (#49534 ) When using unpooled, the allocator is wrapped twice in a NoDirectBuffers.	2019-11-27 09:25:32 +01:00
Martijn van Groningen	90850f4ea0	Backport: Introduce on_failure_pipeline ingest metadata inside on_failure block (#49596 ) Backport of #49076 In case an exception occurs inside a pipeline processor, the pipeline stack is kept around as header in the exception. Then in the on_failure processor the id of the pipeline the exception occurred is made accessible via the `on_failure_pipeline` ingest metadata. Closes #44920	2019-11-27 07:52:08 +01:00
Jason Tedor	71bcfbf1e3	Replace required pipeline with final pipeline (#49470 ) This commit enhances the required pipeline functionality by changing it so that default/request pipelines can also be executed, but the required pipeline is always executed last. This gives users the flexibility to execute their own indexing pipelines, but also ensure that any required pipelines are also executed. Since such pipelines are executed last, we change the name of required pipelines to final pipelines.	2019-11-22 14:37:36 -05:00
Henning Andersen	49bb5fb642	Netty4: switch to composite cumulator (#49478 ) The default merge cumulator used in netty transport leads to additional GC pressure and memory copying when a message that exceeds the chunk size is handled. This is especially a problem on G1 GC, since we get many "humongous" allocations and that can in theory cause real memory circuit breaker to break unnecessarily.	2019-11-22 18:14:10 +01:00
Martijn van Groningen	2243743450	Update geolite2 database in ingest geoip plugin. (#49308 ) Some tests were tweaked to deal with the updated database files.	2019-11-22 08:38:57 +01:00
Henning Andersen	0164de8579	Reindex search response fix again (#49423 ) Fixed test case to more broadly accept all messages with "Partial shards failure" in it, to hopefully catch all relevant search messages now that reindex does not allow searching against red shards. Closes #49295	2019-11-21 11:45:08 +01:00
Jack Conradson	a780ec14f0	Painless: Upgrade ASM to 7.2 (#49263 ) This upgrades Painless to use the latest ASM libraries providing support up to Java 14. Note the library is not published with the latest versions in an "all" package, so we pick up each lib independently that's required. There were some changes to the getType method that require descriptors to be used in place of internal class names.	2019-11-20 07:09:47 -08:00
Christoph Büscher	4ffa050735	Allow custom characters in token_chars of ngram tokenizers (#49250 ) Currently the `token_chars` setting in both `edgeNGram` and `ngram` tokenizers only allows for a list of predefined character classes, which might not fit every use case. For example, including underscore "_" in a token would currently require the `punctuation` class which comes with a lot of other characters. This change adds an additional "custom" option to the `token_chars` setting, which requires an additional `custom_token_chars` setting to be present and which will be interpreted as a set of characters to include in a token. Closes #25894	2019-11-20 10:37:12 +01:00
Alan Woodward	c6b31162ba	Refactor percolator's QueryAnalyzer to use QueryVisitors Lucene now allows us to explore the structure of a query using QueryVisitors, delegating the knowledge of how to recurse through and collect terms to the query implementations themselves. The percolator currently has a home-grown external version of this API to construct sets of matching terms that must be present in a document in order for it to possibly match the query. This commit removes the home-grown implementation in favour of one using QueryVisitor. This has the added benefit of making interval queries available for percolator pre-filtering. Due to a bug in multi-term intervals (LUCENE-9050) it also includes a clone of some of the lucene intervals logic, that can be removed once upstream has been fixed. Closes #45639	2019-11-20 09:21:01 +00:00
Mark Tozzi	17358b5af7	(refactor) Extract Empty/Script/Missing ValuesSource behavior to an interface (#48320 ) (#49330 ) This is a pure code rearrangement refactor. Logic for what specific ValuesSource instance to use for a given type (e.g. script or field) moved out of ValuesSourceConfig and into CoreValuesSourceType (previously just ValueSourceType; we extract an interface for future extensibility). ValueSourceConfig still selects which case to use, and then the ValuesSourceType instance knows how to construct the ValuesSource for that case.	2019-11-19 16:44:29 -05:00
Ryan Ernst	c6a8913c38	Fix java home validation usage by tasks (#49204 ) Tasks intending to use a particular java home provided by JAVA<N>_HOME use the getJavaHome method, which verifies the given java home is available, or will be if the task will run. However, the verification logic was broken, in addition to unnecessarily delaying retrieving the java home until runtime. This commit fixes the verification logic to run at either config time, delaying verification, or at runtime which immediately checks if java home is available. closes #49153	2019-11-19 10:30:19 -08:00
Henning Andersen	bc29c9877a	Reindex search response fix (#49301 ) Fixed test case to also accept another error message, now that reindex does not allow searching against red shards. Closes #49295	2019-11-19 14:38:05 +01:00
Tanguy Leroux	abed869ec6	Mute ReindexFailureTests.testResponseOnSearchFailure (#49298 ) Relates #49295	2019-11-19 12:38:54 +01:00
Henning Andersen	2ac38fd315	Reindex and friends fail on RED shards (#45830 ) Reindex, update by query and delete by query would silently disregard RED/unavailable shards, thus not copying, updating or deleting matching data in those shards. Now use `allow_partial_search_results=false` to ensure these operations fail if the search crosses an unavailable shard. Added the option to explicitly specify `allow_partial_search_results=true` for reindex only (seemed too strange for update/delete by query). Relates #45739 and #42612	2019-11-18 21:23:08 +01:00
gpaimla	7d20b50f45	Implement Lucene EstonianAnalyzer, Stemmer (#49149 ) This PR adds a new analyzer and stemmer for the Estonian language. Closes #48895	2019-11-18 17:24:21 +01:00
Jason Tedor	2bcdcb17cd	Introduce dedicated ingest processor exception (#48810 ) Today we wrap exceptions that occur while executing an ingest processor in an ElasticsearchException. Today, in ExceptionsHelper#unwrapCause we only unwrap causes for exceptions that implement ElasticsearchWrapperException, which the top-level ElasticsearchException does not. Ultimately, this means that any exception that occurs during processor execution does not have its cause unwrapped, and so its status is blanket treated as a 500. This means that while executing a bulk request with an ingest pipeline, document-level failures that occur during a processor will cause the status for that document to be treated as 500. Since that does not give the client any indication that they made a mistake, it means some clients will enter infinite retries, thinking that there is some server-side problem that merely needs to clear. This commit addresses this by introducing a dedicated ingest processor exception, so that its causes can be unwrapped. While we could consider a broader change to unwrap causes for more than just ElasticsearchWrapperExceptions, that is a broad change with unclear implications. Since the problem of reporting 500s on client errors is a user-facing bug, we take the conservative approach for now, and we can revisit the unwrapping in a future change.	2019-11-14 11:04:53 -05:00
Rory Hunter	c46a0e8708	Apply 2-space indent to all gradle scripts (#49071 ) Backport of #48849. Update `.editorconfig` to make the Java settings the default for all files, and then apply a 2-space indent to all `*.gradle` files. Then reformat all the files.	2019-11-14 11:01:23 +00:00
Henning Andersen	8835142ac9	Grok processor ignore case test (#48909 ) Added test demonstrating that grok using ignore case works, since this does a minimal test that the `joni` and `jcodings` libraries are compatible. Forward-port of test from #43334	2019-11-08 00:04:29 +01:00
Jason Tedor	c82ecb664c	Do not wrap ingest processor exception with IAE (#48816 ) The problem with wrapping here is that it converts any exception into an IAE, which we treat as a client error (400 status) whereas the exception being wrapped here could be a server error (e.g., NPE). This commit stops wrapping all ingest processor exceptions as IAEs.	2019-11-01 15:11:35 -04:00
Mark Vieira	6ab4645f4e	[7.x] Introduce type-safe and consistent pattern for handling build globals (#48818 ) This commit introduces a consistent and type-safe manner for handling global build parameters throughout our build logic. Primarily this replaces the existing usages of extra properties with static accessors. It also introduces an explicit API for initialization and mutation of any such parameters, as well as better error handling for uninitialized or eager access of parameter values. Closes #42042	2019-11-01 11:33:11 -07:00
Ioannis Kakavas	99aedc844d	Copy http headers to ThreadContext strictly (#45945 ) (#48675 ) Previous behavior while copying HTTP headers to the ThreadContext, would allow multiple HTTP headers with the same name, handling only the first occurrence and disregarding the rest of the values. This can be confusing when dealing with multiple Headers as it is not obvious which value is read and which ones are silently dropped. According to RFC-7230, a client must not send multiple header fields with the same field name in a HTTP message, unless the entire field value for this header is defined as a comma separated list or this specific header is a well-known exception. This commits changes the behavior in order to be more compliant to the aforementioned RFC by requiring the classes that implement ActionPlugin to declare if a header can be multi-valued or not when registering this header to be copied over to the ThreadContext in ActionPlugin#getRestHeaders. If the header is allowed to be multivalued, then all such headers are read from the HTTP request and their values get concatenated in a comma-separated string. If the header is not allowed to be multivalued, and the HTTP request contains multiple such Headers with different values, the request is rejected with a 400 status.	2019-10-31 23:05:12 +02:00
Dan Hermann	dbc05cd808	Add option to split processor for preserving trailing empty fields (#48685 )	2019-10-30 08:25:03 -05:00
Yogesh Gaikwad	9ed7352a12	Add Sysprop to Adjust IO Buffer Size (#48267 ) (#48667 ) The 1MB IO-buffer size per transport thread is causing trouble in some tests, albeit at a low rate. Reducing the number of transport threads was not enough to fully fix this situation. Allowing to configure the size of the buffer and reducing it by more than an order of magnitude should fix these tests. Closes #46803	2019-10-30 14:19:54 +11:00
Christoph Büscher	09d68e7548	Support `search_type` in Rank Evaluation API (#48542 ) (#48631 ) Adding support for the `search_type` request parameter to the Ranking Evaluation API since this parameter can impact the ranking and the metric score and should be chosen in the same way when evaluating the search as later in the real search. Closes #48503	2019-10-29 14:54:33 +01:00
Rory Hunter	3c77c50f5f	Improve resiliency to auto-formatting in libs, modules (#48619 ) Backport of #48448. Make a number of changes so that code in the libs and modules directories is more resilient to automatic formatting. This covers: * Remove string concatenation where JSON fits on a single line * Move some comments around so they aren't auto-formatted to a strange place	2019-10-29 10:39:34 +00:00
Tim Brooks	45e42f4e18	Upgrade to Netty 4.1.43 (#48484 ) With this update we can remove the mitigation in our custom allocator which forces heap buffer allocations.	2019-10-25 10:17:25 -06:00
Tim Brooks	c0b545f325	Make BytesReference an interface (#48486 ) BytesReference is currently an abstract class which is extended by various implementations. This makes it very difficult to use the delegation pattern. The implication of this is that our releasable BytesReference is a PagedBytesReference type and cannot be used as a generic releasable bytes reference that delegates to any reference type. This commit makes BytesReference an interface and introduces an AbstractBytesReference for common functionality.	2019-10-24 15:39:30 -06:00
Michael Basnight	c19379ef31	Remove random when using HLRC sync and async calls (#48211 ) This commit removes the randomization used by every execute call in the high level rest tests. Previously every execute call, which can be many calls per single test, would rely on a random boolean to determine if they should use the sync or async methods provided to the execute method. This commit runs the tests twice, using two different clusters, both of them providing the value one time via a sysprop. This ensures that the whole suite of tests is run using the sync and async code paths. Closes #39667	2019-10-24 09:06:17 -05:00
Martijn van Groningen	b034153df7	Change grok watch dog to be Matcher based instead of thread based. (#48346 ) There is a watchdog in order to avoid long running (and expensive) grok expressions. Currently the watchdog is thread based, threads that run grok expressions are registered and after completion unregister. If these threads stay registered for too long then the watch dog interrupts these threads. Joni (the library that powers grok expressions) has a mechanism that checks whether the current thread is interrupted and if so aborts the pattern matching. Newer versions have an additional method to abort long running pattern matching inside joni. Instead of checking the thread's interrupted flag, joni now also checks a volatile field that can be set via a `Matcher` instance. This is a more efficient method for aborting long running matches. (joni checks each 30k iterations whether interrupted flag is set vs. just checking a volatile field) Recently we upgraded to a recent joni version (#47374), and this PR is a followup of that PR. This change should also fix #43673, since it appears that when unit tests are run a test runner thread's interrupted flag may already have been set, due to some thread reuse.	2019-10-24 15:34:01 +02:00
Tim Brooks	c1f6aff5bb	Remove default netty allocator empty assertions (#48356 ) This commit removes a problematic assertion that the netty default allocator is not used. This assertion is problematic because any other test can cause this task to fail by touching the default allocator. We assert that we are using heap buffers in the channel.	2019-10-22 20:22:32 -06:00
Tim Brooks	547e399dbf	Remove option to enable direct buffer pooling (#48310 ) This commit removes the option to change the netty system properties to reenable the direct buffer pooling. It also removes the need for us to disable the buffer pooling in the system properties file. Instead, we programmatically create an allocator that is used by our networking layer. This commit does introduce an Elasticsearch property which allows the user to fall back on the netty default allocator. If they choose this option, they can configure the default allocator how they wish using the standard netty properties.	2019-10-21 19:15:50 -06:00
Ignacio Vera	b1224fca8c	upgrade to Lucene-8.3.0-snapshot-25968e3b75e (#48227 )	2019-10-21 08:21:09 +02:00
Alexander Reelsen	66581d8158	update ingest-user-agent regexes.yml (#47807 ) These new regexes are from: `154eba17f5/regexes.yaml`	2019-10-18 16:26:48 +02:00
Jack Conradson	155ecd0a76	Change Painless regex node to use SField instead of Globals (#47944 ) * Change Painless regex node to use SField instead of Globals * Use reflection instead of ASM to specify modifiers * Remove synthetic from SField	2019-10-15 07:47:16 -07:00
jimczi	b858e19bcc	Revert #46598 that breaks the cachability of the sub search contexts.	2019-10-15 09:40:59 +02:00
Tim Brooks	8814bf07f1	Upgrade to Netty 4.1.42 (#48015 ) Upgrades the netty version.	2019-10-14 13:54:02 -06:00
Przemyslaw Gomulka	6ab58de7ef	[7.x] Enable ResolverStyle.STRICT for java formatters backport(#46675 ) (#47913 ) Joda was using ResolverStyle.STRICT when parsing. This means that date will be validated to be a correct year, year-of-month, day-of-month However, we also want to make it works with Year-Of-Era as Joda used to, hence custom temporalquery.localdate in DateFormatters.from Within DateFormatters we use the correct uuuu year instead of yyyy year of era worth noting: if yyyy(without an era) is used in code, the parsing result will be a TemporalAccessor which will fail to be converted into LocalDate. We mostly use DateFormatters.from so this takes care of this. If possible the uuuu format should be used.	2019-10-11 21:19:56 +02:00
Jim Ferenczi	bd6e2592a7	Remove the SearchContext from the highlighter context (#47733 ) Today built-in highlighter and plugins have access to the SearchContext through the highlighter context. However most of the information exposed in the SearchContext is not needed and a QueryShardContext would be enough to perform highlighting. This change replaces the SearchContext with the information that is absolutely required by the highlighter: a QueryShardContext and the SearchContextHighlight. This change reduces the exposure of the complex SearchContext and removes the need to clone it in the percolator sub phase. Relates #47198 Relates #46523	2019-10-10 10:34:10 +02:00
Jack Conradson	076d3073b5	Move binding member field generation to Painless semantic pass (#47739 ) This adds an SField node that operates similarly to SFunction as a top level node meant only for use in an SClass node. Member fields are generated for both class bindings and instance bindings using the new SField node during the semantic pass, and information is no longer passed through Globals for this during the write pass.	2019-10-09 10:24:53 -07:00
Tim Brooks	02622c1ef9	Fix issues with serializing BulkByScrollResponse (#45357 ) Currently there are two issues with serializing BulkByScrollResponse. First, when deserializing from XContent, indexing exceptions and search exceptions are switched. Additionally, search exceptions do not retain the appropriate RestStatus code, so you must evaluate the status code from the exception. However, the exception class is not always correctly retained when serialized. This commit adds tests in the failure case. Additionally, fixes the swapping of failure types and adds the rest status code to the search failure.	2019-10-09 10:12:14 -06:00
Alpar Torok	36d018c909	Convert RunTask to use testclusters, remove ClusterFormationTasks (#47572 ) * Convert RunTask to use testclusters, remove ClusterFormationTasks This PR adds a new RunTask and a way for it to start a testclusters cluster out of band and block on it to replace the old RunTask that used ClusterFormationTasks. With this we can now remove ClusterFormationTasks.	2019-10-08 14:43:29 +03:00
Jack Conradson	833ed30f0d	Modify Painless AST to add synthetic functions during semantic pass (#47611 ) This has ELambda and ENewArrayFunctionRef add their generated synthetic methods to the SClass node during the semantic pass and removes this data from the write pass. This is the first step to remove "Globals" (mutable state) from the write pass.	2019-10-07 07:48:51 -07:00
Jack Conradson	e3aab1295e	Add a ScriptRoot to consolidate global data necessary for multiple passes (#47532 ) This PR is to get plumbing in for a ScriptRoot class that will consolidate several pieces of state required by potentially multiple passes including PainlessLookup, CompilerSettings, FunctionTable, the root class node, and a synthetic counter. It's possible more may be added to this as we move forward and slowly make the nodes have less mutable state.	2019-10-04 08:37:19 -07:00
Ryan Ernst	f32692208e	Add explanations to script score queries (#46693 ) (#47548 ) While function scores using scripts do allow explanations, they are only creatable with an expert plugin. This commit improves the situation for the newer script score query by adding the ability to set the explanation from the script itself. To set the explanation, a user would check for `explanation != null` to indicate an explanation is needed, and then call `explanation.set("some description")`.	2019-10-03 21:05:05 -07:00
Jim Ferenczi	5a3fa4a479	Add client jar for mapper-extras (#47430 ) The rest high level client has a dependency on mapper-extras but the jar is not published so this commit adds a client jar for this module. Closes #47413	2019-10-03 01:23:45 +02:00
Jim Ferenczi	c340814b34	Fix highlighting of overlapping terms in the unified highlighter (#47227 ) The passage formatter that the unified highlighter uses doesn't handle terms with overlapping offsets. For tokenizers that provide multiple segmentations of the same term (edge ngram for instance) the formatter should select the largest span in order to highlight the term only once. This change implements this logic.	2019-10-02 16:34:12 +02:00
Alan Woodward	697c693ee7	Reset Token position on reuse in scripted analysis (#47424 ) Most of the information in AnalysisPredicateScript.Token is pulled directly from its underlying AttributeSource, but we also keep track of the token position, and this state is held directly on the Token. This information needs to be reset when the containing ScriptFilteringTokenFilter or ScriptedConditionTokenFilter is re-used. Fixes #47197	2019-10-02 11:27:04 +01:00
Jack Conradson	8f1a80a43d	Move Painless local methods to a dedicated FunctionTable (#46889 ) This moves the way Painless maintains function headers for use across compilation into its own class - FunctionTable. This allows us to store a dedicated object for function lookup at runtime for the def type instead of a loose Map of functions.	2019-09-30 09:06:40 -07:00
Rory Hunter	53a4d2176f	Convert most awaitBusy calls to assertBusy (#45794 ) (#47112 ) Backport of #45794 to 7.x. Convert most `awaitBusy` calls to `assertBusy`, and use asserts where possible. Follows on from #28548 by @liketic. There were a small number of places where it didn't make sense to me to call `assertBusy`, so I kept the existing calls but renamed the method to `waitUntil`. This was partly to better reflect its usage, and partly so that anyone trying to add a new call to awaitBusy wouldn't be able to find it. I also didn't change the usage in `TransportStopRollupAction` as the comments state that the local awaitBusy method is a temporary copy-and-paste. Other changes: * Rework `waitForDocs` to scale its timeout. Instead of calling `assertBusy` in a loop, work out a reasonable overall timeout and await just once. * Some tests failed after switching to `assertBusy` and had to be fixed. * Correct the expect templates in AbstractUpgradeTestCase. The ES Security team confirmed that they don't use templates any more, so remove this from the expected templates. Also rewrite how the setup code checks for templates, in order to give more information. * Remove an expected ML template from XPackRestTestConstants The ML team advised that the ML tests shouldn't be waiting for any `.ml-notifications` templates, since such checks should happen in the production code instead. Also rework the template checking code in `XPackRestTestHelper` to give more helpful failure messages. * Fix issue in `DataFrameSurvivesUpgradeIT` when upgrading from < 7.4	2019-09-29 12:21:46 +01:00
Jack Conradson	d09965a6dd	Add ClassWriter to Painless writing pass (#47140 ) This is the first part of a series to allow nodes to write all of their appropriate pieces to the class. Currently, nodes must add their bindings, constants, and functions to the main SClass node for delayed writing. This instead adds a Painless version of ClassWriter to the write pass. The Painless ClassWriter contains an appropriate ClassVisitor that can be accessed in any node during the process along with access to the clinit method, and finally a shortcut for creating a new MethodWriter. The next step will be removing the delayed writing in SClass, and instead, delegate all writing responsibilities to the nodes.	2019-09-27 11:04:15 -07:00
Jack Conradson	9b4f377474	Change Painless function node to use a block instead of raw statements (#46884 ) This change improves the node structure of SFunction. SFunction now uses an SBlock instead of a List of AStatments reducing code duplication and gives a future target for symbol table scoping.	2019-09-26 10:33:35 -07:00

5736 Commits