OpenSearch

Commit Graph

Author	SHA1	Message	Date
Alan Woodward	71b703edd1	Rename AtomicFieldData to LeafFieldData (#53554 ) This conforms with lucene's LeafReader naming convention, and matches other per-segment structures in elasticsearch.	2020-03-17 12:30:12 +00:00
Nik Everett	f0beab4041	Stop using round-tripped PipelineAggregators (backport of #53423 ) (#53629 ) This begins to clean up how `PipelineAggregator`s and executed. Previously, we would create the `PipelineAggregator`s on the data nodes and embed them in the aggregation tree. When it came time to execute the pipeline aggregation we'd use the `PipelineAggregator`s that were on the first shard's results. This is inefficient because: 1. The data node needs to make the `PipelineAggregator` only to serialize it and then throw it away. 2. The coordinating node needs to deserialize all of the `PipelineAggregator`s even though it only needs one of them. 3. You end up with many `PipelineAggregator` instances when you only really need one per pipeline. 4. `PipelineAggregator` needs to implement serialization. This begins to undo these by building the `PipelineAggregator`s directly on the coordinating node and using those instead of the `PipelineAggregator`s in the aggregtion tree. In a follow up change we'll stop serializing the `PipelineAggregator`s to node versions that support this behavior. And, one day, we'll be able to remove `PipelineAggregator` from the aggregation result tree entirely. Importantly, this doesn't change how pipeline aggregations are declared or parsed or requested. They are still part of the `AggregationBuilder` tree because that makes sense.	2020-03-16 16:15:23 -04:00
bellengao	e2effa9fab	Fix inaccurate total hit count in _search template api (#53155 ) When 'rest_track_total_hits_as_int' is set to true, the total hits count in the response should be accurate. So we should set trackTotalHits to true if need when parsing the inline script of a search template request. Closes #52801	2020-03-16 11:48:43 +01:00
Mark Vieira	2f0aca992b	Revert "Upgrade to Jackson 2.10.3 and GeoIP2 to 2.13.1 (#53576 )" This reverts commit `b7dbadeea0`.	2020-03-15 18:10:40 -07:00
Jason Tedor	b7dbadeea0	Upgrade to Jackson 2.10.3 and GeoIP2 to 2.13.1 (#53576 ) This commit upgrades our Jackson dependency to 2.10.3 and our GeoIP2 dependency to 2.13.1. Relates #53523	2020-03-14 13:28:06 -04:00
Jason Tedor	32dd852210	Update jackson-databind to 2.8.11.6 (#53522 ) This commit upgrades the jackson-databind depdendency to 2.8.11.6. Additionally, we revert a previous change that put ingest-geoip on the version of jackson-databind from the version properties file. This is because upgrading ingest-geoip to a later version of jackson-databind also requires an upgrade to the geoip2 dependency which is currently blocked. Therefore, if we can get to a point where we otherwise upgrade our Jackson dependencies, we do not want ingest-geoip to automatically come along with it.	2020-03-12 20:15:13 -04:00
Alan Woodward	5c861cfe6e	Upgrade to final lucene 8.5.0 snapshot (#53293 ) Lucene 8.5.0 release candidates are imminent. This commit upgrades master to use the latest snapshot to check that there are no last-minute bugs or regressions.	2020-03-10 09:32:59 +00:00
Nhat Nguyen	5476a49833	Revert "upgrade to lucene-snapshot-fa75139efea (#53150 ) (#53151 )" This reverts commit `058113aa42`.	2020-03-05 17:33:00 -05:00
Ignacio Vera	058113aa42	upgrade to lucene-snapshot-fa75139efea (#53150 ) (#53151 )	2020-03-05 10:04:05 +01:00
Jay Modi	c610e0893d	Introduce system index APIs for Kibana (#53035 ) This commit introduces a module for Kibana that exposes REST APIs that will be used by Kibana for access to its system indices. These APIs are wrapped versions of the existing REST endpoints. A new setting is also introduced since the Kibana system indices' names are allowed to be changed by a user in case multiple instances of Kibana use the same instance of Elasticsearch. Additionally, the ThreadContext has been extended to indicate that the use of system indices may be allowed in a request. This will be built upon in the future for the protection of system indices. Backport of #52385	2020-03-03 14:11:36 -07:00
Alan Woodward	3759063d34	Allow specifying an exclusive set of fields on ObjectParser (#52893 ) ObjectParser allows you to declare a set of required fields, such that at least one of the set must appear in an xcontent object for it to be valid. This commit adds the similar concept of a set of exclusive fields, such that at most one of the set must be present. It also enables required fields on ConstructingObjectParser, and re-implements PercolateQueryBuilder.fromXContent() to use object parsing as an example of how this works.	2020-03-03 10:56:20 +00:00
Nhat Nguyen	e6755afeeb	Upgrade to Lucene 8.5.0-snapshot-c4475920b08 (#52950 ) (#52977 ) To give LUCENE-9228 more CI cycles	2020-02-29 09:29:16 -05:00
Nik Everett	1d1956ee93	Add size support to `top_metrics` (backport of #52662 ) (#52914 ) This adds support for returning the top "n" metrics instead of just the very top. Relates to #51813	2020-02-27 16:12:52 -05:00
Josh Devins	68ba571f70	Adds recall@k metric to rank eval API (#52889 ) This change adds the recall@k metric and refactors precision@k to match the new metric. Recall@k is an important metric to use for learning to rank (LTR) use-cases. Candidate generation or first ranking phase ranking functions are often optimized for high recall, in order to generate as many relevant candidates in the top-k as possible for a second phase of ranking. Adding this metric allows tuning that base query for LTR. See: https://github.com/elastic/elasticsearch/issues/51676 Backports: https://github.com/elastic/elasticsearch/pull/52577	2020-02-27 16:04:24 +01:00
Dan Hermann	3c8b46a8c1	[7.x] Handle errors when evaluating if conditions in processors (#52892 )	2020-02-27 09:00:51 -06:00
Adrien Grand	1807f86751	Generalize how queries on `_index` are handled at rewrite time (#52815 ) Generalize how queries on `_index` are handled at rewrite time (#52486) Since this change refactors rewrites, I also took it as an opportunity to adrress #49254: instead of returning the same queries you would get on a keyword field when a field is unmapped, queries get rewritten to a MatchNoDocsQueryBuilder. This change exposed a couple bugs, like the fact that the percolator doesn't rewrite queries at query time, or that the significant_terms aggregation doesn't rewrite its inner filter, which I fixed. Closes #49254	2020-02-26 15:37:43 +01:00
Alan Woodward	a76ec765e5	Ensure that percolator sorting also works (#52758 ) Commit #52748 fixed a bug where percolate queries wrapped in a constant score could report incorrect matches. This commit adds a test to check that it also fixes the case where a percolate query is sorted by something other than score. Closes #52618	2020-02-26 10:52:04 +00:00
Alan Woodward	638f3e4183	Use ByteBuffersDirectory rather than RAMDirectory (#52768 ) Lucene's RAMDirectory has been deprecated. This commit replaces all uses of RAMDirectory in elasticsearch with the newer ByteBuffersDirectory. Most uses are in tests, but the percolator and painless executor may get some small speedups.	2020-02-25 15:46:35 +00:00
Henning Andersen	3ad1783a41	Delete by query test on low free disk block (#52759 ) The block setup by the test could be released by the nodes cluster info thread before the disk threshold decider was disabled, now disable decider first.	2020-02-25 15:54:30 +01:00
Alan Woodward	18663b0a85	Don't index ranges including NOW in percolator (#52748 ) Currently, date ranges queries using NOW-based date math are rewritten to MatchAllDocs queries when being preprocessed for the percolator. However, since we added the verification step, this can result in incorrect matches when percolator queries are run without scores. This commit changes things to instead wrap date queries that use NOW with a new DateRangeIncludingNowQuery. This is a simple wrapper query that returns its delegate at rewrite time, but it can be detected by the percolator QueryAnalyzer and be dealt with accordingly. This also allows us to remove a method on QueryRewriteContext, and push all logic relating to NOW-based ranges into the DateFieldMapper. Fixes #52617	2020-02-25 12:18:16 +00:00
Mayya Sharipova	034b1c0ba3	Correct boost calculation in script_score query (#52478 ) (#52724 ) Before boost in script_score query was wrongly applied only to the subquery. This commit makes sure that the boost is applied to the whole score that comes out of script. Closes #48465	2020-02-24 13:48:21 -05:00
bellengao	02cb5b6c0e	Return 429 status code on read_only_allow_delete index block (#50166 ) We consider index level read_only_allow_delete blocks temporary since the DiskThresholdMonitor can automatically release those when an index is no longer allocated on nodes above high threshold. The rest status has therefore been changed to 429 when encountering this index block to signal retryability to clients. Related to #49393	2020-02-22 16:24:25 +01:00
Jay Modi	f3f6ff97ee	Single instance of the IndexNameExpressionResolver (#52604 ) This commit modifies the codebase so that our production code uses a single instance of the IndexNameExpressionResolver class. This change is being made in preparation for allowing name expression resolution to be augmented by a plugin. In order to remove some instances of IndexNameExpressionResolver, the single instance is added as a parameter of Plugin#createComponents and PersistentTaskPlugin#getPersistentTasksExecutor. Backport of #52596	2020-02-21 07:50:02 -07:00
markharwood	96d603979b	Upgrade Lucene to 8.5.0-snapshot-b01d7cb (#52584 ) Upgrading 7x to same Lucene 8.5 version used in master	2020-02-21 10:25:03 +00:00
Rory Hunter	8fb9bed078	Fix compilation error	2020-02-20 14:39:44 +00:00
Maria Ralli	ba8d6d1fb5	Remove Xlint exclusions from gradle files Backport of #52542. This commit is part of issue #40366 to remove disabled Xlint warnings from gradle files. In particular, it removes the Xlint exclusions from the following files: - benchmarks/build.gradle - client/client-benchmark-noop-api-plugin/build.gradle - x-pack/qa/rolling-upgrade/build.gradle - x-pack/qa/third-party/active-directory/build.gradle - modules/transport-netty4/build.gradle For the first three files no code adjustments were needed. For x-pack/qa/third-party/active-directory move the suppression at the code level. For transport-netty4 replace the variable arguments with ArrayLists and remove any redundant casts.	2020-02-20 14:12:05 +00:00
Tim Brooks	e752221fc6	Upgrade netty to 4.1.45.Final (#51689 ) Upgrade netty.	2020-02-18 09:11:29 -07:00
Nik Everett	146def8caa	Implement top_metrics agg (#51155 ) (#52366 ) The `top_metrics` agg is kind of like `top_hits` but it only works on doc values so it should be faster. At this point it is fairly limited in that it only supports a single, numeric sort and a single, numeric metric. And it only fetches the "very topest" document worth of metric. We plan to support returning a configurable number of top metrics, requesting more than one metric and more than one sort. And, eventually, non-numeric sorts and metrics. The trick is doing those things fairly efficiently. Co-Authored by: Zachary Tong <zach@elastic.co>	2020-02-14 11:19:11 -05:00
Marios Trivyzas	ea6f0e39bc	[Tests] Update skip version for YAML tests (#52310 ) Update skip versions upper boundary to match the release or intented release version of the feature/fix.	2020-02-13 15:36:31 +01:00
Marios Trivyzas	dac720d7a1	Add a cluster setting to disallow expensive queries (#51385 ) (#52279 ) Add a new cluster setting `search.allow_expensive_queries` which by default is `true`. If set to `false`, certain queries that have usually slow performance cannot be executed and an error message is returned. - Queries that need to do linear scans to identify matches: - Script queries - Queries that have a high up-front cost: - Fuzzy queries - Regexp queries - Prefix queries (without index_prefixes enabled - Wildcard queries - Range queries on text and keyword fields - Joining queries - HasParent queries - HasChild queries - ParentId queries - Nested queries - Queries on deprecated 6.x geo shapes (using PrefixTree implementation) - Queries that may have a high per-document cost: - Script score queries - Percolate queries Closes: #29050 (cherry picked from commit a8b39ed842c7770bd9275958c9f747502fd9a3ea)	2020-02-12 22:56:14 +01:00
Nhat Nguyen	257eb0212c	Mute ‘test user agent processor with non-ECS schema’ Tracked at #52266	2020-02-12 10:27:18 -05:00
Ignacio Vera	80e3c97210	Upgrade to lucene-8.5.0-snapshot-d62f6307658 (#52039 ) (#52130 )	2020-02-10 10:13:22 +01:00
Jay Modi	3edadfefd0	RestHandlers declare handled routes (#52123 ) This commit changes how RestHandlers are registered with the RestController so that a RestHandler no longer needs to register itself with the RestController. Instead the RestHandler interface has new methods which when called provide information about the routes (method and path combinations) that are handled by the handler including any deprecated and/or replaced combinations. This change also makes the publication of RestHandlers safe since they no longer publish a reference to themselves within their constructors. Closes #51622 Co-authored-by: Jason Tedor <jason@tedor.me> Backport of #51950	2020-02-09 22:48:32 -07:00
Ioannis Kakavas	8c0b49cd32	Adjust jarHell and 3rd party audit exclusions (#51733 ) (#51766 ) Now that the FIPS 140 security provider is simply a test dependency we don't need the thirdPartyAudit exceptions, but plugin-cli and transport-netty4 do need jarHell disabled as they use the non fips BouncyCastle security provider as a test dependency too.	2020-02-10 07:38:59 +02:00
Nhat Nguyen	6d0a0e1240	Revert "Mute ReindexFailureTests test" This reverts commit `16afbf91bb`. The issue was fixed in #52099	2020-02-09 22:13:36 -05:00
Przemko Robakowski	8cf47aca7e	[7.x] Improve Painless compilation performance for nested conditionals (#52056 ) (#52074 ) * Improve Painless compilation performance for nested conditionals (#52056) This PR changes how conditional expression is handled in `PainlessParser` in a way that avoids the need for backtracking, which led to exponential compilation times in case of nested conditionals. The test was added ensures that we can compile deeply nested conditionals. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> * Fix Map.of in Java8 Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-02-07 21:13:25 +01:00
Przemko Robakowski	c827f6f440	Avoid clash between source field and header field in CsvProcessorTests (#51962 ) (#52070 ) This change fixes flakiness in `CsvProcessorTests` where source field can be the same as one of the headers used by tests which messes up asserts when we check that field is not present after processor run. Closes #50209	2020-02-07 21:00:39 +01:00
Julie Tibshirani	337d73a7c6	Rename MapperService#fullName to fieldType. The new name more accurately describes what the method returns.	2020-02-07 10:35:53 -08:00
Armin Braun	91e938ead8	Add Trace Logging of REST Requests (#51684 ) (#52015 ) Being able to trace log all REST requests to a node would make debugging a number of issues a lot easier.	2020-02-07 09:03:20 +01:00
Mark Vieira	16afbf91bb	Mute ReindexFailureTests test	2020-02-06 16:29:04 -08:00
Mark Vieira	bc7aff917e	Mute LangPainlessClientYamlTestSuiteIT context API tests (#51939 ) (#52011 )	2020-02-06 11:04:15 -08:00
Przemko Robakowski	6332de40b4	Add empty_value parameter to CSV processor (#51567 ) (#51966 ) * Add empty_value parameter to CSV processor This change adds `empty_value` parameter to the CSV processor. This value is used to fill empty fields. Fields will be skipped if this parameter is ommited. This behavior is the same for both quoted and unquoted fields. * docs updated * Fix compilation problem Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-02-05 23:35:52 +01:00
Henning Andersen	d0865b963e	Disable reindex test against 0.90 on mac (#51884 ) Follow-up to #51449 to also disable the test on mac. Closes #51617	2020-02-05 16:45:51 +01:00
Adrien Grand	ad9d2f1922	Move analysis/mappings stats to cluster-stats. (#51875 ) Closes #51138	2020-02-05 11:02:25 +01:00
Maria Ralli	8d3e73b3a0	Add host address to BindTransportException message (#51269 ) When bind fails, show the host address in addition to the port. This helps debugging cases with wrong "network.host" values. Closes #48001	2020-02-04 17:13:19 +00:00
Przemyslaw Gomulka	a6d24d6a46	Fix ingest timezone logic backport(#51215 ) (#51802 ) when a timezone is not provided Ingest logic should consider a time to be in a timezone provided as a parameter. When a timezone is provided Ingest should recalculate a time to the timezone provided as a parameter closes #51108 backport(#51215)	2020-02-03 14:17:43 +01:00
Ryan Ernst	21224caeaf	Remove comparison to true for booleans (#51723 ) While we use `== false` as a more visible form of boolean negation (instead of `!`), the true case is implied and the true value does not need to explicitly checked. This commit converts cases that have slipped into the code checking for `== true`.	2020-01-31 16:35:43 -08:00
Mayya Sharipova	16ef6a5785	Mute testEs090 and testEs090WithFunnyThrottle Mure tests that reproducilbly fail. Related issue #51617	2020-01-31 14:46:29 -05:00
Mayya Sharipova	42b885f050	Upgrade to lucene-8.5.0-snapshot-3333ce7da6d (#51749 ) Backport for #51327	2020-01-31 11:20:15 -05:00
Przemko Robakowski	a7f0c699cf	Fix ignore_missing in CsvProcessor (#51600 ) (#51609 ) This change fixes inverted logic around ignore_missing in CsvProcessor	2020-01-29 14:58:23 +01:00
Ioannis Kakavas	ba3051a50f	Mute Netty4ClientYamlTestSuiteIT in FIPS 140 (#51536 ) rest-api-spec/test/10_basic.yml would check that transport_types is `netty4` but we run FIPS 140 tests with default distribution and transport_types is `security4`	2020-01-29 08:16:47 +02:00
Gordon Brown	89c2834b24	Deprecate creation of dot-prefixed index names except for hidden and system indices (#49959 ) This commit deprecates the creation of dot-prefixed index names (e.g. .watches) unless they are either 1) a hidden index, or 2) registered by a plugin that extends SystemIndexPlugin. This is the first step towards more thorough protections for system indices. This commit also modifies several plugins which use dot-prefixed indices to register indices they own as system indices, and adds a plugin to register .tasks as a system index.	2020-01-28 10:01:16 -07:00
Henning Andersen	9085024e1d	Disable reindex against 0.90 on mac (#51449 ) We still test remote reindex against version 0.90. This failed on mac a few times and rather than spend time investigating this, we no longer test remote reindex against 0.90 on mac. Closes #51202	2020-01-27 12:42:12 +01:00
Ioannis Kakavas	ee202a642f	Enable tests in FIPS 140 in JDK 11 (#49485 ) This change changes the way to run our test suites in JVMs configured in FIPS 140 approved mode. It does so by: - Configuring any given runtime Java in FIPS mode with the bundled policy and security properties files, setting the system properties java.security.properties and java.security.policy with the == operator that overrides the default JVM properties and policy. - When runtime java is 11 and higher, using BouncyCastle FIPS Cryptographic provider and BCJSSE in FIPS mode. These are used as testRuntime dependencies for unit tests and internal clusters, and copied (relevant jars) explicitly to the lib directory for testclusters used in REST tests - When runtime java is 8, using BouncyCastle FIPS Cryptographic provider and SunJSSE in FIPS mode. Running the tests in FIPS 140 approved mode doesn't require an additional configuration either in CI workers or locally and is controlled by specifying -Dtests.fips.enabled=true	2020-01-27 11:14:52 +02:00
Przemko Robakowski	3fb7ad0e67	[7.x] Refactor ForEachProcessor to use iteration instead of recursion (#51104 ) (#51322 ) * Refactor ForEachProcessor to use iteration instead of recursion (#51104) * Refactor ForEachProcessor to use iteration instead of recursion This change makes ForEachProcessor iterative and still non-blocking. In case of non-async processors we use single for loop and no recursion at all. In case of async processors we continue work on either current thread or thread started by downstream processor, whichever is slower (usually processor thread). Everything is synchronised by single atomic variable. Relates #50514 * Update IngestCommonPlugin.java	2020-01-22 20:03:37 +01:00
Stuart Tettemer	41c15b438d	Scripting: Add char position of script errors (#51069 ) (#51266 ) Add the character position of a scripting error to error responses. The contents of the `position` field are experimental and subject to change. Currently, `offset` refers to the character location where the error was encountered, `start` and `end` define a range of characters that contain the error. eg. ``` { "error": { "root_cause": [ { "type": "script_exception", "reason": "runtime error", "script_stack": [ "y = x;", " ^---- HERE" ], "script": "def x = new ArrayList(); Map y = x;", "lang": "painless", "position": { "offset": 33, "start": 29, "end": 35 } } ``` Refs: #50993	2020-01-21 13:45:59 -07:00
Nik Everett	ca15a3f5a8	Add "did you mean" to unknown queries (#51177 ) (#51254 ) This replaces the message we return for unknown queries with the standard one that we use for unknown fields from `ObjectParser`. This is nice because it includes "did you mean". One day we might convert parsing queries to using object parser, but that looks complex. This change is much smaller and seems useful.	2020-01-21 12:45:52 -05:00
Marios Trivyzas	fda25ed04a	Fix caching for PreConfiguredTokenFilter (#50912 ) (#51091 ) The PreConfiguredTokenFilter#singletonWithVersion uses the version internally for the token filter factories but it registers only one instance in the cache and not one instance per version. This can lead to exceptions like the one described in #50734 since the singleton is created and cached using the version created of the first index that is processed. Remove the singletonWithVersion() methods and use the elasticsearchVersion() methods instead. Fixes: #50734 (cherry picked from commit 24e1858)	2020-01-16 13:58:02 +01:00
Martijn van Groningen	02dfd71efa	Backport: Add pipeline name to ingest metadata (#51050 ) Backport: #50467 This commit adds the name of the current pipeline to ingest metadata. This pipeline name is accessible under the following key: '_ingest.pipeline'. Example usage in pipeline: PUT /_ingest/pipeline/2 { "processors": [ { "set": { "field": "pipeline_name", "value": "{{_ingest.pipeline}}" } } ] } Closes #42106	2020-01-16 10:50:47 +01:00
Nik Everett	fc5fde7950	Add "did you mean" to ObjectParser (#50938 ) (#50985 ) Check it out: ``` $ curl -u elastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_update/foo?pretty -d'{ "dac": {} }' { "error" : { "root_cause" : [ { "type" : "x_content_parse_exception", "reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?" } ], "type" : "x_content_parse_exception", "reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?" }, "status" : 400 } ``` The tricky thing about implementing this is that x-content doesn't depend on Lucene. So this works by creating an extension point for the error message using SPI. Elasticsearch's server module provides the "spell checking" implementation. s	2020-01-14 17:53:41 -05:00
Christoph Büscher	2f13751bad	Deprecate and remove camel-case nGram and edgeNGram tokenizers (#50862 ) (#50991 ) We deprecated and removed the camel-case versions of the nGram and edgeNGram filters a while ago and we should do the same with the nGram and edgeNGram tokenizers. This PR deprecates the use of these names in favour of ngram and edge_ngram in 7. Usage will be disallowed on new indices starting with 8 then.	2020-01-14 21:42:34 +01:00
Alan Woodward	4974f56b25	Fix analysis BWC tests - warnings now emitted on index creation	2020-01-14 14:48:40 +00:00
Alan Woodward	8c16725a0d	Check for deprecations when analyzers are built (#50908 ) Generally speaking, deprecated analysis components in elasticsearch will issue deprecation warnings when they are first used. However, this means that no warnings are emitted when indexes are created with deprecated components, and users have to actually index a document to see warnings. This makes it much harder to see these warnings and act on them at appropriate times. This is worse in the case where components throw exceptions on upgrade. In this case, users will not be aware of a problem until a document is indexed, instead of at index creation time. This commit adds a new check that pushes an empty string through all user-defined analyzers and normalizers when an IndexAnalyzers object is built for each index; deprecation warnings and exceptions are now emitted when indexes are created or opened. Fixes #42349	2020-01-14 13:52:02 +00:00
Jake Landis	de6f132887	[7.x] Foreach processor - fork recursive call (#50514 ) (#50773 ) A very large number of recursive calls can cause a stack overflow exception. This commit forks the recursive calls for non-async processors. Once forked, each thread will handle at most 10 recursive calls to help keep the stack size and thread count down to a reasonable size.	2020-01-09 13:21:18 -06:00
Christoph Büscher	b1b4282273	Make Multiplexer inherit filter chains analysis mode (#50662 ) Currently, if an updateable synonym filter is included in a multiplexer filter, it is not reloaded via the _reload_search_analyzers because the multiplexer itself doesn't pass on the analysis mode of the filters it contains, so its not recognized as "updateable" in itself. Instead we can check and merge the AnalysisMode settings of all filters in the multiplexer and use the resulting mode (e.g. search-time only) for the multiplexer itself, thus making any synonym filters contained in it reloadable. This, of course, will also make the analyzers using the multiplexer be usable at search-time only. Closes #50554	2020-01-08 22:12:01 +01:00
Henning Andersen	125feecabc	Guess root cause support unwrap (#50525 ) (#50742 ) ElasticsearchException.guessRootCauses would return wrapper exception if inner exception was not an ElasticsearchException. Fixed to never return wrapper exceptions. At least following APIs change root_cause.0.type as a result: _update with bad script _index with bad pipeline Relates #50417	2020-01-08 19:09:14 +01:00
Adrien Grand	4f2299c714	Upgrade to Lucene 8.4.0. (#50518 ) (#50750 )	2020-01-08 18:53:59 +01:00
Adrien Grand	31158ab3d5	Add per-field metadata. (#50333 ) This PR adds per-field metadata that can be set in the mappings and is later returned by the field capabilities API. This metadata is completely opaque to Elasticsearch but may be used by tools that index data in Elasticsearch to communicate metadata about fields with tools that then search this data. A typical example that has been requested in the past is the ability to attach a unit to a numeric field. In order to not bloat the cluster state, Elasticsearch requires that this metadata be small: - keys can't be longer than 20 chars, - values can only be numbers or strings of no more than 50 chars - no inner arrays or objects, - the metadata can't have more than 5 keys in total. Given that metadata is opaque to Elasticsearch, field capabilities don't try to do anything smart when merging metadata about multiple indices, the union of all field metadatas is returned. Here is how the meta might look like in mappings: ```json { "properties": { "latency": { "type": "long", "meta": { "unit": "ms" } } } } ``` And then in the field capabilities response: ```json { "latency": { "long": { "searchable": true, "aggreggatable": true, "meta": { "unit": [ "ms" ] } } } } ``` When there are no conflicts, values are arrays of size 1, but when there are conflicts, Elasticsearch includes all unique values in this array, without giving ways to know which index has which metadata value: ```json { "latency": { "long": { "searchable": true, "aggreggatable": true, "meta": { "unit": [ "ms", "ns" ] } } } } ``` Closes #33267	2020-01-08 16:21:18 +01:00
Alexander Reelsen	71054d269b	Sync grok patterns with logstash patterns (#50381 ) In order to ensure that logstash and Elasticsearch are able to understand the same patterns, this commit adapts to changes in logstash, adds a few patterns and changes a few.	2020-01-08 14:59:34 +01:00
Mayya Sharipova	0b7309ec9c	Fix NPE bug inner_hits (#50709 ) When there several subqueries on different relations of the join field, and only one of subqueries is using inner_hits, NPE occurs. This PR prevents NPE error. Closes #50539	2020-01-07 14:21:54 -05:00
Alan Woodward	a3ab7eb95d	Correctly handle MSM for nested disjunctions (#50669 ) With the rewrite of the percolator's QueryAnalyzer to use lucene's QueryVisitor API, term queries that are direct children of a boolean query are handled separately from other children. This works fine for conjunctions, but for disjunctions we need to treat the extracted terms from these direct descendents along with extractions from more deeply nested children to ensure that minimum-should-match requirements are met correctly. This commit changes the logic in QueryAnalyzer#getResult() to bundle child term results with all other results before handling them. Fixes #50305	2020-01-07 09:32:30 +00:00
Nik Everett	45663ac1a8	Use Void context on parsers where possible (#50573 ) (#50617 ) Most of our parsing can be done without passing any extra context into the parser that isn't already part of the xcontent stream. While I was looking around at the places that do need a context I found a few places that were declared to need a context but don't actually need it.	2020-01-03 13:28:55 -05:00
Nik Everett	4d58656065	Declare remaining parsers `final` (#50571 ) (#50615 ) We have about 800 `ObjectParsers` in Elasticsearch, about 700 of which are final. This is probably the right way to declare them because in practice we never mutate them after they are built. And we certainly don't change the static reference. Anyway, this adds `final` to these parsers. I found the non-final parsers with this: ``` diff \ <(find . -type f -name '.java' -exec grep -iHe 'static.PARSER\s=' {} \+ \| sort) \ <(find . -type f -name '.java' -exec grep -iHe 'static.final.PARSER\s*=' {} \+ \| sort) \ 2>&1 \| grep '^<' ```	2020-01-03 11:48:11 -05:00
Nik Everett	b36a8ab141	Make some ObjectParsers final (#50471 ) (#50556 ) We have about 800 `ObjectParsers` in Elasticsearch, about 700 of which are final. This is probably the right way to declare them because in practice we never mutate them after they are built. And we certainly don't change the static reference. Anyway, this adds `final` to a bunch of these parsers, mostly the ones in xpack and their "paired" parsers in the high level rest client. I picked these just to have somewhere to break the up the change so it wouldn't be huge. I found the non-final parsers with this: ``` diff \ <(find . -type f -name '.java' -exec grep -iHe 'static.PARSER\s=' {} \+ \| sort) \ <(find . -type f -name '.java' -exec grep -iHe 'static.final.PARSER\s*=' {} \+ \| sort) \ 2>&1 \| grep '^<' ```	2020-01-02 10:47:38 -05:00
Christoph Büscher	6258d25458	Log deprecation for nGram and edgeNGram custom filters (#50376 ) (#50445 ) The camel-case `nGram` and `edgeNGram` filter names were deprecated in 6. We currently throw errors on new indices when they are used. However these errors are currently only thrown for pre-configured filters, adding them as custom filters doesn't trigger the warning and error. This change adds the appropriate deprecation warnings for `nGram` and `edgeNGram` respectively on version 7 indices. Relates #50360	2019-12-20 22:00:08 +01:00
Stuart Tettemer	f212994c16	[TEST] Unknown scripting annotations raise error (#50343 ) (#50346 ) Ensure that unknown annotations, such as typo'd `@nondeterministic`, will raise an exception.	2019-12-19 16:22:22 -07:00
Stuart Tettemer	689df1f28f	Scripting: ScriptFactory not required by compile (#50344 ) (#50392 ) Avoid backwards incompatible changes for 8.x and 7.6 by removing type restriction on compile and Factory. Factories may optionally implement ScriptFactory. If so, then they can indicate determinism and thus cacheability. Backport Relates: #49466	2019-12-19 12:50:25 -07:00
Stuart Tettemer	06a24f09cf	Scripting: Cache script results if deterministic (#50106 ) (#50329 ) Cache results from queries that use scripts if they use only deterministic API calls. Nondeterministic API calls are marked in the whitelist with the `@nondeterministic` annotation. Examples are `Math.random()` and `new Date()`. Refs: #49466	2019-12-18 13:00:42 -07:00
Przemko Robakowski	0efb241b3c	Fix flakiness in CsvProcessorTests (#50254 ) (#50256 ) There's flakiness in CsvProcesorTests, where tests fail if random document generator add field that should not be present. This change cleans generated document from these problematic fields. Closes #50209	2019-12-17 01:15:15 +01:00
Ignacio Vera	b5ec227de8	upgrade to lucene 8.4.0-snapshot-08b8d116f8f (#50129 ) (#50132 )	2019-12-12 13:13:37 +01:00
Armin Braun	6eee41e253	Remove Unused Single Delete in BlobStoreRepository (#50024 ) (#50123 ) * Remove Unused Single Delete in BlobStoreRepository There are no more production uses of the non-bulk delete or the delete that throws on missing so this commit removes both these methods. Only the bulk delete logic remains. Where the bulk delete was derived from single deletes, the single delete code was inlined into the bulk delete method. Where single delete was used in tests it was replaced by bulk deleting.	2019-12-12 11:17:46 +01:00
Przemko Robakowski	4619834b97	[7.x] CSV ingest processor (#49509 ) (#50083 ) * CSV ingest processor (#49509) This change adds new ingest processor that breaks line from CSV file into separate fields. By default it conforms to RFC 4180 but can be tweaked. Closes #49113	2019-12-11 23:06:05 +01:00
Jack Conradson	eb20db8a1c	Update Painless AST Catch Node (#50044 ) This makes two changes to the catch node: 1. Use SDeclaration to replace independent variable usage. 2. Use a DType to set a "minimum" exception type - this allows us to require users to continue using Exception as "minimum" type for catch blocks, but for us to internally catch Error/Throwable. This is a required step to removing custom try/catch blocks from SClass.	2019-12-10 12:56:34 -08:00
Adrien Grand	87e72156ce	Upgrade to lucene 8.4.0-snapshot-662c455. (#50016 ) (#50039 ) Lucene 8.4 is about to be released so we should check it doesn't cause problems with Elasticsearch.	2019-12-10 18:04:58 +01:00
Alan Woodward	3d8c2f9e18	Fix query analyzer logic for mixed conjunctions of terms and ranges (#49803 ) When the query analyzer examines a conjunction containing both terms and ranges, it should only include ranges in the minimum_should_match calculation if there are no other range queries on that same field within the conjunction. This is because we cannot build a selection query over disjoint ranges on the same field, and it is not easy to check if two range queries have an overlap. The current logic to calculate this just sets minimum_should_match to 1 or 0, dependent on whether or not the current range is over a field that has already been seen. However, this can be incorrect in the case that there are terms in the same match group which adjust the minimum_should_match downwards. Instead, the logic should be changed to match the terms extraction, whereby we adjust minimum_should_match downwards if we have already seen a range field. Fixes #49684	2019-12-10 11:01:52 +00:00
Przemko Robakowski	d7083a84f4	Allow list of IPs in geoip ingest processor (#49573 ) (#49947 ) * Allow list of IPs in geoip ingest processor This change lets you use array of IPs in addition to string in geoip processor source field. It will set array containing geoip data for each element in source, unless first_only parameter option is enabled, then only first found will be returned. Closes #46193	2019-12-07 00:19:09 +01:00
Stuart Tettemer	17cda5b2c0	Scripting: Groundwork for caching script results (#49895 ) (#49944 ) In order to cache script results in the query shard cache, we need to check if scripts are deterministic. This change adds a default method to the script factories, `isResultDeterministic() -> false` which is used by the `QueryShardContext`. Script results were never cached and that does not change here. Future changes will implement this method based on whether the results of the scripts are deterministic or not and therefore cacheable. Refs: #49466 Backport	2019-12-06 15:08:05 -07:00
Jake Landis	1c5a139968	Update jackson-databind to 2.8.11.4 (#49347 ) (#49937 )	2019-12-06 13:39:33 -06:00
Henning Andersen	1d3feaf18e	Reindex sort deprecation warning take 2 (#49855 ) (#49899 ) Moved the deprecation warning to ReindexValidator to ensure it runs early and works with resilient reindex. Also check that the warning is reported back for wait_for_completion=false. Follow-up to #49458	2019-12-06 09:44:36 +01:00
Jack Conradson	cd3744c0b7	Add nodes to handle types (#49785 ) This PR adds 3 nodes to handle types defined by a front-end creating a Painless AST. These types are decided with data immutability in mind - hence the reason for more than a single node.	2019-12-05 17:09:19 -08:00
Zachary Tong	fec882a457	Decouple pipeline reductions from final agg reduction (#45796 ) Historically only two things happened in the final reduction: empty buckets were filled, and pipeline aggs were reduced (since it was the final reduction, this was safe). Usage of the final reduction is growing however. Auto-date-histo might need to perform many reductions on final-reduce to merge down buckets, CCS may need to side-step the final reduction if sending to a different cluster, etc Having pipelines generate their output in the final reduce was convenient, but is becoming increasingly difficult to manage as the rest of the agg framework advances. This commit decouples pipeline aggs from the final reduction by introducing a new "top level" reduce, which should be called at the beginning of the reduce cycle (e.g. from the SearchPhaseController). This will only reduce pipeline aggs on the final reduce after the non-pipeline agg tree has been fully reduced. By separating pipeline reduction into their own set of methods, aggregations are free to use the final reduction for whatever purpose without worrying about generating pipeline results which are non-reducible	2019-12-05 16:11:54 -05:00
Jack Conradson	687c6648d9	Minor Painless Clean Up (#49844 ) This cleans up two minor things. - Cleans up style of == false - Pulls maxLoopCounter into a member variable instead of accessing CompilerSettings multiple times in the SFunction node	2019-12-05 12:20:07 -08:00
Stuart Tettemer	426c7a5e8f	Scripting: add available languages & contexts API (#49652 ) (#49815 ) Adds `GET /_script_language` to support Kibana dynamic scripting language selection. Response contains whether `inline` and/or `stored` scripts are enabled as determined by the `script.allowed_types` settings. For each scripting language registered, such as `painless`, `expression`, `mustache` or custom, available contexts for the language are included as determined by the `script.allowed_contexts` setting. Response format: ``` { "types_allowed": [ "inline", "stored" ], "language_contexts": [ { "language": "expression", "contexts": [ "aggregation_selector", "aggs" ... ] }, { "language": "painless", "contexts": [ "aggregation_selector", "aggs", "aggs_combine", ... ] } ... ] } ``` Fixes: #49463 Backport	2019-12-04 16:18:22 -07:00
Jack Conradson	dbf6183469	Remove extraneous pass (#49797 ) This removes the storeSettings pass where nodes in the AST could store information they needed out of CompilerSettings for use during later passes. CompilerSettings is part of ScriptRoot which is available during the analysis pass making the storeSettings pass redundant.	2019-12-04 12:18:04 -08:00
Armin Braun	91ac87d75b	Stop Allocating Buffers in CopyBytesSocketChannel (#49825 ) (#49832 ) * Stop Allocating Buffers in CopyBytesSocketChannel (#49825) The way things currently work, we read up to 1M from the channel and then potentially force all of it into the `ByteBuf` passed by Netty. Since that `ByteBuf` tends to by default be `64k` in size, large reads will force the buffer to grow, completely circumventing the logic of `allocHandle`. This seems like it could break `io.netty.channel.RecvByteBufAllocator.Handle#continueReading` since that method for the fixed-size allocator does check whether the last read was equal to the attempted read size. So if we set `64k` because that's what the buffer size is, then wirte `1M` to the buffer we will stop reading on the IO loop, even though the channel may still have bytes that we can read right away. More imporatantly though, this can lead to running OOM quite easily under IO pressure as we are forcing the heap buffers passed to the read to `reallocate`. Closes #49699	2019-12-04 19:36:52 +01:00
Armin Braun	996cddd98b	Stop Copying Every Http Request in Message Handler (#44564 ) (#49809 ) * Copying the request is not necessary here. We can simply release it once the response has been generated and a lot of `Unpooled` allocations that way * Relates #32228 * I think the issue that preventet that PR that PR from being merged was solved by #39634 that moved the bulk index marker search to ByteBuf bulk access so the composite buffer shouldn't require many additional bounds checks (I'd argue the bounds checks we add, we save when copying the composite buffer) * I couldn't neccessarily reproduce much of a speedup from this change, but I could reproduce a very measureable reduction in GC time with e.g. Rally's PMC (4g heap node and bulk requests of size 5k saw a reduction in young GC time by ~10% for me)	2019-12-04 08:41:42 +01:00
Jason Tedor	0f27c0b702	Extend systemd timeout during startup (#49784 ) When we are notifying systemd that we are fully started up, it can be that we do not notify systemd before its default timeout of sixty seconds elapses (e.g., if we are upgrading on-disk metadata). In this case, we need to notify systemd to extend this timeout so that we are not abruptly terminated. We do this by repeatedly sending EXTEND_TIMEOUT_USEC to extend the timeout by thirty seconds; we do this every fifteen seconds. This will prevent systemd from abruptly terminating us during a long startup. We cancel the scheduled execution of this notification after we have successfully started up.	2019-12-03 14:25:45 -05:00
Henning Andersen	5adb33ec17	Deprecate sorting in reindex (#49458 ) (#49738 ) Reindex sort never gave a guarantee about the order of documents being indexed into the destination, though it could give a sense of locality of source data. It prevents us from doing resilient reindex and other optimizations and it has therefore been deprecated. Related to #47567	2019-12-01 19:24:27 +01:00
Henning Andersen	1d745f1e5c	Revert "Deprecate sorting in reindex (#49458 )" This reverts commit `27d45c9f1f`.	2019-11-29 22:08:19 +01:00
Henning Andersen	27d45c9f1f	Deprecate sorting in reindex (#49458 ) Reindex sort never gave a guarantee about the order of documents being indexed into the destination, though it could give a sense of locality of source data. It prevents us from doing resilient reindex and other optimizations and it has therefore been deprecated. Related to #47567	2019-11-29 21:35:11 +01:00
Armin Braun	813b49adb4	Make BlobStoreRepository Aware of ClusterState (#49639 ) (#49711 ) * Make BlobStoreRepository Aware of ClusterState (#49639) This is a preliminary to #49060. It does not introduce any substantial behavior change to how the blob store repository operates. What it does is to add all the infrastructure changes around passing the cluster service to the blob store, associated test changes and a best effort approach to tracking the latest repository generation on all nodes from cluster state updates. This brings a slight improvement to the consistency by which non-master nodes (or master directly after a failover) will be able to determine the latest repository generation. It does not however do any tricky checks for the situation after a repository operation (create, delete or cleanup) that could theoretically be used to get even greater accuracy to keep this change simple. This change does not in any way alter the behavior of the blobstore repository other than adding a better "guess" for the value of the latest repo generation and is mainly intended to isolate the actual logical change to how the repository operates in #49060	2019-11-29 14:57:47 +01:00
Mayya Sharipova	2dafecc398	Upgrade lucene to 8.4.0-snapshot-e648d601efb (#49641 )	2019-11-28 11:59:58 -05:00
jimczi	35732504ba	#49166 Fix spurious test failure	2019-11-28 11:08:15 +01:00
Jim Ferenczi	d6445fae4b	Add a cluster setting to disallow loading fielddata on _id field (#49166 ) This change adds a dynamic cluster setting named `indices.id_field_data.enabled`. When set to `false` any attempt to load the fielddata for the `_id` field will fail with an exception. The default value in this change is set to `false` in order to prevent fielddata usage on this field for future versions but it will be set to `true` when backporting to 7x. When the setting is set to true (manually or by default in 7x) the loading will also issue a deprecation warning since we want to disallow fielddata entirely when https://github.com/elastic/elasticsearch/issues/26472 is implemented. Closes #43599	2019-11-28 09:35:28 +01:00
Martijn van Groningen	0a42395dfa	Backport: add templating support to pipeline processor (#49643 ) Backport of #49030 This commit adds templating support to the pipeline processor's `name` option. Closes #39955	2019-11-27 15:53:40 +01:00
Przemyslaw Gomulka	502873b144	[Java.time] Retain prefixed date pattern in formatter (#48703 ) JavaDateFormatter should keep the pattern with the prefixed 8 as it will be used for serialisation. The stripped pattern should be used for the enclosed formatters. closes #48698	2019-11-27 12:29:18 +01:00
Yannick Welsch	bd007271cf	Avoid double-wrapping allocator (#49534 ) When using unpooled, the allocator is wrapped twice in a NoDirectBuffers.	2019-11-27 09:25:32 +01:00
Martijn van Groningen	90850f4ea0	Backport: Introduce on_failure_pipeline ingest metadata inside on_failure block (#49596 ) Backport of #49076 In case an exception occurs inside a pipeline processor, the pipeline stack is kept around as header in the exception. Then in the on_failure processor the id of the pipeline the exception occurred is made accessible via the `on_failure_pipeline` ingest metadata. Closes #44920	2019-11-27 07:52:08 +01:00
Jason Tedor	71bcfbf1e3	Replace required pipeline with final pipeline (#49470 ) This commit enhances the required pipeline functionality by changing it so that default/request pipelines can also be executed, but the required pipeline is always executed last. This gives users the flexibility to execute their own indexing pipelines, but also ensure that any required pipelines are also executed. Since such pipelines are executed last, we change the name of required pipelines to final pipelines.	2019-11-22 14:37:36 -05:00
Henning Andersen	49bb5fb642	Netty4: switch to composite cumulator (#49478 ) The default merge cumulator used in netty transport leads to additional GC pressure and memory copying when a message that exceeds the chunk size is handled. This is especially a problem on G1 GC, since we get many "humongous" allocations and that can in theory cause real memory circuit breaker to break unnecessarily.	2019-11-22 18:14:10 +01:00
Martijn van Groningen	2243743450	Update geolite2 database in ingest geoip plugin. (#49308 ) Some tests were tweaked to deal with the updated database files.	2019-11-22 08:38:57 +01:00
Henning Andersen	0164de8579	Reindex search response fix again (#49423 ) Fixed test case to more broadly accept all messages with "Partial shards failure" in it, to hopefully catch all relevant search messages now that reindex does not allow searching against red shards. Closes #49295	2019-11-21 11:45:08 +01:00
Jack Conradson	a780ec14f0	Painless: Upgrade ASM to 7.2 (#49263 ) This upgrades Painless to use the latest ASM libraries providing support up to Java 14. Note the library is not published with the latest versions in an "all" package, so we pick up each lib independently that's required. There were some changes to the getType method that require descriptors to be used in place of internal class names.	2019-11-20 07:09:47 -08:00
Christoph Büscher	4ffa050735	Allow custom characters in token_chars of ngram tokenizers (#49250 ) Currently the `token_chars` setting in both `edgeNGram` and `ngram` tokenizers only allows for a list of predefined character classes, which might not fit every use case. For example, including underscore "_" in a token would currently require the `punctuation` class which comes with a lot of other characters. This change adds an additional "custom" option to the `token_chars` setting, which requires an additional `custom_token_chars` setting to be present and which will be interpreted as a set of characters to inlcude into a token. Closes #25894	2019-11-20 10:37:12 +01:00
Alan Woodward	c6b31162ba	Refactor percolator's QueryAnalyzer to use QueryVisitors Lucene now allows us to explore the structure of a query using QueryVisitors, delegating the knowledge of how to recurse through and collect terms to the query implementations themselves. The percolator currently has a home-grown external version of this API to construct sets of matching terms that must be present in a document in order for it to possibly match the query. This commit removes the home-grown implementation in favour of one using QueryVisitor. This has the added benefit of making interval queries available for percolator pre-filtering. Due to a bug in multi-term intervals (LUCENE-9050) it also includes a clone of some of the lucene intervals logic, that can be removed once upstream has been fixed. Closes #45639	2019-11-20 09:21:01 +00:00
Mark Tozzi	17358b5af7	(refactor) Extract Empty/Script/Missing ValuesSource behavior to an interface (#48320 ) (#49330 ) This is a pure code rearrangement refactor. Logic for what specific ValuesSource instance to use for a given type (e.g. script or field) moved out of ValuesSourceConfig and into CoreValuesSourceType (previously just ValueSourceType; we extract an interface for future extensibility). ValueSourceConfig still selects which case to use, and then the ValuesSourceType instance knows how to construct the ValuesSource for that case.	2019-11-19 16:44:29 -05:00
Ryan Ernst	c6a8913c38	Fix java home validation usage by tasks (#49204 ) Tasks intending to use a particular java home provided by JAVA<N>_HOME use the getJavaHome method, which verifies the given java home is available, or will be if the task will run. However, the verification logic was broken, in addition to unnecessarily delaying retrieving the java home until runtime. This commit fixes the verification logic to run at either config time, delaying verification, or at runtime which immediately checks if java home is available. closes #49153	2019-11-19 10:30:19 -08:00
Henning Andersen	bc29c9877a	Reindex search response fix (#49301 ) Fixed test case to also accept another error message, now that reindex does not allow searching against red shards. Closes #49295	2019-11-19 14:38:05 +01:00
Tanguy Leroux	abed869ec6	Mute ReindexFailureTests.testResponseOnSearchFailure (#49298 ) Relates #49295	2019-11-19 12:38:54 +01:00
Henning Andersen	2ac38fd315	Reindex and friends fail on RED shards (#45830 ) Reindex, update by query and delete by query would silently disregard RED/unavailable shards, thus not copying, updating or deleting matching data in those shards. Now use `allow_partial_search_results=false` to ensure these operations fail if the search crosses an unavailable chard. Added the option to explicitly specify `allow_partial_search_results=true` for reindex only (seemed too strange for update/delete by query). Relates #45739 and #42612	2019-11-18 21:23:08 +01:00
gpaimla	7d20b50f45	Implement Lucene EstonianAnalyzer, Stemmer (#49149 ) This PR adds a new analyzer and stemmer for the Estonian language. Closes #48895	2019-11-18 17:24:21 +01:00
Jason Tedor	2bcdcb17cd	Introduce dedicated ingest processor exception (#48810 ) Today we wrap exceptions that occur while executing an ingest processor in an ElasticsearchException. Today, in ExceptionsHelper#unwrapCause we only unwrap causes for exceptions that implement ElasticsearchWrapperException, which the top-level ElasticsearchException does not. Ultimately, this means that any exception that occurs during processor execution does not have its cause unwrapped, and so its status is blanket treated as a 500. This means that while executing a bulk request with an ingest pipeline, document-level failures that occur during a processor will cause the status for that document to be treated as 500. Since that does not give the client any indication that they made a mistake, it means some clients will enter infinite retries, thinking that there is some server-side problem that merely needs to clear. This commit addresses this by introducing a dedicated ingest processor exception, so that its causes can be unwrapped. While we could consider a broader change to unwrap causes for more than just ElasticsearchWrapperExceptions, that is a broad change with unclear implications. Since the problem of reporting 500s on client errors is a user-facing bug, we take the conservative approach for now, and we can revisit the unwrapping in a future change.	2019-11-14 11:04:53 -05:00
Rory Hunter	c46a0e8708	Apply 2-space indent to all gradle scripts (#49071 ) Backport of #48849. Update `.editorconfig` to make the Java settings the default for all files, and then apply a 2-space indent to all `*.gradle` files. Then reformat all the files.	2019-11-14 11:01:23 +00:00
Henning Andersen	8835142ac9	Grok processor ignore case test (#48909 ) Added test demonstrating that grok using ignore case works, since this does a minimal test that the `joni` and `jcodings` libraries are compatible. Forward-port of test from #43334	2019-11-08 00:04:29 +01:00
Jason Tedor	c82ecb664c	Do not wrap ingest processor exception with IAE (#48816 ) The problem with wrapping here is that it converts any exception into an IAE, which we treat as a client error (400 status) whereas the exception being wrapped here could be a server error (e.g., NPE). This commit stops wrapping all ingest processor exceptions as IAEs.	2019-11-01 15:11:35 -04:00
Mark Vieira	6ab4645f4e	[7.x] Introduce type-safe and consistent pattern for handling build globals (#48818 ) This commit introduces a consistent, and type-safe manner for handling global build parameters through out our build logic. Primarily this replaces the existing usages of extra properties with static accessors. It also introduces and explicit API for initialization and mutation of any such parameters, as well as better error handling for uninitialized or eager access of parameter values. Closes #42042	2019-11-01 11:33:11 -07:00
Ioannis Kakavas	99aedc844d	Copy http headers to ThreadContext strictly (#45945 ) (#48675 ) Previous behavior while copying HTTP headers to the ThreadContext, would allow multiple HTTP headers with the same name, handling only the first occurrence and disregarding the rest of the values. This can be confusing when dealing with multiple Headers as it is not obvious which value is read and which ones are silently dropped. According to RFC-7230, a client must not send multiple header fields with the same field name in a HTTP message, unless the entire field value for this header is defined as a comma separated list or this specific header is a well-known exception. This commits changes the behavior in order to be more compliant to the aforementioned RFC by requiring the classes that implement ActionPlugin to declare if a header can be multi-valued or not when registering this header to be copied over to the ThreadContext in ActionPlugin#getRestHeaders. If the header is allowed to be multivalued, then all such headers are read from the HTTP request and their values get concatenated in a comma-separated string. If the header is not allowed to be multivalued, and the HTTP request contains multiple such Headers with different values, the request is rejected with a 400 status.	2019-10-31 23:05:12 +02:00
Dan Hermann	dbc05cd808	Add option to split processor for preserving trailing empty fields (#48685 )	2019-10-30 08:25:03 -05:00
Yogesh Gaikwad	9ed7352a12	Add Sysprop to Adjust IO Buffer Size (#48267 ) (#48667 ) The 1MB IO-buffer size per transport thread is causing trouble in some tests, albeit at a low rate. Reducing the number of transport threads was not enough to fully fix this situation. Allowing to configure the size of the buffer and reducing it by more than an order of magnitude should fix these tests. Closes #46803	2019-10-30 14:19:54 +11:00
Christoph Büscher	09d68e7548	Support `search_type` in Rank Evaluation API (#48542 ) (#48631 ) Adding support for the `search_type` request parameter to the Ranking Evaluation API since this parameter can impact the ranking and the metric score and should be choosen in the same way when evaluating the search as later in the real search. Closes #48503	2019-10-29 14:54:33 +01:00
Rory Hunter	3c77c50f5f	Improve resiliency to auto-formatting in libs, modules (#48619 ) Backport of #48448. Make a number of changes so that code in the libs and modules directories are more resilient to automatic formatting. This covers: * Remove string concatenation where JSON fits on a single line * Move some comments around to they aren't auto-formatted to a strange place	2019-10-29 10:39:34 +00:00
Tim Brooks	45e42f4e18	Upgrade to Netty 4.1.43 (#48484 ) With this update we can remove the mitigation in our custom allocator which forces heap buffer allocations.	2019-10-25 10:17:25 -06:00
Tim Brooks	c0b545f325	Make BytesReference an interface (#48486 ) BytesReference is currently an abstract class which is extended by various implementations. This makes it very difficult to use the delegation pattern. The implication of this is that our releasable BytesReference is a PagedBytesReference type and cannot be used as a generic releasable bytes reference that delegates to any reference type. This commit makes BytesReference an interface and introduces an AbstractBytesReference for common functionality.	2019-10-24 15:39:30 -06:00
Michael Basnight	c19379ef31	Remove random when using HLRC sync and async calls (#48211 ) This commit removes the randomization used by every execute call in the high level rest tests. Previously every execute call, which can be many calls per single test, would rely on a random boolean to determine if they should use the sync or async methods provided to the execute method. This commit runs the tests twice, using two different clusters, both of them providing the value one time via a sysprop. This ensures that the whole suite of tests is run using the sync and async code paths. Closes #39667	2019-10-24 09:06:17 -05:00
Martijn van Groningen	b034153df7	Change grok watch dog to be Matcher based instead of thread based. (#48346 ) There is a watchdog in order to avoid long running (and expensive) grok expressions. Currently the watchdog is thread based, threads that run grok expressions are registered and after completion unregister. If these threads stay registered for too long then the watch dog interrupts these threads. Joni (the library that powers grok expressions) has a mechanism that checks whether the current thread is interrupted and if so abort the pattern matching. Newer versions have an additional method to abort long running pattern matching inside joni. Instead of checking the thread's interrupted flag, joni now also checks a volatile field that can be set via a `Matcher` instance. This is more efficient method for aborting long running matches. (joni checks each 30k iterations whether interrupted flag is set vs. just checking a volatile field) Recently we upgraded to a recent joni version (#47374), and this PR is a followup of that PR. This change should also fix #43673, since it appears when unit tests are ran the a test runner thread's interrupted flag may already have been set, due to some thread reuse.	2019-10-24 15:34:01 +02:00
Tim Brooks	c1f6aff5bb	Remove default netty allocator empty assertions (#48356 ) This commit removes a problematic assertion that the netty default allocator is not used. This assertion is problematic because any other test can cause this task to fail by touching the default allocator. We assert that we are using heap buffers in the channel.	2019-10-22 20:22:32 -06:00
Tim Brooks	547e399dbf	Remove option to enable direct buffer pooling (#48310 ) This commit removes the option to change the netty system properties to reenable the direct buffer pooling. It also removes the need for us to disable the buffer pooling in the system properties file. Instead, we programmatically craete an allocator that is used by our networking layer. This commit does introduce an Elasticsearch property which allows the user to fallback on the netty default allocator. If they choose this option, they can configure the default allocator how they wish using the standard netty properties.	2019-10-21 19:15:50 -06:00
Ignacio Vera	b1224fca8c	upgrade to Lucene-8.3.0-snapshot-25968e3b75e (#48227 )	2019-10-21 08:21:09 +02:00
Alexander Reelsen	66581d8158	update ingest-user-agent regexes.yml (#47807 ) This new regexes are from: `154eba17f5/regexes.yaml`	2019-10-18 16:26:48 +02:00
Jack Conradson	155ecd0a76	Change Painless regex node to use SField instead of Globals (#47944 ) * Change Painless regex node to use SField instead of Globals * Use reflection instead of ASM to specify modifiers * Remove synthetic from SField	2019-10-15 07:47:16 -07:00
jimczi	b858e19bcc	Revert #46598 that breaks the cachability of the sub search contexts.	2019-10-15 09:40:59 +02:00
Tim Brooks	8814bf07f1	Upgrade to Netty 4.1.42 (#48015 ) Upgrades the netty version.	2019-10-14 13:54:02 -06:00
Przemyslaw Gomulka	6ab58de7ef	[7.x] Enable ResolverStyle.STRICT for java formatters backport(#46675 ) (#47913 ) Joda was using ResolverStyle.STRICT when parsing. This means that date will be validated to be a correct year, year-of-month, day-of-month However, we also want to make it works with Year-Of-Era as Joda used to, hence custom temporalquery.localdate in DateFormatters.from Within DateFormatters we use the correct uuuu year instead of yyyy year of era worth noting: if yyyy(without an era) is used in code, the parsing result will be a TemporalAccessor which will fail to be converted into LocalDate. We mostly use DateFormatters.from so this takes care of this. If possible the uuuu format should be used.	2019-10-11 21:19:56 +02:00
Jim Ferenczi	bd6e2592a7	Remove the SearchContext from the highlighter context (#47733 ) Today built-in highlighter and plugins have access to the SearchContext through the highlighter context. However most of the information exposed in the SearchContext are not needed and a QueryShardContext would be enough to perform highlighting. This change replaces the SearchContext by the informations that are absolutely required by highlighter: a QueryShardContext and the SearchContextHighlight. This change allows to reduce the exposure of the complex SearchContext and remove the needs to clone it in the percolator sub phase. Relates #47198 Relates #46523	2019-10-10 10:34:10 +02:00
Jack Conradson	076d3073b5	Move binding member field generation to Painless semantic pass (#47739 ) This adds an SField node that operates similarly to SFunction as a top level node meant only for use in an SClass node. Member fields are generated for both class bindings and instance bindings using the new SField node during the semantic pass, and information is no longer passed through Globals for this during the write pass.	2019-10-09 10:24:53 -07:00
Tim Brooks	02622c1ef9	Fix issues with serializing BulkByScrollResponse (#45357 ) Currently there are two issues with serializing BulkByScrollResponse. First, when deserializing from XContent, indexing exceptions and search exceptions are switched. Additionally, search exceptions do no retain the appropriate RestStatus code, so you must evaluate the status code from the exception. However, the exception class is not always correctly retained when serialized. This commit adds tests in the failure case. Additionally, fixes the swapping of failure types and adds the rest status code to the search failure.	2019-10-09 10:12:14 -06:00
Alpar Torok	36d018c909	Convert RunTask to use testclusers, remove ClusterFormationTasks (#47572 ) * Convert RunTask to use testclusers, remove ClusterFormationTasks This PR adds a new RunTask and a way for it to start a testclusters cluster out of band and block on it to replace the old RunTask that used ClusterFormationTasks. With this we can now remove ClusterFormationTasks.	2019-10-08 14:43:29 +03:00
Jack Conradson	833ed30f0d	Modify Painless AST to add synthetic functions during semantic pass (#47611 ) This has ELambda and ENewArrayFunctionRef add their generated synthetic methods to the SClass node during the semantic pass and removes this data from the write pass. This is the first step to remove "Globals" (mutable state) from the write pass.	2019-10-07 07:48:51 -07:00
Jack Conradson	e3aab1295e	Add a ScriptRoot to consolidate global data necessary for multiple passes (#47532 ) This PR is to get plumbing in for a ScriptRoot class that will consolidate several pieces of state required by potentially multiple passes including PainlessLookup, CompilerSettings, FunctionTable, the root class node, and a synthetic counter. It's possible more may be added to this as we move forward and slowly make the the nodes have less mutable state.	2019-10-04 08:37:19 -07:00
Ryan Ernst	f32692208e	Add explanations to script score queries (#46693 ) (#47548 ) While function scores using scripts do allow explanations, they are only creatable with an expert plugin. This commit improves the situation for the newer script score query by adding the ability to set the explanation from the script itself. To set the explanation, a user would check for `explanation != null` to indicate an explanation is needed, and then call `explanation.set("some description")`.	2019-10-03 21:05:05 -07:00

1 2 3 4 5 ...

5543 Commits