OpenSearch

Commit Graph

Author	SHA1	Message	Date
Przemko Robakowski	0efb241b3c	Fix flakiness in CsvProcessorTests (#50254 ) (#50256 ) There's flakiness in CsvProcessorTests, where tests fail if the random document generator adds a field that should not be present. This change cleans the generated document of these problematic fields. Closes #50209	2019-12-17 01:15:15 +01:00
Ignacio Vera	b5ec227de8	upgrade to lucene 8.4.0-snapshot-08b8d116f8f (#50129 ) (#50132 )	2019-12-12 13:13:37 +01:00
Armin Braun	6eee41e253	Remove Unused Single Delete in BlobStoreRepository (#50024 ) (#50123 ) * Remove Unused Single Delete in BlobStoreRepository There are no more production uses of the non-bulk delete or the delete that throws on missing so this commit removes both these methods. Only the bulk delete logic remains. Where the bulk delete was derived from single deletes, the single delete code was inlined into the bulk delete method. Where single delete was used in tests it was replaced by bulk deleting.	2019-12-12 11:17:46 +01:00
Przemko Robakowski	4619834b97	[7.x] CSV ingest processor (#49509 ) (#50083 ) * CSV ingest processor (#49509) This change adds new ingest processor that breaks line from CSV file into separate fields. By default it conforms to RFC 4180 but can be tweaked. Closes #49113	2019-12-11 23:06:05 +01:00
Jack Conradson	eb20db8a1c	Update Painless AST Catch Node (#50044 ) This makes two changes to the catch node: 1. Use SDeclaration to replace independent variable usage. 2. Use a DType to set a "minimum" exception type - this allows us to require users to continue using Exception as "minimum" type for catch blocks, but for us to internally catch Error/Throwable. This is a required step to removing custom try/catch blocks from SClass.	2019-12-10 12:56:34 -08:00
Adrien Grand	87e72156ce	Upgrade to lucene 8.4.0-snapshot-662c455. (#50016 ) (#50039 ) Lucene 8.4 is about to be released so we should check it doesn't cause problems with Elasticsearch.	2019-12-10 18:04:58 +01:00
Alan Woodward	3d8c2f9e18	Fix query analyzer logic for mixed conjunctions of terms and ranges (#49803 ) When the query analyzer examines a conjunction containing both terms and ranges, it should only include ranges in the minimum_should_match calculation if there are no other range queries on that same field within the conjunction. This is because we cannot build a selection query over disjoint ranges on the same field, and it is not easy to check if two range queries have an overlap. The current logic to calculate this just sets minimum_should_match to 1 or 0, dependent on whether or not the current range is over a field that has already been seen. However, this can be incorrect in the case that there are terms in the same match group which adjust the minimum_should_match downwards. Instead, the logic should be changed to match the terms extraction, whereby we adjust minimum_should_match downwards if we have already seen a range field. Fixes #49684	2019-12-10 11:01:52 +00:00
Przemko Robakowski	d7083a84f4	Allow list of IPs in geoip ingest processor (#49573 ) (#49947 ) * Allow list of IPs in geoip ingest processor This change lets you use array of IPs in addition to string in geoip processor source field. It will set array containing geoip data for each element in source, unless first_only parameter option is enabled, then only first found will be returned. Closes #46193	2019-12-07 00:19:09 +01:00
Stuart Tettemer	17cda5b2c0	Scripting: Groundwork for caching script results (#49895 ) (#49944 ) In order to cache script results in the query shard cache, we need to check if scripts are deterministic. This change adds a default method to the script factories, `isResultDeterministic() -> false` which is used by the `QueryShardContext`. Script results were never cached and that does not change here. Future changes will implement this method based on whether the results of the scripts are deterministic or not and therefore cacheable. Refs: #49466 Backport	2019-12-06 15:08:05 -07:00
Jake Landis	1c5a139968	Update jackson-databind to 2.8.11.4 (#49347 ) (#49937 )	2019-12-06 13:39:33 -06:00
Henning Andersen	1d3feaf18e	Reindex sort deprecation warning take 2 (#49855 ) (#49899 ) Moved the deprecation warning to ReindexValidator to ensure it runs early and works with resilient reindex. Also check that the warning is reported back for wait_for_completion=false. Follow-up to #49458	2019-12-06 09:44:36 +01:00
Jack Conradson	cd3744c0b7	Add nodes to handle types (#49785 ) This PR adds 3 nodes to handle types defined by a front-end creating a Painless AST. These types are decided with data immutability in mind - hence the reason for more than a single node.	2019-12-05 17:09:19 -08:00
Zachary Tong	fec882a457	Decouple pipeline reductions from final agg reduction (#45796 ) Historically only two things happened in the final reduction: empty buckets were filled, and pipeline aggs were reduced (since it was the final reduction, this was safe). Usage of the final reduction is growing however. Auto-date-histo might need to perform many reductions on final-reduce to merge down buckets, CCS may need to side-step the final reduction if sending to a different cluster, etc. Having pipelines generate their output in the final reduce was convenient, but is becoming increasingly difficult to manage as the rest of the agg framework advances. This commit decouples pipeline aggs from the final reduction by introducing a new "top level" reduce, which should be called at the beginning of the reduce cycle (e.g. from the SearchPhaseController). This will only reduce pipeline aggs on the final reduce after the non-pipeline agg tree has been fully reduced. By separating pipeline reduction into its own set of methods, aggregations are free to use the final reduction for whatever purpose without worrying about generating pipeline results which are non-reducible.	2019-12-05 16:11:54 -05:00
Jack Conradson	687c6648d9	Minor Painless Clean Up (#49844 ) This cleans up two minor things. - Cleans up style of == false - Pulls maxLoopCounter into a member variable instead of accessing CompilerSettings multiple times in the SFunction node	2019-12-05 12:20:07 -08:00
Stuart Tettemer	426c7a5e8f	Scripting: add available languages & contexts API (#49652 ) (#49815 ) Adds `GET /_script_language` to support Kibana dynamic scripting language selection. Response contains whether `inline` and/or `stored` scripts are enabled as determined by the `script.allowed_types` settings. For each scripting language registered, such as `painless`, `expression`, `mustache` or custom, available contexts for the language are included as determined by the `script.allowed_contexts` setting. Response format: ``` { "types_allowed": [ "inline", "stored" ], "language_contexts": [ { "language": "expression", "contexts": [ "aggregation_selector", "aggs" ... ] }, { "language": "painless", "contexts": [ "aggregation_selector", "aggs", "aggs_combine", ... ] } ... ] } ``` Fixes: #49463 Backport	2019-12-04 16:18:22 -07:00
Jack Conradson	dbf6183469	Remove extraneous pass (#49797 ) This removes the storeSettings pass where nodes in the AST could store information they needed out of CompilerSettings for use during later passes. CompilerSettings is part of ScriptRoot which is available during the analysis pass making the storeSettings pass redundant.	2019-12-04 12:18:04 -08:00
Armin Braun	91ac87d75b	Stop Allocating Buffers in CopyBytesSocketChannel (#49825 ) (#49832 ) * Stop Allocating Buffers in CopyBytesSocketChannel (#49825) The way things currently work, we read up to 1M from the channel and then potentially force all of it into the `ByteBuf` passed by Netty. Since that `ByteBuf` tends to by default be `64k` in size, large reads will force the buffer to grow, completely circumventing the logic of `allocHandle`. This seems like it could break `io.netty.channel.RecvByteBufAllocator.Handle#continueReading` since that method for the fixed-size allocator does check whether the last read was equal to the attempted read size. So if we set `64k` because that's what the buffer size is, then write `1M` to the buffer we will stop reading on the IO loop, even though the channel may still have bytes that we can read right away. More importantly though, this can lead to running OOM quite easily under IO pressure as we are forcing the heap buffers passed to the read to `reallocate`. Closes #49699	2019-12-04 19:36:52 +01:00
Armin Braun	996cddd98b	Stop Copying Every Http Request in Message Handler (#44564 ) (#49809 ) * Copying the request is not necessary here. We can simply release it once the response has been generated and save a lot of `Unpooled` allocations that way * Relates #32228 * I think the issue that prevented that PR from being merged was solved by #39634 that moved the bulk index marker search to ByteBuf bulk access so the composite buffer shouldn't require many additional bounds checks (I'd argue the bounds checks we add, we save when copying the composite buffer) * I couldn't necessarily reproduce much of a speedup from this change, but I could reproduce a very measurable reduction in GC time with e.g. Rally's PMC (4g heap node and bulk requests of size 5k saw a reduction in young GC time by ~10% for me)	2019-12-04 08:41:42 +01:00
Jason Tedor	0f27c0b702	Extend systemd timeout during startup (#49784 ) When we are notifying systemd that we are fully started up, it can be that we do not notify systemd before its default timeout of sixty seconds elapses (e.g., if we are upgrading on-disk metadata). In this case, we need to notify systemd to extend this timeout so that we are not abruptly terminated. We do this by repeatedly sending EXTEND_TIMEOUT_USEC to extend the timeout by thirty seconds; we do this every fifteen seconds. This will prevent systemd from abruptly terminating us during a long startup. We cancel the scheduled execution of this notification after we have successfully started up.	2019-12-03 14:25:45 -05:00
Henning Andersen	5adb33ec17	Deprecate sorting in reindex (#49458 ) (#49738 ) Reindex sort never gave a guarantee about the order of documents being indexed into the destination, though it could give a sense of locality of source data. It prevents us from doing resilient reindex and other optimizations and it has therefore been deprecated. Related to #47567	2019-12-01 19:24:27 +01:00
Henning Andersen	1d745f1e5c	Revert "Deprecate sorting in reindex (#49458 )" This reverts commit `27d45c9f1f`.	2019-11-29 22:08:19 +01:00
Henning Andersen	27d45c9f1f	Deprecate sorting in reindex (#49458 ) Reindex sort never gave a guarantee about the order of documents being indexed into the destination, though it could give a sense of locality of source data. It prevents us from doing resilient reindex and other optimizations and it has therefore been deprecated. Related to #47567	2019-11-29 21:35:11 +01:00
Armin Braun	813b49adb4	Make BlobStoreRepository Aware of ClusterState (#49639 ) (#49711 ) * Make BlobStoreRepository Aware of ClusterState (#49639) This is a preliminary to #49060. It does not introduce any substantial behavior change to how the blob store repository operates. What it does is to add all the infrastructure changes around passing the cluster service to the blob store, associated test changes and a best effort approach to tracking the latest repository generation on all nodes from cluster state updates. This brings a slight improvement to the consistency by which non-master nodes (or master directly after a failover) will be able to determine the latest repository generation. It does not however do any tricky checks for the situation after a repository operation (create, delete or cleanup) that could theoretically be used to get even greater accuracy to keep this change simple. This change does not in any way alter the behavior of the blobstore repository other than adding a better "guess" for the value of the latest repo generation and is mainly intended to isolate the actual logical change to how the repository operates in #49060	2019-11-29 14:57:47 +01:00
Mayya Sharipova	2dafecc398	Upgrade lucene to 8.4.0-snapshot-e648d601efb (#49641 )	2019-11-28 11:59:58 -05:00
jimczi	35732504ba	#49166 Fix spurious test failure	2019-11-28 11:08:15 +01:00
Jim Ferenczi	d6445fae4b	Add a cluster setting to disallow loading fielddata on _id field (#49166 ) This change adds a dynamic cluster setting named `indices.id_field_data.enabled`. When set to `false` any attempt to load the fielddata for the `_id` field will fail with an exception. The default value in this change is set to `false` in order to prevent fielddata usage on this field for future versions but it will be set to `true` when backporting to 7x. When the setting is set to true (manually or by default in 7x) the loading will also issue a deprecation warning since we want to disallow fielddata entirely when https://github.com/elastic/elasticsearch/issues/26472 is implemented. Closes #43599	2019-11-28 09:35:28 +01:00
Martijn van Groningen	0a42395dfa	Backport: add templating support to pipeline processor (#49643 ) Backport of #49030 This commit adds templating support to the pipeline processor's `name` option. Closes #39955	2019-11-27 15:53:40 +01:00
Przemyslaw Gomulka	502873b144	[Java.time] Retain prefixed date pattern in formatter (#48703 ) JavaDateFormatter should keep the pattern with the prefixed 8 as it will be used for serialisation. The stripped pattern should be used for the enclosed formatters. closes #48698	2019-11-27 12:29:18 +01:00
Yannick Welsch	bd007271cf	Avoid double-wrapping allocator (#49534 ) When using unpooled, the allocator is wrapped twice in a NoDirectBuffers.	2019-11-27 09:25:32 +01:00
Martijn van Groningen	90850f4ea0	Backport: Introduce on_failure_pipeline ingest metadata inside on_failure block (#49596 ) Backport of #49076 In case an exception occurs inside a pipeline processor, the pipeline stack is kept around as header in the exception. Then in the on_failure processor the id of the pipeline the exception occurred is made accessible via the `on_failure_pipeline` ingest metadata. Closes #44920	2019-11-27 07:52:08 +01:00
Jason Tedor	71bcfbf1e3	Replace required pipeline with final pipeline (#49470 ) This commit enhances the required pipeline functionality by changing it so that default/request pipelines can also be executed, but the required pipeline is always executed last. This gives users the flexibility to execute their own indexing pipelines, but also ensure that any required pipelines are also executed. Since such pipelines are executed last, we change the name of required pipelines to final pipelines.	2019-11-22 14:37:36 -05:00
Henning Andersen	49bb5fb642	Netty4: switch to composite cumulator (#49478 ) The default merge cumulator used in netty transport leads to additional GC pressure and memory copying when a message that exceeds the chunk size is handled. This is especially a problem on G1 GC, since we get many "humongous" allocations and that can in theory cause real memory circuit breaker to break unnecessarily.	2019-11-22 18:14:10 +01:00
Martijn van Groningen	2243743450	Update geolite2 database in ingest geoip plugin. (#49308 ) Some tests were tweaked to deal with the updated database files.	2019-11-22 08:38:57 +01:00
Henning Andersen	0164de8579	Reindex search response fix again (#49423 ) Fixed test case to more broadly accept all messages with "Partial shards failure" in it, to hopefully catch all relevant search messages now that reindex does not allow searching against red shards. Closes #49295	2019-11-21 11:45:08 +01:00
Jack Conradson	a780ec14f0	Painless: Upgrade ASM to 7.2 (#49263 ) This upgrades Painless to use the latest ASM libraries providing support up to Java 14. Note the library is not published with the latest versions in an "all" package, so we pick up each lib independently that's required. There were some changes to the getType method that require descriptors to be used in place of internal class names.	2019-11-20 07:09:47 -08:00
Christoph Büscher	4ffa050735	Allow custom characters in token_chars of ngram tokenizers (#49250 ) Currently the `token_chars` setting in both `edgeNGram` and `ngram` tokenizers only allows for a list of predefined character classes, which might not fit every use case. For example, including underscore "_" in a token would currently require the `punctuation` class which comes with a lot of other characters. This change adds an additional "custom" option to the `token_chars` setting, which requires an additional `custom_token_chars` setting to be present and which will be interpreted as a set of characters to include in a token. Closes #25894	2019-11-20 10:37:12 +01:00
Alan Woodward	c6b31162ba	Refactor percolator's QueryAnalyzer to use QueryVisitors Lucene now allows us to explore the structure of a query using QueryVisitors, delegating the knowledge of how to recurse through and collect terms to the query implementations themselves. The percolator currently has a home-grown external version of this API to construct sets of matching terms that must be present in a document in order for it to possibly match the query. This commit removes the home-grown implementation in favour of one using QueryVisitor. This has the added benefit of making interval queries available for percolator pre-filtering. Due to a bug in multi-term intervals (LUCENE-9050) it also includes a clone of some of the lucene intervals logic, that can be removed once upstream has been fixed. Closes #45639	2019-11-20 09:21:01 +00:00
Mark Tozzi	17358b5af7	(refactor) Extract Empty/Script/Missing ValuesSource behavior to an interface (#48320 ) (#49330 ) This is a pure code rearrangement refactor. Logic for what specific ValuesSource instance to use for a given type (e.g. script or field) moved out of ValuesSourceConfig and into CoreValuesSourceType (previously just ValueSourceType; we extract an interface for future extensibility). ValueSourceConfig still selects which case to use, and then the ValuesSourceType instance knows how to construct the ValuesSource for that case.	2019-11-19 16:44:29 -05:00
Ryan Ernst	c6a8913c38	Fix java home validation usage by tasks (#49204 ) Tasks intending to use a particular java home provided by JAVA<N>_HOME use the getJavaHome method, which verifies the given java home is available, or will be if the task will run. However, the verification logic was broken, in addition to unnecessarily delaying retrieving the java home until runtime. This commit fixes the verification logic to run at either config time, delaying verification, or at runtime which immediately checks if java home is available. closes #49153	2019-11-19 10:30:19 -08:00
Henning Andersen	bc29c9877a	Reindex search response fix (#49301 ) Fixed test case to also accept another error message, now that reindex does not allow searching against red shards. Closes #49295	2019-11-19 14:38:05 +01:00
Tanguy Leroux	abed869ec6	Mute ReindexFailureTests.testResponseOnSearchFailure (#49298 ) Relates #49295	2019-11-19 12:38:54 +01:00
Henning Andersen	2ac38fd315	Reindex and friends fail on RED shards (#45830 ) Reindex, update by query and delete by query would silently disregard RED/unavailable shards, thus not copying, updating or deleting matching data in those shards. Now use `allow_partial_search_results=false` to ensure these operations fail if the search crosses an unavailable shard. Added the option to explicitly specify `allow_partial_search_results=true` for reindex only (seemed too strange for update/delete by query). Relates #45739 and #42612	2019-11-18 21:23:08 +01:00
gpaimla	7d20b50f45	Implement Lucene EstonianAnalyzer, Stemmer (#49149 ) This PR adds a new analyzer and stemmer for the Estonian language. Closes #48895	2019-11-18 17:24:21 +01:00
Jason Tedor	2bcdcb17cd	Introduce dedicated ingest processor exception (#48810 ) Today we wrap exceptions that occur while executing an ingest processor in an ElasticsearchException. Today, in ExceptionsHelper#unwrapCause we only unwrap causes for exceptions that implement ElasticsearchWrapperException, which the top-level ElasticsearchException does not. Ultimately, this means that any exception that occurs during processor execution does not have its cause unwrapped, and so its status is blanket treated as a 500. This means that while executing a bulk request with an ingest pipeline, document-level failures that occur during a processor will cause the status for that document to be treated as 500. Since that does not give the client any indication that they made a mistake, it means some clients will enter infinite retries, thinking that there is some server-side problem that merely needs to clear. This commit addresses this by introducing a dedicated ingest processor exception, so that its causes can be unwrapped. While we could consider a broader change to unwrap causes for more than just ElasticsearchWrapperExceptions, that is a broad change with unclear implications. Since the problem of reporting 500s on client errors is a user-facing bug, we take the conservative approach for now, and we can revisit the unwrapping in a future change.	2019-11-14 11:04:53 -05:00
Rory Hunter	c46a0e8708	Apply 2-space indent to all gradle scripts (#49071 ) Backport of #48849. Update `.editorconfig` to make the Java settings the default for all files, and then apply a 2-space indent to all `*.gradle` files. Then reformat all the files.	2019-11-14 11:01:23 +00:00
Henning Andersen	8835142ac9	Grok processor ignore case test (#48909 ) Added test demonstrating that grok using ignore case works, since this does a minimal test that the `joni` and `jcodings` libraries are compatible. Forward-port of test from #43334	2019-11-08 00:04:29 +01:00
Jason Tedor	c82ecb664c	Do not wrap ingest processor exception with IAE (#48816 ) The problem with wrapping here is that it converts any exception into an IAE, which we treat as a client error (400 status) whereas the exception being wrapped here could be a server error (e.g., NPE). This commit stops wrapping all ingest processor exceptions as IAEs.	2019-11-01 15:11:35 -04:00
Mark Vieira	6ab4645f4e	[7.x] Introduce type-safe and consistent pattern for handling build globals (#48818 ) This commit introduces a consistent and type-safe manner for handling global build parameters throughout our build logic. Primarily this replaces the existing usages of extra properties with static accessors. It also introduces an explicit API for initialization and mutation of any such parameters, as well as better error handling for uninitialized or eager access of parameter values. Closes #42042	2019-11-01 11:33:11 -07:00
Ioannis Kakavas	99aedc844d	Copy http headers to ThreadContext strictly (#45945 ) (#48675 ) The previous behavior while copying HTTP headers to the ThreadContext would allow multiple HTTP headers with the same name, handling only the first occurrence and disregarding the rest of the values. This can be confusing when dealing with multiple headers as it is not obvious which value is read and which ones are silently dropped. According to RFC-7230, a client must not send multiple header fields with the same field name in an HTTP message, unless the entire field value for this header is defined as a comma-separated list or this specific header is a well-known exception. This commit changes the behavior in order to be more compliant with the aforementioned RFC by requiring the classes that implement ActionPlugin to declare if a header can be multi-valued or not when registering this header to be copied over to the ThreadContext in ActionPlugin#getRestHeaders. If the header is allowed to be multivalued, then all such headers are read from the HTTP request and their values get concatenated in a comma-separated string. If the header is not allowed to be multivalued, and the HTTP request contains multiple such headers with different values, the request is rejected with a 400 status.	2019-10-31 23:05:12 +02:00
Dan Hermann	dbc05cd808	Add option to split processor for preserving trailing empty fields (#48685 )	2019-10-30 08:25:03 -05:00
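
The header-copying rule described in commit 99aedc844d above can be illustrated with a small sketch. This is not the Elasticsearch implementation (which is Java and lives behind ActionPlugin#getRestHeaders); the names `copy_headers`, `BadRequest`, and the `registered` mapping are hypothetical and exist only for this example. It assumes each registered header carries a flag saying whether it may be multi-valued, and that header names are compared case-insensitively.

```python
from typing import Dict, Iterable, List, Tuple


class BadRequest(Exception):
    """Hypothetical error type standing in for a 400 response."""
    status = 400


def copy_headers(
    request_headers: Iterable[Tuple[str, str]],
    registered: Dict[str, bool],
) -> Dict[str, str]:
    """Copy registered headers, enforcing the single/multi-valued rule.

    `registered` maps a header name to True if the header may be
    multi-valued, False otherwise (an assumed shape for this sketch).
    """
    # HTTP header names are case-insensitive, so normalize to lower case.
    allowed = {name.lower(): multi for name, multi in registered.items()}

    # Group the values of every registered header, preserving request order.
    grouped: Dict[str, List[str]] = {}
    for name, value in request_headers:
        key = name.lower()
        if key in allowed:
            grouped.setdefault(key, []).append(value)

    copied: Dict[str, str] = {}
    for key, values in grouped.items():
        if allowed[key]:
            # Multi-valued: concatenate all values into one comma-separated string.
            copied[key] = ",".join(values)
        else:
            # Single-valued: repeats are tolerated only if the values are identical.
            distinct = list(dict.fromkeys(values))
            if len(distinct) > 1:
                raise BadRequest(f"multiple values for single-valued header [{key}]")
            copied[key] = distinct[0]
    return copied
```

For example, two `X-Trace` headers registered as multi-valued would be merged into one comma-separated value, while two `Authorization` headers with different values, registered as single-valued, would raise `BadRequest`.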

5365 Commits