OpenSearch

Commit Graph

Author	SHA1	Message	Date
Nick Knize	9168f1fb43	[License] Add SPDX and OpenSearch Modification license header (#509 ) This commit adds the SPDX Apache-2.0 license header along with an additional copyright header for all modifications. Signed-off-by: Nicholas Walter Knize <nknize@apache.org>	2021-04-09 14:28:18 -05:00
Rabi Panda	2dca3462f2	Fix stragglers from renaming to OpenSearch work. (#483 ) This commit fixes more instances where we missed renaming to OpenSearch. Signed-off-by: Rabi Panda <adnapibar@gmail.com>	2021-04-05 11:51:20 -07:00
Harold Wang	5971a518d0	Replace nio and netty test endpoint (#475 ) Signed-off-by: Harold Wang <harowang@amazon.com>	2021-03-31 13:37:22 -07:00
Harold Wang	fd4c3968ab	[Rename] org.opensearch.ingest.attachment.IngestAttachmentClientYamlTestSuiteIT (#463 ) * Change "Test elasticsearch" back * Update content, language and size of test attachment * Regenerate test attachment content with updated date and author Signed-off-by: Harold Wang <harowang@amazon.com>	2021-03-26 21:59:23 -07:00
Rabi Panda	3460a8c213	Fix a few more renaming issues. (#464 ) This commit fixes some more missed instances where we can perform the renaming to OpenSearch. Signed-off-by: Rabi Panda <adnapibar@gmail.com>	2021-03-26 12:05:16 -07:00
Rabi Panda	0bdd1293c1	Use alternate example data in OpenSearch test cases. (#454 ) This commit updates some of the sample test data used in test cases in OpenSearch. Signed-off-by: Rabi Panda <adnapibar@gmail.com>	2021-03-25 08:52:07 -07:00
Rabi Panda	2e3055c9e2	Fix more failing tests as a result of renaming (#457 ) This commit fixes some more renaming issues and as a result fixes the failing tests, * :qa:logging-config:test * :example-plugins:painless-whitelist:yamlRestTest * :modules:reindex:test Signed-off-by: Rabi Panda <adnapibar@gmail.com>	2021-03-24 09:33:05 -07:00
Rabi Panda	8469519413	Fix Checkstyle issues. Signed-off-by: Rabi Panda <adnapibar@gmail.com>	2021-03-21 20:56:34 -05:00
Rabi Panda	8bba6603da	[Rename] Replace more instances of Elasticsearch with OpenSearch. (#432 ) This commit replaces more replaceable instances of Elasticsearch with OpenSearch. Signed-off-by: Rabi Panda <adnapibar@gmail.com>	2021-03-21 20:56:34 -05:00
Nick Knize	7051167c83	[Rename] remaining elasticsearch pass 1 (#416 ) This commit refactors instances of 'elasticsearch' with opensearch everywhere except references to issues, and other places needed to test compatibility with old elasticsearch clusters. Signed-off-by: Nicholas Walter Knize <nknize@apache.org>	2021-03-21 20:56:34 -05:00
Rabi Panda	597b52992d	[Rename] File names replace elasticsearch with opensearch. (#419 ) This commit renames several files that contain the name elasticsearch and replace that with opensearch. Signed-off-by: Rabi Panda <adnapibar@gmail.com>	2021-03-21 20:56:34 -05:00
Rabi Panda	eddfe6760d	[Rename] Fix issues for gradle precommit task. (#418 ) Fix miscellaneous issues identified during `gradle precommit`. These issues are the side effects of the renaming to OpenSearch work. Signed-off-by: Rabi Panda <adnapibar@gmail.com>	2021-03-21 20:56:34 -05:00
Rabi Panda	df11cc9de4	[Rename] Fix gradle build as part of the renaming process. (#397 ) This commit fixes the currently broken gradle build resulting from the renaming work. It reverts a few dependencies and comments out the `opensearch_distibutions` task which is currently failing for some builds. We will address these separately in the future once we have a working build. Signed-off-by: Rabi Panda <adnapibar@gmail.com>	2021-03-21 20:56:34 -05:00
Rabi Panda	13f6d23e40	[Rename] Property and metadata keys with prefix es. (#389 ) Rename all property and metadata keys with prefix 'es.' to 'opensearch.'. Signed-off-by: Rabi Panda <adnapibar@gmail.com>	2021-03-21 20:56:34 -05:00
Nick Knize	5b46a05702	[Rename] remaining packages and resources in test/fixture (#364 ) This commit refactors the remaining o.e.index and o.e.test packages in the test/fixtures module. References throughout the codebase are also refactored. Signed-off-by: Nicholas Walter Knize <nknize@apache.org>	2021-03-21 20:56:34 -05:00
Harold Wang	82f9ff93cb	[Rename] plugins (#193 ) * [Rename] plugins (#193) This PR refactors files under "plugins" folders part of the Elasticsearch to OpenSearch renaming effort. Signed-off-by: Harold Wang <harowang@amazon.com>	2021-03-21 20:56:34 -05:00
Nick Knize	923ea001f5	[Rename] o.e.action.support classes (#253 ) This commit refactors the classes in o.e.action.support to o.opensearch.action.support. The remaining directories will be refactored in a separate commit. Signed-off-by: Nicholas Knize <nknize@amazon.com>	2021-03-21 20:56:34 -05:00
Rabi Panda	991b3650b6	[Rename] refactor server/snapshots package. (#251 ) Refactor `server/snapshots` to rename the package names from `org.elasticsearch.snapshots` to `org.opensearch.snapshots` as part of the rename to OpenSearch work. Signed-off-by: Rabi Panda <adnapibar@gmail.com>	2021-03-21 20:56:34 -05:00
Rabi Panda	584efd7970	[Rename] modules/lang-painless (#210 ) Refactor lang-painless module as part of the rename to OpenSearch work. Signed-off-by: Rabi Panda <adnapibar@gmail.com>	2021-03-21 20:56:34 -05:00
Rabi Panda	3eee5183d1	[Rename] server/rest (#229 ) This commit refactors the `server/rest` package as part of the Elasticsearch to OpenSearch renaming. Signed-off-by: Rabi Panda <adnapibar@gmail.com>	2021-03-21 20:56:34 -05:00
Nick Knize	8aa818e93e	[Rename] refactor o.e.action.admin.cluster (#207 ) This commit refactors all classes in o.e.action.admin.cluster to org.opensearch.action.admin.cluster. References are updated throughout the codebase. Signed-off-by: Nicholas Knize <nknize@amazon.com>	2021-03-21 20:56:34 -05:00
Nick Knize	1203aa7302	[Rename] refactor o.e.action classes (#203 ) This commit refactors top level classes in o.e.action to o.opensearch.action. References throughout the rest of the codebase have been updated. Signed-off-by: Nicholas Knize <nknize@amazon.com>	2021-03-21 20:56:34 -05:00
Nick Knize	0c81a5cf65	[Rename] refactor o.e.action.admin.indices (#209 ) This commit refactors o.e.action.admin.indices package to o.opensearch.action.admin.indices. References throughout the codebase have been updated to reflect the new package location. Signed-off-by: Nicholas Knize <nknize@amazon.com>	2021-03-21 20:56:34 -05:00
Nick Knize	2aa9906c42	[Rename] ElasticsearchParseException class in server module (#169 ) This commit refactors ElasticsearchParseException class in the server module to OpenSearchParseException. References and usages throughout the rest of the codebase are fully refactored. Signed-off-by: Nicholas Knize <nknize@amazon.com>	2021-03-21 20:56:34 -05:00
Nick Knize	ccceb381db	[Rename] ElasticsearchException class in server module (#165 ) This commit refactors the ElasticsearchException class located in the server module to OpenSearchException. References and usages throughout the rest of the codebase are fully refactored. Signed-off-by: Nicholas Knize <nknize@amazon.com>	2021-03-21 20:56:34 -05:00
Rabi Panda	38e9c9750a	[PURIFY] Remove the AuthorizationEnginePlugin from examples. (#26 ) Signed-off-by: Peter Nied <petern@amazon.com>	2021-03-13 10:36:09 -06:00
Rabi Panda	c856534394	[PURIFY] Remove remaining x-pack license. (#25 ) Signed-off-by: Peter Nied <petern@amazon.com>	2021-03-13 10:36:09 -06:00
Nick Knize	a0b91cb230	Cleanup build script to exclude security-authorization-engine (#8 ) (#8 ) * Cleanup build-scan, remove publish scan to elastic server * Cleanup build script to exclude security-authorization-engine, whose tests depend on xpack * Cleanup build script to exclude security-authorization-engine, whose tests depend on xpack Co-authored-by: Huan Jiang <huanji@amazon.com> Signed-off-by: Peter Nied <petern@amazon.com>	2021-03-13 10:36:06 -06:00
Nick Knize	3a52e9ddc1	[PURIFY] update build.gradle files to ensure build completes; gradle check fails (#7 ) Signed-off-by: Peter Nied <petern@amazon.com>	2021-03-13 10:36:06 -06:00
Alan Woodward	fb84b6710d	Restore use of default search and search_quote analyzers (#65491 ) (#65562 ) In the refactoring of TextFieldMapper, we lost the ability to define a default search or search_quote analyzer in index settings. This commit restores that ability, and adds some more comprehensive testing. Fixes #65434	2020-11-26 18:34:59 +00:00
Armin Braun	51e9d6f227	Revert Serializing Outbound Transport Messages on IO Threads (#64632 ) (#64654 ) Serializing outbound transport messages on the IO loop was introduced in https://github.com/elastic/elasticsearch/pull/56961. Unfortunately, it turns out that this is incompatible with assumptions made by the CCR code here: `f22ddf822e/x-pack/plugin/ccr/src/main/java/org/elasticsearch/xpack/ccr/action/repositories/GetCcrRestoreFileChunkAction.java (L60-L61)`, and those assumptions are not easy to work around on short notice. Reverting this change (as a temporary solution; it is still a valuable change long-term) is therefore raised as a blocker, because it seriously affects the stability of the initial phase of CCR following by causing corrupted bytes to be sent to the follower.	2020-11-05 16:29:12 +01:00
Ignacio Vera	4851bc7bae	Upgrade to Lucene-8.7.0 (#64532 ) (#64537 )	2020-11-03 16:57:04 +01:00
Ignacio Vera	d0f5066310	Upgrade to lucene-8.7.0-snapshot-72d8528c3a6 (#63912 ) (#63928 ) (#63933 )	2020-10-20 15:08:06 +02:00
Julie Tibshirani	ae2fc4118d	Add factory methods for common value fetchers. (#63438 ) This PR adds factory methods for the most common implementations: * `SourceValueFetcher.identity` to pass through the source value untouched. * `SourceValueFetcher.toString` to simply convert the source value to a string.	2020-10-08 12:14:53 -07:00
Mayya Sharipova	e022b78198	Upgrade to lucene-8.7.0-snapshot-5c4168d (#63466 ) This disables sort optimization on _doc, which may still be unstable. Backport for #63444	2020-10-08 08:20:43 -04:00
Mayya Sharipova	e236ea43e9	Upgrade to lucene-8.7.0-snapshot-e914862 (#63401 ) Backport for: #63395	2020-10-07 09:45:14 -04:00
Alan Woodward	88b45dfa61	Convert TextFieldMapper to parametrized form (#63269 ) (#63392 ) As a result of this, we can remove a chunk of code from TypeParsers as well. Tests for search/index mode analyzers have moved into their own file. This commit also rationalises the serialization checks for parameters into a single SerializerCheck interface that takes the values includeDefaults, isConfigured and the value itself. Relates to #62988	2020-10-07 13:26:25 +01:00
Mayya Sharipova	f2ba62b894	Upgrade to lucene- 8.7.0-snapshot-66c49a35402 (#63372 ) This includes fixing a bug in doc iteration during sort optimization Backport for #63349	2020-10-06 22:38:58 -04:00
Julie Tibshirani	f17ca18dfa	Make array value parsing flag more robust. (#63371 ) When constructing a value fetcher, the 'parsesArrayValue' flag must match `FieldMapper#parsesArrayValue`. However there is nothing in code or tests to help enforce this. This PR reworks the value fetcher constructors so that `parsesArrayValue` is 'false' by default. Just as for `FieldMapper#parsesArrayValue`, field types must explicitly set it to true and ensure the behavior is covered by tests. Follow-up to #62974.	2020-10-06 17:49:25 -07:00
Nhat Nguyen	1a6837883a	Upgrade to Lucene-8.7.0-snapshot-77396dbf339 (#63222 ) Includes LUCENE-9554, which exposes the pendingNumDocs from IndexWriter.	2020-10-05 14:39:30 -04:00
Rene Groeschke	f58ebe58ee	Use services for archive and file operations in tasks (#62968 ) (#63201 ) Referencing a project instance during task execution is discouraged by Gradle and should be avoided. For example, it is incompatible with Gradle's incubating configuration cache. Instead, there are services available to handle archive and filesystem operations in task actions. Brings us one step closer to #57918	2020-10-05 15:52:15 +02:00
Alan Woodward	01950bc80f	Move FieldMapper#valueFetcher to MappedFieldType (#62974 ) (#63220 ) For runtime fields, we will want to do all search-time interaction with a field definition via a MappedFieldType, rather than a FieldMapper, to avoid interfering with the logic of document parsing. Currently, fetching values for runtime scripts and for building top hits responses need to call a method on FieldMapper. This commit moves this method to MappedFieldType, incidentally simplifying the current call sites and freeing us up to implement runtime fields as pure MappedFieldType objects.	2020-10-04 14:54:59 +01:00
Alan Woodward	de08ba58bf	Convert percolator, murmur3 and histogram mappers to parametrized form (#63004 ) Relates to #62988	2020-09-29 14:42:26 +01:00
Mayya Sharipova	4c8c3c8df6	Upgrade lucene to lucene-8.7.0-snapshot-3b59906 (#62978 ) Backport for #62970	2020-09-28 16:52:31 -04:00
Tim Brooks	59dd889c10	Split up large HTTP responses in outbound pipeline (#62666 ) Currently Netty will batch compress an entire HTTP response regardless of its content size. It allocates a byte array at least the same size as the uncompressed content. This causes issues with our attempts to remove humongous G1GC allocations. This commit resolves the issue by splitting responses into 128KB chunks. As a side effect, large outbound HTTP responses that are compressed are now sent with chunked transfer-encoding.	2020-09-24 16:35:52 -06:00
Tim Brooks	43a4882951	Move CorsHandler to server (#62007 ) Currently we duplicate our specialized CORS logic in all transport plugins. This is unnecessary as it could be implemented in a single place. This commit moves the logic to server. Additionally, it fixes a bug where we were incorrectly closing HTTP channels on early CORS responses.	2020-09-24 16:32:59 -06:00
Alan Woodward	e28750b001	Add parameter update and conflict tests to MapperTestCase (#62828 ) (#62902 ) This commit adds a mechanism to MapperTestCase that allows implementing test classes to check that their parameters can be updated, or throw conflict errors as advertised. Child classes override the registerParameters method and tell the passed-in UpdateChecker class about their parameters. Simple conflicts can be checked, using the existing minimal mappings as a base to compare against, or alternatively a particular initial mapping can be provided to check edge cases (eg, norms can be updated from true to false, but not vice versa). Updates are registered with a predicate that checks that the update has in fact been applied to the resulting FieldMapper. Fixes #61631	2020-09-24 20:38:12 +01:00
Armin Braun	83ec8dd4e2	Upgrade GCS SDK to 1.113.1 (#62848 ) (#62864 ) Just staying on top of upgrades to the SDK and its dependencies.	2020-09-24 15:43:21 +02:00
Luca Cavanna	862fab06d3	Share same existsQuery impl throughout mappers (#57607 ) Most of our field types have the same implementation for their `existsQuery` method which relies on doc_values if present, otherwise it queries norms if available or uses a term query against the _field_names meta field. This standard implementation is repeated in many different mappers. There are field types that only query doc_values, because they always have them, and field types that always query _field_names, because they never have norms nor doc_values. We could apply the same standard logic to all of these field types as `MappedFieldType` has the knowledge about what data structures are available. This commit introduces a standard implementation that does the right thing depending on the data structure that is available. With that only field types that require a different behaviour need to override the existsQuery method. At the same time, this no longer forces subclasses to override `existsQuery`, which could be forgotten when needed. To address this we introduced a new test method in `MapperTestCase` that verifies the `existsQuery` being generated and its consistency with the available data structures.	2020-09-23 11:00:53 +02:00
Luca Cavanna	5ca86d541c	Move stored flag from TextSearchInfo to MappedFieldType (#62717 ) (#62770 )	2020-09-23 09:40:34 +02:00
markharwood	a0df0fb074	Search - add case insensitive flag for "term" family of queries #61596 (#62661 ) Backport of fe9145f Closes #61546	2020-09-22 13:56:51 +01:00
Luca Cavanna	9ae29713fd	Dense vector field type minor fixes (#62631 ) The dense vector field is not aggregatable although it produces fielddata through its BinaryDocValuesField. It should pass up hasDocValues set to true to its parent class in its constructor, and return isAggregatable false. Same for the sparse vector field (only in 7.x). This may not have consequences today, but it will be important once we try to share the same exists query implementation throughout all of the mappers with #57607.	2020-09-22 10:40:51 +02:00
Christos Soulios	6a298970fd	[7.x] Allow metadata fields in the _source (#62616 ) Backports #61590 to 7.x So far we don't allow metadata fields in the document _source. However, in the case of the _doc_count field mapper (#58339) we want to be able to set This PR adds a method to the metadata field parsers that exposes if the field can be included in the document source or not. This way each metadata field can configure if it can be included in the document _source	2020-09-18 19:56:41 +03:00
Adrien Grand	4de8579455	Upgrade to lucene-8.7.0-snapshot-830bd186a8d. (#62596 )	2020-09-18 09:51:34 +02:00
David Turner	0a3f2c453f	Hide c.a.s.s.i.UseArnRegionResolver noise (#62522 ) A recent AWS SDK upgrade has introduced a new source of spurious `WARN` logs when the security manager prevents access to the user's home directory and therefore to `$HOME/.aws/config`. This is the behaviour we want, and it's harmless and handled by the SDK as if the config doesn't exist, so this log message is unnecessary noise. This commit suppresses this noisy logging by default. Relates #20313, #56346, #53962 Closes #62493	2020-09-18 08:30:39 +01:00
Tanguy Leroux	e6777810ba	Fix S3BlobContainerRetriesTests (#62464 ) (#62551 ) The AssertingInputStream in S3BlobContainerRetriesTests verifies that InputStreams are either fully consumed or aborted, but the eof flag is only set when the underlying stream returns it. When buffered reads are executed and exactly the number of remaining bytes is read, the eof flag is not set to true. Instead, the test should rely on the total number of bytes read to know whether the stream has been fully consumed. Close #62390	2020-09-17 17:12:34 +02:00
Adrien Grand	9a8225bbc1	Upgrade to lucene-8.7.0-snapshot-9cd3af50f80. (#62450 ) (#62476 ) This new snapshot contains the following JIRAs that we're interested in: - [LUCENE-9525](https://issues.apache.org/jira/browse/LUCENE-9525) Better handling of small documents. This should improve retrieval times when documents are less than ~1kB. - [LUCENE-9510](https://issues.apache.org/jira/browse/LUCENE-9510) Faster flushes when index sorting is enabled by not compressing the temporary files that store stored fields and term vectors.	2020-09-17 10:28:20 +02:00
Nik Everett	24a24d050a	Implement fields fetch for runtime fields (backport of #61995 ) (#62416 ) This implements the `fields` API in `_search` for runtime fields using doc values. Most of that implementation is stolen from the `docvalue_fields` fetch sub-phase, just moved into the same API that the `fields` API uses. At this point the `docvalue_fields` fetch phase looks like a special case of the `fields` API. While I was at it I moved the "which doc values sub-implementation should I use for fetching?" question from a bunch of `instanceof`s to a method on `LeafFieldData` so we can be much more flexible with what is returned and we're not forced to extend certain classes just to make the fetch phase happy. Relates to #59332	2020-09-15 20:24:10 -04:00
Armin Braun	98f525f8a7	Faster Azure Blob InputStream (#61812 ) (#62387 ) Build our own InputStream, which should perform better than the one in the SDK. As a result, this also saves a HEAD call for each ranged read on Azure.	2020-09-15 18:27:22 +02:00
Adrien Grand	6db8afefc2	Upgrade to lucene-8.7.0-snapshot-cdfdc1e0851. (#62376 ) Upgrade to a new Lucene snapshot that (at least partially) addresses the indexing rate regression when index sorting is enabled. Backport of #62334.	2020-09-15 17:48:07 +02:00
Tanguy Leroux	faf96c175e	Abort non-fully consumed S3 input stream (#62167 ) (#62370 ) Today when an S3RetryingInputStream is closed, the remaining bytes that were not consumed are drained right before closing the underlying stream. In some contexts it might be more efficient not to consume the remaining bytes and just drop the connection. This is, for example, the case with prewarming of snapshot-backed indices, where there is no point in reading potentially large blobs if we know the cache file we want to write the content of the blob to has already been evicted. Draining all bytes here takes a slot in the prewarming thread pool for nothing.	2020-09-15 14:33:37 +02:00
Francisco Fernández Castaño	21303e8e15	Take into account sas tokens while metering put object requests on azure (#62244 ) Backport of #62225 Closes #62208	2020-09-10 19:47:58 +02:00
Ignacio Vera	c8981ea93d	upgrade to lucene-8.7.0-snapshot-b313618cc1d (#62213 ) (#62222 )	2020-09-10 16:23:18 +02:00
Jake Landis	d8dad9ab2c	[7.x] Remove integTest task from PluginBuildPlugin (#61879 ) (#62135 ) This commit removes `integTest` task from all es-plugins. Most relevant projects have been converted to use yamlRestTest, javaRestTest, or internalClusterTest in prior PRs. A few projects needed to be adjusted to allow complete removal of this task * x-pack/plugin - converted to use yamlRestTest and javaRestTest * plugins/repository-hdfs - kept the integTest task, but use `rest-test` plugin to define the task * qa/die-with-dignity - convert to javaRestTest * x-pack/qa/security-example-spi-extension - convert to javaRestTest * multiple projects - remove the integTest.enabled = false (yay!) related: #61802 related: #60630 related: #59444 related: #59089 related: #56841 related: #59939 related: #55896	2020-09-09 14:25:41 -05:00
Nik Everett	b8e9a7125f	Speed up empty highlighting many fields (backport of #61860 ) (#62122 ) Kibana often highlights everything like this: ``` POST /_search { "query": ..., "size": 500, "highlight": { "fields": { "": { ... } } } } ``` This can get slow when there are hundreds of mapped fields. I tested this locally and unscientifically and it took a request from 20ms to 150ms when there are 100 fields. I've seen clusters with 2000 fields where simple searches go from 500ms to 1500ms just by turning on this sort of highlighting, even when the query is just a `range` and the fields are all numbers and so on, so it won't highlight anything. This speeds up the `unified` highlighter in this case in a few ways: 1. Build the highlighting infrastructure once per field rather than once per document per field. This cuts out a ton of work analyzing the query over and over and over again. 2. Bail out of the highlighter before loading values if we can't produce any results. Combined, these take that local 150ms case down to 65ms. This is unlikely to be really useful when there are only a few fetched docs and only a few fields, but we often end up having many fields with many fetched docs.	2020-09-08 15:49:50 -04:00
Francisco Fernández Castaño	2bb5716b3d	Add repositories metering API (#62088 ) This pull request adds a new set of APIs that allows tracking the number of requests performed by the different registered repositories. In order to avoid losing data, the repository statistics are archived after the repository is closed for a configurable retention period `repositories.stats.archive.retention_period`. The API exposes the statistics for the active repositories as well as the modified/closed repositories. Backport of #60371	2020-09-08 14:01:04 +02:00
Ignacio Vera	31c026f25c	upgrade to Lucene-8.7.0-snapshot-61ea26a (#61957 ) (#61974 )	2020-09-04 13:46:20 +02:00
Ryan Ernst	d6e17170c3	Simplify adding plugins and modules to testclusters (#61886 ) There are currently half a dozen ways to add plugins and modules for test clusters to use. All of them require the calling project to peek into the plugin or module they want to use to grab its bundlePlugin task, and then both depend on that task, as well as extract the archive path the task will produce. This creates cross project dependencies that are difficult to detect, and if the dependent plugin/module has not yet been configured, the build will fail because the task does not yet exist. This commit makes the plugin and module methods for testclusters symmetric, simply accepting either a file provider directly or a project path that will produce the plugin/module zip. Internally this new variant uses normal configuration/dependencies across projects to get the zip artifact. It also has the added benefit of no longer needing the caller to add to the test task a dependsOn for bundlePlugin task.	2020-09-03 19:37:46 -07:00
Alan Woodward	e2f006eeb4	Merge FetchSubPhase hitsExecute and hitExecute methods (#60907 ) (#61893 ) FetchSubPhase has two 'execute' methods, one which takes all hits to be examined, and one which takes a single HitContext. It's not obvious which one should be implemented by a given sub-phase, or if implementing both is a possibility; nor is it obvious that we first run the hitExecute methods of all subphases, and then subsequently call all the hitsExecute methods. This commit reworks FetchSubPhase to replace these two variants with a processor class, `FetchSubPhaseProcessor`, that is returned from a single `getProcessor` method. This processor class has two methods, `setNextReader()` and `process`. FetchPhase collects processors from all its subphases (if a subphase does not need to execute on the current search context, it can return `null` from `getProcessor`). It then sorts its hits by docid, and groups them by lucene leaf reader. For each reader group, it calls `setNextReader()` on all non-null processors, and then passes each doc id to `process()`. Implementations of fetch sub phases can divide their concerns into per-request, per-reader and per-document sections, and no longer need to worry about sorting docs or dealing with reader slices. FetchSubPhase now provides a FetchSubPhaseExecutor that exposes two methods, setNextReader(LeafReaderContext) and execute(HitContext). The parent FetchPhase collects all these executors together (if a phase should not be executed, then it returns null here); then it sorts hits, and groups them by reader; for each reader it calls setNextReader, and then execute for each hit in turn. Individual sub phases no longer need to concern themselves with sorting docs or keeping track of readers; global structures can be built in getExecutor(SearchContext), per-reader structures in setNextReader and per-doc in execute.	2020-09-03 12:20:55 +01:00
Tim Brooks	e573fa9abc	Add data.path fast path for FilePermission (#61302 ) The recursive data.path FilePermission check is an extremely hot code path in Elasticsearch. Unfortunately, the FilePermission check in Java is extremely allocation-heavy. As it iterates through different file permissions, it allocates byte arrays for each Path component that must be compared. This PR improves the situation by adding the recursive data.path FilePermission to its own PermissionsCollection object, which is checked first.	2020-09-01 12:03:22 -06:00
Jason Tedor	64cd229b35	Upgrade to Lucene 8.6.2 (#61688 ) This commit upgrades the Lucene dependencies to 8.6.2.	2020-08-31 09:54:07 -04:00
Armin Braun	0da20579ca	Cleanly Handle S3 SDK Exceptions in Request Counting (#61686 ) (#61698 ) It looks like it is possible for a request to throw an exception early, before any API interaction has happened. This can lead to the request count map containing a `null` for the request count key. The assertion is not correct and we should not NPE here (as that might also hide the original exception since we are running this code in a `finally` block from within the S3 SDK). Closes #61670	2020-08-31 11:05:59 +02:00
Luca Cavanna	f769821bc8	Pass SearchLookup supplier through to fielddataBuilder (#61430 ) (#61638 ) Runtime fields need to have a SearchLookup available, when building their fielddata implementations, so that they can look up other fields, runtime or not. To achieve that, we add a Supplier<SearchLookup> argument to the existing MappedFieldType#fielddataBuilder method. As we introduce the ability to look up other fields while building fielddata for mapped fields, we implicitly add the ability for a field to require other fields. This requires some protection mechanism that detects dependency cycles to prevent stack overflow errors. With this commit we also introduce detection for cycles, as well as a limit on the depth of the references for a runtime field. Note that we also plan on introducing cycles detection at compile time, so the runtime cycles detection is a last resort to prevent stack overflow errors but we hope that we can reject runtime fields from being registered in the mappings when they create a cycle in their definition. Note that this commit does not introduce any production implementation of runtime fields, but is rather a pre-requisite to merge the runtime fields feature branch. This is a breaking change for MapperPlugins that plug in a mapper, as the signature of MappedFieldType#fielddataBuilder changes from taking a single argument (the index name), to also accept a Supplier<SearchLookup>. Relates to #59332 Co-authored-by: Nik Everett <nik9000@gmail.com>	2020-08-27 18:09:56 +02:00
Przemyslaw Gomulka	9f566644af	Do not create two loggers for DeprecationLogger backport(#58435 ) (#61530 ) DeprecationLogger's constructor should not create two loggers. It was taking parent logger instance, changing its name with a .deprecation prefix and creating a new logger. Most of the time parent logger was not needed. It was causing Log4j to unnecessarily cache the unused parent logger instance. depends on #61515 backports #58435	2020-08-26 16:04:02 +02:00
Nik Everett	87cf81e179	Migrate some more mapper test cases (#61507 ) (#61552 ) Migrate some more mapper test cases from `ESSingleNodeTestCase` to `MapperTestCase`.	2020-08-25 15:27:26 -04:00
markharwood	8b56441d2b	Search - add case insensitive support for regex queries. (#59441 ) (#61532 ) Backport to add case insensitive support for regex queries. Forks a copy of Lucene’s RegexpQuery and RegExp from Lucene master. This can be removed when 8.7 Lucene is released. Closes #59235	2020-08-25 17:18:59 +01:00
Przemyslaw Gomulka	f3f7d25316	Header warning logging refactoring backport(#55941 ) (#61515 ) Splitting DeprecationLogger into two: HeaderWarningLogger, responsible for adding response warning headers, and ThrottlingLogger, responsible for limiting duplicate log entries for the same key (previously deprecateAndMaybeLog). Introducing a ThrottlingAndHeaderWarningLogger, which is a base for other common logging usages where both response warning headers and log throttling are needed. relates #55699 relates #52369 backports #55941	2020-08-25 16:35:54 +02:00
Julie Tibshirani	997c73ec17	Correct how field retrieval handles multifields and copy_to. (#61391 ) Before when a value was copied to a field through a parent field or `copy_to`, we parsed it using the `FieldMapper` from the source field. Instead we should parse it using the target `FieldMapper`. This ensures that we apply the appropriate mapping type and options to the copied value. To implement the fix cleanly, this PR refactors the value parsing strategy. Now instead of looking up values directly, field mappers produce a helper object `ValueFetcher`. The value fetchers are responsible for almost all aspects of fetching, including looking up the right paths in the _source. The PR is fairly big but each commit can be reviewed individually. Fixes #61033.	2020-08-20 15:53:35 -07:00
Rory Hunter	be4ebfbf46	Remove old test mute code (#61277 ) It seems that some old test mute code, added as part of #31498, was never removed. This meant that the HDFS tests would fail when run under JDK 11.	2020-08-19 09:40:59 +01:00
Jake Landis	cb9f4cdae2	Fix the REST FIPS tests (#61001 ) Adds bouncycastle to classpath for tests and testclusters	2020-08-13 16:23:54 -07:00
Alan Woodward	54279212cf	Make MetadataFieldMapper extend ParametrizedFieldMapper (#59847 ) (#60924 ) This commit cuts over all metadata field mappers to parametrized format.	2020-08-11 09:02:28 +01:00
Armin Braun	3e2dfc6eac	Remove GCS Bucket Exists Check (#60899 ) (#60914 ) Same as https://github.com/elastic/elasticsearch/pull/43288 for GCS. We don't need to do the bucket exists check before using the repo, that just needlessly increases the necessary permissions for using the GCS repository.	2020-08-11 09:54:27 +02:00
Rene Groeschke	bdd7347bbf	Merge test runner task into RestIntegTest (7.x backport) (#60600 ) * Merge test runner task into RestIntegTest (#60261) * Merge test runner task into RestIntegTest * Reorganizing Standalone runner and RestIntegTest task * Rework general test task configuration and extension * Fix merge issues * use former 7.x common test configuration	2020-08-04 14:46:32 +02:00
Armin Braun	7ae9dc2092	Unify Stream Copy Buffer Usage (#56078 ) (#60608 ) We have various ways of copying between two streams and handling thread-local buffers throughout the codebase. This commit unifies a number of them and removes buffer allocations in many spots.	2020-08-04 09:54:52 +02:00
Rene Groeschke	ed4b70190b	Replace immediate task creations by using task avoidance api (#60071 ) (#60504 ) - Replace immediate task creations by using task avoidance api - One step closer to #56610 - Still many tasks are created during configuration phase. Tackled in separate steps	2020-07-31 13:09:04 +02:00
Julie Tibshirani	dfd7f226f0	Clarify SourceLookup sharing across fetch subphases. (#60484 ) The `SourceLookup` class provides access to the _source for a particular document, specified through `SourceLookup#setSegmentAndDocument`. Previously the search context contained a single `SourceLookup` that was shared between different fetch subphases. It was hard to reason about its state: is `SourceLookup` set to the expected document? Is the _source already loaded and available? Instead of using a global source lookup, the fetch hit context now provides access to a lookup that is set to load from the hit document. This refactor closes #31000, since the same `SourceLookup` is no longer shared between the 'fetch _source phase' and script execution.	2020-07-30 13:22:31 -07:00
Julie Tibshirani	5359417ec3	Minor clean-up around search highlight context. (#60422 ) * Rename SearchContextHighlight -> SearchHighlightContext. * Rename HighlighterContext to FieldHighlightContext. * Make the search highlight context immutable. * Avoid storing SearchHighlightContext on HighlighterContext.	2020-07-29 11:39:17 -07:00
Jake Landis	6ce30bea08	[7.x] Convert most OSS plugins from integTest to [yaml \| java]RestTest or internalClusterTest (#59444 ) (#60343 ) For all OSS plugins (except repository-* and discovery-*) integTest task is now a no-op and all of the tests are now executed via a test, yamlRestTest, javaRestTest, or internalClusterTest. This commit does NOT convert the discovery-* and repository-* plugins since they are a bit more complex than the rest of the tests and this PR is large enough. Those plugins will be addressed in a future PR(s). This commit also fixes a minor issue that did not copy the rest api for projects that only had YAML TEST tests. related: #56841	2020-07-29 13:06:13 -05:00
Jake Landis	f6abd67029	[7.x] Convert discovery-* from integTest to [yaml \| java]RestTest or internalClusterTest (#60084 ) (#60344 ) For OSS plugins that begin with discovery-*, the integTest task is now a no-op and all of the tests are now executed via a test, yamlRestTest, javaRestTest, or internalClusterTest. related: #56841 related: #59444	2020-07-29 11:20:19 -05:00
Jake Landis	96b7122917	[7.x] Convert repository-* from integTest to [yaml \| java]RestTest or internalClusterTest (#60085 ) (#60404 ) For OSS plugins that begin with repository-*, integTest task is now a no-op and all of the tests are now executed via a test, yamlRestTest, javaRestTest, or internalClusterTest. related: #56841 related: #59444	2020-07-29 11:19:44 -05:00
David Turner	bbacad648a	Fix network logging test failures (#60334 ) In #60297 we added some tests related to logging from the transport layer, but these tests failed occasionally since the cluster was kept alive between test invocations but the logging framework expected it only to be used for a single test. With this commit we reduce the scope of the internal test cluster to `TEST` to solve this problem. Closes #60321.	2020-07-29 08:29:09 +01:00
Julie Tibshirani	c7bfb5de41	Add search `fields` parameter to support high-level field retrieval. (#60258 ) This feature adds a new `fields` parameter to the search request, which consults both the document `_source` and the mappings to fetch fields in a consistent way. The PR merges the `field-retrieval` feature branch. Addresses #49028 and #55363.	2020-07-28 10:58:20 -07:00
David Turner	9c62b5cb96	Mute tests for #60321	2020-07-28 18:12:54 +01:00
David Turner	9450ea08b4	Log and track open/close of transport connections (#60297 ) Transport connections between nodes remain in place until one or other node shuts down or the connection is disrupted by a flaky network. Today it is very difficult to demonstrate that transient failures and cluster instability are caused by the network even though this is often the case. In particular, transport connections open and close without logging anything, even at `DEBUG` level, making it very hard to quantify the scale of the problem or to correlate the networking problems with external events. This commit adds the missing `DEBUG`-level logging when transport connections open and close, and also tracks the total number of transport connections a node has opened as a measure of the stability of the underlying network.	2020-07-28 17:08:04 +01:00
Yannick Welsch	ffe114b890	Set specific keepalive options by default on supported platforms (#59278 ) keepalives tell any intermediate devices that the connection remains alive, which helps with overzealous firewalls that are killing idle connections. keepalives are enabled by default in Elasticsearch, but use system defaults for their configuration, which oftentimes do not have reasonable values (e.g. 7200s for TCP_KEEP_IDLE) in the context of distributed systems such as Elasticsearch. This PR sets the socket-level keep_alive options for network.tcp.{keep_idle,keep_interval} to 5 minutes on configurations that support it (>= Java 11 & (MacOS \|\| Linux)) and where the system defaults are set to something higher than 5 minutes. By providing better out-of-the-box defaults, this helps keep the connections alive without overriding system defaults or user-specified settings unless they are deemed to be set too high.	2020-07-28 11:10:04 +02:00
Armin Braun	ebb6677815	Formalize and Streamline Buffer Sizes used by Repositories (#59771 ) (#60051 ) Due to complicated access checks (reads and writes execute in their own access context) on some repositories (GCS, Azure, HDFS), using a hard coded buffer size of 4k for restores was needlessly inefficient. By the same token, the use of stream copying with the default 8k buffer size for blob writes was inefficient as well. We also had dedicated, undocumented buffer size settings for HDFS and FS repositories. For these two we would use a 100k buffer by default. We did not have such a setting for e.g. GCS though, which would only use an 8k read buffer which is needlessly small for reading from a raw `URLConnection`. This commit adds an undocumented setting that sets the default buffer size to `128k` for all repositories. It removes wasteful allocation of such a large buffer for small writes and reads in case of HDFS and FS repositories (i.e. still using the smaller buffer to write metadata) but uses a large buffer for doing restores and uploading segment blobs. This should speed up Azure and GCS restores and snapshots in a non-trivial way as well as save some memory when reading small blobs on FS and HDFS repositories.	2020-07-22 21:06:31 +02:00
Nik Everett	6f6076e208	Drop some params from IndexFieldData.Builder (backport of #59934 ) (#59972 ) We never used the `IndexSettings` parameter and we only used the `MappedFieldType` parameter to get the name of the field which we already know everywhere where we build the `IFD.Builder`. This allows us to drop a fair bit of ceremony from a couple of tests.	2020-07-21 10:28:59 -04:00
Ignacio Vera	f8037abf47	upgrade to lucene-8.6.0 release (#59596 ) (#59599 )	2020-07-15 12:40:57 +02:00
Armin Braun	2dd086445c	Enable Fully Concurrent Snapshot Operations (#56911 ) (#59578 ) Enables fully concurrent snapshot operations: * Snapshot create- and delete operations can be started in any order * Delete operations wait for snapshot finalization to finish, are batched as much as possible to improve efficiency and once enqueued in the cluster state prevent new snapshots from starting on data nodes until executed * We could be even more concurrent here in a follow-up by interleaving deletes and snapshots on a per-shard level. I decided not to do this for now since it seemed not worth the added complexity yet. Due to batching+deduplicating of deletes the pain of having a delete stuck behind a long-running snapshot seemed manageable (dropped client connections + resulting retries don't cause issues due to deduplication of delete jobs, batching of deletes allows enqueuing more and more deletes even if a snapshot blocks for a long time that will all be executed in essentially constant time (due to bulk snapshot deletion, deleting multiple snapshots is mostly about as fast as deleting a single one)) * Snapshot creation is completely concurrent across shards, but per shard snapshots are linearized for each repository as are snapshot finalizations See updated JavaDoc and added test cases for more details and illustration on the functionality. Some notes: The queuing of snapshot finalizations and deletes and the related locking/synchronization is a little awkward in this version but can be much simplified with some refactoring. The problem is that snapshot finalizations resolve their listeners on the `SNAPSHOT` pool while deletes resolve the listener on the master update thread. With some refactoring both of these could be moved to the master update thread, effectively removing the need for any synchronization around the `SnapshotService` state. I didn't do this refactoring here because it's a fairly large change and not necessary for the functionality but plan to do so in a follow-up. This change allows for completely removing any trickery around synchronizing deletes and snapshots from SLM and 100% does away with SLM errors from collisions between deletes and snapshots. Snapshotting a single index in parallel to a long-running full backup will execute without having to wait for the long-running backup as required by the ILM/SLM use case of moving indices to "snapshot tier". Finalizations are linearized but ordered according to which snapshot saw all of its shards complete first	2020-07-15 03:42:31 +02:00
Armin Braun	e1014038e9	Simplify Repository.finalizeSnapshot Signature (#58834 ) (#59574 ) Many of the parameters we pass into this method were only used to build the `SnapshotInfo` instance to write. This change simplifies the signature. Also, it seems less error prone to build `SnapshotInfo` in `SnapshotsService` instead of relying on the fact that each repository implementation will build the correct `SnapshotInfo`.	2020-07-15 00:14:28 +02:00

2757 Commits