OpenSearch

Commit Graph

Author	SHA1	Message	Date
Ryan Ernst	7e9b957da5	Handle JAVA_HOME better in packaging tests (#62905) JAVA_HOME is set as necessary in packaging tests, depending on whether it is needed for no-jdk distributions or for testing override behavior. We currently rely on Gradle finding Java through PATH. However, JAVA_HOME can sometimes be set by the system itself, which then leaks through to the packaging tests. This commit reworks our handling of JAVA_HOME to pass it through to Gradle and then explicitly clear it whenever running shell commands in packaging tests.	2020-09-24 17:01:29 -07:00
Ryan Ernst	acd49f89c7	Re-enable PluginCliTests.test20SymlinkPluginsDir (#62736) This test was disabled with an AwaitsFix annotation, but the underlying issue has been worked around, so the test can be re-enabled. Relates #46050. Relates #58628.	2020-09-24 16:48:44 -07:00
Tim Brooks	59dd889c10	Split up large HTTP responses in outbound pipeline (#62666) Currently Netty compresses an entire HTTP response in one batch regardless of its content size, allocating a byte array at least as large as the uncompressed content. This works against our efforts to eliminate humongous G1GC allocations. This commit resolves the issue by splitting responses into 128KB chunks. As a side effect, large compressed outbound HTTP responses are now sent using chunked transfer encoding.	2020-09-24 16:35:52 -06:00
Tim Brooks	43a4882951	Move CorsHandler to server (#62007) Currently we duplicate our specialized CORS logic in all transport plugins. This is unnecessary, as it can be implemented in a single place. This commit moves the logic to server. Additionally, it fixes a bug where we were incorrectly closing HTTP channels on early CORS responses.	2020-09-24 16:32:59 -06:00
Ryan Ernst	9c0444145e	Avoid bundled jdk test on legacy platforms This commit skips a test of bundled jdk behavior on legacy platforms that can't run the bundled jdk.	2020-09-24 15:21:23 -07:00
Mayya Sharipova	54064a1eec	Unsigned long 64 bits (#62892) Introduce a 64-bit unsigned long field type. This field type supports: indexing of integer values in the range [0, 18446744073709551615]; precise queries (term, range); precise sort and terms aggregations. Other aggregations are based on converting long values to double and can be imprecise for large values. Backport of #60050. Closes #32434.	2020-09-24 16:51:47 -04:00
Andrei Stefan	a43f29cfc9	EQL: data streams tests for PIT and EQL sequences (#62850) (#62889) * PIT should run well with data streams (cherry picked from commit 0a89a7db848b015b797c7678874b5c9e33bbd650)	2020-09-24 23:37:46 +03:00
Alan Woodward	e28750b001	Add parameter update and conflict tests to MapperTestCase (#62828) (#62902) This commit adds a mechanism to MapperTestCase that allows implementing test classes to check that their parameters can be updated, or throw conflict errors as advertised. Child classes override the registerParameters method and tell the passed-in UpdateChecker class about their parameters. Simple conflicts can be checked using the existing minimal mappings as a base to compare against; alternatively, a particular initial mapping can be provided to check edge cases (e.g., norms can be updated from true to false, but not vice versa). Updates are registered with a predicate that checks that the update has in fact been applied to the resulting FieldMapper. Fixes #61631	2020-09-24 20:38:12 +01:00
Armin Braun	4b9ddb48b6	Add Missing Netty Runtime Proc Property to Security Tests (#62846) (#62890) As in the regular Netty tests, we have to disable the runtime proc setting in the normal test task, just as we already do for the internal cluster tests. Closes #61919. Closes #62298.	2020-09-24 20:48:38 +02:00
Jim Ferenczi	78a93dc18f	Request-level circuit breaker support on coordinating nodes (#62884) This commit allows the coordinating node to account for the memory used to perform partial and final reduces of aggregations in the request circuit breaker. The search coordinator adds the memory that it uses to save and reduce the results of shard aggregations to the request circuit breaker. Before any partial or final reduce, the memory needed to reduce the aggregations is estimated, and a CircuitBreakingException is thrown if it exceeds the maximum memory allowed in this breaker. This size is estimated as roughly 1.5 times the size of the serialized aggregations that need to be reduced. The estimate can be completely off for some aggregations, but it is corrected with the real size after the reduce completes. If the reduce is successful, we update the circuit breaker to remove the size of the source aggregations and replace the estimate with the serialized size of the newly reduced result. As a follow-up, we could trigger partial reduces based on the memory accounted in the circuit breaker instead of relying on a static number of shard responses. A simpler follow-up that could be done in the meantime is to [reduce the default batch reduce size](https://github.com/elastic/elasticsearch/issues/51857) of blocking search requests to a more reasonable number. Closes #37182	2020-09-24 18:59:28 +02:00
Dan Hermann	cd584d49dc	Bump version after 7.9.2 release	2020-09-24 10:48:57 -05:00
Nik Everett	719a76e4bd	Grok: "native" results (backport of #62843) (#62886) This adds the ability to fetch Java primitives like `long` and `float` from grok matches rather than their boxed versions. It also allows customizing which fields are extracted and how they are extracted. By default we continue to fetch a `Map<String, Object>`, but runtime fields will be able to fetch just the fields they are interested in, and the values will be primitives.	2020-09-24 11:47:13 -04:00
Andrei Dan	3590a77b2b	HLRC: add support for the searchable_snapshot ILM action (#62323) (#62887) (cherry picked from commit 681eb58718c4cce9ed18a835f4eadb06997e91a0) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-09-24 16:45:50 +01:00
Benjamin Trent	c56424f740	[ML] write deprecation warning when include_model_definition parameter is used (#62834) (#62885) For get trained models, include_model_definition is now deprecated. This commit writes a deprecation warning if that parameter is used and suggests that the caller use the replacement.	2020-09-24 11:38:54 -04:00
Stuart Tettemer	8d69334c2f	Scripting: Watcher defaults to unlimited compile rate (#62655) (#62671) Backport of #62655	2020-09-24 10:22:50 -05:00
Martijn van Groningen	8ca33feffd	Fail with correct error if first backing index exists when auto creating data stream (#62862) Backport #62825 to 7.x branch. Today, if a data stream is auto-created but an index with the same name as its first backing index already exists, that error is ignored internally. As a result, the bulk item later fails during execution of the bulk request because the data stream was never auto-created. This situation can only occur if an index with the same name as the data stream's first backing index is created before the data stream itself. Co-authored-by: Dan Hermann <danhermann@users.noreply.github.com>	2020-09-24 17:16:34 +02:00
Nik Everett	ce24115ba3	Speed up date_histogram by precomputing ranges (backport of #61467) (#62880) A few of us were talking about ways to speed up the `date_histogram` using the index for the timestamp rather than the doc values. To do that we'd have to pre-compute all of the "round down" points in the index. It turns out that just precomputing those values speeds up rounding fairly significantly:
```
Benchmark  (count)   (interval)      (range)                   (zone)            Mode  Cnt          Score           Error  Units
before     10000000  calendar month  2000-10-28 to 2000-10-31  UTC               avgt  10    96461080.982 ±  616373.011  ns/op
before     10000000  calendar month  2000-10-28 to 2000-10-31  America/New_York  avgt  10   130598950.850 ± 1249189.867  ns/op
after      10000000  calendar month  2000-10-28 to 2000-10-31  UTC               avgt  10    52311775.080 ±  107171.092  ns/op
after      10000000  calendar month  2000-10-28 to 2000-10-31  America/New_York  avgt  10    54800134.968 ±  373844.796  ns/op
```
That's a 46% speedup for UTC and a 58% speedup for America/New_York. This doesn't work for every time zone: zones that have two midnights in a single day due to daylight saving time would produce wonky results, so they don't get the optimization. Second, this requires some expensive computation up front to build the transition array. And if the transition array is too large, we give up and use the original mechanism, throwing away all of the work we did to build the array. This seems appropriate for most usages of `round`, but this change uses it for all usages of `round`. That seems OK for now, but it might be worth investigating in a follow-up. I ran a macrobenchmark as well, which showed an 11% performance improvement. BUT the benchmark wasn't tuned for my desktop, so it overwhelmed it and might have produced "funny" results. I think it is pretty clear that this is an improvement, but I know the measurement is weird:
```
| Metric                        | Task       |  Before |   After | Unit  |
|-------------------------------|------------|---------|---------|-------|
| Min Throughput                | hourly_agg |    0.11 |    0.12 | ops/s |
| Median Throughput             | hourly_agg |    0.11 |    0.12 | ops/s |
| Max Throughput                | hourly_agg |    0.11 |    0.12 | ops/s |
| 50th percentile latency       | hourly_agg |  650623 |  519254 | ms    |
| 90th percentile latency       | hourly_agg |  821478 |  653099 | ms    |
| 99th percentile latency       | hourly_agg |  859780 |  683276 | ms    |
| 100th percentile latency      | hourly_agg |  864030 |  686611 | ms    |
| 50th percentile service time  | hourly_agg | 9268.71 | 8371.41 | ms    |
| 90th percentile service time  | hourly_agg |    9380 | 8407.02 | ms    |
| 99th percentile service time  | hourly_agg | 9626.88 | 8536.64 | ms    |
| 100th percentile service time | hourly_agg | 9884.27 | 8538.54 | ms    |
| error rate                    | hourly_agg |       0 |       0 | %     |
```	2020-09-24 11:03:47 -04:00
Armin Braun	83ec8dd4e2	Upgrade GCS SDK to 1.113.1 (#62848) (#62864) Just staying on top of upgrades to the SDK and its dependencies.	2020-09-24 15:43:21 +02:00
Daniel Mitterdorfer	d2166030d1	Mute failing test case in DeleteExpiredDataIT (#62870) (#62871) Relates #62699	2020-09-24 15:42:52 +02:00
James Rodewig	20630b0088	[DOCS] Correct the documented behaviour of `track_total_hits` (#62837) (#62867) If `track_total_hits=true` is used, the exact number of hits is returned; that is, the count is effectively unbounded rather than capped at the default of 10,000. Co-authored-by: AndyHunt66 <andrew.hunt@elastic.co>	2020-09-24 09:18:38 -04:00
Daniel Mitterdorfer	00ce1d7e4b	Mute failing test in IndexRecoveryIT (#62865) (#62868) Relates #62863	2020-09-24 15:16:40 +02:00
Andrei Dan	e323c5245b	[7.x] ILM: migrate action configures the _tier_preference setting (#62829) (#62860) The `migrate` action will now configure the `index.routing.allocation.include._tier_preference` setting to the corresponding tiers. For the HOT phase it will configure `data_hot`, for the WARM phase `data_warm,data_hot`, and for the COLD phase `data_cold,data_warm,data_hot`. (cherry picked from commit 9dbf0e6f0c267e40c5bcfb568bb2254da103ae40) Signed-off-by: Andrei Dan <andrei.dan@elastic.co>	2020-09-24 13:37:09 +01:00
Rory Hunter	752590041e	Upgrade spotless dependency (#62857) No changes to formatted Java files.	2020-09-24 12:52:45 +01:00
Daniel Mitterdorfer	aec7c65af4	Mute DiskThresholdDeciderIT (#62858) (#62859) Relates #62326	2020-09-24 13:24:11 +02:00
Rory Hunter	7771d8b6fa	Tweak the ECS fields in DeprecatedMessage (#62855) Backport of #62855. Follow-up to #61484.	2020-09-24 12:07:48 +01:00
Costin Leau	71b92f8699	QL: Optimize Like/Rlike all (#62682) Replace common Like and RLike queries that match all characters with IsNotNull (exists) queries. Fix #62585. (cherry picked from commit 4c23fad0468a9edd7325b06c6a96f7af37625dbf)	2020-09-24 13:44:53 +03:00
Martijn van Groningen	8d73379493	Adjust skip version in data stream yaml test. (#62831) (#62851) Relates to #62766	2020-09-24 11:00:02 +02:00
Rory Hunter	1515951de5	Change approach to checking GID in Docker (#62751) Closes #62466. Since we're still seeing occasional failures when checking the GID of all files in the Docker image due to Elasticsearch running in the background, instead run a new container without ES running at all.	2020-09-24 09:36:11 +01:00
Hendrik Muhs	a70389015d	[Transform] Return parsed count for get transform stats (#62809) When there are more than 500 transforms, get and stats return paged results, which can be requested using page parameters. For more than 500 transforms, the count wasn't parsed from the server response but taken from the size of the returned list of transforms. The change also adds client/server HLRC tests and fixes a wrong type for count in get. Fixes #56245.	2020-09-24 08:38:07 +02:00
Ryan Ernst	1c26926dea	Avoid using bundled jdk on unsupported platforms (#62793) We use the bundled JDK for unit, integ and packaging tests. Since upgrading to JDK 15, CentOS 6 and Oracle Enterprise Linux 6 have failed due to versions of glibc no longer supported by the JDK. This commit adds detection of the old glibc versions to Gradle and uses it when deciding which JDK to use for tests. Relates #62709. Closes #62635.	2020-09-23 16:55:47 -07:00
Julie Tibshirani	f971146de4	Rename FieldValueRetriever -> FieldFetcher. (#62795) (#62836) The name `FieldFetcher` fits better with the 'fetch' terminology we use elsewhere, for example `FetchFieldsPhase` and `ValueFetcher`. This PR also moves the construction of the fetcher off the context and onto `FetchFieldsPhase`, which feels like a more natural place for it, and fixes a TODO in javadocs.	2020-09-23 10:12:23 -07:00
Nhat Nguyen	38c8a55df8	Better UUID for reader context (#62799) We can use a single and stronger UUID for all reader contexts created by the same SearchService. Backport of #62715	2020-09-23 12:50:18 -04:00
Julie Tibshirani	7ba0c95191	Mute ClusterHealthIT.testHealthOnMasterFailover while we await a fix.	2020-09-23 09:17:45 -07:00
Alan Woodward	7984e4e89f	Fix test bug in SpanMultiTermQueryBuilderTests (#62833) This test checks whether the index was created before version 6.4, in which case index prefixes are unavailable and so it expects to see a span multi-term wrapper. However, the production code doesn't bother checking versions, because if the field in question is configured with index_prefixes then it knows that it must have been created on or after 6.4 (you can't merge in a new index_prefixes configuration). This commit alters the test to remove the random version checks, as we know we will always have a prefix field available in this scenario. Fixes #58199	2020-09-23 17:02:12 +01:00
Martijn van Groningen	0baefc8ddc	Always validate that only a create op is allowed in bulk api for data streams (#62820) Backport #62766 to 7.x branch. The bulk API caches the resolved concrete indices when resolving the user-provided index name into the actual index name. The validation that prevents write ops other than create from being executed against a data stream was only performed if the result wasn't cached, so for cached resolutions the validation never occurred. As a result, the validation was skipped for all bulk items targeting a data stream after a create operation for that same data stream. This commit ensures that the validation is always performed for all bulk items, whether or not the concrete index resolution has been cached. Closes #62762	2020-09-23 16:27:54 +02:00
Nik Everett	f8bc5a3e6b	Grok: Handle utf-8 natively (backport of #62794) (#62826) This adds a method to `Grok` that matches against a section of a UTF-8 byte array, given an offset and length: ``` Map<String, Object> captures(byte[] utf8Bytes, int offset, int length) ``` This will be useful for the grok-flavored runtime fields because they want to match against UTF-8 encoded strings stored in a big array. And joni already supports this.	2020-09-23 09:33:03 -04:00
Dimitris Athanasiou	7de5201291	[7.x][ML] Handle data frame analytics state spreading over multiple docs (#62564) (#62824) When state persistence was first implemented for data frame analytics, we assumed that state would always fit in a single document. However, this is no longer the case. This commit adds handling of state that spreads over multiple documents. Backport of #62564	2020-09-23 16:16:34 +03:00
James Rodewig	e3d5915566	[DOCS] Fix JSON spec link for PIT API (#61783)	2020-09-23 14:29:06 +02:00
Armin Braun	a754fd8020	Fix CoordinatorTests.testLogsMessagesIfPublicationDelayed (#62815) (#62822) We need to account for an additional `DEFAULT_DELAY_VARIABILITY` timeout for the lag detector task to be executed after it is scheduled. Closes #62383	2020-09-23 14:23:28 +02:00
Dimitris Athanasiou	69e72656fa	[7.x][ML] Reset reindexing progress when DFA job resumes with incomplete reindexing (#62772) (#62816) This fixes reindexing progress in the scenario where a DFA job that had not finished reindexing is resumed (either because the user called stop and start or because the job was reassigned in the middle of reindexing). Before the fix, reindexing progress stayed at the value it had previously reached until the new run surpassed that value. When we resume a data frame analytics job, we want to preserve reindexing progress and reset all other phases, except when reindexing was not completed. In that case we delete the destination index and start reindexing from scratch, so we need to reset reindexing progress too. Backport of #62772	2020-09-23 14:09:04 +03:00
Christoph Büscher	054a950ceb	Align version field plugin naming (#62757) To better align the plugin naming with other mapper plugins under x-pack (e.g. mapper-flattened), this PR changes the plugin name and the containing directory to "mapper-version".	2020-09-23 11:50:15 +02:00
Christoph Büscher	29074e7055	Add case insensitive prefix and wildcard to 'version' field (#62754) (#62782) This change adds support for the recently introduced case insensitivity flag for wildcard and prefix queries. Since version field values are encoded differently, we need to adapt our own AutomatonQuery variation to add both cases if case insensitivity is turned on.	2020-09-23 11:48:34 +02:00
Ignacio Vera	81645ec2cc	nextSetBit should check if the underlying array contains the current word (#62805) (#62812) This is a recent addition and it is missing a check, as the underlying array can be smaller than the numBits capacity.	2020-09-23 11:17:26 +02:00
Luca Cavanna	862fab06d3	Share same existsQuery impl throughout mappers (#57607) Most of our field types have the same implementation for their `existsQuery` method, which relies on doc_values if present, otherwise queries norms if available, or else uses a term query against the _field_names meta field. This standard implementation is repeated in many different mappers. There are field types that only query doc_values, because they always have them, and field types that always query _field_names, because they never have norms or doc_values. We could apply the same standard logic to all of these field types, as `MappedFieldType` knows which data structures are available. This commit introduces a standard implementation that does the right thing depending on the data structure that is available. With that, only field types that require different behaviour need to override the existsQuery method. At the same time, this no longer forces subclasses to override `existsQuery`, so a needed override could be forgotten. To address this, we introduced a new test method in `MapperTestCase` that verifies the `existsQuery` being generated and its consistency with the available data structures.	2020-09-23 11:00:53 +02:00
Przemko Robakowski	005e0bffaf	[7.x] Make for each processor resistant to field modification (#62791) (#62807) * Make for each processor resistant to field modification (#62791) This change provides a consistent view of the field that the foreach processor is iterating over. That prevents it from going into an infinite loop and putting great pressure on the cluster. Closes #62790 * fix compilation	2020-09-23 10:46:00 +02:00
David Kyle	bc34ecc581	[ML] Mute annotations index upgrade mapping test (#62814) For #61908	2020-09-23 09:37:04 +01:00
Luca Cavanna	5ca86d541c	Move stored flag from TextSearchInfo to MappedFieldType (#62717) (#62770)	2020-09-23 09:40:34 +02:00
Albert Zaharovits	b4ec821067	Fix doc-update interceptor for indices with DLS and FLS (#61516) This fixes the protection against updates (and bulk updates) for indices with DLS and/or FLS, when the request uses date math expressions.	2020-09-23 08:55:22 +03:00
Nhat Nguyen	663b85b98f	Make keep alive optional in PointInTimeBuilder (#62720) Remove the keepAlive parameter from the constructor of PointInTimeBuilder as it's optional.	2020-09-22 18:52:54 -04:00
Nik Everett	7ffea4621d	Extract capture config from grok patterns up front (backport of #62706) (#62785) This extracts the configuration for extracting values from a grokked string when building the grok expression, to do two things: 1. Create a method exposing that configuration on `Grok` itself, which will be used by `grok`-flavored runtime fields. 2. Marginally speed up extracting grok values by skipping a little string manipulation.	2020-09-22 17:44:42 -04:00
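
Commit 54064a1eec introduces an unsigned long field type covering [0, 18446744073709551615] and warns that aggregations working on doubles can be imprecise for large values. Java has no unsigned 64-bit primitive. The following is a minimal sketch, assuming only standard JDK helpers, of one common way to parse, order, and approximate such values. It is not the field mapper's code. The class `UnsignedLongSketch` and the sign-bit-flip encoding are illustrative assumptions.

```java
import java.math.BigInteger;

// Hypothetical illustration only; not the Elasticsearch unsigned_long mapper code.
final class UnsignedLongSketch {
    // Upper bound quoted in the commit message: 2^64 - 1.
    static final String MAX = "18446744073709551615";

    // Parses a decimal string in [0, 2^64 - 1] into the raw 64 bits of a long.
    // Long.parseUnsignedLong throws NumberFormatException for negative or too-large input.
    static long parse(String decimal) {
        return Long.parseUnsignedLong(decimal);
    }

    // Flipping the sign bit maps unsigned order onto signed long order,
    // so the result can be stored and sorted as an ordinary signed long.
    static long toSortable(long unsignedBits) {
        return unsignedBits ^ Long.MIN_VALUE;
    }

    // Inverse of toSortable: XOR with the same mask undoes itself.
    static long fromSortable(long sortable) {
        return sortable ^ Long.MIN_VALUE;
    }

    // Conversion to double, as approximate aggregations would use it:
    // exact up to 2^53, rounded to the nearest representable double above that.
    static double toDouble(long unsignedBits) {
        return new BigInteger(Long.toUnsignedString(unsignedBits)).doubleValue();
    }
}
```

For example, `parse("18446744073709551614")` still sorts below `parse(MAX)` after `toSortable`. However, `toDouble` maps both values to the same double (2^64), which is the kind of imprecision the commit message warns about.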

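Commit 78a93dc18f describes how the coordinating node accounts for memory during aggregation reduces:

1. Before each partial or final reduce, it reserves an estimate of roughly 1.5 times the serialized size of the aggregations.
2. It fails if the breaker limit would be exceeded.
3. On success, it replaces both the source sizes and the estimate with the real size of the reduced result.

The sketch below walks through that arithmetic under stated assumptions. `SimpleBreaker` is a hypothetical stand-in rather than Elasticsearch's `CircuitBreaker` API. It throws `IllegalStateException` where the real code throws `CircuitBreakingException`.

```java
import java.util.List;

// Hypothetical sketch of the accounting described in commit 78a93dc18f.
final class ReduceAccountingSketch {

    // Stand-in for a request circuit breaker; not the Elasticsearch interface.
    static final class SimpleBreaker {
        private final long limitBytes;
        private long usedBytes;

        SimpleBreaker(long limitBytes) {
            this.limitBytes = limitBytes;
        }

        // Adds bytes, failing if the new total would exceed the limit.
        void addEstimateBytesAndMaybeBreak(long bytes) {
            if (usedBytes + bytes > limitBytes) {
                throw new IllegalStateException(
                    "request breaker tripped: " + (usedBytes + bytes) + " > " + limitBytes);
            }
            usedBytes += bytes;
        }

        // Adjusts the total without checking the limit (used for corrections).
        void addWithoutBreaking(long bytes) {
            usedBytes += bytes;
        }

        long usedBytes() {
            return usedBytes;
        }
    }

    static long sum(List<Long> sizes) {
        long total = 0;
        for (long s : sizes) {
            total += s;
        }
        return total;
    }

    // Roughly 1.5x the serialized size of the aggregations to reduce.
    static long estimateReduceBytes(List<Long> serializedSizes) {
        long total = sum(serializedSizes);
        return total + total / 2;
    }

    // Accounts one partial or final reduce. The source sizes are assumed to be
    // already counted in the breaker (added when the shard results arrived).
    static void accountReduce(SimpleBreaker breaker, List<Long> sourceSizes, long reducedSize) {
        long estimate = estimateReduceBytes(sourceSizes);
        breaker.addEstimateBytesAndMaybeBreak(estimate); // may throw before any reduce work
        // ... the actual reduce would run here ...
        // On success, drop the estimate and the sources; keep only the reduced result.
        breaker.addWithoutBreaking(reducedSize - estimate - sum(sourceSizes));
    }
}
```

Consider a 1000-byte limit with two 100-byte shard results already accounted. The reduce reserves 300 bytes (a peak of 500) and then settles at the 150-byte reduced size. With a 400-byte limit, the same reduce trips the breaker before any work is done.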

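Commit ce24115ba3 speeds up `date_histogram` rounding by precomputing the "round down" points, and it gives up when the array would be too large. A minimal sketch of that idea for calendar-month rounding might look like the following. It assumes a known [min, max] range and uses `java.time` for the slow path. The class name, the `maxPoints` threshold, and the binary-search lookup are illustrative assumptions, not the `Rounding` code from the commit. The sketch computes each boundary with `LocalDate.atStartOfDay` and makes no attempt to reproduce the commit's special handling of time zones with two midnights.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.Arrays;

// Hypothetical sketch: precompute calendar-month "round down" points for a known
// time range, then round by binary search. Not the Elasticsearch Rounding code.
final class PrecomputedMonthRounding {
    private final ZoneId zone;
    private final long maxMillis;
    private final long[] points; // sorted month starts (epoch millis), or null if we gave up

    PrecomputedMonthRounding(ZoneId zone, long minMillis, long maxMillis, int maxPoints) {
        this.zone = zone;
        this.maxMillis = maxMillis;
        long[] buf = new long[maxPoints];
        int n = 0;
        LocalDate month = firstOfMonth(minMillis);
        while (true) {
            long start = month.atStartOfDay(zone).toInstant().toEpochMilli();
            if (start > maxMillis) {
                break;
            }
            if (n == maxPoints) { // array would be too large: discard the work done so far
                buf = null;
                break;
            }
            buf[n++] = start;
            month = month.plusMonths(1);
        }
        this.points = buf == null ? null : Arrays.copyOf(buf, n);
    }

    boolean isPrecomputed() {
        return points != null;
    }

    private LocalDate firstOfMonth(long millis) {
        return Instant.ofEpochMilli(millis).atZone(zone).toLocalDate().withDayOfMonth(1);
    }

    // Slow path: compute the month start directly with java.time.
    long roundSlow(long millis) {
        return firstOfMonth(millis).atStartOfDay(zone).toInstant().toEpochMilli();
    }

    // Fast path: the largest precomputed point <= millis; outside the range, use the slow path.
    long round(long millis) {
        if (points == null || points.length == 0 || millis < points[0] || millis > maxMillis) {
            return roundSlow(millis);
        }
        int idx = Arrays.binarySearch(points, millis);
        return points[idx >= 0 ? idx : -idx - 2];
    }
}
```

If `maxPoints` is too small, the constructor discards the partially built array and every call falls back to `roundSlow`. This mirrors the "give up" path the commit message describes.
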
54120 Commits (All Branches)