OpenSearch

Commit Graph

Author	SHA1	Message	Date
Ryan Ernst	5fba8cbc7b	Rename local Environment var in Node to avoid confusion (#52602 ) When the Node class is being constructed, an initial environment is passed in with the initial settings for the node. Once the plugin servicie is initialized, the final Environment+Settings are created, at which point the initial environment should no longer be used. This commit renames the constructor arg to avoid naming clashes with the final environment variable.	2020-02-24 11:14:46 -08:00
Lee Hinman	7d9de8412a	[7.x] fix npe in RestPluginsAction (#52620 ) (de56de9a) (#52721 ) Relates #45321 Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: Kaihong.Wang <kyra.wkh@alibaba-inc.com> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-02-24 11:57:01 -07:00
Mayya Sharipova	034b1c0ba3	Correct boost calculation in script_score query (#52478 ) (#52724 ) Before boost in script_score query was wrongly applied only to the subquery. This commit makes sure that the boost is applied to the whole score that comes out of script. Closes #48465	2020-02-24 13:48:21 -05:00
Adrien Grand	f993ef80f8	Move the terms index of `_id` off-heap. (#52518 ) In #42838 we moved the terms index of all fields off-heap except the `_id` field because we were worried it might make indexing slower. In general, the indexing rate is only affected if explicit IDs are used, as otherwise Elasticsearch almost never performs lookups in the terms dictionary for the purpose of indexing. So it's quite wasteful to require the terms index of `_id` to be loaded on-heap for users who have append-only workloads. Furthermore I've been conducting benchmarks when indexing with explicit ids on the http_logs dataset that suggest that the slowdown is low enough that it's probably not worth forcing the terms index to be kept on-heap. Here are some numbers for the median indexing rate in docs/s: \| Run \| Master \| Patch \| \| --- \| ------- \| ------- \| \| 1 \| 45851.2 \| 46401.4 \| \| 2 \| 45192.6 \| 44561.0 \| \| 3 \| 45635.2 \| 44137.0 \| \| 4 \| 46435.0 \| 44692.8 \| \| 5 \| 45829.0 \| 44949.0 \| And now heap usage in MB for segments: \| Run \| Master \| Patch \| \| --- \| ------- \| -------- \| \| 1 \| 41.1720 \| 0.352083 \| \| 2 \| 45.1545 \| 0.382534 \| \| 3 \| 41.7746 \| 0.381285 \| \| 4 \| 45.3673 \| 0.412737 \| \| 5 \| 45.4616 \| 0.375063 \| Indexing rate decreased by 1.8% on average, while memory usage decreased by more than 100x. The `http_logs` dataset contains small documents and has a simple indexing chain. More complex indexing chains, e.g. with more fields, ingest pipelines, etc. would see an even lower decrease of indexing rate.	2020-02-24 18:14:12 +01:00
Alan Woodward	7dc41a3b83	Use BoostQuery rather than FunctionScoreQuery for query-time indices_boost (#52272 ) This is a trivial change, but it should result in a slightly more efficient query boost.	2020-02-24 14:41:46 +00:00
Nik Everett	d26d7721ea	Continue realizing sorting by aggregations (backport of #52298 ) (#52667 ) This drops more of the `instanceof`s from `AggregationPath`. There are still a couple in `AggregationPath`. And I ended up moving two into `BucketsAggregator`, but I think this is still an improvement!	2020-02-23 17:13:55 -05:00
bellengao	02cb5b6c0e	Return 429 status code on read_only_allow_delete index block (#50166 ) We consider index level read_only_allow_delete blocks temporary since the DiskThresholdMonitor can automatically release those when an index is no longer allocated on nodes above high threshold. The rest status has therefore been changed to 429 when encountering this index block to signal retryability to clients. Related to #49393	2020-02-22 16:24:25 +01:00
Jay Modi	8abfda0b59	Rename assertThrows to prevent naming clash (#52651 ) This commit renames ElasticsearchAssertions#assertThrows to assertRequestBuilderThrows and assertFutureThrows to avoid a naming clash with JUnit 4.13+ and static imports of these methods. Additionally, these methods have been updated to make use of expectThrows internally to avoid duplicating the logic there. Relates #51787 Backport of #52582	2020-02-21 13:30:11 -07:00
Stuart Tettemer	376932a47d	Scripting: split out compile limits and caching (#52498 ) (#52652 ) Phase 1 of adding compilation limits per context. * Refactor rate limiting and caching into separate class, `ScriptCache`, which will be used per context. * Disable compilation limit for certain tests. Backport of 0866031 Refs: #50152	2020-02-21 12:10:51 -07:00
Jay Modi	f3f6ff97ee	Single instance of the IndexNameExpressionResolver (#52604 ) This commit modifies the codebase so that our production code uses a single instance of the IndexNameExpressionResolver class. This change is being made in preparation for allowing name expression resolution to be augmented by a plugin. In order to remove some instances of IndexNameExpressionResolver, the single instance is added as a parameter of Plugin#createComponents and PersistentTaskPlugin#getPersistentTasksExecutor. Backport of #52596	2020-02-21 07:50:02 -07:00
markharwood	96d603979b	Upgrade Lucene to 8.5.0-snapshot-b01d7cb (#52584 ) Upgrading 7x to same Lucene 8.5 version used in master	2020-02-21 10:25:03 +00:00
Armin Braun	0a09e15959	Add Caching for RepositoryData in BlobStoreRepository (#52341 ) (#52566 ) Cache latest `RepositoryData` on heap when it's absolutely safe to do so (i.e. when the repository is in strictly consistent mode). `RepositoryData` can safely be assumed to not grow to a size that would cause trouble because we often have at least two copies of it loaded at the same time when doing repository operations. Also, concurrent snapshot API status requests currently load it independently of each other and so on, making it safe to cache on heap and assume as "small" IMO. The benefits of this move are: * Much faster repository status API calls * listing all snapshot names becomes instant * Other operations are sped up massively too because they mostly operate in two steps: load repository data then load multiple other blobs to get the additional data * Additional cloud cost savings * Better resiliency, saving another spot where an IO issue could break the snapshot * We can simplify a number of spots in the current code that currently pass around the repository data in tricky ways to avoid loading it multiple times in follow ups.	2020-02-21 10:20:07 +01:00
Armin Braun	4bb780bc37	Refactor Inflexible Snapshot Repository BwC (#52365 ) (#52557 ) * Refactor Inflexible Snapshot Repository BwC (#52365) Transport the version to use for a snapshot instead of whether to use shard generations in the snapshots in progress entry. This allows making upcoming repository metadata changes in a flexible manner in an analogous way to how we handle serialization BwC elsewhere. Also, exposing the version at the repository API level will make it easier to do BwC relevant changes in derived repositories like source only or encrypted.	2020-02-21 09:14:34 +01:00
Ignacio Vera	107f00a4ec	Add support for multipoint geoshape queries (#52133 ) (#52553 ) Currently multi-point queries are not supported when indexing your data using BKD-backed geoshape strategy. This commit removes this limitation.	2020-02-21 07:45:53 +01:00
Yannick Welsch	d76358c875	Deprecate fixed_auto_queue_size thread pool type (#52399 ) Relates #52280	2020-02-20 11:11:06 +01:00
Yannick Welsch	3afb5ca133	Fix synchronization in ByteSizeCachingDirectory (#52512 ) One particular code place was synchronizing on the wrong object.	2020-02-19 16:10:39 +01:00
Przemysław Witek	7cd997df84	[ML] Make ml internal indices hidden (#52423 ) (#52509 )	2020-02-19 14:02:32 +01:00
Ignacio Vera	8d2261fe47	Refactor GeoShapeIndexer by extracting polygon / line decomposers (#52422 ) (#52506 ) Refactor GeoShapeIndexer. We extract Polygon and Line decomposers which are in charge of breaking a shape around the dateline if needed.	2020-02-19 12:04:29 +01:00
Henning Andersen	9d40277d4c	Deciders should not by default collect yes'es (#52438 ) AllocationDeciders would collect Yes decisions when not asking for debug info. Changed to only include Yes decisions when debug is requested (explain).	2020-02-19 11:18:03 +01:00
Henning Andersen	d4bc3b75dc	Reindex: allow comma separated source indices (#52044 ) Added ability to specify comma separated list of source indices without array. Also fixed so that empty string results in validation error rather than index does not exist. Closes #51949	2020-02-19 09:23:15 +01:00
David Turner	baf184c93f	Avoid using WindowsFS in ClusterRerouteIT (#52488 ) Issue #52000 looks like a case of cluster state updates being slower than expected, but it seems that these slowdowns are relatively rare: most invocations of `testDelayWithALargeAmountOfShards` take well under a minute in CI, but there are occasional failures that take 6+ minutes instead. When it fails like this, cluster state persistence seems generally slow: most are slower than expected, with some small updates even taking over 2 seconds to complete. The failures all have in common that they use `WindowsFS` to emulate Windows' behaviour of refusing to delete files that are still open, by tracking all files (really, inodes) and validating that deleted files are really closed first. There is a suggestion that this is a little slow in the Lucene test framework [1]. To see if we can attribute the slowdown to that common factor, this commit suppresses the use of `WindowsFS` for this test suite. [1] `4a513fa99f/lucene/test-framework/src/java/org/apache/lucene/util/TestRuleTemporaryFilesCleanup.java (L166)`	2020-02-19 07:52:49 +00:00
Tim Brooks	8038f9bba6	Do not lock when generating time based uuid (#52436 ) Currently we lock when generating time based uuids. The lock is implemented to prevent concurrent writes to the last timestamp. The uuid generation is an area of contention when indexing. This commit modifies the code to use atomic compare and set operations to update the last timestamp.	2020-02-18 09:55:51 -07:00
Tim Brooks	7fcd997b39	Do not lock on settings keyset if keys initialized (#52435 ) Every time a setting#exist call is made we lock on the keyset to ensure that it has been initialized. This a heavyweight operation that only should be done once. This commit moves to a volatile read instead to prevent unnecessary locking.	2020-02-18 09:36:07 -07:00
Tim Brooks	a742c58d45	Extract a ConnectionManager interface (#51722 ) Currently we have three different implementations representing a `ConnectionManager`. There is the basic `ConnectionManager` which holds all connections for a cluster. And a remote connection manager which support proxy behavior. And a stubbable connection manager for tests. The remote and stubbable instances use the delegate pattern, so this commit extracts an interface for them all to implement.	2020-02-18 09:19:24 -07:00
Benedict Jin	0c4f7dc193	Minor code improvements (#51921 ) Fix some whitespaces, comments and usage of `this.`. (cherry picked from commit 9f59900bf6389172811eb2279c17a2dc7cd9dfdf)	2020-02-18 16:00:05 +01:00
David Turner	3d57a78deb	Add extra logging for investigation into #52000 (#52472 ) It looks like #52000 is caused by a slowdown in cluster state application (maybe due to #50907) but I would like to understand the details to ensure that there's nothing else going on here too before simply increasing the timeout. This commit enables some relevant `DEBUG` loggers and also captures stack traces from all threads rather than just the three hottest ones.	2020-02-18 13:02:33 +00:00
Armin Braun	57d6dd7e31	Fix Non-Verbose Snapshot List Missing Empty Snapshots (#52433 ) (#52456 ) We were not including snapshots without indices in the non-verbose listing because we used the snapshot -> indices mapping to get the snapshots.	2020-02-18 11:37:53 +01:00
Armin Braun	cc628748e1	Optimize FilterStreamInput for Network Reads (#52395 ) (#52403 ) When `FilterStreamInput` wraps a Netty `ByteBuf` based stream it did not forward the bulk primitive reads to the delegate. These are optimized on the delegate but if they're not forwarded then the delegate will be called e.g. 4 times to read an `int`. This happens for essentially all network reads prior to this change because they all run from a `NamedWritableAwareStreamInput`. This also required optimising `BufferedChecksumStreamInput` individually to use bulk reads from the buffer because it implicitly assumed that the filter stream input wouldn't override any of the bulk operations.	2020-02-17 13:07:19 +01:00
Nik Everett	146def8caa	Implement top_metrics agg (#51155 ) (#52366 ) The `top_metrics` agg is kind of like `top_hits` but it only works on doc values so it should be faster. At this point it is fairly limited in that it only supports a single, numeric sort and a single, numeric metric. And it only fetches the "very topest" document worth of metric. We plan to support returning a configurable number of top metrics, requesting more than one metric and more than one sort. And, eventually, non-numeric sorts and metrics. The trick is doing those things fairly efficiently. Co-Authored by: Zachary Tong <zach@elastic.co>	2020-02-14 11:19:11 -05:00
Nik Everett	53b6583fed	Decode max and min optimization more carefully (#52336 ) (#52358 ) Fixes the the no-query optimization for `min` and `max` aggregations for `date_nanos` fields by delegating decoding dates "through" their `resolution` member. Closes #52220	2020-02-14 07:07:56 -05:00
Julie Tibshirani	0d7165a40b	Standardize naming of fetch subphases. (#52171 ) This commit makes the names of fetch subphases more consistent: * Now the names end in just 'Phase', whereas before some ended in 'FetchSubPhase'. This matches the query subphases like AggregationPhase. * Some names include 'fetch' like FetchScorePhase to avoid ambiguity about what they do.	2020-02-13 13:00:46 -08:00
Nik Everett	2dac36de4d	HLRC support for string_stats (#52163 ) (#52297 ) This adds a builder and parsed results for the `string_stats` aggregation directly to the high level rest client. Without this the HLRC can't access the `string_stats` API without the elastic licensed `analytics` module. While I'm in there this adds a few of our usual unit tests and modernizes the parsing.	2020-02-12 19:25:05 -05:00
Nik Everett	7efce22f19	Fix a DST error in date_histogram (backport #52016 ) (#52237 ) When `date_histogram` attempts to optimize itself it for a particular time zone it checks to see if the entire shard is within the same "transition". Most time zone transition once every size months or thereabouts so the optimization can usually kicks in. But it crashes when you attempt feed it a time zone who's last DST transition was before epoch. The reason for this is a little twisted: before this patch it'd find the next and previous transitions in milliseconds since epoch. Then it'd cast them to `Long`s and pass them into the `DateFieldType` to check if the shard's contents were within the range. The trouble is they are then converted to `String`s which are then parsed back to `Instant`s which are then convertd to `long`s. And the parser doesn't like most negative numbers. And everything before epoch is negative. This change removes the `long` -> `Long` -> `String` -> `Instant` -> `long` chain in favor of passing the `long` -> `Instant` -> `long` which avoids the fairly complex parsing code and handles a bunch of interesting edge cases around epoch. And other edge cases around `date_nanos`. Closes #50265	2020-02-12 17:57:04 -05:00
Nhat Nguyen	12cb6dcefe	Fix testFlushOnInactive (#52275 ) We need to reduce the translog sync interval for indices with translog async setting so that we can have the safe commit in the assertBusy interval. This is needed since #51905, where we use the local checkpoint of the safe commit to calculate the number of uncommitted operations of a translog stats. Closes #52251 Relates #51905	2020-02-12 17:19:02 -05:00
Jay Modi	5bcc6fce5c	Remove DeprecationLogger from route objects (#52285 ) This commit removes the need for DeprecatedRoute and ReplacedRoute to have an instance of a DeprecationLogger. Instead the RestController now has a DeprecationLogger that will be used for all deprecated and replaced route messages. Relates #51950 Backport of #52278	2020-02-12 15:05:41 -07:00
Marios Trivyzas	dac720d7a1	Add a cluster setting to disallow expensive queries (#51385 ) (#52279 ) Add a new cluster setting `search.allow_expensive_queries` which by default is `true`. If set to `false`, certain queries that have usually slow performance cannot be executed and an error message is returned. - Queries that need to do linear scans to identify matches: - Script queries - Queries that have a high up-front cost: - Fuzzy queries - Regexp queries - Prefix queries (without index_prefixes enabled - Wildcard queries - Range queries on text and keyword fields - Joining queries - HasParent queries - HasChild queries - ParentId queries - Nested queries - Queries on deprecated 6.x geo shapes (using PrefixTree implementation) - Queries that may have a high per-document cost: - Script score queries - Percolate queries Closes: #29050 (cherry picked from commit a8b39ed842c7770bd9275958c9f747502fd9a3ea)	2020-02-12 22:56:14 +01:00
Ryan Ernst	c07f46409c	Fix single newline in logging output stream buffer (#52253 ) The buffer in LoggingOutputStream skips flushing when only a newline appears. However, if a windows newline appeared, the buffer length was not reset. This commit resets the length so the \r does not appear in the next logging message. closes #51838	2020-02-12 10:48:55 -08:00
Nhat Nguyen	e098e837f7	Fix testShouldPeriodicallyFlushAfterMerge (#52243 ) MockRandomMergePolicy randomly determines if a segment should use a compound format. This can cause a force merge performing two merges: (1) merging to a single segment, (2) rewriting the new segment using the compound format. If the second merge completes after we have flushed, then it can flip the flag shouldPeriodicallyFlushAfterBigMerge to true. Closes #52205	2020-02-12 11:25:39 -05:00
Gordon Brown	d48ce12920	Convert ILM and SLM histories into hidden indices (#51456 ) Modifies SLM's and ILM's history indices to be hidden indices for added protection against accidental querying and deletion, and improves IndexTemplateRegistry to handle upgrading index templates. Also modifies the REST test cleanup to delete hidden indices.	2020-02-11 14:18:55 -07:00
Nik Everett	86d5211c05	Make sorting by an agg results a real abstraction (#52007 ) (#52212 ) This removes a bunch of `instanceof`s in favor of two new methods on `InernalAggregation`. The default implementations of these methods just throw exceptions explaining that you can't sort on this aggregation. They are overridden by all of the classes that used to have `instanceof` checks against them. I doubt this is really any faster in practice. The real benefit here is that it is a little more obvious that you can sort by the results of an aggregation and it should be much more obvious where to look at how aggregations sort themselves. There are still a bunch more `instanceof`s in left in `AggregationPath` but those will wait for a followup change.	2020-02-11 12:58:40 -05:00
Hendrik Muhs	098380e483	Percentiles aggregation validation checks for range (#51871 ) disallow to specify percentile out of range [0,100]. This also fixes a problem in transform by failing validation if an invalid percentile configuration is used.	2020-02-11 17:25:39 +01:00
David Roberts	473468d763	[ML] Better error when persistent task assignment disabled (#52014 ) Changes the misleading error message when attempting to open a job while the "cluster.persistent_tasks.allocation.enable" setting is set to "none" to a clearer message that names the setting. Closes #51956	2020-02-11 15:23:21 +00:00
Zachary Tong	87854573e4	Add version constant for 7.6.1	2020-02-11 09:44:43 -05:00
Igor Motov	667e1a5225	Add Boxplot Aggregation (#52174 ) Adds a `boxplot` aggregation that calculates min, max, medium and the first and the third quartiles of the given data set. Closes #33112	2020-02-11 09:38:17 -05:00
David Turner	00b9098250	Ignore timeouts with single-node discovery (#52159 ) Today we use `cluster.join.timeout` to prevent nodes from waiting indefinitely if joining a faulty master that is too slow to respond, and `cluster.publish.timeout` to allow a faulty master to detect that it is unable to publish its cluster state updates in a timely fashion. If these timeouts occur then the node restarts the discovery process in an attempt to find a healthier master. In the special case of `discovery.type: single-node` there is no point in looking for another healthier master since the single node in the cluster is all we've got. This commit suppresses these timeouts and instead lets the node wait for joins and publications to succeed no matter how long this might take.	2020-02-11 14:15:01 +00:00
David Kyle	343ced42be	Mute LoggingOutputStreamTests.testMaxBuffer (#52193 ) Relates to https://github.com/elastic/elasticsearch/issues/51838	2020-02-11 11:46:17 +00:00
Gordon Brown	350288ddf8	Check dot-index rules after template application (#52087 ) Previously, the dot-index rules (namely, that indices with dot-prefixed names should be either hidden indices or system indices) was done before* template application, and so only checked for the `index.hidden` setting in the request, ignoring if that setting was set via a template. This commit moves that check to a different method, which is applied after templates have been resolved and applied to the index settings.	2020-02-10 17:01:59 -07:00
Ryan Ernst	88cf8ac0a8	Fix windows empty line in logging capture (#52162 ) This commit fixes another edge case in handling windows newlines in our capture of stdout/stderr to log4j. The case is that the \r appears at the beginning of the buffer when flushing, which would unintentionally be emitted as an empty string. This commit skips the flush if only a \r was found. closes #51838	2020-02-10 13:29:50 -08:00
Julie Tibshirani	28a8db730f	In FieldTypeLookup, factor out flat object field logic. (#52091 ) Currently, the logic for looking up `flattened` field types lives in the top-level `FieldTypeLookup`. This PR moves it into a dedicated class `DynamicKeyFieldTypeLookup`.	2020-02-10 10:44:02 -08:00
Armin Braun	d8169e5fdc	Don't Upload Redundant Shard Files (#51729 ) (#52147 ) Segment(s) info blobs are already stored with their full content in the "hash" field in the shard snapshot metadata as long as they are smaller than 1MB. We can make use of this fact and never upload them physically to the repo. This saves a non-trivial number of uploads and downloads when restoring and might also lower the latency of searchable snapshots since they can save phyiscally loading this information as well.	2020-02-10 16:50:09 +01:00

1 2 3 4 5 ...

4208 Commits