OpenSearch

Commit Graph

Author	SHA1	Message	Date
Zachary Tong	3fcf598b92	Reduce deprecation log noise from DateIntervalWrapper (#52655 ) Converts the deprecations to `deprecatedAndMaybeLog` to reduce the number of times we log deprecations, since some of these could be called at a high frequency (due to unconverted queries, aggs, etc.)	2020-03-03 17:08:10 -05:00
Jay Modi	c610e0893d	Introduce system index APIs for Kibana (#53035 ) This commit introduces a module for Kibana that exposes REST APIs that will be used by Kibana for access to its system indices. These APIs are wrapped versions of the existing REST endpoints. A new setting is also introduced since the Kibana system indices' names are allowed to be changed by a user in case multiple instances of Kibana use the same instance of Elasticsearch. Additionally, the ThreadContext has been extended to indicate that the use of system indices may be allowed in a request. This will be built upon in the future for the protection of system indices. Backport of #52385	2020-03-03 14:11:36 -07:00
Nik Everett	7339427af5	Remove some deprecation warnings parsing aggs (backport of #53026 ) (#53072 ) With #50871, aggregations should now be parsed directly by an `ObjectParser` or `ConstructingObjectParser` without the need for the ceremonial `parse` method. This removes 10 of those `parse` methods and parses the aggregation directly from their `ObjectParser`.	2020-03-03 15:27:49 -05:00
Luca Cavanna	8a05b670ca	Address MinAndMax generics warnings (#52642 ) `MinAndMax` encapsulates min and max values for a shard. It uses generics to make sure that the values are of the same type and are also comparable. However, every current use of this class triggers generics warnings, which this commit addresses. Relates to #49092	2020-03-03 16:08:10 +01:00
Adrien Grand	cb868d2f5e	Introduce a `constant_keyword` field. (#49713 ) (#53024 ) This field is a specialization of the `keyword` field for the case when all documents have the same value. It typically performs more efficiently than keywords at query time by figuring out whether all or none of the documents match at rewrite time, like `term` queries on `_index`. The name is up for discussion. I liked including `keyword` in it, so that we still have room for a `singleton_numeric` in the future. However, I'm unsure whether to call it `singleton`, `constant` or something else; any opinions? For this field there is a choice between (1) accepting values in `_source` when they are equal to the value configured in mappings, but rejecting mapping updates, and (2) rejecting values in `_source` but then allowing updates to the value that is configured in the mapping. This commit implements option 1, so that it is possible to reindex from/to an index that has the field mapped as a keyword with no changes to the source. Backport of #49713	2020-03-03 16:01:47 +01:00
Jason Tedor	a154f9c657	Early return if no global checkpoint listeners (#53036 ) When notifying global checkpoint listeners, we have an opportunity to early return if there are not any registered listeners. This is important since it saves some allocations, and also saves forking some empty work to another thread. This commit adds an early return from notifying listeners if there are not any registered.	2020-03-02 23:28:22 -05:00
Stuart Tettemer	210aab0935	Settings: AffixSettings as validator dependencies (#52973 ) (#52982 ) Allow AffixSetting as validator dependencies. If a validator specifies AffixSettings as a dependency, then `validate(T, Map)` will have the concrete setting in a map. Backport of: #52973, 1e0ba70 Fixes: #52933	2020-02-29 09:38:46 -07:00
Nhat Nguyen	e6755afeeb	Upgrade to Lucene 8.5.0-snapshot-c4475920b08 (#52950 ) (#52977 ) To give LUCENE-9228 more CI cycles	2020-02-29 09:29:16 -05:00
Jay Modi	1cd0eee723	Remove TODO in IndexNameExpressionResolver (#52969 ) This commit removes a TODO in the IndexNameExpressionResolver that indicated the API should use a Set instead of a List. However, this TODO was not completely correct since the ordering of arguments matters due to negations when evaluating wildcards and since we also allow a list of patterns like `,-foo,`, which would have a different meaning even when using a Set with insertion ordering. Relates #52788 Backport of #52963	2020-02-28 13:56:28 -07:00
Adrien Grand	331d4bb0af	HybridDirectory should mmap postings. (#52641 ) (#52873 ) Since version 8.4, `MMapDirectory` has an optimization to read long[] arrays directly in little endian order, which postings leverage. So it'd be more efficient to open postings with `MMapDirectory`. I refactored the existing logic a bit to better explain why every listed file extension is opened with `mmap`.	2020-02-28 18:45:46 +01:00
Martijn van Groningen	6aa9aaa2c6	Add validation for dynamic templates (#52890 ) Backport of #51233 to the seven dot x branch. Tries to load a `Mapper` instance for the mapping snippet of a dynamic template. This should catch things like using an analyzer that is undefined or mapping attributes that are unused. This is best effort: * If the `{{name}}` placeholder is used in the mapping snippet then validation is skipped. * If `match_mapping_type` is not specified then validation is performed for all mapping types. If parsing succeeds with a single mapping type then the dynamic mapping is considered valid. If it is detected that a dynamic template mapping snippet is invalid at mapping update time then the mapping update is failed for indices created on 8.0.0-alpha1 and later. For indices created on prior versions a deprecation warning is emitted instead. In 7.x clusters the mapping update will never fail in case of an invalid dynamic template mapping snippet and a deprecation warning will always be emitted. Closes #17411 Closes #24419 Co-authored-by: Adrien Grand <jpountz@gmail.com>	2020-02-28 10:35:04 +01:00
Nik Everett	407101c39b	Clean and document sorting with partially built buckets (backport of #52769 ) (#52925 ) The `terms` aggregation can be sorted by the results of its sub-aggregations. Because it uses that sorting for filtering to the top-n it tries not to construct all of the buckets for the child aggregations. This has its own interesting problems around reduction, but they aren't super relevant to this change. This change moves that optimization from the `TermsAggregator` and into the aggregators being sorted on. This should make it more clear what is going on and it unifies this optimization with validating the sort. Finally, this should enable some minor optimizations to save a few comparisons when sorting multi-valued buckets. I'll get those in a follow up because they are now fairly obvious. They probably won't be a huge performance improvement, but it'll be nice anyway.	2020-02-27 17:50:55 -05:00
Nik Everett	1d1956ee93	Add size support to `top_metrics` (backport of #52662 ) (#52914 ) This adds support for returning the top "n" metrics instead of just the very top. Relates to #51813	2020-02-27 16:12:52 -05:00
Lee Hinman	e139d70abe	Remove TODO in MaxSizeCondition (#52854 ) Similar to what we did in #52794, this removes the TODO. Relates again to #52505	2020-02-27 09:29:12 -07:00
Dan Hermann	3c8b46a8c1	[7.x] Handle errors when evaluating if conditions in processors (#52892 )	2020-02-27 09:00:51 -06:00
hezhen Zhang	280d59c724	Append index name for the source of the cluster put-mapping task (#52690 ) Add index name(s) into the source for the cluster state update done when putting mapping. This ensures that the pending tasks API includes information on source indices.	2020-02-27 12:16:24 +01:00
David Turner	52fa465300	Cache completion stats between refreshes (#52872 ) Computing the stats for completion fields may involve a significant amount of work since it walks every field of every segment looking for completion fields. Innocuous-looking APIs like `GET _stats` or `GET _cluster/stats` do this for every shard in the cluster. This repeated work is unnecessary since these stats do not change between refreshes; in many indices they remain constant for a long time. This commit introduces a cache for these stats which is invalidated on a refresh, allowing most stats calls to bypass the work needed to compute them on most shards. Closes #51915 Backport of #51991	2020-02-27 10:01:24 +00:00
Nhat Nguyen	814c275f35	Add more assertions to testMaybeFlush (#52792 ) We haven't been able to reproduce this test failure or figure out its cause. This commit adds more assertions so we can narrow the scope. Relates #52223	2020-02-26 17:08:18 -05:00
Nhat Nguyen	0a15a6bfad	Fix testSeqNoCollision (#52588 ) Adjusts the assertion as we trim translog more eagerly since #52556. Relates #52556 Closes #52148	2020-02-26 17:08:18 -05:00
Nhat Nguyen	87e765609e	Fix testResyncAfterPrimaryPromotion (#52615 ) Adjusts the assertion as we might eagerly clean up translog during resync since #52556 Relates #52556 Closes #52598	2020-02-26 17:08:18 -05:00
Nhat Nguyen	5aa612c275	Fix testRestoreLocalHistoryFromTranslog (#52441 ) Asserts that no new operations are added to the translog since we re-opened the engine. Relates #51905 Closes #52410	2020-02-26 17:08:18 -05:00
Nhat Nguyen	a92bf5ec61	Fix IndexShardIT#testMaybeFlush (#52247 ) Since #51905, we use the local checkpoint of the safe commit to calculate the number of uncommitted operations in the translog stats. If a periodic flush triggered by afterWriteOperation completes before we sync the translog, then the last commit is not safe. We also need to sync the translog from the Engine instead of the translog so that we can advance the safe commit. Relates #51905 Closes #52223	2020-02-26 17:08:18 -05:00
Nhat Nguyen	d7fe135d90	Fix testPrepareIndexForPeerRecovery (#52245 ) Since #51905, we skip translog recovery if the local checkpoint of the safe commit equals the global checkpoint. This change adjusts the test not to create a new snapshot in that case. Closes #52221 Relates #51905	2020-02-26 17:08:18 -05:00
Yannick Welsch	82ab1bc1ff	Separate translog from index deletion conditions (#52556 ) Separates the translog from the index deletion conditions (allowing the translog to be cleaned up more eagerly), and avoids taking the write lock on the translog if no clean-up is actually necessary.	2020-02-26 17:08:18 -05:00
Nhat Nguyen	db6b9c21c7	Use local checkpoint to calculate min translog gen for recovery (#51905 ) Today we use the translog_generation of the safe commit as the minimum required translog generation for recovery. This approach has a limitation, where we won't be able to clean up translog unless we flush. Reopening an already recovered engine will create a new empty translog, and we leave it there until we force flush. This commit removes the translog_generation commit tag and uses the local checkpoint of the safe commit to calculate the minimum required translog generation for recovery instead. Closes #49970	2020-02-26 17:08:18 -05:00
Dan Hermann	3ffd34617f	Switch to AtomicLong for ingestCurrent metric to prevent negative values (#52581 ) (#52834 )	2020-02-26 13:26:26 -06:00
Jay Modi	07ef8ccff4	Allow dynamic updates for index.hidden setting (#52837 ) This commit changes the `index.hidden` setting from being final to a dynamic setting. While the setting being final allows for easier reasoning about an index, making this setting update-able has more benefits in that we can upgrade existing indices to be hidden and it will enable future features that would dynamically make indices hidden. Backport of #52772	2020-02-26 11:46:29 -07:00
Nik Everett	bfaa487757	Switch pipeline agg parsing to ContextParser (#52776 ) (#52832 ) We've pretty well settled on `ContextParser` for a generic interface to `ObjectParser`-like-things. This switches the interface used for building parsing pipeline aggregations to `ContextParser` which saves a couple of little wrappers around `ObjectParser`.	2020-02-26 12:57:20 -05:00
Tim Brooks	be8d704e2b	Remove seeds dependency for remote cluster settings (#52829 ) Currently 3 remote cluster settings (ping interval, skip unavailable, and compression) have a dependency on the seeds setting being configured. With proxy mode, it is now possible for these settings to be configured when the seeds setting has not been configured. This commit removes this dependency and adds new validation for these settings.	2020-02-26 10:17:25 -07:00
Adrien Grand	1807f86751	Generalize how queries on `_index` are handled at rewrite time (#52815 ) Generalize how queries on `_index` are handled at rewrite time (#52486) Since this change refactors rewrites, I also took it as an opportunity to address #49254: instead of returning the same queries you would get on a keyword field when a field is unmapped, queries get rewritten to a MatchNoDocsQueryBuilder. This change exposed a couple of bugs, like the fact that the percolator doesn't rewrite queries at query time, or that the significant_terms aggregation doesn't rewrite its inner filter, which I fixed. Closes #49254	2020-02-26 15:37:43 +01:00
Luca Cavanna	9e38125464	Clarify when shard iterators get sorted (#52810 ) Currently we have two ways to create a GroupShardsIterator: one that will resort the iterators based on their natural ordering, and another one that will leave them in their original order. This is currently done through two constructors, one that accepts a single argument which does the sorting, and another which accepts a second boolean argument to control whether sorting should happen or not. This second constructor is only called externally to disable the sorting. By introducing a specific method to create a sorted shard iterator we clarify and make it easier to track when we do sort and when we do not as the iterators are externally sorted.	2020-02-26 13:58:20 +01:00
Jim Ferenczi	a73ad248e8	Fix backport of #46731 (#52744 ) This change fixes the incomplete backport of #46731 in 7.x (as of 7.5). We now check if `max_children` is set on the top level nested sort and fail with an exception if that is not the case. Relates #46731 Closes #52202	2020-02-26 10:46:51 +01:00
Sachin Frayne	d3c0a2f013	Improve the error message when loading text fielddata. (#52753 ) Emphasize keyword over fielddata as the preferred way to use String fields for aggregations or sorting.	2020-02-25 15:45:44 -08:00
Lee Hinman	662f21fcea	Remove TODO in MaxAgeCondition serialization (#52794 ) * Remove TODO in MaxAgeCondition serialization This removes the TODO with a message for any future readers regarding the code in question. Resolves #52505	2020-02-25 15:47:36 -07:00
Tim Brooks	c8ef9649e2	Force execution of finish shard bulk request (#51957 ) (#52484 ) Currently the shard bulk request can be rejected by the write threadpool after a mapping update. This introduces a scenario where the mapping listener thread will attempt to finish the request and fsync. This thread can potentially be a transport thread. This commit fixes this issue by forcing the finish action to happen on the write threadpool. Fixes #51904.	2020-02-25 14:37:11 -07:00
Nhat Nguyen	848d3bc153	Revert "Fix testKeepTranslogAfterGlobalCheckpoint" This reverts commit `a88d54eb2d`.	2020-02-25 14:12:35 -05:00
Nhat Nguyen	a88d54eb2d	Fix testKeepTranslogAfterGlobalCheckpoint Read the last synced global checkpoint after flushing as we might advance it during committing. CI: https://gradle-enterprise.elastic.co/s/7o6qengg4gva2	2020-02-25 11:49:24 -05:00
Alan Woodward	638f3e4183	Use ByteBuffersDirectory rather than RAMDirectory (#52768 ) Lucene's RAMDirectory has been deprecated. This commit replaces all uses of RAMDirectory in elasticsearch with the newer ByteBuffersDirectory. Most uses are in tests, but the percolator and painless executor may get some small speedups.	2020-02-25 15:46:35 +00:00
Alan Woodward	18663b0a85	Don't index ranges including NOW in percolator (#52748 ) Currently, date range queries using NOW-based date math are rewritten to MatchAllDocs queries when being preprocessed for the percolator. However, since we added the verification step, this can result in incorrect matches when percolator queries are run without scores. This commit changes things to instead wrap date queries that use NOW with a new DateRangeIncludingNowQuery. This is a simple wrapper query that returns its delegate at rewrite time, but it can be detected by the percolator QueryAnalyzer and be dealt with accordingly. This also allows us to remove a method on QueryRewriteContext, and push all logic relating to NOW-based ranges into the DateFieldMapper. Fixes #52617	2020-02-25 12:18:16 +00:00
Ryan Ernst	5fba8cbc7b	Rename local Environment var in Node to avoid confusion (#52602 ) When the Node class is being constructed, an initial environment is passed in with the initial settings for the node. Once the plugin service is initialized, the final Environment+Settings are created, at which point the initial environment should no longer be used. This commit renames the constructor arg to avoid naming clashes with the final environment variable.	2020-02-24 11:14:46 -08:00
Lee Hinman	7d9de8412a	[7.x] fix npe in RestPluginsAction (#52620 ) (de56de9a) (#52721 ) Relates #45321 Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com> Co-authored-by: Kaihong.Wang <kyra.wkh@alibaba-inc.com> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2020-02-24 11:57:01 -07:00
Mayya Sharipova	034b1c0ba3	Correct boost calculation in script_score query (#52478 ) (#52724 ) Previously, the boost in a script_score query was wrongly applied only to the subquery. This commit makes sure that the boost is applied to the whole score that comes out of the script. Closes #48465	2020-02-24 13:48:21 -05:00
Adrien Grand	f993ef80f8	Move the terms index of `_id` off-heap. (#52518 ) In #42838 we moved the terms index of all fields off-heap except the `_id` field because we were worried it might make indexing slower. In general, the indexing rate is only affected if explicit IDs are used, as otherwise Elasticsearch almost never performs lookups in the terms dictionary for the purpose of indexing. So it's quite wasteful to require the terms index of `_id` to be loaded on-heap for users who have append-only workloads. Furthermore, I've been conducting benchmarks when indexing with explicit ids on the http_logs dataset that suggest that the slowdown is low enough that it's probably not worth forcing the terms index to be kept on-heap. Here are some numbers for the median indexing rate in docs/s (Master / Patch): run 1: 45851.2 / 46401.4; run 2: 45192.6 / 44561.0; run 3: 45635.2 / 44137.0; run 4: 46435.0 / 44692.8; run 5: 45829.0 / 44949.0. And now heap usage in MB for segments (Master / Patch): run 1: 41.1720 / 0.352083; run 2: 45.1545 / 0.382534; run 3: 41.7746 / 0.381285; run 4: 45.3673 / 0.412737; run 5: 45.4616 / 0.375063. Indexing rate decreased by 1.8% on average, while memory usage decreased by more than 100x. The `http_logs` dataset contains small documents and has a simple indexing chain. More complex indexing chains, e.g. with more fields, ingest pipelines, etc. would see an even lower decrease of indexing rate.	2020-02-24 18:14:12 +01:00
Alan Woodward	7dc41a3b83	Use BoostQuery rather than FunctionScoreQuery for query-time indices_boost (#52272 ) This is a trivial change, but it should result in a slightly more efficient query boost.	2020-02-24 14:41:46 +00:00
Nik Everett	d26d7721ea	Continue realizing sorting by aggregations (backport of #52298 ) (#52667 ) This drops more of the `instanceof`s from `AggregationPath`. There are still a couple in `AggregationPath`. And I ended up moving two into `BucketsAggregator`, but I think this is still an improvement!	2020-02-23 17:13:55 -05:00
bellengao	02cb5b6c0e	Return 429 status code on read_only_allow_delete index block (#50166 ) We consider index-level read_only_allow_delete blocks temporary since the DiskThresholdMonitor can automatically release them when an index is no longer allocated on nodes above the high threshold. The REST status has therefore been changed to 429 when encountering this index block to signal retryability to clients. Related to #49393	2020-02-22 16:24:25 +01:00
Jay Modi	8abfda0b59	Rename assertThrows to prevent naming clash (#52651 ) This commit renames ElasticsearchAssertions#assertThrows to assertRequestBuilderThrows and assertFutureThrows to avoid a naming clash with JUnit 4.13+ and static imports of these methods. Additionally, these methods have been updated to make use of expectThrows internally to avoid duplicating the logic there. Relates #51787 Backport of #52582	2020-02-21 13:30:11 -07:00
Stuart Tettemer	376932a47d	Scripting: split out compile limits and caching (#52498 ) (#52652 ) Phase 1 of adding compilation limits per context. * Refactor rate limiting and caching into separate class, `ScriptCache`, which will be used per context. * Disable compilation limit for certain tests. Backport of 0866031 Refs: #50152	2020-02-21 12:10:51 -07:00
Jay Modi	f3f6ff97ee	Single instance of the IndexNameExpressionResolver (#52604 ) This commit modifies the codebase so that our production code uses a single instance of the IndexNameExpressionResolver class. This change is being made in preparation for allowing name expression resolution to be augmented by a plugin. In order to remove some instances of IndexNameExpressionResolver, the single instance is added as a parameter of Plugin#createComponents and PersistentTaskPlugin#getPersistentTasksExecutor. Backport of #52596	2020-02-21 07:50:02 -07:00
markharwood	96d603979b	Upgrade Lucene to 8.5.0-snapshot-b01d7cb (#52584 ) Upgrading 7x to same Lucene 8.5 version used in master	2020-02-21 10:25:03 +00:00
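Commit a154f9c657 adds an early return when notifying global checkpoint listeners, so that no allocations are made and no empty work is forked to another thread when nothing is registered. The Java sketch below illustrates that idea under simplifying assumptions. `CheckpointListenerRegistry`, `add`, `notifyListeners` and `forks` are hypothetical names, not the actual Elasticsearch class. Listeners are also treated as one-shot callbacks that receive the new global checkpoint.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.function.LongConsumer;

// Hypothetical, simplified listener registry; not the actual Elasticsearch class.
final class CheckpointListenerRegistry {
    private final Executor executor;
    private final List<LongConsumer> listeners = new ArrayList<>();
    private int forks = 0; // number of times work was handed to the executor

    CheckpointListenerRegistry(Executor executor) {
        this.executor = executor;
    }

    synchronized void add(LongConsumer listener) {
        listeners.add(listener);
    }

    void notifyListeners(long globalCheckpoint) {
        final List<LongConsumer> toNotify;
        synchronized (this) {
            // Early return: no listeners means no list copy and no forked task.
            if (listeners.isEmpty()) {
                return;
            }
            // In this sketch listeners are one-shot, so drain them under the lock.
            toNotify = new ArrayList<>(listeners);
            listeners.clear();
            forks++;
        }
        // Invoke the callbacks outside the lock, on the executor.
        executor.execute(() -> {
            for (LongConsumer listener : toNotify) {
                listener.accept(globalCheckpoint);
            }
        });
    }

    synchronized int forks() {
        return forks;
    }
}
```

Consider a direct executor such as `Runnable::run`. Notifying an empty registry leaves `forks()` at 0. A registered listener receives the checkpoint, and the fork count rises to 1.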

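Commit 52fa465300 caches completion stats and invalidates the cache on refresh, because these stats do not change between refreshes. A minimal Java sketch of that pattern follows. `RefreshInvalidatedCache`, `get` and `onRefresh` are hypothetical names, and an arbitrary `Supplier` stands in for the expensive walk over segments. The sketch assumes the supplier never returns `null`, since `null` marks an empty cache here.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical cache whose contents are only valid until the next refresh.
final class RefreshInvalidatedCache<T> {
    private final Supplier<T> compute; // the expensive computation
    private T cached;                  // null means "not computed since the last refresh"
    private int computations = 0;

    RefreshInvalidatedCache(Supplier<T> compute) {
        this.compute = Objects.requireNonNull(compute);
    }

    // Returns the cached value, computing it at most once per refresh.
    synchronized T get() {
        if (cached == null) {
            cached = Objects.requireNonNull(compute.get(), "supplier must not return null");
            computations++;
        }
        return cached;
    }

    // Intended to be called after each refresh, when the stats may have changed.
    synchronized void onRefresh() {
        cached = null;
    }

    synchronized int computations() {
        return computations;
    }
}
```

Using `synchronized` keeps the sketch short. As a consequence, concurrent callers block while a value is being computed, and a production implementation might handle that differently.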

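Commit 3ffd34617f switches the `ingestCurrent` metric to an `AtomicLong` to prevent negative values. The Java sketch below shows why that pattern holds under one assumption: every decrement happens after its matching increment on the same thread. Because `AtomicLong.get()` returns a value the counter actually held, a reader can then never observe a negative gauge. By contrast, the Javadoc of `LongAdder.sum()` states that it is not an atomic snapshot when updates happen concurrently. `IngestCurrentGauge` and `IngestCurrentDemo` are hypothetical names, not the actual ingest stats classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical gauge of documents currently being ingested.
final class IngestCurrentGauge {
    private final AtomicLong current = new AtomicLong();

    void onStart()  { current.incrementAndGet(); }
    void onFinish() { current.decrementAndGet(); }
    long get()      { return current.get(); }
}

final class IngestCurrentDemo {
    // Runs `threads` workers that each start and finish `docsPerThread` documents
    // while a reader thread samples the gauge; returns the smallest value sampled.
    static long minObserved(IngestCurrentGauge gauge, int threads, int docsPerThread) {
        AtomicBoolean done = new AtomicBoolean(false);
        AtomicLong min = new AtomicLong(Long.MAX_VALUE);
        Thread reader = new Thread(() -> {
            do {
                min.accumulateAndGet(gauge.get(), Math::min);
            } while (!done.get());
        });
        reader.start();

        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < docsPerThread; i++) {
                    gauge.onStart();
                    gauge.onFinish(); // always follows its matching onStart
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            join(worker);
        }
        done.set(true);
        join(reader);
        return min.get();
    }

    private static void join(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The reader uses a do/while loop so that it samples the gauge at least once, even if the workers finish first.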
4247 Commits