OpenSearch

Commit Graph

Author	SHA1	Message	Date
Zachary Tong	25d74bd0cb	Prefer mapped aggs to lead reductions (#33528 ) Previously, unmapped aggs try to delegate reduction to a sibling agg that is mapped. That delegated agg will run the reductions, and also reduce any pipeline aggs. But because delegation comes before running pipelines, the unmapped agg _also_ tries to run pipeline aggs. This causes the pipeline to run twice, and potentially double its output in buckets which can create invalid JSON (e.g. same key multiple times) and break when converting to maps. This fixes by sorting the list of aggregations ahead of time so that mapped aggs appear first, meaning they preferentially lead the reduction. If all aggs are unmapped, the first unmapped agg simply creates a new unmapped object and returns that for the reduction. This means that unmapped aggs no longer defer and there is no chance for a secondary execution of pipelines (or other side effects caused by deferring execution). Closes #33514	2018-09-26 10:09:31 -04:00
Nik Everett	1871e7f7e9	Search: Simplify SingleFieldsVisitor (#34052 ) `SingleFieldsVisitor` is meant to load a single stored field but it manages to be quite complex to reason about because it inherits from our "basic" `FieldsVisitor` which is designed to load many fields. This breaks that inheritance and adds logic to `SingleFieldsVisitor` so it can be properly stand alone. While this amounts to more lines of code they ought to be significantly easier to reason about.	2018-09-26 09:48:15 -04:00
David Roberts	1413ace74f	Mute testSplitFromOneToN and testCreateShrinkIndexToN on Windows Relates #34080	2018-09-26 14:02:14 +01:00
Christoph Büscher	ba3ceeaccf	Clean up "unused variable" warnings (#31876 ) This change cleans up "unused variable" warnings. There are several cases where we most likely want to suppress the warnings (especially in the client documentation test where the snippets contain many unused variables). In a lot of cases the unused variables can just be deleted though.	2018-09-26 14:09:32 +02:00
Jim Ferenczi	a255880497	Add nested and object fields to field capabilities response (#33803 ) This commit adds nested and object fields to the field capabilities response. Closes #33237	2018-09-26 08:59:41 +02:00
Ryan Ernst	be8475955e	Scripting: Use ParameterMap for deprecated ctx var in update scripts (#34065 ) This commit removes the sysprop controlling whether ctx is in params for update scripts and replaces it with use of the new ParameterMap, which outputs a deprecation warning whenever params.ctx is used.	2018-09-25 22:08:02 -07:00
Nhat Nguyen	8a56369f5b	Move max_unsafe_auto_id_timestamp constant to Engine (#34025 ) We should not access InternalEngine in other classes.	2018-09-25 19:20:00 -04:00
Jim Ferenczi	0f878eff19	Add a limit for graph phrase query expansion (#34031 ) Today query parsers throw TooManyClauses exception when a query creates too many clauses. However graph phrase queries do not respect this limit. This change adds a protection against crazy expansions that can happen when building a graph phrase query. This is a temporary copy of the fix available in https://issues.apache.org/jira/browse/LUCENE-8479 but not merged yet. This logic will be removed when we integrate the Lucene patch in a future release.	2018-09-25 21:38:47 +02:00
Igor Motov	1e6780d703	Mute AckClusterUpdateSettingsIT Tracked by #33673	2018-09-25 14:16:47 -04:00
Armin Braun	0ba1855740	INGEST: Tests for Drop Processor (#33430 ) * INGEST: Tests for Drop Processor * UT for behavior of dropped callback and drop processor * Moved drop processor to `server` project to enable this test * Simple IT * Relates #32278	2018-09-25 19:29:22 +02:00
Christoph Büscher	ecc087a5bb	Remove Join utility class (#34037 ) The functionality can be replaced with String.join in new Java versions.	2018-09-25 15:25:54 +02:00
David Turner	7c63f5455b	Use a threadsafe map in SearchAsyncActionTests (#33700 ) Today `SearchAsyncActionTests#testFanOutAndCollect` uses a simple `HashMap` for the `nodeToContextMap` variable, which is then accessed from multiple threads without, apparently, explicit synchronisation. This provides an explanation for the test failure identified in #29242 in which `.toString()` returns `"[]"` just before `.isEmpty` returns `false`, without any concurrent modifications. This change converts `nodeToContextMap` to a `newConcurrentMap()` so that this cannot occur. It also fixes a race condition in the detection of double-calling the subsequent search phase. Closes #29242.	2018-09-25 13:58:05 +01:00
Nhat Nguyen	5166dd0a4c	Replicate max seq_no of updates to replicas (#33967 ) We start tracking max seq_no_of_updates on the primary in #33842. This commit replicates that value from a primary to its replicas in replication requests or the translog phase of peer-recovery. With this change, we guarantee that the value of max seq_no_of_updates on a replica when any index/delete operation is performed is at least the max_seq_no_of_updates on the primary when that operation was executed. Relates #33656	2018-09-25 08:07:57 -04:00
Luca Cavanna	970407c663	[DOCS] add comment to clarify cluster name resolution (#34014 ) We currently fallback to local indices whenever a remote cluster is not found, as there may still be indices / aliases with the same name. Such behaviour is lenient but needs to be kept for backwards compatibility. Clarified that in the code so we don't forget. Relates to #26247	2018-09-25 14:03:07 +02:00
Adrien Grand	612201aee0	Fix created version for similarity validation. (#33890 ) It mistakenly uses the Elasticsearch major version instead of the Lucene major version. I noticed it when backporting, it is not noticeable on master because the only two Lucene versions that are supported, 7 and 8, encode norms the same way, unlike Lucene 6.	2018-09-25 13:48:25 +02:00
Hendrik Muhs	bf6cf6b6d9	refactor CompositeValuesSourceParserHelper for reusage by making it public (#33945 ) refactor CompositeValuesSourceParserHelper for reusage by making it public and moving toXContent into it	2018-09-25 09:15:52 +02:00
David Turner	3af8fc74c7	Make TransportService more test-friendly (#33869 ) Today, TransportService uses System.currentTimeMillis() to get the current time to report on things like timeouts, and enqueues lambdas for future execution. However, in tests it is useful to be able to fake out the current time and to see what all these enqueued lambdas are really for. This change alters the situation so that we can obtain the time from the more easily-faked ThreadPool#relativeTimeInMillis(), and implements some friendlier toString() methods on the various Runnables so we can see what they are later.	2018-09-25 07:50:18 +01:00
Armin Braun	25bc8c4b5a	Fix typo `NodeEnvironment#assertPathsDoNotExist` (#33996 ) * We want to check the individual paths here one by one to get an easier-to-interpret assertion message	2018-09-24 17:57:27 +02:00
Julie Tibshirani	8e8bd56cc7	In MatchQuery, remove a check for fragile search analyzers. (#33927 ) As far as I can tell this guard against fragile analyzers is no longer relevant, since we stopped setting special analyzers on numeric fields (3bf6f4). Instead of removing the guard completely, I opted to keep a check for untokenized + unnormalized fields to avoid going through the analysis process unnecessarily. My motivation for simplifying this check is that I'd like to add support for `split_queries_on_whitespace` to the new 'queryable object' fields. As it stands, I would have to add a dedicated instanceof check for the new mapper, which is not optimal.	2018-09-24 08:56:13 -07:00
Tim Brooks	78e483e8d8	Introduce abstract security transport testcase (#33878 ) This commit introduces an AbstractSimpleSecurityTransportTestCase for security transports. This class provides transport tests that are specific for security transports. Additionally, it fixes the tests referenced in #33285.	2018-09-24 09:44:44 -06:00
Ignacio Vera	df333ca305	TESTS: Make score Float#NaN when there is no max score (#33997 ) * TESTS: Make score Float#NaN when there is no max score Fixes test failure due to maxScore set to Float#MinValue instead of Float#NaN. In addition the initial value for maxScore is set to Float#NEGATIVE_INFINITY so it is an illegal value. Closes #33993	2018-09-24 17:36:48 +02:00
Luca Cavanna	e389d9e296	Clarify RemoteClusterService#groupIndices behaviour (#33899 ) When executing a cross-cluster search, we need to search against all local indices (and no remote indices) in case no indices are specified. Also, if only remote indices are specified, no local indices will be queried. We previously added empty local indices whenever they were not present in the map of the grouped indices, then we would act differently later based on the extracted remote indices. Instead, we now add the empty array for local indices only in case we need to search all local indices; the entry for local indices is not added when local indices should not be searched. This way the grouped indices reflect reality and provide a better indication of what indices will be searched.	2018-09-24 11:45:33 +02:00
Christophe Bismuth	47ed6c79ee	[TEST] Add validate query tests for empty and malformed queries (#33862 ) Relates to #33095	2018-09-24 11:21:47 +02:00
Simon Willnauer	7d703c2f92	Fix AutoQueueAdjustingExecutorBuilder settings validation (#33922 ) Settings validation in AutoQueueAdjustingExecutorBuilder always checked against a default value which means that we never can change a max queue size that is lower than the default. This change adds tests and fixes this validation.	2018-09-24 07:45:50 +02:00
Nhat Nguyen	432e61c971	Adjust bwc for resync request (#33964 ) Relates #33964	2018-09-22 19:29:38 -04:00
Nhat Nguyen	f2f08dd6c5	Adjust bwc for recovery request (#33693 ) Relates #33693	2018-09-22 19:28:20 -04:00
Nhat Nguyen	e7ae2f9d36	Propagate auto_id_timestamp in primary-replica resync (#33964 ) A follow-up of #33693 to propagate max_seen_auto_id_timestamp in a primary-replica resync. Relates #33693	2018-09-22 11:40:10 -04:00
Nhat Nguyen	7944a0cb25	Track max seq_no of updates or deletes on primary (#33842 ) This PR is the first step to use seq_no to optimize indexing operations. The idea is to track the max seq_no of either update or delete ops on a primary, and transfer this information to replicas, and replicas use it to optimize indexing plan for index operations (with assigned seq_no). The max_seq_no_of_updates on primary is initialized once when a primary finishes its local recovery or peer recovery in relocation or being promoted. After that, the max_seq_no_of_updates is only advanced internally inside an engine when processing update or delete operations. Relates #33656	2018-09-22 08:02:57 -04:00
Vladimir Dolzhenko	9c0316869b	Store: keep IndexFormatTooOldException and IndexFormatTooNewException in corruption marker (#33920 ) Closes #33916	2018-09-21 14:00:02 +02:00
Nik Everett	cac93949fe	API: Drop deprecated methods from Retry (#33925 ) We deprecated the `Retry.withBackoff` flavors with `Settings` in 6.5 because they were no longer needed. This drops them from 7.0.	2018-09-21 07:55:50 -04:00
Christoph Büscher	b654d986d7	Add OneStatementPerLineCheck to Checkstyle rules (#33682 ) This change adds the OneStatementPerLineCheck to our checkstyle precommit checks. This rule restricts the number of statements per line to one. The reasoning behind this is that it is very difficult to read multiple statements on one line. People seem to mostly use it in short lambdas and switch statements in our code base, but just going through the changes already uncovered some actual problems in randomization in test code, so I think it's worth it.	2018-09-21 11:52:31 +02:00
Nhat Nguyen	5f7f793f43	Propagate max_auto_id_timestamp in peer recovery (#33693 ) Today we don't store the auto-generated timestamp of append-only operations in Lucene; and assign -1 to every index operation constructed from LuceneChangesSnapshot. This looks innocent but it generates duplicate documents on a replica if a retry append-only arrives first via peer-recovery; then an original append-only arrives via replication. Since the retry append-only (delivered via recovery) does not have timestamp, the replica will happily optimize the original request while it should not. This change transmits the max auto-generated timestamp from the primary to replicas before translog phase in peer recovery. This timestamp will prevent replicas from optimizing append-only requests if retry counterparts have been processed. Relates #33656 Relates #33222	2018-09-20 19:53:30 -04:00
Vladimir Dolzhenko	dbe6405354	mute RemoveCorruptedShardDataCommandTests.testCorruptedIndex	2018-09-20 21:30:40 +02:00
Nhat Nguyen	76a1a863e3	TEST: stop assertSeqNos if shards movement (#33875 ) Currently, assertSeqNos assumes that the cluster is stable at the end of the test (i.e., no more shard movement). However, this assumption does not always hold. In these cases, we can stop the assertion instead of failing a test. Closes #33704	2018-09-20 13:44:26 -04:00
Christoph Büscher	28b1d41007	Fix unused import checkstyle issue	2018-09-20 19:42:15 +02:00
Nhat Nguyen	002f763c48	Restore local history from translog on promotion (#33616 ) If a shard was serving as a replica when another shard was promoted to primary, then its Lucene index was reset to the global checkpoint. However, if the new primary fails before the primary/replica resync completes and we are now being promoted, we have to restore the reverted operations by replaying the translog to avoid losing acknowledged writes. Relates #33473 Relates #32867	2018-09-20 13:21:11 -04:00
Nhat Nguyen	b13a434f59	Remove wrong assert in LocalCheckpointTrackerTests It's possible for the set "seqNos" to contain only the "unFinishedSeq" in the testConcurrentReplica test. If this is the case, the call `randomValueOtherThan` won't make any progress because the predicate will never be false. This commit removes this expectation because it's incorrect and it's no longer needed as we have a dedicated test to verify the contains method. Relates #33871	2018-09-20 13:12:19 -04:00
Alan Woodward	b33c18d316	Move SoraniNormalizationFilterFactory to the common analysis plugin (#33892 ) Follow up to #25715	2018-09-20 17:31:41 +01:00
Yannick Welsch	db327818dd	[TEST] Enable DEBUG logging on testCreateShrinkIndexToN	2018-09-20 18:16:20 +02:00
Nik Everett	f963c29876	Logging: Drop Settings from some logger lookups (#33859 ) Drops `Settings` from some of the methods to lookup loggers and deprecates another logger lookup that takes `Settings` because `Settings` is no longer required to build a logger.	2018-09-20 10:42:48 -04:00
Jake Landis	e37e5dfc04	ingest: support simulate with verbose for pipeline processor (#33839 ) * ingest: support simulate with verbose for pipeline processor This change better supports the use of simulate?verbose with the pipeline processor. Prior to this change any pipeline processors executed with simulate?verbose would not show all intermediate processors for the inner pipelines. This change also moves the PipelineProcess and TrackingResultProcessor classes to enable instance checks and to avoid overly public classes. As well this updates the error message for when cycles are detected in pipelines calling other pipelines.	2018-09-20 08:33:07 -05:00
Simon Willnauer	3522b9084b	Introduce a `search_throttled` threadpool (#33732 ) Today all searches happen on the search threadpool which is the correct behavior in almost any case. Yet, there are exceptions where for instance searches should be passed through a single-thread thread-pool to reduce impact on a node. This change adds an index-private setting that allows to mark an index as throttled for searches and forks off all non-stats searcher access to this thread-pool for indices that are marked as `index.search.throttled`	2018-09-20 13:43:11 +02:00
David Turner	c041e94349	Test that transient settings beat persistent ones (#33818 ) Transient settings override persistent settings, but in fact all of the tests that run as part of `:server:test` and `:server:integTest` will pass if the precedence is changed to be the other way round. This change adds a test that verifies the precedence is as documented.	2018-09-20 11:17:19 +01:00
Tim Vernum	8d50c10208	Mute ShrinkIndexIT.testCreateShrinkIndexToN on Windows Relates: #33857	2018-09-20 18:21:15 +10:00
Daniel Mitterdorfer	b1cc58e425	Allow to clear the fielddata cache per field With this commit we clear the fielddata cache per field as it is supposed to be. Previously we retrieved the proper field from the cache but then cleared the entire cache anyway. Closes #33798 Relates #33807	2018-09-20 08:59:53 +02:00
Tim Vernum	1f1ebb4656	Add additional null check in _cat/shards The target of the func lambda may be null (e.g. in a mixed cluster where older nodes lack some of the values) Relates: #33858 / 331caba Closes #33877	2018-09-20 06:44:13 +02:00
Nhat Nguyen	05bf9dc2e8	Add contains method to LocalCheckpointTracker (#33871 ) This change adds "contains" method to LocalCheckpointTracker. One of the use cases is to check if a given operation has been processed in an engine or not by looking up its seq_no in LocalCheckpointTracker. Relates #33656	2018-09-19 20:29:36 -04:00
Nik Everett	26c4f1fb6c	Core: Default node.name to the hostname (#33677 ) Changes the default of the `node.name` setting to the hostname of the machine on which Elasticsearch is running. Previously it was the first 8 characters of the node id. This had the advantage of producing a unique name even when the node name isn't configured but the disadvantage of being unrecognizable and not being available until fairly late in the startup process. Of particular interest is that it isn't available until after logging is configured. This forces us to use a volatile read whenever we add the node name to the log. Using the hostname is available immediately on startup and is generally recognizable but has the disadvantage of not being unique when run on machines that don't set their hostname or when multiple elasticsearch processes are run on the same host. I believe that, taken together, it is better to default to the hostname. 1. Running multiple copies of Elasticsearch on the same node is a fairly advanced feature. We do it all the time as part of the elasticsearch build for testing but we make sure to set the node name then. 2. That the node.name defaults to some flavor of "localhost" on an unconfigured box feels like it isn't going to come up too much in production. I expect most production deployments to at least set the hostname. As a bonus, production deployments need no longer set the node name in most cases. At least in my experience most folks set it to the hostname anyway.	2018-09-19 15:21:29 -04:00
Simon Willnauer	a92dda2e7e	Move CompletionStats into the Engine (#33847 ) By moving CompletionStats into the engine we can easily cache the stats for read-only engines if necessary. It also moves the responsibility out of IndexShard which has quite some complexity already. Relates to #33835	2018-09-19 20:35:57 +02:00
Simon Willnauer	0fa5758bc6	Fix potential NPE in `_cat/shards/` with partial CommonStats (#33858 ) Today if we fetch common stats from a shard we might get a partial response if the shard is closed while we fetch the stats. This causes hard to track and reproduce NPEs. This change streamlines null checking to ensure we only render stats we actually received.	2018-09-19 20:34:54 +02:00

1397 Commits