OpenSearch

Commit Graph

Author	SHA1	Message	Date
Armin Braun	ebd1569028	Fix testMasterFailOverWithQueuedDeletes (#62062 ) (#62078 ) Fixing very rare corner case where the delete retry is slow. Closes #62031	2020-09-08 10:35:06 +02:00
Nhat Nguyen	bb0a583990	Allow enabling soft-deletes on restore from snapshot (#62018 ) Closes #61969	2020-09-07 09:45:36 -04:00
Alan Woodward	cbc9578cbd	Remove SearchPhase interface (#62050 ) The interface is never used as an abstraction - implementations are are called directly, and most of them don't need to implement the preProcess method.	2020-09-07 13:45:43 +01:00
David Turner	3389d5ccb2	Introduce integ tests for high disk watermark (#60460 ) An important goal of the disk threshold decider is to ensure that nodes use less disk space than the high watermark, and to take action if a node ever exceeds this watermark. Today we do not have any integration-style tests of this high-level behaviour. This commit introduces a small test harness that can adjust the apparent size of the disk and verify that the disk threshold decider moves shards around in response. Co-authored-by: Yannick Welsch <yannick@welsch.lu>	2020-09-07 14:39:39 +02:00
Armin Braun	395538f508	Improve Snapshot State Machine Performance (#62000 ) (#62049 ) Just a few random things to optimize motivated by somewhat sub-standard performance for large snapshot cluster states with many concurrent snapshots observed in production.	2020-09-07 13:25:40 +02:00
Jim Ferenczi	fa8e76abb1	Improve reduction of terms aggregations (#61779 ) (#62028 ) Today, the terms aggregation reduces multiple aggregations at once using a map to group same buckets together. This operation can be costly since it requires to lookup every bucket in a global map with no particular order. This commit changes how term buckets are sorted by shards and partial reduces in order to be able to reduce results using a merge-sort strategy. For bwc, results are merged with the legacy code if any of the aggregations use a different sort (if it was returned by a node in prior versions). Relates #51857	2020-09-07 13:13:20 +02:00
Alan Woodward	a295b0aa86	Fix null_value parsing for data_nanos field mapper (#61994 ) The null_value parameter for date fields is always parsed using DateFormatter.parseMillis, which is incorrect for nanosecond resolution fields. This commit changes the parsing logic to always use DateFieldType.parse() to parse the null value.	2020-09-07 10:58:54 +01:00
Alan Woodward	1799c0c583	Convert completion, binary, boolean tests to MapperTestCase (#62004 ) Also fixes a metadata serialization bug in CompletionFieldMapper.	2020-09-07 10:48:20 +01:00
Luca Cavanna	0c8b438577	Add support for runtime fields (#61776 ) This commit includes the work that has been done on the runtime fields feature branch until now. The high level tasks are listed in #59332. The tasks that have not yet been completed can be worked on after merging the feature branch. We are adding a new x-pack plugin called runtime-fields that plugs in a custom mapper which allows to define runtime fields based on a script. The changes included in this commit that were made outside of the x-pack/plugin/runtime-fields directory are minimal and revolve around 1) making the ScriptService available while parsing index mappings so that the scripts associated to runtime fields can be compiled 2) sharing code to manipulate ranges etc. as it can be reused in runtime fields. Co-authored-by: Nik Everett <nik9000@gmail.com>	2020-09-07 09:14:53 +02:00
Howard	b26584dff8	Remove unused deciders in BalancedShardsAllocator (#62026 )	2020-09-07 00:04:16 -04:00
Armin Braun	1e3edbbe74	Simplify BytesReference StreamInput (#61681 ) (#62014 ) Flattening both streams into a single stream here saves a few objects and some indirection. Also, removed the redundant `offset` field which added nothing but complexity by forcing the incrementation of two counters on every read.	2020-09-05 10:45:52 +02:00
Ryan Ernst	6d3b691048	Add snapshot only test modules (#61954 ) This commit adds external test modules. These are modules meant for external systems to test edge cases in elasticsearch, but only within snapshots. They are not meant to be used in production, so protections are also added from their accidental inclusion in release builds. Note that this commit does not actually add any new modules, it only adds the infrastructure for the new modules, under `test/external-modules`.	2020-09-04 16:35:18 -07:00
Yannick Welsch	6d08b55d4e	Simplify searchable snapshot shard allocation (#61911 ) Simplifies allocation for snapshot-backed shards by always making the recovery source "from snapshot" for those snapshot-backed shards (instead of "recover from local or from empty store"). Also let's the balancer pick a node which to allocate the snapshot-backed shard to (which takes number of shards on each node into account unlike the current implementation which just picks whatever node we are allowed to allocate to, with no notion of "balancing" at all).	2020-09-04 15:45:00 +02:00
Alan Woodward	66bb1eea98	Improve error messages on bad [format] and [null_value] params for date mapper (#61932 ) Currently, if an incorrectly formatted date is passed as a null_value for a date field mapper configuration, you get a vague error: Failed to parse mapping [_doc]: cannot parse empty date Similarly, if you pass an incorrect format, you get the error: Failed to parse mapping [_doc]: Invalid format [...] This commit improves both these errors by including the mapper name and parameter that are misconfigured. Fixes #61712	2020-09-04 14:13:28 +01:00
Ignacio Vera	31c026f25c	upgrade to Lucene-8.7.0-snapshot-61ea26a (#61957 ) (#61974 )	2020-09-04 13:46:20 +02:00
Nik Everett	3d23dcd742	Use standard bit set impl in cardinality (#61816 ) (#61930 ) This replaces a specialized bit set implementation used in cardinality with our standard `BitArray` which works exactly the same way. Its also tracked by `BigArrays` which is great!	2020-09-03 12:37:30 -04:00
Nik Everett	3934e14bc0	Fixup vwhisto test (#60936 ) (#61928 ) This test assumed some random bounds that turned out not to hold in some cases. Closes #60673	2020-09-03 12:37:17 -04:00
Alan Woodward	48870c60c7	Don't spin up a whole node to unit test some data structures (#61923 ) BytesRefHashTests and LongObjectHashMapTests currently extend ESSingleNodeTestCase, which builds an entire node just to run some unit tests over entirely in-memory data structures. This commit converts them both to extend ESTestCase.	2020-09-03 17:19:42 +01:00
Alan Woodward	3a1e0edf0a	Convert DateFieldMapperTests to MapperTestCase (#61920 )	2020-09-03 16:04:02 +01:00
Martijn Laarman	cfa54c08bd	[7.x] Version bump 7.9.1 release	2020-09-03 16:41:58 +02:00
Alan Woodward	e2f006eeb4	Merge FetchSubPhase hitsExecute and hitExecute methods (#60907 ) (#61893 ) FetchSubPhase has two 'execute' methods, one which takes all hits to be examined, and one which takes a single HitContext. It's not obvious which one should be implemented by a given sub-phase, or if implementing both is a possibility; nor is it obvious that we first run the hitExecute methods of all subphases, and then subsequently call all the hitsExecute methods. This commit reworks FetchSubPhase to replace these two variants with a processor class, `FetchSubPhaseProcessor`, that is returned from a single `getProcessor` method. This processor class has two methods, `setNextReader()` and `process`. FetchPhase collects processors from all its subphases (if a subphase does not need to execute on the current search context, it can return `null` from `getProcessor`). It then sorts its hits by docid, and groups them by lucene leaf reader. For each reader group, it calls `setNextReader()` on all non-null processors, and then passes each doc id to `process()`. Implementations of fetch sub phases can divide their concerns into per-request, per-reader and per-document sections, and no longer need to worry about sorting docs or dealing with reader slices. FetchSubPhase now provides a FetchSubPhaseExecutor that exposes two methods, setNextReader(LeafReaderContext) and execute(HitContext). The parent FetchPhase collects all these executors together (if a phase should not be executed, then it returns null here); then it sorts hits, and groups them by reader; for each reader it calls setNextReader, and then execute for each hit in turn. Individual sub phases no longer need to concern themselves with sorting docs or keeping track of readers; global structures can be built in getExecutor(SearchContext), per-reader structures in setNextReader and per-doc in execute.	2020-09-03 12:20:55 +01:00
Alan Woodward	af01ccee93	Add specific test for serializing all mapping parameter values (#61844 ) (#61877 ) This commit adds a test to MapperTestCase that explicitly checks that a mapper can serialize all its default values, and that this serialization can then be re-parsed. Note that the test is disabled for non-parametrized mappers as their serialization may in some cases output parameters that are not accepted. Gradually moving all mappers to parametrized form will address this. The commit also contains a fix to keyword mappers, which were not correctly serializing the similarity parameter; this partially addresses #61563. It also enables `null` as a value for `null_value` on `scaled_float`, as a follow-up to #61798	2020-09-03 09:20:26 +01:00
Nik Everett	c19f67ce30	Support longs in BitArray (backport of #61867 ) (#61871 ) We frequently use `long`s with `BitArray` in aggs and right now we have to assert that the `long` fits in an `int`. This adds support for `long` to `BitArray` so we don't need those assertions.	2020-09-02 17:24:31 -04:00
Henning Andersen	867d5f1c68	Search memory leak (#61788 ) (#61862 ) Search could leak memory if global ordinals were calculated as part of a search with low level cancellation enabled. QueryPhase registers a cancellation on the reader that is never removed, which ends up being referenced from the global ordinals cache entry. This keeps an indirect reference to the search context. A significant leak can occur when a heavy aggregation (cardinality for instance) is used and a failure occurs during search, in particular if the pages backing the hyperlog++ structure are not recycled when it is closed. This commit also fixes an issue with an unclosed resource and request breaker adjustment in the cardinality aggregation.	2020-09-02 18:51:14 +02:00
Jim Ferenczi	a0e4331c49	Cleanup usages of QueryPhaseResultConsumer (#61713 ) This commit generalizes how QueryPhaseResultConsumer is initialized. The query phase always uses this consumer so it doesn't need to be hidden behind an abstract class.	2020-09-02 14:41:02 +02:00
Alan Woodward	d59343b4ba	Allow [null] values in [null_value] (#61798 ) (#61807 ) Several field mappers have a null_value parameter, that allows you to specify a placeholder value to insert into a document if the incoming value for that field is null. The default value for this is always null, meaning "add no placeholder". However, we explicitly bar users from setting this parameter directly to null (done in #7978, in order to fix an NPE). This exclusion means that if a mapper is serialized with include_defaults, then we either need to special-case null_value to ensure that it is not output when it holds the default value, or we find that the resulting serialized form cannot be used to create a mapping. This stops us doing some useful generic testing of mappers. This commit permits null as a parameter value for null_value, and changes the tests to check that it is a) permissible and b) applied without throwing errors. As part of the testing changes, a new base class MapperServiceTestCase is refactored from MapperTestCase, holding the various helper methods related to building mappings but not the single-mapper specific abstract methods. Closes #58823	2020-09-02 10:42:19 +01:00
Igor Motov	48e53cca94	Fix wrong NaN comparison (#61795 ) (#61811 ) Fixes wrong NaN comparison in error message generator in GeoPolygonDecomposer and PolygonBuilder. Supersedes #48207 Co-authored-by: Pedro Luiz Cabral Salomon Prado <pedroprado010@users.noreply.github.com>	2020-09-01 15:50:38 -04:00
Tim Brooks	e573fa9abc	Add data.path fast path for FilePermission (#61302 ) The recursive data.path FilePermission check is an extremely hot codepath in Elasticsearch. Unfortunately the FilePermission check in Java is extremely allocation heavy. As it iterates through different file permissions, it allocates byte arrays for each Path component that must be compared. This PR improves the situation by adding the recursive data.path FilePermission it its own PermissionsCollection object which is checked first.	2020-09-01 12:03:22 -06:00
Armin Braun	28710c985d	Dry up Settings from Map Construction (#61778 ) (#61803 ) We used the same hack all over the place. At least drying it up to a single place. Co-authored-by: Jay Modi <jaymode@users.noreply.github.com>	2020-09-01 19:46:10 +02:00
Tanguy Leroux	6e944d9e21	Throws IndexNotFoundException in TransportGetAction for unknown System indices (#61785 ) (#61791 ) The change #57936 introduced a dedicated thread pool for reads in system indices. It also introduced a potential NPE in the case the index to read in not yet present in the cluster state. This commit fixes that bug by using the getIndexSafe() instead of just index() method when retrieving the index's metadata so that an INFE is thrown if the index does not exist.	2020-09-01 17:41:57 +02:00
Dan Hermann	88a448f1cd	Fix wrong result when executing bulk requests with and without pipeline (#60818 ) (#61777 )	2020-09-01 07:05:25 -05:00
Armin Braun	3fd25bfa87	Fix Concurrent Snapshot Create+Delete + Delete Index (#61770 ) (#61773 ) We had a bug here were we put a `null` value into the shard assignment mapping when reassigning work after a snapshot delete had gone through. This only affects partial snaphots but essentially dead-locks the snapshot process. Closes #61762	2020-09-01 13:20:25 +02:00
Tanguy Leroux	787dfda4c1	Prevent snapshots to be mounted as system indices (#61517 ) (#61727 ) System indices can be snapshotted and are therefore potential candidates to be mounted as searchable snapshot indices. As of today nothing prevents a snapshot to be mounted under an index name starting with . and this can lead to conflicting situations because searchable snapshot indices are read-only and Elasticsearch expects some system indices to be writable; because searchable snapshot indices will soon use an internal system index (#60522) to speed up recoveries and we should prevent the system index to be itself a searchable snapshot index (leading to some deadlock situation for recovery). This commit introduces a changes to prevent snapshots to be mounted as a system index.	2020-09-01 11:13:28 +02:00
Boice Huang	8fdd3d158b	Remove redundant symbol in msearch tests (#61353 )	2020-09-01 10:58:22 +02:00
Nik Everett	fb84c1f73e	Calculate precise cardinality upper bounds (#61529 ) (#61754 ) This reworks `CardinalityUpperBound` to support precise estimates while maintaining most of the public API. This will allow us to make more informed choices about the data structures that we use in aggregations. None of those interesting choices come as part of this change, but they are more possible with it.	2020-08-31 15:10:02 -04:00
Dan Hermann	2858e1efc4	Document new stats in _cat/nodes (#60445 ) (#61742 )	2020-08-31 12:40:21 -05:00
Adam Locke	5723b928d7	Remove Outdated Snapshot Docs (#61684 ) (#61728 ) Removing some now outdated statements that refer to a time when snapshot operations could not run concurrently. Closes #61680	2020-08-31 12:04:27 -04:00
Jason Tedor	43cb7c48bd	Adjust Lucene versions for 7.9.1 This commit adjusts the Lucene versions for 7.9.1 after the backporting of upgrading the 7.9 branch to Lucene 8.6.2.	2020-08-31 10:30:39 -04:00
Jason Tedor	64cd229b35	Upgrade to Lucene 8.6.2 (#61688 ) This commit upgrades the Lucene dependencies to 8.6.2.	2020-08-31 09:54:07 -04:00
Rory Hunter	ff6c071275	Implement deprecation logging using log4j (#61629 ) Backport of #61474. Part of #46106. Simplify the implementation of deprecation logging by relying of log4j more completely, and implementing additional behaviour through custom appenders and filters.	2020-08-31 12:42:04 +01:00
Armin Braun	5c86b216e8	Fix Race in testGetSnapshotsRequest (#61694 ) (#61700 ) The fact that the data node is already blocked on writing data files did not guarantee that the cluster state that made the data node start snapshotting is already applied on master. This could lead to races where the get snapshots action still runs based on a state without the snapshot in it, tripping the assertion. Much safer to handle this by waiting on the non-blocking snapshot create to return, which guarantees that the CS has been applied on master. Closes #61541	2020-08-31 11:06:51 +02:00
Armin Braun	22e4d759c3	Speed up Reading Enum Set from Stream (#61678 ) (#61687 ) No need in adding enum values to a normal set and then copying, the `EnumSet` is directly mutable just fine.	2020-08-30 20:49:51 +02:00
Jake Landis	d2e5f2f532	[7.x] Enhance the ingest node simulate verbose output (#60433 ) (#60678 ) This commit enhances the verbose output for the `_ingest/pipeline/_simulate?verbose` api. Specifically this adds the following: * the pipeline processor is now included in the output * the conditional (if) and result is now included in the output iff it was defined * a status field is always displayed. the possible values of status are * `success` - if the processor ran with out errors * `error` - if the processor ran but threw an error that was not ingored * `error_ignored` - if the processor ran but threw an error that was ingored * `skipped` - if the process did not run (currently only possible if the if condition evaluates to false) * `dropped` - if the the `drop` processor ran and dropped the document * a `processor_type` field for the type of processor (e.g. set, rename, etc.) * throw a better error if trying to simulate with a pipeline that does not exist closes #56004	2020-08-27 16:53:09 -05:00
Lee Hinman	1bfebd54ea	[7.x] Allocate newly created indices on data_hot tier nodes (#61342 ) (#61650 ) This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by default when they are created. This does not break existing behavior, as nodes with the `data` role are considered to be part of the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`, `data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by default. This change is a little more complicated than changing the default value for `index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to have a plugin inject a setting into the builder for a newly created index. This has the benefit of allowing this setting to be visible as part of the settings when retrieving the index, for example: ``` // Create an index PUT /eggplant // Get an index GET /eggplant?flat_settings ``` Returns the default settings now of: ```json { "eggplant" : { "aliases" : { }, "mappings" : { }, "settings" : { "index.creation_date" : "1597855465598", "index.number_of_replicas" : "1", "index.number_of_shards" : "1", "index.provided_name" : "eggplant", "index.routing.allocation.include._tier" : "data_hot", "index.uuid" : "6ySG78s9RWGystRipoBFCA", "index.version.created" : "8000099" } } } ``` After the initial setting of this setting, it can be treated like any other index level setting. This new setting is not set on a new index if any of the following is true: - The index is created with an `index.routing.allocation.include.<anything>` setting - The index is created with an `index.routing.allocation.exclude.<anything>` setting - The index is created with an `index.routing.allocation.require.<anything>` setting - The index is created with a null `index.routing.allocation.include._tier` value - The index was created from an existing source metadata (shrink, clone, split, etc) Relates to #60848	2020-08-27 13:41:12 -06:00
Luca Cavanna	f769821bc8	Pass SearchLookup supplier through to fielddataBuilder (#61430 ) (#61638 ) Runtime fields need to have a SearchLookup available, when building their fielddata implementations, so that they can look up other fields, runtime or not. To achieve that, we add a Supplier<SearchLookup> argument to the existing MappedFieldType#fielddataBuilder method. As we introduce the ability to look up other fields while building fielddata for mapped fields, we implicitly add the ability for a field to require other fields. This requires some protection mechanism that detects dependency cycles to prevent stack overflow errors. With this commit we also introduce detection for cycles, as well as a limit on the depth of the references for a runtime field. Note that we also plan on introducing cycles detection at compile time, so the runtime cycles detection is a last resort to prevent stack overflow errors but we hope that we can reject runtime fields from being registered in the mappings when they create a cycle in their definition. Note that this commit does not introduce any production implementation of runtime fields, but is rather a pre-requisite to merge the runtime fields feature branch. This is a breaking change for MapperPlugins that plug in a mapper, as the signature of MappedFieldType#fielddataBuilder changes from taking a single argument (the index name), to also accept a Supplier<SearchLookup>. Relates to #59332 Co-authored-by: Nik Everett <nik9000@gmail.com>	2020-08-27 18:09:56 +02:00
Alan Woodward	b6cb590685	Log more information when mappings fail on index creation (#61577 ) Errors from bad mappings at index creation are currently logged at DEBUG level, which can make it difficult to work out what's going on if the index is being auto-created. This commit ups the log level to INFO for auto-created indices, and includes some more information in the log message.	2020-08-27 15:08:51 +01:00
David Turner	411965d392	Allow background cluster state update in tests (#61455 ) Today the `CoordinatorTests` run the publication process as a single atomic action; however in production it appears possible that another master may be elected, publish its state, then fail, then we win another election, all in between the time we sampled our previous cluster state and started to publish the one we first thought of. This violates the `assertClusterStateConsistency()` assertion that verifies the cluster state update event matches the states we actually published and applied. This commit adjusts the tests to run the publication process more asynchronously so as to allow time for this behaviour to occur. This should eventually result in a reproduction of the failure in #61437 that will let us analyse what's really going on there and help us fix it.	2020-08-27 11:22:58 +01:00
David Turner	b866aaf81c	Use int for number of parts in blob store (#61618 ) Today we use `long` to represent the number of parts of a blob. There's no need for this extra range, it forces us to do some casting elsewhere, and indeed when snapshotting we iterate over the parts using an `int` which would be an infinite loop in case of overflow anyway: for (int i = 0; i < fileInfo.numberOfParts(); i++) { This commit changes the representation of the number of parts of a blob to an `int`.	2020-08-27 10:54:03 +01:00
David Turner	5df74cc888	Replace Math.toIntExact with toIntBytes (#61604 ) We convert longs to ints using `Math.toIntExact` in places where we're sure there will be no overflow, but this doesn't explain the intent of these conversions very well. This commit introduces a dedicated method for these conversions, and adds an assertion that we never overflow.	2020-08-27 08:28:54 +01:00
Jay Modi	34c4fc3b91	Remove tasks module to define tasks system index (#61588 ) This commit removes the tasks module that only existed to define the tasks result index, `.tasks`, as a system index. The definition for the tasks results system index descriptor is moved to the `SystemIndices` class with a check that no other plugin or module attempts to define an entry with the same source. Additionally, this change also makes the pattern for the tasks result index a wildcard pattern since we will need this when the index is upgraded (reindex to new name and then alias that to .tasks). Backport of #61540	2020-08-26 09:48:23 -06:00

1 2 3 4 5 ...

5289 Commits