OpenSearch

Commit Graph

Author	SHA1	Message	Date
Ignacio Vera	31c026f25c	upgrade to Lucene-8.7.0-snapshot-61ea26a (#61957 ) (#61974 )	2020-09-04 13:46:20 +02:00
Nik Everett	3d23dcd742	Use standard bit set impl in cardinality (#61816 ) (#61930 ) This replaces a specialized bit set implementation used in cardinality with our standard `BitArray` which works exactly the same way. It's also tracked by `BigArrays` which is great!	2020-09-03 12:37:30 -04:00
Nik Everett	3934e14bc0	Fixup vwhisto test (#60936 ) (#61928 ) This test assumed some random bounds that turned out not to hold in some cases. Closes #60673	2020-09-03 12:37:17 -04:00
Alan Woodward	48870c60c7	Don't spin up a whole node to unit test some data structures (#61923 ) BytesRefHashTests and LongObjectHashMapTests currently extend ESSingleNodeTestCase, which builds an entire node just to run some unit tests over entirely in-memory data structures. This commit converts them both to extend ESTestCase.	2020-09-03 17:19:42 +01:00
Alan Woodward	3a1e0edf0a	Convert DateFieldMapperTests to MapperTestCase (#61920 )	2020-09-03 16:04:02 +01:00
Martijn Laarman	cfa54c08bd	[7.x] Version bump 7.9.1 release	2020-09-03 16:41:58 +02:00
Alan Woodward	e2f006eeb4	Merge FetchSubPhase hitsExecute and hitExecute methods (#60907 ) (#61893 ) FetchSubPhase has two 'execute' methods, one which takes all hits to be examined, and one which takes a single HitContext. It's not obvious which one should be implemented by a given sub-phase, or if implementing both is a possibility; nor is it obvious that we first run the hitExecute methods of all subphases, and then subsequently call all the hitsExecute methods. This commit reworks FetchSubPhase to replace these two variants with a processor class, `FetchSubPhaseProcessor`, that is returned from a single `getProcessor` method. This processor class has two methods, `setNextReader()` and `process`. FetchPhase collects processors from all its subphases (if a subphase does not need to execute on the current search context, it can return `null` from `getProcessor`). It then sorts its hits by docid, and groups them by lucene leaf reader. For each reader group, it calls `setNextReader()` on all non-null processors, and then passes each doc id to `process()`. Implementations of fetch sub phases can divide their concerns into per-request, per-reader and per-document sections, and no longer need to worry about sorting docs or dealing with reader slices. FetchSubPhase now provides a FetchSubPhaseExecutor that exposes two methods, setNextReader(LeafReaderContext) and execute(HitContext). The parent FetchPhase collects all these executors together (if a phase should not be executed, then it returns null here); then it sorts hits, and groups them by reader; for each reader it calls setNextReader, and then execute for each hit in turn. Individual sub phases no longer need to concern themselves with sorting docs or keeping track of readers; global structures can be built in getExecutor(SearchContext), per-reader structures in setNextReader and per-doc in execute.	2020-09-03 12:20:55 +01:00
Alan Woodward	af01ccee93	Add specific test for serializing all mapping parameter values (#61844 ) (#61877 ) This commit adds a test to MapperTestCase that explicitly checks that a mapper can serialize all its default values, and that this serialization can then be re-parsed. Note that the test is disabled for non-parametrized mappers as their serialization may in some cases output parameters that are not accepted. Gradually moving all mappers to parametrized form will address this. The commit also contains a fix to keyword mappers, which were not correctly serializing the similarity parameter; this partially addresses #61563. It also enables `null` as a value for `null_value` on `scaled_float`, as a follow-up to #61798	2020-09-03 09:20:26 +01:00
Nik Everett	c19f67ce30	Support longs in BitArray (backport of #61867 ) (#61871 ) We frequently use `long`s with `BitArray` in aggs and right now we have to assert that the `long` fits in an `int`. This adds support for `long` to `BitArray` so we don't need those assertions.	2020-09-02 17:24:31 -04:00
Henning Andersen	867d5f1c68	Search memory leak (#61788 ) (#61862 ) Search could leak memory if global ordinals were calculated as part of a search with low level cancellation enabled. QueryPhase registers a cancellation on the reader that is never removed, which ends up being referenced from the global ordinals cache entry. This keeps an indirect reference to the search context. A significant leak can occur when a heavy aggregation (cardinality for instance) is used and a failure occurs during search, in particular if the pages backing the HyperLogLog++ structure are not recycled when it is closed. This commit also fixes an issue with an unclosed resource and request breaker adjustment in the cardinality aggregation.	2020-09-02 18:51:14 +02:00
Jim Ferenczi	a0e4331c49	Cleanup usages of QueryPhaseResultConsumer (#61713 ) This commit generalizes how QueryPhaseResultConsumer is initialized. The query phase always uses this consumer so it doesn't need to be hidden behind an abstract class.	2020-09-02 14:41:02 +02:00
Alan Woodward	d59343b4ba	Allow [null] values in [null_value] (#61798 ) (#61807 ) Several field mappers have a null_value parameter, that allows you to specify a placeholder value to insert into a document if the incoming value for that field is null. The default value for this is always null, meaning "add no placeholder". However, we explicitly bar users from setting this parameter directly to null (done in #7978, in order to fix an NPE). This exclusion means that if a mapper is serialized with include_defaults, then we either need to special-case null_value to ensure that it is not output when it holds the default value, or we find that the resulting serialized form cannot be used to create a mapping. This stops us doing some useful generic testing of mappers. This commit permits null as a parameter value for null_value, and changes the tests to check that it is a) permissible and b) applied without throwing errors. As part of the testing changes, a new base class MapperServiceTestCase is refactored from MapperTestCase, holding the various helper methods related to building mappings but not the single-mapper specific abstract methods. Closes #58823	2020-09-02 10:42:19 +01:00
Igor Motov	48e53cca94	Fix wrong NaN comparison (#61795 ) (#61811 ) Fixes wrong NaN comparison in error message generator in GeoPolygonDecomposer and PolygonBuilder. Supersedes #48207 Co-authored-by: Pedro Luiz Cabral Salomon Prado <pedroprado010@users.noreply.github.com>	2020-09-01 15:50:38 -04:00
Tim Brooks	e573fa9abc	Add data.path fast path for FilePermission (#61302 ) The recursive data.path FilePermission check is an extremely hot codepath in Elasticsearch. Unfortunately the FilePermission check in Java is extremely allocation heavy. As it iterates through different file permissions, it allocates byte arrays for each Path component that must be compared. This PR improves the situation by adding the recursive data.path FilePermission in its own PermissionsCollection object which is checked first.	2020-09-01 12:03:22 -06:00
Armin Braun	28710c985d	Dry up Settings from Map Construction (#61778 ) (#61803 ) We used the same hack all over the place. At least drying it up to a single place. Co-authored-by: Jay Modi <jaymode@users.noreply.github.com>	2020-09-01 19:46:10 +02:00
Tanguy Leroux	6e944d9e21	Throws IndexNotFoundException in TransportGetAction for unknown System indices (#61785 ) (#61791 ) The change #57936 introduced a dedicated thread pool for reads in system indices. It also introduced a potential NPE in the case where the index to read is not yet present in the cluster state. This commit fixes that bug by using the getIndexSafe() method instead of just index() when retrieving the index's metadata so that an INFE is thrown if the index does not exist.	2020-09-01 17:41:57 +02:00
Dan Hermann	88a448f1cd	Fix wrong result when executing bulk requests with and without pipeline (#60818 ) (#61777 )	2020-09-01 07:05:25 -05:00
Armin Braun	3fd25bfa87	Fix Concurrent Snapshot Create+Delete + Delete Index (#61770 ) (#61773 ) We had a bug here where we put a `null` value into the shard assignment mapping when reassigning work after a snapshot delete had gone through. This only affects partial snapshots but essentially dead-locks the snapshot process. Closes #61762	2020-09-01 13:20:25 +02:00
Tanguy Leroux	787dfda4c1	Prevent snapshots to be mounted as system indices (#61517 ) (#61727 ) System indices can be snapshotted and are therefore potential candidates to be mounted as searchable snapshot indices. As of today nothing prevents a snapshot from being mounted under an index name starting with . and this can lead to conflicting situations, because searchable snapshot indices are read-only and Elasticsearch expects some system indices to be writable, and because searchable snapshot indices will soon use an internal system index (#60522) to speed up recoveries and we should prevent that system index from being itself a searchable snapshot index (leading to some deadlock situation for recovery). This commit introduces a change to prevent snapshots from being mounted as a system index.	2020-09-01 11:13:28 +02:00
Boice Huang	8fdd3d158b	Remove redundant symbol in msearch tests (#61353 )	2020-09-01 10:58:22 +02:00
Nik Everett	fb84c1f73e	Calculate precise cardinality upper bounds (#61529 ) (#61754 ) This reworks `CardinalityUpperBound` to support precise estimates while maintaining most of the public API. This will allow us to make more informed choices about the data structures that we use in aggregations. None of those interesting choices come as part of this change, but they are more possible with it.	2020-08-31 15:10:02 -04:00
Dan Hermann	2858e1efc4	Document new stats in _cat/nodes (#60445 ) (#61742 )	2020-08-31 12:40:21 -05:00
Adam Locke	5723b928d7	Remove Outdated Snapshot Docs (#61684 ) (#61728 ) Removing some now outdated statements that refer to a time when snapshot operations could not run concurrently. Closes #61680	2020-08-31 12:04:27 -04:00
Jason Tedor	43cb7c48bd	Adjust Lucene versions for 7.9.1 This commit adjusts the Lucene versions for 7.9.1 after the backporting of upgrading the 7.9 branch to Lucene 8.6.2.	2020-08-31 10:30:39 -04:00
Jason Tedor	64cd229b35	Upgrade to Lucene 8.6.2 (#61688 ) This commit upgrades the Lucene dependencies to 8.6.2.	2020-08-31 09:54:07 -04:00
Rory Hunter	ff6c071275	Implement deprecation logging using log4j (#61629 ) Backport of #61474. Part of #46106. Simplify the implementation of deprecation logging by relying on log4j more completely, and implementing additional behaviour through custom appenders and filters.	2020-08-31 12:42:04 +01:00
Armin Braun	5c86b216e8	Fix Race in testGetSnapshotsRequest (#61694 ) (#61700 ) The fact that the data node is already blocked on writing data files did not guarantee that the cluster state that made the data node start snapshotting is already applied on master. This could lead to races where the get snapshots action still runs based on a state without the snapshot in it, tripping the assertion. Much safer to handle this by waiting on the non-blocking snapshot create to return, which guarantees that the CS has been applied on master. Closes #61541	2020-08-31 11:06:51 +02:00
Armin Braun	22e4d759c3	Speed up Reading Enum Set from Stream (#61678 ) (#61687 ) No need to add enum values to a normal set and then copy it; the `EnumSet` is directly mutable just fine.	2020-08-30 20:49:51 +02:00
Jake Landis	d2e5f2f532	[7.x] Enhance the ingest node simulate verbose output (#60433 ) (#60678 ) This commit enhances the verbose output for the `_ingest/pipeline/_simulate?verbose` api. Specifically this adds the following: * the pipeline processor is now included in the output * the conditional (if) and result is now included in the output iff it was defined * a status field is always displayed. the possible values of status are * `success` - if the processor ran without errors * `error` - if the processor ran but threw an error that was not ignored * `error_ignored` - if the processor ran but threw an error that was ignored * `skipped` - if the processor did not run (currently only possible if the if condition evaluates to false) * `dropped` - if the `drop` processor ran and dropped the document * a `processor_type` field for the type of processor (e.g. set, rename, etc.) * throw a better error if trying to simulate with a pipeline that does not exist closes #56004	2020-08-27 16:53:09 -05:00
Lee Hinman	1bfebd54ea	[7.x] Allocate newly created indices on data_hot tier nodes (#61342 ) (#61650 ) This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by default when they are created. This does not break existing behavior, as nodes with the `data` role are considered to be part of the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`, `data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by default. This change is a little more complicated than changing the default value for `index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to have a plugin inject a setting into the builder for a newly created index. This has the benefit of allowing this setting to be visible as part of the settings when retrieving the index, for example: ``` // Create an index PUT /eggplant // Get an index GET /eggplant?flat_settings ``` Returns the default settings now of: ```json { "eggplant" : { "aliases" : { }, "mappings" : { }, "settings" : { "index.creation_date" : "1597855465598", "index.number_of_replicas" : "1", "index.number_of_shards" : "1", "index.provided_name" : "eggplant", "index.routing.allocation.include._tier" : "data_hot", "index.uuid" : "6ySG78s9RWGystRipoBFCA", "index.version.created" : "8000099" } } } ``` After the initial setting of this setting, it can be treated like any other index level setting. 
This new setting is not set on a new index if any of the following is true: - The index is created with an `index.routing.allocation.include.<anything>` setting - The index is created with an `index.routing.allocation.exclude.<anything>` setting - The index is created with an `index.routing.allocation.require.<anything>` setting - The index is created with a null `index.routing.allocation.include._tier` value - The index was created from an existing source metadata (shrink, clone, split, etc) Relates to #60848	2020-08-27 13:41:12 -06:00
Luca Cavanna	f769821bc8	Pass SearchLookup supplier through to fielddataBuilder (#61430 ) (#61638 ) Runtime fields need to have a SearchLookup available, when building their fielddata implementations, so that they can look up other fields, runtime or not. To achieve that, we add a Supplier<SearchLookup> argument to the existing MappedFieldType#fielddataBuilder method. As we introduce the ability to look up other fields while building fielddata for mapped fields, we implicitly add the ability for a field to require other fields. This requires some protection mechanism that detects dependency cycles to prevent stack overflow errors. With this commit we also introduce detection for cycles, as well as a limit on the depth of the references for a runtime field. Note that we also plan on introducing cycles detection at compile time, so the runtime cycles detection is a last resort to prevent stack overflow errors but we hope that we can reject runtime fields from being registered in the mappings when they create a cycle in their definition. Note that this commit does not introduce any production implementation of runtime fields, but is rather a pre-requisite to merge the runtime fields feature branch. This is a breaking change for MapperPlugins that plug in a mapper, as the signature of MappedFieldType#fielddataBuilder changes from taking a single argument (the index name), to also accept a Supplier<SearchLookup>. Relates to #59332 Co-authored-by: Nik Everett <nik9000@gmail.com>	2020-08-27 18:09:56 +02:00
Alan Woodward	b6cb590685	Log more information when mappings fail on index creation (#61577 ) Errors from bad mappings at index creation are currently logged at DEBUG level, which can make it difficult to work out what's going on if the index is being auto-created. This commit ups the log level to INFO for auto-created indices, and includes some more information in the log message.	2020-08-27 15:08:51 +01:00
David Turner	411965d392	Allow background cluster state update in tests (#61455 ) Today the `CoordinatorTests` run the publication process as a single atomic action; however in production it appears possible that another master may be elected, publish its state, then fail, then we win another election, all in between the time we sampled our previous cluster state and started to publish the one we first thought of. This violates the `assertClusterStateConsistency()` assertion that verifies the cluster state update event matches the states we actually published and applied. This commit adjusts the tests to run the publication process more asynchronously so as to allow time for this behaviour to occur. This should eventually result in a reproduction of the failure in #61437 that will let us analyse what's really going on there and help us fix it.	2020-08-27 11:22:58 +01:00
David Turner	b866aaf81c	Use int for number of parts in blob store (#61618 ) Today we use `long` to represent the number of parts of a blob. There's no need for this extra range, it forces us to do some casting elsewhere, and indeed when snapshotting we iterate over the parts using an `int` which would be an infinite loop in case of overflow anyway: for (int i = 0; i < fileInfo.numberOfParts(); i++) { This commit changes the representation of the number of parts of a blob to an `int`.	2020-08-27 10:54:03 +01:00
David Turner	5df74cc888	Replace Math.toIntExact with toIntBytes (#61604 ) We convert longs to ints using `Math.toIntExact` in places where we're sure there will be no overflow, but this doesn't explain the intent of these conversions very well. This commit introduces a dedicated method for these conversions, and adds an assertion that we never overflow.	2020-08-27 08:28:54 +01:00
Jay Modi	34c4fc3b91	Remove tasks module to define tasks system index (#61588 ) This commit removes the tasks module that only existed to define the tasks result index, `.tasks`, as a system index. The definition for the tasks results system index descriptor is moved to the `SystemIndices` class with a check that no other plugin or module attempts to define an entry with the same source. Additionally, this change also makes the pattern for the tasks result index a wildcard pattern since we will need this when the index is upgraded (reindex to new name and then alias that to .tasks). Backport of #61540	2020-08-26 09:48:23 -06:00
David Turner	f2dc664228	Remove dead code in EsExecutors (#61574 ) Removes a couple of unused methods.	2020-08-26 16:08:36 +01:00
Przemyslaw Gomulka	9f566644af	Do not create two loggers for DeprecationLogger backport(#58435 ) (#61530 ) DeprecationLogger's constructor should not create two loggers. It was taking parent logger instance, changing its name with a .deprecation prefix and creating a new logger. Most of the time parent logger was not needed. It was causing Log4j to unnecessarily cache the unused parent logger instance. depends on #61515 backports #58435	2020-08-26 16:04:02 +02:00
Igor Motov	f70a59971a	[7.x] Add rate aggregation (#61369 ) (#61554 ) Adds a new rate aggregation that can calculate a document rate for buckets of a date_histogram. Closes #60674	2020-08-25 17:39:00 -04:00
Nik Everett	87cf81e179	Migrate some more mapper test cases (#61507 ) (#61552 ) Migrate some more mapper test cases from `ESSingleNodeTestCase` to `MapperTestCase`.	2020-08-25 15:27:26 -04:00
markharwood	8b56441d2b	Search - add case insensitive support for regex queries. (#59441 ) (#61532 ) Backport to add case insensitive support for regex queries. Forks a copy of Lucene’s RegexpQuery and RegExp from Lucene master. This can be removed when 8.7 Lucene is released. Closes #59235	2020-08-25 17:18:59 +01:00
Przemyslaw Gomulka	f3f7d25316	Header warning logging refactoring backport(#55941 ) (#61515 ) Splitting DeprecationLogger into two. HeaderWarningLogger - responsible for adding response warning headers and ThrottlingLogger - responsible for limiting the duplicated log entries for the same key (previously deprecateAndMaybeLog). Introducing a ThrottlingAndHeaderWarningLogger which is a base for other common logging usages where both response warning header and logging throttling was needed. relates #55699 relates #52369 backports #55941	2020-08-25 16:35:54 +02:00
Armin Braun	f22ddf822e	Some Optimizations around BytesArray (#61183 ) (#61511 ) * Faster `equals` for `BytesArray` which is nice since with this change we use it for the search cache * Lighter `StreamInput` for `BytesArray` that should save memory and some indirection relative to the one on the abstract bytes reference * Lighter `writeTo` implementation * Build a `BytesArray` instead of a PagedBytesReference whenever possible to save indirection and memory	2020-08-25 07:13:39 +02:00
Armin Braun	806dfcfcf7	Speed up Compression Logic by Pooling Resources (#61358 ) (#61495 ) This is mostly motivated by the performance issues we are seeing around the GET mappings REST API which (in case of a large number of indices) will create decompressing streams in a hot loop which takes a significant amount of time for the system calls involved in instantiating deflaters and inflaters. Also, this fixes a leaked deflater when deserializing cached repository data.	2020-08-25 04:01:55 +02:00
Armin Braun	16b932c1dc	Remove Potentially Expensive Use of BytesReference.toBytesRef (#61415 ) (#61503 ) This method might have to materialize all the bytes in a reference into a fresh `byte[]`. Using the stream is much safer and only trivially more expensive + in most cases we now run the fast path via `BytesArray` anyway.	2020-08-24 23:58:21 +02:00
Nhat Nguyen	d47bbbafe0	Cancel multisearch when http connection closed (#61399 ) Relates #61337	2020-08-24 15:12:54 -04:00
Nhat Nguyen	23a0f8b617	Detect and optimize noop of update index settings (#61348 ) This optimization is more relevant in the context of CCR. When a node in the follower cluster leaves, we reallocate the shard-follow tasks on that node to other nodes. The new tasks will overwhelm the follower cluster with many put-mapping, update-settings requests, although most of them are noop. This change detects and optimizes the noop update-settings requests.	2020-08-24 15:08:53 -04:00
Nik Everett	f3b6d49ae1	Migrate server mapper tests to new MapperTestCase (#61378 ) (#61490 ) This continues #61301, migrating all of the mappers in `server` to the new `MapperTestCase` which is nicer than `FieldMapperTestCase` because it doesn't depend on all of Elasticsearch.	2020-08-24 13:33:35 -04:00
Armin Braun	bb4d97073c	Remove Favicon Special Path in RestController (#61460 ) (#61487 ) It's unnecessary (and adds one string comparison to every request) to special case the favicon so I added it as a normal REST handler to simplify the code.	2020-08-24 18:36:23 +02:00
Armin Braun	af2e2782eb	Stop Needlessly Copying Bytes in XContent Parsing (#61447 ) (#61469 ) Wrapping a `BytesArray` in a `StreamInput` for deserialization is inefficient. This forces Jackson to internally buffer (i.e. copy) all bytes from the `BytesArray` before deserializing, adding overhead for copying the bytes and managing the buffers. This commit fixes a number of spots where `BytesArray` is the most common type of `BytesReference` to special case this type and parse it more efficiently. Also improves parsing `String`s to use the more efficient direct `String` parsing APIs.	2020-08-24 15:49:15 +02:00

5275 Commits