OpenSearch

Commit Graph

Author	SHA1	Message	Date
Nhat Nguyen	87957603c0	Prune only gc deletes below local checkpoint (#28790 ) Once a document is deleted and Lucene is refreshed, we will not be able to look up the `version/seq#` associated with that delete in Lucene. As conflicting operations can still be indexed, we need another mechanism to remember these deletes. Therefore deletes should still be stored in the Version Map, even after Lucene is refreshed. Obviously, we can't remember all deletes forever so a trimming mechanism is needed. Currently, we remember deletes for at least 1 minute (the default GC deletes cycle) and clean them periodically. This is, at the moment, the best we can do on the primary for user facing APIs but this arbitrary time limit is problematic for replicas. Furthermore, we can't rely on the primary and replicas doing the trimming in a synchronized manner, and failing to do so results in the replica and primary making different decisions. The following scenario can cause inconsistency between primary and replica. 1. Primary indexes doc (index, id=1, v2) 2. Network packet issue causes index operation to back off and wait 3. Primary deletes doc (delete, id=1, v3) 4. Replica processes delete (delete, id=1, v3) 5. 1+ minute passes (GC deletes runs on replica) 6. Indexing op is finally sent to the replica which now processes it because it forgot about the delete. We can rely on sequence-numbers to prevent this issue. If we prune only deletes whose seqno is at most the local checkpoint, a replica will correctly remember what it needs. The correctness is explained as follows: Suppose o1 and o2 are two operations on the same document with seq#(o1) < seq#(o2), and o2 arrives before o1 on the replica. o2 is processed normally since it arrives first; when o1 arrives it should be discarded: 1. If seq#(o1) <= LCP, then it will not be added to Lucene, as it was already previously added. 2. If seq#(o1) > LCP, then it depends on the nature of o2: - If o2 is a delete then its seq# is recorded in the VersionMap, since seq#(o2) > seq#(o1) > LCP, so a lookup can find it and determine that o1 is stale. - If o2 is an indexing then its seq# is either in Lucene (if refreshed) or the VersionMap (if not refreshed yet), so a real-time lookup can find it and determine that o1 is stale. In this PR, we prefer to deploy a single trimming strategy, which satisfies both requirements, on primary and replicas because: - It's simpler - no need to distinguish if an engine is running at primary mode or replica mode or being promoted. - If a replica subsequently is promoted, user experience is fully maintained as that replica remembers deletes for the last GC cycle. However, the version map may consume less memory if we deploy two different trimming strategies for primary and replicas.	2018-03-26 13:42:08 -04:00
Boaz Leskes	bca264699a	remove testUnassignedShardAndEmptyNodesInRoutingTable testUnassignedShardAndEmptyNodesInRoutingTable is as old as time and does a very bogus thing. It is an IT test which extracts the GatewayAllocator from the node and tells it to allocate unassigned shards, while giving it a conjured cluster state with no nodes in it (it uses DiscoveryNodes.EMPTY_NODES). This is never a cluster state we want to reroute on (we always have at least a master node in it). I'm going to just delete the test as I don't think it adds much value. Closes #21463	2018-03-26 17:10:57 +02:00
Boaz Leskes	f5d4550e93	Fold EngineDiskUtils into Store, for better lock semantics (#29156 ) #28245 has introduced the utility class `EngineDiskUtils` with a set of methods to prepare/change translog and lucene commit points. That util class bundled everything that's needed to create an empty shard, bootstrap a shard from a lucene index that was just restored etc. In order to safely do these manipulations, the util methods acquired the IndexWriter's lock. That would sometimes fail due to concurrent shard store fetching or other short activities that require the files not to be changed while they read from them. Since there is no way to wait on the index writer lock, the `Store` class has other locks to make sure that once we try to acquire the IW lock, it will succeed. To side step this waiting problem, this PR folds `EngineDiskUtils` into `Store`. Sadly this comes with a price - the store class doesn't and shouldn't know about the translog. As such the logic is slightly less tight and callers have to do the translog manipulations on their own.	2018-03-26 14:08:03 +02:00
Christoph Büscher	318b0af953	Remove execute mode bit from source files Some source files seem to have the execute bit (a+x) set, which doesn't really seem to hurt but is a bit odd. This change removes those, making the permissions similar to other source files in the repository.	2018-03-26 13:37:55 +02:00
Jim Ferenczi	5288235ca3	Optimize the composite aggregation for match_all and range queries (#28745 ) This change refactors the composite aggregation to add an execution mode that visits documents in the order of the values present in the leading source of the composite definition. This mode does not need to visit all documents since it can early terminate the collection when the leading source value is greater than the lowest value in the queue. Instead of collecting the documents in the order of their doc_id, this mode uses the inverted lists (or the bkd tree for numerics) to collect documents in the order of the values present in the leading source. For instance the following aggregation: ``` "composite" : { "sources" : [ { "value1": { "terms" : { "field": "timestamp", "order": "asc" } } } ], "size": 10 } ``` ... can use the field `timestamp` to collect the documents with the 10 lowest values for the field instead of visiting all documents. For composite aggregation with more than one source the execution can early terminate as soon as one of the 10 lowest values produces enough composite buckets. For instance if visiting the first two lowest timestamp created 10 composite buckets we can early terminate the collection since it is guaranteed that the third lowest timestamp cannot create a composite key that compares lower than the one already visited. This mode can execute iff: * The leading source in the composite definition uses an indexed field of type `date` (works also with `date_histogram` source), `integer`, `long` or `keyword`. * The query is a match_all query or a range query over the field that is used as the leading source in the composite definition. * The sort order of the leading source is the natural order (ascending since postings and numerics are sorted in ascending order only). If these conditions are not met this aggregation visits each document like any other agg.	2018-03-26 09:51:37 +02:00
Nicholas Knize	fede633563	Add Z value support to geo_shape This enhancement adds Z value support (source only) to geo_shape fields. If vertices are provided with a third dimension, the third dimension is ignored for indexing but returned as part of source. Like before, any values greater than the 3rd dimension are ignored. closes #23747	2018-03-23 08:50:55 -05:00
Nhat Nguyen	794de63232	Remove type casts in logging in server component (#28807 ) This commit removes type-casts in logging in the server component (other components will be done later). This also adds a parameterized message test which would catch breaking-changes related to lambdas in Log4J.	2018-03-23 07:35:50 -04:00
Yu	4a8099c696	Change BroadcastResponse from ToXContentFragment to ToXContentObject (#28878 ) While working on #27799, we find that it might make sense to change BroadcastResponse from ToXContentFragment to ToXContentObject, seeing that it's rather a complete XContent object and also the other Responses are normally ToXContentObject. By doing this, we can also move the XContent build logic of BroadcastResponse's subclasses, from Rest Layer to the concrete classes themselves. Relates to #3889	2018-03-23 10:53:37 +01:00
Milan Chovatiya	8328b9c5cd	REST : Split `RestUpgradeAction` into two actions (#29124 ) Closes #29062	2018-03-23 10:37:31 +01:00
Nhat Nguyen	14157c8705	Harden periodically check to avoid endless flush loop (#29125 ) In #28350, we fixed an endless flushing loop which may happen on replicas by tightening the relation between the flush action and the periodically flush condition. 1. The periodically flush condition is enabled only if it is disabled after a flush. 2. If the periodically flush condition is enabled then a flush will actually happen regardless of Lucene state. (1) and (2) guarantee that a flushing loop will be terminated. Sadly, condition 1 can be violated in edge cases as we used two different algorithms to evaluate the current and future uncommitted translog size. - We use method `uncommittedSizeInBytes` to calculate current uncommitted size. It is the sum of translogs whose generation is at least the minGen (determined by a given seqno). We pick a continuous range of translogs since the minGen to evaluate the current uncommitted size. - We use method `sizeOfGensAboveSeqNoInBytes` to calculate the future uncommitted size. It is the sum of translogs whose maxSeqNo is at least the given seqNo. Here we don't pick a range but select translogs one by one. Suppose we have 3 translogs `gen1={#1,#2}, gen2={}, gen3={#3}` and `seqno=#1`, `uncommittedSizeInBytes` is the sum of gen1, gen2, and gen3 while `sizeOfGensAboveSeqNoInBytes` is the sum of gen1 and gen3. Gen2 is excluded because its maxSeqno is still -1. This commit removes both `sizeOfGensAboveSeqNoInBytes` and `uncommittedSizeInBytes` methods, then enforces an engine to use only `sizeInBytesByMinGen` method to evaluate the periodically flush condition. Closes #29097 Relates #28350	2018-03-22 14:31:15 -04:00
Jim Ferenczi	c93c7f3121	Remove deprecated options for query_string (#29203 ) This commit removes some parameters deprecated in 6.x (or 5.x): `use_dismax`, `split_on_whitespace`, `all_fields` and `lowercase_expanded_terms`. Closes #25551	2018-03-22 18:37:08 +01:00
Yu	24c8d8f5ef	REST high-level client: add force merge API (#28896 ) Relates to #27205	2018-03-22 17:17:16 +01:00
Lee Hinman	7d1de890b8	Decouple more classes from XContentBuilder and make builder strict (#29197 ) This commit decouples `BytesRef`, `Releaseable`, and `TimeValue` from XContentBuilder, and paves the way for decoupling `ByteSizeValue` as well. It moves much of the Lucene and Joda encoding into a new SPI extension that is loaded by XContentBuilder to know how to encode these values. Part of doing this also allows us to make JSON encoding strict, as we no longer allow just any old object to be passed (in the past it was possible to get json that was `"field": "java.lang.Object@d8355a8"` if no one was careful about what was passed in). Relates to #28504	2018-03-22 08:18:55 -06:00
Christoph Büscher	d6d3fb3c73	Use EnumMap in ClusterBlocks (#29112 ) By using EnumMap instead of an ImmutableLevelHolder array we can avoid using enum ordinals to index into the array.	2018-03-22 11:14:24 +01:00
Tanguy Leroux	edf27a599e	Add new setting to disable persistent tasks allocations (#29137 ) This commit adds a new setting `cluster.persistent_tasks.allocation.enable` that can be used to enable or disable the allocation of persistent tasks. The setting accepts the values `all` (default) or `none`. When set to none, the persistent tasks that are created (or that must be reassigned) won't be assigned to a node but will reside in the cluster state with no "executor node" and a reason describing why they are not assigned: ``` "assignment" : { "executor_node" : null, "explanation" : "persistent task [foo/bar] cannot be assigned [no persistent task assignments are allowed due to cluster settings]" } ```	2018-03-22 09:18:07 +01:00
Nhat Nguyen	7d44d75774	Adjust PreSyncedFlushResponse bwc versions We discussed and agreed to include the synced-flush change in 6.3.0+ but not in 5.6.9. We will re-evaluate the urgency and importance of the issue then decide which versions that the change should be included.	2018-03-21 16:50:35 -04:00
markharwood	93ff973afc	Tests - fix incorrect test assumption that zero-doc buckets will be returned by the adjacency matrix aggregation. Closes #29159 (#29167 )	2018-03-21 10:42:14 +00:00
Jason Tedor	2f6c77337e	Remove 6.1.5 version constant The assumption here is that we will no longer be making a release from the 6.1 branch. Since we assume that all versions on this branch are actually released, we do not want to leave behind any versions that would require a snapshot build. We do have a test that verifies that all released versions are present here, so if another release is performed from the 6.1 branch, that test will fail and we will know to add the version constant at that time.	2018-03-21 06:28:17 -04:00
Adrien Grand	8f9d2ee4e2	Reject updates to the `_default_` mapping. (#29165 ) This will reject mapping updates to the `_default_` mapping with 7.x indices and still emit a deprecation warning with 6.x indices. Relates #15613 Supersedes #28248	2018-03-21 10:44:11 +01:00
Nhat Nguyen	f938c4267e	Fix BWC issue for PreSyncedFlushResponse I misunderstood how the bwc versions work. If we backport to 5.x, we need to backport to all supported 6.*. This commit corrects the BWC versions for PreSyncedFlushResponse. Relates #29103	2018-03-20 13:56:15 -04:00
Lee Hinman	b4af451ec5	Remove BytesArray and BytesReference usage from XContentFactory (#29151 ) * Remove BytesArray and BytesReference usage from XContentFactory This removes the usage of `BytesArray` and `BytesReference` from `XContentFactory`. Instead, a regular `byte[]` should be passed. To assist with this a helper has been added to `XContentHelper` that will preserve the offset and length from the underlying BytesReference. This is part of ongoing work to separate the XContent parts from ES so they can be factored into their own jar. Relates to #28504	2018-03-20 11:52:26 -06:00
Lee Hinman	4bd217c94f	Add pluggable XContentBuilder writers and human readable writers (#29120 ) * Add pluggable XContentBuilder writers and human readable writers This adds the ability to use SPI to plug in writers for XContentBuilder. By implementing the XContentBuilderProvider class we can allow Elasticsearch to plug in different ways to encode types to JSON. Important caveat for this, we should always try to have the class implement `ToXContentFragment` first, however, in the case of classes from our dependencies (think Joda classes or Lucene classes) we need a way to specify writers for these classes. This also makes the human-readable field writers generic and pluggable, so that we no longer need to tie XContentBuilder to things like `TimeValue` and `ByteSizeValue`. Contained as part of this moves all the TimeValue human readable fields to the new `humanReadableField` method. A future commit will move the `ByteSizeValue` calls over to this method. Relates to #28504	2018-03-20 11:39:24 -06:00
Christoph Büscher	701625b065	Add unreleased version 6.2.4 (#29171 )	2018-03-20 18:38:06 +01:00
Christoph Büscher	5a97fe75da	Add unreleased version 6.1.5 (#29168 )	2018-03-20 18:31:59 +01:00
Luca Cavanna	ff09c82319	REST high-level client: add clear cache API (#28866 ) * REST high-level client: add clear cache API Relates to #27205 Also Closes #26947 (rest-spec were outdated)	2018-03-20 10:39:36 +01:00
Lee Hinman	687577a516	Fix javadoc warning in Strings for missing parameter description Fixes a parameter in `Strings` that had a javadoc annotation but was missing the description, causing warnings in the build.	2018-03-19 12:28:15 -06:00
Lee Hinman	3025295f7e	Decouple Text and Geopoint from XContentBuilder (#29119 ) This removes the `Text` and `Geopoint` special handling from `XContentBuilder`. Instead, these classes now implement `ToXContentFragment` and render themselves accordingly. This allows us to further decouple XContentBuilder from Elasticsearch-specific classes so it can be factored into a standalone lib at a later time. Relates to #28504	2018-03-19 08:54:10 -06:00
Nik Everett	bf05c600c4	REST: Include suppressed exceptions on failures (#29115 ) This modifies xcontent serialization of Exceptions to contain suppressed exceptions. If there are any suppressed exceptions they are included in the exception response by default. The reasoning here is that they are fairly rare but when they exist they almost always add extra useful information. Take, for example, the response when you specify two broken ingest pipelines: ``` { "error" : { "root_cause" : ...snip... "type" : "parse_exception", "reason" : "[field] required property is missing", "header" : { "processor_type" : "set", "property_name" : "field" }, "suppressed" : [ { "type" : "parse_exception", "reason" : "[field] required property is missing", "header" : { "processor_type" : "convert", "property_name" : "field" } } ] }, "status" : 400 } ``` Moreover, when suppressed exceptions come from 500 level errors they should give us more useful debugging information. Closes #23392	2018-03-19 10:52:50 -04:00
Tanguy Leroux	0f93b7abdf	Fix compilation errors in ML integration tests After elastic/elasticsearch#29109, the `needsReassignment` method has been moved to the PersistentTasksClusterService. This commit fixes some compilation in tests I introduced.	2018-03-19 09:46:53 +01:00
Tanguy Leroux	b57bd695f2	Small code cleanups and refactorings in persistent tasks (#29109 ) This commit consists of small code cleanups and refactorings in the persistent tasks framework. Most changes are in PersistentTasksClusterService where some methods have been renamed or merged together, documentation has been added, unused code removed in order to improve readability of the code.	2018-03-19 09:26:17 +01:00
Nhat Nguyen	f1029aaad5	getMinGenerationForSeqNo should acquire read lock (#29126 ) The method Translog#getMinGenerationForSeqNo does not modify the current translog but only accesses it; it should therefore acquire the readLock instead of the writeLock.	2018-03-17 17:43:20 -04:00
Nhat Nguyen	c9749180a1	Backport - Do not renew sync-id PR to 5.6 and 6.3 Relates #29103	2018-03-17 11:38:22 -04:00
Jason Tedor	2e93a9158f	Align thread pool info to thread pool configuration (#29123 ) Today we report thread pool info using a common object. This means that we use a shared set of terminology that is not consistent with the terminology used to configure thread pools. This holds in particular for the minimum and maximum number of threads in the thread pool where we use the following terminology: thread pool info \| fixed \| scaling; min \| core \| size; max \| max \| size. This commit changes the display of thread pool info to be dependent on the type of the thread pool so that we can align the terminology in the output of thread pool info with the terminology used to configure a thread pool.	2018-03-16 22:47:06 -04:00
Nhat Nguyen	22ad52a288	TEST: Adjust translog size assumption in new engine A new engine now can have more than one empty translog since #28676. This caused #testShouldPeriodicallyFlush to fail because in the test we assume an engine should have one empty translog. This commit takes into account the extra translog size of a new engine.	2018-03-16 21:50:31 -04:00
olcbean	47211c00e9	REST: Clear Indices Cache API simplify param parsing (#29111 ) Simplify the parsing of the params in Clear Indices Cache API, as a follow up to the removing of the deprecated parameter names.	2018-03-16 16:50:34 -04:00
Jason Tedor	4d62640bf1	Fix typo in ExceptionSerializationTests This commit fixes a little typo in ExceptionSerializationTests.java replacing "weas" by "was".	2018-03-16 15:52:39 -04:00
Jason Tedor	1f1a4d17b4	Remove BWC layer for rejected execution exception The serialization changes for rejected execution exceptions have been backported to 6.x with the intention to appear in all versions since 6.3.0. Therefore, this BWC layer is no longer needed in master since master would never speak to a node that does not speak the same serialization.	2018-03-16 14:40:17 -04:00
Jason Tedor	6bf742dd1b	Fix EsAbortPolicy to conform to API (#29075 ) The rejected execution handler API says that rejectedExecution(Runnable, ThreadPoolExecutor) throws a RejectedExecutionException if the task must be rejected due to capacity on the executor. We do throw something that smells like a RejectedExecutionException (it is named EsRejectedExecutionException) yet we violate the API because EsRejectedExecutionException is not a RejectedExecutionException. This has caused problems before where we try to catch RejectedExecution when invoking rejectedExecution but this causes EsRejectedExecutionException to go uncaught. This commit addresses this by modifying EsRejectedExecutionException to extend RejectedExecutionException.	2018-03-16 14:34:36 -04:00
David Turner	158bb23887	Remove usages of obsolete settings (#29087 ) The settings `indices.recovery.concurrent_streams` and `indices.recovery.concurrent_small_file_streams` were removed in `f5e4cd4616`. This commit removes their last traces from the codebase.	2018-03-16 15:35:40 +00:00
Nhat Nguyen	2c1ef3d4c6	Do not renew sync-id if all shards are sealed (#29103 ) Today the synced-flush always issues a new sync-id even though all shards haven't been changed since the last seal. This causes active shards to have a different sync-id from offline shards even though all were sealed and there have been no writes since then. This commit adjusts not to renew sync-id if all active shards are sealed with the same sync-id. Closes #27838	2018-03-16 11:16:30 -04:00
Adrien Grand	0755ff425f	Clarify requirements of strict date formats. (#29090 ) Closes #29014	2018-03-16 14:39:36 +01:00
Alan Woodward	a2d5cf6514	Compilation fix for #29067	2018-03-16 13:33:25 +00:00
Alan Woodward	986e518170	Store offsets in index prefix fields when stored in the parent field (#29067 ) The index prefix field is normally indexed as docs-only, given that it cannot be used in phrases. However, in the case that the parent field has been indexed with offsets, or has term-vector offsets, we should also store this in the index prefix field for highlighting. Note that this commit does not implement highlighting on prefix fields, but rather ensures that future work can implement this without a backwards-break in index data. Closes #28994	2018-03-16 11:39:46 +00:00
Tanguy Leroux	f14146982f	Use removeTask instead of finishTask in PersistentTasksClusterService (#29055 ) The method `PersistentTasksClusterService.finishTask()` has been modified since it was added and does not use any `removeOncompletion` flag anymore. Its behavior is now similar to `removeTask()` and can be replaced by this one. When a non existing task is removed, the cluster state update task will fail and its `source` will still indicate `finish persistent task`/`remove persistent task`.	2018-03-16 10:20:56 +01:00
Yogesh Gaikwad	a685784cea	CLI: Close subcommands in MultiCommand (#28954 ) * CLI Command: MultiCommand must close subcommands to release resources properly - Changes are done to override the close method and call close on subcommands using IOUtils#close - Unit Test Closes #28953	2018-03-16 09:59:23 +11:00
Nhat Nguyen	c75790e7c0	TEST: write ops should execute under shard permit (#28966 ) Currently ESIndexLevelReplicationTestCase executes write operations without acquiring index shard permit. This may prevent the primary term on replica from being updated or cause a race between resync and indexing on primary. This commit ensures that write operations are always executed under shard permit like the production code.	2018-03-15 14:42:15 -04:00
Mayya Sharipova	8cb3d18eac	Revert "Improve error message for installing plugin (#28298 )" This reverts commit `0cc1ffdf20`. The reason is that Windows tests are failing because of the incorrect path for the plugin.	2018-03-15 10:47:50 -07:00
Adrien Grand	404e776a45	Validate regular expressions in dynamic templates. (#29013 ) Today you would only get these errors at index time. Relates #24749	2018-03-15 16:43:56 +01:00
Christoph Büscher	312ccc05d5	[Tests] Fix GetResultTests and DocumentFieldTests failures (#29083 ) Changes made in #28972 seem to have changed some assumptions about how SMILE and CBOR write byte[] values and how this is tested. This changes the generation of the randomized DocumentField values back to BytesArray while expecting the JSON and YAML deserialisation to produce Base64 encoded strings and SMILE and CBOR to parse back BytesArray instances. Closes #29080	2018-03-15 16:42:26 +01:00
Adrien Grand	18d848f218	Reenable LiveVersionMapTests.testRamBytesUsed on Java 9. (#29063 ) I also had to make the test more lenient. This is due to the fact that Lucene's RamUsageTester was changed in order not to reflect `java.*` classes and the way that it estimates ram usage of maps is by assuming it has similar memory usage to an `Object[]` array that stores all keys and values. The implementation in `LiveVersionMap` tries to be slightly more realistic by taking the load factor and linked lists into account, so it usually gives a higher estimate which happens to be closer to reality. Closes #22548	2018-03-15 16:39:02 +01:00
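
The pruning rule described in commit 87957603c0 above can be illustrated with a small Java sketch. This is not the engine's code: the class `TombstonePruningSketch`, its methods, and the timing values are hypothetical names and numbers chosen for illustration. It only models the stated rule, namely that a delete is forgotten once it is older than the GC interval and its sequence number is at most the local checkpoint.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of delete tombstones kept in a version map.
// None of these names come from the actual engine code.
final class TombstonePruningSketch {

    // A remembered delete: its sequence number and when it was recorded.
    static final class Tombstone {
        final long seqNo;
        final long timeMillis;

        Tombstone(long seqNo, long timeMillis) {
            this.seqNo = seqNo;
            this.timeMillis = timeMillis;
        }
    }

    private final Map<String, Tombstone> tombstones = new HashMap<>();

    public void putDelete(String id, long seqNo, long nowMillis) {
        tombstones.put(id, new Tombstone(seqNo, nowMillis));
    }

    // Drop a tombstone only if it is older than the GC interval AND its
    // sequence number is at most the local checkpoint (LCP).
    public void prune(long nowMillis, long gcIntervalMillis, long localCheckpoint) {
        tombstones.values().removeIf(t ->
            nowMillis - t.timeMillis > gcIntervalMillis && t.seqNo <= localCheckpoint);
    }

    // An incoming operation is stale if a remembered delete for the same
    // document carries a higher sequence number.
    public boolean isStale(String id, long seqNo) {
        Tombstone t = tombstones.get(id);
        return t != null && t.seqNo > seqNo;
    }

    public static void main(String[] args) {
        TombstonePruningSketch replica = new TombstonePruningSketch();
        replica.putDelete("1", 3L, 0L); // the delete (seq# 3) arrives first

        // Two minutes later the index op with seq# 2 is still in flight, so
        // the local checkpoint cannot have advanced past 1 (assumed values).
        replica.prune(120_000L, 60_000L, 1L);

        // The tombstone survived, so the late index op is recognized as stale.
        System.out.println(replica.isStale("1", 2L)); // prints true
    }
}
```

With a purely time-based rule the tombstone would have been dropped after one minute and the late index operation would have been applied. Tying pruning to the local checkpoint keeps the tombstone until every operation with a lower sequence number has been processed.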

343 Commits
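
The size mismatch described in commit 14157c8705 above can be reproduced with a short Java sketch. The class `TranslogSizeSketch`, its method names, and the byte sizes are assumptions made for illustration and are not the translog API. The sketch only contrasts the two summation strategies from the commit message on the example `gen1={#1,#2}, gen2={}, gen3={#3}` with `seqno=#1`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of translog generations; not the real Translog classes.
final class TranslogSizeSketch {

    static final class Gen {
        final long generation;
        final long maxSeqNo;    // -1 when the generation holds no operations
        final long sizeInBytes;

        Gen(long generation, long maxSeqNo, long sizeInBytes) {
            this.generation = generation;
            this.maxSeqNo = maxSeqNo;
            this.sizeInBytes = sizeInBytes;
        }
    }

    // The example from the commit message; the byte sizes are made up.
    public static List<Gen> example() {
        return Arrays.asList(
            new Gen(1, 2, 100),   // gen1 = {#1, #2}
            new Gen(2, -1, 55),   // gen2 = {} (empty, but still occupies bytes)
            new Gen(3, 3, 60));   // gen3 = {#3}
    }

    // Lowest generation that contains an operation at or above seqNo.
    public static long minGenForSeqNo(List<Gen> gens, long seqNo) {
        long min = Long.MAX_VALUE;
        for (Gen g : gens) {
            if (g.maxSeqNo >= seqNo) {
                min = Math.min(min, g.generation);
            }
        }
        return min;
    }

    // Strategy 1: sum the contiguous range of generations starting at minGen.
    public static long sizeFromMinGen(List<Gen> gens, long seqNo) {
        long minGen = minGenForSeqNo(gens, seqNo);
        long total = 0;
        for (Gen g : gens) {
            if (g.generation >= minGen) {
                total += g.sizeInBytes;
            }
        }
        return total;
    }

    // Strategy 2: pick generations one by one, by their maxSeqNo.
    public static long sizeOfGensAboveSeqNo(List<Gen> gens, long seqNo) {
        long total = 0;
        for (Gen g : gens) {
            if (g.maxSeqNo >= seqNo) {
                total += g.sizeInBytes;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Gen> gens = example();
        System.out.println(sizeFromMinGen(gens, 1));       // prints 215
        System.out.println(sizeOfGensAboveSeqNo(gens, 1)); // prints 160
    }
}
```

The two results differ by exactly the size of the empty generation, which is why a condition evaluated with one strategy before a flush and the other after it can disagree. Using a single strategy for both evaluations, as the commit does, removes that disagreement.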