OpenSearch

Commit Graph

Author	SHA1	Message	Date
Nhat Nguyen	04dd738782	TEST: trim unsafe commits before opening engine Since #29260, unsafe commits must be trimmed before opening an engine. This makes the engine constructor follow Lucene standard semantics and use the last commit. However, we haven't fully applied this change in some tests. Relates #29260	2018-03-29 14:25:42 -04:00
Boaz Leskes	eb8b31746a	Move trimming unsafe commits from engine ctor to store (#29260 ) As follow up to #28245 , this PR removes the logic for selecting the right start commit from the Engine constructor in favor of explicitly trimming them in the Store, before the engine is opened. This makes the constructor in engine follow standard Lucene semantics and use the last commit. Relates #28245 Relates #29156	2018-03-29 13:35:57 -04:00
Igor Motov	04d0edc8ee	Fix incorrect geohash for lat 90, lon 180 (#29256 ) Due to special treatment for the 0xFFFFFF... value in GeoHashUtils' encodeLatLon method, the hashcode for lat 90, lon 180 is incorrectly encoded as `"000000000000"` instead of "zzzzzzzzzzzz". This commit removes the special treatment and fixes the issue. Closes #22163	2018-03-29 09:23:43 -04:00
Tanguy Leroux	b6568d0cfd	Do not load global state when deleting a snapshot (#29278 ) When deleting a snapshot, it is not necessary to load and to parse the global metadata of the snapshot to delete. Now indices are stored in the snapshot metadata file, we have all the information to resolve the shards files to delete. This commit removes the readSnapshotMetaData() method that was used to load both global and index metadata files. Test coverage should be enough as SharedClusterSnapshotRestoreIT already contains several deletion tests. Related to #28934	2018-03-29 09:16:53 +02:00
Uwe Schindler	6578d8a2a8	Update to forbiddenapis 2.5 (#29285 )	2018-03-28 20:29:02 -07:00
Ryan Ernst	21c985122b	Build: Fix repos.mavenLocal casing (#29289 ) The sysprop repos.mavenLocal may be used to add the local .m2 maven repository for testing snapshots of locally build dependencies. Unfortunately this has to be checked in two different places (they cannot be shared, due to buildSrc being built essentially as a separate project), and the casing of the string sysprop lookups did not align. This commit fixes BuildPlugin's checking of repos.mavenLocal to use the correct casing (camelCase, to match the gradle dsl element).	2018-03-28 20:21:30 -07:00
Nhat Nguyen	9bc167466f	TEST: add log testDoNotRenewSyncedFlushWhenAllSealed This test was failed recently. This commit enables debug log and prints out seals. https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+multijob-unix-compatibility/os=oraclelinux/2234/console https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+intake/1437/console	2018-03-28 22:05:00 -04:00
Bolarinwa Saheed Olayemi	a3e5773522	Docs: Link to Ansible playbook for Elasticsearch (#29238 ) Links to the official Ansible playbook for Elasticsearch.	2018-03-28 18:18:42 -04:00
Jason Tedor	4ef3de40bc	Fix handling of bad requests (#29249 ) Today we have a few problems with how we handle bad requests: - handling requests with bad encoding - handling requests with invalid value for filter_path/pretty/human - handling requests with a garbage Content-Type header There are two problems: - in every case, we give an empty response to the client - in most cases, we leak the byte buffer backing the request! These problems are caused by a broader problem: poor handling preparing the request for handling, or the channel to write to when the response is ready. This commit addresses these issues by taking a unified approach to all of them that ensures that: - we respond to the client with the exception that blew us up - we do not leak the byte buffer backing the request	2018-03-28 16:25:01 -04:00
Simon Willnauer	13e19e7428	Allow _update and upsert to read from the transaction log (#29264 ) We historically removed reading from the transaction log to get consistent results from _GET calls. There was also the motivation that the read-modify-update principle we apply should not be hidden from the user. We still agree on the fact that we should not hide these aspects but the impact on updates is quite significant especially if the same documents is updated before it's written to disk and made serachable. This change adds back the ability to read from the transaction log but only for update calls. Calls to the _GET API will always do a refresh if necessary to return consistent results ie. if stored fields or DocValues Fields are requested. Closes #26802	2018-03-28 18:03:34 +02:00
Christoph Büscher	c3fdf8fbfb	[Docs] Fix small typo in ranking evaluation docs	2018-03-28 17:45:44 +02:00
Christoph Büscher	27e45fc552	Remove IndicesOptions bwc serialization layer (#29281 ) On master we don't need to talk to pre-6.0 nodes anymore.	2018-03-28 16:19:45 +02:00
Luca Cavanna	245dd73156	Bulk processor#awaitClose to close scheduler (#29263 ) When the `BulkProcessor` is used with the high-level REST client, a scheduler is internally created that allows to schedule tasks. Such scheduler is not exposed to users and needs to be closed once the `BulkProcessor` is closed. There are two ways to close the `BulkProcessor` though, one is the ordinary `close` method and the other one is `awaitClose`. The former closes the scheduler while the latter doesn't, leaving threads lingering.	2018-03-28 16:09:18 +02:00
Diwas Joshi	63203b228b	[Docs] Update aggregations.asciidoc (#29265 ) Add note about accuracy of some aggregation results.	2018-03-28 15:01:45 +02:00
Jason Tedor	4417580d05	Remove leftover tests.rest.spec property from docs (#29279 ) We previously had a property to specify the location of the REST test spec files but this was removed in a previous refactoring yet left behind in the docs. This commit removes the last remaining vestige of this parameter.	2018-03-28 08:22:06 -04:00
Yannick Welsch	cacf759213	Remove RELOCATED index shard state (#29246 ) as this information is already covered by ReplicationTracker.primaryMode.	2018-03-28 12:25:46 +02:00
Robin Neatherway	ea8e3661d0	Fix a type check that is always false (#27726 ) DocumentParser: The checks for Text and Keyword were masked by the earlier check for String, which they are child classes of. As String field types are no longer supported, this check can be removed.	2018-03-28 10:20:20 +02:00
Tanguy Leroux	36f8531bf4	Don't load global state when only restoring indices (#29239 ) Restoring a snapshot, or getting the status of finished snapshots, currently always load the global state metadata file from the repository even if it not required. This slows down the restore process (or listing statuses process) and can also be an issue if the global state cannot be deserialized (because it has unknown customs for example). This commit splits the Repository.getSnapshotMetadata() method into two distincts methods: getGlobalMetadata() and getIndexMetadata() that are now called only when needed.	2018-03-28 09:35:05 +02:00
Jason Tedor	1f6a3c1d80	Fix building Javadoc JARs on JDK for client JARs (#29274 ) When a module or plugin register that it has a client JAR, we copy artifacts like the Javadoc and sources JARs as the JARs for the client as well (with -client added to the name). I previously had to disable the Javadoc task on JDK 10 due to a bug in bin/javadoc. After JDK 10 went GA without a fix for this bug, I added workaround to fix the Javadoc task on JDK 10. However, I made a mistake reverting the previously skipped Javadocs tasks and missed that one that copies the Javadoc JAR for client JARs. This commit fixes that issue.	2018-03-27 22:58:44 -04:00
Jason Tedor	38fd9998e7	Require JDK 10 to build Elasticsearch (#29174 ) This commit bumps the minimum compiler version required to build Elasticsearch from JDK 9 to JDK 10.	2018-03-27 19:45:13 -04:00
Lee Hinman	eebda6974d	Decouple NamedXContentRegistry from ElasticsearchException (#29253 ) * Decouple NamedXContentRegistry from ElasticsearchException This commit decouples `NamedXContentRegistry` from using either `ElasticsearchException`, `ParsingException`, or `UnknownNamedObjectException`. This will allow us to move NamedXContentRegistry to its own lib as part of the xcontent extraction work. Relates to #28504	2018-03-27 16:51:31 -06:00
Bart van Oort	67a6a76aad	Docs: Update generating test coverage reports (#29255 ) Old docs said to use maven. That doesn't work. We can't generate the reports right now.	2018-03-27 17:29:19 -04:00
Lee Hinman	7df66abaf5	[TEST] Fix issue with HttpInfo passed invalid parameter HttpInfo is passed the maxContentLength as a parameter, but this value should never be negative. This fixes the test to only pass a positive random value.	2018-03-27 14:20:06 -06:00
Lee Hinman	b4c78019b0	Remove all dependencies from XContentBuilder (#29225 ) * Remove all dependencies from XContentBuilder This commit removes all of the non-JDK dependencies from XContentBuilder, with the exception of `CollectionUtils.ensureNoSelfReferences`. It adds a third extension point around dealing with time-based fields and formatters to work around the Joda dependency. This decoupling allows us to be able to move XContentBuilder to a separate lib so it can be available for things like the high level rest client. Relates to #28504	2018-03-27 12:58:22 -06:00
Jim Ferenczi	3db6f1c9d5	Fix sporadic failure in CompositeValuesCollectorQueueTests This commit fixes a test bug that causes an NPE on empty segments. Closes #29269	2018-03-27 20:11:21 +02:00
Jim Ferenczi	2aaa057387	Propagate ignore_unmapped to inner_hits (#29261 ) In 5.2 `ignore_unmapped` was added to `inner_hits` in order to ignore invalid mapping. This value was automatically set to the value defined in the parent query (`nested`, `has_child`, `has_parent`) but the refactoring of the parent/child in 5.6 removed this behavior unintentionally. This commit restores this behavior but also makes sure that we always automatically enforce this value when the query builder is used directly (previously this was only done by the XContent deserialization). Closes #29071	2018-03-27 18:55:42 +02:00
Nhat Nguyen	dfc9e721d8	TEST: Increase timeout for testPrimaryReplicaResyncFailed The default timeout (eg. 10 seconds) may not be enough for CI to re-allocate shards after the partion is healed. This commit increases the timeout to 30 seconds and enables logging in order to have more detailed information in case this test failed again. Closes #29060	2018-03-27 12:18:09 -04:00
Luca Cavanna	13f9e922f3	REST client: hosts marked dead for the first time should not be immediately retried (#29230 ) This was the plan from day one but due to a silly bug nodes were immediately retried after they were marked as dead for the first time. From the second time on, the expected backoff was applied.	2018-03-27 16:15:44 +02:00
Nhat Nguyen	d1d3edf156	TEST: Use different translog dir for a new engine In #testPruneOnlyDeletesAtMostLocalCheckpoint, we create a new engine but mistakenly use the same translog directory of the existing engine. This prevents translog files from cleaning up when closing the engines. ERROR 0.12s J2 \| InternalEngineTests.testPruneOnlyDeletesAtMostLocalCheckpoint <<< FAILURES! > Throwable #1: java.io.IOException: could not remove the following files (in the order of attempts): > translog-primary-060/translog-2.tlog: java.io.IOException: access denied: This commit makes sure to use a separate directory for each engine in this tes.	2018-03-27 09:45:51 -04:00
Christoph Büscher	8d6832c5ee	Make SearchStats implement Writeable (#29258 ) Moves another class over from Streamable to Writeable. By this, also some constructors can be removed or made private.	2018-03-27 15:21:11 +02:00
Andrew Banchich	d2baf4b191	[Docs] Spelling and grammar changes to reindex.asciidoc (#29232 )	2018-03-27 12:17:46 +02:00
Nhat Nguyen	0ac89a32cc	Do not optimize append-only if seen normal op with higher seqno (#28787 ) When processing an append-only operation, primary knows that operations can only conflict with another instance of the same operation. This is true as the id was freshly generated. However this property doesn't hold for replicas. As soon as an auto-generated ID was indexed into the primary, it can be exposed to a search and users can issue a follow up operation on it. In extremely rare cases, the follow up operation can be arrived and processed on a replica before the original append-only request. In this case we can't simply proceed with the append-only request and blindly add it to the index without consulting the version map. The following scenario can cause difference between primary and replica. 1. Primary indexes an auto-gen-id doc. (id=X, v=1, s#=20) 2. A refresh cycle happens on primary 3. The new doc is picked up and modified - say by a delete by query request - Primary gets a delete doc (id=X, v=2, s#=30) 4. Delete doc is processed first on the replica (id=X, v=2, s#=30) 5. Indexing operation arrives on the replica, since it's an auto-gen-id request and the retry marker is lower, we put it into lucene without any check. Replica has a doc the primary doesn't have. To deal with a potential conflict between an append-only operation and a normal operation on replicas, we need to rely on sequence numbers. This commit maintains the max seqno of non-append-only operations on replica then only apply optimization for an append-only operation only if its seq# is higher than the seq# of all non-append-only.	2018-03-26 16:56:12 -04:00
Andy Bristol	7bf9091942	[test] packaging: gradle tasks for groovy tests (#29046 ) The vagrant test plugin adds tasks for the groovy packaging tests, which run after the bats packaging test tasks.Rename the 'bats' configuration to 'packaging' and remove the option to inherit archives from this configuration.	2018-03-26 13:43:09 -07:00
Nhat Nguyen	87957603c0	Prune only gc deletes below local checkpoint (#28790 ) Once a document is deleted and Lucene is refreshed, we will not be able to look up the `version/seq#` associated with that delete in Lucene. As conflicting operations can still be indexed, we need another mechanism to remember these deletes. Therefore deletes should still be stored in the Version Map, even after Lucene is refreshed. Obviously, we can't remember all deletes forever so a trimming mechanism is needed. Currently, we remember deletes for at least 1 minute (the default GC deletes cycle) and clean them periodically. This is, at the moment, the best we can do on the primary for user facing APIs but this arbitrary time limit is problematic for replicas. Furthermore, we can't rely on the primary and replicas doing the trimming in a synchronized manner, and failing to do so results in the replica and primary making different decisions. The following scenario can cause inconsistency between primary and replica. 1. Primary index doc (index, id=1, v2) 2. Network packet issue causes index operation to back off and wait 3. Primary deletes doc (delete, id=1, v3) 4. Replica processes delete (delete, id=1, v3) 5. 1+ minute passes (GC deletes runs replica) 6. Indexing op is finally sent to the replica which no processes it because it forgot about the delete. We can reply on sequence-numbers to prevent this issue. If we prune only deletes whose seqno at most the local checkpoint, a replica will correctly remember what it needs. The correctness is explained as follows: Suppose o1 and o2 are two operations on the same document with seq#(o1) < seq#(o2), and o2 arrives before o1 on the replica. o2 is processed normally since it arrives first; when o1 arrives it should be discarded: 1. If seq#(o1) <= LCP, then it will be not be added to Lucene, as it was already previously added. 2. If seq#(o1) > LCP, then it depends on the nature of o2: - If o2 is a delete then its seq# is recorded in the VersionMap, since seq#(o2) > seq#(o1) > LCP, so a lookup can find it and determine that o1 is stale. - If o2 is an indexing then its seq# is either in Lucene (if refreshed) or the VersionMap (if not refreshed yet), so a real-time lookup can find it and determine that o1 is stale. In this PR, we prefer to deploy a single trimming strategy, which satisfies both requirements, on primary and replicas because: - It's simpler - no need to distinguish if an engine is running at primary mode or replica mode or being promoted. - If a replica subsequently is promoted, user experience is fully maintained as that replica remembers deletes for the last GC cycle. However, the version map may consume less memory if we deploy two different trimming strategies for primary and replicas.	2018-03-26 13:42:08 -04:00
Boaz Leskes	bca264699a	remove testUnassignedShardAndEmptyNodesInRoutingTable testUnassignedShardAndEmptyNodesInRoutingTable and that test is as old as time and does a very bogus thing. it is an IT test which extracts the GatewayAllocator from the node and tells it to allocated unassigned shards, while giving it a conjured cluster state with no nodes in it (it uses the DiscoveryNodes.EMPTY_NODES. This is never a cluster state we want to reroute on (we always have at least master node in it). I'm going to just delete the test as I don't think it adds much value. Closes #21463	2018-03-26 17:10:57 +02:00
Jim Ferenczi	dd77d7fd0a	#28745 : remove extra option in the composite rest tests `allow_partial_search_results` is not needed for these tests.	2018-03-26 14:32:59 +02:00
Boaz Leskes	f5d4550e93	Fold EngineDiskUtils into Store, for better lock semantics (#29156 ) #28245 has introduced the utility class`EngineDiskUtils` with a set of methods to prepare/change translog and lucene commit points. That util class bundled everything that's needed to create and empty shard, bootstrap a shard from a lucene index that was just restored etc. In order to safely do these manipulations, the util methods acquired the IndexWriter's lock. That would sometime fail due to concurrent shard store fetching or other short activities that require the files not to be changed while they read from them. Since there is no way to wait on the index writer lock, the `Store` class has other locks to make sure that once we try to acquire the IW lock, it will succeed. To side step this waiting problem, this PR folds `EngineDiskUtils` into `Store`. Sadly this comes with a price - the store class doesn't and shouldn't know about the translog. As such the logic is slightly less tight and callers have to do the translog manipulations on their own.	2018-03-26 14:08:03 +02:00
Christoph Büscher	a9392f6d42	Add file permissions checks to precommit task This adds a check for source files that have the execute bit set to the precommit task.	2018-03-26 13:37:55 +02:00
Christoph Büscher	318b0af953	Remove execute mode bit from source files Some source files seem to have the execute bit (a+x) set, which doesn't really seem to hurt but is a bit odd. This change removes those, making the permissions similar to other source files in the repository.	2018-03-26 13:37:55 +02:00
Jim Ferenczi	5288235ca3	Optimize the composite aggregation for match_all and range queries (#28745 ) This change refactors the composite aggregation to add an execution mode that visits documents in the order of the values present in the leading source of the composite definition. This mode does not need to visit all documents since it can early terminate the collection when the leading source value is greater than the lowest value in the queue. Instead of collecting the documents in the order of their doc_id, this mode uses the inverted lists (or the bkd tree for numerics) to collect documents in the order of the values present in the leading source. For instance the following aggregation: ``` "composite" : { "sources" : [ { "value1": { "terms" : { "field": "timestamp", "order": "asc" } } } ], "size": 10 } ``` ... can use the field `timestamp` to collect the documents with the 10 lowest values for the field instead of visiting all documents. For composite aggregation with more than one source the execution can early terminate as soon as one of the 10 lowest values produces enough composite buckets. For instance if visiting the first two lowest timestamp created 10 composite buckets we can early terminate the collection since it is guaranteed that the third lowest timestamp cannot create a composite key that compares lower than the one already visited. This mode can execute iff: * The leading source in the composite definition uses an indexed field of type `date` (works also with `date_histogram` source), `integer`, `long` or `keyword`. * The query is a match_all query or a range query over the field that is used as the leading source in the composite definition. * The sort order of the leading source is the natural order (ascending since postings and numerics are sorted in ascending order only). If these conditions are not met this aggregation visits each document like any other agg.	2018-03-26 09:51:37 +02:00
Christoph Büscher	afe95a7738	[Docs] Add rank_eval size parameter k (#29218 ) The rank_eval documentation was missing an explanation of the parameter `k` that controls the number of top hits that are used in the ranking evaluation. Closes #29205	2018-03-23 18:04:32 +01:00
Nicholas Knize	d400a08788	[DOCS] Remove ignore_z_value parameter link Removes invalid ignore_z_value parameter link in geo-point.asciidoc.	2018-03-23 11:07:24 -05:00
Jean-Charles Legras	687fe860ac	Docs: Update docs/index_.asciidoc (#29172 ) Use `_doc` in the routing example instead of `tweet` to agree with the text and line up with the other examples.	2018-03-23 11:35:10 -04:00
Petr Novák	16bffc7394	Docs: Link C++ client lib elasticlient (#28949 ) elasticlient is simple library for simplified work with Elasticsearch in C++	2018-03-23 11:30:01 -04:00
Yannick Welsch	3b8a8867c4	[DOCS] Unregister repository instead of deleting it (#29206 ) Relates to #15426	2018-03-23 15:53:36 +01:00
Nik Everett	8c59e43ac7	Docs: HighLevelRestClient#multiSearch (#29144 ) Adds docs for `HighLevelRestClient#multiSearch`. Unlike the `multiGet` docs these are much more sparse because multi-search doesn't support setting many options on the `MultiSearchRequest` and instead just wraps a list of `SearchRequest`s. Closes #28389	2018-03-23 10:11:50 -04:00
Nicholas Knize	fede633563	Add Z value support to geo_shape This enhancement adds Z value support (source only) to geo_shape fields. If vertices are provided with a third dimension, the third dimension is ignored for indexing but returned as part of source. Like beofre, any values greater than the 3rd dimension are ignored. closes #23747	2018-03-23 08:50:55 -05:00
Nhat Nguyen	794de63232	Remove type casts in logging in server component (#28807 ) This commit removes type-casts in logging in the server component (other components will be done later). This also adds a parameterized message test which would catch breaking-changes related to lambdas in Log4J.	2018-03-23 07:35:50 -04:00
Yu	4a8099c696	Change BroadcastResponse from ToXContentFragment to ToXContentObject (#28878 ) While working on #27799, we find that it might make sense to change BroadcastResponse from ToXContentFragment to ToXContentObject, seeing that it's rather a complete XContent object and also the other Responses are normally ToXContentObject. By doing this, we can also move the XContent build logic of BroadcastResponse's subclasses, from Rest Layer to the concrete classes themselves. Relates to #3889	2018-03-23 10:53:37 +01:00
Milan Chovatiya	8328b9c5cd	REST : Split `RestUpgradeAction` into two actions (#29124 ) Closes #29062	2018-03-23 10:37:31 +01:00

1 2 3 4 5 ...

30482 Commits All Branches Search

30482 Commits

All Branches