OpenSearch

mirror of https://github.com/honeymoose/OpenSearch.git synced 2025-02-13 08:25:26 +00:00

Author	SHA1	Message	Date
Simon Willnauer	b294250aba	Remove unused searcher parameter in SearchService#createContext (#27227 ) This parameter isn't used anywhere and just adds complexity.	2017-11-02 14:58:34 +01:00
Colin Goodheart-Smithe	c1b8140c83	Upgrade to Lucene 7.1 (#27225 )	2017-11-02 13:25:33 +00:00
Simon Willnauer	f928d613ad	Move IndexShard#getWritingBytes() under InternalEngine (#27209 ) We do some accounting in IndexShard that is not necessarily correct since we maintain two different index readers. This change moves the accounting under the engine which knows what reader we are refreshing. Relates to #26972	2017-11-02 10:43:17 +01:00
olcbean	b9896465cd	Introducing took time for _msearch This commit adds the took time to the response for _msearch. Relates #23767	2017-11-01 21:39:04 -04:00
Jason Tedor	59657ad1cb	Lazy initialize checkpoint tracker bit sets This local checkpoint tracker uses collections of bit sets to track which sequence numbers are complete, eventually removing these bit sets when the local checkpoint advances. However, these bit sets were eagerly allocated so that if a sequence number far ahead of the checkpoint was marked as completed, all bit sets between the "last" bit set and the bit set needed to track the marked sequence number were allocated. If this sequence number was too far ahead, the memory requirements could be excessive. This commit opts for a different strategy for holding on to these bit sets and enables them to be lazily allocated. Relates #27179	2017-11-01 21:26:52 -04:00
Jason Tedor	90d6317437	Remove checkpoint tracker bit sets setting We added an index-level setting for controlling the size of the bit sets used to back the local checkpoint tracker. This setting is really only needed to control the memory footprint of the bit sets but we do not think this setting is going to be needed. This commit removes this setting before it is released to the wild after which we would have to worry about BWC implications. Relates #27191	2017-11-01 21:13:01 -04:00
Colin Goodheart-Smithe	99aca9cdfc	Enhances exists queries to reduce need for `_field_names` (#26930 ) * Enhances exists queries to reduce need for `_field_names` Before this change we wrote the names of all the fields in a document to a `_field_names` field and then implemented exists queries as a term query on this field. The problem with this approach is that it bloats the index and also affects indexing performance. This change adds a new method `existsQuery()` to `MappedFieldType` which is implemented by each sub-class. For most field types if doc values are available a `DocValuesFieldExistsQuery` is used, falling back to using `_field_names` if doc values are disabled. Note that only fields where no doc values are available are written to `_field_names`. Closes #26770 * Addresses review comments * Addresses more review comments * implements existsQuery explicitly on every mapper * Reinstates ability to perform term query on `_field_names` * Added bwc depending on index created version * Review Comments * Skips tests that are not supported in 6.1.0 These values will need to be changed after backporting this PR to 6.x	2017-11-01 10:46:59 +00:00
Martijn van Groningen	d805c41b28	Added new terms_set query This query returns documents that match with at least one or more of the provided terms. The number of terms that must match varies per document and is either controlled by a minimum should match field or computed per document in a minimum should match script. Closes #26915	2017-11-01 10:55:18 +01:00
Jack Conradson	fd73e5fa41	Add version 6.0.0	2017-10-31 17:49:52 -07:00
Tanguy Leroux	13cd08b1e6	Convert index blocks to cluster block exceptions (#27050 )	2017-10-31 16:11:18 +01:00
Shai Erera	bd0261916c	Fix Laplace scorer to multiply by alpha (and not add) (#27125 )	2017-10-31 13:08:44 +01:00
javanna	34666844b3	[DOCS] Clarify migrate guide and search request validation Relates to #26811	2017-10-31 12:36:00 +01:00
kel	c3e2bdf20c	Raise IllegalArgumentException if query validation failed (#26811 ) Closes #26799	2017-10-31 12:17:27 +01:00
Armin Braun	a4c159e91e	prevent duplicate fields when mixing parent and root nested includes (#27072 ) Closes #26990	2017-10-31 10:01:33 +01:00
Adrien Grand	3812d3cb43	TopHitsAggregator must propagate calls to `setScorer`. (#27138 ) It is required in order to work correctly with bulk scorer implementations that change the scorer during the collection process. Otherwise sub collectors might call `Scorer.score()` on the wrong scorer. Closes #27131	2017-10-31 09:59:06 +01:00
Jason Tedor	a566942219	Refactor internal engine This commit is a minor refactoring of internal engine to move hooks for generating sequence numbers into the engine itself. As such, we refactor tests that relied on this hook to use the new hook, and remove the hook from the sequence number service itself. Relates #27082	2017-10-30 13:10:20 -04:00
Martijn van Groningen	c406a91158	Fix division by zero in phrase suggester that causes assertion to fail	2017-10-30 09:04:56 +01:00
Nhat	d01ad9367e	Enable Docstats with totalSizeInBytes for 6.1.0 Relates https://github.com/elastic/elasticsearch/pull/27117	2017-10-28 14:54:53 -04:00
Nhat	07d270b45f	Adds average document size to DocsStats (#27117 ) This change is required in order to support a size based check for the index rollover. The index size is estimated by sampling the existing segments only. We prefer using segments to StoreStats because StoreStats is not reliable if indexing or merging operations are in progress. Relates #27004	2017-10-28 12:47:08 -04:00
Jim Ferenczi	6625ecfff4	Fix max score tracking with field collapsing (#27122 ) This change makes sure that we track score when sort is set to relevancy only. In this case we always track max score like normal search does. Closes #23840	2017-10-27 09:18:34 +02:00
olcbean	35a2cc1003	fixed typo in ConstructingObjectParse (#27129 )	2017-10-26 13:14:56 -06:00
Jim Ferenczi	d1acf449f5	Apply missing request options to the expand phase (#27118 ) * Apply missing request options to the expand phase This change adds some missing options to the expand query that builds the inner hits for field collapsing. The following options are now applied to the inner_hits query: * post_filters * preferences * routing Closes #27079 Closes #26649	2017-10-26 17:01:57 +02:00
Simon Willnauer	1460a3feac	Only pull SegmentReader once in getSegmentInfo (#27121 )	2017-10-26 14:56:14 +02:00
Jason Tedor	0174d13ca2	Fix BWC for discovery stats The new discovery stats were pushed to the 6.x branch (currently versioned at 6.1.0) but master was not updated to reflect this. This impacts the mixed-cluster BWC tests because a 6.1.0 node will be trying to send a 7.0.0 node the new discovery stats but the 7.0.0 did not yet understand that it should be reading these when talking to a 6.1.0 node. This commit addresses this, and changes the skip version on the discovery stats REST tests.	2017-10-26 07:53:18 -04:00
Catalin Ursachi	8bf33241ed	Add Delete Index API support to high-level REST client (#27019 ) Relates to #25847	2017-10-26 09:52:46 +02:00
Jason Tedor	77f87732ef	Adjust .DS_Store test assertions on Windows Windows handles trying to read a file that does not exist because a component of the path is not a directory differently than other operating systems do. This commit adjusts these assertions for Windows.	2017-10-25 22:36:53 -04:00
Jason Tedor	17d6820a4b	Emit settings deprecation logging on empty update When executing a cluster settings update that leaves the cluster state unchanged, we skip validation and this avoids deprecation logging for deprecated settings in the cluster state. This commit addresses this by running validation even if the settings are unchanged. Relates #27017	2017-10-25 22:15:38 -04:00
Jason Tedor	9aae2f593a	Avoid stack overflow on search phases When a search is executing locally over many shards, we can stack overflow during query phase execution. This happens due to callbacks that occur after a phase completes for a shard and we move to the same phase on another shard. If all the shards for the query are local to the local node then we will never go async and these callbacks will end up as recursive calls. With sufficiently many shards, this will end up as a stack overflow. This commit addresses this by truncating the stack by forking to another thread on the executor for the phase. Relates #27069	2017-10-25 22:05:46 -04:00
Nhat	adc195e30c	Fix error message for a put index template request without index_patterns (#27102 ) Just correct the error message from "Validation Failed: 1: pattern is missing;" to "Validation Failed: 1: index_patterns is missing;". Closes #27100	2017-10-25 18:54:40 -04:00
Armin Braun	6533b165d6	#25601 Add pipeline support for REST API bulk upsert (#27075 )	2017-10-25 19:03:25 +02:00
Jason Tedor	6722b9c4a2	Ignore .DS_Store files on macOS Finder creates these files if you browse a directory there. These files are really annoying, but it's an incredible pain for users that these files are created unbeknownst to them, and then they get in the way of Elasticsearch starting. This commit adds leniency on macOS only to skip these files. Relates #27108	2017-10-25 11:25:29 -04:00
Luca Cavanna	5818ff6b56	Make ShardSearchTarget optional when parsing ShardSearchFailure (#27078 ) Turns out that `ShardSearchTarget` is nullable, hence its fields may not be printed out as part of `ShardSearchFailure#toXContent`, in which case `fromXContent` cannot parse it back. We would previously try to create the object with all of its fields set to null, but `Index` complains about it in the constructor. Also made sure that this code path is covered by our unit tests in `ShardSearchFailureTests`. Closes #27055	2017-10-25 13:26:06 +02:00
Luca Cavanna	8caf7d4ff8	Decouple BulkProcessor from ThreadPool (#26727 ) Introduce minimal thread scheduler as a base class for `ThreadPool`. Such a class can be used from the `BulkProcessor` to schedule retries and the flush task. This allows removing the `ThreadPool` dependency from `BulkProcessor`, which required providing settings that contain `node.name` and also needed log4j for logging. Instead, it now needs a `Scheduler` that is much lighter and gets automatically created and shut down on close. Closes #26028	2017-10-25 10:30:23 +02:00
David Turner	cc3364e4f8	Stats to record how often the ClusterState diff mechanism is used successfully (#26973 ) It's believed that using diffs obsoletes the other mechanism for reusing the bits of the ClusterState that didn't change between updates, but in fact we don't know for sure how often the diff mechanism works successfully. The stats collected here will tell us.	2017-10-25 07:35:25 +01:00
Lee Hinman	6bc7024f26	Tie-break shard path decision based on total number of shards on path (#27039 ) Right now if the number of shards for a particular index is equal across the data paths, we tie-break on space. This changes to tie-break first on the total number of shards for each path, and then, if that is the same, on the usable bytes. Relates to #26654 (it's a follow-up)	2017-10-24 16:12:47 -06:00
Jason Tedor	7a792d2c1f	Timed runnable should delegate to abstract runnable If timed runnable wraps an abstract runnable, then it should delegate to the abstract runnable otherwise force execution and handling rejections is dropped on the floor. Thus, timed runnable should itself be an abstract runnable delegating all methods to the wrapped runnable in cases when it is an abstract runnable. This commit causes this to be the case. Relates #27095	2017-10-24 11:36:50 -04:00
Lee Hinman	fcfbdf1f37	Expose adaptive replica selection stats in /_nodes/stats API This exposes the collected metrics we store for ARS in the nodes stats, as well as the computed rank of nodes. Each node exposes its perspective about the cluster. Here's an example output (with `?human`): ```json ... "adaptive_selection" : { "_k6v1-wERxyUd5ke6s-D0g" : { "outgoing_searches" : 0, "avg_queue_size" : 0, "avg_service_time" : "7.8ms", "avg_service_time_ns" : 7896963, "avg_response_time" : "9ms", "avg_response_time_ns" : 9095598, "rank" : "9.1" }, "VJiCUFoiTpySGmO00eWmtQ" : { "outgoing_searches" : 0, "avg_queue_size" : 0, "avg_service_time" : "1.3ms", "avg_service_time_ns" : 1330240, "avg_response_time" : "4.5ms", "avg_response_time_ns" : 4524154, "rank" : "4.5" }, "DHNGTdzyT9iiaCpEUsIAKA" : { "outgoing_searches" : 0, "avg_queue_size" : 0, "avg_service_time" : "2.1ms", "avg_service_time_ns" : 2113164, "avg_response_time" : "6.3ms", "avg_response_time_ns" : 6375810, "rank" : "6.4" } } ... ```	2017-10-24 08:58:42 -06:00
David Turner	cf2d0834f5	Remove duplicated test (#27091 )	2017-10-24 11:52:01 +01:00
Nhat	bf557fd886	test: avoid generating duplicate multiple fields (#27080 ) Multifields parser does not allow duplicate values, however the MultiFieldTests may produce duplicate field values. See https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+release-tests/132/console.	2017-10-23 09:59:40 -04:00
Adrien Grand	d0104c22a5	Reduce the default number of cached queries. (#26949 ) Memory usage of queries can't be properly accounted, which can be an issue when large queries are cached since the actual memory usage will be much higher than what the cache thinks. This problem is very hard if not impossible to fix so as a workaround I would like to decrease the maximum number of cached queries so that this problem is less likely to cause trouble in practice. For the record, this problem is more likely to occur in environments that have small shards or don't give much memory to the JVM. Closes #26938	2017-10-23 14:11:35 +02:00
Jason Tedor	35984a616e	Keep cumulative elapsed scroll time in microseconds Today we internally accumulate elapsed scroll time in nanoseconds. The problem here is that this can reasonably overflow. For example, on a system with scrolls that are open for ten minutes on average, after sixteen million scrolls the largest value that can be represented by a long will be exceeded. To address this, we switch to internally representing scroll time using microseconds, as this allows, with the same number of scrolls, scrolls that are open for seven days on average, or, with the same average elapsed time, sixteen billion scrolls, which will never happen (executing one scroll a second until sixteen billion have executed would not occur until more than five hundred years had elapsed). Relates #27068	2017-10-21 13:18:28 +02:00
Tanguy Leroux	463e7e6fa3	Revert "Upgrade to Jackson 2.9.2 (#27032 )" This reverts commit 0b9acc5acea90887cfab666a05cb6d3cd8aa1e02.	2017-10-20 08:25:41 +02:00
Tanguy Leroux	0b9acc5ace	Upgrade to Jackson 2.9.2 (#27032 ) Upgrade to Jackson 2.9.2 and also use a boolean `closed` flag to indicate that a FastStringReader instance is closed, so that length is still correctly reported after the reader is closed.	2017-10-19 15:15:02 +02:00
Martijn van Groningen	87c9b79b10	Return the _source of inner hit nested as is without wrapping it into its full path context Due to a change made in #26102 to make the nested source consistent with or without source filtering, the _source of a nested inner hit was always wrapped in the parent path. This turned out to be not ideal for users relying on the nested source, as it would require additional parsing on the client side. This change fixes this: the _source of nested inner hits is no longer wrapped by parent json objects, regardless of whether the _source is included as is or source filtering is used. Internally source filtering and highlighting rely on the fact that the _source of nested inner hits is accessible by its full field path, so in order to not break this, the conversion of the _source into its binary form is performed in FetchSourceSubPhase, after any potential source filtering is performed, to make sure the structure of the _source of the nested inner hit is consistent regardless of whether source filtering is performed. PR for #26944 Closes #26944	2017-10-19 12:04:56 +02:00
Alexander Kazakov	9a3a1cd1b7	Handle leniency for cross_fields type in multi_match query (#27045 )	2017-10-19 10:29:28 +02:00
Stephen Yeargin	8a05e5b92c	Fix typo in thrown exception in IndicesAliasesRequest (#27025 ) There is a typo in the exception thrown in `IndicesAliasesRequest`. This PR corrects the spelling and removes extraneous word.	2017-10-18 13:54:16 +00:00
Lee Hinman	78c54c4560	Balance shards for an index more evenly across multiple data paths (#26654 ) * Balance shards for an index more evenly across multiple data paths When a node has multiple data paths configured, and is assigned all of the shards for a particular index, it's possible now that all shards will be assigned to the same path (see #16763). This change keeps the same behavior around determining the "best" path for a shard based on space, however, it enforces limits for the number of shards on a path for an index from the single-node perspective. For example: Assume you had a node with 4 data paths, where `/path1` has a tremendously high amount of disk space available compared to the other paths. If you create an index with 5 primary shards, the previous behavior would be to assign all 5 shards to `/path1`. This change would enforce a limit of 2 shards to each data path for that particular node, so you would end up with the following distribution: - `/path1` - 2 shards (because it has the most usable space) - `/path2` - 1 shard - `/path3` - 1 shard - `/path4` - 1 shard Note, however, that this limit is only enforced at the local node level for simplicity in implementation, so if you had multiple nodes, the "limit" for the node is still 2, so assuming you had enough nodes that there was only 2 shards for this index assigned to this node, they would still both be assigned to `/path1`. * Switch from ObjectLongHashMap to regular HashMap * Remove unneeded Files.isDirectory check * Skip iterating directories when not necessary * Add message to assert * Implement different (better) ranking for node paths This is the method we discussed * Remove unused pathHasEnoughSpace method * Use findFirst instead of .get(0); * Update for master merge to fix compilation Settings.putArray -> Settings.putList	2017-10-17 05:49:24 -06:00
Jason Tedor	62bf3c11a9	Stop invoking non-existent syscall Today when getting ready to enter seccomp, we do some probes to ensure that we are really talking to seccomp, etc. One of these probes is pure paranoia. The paranoia was driven by a kernel bug (https://lkml.org/lkml/2014/7/20/222) that only impacted 32-bit x86 kernels wherein invoking a non-existent syscall was not returning ENOSYS (as it should). This probe causes problems though, for example in containers with syscall filters, invoking a non-existent syscall will lead to the process being sent SIGSYS and terminated. We do not need this paranoia, we do not support 32-bit, and our other probes give us enough of a defense to ensure that we are talking to seccomp (and we hardcode the seccomp syscall number for platforms that we support). Given that this probe offers us little value, but does cause problems in valid use-cases, this commit removes this paranoia. Relates #27016	2017-10-17 11:34:44 +02:00
Jason Tedor	3664ede9b5	Remove unnecessary exception for engine constructor The internal engine constructor declares a checked engine exception yet this constructor does not actually throw this exception. This commit removes this declaration from the internal engine constructor. Relates #27022	2017-10-16 10:17:37 -04:00
Simon Willnauer	8dda827ff4	Don't refresh on `_flush` `_force_merge` and `_upgrade` (#27000 ) Today all these API calls have a side effect of making documents visible to search requests. While this is sometimes desired it's an unnecessary side effect and now that we have an internal (engine-private) index reader (#26972) we artificially add a refresh call for bwc. This change removes this side effect in 7.0.	2017-10-16 10:16:35 +02:00
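
The overflow estimates quoted in commit 35984a616e (cumulative scroll time in microseconds) can be checked with a few lines of arithmetic. The following is only a sketch for verifying those figures, not code from the commit; the class name `ScrollTimeOverflow` and the helper `scrollsUntilOverflow` are hypothetical and were introduced purely for illustration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper class (not part of the repository) that checks the
// overflow arithmetic described in the commit message.
public class ScrollTimeOverflow {

    // How many scrolls of the given average duration can be summed into a
    // signed 64-bit counter before it overflows, when the counter is kept
    // in accumulatorUnit.
    static long scrollsUntilOverflow(long averageDuration, TimeUnit durationUnit, TimeUnit accumulatorUnit) {
        long perScroll = accumulatorUnit.convert(averageDuration, durationUnit);
        if (perScroll <= 0) {
            throw new IllegalArgumentException("average duration must be at least one accumulator unit");
        }
        return Long.MAX_VALUE / perScroll;
    }

    public static void main(String[] args) {
        // Ten-minute scrolls, summed in nanoseconds: roughly 15.4 million scrolls.
        long withNanos = scrollsUntilOverflow(10, TimeUnit.MINUTES, TimeUnit.NANOSECONDS);
        // The same scrolls, summed in microseconds: roughly 15.4 billion scrolls.
        long withMicros = scrollsUntilOverflow(10, TimeUnit.MINUTES, TimeUnit.MICROSECONDS);
        System.out.println("nanoseconds:  " + withNanos + " scrolls");
        System.out.println("microseconds: " + withMicros + " scrolls");
    }
}
```

Under these assumptions the nanosecond counter holds about 15.4 million ten-minute scrolls and the microsecond counter about a thousand times as many, which is in line with the "sixteen million" and "sixteen billion" figures in the commit message.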

8984 Commits
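
The per-path shard limit described in commit 78c54c4560 (5 primary shards over 4 data paths giving a limit of 2 shards per path) is consistent with a ceiling division. The sketch below assumes that rule; the commit message states only the resulting limit, so the formula, the class name `ShardsPerPathLimit`, and the method `maxShardsPerPath` are illustrative assumptions rather than the project's actual implementation.

```java
// Hypothetical illustration (not code from the repository) of a per-path
// shard limit computed as the ceiling of shards / data paths.
public class ShardsPerPathLimit {

    // Assumed rule: no data path may hold more than ceil(totalShards / dataPaths)
    // shards of a single index on this node.
    static int maxShardsPerPath(int totalShards, int dataPaths) {
        if (totalShards < 0 || dataPaths <= 0) {
            throw new IllegalArgumentException("need totalShards >= 0 and dataPaths > 0");
        }
        return (totalShards + dataPaths - 1) / dataPaths;
    }

    public static void main(String[] args) {
        // The example from the commit message: 5 shards, 4 data paths.
        System.out.println("limit per path: " + maxShardsPerPath(5, 4));
    }
}
```

With a limit of 2, the path with the most usable space takes two shards and the remaining three paths take one each, matching the distribution listed in the commit message.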