OpenSearch

Commit Graph

Author	SHA1	Message	Date
Simon Willnauer	bc7ec68e76	Add Cross Cluster Search support for scroll searches (#25094 ) To complete the cross cluster search capabilities for all search types and functions, this change adds cross cluster search support for scroll searches.	2017-06-13 17:22:49 +02:00
Sergey Galkin	1c95cbc4e8	Rollover max docs should only count primaries (#24977 ) max_doc condition for index rollover should use document count only from primary shards Fixes #24217	2017-06-13 14:30:46 +02:00
Simon Willnauer	01d7c217f6	Add remote cluster infrastructure to fetch discovery nodes. (#25123 ) In order to add scroll support for cross cluster search we need to resolve the nodes encoded in the scroll ID to send requests to the corresponding nodes. This change adds the low level connection infrastructure that also ensures that connections are re-established if the cluster is disconnected due to a network failure or restarts. Relates to #25094	2017-06-13 14:23:56 +02:00
Simon Willnauer	186c16ea41	Ensure pending transport handlers are invoked for all channel failures (#25150 ) Today if a channel gets closed due to a disconnect we notify the response handler that the connection is closed and the node is disconnected. Unfortunately this is not a complete solution since it only works for published connections. Connections that are unpublished, i.e. for discovery, can indefinitely hang since we never invoke their handlers when we get a failure while a user is waiting for the response. This change adds connection tracking to TcpTransport that ensures we are notifying the corresponding connection if there is a failure on a channel.	2017-06-13 09:37:05 +02:00
Lee Hinman	ee1113c902	Tweak AggregatorBase.addRequestCircuitBreakerBytes This modifies a method Mark added to the AggregatorBase that allows aggregations to add additional memory tracking for data structures used during execution. If an aggregation would like to reclaim circuit breaker reserved bytes by adding a negative number, `addWithoutBreaking` should be used instead of `addEstimateBytesAndMaybeBreak`. Resolves #24511	2017-06-12 12:55:50 -06:00
Jason Tedor	bb66f3b76b	Explicitly reject duplicate data paths Duplicate data paths already fail to work because we would attempt to take out a node lock on the directory a second time which will fail after the first lock attempt succeeds. However, how this failure manifests is not apparent at all and is quite difficult to debug. Instead, we should explicitly reject duplicate data paths to make the failure cause more obvious. Relates #25178	2017-06-12 12:55:19 -04:00
Jason Tedor	982900eabf	Do not swallow node lock failed exception When attempting to obtain the node lock, if an exception is thrown it is not logged. This makes debugging difficult. This commit causes such an exception to be logged. Relates #25176	2017-06-12 11:42:45 -04:00
markharwood	518cda6637	Aggregations bug: Significant_text fails on arrays of text. (#25030 ) * Aggregations bug: Significant_text fails on arrays of text. The set of previously-seen tokens in a doc was allocated per-JSON-field string value rather than once per JSON document meaning the number of docs containing a term could be over-counted leading to exceptions from the checks in significance heuristics. Added unit test for this scenario Closes #25029	2017-06-12 14:02:54 +01:00
Jim Ferenczi	7ab3d5d04a	Speed up sorted scroll when the index sort matches the search sort (#25138 ) Sorted scroll search can use early termination when the index sort matches the scroll search sort. The optimization can be done after the first query (which still needs to collect all documents) by applying a query that only matches documents that are greater than the last doc retrieved in the previous request. Since the index is sorted, retrieving the list of documents that are greater than the last doc only requires a binary search on each segment. This change introduces this new query called `SortedSearchAfterDocQuery` and applies it when possible. Scrolls with this optimization will search all documents on the first request and then will early terminate each segment after $size docs for any subsequent requests. Relates #6720	2017-06-12 09:33:30 +02:00
Boaz Leskes	f34136eda4	TranslogTests.testWithRandomException ignored a possible simulated OOM when trimming files	2017-06-12 08:32:55 +02:00
Boaz Leskes	cfb5f6a5a6	Adapt TranslogTests.testWithRandomException to checkpoint syncing on trim #25005 changed the translog dynamic to fsync the checkpoint before trimming a file. This changed the dynamics of potential failure modes which requires a change to testWithRandomException - it's now possible that we had an exception but the translog was trimmed. Closes #25133	2017-06-11 23:17:10 +02:00
Jason Tedor	dcf57f296e	Fix get mappings HEAD requests Get mappings HEAD requests incorrectly return a content-length header of 0. This commit addresses this by removing the special handling for get mappings HEAD requests, and just relying on the general mechanism that exists for handling HEAD requests in the REST layer. Relates #23192	2017-06-11 14:58:56 -04:00
Boaz Leskes	9b8754e4c2	TranslogTests#commit didn't allow for a concurrent closing of a view The view closing will trim unneeded files but there is a small window where they may still be around.	2017-06-11 19:09:01 +02:00
Jason Tedor	7182577904	Fix handling of exceptions thrown on HEAD requests Today when an exception is thrown handling a HEAD request, the body is swallowed before the channel has a chance to see it. Yet, the channel is where we compute the content length that would be returned as a header in the response. This is a violation of the HTTP specification. This commit addresses the issue. To address this issue, we remove the special handling in bytes rest response for HEAD requests when an exception is thrown. Instead, we let the upstream channel handle the special case, as we already do today for the non-exceptional case. Relates #25172	2017-06-10 23:44:18 -04:00
Jason Tedor	5108fa7529	Remove unneeded weak reference from prefix logger We have a custom logger implementation known as a prefix logger that is used to write every message by the logger with a given prefix. This is useful for node-level, index-level, and shard-level messages where we want to log the node name, index name, and shard ID, respectively, if possible. The mechanism that we employ is that of a marker. Log4j has a built-in facility for managing these markers, but it's effectively a memory leak because these markers are held in a map and can never be released. This is problematic for us since indices and shards do not necessarily have infinite life spans and so on a node where there are many indices being created and destroyed, this infinite lifespan can be a problem indeed. To solve this, we use our own cache of markers. This is necessary to prevent too many instances of the marker for the same prefix from being created (just think of all the shard-level components that exist in the system), and to work around the effective leak in Log4j. These markers are stored as weak references in a weak hash map. It is these weak references that are unneeded. When a key is removed from a weak hash map, the corresponding entry is placed on a reference queue that is eventually cleared. This commit simplifies prefix logger by removing this unnecessary weak reference wrapper. Relates #22460	2017-06-10 13:20:45 -04:00
Chris Earle	af7b479e12	"shard started" should show index and shard ID (#25157 ) When the cluster state is updated with Shard Started entries, it simply adds "shard-started" as the source of the change. This adds the index name and shard ID so that we can see who/what is spamming the changes when the index creation step has already left the cluster state.	2017-06-09 14:52:42 -04:00
Boaz Leskes	b8fef3309c	await fix testWithRandomException	2017-06-09 20:31:39 +02:00
Jason Tedor	8a45c3105f	Change BWC versions on create index response This commit changes the BWC versions on the create index response now that the index name in the response is supported since 5.6.0. Relates #25139	2017-06-09 13:52:08 -04:00
Sergey Novikov	7c8657df0e	Return the index name on a create index response This commit modifies the create index response so that it includes the index name. Relates #25139	2017-06-09 13:47:47 -04:00
Koen De Groote	64888f6f01	Correctly format arrays in output There are a few places where arrays are output in messages yet the output would merely use the default toString implementation rather than actually putting the content of the array in the message. This commit fixes the issue. Relates #24340	2017-06-09 11:45:07 -04:00
Christoph Büscher	823cbb437b	[Test] Extending parsing checks for SearchResponse (#25148 ) This change extends the tests and parsing of SearchResponse to make sure we can skip additional fields the parser doesn't know for forward compatibility reasons.	2017-06-09 17:33:44 +02:00
Ryan Ernst	a03b6c2fa5	Scripting: Change keys for inline/stored scripts to source/id (#25127 ) This commit adds back "id" as the key within a script to specify a stored script (which with file scripts now gone is no longer ambiguous). It also adds "source" as a replacement for "code". This is in an attempt to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.	2017-06-09 08:29:25 -07:00
Martijn van Groningen	c7ae27d57f	nested: In case of a single type the _id field should be added to the nested document instead of _uid field. When `index.mapping.single_type` is `true` the `_uid` field is not used and instead `_id` field is used. Prior to this change nested documents would in this case still use the `_uid` field to mark to what root document they belong to. In case of deleting documents this could lead to only the root Lucene document being deleted and not the nested Lucene documents. This broke the docid block ordering the block join relies on in order to work correctly, thus causing the `nested` query, `nested` aggregation, nested sorting and nested inner hits to either fail or yield incorrect results. This bug only manifests in the 6.0.0-ALPHA2 release and snapshots (5.5.0-SNAPSHOT, 5.6.0-SNAPSHOT, 6.0.0-SNAPSHOT).	2017-06-09 14:57:11 +02:00
Adrien Grand	87d19b21c7	`type` and `id` are lost upon serialization of `Translog.Delete`. (#24586 ) This was introduced in #24460: the constructor of `Translog.Delete` that takes a `StreamInput` does not set the type and id. To make it a bit more robust, I made fields final so that forgetting to set them would make the compiler complain.	2017-06-09 14:56:23 +02:00
Sergey Galkin	dc5aa993e0	Fix NPE in token_count datatype with null value (#25046 ) Fixes an issue with the handling of null values for the token_count data type. Closes #24928	2017-06-09 14:13:05 +02:00
Jim Ferenczi	8250aa4267	Remove the postings highlighter and make unified the default highlighter choice (#25028 ) This change removes the `postings` highlighter. This highlighter has been removed from Lucene master (7.x) because it behaves exactly like the `unified` highlighter when index_options is set to `offsets`: https://issues.apache.org/jira/browse/LUCENE-7815 It also makes the `unified` highlighter the default choice for highlighting a field (if `type` is not provided). The strategy used internally by this highlighter remains the same as before: it checks `term_vectors` first, then `postings` and ultimately it re-analyzes the text. Ultimately it rewrites the docs so that the options that the `unified` highlighter cannot handle are clearly marked as such. There are a few features that the `unified` highlighter is not able to handle, which is why the other highlighters (`plain` and `fvh`) are still available. I'll open separate issues for these features and we'll deprecate the `fvh` and `plain` highlighters when full support for these features has been added to the `unified`.	2017-06-09 14:09:57 +02:00
Christoph Büscher	eca4f24b16	[Test] Adding test for parsing SearchShardFailure leniently (#25144 ) This change extends the tests and parsing of SearchShardFailure to make sure we can skip fields the parser doesn't know for forward compatibility reasons.	2017-06-09 12:46:09 +02:00
Christoph Büscher	79057b1c61	[Test] Extending checks for Suggestion parsing (#25132 ) When parsing responses we should be ignoring any new unknown fields or inner objects in most cases to be forward compatible with changes in core on the client side. This change adds test for this for Suggestions and its various subclasses to check if we are able to ignore new fields and objects in the xContent.	2017-06-09 10:11:08 +02:00
Tal Levy	340909582f	remove Ingest's Internal Template Service (#25085 ) Ingest was using its own wrapper around TemplateScripts and the ScriptService. This commit removes that abstraction	2017-06-08 15:24:03 -07:00
Lee Hinman	119f8ed9f0	Correctly enable _all for older 5.x indices When we disabled `_all` by default for indices created in 6.0, we missed adding a layer that would handle the situation where `_all` was not enabled in 5.x and then the cluster was updated to 6.0, this means that when the cluster was updated the `_all` field would be disabled for 5.x indices and field values would not be added to the `_all` field. This adds a compatibility layer for 5.x indices where we treat the default enabled value for the `_all` field to be `true` if unset on 5.x indices. Resolves #25068	2017-06-08 14:37:44 -06:00
Jason Tedor	1708f1773b	Mark Log4j API dependency as non-optional The Log4j dependency is separated into two artifacts, the API and the core implementation. This is to enable replacing Log4j on the backend through the SLF4J bridge with another logging implementation. For this reason, the dependencies are marked as optional. This causes confusion amongst users: to use the bridge, the API should be non-optional since it is needed for the bridge to function correctly. While they could pull it into their application directly, it would be clearer if we simply marked this dependency as non-optional. Note that this does not mean that users have to use Log4j for logging in their application, so we are not marking core as required, it only clarifies what they need to be able to plug in a different logging implementation. Relates #25136	2017-06-08 16:09:34 -04:00
Lee Hinman	050b7cd0f9	Include empty mappings in GET /{index}/_mappings requests (#25118 ) Previously this would output: ``` GET /test-1/_mappings { } ``` And after this change: ``` GET /test-1/_mappings { "test-1": { "mappings": {} } } ``` To bring parity back to the REST output after #24723. Relates to #25090	2017-06-08 10:57:04 -06:00
Lee Hinman	5b2ab96364	Return index name and empty map for /{index}/_alias with no aliases Previously in #24723 we changed the `_alias` API to not go through the `RestGetIndicesAction` endpoint, instead creating a `RestGetAliasesAction` that did the same thing. This changes the formatting so that it matches the old formatting of the endpoint, before: ``` GET /test-1/_alias { } ``` And after this change: ``` GET /test-1/_alias { "test-1": { "aliases": {} } } ``` This is related to #25090	2017-06-08 10:03:03 -06:00
Eli Skeggs	ee0e921643	Fix typo in GeoUtils#isValidLongitude (#25121 ) GeoUtils#isValidLongitude is inconsistent with GeoUtils#isValidLatitude. Neither technically needs the isInfinite() check because they then compare against min and max values.	2017-06-08 17:23:22 +02:00
Christoph Büscher	a0afa917ac	[Tests] Check QueryProfileShardResult parser robustness for new fields (#25130 ) When parsing responses we should be ignoring any new unknown fields or inner objects in most cases to be forward compatible with changes in core on the client side. This change adds test for this for QueryProfileShardResult and nested substructures and changes the parsing code where necessary to be able to ignore new fields and objects in the xContent.	2017-06-08 16:40:00 +02:00
Nik Everett	4a8c09c5f1	Make randomVersionBetween work with unreleased versions (#25042 ) Test: randomVersionBetween works with unreleased Modifies randomVersionBetween so that it works with unreleased versions. This should make switching a version from unreleased to released much simpler.	2017-06-08 10:19:06 -04:00
Yannick Welsch	cd57395c98	Use correct primary term for replicating NOOPs (#25128 ) NOOPs should be, same as for indexing operations, written on the replica using the original operation term instead of the current term of the replica.	2017-06-08 14:20:26 +02:00
Martijn van Groningen	326fa33d4e	fielddata: Binary script doc values should make a deep copy of the BytesRef before populating it in the values array. Added common base class for ScriptDocValues.Strings and ScriptDocValues.BytesRefs now that these classes are very similar. Also cleaned up the BinaryDVFieldDataTests: * Use junit assertions instead of hamcrest * Use BytesRef directly instead of byte[] Closes #24785	2017-06-08 13:20:35 +02:00
Jim Ferenczi	eeac4b9721	Fix Fast Vector Highlighter NPE on match phrase prefix (#25116 ) The FVH fails with an NPE when a match phrase prefix is rewritten in an empty phrase query. This change makes sure that the multi match query rewrites to a MatchNoDocsQuery (instead of an empty phrase query) when there is a single term and that term does not expand to any term in the index. Fixes #25088	2017-06-08 12:27:11 +02:00
Jim Ferenczi	36a5cf8f35	Automatically early terminate search query based on index sorting (#24864 ) This commit refactors the query phase in order to be able to automatically detect queries that can be early terminated. If the index sort matches the query sort, the top docs collection is early terminated on each segment and the computing of the total number of hits that match the query is delegated to a simple TotalHitCountCollector. This change also adds a new parameter to the search request called `track_total_hits`. It indicates if the total number of hits that match the query should be tracked. If false, queries sorted by the index sort will not try to compute this information and will limit the collection to the first N documents per segment. Aggregations are not impacted and will continue to see every document even when the index sort matches the query sort and `track_total_hits` is false. Relates #6720	2017-06-08 12:10:46 +02:00
Jim Ferenczi	21a57c1494	Always use DisjunctionMaxQuery to build cross fields disjunction (#25115 ) This commit modifies query_string, simple_query_string and multi_match queries to always use a DisjunctionMaxQuery when a disjunction over multiple fields is built. The tiebreaker is set to 1 in order to behave like the boolean query in terms of scoring. The removal of the coord factor in Lucene 7 made this change mandatory to correctly handle minimum_should_match. Closes #23966	2017-06-08 11:18:17 +02:00
Simon Willnauer	d6d416cacc	Break out clear scroll logic from TransportClearScrollAction (#25125 ) This change extracts the main logic from `TransportClearScrollAction` into a new class `ClearScrollController` and adds a corresponding unit test. Relates to #25094	2017-06-08 11:13:08 +02:00
Simon Willnauer	bdc3a16fa4	Fix naming in GroupedActionListener GroupedActionListener still had some members named from its specialization before it was factored out into a general purpose class.	2017-06-08 10:21:15 +02:00
Adrien Grand	a8ea2f0df4	Leverage scorerSupplier when applicable. (#25109 ) The `scorerSupplier` API allows giving a hint to queries in order to let them know that they will be consumed in a random-access fashion. We should use this for aggregations, function_score and matched queries.	2017-06-08 10:19:38 +02:00
Boaz Leskes	087f182481	Translog file recovery should not rely on lucene commits (#25005 ) When we open a translog, we rely on the `translog.ckp` file to tell us what the maximum generation file should be and on the information stored in the last lucene commit to know the first file we need to recover. This requires coordination and is currently subject to a race condition: if a node dies after a lucene commit is made but before we remove the translog generations that were unneeded by it, the next time we open the translog we will ignore those files and never delete them (I have added tests for this). This PR changes the approach to have the translog store both of those numbers in the `translog.ckp`. This means it's more self contained and easier to control. This change also decouples the translog recovery logic from the specific commit we're opening. This prepares the ground to fully utilize the deletion policy introduced in #24950 and store more translog data that's needed for Lucene, keep multiple lucene commits around and be free to recover from any of them.	2017-06-08 09:21:28 +02:00
Simon Willnauer	ce24331d1f	Add helper methods to TransportActionProxy to identify proxy actions and requests (#25124 ) Downstream users of our network intercept infrastructure need this information which is hidden due to member and class visibility.	2017-06-08 09:07:22 +02:00
Jack Conradson	d187fa78fd	Generate Painless Factory for Creating Script Instances (#25120 )	2017-06-07 16:06:11 -07:00
Christoph Büscher	9e741cd13d	Tests: Add ability to generate random new fields for xContent parsing test (#23437 ) For the response parsing we want to be lenient when it comes to parsing new xContent fields. In order to ensure this in our testing, this change adds a utility method to XContentTestUtils that takes xContent bytes representation as input and recursively adds a random field on each object level. Sometimes we also want to exclude a whole subtree from this treatment (e.g. skipping "_source"), other times an element (e.g. "fields", "highlight" in SearchHit) can have arbitrarily named objects. Those cases can be specified as exceptions.	2017-06-07 21:01:20 +02:00
Jim Ferenczi	68f1d4df5a	bump the Lucene version for Version 5.5 and 5.6 after the upgrade to Lucene 6.6.0	2017-06-07 19:32:13 +02:00
Ryan Ernst	2057bbc6c5	Scripting: Remove unnecessary intermediate script compilation methods on QueryShardContext (#25093 ) This commit removes wrapper methods on QueryShardContext used to compile scripts. Instead, the script service is made accessible in the context, and calls to compile can be made directly. This will ease transition to each of those location becoming their own context, since they would no longer be able to expect the same script class type.	2017-06-07 08:24:18 -07:00
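
The array-formatting problem described in commit 64888f6f01 ("Correctly format arrays in output") can be illustrated with a minimal sketch. The class and method names below are hypothetical and are not taken from the commit itself; the sketch only shows the general Java behavior the message refers to, where concatenating an array into a message uses the default `toString` rather than the array's contents.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of the code base listed above.
public class ArrayMessageDemo {

    // Bug pattern: string concatenation calls Object#toString on the array,
    // which yields a type tag and identity hash instead of the elements.
    public static String describeNaively(String[] paths) {
        return "data paths: " + paths;
    }

    // Fix pattern: Arrays.toString renders the elements themselves.
    public static String describe(String[] paths) {
        return "data paths: " + Arrays.toString(paths);
    }

    public static void main(String[] args) {
        String[] paths = {"/var/data/a", "/var/data/b"};
        System.out.println(describeNaively(paths)); // element contents are not shown
        System.out.println(describe(paths));        // data paths: [/var/data/a, /var/data/b]
    }
}
```

For nested arrays, `Arrays.deepToString` would be the analogous call; whether the commit needed it is not stated in the message.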


8358 Commits