OpenSearch

Commit Graph

Author	SHA1	Message	Date
Adrien Grand	4918924fae	Remove legacy mapping code. (#29224 ) Some features have been deprecated since `6.0` like the `_parent` field or the ability to have multiple types per index. This allows to remove quite some code, which in-turn will hopefully make it easier to proceed with the removal of types.	2018-04-11 09:41:37 +02:00
Martijn van Groningen	182cf11f37	Fixed bug when non percolator docs end up in the search hits. In the case that a document with a percolator field is matched when using the `percolate` query then the fetch phase can fail due to the fact that the percolator can't resolve any query from that document. Closes #29429	2018-04-10 13:33:31 +02:00
Martijn van Groningen	f4395c0c94	Fixed an msm accounting error that can occur during analyzing a percolator query. In case of a disjunction query with both range and term based clauses and msm specified, the query analyzer needs to also reduce the msm if a range based clause for the same field is encountered. This did not happen. Instead of fixing this bug the logic has been simplified to just set a percolator query's msm to 1 if a disjunction contains range clauses and msm on disjunction has been specified. The logic would otherwise just get too complex and the performance gain isn't that much for this kind of percolator queries. In case a percolator query has clauses that have duplicate terms or ranges then for disjunction clauses with a minimum should match the query extraction of the clause with the lowest msm should be used and for conjunction queries whose query extractions have duplicate terms/ranges the msm should be ignored. If this is not done then percolator queries that should match never match. Example percolator query: value1 OR value2 OR value2 OR value3 OR value3 OR value3 OR value4 OR value5 (msm set to 3). In the above example query the extracted msm would be 3. Example document1: value1 value2 value3. With the msm and extracted terms this would match and is expected behaviour. Example document2: value3. This document should match too (value3 appears in 3 clauses), but with msm set to 3 and the fact that only distinct values are indexed in the extracted terms field this document would not match. Also added another random duel test. Closes #29393	2018-04-10 07:25:12 +02:00
Adrien Grand	0f00277851	Simplify analysis of `bool` queries. (#29430 ) This change tries to simplify the extraction logic of boolean queries by concentrating the logic into two methods: one that merges results for conjunctions, and another one for disjunctions. Other concerns, like the impact of prohibited clauses or how an `UnsupportedQueryException` should be treated are applied on top of those two methods. This is mostly a code reorganization, it doesn't change the result of query extraction except in the case that a query both has required clauses and a minimum number of `SHOULD` clauses that is greater than 1, which we now rewrite into a pure conjunction. For instance `(+A B C)~1` is rewritten into `(+A +(B C))` prior to extraction.	2018-04-09 16:34:45 +02:00
Adrien Grand	85f5382a3c	Fix more query extraction bugs. (#29388 ) I found the following bugs: - The 6.0 logic for conjunctions didn't work when there were only `match_all` queries in MUST/FILTER clauses as they didn't propagate the `matchAllDocs` flag. - Some queries still had the same issue as `BooleanQuery` used to have with duplicate terms (see #28353), eg. `MultiPhraseQuery`. Closes #29376	2018-04-06 10:44:34 +02:00
Adrien Grand	c21057b3a2	Fix QueryAnalyzerTests. Closes #29363	2018-04-04 12:48:42 +02:00
Jason Tedor	a19fd5636b	Add awaits fix for a query analyzer test The test QueryAnalyzerTests#testExactMatch_booleanQuery is failing since `8cdd950056`. This commit adds an awaits fix for it until it can be addressed.	2018-04-04 05:40:13 -04:00
Adrien Grand	8cdd950056	Fix some query extraction bugs. (#29283 ) While playing with the percolator I found two bugs: - Sometimes we set a min_should_match that is greater than the number of extractions. While this doesn't cause direct trouble, it does when the query is nested into a boolean query and the boolean query tries to compute the min_should_match for the entire query based on its own min_should_match and those of the sub queries. So I changed the code to throw an exception when min_should_match is greater than the number of extractions. - Boolean queries claim matches are verified when in fact they shouldn't. This is due to the fact that boolean queries assume that they are verified if all sub clauses are verified but things are more complex than that, eg. conjunctions that are nested in a disjunction or disjunctions that are nested in a conjunction can generally not be verified without running the query.	2018-04-03 16:44:26 +02:00
Lee Hinman	8e8fdc4f0e	Decouple XContentBuilder from BytesReference (#28972 ) * Decouple XContentBuilder from BytesReference This commit removes all mentions of `BytesReference` from `XContentBuilder`. This is needed so that we can completely decouple the XContent code and move it into its own dependency. While this change appears large, it is due to two main changes, moving `.bytes()` and `.string()` out of XContentBuilder itself into static methods `BytesReference.bytes` and `Strings.toString` respectively. The rest of the change is code reacting to these changes (the majority of it in tests). Relates to #28504	2018-03-14 13:47:57 -06:00
Martijn van Groningen	beb22d89c8	percolator: Take `matchAllDocs` and `verified` of the sub result into account when analyzing a function_score query. Before the `matchAllDocs` was ignored and this could lead to percolator queries not matching when the inner query was a match_all query and min_score was specified. Before when `verified` was not taken into account if the function_score query wrapped an unverified query this could lead to matching percolator queries that shouldn't match at all.	2018-03-09 07:16:21 +01:00
Martijn van Groningen	bcfb7ab591	Improved percolator's random candidate query duel test and fixed bugs that were exposed by this: * Duplicate query leaves were not detected in a multi level boolean query * Tracking fields for numeric range queries did not work properly. * The sorting that was used to find the less restrictive clauses in a disjunction query did not work either.	2018-03-08 11:39:03 +01:00
Martijn van Groningen	ecb1d07d00	percolator: remove deprecated map_unmapped_fields_as_string setting	2018-02-01 11:11:22 +01:00
Martijn van Groningen	9bada306dc	Improved percolator candidate query tests.	2018-02-01 07:43:03 +01:00
Martijn van Groningen	204f4022c2	percolator: Do not take duplicate query extractions into account for minimum_should_match attribute. If a percolator query contains duplicate query clauses somewhere in the query tree, then when these clauses are extracted they should not affect the msm. This can lead to a percolator query that should be a valid match not becoming a candidate match, because at query time, the msm that is being used by the CoveringQuery would never match with the msm used at index time. Closes #28315	2018-01-30 07:25:33 +01:00
Adrien Grand	700d9ecc95	Remove the `update_all_types` option. (#28288 ) This option is not useful in 7.x since no indices may have more than one type anymore.	2018-01-22 12:03:07 +01:00
Martijn van Groningen	73f6857dff	test: ensure we end up with a single segment Closes #28127	2018-01-10 15:14:26 +01:00
Martijn van Groningen	e9160fc014	percolator: also extract match_all queries. I've seen several cases where match_all queries were being used inside percolator queries, because these queries were generated by other systems. Extracting these queries will allow the percolator at query time in a filter context to skip over these queries without parsing or validating that these queries actually match with the document being percolated.	2017-12-15 08:50:29 +01:00
Adrien Grand	1b660821a2	Allow `_doc` as a type. (#27816 ) Allowing `_doc` as a type will enable users to make the transition to 7.0 smoother since the index APIs will be `PUT index/_doc/id` and `POST index/_doc`. This also moves most of the documentation to `_doc` as a type name. Closes #27750 Closes #27751	2017-12-14 17:47:53 +01:00
Adrien Grand	996990ad1f	Upgrade to lucene-7.2.0-snapshot-8c94404. (#27496 ) The main highlight of this new snapshot is that it introduces the opportunity for queries to opt out of caching. In case a query opts out of caching, not only will it never be cached, but also no compound query that wraps it will be cached.	2017-11-28 14:52:42 +01:00
Martijn van Groningen	4ab638b71d	percolator: Avoid TooManyClauses exception if number of terms / ranges is exactly equal to 1024. The logic whether to use CoveringQuery was in two places which is why this bug snuck in.	2017-11-27 08:55:11 +01:00
Martijn van Groningen	1bd31e9b53	percolator: fixed issue where in indices created before 6.1 if minimum should match has been specified on a disjunction, the query would be marked as verified candidate match. This is wrong as it can only be marked as verified candidate match on indices created on or after 6.1, due to the use of the CoveringQuery.	2017-11-10 12:02:33 +01:00
Martijn van Groningen	b4048b4e7f	Use CoveringQuery to select percolate candidate matches and extract all clauses from a conjunction query. When clauses from a conjunction are extracted the number of clauses is also stored in an internal doc values field (minimum_should_match field). This field is used by the CoveringQuery and allows the percolator to reduce the number of false positives when selecting candidate matches and in certain cases be absolutely sure that a conjunction candidate match will match and then skip MemoryIndex validation. This can greatly improve performance. Before this change only a single clause was extracted from a conjunction query. The percolator tried to extract the clause that was rarest (based on term length) so that fewer candidate queries would be selected in the first place. However, with this method there is still a very high chance that candidate query matches are false positives. This change also removes the influencing query extraction added via #26081 as this is no longer needed because now all conjunction clauses are extracted. https://www.elastic.co/guide/en/elasticsearch/reference/6.x/percolator.html#_influencing_query_extraction Closes #26307	2017-11-10 07:44:42 +01:00
Tanguy Leroux	6658ff0fd6	Don't detect source's XContentType in DocumentParser.parseDocument() (#26880 ) DocumentParser.parseDocument() auto detects the XContentType of the document to parse, but this information is already provided by SourceToParse.	2017-10-10 15:31:56 +02:00
Martijn van Groningen	805437b8bc	percolator: Also support query extraction for queries wrapped inside an ESToParentBlockJoinQuery	2017-09-28 09:28:50 +02:00
Adrien Grand	1adee8b5a8	Fix the MapperFieldType.rangeQuery API. (#26552 ) RangeQueryBuilder needs to perform too many `instanceof` checks in order to check for `date` or `range` fields in order to know what it should do with the shape relation, time zone and date format. This commit adds those 3 parameters to the `rangeQuery` factory method so that those instanceof checks are not necessary anymore.	2017-09-11 11:02:05 +02:00
Martijn van Groningen	b391425da1	Added support to the percolate query to percolate multiple documents The percolator will add a `_percolator_document_slot` field to all percolator hits to indicate with what document it has matched. This number matches with the order in which the documents have been specified in the percolate query. Also improved the support for multiple percolate queries in a search request.	2017-09-08 17:28:39 +02:00
Martijn van Groningen	77bbe99102	Fix two unreleased percolator query analyze bugs * If in a range query upper is smaller than lower then ignore the range query * If two empty range extractions are compared don't fail with NoSuchElementException	2017-09-06 06:47:01 +02:00
Martijn van Groningen	2ad3608245	percolator: handle point queries with 2 or more dimensions correctly	2017-09-06 06:36:47 +02:00
Martijn van Groningen	a4d5c6418e	percolator: Rename map_unmapped_fields_as_string setting to map_unmapped_fields_as_text. The `index.percolator.map_unmapped_fields_as_text` is a better name, because unmapped fields are mapped to a text field with default settings and string is no longer a field type (it is either keyword or text).	2017-09-04 14:12:44 +02:00
Yannick Welsch	01f6851691	Serialize and expose timeout of acknowledged requests in REST layer (#26189 ) Due to the weird way of structuring the serialization code in AcknowledgedRequest, many request types forgot to properly serialize the request timeout, for example "index deletion", "index rollover", "index shrink", "putting pipeline", and other requests. This means that if those requests were not directly sent to the master node, the acknowledgement timeout information would be lost (and the default used instead). Some requests also don't properly expose the timeout mechanism in the REST layer, such as put / delete stored script. This commit fixes all that.	2017-08-16 07:43:05 +08:00
Martijn van Groningen	636e85e5b7	percolator: Hint what clauses are important in a conjunction query based on fields. The percolator field mapper doesn't need to extract all terms and ranges from a bool query with must or filter clauses. In order to help the default extraction behavior, boost fields can be configured, so that fields that are known for not being selective enough can be ignored in favor of other fields, or clauses with specific fields can forcefully take precedence over other clauses. This can help selecting clauses for fields that don't match with a lot of percolator queries over other clauses and thus improve the performance of the percolate query. For example a status like field is something that should be configured as an ignore field. Queries on this field tend to match with more documents and so if clauses for this field get selected as best clause then that isn't very helpful for the candidate query that the percolate query generates to filter out percolator queries that are likely not going to match.	2017-08-11 15:32:01 +02:00
Martijn van Groningen	8285a0f399	percolator: Use correct version for bwc checking now that the change has been backported to 6.0 branch	2017-08-09 13:49:20 +02:00
Martijn van Groningen	11ce6b91a4	test: Do not use random index writer as test expects a single segment check against right version	2017-08-07 09:40:54 +02:00
Martijn van Groningen	53dd8afaea	fix test	2017-08-02 11:25:03 +02:00
Martijn van Groningen	5f36bdfda0	percolator: Also support IndexOrDocValuesQuery Otherwise ranges are never extracted properly.	2017-08-01 09:44:42 +02:00
Martijn van Groningen	7c3735bdc4	percolator: Store the QueryBuilder's Writable representation instead of its XContent representation. The Writable representation is less heavy to parse and that will benefit percolate performance and throughput. The query builder's binary format now has the same bwc guarantees as the xcontent format. Added a qa test that verifies that percolator queries written in older versions are still readable by the current version.	2017-07-28 12:24:10 +02:00
Jim Ferenczi	562c3744ca	Merge FunctionScoreQuery and FiltersFunctionScoreQuery (#25889 ) This change merges the functionality of the FiltersFunctionScoreQuery in the FunctionScoreQuery. It also ensures that an exception is thrown when the computed score is equal to Float.NaN or Float.NEGATIVE_INFINITY. These scores are invalid for TopDocsCollectors that rely on score comparison. Fixes #15709 Fixes #23628	2017-07-28 09:22:20 +02:00
Martijn van Groningen	edad7b4737	Add support for selecting percolator query candidate matches containing range queries. Extracts ranges from range queries on byte, short, integer, long, half_float, scaled_float, float, double, date and ip fields. byte, short, integer and date ranges are normalized to Lucene's LongRange. half_float and float are normalized to Lucene's DoubleRange. When extracting range queries, the QueryAnalyzer computes the width of the range. This width is used to determine what range should be preferred in a conjunction query. The QueryAnalyzer prefers the smaller ranges, because these ranges tend to match with fewer documents. Closes #21040	2017-07-26 21:25:45 +02:00
Simon Willnauer	634ce90dc0	Respect cluster alias in `_index` aggs and queries (#25885 ) Today when we aggregate on the `_index` field the cross cluster search alias is not taken into account. Neither is it respected when we search on the field. This change adds support for cluster alias when the cluster alias is present on the `_index` field. Closes #25606	2017-07-26 09:16:52 +02:00
Simon Willnauer	0e3ad522a2	Rewrite search requests on the coordinating nodes (#25814 ) This change rewrites search requests on the coordinating node before we send requests to the individual shards. This will reduce the rewrite load and object creation for each rewrite on the executing nodes and will fetch resources only once instead of N times once per shard for queries like `terms` query with index lookups. (among percolator and geo-shape) Relates to #25791	2017-07-21 09:38:38 +02:00
Simon Willnauer	5e629cfba0	Ensure query resources are fetched asynchronously during rewrite (#25791 ) The `QueryRewriteContext` used to provide a client object that can be used to fetch geo-shapes, terms or documents for percolation. Unfortunately all client calls used to be blocking calls which can have significant impact on the rewrite phase since it occupies an entire search thread until the resource is received. In the case that the index the resource is fetched from isn't on the local node this can have significant impact on query throughput. Note: this doesn't fix MLT since it fetches stuff in doQuery which is a different beast. Yet, it is a huge step in the right direction	2017-07-20 15:37:50 +02:00
Adrien Grand	f1ff7f2454	Require a field when a `seed` is provided to the `random_score` function. (#25594 ) We currently use fielddata on the `_id` field which is trappy, especially as we do it implicitly. This changes the `random_score` function to use doc ids when no seed is provided and to suggest a field when a seed is provided. For now the change only emits a deprecation warning when no field is supplied but this should be replaced by a strict check on 7.0. Closes #25240	2017-07-19 14:11:15 +02:00
Christoph Büscher	927111c91d	Remove QueryParseContext from parsing QueryBuilders (#25448 ) Currently QueryParseContext is only a thin wrapper around an XContentParser that adds little functionality of its own. It provides helpers for long deprecated field names which can be removed and two helper methods that can be made static and moved to other classes. This is a first step in helping to remove QueryParseContext entirely.	2017-06-29 17:10:20 +02:00
Martijn van Groningen	c85ac402b0	test: Make many percolator integration tests real integration tests	2017-06-27 17:44:30 +02:00
Martijn van Groningen	343e7571b9	test: single type defaults to true since alpha1 and not alpha3 Closes #25354	2017-06-22 16:31:15 +02:00
Adrien Grand	44e9c0b947	Upgrade to lucene-7.0.0-snapshot-ad2cb77. (#25349 ) Most notable changes: - better update concurrency: LUCENE-7868 - TopDocs.totalHits is now a long: LUCENE-7872 - QueryBuilder does not remove the boolean query around multi-term synonyms: LUCENE-7878 - removal of Fields: LUCENE-7500 For the `TopDocs.totalHits` change, this PR relies on the fact that the encoding of vInts and vLongs are compatible: you can write and read with any of them as long as the value can be represented by a positive int.	2017-06-22 12:35:33 +02:00
Martijn van Groningen	a977569085	percolator: Deprecate `document_type` parameter. The `document_type` parameter is no longer required to be specified, because by default from 6.0 only a single type is allowed. (`index.mapping.single_type` defaults to `true`)	2017-06-22 09:55:06 +02:00
Ryan Ernst	a03b6c2fa5	Scripting: Change keys for inline/stored scripts to source/id (#25127 ) This commit adds back "id" as the key within a script to specify a stored script (which with file scripts now gone is no longer ambiguous). It also adds "source" as a replacement for "code". This is in an attempt to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.	2017-06-09 08:29:25 -07:00
Jim Ferenczi	21a57c1494	Always use DisjunctionMaxQuery to build cross fields disjunction (#25115 ) This commit modifies query_string, simple_query_string and multi_match queries to always use a DisjunctionMaxQuery when a disjunction over multiple fields is built. The tiebreaker is set to 1 in order to behave like the boolean query in terms of scoring. The removal of the coord factor in Lucene 7 made this change mandatory to correctly handle minimum_should_match. Closes #23966	2017-06-08 11:18:17 +02:00
Jim Ferenczi	7e60cf3e54	Move parent_id query to the parent-join module (#25072 ) This change moves the parent_id query to the parent-join module and handles the case when only the parent-join field can be declared on an index (index with single type on). If single type is off it uses the legacy parent join field mapper and switches to the new one otherwise (default in 6). Relates #20257	2017-06-06 19:35:14 +02:00


130 Commits