Commit Graph

206 Commits

Author SHA1 Message Date
Adrien Grand 4918924fae
Remove legacy mapping code. (#29224)
Some features have been deprecated since `6.0` like the `_parent` field or the
ability to have multiple types per index. This allows to remove quite some
code, which in-turn will hopefully make it easier to proceed with the removal
of types.
2018-04-11 09:41:37 +02:00
Martijn van Groningen 182cf11f37
Fixed bug when non percolator docs end up in the search hits.
In the case that a document with a percolator field is matched when using the `percolate` query then
the fetch phase can fail due to the fact that the percolator can't resolve any query from that document.

Closes #29429
2018-04-10 13:33:31 +02:00
Martijn van Groningen 2346f7fa89
removed unused import 2018-04-10 07:44:51 +02:00
Martijn van Groningen f4395c0c94
Fixed a msm accounting error that can occur during analyzing a percolator query.
In case of a disjunction query with both range and term based clauses and
msm specified, the query analyzer needs to also reduce the msn if a range
based clause for the same field is encountered. This did not happen.

Instead of fixing this bug the logic has been simplified to just set a
percolator query's msm to 1 if a disjunction contains range clauses and
msm on disjunction has been specified. The logic would otherwise just get
to complex and the performance gain isn't that much for this kind of
percolator queries.

In case a percolator query has clauses that have duplicate terms or ranges then
for disjunction clauses with a minimum should match the query extraction of the
clause with the lowest msm should be used and for conjunction queries query
extractions wiht duplicate terms/ranges the msn should be ignored. If this
is not done then percolator queries that should match never match.

Example percolator query: value1 OR value2 OR value2 OR value3 OR value3 OR value3 OR value4 OR value5 (msm set to 3)
In the above example query the extracted msm would be 3
Example document1: value1 value2 value3
With the msm and extracted terms this would match and is expected behaviour
Example document2: value3
This document should match too (value3 appears in 3 clauses), but with msm set to 3 and the fact
that fact that only distinct values are indexed in extracted terms field this document would

Also added another random duel test.

Closes #29393
2018-04-10 07:25:12 +02:00
Adrien Grand 0f00277851
Simplify analysis of `bool` queries. (#29430)
This change tries to simplify the extraction logic of boolean queries by
concentrating the logic into two methods: one that merges results for
conjunctions, and another one for disjunctions. Other concerns, like the impact
of prohibited clauses or how an `UnsupportedQueryException` should be treated
are applied on top of those two methods.

This is mostly a code reorganization, it doesn't change the result of query
extraction except in the case that a query both has required clauses and a
minimum number of `SHOULD` clauses that is greater than 1, which we now
rewrite into a pure conjunction. For instance `(+A B C)~1` is rewritten into
`(+A +(B C))` prior to extraction.
2018-04-09 16:34:45 +02:00
Adrien Grand 85f5382a3c
Fix more query extraction bugs. (#29388)
I found the following bugs:
 - The 6.0 logic for conjunctions didn't work when there were only `match_all`
   queries in MUST/FILTER clauses as they didn't propagate the `matchAllDocs`
   flag.
 - Some queries still had the same issue as `BooleanQuery` used to have with
   duplicate terms (see #28353), eg. `MultiPhraseQuery`.

Closes #29376
2018-04-06 10:44:34 +02:00
Adrien Grand c21057b3a2 Fix QueryAnalyzerTests.
Closes #29363
2018-04-04 12:48:42 +02:00
Jason Tedor a19fd5636b Add awaits fix for a query analyzer test
The test QueryAnalyzerTests#testExactMatch_booleanQuery is failing since
8cdd950056. This commit adds an awaits fix
for it until it can be addressed.
2018-04-04 05:40:13 -04:00
Adrien Grand 8cdd950056
Fix some query extraction bugs. (#29283)
While playing with the percolator I found two bugs:
 - Sometimes we set a min_should_match that is greater than the number of
   extractions. While this doesn't cause direct trouble, it does when the query
   is nested into a boolean query and the boolean query tries to compute the
   min_should_match for the entire query based on its own min_should_match and
   those of the sub queries. So I changed the code to throw an exception when
   min_should_match is greater than the number of extractions.
 - Boolean queries claim matches are verified when in fact they shouldn't. This
   is due to the fact that boolean queries assume that they are verified if all
   sub clauses are verified but things are more complex than that, eg.
   conjunctions that are nested in a disjunction or disjunctions that are nested
   in a conjunction can generally not be verified without running the query.
2018-04-03 16:44:26 +02:00
Lee Hinman 6b2167f462
Begin moving XContent to a separate lib/artifact (#29300)
* Begin moving XContent to a separate lib/artifact

This commit moves a large portion of the XContent code from the `server` project
to the `libs/xcontent` project. For the pieces that have been moved, some
helpers have been duplicated to allow them to be decoupled from ES helper
classes. In addition, `Booleans` and `CheckedFunction` have been moved to the
`elasticsearch-core`  project.

This decoupling is a move so that we can eventually make things like the
high-level REST client not rely on the entire ES jar, only the parts it needs.

There are some pieces that are still not decoupled, in particular some of the
XContent tests still remain in the server project, this is because they test a
large portion of the pluggable xcontent pieces through
`XContentElasticsearchException`. They may be decoupled in future work.
Additionally, there may be more piecese that we want to move to the xcontent lib
in the future that are not part of this PR, this is a starting point.

Relates to #28504
2018-04-02 15:58:31 -06:00
Lee Hinman b4af451ec5
Remove BytesArray and BytesReference usage from XContentFactory (#29151)
* Remove BytesArray and BytesReference usage from XContentFactory

This removes the usage of `BytesArray` and `BytesReference` from
`XContentFactory`. Instead, a regular `byte[]` should be passed. To assist with
this a helper has been added to `XContentHelper` that will preserve the offset
and length from the underlying BytesReference.

This is part of ongoing work to separate the XContent parts from ES so they can
be factored into their own jar.

Relates to #28504
2018-03-20 11:52:26 -06:00
Lee Hinman 8e8fdc4f0e
Decouple XContentBuilder from BytesReference (#28972)
* Decouple XContentBuilder from BytesReference

This commit removes all mentions of `BytesReference` from `XContentBuilder`.
This is needed so that we can completely decouple the XContent code and move it
into its own dependency.

While this change appears large, it is due to two main changes, moving
`.bytes()` and `.string()` out of XContentBuilder itself into static methods
`BytesReference.bytes` and `Strings.toString` respectively. The rest of the
change is code reacting to these changes (the majority of it in tests).

Relates to #28504
2018-03-14 13:47:57 -06:00
Martijn van Groningen beb22d89c8 percolator: Take `matchAllDocs` and `verified` of the sub result into account when analyzing a function_score query.
Before the `matchAllDocs` was ignored and this could lead to percolator queries not matching when
the inner query was a match_all query and min_score was specified.

Before when `verified` was not taken into account if the function_score query wrapped an unverified query this could
lead to matching percolator queries that shouldn't match at all.
2018-03-09 07:16:21 +01:00
Martijn van Groningen bcfb7ab591
Improved percolator's random candidate query duel test and
fixed bugs that were exposed by this:

* Duplicates query leafs were not detected in a multi level boolean query
* Tracking fields for numeric range queries did not work properly.
* The sorting that was used to find the less restrictive clauses in
  disjunction query did not work too.
2018-03-08 11:39:03 +01:00
Lee Hinman 818920a281
Decouple XContentType from StreamInput/Output (#28927)
This removes the readFrom and writeTo methods from XContentType, instead using
the more generic `readEnum` and `writeEnum` methods. Luckily they are both
encoded exactly the same way, so there is no compatibility layer needed for
backwards compatibility.

Relates to #28504
2018-03-07 14:50:30 -07:00
Lee Hinman 0dd79028c9
Remove deprecated createParser methods (#28697)
* Remove deprecated createParser methods

This removes the final instances of the callers of `XContent.createParser` and
`XContentHelper.createParser` that did not pass in the `DeprecationHandler`. It
also removes the now-unused deprecated methods and fully removes any mention of
Log4j or LoggingDeprecationHandler from the XContent code.

Relates to #28504

* Add comments in JsonXContentGenerator
2018-02-16 08:26:30 -07:00
Lee Hinman 7c1f5f5054
Move more XContent.createParser calls to non-deprecated version (#28670)
* Move more XContent.createParser calls to non-deprecated version

This moves more of the callers to pass in the DeprecationHandler.

Relates to #28504

* Use parser's deprecation handler where available
2018-02-14 09:01:40 -07:00
Ryan Ernst 20c37efea2
Build: Replace provided configuration with compileOnly (#28564)
When elasticsearch was originally moved to gradle, the "provided" equivalent in maven had to be done through a plugin. Since then, gradle added the "compileOnly" configuration. This commit removes the provided plugin and replaces all uses with compileOnly.
2018-02-09 11:30:24 -08:00
Lee Hinman 3ddea8d8d2
Start switching to non-deprecated ParseField.match method (#28488)
This commit switches all the modules and server test code to use the
non-deprecated `ParseField.match` method, passing in the parser's deprecation
handler or the logging deprecation handler when a parser is not available (like
in tests).

Relates to #28449
2018-02-02 10:10:13 -07:00
Martijn van Groningen ecb1d07d00
percolator: remove deprecated map_unmapped_fields_as_string setting 2018-02-01 11:11:22 +01:00
Martijn van Groningen 9bada306dc
Improved percolator candidate query tests. 2018-02-01 07:43:03 +01:00
Martijn van Groningen 204f4022c2
percolator: Do not take duplicate query extractions into account for minimum_should_match attribute
If a percolator query contains duplicate query clauses somewhere in the query tree then
when these clauses are extracted then they should not affect the msm.

This can lead a percolator query that should be a valid match not become a candidate match,
because at query time, the msm that is being used by the CoveringQuery would never match with
the msm used at index time.

Closes #28315
2018-01-30 07:25:33 +01:00
Adrien Grand 700d9ecc95
Remove the `update_all_types` option. (#28288)
This option is not useful in 7.x since no indices may have more than one type
anymore.
2018-01-22 12:03:07 +01:00
Martijn van Groningen 73f6857dff
test: ensure we endup with a single segment
Closes #28127
2018-01-10 15:14:26 +01:00
Jason Tedor 75c0cd0672
Move range field mapper back to core
This commit moves the range field mapper back to core so that we can
remove the compile-time dependency of percolator on mapper-extras which
compilcates dependency management for the percolator client JAR, and
modules should not be intertwined like this anyway.

Relates #27854
2017-12-17 14:27:10 -05:00
Martijn van Groningen e9160fc014
percolator: also extract match_all queries
I've seen several cases where match_all queries were being used inside percolator queries,
because these queries were created generated by other systems.

Extracting these queries will allow the percolator at query time in a filter context
to skip over these queries without parsing or validating that these queries actually
match with the document being percolated.
2017-12-15 08:50:29 +01:00
Adrien Grand 1b660821a2
Allow `_doc` as a type. (#27816)
Allowing `_doc` as a type will enable users to make the transition to 7.0
smoother since the index APIs will be `PUT index/_doc/id` and `POST index/_doc`.
This also moves most of the documentation to `_doc` as a type name.

Closes #27750
Closes #27751
2017-12-14 17:47:53 +01:00
Christoph Büscher b83e14858a Correcting some minor typos in comments 2017-12-07 16:39:23 +01:00
Adrien Grand 996990ad1f
Upgrade to lucene-7.2.0-snapshot-8c94404. (#27496)
The main highlight of this new snapshot is that it introduces the opportunity
for queries to opt out of caching. In case a query opts out of caching, not only
will it never be cached, but also no compound query that wraps it will be
cached.
2017-11-28 14:52:42 +01:00
Martijn van Groningen 4ab638b71d
percolator: Avoid TooManyClauses exception if number of terms / ranges is exactly equal to 1024
The logic whether to use CoveringQuery was in two places which is why this bug snug in.
2017-11-27 08:55:11 +01:00
Simon Willnauer 5a0b6d1977
Use the primary_term field to identify parent documents (#27469)
This change stops indexing the `_primary_term` field for nested documents
to allow fast retrieval of parent documents. Today we create a docvalues
field for children to ensure we have a dense datastructure on disk. Yet,
since we only use the primary term to tie-break on when we see the same
seqID on indexing having a dense datastructure is less important. We can
use this now to improve the nested docs performance and it's memory footprint.

Relates to #24362
2017-11-21 15:14:03 +01:00
Martijn van Groningen 7c056f4523
reword comment 2017-11-13 08:00:34 +01:00
Martijn van Groningen 1bd31e9b53
percolator: fixed issue where in indices created before 6.1 if minimum should match has been specified on a disjunction,
the query would be marked as verified candidate match. This is wrong as it can only marked as verified candidate match
on indices created on or after 6.1, due to the use of the CoveringQuery.
2017-11-10 12:02:33 +01:00
Martijn van Groningen b4048b4e7f
Use CoveringQuery to select percolate candidate matches and
extract all clauses from a conjunction query.

When clauses from a conjunction are extracted the number of clauses is
also stored in an internal doc values field (minimum_should_match field).
This field is used by the CoveringQuery and allows the percolator to
reduce the number of false positives when selecting candidate matches and
in certain cases be absolutely sure that a conjunction candidate match
will match and then skip MemoryIndex validation. This can greatly improve
performance.

Before this change only a single clause was extracted from a conjunction
query. The percolator tried to extract the clauses that was rarest in order
(based on term length) to attempt less candidate queries to be selected
in the first place. However this still method there is still a very high
chance that candidate query matches are false positives.

This change also removes the influencing query extraction added via #26081
as this is no longer needed because now all conjunction clauses are extracted.

https://www.elastic.co/guide/en/elasticsearch/reference/6.x/percolator.html#_influencing_query_extraction

Closes #26307
2017-11-10 07:44:42 +01:00
Colin Goodheart-Smithe 99aca9cdfc
Enhances exists queries to reduce need for `_field_names` (#26930)
* Enhances exists queries to reduce need for `_field_names`

Before this change we wrote the name all the fields in a document to a `_field_names` field and then implemented exists queries as a term query on this field. The problem with this approach is that it bloats the index and also affects indexing performance.

This change adds a new method `existsQuery()` to `MappedFieldType` which is implemented by each sub-class. For most field types if doc values are available a `DocValuesFieldExistsQuery` is used, falling back to using `_field_names` if doc values are disabled. Note that only fields where no doc values are available are written to `_field_names`.

Closes #26770

* Addresses review comments

* Addresses more review comments

* implements existsQuery explicitly on every mapper

* Reinstates ability to perform term query on `_field_names`

* Added bwc depending on index created version

* Review Comments

* Skips tests that are not supported in 6.1.0

These values will need to be changed after backporting this PR to 6.x
2017-11-01 10:46:59 +00:00
Tanguy Leroux 6658ff0fd6 Don't detect source's XContentType in DocumentParser.parseDocument() (#26880)
DocumentParser.parseDocument() auto detects the XContentType of the
document to parse, but this information is already provided by SourceToParse.
2017-10-10 15:31:56 +02:00
Martijn van Groningen 805437b8bc
percolator: Also support query extraction for queries wrapped inside a ESToParentBlockJoinQuery 2017-09-28 09:28:50 +02:00
Simon Willnauer 9f97f9072a Allow `InputStreamStreamInput` array size validation where applicable (#26692)
Today we can't validate the array length in `InputStreamStreamInput` since
we can't rely on `InputStream.available` yet in some situations we know
the size of the stream and can apply additional validation.
2017-09-18 17:52:36 +02:00
Jim Ferenczi 401f4ba2ce Fix percolator highlight sub fetch phase to not highlight query twice (#26622)
* Fix percolator highlight sub fetch phase to not highlight query twice

The PercolatorHighlightSubFetchPhase does not override hitExecute and since it extends HighlightPhase the search hits
are highlighted twice (by the highlight phase and then by the percolator). This does not alter the results, the second highlighting
just overrides the first one but this slow down the request because it duplicates the work.
2017-09-14 09:31:14 +02:00
Adrien Grand 93da7720ff Move non-core mappers to a module. (#26549)
Today we have all non-plugin mappers in core. I'd like to start moving those
that neither map to json datatypes nor are very frequently used like `date` or
`ip` to a module.

This commit creates a new module called `mappers-extra` and moves the
`scaled_float` and `token_count` mappers to it. I'd like to eventually move
`range` fields there but it's more complicated due to their intimate
relationship with range queries.

Relates #10368
2017-09-13 17:58:53 +02:00
Adrien Grand 1adee8b5a8 Fix the MapperFieldType.rangeQuery API. (#26552)
RangeQueryBuilder needs to perform too many `instanceof` checks in order to
check for `date` or `range` fields in order to know what it should do with the
shape relation, time zone and date format.

This commit adds those 3 parameters to the `rangeQuery` factory method so that
those instanceof checks are not necessary anymore.
2017-09-11 11:02:05 +02:00
Martijn van Groningen b391425da1
Added support to the percolate query to percolate multiple documents
The percolator will add a `_percolator_document_slot` field to all percolator
hits to indicate with what document it has matched. This number matches with
the order in which the documents have been specified in the percolate query.

Also improved the support for multiple percolate queries in a search request.
2017-09-08 17:28:39 +02:00
Martijn van Groningen 6bdf591193
removed unused import 2017-09-06 07:01:58 +02:00
Martijn van Groningen 77bbe99102
Fix two unreleased percolator query analyze bugs
* If in a range query upper is smaller than lower then ignore the range query
* If two empty range extractions are compared don't fail with NoSuchElementException
2017-09-06 06:47:01 +02:00
Martijn van Groningen 2ad3608245
percolator: handle point queries with 2 or more dimensions correctly 2017-09-06 06:36:47 +02:00
Martijn van Groningen a4d5c6418e
percolator: Rename map_unmapped_fields_as_string setting to map_unmapped_fields_as_text
The `index.percolator.map_unmapped_fields_as_text` is a more better name, because unmapped fields are mapped to a text field with default settings
and string is no longer a field type (it is either keyword or text).
2017-09-04 14:12:44 +02:00
Yannick Welsch 01f6851691 Serialize and expose timeout of acknowledged requests in REST layer (#26189)
Due to the weird way of structuring the serialization code in AcknowledgedRequest, many request types forgot to properly serialize the request timeout, for example "index deletion", "index rollover", "index shrink", "putting pipeline", and other requests. This means that if those requests were not directly sent to the master node, the acknowledgement timeout information would be lost (and the default used instead).
Some requests also don't properly expose the timeout mechanism in the REST layer, such as put / delete stored script. This commit fixes all that.
2017-08-16 07:43:05 +08:00
Martijn van Groningen 636e85e5b7
percolator: Hint what clauses are important in a conjunction query based on fields
The percolator field mapper doesn't need to extract all terms and ranges from a bool query with must or filter clauses.
In order to help to default extraction behavior, boost fields can be configured, so that fields that are known for not being
selective enough can be ignored in favor for other fields or clauses with specific fields can forcefully take precedence over other clauses.
This can help selecting clauses for fields that don't match with a lot of percolator queries over other clauses and thus improving performance of the percolate query.

For example a status like field is something that should configured as an ignore field.
Queries on this field tend to match with more documents and so if clauses for this fields
get selected as best clause then that isn't very helpful for the candidate query that the
percolate query generates to filter out percolator queries that are likely not going to match.
2017-08-11 15:32:01 +02:00
Martijn van Groningen 8285a0f399
percolator: Use correct version for bwc checking now that the change has been backported to 6.0 branch 2017-08-09 13:49:20 +02:00
Simon Willnauer 82fa531ab4 Remove `_index` fielddata hack if cluster alias is present (#26082)
We introduced a hack in #25885 to respect the cluster alias if available on the `_index` field. This is important if aggregations or other field data related operations are executed. Yet, we added a small hack that duplicated an implementation detail from the `_index` field data builder to make this work. This change adds a necessary but simple API change that allows us to remove the hack and only have a single implementation.
2017-08-08 09:24:24 +02:00