Commit Graph

196 Commits

Author SHA1 Message Date
Lee Hinman b4af451ec5
Remove BytesArray and BytesReference usage from XContentFactory (#29151)
* Remove BytesArray and BytesReference usage from XContentFactory

This removes the usage of `BytesArray` and `BytesReference` from
`XContentFactory`. Instead, a regular `byte[]` should be passed. To assist with
this a helper has been added to `XContentHelper` that will preserve the offset
and length from the underlying BytesReference.

This is part of ongoing work to separate the XContent parts from ES so they can
be factored into their own jar.

Relates to #28504
2018-03-20 11:52:26 -06:00
Lee Hinman 8e8fdc4f0e
Decouple XContentBuilder from BytesReference (#28972)
* Decouple XContentBuilder from BytesReference

This commit removes all mentions of `BytesReference` from `XContentBuilder`.
This is needed so that we can completely decouple the XContent code and move it
into its own dependency.

While this change appears large, it is due to two main changes, moving
`.bytes()` and `.string()` out of XContentBuilder itself into static methods
`BytesReference.bytes` and `Strings.toString` respectively. The rest of the
change is code reacting to these changes (the majority of it in tests).

Relates to #28504
2018-03-14 13:47:57 -06:00
Martijn van Groningen beb22d89c8 percolator: Take `matchAllDocs` and `verified` of the sub result into account when analyzing a function_score query.
Before the `matchAllDocs` was ignored and this could lead to percolator queries not matching when
the inner query was a match_all query and min_score was specified.

Before when `verified` was not taken into account if the function_score query wrapped an unverified query this could
lead to matching percolator queries that shouldn't match at all.
2018-03-09 07:16:21 +01:00
Martijn van Groningen bcfb7ab591
Improved percolator's random candidate query duel test and
fixed bugs that were exposed by this:

* Duplicates query leafs were not detected in a multi level boolean query
* Tracking fields for numeric range queries did not work properly.
* The sorting that was used to find the less restrictive clauses in
  disjunction query did not work too.
2018-03-08 11:39:03 +01:00
Lee Hinman 818920a281
Decouple XContentType from StreamInput/Output (#28927)
This removes the readFrom and writeTo methods from XContentType, instead using
the more generic `readEnum` and `writeEnum` methods. Luckily they are both
encoded exactly the same way, so there is no compatibility layer needed for
backwards compatibility.

Relates to #28504
2018-03-07 14:50:30 -07:00
Lee Hinman 0dd79028c9
Remove deprecated createParser methods (#28697)
* Remove deprecated createParser methods

This removes the final instances of the callers of `XContent.createParser` and
`XContentHelper.createParser` that did not pass in the `DeprecationHandler`. It
also removes the now-unused deprecated methods and fully removes any mention of
Log4j or LoggingDeprecationHandler from the XContent code.

Relates to #28504

* Add comments in JsonXContentGenerator
2018-02-16 08:26:30 -07:00
Lee Hinman 7c1f5f5054
Move more XContent.createParser calls to non-deprecated version (#28670)
* Move more XContent.createParser calls to non-deprecated version

This moves more of the callers to pass in the DeprecationHandler.

Relates to #28504

* Use parser's deprecation handler where available
2018-02-14 09:01:40 -07:00
Ryan Ernst 20c37efea2
Build: Replace provided configuration with compileOnly (#28564)
When elasticsearch was originally moved to gradle, the "provided" equivalent in maven had to be done through a plugin. Since then, gradle added the "compileOnly" configuration. This commit removes the provided plugin and replaces all uses with compileOnly.
2018-02-09 11:30:24 -08:00
Lee Hinman 3ddea8d8d2
Start switching to non-deprecated ParseField.match method (#28488)
This commit switches all the modules and server test code to use the
non-deprecated `ParseField.match` method, passing in the parser's deprecation
handler or the logging deprecation handler when a parser is not available (like
in tests).

Relates to #28449
2018-02-02 10:10:13 -07:00
Martijn van Groningen ecb1d07d00
percolator: remove deprecated map_unmapped_fields_as_string setting 2018-02-01 11:11:22 +01:00
Martijn van Groningen 9bada306dc
Improved percolator candidate query tests. 2018-02-01 07:43:03 +01:00
Martijn van Groningen 204f4022c2
percolator: Do not take duplicate query extractions into account for minimum_should_match attribute
If a percolator query contains duplicate query clauses somewhere in the query tree then
when these clauses are extracted then they should not affect the msm.

This can lead a percolator query that should be a valid match not become a candidate match,
because at query time, the msm that is being used by the CoveringQuery would never match with
the msm used at index time.

Closes #28315
2018-01-30 07:25:33 +01:00
Adrien Grand 700d9ecc95
Remove the `update_all_types` option. (#28288)
This option is not useful in 7.x since no indices may have more than one type
anymore.
2018-01-22 12:03:07 +01:00
Martijn van Groningen 73f6857dff
test: ensure we endup with a single segment
Closes #28127
2018-01-10 15:14:26 +01:00
Jason Tedor 75c0cd0672
Move range field mapper back to core
This commit moves the range field mapper back to core so that we can
remove the compile-time dependency of percolator on mapper-extras which
compilcates dependency management for the percolator client JAR, and
modules should not be intertwined like this anyway.

Relates #27854
2017-12-17 14:27:10 -05:00
Martijn van Groningen e9160fc014
percolator: also extract match_all queries
I've seen several cases where match_all queries were being used inside percolator queries,
because these queries were created generated by other systems.

Extracting these queries will allow the percolator at query time in a filter context
to skip over these queries without parsing or validating that these queries actually
match with the document being percolated.
2017-12-15 08:50:29 +01:00
Adrien Grand 1b660821a2
Allow `_doc` as a type. (#27816)
Allowing `_doc` as a type will enable users to make the transition to 7.0
smoother since the index APIs will be `PUT index/_doc/id` and `POST index/_doc`.
This also moves most of the documentation to `_doc` as a type name.

Closes #27750
Closes #27751
2017-12-14 17:47:53 +01:00
Christoph Büscher b83e14858a Correcting some minor typos in comments 2017-12-07 16:39:23 +01:00
Adrien Grand 996990ad1f
Upgrade to lucene-7.2.0-snapshot-8c94404. (#27496)
The main highlight of this new snapshot is that it introduces the opportunity
for queries to opt out of caching. In case a query opts out of caching, not only
will it never be cached, but also no compound query that wraps it will be
cached.
2017-11-28 14:52:42 +01:00
Martijn van Groningen 4ab638b71d
percolator: Avoid TooManyClauses exception if number of terms / ranges is exactly equal to 1024
The logic whether to use CoveringQuery was in two places which is why this bug snug in.
2017-11-27 08:55:11 +01:00
Simon Willnauer 5a0b6d1977
Use the primary_term field to identify parent documents (#27469)
This change stops indexing the `_primary_term` field for nested documents
to allow fast retrieval of parent documents. Today we create a docvalues
field for children to ensure we have a dense datastructure on disk. Yet,
since we only use the primary term to tie-break on when we see the same
seqID on indexing having a dense datastructure is less important. We can
use this now to improve the nested docs performance and it's memory footprint.

Relates to #24362
2017-11-21 15:14:03 +01:00
Martijn van Groningen 7c056f4523
reword comment 2017-11-13 08:00:34 +01:00
Martijn van Groningen 1bd31e9b53
percolator: fixed issue where in indices created before 6.1 if minimum should match has been specified on a disjunction,
the query would be marked as verified candidate match. This is wrong as it can only marked as verified candidate match
on indices created on or after 6.1, due to the use of the CoveringQuery.
2017-11-10 12:02:33 +01:00
Martijn van Groningen b4048b4e7f
Use CoveringQuery to select percolate candidate matches and
extract all clauses from a conjunction query.

When clauses from a conjunction are extracted the number of clauses is
also stored in an internal doc values field (minimum_should_match field).
This field is used by the CoveringQuery and allows the percolator to
reduce the number of false positives when selecting candidate matches and
in certain cases be absolutely sure that a conjunction candidate match
will match and then skip MemoryIndex validation. This can greatly improve
performance.

Before this change only a single clause was extracted from a conjunction
query. The percolator tried to extract the clauses that was rarest in order
(based on term length) to attempt less candidate queries to be selected
in the first place. However this still method there is still a very high
chance that candidate query matches are false positives.

This change also removes the influencing query extraction added via #26081
as this is no longer needed because now all conjunction clauses are extracted.

https://www.elastic.co/guide/en/elasticsearch/reference/6.x/percolator.html#_influencing_query_extraction

Closes #26307
2017-11-10 07:44:42 +01:00
Colin Goodheart-Smithe 99aca9cdfc
Enhances exists queries to reduce need for `_field_names` (#26930)
* Enhances exists queries to reduce need for `_field_names`

Before this change we wrote the name all the fields in a document to a `_field_names` field and then implemented exists queries as a term query on this field. The problem with this approach is that it bloats the index and also affects indexing performance.

This change adds a new method `existsQuery()` to `MappedFieldType` which is implemented by each sub-class. For most field types if doc values are available a `DocValuesFieldExistsQuery` is used, falling back to using `_field_names` if doc values are disabled. Note that only fields where no doc values are available are written to `_field_names`.

Closes #26770

* Addresses review comments

* Addresses more review comments

* implements existsQuery explicitly on every mapper

* Reinstates ability to perform term query on `_field_names`

* Added bwc depending on index created version

* Review Comments

* Skips tests that are not supported in 6.1.0

These values will need to be changed after backporting this PR to 6.x
2017-11-01 10:46:59 +00:00
Tanguy Leroux 6658ff0fd6 Don't detect source's XContentType in DocumentParser.parseDocument() (#26880)
DocumentParser.parseDocument() auto detects the XContentType of the
document to parse, but this information is already provided by SourceToParse.
2017-10-10 15:31:56 +02:00
Martijn van Groningen 805437b8bc
percolator: Also support query extraction for queries wrapped inside a ESToParentBlockJoinQuery 2017-09-28 09:28:50 +02:00
Simon Willnauer 9f97f9072a Allow `InputStreamStreamInput` array size validation where applicable (#26692)
Today we can't validate the array length in `InputStreamStreamInput` since
we can't rely on `InputStream.available` yet in some situations we know
the size of the stream and can apply additional validation.
2017-09-18 17:52:36 +02:00
Jim Ferenczi 401f4ba2ce Fix percolator highlight sub fetch phase to not highlight query twice (#26622)
* Fix percolator highlight sub fetch phase to not highlight query twice

The PercolatorHighlightSubFetchPhase does not override hitExecute and since it extends HighlightPhase the search hits
are highlighted twice (by the highlight phase and then by the percolator). This does not alter the results, the second highlighting
just overrides the first one but this slow down the request because it duplicates the work.
2017-09-14 09:31:14 +02:00
Adrien Grand 93da7720ff Move non-core mappers to a module. (#26549)
Today we have all non-plugin mappers in core. I'd like to start moving those
that neither map to json datatypes nor are very frequently used like `date` or
`ip` to a module.

This commit creates a new module called `mappers-extra` and moves the
`scaled_float` and `token_count` mappers to it. I'd like to eventually move
`range` fields there but it's more complicated due to their intimate
relationship with range queries.

Relates #10368
2017-09-13 17:58:53 +02:00
Adrien Grand 1adee8b5a8 Fix the MapperFieldType.rangeQuery API. (#26552)
RangeQueryBuilder needs to perform too many `instanceof` checks in order to
check for `date` or `range` fields in order to know what it should do with the
shape relation, time zone and date format.

This commit adds those 3 parameters to the `rangeQuery` factory method so that
those instanceof checks are not necessary anymore.
2017-09-11 11:02:05 +02:00
Martijn van Groningen b391425da1
Added support to the percolate query to percolate multiple documents
The percolator will add a `_percolator_document_slot` field to all percolator
hits to indicate with what document it has matched. This number matches with
the order in which the documents have been specified in the percolate query.

Also improved the support for multiple percolate queries in a search request.
2017-09-08 17:28:39 +02:00
Martijn van Groningen 6bdf591193
removed unused import 2017-09-06 07:01:58 +02:00
Martijn van Groningen 77bbe99102
Fix two unreleased percolator query analyze bugs
* If in a range query upper is smaller than lower then ignore the range query
* If two empty range extractions are compared don't fail with NoSuchElementException
2017-09-06 06:47:01 +02:00
Martijn van Groningen 2ad3608245
percolator: handle point queries with 2 or more dimensions correctly 2017-09-06 06:36:47 +02:00
Martijn van Groningen a4d5c6418e
percolator: Rename map_unmapped_fields_as_string setting to map_unmapped_fields_as_text
The `index.percolator.map_unmapped_fields_as_text` is a more better name, because unmapped fields are mapped to a text field with default settings
and string is no longer a field type (it is either keyword or text).
2017-09-04 14:12:44 +02:00
Yannick Welsch 01f6851691 Serialize and expose timeout of acknowledged requests in REST layer (#26189)
Due to the weird way of structuring the serialization code in AcknowledgedRequest, many request types forgot to properly serialize the request timeout, for example "index deletion", "index rollover", "index shrink", "putting pipeline", and other requests. This means that if those requests were not directly sent to the master node, the acknowledgement timeout information would be lost (and the default used instead).
Some requests also don't properly expose the timeout mechanism in the REST layer, such as put / delete stored script. This commit fixes all that.
2017-08-16 07:43:05 +08:00
Martijn van Groningen 636e85e5b7
percolator: Hint what clauses are important in a conjunction query based on fields
The percolator field mapper doesn't need to extract all terms and ranges from a bool query with must or filter clauses.
In order to help to default extraction behavior, boost fields can be configured, so that fields that are known for not being
selective enough can be ignored in favor for other fields or clauses with specific fields can forcefully take precedence over other clauses.
This can help selecting clauses for fields that don't match with a lot of percolator queries over other clauses and thus improving performance of the percolate query.

For example a status like field is something that should configured as an ignore field.
Queries on this field tend to match with more documents and so if clauses for this fields
get selected as best clause then that isn't very helpful for the candidate query that the
percolate query generates to filter out percolator queries that are likely not going to match.
2017-08-11 15:32:01 +02:00
Martijn van Groningen 8285a0f399
percolator: Use correct version for bwc checking now that the change has been backported to 6.0 branch 2017-08-09 13:49:20 +02:00
Simon Willnauer 82fa531ab4 Remove `_index` fielddata hack if cluster alias is present (#26082)
We introduced a hack in #25885 to respect the cluster alias if available on the `_index` field. This is important if aggregations or other field data related operations are executed. Yet, we added a small hack that duplicated an implementation detail from the `_index` field data builder to make this work. This change adds a necessary but simple API change that allows us to remove the hack and only have a single implementation.
2017-08-08 09:24:24 +02:00
Martijn van Groningen 11ce6b91a4
test: Do not use random index writer as test expects a single segment
check against right version
2017-08-07 09:40:54 +02:00
Martijn van Groningen 53dd8afaea
fix test 2017-08-02 11:25:03 +02:00
Martijn van Groningen a3d1248014
percolator: use correct version. 2017-08-02 10:37:59 +02:00
Martijn van Groningen 5f36bdfda0
percolator: Also support IndexOrDocValuesQuery
Otherwise ranges are never extracted properly.
2017-08-01 09:44:42 +02:00
Martijn van Groningen 7c3735bdc4
percolator: Store the QueryBuilder's Writable representation instead of its XContent representation.
The Writeble representation is less heavy to parse and that will benefit percolate performance and throughput.

The query builder's binary format has now the same bwc guarentees as the xcontent format.

Added a qa test that verifies that percolator queries written in older versions are still readable by the current version.
2017-07-28 12:24:10 +02:00
Jim Ferenczi 562c3744ca Merge FunctionScoreQuery and FiltersFunctionScoreQuery (#25889)
This change merges the functionality of the FiltersFunctionScoreQuery in the FunctionScoreQuery.
It also ensures that an exception is thrown when the computed score is equals to Float.NaN or Float.NEGATIVE_INFINITY.
These scores are invalid for TopDocsCollectors that relies on score comparison.

Fixes #15709
Fixes #23628
2017-07-28 09:22:20 +02:00
Martijn van Groningen edad7b4737
Add support for selecting percolator query candidate matches containing range queries.
Extracts ranges from range queries on byte, short, integer, long, half_float, scaled_float, float, double, date and ip fields.
byte, short, integer and date ranges are normalized to Lucene's LongRange.
half_float and float are normalized to Lucene's DoubleRange.

When extracting range queries, the QueryAnalyzer computes the width of the range.  This width is used to determine
what range should be preferred in a conjunction query. The QueryAnalyzer prefers the smaller ranges, because these
ranges tend to match with less documents.

Closes #21040
2017-07-26 21:25:45 +02:00
Simon Willnauer 634ce90dc0 Respect cluster alias in `_index` aggs and queries (#25885)
Today when we aggregate on the `_index` field the cross cluster search
alias is not taken into account. Neither is it respected when we search
on the field. This change adds support for cluster alias when the cluster
alias is present on the `_index` field.

Closes #25606
2017-07-26 09:16:52 +02:00
Simon Willnauer 0e3ad522a2 Rewrite search requests on the coordinating nodes (#25814)
This change rewrites search requests on the coordinating node before
we send requests to the individual shards. This will reduce the rewrite load
and object creation for each rewrite on the executing nodes and will fetch
resources only once instead of N times once per shard for queries like `terms`
query with index lookups. (among percolator and geo-shape)

Relates to #25791
2017-07-21 09:38:38 +02:00
Simon Willnauer 5e629cfba0 Ensure query resources are fetched asynchronously during rewrite (#25791)
The `QueryRewriteContext` used to provide a client object that can
be used to fetch geo-shapes, terms or documents for percolation. Unfortunately
all client calls used to be blocking calls which can have significant impact on the
rewrite phase since it occupies an entire search thread until the resource is
received. In the case that the index the resource is fetched from isn't on the local
node this can have significant impact on query throughput.

Note: this doesn't fix MLT since it fetches stuff in doQuery which is a different beast. Yet, it is a huge step in the right direction
2017-07-20 15:37:50 +02:00