Commit Graph

248 Commits

Author SHA1 Message Date
Alan Woodward a3ab7eb95d Correctly handle MSM for nested disjunctions (#50669)
With the rewrite of the percolator's QueryAnalyzer to use lucene's QueryVisitor API,
term queries that are direct children of a boolean query are handled separately from
other children. This works fine for conjunctions, but for disjunctions we need to
treat the extracted terms from these direct descendents along with extractions from
more deeply nested children to ensure that minimum-should-match requirements
are met correctly.

This commit changes the logic in QueryAnalyzer#getResult() to bundle child term
results with all other results before handling them.

Fixes #50305
2020-01-07 09:32:30 +00:00
Alan Woodward 3d8c2f9e18 Fix query analyzer logic for mixed conjunctions of terms and ranges (#49803)
When the query analyzer examines a conjunction containing both terms and ranges,
it should only include ranges in the minimum_should_match calculation if there are no
other range queries on that same field within the conjunction. This is because we cannot
build a selection query over disjoint ranges on the same field, and it is not easy to check
if two range queries have an overlap.

The current logic to calculate this just sets minimum_should_match to 1 or 0, dependent
on whether or not the current range is over a field that has already been seen. However, this
can be incorrect in the case that there are terms in the same match group which adjust the
minimum_should_match downwards. Instead, the logic should be changed to match the
terms extraction, whereby we adjust minimum_should_match downwards if we have already
seen a range field.

Fixes #49684
2019-12-10 11:01:52 +00:00
jimczi 35732504ba #49166 Fix spurious test failure 2019-11-28 11:08:15 +01:00
Jim Ferenczi d6445fae4b Add a cluster setting to disallow loading fielddata on _id field (#49166)
This change adds a dynamic cluster setting named `indices.id_field_data.enabled`.
When set to `false` any attempt to load the fielddata for the `_id` field will fail
with an exception. The default value in this change is set to `false` in order to prevent
fielddata usage on this field for future versions but it will be set to `true` when backporting
to 7x. When the setting is set to true (manually or by default in 7x) the loading will also issue
a deprecation warning since we want to disallow fielddata entirely when https://github.com/elastic/elasticsearch/issues/26472
is implemented.

Closes #43599
2019-11-28 09:35:28 +01:00
Alan Woodward c6b31162ba
Refactor percolator's QueryAnalyzer to use QueryVisitors
Lucene now allows us to explore the structure of a query using QueryVisitors,
delegating the knowledge of how to recurse through and collect terms to the
query implementations themselves. The percolator currently has a home-grown
external version of this API to construct sets of matching terms that must be
present in a document in order for it to possibly match the query.

This commit removes the home-grown implementation in favour of one using
QueryVisitor. This has the added benefit of making interval queries available
for percolator pre-filtering. Due to a bug in multi-term intervals (LUCENE-9050)
it also includes a clone of some of the lucene intervals logic, that can be removed
once upstream has been fixed.

Closes #45639
2019-11-20 09:21:01 +00:00
Rory Hunter c46a0e8708
Apply 2-space indent to all gradle scripts (#49071)
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
2019-11-14 11:01:23 +00:00
Jim Ferenczi bd6e2592a7 Remove the SearchContext from the highlighter context (#47733)
Today built-in highlighter and plugins have access to the SearchContext through the
highlighter context. However most of the information exposed in the SearchContext are not needed and a QueryShardContext
would be enough to perform highlighting. This change replaces the SearchContext by the informations that are absolutely
required by highlighter: a QueryShardContext and the SearchContextHighlight. This change allows to reduce the exposure of the
complex SearchContext and remove the needs to clone it in the percolator sub phase.

Relates #47198
Relates #46523
2019-10-10 10:34:10 +02:00
Jim Ferenczi 08f28e642b Replace SearchContext with QueryShardContext in query builder tests (#46978)
This commit replaces the SearchContext used in AbstractQueryTestCase with
a QueryShardContext in order to reduce the visibility of search contexts.

Relates #46523
2019-09-23 20:24:02 +02:00
Christoph Büscher aa0c586b73 Deprecate `_field_names` disabling (#42854)
Currently we allow `_field_names` fields to be disabled explicitely, but since
the overhead is negligible now we decided to keep it turned on by default and
deprecate the `enable` option on the field type. This change adds a deprecation
warning whenever this setting is used, going forward we want to ignore and finally
remove it.

Closes #27239
2019-09-11 14:58:08 +02:00
Jim Ferenczi 425b1a77e8 Add more context to QueryShardContext (#46584)
This change adds an IndexSearcher and the node's BigArrays in the QueryShardContext.
It's a spin off of #46527 as this change is required to allow aggregation builder to solely use the
query shard context.

Relates #46523
2019-09-11 12:24:51 +02:00
Mark Tozzi aec125faff
Support Range Fields in Histogram and Date Histogram (#46012)
Backport of 1a0dddf4ad24b3f2c751a1fe0e024fdbf8754f94 (AKA #445395)

     * Add support for a Range field ValuesSource, including decode logic for range doc values and exposing RangeType as a first class enum
     * Provide hooks in ValuesSourceConfig for aggregations to control ValuesSource class selection on missing & script values
     * Branch aggregator creation in Histogram and DateHistogram based on ValuesSource class, to enable specialization based on type.  This is similar to how Terms aggregator works.
     * Prioritize field type when available for selecting the ValuesSource class type to use for an aggregation
2019-08-28 09:06:09 -04:00
Jim Ferenczi cdf55cb5c5 Refactor index engines to manage readers instead of searchers (#43860)
This commit changes the way we manage refreshes in the index engines.
Instead of relying on a SearcherManager, this change uses a ReaderManager that
creates ElasticsearchDirectoryReader when needed. Searchers are now created on-demand
(when acquireSearcher is called) from the current ElasticsearchDirectoryReader.
It also slightly changes the Engine.Searcher to extend IndexSearcher in order
to simplify the usage in the consumer.
2019-07-04 22:49:43 +02:00
Christoph Büscher 36360358b2 Move query builder caching check to dedicated tests (#43238)
Currently `AbstractQueryTestCase#testToQuery` checks the search context cachable
flag. This is a bit fragile due to the high randomization of query builders
performed by this general test. Also we might only rarely check the
"interesting" cases because they rarely get generated when fully randomizing the
query builder.

This change moved the general checks out ot #testToQuery and instead adds
dedicated cache tests for those query builders that exhibit something other than
the default behaviour.

Closes #43200
2019-06-27 14:56:29 +02:00
Gürkan Kaymak 1d09367a82 Fixed ignoring name parameter for percolator queries (#42598)
Closes #40405
2019-05-28 09:38:00 +02:00
Ryan Ernst a49bafc194
Split document and metadata fields in GetResult (#38373) (#42456)
This commit makes creators of GetField split the fields into document fields and metadata fields. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in #24422.
2019-05-23 14:01:07 -07:00
Martijn van Groningen 1d3ece1e96
Remove -Xlint exclusions in the percolator module. (#40372)
Relates to #40366
2019-03-26 07:55:02 +01:00
Adrien Grand 9731ba4338
Make the `type` parameter optional when percolating existing documents. (#39987) (#39989)
`document_type` is the type to use for parsing the document to percolate, which
is already optional and deprecated. However `percotale` queries also have the
ability to percolate existing documents, identified by an index, an id and a
type. This change makes the latter optional and deprecated.

Closes #39963
2019-03-13 15:04:41 +01:00
Julie Tibshirani c2e9d13ebd
Default include_type_name to false in the yml test harness. (#38058)
This PR removes the temporary change we made to the yml test harness in #37285
to automatically set `include_type_name` to `true` in index creation requests
if it's not already specified. This is possible now that the vast majority of
index creation requests were updated to be typeless in #37611. A few additional
tests also needed updating here.

Additionally, this PR updates the test harness to set `include_type_name` to
`false` in index creation requests when communicating with 6.x nodes. This
mirrors the logic added in #37611 to allow for typeless document write requests
in test set-up code. With this update in place, we can remove many references
to `include_type_name: false` from the yml tests.
2019-02-01 11:44:13 -08:00
Colin Goodheart-Smithe 21e392e95e
Removes typed calls from YAML REST tests (#37611)
This PR attempts to remove all typed calls from our YAML REST tests. The PR adds include_type_name: false to create index requests that use a mapping and also to put mapping requests. It also removes _type from index requests where they haven't already been removed. The PR ignores tests named *_with_types.yml since this are specifically testing typed API behaviour.

The change also includes changing the test harness to add the type _doc to index, update, get and bulk requests that do not specify the document type when the test is running against a mixed 7.x/6.x cluster.
2019-01-30 16:32:58 +00:00
Mayya Sharipova ec32e66088 Deprecate reference to _type in lookup queries (#37016)
Relates to #35190
2019-01-08 18:46:41 -08:00
Josh Soref d3e98278c3 Spelling: replace cachable with cacheable (#37047) 2019-01-02 14:10:30 +01:00
Nhat Nguyen 7580d9d925
Make SourceToParse immutable (#36971)
Today the routing of a SourceToParse is assigned in a separate step
after the object is created. We can easily forget to set the routing.
With this commit, the routing must be provided in the constructor of
SourceToParse.

Relates #36921
2018-12-24 14:06:50 -05:00
Boaz Leskes e356b8cb95
Add doc's sequence number + primary term to GetResult and use it for updates (#36680)
This commit adds the last sequence number and primary term of the last operation that have
modified a document to `GetResult` and uses it to power the Update API.

Relates #36148 
Relates #10708
2018-12-17 15:22:13 +01:00
Jim Ferenczi 18866c4c0b
Make hits.total an object in the search response (#35849)
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).

Relates #33028
2018-12-05 19:49:06 +01:00
Martijn van Groningen 8de3c6e618
Ignore date ranges containing 'now' when pre-processing a percolator query (#35160)
Today when a percolator query contains a date range then the query
analyzer extracts that range, so that at search time the `percolate` query
can exclude percolator queries efficiently that are never going to match.

The problem is that if 'now' is used it is evaluated at index time.
So the idea is to rewrite date ranges with 'now' to a match all query, 
so that the query analyzer can't extract it and the `percolate` query 
is  then able to evaluate 'now' at query time.
2018-11-07 20:41:27 +01:00
Nick Knize a5e1f4d3a2 Upgrade to lucene-8.0.0-snapshot-31d7dfe6b1 (#35224) 2018-11-06 11:55:23 +01:00
Nik Everett e28509fbfe
Core: Less settings to AbstractComponent (#35140)
Stop passing `Settings` to `AbstractComponent`'s ctor. This allows us to
stop passing around `Settings` in a *ton* of places. While this change
touches many files, it touches them all in fairly small, mechanical
ways, doing a few things per file:
1. Drop the `super(settings);` line on everything that extends
`AbstractComponent`.
2. Drop the `settings` argument to the ctor if it is no longer used.
3. If the file doesn't use `logger` then drop `extends
AbstractComponent` from it.
4. Clean up all compilation failure caused by the `settings` removal
and drop any now unused `settings` isntances and method arguments.

I've intentionally *not* removed the `settings` argument from a few
files:
1. TransportAction
2. AbstractLifecycleComponent
3. BaseRestHandler

These files don't *need* `settings` either, but this change is large
enough as is.

Relates to #34488
2018-10-31 21:23:20 -04:00
Tal Levy e1fdd00420
Lowercase static final DeprecationLogger instance names (#34887)
After discussing on the team's FixItFriday, we concluded that
static final instance variables that are mutable should be lowercased.

Historically, DeprecationLogger was uppercased more frequently than lowercased.
2018-10-25 21:12:19 -07:00
Christoph Büscher b654d986d7
Add OneStatementPerLineCheck to Checkstyle rules (#33682)
This change adds the OneStatementPerLineCheck to our checkstyle precommit
checks. This rule restricts the number of statements per line to one. The
resoning behind this is that it is very difficult to read multiple statements on
one line. People seem to mostly use it in short lambdas and switch statements in
our code base, but just going through the changes already uncovered some actual
problems in randomization in test code, so I think its worth it.
2018-09-21 11:52:31 +02:00
Jim Ferenczi 7ad71f906a
Upgrade to a Lucene 8 snapshot (#33310)
The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly.
Some comments about the change:
* Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309

Closes #32899
2018-09-06 14:42:06 +02:00
Julie Tibshirani 78df00ff24
Simplify the return type of FieldMapper#parse. (#32654) 2018-09-04 01:15:19 +00:00
Jason Tedor 91a052b617
Merge branch 'master' into ccr
* master:
  Add hook to skip asserting x-content equivalence (#33114)
  Muted testListenersThrowingExceptionsDoNotCauseOtherListenersToBeSkipped
  [Rollup] Move getMetadata() methods out of rollup config objects (#32579)
  Muted testEmptyAuthorizedIndicesSearchForAllDisallowNoIndices
  Update Google Cloud Storage Library for Java (#32940)
  Remove unsupported Version.V_5_* (#32937)
2018-08-24 06:55:10 -04:00
Jim Ferenczi f4e9729d64
Remove unsupported Version.V_5_* (#32937)
This change removes the es 5x version constants and their usages.
2018-08-24 09:51:21 +02:00
Nhat Nguyen 6556186d9a Merge branch 'master' into ccr 2018-08-14 12:11:35 -04:00
Luca Cavanna 5c2ef5e869
Preserve index_uuid when creating QueryShardException (#32677)
As part of #32608 we made sure that the fully qualified index name is taken from the query shard context whenever creating a new `QueryShardException`. That change introduced a regression as instead of setting the entire `Index` object to the exception, which holds index name and index uuid, we ended up setting only the index name (including cluster alias). With this commit we make sure that the index uuid does not get lost and we try to lower the chances that a similar bug makes it in another time. That's done by making `QueryShardContext` return the fully qualified `Index` (which also holds the uuid) rather than only the fully qualified index name.
2018-08-08 09:57:11 +02:00
Nhat Nguyen d0f3ed5abd Merge branch 'master' into ccr
* master:
  Painless: Simplify Naming in Lookup Package (#32177)
  Handle missing values in painless (#32207)
  add support for write index resolution when creating/updating documents (#31520)
  ECS Task IAM profile credentials ignored in repository-s3 plugin (#31864)
  Remove indication of future multi-homing support (#32187)
  Rest test - allow for snapshots to take 0 milliseconds
  Make x-pack-core generate a pom file
  Rest HL client: Add put watch action (#32026)
  Build: Remove pom generation for plugin zip files (#32180)
  Fix comments causing errors with Java 11
  Fix rollup on date fields that don't support epoch_millis (#31890)
  Detect and prevent configuration that triggers a Gradle bug (#31912)
  [test] port linux package packaging tests (#31943)
  Revert "Introduce a Hashing Processor (#31087)" (#32178)
  Remove empty @return from JavaDoc
  Adjust SSLDriver behavior for JDK11 changes (#32145)
  [test] use randomized runner in packaging tests (#32109)
  Add support for field aliases. (#32172)
  Painless: Fix caching bug and clean up addPainlessClass. (#32142)
  Call setReferences() on custom referring tokenfilters in _analyze (#32157)
  Fix BwC Tests looking for UUID Pre 6.4 (#32158)
  Improve docs for search preferences (#32159)
  use before instead of onOrBefore
  Add more contexts to painless execute api (#30511)
  Add EC2 credential test for repository-s3 (#31918)
  A replica can be promoted and started in one cluster state update (#32042)
  Fix Java 11 javadoc compile problem
  Fix CP for namingConventions when gradle home has spaces (#31914)
  Fix `range` queries on `_type` field for singe type indices (#31756)
  [DOCS] Update TLS on Docker for 6.3 (#32114)
  ESIndexLevelReplicationTestCase doesn't support replicated failures but it's good to know what they are
  Remove versionType from translog (#31945)
  Switch distribution to new style Requests (#30595)
  Build: Skip jar tests if jar disabled
  Painless: Add PainlessClassBuilder (#32141)
  Build: Make additional test deps of check (#32015)
  Disable C2 from using AVX-512 on JDK 10 (#32138)
  Build: Move shadow customizations into common code (#32014)
  Painless: Fix Bug with Duplicate PainlessClasses (#32110)
  Remove empty @param from Javadoc
  Re-disable packaging tests on suse boxes
  Docs: Fix missing example script quote (#32010)
  [ML] Wait for aliases in multi-node tests (#32086)
  [ML] Move analyzer dependencies out of categorization config (#32123)
  Ensure to release translog snapshot in primary-replica resync (#32045)
  Handle TokenizerFactory  TODOs (#32063)
  Relax TermVectors API to work with textual fields other than TextFieldType (#31915)
  Updates the build to gradle 4.9 (#32087)
  Mute :qa:mixed-cluster indices.stats/10_index/Index - all’
  Check that client methods match API defined in the REST spec (#31825)
  Enable testing in FIPS140 JVM (#31666)
  Fix put mappings java API documentation (#31955)
  Add exclusion option to `keep_types` token filter (#32012)
  [Test] Modify assert statement for ssl handshake (#32072)
2018-07-19 23:03:01 -04:00
Julie Tibshirani 15ff3da653
Add support for field aliases. (#32172)
* Add basic support for field aliases in index mappings. (#31287)
* Allow for aliases when fetching stored fields. (#31411)
* Add tests around accessing field aliases in scripts. (#31417)
* Add documentation around field aliases. (#31538)
* Add validation for field alias mappings. (#31518)
* Return both concrete fields and aliases in DocumentFieldMappers#getMapper. (#31671)
* Make sure that field-level security is enforced when using field aliases. (#31807)
* Add more comprehensive tests for field aliases in queries + aggregations. (#31565)
* Remove the deprecated method DocumentFieldMappers#getFieldMapper. (#32148)
2018-07-18 09:33:09 -07:00
Nhat Nguyen 6dd3434519 Merge branch 'master' into ccr
* master:
  REST high-level client: add get index API (#31703)
  SQL: Allow long literals (#31777)
  SQL: Fix incorrect message for aliases (#31792)
  Test: Do not remove xpack templates when cleaning (#31642)
  Reduce more raw types warnings (#31780)
  Add unreleased version 6.3.2
  Scripting: Remove support for deprecated StoredScript contexts (#31394)
  [ML][TEST] Use java 11 valid time format in DataDescriptionTests (#31817)
  [ML] Don't treat stale FAILED jobs as OPENING in job allocation (#31800)
  [ML] Fix calendar and filter updates from non-master nodes (#31804)
  Fix license header generation on Windows (#31790)
  mark RollupIT.testTwoJobsStartStopDeleteOne as AwaitsFix
  mark SearchAsyncActionTests.testFanOutAndCollect as AwaitsFix
  Correct exclusion of test on JDK 11
  Fix doclint jdk 11
  Add JDK11 support and enable in CI (#31644)
  Watcher: Fix check for currently executed watches (#31137)
  Watcher: Ensure correct method is used to read secure settings (#31753)
  SQL: Update CLI logo
2018-07-05 14:07:06 -04:00
Christoph Büscher bd1c513422
Reduce more raw types warnings (#31780)
Similar to #31523.
2018-07-05 15:38:06 +02:00
Simon Willnauer 5c6711b8a4
Use a `_recovery_source` if source is omitted or modified (#31106)
Today if a user omits the `_source` entirely or modifies the source
on indexing we have no chance to re-create the document after it has
been added. This is an issue for CCR and recovery based on soft deletes
which we are going to make the default. This change adds an additional
recovery source if the source is disabled or modified that is only kept
around until the document leaves the retention policy window.

This change adds a merge policy that efficiently removes this extra source
on merge for all document that are live and not in the retention policy window
anymore.
2018-06-07 07:39:28 +02:00
Ke Li d2b9a765cf Remove version argument in RangeFieldType (#30411)
The argument `indexVersionCreated` is not needed any more and
can be removed.
2018-05-16 17:42:44 +02:00
Adrien Grand 368ddc408f
Remove MapperService#types(). (#29617)
This isn't be necessary with a single type per index.
2018-05-02 11:35:12 +02:00
Adrien Grand 4918924fae
Remove legacy mapping code. (#29224)
Some features have been deprecated since `6.0` like the `_parent` field or the
ability to have multiple types per index. This allows to remove quite some
code, which in-turn will hopefully make it easier to proceed with the removal
of types.
2018-04-11 09:41:37 +02:00
Martijn van Groningen 182cf11f37
Fixed bug when non percolator docs end up in the search hits.
In the case that a document with a percolator field is matched when using the `percolate` query then
the fetch phase can fail due to the fact that the percolator can't resolve any query from that document.

Closes #29429
2018-04-10 13:33:31 +02:00
Martijn van Groningen 2346f7fa89
removed unused import 2018-04-10 07:44:51 +02:00
Martijn van Groningen f4395c0c94
Fixed a msm accounting error that can occur during analyzing a percolator query.
In case of a disjunction query with both range and term based clauses and
msm specified, the query analyzer needs to also reduce the msn if a range
based clause for the same field is encountered. This did not happen.

Instead of fixing this bug the logic has been simplified to just set a
percolator query's msm to 1 if a disjunction contains range clauses and
msm on disjunction has been specified. The logic would otherwise just get
to complex and the performance gain isn't that much for this kind of
percolator queries.

In case a percolator query has clauses that have duplicate terms or ranges then
for disjunction clauses with a minimum should match the query extraction of the
clause with the lowest msm should be used and for conjunction queries query
extractions wiht duplicate terms/ranges the msn should be ignored. If this
is not done then percolator queries that should match never match.

Example percolator query: value1 OR value2 OR value2 OR value3 OR value3 OR value3 OR value4 OR value5 (msm set to 3)
In the above example query the extracted msm would be 3
Example document1: value1 value2 value3
With the msm and extracted terms this would match and is expected behaviour
Example document2: value3
This document should match too (value3 appears in 3 clauses), but with msm set to 3 and the fact
that fact that only distinct values are indexed in extracted terms field this document would

Also added another random duel test.

Closes #29393
2018-04-10 07:25:12 +02:00
Adrien Grand 0f00277851
Simplify analysis of `bool` queries. (#29430)
This change tries to simplify the extraction logic of boolean queries by
concentrating the logic into two methods: one that merges results for
conjunctions, and another one for disjunctions. Other concerns, like the impact
of prohibited clauses or how an `UnsupportedQueryException` should be treated
are applied on top of those two methods.

This is mostly a code reorganization, it doesn't change the result of query
extraction except in the case that a query both has required clauses and a
minimum number of `SHOULD` clauses that is greater than 1, which we now
rewrite into a pure conjunction. For instance `(+A B C)~1` is rewritten into
`(+A +(B C))` prior to extraction.
2018-04-09 16:34:45 +02:00
Adrien Grand 85f5382a3c
Fix more query extraction bugs. (#29388)
I found the following bugs:
 - The 6.0 logic for conjunctions didn't work when there were only `match_all`
   queries in MUST/FILTER clauses as they didn't propagate the `matchAllDocs`
   flag.
 - Some queries still had the same issue as `BooleanQuery` used to have with
   duplicate terms (see #28353), eg. `MultiPhraseQuery`.

Closes #29376
2018-04-06 10:44:34 +02:00
Adrien Grand c21057b3a2 Fix QueryAnalyzerTests.
Closes #29363
2018-04-04 12:48:42 +02:00
Jason Tedor a19fd5636b Add awaits fix for a query analyzer test
The test QueryAnalyzerTests#testExactMatch_booleanQuery is failing since
8cdd950056. This commit adds an awaits fix
for it until it can be addressed.
2018-04-04 05:40:13 -04:00