Commit Graph

371 Commits

Author SHA1 Message Date
Mayya Sharipova 70e63a365a
Refactor how to determine if a field is metafield (#57378) (#57771)
Before to determine if a field is meta-field, a static method of MapperService
isMetadataField was used. This method was using an outdated static list
of meta-fields.

This PR instead changes this method to the instance method that
is also aware of meta-fields in all registered plugins.

Related #38373, #41656
Closes #24422
2020-06-08 09:16:18 -04:00
Mark Tozzi e50f514092
IndexFieldData should hold the ValuesSourceType (#57373) (#57532) 2020-06-02 12:16:53 -04:00
Alan Woodward d6b79bcd95 Remove Mapper.updateFieldType() (#57151)
When we had multiple mapping types, an update to a field in one type had to be
propagated to the same field in all other types. This was done using the
Mapper.updateFieldType() method, called at the end of a merge. However, now
that we only have a single type per index, this method is unnecessary and can
be removed.

Relates to #41059
Backport of #56986
2020-05-27 09:21:24 +01:00
Alan Woodward 18bfbeda29 Move merge compatibility logic from MappedFieldType to FieldMapper (#56915)
Merging logic is currently split between FieldMapper, with its merge() method, and
MappedFieldType, which checks for merging compatibility. The compatibility checks
are called from a third class, MappingMergeValidator. This makes it difficult to reason
about what is or is not compatible in updates, and even what is in fact updateable - we
have a number of tests that check compatibility on changes in mapping configuration
that are not in fact possible.

This commit refactors the compatibility logic so that it all sits on FieldMapper, and
makes it called at merge time. It adds a new FieldMapperTestCase base class that
FieldMapper tests can extend, and moves the compatibility testing machinery from
FieldTypeTestCase to here.

Relates to #56814
2020-05-20 09:43:13 +01:00
Alan Woodward d33d13f2be Simplify generics on Mapper.Builder (#56747)
Mapper.Builder currently has some complex generics on it to allow fluent builder
construction. However, the second parameter, a return type from the build() method,
is unnecessary, as we can use covariant return types. This commit removes this second
generic parameter.
2020-05-15 12:14:49 +01:00
Mark Tozzi b718193a01
Clean up DocValuesIndexFieldData (#56372) (#56684) 2020-05-14 12:42:37 -04:00
Julie Tibshirani 1ad83c37c4
Use index sort range query when possible. (#56710)
This PR proposes to use `IndexSortSortedNumericDocValuesRangeQuery` when
possible to speed up certain range queries. Points-based queries are already
very efficient, the only time this query makes a difference is when the range
matches a large number of documents.

Relates to #48665.
2020-05-13 13:24:45 -07:00
Ignacio Vera b4521d5183
upgrade to Lucene 8.6.0 snapshot (#56661) 2020-05-13 14:25:16 +02:00
Julie Tibshirani e852bb29b7
Simplify signature of FieldMapper#parseCreateField. (#56144)
`FieldMapper#parseCreateField` accepts the parse context, plus a list of fields
as an output parameter. These fields are immediately added to the document
through `ParseContext#doc()`.

This commit simplifies the signature by removing the list of fields, and having
the mappers add the fields directly to `ParseContext#doc()`. I think this is
nicer for implementors, because previously fields could be added either through
the list, or the context (through `add`, `addWithKey`, etc.)
2020-05-06 11:12:09 -07:00
Tal Levy 254d1e3543
[7.x] Create new `geo` module and migrate geo_shape registration (#53562) (#54924)
This commit introduces a new `geo` module that is intended
to be contain all the geo-spatial-specific features in server.

As a first step, the responsibility of registering the geo_shape
field mapper is moved to this module.

Co-authored-by: Nicholas Knize <nknize@gmail.com>
2020-04-07 16:30:58 -07:00
Christoph Büscher 8c9ac14a98
Rename field name constants in AbstractBuilderTestCase (#53234)
Some field name constants were not updaten when we moved from "string" to "text"
and "keyword" fields. Renaming them makes it easier and faster to know which
field type is used in test subclassing this base test case.
2020-04-03 17:28:22 +02:00
Mayya Sharipova bf4857d9e0
Search hit refactoring (#41656) (#54584)
Refactor SearchHit to have separate document and meta fields.
This is a part of bigger refactoring of issue #24422 to remove
dependency on MapperService to check if a field is metafield.

Relates to PR: #38373
Relates to issue #24422

Co-authored-by: sandmannn <bohdanpukalskyi@gmail.com>
2020-04-01 15:19:00 -04:00
Jason Tedor 5fcda57b37
Rename MetaData to Metadata in all of the places (#54519)
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
2020-03-31 17:24:38 -04:00
Jake Landis db3420d757
[7.x] Optimize which Rest resources are used by the Rest tests… (#53766)
This should help with Gradle's incremental compile such that projects
only depend upon the resources they use.

related #52114
2020-03-19 12:28:59 -05:00
Alan Woodward d325899c54 Use QueryVisitor when extracting PercolatorQuery list for highlighting (#53728)
The highlighting phase for percolator queries currently uses some custom query
traversal logic to find all instances of PercolatorQuery in the query tree for the
current search context. This commit converts things to instead use a QueryVisitor,
which future-proofs us against new wrapper queries or queries from custom
plugins that the percolator module doesn't know about.
2020-03-18 15:24:49 +00:00
Alan Woodward 5c861cfe6e Upgrade to final lucene 8.5.0 snapshot (#53293)
Lucene 8.5.0 release candidates are imminent. This commit upgrades master to use
the latest snapshot to check that there are no last-minute bugs or regressions.
2020-03-10 09:32:59 +00:00
Alan Woodward 3759063d34 Allow specifying an exclusive set of fields on ObjectParser (#52893)
ObjectParser allows you to declare a set of required fields, such that at least one
of the set must appear in an xcontent object for it to be valid. This commit adds
the similar concept of a set of exclusive fields, such that at most one of the set
must be present. It also enables required fields on ConstructingObjectParser, and
re-implements PercolateQueryBuilder.fromXContent() to use object parsing as
an example of how this works.
2020-03-03 10:56:20 +00:00
Adrien Grand 1807f86751
Generalize how queries on `_index` are handled at rewrite time (#52815)
Generalize how queries on `_index` are handled at rewrite time (#52486)

Since this change refactors rewrites, I also took it as an opportunity to adrress #49254: instead of returning the same queries you would get on a keyword field when a field is unmapped, queries get rewritten to a MatchNoDocsQueryBuilder.

This change exposed a couple bugs, like the fact that the percolator doesn't rewrite queries at query time, or that the significant_terms aggregation doesn't rewrite its inner filter, which I fixed.

Closes #49254
2020-02-26 15:37:43 +01:00
Alan Woodward a76ec765e5 Ensure that percolator sorting also works (#52758)
Commit #52748 fixed a bug where percolate queries wrapped in a constant score
could report incorrect matches. This commit adds a test to check that it also fixes
the case where a percolate query is sorted by something other than score.

Closes #52618
2020-02-26 10:52:04 +00:00
Alan Woodward 638f3e4183 Use ByteBuffersDirectory rather than RAMDirectory (#52768)
Lucene's RAMDirectory has been deprecated. This commit replaces all uses of
RAMDirectory in elasticsearch with the newer ByteBuffersDirectory. Most uses
are in tests, but the percolator and painless executor may get some small speedups.
2020-02-25 15:46:35 +00:00
Alan Woodward 18663b0a85 Don't index ranges including NOW in percolator (#52748)
Currently, date ranges queries using NOW-based date math are rewritten to
MatchAllDocs queries when being preprocessed for the percolator. However,
since we added the verification step, this can result in incorrect matches when
percolator queries are run without scores. This commit changes things to instead
wrap date queries that use NOW with a new DateRangeIncludingNowQuery.
This is a simple wrapper query that returns its delegate at rewrite time, but it can
be detected by the percolator QueryAnalyzer and be dealt with accordingly.

This also allows us to remove a method on QueryRewriteContext, and push all
logic relating to NOW-based ranges into the DateFieldMapper.

Fixes #52617
2020-02-25 12:18:16 +00:00
Marios Trivyzas dac720d7a1
Add a cluster setting to disallow expensive queries (#51385) (#52279)
Add a new cluster setting `search.allow_expensive_queries` which by
default is `true`. If set to `false`, certain queries that have
usually slow performance cannot be executed and an error message
is returned.

- Queries that need to do linear scans to identify matches:
  - Script queries
- Queries that have a high up-front cost:
  - Fuzzy queries
  - Regexp queries
  - Prefix queries (without index_prefixes enabled
  - Wildcard queries
  - Range queries on text and keyword fields
- Joining queries
  - HasParent queries
  - HasChild queries
  - ParentId queries
  - Nested queries
- Queries on deprecated 6.x geo shapes (using PrefixTree implementation)
- Queries that may have a high per-document cost:
  - Script score queries
  - Percolate queries

Closes: #29050
(cherry picked from commit a8b39ed842c7770bd9275958c9f747502fd9a3ea)
2020-02-12 22:56:14 +01:00
Julie Tibshirani 337d73a7c6 Rename MapperService#fullName to fieldType.
The new name more accurately describes what the method returns.
2020-02-07 10:35:53 -08:00
Alan Woodward a3ab7eb95d Correctly handle MSM for nested disjunctions (#50669)
With the rewrite of the percolator's QueryAnalyzer to use lucene's QueryVisitor API,
term queries that are direct children of a boolean query are handled separately from
other children. This works fine for conjunctions, but for disjunctions we need to
treat the extracted terms from these direct descendents along with extractions from
more deeply nested children to ensure that minimum-should-match requirements
are met correctly.

This commit changes the logic in QueryAnalyzer#getResult() to bundle child term
results with all other results before handling them.

Fixes #50305
2020-01-07 09:32:30 +00:00
Alan Woodward 3d8c2f9e18 Fix query analyzer logic for mixed conjunctions of terms and ranges (#49803)
When the query analyzer examines a conjunction containing both terms and ranges,
it should only include ranges in the minimum_should_match calculation if there are no
other range queries on that same field within the conjunction. This is because we cannot
build a selection query over disjoint ranges on the same field, and it is not easy to check
if two range queries have an overlap.

The current logic to calculate this just sets minimum_should_match to 1 or 0, dependent
on whether or not the current range is over a field that has already been seen. However, this
can be incorrect in the case that there are terms in the same match group which adjust the
minimum_should_match downwards. Instead, the logic should be changed to match the
terms extraction, whereby we adjust minimum_should_match downwards if we have already
seen a range field.

Fixes #49684
2019-12-10 11:01:52 +00:00
jimczi 35732504ba #49166 Fix spurious test failure 2019-11-28 11:08:15 +01:00
Jim Ferenczi d6445fae4b Add a cluster setting to disallow loading fielddata on _id field (#49166)
This change adds a dynamic cluster setting named `indices.id_field_data.enabled`.
When set to `false` any attempt to load the fielddata for the `_id` field will fail
with an exception. The default value in this change is set to `false` in order to prevent
fielddata usage on this field for future versions but it will be set to `true` when backporting
to 7x. When the setting is set to true (manually or by default in 7x) the loading will also issue
a deprecation warning since we want to disallow fielddata entirely when https://github.com/elastic/elasticsearch/issues/26472
is implemented.

Closes #43599
2019-11-28 09:35:28 +01:00
Alan Woodward c6b31162ba
Refactor percolator's QueryAnalyzer to use QueryVisitors
Lucene now allows us to explore the structure of a query using QueryVisitors,
delegating the knowledge of how to recurse through and collect terms to the
query implementations themselves. The percolator currently has a home-grown
external version of this API to construct sets of matching terms that must be
present in a document in order for it to possibly match the query.

This commit removes the home-grown implementation in favour of one using
QueryVisitor. This has the added benefit of making interval queries available
for percolator pre-filtering. Due to a bug in multi-term intervals (LUCENE-9050)
it also includes a clone of some of the lucene intervals logic, that can be removed
once upstream has been fixed.

Closes #45639
2019-11-20 09:21:01 +00:00
Rory Hunter c46a0e8708
Apply 2-space indent to all gradle scripts (#49071)
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
2019-11-14 11:01:23 +00:00
Jim Ferenczi bd6e2592a7 Remove the SearchContext from the highlighter context (#47733)
Today built-in highlighter and plugins have access to the SearchContext through the
highlighter context. However most of the information exposed in the SearchContext are not needed and a QueryShardContext
would be enough to perform highlighting. This change replaces the SearchContext by the informations that are absolutely
required by highlighter: a QueryShardContext and the SearchContextHighlight. This change allows to reduce the exposure of the
complex SearchContext and remove the needs to clone it in the percolator sub phase.

Relates #47198
Relates #46523
2019-10-10 10:34:10 +02:00
Jim Ferenczi 08f28e642b Replace SearchContext with QueryShardContext in query builder tests (#46978)
This commit replaces the SearchContext used in AbstractQueryTestCase with
a QueryShardContext in order to reduce the visibility of search contexts.

Relates #46523
2019-09-23 20:24:02 +02:00
Christoph Büscher aa0c586b73 Deprecate `_field_names` disabling (#42854)
Currently we allow `_field_names` fields to be disabled explicitely, but since
the overhead is negligible now we decided to keep it turned on by default and
deprecate the `enable` option on the field type. This change adds a deprecation
warning whenever this setting is used, going forward we want to ignore and finally
remove it.

Closes #27239
2019-09-11 14:58:08 +02:00
Jim Ferenczi 425b1a77e8 Add more context to QueryShardContext (#46584)
This change adds an IndexSearcher and the node's BigArrays in the QueryShardContext.
It's a spin off of #46527 as this change is required to allow aggregation builder to solely use the
query shard context.

Relates #46523
2019-09-11 12:24:51 +02:00
Mark Tozzi aec125faff
Support Range Fields in Histogram and Date Histogram (#46012)
Backport of 1a0dddf4ad24b3f2c751a1fe0e024fdbf8754f94 (AKA #445395)

     * Add support for a Range field ValuesSource, including decode logic for range doc values and exposing RangeType as a first class enum
     * Provide hooks in ValuesSourceConfig for aggregations to control ValuesSource class selection on missing & script values
     * Branch aggregator creation in Histogram and DateHistogram based on ValuesSource class, to enable specialization based on type.  This is similar to how Terms aggregator works.
     * Prioritize field type when available for selecting the ValuesSource class type to use for an aggregation
2019-08-28 09:06:09 -04:00
Jim Ferenczi cdf55cb5c5 Refactor index engines to manage readers instead of searchers (#43860)
This commit changes the way we manage refreshes in the index engines.
Instead of relying on a SearcherManager, this change uses a ReaderManager that
creates ElasticsearchDirectoryReader when needed. Searchers are now created on-demand
(when acquireSearcher is called) from the current ElasticsearchDirectoryReader.
It also slightly changes the Engine.Searcher to extend IndexSearcher in order
to simplify the usage in the consumer.
2019-07-04 22:49:43 +02:00
Christoph Büscher 36360358b2 Move query builder caching check to dedicated tests (#43238)
Currently `AbstractQueryTestCase#testToQuery` checks the search context cachable
flag. This is a bit fragile due to the high randomization of query builders
performed by this general test. Also we might only rarely check the
"interesting" cases because they rarely get generated when fully randomizing the
query builder.

This change moved the general checks out ot #testToQuery and instead adds
dedicated cache tests for those query builders that exhibit something other than
the default behaviour.

Closes #43200
2019-06-27 14:56:29 +02:00
Gürkan Kaymak 1d09367a82 Fixed ignoring name parameter for percolator queries (#42598)
Closes #40405
2019-05-28 09:38:00 +02:00
Ryan Ernst a49bafc194
Split document and metadata fields in GetResult (#38373) (#42456)
This commit makes creators of GetField split the fields into document fields and metadata fields. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in #24422.
2019-05-23 14:01:07 -07:00
Martijn van Groningen 1d3ece1e96
Remove -Xlint exclusions in the percolator module. (#40372)
Relates to #40366
2019-03-26 07:55:02 +01:00
Adrien Grand 9731ba4338
Make the `type` parameter optional when percolating existing documents. (#39987) (#39989)
`document_type` is the type to use for parsing the document to percolate, which
is already optional and deprecated. However `percotale` queries also have the
ability to percolate existing documents, identified by an index, an id and a
type. This change makes the latter optional and deprecated.

Closes #39963
2019-03-13 15:04:41 +01:00
Julie Tibshirani c2e9d13ebd
Default include_type_name to false in the yml test harness. (#38058)
This PR removes the temporary change we made to the yml test harness in #37285
to automatically set `include_type_name` to `true` in index creation requests
if it's not already specified. This is possible now that the vast majority of
index creation requests were updated to be typeless in #37611. A few additional
tests also needed updating here.

Additionally, this PR updates the test harness to set `include_type_name` to
`false` in index creation requests when communicating with 6.x nodes. This
mirrors the logic added in #37611 to allow for typeless document write requests
in test set-up code. With this update in place, we can remove many references
to `include_type_name: false` from the yml tests.
2019-02-01 11:44:13 -08:00
Colin Goodheart-Smithe 21e392e95e
Removes typed calls from YAML REST tests (#37611)
This PR attempts to remove all typed calls from our YAML REST tests. The PR adds include_type_name: false to create index requests that use a mapping and also to put mapping requests. It also removes _type from index requests where they haven't already been removed. The PR ignores tests named *_with_types.yml since this are specifically testing typed API behaviour.

The change also includes changing the test harness to add the type _doc to index, update, get and bulk requests that do not specify the document type when the test is running against a mixed 7.x/6.x cluster.
2019-01-30 16:32:58 +00:00
Mayya Sharipova ec32e66088 Deprecate reference to _type in lookup queries (#37016)
Relates to #35190
2019-01-08 18:46:41 -08:00
Josh Soref d3e98278c3 Spelling: replace cachable with cacheable (#37047) 2019-01-02 14:10:30 +01:00
Nhat Nguyen 7580d9d925
Make SourceToParse immutable (#36971)
Today the routing of a SourceToParse is assigned in a separate step
after the object is created. We can easily forget to set the routing.
With this commit, the routing must be provided in the constructor of
SourceToParse.

Relates #36921
2018-12-24 14:06:50 -05:00
Boaz Leskes e356b8cb95
Add doc's sequence number + primary term to GetResult and use it for updates (#36680)
This commit adds the last sequence number and primary term of the last operation that have
modified a document to `GetResult` and uses it to power the Update API.

Relates #36148 
Relates #10708
2018-12-17 15:22:13 +01:00
Jim Ferenczi 18866c4c0b
Make hits.total an object in the search response (#35849)
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).

Relates #33028
2018-12-05 19:49:06 +01:00
Martijn van Groningen 8de3c6e618
Ignore date ranges containing 'now' when pre-processing a percolator query (#35160)
Today when a percolator query contains a date range then the query
analyzer extracts that range, so that at search time the `percolate` query
can exclude percolator queries efficiently that are never going to match.

The problem is that if 'now' is used it is evaluated at index time.
So the idea is to rewrite date ranges with 'now' to a match all query, 
so that the query analyzer can't extract it and the `percolate` query 
is  then able to evaluate 'now' at query time.
2018-11-07 20:41:27 +01:00
Nick Knize a5e1f4d3a2 Upgrade to lucene-8.0.0-snapshot-31d7dfe6b1 (#35224) 2018-11-06 11:55:23 +01:00
Nik Everett e28509fbfe
Core: Less settings to AbstractComponent (#35140)
Stop passing `Settings` to `AbstractComponent`'s ctor. This allows us to
stop passing around `Settings` in a *ton* of places. While this change
touches many files, it touches them all in fairly small, mechanical
ways, doing a few things per file:
1. Drop the `super(settings);` line on everything that extends
`AbstractComponent`.
2. Drop the `settings` argument to the ctor if it is no longer used.
3. If the file doesn't use `logger` then drop `extends
AbstractComponent` from it.
4. Clean up all compilation failure caused by the `settings` removal
and drop any now unused `settings` isntances and method arguments.

I've intentionally *not* removed the `settings` argument from a few
files:
1. TransportAction
2. AbstractLifecycleComponent
3. BaseRestHandler

These files don't *need* `settings` either, but this change is large
enough as is.

Relates to #34488
2018-10-31 21:23:20 -04:00
Tal Levy e1fdd00420
Lowercase static final DeprecationLogger instance names (#34887)
After discussing on the team's FixItFriday, we concluded that
static final instance variables that are mutable should be lowercased.

Historically, DeprecationLogger was uppercased more frequently than lowercased.
2018-10-25 21:12:19 -07:00
Christoph Büscher b654d986d7
Add OneStatementPerLineCheck to Checkstyle rules (#33682)
This change adds the OneStatementPerLineCheck to our checkstyle precommit
checks. This rule restricts the number of statements per line to one. The
resoning behind this is that it is very difficult to read multiple statements on
one line. People seem to mostly use it in short lambdas and switch statements in
our code base, but just going through the changes already uncovered some actual
problems in randomization in test code, so I think its worth it.
2018-09-21 11:52:31 +02:00
Jim Ferenczi 7ad71f906a
Upgrade to a Lucene 8 snapshot (#33310)
The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly.
Some comments about the change:
* Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309

Closes #32899
2018-09-06 14:42:06 +02:00
Julie Tibshirani 78df00ff24
Simplify the return type of FieldMapper#parse. (#32654) 2018-09-04 01:15:19 +00:00
Jason Tedor 91a052b617
Merge branch 'master' into ccr
* master:
  Add hook to skip asserting x-content equivalence (#33114)
  Muted testListenersThrowingExceptionsDoNotCauseOtherListenersToBeSkipped
  [Rollup] Move getMetadata() methods out of rollup config objects (#32579)
  Muted testEmptyAuthorizedIndicesSearchForAllDisallowNoIndices
  Update Google Cloud Storage Library for Java (#32940)
  Remove unsupported Version.V_5_* (#32937)
2018-08-24 06:55:10 -04:00
Jim Ferenczi f4e9729d64
Remove unsupported Version.V_5_* (#32937)
This change removes the es 5x version constants and their usages.
2018-08-24 09:51:21 +02:00
Nhat Nguyen 6556186d9a Merge branch 'master' into ccr 2018-08-14 12:11:35 -04:00
Luca Cavanna 5c2ef5e869
Preserve index_uuid when creating QueryShardException (#32677)
As part of #32608 we made sure that the fully qualified index name is taken from the query shard context whenever creating a new `QueryShardException`. That change introduced a regression as instead of setting the entire `Index` object to the exception, which holds index name and index uuid, we ended up setting only the index name (including cluster alias). With this commit we make sure that the index uuid does not get lost and we try to lower the chances that a similar bug makes it in another time. That's done by making `QueryShardContext` return the fully qualified `Index` (which also holds the uuid) rather than only the fully qualified index name.
2018-08-08 09:57:11 +02:00
Nhat Nguyen d0f3ed5abd Merge branch 'master' into ccr
* master:
  Painless: Simplify Naming in Lookup Package (#32177)
  Handle missing values in painless (#32207)
  add support for write index resolution when creating/updating documents (#31520)
  ECS Task IAM profile credentials ignored in repository-s3 plugin (#31864)
  Remove indication of future multi-homing support (#32187)
  Rest test - allow for snapshots to take 0 milliseconds
  Make x-pack-core generate a pom file
  Rest HL client: Add put watch action (#32026)
  Build: Remove pom generation for plugin zip files (#32180)
  Fix comments causing errors with Java 11
  Fix rollup on date fields that don't support epoch_millis (#31890)
  Detect and prevent configuration that triggers a Gradle bug (#31912)
  [test] port linux package packaging tests (#31943)
  Revert "Introduce a Hashing Processor (#31087)" (#32178)
  Remove empty @return from JavaDoc
  Adjust SSLDriver behavior for JDK11 changes (#32145)
  [test] use randomized runner in packaging tests (#32109)
  Add support for field aliases. (#32172)
  Painless: Fix caching bug and clean up addPainlessClass. (#32142)
  Call setReferences() on custom referring tokenfilters in _analyze (#32157)
  Fix BwC Tests looking for UUID Pre 6.4 (#32158)
  Improve docs for search preferences (#32159)
  use before instead of onOrBefore
  Add more contexts to painless execute api (#30511)
  Add EC2 credential test for repository-s3 (#31918)
  A replica can be promoted and started in one cluster state update (#32042)
  Fix Java 11 javadoc compile problem
  Fix CP for namingConventions when gradle home has spaces (#31914)
  Fix `range` queries on `_type` field for singe type indices (#31756)
  [DOCS] Update TLS on Docker for 6.3 (#32114)
  ESIndexLevelReplicationTestCase doesn't support replicated failures but it's good to know what they are
  Remove versionType from translog (#31945)
  Switch distribution to new style Requests (#30595)
  Build: Skip jar tests if jar disabled
  Painless: Add PainlessClassBuilder (#32141)
  Build: Make additional test deps of check (#32015)
  Disable C2 from using AVX-512 on JDK 10 (#32138)
  Build: Move shadow customizations into common code (#32014)
  Painless: Fix Bug with Duplicate PainlessClasses (#32110)
  Remove empty @param from Javadoc
  Re-disable packaging tests on suse boxes
  Docs: Fix missing example script quote (#32010)
  [ML] Wait for aliases in multi-node tests (#32086)
  [ML] Move analyzer dependencies out of categorization config (#32123)
  Ensure to release translog snapshot in primary-replica resync (#32045)
  Handle TokenizerFactory  TODOs (#32063)
  Relax TermVectors API to work with textual fields other than TextFieldType (#31915)
  Updates the build to gradle 4.9 (#32087)
  Mute :qa:mixed-cluster indices.stats/10_index/Index - all’
  Check that client methods match API defined in the REST spec (#31825)
  Enable testing in FIPS140 JVM (#31666)
  Fix put mappings java API documentation (#31955)
  Add exclusion option to `keep_types` token filter (#32012)
  [Test] Modify assert statement for ssl handshake (#32072)
2018-07-19 23:03:01 -04:00
Julie Tibshirani 15ff3da653
Add support for field aliases. (#32172)
* Add basic support for field aliases in index mappings. (#31287)
* Allow for aliases when fetching stored fields. (#31411)
* Add tests around accessing field aliases in scripts. (#31417)
* Add documentation around field aliases. (#31538)
* Add validation for field alias mappings. (#31518)
* Return both concrete fields and aliases in DocumentFieldMappers#getMapper. (#31671)
* Make sure that field-level security is enforced when using field aliases. (#31807)
* Add more comprehensive tests for field aliases in queries + aggregations. (#31565)
* Remove the deprecated method DocumentFieldMappers#getFieldMapper. (#32148)
2018-07-18 09:33:09 -07:00
Nhat Nguyen 6dd3434519 Merge branch 'master' into ccr
* master:
  REST high-level client: add get index API (#31703)
  SQL: Allow long literals (#31777)
  SQL: Fix incorrect message for aliases (#31792)
  Test: Do not remove xpack templates when cleaning (#31642)
  Reduce more raw types warnings (#31780)
  Add unreleased version 6.3.2
  Scripting: Remove support for deprecated StoredScript contexts (#31394)
  [ML][TEST] Use java 11 valid time format in DataDescriptionTests (#31817)
  [ML] Don't treat stale FAILED jobs as OPENING in job allocation (#31800)
  [ML] Fix calendar and filter updates from non-master nodes (#31804)
  Fix license header generation on Windows (#31790)
  mark RollupIT.testTwoJobsStartStopDeleteOne as AwaitsFix
  mark SearchAsyncActionTests.testFanOutAndCollect as AwaitsFix
  Correct exclusion of test on JDK 11
  Fix doclint jdk 11
  Add JDK11 support and enable in CI (#31644)
  Watcher: Fix check for currently executed watches (#31137)
  Watcher: Ensure correct method is used to read secure settings (#31753)
  SQL: Update CLI logo
2018-07-05 14:07:06 -04:00
Christoph Büscher bd1c513422
Reduce more raw types warnings (#31780)
Similar to #31523.
2018-07-05 15:38:06 +02:00
Simon Willnauer 5c6711b8a4
Use a `_recovery_source` if source is omitted or modified (#31106)
Today if a user omits the `_source` entirely or modifies the source
on indexing we have no chance to re-create the document after it has
been added. This is an issue for CCR and recovery based on soft deletes
which we are going to make the default. This change adds an additional
recovery source if the source is disabled or modified that is only kept
around until the document leaves the retention policy window.

This change adds a merge policy that efficiently removes this extra source
on merge for all document that are live and not in the retention policy window
anymore.
2018-06-07 07:39:28 +02:00
Ke Li d2b9a765cf Remove version argument in RangeFieldType (#30411)
The argument `indexVersionCreated` is not needed any more and
can be removed.
2018-05-16 17:42:44 +02:00
Adrien Grand 368ddc408f
Remove MapperService#types(). (#29617)
This isn't be necessary with a single type per index.
2018-05-02 11:35:12 +02:00
Adrien Grand 4918924fae
Remove legacy mapping code. (#29224)
Some features have been deprecated since `6.0` like the `_parent` field or the
ability to have multiple types per index. This allows to remove quite some
code, which in-turn will hopefully make it easier to proceed with the removal
of types.
2018-04-11 09:41:37 +02:00
Martijn van Groningen 182cf11f37
Fixed bug when non percolator docs end up in the search hits.
In the case that a document with a percolator field is matched when using the `percolate` query then
the fetch phase can fail due to the fact that the percolator can't resolve any query from that document.

Closes #29429
2018-04-10 13:33:31 +02:00
Martijn van Groningen 2346f7fa89
removed unused import 2018-04-10 07:44:51 +02:00
Martijn van Groningen f4395c0c94
Fixed a msm accounting error that can occur during analyzing a percolator query.
In case of a disjunction query with both range and term based clauses and
msm specified, the query analyzer needs to also reduce the msn if a range
based clause for the same field is encountered. This did not happen.

Instead of fixing this bug the logic has been simplified to just set a
percolator query's msm to 1 if a disjunction contains range clauses and
msm on disjunction has been specified. The logic would otherwise just get
to complex and the performance gain isn't that much for this kind of
percolator queries.

In case a percolator query has clauses that have duplicate terms or ranges then
for disjunction clauses with a minimum should match the query extraction of the
clause with the lowest msm should be used and for conjunction queries query
extractions wiht duplicate terms/ranges the msn should be ignored. If this
is not done then percolator queries that should match never match.

Example percolator query: value1 OR value2 OR value2 OR value3 OR value3 OR value3 OR value4 OR value5 (msm set to 3)
In the above example query the extracted msm would be 3
Example document1: value1 value2 value3
With the msm and extracted terms this would match and is expected behaviour
Example document2: value3
This document should match too (value3 appears in 3 clauses), but with msm set to 3 and the fact
that fact that only distinct values are indexed in extracted terms field this document would

Also added another random duel test.

Closes #29393
2018-04-10 07:25:12 +02:00
Adrien Grand 0f00277851
Simplify analysis of `bool` queries. (#29430)
This change tries to simplify the extraction logic of boolean queries by
concentrating the logic into two methods: one that merges results for
conjunctions, and another one for disjunctions. Other concerns, like the impact
of prohibited clauses or how an `UnsupportedQueryException` should be treated
are applied on top of those two methods.

This is mostly a code reorganization, it doesn't change the result of query
extraction except in the case that a query both has required clauses and a
minimum number of `SHOULD` clauses that is greater than 1, which we now
rewrite into a pure conjunction. For instance `(+A B C)~1` is rewritten into
`(+A +(B C))` prior to extraction.
2018-04-09 16:34:45 +02:00
Adrien Grand 85f5382a3c
Fix more query extraction bugs. (#29388)
I found the following bugs:
 - The 6.0 logic for conjunctions didn't work when there were only `match_all`
   queries in MUST/FILTER clauses as they didn't propagate the `matchAllDocs`
   flag.
 - Some queries still had the same issue as `BooleanQuery` used to have with
   duplicate terms (see #28353), eg. `MultiPhraseQuery`.

Closes #29376
2018-04-06 10:44:34 +02:00
Adrien Grand c21057b3a2 Fix QueryAnalyzerTests.
Closes #29363
2018-04-04 12:48:42 +02:00
Jason Tedor a19fd5636b Add awaits fix for a query analyzer test
The test QueryAnalyzerTests#testExactMatch_booleanQuery is failing since
8cdd950056. This commit adds an awaits fix
for it until it can be addressed.
2018-04-04 05:40:13 -04:00
Adrien Grand 8cdd950056
Fix some query extraction bugs. (#29283)
While playing with the percolator I found two bugs:
 - Sometimes we set a min_should_match that is greater than the number of
   extractions. While this doesn't cause direct trouble, it does when the query
   is nested into a boolean query and the boolean query tries to compute the
   min_should_match for the entire query based on its own min_should_match and
   those of the sub queries. So I changed the code to throw an exception when
   min_should_match is greater than the number of extractions.
 - Boolean queries claim matches are verified when in fact they shouldn't. This
   is due to the fact that boolean queries assume that they are verified if all
   sub clauses are verified but things are more complex than that, eg.
   conjunctions that are nested in a disjunction or disjunctions that are nested
   in a conjunction can generally not be verified without running the query.
2018-04-03 16:44:26 +02:00
Lee Hinman 6b2167f462
Begin moving XContent to a separate lib/artifact (#29300)
* Begin moving XContent to a separate lib/artifact

This commit moves a large portion of the XContent code from the `server` project
to the `libs/xcontent` project. For the pieces that have been moved, some
helpers have been duplicated to allow them to be decoupled from ES helper
classes. In addition, `Booleans` and `CheckedFunction` have been moved to the
`elasticsearch-core`  project.

This decoupling is a move so that we can eventually make things like the
high-level REST client not rely on the entire ES jar, only the parts it needs.

There are some pieces that are still not decoupled, in particular some of the
XContent tests still remain in the server project, this is because they test a
large portion of the pluggable xcontent pieces through
`XContentElasticsearchException`. They may be decoupled in future work.
Additionally, there may be more piecese that we want to move to the xcontent lib
in the future that are not part of this PR, this is a starting point.

Relates to #28504
2018-04-02 15:58:31 -06:00
Lee Hinman b4af451ec5
Remove BytesArray and BytesReference usage from XContentFactory (#29151)
* Remove BytesArray and BytesReference usage from XContentFactory

This removes the usage of `BytesArray` and `BytesReference` from
`XContentFactory`. Instead, a regular `byte[]` should be passed. To assist with
this a helper has been added to `XContentHelper` that will preserve the offset
and length from the underlying BytesReference.

This is part of ongoing work to separate the XContent parts from ES so they can
be factored into their own jar.

Relates to #28504
2018-03-20 11:52:26 -06:00
Lee Hinman 8e8fdc4f0e
Decouple XContentBuilder from BytesReference (#28972)
* Decouple XContentBuilder from BytesReference

This commit removes all mentions of `BytesReference` from `XContentBuilder`.
This is needed so that we can completely decouple the XContent code and move it
into its own dependency.

While this change appears large, it is due to two main changes, moving
`.bytes()` and `.string()` out of XContentBuilder itself into static methods
`BytesReference.bytes` and `Strings.toString` respectively. The rest of the
change is code reacting to these changes (the majority of it in tests).

Relates to #28504
2018-03-14 13:47:57 -06:00
Martijn van Groningen beb22d89c8 percolator: Take `matchAllDocs` and `verified` of the sub result into account when analyzing a function_score query.
Before the `matchAllDocs` was ignored and this could lead to percolator queries not matching when
the inner query was a match_all query and min_score was specified.

Before when `verified` was not taken into account if the function_score query wrapped an unverified query this could
lead to matching percolator queries that shouldn't match at all.
2018-03-09 07:16:21 +01:00
Martijn van Groningen bcfb7ab591
Improved percolator's random candidate query duel test and
fixed bugs that were exposed by this:

* Duplicates query leafs were not detected in a multi level boolean query
* Tracking fields for numeric range queries did not work properly.
* The sorting that was used to find the less restrictive clauses in
  disjunction query did not work too.
2018-03-08 11:39:03 +01:00
Lee Hinman 818920a281
Decouple XContentType from StreamInput/Output (#28927)
This removes the readFrom and writeTo methods from XContentType, instead using
the more generic `readEnum` and `writeEnum` methods. Luckily they are both
encoded exactly the same way, so there is no compatibility layer needed for
backwards compatibility.

Relates to #28504
2018-03-07 14:50:30 -07:00
Lee Hinman 0dd79028c9
Remove deprecated createParser methods (#28697)
* Remove deprecated createParser methods

This removes the final instances of the callers of `XContent.createParser` and
`XContentHelper.createParser` that did not pass in the `DeprecationHandler`. It
also removes the now-unused deprecated methods and fully removes any mention of
Log4j or LoggingDeprecationHandler from the XContent code.

Relates to #28504

* Add comments in JsonXContentGenerator
2018-02-16 08:26:30 -07:00
Lee Hinman 7c1f5f5054
Move more XContent.createParser calls to non-deprecated version (#28670)
* Move more XContent.createParser calls to non-deprecated version

This moves more of the callers to pass in the DeprecationHandler.

Relates to #28504

* Use parser's deprecation handler where available
2018-02-14 09:01:40 -07:00
Ryan Ernst 20c37efea2
Build: Replace provided configuration with compileOnly (#28564)
When elasticsearch was originally moved to gradle, the "provided" equivalent in maven had to be done through a plugin. Since then, gradle added the "compileOnly" configuration. This commit removes the provided plugin and replaces all uses with compileOnly.
2018-02-09 11:30:24 -08:00
Lee Hinman 3ddea8d8d2
Start switching to non-deprecated ParseField.match method (#28488)
This commit switches all the modules and server test code to use the
non-deprecated `ParseField.match` method, passing in the parser's deprecation
handler or the logging deprecation handler when a parser is not available (like
in tests).

Relates to #28449
2018-02-02 10:10:13 -07:00
Martijn van Groningen ecb1d07d00
percolator: remove deprecated map_unmapped_fields_as_string setting 2018-02-01 11:11:22 +01:00
Martijn van Groningen 9bada306dc
Improved percolator candidate query tests. 2018-02-01 07:43:03 +01:00
Martijn van Groningen 204f4022c2
percolator: Do not take duplicate query extractions into account for minimum_should_match attribute
If a percolator query contains duplicate query clauses somewhere in the query tree then
when these clauses are extracted then they should not affect the msm.

This can lead a percolator query that should be a valid match not become a candidate match,
because at query time, the msm that is being used by the CoveringQuery would never match with
the msm used at index time.

Closes #28315
2018-01-30 07:25:33 +01:00
Adrien Grand 700d9ecc95
Remove the `update_all_types` option. (#28288)
This option is not useful in 7.x since no indices may have more than one type
anymore.
2018-01-22 12:03:07 +01:00
Martijn van Groningen 73f6857dff
test: ensure we endup with a single segment
Closes #28127
2018-01-10 15:14:26 +01:00
Jason Tedor 75c0cd0672
Move range field mapper back to core
This commit moves the range field mapper back to core so that we can
remove the compile-time dependency of percolator on mapper-extras which
compilcates dependency management for the percolator client JAR, and
modules should not be intertwined like this anyway.

Relates #27854
2017-12-17 14:27:10 -05:00
Martijn van Groningen e9160fc014
percolator: also extract match_all queries
I've seen several cases where match_all queries were being used inside percolator queries,
because these queries were created generated by other systems.

Extracting these queries will allow the percolator at query time in a filter context
to skip over these queries without parsing or validating that these queries actually
match with the document being percolated.
2017-12-15 08:50:29 +01:00
Adrien Grand 1b660821a2
Allow `_doc` as a type. (#27816)
Allowing `_doc` as a type will enable users to make the transition to 7.0
smoother since the index APIs will be `PUT index/_doc/id` and `POST index/_doc`.
This also moves most of the documentation to `_doc` as a type name.

Closes #27750
Closes #27751
2017-12-14 17:47:53 +01:00
Christoph Büscher b83e14858a Correcting some minor typos in comments 2017-12-07 16:39:23 +01:00
Adrien Grand 996990ad1f
Upgrade to lucene-7.2.0-snapshot-8c94404. (#27496)
The main highlight of this new snapshot is that it introduces the opportunity
for queries to opt out of caching. In case a query opts out of caching, not only
will it never be cached, but also no compound query that wraps it will be
cached.
2017-11-28 14:52:42 +01:00
Martijn van Groningen 4ab638b71d
percolator: Avoid TooManyClauses exception if number of terms / ranges is exactly equal to 1024
The logic whether to use CoveringQuery was in two places which is why this bug snug in.
2017-11-27 08:55:11 +01:00
Simon Willnauer 5a0b6d1977
Use the primary_term field to identify parent documents (#27469)
This change stops indexing the `_primary_term` field for nested documents
to allow fast retrieval of parent documents. Today we create a docvalues
field for children to ensure we have a dense datastructure on disk. Yet,
since we only use the primary term to tie-break on when we see the same
seqID on indexing having a dense datastructure is less important. We can
use this now to improve the nested docs performance and it's memory footprint.

Relates to #24362
2017-11-21 15:14:03 +01:00
Martijn van Groningen 7c056f4523
reword comment 2017-11-13 08:00:34 +01:00
Martijn van Groningen 1bd31e9b53
percolator: fixed issue where in indices created before 6.1 if minimum should match has been specified on a disjunction,
the query would be marked as verified candidate match. This is wrong as it can only marked as verified candidate match
on indices created on or after 6.1, due to the use of the CoveringQuery.
2017-11-10 12:02:33 +01:00
Martijn van Groningen b4048b4e7f
Use CoveringQuery to select percolate candidate matches and
extract all clauses from a conjunction query.

When clauses from a conjunction are extracted the number of clauses is
also stored in an internal doc values field (minimum_should_match field).
This field is used by the CoveringQuery and allows the percolator to
reduce the number of false positives when selecting candidate matches and
in certain cases be absolutely sure that a conjunction candidate match
will match and then skip MemoryIndex validation. This can greatly improve
performance.

Before this change only a single clause was extracted from a conjunction
query. The percolator tried to extract the clauses that was rarest in order
(based on term length) to attempt less candidate queries to be selected
in the first place. However this still method there is still a very high
chance that candidate query matches are false positives.

This change also removes the influencing query extraction added via #26081
as this is no longer needed because now all conjunction clauses are extracted.

https://www.elastic.co/guide/en/elasticsearch/reference/6.x/percolator.html#_influencing_query_extraction

Closes #26307
2017-11-10 07:44:42 +01:00
Colin Goodheart-Smithe 99aca9cdfc
Enhances exists queries to reduce need for `_field_names` (#26930)
* Enhances exists queries to reduce need for `_field_names`

Before this change we wrote the name all the fields in a document to a `_field_names` field and then implemented exists queries as a term query on this field. The problem with this approach is that it bloats the index and also affects indexing performance.

This change adds a new method `existsQuery()` to `MappedFieldType` which is implemented by each sub-class. For most field types if doc values are available a `DocValuesFieldExistsQuery` is used, falling back to using `_field_names` if doc values are disabled. Note that only fields where no doc values are available are written to `_field_names`.

Closes #26770

* Addresses review comments

* Addresses more review comments

* implements existsQuery explicitly on every mapper

* Reinstates ability to perform term query on `_field_names`

* Added bwc depending on index created version

* Review Comments

* Skips tests that are not supported in 6.1.0

These values will need to be changed after backporting this PR to 6.x
2017-11-01 10:46:59 +00:00
Tanguy Leroux 6658ff0fd6 Don't detect source's XContentType in DocumentParser.parseDocument() (#26880)
DocumentParser.parseDocument() auto detects the XContentType of the
document to parse, but this information is already provided by SourceToParse.
2017-10-10 15:31:56 +02:00
Martijn van Groningen 805437b8bc
percolator: Also support query extraction for queries wrapped inside a ESToParentBlockJoinQuery 2017-09-28 09:28:50 +02:00
Simon Willnauer 9f97f9072a Allow `InputStreamStreamInput` array size validation where applicable (#26692)
Today we can't validate the array length in `InputStreamStreamInput` since
we can't rely on `InputStream.available` yet in some situations we know
the size of the stream and can apply additional validation.
2017-09-18 17:52:36 +02:00
Jim Ferenczi 401f4ba2ce Fix percolator highlight sub fetch phase to not highlight query twice (#26622)
* Fix percolator highlight sub fetch phase to not highlight query twice

The PercolatorHighlightSubFetchPhase does not override hitExecute and since it extends HighlightPhase the search hits
are highlighted twice (by the highlight phase and then by the percolator). This does not alter the results, the second highlighting
just overrides the first one but this slow down the request because it duplicates the work.
2017-09-14 09:31:14 +02:00
Adrien Grand 93da7720ff Move non-core mappers to a module. (#26549)
Today we have all non-plugin mappers in core. I'd like to start moving those
that neither map to json datatypes nor are very frequently used like `date` or
`ip` to a module.

This commit creates a new module called `mappers-extra` and moves the
`scaled_float` and `token_count` mappers to it. I'd like to eventually move
`range` fields there but it's more complicated due to their intimate
relationship with range queries.

Relates #10368
2017-09-13 17:58:53 +02:00
Adrien Grand 1adee8b5a8 Fix the MapperFieldType.rangeQuery API. (#26552)
RangeQueryBuilder needs to perform too many `instanceof` checks in order to
check for `date` or `range` fields in order to know what it should do with the
shape relation, time zone and date format.

This commit adds those 3 parameters to the `rangeQuery` factory method so that
those instanceof checks are not necessary anymore.
2017-09-11 11:02:05 +02:00
Martijn van Groningen b391425da1
Added support to the percolate query to percolate multiple documents
The percolator will add a `_percolator_document_slot` field to all percolator
hits to indicate with what document it has matched. This number matches with
the order in which the documents have been specified in the percolate query.

Also improved the support for multiple percolate queries in a search request.
2017-09-08 17:28:39 +02:00
Martijn van Groningen 6bdf591193
removed unused import 2017-09-06 07:01:58 +02:00
Martijn van Groningen 77bbe99102
Fix two unreleased percolator query analyze bugs
* If in a range query upper is smaller than lower then ignore the range query
* If two empty range extractions are compared don't fail with NoSuchElementException
2017-09-06 06:47:01 +02:00
Martijn van Groningen 2ad3608245
percolator: handle point queries with 2 or more dimensions correctly 2017-09-06 06:36:47 +02:00
Martijn van Groningen a4d5c6418e
percolator: Rename map_unmapped_fields_as_string setting to map_unmapped_fields_as_text
The `index.percolator.map_unmapped_fields_as_text` is a more better name, because unmapped fields are mapped to a text field with default settings
and string is no longer a field type (it is either keyword or text).
2017-09-04 14:12:44 +02:00
Yannick Welsch 01f6851691 Serialize and expose timeout of acknowledged requests in REST layer (#26189)
Due to the weird way of structuring the serialization code in AcknowledgedRequest, many request types forgot to properly serialize the request timeout, for example "index deletion", "index rollover", "index shrink", "putting pipeline", and other requests. This means that if those requests were not directly sent to the master node, the acknowledgement timeout information would be lost (and the default used instead).
Some requests also don't properly expose the timeout mechanism in the REST layer, such as put / delete stored script. This commit fixes all that.
2017-08-16 07:43:05 +08:00
Martijn van Groningen 636e85e5b7
percolator: Hint what clauses are important in a conjunction query based on fields
The percolator field mapper doesn't need to extract all terms and ranges from a bool query with must or filter clauses.
In order to help to default extraction behavior, boost fields can be configured, so that fields that are known for not being
selective enough can be ignored in favor for other fields or clauses with specific fields can forcefully take precedence over other clauses.
This can help selecting clauses for fields that don't match with a lot of percolator queries over other clauses and thus improving performance of the percolate query.

For example a status like field is something that should configured as an ignore field.
Queries on this field tend to match with more documents and so if clauses for this fields
get selected as best clause then that isn't very helpful for the candidate query that the
percolate query generates to filter out percolator queries that are likely not going to match.
2017-08-11 15:32:01 +02:00
Martijn van Groningen 8285a0f399
percolator: Use correct version for bwc checking now that the change has been backported to 6.0 branch 2017-08-09 13:49:20 +02:00
Simon Willnauer 82fa531ab4 Remove `_index` fielddata hack if cluster alias is present (#26082)
We introduced a hack in #25885 to respect the cluster alias if available on the `_index` field. This is important if aggregations or other field data related operations are executed. Yet, we added a small hack that duplicated an implementation detail from the `_index` field data builder to make this work. This change adds a necessary but simple API change that allows us to remove the hack and only have a single implementation.
2017-08-08 09:24:24 +02:00
Martijn van Groningen 11ce6b91a4
test: Do not use random index writer as test expects a single segment
check against right version
2017-08-07 09:40:54 +02:00
Martijn van Groningen 53dd8afaea
fix test 2017-08-02 11:25:03 +02:00
Martijn van Groningen a3d1248014
percolator: use correct version. 2017-08-02 10:37:59 +02:00
Martijn van Groningen 5f36bdfda0
percolator: Also support IndexOrDocValuesQuery
Otherwise ranges are never extracted properly.
2017-08-01 09:44:42 +02:00
Martijn van Groningen 7c3735bdc4
percolator: Store the QueryBuilder's Writable representation instead of its XContent representation.
The Writeble representation is less heavy to parse and that will benefit percolate performance and throughput.

The query builder's binary format has now the same bwc guarentees as the xcontent format.

Added a qa test that verifies that percolator queries written in older versions are still readable by the current version.
2017-07-28 12:24:10 +02:00
Jim Ferenczi 562c3744ca Merge FunctionScoreQuery and FiltersFunctionScoreQuery (#25889)
This change merges the functionality of the FiltersFunctionScoreQuery in the FunctionScoreQuery.
It also ensures that an exception is thrown when the computed score is equals to Float.NaN or Float.NEGATIVE_INFINITY.
These scores are invalid for TopDocsCollectors that relies on score comparison.

Fixes #15709
Fixes #23628
2017-07-28 09:22:20 +02:00
Martijn van Groningen edad7b4737
Add support for selecting percolator query candidate matches containing range queries.
Extracts ranges from range queries on byte, short, integer, long, half_float, scaled_float, float, double, date and ip fields.
byte, short, integer and date ranges are normalized to Lucene's LongRange.
half_float and float are normalized to Lucene's DoubleRange.

When extracting range queries, the QueryAnalyzer computes the width of the range.  This width is used to determine
what range should be preferred in a conjunction query. The QueryAnalyzer prefers the smaller ranges, because these
ranges tend to match with less documents.

Closes #21040
2017-07-26 21:25:45 +02:00
Simon Willnauer 634ce90dc0 Respect cluster alias in `_index` aggs and queries (#25885)
Today when we aggregate on the `_index` field the cross cluster search
alias is not taken into account. Neither is it respected when we search
on the field. This change adds support for cluster alias when the cluster
alias is present on the `_index` field.

Closes #25606
2017-07-26 09:16:52 +02:00
Simon Willnauer 0e3ad522a2 Rewrite search requests on the coordinating nodes (#25814)
This change rewrites search requests on the coordinating node before
we send requests to the individual shards. This will reduce the rewrite load
and object creation for each rewrite on the executing nodes and will fetch
resources only once instead of N times once per shard for queries like `terms`
query with index lookups. (among percolator and geo-shape)

Relates to #25791
2017-07-21 09:38:38 +02:00
Simon Willnauer 5e629cfba0 Ensure query resources are fetched asynchronously during rewrite (#25791)
The `QueryRewriteContext` used to provide a client object that can
be used to fetch geo-shapes, terms or documents for percolation. Unfortunately
all client calls used to be blocking calls which can have significant impact on the
rewrite phase since it occupies an entire search thread until the resource is
received. In the case that the index the resource is fetched from isn't on the local
node this can have significant impact on query throughput.

Note: this doesn't fix MLT since it fetches stuff in doQuery which is a different beast. Yet, it is a huge step in the right direction
2017-07-20 15:37:50 +02:00
Simon Willnauer 4d78935df7 Introduce a new Rewriteable interface to streamline rewriting (#25788)
Today we have duplicated code that is quite complicated to iterate
over rewriteable (`QueryBuilders` mainly) This change introduces a
`Rewriteable` interface that allow to share code to do the rewriting as
well as encapsulation and composition of queries.
2017-07-19 15:06:49 +02:00
Adrien Grand f1ff7f2454 Require a field when a `seed` is provided to the `random_score` function. (#25594)
We currently use fielddata on the `_id` field which is trappy, especially as we
do it implicitly. This changes the `random_score` function to use doc ids when
no seed is provided and to suggest a field when a seed is provided.

For now the change only emits a deprecation warning when no field is supplied
but this should be replaced by a strict check on 7.0.

Closes #25240
2017-07-19 14:11:15 +02:00
Jason Tedor c084542731 Bump version to 6.0.0-beta1
This commit does two things:
 - bumps the version from 6.0.0-alpha3 to 6.0.0-beta1
 - renames the 6.0.0-alpha3 version constant to 6.0.0-beta1

Relates #25621
2017-07-09 18:12:50 -04:00
Christoph Büscher f576c987ce Remove QueryParseContext (#25486)
QueryParseContext is currently only used as a wrapper for an XContentParser, so
this change removes it entirely and changes the appropriate APIs that use it so
far to only accept a parser instead.
2017-07-03 17:30:40 +02:00
Christoph Büscher 927111c91d Remove QueryParseContext from parsing QueryBuilders (#25448)
Currently QueryParseContext is only a thin wrapper around an XContentParser that
adds little functionality of its own. I provides helpers for long deprecated
field names which can be removed and two helper methods that can be made static
and moved to other classes. This is a first step in helping to remove
QueryParseContext entirely.
2017-06-29 17:10:20 +02:00
Martijn van Groningen c85ac402b0
test: Make many percolator integration tests real integration tests 2017-06-27 17:44:30 +02:00
Martijn van Groningen 343e7571b9
test: single type defaults to true since alpha1 and not alpha3
Closes #25354
2017-06-22 16:31:15 +02:00
Adrien Grand 44e9c0b947 Upgrade to lucene-7.0.0-snapshot-ad2cb77. (#25349)
Most notable changes:
 - better update concurrency: LUCENE-7868
 - TopDocs.totalHits is now a long: LUCENE-7872
 - QueryBuilder does not remove the boolean query around multi-term synonyms:
   LUCENE-7878
 - removal of Fields: LUCENE-7500

For the `TopDocs.totalHits` change, this PR relies on the fact that the encoding
of vInts and vLongs are compatible: you can write and read with any of them as
long as the value can be represented by a positive int.
2017-06-22 12:35:33 +02:00
Martijn van Groningen a977569085
percolator: Deprecate `document_type` parameter.
The `document_type` parameter is no longer required to be specified,
because by default from 6.0 only a single type is allowed. (`index.mapping.single_type` defaults to `true`)
2017-06-22 09:55:06 +02:00
Ryan Ernst a03b6c2fa5 Scripting: Change keys for inline/stored scripts to source/id (#25127)
This commit adds back "id" as the key within a script to specify a
stored script (which with file scripts now gone is no longer ambiguous).
It also adds "source" as a replacement for "code". This is in an attempt
to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.
2017-06-09 08:29:25 -07:00
Jim Ferenczi 21a57c1494 Always use DisjunctionMaxQuery to build cross fields disjunction (#25115)
This commit modifies query_string, simple_query_string and multi_match queries to always use a DisjunctionMaxQuery when a disjunction over multiple fields is built. The tiebreaker is set to 1 in order to behave like the boolean query in terms of scoring.
The removal of the coord factor in Lucene 7 made this change mandatory to correctly handle minimum_should_match.

Closes #23966
2017-06-08 11:18:17 +02:00
Adrien Grand a8ea2f0df4 Leverage scorerSupplier when applicable. (#25109)
The `scorerSupplier` API allows to give a hint to queries in order to let them
know that they will be consumed in a random-access fashion. We should use this
for aggregations, function_score and matched queries.
2017-06-08 10:19:38 +02:00
Jim Ferenczi 7e60cf3e54 Move parent_id query to the parent-join module (#25072)
This change moves the parent_id query to the parent-join module and handles the case when only the parent-join field can be declared on an index (index with single type on).
If single type is off it uses the legacy parent join field mapper and switch to the new one otherwise (default in 6).

Relates #20257
2017-06-06 19:35:14 +02:00
Martijn van Groningen 6945d7b046
test: Stop using the `mapping.single_type` setting in percolator tests.
Closes #24958
2017-05-31 09:11:33 +02:00
Martijn van Groningen 08eda43899
percolator: Use QueryBuilder.rewriteQuery(...) to rewrite query builder instead of QueryBuilder.rewrite(...)
Relates to #24617
2017-05-22 12:20:26 +02:00
Simon Willnauer 2ccc223ff7 Fix Version based BWC and set correct minCompatVersion (#24732)
Approaching the release of 6.0 we need to sort out the usage of
`Version#minimumCompatibilityVersion` which was still set to 5.0.0.
Now this change moves it to the latest released version of 5.x (5.4 at this point)
to ensure we are compatible with the latest minor of the previous major. This change
also removes all the `_UNRELEASED` from the versions that where released and drops versions
that were never released and are not expected to be released (bugfixes in minors that are not
the latest in the previous major).
2017-05-17 17:27:09 +02:00
Ryan Ernst 2a65bed243 Tests: Change rest test extension from .yaml to .yml (#24659)
This commit renames all rest test files to use the .yml extension
instead of .yaml. This way the extension used within all of
elasticsearch for yaml is consistent.
2017-05-16 17:24:35 -07:00
Martijn van Groningen f6e19dcedc
percolator: Fix range queries with date range based on current time.
Range queries with now based date ranges were previously not allowed,
but since #23921 these queries were allowed. This change should really
fix range queries with now based date ranges.
2017-05-16 13:13:11 +02:00
Jim Ferenczi 279a18a527 Add parent-join module (#24638)
* Add parent-join module

This change adds a new module named `parent-join`.
The goal of this module is to provide a replacement for the `_parent` field but as a first step this change only moves the `has_child`, `has_parent` queries and the `children` aggregation to this module.
These queries and aggregations are no longer in core but they are deployed by default as a module.

Relates #20257
2017-05-12 15:58:06 +02:00
Adrien Grand 7311aaa2eb Fix PercolatorQuerySearchIT to not create multiple types. 2017-05-03 16:44:14 +02:00
Adrien Grand 1be2800120 Only allow one type on 7.0 indices (#24317)
This adds the `index.mapping.single_type` setting, which enforces that indices
have at most one type when it is true. The default value is true for 6.0+ indices
and false for old indices.

Relates #15613
2017-04-27 08:43:20 +02:00
Martijn van Groningen c17de49a6d
[percolator] Fix memory leak when percolator uses bitset or field data cache.
The percolator doesn't close the IndexReader of the memory index any more.
Prior to 2.x the percolator had its own SearchContext (PercolatorContext) that did this,
but that was removed when the percolator was refactored as part of the 5.0 release.

I think an alternative way to fix this is to let percolator not use the bitset and fielddata caches,
that way we prevent the memory leak.

Closes #24108
2017-04-26 11:08:15 +02:00
Ryan Ernst 212f24aa27 Tests: Clean up rest test file handling (#21392)
This change simplifies how the rest test runner finds test files and
removes all leniency.  Previously multiple prefixes and suffixes would
be tried, and tests could exist inside or outside of the classpath,
although outside of the classpath never quite worked. Now only classpath
tests are supported, and only one resource prefix is supported,
`/rest-api-spec/tests`.

closes #20240
2017-04-18 15:07:08 -07:00
Adrien Grand 4632661bc7 Upgrade to a Lucene 7 snapshot (#24089)
We want to upgrade to Lucene 7 ahead of time in order to be able to check whether it causes any trouble to Elasticsearch before Lucene 7.0 gets released. From a user perspective, the main benefit of this upgrade is the enhanced support for sparse fields, whose resource consumption is now function of the number of docs that have a value rather than the total number of docs in the index.

Some notes about the change:
 - it includes the deprecation of the `disable_coord` parameter of the `bool` and `common_terms` queries: Lucene has removed support for coord factors
 - it includes the deprecation of the `index.similarity.base` expert setting, since it was only useful to configure coords and query norms, which have both been removed
 - two tests have been marked with `@AwaitsFix` because of #23966, which we intend to address after the merge
2017-04-18 15:17:21 +02:00
Colin Goodheart-Smithe 0114f0061c Removes version 2.x constants from Version (#24011)
* Removes version 2.x constants from Version

Closes #21887

* Addresses review comments
2017-04-11 08:31:22 +01:00