Commit Graph

430 Commits

Author SHA1 Message Date
Gytis Šk 86bffa870b Update fuzzy-query.asciidoc (#28032) 2018-01-01 08:44:04 +01:00
Mayya Sharipova dcde895f49
Introduce limit to the number of terms in Terms Query (#27968)
- Introduce index level settings to control the maximum number of terms
    that can be used in a Terms Query
- Throw an error if a request exceeds this max number

Closes #18829
2017-12-28 17:36:29 -05:00
Vlad Holubiev 7b14e4b8e0 [DOCS] Remove extra word (#27989) 2017-12-26 16:24:29 +00:00
Adrien Grand 1b660821a2
Allow `_doc` as a type. (#27816)
Allowing `_doc` as a type will enable users to make the transition to 7.0
smoother since the index APIs will be `PUT index/_doc/id` and `POST index/_doc`.
This also moves most of the documentation to `_doc` as a type name.

Closes #27750
Closes #27751
2017-12-14 17:47:53 +01:00
Martijn van Groningen 6cda5b292c
docs: add paragraph about using `percolate` query in a filter context 2017-12-01 10:55:01 +01:00
Simon Willnauer fadbe0de08
Automatically prepare indices for splitting (#27451)
Today we require users to prepare their indices for split operations.
Yet, we can do this automatically when an index is created which would
make the split feature a much more appealing option since it doesn't have
any 3rd party prerequisites anymore.

This change automatically sets the number of routinng shards such that
an index is guaranteed to be able to split once into twice as many shards.
The number of routing shards is scaled towards the default shard limit per index
such that indices with a smaller amount of shards can be split more often than
larger ones. For instance an index with 1 or 2 shards can be split 10x
(until it approaches 1024 shards) while an index created with 128 shards can only
be split 3x by a factor of 2. Please note this is just a default value and users
can still prepare their indices with `index.number_of_routing_shards` for custom
splitting.

NOTE: this change has an impact on the document distribution since we are changing
the hash space. Documents are still uniformly distributed across all shards but since
we are artificually changing the number of buckets in the consistent hashign space
document might be hashed into different shards compared to previous versions.

This is a 7.0 only change.
2017-11-23 09:48:54 +01:00
Jim Ferenczi 53462f6499
Make fields optional in multi_match query and rely on index.query.default_field by default (#27380)
* Make fields optional in multi_match query and rely on index.query.default_field by default

This commit adds the ability to send `multi_match` query without providing any `fields`.
When no fields are provided the `multi_match` query will use the fields defined in the index setting `index.query.default_field`
(which in turns defaults to `*`).
The same behavior is already implemented in `query_string` and `simple_query_string` so this change just applies
the heuristic to `multi_match` queries.
Relying on `index.query.default_field` rather than `*` is safer for big mappings that break the 1024 field expansion limit added in 7.0 for all
text queries. For these kind of mappings the admin can change the `index.query.default_field` in order to make sure that exploratory queries using
`multi_match`, `query_string` or `simple_query_string` do not throw an exception.
2017-11-17 10:25:21 +01:00
Martijn van Groningen d805c41b28
Added new terms_set query
This query returns documents that match with at least one ore more
of the provided terms. The number of terms that must match varies
per document and is either controlled by a minimum should match
field or computed per document in a minimum should match script.

Closes #26915
2017-11-01 10:55:18 +01:00
Jim Ferenczi 792641a6e3 [Docs] #26541: add warning regarding the limit on the number of fields that can be queried at once in the multi_match query. 2017-10-30 18:03:56 +01:00
Clarkie b1ce5cf836 [Docs] Fix indentation of examples (#27168) 2017-10-30 11:56:38 +01:00
Jim Ferenczi a4105c6b4a
[Docs] Clarify `span_not` query behavior for non-overlapping matches (#27150)
Closes #27134
2017-10-30 11:29:40 +01:00
Martijn van Groningen f1e944a675
docs: describe parent/child performances 2017-10-26 11:49:13 +02:00
Alexander Kazakov 592ab043dd Change default value to true for transpositions parameter of fuzzy query (#26901) 2017-10-11 15:31:48 +02:00
Alexander Kazakov 9c95e91471 Expose `fuzzy_transpositions` parameter in fuzzy queries (#26870)
Add fuzzy_transpositions parameter to multi_match and query_string queries.
Add fuzzy_transpositions, fuzzy_prefix_length and fuzzy_max_expansions
parameters to simple_query_string query.
2017-10-05 09:01:09 +02:00
Jim Ferenczi 17b9baf5fd Clarify pure wilcard matching with `query_string` (#26814)
In 5.x pure wildcard queries `*` in `query_string` are rewritten to `exists` query for efficiency.
Though this introduced a change in the document that match such queries because
`exists` query also return documents with an empty value for the field.
This change clarifies this behavior for 5.x and beyond.

Closes #26801

* review
2017-10-04 09:55:26 +02:00
javanna dee2ae1023 [DOCS] Replace mention of string field type with text and keyword
Closes #25713
2017-09-25 11:12:06 +02:00
Christoph Büscher 86b00b84bc Remove parse field deprecations in query builders (#26711)
The `fielddata` field and the use of the `_name` field in the short syntax of the range 
query have been deprecated in 5.0 and can be removed.

The same goes for the deprecated `score_mode` field in HasParentQueryBuilder,
the deprecated `like_text`, `ids` and `docs` parameter in the `more_like_this` query,
the deprecated query name in the short version of the `regexp` query, and several
deprecated alternative field names in other query builders.
2017-09-20 16:22:21 +02:00
Christoph Büscher c7c6443b10 [Docs] "The the" is a great band, but ... (#26644)
Removing several occurrences of this typo in the docs and javadocs, seems to be
a common mistake. Corrections turn up once in a while in PRs, better to correct
some of this in one sweep.
2017-09-14 15:08:20 +02:00
Lee Hinman 2702918780 Limit the number of expanded fields it query_string and simple_query_string (#26541)
* Limit the number of expanded fields it query_string and simple_query_string

This limits the number of automatically expanded fields for the "all fields"
mode (`"default_field": "*"`) for the `query_string` and `simple_query_string`
queries to 1024 fields.

Resolves #25105

* Add blurb about limit to the docs
2017-09-08 13:37:55 -06:00
Martijn van Groningen b391425da1
Added support to the percolate query to percolate multiple documents
The percolator will add a `_percolator_document_slot` field to all percolator
hits to indicate with what document it has matched. This number matches with
the order in which the documents have been specified in the percolate query.

Also improved the support for multiple percolate queries in a search request.
2017-09-08 17:28:39 +02:00
Christoph Büscher f8fc0f3ebe [Tests] Check that quoteAnalyzer overrides analyzer in `query_string` query (#26473)
Adding a check to QueryStringQueryBuilderTests that checks the override
behaviour of `quote_analyzer`, also adding documentation explaining the use of
this parameter in `query_string` query.

Closes #25417
2017-09-02 11:53:02 +02:00
Jim Ferenczi 86d97971a4 Remove the _all metadata field (#26356)
* Remove the _all metadata field

This change removes the `_all` metadata field. This field is deprecated in 6
and cannot be activated for indices created in 6 so it can be safely removed in
the next major version (e.g. 7).
2017-08-28 17:43:59 +02:00
shaulzorea a827d545d8 [Docs] Fixing phrasing in has-parent-query.asciidoc (#26396) 2017-08-28 10:26:59 +02:00
Jim Ferenczi de1e4e0c15 Accept an array of field names and boosts in the index.query.default_field setting (#26320)
* Accept an array of field names and boosts in the index.query.default_field setting

This commit allows to define an array of field names and boosts for the index setting `index.query.default_field`.
The format is equivalent to the `fields` options of the full text search queries (e.g. field_name^boost).
This commit also makes this setting dynamically updatable.

Fixes #25946
2017-08-23 15:39:54 +02:00
Jim Ferenczi 4bce727165 Refactor simple_query_string to handle text part like multi_match and query_string (#26145)
This change is a continuation of #25726 that aligns field expansions for the simple_query_string with the query_string and multi_match query.
The main changes are:

 * For exact field name, the new behavior is to rewrite to a matchnodocs query when the field name is not found in the mapping.

 * For partial field names (with * suffix), the expansion is done only on keyword, text, date, ip and number field types. Other field types are simply ignored.

 * For all fields (*), the expansion is done on accepted field types only (see above) and metadata fields are also filtered.

The use_all_fields option is deprecated in this change and can be replaced by setting `*` in the fields parameter.
This commit also changes how text fields are analyzed. Previously the default search analyzer (or the provided analyzer) was used to analyze every text part
, ignoring the analyzer set on the field in the mapping. With this change, the field analyzer is used instead unless an analyzer has been forced in the parameter of the query.

Finally now that all full text queries can handle the special "*" expansion (`all_fields` mode), the `index.query.default_field` is now set to `*` for indices created in 6.
2017-08-21 13:12:27 +02:00
Jim Ferenczi a7e1610134 Add support for auto_generate_synonyms_phrase_query in match_query, multi_match_query, query_string and simple_query_string (#26097)
* Add support for auto_generate_synonyms_phrase_query in match_query, multi_match_query, query_string and simple_query_string

This change adds a new parameter called auto_generate_synonyms_phrase_query (defaults to true).
This option can be used in conjunction with synonym_graph token filter to generate phrase queries
when multi terms synonyms are encountered.
For example, a synonym like "ny, new york" would produce the following boolean query when "ny city" is parsed:
((ny OR "new york") AND city)

Note how the multi terms synonym "new york" produces a phrase query.
2017-08-09 12:15:09 +02:00
Ian Fisk 8cb1391f40 Docs: Use correct field name in Field Value factor docs. (#26104) 2017-08-08 16:34:20 -04:00
Jim Ferenczi 7868373069 [Docs] remove reference to the deprecated in the docs 2017-07-25 09:41:53 +02:00
Jim Ferenczi 4a9995145c [Docs]: Clarify query_string parser splits on operator 2017-07-24 18:36:16 +02:00
Jim Ferenczi c3784326eb Refactor field expansion for match, multi_match and query_string query (#25726)
This commit changes the way we handle field expansion in `match`, `multi_match` and `query_string` query.
 The main changes are:

- For exact field name, the new behavior is to rewrite to a matchnodocs query when the field name is not found in the mapping.

- For partial field names (with `*` suffix), the expansion is done only on `keyword`, `text`, `date`, `ip` and `number` field types. Other field types are simply ignored.

- For all fields (`*`), the expansion is done on accepted field types only (see above) and metadata fields are also filtered.

- The `*` notation can also be used to set `default_field` option on`query_string` query. This should replace the needs for the extra option `use_all_fields` which is deprecated in this change.

This commit also rewrites simple `*` query to matchalldocs query when all fields are requested (Fixes #25556). 

The same change should be done on `simple_query_string` for completeness.

`use_all_fields` option in `query_string` is also deprecated in this change, `default_field` should be set to `*` instead.

Relates #25551
2017-07-21 16:52:57 +02:00
Adrien Grand f1ff7f2454 Require a field when a `seed` is provided to the `random_score` function. (#25594)
We currently use fielddata on the `_id` field which is trappy, especially as we
do it implicitly. This changes the `random_score` function to use doc ids when
no seed is provided and to suggest a field when a seed is provided.

For now the change only emits a deprecation warning when no field is supplied
but this should be replaced by a strict check on 7.0.

Closes #25240
2017-07-19 14:11:15 +02:00
Simon Willnauer cb4eebcd6a Make `index` in TermsLookup mandatory (#25753)
This change removes the leniency of having a `null` index to fetch
terms from in 6.0 onwards. This feature will be deprecated in the 5.x series
and 6.0 nodes will require the index to be set.

Closes #25750
2017-07-17 18:50:30 +02:00
Martijn van Groningen c8777c4c2e
docs: Updated reference docs that `document_type` is deprecated 2017-07-14 11:07:46 +02:00
Jim Ferenczi 13da3eb53e Refactor QueryStringQuery for 6.0 (#25646)
This change refactors the query_string query to analyze the query text around logical operators of the query string the same way than a match_query/multi_match_query.
It also adds a type parameter that can be used to change the way multi fields query are built the same way than a multi_match query does.

Now that these queries share the same behavior regarding text analysis, some parameters are obsolete and have been deprecated:

split_on_whitespace: This setting is now ignored with a deprecation notice
if it is used explicitely. With this PR The query_string always splits on logical operator.
It simplifies the understanding of the other parameters that can have different meanings
depending on the value of split_on_whitespace.

auto_generate_phrase_queries: This setting is now ignored with a deprecation notice
if it is used explicitely. This setting only makes sense when the parser splits on whitespace.

use_dismax: This setting is now ignored with a deprecation notice
if it is used explicitely. The tie_breaker parameter is sufficient to handle best_fields/most_fields.

Fixes #25574
2017-07-13 15:32:17 +02:00
Simon Willnauer e81804cfa4 Add a shard filter search phase to pre-filter shards based on query rewriting (#25658)
Today if we search across a large amount of shards we hit every shard. Yet, it's quite
common to search across an index pattern for time based indices but filtering will exclude
all results outside a certain time range ie. `now-3d`. While the search can potentially hit
hundreds of shards the majority of the shards might yield 0 results since there is not document
that is within this date range. Kibana for instance does this regularly but used `_field_stats`
to optimize the indexes they need to query. Now with the deprecation of `_field_stats` and it's upcoming removal a single dashboard in kibana can potentially turn into searches hitting hundreds or thousands of shards and that can easily cause search rejections even though the most of the requests are very likely super cheap and only need a query rewriting to early terminate with 0 results.

This change adds a pre-filter phase for searches that can, if the number of shards are higher than a the `pre_filter_shard_size` threshold (defaults to 128 shards), fan out to the shards
and check if the query can potentially match any documents at all. While false positives are possible, a negative response means that no matches are possible. These requests are not subject to rejection and can greatly reduce the number of shards a request needs to hit. The approach here is preferable to the kibana approach with field stats since it correctly handles aliases and uses the correct threadpools to execute these requests. Further it's completely transparent to the user and improves scalability of elasticsearch in general on large clusters.
2017-07-12 22:19:20 +02:00
olcbean 2ba9fd2aec Remove deprecated created and found from index, delete and bulk (#25516)
The created and found fields in index and delete responses became obsolete after the introduction of the result field in index, update and delete responses (#19566).

After deprecating the created and found fields in 5.x (#19633), now they are removed.

Fixes #19630
2017-07-07 13:58:46 -04:00
Clinton Gormley 0170e0e8d3 Remove usage of multi-types from the docs and added a page explaining type removal (#25543)
Closes #25401
2017-07-05 12:30:19 +02:00
dkimdon fdb3a97152
Update percolate-query.asciidoc (#25364) 2017-06-23 10:39:57 +02:00
Martijn van Groningen a977569085
percolator: Deprecate `document_type` parameter.
The `document_type` parameter is no longer required to be specified,
because by default from 6.0 only a single type is allowed. (`index.mapping.single_type` defaults to `true`)
2017-06-22 09:55:06 +02:00
Adrien Grand 0c117145f6 Upgrade to lucene-7.0.0-snapshot-92b1783. (#25222)
This snapshot has faster range queries on range fields (LUCENE-7828), more
accurate norms (LUCENE-7730) and the ability to use fake term frequencies
(LUCENE-7854).
2017-06-15 09:52:07 +02:00
Ryan Ernst a03b6c2fa5 Scripting: Change keys for inline/stored scripts to source/id (#25127)
This commit adds back "id" as the key within a script to specify a
stored script (which with file scripts now gone is no longer ambiguous).
It also adds "source" as a replacement for "code". This is in an attempt
to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.
2017-06-09 08:29:25 -07:00
Andrey Groshev e4fd8485ce Made the same length of opening and closing lines (#23583) 2017-06-09 00:50:43 -07:00
David Cho-Lerat 491dc1186a Add missing word to terms-query.asciidoc (#24960) 2017-05-30 09:42:07 -04:00
David Cho-Lerat c939bcb7f5 Correct some spelling in match-phrase-prefix docs (#24956) 2017-05-30 09:02:01 -04:00
Martijn van Groningen 840da4aebf
Removed deprecated template query.
Relates to #19390
2017-05-11 14:56:45 +02:00
Adrien Grand 1be2800120 Only allow one type on 7.0 indices (#24317)
This adds the `index.mapping.single_type` setting, which enforces that indices
have at most one type when it is true. The default value is true for 6.0+ indices
and false for old indices.

Relates #15613
2017-04-27 08:43:20 +02:00
Nik Everett 416feeb7f9 Rewrite description of `bool`'s `should` (#24342)
Docs: rewrite description of `bool`'s `should`

Rewrites the description of the `bool` query's `should`
clauses so it is (hopefully) more clear what the defaults
for `minimum_should_match` are.

There is still an `[IMPORTANT]` section about `minimum_should_match`
in a filter context. I think it is worth keeping because it is, well,
important.

Closes #23831
2017-04-26 14:09:26 -04:00
Jason Tedor 4796557a30 Add primary term to doc write response
This commit adds the primary term to the doc write response.

Relates #24171
2017-04-19 14:44:22 -04:00
Adrien Grand 4632661bc7 Upgrade to a Lucene 7 snapshot (#24089)
We want to upgrade to Lucene 7 ahead of time in order to be able to check whether it causes any trouble to Elasticsearch before Lucene 7.0 gets released. From a user perspective, the main benefit of this upgrade is the enhanced support for sparse fields, whose resource consumption is now function of the number of docs that have a value rather than the total number of docs in the index.

Some notes about the change:
 - it includes the deprecation of the `disable_coord` parameter of the `bool` and `common_terms` queries: Lucene has removed support for coord factors
 - it includes the deprecation of the `index.similarity.base` expert setting, since it was only useful to configure coords and query norms, which have both been removed
 - two tests have been marked with `@AwaitsFix` because of #23966, which we intend to address after the merge
2017-04-18 15:17:21 +02:00
Nik Everett 048191ceb6 CONSOLEify highlighting a function_score docs
Converts many of the partial examples into full search requests.

Relates #18160
2017-04-06 08:13:56 -04:00