* Replace custom type names with _doc in REST examples.
* Avoid using two mapping types in the percolator docs.
* Rename doc -> _doc in the main repository README.
* Also replace some custom type names in the HLRC docs.
The original statement "Runs a match_phrase query on each field and combines the _score from each field." for the phrase type is a but misleading. The phrase type behaves like the best_fields type and does not combine the scores of each fields.
Stating that the Fuzzy Query generates "all possible" matching terms is misleading, given that the query's default behavior is to generate a maximum of 50 matching terms.
(cherry picked from commit 345a0071a2a41fd7f80ae9ef8a39a2cb4991aedd)
This field is similar to the `feature` field but is better suited to index
sparse feature vectors. A use-case for this field could be to record topics
associated with every documents alongside a metric that quantifies how well
the topic is connected to this document, and then boost queries based on the
topics that the logged user is interested in.
Relates #27552
By default span_multi query will limit term expansions = boolean max clause.
This will limit high heap usage in case of high cardinality term
expansions. This applies only if top_terms_N is not used in inner multi
query.
* Fix index prefixes to work with span_multi
Text fields that use `index_prefixes` can rewrite `prefix` queries into
`term` queries internally. This commit fix the handling of this rewriting
in the `span_multi` query.
This change also copies the index options of the text field into the
prefix field in order to be able to run positional queries. This is mandatory
for `span_multi` to work but this could also be useful to optimize `match_phrase_prefix`
queries in a follow up. Note that this change can only be done on indices created
after 6.3 since we set the index options to doc only in this version.
Fixes#31056
Treats geohashes as grid cells instead of just points when the
geohashes are used to specify the edges in the geo_bounding_box
query. For example, if a geohash is used to specify the top_left
corner, the top left corner of the geohash cell will be used as the
corner of the bounding box.
Closes#25154
The current documentation isn't very clear about how incomplete dates are
treated when specifying custom formats in a `range` query. This change adds a
note explaining how missing month or year coordinates translate to dates that
have the missings slots filled with unix time start date (1970-01-01)
Closes#30634
Lucene has a new `FeatureField` which gives the ability to record numeric
features as term frequencies. Its main benefit is that it allows to boost
queries with the values of these features and efficiently skip non-competitive
documents at the same time using block-max WAND and indexed impacts.
The geo_bounding_box query might produce false positives alongside
the right and upper edges and false negatives alongside left and
bottom edges. This commit documents the behavior and defines the
maximum error.
Closes#29196
This commit changes the default out-of-the-box configuration for the
number of shards from five to one. We think this will help address a
common problem of oversharding. For users with time-based indices that
need a different default, this can be managed with index templates. For
users with non-time-based indices that find they need to re-shard with
the split API in place they no longer need to resort only to
reindexing.
Since this has the impact of changing the default number of shards used
in REST tests, we want to ensure that we still have coverage for issues
that could arise from multiple shards. As such, we randomize (rarely)
the default number of shards in REST tests to two. This is managed via a
global index template. However, some tests check the templates that are
in the cluster state during the test. Since this template is randomly
there, we need a way for tests to skip adding the template used to set
the number of shards to two. For this we add the default_shards feature
skip. To avoid having to write our docs in a complicated way because
sometimes they might be behind one shard, and sometimes they might be
behind two shards we apply the default_shards feature skip to all docs
tests. That is, these tests will always run with the default number of
shards (one).
Some features have been deprecated since `6.0` like the `_parent` field or the
ability to have multiple types per index. This allows to remove quite some
code, which in-turn will hopefully make it easier to proceed with the removal
of types.
This commit removes some parameters deprecated in 6.x (or 5.x):
`use_dismax`, `split_on_whitespace`, `all_fields` and `lowercase_expanded_terms`.
Closes#25551
* Clarifies how the query_string splits textual part to build a query
Whitespaces are not considered as operators anymore in 6x but the documentation is not clear about it.
This commit changes the example in the documentation and adds a note regarding whitespaces and operators.
Closes#28719
* Reject regex search if regex string is too long (#28344)
* Add docs
* Introduce index level setting `index.max_regex_length`
to control the maximum length of the regular expression
Closes#28344
The `terms` query is really designed for filtering and highlighting it might
cause performance issues if it wraps many terms, so I am documenting
highlighting these queries as a best-effort only.
Closes#28099
- Introduce index level settings to control the maximum number of terms
that can be used in a Terms Query
- Throw an error if a request exceeds this max number
Closes#18829
Allowing `_doc` as a type will enable users to make the transition to 7.0
smoother since the index APIs will be `PUT index/_doc/id` and `POST index/_doc`.
This also moves most of the documentation to `_doc` as a type name.
Closes#27750Closes#27751
Today we require users to prepare their indices for split operations.
Yet, we can do this automatically when an index is created which would
make the split feature a much more appealing option since it doesn't have
any 3rd party prerequisites anymore.
This change automatically sets the number of routinng shards such that
an index is guaranteed to be able to split once into twice as many shards.
The number of routing shards is scaled towards the default shard limit per index
such that indices with a smaller amount of shards can be split more often than
larger ones. For instance an index with 1 or 2 shards can be split 10x
(until it approaches 1024 shards) while an index created with 128 shards can only
be split 3x by a factor of 2. Please note this is just a default value and users
can still prepare their indices with `index.number_of_routing_shards` for custom
splitting.
NOTE: this change has an impact on the document distribution since we are changing
the hash space. Documents are still uniformly distributed across all shards but since
we are artificually changing the number of buckets in the consistent hashign space
document might be hashed into different shards compared to previous versions.
This is a 7.0 only change.
* Make fields optional in multi_match query and rely on index.query.default_field by default
This commit adds the ability to send `multi_match` query without providing any `fields`.
When no fields are provided the `multi_match` query will use the fields defined in the index setting `index.query.default_field`
(which in turns defaults to `*`).
The same behavior is already implemented in `query_string` and `simple_query_string` so this change just applies
the heuristic to `multi_match` queries.
Relying on `index.query.default_field` rather than `*` is safer for big mappings that break the 1024 field expansion limit added in 7.0 for all
text queries. For these kind of mappings the admin can change the `index.query.default_field` in order to make sure that exploratory queries using
`multi_match`, `query_string` or `simple_query_string` do not throw an exception.