* [Geo] Expose BKDBackedGeoShapes as new VECTOR strategy
This commit exposes lucene's LatLonShape field as a new
strategy in GeoShapeFieldMapper. To use the new indexing
approach, strategy should be set to "vector" in the
geo_shape field mapper. If the tree parameter is set
the mapper will throw an IAE. Note the following:
When using vector strategy:
* geo_shape query does not support querying by POINT,
MULTIPOINT, or GEOMETRYCOLLECTION.
* LINESTRING and MULTILINESTRING queries do not support
WITHIN relation.
* CONTAINS relation is not supported.
* The tree, precision, tree_levels, distance_error_pct,
and points_only parameters will not throw an exception
but they have no effect and will be marked as
deprecated..
All other features are supported.
* revert change to PercolatorFieldMapper
* fix ExistsQuery for geo_shape vector strategy
* add deprecation logging for tree, precision, tree_levels, distance_error_pct, and points_only
* initial update to geoshape docs, including mapping migration updates
* initial support for GeoCollection queries
* fix docs and javadoc errors
* clean up geocollection queries
* set deprecated mapping tests to NOTCONSOLE
* fix geo-shape mapper asciidoc mapping and test warnings
* add support for point queries using LatLonShapeBoundingBoxQuery
* update GeoShapeQueryBuilderTests to include POINT queries for VECTOR strategy. Other comment cleanups
* add lucene geometry build testing to ShapeBuilder tests
* remove deprecated prefix tree mapping from geo-shape.asciidoc
* refactor GeoShapeFieldMapper into LegacyGeoShapeFieldMapper and GeoShapeFieldMapper
Both classes derive from BaseGeoShapeFieldMapper that provides shared parameters:
coerce, ignoreMalformed, ignore_z_value, orientation.
* update docs to remove vector strategy
* fix GeometryCollectionBuilder#buildLucene to return the object created by the shape builder
* fix LineLength failure in GeoJsonShapeParserTests
* ShapeMapper refactor changes from PR feedback
* fix typo in geo-shape.asciidoc
* ignore circle test in docs
* update indexing-approach ref to geoshape-indexing-approach
* add warnings check for LegacyGeoShapeFieldMapper to AbstractBuilderTestCase
* fix deprecatedParameters setup
* update indexing approach
* fixing unexpected warnings failures
* move orientation back to field type
* remove if in LegacyGeoShapeFieldMapper#doXContent. Fix GeoShapeFieldMapper to work with double array as a point
* fix indexing-approach link in circle section of geoshape docs
* add strategy to deprecation warnings check
* fix test failures
* fix typo in QueryStringQueryBuilderTests
* fix total hits to totalHits().value
* fix version number
* add version check to BaseGeoShapeFieldMapper
* fix line length!
* revert version check in BaseGeoShapeFieldMapper
* Fix serialization of mappings of legacy shapes.
This commit exposes lucene's LatLonShape field as the
default type in GeoShapeFieldMapper. To use the new
indexing approach, simply set "type" : "geo_shape" in
the mappings without setting any of the strategy, precision,
tree_levels, or distance_error_pct parameters. Note the
following when using the new indexing approach:
* geo_shape query does not support querying by
MULTIPOINT.
* LINESTRING and MULTILINESTRING queries do not
yet support WITHIN relation.
* CONTAINS relation is not yet supported.
The tree, precision, tree_levels, distance_error_pct,
and points_only parameters are deprecated.
* Add IntervalQueryBuilder with support for match and combine intervals
* Add relative intervals
* feedback
* YAML test - broekn
* yaml test; begin to add block source
* Add block; make disjunction its own source
* WIP
* Extract IntervalBuilder and add tests for it
* Fix eq/hashcode in Disjunction
* New yaml test
* checkstyle
* license headers
* test fix
* YAML format
* YAML formatting again
* yaml tests; javadoc
* Add OR test -> requires fix from LUCENE-8586
* Add docs
* Re-do API
* Clint's API
* Delete bash script
* doc fixes
* imports
* docs
* test fix
* feedback
* comma
* docs fixes
* Tidy up doc references to old rule
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).
Relates #33028
When building a query Lucene distinguishes two cases, queries that require to produce a score and queries that only need to match. We cloned this mechanism in the QueryBuilders in order to be able to produce different queries based on whether they need to produce a score or not. However the only case in es that require this distinction is the BoolQueryBuilder that sets a different minimum_should_match when a `bool` query is built in a filter context..
This behavior doesn't seem right because it makes the matching of `should` clauses different when the score is not required.
Closes#35293
* Forbid negative scores in functon_score query
- Throw an exception when scores are negative in field_value_factor
function
- Throw an exception when scores are negative in script_score
function
Relates to #33309
This commit moves the documentation and examples for the `index_prefixes`
option on text fields to its own file, to bring it in line with other mapping
parameters, and expands a bit on both.
Currently we introduced a hard limit of 1024 to the number of fields a query can
be expanded to in #26541. Instead of using a hard limit, we should make this
configurable. This change removes the hard limit check and uses the existing
`max_clause_count` setting instead.
Closes#34778
The `random_score` function produces values between 0 (inclusive) and 1
(exclusive) and documented it with fancy methematical range notation. It
is so fancy I thought it was a typo. This changes the documentation to
use words.
Relates to #35084
* Replace custom type names with _doc in REST examples.
* Avoid using two mapping types in the percolator docs.
* Rename doc -> _doc in the main repository README.
* Also replace some custom type names in the HLRC docs.
The original statement "Runs a match_phrase query on each field and combines the _score from each field." for the phrase type is a but misleading. The phrase type behaves like the best_fields type and does not combine the scores of each fields.
Stating that the Fuzzy Query generates "all possible" matching terms is misleading, given that the query's default behavior is to generate a maximum of 50 matching terms.
(cherry picked from commit 345a0071a2a41fd7f80ae9ef8a39a2cb4991aedd)
This field is similar to the `feature` field but is better suited to index
sparse feature vectors. A use-case for this field could be to record topics
associated with every documents alongside a metric that quantifies how well
the topic is connected to this document, and then boost queries based on the
topics that the logged user is interested in.
Relates #27552
By default span_multi query will limit term expansions = boolean max clause.
This will limit high heap usage in case of high cardinality term
expansions. This applies only if top_terms_N is not used in inner multi
query.
* Fix index prefixes to work with span_multi
Text fields that use `index_prefixes` can rewrite `prefix` queries into
`term` queries internally. This commit fix the handling of this rewriting
in the `span_multi` query.
This change also copies the index options of the text field into the
prefix field in order to be able to run positional queries. This is mandatory
for `span_multi` to work but this could also be useful to optimize `match_phrase_prefix`
queries in a follow up. Note that this change can only be done on indices created
after 6.3 since we set the index options to doc only in this version.
Fixes#31056
Treats geohashes as grid cells instead of just points when the
geohashes are used to specify the edges in the geo_bounding_box
query. For example, if a geohash is used to specify the top_left
corner, the top left corner of the geohash cell will be used as the
corner of the bounding box.
Closes#25154
The current documentation isn't very clear about how incomplete dates are
treated when specifying custom formats in a `range` query. This change adds a
note explaining how missing month or year coordinates translate to dates that
have the missings slots filled with unix time start date (1970-01-01)
Closes#30634
Lucene has a new `FeatureField` which gives the ability to record numeric
features as term frequencies. Its main benefit is that it allows to boost
queries with the values of these features and efficiently skip non-competitive
documents at the same time using block-max WAND and indexed impacts.
The geo_bounding_box query might produce false positives alongside
the right and upper edges and false negatives alongside left and
bottom edges. This commit documents the behavior and defines the
maximum error.
Closes#29196
This commit changes the default out-of-the-box configuration for the
number of shards from five to one. We think this will help address a
common problem of oversharding. For users with time-based indices that
need a different default, this can be managed with index templates. For
users with non-time-based indices that find they need to re-shard with
the split API in place they no longer need to resort only to
reindexing.
Since this has the impact of changing the default number of shards used
in REST tests, we want to ensure that we still have coverage for issues
that could arise from multiple shards. As such, we randomize (rarely)
the default number of shards in REST tests to two. This is managed via a
global index template. However, some tests check the templates that are
in the cluster state during the test. Since this template is randomly
there, we need a way for tests to skip adding the template used to set
the number of shards to two. For this we add the default_shards feature
skip. To avoid having to write our docs in a complicated way because
sometimes they might be behind one shard, and sometimes they might be
behind two shards we apply the default_shards feature skip to all docs
tests. That is, these tests will always run with the default number of
shards (one).