Also include _type and _id for parent/child hits inside inner hits.
In the case of top_hits aggregation the nested search hits are
directly returned and are not grouped by a root or parent document, so
it is important to include the _id and _index attributes in order to know
to what documents these nested search hits belong to.
Closes#27053
* Enhances exists queries to reduce need for `_field_names`
Before this change we wrote the name all the fields in a document to a `_field_names` field and then implemented exists queries as a term query on this field. The problem with this approach is that it bloats the index and also affects indexing performance.
This change adds a new method `existsQuery()` to `MappedFieldType` which is implemented by each sub-class. For most field types if doc values are available a `DocValuesFieldExistsQuery` is used, falling back to using `_field_names` if doc values are disabled. Note that only fields where no doc values are available are written to `_field_names`.
Closes#26770
* Addresses review comments
* Addresses more review comments
* implements existsQuery explicitly on every mapper
* Reinstates ability to perform term query on `_field_names`
* Added bwc depending on index created version
* Review Comments
* Skips tests that are not supported in 6.1.0
These values will need to be changed after backporting this PR to 6.x
Today all these API calls have a sideeffect of making documents visible
to search requests. While this is sometimes desired it's an unnecessary sideeffect
and now that we have an internal (engine-private) index reader (#26972) we artificially
add a refresh call for bwc. This change removes this sideeffect in 7.0.
This change adds a fromXContent method to Settings that allows to read
the xcontent that is produced by toXContent. It also replaces the entire settings
loader infrastructure and removes the structured map representation. Future PRs will
also tackle the `getAsMap` that exposes the internal represenation of settings for
better encapsulation.
The `fielddata` field and the use of the `_name` field in the short syntax of the range
query have been deprecated in 5.0 and can be removed.
The same goes for the deprecated `score_mode` field in HasParentQueryBuilder,
the deprecated `like_text`, `ids` and `docs` parameter in the `more_like_this` query,
the deprecated query name in the short version of the `regexp` query, and several
deprecated alternative field names in other query builders.
* Moves deferring code into its own subclass
This change moves the code that deals with deferring collection to a subclass of BucketAggregator called DeferringBucketAggregator. This means that the code in AggregatorBase is simplified and also means that the code for deferring colleciton is in one place and easier to maintain.
* Makes SIngleBucketAggregator an interface
This is so aggregators that extend BucketsAggregator directly and those that extend DeferringBucketAggregator can be a single bucket aggregator
* review comments
* More review comments
We introduced a hack in #25885 to respect the cluster alias if available on the `_index` field. This is important if aggregations or other field data related operations are executed. Yet, we added a small hack that duplicated an implementation detail from the `_index` field data builder to make this work. This change adds a necessary but simple API change that allows us to remove the hack and only have a single implementation.
The goal of this similarity is to help users who would like to keep the
functionality of the `tf-idf` similarity that we want to remove, or to allow
for specific usec-cases (disabling idf, disabling tf, disabling length norm,
etc.) to not have to build a custom plugin and familiarize with the low-level
Lucene API.
This change merges the functionality of the FiltersFunctionScoreQuery in the FunctionScoreQuery.
It also ensures that an exception is thrown when the computed score is equals to Float.NaN or Float.NEGATIVE_INFINITY.
These scores are invalid for TopDocsCollectors that relies on score comparison.
Fixes#15709Fixes#23628
Today when we aggregate on the `_index` field the cross cluster search
alias is not taken into account. Neither is it respected when we search
on the field. This change adds support for cluster alias when the cluster
alias is present on the `_index` field.
Closes#25606
Indexing ids in binary form should help with indexing speed since we would
have to compare fewer bytes upon sorting, should help with memory usage of
the live version map since keys will be shorter, and might help with disk
usage depending on how efficient the terms dictionary is at compressing
terms.
Since we can only expect base64 ids in the auto-generated case, this PR tries
to use an encoding that makes the binary id equal to the base64-decoded id in
the majority of cases (253 out of 256). It also specializes numeric ids, since
this seems to be common when content that is stored in Elasticsearch comes
from another database that uses eg. auto-increment ids.
Another option could be to require base64 ids all the time. It would make things
simpler but I'm not sure users would welcome this requirement.
This PR should bring some benefits, but I expect it to be mostly useful when
coupled with something like #24615.
Closes#18154
QueryParseContext is currently only used as a wrapper for an XContentParser, so
this change removes it entirely and changes the appropriate APIs that use it so
far to only accept a parser instead.
Currently QueryParseContext is only a thin wrapper around an XContentParser that
adds little functionality of its own. I provides helpers for long deprecated
field names which can be removed and two helper methods that can be made static
and moved to other classes. This is a first step in helping to remove
QueryParseContext entirely.
This removes the remaining usage of `mapping.single_type` from the parent join
module and moves it's bwc test to the mixed cluster tests
Relates to #24961
Relates to #20257
This change cleans up remaining tests to not use index.mapping.single_type=false
but instead where applicable use a single type or markt the index as created
with a pre 6.x version.
Yet, there is still on leftover in the client tests that needs special attention.
See `org.elasticsearch.client.SearchIT`
Relates to #24961
* Add a section named "relation" in the ParentJoinFieldMapper
This commit puts the parent/child definition in an inner section named "relation".
Mapping for the parent-join will look like this:
```
"join_field": {
"type": "join"
"relations":
"parent": "child"
}
}
```
This commit adds a NamedXContentProvider interface that can
be implemented by plugins or modules using Java's SPI feature
in order to provide additional NamedXContent parsers to external
applications like the Java High Level Rest Client.
The `scorerSupplier` API allows to give a hint to queries in order to let them
know that they will be consumed in a random-access fashion. We should use this
for aggregations, function_score and matched queries.
This change moves the parent_id query to the parent-join module and handles the case when only the parent-join field can be declared on an index (index with single type on).
If single type is off it uses the legacy parent join field mapper and switch to the new one otherwise (default in 6).
Relates #20257
This change ensures that there is a single parent-join field defined per mapping.
The verification is done through the addition of a special field mapper (MetaJoinFieldMapper) with a unique name (_parent_join) that is registered to the mapping service
when the first parent-join field is defined. If a new parent-join is added, this field mapper will clash with the new one and the update will fail.
This change also simplifies the parent join fetch sub phase by retrieving the parent-join field without iterating on all fields in the mapping.
* Introduce ParentJoinFieldMapper, a field mapper that creates parent/child relation within documents of the same index
This change adds a new field mapper named ParentJoinFieldMapper. This mapper is a replacement for the ParentFieldMapper but instead of using the types in the mapping
it uses an internal field to materialize parent/child relation within a single index.
This change also adds a fetch sub phase that automatically retrieves the join name (parent or child name) and the parent id for child documents in the response hit fields.
The compatibility with `has_parent`, `has_child` queries and `children` agg will be added in a follow up.
Relates #20257
Removes the need for the `_UNRELEASED` suffix on versions by detecting if a version should be unreleased or not based on the versions around it. This should make it simpler to automate the task of adding a new version label.
This commit moves the handling of nested and parent/child inner hits to specialized classes that can be defined outside of ES core.
InnerHitBuilderContext is now used by the parent query (nested or hasChild, ...) to build the sub context from the InnerHitBuilder definition.
BWC is also ensured so that nodes in previous versions can still send/receive inner hits to/from this version.
Relates #20257
This change removes the field data specialization needed for the parent field and replaces it with
a simple DocValuesIndexFieldData. The underlying global ordinals are retrieved via a new function called
IndexOrdinalsFieldData#getOrdinalMap.
The children aggregation is also modified to use a simple WithOrdinals value source rather than the deleted WithOrdinals.Parent.
Relates #20257
This commit renames all rest test files to use the .yml extension
instead of .yaml. This way the extension used within all of
elasticsearch for yaml is consistent.
* Add parent-join module
This change adds a new module named `parent-join`.
The goal of this module is to provide a replacement for the `_parent` field but as a first step this change only moves the `has_child`, `has_parent` queries and the `children` aggregation to this module.
These queries and aggregations are no longer in core but they are deployed by default as a module.
Relates #20257