OpenSearch/docs/reference/mapping/fields/all-field.asciidoc

368 lines
9.1 KiB
Plaintext
Raw Normal View History

[[mapping-all-field]]
=== `_all` field
deprecated[6.0.0, `_all` may no longer be enabled for indices created in 6.0+, use a custom field and the mapping `copy_to` parameter]
The `_all` field is a special _catch-all_ field which concatenates the values
of all of the other fields into one big string, using space as a delimiter, which is then
<<analysis,analyzed>> and indexed, but not stored. This means that it can be
searched, but not retrieved.
The `_all` field allows you to search for values in documents without knowing
which field contains the value. This makes it a useful option when getting
started with a new dataset. For instance:
[source,js]
--------------------------------
PUT /my_index
{
"mapping": {
"user": {
"_all": {
"enabled": true <1>
}
}
}
}
PUT /my_index/user/1 <2>
{
"first_name": "John",
"last_name": "Smith",
"date_of_birth": "1970-10-24"
}
GET /my_index/_search
{
"query": {
"match": {
"_all": "john smith 1970"
}
}
}
--------------------------------
// TEST[skip:_all is no longer allowed]
// CONSOLE
<1> Enabling the `_all` field
<2> The `_all` field will contain the terms: [ `"john"`, `"smith"`, `"1970"`, `"10"`, `"24"` ]
[NOTE]
.All values treated as strings
=============================================================================
The `date_of_birth` field in the above example is recognised as a `date` field
and so will index a single term representing `1970-10-24 00:00:00 UTC`. The
`_all` field, however, treats all values as strings, so the date value is
indexed as the three string terms: `"1970"`, `"24"`, `"10"`.
It is important to note that the `_all` field combines the original values
from each field as a string. It does not combine the _terms_ from each field.
=============================================================================
2016-03-18 12:01:27 -04:00
The `_all` field is just a <<text,`text`>> field, and accepts the same
parameters that other string fields accept, including `analyzer`,
`term_vectors`, `index_options`, and `store`.
The `_all` field can be useful, especially when exploring new data using
simple filtering. However, by concatenating field values into one big string,
the `_all` field loses the distinction between short fields (more relevant)
and long fields (less relevant). For use cases where search relevance is
important, it is better to query individual fields specifically.
The `_all` field is not free: it requires extra CPU cycles and uses more disk
space. For this reason, it is disabled by default. If not needed, it can be
<<enabling-all-field,enabled>> or customised on a <<include-in-all,per-field
basis>>.
[[querying-all-field]]
==== Using the `_all` field in queries
The <<query-dsl-query-string-query,`query_string`>> and
<<query-dsl-simple-query-string-query,`simple_query_string`>> queries query the
`_all` field by default if it is enabled, unless another field is specified:
[source,js]
--------------------------------
GET _search
{
"query": {
"query_string": {
"query": "john smith new york"
}
}
}
--------------------------------
// CONSOLE
The same goes for the `?q=` parameter in <<search-uri-request, URI search
requests>> (which is rewritten to a `query_string` query internally):
[source,js]
--------------------------------
GET _search?q=john+smith+new+york
--------------------------------
Other queries, such as the <<query-dsl-match-query,`match`>> and
<<query-dsl-term-query,`term`>> queries require you to specify the `_all` field
explicitly, as per the <<mapping-all-field,first example>>.
[[enabling-all-field]]
==== Enabling the `_all` field
The `_all` field can be enabled per-type by setting `enabled` to `true`:
[source,js]
--------------------------------
PUT my_index
{
"mappings": {
"type_1": { <1>
"properties": {...}
},
"type_2": { <2>
"_all": {
"enabled": true
},
"properties": {...}
}
}
}
--------------------------------
// TEST[s/\.\.\.//]
// TEST[skip:_all is no longer allowed]
// CONSOLE
<1> The `_all` field in `type_1` is disabled.
<2> The `_all` field in `type_2` is enabled.
If the `_all` field is enabled, then URI search requests and the `query_string`
and `simple_query_string` queries can automatically use it for queries (see
<<querying-all-field>>). You can configure them to use a different field with
the `index.query.default_field` setting:
[source,js]
--------------------------------
PUT my_index
{
"mappings": {
"my_type": {
"properties": {
"content": {
2016-03-18 12:01:27 -04:00
"type": "text"
}
}
}
},
"settings": {
"index.query.default_field": "content" <1>
}
}
--------------------------------
// CONSOLE
<1> The `query_string` query will default to querying the `content` field in this index.
[[excluding-from-all]]
==== Excluding fields from `_all`
Individual fields can be included or excluded from the `_all` field with the
<<include-in-all,`include_in_all`>> setting.
[[all-field-and-boosting]]
==== Index boosting and the `_all` field
Individual fields can be _boosted_ at index time, with the <<mapping-boost,`boost`>>
parameter. The `_all` field takes these boosts into account:
[source,js]
--------------------------------
PUT myindex
{
"mappings": {
"mytype": {
"_all": {"enabled": true},
"properties": {
"title": { <1>
2016-03-18 12:01:27 -04:00
"type": "text",
"boost": 2
},
"content": { <1>
2016-03-18 12:01:27 -04:00
"type": "text"
}
}
}
}
}
--------------------------------
// TEST[skip:_all is no longer allowed]
// CONSOLE
<1> When querying the `_all` field, words that originated in the
`title` field are twice as relevant as words that originated in
the `content` field.
WARNING: Using index-time boosting with the `_all` field has a significant
impact on query performance. Usually the better solution is to query fields
individually, with optional query time boosting.
[[custom-all-fields]]
==== Custom `_all` fields
While there is only a single `_all` field per index, the <<copy-to,`copy_to`>>
parameter allows the creation of multiple __custom `_all` fields__. For
instance, `first_name` and `last_name` fields can be combined together into
the `full_name` field:
[source,js]
--------------------------------
PUT myindex
{
"mappings": {
"mytype": {
"properties": {
"first_name": {
2016-03-18 12:01:27 -04:00
"type": "text",
"copy_to": "full_name" <1>
},
"last_name": {
2016-03-18 12:01:27 -04:00
"type": "text",
"copy_to": "full_name" <1>
},
"full_name": {
2016-03-18 12:01:27 -04:00
"type": "text"
}
}
}
}
}
PUT myindex/mytype/1
{
"first_name": "John",
"last_name": "Smith"
}
GET myindex/_search
{
"query": {
"match": {
"full_name": "John Smith"
}
}
}
--------------------------------
// CONSOLE
<1> The `first_name` and `last_name` values are copied to the `full_name` field.
[[highlighting-all-field]]
==== Highlighting and the `_all` field
A field can only be used for <<search-request-highlighting,highlighting>> if
the original string value is available, either from the
<<mapping-source-field,`_source`>> field or as a stored field.
The `_all` field is not present in the `_source` field and it is not stored or
enabled by default, and so cannot be highlighted. There are two options. Either
<<all-field-store,store the `_all` field>> or highlight the
<<all-highlight-fields,original fields>>.
[[all-field-store]]
===== Store the `_all` field
If `store` is set to `true`, then the original field value is retrievable and
can be highlighted:
[source,js]
--------------------------------
PUT myindex
{
"mappings": {
"mytype": {
"_all": {
"enabled": true,
"store": true
}
}
}
}
PUT myindex/mytype/1
{
"first_name": "John",
"last_name": "Smith"
}
GET _search
{
"query": {
"match": {
"_all": "John Smith"
}
},
"highlight": {
"fields": {
"_all": {}
}
}
}
--------------------------------
// TEST[skip:_all is no longer allowed]
// CONSOLE
Of course, enabling and storing the `_all` field will use significantly more
disk space and, because it is a combination of other fields, it may result in
odd highlighting results.
The `_all` field also accepts the `term_vector` and `index_options`
parameters, allowing the use of the fast vector highlighter and the postings
highlighter.
[[all-highlight-fields]]
===== Highlight original fields
You can query the `_all` field, but use the original fields for highlighting as follows:
[source,js]
--------------------------------
PUT myindex
{
"mappings": {
"mytype": {
"_all": {"enabled": true}
}
}
}
PUT myindex/mytype/1
{
"first_name": "John",
"last_name": "Smith"
}
GET _search
{
"query": {
"match": {
"_all": "John Smith" <1>
}
},
"highlight": {
"fields": {
"*_name": { <2>
"require_field_match": false <3>
}
}
}
}
--------------------------------
// TEST[skip:_all is no longer allowed]
// CONSOLE
<1> The query inspects the `_all` field to find matching documents.
<2> Highlighting is performed on the two name fields, which are available from the `_source`.
<3> The query wasn't run against the name fields, so set `require_field_match` to `false`.