456 lines
13 KiB
Plaintext
456 lines
13 KiB
Plaintext
[[search-fields]]
|
||
== Retrieve selected fields from a search
|
||
++++
|
||
<titleabbrev>Retrieve selected fields</titleabbrev>
|
||
++++
|
||
|
||
By default, each hit in the search response includes the document
|
||
<<mapping-source-field,`_source`>>, which is the entire JSON object that was
|
||
provided when indexing the document. To retrieve specific fields in the search
|
||
response, you can use the `fields` parameter:
|
||
|
||
[source,console]
|
||
----
|
||
POST my-index-000001/_search
|
||
{
|
||
"query": {
|
||
"match": {
|
||
"message": "foo"
|
||
}
|
||
},
|
||
"fields": ["user.id", "@timestamp"],
|
||
"_source": false
|
||
}
|
||
----
|
||
// TEST[setup:my_index]
|
||
|
||
The `fields` parameter consults both a document's `_source` and the index
|
||
mappings to load and return values. Because it makes use of the mappings,
|
||
`fields` has some advantages over referencing the `_source` directly: it
|
||
accepts <<multi-fields, multi-fields>> and <<alias, field aliases>>, and
|
||
also formats field values like dates in a consistent way.
|
||
|
||
A document's `_source` is stored as a single field in Lucene. So the whole
|
||
`_source` object must be loaded and parsed even if only a small number of
|
||
fields are requested. To avoid this limitation, you can try another option for
|
||
loading fields:
|
||
|
||
* Use the <<docvalue-fields, `docvalue_fields`>>
|
||
parameter to get values for selected fields. This can be a good
|
||
choice when returning a fairly small number of fields that support doc values,
|
||
such as keywords and dates.
|
||
* Use the <<request-body-search-stored-fields, `stored_fields`>> parameter to
|
||
get the values for specific stored fields (fields that use the
|
||
<<mapping-store,`store`>> mapping option).
|
||
|
||
If needed, you can use the <<script-fields,`script_field`>> parameter to
|
||
transform field values in the response using a script. However, scripts can’t
|
||
make use of {es}'s index structures or related optimizations. This can sometimes
|
||
result in slower search speeds.
|
||
|
||
You can find more detailed information on each of these methods in the
|
||
following sections:
|
||
|
||
* <<search-fields-param>>
|
||
* <<docvalue-fields>>
|
||
* <<stored-fields>>
|
||
* <<source-filtering>>
|
||
* <<script-fields>>
|
||
|
||
[discrete]
|
||
[[search-fields-param]]
|
||
=== Fields
|
||
|
||
The `fields` parameter allows for retrieving a list of document fields in
|
||
the search response. It consults both the document `_source` and the index
|
||
mappings to return each value in a standardized way that matches its mapping
|
||
type. By default, date fields are formatted according to the
|
||
<<mapping-date-format,date format>> parameter in their mappings.
|
||
|
||
The following search request uses the `fields` parameter to retrieve values
|
||
for the `user.id` field, all fields starting with `http.response.`, and the
|
||
`@timestamp` field:
|
||
|
||
[source,console]
|
||
----
|
||
POST my-index-000001/_search
|
||
{
|
||
"query": {
|
||
"match": {
|
||
"user.id": "kimchy"
|
||
}
|
||
},
|
||
"fields": [
|
||
"user.id",
|
||
"http.response.*", <1>
|
||
{
|
||
"field": "@timestamp",
|
||
"format": "epoch_millis" <2>
|
||
}
|
||
],
|
||
"_source": false
|
||
}
|
||
----
|
||
// TEST[setup:my_index]
|
||
|
||
<1> Both full field names and wildcard patterns are accepted.
|
||
<2> Using object notation, you can pass a `format` parameter to apply a custom
|
||
format for the field's values. The date fields
|
||
<<date,`date`>> and <<date_nanos, `date_nanos`>> accept a
|
||
<<mapping-date-format,date format>>. <<spatial_datatypes, Spatial fields>>
|
||
accept either `geojson` for http://www.geojson.org[GeoJSON] (the default)
|
||
or `wkt` for
|
||
{wikipedia}/Well-known_text_representation_of_geometry[Well Known Text].
|
||
Other field types do not support the `format` parameter.
|
||
|
||
The values are returned as a flat list in the `fields` section in each hit:
|
||
|
||
[source,console-result]
|
||
----
|
||
{
|
||
"took" : 2,
|
||
"timed_out" : false,
|
||
"_shards" : {
|
||
"total" : 1,
|
||
"successful" : 1,
|
||
"skipped" : 0,
|
||
"failed" : 0
|
||
},
|
||
"hits" : {
|
||
"total" : {
|
||
"value" : 1,
|
||
"relation" : "eq"
|
||
},
|
||
"max_score" : 1.0,
|
||
"hits" : [
|
||
{
|
||
"_index" : "my-index-000001",
|
||
"_id" : "0",
|
||
"_score" : 1.0,
|
||
"_type" : "_doc",
|
||
"fields" : {
|
||
"user.id" : [
|
||
"kimchy"
|
||
],
|
||
"@timestamp" : [
|
||
"4098435132000"
|
||
],
|
||
"http.response.bytes": [
|
||
1070000
|
||
],
|
||
"http.response.status_code": [
|
||
200
|
||
]
|
||
}
|
||
}
|
||
]
|
||
}
|
||
}
|
||
----
|
||
// TESTRESPONSE[s/"took" : 2/"took": $body.took/]
|
||
// TESTRESPONSE[s/"max_score" : 1.0/"max_score" : $body.hits.max_score/]
|
||
// TESTRESPONSE[s/"_score" : 1.0/"_score" : $body.hits.hits.0._score/]
|
||
|
||
Only leaf fields are returned -- `fields` does not allow for fetching entire
|
||
objects.
|
||
|
||
The `fields` parameter handles field types like <<alias, field aliases>> and
|
||
<<constant-keyword-field-type, `constant_keyword`>> whose values aren't always present in
|
||
the `_source`. Other mapping options are also respected, including
|
||
<<ignore-above, `ignore_above`>>, <<ignore-malformed, `ignore_malformed`>> and
|
||
<<null-value, `null_value`>>.
|
||
|
||
NOTE: The `fields` response always returns an array of values for each field,
|
||
even when there is a single value in the `_source`. This is because {es} has
|
||
no dedicated array type, and any field could contain multiple values. The
|
||
`fields` parameter also does not guarantee that array values are returned in
|
||
a specific order. See the mapping documentation on <<array, arrays>> for more
|
||
background.
|
||
|
||
|
||
|
||
[discrete]
|
||
[[docvalue-fields]]
|
||
=== Doc value fields
|
||
|
||
You can use the <<docvalue-fields,`docvalue_fields`>> parameter to return
|
||
<<doc-values,doc values>> for one or more fields in the search response.
|
||
|
||
Doc values store the same values as the `_source` but in an on-disk,
|
||
column-based structure that's optimized for sorting and aggregations. Since each
|
||
field is stored separately, {es} only reads the field values that were requested
|
||
and can avoid loading the whole document `_source`.
|
||
|
||
Doc values are stored for supported fields by default. However, doc values are
|
||
not supported for <<text,`text`>> or
|
||
{plugins}/mapper-annotated-text-usage.html[`text_annotated`] fields.
|
||
|
||
The following search request uses the `docvalue_fields` parameter to retrieve
|
||
doc values for the `user.id` field, all fields starting with `http.response.`, and the
|
||
`@timestamp` field:
|
||
|
||
[source,console]
|
||
----
|
||
GET my-index-000001/_search
|
||
{
|
||
"query": {
|
||
"match": {
|
||
"user.id": "kimchy"
|
||
}
|
||
},
|
||
"docvalue_fields": [
|
||
"user.id",
|
||
"http.response.*", <1>
|
||
{
|
||
"field": "date",
|
||
"format": "epoch_millis" <2>
|
||
}
|
||
]
|
||
}
|
||
----
|
||
// TEST[setup:my_index]
|
||
|
||
<1> Both full field names and wildcard patterns are accepted.
|
||
<2> Using object notation, you can pass a `format` parameter to apply a custom
|
||
format for the field's doc values. <<date,Date fields>> support a
|
||
<<mapping-date-format,date `format`>>. <<number,Numeric fields>> support a
|
||
https://docs.oracle.com/javase/8/docs/api/java/text/DecimalFormat.html[DecimalFormat
|
||
pattern]. Other field datatypes do not support the `format` parameter.
|
||
|
||
TIP: You cannot use the `docvalue_fields` parameter to retrieve doc values for
|
||
nested objects. If you specify a nested object, the search returns an empty
|
||
array (`[ ]`) for the field. To access nested fields, use the
|
||
<<inner-hits, `inner_hits`>> parameter's `docvalue_fields`
|
||
property.
|
||
|
||
[discrete]
|
||
[[stored-fields]]
|
||
=== Stored fields
|
||
|
||
It's also possible to store an individual field's values by using the
|
||
<<mapping-store,`store`>> mapping option. You can use the
|
||
`stored_fields` parameter to include these stored values in the search response.
|
||
|
||
WARNING: The `stored_fields` parameter is for fields that are explicitly marked as
|
||
stored in the mapping, which is off by default and generally not recommended.
|
||
Use <<source-filtering,source filtering>> instead to select
|
||
subsets of the original source document to be returned.
|
||
|
||
Allows to selectively load specific stored fields for each document represented
|
||
by a search hit.
|
||
|
||
[source,console]
|
||
--------------------------------------------------
|
||
GET /_search
|
||
{
|
||
"stored_fields" : ["user", "postDate"],
|
||
"query" : {
|
||
"term" : { "user" : "kimchy" }
|
||
}
|
||
}
|
||
--------------------------------------------------
|
||
|
||
`*` can be used to load all stored fields from the document.
|
||
|
||
An empty array will cause only the `_id` and `_type` for each hit to be
|
||
returned, for example:
|
||
|
||
[source,console]
|
||
--------------------------------------------------
|
||
GET /_search
|
||
{
|
||
"stored_fields" : [],
|
||
"query" : {
|
||
"term" : { "user" : "kimchy" }
|
||
}
|
||
}
|
||
--------------------------------------------------
|
||
|
||
If the requested fields are not stored (`store` mapping set to `false`), they will be ignored.
|
||
|
||
Stored field values fetched from the document itself are always returned as an array. On the contrary, metadata fields like `_routing` are never returned as an array.
|
||
|
||
Also only leaf fields can be returned via the `stored_fields` option. If an object field is specified, it will be ignored.
|
||
|
||
NOTE: On its own, `stored_fields` cannot be used to load fields in nested
|
||
objects -- if a field contains a nested object in its path, then no data will
|
||
be returned for that stored field. To access nested fields, `stored_fields`
|
||
must be used within an <<inner-hits, `inner_hits`>> block.
|
||
|
||
[discrete]
|
||
[[disable-stored-fields]]
|
||
==== Disable stored fields
|
||
|
||
To disable the stored fields (and metadata fields) entirely use: `_none_`:
|
||
|
||
[source,console]
|
||
--------------------------------------------------
|
||
GET /_search
|
||
{
|
||
"stored_fields": "_none_",
|
||
"query" : {
|
||
"term" : { "user" : "kimchy" }
|
||
}
|
||
}
|
||
--------------------------------------------------
|
||
|
||
NOTE: <<source-filtering,`_source`>> and <<request-body-search-version, `version`>> parameters cannot be activated if `_none_` is used.
|
||
|
||
[discrete]
|
||
[[source-filtering]]
|
||
=== Source filtering
|
||
|
||
You can use the `_source` parameter to select what fields of the source are
|
||
returned. This is called _source filtering_.
|
||
|
||
The following search API request sets the `_source` request body parameter to
|
||
`false`. The document source is not included in the response.
|
||
|
||
[source,console]
|
||
----
|
||
GET /_search
|
||
{
|
||
"_source": false,
|
||
"query": {
|
||
"match": {
|
||
"user.id": "kimchy"
|
||
}
|
||
}
|
||
}
|
||
----
|
||
|
||
To return only a subset of source fields, specify a wildcard (`*`) pattern in
|
||
the `_source` parameter. The following search API request returns the source for
|
||
only the `obj` field and its properties.
|
||
|
||
[source,console]
|
||
----
|
||
GET /_search
|
||
{
|
||
"_source": "obj.*",
|
||
"query": {
|
||
"match": {
|
||
"user.id": "kimchy"
|
||
}
|
||
}
|
||
}
|
||
----
|
||
|
||
You can also specify an array of wildcard patterns in the `_source` field. The
|
||
following search API request returns the source for only the `obj1` and
|
||
`obj2` fields and their properties.
|
||
|
||
[source,console]
|
||
----
|
||
GET /_search
|
||
{
|
||
"_source": [ "obj1.*", "obj2.*" ],
|
||
"query": {
|
||
"match": {
|
||
"user.id": "kimchy"
|
||
}
|
||
}
|
||
}
|
||
----
|
||
|
||
For finer control, you can specify an object containing arrays of `includes` and
|
||
`excludes` patterns in the `_source` parameter.
|
||
|
||
If the `includes` property is specified, only source fields that match one of
|
||
its patterns are returned. You can exclude fields from this subset using the
|
||
`excludes` property.
|
||
|
||
If the `includes` property is not specified, the entire document source is
|
||
returned, excluding any fields that match a pattern in the `excludes` property.
|
||
|
||
The following search API request returns the source for only the `obj1` and
|
||
`obj2` fields and their properties, excluding any child `description` fields.
|
||
|
||
[source,console]
|
||
----
|
||
GET /_search
|
||
{
|
||
"_source": {
|
||
"includes": [ "obj1.*", "obj2.*" ],
|
||
"excludes": [ "*.description" ]
|
||
},
|
||
"query": {
|
||
"term": {
|
||
"user.id": "kimchy"
|
||
}
|
||
}
|
||
}
|
||
----
|
||
|
||
[discrete]
|
||
[[script-fields]]
|
||
=== Script fields
|
||
|
||
You can use the `script_fields` parameter to retrieve a <<modules-scripting,script
|
||
evaluation>> (based on different fields) for each hit. For example:
|
||
|
||
[source,console]
|
||
--------------------------------------------------
|
||
GET /_search
|
||
{
|
||
"query": {
|
||
"match_all": {}
|
||
},
|
||
"script_fields": {
|
||
"test1": {
|
||
"script": {
|
||
"lang": "painless",
|
||
"source": "doc['price'].value * 2"
|
||
}
|
||
},
|
||
"test2": {
|
||
"script": {
|
||
"lang": "painless",
|
||
"source": "doc['price'].value * params.factor",
|
||
"params": {
|
||
"factor": 2.0
|
||
}
|
||
}
|
||
}
|
||
}
|
||
}
|
||
--------------------------------------------------
|
||
// TEST[setup:sales]
|
||
|
||
Script fields can work on fields that are not stored (`price` in
|
||
the above case), and allow to return custom values to be returned (the
|
||
evaluated value of the script).
|
||
|
||
Script fields can also access the actual `_source` document and
|
||
extract specific elements to be returned from it by using `params['_source']`.
|
||
Here is an example:
|
||
|
||
[source,console]
|
||
--------------------------------------------------
|
||
GET /_search
|
||
{
|
||
"query" : {
|
||
"match_all": {}
|
||
},
|
||
"script_fields" : {
|
||
"test1" : {
|
||
"script" : "params['_source']['message']"
|
||
}
|
||
}
|
||
}
|
||
--------------------------------------------------
|
||
// TEST[setup:my_index]
|
||
|
||
Note the `_source` keyword here to navigate the json-like model.
|
||
|
||
It's important to understand the difference between
|
||
`doc['my_field'].value` and `params['_source']['my_field']`. The first,
|
||
using the doc keyword, will cause the terms for that field to be loaded to
|
||
memory (cached), which will result in faster execution, but more memory
|
||
consumption. Also, the `doc[...]` notation only allows for simple valued
|
||
fields (you can't return a json object from it) and makes sense only for
|
||
non-analyzed or single term based fields. However, using `doc` is
|
||
still the recommended way to access values from the document, if at all
|
||
possible, because `_source` must be loaded and parsed every time it's used.
|
||
Using `_source` is very slow.
|