OpenSearch/docs/reference/mapping/fields/source-field.asciidoc

115 lines
3.0 KiB
Plaintext

[[mapping-source-field]]
=== `_source` field
The `_source` field contains the original JSON document body that was passed
at index time. The `_source` field itself is not indexed (and thus is not
searchable), but it is stored so that it can be returned when executing
_fetch_ requests, like <<docs-get,get>> or <<search-search,search>>.
[[disable-source-field]]
==== Disabling the `_source` field
Though very handy to have around, the source field does incur storage overhead
within the index. For this reason, it can be disabled as follows:
[source,console]
--------------------------------------------------
PUT my-index-000001
{
"mappings": {
"_source": {
"enabled": false
}
}
}
--------------------------------------------------
[WARNING]
.Think before disabling the `_source` field
==================================================
Users often disable the `_source` field without thinking about the
consequences, and then live to regret it. If the `_source` field isn't
available then a number of features are not supported:
* The <<docs-update,`update`>>, <<docs-update-by-query,`update_by_query`>>,
and <<docs-reindex,`reindex`>> APIs.
* On the fly <<highlighting,highlighting>>.
* The ability to reindex from one Elasticsearch index to another, either
to change mappings or analysis, or to upgrade an index to a new major
version.
* The ability to debug queries or aggregations by viewing the original
document used at index time.
* Potentially in the future, the ability to repair index corruption
automatically.
==================================================
TIP: If disk space is a concern, rather increase the
<<index-codec,compression level>> instead of disabling the `_source`.
[[include-exclude]]
==== Including / Excluding fields from `_source`
An expert-only feature is the ability to prune the contents of the `_source`
field after the document has been indexed, but before the `_source` field is
stored.
WARNING: Removing fields from the `_source` has similar downsides to disabling
`_source`, especially the fact that you cannot reindex documents from one
Elasticsearch index to another. Consider using
<<source-filtering,source filtering>> instead.
The `includes`/`excludes` parameters (which also accept wildcards) can be used
as follows:
[source,console]
--------------------------------------------------
PUT logs
{
"mappings": {
"_source": {
"includes": [
"*.count",
"meta.*"
],
"excludes": [
"meta.description",
"meta.other.*"
]
}
}
}
PUT logs/_doc/1
{
"requests": {
"count": 10,
"foo": "bar" <1>
},
"meta": {
"name": "Some metric",
"description": "Some metric description", <1>
"other": {
"foo": "one", <1>
"baz": "two" <1>
}
}
}
GET logs/_search
{
"query": {
"match": {
"meta.other.foo": "one" <2>
}
}
}
--------------------------------------------------
<1> These fields will be removed from the stored `_source` field.
<2> We can still search on this field, even though it is not in the stored `_source`.