137 lines
3.7 KiB
Plaintext
137 lines
3.7 KiB
Plaintext
[[mapping-source-field]]
|
|
=== `_source` field
|
|
|
|
The `_source` field contains the original JSON document body that was passed
|
|
at index time. The `_source` field itself is not indexed (and thus is not
|
|
searchable), but it is stored so that it can be returned when executing
|
|
_fetch_ requests, like <<docs-get,get>> or <<search-search,search>>.
|
|
|
|
==== Disabling the `_source` field
|
|
|
|
Though very handy to have around, the source field does incur storage overhead
|
|
within the index. For this reason, it can be disabled as follows:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT tweets
|
|
{
|
|
"mappings": {
|
|
"tweet": {
|
|
"_source": {
|
|
"enabled": false
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// AUTOSENSE
|
|
|
|
[WARNING]
|
|
.Think before disabling the `_source` field
|
|
==================================================
|
|
|
|
Users often disable the `_source` field without thinking about the
|
|
consequences, and then live to regret it. If the `_source` field isn't
|
|
available then a number of features are not supported:
|
|
|
|
* The <<docs-update,`update` API>>.
|
|
|
|
* On the fly <<search-request-highlighting,highlighting>>.
|
|
|
|
* The ability to reindex from one Elasticsearch index to another, either
|
|
to change mappings or analysis, or to upgrade an index to a new major
|
|
version.
|
|
|
|
* The ability to debug queries or aggregations by viewing the original
|
|
document used at index time.
|
|
|
|
* Potentially in the future, the ability to repair index corruption
|
|
automatically.
|
|
==================================================
|
|
|
|
TIP: If disk space is a concern, rather increase the
|
|
<<index-codec,compression level>> instead of disabling the `_source`.
|
|
|
|
.The metrics use case
|
|
**************************************************
|
|
|
|
The _metrics_ use case is distinct from other time-based or logging use cases
|
|
in that there are many small documents which consist only of numbers, dates,
|
|
or keywords. There are no updates, no highlighting requests, and the data
|
|
ages quickly so there is no need to reindex. Search requests typically use
|
|
simple queries to filter the dataset by date or tags, and the results are
|
|
returned as aggregations.
|
|
|
|
In this case, disabling the `_source` field will save space and reduce I/O.
|
|
It is also advisable to disable the <<mapping-all-field,`_all` field>> in the
|
|
metrics case.
|
|
|
|
**************************************************
|
|
|
|
|
|
[[include-exclude]]
|
|
==== Including / Excluding fields from `_source`
|
|
|
|
An expert-only feature is the ability to prune the contents of the `_source`
|
|
field after the document has been indexed, but before the `_source` field is
|
|
stored.
|
|
|
|
WARNING: Removing fields from the `_source` has similar downsides to disabling
|
|
`_source`, especially the fact that you cannot reindex documents from one
|
|
Elasticsearch index to another. Consider using
|
|
<<search-request-source-filtering,source filtering>> instead.
|
|
|
|
The `includes`/`excludes` parameters (which also accept wildcards) can be used
|
|
as follows:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT logs
|
|
{
|
|
"mappings": {
|
|
"event": {
|
|
"_source": {
|
|
"includes": [
|
|
"*.count",
|
|
"meta.*"
|
|
],
|
|
"excludes": [
|
|
"meta.description",
|
|
"meta.other.*"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
|
|
PUT logs/event/1
|
|
{
|
|
"requests": {
|
|
"count": 10,
|
|
"foo": "bar" <1>
|
|
},
|
|
"meta": {
|
|
"name": "Some metric",
|
|
"description": "Some metric description", <1>
|
|
"other": {
|
|
"foo": "one", <1>
|
|
"baz": "two" <1>
|
|
}
|
|
}
|
|
}
|
|
|
|
GET logs/event/_search
|
|
{
|
|
"query": {
|
|
"match": {
|
|
"meta.other.foo": "one" <2>
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// AUTOSENSE
|
|
|
|
<1> These fields will be removed from the stored `_source` field.
|
|
<2> We can still search on this field, even though it is not in the stored `_source`.
|
|
|