134 lines
3.7 KiB
Plaintext
134 lines
3.7 KiB
Plaintext
[[mapping-source-field]]
|
|
=== `_source`
|
|
|
|
The `_source` field is an automatically generated field that stores the actual
|
|
JSON that was used as the indexed document. It is not indexed (searchable),
|
|
just stored. When executing "fetch" requests, like <<docs-get,get>> or
|
|
<<search-search,search>>, the `_source` field is returned by default.
|
|
|
|
==== Disabling source
|
|
|
|
Though very handy to have around, the source field does incur storage overhead
|
|
within the index. For this reason, it can be disabled as follows:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT tweets
|
|
{
|
|
"mappings": {},
|
|
"tweet": {
|
|
"_source": {
|
|
"enabled": false
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// AUTOSENSE
|
|
|
|
[WARNING]
|
|
.Think before disabling the source field
|
|
==================================================
|
|
|
|
Users often disable the `_source` field without thinking about the
|
|
consequences, and then live to regret it. If the `_source` field isn't
|
|
available then a number of features are not supported:
|
|
|
|
* The <<docs-update,`update` API>>.
|
|
|
|
* On the fly <<search-request-highlighting,highlighting>>.
|
|
|
|
* The ability to reindex from one Elasticsearch index to another, either
|
|
to change mappings or analysis, or to upgrade an index to a new major
|
|
version.
|
|
|
|
* The ability to debug queries or aggregations by viewing the original
|
|
document used at index time.
|
|
|
|
* Potentially in the future, the ability to repair index corruption
|
|
automatically.
|
|
|
|
If disk space is a concern, rather increase the
|
|
<<index-codec,compression level>> instead of disabling the `_source`.
|
|
==================================================
|
|
|
|
.The metrics use case
|
|
**************************************************
|
|
|
|
The _metrics_ use case is distinct from other time-based or logging use cases
|
|
in that there are many small documents which consist only of numbers, dates,
|
|
or keywords. There are no updates, no highlighting requests, and the data
|
|
ages quickly so there is no need to reindex. Search requests typically use
|
|
simple queries to filter the dataset by date or tags, and the results are
|
|
returned as aggregations.
|
|
|
|
In this case, disabling the `_source` field will save space and reduce I/O.
|
|
It is also advisable to disable the <<mapping-all-field,`_all` field>> in the
|
|
metrics case.
|
|
|
|
**************************************************
|
|
|
|
|
|
[[include-exclude]]
|
|
==== Including / Excluding fields from source
|
|
|
|
An expert-only feature is the ability to prune the contents of the `_source`
|
|
field after the document has been indexed, but before the `_source` field is
|
|
stored. The `includes`/`excludes` parameters (which also accept wildcards)
|
|
can be used as follows:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT logs
|
|
{
|
|
"mappings": {
|
|
"event": {
|
|
"_source": {
|
|
"includes": [
|
|
"*.count",
|
|
"meta.*"
|
|
],
|
|
"excludes": [
|
|
"meta.description",
|
|
"meta.other.*"
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
|
|
PUT logs/event/1
|
|
{
|
|
"requests": {
|
|
"count": 10,
|
|
"foo": "bar" <1>
|
|
},
|
|
"meta": {
|
|
"name": "Some metric",
|
|
"description": "Some metric description", <1>
|
|
"other": {
|
|
"foo": "one", <1>
|
|
"baz": "two" <1>
|
|
}
|
|
}
|
|
}
|
|
|
|
GET logs/event/_search
|
|
{
|
|
"query": {
|
|
"match": {
|
|
"meta.other.foo": "one" <2>
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// AUTOSENSE
|
|
|
|
<1> These fields will be removed from the stored `_source` field.
|
|
<2> We can still search on this field, even though it is not in the stored `_source`.
|
|
|
|
WARNING: Removing fields from the `_source` has similar downsides to disabling
|
|
`_source`, especially the fact that you cannot reindex documents from one
|
|
Elasticsearch index to another. Consider using
|
|
<<search-request-source-filtering,source filtering>> or a
|
|
<<mapping-transform,transform script>> instead.
|