mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-02-17 10:25:15 +00:00
Docs: Updated the source field docs to remove deprecation of includes/excludes
Also provide warnings about why disabling source is probably something you don't want to do Closes #12141
This commit is contained in:
parent
52859e3a52
commit
6c0badd0b3
@ -92,7 +92,7 @@ specific index module:
|
||||
index visible to search. Defaults to `1s`. Can be set to `-1` to disable
|
||||
refresh.
|
||||
|
||||
`index.codec`::
|
||||
[[index-codec]] `index.codec`::
|
||||
|
||||
experimental[] The `default` value compresses stored data with LZ4
|
||||
compression, but this can be set to `best_compression` for a higher
|
||||
|
@ -1,13 +1,133 @@
|
||||
[[mapping-source-field]]
|
||||
=== `_source`
|
||||
|
||||
The `_source` field is an automatically generated field that stores the
|
||||
actual JSON that was used as the indexed document. It is not indexed
|
||||
(searchable), just stored. When executing "fetch" requests, like
|
||||
<<docs-get,get>> or
|
||||
<<search-search,search>>, the `_source` field is
|
||||
returned by default.
|
||||
The `_source` field is an automatically generated field that stores the actual
|
||||
JSON that was used as the indexed document. It is not indexed (searchable),
|
||||
just stored. When executing "fetch" requests, like <<docs-get,get>> or
|
||||
<<search-search,search>>, the `_source` field is returned by default.
|
||||
|
||||
Many APIs may use the `_source` field. For example, the
|
||||
<<docs-update,Update API>>. To minimize the storage cost of
|
||||
`_source`, set `index.codec: best_compression` in index settings.
|
||||
==== Disabling source
|
||||
|
||||
Though very handy to have around, the source field does incur storage overhead
|
||||
within the index. For this reason, it can be disabled as follows:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
PUT tweets
|
||||
{
|
||||
"mappings": {},
|
||||
"tweet": {
|
||||
"_source": {
|
||||
"enabled": false
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
[WARNING]
|
||||
.Think before disabling the source field
|
||||
==================================================
|
||||
|
||||
Users often disable the `_source` field without thinking about the
|
||||
consequences, and then live to regret it. If the `_source` field isn't
|
||||
available then a number of features are not supported:
|
||||
|
||||
* The <<docs-update,`update` API>>.
|
||||
|
||||
* On the fly <<search-request-highlighting,highlighting>>.
|
||||
|
||||
* The ability to reindex from one Elasticsearch index to another, either
|
||||
to change mappings or analysis, or to upgrade an index to a new major
|
||||
version.
|
||||
|
||||
* The ability to debug queries or aggregations by viewing the original
|
||||
document used at index time.
|
||||
|
||||
* Potentially in the future, the ability to repair index corruption
|
||||
automatically.
|
||||
|
||||
If disk space is a concern, rather increase the
|
||||
<<index-codec,compression level>> instead of disabling the `_source`.
|
||||
==================================================
|
||||
|
||||
.The metrics use case
|
||||
**************************************************
|
||||
|
||||
The _metrics_ use case is distinct from other time-based or logging use cases
|
||||
in that there are many small documents which consist only of numbers, dates,
|
||||
or keywords. There are no updates, no highlighting requests, and the data
|
||||
ages quickly so there is no need to reindex. Search requests typically use
|
||||
simple queries to filter the dataset by date or tags, and the results are
|
||||
returned as aggregations.
|
||||
|
||||
In this case, disabling the `_source` field will save space and reduce I/O.
|
||||
It is also advisable to disable the <<mapping-all-field,`_all` field>> in the
|
||||
metrics case.
|
||||
|
||||
**************************************************
|
||||
|
||||
|
||||
[[include-exclude]]
|
||||
==== Including / Excluding fields from source
|
||||
|
||||
An expert-only feature is the ability to prune the contents of the `_source`
|
||||
field after the document has been indexed, but before the `_source` field is
|
||||
stored. The `includes`/`excludes` parameters (which also accept wildcards)
|
||||
can be used as follows:
|
||||
|
||||
[source,js]
|
||||
--------------------------------------------------
|
||||
PUT logs
|
||||
{
|
||||
"mappings": {
|
||||
"event": {
|
||||
"_source": {
|
||||
"includes": [
|
||||
"*.count",
|
||||
"meta.*"
|
||||
],
|
||||
"excludes": [
|
||||
"meta.description",
|
||||
"meta.other.*"
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
PUT logs/event/1
|
||||
{
|
||||
"requests": {
|
||||
"count": 10,
|
||||
"foo": "bar" <1>
|
||||
},
|
||||
"meta": {
|
||||
"name": "Some metric",
|
||||
"description": "Some metric description", <1>
|
||||
"other": {
|
||||
"foo": "one", <1>
|
||||
"baz": "two" <1>
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
GET logs/event/_search
|
||||
{
|
||||
"query": {
|
||||
"match": {
|
||||
"meta.other.foo": "one" <2>
|
||||
}
|
||||
}
|
||||
}
|
||||
--------------------------------------------------
|
||||
// AUTOSENSE
|
||||
|
||||
<1> These fields will be removed from the stored `_source` field.
|
||||
<2> We can still search on this field, even though it is not in the stored `_source`.
|
||||
|
||||
WARNING: Removing fields from the `_source` has similar downsides to disabling
|
||||
`_source`, especially the fact that you cannot reindex documents from one
|
||||
Elasticsearch index to another. Consider using
|
||||
<<search-request-source-filtering,source filtering>> or a
|
||||
<<mapping-transform,transform script>> instead.
|
||||
|
Loading…
x
Reference in New Issue
Block a user