[[index-modules-fielddata]]
== Field data

The field data cache is used mainly when sorting or faceting on a
field. It loads all of the field's values into memory in order to provide
fast, document-based access to those values. The field data cache can be
expensive to build for a field, so it is recommended to have enough memory
to allocate it, and to keep it loaded.

From version 0.90 onwards, the amount of memory used for the field
data cache can be controlled using `indices.fielddata.cache.size`. Note:
reloading field data that does not fit into your cache is expensive
and performs poorly.

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`indices.fielddata.cache.size` |The maximum size of the field data cache,
e.g. `30%` of node heap space, or an absolute value, e.g. `12GB`. Defaults
to unbounded.

|`indices.fielddata.cache.expire` |A time-based setting that expires
field data after a certain time of inactivity. Defaults to `-1`. For
example, can be set to `5m` for a 5 minute expiry.
|=======================================================================
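
As a rough illustration of the two forms that `indices.fielddata.cache.size`
accepts, the sketch below resolves a setting value to a byte budget. The
`resolveCacheSize` helper is hypothetical and is not part of Elasticsearch;
it assumes binary units (1GB = 1024^3 bytes) and that a percentage is taken
of the heap size passed in by the caller.

```javascript
// Hypothetical helper, not Elasticsearch code: resolve a cache size setting
// such as "30%" or "12GB" to a number of bytes.
function resolveCacheSize(setting, heapBytes) {
  // Assumption: binary units, so 1KB = 1024 bytes.
  const units = { B: 1, KB: 1024, MB: 1024 ** 2, GB: 1024 ** 3 };

  // Percentage form: a share of the node heap.
  const pct = /^(\d+(?:\.\d+)?)%$/.exec(setting);
  if (pct) {
    return Math.floor((heapBytes * parseFloat(pct[1])) / 100);
  }

  // Absolute form: a number followed by a unit.
  const abs = /^(\d+(?:\.\d+)?)(B|KB|MB|GB)$/i.exec(setting);
  if (abs) {
    return Math.floor(parseFloat(abs[1]) * units[abs[2].toUpperCase()]);
  }

  throw new Error("unrecognised size: " + setting);
}

const heap = 10 * 1024 ** 3; // a 10GB heap, chosen only for the example
console.log(resolveCacheSize("30%", heap));  // 30% of the heap, in bytes
console.log(resolveCacheSize("12GB", heap)); // absolute value, heap ignored
```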

[float]
=== Filtering fielddata

It is possible to control which field values are loaded into memory,
which is particularly useful for string fields. When specifying the
<<mapping-core-types,mapping>> for a field, you
can also specify a fielddata filter.

Fielddata filters can be changed using the
<<indices-put-mapping,PUT mapping>>
API. After changing the filters, use the
<<indices-clearcache,Clear Cache>> API
to reload the fielddata using the new filters.

[float]
==== Filtering by frequency

The frequency filter allows you to only load terms whose frequency falls
between a `min` and `max` value, which can be expressed as an absolute
number or as a percentage (e.g. `0.01` is `1%`). Frequency is calculated
*per segment*. Percentages are based on the number of docs which have a
value for the field, as opposed to all docs in the segment.

Small segments can be excluded completely by specifying the minimum
number of docs that the segment should contain with `min_segment_size`:

[source,js]
--------------------------------------------------
{
    tag: {
        type: "string",
        fielddata: {
            filter: {
                frequency: {
                    min: 0.001,
                    max: 0.1,
                    min_segment_size: 500
                }
            }
        }
    }
}
--------------------------------------------------
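
To make the thresholds concrete, the sketch below applies a frequency filter
to the term counts of a single segment. It is an illustration only, not
Elasticsearch's implementation: the `filterByFrequency` function and the
shape of its input are hypothetical. It assumes that both bounds are
inclusive, that a bound below `1` is a fraction of the docs which have a
value for the field, and that segments with fewer docs than
`min_segment_size` are left unfiltered.

```javascript
// Illustrative only: decide which terms of one segment would be loaded.
//   segment.maxDoc        - total number of docs in the segment
//   segment.docsWithField - docs that have a value for the field
//   segment.docFreq       - map of term -> number of docs containing it
function filterByFrequency(segment, filter) {
  const terms = Object.keys(segment.docFreq);

  // Assumption: segments below min_segment_size skip frequency filtering.
  if (segment.maxDoc < (filter.min_segment_size || 0)) {
    return terms;
  }

  // Assumption: a bound below 1 is a fraction of the docs with a value;
  // anything else is an absolute document count.
  const toCount = (v) => (v < 1 ? v * segment.docsWithField : v);
  const min = filter.min === undefined ? 0 : toCount(filter.min);
  const max = filter.max === undefined ? Infinity : toCount(filter.max);

  return terms.filter(
    (t) => segment.docFreq[t] >= min && segment.docFreq[t] <= max
  );
}

const segment = {
  maxDoc: 12000,
  docsWithField: 10000,
  docFreq: { typo: 3, elasticsearch: 250, the: 9500 },
};
const filter = { min: 0.001, max: 0.1, min_segment_size: 500 };

// With 10000 docs holding a value, the bounds work out to roughly 10 and
// 1000 docs, so "typo" is too rare and "the" is too common to be loaded.
console.log(filterByFrequency(segment, filter));
```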

[float]
==== Filtering by regex

Terms can also be filtered by regular expression: only values which
match the regular expression are loaded. Note: the regular expression is
applied to each term in the field, not to the whole field value. For
instance, to only load hashtags from a tweet, we can use a regular
expression which matches terms beginning with `#`:

[source,js]
--------------------------------------------------
{
    tweet: {
        type: "string",
        analyzer: "whitespace",
        fielddata: {
            filter: {
                regex: "^#.*"
            }
        }
    }
}
--------------------------------------------------
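
The per-term behaviour can be sketched as follows. This is an illustration
rather than Elasticsearch code: the `termsToLoad` helper is hypothetical,
splitting on whitespace stands in for the `whitespace` analyzer, and
JavaScript's regex engine is assumed to treat this simple pattern the same
way as the engine used by Elasticsearch.

```javascript
// Illustrative only: apply a fielddata-style regex filter to each term.
function termsToLoad(fieldValue, pattern) {
  const regex = new RegExp(pattern);
  // Stand-in for the whitespace analyzer: split the value into terms.
  const terms = fieldValue.split(/\s+/).filter((t) => t.length > 0);
  // The pattern is tested against each term, never against the whole value.
  return terms.filter((t) => regex.test(t));
}

const tweet = "Loving the new #elasticsearch release #search";

// The whole value does not start with "#", but two of its terms do,
// so only those two terms pass the filter.
console.log(termsToLoad(tweet, "^#.*"));
```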

[float]
==== Combining filters

The `frequency` and `regex` filters can be combined:

[source,js]
--------------------------------------------------
{
    tweet: {
        type: "string",
        analyzer: "whitespace",
        fielddata: {
            filter: {
                regex: "^#.*",
                frequency: {
                    min: 0.001,
                    max: 0.1,
                    min_segment_size: 500
                }
            }
        }
    }
}
--------------------------------------------------

[float]
=== Settings before v0.90

[cols="<,<",options="header",]
|=======================================================================
|Setting |Description
|`index.cache.field.type` |The default type for the field data cache is
`resident` (because of the cost of rebuilding it). Other types include
`soft`.

|`index.cache.field.max_size` |The maximum size (entry count, not byte size) of
the cache (per search segment in a shard). Defaults to not set (`-1`).

|`index.cache.field.expire` |A time-based setting that expires field data
after a certain time of inactivity. Defaults to `-1`. For example, can
be set to `5m` for a 5 minute expiry.
|=======================================================================

[float]
=== Monitoring field data

You can monitor memory usage for field data using the
<<cluster-nodes-stats,Nodes Stats API>>.