Updated fielddata docs to make it easier for users with old mappings
parent b5b0f04e34
commit 05271d58ca

```diff
@@ -2,42 +2,105 @@
 === `fielddata`
 
 Most fields are <<mapping-index,indexed>> by default, which makes them
-searchable. The inverted index allows queries to look up the search term in
-unique sorted list of terms, and from that immediately have access to the list
-of documents that contain the term.
+searchable. Sorting, aggregations, and accessing field values in scripts,
+however, requires a different access pattern from search.
 
-Sorting, aggregations, and access to field values in scripts requires a
-different data access pattern. Instead of lookup up the term and finding
-documents, we need to be able to look up the document and find the terms that
-it has in a field.
+Search needs to answer the question _"Which documents contain this term?"_,
+while sorting and aggregations need to answer a different question: _"What is
+the value of this field for **this** document?"_.
 
-Most fields can use index-time, on-disk <<doc-values,`doc_values`>> to support
-this type of data access pattern, but `text` fields do not support `doc_values`.
+Most fields can use index-time, on-disk <<doc-values,`doc_values`>> for this
+data access pattern, but <<text,`text`>> fields do not support `doc_values`.
 
-Instead, `text` strings use a query-time data structure called
+Instead, `text` fields use a query-time *in-memory* data structure called
 `fielddata`. This data structure is built on demand the first time that a
-field is used for aggregations, sorting, or is accessed in a script. It is built
-by reading the entire inverted index for each segment from disk, inverting the
-term ↔︎ document relationship, and storing the result in memory, in the
-JVM heap.
+field is used for aggregations, sorting, or in a script. It is built by
+reading the entire inverted index for each segment from disk, inverting the
+term ↔︎ document relationship, and storing the result in memory, in the JVM
+heap.
 
-Loading fielddata is an expensive process so it is disabled by default. Also,
-when enabled, once it has been loaded, it remains in memory for the lifetime of
-the segment.
+==== Fielddata is disabled on `text` fields by default
 
-[WARNING]
-.Fielddata can fill up your heap space
-==============================================================================
-Fielddata can consume a lot of heap space, especially when loading high
-cardinality `text` fields. Most of the time, it doesn't make sense
-to sort or aggregate on `text` fields (with the notable exception
-of the
-<<search-aggregations-bucket-significantterms-aggregation,`significant_terms`>>
-aggregation). Always think about whether a <<keyword,`keyword`>> field (which can
-use `doc_values`) would be a better fit for your use case.
-==============================================================================
+Fielddata can consume a *lot* of heap space, especially when loading high
+cardinality `text` fields. Once fielddata has been loaded into the heap, it
+remains there for the lifetime of the segment. Also, loading fielddata is an
+expensive process which can cause users to experience latency hits. This is
+why fielddata is disabled by default.
 
-TIP: The `fielddata.*` settings must have the same settings for fields of the
+If you try to sort, aggregate, or access values from a script on a `text`
+field, you will see this exception:
+
+[quote]
+--
+Fielddata is disabled on text fields by default. Set `fielddata=true` on
+[`your_field_name`] in order to load fielddata in memory by uninverting the
+inverted index. Note that this can however use significant memory.
+--
+
+[[before-enabling-fielddata]]
+==== Before enabling fielddata
+
+Before you enable fielddata, consider why you are using a `text` field for
+aggregations, sorting, or in a script. It usually doesn't make sense to do
+so.
+
+A text field is analyzed before indexing so that a value like
+`New York` can be found by searching for `new` or for `york`. A `terms`
+aggregation on this field will return a `new` bucket and a `york` bucket, when
+you probably want a single bucket called `New York`.
+
+Instead, you should have a `text` field for full text searches, and an
+unanalyzed <<keyword,`keyword`>> field with <<doc-values,`doc_values`>>
+enabled for aggregations, as follows:
+
+[source,js]
+---------------------------------
+PUT my_index
+{
+  "mappings": {
+    "my_type": {
+      "properties": {
+        "my_field": { <1>
+          "type": "text",
+          "fields": {
+            "keyword": { <2>
+              "type": "keyword"
+            }
+          }
+        }
+      }
+    }
+  }
+}
+---------------------------------
+// CONSOLE
+<1> Use the `my_field` field for searches.
+<2> Use the `my_field.keyword` field for aggregations, sorting, or in scripts.
+
+==== Enabling fielddata on `text` fields
+
+You can enable fielddata on an existing `text` field using the
+<<indices-put-mapping,PUT mapping API>> as follows:
+
+[source,js]
+-----------------------------------
+PUT my_index/_mapping/my_type
+{
+  "properties": {
+    "my_field": { <1>
+      "type": "text",
+      "fielddata": true
+    }
+  }
+}
+-----------------------------------
+// CONSOLE
+// TEST[continued]
+
+<1> The mapping that you specify for `my_field` should consist of the existing
+mapping for that field, plus the `fielddata` parameter.
+
+TIP: The `fielddata.*` parameter must have the same settings for fields of the
 same name in the same index. Its value can be updated on existing fields
 using the <<indices-put-mapping,PUT mapping API>>.
 
```
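The text above contrasts two access patterns. Search asks "which documents contain this term?", while sorting and aggregations ask "what are this document's values?". It also uses the `New York` example. Both points can be illustrated with a toy model. The Python below is a minimal sketch under simplifying assumptions, not how Lucene or Elasticsearch actually store data. `analyze` is a hypothetical stand-in for an analyzer that lowercases the input and splits it on whitespace. `build_inverted_index`, `uninvert` and `terms_agg` are illustrative helpers. `uninvert` plays the role of fielddata by flipping the term → documents index into document → terms. The `keyword_values` dict stands in for an unanalyzed `keyword` sub-field.

```python
from collections import Counter, defaultdict


def analyze(value: str) -> list[str]:
    """Hypothetical stand-in for an analyzer: lowercase, then split on whitespace."""
    return value.lower().split()


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Search structure: each term maps to the sorted IDs of documents containing it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, value in docs.items():
        for term in analyze(value):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def uninvert(index: dict[str, list[int]]) -> dict[int, list[str]]:
    """Fielddata-like structure: flip term -> docs into doc -> terms."""
    per_doc: dict[int, list[str]] = defaultdict(list)
    for term, doc_ids in index.items():  # index keys are already in sorted order
        for doc_id in doc_ids:
            per_doc[doc_id].append(term)
    return dict(per_doc)


def terms_agg(values_per_doc: dict[int, list[str]]) -> dict[str, int]:
    """Count documents per distinct value, like a terms aggregation, sorted by value."""
    counts: Counter = Counter()
    for values in values_per_doc.values():
        counts.update(set(values))
    return dict(sorted(counts.items()))


docs = {1: "New York", 2: "New Delhi", 3: "York"}

index = build_inverted_index(docs)   # answers "which documents contain 'york'?"
fielddata = uninvert(index)          # answers "what are the terms of document 1?"
# Unanalyzed values, standing in for a `keyword` sub-field.
keyword_values = {doc_id: [value] for doc_id, value in docs.items()}

print(index["york"])              # [1, 3]
print(fielddata[1])               # ['new', 'york']
print(terms_agg(fielddata))       # {'delhi': 1, 'new': 2, 'york': 2}
print(terms_agg(keyword_values))  # {'New Delhi': 1, 'New York': 1, 'York': 1}
```

On the analyzed values the count yields separate `new` and `york` buckets. The unanalyzed values yield a single `New York` bucket. This is the motivation given above for pairing the `text` field with a `my_field.keyword` multi-field.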

```diff
@@ -49,12 +112,13 @@ using the <<indices-put-mapping,PUT mapping API>>.
 Global ordinals is a data-structure on top of fielddata and doc values, that
 maintains an incremental numbering for each unique term in a lexicographic
 order. Each term has a unique number and the number of term 'A' is lower than
-the number of term 'B'. Global ordinals are only supported on string fields.
+the number of term 'B'. Global ordinals are only supported on <<text,`text`>>
+and <<keyword,`keyword`>> fields.
 
-Fielddata and doc values also have ordinals, which is a unique numbering for all terms
-in a particular segment and field. Global ordinals just build on top of this,
-by providing a mapping between the segment ordinals and the global ordinals,
-the latter being unique across the entire shard.
+Fielddata and doc values also have ordinals, which is a unique numbering for
+all terms in a particular segment and field. Global ordinals just build on top
+of this, by providing a mapping between the segment ordinals and the global
+ordinals, the latter being unique across the entire shard.
 
 Global ordinals are used for features that use segment ordinals, such as
 sorting and the terms aggregation, to improve the execution time. A terms
```
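The mapping from segment ordinals to global ordinals described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Elasticsearch's implementation. Each segment is modelled as a list holding one string value per document. A segment ordinal is a term's position in that segment's sorted term dictionary. The helper names `segment_dictionary` and `build_global_ordinals` are hypothetical, and the real structures are compressed rather than plain lists. The final loop mimics a terms aggregation that counts into a single array indexed by global ordinal, so buckets from different segments line up without comparing strings.

```python
from bisect import bisect_left


def segment_dictionary(values: list[str]) -> list[str]:
    """A segment's sorted term dictionary; a term's segment ordinal is its index here."""
    return sorted(set(values))


def build_global_ordinals(segments: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Return the shard-wide sorted terms and, per segment, a segment-ordinal -> global-ordinal table."""
    dictionaries = [segment_dictionary(values) for values in segments]
    global_terms = sorted(set().union(*dictionaries))
    mappings = [[bisect_left(global_terms, term) for term in dictionary]
                for dictionary in dictionaries]
    return global_terms, mappings


# Two segments of one shard; each list holds one value per document.
segments = [["b", "a", "b"], ["c", "a"]]
global_terms, mappings = build_global_ordinals(segments)

# Terms-aggregation-style counting into one array indexed by global ordinal.
counts = [0] * len(global_terms)
for values, mapping in zip(segments, mappings):
    dictionary = segment_dictionary(values)
    for value in values:
        # In Lucene the per-document ordinal would come from doc values or
        # fielddata; here it is looked up by binary search in the dictionary.
        segment_ord = bisect_left(dictionary, value)
        counts[mapping[segment_ord]] += 1  # translate to the shard-wide ordinal

buckets = {global_terms[ordinal]: count for ordinal, count in enumerate(counts)}
print(global_terms)  # ['a', 'b', 'c']
print(mappings)      # [[0, 1], [0, 2]]
print(buckets)       # {'a': 2, 'b': 2, 'c': 1}
```

Term `a` has ordinal 0 in both segments and global ordinal 0. Term `c` has segment ordinal 1 in the second segment but global ordinal 2. This is the kind of translation the per-segment tables provide.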

```diff
@@ -68,10 +132,11 @@ which is different than for field data for a specific field which is tied to a
 single segment. For this reason global ordinals need to be entirely rebuilt
 whenever a once new segment becomes visible.
 
-The loading time of global ordinals depends on the number of terms in a field, but in general
-it is low, since it source field data has already been loaded. The memory overhead of global
-ordinals is a small because it is very efficiently compressed. Eager loading of global ordinals
-can move the loading time from the first search request, to the refresh itself.
+The loading time of global ordinals depends on the number of terms in a field,
+but in general it is low, since it source field data has already been loaded.
+The memory overhead of global ordinals is a small because it is very
+efficiently compressed. Eager loading of global ordinals can move the loading
+time from the first search request, to the refresh itself.
 
 *****************************************
 
```
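The text states that global ordinals must be rebuilt entirely whenever a new segment becomes visible. This follows from their ordering guarantee: a term in the new segment can sort between existing terms and shift their shard-wide numbers. Below is a minimal Python illustration. It assumes segments are plain lists of terms and uses a hypothetical `global_ordinals` helper, which is not an Elasticsearch API.

```python
def global_ordinals(segments: list[list[str]]) -> dict[str, int]:
    """Hypothetical helper: shard-wide ordinal of every distinct term in the visible segments."""
    terms = sorted({term for segment in segments for term in segment})
    return {term: ordinal for ordinal, term in enumerate(terms)}


visible = [["apple", "cherry"], ["cherry", "plum"]]
before = global_ordinals(visible)

# A refresh makes a new segment visible; its term sorts between existing terms.
visible.append(["banana"])
after = global_ordinals(visible)

# Terms whose shard-wide ordinal moved, even though their own segments did not change.
shifted = [term for term, ordinal in before.items() if after[term] != ordinal]

print(before)   # {'apple': 0, 'cherry': 1, 'plum': 2}
print(after)    # {'apple': 0, 'banana': 1, 'cherry': 2, 'plum': 3}
print(shifted)  # ['cherry', 'plum']
```

`cherry` and `plum` change ordinals even though the segments holding them did not change. The segment-to-global mapping is therefore affected for existing segments too, not just the new one. This is consistent with rebuilding global ordinals entirely rather than patching them.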