160 lines
4.2 KiB
Plaintext
160 lines
4.2 KiB
Plaintext
[[tune-for-disk-usage]]
|
|
== Tune for disk usage
|
|
|
|
[float]
|
|
=== Disable the features you do not need
|
|
|
|
By default elasticsearch indexes and adds doc values to most fields so that they
|
|
can be searched and aggregated out of the box. For instance if you have a numeric
|
|
field called `foo` that you need to run histograms on but that you never need to
|
|
filter on, you can safely disable indexing on this field in your
|
|
<<mappings,mappings>>:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT index
|
|
{
|
|
"mappings": {
|
|
"type": {
|
|
"properties": {
|
|
"foo": {
|
|
"type": "integer",
|
|
"index": false
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
|
|
<<text,`text`>> fields store normalization factors in the index in order to be
|
|
able to score documents. If you only need matching capabilities on a `text`
|
|
field but do not care about the produced scores, you can configure elasticsearch
|
|
to not write norms to the index:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT index
|
|
{
|
|
"mappings": {
|
|
"type": {
|
|
"properties": {
|
|
"foo": {
|
|
"type": "text",
|
|
"norms": false
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
|
|
<<text,`text`>> fields also store frequencies and positions in the index by
|
|
default. Frequencies are used to compute scores and positions are used to run
|
|
phrase queries. If you do not need to run phrase queries, you can tell
|
|
elasticsearch to not index positions:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT index
|
|
{
|
|
"mappings": {
|
|
"type": {
|
|
"properties": {
|
|
"foo": {
|
|
"type": "text",
|
|
"index_options": "freqs"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
|
|
Furthermore if you do not care about scoring either, you can configure
|
|
elasticsearch to just index matching documents for every term. You will
|
|
still be able to search on this field, but phrase queries will raise errors
|
|
and scoring will assume that terms appear only once in every document.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT index
|
|
{
|
|
"mappings": {
|
|
"type": {
|
|
"properties": {
|
|
"foo": {
|
|
"type": "text",
|
|
"norms": false,
|
|
"index_options": "freqs"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
|
|
[float]
|
|
=== Don't use default dynamic string mappings
|
|
|
|
The default <<dynamic-mapping,dynamic string mappings>> will index string fields
|
|
both as <<text,`text`>> and <<keyword,`keyword`>>. This is wasteful if you only
|
|
need one of them. Typically an `id` field will only need to be indexed as a
|
|
`keyword` while a `body` field will only need to be indexed as a `text` field.
|
|
|
|
This can be disabled by either configuring explicit mappings on string fields
|
|
or setting up dynamic templates that will map string fields as either `text`
|
|
or `keyword`.
|
|
|
|
For instance, here is a template that can be used in order to only map string
|
|
fields as `keyword`:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT index
|
|
{
|
|
"mappings": {
|
|
"type": {
|
|
"dynamic_templates": [
|
|
{
|
|
"strings": {
|
|
"match_mapping_type": "string",
|
|
"mapping": {
|
|
"type": "keyword"
|
|
}
|
|
}
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
|
|
[float]
|
|
=== Disable `_all`
|
|
|
|
The <<mapping-all-field,`_all`>> field indexes the value of all fields of a
|
|
document and can use significant space. If you never need to search against all
|
|
fields at the same time, it can be disabled.
|
|
|
|
[float]
|
|
=== Use `best_compression`
|
|
|
|
The `_source` and stored fields can easily take a non negligible amount of disk
|
|
space. They can be compressed more aggressively by using the `best_compression`
|
|
<<index-codec,codec>>.
|
|
|
|
[float]
|
|
=== Use the smallest numeric type that is sufficient
|
|
|
|
When storing <<number,numeric data>>, using `float` over `double`, or `half_float`
|
|
over `float` can help save storage. This is also true for integer types, but less
|
|
since Elasticsearch will more easily compress them based on the number of bits
|
|
that they actually need.
|
|
|