182 lines
5.3 KiB
Plaintext
182 lines
5.3 KiB
Plaintext
[[tune-for-disk-usage]]
|
|
== Tune for disk usage
|
|
|
|
[float]
|
|
=== Disable the features you do not need
|
|
|
|
By default elasticsearch indexes and adds doc values to most fields so that they
|
|
can be searched and aggregated out of the box. For instance if you have a numeric
|
|
field called `foo` that you need to run histograms on but that you never need to
|
|
filter on, you can safely disable indexing on this field in your
|
|
<<mappings,mappings>>:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT index
|
|
{
|
|
"mappings": {
|
|
"type": {
|
|
"properties": {
|
|
"foo": {
|
|
"type": "integer",
|
|
"index": false
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
|
|
<<text,`text`>> fields store normalization factors in the index in order to be
|
|
able to score documents. If you only need matching capabilities on a `text`
|
|
field but do not care about the produced scores, you can configure elasticsearch
|
|
to not write norms to the index:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT index
|
|
{
|
|
"mappings": {
|
|
"type": {
|
|
"properties": {
|
|
"foo": {
|
|
"type": "text",
|
|
"norms": false
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
|
|
<<text,`text`>> fields also store frequencies and positions in the index by
|
|
default. Frequencies are used to compute scores and positions are used to run
|
|
phrase queries. If you do not need to run phrase queries, you can tell
|
|
elasticsearch to not index positions:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT index
|
|
{
|
|
"mappings": {
|
|
"type": {
|
|
"properties": {
|
|
"foo": {
|
|
"type": "text",
|
|
"index_options": "freqs"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
|
|
Furthermore if you do not care about scoring either, you can configure
|
|
elasticsearch to just index matching documents for every term. You will
|
|
still be able to search on this field, but phrase queries will raise errors
|
|
and scoring will assume that terms appear only once in every document.
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT index
|
|
{
|
|
"mappings": {
|
|
"type": {
|
|
"properties": {
|
|
"foo": {
|
|
"type": "text",
|
|
"norms": false,
|
|
"index_options": "freqs"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
|
|
[float]
|
|
=== Don't use default dynamic string mappings
|
|
|
|
The default <<dynamic-mapping,dynamic string mappings>> will index string fields
|
|
both as <<text,`text`>> and <<keyword,`keyword`>>. This is wasteful if you only
|
|
need one of them. Typically an `id` field will only need to be indexed as a
|
|
`keyword` while a `body` field will only need to be indexed as a `text` field.
|
|
|
|
This can be disabled by either configuring explicit mappings on string fields
|
|
or setting up dynamic templates that will map string fields as either `text`
|
|
or `keyword`.
|
|
|
|
For instance, here is a template that can be used in order to only map string
|
|
fields as `keyword`:
|
|
|
|
[source,js]
|
|
--------------------------------------------------
|
|
PUT index
|
|
{
|
|
"mappings": {
|
|
"type": {
|
|
"dynamic_templates": [
|
|
{
|
|
"strings": {
|
|
"match_mapping_type": "string",
|
|
"mapping": {
|
|
"type": "keyword"
|
|
}
|
|
}
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// CONSOLE
|
|
|
|
[float]
|
|
=== Disable `_all`
|
|
|
|
The <<mapping-all-field,`_all`>> field indexes the value of all fields of a
|
|
document and can use significant space. If you never need to search against all
|
|
fields at the same time, it can be disabled.
|
|
|
|
[float]
|
|
=== Use `best_compression`
|
|
|
|
The `_source` and stored fields can easily take a non negligible amount of disk
|
|
space. They can be compressed more aggressively by using the `best_compression`
|
|
<<index-codec,codec>>.
|
|
|
|
[float]
|
|
=== Use the smallest numeric type that is sufficient
|
|
|
|
The type that you pick for <<number,numeric data>> can have a significant impact
|
|
on disk usage. In particular, integers should be stored using an integer type
|
|
(`byte`, `short`, `integer` or `long`) and floating points should either be
|
|
stored in a `scaled_float` if appropriate or in the smallest type that fits the
|
|
use-case: using `float` over `double`, or `half_float` over `float` will help
|
|
save storage.
|
|
|
|
[float]
|
|
=== Use index sorting to colocate similar documents
|
|
|
|
When Elasticsearch stores `_source`, it compresses multiple documents at once
|
|
in order to improve the overall compression ratio. For instance it is very
|
|
common that documents share the same field names, and quite common that they
|
|
share some field values, especially on fields that have a low cardinality or
|
|
a https://en.wikipedia.org/wiki/Zipf%27s_law[zipfian] distribution.
|
|
|
|
By default documents are compressed together in the order that they are added
|
|
to the index. If you enabled <<index-modules-index-sorting,index sorting>>
|
|
then instead they are compressed in sorted order. Sorting documents with similar
|
|
structure, fields, and values together should improve the compression ratio.
|
|
|
|
[float]
|
|
=== Put fields in the same order in documents
|
|
|
|
Due to the fact that multiple documents are compressed together into blocks,
|
|
it is more likely to find longer duplicate strings in those `_source` documents
|
|
if fields always occur in the same order.
|