[[general-recommendations]]
== General recommendations

[float]
[[large-size]]
=== Don't return large result sets

Elasticsearch is designed as a search engine, which makes it very good at
getting back the top documents that match a query. However, it is not as good
for workloads that fall into the database domain, such as retrieving all
documents that match a particular query. If you need to do this, make sure to
use the <<request-body-search-scroll,Scroll>> API.
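
For illustration, a scrolled search that pages through a full result set might
look like the sketch below. The index name `my-index`, the `content` field,
the `1m` keep-alive and the page size of 1000 are arbitrary placeholders, not
values taken from this guide:

[source,console]
----
POST /my-index/_search?scroll=1m
{
  "size": 1000,
  "query": {
    "match": { "content": "elasticsearch" }
  }
}
----

Each response returns a `_scroll_id`, which is passed back to fetch the next
page until no more hits are returned:

[source,console]
----
POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "<_scroll_id taken from the previous response>"
}
----
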

[float]
[[maximum-document-size]]
=== Avoid large documents

Given that the default <<modules-http,`http.max_content_length`>> is set to
100MB, Elasticsearch will refuse to index any document that is larger than
that. You might decide to increase that particular setting, but Lucene still
has a limit of about 2GB.
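
If you do decide to raise the limit, the setting would typically go in
`elasticsearch.yml`, for example as sketched below; the `200mb` value is
purely illustrative:

[source,yaml]
----
# elasticsearch.yml -- illustrative only; Lucene still caps documents at ~2GB
http.max_content_length: 200mb
----
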

Even without considering hard limits, large documents are usually not
practical. Large documents put more stress on the network, memory usage and
disk, even for search requests that do not request the `_source`:
Elasticsearch always needs to fetch the `_id` of the document, and the cost
of getting this field is higher for large documents due to how the filesystem
cache works. Indexing such a document can use an amount of memory that is a
multiple of the original size of the document. Proximity search (phrase
queries, for instance) and <<request-body-search-highlighting,highlighting>>
also become more expensive since their cost directly depends on the size of
the original document.

It is sometimes useful to reconsider what the unit of information should be.
For instance, the fact that you want to make books searchable doesn't
necessarily mean that a document should consist of a whole book. It might be
a better idea to use chapters or even paragraphs as documents, and then have
a property in these documents that identifies which book they belong to. Not
only does this avoid the issues that come with large documents, it also makes
for a better search experience. For instance, if a user searches for two
words `foo` and `bar`, a match across different chapters is probably a poor
result, while a match within the same paragraph is likely a good one.
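
As a sketch of that approach, each chapter could be indexed as its own
document carrying a pointer back to the book. The `books` index, the document
ID and all field names and values below are made up for illustration:

[source,console]
----
PUT /books/_doc/moby_dick_1
{
  "book_id": "moby_dick",
  "chapter": 1,
  "chapter_title": "Loomings",
  "content": "Call me Ishmael. Some years ago ..."
}
----

A search for `foo bar` can then run against `content` as usual, and a `term`
filter on `book_id` keeps the results within a single book when that is what
the user wants.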