[[general-recommendations]]
== General recommendations

[discrete]
[[large-size]]
=== Don't return large result sets

Elasticsearch is designed as a search engine, which makes it very good at
getting back the top documents that match a query. However, it is not as good
for workloads that fall into the database domain, such as retrieving all
documents that match a particular query. If you need to do this, make sure to
use the <<request-body-search-scroll,Scroll>> API.
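
As a rough illustration, the scroll loop might look like the following Python
sketch. It assumes a `client` object that behaves like the official Python
client, with `search`, `scroll` and `clear_scroll` methods returning
dictionary-like responses. The helper name `iter_all_hits` is hypothetical, and
the exact keyword arguments (`body` in particular) can differ between client
versions.

```python
def iter_all_hits(client, index, query, page_size=1000, keep_alive="1m"):
    """Yield every hit matching `query`, one scroll page at a time."""
    # Assumption: `client` mirrors the official Python client, so that
    # search(), scroll() and clear_scroll() accept these keyword arguments.
    response = client.search(
        index=index,
        body={"query": query},
        scroll=keep_alive,  # how long the search context is kept alive
        size=page_size,     # number of hits returned per page
    )
    scroll_id = response.get("_scroll_id")
    try:
        while True:
            hits = response["hits"]["hits"]
            if not hits:
                break  # an empty page means every match has been returned
            yield from hits
            response = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = response.get("_scroll_id", scroll_id)
    finally:
        # Release the search context even if the caller stops iterating early.
        if scroll_id is not None:
            client.clear_scroll(scroll_id=scroll_id)
```

Because the function is a generator, only one page of hits is held in memory
at a time, whatever the total number of matching documents.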

[discrete]
[[maximum-document-size]]
=== Avoid large documents

Given that the default <<modules-http,`http.max_content_length`>> is set to
100MB, Elasticsearch will refuse to index any document that is larger than
that. You might decide to increase that particular setting, but Lucene still
has a limit of about 2GB.
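
A client can check the serialized size of a document before sending it. The
sketch below is only an approximation: it assumes that the 100MB default is
interpreted as 100 * 1024 * 1024 bytes, and the helper names are hypothetical.
The limit applies to the whole HTTP request body, so a request that carries
several documents counts as one unit.

```python
import json

# Assumption: the 100MB default is read as 100 * 1024 * 1024 bytes.
DEFAULT_MAX_CONTENT_LENGTH = 100 * 1024 * 1024


def serialized_size(doc):
    """Return the size in bytes of `doc` once encoded as UTF-8 JSON."""
    return len(json.dumps(doc, ensure_ascii=False).encode("utf-8"))


def fits_in_request(doc, limit=DEFAULT_MAX_CONTENT_LENGTH):
    """Return True if the JSON form of `doc` is no larger than `limit` bytes."""
    return serialized_size(doc) <= limit
```

A document that fails this check would be rejected unless the setting is
raised, and as noted above, raising it does not remove the Lucene limit.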

Even without considering hard limits, large documents are usually not
practical. Large documents put more stress on the network, memory, and disk,
even for search requests that do not request the `_source`, because Elasticsearch
needs to fetch the `_id` of the document in all cases, and the cost of getting
this field is higher for large documents due to how the filesystem cache works.
Indexing a large document can use an amount of memory that is a multiple of the
original size of the document. Proximity search (phrase queries, for instance)
and <<highlighting,highlighting>> also become more expensive
because their cost depends directly on the size of the original document.

It is sometimes useful to reconsider what the unit of information should be.
For instance, the fact that you want to make books searchable doesn't necessarily
mean that a document should consist of a whole book. It might be a better idea
to use chapters or even paragraphs as documents, and then have a property in
these documents that identifies which book they belong to. This not only
avoids the issues with large documents, it also makes the search experience
better. For instance, if a user searches for two words `foo` and `bar`, a match
across different chapters is probably a poor one, while a match within the same
paragraph is likely good.
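
One possible way to produce paragraph-sized documents is sketched below in
Python. The field names (`book_id`, `chapter`, `paragraph`, `text`) and the
helper name are hypothetical, and the sketch assumes that paragraphs within a
chapter are separated by blank lines.

```python
def paragraph_documents(book_id, chapters):
    """Yield one small document per paragraph instead of one per book."""
    for chapter_number, chapter_text in enumerate(chapters, start=1):
        # Assumption: paragraphs are separated by a blank line.
        paragraphs = [p.strip() for p in chapter_text.split("\n\n") if p.strip()]
        for paragraph_number, text in enumerate(paragraphs, start=1):
            yield {
                "book_id": book_id,          # identifies the parent book
                "chapter": chapter_number,
                "paragraph": paragraph_number,
                "text": text,
            }
```

Each yielded dictionary can then be indexed as its own document, and the
`book_id` property can be used to group or filter results by book.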