From bbdf50f6bdd06d53a3cb9f58d606628f92eccda9 Mon Sep 17 00:00:00 2001
From: Adrien Grand
Date: Thu, 1 Jun 2017 17:23:22 +0200
Subject: [PATCH] Docs: More search speed advices. (#24802)

---
 docs/reference/how-to/search-speed.asciidoc   | 16 +++++++++++++
 .../index-modules/index-sorting.asciidoc      | 24 +++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/docs/reference/how-to/search-speed.asciidoc b/docs/reference/how-to/search-speed.asciidoc
index 2d0525a48e8..42a03dd8fd2 100644
--- a/docs/reference/how-to/search-speed.asciidoc
+++ b/docs/reference/how-to/search-speed.asciidoc
@@ -310,3 +310,19 @@ setting.
 WARNING: Loading data into the filesystem cache eagerly on too many indices
 or too many files will make search _slower_ if the filesystem cache is not large
 enough to hold all the data. Use with caution.
+
+[float]
+=== Map identifiers as `keyword`
+
+When you have numeric identifiers in your documents, it is tempting to map them
+as numbers, which is consistent with their json type. However, the way that
+Elasticsearch indexes numbers optimizes for `range` queries while `keyword`
+fields are better at `term` queries. Since identifiers are never used in `range`
+queries, they should be mapped as a `keyword`.
+
+[float]
+=== Use index sorting to speed up conjunctions
+
+<> can be useful in order to make
+conjunctions faster at the cost of slightly slower indexing. Read more about it
+in the <>.
diff --git a/docs/reference/index-modules/index-sorting.asciidoc b/docs/reference/index-modules/index-sorting.asciidoc
index 0c2b5c9abe9..38b734546bc 100644
--- a/docs/reference/index-modules/index-sorting.asciidoc
+++ b/docs/reference/index-modules/index-sorting.asciidoc
@@ -105,3 +105,27 @@ Index sorting supports the following settings:
 [WARNING]
 Index sorting can be defined only once at index creation. It is not allowed to add or update
 a sort on an existing index.
+
+// TODO: Also document how index sorting can be used to early-terminate
+// sorted search requests when the total number of matches is not needed
+
+[[index-modules-index-sorting-conjunctions]]
+=== Use index sorting to speed up conjunctions
+
+Index sorting can be useful in order to organize Lucene doc ids (not to be
+conflated with `_id`) in a way that makes conjunctions (a AND b AND ...) more
+efficient. In order to be efficient, conjunctions rely on the fact that if any
+clause does not match, then the entire conjunction does not match. By using
+index sorting, we can put documents that do not match together, which will
+help skip efficiently over large ranges of doc IDs that do not match the
+conjunction.
+
+This trick only works with low-cardinality fields. A rule of thumb is that
+you should sort first on fields that both have a low cardinality and are
+frequently used for filtering. The sort order (`asc` or `desc`) does not
+matter as we only care about putting values that would match the same clauses
+close to each other.
+
+For instance if you were indexing cars for sale, it might be interesting to
+sort by fuel type, body type, make, year of registration and finally mileage.
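
To illustrate the cars-for-sale example from the patch, here is a minimal Python sketch, not part of the commit. It builds an index-creation body with the `index.sort.field` and `index.sort.order` settings, then uses a small in-memory list to show why sorting on low-cardinality fields groups matching documents together. The field names (`fuel_type`, `body_type`, `make`, `year`, `mileage`), the sample data and the helper `matching_runs` are all assumptions made for illustration. The list only mimics doc ID order and does not reproduce how Lucene actually evaluates conjunctions.

```python
import json

# Hypothetical field names for the "cars for sale" example.
SORT_FIELDS = ["fuel_type", "body_type", "make", "year", "mileage"]

# Index-creation body: low-cardinality, frequently filtered fields come first.
# The order is arbitrary here, since only the grouping of equal values matters.
index_body = {
    "settings": {
        "index": {
            "sort.field": SORT_FIELDS,
            "sort.order": ["asc"] * len(SORT_FIELDS),
        }
    }
}


def matching_runs(docs, **clauses):
    """Count contiguous runs of positions whose document matches every clause."""
    runs = 0
    previous_hit = False
    for doc in docs:
        hit = all(doc[field] == value for field, value in clauses.items())
        if hit and not previous_hit:
            runs += 1
        previous_hit = hit
    return runs


# Made-up documents in arrival order; list position stands in for the doc ID.
fuels = ["diesel", "petrol", "electric"]
bodies = ["suv", "sedan", "hatchback"]
cars = [
    {
        "fuel_type": fuels[i % 3],
        "body_type": bodies[(i // 3) % 3],
        "mileage": (i * 7919) % 100000,
    }
    for i in range(90)
]

# The same documents laid out as an index sorted on the low-cardinality fields.
sorted_cars = sorted(
    cars, key=lambda c: (c["fuel_type"], c["body_type"], c["mileage"])
)

unsorted_runs = matching_runs(cars, fuel_type="diesel", body_type="suv")
sorted_runs = matching_runs(sorted_cars, fuel_type="diesel", body_type="suv")

if __name__ == "__main__":
    print(json.dumps(index_body, indent=2))
    print("runs of matches, arrival order:", unsorted_runs)
    print("runs of matches, sorted order:", sorted_runs)
```

In this toy data, the matches for `fuel_type = diesel AND body_type = suv` are scattered across many separate positions in arrival order but form a single contiguous block once sorted. That contiguity is what lets a conjunction skip over large ranges of non-matching doc IDs.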