Add more information to the how-to docs. #20297

- use auto-generated ids for indexing #20211
- use rounded dates in queries #20115
2025-03-25 01:19:02 +00:00 · 2016-09-02 13:30:36 +02:00 · 2016-09-02 13:30:36 +02:00 · cdc27b75b8
commit cdc27b75b8
parent 28d7ebe8f8
2 changed files with 127 additions and 0 deletions
--- a/docs/reference/how-to/indexing-speed.asciidoc
+++ b/docs/reference/how-to/indexing-speed.asciidoc
@@ -67,6 +67,15 @@ The filesystem cache will be used in order to buffer I/O operations. You should
 make sure to give at least half the memory of the machine running elasticsearch
 to the filesystem cache.

+[float]
+=== Use auto-generated ids
+
+When indexing a document that has an explicit id, Elasticsearch needs to check
+whether a document with the same id already exists within the same shard, which
+is a costly operation and gets even more costly as the index grows. By using
+auto-generated ids, Elasticsearch can skip this check, which makes indexing
+faster.
+
 [float]
 === Use faster hardware

--- a/docs/reference/how-to/search-speed.asciidoc
+++ b/docs/reference/how-to/search-speed.asciidoc
@@ -140,6 +140,124 @@ being mapped as <<keyword,`keyword`>> rather than `integer` or `long`.
 In general, scripts should be avoided. If they are absolutely needed, you
 should prefer the `painless` and `expressions` engines.

+[float]
+=== Search rounded dates
+
+Queries on date fields that use `now` are typically not cacheable since the
+range that is being matched changes all the time. However, switching to a
+rounded date is often acceptable in terms of user experience, and has the
+benefit of making better use of the query cache.
+
+For instance, the query below:
+
+[source,js]
+--------------------------------------------------
+PUT index/type/1
+{
+  "my_date": "2016-05-11T16:30:55.328Z"
+}
+
+GET index/_search
+{
+  "query": {
+    "constant_score": {
+      "filter": {
+        "range": {
+          "my_date": {
+            "gte": "now-1h",
+            "lte": "now"
+          }
+        }
+      }
+    }
+  }
+}
+--------------------------------------------------
+// CONSOLE
+
+could be replaced with the following query:
+
+[source,js]
+--------------------------------------------------
+GET index/_search
+{
+  "query": {
+    "constant_score": {
+      "filter": {
+        "range": {
+          "my_date": {
+            "gte": "now-1h/m",
+            "lte": "now/m"
+          }
+        }
+      }
+    }
+  }
+}
+--------------------------------------------------
+// CONSOLE
+// TEST[continued]
+
+In this case we rounded to the minute, so if the current time is `16:31:29`,
+the range query will match every document whose `my_date` value is between
+`15:31:00` and `16:31:59`. And if several users run a query that contains this
+range in the same minute, the query cache could help speed things up a bit.
+The longer the interval that is used for rounding, the more the query cache
+can help, but beware that overly aggressive rounding might also hurt user
+experience.
+
+
+NOTE: It might be tempting to split ranges into a large cacheable part and
+smaller non-cacheable parts in order to be able to leverage the query cache,
+as shown below:
+
+[source,js]
+--------------------------------------------------
+GET index/_search
+{
+  "query": {
+    "constant_score": {
+      "filter": {
+        "bool": {
+          "should": [
+            {
+              "range": {
+                "my_date": {
+                  "gte": "now-1h",
+                  "lte": "now-1h/m"
+                }
+              }
+            },
+            {
+              "range": {
+                "my_date": {
+                  "gt": "now-1h/m",
+                  "lt": "now/m"
+                }
+              }
+            },
+            {
+              "range": {
+                "my_date": {
+                  "gte": "now/m",
+                  "lte": "now"
+                }
+              }
+            }
+          ]
+        }
+      }
+    }
+  }
+}
+--------------------------------------------------
+// CONSOLE
+// TEST[continued]
+
+However, such a practice might make the query run slower in some cases, since
+the overhead introduced by the `bool` query may defeat the savings from better
+leveraging the query cache.
+
 [float]
 === Force-merge read-only indices