Add more information to the how-to docs. #20297

- use auto-generated ids for indexing #20211
- use rounded dates in queries #20115

parent 28d7ebe8f8
commit cdc27b75b8
@@ -67,6 +67,15 @@ The filesystem cache will be used in order to buffer I/O operations. You should
make sure to give at least half the memory of the machine running Elasticsearch
to the filesystem cache.

[float]
=== Use auto-generated ids

When indexing a document that has an explicit id, Elasticsearch needs to check
whether a document with the same id already exists within the same shard. This
is a costly operation, and it gets even more costly as the index grows. By
using auto-generated ids, Elasticsearch can skip this check, which makes
indexing faster.

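As an illustration, here is a minimal Python sketch that builds a bulk request body whose `index` action lines carry no `_id`, leaving Elasticsearch to generate one for each document. The helper `build_bulk_body` and the `index`/`type` names are hypothetical, chosen to mirror the examples on this page, and the `_type` metadata assumes the same typed-index API those examples use. Actually sending the body to the bulk endpoint is left out.

```python
import json


def build_bulk_body(docs, index="index", doc_type="type", ids=None):
    """Return a newline-delimited body for the bulk API.

    `index` and `doc_type` are hypothetical names mirroring the examples on
    this page. When `ids` is None, no `_id` is sent and Elasticsearch
    auto-generates one for every document.
    """
    lines = []
    for i, doc in enumerate(docs):
        meta = {"_index": index, "_type": doc_type}
        if ids is not None:
            # An explicit id makes Elasticsearch check the shard for an
            # existing document with the same id before indexing.
            meta["_id"] = ids[i]
        lines.append(json.dumps({"index": meta}))  # action/metadata line
        lines.append(json.dumps(doc))              # document source line
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([{"my_date": "2016-05-11T16:30:55.328Z"}])
print(body, end="")
# {"index": {"_index": "index", "_type": "type"}}
# {"my_date": "2016-05-11T16:30:55.328Z"}
```

Passing `ids=[...]` reproduces the explicit-id case, which is the one that triggers the existing-id check described above.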

[float]
=== Use faster hardware

@@ -140,6 +140,124 @@ being mapped as <<keyword,`keyword`>> rather than `integer` or `long`.
In general, scripts should be avoided. If they are absolutely needed, you
should prefer the `painless` and `expressions` engines.

[float]
=== Search rounded dates

Queries on date fields that use `now` are typically not cacheable, since the
range that is being matched changes all the time. However, switching to a
rounded date is often acceptable in terms of user experience, and has the
benefit of making better use of the query cache.

For instance, this query:

[source,js]
--------------------------------------------------
PUT index/type/1
{
  "my_date": "2016-05-11T16:30:55.328Z"
}

GET index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "my_date": {
            "gte": "now-1h",
            "lte": "now"
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE

could be replaced with the following query:

[source,js]
--------------------------------------------------
GET index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "my_date": {
            "gte": "now-1h/m",
            "lte": "now/m"
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

In this case we round to the minute, so if the current time is `16:31:29`,
the range query will match every document whose `my_date` value is between
`15:31:00` and `16:31:59`. The upper bound is `16:31:59` rather than
`16:31:00` because, with `lte`, `now/m` rounds up to the last millisecond of
the current minute. If several users run a query that contains this range
within the same minute, the query cache can help speed things up a bit. The
longer the interval used for rounding, the more the query cache can help, but
beware that rounding too aggressively might also hurt user experience.

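The arithmetic behind this example can be sketched in Python. This is an illustration, not Elasticsearch's own implementation: it assumes naive UTC `datetime` values and applies the date-math rounding that range queries use, where `gte` rounds `now-1h/m` down to the start of the minute and `lte` rounds `now/m` up to its last millisecond. The helper names are hypothetical.

```python
from datetime import datetime, timedelta

ONE_MINUTE = timedelta(minutes=1)
ONE_MILLI = timedelta(milliseconds=1)


def floor_to_minute(t):
    """First instant of the minute containing `t` (`/m` rounding down)."""
    return t.replace(second=0, microsecond=0)


def rounded_hour_range(now):
    """Inclusive bounds for "gte": "now-1h/m", "lte": "now/m".

    `gte` rounds down to the start of the minute; `lte` rounds up to the
    last millisecond of the minute, so the whole current minute matches.
    """
    lower = floor_to_minute(now - timedelta(hours=1))
    upper = floor_to_minute(now) + ONE_MINUTE - ONE_MILLI
    return lower, upper


lower, upper = rounded_hour_range(datetime(2016, 5, 11, 16, 31, 29))
print(lower.time(), upper.time())  # 15:31:00 16:31:59.999000
```

Any two requests issued during the same minute produce identical bounds, which is what lets the query cache reuse the result; the unrounded `now-1h` and `now` bounds would differ on every request.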

NOTE: It might be tempting to split ranges into a large cacheable part and
smaller non-cacheable parts in order to leverage the query cache, as shown
below:

[source,js]
--------------------------------------------------
GET index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "should": [
            {
              "range": {
                "my_date": {
                  "gte": "now-1h",
                  "lte": "now-1h/m"
                }
              }
            },
            {
              "range": {
                "my_date": {
                  "gt": "now-1h/m",
                  "lt": "now/m"
                }
              }
            },
            {
              "range": {
                "my_date": {
                  "gte": "now/m",
                  "lte": "now"
                }
              }
            }
          ]
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]

However, this practice might make the query run slower in some cases, since the
overhead introduced by the `bool` query may outweigh the savings from making
better use of the query cache.

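To see why the split query still matches exactly the same documents as `now-1h` to `now`, here is a hypothetical Python sketch that computes the inclusive millisecond bounds of the three `should` clauses. It assumes timestamps are plain integer milliseconds and follows Elasticsearch's documented rounding for range queries: `gt` and `lte` round up to the last millisecond of the minute, while `gte` and `lt` round down to its first millisecond. The helper names are illustrative only.

```python
MINUTE_MS = 60 * 1000
HOUR_MS = 60 * MINUTE_MS


def round_down(ms):
    """First millisecond of the minute containing `ms`."""
    return ms - ms % MINUTE_MS


def round_up(ms):
    """Last millisecond of the minute containing `ms`."""
    return round_down(ms) + MINUTE_MS - 1


def split_bounds(now):
    """Inclusive (lo, hi) bounds of the three `should` clauses, in ms."""
    start = now - HOUR_MS
    # "gte": "now-1h", "lte": "now-1h/m"  (lte rounds up)
    head = (start, round_up(start))
    # "gt": "now-1h/m", "lt": "now/m"  (gt rounds up, lt rounds down;
    # both bounds are exclusive, hence the +1 and -1)
    middle = (round_up(start) + 1, round_down(now) - 1)
    # "gte": "now/m", "lte": "now"  (gte rounds down)
    tail = (round_down(now), now)
    return head, middle, tail


now = (16 * 3600 + 31 * 60 + 29) * 1000  # 16:31:29 as ms since midnight
head, middle, tail = split_bounds(now)
# The clauses tile [now-1h, now] with no gap and no overlap.
assert head[0] == now - HOUR_MS and tail[1] == now
assert head[1] + 1 == middle[0] and middle[1] + 1 == tail[0]
```

Only the middle clause keeps the same bounds for a whole minute, so it is the only part the query cache can reuse; the head and tail still change on every request, and the `bool` query has to combine all three.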

[float]
=== Force-merge read-only indices
