421 lines
14 KiB
Plaintext
421 lines
14 KiB
Plaintext
[[tune-for-search-speed]]
|
|
== Tune for search speed
|
|
|
|
[float]
|
|
=== Give memory to the filesystem cache
|
|
|
|
Elasticsearch heavily relies on the filesystem cache in order to make search
|
|
fast. In general, you should make sure that at least half the available memory
|
|
goes to the filesystem cache so that Elasticsearch can keep hot regions of the
|
|
index in physical memory.
|
|
|
|
[float]
|
|
=== Use faster hardware
|
|
|
|
If your search is I/O bound, you should investigate giving more memory to the
|
|
filesystem cache (see above) or buying faster drives. In particular SSD drives
|
|
are known to perform better than spinning disks. Always use local storage,
|
|
remote filesystems such as `NFS` or `SMB` should be avoided. Also beware of
|
|
virtualized storage such as Amazon's `Elastic Block Storage`. Virtualized
|
|
storage works very well with Elasticsearch, and it is appealing since it is so
|
|
fast and simple to set up, but it is also unfortunately inherently slower on an
|
|
ongoing basis when compared to dedicated local storage. If you put an index on
|
|
`EBS`, be sure to use provisioned IOPS otherwise operations could be quickly
|
|
throttled.
|
|
|
|
If your search is CPU-bound, you should investigate buying faster CPUs.
|
|
|
|
[float]
|
|
=== Document modeling
|
|
|
|
Documents should be modeled so that search-time operations are as cheap as possible.
|
|
|
|
In particular, joins should be avoided. <<nested,`nested`>> can make queries
|
|
several times slower and <<mapping-parent-field,parent-child>> relations can make
|
|
queries hundreds of times slower. So if the same questions can be answered without
|
|
joins by denormalizing documents, significant speedups can be expected.
|
|
|
|
[float]
|
|
=== Search as few fields as possible
|
|
|
|
The more fields a <<query-dsl-query-string-query,`query_string`>> or
|
|
<<query-dsl-multi-match-query,`multi_match`>> query targets, the slower it is.
|
|
A common technique to improve search speed over multiple fields is to copy
|
|
their values into a single field at index time, and then use this field at
|
|
search time. This can be automated with the <<copy-to,`copy-to`>> directive of
|
|
mappings without having to change the source of documents. Here is an example
|
|
of an index containing movies that optimizes queries that search over both the
|
|
name and the plot of the movie by indexing both values into the `name_and_plot`
|
|
field.
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
PUT movies
|
|
{
|
|
"mappings": {
|
|
"properties": {
|
|
"name_and_plot": {
|
|
"type": "text"
|
|
},
|
|
"name": {
|
|
"type": "text",
|
|
"copy_to": "name_and_plot"
|
|
},
|
|
"plot": {
|
|
"type": "text",
|
|
"copy_to": "name_and_plot"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
[float]
|
|
=== Pre-index data
|
|
|
|
You should leverage patterns in your queries to optimize the way data is indexed.
|
|
For instance, if all your documents have a `price` field and most queries run
|
|
<<search-aggregations-bucket-range-aggregation,`range`>> aggregations on a fixed
|
|
list of ranges, you could make this aggregation faster by pre-indexing the ranges
|
|
into the index and using a <<search-aggregations-bucket-terms-aggregation,`terms`>>
|
|
aggregations.
|
|
|
|
For instance, if documents look like:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
PUT index/_doc/1
|
|
{
|
|
"designation": "spoon",
|
|
"price": 13
|
|
}
|
|
--------------------------------------------------
|
|
|
|
and search requests look like:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET index/_search
|
|
{
|
|
"aggs": {
|
|
"price_ranges": {
|
|
"range": {
|
|
"field": "price",
|
|
"ranges": [
|
|
{ "to": 10 },
|
|
{ "from": 10, "to": 100 },
|
|
{ "from": 100 }
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[continued]
|
|
|
|
Then documents could be enriched by a `price_range` field at index time, which
|
|
should be mapped as a <<keyword,`keyword`>>:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
PUT index
|
|
{
|
|
"mappings": {
|
|
"properties": {
|
|
"price_range": {
|
|
"type": "keyword"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
|
|
PUT index/_doc/1
|
|
{
|
|
"designation": "spoon",
|
|
"price": 13,
|
|
"price_range": "10-100"
|
|
}
|
|
--------------------------------------------------
|
|
|
|
And then search requests could aggregate this new field rather than running a
|
|
`range` aggregation on the `price` field.
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET index/_search
|
|
{
|
|
"aggs": {
|
|
"price_ranges": {
|
|
"terms": {
|
|
"field": "price_range"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[continued]
|
|
|
|
[float]
|
|
[[map-ids-as-keyword]]
|
|
=== Consider mapping identifiers as `keyword`
|
|
|
|
include::../mapping/types/numeric.asciidoc[tag=map-ids-as-keyword]
|
|
|
|
[float]
|
|
=== Avoid scripts
|
|
|
|
In general, scripts should be avoided. If they are absolutely needed, you
|
|
should prefer the `painless` and `expressions` engines.
|
|
|
|
[float]
|
|
=== Search rounded dates
|
|
|
|
Queries on date fields that use `now` are typically not cacheable since the
|
|
range that is being matched changes all the time. However switching to a
|
|
rounded date is often acceptable in terms of user experience, and has the
|
|
benefit of making better use of the query cache.
|
|
|
|
For instance the below query:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
PUT index/_doc/1
|
|
{
|
|
"my_date": "2016-05-11T16:30:55.328Z"
|
|
}
|
|
|
|
GET index/_search
|
|
{
|
|
"query": {
|
|
"constant_score": {
|
|
"filter": {
|
|
"range": {
|
|
"my_date": {
|
|
"gte": "now-1h",
|
|
"lte": "now"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
could be replaced with the following query:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET index/_search
|
|
{
|
|
"query": {
|
|
"constant_score": {
|
|
"filter": {
|
|
"range": {
|
|
"my_date": {
|
|
"gte": "now-1h/m",
|
|
"lte": "now/m"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[continued]
|
|
|
|
In that case we rounded to the minute, so if the current time is `16:31:29`,
|
|
the range query will match everything whose value of the `my_date` field is
|
|
between `15:31:00` and `16:31:59`. And if several users run a query that
|
|
contains this range in the same minute, the query cache could help speed things
|
|
up a bit. The longer the interval that is used for rounding, the more the query
|
|
cache can help, but beware that too aggressive rounding might also hurt user
|
|
experience.
|
|
|
|
|
|
NOTE: It might be tempting to split ranges into a large cacheable part and
|
|
smaller not cacheable parts in order to be able to leverage the query cache,
|
|
as shown below:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET index/_search
|
|
{
|
|
"query": {
|
|
"constant_score": {
|
|
"filter": {
|
|
"bool": {
|
|
"should": [
|
|
{
|
|
"range": {
|
|
"my_date": {
|
|
"gte": "now-1h",
|
|
"lte": "now-1h/m"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"range": {
|
|
"my_date": {
|
|
"gt": "now-1h/m",
|
|
"lt": "now/m"
|
|
}
|
|
}
|
|
},
|
|
{
|
|
"range": {
|
|
"my_date": {
|
|
"gte": "now/m",
|
|
"lte": "now"
|
|
}
|
|
}
|
|
}
|
|
]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[continued]
|
|
|
|
However such practice might make the query run slower in some cases since the
|
|
overhead introduced by the `bool` query may defeat the savings from better
|
|
leveraging the query cache.
|
|
|
|
[float]
|
|
=== Force-merge read-only indices
|
|
|
|
Indices that are read-only may benefit from being <<indices-forcemerge,merged
|
|
down to a single segment>>. This is typically the case with time-based indices:
|
|
only the index for the current time frame is getting new documents while older
|
|
indices are read-only. Shards that have been force-merged into a single segment
|
|
can use simpler and more efficient data structures to perform searches.
|
|
|
|
IMPORTANT: Do not force-merge indices to which you are still writing, or to
|
|
which you will write again in the future. Instead, rely on the automatic
|
|
background merge process to perform merges as needed to keep the index running
|
|
smoothly. If you continue to write to a force-merged index then its performance
|
|
may become much worse.
|
|
|
|
[float]
|
|
=== Warm up global ordinals
|
|
|
|
Global ordinals are a data-structure that is used in order to run
|
|
<<search-aggregations-bucket-terms-aggregation,`terms`>> aggregations on
|
|
<<keyword,`keyword`>> fields. They are loaded lazily in memory because
|
|
Elasticsearch does not know which fields will be used in `terms` aggregations
|
|
and which fields won't. You can tell Elasticsearch to load global ordinals
|
|
eagerly when starting or refreshing a shard by configuring mappings as
|
|
described below:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
PUT index
|
|
{
|
|
"mappings": {
|
|
"properties": {
|
|
"foo": {
|
|
"type": "keyword",
|
|
"eager_global_ordinals": true
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
|
|
[float]
|
|
=== Warm up the filesystem cache
|
|
|
|
If the machine running Elasticsearch is restarted, the filesystem cache will be
|
|
empty, so it will take some time before the operating system loads hot regions
|
|
of the index into memory so that search operations are fast. You can explicitly
|
|
tell the operating system which files should be loaded into memory eagerly
|
|
depending on the file extension using the
|
|
<<preload-data-to-file-system-cache,`index.store.preload`>> setting.
|
|
|
|
WARNING: Loading data into the filesystem cache eagerly on too many indices or
|
|
too many files will make search _slower_ if the filesystem cache is not large
|
|
enough to hold all the data. Use with caution.
|
|
|
|
[float]
|
|
=== Use index sorting to speed up conjunctions
|
|
|
|
<<index-modules-index-sorting,Index sorting>> can be useful in order to make
|
|
conjunctions faster at the cost of slightly slower indexing. Read more about it
|
|
in the <<index-modules-index-sorting-conjunctions,index sorting documentation>>.
|
|
|
|
[float]
|
|
[[preference-cache-optimization]]
|
|
=== Use `preference` to optimize cache utilization
|
|
|
|
There are multiple caches that can help with search performance, such as the
|
|
https://en.wikipedia.org/wiki/Page_cache[filesystem cache], the
|
|
<<shard-request-cache,request cache>> or the <<query-cache,query cache>>. Yet
|
|
all these caches are maintained at the node level, meaning that if you run the
|
|
same request twice in a row, have 1 <<glossary-replica-shard,replica>> or more
|
|
and use https://en.wikipedia.org/wiki/Round-robin_DNS[round-robin], the default
|
|
routing algorithm, then those two requests will go to different shard copies,
|
|
preventing node-level caches from helping.
|
|
|
|
Since it is common for users of a search application to run similar requests
|
|
one after another, for instance in order to analyze a narrower subset of the
|
|
index, using a preference value that identifies the current user or session
|
|
could help optimize usage of the caches.
|
|
|
|
[float]
|
|
=== Replicas might help with throughput, but not always
|
|
|
|
In addition to improving resiliency, replicas can help improve throughput. For
|
|
instance if you have a single-shard index and three nodes, you will need to
|
|
set the number of replicas to 2 in order to have 3 copies of your shard in
|
|
total so that all nodes are utilized.
|
|
|
|
Now imagine that you have a 2-shards index and two nodes. In one case, the
|
|
number of replicas is 0, meaning that each node holds a single shard. In the
|
|
second case the number of replicas is 1, meaning that each node has two shards.
|
|
Which setup is going to perform best in terms of search performance? Usually,
|
|
the setup that has fewer shards per node in total will perform better. The
|
|
reason for that is that it gives a greater share of the available filesystem
|
|
cache to each shard, and the filesystem cache is probably Elasticsearch's
|
|
number 1 performance factor. At the same time, beware that a setup that does
|
|
not have replicas is subject to failure in case of a single node failure, so
|
|
there is a trade-off between throughput and availability.
|
|
|
|
So what is the right number of replicas? If you have a cluster that has
|
|
`num_nodes` nodes, `num_primaries` primary shards _in total_ and if you want to
|
|
be able to cope with `max_failures` node failures at once at most, then the
|
|
right number of replicas for you is
|
|
`max(max_failures, ceil(num_nodes / num_primaries) - 1)`.
|
|
|
|
=== Tune your queries with the Profile API
|
|
|
|
You can also analyse how expensive each component of your queries and
|
|
aggregations are using the {ref}/search-profile.html[Profile API]. This might
|
|
allow you to tune your queries to be less expensive, resulting in a positive
|
|
performance result and reduced load. Also note that Profile API payloads can be
|
|
easily visualised for better readability in the
|
|
{kibana-ref}/xpack-profiler.html[Search Profiler], which is a Kibana dev tools
|
|
UI available in all X-Pack licenses, including the free X-Pack Basic license.
|
|
|
|
Some caveats to the Profile API are that:
|
|
|
|
- the Profile API as a debugging tool adds significant overhead to search execution and can also have a very verbose output
|
|
- given the added overhead, the resulting took times are not reliable indicators of actual took time, but can be used comparatively between clauses for relative timing differences
|
|
- the Profile API is best for exploring possible reasons behind the most costly clauses of a query but isn't intended for accurately measuring absolute timings of each clause
|
|
|
|
[[faster-phrase-queries]]
|
|
=== Faster phrase queries with `index_phrases`
|
|
|
|
The <<text,`text`>> field has an <<index-phrases,`index_phrases`>> option that
|
|
indexes 2-shingles and is automatically leveraged by query parsers to run phrase
|
|
queries that don't have a slop. If your use-case involves running lots of phrase
|
|
queries, this can speed up queries significantly.
|
|
|
|
[[faster-prefix-queries]]
|
|
=== Faster prefix queries with `index_prefixes`
|
|
|
|
The <<text,`text`>> field has an <<index-phrases,`index_prefixes`>> option that
|
|
indexes prefixes of all terms and is automatically leveraged by query parsers to
|
|
run prefix queries. If your use-case involves running lots of prefix queries,
|
|
this can speed up queries significantly.
|