OpenSearch/docs/reference/search/near-real-time.asciidoc

26 lines
2.4 KiB
Plaintext
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

[[near-real-time]]
== Near real-time search
The overview of <<documents-indices,documents and indices>> indicates that when a document is stored in {es}, it is indexed and fully searchable in _near real-time_--within 1 second. What defines near real-time search?
Lucene, the Java libraries on which {es} is based, introduced the concept of per-segment search. A _segment_ is similar to an inverted index, but the word _index_ in Lucene means "a collection of segments plus a commit point". After a commit, a new segment is added to the commit point and the buffer is cleared.
Sitting between {es} and the disk is the filesystem cache. Documents in the in-memory indexing buffer (<<img-pre-refresh,Figure 1>>) are written to a new segment (<<img-post-refresh,Figure 2>>). The new segment is written to the filesystem cache first (which is cheap) and only later is it flushed to disk (which is expensive). However, after a file is in the cache, it can be opened and read just like any other file.
[[img-pre-refresh]]
.A Lucene index with new documents in the in-memory buffer
image::images/lucene-in-memory-buffer.png["A Lucene index with new documents in the in-memory buffer"]
Lucene allows new segments to be written and opened, making the documents they contain visible to search without performing a full commit. This is a much lighter process than a commit to disk, and can be done frequently without degrading performance.
[[img-post-refresh]]
.The buffer contents are written to a segment, which is searchable, but is not yet committed
image::images/lucene-written-not-committed.png["The buffer contents are written to a segment, which is searchable, but is not yet committed"]
In {es}, this process of writing and opening a new segment is called a _refresh_. A refresh makes all operations performed on an index since the last refresh available for search. You can control refreshes through the following means:
* Waiting for the refresh interval
* Setting the <<docs-refresh,?refresh>> option
* Using the <<indices-refresh,Refresh API>> to explicitly complete a refresh (`POST _refresh`)
By default, {es} periodically refreshes indices every second, but only on indices that have received one search request or more in the last 30 seconds. This is why we say that {es} has _near_ real-time search: document changes are not visible to search immediately, but will become visible within this timeframe.