[[near-real-time]]
== Near real-time search
The overview of <<documents-indices,documents and indices>> indicates that when a document is stored in {es}, it is indexed and fully searchable in _near real-time_ (within 1 second). What defines near real-time search?

Lucene, the Java library on which {es} is based, introduced the concept of per-segment search. A _segment_ is similar to an inverted index, but the word _index_ in Lucene means "a collection of segments plus a commit point". After a commit, a new segment is added to the commit point and the in-memory indexing buffer is cleared.

Sitting between {es} and the disk is the filesystem cache. Documents in the in-memory indexing buffer (<<img-pre-refresh,Figure 1>>) are written to a new segment (<<img-post-refresh,Figure 2>>). The new segment is written to the filesystem cache first, which is cheap, and only later flushed to disk, which is expensive. Once a file is in the cache, however, it can be opened and read just like any other file.

[[img-pre-refresh]]
.A Lucene index with new documents in the in-memory buffer
image::images/lucene-in-memory-buffer.png["A Lucene index with new documents in the in-memory buffer"]

Lucene allows new segments to be written and opened, making the documents they contain visible to search without performing a full commit. This is a much lighter process than a commit to disk, and can be done frequently without degrading performance.

[[img-post-refresh]]
.The buffer contents are written to a segment, which is searchable, but is not yet committed
image::images/lucene-written-not-committed.png["The buffer contents are written to a segment, which is searchable, but is not yet committed"]

In {es}, this process of writing and opening a new segment is called a _refresh_. A refresh makes all operations performed on an index since the last refresh available for search. You can control refreshes in the following ways:

* Waiting for the refresh interval
* Setting the <<docs-refresh,?refresh>> option
* Using the <<indices-refresh,Refresh API>> to explicitly trigger a refresh (`POST _refresh`)
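
As a rough illustration of the three options above, the following Python sketch builds the corresponding HTTP requests with the standard library, without sending them. The base URL `http://localhost:9200`, the index name `my-index`, and the helper function names are assumptions made for this example; the `index.refresh_interval` setting, the `refresh` query parameter, and the `_refresh` endpoint are the documented mechanisms.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumed local cluster address
INDEX = "my-index"                  # hypothetical index name
JSON_HEADERS = {"Content-Type": "application/json"}


def set_refresh_interval(interval):
    """Change how often the index is refreshed in the background, e.g. "30s"."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return Request(f"{BASE_URL}/{INDEX}/_settings", data=body,
                   headers=JSON_HEADERS, method="PUT")


def index_with_refresh(doc_id, doc, refresh="wait_for"):
    """Index a document and control its visibility through the refresh parameter."""
    body = json.dumps(doc).encode("utf-8")
    return Request(f"{BASE_URL}/{INDEX}/_doc/{doc_id}?refresh={refresh}", data=body,
                   headers=JSON_HEADERS, method="PUT")


def explicit_refresh():
    """Ask for an immediate refresh of the index through the Refresh API."""
    return Request(f"{BASE_URL}/{INDEX}/_refresh", method="POST")


settings_request = set_refresh_interval("30s")
index_request = index_with_refresh("1", {"title": "near real-time"})
refresh_request = explicit_refresh()
```

Each returned object can be passed to `urllib.request.urlopen` to send it to a running cluster. With `refresh=wait_for`, the indexing request returns only after a refresh has made the document visible to search.
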

By default, {es} refreshes indices every second, but only indices that have received at least one search request in the last 30 seconds. This is why we say that {es} has _near_ real-time search: document changes are not visible to search immediately, but become visible within this timeframe.
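
The default policy described above can be modeled as a simple predicate. This is a simplified sketch and not how {es} schedules refreshes internally: the function name, the constants, and the use of plain timestamps in seconds are all assumptions made for illustration.

```python
REFRESH_INTERVAL_S = 1.0     # default refresh interval stated above
SEARCH_IDLE_AFTER_S = 30.0   # window in which a search keeps periodic refreshes active


def periodic_refresh_due(now, last_refresh, last_search):
    """Return True if a background refresh should run at time `now` (seconds).

    `last_search` is None when the index has never been searched.
    """
    searched_recently = (
        last_search is not None and (now - last_search) <= SEARCH_IDLE_AFTER_S
    )
    interval_elapsed = (now - last_refresh) >= REFRESH_INTERVAL_S
    return searched_recently and interval_elapsed


# Searched 10 seconds ago, last refreshed 1 second ago: a refresh is due.
active_index = periodic_refresh_due(now=100.0, last_refresh=99.0, last_search=90.0)
# Searched 40 seconds ago: the index is treated as idle, so no periodic refresh.
idle_index = periodic_refresh_due(now=100.0, last_refresh=99.0, last_search=60.0)
```

In this model an index that is written to but never searched skips periodic refreshes entirely, which avoids creating segments that no search would read.
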