diff --git a/docs/reference/docs/refresh.asciidoc b/docs/reference/docs/refresh.asciidoc index c5b19903dac..479e6e8cf26 100644 --- a/docs/reference/docs/refresh.asciidoc +++ b/docs/reference/docs/refresh.asciidoc @@ -31,15 +31,15 @@ visible at some point after the request returns. [float] ==== Choosing which setting to use - -Unless you have a good reason to wait for the change to become visible always -use `refresh=false`, or, because that is the default, just leave the `refresh` -parameter out of the URL. That is the simplest and fastest choice. +// tag::refresh-default[] +Unless you have a good reason to wait for the change to become visible, always +use `refresh=false` (the default setting). The simplest and fastest choice is to omit the `refresh` parameter from the URL. If you absolutely must have the changes made by a request visible synchronously -with the request then you must pick between putting more load on -Elasticsearch (`true`) and waiting longer for the response (`wait_for`). Here -are a few points that should inform that decision: +with the request, you must choose between putting more load on +Elasticsearch (`true`) and waiting longer for the response (`wait_for`). +// end::refresh-default[] +Here are a few points that should inform that decision: * The more changes being made to the index the more work `wait_for` saves compared to `true`. 
In the case that the index is only changed once every diff --git a/docs/reference/images/lucene-in-memory-buffer.png b/docs/reference/images/lucene-in-memory-buffer.png new file mode 100644 index 00000000000..37674886183 Binary files /dev/null and b/docs/reference/images/lucene-in-memory-buffer.png differ diff --git a/docs/reference/images/lucene-written-not-committed.png b/docs/reference/images/lucene-written-not-committed.png new file mode 100644 index 00000000000..9d295fb412f Binary files /dev/null and b/docs/reference/images/lucene-written-not-committed.png differ diff --git a/docs/reference/intro.asciidoc b/docs/reference/intro.asciidoc index bf103ee26e1..6e9fc6999c2 100644 --- a/docs/reference/intro.asciidoc +++ b/docs/reference/intro.asciidoc @@ -7,9 +7,9 @@ the {stack}. {ls} and {beats} facilitate collecting, aggregating, and enriching your data and storing it in {es}. {kib} enables you to interactively explore, visualize, and share insights into your data and manage and monitor the stack. {es} is where the indexing, search, and analysis -magic happen. +magic happens. -{es} provides real-time search and analytics for all types of data. Whether you +{es} provides near real-time search and analytics for all types of data. Whether you have structured or unstructured text, numerical data, or geospatial data, {es} can efficiently store and index it in a way that supports fast searches. You can go far beyond simple data retrieval and aggregate information to discover @@ -43,8 +43,7 @@ as JSON documents. When you have multiple {es} nodes in a cluster, stored documents are distributed across the cluster and can be accessed immediately from any node. -When a document is stored, it is indexed and fully searchable in near -real-time--within 1 second. {es} uses a data structure called an +When a document is stored, it is indexed and fully searchable in <>--within 1 second. {es} uses a data structure called an inverted index that supports very fast full-text searches. 
An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in. diff --git a/docs/reference/search/index.asciidoc b/docs/reference/search/index.asciidoc index 57618627187..628765df31b 100644 --- a/docs/reference/search/index.asciidoc +++ b/docs/reference/search/index.asciidoc @@ -14,7 +14,7 @@ Depending on your data, you can use a query to get answers to questions like: * What pages on my website contain a specific word or phrase? * What processes on my server take longer than 500 milliseconds to respond? * What users on my network ran `regsvr32.exe` within the last week? -* How many of my products have a price greater than $20? +* How many of my products have a price greater than $20? A _search_ consists of one or more queries that are combined and sent to {es}. Documents that match a search's queries are returned in the _hits_, or @@ -29,11 +29,13 @@ a specific number of results. === In this section * <> +* <> * <> * <> -- include::run-a-search.asciidoc[] +include::{es-repo-dir}/search/near-real-time.asciidoc[] include::{es-repo-dir}/async-search.asciidoc[] -include::{es-repo-dir}/modules/cross-cluster-search.asciidoc[] \ No newline at end of file +include::{es-repo-dir}/modules/cross-cluster-search.asciidoc[] diff --git a/docs/reference/search/near-real-time.asciidoc b/docs/reference/search/near-real-time.asciidoc new file mode 100644 index 00000000000..fe24a593cff --- /dev/null +++ b/docs/reference/search/near-real-time.asciidoc @@ -0,0 +1,25 @@ +[[near-real-time]] +== Near real-time search +The overview of <> indicates that when a document is stored in {es}, it is indexed and fully searchable in _near real-time_--within 1 second. What defines near real-time search? + +Lucene, the Java library on which {es} is based, introduced the concept of per-segment search. A _segment_ is similar to an inverted index, but the word _index_ in Lucene means "a collection of segments plus a commit point".
After a commit, a new segment is added to the commit point and the buffer is cleared. + +Sitting between {es} and the disk is the filesystem cache. Documents in the in-memory indexing buffer (<>) are written to a new segment (<>). The new segment is written to the filesystem cache first (which is cheap) and only later is it flushed to disk (which is expensive). However, once a file is in the cache, it can be opened and read just like any other file. + +[[img-pre-refresh]] +.A Lucene index with new documents in the in-memory buffer +image::images/lucene-in-memory-buffer.png["A Lucene index with new documents in the in-memory buffer"] + +Lucene allows new segments to be written and opened, making the documents they contain visible to search without performing a full commit. This is a much lighter process than a commit to disk, and can be done frequently without degrading performance. + +[[img-post-refresh]] +.The buffer contents are written to a segment, which is searchable, but is not yet committed +image::images/lucene-written-not-committed.png["The buffer contents are written to a segment, which is searchable, but is not yet committed"] + +In {es}, this process of writing and opening a new segment is called a _refresh_. A refresh makes all operations performed on an index since the last refresh available for search. You can control refreshes through the following means: + +* Waiting for the refresh interval +* Setting the <> option +* Using the <> to explicitly complete a refresh (`POST _refresh`) + +By default, {es} periodically refreshes indices every second, but only on indices that have received at least one search request in the last 30 seconds. This is why we say that {es} has _near_ real-time search: document changes are not visible to search immediately, but will become visible within this timeframe.
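To make the buffer, refresh, and commit steps concrete, the following Python sketch is a toy model of the behavior described above. It is not Lucene or {es} code: the class `ToyIndex`, its methods, and the constants `REFRESH_INTERVAL` and `SEARCH_IDLE_AFTER` are hypothetical names chosen for illustration. It assumes the defaults stated above (a one-second refresh interval, skipped for indices not searched in the last 30 seconds) and ignores the translog, segment merging, and what happens when a search-idle index is searched again.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical constants mirroring the defaults described in the text.
REFRESH_INTERVAL = 1.0    # seconds between periodic refreshes
SEARCH_IDLE_AFTER = 30.0  # skip periodic refreshes if no search in this many seconds


@dataclass
class ToyIndex:
    """Toy model only: an indexing buffer, searchable segments, and a commit point."""

    buffer: List[str] = field(default_factory=list)                # in-memory buffer, NOT searchable
    segments: List[Tuple[str, ...]] = field(default_factory=list)  # written to the "filesystem cache" and open for search
    durable_segments: int = 0    # number of segments covered by the last commit point
    last_refresh: float = 0.0
    last_search: float = 0.0

    def index(self, doc: str) -> None:
        # New documents only land in the buffer; search cannot see them yet.
        self.buffer.append(doc)

    def refresh(self, now: float) -> None:
        # Cheap step: write the buffer to a new segment and open it for search, without a commit.
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer.clear()
        self.last_refresh = now

    def commit(self, now: float) -> None:
        # Toy commit: clear the buffer into a segment, then record every segment in the commit point.
        self.refresh(now)
        self.durable_segments = len(self.segments)

    def tick(self, now: float) -> None:
        # Periodic refresh, skipped for indices that have not been searched recently.
        due = now - self.last_refresh >= REFRESH_INTERVAL
        recently_searched = now - self.last_search <= SEARCH_IDLE_AFTER
        if due and recently_searched:
            self.refresh(now)

    def search(self, term: str, now: float) -> List[str]:
        self.last_search = now
        # Per-segment search: look through each open segment; buffered documents are invisible.
        return [doc for seg in self.segments for doc in seg if term in doc]


if __name__ == "__main__":
    idx = ToyIndex()
    idx.search("anything", now=0.0)        # mark the index as recently searched
    idx.index("hello world")
    print(idx.search("hello", now=0.5))    # [] : indexed but not yet refreshed
    idx.tick(now=1.0)                      # periodic refresh opens a new segment
    print(idx.search("hello", now=1.1))    # ['hello world'] : searchable
    print(idx.durable_segments)            # 0 : searchable but not yet committed
```

The sketch separates `refresh` from `commit` on purpose: a document becomes searchable as soon as its segment is opened, while `durable_segments` only changes on the more expensive commit.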