diff --git a/docs/reference/search/profile.asciidoc b/docs/reference/search/profile.asciidoc
index 366af6d57c2..b62d83eee6b 100644
--- a/docs/reference/search/profile.asciidoc
+++ b/docs/reference/search/profile.asciidoc
@@ -247,14 +247,14 @@ The meaning of the stats are as follows:
 === All parameters:
 
 [horizontal]
-
 `create_weight`::
 
     A Query in Lucene must be capable of reuse across multiple IndexSearchers (think of it as the engine that
     executes a search against a specific Lucene Index). This puts Lucene in a tricky spot, since many queries
     need to accumulate temporary state/statistics associated with the index it is being used against, but the
     Query contract mandates that it must be immutable.
-
+    {empty} +
+    {empty} +
     To get around this, Lucene asks each query to generate a Weight object which acts as a temporary context
     object to hold state associated with this particular (IndexSearcher, Query) tuple. The `weight` metric
     shows how long this process takes
@@ -265,7 +265,8 @@ The meaning of the stats are as follows:
     iterates over matching documents generates a score per-document (e.g. how well does "foo" match the document?).
     Note, this records the time required to generate the Scorer object, not actuall score the documents. Some
     queries have faster or slower initialization of the Scorer, depending on optimizations, complexity, etc.
-
+    {empty} +
+    {empty} +
     This may also showing timing associated with caching, if enabled and/or applicable for the query
 
 `next_doc`::
@@ -280,7 +281,8 @@ The meaning of the stats are as follows:
     `advance` is the "lower level" version of next_doc: it serves the same purpose of finding the next matching
     doc, but requires the calling query to perform extra tasks such as identifying and moving past skips, etc.
     However, not all queries can use next_doc, so `advance` is also timed for those queries.
-
+    {empty} +
+    {empty} +
     Conjunctions (e.g. `must` clauses in a boolean) are typical consumers of `advance`
 
 `matches`::
@@ -288,18 +290,21 @@ The meaning of the stats are as follows:
     Some queries, such as phrase queries, match documents using a "Two Phase" process. First, the document is
     "approximately" matched, and if it matches approximately, it is checked a second time with a more rigorous
     (and expensive) process. The second phase verification is what the `matches` statistic measures.
-
+    {empty} +
+    {empty} +
     For example, a phrase query first checks a document approximately by ensuring all terms in the phrase are
     present in the doc. If all the terms are present, it then executes the second phase verification to ensure
     the terms are in-order to form the phrase, which is relatively more expensive than just checking for presence
     of the terms.
-
+    {empty} +
+    {empty} +
     Because this two-phase process is only used by a handful of queries, the `metric` statistic will often be zero
 
 `score`::
 
     This records the time taken to score a particular document via it's Scorer
 
+
 === `collectors` Section
 
 The Collectors portion of the response shows high-level execution details. Lucene works by defining a "Collector"
@@ -378,15 +383,15 @@ For reference, the various collector reason's are:
 
 === `rewrite` Section
 
-    All queries in Lucene undergo a "rewriting" process. A query (and its sub-queries) may be rewritten one or
-    more times, and the process continues until the query stops changing. This process allows Lucene to perform
-    optimizations, such as removing redundant clauses, replacing one query for a more efficient execution path,
-    etc. For example a Boolean -> Boolean -> TermQuery can be rewritten to a TermQuery, because all the Booleans
-    are unnecessary in this case.
+All queries in Lucene undergo a "rewriting" process. A query (and its sub-queries) may be rewritten one or
+more times, and the process continues until the query stops changing. This process allows Lucene to perform
+optimizations, such as removing redundant clauses, replacing one query for a more efficient execution path,
+etc. For example a Boolean -> Boolean -> TermQuery can be rewritten to a TermQuery, because all the Booleans
+are unnecessary in this case.
 
-    The rewriting process is complex and difficult to display, since queries can change drastically. Rather than
-    showing the intermediate results, the total rewrite time is simply displayed as a value (in nanoseconds). This
-    value is cumulative and contains the total time for all queries being rewritten.
+The rewriting process is complex and difficult to display, since queries can change drastically. Rather than
+showing the intermediate results, the total rewrite time is simply displayed as a value (in nanoseconds). This
+value is cumulative and contains the total time for all queries being rewritten.
 
 === A more complex example
 
@@ -553,7 +558,7 @@ represented:
 
 1. The first `TermQuery` (message:search) represents the main `term` query
 2. The second `TermQuery` (my_field:foo) represents the `post_filter` query
-3. There is a `MatchAllDocsQuery` (*:*) query which is being executed as a second, distinct search. This was
+3. There is a `MatchAllDocsQuery` (\*:*) query which is being executed as a second, distinct search. This was
 not part of the query specified by the user, but is auto-generated by the global aggregation to provide a global query scope
 
 The Collector tree is fairly straightforward, showing how a single MultiCollector wraps a FilteredCollector