The Profile API provides timing information about the execution of individual components of a search request. Using the Profile API, you can debug slow requests and understand how to improve their performance. The Profile API does not measure the following:
- Network latency
- Time spent in the search fetch phase
- Amount of time a request spends in queues
- Idle time while merging shard responses on the coordinating node
The Profile API is a resource-consuming operation that adds overhead to search operations.
{: .warning}
#### Example request
To use the Profile API, include the `profile` parameter set to `true` in the search request sent to the `_search` endpoint:
```json
GET /testindex/_search
{
"profile": true,
"query" : {
"match" : { "title" : "wind" }
}
}
```
{% include copy-curl.html %}
To turn on human-readable format, include the `?human=true` query parameter in the request:
```json
GET /testindex/_search?human=true
{
"profile": true,
"query" : {
"match" : { "title" : "wind" }
}
}
```
{% include copy-curl.html %}
The response contains an additional `time` field with human-readable units, for example:
```json
"collector": [
{
"name": "SimpleTopScoreDocCollector",
"reason": "search_top_hits",
"time": "113.7micros",
"time_in_nanos": 113711
}
]
```
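The relationship between `time_in_nanos` and the human-readable `time` field can be illustrated with a small conversion helper. This is a minimal sketch, not OpenSearch's own formatting code: the `format_nanos` function is hypothetical, and the one-decimal rounding and the `nanos`, `ms`, and `s` suffixes are assumptions based on the `micros` value shown in the example.

```python
def format_nanos(nanos: int) -> str:
    """Render a nanosecond count using the largest unit that keeps the value >= 1.

    Hypothetical helper: the rounding and unit suffixes are assumptions.
    """
    units = [
        (1_000_000_000, "s"),
        (1_000_000, "ms"),
        (1_000, "micros"),
    ]
    for factor, suffix in units:
        if nanos >= factor:
            return f"{nanos / factor:.1f}{suffix}"
    # Values below one microsecond are shown as whole nanoseconds.
    return f"{nanos}nanos"


print(format_nanos(113711))  # 113.7micros
```

Because the `time` string is rounded, use `time_in_nanos` when comparing or summing timings.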
The Profile API response is verbose, so if you're running the request through the `curl` command, include the `?pretty` query parameter to make the response easier to read.
{: .tip}
#### Example response
The response contains profiling information:
<details closed markdown="block">
<summary>
Response
</summary>
{: .text-delta}
```json
{
"took": 21,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.19363807,
"hits": [
{
"_index": "testindex",
"_id": "1",
"_score": 0.19363807,
"_source": {
"title": "The wind rises"
}
},
{
"_index": "testindex",
"_id": "2",
"_score": 0.17225474,
"_source": {
"title": "Gone with the wind",
"description": "A 1939 American epic historical film"
```
</details>
`profile.shards` | Array of objects | A search request can be executed against one or more shards in the index, and a search may involve one or more indexes. Thus, the `profile.shards` array contains profiling information for each shard that was involved in the search.
`profile.shards.id` | String | The shard ID of the shard in the `[node-ID][index-name][shard-ID]` format.
`profile.shards.searches` | Array of objects | A search represents a query executed against the underlying Lucene index. Most search requests execute a single search against a Lucene index, but some search requests can execute more than one search. For example, including a global aggregation results in a secondary `match_all` query for the global context. The `profile.shards.searches` array contains profiling information about each search execution.
`profile.shards.searches.rewrite_time` | Integer | All Lucene queries are rewritten. A query and its children may be rewritten more than once, until the query stops changing. The rewriting process involves performing optimizations, such as removing redundant clauses or replacing a query path with a more efficient one. After the rewriting process, the original query may change significantly. The `rewrite_time` field contains the cumulative total rewrite time for the query and all its children, in nanoseconds.
[`profile.shards.searches.collector`](#the-collector-array) | Array of objects | Profiling information about the Lucene collectors that ran the search.
[`profile.shards.aggregations`](#aggregations) | Array of objects | Profiling information about the aggregation execution.
`type` | String | The Lucene query type into which the search query was rewritten. Corresponds to the Lucene class name (which often has the same name in OpenSearch).
`description` | String | Contains a Lucene explanation of the query. Helps differentiate queries with the same type.
`time_in_nanos` | Long | The amount of time the query took to execute, in nanoseconds. In a parent query, the time is inclusive of the execution times of all the child queries.
`children` | Array of objects | If a query has subqueries (children), this field contains information about the subqueries.
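Because a parent's `time_in_nanos` includes the time of its children, it can help to subtract the child totals to see how much time each query spent on its own work. The following is a minimal sketch under stated assumptions: `self_times` is a hypothetical helper, the query fragment and its numbers are invented for illustration, and the subtraction assumes a non-concurrent search in which child times do not overlap.

```python
from typing import Iterator, Tuple


def self_times(node: dict, depth: int = 0) -> Iterator[Tuple[int, str, int]]:
    """Yield (depth, type, self_time_in_nanos) for a query node and its descendants."""
    children = node.get("children", [])
    # The parent's time is inclusive, so remove the children's share.
    child_total = sum(child["time_in_nanos"] for child in children)
    yield depth, node["type"], node["time_in_nanos"] - child_total
    for child in children:
        yield from self_times(child, depth + 1)


# Hypothetical profile fragment; the numbers are invented for illustration.
query = {
    "type": "BooleanQuery",
    "description": "+title:wind +title:rises",
    "time_in_nanos": 900,
    "children": [
        {"type": "TermQuery", "description": "title:wind", "time_in_nanos": 500},
        {"type": "TermQuery", "description": "title:rises", "time_in_nanos": 300},
    ],
}

for depth, query_type, self_nanos in self_times(query):
    print("  " * depth + f"{query_type}: {self_nanos} ns outside child queries")
```

In this invented fragment, the Boolean query itself accounts for only 100 ns, so the term queries are where the time goes.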
### The `breakdown` object
The `breakdown` object contains timing statistics for low-level Lucene execution, broken down by method. Timings are listed in wall-clock nanoseconds and are not normalized. The `breakdown` timings are inclusive of all child times. The `breakdown` object comprises the following fields. All fields contain integer values.
Field | Description
:--- | :---
`create_weight` | A `Query` object in Lucene is immutable. Yet, Lucene should be able to reuse `Query` objects in multiple `IndexSearcher` objects. Thus, `Query` objects need to keep temporary state and statistics associated with the index in which the query is executed. To achieve reuse, every `Query` object generates a `Weight` object, which keeps the temporary context (state) associated with the `<IndexSearcher, Query>` tuple. The `create_weight` field contains the amount of time spent creating the `Weight` object.
`build_scorer` | A `Scorer` iterates over matching documents and generates a score for each document. The `build_scorer` field contains the amount of time spent generating the `Scorer` object. This does not include the time spent scoring the documents. The `Scorer` initialization time depends on the optimization and complexity of a particular query. The `build_scorer` field also includes the amount of time associated with caching, if caching is applicable and enabled for the query.
`next_doc` | The `next_doc` Lucene method returns the document ID of the next document that matches the query. This method is a special type of the `advance` method and is equivalent to `advance(docID() + 1)`. The `next_doc` method is more convenient for many Lucene queries. The `next_doc` field contains the amount of time required to determine the next matching document, which varies depending on the query type.
`advance` | The `advance` method is a lower-level version of the `next_doc` method in Lucene. It also finds the next matching document but necessitates that the calling query perform additional tasks, such as identifying skips. Some queries, such as conjunctions (`must` clauses in Boolean queries), cannot use `next_doc`. For those queries, `advance` is timed.
`match` | For some queries, document matching is performed in two steps. First, the document is matched approximately. Second, those documents that are approximately matched are examined through a more comprehensive process. For example, a phrase query first checks whether a document contains all terms in the phrase. Next, it verifies that the terms are in order (which is a more expensive process). The `match` field is non-zero only for those queries that use the two-step verification process.
`score` | Contains the time taken for a `Scorer` to score a particular document.
`shallow_advance` | Contains the amount of time required to execute the `advanceShallow` Lucene method.
`compute_max_score` | Contains the amount of time required to execute the `getMaxScore` Lucene method.
`set_min_competitive_score` | Contains the amount of time required to execute the `setMinCompetitiveScore` Lucene method.
`<method>_count` | Contains the number of invocations of a `<method>`. For example, `advance_count` contains the number of invocations of the `advance` method. Different invocations of the same method occur because the method is called on different documents. You can determine the selectivity of a query by comparing counts in different query components.
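Pairing each timing field with its `<method>_count` field gives the average cost of a single invocation, and the counts can be compared across query components. The sketch below assumes a `breakdown` object shaped as described in the preceding table. The `per_call_averages` helper is hypothetical, and the values are invented.

```python
def per_call_averages(breakdown: dict) -> dict:
    """Map each method to (total_nanos, invocations, average_nanos_per_call)."""
    result = {}
    for name, count in breakdown.items():
        if not name.endswith("_count"):
            continue
        method = name[: -len("_count")]
        total = breakdown.get(method, 0)
        # Methods that were never invoked have no meaningful average.
        average = total / count if count else 0.0
        result[method] = (total, count, average)
    return result


# Hypothetical breakdown; the numbers are invented for illustration.
breakdown = {
    "create_weight": 40000, "create_weight_count": 1,
    "build_scorer": 30000, "build_scorer_count": 2,
    "next_doc": 1200, "next_doc_count": 3,
    "score": 900, "score_count": 2,
    "match": 0, "match_count": 0,
}

for method, (total, count, average) in per_call_averages(breakdown).items():
    print(f"{method}: {total} ns over {count} calls ({average:.1f} ns per call)")
```

A component with a much lower `next_doc_count` or `advance_count` than its siblings matched fewer documents, which is one way to read selectivity from the counts.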
### The `collector` array
The `collector` array contains information about Lucene Collectors. A Collector coordinates document traversal and scoring and collects the matching documents. Using Collectors, individual queries can record aggregation results and execute global queries or post-query filters.
Field | Description
:--- | :---
`name` | The collector name. In the [example response](#example-response), the `collector` is a single `SimpleTopScoreDocCollector`, which is the default scoring and sorting collector.
`reason` | Contains a description of the collector. For possible field values, see [Collector reasons](#collector-reasons).
`time_in_nanos` | The wall-clock time taken by the collector, in nanoseconds, including the time taken by all its children.
`children` | If a collector has subcollectors (children), this field contains information about the subcollectors.
Collector times are calculated, combined, and normalized independently, so they are independent of query times.
{: .note}
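Collectors can be nested through the `children` field, so a flat listing is often easier to scan than the raw JSON. The following sketch walks a `collector` array recursively. The `flatten_collectors` helper is hypothetical, the timings are invented, and `ExampleAggregationCollector` is a placeholder name rather than a real collector.

```python
def flatten_collectors(collectors: list, depth: int = 0) -> list:
    """Return (depth, name, reason, time_in_nanos) rows in depth-first order."""
    rows = []
    for collector in collectors:
        rows.append(
            (depth, collector["name"], collector["reason"], collector["time_in_nanos"])
        )
        rows.extend(flatten_collectors(collector.get("children", []), depth + 1))
    return rows


# Hypothetical collector array; names marked as placeholders are not real classes.
collectors = [
    {
        "name": "MultiCollector",
        "reason": "search_multi",
        "time_in_nanos": 5000,
        "children": [
            {
                "name": "SimpleTopScoreDocCollector",
                "reason": "search_top_hits",
                "time_in_nanos": 3000,
            },
            {
                "name": "ExampleAggregationCollector",  # placeholder name
                "reason": "aggregation",
                "time_in_nanos": 1500,
            },
        ],
    }
]

for depth, name, reason, nanos in flatten_collectors(collectors):
    print("  " * depth + f"{name} [{reason}]: {nanos} ns")
```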
#### Collector reasons
The following table describes all available collector reasons.
Reason | Description
:--- | :---
`search_top_hits` | A collector that scores and sorts documents. Present in most simple searches.
`search_count` | A collector that counts the number of matching documents but does not fetch the source. Present when `size: 0` is specified.
`search_terminate_after_count` | A collector that searches for matching documents and terminates the search when it finds a specified number of documents. Present when the `terminate_after` query parameter is specified.
`search_min_score` | A collector that returns matching documents that have a score greater than a minimum score. Present when the `min_score` parameter is specified.
`search_multi` | A wrapper collector for other collectors. Present when search, aggregations, global aggregations, and post filters are combined in a single search.
`search_timeout` | A collector that stops running after a specified period of time. Present when a `timeout` parameter is specified.
`aggregation` | A collector for aggregations that is run against the specified query scope. OpenSearch uses a single `aggregation` collector to collect documents for all aggregations.
`global_aggregation` | A collector that is run against the global query scope. Global scope is different from a specified query scope, so in order to collect the entire dataset, a `match_all` query must be run.
## Aggregations
To profile aggregations, send an aggregation request and provide the `profile` parameter set to `true`.
#### Example request: Global aggregation
```json
GET /opensearch_dashboards_sample_data_ecommerce/_search
```
The `aggregations` array contains aggregation objects with the following fields.
Field | Data type | Description
:--- | :--- | :---
`type` | String | The aggregator type. In the [non-global aggregation example response](#example-response-non-global-aggregation), the aggregator type is `AvgAggregator`. The [global aggregation example response](#example-request-global-aggregation) contains a `GlobalAggregator` with an `AvgAggregator` child.
`description` | String | Contains a Lucene explanation of the aggregation. Helps differentiate aggregations with the same type.
`time_in_nanos` | Long | The amount of time taken to execute the aggregation, in nanoseconds. In a parent aggregation, the time is inclusive of the execution times of all the child aggregations.
`children` | Array of objects | If an aggregation has subaggregations (children), this field contains information about the subaggregations.
`debug` | Object | Some aggregations return a `debug` object that describes the details of the underlying execution.
### The `breakdown` object
The `breakdown` object contains timing statistics for low-level Lucene execution, broken down by method. Each field in the `breakdown` object represents an internal Lucene method executed within the aggregation. Timings are listed in wall-clock nanoseconds and are not normalized. The `breakdown` timings are inclusive of all child times. The `breakdown` object comprises the following fields. All fields contain integer values.
Field | Description
:--- | :---
`initialize` | Contains the amount of time taken to execute the `preCollection()` callback method during `AggregationCollectorManager` creation.
`build_leaf_collector` | Contains the time spent running the `getLeafCollector()` method of the aggregation, which creates a new collector for the given context.
`collect` | Contains the time spent collecting the documents into buckets.
`post_collection` | Contains the time spent running the aggregation’s `postCollection()` callback method.
`build_aggregation` | Contains the time spent running the aggregation’s `buildAggregations()` method, which builds the results of this aggregation.
`reduce` | Contains the time spent in the `reduce` phase.
`<method>_count` | Contains the number of invocations of a `<method>`. For example, `build_leaf_collector_count` contains the number of invocations of the `build_leaf_collector` method.
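To see which phase dominates an aggregation, the timing fields of its `breakdown` object can be expressed as shares of their sum. This is a minimal sketch with invented values. The `phase_shares` helper is hypothetical and treats every field that does not end in `_count` as a timing.

```python
def phase_shares(breakdown: dict) -> list:
    """Return (method, share_of_total_time) pairs, largest share first."""
    timings = {
        name: value for name, value in breakdown.items() if not name.endswith("_count")
    }
    total = sum(timings.values())
    if total == 0:
        return []
    shares = [(name, value / total) for name, value in timings.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Hypothetical aggregation breakdown; the numbers are invented for illustration.
breakdown = {
    "initialize": 1000, "initialize_count": 1,
    "build_leaf_collector": 2000, "build_leaf_collector_count": 1,
    "collect": 6000, "collect_count": 4675,
    "post_collection": 500, "post_collection_count": 1,
    "build_aggregation": 500, "build_aggregation_count": 1,
    "reduce": 0, "reduce_count": 0,
}

for method, share in phase_shares(breakdown):
    print(f"{method}: {share:.0%}")
```

With these invented numbers, `collect` accounts for most of the time, which would point to the number of documents being bucketed rather than to setup or result building.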
## Concurrent segment search
Starting in OpenSearch 2.10, [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/) allows each shard-level request to search segments in parallel during the query phase. If you enable the experimental concurrent segment search feature flag, the Profile API response will contain several additional fields with statistics about _slices_.
A slice is the unit of work that can be executed by a thread. Each query can be partitioned into multiple slices, with each slice containing one or more segments. All the slices can be executed either in parallel or in some order depending on the available threads in the pool.
In general, the max/min/avg slice time captures statistics across all slices for a timing type. For example, when profiling aggregations, the `max_slice_time_in_nanos` field in the `aggregations` section shows the maximum time consumed by the aggregation operation and its children across all slices.
#### Example response
The following is an example response for a concurrent search with three segment slices:
|`time_in_nanos` | The total elapsed time for this query, in nanoseconds. For concurrent segment search, `time_in_nanos` is the total time spent across all the slices (the difference between the last completed slice execution end time and the first slice execution start time). |
|`max_slice_time_in_nanos` | The maximum amount of time taken by any slice to run a query, in nanoseconds. |
|`min_slice_time_in_nanos` | The minimum amount of time taken by any slice to run a query, in nanoseconds. |
|`avg_slice_time_in_nanos` | The average amount of time taken by a slice to run a query, in nanoseconds. |
|`breakdown.<method>` | For concurrent segment search, this field contains the total time spent running the method across all the slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `build_scorer` method, it is the total time spent constructing the `Scorer` object across all slices. |
|`breakdown.max_<method>` | The maximum amount of time taken by any slice to run a query method. Breakdown stats for the `create_weight` method do not include profiled `max` time because the method runs at the query level rather than the slice level. |
|`breakdown.min_<method>` | The minimum amount of time taken by any slice to run a query method. Breakdown stats for the `create_weight` method do not include profiled `min` time because the method runs at the query level rather than the slice level. |
|`breakdown.avg_<method>` | The average amount of time taken by a slice to run a query method. Breakdown stats for the `create_weight` method do not include profiled `avg` time because the method runs at the query level rather than the slice level. |
|`breakdown.<method>_count` | For concurrent segment search, this field contains the total number of invocations of a `<method>` obtained by adding the number of method invocations for all slices. |
|`breakdown.max_<method>_count` | The maximum number of invocations of a `<method>` on any slice. Breakdown stats for the `create_weight` method do not include profiled `max` count because the method runs at the query level rather than the slice level. |
|`breakdown.min_<method>_count` | The minimum number of invocations of a `<method>` on any slice. Breakdown stats for the `create_weight` method do not include profiled `min` count because the method runs at the query level rather than the slice level. |
|`breakdown.avg_<method>_count` | The average number of invocations of a `<method>` per slice. Breakdown stats for the `create_weight` method do not include profiled `avg` count because the method runs at the query level rather than the slice level. |
|`time_in_nanos` |The total elapsed time for this collector, in nanoseconds. For concurrent segment search, `time_in_nanos` is the total amount of time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). |
|`time_in_nanos` |The total elapsed time for this aggregation, in nanoseconds. For concurrent segment search, `time_in_nanos` is the total amount of time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). |
|`<method>` |The total elapsed time across all slices (the difference between the last completed slice execution end time and the first slice execution start time). For example, for the `collect` method, it is the total time spent collecting documents into buckets across all slices. |
|`max_<method>` |The maximum amount of time taken by any slice to run an aggregation method. |
|`min_<method>` |The minimum amount of time taken by any slice to run an aggregation method. |
|`avg_<method>` |The average amount of time taken by a slice to run an aggregation method. |
|`<method>_count` |The total method count across all slices. For example, for the `collect` method, it is the total number of invocations of this method needed to collect documents into buckets across all slices. |
|`max_<method>_count` |The maximum number of invocations of a `<method>` on any slice. |
|`min_<method>_count` |The minimum number of invocations of a `<method>` on any slice. |
|`avg_<method>_count` |The average number of invocations of a `<method>` per slice. |
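The slice statistics described above can be illustrated with a small calculation over per-slice start and end times. This is a sketch of the definitions only, not of how OpenSearch computes them internally. The `slice_stats` helper and the slice timestamps are hypothetical, and rounding the average down to a whole nanosecond is an assumption.

```python
def slice_stats(slices: list) -> dict:
    """Summarize slices given as (start_nanos, end_nanos) pairs."""
    durations = [end - start for start, end in slices]
    first_start = min(start for start, _ in slices)
    last_end = max(end for _, end in slices)
    return {
        # Elapsed time: last slice end minus first slice start.
        "time_in_nanos": last_end - first_start,
        "max_slice_time_in_nanos": max(durations),
        "min_slice_time_in_nanos": min(durations),
        # Integer division is an assumption made for this sketch.
        "avg_slice_time_in_nanos": sum(durations) // len(durations),
    }


# Three hypothetical slices that partly overlap in time.
slices = [(0, 400), (100, 700), (150, 450)]
print(slice_stats(slices))
```

Because slices can run in parallel, `time_in_nanos` (700 in this invented case) can be smaller than the sum of the individual slice times (1,300), so it reflects elapsed time rather than total work.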