diff --git a/_query-dsl/compound/hybrid.md b/_query-dsl/compound/hybrid.md index f317205c..cb3405da 100644 --- a/_query-dsl/compound/hybrid.md +++ b/_query-dsl/compound/hybrid.md @@ -12,7 +12,7 @@ You can use a hybrid query to combine relevance scores from multiple queries int ## Example -Before using a `hybrid` query, you must configure a search pipeline with a [`normalization_processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/) (see [this example]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor#example)). +Before using a `hybrid` query, you must configure a search pipeline with a [`normalization-processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/) (see [this example]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor#example)). To try out the example, follow the [Semantic search tutorial]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search#tutorial). diff --git a/_search-plugins/search-pipelines/normalization-processor.md b/_search-plugins/search-pipelines/normalization-processor.md index be854d13..3b0a1417 100644 --- a/_search-plugins/search-pipelines/normalization-processor.md +++ b/_search-plugins/search-pipelines/normalization-processor.md @@ -9,7 +9,7 @@ grand_parent: Search pipelines # Normalization processor -The `normalization_processor` is a search phase results processor that runs between the query and fetch phases of search execution. It intercepts the query phase results and then normalizes and combines the document scores from different query clauses before passing the documents to the fetch phase. +The `normalization-processor` is a search phase results processor that runs between the query and fetch phases of search execution. It intercepts the query phase results and then normalizes and combines the document scores from different query clauses before passing the documents to the fetch phase. ## Score normalization and combination @@ -21,7 +21,7 @@ OpenSearch supports two search types: `query_then_fetch` and `dfs_query_then_fet ![Normalization processor flow diagram]({{site.url}}{{site.baseurl}}/images/normalization-processor.png) -When you send a search request to a node, the node becomes a _coordinating node_. During the first phase of search, the _query phase_, the coordinating node routes the search request to all shards in the index, including primary and replica shards. Each shard then runs the search query locally and returns metadata about the matching documents, which includes their document IDs and relevance scores. The `normalization_processor` then normalizes and combines scores from different query clauses. The coordinating node merges and sorts the local lists of results, compiling a global list of top documents that match the query. After that, search execution enters a _fetch phase_, in which the coordinating node requests the documents in the global list from the shards where they reside. Each shard returns the documents' `_source` to the coordinating node. Finally, the coordinating node sends a search response containing the results back to you. +When you send a search request to a node, the node becomes a _coordinating node_. During the first phase of search, the _query phase_, the coordinating node routes the search request to all shards in the index, including primary and replica shards. Each shard then runs the search query locally and returns metadata about the matching documents, which includes their document IDs and relevance scores. The `normalization-processor` then normalizes and combines scores from different query clauses. The coordinating node merges and sorts the local lists of results, compiling a global list of top documents that match the query. After that, search execution enters a _fetch phase_, in which the coordinating node requests the documents in the global list from the shards where they reside. Each shard returns the documents' `_source` to the coordinating node. Finally, the coordinating node sends a search response containing the results back to you. ## Request fields @@ -38,11 +38,11 @@ Field | Data type | Description ## Example -The following example demonstrates using a search pipeline with a `normalization_processor`. To try out this example, follow the [Semantic search tutorial]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search#tutorial). +The following example demonstrates using a search pipeline with a `normalization-processor`. To try out this example, follow the [Semantic search tutorial]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search#tutorial). ### Creating a search pipeline -The following request creates a search pipeline containing a `normalization_processor` that uses the `min_max` normalization technique and the `arithmetic_mean` combination technique: +The following request creates a search pipeline containing a `normalization-processor` that uses the `min_max` normalization technique and the `arithmetic_mean` combination technique: ```json PUT /_search/pipeline/nlp-search-pipeline @@ -114,4 +114,4 @@ For more information, see [Hybrid query]({{site.url}}{{site.baseurl}}/query-dsl/ To improve search relevance, we recommend increasing the sample size. -If the hybrid query does not return some expected results, it may be because the subqueries return too few documents. The `normalization_processor` only transforms the results returned by each subquery; it does not perform any additional sampling. During our experiments, we used [nDCG@10](https://en.wikipedia.org/wiki/Discounted_cumulative_gain) to measure quality of information retrieval depending on the number of documents returned (the size). We have found that a size in the [100, 200] range works best for datasets of up to 10M documents. We do not recommend increasing the size beyond the recommended values because higher size values do not improve search relevance but increase search latency. +If the hybrid query does not return some expected results, it may be because the subqueries return too few documents. The `normalization-processor` only transforms the results returned by each subquery; it does not perform any additional sampling. During our experiments, we used [nDCG@10](https://en.wikipedia.org/wiki/Discounted_cumulative_gain) to measure quality of information retrieval depending on the number of documents returned (the size). We have found that a size in the [100, 200] range works best for datasets of up to 10M documents. We do not recommend increasing the size beyond the recommended values because higher size values do not improve search relevance but increase search latency. diff --git a/_search-plugins/search-pipelines/search-processors.md b/_search-plugins/search-pipelines/search-processors.md index 808ddf54..e4ef4e5f 100644 --- a/_search-plugins/search-pipelines/search-processors.md +++ b/_search-plugins/search-pipelines/search-processors.md @@ -46,7 +46,7 @@ The following table lists all supported search request processors. Processor | Description | Earliest available version :--- | :--- | :--- -[`normalization_processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/) | Intercepts the query phase results and normalizes and combines the document scores before passing the documents to the fetch phase. | 2.10 +[`normalization-processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/) | Intercepts the query phase results and normalizes and combines the document scores before passing the documents to the fetch phase. | 2.10 ## Viewing available processor types