From 485e140e932412ea85cc673e5bd7a23719aa8a3e Mon Sep 17 00:00:00 2001
From: Chris Hostetter
Date: Mon, 9 Dec 2019 15:34:42 -0700
Subject: [PATCH] SOLR-14032: some misc ref-guide improvements related to clarifying the pros/cons of the diff ways to 'boost' documents by func/query
---
 solr/solr-ref-guide/src/other-parsers.adoc | 24 +++++++--
 .../src/the-dismax-query-parser.adoc | 53 +++++++++++++++----
 .../src/the-extended-dismax-query-parser.adoc | 22 +++++---
 3 files changed, 77 insertions(+), 22 deletions(-)
diff --git a/solr/solr-ref-guide/src/other-parsers.adoc b/solr/solr-ref-guide/src/other-parsers.adoc
index f7a17eb1290..12777a70859 100644
--- a/solr/solr-ref-guide/src/other-parsers.adoc
+++ b/solr/solr-ref-guide/src/other-parsers.adoc
@@ -204,24 +204,38 @@ A list of queries that *must* appear in matching documents. However, unlike `mus
 == Boost Query Parser
-`BoostQParser` extends the `QParserPlugin` and creates a boosted query from the input value. The main value is the query to be boosted. Parameter `b` is the function query to use as the boost. The query to be boosted may be of any type.
+`BoostQParser` extends the `QParserPlugin` and creates a boosted query from the input value. The main value is any query to be "wrapped" and "boosted" -- only documents which match that query will match the final query produced by this parser. Parameter `b` is a <> to be evaluated against each document that matches the original query, and the result of the function will be multiplied into the final score for that document.
 === Boost Query Parser Examples
-Creates a query "foo" which is boosted (scores are multiplied) by the function query `log(popularity)`:
+Creates a query `name:foo` which is boosted (scores are multiplied) by the function query `log(popularity)`:
 [source,text]
 ----
-{!boost b=log(popularity)}foo
+q={!boost b=log(popularity)}name:foo
 ----
-Creates a query "foo" which is boosted by the date boosting function referenced in `ReciprocalFloatFunction`:
+Creates a query `name:foo` which has its scores multiplied by the _inverse_ of the numeric `price` field -- effectively "demoting" documents which have a high `price` by lowering their final score:
 [source,text]
 ----
-{!boost b=recip(ms(NOW,mydatefield),3.16e-11,1,1)}foo
+// NOTE: we "add 1" to the denominator to prevent divide by zero
+q={!boost b=div(1,add(1,price))}name:foo
 ----
+The `<>` function is particularly useful for situations where you want to multiply (or divide) the score for each document matching your main query by the score that document would have from another query.
+
+This example uses <> to create a query for `name:foo` which is boosted by the scores from the independently specified query `category:electronics`:
+
+[source,text]
+----
+q={!boost b=query($my_boost)}name:foo
+my_boost=category:electronics
+----
+
+
+
+
 [[other-collapsing]]
 == Collapsing Query Parser
diff --git a/solr/solr-ref-guide/src/the-dismax-query-parser.adoc b/solr/solr-ref-guide/src/the-dismax-query-parser.adoc
index 182b37243c7..a2ca3d74592 100644
--- a/solr/solr-ref-guide/src/the-dismax-query-parser.adoc
+++ b/solr/solr-ref-guide/src/the-dismax-query-parser.adoc
@@ -115,35 +115,68 @@ A value of "0.0" - the default - makes the query a pure "disjunction max query":
 === bq (Boost Query) Parameter
-The `bq` parameter specifies an additional, optional, query clause that will be added to the user's main query to influence the score.
For example, if you wanted to add a relevancy boost for recent documents:
+The `bq` parameter specifies additional, optional query clauses that will be _added_ to the user's main query to influence the score. For example, if you wanted to add a boost for documents that are in a particular category you could use:
 [source,text]
 ----
 q=cheese
-bq=date:[NOW/DAY-1YEAR TO NOW/DAY]
+bq=category:food^10
 ----
-You can specify multiple `bq` parameters. If you want your query to be parsed as separate clauses with separate boosts, use multiple `bq` parameters.
+You can specify multiple `bq` parameters, which will each be added as separate clauses with separate boosts.
+
+[source,text]
+----
+q=cheese
+bq=category:food^10
+bq=category:deli^5
+----
+
+Using the `bq` parameter in this way is functionally equivalent to combining your `q` and `bq` params into a single larger boolean query, where the (original) `q` param is "mandatory" and the other clauses are optional:
+
+[source,text]
+----
+q=(+cheese category:food^10 category:deli^5)
+----
+
+The only difference between the above examples is that using the `bq` param allows you to specify these extra clauses independently (i.e., as configuration defaults) from the main query.
+
+
+[TIP]
+[[bq-bf-shortcomings]]
+.Additive Boosts vs Multiplicative Boosts
+====
+Generally speaking, using `bq` (or `bf`, below) is considered a poor way to "boost" documents by a secondary query because it has an "Additive" effect on the final score. The overall impact a particular `bq` param will have on a given document can vary a lot depending on the _absolute_ values of the scores from the original query as well as the `bq` query, which in turn depends on the complexity of the original query, and various scoring factors (TF, IDF, average field length, etc.).
+
+"Multiplicative Boosting" is generally considered to be a more predictable method of influencing document score, because it acts as a "scaling factor" -- increasing (or decreasing) the scores of each document by a _relative_ amount.
+
+The <> provides a convenient wrapper for implementing multiplicative boosting, and the <> offers a `boost` query param shortcut for using it.
+====
 === bf (Boost Functions) Parameter
-The `bf` parameter specifies functions (with optional boosts) that will be used to construct FunctionQueries which will be added to the user's main query as optional clauses that will influence the score. Any function supported natively by Solr can be used, along with a boost value. For example:
+The `bf` parameter specifies functions (with optional <>) that will be used to construct FunctionQueries which will be _added_ to the user's main query as optional clauses that will influence the score. Any <> can be used, along with a boost value. For example:
 [source,text]
 ----
-recip(rord(myfield),1,2,3)^1.5
+q=cheese
+bf=div(1,sum(1,price))^1.5
 ----
-Specifying functions with the bf parameter is essentially just shorthand for using the `bq` parameter combined with the `{!func}` parser.
+Specifying functions with the `bf` parameter is essentially just shorthand for using the `bq` parameter (<<#bq-bf-shortcomings,with the same shortcomings>>) combined with the `{!func}` parser -- with the addition of the simplified "query boost" syntax.
-For example, if you want to show the most recent documents first, you could use either of the following:
+For example, the two `bf` params listed below are completely equivalent to the two `bq` params below:
 [source,text]
 ----
-bf=recip(rord(creationDate),1,1000,1000)
- ...or...
-bq={!func}recip(rord(creationDate),1,1000,1000)
+bf=div(sales_rank,ms(NOW,release_date))
+bf=div(1,sum(1,price))^1.5
+----
+[source,text]
+----
+bq={!func}div(sales_rank,ms(NOW,release_date))
+bq={!lucene}( {!func v='div(1,sum(1,price))'} )^1.5
 ----
diff --git a/solr/solr-ref-guide/src/the-extended-dismax-query-parser.adoc b/solr/solr-ref-guide/src/the-extended-dismax-query-parser.adoc
index 70b8e7b73f1..29cc0071e88 100644
--- a/solr/solr-ref-guide/src/the-extended-dismax-query-parser.adoc
+++ b/solr/solr-ref-guide/src/the-extended-dismax-query-parser.adoc
@@ -27,7 +27,7 @@ In addition to supporting all the DisMax query parser parameters, Extended Disma
 * includes improved smart partial escaping in the case of syntax errors; fielded queries, +/-, and phrase queries are still supported in this mode.
 * improves proximity boosting by using word shingles; you do not need the query to match all words in the document before proximity boosting is applied.
 * includes advanced stopword handling: stopwords are not required in the mandatory part of the query but are still used in the proximity boosting part. If a query consists of all stopwords, such as "to be or not to be", then all words are required.
-* includes improved boost function: in Extended DisMax, the `boost` function is a multiplier rather than an addend, improving your boost results; the additive boost functions of DisMax (`bf` and `bq`) are also supported.
+* includes improved boost function: in Extended DisMax, the `boost` function is a multiplier <>, improving your boost results; the additive boost functions of DisMax (`bf` and `bq`) are also supported.
 * supports pure negative nested queries: queries such as `+foo (-foo)` will match all documents.
 * lets you specify which fields the end user is allowed to query, and to disallow direct fielded searches.
@@ -52,7 +52,20 @@ If `true`, the number of clauses required (<> whose results will be multiplied into the score from the main query for all matching documents. This parameter is shorthand for wrapping the query produced by eDisMax using the <>.
+
+These two examples are equivalent:
+[source,text]
+----
+q={!edismax qf=name}ipod
+boost=div(1,sum(1,price))
+----
+[source,text]
+----
+q={!boost b=div(1,sum(1,price)) v=$qq}
+qq={!edismax qf=name}ipod
+----
+
 `lowercaseOperators`::
 A Boolean parameter indicating if lowercase "and" and "or" should be treated the same as operators "AND" and "OR".
@@ -150,11 +163,6 @@ qf=title text last_name first_name
 f.name.qf=last_name first_name
 ----
-== Using Negative Boost
-
-Negative query boosts have been supported at the "Query" object level for a long time (resulting in negative scores for matching documents). Now the QueryParsers have been updated to handle this too.
-
-
 == Using 'Slop'
 `Dismax` and `Edismax` can run queries against all query fields, and also run a query in the form of a phrase against the phrase fields. (This will work only for boosting documents, not actually for matching.) However, that phrase query can have a 'slop,' which is the distance between the terms of the query while still considering it a phrase match. For example: