SOLR-14032: some misc ref-guide improvements related to clarifying the pros/cons of the diff ways to 'boost' documents by func/query

(cherry picked from commit 485e140e93)
Chris Hostetter 2019-12-09 15:34:42 -07:00
parent 6385e63851
commit d0c6ab8bac
3 changed files with 77 additions and 22 deletions


@ -204,24 +204,38 @@ A list of queries that *must* appear in matching documents. However, unlike `mus
== Boost Query Parser
`BoostQParser` extends the `QParserPlugin` and creates a boosted query from the input value. The main value is the query to be boosted. Parameter `b` is the function query to use as the boost. The query to be boosted may be of any type.
`BoostQParser` extends the `QParserPlugin` and creates a boosted query from the input value. The main value is any query to be "wrapped" and "boosted" -- only documents which match that query will match the final query produced by this parser. Parameter `b` is a <<function-queries.adoc#available-functions,function>> to be evaluated against each document that matches the original query, and the result of the function will be multiplied into the final score for that document.
=== Boost Query Parser Examples
Creates a query "foo" which is boosted (scores are multiplied) by the function query `log(popularity)`:
Creates a query `name:foo` which is boosted (scores are multiplied) by the function query `log(popularity)`:
[source,text]
----
{!boost b=log(popularity)}foo
q={!boost b=log(popularity)}name:foo
----
Creates a query "foo" which is boosted by the date boosting function referenced in `ReciprocalFloatFunction`:
Creates a query `name:foo` which has its scores multiplied by the _inverse_ of the numeric `price` field -- effectively "demoting" documents which have a high `price` by lowering their final score:
[source,text]
----
{!boost b=recip(ms(NOW,mydatefield),3.16e-11,1,1)}foo
// NOTE: we "add 1" to the denominator to prevent divide by zero
q={!boost b=div(1,add(1,price))}name:foo
----
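To make the effect of that multiplier concrete, here is a minimal Python sketch of the arithmetic only. It is not Solr code: `price_multiplier` and `boosted_score` are hypothetical helper names, and the base scores and prices are made-up numbers chosen for illustration.

```python
def price_multiplier(price: float) -> float:
    """Mirror div(1,add(1,price)): 1 / (1 + price)."""
    return 1.0 / (1.0 + price)


def boosted_score(base_score: float, price: float) -> float:
    """Multiply the wrapped query's score by the function result."""
    return base_score * price_multiplier(price)


if __name__ == "__main__":
    # Hypothetical base score and prices, for illustration only.
    for price in (0.0, 9.0, 99.0):
        print(price, boosted_score(2.0, price))
```

A document with `price` of 0 keeps its original score (the multiplier is 1), while higher prices shrink the score toward zero without ever excluding the document from the results.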
The `<<function-queries.adoc#query-function,query(...)>>` function is particularly useful for situations where you want to multiply (or divide) the score for each document matching your main query by the score that document would have from another query.
This example uses <<local-parameters-in-queries.adoc#parameter-dereferencing,local parameter variables>> to create a query for `name:foo` which is boosted by the scores from the independently specified query `category:electronics`:
[source,text]
----
q={!boost b=query($my_boost)}name:foo
my_boost=category:electronics
----
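As a rough sketch of how a client might send the two parameters above, the following Python builds and URL-encodes them using only the standard library. The helper name `boost_by_query_params` is hypothetical, and the snippet assumes the request is sent as an ordinary query string; it does not contact Solr.

```python
from urllib.parse import urlencode, parse_qs


def boost_by_query_params(main_query: str, boost_query: str) -> dict:
    """Build request parameters that boost main_query by boost_query's score.

    The parameter name "my_boost" follows the example above; it is only a
    label that $my_boost dereferences.
    """
    return {
        "q": "{!boost b=query($my_boost)}" + main_query,
        "my_boost": boost_query,
    }


params = boost_by_query_params("name:foo", "category:electronics")
# urlencode percent-encodes characters such as {, !, $ and spaces.
query_string = urlencode(params)
```

Keeping the boosting query in its own parameter means it can be changed, or supplied as a configured default, without touching the main `q` string.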
[[other-collapsing]]
== Collapsing Query Parser


@ -115,35 +115,68 @@ A value of "0.0" - the default - makes the query a pure "disjunction max query":
=== bq (Boost Query) Parameter
The `bq` parameter specifies an additional, optional, query clause that will be added to the user's main query to influence the score. For example, if you wanted to add a relevancy boost for recent documents:
The `bq` parameter specifies an additional, optional query clause that will be _added_ to the user's main query to influence the score. For example, if you wanted to add a boost for documents that are in a particular category, you could use:
[source,text]
----
q=cheese
bq=date:[NOW/DAY-1YEAR TO NOW/DAY]
bq=category:food^10
----
You can specify multiple `bq` parameters. If you want your query to be parsed as separate clauses with separate boosts, use multiple `bq` parameters.
You can specify multiple `bq` parameters, which will each be added as separate clauses with separate boosts.
[source,text]
----
q=cheese
bq=category:food^10
bq=category:deli^5
----
Using the `bq` parameter in this way is functionally equivalent to combining your `q` and `bq` params into a single larger boolean query, where the (original) `q` param is "mandatory" and the other clauses are optional:
[source,text]
----
q=(+cheese category:food^10 category:deli^5)
----
The only difference between the above examples is that using the `bq` param allows you to specify these extra clauses independently (i.e., as configuration defaults) from the main query.
[TIP]
[[bq-bf-shortcomings]]
.Additive Boosts vs Multiplicative Boosts
====
Generally speaking, using `bq` (or `bf`, below) is considered a poor way to "boost" documents by a secondary query because it has an "Additive" effect on the final score. The overall impact a particular `bq` param will have on a given document can vary a lot depending on the _absolute_ values of the scores from the original query as well as the `bq` query, which in turn depend on the complexity of the original query and various scoring factors (TF, IDF, average field length, etc.).
"Multiplicative Boosting" is generally considered to be a more predictable method of influencing document score, because it acts as a "scaling factor" -- increasing (or decreasing) the scores of each document by a _relative_ amount.
The <<other-parsers.adoc#boost-query-parser,`{!boost}` QParser>> provides a convenient wrapper for implementing multiplicative boosting, and the <<the-extended-dismax-query-parser.adoc#extended-dismax-parameters,`{!edismax}` QParser>> offers a `boost` query param shortcut for using it.
====
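The difference described in the tip above can be illustrated with a small Python sketch. This is plain arithmetic on invented scores, not Solr's scoring code: the two score tables, the boost of `1.0`, and the factor of `3.0` are all assumptions chosen to show the effect.

```python
# Hypothetical base scores for the same two documents under two main queries:
# a simple query that yields small scores, and a complex one that yields large scores.
simple_query = {"doc_a": 0.5, "doc_b": 1.0}
complex_query = {"doc_a": 5.0, "doc_b": 10.0}


def ratio_after_additive(scores: dict, boost: float) -> float:
    """Ratio of doc_a to doc_b when a fixed boost is added to doc_a only."""
    return (scores["doc_a"] + boost) / scores["doc_b"]


def ratio_after_multiplicative(scores: dict, factor: float) -> float:
    """Ratio of doc_a to doc_b when doc_a's score is scaled by a factor."""
    return (scores["doc_a"] * factor) / scores["doc_b"]


# Additive: the same boost flips the ranking for one query but not the other.
additive_simple = ratio_after_additive(simple_query, 1.0)    # above 1: doc_a overtakes doc_b
additive_complex = ratio_after_additive(complex_query, 1.0)  # below 1: doc_a still behind

# Multiplicative: the relative effect is identical regardless of score magnitude.
mult_simple = ratio_after_multiplicative(simple_query, 3.0)
mult_complex = ratio_after_multiplicative(complex_query, 3.0)
```

With these numbers, the additive boost changes which document ranks first only when the base scores happen to be small, whereas the multiplicative factor has the same relative effect in both cases.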
=== bf (Boost Functions) Parameter
The `bf` parameter specifies functions (with optional boosts) that will be used to construct FunctionQueries which will be added to the user's main query as optional clauses that will influence the score. Any function supported natively by Solr can be used, along with a boost value. For example:
The `bf` parameter specifies functions (with optional <<the-standard-query-parser.adoc#boosting-a-term-with,query boost>>) that will be used to construct FunctionQueries which will be _added_ to the user's main query as optional clauses that will influence the score. Any <<function-queries.adoc#available-functions,function supported natively by Solr>> can be used, along with a boost value. For example:
[source,text]
----
recip(rord(myfield),1,2,3)^1.5
q=cheese
bf=div(1,sum(1,price))^1.5
----
Specifying functions with the bf parameter is essentially just shorthand for using the `bq` parameter combined with the `{!func}` parser.
Specifying functions with the `bf` parameter is essentially just shorthand for using the `bq` parameter (<<#bq-bf-shortcomings,with the same shortcomings>>) combined with the `{!func}` parser -- with the addition of the simplified "query boost" syntax.
For example, if you want to show the most recent documents first, you could use either of the following:
For example, the two `bf` params listed first are completely equivalent to the two `bq` params that follow them:
[source,text]
----
bf=recip(rord(creationDate),1,1000,1000)
...or...
bq={!func}recip(rord(creationDate),1,1000,1000)
bf=div(sales_rank,ms(NOW,release_date))
bf=div(1,sum(1,price))^1.5
----
[source,text]
----
bq={!func}div(sales_rank,ms(NOW,release_date))
bq={!lucene}( {!func v='div(1,sum(1,price))'} )^1.5
----
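The rewriting shown in the two blocks above can be sketched as a small string transformation in Python. This is only an illustration of the textual equivalence, not how Solr parses `bf`: the helper name `bf_to_bq` is hypothetical, and the sketch assumes each value holds a single function with no single quotes and an optional numeric `^boost` suffix.

```python
import re

# Matches "<function ending in ')'>^<number>", e.g. "div(1,sum(1,price))^1.5".
_BOOST_SUFFIX = re.compile(r"^(?P<func>.*\))\^(?P<boost>\d+(?:\.\d+)?)$")


def bf_to_bq(bf: str) -> str:
    """Rewrite one bf value as the bq value shown in the examples above."""
    value = bf.strip()
    match = _BOOST_SUFFIX.match(value)
    if match is None:
        # No query boost: the function can be handed to {!func} directly.
        return "{!func}" + value
    # With a query boost: nest the function inside a {!lucene} query so the
    # ^boost applies to the function query as a whole.
    return "{!lucene}( {!func v='" + match.group("func") + "'} )^" + match.group("boost")


converted = [
    bf_to_bq("div(sales_rank,ms(NOW,release_date))"),
    bf_to_bq("div(1,sum(1,price))^1.5"),
]
```

The second case shows why `bf` is the more convenient form: expressing the same query boost through `bq` requires nesting the function inside another parser.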


@ -27,7 +27,7 @@ In addition to supporting all the DisMax query parser parameters, Extended Disma
* includes improved smart partial escaping in the case of syntax errors; fielded queries, +/-, and phrase queries are still supported in this mode.
* improves proximity boosting by using word shingles; you do not need the query to match all words in the document before proximity boosting is applied.
* includes advanced stopword handling: stopwords are not required in the mandatory part of the query but are still used in the proximity boosting part. If a query consists of all stopwords, such as "to be or not to be", then all words are required.
* includes improved boost function: in Extended DisMax, the `boost` function is a multiplier rather than an addend, improving your boost results; the additive boost functions of DisMax (`bf` and `bq`) are also supported.
* includes improved boost function: in Extended DisMax, the `boost` function is a multiplier <<the-dismax-query-parser.adoc#bq-bf-shortcomings,rather than an addend>>, improving your boost results; the additive boost functions of DisMax (`bf` and `bq`) are also supported.
* supports pure negative nested queries: queries such as `+foo (-foo)` will match all documents.
* lets you specify which fields the end user is allowed to query, and to disallow direct fielded searches.
@ -52,7 +52,20 @@ If `true`, the number of clauses required (<<the-dismax-query-parser.adoc#mm-min
Note that relaxing `mm` may cause undesired side effects, such as hurting the precision of the search, depending on the nature of your index content.
`boost`::
A multivalued list of strings parsed as queries with scores multiplied by the score from the main query for all matching documents. This parameter is shorthand for wrapping the query produced by eDisMax using the `BoostQParserPlugin`.
A multivalued list of strings parsed as <<function-queries.adoc#available-functions,functions>> whose results will be multiplied into the score from the main query for all matching documents. This parameter is shorthand for wrapping the query produced by eDisMax using the <<other-parsers.adoc#boost-query-parser,`BoostQParserPlugin`>>.
These two examples are equivalent:
[source,text]
----
q={!edismax qf=name}ipod
boost=div(1,sum(1,price))
----
[source,text]
----
q={!boost b=div(1,sum(1,price)) v=$qq}
qq={!edismax qf=name}ipod
----
`lowercaseOperators`::
A Boolean parameter indicating if lowercase "and" and "or" should be treated the same as operators "AND" and "OR".
@ -150,11 +163,6 @@ qf=title text last_name first_name
f.name.qf=last_name first_name
----
== Using Negative Boost
Negative query boosts have been supported at the "Query" object level for a long time (resulting in negative scores for matching documents). Now the QueryParsers have been updated to handle this too.
== Using 'Slop'
`Dismax` and `Edismax` can run queries against all query fields, and also run a query in the form of a phrase against the phrase fields. (This will work only for boosting documents, not actually for matching.) However, that phrase query can have a 'slop,' which is the distance between the terms of the query while still considering it a phrase match. For example: