mirror of https://github.com/apache/lucene.git
SOLR-15193: Improve maxDocFreq docs
This commit is contained in:
parent
606cea94d7
commit
140c37eb0f
|
@ -176,7 +176,7 @@ The ancestor links will only be tracked when the trackTraversal flag is turned o
|
||||||
Link analysis is often performed to determine *node centrality*. When analyzing for centrality the
|
Link analysis is often performed to determine *node centrality*. When analyzing for centrality the
|
||||||
goal is to assign a weight to each node based on how connected it is in the subgraph.
|
goal is to assign a weight to each node based on how connected it is in the subgraph.
|
||||||
There are different types of node centrality. Graph expressions very efficiently calculate
|
There are different types of node centrality. Graph expressions very efficiently calculate
|
||||||
*inbound degree centrality* (indegree).
|
*inbound degree centrality* (in-degree).
|
||||||
|
|
||||||
Inbound degree centrality is calculated by counting the number of inbound
|
Inbound degree centrality is calculated by counting the number of inbound
|
||||||
links to each node. For simplicity this document will sometimes refer
|
links to each node. For simplicity this document will sometimes refer
|
||||||
|
@ -274,17 +274,24 @@ image::images/math-expressions/graph2.png[]
|
||||||
If we compute the dot product between the butter column and the other product columns you will find that the dot product equals the inbound degree in each case.
|
If we compute the dot product between the butter column and the other product columns you will find that the dot product equals the inbound degree in each case.
|
||||||
This tells us that a nearest neighbor search, using a maximum inner product similarity, would select the column with the highest inbound degree.
|
This tells us that a nearest neighbor search, using a maximum inner product similarity, would select the column with the highest inbound degree.
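
The equivalence between the dot product and the inbound degree can be checked with a minimal sketch. The basket data, the product names other than butter, and the helper names `column` and `dot` are hypothetical and chosen only for illustration; this is plain Python, not a Solr API.

```python
# Hypothetical shopping baskets: each basket ID maps to the products it links to.
baskets = {
    "basket1": {"butter", "eggs", "milk"},
    "basket2": {"butter", "eggs"},
    "basket3": {"butter", "cheese"},
    "basket4": {"milk", "cheese"},
}
products = sorted(set().union(*baskets.values()))

def column(product):
    """Binary column for a product: one entry per basket (1 = basket contains it)."""
    return [1 if product in items else 0 for items in baskets.values()]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Dot product between the butter column and every other product column.
butter = column("butter")
dot_products = {p: dot(butter, column(p)) for p in products if p != "butter"}

# Inbound degree: walk from butter to its baskets, then count links into each product.
in_degree = {}
for items in baskets.values():
    if "butter" in items:
        for p in items - {"butter"}:
            in_degree[p] = in_degree.get(p, 0) + 1
```

In this toy data set both computations agree for every product, and the product with the largest dot product is also the one with the highest inbound degree.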
|
||||||
|
|
||||||
=== Limiting Basket Size
|
=== Limiting Basket Out-Degree
|
||||||
|
|
||||||
The recommendation can be improved if we chose baskets that contain fewer items.
|
The recommendation can be made stronger by limiting the *out-degree* of the baskets. The out-degree is the
|
||||||
This is because baskets with a smaller number of products carry more information about the
|
number of outbound links of a node in a graph. In the shopping basket example the outbound links
|
||||||
relationship between the products in the basket.
|
from the baskets link to products. So limiting the out-degree will limit the size of the baskets.
|
||||||
|
|
||||||
The `maxDocFreq` parameter can be used to limit the "walk" to only include baskets that appear in the index a certain
|
Why does limiting the size of the shopping baskets make a stronger recommendation? To answer this question it helps
|
||||||
number of times. Since each occurrence of a basket ID in the index is a product, limiting the document frequency of the
|
to think about each shopping basket as *voting* for products that go with *butter*. In an election with two candidates
|
||||||
basket ID will limit the size of the basket. The `maxDocFreq` param is applied per shard. If there is a single
|
if you were to vote for both candidates the votes would cancel each other out and have no effect.
|
||||||
shard or documents are co-located by basket ID then the `maxDocFreq` will be an exact count.
|
But if you vote for only one candidate your vote will affect the outcome. The same principle holds true
|
||||||
Otherwise it will return baskets with a max size of numShards*maxDocFreq.
|
for recommendations. As a basket votes for more products it dilutes the strength of its recommendation for any
|
||||||
|
one product. A basket with just butter and one other item more strongly recommends that item.
|
||||||
|
|
||||||
|
The `maxDocFreq` parameter can be used to limit the graph "walk" to only include baskets that appear in
|
||||||
|
the index a certain number of times. Since each occurrence of a basket ID in the index is a link to a product,
|
||||||
|
limiting the document frequency of the basket ID will limit the out-degree of the basket. The `maxDocFreq` param is
|
||||||
|
applied per shard. If there is a single shard or documents are co-located by basket ID then the `maxDocFreq` will
|
||||||
|
be an exact count. Otherwise, it will return baskets with a max size of numShards * maxDocFreq.
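
The per-shard behavior can be illustrated with a toy model. This is a sketch under simplifying assumptions, not Solr's implementation: each shard is modeled as a list of hypothetical `(basket_id, product)` documents, and the function `baskets_passing` is an invented helper that applies the frequency limit to each shard independently.

```python
from collections import Counter

def baskets_passing(shards, max_doc_freq):
    """Toy model: apply the document frequency limit on each shard separately.

    shards: list of shards, each a list of (basket_id, product) documents.
    Returns the total number of documents contributed per basket across
    the shards where the basket is within the limit.
    """
    passing = Counter()
    for docs in shards:
        # Document frequency of each basket ID on this shard only.
        freq = Counter(basket_id for basket_id, _ in docs)
        for basket_id, n in freq.items():
            if n <= max_doc_freq:
                passing[basket_id] += n
    return passing

# The same four-item basket, spread across two shards or co-located on one.
spread = [
    [("b1", "butter"), ("b1", "eggs")],
    [("b1", "milk"), ("b1", "cheese")],
]
colocated = [
    [("b1", "butter"), ("b1", "eggs"), ("b1", "milk"), ("b1", "cheese")],
    [],
]

spread_result = baskets_passing(spread, 2)
colocated_result = baskets_passing(colocated, 2)
```

With a limit of 2, the spread basket passes on both shards and is returned with 4 items (numShards * maxDocFreq), while the co-located basket is excluded because its frequency on the single shard is exact.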
|
||||||
|
|
||||||
The example below shows the `maxDocFreq` parameter applied to the `nodes` expression.
|
The example below shows the `maxDocFreq` parameter applied to the `nodes` expression.
|
||||||
|
|
||||||
|
|