This is the main [learning to rank integrated into solr](http://www.slideshare.net/lucidworks/learning-to-rank-in-solr-presented-by-michael-nilsson-diego-ceccarelli-bloomberg-lp)
repository.
[Read up on learning to rank](https://en.wikipedia.org/wiki/Learning_to_rank)
Apache Solr Learning to Rank (LTR) provides a way for you to extract features
directly inside Solr for use in training a machine learned model. You can then
deploy that model to Solr and use it to rerank your top X search results.
# Test the plugin with solr/example/techproducts in a few easy steps!
Solr provides some simple example of indices. In order to test the plugin with
the techproducts example please follow these steps.
Defines five features. Anything that is a valid Solr query can be used to define
a feature.
### Filter Query Features
The first feature isBook fires if the term 'book' matches the category field
for the given examined document. Since in this feature q was not specified,
either the score 1 (in case of a match) or the score 0 (in case of no match)
will be returned.
### Query Features
In the second feature (documentRecency) q was specified using a function query.
In this case the score for the feature on a given document is whatever the query
returns (1 for docs dated now, 1/2 for docs dated 1 year ago, 1/3 for docs dated
2 years ago, etc..) . If both an fq and q is used, documents that don't match
the fq will receive a score of 0 for the documentRecency feature, all other
documents will receive the score specified by the query for this feature.
### Original Score Feature
The third feature (originalScore) has no parameters, and uses the
OriginalScoreFeature class instead of the SolrFeature class. Its purpose is
to simply return the score for the original search request against the current
matching document.
### External Features
Users can specify external information that can to be passed in as
part of the query to the ltr ranking framework. In this case, the
fourth feature (userTextPhraseMatch) will be looking for an external field
called 'user_text' passed in through the request, and will fire if there is
a term match for the document field 'title' from the value of the external
field 'user_text'. You can provide default values for external features as
well by specifying ${myField:myDefault}, similar to how you would in a Solr config.
In this case, the fifth feature (userFromMobile) will be looking for an external parameter
called 'userFromMobile' passed in through the request, if the ValueFeature is :
required=true, it will throw an exception if the external feature is not passed
required=false, it will silently ignore the feature and avoid the scoring ( at Document scoring time, the model will consider 0 as feature value)
The advantage in defining a feature as not required, where possible, is to avoid wasting caching space and time in calculating the featureScore.
See the [Run a Rerank Query](#run-a-rerank-query) section for how to pass in external information.
### Custom Features
Custom features can be created by extending from
org.apache.solr.ltr.feature.Feature, however this is generally not recommended.
The majority of features should be possible to create using the methods described
above.
# Defining Models
Currently the Learning to Rank plugin supports 2 generalized forms of
models: 1. Linear Model i.e. [RankSVM](http://www.cs.cornell.edu/people/tj/publications/joachims_02c.pdf), [Pranking](https://papers.nips.cc/paper/2023-pranking-with-ranking.pdf)
and 2. Multiple Additive Trees i.e. [LambdaMART](http://research.microsoft.com/pubs/132652/MSR-TR-2010-82.pdf), [Gradient Boosted Regression Trees (GBRT)](https://papers.nips.cc/paper/3305-a-general-boosting-method-and-its-application-to-learning-ranking-functions-for-web-search.pdf)
### Linear
If you'd like to introduce a bias set a constant feature
to the bias value you'd like and make a weight of 1.0 for that feature.
###### model.json
```json
{
"class":"org.apache.solr.ltr.model.LinearModel",
"name":"myModelName",
"features":[
{ "name": "userTextTitleMatch"},
{ "name": "originalScore"},
{ "name": "isBook"}
],
"params":{
"weights": {
"userTextTitleMatch": 1.0,
"originalScore": 0.5,
"isBook": 0.1
}
}
}
```
This is an example of a toy Linear model. Class specifies the class to be
using to interpret the model. Name is the model identifier you will use
when making request to the ltr framework. Features specifies the feature
space that you want extracted when using this model. All features that
appear in the model params will be used for scoring and must appear in
the features list. You can add extra features to the features list that
will be computed but not used in the model for scoring, which can be useful
for logging. Params are the Linear parameters.
Good library for training SVM, an example of a Linear model, is
An LTR model is plugged into the ranking through the [LTRQParserPlugin](/solr/contrib/ltr/src/java/org/apache/solr/ltr/search/LTRQParserPlugin.java). The plugin will
read from the request the model, an instance of [LTRScoringModel](/solr/contrib/ltr/src/java/org/apache/solr/ltr/model/LTRScoringModel.java),
plus other parameters. The plugin will generate an LTRQuery, a particular [ReRankQuery](/solr/core/src/java/org/apache/solr/search/AbstractReRankQuery.java).
It wraps the original solr query for the first pass ranking, and uses the provided model in an
[LTRScoringQuery](/solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRScoringQuery.java) to
rescore and rerank the top documents. The LTRScoringQuery will take care of computing the values of all the
[features](/solr/contrib/ltr/src/java/org/apache/solr/ltr/feature/Feature.java) and then will delegate the final score
generation to the LTRScoringModel.
# Speeding up the weight creation with threads
About half the time for ranking is spent in the creation of weights for each feature used in ranking. If the number of features is significantly high (say, 500 or more), this increases the ranking overhead proportionally. To alleviate this problem, parallel weight creation is provided as a configurable option. In order to use this feature, the following lines need to be added to the solrconfig.xml
```xml
<config>
<!-- Query parser used to rerank top docs with a provided model -->
<intname="threadModule.totalPoolThreads">10</int><!-- Maximum threads to share for all requests -->
<intname="threadModule.numThreadsPerRequest">5</int><!-- Maximum threads to use for a single requests-->
</transformer>
</config>
```
The threadModule.totalPoolThreads option limits the total number of threads to be used across all query instances at any given time. threadModule.numThreadsPerRequest limits the number of threads used to process a single query. In the above example, 10 threads will be used to services all queries and a maximum of 5 threads to service a single query. If the solr instances is expected to receive no more than one query at a time, it is best to set both these numbers to the same value. If multiple queries need to serviced simultaneously, the numbers can be adjusted based on the expected response times. If the value of threadModule.numThreadsPerRequest is higher, the reponse time for a single query will be improved upto a point. If multiple queries are serviced simultaneously, the threadModule.totalPoolThreads imposes a contention between the queries if (threadModule.numThreadsPerRequest*total parallel queries > threadModule.totalPoolThreads).