This README file is only about this example directory's content.

Please refer to the Solr Reference Guide's [Learning To Rank](https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank) section for broader information on Learning to Rank (LTR) with Apache Solr.

# Start Solr with the LTR plugin enabled

`./bin/solr -e techproducts -Dsolr.ltr.enabled=true`

# Train an example machine learning model using LIBLINEAR

1. Download and install [liblinear](https://www.csie.ntu.edu.tw/~cjlin/liblinear/)

2. Change `contrib/ltr/example/config.json` "trainingLibraryLocation" to point to the train directory where you installed liblinear.

   Alternatively, leave the `config.json` file unchanged and create a soft-link to your `liblinear` directory, e.g.

   `ln -s /Users/YourNameHere/Downloads/liblinear-2.1 ./contrib/ltr/example/liblinear`

3. Extract features, train a reranking model, and deploy it to Solr.

   `cd contrib/ltr/example`

   `python train_and_upload_demo_model.py -c config.json`

   This script deploys your features from `config.json` "solrFeaturesFile" to Solr. Then it takes the relevance-judged query-document pairs of "userQueriesFile" and merges them with the features extracted from Solr into a training file. That file is used to train a linear model, which is then deployed to Solr for you to rerank results.

4. Search and rerank the results using the trained model

   ```
   http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=exampleModel%20reRankDocs=25%20efi.user_query=%27test%27}&fl=price,score,name
   ```
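
If you would rather issue the rerank request from code than paste the URL into a browser, the query string can be assembled programmatically. The following is a minimal sketch, not part of this example directory: the helper name `build_rerank_url` and its defaults are assumptions made for illustration, and only the host, collection, model name and parameters are taken from the URL above.

```python
from urllib.parse import quote, urlencode


def build_rerank_url(user_query, model="exampleModel", rerank_docs=25,
                     base="http://localhost:8983/solr/techproducts/query"):
    """Build a rerank URL like the one shown above (hypothetical helper).

    Note: user_query is placed inside single quotes in the rq parameter,
    so this sketch assumes it contains no single quote itself.
    """
    # The {!ltr ...} local-params string selects the model, the number of
    # top documents to rerank, and the external feature information (efi).
    rq = "{!ltr model=%s reRankDocs=%d efi.user_query='%s'}" % (
        model, rerank_docs, user_query)
    params = [
        ("indent", "on"),
        ("q", user_query),
        ("wt", "json"),
        ("rq", rq),
        ("fl", "price,score,name"),
    ]
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return base + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_rerank_url("test"))
```

The generated URL percent-encodes more characters than the hand-written one (for example `{`, `!` and `=` inside `rq`), but it decodes to the same parameters.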

# Assemble training data

In order to train a learning to rank model you need training data. Training data is what *teaches* the model what the appropriate weight for each feature is. In general, training data is a collection of queries with associated documents and what their ranking/score should be. As an example:
```
hard drive|SP2514N        |0.6|CLICK_LOGS
hard drive|6H500F0        |0.3|CLICK_LOGS
hard drive|F8V7067-APL-KIT|0.0|CLICK_LOGS
hard drive|IW-02          |0.0|CLICK_LOGS

ipod      |MA147LL/A      |1.0|HUMAN_JUDGEMENT
ipod      |F8V7067-APL-KIT|0.5|HUMAN_JUDGEMENT
ipod      |IW-02          |0.5|HUMAN_JUDGEMENT
ipod      |6H500F0        |0.0|HUMAN_JUDGEMENT
```
The columns in the example represent:

1. the user query;

2. a unique id for a document in the response;

3. a score representing the relevance of that document (not necessarily between zero and one);

4. the source, i.e., whether the training record was produced by using interaction data (`CLICK_LOGS`) or by human judgements (`HUMAN_JUDGEMENT`).
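
To make the four-column layout concrete, here is a minimal sketch of a reader for it. It is not the parser used by `train_and_upload_demo_model.py`; the names `Judgement` and `parse_judgements` are hypothetical, and the sketch assumes that fields are separated by `|` and may be padded with spaces.

```python
from typing import List, NamedTuple


class Judgement(NamedTuple):
    query: str     # column 1: the user query
    doc_id: str    # column 2: unique document id
    score: float   # column 3: relevance score
    source: str    # column 4: CLICK_LOGS or HUMAN_JUDGEMENT


def parse_judgements(text: str) -> List[Judgement]:
    """Parse pipe-delimited training records, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Padding around each field is only for alignment, so strip it.
        query, doc_id, score, source = (f.strip() for f in line.split("|"))
        rows.append(Judgement(query, doc_id, float(score), source))
    return rows


if __name__ == "__main__":
    sample = """hard drive|SP2514N        |0.6|CLICK_LOGS

ipod      |MA147LL/A      |1.0|HUMAN_JUDGEMENT"""
    for row in parse_judgements(sample):
        print(row)
```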

## How to produce training data

You might collect data for use with your machine learning algorithm relying on:

* **User Interactions**: given a specific query, it is possible to log all the user interactions (e.g., clicks, shares on social networks, sending by email, etc.), and then use them as proxies for relevance;
* **Human Judgements**: a training dataset is produced by explicitly asking some judges to evaluate the relevance of a document given the query.

### How to prepare training data from interaction data?

There are many ways of preparing interaction data for training a model, and it is outside the scope of this readme to provide a complete review of all the techniques. Below we illustrate a simple way to obtain training data from simple interaction data.

Simple interaction data will be a log file generated by your application after it has talked to Solr. The log will contain two different types of record:

* **query**: when a user performs a query we have a record with `user-id, query, responses`,
where `responses` is a list of unique document ids returned for a query.

**Example:**

```
diego, hard drive, [SP2514N,6H500F0,F8V7067-APL-KIT,IW-02]
```

* **click**: when a user performs a click we have a record with `user-id, query, document-id, click`

**Example:**
```
christine, hard drive, SP2514N
diego    , hard drive, SP2514N
michael  , hard drive, SP2514N
joshua   , hard drive, IW-02
```

Given a log composed of records like these, a simple way to produce a training dataset is to group on the query field and then assign to each query-document pair a relevance score equal to its number of clicks:

```
hard drive|SP2514N        |3|CLICK_LOGS
hard drive|IW-02          |1|CLICK_LOGS
hard drive|6H500F0        |0|CLICK_LOGS
hard drive|F8V7067-APL-KIT|0|CLICK_LOGS
```
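
The grouping-and-counting step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `clicks_to_training_rows` is hypothetical, and the log records are assumed to have already been parsed into tuples, since the exact log format depends on your application.

```python
from collections import Counter


def clicks_to_training_rows(query_records, click_records, source="CLICK_LOGS"):
    """Turn parsed log records into pipe-delimited training rows.

    query_records: iterable of (user_id, query, [doc_ids returned])
    click_records: iterable of (user_id, query, clicked_doc_id)
    """
    # Count clicks per (query, document) pair.
    clicks = Counter((query, doc_id) for _user, query, doc_id in click_records)

    # Remember every document shown for a query, in first-seen order,
    # so that documents without clicks still get a row with score 0.
    shown = {}
    for _user, query, doc_ids in query_records:
        seen = shown.setdefault(query, [])
        for doc_id in doc_ids:
            if doc_id not in seen:
                seen.append(doc_id)

    rows = []
    for query, doc_ids in shown.items():
        # Most-clicked first; sorted() is stable, so ties keep response order.
        for doc_id in sorted(doc_ids, key=lambda d: -clicks[(query, d)]):
            rows.append("%s|%s|%d|%s" % (query, doc_id, clicks[(query, doc_id)], source))
    return rows


if __name__ == "__main__":
    queries = [("diego", "hard drive",
                ["SP2514N", "6H500F0", "F8V7067-APL-KIT", "IW-02"])]
    clicked = [("christine", "hard drive", "SP2514N"),
               ("diego", "hard drive", "SP2514N"),
               ("michael", "hard drive", "SP2514N"),
               ("joshua", "hard drive", "IW-02")]
    print("\n".join(clicks_to_training_rows(queries, clicked)))
```

Run on the sample records, the sketch yields the same four rows as the listing above, without the column padding.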

This is a really trivial way to generate a training dataset, and in many settings it might not produce great results. Indeed, it is a well-known fact that clicks are *biased*: users tend to click on the first result proposed for a query, even if it is not relevant. A click on a document in position five could be considered more important than a click on a document in position one, because the user took the effort to browse the results list until position five.
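
As a rough illustration of that idea, each click can be weighted by the position at which the document was shown. The sketch below is an assumption-laden example rather than an established click model: the function name, the extra `position` field in the click record, and the weighting `log2(1 + position)` are all chosen here purely for illustration.

```python
import math
from collections import defaultdict


def position_weighted_scores(clicks):
    """Sum a position-dependent weight per (query, doc_id).

    clicks: iterable of (query, doc_id, position), with position 1-based.
    The weight log2(1 + position) is an arbitrary illustrative choice: a
    click at position 1 counts 1.0 and clicks further down count more.
    """
    scores = defaultdict(float)
    for query, doc_id, position in clicks:
        scores[(query, doc_id)] += math.log2(1 + position)
    return dict(scores)


if __name__ == "__main__":
    sample = [("hard drive", "SP2514N", 1), ("hard drive", "IW-02", 5)]
    for pair, score in position_weighted_scores(sample).items():
        print(pair, round(score, 3))
```

The techniques in the click-model literature are considerably more principled than this; the sketch only shows where a position signal would enter the score.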

Some approaches take into account the time spent on the clicked document (if the user spent only two seconds on the document and then clicked on other documents in the list, they probably did not intend to click that document).
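
A dwell-time filter of this kind could look like the following sketch. It assumes that your application logs how long the user stayed on each clicked document; the `dwell_seconds` field, the function name and the two-second default threshold are hypothetical and should be tuned for your own data.

```python
def drop_short_clicks(clicks, min_dwell_seconds=2.0):
    """Keep only clicks whose dwell time exceeds the threshold.

    clicks: iterable of (user_id, query, doc_id, dwell_seconds).
    Returns (user_id, query, doc_id) tuples for the clicks that are kept.
    """
    return [(user, query, doc_id)
            for user, query, doc_id, dwell in clicks
            if dwell > min_dwell_seconds]


if __name__ == "__main__":
    sample = [("diego", "hard drive", "SP2514N", 45.0),
              ("diego", "hard drive", "IW-02", 2.0)]
    print(drop_short_clicks(sample))
```

The surviving clicks can then be counted per query and document as before.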

There are many papers proposing techniques for removing the bias, or for taking into account the click positions; a good survey is [Click Models for Web Search](http://clickmodels.weebly.com/uploads/5/2/2/5/52257029/mc2015-clickmodels.pdf), by Chuklin, Markov and de Rijke.

### Prepare training data from human judgements

Another way to get training data is to ask human judges to label query-document pairs.
Producing human judgements is in general more expensive, but the quality of the dataset produced can be better than the one produced from interaction data.
It is worth noting that human judgements can also be produced using a crowdsourcing platform, which allows a user to show human workers documents associated with a query and to get back relevance labels.
Usually a human worker is shown a query together with a list of results, and the task consists of assigning a relevance label to each document (e.g., Perfect, Excellent, Good, Fair, Not relevant).
Training data can then be obtained by translating the labels into numeric scores (e.g., Perfect = 4, Excellent = 3, Good = 2, Fair = 1, Not relevant = 0).
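
The label-to-score translation can be sketched as a simple lookup. The mapping below uses the example scale from the text; the function name `labels_to_training_rows` and the input tuple layout are assumptions made for illustration.

```python
# Example scale from the text above; adjust to your own labelling scheme.
LABEL_SCORES = {
    "Perfect": 4,
    "Excellent": 3,
    "Good": 2,
    "Fair": 1,
    "Not relevant": 0,
}


def labels_to_training_rows(labelled, source="HUMAN_JUDGEMENT"):
    """Convert (query, doc_id, label) tuples into pipe-delimited rows.

    Raises KeyError if a label is not part of LABEL_SCORES.
    """
    return ["%s|%s|%d|%s" % (query, doc_id, LABEL_SCORES[label], source)
            for query, doc_id, label in labelled]


if __name__ == "__main__":
    sample = [("ipod", "MA147LL/A", "Perfect"),
              ("ipod", "6H500F0", "Not relevant")]
    print("\n".join(labels_to_training_rows(sample)))
```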