lucene/solr/contrib/ltr/example/README.md

6.1 KiB

This README file is only about this example directory's content.

Please refer to the Solr Reference Guide's section on Learning To Rank section for broader information on Learning to Rank (LTR) with Apache Solr.

Start Solr with the LTR plugin enabled

./bin/solr -e techproducts -Dsolr.ltr.enabled=true

Train an example machine learning model using LIBLINEAR

  1. Download and install liblinear

  2. Change contrib/ltr/example/config.json "trainingLibraryLocation" to point to the train directory where you installed liblinear.

    Alternatively, leave the config.json file unchanged and create a soft-link to your liblinear directory e.g.

ln -s /Users/YourNameHere/Downloads/liblinear-2.1 ./contrib/ltr/example/liblinear

  1. Extract features, train a reranking model, and deploy it to Solr.

cd contrib/ltr/example

python train_and_upload_demo_model.py -c config.json

This script deploys your features from config.json "solrFeaturesFile" to Solr. Then it takes the relevance judged query document pairs of "userQueriesFile" and merges it with the features extracted from Solr into a training file. That file is used to train a linear model, which is then deployed to Solr for you to rerank results.

  1. Search and rerank the results using the trained model
http://localhost:8983/solr/techproducts/query?indent=on&q=test&wt=json&rq={!ltr%20model=exampleModel%20reRankDocs=25%20efi.user_query=%27test%27}&fl=price,score,name

Assemble training data

In order to train a learning to rank model you need training data. Training data is what teaches the model what the appropriate weight for each feature is. In general training data is a collection of queries with associated documents and what their ranking/score should be. As an example:

hard drive|SP2514N        |0.6|CLICK_LOGS
hard drive|6H500F0        |0.3|CLICK_LOGS
hard drive|F8V7067-APL-KIT|0.0|CLICK_LOGS
hard drive|IW-02          |0.0|CLICK_LOGS

ipod      |MA147LL/A      |1.0|HUMAN_JUDGEMENT
ipod      |F8V7067-APL-KIT|0.5|HUMAN_JUDGEMENT
ipod      |IW-02          |0.5|HUMAN_JUDGEMENT
ipod      |6H500F0        |0.0|HUMAN_JUDGEMENT

The columns in the example represent:

  1. the user query;

  2. a unique id for a document in the response;

  3. the a score representing the relevance of that document (not necessarily between zero and one);

  4. the source, i.e., if the training record was produced by using interaction data (CLICK_LOGS) or by human judgements (HUMAN_JUDGEMENT).

How to produce training data

You might collect data for use with your machine learning algorithm relying on:

  • Users Interactions: given a specific query, it is possible to log all the users interactions (e.g., clicks, shares on social networks, send by email etc), and then use them as proxies for relevance;
  • Human Judgements: A training dataset is produced by explicitly asking some judges to evaluate the relevance of a document given the query.

How to prepare training data from interaction data?

There are many ways of preparing interaction data for training a model, and it is outside the scope of this readme to provide a complete review of all the techniques. In the following we illustrate a simple way for obtaining training data from simple interaction data.

Simple interaction data will be a log file generated by your application after it has talked to Solr. The log will contain two different types of record:

  • query: when a user performs a query we have a record with user-id, query, responses, where responses is a list of unique document ids returned for a query.

Example:

diego, hard drive, [SP2514N,6H500F0,F8V7067-APL-KIT,IW-02]
  • click: when a user performs a click we have a record with user-id, query, document-id, click

Example:

christine, hard drive, SP2154N
diego    , hard drive, SP2154N
michael  , hard drive, SP2154N
joshua   , hard drive, IW-02

Given a log composed by records like these, a simple way to produce a training dataset is to group on the query field and then assign to each query a relevance score equal to the number of clicks:

hard drive|SP2514N        |3|CLICK_LOGS
hard drive|IW-02          |1|CLICK_LOGS
hard drive|6H500F0        |0|CLICK_LOGS
hard drive|F8V7067-APL-KIT|0|CLICK_LOGS

This is a really trival way to generate a training dataset, and in many settings it might not produce great results. Indeed, it is a well known fact that clicks are biased: users tend to click on the first result proposed for a query, also if it is not relevant. A click on a document in position five could be considered more important than a click on a document in position one, because the user took the effort to browse the results list until position five.

Some approaches take into account the time spent on the clicked document (if the user spent only two seconds on the document and then clicked on other documents in the list, probably she did not intend to click that document).

There are many papers proposing techniques for removing the bias, or for taking into account the click positions, a good survey is Click Models for Web Search, by Chuklin, Markov and Rijke.

Prepare training data from human judgements

Another way to get training data is asking human judges to label them. Producing human judgements is in general more expensive, but the quality of the dataset produced can be better than the one produced from interaction data. It is worth to note that human judgements can be produced also relying on a crowdsourcing platform, that allows a user to show human workers documents associated with a query and to get back relevance labels. Usually a human worker visualizes a query together with a list of results and the task consists in assigning a relevance label to each document (e.g., Perfect, Excellent, Good, Fair, Not relevant). Training data can then be obtained by translating the labels into numeric scores (e.g., Perfect = 4, Excellent = 3, Good = 2, Fair = 1, Not relevant = 0).