This README file is only about this example directory's content. Please refer to the Solr Reference Guide's section on [Learning To Rank](https://lucene.apache.org/solr/guide/learning-to-rank.html) section for broader information on Learning to Rank (LTR) with Apache Solr. # Start Solr with the LTR plugin enabled `./bin/solr -e techproducts -Dsolr.ltr.enabled=true` # Train an example machine learning model using LIBLINEAR 1. Download and install [liblinear](https://www.csie.ntu.edu.tw/~cjlin/liblinear/) 2. Change `contrib/ltr/example/config.json` "trainingLibraryLocation" to point to the train directory where you installed liblinear. Alternatively, leave the `config.json` file unchanged and create a soft-link to your `liblinear` directory e.g. `ln -s /Users/YourNameHere/Downloads/liblinear-2.1 ./contrib/ltr/example/liblinear` 3. Extract features, train a reranking model, and deploy it to Solr. `cd contrib/ltr/example` `python train_and_upload_demo_model.py -c config.json` This script deploys your features from `config.json` "solrFeaturesFile" to Solr. Then it takes the relevance judged query document pairs of "userQueriesFile" and merges it with the features extracted from Solr into a training file. That file is used to train a linear model, which is then deployed to Solr for you to rerank results. 4. Search and rerank the results using the trained model ``` http://localhost:8983/solr/techproducts/query?q=test&rq={!ltr%20model=exampleModel%20reRankDocs=25%20efi.user_query=%27test%27}&fl=price,score,name ``` # Assemble training data In order to train a learning to rank model you need training data. Training data is what *teaches* the model what the appropriate weight for each feature is. In general training data is a collection of queries with associated documents and what their ranking/score should be. As an example: ``` hard drive|SP2514N |0.6|CLICK_LOGS hard drive|6H500F0 |0.3|CLICK_LOGS hard drive|F8V7067-APL-KIT|0.0|CLICK_LOGS hard drive|IW-02 |0.0|CLICK_LOGS ipod |MA147LL/A |1.0|HUMAN_JUDGEMENT ipod |F8V7067-APL-KIT|0.5|HUMAN_JUDGEMENT ipod |IW-02 |0.5|HUMAN_JUDGEMENT ipod |6H500F0 |0.0|HUMAN_JUDGEMENT ``` The columns in the example represent: 1. the user query; 2. a unique id for a document in the response; 3. the a score representing the relevance of that document (not necessarily between zero and one); 4. the source, i.e., if the training record was produced by using interaction data (`CLICK_LOGS`) or by human judgements (`HUMAN_JUDGEMENT`). ## How to produce training data You might collect data for use with your machine learning algorithm relying on: * **Users Interactions**: given a specific query, it is possible to log all the users interactions (e.g., clicks, shares on social networks, send by email etc), and then use them as proxies for relevance; * **Human Judgements**: A training dataset is produced by explicitly asking some judges to evaluate the relevance of a document given the query. ### How to prepare training data from interaction data? There are many ways of preparing interaction data for training a model, and it is outside the scope of this readme to provide a complete review of all the techniques. In the following we illustrate a simple way for obtaining training data from simple interaction data. Simple interaction data will be a log file generated by your application after it has talked to Solr. The log will contain two different types of record: * **query**: when a user performs a query we have a record with `user-id, query, responses`, where `responses` is a list of unique document ids returned for a query. **Example:** ``` diego, hard drive, [SP2514N,6H500F0,F8V7067-APL-KIT,IW-02] ``` * **click**: when a user performs a click we have a record with `user-id, query, document-id, click` **Example:** ``` christine, hard drive, SP2154N diego , hard drive, SP2154N michael , hard drive, SP2154N joshua , hard drive, IW-02 ``` Given a log composed by records like these, a simple way to produce a training dataset is to group on the query field and then assign to each query a relevance score equal to the number of clicks: ``` hard drive|SP2514N |3|CLICK_LOGS hard drive|IW-02 |1|CLICK_LOGS hard drive|6H500F0 |0|CLICK_LOGS hard drive|F8V7067-APL-KIT|0|CLICK_LOGS ``` This is a really trival way to generate a training dataset, and in many settings it might not produce great results. Indeed, it is a well known fact that clicks are *biased*: users tend to click on the first result proposed for a query, also if it is not relevant. A click on a document in position five could be considered more important than a click on a document in position one, because the user took the effort to browse the results list until position five. Some approaches take into account the time spent on the clicked document (if the user spent only two seconds on the document and then clicked on other documents in the list, probably she did not intend to click that document). There are many papers proposing techniques for removing the bias, or for taking into account the click positions, a good survey is [Click Models for Web Search](http://clickmodels.weebly.com/uploads/5/2/2/5/52257029/mc2015-clickmodels.pdf), by Chuklin, Markov and Rijke. ### Prepare training data from human judgements Another way to get training data is asking human judges to label them. Producing human judgements is in general more expensive, but the quality of the dataset produced can be better than the one produced from interaction data. It is worth to note that human judgements can be produced also relying on a crowdsourcing platform, that allows a user to show human workers documents associated with a query and to get back relevance labels. Usually a human worker visualizes a query together with a list of results and the task consists in assigning a relevance label to each document (e.g., Perfect, Excellent, Good, Fair, Not relevant). Training data can then be obtained by translating the labels into numeric scores (e.g., Perfect = 4, Excellent = 3, Good = 2, Fair = 1, Not relevant = 0).