2007-06-06 23:17:18 -04:00
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
2012-04-04 01:03:53 -04:00
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
2006-01-30 18:11:30 -05:00
< html >
< head >
2012-03-06 19:20:31 -05:00
< META http-equiv = "Content-Type" content = "text/html; charset=UTF-8" / >
< title > Solr Tutorial< / title >
< style >
pre.code {
background-color: #D3D3D3;
padding: 0.2em;
}
2012-03-27 21:35:27 -04:00
.codefrag {
font-family: monospace;
font-weight:bold;
}
2012-03-06 19:20:31 -05:00
< / style >
2006-01-30 18:11:30 -05:00
< / head >
2012-03-06 19:20:31 -05:00
< body >
2006-01-30 18:11:30 -05:00
2012-03-06 19:20:31 -05:00
< div id = "content" >
< h1 > Solr Tutorial< / h1 >
2006-01-30 18:11:30 -05:00
2012-01-07 12:07:21 -05:00
< a name = "N1000E" > < / a > < a name = "Overview" > < / a >
2006-02-28 14:24:54 -05:00
< h2 class = "boxed" > Overview< / h2 >
< div class = "section" >
< p >
This document covers the basics of running Solr using an example
schema, and some sample data.
< / p >
< / div >
2012-01-07 12:07:21 -05:00
< a name = "N10018" > < / a > < a name = "Requirements" > < / a >
2006-01-30 18:11:30 -05:00
< h2 class = "boxed" > Requirements< / h2 >
< div class = "section" >
2006-02-28 14:24:54 -05:00
< p >
To follow along with this tutorial, you will need...
< / p >
2006-01-30 18:11:30 -05:00
< ol >
2014-11-20 17:47:21 -05:00
< li > Java 1.8 or greater. Some places you can get it are from
2011-03-17 13:32:39 -04:00
< a href = "http://www.oracle.com/technetwork/java/javase/downloads/index.html" > Oracle< / a > ,
2012-06-06 22:27:49 -04:00
< a href = "http://openjdk.java.net/" > Open JDK< / a > , or
< a href = "http://www.ibm.com/developerworks/java/jdk/" > IBM< / a > .
< ul >
< li > Running < span class = "codefrag" > java -version< / span > at the command
2014-11-20 17:47:21 -05:00
line should indicate a version number starting with 1.8.
2012-06-06 22:27:49 -04:00
< / li >
< li > Gnu's GCJ is not supported and does not work with Solr.< / li >
< / ul >
< / li >
2006-01-30 18:11:30 -05:00
2012-03-27 20:06:50 -04:00
< li > A < a href = "http://lucene.apache.org/solr/mirrors-solr-latest-redir.html" > Solr release< / a > .
2006-02-28 14:24:54 -05:00
< / li >
2006-01-30 18:11:30 -05:00
< / ol >
< / div >
2012-01-07 12:07:21 -05:00
< a name = "N10040" > < / a > < a name = "Getting+Started" > < / a >
2006-02-28 14:24:54 -05:00
< h2 class = "boxed" > Getting Started< / h2 >
< div class = "section" >
< p >
2006-04-05 14:48:23 -04:00
< strong >
Please run the browser showing this tutorial and the Solr server on the same machine so tutorial links will correctly point to your Solr server.
< / strong >
< / p >
< p >
2012-08-23 19:51:42 -04:00
Begin by unzipping the Solr release and changing your working directory
2009-10-25 17:59:38 -04:00
to be the "< span class = "codefrag" > example< / span > " directory. (Note that the base directory name may vary with the version of Solr downloaded.) For example, with a shell in UNIX, Cygwin, or MacOS:
2006-02-28 14:24:54 -05:00
< / p >
< pre class = "code" >
2009-10-25 17:59:38 -04:00
user:~solr$ < strong > ls< / strong >
2006-04-05 14:48:23 -04:00
solr-nightly.zip
2009-10-25 17:59:38 -04:00
user:~solr$ < strong > unzip -q solr-nightly.zip< / strong >
user:~solr$ < strong > cd solr-nightly/example/< / strong >
2006-02-28 14:24:54 -05:00
< / pre >
< p >
Solr can run in any Java Servlet Container of your choice, but to simplify
2009-10-25 17:59:38 -04:00
this tutorial, the example index includes a small installation of Jetty.
2006-02-28 14:24:54 -05:00
< / p >
< p >
To launch Jetty with the Solr WAR, and the example configs, just run the < span class = "codefrag" > start.jar< / span > ...
< / p >
< pre class = "code" >
2009-10-25 17:59:38 -04:00
user:~/solr/example$ < strong > java -jar start.jar< / strong >
2012-06-06 22:23:54 -04:00
2012-06-06 15:25:59.815:INFO:oejs.Server:jetty-8.1.2.v20120308
2012-06-06 15:25:59.834:INFO:oejdp.ScanningAppProvider:Deployment monitor .../solr/example/webapps at interval 0
2012-06-06 15:25:59.839:INFO:oejd.DeploymentManager:Deployable added: .../solr/example/webapps/solr.war
2006-02-28 14:24:54 -05:00
...
2012-06-06 22:23:54 -04:00
Jun 6, 2012 3:26:03 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [collection1] Registered new searcher Searcher@7527e2ee main{StandardDirectoryReader(segments_1:1)}
2006-02-28 14:24:54 -05:00
< / pre >
< p >
This will start up the Jetty application server on port 8983, and use your terminal to display the logging information from Solr.
< / p >
< p >
2012-06-06 22:23:54 -04:00
You can see that the Solr is running by loading < a href = "http://localhost:8983/solr/" > http://localhost:8983/solr/< / a > in your web browser. This is the main starting point for Administering Solr.
2006-02-28 14:24:54 -05:00
< / p >
< / div >
2012-01-07 12:07:21 -05:00
< a name = "N10078" > < / a > < a name = "Indexing+Data" > < / a >
2006-02-28 14:24:54 -05:00
< h2 class = "boxed" > Indexing Data< / h2 >
< div class = "section" >
< p >
2009-10-25 17:59:38 -04:00
Your Solr server is up and running, but it doesn't contain any data. You can
2012-06-06 22:23:54 -04:00
modify a Solr index by POSTing commands to Solr to add (or
update) documents, delete documents, and commit pending adds and deletes.
These commands can be in a
< a href = "http://wiki.apache.org/solr/UpdateRequestHandler" > variety of formats< / a > .
2007-02-20 03:22:27 -05:00
< / p >
< p >
2012-06-06 22:23:54 -04:00
The < span class = "codefrag" > exampledocs< / span > directory contains sample files
showing of the types of commands Solr accepts, as well as a java utility
for posting them from the command line (a < span class = "codefrag" > post.sh< / span >
shell script is also available, but for this tutorial we'll use the
2012-09-14 15:23:16 -04:00
cross-platform Java client. Run < span class = "codefrag" > java -jar post.jar -h< / span > so see it's various options).
2012-06-06 22:23:54 -04:00
< / p >
< p > To try this, open a new terminal window, enter the exampledocs directory,
and run "< span class = "codefrag" > java -jar post.jar< / span > " on some of the XML
files in that directory.
2006-02-28 14:24:54 -05:00
< / p >
< pre class = "code" >
2009-10-25 17:59:38 -04:00
user:~/solr/example/exampledocs$ < strong > java -jar post.jar solr.xml monitor.xml< / strong >
2012-03-27 20:31:33 -04:00
SimplePostTool: version 1.4
2007-02-20 03:22:27 -05:00
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file solr.xml
SimplePostTool: POSTing file monitor.xml
SimplePostTool: COMMITting Solr index changes..
2006-02-28 14:24:54 -05:00
< / pre >
< p >
2007-02-20 03:22:27 -05:00
You have now indexed two documents in Solr, and committed these changes.
2012-06-06 22:23:54 -04:00
You can now search for "solr" by loading the < a href = "http://localhost:8983/solr/#/collection1/query" > "Query" tab< / a > in the Admin interface, and entering "solr" in the "q" text box. Clicking the "Execute Query" button should display the following URL containing one result...
2006-02-28 14:24:54 -05:00
< / p >
< p >
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/collection1/select?q=solr&wt=xml" > http://localhost:8983/solr/collection1/select?q=solr& wt=xml< / a >
2006-02-28 14:24:54 -05:00
< / p >
< p >
2012-06-06 22:23:54 -04:00
You can index all of the sample data, using the following command
(assuming your command line shell supports the *.xml notation):
2006-02-28 14:24:54 -05:00
< / p >
< pre class = "code" >
2012-03-06 19:20:31 -05:00
user:~/solr/example/exampledocs$ < strong > java -jar post.jar *.xml< / strong >
2012-03-27 20:31:33 -04:00
SimplePostTool: version 1.4
2007-02-20 03:22:27 -05:00
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
2012-03-27 20:31:33 -04:00
SimplePostTool: POSTing file gb18030-example.xml
2007-02-20 03:22:27 -05:00
SimplePostTool: POSTing file hd.xml
SimplePostTool: POSTing file ipod_other.xml
SimplePostTool: POSTing file ipod_video.xml
2012-06-06 22:23:54 -04:00
...
2007-02-20 03:22:27 -05:00
SimplePostTool: POSTing file solr.xml
SimplePostTool: POSTing file utf8-example.xml
SimplePostTool: POSTing file vidcard.xml
SimplePostTool: COMMITting Solr index changes..
2006-02-28 14:24:54 -05:00
< / pre >
< p >
2009-10-25 17:59:38 -04:00
...and now you can search for all sorts of things using the default < a href = "http://wiki.apache.org/solr/SolrQuerySyntax" > Solr Query Syntax< / a > (a superset of the Lucene query syntax)...
2006-02-28 14:24:54 -05:00
< / p >
< ul >
< li >
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/#/collection1/query?q=video" > video< / a >
2006-02-28 14:24:54 -05:00
< / li >
< li >
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/#/collection1/query?q=name:video" > name:video< / a >
2006-02-28 14:24:54 -05:00
< / li >
< li >
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/#/collection1/query?q=%2Bvideo%20%2Bprice%3A[*%20TO%20400]" > +video +price:[* TO 400]< / a >
2006-02-28 14:24:54 -05:00
< / li >
2009-10-25 17:59:38 -04:00
< / ul >
< p > < / p >
< p >
There are many other different ways to import your data into Solr... one can
< / p >
< ul >
< li > Import records from a database using the
< a href = "http://wiki.apache.org/solr/DataImportHandler" > Data Import Handler (DIH)< / a > .
< / li >
< li >
< a href = "http://wiki.apache.org/solr/UpdateCSV" > Load a CSV file< / a > (comma separated values),
including those exported by Excel or MySQL.
< / li >
2011-03-17 13:32:39 -04:00
< li >
< a href = "http://wiki.apache.org/solr/UpdateJSON" > POST JSON documents< / a >
< / li >
2009-10-25 17:59:38 -04:00
< li > Index binary documents such as Word and PDF with
< a href = "http://wiki.apache.org/solr/ExtractingRequestHandler" > Solr Cell< / a > (ExtractingRequestHandler).
< / li >
< li >
Use < a href = "http://wiki.apache.org/solr/Solrj" > SolrJ< / a > for Java or other Solr clients to
programatically create documents to send to Solr.
< / li >
2006-02-28 14:24:54 -05:00
< / ul >
< / div >
2012-01-07 12:07:21 -05:00
< a name = "N100EE" > < / a > < a name = "Updating+Data" > < / a >
2006-02-28 14:24:54 -05:00
< h2 class = "boxed" > Updating Data< / h2 >
< div class = "section" >
< p >
You may have noticed that even though the file < span class = "codefrag" > solr.xml< / span > has now
been POSTed to the server twice, you still only get 1 result when searching for
2011-05-29 06:27:23 -04:00
"solr". This is because the example < span class = "codefrag" > schema.xml< / span > specifies a "< span class = "codefrag" > uniqueKey< / span > " field
2012-06-06 22:23:54 -04:00
called "< span class = "codefrag" > id< / span > ". Whenever you POST commands to Solr to add a
2011-05-29 06:27:23 -04:00
document with the same value for the < span class = "codefrag" > uniqueKey< / span > as an existing document, it
2009-10-25 17:59:38 -04:00
automatically replaces it for you. You can see that that has happened by
2006-02-28 14:24:54 -05:00
looking at the values for < span class = "codefrag" > numDocs< / span > and < span class = "codefrag" > maxDoc< / span > in the
2009-10-25 17:59:38 -04:00
"CORE"/searcher section of the statistics page... < / p >
2006-02-28 14:24:54 -05:00
< p >
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/#/collection1/plugins/core?entry=searcher" > http://localhost:8983/solr/#/collection1/plugins/core?entry=searcher< / a >
2006-02-28 14:24:54 -05:00
< / p >
< p >
2009-10-25 17:59:38 -04:00
2011-05-29 06:27:23 -04:00
< strong > < span class = "codefrag" > numDocs< / span > < / strong > represents the number of searchable documents in the
2009-10-25 17:59:38 -04:00
index (and will be larger than the number of XML files since some files
2011-05-29 06:27:23 -04:00
contained more than one < span class = "codefrag" > < doc> < / span > ). < strong > < span class = "codefrag" > maxDoc< / span > < / strong >
may be larger as the < span class = "codefrag" > maxDoc< / span > count includes logically deleted documents that
2009-10-25 17:59:38 -04:00
have not yet been removed from the index. You can re-post the sample XML
2011-05-29 06:27:23 -04:00
files over and over again as much as you want and < span class = "codefrag" > numDocs< / span > will never
increase, because the new documents will constantly be replacing the old.
2006-02-28 14:24:54 -05:00
< / p >
< p >
2009-10-25 17:59:38 -04:00
Go ahead and edit the existing XML files to change some of the data, and re-run
the < span class = "codefrag" > java -jar post.jar< / span > command, you'll see your changes reflected
in subsequent searches.
2006-02-28 14:24:54 -05:00
< / p >
2012-01-07 12:07:21 -05:00
< a name = "N1012D" > < / a > < a name = "Deleting+Data" > < / a >
2006-02-28 14:24:54 -05:00
< h3 class = "boxed" > Deleting Data< / h3 >
2012-06-06 22:23:54 -04:00
< p >
You can delete data by POSTing a delete command to the update URL and
specifying the value of the document's unique key field, or a query that
matches multiple documents (be careful with that one!). Since these commands
are smaller, we will specify them right on the command line rather than
reference an XML file.
< / p >
< p > Execute the following command to delete a specific document< / p >
< pre class = "code" > java -Ddata=args -Dcommit=false -jar post.jar "< delete> < id> SP2514N< /id> < /delete> "< / pre >
< p >
Because we have specified "commit=false", a search for < a href = "http://localhost:8983/solr/#/collection1/query?q=id:SP2514N" > id:SP2514N< / a > we still find the document we have deleted. Since the example configuration uses Solr's "autoCommit" feature Solr will still automatically persist this change to the index, but it will not affect search results until an "openSearcher" commit is explicitly executed.
< / p >
< p >
Using the < a href = "http://localhost:8983/solr/#/collection1/plugins/updatehandler?entry=updateHandler" > statistics page< / a >
for the < span class = "codefrag" > updateHandler< / span > you can observe this delete
propogate to disk by watching the < span class = "codefrag" > deletesById< / span >
value drop to 0 as the < span class = "codefrag" > cumulative_deletesById< / span >
and < span class = "codefrag" > autocommit< / span > values increase.
< / p >
< p >
Here is an example of using delete-by-query to delete anything with
< a href = "http://localhost:8983/solr/collection1/select?q=name:DDR&fl=name" > DDR< / a > in the name:
< / p >
< pre class = "code" > java -Dcommit=false -Ddata=args -jar post.jar "< delete> < query> name:DDR< /query> < /delete> "< / pre >
< p >
2012-09-14 15:23:16 -04:00
You can force a new searcher to be opened to reflect these changes by sending an explicit commit command to Solr:
2012-06-06 22:23:54 -04:00
< / p >
2012-09-14 15:23:16 -04:00
< pre class = "code" > java -jar post.jar -< / pre >
2012-06-06 22:23:54 -04:00
< p >
Now re-execute < a href = "http://localhost:8983/solr/#/collection1/query?q=id:SP2514N" > the previous search< / a >
and verify that no matching documents are found. You can also revisit the
statistics page and observe the changes to both the number of commits in the < a href = "http://localhost:8983/solr/#/collection1/plugins/updatehandler?entry=updateHandler" > updateHandler< / a > and the numDocs in the < a href = "http://localhost:8983/solr/#/collection1/plugins/core?entry=searcher" > searcher< / a > .
< / p >
< p >
Commits that open a new searcher can be expensive operations so it's best to
make many changes to an index in a batch and then send the
< span class = "codefrag" > commit< / span > command at the end.
There is also an < span class = "codefrag" > optimize< / span > command that does the
same things as < span class = "codefrag" > commit< / span > , but also forces all index
segments to be merged into a single segment -- this can be very resource
2013-01-28 19:35:16 -05:00
intensive, but may be worthwhile for improving search speed if your index
2012-06-06 22:23:54 -04:00
changes very infrequently.
< / p >
< p >
All of the update commands can be specified using either < a href = "http://wiki.apache.org/solr/UpdateXmlMessages" > XML< / a > or < a href = "http://wiki.apache.org/solr/UpdateJSON" > JSON< / a > .
< / p >
2006-02-28 14:24:54 -05:00
< p > To continue with the tutorial, re-add any documents you may have deleted by going to the < span class = "codefrag" > exampledocs< / span > directory and executing< / p >
2007-02-20 03:22:27 -05:00
< pre class = "code" > java -jar post.jar *.xml< / pre >
2006-02-28 14:24:54 -05:00
< / div >
2012-01-07 12:07:21 -05:00
< a name = "N1017C" > < / a > < a name = "Querying+Data" > < / a >
2006-02-28 14:24:54 -05:00
< h2 class = "boxed" > Querying Data< / h2 >
< div class = "section" >
< p >
2011-05-29 06:27:23 -04:00
Searches are done via HTTP GET on the < span class = "codefrag" > select< / span > URL with the query string in the < span class = "codefrag" > q< / span > parameter.
2012-06-06 22:23:54 -04:00
You can pass a number of optional < a href = "http://wiki.apache.org/solr/SearchHandler" > request parameters< / a >
2011-05-29 06:27:23 -04:00
to the request handler to control what information is returned. For example, you can use the "< span class = "codefrag" > fl< / span > " parameter
to control what stored fields are returned, and if the relevancy score is returned:
2006-02-28 14:24:54 -05:00
< / p >
< ul >
< li >
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/collection1/select/?indent=on&q=video&fl=name,id" > q=video& fl=name,id< / a > (return only name and id fields) < / li >
2006-02-28 14:24:54 -05:00
< li >
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/collection1/select/?indent=on&q=video&fl=name,id,score" > q=video& fl=name,id,score< / a > (return relevancy score as well) < / li >
2006-02-28 14:24:54 -05:00
< li >
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/collection1/select/?indent=on&q=video&fl=*,score" > q=video& fl=*,score< / a > (return all stored fields, as well as relevancy score) < / li >
2006-02-28 14:24:54 -05:00
< li >
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/collection1/select/?indent=on&q=video&sort=price desc&fl=name,id,price" > q=video& sort=price desc& fl=name,id,price< / a > (add sort specification: sort by price descending) < / li >
2009-10-25 17:59:38 -04:00
< li >
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/collection1/select/?indent=on&q=video&wt=json" > q=video& wt=json< / a > (return response in JSON format) < / li >
2006-02-28 14:24:54 -05:00
< / ul >
< p >
2012-06-06 22:23:54 -04:00
The < a href = "http://localhost:8983/solr/#/collection1/query" > query form< / a >
provided in the web admin interface allows setting various request parameters
and is useful when testing or debugging queries.
< / p >
2012-01-07 12:07:21 -05:00
< a name = "N101BA" > < / a > < a name = "Sorting" > < / a >
2006-02-28 14:24:54 -05:00
< h3 class = "boxed" > Sorting< / h3 >
< p >
2007-06-06 23:17:18 -04:00
Solr provides a simple method to sort on one or more indexed fields.
2011-05-29 06:27:23 -04:00
Use the "< span class = "codefrag" > sort< / span > ' parameter to specify "field direction" pairs, separated by commas if there's more than one sort field:
2006-02-28 14:24:54 -05:00
< / p >
< ul >
< li >
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/collection1/select/?indent=on&q=video&sort=price+desc" > q=video& sort=price desc< / a >
2006-02-28 14:24:54 -05:00
< / li >
< li >
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/collection1/select/?indent=on&q=video&sort=price+asc" > q=video& sort=price asc< / a >
2006-02-28 14:24:54 -05:00
< / li >
< li >
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/collection1/select/?indent=on&q=video&sort=inStock+asc,price+desc" > q=video& sort=inStock asc, price desc< / a >
2006-02-28 14:24:54 -05:00
< / li >
< / ul >
< p >
2011-05-29 06:27:23 -04:00
"< span class = "codefrag" > score< / span > " can also be used as a field name when specifying a sort:
2006-02-28 14:24:54 -05:00
< / p >
< ul >
< li >
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/collection1/select/?indent=on&q=video&sort=score+desc" > q=video& sort=score desc< / a >
2006-02-28 14:24:54 -05:00
< / li >
< li >
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/collection1/select/?indent=on&q=video&sort=inStock+asc,score+desc" > q=video& sort=inStock asc, score desc< / a >
2006-02-28 14:24:54 -05:00
< / li >
2011-03-17 13:32:39 -04:00
< / ul >
< p >
2011-05-29 06:27:23 -04:00
Complex functions may also be used to sort results:
2011-03-17 13:32:39 -04:00
< / p >
< ul >
< li >
2012-09-14 15:23:16 -04:00
< a href = "http://localhost:8983/solr/collection1/select/?indent=on&q=video&sort=div(popularity,add(price,1))+desc" > q=video& sort=div(popularity,add(price,1)) desc< / a >
2011-03-17 13:32:39 -04:00
< / li >
2006-02-28 14:24:54 -05:00
< / ul >
< p >
2009-10-25 17:59:38 -04:00
If no sort is specified, the default is < span class = "codefrag" > score desc< / span > to return the matches having the highest relevancy.
2006-02-28 14:24:54 -05:00
< / p >
< / div >
2009-10-25 17:59:38 -04:00
2012-01-07 12:07:21 -05:00
< a name = "N101FE" > < / a > < a name = "Highlighting" > < / a >
2009-10-25 17:59:38 -04:00
< h2 class = "boxed" > Highlighting< / h2 >
< div class = "section" >
< p >
2013-01-28 19:35:16 -05:00
Hit highlighting returns relevant snippets of each returned document, and highlights
2011-05-29 06:27:23 -04:00
terms from the query within those context snippets.
2009-10-25 17:59:38 -04:00
< / p >
< p >
The following example searches for < span class = "codefrag" > video card< / span > and requests
highlighting on the fields < span class = "codefrag" > name,features< / span > . This causes a
< span class = "codefrag" > highlighting< / span > section to be added to the response with the
words to highlight surrounded with < span class = "codefrag" > < em> < / span > (for emphasis)
tags.
< / p >
< p >
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/collection1/select/?wt=json&indent=on&q=video+card&fl=name,id&hl=true&hl.fl=name,features" > ...& q=video card& fl=name,id& hl=true& hl.fl=name,features< / a >
2009-10-25 17:59:38 -04:00
< / p >
< p >
More request parameters related to controlling highlighting may be found
< a href = "http://wiki.apache.org/solr/HighlightingParameters" > here< / a > .
< / p >
< / div > <!-- highlighting -->
2012-01-07 12:07:21 -05:00
< a name = "N10227" > < / a > < a name = "Faceted+Search" > < / a >
2009-10-25 17:59:38 -04:00
< h2 class = "boxed" > Faceted Search< / h2 >
< div class = "section" >
< p >
Faceted search takes the documents matched by a query and generates counts for various
properties or categories. Links are usually provided that allows users to "drill down" or
refine their search results based on the returned categories.
< / p >
< p >
The following example searches for all documents (< span class = "codefrag" > *:*< / span > ) and
requests counts by the category field < span class = "codefrag" > cat< / span > .
< / p >
< p >
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/collection1/select/?wt=json&indent=on&q=*:*&fl=name&facet=true&facet.field=cat" > ...& q=*:*& facet=true& facet.field=cat< / a >
2009-10-25 17:59:38 -04:00
< / p >
< p >
Notice that although only the first 10 documents are returned in the results list,
the facet counts generated are for the complete set of documents that match the query.
< / p >
< p >
2009-10-26 08:50:08 -04:00
We can facet multiple ways at the same time. The following example adds a facet on the
2009-10-25 17:59:38 -04:00
boolean < span class = "codefrag" > inStock< / span > field:
< / p >
< p >
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/collection1/select/?wt=json&indent=on&q=*:*&fl=name&facet=true&facet.field=cat&facet.field=inStock" > ...& q=*:*& facet=true& facet.field=cat& facet.field=inStock< / a >
2009-10-25 17:59:38 -04:00
< / p >
< p >
Solr can also generate counts for arbitrary queries. The following example
queries for < span class = "codefrag" > ipod< / span > and shows prices below and above 100 by using
range queries on the price field.
< / p >
< p >
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/collection1/select/?wt=json&indent=on&q=ipod&fl=name&facet=true&facet.query=price:[0+TO+100]&facet.query=price:[100+TO+*]" > ...& q=ipod& facet=true& facet.query=price:[0 TO 100]& facet.query=price:[100 TO *]< / a >
2009-10-25 17:59:38 -04:00
< / p >
< p >
2012-06-06 22:23:54 -04:00
Solr can even facet by numeric ranges (including dates). This example requests counts for the manufacture date (< span class = "codefrag" > manufacturedate_dt< / span > field) for each year between 2004 and 2010.
2009-10-25 17:59:38 -04:00
< / p >
< p >
2012-09-14 15:23:16 -04:00
< a href = "http://localhost:8983/solr/collection1/select/?wt=json&indent=on&q=*:*&fl=name,manufacturedate_dt&facet=true&facet.range=manufacturedate_dt&facet.range.start=2004-01-01T00:00:00Z&facet.range.end=2010-01-01T00:00:00Z&facet.range.gap=%2b1YEAR" > ...& q=*:*& facet=true& facet.range=manufacturedate_dt& facet.range.start=2004-01-01T00:00:00Z& facet.range.end=2010-01-01T00:00:00Z& facet.range.gap=+1YEAR< / a >
2009-10-25 17:59:38 -04:00
< / p >
< p >
More information on faceted search may be found on the
< a href = "http://wiki.apache.org/solr/SolrFacetingOverview" > faceting overview< / a >
and
< a href = "http://wiki.apache.org/solr/SimpleFacetParameters" > faceting parameters< / a >
pages.
< / p >
< / div > <!-- faceted search -->
2012-01-07 12:07:21 -05:00
< a name = "N10278" > < / a > < a name = "Search+UI" > < / a >
2011-03-17 13:32:39 -04:00
< h2 class = "boxed" > Search UI< / h2 >
< div class = "section" >
< p >
2012-06-06 22:23:54 -04:00
Solr includes an example search interface built with < a href = "https://wiki.apache.org/solr/VelocityResponseWriter" > velocity templating< / a >
that demonstrates many features, including searching, faceting, highlighting,
autocomplete, and geospatial searching.
< / p >
2011-03-17 13:32:39 -04:00
< p >
2012-06-06 22:23:54 -04:00
Try it out at
< a href = "http://localhost:8983/solr/collection1/browse" > http://localhost:8983/solr/collection1/browse< / a >
2011-03-17 13:32:39 -04:00
< / p >
< / div > <!-- Search UI -->
2012-01-07 12:07:21 -05:00
< a name = "N1028B" > < / a > < a name = "Text+Analysis" > < / a >
2006-02-28 14:24:54 -05:00
< h2 class = "boxed" > Text Analysis< / h2 >
< div class = "section" >
< p >
2011-05-29 06:27:23 -04:00
Text fields are typically indexed by breaking the text into words and applying various transformations such as
2006-02-28 14:24:54 -05:00
lowercasing, removing plurals, or stemming to increase relevancy. The same text transformations are normally
applied to any queries in order to match what is indexed.
< / p >
2011-05-29 06:27:23 -04:00
< p >
The < a href = "http://wiki.apache.org/solr/SchemaXml" > schema< / a > defines
2012-06-06 22:23:54 -04:00
the fields in the index and what type of analysis is applied to them. The current schema your collection is using
2013-01-28 19:35:16 -05:00
may be viewed directly via the < a href = "http://localhost:8983/solr/#/collection1/schema" > Schema tab< / a > in the Admin UI, or explored dynamically using the < a href = "http://localhost:8983/solr/#/collection1/schema-browser" > Schema Browser tab< / a > .
2012-06-06 22:23:54 -04:00
< / p >
2011-05-29 06:27:23 -04:00
< p >
2012-06-06 22:23:54 -04:00
The best analysis components (tokenization and filtering) for your textual
content depends heavily on language.
As you can see in the < a href = "http://localhost:8983/solr/#/collection1/schema-browser?type=text_general" > Schema Browser< / a > ,
many of the fields in the example schema are using a
< span class = "codefrag" > fieldType< / span > named
< span class = "codefrag" > text_general< / span > , which has defaults appropriate for
most languages.
< / p >
2011-05-29 06:27:23 -04:00
< p >
2012-03-28 00:55:10 -04:00
If you know your textual content is English, as is the case for the example
documents in this tutorial, and you'd like to apply English-specific stemming
and stop word removal, as well as split compound words, you can use the
2012-09-14 15:23:16 -04:00
< a href = "http://localhost:8983/solr/#/collection1/schema-browser?type=text_en_splitting" > < span class = "codefrag" > text_en_splitting< / span > < / a > fieldType instead.
2012-03-28 00:55:10 -04:00
Go ahead and edit the < span class = "codefrag" > schema.xml< / span > in the
2012-08-23 19:51:42 -04:00
< span class = "codefrag" > solr/example/solr/collection1/conf< / span > directory,
2012-03-28 00:55:10 -04:00
to use the < span class = "codefrag" > text_en_splitting< / span > fieldType for
the < span class = "codefrag" > text< / span > and
< span class = "codefrag" > features< / span > fields like so:
< / p >
< pre class = "code" >
< field name="features" < b > type="text_en_splitting"< / b > indexed="true" stored="true" multiValued="true"/>
...
< field name="text" < b > type="text_en_splitting"< / b > indexed="true" stored="false" multiValued="true"/>
< / pre >
< p >
Stop and restart Solr after making these changes and then re-post all of
the example documents using
< span class = "codefrag" > java -jar post.jar *.xml< / span > .
Now queries like the ones listed below will demonstrate English-specific
transformations:
2011-05-29 06:27:23 -04:00
< / p >
2006-02-28 14:24:54 -05:00
< ul >
< li > A search for
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/collection1/select?q=power-shot&fl=name" > power-shot< / a >
2012-03-28 00:55:10 -04:00
can match < span class = "codefrag" > PowerShot< / span > , and
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/collection1/select?q=adata&fl=name" > adata< / a >
2012-03-28 00:55:10 -04:00
can match < span class = "codefrag" > A-DATA< / span > by using the
< span class = "codefrag" > WordDelimiterFilter< / span > and < span class = "codefrag" > LowerCaseFilter< / span > .
< / li >
2006-02-28 14:24:54 -05:00
< li > A search for
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/collection1/select?q=features:recharging&fl=name,features" > features:recharging< / a >
2012-03-28 00:55:10 -04:00
can match < span class = "codefrag" > Rechargeable< / span > using the stemming
features of < span class = "codefrag" > PorterStemFilter< / span > .
< / li >
2006-02-28 14:24:54 -05:00
< li > A search for
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/collection1/select?q=%221 gigabyte%22&fl=name" > "1 gigabyte"< / a >
2012-03-28 00:55:10 -04:00
can match < span class = "codefrag" > 1GB< / span > , and the commonly misspelled
2012-06-06 22:23:54 -04:00
< a href = "http://localhost:8983/solr/collection1/select?q=pixima&fl=name" > pixima< / a > can matches < span class = "codefrag" > Pixma< / span > using the
2012-03-28 00:55:10 -04:00
< span class = "codefrag" > SynonymFilter< / span > .
< / li >
2006-02-28 14:24:54 -05:00
< / ul >
< p > A full description of the analysis components, Analyzers, Tokenizers, and TokenFilters
available for use is < a href = "http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters" > here< / a > .
< / p >
2012-01-07 12:07:21 -05:00
< a name = "N1030B" > < / a > < a name = "Analysis+Debugging" > < / a >
2012-06-06 22:23:54 -04:00
2006-02-28 14:24:54 -05:00
< h3 class = "boxed" > Analysis Debugging< / h3 >
< p >
2012-06-06 22:23:54 -04:00
There is a handy < a href = "http://localhost:8983/solr/#/collection1/analysis" > Analysis tab< / a >
where you can see how a text value is broken down into words by both Index time nad Query time analysis chains for a field or field type. This page shows the resulting tokens after they pass through each filter in the chains.
2012-03-28 00:55:10 -04:00
< / p >
2006-02-28 14:24:54 -05:00
< p >
2012-09-17 20:17:04 -04:00
< a href = "http://localhost:8983/solr/#/collection1/analysis?analysis.fieldvalue=Canon+Power-Shot+SD500&analysis.query=&analysis.fieldtype=text_en_splitting&verbose_output=0" > This url< / a >
2012-06-06 22:23:54 -04:00
shows the tokens created from
"< span class = "codefrag" > Canon Power-Shot SD500< / span > "
using the
< span class = "codefrag" > text_en_splitting< / span > type. Each section of
2012-03-28 00:55:10 -04:00
the table shows the resulting tokens after having passed through the next
2012-06-06 22:23:54 -04:00
< span class = "codefrag" > TokenFilter< / span > in the (Index) analyzer.
2012-03-28 00:55:10 -04:00
Notice how both < span class = "codefrag" > powershot< / span > and
< span class = "codefrag" > power< / span > , < span class = "codefrag" > shot< / span >
2012-06-06 22:23:54 -04:00
are indexed, using tokens that have the same "position".
(Compare the previous output with
2012-09-17 20:17:04 -04:00
< a href = "http://localhost:8983/solr/#/collection1/analysis?analysis.fieldvalue=Canon+Power-Shot+SD500&analysis.query=&analysis.fieldtype=text_general&verbose_output=0" > The tokens produced using the text_general field type< / a > .)
2012-03-28 00:55:10 -04:00
< / p >
2012-06-06 22:23:54 -04:00
2012-03-28 00:55:10 -04:00
< p >
2012-09-17 20:17:04 -04:00
Mousing over the section label to the left of the section will display the full name of the analyzer component at that stage of the chain. Toggling the "Verbose Output" checkbox will < a href = "http://localhost:8983/solr/#/collection1/analysis?analysis.fieldvalue=Canon+Power-Shot+SD500&analysis.query=&analysis.fieldtype=text_en_splitting&verbose_output=1" > show/hide the detailed token attributes< / a > .
2012-03-28 00:55:10 -04:00
< / p >
< p >
2012-09-17 20:17:04 -04:00
When both < a href = "http://localhost:8983/solr/#/collection1/analysis?analysis.fieldvalue=Canon+Power-Shot+SD500&analysis.query=power+shot+sd-500&analysis.fieldtype=text_en_splitting&verbose_output=0" > Index and Query< / a >
2012-06-06 22:23:54 -04:00
values are provided, two tables will be displayed side by side showing the
2013-01-28 19:35:16 -05:00
results of each chain. Terms in the Index chain results that are equivalent
2012-06-06 22:23:54 -04:00
to the final terms produced by the Query chain will be highlighted.
2012-03-28 00:55:10 -04:00
< / p >
< p >
Other interesting examples:
< / p >
< ul >
2012-09-17 20:17:04 -04:00
< li > < a href = "http://localhost:8983/solr/#/collection1/analysis?analysis.fieldvalue=A+new+nation%2C+conceived+in+liberty+and+dedicated+to+the+proposition+that+all+men+are+created+equal.%0A&analysis.query=liberties+and+equality&analysis.fieldtype=text_en&verbose_output=0" > English stemming and stop-words< / a >
2012-03-28 00:55:10 -04:00
using the < span class = "codefrag" > text_en< / span > field type
< / li >
2012-09-17 20:17:04 -04:00
< li > < a href = "http://localhost:8983/solr/#/collection1/analysis?analysis.fieldtype=text_cjk&analysis.fieldvalue=%EF%BD%B6%EF%BE%80%EF%BD%B6%EF%BE%85&analysis.query=%E3%82%AB%E3%82%BF%E3%82%AB%E3%83%8A&verbose_output=1" > Half-width katakana normalization with bi-graming< / a >
2012-03-28 00:55:10 -04:00
using the < span class = "codefrag" > text_cjk< / span > field type
< / li >
2012-09-17 20:17:04 -04:00
< li > < a href = "http://localhost:8983/solr/#/collection1/analysis?analysis.fieldtype=text_ja&analysis.fieldvalue=%E7%A7%81%E3%81%AF%E5%88%B6%E9%99%90%E3%82%B9%E3%83%94%E3%83%BC%E3%83%89%E3%82%92%E8%B6%85%E3%81%88%E3%82%8B%E3%80%82&verbose_output=1" > Japanese morphological decomposition with part-of-speech filtering< / a >
2012-03-28 00:55:10 -04:00
using the < span class = "codefrag" > text_ja< / span > field type
< / li >
2012-09-17 20:17:04 -04:00
< li > < a href = "http://localhost:8983/solr/#/collection1/analysis?analysis.fieldtype=text_ar&analysis.fieldvalue=%D9%84%D8%A7+%D8%A3%D8%AA%D9%83%D9%84%D9%85+%D8%A7%D9%84%D8%B9%D8%B1%D8%A8%D9%8A%D8%A9&verbose_output=1" > Arabic stop-words, normalization, and stemming< / a >
2012-03-28 00:55:10 -04:00
using the < span class = "codefrag" > text_ar< / span > field type
< / li >
< / ul >
2006-02-28 14:24:54 -05:00
< / div >
2006-01-30 18:11:30 -05:00
2012-01-07 12:07:21 -05:00
< a name = "N1034D" > < / a > < a name = "Conclusion" > < / a >
2006-04-05 14:48:23 -04:00
< h2 class = "boxed" > Conclusion< / h2 >
< div class = "section" >
< p >
2009-10-25 17:59:38 -04:00
Congratulations! You successfully ran a small Solr instance, added some
2011-05-29 06:27:23 -04:00
documents, and made changes to the index and schema. You learned about queries, text
2009-10-25 17:59:38 -04:00
analysis, and the Solr admin interface. You're ready to start using Solr on
your own project! Continue on with the following steps:
2006-04-05 14:48:23 -04:00
< / p >
< ul >
2012-03-06 19:20:31 -05:00
< li > Subscribe to the Solr < a href = "http://lucene.apache.org/solr/discussion.html" > mailing lists< / a > !< / li >
2006-04-05 14:48:23 -04:00
2011-05-29 06:27:23 -04:00
< li > Make a copy of the Solr < span class = "codefrag" > example< / span > directory as a template for your project.< / li >
2006-04-05 14:48:23 -04:00
2012-07-09 12:43:07 -04:00
< li > Customize the schema and other config in < span class = "codefrag" > solr/collection1/conf/< / span > to meet your needs.< / li >
2006-04-05 14:48:23 -04:00
< / ul >
2009-10-25 17:59:38 -04:00
< p >
2011-05-29 06:27:23 -04:00
Solr has a ton of other features that we haven't touched on here, including
2009-10-25 17:59:38 -04:00
< a href = "http://wiki.apache.org/solr/DistributedSearch" > distributed search< / a >
to handle huge document collections,
< a href = "http://wiki.apache.org/solr/FunctionQuery" > function queries< / a > ,
< a href = "http://wiki.apache.org/solr/StatsComponent" > numeric field statistics< / a > ,
and
< a href = "http://wiki.apache.org/solr/ClusteringComponent" > search results clustering< / a > .
2011-05-29 06:27:23 -04:00
Explore the < a href = "http://wiki.apache.org/solr/FrontPage" > Solr Wiki< / a > to find
2012-03-06 19:20:31 -05:00
more details about Solr's many < a href = "http://lucene.apache.org/solr/features.html" > features< / a > .
2009-10-25 17:59:38 -04:00
< / p >
2006-04-05 14:48:23 -04:00
< p >
Have Fun, and we'll see you on the Solr mailing lists!
< / p >
< / div >
2006-01-30 18:11:30 -05:00
< / div >
2012-03-06 19:20:31 -05:00
2006-01-30 18:11:30 -05:00
< div class = "clearboth" > < / div >
2012-03-06 19:20:31 -05:00
2006-01-30 18:11:30 -05:00
< div id = "footer" >
< div class = "copyright" >
Copyright ©
2012-03-27 20:31:33 -04:00
2012 < a href = "http://www.apache.org/licenses/" > The Apache Software Foundation.< / a >
2006-01-30 18:11:30 -05:00
< / div >
< / div >
< / body >
< / html >