From 83db0a52a103a5df0c22871bb366096bd389298c Mon Sep 17 00:00:00 2001 From: Erik Hatcher Date: Wed, 14 Jan 2015 03:05:35 +0000 Subject: [PATCH] SOLR-6870: overhaul/rename tutorial git-svn-id: https://svn.apache.org/repos/asf/lucene/dev/trunk@1651560 13f79535-47bb-0310-9956-ffa450edef68 --- solr/CHANGES.txt | 2 +- solr/README.txt | 10 +- solr/build.xml | 8 +- solr/example/README.txt | 2 +- .../SYSTEM_REQUIREMENTS.mdtext} | 0 solr/site/html/solr.svg | 39 - solr/site/html/tutorial.html | 686 ------------------ solr/site/{xsl => }/index.xsl | 2 +- solr/site/quickstart.mdtext | 596 +++++++++++++++ 9 files changed, 606 insertions(+), 739 deletions(-) rename solr/{SYSTEM_REQUIREMENTS.txt => site/SYSTEM_REQUIREMENTS.mdtext} (100%) delete mode 100644 solr/site/html/solr.svg delete mode 100755 solr/site/html/tutorial.html rename solr/site/{xsl => }/index.xsl (97%) create mode 100644 solr/site/quickstart.mdtext diff --git a/solr/CHANGES.txt b/solr/CHANGES.txt index 834b6e616a7..d9dcd20732a 100644 --- a/solr/CHANGES.txt +++ b/solr/CHANGES.txt @@ -15,7 +15,7 @@ Getting Started You need a Java 1.8 VM or later installed. In this release, there is an example Solr server including a bundled servlet container in the directory named "example". -See the tutorial at http://lucene.apache.org/solr/tutorial.html +See the Quick Start guide at http://lucene.apache.org/solr/quickstart.html ================== 6.0.0 ================== diff --git a/solr/README.txt b/solr/README.txt index 2895cfd885c..02edd32fc92 100644 --- a/solr/README.txt +++ b/solr/README.txt @@ -89,15 +89,11 @@ For more information about Solr examples please read... 
* example/solr/README.txt For more information about the "Solr Home" and Solr specific configuration - * http://lucene.apache.org/solr/tutorial.html - For a Tutorial using this example configuration - * http://wiki.apache.org/solr/SolrResources + * http://lucene.apache.org/solr/quickstart.html + For a Quick Start guide + * http://lucene.apache.org/solr/resources.html For a list of other tutorials and introductory articles. -A tutorial is available at: - - http://lucene.apache.org/solr/tutorial.html - or linked from "docs/index.html" in a binary distribution. Also, there are Solr clients for many programming languages, see diff --git a/solr/build.xml b/solr/build.xml index e58ff1250f5..ccd1325dffa 100644 --- a/solr/build.xml +++ b/solr/build.xml @@ -151,7 +151,7 @@ so we pass ourself (${ant.file}) here. The list of module build.xmls is given via string parameter, that must be splitted by the XSL at '|'. --> - + @@ -162,12 +162,12 @@ - - + + - + diff --git a/solr/example/README.txt b/solr/example/README.txt index fd7cb7d8966..26f3b67c4f2 100644 --- a/solr/example/README.txt +++ b/solr/example/README.txt @@ -48,7 +48,7 @@ For more information about this example please read... * example/solr/README.txt For more information about the "Solr Home" and Solr specific configuration - * http://lucene.apache.org/solr/tutorial.html + * http://lucene.apache.org/solr/quickstart.html For a Tutorial using this example configuration * http://wiki.apache.org/solr/SolrResources For a list of other tutorials and introductory articles. 
diff --git a/solr/SYSTEM_REQUIREMENTS.txt b/solr/site/SYSTEM_REQUIREMENTS.mdtext similarity index 100% rename from solr/SYSTEM_REQUIREMENTS.txt rename to solr/site/SYSTEM_REQUIREMENTS.mdtext diff --git a/solr/site/html/solr.svg b/solr/site/html/solr.svg deleted file mode 100644 index cb4ae64f814..00000000000 --- a/solr/site/html/solr.svg +++ /dev/null @@ -1,39 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/solr/site/html/tutorial.html b/solr/site/html/tutorial.html deleted file mode 100755 index 7e332256ecc..00000000000 --- a/solr/site/html/tutorial.html +++ /dev/null @@ -1,686 +0,0 @@ - - - - - -Solr Tutorial - - - - - -
-

Solr Tutorial

- - -

Overview

-
-

-This document covers the basics of running Solr using an example -schema, and some sample data. -

-
- - - -

Requirements

-
-

-To follow along with this tutorial, you will need... -

-
    - -
  1. Java 1.8 or greater. Some places you can get it are from - Oracle, - Open JDK, or - IBM. -
      -
    • Running java -version at the command - line should indicate a version number starting with 1.8. -
    • -
    • Gnu's GCJ is not supported and does not work with Solr.
    • -
    -
  2. - -
  3. A Solr release. -
  4. - -
-
- - - -

Getting Started

-
-

- -Please run the browser showing this tutorial and the Solr server on the same machine so tutorial links will correctly point to your Solr server. - -

-

-Begin by unzipping the Solr release and changing your working directory -to be the "example" directory. (Note that the base directory name may vary with the version of Solr downloaded.) For example, with a shell in UNIX, Cygwin, or MacOS: -

-
-user:~solr$ ls
-solr-nightly.zip
-user:~solr$ unzip -q solr-nightly.zip
-user:~solr$ cd solr-nightly/example/
-
-

-Solr can run in any Java Servlet Container of your choice, but to simplify -this tutorial, the example index includes a small installation of Jetty. -

-

-To launch Jetty with the Solr WAR, and the example configs, just run the start.jar ... -

-
-user:~/solr/example$ java -jar start.jar
-2012-06-06 15:25:59.815:INFO:oejs.Server:jetty-8.1.2.v20120308
-2012-06-06 15:25:59.834:INFO:oejdp.ScanningAppProvider:Deployment monitor .../solr/example/webapps at interval 0
-2012-06-06 15:25:59.839:INFO:oejd.DeploymentManager:Deployable added: .../solr/example/webapps/solr.war
-...
-Jun 6, 2012 3:26:03 PM org.apache.solr.core.SolrCore registerSearcher
-INFO: [collection1] Registered new searcher Searcher@7527e2ee main{StandardDirectoryReader(segments_1:1)}
-
-

-This will start up the Jetty application server on port 8983, and use your terminal to display the logging information from Solr. -

-

-You can see that Solr is running by loading http://localhost:8983/solr/ in your web browser. This is the main starting point for administering Solr. -

-
- - - - - -

Indexing Data

-
-

-Your Solr server is up and running, but it doesn't contain any data. You can -modify a Solr index by POSTing commands to Solr to add (or -update) documents, delete documents, and commit pending adds and deletes. -These commands can be in a -variety of formats. -

-

-The exampledocs directory contains sample files -showing the types of commands Solr accepts, as well as a Java utility -for posting them from the command line. Run java -jar post.jar -h to see its various options. -

-

To try this, open a new terminal window, enter the exampledocs directory, -and run "java -Dc=collection_name -jar post.jar" on some of the XML -files in that directory. -

-
-user:~/solr/example/exampledocs$ java -Dc=techproducts -jar post.jar solr.xml monitor.xml
-SimplePostTool: version 1.4
-SimplePostTool: POSTing files to http://localhost:8983/solr/update..
-SimplePostTool: POSTing file solr.xml
-SimplePostTool: POSTing file monitor.xml
-SimplePostTool: COMMITting Solr index changes..
-
-

-You have now indexed two documents in Solr, and committed these changes. -You can now search for "solr" by loading the "Query" tab in the Admin interface, and entering "solr" in the "q" text box. Clicking the "Execute Query" button should display the following URL containing one result... -

-

-http://localhost:8983/solr/collection1/select?q=solr&wt=xml - -

-

-You can index all of the sample data, using the following command -(assuming your command line shell supports the *.xml notation): -

-
-user:~/solr/example/exampledocs$ java -Dc=techproducts -jar post.jar *.xml
-SimplePostTool: version 1.4
-SimplePostTool: POSTing files to http://localhost:8983/solr/update..
-SimplePostTool: POSTing file gb18030-example.xml
-SimplePostTool: POSTing file hd.xml
-SimplePostTool: POSTing file ipod_other.xml
-SimplePostTool: POSTing file ipod_video.xml
-...
-SimplePostTool: POSTing file solr.xml
-SimplePostTool: POSTing file utf8-example.xml
-SimplePostTool: POSTing file vidcard.xml
-SimplePostTool: COMMITting Solr index changes..
-
-

- ...and now you can search for all sorts of things using the default Solr Query Syntax (a superset of the Lucene query syntax)... -

- -

-

- There are many other different ways to import your data into Solr... one can -

-
    - -
  • Import records from a database using the - Data Import Handler (DIH). -
  • - -
  • -Load a CSV file (comma separated values), - including those exported by Excel or MySQL. -
  • - -
  • -POST JSON documents - -
  • - -
  • Index binary documents such as Word and PDF with - Solr Cell (ExtractingRequestHandler). -
  • - -
  • - Use SolrJ for Java or other Solr clients to - programmatically create documents to send to Solr. -
  • - - -
-
- - - - - -

Updating Data

-
-

-You may have noticed that even though the file solr.xml has now -been POSTed to the server twice, you still only get 1 result when searching for -"solr". This is because the example schema.xml specifies a "uniqueKey" field -called "id". Whenever you POST commands to Solr to add a -document with the same value for the uniqueKey as an existing document, it -automatically replaces it for you. You can see that that has happened by -looking at the values for numDocs and maxDoc in the -"CORE"/searcher section of the statistics page...

-

- -http://localhost:8983/solr/#/collection1/plugins/core?entry=searcher - -

-

- -numDocs represents the number of searchable documents in the - index (and will be larger than the number of XML files since some files - contained more than one <doc>). maxDoc - may be larger as the maxDoc count includes logically deleted documents that - have not yet been removed from the index. You can re-post the sample XML - files over and over again as much as you want and numDocs will never - increase, because the new documents will constantly be replacing the old. -

-

-Go ahead and edit the existing XML files to change some of the data, and re-run -the java -jar post.jar command; you'll see your changes reflected -in subsequent searches. -
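The relationship between `numDocs` and `maxDoc` described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `ToyIndex` class and its attributes are hypothetical names invented here, and the bookkeeping is a simplification, not how Solr or Lucene actually store documents.

```python
class ToyIndex:
    """Toy model only; not how Solr or Lucene store documents."""

    def __init__(self):
        self.live = {}           # uniqueKey value -> current document
        self.deleted_slots = 0   # replaced copies not yet removed from the index

    def add(self, doc):
        key = doc["id"]          # "id" is the uniqueKey field in the example schema
        if key in self.live:
            # Same uniqueKey: the old copy is logically deleted, not duplicated.
            self.deleted_slots += 1
        self.live[key] = doc

    @property
    def num_docs(self):
        # Searchable documents only.
        return len(self.live)

    @property
    def max_doc(self):
        # Searchable documents plus logically deleted ones.
        return len(self.live) + self.deleted_slots


index = ToyIndex()
index.add({"id": "SP2514N", "name": "first version"})
index.add({"id": "SP2514N", "name": "edited version"})
print(index.num_docs, index.max_doc)  # prints "1 2"
```

Re-posting the same document leaves `num_docs` unchanged while `max_doc` grows, which mirrors the behaviour the statistics page is said to show.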

- -

Deleting Data

- -

-You can delete data by POSTing a delete command to the update URL and -specifying the value of the document's unique key field, or a query that -matches multiple documents (be careful with that one!). Since these commands -are smaller, we will specify them right on the command line rather than -reference an XML file. -

- -

Execute the following command to delete a specific document

-
java -Ddata=args -Dcommit=false -jar post.jar "<delete><id>SP2514N</id></delete>"
- -

-Because we have specified "commit=false", a search for id:SP2514N will still find the document we have deleted. Since the example configuration uses Solr's "autoCommit" feature, Solr will still automatically persist this change to the index, but it will not affect search results until an "openSearcher" commit is explicitly executed. -

- -

-Using the statistics page -for the updateHandler you can observe this delete -propagate to disk by watching the deletesById -value drop to 0 as the cumulative_deletesById -and autocommit values increase. -

- -

-Here is an example of using delete-by-query to delete anything with -DDR in the name: -

-
java -Dcommit=false -Ddata=args -jar post.jar "<delete><query>name:DDR</query></delete>"
- -

-You can force a new searcher to be opened to reflect these changes by sending an explicit commit command to Solr: -

-
java -jar post.jar -
- -

-Now re-execute the previous search -and verify that no matching documents are found. You can also revisit the -statistics page and observe the changes to both the number of commits in the updateHandler and the numDocs in the searcher. -

- -

-Commits that open a new searcher can be expensive operations so it's best to -make many changes to an index in a batch and then send the -commit command at the end. -There is also an optimize command that does the -same things as commit, but also forces all index -segments to be merged into a single segment -- this can be very resource -intensive, but may be worthwhile for improving search speed if your index -changes very infrequently. -

-

-All of the update commands can be specified using either XML or JSON. -
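The XML delete messages shown above can also be built programmatically rather than typed by hand. The following is a minimal sketch using Python's standard library; the helper names `delete_by_id` and `delete_by_query` are hypothetical, and the code only constructs the message bodies without sending anything to Solr.

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_id: str) -> str:
    """Build a <delete><id>...</id></delete> message for one uniqueKey value."""
    root = ET.Element("delete")
    ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


def delete_by_query(query: str) -> str:
    """Build a <delete><query>...</query></delete> message."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


print(delete_by_id("SP2514N"))      # <delete><id>SP2514N</id></delete>
print(delete_by_query("name:DDR"))  # <delete><query>name:DDR</query></delete>
```

Building the message with an XML library rather than string concatenation means characters such as `<` or `&` in an id or query are escaped automatically.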

- -

To continue with the tutorial, re-add any documents you may have deleted by going to the exampledocs directory and executing

-
java -jar post.jar *.xml
-
- - - -

Querying Data

-
-

- Searches are done via HTTP GET on the select URL with the query string in the q parameter. - You can pass a number of optional request parameters - to the request handler to control what information is returned. For example, you can use the "fl" parameter - to control what stored fields are returned, and if the relevancy score is returned: -

- -

-The query form -provided in the web admin interface allows setting various request parameters -and is useful when testing or debugging queries. -
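As a rough illustration of how the `q` and `fl` parameters combine into a select URL, here is a small sketch. The `select_url` helper is a hypothetical name, and the base URL assumes the `collection1` core on `localhost:8983` used elsewhere in this tutorial.

```python
from urllib.parse import urlencode

# Assumed base URL: the collection1 select handler used in this tutorial.
SELECT_URL = "http://localhost:8983/solr/collection1/select"


def select_url(q, fl=None, **extra):
    """Return a select URL for query string q, optionally limiting returned fields."""
    params = {"q": q}
    if fl:
        # fl takes a comma-separated list of field names; "score" may be included.
        params["fl"] = ",".join(fl)
    params.update(extra)
    return SELECT_URL + "?" + urlencode(params)


print(select_url("video", fl=["name", "id"]))
print(select_url("video", fl=["name", "id", "score"], wt="xml"))
```

`urlencode` percent-encodes the comma in `fl` as `%2C`; the server decodes it back before the request handler sees the value.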

- - -

Sorting

-

- Solr provides a simple method to sort on one or more indexed fields. - Use the "sort" parameter to specify "field direction" pairs, separated by commas if there's more than one sort field: -

- -

- "score" can also be used as a field name when specifying a sort: -

- -

- Complex functions may also be used to sort results: -

- -

- If no sort is specified, the default is score desc to return the matches having the highest relevancy. -
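The "field direction" format of the sort parameter can be sketched as a small helper. The function name `sort_param` is hypothetical, and `inStock` and `price` are fields from the example schema used purely for illustration.

```python
from urllib.parse import urlencode, parse_qs


def sort_param(*pairs):
    """Join (field, direction) pairs into a Solr sort value."""
    clauses = []
    for field, direction in pairs:
        if direction not in ("asc", "desc"):
            raise ValueError(f"direction must be 'asc' or 'desc', got {direction!r}")
        clauses.append(f"{field} {direction}")
    return ",".join(clauses)


sort_value = sort_param(("inStock", "desc"), ("price", "asc"))
print(sort_value)  # inStock desc,price asc

# The value is URL-encoded like any other request parameter.
query_string = urlencode({"q": "video", "sort": sort_value})
print(parse_qs(query_string)["sort"])  # ['inStock desc,price asc']
```

Passing `("score", "desc")` to the same helper produces the default ordering described above.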

-
- - - - -

Highlighting

-
-

- Hit highlighting returns relevant snippets of each returned document, and highlights - terms from the query within those context snippets. -

-

- The following example searches for video card and requests - highlighting on the fields name,features. This causes a - highlighting section to be added to the response with the - words to highlight surrounded with <em> (for emphasis) - tags. -

-

- -...&q=video card&fl=name,id&hl=true&hl.fl=name,features - -

-

- More request parameters related to controlling highlighting may be found - here. -

-
- - - - -

Faceted Search

-
-

- Faceted search takes the documents matched by a query and generates counts for various - properties or categories. Links are usually provided that allow users to "drill down" or - refine their search results based on the returned categories. -

-

- The following example searches for all documents (*:*) and - requests counts by the category field cat. -

-

- -...&q=*:*&facet=true&facet.field=cat - -

-

- Notice that although only the first 10 documents are returned in the results list, - the facet counts generated are for the complete set of documents that match the query. -

-

- We can facet multiple ways at the same time. The following example adds a facet on the - boolean inStock field: -

-

- -...&q=*:*&facet=true&facet.field=cat&facet.field=inStock - -

-

- Solr can also generate counts for arbitrary queries. The following example - queries for ipod and shows prices below and above 100 by using - range queries on the price field. -

-

- -...&q=ipod&facet=true&facet.query=price:[0 TO 100]&facet.query=price:[100 TO *] - -

-

- Solr can even facet by numeric ranges (including dates). This example requests counts for the manufacture date (manufacturedate_dt field) for each year between 2004 and 2010. -

-

- -...&q=*:*&facet=true&facet.range=manufacturedate_dt&facet.range.start=2004-01-01T00:00:00Z&facet.range.end=2010-01-01T00:00:00Z&facet.range.gap=+1YEAR - -

-

- More information on faceted search may be found on the - faceting overview - and - faceting parameters - pages. -
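Facet requests repeat some parameter names (for example `facet.field`) and include values such as `+1YEAR` that need URL encoding. The sketch below assembles the field and range facets from the examples above into one query string; it only builds the string and does not contact a server.

```python
from urllib.parse import urlencode, parse_qs

# A list of pairs (rather than a dict) lets facet.field appear more than once.
params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.field", "cat"),
    ("facet.field", "inStock"),
    ("facet.range", "manufacturedate_dt"),
    ("facet.range.start", "2004-01-01T00:00:00Z"),
    ("facet.range.end", "2010-01-01T00:00:00Z"),
    ("facet.range.gap", "+1YEAR"),
]

query_string = urlencode(params)
decoded = parse_qs(query_string)

print(decoded["facet.field"])      # ['cat', 'inStock']
print(decoded["facet.range.gap"])  # ['+1YEAR']
```

The `+` in `+1YEAR` is encoded as `%2B`. Left unencoded in a query string, a `+` is conventionally decoded as a space, so encoding it keeps the gap value intact.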

-
- - - - -

Search UI

-
-

-Solr includes an example search interface built with velocity templating -that demonstrates many features, including searching, faceting, highlighting, -autocomplete, and geospatial searching. -

-

-Try it out at -http://localhost:8983/solr/collection1/browse - -

-
- - - - - -

Text Analysis

-
-

- Text fields are typically indexed by breaking the text into words and applying various transformations such as - lowercasing, removing plurals, or stemming to increase relevancy. The same text transformations are normally - applied to any queries in order to match what is indexed. -

-

- The schema defines - the fields in the index and what type of analysis is applied to them. The current schema your collection is using - may be viewed directly via the Schema tab in the Admin UI, or explored dynamically using the Schema Browser tab. -

-

-The best analysis components (tokenization and filtering) for your textual -content depend heavily on language. -As you can see in the Schema Browser, -many of the fields in the example schema are using a -fieldType named -text_general, which has defaults appropriate for -most languages. -

- -

- If you know your textual content is English, as is the case for the example - documents in this tutorial, and you'd like to apply English-specific stemming - and stop word removal, as well as split compound words, you can use the - text_en_splitting fieldType instead. - Go ahead and edit the schema.xml in the - solr/example/solr/collection1/conf directory, - to use the text_en_splitting fieldType for - the text and - features fields like so: -

-
-   <field name="features" type="text_en_splitting" indexed="true" stored="true" multiValued="true"/>
-   ...
-   <field name="text" type="text_en_splitting" indexed="true" stored="false" multiValued="true"/>
-
-

- Stop and restart Solr after making these changes and then re-post all of - the example documents using - java -jar post.jar *.xml. - Now queries like the ones listed below will demonstrate English-specific - transformations: -

-
    - -
  • A search for - power-shot - can match PowerShot, and - adata - can match A-DATA by using the - WordDelimiterFilter and LowerCaseFilter. -
  • - - -
  • A search for - features:recharging - can match Rechargeable using the stemming - features of PorterStemFilter. -
  • - - -
  • A search for - "1 gigabyte" - can match 1GB, and the commonly misspelled - pixima can match Pixma using the - SynonymFilter. -
  • - - -
-

A full description of the analysis components, Analyzers, Tokenizers, and TokenFilters - available for use is here. -
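To give a feel for why `power-shot` can match `PowerShot`, here is a toy analysis chain. It is only an illustration under loud assumptions: `toy_tokens` and `toy_match` are hypothetical helpers that split on case changes and punctuation, lowercase the parts, and add the joined form. They imitate the general idea of word-delimiter splitting and lowercasing, not the actual behaviour or options of Solr's filters.

```python
import re

# Split a word into runs: Capitalized/lowercase words, ALL-CAPS runs, digit runs.
_PARTS = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+")


def toy_tokens(text):
    """Return the set of lowercased sub-word tokens plus their joined form."""
    tokens = set()
    for word in text.split():
        parts = [p.lower() for p in _PARTS.findall(word)]
        tokens.update(parts)
        if len(parts) > 1:
            tokens.add("".join(parts))  # e.g. "power" + "shot" -> "powershot"
    return tokens


def toy_match(query, indexed):
    """Crude match: true if the query and indexed text share any token."""
    return bool(toy_tokens(query) & toy_tokens(indexed))


print(sorted(toy_tokens("PowerShot")))          # ['power', 'powershot', 'shot']
print(toy_match("power-shot", "PowerShot"))     # True
print(toy_match("adata", "A-DATA"))             # True
print(toy_match("recharging", "Rechargeable"))  # False
```

The last call returns `False` because this toy chain has no stemming step, which is the part the stemming filter contributes in the real `text_en_splitting` type.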

- - - -

Analysis Debugging

-

-There is a handy Analysis tab -where you can see how a text value is broken down into words by both Index time and Query time analysis chains for a field or field type. This page shows the resulting tokens after they pass through each filter in the chains. -

-

- This url - shows the tokens created from - "Canon Power-Shot SD500" - using the - text_en_splitting type. Each section of - the table shows the resulting tokens after having passed through the next - TokenFilter in the (Index) analyzer. - Notice how both powershot and - power, shot - are indexed, using tokens that have the same "position". - (Compare the previous output with - the tokens produced using the text_general field type.) -

- -

-Mousing over the section label to the left of the section will display the full name of the analyzer component at that stage of the chain. Toggling the "Verbose Output" checkbox will show/hide the detailed token attributes. -

-

-When both Index and Query -values are provided, two tables will be displayed side by side showing the -results of each chain. Terms in the Index chain results that are equivalent -to the final terms produced by the Query chain will be highlighted. -

-

- Other interesting examples: -

- - -
- - - -

Conclusion

-
-

- Congratulations! You successfully ran a small Solr instance, added some - documents, and made changes to the index and schema. You learned about queries, text - analysis, and the Solr admin interface. You're ready to start using Solr on - your own project! Continue on with the following steps: -

-
    - -
  • Subscribe to the Solr mailing lists!
  • - -
  • Make a copy of the Solr example directory as a template for your project.
  • - -
  • Customize the schema and other config in solr/collection1/conf/ to meet your needs.
  • - -
-

- Solr has a ton of other features that we haven't touched on here, including - distributed search - to handle huge document collections, - function queries, - numeric field statistics, - and - search results clustering. - Explore the Solr Wiki to find - more details about Solr's many features. -

-

- Have Fun, and we'll see you on the Solr mailing lists! -

-
- -
- -
 
- - - - diff --git a/solr/site/xsl/index.xsl b/solr/site/index.xsl similarity index 97% rename from solr/site/xsl/index.xsl rename to solr/site/index.xsl index c59825d040a..c273428c8d4 100644 --- a/solr/site/xsl/index.xsl +++ b/solr/site/index.xsl @@ -74,7 +74,7 @@
  • Wiki: Additional documentation, especially focused on using Solr.
  • Changes: List of changes in this release.
  • System Requirements: Minimum and supported Java versions.
  • -
  • Solr Tutorial: This document covers the basics of running Solr using an example schema, and some sample data.
  • +
  • Solr Quick Start: This document covers the basics of running Solr using an example schema, and some sample data.
  • Lucene Documentation
  • API Javadocs

    diff --git a/solr/site/quickstart.mdtext b/solr/site/quickstart.mdtext new file mode 100644 index 00000000000..753e0f0feec --- /dev/null +++ b/solr/site/quickstart.mdtext @@ -0,0 +1,596 @@ +# Solr Quick Start + + + + +## Overview + +This document covers getting Solr up and running, ingesting a variety of data sources into multiple collections, +and getting a feel for the Solr administrative and search interfaces. + +## Requirements + +To follow along with this tutorial, you will need... + +1. To meet the [system requirements](SYSTEM_REQUIREMENTS.html) +2. An Apache Solr release. This tutorial was written using Apache Solr 5.0.0. + +## Getting Started + +Please run the browser showing this tutorial and the Solr server on the same machine so tutorial links will correctly +point to your Solr server. + +Begin by unzipping the Solr release and changing your working directory to the subdirectory where Solr was installed. +Note that the base directory name may vary with the version of Solr downloaded. For example, with a shell in UNIX, +Cygwin, or MacOS: + + /:$ ls solr* + solr-5.0.0.zip + /:$ unzip -q solr-5.0.0.zip -d solr5 + /:$ cd solr5/ + +To launch Solr, run: `bin/solr start -e cloud -noprompt` + + /solr5:$ bin/solr start -e cloud -noprompt + Welcome to the SolrCloud example! + + + Starting up 2 Solr nodes for your example SolrCloud cluster. + ... + + Started Solr server on port 8983 (pid=8404). Happy searching! + ... + + Started Solr server on port 7574 (pid=8549). Happy searching! + ... + + SolrCloud example running, please visit http://localhost:8983/solr + + /solr5:$ + +You can see that the Solr is running by loading the Solr Admin UI in your web browser: . +This is the main starting point for administering Solr. + +Solr will now be running two "nodes", one on port 7574 and one on port 8983. There is one collection created +automatically, `gettingstarted`, a two shard collection, each with two replicas. 
+The [Cloud tab](http://localhost:8983/solr/#/~cloud) in the Admin UI diagrams the collection nicely: + +Solr Quick Start: SolrCloud diagram + +## Indexing Data + +Your Solr server is up and running, but it doesn't contain any data. The Solr install includes the `bin/post` tool in +order to facilitate getting various types of documents easily into Solr from the start. We'll be +using this tool for the indexing examples below. + +You'll need a command shell to run these examples, rooted in the Solr install directory; the shell from where you +launched Solr works just fine. + +### Indexing a directory of "rich" files + +Let's first index local "rich" files including HTML, PDF, Microsoft Office formats (such as MS Word), plain text and +many other formats. `SimplePostTool` features the ability to crawl a directory of files, optionally recursively even, +sending the raw content of each file into Solr for extraction and indexing. A Solr install includes a `docs/` +subdirectory, so that makes a convenient set of (mostly) HTML files built-in to start with. + + bin/post gettingstarted docs/ + +Here's what it'll look like: + + /solr5:$ bin/post gettingstarted docs/ + SimplePostTool version 1.5 + Posting files to base url http://localhost:8983/solr/update.. + Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log + Entering recursive mode, max depth=999, delay=0s + Indexing directory docs (3 files, depth=0) + POSTing file index.html (text/html) + POSTing file SYSTEM_REQUIREMENTS.html (text/html) + POSTing file tutorial.html (text/html) + Indexing directory docs/changes (1 files, depth=1) + POSTing file Changes.html (text/html) + Indexing directory docs/solr-analysis-extras (8 files, depth=1) + ... + 2945 files indexed. + COMMITting Solr index changes to http://localhost:8983/solr/update.. 
+ Time spent: 0:00:37.537 + +The command-line breaks down as follows: + + * `gettingstarted`: name of the collection to index into + * `docs/`: a relative path of the Solr install `docs/` directory + +You have now indexed thousands of documents into the `gettingstarted` collection in Solr and committed these changes. +You can search for "solr" by loading the Admin UI [Query tab](http://localhost:8983/solr/#/gettingstarted_shard1_replica1/query), +and enter "solr" in the `q` param (replacing `*:*`, which matches all documents). See the [Searching](#searching) +section below for more information. + +To index your own data, re-run the directory indexing command pointed to your own directory of documents. For example, +on a Mac instead of `docs/` try `~/Documents/` or `~/Desktop/` ! You may want to start from a clean, empty system +again, rather than have your content in addition to the Solr `docs/` directory; see the Cleanup section [below](#cleanup) +for how to get back to a clean starting point. + +### Indexing Solr XML + +Solr supports indexing structured content in a variety of incoming formats. The historically predominant format for +getting structured content into Solr has been [Solr XML](https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-XMLFormattedIndexUpdates). +Many Solr indexers have been coded to process domain content into Solr XML output, generally HTTP POSTed directly to +Solr's `/update` endpoint. + +Solr's install includes a handful of Solr XML formatted files with example data (mostly mocked tech product data). 
+ +Using `bin/post`, index the example Solr XML files in `example/exampledocs/`: + + bin/post -c gettingstarted example/exampledocs/*.xml (TODO: depends on SOLR-6900) + +Here's what you'll see: + + /solr5:$ bin/post -c gettingstarted example/exampledocs/*.xml + SimplePostTool version 1.5 + Posting files to base url http://localhost:8983/solr/update using content-type application/xml.. + POSTing file gb18030-example.xml + POSTing file hd.xml + POSTing file ipod_other.xml + POSTing file ipod_video.xml + POSTing file manufacturers.xml + POSTing file mem.xml + POSTing file money.xml + POSTing file monitor.xml + POSTing file monitor2.xml + POSTing file mp500.xml + POSTing file sd500.xml + POSTing file solr.xml + POSTing file utf8-example.xml + POSTing file vidcard.xml + 14 files indexed. + COMMITting Solr index changes to http://localhost:8983/solr/update.. + Time spent: 0:00:00.453 + +...and now you can search for all sorts of things using the default [Solr Query Syntax](https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser#TheStandardQueryParser-SpecifyingTermsfortheStandardQueryParser) +(a superset of the Lucene query syntax)... + +NOTE: +You can browse the documents indexed at . The `/browse` UI allows getting +a feel for how Solr's technical capabilities can be worked with in a familiar, though a bit rough and prototypical, +interactive HTML view. (The `/browse` view defaults to assuming the `gettingstarted` schema and data are a catch-all mix +of structured XML, JSON, CSV example data, and unstructured rich documents. Your own data may not look ideal at first, +though the `/browse` templates are customizable.) + +### Indexing JSON + +Solr supports indexing JSON, either arbitrary structured JSON or "Solr JSON" (which is similiar to Solr XML). + +Solr includes a small sample Solr JSON file to illustrate this capability. 
Again using `bin/post`, index the +sample JSON file: + + bin/post -c gettingstarted example/exampledocs/books.json (TODO: depends on SOLR-6900) + +You'll see: + + /solr5:$ bin/post -c gettingstarted example/exampledocs/books.json (TODO: depends on SOLR-6900) + SimplePostTool version 1.5 + Posting files to base url http://localhost:8983/solr/update.. + Entering auto mode. File endings considered are xml,json,csv,... + POSTing file books.json (application/json) + 1 files indexed. + COMMITting Solr index changes to http://localhost:8983/solr/update.. + Time spent: 0:00:00.084 + +To flatten (and/or split) and index arbitrary structured JSON, a topic beyond this quick start guide, check out +[Transforming and Indexing Custom JSON data](https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-TransformingandIndexingcustomJSONdata). + +### Indexing CSV (Comma/Column Separated Values) + +A great conduit of data into Solr is via CSV, especially when the documents are homogeneous by all having the +same set of fields. CSV can be conveniently exported from a spreadsheet such as Excel, or exported from databases such +as MySQL. When getting started with Solr, it can often be easiest to get your structured data into CSV format and then +index that into Solr rather than a more sophisticated single step operation. + +Using SimplePostTool and the included example CSV data file, index it: + + bin/post -c gettingstarted example/exampledocs/books.csv (TODO: depends on SOLR-6900) + +In your terminal you'll see: + + /solr5:$ bin/post -c gettingstarted example/exampledocs/books.csv (TODO: depends on SOLR-6900) + SimplePostTool version 1.5 + Posting files to base url http://localhost:8983/solr/update.. + Entering auto mode. File endings considered are xml,json,csv,... + POSTing file books.csv (text/csv) + 1 files indexed. + COMMITting Solr index changes to http://localhost:8983/solr/update.. 
+ Time spent: 0:00:00.084 + +### Other indexing techniques + +* Import records from a database using the [Data Import Handler (DIH)](https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler). + +* Use [SolrJ](https://cwiki.apache.org/confluence/display/solr/Using+SolrJ) for Java or other Solr clients to +programatically create documents to send to Solr. + +* Use the Admin UI [Documents tab](http://localhost:8983/solr/#/gettingstarted_shard1_replica1/documents) to paste in a document to be +indexed, or select `Document Builder` from the `Document Type` dropdown to build a document one field at a time. +Click on the `Submit Document` button below the form to index your document. + +*** + +## Updating Data + +You may notice that even if you index content in this guide more than once, it does not duplicate the results found. +This is because the example `schema.xml` specifies a "`uniqueKey`" field called "`id`". Whenever you POST commands to +Solr to add a document with the same value for the `uniqueKey` as an existing document, it automatically replaces it +for you. You can see that that has happened by looking at the values for `numDocs` and `maxDoc` in the "CORE"/searcher +section of the statistics page... + + + +`numDocs` represents the number of searchable documents in the index (and will be larger than the number of XML, JSON, +or CSV files since some files contained more than one document). The maxDoc value may be larger as the maxDoc count +includes logically deleted documents that have not yet been removed from the index. You can re-post the sample files +over and over again as much as you want and `numDocs` will never increase, because the new documents will constantly be +replacing the old. + +Go ahead and edit any of the existing example data files, change some of the data, and re-run the SimplePostTool command. +You'll see your changes reflected in subsequent searches. 
+ +## Deleting Data + +You can delete data by POSTing a delete command to the update URL and specifying the value of the document's unique key +field, or a query that matches multiple documents (be careful with that one!). Since these commands are smaller, we +specify them right on the command line rather than reference a JSON or XML file. + +Execute the following command to delete a specific document: + +TODO: depends on SOLR-6900 to implement within bin/post: + java -Ddata=args org.apache.solr.util.SimplePostTool "<delete><id>SP2514N</id></delete>" + + +## Searching + +Solr can be queried via REST clients such as cURL, wget, and Chrome POSTMAN, as well as via the native clients available for +many programming languages. + +The Solr Admin UI includes a query builder interface - see the `gettingstarted` query tab at . +If you click the `Execute Query` button without changing anything in the form, you'll get 10 documents in JSON +format (`*:*` in the `q` param matches all documents): + +Solr Quick Start: gettingstarted Query tab + +The URL sent by the Admin UI to Solr is shown in light grey near the top right of the above screenshot - if you click on +it, your browser will show you the raw response. To use cURL, just give the same URL in quotes on the `curl` command line: + + curl "http://localhost:8983/solr/gettingstarted/select?q=*%3A*&wt=json&indent=true" + +In the above URL, the "`:`" in "`q=*:*`" has been URL-encoded as "`%3A`", but since "`:`" has no reserved purpose in the +query component of the URL (after the "`?`"), you don't need to URL encode it. So the following also works: + + curl "http://localhost:8983/solr/gettingstarted/select?q=*:*&wt=json&indent=true" + +### Basics + +#### Search for a single term + +To search for a term, give it as the `q` param value - in the Admin UI [Query tab](http://localhost:8983/solr/#/gettingstarted_shard1_replica1/query), +replace `*:*` with the term you want to find.
To search for "foundation": + + curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation" + +You'll see: + + /solr5$ curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation" + { + "responseHeader":{ + "status":0, + "QTime":0, + "params":{ + "indent":"true", + "q":"foundation", + "wt":"json"}}, + "response":{"numFound":2812,"start":0,"docs":[ + { + "id":"0553293354", + "cat":["book"], + "name":"Foundation", + ... + +The response indicates that there are 2,812 hits (`"numFound":2812`), of which the first 10 were returned, since by +default `start`=`0` and `rows`=`10`. You can specify these params to page through results, where `start` is the position +of the first result to return, and `rows` is the page size. + +To restrict fields returned in the response, use the `fl` param, which takes a comma-separated list of field names. +E.g. to only return the `id` field: + + curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation&fl=id" + +`q=foundation` matches nearly all of the docs we've indexed, since most of the files under `docs/` contain +"The Apache Software Foundation". To restrict search to a particular field, use the syntax "`q=field:value`", +e.g. to search for `foundation` only in the `name` field: + + curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=name:foundation" + +The above request returns only one document (`"numFound":1`) - from the response: + + ... + "response":{"numFound":1,"start":0,"docs":[ + { + "id":"0553293354", + "cat":["book"], + "name":"Foundation", + ... + +#### Phrase search + +To search for a multi-term phrase, enclose it in double quotes: `q="multiple terms here"`. E.g. 
to search for +"CAS latency" - note that the space between terms must be converted to "`+`" in a URL (the Admin UI will handle URL +encoding for you automatically): + + curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=\"CAS+latency\"" + +You'll get back: + + { + "responseHeader":{ + "status":0, + "QTime":0, + "params":{ + "indent":"true", + "q":"\"CAS latency\"", + "wt":"json"}}, + "response":{"numFound":2,"start":0,"docs":[ + { + "id":"VDBDB1A16", + "name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM", + "manu":"A-DATA Technology Inc.", + "manu_id_s":"corsair", + "cat":["electronics", "memory"], + "features":["CAS latency 3,\t 2.7v"], + ... + +#### Combining searches + +By default, when you search for multiple terms and/or phrases in a single query, Solr will only require that one of them +is present in order for a document to match. Documents containing more terms will be sorted higher in the results list. + +You can require that a term or phrase is present by prefixing it with a "`+`"; conversely, to disallow the presence of a +term or phrase, prefix it with a "`-`". + +To find documents that contain both terms "`one`" and "`three`", enter `+one +three` in the `q` param in the Admin UI +[Query tab](http://localhost:8983/solr/#/gettingstarted_shard1_replica1/query). Because the "`+`" character has a reserved purpose in URLs +(encoding the space character), you must URL encode it for `curl` as "`%2B`": + + curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=%2Bone+%2Bthree" + +To search for documents that contain the term "`two`" but **don't** contain the term "`one`", enter `+two -one` in the +`q` param in the Admin UI. 
Again, URL encode "`+`" as "`%2B`": + + curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=%2Btwo+-one" + +#### In depth + +For more Solr search options, see the Solr Reference Guide's [Searching](https://cwiki.apache.org/confluence/display/solr/Searching) +section. + + +### Faceting + +One of Solr's most popular features is faceting. Faceting allows the search results to be arranged into subsets (or +buckets or categories), providing a count for each subset. There are several types of faceting: field values, numeric +and date ranges, pivots (decision tree), and arbitrary query faceting. + +#### Field facets + +In addition to providing search results, a Solr query can return the number of documents that contain each unique value +in the whole result set. + +From the Admin UI [Query tab](http://localhost:8983/solr/#/gettingstarted_shard1_replica1/query), if you check the "`facet`" +checkbox, you'll see a few facet-related options appear: + +Solr Quick Start: Query tab facet options + +To see facet counts from all documents (`q=*:*`): turn on faceting (`facet=true`), and specify the field to facet on via +the `facet.field` param. If you only want facets, and no document contents, specify `rows=0`. 
The `curl` command below +will return facet counts for the `manu_id_s` field (the URL is quoted so that the shell does not interpret the "`&`" characters): + + curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=*:*&rows=0&facet=true&facet.field=manu_id_s" + +In your terminal, you'll see: + + { + "responseHeader":{ + "status":0, + "QTime":3, + "params":{ + "facet":"true", + "indent":"true", + "q":"*:*", + "facet.field":"manu_id_s", + "wt":"json", + "rows":"0"}}, + "response":{"numFound":2990,"start":0,"docs":[] + }, + "facet_counts":{ + "facet_queries":{}, + "facet_fields":{ + "manu_id_s":[ + "corsair",3, + "belkin",2, + "canon",2, + "apple",1, + "asus",1, + "ati",1, + "boa",1, + "dell",1, + "eu",1, + "maxtor",1, + "nor",1, + "uk",1, + "viewsonic",1, + "samsung",0]}, + "facet_dates":{}, + "facet_ranges":{}, + "facet_intervals":{}}} + +#### Range facets + +For numerics or dates, it's often desirable to partition the facet counts into ranges rather than discrete values. +A prime example of numeric range faceting, using the example product data, is `price`.
In the `/browse` UI, it looks +like this: + +Solr Quick Start: Range facets + +The data for these price range facets can be seen in JSON format with this command: + + curl "http://localhost:8983/solr/gettingstarted/select?q=*:*&wt=json&indent=on&rows=0&facet=true&facet.range=price&f.price.facet.range.start=0&f.price.facet.range.end=600&f.price.facet.range.gap=50&facet.range.other=after" + +In your terminal you will see: + + { + "responseHeader":{ + "status":0, + "QTime":1, + "params":{ + "facet.range.other":"after", + "facet":"true", + "indent":"on", + "q":"*:*", + "f.price.facet.range.gap":"50", + "facet.range":"price", + "f.price.facet.range.end":"600", + "wt":"json", + "f.price.facet.range.start":"0", + "rows":"0"}}, + "response":{"numFound":2990,"start":0,"docs":[] + }, + "facet_counts":{ + "facet_queries":{}, + "facet_fields":{}, + "facet_dates":{}, + "facet_ranges":{ + "price":{ + "counts":[ + "0.0",19, + "50.0",1, + "100.0",0, + "150.0",2, + "200.0",0, + "250.0",1, + "300.0",1, + "350.0",2, + "400.0",0, + "450.0",1, + "500.0",0, + "550.0",0], + "gap":50.0, + "start":0.0, + "end":600.0, + "after":2}}, + "facet_intervals":{}}} + +#### Pivot facets + +Another faceting type is pivot facets, also known as "decision trees", allowing two or more fields to be nested for all +the various possible combinations. Using the example technical product data, pivot facets can be used to see how many +of the products in the "book" category (the `cat` field) are in stock or not in stock. Here's how to get at the raw +data for this scenario: + + curl "http://localhost:8983/solr/gettingstarted/select?q=*:*&rows=0&wt=json&indent=on&facet=on&facet.pivot=cat,inStock" + +This results in the following response (trimmed to just the book category output), which shows that of the 14 items in the +"book" category, 12 are in stock and 2 are not in stock: + + ...
+ "facet_pivot":{ + "cat,inStock":[{ + "field":"cat", + "value":"book", + "count":14, + "pivot":[{ + "field":"inStock", + "value":true, + "count":12}, + { + "field":"inStock", + "value":false, + "count":2}]}, + ... + +#### More faceting options + +For the full scoop on Solr faceting, visit the Solr Reference Guide's [Faceting](https://cwiki.apache.org/confluence/display/solr/Faceting) +section. + + +### Spatial + +Solr has sophisticated geospatial support, including searching within a specified distance range of a given location +(or within a bounding box), sorting by distance, or even boosting results by the distance. Some of the example tech products +documents in `example/exampledocs/*.xml` have locations associated with them to illustrate the spatial capabilities. +Spatial queries can be combined with any other types of queries, such as in this example of querying for "ipod" within +10 kilometers from San Francisco: + +Solr Quick Start: spatial search + +The URL to this example is , +leveraging the `/browse` UI to show a map for each item and allow easy selection of the location to search near. + +To learn more about Solr's spatial capabilities, see the Solr Reference Guide's [Spatial Search](https://cwiki.apache.org/confluence/display/solr/Spatial+Search) +section. + +## Wrapping up + +If you've run the full set of commands in this quick start guide you have done the following: + +* Launched Solr into SolrCloud mode, two nodes, two collections including shards and replicas +* Indexed a directory of rich text files +* Indexed Solr XML files +* Indexed Solr JSON files +* Indexed CSV content +* Opened the admin console, used its query interface to get JSON formatted results +* Opened the /browse interface to explore Solr's features in a more friendly and familiar interface + +Nice work! The script (see below) to run all of these items took under two minutes! (Your run time may vary, depending +on your computer's power and resources available.) 
+ +Here's a Unix script that you can copy and paste to run the key commands for this quick start guide: + + # TODO: depends on SOLR-6900 + date ; + bin/solr start -e cloud -noprompt ; + open http://localhost:8983/solr ; + bin/post -c gettingstarted docs/ ; + open http://localhost:8983/solr/gettingstarted/browse ; + bin/post -c gettingstarted example/exampledocs/*.xml ; + bin/post -c gettingstarted example/exampledocs/books.json ; + bin/post -c gettingstarted example/exampledocs/books.csv ; + open "http://localhost:8983/solr/#/gettingstarted_shard1_replica1/plugins/core?entry=searcher" ; + java -Ddata=args org.apache.solr.util.SimplePostTool "<delete><id>SP2514N</id></delete>" ; # TODO: adjust this as SOLR-6900 implements + bin/solr healthcheck -c gettingstarted ; + date ; + +## Cleanup + +As you work through this guide, you may want to stop Solr and reset the environment back to the starting point. +The following command line will stop Solr and remove the directories for each of the two nodes that the start script +created: + + bin/solr stop -all ; rm -Rf example/cloud/node1/ example/cloud/node2/ + +## Where to next? + +For more information on Solr, check out the following resources: + + * [Solr Reference Guide](https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide) (ensure you + match the version of the reference guide with your version of Solr) + * See also additional [Resources](http://lucene.apache.org/solr/resources.html) + + +