SOLR-10842: Move Tutorial to Ref Guide

Squashed commit of the following:

commit 1cc4de5c4506c757746eac1809f9a7f3d3a55d00
Author: Cassandra Targett <ctargett@apache.org>
Date:   Tue Aug 15 13:23:40 2017 -0500

    SOLR-10842: add Field UI images; rename image paths; minor cleanups

commit 39c7c2f87c54eaaa3341dd119ecd4f0675244a38
Author: Cassandra Targett <ctargett@apache.org>
Date:   Thu Aug 3 15:49:21 2017 -0500

    SOLR-10842: remove running-solr.adoc; move starting solr content to installing-solr; move pages to improve flow; final readthrough

commit 70bea0e73159557f991572ad680a251a0791faec
Author: Cassandra Targett <ctargett@apache.org>
Date:   Wed Aug 2 19:31:57 2017 -0500

    SOLR-10842: rename upgrading-solr.adoc; fix links

commit 0d0cbe7980bf0868ea5d36093aad6101201de82b
Author: Cassandra Targett <ctargett@apache.org>
Date:   Tue Aug 1 09:46:42 2017 -0500

    SOLR-10842: page re-org cleanup; copy edits on tutorial; rename "quick-start" to "tutorial"

commit 4a2635638b214b1480d3b7055afa219ae7bb6a36
Author: Cassandra Targett <ctargett@apache.org>
Date:   Fri Jul 28 13:45:39 2017 -0500

    SOLR-10842: Overhaul of tutorial; update query image

commit 1e1646223b29a0788597d3695b5d0e7ebdd28187
Author: Cassandra Targett <ctargett@apache.org>
Date:   Thu Jul 20 14:13:43 2017 -0500

    little typos

commit e2cb85649dabfd7fd7df6f3d3cce2ca58a4c76a9
Author: Cassandra Targett <ctargett@apache.org>
Date:   Wed Jul 19 09:32:25 2017 -0500

    Change example to use Films

commit 49ad12ca58d5b3bbe60f3cb8f61469bfe321fcc3
Author: Cassandra Targett <ctargett@apache.org>
Date:   Tue Jul 18 09:31:03 2017 -0500

    Further experiments with tabbed layout

commit 21e4dcb38f802f9d2aed795d5c6ba3701b0178ae
Author: Cassandra Targett <ctargett@apache.org>
Date:   Mon Jul 17 16:58:33 2017 -0500

    Fix page links; add experiment with tabs for different data formats

commit c24a9385361d22d7cb51051152b9e1f834c25d45
Author: Cassandra Targett <ctargett@apache.org>
Date:   Thu Jun 29 14:36:01 2017 -0500

    SOLR-10842: minor changes to a few files

commit 819f160423d17dbb647935c5bbfd8a16d4b7b57c
Author: Cassandra Targett <ctargett@apache.org>
Date:   Fri Jun 23 13:49:32 2017 -0500

    SOLR-10842: major page reorg; new content for install and config files

commit 4be7b61ba46f440accdf96757566c3d854e09328
Author: Cassandra Targett <ctargett@apache.org>
Date:   Tue Jun 13 15:34:49 2017 -0500

    SOLR-10842: installation docs

commit c83a9ba91d96d5b75df2191404a3482ca81f8505
Author: Cassandra Targett <ctargett@apache.org>
Date:   Tue Jun 13 13:56:47 2017 -0500

    SOLR-10842: little fixes for quick start

commit 10c1a462338aa16c0435c01eba0506fe09277174
Author: Cassandra Targett <ctargett@apache.org>
Date:   Fri Jun 9 13:47:04 2017 -0500

    SOLR-10842: add quickstart.html from CMS; convert to asciidoc style
Cassandra Targett 2017-08-15 14:03:37 -05:00
parent 9ebdd846fd
commit 0b353b6741
20 changed files with 1253 additions and 365 deletions

@ -18,7 +18,7 @@
// specific language governing permissions and limitations // specific language governing permissions and limitations
// under the License. // under the License.
Having had some fun with Solr, you will now learn about all the cool things it can do. Solr is a search server built on top of Apache Lucene, an open source, Java-based, information retrieval library. It is designed to drive powerful document retrieval applications - wherever you need to serve data to users based on their queries, Solr can work for you.
Here is an example of how Solr might be integrated into an application: Here is an example of how Solr might be integrated into an application:
@ -30,13 +30,12 @@ In the scenario above, Solr runs along side other server applications. For examp
Solr makes it easy to add the capability to search through the online store through the following steps: Solr makes it easy to add the capability to search through the online store through the following steps:
. Define a _schema_. The schema tells Solr about the contents of documents it will be indexing. In the online store example, the schema would define fields for the product name, description, price, manufacturer, and so on. Solr's schema is powerful and flexible and allows you to tailor Solr's behavior to your application. See <<documents-fields-and-schema-design.adoc#documents-fields-and-schema-design,Documents, Fields, and Schema Design>> for all the details. . Define a _schema_. The schema tells Solr about the contents of documents it will be indexing. In the online store example, the schema would define fields for the product name, description, price, manufacturer, and so on. Solr's schema is powerful and flexible and allows you to tailor Solr's behavior to your application. See <<documents-fields-and-schema-design.adoc#documents-fields-and-schema-design,Documents, Fields, and Schema Design>> for all the details.
. Deploy Solr.
. Feed Solr documents for which your users will search. . Feed Solr documents for which your users will search.
. Expose search functionality in your application. . Expose search functionality in your application.
Because Solr is based on open standards, it is highly extensible. Solr queries are RESTful, which means, in essence, that a query is a simple HTTP request URL and the response is a structured document: mainly XML, but it could also be JSON, CSV, or some other format. This means that a wide variety of clients will be able to use Solr, from other web applications to browser clients, rich client applications, and mobile devices. Any platform capable of HTTP can talk to Solr. See <<client-apis.adoc#client-apis,Client APIs>> for details on client APIs. Because Solr is based on open standards, it is highly extensible. Solr queries are simple HTTP request URLs and the response is a structured document: mainly JSON, but it could also be XML, CSV, or other formats. This means that a wide variety of clients will be able to use Solr, from other web applications to browser clients, rich client applications, and mobile devices. Any platform capable of HTTP can talk to Solr. See <<client-apis.adoc#client-apis,Client APIs>> for details on client APIs.
Solr is based on the Apache Lucene project, a high-performance, full-featured search engine. Solr offers support for the simplest keyword searching through to complex queries on multiple fields and faceted search results. <<searching.adoc#searching,Searching>> has more information about searching and queries. Solr offers support for the simplest keyword searching through to complex queries on multiple fields and faceted search results. <<searching.adoc#searching,Searching>> has more information about searching and queries.
If Solr's capabilities are not impressive enough, its ability to handle very high-volume applications should do the trick. If Solr's capabilities are not impressive enough, its ability to handle very high-volume applications should do the trick.
@ -44,6 +43,4 @@ A relatively common scenario is that you have so much data, or so many queries,
For example: "Sharding" is a scaling technique in which a collection is split into multiple logical pieces called "shards" in order to scale up the number of documents in a collection beyond what could physically fit on a single server. Incoming queries are distributed to every shard in the collection, which respond with merged results. Another technique available is to increase the "Replication Factor" of your collection, which allows you to add servers with additional copies of your collection to handle higher concurrent query load by spreading the requests around to multiple machines. Sharding and Replication are not mutually exclusive, and together make Solr an extremely powerful and scalable platform. For example: "Sharding" is a scaling technique in which a collection is split into multiple logical pieces called "shards" in order to scale up the number of documents in a collection beyond what could physically fit on a single server. Incoming queries are distributed to every shard in the collection, which respond with merged results. Another technique available is to increase the "Replication Factor" of your collection, which allows you to add servers with additional copies of your collection to handle higher concurrent query load by spreading the requests around to multiple machines. Sharding and Replication are not mutually exclusive, and together make Solr an extremely powerful and scalable platform.
Best of all, this talk about high-volume applications is not just hypothetical: some of the famous Internet sites that use Solr today are Macy's, EBay, and Zappo's. Best of all, this talk about high-volume applications is not just hypothetical: some of the famous Internet sites that use Solr today are Macy's, EBay, and Zappo's. For more examples, take a look at https://wiki.apache.org/solr/PublicServers.
For more information, take a look at https://wiki.apache.org/solr/PublicServers.
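
The description above of Solr queries as plain HTTP request URLs can be sketched as follows. This is a minimal illustration and not part of the commit: the helper name `solr_select_url`, the core name `films`, the field name `name`, and the host and port `localhost:8983` are assumptions made for the example. The sketch only builds the URL against the `/select` handler with the `q` and `wt` parameters; it does not contact a server.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr): build a query URL for one core.
# Assumes Solr listens on localhost:8983 and that the named core exists.
solr_select_url() {
  core="$1"            # core or collection name, e.g. "films" (assumed)
  query="$2"           # query string, expected to be URL-safe already
  format="${3:-json}"  # response format, e.g. json, xml, or csv
  printf 'http://localhost:8983/solr/%s/select?q=%s&wt=%s\n' \
    "$core" "$query" "$format"
}

# Print the URL for a query on the assumed "name" field.
solr_select_url films 'name:batman'
```

The printed URL can be handed to any HTTP client, for example `curl "$(solr_select_url films 'name:batman')"`, which is the point of the passage: any platform capable of HTTP can issue the request.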

@ -1,7 +1,7 @@
= Getting Started = Getting Started
:page-shortname: getting-started :page-shortname: getting-started
:page-permalink: getting-started.html :page-permalink: getting-started.html
:page-children: installing-solr, running-solr, a-quick-overview, a-step-closer, solr-control-script-reference :page-children: a-quick-overview, solr-system-requirements, installing-solr, solr-configuration-files, solr-upgrade-notes, taking-solr-to-production, upgrading-a-solr-cluster
// Licensed to the Apache Software Foundation (ASF) under one // Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file // or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information // distributed with this work for additional information
@ -19,25 +19,21 @@
// specific language governing permissions and limitations // specific language governing permissions and limitations
// under the License. // under the License.
Solr makes it easy for programmers to develop sophisticated, high-performance search applications with advanced features such as faceting (arranging search results in columns with numerical counts of key terms). [.lead]
Solr makes it easy for programmers to develop sophisticated, high-performance search applications with advanced features.
Solr builds on another open source search technology: Lucene, a Java library that provides indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities. Both Solr and Lucene are managed by the Apache Software Foundation (http://www.apache.org/[www.apache.org)]. This section introduces you to the basic Solr architecture and features to help you get up and running quickly. It covers the following topics:
The Lucene search library currently ranks among the top 15 open source projects and is one of the top 5 Apache projects, with installations at over 4,000 companies. Lucene/Solr downloads have grown nearly ten times over the past three years, with a current run-rate of over 6,000 downloads a day. The Solr search server, which provides application builders a ready-to-use search platform on top of the Lucene search library, is the fastest growing Lucene sub-project. Apache Lucene/Solr offers an attractive alternative to the proprietary licensed search and discovery software vendors.
This section helps you get Solr up and running quickly, and introduces you to the basic Solr architecture and features. It covers the following topics:
<<installing-solr.adoc#installing-solr,Installing Solr>>: A walkthrough of the Solr installation process.
<<running-solr.adoc#running-solr,Running Solr>>: An introduction to running Solr. Includes information on starting up the servers, adding documents, and running queries.
<<a-quick-overview.adoc#a-quick-overview,A Quick Overview>>: A high-level overview of how Solr works. <<a-quick-overview.adoc#a-quick-overview,A Quick Overview>>: A high-level overview of how Solr works.
<<a-step-closer.adoc#a-step-closer,A Step Closer>>: An introduction to Solr's home directory and configuration options. <<installing-solr.adoc#installing-solr,Installing Solr>>: A walkthrough of the Solr installation process.
<<solr-control-script-reference.adoc#solr-control-script-reference,Solr Control Script Reference>>: a complete reference of all of the commands and options available with the bin/solr script. <<solr-configuration-files.adoc#solr-configuration-files,Solr Configuration Files>>: Overview of the installation layout and major configuration files.
[TIP] <<solr-upgrade-notes.adoc#solr-upgrade-notes,Solr Upgrade Notes>>: Information about changes made in Solr releases.
====
Solr includes a Quick Start tutorial which will be helpful if you are just starting out with Solr. You can find it online at http://lucene.apache.org/solr/quickstart.html. <<taking-solr-to-production.adoc#taking-solr-to-production,Taking Solr to Production>>: Detailed steps to help you install Solr as a service and take your application to production.
====
<<upgrading-a-solr-cluster.adoc#upgrading-a-solr-cluster,Upgrading a Solr Cluster>>: Information for upgrading a production SolrCloud cluster.
TIP: Solr includes a Quick Start tutorial which will be helpful if you are just starting out with Solr. You can find it in this Guide at <<solr-tutorial.adoc#solr-tutorial,Solr Tutorial>>.

Seven binary files not shown. Sizes after the change: 120 KiB, 207 KiB, 10 KiB, 250 KiB, 21 KiB, 49 KiB, 224 KiB.

@ -1,7 +1,7 @@
= Apache Solr Reference Guide = Apache Solr Reference Guide
:page-shortname: index :page-shortname: index
:page-permalink: index.html :page-permalink: index.html
:page-children: about-this-guide, getting-started, upgrading-solr, using-the-solr-administration-user-interface, documents-fields-and-schema-design, understanding-analyzers-tokenizers-and-filters, indexing-and-basic-data-operations, searching, the-well-configured-solr-instance, managing-solr, solrcloud, legacy-scaling-and-distribution, client-apis, major-changes-from-solr-5-to-solr-6, upgrading-a-solr-cluster, further-assistance, solr-glossary, errata, how-to-contribute :page-children: about-this-guide, solr-tutorial, getting-started, solr-control-script-reference, using-the-solr-administration-user-interface, documents-fields-and-schema-design, understanding-analyzers-tokenizers-and-filters, indexing-and-basic-data-operations, searching, the-well-configured-solr-instance, managing-solr, solrcloud, legacy-scaling-and-distribution, client-apis, major-changes-from-solr-5-to-solr-6, further-assistance, solr-glossary, errata, how-to-contribute
// Licensed to the Apache Software Foundation (ASF) under one // Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file // or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information // distributed with this work for additional information
@ -19,12 +19,17 @@
// specific language governing permissions and limitations // specific language governing permissions and limitations
// under the License. // under the License.
This reference guide describes Apache Solr, the open source solution for search. You can download Apache Solr from the Solr website at http://lucene.apache.org/solr/. [.lead]
This reference guide describes Apache Solr, the open source solution for search.
This Guide contains the following sections: Solr builds on Lucene, an open source Java library that provides indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities. Both Solr and Lucene are managed by the Apache Software Foundation (http://www.apache.org/[www.apache.org)]. You can download Apache Solr from the Solr website at http://lucene.apache.org/solr/.
This Guide contains the following main sections:
*<<getting-started.adoc#getting-started,Getting Started>>*: This section guides you through the installation and setup of Solr. *<<getting-started.adoc#getting-started,Getting Started>>*: This section guides you through the installation and setup of Solr.
*<<solr-control-script-reference#solr-control-script-reference,Solr Control Script Reference>>*: This section provides information about all of the options available to the `bin/solr` / `bin\solr.cmd` scripts, which can start and stop Solr, configure authentication, and create or remove collections and cores.
*<<using-the-solr-administration-user-interface.adoc#using-the-solr-administration-user-interface,Using the Solr Administration User Interface>>*: This section introduces the Solr Web-based user interface. From your browser you can view configuration files, submit queries, view logfile settings and Java environment settings, and monitor and control distributed configurations. *<<using-the-solr-administration-user-interface.adoc#using-the-solr-administration-user-interface,Using the Solr Administration User Interface>>*: This section introduces the Solr Web-based user interface. From your browser you can view configuration files, submit queries, view logfile settings and Java environment settings, and monitor and control distributed configurations.
*<<documents-fields-and-schema-design.adoc#documents-fields-and-schema-design,Documents, Fields, and Schema Design>>*: This section describes how Solr organizes its data for indexing. It explains how a Solr schema defines the fields and field types which Solr uses to organize data within the document files it indexes. *<<documents-fields-and-schema-design.adoc#documents-fields-and-schema-design,Documents, Fields, and Schema Design>>*: This section describes how Solr organizes its data for indexing. It explains how a Solr schema defines the fields and field types which Solr uses to organize data within the document files it indexes.

@ -1,6 +1,7 @@
= Installing Solr = Installing Solr
:page-shortname: installing-solr :page-shortname: installing-solr
:page-permalink: installing-solr.html :page-permalink: installing-solr.html
:page-toclevels: 1
// Licensed to the Apache Software Foundation (ASF) under one // Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file // or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information // distributed with this work for additional information
@ -18,39 +19,178 @@
// specific language governing permissions and limitations // specific language governing permissions and limitations
// under the License. // under the License.
This section describes how to install Solr. Installation of Solr on Unix-compatible or Windows servers generally requires simply extracting (or, unzipping) the download package.
You can install Solr in any system where a suitable Java Runtime Environment (JRE) is available, as detailed below. Currently this includes Linux, OS X, and Microsoft Windows. The instructions in this section should work for any platform, with a few exceptions for Windows as noted. Please be sure to review the <<solr-system-requirements.adoc#solr-system-requirements,Solr System Requirements>> before starting Solr.
== Got Java? == Available Solr Packages
You will need the Java Runtime Environment (JRE) version 1.8 or higher. At a command line, check your Java version like this: Solr is available from the Solr website. Download the latest release https://lucene.apache.org/solr/mirrors-solr-latest-redir.html.
[source,plain] There are three separate packages:
----
$ java -version
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
----
The exact output will vary, but you need to make sure you meet the minimum version requirement. We also recommend choosing a version that is not end-of-life from its vendor. If you don't have the required version, or if the java command is not found, download and install the latest version from Oracle at http://www.oracle.com/technetwork/java/javase/downloads/index.html. * `solr-{solr-docs-version}.0.tgz` for Linux/Unix/OSX systems
* `solr-{solr-docs-version}.0.zip` for Microsoft Windows systems
* `solr-{solr-docs-version}.0-src.tgz` the package Solr source code. This is useful if you want to develop on Solr without using the official Git repository.
[[install-command]] == Preparing for Installation
== Installing Solr
Solr is available from the Solr website at http://lucene.apache.org/solr/. When getting started with Solr, all you need to do is extract the Solr distribution archive to a directory of your choosing. This will suffice as an initial development environment, but take care not to overtax this "toy" installation before setting up your true development and production environments.
For Linux/Unix/OSX systems, download the `.tgz` file. For Microsoft Windows systems, download the `.zip` file. When you've progressed past initial evaluation of Solr, you'll want to take care to plan your implementation. You may need to reinstall Solr on another server or make a clustered SolrCloud environment.
When getting started, all you need to do is extract the Solr distribution archive to a directory of your choosing. When you're ready to setup Solr for a production environment, please refer to the instructions provided on the <<taking-solr-to-production.adoc#taking-solr-to-production,Taking Solr to Production>> page. When you're ready to setup Solr for a production environment, please refer to the instructions provided on the <<taking-solr-to-production.adoc#taking-solr-to-production,Taking Solr to Production>> page.
.What Size Server Do I Need?
[NOTE]
====
How to size your Solr installation is a complex question that relies on a number of factors, including the number and structure of documents, how many fields you intend to store, the number of users, etc.
It's highly recommended that you spend a bit of time thinking about the factors that will impact hardware sizing for your Solr implementation. A very good blog post that discusses the issues to consider is https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/[Sizing Hardware in the Abstract: Why We Don't have a Definitive Answer].
====
== Package Installation
To keep things simple for now, extract the Solr distribution archive to your local home directory, for instance on Linux, do: To keep things simple for now, extract the Solr distribution archive to your local home directory, for instance on Linux, do:
[source,bash] [source,bash,subs="attributes"]
---- ----
cd ~/ cd ~/
tar zxf solr-x.y.z.tgz tar zxf solr-{solr-docs-version}.0.tgz
---- ----
Once extracted, you are now ready to run Solr using the instructions provided in the <<running-solr.adoc#running-solr,Running Solr>> section. Once extracted, you are now ready to run Solr using the instructions provided in the <<Starting Solr>> section below.
== Directory Layout
After installing Solr, you'll see the following directories and files within them:
bin::
This directory includes several important scripts that will make using Solr easier.
solr and solr.cmd::: This is <<solr-control-script-reference.adoc#solr-control-script-reference,Solr's Control Script>>, also known as `bin/solr` (*nix) / `bin/solr.cmd` (Windows). This script is the preferred tool to start and stop Solr. You can also create collections or cores, configure authentication, and work with configuration files when running in SolrCloud mode.
post::: The <<post-tool.adoc#post-tool,PostTool>>, which provides a simple command line interface for POSTing content to Solr.
solr.in.sh and solr.in.cmd:::
These are property files for *nix and Windows systems, respectively. System-level properties for Java, Jetty, and Solr are configured here. Many of these settings can be overridden when using `bin/solr` / `bin/solr.cmd`, but this allows you to set all the properties in one place.
install_solr_services.sh:::
This script is used on *nix systems to install Solr as a service. It is described in more detail in the section <<taking-solr-to-production.adoc#taking-solr-to-production,Taking Solr to Production>>.
contrib::
Solr's `contrib` directory includes add-on plugins for specialized features of Solr.
dist::
The `dist` directory contains the main Solr .jar files.
docs::
The `docs` directory includes a link to online Javadocs for Solr.
example::
The `example` directory includes several types of examples that demonstrate various Solr capabilities. See the section <<Solr Examples>> below for more details on what is in this directory.
licenses::
The `licenses` directory includes all of the licenses for 3rd party libraries used by Solr.
server::
This directory is where the heart of the Solr application resides. A README in this directory provides a detailed overview, but here are some highlights:
* Solr's Admin UI (`server/solr-webapp`)
* Jetty libraries (`server/lib`)
* Log files (`server/logs`) and log configurations (`server/resources`). See the section <<configuring-logging.adoc#configuring-logging,Configuring Logging>> for more details on how to customize Solr's default logging.
* Sample configsets (`server/solr/configsets`)
== Solr Examples
Solr includes a number of example documents and configurations to use when getting started. If you ran through the <<solr-tutorial.adoc#solr-tutorial,Solr Tutorial>>, you have already interacted with some of these files.
Here are the examples included with Solr:
exampledocs::
This is a small set of simple CSV, XML, and JSON files that can be used with `bin/post` when first getting started with Solr. For more information about using `bin/post` with these files, see <<post-tool.adoc#post-tool,Post Tool>>.
example-DIH::
This directory includes a few example DataImport Handler (DIH) configurations to help you get started with importing structured content in a database, an email server, or even an Atom feed. Each example will index a different set of data; see the README there for more details about these examples.
files::
The `files` directory provides a basic search UI for documents such as Word or PDF that you may have stored locally. See the README there for details on how to use this example.
films::
The `films` directory includes a robust set of data about movies in three formats: CSV, XML, and JSON. See the README there for details on how to use this dataset.
== Starting Solr
Solr includes a command line interface tool called `bin/solr` (Linux/MacOS) or `bin\solr.cmd` (Windows). This tool allows you to start and stop Solr, create cores and collections, configure authentication, and check the status of your system.
To use it to start Solr you can simply enter:
[source,bash]
----
bin/solr start
----
If you are running Windows, you can start Solr by running `bin\solr.cmd` instead.
[source,plain]
----
bin\solr.cmd start
----
This will start Solr in the background, listening on port 8983.
When you start Solr in the background, the script will wait to make sure Solr starts correctly before returning to the command line prompt.
TIP: All of the options for the Solr CLI are described in the section <<solr-control-script-reference.adoc#solr-control-script-reference,Solr Control Script Reference>>.
=== Start Solr with a Specific Bundled Example
Solr also provides a number of useful examples to help you learn about key features. You can launch the examples using the `-e` flag. For instance, to launch the "techproducts" example, you would do:
[source,bash]
----
bin/solr -e techproducts
----
Currently, the available examples you can run are: techproducts, dih, schemaless, and cloud. See the section <<solr-control-script-reference.adoc#running-with-example-configurations,Running with Example Configurations>> for details on each example.
.Getting Started with SolrCloud
NOTE: Running the `cloud` example starts Solr in <<solrcloud.adoc#solrcloud,SolrCloud>> mode. For more information on starting Solr in cloud mode, see the section <<getting-started-with-solrcloud.adoc#getting-started-with-solrcloud,Getting Started with SolrCloud>>.
=== Check if Solr is Running
If you're not sure if Solr is running locally, you can use the status command:
[source,bash]
----
bin/solr status
----
This will search for running Solr instances on your computer and then gather basic information about them, such as the version and memory usage.
That's it! Solr is running. If you need convincing, use a Web browser to see the Admin Console.
`\http://localhost:8983/solr/`
.The Solr Admin interface.
image::images/running-solr/SolrAdminDashboard.png[image,width=900,height=456]
If Solr is not running, your browser will complain that it cannot connect to the server. Check your port number and try again.
=== Create a Core
If you did not start Solr with an example configuration, you would need to create a core in order to be able to index and search. You can do so by running:
[source,bash]
----
bin/solr create -c <name>
----
This will create a core that uses a data-driven schema which tries to guess the correct field type when you add documents to the index.
To see all available options for creating a new core, execute:
[source,bash]
----
bin/solr create -help
----
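
The start, status, and create steps added to this page can be chained into one script. The following is a minimal sketch, assuming it is run from the Solr installation directory. The `run` helper, the `setup_core` function, the `DRY_RUN` switch, and the core name `mycore` are illustrative and not part of Solr; only the `bin/solr start`, `bin/solr status`, and `bin/solr create -c` commands come from the text.

```shell
#!/bin/sh
# Hypothetical wrapper around the bin/solr commands shown in the page.
# With DRY_RUN=1 (the default here) each command is printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"   # show what would be executed
  else
    "$@"                 # execute the command as given
  fi
}

setup_core() {
  core="$1"              # assumed core name, e.g. "mycore"
  # Stop at the first failing step so a core is never created
  # against a Solr instance that did not start.
  run bin/solr start &&
  run bin/solr status &&
  run bin/solr create -c "$core"
}

setup_core mycore
```

Setting `DRY_RUN=0` runs the commands for real. Printing by default is a deliberate choice, so the sequence can be reviewed before anything is started or created.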

@ -18,7 +18,9 @@
// specific language governing permissions and limitations // specific language governing permissions and limitations
// under the License. // under the License.
There are some major changes in Solr 6 to consider before starting to migrate your configurations and indexes. There are many hundreds of changes, so a thorough review of the <<upgrading-solr.adoc#upgrading-solr,Upgrading Solr>> section as well as the {solr-javadocs}/changes/Changes.html[CHANGES.txt] file in your Solr instance will help you plan your migration to Solr 6. This section attempts to highlight some of the major changes you should be aware of. There are some major changes in Solr 6 to consider before starting to migrate your configurations and indexes.
There are many hundreds of changes, so a thorough review of the <<solr-upgrade-notes.adoc#solr-upgrade-notes,Solr Upgrade Notes>> section as well as the {solr-javadocs}/changes/Changes.html[CHANGES.txt] file in your Solr instance will help you plan your migration to Solr 6. This section attempts to highlight some of the major changes you should be aware of.
== Highlights of New Features in Solr 6 == Highlights of New Features in Solr 6

@ -1,7 +1,7 @@
= Managing Solr = Managing Solr
:page-shortname: managing-solr :page-shortname: managing-solr
:page-permalink: managing-solr.html :page-permalink: managing-solr.html
:page-children: taking-solr-to-production, securing-solr, running-solr-on-hdfs, making-and-restoring-backups, configuring-logging, using-jmx-with-solr, mbean-request-handler, performance-statistics-reference, metrics-reporting, v2-api :page-children: securing-solr, running-solr-on-hdfs, making-and-restoring-backups, configuring-logging, using-jmx-with-solr, mbean-request-handler, performance-statistics-reference, metrics-reporting, v2-api
// Licensed to the Apache Software Foundation (ASF) under one // Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file // or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information // distributed with this work for additional information
@ -21,8 +21,6 @@
This section describes how to run Solr and how to look at Solr when it is running. It contains the following sections: This section describes how to run Solr and how to look at Solr when it is running. It contains the following sections:
<<taking-solr-to-production.adoc#taking-solr-to-production,Taking Solr to Production>>: Describes how to install Solr as a service on Linux for production environments.
<<securing-solr.adoc#securing-solr,Securing Solr>>: How to use the Basic and Kerberos authentication and rule-based authorization plugins for Solr, and how to enable SSL. <<securing-solr.adoc#securing-solr,Securing Solr>>: How to use the Basic and Kerberos authentication and rule-based authorization plugins for Solr, and how to enable SSL.
<<running-solr-on-hdfs.adoc#running-solr-on-hdfs,Running Solr on HDFS>>: How to use HDFS to store your Solr indexes and transaction logs. <<running-solr-on-hdfs.adoc#running-solr-on-hdfs,Running Solr on HDFS>>: How to use HDFS to store your Solr indexes and transaction logs.

@ -1,289 +0,0 @@
= Running Solr
:page-shortname: running-solr
:page-permalink: running-solr.html
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
This section describes how to run Solr with an example schema, how to add documents, and how to run queries.
[[RunningSolr-StarttheServer]]
== Start the Server
If you didn't start Solr after installing it, you can start it by running `bin/solr` from the Solr directory.
[source,bash]
----
bin/solr start
----
If you are running Windows, you can start Solr by running `bin\solr.cmd` instead.
[source,plain]
----
bin\solr.cmd start
----
This will start Solr in the background, listening on port 8983.
When you start Solr in the background, the script will wait to make sure Solr starts correctly before returning to the command line prompt.
The `bin/solr` and `bin\solr.cmd` scripts allow you to customize how you start Solr. Let's work through a few examples of using the `bin/solr` script (if you're running Solr on Windows, the `bin\solr.cmd` works the same as what is shown in the examples below):
[[RunningSolr-SolrScriptOptions]]
=== Solr Script Options
The `bin/solr` script has several options.
[[RunningSolr-ScriptHelp]]
==== Script Help
To see how to use the `bin/solr` script, execute:
[source,bash]
----
bin/solr -help
----
For specific usage instructions for the *start* command, do:
[source,bash]
----
bin/solr start -help
----
[[RunningSolr-StartSolrintheForeground]]
==== Start Solr in the Foreground
Since Solr is a server, it is more common to run it in the background, especially on Unix/Linux. However, to start Solr in the foreground, simply do:
[source,bash]
----
bin/solr start -f
----
If you are running Windows, you can run:
[source,plain]
----
bin\solr.cmd start -f
----
[[RunningSolr-StartSolrwithaDifferentPort]]
==== Start Solr with a Different Port
To change the port Solr listens on, you can use the `-p` parameter when starting, such as:
[source,bash]
----
bin/solr start -p 8984
----
[[RunningSolr-StopSolr]]
==== Stop Solr
When running Solr in the foreground (using `-f`), you can stop it using `Ctrl-c`. However, when running in the background, you should use the *stop* command, such as:
[source,bash]
----
bin/solr stop -p 8983
----
The stop command requires you to specify the port Solr is listening on or you can use the `-all` parameter to stop all running Solr instances.
[[RunningSolr-StartSolrwithaSpecificBundledExample]]
==== Start Solr with a Specific Bundled Example
Solr also provides a number of useful examples to help you learn about key features. You can launch the examples using the `-e` flag. For instance, to launch the "techproducts" example, you would do:
[source,bash]
----
bin/solr -e techproducts
----
Currently, the available examples you can run are: techproducts, dih, schemaless, and cloud. See the section <<solr-control-script-reference.adoc#running-with-example-configurations,Running with Example Configurations>> for details on each example.
.Getting Started with SolrCloud
[NOTE]
====
Running the `cloud` example starts Solr in <<solrcloud.adoc#solrcloud,SolrCloud>> mode. For more information on starting Solr in cloud mode, see the section <<getting-started-with-solrcloud.adoc#getting-started-with-solrcloud,Getting Started with SolrCloud>>.
====
[[RunningSolr-CheckifSolrisRunning]]
==== Check if Solr is Running
If you're not sure if Solr is running locally, you can use the status command:
[source,bash]
----
bin/solr status
----
This will search for running Solr instances on your computer and then gather basic information about them, such as the version and memory usage.
That's it! Solr is running. If you need convincing, use a Web browser to see the Admin Console.
`\http://localhost:8983/solr/`
.The Solr Admin interface.
image::images/running-solr/SolrAdminDashboard.png[image,width=900,height=456]
If Solr is not running, your browser will complain that it cannot connect to the server. Check your port number and try again.
[[RunningSolr-CreateaCore]]
== Create a Core
If you did not start Solr with an example configuration, you would need to create a core in order to be able to index and search. You can do so by running:
[source,bash]
----
bin/solr create -c <name>
----
This will create a core that uses a data-driven schema which tries to guess the correct field type when you add documents to the index.
To see all available options for creating a new core, execute:
[source,bash]
----
bin/solr create -help
----
[[RunningSolr-AddDocuments]]
== Add Documents
Solr is built to find documents that match queries. Solr's schema provides an idea of how content is structured (more on the schema <<documents-fields-and-schema-design.adoc#documents-fields-and-schema-design,later>>), but without documents there is nothing to find. Solr needs input before it can do much.
You may want to add a few sample documents before trying to index your own content. The Solr installation comes with different types of example documents located under the sub-directories of the `example/` directory of your installation.
In the `bin/` directory is the post script, a command line tool which can be used to index different types of documents. Do not worry too much about the details for now. The <<indexing-and-basic-data-operations.adoc#indexing-and-basic-data-operations,Indexing and Basic Data Operations>> section has all the details on indexing.
To see some information about the usage of `bin/post`, use the `-help` option. Windows users, see the section for <<post-tool.adoc#post-tool-windows-support,Post Tool on Windows>>.
`bin/post` can post various types of content to Solr, including files in Solr's native XML and JSON formats, CSV files, a directory tree of rich documents, or even a simple short web crawl. See the examples at the end of `bin/post -help` for various commands to easily get started posting your content into Solr.
Go ahead and add all the documents in some example XML files:
[source,plain]
----
$ bin/post -c gettingstarted example/exampledocs/*.xml
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/gettingstarted/update...
Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file gb18030-example.xml (application/xml) to [base]
POSTing file hd.xml (application/xml) to [base]
POSTing file ipod_other.xml (application/xml) to [base]
POSTing file ipod_video.xml (application/xml) to [base]
POSTing file manufacturers.xml (application/xml) to [base]
POSTing file mem.xml (application/xml) to [base]
POSTing file money.xml (application/xml) to [base]
POSTing file monitor.xml (application/xml) to [base]
POSTing file monitor2.xml (application/xml) to [base]
POSTing file mp500.xml (application/xml) to [base]
POSTing file sd500.xml (application/xml) to [base]
POSTing file solr.xml (application/xml) to [base]
POSTing file utf8-example.xml (application/xml) to [base]
POSTing file vidcard.xml (application/xml) to [base]
14 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/gettingstarted/update...
Time spent: 0:00:00.153
----
That's it! Solr has indexed the documents contained in those files.
[[RunningSolr-AskQuestions]]
== Ask Questions
Now that you have indexed documents, you can perform queries. The simplest way is by building a URL that includes the query parameters. This is exactly the same as building any other HTTP URL.
For example, the following query searches all document fields for "video":
`\http://localhost:8983/solr/gettingstarted/select?q=video`
Notice how the URL includes the host name (`localhost`), the port number where the server is listening (`8983`), the application name (`solr`), the request handler for queries (`select`), and finally, the query itself (`q=video`).
The results are contained in an XML document, which you can examine directly by clicking on the link above. The document contains two parts. The first part is the `responseHeader`, which contains information about the response itself. The main part of the reply is in the result tag, which contains one or more doc tags, each of which contains fields from documents that match the query. You can use standard XML transformation techniques to mold Solr's results into a form that is suitable for displaying to users. Alternatively, Solr can output the results in JSON, PHP, Ruby and even user-defined formats.
Just in case you are not running Solr as you read, the following screen shot shows the result of a query (the next example, actually) as viewed in Mozilla Firefox. The top-level response contains a `lst` named `responseHeader` and a result named response. Inside result, you can see the three docs that represent the search results.
.An XML response to a query.
image::images/running-solr/solr34_responseHeader.png[image,width=600,height=634]
Once you have mastered the basic idea of a query, it is easy to add enhancements to explore the query syntax. This one is the same as before but the results only contain the ID, name, and price for each returned document. If you don't specify which fields you want, all of them are returned.
`\http://localhost:8983/solr/gettingstarted/select?q=video&fl=id,name,price`
Here is another example which searches for "black" in the `name` field only. If you do not tell Solr which field to search, it will search default fields, as specified in the schema.
`\http://localhost:8983/solr/gettingstarted/select?q=name:black`
You can provide ranges for fields. The following query finds every document whose price is between $0 and $400.
`\http://localhost:8983/solr/gettingstarted/select?q=price:[0%20TO%20400]&fl=id,name,price`
<<faceting.adoc#faceting,Faceted browsing>> is one of Solr's key features. It allows users to narrow search results in ways that are meaningful to your application. For example, a shopping site could provide facets to narrow search results by manufacturer or price.
Faceting information is returned as a third part of Solr's query response. To get a taste of this power, take a look at the following query. It adds `facet=true` and `facet.field=cat`.
`\http://localhost:8983/solr/gettingstarted/select?q=price:[0%20TO%20400]&fl=id,name,price&facet=true&facet.field=cat`
In addition to the familiar `responseHeader` and response from Solr, a `facet_counts` element is also present. Here is a view with the `responseHeader` and response collapsed so you can see the faceting information clearly.
*An XML Response with faceting*
[source,xml]
----
<response>
<lst name="responseHeader">
...
</lst>
<result name="response" numFound="9" start="0">
<doc>
<str name="id">SOLR1000</str>
<str name="name">Solr, the Enterprise Search Server</str>
<float name="price">0.0</float></doc>
...
</result>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="cat">
<int name="electronics">6</int>
<int name="memory">3</int>
<int name="search">2</int>
<int name="software">2</int>
<int name="camera">1</int>
<int name="copier">1</int>
<int name="multifunction printer">1</int>
<int name="music">1</int>
<int name="printer">1</int>
<int name="scanner">1</int>
<int name="connector">0</int>
<int name="currency">0</int>
<int name="graphics card">0</int>
<int name="hard drive">0</int>
<int name="monitor">0</int>
</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>
----
The facet information shows how many of the query results have each possible value of the `cat` field. You could easily use this information to provide users with a quick way to narrow their query results. You can filter results by adding one or more filter queries to the Solr request. This request restricts the results to documents with a category of "software".
`\http://localhost:8983/solr/gettingstarted/select?q=price:[0%20TO%20400]&fl=id,name,price&facet=true&facet.field=cat&fq=cat:software`

@ -1,6 +1,6 @@
= A Step Closer = Solr Configuration Files
:page-shortname: a-step-closer :page-shortname: solr-configuration-files
:page-permalink: a-step-closer.html :page-permalink: solr-configuration-files.html
// Licensed to the Apache Software Foundation (ASF) under one // Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file // or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information // distributed with this work for additional information
@ -18,9 +18,16 @@
// specific language governing permissions and limitations // specific language governing permissions and limitations
// under the License. // under the License.
You already have some idea of Solr's schema. This section describes Solr's home directory and other configuration options. Solr has several configuration files that you will interact with during your implementation.
When Solr runs in an application server, it needs access to a home directory. The home directory contains important configuration information and is the place where Solr will store its index. The layout of the home directory will look a little different when you are running Solr in standalone mode vs when you are running in SolrCloud mode. Many of these files are in XML format, although APIs that interact with configuration settings tend to accept JSON for programmatic access as needed.
== Solr Home
When Solr runs, it needs access to a home directory.
When you first install Solr, your home directory is `server/solr`. However, some examples may change this location (such as, if you run `bin/solr start -e cloud`, your home directory will be `example/cloud`).
The home directory contains important configuration information and is the place where Solr will store its index. The layout of the home directory will look a little different when you are running Solr in standalone mode vs. when you are running in SolrCloud mode.
The crucial parts of the Solr home directory are shown in these examples: The crucial parts of the Solr home directory are shown in these examples:
@ -56,7 +63,10 @@ The crucial parts of the Solr home directory are shown in these examples:
data/ data/
---- ----
You may see other files, but the main ones you need to know are: You may see other files, but the main ones you need to know are discussed in the next section.
== Configuration Files
Inside Solr's Home, you'll find these files:
* `solr.xml` specifies configuration options for your Solr server instance. For more information on `solr.xml` see <<solr-cores-and-solr-xml.adoc#solr-cores-and-solr-xml,Solr Cores and solr.xml>>. * `solr.xml` specifies configuration options for your Solr server instance. For more information on `solr.xml` see <<solr-cores-and-solr-xml.adoc#solr-cores-and-solr-xml,Solr Cores and solr.xml>>.
* Per Solr Core: * Per Solr Core:
@ -67,4 +77,4 @@ You may see other files, but the main ones you need to know are:
Note that the SolrCloud example does not include a `conf` directory for each Solr Core (so there is no `solrconfig.xml` or Schema file). This is because the configuration files usually found in the `conf` directory are stored in ZooKeeper so they can be propagated across the cluster. Note that the SolrCloud example does not include a `conf` directory for each Solr Core (so there is no `solrconfig.xml` or Schema file). This is because the configuration files usually found in the `conf` directory are stored in ZooKeeper so they can be propagated across the cluster.
If you are using SolrCloud with the embedded ZooKeeper instance, you may also see `zoo.cfg` and `zoo.data` which are ZooKeeper configuration and data files. However, if you are running your own ZooKeeper ensemble, you would supply your own ZooKeeper configuration file when you start it and the copies in Solr would be unused. For more information about ZooKeeper and SolrCloud, see the section <<solrcloud.adoc#solrcloud,SolrCloud>>. If you are using SolrCloud with the embedded ZooKeeper instance, you may also see `zoo.cfg` and `zoo.data` which are ZooKeeper configuration and data files. However, if you are running your own ZooKeeper ensemble, you would supply your own ZooKeeper configuration file when you start it and the copies in Solr would be unused. For more information about SolrCloud, see the section <<solrcloud.adoc#solrcloud,SolrCloud>>.

@ -25,7 +25,7 @@ You can start and stop Solr, create and delete collections or cores, perform ope
You can find the script in the `bin/` directory of your Solr installation. The `bin/solr` script makes Solr easier to work with by providing simple commands and options to quickly accomplish common goals. You can find the script in the `bin/` directory of your Solr installation. The `bin/solr` script makes Solr easier to work with by providing simple commands and options to quickly accomplish common goals.
More examples of `bin/solr` in use are available throughout the Solr Reference Guide, but particularly in the sections <<running-solr.adoc#running-solr,Running Solr>> and <<getting-started-with-solrcloud.adoc#getting-started-with-solrcloud,Getting Started with SolrCloud>>. More examples of `bin/solr` in use are available throughout the Solr Reference Guide, but particularly in the sections <<installing-solr.adoc#starting-solr,Starting Solr>> and <<getting-started-with-solrcloud.adoc#getting-started-with-solrcloud,Getting Started with SolrCloud>>.
== Starting and Stopping == Starting and Stopping

@ -0,0 +1,48 @@
= Solr System Requirements
:page-shortname: solr-system-requirements
:page-permalink: solr-system-requirements.html
:page-toc: false
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
You can install Solr in any system where a suitable Java Runtime Environment (JRE) is available, as detailed below.
Currently this includes Linux, MacOS/OS X, and Microsoft Windows.
== Installation Requirements
=== Java Requirements
You will need the Java Runtime Environment (JRE) version 1.8 or higher. At a command line, check your Java version like this:
[source,bash]
----
$ java -version
java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
----
The exact output will vary, but you need to make sure you meet the minimum version requirement. We also recommend choosing a version that is not end-of-life from its vendor. Oracle or OpenJDK are the most tested JREs and are recommended. It's also recommended to use the latest available official release when possible.
Some versions of Java VM have bugs that may impact your implementation. To be sure, check the page https://wiki.apache.org/lucene-java/JavaBugs[Lucene JavaBugs].
If you don't have the required version, or if the `java` command is not found, download and install the latest version from Oracle at http://www.oracle.com/technetwork/java/javase/downloads/index.html.
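The version check above can also be scripted. The following is a minimal sketch and not part of Solr: the function names `java_version_string` and `meets_java_minimum` are hypothetical, and the parsing assumes the version string follows either the legacy `1.8.0_60` form or the newer `9.0.1` form.

```shell
# Print the quoted version string from `java -version`, which writes to stderr.
java_version_string() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed (exit 0) when the given version string is Java 8 (1.8) or later.
# The body runs in a subshell so the helper variables do not leak to the caller.
meets_java_minimum() (
  version=$1
  major=${version%%.*}            # "1.8.0_60" -> "1", "11.0.2" -> "11"
  if [ "$major" = "1" ]; then
    rest=${version#1.}            # legacy scheme: "1.8.0_60" -> "8.0_60"
    major=${rest%%.*}             # -> "8"
  fi
  major=${major%%[!0-9]*}         # drop suffixes such as "-ea"
  [ -n "$major" ] && [ "$major" -ge 8 ]
)
```

For example, `meets_java_minimum "$(java_version_string)" || echo "Java 1.8 or higher is required"` prints a warning when the installed JRE is too old or when no version string could be read.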
=== Supported Operating Systems
Solr is tested on several versions of Linux, MacOS, and Windows.

@ -0,0 +1,980 @@
= Solr Tutorial
:page-shortname: solr-tutorial
:page-permalink: solr-tutorial.html
:page-tocclass: right
:experimental:
This tutorial covers getting Solr up and running, ingesting a variety of data sources into multiple collections,
and getting a feel for the Solr administrative and search interfaces.
It is organized into three sections that each build on the one before it. The <<exercise-1,first exercise>> will ask you to start Solr, create a collection, index some basic documents, and then perform a few searches.
The <<exercise-2,second exercise>> works with a different set of data, and explores requesting facets with the dataset.
The <<exercise-3,third exercise>> encourages you to begin to work with your own data and start a plan for your implementation.
Finally, we'll introduce <<Spatial Queries,spatial search>> and show you how to get your Solr instance back into a clean state.
== Before You Begin
To follow along with this tutorial, you will need...
// TODO possibly remove this system requirements or only replace the link
. To meet the {solr-javadocs}/solr/api/SYSTEM_REQUIREMENTS.html[system requirements]
. An Apache Solr release http://lucene.apache.org/solr/downloads.html[download]. This tutorial is designed for Apache Solr {solr-docs-version}.
For best results, please run the browser showing this tutorial and the Solr server on the same machine so tutorial links will correctly point to your Solr server.
== Unpack Solr
Begin by unzipping the Solr release and changing your working directory to the subdirectory where Solr was installed. For example, with a shell in UNIX, Cygwin, or MacOS:
[source,bash,subs="verbatim,attributes+"]
----
~$ ls solr*
solr-{solr-docs-version}.0.zip
~$ unzip -q solr-{solr-docs-version}.0.zip
~$ cd solr-{solr-docs-version}.0/
----
If you'd like to know more about Solr's directory layout before moving to the first exercise, see the section <<installing-solr.adoc#directory-layout,Directory Layout>> for details.
[[exercise-1]]
== Exercise 1: Index Techproducts Example Data
This exercise will walk you through how to start Solr as a two-node cluster (both nodes on the same machine) and create a collection during startup. Then you will index some sample data that ships with Solr and do some basic searches.
=== Launch Solr in SolrCloud Mode
To launch Solr, run: `bin/solr start -e cloud` on Unix or MacOS; `bin\solr.cmd start -e cloud` on Windows.
This will start an interactive session that will start two Solr "servers" on your machine. This command has an option to run without prompting you for input (`-noprompt`), but we want to modify two of the defaults so we won't use that option now.
[source,subs="verbatim,attributes+"]
----
solr-{solr-docs-version}.0:$ ./bin/solr start -e cloud
Welcome to the SolrCloud example!
This interactive session will help you launch a SolrCloud cluster on your local workstation.
To begin, how many Solr nodes would you like to run in your local cluster? (specify 1-4 nodes) [2]:
----
The first prompt asks how many nodes we want to run. Note the `[2]` at the end of the last line; that is the default number of nodes. Two is what we want for this example, so you can simply press kbd:[enter].
[source,subs="verbatim,attributes+"]
----
Ok, let's start up 2 Solr nodes for your example SolrCloud cluster.
Please enter the port for node1 [8983]:
----
This will be the port that the first node runs on. Unless you know you have something else running on port 8983 on your machine, accept this default as well by pressing kbd:[enter]. If something is already using that port, you will be asked to choose another port.
[source,subs="verbatim,attributes+"]
----
Please enter the port for node2 [7574]:
----
This is the port the second node will run on. Again, unless you know you have something else running on port 7574 on your machine, accept the default by pressing kbd:[enter]. If something is already using that port, you will be asked to choose another port.
Solr will now initialize itself and start running on those two nodes. The script will print the commands it uses for your reference.
[source,subs="verbatim,attributes+"]
----
Starting up 2 Solr nodes for your example SolrCloud cluster.
Creating Solr home directory /solr-{solr-docs-version}.0/example/cloud/node1/solr
Cloning /solr-{solr-docs-version}.0/example/cloud/node1 into
/solr-{solr-docs-version}.0/example/cloud/node2
Starting up Solr on port 8983 using command:
"bin/solr" start -cloud -p 8983 -s "example/cloud/node1/solr"
Waiting up to 180 seconds to see Solr running on port 8983 [\]
Started Solr server on port 8983 (pid=34942). Happy searching!
Starting up Solr on port 7574 using command:
"bin/solr" start -cloud -p 7574 -s "example/cloud/node2/solr" -z localhost:9983
Waiting up to 180 seconds to see Solr running on port 7574 [\]
Started Solr server on port 7574 (pid=35036). Happy searching!
INFO - 2017-07-27 12:28:02.835; org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at localhost:9983 ready
----
Notice that two instances of Solr have started on two nodes. Because we are starting in SolrCloud mode, and did not define any details about an external ZooKeeper cluster, Solr launches its own ZooKeeper and connects both nodes to it.
After startup is complete, you'll be prompted to create a collection to use for indexing data.
[source,subs="verbatim,attributes+"]
----
Now let's create a new collection for indexing documents in your 2-node cluster.
Please provide a name for your new collection: [gettingstarted]
----
Here's the first place where we'll deviate from the default options. This tutorial will ask you to index some sample data included with Solr, called the "techproducts" data. Let's name our collection "techproducts" so it's easy to differentiate from other collections we'll create later. Enter `techproducts` at the prompt and hit kbd:[enter].
[source,subs="verbatim,attributes+"]
----
How many shards would you like to split techproducts into? [2]
----
This is asking how many <<solr-glossary.adoc#shard,shards>> you want to split your index into across the two nodes. Choosing "2" (the default) means we will split the index relatively evenly across both nodes, which is a good way to start. Accept the default by hitting kbd:[enter].
[source,subs="verbatim,attributes+"]
----
How many replicas per shard would you like to create? [2]
----
A replica is a copy of the index that's used for failover (see also the <<solr-glossary.adoc#replica,Solr Glossary definition>>). The default of "2" is fine to start with here as well, so accept it by hitting kbd:[enter].
[source,subs="verbatim,attributes+"]
----
Please choose a configuration for the techproducts collection, available options are:
_default or sample_techproducts_configs [_default]
----
We've reached another point where we will deviate from the default option. Solr has two sample sets of configuration files (each called a _configSet_) available out-of-the-box.
A collection must have a configSet, which at a minimum includes the two main configuration files for Solr: the schema file (named either `managed-schema` or `schema.xml`), and `solrconfig.xml`. The question here is which configSet you would like to start with. The `_default` is a bare-bones option, but note there's one whose name includes "techproducts", the same as we named our collection. This configSet is specifically designed to support the sample data we want to use, so enter `sample_techproducts_configs` at the prompt and hit kbd:[enter].
At this point, Solr will create the collection and again output to the screen the commands it issues.
[source,subs="verbatim,attributes+"]
----
Uploading /solr-{solr-docs-version}.0/server/solr/configsets/_default/conf for config techproducts to ZooKeeper at localhost:9983
Connecting to ZooKeeper at localhost:9983 ...
INFO - 2017-07-27 12:48:59.289; org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at localhost:9983 ready
Uploading /solr-{solr-docs-version}.0/server/solr/configsets/sample_techproducts_configs/conf for config techproducts to ZooKeeper at localhost:9983
Creating new collection 'techproducts' using command:
http://localhost:8983/solr/admin/collections?action=CREATE&name=techproducts&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=techproducts
{
"responseHeader":{
"status":0,
"QTime":5460},
"success":{
"192.168.0.110:7574_solr":{
"responseHeader":{
"status":0,
"QTime":4056},
"core":"techproducts_shard1_replica_n1"},
"192.168.0.110:8983_solr":{
"responseHeader":{
"status":0,
"QTime":4056},
"core":"techproducts_shard2_replica_n2"}}}
Enabling auto soft-commits with maxTime 3 secs using the Config API
POSTing request to Config API: http://localhost:8983/solr/techproducts/config
{"set-property":{"updateHandler.autoSoftCommit.maxTime":"3000"}}
Successfully set-property updateHandler.autoSoftCommit.maxTime to 3000
SolrCloud example running, please visit: http://localhost:8983/solr
----
*Congratulations!* Solr is ready for data!
You can see that Solr is running by launching the Solr Admin UI in your web browser: http://localhost:8983/solr/. This is the main starting point for administering Solr.
Solr will now be running two "nodes", one on port 7574 and one on port 8983. There is one collection created automatically, `techproducts`, a two-shard collection in which each shard has two replicas.
The http://localhost:8983/solr/#/~cloud[Cloud tab] in the Admin UI diagrams the collection nicely:
.SolrCloud Diagram
image::images/solr-tutorial/tutorial-solrcloud.png[]
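The same layout can be inspected from the command line through the Collections API `CLUSTERSTATUS` action. This is a minimal sketch, assuming the first node is listening on the default port 8983; `SOLR_BASE` and `cluster_status_url` are names made up for this example.

```shell
# Base URL of the first node; assumes the default port accepted at the prompt.
SOLR_BASE=${SOLR_BASE:-http://localhost:8983/solr}

# Print the Collections API URL that reports the state of one collection.
cluster_status_url() {
  printf '%s/admin/collections?action=CLUSTERSTATUS&collection=%s\n' "$SOLR_BASE" "$1"
}

cluster_status_url techproducts
# To fetch the report (requires the cluster to be running):
#   curl "$(cluster_status_url techproducts)"
```

The response lists each shard of the collection together with the replicas that belong to it.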
=== Index the Techproducts Data
Your Solr server is up and running, but it doesn't contain any data yet, so we can't do any queries.
Solr includes the `bin/post` tool to make indexing various types of documents easier. We'll use this tool for the indexing examples below.
You'll need a command shell to run some of the following examples, rooted in the Solr install directory; the shell from where you launched Solr works just fine.
NOTE: Currently the `bin/post` tool does not have a comparable Windows script, but the underlying Java program invoked is available. We'll show examples below for Windows, but you can also see the <<post-tool.adoc#post-tool-windows-support,Windows section>> of the Post Tool documentation for more details.
The data we will index is in the `example/exampledocs` directory. The documents are in a mix of document formats (JSON, CSV, etc.), and fortunately we can index them all at once:
.Linux/Mac
[source,subs="verbatim,attributes+"]
----
solr-{solr-docs-version}.0:$ bin/post -c techproducts example/exampledocs/*
----
.Windows
[source,subs="verbatim,attributes+"]
----
C:\solr-{solr-docs-version}.0> java -jar -Dc=techproducts -Dauto example\exampledocs\post.jar example\exampledocs\*
----
You should see output similar to the following:
[source,subs="verbatim,attributes+"]
----
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/techproducts/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file books.csv (text/csv) to [base]
POSTing file books.json (application/json) to [base]/json/docs
POSTing file gb18030-example.xml (application/xml) to [base]
POSTing file hd.xml (application/xml) to [base]
POSTing file ipod_other.xml (application/xml) to [base]
POSTing file ipod_video.xml (application/xml) to [base]
POSTing file manufacturers.xml (application/xml) to [base]
POSTing file mem.xml (application/xml) to [base]
POSTing file money.xml (application/xml) to [base]
POSTing file monitor.xml (application/xml) to [base]
POSTing file monitor2.xml (application/xml) to [base]
POSTing file more_books.jsonl (application/json) to [base]/json/docs
POSTing file mp500.xml (application/xml) to [base]
POSTing file post.jar (application/octet-stream) to [base]/extract
POSTing file sample.html (text/html) to [base]/extract
POSTing file sd500.xml (application/xml) to [base]
POSTing file solr-word.pdf (application/pdf) to [base]/extract
POSTing file solr.xml (application/xml) to [base]
POSTing file test_utf8.sh (application/octet-stream) to [base]/extract
POSTing file utf8-example.xml (application/xml) to [base]
POSTing file vidcard.xml (application/xml) to [base]
21 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/techproducts/update...
Time spent: 0:00:00.822
----
Congratulations again! You have data in your Solr!
Now we're ready to start searching.
[[tutorial-searching]]
=== Basic Searching
Solr can be queried via REST clients, curl, wget, Postman, etc., as well as via native clients available for many programming languages.
The Solr Admin UI includes a query builder interface via the Query tab for the `techproducts` collection (at http://localhost:8983/solr/#/techproducts/query). If you click the btn:[Execute Query] button without changing anything in the form, you'll get 10 documents in JSON format:
.Query Screen
image::images/solr-tutorial/tutorial-query-screen.png[Solr Quick Start: techproducts Query screen with results]
The URL sent by the Admin UI to Solr is shown in light grey near the top right of the above screenshot. If you click on it, your browser will show you the raw response.
To use curl, give the same URL shown in your browser in quotes on the command line:
`curl "http://localhost:8983/solr/techproducts/select?indent=on&q=\*:*"`
What's happening here is that we are using Solr's query parameter (`q`) with a special syntax that requests all documents in the index (`\*:*`). Not all of the documents are returned to us, however, because of the default for a parameter called `rows`, which you can see in the form is `10`. You can change the parameter in the UI or in the defaults if you wish.
Solr has very powerful search options, and this tutorial won't be able to cover all of them. But we can cover some of the most common types of queries.
==== Search for a Single Term
To search for a term, enter it as the `q` param value in the Solr Admin UI Query screen, replacing `\*:*` with the term you want to find.
Enter "foundation" and hit btn:[Execute Query] again.
If you prefer curl, enter something like this:
`curl "http://localhost:8983/solr/techproducts/select?q=foundation"`
You'll see something like this:
[source,json]
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":8,
"params":{
"q":"foundation"}},
"response":{"numFound":4,"start":0,"maxScore":2.7879646,"docs":[
{
"id":"0553293354",
"cat":["book"],
"name":"Foundation",
"price":7.99,
"price_c":"7.99,USD",
"inStock":true,
"author":"Isaac Asimov",
"author_s":"Isaac Asimov",
"series_t":"Foundation Novels",
"sequence_i":1,
"genre_s":"scifi",
"_version_":1574100232473411586,
"price_c____l_ns":799}]
}}
The response indicates that there are 4 hits (`"numFound":4`). We've only included one document in the above sample output, but since 4 hits is lower than the `rows` parameter default of 10, you should see all 4 of them.
Note the `responseHeader` before the documents. This header will include the parameters you have set for the search. By default it shows only the parameters _you_ have set for this query, which in this case is only your query term.
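The relationship between `numFound`, the returned documents, and the echoed parameters can be checked programmatically. The following is a minimal Python sketch, assuming an abbreviated copy of the sample response above (one document, most fields dropped); the variable names are illustrative only, and a live response would contain all 4 documents.

```python
import json

# Abbreviated copy of the sample response above (one document kept, most fields dropped).
raw = """
{"responseHeader": {"zkConnected": true, "status": 0, "QTime": 8,
                    "params": {"q": "foundation"}},
 "response": {"numFound": 4, "start": 0, "maxScore": 2.7879646,
              "docs": [{"id": "0553293354", "name": "Foundation"}]}}
"""

data = json.loads(raw)
num_found = data["response"]["numFound"]   # total matches in the index
returned = len(data["response"]["docs"])   # documents present in this payload
echoed = data["responseHeader"]["params"]  # parameters echoed back by Solr

print(f"{num_found} hits, {returned} in this sample, params={echoed}")
```

Comparing `num_found` with `returned` is a quick way to tell whether the `rows` limit (or, here, our truncation of the sample) hid some of the matches.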
The documents we got back include all the fields for each document that were indexed. This is, again, default behavior. If you want to restrict the fields in the response, you can use the `fl` param, which takes a comma-separated list of field names. This is one of the available fields on the query form in the Admin UI.
Put "id" (without quotes) in the "fl" box and hit btn:[Execute Query] again. Or, to specify it with curl:
`curl "http://localhost:8983/solr/techproducts/select?q=foundation&fl=id"`
You should only see the IDs of the matching records returned.
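If you build these URLs from a program rather than typing them, a small helper keeps the parameters readable. This is a sketch only: `select_url` is a hypothetical helper name, not part of Solr or any Solr client, and it assumes Solr is listening on the default address used in this tutorial.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # base URL used throughout this tutorial

def select_url(collection, **params):
    """Build a /select URL; ':', '*' and ',' are left readable, as in the curl examples."""
    return f"{SOLR}/{collection}/select?" + urlencode(params, safe=":*,")

url = select_url("techproducts", q="foundation", fl="id")
print(url)
```

The same helper accepts any other query parameter, for example `select_url("techproducts", q="*:*", rows=0)`.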
==== Field Searches
All Solr queries look for documents using some field. Often you want to query across multiple fields at the same time, and this is what we've done so far with the "foundation" query. This is possible with the use of copy fields, which are set up already with this set of configurations. We'll cover copy fields a little bit more in Exercise 2.
Sometimes, though, you want to limit your query to a single field. This can make your queries more efficient and the results more relevant for users.
Much of the data in our small sample data set is related to products. Let's say we want to find all the "electronics" products in the index. In the Query screen, enter "electronics" (without quotes) in the `q` box and hit btn:[Execute Query]. You should get 14 results, such as:
[source,json]
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":6,
"params":{
"q":"electronics"}},
"response":{"numFound":14,"start":0,"maxScore":1.5579545,"docs":[
{
"id":"IW-02",
"name":"iPod & iPod Mini USB 2.0 Cable",
"manu":"Belkin",
"manu_id_s":"belkin",
"cat":["electronics",
"connector"],
"features":["car power adapter for iPod, white"],
"weight":2.0,
"price":11.5,
"price_c":"11.50,USD",
"popularity":1,
"inStock":false,
"store":"37.7752,-122.4232",
"manufacturedate_dt":"2006-02-14T23:55:59Z",
"_version_":1574100232554151936,
"price_c____l_ns":1150}]
}}
This search finds all documents that contain the term "electronics" anywhere in the indexed fields. However, we can see from the above there is a `cat` field (for "category"). If we limit our search to only documents with the category "electronics", the results will be more precise for our users.
Update your query in the `q` field of the Admin UI so it's `cat:electronics`. Now you get 12 results:
[source,json]
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":6,
"params":{
"q":"cat:electronics"}},
"response":{"numFound":12,"start":0,"maxScore":0.9614112,"docs":[
{
"id":"SP2514N",
"name":"Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133",
"manu":"Samsung Electronics Co. Ltd.",
"manu_id_s":"samsung",
"cat":["electronics",
"hard drive"],
"features":["7200RPM, 8MB cache, IDE Ultra ATA-133",
"NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor"],
"price":92.0,
"price_c":"92.0,USD",
"popularity":6,
"inStock":true,
"manufacturedate_dt":"2006-02-13T15:26:37Z",
"store":"35.0752,-97.032",
"_version_":1574100232511160320,
"price_c____l_ns":9200}]
}}
Using curl, this query would look like this:
`curl "http://localhost:8983/solr/techproducts/select?q=cat:electronics"`
==== Phrase Search
To search for a multi-term phrase, enclose it in double quotes: `q="multiple terms here"`. For example, search for "CAS latency" by entering that phrase in quotes in the `q` box in the Admin UI.
If you're following along with curl, note that the space between terms must be converted to "+" in a URL, like so:
`curl "http://localhost:8983/solr/techproducts/select?q=\"CAS+latency\""`
We get 2 results:
[source,json]
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":7,
"params":{
"q":"\"CAS latency\""}},
"response":{"numFound":2,"start":0,"maxScore":5.937691,"docs":[
{
"id":"VDBDB1A16",
"name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM",
"manu":"A-DATA Technology Inc.",
"manu_id_s":"corsair",
"cat":["electronics",
"memory"],
"features":["CAS latency 3, 2.7v"],
"popularity":0,
"inStock":true,
"store":"45.18414,-93.88141",
"manufacturedate_dt":"2006-02-13T15:26:37Z",
"payloads":"electronics|0.9 memory|0.1",
"_version_":1574100232590852096},
{
"id":"TWINX2048-3200PRO",
"name":"CORSAIR XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail",
"manu":"Corsair Microsystems Inc.",
"manu_id_s":"corsair",
"cat":["electronics",
"memory"],
"features":["CAS latency 2, 2-3-3-6 timing, 2.75v, unbuffered, heat-spreader"],
"price":185.0,
"price_c":"185.00,USD",
"popularity":5,
"inStock":true,
"store":"37.7752,-122.4232",
"manufacturedate_dt":"2006-02-13T15:26:37Z",
"payloads":"electronics|6.0 memory|3.0",
"_version_":1574100232584560640,
"price_c____l_ns":18500}]
}}
==== Combining Searches
By default, when you search for multiple terms and/or phrases in a single query, Solr will only require that one of them is present in order for a document to match. Documents containing more terms will be sorted higher in the results list.
You can require that a term or phrase is present by prefixing it with a `+`; conversely, to disallow the presence of a term or phrase, prefix it with a `-`.
To find documents that contain both terms "electronics" and "music", enter `+electronics +music` in the `q` box in the Admin UI Query tab.
If you're using curl, you must encode the `+` character because it has a reserved purpose in URLs (encoding the space character). The encoding for `+` is `%2B`:
`curl "http://localhost:8983/solr/techproducts/select?q=%2Belectronics%20%2Bmusic"`
You should only get a single result.
To search for documents that contain the term "electronics" but *don't* contain the term "music", enter `+electronics -music` in the `q` box in the Admin UI. For curl, again, URL encode "+" as "%2B":
`curl "http://localhost:8983/solr/techproducts/select?q=%2Belectronics+-music"`
This time you get 13 results.
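The encoding rules used in the curl commands above can be reproduced with Python's standard library. This sketch only encodes the query strings from this section and does not contact Solr; `%22` in the output is simply the encoded form of the double quote.

```python
from urllib.parse import quote, quote_plus

queries = ['"CAS latency"', "+electronics +music", "+electronics -music"]

for q in queries:
    # quote_plus encodes a space as '+', quote encodes it as '%20';
    # both encode a literal '+' as '%2B'.
    print(f"{q!r:24} plus-style: {quote_plus(q):28} percent-style: {quote(q)}")
```

Either style of space encoding is acceptable in a query string, which is why both `%20` and `+` appear between terms in the examples above.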
==== More Information on Searching
We have only scratched the surface of the search options available in Solr. For more Solr search options, see the section on <<searching.adoc#searching,Searching>>.
=== Exercise 1 Wrap Up
At this point, you've seen how Solr can index data and have done some basic queries. You can choose now to continue to the next example which will introduce more Solr concepts, such as faceting results and managing your schema, or you can strike out on your own.
If you decide not to continue with this tutorial, the data we've indexed so far is likely of little value to you. You can delete your installation and start over, or you can use the `bin/solr` script we started out with to delete this collection:
`bin/solr delete -c techproducts`
And then create a new collection:
`bin/solr create -c <yourCollection> -s 2 -rf 2`
To stop both of the Solr nodes we started, issue the command:
`bin/solr stop -all`
For more information on start/stop and collection options with `bin/solr`, see <<solr-control-script-reference.adoc#solr-control-script-reference,Solr Control Script Reference>>.
[[exercise-2]]
== Exercise 2: Modify the Schema and Index Films Data
This exercise will build on the last one and introduce you to the index schema and Solr's powerful faceting features.
=== Restart Solr
Did you stop Solr after the last exercise? No? Then go ahead to the next section.
If you did, though, and need to restart Solr, issue these commands:
`./bin/solr start -c -p 8983 -s example/cloud/node1/solr`
This starts the first node. When it's done, start the second node and tell it how to connect to ZooKeeper:
`./bin/solr start -c -p 7574 -s example/cloud/node2/solr -z localhost:9983`
=== Create a New Collection
We're going to use a whole new data set in this exercise, so it would be better to have a new collection instead of trying to reuse the one we had before.
One reason for this is we're going to use a feature in Solr called "field guessing", where Solr attempts to guess what type of data is in a field while it's indexing it. It also automatically creates new fields in the schema for new fields that appear in incoming documents. This mode is called "Schemaless". We'll see the benefits and limitations of this approach to help you decide how and where to use it in your real application.
.What is a "schema" and why do I need one?
[sidebar]
****
Solr's schema is a single file (in XML) that stores the details about the fields and field types Solr is expected to understand. The schema defines not only the field or field type names, but also any modifications that should happen to a field before it is indexed. For example, if you want to ensure that a user who enters "abc" and a user who enters "ABC" can both find a document containing the term "ABC", you will want to normalize (lower-case it, in this case) "ABC" when it is indexed, and normalize the user query to be sure of a match. These rules are defined in your schema.
Earlier in the tutorial we mentioned copy fields, which are fields made up of data that originated from other fields. You can also define dynamic fields, which use wildcards (such as `*_t` or `*_s`) to dynamically create fields of a specific field type. These types of rules are also defined in the schema.
****
When you initially started Solr in the first exercise, we had a choice of a configSet to use. The one we chose had a schema that was pre-defined for the data we later indexed. This time, we're going to use a configSet that has a very minimal schema and let Solr figure out from the data what fields to add.
The data you're going to index is related to movies, so start by creating a collection named "films" that uses the `_default` configSet:
`bin/solr create -c films -s 2 -rf 2`
Whoa, wait. We didn't specify a configSet! That's fine, the `_default` is appropriately named, since it's the default and is used if you don't specify one at all.
We did, however, set two parameters, `-s` and `-rf`. Those are the number of shards to split the collection across (2) and how many replicas to create (2). This is equivalent to the options we had during the interactive example from the first exercise.
You should see output like:
[source,subs="verbatim,attributes+"]
----
WARNING: Using _default configset. Data driven schema functionality is enabled by default, which is
NOT RECOMMENDED for production use.
To turn it off:
curl http://localhost:7574/solr/films/config -d '{"set-user-property": {"update.autoCreateFields":"false"}}'
Connecting to ZooKeeper at localhost:9983 ...
INFO - 2017-07-27 15:07:46.191; org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at localhost:9983 ready
Uploading /{solr-docs-version}.0/server/solr/configsets/_default/conf for config films to ZooKeeper at localhost:9983
Creating new collection 'films' using command:
http://localhost:7574/solr/admin/collections?action=CREATE&name=films&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=films
{
"responseHeader":{
"status":0,
"QTime":3830},
"success":{
"192.168.0.110:8983_solr":{
"responseHeader":{
"status":0,
"QTime":2076},
"core":"films_shard2_replica_n1"},
"192.168.0.110:7574_solr":{
"responseHeader":{
"status":0,
"QTime":2494},
"core":"films_shard1_replica_n2"}}}
----
The first thing the command printed was a warning about not using this configSet in production. That's due to some of the limitations we'll cover shortly.
Otherwise, though, the collection should be created. If we go to the Admin UI at http://localhost:8983/solr/#/films/collection-overview we should see the overview screen.
==== Preparing Schemaless for the Films Data
There are two parallel things happening with the schema that comes with the `_default` configSet.
First, we are using a "managed schema", which is configured to only be modified by Solr's Schema API. That means we should not hand-edit it so there isn't confusion about which edits come from which source. Solr's Schema API allows us to make changes to fields, field types, and other types of schema rules.
Second, we are using "field guessing", which is configured in the `solrconfig.xml` file (the file that holds most of Solr's various configuration settings). Field guessing is designed to allow us to start using Solr without having to define all the fields we think will be in our documents before trying to index them. This is why we call it "schemaless", because you can start quickly and let Solr create fields for you as it encounters them in documents.
Sounds great! Well, not really, there are limitations. It's a bit brute force, and if it guesses wrong, you can't change much about a field after data has been indexed without having to reindex. If you only have a few thousand documents that might not be bad, but if you have millions and millions of documents, or, worse, don't have access to the original data anymore, this can be a real problem.
For these reasons, the Solr community does not recommend going to production without a schema that you have defined yourself. By this we mean that the schemaless features are fine to start with, but you should still always make sure your schema matches your expectations for how you want your data indexed and how users are going to query it.
It is possible to mix schemaless features with a defined schema. Using the Schema API, you can define a few fields that you know you want to control, and let Solr guess others that are less important or which you are confident (through testing) will be guessed to your satisfaction. That's what we're going to do here.
===== Create the "name" Field
The films data we are going to index has a small number of fields for each movie: an ID, director name(s), film name, release date, and genre(s).
If you look at one of the files in `example/films`, you'll see the first film is named _.45_, released in 2006. Because it is the first document in the dataset, Solr will guess the field type from the data in that record. If we go ahead and index this data, that first film name is going to indicate to Solr that the field type is a "float" numeric field, and Solr will create a "name" field with the type `FloatPointField`. All data after this record will be expected to be a float.
Well, that's not going to work. We have titles like _A Mighty Wind_ and _Chicken Run_, which are strings - decidedly not numeric and not floats. If we let Solr guess the "name" field is a float, what will happen is later titles will cause an error and indexing will fail. That's not going to get us very far.
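The failure mode is easy to reproduce outside Solr. The following is a toy Python model, not Solr's actual field-guessing code: `guess_type` is a hypothetical function that decides a type from the first value it sees, and later values are then checked against that type.

```python
def guess_type(first_value):
    """Toy guess: treat the field as float if the first value parses as one."""
    try:
        float(first_value)
        return "float"
    except ValueError:
        return "string"

titles = [".45", "A Mighty Wind", "Chicken Run"]
guessed = guess_type(titles[0])  # the first document decides the field type

rejected = []
for title in titles:
    if guessed == "float":
        try:
            float(title)
        except ValueError:
            rejected.append(title)  # a later title that does not fit the guessed type

print(guessed, rejected)
```

In this model the first title locks the field to a numeric type, and every ordinary title after it is rejected.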
What we can do is set up the "name" field in Solr before we index the data to be sure Solr always interprets it as a string. At the command line, enter this curl command:
[source,bash]
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-field": {"name":"name", "type":"text_general", "multiValued":false, "stored":true}}' http://localhost:8983/solr/films/schema
This command uses the Schema API to explicitly define a field named "name" that has the field type "text_general" (a text field). It will not be permitted to have multiple values, but it will be stored (meaning it can be retrieved by queries).
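The same request can be prepared from a program. This is a minimal Python sketch, assuming the default host and port used in this tutorial; `add_field_request` is a hypothetical helper name, and the request is only constructed here, not sent.

```python
import json
import urllib.request

def add_field_request(collection, name, field_type, multi_valued=False, stored=True):
    """Prepare (but do not send) a Schema API add-field request."""
    body = {"add-field": {"name": name, "type": field_type,
                          "multiValued": multi_valued, "stored": stored}}
    return urllib.request.Request(
        f"http://localhost:8983/solr/{collection}/schema",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )

req = add_field_request("films", "name", "text_general")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it against a running Solr: urllib.request.urlopen(req)
```

Keeping the payload as a Python dictionary makes it harder to produce malformed JSON than hand-quoting it on the command line.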
You can also use the Admin UI to create fields, but it offers a bit less control over the properties of your field. It will work for our case, though:
.Creating a field
image::images/solr-tutorial/tutorial-add-field.png[]
===== Create a "catchall" Copy Field
There's one more change to make before we start indexing.
In the first exercise when we queried the documents we had indexed, we didn't have to specify a field to search because the configuration we used was set up to copy fields into a `text` field, and that field was the default when no other field was defined in the query.
The configuration we're using now doesn't have that rule. We would need to define a field to search for every query. We can, however, set up a "catchall field" by defining a copy field that will take all data from all fields and index it into a field named `\_text_`. Let's do that now.
You can use either the Admin UI or the Schema API for this.
At the command line, use the Schema API again to define a copy field:
[source,bash]
curl -X POST -H 'Content-type:application/json' --data-binary '{"add-copy-field" : {"source":"*","dest":"_text_"}}' http://localhost:8983/solr/films/schema
In the Admin UI, choose btn:[Add Copy Field], then fill out the source and destination for your field, as in this screenshot.
.Creating a copy field
image::images/solr-tutorial/tutorial-add-copy-field.png[]
What this does is make a copy of all fields and put the data into the "\_text_" field.
TIP: It can be very expensive to do this with your production data because it tells Solr to effectively index everything twice. It will make indexing slower, and make your index larger. With your production data, you will want to be sure you only copy fields that really warrant it for your application.
OK, now we're ready to index the data and start playing around with it.
=== Index Sample Film Data
The films data we will index is located in the `example/films` directory of your installation. It comes in three formats: JSON, XML and CSV. Pick one of the formats and index it into the "films" collection (in each example, one command is for Unix/MacOS and the other is for Windows):
.To Index JSON Format
[source,subs="verbatim,attributes+"]
----
bin/post -c films example/films/films.json
C:\solr-{solr-docs-version}.0> java -jar -Dc=films -Dauto example\exampledocs\post.jar example\films\*.json
----
.To Index XML Format
[source,subs="verbatim,attributes+"]
----
bin/post -c films example/films/films.xml
C:\solr-{solr-docs-version}.0> java -jar -Dc=films -Dauto example\exampledocs\post.jar example\films\*.xml
----
.To Index CSV Format
[source,subs="verbatim,attributes+"]
----
bin/post -c films example/films/films.csv -params "f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|"
C:\solr-{solr-docs-version}.0> java -jar -Dc=films "-Dparams=f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|" -Dauto example\exampledocs\post.jar example\films\*.csv
----
Each command includes these main parameters:
* `-c films`: this is the Solr collection to index data to.
* `example/films/films.json` (or `films.xml` or `films.csv`): this is the path to the data file to index. You could simply supply the directory where this file resides, but since you know the format you want to index, specifying the exact file for that format is more efficient.
Note the CSV command includes extra parameters. This is to ensure multi-valued entries in the "genre" and "directed_by" columns are split by the pipe (`|`) character, used in this file as a separator. Telling Solr to split these columns this way will ensure proper indexing of the data.
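To picture what the split parameters do, here is a toy Python model of the behavior. It is not Solr's CSV loader; the sample row is made up for illustration and only mimics the shape of `films.csv`.

```python
import csv
import io

# Hypothetical one-row CSV in the same shape as films.csv (values are made up).
sample = io.StringIO(
    "name,directed_by,genre\n"
    "Example Film,Director One|Director Two,Drama|Comedy\n"
)

SPLIT_FIELDS = {"genre", "directed_by"}  # columns given f.<field>.split=true
SEPARATOR = "|"                          # value of f.<field>.separator

docs = []
for row in csv.DictReader(sample):
    doc = {k: (v.split(SEPARATOR) if k in SPLIT_FIELDS else v) for k, v in row.items()}
    docs.append(doc)

print(docs[0])
```

Without the split, each of those columns would be indexed as one long string containing the pipe characters, and a facet on genre would count the whole combination as a single value.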
Each command will produce output similar to the following, which was captured while indexing the JSON file:
[source,bash,subs="verbatim,attributes"]
----
$ ./bin/post -c films example/films/films.json
/bin/java -classpath /solr-{solr-docs-version}.0/dist/solr-core-{solr-docs-version}.0.jar -Dauto=yes -Dc=films -Ddata=files org.apache.solr.util.SimplePostTool example/films/films.json
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/films/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file films.json (application/json) to [base]/json/docs
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/films/update...
Time spent: 0:00:00.878
----
Hooray!
If you go to the Query screen in the Admin UI for films (http://localhost:8983/solr/#/films/query) and hit btn:[Execute Query] you should see 1100 results, with the first 10 returned to the screen.
Let's do a query to see if the "catchall" field worked properly. Enter "comedy" in the `q` box and hit btn:[Execute Query] again. You should get 417 results. Feel free to play around with other searches before we move on to faceting.
[[tutorial-faceting]]
=== Faceting
One of Solr's most popular features is faceting. Faceting allows the search results to be arranged into subsets (or buckets, or categories), providing a count for each subset. There are several types of faceting: field values, numeric and date ranges, pivots (decision tree), and arbitrary query faceting.
==== Field Facets
In addition to providing search results, a Solr query can return the number of documents that contain each unique value in the whole result set.
On the Admin UI Query tab, if you check the `facet` checkbox, you'll see a few facet-related options appear:
.Facet options in the Query screen
image::images/solr-tutorial/tutorial-admin-ui-facet-options.png[Solr Quick Start: Query tab facet options]
To see facet counts from all documents (`q=\*:*`): turn on faceting (`facet=true`), and specify the field to facet on via the `facet.field` param. If you only want facets, and no document contents, specify `rows=0`. The `curl` command below will return facet counts for the `genre_str` field:
`curl "http://localhost:8983/solr/films/select?q=\*:*&rows=0&facet=true&facet.field=genre_str"`
In your terminal, you'll see something like:
[source,json]
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":11,
"params":{
"q":"*:*",
"facet.field":"genre_str",
"rows":"0",
"facet":"true"}},
"response":{"numFound":1100,"start":0,"maxScore":1.0,"docs":[]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{
"genre_str":[
"Drama",552,
"Comedy",389,
"Romance Film",270,
"Thriller",259,
"Action Film",196,
"Crime Fiction",170,
"World cinema",167]},
"facet_ranges":{},
"facet_intervals":{},
"facet_heatmaps":{}}}
We've truncated the output here a little bit, but in the `facet_counts` section, you see by default you get a count of the number of documents using each genre for every genre in the index. Solr has a parameter `facet.mincount` that you could use to limit the facets to only those that contain a certain number of documents (this parameter is not shown in the UI). Or, perhaps you do want all the facets, and you'll let your application's front-end control how it's displayed to users.
If you wanted to return only the genres that have at least 200 documents, you could do something like this:
`curl "http://localhost:8983/solr/films/select?q=\*:*&facet.field=genre_str&facet.mincount=200&facet=on&rows=0"`
You should only see 4 facets returned.
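Field facets come back as a flat list that alternates values and counts, which is awkward to use directly. The sketch below, in Python, pairs them up using the truncated sample list shown earlier and applies the same threshold on the client side; it assumes nothing about Solr beyond that list format.

```python
# Flat [value, count, value, count, ...] list copied from the sample response above.
genre_str = ["Drama", 552, "Comedy", 389, "Romance Film", 270, "Thriller", 259,
             "Action Film", 196, "Crime Fiction", 170, "World cinema", 167]

# Pair each value with the count that follows it.
counts = dict(zip(genre_str[::2], genre_str[1::2]))

# Client-side equivalent of facet.mincount=200 for this truncated list.
at_least_200 = {genre: n for genre, n in counts.items() if n >= 200}
print(at_least_200)
```

Filtering on the server with `facet.mincount` is usually preferable because it keeps the response small, but the pairing step is needed either way.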
There are a great many other parameters available to help you control how Solr constructs the facets and facet lists. We'll cover some of them in this exercise, but you can also see the section <<faceting.adoc#faceting,Faceting>> for more detail.
==== Range Facets
For numerics or dates, it's often desirable to partition the facet counts into ranges rather than discrete values. A prime example of numeric range faceting, using the example techproducts data from our previous exercise, is `price`. In the `/browse` UI, it looks like this:
.Range facets
image::images/solr-tutorial/tutorial-range-facet.png[Solr Quick Start: Range facets]
The films data includes the release date for films, and we could use that to create date range facets, which are another common use for range facets.
The Solr Admin UI doesn't yet support range facet options, so you will need to use curl or similar command line tool for the following examples.
Suppose we construct a query that looks like this:
[source,bash]
curl 'http://localhost:8983/solr/films/select?q=*:*&rows=0'\
'&facet=true'\
'&facet.range=initial_release_date'\
'&facet.range.start=NOW-20YEAR'\
'&facet.range.end=NOW'\
'&facet.range.gap=%2B1YEAR'
This will request all films and ask for them to be grouped by year starting with 20 years ago (our earliest release date is in 2000) and ending today. Note that this query again URL encodes a `+` as `%2B`.
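To see how `facet.range.start`, `facet.range.end`, and `facet.range.gap` carve the timeline into buckets, here is a toy Python model. It is not Solr's date math implementation; the two fixed timestamps stand in for `NOW-20YEAR` and `NOW` and are taken from the sample response that follows.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%fZ"
start = datetime.strptime("1997-07-28T17:12:06.919Z", FMT)
end = datetime.strptime("2017-07-28T17:12:06.919Z", FMT)

buckets = []
current = start
while current < end:
    buckets.append(current)
    # "+1YEAR" gap; a plain year bump is enough here because the date is not Feb 29.
    current = current.replace(year=current.year + 1)

print(len(buckets), buckets[0].year, buckets[-1].year)
```

Each bucket starts at one of these timestamps and covers one year. Solr performs this bucketing itself when it handles the curl query above.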
In the terminal you will see:
[source,json]
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":8,
"params":{
"facet.range":"initial_release_date",
"facet.limit":"300",
"q":"*:*",
"facet.range.gap":"+1YEAR",
"rows":"0",
"facet":"on",
"facet.range.start":"NOW-20YEAR",
"facet.range.end":"NOW"}},
"response":{"numFound":1100,"start":0,"maxScore":1.0,"docs":[]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{},
"facet_ranges":{
"initial_release_date":{
"counts":[
"1997-07-28T17:12:06.919Z",0,
"1998-07-28T17:12:06.919Z",0,
"1999-07-28T17:12:06.919Z",48,
"2000-07-28T17:12:06.919Z",82,
"2001-07-28T17:12:06.919Z",103,
"2002-07-28T17:12:06.919Z",131,
"2003-07-28T17:12:06.919Z",137,
"2004-07-28T17:12:06.919Z",163,
"2005-07-28T17:12:06.919Z",189,
"2006-07-28T17:12:06.919Z",92,
"2007-07-28T17:12:06.919Z",26,
"2008-07-28T17:12:06.919Z",7,
"2009-07-28T17:12:06.919Z",3,
"2010-07-28T17:12:06.919Z",0,
"2011-07-28T17:12:06.919Z",0,
"2012-07-28T17:12:06.919Z",1,
"2013-07-28T17:12:06.919Z",1,
"2014-07-28T17:12:06.919Z",1,
"2015-07-28T17:12:06.919Z",0,
"2016-07-28T17:12:06.919Z",0],
"gap":"+1YEAR",
"start":"1997-07-28T17:12:06.919Z",
"end":"2017-07-28T17:12:06.919Z"}},
"facet_intervals":{},
"facet_heatmaps":{}}}
==== Pivot Facets
Another faceting type is pivot facets, also known as "decision trees", allowing two or more fields to be nested for all the various possible combinations. Using the films data, pivot facets can be used to see how many of the films in the "Drama" category (the `genre_str` field) were directed by each director. Here's how to get at the raw data for this scenario:
`curl "http://localhost:8983/solr/films/select?q=\*:*&rows=0&facet=on&facet.pivot=genre_str,directed_by_str"`
This results in the following response, which shows a facet for each category and director combination:
[source,json]
{"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":1147,
"params":{
"q":"*:*",
"facet.pivot":"genre_str,directed_by_str",
"rows":"0",
"facet":"on"}},
"response":{"numFound":1100,"start":0,"maxScore":1.0,"docs":[]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{},
"facet_ranges":{},
"facet_intervals":{},
"facet_heatmaps":{},
"facet_pivot":{
"genre_str,directed_by_str":[{
"field":"genre_str",
"value":"Drama",
"count":552,
"pivot":[{
"field":"directed_by_str",
"value":"Ridley Scott",
"count":5},
{
"field":"directed_by_str",
"value":"Steven Soderbergh",
"count":5},
{
"field":"directed_by_str",
"value":"Michael Winterbottom",
"count":4}]}]}}}
We've truncated this output as well - you will see a lot of genres and directors in your screen.
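Because pivot results are nested, it often helps to flatten them into rows before displaying them. The following Python sketch walks the truncated pivot data from the sample response above; `flatten` is a hypothetical helper, and it assumes only the `value`, `count`, and `pivot` keys shown in that response.

```python
# Truncated pivot data copied from the sample response above.
facet_pivot = [{
    "field": "genre_str", "value": "Drama", "count": 552,
    "pivot": [
        {"field": "directed_by_str", "value": "Ridley Scott", "count": 5},
        {"field": "directed_by_str", "value": "Steven Soderbergh", "count": 5},
        {"field": "directed_by_str", "value": "Michael Winterbottom", "count": 4},
    ],
}]

def flatten(nodes, path=()):
    """Yield (value_path, count) for every leaf of a pivot tree."""
    for node in nodes:
        here = path + (node["value"],)
        children = node.get("pivot")
        if children:
            yield from flatten(children, here)
        else:
            yield here, node["count"]

rows = list(flatten(facet_pivot))
for values, count in rows:
    print(" > ".join(values), count)
```

The recursion means the same helper works if you add a third field to `facet.pivot`.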
=== Exercise 2 Wrap Up
In this exercise, we learned a little bit more about how Solr organizes data in the indexes, and how to work with the Schema API to manipulate the schema file. We also learned a bit about facets in Solr, including range facets and pivot facets. In both areas, we've only scratched the surface of the available options. If you can dream it, it might be possible!
Like our previous exercise, this data may not be relevant to your needs. We can clean up our work by deleting the collection. To do that, issue this command at the command line:
`bin/solr delete -c films`
[[exercise-3]]
== Exercise 3: Index Your Own Data
For this last exercise, work with a dataset of your choice. This can be files on your local hard drive, a set of data you have worked with before, or maybe a sample of the data you intend to index to Solr for your production application.
This exercise is intended to get you thinking about what you will need to do for your application:
* What sorts of data do you need to index?
* What will you need to do to prepare Solr for your data (such as creating specific fields, setting up copy fields, determining analysis rules, etc.)?
* What kinds of search options do you want to provide to users?
* How much testing will you need to do to ensure everything works the way you expect?
=== Create Your Own Collection
Before you get started, create a new collection, named whatever you'd like. In this example, the collection will be named "localDocs"; if you choose a different name, substitute it in the commands that follow.
`./bin/solr create -c localDocs -s 2 -rf 2`
Again, as we saw from Exercise 2 above, this will use the `_default` configSet and all the schemaless features it provides. As we noted previously, this may cause problems when we index our data. You may need to iterate on indexing a few times before you get the schema right.
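If field guessing gets a field wrong, one option is to define that field explicitly with the Schema API before indexing, as we did in Exercise 2. The following is a minimal sketch of that idea in Python, assuming a node at `http://localhost:8983` and the "localDocs" collection; the field name `title`, the type `text_general`, and both helper function names are placeholders chosen for illustration.

```python
import json
import urllib.request


def add_field_command(name, field_type, stored=True, multi_valued=False):
    """Build the JSON body for a Schema API add-field command."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "multiValued": multi_valued,
        }
    }


def post_schema_command(command, collection="localDocs",
                        base_url="http://localhost:8983/solr"):
    """POST a schema command; this needs a running Solr node."""
    request = urllib.request.Request(
        f"{base_url}/{collection}/schema",
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# "title" and "text_general" are placeholders for illustration only.
command = add_field_command("title", "text_general")
print(json.dumps(command, indent=2))
# To apply it against a running node: post_schema_command(command)
```

Only the payload is built and printed here, so the sketch can be read and adjusted before anything is sent to Solr.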
=== Indexing Ideas
Solr has lots of ways to index data. Choose one of the approaches below and try it out with your system:
Local Files with bin/post::
If you have a local directory of files, the Post Tool (`bin/post`) can index a directory of files. We saw this in action in our first exercise.
+
We used only JSON, XML and CSV in our exercises, but the Post Tool can also handle HTML, PDF, Microsoft Office formats (such as MS Word), plain text, and more.
+
In this example, assume there is a directory named "Documents" locally. To index it, we would issue a command like this (correcting the collection name after the `-c` parameter as needed):
+
`./bin/post -c localDocs ~/Documents`
+
You may get errors as it works through your documents. These might be caused by the field guessing, or the file type may not be supported. Indexing content such as this demonstrates the need to plan Solr for your data, which requires understanding it and perhaps also some trial and error.
DataImportHandler::
Solr includes a tool called the <<uploading-structured-data-store-data-with-the-data-import-handler.adoc#uploading-structured-data-store-data-with-the-data-import-handler,Data Import Handler (DIH)>> which can connect to databases (if you have a JDBC driver), mail servers, or other structured data sources. There are several examples included for feeds, GMail, and a small HSQL database.
+
The `README.txt` file in `example/example-DIH` will give you details on how to start working with this tool.
SolrJ::
SolrJ is a Java-based client for interacting with Solr. Use <<using-solrj.adoc#using-solrj,SolrJ>> for JVM-based languages or other <<client-apis.adoc#client-apis,Solr clients>> to programmatically create documents to send to Solr.
Documents Screen::
Use the Admin UI <<documents-screen.adoc#documents-screen,Documents tab>> (at http://localhost:8983/solr/#/localDocs/documents) to paste in a document to be indexed, or select `Document Builder` from the `Document Type` dropdown to build a document one field at a time. Click on the btn:[Submit Document] button below the form to index your document.
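As a rough illustration of creating documents programmatically, the sketch below builds an HTTP request that sends a small batch of JSON documents to the collection's update handler. It is not SolrJ; it uses only Python's standard library, and it assumes a node at `http://localhost:8983` and the "localDocs" collection. The document fields and the `build_update_request` helper are made up for the example.

```python
import json
import urllib.request


def build_update_request(docs, collection="localDocs",
                         base_url="http://localhost:8983/solr"):
    """Build a POST request that adds docs and commits them."""
    return urllib.request.Request(
        f"{base_url}/{collection}/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


# Hypothetical documents; each one carries a value for the uniqueKey field "id".
docs = [
    {"id": "doc-1", "title": "First example document"},
    {"id": "doc-2", "title": "Second example document"},
]
request = build_update_request(docs)
print(request.get_method(), request.full_url)
# To send it to a running node:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

With the schemaless features of the `_default` configSet, fields such as `title` would be guessed on first use, so the same caveats about field guessing apply here.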
=== Updating Data
You may notice that even if you index content in this tutorial more than once, it does not duplicate the results found. This is because the example Solr schema (a file named either `managed-schema` or `schema.xml`) specifies a `uniqueKey` field called `id`. Whenever you POST commands to Solr to add a document with the same value for the `uniqueKey` as an existing document, it automatically replaces it for you.
You can see that this has happened by looking at the values for `numDocs` and `maxDoc` in the core-specific Overview section of the Solr Admin UI.
`numDocs` represents the number of searchable documents in the index (and will be larger than the number of XML, JSON, or CSV files since some files contained more than one document). The `maxDoc` value may be larger as the `maxDoc` count includes logically deleted documents that have not yet been physically removed from the index. You can re-post the sample files over and over again as much as you want and `numDocs` will never increase, because the new documents will constantly be replacing the old.
Go ahead and edit any of the existing example data files, change some of the data, and re-run the Post Tool (`bin/post`). You'll see your changes reflected in subsequent searches.
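The relationship between `numDocs` and `maxDoc` can be pictured with a toy model. The class below is only an illustration of the overwrite-by-`uniqueKey` behavior described above. It is not how Solr stores data, and it ignores the segment merges that eventually remove deleted documents.

```python
class ToyIndex:
    """Toy model of overwrite-by-uniqueKey; not how Solr stores data."""

    def __init__(self):
        self.live = {}      # uniqueKey value -> current document
        self.deleted = 0    # replaced documents not yet purged

    def add(self, doc):
        if doc["id"] in self.live:
            # The old version is only marked as deleted, not removed.
            self.deleted += 1
        self.live[doc["id"]] = doc

    @property
    def num_docs(self):
        return len(self.live)

    @property
    def max_doc(self):
        return len(self.live) + self.deleted


index = ToyIndex()
batch = [{"id": "a", "name": "one"}, {"id": "b", "name": "two"}]
for _ in range(3):          # post the same two documents three times
    for doc in batch:
        index.add(doc)

print(index.num_docs, index.max_doc)
```

In this model the searchable count stays at two however often the batch is re-posted, while the total including replaced versions keeps growing.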
=== Deleting Data
If you need to iterate a few times to get your schema right, you may want to delete documents to clear out the collection and try again. Note, however, that merely removing documents doesn't change the underlying field definitions. Essentially, this will allow you to reindex your data after making changes to fields for your needs.
You can delete data by POSTing a delete command to the update URL and specifying the value of the document's unique key field, or a query that matches multiple documents (be careful with that one!). We can use `bin/post` to delete documents also if we structure the request properly.
Execute the following command to delete a specific document:
`bin/post -c localDocs -d "<delete><id>SP2514N</id></delete>"`
To delete all documents, you can use a "delete-by-query" command like:
`bin/post -c localDocs -d "<delete><query>*:*</query></delete>"`
You can also modify the above to only delete documents that match a specific query.
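If you generate delete commands from a script, it helps to escape the ID or query so that characters such as `&` or `<` do not break the XML. Here is a minimal sketch that builds the same two command shapes shown above; the helper names and the `name:"R&D"` query are hypothetical.

```python
from xml.sax.saxutils import escape


def delete_by_id(doc_id):
    """Return the XML command that deletes one document by its uniqueKey."""
    return f"<delete><id>{escape(doc_id)}</id></delete>"


def delete_by_query(query):
    """Return the XML command that deletes every document matching query."""
    return f"<delete><query>{escape(query)}</query></delete>"


print(delete_by_id("SP2514N"))
print(delete_by_query("*:*"))
# A hypothetical query containing a character that must be escaped in XML.
print(delete_by_query('name:"R&D"'))
```

Each resulting string can be passed as the value of the `-d` parameter of `bin/post`, as in the commands above.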
=== Exercise 3 Wrap Up
At this point, you're ready to start working on your own.
Jump ahead to the overall <<Wrapping Up,wrap up>> when you're ready to stop Solr and remove all the examples you worked with and start fresh.
== Spatial Queries
Solr has sophisticated geospatial support, including searching within a specified distance range of a given location (or within a bounding box), sorting by distance, or even boosting results by the distance.
Some of the example techproducts documents we indexed in Exercise 1 have locations associated with them to illustrate the spatial capabilities. To re-index this data, see <<index-the-techproducts-data,Exercise 1>>.
Spatial queries can be combined with any other types of queries, such as in this example of querying for "ipod" within 10 kilometers of San Francisco:
.Spatial queries and results
image::images/solr-tutorial/tutorial-spatial.png[Solr Quick Start: spatial search]
This is from Solr's example search UI (called `/browse`), which has a nice feature to show a map for each item and allow easy selection of the location to search near. You can see this yourself by going to <http://localhost:8983/solr/techproducts/browse?q=ipod&pt=37.7752%2C-122.4232&d=10&sfield=store&fq=%7B%21bbox%7D&queryOpts=spatial&queryOpts=spatial> in a browser.
To learn more about Solr's spatial capabilities, see the section <<spatial-search.adoc#spatial-search,Spatial Search>>.
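The spatial parameters in the URL above (`pt`, `d`, `sfield`, and the `{!bbox}` filter) can also be assembled in code. The sketch below builds the equivalent query string with Python's standard library. Pointing it at the `/select` handler instead of `/browse` is our assumption, as is the `spatial_query` helper name.

```python
from urllib.parse import urlencode


def spatial_query(keywords, lat, lon, distance_km, sfield="store"):
    """Build a query string that filters results to a bounding box."""
    params = [
        ("q", keywords),
        ("pt", f"{lat},{lon}"),     # center point as "lat,lon"
        ("d", distance_km),         # distance in kilometers
        ("sfield", sfield),         # the field holding each location
        ("fq", "{!bbox}"),          # filter to a box around the point
    ]
    return urlencode(params)


query_string = spatial_query("ipod", 37.7752, -122.4232, 10)
# Assumes the techproducts collection is served from a local node.
print("http://localhost:8983/solr/techproducts/select?" + query_string)
```

Because `urlencode` percent-encodes the comma and the braces, the result matches the encoded form of those parameters in the `/browse` URL above.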
== Wrapping Up
If you've run the full set of commands in this tutorial, you have done the following:
* Launched Solr into SolrCloud mode, two nodes, two collections including shards and replicas
* Indexed several types of files
* Used the Schema API to modify your schema
* Opened the admin console, used its query interface to get results
* Opened the /browse interface to explore Solr's features in a more friendly and familiar interface
Nice work!
== Cleanup
As you work through this tutorial, you may want to stop Solr and reset the environment back to the starting point. The following command line will stop Solr and remove the directories for each of the two nodes that were created all the way back in Exercise 1:
`bin/solr stop -all ; rm -Rf example/cloud/`
== Where to next?
This Guide will be your best resource for learning more about Solr.
Solr also has a robust community made up of people happy to help you get started. For more information, check out the Solr website's http://lucene.apache.org/solr/resources.html[Resources page].


@@ -1,6 +1,6 @@
-= Upgrading Solr
-:page-shortname: upgrading-solr
-:page-permalink: upgrading-solr.html
+= Solr Upgrade Notes
+:page-shortname: solr-upgrade-notes
+:page-permalink: solr-upgrade-notes.html
 // Licensed to the Apache Software Foundation (ASF) under one
 // or more contributor license agreements. See the NOTICE file
 // distributed with this work for additional information
@@ -18,10 +18,16 @@
 // specific language governing permissions and limitations
 // under the License.
-If you are already using Solr 6.5, Solr 6.6 should not present any major problems. However, you should review the {solr-javadocs}/changes/Changes.html[`CHANGES.txt`] file found in your Solr package for changes and updates that may affect your existing implementation. Detailed steps for upgrading a Solr cluster can be found in the appendix: <<upgrading-a-solr-cluster.adoc#upgrading-a-solr-cluster,Upgrading a Solr Cluster>>. The following notes describe changes to Solr in recent releases that you should be aware of before upgrading.
+These notes are meant to highlight the biggest changes that may impact the largest number of implementations. It is not a comprehensive list of all changes to Solr in any release.
+When planning your Solr upgrade, consider the customizations you have made to your system and review the {solr-javadocs}/changes/Changes.html[`CHANGES.txt`] file found in your Solr package. That file includes all of the changes and updates that may affect your existing implementation. Detailed steps for upgrading a Solr cluster can be found in the appendix: <<upgrading-a-solr-cluster.adoc#upgrading-a-solr-cluster,Upgrading a Solr Cluster>>.
 == Upgrading from 6.5.x
+If you are already using Solr 6.5, Solr 6.6 should not present any major problems.
 * Solr contribs map-reduce, morphlines-core and morphlines-cell have been removed.
 * JSON Facet API now uses hyper-log-log for numBuckets cardinality calculation and calculates cardinality before filtering buckets by any mincount greater than 1.


@@ -21,24 +21,19 @@
 This page covers how to upgrade an existing Solr cluster that was installed using the <<taking-solr-to-production.adoc#taking-solr-to-production,service installation scripts>>.
-[IMPORTANT]
-====
-The steps outlined on this page assume you use the default service name of "```solr```". If you use an alternate service name or Solr installation directory, some of the paths and commands mentioned below will have to be modified accordingly.
-====
+IMPORTANT: The steps outlined on this page assume you use the default service name of `solr`. If you use an alternate service name or Solr installation directory, some of the paths and commands mentioned below will have to be modified accordingly.
 == Planning Your Upgrade
 Here is a checklist of things you need to prepare before starting the upgrade process:
-1. Examine the <<upgrading-solr.adoc#upgrading-solr,Upgrading Solr>> page to determine if any behavior changes in the new version of Solr will affect your installation.
-2. If not using replication (ie: collections with replicationFactor > 1), then you should make a backup of each collection. If all of your collections use replication, then you don't technically need to make a backup since you will be upgrading and verifying each node individually.
-3. Determine which Solr node is currently hosting the Overseer leader process in SolrCloud, as you should upgrade this node last. To determine the Overseer, use the Overseer Status API, see: <<collections-api.adoc#collections-api,Collections API>>.
-4. Plan to perform your upgrade during a system maintenance window if possible. You'll be doing a rolling restart of your cluster (each node, one-by-one), but we still recommend doing the upgrade when system usage is minimal.
-5. Verify the cluster is currently healthy and all replicas are active, as you should not perform an upgrade on a degraded cluster.
-6. Re-build and test all custom server-side components against the new Solr JAR files.
-7. Determine the values of the following variables that are used by the Solr Control Scripts:
+. Examine the <<solr-upgrade-notes.adoc#solr-upgrade-notes,Solr Upgrade Notes>> page to determine if any behavior changes in the new version of Solr will affect your installation.
+. If not using replication (ie: collections with replicationFactor > 1), then you should make a backup of each collection. If all of your collections use replication, then you don't technically need to make a backup since you will be upgrading and verifying each node individually.
+. Determine which Solr node is currently hosting the Overseer leader process in SolrCloud, as you should upgrade this node last. To determine the Overseer, use the Overseer Status API, see: <<collections-api.adoc#collections-api,Collections API>>.
+. Plan to perform your upgrade during a system maintenance window if possible. You'll be doing a rolling restart of your cluster (each node, one-by-one), but we still recommend doing the upgrade when system usage is minimal.
+. Verify the cluster is currently healthy and all replicas are active, as you should not perform an upgrade on a degraded cluster.
+. Re-build and test all custom server-side components against the new Solr JAR files.
+. Determine the values of the following variables that are used by the Solr Control Scripts:
 * `ZK_HOST`: The ZooKeeper connection string your current SolrCloud nodes use to connect to ZooKeeper; this value will be the same for all nodes in the cluster.
 * `SOLR_HOST`: The hostname each Solr node used to register with ZooKeeper when joining the SolrCloud cluster; this value will be used to set the *host* Java system property when starting the new Solr process.
 * `SOLR_PORT`: The port each Solr node is listening on, such as 8983.