SOLR-11050: remove unneeded anchors for pages that have no incoming links from other pages

This commit is contained in:
Cassandra Targett 2017-07-12 11:56:50 -05:00
parent 47731ce0a4
commit 74ab16168c
72 changed files with 169 additions and 625 deletions

View File

@ -1,6 +1,7 @@
= About This Guide
:page-shortname: about-this-guide
:page-permalink: about-this-guide.html
:page-toc: false
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
@ -26,38 +27,13 @@ Designed to provide high-level documentation, this guide is intended to be more
The material as presented assumes that you are familiar with some basic search concepts and that you can read XML. It does not assume that you are a Java programmer, although knowledge of Java is helpful when working directly with Lucene or when developing custom extensions to a Lucene/Solr installation.
[[AboutThisGuide-SpecialInlineNotes]]
== Special Inline Notes
Special notes are included throughout these pages. There are several types of notes:
Information blocks::
+
NOTE: These provide additional information that's useful for you to know.
Important::
+
IMPORTANT: These provide information that is critical for you to know.
Tip::
+
TIP: These provide helpful tips.
Caution::
+
CAUTION: These provide details on scenarios or configurations you should be careful with.
Warning::
+
WARNING: These are meant to warn you about a possibly dangerous change or action.
[[AboutThisGuide-HostsandPortExamples]]
== Hosts and Port Examples
The default port when running Solr is 8983. The samples, URLs and screenshots in this guide may show different ports, because the port number that Solr uses is configurable. If you have not customized your installation of Solr, please make sure that you use port 8983 when following the examples, or configure your own installation to use the port numbers shown in the examples. For information about configuring port numbers, see the section <<managing-solr.adoc#managing-solr,Managing Solr>>.
The default port when running Solr is 8983. The samples, URLs and screenshots in this guide may show different ports, because the port number that Solr uses is configurable.
Similarly, URL examples use 'localhost' throughout; if you are accessing Solr from a location remote to the server hosting Solr, replace 'localhost' with the proper domain or IP where Solr is running.
If you have not customized your installation of Solr, please make sure that you use port 8983 when following the examples, or configure your own installation to use the port numbers shown in the examples. For information about configuring port numbers, see the section <<managing-solr.adoc#managing-solr,Managing Solr>>.
Similarly, URL examples use `localhost` throughout; if you are accessing Solr from a location remote to the server hosting Solr, replace `localhost` with the proper domain or IP where Solr is running.
For example, we might provide a sample query like:
@ -67,7 +43,32 @@ There are several items in this URL you might need to change locally. First, if
`\http://www.example.com/solr/mycollection/select?q=brown+cow`
[[AboutThisGuide-Paths]]
== Paths
Path information is given relative to `solr.home`, which is the location under the main Solr installation where Solr's collections and their `conf` and `data` directories are stored. When running the various examples mentioned throughout this tutorial (e.g., `bin/solr -e techproducts`), the `solr.home` will be a sub-directory of `example/` created for you automatically.
Path information is given relative to `solr.home`, which is the location under the main Solr installation where Solr's collections and their `conf` and `data` directories are stored.
When running the various examples mentioned throughout this tutorial (e.g., `bin/solr -e techproducts`), the `solr.home` will be a sub-directory of `example/` created for you automatically.
== Special Inline Notes
Special notes are included throughout these pages. There are several types of notes:
=== Information blocks
NOTE: These provide additional information that's useful for you to know.
=== Important
IMPORTANT: These provide information that is critical for you to know.
=== Tip
TIP: These provide helpful tips.
=== Caution
CAUTION: These provide details on scenarios or configurations you should be careful with.
=== Warning
WARNING: These are meant to warn you about a possibly dangerous change or action.

View File

@ -37,7 +37,6 @@ A `TypeTokenFilterFactory` is available that creates a `TypeTokenFilter` that fi
For a complete list of the available TokenFilters, see the section <<tokenizers.adoc#tokenizers,Tokenizers>>.
[[AboutTokenizers-WhenTouseaCharFiltervs.aTokenFilter]]
== When To use a CharFilter vs. a TokenFilter
There are several pairs of CharFilters and TokenFilters that have related (e.g., `MappingCharFilter` and `ASCIIFoldingFilter`) or nearly identical (e.g., `PatternReplaceCharFilterFactory` and `PatternReplaceFilterFactory`) functionality, and it may not always be obvious which is the best choice.

View File

@ -30,12 +30,10 @@ In addition to requiring that Solr by running in <<solrcloud.adoc#solrcloud,Solr
Before enabling this feature, users should carefully consider the issues discussed in the <<Securing Runtime Libraries>> section below.
====
[[AddingCustomPluginsinSolrCloudMode-UploadingJarFiles]]
== Uploading Jar Files
The first step is to use the <<blob-store-api.adoc#blob-store-api,Blob Store API>> to upload your jar files. This will put your jars in the `.system` collection and distribute them across your SolrCloud nodes. These jars are added to a separate classloader and are only accessible to components that are configured with the property `runtimeLib=true`. These components are loaded lazily because the `.system` collection may not be loaded when a particular core is loaded.
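For illustration, a hedged sketch of such an upload (the local jar file and blob name here are placeholders):

[source,bash]
----
curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @myplugin.jar \
  http://localhost:8983/solr/.system/blob/myplugin
----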
[[AddingCustomPluginsinSolrCloudMode-ConfigAPICommandstouseJarsasRuntimeLibraries]]
== Config API Commands to use Jars as Runtime Libraries
The runtime library feature uses a special set of commands for the <<config-api.adoc#config-api,Config API>> to add, update, or remove jar files currently available in the blob store to the list of runtime libraries.
@ -74,14 +72,12 @@ curl http://localhost:8983/solr/techproducts/config -H 'Content-type:application
}'
----
[[AddingCustomPluginsinSolrCloudMode-SecuringRuntimeLibraries]]
== Securing Runtime Libraries
A drawback of this feature is that it could be used to load malicious executable code into the system. However, it is possible to restrict the system to load only trusted jars using http://en.wikipedia.org/wiki/Public_key_infrastructure[PKI] to verify that the executables loaded into the system are trustworthy.
The following steps will allow you to enable security for this feature. The instructions assume you have started all your Solr nodes with the `-Denable.runtime.lib=true` option.
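For example, a node might be started with that property along these lines (a sketch; the cloud and ZooKeeper options shown are assumptions about your deployment):

[source,bash]
----
bin/solr start -c -z localhost:2181 -Denable.runtime.lib=true
----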
[[Step1_GenerateanRSAPrivateKey]]
=== Step 1: Generate an RSA Private Key
The first step is to generate an RSA private key. The example below uses a 512-bit key, but you should use the strength appropriate to your needs.
@ -91,7 +87,6 @@ The first step is to generate an RSA private key. The example below uses a 512-b
$ openssl genrsa -out priv_key.pem 512
----
[[Step2_OutputthePublicKey]]
=== Step 2: Output the Public Key
The public portion of the key should be output in DER format so Java can read it.
@ -101,7 +96,6 @@ The public portion of the key should be output in DER format so Java can read it
$ openssl rsa -in priv_key.pem -pubout -outform DER -out pub_key.der
----
[[Step3_LoadtheKeytoZooKeeper]]
=== Step 3: Load the Key to ZooKeeper
The `.der` files that are output from Step 2 should then be loaded to ZooKeeper under a node `/keys/exe` so they are available to every node. You can load any number of public keys to that node and all are valid. If a key is removed from the directory, the signatures created with that key will cease to be valid. So, before removing a key, make sure to update your runtime library configurations with valid signatures using the `update-runtimelib` command.
@ -130,7 +124,6 @@ $ .bin/zkCli.sh -server localhost:9983
After this, any attempt to load a jar will fail. All your jars must be signed with one of your private keys for Solr to trust them. The process to sign your jars and use the signature is outlined in Steps 4-6.
[[Step4_SignthejarFile]]
=== Step 4: Sign the jar File
Next you need to sign the sha1 digest of your jar file and get the base64 string.
@ -142,7 +135,6 @@ $ openssl dgst -sha1 -sign priv_key.pem myjar.jar | openssl enc -base64
The output of this step is a string that you will need when adding the jar to your classpath in Step 6 below.
[[Step5_LoadthejartotheBlobStore]]
=== Step 5: Load the jar to the Blob Store
Load your jar to the Blob store, using the <<blob-store-api.adoc#blob-store-api,Blob Store API>>. This step does not require a signature; you will need the signature in Step 6 to add it to your classpath.
@ -155,7 +147,6 @@ http://localhost:8983/solr/.system/blob/{blobname}
The blob name that you give the jar file in this step will be used as the name in the next step.
[[Step6_AddthejartotheClasspath]]
=== Step 6: Add the jar to the Classpath
Finally, add the jar to the classpath using the Config API as detailed above. In this step, you will need to provide the signature of the jar that you got in Step 4.
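A hedged sketch of such a request (the collection, blob name, version, and signature are placeholders for your own values from Steps 4 and 5):

[source,bash]
----
curl http://localhost:8983/solr/techproducts/config -H 'Content-type:application/json' -d '{
  "add-runtimelib": {
    "name": "myplugin",
    "version": 1,
    "sig": "<signature-from-step-4>"
  }
}'
----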

View File

@ -60,7 +60,6 @@ In this case, no Analyzer class was specified on the `<analyzer>` element. Rathe
The output of an Analyzer affects the _terms_ indexed in a given field (and the terms used when parsing queries against those fields) but it has no impact on the _stored_ value for the fields. For example: an analyzer might split "Brown Cow" into two indexed terms "brown" and "cow", but the stored value will still be a single String: "Brown Cow"
====
[[Analyzers-AnalysisPhases]]
== Analysis Phases
Analysis takes place in two contexts. At index time, when a field is being created, the token stream that results from analysis is added to an index and defines the set of terms (including positions, sizes, and so on) for the field. At query time, the values being searched for are analyzed and the terms that result are matched against those that are stored in the field's index.
@ -89,7 +88,6 @@ In this theoretical example, at index time the text is tokenized, the tokens are
At query time, the only normalization that happens is to convert the query terms to lowercase. The filtering and mapping steps that occur at index time are not applied to the query terms. Queries must then, in this example, be very precise, using only the normalized terms that were stored at index time.
[[Analyzers-AnalysisforMulti-TermExpansion]]
=== Analysis for Multi-Term Expansion
In some types of queries (e.g., Prefix, Wildcard, Regex) the input provided by the user is not natural language intended for analysis. Things like synonyms or stop word filtering do not work in a logical way in these types of queries.

View File

@ -22,7 +22,6 @@ Solr can support Basic authentication for users with the use of the BasicAuthPlu
An authorization plugin is also available to configure Solr with permissions to perform various activities in the system. The authorization plugin is described in the section <<rule-based-authorization-plugin.adoc#rule-based-authorization-plugin,Rule-Based Authorization Plugin>>.
[[BasicAuthenticationPlugin-EnableBasicAuthentication]]
== Enable Basic Authentication
To use Basic authentication, you must first create a `security.json` file. This file and where to put it is described in detail in the section <<authentication-and-authorization-plugins.adoc#AuthenticationandAuthorizationPlugins-EnablePluginswithsecurity.json,Enable Plugins with security.json>>.
@ -68,7 +67,6 @@ If you are using SolrCloud, you must upload `security.json` to ZooKeeper. You ca
bin/solr zk cp file:path_to_local_security.json zk:/security.json -z localhost:9983
----
[[BasicAuthenticationPlugin-Caveats]]
=== Caveats
There are a few things to keep in mind when using the Basic authentication plugin.
@ -77,19 +75,16 @@ There are a few things to keep in mind when using the Basic authentication plugi
* A user who has access to write permissions to `security.json` will be able to modify all the permissions and how users have been assigned permissions. Special care should be taken to only grant access to editing security to appropriate users.
* Your network should, of course, be secure. Even with Basic authentication enabled, you should not unnecessarily expose Solr to the outside world.
[[BasicAuthenticationPlugin-EditingAuthenticationPluginConfiguration]]
== Editing Authentication Plugin Configuration
An Authentication API allows modifying user IDs and passwords. The API provides an endpoint with specific commands to set user details or delete a user.
[[BasicAuthenticationPlugin-APIEntryPoint]]
=== API Entry Point
`admin/authentication`
This endpoint is not collection-specific, so users are created for the entire Solr cluster. If users need to be restricted to a specific collection, that can be done with the authorization rules.
[[BasicAuthenticationPlugin-AddaUserorEditaPassword]]
=== Add a User or Edit a Password
The `set-user` command allows you to add users and change their passwords. For example, the following defines two users and their passwords:
@ -101,7 +96,6 @@ curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication -H 'C
"harry":"HarrysSecret"}}'
----
[[BasicAuthenticationPlugin-DeleteaUser]]
=== Delete a User
The `delete-user` command allows you to remove a user. The user password does not need to be sent to remove a user. In the following example, we've asked that user IDs 'tom' and 'harry' be removed from the system.
@ -112,7 +106,6 @@ curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication -H 'C
"delete-user": ["tom","harry"]}'
----
[[BasicAuthenticationPlugin-Setaproperty]]
=== Set a Property
Set arbitrary properties for the authentication plugin. The only supported property is `blockUnknown`.
@ -123,7 +116,6 @@ curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication -H 'C
"set-property": {"blockUnknown":false}}'
----
[[BasicAuthenticationPlugin-UsingBasicAuthwithSolrJ]]
=== Using BasicAuth with SolrJ
In SolrJ, the basic authentication credentials need to be set for each request as in this example:
@ -144,7 +136,6 @@ req.setBasicAuthCredentials(userName, password);
QueryResponse rsp = req.process(solrClient);
----
[[BasicAuthenticationPlugin-UsingCommandLinescriptswithBasicAuth]]
=== Using Command Line scripts with BasicAuth
Add the following line to the `solr.in.sh` or `solr.in.cmd` file. This example tells the `bin/solr` command line to use "basic" as the type of authentication, and to pass credentials with the username "solr" and password "SolrRocks":
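One possible form of that configuration, using the credentials from this example, is sketched below:

[source,bash]
----
SOLR_AUTH_TYPE="basic"
SOLR_AUTHENTICATION_OPTS="-Dbasicauth=solr:SolrRocks"
----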

View File

@ -28,7 +28,6 @@ When using the blob store, note that the API does not delete or overwrite a prev
The blob store API is implemented as a requestHandler. A special collection named ".system" is used to store the blobs. This collection can be created in advance, but if it does not exist it will be created automatically.
[[BlobStoreAPI-Aboutthe.systemCollection]]
== About the .system Collection
Before uploading blobs to the blob store, a special collection must be created and it must be named `.system`. Solr will automatically create this collection if it does not already exist, but you can also create it manually if you choose.
@ -46,7 +45,6 @@ curl http://localhost:8983/solr/admin/collections?action=CREATE&name=.system&rep
IMPORTANT: The `bin/solr` script cannot be used to create the `.system` collection.
[[BlobStoreAPI-UploadFilestoBlobStore]]
== Upload Files to Blob Store
After the `.system` collection has been created, files can be uploaded to the blob store with a request similar to the following:
@ -132,7 +130,6 @@ For the latest version of a blob, the \{version} can be omitted,
curl http://localhost:8983/solr/.system/blob/{blobname}?wt=filestream > {outputfilename}
----
[[BlobStoreAPI-UseaBlobinaHandlerorComponent]]
== Use a Blob in a Handler or Component
To use the blob as the class for a request handler or search component, you create a request handler in `solrconfig.xml` as usual. You will need to define the following parameters:

View File

@ -22,7 +22,6 @@ CharFilter is a component that pre-processes input characters.
CharFilters can be chained like Token Filters and placed in front of a Tokenizer. CharFilters can add, change, or remove characters while preserving the original character offsets to support features like highlighting.
[[CharFilterFactories-solr.MappingCharFilterFactory]]
== solr.MappingCharFilterFactory
This filter creates `org.apache.lucene.analysis.MappingCharFilter`, which can be used for changing one string to another (for example, normalizing `é` to `e`).
@ -65,7 +64,6 @@ Mapping file syntax:
|===
** A backslash followed by any other character is interpreted as if the character were present without the backslash.
[[CharFilterFactories-solr.HTMLStripCharFilterFactory]]
== solr.HTMLStripCharFilterFactory
This filter creates `org.apache.solr.analysis.HTMLStripCharFilter`. This CharFilter strips HTML from the input stream and passes the result to another CharFilter or a Tokenizer.
@ -114,7 +112,6 @@ Example:
</analyzer>
----
[[CharFilterFactories-solr.ICUNormalizer2CharFilterFactory]]
== solr.ICUNormalizer2CharFilterFactory
This filter performs pre-tokenization Unicode normalization using http://site.icu-project.org[ICU4J].
@ -138,7 +135,6 @@ Example:
</analyzer>
----
[[CharFilterFactories-solr.PatternReplaceCharFilterFactory]]
== solr.PatternReplaceCharFilterFactory
This filter uses http://www.regular-expressions.info/reference.html[regular expressions] to replace or change character patterns.

View File

@ -27,7 +27,6 @@ The Collapsing query parser groups documents (collapsing the result set) accordi
In order to use these features with SolrCloud, the documents must be located on the same shard. To ensure document co-location, you can define the `router.name` parameter as `compositeId` when creating the collection. For more information on this option, see the section <<shards-and-indexing-data-in-solrcloud.adoc#ShardsandIndexingDatainSolrCloud-DocumentRouting,Document Routing>>.
====
[[CollapseandExpandResults-CollapsingQueryParser]]
== Collapsing Query Parser
The `CollapsingQParser` is really a _post filter_ that provides more performant field collapsing than Solr's standard approach when the number of distinct groups in the result set is high. This parser collapses the result set to a single document per group before it forwards the result set to the rest of the search components. So all downstream components (faceting, highlighting, etc...) will work with the collapsed result set.
@ -121,7 +120,6 @@ fq={!collapse field=group_field hint=top_fc}
The CollapsingQParserPlugin fully supports the QueryElevationComponent.
[[CollapseandExpandResults-ExpandComponent]]
== Expand Component
The ExpandComponent can be used to expand the groups that were collapsed by the http://heliosearch.org/the-collapsingqparserplugin-solrs-new-high-performance-field-collapsing-postfilter/[CollapsingQParserPlugin].
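As a hedged sketch, a collapsed query could be expanded with a request along these lines (the collection and field names are illustrative):

[source,bash]
----
curl http://localhost:8983/solr/techproducts/select \
  --data-urlencode 'q=memory' \
  --data-urlencode 'fq={!collapse field=manu_id_s}' \
  --data-urlencode 'expand=true'
----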

View File

@ -36,7 +36,6 @@ The `zkcli.sh` provided by Solr is not the same as the https://zookeeper.apache.
ZooKeeper's `zkCli.sh` provides a completely general, application-agnostic shell for manipulating data in ZooKeeper. Solr's `zkcli.sh` discussed in this section is specific to Solr, and has command line arguments specific to dealing with Solr data in ZooKeeper.
====
[[CommandLineUtilities-UsingSolr_sZooKeeperCLI]]
== Using Solr's ZooKeeper CLI
Use the `help` option to get a list of available commands from the script itself, as in `./server/scripts/cloud-scripts/zkcli.sh help`.
@ -91,23 +90,20 @@ The short form parameter options may be specified with a single dash (eg: `-c my
The long form parameter options may be specified using either a single dash (eg: `-collection mycollection`) or a double dash (eg: `--collection mycollection`)
====
[[CommandLineUtilities-ZooKeeperCLIExamples]]
== ZooKeeper CLI Examples
Below are some examples of using the `zkcli.sh` CLI, which assume you have already started the SolrCloud example (`bin/solr -e cloud -noprompt`).
If you are on a Windows machine, simply replace `zkcli.sh` with `zkcli.bat` in these examples.
[[CommandLineUtilities-Uploadaconfigurationdirectory]]
=== Upload a configuration directory
=== Upload a Configuration Directory
[source,bash]
----
./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:9983 -cmd upconfig -confname my_new_config -confdir server/solr/configsets/_default/conf
----
[[CommandLineUtilities-BootstrapZooKeeperfromexistingSOLR_HOME]]
=== Bootstrap ZooKeeper from existing SOLR_HOME
=== Bootstrap ZooKeeper from an Existing solr.home
[source,bash]
----
@ -120,32 +116,28 @@ If you are on Windows machine, simply replace `zkcli.sh` with `zkcli.bat` in the
Using the bootstrap command with a ZooKeeper chroot in the `-zkhost` parameter, e.g., `-zkhost 127.0.0.1:2181/solr`, will automatically create the chroot path before uploading the configs.
====
[[CommandLineUtilities-PutarbitrarydataintoanewZooKeeperfile]]
=== Put arbitrary data into a new ZooKeeper file
=== Put Arbitrary Data into a New ZooKeeper File
[source,bash]
----
./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:9983 -cmd put /my_zk_file.txt 'some data'
----
[[CommandLineUtilities-PutalocalfileintoanewZooKeeperfile]]
=== Put a local file into a new ZooKeeper file
=== Put a Local File into a New ZooKeeper File
[source,bash]
----
./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:9983 -cmd putfile /my_zk_file.txt /tmp/my_local_file.txt
----
[[CommandLineUtilities-Linkacollectiontoaconfigurationset]]
=== Link a collection to a configuration set
=== Link a Collection to a ConfigSet
[source,bash]
----
./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:9983 -cmd linkconfig -collection gettingstarted -confname my_new_config
----
[[CommandLineUtilities-CreateanewZooKeeperpath]]
=== Create a new ZooKeeper path
=== Create a New ZooKeeper Path
This can be useful to create a chroot path in ZooKeeper before the first cluster start.
@ -154,9 +146,7 @@ This can be useful to create a chroot path in ZooKeeper before first cluster sta
./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:2181 -cmd makepath /solr
----
[[CommandLineUtilities-Setaclusterproperty]]
=== Set a cluster property
=== Set a Cluster Property
This command will add or modify a single cluster property in `clusterprops.json`. Use this command instead of the usual getfile \-> edit \-> putfile cycle.
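For example, a property could be set along these lines (the property name and value shown are illustrative):

[source,bash]
----
./server/scripts/cloud-scripts/zkcli.sh -zkhost 127.0.0.1:2181 -cmd clusterprop -name urlScheme -val https
----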

View File

@ -25,7 +25,6 @@ Solr logs are a key way to know what's happening in the system. There are severa
In addition to the logging options described below, there is a way to configure which request parameters (such as parameters sent as part of queries) are logged with an additional request parameter called `logParamsList`. See the section on <<common-query-parameters.adoc#CommonQueryParameters-ThelogParamsListParameter,Common Query Parameters>> for more information.
====
[[ConfiguringLogging-TemporaryLoggingSettings]]
== Temporary Logging Settings
You can control the amount of logging output in Solr by using the Admin Web interface. Select the *LOGGING* link. Note that this page only lets you change settings in the running system and is not saved for the next run. (For more information about the Admin Web interface, see <<using-the-solr-administration-user-interface.adoc#using-the-solr-administration-user-interface,Using the Solr Administration User Interface>>.)
@ -59,7 +58,6 @@ Log levels settings are as follows:
Multiple settings at one time are allowed.
[[ConfiguringLogging-LoglevelAPI]]
=== Log level API
There is also a way of sending REST commands to the logging endpoint to do the same. Example:
@ -70,7 +68,6 @@ There is also a way of sending REST commands to the logging endpoint to do the s
curl -s http://localhost:8983/solr/admin/info/logging --data-binary "set=root:WARN&wt=json"
----
[[ConfiguringLogging-ChoosingLogLevelatStartup]]
== Choosing Log Level at Startup
You can temporarily choose a different logging level as you start Solr. There are two ways:
@ -87,7 +84,6 @@ bin/solr start -f -v
bin/solr start -f -q
----
[[ConfiguringLogging-PermanentLoggingSettings]]
== Permanent Logging Settings
Solr uses http://logging.apache.org/log4j/1.2/[Log4J version 1.2] for logging which is configured using `server/resources/log4j.properties`. Take a moment to inspect the contents of the `log4j.properties` file so that you are familiar with its structure. By default, Solr log messages will be written to `SOLR_LOGS_DIR/solr.log`.
@ -109,7 +105,6 @@ On every startup of Solr, the start script will clean up old logs and rotate the
You can disable the automatic log rotation at startup by changing the setting `SOLR_LOG_PRESTART_ROTATION` found in `bin/solr.in.sh` or `bin/solr.in.cmd` to false.
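The corresponding line in `bin/solr.in.sh` would look something like this:

[source,bash]
----
SOLR_LOG_PRESTART_ROTATION=false
----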
[[ConfiguringLogging-LoggingSlowQueries]]
== Logging Slow Queries
For high-volume search applications, logging every query can generate a large amount of logs and, depending on the volume, potentially impact performance. If you mine these logs for additional insights into your application, then logging every query request may be useful.

View File

@ -35,7 +35,6 @@ If you are using replication to replicate the Solr index (as described in <<lega
NOTE: If the environment variable `SOLR_DATA_HOME` is defined, or if `solr.data.home` is configured for your DirectoryFactory, the location of the data directory will be `<SOLR_DATA_HOME>/<instance_name>/data`.
[[DataDirandDirectoryFactoryinSolrConfig-SpecifyingtheDirectoryFactoryForYourIndex]]
== Specifying the DirectoryFactory For Your Index
The default {solr-javadocs}/solr-core/org/apache/solr/core/StandardDirectoryFactory.html[`solr.StandardDirectoryFactory`] is filesystem based, and tries to pick the best implementation for the current JVM and platform. You can force a particular implementation and/or config options by specifying {solr-javadocs}/solr-core/org/apache/solr/core/MMapDirectoryFactory.html[`solr.MMapDirectoryFactory`], {solr-javadocs}/solr-core/org/apache/solr/core/NIOFSDirectoryFactory.html[`solr.NIOFSDirectoryFactory`], or {solr-javadocs}/solr-core/org/apache/solr/core/SimpleFSDirectoryFactory.html[`solr.SimpleFSDirectoryFactory`].
@ -57,7 +56,5 @@ The {solr-javadocs}/solr-core/org/apache/solr/core/RAMDirectoryFactory.html[`sol
[NOTE]
====
If you are using Hadoop and would like to store your indexes in HDFS, you should use the {solr-javadocs}/solr-core/org/apache/solr/core/HdfsDirectoryFactory.html[`solr.HdfsDirectoryFactory`] instead of either of the above implementations. For more details, see the section <<running-solr-on-hdfs.adoc#running-solr-on-hdfs,Running Solr on HDFS>>.
====

View File

@ -23,7 +23,6 @@ The Dataimport screen shows the configuration of the DataImportHandler (DIH) and
.The Dataimport Screen
image::images/dataimport-screen/dataimport.png[image,width=485,height=250]
This screen also lets you adjust various options to control how the data is imported to Solr, and view the data import configuration file that controls the import.
For more information about data importing with DIH, see the section on <<uploading-structured-data-store-data-with-the-data-import-handler.adoc#uploading-structured-data-store-data-with-the-data-import-handler,Uploading Structured Data Store Data with the Data Import Handler>>.

View File

@ -26,7 +26,6 @@ Preventing duplicate or near duplicate documents from entering an index or taggi
* Lookup3Signature: 64-bit hash used for exact duplicate detection. This is much faster than MD5 and smaller to index.
* http://wiki.apache.org/solr/TextProfileSignature[TextProfileSignature]: Fuzzy hashing implementation from Apache Nutch for near duplicate detection. It's tunable but works best on longer text.
Other, more sophisticated algorithms for fuzzy/near hashing can be added later.
[IMPORTANT]
@ -36,12 +35,10 @@ Adding in the de-duplication process will change the `allowDups` setting so that
Of course the `signatureField` could be the unique field, but generally you want the unique field to be unique. When a document is added, a signature will automatically be generated and attached to the document in the specified `signatureField`.
====
[[De-Duplication-ConfigurationOptions]]
== Configuration Options
There are two places in Solr to configure de-duplication: in `solrconfig.xml` and in `schema.xml`.
[[De-Duplication-Insolrconfig.xml]]
=== In solrconfig.xml
The `SignatureUpdateProcessorFactory` has to be registered in `solrconfig.xml` as part of an <<update-request-processors.adoc#update-request-processors,Update Request Processor Chain>>, as in this example:
@ -84,8 +81,6 @@ Set to *false* to disable de-duplication processing. The default is *true*.
overwriteDupes::
If true (the default), when a document already exists that matches this signature, it will be overwritten.
[[De-Duplication-Inschema.xml]]
=== In schema.xml
If you are using a separate field for storing the signature, you must have it indexed:

View File

@ -20,8 +20,7 @@
Fields are defined in the fields element of `schema.xml`. Once you have the field types set up, defining the fields themselves is simple.
[[DefiningFields-Example]]
== Example
== Example Field Definition
The following example defines a field named `price` with a type named `float` and a default value of `0.0`; the `indexed` and `stored` properties are explicitly set to `true`, while any other properties specified on the `float` field type are inherited.
@ -30,7 +29,6 @@ The following example defines a field named `price` with a type named `float` an
<field name="price" type="float" default="0.0" indexed="true" stored="true"/>
----
[[DefiningFields-FieldProperties]]
== Field Properties
Field definitions can have the following properties:
@ -44,7 +42,6 @@ The name of the `fieldType` for this field. This will be found in the `name` att
`default`::
A default value that will be added automatically to any document that does not have a value in this field when it is indexed. If this property is not specified, there is no default.
[[DefiningFields-OptionalFieldTypeOverrideProperties]]
== Optional Field Type Override Properties
Fields can have many of the same properties as field types. Properties from the table below which are specified on an individual field will override any explicit value for that property specified on the `fieldType` of the field, or any implicit default property value provided by the underlying `fieldType` implementation. The table below is reproduced from <<field-type-definitions-and-properties.adoc#field-type-definitions-and-properties,Field Type Definitions and Properties>>, which has more details:

View File

@ -31,12 +31,10 @@ For specific information on each of these language identification implementation
For more information about language analysis in Solr, see <<language-analysis.adoc#language-analysis,Language Analysis>>.
[[DetectingLanguagesDuringIndexing-ConfiguringLanguageDetection]]
== Configuring Language Detection
You can configure the `langid` UpdateRequestProcessor in `solrconfig.xml`. Both implementations take the same parameters, which are described in the following section. At a minimum, you must specify the fields for language identification and a field for the resulting language code.
[[DetectingLanguagesDuringIndexing-ConfiguringTikaLanguageDetection]]
=== Configuring Tika Language Detection
Here is an example of a minimal Tika `langid` configuration in `solrconfig.xml`:
@ -51,7 +49,6 @@ Here is an example of a minimal Tika `langid` configuration in `solrconfig.xml`:
</processor>
----
[[DetectingLanguagesDuringIndexing-ConfiguringLangDetectLanguageDetection]]
=== Configuring LangDetect Language Detection
Here is an example of a minimal LangDetect `langid` configuration in `solrconfig.xml`:
@ -66,7 +63,6 @@ Here is an example of a minimal LangDetect `langid` configuration in `solrconfig
</processor>
----
[[DetectingLanguagesDuringIndexing-langidParameters]]
== langid Parameters
As previously mentioned, both implementations of the `langid` UpdateRequestProcessor take the same parameters.

View File

@ -22,7 +22,6 @@ When a Solr node receives a search request, the request is routed behind the sce
The chosen replica acts as an aggregator: it creates internal requests to randomly chosen replicas of every shard in the collection, coordinates the responses, issues any subsequent internal requests as needed (for example, to refine facets values, or request additional stored fields), and constructs the final response for the client.
[[DistributedRequests-LimitingWhichShardsareQueried]]
== Limiting Which Shards are Queried
While one of the advantages of using SolrCloud is the ability to query very large collections distributed among various shards, in some cases <<shards-and-indexing-data-in-solrcloud.adoc#ShardsandIndexingDatainSolrCloud-DocumentRouting,you may know that you are only interested in results from a subset of your shards>>. You have the option of searching over all of your data or just parts of it.
@ -71,7 +70,6 @@ And of course, you can specify a list of shards (seperated by commas) each defin
http://localhost:8983/solr/gettingstarted/select?q=*:*&shards=shard1,localhost:7574/solr/gettingstarted|localhost:7500/solr/gettingstarted
----
[[DistributedRequests-ConfiguringtheShardHandlerFactory]]
== Configuring the ShardHandlerFactory
You can directly configure aspects of the concurrency and thread-pooling used within distributed search in Solr. This allows for finer grained control and you can tune it to target your own specific requirements. The default configuration favors throughput over latency.
@ -118,7 +116,6 @@ If specified, the thread pool will use a backing queue instead of a direct hando
`fairnessPolicy`::
Chooses the JVM specifics dealing with fair policy queuing. If enabled, distributed searches will be handled in a first-in, first-out fashion at a cost to throughput; if disabled, throughput will be favored over latency. The default is `false`.
[[DistributedRequests-ConfiguringstatsCache_DistributedIDF_]]
== Configuring statsCache (Distributed IDF)
Document and term statistics are needed in order to calculate relevancy. Solr provides four implementations out of the box when it comes to document stats calculation:
@ -135,15 +132,13 @@ The implementation can be selected by setting `<statsCache>` in `solrconfig.xml`
<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>
----
[[DistributedRequests-AvoidingDistributedDeadlock]]
== Avoiding Distributed Deadlock
Each shard serves top-level query requests and then makes sub-requests to all of the other shards. Care should be taken to ensure that the max number of threads serving HTTP requests is greater than the possible number of requests from both top-level clients and other shards. If this is not the case, the configuration may result in a distributed deadlock.
For example, a deadlock might occur in the case of two shards, each with just a single thread to service HTTP requests. Both threads could receive a top-level request concurrently, and make sub-requests to each other. Because there are no more remaining threads to service requests, the incoming requests will be blocked until the other pending requests are finished, but they will not finish since they are waiting for the sub-requests. By ensuring that Solr is configured to handle a sufficient number of threads, you can avoid deadlock situations like this.
[[DistributedRequests-PreferLocalShards]]
== Prefer Local Shards
== preferLocalShards Parameter
Solr allows you to pass an optional boolean parameter named `preferLocalShards` to indicate that a distributed query should prefer local replicas of a shard when available. In other words, if a query includes `preferLocalShards=true`, then the query controller will look for local replicas to service the query instead of selecting replicas at random from across the cluster. This is useful when a query requests many fields or large fields to be returned per document because it avoids moving large amounts of data over the network when it is available locally. In addition, this feature can be useful for minimizing the impact of a problematic replica with degraded performance, as it reduces the likelihood that the degraded replica will be hit by other healthy replicas.
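A sketch of such a request, following the query style used elsewhere on this page (the collection name is illustrative):

[source,bash]
----
http://localhost:8983/solr/gettingstarted/select?q=*:*&preferLocalShards=true
----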

View File

@ -26,14 +26,12 @@ Everything on this page is specific to legacy setup of distributed search. Users
Update reorders (i.e., replica A may see update X then Y, and replica B may see update Y then X). *deleteByQuery* also handles reorders the same way, to ensure replicas are consistent. All replicas of a shard are consistent, even if the updates arrive in a different order on different replicas.
[[DistributedSearchwithIndexSharding-DistributingDocumentsacrossShards]]
== Distributing Documents across Shards
When not using SolrCloud, it is up to you to get all your documents indexed on each shard of your server farm. Solr supports distributed indexing (routing) in its true form only in the SolrCloud mode.
In the legacy distributed mode, Solr does not calculate universal term/doc frequencies. For most large-scale implementations, it is not likely to matter that Solr calculates TF/IDF at the shard level. However, if your collection is heavily skewed in its distribution across servers, you may find misleading relevancy results in your searches. In general, it is probably best to randomly distribute documents to your shards.
[[DistributedSearchwithIndexSharding-ExecutingDistributedSearcheswiththeshardsParameter]]
== Executing Distributed Searches with the shards Parameter
If a query request includes the `shards` parameter, the Solr server distributes the request across all the shards listed as arguments to the parameter. The `shards` parameter uses this syntax:
@ -63,7 +61,6 @@ The following components support distributed search:
* The *Stats* component, which returns simple statistics for numeric fields within the DocSet.
* The *Debug* component, which helps with debugging.
[[DistributedSearchwithIndexSharding-LimitationstoDistributedSearch]]
== Limitations to Distributed Search
Distributed searching in Solr has the following limitations:
@ -78,12 +75,10 @@ Distributed searching in Solr has the following limitations:
Formerly a limitation was that TF/IDF relevancy computations only used shard-local statistics. This is still the case by default. If your data isn't randomly distributed, or if you want more exact statistics, then remember to configure the ExactStatsCache.
[[DistributedSearchwithIndexSharding-AvoidingDistributedDeadlock]]
== Avoiding Distributed Deadlock
== Avoiding Distributed Deadlock with Distributed Search
Like in SolrCloud mode, inter-shard requests could lead to a distributed deadlock. It can be avoided by following the instructions in the section <<distributed-requests.adoc#distributed-requests,Distributed Requests>>.
[[DistributedSearchwithIndexSharding-TestingIndexShardingonTwoLocalServers]]
== Testing Index Sharding on Two Local Servers
For simple functional testing, it's easiest to just set up two local Solr servers on different ports. (In a production environment, of course, these servers would be deployed on separate machines.)
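As a rough sketch of that setup (the ports, core name, and home directories are illustrative assumptions):

[source,bash]
----
# Start two standalone Solr nodes on different ports
bin/solr start -s example/node1/solr -p 8983
bin/solr start -s example/node2/solr -p 7574

# Query both shards together using the shards parameter
curl "http://localhost:8983/solr/core1/select?q=*:*&shards=localhost:8983/solr/core1,localhost:7574/solr/core1"
----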

View File

@ -28,7 +28,6 @@ For other features that we now commonly associate with search, such as sorting,
In Lucene 4.0, a new approach was introduced. DocValue fields are now column-oriented fields with a document-to-value mapping built at index time. This approach promises to relieve some of the memory requirements of the fieldCache and make lookups for faceting, sorting, and grouping much faster.
[[DocValues-EnablingDocValues]]
== Enabling DocValues
To use docValues, you only need to enable it for a field that you will use it with. As with all schema design, you need to define a field type and then define fields of that type with docValues enabled. All of these actions are done in `schema.xml`.
@ -76,7 +75,6 @@ Lucene index back-compatibility is only supported for the default codec. If you
If `docValues="true"` for a field, then DocValues will automatically be used any time the field is used for <<common-query-parameters.adoc#CommonQueryParameters-ThesortParameter,sorting>>, <<faceting.adoc#faceting,faceting>> or <<function-queries.adoc#function-queries,function queries>>.
[[DocValues-RetrievingDocValuesDuringSearch]]
=== Retrieving DocValues During Search
Field values retrieved during search queries are typically returned from stored values. However, non-stored docValues fields will also be returned along with other stored fields when all fields (or pattern matching globs) are specified to be returned (e.g. "`fl=*`") for search queries, depending on the effective value of the `useDocValuesAsStored` parameter for each field. For schema versions >= 1.6, the implicit default is `useDocValuesAsStored="true"`. See <<field-type-definitions-and-properties.adoc#field-type-definitions-and-properties,Field Type Definitions and Properties>> & <<defining-fields.adoc#defining-fields,Defining Fields>> for more details.

View File

@ -24,10 +24,8 @@ This section describes enabling SSL using a self-signed certificate.
For background on SSL certificates and keys, see http://www.tldp.org/HOWTO/SSL-Certificates-HOWTO/.
[[EnablingSSL-BasicSSLSetup]]
== Basic SSL Setup
[[EnablingSSL-Generateaself-signedcertificateandakey]]
=== Generate a Self-Signed Certificate and a Key
To generate a self-signed certificate and a single key that will be used to authenticate both the server and the client, we'll use the JDK https://docs.oracle.com/javase/8/docs/technotes/tools/unix/keytool.html[`keytool`] command and create a separate keystore. This keystore will also be used as a truststore below. It's possible to use the keystore that comes with the JDK for these purposes, and to use a separate truststore, but those options aren't covered here.
@ -45,7 +43,6 @@ keytool -genkeypair -alias solr-ssl -keyalg RSA -keysize 2048 -keypass secret -s
The above command will create a keystore file named `solr-ssl.keystore.jks` in the current directory.
[[EnablingSSL-ConvertthecertificateandkeytoPEMformatforusewithcURL]]
=== Convert the Certificate and Key to PEM Format for Use with cURL
cURL isn't capable of using JKS formatted keystores, so the JKS keystore needs to be converted to PEM format, which cURL understands.
@ -73,7 +70,6 @@ If you want to use cURL on OS X Yosemite (10.10), you'll need to create a certif
openssl pkcs12 -nokeys -in solr-ssl.keystore.p12 -out solr-ssl.cacert.pem
----
[[EnablingSSL-SetcommonSSLrelatedsystemproperties]]
=== Set Common SSL-Related System Properties
The Solr Control Script is already set up to pass SSL-related Java system properties to the JVM. To activate the SSL settings, uncomment and update the set of properties beginning with `SOLR_SSL_*` in `bin/solr.in.sh` (or `bin\solr.in.cmd` on Windows).
@ -116,7 +112,6 @@ REM Enable clients to authenticate (but not require)
set SOLR_SSL_WANT_CLIENT_AUTH=false
----
[[EnablingSSL-RunSingleNodeSolrusingSSL]]
=== Run Single Node Solr using SSL
Start Solr using the command shown below; by default clients will not be required to authenticate:
@ -133,12 +128,10 @@ bin/solr -p 8984
bin\solr.cmd -p 8984
----
[[EnablingSSL-SolrCloud]]
== SSL with SolrCloud
This section describes how to run a two-node SolrCloud cluster with no initial collections and a single-node external ZooKeeper. The commands below assume you have already created the keystore described above.
[[EnablingSSL-ConfigureZooKeeper]]
=== Configure ZooKeeper
NOTE: ZooKeeper does not support encrypted communication with clients like Solr. There are several related JIRA tickets where SSL support is being planned/worked on: https://issues.apache.org/jira/browse/ZOOKEEPER-235[ZOOKEEPER-235]; https://issues.apache.org/jira/browse/ZOOKEEPER-236[ZOOKEEPER-236]; https://issues.apache.org/jira/browse/ZOOKEEPER-1000[ZOOKEEPER-1000]; and https://issues.apache.org/jira/browse/ZOOKEEPER-2120[ZOOKEEPER-2120].
@ -163,10 +156,8 @@ server\scripts\cloud-scripts\zkcli.bat -zkhost localhost:2181 -cmd clusterprop -
If you have set up your ZooKeeper cluster to use a <<taking-solr-to-production.adoc#TakingSolrtoProduction-ZooKeeperchroot,chroot for Solr>>, make sure you use the correct `zkhost` string with `zkcli`, e.g. `-zkhost localhost:2181/solr`.
[[EnablingSSL-RunSolrCloudwithSSL]]
=== Run SolrCloud with SSL
[[EnablingSSL-CreateSolrhomedirectoriesfortwonodes]]
==== Create Solr Home Directories for Two Nodes
Create two copies of the `server/solr/` directory which will serve as the Solr home directories for each of your two SolrCloud nodes:
@ -187,7 +178,6 @@ xcopy /E server\solr cloud\node1\
xcopy /E server\solr cloud\node2\
----
[[EnablingSSL-StartthefirstSolrnode]]
==== Start the First Solr Node
Next, start the first Solr node on port 8984. Be sure to stop the standalone server first if you started it when working through the previous section on this page.
@ -220,7 +210,6 @@ bin/solr -cloud -s cloud/node1 -z localhost:2181 -p 8984 -Dsolr.ssl.checkPeerNam
bin\solr.cmd -cloud -s cloud\node1 -z localhost:2181 -p 8984 -Dsolr.ssl.checkPeerName=false
----
[[EnablingSSL-StartthesecondSolrnode]]
==== Start the Second Solr Node
Finally, start the second Solr node on port 7574 - again, to skip hostname verification, add `-Dsolr.ssl.checkPeerName=false`:
@ -237,14 +226,13 @@ bin/solr -cloud -s cloud/node2 -z localhost:2181 -p 7574
bin\solr.cmd -cloud -s cloud\node2 -z localhost:2181 -p 7574
----
[[EnablingSSL-ExampleClientActions]]
== Example Client Actions
[IMPORTANT]
====
cURL on OS X Mavericks (10.9) has degraded SSL support. For more information and workarounds to allow one-way SSL, see http://curl.haxx.se/mail/archive-2013-10/0036.html. cURL on OS X Yosemite (10.10) is improved - 2-way SSL is possible - see http://curl.haxx.se/mail/archive-2014-10/0053.html .
The cURL commands in the following sections will not work with the system `curl` on OS X Yosemite (10.10). Instead, the certificate supplied with the `-E` param must be in PKCS12 format, and the file supplied with the `--cacert` param must contain only the CA certificate, and no key (see <<EnablingSSL-ConvertthecertificateandkeytoPEMformatforusewithcURL,above>> for instructions on creating this file):
The cURL commands in the following sections will not work with the system `curl` on OS X Yosemite (10.10). Instead, the certificate supplied with the `-E` param must be in PKCS12 format, and the file supplied with the `--cacert` param must contain only the CA certificate, and no key (see <<Convert the Certificate and Key to PEM Format for Use with cURL,above>> for instructions on creating this file):
[source,bash]
curl -E solr-ssl.keystore.p12:secret --cacert solr-ssl.cacert.pem ...
@ -271,7 +259,6 @@ bin\solr.cmd create -c mycollection -shards 2
The `create` action will pass the `SOLR_SSL_*` properties set in your include file to the SolrJ code used to create the collection.
[[EnablingSSL-RetrieveSolrCloudclusterstatususingcURL]]
=== Retrieve SolrCloud Cluster Status using cURL
To get the resulting cluster status (again, if you have not enabled client authentication, remove the `-E solr-ssl.pem:secret` option):
@ -317,7 +304,6 @@ You should get a response that looks like this:
"properties":{"urlScheme":"https"}}}
----
[[EnablingSSL-Indexdocumentsusingpost.jar]]
=== Index Documents using post.jar
Use `post.jar` to index some example documents to the SolrCloud collection created above:
@ -329,7 +315,6 @@ cd example/exampledocs
java -Djavax.net.ssl.keyStorePassword=secret -Djavax.net.ssl.keyStore=../../server/etc/solr-ssl.keystore.jks -Djavax.net.ssl.trustStore=../../server/etc/solr-ssl.keystore.jks -Djavax.net.ssl.trustStorePassword=secret -Durl=https://localhost:8984/solr/mycollection/update -jar post.jar *.xml
----
[[EnablingSSL-QueryusingcURL]]
=== Query Using cURL
Use cURL to query the SolrCloud collection created above, from a directory containing the PEM formatted certificate and key created above (e.g. `example/etc/`) - if you have not enabled client authentication (system property `-Djetty.ssl.clientAuth=true`), then you can remove the `-E solr-ssl.pem:secret` option:
@ -339,8 +324,7 @@ Use cURL to query the SolrCloud collection created above, from a directory conta
curl -E solr-ssl.pem:secret --cacert solr-ssl.pem "https://localhost:8984/solr/mycollection/select?q=*:*&wt=json&indent=on"
----
[[EnablingSSL-IndexadocumentusingCloudSolrClient]]
=== Index a document using CloudSolrClient
=== Index a Document using CloudSolrClient
From a Java client using SolrJ, index a document. In the code below, the `javax.net.ssl.*` system properties are set programmatically, but you could instead specify them on the Java command line, as in the `post.jar` example above:

View File

@ -25,19 +25,16 @@ This feature uses a stream sorting technique that begins to send records within
The cases where this functionality may be useful include: session analysis, distributed merge joins, time series roll-ups, aggregations on high cardinality fields, fully distributed field collapsing, and sort based stats.
[[ExportingResultSets-FieldRequirements]]
== Field Requirements
All the fields being sorted and exported must have docValues set to true. For more information, see the section on <<docvalues.adoc#docvalues,DocValues>>.
[[ExportingResultSets-The_exportRequestHandler]]
== The /export RequestHandler
The `/export` request handler with the appropriate configuration is one of Solr's out-of-the-box request handlers - see <<implicit-requesthandlers.adoc#implicit-requesthandlers,Implicit RequestHandlers>> for more information.
Note that this request handler's properties are defined as "invariants", which means they cannot be overridden by other properties passed at another time (such as at query time).
[[ExportingResultSets-RequestingResultsExport]]
== Requesting Results Export
You can use `/export` to make requests to export the result set of a query.
@ -53,19 +50,16 @@ Here is an example of an export request of some indexed log data:
http://localhost:8983/solr/core_name/export?q=my-query&sort=severity+desc,timestamp+desc&fl=severity,timestamp,msg
----
[[ExportingResultSets-SpecifyingtheSortCriteria]]
=== Specifying the Sort Criteria
The `sort` property defines how documents will be sorted in the exported result set. Results can be sorted by any field that has a field type of int, long, float, double, or string. The sort fields must be single-valued fields.
Up to four sort fields can be specified per request, with the 'asc' or 'desc' properties.
[[ExportingResultSets-SpecifyingtheFieldList]]
=== Specifying the Field List
The `fl` property defines the fields that will be exported with the result set. Any of the field types that can be sorted (i.e., int, long, float, double, string, date, boolean) can be used in the field list. The fields can be single or multi-valued. However, returning scores and using wildcards are not supported at this time.
[[ExportingResultSets-DistributedSupport]]
== Distributed Support
See the section <<streaming-expressions.adoc#streaming-expressions,Streaming Expressions>> for distributed support.

View File

@ -21,7 +21,7 @@
Faceting is the arrangement of search results into categories based on indexed terms.
Searchers are presented with the indexed terms, along with numerical counts of how many matching documents were found were each term. Faceting makes it easy for users to explore search results, narrowing in on exactly the results they are looking for.
Searchers are presented with the indexed terms, along with numerical counts of how many matching documents were found for each term. Faceting makes it easy for users to explore search results, narrowing in on exactly the results they are looking for.
[[Faceting-GeneralParameters]]
== General Parameters

View File

@ -27,7 +27,6 @@ A field type definition can include four types of information:
* If the field type is `TextField`, a description of the field analysis for the field type.
* Field type properties - depending on the implementation class, some properties may be mandatory.
[[FieldTypeDefinitionsandProperties-FieldTypeDefinitionsinschema.xml]]
== Field Type Definitions in schema.xml
Field types are defined in `schema.xml`. Each field type is defined between `fieldType` elements. They can optionally be grouped within a `types` element. Here is an example of a field type definition for a type called `text_general`:
@ -137,7 +136,6 @@ The default values for each property depend on the underlying `FieldType` class,
// TODO: SOLR-10655 END
[[FieldTypeDefinitionsandProperties-FieldTypeSimilarity]]
== Field Type Similarity
A field type may optionally specify a `<similarity/>` that will be used when scoring documents that refer to fields with this type, as long as the "global" similarity for the collection allows it.

View File

@ -33,10 +33,8 @@ In this section you will learn how to start a SolrCloud cluster using startup sc
This tutorial assumes that you're already familiar with the basics of using Solr. If you need a refresher, please see the <<getting-started.adoc#getting-started,Getting Started section>> to get a grounding in Solr concepts. If you load documents as part of that exercise, you should start over with a fresh Solr installation for these SolrCloud tutorials.
====
[[GettingStartedwithSolrCloud-SolrCloudExample]]
== SolrCloud Example
[[GettingStartedwithSolrCloud-InteractiveStartup]]
=== Interactive Startup
The `bin/solr` script makes it easy to get started with SolrCloud as it walks you through the process of launching Solr nodes in cloud mode and adding a collection. To get started, simply do:
@ -120,7 +118,6 @@ To stop Solr in SolrCloud mode, you would use the `bin/solr` script and issue th
bin/solr stop -all
----
[[GettingStartedwithSolrCloud-Startingwith-noprompt]]
=== Starting with -noprompt
You can also get SolrCloud started with all the defaults instead of the interactive session using the following command:
@ -130,7 +127,6 @@ You can also get SolrCloud started with all the defaults instead of the interact
bin/solr -e cloud -noprompt
----
[[GettingStartedwithSolrCloud-RestartingNodes]]
=== Restarting Nodes
You can restart your SolrCloud nodes using the `bin/solr` script. For instance, to restart node1 running on port 8983 (with an embedded ZooKeeper server), you would do:
@ -149,7 +145,6 @@ bin/solr restart -c -p 7574 -z localhost:9983 -s example/cloud/node2/solr
Notice that you need to specify the ZooKeeper address (`-z localhost:9983`) when starting node2 so that it can join the cluster with node1.
[[GettingStartedwithSolrCloud-Addinganodetoacluster]]
=== Adding a node to a cluster
Adding a node to an existing cluster is a bit advanced and involves a little more understanding of Solr. Once you start up a SolrCloud cluster using the startup scripts, you can add a new node to it by:

View File

@ -38,7 +38,6 @@ There are two plugin classes:
For most SolrCloud or standalone Solr setups, the `HadoopAuthPlugin` should suffice.
====
[[HadoopAuthenticationPlugin-PluginConfiguration]]
== Plugin Configuration
`class`::
@ -70,11 +69,8 @@ Configures proxy users for the underlying Hadoop authentication mechanism. This
`clientBuilderFactory`::
(Optional) The `HttpClientBuilderFactory` implementation used for the Solr internal communication. Only applicable for `ConfigurableInternodeAuthHadoopPlugin`.
[[HadoopAuthenticationPlugin-ExampleConfigurations]]
== Example Configurations
[[HadoopAuthenticationPlugin-KerberosAuthenticationusingHadoopAuthenticationPlugin]]
=== Kerberos Authentication using Hadoop Authentication Plugin
This example lets you configure Solr to use Kerberos Authentication, similar to how you would use the <<kerberos-authentication-plugin.adoc#kerberos-authentication-plugin,Kerberos Authentication Plugin>>.
@ -105,7 +101,6 @@ To setup this plugin, use the following in your `security.json` file.
}
----
[[HadoopAuthenticationPlugin-SimpleAuthenticationwithDelegationTokens]]
=== Simple Authentication with Delegation Tokens
Similar to the previous example, this is an example of setting up a Solr cluster that uses delegation tokens. Refer to the parameters in the Hadoop authentication library's https://hadoop.apache.org/docs/stable/hadoop-auth/Configuration.html[documentation] or refer to the section <<kerberos-authentication-plugin.adoc#kerberos-authentication-plugin,Kerberos Authentication Plugin>> for further details. Please note that this example does not use Kerberos and the requests made to Solr must contain valid delegation tokens.

View File

@ -24,7 +24,6 @@ The fragments are included in a special section of the query response (the `high
Highlighting is extremely configurable, perhaps more than any other part of Solr. There are many parameters each for fragment sizing, formatting, ordering, backup/alternate behavior, and more options that are hard to categorize. Nonetheless, highlighting is very simple to use.
[[Highlighting-Usage]]
== Usage
=== Common Highlighter Parameters
@ -36,7 +35,7 @@ Use this parameter to enable or disable highlighting. The default is `false`. If
`hl.method`::
The highlighting implementation to use. Acceptable values are: `unified`, `original`, `fastVector`. The default is `original`.
+
See the <<Highlighting-ChoosingaHighlighter,Choosing a Highlighter>> section below for more details on the differences between the available highlighters.
See the <<Choosing a Highlighter>> section below for more details on the differences between the available highlighters.
`hl.fl`::
Specifies a list of fields to highlight. Accepts a comma- or space-delimited list of fields for which Solr should generate highlighted snippets.
@ -92,7 +91,6 @@ The default is `51200` characters.
There are more parameters supported as well depending on the highlighter (via `hl.method`) chosen.
[[Highlighting-HighlightingintheQueryResponse]]
=== Highlighting in the Query Response
In the response to a query, Solr includes highlighting data in a section separate from the documents. It is up to a client to determine how to process this response and display the highlights to users.
@ -136,7 +134,6 @@ Note the two sections `docs` and `highlighting`. The `docs` section contains the
The `highlighting` section includes the ID of each document, and the field that contains the highlighted portion. In this example, we used the `hl.fl` parameter to say we wanted query terms highlighted in the "manu" field. When there is a match to the query term in that field, it will be included for each document ID in the list.
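For illustration, a request along these lines (the query and `fl` list are illustrative) would produce such a response against the techproducts example:

[source,text]
----
http://localhost:8983/solr/techproducts/select?q=apple&hl=true&hl.fl=manu&fl=id,name,manu
----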
[[Highlighting-ChoosingaHighlighter]]
== Choosing a Highlighter
Solr provides a `HighlightComponent` (a `SearchComponent`) and it's in the default list of components for search handlers. It offers a somewhat unified API over multiple actual highlighting implementations (or simply "highlighters") that do the business of highlighting.
@ -173,7 +170,6 @@ The Unified Highlighter is exclusively configured via search parameters. In cont
In addition to further information below, more information can be found in the {solr-javadocs}/solr-core/org/apache/solr/highlight/package-summary.html[Solr javadocs].
[[Highlighting-SchemaOptionsandPerformanceConsiderations]]
=== Schema Options and Performance Considerations
Fundamental to the internals of highlighting are detecting the _offsets_ of the individual words that match the query. Some of the highlighters can run the stored text through the analysis chain defined in the schema, some can look them up from _postings_, and some can look them up from _term vectors._ These choices have different trade-offs:
@ -198,7 +194,6 @@ This is definitely the fastest option for highlighting wildcard queries on large
+
This adds substantial weight to the index similar in size to the compressed stored text. If you are using the Unified Highlighter then this is not a recommended configuration since it's slower and heavier than postings with light term vectors. However, this could make sense if full term vectors are already needed for another use-case.
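To make these trade-offs concrete, a hypothetical field definition that stores offsets in postings plus light term vectors might look like the sketch below; the field name and type are placeholders:

[source,xml]
----
<field name="content" type="text_general" indexed="true" stored="true"
       storeOffsetsWithPositions="true" termVectors="true"/>
----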
[[Highlighting-TheUnifiedHighlighter]]
== The Unified Highlighter
The Unified Highlighter supports the following parameters in addition to those listed earlier:
@ -243,7 +238,6 @@ Indicates which character to break the text on. Use only if you have defined `hl
This is useful when the text has already been manipulated in advance to have a special delineation character at desired highlight passage boundaries. This character will still appear in the text as the last character of a passage.
[[Highlighting-TheOriginalHighlighter]]
== The Original Highlighter
The Original Highlighter supports the following parameters in addition to those listed earlier:
@ -314,7 +308,6 @@ If this may happen and you know you don't need them for highlighting (i.e. your
The Original Highlighter has a plugin architecture that enables new functionality to be registered in `solrconfig.xml`. The "```techproducts```" configset shows most of these settings explicitly. You can use it as a guide to provide your own components, including a `SolrFormatter`, `SolrEncoder`, and `SolrFragmenter`.
[[Highlighting-TheFastVectorHighlighter]]
== The FastVector Highlighter
The FastVector Highlighter (FVH) can be used in conjunction with the Original Highlighter if not all fields should be highlighted with the FVH. In such a mode, set `hl.method=original` and `f.yourTermVecField.hl.method=fastVector` for all fields that should use the FVH. One annoyance to keep in mind is that the Original Highlighter uses `hl.simple.pre` whereas the FVH (and other highlighters) use `hl.tag.pre`.
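For example, a request mixing the two highlighters might look like this, assuming a hypothetical `features` field that has term vectors enabled:

[source,text]
----
q=memory&hl=true&hl.fl=name,features&hl.method=original&f.features.hl.method=fastVector
----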
@ -349,15 +342,12 @@ The maximum number of phrases to analyze when searching for the highest-scoring
`hl.multiValuedSeparatorChar`::
Text to use to separate one value from the next for a multi-valued field. The default is " " (a space).
[[Highlighting-UsingBoundaryScannerswiththeFastVectorHighlighter]]
=== Using Boundary Scanners with the FastVector Highlighter
The FastVector Highlighter will occasionally truncate highlighted words. To prevent this, implement a boundary scanner in `solrconfig.xml`, then use the `hl.boundaryScanner` parameter to specify the boundary scanner for highlighting.
Solr supports two boundary scanners: `breakIterator` and `simple`.
[[Highlighting-ThebreakIteratorBoundaryScanner]]
==== The breakIterator Boundary Scanner
The `breakIterator` boundary scanner offers excellent performance right out of the box by taking locale and boundary type into account. In most cases you will want to use the `breakIterator` boundary scanner. To implement the `breakIterator` boundary scanner, add this code to the `highlighting` section of your `solrconfig.xml` file, adjusting the type, language, and country values as appropriate to your application:
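A representative declaration is sketched below; the type, language, and country values are placeholders to adjust for your application:

[source,xml]
----
<boundaryScanner name="breakIterator" class="solr.highlight.BreakIteratorBoundaryScanner">
  <lst name="defaults">
    <str name="hl.bs.type">WORD</str>
    <str name="hl.bs.language">en</str>
    <str name="hl.bs.country">US</str>
  </lst>
</boundaryScanner>
----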
@ -375,7 +365,6 @@ The `breakIterator` boundary scanner offers excellent performance right out of t
Possible values for the `hl.bs.type` parameter are WORD, LINE, SENTENCE, and CHARACTER.
[[Highlighting-ThesimpleBoundaryScanner]]
==== The simple Boundary Scanner
The `simple` boundary scanner scans term boundaries for a specified maximum character value (`hl.bs.maxScan`) and for common delimiters such as punctuation marks (`hl.bs.chars`), which may be useful for some custom use cases. To implement the `simple` boundary scanner, add this code to the `highlighting` section of your `solrconfig.xml` file, adjusting the values as appropriate to your application:
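A representative declaration is sketched below; the `hl.bs.maxScan` and `hl.bs.chars` values are placeholders:

[source,xml]
----
<boundaryScanner name="simple" class="solr.highlight.SimpleBoundaryScanner" default="true">
  <lst name="defaults">
    <str name="hl.bs.maxScan">10</str>
    <str name="hl.bs.chars">.,!?</str>
  </lst>
</boundaryScanner>
----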

View File

@ -27,13 +27,11 @@ The following sections cover provide general information about how various SolrC
If you are already familiar with SolrCloud concepts and basic functionality, you can skip to the section covering <<solrcloud-configuration-and-parameters.adoc#solrcloud-configuration-and-parameters,SolrCloud Configuration and Parameters>>.
[[HowSolrCloudWorks-KeySolrCloudConcepts]]
== Key SolrCloud Concepts
A SolrCloud cluster consists of some "logical" concepts layered on top of some "physical" concepts.
[[HowSolrCloudWorks-Logical]]
=== Logical
=== Logical Concepts
* A Cluster can host multiple Collections of Solr Documents.
* A collection can be partitioned into multiple Shards, which contain a subset of the Documents in the Collection.
@ -41,8 +39,7 @@ A SolrCloud cluster consists of some "logical" concepts layered on top of some "
** The theoretical limit to the number of Documents that Collection can reasonably contain.
** The amount of parallelization that is possible for an individual search request.
[[HowSolrCloudWorks-Physical]]
=== Physical
=== Physical Concepts
* A Cluster is made up of one or more Solr Nodes, which are running instances of the Solr server process.
* Each Node can host multiple Cores.

View File

@ -43,7 +43,6 @@ This section describes how Solr adds data to its index. It covers the following
* *<<uima-integration.adoc#uima-integration,UIMA Integration>>*: Information about integrating Solr with Apache's Unstructured Information Management Architecture (UIMA). UIMA lets you define custom pipelines of Analysis Engines that incrementally add metadata to your documents as annotations.
[[IndexingandBasicDataOperations-IndexingUsingClientAPIs]]
== Indexing Using Client APIs
Using client APIs, such as <<using-solrj.adoc#using-solrj,SolrJ>>, from your applications is an important option for updating Solr indexes. See the <<client-apis.adoc#client-apis,Client APIs>> section for more information.

View File

@ -55,8 +55,7 @@ For example, if an `<initParams>` section has the name "myParams", you can call
[source,xml]
<requestHandler name="/dump1" class="DumpRequestHandler" initParams="myParams"/>
[[InitParamsinSolrConfig-Wildcards]]
== Wildcards
== Wildcards in initParams
An `<initParams>` section can support wildcards to define nested paths that should use the parameters defined. A single asterisk (\*) denotes that a nested path one level deeper should use the parameters. Double asterisks (**) denote all nested paths no matter how deep should use the parameters.
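For instance, a sketch that applies a default `commitWithin` to every handler nested under `/update` (the parameter choice here is illustrative):

[source,xml]
----
<initParams path="/update/**">
  <lst name="defaults">
    <str name="commitWithin">10000</str>
  </lst>
</initParams>
----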

View File

@ -38,12 +38,10 @@ If the field name is defined in the Schema that is associated with the index, th
For more information on indexing in Solr, see the https://wiki.apache.org/solr/FrontPage[Solr Wiki].
[[IntroductiontoSolrIndexing-TheSolrExampleDirectory]]
== The Solr Example Directory
When starting Solr with the "-e" option, the `example/` directory will be used as base directory for the example Solr instances that are created. This directory also includes an `example/exampledocs/` subdirectory containing sample documents in a variety of formats that you can use to experiment with indexing into the various examples.
[[IntroductiontoSolrIndexing-ThecurlUtilityforTransferringFiles]]
== The curl Utility for Transferring Files
Many of the instructions and examples in this section make use of the `curl` utility for transferring content through a URL. `curl` posts and retrieves data over HTTP, FTP, and many other protocols. Most Linux distributions include a copy of `curl`. You'll find curl downloads for Linux, Windows, and many other operating systems at http://curl.haxx.se/download.html. Documentation for `curl` is available here: http://curl.haxx.se/docs/manpage.html.
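As a quick sketch, posting a single JSON document to the techproducts example with `curl` might look like this (the document fields are illustrative):

[source,bash]
----
curl 'http://localhost:8983/solr/techproducts/update?commit=true' \
  -H 'Content-Type: application/json' \
  --data-binary '[{"id":"test-doc-1","name":"A test document"}]'
----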

View File

@ -24,7 +24,6 @@ Configuring your JVM can be a complex topic and a full discussion is beyond the
For more general information about improving Solr performance, see https://wiki.apache.org/solr/SolrPerformanceFactors.
[[JVMSettings-ChoosingMemoryHeapSettings]]
== Choosing Memory Heap Settings
The most important JVM configuration settings are those that determine the amount of memory it is allowed to allocate. There are two primary command-line options that set memory limits for the JVM. These are `-Xms`, which sets the initial size of the JVM's memory heap, and `-Xmx`, which sets the maximum size to which the heap is allowed to grow.
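For example, a minimal heap configuration in `bin/solr.in.sh` might look like the following; the sizes are placeholders to tune for your data and hardware:

[source,bash]
----
# Start with 512 MB and allow the heap to grow to 2 GB
SOLR_JAVA_MEM="-Xms512m -Xmx2g"
----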
@ -41,12 +40,10 @@ When setting the maximum heap size, be careful not to let the JVM consume all av
On systems with many CPUs/cores, it can also be beneficial to tune the layout of the heap and/or the behavior of the garbage collector. Adjusting the relative sizes of the generational pools in the heap can affect how often GC sweeps occur and whether they run concurrently. Configuring the various settings of how the garbage collector should behave can greatly reduce the overall performance impact when it does run. There is a lot of good information on this topic available on Oracle's website. A good place to start is here: http://www.oracle.com/technetwork/java/javase/tech/index-jsp-140228.html[Oracle's Java HotSpot Garbage Collection].
[[JVMSettings-UsetheServerHotSpotVM]]
== Use the Server HotSpot VM
If you are using Sun's JVM, add the `-server` command-line option when you start Solr. This tells the JVM that it should optimize for a long running, server process. If the Java runtime on your system is a JRE, rather than a full JDK distribution (including `javac` and other development tools), then it is possible that it may not support the `-server` JVM option. Test this by running `java -help` and look for `-server` as an available option in the displayed usage message.
[[JVMSettings-CheckingJVMSettings]]
== Checking JVM Settings
A great way to see what JVM settings your server is using, along with other useful information, is to use the admin RequestHandler, `solr/admin/system`. This request handler will display a wealth of server statistics and settings.

View File

@ -29,17 +29,14 @@ Support for the Kerberos authentication plugin is available in SolrCloud mode or
If you are using Solr with a Hadoop cluster secured with Kerberos and intend to store your Solr indexes in HDFS, also see the section <<running-solr-on-hdfs.adoc#running-solr-on-hdfs,Running Solr on HDFS>> for additional steps to configure Solr for that purpose. The instructions on this page apply only to scenarios where Solr will be secured with Kerberos. If you only need to store your indexes in a Kerberized HDFS system, please see the other section referenced above.
====
[[KerberosAuthenticationPlugin-HowSolrWorksWithKerberos]]
== How Solr Works With Kerberos
When setting up Solr to use Kerberos, configurations are put in place for Solr to use a _service principal_, or a Kerberos username, which is registered with the Key Distribution Center (KDC) to authenticate requests. The configurations define the service principal name and the location of the keytab file that contains the credentials.
[[KerberosAuthenticationPlugin-security.json]]
=== security.json
The Solr authentication model uses a file called `security.json`. A description of this file and how it is created and maintained is covered in the section <<authentication-and-authorization-plugins.adoc#authentication-and-authorization-plugins,Authentication and Authorization Plugins>>. If this file is created after an initial startup of Solr, a restart of each node of the system is required.
[[KerberosAuthenticationPlugin-ServicePrincipalsandKeytabFiles]]
=== Service Principals and Keytab Files
Each Solr node must have a service principal registered with the Key Distribution Center (KDC). The Kerberos plugin uses SPNego to negotiate authentication.
@ -56,7 +53,6 @@ Along with the service principal, each Solr node needs a keytab file which shoul
Since a Solr cluster requires internode communication, each node must also be able to make Kerberos enabled requests to other nodes. By default, Solr uses the same service principal and keytab as a 'client principal' for internode communication. You may configure a distinct client principal explicitly, but doing so is not recommended and is not covered in the examples below.
[[KerberosAuthenticationPlugin-KerberizedZooKeeper]]
=== Kerberized ZooKeeper
When setting up a kerberized SolrCloud cluster, it is recommended to enable Kerberos security for ZooKeeper as well.
@ -65,15 +61,13 @@ In such a setup, the client principal used to authenticate requests with ZooKeep
See the <<ZooKeeper Configuration>> section below for an example of starting ZooKeeper in Kerberos mode.
[[KerberosAuthenticationPlugin-BrowserConfiguration]]
=== Browser Configuration
In order for your browser to access the Solr Admin UI after enabling Kerberos authentication, it must be able to negotiate with the Kerberos authenticator service to allow you access. Each browser supports this differently, and some (like Chrome) do not support it at all. If you see 401 errors when trying to access the Solr Admin UI after enabling Kerberos authentication, it's likely your browser has not been configured properly to know how or where to negotiate the authentication request.
Detailed information on how to set up your browser is beyond the scope of this documentation; please see your system administrators for Kerberos for details on how to configure your browser.
[[KerberosAuthenticationPlugin-PluginConfiguration]]
== Plugin Configuration
== Kerberos Authentication Configuration
.Consult Your Kerberos Admins!
[WARNING]
@ -97,7 +91,6 @@ We'll walk through each of these steps below.
To use host names instead of IP addresses, use the `SOLR_HOST` configuration in `bin/solr.in.sh` or pass a `-Dhost=<hostname>` system parameter during Solr startup. This guide uses IP addresses. If you specify a hostname, replace all the IP addresses in the guide with the Solr hostname as appropriate.
====
[[KerberosAuthenticationPlugin-GetServicePrincipalsandKeytabs]]
=== Get Service Principals and Keytabs
Before configuring Solr, make sure you have a Kerberos service principal for each Solr host and ZooKeeper (if ZooKeeper has not already been configured) available in the KDC server, and generate a keytab file as shown below.
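For illustration, a kadmin session along these lines creates a service principal and exports its keytab; the host, realm, and paths are placeholders:

[source,text]
----
sudo kadmin.local
kadmin.local:  addprinc HTTP/192.168.0.107
kadmin.local:  xst -k /tmp/107.keytab HTTP/192.168.0.107
kadmin.local:  quit
----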
@ -128,7 +121,6 @@ Copy the keytab file from the KDC servers `/tmp/107.keytab` location to the S
You might need to take similar steps to create a ZooKeeper service principal and keytab if it has not already been set up. In that case, the example below shows a different service principal for ZooKeeper, so the above might be repeated with `zookeeper/host1` as the service principal for one of the nodes.
[[KerberosAuthenticationPlugin-ZooKeeperConfiguration]]
=== ZooKeeper Configuration
If you are using a ZooKeeper that has already been configured to use Kerberos, you can skip the ZooKeeper-related steps shown here.
@ -173,7 +165,6 @@ Once all of the pieces are in place, start ZooKeeper with the following paramete
bin/zkServer.sh start -Djava.security.auth.login.config=/etc/zookeeper/conf/jaas-client.conf
----
[[KerberosAuthenticationPlugin-Createsecurity.json]]
=== Create security.json
Create the `security.json` file.
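At a minimum, the file declares the Kerberos plugin class for authentication:

[source,json]
----
{
  "authentication": {
    "class": "org.apache.solr.security.KerberosPlugin"
  }
}
----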
@ -194,7 +185,6 @@ More details on how to use a `/security.json` file in Solr are available in the
If you already have a `/security.json` file in ZooKeeper, download the file, add or modify the authentication section and upload it back to ZooKeeper using the <<command-line-utilities.adoc#command-line-utilities,Command Line Utilities>> available in Solr.
====
[[KerberosAuthenticationPlugin-DefineaJAASConfigurationFile]]
=== Define a JAAS Configuration File
The JAAS configuration file defines the properties to use for authentication, such as the service principal and the location of the keytab file. Other properties can also be set to ensure ticket caching and other features.
@ -227,7 +217,6 @@ The main properties we are concerned with are the `keyTab` and `principal` prope
* `debug`: this boolean property will output debug messages for help in troubleshooting.
* `principal`: the name of the service principal to be used.
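Putting these together, a JAAS configuration file might look like the sketch below; the section name, keytab path, and principal are placeholders, and the section name must match the JAAS application name passed to Solr at startup:

[source,text]
----
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/keytabs/107.keytab"
  storeKey=true
  useTicketCache=true
  debug=true
  principal="HTTP/192.168.0.107@EXAMPLE.COM";
};
----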
[[KerberosAuthenticationPlugin-SolrStartupParameters]]
=== Solr Startup Parameters
While starting up Solr, the following host-specific parameters need to be passed. These parameters can be passed at the command line with the `bin/solr` start command (see <<solr-control-script-reference.adoc#solr-control-script-reference,Solr Control Script Reference>> for details on how to pass system parameters) or defined in `bin/solr.in.sh` or `bin/solr.in.cmd` as appropriate for your operating system.
@ -252,7 +241,6 @@ The app name (section name) within the JAAS configuration file which is required
`java.security.auth.login.config`::
Path to the JAAS configuration file for configuring a Solr client for internode communication. This parameter is required.
Here is an example that could be added to `bin/solr.in.sh`. Make sure to change this example to use the right hostname and the keytab file path.
[source,bash]
@ -273,7 +261,6 @@ For Java 1.8, this is available here: http://www.oracle.com/technetwork/java/jav
Replace the `local_policy.jar` present in `JAVA_HOME/jre/lib/security/` with the new `local_policy.jar` from the downloaded package and restart the Solr node.
====
[[KerberosAuthenticationPlugin-UsingDelegationTokens]]
=== Using Delegation Tokens
The Kerberos plugin can be configured to use delegation tokens, which allow an application to reuse the authentication of an end-user or another application.
@ -304,7 +291,6 @@ The ZooKeeper path where the secret provider information is stored. This is in t
`solr.kerberos.delegation.token.secret.manager.znode.working.path`::
The ZooKeeper path where token information is stored. This is in the form of the path + /security/zkdtsm. The path can include the chroot or the chroot can be omitted if you are not using it. This example includes the chroot: `server1:9983,server2:9983,server3:9983/solr/security/zkdtsm`.
[[KerberosAuthenticationPlugin-StartSolr]]
=== Start Solr
Once the configuration is complete, you can start Solr with the `bin/solr` script, as in the example below, which is for users in SolrCloud mode only. This example assumes you modified `bin/solr.in.sh` or `bin/solr.in.cmd`, with the proper values, but if you did not, you would pass the system parameters along with the start command. Note you also need to customize the `-z` property as appropriate for the location of your ZooKeeper nodes.
@ -314,7 +300,6 @@ Once the configuration is complete, you can start Solr with the `bin/solr` scrip
bin/solr -c -z server1:2181,server2:2181,server3:2181/solr
----
[[KerberosAuthenticationPlugin-TesttheConfiguration]]
=== Test the Configuration
. Do a `kinit` with your username. For example, `kinit \user@EXAMPLE.COM`.
@ -325,7 +310,6 @@ bin/solr -c -z server1:2181,server2:2181,server3:2181/solr
curl --negotiate -u : "http://192.168.0.107:8983/solr/"
----
[[KerberosAuthenticationPlugin-UsingSolrJwithaKerberizedSolr]]
== Using SolrJ with a Kerberized Solr
To use Kerberos authentication in a SolrJ application, you need the following two lines before you create a SolrClient:
@ -353,7 +337,6 @@ SolrJClient {
};
----
[[KerberosAuthenticationPlugin-DelegationTokenswithSolrJ]]
=== Delegation Tokens with SolrJ
Delegation tokens are also supported with SolrJ, in the following ways:

View File

@ -32,7 +32,6 @@ We can prefix this query string with local parameters to provide more informatio
These local parameters would change the query to require a match on both "solr" and "rocks" while searching the "title" field by default.
[[LocalParametersinQueries-BasicSyntaxofLocalParameters]]
== Basic Syntax of Local Parameters
To specify a local parameter, insert the following before the argument to be modified:
@ -45,7 +44,6 @@ To specify a local parameter, insert the following before the argument to be mod
You may specify only one local parameters prefix per argument. Values in the key-value pairs may be quoted via single or double quotes, and backslash escaping works within quoted strings.
[[LocalParametersinQueries-QueryTypeShortForm]]
== Query Type Short Form
If a local parameter value appears without a name, it is given the implicit name of "type". This allows short-form representation for the type of query parser to use when parsing a query string. Thus
@ -74,7 +72,6 @@ is equivalent to
`q={!type=dismax qf=myfield v='solr rocks'}`
[[LocalParametersinQueries-ParameterDereferencing]]
== Parameter Dereferencing
Parameter dereferencing, or indirection, lets you use the value of another argument rather than specifying it directly. This can be used to simplify queries, decouple user input from query parameters, or decouple front-end GUI parameters from defaults set in `solrconfig.xml`.
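For instance, a query along these lines pulls the value of `v` from a separate `qq` parameter:

[source,text]
----
q={!dismax qf=myfield v=$qq}&qq=solr rocks
----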

View File

@ -27,7 +27,6 @@ image::images/logging/logging.png[image,width=621,height=250]
While this example shows logged messages for only one core, if you have multiple cores in a single instance, they will each be listed, with the level for each.
[[Logging-SelectingaLoggingLevel]]
== Selecting a Logging Level
When you select the *Level* link on the left, you see the hierarchy of classpaths and classnames for your instance. A row highlighted in yellow indicates that the class has logging capabilities. Click on a highlighted row, and a menu will appear to allow you to change the log level for that class. Characters in boldface indicate that the class will not be affected by level changes to root.

View File

@ -38,8 +38,7 @@ bin/solr -e techproducts
Let's begin learning about managed resources by looking at a couple of examples provided by Solr for managing stop words and synonyms using a REST API. After reading this section, you'll be ready to dig into the details of how managed resources are implemented in Solr so you can start building your own implementation.
[[ManagedResources-Stopwords]]
=== Stop Words
=== Managing Stop Words
To begin, you need to define a field type that uses the <<filter-descriptions.adoc#FilterDescriptions-ManagedStopFilter,ManagedStopFilterFactory>>, such as:
@ -134,8 +133,7 @@ curl -X DELETE "http://localhost:8983/solr/techproducts/schema/analysis/stopword
NOTE: PUT/POST is used to add terms to an existing list instead of replacing the list entirely. This is because it is more common to add a term to an existing list than it is to replace a list altogether, so the API favors the more common approach of incrementally adding terms especially since deleting individual terms is also supported.
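For illustration, adding two terms to the managed `english` stop word list against the techproducts example might look like this:

[source,bash]
----
curl -X PUT -H 'Content-type:application/json' --data-binary '["foo","bar"]' \
  "http://localhost:8983/solr/techproducts/schema/analysis/stopwords/english"
----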
[[ManagedResources-Synonyms]]
=== Synonyms
=== Managing Synonyms
For the most part, the API for managing synonyms behaves similar to the API for stop words, except instead of working with a list of words, it uses a map, where the value for each entry in the map is a set of synonyms for a term. As with stop words, the `sample_techproducts_configs` <<config-sets.adoc#config-sets,configset>> includes a pre-built set of synonym mappings suitable for the sample data that is activated by the following field type definition in schema.xml:
@ -209,8 +207,7 @@ Note that the expansion is performed when processing the PUT request so the unde
Lastly, you can delete a mapping by sending a DELETE request to the managed endpoint.
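For illustration, adding a synonym mapping and later removing it might look like this; the term and its synonyms are illustrative:

[source,bash]
----
curl -X PUT -H 'Content-type:application/json' --data-binary '{"mad":["angry","upset"]}' \
  "http://localhost:8983/solr/techproducts/schema/analysis/synonyms/english"

curl -X DELETE "http://localhost:8983/solr/techproducts/schema/analysis/synonyms/english/mad"
----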
[[ManagedResources-ApplyingChanges]]
== Applying Changes
== Applying Managed Resource Changes
Changes made to managed resources via this REST API are not applied to the active Solr components until the Solr collection (or Solr core in single server mode) is reloaded.
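For example, a collection reload can be triggered with the Collections API (the collection name is illustrative):

[source,bash]
----
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=techproducts"
----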
@ -227,7 +224,6 @@ However, the intent of this API implementation is that changes will be applied u
Changing things like stop words and synonym mappings typically requires re-indexing existing documents if they are used by index-time analyzers. The RestManager framework does not guard you from this; it simply makes it possible to programmatically build up a set of stop words, synonyms, etc.
====
[[ManagedResources-RestManagerEndpoint]]
== RestManager Endpoint
Metadata about registered ManagedResources is available using the `/schema/managed` endpoint for each collection.

View File

@ -34,8 +34,7 @@ Specifies whether statistics are returned with results. You can override the `st
`wt`::
The output format. This operates the same as the <<response-writers.adoc#response-writers,`wt` parameter in a query>>. The default is `xml`.
[[MBeanRequestHandler-Examples]]
== Examples
== MBeanRequestHandler Examples
The following examples assume you are running Solr's `techproducts` example configuration:
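For instance, a sketch of requesting all MBean statistics in JSON format:

[source,bash]
----
curl "http://localhost:8983/solr/techproducts/admin/mbeans?stats=true&wt=json"
----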

View File

@ -27,7 +27,6 @@ To merge indexes, they must meet these requirements:
Optimally, the two indexes should be built using the same schema.
[[MergingIndexes-UsingIndexMergeTool]]
== Using IndexMergeTool
To merge the indexes, do the following:
@ -43,7 +42,6 @@ java -cp $SOLR/server/solr-webapp/webapp/WEB-INF/lib/lucene-core-VERSION.jar:$SO
This will create a new index at `/path/to/newindex` that contains both index1 and index2.
. Copy this new directory to the location of your application's solr index (move the old one aside first, of course) and start Solr.
[[MergingIndexes-UsingCoreAdmin]]
== Using CoreAdmin
The `MERGEINDEXES` command of the <<coreadmin-api.adoc#CoreAdminAPI-MERGEINDEXES,CoreAdminHandler>> can be used to merge indexes into a new core either from one or more arbitrary `indexDir` directories or by merging from one or more existing `srcCore` core names.
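For example, a sketch of merging two index directories into an existing target core; the core name and paths are placeholders:

[source,bash]
----
curl "http://localhost:8983/solr/admin/cores?action=MERGEINDEXES&core=new_core&indexDir=/path/to/index1&indexDir=/path/to/index2"
----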

View File

@ -28,7 +28,6 @@ The second is to use it as a search component. This is less desirable since it p
The final approach is to use it as a request handler but with externally supplied text. This case, also referred to as the MoreLikeThisHandler, will supply information about similar documents in the index based on the text of the input document.
[[MoreLikeThis-HowMoreLikeThisWorks]]
== How MoreLikeThis Works
`MoreLikeThis` constructs a Lucene query based on terms in a document. It does this by pulling terms from the defined list of fields (see the `mlt.fl` parameter, below). For best results, the fields should have stored term vectors in `schema.xml`. For example:
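A representative field definition with term vectors enabled might look like the sketch below; the field name and type are placeholders:

[source,xml]
----
<field name="cat" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
----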
@ -42,7 +41,6 @@ If term vectors are not stored, `MoreLikeThis` will generate terms from stored f
The next phase filters terms from the original document using thresholds defined with the MoreLikeThis parameters. Finally, a query is run with these terms, and any other query parameters that have been defined (see the `mlt.qf` parameter, below) and a new document set is returned.
[[MoreLikeThis-CommonParametersforMoreLikeThis]]
== Common Parameters for MoreLikeThis
The table below summarizes the `MoreLikeThis` parameters supported by Lucene/Solr. These parameters can be used with any of the three possible MoreLikeThis approaches.
@ -77,8 +75,6 @@ Specifies if the query will be boosted by the interesting term relevance. It can
`mlt.qf`::
Query fields and their boosts using the same format as that used by the <<the-dismax-query-parser.adoc#the-dismax-query-parser,DisMax Query Parser>>. These fields must also be specified in `mlt.fl`.
[[MoreLikeThis-ParametersfortheMoreLikeThisComponent]]
== Parameters for the MoreLikeThisComponent
Using MoreLikeThis as a search component returns similar documents for each document in the response set. In addition to the common parameters, these additional options are available:
@ -89,8 +85,6 @@ If set to `true`, activates the `MoreLikeThis` component and enables Solr to ret
`mlt.count`::
Specifies the number of similar documents to be returned for each result. The default value is 5.
[[MoreLikeThis-ParametersfortheMoreLikeThisHandler]]
== Parameters for the MoreLikeThisHandler
The table below summarizes parameters accessible through the `MoreLikeThisHandler`. It supports faceting, paging, and filtering using common query parameters, but does not work well with alternate query parsers.
@ -105,7 +99,6 @@ Specifies an offset into the main query search results to locate the document on
Controls how the `MoreLikeThis` component presents the "interesting" terms (the top TF/IDF terms) for the query. Supports three settings: `list` lists the terms, `none` lists no terms, and `details` lists the terms along with the boost value used for each term. Unless `mlt.boost=true`, all terms will have `boost=1.0`.
[[MoreLikeThis-MoreLikeThisQueryParser]]
== More Like This Query Parser
== MoreLikeThis Query Parser
The `mlt` query parser provides a mechanism to retrieve documents similar to a given document, like the handler. More information on the usage of the mlt query parser can be found in the section <<other-parsers.adoc#other-parsers,Other Parsers>>.

View File

@ -26,7 +26,6 @@ With NRT, you can modify a `commit` command to be a *soft commit*, which avoids
However, pay special attention to cache and autowarm settings as they can have a significant impact on NRT performance.
[[NearRealTimeSearching-CommitsandOptimizing]]
== Commits and Optimizing
A commit operation makes index changes visible to new search requests. A *hard commit* uses the transaction log to get the id of the latest document changes, and also calls `fsync` on the index files to ensure they have been flushed to stable storage and no data loss will result from a power failure. The current transaction log is closed and a new one is opened. See the "transaction log" discussion below for data loss issues.
@ -45,7 +44,6 @@ The number of milliseconds to wait before pushing documents to the index. It wor
Use `maxDocs` and `maxTime` judiciously to fine-tune your commit strategies.
[[NearRealTimeSearching-TransactionLogs]]
=== Transaction Logs (tlogs)
Transaction logs are a "rolling window" of at least the last `N` (default 100) documents indexed. Tlogs are configured in solrconfig.xml, including the value of `N`. The current transaction log is closed and a new one opened each time any variety of hard commit occurs. Soft commits have no effect on the transaction log.
@ -54,7 +52,6 @@ When tlogs are enabled, documents being added to the index are written to the tl
When Solr is shut down gracefully (i.e. using the `bin/solr stop` command and the like) Solr will close the tlog file and index segments so no replay will be necessary on startup.
[[NearRealTimeSearching-AutoCommits]]
=== AutoCommits
An autocommit also uses the parameters `maxDocs` and `maxTime`. However, it's useful in many strategies to use both a hard `autocommit` and `autosoftcommit` to achieve more flexible commits.
@ -72,7 +69,6 @@ For example:
It's better to use `maxTime` rather than `maxDocs` to modify an `autoSoftCommit`, especially when indexing a large number of documents through the commit operation. It's also better to turn off `autoSoftCommit` for bulk indexing.
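A sketch of such a configuration within the `<updateHandler>` section of `solrconfig.xml`, with placeholder intervals, combines a time-based hard commit (without opening a new searcher) and a more frequent soft commit:

[source,xml]
----
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
----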
[[NearRealTimeSearching-OptionalAttributesforcommitandoptimize]]
=== Optional Attributes for commit and optimize
`waitSearcher`::
@ -99,7 +95,6 @@ Example of `commit` and `optimize` with optional attributes:
<optimize waitSearcher="false"/>
----
[[NearRealTimeSearching-PassingcommitandcommitWithinparametersaspartoftheURL]]
=== Passing commit and commitWithin Parameters as Part of the URL
Update handlers can also get `commit`-related parameters as part of the update URL, if the `stream.body` feature is enabled. This example adds a small test document and causes an explicit commit to happen immediately afterwards:
@ -132,10 +127,9 @@ curl http://localhost:8983/solr/my_collection/update?commitWithin=10000
-H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">testdoc</field></doc></add>'
----
WARNING: While the `stream.body` feature is great for development and testing, it should normally not be enabled in production systems, as it lets a user with READ permissions post data that may alter the system state. The feature is disabled by default. See <<requestdispatcher-in-solrconfig.adoc#RequestDispatcherinSolrConfig-requestParsersElement,RequestDispatcher in SolrConfig>> for details.
[[NearRealTimeSearching-ChangingdefaultcommitWithinBehavior]]
=== Changing default commitWithin Behavior
=== Changing Default commitWithin Behavior
The `commitWithin` settings allow forcing document commits to happen in a defined time period. This is used most frequently with <<near-real-time-searching.adoc#near-real-time-searching,Near Real Time Searching>>, and for that reason the default is to perform a soft commit. This does not, however, replicate new documents to slave servers in a master/slave environment. If that's a requirement for your implementation, you can force a hard commit by adding a parameter, as in this example:

View File

@ -31,7 +31,6 @@ bin/post -c gettingstarted example/films/films.json
This will contact the server at `localhost:8983`. Specifying the `collection/core name` is *mandatory*. The `-help` (or simply `-h`) option will output information on its usage (i.e., `bin/post -help`).
== Using the bin/post Tool
Specifying either the `collection/core name` or the full update `url` is *mandatory* when using `bin/post`.
@ -74,8 +73,7 @@ OPTIONS
...
----
[[bin_post_examples]]
== Examples
== Examples Using bin/post
There are several ways to use `bin/post`. This section presents a few examples.
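For instance, a sketch of indexing all of the sample XML documents into the techproducts collection:

[source,bash]
----
bin/post -c techproducts example/exampledocs/*.xml
----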

View File

@ -33,12 +33,10 @@ When might you want to use this feature?
* To mix and match parameter sets at request time.
* To avoid a reload of your collection for small parameter changes.
[[RequestParametersAPI-TheRequestParametersEndpoint]]
== The Request Parameters Endpoint
All requests are sent to the `/config/params` endpoint of the Config API.
[[RequestParametersAPI-SettingRequestParameters]]
== Setting Request Parameters
The request to set, unset, or update request parameters is sent as a set of Maps with names. These objects can be directly used in a request or a request handler definition.
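For illustration, a request that creates a paramset named `myQueries` might look like this; the parameter names and values are placeholders:

[source,bash]
----
curl http://localhost:8983/solr/techproducts/config/params -H 'Content-type:application/json' -d '{
  "set": {
    "myQueries": {
      "defType": "edismax",
      "rows": "5"
    }
  }
}'
----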
@ -88,7 +86,6 @@ curl http://localhost:8983/solr/techproducts/config/params -H 'Content-type:appl
}'
----
[[RequestParametersAPI-UsingRequestParameterswithRequestHandlers]]
== Using Request Parameters with RequestHandlers
After creating the `my_handler_params` paramset in the above section, it is possible to define a request handler as follows:
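For instance, a handler definition along these lines (the handler name is illustrative):

[source,xml]
----
<requestHandler name="/my_handler" class="solr.SearchHandler" useParams="my_handler_params"/>
----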
@ -119,12 +116,10 @@ It will be equivalent to a standard request handler definition such as this one:
</requestHandler>
----
[[RequestParametersAPI-ImplicitRequestHandlers]]
=== Implicit RequestHandlers
=== Implicit RequestHandlers with the Request Parameters API
Solr ships with many out-of-the-box request handlers that may only be configured via the Request Parameters API, because their configuration is not present in `solrconfig.xml`. See <<implicit-requesthandlers.adoc#implicit-requesthandlers,Implicit RequestHandlers>> for the paramset to use when configuring an implicit request handler.
[[RequestParametersAPI-ViewingExpandedParamsetsandEffectiveParameterswithRequestHandlers]]
=== Viewing Expanded Paramsets and Effective Parameters with RequestHandlers
To see the expanded paramset and the resulting effective parameters for a RequestHandler defined with `useParams`, use the `expandParams` request param. E.g. for the `/export` request handler:
@ -134,7 +129,6 @@ To see the expanded paramset and the resulting effective parameters for a Reques
curl "http://localhost:8983/solr/techproducts/config/requestHandler?componentName=/export&expandParams=true"
----
[[RequestParametersAPI-ViewingRequestParameters]]
== Viewing Request Parameters
To see the paramsets that have been created, you can use the `/config/params` endpoint to read the contents of `params.json`, or use the name in the request:
@ -147,7 +141,6 @@ curl http://localhost:8983/solr/techproducts/config/params
curl http://localhost:8983/solr/techproducts/config/params/myQueries
----
[[RequestParametersAPI-TheuseParamsParameter]]
== The useParams Parameter
When making a request, the `useParams` parameter applies the request parameters sent to the request. This is translated at request time to the actual parameters.
@ -192,12 +185,10 @@ To summarize, parameters are applied in this order:
* parameter sets defined in `params.json` that have been defined in the request handler.
* parameters defined in `<defaults>` in `solrconfig.xml`.
[[RequestParametersAPI-PublicAPIs]]
== Public APIs
The RequestParams Object can be accessed using the method `SolrConfig#getRequestParams()`. Each paramset can be accessed by their name using the method `RequestParams#getRequestParams(String name)`.
[[RequestParametersAPI-Examples]]
== Examples
== Examples Using the Request Parameters API
The Solr "films" example demonstrates the use of the parameters API. See https://github.com/apache/lucene-solr/tree/master/solr/example/films for details.
The Solr "films" example demonstrates the use of the parameters API. You can use this example in your Solr installation (in the `example/films` directory) or view the files in the Apache GitHub mirror at https://github.com/apache/lucene-solr/tree/master/solr/example/films.

View File

@ -28,8 +28,7 @@ image::images/result-clustering/carrot2.png[image,width=900]
The query issued to the system was _Solr_. It seems clear that faceting could not yield a similar set of groups, although the goals of both techniques are similar—to let the user explore the set of search results and either rephrase the query or narrow the focus to a subset of current documents. Clustering is also similar to <<result-grouping.adoc#result-grouping,Result Grouping>> in that it can help to look deeper into search results, beyond the top few hits.
[[ResultClustering-PreliminaryConcepts]]
== Preliminary Concepts
== Clustering Concepts
Each *document* passed to the clustering component is composed of several logical parts:
@ -39,12 +38,11 @@ Each *document* passed to the clustering component is composed of several logica
* the main content,
* a language code of the title and content.
The identifier part is mandatory, everything else is optional but at least one of the text fields (title or content) will be required to make the clustering process reasonable. It is important to remember that logical document parts must be mapped to a particular schema and its fields. The content (text) for clustering can be sourced from either a stored text field or context-filtered using a highlighter, all these options are explained below in the <<ResultClustering-Configuration,configuration>> section.
The identifier part is mandatory, everything else is optional but at least one of the text fields (title or content) will be required to make the clustering process reasonable. It is important to remember that logical document parts must be mapped to a particular schema and its fields. The content (text) for clustering can be sourced from either a stored text field or context-filtered using a highlighter, all these options are explained below in the <<Clustering Configuration,configuration>> section.
A *clustering algorithm* is the actual logic (implementation) that discovers relationships among the documents in the search result and forms human-readable cluster labels. Depending on the choice of the algorithm the clusters may (and probably will) vary. Solr comes with several algorithms implemented in the open source http://carrot2.org[Carrot2] project, commercial alternatives also exist.
[[ResultClustering-QuickStartExample]]
== Quick Start Example
== Clustering Quick Start Example
The "```techproducts```" example included with Solr is pre-configured with all the necessary components for result clustering -- but they are disabled by default.
@ -137,16 +135,13 @@ There were a few clusters discovered for this query (`\*:*`), separating search
Depending on the quality of input documents, some clusters may not make much sense. Some documents may be left out and not be clustered at all; these will be assigned to the synthetic _Other Topics_ group, marked with the `other-topics` property set to `true` (see the XML dump above for an example). The score of the other topics group is zero.
[[ResultClustering-Installation]]
== Installation
== Installing the Clustering Contrib
The clustering contrib extension requires `dist/solr-clustering-*.jar` and all JARs under `contrib/clustering/lib`.
[[ResultClustering-Configuration]]
== Configuration
== Clustering Configuration
[[ResultClustering-DeclarationoftheSearchComponentandRequestHandler]]
=== Declaration of the Search Component and Request Handler
=== Declaration of the Clustering Search Component and Request Handler
The clustering extension is a search component and must be declared in `solrconfig.xml`. Such a component can then be appended to a request handler as the last component in the chain (because it requires search results which must be previously fetched by the search component).
@ -205,8 +200,6 @@ An example configuration could look as shown below.
</requestHandler>
----
[[ResultClustering-ConfigurationParametersoftheClusteringComponent]]
=== Configuration Parameters of the Clustering Component
The following parameters of each clustering engine or the entire clustering component (depending where they are declared) are available.
@ -237,7 +230,6 @@ If `true` and the algorithm supports hierarchical clustering, sub-clusters will
`carrot.numDescriptions`::
Maximum number of per-cluster labels to return (if the algorithm assigns more than one label to a cluster).
The `carrot.algorithm` parameter should contain a fully qualified class name of an algorithm supported by the http://project.carrot2.org[Carrot2] framework. Currently, the following algorithms are available:
* `org.carrot2.clustering.lingo.LingoClusteringAlgorithm` (open source)
@ -253,7 +245,6 @@ For a comparison of characteristics of these algorithms see the following links:
The question of which algorithm to choose depends on the amount of traffic (STC is faster than Lingo, but arguably produces less intuitive clusters; Lingo3G is the fastest algorithm but is not free or open source), the expected result (Lingo3G provides hierarchical clusters, Lingo and STC provide flat clusters), and the input data (each algorithm will cluster the input slightly differently). There is no single answer as to which algorithm is "the best".
[[ResultClustering-ContextualandFullFieldClustering]]
=== Contextual and Full Field Clustering
The clustering engine can apply clustering to the full content of (stored) fields or it can run an internal highlighter pass to extract context-snippets before clustering. Highlighting is recommended when the logical snippet field contains a lot of content (this would affect clustering performance). Highlighting can also increase the quality of clustering because the content passed to the algorithm will be more focused around the query (it will be query-specific context). The following parameters control the internal highlighter.
@ -266,10 +257,9 @@ The size, in characters, of the snippets (aka fragments) created by the highligh
`carrot.summarySnippets`:: The number of summary snippets to generate for clustering. If not specified, the default highlighting snippet count (`hl.snippets`) will be used.
[[ResultClustering-LogicaltoDocumentFieldMapping]]
=== Logical to Document Field Mapping
As already mentioned in <<ResultClustering-PreliminaryConcepts,Preliminary Concepts>>, the clustering component clusters "documents" consisting of logical parts that need to be mapped onto physical schema of data stored in Solr. The field mapping attributes provide a connection between fields and logical document parts. Note that the content of title and snippet fields must be *stored* so that it can be retrieved at search time.
As already mentioned in <<Clustering Concepts>>, the clustering component clusters "documents" consisting of logical parts that need to be mapped onto physical schema of data stored in Solr. The field mapping attributes provide a connection between fields and logical document parts. Note that the content of title and snippet fields must be *stored* so that it can be retrieved at search time.
`carrot.title`::
The field (alternatively comma- or space-separated list of fields) that should be mapped to the logical document's title. The clustering algorithms typically give more weight to the content of the title field compared to the content (snippet). For best results, the field should contain concise, noise-free content. If there is no clear title in your data, you can leave this parameter blank.
@ -280,7 +270,6 @@ The field (alternatively comma- or space-separated list of fields) that should b
`carrot.url`::
The field that should be mapped to the logical document's content URL. Leave blank if not required.
[[ResultClustering-ClusteringMultilingualContent]]
=== Clustering Multilingual Content
The field mapping specification can include a `carrot.lang` parameter, which defines the field that stores http://www.loc.gov/standards/iso639-2/php/code_list.php[ISO 639-1] code of the language in which the title and content of the document are written. This information can be stored in the index based on apriori knowledge of the documents' source or a language detection filter applied at indexing time. All algorithms inside the Carrot2 framework will accept ISO codes of languages defined in https://github.com/carrot2/carrot2/blob/master/core/carrot2-core/src/org/carrot2/core/LanguageCode.java[LanguageCode enum].
@ -295,15 +284,13 @@ A mapping of arbitrary strings into ISO 639 two-letter codes used by `carrot.lan
The default language can also be set using Carrot2-specific algorithm attributes (in this case the http://doc.carrot2.org/#section.attribute.lingo.MultilingualClustering.defaultLanguage[MultilingualClustering.defaultLanguage] attribute).
[[ResultClustering-TweakingAlgorithmSettings]]
== Tweaking Algorithm Settings
The algorithms that come with Solr use their default settings, which may be inadequate for some data sets. All algorithms have lexical resources and settings (stop words, stemmers, parameters) that may require tweaking to get better clusters (and cluster labels). For Carrot2-based algorithms it is probably best to refer to a dedicated tuning application called Carrot2 Workbench (screenshot below). From this application one can export a set of algorithm attributes as an XML file, which can then be placed under the location pointed to by `carrot.resourcesDir`.
image::images/result-clustering/carrot2-workbench.png[image,scaledwidth=75.0%]
[[ResultClustering-ProvidingDefaults]]
=== Providing Defaults
=== Providing Defaults for Clustering
The default attributes for all engines (algorithms) declared in the clustering component are placed under `carrot.resourcesDir` and with an expected file name of `engineName-attributes.xml`. So for an engine named `lingo` and the default value of `carrot.resourcesDir`, the attributes would be read from a file in `conf/clustering/carrot2/lingo-attributes.xml`.
@ -323,8 +310,7 @@ An example XML file changing the default language of documents to Polish is show
</attribute-sets>
----
[[ResultClustering-TweakingatQuery-Time]]
=== Tweaking at Query-Time
=== Tweaking Algorithms at Query-Time
The clustering component and Carrot2 clustering algorithms can accept query-time attribute overrides. Note that certain things (for example lexical resources) can only be initialized once (at startup, via the XML configuration files).
@ -332,8 +318,7 @@ An example query that changes the `LingoClusteringAlgorithm.desiredClusterCountB
The clustering engine (the algorithm declared in `solrconfig.xml`) can also be changed at runtime by passing `clustering.engine=name` request attribute: http://localhost:8983/solr/techproducts/clustering?q=*:*&rows=100&clustering.engine=kmeans
[[ResultClustering-PerformanceConsiderations]]
== Performance Considerations
== Performance Considerations with Dynamic Clustering
Dynamic clustering of search results comes with two major performance penalties:
@ -349,7 +334,6 @@ For simple queries, the clustering time will usually dominate the fetch time. If
Some of these techniques are described in _Apache SOLR and Carrot2 integration strategies_ document, available at http://carrot2.github.io/solr-integration-strategies. The topic of improving performance is also included in the Carrot2 manual at http://doc.carrot2.org/#section.advanced-topics.fine-tuning.performance.
[[ResultClustering-AdditionalResources]]
== Additional Resources
The following resources provide additional information about the clustering component in Solr and its potential applications.

View File

@ -54,8 +54,7 @@ Object 3
If you ask Solr to group these documents by "product_range", then the total number of groups is 2, but the facets for ppm are 2 for 62 and 1 for 65.
[[ResultGrouping-RequestParameters]]
== Request Parameters
== Grouping Parameters
Result Grouping takes the following request parameters. Any number of these request parameters can be included in a single request:
@ -68,7 +67,7 @@ The name of the field by which to group results. The field must be single-valued
`group.func`::
Group based on the unique values of a function query.
+
NOTE: This option does not work with <<ResultGrouping-DistributedResultGroupingCaveats,distributed searches>>.
NOTE: This option does not work with <<Distributed Result Grouping Caveats,distributed searches>>.
`group.query`::
Return a single group of documents that match the given query.
@ -100,7 +99,7 @@ If `true`, the result of the first field grouping command is used as the main re
`group.ngroups`::
If `true`, Solr includes the number of groups that have matched the query in the results. The default value is false.
+
See below for <<ResultGrouping-DistributedResultGroupingCaveats,Distributed Result Grouping Caveats>> when using sharded indexes
See below for <<Distributed Result Grouping Caveats>> when using sharded indexes.
`group.truncate`::
If `true`, facet counts are based on the most relevant document of each group matching the query. The default value is `false`.
@ -110,7 +109,7 @@ Determines whether to compute grouped facets for the field facets specified in f
+
WARNING: There can be a heavy performance cost to this option.
+
See below for <<ResultGrouping-DistributedResultGroupingCaveats,Distributed Result Grouping Caveats>> when using sharded indexes.
See below for <<Distributed Result Grouping Caveats>> when using sharded indexes.
`group.cache.percent`::
Setting this parameter to a number greater than 0 enables caching for result grouping. Result Grouping executes two searches; this option caches the second search. The default value is `0`. The maximum value is `100`.
@ -119,12 +118,10 @@ Testing has shown that group caching only improves search time with Boolean, wil
Any number of group commands (e.g., `group.field`, `group.func`, `group.query`, etc.) may be specified in a single request.
[[ResultGrouping-Examples]]
== Examples
== Grouping Examples
All of the following sample queries work with Solr's "`bin/solr -e techproducts`" example.
[[ResultGrouping-GroupingResultsbyField]]
=== Grouping Results by Field
In this example, we will group results based on the `manu_exact` field, which specifies the manufacturer of the items in the sample dataset.
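A request along these lines produces the grouped response shown in this section; the query term and `fl` list are illustrative:

[source,text]
----
http://localhost:8983/solr/techproducts/select?q=memory&fl=id,name&group=true&group.field=manu_exact
----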
@ -217,7 +214,6 @@ We can run the same query with the request parameter `group.main=true`. This wil
}
----
[[ResultGrouping-GroupingbyQuery]]
=== Grouping by Query
In this example, we will use the `group.query` parameter to find the top three results for "memory" in two different price ranges: 0.00 to 99.99, and over 100.
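A request along these lines expresses those two ranges as separate group queries:

[source,text]
----
http://localhost:8983/solr/techproducts/select?q=memory&fl=name,price&group=true&group.query=price:[0+TO+99.99]&group.query=price:[100+TO+*]
----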
@ -267,7 +263,6 @@ In this example, we will use the `group.query` parameter to find the top three r
In this case, Solr found five matches for "memory," but only returns four results grouped by price. This is because one result for "memory" did not have a price assigned to it.
[[ResultGrouping-DistributedResultGroupingCaveats]]
== Distributed Result Grouping Caveats
Grouping is supported for <<solrcloud.adoc#solrcloud,distributed searches>>, with some caveats:

View File

@ -26,7 +26,6 @@ The roles can be used with any of the authentication plugins or with a custom au
Once defined through the API, roles are stored in `security.json`.
[[Rule-BasedAuthorizationPlugin-EnabletheAuthorizationPlugin]]
== Enable the Authorization Plugin
The plugin must be enabled in `security.json`. This file and where to put it in your system is described in detail in the section <<authentication-and-authorization-plugins.adoc#AuthenticationandAuthorizationPlugins-EnablePluginswithsecurity.json,Enable Plugins with security.json>>.
@ -61,14 +60,12 @@ There are several things defined in this example:
* The 'admin' role has been defined, and it has permission to edit security settings.
* The 'solr' user has been assigned to the 'admin' role.
[[Rule-BasedAuthorizationPlugin-PermissionAttributes]]
== Permission Attributes
Each role comprises one or more permissions which define what the user is allowed to do. Each permission is made up of several attributes that define the allowed activity. There are some pre-defined permissions which cannot be modified.
The permissions are consulted in the order they appear in `security.json`. The first permission that matches is applied for each user, so the strictest permissions should be at the top of the list. Permissions order can be controlled with a parameter of the Authorization API, as described below.
[[Rule-BasedAuthorizationPlugin-PredefinedPermissions]]
=== Predefined Permissions
There are several permissions that are pre-defined. These have fixed default values, which cannot be modified, and new attributes cannot be added. To use these attributes, simply define a role that includes this permission, and then assign a user to that role.
@ -111,15 +108,12 @@ The pre-defined permissions are:
* *read*: this permission is allowed to perform any read action on any collection. This includes querying using search handlers (using <<requesthandlers-and-searchcomponents-in-solrconfig.adoc#RequestHandlersandSearchComponentsinSolrConfig-SearchHandlers,request handlers>>) such as `/select`, `/get`, `/browse`, `/tvrh`, `/terms`, `/clustering`, `/elevate`, `/export`, `/spell`, and `/sql`. This applies to all collections by default (`collection:"*"`).
* *all*: Any requests coming to Solr.
[[Rule-BasedAuthorizationPlugin-AuthorizationAPI]]
== Authorization API
[[Rule-BasedAuthorizationPlugin-APIEndpoint]]
=== API Endpoint
=== Authorization API Endpoint
`/admin/authorization`: takes a set of commands to create permissions, map permissions to roles, and map roles to users.
[[Rule-BasedAuthorizationPlugin-ManagePermissions]]
=== Manage Permissions
Three commands control managing permissions:
@ -195,7 +189,6 @@ curl --user solr:SolrRocks -H 'Content-type:application/json' -d '{
"set-permission": {"name": "read", "role":"guest"}
}' http://localhost:8983/solr/admin/authorization
[[Rule-BasedAuthorizationPlugin-UpdateorDeletePermissions]]
=== Update or Delete Permissions
Permissions can be accessed using their index in the list. Use the `/admin/authorization` API to see the existing permissions and their indices.
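For example, a hedged sketch of updating and then deleting permissions by index (the index values 3 and 4 are illustrative; inspect your own `security.json` first):

[source,bash]
----
# Change the role attached to the permission at index 3,
# then remove the permission at index 4.
curl --user solr:SolrRocks -H 'Content-type:application/json' -d '{
  "update-permission": {"index": 3, "role": "admin"},
  "delete-permission": 4
}' http://localhost:8983/solr/admin/authorization
----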
@ -216,7 +209,6 @@ curl --user solr:SolrRocks -H 'Content-type:application/json' -d '{
}' http://localhost:8983/solr/admin/authorization
[[Rule-BasedAuthorizationPlugin-MapRolestoUsers]]
=== Map Roles to Users
A single command allows roles to be mapped to users:
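The sketch below illustrates the shape of that command; the user names and role names are hypothetical:

[source,bash]
----
# Assign the 'admin' and 'dev' roles to user 'tom';
# mapping a user to null removes all of that user's roles.
curl --user solr:SolrRocks -H 'Content-type:application/json' -d '{
  "set-user-role": {"tom": ["admin", "dev"], "harry": null}
}' http://localhost:8983/solr/admin/authorization
----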

View File

@ -31,7 +31,6 @@ This feature is used in the following instances:
* Replica creation
* Shard splitting
[[Rule-basedReplicaPlacement-CommonUseCases]]
== Common Use Cases
There are several situations where this functionality may be used. A few of the rules that could be implemented are listed below:
@ -43,7 +42,6 @@ There are several situations where this functionality may be used. A few of the
* Assign replica in nodes hosting less than 5 cores.
* Assign replicas in nodes hosting the least number of cores.
[[Rule-basedReplicaPlacement-RuleConditions]]
== Rule Conditions
A rule is a set of conditions that a node must satisfy before a replica core can be created there.
@ -52,9 +50,8 @@ There are three possible conditions.
* *shard*: this is the name of a shard or a wild card (* means for all shards). If shard is not specified, then the rule applies to the entire collection.
* *replica*: this can be a number or a wild-card (* means any number zero to infinity).
* *tag*: this is an attribute of a node in the cluster that can be used in a rule, e.g., “freedisk”, “cores”, “rack”, “dc”, etc. The tag name can be a custom string. If creating a custom tag, a snitch is responsible for providing tags and values. The section <<Rule-basedReplicaPlacement-Snitches,Snitches>> below describes how to add a custom tag, and defines six pre-defined tags (cores, freedisk, host, port, node, and sysprop).
* *tag*: this is an attribute of a node in the cluster that can be used in a rule, e.g., “freedisk”, “cores”, “rack”, “dc”, etc. The tag name can be a custom string. If creating a custom tag, a snitch is responsible for providing tags and values. The section <<Snitches>> below describes how to add a custom tag, and defines six pre-defined tags (cores, freedisk, host, port, node, and sysprop).
[[Rule-basedReplicaPlacement-RuleOperators]]
=== Rule Operators
A condition can have one of the following operators to set the parameters for the rule.
@ -64,25 +61,20 @@ A condition can have one of the following operators to set the parameters for th
* *less than (<)*: `tag:<x` means tag value less than x. x must be a number
* *not equal (!)*: `tag:!x` means tag value MUST NOT be equal to x. The equals check is performed on String value
[[Rule-basedReplicaPlacement-FuzzyOperator_]]
=== Fuzzy Operator (~)
This can be used as a suffix to any condition. This would first try to satisfy the rule strictly. If Solr can't find enough nodes to match the criterion, it tries to find the next best match which may not satisfy the criterion. For example, if we have a rule such as `freedisk:>200~`, Solr will try to assign replicas of this collection on nodes with more than 200GB of free disk space. If that is not possible, the node which has the most free disk space will be chosen instead.
[[Rule-basedReplicaPlacement-ChoosingAmongEquals]]
=== Choosing Among Equals
Nodes are sorted using the rules, which ensures that even if many nodes match the rules, the best nodes are picked for replica assignment. For example, if there is a rule such as `freedisk:>20`, nodes are sorted first on disk space descending and the node with the most disk space is picked up first. Or, if the rule is `cores:<5`, nodes are sorted with number of cores ascending and the node with the least number of cores is picked up first.
[[Rule-basedReplicaPlacement-Rulesfornewshards]]
== Rules for new shards
== Rules for New Shards
The rules are persisted along with collection state. So, when a new replica is created, the system will assign replicas satisfying the rules. When a new shard is created as a result of using the Collection API's <<collections-api.adoc#CollectionsAPI-createshard,CREATESHARD command>>, ensure that you have created rules specific for that shard name. Rules can be altered using the <<collections-api.adoc#CollectionsAPI-modifycollection,MODIFYCOLLECTION command>>. However, it is not required to do so if the rules do not specify explicit shard names. For example, a rule such as `shard:shard1,replica:*,ip_3:168` will not apply to any new shard created. But, if your rule is `replica:*,ip_3:168`, then it will apply to any new shard created.
The same is applicable to shard splitting. Shard splitting is treated exactly the same way as shard creation. Even though `shard1_1` and `shard1_2` may be created from `shard1`, the rules treat them as distinct, unrelated shards.
[[Rule-basedReplicaPlacement-Snitches]]
== Snitches
Tag values come from a plugin called Snitch. If there is a tag named rack in a rule, there must be a Snitch which provides the value of rack for each node in the cluster. A snitch implements the Snitch interface. Solr provides a default snitch which provides the following tags:
@ -96,7 +88,6 @@ Tag values come from a plugin called Snitch. If there is a tag named rack
* *ip_1, ip_2, ip_3, ip_4*: These are IP fragments for each node. For example, in a host with IP `192.168.1.2`, `ip_1 = 2`, `ip_2 = 1`, `ip_3 = 168`, and `ip_4 = 192`
* *sysprop.{PROPERTY_NAME}*: These are values available from system properties. `sysprop.key` means a value that is passed to the node as `-Dkey=keyValue` during the node startup. It is possible to use rules like `sysprop.key:expectedVal,shard:*`
[[Rule-basedReplicaPlacement-HowSnitchesareConfigured]]
=== How Snitches are Configured
It is possible to use one or more snitches for a set of rules. If the rules only need tags from the default snitch, it need not be explicitly configured. For example:
@ -114,11 +105,8 @@ snitch=class:fqn.ClassName,key1:val1,key2:val2,key3:val3
. After identifying the Snitches, they provide the tag values for each node in the cluster.
. If the value for a tag is not obtained for a given node, it cannot participate in the assignment.
[[Rule-basedReplicaPlacement-Examples]]
== Examples
== Replica Placement Examples
[[Rule-basedReplicaPlacement-Keeplessthan2replicas_atmost1replica_ofthiscollectiononanynode]]
=== Keep less than 2 replicas (at most 1 replica) of this collection on any node
For this rule, we define the `replica` condition with operators for "less than 2", and use a pre-defined tag named `node` to define nodes with any name.
@ -129,8 +117,6 @@ replica:<2,node:*
// this is equivalent to replica:<2,node:*,shard:**. We can omit shard:** because ** is the default value of shard
----
[[Rule-basedReplicaPlacement-Foragivenshard_keeplessthan2replicasonanynode]]
=== For a given shard, keep less than 2 replicas on any node
For this rule, we use the `shard` condition to define any shard, the `replica` condition with operators for "less than 2", and finally a pre-defined tag named `node` to define nodes with any name.
@ -140,7 +126,6 @@ For this rule, we use the `shard` condition to define any shard , the `replica`
shard:*,replica:<2,node:*
----
[[Rule-basedReplicaPlacement-Assignallreplicasinshard1torack730]]
=== Assign all replicas in shard1 to rack 730
This rule limits the `shard` condition to 'shard1', but allows any number of replicas. We're also referencing a custom tag named `rack`. Before defining this rule, we will need to configure a custom Snitch which provides values for the tag `rack`.
@ -157,7 +142,6 @@ In this case, the default value of `replica` is * (or, all replicas). So, it can
shard:shard1,rack:730
----
[[Rule-basedReplicaPlacement-Createreplicasinnodeswithlessthan5coresonly]]
=== Create replicas in nodes with less than 5 cores only
This rule uses the `replica` condition to define any number of replicas, but adds the pre-defined tag named `cores` and uses operators for "less than 5".
@ -174,7 +158,6 @@ Again, we can simplify this to use the default value for `replica`, like so:
cores:<5
----
[[Rule-basedReplicaPlacement-Donotcreateanyreplicasinhost192.45.67.3]]
=== Do not create any replicas in host 192.45.67.3
This rule uses only the pre-defined tag `host` to define an IP address where replicas should not be placed.
@ -184,7 +167,6 @@ This rule uses only the pre-defined tag `host` to define an IP address where rep
host:!192.45.67.3
----
[[Rule-basedReplicaPlacement-DefiningRules]]
== Defining Rules
Rules are specified per collection during collection creation as request parameters. It is possible to specify multiple rule and snitch params as in this example:
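The sketch below shows one possible shape of such a request; the collection name, shard and replica counts, and rule values are illustrative only:

[source,bash]
----
# Create a collection with two placement rules: at most one replica
# per node, and only nodes with more than 100GB of free disk space.
curl -G "http://localhost:8983/solr/admin/collections" \
  --data-urlencode "action=CREATE" \
  --data-urlencode "name=mycollection" \
  --data-urlencode "numShards=2" \
  --data-urlencode "replicationFactor=2" \
  --data-urlencode "rule=replica:<2,node:*" \
  --data-urlencode "rule=freedisk:>100"
----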

View File

@ -31,7 +31,6 @@ Schemaless mode requires enabling the Managed Schema if it is not already, but f
While the "read" features of the Schema API are supported for all schema types, support for making schema modifications programatically depends on the `<schemaFactory/>` in use.
[[SchemaFactoryDefinitioninSolrConfig-SolrUsesManagedSchemabyDefault]]
== Solr Uses Managed Schema by Default
When a `<schemaFactory/>` is not explicitly declared in a `solrconfig.xml` file, Solr implicitly uses a `ManagedIndexSchemaFactory`, which is by default `"mutable"` and keeps schema information in a `managed-schema` file.
@ -54,7 +53,6 @@ If you wish to explicitly configure `ManagedIndexSchemaFactory` the following op
With the default configuration shown above, you can use the <<schema-api.adoc#schema-api,Schema API>> to modify the schema as much as you want, and then later change the value of `mutable` to *false* if you wish to "lock" the schema in place and prevent future changes.
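For example, a minimal sketch of adding a field through the Schema API (the collection name and field definition are placeholders):

[source,bash]
----
# Add a new stored date field to the managed schema.
curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field": {"name": "sell_by", "type": "pdate", "stored": true}
}' http://localhost:8983/solr/mycollection/schema
----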
[[SchemaFactoryDefinitioninSolrConfig-Classicschema.xml]]
== Classic schema.xml
An alternative to using a managed schema is to explicitly configure a `ClassicIndexSchemaFactory`. `ClassicIndexSchemaFactory` requires the use of a `schema.xml` configuration file, and disallows any programmatic changes to the Schema at run time. The `schema.xml` file must be edited manually and is only loaded when the collection is loaded.
@ -64,7 +62,6 @@ An alternative to using a managed schema is to explicitly configure a `ClassicIn
<schemaFactory class="ClassicIndexSchemaFactory"/>
----
[[SchemaFactoryDefinitioninSolrConfig-Switchingfromschema.xmltoManagedSchema]]
=== Switching from schema.xml to Managed Schema
If you have an existing Solr collection that uses `ClassicIndexSchemaFactory`, and you wish to convert to use a managed schema, you can simply modify the `solrconfig.xml` to specify the use of the `ManagedIndexSchemaFactory`.
@ -78,7 +75,6 @@ Once Solr is restarted and it detects that a `schema.xml` file exists, but the `
You are now free to use the <<schema-api.adoc#schema-api,Schema API>> as much as you want to make changes, and remove the `schema.xml.bak`.
[[SchemaFactoryDefinitioninSolrConfig-SwitchingfromManagedSchematoManuallyEditedschema.xml]]
=== Switching from Managed Schema to Manually Edited schema.xml
If you have started Solr with managed schema enabled and you would like to switch to manually editing a `schema.xml` file, you should take the following steps:

View File

@ -26,7 +26,6 @@ These Solr features, all controlled via `solrconfig.xml`, are:
. Field value class guessing: Previously unseen fields are run through a cascading set of value-based parsers, which guess the Java class of field values - parsers for Boolean, Integer, Long, Float, Double, and Date are currently available.
. Automatic schema field addition, based on field value class(es): Previously unseen fields are added to the schema, based on field value Java classes, which are mapped to schema field types - see <<solr-field-types.adoc#solr-field-types,Solr Field Types>>.
[[SchemalessMode-UsingtheSchemalessExample]]
== Using the Schemaless Example
The three features of schemaless mode are pre-configured in the `_default` <<config-sets.adoc#config-sets,config set>> in the Solr distribution. To start an example instance of Solr using these configs, run the following command:
@ -67,12 +66,10 @@ You can use the `/schema/fields` <<schema-api.adoc#schema-api,Schema API>> to co
"uniqueKey":true}]}
----
[[SchemalessMode-ConfiguringSchemalessMode]]
== Configuring Schemaless Mode
As described above, there are three configuration elements that need to be in place to use Solr in schemaless mode. In the `_default` config set included with Solr these are already configured. If, however, you would like to implement schemaless on your own, you should make the following changes.
[[SchemalessMode-EnableManagedSchema]]
=== Enable Managed Schema
As described in the section <<schema-factory-definition-in-solrconfig.adoc#schema-factory-definition-in-solrconfig,Schema Factory Definition in SolrConfig>>, Managed Schema support is enabled by default, unless your configuration specifies that `ClassicIndexSchemaFactory` should be used.
@ -87,7 +84,6 @@ You can configure the `ManagedIndexSchemaFactory` (and control the resource file
</schemaFactory>
----
[[SchemalessMode-DefineanUpdateRequestProcessorChain]]
=== Define an UpdateRequestProcessorChain
The UpdateRequestProcessorChain allows Solr to guess field types, and you can define the default field type classes to use. To start, you should define it as follows (see the javadoc links below for update processor factory documentation):
@ -174,7 +170,6 @@ Javadocs for update processor factories mentioned above:
* {solr-javadocs}/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html[ParseDateFieldUpdateProcessorFactory]
* {solr-javadocs}/solr-core/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.html[AddSchemaFieldsUpdateProcessorFactory]
[[SchemalessMode-MaketheUpdateRequestProcessorChaintheDefaultfortheUpdateRequestHandler]]
=== Make the UpdateRequestProcessorChain the Default for the UpdateRequestHandler
Once the UpdateRequestProcessorChain has been defined, you must instruct your UpdateRequestHandlers to use it when working with index updates (i.e., adding, removing, replacing documents). There are two ways to do this. The update chain shown above has a `default=true` attribute which will use it for any update handler. An alternative, more explicit way is to use <<initparams-in-solrconfig.adoc#initparams-in-solrconfig,InitParams>> to set the defaults on all `/update` request handlers:
@ -193,7 +188,6 @@ Once the UpdateRequestProcessorChain has been defined, you must instruct your Up
After each of these changes have been made, Solr should be restarted (or, you can reload the cores to load the new `solrconfig.xml` definitions).
====
[[SchemalessMode-ExamplesofIndexedDocuments]]
== Examples of Indexed Documents
Once the schemaless mode has been enabled (whether you configured it manually or are using `_default`), documents that include fields that are not defined in your schema will be indexed, using the guessed field types which are automatically added to the schema.
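As a quick sketch, a document with previously unseen fields can simply be posted to the update handler; the field names and values below are made up for illustration:

[source,bash]
----
# "Album" and "Sold" are not yet in the schema; schemaless mode will
# guess their types and add them to the managed schema automatically.
curl "http://localhost:8983/solr/gettingstarted/update?commit=true" \
  -H 'Content-type:application/json' \
  -d '[{"id": "1", "Album": "The Brandenburg Concertos", "Sold": 338000}]'
----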
@ -243,13 +237,14 @@ The fields now in the schema (output from `curl \http://localhost:8983/solr/gett
"name":"Sold",
"type":"plongs"},
{
"name":"_root_" ...}
"name":"_root_", ...},
{
"name":"_text_" ...}
"name":"_text_", ...},
{
"name":"_version_" ...}
"name":"_version_", ...},
{
"name":"id" ...}
"name":"id", ...}
]}
----
In addition, string versions of the text fields are indexed, using copyFields to a `*_str` dynamic field (output from `curl \http://localhost:8983/solr/gettingstarted/schema/copyfields`):
@ -277,7 +272,7 @@ Even if you want to use schemaless mode for most fields, you can still use the <
Internally, the Schema API and the Schemaless Update Processors both use the same <<schema-factory-definition-in-solrconfig.adoc#schema-factory-definition-in-solrconfig,Managed Schema>> functionality.
Also, if you do not need the `*_str` version of a text field, you can simply remove the `copyField` definition from the auto-generated schema and it will not be re-added since the original field is now defined.
Also, if you do not need the `*_str` version of a text field, you can simply remove the `copyField` definition from the auto-generated schema and it will not be re-added since the original field is now defined.
====
Once a field has been added to the schema, its field type is fixed. As a consequence, adding documents with field value(s) that conflict with the previously guessed field type will fail. For example, after adding the above document, the "```Sold```" field has the fieldType `plongs`, but the document below has a non-integral decimal value in this field:

View File

@ -40,7 +40,6 @@ For example, if you only have two ZooKeeper nodes and one goes down, 50% of avai
More information on ZooKeeper clusters is available from the ZooKeeper documentation at http://zookeeper.apache.org/doc/r3.4.10/zookeeperAdmin.html#sc_zkMulitServerSetup.
[[SettingUpanExternalZooKeeperEnsemble-DownloadApacheZooKeeper]]
== Download Apache ZooKeeper
The first step in setting up Apache ZooKeeper is, of course, to download the software. It's available from http://zookeeper.apache.org/releases.html.
@ -52,15 +51,12 @@ When using stand-alone ZooKeeper, you need to take care to keep your version of
Solr currently uses Apache ZooKeeper v3.4.10.
====
[[SettingUpanExternalZooKeeperEnsemble-SettingUpaSingleZooKeeper]]
== Setting Up a Single ZooKeeper
[[SettingUpanExternalZooKeeperEnsemble-Createtheinstance]]
=== Create the instance
=== Create the Instance
Creating the instance is a simple matter of extracting the files into a specific target directory. The actual directory itself doesn't matter, as long as you know where it is, and where you'd like to have ZooKeeper store its internal data.
[[SettingUpanExternalZooKeeperEnsemble-Configuretheinstance]]
=== Configure the instance
=== Configure the Instance
The next step is to configure your ZooKeeper instance. To do that, create the following file: `<ZOOKEEPER_HOME>/conf/zoo.cfg`. To this file, add the following information:
[source,bash]
@ -80,15 +76,13 @@ The parameters are as follows:
Once this file is in place, you're ready to start the ZooKeeper instance.
[[SettingUpanExternalZooKeeperEnsemble-Runtheinstance]]
=== Run the instance
=== Run the Instance
To run the instance, you can simply use the `ZOOKEEPER_HOME/bin/zkServer.sh` script provided, as with this command: `zkServer.sh start`
Again, ZooKeeper provides a great deal of power through additional configurations, but delving into them is beyond the scope of this tutorial. For more information, see the ZooKeeper http://zookeeper.apache.org/doc/r3.4.5/zookeeperStarted.html[Getting Started] page. For this example, however, the defaults are fine.
[[SettingUpanExternalZooKeeperEnsemble-PointSolrattheinstance]]
=== Point Solr at the instance
=== Point Solr at the Instance
Pointing Solr at the ZooKeeper instance you've created is a simple matter of using the `-z` parameter when using the bin/solr script. For example, in order to point the Solr instance to the ZooKeeper you've started on port 2181, this is what you'd need to do:
@ -108,12 +102,10 @@ bin/solr start -cloud -s <path to solr home for new node> -p 8987 -z localhost:2
NOTE: When you are not using an example to start Solr, make sure you upload the configuration set to ZooKeeper before creating the collection.
[[SettingUpanExternalZooKeeperEnsemble-ShutdownZooKeeper]]
=== Shut down ZooKeeper
=== Shut Down ZooKeeper
To shut down ZooKeeper, use the zkServer script with the "stop" command: `zkServer.sh stop`.
[[SettingUpanExternalZooKeeperEnsemble-SettingupaZooKeeperEnsemble]]
== Setting up a ZooKeeper Ensemble
With an external ZooKeeper ensemble, you need to set things up just a little more carefully as compared to the Getting Started example.
@ -188,8 +180,7 @@ Once these servers are running, you can reference them from Solr just as you did
bin/solr start -e cloud -z localhost:2181,localhost:2182,localhost:2183 -noprompt
----
[[SettingUpanExternalZooKeeperEnsemble-SecuringtheZooKeeperconnection]]
== Securing the ZooKeeper connection
== Securing the ZooKeeper Connection
You may also want to secure the communication between ZooKeeper and Solr.

View File

@ -24,7 +24,6 @@ IMPORTANT: This requires Apache Zeppelin 0.6.0 or greater which contains the JDB
To use http://zeppelin.apache.org[Apache Zeppelin] with Solr, you will need to create a JDBC interpreter for Solr. This will add SolrJ to the interpreter classpath. Once the interpreter has been created, you can create a notebook to issue queries. The http://zeppelin.apache.org/docs/latest/interpreter/jdbc.html[Apache Zeppelin JDBC interpreter documentation] provides additional information about JDBC prefixes and other features.
[[SolrJDBC-ApacheZeppelin-CreatetheApacheSolrJDBCInterpreter]]
== Create the Apache Solr JDBC Interpreter
.Click "Interpreter" in the top navigation
@ -41,7 +40,6 @@ image::images/solr-jdbc-apache-zeppelin/zeppelin_solrjdbc_3.png[image,height=400
For most installations, Apache Zeppelin configures PostgreSQL as the JDBC interpreter default driver. The default driver can either be replaced by the Solr driver as outlined above or you can add a separate JDBC interpreter prefix as outlined in the http://zeppelin.apache.org/docs/latest/interpreter/jdbc.html[Apache Zeppelin JDBC interpreter documentation].
====
[[SolrJDBC-ApacheZeppelin-CreateaNotebook]]
== Create a Notebook
.Click Notebook \-> Create new note
@ -50,7 +48,6 @@ image::images/solr-jdbc-apache-zeppelin/zeppelin_solrjdbc_4.png[image,width=517,
.Provide a name and click "Create Note"
image::images/solr-jdbc-apache-zeppelin/zeppelin_solrjdbc_5.png[image,width=839,height=400]
[[SolrJDBC-ApacheZeppelin-QuerywiththeNotebook]]
== Query with the Notebook
[IMPORTANT]

View File

@ -27,10 +27,8 @@ For https://www.dbvis.com/[DbVisualizer], you will need to create a new driver f
Once the driver has been created, you can create a connection to Solr with the connection string format outlined in the generic section and use the SQL Commander to issue queries.
[[SolrJDBC-DbVisualizer-SetupDriver]]
== Setup Driver
[[SolrJDBC-DbVisualizer-OpenDriverManager]]
=== Open Driver Manager
From the Tools menu, choose Driver Manager to add a driver.
@ -38,21 +36,18 @@ From the Tools menu, choose Driver Manager to add a driver.
image::images/solr-jdbc-dbvisualizer/dbvisualizer_solrjdbc_1.png[image,width=673,height=400]
[[SolrJDBC-DbVisualizer-CreateaNewDriver]]
=== Create a New Driver
image::images/solr-jdbc-dbvisualizer/dbvisualizer_solrjdbc_2.png[image,width=532,height=400]
[[SolrJDBC-DbVisualizer-NametheDriver]]
=== Name the Driver
=== Name the Driver in Driver Manager
Provide a name for the driver, and provide the URL format: `jdbc:solr://<zk_connection_string>/?collection=<collection>`. Do not fill in values for the variables "```zk_connection_string```" and "```collection```", those will be provided later when the connection to Solr is configured. The Driver Class will also be automatically added when the driver .jars are added.
image::images/solr-jdbc-dbvisualizer/dbvisualizer_solrjdbc_3.png[image,width=532,height=400]
[[SolrJDBC-DbVisualizer-AddDriverFilestoClasspath]]
=== Add Driver Files to Classpath
The driver files to be added are:
@ -75,17 +70,14 @@ image::images/solr-jdbc-dbvisualizer/dbvisualizer_solrjdbc_7.png[image,width=655
image::images/solr-jdbc-dbvisualizer/dbvisualizer_solrjdbc_9.png[image,width=651,height=400]
[[SolrJDBC-DbVisualizer-ReviewandCloseDriverManager]]
=== Review and Close Driver Manager
Once the driver files have been added, you can close the Driver Manager.
[[SolrJDBC-DbVisualizer-CreateaConnection]]
== Create a Connection
Next, create a connection to Solr using the driver just created.
[[SolrJDBC-DbVisualizer-UsetheConnectionWizard]]
=== Use the Connection Wizard
image::images/solr-jdbc-dbvisualizer/dbvisualizer_solrjdbc_11.png[image,width=763,height=400]
@ -94,19 +86,16 @@ image::images/solr-jdbc-dbvisualizer/dbvisualizer_solrjdbc_11.png[image,width=76
image::images/solr-jdbc-dbvisualizer/dbvisualizer_solrjdbc_12.png[image,width=807,height=400]
[[SolrJDBC-DbVisualizer-NametheConnection]]
=== Name the Connection
image::images/solr-jdbc-dbvisualizer/dbvisualizer_solrjdbc_13.png[image,width=402,height=400]
[[SolrJDBC-DbVisualizer-SelecttheSolrdriver]]
=== Select the Solr driver
image::images/solr-jdbc-dbvisualizer/dbvisualizer_solrjdbc_14.png[image,width=399,height=400]
[[SolrJDBC-DbVisualizer-SpecifytheSolrURL]]
=== Specify the Solr URL
Provide the Solr URL, using the ZooKeeper host and port and the collection. For example, `jdbc:solr://localhost:9983?collection=test`
@ -114,7 +103,6 @@ Provide the Solr URL, using the ZooKeeper host and port and the collection. For
image::images/solr-jdbc-dbvisualizer/dbvisualizer_solrjdbc_15.png[image,width=401,height=400]
[[SolrJDBC-DbVisualizer-OpenandConnecttoSolr]]
== Open and Connect to Solr
Once the connection has been created, double-click on it to open the connection details screen and connect to Solr.
@ -125,7 +113,6 @@ image::images/solr-jdbc-dbvisualizer/dbvisualizer_solrjdbc_16.png[image,width=62
image::images/solr-jdbc-dbvisualizer/dbvisualizer_solrjdbc_17.png[image,width=592,height=400]
[[SolrJDBC-DbVisualizer-OpenSQLCommandertoEnterQueries]]
== Open SQL Commander to Enter Queries
When the connection is established, you can use the SQL Commander to issue queries and view data.

View File

@ -42,7 +42,6 @@ There are four main field types available for spatial search:
Some esoteric details that are not in this guide can be found at http://wiki.apache.org/solr/SpatialSearch.
[[SpatialSearch-LatLonPointSpatialField]]
== LatLonPointSpatialField
Here's how `LatLonPointSpatialField` (LLPSF) should usually be configured in the schema:
@ -52,7 +51,6 @@ Here's how `LatLonPointSpatialField` (LLPSF) should usually be configured in the
LLPSF supports toggling `indexed`, `stored`, `docValues`, and `multiValued`. LLPSF internally uses a 2-dimensional Lucene "Points" (BKD tree) index when "indexed" is enabled (the default). When "docValues" is enabled, a latitude and longitude pair are bit-interleaved into 64 bits and put into Lucene DocValues. The accuracy of the docValues data is about a centimeter.
[[SpatialSearch-IndexingPoints]]
== Indexing Points
For indexing geodetic points (latitude and longitude), supply it in "lat,lon" order (comma separated).
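For example, a small sketch assuming a collection with a location field named `store` (as in the techproducts example):

[source,bash]
----
# Index a single document with a geodetic point in "lat,lon" order.
curl "http://localhost:8983/solr/techproducts/update?commit=true" \
  -H 'Content-type:application/json' \
  -d '[{"id": "store1", "name": "Example Store", "store": "45.15,-93.85"}]'
----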
@ -61,7 +59,6 @@ For indexing non-geodetic points, it depends. Use `x y` (a space) if RPT. For Po
If you'd rather use a standard industry format, Solr supports WKT and GeoJSON. However, it's much bulkier than the raw coordinates for such simple data. (Not supported by the deprecated LatLonType or PointType)
[[SpatialSearch-SearchingwithQueryParsers]]
== Searching with Query Parsers
There are two spatial Solr "query parsers" for geospatial search: `geofilt` and `bbox`. They take the following parameters:
@ -100,7 +97,6 @@ When used with `BBoxField`, additional options are supported:
(Advanced option; not supported by LatLonType (deprecated) or PointType). If you only want the query to score (with the above `score` local parameter), not filter, then set this local parameter to false.
[[SpatialSearch-geofilt]]
=== geofilt
The `geofilt` filter allows you to retrieve results based on the geospatial distance (AKA the "great circle distance") from a given point. Another way of looking at it is that it creates a circular shape filter. For example, to find all documents within five kilometers of a given lat/lon point, you could enter `&q=*:*&fq={!geofilt sfield=store}&pt=45.15,-93.85&d=5`. This filter returns all results within a circle of the given radius around the initial point:
@ -108,7 +104,6 @@ The `geofilt` filter allows you to retrieve results based on the geospatial dist
image::images/spatial-search/circle.png[5KM radius]
[[SpatialSearch-bbox]]
=== bbox
The `bbox` filter is very similar to `geofilt` except it uses the _bounding box_ of the calculated circle. See the blue box in the diagram below. It takes the same parameters as geofilt.
@ -126,7 +121,6 @@ image::images/spatial-search/bbox.png[Bounding box]
When a bounding box includes a pole, the bounding box ends up being a "bounding bowl" (a _spherical cap_) that includes all values north of the lowest latitude of the circle if it touches the north pole (or south of the highest latitude if it touches the south pole).
====
[[SpatialSearch-Filteringbyanarbitraryrectangle]]
=== Filtering by an Arbitrary Rectangle
Sometimes the spatial search requirement calls for finding everything in a rectangular area, such as the area covered by a map the user is looking at. For this case, geofilt and bbox won't cut it. This is somewhat of a trick, but you can use Solr's range query syntax for this by supplying the lower-left corner as the start of the range and the upper-right corner as the end of the range.
@ -138,7 +132,6 @@ Here's an example:
LatLonType (deprecated) does *not* support rectangles that cross the dateline. For RPT and BBoxField, if you are using non-geospatial coordinates (`geo="false"`), then you must quote the points due to the space, e.g., `"x y"`.
[[SpatialSearch-Optimizing_CacheorNot]]
=== Optimizing: Cache or Not
It's most common to put a spatial query into an `fq` parameter, that is, a filter query. By default, Solr will cache the query in the filter cache.
@ -149,7 +142,6 @@ If you know the filter query (be it spatial or not) is fairly unique and not lik
LLPSF does not support Solr's "PostFilter".
[[SpatialSearch-DistanceSortingorBoosting_FunctionQueries_]]
== Distance Sorting or Boosting (Function Queries)
There are four distance function queries:
@ -161,7 +153,6 @@ There are four distance function queries:
For more information about these function queries, see the section on <<function-queries.adoc#function-queries,Function Queries>>.
[[SpatialSearch-geodist]]
=== geodist
`geodist` is a distance function that takes three optional parameters: `(sfield,latitude,longitude)`. You can use the `geodist` function to sort results by distance or to return the distance as the document score.
@ -170,19 +161,16 @@ For example, to sort your results by ascending distance, enter `...&q=*:*&fq={!g
To return the distance as the document score, enter `...&q={!func}geodist()&sfield=store&pt=45.15,-93.85&sort=score+asc`.
[[SpatialSearch-MoreExamples]]
== More Examples
== More Spatial Search Examples
Here are a few more useful examples of what you can do with spatial search in Solr.
[[SpatialSearch-UseasaSub-QuerytoExpandSearchResults]]
=== Use as a Sub-Query to Expand Search Results
Here we will query for results in Jacksonville, Florida, or within 50 kilometers of 45.15,-93.85 (near Buffalo, Minnesota):
`&q=*:*&fq=(state:"FL" AND city:"Jacksonville") OR {!geofilt}&sfield=store&pt=45.15,-93.85&d=50&sort=geodist()+asc`
[[SpatialSearch-FacetbyDistance]]
=== Facet by Distance
To facet by distance, you can use the Frange query parser:
@ -191,14 +179,12 @@ To facet by distance, you can use the Frange query parser:
There are other ways to do it too, like using a \{!geofilt} in each facet.query.
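As a hedged sketch of the `frange` approach, assuming the `store` field and the point used in the earlier examples:

[source,bash]
----
# Facet documents into two distance bands around the given point.
curl -G "http://localhost:8983/solr/techproducts/select" \
  --data-urlencode "q=*:*" \
  --data-urlencode "sfield=store" \
  --data-urlencode "pt=45.15,-93.85" \
  --data-urlencode "facet=true" \
  --data-urlencode 'facet.query={!frange l=0 u=5}geodist()' \
  --data-urlencode 'facet.query={!frange l=5.001 u=3000}geodist()'
----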
[[SpatialSearch-BoostNearestResults]]
=== Boost Nearest Results
Using the <<the-dismax-query-parser.adoc#the-dismax-query-parser,DisMax>> or <<the-extended-dismax-query-parser.adoc#the-extended-dismax-query-parser,Extended DisMax>>, you can combine spatial search with the boost function to boost the nearest results:
`&q.alt=*:*&fq={!geofilt}&sfield=store&pt=45.15,-93.85&d=50&bf=recip(geodist(),2,200,20)&sort=score desc`
[[SpatialSearch-RPT]]
== RPT
RPT refers to either `SpatialRecursivePrefixTreeFieldType` (aka simply RPT) or an extended version: `RptWithGeometrySpatialField` (aka RPT with Geometry). RPT offers several functional improvements over LatLonPointSpatialField:
@ -215,8 +201,7 @@ RPT _shares_ various features in common with `LatLonPointSpatialField`. Some are
* Sort/boost via `geodist`
* Well-Known-Text (WKT) shape syntax (required for specifying polygons & other complex shapes), and GeoJSON too. In addition to indexing and searching, this works with the `wt=geojson` (GeoJSON Solr response-writer) and `[geo f=myfield]` (geo Solr document-transformer).
[[SpatialSearch-Schemaconfiguration]]
=== Schema Configuration
=== Schema Configuration for RPT
To use RPT, the field type must be registered and configured in `schema.xml`. There are many options for this field type.
@ -266,7 +251,6 @@ A third choice is `packedQuad`, which is generally more efficient than `quad`, p
*_And there are others:_* `normWrapLongitude`, `datelineRule`, `validationRule`, `autoIndex`, `allowMultiOverlap`, `precisionModel`. For further info, see notes below about `spatialContextFactory` implementations referenced above, especially the link to the JTS based one.
[[SpatialSearch-JTSandPolygons]]
=== JTS and Polygons
As indicated above, `spatialContextFactory` must be set to `JTS` for polygon support, including multi-polygon.
@ -297,7 +281,6 @@ Inside the parenthesis following the search predicate is the shape definition. T
Beyond this Reference Guide and Spatial4j's docs, there are some details that remain at the Solr Wiki at http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4.
[[SpatialSearch-RptWithGeometrySpatialField]]
=== RptWithGeometrySpatialField
The `RptWithGeometrySpatialField` field type is a derivative of `SpatialRecursivePrefixTreeFieldType` that also stores the original geometry internally in Lucene DocValues, which it uses to achieve accurate search. It can also be used for indexed point fields. The Intersects predicate (the default) is particularly fast, since many search results can be returned as an accurate hit without requiring a geometry check. This field type is configured just like RPT except that the default `distErrPct` is 0.15 (higher than 0.025) because the grid squares are purely for performance and not to fundamentally represent the shape.
@ -316,7 +299,6 @@ An optional in-memory cache can be defined in `solrconfig.xml`, which should be
When using this field type, you will likely _not_ want to mark the field as stored because it's redundant with the DocValues data and surely larger because of the formatting (be it WKT or GeoJSON). To retrieve the spatial data in search results from DocValues, use the `[geo]` transformer -- <<transforming-result-documents.adoc#transforming-result-documents,Transforming Result Documents>>.
[[SpatialSearch-HeatmapFaceting]]
=== Heatmap Faceting
The RPT field supports generating a 2D grid of facet counts for documents having spatial data in each grid cell. For high-detail grids, this can be used to plot points, and for lesser detail it can be used for heatmap generation. The grid cells are determined at index-time based on RPT's configuration. At facet counting time, the indexed cells in the region of interest are traversed and a grid of counters corresponding to each cell is incremented. Solr can return the data in a straightforward 2D array of integers or in a PNG which compresses better for larger data sets but must be decoded.
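A minimal sketch of a heatmap request, assuming an RPT field named `store_rpt` (both the field name and the bounding geometry below are placeholders):

[source,bash]
----
# Request a grid of counts over the given rectangle.
curl -G "http://localhost:8983/solr/techproducts/select" \
  --data-urlencode "q=*:*" \
  --data-urlencode "facet=true" \
  --data-urlencode "facet.heatmap=store_rpt" \
  --data-urlencode 'facet.heatmap.geom=["-180 -90" TO "180 90"]' \
  --data-urlencode "facet.heatmap.distErrPct=0.15"
----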
@ -365,7 +347,6 @@ The `counts_ints2D` key has a 2D array of integers. The initial outer level is i
If `format=png`, then the output key is `counts_png`. It's a base-64 encoded string of a 4-byte PNG. The PNG logically holds exactly the same data that the ints2D format does. Note that the alpha channel byte is flipped to make it easier to view the PNG for diagnostic purposes, since otherwise counts would have to exceed 2^24 before it becomes non-opaque. Thus counts greater than this value will become opaque.
[[SpatialSearch-BBoxField]]
== BBoxField
The `BBoxField` field type indexes a single rectangle (bounding box) per document field and supports searching via a bounding box. It supports most spatial search predicates, and it has enhanced relevancy modes based on the overlap or area between the search rectangle and the indexed rectangle. It's particularly useful for its relevancy modes. To configure it in the schema, use a configuration like this:

View File

@ -36,7 +36,6 @@ The `solrconfig.xml` found in Solr's "```techproducts```" example has a Suggeste
The "```techproducts```" example `solrconfig.xml` has a `suggest` search component and a `/suggest` request handler already configured. You can use that as the basis for your configuration, or create it from scratch, as detailed below.
[[Suggester-AddingtheSuggestSearchComponent]]
== Adding the Suggest Search Component
The first step is to add a search component to `solrconfig.xml` and tell it to use the SuggestComponent. Here is some sample code that could be used.
@ -56,7 +55,6 @@ The first step is to add a search component to `solrconfig.xml` and tell it to u
</searchComponent>
----
[[Suggester-SuggesterSearchComponentParameters]]
=== Suggester Search Component Parameters
The Suggester search component takes several configuration parameters.
@ -72,10 +70,10 @@ Arbitrary name for the search component.
A symbolic name for this suggester. You can refer to this name in the URL parameters and in the SearchHandler configuration. It is possible to have multiples of these in one `solrconfig.xml` file.
`lookupImpl`::
Lookup implementation. There are several possible implementations, described below in the section <<Suggester-LookupImplementations,Lookup Implementations>>. If not set, the default lookup is `JaspellLookupFactory`.
Lookup implementation. There are several possible implementations, described below in the section <<Lookup Implementations>>. If not set, the default lookup is `JaspellLookupFactory`.
`dictionaryImpl`::
The dictionary implementation to use. There are several possible implementations, described below in the section <<Suggester-DictionaryImplementations,Dictionary Implementations>>.
The dictionary implementation to use. There are several possible implementations, described below in the section <<Dictionary Implementations>>.
+
If not set, the default dictionary implementation is `HighFrequencyDictionaryFactory`. However, if a `sourceLocation` is used, the dictionary implementation will be `FileDictionaryFactory`.
@ -113,12 +111,10 @@ If `true,` then the lookup data structure will be built when Solr starts or when
+
Setting this to `true` could lead to the core taking longer to load (or reload) as the suggester data structure needs to be built, which can sometimes take a long time. It's usually preferred to have this setting set to `false`, the default, and build suggesters manually by issuing requests with `suggest.build=true`.
[[Suggester-LookupImplementations]]
=== Lookup Implementations
The `lookupImpl` parameter defines the algorithms used to look up terms in the suggest index. There are several possible implementations to choose from, and some require additional parameters to be configured.
[[Suggester-AnalyzingLookupFactory]]
==== AnalyzingLookupFactory
A lookup that first analyzes the incoming text and adds the analyzed form to a weighted FST, and then does the same thing at lookup time.
@ -137,7 +133,6 @@ If `true`, the default, then a separator between tokens is preserved. This means
`preservePositionIncrements`::
If `true`, the suggester will preserve position increments. This means that when token filters leave gaps (for example, when StopFilter matches a stopword), the position will be respected when building the suggester. The default is `false`.
[[Suggester-FuzzyLookupFactory]]
==== FuzzyLookupFactory
This is a suggester which is an extension of the AnalyzingSuggester but is fuzzy in nature. The similarity is measured by the Levenshtein algorithm.
@ -174,7 +169,6 @@ The minimum length of query before which any string edits will be allowed. The d
`unicodeAware`::
If `true`, the `maxEdits`, `minFuzzyLength`, `transpositions` and `nonFuzzyPrefix` parameters will be measured in unicode code points (actual letters) instead of bytes. The default is `false`.
[[Suggester-AnalyzingInfixLookupFactory]]
==== AnalyzingInfixLookupFactory
Analyzes the input text and then suggests matches based on prefix matches to any tokens in the indexed text. This uses a Lucene index for its dictionary.
@ -193,9 +187,8 @@ Boolean option for multiple terms. The default is `true`, all terms will be requ
`highlight`::
Highlight suggest terms. Default is `true`.
This implementation supports <<Suggester-ContextFiltering,Context Filtering>>.
This implementation supports <<Context Filtering>>.
[[Suggester-BlendedInfixLookupFactory]]
==== BlendedInfixLookupFactory
An extension of the `AnalyzingInfixSuggester` which provides additional functionality to weight prefix matches across the matched documents. You can tell it to score higher if a hit is closer to the start of the suggestion or vice versa.
@ -220,9 +213,8 @@ When using `BlendedInfixSuggester` you can provide your own path where the index
`minPrefixChars`::
Minimum number of leading characters before PrefixQuery is used (the default is `4`). Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).
This implementation supports <<Suggester-ContextFiltering,Context Filtering>> .
This implementation supports <<Context Filtering>>.
[[Suggester-FreeTextLookupFactory]]
==== FreeTextLookupFactory
It looks at the last tokens plus the prefix of whatever final token the user is typing, if present, to predict the most likely next token. The number of previous tokens that need to be considered can also be specified. This suggester would only be used as a fallback, when the primary suggester fails to find any suggestions.
@ -235,7 +227,6 @@ The analyzer used at "query-time" and "build-time" to analyze suggestions. This
`ngrams`::
The maximum number of tokens considered together when building the dictionary. The default value is `2`. Increasing this means more than the previous 2 tokens will be taken into consideration when making the suggestions.
[[Suggester-FSTLookupFactory]]
==== FSTLookupFactory
An automaton-based lookup. This implementation is slower to build, but provides the lowest memory cost. We recommend using this implementation unless you need more sophisticated matching results, in which case you should use the Jaspell implementation.
@ -248,29 +239,24 @@ If `true`, the default, exact suggestions are returned first, even if they are p
`weightBuckets`::
The number of separate buckets for weights which the suggester will use while building its dictionary.
[[Suggester-TSTLookupFactory]]
==== TSTLookupFactory
A simple compact ternary trie based lookup.
[[Suggester-WFSTLookupFactory]]
==== WFSTLookupFactory
A weighted automaton representation which is an alternative to `FSTLookup` for more fine-grained ranking. `WFSTLookup` does not use buckets, but instead a shortest path algorithm.
Note that it expects weights to be whole numbers. If weight is missing it's assumed to be `1.0`. Weights affect the sorting of matching suggestions when `spellcheck.onlyMorePopular=true` is selected: weights are treated as "popularity" score, with higher weights preferred over suggestions with lower weights.
[[Suggester-JaspellLookupFactory]]
==== JaspellLookupFactory
A more complex lookup based on a ternary trie from the http://jaspell.sourceforge.net/[JaSpell] project. Use this implementation if you need more sophisticated matching results.
[[Suggester-DictionaryImplementations]]
=== Dictionary Implementations
The dictionary implementations define how terms are stored. There are several options, and multiple dictionaries can be used in a single request if necessary.
[[Suggester-DocumentDictionaryFactory]]
==== DocumentDictionaryFactory
A dictionary with terms, weights, and an optional payload taken from the index.
@ -286,7 +272,6 @@ The `payloadField` should be a field that is stored. This parameter is optional.
`contextField`::
Field to be used for context filtering. Note that only some lookup implementations support filtering.
[[Suggester-DocumentExpressionDictionaryFactory]]
==== DocumentExpressionDictionaryFactory
This dictionary implementation is the same as the `DocumentDictionaryFactory` but allows users to specify an arbitrary expression into the `weightExpression` tag.
@ -302,7 +287,6 @@ An arbitrary expression used for scoring the suggestions. The fields used must b
`contextField`::
Field to be used for context filtering. Note that only some lookup implementations support filtering.
[[Suggester-HighFrequencyDictionaryFactory]]
==== HighFrequencyDictionaryFactory
This dictionary implementation allows adding a threshold to prune out less frequent terms in cases where very common terms may overwhelm other terms.
@ -312,7 +296,6 @@ This dictionary implementation takes one parameter in addition to parameters des
`threshold`::
A value between zero and one representing the minimum fraction of the total documents where a term should appear in order to be added to the lookup dictionary.
[[Suggester-FileDictionaryFactory]]
==== FileDictionaryFactory
This dictionary implementation allows using an external file that contains suggest entries. Weights and payloads can also be used.
@ -332,7 +315,6 @@ accidentally 2.0
accommodate 3.0
----
[[Suggester-MultipleDictionaries]]
=== Multiple Dictionaries
It is possible to include multiple `dictionaryImpl` definitions in a single SuggestComponent definition.
@ -364,9 +346,8 @@ To do this, simply define separate suggesters, as in this example:
</searchComponent>
----
When using these Suggesters in a query, you would define multiple `suggest.dictionary` parameters in the request, referring to the names given for each Suggester in the search component definition. The response will include the terms in sections for each Suggester. See the <<Suggester-ExampleUsages,Examples>> section below for an example request and response.
When using these Suggesters in a query, you would define multiple `suggest.dictionary` parameters in the request, referring to the names given for each Suggester in the search component definition. The response will include the terms in sections for each Suggester. See the <<Example Usages>> section below for an example request and response.
[[Suggester-AddingtheSuggestRequestHandler]]
== Adding the Suggest Request Handler
After adding the search component, a request handler must be added to `solrconfig.xml`. This request handler works the <<requesthandlers-and-searchcomponents-in-solrconfig.adoc#requesthandlers-and-searchcomponents-in-solrconfig,same as any other request handler>>, and allows you to configure default parameters for serving suggestion requests. The request handler definition must incorporate the "suggest" search component defined previously.
@ -384,7 +365,6 @@ After adding the search component, a request handler must be added to `solrconfi
</requestHandler>
----
[[Suggester-SuggestRequestHandlerParameters]]
=== Suggest Request Handler Parameters
The following parameters allow you to set defaults for the Suggest request handler:
@ -424,10 +404,8 @@ These properties can also be overridden at query time, or not set in the request
Context filtering (`suggest.cfq`) is currently only supported by `AnalyzingInfixLookupFactory` and `BlendedInfixLookupFactory`, and only when backed by a `Document*Dictionary`. All other implementations will return unfiltered matches as if filtering was not requested.
====
[[Suggester-ExampleUsages]]
== Example Usages
[[Suggester-GetSuggestionswithWeights]]
=== Get Suggestions with Weights
This is a basic suggestion using a single dictionary and a single Solr core.
@ -478,8 +456,7 @@ Example response:
}
----
[[Suggester-MultipleDictionaries.1]]
=== Multiple Dictionaries
=== Using Multiple Dictionaries
If you have defined multiple dictionaries, you can use them in queries.
@ -531,7 +508,6 @@ Example response:
}
----
[[Suggester-ContextFiltering]]
=== Context Filtering
Context filtering lets you filter suggestions by a separate context field, such as category, department or any other token. The `AnalyzingInfixLookupFactory` and `BlendedInfixLookupFactory` currently support this feature, when backed by `DocumentDictionaryFactory`.
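A rough sketch of such a request, assuming a suggester named `mySuggester` whose `contextField` holds a category value such as `memory`:

[source,bash]
----
# Restrict suggestions for the prefix "c" to entries whose
# context field matches "memory".
curl -G "http://localhost:8983/solr/techproducts/suggest" \
  --data-urlencode "suggest=true" \
  --data-urlencode "suggest.dictionary=mySuggester" \
  --data-urlencode "suggest.q=c" \
  --data-urlencode "suggest.cfq=memory"
----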

View File

@ -31,7 +31,6 @@ All of the sample configuration and queries used in this section assume you are
bin/solr -e techproducts
----
[[TheQueryElevationComponent-ConfiguringtheQueryElevationComponent]]
== Configuring the Query Elevation Component
You can configure the Query Elevation Component in the `solrconfig.xml` file. Search components like `QueryElevationComponent` may be added to any request handler; a dedicated request handler is used here for brevity.
@ -72,7 +71,6 @@ Path to the file that defines query elevation. This file must exist in `<instanc
`forceElevation`::
By default, this component respects the requested `sort` parameter: if the request asks to sort by date, it will order the results by date. If `forceElevation=true` (the default), results will first return the boosted docs, then order by date.
[[TheQueryElevationComponent-elevate.xml]]
=== elevate.xml
Elevated query results are configured in an external XML file specified in the `config-file` argument. An `elevate.xml` file might look like this:
@ -95,10 +93,8 @@ Elevated query results are configured in an external XML file specified in the `
In this example, the query "foo bar" would first return documents 1, 2 and 3, then whatever normally appears for the same query. For the query "ipod", it would first return "MA147LL/A", and would make sure that "IW-02" is not in the result set.
[[TheQueryElevationComponent-UsingtheQueryElevationComponent]]
== Using the Query Elevation Component
[[TheQueryElevationComponent-TheenableElevationParameter]]
=== The enableElevation Parameter
For debugging it may be useful to see results with and without the elevated docs. To hide results, use `enableElevation=false`:
@ -107,21 +103,18 @@ For debugging it may be useful to see results with and without the elevated docs
`\http://localhost:8983/solr/techproducts/elevate?q=ipod&df=text&debugQuery=true&enableElevation=false`
[[TheQueryElevationComponent-TheforceElevationParameter]]
=== The forceElevation Parameter
You can force elevation during runtime by adding `forceElevation=true` to the query URL:
`\http://localhost:8983/solr/techproducts/elevate?q=ipod&df=text&debugQuery=true&enableElevation=true&forceElevation=true`
[[TheQueryElevationComponent-TheexclusiveParameter]]
=== The exclusive Parameter
You can force Solr to return only the results specified in the elevation file by adding `exclusive=true` to the URL:
`\http://localhost:8983/solr/techproducts/elevate?q=ipod&df=text&debugQuery=true&exclusive=true`
[[TheQueryElevationComponent-DocumentTransformersandthemarkExcludesParameter]]
=== Document Transformers and the markExcludes Parameter
The `[elevated]` <<transforming-result-documents.adoc#transforming-result-documents,Document Transformer>> can be used to annotate each document with information about whether or not it was elevated:
@ -132,7 +125,6 @@ Likewise, it can be helpful when troubleshooting to see all matching documents
`\http://localhost:8983/solr/techproducts/elevate?q=ipod&df=text&markExcludes=true&fl=id,[elevated],[excluded]`
[[TheQueryElevationComponent-TheelevateIdsandexcludeIdsParameters]]
=== The elevateIds and excludeIds Parameters
When the elevation component is in use, the pre-configured list of elevations for a query can be overridden at request time to use the unique keys specified in these request parameters.
@ -147,7 +139,6 @@ For example, in the request below documents IW-02 and F8V7067-APL-KIT will be el
`\http://localhost:8983/solr/techproducts/elevate?q=ipod&df=text&elevateIds=IW-02,F8V7067-APL-KIT`
[[TheQueryElevationComponent-ThefqParameter]]
=== The fq Parameter
=== The fq Parameter with Elevation
Query elevation respects the standard filter query (`fq`) parameter. That is, if the query contains the `fq` parameter, all results will be within that filter even if `elevate.xml` adds other documents to the result set.
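For instance, a small sketch against the techproducts example (the filter value is only illustrative):

[source,bash]
----
# Elevated documents are still subject to the manu filter below.
curl -G "http://localhost:8983/solr/techproducts/elevate" \
  --data-urlencode "q=ipod" \
  --data-urlencode "df=text" \
  --data-urlencode "fq=manu:Belkin"
----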

View File

@ -27,7 +27,6 @@ The sample queries in this section assume you are running the "```techproducts``
bin/solr -e techproducts
----
[[TheStatsComponent-StatsComponentParameters]]
== Stats Component Parameters
The Stats Component accepts the following parameters:
@ -41,8 +40,7 @@ Specifies a field for which statistics should be generated. This parameter may b
<<local-parameters-in-queries.adoc#local-parameters-in-queries,Local Parameters>> may be used to indicate which subset of the supported statistics should be computed, and/or that statistics should be computed over the results of an arbitrary numeric function (or query) instead of a simple field name. See the examples below.
[[TheStatsComponent-Example]]
=== Example
=== Stats Component Example
The query below demonstrates computing stats against two different numeric fields, as well as stats over the results of a `termfreq()` function call using the `text` field:
@ -89,10 +87,9 @@ The query below demonstrates computing stats against two different fields numeri
</lst>
----
[[TheStatsComponent-StatisticsSupported]]
== Statistics Supported
The table below explains the statistics supported by the Stats component. Not all statistics are supported for all field types, and not all statistics are computed by default (see <<TheStatsComponent-LocalParameters,Local Parameters>> below for details)
The table below explains the statistics supported by the Stats component. Not all statistics are supported for all field types, and not all statistics are computed by default (see <<Local Parameters with the Stats Component>> below for details)
`min`::
The minimum value of the field/function in all documents in the set. This statistic is computed for all field types and is computed by default.
@ -134,14 +131,13 @@ Input for this option can be floating point number between `0.0` and `1.0` indic
+
This statistic is computed for all field types but is not computed by default.
[[TheStatsComponent-LocalParameters]]
== Local Parameters
== Local Parameters with the Stats Component
Similar to the <<faceting.adoc#faceting,Facet Component>>, the `stats.field` parameter supports local parameters for:
* Tagging & Excluding Filters: `stats.field={!ex=filterA}price`
* Changing the Output Key: `stats.field={!key=my_price_stats}price`
* Tagging stats for <<TheStatsComponent-TheStatsComponentandFaceting,use with `facet.pivot`>>: `stats.field={!tag=my_pivot_stats}price`
* Tagging stats for <<The Stats Component and Faceting,use with `facet.pivot`>>: `stats.field={!tag=my_pivot_stats}price`
Local parameters can also be used to specify individual statistics by name, overriding the set of statistics computed by default, e.g.: `stats.field={!min=true max=true percentiles='99,99.9,99.99'}price`
@ -159,8 +155,7 @@ Additional "Expert" local params are supported in some cases for affecting the b
** `hllLog2m` - an integer value specifying an explicit "log2m" value to use, overriding the heuristic value determined by the cardinality local param and the field type; see the https://github.com/aggregateknowledge/java-hll/[java-hll] documentation for more details.
** `hllRegwidth` - an integer value specifying an explicit "regwidth" value to use, overriding the heuristic value determined by the cardinality local param and the field type; see the https://github.com/aggregateknowledge/java-hll/[java-hll] documentation for more details.
[[TheStatsComponent-Examples]]
=== Examples
=== Examples with Local Parameters
Here we compute some statistics for the price field. The min, max, mean, 90th, and 99th percentile price values are computed against all products that are in stock (`q=*:*` and `fq=inStock:true`), and independently all of the default statistics are computed against all products regardless of whether they are in stock or not (by excluding that filter).
@ -193,7 +188,6 @@ Here we compute some statistics for the price field. The min, max, mean, 90th, a
</lst>
----
[[TheStatsComponent-TheStatsComponentandFaceting]]
== The Stats Component and Faceting
Sets of `stats.field` parameters can be referenced by `'tag'` when using Pivot Faceting to compute multiple statistics at every level (i.e.: field) in the tree of pivot constraints.
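As a sketch of how the two features are wired together (the collection, field, and tag names here are assumptions based on the `techproducts` example), a tagged `stats.field` can be referenced from `facet.pivot` like this:

[source,bash]
----
# stats.field is tagged "piv1"; facet.pivot references that tag with {!stats=piv1},
# so price statistics are computed for every cat/inStock pivot constraint.
curl http://localhost:8983/solr/techproducts/select \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'rows=0' \
  --data-urlencode 'stats=true' \
  --data-urlencode 'stats.field={!tag=piv1}price' \
  --data-urlencode 'facet=true' \
  --data-urlencode 'facet.pivot={!stats=piv1}cat,inStock'
----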

View File

@ -22,8 +22,7 @@ The TermVectorComponent is a search component designed to return additional info
For each document in the response, the TermVectorComponent can return the term vector, the term frequency, inverse document frequency, position, and offset information.
[[TheTermVectorComponent-Configuration]]
== Configuration
== Term Vector Component Configuration
The TermVectorComponent is not enabled implicitly in Solr - it must be explicitly configured in your `solrconfig.xml` file. The examples on this page show how it is configured in Solr's "```techproducts```" example:
@ -67,7 +66,6 @@ Once your handler is defined, you may use in conjunction with any schema (that h
termOffsets="true" />
----
[[TheTermVectorComponent-InvokingtheTermVectorComponent]]
== Invoking the Term Vector Component
The example below shows an invocation of this component using the above configuration:
@ -124,8 +122,7 @@ The example below shows an invocation of this component using the above configur
</lst>
----
[[TheTermVectorComponent-RequestParameters]]
=== Request Parameters
=== Term Vector Request Parameters
The example below shows some of the available request parameters for this component:
@ -168,7 +165,6 @@ To learn more about TermVector component output, see the Wiki page: http://wiki.
For schema requirements, see also the section <<field-properties-by-use-case.adoc#field-properties-by-use-case, Field Properties by Use Case>>.
[[TheTermVectorComponent-SolrJandtheTermVectorComponent]]
== SolrJ and the Term Vector Component
Neither the `SolrQuery` class nor the `QueryResponse` class offer specific method calls to set Term Vector Component parameters or get the "termVectors" output. However, there is a patch for it: https://issues.apache.org/jira/browse/SOLR-949[SOLR-949].

View File

@ -37,7 +37,5 @@ This section covers the following topics:
[IMPORTANT]
====
The focus of this section is generally on configuring a single Solr instance, but for those interested in scaling a Solr implementation in a cluster environment, see also the section <<solrcloud.adoc#solrcloud,SolrCloud>>. There are also options to scale through sharding or replication, described in the section <<legacy-scaling-and-distribution.adoc#legacy-scaling-and-distribution,Legacy Scaling and Distribution>>.
====

View File

@ -20,16 +20,29 @@
If you have JSON documents that you would like to index without transforming them into Solr's structure, you can add them to Solr by including some parameters with the update request. These parameters provide information on how to split a single JSON file into multiple Solr documents and how to map fields to Solr's schema. One or more valid JSON documents can be sent to the `/update/json/docs` path with the configuration params.
[[TransformingandIndexingCustomJSON-MappingParameters]]
== Mapping Parameters
These parameters allow you to define how a JSON file should be read for multiple Solr documents.
* **split**: Defines the path at which to split the input JSON into multiple Solr documents and is required if you have multiple documents in a single JSON file. If the entire JSON makes a single solr document, the path must be “`/`”. It is possible to pass multiple split paths by separating them with a pipe `(|)` example : `split=/|/foo|/foo/bar` . If one path is a child of another, they automatically become a child document **f**: This is a multivalued mapping parameter. The format of the parameter is` target-field-name:json-path`. The `json-path` is required. The `target-field-name` is the Solr document field name, and is optional. If not specified, it is automatically derived from the input JSON.The default target field name is the fully qualified name of the field. Wildcards can be used here, see the <<TransformingandIndexingCustomJSON-Wildcards,Wildcards>> below for more information.
* *mapUniqueKeyOnly* (boolean): This parameter is particularly convenient when the fields in the input JSON are not available in the schema and <<schemaless-mode.adoc#schemaless-mode,schemaless mode>> is not enabled. This will index all the fields into the default search field (using the `df` parameter, below) and only the `uniqueKey` field is mapped to the corresponding field in the schema. If the input JSON does not have a value for the `uniqueKey` field then a UUID is generated for the same.
* **df**: If the `mapUniqueKeyOnly` flag is used, the update handler needs a field where the data should be indexed to. This is the same field that other handlers use as a default search field.
* **srcField**: This is the name of the field to which the JSON source will be stored into. This can only be used if `split=/` (i.e., you want your JSON input file to be indexed as a single Solr document). Note that atomic updates will cause the field to be out-of-sync with the document.
* **echo**: This is for debugging purpose only. Set it to true if you want the docs to be returned as a response. Nothing will be indexed.
split::
Defines the path at which to split the input JSON into multiple Solr documents and is required if you have multiple documents in a single JSON file. If the entire JSON makes a single Solr document, the path must be “`/`”. It is possible to pass multiple split paths by separating them with a pipe (`|`), for example: `split=/|/foo|/foo/bar`. If one path is a child of another, they automatically become a child document.
f::
A multivalued mapping parameter. The format of the parameter is `target-field-name:json-path`. The `json-path` is required. The `target-field-name` is the Solr document field name, and is optional. If not specified, it is automatically derived from the input JSON. The default target field name is the fully qualified name of the field.
+
Wildcards can be used here, see <<Using Wildcards for Field Names>> below for more information.
mapUniqueKeyOnly::
(boolean) This parameter is particularly convenient when the fields in the input JSON are not available in the schema and <<schemaless-mode.adoc#schemaless-mode,schemaless mode>> is not enabled. It indexes all the fields into the default search field (using the `df` parameter, below) and only maps the `uniqueKey` field to the corresponding field in the schema. If the input JSON does not have a value for the `uniqueKey` field, a UUID is generated for it.
df::
If the `mapUniqueKeyOnly` flag is used, the update handler needs a field in which to index the data. This is the same field that other handlers use as a default search field.
srcField::
The name of the field in which the JSON source will be stored. This can only be used if `split=/` (i.e., you want your JSON input file to be indexed as a single Solr document). Note that atomic updates will cause the field to be out of sync with the document.
echo::
This is for debugging purposes only. Set it to `true` if you want the docs to be returned as a response. Nothing will be indexed.
For example, if we have a JSON file that includes two documents, we could define an update request like this:
@ -152,15 +165,16 @@ In this example, we simply named the field paths (such as `/exams/test`). Solr w
[TIP]
====
Documents WILL get rejected if the fields do not exist in the schema before indexing. So, if you are NOT using schemaless mode, pre-create those fields. If you are working in <<schemaless-mode.adoc#schemaless-mode,Schemaless Mode>>, fields that don't exist will be created on the fly with Solr's best guess for the field type.
Documents WILL get rejected if the fields do not exist in the schema before indexing. So, if you are NOT using schemaless mode, pre-create those fields. If you are working in <<schemaless-mode.adoc#schemaless-mode,Schemaless Mode>>, fields that don't exist will be created on the fly with Solr's best guess for the field type.
====
[[TransformingandIndexingCustomJSON-Wildcards]]
== Wildcards
== Using Wildcards for Field Names
Instead of specifying all the field names explicitly, it is possible to specify wildcards to map fields automatically. There are two restrictions: wildcards can only be used at the end of the `json-path`, and the split path cannot use wildcards. A single asterisk `\*` maps only to direct children, and a double asterisk `\*\*` maps recursively to all descendants. The following are example wildcard path mappings:
Instead of specifying all the field names explicitly, it is possible to specify wildcards to map fields automatically.
There are two restrictions: wildcards can only be used at the end of the `json-path`, and the split path cannot use wildcards.
A single asterisk `\*` maps only to direct children, and a double asterisk `\*\*` maps recursively to all descendants. The following are example wildcard path mappings:
* `f=$FQN:/**`: maps all fields to the fully qualified name (`$FQN`) of the JSON field. The fully qualified name is obtained by concatenating all the keys in the hierarchy with a period (`.`) as a delimiter. This is the default behavior if no `f` path mappings are specified.
* `f=/docs/*`: maps all the fields under `docs`, using the field names as they appear in the JSON
@ -217,7 +231,7 @@ curl 'http://localhost:8983/solr/my_collection/update/json/docs'\
"test" : "term1",
"marks" : 86}
]
}'
}'
----
In the above example, we've said all of the fields should be added to a field in Solr named 'txt'. This will add multiple fields to a single field, so whatever field you choose should be multi-valued.
@ -247,7 +261,7 @@ curl 'http://localhost:8983/solr/my_collection/update/json/docs?split=/exams'\
The indexed documents would be added to the index with fields that look like this:
[source,bash]
[source,json]
----
{
"first":"John",
@ -265,8 +279,7 @@ The indexed documents would be added to the index with fields that look like thi
"exams.marks":86}
----
[[TransformingandIndexingCustomJSON-MultipledocumentsinaSinglePayload]]
== Multiple documents in a Single Payload
== Multiple Documents in a Single Payload
This functionality supports documents in the http://jsonlines.org/[JSON Lines] format (`.jsonl`), which specifies one document per line.
@ -288,7 +301,6 @@ curl 'http://localhost:8983/solr/my_collection/update/json/docs' -H 'Content-typ
{ "first":"Steve", "last":"Woz", "grade":1, "subject": "Calculus", "test" : "term1", "marks" : 86}]'
----
[[TransformingandIndexingCustomJSON-IndexingNestedDocuments]]
== Indexing Nested Documents
The following is an example of indexing nested documents:
@ -332,14 +344,12 @@ With this example, the documents indexed would be, as follows:
"zip":95014}]}
----
[[TransformingandIndexingCustomJSON-TipsforCustomJSONIndexing]]
== Tips for Custom JSON Indexing
1. Schemaless mode: This handles field creation automatically. The field guessing may not be exactly as you expect, but it works. The best thing to do is to setup a local server in schemaless mode, index a few sample docs and create those fields in your real setup with proper field types before indexing
2. Pre-created Schema : Post your docs to the `/update/json/docs` endpoint with `echo=true`. This gives you the list of field names you need to create. Create the fields before you actually index
3. No schema, only full-text search : All you need to do is to do full-text search on your JSON. Set the configuration as given in the Setting JSON Defaults section.
. Schemaless mode: This handles field creation automatically. The field guessing may not be exactly as you expect, but it works. The best thing to do is to set up a local server in schemaless mode, index a few sample docs, and create those fields in your real setup with proper field types before indexing.
. Pre-created Schema: Post your docs to the `/update/json/docs` endpoint with `echo=true` (see the sketch below). This gives you the list of field names you need to create. Create the fields before you actually index.
. No schema, only full-text search: If all you need is full-text search over your JSON, set the configuration as described in the Setting JSON Defaults section.
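As a sketch of the second tip, the request below posts a document with `echo=true`, so nothing is indexed and the response simply shows the field names the document would produce (the collection name and document are illustrative):

[source,bash]
----
# Nothing is indexed; the response shows how this JSON would map to Solr fields.
curl 'http://localhost:8983/solr/my_collection/update/json/docs?split=/&echo=true' \
  -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8
}'
----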
[[TransformingandIndexingCustomJSON-SettingJSONDefaults]]
== Setting JSON Defaults
It is possible to send any JSON to the `/update/json/docs` endpoint; the default configuration of the component is as follows:

View File

@ -20,7 +20,6 @@
Document Transformers can be used to modify the information returned about each document in the results of a query.
[[TransformingResultDocuments-UsingDocumentTransformers]]
== Using Document Transformers
When executing a request, a document transformer can be used by including it in the `fl` parameter using square brackets, for example:
@ -46,11 +45,9 @@ fl=id,name,score,my_val_a:[value v=42 t=int],my_val_b:[value v=7 t=float]
The sections below discuss exactly what these various transformers do.
[[TransformingResultDocuments-AvailableTransformers]]
== Available Transformers
[[TransformingResultDocuments-_value_-ValueAugmenterFactory]]
=== [value] - ValueAugmenterFactory
Modifies every document to include the exact same value, as if it were a stored field in every document:
@ -94,7 +91,6 @@ In addition to using these request parameters, you can configure additional name
The "```value```" option forces an explicit value to always be used, while the "```defaultValue```" option provides a default that can still be overridden using the "```v```" and "```t```" local parameters.
[[TransformingResultDocuments-_explain_-ExplainAugmenterFactory]]
=== [explain] - ExplainAugmenterFactory
Augments each document with an inline explanation of its score exactly like the information available about each document in the debug section:
@ -128,8 +124,6 @@ A default style can be configured by specifying an "args" parameter in your conf
</transformer>
----
[[TransformingResultDocuments-_child_-ChildDocTransformerFactory]]
=== [child] - ChildDocTransformerFactory
This transformer returns all <<uploading-data-with-index-handlers.adoc#UploadingDatawithIndexHandlers-NestedChildDocuments,descendant documents>> of each parent document matching your query in a flat list nested inside the matching parent document. This is useful when you have indexed nested child documents and want to retrieve the child documents for the relevant parent documents for any type of search query.
@ -147,7 +141,6 @@ When using this transformer, the `parentFilter` parameter must be specified, and
* `limit` - the maximum number of child documents to be returned per parent document (default: 10); see the sketch below
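A sketch of a request using these parameters follows; the `doc_type` field and its values are assumptions, so adjust them to however your parent and child documents are distinguished:

[source,bash]
----
# Return matching parent documents, each with up to 100 "chapter" children inlined.
curl http://localhost:8983/solr/my_collection/select \
  --data-urlencode 'q=doc_type:book' \
  --data-urlencode 'fl=id,[child parentFilter=doc_type:book childFilter=doc_type:chapter limit=100]'
----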
[[TransformingResultDocuments-_shard_-ShardAugmenterFactory]]
=== [shard] - ShardAugmenterFactory
This transformer adds information about what shard each individual document came from in a distributed request.
@ -155,7 +148,6 @@ This transformer adds information about what shard each individual document came
ShardAugmenterFactory does not support any request parameters, or configuration options.
[[TransformingResultDocuments-_docid_-DocIdAugmenterFactory]]
=== [docid] - DocIdAugmenterFactory
This transformer adds the internal Lucene document id to each document this is primarily only useful for debugging purposes.
@ -163,7 +155,6 @@ This transformer adds the internal Lucene document id to each document this
DocIdAugmenterFactory does not support any request parameters, or configuration options.
[[TransformingResultDocuments-_elevated_and_excluded_]]
=== [elevated] and [excluded]
These transformers are available only when using the <<the-query-elevation-component.adoc#the-query-elevation-component,Query Elevation Component>>.
@ -195,7 +186,6 @@ fl=id,[elevated],[excluded]&excludeIds=GB18030TEST&elevateIds=6H500F0&markExclud
----
[[TransformingResultDocuments-_json_xml_]]
=== [json] / [xml]
These transformers replace a field value containing a string representation of a valid XML or JSON structure with the actual raw XML or JSON structure rather than just the string value. Each applies only to the specific writer, such that `[json]` only applies to `wt=json` and `[xml]` only applies to `wt=xml`.
@ -206,7 +196,6 @@ fl=id,source_s:[json]&wt=json
----
[[TransformingResultDocuments-_subquery_]]
=== [subquery]
This transformer executes a separate query for each document being transformed, passing document fields as input for the subquery parameters. It is usually used with the `{!join}` and `{!parent}` query parsers, and is intended to be an improvement over `[child]`.
@ -261,8 +250,7 @@ Here is how it looks like in various formats:
SolrDocumentList subResults = (SolrDocumentList)doc.getFieldValue("children");
----
[[TransformingResultDocuments-Subqueryresultfields]]
==== Subquery result fields
==== Subquery Result Fields
For a field to appear in the subquery document list, it must be specified in both `fl` parameters: the main `fl` (even though the main result documents do not have this field) and the subquery's own, e.g., `foo.fl`. You can use a wildcard in either or both of these parameters. For example, if the field `title` should appear in the `categories` subquery, it can be done in one of these ways:
@ -274,14 +262,12 @@ fl=...*,categories:[subquery]&categories.fl=*&categories.q=...
fl=...*,categories:[subquery]&categories.fl=*&categories.q=...
----
[[TransformingResultDocuments-SubqueryParametersShift]]
==== Subquery Parameters Shift
If a subquery is declared as `fl=*,foo:[subquery]`, the subquery parameters are prefixed with the given name and a period, for example:
`q=*:*&fl=*,**foo**:[subquery]&**foo.**q=to be continued&**foo.**rows=10&**foo.**sort=id desc`
[[TransformingResultDocuments-DocumentFieldasanInputforSubqueryParameters]]
==== Document Field as an Input for Subquery Parameters
It is sometimes necessary to pass document field values as parameters for the subquery. This is supported via the implicit *`row.__fieldname__`* parameter, which can be referenced (among other ways) via Local Parameters syntax: `q=name:john&fl=name,id,depts:[subquery]&depts.q={!terms f=id **v=$row.dept_id**}&depts.rows=10`
@ -292,7 +278,6 @@ Note, when document field has multiple values they are concatenated with comma b
To log substituted subquery request parameters, add the corresponding parameter names, as in `depts.logParamsList=q,fl,rows,**row.dept_id**`
[[TransformingResultDocuments-CoresandCollectionsinSolrCloud]]
==== Cores and Collections in SolrCloud
Use `foo:[subquery fromIndex=departments]` to invoke the subquery on another core on the same node; this is what *`{!join}`* does in non-SolrCloud mode. In SolrCloud, however, you must explicitly specify the subquery's native parameters such as `collection` and `shards`, for example:
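A sketch of such a request, reusing the earlier `depts` example (the `departments` collection and the field names are assumptions):

[source,bash]
----
# In SolrCloud, point the subquery at its collection explicitly via <name>.collection.
curl http://localhost:8983/solr/my_collection/select \
  --data-urlencode 'q=name:john' \
  --data-urlencode 'fl=name,id,depts:[subquery]' \
  --data-urlencode 'depts.q={!terms f=id v=$row.dept_id}' \
  --data-urlencode 'depts.collection=departments' \
  --data-urlencode 'depts.rows=10'
----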
@ -301,13 +286,10 @@ Use `foo:[subquery fromIndex=departments]` to invoke subquery on another core on
[IMPORTANT]
====
If the subquery collection has a different unique key field name (say, `foo_id` in contrast to `id` in the primary collection), add the following parameters to accommodate this difference: `foo.fl=id:foo_id&foo.distrib.singlePass=true`. Otherwise you'll get a `NullPointerException` from `QueryComponent.mergeIds`.
====
[[TransformingResultDocuments-_geo_-Geospatialformatter]]
=== [geo] - Geospatial formatter
Formats spatial data from a spatial field using a designated format type name. Two inner parameters are required: `f` for the field name, and `w` for the format name. Example: `geojson:[geo f=mySpatialField w=GeoJSON]`.
@ -317,7 +299,6 @@ Normally you'll simply be consistent in choosing the format type you want by set
In addition, this feature is very useful with the `RptWithGeometrySpatialField` to avoid double-storage of the potentially large vector geometry. This transformer will detect that field type and fetch the geometry from an internal compact binary representation on disk (in docValues), and then format it as desired. As such, you needn't mark the field as stored, which would be redundant. In a sense this double-storage between docValues and stored-value storage isn't unique to spatial but with polygonal geometry it can be a lot of data, and furthermore you'd like to avoid storing it in a verbose format (like GeoJSON or WKT).
[[TransformingResultDocuments-_features_-LTRFeatureLoggerTransformerFactory]]
=== [features] - LTRFeatureLoggerTransformerFactory
The "LTR" prefix stands for <<learning-to-rank.adoc#learning-to-rank,Learning To Rank>>. This transformer returns the values of features and it can be used for feature extraction and feature logging.

View File

@ -20,7 +20,6 @@
You can integrate the Apache Unstructured Information Management Architecture (https://uima.apache.org/[UIMA]) with Solr. UIMA lets you define custom pipelines of Analysis Engines that incrementally add metadata to your documents as annotations.
[[UIMAIntegration-ConfiguringUIMA]]
== Configuring UIMA
The SolrUIMA UpdateRequestProcessor is a custom update request processor that takes documents being indexed, sends them to a UIMA pipeline, and then returns the documents enriched with the specified metadata. To configure UIMA for Solr, follow these steps:
@ -123,4 +122,3 @@ The SolrUIMA UpdateRequestProcessor is a custom update request processor that ta
Once you are done with the configuration, your documents will be automatically enriched with the specified fields when you index them.
For more information about Solr UIMA integration, see https://wiki.apache.org/solr/SolrUIMA.

View File

@ -25,16 +25,12 @@ The following sections describe how Solr breaks down and works with textual data
* <<about-tokenizers.adoc#about-tokenizers,Tokenizers>> break field data into lexical units, or _tokens_.
* <<about-filters.adoc#about-filters,Filters>> examine a stream of tokens and keep them, transform or discard them, or create new ones. Tokenizers and filters may be combined to form pipelines, or _chains_, where the output of one is input to the next. Such a sequence of tokenizers and filters is called an _analyzer_ and the resulting output of an analyzer is used to match query results or build indices.
[[UnderstandingAnalyzers_Tokenizers_andFilters-UsingAnalyzers_Tokenizers_andFilters]]
== Using Analyzers, Tokenizers, and Filters
Although the analysis process is used for both indexing and querying, the same analysis process need not be used for both operations. For indexing, you often want to simplify, or normalize, words. For example, setting all letters to lowercase, eliminating punctuation and accents, mapping words to their stems, and so on. Doing so can increase recall because, for example, "ram", "Ram" and "RAM" would all match a query for "ram". To increase query-time precision, a filter could be employed to narrow the matches by, for example, ignoring all-cap acronyms if you're interested in male sheep, but not Random Access Memory.
The tokens output by the analysis process define the values, or _terms_, of that field and are used either to build an index of those terms when a new document is added, or to identify which documents contain the terms you are querying for.
[[UnderstandingAnalyzers_Tokenizers_andFilters-ForMoreInformation]]
=== For More Information
These sections will show you how to configure field analyzers and also serve as a reference for the details of configuring each of the available tokenizer and filter classes. They also serve as a guide so that you can configure your own analysis classes if you have special needs that cannot be met with the included filters or tokenizers.

View File

@ -22,8 +22,7 @@ Every update request received by Solr is run through a chain of plugins known as
This can be useful, for example, to add a field to the document being indexed; to change the value of a particular field; or to drop an update if the incoming document doesn't fulfill certain criteria. In fact, a surprisingly large number of features in Solr are implemented as Update Processors and therefore it is necessary to understand how such plugins work and where they are configured.
[[UpdateRequestProcessors-AnatomyandLifecycle]]
== Anatomy and Lifecycle
== URP Anatomy and Lifecycle
An Update Request Processor is created as part of a {solr-javadocs}/solr-core/org/apache/solr/update/processor/UpdateRequestProcessorChain.html[chain] of one or more update processors. Solr creates a default update request processor chain comprising of a few update request processors which enable essential Solr features. This default chain is used to process every update request unless a user chooses to configure and specify a different custom update request processor chain.
@ -38,14 +37,12 @@ When an update request is received by Solr, it looks up the update chain to be u
NOTE: A single update request may contain a batch of multiple new documents or deletes and therefore the corresponding processXXX methods of an UpdateRequestProcessor will be invoked multiple times for every individual update. However, it is guaranteed that a single thread will serially invoke these methods.
[[UpdateRequestProcessors-Configuration]]
== Configuration
== Update Request Processor Configuration
Update request processor chains can be created by either creating the whole chain directly in `solrconfig.xml` or by creating individual update processors in `solrconfig.xml` and then dynamically creating the chain at run-time by specifying all processors via request parameters.
However, before we understand how to configure update processor chains, we must learn about the default update processor chain because it provides essential features which are needed in most custom request processor chains as well.
[[UpdateRequestProcessors-DefaultUpdateRequestProcessorChain]]
=== Default Update Request Processor Chain
If no update processor chains are configured in `solrconfig.xml`, Solr will automatically create a default update processor chain which will be used for all update requests. This default update processor chain consists of the following processors (in order):
@ -56,7 +53,6 @@ In case no update processor chains are configured in `solrconfig.xml`, Solr will
Each of these performs an essential function, and as such any custom chain usually contains all of these processors. The `RunUpdateProcessorFactory` is usually the last update processor in any custom chain.
[[UpdateRequestProcessors-CustomUpdateRequestProcessorChain]]
=== Custom Update Request Processor Chain
The following example demonstrates how a custom chain can be configured inside `solrconfig.xml`.
@ -85,7 +81,6 @@ In the above example, a new update processor chain named "dedupe" is created wit
Do not forget to add `RunUpdateProcessorFactory` at the end of any chains you define in `solrconfig.xml`. Otherwise update requests processed by that chain will not actually affect the indexed data.
====
[[UpdateRequestProcessors-ConfiguringIndividualProcessorsasTop-LevelPlugins]]
=== Configuring Individual Processors as Top-Level Plugins
Update request processors can also be configured independent of a chain in `solrconfig.xml`.
@ -113,7 +108,6 @@ In this case, an instance of `SignatureUpdateProcessorFactory` is configured wit
</updateProcessorChain>
----
[[UpdateRequestProcessors-UpdateProcessorsinSolrCloud]]
== Update Processors in SolrCloud
In a single node, stand-alone Solr, each update is run through all the update processors in a chain exactly once. But the behavior of update request processors in SolrCloud deserves special consideration.
@ -158,10 +152,8 @@ If the `AtomicUpdateProcessorFactory` is in the update chain before the `Distrib
Because `DistributedUpdateProcessor` is responsible for processing <<updating-parts-of-documents.adoc#updating-parts-of-documents,Atomic Updates>> into full documents on the leader node, this means that pre-processors which are executed only on the forwarding nodes can only operate on the partial document. If you have a processor which must process a full document then the only choice is to specify it as a post-processor.
[[UpdateRequestProcessors-UsingCustomChains]]
== Using Custom Chains
[[UpdateRequestProcessors-update.chainRequestParameter]]
=== update.chain Request Parameter
The `update.chain` parameter can be used in any update request to choose a custom chain which has been configured in `solrconfig.xml`. For example, in order to choose the "dedupe" chain described in a previous section, one can issue the following request:
@ -187,7 +179,6 @@ curl "http://localhost:8983/solr/gettingstarted/update/json?update.chain=dedupe&
The above should dedupe the two identical documents and index only one of them.
[[UpdateRequestProcessors-Processor_Post-ProcessorRequestParameters]]
=== Processor & Post-Processor Request Parameters
We can dynamically construct a custom update request processor chain using the `processor` and `post-processor` request parameters. Multiple processors can be specified as a comma-separated value for these two parameters. For example:
@ -232,7 +223,6 @@ curl "http://localhost:8983/solr/gettingstarted/update/json?processor=remove_bla
In the first example, Solr will dynamically create a chain which has "signature" and "remove_blanks" as pre-processors to be executed only on the forwarding node where as in the second example, "remove_blanks" will be executed as a pre-processor and "signature" will be executed on the leader and replicas as a post-processor.
[[UpdateRequestProcessors-ConfiguringaCustomChainasaDefault]]
=== Configuring a Custom Chain as a Default
We can also specify a custom chain to be used by default for all requests sent to specific update handlers instead of specifying the names in request parameters for each request.
@ -263,12 +253,10 @@ Alternately, one can achieve a similar effect using the "defaults" as shown in t
</requestHandler>
----
[[UpdateRequestProcessors-UpdateRequestProcessorFactories]]
== Update Request Processor Factories
What follows are brief descriptions of the currently available update request processors. An `UpdateRequestProcessorFactory` can be integrated into an update chain in `solrconfig.xml` as necessary. You are strongly urged to examine the Javadocs for these classes; these descriptions are abridged snippets taken for the most part from the Javadocs.
[[UpdateRequestProcessors-GeneralUseUpdateProcessorFactories]]
=== General Use UpdateProcessorFactories
{solr-javadocs}/solr-core/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.html[AddSchemaFieldsUpdateProcessorFactory]:: This processor will dynamically add fields to the schema if an input document contains one or more fields that don't match any field or dynamic field in the schema.
@ -300,7 +288,6 @@ What follows are brief descriptions of the currently available update request pr
{solr-javadocs}/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html[UUIDUpdateProcessorFactory]:: An update processor that adds a newly generated UUID value to any document being added that does not already have a value in the specified field.
[[UpdateRequestProcessors-FieldMutatingUpdateProcessorFactoryDerivedFactories]]
=== FieldMutatingUpdateProcessorFactory Derived Factories
These factories all provide functionality to _modify_ fields in a document as they're being indexed. When using any of these factories, please consult the {solr-javadocs}/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html[FieldMutatingUpdateProcessorFactory javadocs] for details on the common options they all support for configuring which fields are modified.
@ -349,7 +336,6 @@ These factories all provide functionality to _modify_ fields in a document as th
{solr-javadocs}/solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html[UniqFieldsUpdateProcessorFactory]:: Removes duplicate values found in fields matching the specified conditions.
[[UpdateRequestProcessors-UpdateProcessorFactoriesThatCanBeLoadedasPlugins]]
=== Update Processor Factories That Can Be Loaded as Plugins
These processors are included in Solr releases as "contribs", and require additional jars loaded at runtime. See the README files associated with each contrib for details:
@ -364,7 +350,6 @@ The {solr-javadocs}/solr-uima/index.html[`uima`] contrib provides::
{solr-javadocs}/solr-uima/org/apache/solr/uima/processor/UIMAUpdateRequestProcessorFactory.html[UIMAUpdateRequestProcessorFactory]::: Update document(s) to be indexed with UIMA extracted information.
[[UpdateRequestProcessors-UpdateProcessorFactoriesYouShouldNotModifyorRemove]]
=== Update Processor Factories You Should _Not_ Modify or Remove
These are listed for completeness, but are part of the Solr infrastructure, particularly SolrCloud. Other than ensuring you do _not_ remove them when modifying the update request handlers (or any copies you make), you will rarely, if ever, need to change these.
@ -377,11 +362,9 @@ These are listed for completeness, but are part of the Solr infrastructure, part
{solr-javadocs}/solr-core/org/apache/solr/update/processor/RunUpdateProcessorFactory.html[RunUpdateProcessorFactory]:: Executes the update commands using the underlying UpdateHandler. Almost all processor chains should end with an instance of `RunUpdateProcessorFactory` unless the user is explicitly executing the update commands in an alternative custom `UpdateRequestProcessorFactory`.
[[UpdateRequestProcessors-UpdateProcessorsThatCanBeUsedatRuntime]]
=== Update Processors That Can Be Used at Runtime
These update processors do not need any configuration in your `solrconfig.xml`. They are automatically initialized when their name is added to the `processor` parameter. Multiple processors can be used by appending multiple processor names (comma separated).
[[UpdateRequestProcessors-TemplateUpdateProcessorFactory]]
==== TemplateUpdateProcessorFactory
The `TemplateUpdateProcessorFactory` can be used to add new fields to documents based on a template pattern.

View File

@ -28,7 +28,6 @@ The steps outlined on this page assume you use the default service name of "```s
====
[[UpgradingaSolrCluster-PlanningYourUpgrade]]
== Planning Your Upgrade
Here is a checklist of things you need to prepare before starting the upgrade process:
@ -49,19 +48,16 @@ If you are upgrading from an installation of Solr 5.x or later, these values can
You should now be ready to upgrade your cluster. Please verify this process in a test / staging cluster before doing it in production.
[[UpgradingaSolrCluster-UpgradeProcess]]
== Upgrade Process
The approach we recommend is to perform the upgrade of each Solr node, one-by-one. In other words, you will need to stop a node, upgrade it to the new version of Solr, and restart it before moving on to the next node. This means that for a short period of time, there will be a mix of "Old Solr" and "New Solr" nodes running in your cluster. We also assume that you will point the new Solr node to your existing Solr home directory where the Lucene index files are managed for each collection on the node. This means that you won't need to move any index files around to perform the upgrade.
[[UpgradingaSolrCluster-Step1_StopSolr]]
=== Step 1: Stop Solr
Begin by stopping the Solr node you want to upgrade. After stopping the node, if using replication (i.e., collections with `replicationFactor` > 1), verify that all leaders hosted on the downed node have successfully migrated to other replicas; you can do this by visiting the <<cloud-screens.adoc#cloud-screens,Cloud panel in the Solr Admin UI>>. If not using replication, then any collections with shards hosted on the downed node will be temporarily off-line.
[[UpgradingaSolrCluster-Step2_InstallSolrasaService]]
=== Step 2: Install Solr as a Service
Please follow the instructions to install Solr as a Service on Linux documented at <<taking-solr-to-production.adoc#taking-solr-to-production,Taking Solr to Production>>. Use the `-n` parameter to avoid automatic start of Solr by the installer script. You need to update the `/etc/default/solr.in.sh` include file in the next step to complete the upgrade process.
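A sketch of this step, assuming the new version's archive is in the current directory (substitute your actual version number for 6.6.0):

[source,bash]
----
# Extract the service installation script from the new distribution, then run it
# with -n so Solr is not started automatically after installation.
tar xzf solr-6.6.0.tgz solr-6.6.0/bin/install_solr_service.sh --strip-components=2
sudo bash ./install_solr_service.sh solr-6.6.0.tgz -n
----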
@ -74,7 +70,6 @@ If you have a `/var/solr/solr.in.sh` file for your existing Solr install, runnin
====
[[UpgradingaSolrCluster-Step3_SetEnvironmentVariableOverrides]]
=== Step 3: Set Environment Variable Overrides
Open `/etc/default/solr.in.sh` with a text editor and verify that the following variables are set correctly, or add them to the bottom of the include file as needed:
@ -84,13 +79,10 @@ Open `/etc/default/solr.in.sh` with a text editor and verify that the following
Make sure the user you plan to own the Solr process is the owner of the `SOLR_HOME` directory. For instance, if you plan to run Solr as the "solr" user and `SOLR_HOME` is `/var/solr/data`, then you would do: `sudo chown -R solr: /var/solr/data`
[[UpgradingaSolrCluster-Step4_StartSolr]]
=== Step 4: Start Solr
You are now ready to start the upgraded Solr node by doing: `sudo service solr start`. The upgraded instance will join the existing cluster because you're using the same `SOLR_HOME`, `SOLR_PORT`, and `SOLR_HOST` settings used by the old Solr node; thus, the new server will look like the old node to the running cluster. Be sure to look in `/var/solr/logs/solr.log` for errors during startup.
[[UpgradingaSolrCluster-Step5_RunHealthcheck]]
=== Step 5: Run Healthcheck
You should run the Solr *healthcheck* command for all collections that are hosted on the upgraded node before proceeding to upgrade the next node in your cluster. For instance, if the newly upgraded node hosts a replica for the *MyDocuments* collection, then you can run the following command (replace ZK_HOST with the ZooKeeper connection string):
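Assuming the standard `bin/solr healthcheck` options, such a check might look like the following (the collection name and ZK_HOST placeholder are taken from the example above):

[source,bash]
----
# Check the health of the MyDocuments collection; replace ZK_HOST with your
# ZooKeeper connection string.
bin/solr healthcheck -c MyDocuments -z ZK_HOST
----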

View File

@ -20,7 +20,6 @@
If you are already using Solr 6.5, Solr 6.6 should not present any major problems. However, you should review the {solr-javadocs}/changes/Changes.html[`CHANGES.txt`] file found in your Solr package for changes and updates that may affect your existing implementation. Detailed steps for upgrading a Solr cluster can be found in the appendix: <<upgrading-a-solr-cluster.adoc#upgrading-a-solr-cluster,Upgrading a Solr Cluster>>.
[[UpgradingSolr-Upgradingfrom6.5.x]]
== Upgrading from 6.5.x
* Solr contribs map-reduce, morphlines-core and morphlines-cell have been removed.
@ -29,7 +28,6 @@ If you are already using Solr 6.5, Solr 6.6 should not present any major problem
* ZooKeeper dependency has been upgraded from 3.4.6 to 3.4.10.
[[UpgradingSolr-Upgradingfromearlier6.xversions]]
== Upgrading from earlier 6.x versions
* If you use historical dates, specifically on or before the year 1582, you should re-index after upgrading to this version.
@ -52,7 +50,6 @@ If you are already using Solr 6.5, Solr 6.6 should not present any major problem
* Index-time boosts are now deprecated. As a replacement, index-time scoring factors should be indexed in a separate field and combined with the query score using a function query. These boosts will be removed in Solr 7.0.
* Parallel SQL now uses Apache Calcite as its SQL framework. As part of this change the default aggregation mode has been changed to facet rather than map_reduce. There have also been changes to the SQL aggregate response and some SQL syntax changes. Consult the <<parallel-sql-interface.adoc#parallel-sql-interface,Parallel SQL Interface>> documentation for full details.
[[UpgradingSolr-Upgradingfrom5.5.x]]
== Upgrading from 5.5.x
* The deprecated `SolrServer` and subclasses have been removed, use <<using-solrj.adoc#using-solrj,`SolrClient`>> instead.
@ -74,7 +71,6 @@ If you are already using Solr 6.5, Solr 6.6 should not present any major problem
* <<using-solrj.adoc#using-solrj,SolrJ>> no longer includes `DateUtil`. If for some reason you need to format or parse dates, simply use `Instant.format()` and `Instant.parse()`.
* If you are using spatial4j, please upgrade to 0.6 and <<spatial-search.adoc#spatial-search,edit your `spatialContextFactory`>> to replace `com.spatial4j.core` with `org.locationtech.spatial4j` .
[[UpgradingSolr-UpgradingfromOlderVersionsofSolr]]
== Upgrading from Older Versions of Solr
Users upgrading from older versions are strongly encouraged to consult {solr-javadocs}/changes/Changes.html[`CHANGES.txt`] for the details of _all_ changes since the version they are upgrading from.

View File

@ -26,8 +26,7 @@ If you want to supply your own `ContentHandler` for Solr to use, you can extend
For more information on Solr's Extracting Request Handler, see https://wiki.apache.org/solr/ExtractingRequestHandler.
[[UploadingDatawithSolrCellusingApacheTika-KeyConcepts]]
== Key Concepts
== Key Solr Cell Concepts
When using the Solr Cell framework, it is helpful to keep the following in mind:
@ -42,12 +41,9 @@ When using the Solr Cell framework, it is helpful to keep the following in mind:
[TIP]
====
While Apache Tika is quite powerful, it is not perfect and fails on some files. PDF files are particularly problematic, mostly due to the PDF format itself. In case of a failure processing any file, the `ExtractingRequestHandler` does not have a secondary mechanism to try to extract some text from the file; it will throw an exception and fail.
====
[[UploadingDatawithSolrCellusingApacheTika-TryingoutTikawiththeSolrtechproductsExample]]
== Trying out Tika with the Solr techproducts Example
You can try out the Tika framework using the `techproducts` example included in Solr.
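A minimal sketch of doing so might look like the following; the sample file, document id, and parameters here are illustrative:

[source,bash]
----
# Start the techproducts example, then send a sample PDF through the
# Extracting Request Handler; unknown fields are prefixed with attr_.
bin/solr -e techproducts
bin/post -c techproducts example/exampledocs/solr-word.pdf -params "literal.id=pdf1&uprefix=attr_"
----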
@ -96,8 +92,7 @@ In this command, the `uprefix=attr_` parameter causes all generated fields that
This command allows you to query the document using an attribute, as in: `\http://localhost:8983/solr/techproducts/select?q=attr_meta:microsoft`.
[[UploadingDatawithSolrCellusingApacheTika-InputParameters]]
== Input Parameters
== Solr Cell Input Parameters
The table below describes the parameters accepted by the Extracting Request Handler.
@ -158,8 +153,6 @@ Prefixes all fields that are not defined in the schema with the given prefix. Th
`xpath`::
When extracting, only return Tika XHTML content that satisfies the given XPath expression. See http://tika.apache.org/1.7/index.html for details on the format of Tika XHTML. See also http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput.
[[UploadingDatawithSolrCellusingApacheTika-OrderofOperations]]
== Order of Operations
Here is the order in which the Solr Cell framework, using the Extracting Request Handler and Tika, processes its input.
@ -169,7 +162,6 @@ Here is the order in which the Solr Cell framework, using the Extracting Request
. Tika applies the mapping rules specified by `fmap.__source__=__target__` parameters.
. If `uprefix` is specified, any unknown field names are prefixed with that value, else if `defaultField` is specified, any unknown fields are copied to the default field.
[[UploadingDatawithSolrCellusingApacheTika-ConfiguringtheSolrExtractingRequestHandler]]
== Configuring the Solr ExtractingRequestHandler
If you are not working with the supplied `sample_techproducts_configs` or `_default` <<config-sets.adoc#config-sets,config set>>, you must configure your own `solrconfig.xml` to know about the JARs containing the `ExtractingRequestHandler` and its dependencies:
@ -216,7 +208,6 @@ The `tika.config` entry points to a file containing a Tika configuration. The `d
* `EEEE, dd-MMM-yy HH:mm:ss zzz`
* `EEE MMM d HH:mm:ss yyyy`
[[UploadingDatawithSolrCellusingApacheTika-Parserspecificproperties]]
=== Parser-Specific Properties
Parsers used by Tika may have specific properties to govern how data is extracted. For instance, when using the Tika library from a Java program, the `PDFParserConfig` class has a method `setSortByPosition(boolean)` that can extract vertically oriented text. To access that method via configuration with the `ExtractingRequestHandler`, one can add the `parseContext.config` property to the `solrconfig.xml` file (see above) and then set properties in Tika's `PDFParserConfig` as below. Consult the Tika Java API documentation for configuration parameters that can be set for any particular parsers that require this level of control.
@ -232,14 +223,12 @@ Parsers used by Tika may have specific properties to govern how data is extracte
</entries>
----
[[UploadingDatawithSolrCellusingApacheTika-Multi-CoreConfiguration]]
=== Multi-Core Configuration
For a multi-core configuration, you can specify `sharedLib='lib'` in the `<solr/>` section of `solr.xml` and place the necessary jar files there.
For more information about Solr cores, see <<the-well-configured-solr-instance.adoc#the-well-configured-solr-instance,The Well-Configured Solr Instance>>.
[[UploadingDatawithSolrCellusingApacheTika-IndexingEncryptedDocumentswiththeExtractingUpdateRequestHandler]]
== Indexing Encrypted Documents with the ExtractingUpdateRequestHandler
The ExtractingRequestHandler will decrypt encrypted files and index their content if you supply a password in either `resource.password` on the request, or in a `passwordsFile` file.
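For instance, a hypothetical request supplying the password directly on the request (the file name, id, and password are illustrative):

[source,bash]
----
# Decrypt and index a single protected file by passing resource.password.
bin/post -c techproducts encrypted-report.pdf -params "literal.id=doc9&resource.password=myPassword"
----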
@ -254,11 +243,9 @@ myFileName = myPassword
.*\.pdf$ = myPdfPassword
----
[[UploadingDatawithSolrCellusingApacheTika-Examples]]
== Examples
== Solr Cell Examples
[[UploadingDatawithSolrCellusingApacheTika-Metadata]]
=== Metadata
=== Metadata Created by Tika
As mentioned before, Tika produces metadata about the document. Metadata describes different aspects of a document, such as the author's name, the number of pages, the file size, and so on. The metadata produced depends on the type of document submitted. For instance, PDFs have different metadata than Word documents do.
@ -277,17 +264,10 @@ The size of the stream in bytes.
The content type of the stream, if available.
[IMPORTANT]
====
IMPORTANT: We recommend that you try using the `extractOnly` option to discover which values Solr is setting for these metadata elements.
We recommend that you try using the `extractOnly` option to discover which values Solr is setting for these metadata elements.
====
[[UploadingDatawithSolrCellusingApacheTika-ExamplesofUploadsUsingtheExtractingRequestHandler]]
=== Examples of Uploads Using the Extracting Request Handler
[[UploadingDatawithSolrCellusingApacheTika-CaptureandMapping]]
==== Capture and Mapping
The command below captures `<div>` tags separately, and then maps all the instances of that field to a dynamic field named `foo_t`.
@ -297,18 +277,6 @@ The command below captures `<div>` tags separately, and then maps all the instan
bin/post -c techproducts example/exampledocs/sample.html -params "literal.id=doc2&captureAttr=true&defaultField=_text_&fmap.div=foo_t&capture=div"
----
[[UploadingDatawithSolrCellusingApacheTika-Capture_Mapping]]
==== Capture & Mapping
The command below captures `<div>` tags separately and maps the field to a dynamic field named `foo_t`.
[source,bash]
----
bin/post -c techproducts example/exampledocs/sample.html -params "literal.id=doc3&captureAttr=true&defaultField=_text_&capture=div&fmap.div=foo_t"
----
[[UploadingDatawithSolrCellusingApacheTika-UsingLiteralstoDefineYourOwnMetadata]]
==== Using Literals to Define Your Own Metadata
To add in your own metadata, pass in the literal parameter along with the file:
@ -318,8 +286,7 @@ To add in your own metadata, pass in the literal parameter along with the file:
bin/post -c techproducts -params "literal.id=doc4&captureAttr=true&defaultField=text&capture=div&fmap.div=foo_t&literal.blah_s=Bah" example/exampledocs/sample.html
----
[[UploadingDatawithSolrCellusingApacheTika-XPath]]
==== XPath
==== XPath Expressions
The example below passes in an XPath expression to restrict the XHTML returned by Tika:
@ -328,7 +295,6 @@ The example below passes in an XPath expression to restrict the XHTML returned b
bin/post -c techproducts -params "literal.id=doc5&captureAttr=true&defaultField=text&capture=div&fmap.div=foo_t&xpath=/xhtml:html/xhtml:body/xhtml:div//node()" example/exampledocs/sample.html
----
[[UploadingDatawithSolrCellusingApacheTika-ExtractingDatawithoutIndexingIt]]
=== Extracting Data without Indexing It
Solr allows you to extract data without indexing. You might want to do this if you're using Solr solely as an extraction server or if you're interested in testing Solr extraction.
@ -347,7 +313,6 @@ The output includes XML generated by Tika (and further escaped by Solr's XML) us
bin/post -c techproducts -params "extractOnly=true&wt=ruby&indent=true" -out yes example/exampledocs/sample.html
----
[[UploadingDatawithSolrCellusingApacheTika-SendingDocumentstoSolrwithaPOST]]
== Sending Documents to Solr with a POST
The example below streams the file as the body of the POST, which therefore does not provide Solr with information about the name of the file.
@ -357,7 +322,6 @@ The example below streams the file as the body of the POST, which does not, then
curl "http://localhost:8983/solr/techproducts/update/extract?literal.id=doc6&defaultField=text&commit=true" --data-binary @example/exampledocs/sample.html -H 'Content-type:text/html'
----
[[UploadingDatawithSolrCellusingApacheTika-SendingDocumentstoSolrwithSolrCellandSolrJ]]
== Sending Documents to Solr with Solr Cell and SolrJ
SolrJ is a Java client that you can use to add documents to the index, update the index, or query the index. You'll find more information on SolrJ in <<client-apis.adoc#client-apis,Client APIs>>.

View File

@ -22,7 +22,6 @@ http://www.oracle.com/technetwork/java/javase/tech/javamanagement-140525.html[Ja
Solr, like any other good citizen of the Java universe, can be controlled via a JMX interface. You can enable JMX support by adding lines to `solrconfig.xml`. You can use a JMX client, like jconsole, to connect with Solr. Check out the Wiki page http://wiki.apache.org/solr/SolrJmx for more information. You may also find the following overview of JMX to be useful: http://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html.
[[UsingJMXwithSolr-ConfiguringJMX]]
== Configuring JMX
JMX configuration is provided in `solrconfig.xml`. Please see the http://www.oracle.com/technetwork/java/javase/tech/javamanagement-140525.html[JMX Technology Home Page] for more details.
@ -36,7 +35,6 @@ Enabling/disabling JMX and securing access to MBeanServers is left up to the use
====
[[UsingJMXwithSolr-ConfiguringanExistingMBeanServer]]
=== Configuring an Existing MBeanServer
The command:
@ -48,7 +46,6 @@ The command:
enables JMX support in Solr if and only if an existing MBeanServer is found. Use this if you want to configure JMX with JVM parameters. Remove this to disable exposing Solr configuration and statistics to JMX. If this is specified, Solr will try to list all available MBeanServers and use the first one to register MBeans.
[[UsingJMXwithSolr-ConfiguringanExistingMBeanServerwithagentId]]
=== Configuring an Existing MBeanServer with agentId
The command:
@ -60,7 +57,6 @@ The command:
enables JMX support in Solr if and only if an existing MBeanServer is found matching the given agentId. If multiple servers are found, the first one is used. If none is found, an exception is raised and depending on the configuration, Solr may refuse to start.
[[UsingJMXwithSolr-ConfiguringaNewMBeanServer]]
=== Configuring a New MBeanServer
The command:
@ -72,8 +68,7 @@ The command:
creates a new MBeanServer exposed for remote monitoring at the specific service URL. If the JMXConnectorServer can't be started (probably because the serviceUrl is bad), an exception is thrown.
[[UsingJMXwithSolr-Example]]
==== Example
==== MBean Server Example
Solr's `sample_techproducts_configs` config set uses the simple `<jmx />` configuration option. If you start the example with the necessary JVM system properties to launch an internal MBeanServer, Solr will register with it and you can connect using a tool like `jconsole`:
@ -87,7 +82,6 @@ bin/solr -e techproducts -Dcom.sun.management.jmxremote
3. Connect to the "`start.jar`" shown in the list of local processes.
4. Switch to the "MBeans" tab. You should be able to see "`solr/techproducts`" listed there, at which point you can drill down and see details of every solr plugin.
[[UsingJMXwithSolr-ConfiguringaRemoteConnectiontoSolrJMX]]
=== Configuring a Remote Connection to Solr JMX
If you need to attach a JMX-enabled Java profiling tool, such as JConsole or VisualVM, to a remote Solr server, then you need to enable remote JMX access when starting the Solr server. Simply change the `ENABLE_REMOTE_JMX_OPTS` property in the include file to true. You'll also need to choose a port for the JMX RMI connector to bind to, such as 18983. For example, if your Solr include script sets:
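The sketch below assumes the standard `solr.in.sh` variable names; `RMI_PORT` in particular is an assumption:

[source,bash]
----
# Enable remote JMX and choose a port for the JMX RMI connector to bind to.
ENABLE_REMOTE_JMX_OPTS=true
RMI_PORT=18983      # assumed variable name for the RMI connector port
----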
@ -118,7 +112,5 @@ http://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html
[IMPORTANT]
====
Making JMX connections into machines running behind NATs (e.g. Amazon's EC2 service) is not a simple task. The `java.rmi.server.hostname` system property may help, but running `jconsole` on the server itself and using a remote desktop is often the simplest solution. See http://web.archive.org/web/20130525022506/http://jmsbrdy.com/monitoring-java-applications-running-on-ec2-i.
====

View File

@ -20,7 +20,6 @@
Solr includes an output format specifically for <<response-writers.adoc#ResponseWriters-PythonResponseWriter,Python>>, but <<response-writers.adoc#ResponseWriters-JSONResponseWriter,JSON output>> is a little more robust.
[[UsingPython-SimplePython]]
== Simple Python
Making a query is a simple matter. First, tell Python you will need to make HTTP connections.
@ -50,7 +49,6 @@ for document in response['response']['docs']:
print " Name =", document['name']
----
[[UsingPython-PythonwithJSON]]
== Python with JSON
JSON is a more robust response format, but you will need to add a Python package in order to use it. At a command line, install the simplejson package like this:

View File

@ -45,7 +45,6 @@ SolrClient solr = new CloudSolrClient.Builder().withSolrUrl("http://localhost:89
Once you have a `SolrClient`, you can use it by calling methods like `query()`, `add()`, and `commit()`.
[[UsingSolrJ-BuildingandRunningSolrJApplications]]
== Building and Running SolrJ Applications
The SolrJ API is included with Solr, so you do not have to download or install anything else. However, in order to build and run applications that use SolrJ, you have to add some libraries to the classpath.
@ -69,7 +68,6 @@ You can sidestep a lot of the messing around with the JAR files by using Maven i
If you are worried about the SolrJ libraries expanding the size of your client application, you can use a code obfuscator like http://proguard.sourceforge.net/[ProGuard] to remove APIs that you are not using.
[[UsingSolrJ-SpecifyingSolrUrl]]
== Specifying Solr Base URLs
Most `SolrClient` implementations (with the notable exception of `CloudSolrClient`) require users to specify one or more Solr base URLs, which the client then uses to send HTTP requests to Solr. The path users include on the base URL they provide has an effect on the behavior of the created client from that point on.
@ -77,7 +75,6 @@ Most `SolrClient` implementations (with the notable exception of `CloudSolrClien
. A URL with a path pointing to a specific core or collection (e.g. `http://hostname:8983/solr/core1`). When a core or collection is specified in the base URL, subsequent requests made with that client are not required to re-specify the affected collection. However, the client is limited to sending requests to that core/collection, and can not send requests to any others.
. A URL with a generic path pointing to the root Solr path (e.g. `http://hostname:8983/solr`). When no core or collection is specified in the base URL, requests can be made to any core/collection, but the affected core/collection must be specified on all requests.
[[UsingSolrJ-SettingXMLResponseParser]]
== Setting XMLResponseParser
SolrJ uses a binary format, rather than XML, as its default response format. If you are trying to mix Solr and SolrJ versions where one is version 1.x and the other is 3.x or later, then you MUST use the XML response parser. The binary format changed in 3.x, and the two javabin versions are entirely incompatible. The following code will make this change:
@ -87,7 +84,6 @@ SolrJ uses a binary format, rather than XML, as its default response format. If
solr.setParser(new XMLResponseParser());
----
[[UsingSolrJ-PerformingQueries]]
== Performing Queries
Use `query()` to have Solr search for results. You have to pass a `SolrQuery` object that describes the query, and you will get back a QueryResponse (from the `org.apache.solr.client.solrj.response` package).
@ -132,7 +128,6 @@ The `QueryResponse` is a collection of documents that satisfy the query paramete
SolrDocumentList list = response.getResults();
----
[[UsingSolrJ-IndexingDocuments]]
== Indexing Documents
Other operations are just as simple. To index (add) a document, all you need to do is create a `SolrInputDocument` and pass it along to the `SolrClient`'s `add()` method. This example assumes that the SolrClient object called 'solr' is already created based on the examples shown earlier.
@ -150,7 +145,6 @@ UpdateResponse response = solr.add(document);
solr.commit();
----
[[UsingSolrJ-UploadingContentinXMLorBinaryFormats]]
=== Uploading Content in XML or Binary Formats
SolrJ lets you upload content in binary format instead of the default XML format. Use the following code to upload using binary format, which is the same format SolrJ uses to fetch results. If you are trying to mix Solr and SolrJ versions where one is version 1.x and the other is 3.x or later, then you MUST stick with the XML request writer. The binary format changed in 3.x, and the two javabin versions are entirely incompatible.
@ -160,12 +154,10 @@ SolrJ lets you upload content in binary format instead of the default XML format
solr.setRequestWriter(new BinaryRequestWriter());
----
[[UsingSolrJ-UsingtheConcurrentUpdateSolrClient]]
=== Using the ConcurrentUpdateSolrClient
When implementing Java applications that will bulk load many documents at once, {solr-javadocs}/solr-solrj/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrClient.html[`ConcurrentUpdateSolrClient`] is an alternative to consider instead of `HttpSolrClient`. The `ConcurrentUpdateSolrClient` buffers all added documents and writes them into open HTTP connections. This class is thread safe. Although any SolrClient request can be made with this implementation, it is recommended to use `ConcurrentUpdateSolrClient` only for `/update` requests.
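A minimal construction sketch (the URL, queue size, and thread count are illustrative, not recommendations) might look like:

[source,java]
----
SolrClient bulkClient = new ConcurrentUpdateSolrClient.Builder("http://localhost:8983/solr/techproducts")
    .withQueueSize(20)     // documents buffered before being written to a connection
    .withThreadCount(4)    // parallel connections draining the buffer
    .build();
----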
[[UsingSolrJ-EmbeddedSolrServer]]
== EmbeddedSolrServer
The {solr-javadocs}/solr-core/org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.html[`EmbeddedSolrServer`] class provides an implementation of the `SolrClient` client API that talks directly to a micro-instance of Solr running inside your Java application. This embedded approach is not recommended in most cases and is fairly limited in the set of features it supports; in particular, it cannot be used with <<solrcloud.adoc#solrcloud,SolrCloud>> or <<index-replication.adoc#index-replication,Index Replication>>. `EmbeddedSolrServer` exists primarily to help facilitate testing.
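For testing purposes, a hedged sketch of creating one (the Solr home path and core name are hypothetical) could look like:

[source,java]
----
// Assumed imports: java.nio.file.Paths and
// org.apache.solr.client.solrj.embedded.EmbeddedSolrServer
SolrClient embedded = new EmbeddedSolrServer(Paths.get("/path/to/solr/home"), "core1");
QueryResponse rsp = embedded.query(new SolrQuery("*:*"));
embedded.close();
----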


@ -26,7 +26,6 @@ These files are uploaded in either of the following cases:
* When you create a collection using the `bin/solr` script.
* When you explicitly upload a configuration set to ZooKeeper.
[[UsingZooKeepertoManageConfigurationFiles-StartupBootstrap]]
== Startup Bootstrap
When you try SolrCloud for the first time using `bin/solr -e cloud`, the related configset is uploaded to ZooKeeper automatically and linked with the newly created collection.
@ -49,15 +48,9 @@ The create command will upload a copy of the `_default` configuration directory
Once a configuration directory has been uploaded to ZooKeeper, you can update it using the <<solr-control-script-reference.adoc#solr-control-script-reference,Solr Control Script>>.
[IMPORTANT]
====
It's a good idea to keep these files under version control.
====
IMPORTANT: It's a good idea to keep these files under version control.
[[UsingZooKeepertoManageConfigurationFiles-UploadingConfigurationFilesusingbin_solrorSolrJ]]
== Uploading Configuration Files using bin/solr or SolrJ
In production situations, <<config-sets.adoc#config-sets,Config Sets>> can also be uploaded to ZooKeeper independently of collection creation, using either Solr's <<solr-control-script-reference.adoc#solr-control-script-reference,Solr Control Script>> or the {solr-javadocs}/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.html[CloudSolrClient.uploadConfig] Java method.
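A rough sketch of the SolrJ route (the ZooKeeper address, local path, and configset name are hypothetical; this assumes the `CloudSolrClient.Builder` accepts a ZooKeeper address via `withZkHost`):

[source,java]
----
try (CloudSolrClient cloudClient = new CloudSolrClient.Builder()
        .withZkHost("localhost:9983")
        .build()) {
  // Upload the local configuration directory under the name "myconfig".
  cloudClient.uploadConfig(Paths.get("/path/to/myconfig/conf"), "myconfig");
}
----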
@ -71,19 +64,17 @@ bin/solr zk upconfig -n <name for configset> -d <path to directory with configse
It is strongly recommended that the configurations be kept in a version control system such as Git, SVN, or similar.
[[UsingZooKeepertoManageConfigurationFiles-ManagingYourSolrCloudConfigurationFiles]]
== Managing Your SolrCloud Configuration Files
To update or change your SolrCloud configuration files:
1. Download the latest configuration files from ZooKeeper, using the source control checkout process.
2. Make your changes.
3. Commit your changed file to source control.
4. Push the changes back to ZooKeeper.
5. Reload the collection so that the changes will be in effect.
. Download the latest configuration files from ZooKeeper, using the source control checkout process.
. Make your changes.
. Commit your changed file to source control.
. Push the changes back to ZooKeeper.
. Reload the collection so that the changes will be in effect.
[[UsingZooKeepertoManageConfigurationFiles-PreparingZooKeeperbeforefirstclusterstart]]
== Preparing ZooKeeper before first cluster start
== Preparing ZooKeeper before First Cluster Start
If you will share the same ZooKeeper instance with other applications you should use a _chroot_ in ZooKeeper. Please see <<taking-solr-to-production.adoc#TakingSolrtoProduction-ZooKeeperchroot,ZooKeeper chroot>> for instructions.


@ -34,7 +34,6 @@ The old API and the v2 API differ in three principal ways:
. Endpoint structure: The v2 API endpoint structure has been rationalized and regularized.
. Documentation: The v2 APIs are self-documenting: append `/_introspect` to any valid v2 API path and the API specification will be returned in JSON format.
[[v2API-v2APIPathPrefixes]]
== v2 API Path Prefixes
Following are some v2 API URL paths and path prefixes, along with some of the operations that are supported at these paths and their sub-paths.
@ -57,7 +56,6 @@ Following are some v2 API URL paths and path prefixes, along with some of the op
|`/v2/c/.system/blob` |Upload and download blobs and metadata.
|===
[[v2API-Introspect]]
== Introspect
Append `/_introspect` to any valid v2 API path and the API specification will be returned in JSON format.
@ -72,7 +70,6 @@ Most endpoints support commands provided in a body sent via POST. To limit the i
`\http://localhost:8983/v2/c/gettingstarted/_introspect?method=POST&command=modify`
[[v2API-InterpretingtheIntrospectOutput]]
=== Interpreting the Introspect Output
Example: `\http://localhost:8983/v2/c/gettingstarted/get/_introspect`
@ -154,13 +151,11 @@ Example of introspect for a POST API: `\http://localhost:8983/v2/c/gettingstarte
"/c/gettingstarted/update":["POST"]},
[... more sub-paths ...]
}
----
The `"commands"` section in the above example has one entry for each command supported at this endpoint. The key is the command name and the value is a JSON object describing the command structure using JSON Schema (see http://json-schema.org/ for a description).
[[v2API-InvocationExamples]]
== Invocation Examples
For the "gettingstarted" collection, set the replication factor and whether to automatically add replicas (see above for the introspect output for the `"modify"` command used here):


@ -42,7 +42,6 @@ The above example shows the optional initialization and custom tool parameters u
== Configuration & Usage
[[VelocityResponseWriter-VelocityResponseWriterinitializationparameters]]
=== VelocityResponseWriter Initialization Parameters
`template.base.dir`::
@ -66,7 +65,6 @@ External "tools" can be specified as a list of string name/value (tool name / clas
+
A custom registered tool can override the built-in context objects with the same name, except for `$request`, `$response`, `$page`, and `$debug` (these tools are designed to not be overridden).
[[VelocityResponseWriter-VelocityResponseWriterrequestparameters]]
=== VelocityResponseWriter Request Parameters
`v.template`::
@ -102,7 +100,6 @@ Resource bundles can be added by providing a JAR file visible by the SolrResourc
`v.template._template_name_`:: When the "params" resource loader is enabled, templates can be specified as part of the Solr request.
[[VelocityResponseWriter-VelocityResponseWritercontextobjects]]
=== VelocityResponseWriter Context Objects
// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed


@ -27,7 +27,6 @@ The `currency` FieldType provides support for monetary values to Solr/Lucene wit
* Currency parsing by either currency code or symbol
* Symmetric & asymmetric exchange rates (asymmetric exchange rates are useful if there are fees associated with exchanging the currency)
[[WorkingwithCurrenciesandExchangeRates-ConfiguringCurrencies]]
== Configuring Currencies
.CurrencyField has been Deprecated
@ -40,12 +39,12 @@ The `currency` field type is defined in `schema.xml`. This is the default config
[source,xml]
----
<fieldType name="currency" class="solr.CurrencyFieldType"
<fieldType name="currency" class="solr.CurrencyFieldType"
amountLongSuffix="_l_ns" codeStrSuffix="_s_ns"
defaultCurrency="USD" currencyConfig="currency.xml" />
----
In this example, we have defined the name and class of the field type, and defined the `defaultCurrency` as "USD", for U.S. Dollars. We have also defined a `currencyConfig` to use a file called "currency.xml". This is a file of exchange rates between our default currency and other currencies. There is an alternate implementation that would allow regular downloading of currency data. See <<WorkingwithCurrenciesandExchangeRates-ExchangeRates,Exchange Rates>> below for more.
In this example, we have defined the name and class of the field type, and defined the `defaultCurrency` as "USD", for U.S. Dollars. We have also defined a `currencyConfig` to use a file called "currency.xml". This is a file of exchange rates between our default currency and other currencies. There is an alternate implementation that would allow regular downloading of currency data. See <<Exchange Rates>> below for more.
Many of the example schemas that ship with Solr include a <<dynamic-fields.adoc#dynamic-fields,dynamic field>> that uses this type, such as this example:
@ -60,10 +59,9 @@ At indexing time, money fields can be indexed in a native currency. For example,
During query processing, range and point queries are both supported.
[[WorkingwithCurrenciesandExchangeRates-Sub-fieldSuffixes]]
=== Sub-field Suffixes
You must specify parameters `amountLongSuffix` and `codeStrSuffix`, corresponding to dynamic fields to be used for the raw amount and the currency dynamic sub-fields, e.g.:
You must specify parameters `amountLongSuffix` and `codeStrSuffix`, corresponding to dynamic fields to be used for the raw amount and the currency dynamic sub-fields, e.g.:
[source,xml]
----
@ -80,12 +78,10 @@ In the above example, the raw amount field will use the `"*_l_ns"` dynamic field
As noted in <<updating-parts-of-documents.adoc#UpdatingPartsofDocuments-FieldStorage,Updating Parts of Documents>>, stored dynamic sub-fields will cause indexing to fail when you use Atomic Updates. To avoid this problem, specify `stored="false"` on those dynamic fields.
====
[[WorkingwithCurrenciesandExchangeRates-ExchangeRates]]
== Exchange Rates
You configure exchange rates by specifying a provider. Natively, two provider types are supported: `FileExchangeRateProvider` or `OpenExchangeRatesOrgProvider`.
[[WorkingwithCurrenciesandExchangeRates-FileExchangeRateProvider]]
=== FileExchangeRateProvider
This provider requires you to provide a file of exchange rates. It is the default, meaning that to use this provider you only need to specify the file path and name as a value for `currencyConfig` in the definition for this type.
@ -103,9 +99,9 @@ There is a sample `currency.xml` file included with Solr, found in the same dire
<rate from="USD" to="CAD" rate="1.030815" comment="CANADA Dollar" />
<!-- Cross-rates for some common currencies -->
<rate from="EUR" to="GBP" rate="0.869914" />
<rate from="EUR" to="NOK" rate="7.800095" />
<rate from="GBP" to="NOK" rate="8.966508" />
<rate from="EUR" to="GBP" rate="0.869914" />
<rate from="EUR" to="NOK" rate="7.800095" />
<rate from="GBP" to="NOK" rate="8.966508" />
<!-- Asymmetrical rates -->
<rate from="EUR" to="USD" rate="0.5" />
@ -113,7 +109,6 @@ There is a sample `currency.xml` file included with Solr, found in the same dire
</currencyConfig>
----
[[WorkingwithCurrenciesandExchangeRates-OpenExchangeRatesOrgProvider]]
=== OpenExchangeRatesOrgProvider
You can configure Solr to download exchange rates from http://www.OpenExchangeRates.Org[OpenExchangeRates.Org], which updates rates between USD and 170 currencies hourly. These rates are symmetrical only.
@ -122,10 +117,10 @@ In this case, you need to specify the `providerClass` in the definitions for the
[source,xml]
----
<fieldType name="currency" class="solr.CurrencyFieldType"
<fieldType name="currency" class="solr.CurrencyFieldType"
amountLongSuffix="_l_ns" codeStrSuffix="_s_ns"
providerClass="solr.OpenExchangeRatesOrgProvider"
refreshInterval="60"
refreshInterval="60"
ratesFileLocation="http://www.openexchangerates.org/api/latest.json?app_id=yourPersonalAppIdKey"/>
----


@ -20,7 +20,6 @@
The EnumField type allows defining a field whose values are a closed set, with a sort order that is pre-determined but neither alphabetic nor numeric. Examples of this are severity lists or risk definitions.
[[WorkingwithEnumFields-DefininganEnumFieldinschema.xml]]
== Defining an EnumField in schema.xml
The EnumField type definition is quite simple, as in this example defining field types for "priorityLevel" and "riskLevel" enumerations:
@ -33,11 +32,10 @@ The EnumField type definition is quite simple, as in this example defining field
Besides the `name` and the `class`, which are common to all field types, this type also takes two additional parameters:
* `enumsConfig`: the name of a configuration file that contains the `<enum/>` list of field values and their order that you wish to use with this field type. If a path to the file is not specified, the file should be in the `conf` directory for the collection.
* `enumName`: the name of the specific enumeration in the `enumsConfig` file to use for this type.
`enumsConfig`:: the name of a configuration file that contains the `<enum/>` list of field values and their order that you wish to use with this field type. If a path to the file is not specified, the file should be in the `conf` directory for the collection.
`enumName`:: the name of the specific enumeration in the `enumsConfig` file to use for this type.
[[WorkingwithEnumFields-DefiningtheEnumFieldconfigurationfile]]
== Defining the EnumField configuration file
== Defining the EnumField Configuration File
The file named with the `enumsConfig` parameter can contain multiple enumeration value lists with different names if there are multiple uses for enumerations in your Solr schema.
@ -68,9 +66,7 @@ In this example, there are two value lists defined. Each list is between `enum`
.Changing Values
[IMPORTANT]
====
You cannot change the order of, or remove, existing values in an `<enum/>` without reindexing.
You can, however, add new values to the end.
====


@ -20,7 +20,6 @@
This section describes using ZooKeeper access control lists (ACLs) with Solr. For information about ZooKeeper ACLs, see the ZooKeeper documentation at http://zookeeper.apache.org/doc/r3.4.10/zookeeperProgrammers.html#sc_ZooKeeperAccessControl.
[[ZooKeeperAccessControl-AboutZooKeeperACLs]]
== About ZooKeeper ACLs
SolrCloud uses ZooKeeper for shared information and for coordination.
@ -44,7 +43,6 @@ Protecting ZooKeeper itself could mean many different things. **This section is
But this content is also available to "the outside" via the ZooKeeper API. Outside processes can connect to ZooKeeper and create/update/delete/read content; for example, a Solr node in a SolrCloud cluster wants to create/update/delete/read, and a SolrJ client wants to read from the cluster. It is the responsibility of the outside processes that create/update content to set up ACLs on the content. ACLs describe who is allowed to read, update, delete, create, etc. Each piece of information (znode/content) in ZooKeeper has its own set of ACLs, and inheritance or sharing is not possible. The default behavior in Solr is to add one ACL on all the content it creates - one ACL that gives anyone the permission to do anything (in ZooKeeper terms this is called "the open-unsafe ACL").
[[ZooKeeperAccessControl-HowtoEnableACLs]]
== How to Enable ACLs
We want to be able to:
@ -55,7 +53,6 @@ We want to be able to:
Solr nodes, clients and tools (e.g. ZkCLI) always use a Java class called {solr-javadocs}/solr-solrj/org/apache/solr/common/cloud/SolrZkClient.html[`SolrZkClient`] to interact with ZooKeeper. The implementation of the solution described here is all about changing `SolrZkClient`. If you use `SolrZkClient` in your application, the descriptions below will be true for your application too.
[[ZooKeeperAccessControl-ControllingCredentials]]
=== Controlling Credentials
You control which credentials provider will be used by configuring the `zkCredentialsProvider` property in `solr.xml` 's `<solrcloud>` section to the name of a class (on the classpath) implementing the {solr-javadocs}/solr-solrj/org/apache/solr/common/cloud/ZkCredentialsProvider[`ZkCredentialsProvider`] interface. `server/solr/solr.xml` in the Solr distribution defines the `zkCredentialsProvider` such that it will take on the value of the same-named `zkCredentialsProvider` system property if it is defined (e.g. by uncommenting the `SOLR_ZK_CREDS_AND_ACLS` environment variable definition in `solr.in.sh/.cmd` - see below), or if not, default to the `DefaultZkCredentialsProvider` implementation.
@ -69,12 +66,10 @@ You can always make your own implementation, but Solr comes with two implementati
** The schema is "digest". The username and password are defined by system properties `zkDigestUsername` and `zkDigestPassword`. This set of credentials will be added to the list of credentials returned by `getCredentials()` if both username and password are provided.
** If the set of credentials described above is not added to the list, this implementation will fall back to the default behavior and use the (empty) credentials list from `DefaultZkCredentialsProvider`.
[[ZooKeeperAccessControl-ControllingACLs]]
=== Controlling ACLs
You control which ACLs will be added by configuring the `zkACLProvider` property in `solr.xml` 's `<solrcloud>` section to the name of a class (on the classpath) implementing the {solr-javadocs}/solr-solrj/org/apache/solr/common/cloud/ZkACLProvider[`ZkACLProvider`] interface. `server/solr/solr.xml` in the Solr distribution defines the `zkACLProvider` such that it will take on the value of the same-named `zkACLProvider` system property if it is defined (e.g. by uncommenting the `SOLR_ZK_CREDS_AND_ACLS` environment variable definition in `solr.in.sh/.cmd` - see below), or if not, default to the `DefaultZkACLProvider` implementation.
[[ZooKeeperAccessControl-OutoftheBoxImplementations]]
==== Out of the Box ACL Implementations
You can always make your own implementation, but Solr comes with:
@ -97,8 +92,6 @@ Notice the overlap in system property names with credentials provider `VMParamsS
You can give the readonly credentials to "clients" of your SolrCloud cluster - e.g. to be used by SolrJ clients. They will be able to read whatever is necessary to run a functioning SolrJ client, but they will not be able to modify any content in ZooKeeper.
[[ZooKeeperAccessControl-bin_solr_solr.cmd_server_scripts_cloud-scripts_zkcli.sh_zkcli.bat]]
=== ZooKeeper ACLs in Solr Scripts
There are two scripts that impact ZooKeeper ACLs:
@ -150,7 +143,6 @@ REM -DzkDigestUsername=admin-user -DzkDigestPassword=CHANGEME-ADMIN-PASSWORD ^
REM -DzkDigestReadonlyUsername=readonly-user -DzkDigestReadonlyPassword=CHANGEME-READONLY-PASSWORD
----
[[ZooKeeperAccessControl-ChangingACLSchemes]]
== Changing ACL Schemes
Over the lifetime of operating your Solr cluster, you may decide to move from an unsecured ZooKeeper to a secured instance. Changing the configured `zkACLProvider` in `solr.xml` will ensure that newly created nodes are secure, but will not protect the already existing data. To modify all existing ACLs, you can use the `updateacls` command with Solr's ZkCLI. First uncomment the `SOLR_ZK_CREDS_AND_ACLS` environment variable definition in `server/scripts/cloud-scripts/zkcli.sh` (or `zkcli.bat` on Windows) and fill in the passwords for the admin-user and the readonly-user - see above - then run `server/scripts/cloud-scripts/zkcli.sh -cmd updateacls /zk-path`, or on Windows run `server\scripts\cloud-scripts\zkcli.bat cmd updateacls /zk-path`.