SOLR-14069: Ref guide: overhaul: resources, libs, plugins, config-sets (#1077)

* split "resource-and-plugin-loading.adoc" into "resource-loading.adoc" and "libs.adoc" then overhauled both.
* enhanced "config-sets.adoc", moving some content in from elsewhere; bit of an overhaul.
* solr-plugins.adoc is now top-level; overhauled content
* Move resource-loading.adoc up a level in the TOC to underneath "The Well-Configured Solr Instance".
* Separate out the leading sentence.
This commit is contained in:
David Smiley 2019-12-14 11:50:00 -05:00 committed by GitHub
parent 49c34028ab
commit 7c048c5070
GPG Key ID: 4AEE18F83AFDEB23
15 changed files with 252 additions and 132 deletions

View File

@ -33,12 +33,9 @@ Since the Analytics framework is a _search component_, it must be declared as su
For distributed analytics requests over cloud collections, the component uses the `AnalyticsHandler` strictly for inter-shard communication.
Users should not submit analytics requests to the Analytics Handler directly.
To configure Solr to use the Analytics Component, the first step is to add a `<lib/>` directive so Solr loads the Analytic Component classes (for more about the `<lib/>` directive, see <<resource-and-plugin-loading.adoc#lib-directives-in-solrconfig,Lib Directives in SolrConfig>>). In the section of `solrconfig.xml` where the default `<lib/>` directives are, add a line:
To use the Analytics Component, the first step is to install this contrib module's plugins into Solr -- see the <<solr-plugins.adoc#installing-plugins,Solr Plugins>> section on how to do this.
[source,xml]
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-analytics-\d.*\.jar" />
Next you need to enable the request handler and search component. Add the following lines to `solrconfig.xml`, near the definitions for other request handlers:
Next you need to register the request handler and search component. Add the following lines to `solrconfig.xml`, near the definitions for other request handlers:
[source,xml]
.solrconfig.xml

View File

@ -16,17 +16,24 @@
// specific language governing permissions and limitations
// under the License.
On a multicore Solr instance, you may find that you want to share configuration between a number of different cores. You can achieve this using named configsets, which are essentially shared configuration directories stored under a configurable configset base directory.
Configsets are a set of configuration files used in a Solr installation: `solrconfig.xml`, the schema, and then <<resource-loading.adoc#resource-loading,resources>> like language files, `synonyms.txt`, DIH-related configuration, and others that are referenced from the config or schema.
Configsets are made up of the configuration files used in a Solr installation, including `solrconfig.xml`, the schema, language files, `synonyms.txt`, DIH-related configuration, and others as needed for your implementation.
Such configuration, _configsets_, can be named and then referenced by collections or cores, possibly with the intent to share them to avoid duplication.
Solr ships with two example configsets located in `server/solr/configsets`, which can be used as a base for your own. These example configsets are named `_default` and `sample_techproducts_configs`.
== Configsets in Standalone Mode
If you are using Solr in standalone mode, configsets are created on the filesystem.
If you are using Solr in standalone mode, configsets are managed on the filesystem.
To create a configset, add a new directory under the configset base directory. The configset will be identified by the name of this directory. Then into this copy the configuration directory you want to share. The structure should look something like this:
Each Solr core can have its own configset located beneath it in an `<instance_dir>/conf/` directory.
In this case, the configuration is neither named nor shared, and the term _configset_ isn't used for it.
In Solr's early years, this was _the only way_ it was configured.
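For illustration, such a per-core layout looks something like this (the file names beyond `solrconfig.xml` and the schema vary by configuration):

[source,bash]
----
<instance_dir>/
    core.properties
    conf/
        solrconfig.xml
        managed-schema
    data/
----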
To create a named configset, add a new directory under the configset base directory.
The configset will be identified by the name of this directory.
Then add a `conf/` directory containing the configuration you want to share.
The structure should look something like this:
[source,bash]
----
@ -76,4 +83,16 @@ curl -v -X POST -H 'Content-type: application/json' -d '{
== Configsets in SolrCloud Mode
In SolrCloud mode, you can use the <<configsets-api.adoc#configsets-api,Configsets API>> to manage your configsets.
In SolrCloud, it's critical to understand that configsets are fundamentally stored in ZooKeeper _and not_ the file system.
Solr's `_default` configset is uploaded to ZooKeeper on initialization.
This configset and some demonstration ones remain on the filesystem, but Solr does not use them at all in this mode.
When you create a collection in SolrCloud, you can specify a named configset -- possibly shared.
If you don't, then the `_default` will be copied and given a unique name for use by this collection.
A configset can be uploaded to ZooKeeper either via the <<configsets-api.adoc#configsets-api,Configsets API>> or more directly via <<solr-control-script-reference.adoc#upload-a-configuration-set,`bin/solr zk upconfig`>>.
The Configsets API has some other operations as well, and likewise, so does the CLI.
To upload a file to a configset already stored on ZooKeeper, you can use <<solr-control-script-reference.adoc#copy-between-local-files-and-zookeeper-znodes,`bin/solr zk cp`>>.
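As an illustrative sketch of the commands above (the configset name, local paths, and ZooKeeper address are hypothetical):

[source,bash]
----
# Upload a local configset directory to ZooKeeper under the name "shared_config"
bin/solr zk upconfig -z localhost:9983 -n shared_config -d /path/to/myconfig

# Copy a single file into a configset already stored in ZooKeeper
bin/solr zk cp file:/path/to/synonyms.txt zk:/configs/shared_config/synonyms.txt -z localhost:9983
----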
CAUTION: By default, ZooKeeper's file size limit is 1MB. If your files are larger than this, you'll need to either <<setting-up-an-external-zookeeper-ensemble.adoc#increasing-the-file-size-limit,increase the ZooKeeper file size limit>> or store them instead <<libs.adoc#lib-directives-in-solrconfig,on the filesystem>>.

View File

@ -1,5 +1,15 @@
= Configuring solrconfig.xml
:page-children: datadir-and-directoryfactory-in-solrconfig, resource-and-plugin-loading, schema-factory-definition-in-solrconfig, indexconfig-in-solrconfig, requesthandlers-and-searchcomponents-in-solrconfig, initparams-in-solrconfig, updatehandlers-in-solrconfig, query-settings-in-solrconfig, requestdispatcher-in-solrconfig, update-request-processors, codec-factory
:page-children: datadir-and-directoryfactory-in-solrconfig, \
schema-factory-definition-in-solrconfig, \
indexconfig-in-solrconfig, \
requesthandlers-and-searchcomponents-in-solrconfig, \
initparams-in-solrconfig, \
updatehandlers-in-solrconfig, \
query-settings-in-solrconfig, \
requestdispatcher-in-solrconfig, \
update-request-processors, \
codec-factory
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
@ -38,7 +48,6 @@ The `solrconfig.xml` file is located in the `conf/` directory for each collectio
We've covered the options in the following sections:
* <<datadir-and-directoryfactory-in-solrconfig.adoc#datadir-and-directoryfactory-in-solrconfig,DataDir and DirectoryFactory in SolrConfig>>
* <<resource-and-plugin-loading.adoc#lib-directives-in-solrconfig,Lib Directives in SolrConfig>>
* <<schema-factory-definition-in-solrconfig.adoc#schema-factory-definition-in-solrconfig,Schema Factory Definition in SolrConfig>>
* <<indexconfig-in-solrconfig.adoc#indexconfig-in-solrconfig,IndexConfig in SolrConfig>>
* <<requesthandlers-and-searchcomponents-in-solrconfig.adoc#requesthandlers-and-searchcomponents-in-solrconfig,RequestHandlers and SearchComponents in SolrConfig>>
@ -49,6 +58,9 @@ We've covered the options in the following sections:
* <<update-request-processors.adoc#update-request-processors,Update Request Processors>>
* <<codec-factory.adoc#codec-factory,Codec Factory>>
Some SolrConfig aspects are covered in other sections.
See <<libs.adoc#lib-directives-in-solrconfig,lib directives in SolrConfig>>, which can be used for both Plugins and Resources.
== Substituting Properties in Solr Config Files
Solr supports variable substitution of property values in configuration files, which allows runtime specification of various configuration options in `solrconfig.xml`. The syntax is `${propertyname[:optional default value]}`. This allows defining a default that can be overridden when Solr is launched. If a default value is not specified, then the property _must_ be specified at runtime or the configuration file will generate an error when parsed.
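For example, a property reference with a fallback default might appear in `solrconfig.xml` like this (the `myapp.default.rows` property name is hypothetical; `solr.data.dir` is a commonly used property):

[source,xml]
----
<!-- uses the value of solr.data.dir if set; otherwise the empty default applies -->
<dataDir>${solr.data.dir:}</dataDir>

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- falls back to 10 if myapp.default.rows is not set at launch -->
    <int name="rows">${myapp.default.rows:10}</int>
  </lst>
</requestHandler>
----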

View File

@ -80,7 +80,7 @@ Here is an example of a minimal OpenNLP `langid` configuration in `solrconfig.xm
==== OpenNLP-specific Parameters
`langid.model`::
An OpenNLP language detection model. The OpenNLP project provides a pre-trained 103-language model on the http://opennlp.apache.org/models.html[OpenNLP site's model download page]. Model training instructions are provided on the http://opennlp.apache.org/docs/{ivy-opennlp-version}/manual/opennlp.html#tools.langdetect[OpenNLP website]. This parameter is required. See <<resource-and-plugin-loading.adoc#resource-and-plugin-loading,Resource and Plugin Loading>> for information on where to put the model.
An OpenNLP language detection model. The OpenNLP project provides a pre-trained 103-language model on the http://opennlp.apache.org/models.html[OpenNLP site's model download page]. Model training instructions are provided on the http://opennlp.apache.org/docs/{ivy-opennlp-version}/manual/opennlp.html#tools.langdetect[OpenNLP website]. This parameter is required. See <<resource-loading.adoc#resource-loading,Resource Loading>> for information on where to put the model.
==== OpenNLP Language Codes

View File

@ -732,7 +732,7 @@ Note that for this filter to work properly, the upstream tokenizer must not remo
This filter is a custom Unicode normalization form that applies the foldings specified in http://www.unicode.org/reports/tr30/tr30-4.html[Unicode TR #30: Character Foldings] in addition to the `NFKC_Casefold` normalization form as described in <<ICU Normalizer 2 Filter>>. This filter is a better substitute for the combined behavior of the <<ASCII Folding Filter>>, <<Lower Case Filter>>, and <<ICU Normalizer 2 Filter>>.
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<resource-and-plugin-loading.adoc#resources-and-plugins-on-the-filesystem,Resources and Plugins on the Filesystem>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
*Factory class:* `solr.ICUFoldingFilterFactory`
@ -840,7 +840,7 @@ This filter factory normalizes text according to one of five Unicode Normalizati
For detailed information about these normalization forms, see http://unicode.org/reports/tr15/[Unicode Normalization Forms].
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<resource-and-plugin-loading.adoc#resources-and-plugins-on-the-filesystem,Resources and Plugins on the Filesystem>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
== ICU Transform Filter
@ -882,7 +882,7 @@ This filter applies http://userguide.icu-project.org/transforms/general[ICU Tran
For detailed information about ICU Transforms, see http://userguide.icu-project.org/transforms/general.
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<resource-and-plugin-loading.adoc#resources-and-plugins-on-the-filesystem,Resources and Plugins on the Filesystem>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
== Keep Word Filter
@ -2210,7 +2210,7 @@ NOTE: Although this filter produces correct token graphs, it cannot consume an i
*Arguments:*
`synonyms`:: (required) The path of a file that contains a list of synonyms, one per line. In the (default) `solr` format - see the `format` argument below for alternatives - blank lines and lines that begin with "`#`" are ignored. This may be a comma-separated list of paths. See <<resource-and-plugin-loading.adoc#resource-and-plugin-loading,Resource and Plugin Loading>> for more information.
`synonyms`:: (required) The path of a file that contains a list of synonyms, one per line. In the (default) `solr` format - see the `format` argument below for alternatives - blank lines and lines that begin with "`#`" are ignored. This may be a comma-separated list of paths. See <<resource-loading.adoc#resource-loading,Resource Loading>> for more information.
+
There are two ways to specify synonym mappings:
+

View File

@ -1,5 +1,24 @@
= Apache Solr Reference Guide
:page-children: about-this-guide, getting-started, deployment-and-operations, using-the-solr-administration-user-interface, documents-fields-and-schema-design, understanding-analyzers-tokenizers-and-filters, indexing-and-basic-data-operations, searching, streaming-expressions, solrcloud, legacy-scaling-and-distribution, the-well-configured-solr-instance, monitoring-solr, securing-solr, client-apis, further-assistance, solr-glossary, errata, how-to-contribute
:page-children: about-this-guide, \
getting-started, \
deployment-and-operations, \
using-the-solr-administration-user-interface, \
documents-fields-and-schema-design, \
understanding-analyzers-tokenizers-and-filters, \
indexing-and-basic-data-operations, \
searching, \
streaming-expressions, \
solrcloud, \
legacy-scaling-and-distribution, \
solr-plugins, \
the-well-configured-solr-instance, \
monitoring-solr, \
securing-solr, \
client-apis, \
further-assistance, \
solr-glossary, \
errata, \
how-to-contribute
:page-notitle:
:page-toc: false
:page-layout: home

View File

@ -166,7 +166,7 @@ Compound words are most commonly found in Germanic languages.
*Arguments:*
`dictionary`:: (required) The path of a file that contains a list of simple words, one per line. Blank lines and lines that begin with "#" are ignored. See <<resource-and-plugin-loading.adoc#resource-and-plugin-loading,Resource and Plugin Loading>> for more information.
`dictionary`:: (required) The path of a file that contains a list of simple words, one per line. Blank lines and lines that begin with "#" are ignored. See <<resource-loading.adoc#resource-loading,Resource Loading>> for more information.
`minWordSize`:: (integer, default 5) Any token shorter than this is not decompounded.
@ -220,7 +220,7 @@ Unicode Collation in Solr is fast, because all the work is done at index time.
Rather than specifying an analyzer within `<fieldtype ... class="solr.TextField">`, the `solr.CollationField` and `solr.ICUCollationField` field type classes provide this functionality. `solr.ICUCollationField`, which is backed by http://site.icu-project.org[the ICU4J library], provides more flexible configuration, has more locales, is significantly faster, and requires less memory and less index space, since its keys are smaller than those produced by the JDK implementation that backs `solr.CollationField`.
To use `solr.ICUCollationField`, you must add additional .jars to Solr's classpath (as described in the section <<resource-and-plugin-loading.adoc#resources-and-plugins-on-the-filesystem,Resources and Plugins on the Filesystem>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
To use `solr.ICUCollationField`, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
`solr.ICUCollationField` and `solr.CollationField` fields can be created in two ways:
@ -487,7 +487,7 @@ The `lucene/analysis/opennlp` module provides OpenNLP integration via several an
NOTE: The <<OpenNLP Tokenizer>> must be used with all other OpenNLP analysis components, for two reasons: first, the OpenNLP Tokenizer detects and marks the sentence boundaries required by all the OpenNLP filters; and second, since the pre-trained OpenNLP models used by these filters were trained using the corresponding language-specific sentence-detection/tokenization models, the same tokenization, using the same models, must be used at runtime for optimal performance.
To use the OpenNLP components, you must add additional .jars to Solr's classpath (as described in the section <<resource-and-plugin-loading.adoc#resources-and-plugins-on-the-filesystem,Resources and Plugins on the Filesystem>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
To use the OpenNLP components, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
=== OpenNLP Tokenizer
@ -497,9 +497,9 @@ The OpenNLP Tokenizer takes two language-specific binary model files as paramete
*Arguments:*
`sentenceModel`:: (required) The path of a language-specific OpenNLP sentence detection model file. See <<resource-and-plugin-loading.adoc#resource-and-plugin-loading,Resource and Plugin Loading>> for more information.
`sentenceModel`:: (required) The path of a language-specific OpenNLP sentence detection model file. See <<resource-loading.adoc#resource-loading,Resource Loading>> for more information.
`tokenizerModel`:: (required) The path of a language-specific OpenNLP tokenization model file. See <<resource-and-plugin-loading.adoc#resource-and-plugin-loading,Resource and Plugin Loading>> for more information.
`tokenizerModel`:: (required) The path of a language-specific OpenNLP tokenization model file. See <<resource-loading.adoc#resource-loading,Resource Loading>> for more information.
*Example:*
@ -541,7 +541,7 @@ NOTE: Lucene currently does not index token types, so if you want to keep this i
*Arguments:*
`posTaggerModel`:: (required) The path of a language-specific OpenNLP POS tagger model file. See <<resource-and-plugin-loading.adoc#resource-and-plugin-loading,Resource and Plugin Loading>> for more information.
`posTaggerModel`:: (required) The path of a language-specific OpenNLP POS tagger model file. See <<resource-loading.adoc#resource-loading,Resource Loading>> for more information.
*Examples:*
@ -636,7 +636,7 @@ NOTE: Lucene currently does not index token types, so if you want to keep this i
*Arguments:*
`chunkerModel`:: (required) The path of a language-specific OpenNLP phrase chunker model file. See <<resource-and-plugin-loading.adoc#resource-and-plugin-loading,Resource and Plugin Loading>> for more information.
`chunkerModel`:: (required) The path of a language-specific OpenNLP phrase chunker model file. See <<resource-loading.adoc#resource-loading,Resource Loading>> for more information.
*Examples*:
@ -700,9 +700,9 @@ This filter replaces the text of each token with its lemma. Both a dictionary-ba
Either `dictionary` or `lemmatizerModel` must be provided, and both may be provided - see the examples below:
`dictionary`:: (optional) The path of a lemmatization dictionary file. See <<resource-and-plugin-loading.adoc#resource-and-plugin-loading,Resource and Plugin Loading>> for more information. The dictionary file must be encoded as UTF-8, with one entry per line, in the form `word[tab]lemma[tab]part-of-speech`, e.g., `wrote[tab]write[tab]VBD`.
`dictionary`:: (optional) The path of a lemmatization dictionary file. See <<resource-loading.adoc#resource-loading,Resource Loading>> for more information. The dictionary file must be encoded as UTF-8, with one entry per line, in the form `word[tab]lemma[tab]part-of-speech`, e.g., `wrote[tab]write[tab]VBD`.
`lemmatizerModel`:: (optional) The path of a language-specific OpenNLP lemmatizer model file. See <<resource-and-plugin-loading.adoc#resource-and-plugin-loading,Resource and Plugin Loading>> for more information.
`lemmatizerModel`:: (optional) The path of a language-specific OpenNLP lemmatizer model file. See <<resource-loading.adoc#resource-loading,Resource Loading>> for more information.
*Examples:*
@ -1033,7 +1033,7 @@ Solr can stem Catalan using the Snowball Porter Stemmer with an argument of `lan
=== Traditional Chinese
The default configuration of the <<tokenizers.adoc#icu-tokenizer,ICU Tokenizer>> is suitable for Traditional Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<resource-and-plugin-loading.adoc#resources-and-plugins-on-the-filesystem,Resources and Plugins on the Filesystem>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add.
The default configuration of the <<tokenizers.adoc#icu-tokenizer,ICU Tokenizer>> is suitable for Traditional Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add.
<<tokenizers.adoc#standard-tokenizer,Standard Tokenizer>> can also be used to tokenize Traditional Chinese text. Following the Word Break rules from the Unicode Text Segmentation algorithm, it produces one token per Chinese character. When combined with <<CJK Bigram Filter>>, overlapping bigrams of Chinese characters are formed.
@ -1105,9 +1105,9 @@ See the example under <<Traditional Chinese>>.
=== Simplified Chinese
For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the <<HMM Chinese Tokenizer>>. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<resource-and-plugin-loading.adoc#resources-and-plugins-on-the-filesystem,Resources and Plugins on the Filesystem>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add.
For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the <<HMM Chinese Tokenizer>>. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add.
The default configuration of the <<tokenizers.adoc#icu-tokenizer,ICU Tokenizer>> is also suitable for Simplified Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<resource-and-plugin-loading.adoc#resources-and-plugins-on-the-filesystem,Resources and Plugins on the Filesystem>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add.
The default configuration of the <<tokenizers.adoc#icu-tokenizer,ICU Tokenizer>> is also suitable for Simplified Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add.
Also useful for Chinese analysis:
@ -1162,7 +1162,7 @@ Also useful for Chinese analysis:
=== HMM Chinese Tokenizer
For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the `solr.HMMChineseTokenizerFactory` in the `analysis-extras` contrib module. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<resource-and-plugin-loading.adoc#resources-and-plugins-on-the-filesystem,Resources and Plugins on the Filesystem>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the `solr.HMMChineseTokenizerFactory` in the `analysis-extras` contrib module. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
*Factory class:* `solr.HMMChineseTokenizerFactory`
@ -1958,7 +1958,7 @@ Example:
[[hebrew-lao-myanmar-khmer]]
=== Hebrew, Lao, Myanmar, Khmer
Lucene provides support, in addition to UAX#29 word break rules, for Hebrew's use of the double and single quote characters, and for segmenting Lao, Myanmar, and Khmer into syllables with the `solr.ICUTokenizerFactory` in the `analysis-extras` contrib module. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<resource-and-plugin-loading.adoc#resources-and-plugins-on-the-filesystem,Resources and Plugins on the Filesystem>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
Lucene provides support, in addition to UAX#29 word break rules, for Hebrew's use of the double and single quote characters, and for segmenting Lao, Myanmar, and Khmer into syllables with the `solr.ICUTokenizerFactory` in the `analysis-extras` contrib module. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
See <<tokenizers.adoc#icu-tokenizer,the ICUTokenizer>> for more information.
@ -2165,7 +2165,7 @@ Solr includes support for normalizing Persian, and Lucene includes an example st
=== Polish
Solr provides support for Polish stemming with the `solr.StempelPolishStemFilterFactory`, and `solr.MorphologikFilterFactory` for lemmatization, in the `contrib/analysis-extras` module. The `solr.StempelPolishStemFilterFactory` component includes an algorithmic stemmer with tables for Polish. To use either of these filters, you must add additional .jars to Solr's classpath (as described in the section <<resource-and-plugin-loading.adoc#resources-and-plugins-on-the-filesystem,Resources and Plugins on the Filesystem>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
Solr provides support for Polish stemming with the `solr.StempelPolishStemFilterFactory`, and `solr.MorphologikFilterFactory` for lemmatization, in the `contrib/analysis-extras` module. The `solr.StempelPolishStemFilterFactory` component includes an algorithmic stemmer with tables for Polish. To use either of these filters, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
*Factory class:* `solr.StempelPolishStemFilterFactory` and `solr.MorfologikFilterFactory`
@ -2682,7 +2682,7 @@ Solr includes support for stemming Turkish with the `solr.SnowballPorterFilterFa
=== Ukrainian
Solr provides support for Ukrainian lemmatization with the `solr.MorphologikFilterFactory`, in the `contrib/analysis-extras` module. To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<resource-and-plugin-loading.adoc#resources-and-plugins-on-the-filesystem,Resources and Plugins on the Filesystem>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
Solr provides support for Ukrainian lemmatization with the `solr.MorphologikFilterFactory`, in the `contrib/analysis-extras` module. To use this filter, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add.
Lucene also includes an example Ukrainian stopword list, in the `lucene-analyzers-morfologik` jar.

View File

@ -533,7 +533,7 @@ Assuming that you consider to use a large model placed at `/path/to/models/myMod
}
----
First, add the directory to Solr's resource paths with a <<resource-and-plugin-loading.adoc#lib-directives-in-solrconfig,`<lib/>` directive>>:
First, add the directory to Solr's resource paths with a <<libs.adoc#lib-directives-in-solrconfig,`<lib/>` directive>>:
[source,xml]
----

View File

@ -0,0 +1,78 @@
= Lib Directories and Directives
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
Here we describe two simple and effective methods to make the `.jar` files for Solr plugins visible to Solr.
Such files are sometimes called "libraries" or "libs" for short.
Essentially, you can either put them in designated directories or explicitly reference them from your configuration.
If there are overlaps or inter-dependencies between libraries, pay attention to the order. You can think of it like a stack that is searched top-down. At the top are the `<lib/>` directives in reverse order, then the Solr core's lib directory, then the Solr home lib directory, then Solr itself.
== Lib Directories
There are several special places you can place Solr plugin `.jar` files:
* `<solr_home>/lib/`: The `.jar` files placed here are available to all Solr cores running on the node, and to node level plugins referenced in `solr.xml` -- so basically everything.
This directory is not present by default, so create it.
See <<taking-solr-to-production.adoc#solr-home-directory,Solr home directory>>.
* `<core_instance>/lib/`: In standalone Solr, you may want to add plugins just for a specific Solr core.
Create this adjacent to the `conf/` directory; it's not present by default.
* `<solr_install>/server/solr-webapp/webapp/WEB-INF/lib/`: The `.jar` files for Solr itself and its dependencies live here.
Certain plugins or add-ons to plugins require placement here.
They will document themselves to say so.
Solr incorporates Jetty for providing HTTP server functionality.
Jetty has some directories that contain `.jar` files for itself and its own plugins / modules or JVM level plugins (e.g. loggers).
Solr plugins won't work in these locations.
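Putting this together, a standalone Solr installation that uses both optional `lib/` directories might be laid out as follows (the core name and annotations are illustrative):

----
<solr_install>/server/solr-webapp/webapp/WEB-INF/lib/   (Solr itself and its dependencies)
<solr_home>/lib/                                        (available to all cores on the node)
<solr_home>/mycore/conf/
<solr_home>/mycore/lib/                                 (available only to this core)
----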
== Lib Directives in SolrConfig
_Both_ plugin and <<resource-loading.adoc#resource-loading,resource>> file paths are configurable via `<lib/>` directives in `solrconfig.xml`.
When a directive matches a directory, then resources can be resolved from it.
When a directive matches a `.jar` file, Solr plugins and their dependencies are resolved from it.
Resources can be placed in a `.jar` too but that's unusual.
It's erroneous to refer to any other type of file.
A `<lib/>` directive must have one (not both) of these two attributes:
* `path`: used to refer to a single directory (for resources) or file (for a plugin `.jar`)
* `dir`: used to refer to _all_ direct descendants of the specified directory. Optionally supply a `regex` attribute to filter these to those matching the regular expression.
All directories are resolved as relative to the Solr core's `instanceDir`.
These examples show how to load contrib modules into Solr:
[source,xml]
----
<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-clustering-\d.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/langid/lib/" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-langid-\d.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/velocity/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-velocity-\d.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-ltr-\d.*\.jar" />
----
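A `path` directive, by contrast, refers to exactly one file or directory rather than a directory's children.
For example, to load a single plugin jar (the file name and location here are hypothetical):

[source,xml]
----
<lib path="/opt/solr-plugins/my-plugin-1.0.jar" />
----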
@ -1,86 +0,0 @@
= Resource and Plugin Loading
Solr components can be configured using *resources*: data stored in external files that may be referred to in a location-independent fashion. Examples include: files needed by schema components, e.g., a stopword list for <<filter-descriptions.adoc#stop-filter,Stop Filter>>; and machine-learned models for <<learning-to-rank.adoc#learning-to-rank,Learning to Rank>>.
Solr *plugins*, which can be configured in `solrconfig.xml`, are Java classes that are normally packaged in `.jar` files along with supporting classes and data. Solr ships with a number of built-in plugins, and can also be configured to use custom plugins. Example plugins are the <<uploading-structured-data-store-data-with-the-data-import-handler.adoc#uploading-structured-data-store-data-with-the-data-import-handler,Data Import Handler>> and custom search components.
Resources and plugins may be stored:
* in ZooKeeper under a collection's configset node (SolrCloud only);
* on a filesystem accessible to Solr nodes; or
* in Solr's <<blob-store-api.adoc#blob-store-api,Blob Store>> (SolrCloud only).
NOTE: Schema components may not be stored as plugins in the Blob Store, and cannot access resources stored in the Blob Store.
== Resource and Plugin Loading Sequence
Under SolrCloud, resources and plugins to be loaded are first looked up in ZooKeeper under the collection's configset znode. If the resource or plugin is not found there, Solr will fall back to loading <<Resources and Plugins on the Filesystem,from the filesystem>>.
Note that by default, Solr will not attempt to load resources and plugins from the Blob Store. To enable this, see the section <<blob-store-api.adoc#use-a-blob-in-a-handler-or-component,Use a Blob in a Handler or Component>>. When loading from the Blob Store is enabled for a component, lookups occur only in the Blob Store, and never in ZooKeeper or on the filesystem.
== Resources and Plugins in ConfigSets on ZooKeeper
Resources and plugins may be uploaded to ZooKeeper as part of a configset, either via the <<configsets-api.adoc#configsets-api,Configsets API>> or <<solr-control-script-reference.adoc#upload-a-configuration-set,`bin/solr zk upload`>>.
To upload a plugin or resource to a configset already stored on ZooKeeper, you can use <<solr-control-script-reference.adoc#copy-between-local-files-and-zookeeper-znodes,`bin/solr zk cp`>>.
CAUTION: By default, ZooKeeper's file size limit is 1MB. If your files are larger than this, you'll need to either <<setting-up-an-external-zookeeper-ensemble.adoc#increasing-the-file-size-limit,increase the ZooKeeper file size limit>> or store them instead <<Resources and Plugins on the Filesystem,on the filesystem>>.
== Resources and Plugins on the Filesystem
Under standalone Solr, when looking up a plugin or resource to be loaded, Solr's resource loader will first look under the `<instanceDir>/conf/` directory. If the plugin or resource is not found, the configured plugin and resource file paths are searched - see the section <<Lib Directives in SolrConfig>> below.
On core load, Solr's resource loader constructs a list of paths (subdirectories and jars), first under <<solr_homelib,`solr_home/lib`>>, and then under directories pointed to by <<Lib Directives in SolrConfig,`<lib/>` directives in SolrConfig>>.
When looking up a resource or plugin to be loaded, the paths on the list are searched in the order they were added.
NOTE: Under SolrCloud, each node hosting a collection replica will need its own copy of plugins and resources to be loaded.
To get Solr's resource loader to find resources either under subdirectories or in jar files that were created after Solr's resource path list was constructed, reload the collection (SolrCloud) or the core (standalone Solr). Restarting all affected Solr nodes also works.
WARNING: Resource files *will not be loaded* if they are located directly under either `solr_home/lib` or a directory given by the `dir` attribute on a `<lib/>` directive in SolrConfig. Resources are only searched for under subdirectories or in jar files found in those locations.
=== solr_home/lib
Each Solr node can have a directory named `lib/` under the <<taking-solr-to-production.adoc#solr-home-directory,Solr home directory>>. In order to use this directory to host resources or plugins, it must first be manually created.
=== Lib Directives in SolrConfig
Plugin and resource file paths are configurable via `<lib/>` directives in `solrconfig.xml`.
Loading occurs in the order `<lib/>` directives appear in `solrconfig.xml`. If there are dependencies, list the lowest level dependency jar first.
A regular expression supplied in the `<lib/>` element's `regex` attribute value can be used to restrict which subdirectories and/or jar files are added to the Solr resource loader's list of search locations. If no regular expression is given, all direct subdirectory and jar children are included in the resource path list. All directories are resolved as relative to the Solr core's `instanceDir`.
From an example SolrConfig:
[source,xml]
----
<lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />
<lib dir="../../../contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-clustering-\d.*\.jar" />
<lib dir="../../../contrib/langid/lib/" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-langid-\d.*\.jar" />
<lib dir="../../../contrib/velocity/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-velocity-\d.*\.jar" />
----
@ -0,0 +1,44 @@
= Resource Loading
Solr components can be configured using *resources*: data stored in external files that may be referred to in a location-independent fashion.
Examples of resources include: files needed by schema components, e.g., a stopword list for <<filter-descriptions.adoc#stop-filter,Stop Filter>>; and machine-learned models for <<learning-to-rank.adoc#learning-to-rank,Learning to Rank>>.
_Resources are typically resolved from the configSet_ but there are other options too.
Solr's resources are generally only loaded initially when the Solr collection or Solr core is loaded.
After you update a resource, you'll typically need to _reload_ the affected collections (SolrCloud) or the cores (standalone Solr).
Restarting all affected Solr nodes also works.
<<managed-resources.adoc#managed-resources,Managed resources>> can be manipulated via APIs and do not need an explicit reload.
== Resources in ConfigSets
<<config-sets.adoc#config-sets,ConfigSets>> are the directories containing `solrconfig.xml`, the schema, and resources referenced by them.
In SolrCloud they are in ZooKeeper whereas in standalone they are on the file system.
In either mode, a configSet might be shared by multiple collections or cores, or dedicated to just one.
Prefer to put resources here.
== Resources in Other Places
Resources can also be placed in an arbitrary directory and <<libs.adoc#lib-directives-in-solrconfig,referenced>> from a `<lib />` directive in `solrconfig.xml`, provided the directive refers to a directory and not the actual resource file. Example: `<lib path="/volume/models/" />`
This choice may make sense if the resource is too large for a configSet in ZooKeeper.
However it's up to you to somehow ensure that all nodes in your cluster have access to these resources.
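For example, a schema component can resolve a resource by file name from such a directory; the directory and file names below are hypothetical:

[source,xml]
----
<!-- In solrconfig.xml: expose the directory's files as resources -->
<lib path="/volume/resources/" />

<!-- In the schema: the stopwords file is then resolved by name -->
<filter class="solr.StopFilterFactory" ignoreCase="true" words="big-stopwords.txt"/>
----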
Finally, and this is very unusual, resources can also be packaged inside `.jar` files, from which they will be loaded.
That might make sense for a plugin's default resources, which a user can then override by placing a same-named file in a configSet.
@ -1,5 +1,8 @@
= Solr Plugins
:page-children: adding-custom-plugins-in-solrcloud-mode
:page-children: libs, \
package-manager, \
adding-custom-plugins-in-solrcloud-mode
@ -17,8 +20,37 @@
Solr allows you to load custom code to perform a variety of tasks within Solr, from custom Request Handlers to process your searches, to custom Analyzers and Token Filters for your text field. You can even load custom Field Types. These pieces of custom code are called _plugins_.
One of Solr's strengths is providing a rich platform of functionality with the option of adding your own custom components running within Solr.
Not everyone will need to create plugins for their Solr instances - what's provided is usually enough for most applications. However, if there's something that you need, you may want to review the Solr Wiki documentation on plugins at https://cwiki.apache.org/confluence/display/solr/SolrPlugins[SolrPlugins].
Solr calls such components *plugins* when the implementation is configurable.
You have likely seen many already throughout Solr's configuration, referenced via the `class` attribute.
Common examples are Request Handlers, Search Components, and Query Parsers to process your searches, and Token Filters for processing text.
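For example, registering a query parser plugin in `solrconfig.xml` maps a name to the implementing class (the class below is hypothetical, standing in for a custom plugin):

[source,xml]
----
<queryParser name="myparser" class="com.example.MyQParserPlugin"/>
----

Requests can then select it with `defType=myparser` or with `{!myparser}` in a query.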
If you have a plugin you would like to use, and you are running in SolrCloud mode, you can use the Blob Store API and the Config API to load the jars to Solr. The commands to use are described in the section <<adding-custom-plugins-in-solrcloud-mode.adoc#adding-custom-plugins-in-solrcloud-mode,Adding Custom Plugins in SolrCloud Mode>>.
Most apps don't need to create plugins because Solr offers a rich set of them built-in.
However if you do, start by looking at the code for existing similar plugins.
Writing your own is an advanced subject that is out of scope for this reference guide.
One resource is the Solr Wiki documentation on plugins at https://cwiki.apache.org/confluence/display/solr/SolrPlugins[SolrPlugins], which is rather out-of-date but has some utility.
== Installing Plugins
Most plugins are built into Solr and there is nothing to install.
The subject here is how to make other plugins available to Solr, including those in contrib modules.
A plugin is packaged into a Java jar file and may require other dependent jar files to function.
The next sections describe some options:
* <<libs.adoc#lib-directories,Lib Directories>>:
Describes where to put the plugin's JAR files on the file system; either in one of the special places or a place convenient to you along with a `<lib/>` directive in `solrconfig.xml`.
This has been the standard approach since Solr's inception.
It's simple and reliable, but it's entirely up to you to ensure that all nodes in a cluster have the files.
Contrib modules ship with Solr, so no extra effort is needed for them, but that's not so for other plugins (yours or third-party).
* <<package-manager.adoc#package-manager,Package Management>>:
Describes a new and experimental system to manage packages of plugins in SolrCloud.
It includes CLI commands, cluster-wide installation, use of plugin registries that host plugins, cryptographically signed plugins for security, and more.
Only some plugins support this.
* <<adding-custom-plugins-in-solrcloud-mode.adoc#adding-custom-plugins-in-solrcloud-mode,Blob and Runtimelib>>:
Describes a deprecated system that predates the above package management system.
Its functionality is a subset of the package management system's.
It will no longer be supported in Solr 9.
@ -1,5 +1,12 @@
= The Well-Configured Solr Instance
:page-children: configuring-solrconfig-xml, solr-cores-and-solr-xml, configuration-apis, implicit-requesthandlers, solr-plugins, jvm-settings, v2-api, package-manager
:page-children: configuring-solrconfig-xml, \
solr-cores-and-solr-xml, \
resource-loading, \
configuration-apis, \
implicit-requesthandlers, \
jvm-settings, \
v2-api
@ -25,14 +32,12 @@ This section covers the following topics:
<<solr-cores-and-solr-xml.adoc#solr-cores-and-solr-xml,Solr Cores and solr.xml>>: Describes how to work with `solr.xml` and `core.properties` to configure your Solr core, or multiple Solr cores within a single instance.
<<resource-loading.adoc#resource-loading,Resource Loading>>: Describes how word lists, model files, and other related data are resolved by the components that need them.
<<configuration-apis.adoc#configuration-apis,Configuration APIs>>: Describes several APIs used to configure Solr: Blob Store, Config, Request Parameters and Managed Resources.
<<implicit-requesthandlers.adoc#implicit-requesthandlers,Implicit RequestHandlers>>: Describes various end-points automatically provided by Solr and how to configure them.
<<solr-plugins.adoc#solr-plugins,Solr Plugins>>: Introduces Solr plugins with pointers to more information.
<<package-manager.adoc#glossary-of-terms, Packages and Package Management>>: Installing, deploying and updating packages (containing plugins) into a Solr cluster
<<jvm-settings.adoc#jvm-settings,JVM Settings>>: Gives some guidance on best practices for working with Java Virtual Machines.
<<v2-api.adoc#v2-api,V2 API>>: Describes how to use the new V2 APIs, a redesigned API framework covering most Solr APIs.
@ -516,7 +516,7 @@ The default configuration for `solr.ICUTokenizerFactory` provides UAX#29 word br
[IMPORTANT]
====
To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<resource-and-plugin-loading.adoc#resources-and-plugins-on-the-filesystem,Resources and Plugins on the Filesystem>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add.
To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <<solr-plugins.adoc#installing-plugins,Solr Plugins>>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add.
====
@ -353,7 +353,7 @@ The {solr-javadocs}/solr-langid/index.html[`langid`] contrib provides::
The {solr-javadocs}/solr-analysis-extras/index.html[`analysis-extras`] contrib provides::
{solr-javadocs}/solr-analysis-extras/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.html[OpenNLPExtractNamedEntitiesUpdateProcessorFactory]::: Update document(s) to be indexed with named entities extracted using an OpenNLP NER model. Note that in order to use model files larger than 1MB on SolrCloud, you must either <<setting-up-an-external-zookeeper-ensemble#increasing-the-file-size-limit,configure both ZooKeeper server and clients>> or <<resource-and-plugin-loading.adoc#resources-and-plugins-on-the-filesystem,store the model files on the filesystem>> on each node hosting a collection replica.
{solr-javadocs}/solr-analysis-extras/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.html[OpenNLPExtractNamedEntitiesUpdateProcessorFactory]::: Update document(s) to be indexed with named entities extracted using an OpenNLP NER model. Note that in order to use model files larger than 1MB on SolrCloud, you must either <<setting-up-an-external-zookeeper-ensemble#increasing-the-file-size-limit,configure both ZooKeeper server and clients>> or <<libs.adoc#lib-directives-in-solrconfig,store the model files on the filesystem>> on each node hosting a collection replica.
=== Update Processor Factories You Should _Not_ Modify or Remove