From 7c048c5070988f35d38d5f592fad5d295ddb380a Mon Sep 17 00:00:00 2001
From: David Smiley
Date: Sat, 14 Dec 2019 11:50:00 -0500
Subject: [PATCH] SOLR-14069: Ref guide: overhaul: resources, libs, plugins, config-sets (#1077)

* split "resource-and-plugin-loading.adoc" into "resource-loading.adoc" and "libs.adoc" then overhauled both.
* enhanced "config-sets.adoc", moving some content in from elsewhere; bit of an overhaul.
* solr-plugins.adoc is now top-level; overhauled content
* Move resource-loading.adoc up a level in the TOC to underneath "The Well-Configured Solr Instance.
* Separate out the leading sentence.
---
 solr/solr-ref-guide/src/analytics.adoc        |  7 +-
 solr/solr-ref-guide/src/config-sets.adoc      | 29 +++++--
 .../src/configuring-solrconfig-xml.adoc       | 16 +++-
 .../detecting-languages-during-indexing.adoc  |  2 +-
 .../src/filter-descriptions.adoc              |  8 +-
 solr/solr-ref-guide/src/index.adoc            | 21 ++++-
 .../solr-ref-guide/src/language-analysis.adoc | 32 +++----
 solr/solr-ref-guide/src/learning-to-rank.adoc |  2 +-
 solr/solr-ref-guide/src/libs.adoc             | 78 +++++++++++++++++
 .../src/resource-and-plugin-loading.adoc      | 86 -------------------
 solr/solr-ref-guide/src/resource-loading.adoc | 44 ++++++++++
 solr/solr-ref-guide/src/solr-plugins.adoc     | 40 ++++++++-
 .../the-well-configured-solr-instance.adoc    | 15 ++--
 solr/solr-ref-guide/src/tokenizers.adoc       |  2 +-
 .../src/update-request-processors.adoc        |  2 +-
 15 files changed, 252 insertions(+), 132 deletions(-)
 create mode 100644 solr/solr-ref-guide/src/libs.adoc
 delete mode 100644 solr/solr-ref-guide/src/resource-and-plugin-loading.adoc
 create mode 100644 solr/solr-ref-guide/src/resource-loading.adoc

diff --git a/solr/solr-ref-guide/src/analytics.adoc b/solr/solr-ref-guide/src/analytics.adoc
index 748ee88fe6b..3ad70670a4f 100644
--- a/solr/solr-ref-guide/src/analytics.adoc
+++ b/solr/solr-ref-guide/src/analytics.adoc
@@ -33,12 +33,9 @@ Since the Analytics framework is a _search component_, it must be declared as su
 For distributed 
analytics requests over cloud collections, the component uses the `AnalyticsHandler` strictly for inter-shard communication. The Analytics Handler should not be used by users to submit analytics requests. -To configure Solr to use the Analytics Component, the first step is to add a `` directive so Solr loads the Analytic Component classes (for more about the `` directive, see <>). In the section of `solrconfig.xml` where the default `` directives are, add a line: +To use the Analytics Component, the first step is to install this contrib module's plugins into Solr -- see the <> section on how to do this. -[source,xml] - - -Next you need to enable the request handler and search component. Add the following lines to `solrconfig.xml`, near the defintions for other request handlers: +Next you need to register the request handler and search component. Add the following lines to `solrconfig.xml`, near the defintions for other request handlers: [source,xml] .solrconfig.xml diff --git a/solr/solr-ref-guide/src/config-sets.adoc b/solr/solr-ref-guide/src/config-sets.adoc index 7bb8a25035c..b846dcbbcd4 100644 --- a/solr/solr-ref-guide/src/config-sets.adoc +++ b/solr/solr-ref-guide/src/config-sets.adoc @@ -16,17 +16,24 @@ // specific language governing permissions and limitations // under the License. -On a multicore Solr instance, you may find that you want to share configuration between a number of different cores. You can achieve this using named configsets, which are essentially shared configuration directories stored under a configurable configset base directory. +Configsets are a set of configuration files used in a Solr installation: `solrconfig.xml`, the schema, and then <> like language files, `synonyms.txt`, DIH-related configuration, and others that are referenced from the config or schema. 
-Configsets are made up of the configuration files used in a Solr installation: inclduding `solrconfig.xml`, the schema, language-files, `synonyms.txt`, DIH-related configuration, and others as needed for your implementation. +Such configuration, _configsets_, can be named and then referenced by collections or cores, possibly with the intent to share them to avoid duplication. Solr ships with two example configsets located in `server/solr/configsets`, which can be used as a base for your own. These example configsets are named `_default` and `sample_techproducts_configs`. == Configsets in Standalone Mode -If you are using Solr in standalone mode, configsets are created on the filesystem. +If you are using Solr in standalone mode, configsets are managed on the filesystem. -To create a configset, add a new directory under the configset base directory. The configset will be identified by the name of this directory. Then into this copy the configuration directory you want to share. The structure should look something like this: +Each Solr core can have it's very own configSet located beneath it in a `/conf/` dir. +Here, it is not named or shared and the word _configset_ isn't found. +In Solr's early years, this was _the only way_ it was configured. + +To create a named configset, add a new directory under the configset base directory. +The configset will be identified by the name of this directory. +Then add a `conf/` directory containing the configuration you want to share. +The structure should look something like this: [source,bash] ---- @@ -76,4 +83,16 @@ curl -v -X POST -H 'Content-type: application/json' -d '{ == Configsets in SolrCloud Mode -In SolrCloud mode, you can use the <> to manage your configsets. +In SolrCloud, it's critical to understand that configsets are fundamentally stored in ZooKeeper _and not_ the file system. +Solr's `_default` configset is uploaded to ZooKeeper on initialization. 
+This and some demonstration ones remain on the file system but Solr does not use them whatsoever in this mode. + +When you create a collection in SolrCloud, you can specify a named configset -- possibly shared. +If you don't, then the `_default` will be copied and given a unique name for use by this collection. + +A configset can be uploaded to ZooKeeper either via the <> or more directly via <>. +The Configsets API has some other operations as well, and likewise, so does the CLI. + +To upload a file to a configset already stored on ZooKeeper, you can use <>. + +CAUTION: By default, ZooKeeper's file size limit is 1MB. If your files are larger than this, you'll need to either <> or store them instead <>. \ No newline at end of file diff --git a/solr/solr-ref-guide/src/configuring-solrconfig-xml.adoc b/solr/solr-ref-guide/src/configuring-solrconfig-xml.adoc index aaeb31b3740..fccd9d27435 100644 --- a/solr/solr-ref-guide/src/configuring-solrconfig-xml.adoc +++ b/solr/solr-ref-guide/src/configuring-solrconfig-xml.adoc @@ -1,5 +1,15 @@ = Configuring solrconfig.xml -:page-children: datadir-and-directoryfactory-in-solrconfig, resource-and-plugin-loading, schema-factory-definition-in-solrconfig, indexconfig-in-solrconfig, requesthandlers-and-searchcomponents-in-solrconfig, initparams-in-solrconfig, updatehandlers-in-solrconfig, query-settings-in-solrconfig, requestdispatcher-in-solrconfig, update-request-processors, codec-factory +:page-children: datadir-and-directoryfactory-in-solrconfig, \ + schema-factory-definition-in-solrconfig, \ + indexconfig-in-solrconfig, \ + requesthandlers-and-searchcomponents-in-solrconfig, \ + initparams-in-solrconfig, \ + updatehandlers-in-solrconfig, \ + query-settings-in-solrconfig, \ + requestdispatcher-in-solrconfig, \ + update-request-processors, \ + codec-factory + // Licensed to the Apache Software Foundation (ASF) under one // or more contributor license agreements. 
See the NOTICE file // distributed with this work for additional information @@ -38,7 +48,6 @@ The `solrconfig.xml` file is located in the `conf/` directory for each collectio We've covered the options in the following sections: * <> -* <> * <> * <> * <> @@ -49,6 +58,9 @@ We've covered the options in the following sections: * <> * <> +Some SolrConfig aspects are covered in other sections. +See <>, which can be used for both Plugins and Resources. + == Substituting Properties in Solr Config Files Solr supports variable substitution of property values in configuration files, which allows runtime specification of various configuration options in `solrconfig.xml`. The syntax is `${propertyname[:option default value]`}. This allows defining a default that can be overridden when Solr is launched. If a default value is not specified, then the property _must_ be specified at runtime or the configuration file will generate an error when parsed. diff --git a/solr/solr-ref-guide/src/detecting-languages-during-indexing.adoc b/solr/solr-ref-guide/src/detecting-languages-during-indexing.adoc index 8d446a29fe7..92e5986dead 100644 --- a/solr/solr-ref-guide/src/detecting-languages-during-indexing.adoc +++ b/solr/solr-ref-guide/src/detecting-languages-during-indexing.adoc @@ -80,7 +80,7 @@ Here is an example of a minimal OpenNLP `langid` configuration in `solrconfig.xm ==== OpenNLP-specific Parameters `langid.model`:: -An OpenNLP language detection model. The OpenNLP project provides a pre-trained 103 language model on the http://opennlp.apache.org/models.html[OpenNLP site's model dowload page]. Model training instructions are provided on the http://opennlp.apache.org/docs/{ivy-opennlp-version}/manual/opennlp.html#tools.langdetect[OpenNLP website]. This parameter is required. See <> for information on where to put the model. +An OpenNLP language detection model. 
The OpenNLP project provides a pre-trained 103 language model on the http://opennlp.apache.org/models.html[OpenNLP site's model dowload page]. Model training instructions are provided on the http://opennlp.apache.org/docs/{ivy-opennlp-version}/manual/opennlp.html#tools.langdetect[OpenNLP website]. This parameter is required. See <> for information on where to put the model. ==== OpenNLP Language Codes diff --git a/solr/solr-ref-guide/src/filter-descriptions.adoc b/solr/solr-ref-guide/src/filter-descriptions.adoc index f59a366a5b1..1ddfd531aa2 100644 --- a/solr/solr-ref-guide/src/filter-descriptions.adoc +++ b/solr/solr-ref-guide/src/filter-descriptions.adoc @@ -732,7 +732,7 @@ Note that for this filter to work properly, the upstream tokenizer must not remo This filter is a custom Unicode normalization form that applies the foldings specified in http://www.unicode.org/reports/tr30/tr30-4.html[Unicode TR #30: Character Foldings] in addition to the `NFKC_Casefold` normalization form as described in <>. This filter is a better substitute for the combined behavior of the <>, <>, and <>. -To use this filter, you must add additional .jars to Solr's classpath (as described in the section <>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. +To use this filter, you must add additional .jars to Solr's classpath (as described in the section <>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. *Factory class:* `solr.ICUFoldingFilterFactory` @@ -840,7 +840,7 @@ This filter factory normalizes text according to one of five Unicode Normalizati For detailed information about these normalization forms, see http://unicode.org/reports/tr15/[Unicode Normalization Forms]. -To use this filter, you must add additional .jars to Solr's classpath (as described in the section <>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. 
+To use this filter, you must add additional .jars to Solr's classpath (as described in the section <>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. == ICU Transform Filter @@ -882,7 +882,7 @@ This filter applies http://userguide.icu-project.org/transforms/general[ICU Tran For detailed information about ICU Transforms, see http://userguide.icu-project.org/transforms/general. -To use this filter, you must add additional .jars to Solr's classpath (as described in the section <>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. +To use this filter, you must add additional .jars to Solr's classpath (as described in the section <>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. == Keep Word Filter @@ -2210,7 +2210,7 @@ NOTE: Although this filter produces correct token graphs, it cannot consume an i *Arguments:* -`synonyms`:: (required) The path of a file that contains a list of synonyms, one per line. In the (default) `solr` format - see the `format` argument below for alternatives - blank lines and lines that begin with "`#`" are ignored. This may be a comma-separated list of paths. See <> for more information. +`synonyms`:: (required) The path of a file that contains a list of synonyms, one per line. In the (default) `solr` format - see the `format` argument below for alternatives - blank lines and lines that begin with "`#`" are ignored. This may be a comma-separated list of paths. See <> for more information. 
+ There are two ways to specify synonym mappings: + diff --git a/solr/solr-ref-guide/src/index.adoc b/solr/solr-ref-guide/src/index.adoc index 53df5005aec..f187d2a7916 100644 --- a/solr/solr-ref-guide/src/index.adoc +++ b/solr/solr-ref-guide/src/index.adoc @@ -1,5 +1,24 @@ = Apache Solr Reference Guide -:page-children: about-this-guide, getting-started, deployment-and-operations, using-the-solr-administration-user-interface, documents-fields-and-schema-design, understanding-analyzers-tokenizers-and-filters, indexing-and-basic-data-operations, searching, streaming-expressions, solrcloud, legacy-scaling-and-distribution, the-well-configured-solr-instance, monitoring-solr, securing-solr, client-apis, further-assistance, solr-glossary, errata, how-to-contribute +:page-children: about-this-guide, \ + getting-started, \ + deployment-and-operations, \ + using-the-solr-administration-user-interface, \ + documents-fields-and-schema-design, \ + understanding-analyzers-tokenizers-and-filters, \ + indexing-and-basic-data-operations, \ + searching, \ + streaming-expressions, \ + solrcloud, \ + legacy-scaling-and-distribution, \ + solr-plugins, \ + the-well-configured-solr-instance, \ + monitoring-solr, \ + securing-solr, \ + client-apis, \ + further-assistance, \ + solr-glossary, \ + errata, \ + how-to-contribute :page-notitle: :page-toc: false :page-layout: home diff --git a/solr/solr-ref-guide/src/language-analysis.adoc b/solr/solr-ref-guide/src/language-analysis.adoc index d71b8f5ba8e..adeb4b006da 100644 --- a/solr/solr-ref-guide/src/language-analysis.adoc +++ b/solr/solr-ref-guide/src/language-analysis.adoc @@ -166,7 +166,7 @@ Compound words are most commonly found in Germanic languages. *Arguments:* -`dictionary`:: (required) The path of a file that contains a list of simple words, one per line. Blank lines and lines that begin with "#" are ignored. See <> for more information. 
+`dictionary`:: (required) The path of a file that contains a list of simple words, one per line. Blank lines and lines that begin with "#" are ignored. See <> for more information. `minWordSize`:: (integer, default 5) Any token shorter than this is not decompounded. @@ -220,7 +220,7 @@ Unicode Collation in Solr is fast, because all the work is done at index time. Rather than specifying an analyzer within ``, the `solr.CollationField` and `solr.ICUCollationField` field type classes provide this functionality. `solr.ICUCollationField`, which is backed by http://site.icu-project.org[the ICU4J library], provides more flexible configuration, has more locales, is significantly faster, and requires less memory and less index space, since its keys are smaller than those produced by the JDK implementation that backs `solr.CollationField`. -To use `solr.ICUCollationField`, you must add additional .jars to Solr's classpath (as described in the section <>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. +To use `solr.ICUCollationField`, you must add additional .jars to Solr's classpath (as described in the section <>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. `solr.ICUCollationField` and `solr.CollationField` fields can be created in two ways: @@ -487,7 +487,7 @@ The `lucene/analysis/opennlp` module provides OpenNLP integration via several an NOTE: The <> must be used with all other OpenNLP analysis components, for two reasons: first, the OpenNLP Tokenizer detects and marks the sentence boundaries required by all the OpenNLP filters; and second, since the pre-trained OpenNLP models used by these filters were trained using the corresponding language-specific sentence-detection/tokenization models, the same tokenization, using the same models, must be used at runtime for optimal performance. 
-To use the OpenNLP components, you must add additional .jars to Solr's classpath (as described in the section <>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. +To use the OpenNLP components, you must add additional .jars to Solr's classpath (as described in the section <>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. === OpenNLP Tokenizer @@ -497,9 +497,9 @@ The OpenNLP Tokenizer takes two language-specific binary model files as paramete *Arguments:* -`sentenceModel`:: (required) The path of a language-specific OpenNLP sentence detection model file. See <> for more information. +`sentenceModel`:: (required) The path of a language-specific OpenNLP sentence detection model file. See <> for more information. -`tokenizerModel`:: (required) The path of a language-specific OpenNLP tokenization model file. See <> for more information. +`tokenizerModel`:: (required) The path of a language-specific OpenNLP tokenization model file. See <> for more information. *Example:* @@ -541,7 +541,7 @@ NOTE: Lucene currently does not index token types, so if you want to keep this i *Arguments:* -`posTaggerModel`:: (required) The path of a language-specific OpenNLP POS tagger model file. See <> for more information. +`posTaggerModel`:: (required) The path of a language-specific OpenNLP POS tagger model file. See <> for more information. *Examples:* @@ -636,7 +636,7 @@ NOTE: Lucene currently does not index token types, so if you want to keep this i *Arguments:* -`chunkerModel`:: (required) The path of a language-specific OpenNLP phrase chunker model file. See <> for more information. +`chunkerModel`:: (required) The path of a language-specific OpenNLP phrase chunker model file. See <> for more information. *Examples*: @@ -700,9 +700,9 @@ This filter replaces the text of each token with its lemma. 
Both a dictionary-ba Either `dictionary` or `lemmatizerModel` must be provided, and both may be provided - see the examples below: -`dictionary`:: (optional) The path of a lemmatization dictionary file. See <> for more information. The dictionary file must be encoded as UTF-8, with one entry per line, in the form `word[tab]lemma[tab]part-of-speech`, e.g., `wrote[tab]write[tab]VBD`. +`dictionary`:: (optional) The path of a lemmatization dictionary file. See <> for more information. The dictionary file must be encoded as UTF-8, with one entry per line, in the form `word[tab]lemma[tab]part-of-speech`, e.g., `wrote[tab]write[tab]VBD`. -`lemmatizerModel`:: (optional) The path of a language-specific OpenNLP lemmatizer model file. See <> for more information. +`lemmatizerModel`:: (optional) The path of a language-specific OpenNLP lemmatizer model file. See <> for more information. *Examples:* @@ -1033,7 +1033,7 @@ Solr can stem Catalan using the Snowball Porter Stemmer with an argument of `lan === Traditional Chinese -The default configuration of the <> is suitable for Traditional Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add. +The default configuration of the <> is suitable for Traditional Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add. <> can also be used to tokenize Traditional Chinese text. 
Following the Word Break rules from the Unicode Text Segmentation algorithm, it produces one token per Chinese character. When combined with <>, overlapping bigrams of Chinese characters are formed. @@ -1105,9 +1105,9 @@ See the example under <>. === Simplified Chinese -For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the <>. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add. +For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the <>. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add. -The default configuration of the <> is also suitable for Simplified Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add. +The default configuration of the <> is also suitable for Simplified Chinese text. It follows the Word Break rules from the Unicode Text Segmentation algorithm for non-Chinese text, and uses a dictionary to segment Chinese words. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add. 
Also useful for Chinese analysis: @@ -1162,7 +1162,7 @@ Also useful for Chinese analysis: === HMM Chinese Tokenizer -For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the `solr.HMMChineseTokenizerFactory` in the `analysis-extras` contrib module. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. +For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the `solr.HMMChineseTokenizerFactory` in the `analysis-extras` contrib module. This component includes a large dictionary and segments Chinese text into words with the Hidden Markov Model. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. *Factory class:* `solr.HMMChineseTokenizerFactory` @@ -1958,7 +1958,7 @@ Example: [[hebrew-lao-myanmar-khmer]] === Hebrew, Lao, Myanmar, Khmer -Lucene provides support, in addition to UAX#29 word break rules, for Hebrew's use of the double and single quote characters, and for segmenting Lao, Myanmar, and Khmer into syllables with the `solr.ICUTokenizerFactory` in the `analysis-extras` contrib module. To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <>). See `solr/contrib/analysis-extras/README.txt for` instructions on which jars you need to add. +Lucene provides support, in addition to UAX#29 word break rules, for Hebrew's use of the double and single quote characters, and for segmenting Lao, Myanmar, and Khmer into syllables with the `solr.ICUTokenizerFactory` in the `analysis-extras` contrib module. 
To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <>). See `solr/contrib/analysis-extras/README.txt for` instructions on which jars you need to add. See <> for more information. @@ -2165,7 +2165,7 @@ Solr includes support for normalizing Persian, and Lucene includes an example st === Polish -Solr provides support for Polish stemming with the `solr.StempelPolishStemFilterFactory`, and `solr.MorphologikFilterFactory` for lemmatization, in the `contrib/analysis-extras` module. The `solr.StempelPolishStemFilterFactory` component includes an algorithmic stemmer with tables for Polish. To use either of these filters, you must add additional .jars to Solr's classpath (as described in the section <>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. +Solr provides support for Polish stemming with the `solr.StempelPolishStemFilterFactory`, and `solr.MorphologikFilterFactory` for lemmatization, in the `contrib/analysis-extras` module. The `solr.StempelPolishStemFilterFactory` component includes an algorithmic stemmer with tables for Polish. To use either of these filters, you must add additional .jars to Solr's classpath (as described in the section <>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. *Factory class:* `solr.StempelPolishStemFilterFactory` and `solr.MorfologikFilterFactory` @@ -2682,7 +2682,7 @@ Solr includes support for stemming Turkish with the `solr.SnowballPorterFilterFa === Ukrainian -Solr provides support for Ukrainian lemmatization with the `solr.MorphologikFilterFactory`, in the `contrib/analysis-extras` module. To use this filter, you must add additional .jars to Solr's classpath (as described in the section <>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. 
+Solr provides support for Ukrainian lemmatization with the `solr.MorphologikFilterFactory`, in the `contrib/analysis-extras` module. To use this filter, you must add additional .jars to Solr's classpath (as described in the section <>). See `solr/contrib/analysis-extras/README.txt` for instructions on which jars you need to add. Lucene also includes an example Ukrainian stopword list, in the `lucene-analyzers-morfologik` jar. diff --git a/solr/solr-ref-guide/src/learning-to-rank.adoc b/solr/solr-ref-guide/src/learning-to-rank.adoc index 4f550cded7f..a0488a6c5db 100644 --- a/solr/solr-ref-guide/src/learning-to-rank.adoc +++ b/solr/solr-ref-guide/src/learning-to-rank.adoc @@ -533,7 +533,7 @@ Assuming that you consider to use a large model placed at `/path/to/models/myMod } ---- -First, add the directory to Solr's resource paths with a <` directive>>: +First, add the directory to Solr's resource paths with a <` directive>>: [source,xml] ---- diff --git a/solr/solr-ref-guide/src/libs.adoc b/solr/solr-ref-guide/src/libs.adoc new file mode 100644 index 00000000000..91f9bb81db9 --- /dev/null +++ b/solr/solr-ref-guide/src/libs.adoc @@ -0,0 +1,78 @@ += Lib Directories and Directives + +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. 
See the License for the +// specific language governing permissions and limitations +// under the License. + +Here we describe two simple and effective methods to make the `.jar` files for Solr plugins visible to Solr. + +Such files are sometimes called "libraries" or "libs" for short. +Essentially you can put them in some special places or explicitly tell Solr about them from your config. + +If there is overlap or inter-dependencies between libraries, then pay attention to the order. You can think of it like a stack that is searched top-down. At the top are the lib directives in reverse order, then Solr core's lib, then Solr home's lib, then Solr itself. + +== Lib Directories + +There are several special places you can place Solr plugin `.jar` files: + +* `/lib/`: The `.jar` files placed here are available to all Solr cores running on the node, and to node level plugins referenced in `solr.xml` -- so basically everything. +This directory is not present by default so create it. +See <>. + +* `/lib/`: In standalone Solr, you may want to add plugins just for a specific Solr core. +Create this adjacent to the `conf/` directory; it's not present by default. + +* `/server/solr-webapp/webapp/WEB-INF/lib/`: The `.jar` files for Solr itself and it's dependencies live here. +Certain plugins or add-ons to plugins require placement here. +They will document themselves to say so. + +Solr incorporates Jetty for providing HTTP server functionality. +Jetty has some directories that contain `.jar` files for itself and its own plugins / modules or JVM level plugins (e.g. loggers). +Solr plugins won't work in these locations. + +== Lib Directives in SolrConfig + +_Both_ plugin and <> file paths are configurable via `` directives in `solrconfig.xml`. +When a directive matches a directory, then resources can be resolved from it. +When a directive matches a `.jar` file, Solr plugins and their dependencies are resolved from it. 
+Resources can be placed in a `.jar` too but that's unusual. +It's erroneous to refer to any other type of file. + +A `` directive must have one (not both) of these two attributes: + +* `path`: used to refer to a single directory (for resources) or file (for a plugin `.jar`) + +* `dir`: used to refer to _all_ direct descendants of the specified directory. Optionally supply a `regex` attribute to filter these to those matching the regular expression. + +All directories are resolved as relative to the Solr core's `instanceDir`. + +These examples show how to load contrib modules into Solr: + +[source,xml] +---- + + + + + + + + + + + + +---- diff --git a/solr/solr-ref-guide/src/resource-and-plugin-loading.adoc b/solr/solr-ref-guide/src/resource-and-plugin-loading.adoc deleted file mode 100644 index 6efd1353290..00000000000 --- a/solr/solr-ref-guide/src/resource-and-plugin-loading.adoc +++ /dev/null @@ -1,86 +0,0 @@ -= Resource and Plugin Loading -// Licensed to the Apache Software Foundation (ASF) under one -// or more contributor license agreements. See the NOTICE file -// distributed with this work for additional information -// regarding copyright ownership. The ASF licenses this file -// to you under the Apache License, Version 2.0 (the -// "License"); you may not use this file except in compliance -// with the License. You may obtain a copy of the License at -// -// http://www.apache.org/licenses/LICENSE-2.0 -// -// Unless required by applicable law or agreed to in writing, -// software distributed under the License is distributed on an -// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -// KIND, either express or implied. See the License for the -// specific language governing permissions and limitations -// under the License. - -Solr components can be configured using *resources*: data stored in external files that may be referred to in a location-independent fashion. 
Examples include: files needed by schema components, e.g., a stopword list for <>; and machine-learned models for <>. - -Solr *plugins*, which can be configured in `solrconfig.xml`, are Java classes that are normally packaged in `.jar` files along with supporting classes and data. Solr ships with a number of built-in plugins, and can also be configured to use custom plugins. Example plugins are the <> and custom search components. - -Resources and plugins may be stored: - -* in ZooKeeper under a collection's configset node (SolrCloud only); -* on a filesystem accessible to Solr nodes; or -* in Solr's <> (SolrCloud only). - -NOTE: Schema components may not be stored as plugins in the Blob Store, and cannot access resources stored in the Blob Store. - -== Resource and Plugin Loading Sequence - -Under SolrCloud, resources and plugins to be loaded are first looked up in ZooKeeper under the collection's configset znode. If the resource or plugin is not found there, Solr will fall back to loading <>. - -Note that by default, Solr will not attempt to load resources and plugins from the Blob Store. To enable this, see the section <>. When loading from the Blob Store is enabled for a component, lookups occur only in the Blob Store, and never in ZooKeeper or on the filesystem. - -== Resources and Plugins in ConfigSets on ZooKeeper - -Resources and plugins may be uploaded to ZooKeeper as part of a configset, either via the <> or <>. - -To upload a plugin or resource to a configset already stored on ZooKeeper, you can use <>. - -CAUTION: By default, ZooKeeper's file size limit is 1MB. If your files are larger than this, you'll need to either <> or store them instead <>. - -== Resources and Plugins on the Filesystem - -Under standalone Solr, when looking up a plugin or resource to be loaded, Solr's resource loader will first look under the `/conf/` directory. 
If the plugin or resource is not found, the configured plugin and resource file paths are searched - see the section <> below. - -On core load, Solr's resource loader constructs a list of paths (subdirectories and jars), first under <>, and then under directories pointed to by <` directives in SolrConfig>>. - -When looking up a resource or plugin to be loaded, the paths on the list are searched in the order they were added. - -NOTE: Under SolrCloud, each node hosting a collection replica will need its own copy of plugins and resources to be loaded. - -To get Solr's resource loader to find resources either under subdirectories or in jar files that were created after Solr's resource path list was constructed, reload the collection (SolrCloud) or the core (standalone Solr). Restarting all affected Solr nodes also works. - -WARNING: Resource files *will not be loaded* if they are located directly under either `solr_home/lib` or a directory given by the `dir` attribute on a `` directive in SolrConfig. Resources are only searched for under subdirectories or in jar files found in those locations. - -=== solr_home/lib - -Each Solr node can have a directory named `lib/` under the <>. In order to use this directory to host resources or plugins, it must first be manually created. - -=== Lib Directives in SolrConfig - -Plugin and resource file paths are configurable via `` directives in `solrconfig.xml`. - -Loading occurs in the order `` directives appear in `solrconfig.xml`. If there are dependencies, list the lowest level dependency jar first. - -A regular expression supplied in the `` element's `regex` attribute value can be used to restrict which subdirectories and/or jar files are added to the Solr resource loader's list of search locations. If no regular expression is given, all direct subdirectory and jar children are included in the resource path list. All directories are resolved as relative to the Solr core's `instanceDir`. 
- -From an example SolrConfig: - -[source,xml] ----- - - - - - - - - - - - ----- diff --git a/solr/solr-ref-guide/src/resource-loading.adoc b/solr/solr-ref-guide/src/resource-loading.adoc new file mode 100644 index 00000000000..944fb37a954 --- /dev/null +++ b/solr/solr-ref-guide/src/resource-loading.adoc @@ -0,0 +1,44 @@ += Resource Loading + +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +Solr components can be configured using *resources*: data stored in external files that may be referred to in a location-independent fashion. + +Examples of resources include: files needed by schema components, e.g., a stopword list for <>; and machine-learned models for <>. +_Resources are typically resolved from the configSet_ but there are other options too. + +Solr's resources are generally only loaded initially when the Solr collection or Solr core is loaded. +After you update a resource, you'll typically need to _reload_ the affected collections (SolrCloud) or the cores (standalone Solr). +Restarting all affected Solr nodes also works. +<> can be manipulated via APIs and do not need an explicit reload. 
+ +== Resources in ConfigSets + +<> are the directories containing `solrconfig.xml`, the schema, and resources referenced by them. +In SolrCloud they are in ZooKeeper whereas in standalone they are on the file system. +In either mode, configSets might be shared or might be dedicated to a single collection or core. +Prefer to put resources here. + +== Resources in Other Places + +Resources can also be placed in an arbitrary directory and <> from a `` directive in `solrconfig.xml`, provided the directive refers to a directory and not the actual resource file. Example: `` +This choice may make sense if the resource is too large for a configSet in ZooKeeper. +However, it's up to you to somehow ensure that all nodes in your cluster have access to these resources. + +Finally, and this is very unusual, resources can also be packaged inside `.jar` files from which they will be referenced. +That might make sense for default resources that a plugin user can override by placing a same-named file in a configSet. \ No newline at end of file diff --git a/solr/solr-ref-guide/src/solr-plugins.adoc b/solr/solr-ref-guide/src/solr-plugins.adoc index 5e96b3ceb1d..b3693cfe75b 100644 --- a/solr/solr-ref-guide/src/solr-plugins.adoc +++ b/solr/solr-ref-guide/src/solr-plugins.adoc @@ -1,5 +1,8 @@ = Solr Plugins -:page-children: adding-custom-plugins-in-solrcloud-mode +:page-children: libs, \ + package-manager, \ + adding-custom-plugins-in-solrcloud-mode + // Licensed to the Apache Software Foundation (ASF) under one // or more contributor license agreements. See the NOTICE file // distributed with this work for additional information @@ -17,8 +20,37 @@ // specific language governing permissions and limitations // under the License. -Solr allows you to load custom code to perform a variety of tasks within Solr, from custom Request Handlers to process your searches, to custom Analyzers and Token Filters for your text field. You can even load custom Field Types.
These pieces of custom code are called _plugins_. +One of Solr's strengths is providing a rich platform of functionality with the option of adding your own custom components running within Solr. -Not everyone will need to create plugins for their Solr instances - what's provided is usually enough for most applications. However, if there's something that you need, you may want to review the Solr Wiki documentation on plugins at https://cwiki.apache.org/confluence/display/solr/SolrPlugins[SolrPlugins]. +Solr calls such components *plugins* when the implementation is configurable. +You have likely seen many already throughout Solr's configuration via the "class" reference. +Common examples are Request Handlers, Search Components, and Query Parsers to process your searches, and Token Filters for processing text. -If you have a plugin you would like to use, and you are running in SolrCloud mode, you can use the Blob Store API and the Config API to load the jars to Solr. The commands to use are described in the section <>. +Most apps don't need to create plugins because Solr offers a rich set of them built-in. +However, if you do, start by looking at the code for existing similar plugins. +Writing your own is an advanced subject that is beyond the scope of this reference guide. +One resource is the Solr Wiki documentation on plugins at https://cwiki.apache.org/confluence/display/solr/SolrPlugins[SolrPlugins], which is rather out-of-date but has some utility. + +== Installing Plugins == + +Most plugins are built-in to Solr and there is nothing to install. +The subject here is how to make other plugins available to Solr, including those in contrib modules. +A plugin is packaged as a Java jar file and may need other jar files (its dependencies) to function. + +The next sections describe some options: + +* <>: +Describes where to put the plugin's JAR files on the file system: either in one of the special places or a place convenient to you along with a `` directive in `solrconfig.xml`.
+This has been the standard approach since Solr's inception. +It's simple and reliable but it's entirely on you to ensure that all nodes in a cluster have them. +Contrib modules ship with Solr, so they take no extra effort, but other plugins (yours or 3rd party) do. + +* <>: +Describes a new and experimental system to manage packages of plugins in SolrCloud. +It includes CLI commands, cluster-wide installation, use of plugin registries that host plugins, cryptographically signed plugins for security, and more. +Only some plugins support this. + +* <>: +Describes a deprecated system that predates the above package management system. +Its functionality is a subset of the package management system. +It will no longer be supported in Solr 9. diff --git a/solr/solr-ref-guide/src/the-well-configured-solr-instance.adoc b/solr/solr-ref-guide/src/the-well-configured-solr-instance.adoc index 85531e0f3b8..546f2c6944e 100644 --- a/solr/solr-ref-guide/src/the-well-configured-solr-instance.adoc +++ b/solr/solr-ref-guide/src/the-well-configured-solr-instance.adoc @@ -1,5 +1,12 @@ = The Well-Configured Solr Instance -:page-children: configuring-solrconfig-xml, solr-cores-and-solr-xml, configuration-apis, implicit-requesthandlers, solr-plugins, jvm-settings, v2-api, package-manager +:page-children: configuring-solrconfig-xml, \ + solr-cores-and-solr-xml, \ + resource-loading, \ + configuration-apis, \ + implicit-requesthandlers, \ + jvm-settings, \ + v2-api + // Licensed to the Apache Software Foundation (ASF) under one // or more contributor license agreements. See the NOTICE file // distributed with this work for additional information @@ -25,14 +32,12 @@ This section covers the following topics: <>: Describes how to work with `solr.xml` and `core.properties` to configure your Solr core, or multiple Solr cores within a single instance. +<>: Describes how word lists, model files, and other related data are resolved by the components that need them.
+ <>: Describes several APIs used to configure Solr: Blob Store, Config, Request Parameters and Managed Resources. <>: Describes various end-points automatically provided by Solr and how to configure them. -<>: Introduces Solr plugins with pointers to more information. - -<>: Installing, deploying and updating packages (containing plugins) into a Solr cluster - <>: Gives some guidance on best practices for working with Java Virtual Machines. <>: Describes how to use the new V2 APIs, a redesigned API framework covering most Solr APIs. diff --git a/solr/solr-ref-guide/src/tokenizers.adoc b/solr/solr-ref-guide/src/tokenizers.adoc index c883342debe..843c0fe8623 100644 --- a/solr/solr-ref-guide/src/tokenizers.adoc +++ b/solr/solr-ref-guide/src/tokenizers.adoc @@ -516,7 +516,7 @@ The default configuration for `solr.ICUTokenizerFactory` provides UAX#29 word br [IMPORTANT] ==== -To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add. +To use this tokenizer, you must add additional .jars to Solr's classpath (as described in the section <>). See the `solr/contrib/analysis-extras/README.txt` for information on which jars you need to add. 
==== diff --git a/solr/solr-ref-guide/src/update-request-processors.adoc b/solr/solr-ref-guide/src/update-request-processors.adoc index 27b999bc7eb..e459ddf4a82 100644 --- a/solr/solr-ref-guide/src/update-request-processors.adoc +++ b/solr/solr-ref-guide/src/update-request-processors.adoc @@ -353,7 +353,7 @@ The {solr-javadocs}/solr-langid/index.html[`langid`] contrib provides:: The {solr-javadocs}/solr-analysis-extras/index.html[`analysis-extras`] contrib provides:: -{solr-javadocs}/solr-analysis-extras/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.html[OpenNLPExtractNamedEntitiesUpdateProcessorFactory]::: Update document(s) to be indexed with named entities extracted using an OpenNLP NER model. Note that in order to use model files larger than 1MB on SolrCloud, you must either <> or <> on each node hosting a collection replica. +{solr-javadocs}/solr-analysis-extras/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.html[OpenNLPExtractNamedEntitiesUpdateProcessorFactory]::: Update document(s) to be indexed with named entities extracted using an OpenNLP NER model. Note that in order to use model files larger than 1MB on SolrCloud, you must either <> or <> on each node hosting a collection replica. === Update Processor Factories You Should _Not_ Modify or Remove