mirror of https://github.com/apache/lucene.git
Ref Guide: Lots of grammar fixes: remove random spaces before commas, periods, colons; add missing periods at
ends of sentences; add monospace for conf filenames and paths; correct "i.e." and "e.g." usages
parent 32ed8520c7
commit 188c26c8b7
|
@ -39,7 +39,7 @@ For a complete list of the available TokenFilters, see the section <<tokenizers.
|
||||||
|
|
||||||
== When To use a CharFilter vs. a TokenFilter
|
== When To use a CharFilter vs. a TokenFilter
|
||||||
|
|
||||||
There are several pairs of CharFilters and TokenFilters that have related (ie: `MappingCharFilter` and `ASCIIFoldingFilter`) or nearly identical (ie: `PatternReplaceCharFilterFactory` and `PatternReplaceFilterFactory`) functionality and it may not always be obvious which is the best choice.
|
There are several pairs of CharFilters and TokenFilters that have related (e.g., `MappingCharFilter` and `ASCIIFoldingFilter`) or nearly identical (e.g., `PatternReplaceCharFilterFactory` and `PatternReplaceFilterFactory`) functionality, and it may not always be obvious which is the best choice.
|
||||||
|
|
||||||
The decision about which to use depends largely on which Tokenizer you are using, and whether you need to preprocess the stream of characters.
|
The decision about which to use depends largely on which Tokenizer you are using, and whether you need to preprocess the stream of characters.
|
||||||
|
|
||||||
|
|
|
@ -90,7 +90,7 @@ At query time, the only normalization that happens is to convert the query terms
|
||||||
|
|
||||||
=== Analysis for Multi-Term Expansion
|
=== Analysis for Multi-Term Expansion
|
||||||
|
|
||||||
In some types of queries (ie: Prefix, Wildcard, Regex, etc...) the input provided by the user is not natural language intended for Analysis. Things like Synonyms or Stop word filtering do not work in a logical way in these types of Queries.
|
In some types of queries (e.g., Prefix, Wildcard, Regex) the input provided by the user is not natural language intended for Analysis. Things like Synonyms or Stop word filtering do not work in a logical way in these types of Queries.
|
||||||
|
|
||||||
The analysis factories that _can_ work in these types of queries (such as Lowercasing, or Normalizing Factories) are known as {lucene-javadocs}/analyzers-common/org/apache/lucene/analysis/util/MultiTermAwareComponent.html[`MultiTermAwareComponents`]. When Solr needs to perform analysis for a query that results in Multi-Term expansion, only the `MultiTermAwareComponents` used in the `query` analyzer are used; any Factory that is not Multi-Term aware will be skipped.
|
The analysis factories that _can_ work in these types of queries (such as Lowercasing, or Normalizing Factories) are known as {lucene-javadocs}/analyzers-common/org/apache/lucene/analysis/util/MultiTermAwareComponent.html[`MultiTermAwareComponents`]. When Solr needs to perform analysis for a query that results in Multi-Term expansion, only the `MultiTermAwareComponents` used in the `query` analyzer are used; any Factory that is not Multi-Term aware will be skipped.
|
||||||
|
|
||||||
|
|
|
@ -109,7 +109,7 @@ Replica placement rules. See the section <<rule-based-replica-placement.adoc#rul
|
||||||
`snitch`::
|
`snitch`::
|
||||||
Details of the snitch provider. See the section <<rule-based-replica-placement.adoc#rule-based-replica-placement,Rule-based Replica Placement>> for details.
|
Details of the snitch provider. See the section <<rule-based-replica-placement.adoc#rule-based-replica-placement,Rule-based Replica Placement>> for details.
|
||||||
|
|
||||||
`policy`:: Name of the collection-level policy. See <<solrcloud-autoscaling-policy-preferences.adoc#collection-specific-policy, Defining Collection-Specific Policies >> for details
|
`policy`:: Name of the collection-level policy. See <<solrcloud-autoscaling-policy-preferences.adoc#collection-specific-policy,Defining Collection-Specific Policies>> for details.
|
||||||
|
|
||||||
=== CREATE Response
|
=== CREATE Response
|
||||||
|
|
||||||
|
|
|
@ -85,9 +85,9 @@ For `clusterprop`: the cluster property value. If not specified, *null* will be
|
||||||
|
|
||||||
[TIP]
|
[TIP]
|
||||||
====
|
====
|
||||||
The short form parameter options may be specified with a single dash (eg: `-c mycollection`).
|
The short form parameter options may be specified with a single dash (e.g., `-c mycollection`).
|
||||||
|
|
||||||
The long form parameter options may be specified using either a single dash (eg: `-collection mycollection`) or a double dash (eg: `--collection mycollection`)
|
The long form parameter options may be specified using either a single dash (e.g., `-collection mycollection`) or a double dash (e.g., `--collection mycollection`).
|
||||||
====
|
====
|
||||||
|
|
||||||
== ZooKeeper CLI Examples
|
== ZooKeeper CLI Examples
|
||||||
|
|
|
@ -28,7 +28,7 @@ The defType parameter selects the query parser that Solr should use to process t
|
||||||
|
|
||||||
`defType=dismax`
|
`defType=dismax`
|
||||||
|
|
||||||
If no defType param is specified, then by default, the <<the-standard-query-parser.adoc#the-standard-query-parser,The Standard Query Parser>> is used. (eg: `defType=lucene`)
|
If no `defType` parameter is specified, then by default the <<the-standard-query-parser.adoc#the-standard-query-parser,Standard Query Parser>> is used (i.e., `defType=lucene`).
|
||||||
|
|
||||||
== sort Parameter
|
== sort Parameter
|
||||||
|
|
||||||
|
|
|
@ -28,7 +28,7 @@ When using this API, `solrconfig.xml` is not changed. Instead, all edited config
|
||||||
|
|
||||||
* `/config`: retrieve or modify the config. GET to retrieve and POST for executing commands
|
* `/config`: retrieve or modify the config. GET to retrieve and POST for executing commands
|
||||||
* `/config/overlay`: retrieve the details in the `configoverlay.json` alone
|
* `/config/overlay`: retrieve the details in the `configoverlay.json` alone
|
||||||
* `/config/params` : allows creating parameter sets that can override or take the place of parameters defined in `solrconfig.xml`. See the <<request-parameters-api.adoc#request-parameters-api,Request Parameters API>> section for more details.
|
* `/config/params`: allows creating parameter sets that can override or take the place of parameters defined in `solrconfig.xml`. See the <<request-parameters-api.adoc#request-parameters-api,Request Parameters API>> section for more details.
|
||||||
|
|
||||||
== Retrieving the Config
|
== Retrieving the Config
|
||||||
|
|
||||||
|
@ -427,7 +427,7 @@ And you should see the following as output:
|
||||||
"httpMethod":"GET"}}
|
"httpMethod":"GET"}}
|
||||||
----
|
----
|
||||||
|
|
||||||
To update a request handler, you should use the `update-requesthandler` command :
|
To update a request handler, you should use the `update-requesthandler` command:
|
||||||
|
|
||||||
[source,bash]
|
[source,bash]
|
||||||
----
|
----
|
||||||
|
@ -507,7 +507,7 @@ curl http://localhost:8983/solr/techproducts/config -H'Content-type:application/
|
||||||
|
|
||||||
Directly editing any files without 'touching' the directory *will not* make it visible to all nodes.
|
Directly editing any files without 'touching' the directory *will not* make it visible to all nodes.
|
||||||
|
|
||||||
It is possible for components to watch for the configset 'touch' events by registering a listener using `SolrCore#registerConfListener()` .
|
It is possible for components to watch for the configset 'touch' events by registering a listener using `SolrCore#registerConfListener()`.
|
||||||
|
|
||||||
=== Listening to config Changes
|
=== Listening to config Changes
|
||||||
|
|
||||||
|
|
|
@ -172,7 +172,7 @@ The output will include the status of the request. If the status is anything oth
|
||||||
|
|
||||||
=== Upload ConfigSet Examples
|
=== Upload ConfigSet Examples
|
||||||
|
|
||||||
Create a config set named 'myConfigSet' from the zipped file myconfigset.zip. The zip file must be created from within the conf directory (i.e. the solrconfig.xml must be the top level entry in the zip file). Here is an example on how to create the zip file and upload it.
|
Create a config set named 'myConfigSet' from the zipped file `myconfigset.zip`. The zip file must be created from within the `conf` directory (i.e., `solrconfig.xml` must be the top level entry in the zip file). Here is an example of how to create the zip file and upload it:
|
||||||
|
|
||||||
[source,text]
|
[source,text]
|
||||||
----
|
----
|
||||||
|
@ -181,7 +181,7 @@ $ (cd solr/server/solr/configsets/sample_techproducts_configs/conf && zip -r - *
|
||||||
$ curl -X POST --header "Content-Type:application/octet-stream" --data-binary @myconfigset.zip "http://localhost:8983/solr/admin/configs?action=UPLOAD&name=myConfigSet"
|
$ curl -X POST --header "Content-Type:application/octet-stream" --data-binary @myconfigset.zip "http://localhost:8983/solr/admin/configs?action=UPLOAD&name=myConfigSet"
|
||||||
----
|
----
|
||||||
|
|
||||||
The same can be achieved using a unix pipe, without creating an intermediate zip file, as follows:
|
The same can be achieved using a Unix pipe, without creating an intermediate zip file, as follows:
|
||||||
|
|
||||||
[source,text]
|
[source,text]
|
||||||
----
|
----
|
||||||
|
|
|
@ -111,7 +111,7 @@ For high-volume search applications, logging every query can generate a large am
|
||||||
|
|
||||||
On the other hand, if you're only concerned about warnings and error messages related to requests, then you can set the log verbosity to WARN. However, this poses a potential problem in that you won't know if any queries are slow, as slow queries are still logged at the INFO level.
|
On the other hand, if you're only concerned about warnings and error messages related to requests, then you can set the log verbosity to WARN. However, this poses a potential problem in that you won't know if any queries are slow, as slow queries are still logged at the INFO level.
|
||||||
|
|
||||||
Solr provides a way to set your log verbosity threshold to WARN and be able to set a latency threshold above which a request is considered "slow" and log that request at the WARN level to help you identify slow queries in your application. To enable this behavior, configure the `<slowQueryThresholdMillis>` element in the *query* section of solrconfig.xml:
|
Solr provides a way to set your log verbosity threshold to WARN and be able to set a latency threshold above which a request is considered "slow" and log that request at the WARN level to help you identify slow queries in your application. To enable this behavior, configure the `<slowQueryThresholdMillis>` element in the *query* section of `solrconfig.xml`:
|
||||||
|
|
||||||
[source,xml]
|
[source,xml]
|
||||||
----
|
----
|
||||||
|
|
|
@ -69,7 +69,7 @@ There is an additional configuration option available, which is to modify the `d
|
||||||
Please note that the `docValuesFormat` option may change in future releases.
|
Please note that the `docValuesFormat` option may change in future releases.
|
||||||
|
|
||||||
[NOTE]
|
[NOTE]
|
||||||
Lucene index back-compatibility is only supported for the default codec. If you choose to customize the `docValuesFormat` in your schema.xml, upgrading to a future version of Solr may require you to either switch back to the default codec and optimize your index to rewrite it into the default codec before upgrading, or re-build your entire index from scratch after upgrading.
|
Lucene index back-compatibility is only supported for the default codec. If you choose to customize the `docValuesFormat` in your `schema.xml`, upgrading to a future version of Solr may require you to either switch back to the default codec and optimize your index to rewrite it into the default codec before upgrading, or re-build your entire index from scratch after upgrading.
|
||||||
|
|
||||||
== Using DocValues
|
== Using DocValues
|
||||||
|
|
||||||
|
|
|
@ -154,7 +154,7 @@ server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd clusterprop -n
|
||||||
server\scripts\cloud-scripts\zkcli.bat -zkhost localhost:2181 -cmd clusterprop -name urlScheme -val https
|
server\scripts\cloud-scripts\zkcli.bat -zkhost localhost:2181 -cmd clusterprop -name urlScheme -val https
|
||||||
----
|
----
|
||||||
|
|
||||||
If you have set up your ZooKeeper cluster to use a <<taking-solr-to-production.adoc#zookeeper-chroot,chroot for Solr>> , make sure you use the correct `zkhost` string with `zkcli`, e.g. `-zkhost localhost:2181/solr`.
|
If you have set up your ZooKeeper cluster to use a <<taking-solr-to-production.adoc#zookeeper-chroot,chroot for Solr>>, make sure you use the correct `zkhost` string with `zkcli`, e.g., `-zkhost localhost:2181/solr`.
|
||||||
|
|
||||||
=== Run SolrCloud with SSL
|
=== Run SolrCloud with SSL
|
||||||
|
|
||||||
|
@ -230,7 +230,7 @@ bin\solr.cmd -cloud -s cloud\node2 -z localhost:2181 -p 7574
|
||||||
|
|
||||||
[IMPORTANT]
|
[IMPORTANT]
|
||||||
====
|
====
|
||||||
curl on OS X Mavericks (10.9) has degraded SSL support. For more information and workarounds to allow one-way SSL, see http://curl.haxx.se/mail/archive-2013-10/0036.html. curl on OS X Yosemite (10.10) is improved - 2-way SSL is possible - see http://curl.haxx.se/mail/archive-2014-10/0053.html .
|
curl on OS X Mavericks (10.9) has degraded SSL support. For more information and workarounds to allow one-way SSL, see http://curl.haxx.se/mail/archive-2013-10/0036.html. curl on OS X Yosemite (10.10) is improved - 2-way SSL is possible - see http://curl.haxx.se/mail/archive-2014-10/0053.html.
|
||||||
|
|
||||||
The curl commands in the following sections will not work with the system `curl` on OS X Yosemite (10.10). Instead, the certificate supplied with the `-E` param must be in PKCS12 format, and the file supplied with the `--cacert` param must contain only the CA certificate, and no key (see <<Convert the Certificate and Key to PEM Format for Use with curl,above>> for instructions on creating this file):
|
The curl commands in the following sections will not work with the system `curl` on OS X Yosemite (10.10). Instead, the certificate supplied with the `-E` param must be in PKCS12 format, and the file supplied with the `--cacert` param must contain only the CA certificate, and no key (see <<Convert the Certificate and Key to PEM Format for Use with curl,above>> for instructions on creating this file):
|
||||||
|
|
||||||
|
|
|
@ -610,7 +610,7 @@ The parameter setting above causes the field facet results for the "doctype" fie
|
||||||
|
|
||||||
=== Limiting Facet with Certain Terms
|
=== Limiting Facet with Certain Terms
|
||||||
|
|
||||||
To limit field facet with certain terms specify them comma separated with `terms` local parameter. Commas and quotes in terms can be escaped with backslash, as in `\,`. In this case facet is calculated on a way similar to `facet.method=enum` , but ignores `facet.enum.cache.minDf`. For example:
|
To limit a field facet to certain terms, specify them as a comma-separated list with the `terms` local parameter. Commas and quotes in terms can be escaped with a backslash, as in `\,`. In this case the facet is calculated in a way similar to `facet.method=enum`, but ignores `facet.enum.cache.minDf`. For example:
|
||||||
|
|
||||||
`facet.field={!terms='alfa,betta,with\,with\',with space'}symbol`
|
`facet.field={!terms='alfa,betta,with\,with\',with space'}symbol`
|
||||||
|
|
||||||
|
|
|
@ -96,15 +96,15 @@ Use `false` for field types with query analyzers including filters that can matc
|
||||||
|
|
||||||
[[docvaluesformat]]
|
[[docvaluesformat]]
|
||||||
`docValuesFormat`::
|
`docValuesFormat`::
|
||||||
Defines a custom `DocValuesFormat` to use for fields of this type. This requires that a schema-aware codec, such as the `SchemaCodecFactory` has been configured in solrconfig.xml.
|
Defines a custom `DocValuesFormat` to use for fields of this type. This requires that a schema-aware codec, such as the `SchemaCodecFactory`, has been configured in `solrconfig.xml`.
|
||||||
|
|
||||||
`postingsFormat`::
|
`postingsFormat`::
|
||||||
Defines a custom `PostingsFormat` to use for fields of this type. This requires that a schema-aware codec, such as the `SchemaCodecFactory` has been configured in solrconfig.xml.
|
Defines a custom `PostingsFormat` to use for fields of this type. This requires that a schema-aware codec, such as the `SchemaCodecFactory`, has been configured in `solrconfig.xml`.
|
||||||
|
|
||||||
|
|
||||||
[NOTE]
|
[NOTE]
|
||||||
====
|
====
|
||||||
Lucene index back-compatibility is only supported for the default codec. If you choose to customize the `postingsFormat` or `docValuesFormat` in your schema.xml, upgrading to a future version of Solr may require you to either switch back to the default codec and optimize your index to rewrite it into the default codec before upgrading, or re-build your entire index from scratch after upgrading.
|
Lucene index back-compatibility is only supported for the default codec. If you choose to customize the `postingsFormat` or `docValuesFormat` in your `schema.xml`, upgrading to a future version of Solr may require you to either switch back to the default codec and optimize your index to rewrite it into the default codec before upgrading, or re-build your entire index from scratch after upgrading.
|
||||||
====
|
====
|
||||||
|
|
||||||
=== Field Default Properties
|
=== Field Default Properties
|
||||||
|
|
|
@ -189,7 +189,7 @@ Implements the Daitch-Mokotoff Soundex algorithm, which allows identification of
|
||||||
|
|
||||||
*Arguments:*
|
*Arguments:*
|
||||||
|
|
||||||
`inject` :: (true/false) If true (the default), then new phonetic tokens are added to the stream. Otherwise, tokens are replaced with the phonetic equivalent. Setting this to false will enable phonetic matching, but the exact spelling of the target word may not match.
|
`inject`:: (true/false) If true (the default), then new phonetic tokens are added to the stream. Otherwise, tokens are replaced with the phonetic equivalent. Setting this to false will enable phonetic matching, but the exact spelling of the target word may not match.
|
||||||
|
|
||||||
*Example:*
|
*Example:*
|
||||||
|
|
||||||
|
|
|
@ -31,7 +31,7 @@ Functions must be expressed as function calls (for example, `sum(a,b)` instead o
|
||||||
|
|
||||||
There are several ways of using function queries in a Solr query:
|
There are several ways of using function queries in a Solr query:
|
||||||
|
|
||||||
* Via an explicit QParser that expects function arguments, such <<other-parsers.adoc#function-query-parser,`func`>> or <<other-parsers.adoc#function-range-query-parser,`frange`>> . For example:
|
* Via an explicit QParser that expects function arguments, such as <<other-parsers.adoc#function-query-parser,`func`>> or <<other-parsers.adoc#function-range-query-parser,`frange`>>. For example:
|
||||||
+
|
+
|
||||||
[source,text]
|
[source,text]
|
||||||
----
|
----
|
||||||
|
@ -258,7 +258,7 @@ Arguments may be the name of a `DatePointField`, `TrieDateField`, or date math b
|
||||||
|
|
||||||
* `ms()`: Equivalent to `ms(NOW)`, number of milliseconds since the epoch.
|
* `ms()`: Equivalent to `ms(NOW)`, number of milliseconds since the epoch.
|
||||||
* `ms(a)`: Returns the number of milliseconds since the epoch that the argument represents.
|
* `ms(a)`: Returns the number of milliseconds since the epoch that the argument represents.
|
||||||
* `ms(a,b)` : Returns the number of milliseconds that b occurs before a (that is, a - b)
|
* `ms(a,b)`: Returns the number of milliseconds that b occurs before a (that is, a - b).
|
||||||
|
|
||||||
*Syntax Examples*
|
*Syntax Examples*
|
||||||
|
|
||||||
|
|
|
@ -33,7 +33,7 @@ Solr includes a Java implementation of index replication that works over HTTP:
|
||||||
* The configuration affecting replication is controlled by a single file, `solrconfig.xml`
|
* The configuration affecting replication is controlled by a single file, `solrconfig.xml`
|
||||||
* Supports the replication of configuration files as well as index files
|
* Supports the replication of configuration files as well as index files
|
||||||
* Works across platforms with same configuration
|
* Works across platforms with same configuration
|
||||||
* No reliance on OS-dependent file system features (eg: hard links)
|
* No reliance on OS-dependent file system features (e.g., hard links)
|
||||||
* Tightly integrated with Solr; an admin page offers fine-grained control of each aspect of replication
|
* Tightly integrated with Solr; an admin page offers fine-grained control of each aspect of replication
|
||||||
* The Java-based replication feature is implemented as a request handler. Configuring replication is therefore similar to any normal request handler.
|
* The Java-based replication feature is implemented as a request handler. Configuring replication is therefore similar to any normal request handler.
|
||||||
|
|
||||||
|
|
|
@ -139,7 +139,7 @@ On systems where the number of open files allowed per process is limited, CFS ma
|
||||||
.CFS: New Segments vs Merged Segments
|
.CFS: New Segments vs Merged Segments
|
||||||
[NOTE]
|
[NOTE]
|
||||||
====
|
====
|
||||||
To configure whether _newly written segments_ should use CFS, see the <<useCompoundFile,`useCompoundFile`>> setting described above. To configure whether _merged segments_ use CFS, review the Javadocs for your <<mergePolicyFactory,`mergePolicyFactory`>> .
|
To configure whether _newly written segments_ should use CFS, see the <<useCompoundFile,`useCompoundFile`>> setting described above. To configure whether _merged segments_ use CFS, review the Javadocs for your <<mergePolicyFactory,`mergePolicyFactory`>>.
|
||||||
|
|
||||||
Many <<Merging Index Segments,Merge Policy>> implementations support `noCFSRatio` and `maxCFSSegmentSizeMB` settings with default values that prevent compound files from being used for large segments, but do use compound files for small segments.
|
Many <<Merging Index Segments,Merge Policy>> implementations support `noCFSRatio` and `maxCFSSegmentSizeMB` settings with default values that prevent compound files from being used for large segments, but do use compound files for small segments.
|
||||||
|
|
||||||
|
|
|
@ -27,7 +27,7 @@ This section describes how Solr adds data to its index. It covers the following
|
||||||
|
|
||||||
* *<<uploading-data-with-index-handlers.adoc#uploading-data-with-index-handlers,Uploading Data with Index Handlers>>*: Information about using Solr's Index Handlers to upload XML/XSLT, JSON and CSV data.
|
* *<<uploading-data-with-index-handlers.adoc#uploading-data-with-index-handlers,Uploading Data with Index Handlers>>*: Information about using Solr's Index Handlers to upload XML/XSLT, JSON and CSV data.
|
||||||
|
|
||||||
* *<<transforming-and-indexing-custom-json.adoc#transforming-and-indexing-custom-json,Transforming and Indexing Custom JSON>>* : Index any JSON of your choice
|
* *<<transforming-and-indexing-custom-json.adoc#transforming-and-indexing-custom-json,Transforming and Indexing Custom JSON>>*: Index any JSON of your choice
|
||||||
|
|
||||||
* *<<uploading-data-with-solr-cell-using-apache-tika.adoc#uploading-data-with-solr-cell-using-apache-tika,Uploading Data with Solr Cell using Apache Tika>>*: Information about using the Solr Cell framework to upload data for indexing.
|
* *<<uploading-data-with-solr-cell-using-apache-tika.adoc#uploading-data-with-solr-cell-using-apache-tika,Uploading Data with Solr Cell using Apache Tika>>*: Information about using the Solr Cell framework to upload data for indexing.
|
||||||
|
|
||||||
|
|
|
@ -42,7 +42,7 @@ For example, here is one of the `<initParams>` sections defined by default in th
|
||||||
|
|
||||||
This sets the default search field ("df") to be "_text_" for all of the request handlers named in the path section. If we later want to change the `/query` request handler to search a different field by default, we could override the `<initParams>` by defining the parameter in the `<requestHandler>` section for `/query`.
|
This sets the default search field ("df") to be "_text_" for all of the request handlers named in the path section. If we later want to change the `/query` request handler to search a different field by default, we could override the `<initParams>` by defining the parameter in the `<requestHandler>` section for `/query`.
|
||||||
|
|
||||||
The syntax and semantics are similar to that of a `<requestHandler>` . The following are the attributes
|
The syntax and semantics are similar to those of a `<requestHandler>`. The following are the attributes:
|
||||||
|
|
||||||
`path`::
|
`path`::
|
||||||
A comma-separated list of paths which will use the parameters. Wildcards can be used in paths to define nested paths, as described below.
|
A comma-separated list of paths which will use the parameters. Wildcards can be used in paths to define nested paths, as described below.
|
||||||
|
|
|
@ -126,7 +126,7 @@ curl -XGET http://localhost:8983/solr/books/query -d '
|
||||||
"query": {
|
"query": {
|
||||||
"bool": {
|
"bool": {
|
||||||
"must": [
|
"must": [
|
||||||
"title:solr" ,
|
"title:solr",
|
||||||
{"lucene": {"df": "content", "query": "lucene solr"}}
|
{"lucene": {"df": "content", "query": "lucene solr"}}
|
||||||
],
|
],
|
||||||
"must_not": [
|
"must_not": [
|
||||||
|
|
|
@ -252,9 +252,9 @@ SOLR_AUTHENTICATION_OPTS="-Djava.security.auth.login.config=/home/foo/jaas-clien
|
||||||
.KDC with AES-256 encryption
|
.KDC with AES-256 encryption
|
||||||
[IMPORTANT]
|
[IMPORTANT]
|
||||||
====
|
====
|
||||||
If your KDC uses AES-256 encryption, you need to add the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files to your JRE before a kerberized Solr can interact with the KDC.
|
If your KDC uses AES-256 encryption, you need to add the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files to your JRE before a Kerberized Solr can interact with the KDC.
|
||||||
|
|
||||||
You will know this when you see an error like this in your Solr logs : "KrbException: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled"
|
You will know this when you see an error like this in your Solr logs: "KrbException: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled".
|
||||||
|
|
||||||
For Java 1.8, this is available here: http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html.
|
For Java 1.8, this is available here: http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html.
|
||||||
|
|
||||||
|
|
|
@ -1418,7 +1418,7 @@ See the Solr wiki for tips & advice on using this filter: https://wiki.apache.or
|
||||||
|
|
||||||
*Arguments:*
|
*Arguments:*
|
||||||
|
|
||||||
`haircut` :: Select the extend of normalization. Valid values are:
|
`haircut`:: Select the extent of normalization. Valid values are:
|
||||||
+
|
+
|
||||||
* `bald`: (Default behavior) Cyrillic characters are first converted to Latin; then, Latin characters have their diacritics removed, with the exception of https://en.wikipedia.org/wiki/D_with_stroke[LATIN SMALL LETTER D WITH STROKE] (U+0111) which is converted to "```dj```"
|
* `bald`: (Default behavior) Cyrillic characters are first converted to Latin; then, Latin characters have their diacritics removed, with the exception of https://en.wikipedia.org/wiki/D_with_stroke[LATIN SMALL LETTER D WITH STROKE] (U+0111) which is converted to "```dj```"
|
||||||
* `regular`: Only Cyrillic to Latin normalization will be applied, preserving the Latin diacritics
|
* `regular`: Only Cyrillic to Latin normalization will be applied, preserving the Latin diacritics
|
||||||
|
|
|
@ -33,7 +33,7 @@ Two commands are available:
|
||||||
|
|
||||||
== Standalone Mode Backups
|
== Standalone Mode Backups
|
||||||
|
|
||||||
Backups and restoration uses Solr's replication handler. Out of the box, Solr includes implicit support for replication so this API can be used. Configuration of the replication handler can, however, be customized by defining your own replication handler in `solrconfig.xml` . For details on configuring the replication handler, see the section <<index-replication.adoc#configuring-the-replicationhandler,Configuring the ReplicationHandler>>.
|
Backup and restoration use Solr's replication handler. Out of the box, Solr includes implicit support for replication so this API can be used. Configuration of the replication handler can, however, be customized by defining your own replication handler in `solrconfig.xml`. For details on configuring the replication handler, see the section <<index-replication.adoc#configuring-the-replicationhandler,Configuring the ReplicationHandler>>.
|
||||||
|
|
||||||
=== Backup API
|
=== Backup API
|
||||||
|
|
||||||
|
@ -144,7 +144,7 @@ http://localhost:8983/solr/gettingstarted/replication?command=restorestatus
|
||||||
</response>
|
</response>
|
||||||
----
|
----
|
||||||
|
|
||||||
The status value can be "In Progress" , "success" or "failed". If it failed then an "exception" will also be sent in the response.
|
The status value can be "In Progress", "success" or "failed". If it failed then an "exception" will also be sent in the response.
|
||||||
|
|
||||||
=== Create Snapshot API
|
=== Create Snapshot API
|
||||||
|
|
||||||
|
@ -214,11 +214,11 @@ Request ID to track this action which will be processed asynchronously
|
||||||
|
|
||||||
Solr provides interfaces to plug in different storage systems for backing up and restoring. For example, you can have a Solr cluster running on a local filesystem like EXT3 but back up the indexes to an HDFS filesystem, or vice versa.
|
Solr provides interfaces to plug in different storage systems for backing up and restoring. For example, you can have a Solr cluster running on a local filesystem like EXT3 but back up the indexes to an HDFS filesystem, or vice versa.
|
||||||
|
|
||||||
The repository interfaces needs to be configured in the solr.xml file . While running backup/restore commands we can specify the repository to be used.
|
The repository interface needs to be configured in the `solr.xml` file. While running backup/restore commands we can specify the repository to be used.
|
||||||
|
|
||||||
If no repository is configured then the local filesystem repository will be used automatically.
|
If no repository is configured then the local filesystem repository will be used automatically.
|
||||||
|
|
||||||
Example solr.xml section to configure a repository like <<running-solr-on-hdfs.adoc#running-solr-on-hdfs,HDFS>>:
|
Example `solr.xml` section to configure a repository like <<running-solr-on-hdfs.adoc#running-solr-on-hdfs,HDFS>>:
|
||||||
|
|
||||||
[source,xml]
|
[source,xml]
|
||||||
----
|
----
|
||||||
|
|
|
@ -134,7 +134,7 @@ NOTE: PUT/POST is used to add terms to an existing list instead of replacing the
|
||||||
|
|
||||||
=== Managing Synonyms
|
=== Managing Synonyms
|
||||||
|
|
||||||
For the most part, the API for managing synonyms behaves similar to the API for stop words, except instead of working with a list of words, it uses a map, where the value for each entry in the map is a set of synonyms for a term. As with stop words, the `sample_techproducts_configs` <<config-sets.adoc#config-sets,configset>> includes a pre-built set of synonym mappings suitable for the sample data that is activated by the following field type definition in schema.xml:
|
For the most part, the API for managing synonyms behaves similarly to the API for stop words, except instead of working with a list of words, it uses a map, where the value for each entry in the map is a set of synonyms for a term. As with stop words, the `sample_techproducts_configs` <<config-sets.adoc#config-sets,configset>> includes a pre-built set of synonym mappings suitable for the sample data that is activated by the following field type definition in `schema.xml`:
|
||||||
|
|
||||||
[source,xml]
|
[source,xml]
|
||||||
----
|
----
|
||||||
|
@ -193,7 +193,7 @@ To add a new synonym mapping, you can PUT/POST a single mapping such as:
|
||||||
curl -X PUT -H 'Content-type:application/json' --data-binary '{"mad":["angry","upset"]}' "http://localhost:8983/solr/techproducts/schema/analysis/synonyms/english"
|
curl -X PUT -H 'Content-type:application/json' --data-binary '{"mad":["angry","upset"]}' "http://localhost:8983/solr/techproducts/schema/analysis/synonyms/english"
|
||||||
----
|
----
|
||||||
|
|
||||||
The API will return status code 200 if the PUT request was successful. To determine the synonyms for a specific term, you send a GET request for the child resource, such as `/schema/analysis/synonyms/english/mad` would return `["angry","upset"]` .
|
The API will return status code 200 if the PUT request was successful. To determine the synonyms for a specific term, you send a GET request for the child resource, such as `/schema/analysis/synonyms/english/mad` would return `["angry","upset"]`.
|
||||||
|
|
||||||
You can also PUT a list of symmetric synonyms, which will be expanded into a mapping for each term in the list. For example, you could PUT the following list of symmetric synonyms using the JSON list syntax instead of a map:
|
You can also PUT a list of symmetric synonyms, which will be expanded into a mapping for each term in the list. For example, you could PUT the following list of symmetric synonyms using the JSON list syntax instead of a map:
|
||||||
|
|
||||||
|
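The synonym hunk above describes two requests: a PUT that adds a mapping and a GET on the child resource for a single term. A minimal sketch of how those requests could be built with Python's standard library follows. The helper names `build_put_mapping` and `build_get_term` are hypothetical, the base URL is copied from the `curl` line in the hunk, and nothing is sent, so no running Solr instance is assumed.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example in the hunk above.
BASE = "http://localhost:8983/solr/techproducts/schema/analysis/synonyms/english"


def build_put_mapping(mapping, base=BASE):
    """Build (but do not send) a PUT request that adds a synonym mapping."""
    body = json.dumps(mapping).encode("utf-8")
    return urllib.request.Request(
        base,
        data=body,
        headers={"Content-type": "application/json"},
        method="PUT",
    )


def build_get_term(term, base=BASE):
    """Build (but do not send) a GET request for one term's child resource."""
    return urllib.request.Request(base + "/" + urllib.parse.quote(term), method="GET")


put_req = build_put_mapping({"mad": ["angry", "upset"]})
get_req = build_get_term("mad")
print(put_req.get_method(), put_req.full_url)
print(get_req.get_method(), get_req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it. According to the text in the hunk, a successful PUT returns status code 200.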
@ -227,7 +227,7 @@ Changing things like stop words and synonym mappings typically require re-indexi
|
||||||
|
|
||||||
Metadata about registered ManagedResources is available using the `/schema/managed` endpoint for each collection.
|
Metadata about registered ManagedResources is available using the `/schema/managed` endpoint for each collection.
|
||||||
|
|
||||||
Assuming you have the `managed_en` field type shown above defined in your schema.xml, sending a GET request to the following resource will return metadata about which schema-related resources are being managed by the RestManager:
|
Assuming you have the `managed_en` field type shown above defined in your `schema.xml`, sending a GET request to the following resource will return metadata about which schema-related resources are being managed by the RestManager:
|
||||||
|
|
||||||
[source,bash]
|
[source,bash]
|
||||||
----
|
----
|
||||||
|
@ -285,7 +285,7 @@ For most users, creating resources in this way should never be necessary, since
|
||||||
|
|
||||||
However, You may want to explicitly delete managed resources if they are no longer being used by a Solr component.
|
However, You may want to explicitly delete managed resources if they are no longer being used by a Solr component.
|
||||||
|
|
||||||
For instance, the managed resource for German that we created above can be deleted because there are no Solr components that are using it, whereas the managed resource for English stop words cannot be deleted because there is a token filter declared in schema.xml that is using it.
|
For instance, the managed resource for German that we created above can be deleted because there are no Solr components that are using it, whereas the managed resource for English stop words cannot be deleted because there is a token filter declared in `schema.xml` that is using it.
|
||||||
|
|
||||||
[source,bash]
|
[source,bash]
|
||||||
----
|
----
|
||||||
|
|
|
@ -187,8 +187,8 @@ Reporter plugins use the following arguments:
|
||||||
Additionally, several implementation-specific initialization arguments can be specified in nested elements. There are some arguments that are common to SLF4J, Ganglia and Graphite reporters:
|
Additionally, several implementation-specific initialization arguments can be specified in nested elements. There are some arguments that are common to SLF4J, Ganglia and Graphite reporters:
|
||||||
|
|
||||||
* *period* - (optional int) period in seconds between reports. Default value is 60.
|
* *period* - (optional int) period in seconds between reports. Default value is 60.
|
||||||
* *prefix* - (optional str) prefix to be added to metric names, may be helpful in logical grouping of related Solr instances, e.g., machine name or cluster name. Default is empty string, ie. just the registry name and metric name will be used to form a fully-qualified metric name.
|
* *prefix* - (optional str) prefix to be added to metric names, may be helpful in logical grouping of related Solr instances, e.g., machine name or cluster name. Default is empty string, i.e., just the registry name and metric name will be used to form a fully-qualified metric name.
|
||||||
* *filter* - (optional str) if not empty then only metric names that start with this value will be reported. Default is no filtering, ie. all metrics from selected registry will be reported.
|
* *filter* - (optional str) if not empty then only metric names that start with this value will be reported. Default is no filtering, i.e., all metrics from selected registry will be reported.
|
||||||
|
|
||||||
Reporters are instantiated for every group and registry that they were configured for, at the time when the respective components are initialized (e.g., on JVM startup or SolrCore load).
|
Reporters are instantiated for every group and registry that they were configured for, at the time when the respective components are initialized (e.g., on JVM startup or SolrCore load).
|
||||||
|
|
||||||
|
|
|
@ -20,7 +20,7 @@
|
||||||
|
|
||||||
Near Real Time (NRT) search means that documents are available for search soon after being indexed. NRT searching is one of the main features of SolrCloud and is rarely attempted in master/slave configurations.
|
Near Real Time (NRT) search means that documents are available for search soon after being indexed. NRT searching is one of the main features of SolrCloud and is rarely attempted in master/slave configurations.
|
||||||
|
|
||||||
Document durability and searchability are controlled by `commits`. The "Near" in "Near Real Time" is configurable to meet the needs of your application. Commits are either "hard" or "soft" and can be issued by a client (say SolrJ), via a REST call or configured to occur automatically in solrconfig.xml. The recommendation usually gives is to configure your commit strategy in solrconfig.xml (see below) and avoid issuing commits externally.
|
Document durability and searchability are controlled by `commits`. The "Near" in "Near Real Time" is configurable to meet the needs of your application. Commits are either "hard" or "soft" and can be issued by a client (say SolrJ), via a REST call or configured to occur automatically in `solrconfig.xml`. The recommendation usually gives is to configure your commit strategy in `solrconfig.xml` (see below) and avoid issuing commits externally.
|
||||||
|
|
||||||
Typically in NRT applications, hard commits are configured with `openSearcher=false`, and soft commits are configured to make documents visible for search.
|
Typically in NRT applications, hard commits are configured with `openSearcher=false`, and soft commits are configured to make documents visible for search.
|
||||||
|
|
||||||
|
@ -63,7 +63,7 @@ WARNING: Implicit in the above is that transaction logs will grow forever if har
|
||||||
|
|
||||||
=== Configuring commits
|
=== Configuring commits
|
||||||
|
|
||||||
As mentioned above, it is usually preferable to configure your commits (both hard and soft) in solrconfig.xml and avoid sending commits from an external source. Check your `solrconfig.xml` file since the defaults are likely not tuned to your needs. Here is an example NRT configuration for the two flavors of commit, a hard commit every 60 seconds and a soft commit every 30 seconds. Note that these are _not_ the values in some of the examples!
|
As mentioned above, it is usually preferable to configure your commits (both hard and soft) in `solrconfig.xml` and avoid sending commits from an external source. Check your `solrconfig.xml` file since the defaults are likely not tuned to your needs. Here is an example NRT configuration for the two flavors of commit, a hard commit every 60 seconds and a soft commit every 30 seconds. Note that these are _not_ the values in some of the examples!
|
||||||
|
|
||||||
[source,xml]
|
[source,xml]
|
||||||
----
|
----
|
||||||
|
|
|
@ -108,7 +108,7 @@ In older version the error is:
|
||||||
[literal]
|
[literal]
|
||||||
child query must only match non-parent docs.
|
child query must only match non-parent docs.
|
||||||
|
|
||||||
You can search for `q=+(parentFilter) +(someChildren)` to find a cause .
|
You can search for `q=+(parentFilter) +(someChildren)` to find a cause.
|
||||||
|
|
||||||
Again using the example documents above, we can construct a query such as `q={!parent which="content_type:parentDocument"}comments:SolrCloud`. We get this document in response:
|
Again using the example documents above, we can construct a query such as `q={!parent which="content_type:parentDocument"}comments:SolrCloud`. We get this document in response:
|
||||||
|
|
||||||
|
@ -1005,7 +1005,7 @@ The XmlQParser implementation uses the {solr-javadocs}/solr-core/org/apache/solr
|
||||||
|
|
||||||
=== Customizing XML Query Parser
|
=== Customizing XML Query Parser
|
||||||
|
|
||||||
You can configure your own custom query builders for additional XML elements. The custom builders need to extend the {solr-javadocs}/solr-core/org/apache/solr/search/SolrQueryBuilder.html[SolrQueryBuilder] or the {solr-javadocs}/solr-core/org/apache/solr/search/SolrSpanQueryBuilder.html[SolrSpanQueryBuilder] class. Example solrconfig.xml snippet:
|
You can configure your own custom query builders for additional XML elements. The custom builders need to extend the {solr-javadocs}/solr-core/org/apache/solr/search/SolrQueryBuilder.html[SolrQueryBuilder] or the {solr-javadocs}/solr-core/org/apache/solr/search/SolrSpanQueryBuilder.html[SolrSpanQueryBuilder] class. Example `solrconfig.xml` snippet:
|
||||||
|
|
||||||
[source,xml]
|
[source,xml]
|
||||||
----
|
----
|
||||||
|
|
|
@ -44,7 +44,7 @@ To help users zero in on the content they're looking for, Solr supports two spec
|
||||||
|
|
||||||
<<faceting.adoc#faceting,*Faceting*>> is the arrangement of search results into categories (which are based on indexed terms). Within each category, Solr reports on the number of hits for relevant term, which is called a facet constraint. Faceting makes it easy for users to explore search results on sites such as movie sites and product review sites, where there are many categories and many items within a category.
|
<<faceting.adoc#faceting,*Faceting*>> is the arrangement of search results into categories (which are based on indexed terms). Within each category, Solr reports on the number of hits for relevant term, which is called a facet constraint. Faceting makes it easy for users to explore search results on sites such as movie sites and product review sites, where there are many categories and many items within a category.
|
||||||
|
|
||||||
The screen shot below shows an example of faceting from the CNET Web site (CBS Interactive Inc.) , which was the first site to use Solr.
|
The screen shot below shows an example of faceting from the CNET Web site (CBS Interactive Inc.), which was the first site to use Solr.
|
||||||
|
|
||||||
image::images/overview-of-searching-in-solr/worddav88969a784fb8a63d8c46e9c043f5f953.png[image,width=600,height=300]
|
image::images/overview-of-searching-in-solr/worddav88969a784fb8a63d8c46e9c043f5f953.png[image,width=600,height=300]
|
||||||
|
|
||||||
|
|
|
@ -431,4 +431,4 @@ Examples of using Python and Jython for connecting to Solr with the Solr JDBC dr
|
||||||
|
|
||||||
=== R
|
=== R
|
||||||
|
|
||||||
Examples of using R for connecting to Solr with the Solr JDBC driver are available in the section <<solr-jdbc-r.adoc#solr-jdbc-r,Solr JDBC - R>> .
|
Examples of using R for connecting to Solr with the Solr JDBC driver are available in the section <<solr-jdbc-r.adoc#solr-jdbc-r,Solr JDBC - R>>.
|
||||||
|
|
|
@ -36,7 +36,7 @@ You can request update request handler statistics with an API request such as `\
|
||||||
|
|
||||||
=== Search Request Handler
|
=== Search Request Handler
|
||||||
|
|
||||||
Can be useful to measure and track number of search queries, response times, etc. If you are not using the “select” handler then the path needs to be changed appropriately. Similarly if you are using the “sql” handler or “export” handler , the realtime handler “get”, or any other handler similar statistics can be found for that as well.
|
Can be useful to measure and track number of search queries, response times, etc. If you are not using the “select” handler then the path needs to be changed appropriately. Similarly if you are using the “sql” handler or “export” handler, the realtime handler “get”, or any other handler similar statistics can be found for that as well.
|
||||||
|
|
||||||
*Registry & Path*: `solr.<core>:QUERY./select`
|
*Registry & Path*: `solr.<core>:QUERY./select`
|
||||||
|
|
||||||
|
|
|
@ -56,7 +56,7 @@ Handling text properly will make your users happy by providing them with the bes
|
||||||
|
|
||||||
One technique is using a text field as a catch-all for keyword searching. Most users are not sophisticated about their searches and the most common search is likely to be a simple keyword search. You can use `copyField` to take a variety of fields and funnel them all into a single text field for keyword searches.
|
One technique is using a text field as a catch-all for keyword searching. Most users are not sophisticated about their searches and the most common search is likely to be a simple keyword search. You can use `copyField` to take a variety of fields and funnel them all into a single text field for keyword searches.
|
||||||
|
|
||||||
In the schema.xml file for the "```techproducts```" example included with Solr, `copyField` declarations are used to dump the contents of `cat`, `name`, `manu`, `features`, and `includes` into a single field, `text`. In addition, it could be a good idea to copy `ID` into `text` in case users wanted to search for a particular product by passing its product number to a keyword search.
|
In the `schema.xml` file for the "```techproducts```" example included with Solr, `copyField` declarations are used to dump the contents of `cat`, `name`, `manu`, `features`, and `includes` into a single field, `text`. In addition, it could be a good idea to copy `ID` into `text` in case users wanted to search for a particular product by passing its product number to a keyword search.
|
||||||
|
|
||||||
Another technique is using `copyField` to use the same field in different ways. Suppose you have a field that is a list of authors, like this:
|
Another technique is using `copyField` to use the same field in different ways. Suppose you have a field that is a list of authors, like this:
|
||||||
|
|
||||||
|
|
|
@ -35,7 +35,7 @@ Solr caches are associated with a specific instance of an Index Searcher, a spec
|
||||||
|
|
||||||
When a new searcher is opened, the current searcher continues servicing requests while the new one auto-warms its cache. The new searcher uses the current searcher's cache to pre-populate its own. When the new searcher is ready, it is registered as the current searcher and begins handling all new search requests. The old searcher will be closed once it has finished servicing all its requests.
|
When a new searcher is opened, the current searcher continues servicing requests while the new one auto-warms its cache. The new searcher uses the current searcher's cache to pre-populate its own. When the new searcher is ready, it is registered as the current searcher and begins handling all new search requests. The old searcher will be closed once it has finished servicing all its requests.
|
||||||
|
|
||||||
In Solr, there are three cache implementations: `solr.search.LRUCache`, `solr.search.FastLRUCache,` and `solr.search.LFUCache` .
|
In Solr, there are three cache implementations: `solr.search.LRUCache`, `solr.search.FastLRUCache,` and `solr.search.LFUCache`.
|
||||||
|
|
||||||
The acronym LRU stands for Least Recently Used. When an LRU cache fills up, the entry with the oldest last-accessed timestamp is evicted to make room for the new entry. The net effect is that entries that are accessed frequently tend to stay in the cache, while those that are not accessed frequently tend to drop out and will be re-fetched from the index if needed again.
|
The acronym LRU stands for Least Recently Used. When an LRU cache fills up, the entry with the oldest last-accessed timestamp is evicted to make room for the new entry. The net effect is that entries that are accessed frequently tend to stay in the cache, while those that are not accessed frequently tend to drop out and will be re-fetched from the index if needed again.
|
||||||
|
|
||||||
|
|
|
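The LRU paragraph in the hunk above says that the entry with the oldest last-accessed timestamp is evicted when the cache fills up. A minimal sketch of that policy follows, assuming a hypothetical `LRUCacheSketch` class built on Python's `OrderedDict`. It is an illustration of the eviction rule only, not Solr's `solr.search.LRUCache`, and it omits auto-warming.

```python
from collections import OrderedDict


class LRUCacheSketch:
    """Hypothetical illustration of least-recently-used eviction (not Solr code)."""

    def __init__(self, size):
        self.size = size
        self._entries = OrderedDict()  # least recently used first, most recent last

    def get(self, key):
        if key not in self._entries:
            return None  # a real cache user would re-fetch from the index
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        if key in self._entries:
            self._entries.move_to_end(key)
        elif len(self._entries) >= self.size:
            self._entries.popitem(last=False)  # evict the least recently used entry
        self._entries[key] = value

    def keys(self):
        return list(self._entries)


cache = LRUCacheSketch(size=2)
cache.put("q1", "docs-a")
cache.put("q2", "docs-b")
cache.get("q1")            # q1 becomes the most recently used entry
cache.put("q3", "docs-c")  # the cache is full, so q2 is evicted
print(cache.keys())        # ['q1', 'q3']
```

Entries that keep being read are moved to the recent end and survive, which matches the "net effect" the paragraph describes.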
@ -45,7 +45,7 @@ The available commands are:
|
||||||
|
|
||||||
* `set`: Create or overwrite a parameter set map.
|
* `set`: Create or overwrite a parameter set map.
|
||||||
* `unset`: delete a parameter set map.
|
* `unset`: delete a parameter set map.
|
||||||
* `update`: update a parameter set map. This is equivalent to a map.putAll(newMap) . Both the maps are merged and if the new map has same keys as old they are overwritten
|
* `update`: update a parameter set map. This is equivalent to a `map.putAll(newMap)`. Both the maps are merged and if the new map has same keys as old they are overwritten.
|
||||||
|
|
||||||
You can mix these commands into a single request if necessary.
|
You can mix these commands into a single request if necessary.
|
||||||
|
|
||||||
|
|
|
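The `update` bullet in the hunk above compares the command to `map.putAll(newMap)`. The merge semantics it describes can be sketched with a plain dictionary. The function name `apply_update` and the parameter values are hypothetical and chosen only for illustration; this does not call the Request Parameters API.

```python
def apply_update(old_params, new_params):
    """Merge new_params into a copy of old_params, like Java's map.putAll(newMap)."""
    merged = dict(old_params)
    merged.update(new_params)  # keys present in both maps take the new value
    return merged


# Hypothetical parameter set contents, for illustration only.
existing = {"rows": 10, "fl": "id,name"}
incoming = {"rows": 25, "facet": "true"}
result = apply_update(existing, incoming)
print(result)  # {'rows': 25, 'fl': 'id,name', 'facet': 'true'}
```

Keys that exist only in the old map are kept, keys that exist only in the new map are added, and keys in both are overwritten.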
@ -78,7 +78,7 @@ The pre-defined permissions are:
|
||||||
* *schema-edit*: this permission is allowed to edit a collection's schema using the <<schema-api.adoc#schema-api,Schema API>>. Note that this allows schema edit permissions for _all_ collections. If edit permissions should only be applied to specific collections, a custom permission would need to be created.
|
* *schema-edit*: this permission is allowed to edit a collection's schema using the <<schema-api.adoc#schema-api,Schema API>>. Note that this allows schema edit permissions for _all_ collections. If edit permissions should only be applied to specific collections, a custom permission would need to be created.
|
||||||
* *schema-read*: this permission is allowed to read a collection's schema using the <<schema-api.adoc#schema-api,Schema API>>. Note that this allows schema read permissions for _all_ collections. If read permissions should only be applied to specific collections, a custom permission would need to be created.
|
* *schema-read*: this permission is allowed to read a collection's schema using the <<schema-api.adoc#schema-api,Schema API>>. Note that this allows schema read permissions for _all_ collections. If read permissions should only be applied to specific collections, a custom permission would need to be created.
|
||||||
* *config-edit*: this permission is allowed to edit a collection's configuration using the <<config-api.adoc#config-api,Config API>>, the <<request-parameters-api.adoc#request-parameters-api,Request Parameters API>>, and other APIs which modify `configoverlay.json`. Note that this allows configuration edit permissions for _all_ collections. If edit permissions should only be applied to specific collections, a custom permission would need to be created.
|
* *config-edit*: this permission is allowed to edit a collection's configuration using the <<config-api.adoc#config-api,Config API>>, the <<request-parameters-api.adoc#request-parameters-api,Request Parameters API>>, and other APIs which modify `configoverlay.json`. Note that this allows configuration edit permissions for _all_ collections. If edit permissions should only be applied to specific collections, a custom permission would need to be created.
|
||||||
* *core-admin-read* : Read operations on the core admin API
|
* *core-admin-read*: Read operations on the core admin API
|
||||||
* *core-admin-edit*: Core admin commands that can mutate the system state.
|
* *core-admin-edit*: Core admin commands that can mutate the system state.
|
||||||
* *config-read*: this permission is allowed to read a collection's configuration using the <<config-api.adoc#config-api,Config API>>, the <<request-parameters-api.adoc#request-parameters-api,Request Parameters API>>, and other APIs which modify `configoverlay.json`. Note that this allows configuration read permissions for _all_ collections. If read permissions should only be applied to specific collections, a custom permission would need to be created.
|
* *config-read*: this permission is allowed to read a collection's configuration using the <<config-api.adoc#config-api,Config API>>, the <<request-parameters-api.adoc#request-parameters-api,Request Parameters API>>, and other APIs which modify `configoverlay.json`. Note that this allows configuration read permissions for _all_ collections. If read permissions should only be applied to specific collections, a custom permission would need to be created.
|
||||||
* *collection-admin-edit*: this permission is allowed to edit a collection's configuration using the <<collections-api.adoc#collections-api,Collections API>>. Note that this allows configuration edit permissions for _all_ collections. If edit permissions should only be applied to specific collections, a custom permission would need to be created. Specifically, the following actions of the Collections API would be allowed:
|
* *collection-admin-edit*: this permission is allowed to edit a collection's configuration using the <<collections-api.adoc#collections-api,Collections API>>. Note that this allows configuration edit permissions for _all_ collections. If edit permissions should only be applied to specific collections, a custom permission would need to be created. Specifically, the following actions of the Collections API would be allowed:
|
||||||
|
@ -133,7 +133,7 @@ The name of the permission. This is required only if it is a predefined permissi
|
||||||
`collection`::
|
`collection`::
|
||||||
The collection or collections the permission will apply to.
|
The collection or collections the permission will apply to.
|
||||||
+
|
+
|
||||||
When the path that will be allowed is collection-specific, such as when setting permissions to allow use of the Schema API, omitting the collection property will allow the defined path and/or method for all collections. However, when the path is one that is non-collection-specific, such as the Collections API, the collection value must be `null`. The default value is * (all collections).
|
When the path that will be allowed is collection-specific, such as when setting permissions to allow use of the Schema API, omitting the collection property will allow the defined path and/or method for all collections. However, when the path is one that is non-collection-specific, such as the Collections API, the collection value must be `null`. The default value is `*`, or all collections.
|
||||||
|
|
||||||
`path`::
|
`path`::
|
||||||
A request handler name, such as `/update` or `/select`. A wild card is supported, to allow for all paths as appropriate (such as, `/update/*`).
|
A request handler name, such as `/update` or `/select`. A wild card is supported, to allow for all paths as appropriate (such as, `/update/*`).
|
||||||
|
|
|
@ -84,7 +84,7 @@ Tag values come from a plugin called Snitch. If there is a tag named ‘rack’
|
||||||
* *host*: host name of the node
|
* *host*: host name of the node
|
||||||
* *port*: port of the node
|
* *port*: port of the node
|
||||||
* *node*: node name
|
* *node*: node name
|
||||||
* *role* : The role of the node. The only supported role is 'overseer'
|
* *role*: The role of the node. The only supported role is 'overseer'
|
||||||
* *ip_1, ip_2, ip_3, ip_4*: These are ip fragments for each node. For example, in a host with ip `192.168.1.2`, `ip_1 = 2`, `ip_2 =1`, `ip_3 = 168` and` ip_4 = 192`
|
* *ip_1, ip_2, ip_3, ip_4*: These are ip fragments for each node. For example, in a host with ip `192.168.1.2`, `ip_1 = 2`, `ip_2 =1`, `ip_3 = 168` and` ip_4 = 192`
|
||||||
* *sysprop.{PROPERTY_NAME}*: These are values available from system properties. `sysprop.key` means a value that is passed to the node as `-Dkey=keyValue` during the node startup. It is possible to use rules like `sysprop.key:expectedVal,shard:*`
|
* *sysprop.{PROPERTY_NAME}*: These are values available from system properties. `sysprop.key` means a value that is passed to the node as `-Dkey=keyValue` during the node startup. It is possible to use rules like `sysprop.key:expectedVal,shard:*`
|
||||||
|
|
||||||
|
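The `ip_1` to `ip_4` bullet in the hunk above numbers the fragments from the last octet to the first. A small sketch of that mapping follows, assuming a hypothetical helper named `ip_fragment_tags`; it is not part of Solr's Snitch implementation.

```python
def ip_fragment_tags(ip):
    """Return the ip_1..ip_4 tags for a dotted IPv4 address.

    Following the description above, ip_1 is the last octet and ip_4 the first.
    """
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"expected a dotted IPv4 address, got {ip!r}")
    # Reverse the octets so that ip_1 is the least significant one.
    return {f"ip_{i}": int(octet) for i, octet in enumerate(reversed(octets), start=1)}


tags = ip_fragment_tags("192.168.1.2")
print(tags)  # {'ip_1': 2, 'ip_2': 1, 'ip_3': 168, 'ip_4': 192}
```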
@ -119,7 +119,7 @@ replica:<2,node:*
|
||||||
|
|
||||||
=== For a given shard, keep less than 2 replicas on any node
|
=== For a given shard, keep less than 2 replicas on any node
|
||||||
|
|
||||||
For this rule, we use the `shard` condition to define any shard , the `replica` condition with operators for "less than 2", and finally a pre-defined tag named `node` to define nodes with any name.
|
For this rule, we use the `shard` condition to define any shard, the `replica` condition with operators for "less than 2", and finally a pre-defined tag named `node` to define nodes with any name.
|
||||||
|
|
||||||
[source,text]
|
[source,text]
|
||||||
----
|
----
|
||||||
|
@ -135,7 +135,7 @@ This rule limits the `shard` condition to 'shard1', but any number of replicas.
|
||||||
shard:shard1,replica:*,rack:730
|
shard:shard1,replica:*,rack:730
|
||||||
----
|
----
|
||||||
|
|
||||||
In this case, the default value of `replica` is * (or, all replicas). So, it can be omitted and the rule can be reduced to:
|
In this case, the default value of `replica` is `*`, or all replicas. It can be omitted and the rule will be reduced to:
|
||||||
|
|
||||||
[source,text]
|
[source,text]
|
||||||
----
|
----
|
||||||
|
|
|
@ -166,7 +166,7 @@ Here is a sample `solrconfig.xml` configuration for storing Solr indexes on HDFS
|
||||||
</directoryFactory>
|
</directoryFactory>
|
||||||
----
|
----
|
||||||
|
|
||||||
If using Kerberos, you will need to add the three Kerberos related properties to the `<directoryFactory>` element in solrconfig.xml, such as:
|
If using Kerberos, you will need to add the three Kerberos related properties to the `<directoryFactory>` element in `solrconfig.xml`, such as:
|
||||||
|
|
||||||
[source,xml]
|
[source,xml]
|
||||||
----
|
----
|
||||||
|
|
|
@ -66,7 +66,7 @@ An alternative to using a managed schema is to explicitly configure a `ClassicIn
|
||||||
|
|
||||||
If you have an existing Solr collection that uses `ClassicIndexSchemaFactory`, and you wish to convert to use a managed schema, you can simply modify the `solrconfig.xml` to specify the use of the `ManagedIndexSchemaFactory`.
|
If you have an existing Solr collection that uses `ClassicIndexSchemaFactory`, and you wish to convert to use a managed schema, you can simply modify the `solrconfig.xml` to specify the use of the `ManagedIndexSchemaFactory`.
|
||||||
|
|
||||||
Once Solr is restarted and it detects that a `schema.xml` file exists, but the `managedSchemaResourceName` file (ie: "`managed-schema`") does not exist, the existing `schema.xml` file will be renamed to `schema.xml.bak` and the contents are re-written to the managed schema file. If you look at the resulting file, you'll see this at the top of the page:
|
Once Solr is restarted and it detects that a `schema.xml` file exists, but the `managedSchemaResourceName` file (i.e., "`managed-schema`") does not exist, the existing `schema.xml` file will be renamed to `schema.xml.bak` and the contents are re-written to the managed schema file. If you look at the resulting file, you'll see this at the top of the page:
|
||||||
|
|
||||||
[source,xml]
|
[source,xml]
|
||||||
----
|
----
|
||||||
|
|
|
@ -100,7 +100,7 @@ Then at query time, you include the prefix(es) into your query with the `\_route
|
||||||
|
|
||||||
The `compositeId` router supports prefixes containing up to 2 levels of routing. For example: a prefix routing first by region, then by customer: "USA!IBM!12345"
|
The `compositeId` router supports prefixes containing up to 2 levels of routing. For example: a prefix routing first by region, then by customer: "USA!IBM!12345"
|
||||||
|
|
||||||
Another use case could be if the customer "IBM" has a lot of documents and you want to spread it across multiple shards. The syntax for such a use case would be : `shard_key/num!document_id` where the `/num` is the number of bits from the shard key to use in the composite hash.
|
Another use case could be if the customer "IBM" has a lot of documents and you want to spread it across multiple shards. The syntax for such a use case would be: `shard_key/num!document_id` where the `/num` is the number of bits from the shard key to use in the composite hash.
|
||||||
|
|
||||||
So `IBM/3!12345` will take 3 bits from the shard key and 29 bits from the unique doc id, spreading the tenant over 1/8th of the shards in the collection. Likewise if the num value was 2 it would spread the documents across 1/4th the number of shards. At query time, you include the prefix(es) along with the number of bits into your query with the `\_route_` parameter (i.e., `q=solr&_route_=IBM/3!`) to direct queries to specific shards.
|
So `IBM/3!12345` will take 3 bits from the shard key and 29 bits from the unique doc id, spreading the tenant over 1/8th of the shards in the collection. Likewise if the num value was 2 it would spread the documents across 1/4th the number of shards. At query time, you include the prefix(es) along with the number of bits into your query with the `\_route_` parameter (i.e., `q=solr&_route_=IBM/3!`) to direct queries to specific shards.
|
||||||
|
|
||||||
|
|
|
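The routing hunk above says that `IBM/3!12345` takes 3 bits from the shard key and 29 bits from the document id, which confines the tenant to 1/8th of the hash range. The bit arithmetic can be sketched as follows. The function names are hypothetical, only the explicit `shard_key/num!document_id` form is handled, and `zlib.crc32` is used purely as a stand-in 32-bit hash, so the resulting values will not match the hashes Solr computes.

```python
import zlib


def parse_route_prefix(doc_id):
    """Split 'shard_key/num!document_id' into (shard_key, num_bits, document_id)."""
    prefix, document_id = doc_id.split("!", 1)
    shard_key, _, bits = prefix.partition("/")
    return shard_key, int(bits), document_id


def composite_hash(doc_id, hash32=zlib.crc32):
    """Combine the top num_bits of the key hash with the low bits of the id hash."""
    shard_key, num_bits, document_id = parse_route_prefix(doc_id)
    key_mask = (0xFFFFFFFF << (32 - num_bits)) & 0xFFFFFFFF  # top num_bits set
    id_mask = 0xFFFFFFFF >> num_bits                         # remaining low bits set
    key_hash = hash32(shard_key.encode("utf-8")) & 0xFFFFFFFF
    id_hash = hash32(document_id.encode("utf-8")) & 0xFFFFFFFF
    return (key_hash & key_mask) | (id_hash & id_mask)


def fraction_of_hash_range(num_bits):
    """Fixing num_bits leading bits leaves 1 / 2**num_bits of the hash range."""
    return 1 / (2 ** num_bits)


a = composite_hash("IBM/3!12345")
b = composite_hash("IBM/3!99999")
print(a >> 29 == b >> 29)         # both documents share the same top 3 bits
print(fraction_of_hash_range(3))  # 0.125, i.e. 1/8th
print(fraction_of_hash_range(2))  # 0.25, i.e. 1/4th
```

Because every document with the `IBM/3!` prefix shares the same three leading bits, all of them fall into the same eighth of the hash range, while the remaining 29 bits spread them within it.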
@ -455,7 +455,7 @@ Any changes you make to the configuration for the contacts collection will not a
|
||||||
|
|
||||||
You can override the name given to the configuration directory in ZooKeeper by using the `-n` option. For instance, the command `bin/solr create -c logs -d _default -n basic` will upload the `server/solr/configsets/_default/conf` directory to ZooKeeper as `/configs/basic`.
|
You can override the name given to the configuration directory in ZooKeeper by using the `-n` option. For instance, the command `bin/solr create -c logs -d _default -n basic` will upload the `server/solr/configsets/_default/conf` directory to ZooKeeper as `/configs/basic`.
|
||||||
|
|
||||||
Notice that we used the `-d` option to specify a different configuration than the default. Solr provides several built-in configurations under `server/solr/configsets`. However you can also provide the path to your own configuration directory using the `-d` option. For instance, the command `bin/solr create -c mycoll -d /tmp/myconfigs`, will upload `/tmp/myconfigs` into ZooKeeper under `/configs/mycoll` .
|
Notice that we used the `-d` option to specify a different configuration than the default. Solr provides several built-in configurations under `server/solr/configsets`. However you can also provide the path to your own configuration directory using the `-d` option. For instance, the command `bin/solr create -c mycoll -d /tmp/myconfigs`, will upload `/tmp/myconfigs` into ZooKeeper under `/configs/mycoll`.
|
||||||
|
|
||||||
To reiterate, the configuration directory is named after the collection unless you override it using the `-n` option.
|
To reiterate, the configuration directory is named after the collection unless you override it using the `-n` option.
|
||||||
|
|
||||||
|
@ -608,7 +608,7 @@ If a pre-existing configuration set is specified, it will be overwritten in ZooK
|
||||||
*Example*: `-n myconfig`
|
*Example*: `-n myconfig`
|
||||||
|
|
||||||
`-d <configset dir>`::
|
`-d <configset dir>`::
|
||||||
The path of the configuration set to upload. It should have a "conf" directory immediately below it that in turn contains solrconfig.xml etc.
|
The path of the configuration set to upload. It should have a `conf` directory immediately below it that in turn contains `solrconfig.xml` etc.
|
||||||
+
|
+
|
||||||
If just a name is supplied, `$SOLR_HOME/server/solr/configsets` will be checked for this name. An absolute path may be supplied instead.
|
If just a name is supplied, `$SOLR_HOME/server/solr/configsets` will be checked for this name. An absolute path may be supplied instead.
|
||||||
+
|
+
|
||||||
|
|
|
@ -168,7 +168,7 @@ The Solr index Schema defines the fields to be indexed and the type for the fiel
|
||||||
|
|
||||||
[[solrconfig]]<<the-well-configured-solr-instance.adoc#the-well-configured-solr-instance,SolrConfig (solrconfig.xml)>>::
|
[[solrconfig]]<<the-well-configured-solr-instance.adoc#the-well-configured-solr-instance,SolrConfig (solrconfig.xml)>>::
|
||||||
|
|
||||||
The Apache Solr configuration file. Defines indexing options, RequestHandlers, highlighting, spellchecking and various other configurations. The file, solrconfig.xml is located in the Solr home conf directory.
|
The Apache Solr configuration file. Defines indexing options, RequestHandlers, highlighting, spellchecking and various other configurations. The file, `solrconfig.xml`, is located in the Solr home `conf` directory.
|
||||||
|
|
||||||
[[spellcheck]]<<spell-checking.adoc#spell-checking,Spell Check>>::
|
[[spellcheck]]<<spell-checking.adoc#spell-checking,Spell Check>>::
|
||||||
The ability to suggest alternative spellings of search terms to a user, as a check against spelling errors causing few or zero results.
|
The ability to suggest alternative spellings of search terms to a user, as a check against spelling errors causing few or zero results.
|
||||||
|
|
|
@ -132,14 +132,14 @@ The heap usage of the node as reported by the Metrics API under the key `solr.jv
|
||||||
`nodeRole`::
|
`nodeRole`::
|
||||||
The role of the node. The only supported value currently is `overseer`.
|
The role of the node. The only supported value currently is `overseer`.
|
||||||
|
|
||||||
`ip_1 , ip_2, ip_3, ip_4`::
|
`ip_1, ip_2, ip_3, ip_4`::
|
||||||
The least significant to most significant segments of IP address. For example, for an IP address `192.168.1.2`, `ip_1 = 2`, `ip_2 = 1`, `ip_3 = 168`, `ip_4 = 192`.
|
The least significant to most significant segments of IP address. For example, for an IP address `192.168.1.2`, `ip_1 = 2`, `ip_2 = 1`, `ip_3 = 168`, `ip_4 = 192`.
|
||||||
|
|
||||||
`sysprop.<system_property_name>`::
|
`sysprop.<system_property_name>`::
|
||||||
Any arbitrary system property set on the node on startup.
|
Any arbitrary system property set on the node on startup.
|
||||||
|
|
||||||
`metrics:<full-path-to-the metric>` ::
|
`metrics:<full-path-to-the metric>`::
|
||||||
Any arbitrary metric. eg: `metrics:solr.node:CONTAINER.fs.totalSpace`. Refer to the `key` parameter in <<metrics-reporting.adoc#metrics-reporting, Metrics API>>
|
Any arbitrary metric. For example, `metrics:solr.node:CONTAINER.fs.totalSpace`. Refer to the `key` parameter in the <<metrics-reporting.adoc#metrics-reporting, Metrics API>> section.
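The ordering of the `ip_1` to `ip_4` attributes is easy to get backwards, so a small Python sketch may help. It only illustrates the least-significant-first numbering described above; `ip_segments` is a hypothetical helper and is not part of Solr.

```python
def ip_segments(address: str) -> dict:
    """Split a dotted IPv4 address into ip_1..ip_4 (least significant first)."""
    octets = address.split(".")
    if len(octets) != 4:
        raise ValueError(f"expected four dot-separated segments, got {address!r}")
    # Reverse so that the last octet becomes ip_1 and the first becomes ip_4.
    return {
        f"ip_{position}": int(octet)
        for position, octet in enumerate(reversed(octets), start=1)
    }


segments = ip_segments("192.168.1.2")
```

For the address used in the text, `segments` maps `ip_1` to 2, `ip_2` to 1, `ip_3` to 168, and `ip_4` to 192.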
|
||||||
|
|
||||||
=== Policy Operators
|
=== Policy Operators
|
||||||
|
|
||||||
|
|
|
@ -257,7 +257,7 @@ If `geo=true` then the default prefix tree is `geohash`, otherwise it's `quad`.
|
||||||
+
|
+
|
||||||
A third choice is `packedQuad`, which is generally more efficient than `quad`, provided there are many levels -- perhaps 20 or more.
|
A third choice is `packedQuad`, which is generally more efficient than `quad`, provided there are many levels -- perhaps 20 or more.
|
||||||
|
|
||||||
`maxLevels`:: Sets the maximum grid depth for indexed data. Instead, it's usually more intuitive to compute an appropriate maxLevels by specifying `maxDistErr` .
|
`maxLevels`:: Sets the maximum grid depth for indexed data. Instead, it's usually more intuitive to compute an appropriate maxLevels by specifying `maxDistErr`.
|
||||||
|
|
||||||
*_And there are others:_* `normWrapLongitude`, `datelineRule`, `validationRule`, `autoIndex`, `allowMultiOverlap`, `precisionModel`. For further info, see notes below about `spatialContextFactory` implementations referenced above, especially the link to the JTS based one.
|
*_And there are others:_* `normWrapLongitude`, `datelineRule`, `validationRule`, `autoIndex`, `allowMultiOverlap`, `precisionModel`. For further info, see notes below about `spatialContextFactory` implementations referenced above, especially the link to the JTS based one.
|
||||||
|
|
||||||
|
@ -332,7 +332,7 @@ Once the field type has been defined, define a field that uses it.
|
||||||
|
|
||||||
The `RptWithGeometrySpatialField` field type is a derivative of `SpatialRecursivePrefixTreeFieldType` that also stores the original geometry internally in Lucene DocValues, which it uses to achieve accurate search. It can also be used for indexed point fields. The Intersects predicate (the default) is particularly fast, since many search results can be returned as an accurate hit without requiring a geometry check. This field type is configured just like RPT except that the default `distErrPct` is 0.15 (higher than 0.025) because the grid squares are purely for performance and not to fundamentally represent the shape.
|
The `RptWithGeometrySpatialField` field type is a derivative of `SpatialRecursivePrefixTreeFieldType` that also stores the original geometry internally in Lucene DocValues, which it uses to achieve accurate search. It can also be used for indexed point fields. The Intersects predicate (the default) is particularly fast, since many search results can be returned as an accurate hit without requiring a geometry check. This field type is configured just like RPT except that the default `distErrPct` is 0.15 (higher than 0.025) because the grid squares are purely for performance and not to fundamentally represent the shape.
|
||||||
|
|
||||||
An optional in-memory cache can be defined in `solrconfig.xml`, which should be done when the data tends to have shapes with many vertices. Assuming you name your field "geom", you can configure an optional cache in solrconfig.xml by adding the following – notice the suffix of the cache name:
|
An optional in-memory cache can be defined in `solrconfig.xml`, which should be done when the data tends to have shapes with many vertices. Assuming you name your field "geom", you can configure an optional cache in `solrconfig.xml` by adding the following – notice the suffix of the cache name:
|
||||||
|
|
||||||
[source,xml]
|
[source,xml]
|
||||||
----
|
----
|
||||||
|
|
|
@ -137,7 +137,7 @@ Here is how it might be configured in `solrconfig.xml`:
|
||||||
|
|
||||||
Some of the parameters will be familiar from the discussion of the other spell checkers, such as `name`, `classname`, and `field`. New for this spell checker is `combineWords`, which defines whether words should be combined in a dictionary search (default is true); `breakWords`, which defines if words should be broken during a dictionary search (default is true); and `maxChanges`, an integer which defines how many times the spell checker should check collation possibilities against the index (default is 10).
|
Some of the parameters will be familiar from the discussion of the other spell checkers, such as `name`, `classname`, and `field`. New for this spell checker is `combineWords`, which defines whether words should be combined in a dictionary search (default is true); `breakWords`, which defines if words should be broken during a dictionary search (default is true); and `maxChanges`, an integer which defines how many times the spell checker should check collation possibilities against the index (default is 10).
|
||||||
|
|
||||||
The spellchecker can be configured with a traditional checker (ie: `DirectSolrSpellChecker`). The results are combined and collations can contain a mix of corrections from both spellcheckers.
|
The spellchecker can be configured with a traditional checker (e.g., `DirectSolrSpellChecker`). The results are combined and collations can contain a mix of corrections from both spellcheckers.
|
||||||
|
|
||||||
=== Add It to a Request Handler
|
=== Add It to a Request Handler
|
||||||
|
|
||||||
|
|
|
@ -390,9 +390,9 @@ The `classify` function classifies tuples using a logistic regression text class
|
||||||
|
|
||||||
Each tuple that is classified is assigned two scores:
|
Each tuple that is classified is assigned two scores:
|
||||||
|
|
||||||
* probability_d* : A float between 0 and 1 which describes the probability that the tuple belongs to the class. This is useful in the classification use case.
|
* probability_d*: A float between 0 and 1 which describes the probability that the tuple belongs to the class. This is useful in the classification use case.
|
||||||
|
|
||||||
* score_d* : The score of the document that has not be squashed between 0 and 1. The score may be positive or negative. The higher the score the better the document fits the class. This un-squashed score will be useful in query re-ranking and recommendation use cases. This score is particularly useful when multiple high ranking documents have a probability_d score of 1, which won't provide a meaningful ranking between documents.
|
* score_d*: The score of the document that has not been squashed between 0 and 1. The score may be positive or negative. The higher the score, the better the document fits the class. This un-squashed score will be useful in query re-ranking and recommendation use cases. This score is particularly useful when multiple high ranking documents have a probability_d score of 1, which won't provide a meaningful ranking between documents.
|
||||||
|
|
||||||
=== classify Parameters
|
=== classify Parameters
|
||||||
|
|
||||||
|
|
|
@ -186,7 +186,7 @@ The example above shows a facet function with rollups over three buckets, where
|
||||||
|
|
||||||
== features
|
== features
|
||||||
|
|
||||||
The `features` function extracts the key terms from a text field in a classification training set stored in a SolrCloud collection. It uses an algorithm known as * Information Gain* , to select the important terms from the training set. The `features` function was designed to work specifically with the <<train,train>> function, which uses the extracted features to train a text classifier.
|
The `features` function extracts the key terms from a text field in a classification training set stored in a SolrCloud collection. It uses an algorithm known as *Information Gain* to select the important terms from the training set. The `features` function was designed to work specifically with the <<train,train>> function, which uses the extracted features to train a text classifier.
|
||||||
|
|
||||||
The `features` function is designed to work with a training set that provides both positive and negative examples of a class. It emits a tuple for each feature term that is extracted along with the inverse document frequency (IDF) for the term in the training set.
|
The `features` function is designed to work with a training set that provides both positive and negative examples of a class. It emits a tuple for each feature term that is extracted along with the inverse document frequency (IDF) for the term in the training set.
|
||||||
|
|
||||||
|
@ -348,8 +348,8 @@ The `shortestPath` function is an implementation of a shortest path graph traver
|
||||||
* `from`: (Mandatory) The nodeID to start the search from
|
* `from`: (Mandatory) The nodeID to start the search from
|
||||||
* `to`: (Mandatory) The nodeID to end the search at
|
* `to`: (Mandatory) The nodeID to end the search at
|
||||||
* `edge`: (Mandatory) Syntax: `from_field=to_field`. The `from_field` defines which field to search from. The `to_field` defines which field to search to. See example below for a detailed explanation.
|
* `edge`: (Mandatory) Syntax: `from_field=to_field`. The `from_field` defines which field to search from. The `to_field` defines which field to search to. See example below for a detailed explanation.
|
||||||
* `threads`: (Optional : Default 6) The number of threads used to perform the partitioned join in the traversal.
|
* `threads`: (Optional: Default 6) The number of threads used to perform the partitioned join in the traversal.
|
||||||
* `partitionSize`: (Optional : Default 250) The number of nodes in each partition of the join.
|
* `partitionSize`: (Optional: Default 250) The number of nodes in each partition of the join.
|
||||||
* `fq`: (Optional) Filter query
|
* `fq`: (Optional) Filter query
|
||||||
* `maxDepth`: (Mandatory) Limits the search to a maximum depth in the graph.
|
* `maxDepth`: (Mandatory) Limits the search to a maximum depth in the graph.
|
||||||
|
|
||||||
|
|
|
@ -213,7 +213,7 @@ When using `BlendedInfixSuggester` you can provide your own path where the index
|
||||||
`minPrefixChars`::
|
`minPrefixChars`::
|
||||||
Minimum number of leading characters before PrefixQuery is used (the default is `4`). Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).
|
Minimum number of leading characters before PrefixQuery is used (the default is `4`). Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).
|
||||||
|
|
||||||
This implementation supports <<Context Filtering>> .
|
This implementation supports <<Context Filtering>>.
|
||||||
|
|
||||||
==== FreeTextLookupFactory
|
==== FreeTextLookupFactory
|
||||||
|
|
||||||
|
|
|
@ -324,7 +324,7 @@ Comments may be nested.
|
||||||
|
|
||||||
Solr's standard query parser differs from the Lucene Query Parser in the following ways:
|
Solr's standard query parser differs from the Lucene Query Parser in the following ways:
|
||||||
|
|
||||||
* A * may be used for either or both endpoints to specify an open-ended range query
|
* A `*` may be used for either or both endpoints to specify an open-ended range query
|
||||||
** `field:[* TO 100]` finds all field values less than or equal to 100
|
** `field:[* TO 100]` finds all field values less than or equal to 100
|
||||||
** `field:[100 TO *]` finds all field values greater than or equal to 100
|
** `field:[100 TO *]` finds all field values greater than or equal to 100
|
||||||
** `field:[* TO *]` matches all documents with the field
|
** `field:[* TO *]` matches all documents with the field
|
||||||
|
|
|
@ -142,7 +142,7 @@ Similar to the <<faceting.adoc#faceting,Facet Component>>, the `stats.field` par
|
||||||
* Changing the Output Key: `stats.field={!key=my_price_stats}price`
|
* Changing the Output Key: `stats.field={!key=my_price_stats}price`
|
||||||
* Tagging stats for <<The Stats Component and Faceting,use with `facet.pivot`>>: `stats.field={!tag=my_pivot_stats}price`
|
* Tagging stats for <<The Stats Component and Faceting,use with `facet.pivot`>>: `stats.field={!tag=my_pivot_stats}price`
|
||||||
|
|
||||||
Local parameters can also be used to specify individual statistics by name, overriding the set of statistics computed by default, eg: `stats.field={!min=true max=true percentiles='99,99.9,99.99'}price`
|
Local parameters can also be used to specify individual statistics by name, overriding the set of statistics computed by default, e.g., `stats.field={!min=true max=true percentiles='99,99.9,99.99'}price`.
|
||||||
|
|
||||||
[IMPORTANT]
|
[IMPORTANT]
|
||||||
====
|
====
|
||||||
|
|
|
@ -158,7 +158,7 @@ If `true`, returns payload information.
|
||||||
If `true`, returns document term frequency info for each term in the document.
|
If `true`, returns document term frequency info for each term in the document.
|
||||||
|
|
||||||
`tv.tf_idf`::
|
`tv.tf_idf`::
|
||||||
If `true`, calculates TF / DF (ie: TF * IDF) for each term. Please note that this is a _literal_ calculation of "Term Frequency multiplied by Inverse Document Frequency" and *not* a classical TF-IDF similarity measure.
|
If `true`, calculates TF / DF (i.e., TF * IDF) for each term. Please note that this is a _literal_ calculation of "Term Frequency multiplied by Inverse Document Frequency" and *not* a classical TF-IDF similarity measure.
|
||||||
+
|
+
|
||||||
This parameter requires both `tv.tf` and `tv.df` to be "true". This can be computationally expensive. (The results are not shown in example output)
|
This parameter requires both `tv.tf` and `tv.df` to be "true". This can be computationally expensive. (The results are not shown in example output)
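To make the "literal" calculation concrete, here is a minimal Python sketch. It assumes the value is simply the term frequency divided by the document frequency, as the description states; `literal_tf_idf` is a hypothetical name and this is not the component's actual implementation.

```python
def literal_tf_idf(tf: int, df: int) -> float:
    """Term frequency multiplied by inverse document frequency (1 / df)."""
    if df <= 0:
        raise ValueError("document frequency must be positive")
    # No logarithms or length normalization: just TF * (1 / DF).
    return tf / df


# A term that occurs 3 times in this document and appears in 6 documents.
value = literal_tf_idf(3, 6)
```

A classical TF-IDF similarity would typically dampen both factors (for example with logarithms), which is why the two numbers should not be compared directly.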
|
||||||
|
|
||||||
|
|
|
@ -484,7 +484,10 @@ Simple tokenizer that splits the text stream on whitespace and returns sequences
|
||||||
|
|
||||||
*Factory class:* `solr.WhitespaceTokenizerFactory`
|
*Factory class:* `solr.WhitespaceTokenizerFactory`
|
||||||
|
|
||||||
*Arguments:* `rule` : Specifies how to define whitespace for the purpose of tokenization. Valid values:
|
*Arguments:*
|
||||||
|
|
||||||
|
`rule`::
|
||||||
|
Specifies how to define whitespace for the purpose of tokenization. Valid values:
|
||||||
|
|
||||||
* `java`: (Default) Uses https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-[Character.isWhitespace(int)]
|
* `java`: (Default) Uses https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html#isWhitespace-int-[Character.isWhitespace(int)]
|
||||||
* `unicode`: Uses Unicode's WHITESPACE property
|
* `unicode`: Uses Unicode's WHITESPACE property
|
||||||
|
|
|
@ -25,7 +25,7 @@ If you have JSON documents that you would like to index without transforming the
|
||||||
These parameters allow you to define how a JSON file should be read for multiple Solr documents.
|
These parameters allow you to define how a JSON file should be read for multiple Solr documents.
|
||||||
|
|
||||||
split::
|
split::
|
||||||
Defines the path at which to split the input JSON into multiple Solr documents and is required if you have multiple documents in a single JSON file. If the entire JSON makes a single solr document, the path must be “`/`”. It is possible to pass multiple split paths by separating them with a pipe `(|)` example : `split=/|/foo|/foo/bar`. If one path is a child of another, they automatically become a child document
|
Defines the path at which to split the input JSON into multiple Solr documents and is required if you have multiple documents in a single JSON file. If the entire JSON makes a single Solr document, the path must be “`/`”. It is possible to pass multiple `split` paths by separating them with a pipe `(|)`, for example: `split=/|/foo|/foo/bar`. If one path is a child of another, they automatically become a child document.
|
||||||
|
|
||||||
f::
|
f::
|
||||||
A multivalued mapping parameter. The format of the parameter is `target-field-name:json-path`. The `json-path` is required. The `target-field-name` is the Solr document field name, and is optional. If not specified, it is automatically derived from the input JSON. The default target field name is the fully qualified name of the field.
|
A multivalued mapping parameter. The format of the parameter is `target-field-name:json-path`. The `json-path` is required. The `target-field-name` is the Solr document field name, and is optional. If not specified, it is automatically derived from the input JSON. The default target field name is the fully qualified name of the field.
|
||||||
|
@ -179,8 +179,8 @@ A single asterisk `\*` maps only to direct children, and a double asterisk `\*\*
|
||||||
* `f=$FQN:/**`: maps all fields to the fully qualified name (`$FQN`) of the JSON field. The fully qualified name is obtained by concatenating all the keys in the hierarchy with a period (`.`) as a delimiter. This is the default behavior if no `f` path mappings are specified.
|
* `f=$FQN:/**`: maps all fields to the fully qualified name (`$FQN`) of the JSON field. The fully qualified name is obtained by concatenating all the keys in the hierarchy with a period (`.`) as a delimiter. This is the default behavior if no `f` path mappings are specified.
|
||||||
* `f=/docs/*`: maps all the fields under docs and in the name as given in json
|
* `f=/docs/*`: maps all the fields under docs and in the name as given in json
|
||||||
* `f=/docs/**`: maps all the fields under docs and its children in the name as given in json
|
* `f=/docs/**`: maps all the fields under docs and its children in the name as given in json
|
||||||
* `f=searchField:/docs/*` : maps all fields under /docs to a single field called ‘searchField’
|
* `f=searchField:/docs/*`: maps all fields under /docs to a single field called ‘searchField’
|
||||||
* `f=searchField:/docs/**` : maps all fields under /docs and its children to searchField
|
* `f=searchField:/docs/**`: maps all fields under /docs and its children to searchField
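The fully qualified name used by `$FQN` can be illustrated with a short Python sketch. It assumes only what the list above states (keys in the hierarchy joined with a period) and handles nested objects only; `fqn_fields` is a hypothetical helper, and how Solr treats arrays or split paths is not modeled here.

```python
def fqn_fields(node: dict, prefix: str = "") -> dict:
    """Flatten nested JSON objects into {fully.qualified.name: value}."""
    fields = {}
    for key, value in node.items():
        # Join the parent path and the current key with a period.
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            fields.update(fqn_fields(value, name))
        else:
            fields[name] = value
    return fields


flat = fqn_fields({"docs": {"title": "Solr", "author": {"name": "Ann"}}, "id": 1})
```

Here `flat` contains the keys `docs.title`, `docs.author.name`, and `id`.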
|
||||||
|
|
||||||
With wildcards we can further simplify our previous example as follows:
|
With wildcards we can further simplify our previous example as follows:
|
||||||
|
|
||||||
|
@ -348,7 +348,7 @@ With this example, the documents indexed would be, as follows:
|
||||||
|
|
||||||
. Schemaless mode: This handles field creation automatically. The field guessing may not be exactly as you expect, but it works. The best thing to do is to set up a local server in schemaless mode, index a few sample docs and create those fields in your real setup with proper field types before indexing
|
. Schemaless mode: This handles field creation automatically. The field guessing may not be exactly as you expect, but it works. The best thing to do is to set up a local server in schemaless mode, index a few sample docs and create those fields in your real setup with proper field types before indexing
|
||||||
. Pre-created Schema: Post your docs to the `/update/json/docs` endpoint with `echo=true`. This gives you the list of field names you need to create. Create the fields before you actually index
|
. Pre-created Schema: Post your docs to the `/update/json/docs` endpoint with `echo=true`. This gives you the list of field names you need to create. Create the fields before you actually index
|
||||||
. No schema, only full-text search : All you need to do is to do full-text search on your JSON. Set the configuration as given in the Setting JSON Defaults section.
|
. No schema, only full-text search: If all you need is full-text search on your JSON, set the configuration as given in the Setting JSON Defaults section.
|
||||||
|
|
||||||
== Setting JSON Defaults
|
== Setting JSON Defaults
|
||||||
|
|
||||||
|
|
|
@ -76,7 +76,7 @@ By default, values are returned as a String, but a "```t```" parameter can be sp
|
||||||
q=*:*&fl=id,my_number:[value v=42 t=int],my_string:[value v=42]
|
q=*:*&fl=id,my_number:[value v=42 t=int],my_string:[value v=42]
|
||||||
----
|
----
|
||||||
|
|
||||||
In addition to using these request parameters, you can configure additional named instances of ValueAugmenterFactory, or override the default behavior of the existing `[value]` transformer in your solrconfig.xml file:
|
In addition to using these request parameters, you can configure additional named instances of ValueAugmenterFactory, or override the default behavior of the existing `[value]` transformer in your `solrconfig.xml` file:
|
||||||
|
|
||||||
[source,xml]
|
[source,xml]
|
||||||
----
|
----
|
||||||
|
@ -264,7 +264,7 @@ fl=...*,categories:[subquery]&categories.fl=*&categories.q=...
|
||||||
|
|
||||||
==== Subquery Parameters Shift
|
==== Subquery Parameters Shift
|
||||||
|
|
||||||
If subquery is declared as `fl=*,foo:[subquery]`, subquery parameters are prefixed with the given name and period. eg
|
If a subquery is declared as `fl=*,foo:[subquery]`, subquery parameters are prefixed with the given name and period. For example:
|
||||||
|
|
||||||
`q=*:*&fl=*,**foo**:[subquery]&**foo.**q=to be continued&**foo.**rows=10&**foo.**sort=id desc`
|
`q=*:*&fl=*,**foo**:[subquery]&**foo.**q=to be continued&**foo.**rows=10&**foo.**sort=id desc`
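The prefixing rule can also be applied programmatically when building a request. The following Python sketch assumes nothing beyond the rule stated above; `subquery_params` is a hypothetical helper, not a Solr or SolrJ API.

```python
from urllib.parse import urlencode


def subquery_params(name: str, **params) -> dict:
    """Prefix each subquery parameter with the transformer name and a period."""
    return {f"{name}.{key}": value for key, value in params.items()}


# Main request parameters, declaring a subquery named "foo".
params = {"q": "*:*", "fl": "*,foo:[subquery]"}
# Parameters for the "foo" subquery become foo.q, foo.rows, foo.sort.
params.update(subquery_params("foo", q="to be continued", rows=10, sort="id desc"))

query_string = urlencode(params)
```

`query_string` is the URL-encoded form of the example request, so reserved characters such as spaces and brackets are escaped.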
|
||||||
|
|
||||||
|
@ -272,15 +272,15 @@ If subquery is declared as `fl=*,foo:[subquery]`, subquery parameters are prefix
|
||||||
|
|
||||||
It's necessary to pass some document field values as parameters for the subquery. This is supported via the implicit *`row.__fieldname__`* parameter, which can be referenced (though not exclusively) via Local Parameters syntax: `q=name:john&fl=name,id,depts:[subquery]&depts.q={!terms f=id **v=$row.dept_id**}&depts.rows=10`
|
It's necessary to pass some document field values as parameters for the subquery. This is supported via the implicit *`row.__fieldname__`* parameter, which can be referenced (though not exclusively) via Local Parameters syntax: `q=name:john&fl=name,id,depts:[subquery]&depts.q={!terms f=id **v=$row.dept_id**}&depts.rows=10`
|
||||||
|
|
||||||
Here departmens are retrieved per every employee in search result. We can say that it's like SQL `join ON emp.dept_id=dept.id`.
|
Here departments are retrieved for every employee in the search results. We can say that it's like SQL `join ON emp.dept_id=dept.id`.
|
||||||
|
|
||||||
Note, when document field has multiple values they are concatenated with comma by default, it can be changed by local parameter `foo:[subquery separator=' ']` , this mimics *`{!terms}`* to work smoothly with it.
|
Note that when a document field has multiple values, they are concatenated with a comma by default. This can be changed with the local parameter `foo:[subquery separator=' ']`, which mimics *`{!terms}`* so that the two work smoothly together.
|
||||||
|
|
||||||
To log substituted subquery request parameters, add the corresponding parameter names, as in `depts.logParamsList=q,fl,rows,**row.dept_id**`
|
To log substituted subquery request parameters, add the corresponding parameter names, as in `depts.logParamsList=q,fl,rows,**row.dept_id**`
|
||||||
|
|
||||||
==== Cores and Collections in SolrCloud
|
==== Cores and Collections in SolrCloud
|
||||||
|
|
||||||
Use `foo:[subquery fromIndex=departments]` to invoke subquery on another core on the same node, it's what *`{!join}`* does for non-SolrCloud mode. But in case of SolrCloud just (and only) explicitly specify its' native parameters like `collection, shards` for subquery, eg:
|
Use `foo:[subquery fromIndex=departments]` to invoke a subquery on another core on the same node; this is what *`{!join}`* does in non-SolrCloud mode. In SolrCloud, however, just (and only) explicitly specify its native parameters like `collection, shards` for the subquery, e.g.:
|
||||||
|
|
||||||
`q=*:*&fl=*,foo:[subquery]&foo.q=cloud&**foo.collection**=departments`
|
`q=*:*&fl=*,foo:[subquery]&foo.q=cloud&**foo.collection**=departments`
|
||||||
|
|
||||||
|
|
|
@ -331,7 +331,7 @@ These factories all provide functionality to _modify_ fields in a document as th
|
||||||
|
|
||||||
{solr-javadocs}/solr-core/org/apache/solr/update/processor/RegexReplaceProcessorFactory.html[RegexReplaceProcessorFactory]:: An update processor that applies a configured regex to any CharSequence values found in the selected fields, and replaces any matches with the configured replacement string.
|
{solr-javadocs}/solr-core/org/apache/solr/update/processor/RegexReplaceProcessorFactory.html[RegexReplaceProcessorFactory]:: An update processor that applies a configured regex to any CharSequence values found in the selected fields, and replaces any matches with the configured replacement string.
|
||||||
|
|
||||||
{solr-javadocs}/solr-core/org/apache/solr/update/processor/RemoveBlankFieldUpdateProcessorFactory.html[RemoveBlankFieldUpdateProcessorFactory]:: Removes any values found which are CharSequence with a length of 0. (ie: empty strings).
|
{solr-javadocs}/solr-core/org/apache/solr/update/processor/RemoveBlankFieldUpdateProcessorFactory.html[RemoveBlankFieldUpdateProcessorFactory]:: Removes any values found which are CharSequence with a length of 0 (i.e., empty strings).
|
||||||
|
|
||||||
{solr-javadocs}/solr-core/org/apache/solr/update/processor/TrimFieldUpdateProcessorFactory.html[TrimFieldUpdateProcessorFactory]:: Trims leading and trailing whitespace from any CharSequence values found in fields matching the specified conditions.
|
{solr-javadocs}/solr-core/org/apache/solr/update/processor/TrimFieldUpdateProcessorFactory.html[TrimFieldUpdateProcessorFactory]:: Trims leading and trailing whitespace from any CharSequence values found in fields matching the specified conditions.
|
||||||
|
|
||||||
|
|
|
@ -398,7 +398,7 @@ curl 'http://localhost:8983/solr/my_collection/update?commit=true' --data-binary
|
||||||
|
|
||||||
=== CSV Update Parameters
|
=== CSV Update Parameters
|
||||||
|
|
||||||
The CSV handler allows the specification of many parameters in the URL in the form: `f._parameter_._optional_fieldname_=_value_` .
|
The CSV handler allows the specification of many parameters in the URL in the form: `f._parameter_._optional_fieldname_=_value_`.
|
||||||
|
|
||||||
The table below describes the parameters for the update handler.
|
The table below describes the parameters for the update handler.
|
||||||
|
|
||||||
|
|
|
@ -77,7 +77,7 @@ You can also use `bin/post` to send a PDF file into Solr (without the params, th
|
||||||
bin/post -c techproducts example/exampledocs/solr-word.pdf -params "literal.id=a"
|
bin/post -c techproducts example/exampledocs/solr-word.pdf -params "literal.id=a"
|
||||||
----
|
----
|
||||||
|
|
||||||
Now you should be able to execute a query and find that document. You can make a request like `\http://localhost:8983/solr/techproducts/select?q=pdf` .
|
Now you should be able to execute a query and find that document. You can make a request like `\http://localhost:8983/solr/techproducts/select?q=pdf`.
|
||||||
|
|
||||||
You may notice that although the content of the sample document has been indexed and stored, there are not a lot of metadata fields associated with this document. This is because unknown fields are ignored according to the default parameters configured for the `/update/extract` handler in `solrconfig.xml`, and this behavior can be easily changed or overridden. For example, to store and see all metadata and content, execute the following:
|
You may notice that although the content of the sample document has been indexed and stored, there are not a lot of metadata fields associated with this document. This is because unknown fields are ignored according to the default parameters configured for the `/update/extract` handler in `solrconfig.xml`, and this behavior can be easily changed or overridden. For example, to store and see all metadata and content, execute the following:
|
||||||
|
|
||||||
|
@ -208,7 +208,7 @@ The `tika.config` entry points to a file containing a Tika configuration. The `d
|
||||||
|
|
||||||
=== Parser-Specific Properties
|
=== Parser-Specific Properties
|
||||||
|
|
||||||
Parsers used by Tika may have specific properties to govern how data is extracted. For instance, when using the Tika library from a Java program, the PDFParserConfig class has a method setSortByPosition(boolean) that can extract vertically oriented text. To access that method via configuration with the ExtractingRequestHandler, one can add the parseContext.config property to the solrconfig.xml file (see above) and then set properties in Tika's PDFParserConfig as below. Consult the Tika Java API documentation for configuration parameters that can be set for any particular parsers that require this level of control.
|
Parsers used by Tika may have specific properties to govern how data is extracted. For instance, when using the Tika library from a Java program, the PDFParserConfig class has a method `setSortByPosition(boolean)` that can extract vertically oriented text. To access that method via configuration with the ExtractingRequestHandler, one can add the `parseContext.config` property to the `solrconfig.xml` file (see above) and then set properties in Tika's PDFParserConfig as below. Consult the Tika Java API documentation for configuration parameters that can be set for any particular parsers that require this level of control.
|
||||||
|
|
||||||
[source,xml]
|
[source,xml]
|
||||||
----
|
----
|
||||||
|
|
|
@ -968,7 +968,7 @@ groupNames::
|
||||||
A comma separated list of field column names, used where the regex contains groups and each group is to be saved to a different field. If some groups are not to be named leave a space between commas.
|
A comma separated list of field column names, used where the regex contains groups and each group is to be saved to a different field. If some groups are not to be named leave a space between commas.
|
||||||
|
|
||||||
replaceWith::
|
replaceWith::
|
||||||
Used along with regex . It is equivalent to the method `new String(<sourceColVal>).replaceAll(<regex>, <replaceWith>)`.
|
Used along with regex. It is equivalent to the method `new String(<sourceColVal>).replaceAll(<regex>, <replaceWith>)`.
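
The equivalence stated above can be sketched in plain Java. This is only an illustration of `String.replaceAll` semantics, not the transformer's actual code; the class name `ReplaceWithSketch`, the helper `applyReplaceWith`, and the sample values are hypothetical.

```java
public class ReplaceWithSketch {
    // Hypothetical helper mirroring the documented equivalence:
    // new String(sourceColVal).replaceAll(regex, replaceWith)
    static String applyReplaceWith(String sourceColVal, String regex, String replaceWith) {
        return sourceColVal.replaceAll(regex, replaceWith);
    }

    public static void main(String[] args) {
        // Collapse runs of whitespace into a single space.
        System.out.println(applyReplaceWith("a  b   c", "\\s+", " "));
        // Capture groups can be referenced in the replacement as $1, $2, ...
        System.out.println(applyReplaceWith("Doe, Jane", "(\\w+), (\\w+)", "$2 $1"));
    }
}
```

Because `replaceAll` treats `$` and `\` specially in the replacement string, a literal dollar sign in `replaceWith` would need escaping under this reading.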
|
||||||
|
|
||||||
Here is an example of configuring the regex transformer:
|
Here is an example of configuring the regex transformer:
|
||||||
|
|
||||||
|
|
|
@ -61,13 +61,13 @@ To limit the introspect output to include just one particular HTTP method, add r
|
||||||
|
|
||||||
`\http://localhost:8983/api/c/_introspect?method=POST`
|
`\http://localhost:8983/api/c/_introspect?method=POST`
|
||||||
|
|
||||||
Most endpoints support commands provided in a body sent via POST. To limit the introspect output to only one command, add request param `command=_command-name_` .
|
Most endpoints support commands provided in a body sent via POST. To limit the introspect output to only one command, add request param `command=_command-name_`.
|
||||||
|
|
||||||
`\http://localhost:8983/api/c/gettingstarted/_introspect?method=POST&command=modify`
|
`\http://localhost:8983/api/c/gettingstarted/_introspect?method=POST&command=modify`
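
As a rough sketch, the `method` and `command` request parameters described above could be appended to an introspect URL as follows. The class `IntrospectUrlSketch` and the helper `introspectUrl` are hypothetical names used for illustration; only the URL shape and the two parameter names come from the text. The sketch assumes Java 10 or later for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class IntrospectUrlSketch {
    // Hypothetical helper: appends /_introspect plus the optional
    // method and command request parameters to an API path.
    static String introspectUrl(String apiPath, String method, String command) {
        StringBuilder url = new StringBuilder(apiPath).append("/_introspect");
        String separator = "?";
        if (method != null) {
            url.append(separator).append("method=")
               .append(URLEncoder.encode(method, StandardCharsets.UTF_8));
            separator = "&";
        }
        if (command != null) {
            url.append(separator).append("command=")
               .append(URLEncoder.encode(command, StandardCharsets.UTF_8));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Limit the introspect output to the POST method and the modify command.
        System.out.println(introspectUrl(
            "http://localhost:8983/api/c/gettingstarted", "POST", "modify"));
    }
}
```

Passing `null` for either argument leaves that filter out, so the same helper covers the method-only and unfiltered forms.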
|
||||||
|
|
||||||
=== Interpreting the Introspect Output
|
=== Interpreting the Introspect Output
|
||||||
|
|
||||||
Example : `\http://localhost:8983/api/c/gettingstarted/get/_introspect`
|
Example: `\http://localhost:8983/api/c/gettingstarted/get/_introspect`
|
||||||
|
|
||||||
[source,json]
|
[source,json]
|
||||||
----
|
----
|
||||||
|
@ -95,12 +95,12 @@ Example : `\http://localhost:8983/api/c/gettingstarted/get/_introspect`
|
||||||
|
|
||||||
Description of some of the keys in the above example:
|
Description of some of the keys in the above example:
|
||||||
|
|
||||||
* `**documentation**` : URL to the online Solr reference guide section for this API
|
* `**documentation**`: URL to the online Solr reference guide section for this API
|
||||||
* `**description**` : A text description of the feature/variable/command etc.
|
* `**description**`: A text description of the feature/variable/command etc.
|
||||||
* `**spec/methods**` : HTTP methods supported by this API
|
* `**spec/methods**`: HTTP methods supported by this API
|
||||||
* `**spec/url/paths**` : URL paths supported by this API
|
* `**spec/url/paths**`: URL paths supported by this API
|
||||||
* `**spec/url/params**` : List of supported URL request params
|
* `**spec/url/params**`: List of supported URL request params
|
||||||
* `**availableSubPaths**` : List of valid URL subpaths and the HTTP method(s) each supports
|
* `**availableSubPaths**`: List of valid URL subpaths and the HTTP method(s) each supports
|
||||||
|
|
||||||
Example of introspect for a POST API: `\http://localhost:8983/api/c/gettingstarted/_introspect?method=POST&command=modify`
|
Example of introspect for a POST API: `\http://localhost:8983/api/c/gettingstarted/_introspect?method=POST&command=modify`
|
||||||
|
|
||||||
|
|
|
@ -38,7 +38,7 @@ Its JAR and dependencies must be added (via `<lib>` or solr/home lib inclusion),
|
||||||
</queryResponseWriter>
|
</queryResponseWriter>
|
||||||
----
|
----
|
||||||
|
|
||||||
The above example shows the optional initialization and custom tool parameters used by VelocityResponseWriter; these are detailed in the following table. These initialization parameters are only specified in the writer registration in solrconfig.xml, not as request-time parameters. See further below for request-time parameters.
|
The above example shows the optional initialization and custom tool parameters used by VelocityResponseWriter; these are detailed in the following table. These initialization parameters are only specified in the writer registration in `solrconfig.xml`, not as request-time parameters. See further below for request-time parameters.
|
||||||
|
|
||||||
== Configuration & Usage
|
== Configuration & Usage
|
||||||
|
|
||||||
|
|
|
@ -165,7 +165,7 @@ name ::= text
|
||||||
value ::= text
|
value ::= text
|
||||||
----
|
----
|
||||||
|
|
||||||
Special characters in "text" values can be escaped using the escape character `\` . The following escape sequences are recognized:
|
Special characters in "text" values can be escaped using the escape character `\`. The following escape sequences are recognized:
|
||||||
|
|
||||||
[width="60%",options="header",]
|
[width="60%",options="header",]
|
||||||
|===
|
|===
|
||||||
|
|