mirror of https://github.com/apache/lucene.git
Ref Guide: copy edits for 8.8 release
parent 90aabbdde8
commit 30aa0f5ba4
@@ -45,7 +45,7 @@ This plugin provides no additional checking beyond what has been configured via
 This plugin will configure the user principal for the request based on the X500 subject present in the client certificate.
 Authorization plugins will need to accept and handle the full subject name, for example:
 
-[source]
+[source,text]
 ----
 CN=Solr User,OU=Engineering,O=Example Inc.,C=US
 ----
@@ -58,4 +58,3 @@ It is best practice to verify the actual contents of certificates issued by your
 
 With certificate authentication enabled, all client requests must include a valid certificate.
 This is identical to the <<enabling-ssl.adoc#example-client-actions,client requirements>> when using SSL.
-
@@ -140,7 +140,7 @@ For example, if the Solr node is running behind a proxy or in a cloud environmen
 `hostPort` is the port that the Solr instance wants other nodes to contact it at.
 +
 In the default `solr.xml` file, this is set to `${solr.port.advertise:0}`.
-If no port is passed via the `solr.xml` (i.e. `0`), then Solr will default to the port that jetty is listening on, defined by `${jetty.port}`.
+If no port is passed via the `solr.xml` (i.e., `0`), then Solr will default to the port that jetty is listening on, defined by `${jetty.port}`.
 
 `leaderVoteWait`::
 When SolrCloud is starting up, how long each Solr node will wait for all known replicas for that shard to be found before assuming that any nodes that haven't reported are down.
@@ -227,7 +227,7 @@ By default, the Unified Highlighter will usually pick the right offset source (s
 The offset source can be explicitly configured to one of: `ANALYSIS`, `POSTINGS`, `POSTINGS_WITH_TERM_VECTORS`, or `TERM_VECTORS`.
 
 `hl.fragAlignRatio`::
-This parameter influences where the first match (i.e. highlighted text) in a passage is positioned.
+This parameter influences where the first match (i.e., highlighted text) in a passage is positioned.
 The default value of `0.5` means to align the match to the middle.
 A value of `0.0` means to align the match to the left, while `1.0` to align it to the right.
 This setting is a best-effort hint, as there are a variety of factors.
@@ -188,42 +188,42 @@ include::{example-source-dir}JsonRequestApiTest.java[tag=solrj-json-terms-facet-
 [width="100%",cols="20%,90%",options="header",]
 |===
 |Parameter |Description
-|field |The field name to facet over.
-|offset |Used for paging, this skips the first N buckets. Defaults to 0.
-|limit |Limits the number of buckets returned. Defaults to 10.
-|sort |Specifies how to sort the buckets produced.
+|`field` |The field name to facet over.
+|`offset` |Used for paging, this skips the first N buckets. Defaults to 0.
+|`limit` |Limits the number of buckets returned. Defaults to 10.
+|`sort` |Specifies how to sort the buckets produced.
 
-“count” specifies document count, “index” sorts by the index (natural) order of the bucket value. One can also sort by any <<json-facet-api.adoc#stat-facet-functions,facet function / statistic>> that occurs in the bucket. The default is “count desc”. This parameter may also be specified in JSON like `sort:{count:desc}`. The sort order may either be “asc” or “desc”
-|overrequest a|
+`count` specifies document count, `index` sorts by the index (natural) order of the bucket value. One can also sort by any <<json-facet-api.adoc#stat-facet-functions,facet function / statistic>> that occurs in the bucket. The default is `count desc`. This parameter may also be specified in JSON like `sort:{count:desc}`. The sort order may either be “asc” or “desc”
+|`overrequest` a|
 Number of buckets beyond the `limit` to internally request from shards during a distributed search.
 
 Larger values can increase the accuracy of the final "Top Terms" returned when the individual shards have very different top terms.
 
 The default of `-1` causes a heuristic to be applied based on the other options specified.
-|refine |If `true`, turns on distributed facet refining. This uses a second phase to retrieve any buckets needed for the final result from shards that did not include those buckets in their initial internal results, so that every shard contributes to every returned bucket in this facet and any sub-facets. This makes counts & stats for returned buckets exact.
-|overrefine a|
+|`refine` |If `true`, turns on distributed facet refining. This uses a second phase to retrieve any buckets needed for the final result from shards that did not include those buckets in their initial internal results, so that every shard contributes to every returned bucket in this facet and any sub-facets. This makes counts & stats for returned buckets exact.
+|`overrefine` a|
 Number of buckets beyond the `limit` to consider internally during a distributed search when determining which buckets to refine.
 
 Larger values can increase the accuracy of the final "Top Terms" returned when the individual shards have very different top terms, and the current `sort` option can result in refinement pushing terms lower down the sorted list (ex: `sort:"count asc"`)
 
 The default of `-1` causes a heuristic to be applied based on other options specified.
-|mincount |Only return buckets with a count of at least this number. Defaults to 1.
-|missing |A boolean that specifies if a special “missing” bucket should be returned that is defined by documents without a value in the field. Defaults to false.
-|numBuckets |A boolean. If true, adds “numBuckets” to the response, an integer representing the number of buckets for the facet (as opposed to the number of buckets returned). Defaults to false.
-|allBuckets |A boolean. If true, adds an “allBuckets” bucket to the response, representing the union of all of the buckets. For multi-valued fields, this is different than a bucket for all of the documents in the domain since a single document can belong to multiple buckets. Defaults to false.
-|prefix |Only produce buckets for terms starting with the specified prefix.
-|facet |Aggregations, metrics or nested facets that will be calculated for every returned bucket
-|method a|
+|`mincount` |Only return buckets with a count of at least this number. Defaults to `1`.
+|`missing` |A boolean that specifies if a special “missing” bucket should be returned that is defined by documents without a value in the field. Defaults to `false`.
+|`numBuckets` |A boolean. If `true`, adds “numBuckets” to the response, an integer representing the number of buckets for the facet (as opposed to the number of buckets returned). Defaults to `false`.
+|`allBuckets` |A boolean. If `true`, adds an “allBuckets” bucket to the response, representing the union of all of the buckets. For multi-valued fields, this is different than a bucket for all of the documents in the domain since a single document can belong to multiple buckets. Defaults to `false`.
+|`prefix` |Only produce buckets for terms starting with the specified prefix.
+|`facet` |Aggregations, metrics or nested facets that will be calculated for every returned bucket
+|`method` a|
 This parameter indicates the facet algorithm to use:
 
-* "dv" DocValues, collect into ordinal array
-* "uif" UnInvertedField, collect into ordinal array
-* "dvhash" DocValues, collect into hash - improves efficiency over high cardinality fields
-* "enum" TermsEnum then intersect DocSet (stream-able)
-* "stream" Presently equivalent to "enum" - used for indexed, non-point fields with sort 'index asc' and allBuckets, numBuckets, missing disabled.
-* "smart" Pick the best method for the field type (this is the default)
+* `dv` DocValues, collect into ordinal array
+* `uif` UnInvertedField, collect into ordinal array
+* `dvhash` DocValues, collect into hash - improves efficiency over high cardinality fields
+* `enum` TermsEnum then intersect DocSet (stream-able)
+* `stream` Presently equivalent to `enum`. Used for indexed, non-point fields with sort `index asc` and `allBuckets`, `numBuckets`, and `missing` disabled.
+* `smart` Pick the best method for the field type (this is the default)
 
-|prelim_sort |An optional parameter for specifying an approximation of the final `sort` to use during initial collection of top buckets when the <<json-facet-api.adoc#sorting-facets-by-nested-functions,`sort` parameter is very costly>>.
+|`prelim_sort` |An optional parameter for specifying an approximation of the final `sort` to use during initial collection of top buckets when the <<json-facet-api.adoc#sorting-facets-by-nested-functions,`sort` parameter is very costly>>.
 |===
 
 === Query Facet
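Reviewer note on the terms facet table in the preceding hunk: the parameters it documents could be assembled into a request body roughly as sketched below. This is only an illustration, not code from the commit. The helper name `terms_facet_request`, the facet label `top_terms`, and the field name `cat` are hypothetical; the `"type": "terms"` key and the top-level `query` and `facet` keys follow the JSON Request API as I understand it; the `refine` default of `False` is an assumption, since the table does not state one.

```python
import json


def terms_facet_request(field, limit=10, offset=0, sort="count desc",
                        refine=False, mincount=1):
    """Build a JSON request body containing one terms facet (hypothetical helper).

    The limit, offset, sort, and mincount defaults mirror the table;
    the refine default is an assumption made for this sketch.
    """
    facet = {
        "type": "terms",      # assumed facet type key for a terms facet
        "field": field,       # the field name to facet over
        "limit": limit,       # number of buckets returned
        "offset": offset,     # skip the first N buckets (paging)
        "sort": sort,         # how to sort the buckets produced
        "refine": refine,     # distributed facet refining on or off
        "mincount": mincount, # only buckets with at least this count
    }
    # "top_terms" is an arbitrary label chosen for this sketch.
    return {"query": "*:*", "facet": {"top_terms": facet}}


body = terms_facet_request("cat", limit=5, refine=True)
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the request body; parameters left at their defaults could also simply be omitted.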
@@ -290,7 +290,7 @@ The output will include the model picked for each search result, resembling the
 }}
 ----
 
-=== Running a Rerank Query Interleaving a model with the original ranking
+=== Running a Rerank Query Interleaving a Model with the Original Ranking
 When approaching Search Quality Evaluation with interleaving it may be useful to compare a model with the original ranking.
 To rerank the results of a query, interleaving a model with the original ranking, add the `rq` parameter to your search, passing the special inbuilt `_OriginalRanking_` model identifier as one model and your comparison model as the other model, for example:
 
@@ -329,7 +329,7 @@ The output will include the model picked for each search result, resembling the
 }}
 ----
 
-=== Running a Rerank Query with Interleaving passing a specific algorithm
+=== Running a Rerank Query with Interleaving Passing a Specific Algorithm
 To rerank the results of a query, interleaving two models using a specific algorithm, add the `interleavingAlgorithm` local parameter to the ltr query parser, for example:
 
 [source,text]
@@ -108,7 +108,7 @@ _(raw; not yet edited)_
 
 * SOLR-11775: Return long value for facet count in Json Facet module irrespective of number of shards (hossman, Munendra S N)
 
-* SOLR-12823: Remove /clusterstate.json support, i.e. support for collections created with stateFormat=1 as well as support
+* SOLR-12823: Remove /clusterstate.json support, i.e., support for collections created with stateFormat=1 as well as support
 for Collection API MIGRATESTATEFORMAT action. Also removes support for cluster property `legacyCloud` (as if always false now).
 
 * SOLR-14656: Autoscaling framework removed
@@ -97,7 +97,7 @@ The metrics available in your system can be customized by modifying the `<metric
 
 TIP: See also the section <<format-of-solr-xml.adoc#format-of-solr-xml,Format of Solr.xml>> for more information about the `solr.xml` file, where to find it, and how to edit it.
 
-=== Disabling the metrics collection ===
+=== Disabling the Metrics Collection
 The `<metrics>` element in `solr.xml` supports one attribute `enabled`, which takes a boolean value,
 for example `<metrics enabled="true">`.
 
@@ -118,7 +118,7 @@ The Solr's metrics exposed by `solr-exporter` can be seen at: `\http://localhost
 
 === Environment Variable Options
 
-The bin scripts provided with the Prometheus Exporter support the use of custom java options through the following environment variables:
+The `./bin` scripts provided with the Prometheus Exporter support the use of custom java options through the following environment variables:
 
 `JAVA_HEAP`::
 Sets the initial (`Xms`) and max (`Xmx`) Java heap size. The default is `512m`.
@@ -133,13 +133,13 @@ Custom Java garbage collection settings. The default is `-XX:+UseG1GC`.
 Extra JVM options.
 
 `ZK_CREDS_AND_ACLS`::
-Credentials for connecting to a ZK Host that is protected with ACLs.
-For more information on what to include in this variable, refer to the <<zookeeper-access-control.adoc#zookeeper-acls-in-solr-scripts,Solr ZK ACL docs>> or the <<#getting-metrics-from-a-secured-solrcloud,example below>>.
+Credentials for connecting to a ZooKeeper host that is protected with ACLs.
+For more information on what to include in this variable, refer to the section <<zookeeper-access-control.adoc#zookeeper-acls-in-solr-scripts,ZooKeeper Access Control>> or the <<getting-metrics-from-a-secured-solrcloud,example below>>.
 
 `CLASSPATH_PREFIX`::
 Location of extra libraries to load when starting the `solr-exporter`.
 
-All <<#command-line-parameters,command line parameters>> are able to be provided via environment variables when using the bin scripts.
+All <<#command-line-parameters,command line parameters>> are able to be provided via environment variables when using the `./bin` scripts.
 
 === Getting Metrics from a Secured SolrCloud
 
@@ -16,7 +16,6 @@
 // specific language governing permissions and limitations
 // under the License.
 
-
 In most search applications, the "top" matching results (sorted by score, or some other criteria) are displayed to some human user.
 
 In many applications the UI for these sorted results are displayed to the user in "pages" containing a fixed number of matching results, and users don't typically look at results past the first few pages worth of results.
@@ -97,9 +96,15 @@ There are a few important constraints to be aware of when using `cursorMark` par
 
 . `cursorMark` and `start` are mutually exclusive parameters.
 * Your requests must either not include a `start` parameter, or it must be specified with a value of "```0```".
-. When using the <<common-query-parameters.adoc#timeallowed-parameter,`timeAllowed` request param>>, partial results may be returned. If time expires before the search is complete - as indicated when the `responseHeader` includes `"partialResults": true`, some matching documents may have been skipped. Additionally, if `cursorMark` matches `nextCursorMark`, you cannot be sure that there are no more results. In these situation, consider increasing `timeAllowed` and reissuing the query. When the `responseHeader` no longer includes `"partialResults": true` and `cursorMark` matches `nextCursorMark`, there are no more results.
+. When using the <<common-query-parameters.adoc#timeallowed-parameter,`timeAllowed`>> request parameter, partial results may be returned.
+If time expires before the search is complete, as indicated when the `responseHeader` includes `"partialResults": true`, some matching documents may have been skipped.
+Additionally, if `cursorMark` matches `nextCursorMark`, you cannot be sure that there are no more results.
++
+In this situation, consider increasing `timeAllowed` and reissuing the query.
+When the `responseHeader` no longer includes `"partialResults": true`, and `cursorMark` matches `nextCursorMark`, there are no more results.
 . `sort` clauses must include the uniqueKey field (either `asc` or `desc`).
-* If `id` is your uniqueKey field, then sort parameters like `id asc` and `name asc, id desc` would both work fine, but `name asc` by itself would not
++
+If `id` is your uniqueKey field, then sort parameters like `id asc` and `name asc, id desc` would both work fine, but `name asc` by itself would not
 . Sorts including <<working-with-dates.adoc#working-with-dates,Date Math>> based functions that involve calculations relative to `NOW` will cause confusing results, since every document will get a new sort value on every subsequent request. This can easily result in cursors that never end, and constantly return the same documents over and over – even if the documents are never updated.
 +
 In this situation, choose & re-use a fixed value for the <<working-with-dates.adoc#now,`NOW` request param>> in all of your cursor requests.
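Reviewer note on the `cursorMark` constraints in the preceding hunk: a client loop that respects them might look roughly like the sketch below. It is an illustration under stated assumptions, not code from the commit. The `fetch` callable is a hypothetical stand-in for whatever HTTP client sends the parameters to Solr and returns the parsed JSON response. The sort check is deliberately naive (it splits on commas, so function sorts containing commas would confuse it). The starting cursor value `*` is taken from Solr's cursor documentation rather than from this hunk.

```python
def iterate_cursor(fetch, query, sort, unique_key="id", rows=100):
    """Yield all documents for `query` by following cursorMark.

    `fetch(params) -> dict` is a hypothetical callable returning the parsed
    Solr JSON response; it is an assumption of this sketch.
    """
    # Constraint: sort clauses must include the uniqueKey field.
    fields = [clause.split()[0] for clause in sort.split(",") if clause.strip()]
    if unique_key not in fields:
        raise ValueError(f"sort must include the uniqueKey field {unique_key!r}")

    cursor_mark = "*"  # initial cursor value
    while True:
        # Constraint: no `start` parameter is sent alongside cursorMark.
        params = {"q": query, "sort": sort, "rows": rows,
                  "cursorMark": cursor_mark}
        response = fetch(params)

        # Constraint: with partial results, matching documents may have been
        # skipped, so an unchanged cursor proves nothing. Give up loudly.
        if response.get("responseHeader", {}).get("partialResults"):
            raise RuntimeError(
                "partial results; consider increasing timeAllowed and reissuing")

        yield from response["response"]["docs"]

        next_mark = response["nextCursorMark"]
        if next_mark == cursor_mark:
            return  # cursor did not advance: there are no more results
        cursor_mark = next_mark
```

Raising on `partialResults` is one possible policy; a caller could instead retry the same `cursorMark` with a larger `timeAllowed`, as the text suggests.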
@@ -50,7 +50,7 @@ If this assumption is false, Solr will do a cheap check that usually detects the
 throw an exception to alert you of the need to specify the Root ID.
 This backwards incompatible change was done to increase performance and robustness.
 *** This feature no longer requires stored=true or docValues=true on the `\_root_` field. You might
-have it for other purposes though (e.g. for `uniqueBlock(...)`)
+have it for other purposes though (e.g., for `uniqueBlock(...)`)
 *** This feature no longer requires the `\_nest_path_` field, although you probably ought to
 continue to define it as it's useful for other things.
 
@@ -130,7 +130,7 @@ Solr offers two solutions to address this:
 Furthermore, you _should_ (sometimes _must_) specify the Root document's ID in the `\_root_`
 field of this partial update. This is how Solr understands that you are updating a child
 document, and not a Root document. Without it, Solr only guesses that the `\_route_` param is
-equivalent, but it may be absent or not equivalent (e.g. when using the `implicit` router).
+equivalent, but it may be absent or not equivalent (e.g., when using the `implicit` router).
 
 All of the examples below use `id` prefixes, so no `\_route_` param will be necessary for these examples.
 ====