From bbae31b9d22d0f96810edbbe8898bb3016f0c253 Mon Sep 17 00:00:00 2001
From: Cassandra Targett
Date: Fri, 14 Jul 2017 16:33:50 -0500
Subject: [PATCH] SOLR-11050, SOLR-10892: last set of Confluence-style anchors removed and parameter tables obliterated

---
 .../src/collapse-and-expand-results.adoc | 2 +-
 .../src/common-query-parameters.adoc | 145 +++----
 .../src/configuring-logging.adoc | 2 +-
 solr/solr-ref-guide/src/defining-fields.adoc | 2 +-
 solr/solr-ref-guide/src/docvalues.adoc | 6 +-
 solr/solr-ref-guide/src/faceting.adoc | 390 ++++++------------
 ...field-type-definitions-and-properties.adoc | 6 +-
 solr/solr-ref-guide/src/function-queries.adoc | 2 +-
 .../src/indexconfig-in-solrconfig.adoc | 2 +-
 solr/solr-ref-guide/src/query-re-ranking.adoc | 2 +-
 solr/solr-ref-guide/src/realtime-get.adoc | 2 +-
 ...rs-and-searchcomponents-in-solrconfig.adoc | 4 +-
 solr/solr-ref-guide/src/spell-checking.adoc | 188 +++------
 .../src/the-dismax-query-parser.adoc | 58 +--
 .../src/the-extended-dismax-query-parser.adoc | 88 ++--
 .../src/the-standard-query-parser.adoc | 96 +----
 .../src/the-stats-component.adoc | 2 +-
 17 files changed, 307 insertions(+), 690 deletions(-)

diff --git a/solr/solr-ref-guide/src/collapse-and-expand-results.adoc b/solr/solr-ref-guide/src/collapse-and-expand-results.adoc
index 0c0bbd10033..3a99897967f 100644
--- a/solr/solr-ref-guide/src/collapse-and-expand-results.adoc
+++ b/solr/solr-ref-guide/src/collapse-and-expand-results.adoc
@@ -44,7 +44,7 @@ At most only one of the `min`, `max`, or `sort` (see below) parameters may be sp
 If none are specified, the group head document of each group will be selected based on the highest scoring document in that group. The default is none.

 sort::
-Selects the group head document for each group based on which document comes first according to the specified <>.
+Selects the group head document for each group based on which document comes first according to the specified <>.
 +
 At most only one of the `min`, `max`, (see above) or `sort` parameters may be specified.
 +
diff --git a/solr/solr-ref-guide/src/common-query-parameters.adoc b/solr/solr-ref-guide/src/common-query-parameters.adoc
index 1eea0807d46..08ce788f740 100644
--- a/solr/solr-ref-guide/src/common-query-parameters.adoc
+++ b/solr/solr-ref-guide/src/common-query-parameters.adoc
@@ -20,33 +20,9 @@

 Several query parsers share supported query parameters.

-The table below summarizes Solr's common query parameters, which are supported by the <>
+The following sections describe Solr's common query parameters, which are supported by the <>.

-// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed
-
-[cols="30,70",options="header"]
-|===
-|Parameter |Description
-|<> |Selects the query parser to be used to process the query.
-|<> |Sorts the response to a query in either ascending or descending order based on the response's score or another specified characteristic.
-|<> |Specifies an offset (by default, 0) into the responses at which Solr should begin displaying content.
-|<> |Controls how many rows of responses are displayed at a time (default value: 10)
-|<> |Applies a filter query to the search results.
-|<> |Limits the information included in a query response to a specified list of fields. The fields need to either be `stored="true"` or `docValues="true"`
-|<> |Request additional debugging information in the response. Specifying the `debug=timing` parameter returns just the timing information; specifying the `debug=results` parameter returns "explain" information for each of the documents returned; specifying the `debug=query parameter` returns all of the debug information.
-|<> |Allows clients to specify a Lucene query to identify a set of documents.
If non-blank, the explain info of each document which matches this query, relative to the main query (specified by the q parameter) will be returned along with the rest of the debugging information. -|<> |Defines the time allowed for the query to be processed. If the time elapses before the query response is complete, partial information may be returned. -|<> |Indicates that, if possible, Solr should stop collecting documents from each individual (sorted) segment once it can determine that any subsequent documents in that segment will not be candidates for the `rows` being returned. The default is false. -|<> |Excludes the header from the returned results, if set to true. The header contains information about the request, such as the time the request took to complete. The default is false. -|<> |Specifies the Response Writer to be used to format the query response. -|<> |By default, Solr logs all parameters. Set this parameter to restrict which parameters are logged. Valid entries are the parameters to be logged, separated by commas (i.e., `logParamsList=param1,param2`). An empty list will log no parameters, so if logging all parameters is desired, do not define this additional parameter at all. -|<> |The response header can include parameters sent with the query request. This parameter controls what is contained in that section of the response header. Valid values are `none`, `all`, and `explicit`. The default value is `explicit.` -|=== - -The following sections describe these parameters in detail. - -[[CommonQueryParameters-ThedefTypeParameter]] -== The defType Parameter +== defType Parameter The defType parameter selects the query parser that Solr should use to process the main query parameter (`q`) in the request. For example: @@ -54,8 +30,7 @@ The defType parameter selects the query parser that Solr should use to process t If no defType param is specified, then by default, the <> is used. 
(eg: `defType=lucene`) -[[CommonQueryParameters-ThesortParameter]] -== The sort Parameter +== sort Parameter The `sort` parameter arranges search results in either ascending (`asc`) or descending (`desc`) order. The parameter can be used with either numerical or alphabetical content. The directions can be entered in either all lowercase or all uppercase letters (i.e., both `asc` or `ASC`). @@ -87,26 +62,23 @@ Regarding the sort parameter's arguments: * Multiple sort orderings can be separated by a comma, using this syntax: `sort=+,+],...` ** When more than one sort criteria is provided, the second entry will only be used if the first entry results in a tie. If there is a third entry, it will only be used if the first AND second entries are tied. This pattern continues with further entries. -[[CommonQueryParameters-ThestartParameter]] -== The start Parameter +== start Parameter When specified, the `start` parameter specifies an offset into a query's result set and instructs Solr to begin displaying results from this offset. -The default value is "0". In other words, by default, Solr returns results without an offset, beginning where the results themselves begin. +The default value is `0`. In other words, by default, Solr returns results without an offset, beginning where the results themselves begin. -Setting the `start` parameter to some other number, such as 3, causes Solr to skip over the preceding records and start at the document identified by the offset. +Setting the `start` parameter to some other number, such as `3`, causes Solr to skip over the preceding records and start at the document identified by the offset. You can use the `start` parameter this way for paging. For example, if the `rows` parameter is set to 10, you could display three successive pages of results by setting start to 0, then re-issuing the same query and setting start to 10, then issuing the query again and setting start to 20. 
-[[CommonQueryParameters-TherowsParameter]] -== The rows Parameter +== rows Parameter -You can use the rows parameter to paginate results from a query. The parameter specifies the maximum number of documents from the complete result set that Solr should return to the client at one time. +You can use the `rows` parameter to paginate results from a query. The parameter specifies the maximum number of documents from the complete result set that Solr should return to the client at one time. -The default value is 10. That is, by default, Solr returns 10 documents at a time in response to a query. +The default value is `10`. That is, by default, Solr returns 10 documents at a time in response to a query. -[[CommonQueryParameters-Thefq_FilterQuery_Parameter]] -== The fq (Filter Query) Parameter +== fq (Filter Query) Parameter The `fq` parameter defines a query that can be used to restrict the superset of documents that can be returned, without influencing score. It can be very useful for speeding up complex queries, since the queries specified with `fq` are cached independently of the main query. When a later query uses the same filter, there's a cache hit, and filter results are returned quickly from the cache. @@ -127,14 +99,13 @@ fq=+popularity:[10 TO *] +section:0 ---- * The document sets from each filter query are cached independently. Thus, concerning the previous examples: use a single `fq` containing two mandatory clauses if those clauses appear together often, and use two separate `fq` parameters if they are relatively independent. (To learn about tuning cache sizes and making sure a filter cache actually exists, see <>.) -* It is also possible to use <> inside the `fq` to cache clauses individually and - among other things - to achieve union of cached filter queries. +* It is also possible to use <> inside the `fq` to cache clauses individually and - among other things - to achieve union of cached filter queries. 
* As with all parameters: special characters in an URL need to be properly escaped and encoded as hex values. Online tools are available to help you with URL-encoding. For example: http://meyerweb.com/eric/tools/dencoder/. -[[CommonQueryParameters-Thefl_FieldList_Parameter]] -== The fl (Field List) Parameter +== fl (Field List) Parameter -The `fl` parameter limits the information included in a query response to a specified list of fields. The fields need to either be `stored="true"` or `docValues="true"``.` +The `fl` parameter limits the information included in a query response to a specified list of fields. The fields must be either `stored="true"` or `docValues="true"``.` The field list can be specified as a space-separated or comma-separated list of field names. The string "score" can be used to indicate that the score of each document for the particular query should be returned as a field. The wildcard character `*` selects all the fields in the document which are either `stored="true"` or `docValues="true"` and `useDocValuesAsStored="true"` (which is the default when docValues are enabled). You can also add pseudo-fields, functions and transformers to the field list request. 
@@ -154,8 +125,7 @@ This table shows some basic examples of how to use `fl`: |*,dv_field_name |Return all the `stored` fields in each document, and any `docValues` fields that have `useDocValuesAsStored="true`" and the docValues from dv_field_name even if it has `useDocValuesAsStored="false`" |=== -[[CommonQueryParameters-FunctionValues]] -=== Function Values +=== Functions with fl <> can be computed for each document in the result and returned as a pseudo-field: @@ -164,8 +134,7 @@ This table shows some basic examples of how to use `fl`: fl=id,title,product(price,popularity) ---- -[[CommonQueryParameters-DocumentTransformers]] -=== Document Transformers +=== Document Transformers with fl <> can be used to modify the information returned about each documents in the results of a query: @@ -174,7 +143,6 @@ fl=id,title,product(price,popularity) fl=id,title,[explain] ---- -[[CommonQueryParameters-FieldNameAliases]] === Field Name Aliases You can change the key used to in the response for a field, function, or transformer by prefixing it with a `_"displayName_:`". For example: @@ -203,8 +171,7 @@ fl=id,sales_price:price,secret_sauce:prod(price,popularity),why_score:[explain s }]}}]}} ---- -[[CommonQueryParameters-ThedebugParameter]] -== The debug Parameter +== debug Parameter The `debug` parameter can be specified multiple times and supports the following arguments: @@ -218,8 +185,7 @@ For backwards compatibility with older versions of Solr, `debugQuery=true` may i The default behavior is not to include debugging information. -[[CommonQueryParameters-TheexplainOtherParameter]] -== The explainOther Parameter +== explainOther Parameter The `explainOther` parameter specifies a Lucene query in order to identify a set of documents. 
If this parameter is included and is set to a non-blank value, the query will return debugging information, along with the "explain info" of each document that matches the Lucene query, relative to the main query (which is specified by the q parameter). For example: @@ -232,45 +198,40 @@ The query above allows you to examine the scoring explain info of the top matchi The default value of this parameter is blank, which causes no extra "explain info" to be returned. -[[CommonQueryParameters-ThetimeAllowedParameter]] -== The timeAllowed Parameter +== timeAllowed Parameter This parameter specifies the amount of time, in milliseconds, allowed for a search to complete. If this time expires before the search is complete, any partial results will be returned, but values such as `numFound`, <> counts, and result <> may not be accurate for the entire result set. This value is only checked at the time of: -1. Query Expansion, and -2. Document collection +. Query Expansion, and +. Document collection -As this check is periodically performed, the actual time for which a request can be processed before it is aborted would be marginally greater than or equal to the value of `timeAllowed`. If the request consumes more time in other stages, e.g., custom components, etc., this parameter is not expected to abort the request. +As this check is periodically performed, the actual time for which a request can be processed before it is aborted would be marginally greater than or equal to the value of `timeAllowed`. If the request consumes more time in other stages, custom components, etc., this parameter is not expected to abort the request. -[[CommonQueryParameters-ThesegmentTerminateEarlyParameter]] -== The segmentTerminateEarly Parameter +== segmentTerminateEarly Parameter -This parameter may be set to either true or false. +This parameter may be set to either `true` or `false`. 
-If set to true, and if <> for this collection is a {solr-javadocs}/solr-core/org/apache/solr/index/SortingMergePolicyFactory.html[`SortingMergePolicyFactory`] which uses a `sort` option which is compatible with <> specified for this query, then Solr will attempt to use an {lucene-javadocs}/core/org/apache/lucene/search/EarlyTerminatingSortingCollector.html[`EarlyTerminatingSortingCollector`]. +If set to `true`, and if <> for this collection is a {solr-javadocs}/solr-core/org/apache/solr/index/SortingMergePolicyFactory.html[`SortingMergePolicyFactory`] which uses a `sort` option compatible with <> specified for this query, then Solr will attempt to use an {lucene-javadocs}/core/org/apache/lucene/search/EarlyTerminatingSortingCollector.html[`EarlyTerminatingSortingCollector`]. If early termination is used, a `segmentTerminatedEarly` header will be included in the `responseHeader`. -Similar to using <>, when early segment termination happens values such as `numFound`, <> counts, and result <> may not be accurate for the entire result set. +Similar to using <>, when early segment termination happens values such as `numFound`, <> counts, and result <> may not be accurate for the entire result set. -The default value of this parameter is false. +The default value of this parameter is `false`. -[[CommonQueryParameters-TheomitHeaderParameter]] -== The omitHeader Parameter +== omitHeader Parameter -This parameter may be set to either true or false. +This parameter may be set to either `true` or `false`. -If set to true, this parameter excludes the header from the returned results. The header contains information about the request, such as the time it took to complete. The default value for this parameter is false. +If set to `true`, this parameter excludes the header from the returned results. The header contains information about the request, such as the time it took to complete. The default value for this parameter is `false`. 
-[[CommonQueryParameters-ThewtParameter]] -== The wt Parameter +== wt Parameter The `wt` parameter selects the Response Writer that Solr should use to format the query's response. For detailed descriptions of Response Writers, see <>. -[[CommonQueryParameters-Thecache_falseParameter]] -== The cache=false Parameter +== cache Parameter Solr caches the results of all queries and filter queries by default. To disable result caching, set the `cache=false` parameter. @@ -279,24 +240,22 @@ You can also use the `cost` option to control the order in which non-cached filt For very high cost filters, if `cache=false` and `cost>=100` and the query implements the `PostFilter` interface, a Collector will be requested from that query and used to filter documents after they have matched the main query and all other filter queries. There can be multiple post filters; they are also ordered by cost. For example: -// TODO: fix this, it looks horrible (CT) + +This is a normal function range query used as a filter, all matching documents generated up front and cached: [source,text] ----- -// normal function range query used as a filter, all matching documents -// generated up front and cached fq={!frange l=10 u=100}mul(popularity,price) -// function range query run in parallel with the main query like a traditional -// lucene filter +This is a function range query run in parallel with the main query like a traditional lucene filter: + +[source,text] fq={!frange l=10 u=100 cache=false}mul(popularity,price) -// function range query checked after each document that already matches the query -// and all other filters. Good for really expensive function queries. -fq={!frange l=10 u=100 cache=false cost=100}mul(popularity,price) ----- +This is a function range query checked after each document that already matches the query and all other filters. 
This is good for really expensive function queries: -[[CommonQueryParameters-ThelogParamsListParameter]] -== The logParamsList Parameter +[source,text] +fq={!frange l=10 u=100 cache=false cost=100}mul(popularity,price) + +== logParamsList Parameter By default, Solr logs all parameters of requests. Set this parameter to restrict which parameters of a request are logged. This may help control logging to only those parameters considered important to your organization. @@ -308,27 +267,17 @@ And only the 'q' and 'fq' parameters will be logged. If no parameters should be logged, you can send `logParamsList` as empty (i.e., `logParamsList=`). -[TIP] -==== -This parameter does not only apply to query requests, but to any kind of request to Solr. -==== +TIP: This parameter not only applies to query requests, but to any kind of request to Solr. -[[CommonQueryParameters-TheechoParamsParameter]] -== The echoParams Parameter +== echoParams Parameter The `echoParams` parameter controls what information about request parameters is included in the response header. -The table explains how Solr responds to various settings of the `echoParams` parameter: +The `echoParams` parameter accepts the following values: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed - -[cols="30,70",options="header"] -|=== -|Value |Meaning -|explicit |This is the default value. Only parameters included in the actual request, plus the `_` parameter (which is a 64-bit numeric timestamp) will be added to the params section of the response header. -|all |Include all request parameters that contributed to the query. This will include everything defined in the request handler definition found in `solrconfig.xml` as well as parameters included with the request, plus the `_` parameter. If a parameter is included in the request handler definition AND the request, it will appear multiple times in the response header. 
-|none |Entirely removes the "params" section of the response header. No information about the request parameters will be available in the response. -|=== +* `explicit`: This is the default value. Only parameters included in the actual request, plus the `_` parameter (which is a 64-bit numeric timestamp) will be added to the params section of the response header. +* `all`: Include all request parameters that contributed to the query. This will include everything defined in the request handler definition found in `solrconfig.xml` as well as parameters included with the request, plus the `_` parameter. If a parameter is included in the request handler definition AND the request, it will appear multiple times in the response header. +* `none`: Entirely removes the "params" section of the response header. No information about the request parameters will be available in the response. Here is an example of a JSON response where the echoParams parameter was not included, so the default of `explicit` is active. The request URL that created this response included three parameters - `q`, `wt`, and `indent`: diff --git a/solr/solr-ref-guide/src/configuring-logging.adoc b/solr/solr-ref-guide/src/configuring-logging.adoc index 05a6c7465ef..f1ceb4be015 100644 --- a/solr/solr-ref-guide/src/configuring-logging.adoc +++ b/solr/solr-ref-guide/src/configuring-logging.adoc @@ -22,7 +22,7 @@ Solr logs are a key way to know what's happening in the system. There are severa [IMPORTANT] ==== -In addition to the logging options described below, there is a way to configure which request parameters (such as parameters sent as part of queries) are logged with an additional request parameter called `logParamsList`. See the section on <> for more information. +In addition to the logging options described below, there is a way to configure which request parameters (such as parameters sent as part of queries) are logged with an additional request parameter called `logParamsList`. 
See the section on <> for more information. ==== == Temporary Logging Settings diff --git a/solr/solr-ref-guide/src/defining-fields.adoc b/solr/solr-ref-guide/src/defining-fields.adoc index ef93d605d6d..82f0345d9aa 100644 --- a/solr/solr-ref-guide/src/defining-fields.adoc +++ b/solr/solr-ref-guide/src/defining-fields.adoc @@ -63,7 +63,7 @@ Fields can have many of the same properties as field types. Properties from the |omitPositions |Similar to `omitTermFreqAndPositions` but preserves term frequency information. |true or false |* |termVectors termPositions termOffsets termPayloads |These options instruct Solr to maintain full term vectors for each document, optionally including position, offset and payload information for each term occurrence in those vectors. These can be used to accelerate highlighting and other ancillary functionality, but impose a substantial cost in terms of index size. They are not necessary for typical uses of Solr. |true or false |false |required |Instructs Solr to reject any attempts to add a document which does not have a value for this field. This property defaults to false. |true or false |false -|useDocValuesAsStored |If the field has `<>` enabled, setting this to true would allow the field to be returned as if it were a stored field (even if it has `stored=false`) when matching "`*`" in an <>. |true or false |true +|useDocValuesAsStored |If the field has `<>` enabled, setting this to true would allow the field to be returned as if it were a stored field (even if it has `stored=false`) when matching "`*`" in an <>. |true or false |true |large |Large fields are always lazy loaded and will only take up space in the document cache if the actual value is < 512KB. This option requires `stored="true"` and `multiValued="false"`. It's intended for fields that might have very large values so that they don't get cached in memory. 
|true or false |false |=== diff --git a/solr/solr-ref-guide/src/docvalues.adoc b/solr/solr-ref-guide/src/docvalues.adoc index 2ec3677575b..c0b7c3199df 100644 --- a/solr/solr-ref-guide/src/docvalues.adoc +++ b/solr/solr-ref-guide/src/docvalues.adoc @@ -57,7 +57,7 @@ DocValues are only available for specific field types. The types chosen determin These Lucene types are related to how the {lucene-javadocs}/core/org/apache/lucene/index/DocValuesType.html[values are sorted and stored]. -There is an additional configuration option available, which is to modify the `docValuesFormat` <>. The default implementation employs a mixture of loading some things into memory and keeping some on disk. In some cases, however, you may choose to specify an alternative {lucene-javadocs}/core/org/apache/lucene/codecs/DocValuesFormat.html[DocValuesFormat implementation]. For example, you could choose to keep everything in memory by specifying `docValuesFormat="Memory"` on a field type: +There is an additional configuration option available, which is to modify the `docValuesFormat` <>. The default implementation employs a mixture of loading some things into memory and keeping some on disk. In some cases, however, you may choose to specify an alternative {lucene-javadocs}/core/org/apache/lucene/codecs/DocValuesFormat.html[DocValuesFormat implementation]. For example, you could choose to keep everything in memory by specifying `docValuesFormat="Memory"` on a field type: [source,xml] ---- @@ -73,13 +73,13 @@ Lucene index back-compatibility is only supported for the default codec. If you === Sorting, Faceting & Functions -If `docValues="true"` for a field, then DocValues will automatically be used any time the field is used for <>, <> or <>. +If `docValues="true"` for a field, then DocValues will automatically be used any time the field is used for <>, <> or <>. === Retrieving DocValues During Search Field values retrieved during search queries are typically returned from stored values. 
However, non-stored docValues fields will be also returned along with other stored fields when all fields (or pattern matching globs) are specified to be returned (e.g. "`fl=*`") for search queries depending on the effective value of the `useDocValuesAsStored` parameter for each field. For schema versions >= 1.6, the implicit default is `useDocValuesAsStored="true"`. See <> & <> for more details. -When `useDocValuesAsStored="false"`, non-stored DocValues fields can still be explicitly requested by name in the <>, but will not match glob patterns (`"*"`). Note that returning DocValues along with "regular" stored fields at query time has performance implications that stored fields may not because DocValues are column-oriented and may therefore incur additional cost to retrieve for each returned document. Also note that while returning non-stored fields from DocValues, the values of a multi-valued field are returned in sorted order (and not insertion order). If you require the multi-valued fields to be returned in the original insertion order, then make your multi-valued field as stored (such a change requires re-indexing). +When `useDocValuesAsStored="false"`, non-stored DocValues fields can still be explicitly requested by name in the <>, but will not match glob patterns (`"*"`). Note that returning DocValues along with "regular" stored fields at query time has performance implications that stored fields may not because DocValues are column-oriented and may therefore incur additional cost to retrieve for each returned document. Also note that while returning non-stored fields from DocValues, the values of a multi-valued field are returned in sorted order (and not insertion order). If you require the multi-valued fields to be returned in the original insertion order, then make your multi-valued field as stored (such a change requires re-indexing). 
In cases where the query is returning _only_ docValues fields performance may improve since returning stored fields requires disk reads and decompression whereas returning docValues fields in the fl list only requires memory access. diff --git a/solr/solr-ref-guide/src/faceting.adoc b/solr/solr-ref-guide/src/faceting.adoc index 4384a74f462..bf23e08a03d 100644 --- a/solr/solr-ref-guide/src/faceting.adoc +++ b/solr/solr-ref-guide/src/faceting.adoc @@ -23,30 +23,24 @@ Faceting is the arrangement of search results into categories based on indexed t Searchers are presented with the indexed terms, along with numerical counts of how many matching documents were found for each term. Faceting makes it easy for users to explore search results, narrowing in on exactly the results they are looking for. -[[Faceting-GeneralParameters]] -== General Parameters +== General Facet Parameters There are two general parameters for controlling faceting. -[[Faceting-ThefacetParameter]] -=== The facet Parameter - -If set to *true*, this parameter enables facet counts in the query response. If set to *false*, a blank or missing value, this parameter disables faceting. None of the other parameters listed below will have any effect unless this parameter is set to *true*. The default value is blank (false). - -[[Faceting-Thefacet.queryParameter]] -=== The facet.query Parameter +`facet`:: +If set to `true`, this parameter enables facet counts in the query response. If set to `false`, a blank or missing value, this parameter disables faceting. None of the other parameters listed below will have any effect unless this parameter is set to `true`. The default value is blank (false). +`facet.query`:: This parameter allows you to specify an arbitrary query in the Lucene default syntax to generate a facet count. - ++ By default, Solr's faceting feature automatically determines the unique terms for a field and returns a count for each of those terms. 
Using `facet.query`, you can override this default behavior and select exactly which terms or expressions you would like to see counted. In a typical implementation of faceting, you will specify a number of `facet.query` parameters. This parameter can be particularly useful for numeric-range-based facets or prefix-based facets. - ++ You can set the `facet.query` parameter multiple times to indicate that multiple queries should be used as separate facet constraints. - ++ To use facet queries in a syntax other than the default syntax, prefix the facet query with the name of the query notation. For example, to use the hypothetical `myfunc` query parser, you could set the `facet.query` parameter like so: - ++ `facet.query={!myfunc}name~fred` -[[Faceting-Field-ValueFacetingParameters]] == Field-Value Faceting Parameters Several parameters can be used to trigger faceting based on the indexed terms in a field. @@ -55,335 +49,218 @@ When using these parameters, it is important to remember that "term" is a very s If you want Solr to perform both analysis (for searching) and faceting on the full literal strings, use the `copyField` directive in your Schema to create two versions of the field: one Text and one String. Make sure both are `indexed="true"`. (For more information about the `copyField` directive, see <>.) -The table below summarizes Solr's field value faceting parameters. - -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed - -[cols="30,70",options="header"] -|=== -|Parameter |Description -|<> |Identifies a field to be treated as a facet. -|<> |Limits the terms used for faceting to those that begin with the specified prefix. -|<> |Limits the terms used for faceting to those that contain the specified substring. -|<> |If facet.contains is used, ignore case when searching for the specified substring. -|<> |Controls how faceted results are sorted. 
-|<> |Controls how many constraints should be returned for each facet. -|<> |Specifies an offset into the facet results at which to begin displaying facets. -|<> |Specifies the minimum counts required for a facet field to be included in the response. -|<> |Controls whether Solr should compute a count of all matching results which have no value for the field, in addition to the term-based constraints of a facet field. -|<> |Selects the algorithm or method Solr should use when faceting a field. -|<> |Caps facet counts by one. Available only for `facet.method=enum` as performance optimization. -|<> |Removes specific terms from facet counts. This allows you to exclude certain terms from faceting, while maintaining the terms in the index for general queries. -|<> |(Advanced) Specifies the minimum document frequency (the number of documents matching a term) for which the `filterCache` should be used when determining the constraint count for that term. -|<> |(Advanced) A number of documents, beyond the effective `facet.limit` to request from each shard in a distributed search -|<> |(Advanced) A multiplier of the effective `facet.limit` to request from each shard in a distributed search -|<> |(Advanced) Controls parallel execution of field faceting -|=== - -These parameters are described in the sections below. - -[[Faceting-Thefacet.fieldParameter]] -=== The facet.field Parameter +Unless otherwise specified, all of the parameters below can be specified on a per-field basis with the syntax of `f..facet.` +`facet.field`:: The `facet.field` parameter identifies a field that should be treated as a facet. It iterates over each Term in the field and generate a facet count using that Term as the constraint. This parameter can be specified multiple times in a query to select multiple facet fields. ++ +IMPORTANT: If you do not set this parameter to at least one field in the schema, none of the other parameters described in this section will have any effect. 
-[IMPORTANT] -==== -If you do not set this parameter to at least one field in the schema, none of the other parameters described in this section will have any effect. -==== - -[[Faceting-Thefacet.prefixParameter]] -=== The facet.prefix Parameter - +`facet.prefix`:: The `facet.prefix` parameter limits the terms on which to facet to those starting with the given string prefix. This does not limit the query in any way, only the facets that would be returned in response to the query. ++ -This parameter can be specified on a per-field basis with the syntax of `f..facet.prefix`. - -[[Faceting-Thefacet.containsParameter]] -=== The facet.contains Parameter - +`facet.contains`:: The `facet.contains` parameter limits the terms on which to facet to those containing the given substring. This does not limit the query in any way, only the facets that would be returned in response to the query. -This parameter can be specified on a per-field basis with the syntax of `f..facet.contains`. - -[[Faceting-Thefacet.contains.ignoreCaseParameter]] -=== The facet.contains.ignoreCase Parameter +`facet.contains.ignoreCase`:: If `facet.contains` is used, the `facet.contains.ignoreCase` parameter causes case to be ignored when matching the given substring against candidate facet terms. -This parameter can be specified on a per-field basis with the syntax of `f..facet.contains.ignoreCase`. - -[[Faceting-Thefacet.sortParameter]] -=== The facet.sort Parameter - +`facet.sort`:: This parameter determines the ordering of the facet field constraints. - ++ There are two options for this parameter. - -count:: Sort the constraints by count (highest count first). -index:: Return the constraints sorted in their index order (lexicographic by indexed term). For terms in the ASCII range, this will be alphabetically sorted. - ++ +-- +`count`::: Sort the constraints by count (highest count first). +`index`::: Return the constraints sorted in their index order (lexicographic by indexed term). 
For terms in the ASCII range, this will be alphabetically sorted. +-- ++ The default is `count` if `facet.limit` is greater than 0, otherwise, the default is `index`. -This parameter can be specified on a per-field basis with the syntax of `f..facet.sort`. - -[[Faceting-Thefacet.limitParameter]] -=== The facet.limit Parameter - +`facet.limit`:: This parameter specifies the maximum number of constraint counts (essentially, the number of facets for a field that are returned) that should be returned for the facet fields. A negative value means that Solr will return unlimited number of constraint counts. ++ +The default value is `100`. -The default value is 100. - -This parameter can be specified on a per-field basis to apply a distinct limit to each field with the syntax of `f..facet.limit`. - -[[Faceting-Thefacet.offsetParameter]] -=== The facet.offset Parameter +`facet.offset`:: The `facet.offset` parameter indicates an offset into the list of constraints to allow paging. ++ +The default value is `0`. -The default value is 0. - -This parameter can be specified on a per-field basis with the syntax of `f..facet.offset`. - -[[Faceting-Thefacet.mincountParameter]] -=== The facet.mincount Parameter +`facet.mincount`:: The `facet.mincount` parameter specifies the minimum counts required for a facet field to be included in the response. If a field's counts are below the minimum, the field's facet is not returned. ++ +The default value is `0`. -The default value is 0. - -This parameter can be specified on a per-field basis with the syntax of `f..facet.mincount`. - -[[Faceting-Thefacet.missingParameter]] -=== The facet.missing Parameter - -If set to true, this parameter indicates that, in addition to the Term-based constraints of a facet field, a count of all results that match the query but which have no facet value for the field should be computed and returned in the response. - -The default value is false. 
- -This parameter can be specified on a per-field basis with the syntax of `f..facet.missing`. - -[[Faceting-Thefacet.methodParameter]] -=== The facet.method Parameter - -The facet.method parameter selects the type of algorithm or method Solr should use when faceting a field. +`facet.missing`:: +If set to `true`, this parameter indicates that, in addition to the Term-based constraints of a facet field, a count of all results that match the query but which have no facet value for the field should be computed and returned in the response. ++ +The default value is `false`. +`facet.method`:: +The `facet.method` parameter selects the type of algorithm or method Solr should use when faceting a field. ++ The following methods are available. - -enum:: Enumerates all terms in a field, calculating the set intersection of documents that match the term with documents that match the query. ++ +-- +`enum`::: Enumerates all terms in a field, calculating the set intersection of documents that match the term with documents that match the query. + This method is recommended for faceting multi-valued fields that have only a few distinct values. The average number of values per document does not matter. + For example, faceting on a field with U.S. States such as `Alabama, Alaska, ... Wyoming` would lead to fifty cached filters which would be used over and over again. The `filterCache` should be large enough to hold all the cached filters. -fc:: Calculates facet counts by iterating over documents that match the query and summing the terms that appear in each document. +`fc`::: Calculates facet counts by iterating over documents that match the query and summing the terms that appear in each document. + This is currently implemented using an `UnInvertedField` cache if the field either is multi-valued or is tokenized (according to `FieldType.isTokened()`). Each document is looked up in the cache to see what terms/values it contains, and a tally is incremented for each value. 
+ This method is excellent for situations where the number of indexed values for the field is high, but the number of values per document is low. For multi-valued fields, a hybrid approach is used that uses term filters from the `filterCache` for terms that match many documents. The letters `fc` stand for field cache. -fcs:: Per-segment field faceting for single-valued string fields. Enable with `facet.method=fcs` and control the number of threads used with the `threads` local parameter. This parameter allows faceting to be faster in the presence of rapid index changes. - +`fcs`::: Per-segment field faceting for single-valued string fields. Enable with `facet.method=fcs` and control the number of threads used with the `threads` local parameter. This parameter allows faceting to be faster in the presence of rapid index changes. +-- ++ The default value is `fc` (except for fields using the `BoolField` field type and when `facet.exists=true` is requested) since it tends to use less memory and is faster when a field has many unique terms in the index. -This parameter can be specified on a per-field basis with the syntax of `f..facet.method`. - -[[Faceting-Thefacet.enum.cache.minDfParameter]] -=== The facet.enum.cache.minDf Parameter - +`facet.enum.cache.minDf`:: This parameter indicates the minimum document frequency (the number of documents matching a term) for which the filterCache should be used when determining the constraint count for that term. This is only used with the `facet.method=enum` method of faceting. ++ +A value greater than zero decreases the filterCache's memory usage, but increases the time required for the query to be processed. If you are faceting on a field with a very large number of terms, and you wish to decrease memory usage, try setting this parameter to a value between `25` and `50`, and run a few tests. Then, optimize the parameter setting as necessary. 
++ +The default value is `0`, causing the filterCache to be used for all terms in the field. -A value greater than zero decreases the filterCache's memory usage, but increases the time required for the query to be processed. If you are faceting on a field with a very large number of terms, and you wish to decrease memory usage, try setting this parameter to a value between 25 and 50, and run a few tests. Then, optimize the parameter setting as necessary. +`facet.exists`:: +To cap facet counts by 1, specify `facet.exists=true`. This parameter can be used with `facet.method=enum` or when it's omitted. It can be used only on non-trie fields (such as strings). It may speed up facet counting on large indices and/or high-cardinality facet values. -The default value is 0, causing the filterCache to be used for all terms in the field. - -This parameter can be specified on a per-field basis with the syntax of `f..facet.enum.cache.minDf`. - -[[Faceting-Thefacet.existsParameter]] -=== The facet.exists Parameter - -To cap facet counts by 1, specify `facet.exists=true`. It can be used with `facet.method=enum` or when it's omitted. It can be used only on non-trie fields (such as strings). It may speed up facet counting on large indices and/or high-cardinality facet values.. - -This parameter can be specified on a per-field basis with the syntax of `f..facet.exists` or via local parameter` facet.field={!facet.method=enum facet.exists=true}size`. - -[[Faceting-Thefacet.excludeTermsParameter]] -=== The facet.excludeTerms Parameter +`facet.excludeTerms`:: If you want to remove terms from facet counts but keep them in the index, the `facet.excludeTerms` parameter allows you to do that. 
-[[Faceting-Over-RequestParameters]] -=== Over-Request Parameters +`facet.overrequest.count` and `facet.overrequest.ratio`:: +In some situations, the accuracy in selecting the "top" constraints returned for a facet in a distributed Solr query can be improved by "over requesting" the number of desired constraints (i.e., `facet.limit`) from each of the individual shards. In these situations, each shard is by default asked for the top `10 + (1.5 * facet.limit)` constraints. ++ +In some situations, depending on how your docs are partitioned across your shards and what `facet.limit` value you used, you may find it advantageous to increase or decrease the amount of over-requesting Solr does. This can be achieved by setting the `facet.overrequest.count` (defaults to `10`) and `facet.overrequest.ratio` (defaults to `1.5`) parameters. -In some situations, the accuracy in selecting the "top" constraints returned for a facet in a distributed Solr query can be improved by "Over Requesting" the number of desired constraints (ie: `facet.limit`) from each of the individual Shards. In these situations, each shard is by default asked for the top "`10 + (1.5 * facet.limit)`" constraints. +`facet.threads`:: +This parameter will cause loading the underlying fields used in faceting to be executed in parallel with the number of threads specified. Specify as `facet.threads=N` where `N` is the maximum number of threads used. ++ +Omitting this parameter or specifying the thread count as `0` will not spawn any threads, and only the main request thread will be used. Specifying a negative number of threads will create up to `Integer.MAX_VALUE` threads. -In some situations, depending on how your docs are partitioned across your shards, and what `facet.limit` value you used, you may find it advantageous to increase or decrease the amount of over-requesting Solr does. 
This can be achieved by setting the `facet.overrequest.count` (defaults to 10) and `facet.overrequest.ratio` (defaults to 1.5) parameters. - -[[Faceting-Thefacet.threadsParameter]] -=== The facet.threads Parameter - -This param will cause loading the underlying fields used in faceting to be executed in parallel with the number of threads specified. Specify as `facet.threads=N` where `N` is the maximum number of threads used. Omitting this parameter or specifying the thread count as 0 will not spawn any threads, and only the main request thread will be used. Specifying a negative number of threads will create up to Integer.MAX_VALUE threads. - -[[Faceting-RangeFaceting]] == Range Faceting You can use Range Faceting on any date field or any numeric field that supports range queries. This is particularly useful for stitching together a series of range queries (as facet by query) for things like prices. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed - -[cols="30,70",options="header"] -|=== -|Parameter |Description -|<> |Specifies the field to facet by range. -|<> |Specifies the start of the facet range. -|<> |Specifies the end of the facet range. -|<> |Specifies the span of the range as a value to be added to the lower bound. -|<> |A boolean parameter that specifies how Solr handles a range gap that cannot be evenly divided between the range start and end values. If true, the last range constraint will have the `facet.range.end` value an upper bound. If false, the last range will have the smallest possible upper bound greater then `facet.range.end` such that the range is the exact width of the specified range gap. The default value for this parameter is false. -|<> |Specifies inclusion and exclusion preferences for the upper and lower bounds of the range. See the `facet.range.include` topic for more detailed information. 
-|<> |Specifies counts for Solr to compute in addition to the counts for each facet range constraint. -|<> |Specifies the algorithm or method to use for calculating facets. -|=== - -[[Faceting-Thefacet.rangeParameter]] -=== The facet.range Parameter - +`facet.range`:: The `facet.range` parameter defines the field for which Solr should create range facets. For example: - ++ `facet.range=price&facet.range=age` - ++ `facet.range=lastModified_dt` -[[Faceting-Thefacet.range.startParameter]] -=== The facet.range.start Parameter - +`facet.range.start`:: The `facet.range.start` parameter specifies the lower bound of the ranges. You can specify this parameter on a per field basis with the syntax of `f..facet.range.start`. For example: - ++ `f.price.facet.range.start=0.0&f.age.facet.range.start=10` - ++ `f.lastModified_dt.facet.range.start=NOW/DAY-30DAYS` -[[Faceting-Thefacet.range.endParameter]] -=== The facet.range.end Parameter - -The facet.range.end specifies the upper bound of the ranges. You can specify this parameter on a per field basis with the syntax of `f..facet.range.end`. For example: - +`facet.range.end`:: +The `facet.range.end` specifies the upper bound of the ranges. You can specify this parameter on a per field basis with the syntax of `f..facet.range.end`. For example: ++ `f.price.facet.range.end=1000.0&f.age.facet.range.start=99` - ++ `f.lastModified_dt.facet.range.end=NOW/DAY+30DAYS` -[[Faceting-Thefacet.range.gapParameter]] -=== The facet.range.gap Parameter - +`facet.range.gap`:: The span of each range expressed as a value to be added to the lower bound. For date fields, this should be expressed using the {solr-javadocs}/solr-core/org/apache/solr/util/DateMathParser.html[`DateMathParser` syntax] (such as, `facet.range.gap=%2B1DAY ... '+1DAY'`). You can specify this parameter on a per-field basis with the syntax of `f..facet.range.gap`. 
For example: - ++ `f.price.facet.range.gap=100&f.age.facet.range.gap=10` - ++ `f.lastModified_dt.facet.range.gap=+1DAY` -[[Faceting-Thefacet.range.hardendParameter]] -=== The facet.range.hardend Parameter - +`facet.range.hardend`:: The `facet.range.hardend` parameter is a Boolean parameter that specifies how Solr should handle cases where the `facet.range.gap` does not divide evenly between `facet.range.start` and `facet.range.end`. - -If *true*, the last range constraint will have the `facet.range.end` value as an upper bound. If *false*, the last range will have the smallest possible upper bound greater then `facet.range.end` such that the range is the exact width of the specified range gap. The default value for this parameter is false. - ++ +If `true`, the last range constraint will have the `facet.range.end` value as an upper bound. If `false`, the last range will have the smallest possible upper bound greater than `facet.range.end` such that the range is the exact width of the specified range gap. The default value for this parameter is false. ++ This parameter can be specified on a per field basis with the syntax `f..facet.range.hardend`. -[[Faceting-Thefacet.range.includeParameter]] -=== The facet.range.include Parameter - +`facet.range.include`:: By default, the ranges used to compute range faceting between `facet.range.start` and `facet.range.end` are inclusive of their lower bounds and exclusive of the upper bounds. The "before" range defined with the `facet.range.other` parameter is exclusive and the "after" range is inclusive. This default, equivalent to "lower" below, will not result in double counting at the boundaries.
You can use the `facet.range.include` parameter to modify this behavior using the following options: - -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed - -[cols="30,70",options="header"] -|=== -|Option |Description -|lower |All gap-based ranges include their lower bound. -|upper |All gap-based ranges include their upper bound. -|edge |The first and last gap ranges include their edge bounds (lower for the first one, upper for the last one) even if the corresponding upper/lower option is not specified. -|outer |The "before" and "after" ranges will be inclusive of their bounds, even if the first or last ranges already include those boundaries. -|all |Includes all options: lower, upper, edge, outer. -|=== - ++ +-- +* `lower`: All gap-based ranges include their lower bound. +* `upper`: All gap-based ranges include their upper bound. +* `edge`: The first and last gap ranges include their edge bounds (lower for the first one, upper for the last one) even if the corresponding upper/lower option is not specified. +* `outer`: The "before" and "after" ranges will be inclusive of their bounds, even if the first or last ranges already include those boundaries. +* `all`: Includes all options: `lower`, `upper`, `edge`, and `outer`. +-- ++ You can specify this parameter on a per field basis with the syntax of `f..facet.range.include`, and you can specify it multiple times to indicate multiple choices. ++ +NOTE: To ensure you avoid double-counting, do not choose both `lower` and `upper`, do not choose `outer`, and do not choose `all`. -[NOTE] -==== -To ensure you avoid double-counting, do not choose both `lower` and `upper`, do not choose `outer`, and do not choose `all`. 
-==== - -[[Faceting-Thefacet.range.otherParameter]] -=== The facet.range.other Parameter - +`facet.range.other`:: The `facet.range.other` parameter specifies that in addition to the counts for each range constraint between `facet.range.start` and `facet.range.end`, counts should also be computed for these options: - -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed - -[cols="30,70",options="header"] -|=== -|Option |Description -|before |All records with field values lower then lower bound of the first range. -|after |All records with field values greater then the upper bound of the last range. -|between |All records with field values between the start and end bounds of all ranges. -|none |Do not compute any counts. -|all |Compute counts for before, between, and after. -|=== - ++ +-- +* `before`: All records with field values lower than the lower bound of the first range. +* `after`: All records with field values greater than the upper bound of the last range. +* `between`: All records with field values between the start and end bounds of all ranges. +* `none`: Do not compute any counts. +* `all`: Compute counts for before, between, and after. +-- ++ This parameter can be specified on a per field basis with the syntax of `f..facet.range.other`. In addition to the `all` option, this parameter can be specified multiple times to indicate multiple choices, but `none` will override all other options. -[[Faceting-Thefacet.range.methodParameter]] -=== The facet.range.method Parameter - +`facet.range.method`:: The `facet.range.method` parameter selects the type of algorithm or method Solr should use for range faceting. Both methods produce the same results, but performance may vary. ++ +-- +filter::: This method generates the ranges based on other facet.range parameters, and for each of them executes a filter that later intersects with the main query resultset to get the count.
It will make use of the filterCache, so it will benefit of a cache large enough to contain all ranges. ++ +dv::: This method iterates the documents that match the main query, and for each of them finds the correct range for the value. This method will make use of <> (if enabled for the field) or fieldCache. The `dv` method is not supported for field type DateRangeField or when using <>. +-- ++ +The default value for this parameter is `filter`. -filter:: This method generates the ranges based on other facet.range parameters, and for each of them executes a filter that later intersects with the main query resultset to get the count. It will make use of the filterCache, so it will benefit of a cache large enough to contain all ranges. -dv:: This method iterates the documents that match the main query, and for each of them finds the correct range for the value. This method will make use of <> (if enabled for the field) or fieldCache. The `dv` method is not supported for field type DateRangeField or when using <>. - -Default value for this parameter is "filter". - -[[Faceting-Thefacet.mincountParameterinRangeFaceting]] -=== The facet.mincount Parameter in Range Faceting - -The `facet.mincount` parameter, the same one as used in field faceting is also applied to range faceting. When used, no ranges with a count below the minimum will be included in the response. .Date Ranges & Time Zones [NOTE] ==== - Range faceting on date fields is a common situation where the <> parameter can be useful to ensure that the "facet counts per day" or "facet counts per month" are based on a meaningful definition of when a given day/month "starts" relative to a particular TimeZone. For more information, see the examples in the <> section. - ==== +=== facet.mincount in Range Faceting + +The `facet.mincount` parameter, the same one as used in field faceting is also applied to range faceting. When used, no ranges with a count below the minimum will be included in the response. 
-[[Faceting-Pivot_DecisionTree_Faceting]] == Pivot (Decision Tree) Faceting Pivoting is a summarization tool that lets you automatically sort, count, total or average data stored in a table. The results are typically displayed in a second table showing the summarized data. Pivot faceting lets you create a summary table of the results from a faceting documents by multiple fields. Another way to look at it is that the query produces a Decision Tree, in that Solr tells you "for facet A, the constraints/counts are X/N, Y/M, etc. If you were to constrain A by X, then the constraint counts for B would be S/P, T/Q, etc.". In other words, it tells you in advance what the "next" set of facet results would be for a field if you apply a constraint from the current facet results. -[[Faceting-facet.pivot]] -=== facet.pivot - +`facet.pivot`:: The `facet.pivot` parameter defines the fields to use for the pivot. Multiple `facet.pivot` values will create multiple "facet_pivot" sections in the response. Separate each list of fields with a comma. -[[Faceting-facet.pivot.mincount]] -=== facet.pivot.mincount - +`facet.pivot.mincount`:: The `facet.pivot.mincount` parameter defines the minimum number of documents that need to match in order for the facet to be included in results. The default is 1. 
- ++ Using the "`bin/solr -e techproducts`" example, A query URL like this one will return the data below, with the pivot faceting results found in the section "facet_pivot": - ++ [source,text] ---- http://localhost:8983/solr/techproducts/select?q=*:*&facet.pivot=cat,popularity,inStock &facet.pivot=popularity,cat&facet=true&facet.field=cat&facet.limit=5 &rows=0&wt=json&indent=true&facet.pivot.mincount=2 ---- - ++ [source,json] ---- { "facet_counts":{ @@ -413,10 +290,9 @@ http://localhost:8983/solr/techproducts/select?q=*:*&facet.pivot=cat,popularity, }]}}} ---- -[[Faceting-CombiningStatsComponentWithPivots]] === Combining Stats Component With Pivots -In addition to some of the <> supported by other types of faceting, a `stats` local parameters can be used with `facet.pivot` to refer to <> instances (by tag) that you would like to have computed for each Pivot Constraint. +In addition to some of the <> supported by other types of faceting, a `stats` local parameters can be used with `facet.pivot` to refer to <> instances (by tag) that you would like to have computed for each Pivot Constraint. In the example below, two different (overlapping) sets of statistics are computed for each of the facet.pivot result hierarchies: @@ -503,7 +379,6 @@ Results: "..."}]}}}}]}]}} ---- -[[Faceting-CombiningFacetQueriesAndFacetRangesWithPivotFacets]] === Combining Facet Queries And Facet Ranges With Pivot Facets A `query` local parameter can be used with `facet.pivot` to refer to `facet.query` instances (by tag) that should be computed for each pivot constraint. Similarly, a `range` local parameter can be used with `facet.pivot` to refer to `facet.range` instances. 
@@ -630,10 +505,9 @@ facet.pivot={!range=r1}cat,inStock "..."]}]}}} ---- -[[Faceting-AdditionalPivotParameters]] === Additional Pivot Parameters -Although `facet.pivot.mincount` deviates in name from the `facet.mincount` parameter used by field faceting, many other Field faceting parameters described above can also be used with pivot faceting: +Although `facet.pivot.mincount` deviates in name from the `facet.mincount` parameter used by field faceting, many of the faceting parameters described above can also be used with pivot faceting: * `facet.limit` * `facet.offset` @@ -641,7 +515,6 @@ Although `facet.pivot.mincount` deviates in name from the `facet.mincount` param * `facet.overrequest.count` * `facet.overrequest.ratio` -[[Faceting-IntervalFaceting]] == Interval Faceting Another supported form of faceting is interval faceting. This sounds similar to range faceting, but the functionality is really closer to doing facet queries with range queries. Interval faceting allows you to set variable intervals and count the number of documents that have values within those intervals in the specified field. @@ -652,23 +525,21 @@ If you are concerned about the performance of your searches you should test with This method will use <> if they are enabled for the field, will use fieldCache otherwise. -[[Faceting-Thefacet.intervalparameter]] -=== The facet.interval parameter +Use these parameters for interval faceting: + +`facet.interval`:: This parameter Indicates the field where interval faceting must be applied. It can be used multiple times in the same request to indicate multiple fields. - ++ `facet.interval=price&facet.interval=size` -[[Faceting-Thefacet.interval.setparameter]] -=== The facet.interval.set parameter - +`facet.interval.set`:: This parameter is used to set the intervals for the field, it can be specified multiple times to indicate multiple intervals. 
This parameter is global, which means that it will be used for all fields indicated with `facet.interval` unless there is an override for a specific field. To override this parameter on a specific field you can use: `f..facet.interval.set`, for example: - ++ [source,text] f.price.facet.interval.set=[0,10]&f.price.facet.interval.set=(10,100] -[[Faceting-IntervalSyntax]] === Interval Syntax Intervals must begin with either '(' or '[', be followed by the start value, then a comma (','), the end value, and finally a closing ')' or ']’. @@ -699,12 +570,10 @@ Interval faceting supports output key replacement described below. Output keys c &facet=true ---- -[[Faceting-LocalParametersforFaceting]] == Local Parameters for Faceting The <> allows overriding global settings. It can also provide a method of adding metadata to other parameter values, much like XML attributes. -[[Faceting-TaggingandExcludingFilters]] === Tagging and Excluding Filters You can tag specific filters and exclude those filters when faceting. This is useful when doing multi-select faceting. @@ -732,7 +601,6 @@ To return counts for doctype values that are currently not selected, tag filters Filter exclusion is supported for all types of facets. Both the `tag` and `ex` local parameters may specify multiple values by separating them with commas. -[[Faceting-ChangingtheOutputKey]] === Changing the Output Key To change the output key for a faceting command, specify a new name with the `key` local parameter. For example: @@ -741,14 +609,12 @@ To change the output key for a faceting command, specify a new name with the `ke The parameter setting above causes the field facet results for the "doctype" field to be returned using the key "mylabel" rather than "doctype" in the response. This can be helpful when faceting on the same field multiple times with different exclusions. 
-[[Faceting-Limitingfacetwithcertainterms]] === Limiting Facet with Certain Terms To limit field facet with certain terms specify them comma separated with `terms` local parameter. Commas and quotes in terms can be escaped with backslash, as in `\,`. In this case facet is calculated on a way similar to `facet.method=enum` , but ignores `facet.enum.cache.minDf`. For example: `facet.field={!terms='alfa,betta,with\,with\',with space'}symbol` -[[Faceting-RelatedTopics]] == Related Topics -* <> +See also <>. diff --git a/solr/solr-ref-guide/src/field-type-definitions-and-properties.adoc b/solr/solr-ref-guide/src/field-type-definitions-and-properties.adoc index 695146b8de0..c3c1b5d0b0f 100644 --- a/solr/solr-ref-guide/src/field-type-definitions-and-properties.adoc +++ b/solr/solr-ref-guide/src/field-type-definitions-and-properties.adoc @@ -90,11 +90,11 @@ For multivalued fields, specifies a distance between multiple values, which prev `autoGeneratePhraseQueries`:: For text fields. If `true`, Solr automatically generates phrase queries for adjacent terms. If `false`, terms must be enclosed in double-quotes to be treated as phrases. `enableGraphQueries`:: -For text fields, applicable when querying with <>. Use `true` (the default) for field types with query analyzers including graph-aware filters, e.g., <> and <>. +For text fields, applicable when querying with <>. Use `true` (the default) for field types with query analyzers including graph-aware filters, e.g., <> and <>. + Use `false` for field types with query analyzers including filters that can match docs when some tokens are missing, e.g., <>. -[[FieldTypeDefinitionsandProperties-docValuesFormat]] +[[docvaluesformat]] `docValuesFormat`:: Defines a custom `DocValuesFormat` to use for fields of this type. This requires that a schema-aware codec, such as the `SchemaCodecFactory` has been configured in solrconfig.xml. 
@@ -130,7 +130,7 @@ The default values for each property depend on the underlying `FieldType` class, |omitPositions |Similar to `omitTermFreqAndPositions` but preserves term frequency information. |true or false |* |termVectors termPositions termOffsets termPayloads |These options instruct Solr to maintain full term vectors for each document, optionally including position, offset and payload information for each term occurrence in those vectors. These can be used to accelerate highlighting and other ancillary functionality, but impose a substantial cost in terms of index size. They are not necessary for typical uses of Solr. |true or false |false |required |Instructs Solr to reject any attempts to add a document which does not have a value for this field. This property defaults to false. |true or false |false -|useDocValuesAsStored |If the field has <> enabled, setting this to true would allow the field to be returned as if it were a stored field (even if it has `stored=false`) when matching "`*`" in an <>. |true or false |true +|useDocValuesAsStored |If the field has <> enabled, setting this to true would allow the field to be returned as if it were a stored field (even if it has `stored=false`) when matching "`*`" in an <>. |true or false |true |large |Large fields are always lazy loaded and will only take up space in the document cache if the actual value is < 512KB. This option requires `stored="true"` and `multiValued="false"`. It's intended for fields that might have very large values so that they don't get cached in memory. |true or false |false |=== diff --git a/solr/solr-ref-guide/src/function-queries.adoc b/solr/solr-ref-guide/src/function-queries.adoc index 5a9f6dfda5a..11dfb08f301 100644 --- a/solr/solr-ref-guide/src/function-queries.adoc +++ b/solr/solr-ref-guide/src/function-queries.adoc @@ -60,7 +60,7 @@ the output would be: 0.343 ... 
---- -* Use in a parameter that is explicitly for specifying functions, such as the EDisMax query parser's <> param, or DisMax query parser's <>. (Note that the `bf` parameter actually takes a list of function queries separated by white space and each with an optional boost. Make sure you eliminate any internal white space in single function queries when using `bf`). For example: +* Use in a parameter that is explicitly for specifying functions, such as the EDisMax query parser's <> param, or DisMax query parser's <>. (Note that the `bf` parameter actually takes a list of function queries separated by white space and each with an optional boost. Make sure you eliminate any internal white space in single function queries when using `bf`). For example: + [source,text] ---- diff --git a/solr/solr-ref-guide/src/indexconfig-in-solrconfig.adoc b/solr/solr-ref-guide/src/indexconfig-in-solrconfig.adoc index a592a2daf02..d81936fb199 100644 --- a/solr/solr-ref-guide/src/indexconfig-in-solrconfig.adoc +++ b/solr/solr-ref-guide/src/indexconfig-in-solrconfig.adoc @@ -108,7 +108,7 @@ If the configuration options for the built-in merge policies do not fully suit y ---- -The example above shows Solr's {solr-javadocs}/solr-core/org/apache/solr/index/SortingMergePolicyFactory.html[`SortingMergePolicyFactory`] being configured to sort documents in merged segments by `"timestamp desc"`, and wrapped around a `TieredMergePolicyFactory` configured to use the values `maxMergeAtOnce=10` and `segmentsPerTier=10` via the `inner` prefix defined by `SortingMergePolicyFactory` 's `wrapped.prefix` option. For more information on using `SortingMergePolicyFactory`, see <>. 
+The example above shows Solr's {solr-javadocs}/solr-core/org/apache/solr/index/SortingMergePolicyFactory.html[`SortingMergePolicyFactory`] being configured to sort documents in merged segments by `"timestamp desc"`, and wrapped around a `TieredMergePolicyFactory` configured to use the values `maxMergeAtOnce=10` and `segmentsPerTier=10` via the `inner` prefix defined by `SortingMergePolicyFactory` 's `wrapped.prefix` option. For more information on using `SortingMergePolicyFactory`, see <>. === mergeScheduler diff --git a/solr/solr-ref-guide/src/query-re-ranking.adoc b/solr/solr-ref-guide/src/query-re-ranking.adoc index f05af720446..b633064214f 100644 --- a/solr/solr-ref-guide/src/query-re-ranking.adoc +++ b/solr/solr-ref-guide/src/query-re-ranking.adoc @@ -67,4 +67,4 @@ The `ltr` stands for Learning To Rank, please see <> to re-rank the group heads after they've been collapsed. It also preserves the order of documents elevated by the <>. And it even has its own custom explain so you can see how the re-ranking scores were derived when looking at <>. +The `rq` parameter and the re-ranking feature in general works well with other Solr features. For example, it can be used in conjunction with the <> to re-rank the group heads after they've been collapsed. It also preserves the order of documents elevated by the <>. And it even has its own custom explain so you can see how the re-ranking scores were derived when looking at <>. 
diff --git a/solr/solr-ref-guide/src/realtime-get.adoc b/solr/solr-ref-guide/src/realtime-get.adoc index 0573e05b395..f7d10361680 100644 --- a/solr/solr-ref-guide/src/realtime-get.adoc +++ b/solr/solr-ref-guide/src/realtime-get.adoc @@ -94,7 +94,7 @@ http://localhost:8983/solr/techproducts/get?id=mydoc&id=IW-02 } ---- -Real Time Get requests can also be combined with filter queries, specified with an <>, just like search requests: +Real Time Get requests can also be combined with filter queries, specified with an <>, just like search requests: [source,text] ---- diff --git a/solr/solr-ref-guide/src/requesthandlers-and-searchcomponents-in-solrconfig.adoc b/solr/solr-ref-guide/src/requesthandlers-and-searchcomponents-in-solrconfig.adoc index 10fababcf1c..f043e844a4d 100644 --- a/solr/solr-ref-guide/src/requesthandlers-and-searchcomponents-in-solrconfig.adoc +++ b/solr/solr-ref-guide/src/requesthandlers-and-searchcomponents-in-solrconfig.adoc @@ -65,7 +65,7 @@ All of the parameters described in the section <>, or other query rules that should be added to each query. There is no mechanism in Solr to allow a client to override these additions, so you should be absolutely sure you always want these parameters applied to queries. +* `appends`: This allows definition of parameters that are added to the user query. These might be <>, or other query rules that should be added to each query. There is no mechanism in Solr to allow a client to override these additions, so you should be absolutely sure you always want these parameters applied to queries. + [source,xml] ---- @@ -125,7 +125,7 @@ There are several default search components that work with all SearchHandlers wi |mlt |`solr.MoreLikeThisComponent` |Described in the section <>. |highlight |`solr.HighlightComponent` |Described in the section <>. |stats |`solr.StatsComponent` |Described in the section <>. -|debug |`solr.DebugComponent` |Described in the section on <>. 
+|debug |`solr.DebugComponent` |Described in the section on <>. |expand |`solr.ExpandComponent` |Described in the section <>. |=== diff --git a/solr/solr-ref-guide/src/spell-checking.adoc b/solr/solr-ref-guide/src/spell-checking.adoc index b46c8a1e096..20ec5e0250c 100644 --- a/solr/solr-ref-guide/src/spell-checking.adoc +++ b/solr/solr-ref-guide/src/spell-checking.adoc @@ -22,15 +22,12 @@ The SpellCheck component is designed to provide inline query suggestions based o The basis for these suggestions can be terms in a field in Solr, externally created text files, or fields in other Lucene indexes. -[[SpellChecking-ConfiguringtheSpellCheckComponent]] == Configuring the SpellCheckComponent -[[SpellChecking-DefineSpellCheckinsolrconfig.xml]] === Define Spell Check in solrconfig.xml The first step is to specify the source of terms in `solrconfig.xml`. There are three approaches to spell checking in Solr, discussed below. -[[SpellChecking-IndexBasedSpellChecker]] ==== IndexBasedSpellChecker The `IndexBasedSpellChecker` uses a Solr index as the basis for a parallel index used for spell checking. It requires defining a field as the basis for the index terms; a common practice is to copy terms from some fields (such as `title`, `body`, etc.) to another field created for spell checking. Here is a simple example of configuring `solrconfig.xml` with the `IndexBasedSpellChecker`: @@ -57,7 +54,6 @@ The `spellcheckIndexDir` defines the location of the directory that holds the sp Finally, _buildOnCommit_ defines whether to build the spell check index at every commit (that is, every time new documents are added to the index). It is optional, and can be omitted if you would rather set it to `false`. -[[SpellChecking-DirectSolrSpellChecker]] ==== DirectSolrSpellChecker The `DirectSolrSpellChecker` uses terms from the Solr index without building a parallel index like the `IndexBasedSpellChecker`. 
This spell checker has the benefit of not having to be built regularly, meaning that the terms are always up-to-date with terms in the index. Here is how this might be configured in `solrconfig.xml` @@ -89,9 +85,8 @@ Because this spell checker is querying the main index, you may want to limit how The `maxInspections` parameter defines the maximum number of possible matches to review before returning results; the default is 5. `minQueryLength` defines how many characters must be in the query before suggestions are provided; the default is 4. -At first, spellchecker analyses incoming query words by looking up them in the index. Only query words, which are absent in index or too rare ones (below `maxQueryFrequency` ) are considered as misspelled and used for finding suggestions. Words which are frequent than `maxQueryFrequency` bypass spellchecker unchanged. After suggestions for every misspelled word are found they are filtered for enough frequency with `thresholdTokenFrequency` as boundary value. These parameters (`maxQueryFrequency` and `thresholdTokenFrequency`) can be a percentage (such as .01, or 1%) or an absolute value (such as 4). +First, the spellchecker analyzes incoming query words by looking them up in the index. Only query words that are absent from the index or too rare (below `maxQueryFrequency`) are considered misspelled and used for finding suggestions. Words that are more frequent than `maxQueryFrequency` bypass the spellchecker unchanged. After suggestions for every misspelled word are found, they are filtered by frequency, with `thresholdTokenFrequency` as the boundary value. These parameters (`maxQueryFrequency` and `thresholdTokenFrequency`) can be a percentage (such as .01, or 1%) or an absolute value (such as 4). -[[SpellChecking-FileBasedSpellChecker]] ==== FileBasedSpellChecker The `FileBasedSpellChecker` uses an external file as a spelling dictionary. 
This can be useful if using Solr as a spelling server, or if spelling suggestions don't need to be based on actual terms in the index. In `solrconfig.xml`, you would define the searchComponent as so: @@ -120,7 +115,6 @@ The differences here are the use of the `sourceLocation` to define the location In the previous example, _name_ is used to name this specific definition of the spellchecker. Multiple definitions can co-exist in a single `solrconfig.xml`, and the _name_ helps to differentiate them. If only defining one spellchecker, no name is required. ==== -[[SpellChecking-WordBreakSolrSpellChecker]] ==== WordBreakSolrSpellChecker `WordBreakSolrSpellChecker` offers suggestions by combining adjacent query terms and/or breaking terms into multiple words. It is a `SpellCheckComponent` enhancement, leveraging Lucene's `WordBreakSpellChecker`. It can detect spelling errors resulting from misplaced whitespace without the use of shingle-based dictionaries and provides collation support for word-break errors, including cases where the user has a mix of single-word spelling errors and word-break errors in the same query. It also provides shard support. @@ -145,7 +139,6 @@ Some of the parameters will be familiar from the discussion of the other spell c The spellchecker can be configured with a traditional checker (ie: `DirectSolrSpellChecker`). The results are combined and collations can contain a mix of corrections from both spellcheckers. -[[SpellChecking-AddIttoaRequestHandler]] === Add It to a Request Handler Queries will be sent to a <>. If every request should generate a suggestion, then you would add the following to the `requestHandler` that you are using: @@ -173,151 +166,86 @@ Here is an example with multiple dictionaries: ---- -[[SpellChecking-SpellCheckParameters]] == Spell Check Parameters -The SpellCheck component accepts the parameters described in the table below. +The SpellCheck component accepts the parameters described below. 
-// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`spellcheck`:: +This parameter turns on SpellCheck suggestions for the request. If `true`, then spelling suggestions will be generated. This is required if spell checking is desired. -[cols="30,70",options="header"] -|=== -|Parameter |Description -|<> |Turns on or off SpellCheck suggestions for the request. If *true*, then spelling suggestions will be generated. -|<> |Selects the query to be spellchecked. -|<> |Instructs Solr to build a dictionary for use in spellchecking. -|<> |Causes Solr to build a new query based on the best suggestion for each term in the submitted query. -|<> |This parameter specifies the maximum number of collations to return. -|<> |This parameter specifies the number of collation possibilities for Solr to try before giving up. -|<> |This parameter specifies the maximum number of word correction combinations to rank and evaluate prior to deciding which collation candidates to test against the index. -|<> |If true, returns an expanded response detailing the collations found. If `spellcheck.collate` is false, this parameter will be ignored. -|<> |The maximum number of documents to collect when testing potential Collations -|<> |Specifies param=value pairs that can be used to override normal query params when validating collations -|<> |Specifies the maximum number of spelling suggestions to be returned. -|<> |Specifies the dictionary that should be used for spellchecking. -|<> |Causes Solr to return additional information about spellcheck results, such as the frequency of each original term in the index (origFreq) as well as the frequency of each suggestion in the index (frequency). Note that this result format differs from the non-extended one as the returned suggestion for a word is actually an array of lists, where each list holds the suggested term and its frequency. 
-|<> |Limits spellcheck responses to queries that are more popular than the original query. -|<> |The maximum number of hits the request can return in order to both generate spelling suggestions and set the "correctlySpelled" element to "false". -|<> |The count of suggestions to return for each query term existing in the index and/or dictionary. -|<> |Reloads the spellchecker. -|<> |Specifies an accuracy value to help decide whether a result is worthwhile. -|<.key>> |Specifies a key/value pair for the implementation handling a given dictionary. -|=== +`spellcheck.q` or `q`:: +This parameter specifies the query to spellcheck. ++ +If `spellcheck.q` is defined, then it is used; otherwise the original input query is used. The `spellcheck.q` parameter is intended to be the original query, minus any extra markup like field names, boosts, and so on. If the `q` parameter is specified, then the `SpellingQueryConverter` class is used to parse it into tokens; otherwise the <> is used. ++ +The choice of which one to use is up to the application. Essentially, if you have a spelling "ready" version in your application, then it is probably better to use `spellcheck.q`. Otherwise, if you just want Solr to do the job, use the `q` parameter. -[[SpellChecking-ThespellcheckParameter]] -=== The spellcheck Parameter - -This parameter turns on SpellCheck suggestions for the request. If *true*, then spelling suggestions will be generated. - -[[SpellChecking-Thespellcheck.qorqParameter]] -=== The spellcheck.q or q Parameter - -This parameter specifies the query to spellcheck. If `spellcheck.q` is defined, then it is used; otherwise the original input query is used. The `spellcheck.q` parameter is intended to be the original query, minus any extra markup like field names, boosts, and so on. If the `q` parameter is specified, then the `SpellingQueryConverter` class is used to parse it into tokens; otherwise the <> is used. The choice of which one to use is up to the application. 
Essentially, if you have a spelling "ready" version in your application, then it is probably better to use `spellcheck.q`. Otherwise, if you just want Solr to do the job, use the `q` parameter. - -[NOTE] -==== -The SpellingQueryConverter class does not deal properly with non-ASCII characters. In this case, you have either to use `spellcheck.q`, or implement your own QueryConverter. -==== - -[[SpellChecking-Thespellcheck.buildParameter]] -=== The spellcheck.build Parameter - -If set to *true*, this parameter creates the dictionary that the SolrSpellChecker will use for spell-checking. In a typical search application, you will need to build the dictionary before using the SolrSpellChecker. However, it's not always necessary to build a dictionary first. For example, you can configure the spellchecker to use a dictionary that already exists. +NOTE: The `SpellingQueryConverter` class does not deal properly with non-ASCII characters. In this case, you have either to use `spellcheck.q`, or implement your own QueryConverter. +`spellcheck.build`:: +If set to `true`, this parameter creates the dictionary to be used for spell-checking. In a typical search application, you will need to build the dictionary before using the spell check. However, it's not always necessary to build a dictionary first. For example, you can configure the spellchecker to use a dictionary that already exists. ++ The dictionary will take some time to build, so this parameter should not be sent with every request. -[[SpellChecking-Thespellcheck.reloadParameter]] -=== The spellcheck.reload Parameter +`spellcheck.reload`:: +If set to `true`, this parameter reloads the spellchecker. The results depend on the implementation of `SolrSpellChecker.reload()`. In a typical implementation, reloading the spellchecker means reloading the dictionary. -If set to true, this parameter reloads the spellchecker. The results depend on the implementation of `SolrSpellChecker.reload()`. 
In a typical implementation, reloading the spellchecker means reloading the dictionary. +`spellcheck.count`:: +This parameter specifies the maximum number of suggestions that the spellchecker should return for a term. If this parameter isn't set, the value defaults to `1`. If the parameter is set but not assigned a number, the value defaults to `5`. If the parameter is set to a positive integer, that number becomes the maximum number of suggestions returned by the spellchecker. -[[SpellChecking-Thespellcheck.countParameter]] -=== The spellcheck.count Parameter +`spellcheck.onlyMorePopular`:: +If `true`, Solr will return suggestions that result in more hits for the query than the existing query. Note that this will return more popular suggestions even when the given query term is present in the index and considered "correct". -This parameter specifies the maximum number of suggestions that the spellchecker should return for a term. If this parameter isn't set, the value defaults to 1. If the parameter is set but not assigned a number, the value defaults to 5. If the parameter is set to a positive integer, that number becomes the maximum number of suggestions returned by the spellchecker. +`spellcheck.maxResultsForSuggest`:: +If, for example, this is set to `5` and the user's query returns 5 or fewer results, the spellchecker will report "correctlySpelled=false" and also offer suggestions (and collations if requested). Setting this greater than zero is useful for creating "did-you-mean?" suggestions for queries that return a low number of hits. -[[SpellChecking-Thespellcheck.onlyMorePopularParameter]] -=== The spellcheck.onlyMorePopular Parameter +`spellcheck.alternativeTermCount`:: +Defines the number of suggestions to return for each query term existing in the index and/or dictionary. Presumably, users will want fewer suggestions for words with docFrequency>0. Also, setting this value enables context-sensitive spell suggestions. 
-If *true*, Solr will to return suggestions that result in more hits for the query than the existing query. Note that this will return more popular suggestions even when the given query term is present in the index and considered "correct". +`spellcheck.extendedResults`:: +If `true`, this parameter causes Solr to return additional information about spellcheck results, such as the frequency of each original term in the index (`origFreq`) as well as the frequency of each suggestion in the index (`frequency`). Note that this result format differs from the non-extended one as the returned suggestion for a word is actually an array of lists, where each list holds the suggested term and its frequency. -[[SpellChecking-Thespellcheck.maxResultsForSuggestParameter]] -=== The spellcheck.maxResultsForSuggest Parameter - -For example, if this is set to 5 and the user's query returns 5 or fewer results, the spellchecker will report "correctlySpelled=false" and also offer suggestions (and collations if requested). Setting this greater than zero is useful for creating "did-you-mean?" suggestions for queries that return a low number of hits. - -[[SpellChecking-Thespellcheck.alternativeTermCountParameter]] -=== The spellcheck.alternativeTermCount Parameter - -Specify the number of suggestions to return for each query term existing in the index and/or dictionary. Presumably, users will want fewer suggestions for words with docFrequency>0. Also setting this value turns "on" context-sensitive spell suggestions. - -[[SpellChecking-Thespellcheck.extendedResultsParameter]] -=== The spellcheck.extendedResults Parameter - -This parameter causes to Solr to include additional information about the suggestion, such as the frequency in the index. - -[[SpellChecking-Thespellcheck.collateParameter]] -=== The spellcheck.collate Parameter - -If *true*, this parameter directs Solr to take the best suggestion for each token (if one exists) and construct a new query from the suggestions. 
For example, if the input query was "jawa class lording" and the best suggestion for "jawa" was "java" and "lording" was "loading", then the resulting collation would be "java class loading". - -The spellcheck.collate parameter only returns collations that are guaranteed to result in hits if re-queried, even when applying original `fq` parameters. This is especially helpful when there is more than one correction per query. +`spellcheck.collate`:: +If `true`, this parameter directs Solr to take the best suggestion for each token (if one exists) and construct a new query from the suggestions. ++ +For example, if the input query was "jawa class lording" and the best suggestion for "jawa" was "java" and "lording" was "loading", then the resulting collation would be "java class loading". ++ +The `spellcheck.collate` parameter only returns collations that are guaranteed to result in hits if re-queried, even when applying original `fq` parameters. This is especially helpful when there is more than one correction per query. NOTE: This only returns a query to be used. It does not actually run the suggested query. -[[SpellChecking-Thespellcheck.maxCollationsParameter]] -=== The spellcheck.maxCollations Parameter +`spellcheck.maxCollations`:: +The maximum number of collations to return. The default is `1`. This parameter is ignored if `spellcheck.collate` is false. -The maximum number of collations to return. The default is *1*. This parameter is ignored if `spellcheck.collate` is false. +`spellcheck.maxCollationTries`:: +This parameter specifies the number of collation possibilities for Solr to try before giving up. Lower values ensure better performance. Higher values may be necessary to find a collation that can return results. The default value is `0`, which is equivalent to not checking collations. This parameter is ignored if `spellcheck.collate` is false. 
-[[SpellChecking-Thespellcheck.maxCollationTriesParameter]] -=== The spellcheck.maxCollationTries Parameter +`spellcheck.maxCollationEvaluations`:: +This parameter specifies the maximum number of word correction combinations to rank and evaluate prior to deciding which collation candidates to test against the index. This is a performance safety-net in case a user enters a query with many misspelled words. The default is `10000` combinations, which should work well in most situations. -This parameter specifies the number of collation possibilities for Solr to try before giving up. Lower values ensure better performance. Higher values may be necessary to find a collation that can return results. The default value is `0`, which maintains backwards-compatible (Solr 1.4) behavior (do not check collations). This parameter is ignored if `spellcheck.collate` is false. +`spellcheck.collateExtendedResults`:: +If `true`, this parameter returns an expanded response format detailing the collations Solr found. The default value is `false` and this is ignored if `spellcheck.collate` is false. -[[SpellChecking-Thespellcheck.maxCollationEvaluationsParameter]] -=== The spellcheck.maxCollationEvaluations Parameter - -This parameter specifies the maximum number of word correction combinations to rank and evaluate prior to deciding which collation candidates to test against the index. This is a performance safety-net in case a user enters a query with many misspelled words. The default is *10,000* combinations, which should work well in most situations. - -[[SpellChecking-Thespellcheck.collateExtendedResultsParameter]] -=== The spellcheck.collateExtendedResults Parameter - -If *true*, this parameter returns an expanded response format detailing the collations Solr found. The default value is *false* and this is ignored if `spellcheck.collate` is false. 
- -[[SpellChecking-Thespellcheck.collateMaxCollectDocsParameter]] -=== The spellcheck.collateMaxCollectDocs Parameter - -This parameter specifies the maximum number of documents that should be collect when testing potential collations against the index. A value of *0* indicates that all documents should be collected, resulting in exact hit-counts. Otherwise an estimation is provided as a performance optimization in cases where exact hit-counts are unnecessary – the higher the value specified, the more precise the estimation. - -The default value for this parameter is *0*, but when `spellcheck.collateExtendedResults` is *false*, the optimization is always used as if a *1* had been specified. - - -[[SpellChecking-Thespellcheck.collateParam._ParameterPrefix]] -=== The spellcheck.collateParam.* Parameter Prefix +`spellcheck.collateMaxCollectDocs`:: +This parameter specifies the maximum number of documents that should be collected when testing potential collations against the index. A value of `0` indicates that all documents should be collected, resulting in exact hit-counts. Otherwise an estimation is provided as a performance optimization in cases where exact hit-counts are unnecessary – the higher the value specified, the more precise the estimation. ++ +The default value for this parameter is `0`, but when `spellcheck.collateExtendedResults` is false, the optimization is always used as if `1` had been specified. +`spellcheck.collateParam.*` Prefix:: This parameter prefix can be used to specify any additional parameters that you wish to the Spellchecker to use when internally validating collation queries. For example, even if your regular search results allow for loose matching of one or more query terms via parameters like `q.op=OR` and `mm=20%` you can specify override params such as `spellcheck.collateParam.q.op=AND&spellcheck.collateParam.mm=100%` to require that only collations consisting of words that are all found in at least one document may be returned. 
-[[SpellChecking-Thespellcheck.dictionaryParameter]] -=== The spellcheck.dictionary Parameter - -This parameter causes Solr to use the dictionary named in the parameter's argument. The default setting is "default". This parameter can be used to invoke a specific spellchecker on a per request basis. - -[[SpellChecking-Thespellcheck.accuracyParameter]] -=== The spellcheck.accuracy Parameter +`spellcheck.dictionary`:: +This parameter causes Solr to use the dictionary named in the parameter's argument. The default setting is `default`. This parameter can be used to invoke a specific spellchecker on a per request basis. +`spellcheck.accuracy`:: Specifies an accuracy value to be used by the spell checking implementation to decide whether a result is worthwhile or not. The value is a float between 0 and 1. Defaults to `Float.MIN_VALUE`. - -[[spellcheck_DICT_NAME]] -=== The spellcheck.<DICT_NAME>.key Parameter - -Specifies a key/value pair for the implementation handling a given dictionary. The value that is passed through is just `key=value` (`spellcheck.<DICT_NAME>.` is stripped off. - +`spellcheck.<DICT_NAME>.key`:: +Specifies a key/value pair for the implementation handling a given dictionary. The value that is passed through is just `key=value` (`spellcheck.<DICT_NAME>.` is stripped off). + For example, given a dictionary called `foo`, `spellcheck.foo.myKey=myValue` would result in `myKey=myValue` being passed through to the implementation handling the dictionary `foo`. -[[SpellChecking-Example]] -=== Example +=== Spell Check Example Using Solr's `bin/solr -e techproducts` example, this query shows the results of a simple request that defines a query using the `spellcheck.q` parameter, and forces the collations to require all input terms must match: @@ -368,19 +296,15 @@ Results: ---- -[[SpellChecking-DistributedSpellCheck]] == Distributed SpellCheck The `SpellCheckComponent` also supports spellchecking on distributed indexes. 
If you are using the SpellCheckComponent on a request handler other than "/select", you must provide the following two parameters: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`shards`:: +Specifies the shards in your distributed indexing configuration. For more information about distributed indexing, see <> -[cols="30,70",options="header"] -|=== -|Parameter |Description -|shards |Specifies the shards in your distributed indexing configuration. For more information about distributed indexing, see <> -|shards.qt |Specifies the request handler Solr uses for requests to shards. This parameter is not required for the `/select` request handler. -|=== +`shards.qt`:: +Specifies the request handler Solr uses for requests to shards. This parameter is not required for the `/select` request handler. For example: diff --git a/solr/solr-ref-guide/src/the-dismax-query-parser.adoc b/solr/solr-ref-guide/src/the-dismax-query-parser.adoc index 378fd93f1c4..1cb3014f139 100644 --- a/solr/solr-ref-guide/src/the-dismax-query-parser.adoc +++ b/solr/solr-ref-guide/src/the-dismax-query-parser.adoc @@ -33,50 +33,25 @@ ____ Whether or not you remember this explanation, do remember that the DisMax Query Parser was primarily designed to be easy to use and to accept almost any input without returning an error. -[[TheDisMaxQueryParser-DisMaxParameters]] -== DisMax Parameters +== DisMax Query Parser Parameters -In addition to the common request parameter, highlighting parameters, and simple facet parameters, the DisMax query parser supports the parameters described below. Like the standard query parser, the DisMax query parser allows default parameter values to be specified in `solrconfig.xml`, or overridden by query-time values in the request. 
- -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed - -[cols="30,70",options="header"] -|=== -|Parameter |Description -|<> |Defines the raw input strings for the query. -|<> |Calls the standard query parser and defines query input strings, when the q parameter is not used. -|<> |Query Fields: specifies the fields in the index on which to perform the query. If absent, defaults to `df`. -|<> |Minimum "Should" Match: specifies a minimum number of clauses that must match in a query. If no 'mm' parameter is specified in the query, or as a default in `solrconfig.xml`, the effective value of the `q.op` parameter (either in the query or as a default in `solrconfig.xml`) is used to influence the behavior. If `q.op` is effectively AND'ed, then mm=100%; if `q.op` is OR'ed, then mm=1. Users who want to force the legacy behavior should set a default value for the 'mm' parameter in their `solrconfig.xml` file. Users should add this as a configured default for their request handlers. This parameter tolerates miscellaneous white spaces in expressions (e.g., `" 3 < -25% 10 < -3\n", " \n-25%\n ", " \n3\n "`). -|<> |Phrase Fields: boosts the score of documents in cases where all of the terms in the q parameter appear in close proximity. -|<> |Phrase Slop: specifies the number of positions two terms can be apart in order to match the specified phrase. -|<> |Query Phrase Slop: specifies the number of positions two terms can be apart in order to match the specified phrase. Used specifically with the `qf` parameter. -|<> |Tie Breaker: specifies a float value (which should be something much less than 1) to use as tiebreaker in DisMax queries. Default: 0.0 -|<> |Boost Query: specifies a factor by which a term or phrase should be "boosted" in importance when considering a match. -|<> |Boost Functions: specifies functions to be applied to boosts. (See for details about function queries.) 
-|=== +In addition to the common request parameters, highlighting parameters, and simple facet parameters, the DisMax query parser supports the parameters described below. Like the standard query parser, the DisMax query parser allows default parameter values to be specified in `solrconfig.xml`, or overridden by query-time values in the request. The sections below explain these parameters in detail. -[[TheDisMaxQueryParser-TheqParameter]] -=== The q Parameter +=== q Parameter The `q` parameter defines the main "query" constituting the essence of the search. The parameter supports raw input strings provided by users with no special escaping. The + and - characters are treated as "mandatory" and "prohibited" modifiers for terms. Text wrapped in balanced quote characters (for example, "San Jose") is treated as a phrase. Any query containing an odd number of quote characters is evaluated as if there were no quote characters at all. -[IMPORTANT] -==== +IMPORTANT: The `q` parameter does not support wildcard characters such as *. -The `q` parameter does not support wildcard characters such as *. -==== - -[[TheDisMaxQueryParser-Theq.altParameter]] -=== The q.alt Parameter +=== q.alt Parameter If specified, the `q.alt` parameter defines a query (which by default will be parsed using standard query parsing syntax) when the main q parameter is not specified or is blank. The `q.alt` parameter comes in handy when you need something like a query to match all documents (don't forget `&rows=0` for that one!) in order to get collection-wide faceting counts. -[[TheDisMaxQueryParser-Theqf_QueryFields_Parameter]] -=== The qf (Query Fields) Parameter +=== qf (Query Fields) Parameter The `qf` parameter introduces a list of fields, each of which is assigned a boost factor to increase or decrease that particular field's importance in the query. 
For example, the query below: @@ -85,8 +60,7 @@ The `qf` parameter introduces a list of fields, each of which is assigned a boos assigns `fieldOne` a boost of 2.3, leaves `fieldTwo` with the default boost (because no boost factor is specified), and `fieldThree` a boost of 0.4. These boost factors make matches in `fieldOne` much more significant than matches in `fieldTwo`, which in turn are much more significant than matches in `fieldThree`. -[[TheDisMaxQueryParser-Themm_MinimumShouldMatch_Parameter]] -=== The mm (Minimum Should Match) Parameter +=== mm (Minimum Should Match) Parameter When processing queries, Lucene/Solr recognizes three types of clauses: mandatory, prohibited, and "optional" (also known as "should" clauses). By default, all words or phrases specified in the `q` parameter are treated as "optional" clauses unless they are preceded by a "+" or a "-". When dealing with these "optional" clauses, the `mm` parameter makes it possible to say that a certain minimum number of those clauses must match. The DisMax query parser offers great flexibility in how the minimum number can be specified. @@ -115,27 +89,23 @@ When specifying `mm` values, keep in mind the following: The default value of `mm` is 100% (meaning that all clauses must match). -[[TheDisMaxQueryParser-Thepf_PhraseFields_Parameter]] -=== The pf (Phrase Fields) Parameter +=== pf (Phrase Fields) Parameter Once the list of matching documents has been identified using the `fq` and `qf` parameters, the `pf` parameter can be used to "boost" the score of documents in cases where all of the terms in the q parameter appear in close proximity. The format is the same as that used by the `qf` parameter: a list of fields and "boosts" to associate with each of them when making phrase queries out of the entire q parameter. 
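
The `qf` and `pf` value format described above (a space-separated list of fields, each with an optional `^boost` suffix) could be assembled along these lines. This is only a sketch in Python, not part of the patch: the helper name `format_field_boosts`, the sample user input, and the `pf` boost are hypothetical, and `defType=dismax` is assumed to be how the request selects this parser.

```python
from urllib.parse import urlencode


def format_field_boosts(fields):
    """Render {field: boost or None} as a qf/pf value such as 'a^2.3 b c^0.4'."""
    parts = []
    for name, boost in fields.items():
        # A field listed without a boost keeps the parser's default boost.
        parts.append(name if boost is None else f"{name}^{boost}")
    return " ".join(parts)


# Boost values taken from the qf example discussed in the text.
qf = format_field_boosts({"fieldOne": 2.3, "fieldTwo": None, "fieldThree": 0.4})

params = {
    "defType": "dismax",                            # assumed parser selection
    "q": "solr rocks",                              # hypothetical user input
    "qf": qf,
    "pf": format_field_boosts({"fieldOne": 5.0}),   # hypothetical phrase boost
}

# URL-encode the parameters so that '^' and spaces survive transport.
query_string = urlencode(params)
```

Keeping the boosts in a mapping and rendering them at the last moment makes it harder to produce a malformed `qf` string by hand.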
-[[TheDisMaxQueryParser-Theps_PhraseSlop_Parameter]] -=== The ps (Phrase Slop) Parameter +=== ps (Phrase Slop) Parameter The `ps` parameter specifies the amount of "phrase slop" to apply to queries specified with the pf parameter. Phrase slop is the number of positions one token needs to be moved in relation to another token in order to match a phrase specified in a query. -[[TheDisMaxQueryParser-Theqs_QueryPhraseSlop_Parameter]] -=== The qs (Query Phrase Slop) Parameter +=== qs (Query Phrase Slop) Parameter The `qs` parameter specifies the amount of slop permitted on phrase queries explicitly included in the user's query string with the `qf` parameter. As explained above, slop refers to the number of positions one token needs to be moved in relation to another token in order to match a phrase specified in a query. -[[TheDisMaxQueryParser-Thetie_TieBreaker_Parameter]] === The tie (Tie Breaker) Parameter The `tie` parameter specifies a float value (which should be something much less than 1) to use as tiebreaker in DisMax queries. @@ -145,8 +115,7 @@ When a term from the user's input is tested against multiple fields, more than o A value of "0.0" - the default - makes the query a pure "disjunction max query": that is, only the maximum scoring subquery contributes to the final score. A value of "1.0" makes the query a pure "disjunction sum query" where it doesn't matter what the maximum scoring sub query is, because the final score will be the sum of the subquery scores. Typically a low value, such as 0.1, is useful. -[[TheDisMaxQueryParser-Thebq_BoostQuery_Parameter]] -=== The bq (Boost Query) Parameter +=== bq (Boost Query) Parameter The `bq` parameter specifies an additional, optional, query clause that will be added to the user's main query to influence the score. For example, if you wanted to add a relevancy boost for recent documents: @@ -159,8 +128,7 @@ bq=date:[NOW/DAY-1YEAR TO NOW/DAY] You can specify multiple `bq` parameters. 
If you want your query to be parsed as separate clauses with separate boosts, use multiple `bq` parameters. -[[TheDisMaxQueryParser-Thebf_BoostFunctions_Parameter]] -=== The bf (Boost Functions) Parameter +=== bf (Boost Functions) Parameter The `bf` parameter specifies functions (with optional boosts) that will be used to construct FunctionQueries which will be added to the user's main query as optional clauses that will influence the score. Any function supported natively by Solr can be used, along with a boost value. For example: @@ -180,7 +148,7 @@ bf=recip(rord(creationDate),1,1000,1000) bq={!func}recip(rord(creationDate),1,1000,1000) ---- -[[TheDisMaxQueryParser-ExamplesofQueriesSubmittedtotheDisMaxQueryParser]] + == Examples of Queries Submitted to the DisMax Query Parser All of the sample URLs in this section assume you are running Solr's "techproducts" example: diff --git a/solr/solr-ref-guide/src/the-extended-dismax-query-parser.adoc b/solr/solr-ref-guide/src/the-extended-dismax-query-parser.adoc index c4e0fde4e45..3b0bd49bc31 100644 --- a/solr/solr-ref-guide/src/the-extended-dismax-query-parser.adoc +++ b/solr/solr-ref-guide/src/the-extended-dismax-query-parser.adoc @@ -1,4 +1,4 @@ -= The Extended DisMax Query Parser += The Extended DisMax (eDismax) Query Parser :page-shortname: the-extended-dismax-query-parser :page-permalink: the-extended-dismax-query-parser.html // Licensed to the Apache Software Foundation (ASF) under one @@ -33,76 +33,52 @@ In addition to supporting all the DisMax query parser parameters, Extended Disma * supports pure negative nested queries: queries such as `+foo (-foo)` will match all documents. * lets you specify which fields the end user is allowed to query, and to disallow direct fielded searches. 
-[[TheExtendedDisMaxQueryParser-ExtendedDisMaxParameters]] == Extended DisMax Parameters -In addition to all the <>, Extended DisMax includes these query parameters: +In addition to all the <>, Extended DisMax includes these query parameters: -[[TheExtendedDisMaxQueryParser-ThesowParameter]] -=== The sow Parameter +`sow`:: +Split on whitespace. If set to `false`, whitespace-separated term sequences will be provided to text analysis in one shot, enabling proper function of analysis filters that operate over term sequences, e.g., multi-word synonyms and shingles. Defaults to `true`, so text analysis is invoked separately for each individual whitespace-separated term. -Split on whitespace: if set to `false`, whitespace-separated term sequences will be provided to text analysis in one shot, enabling proper function of analysis filters that operate over term sequences, e.g. multi-word synonyms and shingles. Defaults to `true`: text analysis is invoked separately for each individual whitespace-separated term. +`mm.autoRelax`:: +If `true`, the number of clauses required (<>) will automatically be relaxed if a clause is removed (by e.g. stopwords filter) from some but not all <> fields. Use this parameter as a workaround if you experience that queries return zero hits due to uneven stopword removal between the `qf` fields. ++ +Note that relaxing `mm` may cause undesired side effects, such as hurting the precision of the search, depending on the nature of your index content. -[[TheExtendedDisMaxQueryParser-Themm.autoRelaxParameter]] -=== The mm.autoRelax Parameter - -If true, the number of clauses required (<>) will automatically be relaxed if a clause is removed (by e.g. stopwords filter) from some but not all <> fields. Use this parameter as a workaround if you experience that queries return zero hits due to uneven stopword removal between the `qf` fields. 
- -Note that relaxing mm may cause undesired side effects, hurting the precision of the search, depending on the nature of your index content. - -[[TheExtendedDisMaxQueryParser-TheboostParameter]] -=== The boost Parameter - -A multivalued list of strings parsed as queries with scores multiplied by the score from the main query for all matching documents. This parameter is shorthand for wrapping the query produced by eDisMax using the `BoostQParserPlugin` - -[[TheExtendedDisMaxQueryParser-ThelowercaseOperatorsParameter]] -=== The lowercaseOperators Parameter +`boost`:: +A multivalued list of strings parsed as queries with scores multiplied by the score from the main query for all matching documents. This parameter is shorthand for wrapping the query produced by eDisMax using the `BoostQParserPlugin`. +`lowercaseOperators`:: A Boolean parameter indicating if lowercase "and" and "or" should be treated the same as operators "AND" and "OR". Defaults to `false`. -[[TheExtendedDisMaxQueryParser-ThepsParameter]] -=== The ps Parameter +`ps`:: +Phrase Slop. The default amount of slop - distance between terms - on phrase queries built with `pf`, `pf2` and/or `pf3` fields (affects boosting). See also the section <> below. -Default amount of slop on phrase queries built with `pf`, `pf2` and/or `pf3` fields (affects boosting). +`pf2`:: -[[TheExtendedDisMaxQueryParser-Thepf2Parameter]] -=== The pf2 Parameter - -A multivalued list of fields with optional weights, based on pairs of word shingles. - -[[TheExtendedDisMaxQueryParser-Theps2Parameter]] -=== The ps2 Parameter +A multivalued list of fields with optional weights. Similar to `pf`, but based on _pairs_ of word shingles. +`ps2`:: This is similar to `ps` but overrides the slop factor used for `pf2`. If not specified, `ps` is used. -[[TheExtendedDisMaxQueryParser-Thepf3Parameter]] -=== The pf3 Parameter - -A multivalued list of fields with optional weights, based on triplets of word shingles. 
Similar to `pf`, except that instead of building a phrase per field out of all the words in the input, it builds a set of phrases for each field out of each triplet of word shingles. - -[[TheExtendedDisMaxQueryParser-Theps3Parameter]] -=== The ps3 Parameter +`pf3`:: +A multivalued list of fields with optional weights, based on triplets of word shingles. Similar to `pf`, except that instead of building a phrase per field out of all the words in the input, it builds a set of phrases for each field out of each _triplet_ of word shingles. +`ps3`:: This is similar to `ps` but overrides the slop factor used for `pf3`. If not specified, `ps` is used. -[[TheExtendedDisMaxQueryParser-ThestopwordsParameter]] -=== The stopwords Parameter - -A Boolean parameter indicating if the `StopFilterFactory` configured in the query analyzer should be respected when parsing the query: if it is false, then the `StopFilterFactory` in the query analyzer is ignored. - -[[TheExtendedDisMaxQueryParser-TheufParameter]] -=== The uf Parameter +`stopwords`:: +A Boolean parameter indicating if the `StopFilterFactory` configured in the query analyzer should be respected when parsing the query. If this is set to `false`, then the `StopFilterFactory` in the query analyzer is ignored. +`uf`:: Specifies which schema fields the end user is allowed to explicitly query. This parameter supports wildcards. The default is to allow all fields, equivalent to `uf=\*`. To allow only title field, use `uf=title`. To allow title and all fields ending with '_s', use `uf=title,*_s`. To allow all fields except title, use `uf=*,-title`. To disallow all fielded searches, use `uf=-*`. 
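
To make the "pairs" and "triplets" of word shingles used by `pf2` and `pf3` concrete, the following Python sketch lists the shingles for a sample input. It illustrates only the grouping of words; how eDisMax turns each shingle into a phrase query is not shown, and the helper name and sample query are hypothetical.

```python
def word_shingles(query, size):
    """Return every run of `size` consecutive whitespace-separated words."""
    words = query.split()
    # Slide a window of `size` words across the input, one word at a time.
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


pairs = word_shingles("the quick brown fox", 2)     # the kind of unit pf2 works on
triplets = word_shingles("the quick brown fox", 3)  # the kind of unit pf3 works on
```

An input with fewer words than the shingle size yields no shingles at all in this sketch.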
-[[TheExtendedDisMaxQueryParser-Fieldaliasingusingper-fieldqfoverrides]] -=== Field aliasing using per-field qf overrides +=== Field Aliasing using Per-Field qf Overrides Per-field overrides of the `qf` parameter may be specified to provide 1-to-many aliasing from field names specified in the query string, to field names used in the underlying query. By default, no aliasing is used and field names specified in the query string are treated as literal field names in the index. -[[TheExtendedDisMaxQueryParser-ExamplesofQueriesSubmittedtotheExtendedDisMaxQueryParser]] -== Examples of Queries Submitted to the Extended DisMax Query Parser +== Examples of eDismax Queries All of the sample URLs in this section assume you are running Solr's "```techproducts```" example: @@ -158,14 +134,12 @@ qf=title text last_name first_name f.name.qf=last_name first_name ---- -[[TheExtendedDisMaxQueryParser-Usingnegativeboost]] -== Using negative boost +== Using Negative Boost Negative query boosts have been supported at the "Query" object level for a long time (resulting in negative scores for matching documents). Now the QueryParsers have been updated to handle this too. -[[TheExtendedDisMaxQueryParser-Using_slop_]] -== Using 'slop' +== Using 'Slop' `Dismax` and `Edismax` can run queries against all query fields, and also run a query in the form of a phrase against the phrase fields. (This will work only for boosting documents, not actually for matching.) However, that phrase query can have a 'slop,' which is the distance between the terms of the query while still considering it a phrase match. For example: @@ -223,8 +197,7 @@ A document that contains "Hans Anderson" will match, but a document that contain Finally, in addition to the phrase fields (`pf`) parameter, `edismax` also supports the `pf2` and `pf3` parameters, for fields over which to create bigram and trigram phrase queries. 
The phrase slop for these parameters' queries can be specified using the `ps2` and `ps3` parameters, respectively. If you use `pf2`/`pf3` but not `ps2`/`ps3`, then the phrase slop for these parameters' queries will be taken from the `ps` parameter, if any. -[[TheExtendedDisMaxQueryParser-Usingthe_magicfields__val_and_query_]] -== Using the "magic fields" \_val_ and \_query_ +== Using the "Magic Fields" \_val_ and \_query_ The Solr Query Parser's use of `\_val_` and `\_query_` differs from the Lucene Query Parser in the following ways: @@ -257,9 +230,4 @@ createdate:[1976-03-06T23:59:59.999Z TO 1976-03-06T23:59:59.999Z+1YEAR] createdate:[1976-03-06T23:59:59.999Z/YEAR TO 1976-03-06T23:59:59.999Z] ---- -[IMPORTANT] -==== - -`TO` must be uppercase, or Solr will report a 'Range Group' error. - -==== +IMPORTANT: `TO` must be uppercase, or Solr will report a 'Range Group' error. diff --git a/solr/solr-ref-guide/src/the-standard-query-parser.adoc b/solr/solr-ref-guide/src/the-standard-query-parser.adoc index f389b92a962..b58c4f365e4 100644 --- a/solr/solr-ref-guide/src/the-standard-query-parser.adoc +++ b/solr/solr-ref-guide/src/the-standard-query-parser.adoc @@ -22,31 +22,28 @@ Solr's default Query Parser is also known as the "```lucene```" parser. The key advantage of the standard query parser is that it supports a robust and fairly intuitive syntax allowing you to create a variety of structured queries. The largest disadvantage is that it's very intolerant of syntax errors, as compared with something like the <> query parser which is designed to throw as few errors as possible. -[[TheStandardQueryParser-StandardQueryParserParameters]] == Standard Query Parser Parameters In addition to the <>, <>, <>, and <>, the standard query parser supports the parameters described below. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`q`:: +Defines a query using standard query syntax. 
This parameter is mandatory. -[cols="30,70",options="header"] -|=== -|Parameter |Description -|q |Defines a query using standard query syntax. This parameter is mandatory. -|q.op |Specifies the default operator for query expressions, overriding the default operator specified in the Schema. Possible values are "AND" or "OR". -|df |Specifies a default field, overriding the definition of a default field in the Schema. -|sow |Split on whitespace: if set to `false`, whitespace-separated term sequences will be provided to text analysis in one shot, enabling proper function of analysis filters that operate over term sequences, e.g. multi-word synonyms and shingles. Defaults to `true`: text analysis is invoked separately for each individual whitespace-separated term. -|=== +`q.op`:: +Specifies the default operator for query expressions, overriding the default operator specified in the Schema. Possible values are "AND" or "OR". + +`df`:: +Specifies a default field, overriding the definition of a default field in the Schema. + +`sow`:: +Split on whitespace: if set to `false`, whitespace-separated term sequences will be provided to text analysis in one shot, enabling proper function of analysis filters that operate over term sequences, e.g. multi-word synonyms and shingles. Defaults to `true`: text analysis is invoked separately for each individual whitespace-separated term. Default parameter values are specified in `solrconfig.xml`, or overridden by query-time values in the request. +== Standard Query Parser Response -[[TheStandardQueryParser-TheStandardQueryParser_sResponse]] -== The Standard Query Parser's Response +By default, the response from the standard query parser contains one `` block, which is unnamed. If the <> is used, then an additional `` block will be returned, using the name "debug". This will contain useful debugging info, including the original query string, the parsed query string, and explain info for each document in the block. 
If the <> is also used, then additional explain info will be provided for all the documents matching that query. -By default, the response from the standard query parser contains one `` block, which is unnamed. If the <> is used, then an additional `` block will be returned, using the name "debug". This will contain useful debugging info, including the original query string, the parsed query string, and explain info for each document in the block. If the <> is also used, then additional explain info will be provided for all the documents matching that query. - -[[TheStandardQueryParser-SampleResponses]] === Sample Responses This section presents examples of responses from the standard query parser. @@ -97,7 +94,6 @@ Results: ---- -[[TheStandardQueryParser-SpecifyingTermsfortheStandardQueryParser]] == Specifying Terms for the Standard Query Parser A query to the standard query parser is broken up into terms and operators. There are two types of terms: single terms and phrases. @@ -107,19 +103,12 @@ A query to the standard query parser is broken up into terms and operators. Ther Multiple terms can be combined together with Boolean operators to form more complex queries (as described below). -[IMPORTANT] -==== +IMPORTANT: It is important that the analyzer used for queries parses terms and phrases in a way that is consistent with the way the analyzer used for indexing parses terms and phrases; otherwise, searches may produce unexpected results. -It is important that the analyzer used for queries parses terms and phrases in a way that is consistent with the way the analyzer used for indexing parses terms and phrases; otherwise, searches may produce unexpected results. - -==== - -[[TheStandardQueryParser-TermModifiers]] === Term Modifiers Solr supports a variety of term modifiers that add flexibility or precision, as needed, to searches. These modifiers include wildcard characters, characters for making a search "fuzzy" or more general, and so on. 
The sections below describe these modifiers in detail. -[[TheStandardQueryParser-WildcardSearches]] === Wildcard Searches Solr's standard query parser supports single and multiple character wildcard searches within single terms. Wildcard characters can be applied to single terms, but not to search phrases. @@ -133,7 +122,6 @@ Solr's standard query parser supports single and multiple character wildcard sea |Multiple characters (matches zero or more sequential characters) |* |The wildcard search: `tes*` would match test, testing, and tester. You can also use wildcard characters in the middle of a term. For example: `te*t` would match test and text. `*est` would match pest and test. |=== -[[TheStandardQueryParser-FuzzySearches]] === Fuzzy Searches Solr's standard query parser supports fuzzy searches based on the Damerau-Levenshtein Distance or Edit Distance algorithm. Fuzzy searches discover terms that are similar to a specified term without necessarily being an exact match. To perform a fuzzy search, use the tilde ~ symbol at the end of a single-word term. For example, to search for a term similar in spelling to "roam," use the fuzzy search: @@ -148,14 +136,8 @@ An optional distance parameter specifies the maximum number of edits allowed, be This will match terms like roams & foam - but not foams since it has an edit distance of "2". -[IMPORTANT] -==== +IMPORTANT: In many cases, stemming (reducing terms to a common stem) can produce similar effects to fuzzy searches and wildcard searches. -In many cases, stemming (reducing terms to a common stem) can produce similar effects to fuzzy searches and wildcard searches. - -==== - -[[TheStandardQueryParser-ProximitySearches]] === Proximity Searches A proximity search looks for terms that are within a specific distance from one another. 
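
As a rough illustration of the edit distance behind the fuzzy search on "roam" discussed in the Fuzzy Searches section, the Python sketch below computes the restricted Damerau-Levenshtein distance (the optimal string alignment variant). It is an assumption that this variant is close enough for illustration; it is not presented as the way Lucene itself computes fuzzy matches, and the function name is hypothetical.

```python
def osa_distance(a, b):
    """Edit distance counting insertions, deletions, substitutions, and
    transpositions of adjacent characters (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swapping two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

Under this measure "roams" and "foam" are each one edit away from "roam", while "foams" is two edits away, which is why a search limited to one edit would not match it.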
@@ -166,7 +148,6 @@ To perform a proximity search, add the tilde character ~ and a numeric value to The distance referred to here is the number of term movements needed to match the specified phrase. In the example above, if "apache" and "jakarta" were 10 spaces apart in a field, but "apache" appeared before "jakarta", more than 10 term movements would be required to move the terms together and position "apache" to the right of "jakarta" with a space in between. -[[TheStandardQueryParser-RangeSearches]] === Range Searches A range search specifies a range of values for a field (a range with an upper bound and a lower bound). The query matches documents whose values for the specified field or fields fall within the range. Range queries can be inclusive or exclusive of the upper and lower bounds. Sorting is done lexicographically, except on numeric fields. For example, the range query below matches all documents whose `popularity` field has a value between 52 and 10,000, inclusive. @@ -185,8 +166,6 @@ The brackets around a query determine its inclusiveness. * Curly brackets `{` & `}` denote an exclusive range query that matches values between the upper and lower bounds, but excluding the upper and lower bounds themselves. * You can mix these types so one end of the range is inclusive and the other is exclusive. Here's an example: `count:{1 TO 10]` - -[[TheStandardQueryParser-BoostingaTermwith_]] === Boosting a Term with "^" Lucene/Solr provides the relevance level of matching documents based on the terms found. To boost a term use the caret symbol `^` with a boost factor (a number) at the end of the term you are searching. The higher the boost factor, the more relevant the term will be. @@ -204,7 +183,6 @@ This will make documents with the term jakarta appear more relevant. You can als By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (for example, it could be 0.2). 
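
The boosting rule above (append `^` and a positive factor to a term or quoted phrase) can be sketched in Python as follows. The helper name `boost` and the factors 4 and 0.2 are illustrative assumptions; rejecting non-positive factors simply mirrors the statement that the boost factor must be positive.

```python
def boost(term, factor):
    """Append ^factor to a term or quoted phrase; the factor must be positive."""
    if factor <= 0:
        raise ValueError("boost factor must be positive")
    return f"{term}^{factor}"


# Boost a single term while leaving the other term at the default boost of 1.
query = " ".join([boost("jakarta", 4), "apache"])

# Phrases are boosted the same way; a factor below 1 lowers relative importance.
phrase_query = " ".join(
    [boost('"jakarta apache"', 4), boost('"Apache Lucene"', 0.2)]
)
```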
-[[TheStandardQueryParser-ConstantScorewith_]] === Constant Score with "^=" Constant score queries are created with `^=`, which sets the entire clause to the specified score for any documents matching that clause. This is desirable when you only care about matches for a particular clause and don't want other relevancy factors such as term frequency (the number of times the term appears in the field) or inverse document frequency (a measure across the whole index for how rare a term is in a field). @@ -214,9 +192,7 @@ Example: [source,text] (description:blue OR color:blue)^=1.0 text:shoes - -[[TheStandardQueryParser-SpecifyingFieldsinaQuerytotheStandardQueryParser]] -== Specifying Fields in a Query to the Standard Query Parser +== Querying Specific Fields Data indexed in Solr is organized in fields, which are <>. Searches can take advantage of fields to add precision to queries. For example, you can search for a term only in a specific field, such as a title field. @@ -234,7 +210,6 @@ Since text is the default field, the field indicator is not required; hence the The field is only valid for the term that it directly precedes, so the query `title:Do it right` will find only "Do" in the title field. It will find "it" and "right" in the default field (in this case the text field). -[[TheStandardQueryParser-BooleanOperatorsSupportedbytheStandardQueryParser]] == Boolean Operators Supported by the Standard Query Parser Boolean operators allow you to apply Boolean logic to queries, requiring the presence or absence of specific terms or conditions in fields in order to match documents. The table below summarizes the Boolean operators supported by the standard query parser. @@ -253,19 +228,9 @@ Boolean operators allow you to apply Boolean logic to queries, requiring the pre Boolean operators allow terms to be combined through logic operators. Lucene supports AND, "`+`", OR, NOT and "`-`" as Boolean operators. 
-[IMPORTANT] -==== +IMPORTANT: When specifying Boolean operators with keywords such as AND or NOT, the keywords must appear in all uppercase. -When specifying Boolean operators with keywords such as AND or NOT, the keywords must appear in all uppercase. - -==== - -[NOTE] -==== - -The standard query parser supports all the Boolean operators listed in the table above. The DisMax query parser supports only `+` and `-`. - -==== +NOTE: The standard query parser supports all the Boolean operators listed in the table above. The DisMax query parser supports only `+` and `-`. The OR operator is the default conjunction operator. This means that if there is no Boolean operator between two terms, the OR operator is used. The OR operator links two terms and finds a matching document if either of the terms exist in a document. This is equivalent to a union using sets. The symbol || can be used in place of the word OR. @@ -277,8 +242,6 @@ or `"jakarta apache" OR jakarta` - -[[TheStandardQueryParser-TheBooleanOperator_]] === The Boolean Operator "+" The `+` symbol (also known as the "required" operator) requires that the term after the `+` symbol exist somewhere in a field in at least one document in order for the query to return a match. @@ -287,15 +250,8 @@ For example, to search for documents that must contain "jakarta" and that may or `+jakarta lucene` -[NOTE] -==== +NOTE: This operator is supported by both the standard query parser and the DisMax query parser. -This operator is supported by both the standard query parser and the DisMax query parser. - -==== - - -[[TheStandardQueryParser-TheBooleanOperatorAND_]] === The Boolean Operator AND ("&&") The AND operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. The symbol `&&` can be used in place of the word AND. 
@@ -307,7 +263,6 @@ To search for documents that contain "jakarta apache" and "Apache Lucene," use e `"jakarta apache" && "Apache Lucene"` -[[TheStandardQueryParser-TheBooleanOperatorNOT_]] === The Boolean Operator NOT ("!") The NOT operator excludes documents that contain the term after NOT. This is equivalent to a difference using sets. The symbol `!` can be used in place of the word NOT. @@ -318,7 +273,6 @@ The following queries search for documents that contain the phrase "jakarta apac `"jakarta apache" ! "Apache Lucene"` -[[TheStandardQueryParser-TheBooleanOperator-]] === The Boolean Operator "-" The `-` symbol or "prohibit" operator excludes documents that contain the term after the `-` symbol. @@ -327,7 +281,6 @@ For example, to search for documents that contain "jakarta apache" but not "Apac `"jakarta apache" -"Apache Lucene"` -[[TheStandardQueryParser-EscapingSpecialCharacters]] === Escaping Special Characters Solr gives the following characters special meaning when they appear in a query: @@ -341,7 +294,6 @@ To make Solr interpret any of these characters literally, rather as a special ch \(1\+1\)\:2 ---- -[[TheStandardQueryParser-GroupingTermstoFormSub-Queries]] == Grouping Terms to Form Sub-Queries Lucene/Solr supports using parentheses to group clauses to form sub-queries. This can be very useful if you want to control the Boolean logic for a query. @@ -352,15 +304,13 @@ The query below searches for either "jakarta" or "apache" and "website": This adds precision to the query, requiring that the term "website" exist, along with either the term "jakarta" or the term "apache." -[[TheStandardQueryParser-GroupingClauseswithinaField]] === Grouping Clauses within a Field To apply two or more Boolean operators to a single field in a search, group the Boolean clauses within parentheses. 
For example, the query below searches for a title field that contains both the word "return" and the phrase "pink panther": `title:(+return +"pink panther")` -[[TheStandardQueryParser-Comments]] -== Comments +== Comments in Queries C-Style comments are supported in query strings. @@ -370,7 +320,6 @@ Example: Comments may be nested. -[[TheStandardQueryParser-DifferencesbetweenLuceneQueryParserandtheSolrStandardQueryParser]] == Differences between Lucene Query Parser and the Solr Standard Query Parser Solr's standard query parser differs from the Lucene Query Parser in the following ways: @@ -399,7 +348,6 @@ This can even be used to cache individual clauses of complex filter queries. In * Constant score queries are created with `^=`, which sets the entire clause to the specified score for any documents matching that clause: ** `q=(description:blue color:blue)^=1.0 title:blue^=5.0` -[[TheStandardQueryParser-SpecifyingDatesandTimes]] === Specifying Dates and Times Queries against fields using the `TrieDateField` type (typically range queries) should use the <>: @@ -410,9 +358,3 @@ Queries against fields using the `TrieDateField` type (typically range queries) * `pubdate:[NOW-1YEAR/DAY TO NOW/DAY+1DAY]` * `createdate:[1976-03-06T23:59:59.999Z TO 1976-03-06T23:59:59.999Z+1YEAR]` * `createdate:[1976-03-06T23:59:59.999Z/YEAR TO 1976-03-06T23:59:59.999Z]` - -[[TheStandardQueryParser-RelatedTopics]] -== Related Topics - -* <> -* <> diff --git a/solr/solr-ref-guide/src/the-stats-component.adoc b/solr/solr-ref-guide/src/the-stats-component.adoc index ada56a86e1a..b3545055cf0 100644 --- a/solr/solr-ref-guide/src/the-stats-component.adoc +++ b/solr/solr-ref-guide/src/the-stats-component.adoc @@ -192,4 +192,4 @@ Here we compute some statistics for the price field. The min, max, mean, 90th, a Sets of `stats.field` parameters can be referenced by `'tag'` when using Pivot Faceting to compute multiple statistics at every level (i.e.: field) in the tree of pivot constraints. 
-For more information and a detailed example, please see <>. +For more information and a detailed example, please see <>.