diff --git a/solr/solr-ref-guide/src/blockjoin-faceting.adoc b/solr/solr-ref-guide/src/blockjoin-faceting.adoc index bf33aca1961..1a89a570071 100644 --- a/solr/solr-ref-guide/src/blockjoin-faceting.adoc +++ b/solr/solr-ref-guide/src/blockjoin-faceting.adoc @@ -102,14 +102,15 @@ Queries are constructed the same way as for a <>. ++ +At most one of the `min`, `max`, or `sort` (see below) parameters may be specified. ++ +If none are specified, the group head document of each group will be selected based on the highest scoring document in that group. The default is none. -At most only one of the min, max, or sort (see below) parameters may be specified. - -If none are specified, the group head document of each group will be selected based on the highest scoring document in that group. |none -|sort a| +sort:: Selects the group head document for each group based on which document comes first according to the specified <>. ++ +At most one of the `min`, `max` (see above), or `sort` parameters may be specified. ++ +If none are specified, the group head document of each group will be selected based on the highest scoring document in that group. The default is none. -At most only one of the min, max, (see above) or sort parameters may be specified. +nullPolicy:: +There are three available null policies: ++ +* `ignore`: removes documents with a null value in the collapse field. +* `expand`: treats each document with a null value in the collapse field as a separate group. +* `collapse`: collapses all documents with a null value into a single group using either highest score, or minimum/maximum. ++ +The default is `ignore`. -If none are specified, the group head document of each group will be selected based on the highest scoring document in that group. |none -|nullPolicy a| -There are three null policies: +hint:: +Currently there is only one hint available: `top_fc`, which stands for top level FieldCache.
++ +The `top_fc` hint is only available when collapsing on String fields. `top_fc` usually provides the best query time speed but takes the longest to warm on startup or following a commit. `top_fc` will also result in having the collapsed field cached in memory twice if it's used for faceting or sorting. For very high cardinality (high distinct count) fields, `top_fc` may not fare so well. ++ +The default is none. -* *ignore*: removes documents with a null value in the collapse field. This is the default. -* *expand*: treats each document with a null value in the collapse field as a separate group. -* *collapse*: collapses all documents with a null value into a single group using either highest score, or minimum/maximum. +size:: +Sets the initial size of the collapse data structures when collapsing on a *numeric field only*. ++ +The data structures used for collapsing grow dynamically when collapsing on numeric fields. Setting the size above the number of results expected in the result set will eliminate the resizing cost. ++ +The default is 100,000. - |ignore -|hint |Currently there is only one hint available: `top_fc`, which stands for top level FieldCache. The `top_fc` hint is only available when collapsing on String fields. `top_fc` usually provides the best query time speed but takes the longest to warm on startup or following a commit. `top_fc` will also result in having the collapsed field cached in memory twice if it's used for faceting or sorting. For very high cardinality (high distinct count) fields, `top_fc` may not fare so well. |none -|size |Sets the initial size of the collapse data structures when collapsing on a *numeric field only*. The data structures used for collapsing grow dynamically when collapsing on numeric fields. Setting the size above the number of results expected in the result set will eliminate the resizing cost. 
|100,000 -|=== -*Sample Syntax:* +=== Sample Syntax Collapse on `group_field` selecting the document in each group with the highest scoring document: @@ -137,13 +148,14 @@ Inside the expanded section there is a _map_ with each group head pointing to th The ExpandComponent has the following parameters: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +expand.sort:: +Orders the documents within the expanded groups. The default is `score desc`. -[cols="20,60,20",options="header"] -|=== -|Parameter |Description |Default -|expand.sort |Orders the documents within the expanded groups |score desc -|expand.rows |The number of rows to display in each group |5 -|expand.q |Overrides the main q parameter, determines which documents to include in the main group. |main q -|expand.fq |Overrides main fq's, determines which documents to include in the main group. |main fq's -|=== +expand.rows:: +The number of rows to display in each group. The default is 5 rows. + +expand.q:: +Overrides the main query (`q`), determining which documents to include in the main group. The default is to use the main query. + +expand.fq:: +Overrides the main filter queries (`fq`), determining which documents to include in the main group. The default is to use the main filter queries. diff --git a/solr/solr-ref-guide/src/configsets-api.adoc b/solr/solr-ref-guide/src/configsets-api.adoc index 2bd50ffc4ba..603e08e097e 100644 --- a/solr/solr-ref-guide/src/configsets-api.adoc +++ b/solr/solr-ref-guide/src/configsets-api.adoc @@ -46,15 +46,13 @@ Create a ConfigSet, based on an existing ConfigSet. [[ConfigSetsAPI-Input]] === Input -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +The following parameters are supported when creating a ConfigSet.
-[cols="25,10,10,10,45",options="header"] -|=== -|Key |Type |Required |Default |Description -|name |String |Yes | |ConfigSet to be created -|baseConfigSet |String |Yes | |ConfigSet to copy as a base -|configSetProp._name=value_ |String |No | |ConfigSet property from base to override -|=== +name:: The ConfigSet to be created. This parameter is required. + +baseConfigSet:: The ConfigSet to copy as a base. This parameter is required. + +configSetProp._name_=_value_:: Any ConfigSet property from base to override. [[ConfigSetsAPI-Output]] === Output @@ -101,13 +99,7 @@ Delete a ConfigSet *Query Parameters* -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed - -[cols="20,15,10,15,40",options="header"] -|=== -|Key |Type |Required |Default |Description -|name |String |Yes | |ConfigSet to be deleted -|=== +name:: The ConfigSet to be deleted. This parameter is required. [[ConfigSetsAPI-Output.1]] === Output @@ -184,13 +176,7 @@ Upload a ConfigSet, sent in as a zipped file. Please note that a ConfigSet is up [[ConfigSetsAPI-Input.3]] === Input -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed - -[cols="20,15,10,15,40",options="header"] -|=== -|Key |Type |Required |Default |Description -|name |String |Yes | |ConfigSet to be created -|=== +name:: The ConfigSet to be created when the upload is complete. This parameter is required. The body of the request should contain a zipped config set. 
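The create, upload, and delete operations described above map to plain HTTP calls; here is a sketch using curl, in which the host, port, ConfigSet names, and the overridden property are all illustrative and assume a running Solr node:

```shell
# Create a new ConfigSet from an existing base, overriding one property:
curl "http://localhost:8983/solr/admin/configs?action=CREATE&name=myConfigSet&baseConfigSet=predefinedTemplate&configSetProp.immutable=false"

# Upload a zipped ConfigSet under a given name:
curl -X POST --header "Content-Type: application/octet-stream" --data-binary @myconfigset.zip "http://localhost:8983/solr/admin/configs?action=UPLOAD&name=myConfigSet"

# Delete a ConfigSet:
curl "http://localhost:8983/solr/admin/configs?action=DELETE&name=myConfigSet"
```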
diff --git a/solr/solr-ref-guide/src/cross-data-center-replication-cdcr.adoc b/solr/solr-ref-guide/src/cross-data-center-replication-cdcr.adoc index 3bd482b623f..67729557a5a 100644 --- a/solr/solr-ref-guide/src/cross-data-center-replication-cdcr.adoc +++ b/solr/solr-ref-guide/src/cross-data-center-replication-cdcr.adoc @@ -252,15 +252,15 @@ The configuration details, defaults and options are as follows: CDCR can be configured to forward update requests to one or more replicas. A replica is defined with a “replica” list as follows: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed -[cols="20,10,15,55",options="header"] -|=== -|Parameter |Required |Default |Description -|zkHost |Yes |none |The host address for ZooKeeper of the target SolrCloud. Usually this is a comma-separated list of addresses to each node in the target ZooKeeper ensemble. -|Source |Yes |none |The name of the collection on the Source SolrCloud to be replicated. -|Target |Yes |none |The name of the collection on the target SolrCloud to which updates will be forwarded. -|=== +`zkHost`:: +The host address for ZooKeeper of the target SolrCloud. Usually this is a comma-separated list of addresses to each node in the target ZooKeeper ensemble. This parameter is required. + +`Source`:: +The name of the collection on the Source SolrCloud to be replicated. This parameter is required. + +`Target`:: +The name of the collection on the target SolrCloud to which updates will be forwarded. This parameter is required. ==== The Replicator Element @@ -268,39 +268,28 @@ The CDC Replicator is the component in charge of forwarding updates to the repli The replicator uses a fixed thread pool to forward updates to multiple replicas in parallel. If more than one replica is configured, one thread will forward a batch of updates from one replica at a time in a round-robin fashion. 
The replicator can be configured with a “replicator” list as follows: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`threadPoolSize`:: +The number of threads to use for forwarding updates. One thread per replica is recommended. The default is `2`. -[cols="20,10,15,55",options="header"] -|=== -|Parameter |Required |Default |Description -|threadPoolSize |No |2 |The number of threads to use for forwarding updates. One thread per replica is recommended. -|schedule |No |10 |The delay in milliseconds for the monitoring the update log(s). -|batchSize |No |128 |The number of updates to send in one batch. The optimal size depends on the size of the documents. Large batches of large documents can increase your memory usage significantly. -|=== +`schedule`:: +The delay in milliseconds for monitoring the update log(s). The default is `10`. + +`batchSize`:: +The number of updates to send in one batch. The optimal size depends on the size of the documents. Large batches of large documents can increase your memory usage significantly. The default is `128`. ==== The updateLogSynchronizer Element Expert: Non-leader nodes need to synchronize their update logs with their leader node from time to time in order to clean deprecated transaction log files. By default, such a synchronization process is performed every minute. The schedule of the synchronization can be modified with a “updateLogSynchronizer” list as follows: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed - -[cols="20,10,15,55",options="header"] -|=== -|Parameter |Required |Default |Description -|schedule |No |60000 |The delay in milliseconds for synchronizing the updates log. -|=== +`schedule`:: +The delay in milliseconds for synchronizing the update log. The default is `60000`. ==== The Buffer Element CDCR is configured by default to buffer any new incoming updates.
When buffering updates, the updates log will store all the updates indefinitely. Replicas do not need to buffer updates, and it is recommended to disable buffer on the target SolrCloud. The buffer can be disabled at startup with a “buffer” list and the parameter “defaultState” as follows: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed - -[cols="20,10,15,55",options="header"] -|=== -|Parameter |Required |Default |Description -|defaultState |No |enabled |The state of the buffer at startup. -|=== +`defaultState`:: +The state of the buffer at startup. The default is `enabled`. == CDCR API diff --git a/solr/solr-ref-guide/src/de-duplication.adoc b/solr/solr-ref-guide/src/de-duplication.adoc index 8f4d01a2c9d..3e9cd46a141 100644 --- a/solr/solr-ref-guide/src/de-duplication.adoc +++ b/solr/solr-ref-guide/src/de-duplication.adoc @@ -20,17 +20,12 @@ If duplicate, or near-duplicate documents are a concern in your index, de-duplication may be worth implementing. -Preventing duplicate or near duplicate documents from entering an index or tagging documents with a signature/fingerprint for duplicate field collapsing can be efficiently achieved with a low collision or fuzzy hash algorithm. Solr natively supports de-duplication techniques of this type via the `Signature` class and allows for the easy addition of new hash/signature implementations. A Signature can be implemented several ways: +Preventing duplicate or near duplicate documents from entering an index or tagging documents with a signature/fingerprint for duplicate field collapsing can be efficiently achieved with a low collision or fuzzy hash algorithm. Solr natively supports de-duplication techniques of this type via the `Signature` class and allows for the easy addition of new hash/signature implementations. 
A Signature can be implemented in a few ways: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +* MD5Signature: 128-bit hash used for exact duplicate detection. +* Lookup3Signature: 64-bit hash used for exact duplicate detection. This is much faster than MD5 and smaller to index. +* http://wiki.apache.org/solr/TextProfileSignature[TextProfileSignature]: Fuzzy hashing implementation from Apache Nutch for near duplicate detection. It's tunable but works best on longer text. -[cols="30,70",options="header"] -|=== -|Method |Description -|MD5Signature |128-bit hash used for exact duplicate detection. -|Lookup3Signature |64-bit hash used for exact duplicate detection. This is much faster than MD5 and smaller to index. -|http://wiki.apache.org/solr/TextProfileSignature[TextProfileSignature] |Fuzzy hashing implementation from Apache Nutch for near duplicate detection. It's tunable but works best on longer text. -|=== Other, more sophisticated algorithms for fuzzy/near hashing can be added later. @@ -68,23 +63,27 @@ The `SignatureUpdateProcessorFactory` has to be registered in `solrconfig.xml` a The `SignatureUpdateProcessorFactory` takes several properties: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed - -[cols="20,30,50",options="header"] -|=== -|Parameter |Default |Description -|signatureClass |`org.apache.solr.update.processor.Lookup3Signature` a| -A Signature implementation for generating a signature hash. The full classpath of the implementation must be specified. The available options are described above, the associated classpaths to use are: +signatureClass:: +A Signature implementation for generating a signature hash. The default is `org.apache.solr.update.processor.Lookup3Signature`. ++ +The full classpath of the implementation must be specified. 
The available options are described above, the associated classpaths to use are: * `org.apache.solr.update.processor.Lookup3Signature` * `org.apache.solr.update.processor.MD5Signature` * `org.apache.solr.update.processor.TextProfileSignature` -|fields |all fields |The fields to use to generate the signature hash in a comma separated list. By default, all fields on the document will be used. -|signatureField |signatureField |The name of the field used to hold the fingerprint/signature. The field should be defined in schema.xml. -|enabled |true |Enable/disable de-duplication processing. -|overwriteDupes |true |If true, when a document exists that already matches this signature, it will be overwritten. -|=== +fields:: +The fields to use to generate the signature hash in a comma-separated list. By default, all fields on the document will be used. + +signatureField:: +The name of the field used to hold the fingerprint/signature. The field should be defined in `schema.xml`. The default is `signatureField`. + +enabled:: +Set to *false* to disable de-duplication processing. The default is *true*. + +overwriteDupes:: +If `true` (the default), when a document exists that already matches this signature, it will be overwritten. + [[De-Duplication-Inschema.xml]] === In schema.xml diff --git a/solr/solr-ref-guide/src/defining-core-properties.adoc b/solr/solr-ref-guide/src/defining-core-properties.adoc index 93a3d3ea3c7..a533098609e 100644 --- a/solr/solr-ref-guide/src/defining-core-properties.adoc +++ b/solr/solr-ref-guide/src/defining-core-properties.adoc @@ -70,26 +70,32 @@ The minimal `core.properties` file is an empty file, in which case all of the pr Java properties files allow the hash (`#`) or bang (`!`) characters to specify comment-to-end-of-line.
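For illustration, a small hand-written `core.properties` file might look like the following sketch; the core name and paths are hypothetical:

```properties
# '#' and '!' both begin comments in Java properties files
name=techproducts
config=solrconfig.xml
schema=schema.xml
dataDir=data
loadOnStartup=true
```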
-This table defines the recognized properties: +The following properties are available: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`name`:: The name of the SolrCore. You'll use this name to reference the SolrCore when running commands with the CoreAdminHandler. -[cols="25,75",options="header"] -|=== -|Property |Description -|`name` |The name of the SolrCore. You'll use this name to reference the SolrCore when running commands with the CoreAdminHandler. -|`config` |The configuration file name for a given core. The default is `solrconfig.xml`. -|`schema` |The schema file name for a given core. The default is `schema.xml` but please note that if you are using a "managed schema" (the default behavior) then any value for this property which does not match the effective `managedSchemaResourceName` will be read once, backed up, and converted for managed schema use. See <> for details. -|`dataDir` |The core's data directory (where indexes are stored) as either an absolute pathname, or a path relative to the value of `instanceDir`. This is `data` by default. -|`configSet` |The name of a defined configset, if desired, to use to configure the core (see the <> for more details). -|`properties` |The name of the properties file for this core. The value can be an absolute pathname or a path relative to the value of `instanceDir`. -|`transient` |If *true*, the core can be unloaded if Solr reaches the `transientCacheSize`. The default if not specified is *false*. Cores are unloaded in order of least recently used first. _Setting to *true* is not recommended in SolrCloud mode._ -|`loadOnStartup` |If *true*, the default if it is not specified, the core will loaded when Solr starts. _Setting to *false* is not recommended in SolrCloud mode._ -|`coreNodeName` |Used only in SolrCloud, this is a unique identifier for the node hosting this replica. 
By default a coreNodeName is generated automatically, but setting this attribute explicitly allows you to manually assign a new core to replace an existing replica. For example: when replacing a machine that has had a hardware failure by restoring from backups on a new machine with a new hostname or port.. -|`ulogDir` |The absolute or relative directory for the update log for this core (SolrCloud). -|`shard` |The shard to assign this core to (SolrCloud). -|`collection` |The name of the collection this core is part of (SolrCloud). -|`roles` |Future param for SolrCloud or a way for users to mark nodes for their own use. -|=== +`config`:: The configuration file name for a given core. The default is `solrconfig.xml`. -Additional "user defined" properties may be specified for use as variables. For more information on how to define local properties, see the section <>. +`schema`:: The schema file name for a given core. The default is `schema.xml` but please note that if you are using a "managed schema" (the default behavior) then any value for this property which does not match the effective `managedSchemaResourceName` will be read once, backed up, and converted for managed schema use. See <> for more details. + +`dataDir`:: The core's data directory (where indexes are stored) as either an absolute pathname, or a path relative to the value of `instanceDir`. This is `data` by default. + +`configSet`:: The name of a defined configset, if desired, to use to configure the core (see the section <> for more details). + +`properties`:: The name of the properties file for this core. The value can be an absolute pathname or a path relative to the value of `instanceDir`. + +`transient`:: If *true*, the core can be unloaded if Solr reaches the `transientCacheSize`. The default if not specified is *false*. Cores are unloaded in order of least recently used first. 
_Setting this to *true* is not recommended in SolrCloud mode._ + +`loadOnStartup`:: If *true*, the default if it is not specified, the core will be loaded when Solr starts. _Setting this to *false* is not recommended in SolrCloud mode._ + +`coreNodeName`:: Used only in SolrCloud, this is a unique identifier for the node hosting this replica. By default a `coreNodeName` is generated automatically, but setting this attribute explicitly allows you to manually assign a new core to replace an existing replica. For example, this can be useful when replacing a machine that has had a hardware failure by restoring from backups on a new machine with a new hostname or port. + +`ulogDir`:: The absolute or relative directory for the update log for this core (SolrCloud). + +`shard`:: The shard to assign this core to (SolrCloud). + +`collection`:: The name of the collection this core is part of (SolrCloud). + +`roles`:: Future parameter for SolrCloud or a way for users to mark nodes for their own use. + +Additional user-defined properties may be specified for use as variables. For more information on how to define local properties, see the section <>. diff --git a/solr/solr-ref-guide/src/defining-fields.adoc b/solr/solr-ref-guide/src/defining-fields.adoc index 4ef3a5d9e39..8e6de9c4269 100644 --- a/solr/solr-ref-guide/src/defining-fields.adoc +++ b/solr/solr-ref-guide/src/defining-fields.adoc @@ -33,15 +33,16 @@ The following example defines a field named `price` with a type named `float` an [[DefiningFields-FieldProperties]] == Field Properties -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +Field definitions can have the following properties: -[cols="30,70",options="header"] -|=== -|Property |Description -|name |The name of the field. Field names should consist of alphanumeric or underscore characters only and not start with a digit.
This is not currently strictly enforced, but other field names will not have first class support from all components and back compatibility is not guaranteed. Names with both leading and trailing underscores (e.g., `\_version_`) are reserved. Every field must have a `name`. -|type |The name of the `fieldType` for this field. This will be found in the `name` attribute on the `fieldType` definition. Every field must have a `type`. -|default |A default value that will be added automatically to any document that does not have a value in this field when it is indexed. If this property is not specified, there is no default. -|=== +`name`:: +The name of the field. Field names should consist of alphanumeric or underscore characters only and not start with a digit. This is not currently strictly enforced, but other field names will not have first class support from all components and back compatibility is not guaranteed. Names with both leading and trailing underscores (e.g., `\_version_`) are reserved. Every field must have a `name`. + +`type`:: +The name of the `fieldType` for this field. This will be found in the `name` attribute on the `fieldType` definition. Every field must have a `type`. + +`default`:: +A default value that will be added automatically to any document that does not have a value in this field when it is indexed. If this property is not specified, there is no default. [[DefiningFields-OptionalFieldTypeOverrideProperties]] == Optional Field Type Override Properties @@ -70,4 +71,3 @@ Fields can have many of the same properties as field types. 
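Taken together, these properties appear as attributes on a `<field>` element in the schema; the `price` example mentioned above would look roughly like this (the `indexed` and `stored` attributes are shown only for illustration):

```xml
<field name="price" type="float" default="0.0" indexed="true" stored="true"/>
```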
Properties from the |=== // TODO: SOLR-10655 END - diff --git a/solr/solr-ref-guide/src/detecting-languages-during-indexing.adoc b/solr/solr-ref-guide/src/detecting-languages-during-indexing.adoc index b73fdf77777..4003f1ac914 100644 --- a/solr/solr-ref-guide/src/detecting-languages-during-indexing.adoc +++ b/solr/solr-ref-guide/src/detecting-languages-during-indexing.adoc @@ -71,28 +71,75 @@ Here is an example of a minimal LangDetect `langid` configuration in `solrconfig As previously mentioned, both implementations of the `langid` UpdateRequestProcessor take the same parameters. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`langid`:: +When `true`, the default, enables language detection. -[cols="30,10,10,10,40",options="header"] -|=== -|Parameter |Type |Default |Required |Description -|langid |Boolean |true |no |Enables and disables language detection. -|langid.fl |string |none |yes |A comma- or space-delimited list of fields to be processed by `langid`. -|langid.langField |string |none |yes |Specifies the field for the returned language code. -|langid.langsField |multivalued string |none |no |Specifies the field for a list of returned language codes. If you use `langid.map.individual`, each detected language will be added to this field. -|langid.overwrite |Boolean |false |no |Specifies whether the content of the `langField` and `langsField` fields will be overwritten if they already contain values. -|langid.lcmap |string |none |false |A space-separated list specifying colon delimited language code mappings to apply to the detected languages. For example, you might use this to map Chinese, Japanese, and Korean to a common `cjk` code, and map both American and British English to a single `en` code by using `langid.lcmap=ja:cjk zh:cjk ko:cjk en_GB:en en_US:en`. 
This affects both the values put into the `langField` and `langsField` fields, as well as the field suffixes when using `langid.map`, unless overridden by `langid.map.lcmap` -|langid.threshold |float |0.5 |no |Specifies a threshold value between 0 and 1 that the language identification score must reach before `langid` accepts it. With longer text fields, a high threshold such at 0.8 will give good results. For shorter text fields, you may need to lower the threshold for language identification, though you will be risking somewhat lower quality results. We recommend experimenting with your data to tune your results. -|langid.whitelist |string |none |no |Specifies a list of allowed language identification codes. Use this in combination with `langid.map` to ensure that you only index documents into fields that are in your schema. -|langid.map |Boolean |false |no |Enables field name mapping. If true, Solr will map field names for all fields listed in `langid.fl`. -|langid.map.fl |string |none |no |A comma-separated list of fields for `langid.map` that is different than the fields specified in `langid.fl`. -|langid.map.keepOrig |Boolean |false |no |If true, Solr will copy the field during the field name mapping process, leaving the original field in place. -|langid.map.individual |Boolean |false |no |If true, Solr will detect and map languages for each field individually. -|langid.map.individual.fl |string |none |no |A comma-separated list of fields for use with `langid.map.individual` that is different than the fields specified in `langid.fl`. -|langid.fallbackFields |string |none |no |If no language is detected that meets the `langid.threshold` score, or if the detected language is not on the `langid.whitelist`, this field specifies language codes to be used as fallback values. If no appropriate fallback languages are found, Solr will use the language code specified in `langid.fallback`. 
-|langid.fallback |string |none |no |Specifies a language code to use if no language is detected or specified in `langid.fallbackFields`. -|langid.map.lcmap |string |determined by `langid.lcmap` |no |A space-separated list specifying colon delimited language code mappings to use when mapping field names. For example, you might use this to make Chinese, Japanese, and Korean language fields use a common `*_cjk` suffix, and map both American and British English fields to a single `*_en` by using `langid.map.lcmap=ja:cjk zh:cjk ko:cjk en_GB:en en_US:en`. -|langid.map.pattern |Java regular expression |none |no |By default, fields are mapped as _. To change this pattern, you can specify a Java regular expression in this parameter. -|langid.map.replace |Java replace |none |no |By default, fields are mapped as _. To change this pattern, you can specify a Java replace in this parameter. -|langid.enforceSchema |Boolean |true |no |If false, the `langid` processor does not validate field names against your schema. This may be useful if you plan to rename or delete fields later in the UpdateChain. -|=== +`langid.fl`:: +A comma- or space-delimited list of fields to be processed by `langid`. This parameter is required. + +`langid.langField`:: +Specifies the field for the returned language code. This parameter is required. + +`langid.langsField`:: +Specifies the field for a list of returned language codes. If you use `langid.map.individual`, each detected language will be added to this field. + +`langid.overwrite`:: +Specifies whether the content of the `langField` and `langsField` fields will be overwritten if they already contain values. The default is `false`. + +`langid.lcmap`:: +A space-separated list specifying colon delimited language code mappings to apply to the detected languages. 
++ +For example, you might use this to map Chinese, Japanese, and Korean to a common `cjk` code, and map both American and British English to a single `en` code by using `langid.lcmap=ja:cjk zh:cjk ko:cjk en_GB:en en_US:en`. ++ +This affects both the values put into the `langField` and `langsField` fields, as well as the field suffixes when using `langid.map`, unless overridden by `langid.map.lcmap`. + +`langid.threshold`:: +Specifies a threshold value between 0 and 1 that the language identification score must reach before `langid` accepts it. ++ +With longer text fields, a high threshold such as `0.8` will give good results. For shorter text fields, you may need to lower the threshold for language identification, though you will be risking somewhat lower quality results. We recommend experimenting with your data to tune your results. ++ +The default is `0.5`. + +`langid.whitelist`:: +Specifies a list of allowed language identification codes. Use this in combination with `langid.map` to ensure that you only index documents into fields that are in your schema. + +`langid.map`:: +Enables field name mapping. If `true`, Solr will map field names for all fields listed in `langid.fl`. The default is `false`. + +`langid.map.fl`:: +A comma-separated list of fields for `langid.map` that is different than the fields specified in `langid.fl`. + +`langid.map.keepOrig`:: +If `true`, Solr will copy the field during the field name mapping process, leaving the original field in place. The default is `false`. + +`langid.map.individual`:: +If `true`, Solr will detect and map languages for each field individually. The default is `false`. + +`langid.map.individual.fl`:: +A comma-separated list of fields for use with `langid.map.individual` that is different than the fields specified in `langid.fl`. + +`langid.fallback`:: +Specifies a language code to use if no language is detected or specified in `langid.fallbackFields`. 
+ +`langid.fallbackFields`:: +If no language is detected that meets the `langid.threshold` score, or if the detected language is not on the `langid.whitelist`, this field specifies language codes to be used as fallback values. ++ +If no appropriate fallback languages are found, Solr will use the language code specified in `langid.fallback`. + +`langid.map.lcmap`:: +A space-separated list specifying colon-delimited language code mappings to use when mapping field names. ++ +For example, you might use this to make Chinese, Japanese, and Korean language fields use a common `*_cjk` suffix, and map both American and British English fields to a single `*_en` by using `langid.map.lcmap=ja:cjk zh:cjk ko:cjk en_GB:en en_US:en`. ++ +A list defined with this parameter will override any configuration set with `langid.lcmap`. + +`langid.map.pattern`:: +By default, fields are mapped as `<field>_<language>`. To change this pattern, you can specify a Java regular expression in this parameter. + +`langid.map.replace`:: +By default, fields are mapped as `<field>_<language>`. To change this pattern, you can specify a Java replace in this parameter. + +`langid.enforceSchema`:: +If `false`, the `langid` processor does not validate field names against your schema. This may be useful if you plan to rename or delete fields later in the UpdateChain. ++ +The default is `true`. diff --git a/solr/solr-ref-guide/src/distributed-requests.adoc b/solr/solr-ref-guide/src/distributed-requests.adoc index 75f023c1c0c..b89878fa0c5 100644 --- a/solr/solr-ref-guide/src/distributed-requests.adoc +++ b/solr/solr-ref-guide/src/distributed-requests.adoc @@ -91,21 +91,32 @@ To configure the standard handler, provide a configuration like this in `solrcon The parameters that can be specified are as follows: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`socketTimeout`:: +The amount of time in ms that a socket is allowed to wait.
The default is `0`, where the operating system's default will be used. -[cols="20,15,65",options="header"] -|=== -|Parameter |Default |Explanation -|`socketTimeout` |0 (use OS default) |The amount of time in ms that a socket is allowed to wait. -|`connTimeout` |0 (use OS default) |The amount of time in ms that is accepted for binding / connecting a socket -|`maxConnectionsPerHost` |20 |The maximum number of concurrent connections that is made to each individual shard in a distributed search. -|`maxConnections` |`10000` |The total maximum number of concurrent connections in distributed searches. -|`corePoolSize` |0 |The retained lowest limit on the number of threads used in coordinating distributed search. -|`maximumPoolSize` |Integer.MAX_VALUE |The maximum number of threads used for coordinating distributed search. -|`maxThreadIdleTime` |5 seconds |The amount of time to wait for before threads are scaled back in response to a reduction in load. -|`sizeOfQueue` |-1 |If specified, the thread pool will use a backing queue instead of a direct handoff buffer. High throughput systems will want to configure this to be a direct hand off (with -1). Systems that desire better latency will want to configure a reasonable size of queue to handle variations in requests. -|`fairnessPolicy` |false |Chooses the JVM specifics dealing with fair policy queuing, if enabled distributed searches will be handled in a First in First out fashion at a cost to throughput. If disabled throughput will be favored over latency. -|=== +`connTimeout`:: +The amount of time in ms that is accepted for binding / connecting a socket. The default is `0`, where the operating system's default will be used. + +`maxConnectionsPerHost`:: +The maximum number of concurrent connections that are made to each individual shard in a distributed search. The default is `20`. + +`maxConnections`:: +The total maximum number of concurrent connections in distributed searches.
The default is `10000`. + +`corePoolSize`:: +The retained lowest limit on the number of threads used in coordinating distributed search. The default is `0`. + +`maximumPoolSize`:: +The maximum number of threads used for coordinating distributed search. The default is `Integer.MAX_VALUE`. + +`maxThreadIdleTime`:: +The amount of time in seconds to wait before threads are scaled back in response to a reduction in load. The default is `5`. + +`sizeOfQueue`:: +If specified, the thread pool will use a backing queue instead of a direct handoff buffer. High throughput systems will want to configure this to be a direct hand off (with `-1`). Systems that desire better latency will want to configure a reasonable size of queue to handle variations in requests. The default is `-1`. + +`fairnessPolicy`:: +Chooses the JVM specifics dealing with fair policy queuing; if enabled, distributed searches will be handled in a First in First out fashion at a cost to throughput. If disabled, throughput will be favored over latency. The default is `false`. [[DistributedRequests-ConfiguringstatsCache_DistributedIDF_]] == Configuring statsCache (Distributed IDF) diff --git a/solr/solr-ref-guide/src/field-type-definitions-and-properties.adoc b/solr/solr-ref-guide/src/field-type-definitions-and-properties.adoc index 12d291306cf..89b8e9062ad 100644 --- a/solr/solr-ref-guide/src/field-type-definitions-and-properties.adoc +++ b/solr/solr-ref-guide/src/field-type-definitions-and-properties.adoc @@ -75,19 +75,33 @@ The properties that can be specified for a given field type fall into three majo === General Properties -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +These are the general properties for fields: + +`name`:: +The name of the fieldType. This value gets used in field definitions, in the "type" attribute.
It is strongly recommended that names consist of alphanumeric or underscore characters only and not start with a digit. This is not currently strictly enforced. + +`class`:: +The class name that gets used to store and index the data for this type. Note that you may prefix included class names with "solr." and Solr will automatically figure out which packages to search for the class - so `solr.TextField` will work. ++ +If you are using a third-party class, you will probably need to have a fully qualified class name. The fully qualified equivalent for `solr.TextField` is `org.apache.solr.schema.TextField`. + +`positionIncrementGap`:: +For multivalued fields, specifies a distance between multiple values, which prevents spurious phrase matches. + +`autoGeneratePhraseQueries`:: For text fields. If `true`, Solr automatically generates phrase queries for adjacent terms. If `false`, terms must be enclosed in double-quotes to be treated as phrases. + +`enableGraphQueries`:: +For text fields, applicable when querying with <>. Use `true` (the default) for field types with query analyzers including graph-aware filters, e.g., <> and <>. ++ +Use `false` for field types with query analyzers including filters that can match docs when some tokens are missing, e.g., <>. + +[[FieldTypeDefinitionsandProperties-docValuesFormat]] +`docValuesFormat`:: +Defines a custom `DocValuesFormat` to use for fields of this type. This requires that a schema-aware codec, such as the `SchemaCodecFactory` has been configured in solrconfig.xml. + +`postingsFormat`:: +Defines a custom `PostingsFormat` to use for fields of this type. This requires that a schema-aware codec, such as the `SchemaCodecFactory` has been configured in solrconfig.xml. -[cols="30,40,30",options="header"] -|=== -|Property |Description |Values -|name |The name of the fieldType. This value gets used in field definitions, in the "type" attribute. 
It is strongly recommended that names consist of alphanumeric or underscore characters only and not start with a digit. This is not currently strictly enforced. | -|class |The class name that gets used to store and index the data for this type. Note that you may prefix included class names with "solr." and Solr will automatically figure out which packages to search for the class - so `solr.TextField` will work. If you are using a third-party class, you will probably need to have a fully qualified class name. The fully qualified equivalent for `solr.TextField` is `org.apache.solr.schema.TextField`. | -|positionIncrementGap |For multivalued fields, specifies a distance between multiple values, which prevents spurious phrase matches |integer -|autoGeneratePhraseQueries |For text fields. If true, Solr automatically generates phrase queries for adjacent terms. If false, terms must be enclosed in double-quotes to be treated as phrases. |true or false -|enableGraphQueries |For text fields, applicable when querying with <>. Use `true` (the default) for field types with query analyzers including graph-aware filters, e.g. <> and <>. Use `false` for field types with query analyzers including filters that can match docs when some tokens are missing, e.g., <>. |true or false -|[[FieldTypeDefinitionsandProperties-docValuesFormat]]docValuesFormat |Defines a custom `DocValuesFormat` to use for fields of this type. This requires that a schema-aware codec, such as the `SchemaCodecFactory` has been configured in solrconfig.xml. |n/a -|postingsFormat |Defines a custom `PostingsFormat` to use for fields of this type. This requires that a schema-aware codec, such as the `SchemaCodecFactory` has been configured in solrconfig.xml. 
|n/a -|=== [NOTE] ==== diff --git a/solr/solr-ref-guide/src/hadoop-authentication-plugin.adoc b/solr/solr-ref-guide/src/hadoop-authentication-plugin.adoc index 2ac541a57af..1c17fbca029 100644 --- a/solr/solr-ref-guide/src/hadoop-authentication-plugin.adoc +++ b/solr/solr-ref-guide/src/hadoop-authentication-plugin.adoc @@ -41,21 +41,35 @@ For most SolrCloud or standalone Solr setups, the `HadoopAuthPlugin` should suff [[HadoopAuthenticationPlugin-PluginConfiguration]] == Plugin Configuration -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`class`:: +Should be either `solr.HadoopAuthPlugin` or `solr.ConfigurableInternodeAuthHadoopPlugin`. This parameter is required. + +`type`:: +The type of authentication scheme to be configured. See https://hadoop.apache.org/docs/stable/hadoop-auth/Configuration.html[configuration] options. This parameter is required. + +`sysPropPrefix`:: +The prefix to be used to define the Java system property for configuring the authentication mechanism. This property is required. ++ +The name of the Java system property is defined by appending the configuration parameter name to this prefix value. For example, if the prefix is `solr` then the Java system property `solr.kerberos.principal` defines the value of configuration parameter `kerberos.principal`. + +`authConfigs`:: +Configuration parameters required by the authentication scheme defined by the type property. This property is required. For more details, see https://hadoop.apache.org/docs/stable/hadoop-auth/Configuration.html[Hadoop configuration] options. + +`defaultConfigs`:: +Default values for the configuration parameters specified by the `authConfigs` property. The default values are specified as a collection of key-value pairs (i.e., `"property-name": "default_value"`). + +`enableDelegationToken`:: +If `true`, the delegation tokens functionality will be enabled. 
+ +`initKerberosZk`:: +For enabling initialization of kerberos before connecting to ZooKeeper (if applicable). + +`proxyUserConfigs`:: +Configures proxy users for the underlying Hadoop authentication mechanism. This configuration is expressed as a collection of key-value pairs (i.e., `"property-name": "default_value"`). + +`clientBuilderFactory`:: +The `HttpClientBuilderFactory` implementation used for the Solr internal communication. Only applicable for `ConfigurableInternodeAuthHadoopPlugin`. -[cols="20,15,65",options="header"] -|=== -|Parameter Name |Required |Description -|class |Yes |Should be either `solr.HadoopAuthPlugin` or `solr.ConfigurableInternodeAuthHadoopPlugin`. -|type |Yes |The type of authentication scheme to be configured. See https://hadoop.apache.org/docs/stable/hadoop-auth/Configuration.html[configuration] options. -|sysPropPrefix |Yes |The prefix to be used to define the Java system property for configuring the authentication mechanism. The name of the Java system property is defined by appending the configuration parameter name to this prefix value. For example, if the prefix is 'solr' then the Java system property 'solr.kerberos.principal' defines the value of configuration parameter 'kerberos.principal'. -|authConfigs |Yes |Configuration parameters required by the authentication scheme defined by the type property. For more details, see https://hadoop.apache.org/docs/stable/hadoop-auth/Configuration.html[Hadoop configuration] options. -|defaultConfigs |No |Default values for the configuration parameters specified by the `authConfigs` property. The default values are specified as a collection of key-value pairs (i.e., `property-name:default_value`). -|enableDelegationToken |No |Enable (or disable) the delegation tokens functionality. -|initKerberosZk |No |For enabling initialization of kerberos before connecting to ZooKeeper (if applicable).
-|proxyUserConfigs |No |Configures proxy users for the underlying Hadoop authentication mechanism. This configuration is expressed as a collection of key-value pairs (i.e., `property-name:value`). -|clientBuilderFactory |No |The `HttpClientBuilderFactory` implementation used for the Solr internal communication. Only applicable for `ConfigurableInternodeAuthHadoopPlugin`. -|=== [[HadoopAuthenticationPlugin-ExampleConfigurations]] == Example Configurations diff --git a/solr/solr-ref-guide/src/index-replication.adoc b/solr/solr-ref-guide/src/index-replication.adoc index df8e9c60371..774b78cde75 100644 --- a/solr/solr-ref-guide/src/index-replication.adoc +++ b/solr/solr-ref-guide/src/index-replication.adoc @@ -51,21 +51,33 @@ When using SolrCloud, the `ReplicationHandler` must be available via the `/repli The table below defines the key terms associated with Solr replication. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +Index:: +A Lucene index is a directory of files. These files make up the searchable and returnable data of a Solr Core. + +Distribution:: +The copying of an index from the master server to all slaves. The distribution process takes advantage of Lucene's index file structure. + +Inserts and Deletes:: +As inserts and deletes occur in the index, the directory remains unchanged. Documents are always inserted into newly created files. Documents that are deleted are not removed from the files. They are flagged in the file as deletable, and are not removed from the files until the index is optimized. + +Master and Slave:: +A Solr replication master is a single node which receives all updates initially and keeps everything organized. Solr replication slave nodes receive no updates directly; instead, all changes (such as inserts, updates, deletes, etc.) are made against the single master node.
Changes made on the master are distributed to all the slave nodes which service all query requests from the clients. + +Update:: +An update is a single change request against a single Solr instance. It may be a request to delete a document, add a new document, change a document, delete all documents matching a query, etc. Updates are handled synchronously within an individual Solr instance. + +Optimization:: +A process that compacts the index and merges segments in order to improve query performance. Optimization should only be run on the master nodes. An optimized index may give query performance gains compared to an index that has become fragmented over a period of time with many updates. Distributing an optimized index requires a much longer time than the distribution of new segments to an un-optimized index. + +Segments:: +A self contained subset of an index consisting of some documents and data structures related to the inverted index of terms in those documents. + +mergeFactor:: +A parameter that controls the number of segments in an index. For example, when mergeFactor is set to 3, Solr will fill one segment with documents until the limit maxBufferedDocs is met, then it will start a new segment. When the number of segments specified by mergeFactor is reached (in this example, 3) then Solr will merge all the segments into a single index file, then begin writing new documents to a new segment. + +Snapshot:: +A directory containing hard links to the data files of an index. Snapshots are distributed from the master nodes when the slaves pull them, "smart copying" any segments the slave node does not have in snapshot directory that contains the hard links to the most recent index data files. -[cols="30,70",options="header"] -|=== -|Term |Definition -|Index |A Lucene index is a directory of files. These files make up the searchable and returnable data of a Solr Core. -|Distribution |The copying of an index from the master server to all slaves. 
The distribution process takes advantage of Lucene's index file structure. -|Inserts and Deletes |As inserts and deletes occur in the index, the directory remains unchanged. Documents are always inserted into newly created files. Documents that are deleted are not removed from the files. They are flagged in the file, deletable, and are not removed from the files until the index is optimized. -|Master and Slave |A Solr replication master is a single node which receives all updates initially and keeps everything organized. Solr replication slave nodes receive no updates directly, instead all changes (such as inserts, updates, deletes, etc.) are made against the single master node. Changes made on the master are distributed to all the slave nodes which service all query requests from the clients. -|Update |An update is a single change request against a single Solr instance. It may be a request to delete a document, add a new document, change a document, delete all documents matching a query, etc. Updates are handled synchronously within an individual Solr instance. -|Optimization |A process that compacts the index and merges segments in order to improve query performance. Optimization should only be run on the master nodes. An optimized index may give query performance gains compared to an index that has become fragmented over a period of time with many updates. Distributing an optimized index requires a much longer time than the distribution of new segments to an un-optimized index. -|Segments |A self contained subset of an index consisting of some documents and data structures related to the inverted index of terms in those documents. -|mergeFactor |A parameter that controls the number of segments in an index. For example, when mergeFactor is set to 3, Solr will fill one segment with documents until the limit maxBufferedDocs is met, then it will start a new segment. 
When the number of segments specified by mergeFactor is reached (in this example, 3) then Solr will merge all the segments into a single index file, then begin writing new documents to a new segment. -|Snapshot |A directory containing hard links to the data files of an index. Snapshots are distributed from the master nodes when the slaves pull them, "smart copying" any segments the slave node does not have in snapshot directory that contains the hard links to the most recent index data files. -|=== [[IndexReplication-ConfiguringtheReplicationHandler]] == Configuring the ReplicationHandler @@ -80,17 +92,20 @@ In addition to `ReplicationHandler` configuration options specific to the master Before running a replication, you should set the following parameters on initialization of the handler: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`replicateAfter`:: +String specifying action after which replication should occur. Valid values are commit, optimize, or startup. There can be multiple values for this parameter. If you use "startup", you need to have a "commit" and/or "optimize" entry also if you want to trigger replication on future commits or optimizes. -[cols="30,70",options="header"] -|=== -|Name |Description -|replicateAfter |String specifying action after which replication should occur. Valid values are commit, optimize, or startup. There can be multiple values for this parameter. If you use "startup", you need to have a "commit" and/or "optimize" entry also if you want to trigger replication on future commits or optimizes. -|backupAfter |String specifying action after which a backup should occur. Valid values are commit, optimize, or startup. There can be multiple values for this parameter. It is not required for replication, it just makes a backup. -|maxNumberOfBackups |Integer specifying how many backups to keep. This can be used to delete all but the most recent N backups. 
-|confFiles |The configuration files to replicate, separated by a comma. -|commitReserveDuration |If your commits are very frequent and your network is slow, you can tweak this parameter to increase the amount of time taken to download 5Mb from the master to a slave. The default is 10 seconds. -|=== +`backupAfter`:: +String specifying action after which a backup should occur. Valid values are commit, optimize, or startup. There can be multiple values for this parameter. It is not required for replication, it just makes a backup. + +`maxNumberOfBackups`:: +Integer specifying how many backups to keep. This can be used to delete all but the most recent N backups. + +`confFiles`:: +The configuration files to replicate, separated by a comma. + +`commitReserveDuration`:: +If your commits are very frequent and your network is slow, you can tweak this parameter to increase the amount of time taken to download 5Mb from the master to a slave. The default is 10 seconds. The example below shows a possible 'master' configuration for the `ReplicationHandler`, including a fixed number of backups and an invariant setting for the `maxWriteMBPerSec` request parameter to prevent slaves from saturating its network interface @@ -203,17 +218,13 @@ When a commit or optimize operation is performed on the master, the RequestHandler reads the list of file names which are associated with each commit point. This relies on the `replicateAfter` parameter in the configuration to decide which types of events should trigger replication. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +These operations are supported: -[cols="30,70",options="header"] -|=== -|Setting on the Master |Description -|commit |Triggers replication whenever a commit is performed on the master index. -|optimize |Triggers replication whenever the master index is optimized.
-|startup |Triggers replication whenever the master index starts up. -|=== +* `commit`: Triggers replication whenever a commit is performed on the master index. +* `optimize`: Triggers replication whenever the master index is optimized. +* `startup`: Triggers replication whenever the master index starts up. -The replicateAfter parameter can accept multiple arguments. For example: +The `replicateAfter` parameter can accept multiple arguments. For example: [source,xml] ---- @@ -262,36 +273,87 @@ To correct this problem, the slave then copies all the index files from master t You can use the HTTP commands below to control the ReplicationHandler's operations. -[width="100%",options="header",] -|=== -|Command |Description -|http://_master_host:port_/solr/_core_name_/replication?command=enablereplication |Enables replication on the master for all its slaves. -|http://_master_host:port_/solr/_core_name_/replication?command=disablereplication |Disables replication on the master for all its slaves. -|http://_host:port_/solr/_core_name_/replication?command=indexversion |Returns the version of the latest replicatable index on the specified master or slave. -|http://_slave_host:port_/solr/_core_name_/replication?command=fetchindex |Forces the specified slave to fetch a copy of the index from its master. If you like, you can pass an extra attribute such as masterUrl or compression (or any other parameter which is specified in the `` tag) to do a one time replication from a master. This obviates the need for hard-coding the master in the slave. -|http://_slave_host:port_/solr/_core_name_/replication?command=abortfetch |Aborts copying an index from a master to the specified slave. -|http://_slave_host:port_/solr/_core_name_/replication?command=enablepoll |Enables the specified slave to poll for changes on the master. -|http://_slave_host:port_/solr/_core_name_/replication?command=disablepoll |Disables the specified slave from polling for changes on the master. 
-|http://_slave_host:port_/solr/_core_name_/replication?command=details |Retrieves configuration details and current status. -|http://_host:port_/solr/_core_name_/replication?command=filelist&generation=<_generation-number_> |Retrieves a list of Lucene files present in the specified host's index. You can discover the generation number of the index by running the `indexversion` command. -|http://_master_host:port_/solr/_core_name_/replication?command=backup a| -Creates a backup on master if there are committed index data in the server; otherwise, does nothing. This command is useful for making periodic backups. +`enablereplication`:: +Enable replication on the "master" for all its slaves. ++ +[source,bash] +http://_master_host:port_/solr/_core_name_/replication?command=enablereplication -supported request parameters: +`disablereplication`:: +Disable replication on the master for all its slaves. ++ +[source,bash] +http://_master_host:port_/solr/_core_name_/replication?command=disablereplication -* `numberToKeep:` request parameter can be used with the backup command unless the `maxNumberOfBackups` initialization parameter has been specified on the handler – in which case `maxNumberOfBackups` is always used and attempts to use the `numberToKeep` request parameter will cause an error. -* `name` : (optional) Backup name . The snapshot will be created in a directory called snapshot. within the data directory of the core . By default the name is generated using date in `yyyyMMddHHmmssSSS` format. If `location` parameter is passed , that would be used instead of the data directory -* `location`: Backup location +`indexversion`:: +Return the version of the latest replicatable index on the specified master or slave. ++ +[source,bash] +http://_host:port_/solr/_core_name_/replication?command=indexversion -|http://_master_host:port_ /solr/_core_name_/replication?command=deletebackup a| -Delete any backup created using the `backup` command . 
+`fetchindex`:: +Force the specified slave to fetch a copy of the index from its master. ++ +[source,bash] +http://_slave_host:port_/solr/_core_name_/replication?command=fetchindex ++ +If you like, you can pass an extra attribute such as `masterUrl` or `compression` (or any other parameter which is specified in the `` tag) to do a one time replication from a master. This obviates the need for hard-coding the master in the slave. -Request parameters: +`abortfetch`:: +Abort copying an index from a master to the specified slave. ++ +[source,bash] +http://_slave_host:port_/solr/_core_name_/replication?command=abortfetch -* name: The name of the snapshot . A snapshot with the name snapshot. must exist .If not, an error is thrown -* location: Location where the snapshot is created +`enablepoll`:: +Enable the specified slave to poll for changes on the master. ++ +[source,bash] +http://_slave_host:port_/solr/_core_name_/replication?command=enablepoll + +`disablepoll`:: +Disable the specified slave from polling for changes on the master. ++ +[source,bash] +http://_slave_host:port_/solr/_core_name_/replication?command=disablepoll + +`details`:: +Retrieve configuration details and current status. ++ +[source,bash] +http://_slave_host:port_/solr/_core_name_/replication?command=details + +`filelist`:: +Retrieve a list of Lucene files present in the specified host's index. ++ +[source,bash] +http://_host:port_/solr/_core_name_/replication?command=filelist&generation=<_generation-number_> ++ +You can discover the generation number of the index by running the `indexversion` command. + +`backup`:: +Create a backup on master if there are committed index data in the server; otherwise, does nothing. ++ +[source,bash] +http://_master_host:port_/solr/_core_name_/replication?command=backup ++ +This command is useful for making periodic backups.
There are several supported request parameters: ++ +* `numberToKeep`: This can be used with the backup command unless the `maxNumberOfBackups` initialization parameter has been specified on the handler – in which case `maxNumberOfBackups` is always used and attempts to use the `numberToKeep` request parameter will cause an error. +* `name`: (optional) Backup name. The snapshot will be created in a directory called `snapshot.` within the data directory of the core. By default the name is generated using the date in `yyyyMMddHHmmssSSS` format. If the `location` parameter is passed, that would be used instead of the data directory. +* `location`: Backup location. + +`deletebackup`:: +Delete any backup created using the `backup` command. ++ +[source,bash] +http://_master_host:port_ /solr/_core_name_/replication?command=deletebackup ++ +There are two supported parameters: + +* `name`: The name of the snapshot. A snapshot with the name `snapshot._name_` must exist. If not, an error is thrown. +* `location`: Location where the snapshot is created. -|=== [[IndexReplication-DistributionandOptimization]] == Distribution and Optimization @@ -302,7 +364,9 @@ The time required to optimize a master index can vary dramatically. A small inde Distributing a newly optimized index may take only a few minutes or up to an hour or more, again depending on the size of the index and the performance capabilities of network connections and disks. During optimization the machine is under load and does not process queries very well. Given a schedule of updates being driven a few times an hour to the slaves, we cannot run an optimize with every committed snapshot. -Copying an optimized index means that the *entire* index will need to be transferred during the next snappull. This is a large expense, but not nearly as huge as running the optimize everywhere. Consider this example: on a three-slave one-master configuration, distributing a newly-optimized index takes approximately 80 seconds _total_.
Rolling the change across a tier would require approximately ten minutes per machine (or machine group). If this optimize were rolled across the query tier, and if each slave node being optimized were disabled and not receiving queries, a rollout would take at least twenty minutes and potentially as long as an hour and a half. Additionally, the files would need to be synchronized so that the _following_ the optimize, snappull would not think that the independently optimized files were different in any way. This would also leave the door open to independent corruption of indexes instead of each being a perfect copy of the master. +Copying an optimized index means that the *entire* index will need to be transferred during the next `snappull`. This is a large expense, but not nearly as huge as running the optimize everywhere. + +Consider this example: on a three-slave one-master configuration, distributing a newly-optimized index takes approximately 80 seconds _total_. Rolling the change across a tier would require approximately ten minutes per machine (or machine group). If this optimize were rolled across the query tier, and if each slave node being optimized were disabled and not receiving queries, a rollout would take at least twenty minutes and potentially as long as an hour and a half. Additionally, the files would need to be synchronized so that, _following_ the optimize, `snappull` would not think that the independently optimized files were different in any way. This would also leave the door open to independent corruption of indexes instead of each being a perfect copy of the master. Optimizing on the master allows for a straight-forward optimization operation. No query slaves need to be taken out of service. The optimized index can be distributed in the background as queries are being normally serviced. The optimization can occur at any time convenient to the application providing index updates.
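For instance, an optimize can be triggered on the master with an ordinary update request; the host, port, and core name below are placeholders:

[source,bash]
----
curl 'http://master_host:8983/solr/core_name/update?optimize=true'
----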
diff --git a/solr/solr-ref-guide/src/indexconfig-in-solrconfig.adoc b/solr/solr-ref-guide/src/indexconfig-in-solrconfig.adoc index ce3250355d2..63ab26dda8a 100644 --- a/solr/solr-ref-guide/src/indexconfig-in-solrconfig.adoc +++ b/solr/solr-ref-guide/src/indexconfig-in-solrconfig.adoc @@ -192,15 +192,12 @@ The maximum time to wait for a write lock on an IndexWriter. The default is 1000 There are a few other parameters that may be important to configure for your implementation. These settings affect how or when updates are made to an index. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`reopenReaders`:: Controls if IndexReaders will be re-opened, instead of closed and then opened, which is often less efficient. The default is true. + +`deletionPolicy`:: Controls how commits are retained in case of rollback. The default is `SolrDeletionPolicy`, which has sub-parameters for the maximum number of commits to keep (`maxCommitsToKeep`), the maximum number of optimized commits to keep (`maxOptimizedCommitsToKeep`), and the maximum age of any commit to keep (`maxCommitAge`), which supports `DateMathParser` syntax. + +`infoStream`:: The InfoStream setting instructs the underlying Lucene classes to write detailed debug information from the indexing process as Solr log messages. -[cols="30,70",options="header"] -|=== -|Setting |Description -|reopenReaders |Controls if IndexReaders will be re-opened, instead of closed and then opened, which is often less efficient. The default is true. -|deletionPolicy |Controls how commits are retained in case of rollback. The default is `SolrDeletionPolicy`, which has sub-parameters for the maximum number of commits to keep (`maxCommitsToKeep`), the maximum number of optimized commits to keep (`maxOptimizedCommitsToKeep`), and the maximum age of any commit to keep (`maxCommitAge`), which supports `DateMathParser` syntax. 
-|infoStream |The InfoStream setting instructs the underlying Lucene classes to write detailed debug information from the indexing process as Solr log messages. -|=== [source,xml] ---- diff --git a/solr/solr-ref-guide/src/initparams-in-solrconfig.adoc b/solr/solr-ref-guide/src/initparams-in-solrconfig.adoc index 126e96b045b..ac409ff66ce 100644 --- a/solr/solr-ref-guide/src/initparams-in-solrconfig.adoc +++ b/solr/solr-ref-guide/src/initparams-in-solrconfig.adoc @@ -44,22 +44,16 @@ This sets the default search field ("df") to be "_text_" for all of the request The syntax and semantics are similar to that of a `` . The following are the attributes -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`path`:: +A comma-separated list of paths which will use the parameters. Wildcards can be used in paths to define nested paths, as described below. -[cols="30,70",options="header"] -|=== -|Property |Description -|path |A comma-separated list of paths which will use the parameters. Wildcards can be used in paths to define nested paths, as described below. -|name a| +`name`:: The name of this set of parameters. The name can be used directly in a requestHandler definition if a path is not explicitly named. If you give your `` a name, you can refer to the params in a `` that is not defined as a path. 
- ++ For example, if an `` section has the name "myParams", you can call the name when defining your request handler: - ++ [source,xml] ----- ----- -|=== [[InitParamsinSolrConfig-Wildcards]] == Wildcards diff --git a/solr/solr-ref-guide/src/kerberos-authentication-plugin.adoc b/solr/solr-ref-guide/src/kerberos-authentication-plugin.adoc index 396243349ce..da963166f37 100644 --- a/solr/solr-ref-guide/src/kerberos-authentication-plugin.adoc +++ b/solr/solr-ref-guide/src/kerberos-authentication-plugin.adoc @@ -232,19 +232,26 @@ The main properties we are concerned with are the `keyTab` and `principal` prope While starting up Solr, the following host-specific parameters need to be passed. These parameters can be passed at the command line with the `bin/solr` start command (see <> for details on how to pass system parameters) or defined in `bin/solr.in.sh` or `bin/solr.in.cmd` as appropriate for your operating system. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`solr.kerberos.name.rules`:: +Used to map Kerberos principals to short names. Default value is `DEFAULT`. Example of a name rule: `RULE:[1:$1@$0](.\*EXAMPLE.COM)s/@.*//`. + +`solr.kerberos.cookie.domain`:: Used to issue cookies and should have the hostname of the Solr node. This parameter is required. + +`solr.kerberos.cookie.portaware`:: +When set to `true`, cookies are differentiated based on host and port, as opposed to standard cookies which are not port aware. This should be set if more than one Solr node is hosted on the same host. The default is `false`. + +`solr.kerberos.principal`:: +The service principal. This parameter is required. + +`solr.kerberos.keytab`:: +Keytab file path containing service principal credentials. This parameter is required. + +`solr.kerberos.jaas.appname`:: +The app name (section name) within the JAAS configuration file which is required for internode communication. 
Default is `Client`, which is used for ZooKeeper authentication as well. If different users are used for ZooKeeper and Solr, they will need to have separate sections in the JAAS configuration file. + +`java.security.auth.login.config`:: +Path to the JAAS configuration file for configuring a Solr client for internode communication. This parameter is required. -[cols="35,10,55",options="header"] -|=== -|Parameter Name |Required |Description -|`solr.kerberos.name.rules` |No |Used to map Kerberos principals to short names. Default value is `DEFAULT`. Example of a name rule: `RULE:[1:$1@$0](.\*EXAMPLE.COM)s/@.*//` -|`solr.kerberos.cookie.domain` |Yes |Used to issue cookies and should have the hostname of the Solr node. -|`solr.kerberos.cookie.portaware` |No |When set to true, cookies are differentiated based on host and port, as opposed to standard cookies which are not port aware. This should be set if more than one Solr node is hosted on the same host. The default is false. -|`solr.kerberos.principal` |Yes |The service principal. -|`solr.kerberos.keytab` |Yes |Keytab file path containing service principal credentials. -|`solr.kerberos.jaas.appname` |No |The app name (section name) within the JAAS configuration file which is required for internode communication. Default is `Client`, which is used for ZooKeeper authentication as well. If different users are used for ZooKeeper and Solr, they will need to have separate sections in the JAAS configuration file. -|`java.security.auth.login.config` |Yes |Path to the JAAS configuration file for configuring a Solr client for internode communication. -|=== Here is an example that could be added to `bin/solr.in.sh`. Make sure to change this example to use the right hostname and the keytab file path. @@ -279,18 +286,23 @@ There are a few use cases for Solr where this might be helpful: To enable delegation tokens, several parameters must be defined. 
These parameters can be passed at the command line with the `bin/solr` start command (see <> for details on how to pass system parameters) or defined in `bin/solr.in.sh` or `bin/solr.in.cmd` as appropriate for your operating system. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`solr.kerberos.delegation.token.enabled`:: +This is `false` by default, set to `true` to enable delegation tokens. This parameter is required if you want to enable tokens. -[cols="40,10,50",options="header"] -|=== -|Parameter Name |Required |Description -|`solr.kerberos.delegation.token.enabled` |Yes, to enable tokens |False by default, set to true to enable delegation tokens. -|`solr.kerberos.delegation.token.kind` |No |Type of delegation tokens. By default this is `solr-dt`. Likely this does not need to change. No other option is available at this time. -|`solr.kerberos.delegation.token.validity` |No |Time, in seconds, for which delegation tokens are valid. The default is 36000 seconds. -|`solr.kerberos.delegation.token.signer.secret.provider` |No |Where delegation token information is stored internally. The default is `zookeeper` which must be the location for delegation tokens to work across Solr servers (when running in SolrCloud mode). No other option is available at this time. -|`solr.kerberos.delegation.token.signer.secret.provider.zookeper.path` |No |The ZooKeeper path where the secret provider information is stored. This is in the form of the path + /security/token. The path can include the chroot or the chroot can be omitted if you are not using it. This example includes the chroot: `server1:9983,server2:9983,server3:9983/solr/security/token`. -|`solr.kerberos.delegation.token.secret.manager.znode.working.path` |No |The ZooKeeper path where token information is stored. This is in the form of the path + /security/zkdtsm. The path can include the chroot or the chroot can be omitted if you are not using it. 
This example includes the chroot: `server1:9983,server2:9983,server3:9983/solr/security/zkdtsm`. -|=== +`solr.kerberos.delegation.token.kind`:: +The type of delegation tokens. By default this is `solr-dt`. Likely this does not need to change. No other option is available at this time. + +`solr.kerberos.delegation.token.validity`:: +Time, in seconds, for which delegation tokens are valid. The default is 36000 seconds. + +`solr.kerberos.delegation.token.signer.secret.provider`:: +Where delegation token information is stored internally. The default is `zookeeper` which must be the location for delegation tokens to work across Solr servers (when running in SolrCloud mode). No other option is available at this time. + +`solr.kerberos.delegation.token.signer.secret.provider.zookeper.path`:: +The ZooKeeper path where the secret provider information is stored. This is in the form of the path + /security/token. The path can include the chroot or the chroot can be omitted if you are not using it. This example includes the chroot: `server1:9983,server2:9983,server3:9983/solr/security/token`. + +`solr.kerberos.delegation.token.secret.manager.znode.working.path`:: +The ZooKeeper path where token information is stored. This is in the form of the path + /security/zkdtsm. The path can include the chroot or the chroot can be omitted if you are not using it. This example includes the chroot: `server1:9983,server2:9983,server3:9983/solr/security/zkdtsm`. [[KerberosAuthenticationPlugin-StartSolr]] === Start Solr diff --git a/solr/solr-ref-guide/src/making-and-restoring-backups.adoc b/solr/solr-ref-guide/src/making-and-restoring-backups.adoc index 3d7a17de6b5..6f3383c1b45 100644 --- a/solr/solr-ref-guide/src/making-and-restoring-backups.adoc +++ b/solr/solr-ref-guide/src/making-and-restoring-backups.adoc @@ -37,7 +37,7 @@ Backups and restoration uses Solr's replication handler. 
Out of the box, Solr in === Backup API -The backup API requires sending a command to the `/replication` handler to back up the system. +The `backup` API requires sending a command to the `/replication` handler to back up the system. You can trigger a back-up with an HTTP command like this (replace "gettingstarted" with the name of the core you are working with): @@ -47,27 +47,28 @@ You can trigger a back-up with an HTTP command like this (replace "gettingstarte http://localhost:8983/solr/gettingstarted/replication?command=backup ---- -The backup command is an asynchronous call, and it will represent data from the latest index commit point. All indexing and search operations will continue to be executed against the index as usual. +The `backup` command is an asynchronous call, and it will represent data from the latest index commit point. All indexing and search operations will continue to be executed against the index as usual. Only one backup call can be made against a core at any point in time. While an ongoing backup operation is happening subsequent calls for restoring will throw an exception. The backup request can also take the following additional parameters: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`location`:: +The path where the backup will be created. If the path is not absolute then the backup path will be relative to Solr's instance directory. +`name`:: +The snapshot will be created in a directory called `snapshot.`. If a name is not specified then the directory name would have the following format: `snapshot.`. -[cols="30,70",options="header"] -|=== -|Parameter |Description -|location |The path where the backup will be created. If the path is not absolute then the backup path will be relative to Solr's instance directory. -|name |The snapshot will be created in a directory called `snapshot.`.
If a name is not specified then the directory name would have the following format: `snapshot.` -|numberToKeep |The number of backups to keep. If `maxNumberOfBackups` has been specified on the replication handler in `solrconfig.xml`, `maxNumberOfBackups` is always used and attempts to use `numberToKeep` will cause an error. Also, this parameter is not taken into consideration if the backup name is specified. More information about `maxNumberOfBackups` can be found in the section <>. -|repository |The name of the repository to be used for the backup. If no repository is specified then the local filesystem repository will be used automatically. -|commitName |The name of the commit which was used while taking a snapshot using the CREATESNAPSHOT command. -|=== +`numberToKeep`:: +The number of backups to keep. If `maxNumberOfBackups` has been specified on the replication handler in `solrconfig.xml`, `maxNumberOfBackups` is always used and attempts to use `numberToKeep` will cause an error. Also, this parameter is not taken into consideration if the backup name is specified. More information about `maxNumberOfBackups` can be found in the section <>. + +`repository`:: +The name of the repository to be used for the backup. If no repository is specified then the local filesystem repository will be used automatically. + +`commitName`:: +The name of the commit which was used while taking a snapshot using the CREATESNAPSHOT command. === Backup Status -The backup operation can be monitored to see if it has completed by sending the `details` command to the `/replication` handler, as in this example: +The `backup` operation can be monitored to see if it has completed by sending the `details` command to the `/replication` handler, as in this example: .Status API Example [source,text] @@ -103,25 +104,24 @@ http://localhost:8983/solr/gettingstarted/replication?command=restore&name=backu This will restore the named index snapshot into the current core. 
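A restore call is just an HTTP GET against the replication handler, so it can be issued from any client; a minimal sketch in Python, where the core name `gettingstarted` and snapshot name `backup_name` are placeholders for your own installation:

```python
from urllib.parse import urlencode

# Build the restore request shown above; the core name and snapshot
# name are placeholders for your own installation.
core = "gettingstarted"
params = {"command": "restore", "name": "backup_name"}
url = f"http://localhost:8983/solr/{core}/replication?{urlencode(params)}"
# urllib.request.urlopen(url) would issue the (asynchronous) call.
```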
Searches will start reflecting the snapshot data once the restore is complete. -The restore request can also take these additional parameters: +The `restore` request can take these additional parameters: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`location`:: +The location of the backup snapshot file. If not specified, it looks for backups in Solr's data directory. -[cols="30,70",options="header"] -|=== -|Parameter |Description -|location |The location of the backup snapshot file. If not specified, it looks for backups in Solr's data directory. -|name |The name of the backed up index snapshot to be restored. If the name is not provided it looks for backups with `snapshot.` format in the location directory. It picks the latest timestamp backup in that case. -|repository |The name of the repository to be used for the backup. If no repository is specified then the local filesystem repository will be used automatically. -|=== +`name`:: +The name of the backed up index snapshot to be restored. If the name is not provided it looks for backups with `snapshot.` format in the location directory. It picks the latest timestamp backup in that case. -The restore command is an asynchronous call. Once the restore is complete the data reflected will be of the backed up index which was restored. +`repository`:: +The name of the repository to be used for the backup. If no repository is specified then the local filesystem repository will be used automatically. -Only one restore call can can be made against a core at one point in time. While an ongoing restore operation is happening subsequent calls for restoring will throw an exception. +The `restore` command is an asynchronous call. Once the restore is complete the data reflected will be of the backed up index which was restored. + +Only one `restore` call can be made against a core at one point in time.
While an ongoing restore operation is happening subsequent calls for restoring will throw an exception. === Restore Status API -You can also check the status of a restore operation by sending the `restorestatus` command to the `/replication` handler, as in this example: +You can also check the status of a `restore` operation by sending the `restorestatus` command to the `/replication` handler, as in this example: .Status API Example [source,text] @@ -158,21 +158,18 @@ You can trigger a snapshot command with an HTTP command like this (replace "tech http://localhost:8983/solr/admin/cores?action=CREATESNAPSHOT&core=techproducts&commitName=commit1 ---- -The list snapshot request parameters are: +The `CREATESNAPSHOT` request parameters are: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`commitName`:: +The name to store the snapshot as. -[cols="30,70",options="header"] -|=== -|Parameter |Description -|commitName |Specify the commit name to store the snapshot as -|core |name of the core to perform the snapshot on -|async |Request ID to track this action which will be processed asynchronously -|=== +`core`:: The name of the core to perform the snapshot on. + +`async`:: Request ID to track this action which will be processed asynchronously. === List Snapshot API -The list snapshot functionality lists all the taken snapshots for a particular core. +The `LISTSNAPSHOTS` command lists all the snapshots taken for a particular core. You can trigger a list snapshot command with an HTTP command like this (replace "techproducts" with the name of the core you are working with): @@ -184,20 +181,17 @@ http://localhost:8983/solr/admin/cores?action=LISTSNAPSHOTS&core=techproducts&co The list snapshot request parameters are: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`core`:: +The name of the core whose snapshots we want to list.
-[cols="30,70",options="header"] -|=== -|Parameter |Description -|core |name of the core to whose snapshots we want to list -|async |Request ID to track this action which will be processed asynchronously -|=== +`async`:: +Request ID to track this action which will be processed asynchronously. === Delete Snapshot API -The delete snapshot functionality deletes a particular snapshot for a particular core. +The `DELETESNAPSHOT` command deletes a snapshot for a particular core. -You can trigger a delete snapshot command with an HTTP command like this (replace "techproducts" with the name of the core you are working with): +You can trigger a delete snapshot with an HTTP command like this (replace "techproducts" with the name of the core you are working with): .Delete Snapshot API Example [source,text] @@ -207,13 +201,15 @@ http://localhost:8983/solr/admin/cores?action=DELETESNAPSHOT&core=techproducts&c The delete snapshot request parameters are: -[width="100%",options="header",] -|=== -|Parameter |Description -|commitName |Specify the commit name to be deleted -|core |name of the core whose snapshot we want to delete -|async |Request ID to track this action which will be processed asynchronously -|=== +`commitName`:: +Specify the commit name to be deleted. + +`core`:: +The name of the core whose snapshot we want to delete. + +`async`:: +Request ID to track this action which will be processed asynchronously. + == Backup/Restore Storage Repositories diff --git a/solr/solr-ref-guide/src/mbean-request-handler.adoc b/solr/solr-ref-guide/src/mbean-request-handler.adoc index 6900cf0b0bb..eebd082e9b3 100644 --- a/solr/solr-ref-guide/src/mbean-request-handler.adoc +++ b/solr/solr-ref-guide/src/mbean-request-handler.adoc @@ -22,16 +22,17 @@ The MBean Request Handler offers programmatic access to the information provided The MBean Request Handler accepts the following parameters: -// TODO: Change column width to %autowidth.spread when
https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`key`:: +Restricts results by object key. This parameter is multivalued; by default, all keys are returned. -[cols="10,20,10,60",options="header"] -|=== -|Parameter |Type |Default |Description -|key |multivalued |all |Restricts results by object key. -|cat |multivalued |all |Restricts results by category name. -|stats |boolean |false |Specifies whether statistics are returned with results. You can override the `stats` parameter on a per-field basis. -|wt |multivalued |xml |The output format. This operates the same as the <>. -|=== +`cat`:: +Restricts results by category name. This parameter is multivalued; by default, all categories are returned. + +`stats`:: +Specifies whether statistics are returned with results. You can override the `stats` parameter on a per-field basis. The default is `false`. + +`wt`:: +The output format. This operates the same as the <>. The default is `xml`. [[MBeanRequestHandler-Examples]] == Examples diff --git a/solr/solr-ref-guide/src/morelikethis.adoc b/solr/solr-ref-guide/src/morelikethis.adoc index ec6129eeb29..e0756cbbc55 100644 --- a/solr/solr-ref-guide/src/morelikethis.adoc +++ b/solr/solr-ref-guide/src/morelikethis.adoc @@ -47,51 +47,63 @@ The next phase filters terms from the original document using thresholds defined The table below summarizes the `MoreLikeThis` parameters supported by Lucene/Solr. These parameters can be used with any of the three possible MoreLikeThis approaches. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`mlt.fl`:: +Specifies the fields to use for similarity. If possible, these should have stored `termVectors`. + +`mlt.mintf`:: +Specifies the Minimum Term Frequency, the frequency below which terms will be ignored in the source document. + +`mlt.mindf`:: +Specifies the Minimum Document Frequency: words that do not occur in at least this many documents will be ignored.
+ +`mlt.maxdf`:: +Specifies the Maximum Document Frequency: words that occur in more than this many documents will be ignored. + +`mlt.minwl`:: +Sets the minimum word length below which words will be ignored. + +`mlt.maxwl`:: +Sets the maximum word length above which words will be ignored. + +`mlt.maxqt`:: +Sets the maximum number of query terms that will be included in any generated query. + +`mlt.maxntp`:: +Sets the maximum number of tokens to parse in each example document field that is not stored with TermVector support. + +`mlt.boost`:: +Specifies whether the query will be boosted by the interesting term relevance. It can be either `true` or `false`. + +`mlt.qf`:: +Query fields and their boosts using the same format as that used by the <>. These fields must also be specified in `mlt.fl`. -[cols="30,70",options="header"] -|=== -|Parameter |Description -|mlt.fl |Specifies the fields to use for similarity. If possible, these should have stored `termVectors`. -|mlt.mintf |Specifies the Minimum Term Frequency, the frequency below which terms will be ignored in the source document. -|mlt.mindf |Specifies the Minimum Document Frequency, the frequency at which words will be ignored which do not occur in at least this many documents. -|mlt.maxdf |Specifies the Maximum Document Frequency, the frequency at which words will be ignored which occur in more than this many documents. -|mlt.minwl |Sets the minimum word length below which words will be ignored. -|mlt.maxwl |Sets the maximum word length above which words will be ignored. -|mlt.maxqt |Sets the maximum number of query terms that will be included in any generated query. -|mlt.maxntp |Sets the maximum number of tokens to parse in each example document field that is not stored with TermVector support. -|mlt.boost |Specifies if the query will be boosted by the interesting term relevance. It can be either "true" or "false".
-|mlt.qf |Query fields and their boosts using the same format as that used by the <>. These fields must also be specified in `mlt.fl`. -|=== [[MoreLikeThis-ParametersfortheMoreLikeThisComponent]] == Parameters for the MoreLikeThisComponent Using MoreLikeThis as a search component returns similar documents for each document in the response set. In addition to the common parameters, these additional options are available: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`mlt`:: +If set to `true`, activates the `MoreLikeThis` component and enables Solr to return `MoreLikeThis` results. + +`mlt.count`:: +Specifies the number of similar documents to be returned for each result. The default value is 5. -[cols="30,70",options="header"] -|=== -|Parameter |Description -|mlt |If set to true, activates the `MoreLikeThis` component and enables Solr to return `MoreLikeThis` results. -|mlt.count |Specifies the number of similar documents to be returned for each result. The default value is 5. -|=== [[MoreLikeThis-ParametersfortheMoreLikeThisHandler]] == Parameters for the MoreLikeThisHandler The table below summarizes parameters accessible through the `MoreLikeThisHandler`. It supports faceting, paging, and filtering using common query parameters, but does not work well with alternate query parsers. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`mlt.match.include`:: +Specifies whether or not the response should include the matched document. If set to false, the response will look like a normal select response. + +`mlt.match.offset`:: +Specifies an offset into the main query search results to locate the document on which the `MoreLikeThis` query should operate. By default, the query operates on the first result for the q parameter. 
+ +`mlt.interestingTerms`:: +Controls how the `MoreLikeThis` component presents the "interesting" terms (the top TF/IDF terms) for the query. It supports three settings: `list` lists the terms, `none` lists no terms, and `details` lists the terms along with the boost value used for each term. Unless `mlt.boost=true`, all terms will have `boost=1.0`. -[cols="30,70",options="header"] -|=== -|Parameter |Description -|mlt.match.include |Specifies whether or not the response should include the matched document. If set to false, the response will look like a normal select response. -|mlt.match.offset |Specifies an offset into the main query search results to locate the document on which the `MoreLikeThis` query should operate. By default, the query operates on the first result for the q parameter. -|mlt.interestingTerms |Controls how the `MoreLikeThis` component presents the "interesting" terms (the top TF/IDF terms) for the query. Supports three settings. The setting list lists the terms. The setting none lists no terms. The setting details lists the terms along with the boost value used for each term. Unless `mlt.boost=true`, all terms will have `boost=1.0`. -|=== [[MoreLikeThis-MoreLikeThisQueryParser]] == More Like This Query Parser diff --git a/solr/solr-ref-guide/src/near-real-time-searching.adoc b/solr/solr-ref-guide/src/near-real-time-searching.adoc index 8d87d5477e9..fe0e44988bd 100644 --- a/solr/solr-ref-guide/src/near-real-time-searching.adoc +++ b/solr/solr-ref-guide/src/near-real-time-searching.adoc @@ -37,14 +37,11 @@ An *optimize* is like a *hard commit* except that it forces all of the index seg Soft commit uses two parameters: `maxDocs` and `maxTime`. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`maxDocs`:: +Integer. Defines the number of documents to queue before pushing them to the index.
It works in conjunction with the `update_handler_autosoftcommit_max_time` parameter in that if either limit is reached, the documents will be pushed to the index. -[cols="30,70",options="header"] -|=== -|Parameter |Description -|`maxDocs` |Integer. Defines the number of documents to queue before pushing them to the index. It works in conjunction with the `update_handler_autosoftcommit_max_time` parameter in that if either limit is reached, the documents will be pushed to the index. -|`maxTime` |The number of milliseconds to wait before pushing documents to the index. It works in conjunction with the `update_handler_autosoftcommit_max_docs` parameter in that if either limit is reached, the documents will be pushed to the index. -|=== +`maxTime`:: +The number of milliseconds to wait before pushing documents to the index. It works in conjunction with the `update_handler_autosoftcommit_max_docs` parameter in that if either limit is reached, the documents will be pushed to the index. Use `maxDocs` and `maxTime` judiciously to fine-tune your commit strategies. @@ -78,17 +75,20 @@ It's better to use `maxTime` rather than `maxDocs` to modify an `autoSoftCommit` [[NearRealTimeSearching-OptionalAttributesforcommitandoptimize]] === Optional Attributes for commit and optimize -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`waitSearcher`:: +Block until a new searcher is opened and registered as the main query searcher, making the changes visible. Default is `true`. -[cols="20,20,60",options="header"] -|=== -|Parameter |Valid Attributes |Description -|`waitSearcher` |true, false |Block until a new searcher is opened and registered as the main query searcher, making the changes visible. Default is true. -|`OpenSearcher` |true, false |Open a new searcher making all documents indexed so far visible for searching. Default is true. -|`softCommit` |true, false |Perform a soft commit. 
This will refresh the view of the index faster, but without guarantees that the document is stably stored. Default is false. -|`expungeDeletes` |true, false |Valid for `commit` only. This parameter purges deleted data from segments. The default is false. -|`maxSegments` |integer |Valid for `optimize` only. Optimize down to at most this number of segments. The default is 1. -|=== +`openSearcher`:: +Open a new searcher, making all documents indexed so far visible for searching. Default is `true`. + +`softCommit`:: +Perform a soft commit. This will refresh the view of the index faster, but without guarantees that the document is stably stored. Default is `false`. + +`expungeDeletes`:: +Valid for `commit` only. This parameter purges deleted data from segments. The default is `false`. + +`maxSegments`:: +Valid for `optimize` only. Optimize down to at most this number of segments. The default is `1`. Example of `commit` and `optimize` with optional attributes: diff --git a/solr/solr-ref-guide/src/parameter-reference.adoc b/solr/solr-ref-guide/src/parameter-reference.adoc index 7f395d574fd..a511bedd1b8 100644 --- a/solr/solr-ref-guide/src/parameter-reference.adoc +++ b/solr/solr-ref-guide/src/parameter-reference.adoc @@ -20,48 +20,39 @@ == Cluster Parameters -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`numShards`:: +Defaults to `1`. The number of shards to hash documents to. There must be one leader per shard and each leader can have _N_ replicas. -[cols="20,20,60"] -|=== -|`numShards` |Defaults to 1 |The number of shards to hash documents to. There must be one leader per shard and each leader can have N replicas. -|=== == SolrCloud Instance Parameters These are set in `solr.xml`, but by default the `host` and `hostContext` parameters are set up to also work with system properties.
-// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`host`:: +Defaults to the first local host address found. If the wrong host address is found automatically, you can override the host address with this parameter. -[cols="20,20,60"] -|=== -|`host` |Defaults to the first local host address found |If the wrong host address is found automatically, you can override the host address with this parameter. -|`hostPort` |Defaults to the port specified via `bin/solr -p `, or `8983` if not specified. |The port that Solr is running on. This value is only used when `-DzkRun` is specified without a value (see below), to calculate the default port on which embedded ZooKeeper will run. **I**n the `solr.xml` shipped with Solr, the `hostPort` system property is not referenced, and so is ignored. If you want to run Solr on a non-default port, use `bin/solr -p ` rather than specifying `-DhostPort`. -|`hostContext` |Defaults to `solr` |The context path for the Solr web application. -|=== +`hostPort`:: +Defaults to the port specified via `bin/solr -p `, or `8983` if not specified. The port that Solr is running on. This value is only used when `-DzkRun` is specified without a value (see below), to calculate the default port on which embedded ZooKeeper will run. In the `solr.xml` shipped with Solr, the `hostPort` system property is not referenced, and so is ignored. If you want to run Solr on a non-default port, use `bin/solr -p ` rather than specifying `-DhostPort`. + +`hostContext`:: +Defaults to `solr`. The context path for the Solr web application. == SolrCloud Instance ZooKeeper Parameters -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`zkRun`:: +Defaults to `localhost:`. Causes Solr to run an embedded version of ZooKeeper. 
Set to the address of ZooKeeper on this node; this allows us to know who you are in the list of addresses in the `zkHost` connect string. Use `-DzkRun` (with no value) to get the default value. -[cols="20,20,60"] -|=== -|`zkRun` |Defaults to `localhost:` |Causes Solr to run an embedded version of ZooKeeper. Set to the address of ZooKeeper on this node; this allows us to know who you are in the list of addresses in the `zkHost` connect string. Use `-DzkRun` (with no value) to get the default value. -|`zkHost` |No default |The host address for ZooKeeper. Usually this is a comma-separated list of addresses to each node in your ZooKeeper ensemble. -|`zkClientTimeout` |Defaults to 15000 |The time a client is allowed to not talk to ZooKeeper before its session expires. -|=== +`zkHost`:: +The host address for ZooKeeper. Usually this is a comma-separated list of addresses to each node in your ZooKeeper ensemble. + +`zkClientTimeout`:: +Defaults to `15000`. The time a client is allowed to not talk to ZooKeeper before its session expires. `zkRun` and `zkHost` are set up using system properties. `zkClientTimeout` is set up in `solr.xml` by default, but can also be set using a system property. == SolrCloud Core Parameters -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed - -[cols="20,20,60"] -|=== -|`shard` |Defaults to being automatically assigned based on numShards |Specifies which shard this core acts as a replica of. -|=== - -`shard` can be specified in the <> for each core. +`shard`:: +Defaults to being automatically assigned based on `numShards`. Specifies which shard this core acts as a replica of. `shard` can be specified in the <> for each core.
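As an illustration of how these properties fit together, the sketch below assembles the system-property flags for a node joining an external ZooKeeper ensemble. The host names, shard count, and ensemble size are hypothetical, not values from this guide:

```python
# A sketch of SolrCloud system properties for a node using an external
# ZooKeeper ensemble (all host names and values below are hypothetical).

ensemble = ["zk1.example.com:2181", "zk2.example.com:2181", "zk3.example.com:2181"]

# zkHost is a comma-separated list of every node in the ensemble.
zk_host = ",".join(ensemble)

flags = [
    f"-DzkHost={zk_host}",
    "-DzkClientTimeout=15000",  # the default session timeout, in milliseconds
    "-DnumShards=2",            # hash documents across two shards
]
print(" ".join(flags))
```

In practice these flags would be passed on the command line when starting the node, or set in `solr.xml` where supported.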
Additional cloud related parameters are discussed in <> diff --git a/solr/solr-ref-guide/src/query-screen.adoc b/solr/solr-ref-guide/src/query-screen.adoc index 23c2334c746..e089f4ff824 100644 --- a/solr/solr-ref-guide/src/query-screen.adoc +++ b/solr/solr-ref-guide/src/query-screen.adoc @@ -33,26 +33,48 @@ The response has at least two sections, but may have several more depending on t The `response` includes the documents that matched the query, in `doc` sub-sections. The fields returned depend on the parameters of the query (and the defaults of the request handler used). The number of results is also included in this section. -This screen allows you to experiment with different query options, and inspect how your documents were indexed. The query parameters available on the form are some basic options that most users want to have available, but there are dozens more available which could be simply added to the basic request by hand (if opened in a browser). The table below explains the parameters available: +This screen allows you to experiment with different query options, and inspect how your documents were indexed. The query parameters available on the form are some basic options that most users want to have available, but there are dozens more available which could be simply added to the basic request by hand (if opened in a browser). The following parameters are available: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +Request-handler (qt):: +Specifies the query handler for the request. If a query handler is not specified, Solr processes the response with the standard query handler. -[cols="20,80",options="header"] -|=== -|Field |Description -|Request-handler (qt) |Specifies the query handler for the request. If a query handler is not specified, Solr processes the response with the standard query handler. -|q |The query event. See <> for an explanation of this parameter.
-|fq |The filter queries. See <> for more information on this parameter. -|sort |Sorts the response to a query in either ascending or descending order based on the response's score or another specified characteristic. -|start, rows |`start` is the offset into the query result starting at which documents should be returned. The default value is 0, meaning that the query should return results starting with the first document that matches. This field accepts the same syntax as the start query parameter, which is described in <>. `rows` is the number of rows to return. -|fl |Defines the fields to return for each document. You can explicitly list the stored fields, <>, and <> you want to have returned by separating them with either a comma or a space. -|wt |Specifies the Response Writer to be used to format the query response. Defaults to XML if not specified. -|indent |Click this button to request that the Response Writer use indentation to make the responses more readable. -|debugQuery |Click this button to augment the query response with debugging information, including "explain info" for each document returned. This debugging information is intended to be intelligible to the administrator or programmer. -|dismax |Click this button to enable the Dismax query parser. See <> for further information. -|edismax |Click this button to enable the Extended query parser. See <> for further information. -|hl |Click this button to enable highlighting in the query response. See <> for more information. -|facet |Enables faceting, the arrangement of search results into categories based on indexed terms. See <> for more information. -|spatial |Click to enable using location data for use in spatial or geospatial searches. See <> for more information. -|spellcheck |Click this button to enable the Spellchecker, which provides inline query suggestions based on other, similar, terms. See <> for more information. -|=== +q:: +The query event. See <> for an explanation of this parameter. 
+ +fq:: +The filter queries. See <> for more information on this parameter. + +sort:: +Sorts the response to a query in either ascending or descending order based on the response's score or another specified characteristic. + +start, rows:: +`start` is the offset into the query result starting at which documents should be returned. The default value is 0, meaning that the query should return results starting with the first document that matches. This field accepts the same syntax as the start query parameter, which is described in <>. `rows` is the number of rows to return. + +fl:: +Defines the fields to return for each document. You can explicitly list the stored fields, <>, and <> you want to have returned by separating them with either a comma or a space. + +wt:: +Specifies the Response Writer to be used to format the query response. Defaults to XML if not specified. + +indent:: +Click this button to request that the Response Writer use indentation to make the responses more readable. + +debugQuery:: +Click this button to augment the query response with debugging information, including "explain info" for each document returned. This debugging information is intended to be intelligible to the administrator or programmer. + +dismax:: +Click this button to enable the Dismax query parser. See <> for further information. + +edismax:: +Click this button to enable the Extended query parser. See <> for further information. + +hl:: Click this button to enable highlighting in the query response. See <> for more information. + +facet:: +Enables faceting, the arrangement of search results into categories based on indexed terms. See <> for more information. + +spatial:: +Click to enable using location data for use in spatial or geospatial searches. See <> for more information. + +spellcheck:: +Click this button to enable the Spellchecker, which provides inline query suggestions based on other, similar, terms. See <> for more information. 
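Because each form field corresponds to a standard request parameter, the same query can be composed by hand and issued directly over HTTP. A minimal sketch in Python; the `techproducts` collection name, host, and parameter values are assumptions for illustration:

```python
from urllib.parse import urlencode

# Parameters mirroring the Query screen form fields described above.
params = {
    "q": "ipod",            # the main query
    "fq": "inStock:true",   # a filter query
    "sort": "price asc",    # sort specification
    "start": 0,             # offset into the result set
    "rows": 10,             # number of rows to return
    "fl": "id,name,price",  # fields to return for each document
    "wt": "json",           # response writer
    "indent": "true",       # readable output
}

# The admin UI issues the same kind of GET request under the hood.
url = "http://localhost:8983/solr/techproducts/select?" + urlencode(params)
print(url)
```

Opening such a URL in a browser is a convenient way to experiment with the dozens of parameters that have no form field.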
diff --git a/solr/solr-ref-guide/src/requestdispatcher-in-solrconfig.adoc b/solr/solr-ref-guide/src/requestdispatcher-in-solrconfig.adoc index 2883f9af368..430b4041389 100644 --- a/solr/solr-ref-guide/src/requestdispatcher-in-solrconfig.adoc +++ b/solr/solr-ref-guide/src/requestdispatcher-in-solrconfig.adoc @@ -71,15 +71,14 @@ The `` element controls HTTP cache control headers. Do not confuse This element allows for three attributes and one sub-element. The attributes of the `` element control whether a 304 response to a GET request is allowed, and if so, what sort of response it should be. When an HTTP client application issues a GET, it may optionally specify that a 304 response is acceptable if the resource has not been modified since the last time it was fetched. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`never304`:: +If present with the value `true`, then a GET request will never respond with a 304 code, even if the requested resource has not been modified. When this attribute is set to true, the next two attributes are ignored. Setting this to true is handy for development, as the 304 response can be confusing when tinkering with Solr responses through a web browser or other client that supports cache headers. -[cols="20,80",options="header"] -|=== -|Parameter |Description -|never304 |If present with the value `true`, then a GET request will never respond with a 304 code, even if the requested resource has not been modified. When this attribute is set to true, the next two attributes are ignored. Setting this to true is handy for development, as the 304 response can be confusing when tinkering with Solr responses through a web browser or other client that supports cache headers. -|lastModFrom |This attribute may be set to either `openTime` (the default) or `dirLastMod`. 
The value `openTime` indicates that last modification times, as compared to the If-Modified-Since header sent by the client, should be calculated relative to the time the Searcher started. Use `dirLastMod` if you want times to exactly correspond to when the index was last updated on disk. -|etagSeed |This value of this attribute is sent as the value of the `ETag` header. Changing this value can be helpful to force clients to re-fetch content even when the indexes have not changed---for example, when you've made some changes to the configuration. -|=== +`lastModFrom`:: +This attribute may be set to either `openTime` (the default) or `dirLastMod`. The value `openTime` indicates that last modification times, as compared to the If-Modified-Since header sent by the client, should be calculated relative to the time the Searcher started. Use `dirLastMod` if you want times to exactly correspond to when the index was last updated on disk. + +`etagSeed`:: +The value of this attribute is sent as the value of the `ETag` header. Changing this value can be helpful to force clients to re-fetch content even when the indexes have not changed---for example, when you've made some changes to the configuration. [source,xml] ---- diff --git a/solr/solr-ref-guide/src/requesthandlers-and-searchcomponents-in-solrconfig.adoc b/solr/solr-ref-guide/src/requesthandlers-and-searchcomponents-in-solrconfig.adoc index 15ddef66c0a..46d9c9ebece 100644 --- a/solr/solr-ref-guide/src/requesthandlers-and-searchcomponents-in-solrconfig.adoc +++ b/solr/solr-ref-guide/src/requesthandlers-and-searchcomponents-in-solrconfig.adoc @@ -127,13 +127,13 @@ There are several default search components that work with all SearchHandlers wi [cols="20,40,40",options="header"] |=== |Component Name |Class Name |More Information -|query |solr.QueryComponent |Described in the section <>. -|facet |solr.FacetComponent |Described in the section <>. -|mlt |solr.MoreLikeThisComponent |Described in the section <>.
-|highlight |solr.HighlightComponent |Described in the section <>. -|stats |solr.StatsComponent |Described in the section <>. -|debug |solr.DebugComponent |Described in the section on <>. -|expand |solr.ExpandComponent |Described in the section <>. +|query |`solr.QueryComponent` |Described in the section <>. +|facet |`solr.FacetComponent` |Described in the section <>. +|mlt |`solr.MoreLikeThisComponent` |Described in the section <>. +|highlight |`solr.HighlightComponent` |Described in the section <>. +|stats |`solr.StatsComponent` |Described in the section <>. +|debug |`solr.DebugComponent` |Described in the section on <>. +|expand |`solr.ExpandComponent` |Described in the section <>. |=== If you register a new search component with one of these default names, the newly defined component will be used instead of the default. diff --git a/solr/solr-ref-guide/src/response-writers.adoc b/solr/solr-ref-guide/src/response-writers.adoc index 8d113705ab1..4f33effde0a 100644 --- a/solr/solr-ref-guide/src/response-writers.adoc +++ b/solr/solr-ref-guide/src/response-writers.adoc @@ -19,27 +19,26 @@ // specific language governing permissions and limitations // under the License. -A Response Writer generates the formatted response of a search. Solr supports a variety of Response Writers to ensure that query responses can be parsed by the appropriate language or application. +A Response Writer generates the formatted response of a search. -The `wt` parameter selects the Response Writer to be used. The table below lists the most common settings for the `wt` parameter. +Solr supports a variety of Response Writers to ensure that query responses can be parsed by the appropriate language or application. + +The `wt` parameter selects the Response Writer to be used. The list below shows the most common settings for the `wt` parameter, with links to further sections that discuss them in more detail.
+ +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> +* <> -[width="100%",options="header",] -|=== -|`wt` Parameter Setting |Response Writer Selected -|csv |<> -|geojson |<> -|javabin |<> -|json |<> -|php |<> -|phps |<> -|python |<> -|ruby |<> -|smile |<> -|velocity |<> -|xlsx |<> -|xml |<> -|xslt |<> -|=== [[ResponseWriters-TheStandardXMLResponseWriter]] == The Standard XML Response Writer @@ -55,13 +54,7 @@ The behavior of the XML Response Writer can be driven by the following query par The `version` parameter determines the XML protocol used in the response. Clients are strongly encouraged to _always_ specify the protocol version, so as to ensure that the format of the response they receive does not change unexpectedly if the Solr server is upgraded and a new default format is introduced. -Currently supported version values are: - -[width="100%",options="header",] -|=== -|XML Version |Notes -|2.2 |The format of the responseHeader changed to use the same `` structure as the rest of the response. -|=== +The only currently supported version value is `2.2`. The format of the `responseHeader` changed to use the same `` structure as the rest of the response. The default value is the latest supported. @@ -173,17 +166,35 @@ The default mime type for the JSON writer is `application/json`, however this ca This parameter controls the output format of NamedLists, where order is more important than access by name. NamedList is currently used for field faceting data. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +The `json.nl` parameter takes the following values: + -[cols="20,40,40",options="header"] -|=== |json.nl Parameter setting |Example output for `NamedList("a"=1, "bar"="foo", null=3, null=null)` |Description -|flat _(the default)_ |`["a",1, "bar","foo", null,3, null,null]` |NamedList is represented as a flat array, alternating names and values. 
-|map |`{"a":1, "bar":"foo", "":3, "":null}` |NamedList is represented as a JSON object. Although this is the simplest mapping, a NamedList can have optional keys, repeated keys, and preserves order. Using a JSON object (essentially a map or hash) for a NamedList results in the loss of some information. -|arrarr |`[["a",1], ["bar","foo"], [null,3], [null,null]]` |NamedList is represented as an array of two element arrays. -|arrmap |[`{"a":1}, {"b":2}, 3, null]` |NamedList is represented as an array of JSON objects. -|arrntv |`[{"name":"a","type":"int","value":1}, {"name":"bar","type":"str","value":"foo"}, {"name":null,"type":"int","value":3}, {"name":null,"type":"null","value":null}]` |NamedList is represented as an array of Name Type Value JSON objects. -|=== + +`flat`:: +The default. NamedList is represented as a flat array, alternating names and values. ++ +With input of `NamedList("a"=1, "bar"="foo", null=3, null=null)`, the output would be `["a",1, "bar","foo", null,3, null,null]`. + +`map`:: +NamedList is represented as a JSON object. Although this is the simplest mapping, a NamedList can have optional keys, repeated keys, and preserves order. Using a JSON object (essentially a map or hash) for a NamedList results in the loss of some information. ++ +With input of `NamedList("a"=1, "bar"="foo", null=3, null=null)`, the output would be `{"a":1, "bar":"foo", "":3, "":null}`. + +`arrarr`:: +NamedList is represented as an array of two element arrays. ++ +With input of `NamedList("a"=1, "bar"="foo", null=3, null=null)`, the output would be `[["a",1], ["bar","foo"], [null,3], [null,null]]`. + +`arrmap`:: +NamedList is represented as an array of JSON objects. ++ +With input of `NamedList("a"=1, "bar"="foo", null=3, null=null)`, the output would be `[{"a":1}, {"bar":"foo"}, 3, null]`. + +`arrntv`:: +NamedList is represented as an array of Name Type Value JSON objects.
++ +With input of `NamedList("a"=1, "bar"="foo", null=3, null=null)`, the output would be `[{"name":"a","type":"int","value":1}, {"name":"bar","type":"str","value":"foo"}, {"name":null,"type":"int","value":3}, {"name":null,"type":"null","value":null}]`. [[ResponseWriters-json.wrf]] ==== json.wrf @@ -278,11 +289,11 @@ These parameters specify the CSV format that will be returned. You can accept th [width="50%",options="header",] |=== |Parameter |Default Value -|csv.encapsulator |" +|csv.encapsulator |`"` |csv.escape |None -|csv.separator |, -|csv.header |Defaults to true. If false, Solr does not print the column headers -|csv.newline |\n +|csv.separator |`,` +|csv.header |Defaults to `true`. If `false`, Solr does not print the column headers. +|csv.newline |`\n` |csv.null |Defaults to a zero length string. Use this parameter when a document has no value for a particular field. |=== @@ -295,8 +306,8 @@ These parameters specify how multi-valued fields are encoded. Per-field override |=== |Parameter |Default Value |csv.mv.encapsulator |None -|csv.mv.escape |\ -|csv.mv.separator |Defaults to the `csv.separator` value +|csv.mv.escape |`\` +|csv.mv.separator |Defaults to the `csv.separator` value. |=== [[ResponseWriters-Example]] diff --git a/solr/solr-ref-guide/src/result-clustering.adoc b/solr/solr-ref-guide/src/result-clustering.adoc index c4fa048366b..db9a43ce608 100644 --- a/solr/solr-ref-guide/src/result-clustering.adoc +++ b/solr/solr-ref-guide/src/result-clustering.adoc @@ -209,31 +209,34 @@ An example configuration could look as shown below. [[ResultClustering-ConfigurationParametersoftheClusteringComponent]] === Configuration Parameters of the Clustering Component -The table below summarizes parameters of each clustering engine or the entire clustering component (depending where they are declared). +The following parameters are available for each clustering engine or for the entire clustering component, depending on where they are declared.
-// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`clustering`:: +When `true`, the clustering component is enabled. -[cols="30,70",options="header"] -|=== -|Parameter |Description -|`clustering` |When `true`, clustering component is enabled. -|`clustering.engine` |Declares which clustering engine to use. If not present, the first declared engine will become the default one. -|`clustering.results` |When `true`, the component will perform clustering of search results (this should be enabled). -|`clustering.collection` |When `true`, the component will perform clustering of the whole document index (this section does not cover full-index clustering). -|=== +`clustering.engine`:: +Declares which clustering engine to use. If not present, the first declared engine will become the default one. + +`clustering.results`:: +When `true`, the component will perform clustering of search results (this should be enabled). + +`clustering.collection`:: +When `true`, the component will perform clustering of the whole document index (this section does not cover full-index clustering). At the engine declaration level, the following parameters are supported. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`carrot.algorithm`:: +The algorithm class. + +`carrot.resourcesDir`:: +Algorithm-specific resources and configuration files (stop words, other lexical resources, default settings). By default points to `conf/clustering/carrot2/`. + +`carrot.outputSubClusters`:: +If `true` and the algorithm supports hierarchical clustering, sub-clusters will also be emitted. Default value: `true`. + +`carrot.numDescriptions`:: +Maximum number of per-cluster labels to return (if the algorithm assigns more than one label to a cluster). -[cols="30,70",options="header"] -|=== -|Parameter |Description -|`carrot.algorithm` |The algorithm class.
-|`carrot.resourcesDir` |Algorithm-specific resources and configuration files (stop words, other lexical resources, default settings). By default points to `conf/clustering/carrot2/` -|`carrot.outputSubClusters` |If `true` and the algorithm supports hierarchical clustering, sub-clusters will also be emitted. Default value: true. -|`carrot.numDescriptions` |Maximum number of per-cluster labels to return (if the algorithm assigns more than one label to a cluster). -|=== The `carrot.algorithm` parameter should contain a fully qualified class name of an algorithm supported by the http://project.carrot2.org[Carrot2] framework. Currently, the following algorithms are available: @@ -255,30 +258,27 @@ The question of which algorithm to choose depends on the amount of traffic (STC The clustering engine can apply clustering to the full content of (stored) fields or it can run an internal highlighter pass to extract context-snippets before clustering. Highlighting is recommended when the logical snippet field contains a lot of content (this would affect clustering performance). Highlighting can also increase the quality of clustering because the content passed to the algorithm will be more focused around the query (it will be query-specific context). The following parameters control the internal highlighter. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`carrot.produceSummary`:: +When `true` the clustering component will run a highlighter pass on the content of logical fields pointed to by `carrot.title` and `carrot.snippet`. Otherwise full content of those fields will be clustered. -[cols="30,70",options="header"] -|=== -|Parameter |Description -|`carrot.produceSummary` |When `true` the clustering component will run a highlighter pass on the content of logical fields pointed to by `carrot.title` and `carrot.snippet`. Otherwise full content of those fields will be clustered. 
-|`carrot.fragSize` |The size, in characters, of the snippets (aka fragments) created by the highlighter. If not specified, the default highlighting fragsize (`hl.fragsize`) will be used. -|`carrot.summarySnippets` |The number of summary snippets to generate for clustering. If not specified, the default highlighting snippet count (`hl.snippets`) will be used. -|=== +`carrot.fragSize`:: +The size, in characters, of the snippets (aka fragments) created by the highlighter. If not specified, the default highlighting fragsize (`hl.fragsize`) will be used. + +`carrot.summarySnippets`:: The number of summary snippets to generate for clustering. If not specified, the default highlighting snippet count (`hl.snippets`) will be used. [[ResultClustering-LogicaltoDocumentFieldMapping]] === Logical to Document Field Mapping As already mentioned in <>, the clustering component clusters "documents" consisting of logical parts that need to be mapped onto physical schema of data stored in Solr. The field mapping attributes provide a connection between fields and logical document parts. Note that the content of title and snippet fields must be *stored* so that it can be retrieved at search time. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`carrot.title`:: +The field (alternatively comma- or space-separated list of fields) that should be mapped to the logical document's title. The clustering algorithms typically give more weight to the content of the title field compared to the content (snippet). For best results, the field should contain concise, noise-free content. If there is no clear title in your data, you can leave this parameter blank. -[cols="30,70",options="header"] -|=== -|Parameter |Description -|`carrot.title` |The field (alternatively comma- or space-separated list of fields) that should be mapped to the logical document's title. 
The clustering algorithms typically give more weight to the content of the title field compared to the content (snippet). For best results, the field should contain concise, noise-free content. If there is no clear title in your data, you can leave this parameter blank. -|`carrot.snippet` |The field (alternatively comma- or space-separated list of fields) that should be mapped to the logical document's main content. If this mapping points to very large content fields the performance of clustering may drop significantly. An alternative then is to use query-context snippets for clustering instead of full field content. See the description of the `carrot.produceSummary` parameter for details. -|`carrot.url` |The field that should be mapped to the logical document's content URL. Leave blank if not required. -|=== +`carrot.snippet`:: +The field (alternatively comma- or space-separated list of fields) that should be mapped to the logical document's main content. If this mapping points to very large content fields the performance of clustering may drop significantly. An alternative then is to use query-context snippets for clustering instead of full field content. See the description of the `carrot.produceSummary` parameter for details. + +`carrot.url`:: +The field that should be mapped to the logical document's content URL. Leave blank if not required. [[ResultClustering-ClusteringMultilingualContent]] === Clustering Multilingual Content @@ -287,14 +287,11 @@ The field mapping specification can include a `carrot.lang` parameter, which def The language hint makes it easier for clustering algorithms to separate documents from different languages on input and to pick the right language resources for clustering. If you do have multi-lingual query results (or query results in a language different than English), it is strongly advised to map the language field appropriately. 
-// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`carrot.lang`:: +The field that stores ISO 639-1 code of the language of the document's text fields. -[cols="30,70",options="header"] -|=== -|Parameter |Description -|`carrot.lang` |The field that stores ISO 639-1 code of the language of the document's text fields. -|`carrot.lcmap` |A mapping of arbitrary strings into ISO 639 two-letter codes used by `carrot.lang`. The syntax of this parameter is the same as `langid.map.lcmap`, for example: `langid.map.lcmap=japanese:ja polish:pl english:en` -|=== +`carrot.lcmap`:: +A mapping of arbitrary strings into ISO 639 two-letter codes used by `carrot.lang`. The syntax of this parameter is the same as `langid.map.lcmap`, for example: `langid.map.lcmap=japanese:ja polish:pl english:en` The default language can also be set using Carrot2-specific algorithm attributes (in this case the http://doc.carrot2.org/#section.attribute.lingo.MultilingualClustering.defaultLanguage[MultilingualClustering.defaultLanguage] attribute). diff --git a/solr/solr-ref-guide/src/result-grouping.adoc b/solr/solr-ref-guide/src/result-grouping.adoc index 72a79a1abe4..89b3c339e56 100644 --- a/solr/solr-ref-guide/src/result-grouping.adoc +++ b/solr/solr-ref-guide/src/result-grouping.adoc @@ -59,44 +59,65 @@ If you ask Solr to group these documents by "product_range", then the total amou Result Grouping takes the following request parameters. Any number of these request parameters can be included in a single request: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`group`:: +If `true`, query results will be grouped. -[cols="20,20,60",options="header"] -|=== -|Parameter |Type |Description -|group |Boolean |If true, query results will be grouped. -|group.field |string |The name of the field by which to group results. 
The field must be single-valued, and either be indexed or a field type that has a value source and works in a function query, such as `ExternalFileField`. It must also be a string-based field, such as `StrField` or `TextField` -|group.func |query a| +`group.field`:: +The name of the field by which to group results. The field must be single-valued, and either be indexed or a field type that has a value source and works in a function query, such as `ExternalFileField`. It must also be a string-based field, such as `StrField` or `TextField`. + +`group.func`:: Group based on the unique values of a function query. - ++ NOTE: This option does not work with <>. -|group.query |query |Return a single group of documents that match the given query. -|rows |integer |The number of groups to return. The default value is 10. -|start |integer |Specifies an initial offset for the list of groups. -|group.limit |integer |Specifies the number of results to return for each group. The default value is 1. -|group.offset |integer |Specifies an initial offset for the document list of each group. -|sort |sortspec |Specifies how Solr sorts the groups relative to each other. For example, `sort=popularity desc` will cause the groups to be sorted according to the highest popularity document in each group. The default value is `score desc`. -|group.sort |sortspec |Specifies how Solr sorts documents within each group. The default behavior if `group.sort` is not specified is to use the same effective value as the `sort` parameter. -|group.format |grouped/simple |If this parameter is set to `simple`, the grouped documents are presented in a single flat list, and the `start` and `rows` parameters affect the numbers of documents instead of groups. -|group.main |Boolean |If true, the result of the first field grouping command is used as the main result list in the response, using `group.format=simple`.
-|group.ngroups |Boolean a| -If true, Solr includes the number of groups that have matched the query in the results. The default value is false. +`group.query`:: +Return a single group of documents that match the given query. + +`rows`:: +The number of groups to return. The default value is `10`. + +`start`:: +Specifies an initial offset for the list of groups. + +`group.limit`:: +Specifies the number of results to return for each group. The default value is `1`. + +`group.offset`:: +Specifies an initial offset for the document list of each group. + +`sort`:: +Specifies how Solr sorts the groups relative to each other. For example, `sort=popularity desc` will cause the groups to be sorted according to the highest popularity document in each group. The default value is `score desc`. + +`group.sort`:: +Specifies how Solr sorts documents within each group. The default behavior if `group.sort` is not specified is to use the same effective value as the `sort` parameter. + +`group.format`:: +If this parameter is set to `simple`, the grouped documents are presented in a single flat list, and the `start` and `rows` parameters affect the numbers of documents instead of groups. An alternate value for this parameter is `grouped`. + +`group.main`:: +If `true`, the result of the first field grouping command is used as the main result list in the response, using `group.format=simple`. + +`group.ngroups`:: +If `true`, Solr includes the number of groups that have matched the query in the results. The default value is `false`. ++ +See below for <> when using sharded indexes. -|group.truncate |Boolean |If true, facet counts are based on the most relevant document of each group matching the query. The default value is false. -|group.facet |Boolean a| -Determines whether to compute grouped facets for the field facets specified in facet.field parameters. Grouped facets are computed based on the first specified group.
As with normal field faceting, fields shouldn't be tokenized (otherwise counts are computed for each token). Grouped faceting supports single and multivalued fields. Default is false. +`group.truncate`:: +If `true`, facet counts are based on the most relevant document of each group matching the query. The default value is `false`. -*Warning*: There can be a heavy performance cost to this option. +`group.facet`:: +Determines whether to compute grouped facets for the field facets specified in facet.field parameters. Grouped facets are computed based on the first specified group. As with normal field faceting, fields shouldn't be tokenized (otherwise counts are computed for each token). Grouped faceting supports single and multivalued fields. Default is `false`. ++ +WARNING: There can be a heavy performance cost to this option. ++ +See below for <> when using sharded indexes. -See below for <> when using sharded indexes +`group.cache.percent`:: +Setting this parameter to a number greater than 0 enables caching for result grouping. Result Grouping executes two searches; this option caches the second search. The default value is `0`. The maximum value is `100`. ++ +Testing has shown that group caching only improves search time with Boolean, wildcard, and fuzzy queries. For simple queries like term or "match all" queries, group caching degrades performance. -|group.cache.percent |integer between 0 and 100 |Setting this parameter to a number greater than 0 enables caching for result grouping. Result Grouping executes two searches; this option caches the second search. The default value is 0. Testing has shown that group caching only improves search time with Boolean, wildcard, and fuzzy queries. For simple queries like term or "match all" queries, group caching degrades performance. -|=== - -Any number of group commands (`group.field`, `group.func`, `group.query`) may be specified in a single request. 
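As an illustrative sketch (not an official example), several of the parameters above can be combined into a grouping request query string; the `manu_exact` field name is hypothetical and assumes a single-valued string field:

```python
from urllib.parse import urlencode

# Sketch of a field-grouping request; "manu_exact" is a hypothetical
# single-valued string field. group=true enables the grouping feature.
params = {
    "q": "solr memory",
    "fl": "id,name",
    "group": "true",
    "group.field": "manu_exact",   # group results by this field's values
    "group.limit": "3",            # up to 3 documents per group
    "group.ngroups": "true",       # also report the number of groups
}
query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to a `/select` request for the target collection.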
+Any number of group commands (e.g., `group.field`, `group.func`, `group.query`, etc.) may be specified in a single request. [[ResultGrouping-Examples]] == Examples diff --git a/solr/solr-ref-guide/src/rule-based-authorization-plugin.adoc b/solr/solr-ref-guide/src/rule-based-authorization-plugin.adoc index df377819de5..5b548f2e77d 100644 --- a/solr/solr-ref-guide/src/rule-based-authorization-plugin.adoc +++ b/solr/solr-ref-guide/src/rule-based-authorization-plugin.adoc @@ -132,35 +132,35 @@ Permissions need to be created if they are not on the list of pre-defined permis Several properties can be used to define your custom permission. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`name`:: +The name of the permission. This is required only if it is a predefined permission. -[cols="30,70",options="header"] -|=== -|Property |Description -|name |The name of the permission. This is required only if it is a predefined permission. -|collection a| +`collection`:: The collection or collections the permission will apply to. ++ +When the path that will be allowed is collection-specific, such as when setting permissions to allow use of the Schema API, omitting the collection property will allow the defined path and/or method for all collections. However, when the path is one that is non-collection-specific, such as the Collections API, the collection value must be `null`. The default value is * (all collections). -When the path that will be allowed is collection-specific, such as when setting permissions to allow useof the Schema API, omitting the collection property will allow the defined path and/or method for all collections. However, when the path is one that is non-collection-specific, such as the Collections API, the collection value must be `null`. The default value is * (all collections). +`path`:: +A request handler name, such as `/update` or `/select`. 
A wild card is supported, to allow for all paths as appropriate (such as `/update/*`). -|path |A request handler name, such as `/update` or `/select`. A wild card is supported, to allow for all paths as appropriate (such as, `/update/*`). -|method |HTTP methods that are allowed for this permission. You could allow only GET requests, or have a role that allows PUT and POST requests. The method values that are allowed for this property are GET, POST, PUT,DELETEand HEAD. -|params a| +`method`:: HTTP methods that are allowed for this permission. You could allow only GET requests, or have a role that allows PUT and POST requests. The method values that are allowed for this property are GET, POST, PUT, DELETE, and HEAD. + +`params`:: The names and values of request parameters. This property can be omitted if all request parameters are to be matched, but will restrict access only to the values provided if defined. - ++ For example, this property could be used to limit the actions a role is allowed to perform with the Collections API. If the role should only be allowed to perform the LIST or CLUSTERSTATUS requests, you would define this as follows: - ++ [source,json] ---- "params": { "action": ["LIST", "CLUSTERSTATUS"] } ---- - ++ The value of the parameter can be a simple string or it could be a regular expression. Use the prefix `REGEX:` to use a regular expression match instead of a string identity match. - ++ If the commands LIST and CLUSTERSTATUS are case-insensitive, the above example should be written as follows: - ++ [source,json] ----
This name will be used to map user IDs to the role to grant these permissions. The value can be wildcard such as (`*`), which means that any user is OK, but no user is NOT OK. -|=== +`before`:: +This property allows ordering of permissions. The value of this property is the index of the permission that this new permission should be placed before in `security.json`. The index is automatically assigned in the order they are created. + +`role`:: +The name of the role(s) to give this permission. This name will be used to map user IDs to the role to grant these permissions. The value can be a wildcard such as `*`, which means that any user is OK, but no user is NOT OK. The following creates a new permission named "collection-mgr" that is allowed to create and list collections. The permission will be placed before the "read" permission. Note also that we have defined "collection" as `null`; this is because requests to the Collections API are never collection-specific. diff --git a/solr/solr-ref-guide/src/running-solr-on-hdfs.adoc index e23431cdfb9..9f8e2dc300b 100644 --- a/solr/solr-ref-guide/src/running-solr-on-hdfs.adoc +++ b/solr/solr-ref-guide/src/running-solr-on-hdfs.adoc @@ -103,73 +103,59 @@ The `HdfsDirectoryFactory` has a number of settings that are defined as part of [[RunningSolronHDFS-SolrHDFSSettings]] === Solr HDFS Settings -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed - -[cols="20,30,10,40",options="header"] -|=== -|Parameter |Example Value |Default |Description -|`solr.hdfs.home` |`hdfs://host:port/path/solr` |N/A |A root location in HDFS for Solr to write collection data to. Rather than specifying an HDFS location for the data directory or update log directory, use this to specify one root location and have everything automatically created within this HDFS location.
-|=== +`solr.hdfs.home`:: +A root location in HDFS for Solr to write collection data to. Rather than specifying an HDFS location for the data directory or update log directory, use this to specify one root location and have everything automatically created within this HDFS location. The structure of this parameter is `hdfs://host:port/path/solr`. [[RunningSolronHDFS-BlockCacheSettings]] === Block Cache Settings -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`solr.hdfs.blockcache.enabled`:: +Enable the blockcache. The default is `true`. -[cols="30,10,60",options="header"] -|=== -|Parameter |Default |Description -|`solr.hdfs.blockcache.enabled` |true |Enable the blockcache -|`solr.hdfs.blockcache.read.enabled` |true |Enable the read cache -|`solr.hdfs.blockcache.direct.memory.allocation` |true |Enable direct memory allocation. If this is false, heap is used -|`solr.hdfs.blockcache.slab.count` |1 |Number of memory slabs to allocate. Each slab is 128 MB in size. -|`solr.hdfs.blockcache.global` |true |Enable/Disable using one global cache for all SolrCores. The settings used will be from the first HdfsDirectoryFactory created. -|=== +`solr.hdfs.blockcache.read.enabled`:: +Enable the read cache. The default is `true`. + +`solr.hdfs.blockcache.direct.memory.allocation`:: +Enable direct memory allocation. If this is `false`, heap is used. The default is `true`. + +`solr.hdfs.blockcache.slab.count`:: +Number of memory slabs to allocate. Each slab is 128 MB in size. The default is `1`. + +`solr.hdfs.blockcache.global`:: +Enable/Disable using one global cache for all SolrCores. The settings used will be from the first HdfsDirectoryFactory created. The default is `true`. 
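As a hedged illustration of how the settings above fit together, a `HdfsDirectoryFactory` configuration in `solrconfig.xml` might look roughly like the following; the host, port, and path are placeholders, and the values shown are simply the documented defaults:

```xml
<!-- Illustrative sketch only: HdfsDirectoryFactory with the settings above.
     hdfs://host:port/path/solr is a placeholder, not a real location. -->
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://host:port/path/solr</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <int name="solr.hdfs.blockcache.slab.count">1</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
  <bool name="solr.hdfs.blockcache.global">true</bool>
</directoryFactory>
```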
[[RunningSolronHDFS-NRTCachingDirectorySettings]] === NRTCachingDirectory Settings -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`solr.hdfs.nrtcachingdirectory.enable`:: +Enable the use of NRTCachingDirectory. The default is `true`. -[cols="30,10,60",options="header"] -|=== -|Parameter |Default |Description -|`solr.hdfs.nrtcachingdirectory.enable` |true |Enable the use of NRTCachingDirectory -|`solr.hdfs.nrtcachingdirectory.maxmergesizemb` |16 |NRTCachingDirectory max segment size for merges -|`solr.hdfs.nrtcachingdirectory.maxcachedmb` |192 |NRTCachingDirectory max cache size -|=== +`solr.hdfs.nrtcachingdirectory.maxmergesizemb`:: +NRTCachingDirectory max segment size for merges. The default is `16`. + +`solr.hdfs.nrtcachingdirectory.maxcachedmb`:: +NRTCachingDirectory max cache size. The default is `192`. [[RunningSolronHDFS-HDFSClientConfigurationSettings]] === HDFS Client Configuration Settings -solr.hdfs.confdir pass the location of HDFS client configuration files - needed for HDFS HA for example. - -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed - -[cols="30,10,60",options="header"] -|=== -|Parameter |Default |Description -|`solr.hdfs.confdir` |N/A |Pass the location of HDFS client configuration files - needed for HDFS HA for example. -|=== +`solr.hdfs.confdir`:: +Pass the location of HDFS client configuration files; this is needed for HDFS HA, for example. [[RunningSolronHDFS-KerberosAuthenticationSettings]] === Kerberos Authentication Settings Hadoop can be configured to use the Kerberos protocol to verify user identity when trying to access core services like HDFS. If your HDFS directories are protected using Kerberos, then you need to configure Solr's HdfsDirectoryFactory to authenticate using Kerberos in order to read and write to HDFS.
To enable Kerberos authentication from Solr, you need to set the following parameters: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`solr.hdfs.security.kerberos.enabled`:: +Set to `true` to enable Kerberos authentication. The default is `false`. -[cols="30,10,60",options="header"] -|=== -|Parameter |Default |Description -|`solr.hdfs.security.kerberos.enabled` |false |Set to true to enable Kerberos authentication -|`solr.hdfs.security.kerberos.keytabfile` |N/A a| +`solr.hdfs.security.kerberos.keytabfile`:: A keytab file contains pairs of Kerberos principals and encrypted keys which allows for password-less authentication when Solr attempts to authenticate with secure Hadoop. - ++ This file will need to be present on all Solr servers at the same path provided in this parameter. -|`solr.hdfs.security.kerberos.principal` |N/A |The Kerberos principal that Solr should use to authenticate to secure Hadoop; the format of a typical Kerberos V5 principal is: `primary/instance@realm` -|=== +`solr.hdfs.security.kerberos.principal`:: +The Kerberos principal that Solr should use to authenticate to secure Hadoop; the format of a typical Kerberos V5 principal is: `primary/instance@realm`. [[RunningSolronHDFS-Example]] == Example @@ -210,20 +196,19 @@ One benefit to running Solr in HDFS is the ability to automatically add new repl Collections created using `autoAddReplicas=true` on a shared file system have automatic addition of replicas enabled. The following settings can be used to override the defaults in the `` section of `solr.xml`. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`autoReplicaFailoverWorkLoopDelay`:: +The time (in ms) between clusterstate inspections by the Overseer to detect and possibly act on creation of a replacement replica. The default is `10000`.
-[cols="40,10,50",options="header"] -|=== -|Param |Default |Description -|autoReplicaFailoverWorkLoopDelay |10000 |The time (in ms) between clusterstate inspections by the Overseer to detect and possibly act on creation of a replacement replica. -|autoReplicaFailoverWaitAfterExpiration |30000 |The minimum time (in ms) to wait for initiating replacement of a replica after first noticing it not being live. This is important to prevent false positives while stoping or starting the cluster. -|autoReplicaFailoverBadNodeExpiration |60000 |The delay (in ms) after which a replica marked as down would be unmarked. -|=== +`autoReplicaFailoverWaitAfterExpiration`:: +The minimum time (in ms) to wait for initiating replacement of a replica after first noticing it not being live. This is important to prevent false positives while stopping or starting the cluster. The default is `30000`. + +`autoReplicaFailoverBadNodeExpiration`:: +The delay (in ms) after which a replica marked as down would be unmarked. The default is `60000`.
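As a sketch, overriding these defaults might look like the following `solr.xml` fragment; the enclosing `<solrcloud>` element name is an assumption, and the values shown are simply the documented defaults:

```xml
<!-- Illustrative sketch: autoAddReplicas failover overrides in solr.xml.
     Values shown are the documented defaults. -->
<solrcloud>
  <int name="autoReplicaFailoverWorkLoopDelay">10000</int>
  <int name="autoReplicaFailoverWaitAfterExpiration">30000</int>
  <int name="autoReplicaFailoverBadNodeExpiration">60000</int>
</solrcloud>
```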
[[RunningSolronHDFS-TemporarilydisableautoAddReplicasfortheentirecluster]] -=== Temporarily disable autoAddReplicas for the entire cluster +=== Temporarily Disable autoAddReplicas for the Entire Cluster -When doing offline maintenance on the cluster and for various other use cases where an admin would like to temporarily disable auto addition of replicas, the following APIs will disable and re-enable autoAddReplicas for **all collections in the cluster**: +When doing offline maintenance on the cluster and for various other use cases where an admin would like to temporarily disable auto addition of replicas, the following APIs will disable and re-enable autoAddReplicas for *all collections in the cluster*: Disable auto addition of replicas cluster wide by setting the cluster property `autoAddReplicas` to `false`: diff --git a/solr/solr-ref-guide/src/spatial-search.adoc b/solr/solr-ref-guide/src/spatial-search.adoc index 69d130514a3..8b56c022f23 100644 --- a/solr/solr-ref-guide/src/spatial-search.adoc +++ b/solr/solr-ref-guide/src/spatial-search.adoc @@ -66,44 +66,46 @@ If you'd rather use a standard industry format, Solr supports WKT and GeoJSON. H There are two spatial Solr "query parsers" for geospatial search: `geofilt` and `bbox`. They take the following parameters: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`d`:: +The radial distance, usually in kilometers. RPT & BBoxField can set other units via the setting `distanceUnits`. -[cols="30,70",options="header"] -|=== -|Parameter |Description -|d |the radial distance, usually in kilometers. (RPT & BBoxField can set other units via the setting `distanceUnits`) -|pt |the center point using the format "lat,lon" if latitude & longitude. Otherwise, "x,y" for PointType or "x y" for RPT field types. -|sfield |a spatial indexed field -|score a| +`pt`:: +The center point using the format "lat,lon" if latitude & longitude. 
Otherwise, "x,y" for PointType or "x y" for RPT field types. + +`sfield`:: +A spatial indexed field. + +`score`:: (Advanced option; not supported by LatLonType (deprecated) or PointType) If the query is used in a scoring context (e.g. as the main query in `q`), this _<>_ determines what scores will be produced. Valid values are: -* `none` - A fixed score of 1.0. (the default) -* `kilometers` - distance in kilometers between the field value and the specified center point -* `miles` - distance in miles between the field value and the specified center point -* `degrees` - distance in degrees between the field value and the specified center point -* `distance` - distance between the field value and the specified center point in the `distanceUnits` configured for this field -* `recipDistance` - 1 / the distance - +* `none`: A fixed score of 1.0. (the default) +* `kilometers`: distance in kilometers between the field value and the specified center point +* `miles`: distance in miles between the field value and the specified center point +* `degrees`: distance in degrees between the field value and the specified center point +* `distance`: distance between the field value and the specified center point in the `distanceUnits` configured for this field +* `recipDistance`: 1 / the distance ++ [WARNING] ==== Don't use this for indexed non-point shapes (e.g. polygons). The results will be erroneous. And with RPT, it's only recommended for multi-valued point data, as the implementation doesn't scale very well and for single-valued fields, you should instead use a separate non-RPT field purely for distance sorting. ==== - ++ When used with `BBoxField`, additional options are supported: ++ +* `overlapRatio`: The relative overlap between the indexed shape & query shape. 
+* `area`: haversine based area of the overlapping shapes expressed in terms of the `distanceUnits` configured for this field +* `area2D`: cartesian coordinates based area of the overlapping shapes expressed in terms of the `distanceUnits` configured for this field -* `overlapRatio` - The relative overlap between the indexed shape & query shape. -* `area` - haversine based area of the overlapping shapes expressed in terms of the `distanceUnits` configured for this field -* `area2D` - cartesian coordinates based area of the overlapping shapes expressed in terms of the `distanceUnits` configured for this field +`filter`:: +(Advanced option; not supported by LatLonType (deprecated) or PointType). If you only want the query to score (with the above `score` local parameter), not filter, then set this local parameter to false. -|filter |(Advanced option; not supported by LatLonType (deprecated) or PointType). If you only want the query to score (with the above `score` local parameter), not filter, then set this local parameter to false. -|=== [[SpatialSearch-geofilt]] === geofilt The `geofilt` filter allows you to retrieve results based on the geospatial distance (AKA the "great circle distance") from a given point. Another way of looking at it is that it creates a circular shape filter. For example, to find all documents within five kilometers of a given lat/lon point, you could enter `&q=*:*&fq={!geofilt sfield=store}&pt=45.15,-93.85&d=5`. This filter returns all results within a circle of the given radius around the initial point: -image::images/spatial-search/circle.png[image] +image::images/spatial-search/circle.png[5KM radius] [[SpatialSearch-bbox]] @@ -117,8 +119,7 @@ Here's a sample query: The rectangular shape is faster to compute and so it's sometimes used as an alternative to `geofilt` when it's acceptable to return points outside of the radius. 
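As a hedged sketch, the `geofilt` example above can also be assembled programmatically; the `store` field, center point, and distance come directly from the example query in this section:

```python
from urllib.parse import urlencode

# Build the geofilt request from the section's example: all documents
# within 5 km of the point 45.15,-93.85 indexed in the "store" field.
params = {
    "q": "*:*",
    "fq": "{!geofilt sfield=store}",  # spatial filter query parser
    "pt": "45.15,-93.85",             # center point as "lat,lon"
    "d": "5",                         # radial distance (kilometers here)
}
query_string = urlencode(params)
print(query_string)
```

The encoded string is then appended to a `/select` request for the target collection; swapping `geofilt` for `bbox` in `fq` yields the rectangular variant.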
However, if the ideal goal is a circle but you want it to run faster, then instead consider using the RPT field and try a large `distErrPct` value like `0.1` (10% radius). This will return results outside the radius but it will do so somewhat uniformly around the shape. -image::images/spatial-search/bbox.png[image] - +image::images/spatial-search/bbox.png[Bounding box] [IMPORTANT] ==== @@ -148,7 +149,6 @@ If you know the filter query (be it spatial or not) is fairly unique and not lik LLPSF does not support Solr's "PostFilter". - [[SpatialSearch-DistanceSortingorBoosting_FunctionQueries_]] == Distance Sorting or Boosting (Function Queries) @@ -220,32 +220,51 @@ RPT _shares_ various features in common with `LatLonPointSpatialField`. Some are To use RPT, the field type must be registered and configured in `schema.xml`. There are many options for this field type. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`name`:: +The name of the field type. -[cols="30,70",options="header"] -|=== -|Setting |Description -|name |The name of the field type. -|class |This should be `solr.SpatialRecursivePrefixTreeFieldType`. But be aware that the Lucene spatial module includes some other so-called "spatial strategies" other than RPT, notably TermQueryPT*, BBox, PointVector*, and SerializedDV. Solr requires a field type to parallel these in order to use them. The asterisked ones have them. -|spatialContextFactory |This is a Java class name to an internal extension point governing support for shape definitions & parsing. If you require polygon support, set this to `JTS` – an alias for `org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory`; otherwise it can be omitted. See important info below about JTS. 
(note: prior to Solr 6, the "org.locationtech.spatial4j" part was "com.spatial4j.core" and there used to be no convenience JTS alias) -|geo |If **true**, the default, latitude and longitude coordinates will be used and the mathematical model will generally be a sphere. If false, the coordinates will be generic X & Y on a 2D plane using Euclidean/Cartesian geometry. -|format |Defines the shape syntax/format to be used. Defaults to `WKT` but `GeoJSON` is another popular format. Spatial4j governs this feature and supports https://locationtech.github.io/spatial4j/apidocs/org/locationtech/spatial4j/io/package-frame.html[other formats]. If a given shape is parseable as "lat,lon" or "x y" then that is always supported. -|distanceUnits a| -This is used to specify the units for distance measurements used throughout the use of this field. This can be `degrees`, `kilometers` or `miles`. It is applied to nearly all distance measurements involving the field: `maxDistErr`, `distErr`, `d`, `geodist` and the `score` when score is `distance`, `area`, or `area2d`. However, it doesn't affect distances embedded in WKT strings, (eg: "`BUFFER(POINT(200 10),0.2)`"), which are still in degrees. +`class`:: +This should be `solr.SpatialRecursivePrefixTreeFieldType`. But be aware that the Lucene spatial module includes some other so-called "spatial strategies" other than RPT, notably TermQueryPT*, BBox, PointVector*, and SerializedDV. Solr requires a field type to parallel these in order to use them. The asterisked ones have them. -`distanceUnits` defaults to either "```kilometers```" if `geo` is "```true```", or "```degrees```" if `geo` is "```false```". +`spatialContextFactory`:: +This is a Java class name to an internal extension point governing support for shape definitions & parsing. If you require polygon support, set this to `JTS` – an alias for `org.locationtech.spatial4j.context.jts.JtsSpatialContextFactory`; otherwise it can be omitted. See important info below about JTS. 
(note: prior to Solr 6, the "org.locationtech.spatial4j" part was "com.spatial4j.core" and there used to be no convenience JTS alias) +`geo`:: +If `true`, the default, latitude and longitude coordinates will be used and the mathematical model will generally be a sphere. If `false`, the coordinates will be generic X & Y on a 2D plane using Euclidean/Cartesian geometry. + +`format`:: Defines the shape syntax/format to be used. Defaults to `WKT` but `GeoJSON` is another popular format. Spatial4j governs this feature and supports https://locationtech.github.io/spatial4j/apidocs/org/locationtech/spatial4j/io/package-frame.html[other formats]. If a given shape is parseable as "lat,lon" or "x y" then that is always supported. + +`distanceUnits`:: +This is used to specify the units for distance measurements used throughout the use of this field. This can be `degrees`, `kilometers` or `miles`. It is applied to nearly all distance measurements involving the field: `maxDistErr`, `distErr`, `d`, `geodist` and the `score` when score is `distance`, `area`, or `area2d`. However, it doesn't affect distances embedded in WKT strings (e.g., `BUFFER(POINT(200 10),0.2)`), which are still in degrees. ++ +`distanceUnits` defaults to either `kilometers` if `geo` is `true`, or `degrees` if `geo` is `false`. ++ `distanceUnits` replaces the `units` attribute, which is now deprecated and mutually exclusive with this attribute.
Note: For RPTWithGeometrySpatialField (see below), there's always complete accuracy with the serialized geometry and so this doesn't control accuracy so much as it controls the trade-off of how big the index should be. distErrPct defaults to 0.15 for that field. -|maxDistErr |Defines the highest level of detail required for indexed data. If left blank, the default is one meter – just a bit less than 0.000009 degrees. This setting is used internally to compute an appropriate maxLevels (see below). -|worldBounds |Defines the valid numerical ranges for x and y, in the format of `ENVELOPE(minX, maxX, maxY, minY)`. If `geo="true"`, the standard lat-lon world boundaries are assumed. If `geo=false`, you should define your boundaries. -|distCalculator |Defines the distance calculation algorithm. If `geo=true`, "haversine" is the default. If `geo=false`, "cartesian" will be the default. Other possible values are "lawOfCosines", "vincentySphere" and "cartesian^2". -|prefixTree |Defines the spatial grid implementation. Since a PrefixTree (such as RecursivePrefixTree) maps the world as a grid, each grid cell is decomposed to another set of grid cells at the next level. If `geo=true` then the default prefix tree is "```geohash```", otherwise it's "```quad```". Geohash has 32 children at each level, quad has 4. Geohash can only be used for `geo=true` as it's strictly geospatial. A third choice is "```packedQuad```", which is generally more efficient than plain "quad", provided there are many levels -- perhaps 20 or more. -|maxLevels |Sets the maximum grid depth for indexed data. Instead, it's usually more intuitive to compute an appropriate maxLevels by specifying `maxDistErr` . -|=== +`distErrPct`:: +Defines the default precision of non-point shapes (both index & query), as a fraction between `0.0` (fully precise) and `0.5`. The closer this number is to zero, the more accurate the shape will be. However, more precise indexed shapes use more disk space and take longer to index.
++ +Bigger `distErrPct` values will make queries faster but less accurate. At query time this can be overridden in the query syntax, such as to `0.0` so as to not approximate the search shape. The default for the RPT field is `0.025`. ++ +NOTE: For RPTWithGeometrySpatialField (see below), there's always complete accuracy with the serialized geometry and so this doesn't control accuracy so much as it controls the trade-off of how big the index should be. distErrPct defaults to 0.15 for that field. -*_And there are others:_* `normWrapLongitude` _,_ `datelineRule`, `validationRule`, `autoIndex`, `allowMultiOverlap`, `precisionModel`. For further info, see notes below about `spatialContextFactory` implementations referenced above, especially the link to the JTS based one. +`maxDistErr`:: Defines the highest level of detail required for indexed data. If left blank, the default is one meter – just a bit less than 0.000009 degrees. This setting is used internally to compute an appropriate maxLevels (see below). + +`worldBounds`:: +Defines the valid numerical ranges for x and y, in the format of `ENVELOPE(minX, maxX, maxY, minY)`. If `geo="true"`, the standard lat-lon world boundaries are assumed. If `geo=false`, you should define your boundaries. + +`distCalculator`:: +Defines the distance calculation algorithm. If `geo=true`, "haversine" is the default. If `geo=false`, "cartesian" will be the default. Other possible values are "lawOfCosines", "vincentySphere" and "cartesian^2". + +`prefixTree`:: Defines the spatial grid implementation. Since a PrefixTree (such as RecursivePrefixTree) maps the world as a grid, each grid cell is decomposed to another set of grid cells at the next level. ++ +If `geo=true` then the default prefix tree is `geohash`, otherwise it's `quad`. Geohash has 32 children at each level, quad has 4. Geohash can only be used for `geo=true` as it's strictly geospatial. 
++ +A third choice is `packedQuad`, which is generally more efficient than `quad`, provided there are many levels -- perhaps 20 or more. + +`maxLevels`:: Sets the maximum grid depth for indexed data. Instead, it's usually more intuitive to compute an appropriate `maxLevels` by specifying `maxDistErr`. + +*_And there are others:_* `normWrapLongitude`, `datelineRule`, `validationRule`, `autoIndex`, `allowMultiOverlap`, `precisionModel`. For further info, see notes below about `spatialContextFactory` implementations referenced above, especially the link to the JTS-based one. [[SpatialSearch-JTSandPolygons]] === JTS and Polygons @@ -304,23 +323,30 @@ The RPT field supports generating a 2D grid of facet counts for documents having The heatmap feature is accessed from Solr's faceting feature. As a part of faceting, it supports the `key` local parameter as well as excluding tagged filter queries, just like other types of faceting do. This allows multiple heatmaps to be returned on the same field with different filters.
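As an illustrative sketch, a heatmap faceting request can be assembled as a query string using the parameters described in this section; the `geo_rpt` field name is hypothetical:

```python
from urllib.parse import urlencode

# Sketch of a heatmap faceting request; "geo_rpt" is a hypothetical RPT
# field, and the geom here is the documented world-covering default.
params = {
    "q": "*:*",
    "facet": "true",
    "facet.heatmap": "geo_rpt",
    "facet.heatmap.geom": '["-180 -90" TO "180 90"]',
    "facet.heatmap.format": "ints2D",
}
query_string = urlencode(params)
print(query_string)
```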
-|facet.heatmap.format |The format, either `ints2D` (default) or `png`. -|=== +`facet.heatmap`:: +The field name of type RPT. + +`facet.heatmap.geom`:: +The region to compute the heatmap on, specified using the rectangle-range syntax or WKT. It defaults to the world. For example: `["-180 -90" TO "180 90"]`. + +`facet.heatmap.gridLevel`:: +A specific grid level, which determines how big each grid cell is. Defaults to being computed via `distErrPct` (or `distErr`). + +`facet.heatmap.distErrPct`:: +A fraction of the size of `geom` used to compute `gridLevel`. Defaults to `0.15`. It's computed the same as a similarly named parameter for RPT. + +`facet.heatmap.distErr`:: +A cell error distance used to pick the grid level indirectly. It's computed the same as a similarly named parameter for RPT. + +`facet.heatmap.format`:: +The format, either `ints2D` (default) or `png`. [TIP] ==== -You'll experiment with different distErrPct values (probably 0.10 - 0.20) with various input geometries till the default size is what you're looking for. The specific details of how it's computed isn't important. For high-detail grids used in point-plotting (loosely one cell per pixel), set distErr to be the number of decimal-degrees of several pixels or so of the map being displayed. Also, you probably don't want to use a geohash based grid because the cell orientation between grid levels flip-flops between being square and rectangle. Quad is consistent and has more levels, albeit at the expense of a larger index. +You'll experiment with different `distErrPct` values (probably 0.10 - 0.20) with various input geometries until the default size is what you're looking for. The specific details of how it's computed aren't important. For high-detail grids used in point-plotting (loosely one cell per pixel), set `distErr` to be the number of decimal-degrees of several pixels or so of the map being displayed.
Also, you probably don't want to use a geohash-based grid because the cell orientation between grid levels flip-flops between being square and rectangle. Quad is consistent and has more levels, albeit at the expense of a larger index. ==== Here's some sample output in JSON (with "..." inserted for brevity): diff --git a/solr/solr-ref-guide/src/the-query-elevation-component.adoc b/solr/solr-ref-guide/src/the-query-elevation-component.adoc index 9898a082216..dcd3c7e190f 100644 --- a/solr/solr-ref-guide/src/the-query-elevation-component.adoc +++ b/solr/solr-ref-guide/src/the-query-elevation-component.adoc @@ -61,17 +61,16 @@ Optionally, in the Query Elevation Component configuration you can also specify foo ---- -The Query Elevation Search Component takes the following arguments: +The Query Elevation Search Component takes the following parameters: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`queryFieldType`:: +Specifies which fieldType should be used to analyze the incoming text. For example, it may be appropriate to use a fieldType with a LowerCaseFilter. -[cols="30,70",options="header"] -|=== -|Argument |Description -|`queryFieldType` |Specifies which fieldType should be used to analyze the incoming text. For example, it may be appropriate to use a fieldType with a LowerCaseFilter. -|`config-file` |Path to the file that defines query elevation. This file must exist in `/conf/` or `/`. If the file exists in the /conf/ directory it will be loaded once at startup. If it exists in the data directory, it will be reloaded for each IndexReader. -|`forceElevation` |By default, this component respects the requested `sort` parameter: if the request asks to sort by date, it will order the results by date. If `forceElevation=true` (the default), results will first return the boosted docs, then order by date. -|=== +`config-file`:: +Path to the file that defines query elevation. 
This file must exist in `/conf/` or `/`. If the file exists in the `conf/` directory it will be loaded once at startup. If it exists in the `data/` directory, it will be reloaded for each IndexReader. + +`forceElevation`:: +By default, this component respects the requested `sort` parameter: if the request asks to sort by date, it will order the results by date. If `forceElevation=true`, results will first return the boosted docs, then order by date. The default is `false`. [[TheQueryElevationComponent-elevate.xml]] === elevate.xml diff --git a/solr/solr-ref-guide/src/the-stats-component.adoc b/solr/solr-ref-guide/src/the-stats-component.adoc index 96ba88c1fe3..a5eb334a1bf 100644 --- a/solr/solr-ref-guide/src/the-stats-component.adoc +++ b/solr/solr-ref-guide/src/the-stats-component.adoc @@ -32,18 +32,14 @@ bin/solr -e techproducts The Stats Component accepts the following parameters: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`stats`:: +If `true`, invokes the Stats component. -[cols="30,70",options="header"] -|=== -|Parameter |Description -|stats |If **true**, then invokes the Stats component. -|stats.field a| +`stats.field`:: Specifies a field for which statistics should be generated. This parameter may be invoked multiple times in a query in order to request statistics on multiple fields. - ++ <> may be used to indicate which subset of the supported statistics should be computed, and/or that statistics should be computed over the results of an arbitrary numeric function (or query) instead of a simple field name. See the examples below. -|=== [[TheStatsComponent-Example]] === Example @@ -96,26 +92,47 @@ The query below demonstrates computing stats against two different fields numeri [[TheStatsComponent-StatisticsSupported]] == Statistics Supported -The table below explains the statistics supported by the Stats component.
Not all statistics are supported for all field types, and not all statistics are computed by default (See <> below for details) +The table below explains the statistics supported by the Stats component. Not all statistics are supported for all field types, and not all statistics are computed by default (see <> below for details). -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`min`:: +The minimum value of the field/function in all documents in the set. This statistic is computed for all field types and is computed by default. -[cols="10,10,50,20,10",options="header"] -|=== -|Local Param |Sample Input |Description |Supported Types |Computed by Default -|min |true |The minimum value of the field/function in all documents in the set. |All |Yes -|max |true |The maximum value of the field/function in all documents in the set. |All |Yes -|sum |true |The sum of all values of the field/function in all documents in the set. |Numeric & Date |Yes -|count |true |The number of values found in all documents in the set for this field/function. |All |Yes -|missing |true |The number of documents in the set which do not have a value for this field/function. |All |Yes -|sumOfSquares |true |Sum of all values squared (a by product of computing stddev) |Numeric & Date |Yes -|mean |true |The average `(v1 + v2 .... + vN)/N` |Numeric & Date |Yes -|stddev |true |Standard deviation, measuring how widely spread the values in the data set are. |Numeric & Date |Yes -|percentiles |"1,99,99.9" |A list of percentile values based on cut-off points specified by the param value. These values are an approximation, using the https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf[t-digest algorithm]. |Numeric |No -|distinctValues |true |The set of all distinct values for the field/function in all of the documents in the set.
This calculation can be very expensive for fields that do not have a tiny cardinality. |All |No -|countDistinct |true |The exact number of distinct values in the field/function in all of the documents in the set. This calculation can be very expensive for fields that do not have a tiny cardinality. |All |No -|cardinality |"true" or"0.3" |A statistical approximation (currently using the https://en.wikipedia.org/wiki/HyperLogLog[HyperLogLog] algorithm) of the number of distinct values in the field/function in all of the documents in the set. This calculation is much more efficient then using the 'countDistinct' option, but may not be 100% accurate. Input for this option can be floating point number between 0.0 and 1.0 indicating how aggressively the algorithm should try to be accurate: 0.0 means use as little memory as possible; 1.0 means use as much memory as needed to be as accurate as possible. 'true' is supported as an alias for "0.3" |All |No -|=== +`max`:: +The maximum value of the field/function in all documents in the set. This statistic is computed for all field types and is computed by default. + +`sum`:: +The sum of all values of the field/function in all documents in the set. This statistic is computed for numeric and date field types and is computed by default. + +`count`:: +The number of values found in all documents in the set for this field/function. This statistic is computed for all field types and is computed by default. + +`missing`:: +The number of documents in the set which do not have a value for this field/function. This statistic is computed for all field types and is computed by default. + +`sumOfSquares`:: +Sum of all values squared (a byproduct of computing `stddev`). This statistic is computed for numeric and date field types and is computed by default. + +`mean`:: +The average `(v1 + v2 .... + vN)/N`. This statistic is computed for numeric and date field types and is computed by default.
+ +`stddev`:: +Standard deviation, measuring how widely spread the values in the data set are. This statistic is computed for numeric and date field types and is computed by default. + +`percentiles`:: +A list of percentile values based on cut-off points specified by the parameter value, such as `1,99,99.9`. These values are an approximation, using the https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf[t-digest algorithm]. This statistic is computed for numeric field types and is not computed by default. + +`distinctValues`:: +The set of all distinct values for the field/function in all of the documents in the set. This calculation can be very expensive for fields that do not have a tiny cardinality. This statistic is computed for all field types but is not computed by default. + +`countDistinct`:: +The exact number of distinct values in the field/function in all of the documents in the set. This calculation can be very expensive for fields that do not have a tiny cardinality. This statistic is computed for all field types but is not computed by default. + +`cardinality`:: +A statistical approximation (currently using the https://en.wikipedia.org/wiki/HyperLogLog[HyperLogLog] algorithm) of the number of distinct values in the field/function in all of the documents in the set. This calculation is much more efficient than using the `countDistinct` option, but may not be 100% accurate. ++ +Input for this option can be a floating point number between `0.0` and `1.0`, indicating how aggressively the algorithm should try to be accurate: `0.0` means use as little memory as possible; `1.0` means use as much memory as needed to be as accurate as possible. `true` is supported as an alias for `0.3`. ++ +This statistic is computed for all field types but is not computed by default.
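+
+For example, the opt-in `percentiles` and `cardinality` statistics above can be requested together via local parameters on `stats.field` (the `price` field below comes from the `techproducts` example; adjust the field name to match your schema):
+
+[source,text]
+----
+http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&stats=true&stats.field={!percentiles='1,99,99.9' cardinality=true}price
+----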
[[TheStatsComponent-LocalParameters]] == Local Parameters diff --git a/solr/solr-ref-guide/src/the-term-vector-component.adoc b/solr/solr-ref-guide/src/the-term-vector-component.adoc index fb92dc9db31..dd73d8640ec 100644 --- a/solr/solr-ref-guide/src/the-term-vector-component.adoc +++ b/solr/solr-ref-guide/src/the-term-vector-component.adoc @@ -127,37 +127,48 @@ The example below shows an invocation of this component using the above configur [[TheTermVectorComponent-RequestParameters]] === Request Parameters -The example below shows the available request parameters for this component: +The example below shows some of the available request parameters for this component: -`\http://localhost:8983/solr/techproducts/tvrh?q=includes:[* TO *]&rows=10&indent=true&tv=true&tv.tf=true&tv.df=true&tv.positions=true&tv.offsets=true&tv.payloads=true&tv.fl=includes` +[source,bash] +http://localhost:8983/solr/techproducts/tvrh?q=includes:[* TO *]&rows=10&indent=true&tv=true&tv.tf=true&tv.df=true&tv.positions=true&tv.offsets=true&tv.payloads=true&tv.fl=includes -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`tv`:: +If `true`, the Term Vector Component will run. -[cols="20,60,20",options="header"] -|=== -|Boolean Parameters |Description |Type -|tv |Should the component run or not |boolean -|tv.docIds |Returns term vectors for the specified list of Lucene document IDs (not the Solr Unique Key). |comma seperated integers -|tv.fl |Returns term vectors for the specified list of fields. If not specified, the `fl` parameter is used. |comma seperated list of field names -|tv.all |A shortcut that invokes all the boolean parameters listed below. |boolean -|tv.df |Returns the Document Frequency (DF) of the term in the collection. This can be computationally expensive. |boolean -|tv.offsets |Returns offset information for each term in the document. |boolean -|tv.positions |Returns position information. 
|boolean -|tv.payloads |Returns payload information. |boolean -|tv.tf |Returns document term frequency info per term in the document. |boolean -|tv.tf_idf a| -Calculates TF / DF (ie: TF * IDF) for each term. Please note that this is a _literal_ calculation of "Term Frequency multiplied by Inverse Document Frequency" and *not* a classical TF-IDF similarity measure. +`tv.docIds`:: +For a given comma-separated list of Lucene document IDs (*not* the Solr Unique Key), term vectors will be returned. -Requires the parameters `tv.tf` and `tv.df` to be "true". This can be computationally expensive. (The results are not shown in example output) +`tv.fl`:: +For a given comma-separated list of fields, term vectors will be returned. If not specified, the `fl` parameter is used. - |boolean -|=== +`tv.all`:: +If `true`, all the boolean parameters listed below (`tv.df`, `tv.offsets`, `tv.positions`, `tv.payloads`, `tv.tf` and `tv.tf_idf`) will be enabled. + +`tv.df`:: +If `true`, returns the Document Frequency (DF) of the term in the collection. This can be computationally expensive. + +`tv.offsets`:: +If `true`, returns offset information for each term in the document. + +`tv.positions`:: +If `true`, returns position information. + +`tv.payloads`:: +If `true`, returns payload information. + +`tv.tf`:: +If `true`, returns document term frequency info for each term in the document. + +`tv.tf_idf`:: a| +If `true`, calculates TF / DF (ie: TF * IDF) for each term. Please note that this is a _literal_ calculation of "Term Frequency multiplied by Inverse Document Frequency" and *not* a classical TF-IDF similarity measure. ++ +This parameter requires both `tv.tf` and `tv.df` to be "true". This can be computationally expensive. 
(The results are not shown in example output) To learn more about TermVector component output, see the Wiki page: http://wiki.apache.org/solr/TermVectorComponentExampleOptions -For schema requirements, see the Wiki page: http://wiki.apache.org/solr/FieldOptionsByUseCase +For schema requirements, see also the section <>. [[TheTermVectorComponent-SolrJandtheTermVectorComponent]] == SolrJ and the Term Vector Component -Neither the SolrQuery class nor the QueryResponse class offer specific method calls to set Term Vector Component parameters or get the "termVectors" output. However, there is a patch for it: https://issues.apache.org/jira/browse/SOLR-949[SOLR-949]. +Neither the `SolrQuery` class nor the `QueryResponse` class offer specific method calls to set Term Vector Component parameters or get the "termVectors" output. However, there is a patch for it: https://issues.apache.org/jira/browse/SOLR-949[SOLR-949]. diff --git a/solr/solr-ref-guide/src/the-terms-component.adoc b/solr/solr-ref-guide/src/the-terms-component.adoc index 346278b52a0..c8ec78234e8 100644 --- a/solr/solr-ref-guide/src/the-terms-component.adoc +++ b/solr/solr-ref-guide/src/the-terms-component.adoc @@ -53,89 +53,86 @@ You could add this component to another handler if you wanted to, and pass "term The parameters below allow you to control what terms are returned. You can also configure any of these with the request handler if you'd like to set them permanently. Or, you can add them to the query request. These parameters are: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed - -[cols="20,15,15,50",options="header"] -|=== -|Parameter |Required |Default |Description -|terms |No |false a| -If set to true, enables the Terms Component. By default, the Terms Component is off. - +`terms`:: +If set to `true`, enables the Terms Component. By default, the Terms Component is off (`false`). 
++ Example: `terms=true` -|terms.fl |Yes |null a| -Specifies the field from which to retrieve terms. - +`terms.fl`:: +Specifies the field from which to retrieve terms. This parameter is required if `terms=true`. ++ Example: `terms.fl=title` -|terms.list |No |null a| +`terms.list`:: Fetches the document frequency for a comma delimited list of terms. Terms are always returned in index order. If `terms.ttf` is set to true, also returns their total term frequency. If multiple `terms.fl` are defined, these statistics will be returned for each term in each requested field. - ++ Example: `terms.list=termA,termB,termC` -|terms.limit |No |10 a| -Specifies the maximum number of terms to return. The default is 10. If the limit is set to a number less than 0, then no maximum limit is enforced. Although this is not required, either this parameter or `terms.upper` must be defined. - +`terms.limit`:: +Specifies the maximum number of terms to return. The default is `10`. If the limit is set to a number less than 0, then no maximum limit is enforced. Although this is not required, either this parameter or `terms.upper` must be defined. ++ Example: `terms.limit=20` -|terms.lower |No |empty string a| +`terms.lower`:: Specifies the term at which to start. If not specified, the empty string is used, causing Solr to start at the beginning of the field. - ++ Example: `terms.lower=orange` -|terms.lower.incl |No |true a| +`terms.lower.incl`:: If set to true, includes the lower-bound term (specified with `terms.lower`) in the result set. - ++ Example: `terms.lower.incl=false` -|terms.mincount |No |null a| +`terms.mincount`:: Specifies the minimum document frequency to return in order for a term to be included in a query response. Results are inclusive of the mincount (that is, >= mincount). - ++ Example: `terms.mincount=5` -|terms.maxcount |No |null a| +`terms.maxcount`:: Specifies the maximum document frequency a term must have in order to be included in a query response.
The default setting is -1, which sets no upper bound. Results are inclusive of the maxcount (that is, <= maxcount). - ++ Example: `terms.maxcount=25` -|terms.prefix |No |null a| +`terms.prefix`:: Restricts matches to terms that begin with the specified string. - ++ Example: `terms.prefix=inter` -|terms.raw |No |false a| +`terms.raw`:: If set to true, returns the raw characters of the indexed term, regardless of whether it is human-readable. For instance, the indexed form of numeric fields is not human-readable. - ++ Example: `terms.raw=true` -|terms.regex |No |null a| +`terms.regex`:: Restricts matches to terms that match the regular expression. - ++ Example: `terms.regex=.*pedist` -|terms.regex.flag |No |null a| +`terms.regex.flag`:: Defines a Java regex flag to use when evaluating the regular expression defined with `terms.regex`. See http://docs.oracle.com/javase/tutorial/essential/regex/pattern.html for details of each flag. Valid options are: -* case_insensitive -* comments -* multiline -* literal -* dotall -* unicode_case -* canon_eq -* unix_lines - +* `case_insensitive` +* `comments` +* `multiline` +* `literal` +* `dotall` +* `unicode_case` +* `canon_eq` +* `unix_lines` ++ Example: `terms.regex.flag=case_insensitive` -|terms.stats |No |null |Include index statistics in the results. Currently returns only the *numDocs* for a collection. When combined with terms.list it provides enough information to compute idf for a list of terms. -|terms.sort |No |count a| -Defines how to sort the terms returned. Valid options are *count*, which sorts by the term frequency, with the highest term frequency first, or *index*, which sorts in index order. +`terms.stats`:: +Include index statistics in the results. Currently returns only the `numDocs` for a collection. When combined with `terms.list`, it provides enough information to compute inverse document frequency (IDF) for a list of terms. +`terms.sort`:: +Defines how to sort the terms returned.
Valid options are `count`, which sorts by the term frequency, with the highest term frequency first, or `index`, which sorts in index order. ++ Example: `terms.sort=index` -|terms.ttf |No |false a| +`terms.ttf`:: If set to true, returns both `df` (docFreq) and `ttf` (totalTermFreq) statistics for each requested term in `terms.list`. In this case, the response format is: - ++ [source,xml] ---- @@ -148,19 +145,19 @@ If set to true, returns both `df` (docFreq) and `ttf` (totalTermFreq) statistics ---- -|terms.upper |No |null a| +`terms.upper`:: Specifies the term to stop at. Although this parameter is not required, either this parameter or `terms.limit` must be defined. - ++ Example: `terms.upper=plum` -|terms.upper.incl |No |false a| +`terms.upper.incl`:: If set to true, the upper bound term is included in the result set. The default is false. - ++ Example: `terms.upper.incl=true` -|=== +The response to a terms request is a list of the terms and their document frequency values. -The output is a list of the terms and their document frequency values. See below for examples. +You may also be interested in the {solr-javadocs}/solr-core/org/apache/solr/handler/component/TermsComponent.html[TermsComponent javadoc]. [[TheTermsComponent-Examples]] == Examples @@ -296,16 +293,8 @@ Result: The TermsComponent also supports distributed indexes. For the `/terms` request handler, you must provide the following two parameters: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`shards`:: +Specifies the shards in your distributed indexing configuration. For more information about distributed indexing, see <>. -[cols="30,70",options="header"] -|=== -|Parameter |Description -|shards |Specifies the shards in your distributed indexing configuration. For more information about distributed indexing, see <>. -|shards.qt |Specifies the request handler Solr uses for requests to shards. 
-|=== - -[[TheTermsComponent-MoreResources]] -== More Resources - -* {solr-javadocs}/solr-core/org/apache/solr/handler/component/TermsComponent.html[TermsComponent javadoc] +`shards.qt`:: +Specifies the request handler Solr uses for requests to shards. diff --git a/solr/solr-ref-guide/src/update-request-processors.adoc b/solr/solr-ref-guide/src/update-request-processors.adoc index d1f5c351e74..7942028f792 100644 --- a/solr/solr-ref-guide/src/update-request-processors.adoc +++ b/solr/solr-ref-guide/src/update-request-processors.adoc @@ -386,7 +386,7 @@ These Update processors do not need any configuration is your `solrconfig.xml` . The `TemplateUpdateProcessorFactory` can be used to add new fields to documents based on a template pattern. -Use the parameter `processor=Template` to use it. The template parameter `Template.field` (multivalued) define the field to add and the pattern. Templates may contain placeholders which refer to other fields in the document. You can have multiple `Template.field` parameters in a single request. +Use the parameter `processor=Template` to use it. The template parameter `Template.field` (multivalued) defines the field to add and the pattern. Templates may contain placeholders which refer to other fields in the document. You can have multiple `Template.field` parameters in a single request. For example: @@ -395,7 +395,7 @@ For example: processor=Template&Template.field=fullName:Mr. {firstName} {lastName} ---- -The above example would add a new field to the document called `fullName`. The fields `firstName and` `lastName` are supplied from the document fields. If either of them is missing, that part is replaced with an empty string. If those fields are multi-valued, only the first value is used. +The above example would add a new field to the document called `fullName`. The fields `firstName` and `lastName` are supplied from the document fields. If either of them is missing, that part is replaced with an empty string. 
If those fields are multi-valued, only the first value is used. ==== AtomicUpdateProcessorFactory @@ -414,4 +414,4 @@ The above parameters convert a normal `update` operation on * `field1` to an atomic `add` operation * `field2` to an atomic `set` operation * `field3` to an atomic `inc` operation -* `field4` to an atomic `remove` operation \ No newline at end of file +* `field4` to an atomic `remove` operation diff --git a/solr/solr-ref-guide/src/updatehandlers-in-solrconfig.adoc b/solr/solr-ref-guide/src/updatehandlers-in-solrconfig.adoc index 664bd8c8e38..040da8626db 100644 --- a/solr/solr-ref-guide/src/updatehandlers-in-solrconfig.adoc +++ b/solr/solr-ref-guide/src/updatehandlers-in-solrconfig.adoc @@ -46,17 +46,16 @@ For more information about Near Real Time operations, see <> for more information: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`numRecordsToKeep`:: +The number of update records to keep per log. The default is `100`. + +`maxNumLogsToKeep`:: +The maximum number of logs to keep. The default is `10`. + +`numVersionBuckets`:: +The number of buckets used to keep track of max version values when checking for re-ordered updates; increase this value to reduce the cost of synchronizing access to version buckets during high-volume indexing; this requires `(8 bytes (long) * numVersionBuckets)` of heap space per Solr core. The default is `65536`. -[cols="25,10,10,55",options="header"] -|=== -|Setting Name |Type |Default |Description -|numRecordsToKeep |int |100 |The number of update records to keep per log -|maxNumLogsToKeep |int |10 |The maximum number of logs keep -|numVersionBuckets |int |65536 |The number of buckets used to keep track of max version values when checking for re-ordered updates; increase this value to reduce the cost of synchronizing access to version buckets during high-volume indexing, this requires (8 bytes (long) * numVersionBuckets) of heap space per Solr core.
-|=== An example, to be included under `` in `solrconfig.xml`, employing the above advanced settings: diff --git a/solr/solr-ref-guide/src/updating-parts-of-documents.adoc b/solr/solr-ref-guide/src/updating-parts-of-documents.adoc index ecd9b4c96f6..fac3cac57b1 100644 --- a/solr/solr-ref-guide/src/updating-parts-of-documents.adoc +++ b/solr/solr-ref-guide/src/updating-parts-of-documents.adoc @@ -35,37 +35,22 @@ Solr supports several modifiers that atomically update values of a document. Thi To use atomic updates, add a modifier to the field that needs to be updated. The content can be updated, added to, or incrementally increased if a number. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed - -[cols="30,70",options="header"] -|=== -|Modifier |Usage -|set a| +`set`:: Set or replace the field value(s) with the specified value(s), or remove the values if 'null' or empty list is specified as the new value. ++ +May be specified as a single value, or as a list for multiValued fields. -May be specified as a single value, or as a list for multiValued fields +`add`:: +Adds the specified values to a multiValued field. May be specified as a single value, or as a list. -|add a| -Adds the specified values to a multiValued field. +`remove`:: +Removes (all occurrences of) the specified values from a multiValued field. May be specified as a single value, or as a list. -May be specified as a single value, or as a list. +`removeregex`:: +Removes all occurrences of the specified regex from a multiValued field. May be specified as a single value, or as a list. -|remove a| -Removes (all occurrences of) the specified values from a multiValued field. - -May be specified as a single value, or as a list. - -|removeregex a| -Removes all occurrences of the specified regex from a multiValued field. - -May be specified as a single value, or as a list. - -|inc a| -Increments a numeric value by a specific amount. 
- -Must be specified as a single numeric value. - -|=== +`inc`:: +Increments a numeric value by a specific amount. Must be specified as a single numeric value. [[UpdatingPartsofDocuments-FieldStorage]] === Field Storage @@ -130,20 +115,11 @@ An atomic update operation is performed using this approach only when the fields To use in-place updates, add a modifier to the field that needs to be updated. The content can be updated or incrementally increased. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`set`:: +Set or replace the field value(s) with the specified value(s). May be specified as a single value. -[cols="30,70",options="header"] -|=== -|Modifier |Usage -|set a| -Set or replace the field value(s) with the specified value(s). - -May be specified as a single value. -|inc a| -Increments a numeric value by a specific amount. - -Must be specified as a single numeric value. -|=== +`inc`:: +Increments a numeric value by a specific amount. Must be specified as a single numeric value. [[UpdatingPartsofDocuments-Example.1]] === Example diff --git a/solr/solr-ref-guide/src/uploading-data-with-index-handlers.adoc b/solr/solr-ref-guide/src/uploading-data-with-index-handlers.adoc index 8bad5f58062..6a9d350120a 100644 --- a/solr/solr-ref-guide/src/uploading-data-with-index-handlers.adoc +++ b/solr/solr-ref-guide/src/uploading-data-with-index-handlers.adoc @@ -74,18 +74,15 @@ For example: The add command supports some optional attributes which may be specified. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`commitWithin`:: +Add the document within the specified number of milliseconds. -[cols="30,70",options="header"] -|=== -|Optional Parameter |Parameter Description -|commitWithin=_number_ |Add the document within the specified number of milliseconds -|overwrite=_boolean_ |Default is true. 
Indicates if the unique key constraints should be checked to overwrite previous versions of the same document (see below) -|=== +`overwrite`:: +Default is `true`. Indicates if the unique key constraints should be checked to overwrite previous versions of the same document (see below). -If the document schema defines a unique key, then by default an `/update` operation to add a document will overwrite (ie: replace) any document in the index with the same unique key. If no unique key has been defined, indexing performance is somewhat faster, as no check has to be made for an existing documents to replace. +If the document schema defines a unique key, then by default an `/update` operation to add a document will overwrite (i.e., replace) any document in the index with the same unique key. If no unique key has been defined, indexing performance is somewhat faster, as no check has to be made for existing documents to replace. -If you have a unique key field, but you feel confident that you can safely bypass the uniqueness check (eg: you build your indexes in batch, and your indexing code guarantees it never adds the same document more than once) you can specify the `overwrite="false"` option when adding your documents. +If you have a unique key field, but you feel confident that you can safely bypass the uniqueness check (e.g., you build your indexes in batch, and your indexing code guarantees it never adds the same document more than once), you can specify the `overwrite="false"` option when adding your documents. [[UploadingDatawithIndexHandlers-XMLUpdateCommands]] === XML Update Commands @@ -101,15 +98,12 @@ The `` operation requests Solr to merge internal data structures in or The `` and `` elements accept these optional attributes: -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`waitSearcher`:: +Default is `true`.
Blocks until a new searcher is opened and registered as the main query searcher, making the changes visible. -[cols="30,70",options="header"] -|=== -|Optional Attribute |Description -|waitSearcher |Default is true. Blocks until a new searcher is opened and registered as the main query searcher, making the changes visible. -|expungeDeletes |(commit only) Default is false. Merges segments that have more than 10% deleted docs, expunging them in the process. -|maxSegments |(optimize only) Default is 1. Merges the segments down to no more than this number of segments. -|=== +`expungeDeletes`:: (commit only) Default is `false`. Merges segments that have more than 10% deleted docs, expunging them in the process. + +`maxSegments`:: (optimize only) Default is `1`. Merges the segments down to no more than this number of segments. Here are examples of and using optional attributes: @@ -426,29 +420,83 @@ The CSV handler allows the specification of many parameters in the URL in the fo The table below describes the parameters for the update handler. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`separator`:: +Character used as field separator; default is ",". This parameter is global; for per-field usage, see the `split` parameter. ++ +Example: `separator=%09` -[cols="20,40,20,20",options="header"] -|=== -|Parameter |Usage |Global (g) or Per Field (f) |Example -|separator |Character used as field separator; default is "," |g,(f: see split) |separator=%09 -|trim |If true, remove leading and trailing whitespace from values. Default=false. |g,f |f.isbn.trim=true trim=false -|header |Set to true if first line of input contains field names. These will be used if the *fieldnames* parameter is absent. |g | -|fieldnames |Comma separated list of field names to use when adding documents. |g |fieldnames=isbn,price,title -|literal. |A literal value for a specified field name. 
|g |literal.color=red -|skip |Comma separated list of field names to skip. |g |skip=uninteresting,shoesize -|skipLines |Number of lines to discard in the input stream before the CSV data starts, including the header, if present. Default=0. |g |skipLines=5 -|encapsulator |The character optionally used to surround values to preserve characters such as the CSV separator or whitespace. This standard CSV format handles the encapsulator itself appearing in an encapsulated value by doubling the encapsulator. |g,(f: see split) |encapsulator=" -|escape |The character used for escaping CSV separators or other reserved characters. If an escape is specified, the encapsulator is not used unless also explicitly specified since most formats use either encapsulation or escaping, not both |g |escape=\ -|keepEmpty |Keep and index zero length (empty) fields. Default=false. |g,f |f.price.keepEmpty=true -|map |Map one value to another. Format is value:replacement (which can be empty.) |g,f |map=left:right f.subject.map=history:bunk -|split |If true, split a field into multiple values by a separate parser. |f | -|overwrite |If true (the default), check for and overwrite duplicate documents, based on the uniqueKey field declared in the Solr schema. If you know the documents you are indexing do not contain any duplicates then you may see a considerable speed up setting this to false. |g | -|commit |Issues a commit after the data has been ingested. |g | -|commitWithin |Add the document within the specified number of milliseconds. |g |commitWithin=10000 -|rowid |Map the rowid (line number) to a field specified by the value of the parameter, for instance if your CSV doesn't have a unique key and you want to use the row id as such. |g |rowid=id -|rowidOffset |Add the given offset (as an int) to the rowid before adding it to the document. Default is 0 |g |rowidOffset=10 -|=== +`trim`:: +If `true`, remove leading and trailing whitespace from values. The default is `false`. 
This parameter can be either global or per-field. ++ +Examples: `f.isbn.trim=true` or `trim=false` + +`header`:: +Set to `true` if first line of input contains field names. These will be used if the `fieldnames` parameter is absent. This parameter is global. + +`fieldnames`:: +Comma-separated list of field names to use when adding documents. This parameter is global. ++ +Example: `fieldnames=isbn,price,title` + +`literal._field_name_`:: +A literal value for a specified field name. This parameter is global. ++ +Example: `literal.color=red` + +`skip`:: +Comma-separated list of field names to skip. This parameter is global. ++ +Example: `skip=uninteresting,shoesize` + +`skipLines`:: +Number of lines to discard in the input stream before the CSV data starts, including the header, if present. The default is `0`. This parameter is global. ++ +Example: `skipLines=5` + +`encapsulator`:: The character optionally used to surround values to preserve characters such as the CSV separator or whitespace. This standard CSV format handles the encapsulator itself appearing in an encapsulated value by doubling the encapsulator. ++ +This parameter is global; for per-field usage, see `split`. ++ +Example: `encapsulator="` + +`escape`:: The character used for escaping CSV separators or other reserved characters. If an escape is specified, the encapsulator is not used unless also explicitly specified since most formats use either encapsulation or escaping, not both. This parameter is global. ++ +Example: `escape=\` + +`keepEmpty`:: +Keep and index zero length (empty) fields. The default is `false`. This parameter can be global or per-field. ++ +Example: `f.price.keepEmpty=true` + +`map`:: Map one value to another. Format is value:replacement (which can be empty). This parameter can be global or per-field. ++ +Example: `map=left:right` or `f.subject.map=history:bunk` + +`split`:: +If `true`, split a field into multiple values by a separate parser. This parameter is used on a per-field basis.
+ +`overwrite`:: +If `true` (the default), check for and overwrite duplicate documents, based on the uniqueKey field declared in the Solr schema. If you know the documents you are indexing do not contain any duplicates then you may see a considerable speed up setting this to `false`. ++ +This parameter is global. + +`commit`:: +Issues a commit after the data has been ingested. This parameter is global. + +`commitWithin`:: +Add the document within the specified number of milliseconds. This parameter is global. ++ +Example: `commitWithin=10000` + +`rowid`:: +Map the `rowid` (line number) to a field specified by the value of the parameter, for instance if your CSV doesn't have a unique key and you want to use the row id as such. This parameter is global. ++ +Example: `rowid=id` + +`rowidOffset`:: +Add the given offset (as an integer) to the `rowid` before adding it to the document. Default is `0`. This parameter is global. ++ +Example: `rowidOffset=10` [[UploadingDatawithIndexHandlers-IndexingTab-Delimitedfiles]] === Indexing Tab-Delimited files diff --git a/solr/solr-ref-guide/src/uploading-data-with-solr-cell-using-apache-tika.adoc b/solr/solr-ref-guide/src/uploading-data-with-solr-cell-using-apache-tika.adoc index 670ef2b90ed..8096e8c11d3 100644 --- a/solr/solr-ref-guide/src/uploading-data-with-solr-cell-using-apache-tika.adoc +++ b/solr/solr-ref-guide/src/uploading-data-with-solr-cell-using-apache-tika.adoc @@ -101,41 +101,73 @@ This command allows you to query the document using an attribute, as in: `\http: The table below describes the parameters accepted by the Extracting Request Handler. -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`capture`:: +Captures XHTML elements with the specified name for a supplementary addition to the Solr document. This parameter can be useful for copying chunks of the XHTML into a separate field. For instance, it could be used to grab paragraphs (`<p>`) and index them into a separate field. Note that content is still also captured into the overall "content" field. + +`captureAttr`:: +Indexes attributes of the Tika XHTML elements into separate fields, named after the element. If set to `true`, for example, when extracting from HTML, Tika can return the `href` attributes in `<a>` tags as fields named "a". See the examples below. + +`commitWithin`:: +Add the document within the specified number of milliseconds. + +`date.formats`:: +Defines the date format patterns to identify in the documents. + +`defaultField`:: +If the `uprefix` parameter (see below) is not specified and a field cannot be determined, the default field will be used. + +`extractOnly`:: +Default is `false`. If `true`, returns the extracted content from Tika without indexing the document. This literally includes the extracted XHTML as a string in the response. When viewing manually, it may be useful to use a response format other than XML to aid in viewing the embedded XHTML tags. For an example, see http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput. + +`extractFormat`:: +The default is `xml`, but the other option is `text`. Controls the serialization format of the extract content. The `xml` format is actually XHTML, the same format that results from passing the `-x` command to the Tika command line application, while the text format is like that produced by Tika's `-t` command. This parameter is valid only if `extractOnly` is set to `true`. + +`fmap._source_field_`:: +Maps (moves) one field name to another. The `source_field` must be a field in incoming documents, and the value is the Solr field to map to. Example: `fmap.content=text` causes the data in the `content` field generated by Tika to be moved to Solr's `text` field. + +`ignoreTikaException`:: +If `true`, exceptions found during processing will be skipped. Any metadata available, however, will be indexed.
+ +`literal._fieldname_`:: +Populates a field with the name supplied with the specified value for each document. The data can be multivalued if the field is multivalued. + +`literalsOverride`:: +If `true` (the default), literal field values will override other values with the same field name. If `false`, literal values defined with `literal._fieldname_` will be appended to data already in the fields extracted from Tika. If setting `literalsOverride` to `false`, the field must be multivalued. + +`lowernames`:: +Values are `true` or `false`. If `true`, all field names will be mapped to lowercase with underscores, if needed. For example, "Content-Type" would be mapped to "content_type." + +`multipartUploadLimitInKB`:: +Useful if uploading very large documents, this defines the KB size of documents to allow. + +`passwordsFile`:: +Defines a file path and name for a file containing file name to password mappings. + +`resource.name`:: +Specifies the optional name of the file. Tika can use it as a hint for detecting a file's MIME type. + +`resource.password`:: +Defines a password to use for a password-protected PDF or OOXML file. + +`tika.config`:: +Defines a file path and name to a customized Tika configuration file. This is only required if you have customized your Tika implementation. + +`uprefix`:: +Prefixes all fields that are not defined in the schema with the given prefix. This is very useful when combined with dynamic field definitions. Example: `uprefix=ignored_` would effectively ignore all unknown fields generated by Tika given the example schema contains `<dynamicField name="ignored_*" type="ignored"/>` + +`xpath`:: +When extracting, only return Tika XHTML content that satisfies the given XPath expression. See http://tika.apache.org/1.7/index.html for details on the format of Tika XHTML. See also http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput.
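Several of the parameters above are typically combined in one extract request. As a rough sketch (the collection name `gettingstarted`, the document id `doc1`, and the file `example.pdf` are assumptions for illustration, not part of the reference text):

```shell
# Assemble a hypothetical extract request: literal.id supplies the unique
# key, fmap.content moves Tika's "content" field to "text", and
# uprefix=ignored_ prefixes any Tika-generated fields not in the schema.
base='http://localhost:8983/solr/gettingstarted/update/extract'
params='literal.id=doc1&fmap.content=text&uprefix=ignored_&commit=true'
echo "${base}?${params}"
# The file itself would then be posted with something like:
#   curl "${base}?${params}" -F 'myfile=@example.pdf'
```

The echoed URL shows how each parameter rides along as a plain request parameter; the actual upload only works against a running Solr instance with a matching schema.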
-[cols="30,70",options="header"] -|=== -|Parameter |Description -|capture |Captures XHTML elements with the specified name for a supplementary addition to the Solr document. This parameter can be useful for copying chunks of the XHTML into a separate field. For instance, it could be used to grab paragraphs (`<p>`) and index them into a separate field. Note that content is still also captured into the overall "content" field. -|captureAttr |Indexes attributes of the Tika XHTML elements into separate fields, named after the element. If set to true, for example, when extracting from HTML, Tika can return the href attributes in <a> tags as fields named "a". See the examples below. -|commitWithin |Add the document within the specified number of milliseconds. -|date.formats |Defines the date format patterns to identify in the documents. -|defaultField |If the uprefix parameter (see below) is not specified and a field cannot be determined, the default field will be used. -|extractOnly |Default is false. If true, returns the extracted content from Tika without indexing the document. This literally includes the extracted XHTML as a string in the response. When viewing manually, it may be useful to use a response format other than XML to aid in viewing the embedded XHTML tags.For an example, see http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput. -|extractFormat |Default is "xml", but the other option is "text". Controls the serialization format of the extract content. The xml format is actually XHTML, the same format that results from passing the `-x` command to the Tika command line application, while the text format is like that produced by Tika's `-t` command. This parameter is valid only if `extractOnly` is set to true. -|fmap.<__source_field__> |Maps (moves) one field name to another. The `source_field` must be a field in incoming documents, and the value is the Solr field to map to. Example: `fmap.content=text` causes the data in the `content` field generated by Tika to be moved to the Solr's `text` field. -|ignoreTikaException |If true, exceptions found during processing will be skipped. Any metadata available, however, will be indexed. -|literal.<__fieldname__> |Populates a field with the name supplied with the specified value for each document.
The data can be multivalued if the field is multivalued. -|literalsOverride |If true (the default), literal field values will override other values with the same field name. If false, literal values defined with `literal.<__fieldname__>` will be appended to data already in the fields extracted from Tika. If setting `literalsOverride` to "false", the field must be multivalued. -|lowernames |Values are "true" or "false". If true, all field names will be mapped to lowercase with underscores, if needed. For example, "Content-Type" would be mapped to "content_type." -|multipartUploadLimitInKB |Useful if uploading very large documents, this defines the KB size of documents to allow. -|passwordsFile |Defines a file path and name for a file of file name to password mappings. -|resource.name |Specifies the optional name of the file. Tika can use it as a hint for detecting a file's MIME type. -|resource.password |Defines a password to use for a password-protected PDF or OOXML file -|tika.config |Defines a file path and name to a customized Tika configuration file. This is only required if you have customized your Tika implementation. -|uprefix |Prefixes all fields that are not defined in the schema with the given prefix. This is very useful when combined with dynamic field definitions. Example: `uprefix=ignored_` would effectively ignore all unknown fields generated by Tika given the example schema contains `<dynamicField name="ignored_*" type="ignored"/>` -|xpath |When extracting, only return Tika XHTML content that satisfies the given XPath expression. See http://tika.apache.org/1.7/index.html for details on the format of Tika XHTML. See also http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput. -|=== [[UploadingDatawithSolrCellusingApacheTika-OrderofOperations]] == Order of Operations Here is the order in which the Solr Cell framework, using the Extracting Request Handler and Tika, processes its input. -1. Tika generates fields or passes them in as literals specified by `literal.<fieldname>=<value>`.
If `literalsOverride=false`, literals will be appended as multi-value to the Tika-generated field. -2. If `lowernames=true`, Tika maps fields to lowercase. -3. Tika applies the mapping rules specified by `fmap.__source__=__target__` parameters. -4. If `uprefix` is specified, any unknown field names are prefixed with that value, else if `defaultField` is specified, any unknown fields are copied to the default field. +. Tika generates fields or passes them in as literals specified by `literal.<fieldname>=<value>`. If `literalsOverride=false`, literals will be appended as multi-value to the Tika-generated field. +. If `lowernames=true`, Tika maps fields to lowercase. +. Tika applies the mapping rules specified by `fmap.__source__=__target__` parameters. +. If `uprefix` is specified, any unknown field names are prefixed with that value, else if `defaultField` is specified, any unknown fields are copied to the default field. [[UploadingDatawithSolrCellusingApacheTika-ConfiguringtheSolrExtractingRequestHandler]] == Configuring the Solr ExtractingRequestHandler @@ -194,7 +226,7 @@ You may also need to adjust the `multipartUploadLimitInKB` attribute as follows ---- [[UploadingDatawithSolrCellusingApacheTika-Parserspecificproperties]] -=== Parser specific properties +=== Parser-Specific Properties Parsers used by Tika may have specific properties to govern how data is extracted. For instance, when using the Tika library from a Java program, the PDFParserConfig class has a method setSortByPosition(boolean) that can extract vertically oriented text. To access that method via configuration with the ExtractingRequestHandler, one can add the parseContext.config property to the solrconfig.xml file (see above) and then set properties in Tika's PDFParserConfig as below. Consult the Tika Java API documentation for configuration parameters that can be set for any particular parsers that require this level of control. @@ -241,16 +273,18 @@ As mentioned before, Tika produces metadata about the document.
Metadata describ In addition to Tika's metadata, Solr adds the following metadata (defined in `ExtractingMetadataConstants`): -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`stream_name`:: +The name of the Content Stream as uploaded to Solr. Depending on how the file is uploaded, this may or may not be set. + +`stream_source_info`:: +Any source info about the stream. (See the section on Content Streams later in this section.) + +`stream_size`:: +The size of the stream in bytes. + +`stream_content_type`:: +The content type of the stream, if available. -[cols="30,70",options="header"] -|=== -|Solr Metadata |Description -|stream_name |The name of the Content Stream as uploaded to Solr. Depending on how the file is uploaded, this may or may not be set -|stream_source_info |Any source info about the stream. (See the section on Content Streams later in this section.) -|stream_size |The size of the stream in bytes. -|stream_content_type |The content type of the stream, if available. -|=== [IMPORTANT] ==== diff --git a/solr/solr-ref-guide/src/v2-api.adoc b/solr/solr-ref-guide/src/v2-api.adoc index 51357ab08f9..6906b1c2210 100644 --- a/solr/solr-ref-guide/src/v2-api.adoc +++ b/solr/solr-ref-guide/src/v2-api.adoc @@ -30,9 +30,9 @@ For now the two API styles will coexist, and all the old APIs will continue to w The old API and the v2 API differ in three principal ways: -1. Command format: The old API commands and associated parameters are provided through URL request parameters on HTTP GET requests, while in the v2 API most API commands are provided via a JSON body POST'ed to v2 API endpoints. The v2 API also supports HTTP methods GET and DELETE where appropriate. -2. Endpoint structure: The v2 API endpoint structure has been rationalized and regularized. -3.
Documentation: The v2 APIs are self-documenting: append `/_introspect` to any valid v2 API path and the API specification will be returned in JSON format. +. Command format: The old API commands and associated parameters are provided through URL request parameters on HTTP GET requests, while in the v2 API most API commands are provided via a JSON body POST'ed to v2 API endpoints. The v2 API also supports HTTP methods GET and DELETE where appropriate. +. Endpoint structure: The v2 API endpoint structure has been rationalized and regularized. +. Documentation: The v2 APIs are self-documenting: append `/_introspect` to any valid v2 API path and the API specification will be returned in JSON format. [[v2API-v2APIPathPrefixes]] == v2 API Path Prefixes @@ -43,15 +43,15 @@ Following are some v2 API URL paths and path prefixes, along with some of the op |=== |Path prefix |Some Supported Operations |`/v2/collections` or equivalently: `/v2/c` |Create, alias, backup, and restore a collection. -|`/v2/c/__collection-name__/update` |Update requests. -|`/v2/c/__collection-name__/config` |Configuration requests. -|`/v2/c/__collection-name__/schema` |Schema requests. -|`/v2/c/__collection-name__/__handler-name__` |Handler-specific requests. -|`/v2/c/__collection-name__/shards` |Split a shard, create a shard, add a replica. -|`/v2/c/__collection-name__/shards/___shard-name___` |Delete a shard, force leader election -|`/v2/c/__collection-name__/shards/___shard-name____/____replica-name___` |Delete a replica. +|`/v2/c/_collection-name_/update` |Update requests. +|`/v2/c/_collection-name_/config` |Configuration requests. +|`/v2/c/_collection-name_/schema` |Schema requests. +|`/v2/c/_collection-name_/_handler-name_` |Handler-specific requests. +|`/v2/c/_collection-name_/shards` |Split a shard, create a shard, add a replica. 
+|`/v2/c/_collection-name_/shards/_shard-name_` |Delete a shard, force leader election. +|`/v2/c/_collection-name_/shards/_shard-name_/_replica-name_` |Delete a replica. |`/v2/cores` |Create a core. -|`/v2/cores/__core-name__` |Reload, rename, delete, and unload a core. +|`/v2/cores/_core-name_` |Reload, rename, delete, and unload a core. |`/v2/node` |Perform overseer operation, rejoin leader election. |`/v2/cluster` |Add role, remove role, set cluster property. |`/v2/c/.system/blob` |Upload and download blobs and metadata. @@ -68,7 +68,7 @@ To limit the introspect output to include just one particular HTTP method, add r `\http://localhost:8983/v2/c/_introspect?method=POST` -Most endpoints support commands provided in a body sent via POST. To limit the introspect output to only one command, add request param `command=__command-name__` . +Most endpoints support commands provided in a body sent via POST. To limit the introspect output to only one command, add request param `command=_command-name_`. `\http://localhost:8983/v2/c/gettingstarted/_introspect?method=POST&command=modify` diff --git a/solr/solr-ref-guide/src/velocity-response-writer.adoc b/solr/solr-ref-guide/src/velocity-response-writer.adoc index b9bc3bdce5d..3b576f6fb50 100644 --- a/solr/solr-ref-guide/src/velocity-response-writer.adoc +++ b/solr/solr-ref-guide/src/velocity-response-writer.adoc @@ -45,57 +45,62 @@ The above example shows the optional initialization and custom tool parameters u [[VelocityResponseWriter-VelocityResponseWriterinitializationparameters]] === VelocityResponseWriter Initialization Parameters -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`template.base.dir`:: +If specified and exists as a file system directory, a file resource loader will be added for this directory. Templates in this directory will override "solr" resource loader templates.
-[cols="20,60,20",options="header"] -|=== -|Parameter |Description |Default value -|template.base.dir |If specified and exists as a file system directory, a file resource loader will be added for this directory. Templates in this directory will override "solr" resource loader templates. | -|init.properties.file |Specifies a properties file name which must exist in the Solr `conf/` directory (**not** under a `velocity/` subdirectory) or root of a JAR file in a . | -|params.resource.loader.enabled a| +`init.properties.file`:: Specifies a properties file name which must exist in the Solr `conf/` directory (*not* under a `velocity/` subdirectory) or root of a JAR file in a `<lib>`. + +`params.resource.loader.enabled`:: +The "params" resource loader allows templates to be specified in Solr request parameters. For example: ++ +[source,bash] +http://localhost:8983/solr/gettingstarted/select?q=\*:*&wt=velocity&v.template=custom&v.template.custom=CUSTOM%3A%20%23core_name ++ +where `v.template=custom` says to render a template called "custom" and the value of `v.template.custom` is the custom template. This is `false` by default; it'd be a niche, unusual use case to need this enabled. -`\http://localhost:8983/solr/gettingstarted/select?q=\*:*&wt=velocity&v.template=custom&v.template.custom=CUSTOM%3A%20%23core_name` +`solr.resource.loader.enabled`:: +The "solr" resource loader is the only template loader registered by default. Templates are served from resources visible to the SolrResourceLoader under a `velocity/` subdirectory. The VelocityResponseWriter itself has some built-in templates (in its JAR file, under `velocity/`) that are available automatically through this loader. These built-in templates can be overridden when the same template name is in `conf/velocity/` or by using the `template.base.dir` option. The default is `true`. -where `v.template=custom` says to render a template called "custom" and `v.template.custom` 's value is the actual custom template.
This is disabled by default; it'd be a niche, unusual, use case to need this enabled. - - |false -|solr.resource.loader.enabled |The "solr" resource loader is the only template loader registered by default. Templates are served from resources visible to the SolrResourceLoader under a `velocity/` subdirectory. The VelocityResponseWriter itself has some built-in templates (in its JAR file, under velocity/) that are available automatically through this loader. These built-in templates can be overridden when the same template name is in conf/velocity/ or by using the `template.base.dir` option. |true -|tools |External "tools" can be specified as list of string name/value (tool name / class name) pairs. Tools, in the Velocity context, are simply Java objects. Tool classes are constructed using a no-arg constructor (or a single-SolrCore-arg constructor if it exists) and added to the Velocity context with the specified name. A custom registered tool can override the built-in context objects with the same name, except for $request, $response, $page, and $debug (these tools are designed to not be overridden). | -|=== +`tools`:: +External "tools" can be specified as a list of string name/value (tool name / class name) pairs. Tools, in the Velocity context, are simply Java objects. Tool classes are constructed using a no-arg constructor (or a single-SolrCore-arg constructor if it exists) and added to the Velocity context with the specified name. ++ +A custom registered tool can override the built-in context objects with the same name, except for `$request`, `$response`, `$page`, and `$debug` (these tools are designed to not be overridden). [[VelocityResponseWriter-VelocityResponseWriterrequestparameters]] === VelocityResponseWriter Request Parameters -// TODO: Change column width to %autowidth.spread when https://github.com/asciidoctor/asciidoctor-pdf/issues/599 is fixed +`v.template`:: +Specifies the name of the template to render.
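To make the request parameters concrete, here is a minimal sketch of a request URL (the collection `gettingstarted` and the template names `browse` and `layout` are assumptions for illustration; `v.template` and `v.layout` are the documented parameters):

```shell
# Hypothetical request: render results through a "browse" template,
# wrapped in a "layout" template (both assumed to exist in conf/velocity/).
base='http://localhost:8983/solr/gettingstarted/select'
params='q=*:*&wt=velocity&v.template=browse&v.layout=layout'
echo "${base}?${params}"
# Sent with, e.g.:
#   curl "${base}?${params}"
```

Only the URL assembly is shown; issuing the request requires a running Solr instance with the VelocityResponseWriter registered.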
-[cols="20,60,20",options="header"] -|=== -|Parameter |Description |Default value -|v.template |Specifies the name of the template to render. | -|v.layout a| +`v.layout`:: Specifies a template name to use as the layout around the main template specified by `v.template`. - ++ The main template is rendered into a string value included into the layout rendering as `$content`. - | -|v.layout.enabled |Determines if the main template should have a layout wrapped around it. True by default, but requires `v.layout` to specified as well. |true -|v.contentType |Specifies the content type used in the HTTP response. If not specified, the default will depend on whether `v.json` is specified or not. a| -without json.wrf: text/html;charset=UTF-8 +`v.layout.enabled`:: +Determines if the main template should have a layout wrapped around it. The default is `true`, but requires `v.layout` to be specified as well. -with json.wrf: application/json;charset=UTF-8 +`v.contentType`:: +Specifies the content type used in the HTTP response. If not specified, the default will depend on whether `v.json` is specified or not. ++ +The default without `v.json=wrf`: `text/html;charset=UTF-8`. ++ +The default with `v.json=wrf`: `application/json;charset=UTF-8`. -|v.json a| +`v.json`:: Specifies a function name to wrap around the response rendered as JSON. If specified, the content type used in the response will be "application/json;charset=UTF-8", unless overridden by `v.contentType`. - -Output will be in this format (with v.json=wrf): - ++ +Output will be in this format (with `v.json=wrf`): ++ `wrf("result":"<Velocity generated response string, with quotes and backslashes escaped>")` - | -|v.locale |Locale to use with the `$resource` tool and other LocaleConfig implementing tools. The default locale is `Locale.ROOT`. Localized resources are loaded from standard Java resource bundles named `resources[_locale-code].properties`. Resource bundles can be added by providing a JAR file visible by the SolrResourceLoader with resource bundles under a velocity sub-directory.
Resource bundles are not loadable under conf/, as only the class loader aspect of SolrResourceLoader can be used here. | -|v.template. |When the "params" resource loader is enabled, templates can be specified as part of the Solr request. | -|=== +`v.locale`:: +Locale to use with the `$resource` tool and other LocaleConfig implementing tools. The default locale is `Locale.ROOT`. Localized resources are loaded from standard Java resource bundles named `resources[_locale-code].properties`. ++ +Resource bundles can be added by providing a JAR file visible by the SolrResourceLoader with resource bundles under a velocity sub-directory. Resource bundles are not loadable under `conf/`, as only the class loader aspect of SolrResourceLoader can be used here. + +`v.template._template_name_`:: When the "params" resource loader is enabled, templates can be specified as part of the Solr request. + [[VelocityResponseWriter-VelocityResponseWritercontextobjects]] === VelocityResponseWriter Context Objects @@ -105,19 +110,19 @@ Output will be in this format (with v.json=wrf): [cols="30,70",options="header"] |=== |Context Reference |Description -|request |http://lucene.apache.org/solr/api/org/apache/solr/request/SolrQueryRequest.html[SolrQueryRequest] javadocs -|response |http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/response/QueryResponse.html[QueryResponse] most of the time, but in some cases where https://wiki.apache.org/solr/QueryResponse[QueryResponse] doesn't like the request handlers output (https://wiki.apache.org/solr/AnalysisRequestHandler[AnalysisRequestHandler], for example, causes a ClassCastException parsing "response"), the response will be a https://wiki.apache.org/solr/SolrResponseBase[SolrResponseBase] object. 
-|esc |A Velocity http://velocity.apache.org/tools/2.0/tools-summary.html#EscapeTool[EscapeTool] instance -|date |A Velocity http://velocity.apache.org/tools/2.0/tools-summary.html#ComparisonDateTool[ComparisonDateTool] instance -|list |A Velocity http://velocity.apache.org/tools/2.0/apidocs/org/apache/velocity/tools/generic/ListTool.html[ListTool] instance -|math |A Velocity http://velocity.apache.org/tools/2.0/tools-summary.html#MathTool[MathTool] instance -|number |A Velocity http://velocity.apache.org/tools/2.0/tools-summary.html#NumberTool[NumberTool] instance -|sort |A Velocity http://velocity.apache.org/tools/2.0/tools-summary.html#SortTool[SortTool] instance -|display |A Velocity http://velocity.apache.org/tools/2.0/tools-summary.html#DisplayTool[DisplayTool] instance -|resource |A Velocity http://velocity.apache.org/tools/2.0/tools-summary.html#ResourceTool[ResourceTool] instance -|engine |The current VelocityEngine instance -|page |An instance of Solr's PageTool (only included if the response is a QueryResponse where paging makes sense) -|debug |A shortcut to the debug part of the response, or null if debug is not on. This is handy for having debug-only sections in a template using `#if($debug)...#end` -|content |The rendered output of the main template, when rendering the layout (v.layout.enabled=true and v.layout=